When we’re busy, it may at times be easy to forget certain things that really shouldn’t be forgotten. One of the things considered most important in this business is data. You can’t forget about the importance of regularly controlling the database and the much vital health of its environment. A recommendation is to monitor this, and to have regular check-ups to see that the monitoring truly works. New data is written practically all the time to the database tables, as a result, these tables usually become bigger and bigger. If you’re not in control of their growth, this could lead to performance issues in the long run; this could be in form of long loading times for users, scripts, API-requests and robots. Various other resources like CPU, memory and hard drives are also affected by this as the data grows.
A Health Check on the Database
As the tables in the databases grow larger and larger, the importance of controlling size and growth rate increases. For table-size and the number of rows, it’s easy to follow the growth by, for example, checking DATA_LENGTH and TABLE_ROWS in INFORMATION_SCHEMA.TABLES. This will give you a feel for what the current situation is and whether there are any anomalies. For example, is it reasonable to have close to 100 million rows in one table? Can the size be argued for by saying that the data is actually used, or should a cleansing be considered? It is even more interesting to consider how reasonable the number of rows is in a table with very frequently used data. Doing so, you will obtain good information about the effect large tables have on performance with MySQL’s own slow-query-logger (more about this further down). Regarding the growth rate, you can of course make graphs with help from extracted data from the above mentioned table at regular time intervals. This way you’ll get a feel for whether the growth rate increases when e.g. new functionality is introduced.
To ensure the health of the database tables it’s also good to verify that e.g. CHECK TABLE is actually performed on the tables. You certainly want to avoid taking backups on corrupt/broken tables.
If you find this tricky, feel free to discuss this with us at Netadmin. We’re careful to make sure that the database health in your system is good – both for functional purposes and user experience!
Health Checks from an Environmental and Performance Perspective
Apart from the database itself, it’s also important that the database is operating in an environment where surrounding factors and resources are also healthy and regularly monitored. Under-dimensioned resources can otherwise affect the performance. Some examples of things that could be sensible to monitor are:
- CPU usage/Average Load: It’s interesting to watch the trends for CPU/loadavg to determine if the available CPU performance is enough for the growing data handled by the system.
- Hard drive usage: Monitoring the hard drive usage is certainly very important; when the disc storage is full, it’s likely that many functions both in applications and the OS start having issues operating normally.
- Hard drive I/O: This depends a little bit on what the hard drive set-up looks like. Can the hard drive keep up with reading and writing or just barely complete the task? It may also be interesting to see what SMART has to say about the hard drives.
- Memory usage: How is the memory usage looking when the most memory-demanding functions are running? Is it borderline?
- Network usage: This may also be a point of interest. Monitor this to discover anomalies in transferred data per time unit (especially when very little data is being transferred). This could be very useful information to have when troubleshooting.
- IIS: How’s the response time looking? Are there requests that are hanging for unreasonably long that need to be checked on? Do these requests have a direct connection to the related issues in the database? Perhaps during certain hours of the day?
The Performance Perspective
Large database tables, especially those frequently used, can have a big effect on the general experience of the system’s performance. As mentioned in the previous subheading, it’s definitely a good idea to keep the tables to a healthy size, for precautional purposes. However, it could be difficult to know where the line is drawn at times – how many rows/GB are reasonable? If you monitor the table locks, this could help you figure out which tables that need overseeing and perhaps a bit cleansing in order to attain better performance. Locks for this type of tables are normal for MySQL, but, granted, there are limits to what’s reasonable concerning locking times and waiting times. In order to have a good overview of what the lock-situation is, you can choose from different methods, depending on what level you choose. Some alternatives are:
- Create your own surveillance with for example “SHOW FULL PROCESSLIST” and handle the results as you wish.
- Fetch the global status value for “Slow_queries” (which is a counter) with a periodic schedule for seeing trends and tops (for example per hour or per day, that exceeds specified “long_query_time”).
- Activate MySQL’s built-in slow-query-logger, which is normally set to log locks over 10 seconds. Remember that this logger can also, to a certain extent and depending on the queries, affect the performance when it’s activated.
If you would like a good compilation of MySQL’s slow-query-log, you can take a look at Percona’s “pt-query-digest", which gives you a good overview of the worst performance thieves with a point system. If you’re viewing the log and you notice that that there are locks every day at, say, 2PM, it may be possible to trace the lock to a customer specific scheduled task/job. If possible, it’s better to schedule jobs for out-of-office hours to minimize the effect on system users. In cases where it’s not possible to reschedule jobs, it could be time to oversee the size of the tables that’s causing the long locks. When you’ve analyzed the tables and you have determined that there’s no more data to sort away, there are some options to consider:
- Can the functionality that’s causing the heavy database queries be rewritten or optimized and adjusted to the present or future datasets?
- Can the locks be traced to non-product related tables? In that case, an overseeing of the indexes could be helpful.
- Is it time to consider a Master/Slave set-up for the database? If the function behind the locks can be redirected to a Slave database, this will of course have a big and positive effect on the Master database. Another huge advantage with the M/S set-up is that you can complete a database backup without affecting the production environment. On the other hand, a M/S set-up requires some maintenance, but the advantages often outweigh the disadvantages if used in the right way.
The purpose of this article is to remind you of how important it is that your data and surrounding environment stay healthy, as it is such a crucial part of your business. This includes that the system remains responsive with healthy datasets, but also precautions for possible future issues.
Perhaps you don’t need to begin working on all the different controls at once, but it is better to get going with the most important ones if you aren’t doing much monitoring today. Some monitoring and scheduled/regular controls are better than none at all. A strong recommendation is also to do more than just monitor – it’s obviously a great advantage if you receive alarms on exceeded specified thresholds as well! Regularly verify that the monitoring and alarms truly work.
An Application Management commitment at Netadmin can help you with much of the above, as well as many other important improving and preventative precautions. Please contact us at Netadmin if you’re interested in receiving help with the things described here, amongst other related things.