IT in clouds

Thursday, April 7, 2022

Capacity Planning for MySQL and MariaDB - Dimensioning Storage Size

Ref: https://severalnines.com/database-blog/capacity-planning-mysql-and-mariadb-dimensioning-storage-size

Server manufacturers and cloud providers offer different kinds of storage solutions to cater for your database needs. When buying a new server or choosing a cloud instance to run our database, we often ask ourselves - how much disk space should we allocate? As we will find out, the answer is not trivial as there are a number of aspects to consider. Disk space is something that has to be thought of upfront, because shrinking and expanding disk space can be a risky operation for a disk-based database.

In this blog post, we are going to look into how to initially size your storage space, and then plan for capacity to support the growth of your MySQL or MariaDB database.

How MySQL Utilizes Disk Space

MySQL stores data in files on disk under the directory set by the system variable "datadir". The contents of the datadir depend on the MySQL server version and on the loaded configuration parameters and server variables (e.g., general_log, slow_query_log, binary log).

How data is actually stored and retrieved depends on the storage engine. For the MyISAM engine, a table's indexes are stored in the .MYI file in the data directory, along with the .MYD and .frm files for the table. For the InnoDB engine, the indexes are stored in the tablespace along with the table data. If the innodb_file_per_table option is set, the data and indexes are in the table's .ibd file, alongside the .frm file. For the MEMORY engine, the data is held in memory (heap) while the structure is stored in the .frm file on disk. In MySQL 8.0, the metadata files (.frm, .par, db.opt) are removed with the introduction of the new data dictionary schema.

It's important to note that if you are using the InnoDB shared tablespace for storing table data (innodb_file_per_table=OFF), the physical data size will not shrink even after you truncate tables or delete a large number of rows; the freed space is only reused internally. The only way to reclaim the free space in this configuration is to export the data with mysqldump, delete the current databases and re-import them. Thus, it's important to set innodb_file_per_table=ON if you are concerned about disk space, so that the space can be reclaimed when a table is truncated. Note that even with innodb_file_per_table=ON, a huge DELETE operation won't free up disk space unless OPTIMIZE TABLE is executed afterward.

MySQL stores each database in its own directory under the "datadir" path. In addition, log files and other related MySQL files, like socket and PID files, are created under datadir by default. For performance and reliability reasons, it is recommended to store MySQL log files on a separate disk or partition, especially the MySQL error log and binary logs.

Database Size Estimation

The basic way of estimating size is to find the growth ratio between two different points in time, and then multiply that by the current database size. Measuring only during peak hours is not best practice for this purpose, because it does not represent your database usage as a whole. Think about a batch operation or a stored procedure that runs at midnight, or once a week. Your database could potentially grow significantly in the morning, before possibly being shrunk by a housekeeping operation at midnight.

One possible way is to use our backups as the base element for this measurement. Physical backups, like Percona Xtrabackup, MariaDB Backup and filesystem snapshots, produce a more accurate representation of your database size than logical backups, since they contain a binary copy of the data and indexes. A logical backup like mysqldump only stores SQL statements that can be executed to reproduce the original database object definitions and table data. Nevertheless, you can still come up with a good growth ratio by comparing mysqldump backups.

We can use the following formula to estimate the database size:

Estimated database size = ((B_n - B_n-1) x (Db_data + Db_index) x 52 x Y) / B_n-1

Where,

B_n - Current week full backup size,
B_n-1 - Previous week full backup size,
Db_data - Total database data size,
Db_index - Total database index size,
52 - Number of weeks in a year,
Y - Number of years to plan for.

The total database size (data and indexes) in MB can be calculated by using the following statement:

mysql> SELECT ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) "DB Size in MB" FROM information_schema.tables;
+---------------+
| DB Size in MB |
+---------------+
|       2013.41 |
+---------------+

The above equation can be modified if you would like to use the monthly backups instead. Change the constant value of 52 to 12 (12 months in a year) and you are good to go.
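
As a rough illustration, the growth-ratio estimate can be sketched in Python. The formula here is read off the worked example in the capacity planning section, and the function and parameter names are made up for this sketch. It is a simple linear projection, so treat the result as a planning figure rather than a prediction.

```python
def estimate_db_size_mb(current_backup_mb, previous_backup_mb, db_size_mb,
                        years, periods_per_year=52):
    """Project database size (MB) from the growth between two full backups.

    periods_per_year is 52 for weekly backups, or 12 for monthly backups.
    All names here are illustrative, not part of any MySQL tooling.
    """
    if previous_backup_mb <= 0:
        raise ValueError("previous backup size must be positive")
    # Growth ratio between the two backups, e.g. (1177 - 936) / 936
    growth_ratio = (current_backup_mb - previous_backup_mb) / previous_backup_mb
    # Linear projection over the planning horizon
    return growth_ratio * db_size_mb * periods_per_year * years


# Figures taken from the worked example in the capacity planning section
print(f"{estimate_db_size_mb(1177, 936, 2013, years=3):.1f} MB")
```

For monthly backups, pass periods_per_year=12 and the sizes of two consecutive monthly backups.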

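Also, don't forget to account for the InnoDB redo logs (innodb_log_file_size x 2, since there are two redo log files by default), the system tablespace defined by innodb_data_file_path and, for Galera Cluster, the gcache.size value.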

Binary Logs Size Estimation

Binary logs are generated by the MySQL master for replication and point-in-time recovery purposes. They are a set of log files that contain information about data modifications made on the MySQL server. The size of the binary logs depends on the number of write operations and the binary log format: STATEMENT, ROW or MIXED. Statement-based binary logs are usually much smaller than row-based binary logs, because they contain only the write statements, while row-based logs contain the modified row data.

The best way to estimate the maximum disk usage of binary logs is to measure the binary log size for a day and multiply it by the expire_logs_days value (default is 0, meaning no automatic removal). It's important to set expire_logs_days so you can estimate the size correctly. By default, each binary log file is capped at around 1GB (max_binlog_size) before MySQL rotates it. We can use a MySQL event to simply flush the binary log for the purpose of this estimation.

Firstly, make sure event_scheduler variable is enabled:

mysql> SET GLOBAL event_scheduler = ON;

Then, as a privileged user (with EVENT and RELOAD privileges), create the following event:

mysql> USE mysql;
mysql> CREATE EVENT flush_binlog
ON SCHEDULE EVERY 1 HOUR STARTS CURRENT_TIMESTAMP ENDS CURRENT_TIMESTAMP + INTERVAL 2 HOUR
COMMENT 'Flush binlogs per hour for the next 2 hours'
DO FLUSH BINARY LOGS;

For a write-intensive workload, you probably need to shorten the interval to 30 minutes or 10 minutes so that each flush happens before the binary log reaches the 1GB maximum size, then scale the measured size up to an hour. Then verify the status of the event by using the following statement and look at the LAST_EXECUTED column:

mysql> SELECT * FROM information_schema.events WHERE event_name='flush_binlog'\G
       ...
       LAST_EXECUTED: 2018-04-05 13:44:25
       ...

Then, take a look at the binary logs we have now:

mysql> SHOW BINARY LOGS;
+---------------+------------+
| Log_name      | File_size  |
+---------------+------------+
| binlog.000001 |        146 |
| binlog.000002 | 1073742058 |
| binlog.000003 | 1073742302 |
| binlog.000004 | 1070551371 |
| binlog.000005 | 1070254293 |
| binlog.000006 |  562350055 | <- hour #1
| binlog.000007 |  561754360 | <- hour #2
| binlog.000008 |  434015678 |
+---------------+------------+

We can then calculate the average binary log growth, which is around 562 MB per hour during peak hours. Multiply this value by 24 hours and by the expire_logs_days value, which the result below implies is set to 7 on this server:

mysql> SELECT (562 * 24 * @@expire_logs_days);
+---------------------------------+
| (562 * 24 * @@expire_logs_days) |
+---------------------------------+
|                           94416 |
+---------------------------------+

We get 94416 MB, which is roughly 95 GB of disk space for our binary logs. A slave's relay logs are basically the same as the master's binary logs, except that they are stored on the slave side. Therefore, this calculation also applies to the slave relay logs.
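
The same arithmetic can be sketched in Python, working from the raw File_size values instead of a rounded 562 MB. The function name is hypothetical, and 1 MB is taken as 1,000,000 bytes to match the rounding used above. The retention of 7 days is the value implied by the query result, not a MySQL default.

```python
def estimate_binlog_mb(hourly_sizes_bytes, retention_days):
    """Average hourly binary log growth, scaled to the retention window, in MB."""
    if not hourly_sizes_bytes:
        raise ValueError("need at least one hourly sample")
    avg_bytes_per_hour = sum(hourly_sizes_bytes) / len(hourly_sizes_bytes)
    # 24 hours per day for each day the logs are retained (expire_logs_days)
    return avg_bytes_per_hour * 24 * retention_days / 1_000_000


# The two hourly samples from SHOW BINARY LOGS (binlog.000006 and binlog.000007)
samples = [562350055, 561754360]
print(f"{estimate_binlog_mb(samples, retention_days=7):.0f} MB")
```

Because the samples were taken during peak hours, the result leans toward a worst-case figure for a full day.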

Spindle Disk or Solid State?

There are two types of I/O operations on MySQL files:

Sequential I/O-oriented files:
- InnoDB system tablespace (ibdata)
- MySQL log files:
  - Binary logs (binlog.xxxx)
  - REDO logs (ib_logfile*)
  - General logs
  - Slow query logs
  - Error log

Random I/O-oriented files:
- InnoDB file-per-table data file (*.ibd) with innodb_file_per_table=ON (default).

Consider placing random I/O-oriented files on a high-throughput disk subsystem for best performance. This could be flash storage (SSDs or an NVRAM card) or high-RPM spindle disks like SAS 15K or 10K, with a hardware RAID controller and battery-backed unit. For sequential I/O-oriented files, storing them on HDD with a battery-backed write cache should be good enough for MySQL. Take note that performance degradation is likely if the battery is dead.

We will cover this area (estimating disk throughput and file allocation) in a separate post.

Capacity Planning and Dimensioning

Capacity planning can help us build a production database server with enough resources to survive daily operations. We must also provision for unexpected needs and account for future storage and disk throughput requirements. Thus, capacity planning is important to ensure the database has enough room to breathe until the next hardware refresh cycle.

It's best to illustrate this with an example. Consider the following scenario:

Next hardware cycle: 3 years
Current database size: 2013 MB
Current full backup size (week N): 1177 MB
Previous full backup size (week N-1): 936 MB
Delta size: 241 MB per week
Delta ratio: 25.7% increment per week
Total weeks in 3 years: 156 weeks
Total database size estimation: ((1177 - 936) x 2013 x 156)/936 = 80856 MB ~ 81 GB after 3 years

If you are using binary logs, add the value we got in the previous section:

81 + 95 = 176 GB of storage for database and binary logs.

Add at least 100% more room for operational and maintenance tasks (local backup, data staging, error log, operating system files, etc):

176 + 176 = 352 GB of total disk space.

Based on this estimation, we can conclude that we would need at least 352 GB of disk space for our database for 3 years. You can use this value to justify your new hardware purchase. For example, if you want to buy a new dedicated server, you could opt for 6 x 128 GB SSDs in RAID 10 with a battery-backed RAID controller, which will give you around 384 GB of total disk space. Or, if you prefer cloud, you could get 100GB of block storage with provisioned IOPS for the 81GB database and use standard persistent block storage for the 95GB of binary logs and other operational usage.
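
The sizing steps in this example can be put together in a short Python sketch. Both helper names are invented for illustration. The RAID 10 helper assumes the usual rule that usable capacity is half the raw capacity, because every disk is mirrored.

```python
def total_disk_gb(db_gb, binlog_gb, headroom=1.0):
    """Database plus binary logs, plus extra room (1.0 means 100% more)."""
    return (db_gb + binlog_gb) * (1 + headroom)


def raid10_usable_gb(disks, disk_gb):
    """Usable capacity of a RAID 10 array: half of the raw capacity."""
    if disks < 4 or disks % 2 != 0:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return disks * disk_gb / 2


required = total_disk_gb(db_gb=81, binlog_gb=95)       # 176 GB doubled
available = raid10_usable_gb(disks=6, disk_gb=128)     # 768 GB raw, mirrored
print(f"required: {required:.0f} GB, available: {available:.0f} GB")
print("fits" if available >= required else "too small")
```

Changing the headroom value shows how sensitive the purchase decision is to the operational buffer you choose.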

Happy dimensioning!

Monday, January 17, 2022

OpenStack: Add Host entry in dnsmasq

Example: map the hostname "openstack" to the IP address "10.0.0.1".

Create the file /etc/neutron/dnsmasq-neutron.conf with the following content:

address=/openstack/10.0.0.1

Edit /etc/neutron/dhcp_agent.ini and add the following line:

dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf

Then kill all existing dnsmasq processes and restart the DHCP agent, or reboot the network node.

# service neutron-dhcp-agent restart

OpenStack: Setup MTU for VM instance's network

Within the VM instance, set the network interface MTU to 1400 bytes.

To have the neutron DHCP server set the MTU to 1400 automatically, follow the steps below. The DHCP server sends the MTU of 1400 to instances as a DHCP option (option 26, interface MTU).

Create the file /etc/neutron/dnsmasq-neutron.conf with the following content:

dhcp-option-force=26,1400

Edit /etc/neutron/dhcp_agent.ini and add the following line:

dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf

Then kill all existing dnsmasq processes and restart the DHCP agent, or reboot the network node.

# service neutron-dhcp-agent restart
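
Both of the posts above write to the same /etc/neutron/dnsmasq-neutron.conf, so the two settings can live side by side. The Python sketch below only renders the file content from a host map and an MTU value. The function name is hypothetical, and writing the file and restarting the agent are left to the steps described above.

```python
def render_dnsmasq_conf(hosts=None, mtu=None):
    """Render dnsmasq lines: one address= entry per host, plus an MTU option."""
    lines = []
    for name, ip in (hosts or {}).items():
        # Same form as the host entry shown above: address=/<hostname>/<ip>
        lines.append(f"address=/{name}/{ip}")
    if mtu is not None:
        # DHCP option 26 carries the interface MTU
        lines.append(f"dhcp-option-force=26,{mtu}")
    return "\n".join(lines) + "\n"


print(render_dnsmasq_conf({"openstack": "10.0.0.1"}, mtu=1400), end="")
```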

Tuesday, January 11, 2022

OpenStack: Launch an instance with heat

Ref: https://docs.openstack.org/heat/latest/install/launch-instance.html

Create a template

The Orchestration service uses templates to describe stacks. To learn about the template language, see the Template Guide.

Create the demo-template.yml file with the following content:

heat_template_version: 2015-10-15
description: Launch a basic instance with CirrOS image using the
             ``m1.tiny`` flavor, ``mykey`` key,  and one network.

parameters:
  NetID:
    type: string
    description: Network ID to use for the instance.

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: cirros
      flavor: m1.tiny
      key_name: mykey
      networks:
      - network: { get_param: NetID }

outputs:
  instance_name:
    description: Name of the instance.
    value: { get_attr: [ server, name ] }
  instance_ip:
    description: IP address of the instance.
    value: { get_attr: [ server, first_address ] }

Create a stack

Create a stack using the demo-template.yml template.

Source the demo credentials to perform the following steps as a non-administrative project:
```
$ . demo-openrc
```

Determine available networks.

$ openstack network list
+--------------------------------------+-------------+--------------------------------------+
| ID                                   | Name        | Subnets                              |
+--------------------------------------+-------------+--------------------------------------+
| 4716ddfe-6e60-40e7-b2a8-42e57bf3c31c | selfservice | 2112d5eb-f9d6-45fd-906e-7cabd38b7c7c |
| b5b6993c-ddf9-40e7-91d0-86806a42edb8 | provider    | 310911f6-acf0-4a47-824e-3032916582ff |
+--------------------------------------+-------------+--------------------------------------+

Note

This output may differ from your environment.

Set the NET_ID environment variable to reflect the ID of a network. For example, using the provider network:
```
$ export NET_ID=$(openstack network list | awk '/ provider / { print $2 }')
```

Create a stack of one CirrOS instance on the provider network:

$ openstack stack create -t demo-template.yml --parameter "NetID=$NET_ID" stack
+--------------------------------------+------------+--------------------+---------------------+--------------+
| ID                                   | Stack Name | Stack Status       | Creation Time       | Updated Time |
+--------------------------------------+------------+--------------------+---------------------+--------------+
| dbf46d1b-0b97-4d45-a0b3-9662a1eb6cf3 | stack      | CREATE_IN_PROGRESS | 2015-10-13T15:27:20 | None         |
+--------------------------------------+------------+--------------------+---------------------+--------------+

After a short time, verify successful creation of the stack:

$ openstack stack list
+--------------------------------------+------------+-----------------+---------------------+--------------+
| ID                                   | Stack Name | Stack Status    | Creation Time       | Updated Time |
+--------------------------------------+------------+-----------------+---------------------+--------------+
| dbf46d1b-0b97-4d45-a0b3-9662a1eb6cf3 | stack      | CREATE_COMPLETE | 2015-10-13T15:27:20 | None         |
+--------------------------------------+------------+-----------------+---------------------+--------------+

Show the name and IP address of the instance and compare with the output of the OpenStack client:

$ openstack stack output show --all stack
[
  {
    "output_value": "stack-server-3nzfyfofu6d4",
    "description": "Name of the instance.",
    "output_key": "instance_name"
  },
  {
    "output_value": "10.4.31.106",
    "description": "IP address of the instance.",
    "output_key": "instance_ip"
  }
]

$ openstack server list
+--------------------------------------+---------------------------+--------+---------------------------------+
| ID                                   | Name                      | Status | Networks                        |
+--------------------------------------+---------------------------+--------+---------------------------------+
| 0fc2af0c-ae79-4d22-8f36-9e860c257da5 | stack-server-3nzfyfofu6d4 | ACTIVE | public=10.4.31.106              |
+--------------------------------------+---------------------------+--------+---------------------------------+
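
If you want to do this comparison in a script rather than by eye, the stack outputs can be turned into a dictionary. The Python sketch below assumes the output of `openstack stack output show --all stack` has been captured as the JSON text shown above. The helper name is made up for this example.

```python
import json


def stack_outputs(raw_json):
    """Map each output_key to its output_value from the stack output JSON."""
    return {item["output_key"]: item["output_value"]
            for item in json.loads(raw_json)}


# Sample taken from the command output shown above
sample = """
[
  {
    "output_value": "stack-server-3nzfyfofu6d4",
    "description": "Name of the instance.",
    "output_key": "instance_name"
  },
  {
    "output_value": "10.4.31.106",
    "description": "IP address of the instance.",
    "output_key": "instance_ip"
  }
]
"""

outputs = stack_outputs(sample)
print(outputs["instance_name"], outputs["instance_ip"])
```

The two values can then be checked against the Name and Networks columns of `openstack server list`.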

Delete the stack.
```
$ openstack stack delete --yes stack
```

OpenStack: Heat with error "create() got an unexpected keyword argument 'policies'"

In OSP16, the Nova API has changed the policies field to policy.
You can work around this issue by specifying the compute API version with --os-compute-api-version 2.63 in your command:

openstack server group create --os-compute-api-version 2.63 --policy affinity demo

Edit /etc/heat/heat.conf, add the lines below and restart the heat service:

[DEFAULT]

...

max_nova_api_microversion = 2.63

Using GTK from MSYS2 packages

Ref: https://www.gtk.org/docs/installations/windows/#using-gtk-from-msys2-packages

Installation

The MSYS2 project provides a UNIX-like development environment for Windows. It provides packages for many software applications and libraries, including the GTK stack. If you prefer developing using Visual Studio, you should use gvsbuild instead.

In MSYS2, packages are installed using the pacman package manager.

Note: in the following steps, we will assume you’re using a 64-bit Windows. Therefore, the package names include the x86_64 architecture identifier. If you’re using a 32-bit Windows, please adapt the instructions below using the i686 architecture identifier.

Step 1.: Download the MSYS2 installer that matches your platform and follow the installation instructions.

Step 2.: Install GTK3 and its dependencies. Open a MSYS2 shell, and run:

pacman -S mingw-w64-x86_64-gtk3

Step 3. (recommended): Install the GTK core applications. Glade is a GUI designer for GTK. It lets you design your GUI and export it in XML format. You can then import your GUI from your code using the GtkBuilder API. Read the GtkBuilder section in the GTK manual for more information.

To install Glade:

pacman -S mingw-w64-x86_64-glade

Step 4. (optional): If you want to develop a GTK3 application in Python, you need to install the Python bindings.

If you develop in Python 3:

pacman -S mingw-w64-x86_64-python3-gobject

If you develop in Python 2:

pacman -S mingw-w64-x86_64-python2-gobject

Step 5. (optional): Install the build tools. If you want to develop a GTK3 application in other languages like C, C++, Fortran, etc, you’ll need a compiler like gcc and other development tools:

pacman -S mingw-w64-x86_64-toolchain base-devel
