When using the
on-disk storage engine of BlobCity DB, fine-tuning of storage on physical storage volumes is key. Some of the best practices to be followed are mentioned on this page.
XFS file system on any Unix distribution is the best supported from a performance standpoint.
ext4 is a big no due to a low
inodes limit. An NTFS file format is recommended only over small volumes of data, typically less than 4 billion files per database node.
The DB stores data and indexes in individual files. It creates a large foot print of files when using the
on-disk storage engine and it is crucial that these files are managed appropriate. Moving large number of files is a time consuming task, hence it is recommended that the storage system be designed appropriate from the start.
The DB consumes at minimum 1 file on the file system per record stored in each collection. In addition it consumes at minimum 1 file per column that is indexed within each record. It is important to plan the number of files that will be created on a single node for optimal performance. If you insert 100 files each with 10 columns, of which 2 columns are indexed, the DB would at minimum create
1 * 100 = 100 files (one per each record) and additional
100 (records) * 2 (index per record) = 200 files for the indexes. This would mean you would at minimum consume
300 files on the file system to store 100 records with 2 columns indexed.
The maximum number of files possible on a single NTFS volume is ~4 billion. The maximum number of files possible to store on a single XFS volume is ~18 quintillion. The XFS number is substantially larger and hence recommended when storing large volumes of data in a BlobCity DB cluster. These limits are on a single mount point. The DB can virtually store unlimited files in a cluster. Each node in the cluster must be pointed to an independent file system mount point.
For high performance requirements, the data should be stored on SSD volumes. These could be volumes physically mounted onto the machine or attached over the network. It is recommended that over the network volumes be connected to the compute infrastructure over optical fibre cables and not regular LAN cables.
Since each record creates a file on the file system, and each file system has a minimum block size, matching the block size of the file system closest to the size of each record improves performance. When planning the file system block size, it is important to round up to the nearest block size and never round down. The below table provides examples of file system block size setting to use for different average record sizes.
|Average Record Size||File System Block Size|
|200 bytes||512 bytes|
|520 bytes||1024 bytes|
|1024 bytes||1024 bytes|
|1025 bytes||2048 bytes|
Data of all collections can be stored in a single mounted volume, however it is likely that different collections have different average record sizes. A single block size on the mount point may not be desirable to store data of all collections. For this reason, the
data folder of specific collections can be mounted as independent volumes to increase performance.
As seen in Figure 1 the
index folders are recommended to have different block sizes. The
data folder for
collection1 may have different block size than
data folder for
collection2. However it is recommended that
index folder always have the lowest block size configuration. Depending on your hypervisor and storage system setup, the lowest possible block size for an XFS volume could be either
Have the experts do it for you
BlobCity offers performance optimisation services. We did be happy to help your organisation make the most of BlobCity DB.
Updated over 4 years ago