Manage memory in OceanBase Database
Memory management is key to high-performance servers in C.
In the early days of distributed storage systems, memory-related bugs were quite common, such as memory access out of bounds and core dumps of servers, which are quite difficult to debug.
OceanBase Database provides a global fixed-length memory pool, which maintains an idle linked list consisting of memory blocks with a fixed length of 64 KB.
Apparently, a global memory pool is not suitable for managing small memory blocks. Each module that needs memory, such as a MemTable in UpdateServer or cache in ChunkServer, can only request a large memory block from the global memory pool, and then implement a dedicated memory pool in the module.
OceanBase does not use any sophisticated memory management techniques but has achieved good manageability and zero fragmentation, the top two expectations of early distributed systems.
So, what are the memory management strategies of OceanBase in a real-world production environment?
Let’s look at these strategies in three major business scenarios.
1. OLTP
Online transaction processing (OLTP) is sensitive to data read/write response time (RT) and data consistency, mainly deals with SQL statements in the key-value format, and involves fewer range queries and Table Join queries that are generally executed with optimal indexes. OLTP workload shows obvious peak-and-valley patterns and the read/write traffic ratio does not change significantly.
Scenario description
OLTP has strict requirements for read and write RT, which must be less than 10 ms for OceanBase databases that store gigabytes of data, and even lower, say 5 ms, for OceanBase databases that store terabytes of data.
Recommended configuration
Given the characteristics of the OLTP scenarios, you must set the memory switchover time of OceanBase Database to the minimum and schedule the daily major compaction to off-peak hours to minimize the impact on the cluster performance.
Recommended parameter settings:
memory_limit: You can use the default value 0, which specifies that the maximum memory of the physical server available for OceanBase Database is determined by memory_limit_percentage.
memory_limit_percentage: You can set the parameter to 80 if the memory size of the physical server is less than 512 GB, or 90 if the memory size of the physical server is equal to or greater than 512 GB.
freeze_trigger_percentage: We recommend that you set the parameter based on the read/write traffic ratio. Generally, the parameter is set to a value that ranges from 60 to 75. To store more DML data in memory and reduce the number of minor and major compactions, if the data write traffic is higher you can set the parameter to a value that is closer to that of memory_limit_percentage.
minor_freeze_times: You can set the parameter to a value that ranges from 3 to 8 based on the daily amount of DML data. Minor compaction quickly releases memory but causes slight jitters. The minor compaction speed depends on the storage medium. The minor compaction speed for data stored in solid-state drives (SSDs) is greater than that stored in hard disk drives (HDDs), and the speed for data stored in HDDs is greater than that stored in serial AT attachment (SATA) drives.
2. OLAP
Online analytical processing (OLAP) is tolerant to a read/write RT of several hundred milliseconds or even several seconds and is insensitive to data consistency. To be specific, OLAP is tolerant to millisecond-level data inconsistency between nodes, even though it often handles Aggregate queries that involve tens of thousands of to millions of data rows. In addition, OLAP mainly deals with Where, Range, Order By, Join, and Aggregate queries, whose SQL statements are executed by using the partitioning key or primary key in OceanBase Database. Generally, the data read traffic is less than the data write traffic, and experiences peak hours. Peaks of the data write traffic emerges periodically or intermittently mainly because most AP traffic involves real-time or offline periodic synchronization tasks.
Recommended configuration
Given the characteristics of OLAP, you must set the memory switchover time of OceanBase Database to the minimum value to protect the data synchronization performance from being affected by the daily major compaction. In addition, you must take into account the impact of the physical storage media on the major compaction.
memory_limit: You can use the default value 0, which specifies that the maximum memory of the physical server available for OceanBase Database is determined by memory_limit_percentage.
memory_limit_percentage: We recommend that you set the parameter to 80 so that more auto-scalable memory space is allocated to caches and SQL threads to ensure the normal return of results.
freeze_trigger_percentage: This parameter specifies the threshold of memory used by tenants for triggering a global freeze. We recommend that you set the threshold to a smaller value, say 60, so that the remaining memory is enough to support batch writes when the occupied memory is not released. In this way, synchronous data writes are not affected.
minor_freeze_times: You can set this parameter to a value that ranges from 3 to 5. If you use SSDs or HDDs, we recommend that you also enable the minor compaction feature. Minor compaction quickly releases the memory but causes slight jitters. The minor compaction speed depends on the storage medium. The minor compaction speed for data stored in SSDs is greater than that stored in HDDs, and the speed for data stored in HDDs is greater than that stored in serial SATA drives. If you use SATA drives for storage, we recommend that you disable the minor compaction feature.
3. History databases
A history database requires a data read RT that is roughly the same as that of OLTP and is sensitive to data consistency. A history database is insensitive to the data write RT and mainly deals with queries based on key-value pairs. Occasionally, it also deals with a small amount of Stat and Aggregate queries.
Recommended configuration
Considering the characteristics of history databases and the low bandwidth of the storage media, you must ensure that the memory switchover of OceanBase Database does not affect the data read performance and reduce the major compaction efficiency. Recommended parameter settings:
memory_limit: You can use the default value 0, which specifies that the maximum memory of the physical server available for OceanBase Database is determined by memory_limit_percentage.
memory_limit_percentage: You can set the parameter to 80 if the memory size of the physical server is less than 512 GB, or 90 if the memory size of the physical server is equal to or greater than 512 GB.
freeze_trigger_percentage: We recommend that you set the parameter to a value that is consistent with the MemStore parameters, say 80, to minimize the number of major compactions daily.
minor_freeze_times: We recommend that you set the parameter to 0, which indicates that the minor compaction feature is disabled. This is because most history databases use SATA drives.
sys_bkgd_io_percentage: This parameter specifies the percentage of system I/O threads. We recommend that you set the parameter to 90 so that the data disk bandwidth can be maximized during major compaction.
merge_thread_count: The default value of this parameter is 0, which indicates that the major compaction threads are automatically allocated from the tenant threads. You can set it to 48 to allocate more threads and speed up the major compaction.
To sum up, you must adjust the memory management parameters based on your business scenarios and physical server configuration to meet your business needs at the best cluster performance.
In this way, you can make full use of the low RT of MemTable queries in the OceanBase Database and ensure the maximum equipment utilization with high availability, achieving a balance between performance and costs.