Database Performance Optimization Techniques – Part 2

Introduction

In our earlier blog, we looked at different ways to optimize database performance from an application developer's perspective. As data grows and workloads increase, simply optimizing queries during development is not enough; ensuring long-term database performance requires strategic infrastructure changes and continuous monitoring.

The Database Administrator (DBA) plays a vital role in ensuring the database infrastructure is optimized for performance. DBAs are responsible for monitoring, measuring, and optimizing database performance, which involves analyzing key performance metrics and implementing the right measures to keep database operations efficient.

So, in this blog, let's look at database performance optimization techniques that focus on server resources, database configuration, and hardware- and architecture-level optimizations.

Database Performance Optimization

Optimized server resources, proper database maintenance, and well-planned data distribution and scaling help ensure fast, scalable, and efficient performance. Here are a few techniques Database Administrators can use to optimize database performance:

Memory Management

Ensure the database server has sufficient RAM to cache frequently accessed data, reducing disk reads. One way to identify whether more memory is needed is to check the number of page faults on the system. If page faults are high, the server is running low on memory or has run out of it, and allocating more memory to the database server will improve performance.
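
On SQL Server, a related signal is the Page life expectancy counter, which shows how long data pages stay in the buffer cache; persistently low values suggest memory pressure. A minimal sketch (the healthy threshold depends on how much memory is installed):

  -- Page life expectancy (seconds): persistently low values indicate memory pressure
  SELECT object_name, counter_name, cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name LIKE 'Page life expectancy%'
    AND object_name LIKE '%Buffer Manager%';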

CPU Optimization

Allocate sufficient CPU cores for the database workload. Monitor CPU ready time, which shows how often the system was ready to run on a CPU but had to wait because no processor was available. Tracking it alongside overall CPU utilization helps you decide whether to upgrade to a larger CPU. The more powerful the CPU, the better it can handle multiple applications and requests.
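
On the SQL Server side, one hedged indicator of CPU pressure is the share of signal wait time, i.e. the time tasks spend waiting for a CPU after their other resources are ready. A small sketch; a consistently high percentage suggests the instance is CPU-bound:

  -- Signal waits as a share of all waits: a high percentage points to CPU pressure
  SELECT
      CAST(100.0 * SUM(signal_wait_time_ms) / SUM(wait_time_ms) AS decimal(5, 2))
          AS signal_wait_pct
  FROM sys.dm_os_wait_stats
  WHERE wait_time_ms > 0;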

I/O Speed – Disk with High IOPS

IOPS, or Input/Output Operations Per Second, measures the number of read and write operations a storage device performs per second. Databases rely heavily on disk access for storing and retrieving data, so high IOPS ensures the database can handle a large number of requests. Using SSDs instead of HDDs is preferable, as they deliver much higher IOPS; with no moving parts, data access is significantly faster.

Separate disks for data and logs

Keeping database data files and log files on the same disk increases disk contention. As a best practice for optimizing disk I/O, place database data files and transaction log files on separate disks.
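
For example, when creating a database, the data and log files can be placed on different drives. A minimal sketch, assuming a hypothetical SalesDB database and placeholder D: and E: paths standing in for separate physical volumes:

  -- Data file on one physical disk, transaction log on another (paths are placeholders)
  CREATE DATABASE SalesDB
  ON PRIMARY
      (NAME = SalesDB_data, FILENAME = 'D:\SQLData\SalesDB.mdf')
  LOG ON
      (NAME = SalesDB_log, FILENAME = 'E:\SQLLogs\SalesDB_log.ldf');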

Measure Read/Write Latency

Read/write latency is the time the database takes to read data from and write data to disk. Even with a good disk and high IOPS, it is always worth measuring the actual latency: high latency slows query execution and leads to application performance issues. This article walks through measuring disk latency – https://www.sqlshack.com/sql-server-troubleshooting-disk-i-o-problems/
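
As a rough sketch, SQL Server's file-level I/O statistics can be used to estimate average latency per database file (the figures are cumulative since the last restart, so treat them as long-run averages rather than current values):

  -- Average read/write latency (ms) per database file since instance startup
  SELECT
      DB_NAME(vfs.database_id) AS database_name,
      mf.physical_name,
      vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
      vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
  FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
  JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;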

Measure Index Fragmentation

Index fragmentation occurs when data pages are not stored contiguously, which slows query performance. Both logical fragmentation (pages out of order) and internal fragmentation (pages with unused space) can affect performance. Query the sys.dm_db_index_physical_stats() dynamic management function to check index fragmentation. Here's what the avg_fragmentation_in_percent results mean:

  • <15% – No action needed
  • 15% – 30% – Moderate fragmentation; the index should be reorganized. Reorganizing defragments the index without rebuilding it. Use the command “ALTER INDEX [index_name] ON [table_name] REORGANIZE;”.
  • >30% – The index should be rebuilt. Rebuilding drops and recreates the index from scratch, which is more resource intensive than reorganizing but removes fragmentation completely and updates index statistics. Use the command “ALTER INDEX [index_name] ON [table_name] REBUILD;”, preferably during off-peak hours to minimize the impact.

If the database is heavily used, rebuild with ONLINE = ON so queries can keep running during the rebuild (online rebuilds are an Enterprise Edition feature, also available in Azure SQL Database): ALTER INDEX [index_name] ON [table_name] REBUILD WITH (ONLINE = ON);
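
To see where action is needed, the sketch below lists fragmented indexes in the current database together with a suggested action based on the thresholds above (the thresholds are general guidance, not hard rules):

  -- List indexes with their fragmentation level and a suggested action
  SELECT
      OBJECT_NAME(ips.object_id) AS table_name,
      i.name AS index_name,
      ips.avg_fragmentation_in_percent,
      CASE
          WHEN ips.avg_fragmentation_in_percent > 30 THEN 'REBUILD'
          WHEN ips.avg_fragmentation_in_percent >= 15 THEN 'REORGANIZE'
          ELSE 'NONE'
      END AS suggested_action
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
  JOIN sys.indexes AS i
      ON i.object_id = ips.object_id AND i.index_id = ips.index_id
  WHERE ips.index_id > 0  -- skip heaps
  ORDER BY ips.avg_fragmentation_in_percent DESC;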

Update Statistics

SQL Server maintains statistics on indexed columns to help the query optimizer choose the best execution plan. If statistics are outdated, SQL Server may pick inefficient query plans, causing performance degradation. The command “UPDATE STATISTICS [table_name] WITH FULLSCAN;” can be used, especially after bulk inserts or updates. To update all outdated statistics in a database, run “EXEC sp_updatestats;”.
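
To find statistics that may be stale, one hedged approach is to look at how many rows have changed since each statistic was last updated (the threshold of 1000 modifications below is only an example):

  -- Statistics ordered by the number of modifications since their last update
  SELECT
      OBJECT_NAME(s.object_id) AS table_name,
      s.name AS stats_name,
      sp.last_updated,
      sp.modification_counter
  FROM sys.stats AS s
  CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
  WHERE sp.modification_counter > 1000  -- example staleness threshold
  ORDER BY sp.modification_counter DESC;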

MAXDOP (Maximum Degree of Parallelism)

MAXDOP, or Max Degree of Parallelism, sets the number of processors SQL Server can use to execute a single query. The default value of 0 lets SQL Server use all available processors; however, this can cause resource contention and performance degradation when many queries run concurrently. It is best to experiment with different values and set the one that works for your particular workload.
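
At the instance level, MAXDOP can be changed with sp_configure. A minimal sketch; the value 8 is only an example, and the right setting depends on your core count and workload:

  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  -- Limit any single query to at most 8 processors (example value)
  EXEC sp_configure 'max degree of parallelism', 8;
  RECONFIGURE;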

Cost Threshold for Parallelism

The “cost threshold for parallelism” in SQL Server determines when a query will be executed using a parallel plan instead of a serial plan. It essentially sets the minimum estimated cost for a query to be considered for parallel execution.

SQL Server's default value for Cost Threshold for Parallelism is 5, which is generally too low for modern systems. A recommended starting point is between 50 and 75, which captures the benefit of parallelism while minimizing its overhead.
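
It is changed with the same sp_configure pattern as MAXDOP; the value 50 below is just the suggested starting point, to be tuned against your own workload:

  EXEC sp_configure 'show advanced options', 1;
  RECONFIGURE;
  -- Only queries with an estimated cost above 50 are considered for parallel plans
  EXEC sp_configure 'cost threshold for parallelism', 50;
  RECONFIGURE;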

Database Sharding

Sharding is a powerful database architecture pattern that scales the database out as data volume and user load grow. It spreads the database's data across different servers, which supports high availability and optimal performance. There are different sharding strategies, like key-based, range-based, hash-based, and directory-based; the choice depends on the application's requirements and data characteristics.

  • Key-Based Sharding (Dynamic Sharding) – Technique: distributes data across shards based on a predefined sharding key, such as user_id or customer_id. Best use case: when data can be evenly distributed using a natural key, avoiding hotspots; suitable for multi-tenant applications.
  • Range-Based Sharding – Technique: divides data into shards based on value ranges (e.g., IDs 1-1000 in Shard A, 1001-2000 in Shard B). Best use case: when queries often need a sequential range of data, such as time-series data or ordered datasets.
  • Hash-Based Sharding – Technique: uses a hash function to assign data to a shard, ensuring even distribution and preventing hotspots. Best use case: when preventing uneven data distribution (skew) is a priority, such as in high-volume transactional databases.
  • Directory-Based Sharding – Technique: maintains a lookup table that maps each data record to a specific shard. Best use case: when sharding logic is complex or dynamic; useful for multi-database systems where data locations may change.
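
As a tiny illustration of hash-based sharding, the shard for a row can be derived from a hash of the key. The dbo.Customers table and the four-shard count are hypothetical, and CHECKSUM stands in only for a production-grade hash function:

  -- Map each customer_id to one of 4 shards (0-3) using a hash of the key
  SELECT
      customer_id,
      ABS(CHECKSUM(customer_id)) % 4 AS shard_id
  FROM dbo.Customers;
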
Partitioning

Partitioning is a powerful database optimization technique that divides data in large tables into smaller and easily manageable pieces called partitions. Each partition holds a subset of the data based on the defined partitioning strategy, like a range of values or specific criteria.

This approach helps improve query performance, especially on large datasets, because operations run against individual partitions rather than the entire table. When a database holds large volumes of data and its management becomes complex, such as a database with 10+ years of transaction data, partitioning makes query execution and maintenance faster, for example when retrieving only recent data.

There are different types of partitioning techniques, like Range, List, and Hash, along with hybrid approaches that combine two or more of them:

  • Range Partitioning – Partitions a table on a specific range of values, like date ranges or numerical values. It is suitable for time-based datasets, such as sales transactions by year (see the sketch after this list).
  • List Partitioning – Partitions data based on a predefined list of values. This method suits columns with a limited set of distinct values, like region or department.
  • Hash Partitioning – Distributes data evenly across partitions using a hash function, creating balanced storage. It is a good fit for high-volume datasets where data access is uniform.
  • Hybrid Approach – Combines two or more partitioning methods. For instance, a combination of range and hash partitioning is suitable when data needs load balancing across multiple servers. The hybrid approach distributes data evenly, preventing hotspots (partitions with higher loads than others).
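
A minimal T-SQL sketch of range partitioning by year; the dbo.Orders table, column names, and year boundaries are hypothetical:

  -- Partition function: one partition per year (boundary dates are examples)
  CREATE PARTITION FUNCTION pf_OrderYear (date)
      AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

  -- Partition scheme: map all partitions to the PRIMARY filegroup for simplicity
  CREATE PARTITION SCHEME ps_OrderYear
      AS PARTITION pf_OrderYear ALL TO ([PRIMARY]);

  -- Partitioned table: each row is placed in a partition based on order_date
  CREATE TABLE dbo.Orders (
      order_id   int            NOT NULL,
      order_date date           NOT NULL,
      amount     decimal(10, 2) NOT NULL,
      CONSTRAINT pk_Orders PRIMARY KEY (order_id, order_date)
  ) ON ps_OrderYear (order_date);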

Deciding on a proper partitioning strategy is important. Understanding the data characteristics and query patterns will help simplify the decision on which partitioning strategy to use.

The Bottom Line

It is recommended to use advanced monitoring tools that provide real-time tracking and in-depth analysis of database performance metrics. These tools help database administrators proactively find issues, optimize configurations, and ensure consistent database performance under dynamic, high-traffic conditions.

With the rise of data-driven approaches, database infrastructure needs to be agile, dynamic, and fast so that applications run with minimal downtime and deliver low-latency responses. A well-structured performance optimization strategy enhances database efficiency and provides a seamless user experience, which is essential for business-critical applications.