AWS DMS monitoring

To Nha Notes | Oct. 14, 2022, 2:33 p.m.

While the FreeableMemory metric doesn't reflect the actual free memory available, the combination of the FreeableMemory and SwapUsage metrics can indicate whether the replication instance is overloaded.

Monitor these two metrics for the following conditions:

  • The FreeableMemory metric approaching zero.

  • The SwapUsage metric increasing or fluctuating.
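As a rough illustration of the two conditions above, the following Python sketch checks a series of FreeableMemory and SwapUsage samples (in bytes, for example pulled from CloudWatch). The function name memory_pressure, the 256 MiB "near zero" threshold, and the 16 MiB swap-change tolerance are hypothetical choices, not AWS defaults. The sketch also assumes both conditions must hold at the same time before the instance is flagged.

```python
from typing import Sequence

MiB = 1024 ** 2


def memory_pressure(
    freeable_memory_bytes: Sequence[float],
    swap_usage_bytes: Sequence[float],
    near_zero_bytes: float = 256 * MiB,  # hypothetical "approaching zero" threshold
    swap_jitter_bytes: float = 16 * MiB,  # hypothetical: ignore swap changes smaller than this
) -> bool:
    """Return True when FreeableMemory is near zero AND SwapUsage is moving."""
    if not freeable_memory_bytes or len(swap_usage_bytes) < 2:
        return False  # not enough data to judge a trend

    # Condition 1: FreeableMemory approaching zero (judged on the latest sample).
    low_memory = freeable_memory_bytes[-1] <= near_zero_bytes

    # Condition 2: SwapUsage increasing or fluctuating between consecutive samples.
    deltas = [b - a for a, b in zip(swap_usage_bytes, swap_usage_bytes[1:])]
    swap_moving = any(abs(d) > swap_jitter_bytes for d in deltas)

    return low_memory and swap_moving
```

Requiring both signals avoids alerting on a low FreeableMemory value alone, which, as noted above, does not by itself mean the instance is out of memory.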

Common issues

You might face the following common issues that cause resource contention on the replication instance during migration. For information on the replication instance metrics, see Replication instance metrics.

  • If a replication instance runs short of memory, data is written to disk. Reading from disk adds latency, which you can avoid by sizing the replication instance with enough memory.

  • The disk assigned to the replication instance can be smaller than required. The disk holds data that spills over from memory and also stores the task logs, and the maximum IOPS depends on its size too.

  • Running multiple tasks, or tasks with high parallelism, increases the CPU consumption of the replication instance. This slows down the processing of the tasks and results in latency.

Best practices

Consider these two common best practices when sizing a replication instance. For more information, see Best practices for AWS Database Migration Service.

  1. Size your workload and understand whether it's compute-intensive or memory-intensive. Based on this, you can determine the class and size of the replication instance:

    • AWS DMS processes LOBs in memory. This operation requires a fair amount of memory.

    • The number of tasks and the number of threads impact CPU consumption. Avoid using more than eight MaxFullLoadSubTasks during the full load operation.

  2. Increase the disk space assigned to the replication instance when you have a high workload during full load. Doing this lets the replication instance use the maximum IOPS assigned to it.

 

CPU and memory usage vary with the workload. In particular, LOBs affect memory, while task count and parallelism affect CPU. After your migration is running, monitor the CPU, freeable memory, free storage, and IOPS of your replication instance, and size it up or down as needed based on the data you gather.
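To collect the metrics mentioned above, you can query CloudWatch. The sketch below only builds the MetricDataQueries list for the CloudWatch GetMetricData API, so it runs without AWS credentials. It assumes the replication instance metrics are published in the AWS/DMS namespace under the ReplicationInstanceIdentifier dimension; the helper name, the 5-minute period, and the instance ID "my-dms-instance" are illustrative assumptions.

```python
from typing import Any

# Replication instance metrics for CPU, freeable memory, free storage, and IOPS.
# Keys become the query Ids, which must start with a lowercase letter.
METRICS = {
    "cpu": "CPUUtilization",
    "freeable_memory": "FreeableMemory",
    "free_storage": "FreeStorageSpace",
    "read_iops": "ReadIOPS",
    "write_iops": "WriteIOPS",
}


def build_metric_queries(instance_id: str, period_s: int = 300) -> list[dict[str, Any]]:
    """Build GetMetricData queries (hypothetical helper) for one replication instance."""
    queries = []
    for query_id, metric_name in METRICS.items():
        queries.append({
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DMS",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "ReplicationInstanceIdentifier", "Value": instance_id}
                    ],
                },
                "Period": period_s,  # assumed 5-minute granularity
                "Stat": "Average",
            },
            "ReturnData": True,
        })
    return queries
```

With boto3 installed, the list can be passed as boto3.client("cloudwatch").get_metric_data(MetricDataQueries=queries, StartTime=start, EndTime=end). Keeping query construction separate from the API call makes the metric selection easy to test.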

Performance tuning

DMS tasks are highly customizable, with many parameters and settings that affect performance. There are five main areas to consider when you're tuning any DMS task:

  • The replication instance class
  • Task settings
  • LOB settings
  • Task splitting
  • Database settings

The replication instance is a main bottleneck for any DMS task because it handles all of the data flow and converts data from the source DB to the target DB. During any task, closely monitor the resource usage of the replication instance, and consider changing the instance class if CPU and memory utilization stay above 80% for a sustained period; short spikes can be ignored. DMS is typically memory-intensive, so the memory-optimized r5 instance classes, which have more memory per vCPU than the compute-optimized c5 classes, are often a good fit.
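One way to apply the "sustained, not spikes" rule is to require several consecutive samples above the threshold. In this sketch, the function sustained_above and the default of six consecutive samples (30 minutes at a hypothetical 5-minute period) are assumptions. For memory, you would first convert FreeableMemory into a utilization percentage using the instance class's total memory, since DMS does not report memory as a percentage in the metrics listed above.

```python
from typing import Sequence


def sustained_above(
    samples: Sequence[float],
    threshold: float = 80.0,  # percent utilization, from the guidance above
    min_consecutive: int = 6,  # hypothetical: 6 x 5-minute samples = 30 minutes
) -> bool:
    """True if samples stay above threshold for min_consecutive points in a row."""
    run = 0
    for value in samples:
        # Extend the current run on a high sample; any dip resets it, so spikes are ignored.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False
```

Isolated spikes never build a long enough run, so only sustained pressure triggers a resize recommendation.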

The next thing to tune is the task settings themselves. The main items that can improve performance are as follows:

  • The commit rate
  • The number of tables to load in parallel
  • Creating primary key indexes after the full load

The commit rate controls the number of rows that are applied to the target DB in a single transaction. Setting it to a higher value can speed up the migration because there are fewer transactions. However, a higher commit rate also needs more memory on the replication instance, so tune this value carefully to avoid exhausting the instance's memory.

The number of tables that are loaded in parallel is controlled by a value called MaxFullLoadSubTasks. The default setting is for eight tables to be loaded at once, but if you have a large number of smaller tables and a well-sized replication instance, you can increase the speed of the migration by setting this to a higher number.

The final task setting that can improve performance is deferring the creation of primary key indexes. Primary keys uniquely identify each row, and they typically have an index on them to speed up SELECT queries. However, indexes slow down INSERT and UPDATE queries because each index must be updated along with the table. DMS lets you defer the creation of these indexes until after the full load has completed.
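The three task settings above correspond to fields in the FullLoadSettings section of the DMS task settings JSON: CommitRate, MaxFullLoadSubTasks, and CreatePkAfterFullLoad. The helper below is a hypothetical sketch that renders that fragment. The range checks assume the documented limits (CommitRate up to 50,000, MaxFullLoadSubTasks up to 49); verify them against the current task settings documentation before relying on them.

```python
import json


def full_load_settings(
    commit_rate: int = 10000,  # rows per commit during full load (DMS default)
    max_subtasks: int = 8,  # tables loaded in parallel (DMS default)
    create_pk_after_full_load: bool = True,  # defer PK index creation
) -> str:
    """Render a FullLoadSettings fragment of DMS task settings as JSON (hypothetical helper)."""
    # Assumed documented limits; higher commit rates also need more instance memory.
    if not 1 <= commit_rate <= 50000:
        raise ValueError("CommitRate must be between 1 and 50000")
    if not 1 <= max_subtasks <= 49:
        raise ValueError("MaxFullLoadSubTasks must be between 1 and 49")

    settings = {
        "FullLoadSettings": {
            "CommitRate": commit_rate,
            "MaxFullLoadSubTasks": max_subtasks,
            "CreatePkAfterFullLoad": create_pk_after_full_load,
        }
    }
    return json.dumps(settings, indent=2)
```

The resulting string is the kind of value passed as ReplicationTaskSettings when creating or modifying a replication task. Raising MaxFullLoadSubTasks above eight goes beyond the sizing guidance earlier in these notes, so watch CPU closely if you do.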

The next area that can cause poor migration performance is LOBs. Because DMS processes LOBs in two stages and must hold them in memory during conversion, the right LOB settings can improve throughput significantly. The first step in optimizing LOB migrations is to identify whether any tables in scope contain LOBs. If none do, select Don't include LOB columns in the DMS task.

If some tables do contain LOBs, move those tables into a task separate from the other tables so that you can tune that task specifically for LOBs. Next, find the largest LOB in the tables in scope (refer to the documentation for your source DB to find out how to do that). Set the task to Limited LOB mode and set the maximum LOB size to the size of the largest LOB you found. DMS can then size memory chunks for the LOB data more accurately, which means more LOB data can be migrated simultaneously and the data transfer speeds up.
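The LOB advice above can be expressed through the TargetMetadata section of the task settings. This sketch, using the hypothetical helper lob_target_metadata, disables LOB support when no LOBs are in scope and otherwise enables limited LOB mode with LobMaxSize derived from the largest LOB. It assumes LobMaxSize is specified in kilobytes (1 KB = 1,024 bytes) and that you have already measured the largest LOB in bytes on the source.

```python
import math
from typing import Optional


def lob_target_metadata(largest_lob_bytes: Optional[int]) -> dict:
    """Return a TargetMetadata fragment for the LOB strategy described above."""
    if largest_lob_bytes is None:
        # No LOB columns in scope: the equivalent of "Don't include LOB columns".
        return {"TargetMetadata": {"SupportLobs": False}}

    # Assumption: LobMaxSize is in KB; round up so the largest LOB is not truncated.
    lob_max_kb = max(1, math.ceil(largest_lob_bytes / 1024))
    return {
        "TargetMetadata": {
            "SupportLobs": True,
            "FullLobMode": False,
            "LimitedSizeLobMode": True,  # Limited LOB mode
            "LobMaxSize": lob_max_kb,
        }
    }
```

Rounding up rather than down matters here: in limited LOB mode, values larger than the maximum size are truncated, so undersizing by even one kilobyte would cut off the largest LOB.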

 

References

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Monitoring.html

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.SizingReplicationInstance.html

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_BestPractices.html