Evaluating Disk I/O and Performance Impact of Partition Dropping in MySQL

To Nha Notes | Nov. 14, 2024, 2:59 p.m.

To test the impact on disk I/O and performance of each partition-dropping method on MySQL, you can set up a controlled test with the following steps. Here's an outline of how to do it:

1. Initial Setup: Create Table and Populate Data

1. **Create the table** with a `cid` column and add a partition on `cid` to manage data more effectively:
   ```sql
   CREATE TABLE my_table (
       id BIGINT AUTO_INCREMENT PRIMARY KEY,
       cid INT,
       other_columns VARCHAR(255)
   )
   PARTITION BY HASH(cid) PARTITIONS 10; -- Adjust partitions as needed
   ```

2. **Generate and insert 100 million records** in a loop or through a bulk data-loading method:
   - Use a script or load generator to insert records. To simulate real-world data, add varying values in other columns.
   - Insert the data in bulk to avoid extensive insert times and use `AUTO_INCREMENT` for `id`.

   ```sql
   -- Example script for bulk inserts
   INSERT INTO my_table (cid, other_columns)
   VALUES (FLOOR(RAND() * 10), 'Sample data');
   ```

3. **Measure Initial Disk I/O Baseline**:
   - Use MySQL’s **Performance Schema** or an external tool like **iostat** or **pidstat** to establish baseline I/O usage and query times.

2. Micro-Batch Deletion Then Drop Partition

This method involves deleting the records in a specific partition in small batches until empty, then dropping the partition.

1. **Delete records in micro-batches**:
   - Select records by `cid` and delete them in batches (e.g., 10,000 rows at a time).
   - Measure the time and disk I/O impact of this delete operation.

   ```sql
   SET @batch_size = 10000;
   REPEAT
       DELETE FROM my_table
       WHERE cid = 1
       LIMIT @batch_size;
   UNTIL ROW_COUNT() = 0 END REPEAT;
   ```

2. **Drop the now-empty partition**:
   - After deletion, drop the empty partition.

   ```sql
   ALTER TABLE my_table DROP PARTITION p1;
   ```

3. **Measure Disk I/O and Performance**:
   - Capture disk I/O and performance impact using tools like **Performance Schema**, **iostat**, and **pidstat**.
   - Note the resource impact of this method.

3. Direct Partition Drop without Micro-Batch Deletion

In this case, you skip the deletion step and drop the partition directly.

1. **Drop the partition directly** without deleting data:
   ```sql
   ALTER TABLE my_table DROP PARTITION p1;
   ```

2. **Measure Disk I/O and Performance**:
   - Again, capture disk I/O and performance impact.
   - Direct partition drop without deletion might cause more intense I/O spikes but could be faster.

4. Compare Results

- **Analyze the data from both tests**:
   - Check for time differences in completing the operations.
   - Review I/O metrics for spikes or prolonged usage during each method.
- **Evaluate performance impact**:
   - Dropping a partition directly is typically faster because it avoids row-by-row deletion. However, it may temporarily increase I/O due to the abrupt data removal.
   - Deleting data in batches before dropping may result in longer total operation time but could potentially reduce the I/O impact during partition removal.

5. Conclusion

Comparing the metrics will show:
- **Total time taken** for each method.
- **Disk I/O impact** during each method.
- **Which approach minimizes server impact** based on observed I/O metrics.

### Additional Tips

- Consider MySQL’s **InnoDB** buffer pool settings, as it will affect how the system handles I/O during these operations.
- Try running these tests during off-peak hours or on a replica to prevent server load issues if this is on a production server.
  
This experiment will help you determine the most efficient approach for handling large data removals via partition drops.