Base cntr next 2
|

A simple pt-table-sync USER GUIDE

Overview

Sometimes, when working with master and slave tables, there is the need to synchronize some or all of them. This is because changes on the data that the slave tables might experiment. That process shouldn’t be of much concern to you since there is something that takes care of that. We’re talking about the pt-table-sync command, a powerful tool to synchronize data efficiently between your MySQL tables.

Always keep in mind that pt-table-sync changes data on your tables, so it is a good idea to back it up before you run it!

When you use the pt-table-sync command, along with the –replicate or –sync-to-master methods, know that the changes will always be on the master, and never on the replica (slave) directly. Making changes on the replica usually brings problems, so, for most cases, this is the only safe way to sync a replica with its master. So, as for the master, there shouldn’t be any tempering on its data values, any changes on it should be no-op so that it keeps its current data. The process should then only affect the replica.

An important observation is that, when there are differences between master and slaves, this tool will not be aware of them and will not handle those cases. It will act the same way towards new or missing data on the tables.

If you are looking for a deeper explanation on pt-table-sync you can read about it here.

Limitations

The pt-table-sync tool, although quite useful, has its limitations. For starters, it relies on pt-table-checksum execution for some actions. Also, it has an issue similar to what was mentioned above. It does not check the differences in the databases/tables structure, and assumes that they are the same among master and replicas.

Another issue is that, when you use this tool with the –sync-to-master or –replicate methods, it will require statement-based replication. For this you will need SUPER privilege to set “binlog_format = STATEMENT” on master as it is required. These methods are also necessary for carefully handling a master-master setup.

Simple case study

In this section, we will continue handling the errors which were introduced on the “pt-table-checksum user guide”. You can read how we got this result here.

Now, recalling the differences we created between master and replica, we changed one record, deleted another and updated one more record. You can see the result here:

slave1 [localhost:24029] {msandbox} ((none)) > select chunk,  lower_boundary, upper_boundary, this_crc, master_crc, this_cnt, master_cnt from percona.checksums where tbl='fail' and( this_crc <> master_crc or this_cnt <> master_cnt)\G
*************************** 1. row ***************************
     	chunk: 2
lower_boundary: 3
upper_boundary: 4
  	this_crc: 6c6eee37756ea6e6e9581fe3c11218cf
	master_crc: 1c8de6ca86846fd0dbc3ca84de1e9d39
  	this_cnt: 1
	master_cnt: 2
*************************** 2. row ***************************
     	chunk: 4
lower_boundary: 7
upper_boundary: 8
  	this_crc: 466afe5823613919b3be7a8058e97afb
	master_crc: b9e0bb85309d42c382bc4b458ea9eb7e
  	this_cnt: 2
	master_cnt: 2
*************************** 3. row ***************************
     	chunk: 6
lower_boundary: 8
upper_boundary: NULL
  	this_crc: 0
	master_crc: 0
  	this_cnt: 1
	master_cnt: 0
3 rows in set (0.000 sec)

The same result is shown below, but notice that this one displays different in SQL:

slave1 [localhost:24029] {msandbox} (sync) > select * from fail;
+----+----------+---------------------+
| id | value	| dtm                |
+----+----------+---------------------+
|  1 | A15F0C16 | 2021-03-18 11:51:09 |
|  2 | 16C13FA0 | 2021-03-18 11:51:09 |
|  3 | 8DA91A62 | 2021-03-18 11:51:10 | ← the next row was deleted (id=4)
|  5 | D7358D23 | 2021-03-18 11:51:10 |
|  6 | B3EE718A | 2021-03-18 11:51:11 |
|  7 | changed  | 2021-03-18 11:51:12 | ← the updated row
|  8 | DA5A7FF3 | 2021-03-18 11:51:12 |
|  9 | A44B677A | 2021-03-18 11:53:39 | ← the inserted row
+----+----------+---------------------+
8 rows in set (0.000 sec)

Now, let’s run the pt-table-sync command. We’ll use it to sync the replica to the master. Notice that we will synchronize only the “fail” table:

>> # pt-table-sync --sync-to-master h=127.0.0.1,u=msandbox,p=msandbox,P=24029 --databases=sync --tables=fail --print --function MD5

DELETE FROM `sync`.`fail` WHERE `id`='9' LIMIT 1 /*percona-toolkit src_db:sync src_tbl:fail src_dsn:P=24028,h=127.0.0.1,p=...,u=msandbox dst_db:sync dst_tbl:fail dst_dsn:P=24029,h=127.0.0.1,p=...,u=msandbox lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:16888 user:root host:local*/;

REPLACE INTO `sync`.`fail`(`id`, `value`, `dtm`) VALUES ('4', '8009C58E', '2021-03-18 11:51:10') /*percona-toolkit src_db:sync src_tbl:fail src_dsn:P=24028,h=127.0.0.1,p=...,u=msandbox dst_db:sync dst_tbl:fail dst_dsn:P=24029,h=127.0.0.1,p=...,u=msandbox lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:16888 user:root host:local*/;

REPLACE INTO `sync`.`fail`(`id`, `value`, `dtm`) VALUES ('7', 'FE0790CF', '2021-03-18 11:51:12') /*percona-toolkit src_db:sync src_tbl:fail src_dsn:P=24028,h=127.0.0.1,p=...,u=msandbox dst_db:sync dst_tbl:fail dst_dsn:P=24029,h=127.0.0.1,p=...,u=msandbox lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:16888 user:root host:local*/;

We can notice that the tool deleted the row with id=9, and we can see what we initially added as well.

The next change is in row with id=4. Since we initially deleted it, we can see that the pt-table-sync command re-inserted that row.

Now, since we also modified the value on row with id=7, we can see that the original value was re-inserted.

Since we used the –print option, the master/replica tables were not affected by this. It just displayed the three statements  that would be used in case we actually were to sync the replica to the master. Now that we saw what would be done on the replica’s table, we can run them on the problematic node.

NOTE: It is possible that these changes won’t be atomic. In case you need them to be, you can do so by:

  • Wrapping all three statements with “start transaction/commit”
  • Wrapping the statements with “lock table X write/unlock tables”

Conclusion

Even with its rules and limitations, the pt-table-sync command is a great tool for synchronizing your master table and its replications. If you do not mind reading a little about the syntax, methods, and other details about it, you will find that it will make your MySQL database administrator job much easier. As always, learning about new tools is a challenge, but one that will be well rewarded.

We hope you learned something of value in this article. Stay around for more MySQL tips that will make your life easier!