|

pt-table-checksum overview

pt-table-checksum performs an online replication consistency check by executing checksum queries on the master, which produces different results on replicas that are inconsistent with the master.

pt-table-checksum limitation

  • Tool does not check difference in databases/tables structure and assumes that they are the same
  • Tool requires “binlog_format = STATEMENT” set on master (what it sets by default) and requires the same binlog format on all replicas because queries won’t be replicated further to the rest chain of replicas. We can hit this issue only when we have a chain of replicas: master -> replica1 -> replica2. For the common setup like master -> replica it is not the issue because we can pass flag ‘–no-check-binlog-format’

Very important here to understand that since this script relies on checksum it might lead to the situation when data is different but script shows that data is fine. It happens because of hash collision (what is hash collision in practice: https://stackoverflow.com/questions/45795637/what-exactly-is-hash-collision)

pt-table-checksum simple case study

Under this method, the server can dynamically choose between statement-based replication and row-based replication, depending on certain conditions. Some of these conditions include using a user-defined function (UDF), using an INSERT command with the DELAYED clause, using temporary tables, or using a statement that uses system variables.

master [localhost:24028] {msandbox} ((none)) > create database sync;
Query OK, 1 row affected (0.000 sec)

master [localhost:24028] {msandbox} ((none)) > use sync
Database changed
master [localhost:24028] {msandbox} (sync) > create table ok(id integer auto_increment, value varchar(30), dtm datetime, primary key(id));
Query OK, 0 rows affected (0.056 sec)

master [localhost:24028] {msandbox} (sync) > create table fail(id integer auto_increment, value varchar(30), dtm datetime, primary key(id));
Query OK, 0 rows affected (0.048 sec)

#
# generate some data using the following statements in both tables: ok, failed
#
master [localhost:24028] {msandbox} (sync) > insert into ok(value, dtm) values( hex(rand()*0xFFFFFFFF), now());
Query OK, 1 row affected (0.011 sec)

Then we are going to inject the following inconsistency on the replica:

  • delete a single row
  • update a single row
  • insert a new row

At this point we assume that table called “ok” has the same data on master and replica while table “fail” has inconsistency mentioned above. Here is the content of “fail” table:

master [localhost:24028] {msandbox} (sync) > select * from fail;
+----+----------+---------------------+
| id | value	| dtm             	|
+----+----------+---------------------+
|  1 | A15F0C16 | 2021-03-18 11:51:09 |
|  2 | 16C13FA0 | 2021-03-18 11:51:09 |
|  3 | 8DA91A62 | 2021-03-18 11:51:10 |
|  4 | 8009C58E | 2021-03-18 11:51:10 |
|  5 | D7358D23 | 2021-03-18 11:51:10 |
|  6 | B3EE718A | 2021-03-18 11:51:11 |
|  7 | FE0790CF | 2021-03-18 11:51:12 |
|  8 | DA5A7FF3 | 2021-03-18 11:51:12 |
+----+----------+---------------------+
8 rows in set (0.001 sec)


slave1 [localhost:24029] {msandbox} (sync) > select * from fail;
+----+----------+---------------------+
| id | value	| dtm             	|
+----+----------+---------------------+
|  1 | A15F0C16 | 2021-03-18 11:51:09 |
|  2 | 16C13FA0 | 2021-03-18 11:51:09 |
|  3 | 8DA91A62 | 2021-03-18 11:51:10 | ← the next row was deleted
|  5 | D7358D23 | 2021-03-18 11:51:10 |
|  6 | B3EE718A | 2021-03-18 11:51:11 |
|  7 | changed  | 2021-03-18 11:51:12 | ← the updated row
|  8 | DA5A7FF3 | 2021-03-18 11:51:12 |
|  9 | A44B677A | 2021-03-18 11:53:39 | ← the inserted row
+----+----------+---------------------+
8 rows in set (0.000 sec)

Let’s run pt-table-checksum on the master. I set binlog_format to ROW on both servers for the test purposes.

>> pt-table-checksum --user=msandbox --password=msandbox --port 24028 --host=127.0.0.1 --databases="sync"
Checking if all tables can be checksummed ...
Starting checksum ...
Cannot connect to P=24030,h=node-3,p=...,u=msandbox
Cannot connect to P=24029,h=node-2,p=...,u=msandbox
Diffs cannot be detected because no slaves were found.  Please read the --recursion-method documentation for information.
CRC32 never needs BIT_XOR optimization at /usr/bin/pt-table-checksum line 6124.

Here we tried to run pt-table-checksum but got the error that we cannot connect to the replicas because of credentials. In order to do it we can follow the official documentation and create the following table:

master [localhost:24028] {msandbox} (percona) > CREATE TABLE `dsns` (
	->   `id` int(11) NOT NULL AUTO_INCREMENT,
	->   `parent_id` int(11) DEFAULT NULL,
	->   `dsn` varchar(255) NOT NULL,
	->   PRIMARY KEY (`id`)
	-> );
Query OK, 0 rows affected (0.060 sec)

All tables should have DSNs which provide information on how to connect to all databases in my specific test case.

master [localhost:24028] {msandbox} (percona) > insert into dsns values(1, null, "h=127.0.0.1,P=24029,u=msandbox,p=msandbox");
Query OK, 1 row affected (0.008 sec)

master [localhost:24028] {msandbox} (percona) > insert into dsns values(2, null, "h=127.0.0.1,P=24030,u=msandbox,p=msandbox");
Query OK, 1 row affected (0.007 sec)

Let’s run pt-table-checksum again to see what happen:

>> pt-table-checksum --user=msandbox --password=msandbox --port 24028 --host=127.0.0.1 --databases="sync" --no-check-slave-tables --recursion-method='dsn=h=127.0.0.1,D=percona,t=dsns'
Checking if all tables can be checksummed ...
Starting checksum ...
Replica slave1 has binlog_format ROW which could cause pt-table-checksum to break replication.  Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.  If you understand the risks, specify --no-check-binlog-format to disable this check.
Replica slave2 has binlog_format MIXED which could cause pt-table-checksum to break replication.  Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation.  If you understand the risks, specify --no-check-binlog-format to disable this check.

As you can see the utility says that we should either change binlog_format or skip this check (what we are going to do here because we do not have a replication chain. In some versions of MariaDB you can get an error related to CRC and you can workaround it by passing ‘–function MD5’ ). I will pass “–chunk-size=2” to reduce the chunk size and to show how to find where there is an inconsistency.

>> pt-table-checksum --user=msandbox --password=msandbox --port 24028 --host=127.0.0.1 --databases="sync" --no-check-slave-tables --recursion-method='dsn=h=127.0.0.1,D=percona,t=dsns'  --no-check-binlog-format --chunk-size=2
Checking if all tables can be checksummed ...
Starting checksum ...
        	TS ERRORS  DIFFS 	ROWS  CHUNKS SKIPPED	TIME TABLE
03-18T12:39:51  	0  	3    	8   	6   	0   0.240 sync.fail
03-18T12:39:51  	0  	0    	5   	5   	0   0.195 sync.ok

What we can see here is that table “sync.ok” does not have any issue while table “fail” has some issues. Let’s take a closer look at the content of “percona.checksums” table where this script stores execution results (table is small and inserted here as an image because it is very wide and cannot be displayed as a table).

Now let’s take a look closely at important information:

slave1 [localhost:24029] {msandbox} ((none)) > select chunk,  lower_boundary, upper_boundary, this_crc, master_crc, this_cnt, master_cnt from percona.checksums where tbl='fail' and( this_crc <> master_crc or this_cnt <> master_cnt)\G
*************************** 1. row ***************************
     	chunk: 2
lower_boundary: 3
upper_boundary: 4
  	this_crc: 6c6eee37756ea6e6e9581fe3c11218cf
	master_crc: 1c8de6ca86846fd0dbc3ca84de1e9d39
  	this_cnt: 1
	master_cnt: 2
*************************** 2. row ***************************
     	chunk: 4
lower_boundary: 7
upper_boundary: 8
  	this_crc: 466afe5823613919b3be7a8058e97afb
	master_crc: b9e0bb85309d42c382bc4b458ea9eb7e
  	this_cnt: 2
	master_cnt: 2
*************************** 3. row ***************************
     	chunk: 6
lower_boundary: 8
upper_boundary: NULL
  	this_crc: 0
	master_crc: 0
  	this_cnt: 1
	master_cnt: 0
3 rows in set (0.000 sec)

Here we filtered data by two conditions:

  • this_cnt <> master_cnt: this condition filtered out rows where we have the same amount of rows in each chunk, i.e. no rows were deleted or inserted
  • this_crc <> master_crc: this condition filtered out rows where we would have different crc values, i.e. some rows were changed.

Let’s take a look closer at the output and review each chunk.

Chunk #2

We see that the amount of rows is different and we should check this chunk manually. Since this_cnt is less than master_cnt then rows were deleted on the replica.

Chunk #4

Amount of rows are same but crc are different hence we updated some rows here

Chunk #6

Here we have different amount of rows (crc is empty because we do not have this chunk on the master) and on master we have less than on the replica. Hence we added something here.

Final note

Because we have a very simple case we know what are differences between master and replicas we can easily decide what the inconsistency is.

However, in the real situation we might get the situation when the amount of rows differs by 1 but 5 rows were deleted and 6 were inserted and it will give us in total 11 inconsistencies.

pt-table-checksum in production

This script should be used with caution in a production environment.

If you have any kind of maintenance window then it is great to use it to run pt-table-checksum.

If not then it would be better to run this script during off peak hours and in addition you can limit the amount of data to be scanned during single execution. The key point here is to build a robust workflow to check the whole table in several runs.

There are several parameters which might help to decrease the amount of rows to be scanned in a single execution:

--where

Do only rows matching this WHERE clause. You can use this option to limit the checksum to only part of the table. This is particularly useful if you have append-only tables and don't want to constantly re-check all rows; you could run a daily job to just check yesterday's rows, for instance.

Example:

pt-table-checksum --where "updated_dtm >= '2021-03-18 00:00:00' and updated_dtm < '2021-03-19 00:00:00'"


--tables

Checksum only this comma-separated list of tables. Table names may be qualified with the database name.


--tables-regex

Checksum only tables whose names match this Perl regex.

Example:

pt-table-checksum --tables 'transactions,users'

The top 5 mysql picture 2 3