How to find mysql bottlenecks with pt-stalk and pt-query-digest
pt-stalk Overview
pt-stalk used to collect some information about MySQL/MariaDB and OS during troubleshooting because sometimes it is not enough to have errors. It has two modes:
- It can be executed as a daemon with a condition when it will be executed and gather information
- Run right now as is, i.e. collect everything right now.
How to use pt-stalk in MYSQL?
Here is how we can run it immediately without passing the trigger option. In this case all data will be collected in /tmp/pt-stalk folder
> pt-stalk --iterations=1 --no-stalk --dest=/tmp/pt-stalk
Another case is to run in a daemon mode to wait for some condition when pt-stalk will start to collect data. In this specific case we instruct pt-stalk to wait global status “threads_running” will reach the threshold value 100 and then it will start collecting the data
> pt-stalk --function=status --variable=Threads_running --threshold=100 --cycles=5 --interval=1 --iterations=2 --dest=/tmp/pt-stalk
pt-stalk can also watch for the processlist or even write some custom code.
After all, information gathered by pt-stalk can be used during troubleshooting.
Also we can extend the functionality of pt-stalk. In order to use it we can utilize the “–plugin” option and implement custom function:
before_stalk() {
:
}
before_collect() {
# $EXT_ARGV contains access credentials
(mysql $EXT_ARGV -e 'show master logs' > /tmp/master-logs) &
}
after_collect() {
:
}
after_collect_sleep() {
:
}
after_stalk() {
:
}
Also we can use bash script with “–function” option to get more information. For example:
> pt-stalk --function=additional-metrics.sh --function=status --variable=Threads_running --threshold=100 --cycles=5 --interval=1 --iterations=2 --dest=/tmp/pt-stalk
Where “additional-metrics.sh” can have any other additional functionality to grab.
Slow query log
Slow query can be enabled by setting variable “slow_query_log” to 1 and setting “long_query_time” to the value you want. It makes sense to put 0.
Slow query log should be reviewed from time to time to figure out which queries cause performance issues. The most common case is high cpu usage which could possibly indicate the problem with lack of indexes. To simplify the search for such queries we could “pt-query-digest” from percona toolkit.
Runtime statistics
One of the first lines in this section provides the column headers for the query’s table of information. Pay attention to the following columns:
pct
(Percentage) – The percent of the total for the whole analysis run.total
– The actual value of the specified metric.95%
– The 95th percentile; 95% of the values are less than or equal to this value.stddev
(Standard deviation) – How tightly grouped the values are.
The standard deviation and median are both calculated from the 95th percentile, discarding the extremely large values.
The most simple use case
> pt-query-digest /var/log/mysql/slow.log
Report provides overview of the whole slow query log. Here we can find the general statistics
# 333.6s user time, 3.2s system time, 42.29M rss, 255.46M vsz
# Current date: Thu Nov 4 10:00:13 2021
# Hostname: db1
# Files: /var/log/mysql/slow.log
# Overall: 962.13k total, 308 unique, 290.59 QPS, 0.53x concurrency ______
# Time range: 2021-11-04 09:05:02 to 10:00:13
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1768s 32us 62s 2ms 2ms 66ms 1ms
# Lock time 115s 11us 17ms 119us 185us 57us 119us
# Rows sent 19.83M 0 89.27k 21.61 28.75 295.77 0
# Rows examine 1.59G 0 1.90M 1.73k 5.20k 21.36k 420.77
# Rows affecte 840 0 168 0.00 0 0.38 0
# Bytes sent 1023.12M 11 6.49M 1.09k 2.62k 12.12k 76.28
# Query size 724.91M 31 23.94k 790.04 1.96k 476.80 755.64
Profile shows the top N queries.
# Profile
# Rank Query ID Response time Calls R/Call V/M Ite
# ==== =========================== ============== ====== ======= ===== ===
# 1 0x76BD685795C8E349A7D674... 563.8500 31.9% 443072 0.0013 0.00 query 1
# 2 0x686E927C3596D99477FA99... 340.1520 19.2% 1000 0.3402 0.11 query 2
# 3 0xD8BDBBC16F544B415A2933... 212.9846 12.0% 188505 0.0011 0.00 query 3
# 4 0x65B5510218A1559FF23BB7... 102.8174 5.8% 1104 0.0931 0.00 query 4
# 5 0xB50E6DB8C88BA536C8A9E0... 82.2197 4.7% 65449 0.0013 0.00 query 5
# 6 0x9969FEF92CC8C8095BAFBC... 64.9858 3.7% 55 1.1816 0.01 query 6
# 7 0xE0ED29B8DF7CBD33497A34... 61.9152 3.5% 1 61.9152 0.00 query 7
# 8 0x7D8C1ED96E50D8BB465180... 59.1116 3.3% 7728 0.0076 0.01 query 8
....
# 19 0x31DB055F8054CFAD763CAC... 7.4954 0.4% 8281 0.0009 0.00 query 19
# MISC 0xMISC 86.0445 4.9% 108443 0.0008 0.0 <289 ITEMS>
Below we could see all queries fingertips
# Query 1: 133.82 QPS, 0.17x concurrency, ID 0x76BD685795C8E349A7D674D1C851CDF7 at byte 0
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2021-11-04 09:05:02 to 10:00:13
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 46 443072
# Exec time 31 564s 527us 26ms 1ms 2ms 330us 1ms
# Lock time 50 58s 98us 17ms 131us 159us 37us 125us
# Rows sent 8 1.70M 0 72 4.02 28.75 9.21 0
# Rows examine 9 150.84M 40 586 356.98 487.09 154.49 463.90
# Rows affecte 0 0 0 0 0 0 0 0
# Bytes sent 4 44.34M 70 716 104.93 313.99 80.68 69.19
# Query size 45 332.20M 785 787 786.20 755.64 0 755.64
# String:
# Databases db
# Hosts 1.1.1.1
# Last errno 0
# Users user1
# Query_time distribution
# 1us
# 10us
# 100us #######################
# 1ms ################################################################
# 10ms #
# 100ms
# 1s
# 10s+
# Tables
# ....
# EXPLAIN /*!50100 PARTITIONS*/
Query text
Here is more examples how you can run pt-query-digest:
Top 50 queries ordered by rows_examined
> pt-query-digest --limit=50 --group-by fingerprint --order-by Rows_examined:sum slog.log
# Filter queries for specific period of time
> pt-query-digest --since='2021-05-21 00:40:00' --until='2021-05-21 00:50:00' slow.log
# build report for all queries
> pt-query-digest --limit=100% slow.log
# build report only for queries with execution time higher than 5s
> pt-query-digest --filter '$event->{Query_time} > 5' slow.log
# build report only for queries executed by specific user
> pt-query-digest --filter '$event->{user} =~ /myuser/' slow.log
# --filter option allows to combine several conditions, for example.
> pt-query-digest --filter '$event->{user} =~ /myuser/ && $event->{Query_time} > 5' slow.log
# if you need to get a list of all attributes for the event run the following command:
>pt-query-digest --filter 'print Dumper $event' --no-report --sample 1 slow.log
$VAR1 = {
Bytes_sent => '148',
Filesort => 'No',
Filesort_on_disk => 'No',
Full_join => 'No',
Full_scan => 'Yes',
Lock_time => '0.000048',
Merge_passes => '0',
Priority_queue => 'No',
QC_hit => 'No',
Query_time => '0.000184',
Rows_affected => '0',
Rows_examined => '0',
Rows_sent => '0',
Thread_id => '85778229',
Tmp_table => 'No',
Tmp_table_on_disk => 'No',
arg => 'query',
bytes => 218,
cmd => 'Query',
db =>'db_name',
fingerprint => 'fingerprint of the query',
host => '8.8.8.8,
ip => '8.8.8.8',
pos_in_log => 26187012,
timestamp => '1627462800',
ts => '210728 9:00:00',
user => 'user_name'
};
Your Remote DBA Support
IF you need more help We offer you our services as a remote Database Administrator (DBA). depend on your MySQL database. But, what if your database goes down? Use our MySQL database support!