
How to find MySQL bottlenecks with pt-stalk and pt-query-digest

pt-stalk Overview

pt-stalk is used to collect diagnostic information about MySQL/MariaDB and the operating system during troubleshooting, because error messages alone are often not enough. It has two modes:

  • It can run as a daemon that waits for a trigger condition and then gathers diagnostic information.
  • It can run immediately and collect everything right away.

How to use pt-stalk with MySQL?

Here is how we can run it immediately, without a trigger condition. In this case all data will be collected in the /tmp/pt-stalk directory:

> pt-stalk --iterations=1 --no-stalk --dest=/tmp/pt-stalk

Another option is to run it in stalking (watch) mode, where pt-stalk waits for a condition before it starts collecting data. In this example we instruct pt-stalk to wait until the global status variable Threads_running reaches the threshold of 100, and only then start collecting:

> pt-stalk --function=status --variable=Threads_running --threshold=100 --cycles=5 --interval=1 --iterations=2 --dest=/tmp/pt-stalk

pt-stalk can also watch the processlist, or you can write custom trigger code.
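
For example, to trigger on the processlist instead of a status counter, pt-stalk can count matching rows in SHOW PROCESSLIST (a sketch; the state and threshold values are only illustrative):

> pt-stalk --function=processlist --variable=State --match=statistics --threshold=20 --dest=/tmp/pt-stalk

This starts collecting when more than 20 threads are in the "statistics" state.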

All the information gathered by pt-stalk can then be used during troubleshooting.

We can also extend the functionality of pt-stalk. To do so, put one or more of the hook functions below into a script and pass it via the --plugin option:

# Called before stalking (watching for the trigger) begins
before_stalk() {
   :
}

# Called when the trigger fires, right before data collection starts
before_collect() {
     # $EXT_ARGV contains the MySQL access credentials passed to pt-stalk
     (mysql $EXT_ARGV -e 'show master logs' > /tmp/master-logs) &
}

# Called right after data collection finishes
after_collect() {
   :
}

# Called after the post-collection sleep
after_collect_sleep() {
   :
}

# Called after stalking ends
after_stalk() {
   :
}
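
Assuming these functions are saved in a file such as /root/pt-stalk-plugin.sh (the path is only an example), the plugin is loaded with the --plugin option:

> pt-stalk --plugin=/root/pt-stalk-plugin.sh --function=status --variable=Threads_running --threshold=100 --dest=/tmp/pt-stalk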

We can also pass a bash script to the --function option to collect additional information. For example:

> pt-stalk --function=additional-metrics.sh --function=status --variable=Threads_running --threshold=100 --cycles=5 --interval=1 --iterations=2 --dest=/tmp/pt-stalk

where additional-metrics.sh contains the custom logic used to compute the value of --variable.
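
A minimal sketch of such a script: pt-stalk sources the file given to --function and calls a function named trg_plugin to obtain the value that is compared against --threshold. The long-running-query check below is only an illustration:

#!/bin/bash
# additional-metrics.sh (illustrative): return the number of queries
# that have been running for more than 10 seconds
trg_plugin() {
    mysql $EXT_ARGV -N -e "SELECT COUNT(*) FROM information_schema.processlist WHERE command = 'Query' AND time > 10"
}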

Slow query log

The slow query log can be enabled by setting the slow_query_log variable to 1 and setting long_query_time to the threshold you want. It often makes sense to set it to 0 so that every query is logged.
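
For example, the log can be enabled at runtime (a sketch; the file path matches the one used later in this article, and the commands assume the mysql client can find credentials):

> mysql -e "SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log'"
> mysql -e "SET GLOBAL long_query_time = 0"
> mysql -e "SET GLOBAL slow_query_log = 1"

Note that SET GLOBAL long_query_time only affects new connections; existing sessions keep their previous value.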

The slow query log should be reviewed from time to time to figure out which queries cause performance issues. The most common symptom is high CPU usage, which often indicates missing indexes. To simplify the search for such queries we can use pt-query-digest from Percona Toolkit.

Runtime statistics

One of the first lines in each report section provides the column headers for the query statistics table. Pay attention to the following columns:

  • pct (Percentage) – The percent of the total for the whole analysis run.
  • total – The actual value of the specified metric.
  • 95% – The 95th percentile; 95% of the values are less than or equal to this value.
  • stddev (Standard deviation) – How tightly grouped the values are.

The standard deviation and median are both calculated from the 95th percentile, discarding the extremely large values.

The simplest use case

> pt-query-digest /var/log/mysql/slow.log

The report provides an overview of the whole slow query log. Here we can find the general statistics:

# 333.6s user time, 3.2s system time, 42.29M rss, 255.46M vsz
# Current date: Thu Nov  4 10:00:13 2021
# Hostname: db1
# Files: /var/log/mysql/slow.log
# Overall: 962.13k total, 308 unique, 290.59 QPS, 0.53x concurrency ______
# Time range: 2021-11-04 09:05:02 to 10:00:13
# Attribute       total     min     max     avg     95%  stddev  median
# ============ ======== ======= ======= ======= ======= ======= =======
# Exec time       1768s    32us     62s     2ms     2ms    66ms     1ms
# Lock time        115s    11us    17ms   119us   185us    57us   119us
# Rows sent      19.83M       0  89.27k   21.61   28.75  295.77       0
# Rows examine    1.59G       0   1.90M   1.73k   5.20k  21.36k  420.77
# Rows affecte      840       0     168    0.00       0    0.38       0
# Bytes sent   1023.12M      11   6.49M   1.09k   2.62k  12.12k   76.28
# Query size    724.91M      31  23.94k  790.04   1.96k  476.80  755.64


The Profile section shows the top N queries:

# Profile
# Rank Query ID                    Response time  Calls  R/Call   V/M  Item
# ==== =========================== ============== ====== ======= ===== =====
#    1 0x76BD685795C8E349A7D674... 563.8500 31.9% 443072  0.0013  0.00 query 1
#    2 0x686E927C3596D99477FA99... 340.1520 19.2%   1000  0.3402  0.11 query 2
#    3 0xD8BDBBC16F544B415A2933... 212.9846 12.0% 188505  0.0011  0.00 query 3
#    4 0x65B5510218A1559FF23BB7... 102.8174  5.8%   1104  0.0931  0.00 query 4
#    5 0xB50E6DB8C88BA536C8A9E0...  82.2197  4.7%  65449  0.0013  0.00 query 5
#    6 0x9969FEF92CC8C8095BAFBC...  64.9858  3.7%     55  1.1816  0.01 query 6
#    7 0xE0ED29B8DF7CBD33497A34...  61.9152  3.5%      1 61.9152  0.00 query 7
#    8 0x7D8C1ED96E50D8BB465180...  59.1116  3.3%   7728  0.0076  0.01 query 8
# ....
#   19 0x31DB055F8054CFAD763CAC...   7.4954  0.4%   8281  0.0009  0.00 query 19
# MISC 0xMISC                       86.0445  4.9% 108443  0.0008   0.0 <289 ITEMS>


Below we can see the details for each query, grouped by query fingerprint:

# Query 1: 133.82 QPS, 0.17x concurrency, ID 0x76BD685795C8E349A7D674D1C851CDF7 at byte 0
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.00
# Time range: 2021-11-04 09:05:02 to 10:00:13
# Attribute    pct   total     min     max     avg     95%  stddev  median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         46  443072
# Exec time     31    564s   527us    26ms     1ms     2ms   330us     1ms
# Lock time     50     58s    98us    17ms   131us   159us    37us   125us
# Rows sent      8   1.70M       0      72    4.02   28.75    9.21       0
# Rows examine   9 150.84M      40     586  356.98  487.09  154.49  463.90
# Rows affecte   0       0       0       0       0       0       0       0
# Bytes sent     4  44.34M      70     716  104.93  313.99   80.68   69.19
# Query size    45 332.20M     785     787  786.20  755.64       0  755.64
# String:
# Databases    db
# Hosts        1.1.1.1
# Last errno   0
# Users        user1
# Query_time distribution
#   1us
#  10us
# 100us  #######################
#   1ms  ################################################################
#  10ms  #
# 100ms
#    1s
#  10s+
# Tables
#    ....
# EXPLAIN /*!50100 PARTITIONS*/
Query text
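
Once a heavy query class has been identified, a typical next step is to run the EXPLAIN statement that pt-query-digest prints under it and check whether indexes are used. The query below is purely hypothetical:

> mysql db -e "EXPLAIN SELECT id, status FROM orders WHERE customer_id = 42\G"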




Here are more examples of how you can run pt-query-digest:

# Top 50 queries ordered by Rows_examined
> pt-query-digest --limit=50 --group-by fingerprint --order-by Rows_examined:sum slow.log

# Filter queries for specific period of time
> pt-query-digest --since='2021-05-21 00:40:00' --until='2021-05-21 00:50:00' slow.log

# build report for all queries
> pt-query-digest --limit=100% slow.log

# build report only for queries with execution time higher than 5s
> pt-query-digest --filter '$event->{Query_time} > 5' slow.log

# build report only for queries executed by specific user
> pt-query-digest --filter '$event->{user} =~ /myuser/' slow.log

# the --filter option allows combining several conditions, for example:
> pt-query-digest --filter '$event->{user} =~ /myuser/ && $event->{Query_time} > 5' slow.log

# to get a list of all attributes of an event, run the following command:
> pt-query-digest --filter 'print Dumper $event' --no-report --sample 1 slow.log
$VAR1 = {
  Bytes_sent => '148',
  Filesort => 'No',
  Filesort_on_disk => 'No',
  Full_join => 'No',
  Full_scan => 'Yes',
  Lock_time => '0.000048',
  Merge_passes => '0',
  Priority_queue => 'No',
  QC_hit => 'No',
  Query_time => '0.000184',
  Rows_affected => '0',
  Rows_examined => '0',
  Rows_sent => '0',
  Thread_id => '85778229',
  Tmp_table => 'No',
  Tmp_table_on_disk => 'No',
  arg => 'query',
  bytes => 218,
  cmd => 'Query',
  db =>'db_name',
  fingerprint => 'fingerprint of the query',
  host => '8.8.8.8',
  ip => '8.8.8.8',
  pos_in_log => 26187012,
  timestamp => '1627462800',
  ts => '210728  9:00:00',
  user => 'user_name'
};
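
Any of these attributes can be used in --filter expressions. For example, assuming the slow log contains the extended attributes shown above, we can report only on full scans that also created an on-disk temporary table:

> pt-query-digest --filter '($event->{Full_scan} || "") eq "Yes" && ($event->{Tmp_table_on_disk} || "") eq "Yes"' slow.log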

Your Remote DBA Support

If you need more help, we offer our services as a remote Database Administrator (DBA). Your business depends on your MySQL database, but what if it goes down? Use our MySQL database support!