在通过mysqldump进行MySQL数据库(innodb引擎)逻辑备份的时候经常使用的参数组合就是 --single-transaction --master-data=2,今天的案例就是来分析下通过--single-transaction --master-data=2参数组合进行单表备份而引发的性能问题。

  • 问题描述:
    某业务系统(数据库采用的MySQL数据库)上午10点左右部分业务业务反应系统缓慢,用户登陆出现超时的现象。
  • 问题分析与处理
    登录数据库服务器查看topiostat –x 1等查看操作系统负载。
    top发现:CPU:load average 超过10且是mysqld进程占用,确定是MySQL等待。
    通过 show processlist查看:发现大量的Waiting for global read lock 。(信息已经脱敏处理)

lock 这里第一感觉是正在进行mysqldump备份数据,然后show processlit查看数据库的进程信息,果然发现了backup 用户正在 执行 FLUSH TABLES WITH READ LOCK

back 通过操作系统 ps –efgrep mysqldump 查看,发现mysqldump进程。

mysqldump

mysqldump进程是root发起的,仔细查看mysqldump 进程的pid是不同的,且dbnametablename也是不同的,很显然这是一个 for循环在mysqldump备份单个表。 切换到root用户,查看定时任务通过crontab –l发现有一个每天上午 4点30分开始执行的mbak.sh脚本,查看脚本是mysqldump备份数据库,通过分析备份脚本发现这个脚本的逻辑是这样的:先整库备份一次使用的是--all-database参数,然后分别备份每个数据库为一个备份文件,最后是单表备份一次,即一个表备份成一个文件。

部分脚本节选如下: 所有的数据库备份一个文件的脚本

1
2
3
cd ${BACKDIR}/alldb_${TIME}
$MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction --master-data=2  --all-databases --log-error=alldb_${TIME}.err >alldb.${TIME}.sql
echo "`date +"%F %T"` BACKUP all  DATABASE  one file FINISH" tee -a $logF

每个库一个备份文件的脚本

1
2
3
4
5
6
7
8
echo "DB HOST  $IP IS BACKUP DATABASE per database one file " tee -a $logF
for dbname in `$MYSQL -B -N -e "show databases"grep -vE "information_schemaperformance_schema"`
do
mkdir -p ${BACKDIR}/${dbname}_${TIME}
cd ${BACKDIR}/${dbname}_${TIME}
$MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction --master-data=2  $dbname --log-error=${dbname}_${TIME}.err >${dbname}.${TIME}.sql
echo "`date +"%F %T"` BACKUP $dbname  one fie finish" tee -a $logF
done

每个表一个备份文件的脚本

1
2
3
4
5
6
7
8
9
10
11
12
13
14
### per table per database
echo "DB host $IP is backing" tee -a $logF
for dbname in `$MYSQL -B -N -e "show databases"grep -vE "information_schemaperformance_schema"`
do
mkdir -p ${BACKDIR}/${dbname}_per_${TIME}
cd ${BACKDIR}/${dbname}_per_${TIME}
for table in `$MYSQL $dbname -e "show tables"sed '1d'grep -vE "yxxxxxxx"`
do
echo "`date +"%F %T"` begin backup  ${dbname}.${table}  TABLE" tee -a $logF
$MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction  --master-data=2  $dbname $table --log-error=${dbname}_${TIME}.err >$dbname_$table.sql
echo "`date +"%F %T"` BACKUP ${dbname}.$table FINISH" tee -a $logF
done
echo "`date +"%F %T"` BACKUP $dbname DATABASE FINISH" tee -a $logF
done

很显然出问题的时候是在备份单个表,通过mbak.sh脚本的逻辑来看,是先全库备份,全库完成再单库备份,单库备份完成之后再单表备份。现在卡在单表备份的  FLUSH TABLES WITH READ LOCK,这是一个全库级别的锁,单表备份为什么会锁整个库呢?仔细查看上面的mysqldump备份命令,可以发现每次mysqldump都添加了 --single-transaction --master-data=2,这是问题的关键,下面通过开启general log来分析问题。
开启 general_log

1
2
3
4
5
6
7
root@hostname 13:14:  [test]> show variables like '%gen%';
+------------------+------------------------------------------------+
Variable_name    Value                                         
+------------------+------------------------------------------------+
general_log      ON                                            
general_log_file /usr/local/mysql/data/BertLiudeMacBook-Pro.log
+------------------+------------------------------------------------+

备份test 库下面的 t表,使用--single-transaction --master-data=2 参数

1
mysqldump  -uroot -ppassword  --default-character-set=utf8 --single-transaction --master-data=2   test t    >test_t.sql

查看 general log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
2017-08-28T15:14:15.960100Z     106 Connect    root@localhost on  using Socket
2017-08-28T15:14:15.960860Z     106 Query        /*!40100 SET @@SQL_MODE='' */
2017-08-28T15:14:15.962606Z     106 Query        /*!40103 SET TIME_ZONE='+00:00' */
2017-08-28T15:14:15.962793Z     106 Query        FLUSH /*!40101 LOCAL */ TABLES
2017-08-28T15:14:15.962887Z     106 Query        FLUSH TABLES WITH READ LOCK
2017-08-28T15:14:15.962943Z     106 Query        SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-08-28T15:14:15.962997Z     106 Query        START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2017-08-28T15:14:15.963094Z     106 Query        SHOW VARIABLES LIKE 'gtid_mode'
2017-08-28T15:14:15.965326Z     106 Query        SELECT @@GLOBAL.GTID_EXECUTED
2017-08-28T15:14:15.965458Z     106 Query        SHOW MASTER STATUS
2017-08-28T15:14:15.965554Z     106 Query        UNLOCK TABLES
…… …...
2017-08-28T15:14:15.970349Z     106 Init DB       test
2017-08-28T15:14:15.970438Z     106 Query        SHOW TABLES LIKE 't'
2017-08-28T15:14:15.971500Z     106 Query        SAVEPOINT sp
2017-08-28T15:14:15.971790Z     106 Query        show table status like 't'
2017-08-28T15:14:15.973180Z     106 Query        SET SQL_QUOTE_SHOW_CREATE=1
2017-08-28T15:14:15.973483Z     106 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:14:15.973668Z     106 Query        show create table `t`
2017-08-28T15:14:15.973989Z     106 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:14:15.974788Z     106 Query        show fields from `t`
2017-08-28T15:14:15.975367Z     106 Query        show fields from `t`
2017-08-28T15:14:15.975627Z     106 Query        SELECT /*!40001 SQL_NO_CACHE */ * FROM `t`
2017-08-28T15:14:15.975779Z     106 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:14:15.975881Z     106 Query        use `test`
2017-08-28T15:14:15.975994Z     106 Query        select @@collation_database
2017-08-28T15:14:15.976158Z     106 Query        SHOW TRIGGERS LIKE 't'
2017-08-28T15:14:15.976556Z     106 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:14:15.976611Z     106 Query        ROLLBACK TO SAVEPOINT sp
2017-08-28T15:14:15.976652Z     106 Query        RELEASE SAVEPOINT sp
2017-08-28T15:14:15.994479Z     106 Quit

发现其执行了

1
2
FLUSH /*!40101 LOCAL */ TABLES
FLUSH TABLES WITH READ LOCK

备份test 库下面的 t 表,不使用--single-transaction --master-data=2 参数

1
mysqldump -uroot –ppassword --default-character-set=utf8 test t  >test_t.sql

查看general log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
2017-08-28T15:18:00.613324Z     109 Connect    root@localhost on  using Socket
2017-08-28T15:18:00.614229Z     109 Query        /*!40100 SET @@SQL_MODE='' */
2017-08-28T15:18:00.615737Z     109 Query        /*!40103 SET TIME_ZONE='+00:00' */
2017-08-28T15:18:00.615939Z     109 Query        SHOW VARIABLES LIKE 'gtid_mode'
2017-08-28T15:18:00.618627Z     109 Query        SELECT @@GLOBAL.GTID_EXECUTED
………… 省略
2017-08-28T15:18:00.624351Z     109 Init DB       test
2017-08-28T15:18:00.624422Z     109 Query        SHOW TABLES LIKE 't'
2017-08-28T15:18:00.624843Z     109 Query        LOCK TABLES `t` READ /*!32311 LOCAL */《--这里只锁定了 t 表。
2017-08-28T15:18:00.624967Z     109 Query        show table status like 't'
2017-08-28T15:18:00.625452Z     109 Query        SET SQL_QUOTE_SHOW_CREATE=1
2017-08-28T15:18:00.625514Z     109 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:18:00.625566Z     109 Query        show create table `t`
2017-08-28T15:18:00.625624Z     109 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:18:00.626177Z     109 Query        show fields from `t`
2017-08-28T15:18:00.626474Z     109 Query        show fields from `t`
2017-08-28T15:18:00.626725Z     109 Query        SELECT /*!40001 SQL_NO_CACHE */ * FROM `t`
2017-08-28T15:18:00.626896Z     109 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:18:00.626978Z     109 Query        use `test`
2017-08-28T15:18:00.627034Z     109 Query        select @@collation_database
2017-08-28T15:18:00.627106Z     109 Query        SHOW TRIGGERS LIKE 't'
2017-08-28T15:18:00.627399Z     109 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:18:00.627451Z     109 Query        UNLOCK TABLES
2017-08-28T15:18:00.645805Z     109 Quit

发现:没有执行 FLUSH /*!40101 LOCAL */ TABLES、FLUSH TABLES WITH READ LOCK 。而是执行的LOCK TABLES t READ,对备份的 t 表进行一个lock。 备份全库使用--all-databases及--single-transaction --master-data=2 参数

1
mysqldump -uroot -ppassword --default-character-set=utf8  --single-transaction --master-data=2 --all-databases   >all.singel-2.sql

查看general log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
2017-08-28T15:22:22.002783Z     110 Connect    root@localhost on  using Socket
2017-08-28T15:22:22.003739Z     110 Query        /*!40100 SET @@SQL_MODE='' */
2017-08-28T15:22:22.005306Z     110 Query        /*!40103 SET TIME_ZONE='+00:00' */
2017-08-28T15:22:22.005487Z     110 Query        FLUSH /*!40101 LOCAL */ TABLES
2017-08-28T15:22:22.005606Z     110 Query        FLUSH TABLES WITH READ LOCK
2017-08-28T15:22:22.005684Z     110 Query        SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-08-28T15:22:22.005721Z     110 Query        START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2017-08-28T15:22:22.005803Z     110 Query        SHOW VARIABLES LIKE 'gtid_mode'
2017-08-28T15:22:22.008043Z     110 Query        SELECT @@GLOBAL.GTID_EXECUTED
2017-08-28T15:22:22.008193Z     110 Query        SHOW MASTER STATUS
2017-08-28T15:22:22.008256Z     110 Query        UNLOCK TABLES
………… 省略
2017-08-28T15:22:22.022739Z     110 Init DB       b
2017-08-28T15:22:22.022825Z     110 Query        SHOW CREATE DATABASE IF NOT EXISTS `b`
2017-08-28T15:22:22.022890Z     110 Query        SAVEPOINT sp
2017-08-28T15:22:22.022956Z     110 Query        show tables
2017-08-28T15:22:22.023106Z     110 Query        RELEASE SAVEPOINT sp
2017-08-28T15:22:22.023157Z     110 Init DB       e
2017-08-28T15:22:22.023200Z     110 Query        SHOW CREATE DATABASE IF NOT EXISTS `e`
2017-08-28T15:22:22.023277Z     110 Query        SAVEPOINT sp
2017-08-28T15:22:22.023353Z     110 Query        show tables

结论:不管是全库备份还是单表备份使用了--single-transaction --master-data=2 参数会执行FLUSH /*!40101 LOCAL */ TABLES、FLUSH TABLES WITH READ LOCK 来获取 show master status的一致性。 通过进一步测试发现mysqldump备份的时候只使用 --single-transaction 不使用--master-data=2参数是不会进行锁表的。只使用 --master-data=2参数为了获取show master status,会执行 FLUSH TABLES WITH READ LOCK  的全局锁。 只使用 --single-transaction 备份单表的general log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
2017-08-28T15:24:59.149626Z     111 Connect    root@localhost on  using Socket
2017-08-28T15:24:59.150404Z     111 Query        /*!40100 SET @@SQL_MODE='' */
2017-08-28T15:24:59.151672Z     111 Query        /*!40103 SET TIME_ZONE='+00:00' */
2017-08-28T15:24:59.151944Z     111 Query        SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-08-28T15:24:59.152007Z     111 Query        START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2017-08-28T15:24:59.152178Z     111 Query        SHOW VARIABLES LIKE 'gtid_mode'
2017-08-28T15:24:59.154385Z     111 Query        SELECT @@GLOBAL.GTID_EXECUTED
2017-08-28T15:24:59.154538Z     111 Query        UNLOCK TABLES
……省略
2017-08-28T15:24:59.160621Z     111 Init DB       test
2017-08-28T15:24:59.160790Z     111 Query        SHOW TABLES LIKE 't'
2017-08-28T15:24:59.161636Z     111 Query        SAVEPOINT sp
2017-08-28T15:24:59.161832Z     111 Query        show table status like 't'
2017-08-28T15:24:59.162886Z     111 Query        SET SQL_QUOTE_SHOW_CREATE=1
2017-08-28T15:24:59.163046Z     111 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:24:59.163226Z     111 Query        show create table `t`
2017-08-28T15:24:59.163401Z     111 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:24:59.163994Z     111 Query        show fields from `t`
2017-08-28T15:24:59.164420Z     111 Query        show fields from `t`
2017-08-28T15:24:59.164831Z     111 Query        SELECT /*!40001 SQL_NO_CACHE */ * FROM `t`
2017-08-28T15:24:59.165083Z     111 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:24:59.165263Z     111 Query        use `test`
2017-08-28T15:24:59.165378Z     111 Query        select @@collation_database
2017-08-28T15:24:59.165540Z     111 Query        SHOW TRIGGERS LIKE 't'
2017-08-28T15:24:59.167415Z     111 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:24:59.167585Z     111 Query        ROLLBACK TO SAVEPOINT sp
2017-08-28T15:24:59.167707Z     111 Query        RELEASE SAVEPOINT spt

结论:只使用--single-transaction 不执行任何lock table,也就是说可以进行热备份。

只使用 --master-data=2参数备份单表的general log

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
22017-08-28T15:27:20.891774Z   112 Connect    root@localhost on  using Socket
2017-08-28T15:27:20.891932Z     112 Query        /*!40100 SET @@SQL_MODE='' */
2017-08-28T15:27:20.892941Z     112 Query        /*!40103 SET TIME_ZONE='+00:00' */
2017-08-28T15:27:20.893122Z     112 Query        FLUSH /*!40101 LOCAL */ TABLES
2017-08-28T15:27:20.893404Z     112 Query        FLUSH TABLES WITH READ LOCK
2017-08-28T15:27:20.893508Z     112 Query        SHOW VARIABLES LIKE 'gtid_mode'
2017-08-28T15:27:20.897696Z     112 Query        SELECT @@GLOBAL.GTID_EXECUTED
2017-08-28T15:27:20.897990Z     112 Query        SHOW MASTER STATUS
2017-08-28T15:27:20.906499Z     112 Init DB       test
…………省略
2017-08-28T15:27:20.906595Z     112 Query        SHOW TABLES LIKE 't'
2017-08-28T15:27:20.907204Z     112 Query        show table status like 't'
2017-08-28T15:27:20.907865Z     112 Query        SET SQL_QUOTE_SHOW_CREATE=1
2017-08-28T15:27:20.907973Z     112 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:27:20.908068Z     112 Query        show create table `t`
2017-08-28T15:27:20.908182Z     112 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:27:20.908294Z     112 Query        show fields from `t`
2017-08-28T15:27:20.908705Z     112 Query        show fields from `t`
2017-08-28T15:27:20.909113Z     112 Query        SELECT /*!40001 SQL_NO_CACHE */ * FROM `t`
2017-08-28T15:27:20.910751Z     112 Query        SET SESSION character_set_results = 'binary'
2017-08-28T15:27:20.910859Z     112 Query        use `test`
2017-08-28T15:27:20.910947Z     112 Query        select @@collation_database
2017-08-28T15:27:20.911098Z     112 Query        SHOW TRIGGERS LIKE 't'
2017-08-28T15:27:20.911609Z     112 Query        SET SESSION character_set_results = 'utf8'
2017-08-28T15:27:20.928296Z     112 Quit
  • 结论:
    也就是说通过这几次实验发现只要添加了--master-data=2参数就会执行 FLUSH TABLES WITH READ LOCK全局锁,即使你是备份单个表也会锁整个库来获取show master status;从而导致整个库的一个等待。在业务低峰期,获取全局锁会很快,但是由于整个库有700多个表,有部分表在上午10点也就是业务高峰期还没备份完成,由于高峰期有大量的DML操作,从而获取全局锁更加困难,Waiting for global read lock 的时间会更长,从而影响了业务的正常运行。
  • 问题解决与改善
    问题解决: 找到mbak.sh 脚本的pid号,进行kill操作,数据库恢复正常。 改善: 调整备份策略:
    1、取消备份每个单表为一个文件,减少全局锁(经过生产环境实际测试mysqldump全库(17G数据)备份一次不到5分钟);
    2、_如果有必要进行单表备份的话,禁用--master-data=2参数,只使用--single-transaction参数;
    3、采用 xtraback 物理备份替换mysqldump的逻辑备份,来进行在线热备数据库(innodb引擎)。
  • 总结:
    单表备份:禁用--master-data=2参数 全库备份:--master-data=2--single-transaction 组合,实现innodb的在线热备。

附: 官档上的 --single-transaction

https://dev.mysql.com/doc/refman/5.7/en/mysqlpump.html#option_mysqlpump_single-transaction –single-transaction This option sets the transaction isolation mode to REPEATABLE READ and sends a START TRANSACTION SQL statement to the server before dumping data. It is useful only with transactional tables such as InnoDB, because then it dumps the consistent state of the database at the time when START TRANSACTION was issued without blocking any applications. When using this option, you should keep in mind that only InnoDB tables are dumped in a consistent state. For example, any MyISAM or MEMORY tables dumped while using this option may still change state. While a –single-transaction dump is in process, to ensure a valid dump file (correct table contents and binary log coordinates), no other connection should use the following statements: ALTER TABLECREATE TABLE,DROP TABLERENAME TABLETRUNCATE TABLE. A consistent read is not isolated from those statements, so use of them on a table to be dumped can cause the SELECT that is performed by mysqlpump to retrieve the table contents to obtain incorrect contents or fail. –add-locks and –single-transaction are mutually exclusive. 官档上的 –master-data https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_master-data –master-data[=value] Use this option to dump a master replication server to produce a dump file that can be used to set up another server as a slave of the master. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped server. These are the master server coordinates from which the slave should start replicating after you load the dump file into the slave. If the option value is 2, the CHANGE MASTER TO statement is written as an SQL comment, and thus is informative only; it has no effect when the dump file is reloaded. If the option value is 1, the statement is not written as a comment and takes effect when the dump file is reloaded. If no option value is specified, the default value is 1. This option requires the RELOAD privilege and the binary log must be enabled. The –master-data option automatically turns off –lock-tables. It also turns on –lock-all-tables, unless–single-transaction also is specified, in which case, a global read lock is acquired only for a short time at the beginning of the dump (see the description for –single-transaction). In all cases, any action on logs happens at the exact moment of the dump. It is also possible to set up a slave by dumping an existing slave of the master, using the –dump-slave option, which overrides –master-data and causes it to be ignored if both options are used. 官档上的 FLUSH TABLES WITH READ LOCK https://dev.mysql.com/doc/refman/5.7/en/flush.html#flush-tables-with-read-lock

Closes all open tables and locks all tables for all databases with a global read lock. This is a very convenient way to get backups if you have a file system such as Veritas or ZFS that can take snapshots in time. Use UNLOCK TABLESto release the lock. FLUSH TABLES WITH READ LOCK acquires a global read lock rather than table locks, so it is not subject to the same behavior as LOCK TABLES and UNLOCK TABLES with respect to table locking and implicit commits:

FLUSH TABLES WITH READ LOCK is not compatible with XA transactions. FLUSH TABLES WITH READ LOCK does not prevent the server from inserting rows into the log tables (see Section 5.4.1, “Selecting General Query and Slow Query Log Output Destinations”).

原文作者: liups.com

原文链接: http://liups.com/posts/528b78ee/

许可协议: 知识共享署名-非商业性使用 4.0 国际许可协议