在通过mysqldump进行MySQL数据库(innodb引擎)逻辑备份的时候经常使用的参数组合就是 –single-transaction –master-data=2,今天的案例就是来分析下通过–single-transaction –master-data=2参数组合进行单表备份而引发的性能问题。
某业务系统(数据库采用的MySQL数据库)上午10点左右部分业务业务反应系统缓慢,用户登陆出现超时的现象。
登录数据库服务器查看top,iostat –x 1等查看操作系统负载。
top发现:CPU:load average 超过10且是mysqld进程占用,确定是MySQL等待。
通过show processlist查看:发现大量的Waiting for global read lock 。(信息已经脱敏处理)
这里第一感觉是正在进行mysqldump备份数据,然后show processlit查看数据库的进程信息,果然发现了backup 用户正在 执行 FLUSH TABLES WITH READ LOCK 。
通过操作系统 ps –ef|grep mysqldump 查看,发现mysqldump进程。
mysqldump进程是root发起的,仔细查看mysqldump 进程的pid是不同的,且dbname和tablename也是不同的,很显然这是一个 for循环在mysqldump备份单个表。
切换到root用户,查看定时任务通过crontab –l发现有一个每天上午 4点30分开始执行的mbak.sh脚本,查看脚本是mysqldump备份数据库,通过分析备份脚本发现这个脚本的逻辑是这样的:先整库备份一次使用的是–all-database参数,然后分别备份每个数据库为一个备份文件,最后是单表备份一次,即一个表备份成一个文件。
部分脚本节选如下:
所有的数据库备份一个文件的脚本
cd ${BACKDIR}/alldb_${TIME} $MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction --master-data=2 --all-databases --log-error=alldb_${TIME}.err >alldb.${TIME}.sql echo "`date +"%F %T"` BACKUP all DATABASE one file FINISH" |tee -a $logF
每个库一个备份文件的脚本
echo "DB HOST $IP IS BACKUP DATABASE per database one file " |tee -a $logF for dbname in `$MYSQL -B -N -e "show databases"|grep -vE "information_schema|performance_schema"` do mkdir -p ${BACKDIR}/${dbname}_${TIME} cd ${BACKDIR}/${dbname}_${TIME} $MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction --master-data=2 $dbname --log-error=${dbname}_${TIME}.err >${dbname}.${TIME}.sql echo "`date +"%F %T"` BACKUP $dbname one fie finish" |tee -a $logF done
每个表一个备份文件的脚本
### per table per database echo "DB host $IP is backing" |tee -a $logF for dbname in `$MYSQL -B -N -e "show databases"|grep -vE "information_schema|performance_schema"` do mkdir -p ${BACKDIR}/${dbname}_per_${TIME} cd ${BACKDIR}/${dbname}_per_${TIME} for table in `$MYSQL $dbname -e "show tables"|sed '1d'|grep -vE "yxxxxxxx"` do echo "`date +"%F %T"` begin backup ${dbname}.${table} TABLE" |tee -a $logF $MYSQLDUMP --default-character-set=utf8 -R --hex-blob --single-transaction --master-data=2 $dbname $table --log-error=${dbname}_${TIME}.err >$dbname_$table.sql echo "`date +"%F %T"` BACKUP ${dbname}.$table FINISH" |tee -a $logF done echo "`date +"%F %T"` BACKUP $dbname DATABASE FINISH" |tee -a $logF done
很显然出问题的时候是在备份单个表,通过mbak.sh脚本的逻辑来看,是先全库备份,全库完成再单库备份,单库备份完成之后再单表备份。现在卡在单表备份的 FLUSH TABLES WITH READ LOCK,这是一个全库级别的锁,单表备份为什么会锁整个库呢?仔细查看上面的mysqldump备份命令,可以发现每次mysqldump都添加了 –single-transaction –master-data=2,这是问题的关键,下面通过开启general log来分析问题。
开启general_log。
root@hostname 13:14: [test]> show variables like '%gen%'; +------------------+------------------------------------------------+ | Variable_name | Value | +------------------+------------------------------------------------+ | general_log | ON | | general_log_file | /usr/local/mysql/data/BertLiudeMacBook-Pro.log | +------------------+------------------------------------------------+
备份test 库下面的 t表,使用–single-transaction –master-data=2 参数
mysqldump -uroot -ppassword --default-character-set=utf8 --single-transaction --master-data=2 test t >test_t.sql
查看general log:
2017-08-28T15:14:15.960100Z 106 Connect root@localhost on using Socket 2017-08-28T15:14:15.960860Z 106 Query /*!40100 SET @@SQL_MODE='' */ 2017-08-28T15:14:15.962606Z 106 Query /*!40103 SET TIME_ZONE='+00:00' */ 2017-08-28T15:14:15.962793Z 106 Query FLUSH /*!40101 LOCAL */ TABLES 2017-08-28T15:14:15.962887Z 106 Query FLUSH TABLES WITH READ LOCK 2017-08-28T15:14:15.962943Z 106 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ 2017-08-28T15:14:15.962997Z 106 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */ 2017-08-28T15:14:15.963094Z 106 Query SHOW VARIABLES LIKE 'gtid\_mode' 2017-08-28T15:14:15.965326Z 106 Query SELECT @@GLOBAL.GTID_EXECUTED\ 2017-08-28T15:14:15.965458Z 106 Query SHOW MASTER STATUS 2017-08-28T15:14:15.965554Z 106 Query UNLOCK TABLES …… …... 2017-08-28T15:14:15.970349Z 106 Init DB test 2017-08-28T15:14:15.970438Z 106 Query SHOW TABLES LIKE 't' 2017-08-28T15:14:15.971500Z 106 Query SAVEPOINT sp 2017-08-28T15:14:15.971790Z 106 Query show table status like 't' 2017-08-28T15:14:15.973180Z 106 Query SET SQL_QUOTE_SHOW_CREATE=1 2017-08-28T15:14:15.973483Z 106 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:14:15.973668Z 106 Query show create table `t` 2017-08-28T15:14:15.973989Z 106 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:14:15.974788Z 106 Query show fields from `t` 2017-08-28T15:14:15.975367Z 106 Query show fields from `t` 2017-08-28T15:14:15.975627Z 106 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `t` 2017-08-28T15:14:15.975779Z 106 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:14:15.975881Z 106 Query use `test` 2017-08-28T15:14:15.975994Z 106 Query select @@collation_database 2017-08-28T15:14:15.976158Z 106 Query SHOW TRIGGERS LIKE 't' 2017-08-28T15:14:15.976556Z 106 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:14:15.976611Z 106 Query ROLLBACK TO SAVEPOINT sp 2017-08-28T15:14:15.976652Z 106 Query RELEASE SAVEPOINT sp 2017-08-28T15:14:15.994479Z 106 Quit
发现其执行了
FLUSH /*!40101 LOCAL */ TABLES FLUSH TABLES WITH READ LOCK
备份test 库下面的 t 表,不使用–single-transaction –master-data=2 参数
mysqldump -uroot –ppassword --default-character-set=utf8 test t >test_t.sql
查看general log:
2017-08-28T15:18:00.613324Z 109 Connect root@localhost on using Socket 2017-08-28T15:18:00.614229Z 109 Query /*!40100 SET @@SQL_MODE='' */ 2017-08-28T15:18:00.615737Z 109 Query /*!40103 SET TIME_ZONE='+00:00' */ 2017-08-28T15:18:00.615939Z 109 Query SHOW VARIABLES LIKE 'gtid\_mode' 2017-08-28T15:18:00.618627Z 109 Query SELECT @@GLOBAL.GTID_EXECUTED ………… 省略 2017-08-28T15:18:00.624351Z 109 Init DB test 2017-08-28T15:18:00.624422Z 109 Query SHOW TABLES LIKE 't' 2017-08-28T15:18:00.624843Z 109 Query LOCK TABLES `t` READ /*!32311 LOCAL */《--这里只锁定了 t 表。 2017-08-28T15:18:00.624967Z 109 Query show table status like 't' 2017-08-28T15:18:00.625452Z 109 Query SET SQL_QUOTE_SHOW_CREATE=1 2017-08-28T15:18:00.625514Z 109 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:18:00.625566Z 109 Query show create table `t` 2017-08-28T15:18:00.625624Z 109 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:18:00.626177Z 109 Query show fields from `t` 2017-08-28T15:18:00.626474Z 109 Query show fields from `t` 2017-08-28T15:18:00.626725Z 109 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `t` 2017-08-28T15:18:00.626896Z 109 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:18:00.626978Z 109 Query use `test` 2017-08-28T15:18:00.627034Z 109 Query select @@collation_database 2017-08-28T15:18:00.627106Z 109 Query SHOW TRIGGERS LIKE 't' 2017-08-28T15:18:00.627399Z 109 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:18:00.627451Z 109 Query UNLOCK TABLES 2017-08-28T15:18:00.645805Z 109 Quit
发现:没有执行 FLUSH /*!40101 LOCAL */ TABLES、FLUSH TABLES WITH READ LOCK 。而是执行的LOCK TABLES `t ` READ,对备份的 t 表进行一个lock。
备份全库使用–all-databases及–single-transaction –master-data=2 参数
mysqldump -uroot -ppassword --default-character-set=utf8 --single-transaction --master-data=2 --all-databases >all.singel-2.sql 查看general log:
2017-08-28T15:22:22.002783Z 110 Connect root@localhost on using Socket 2017-08-28T15:22:22.003739Z 110 Query /*!40100 SET @@SQL_MODE='' */ 2017-08-28T15:22:22.005306Z 110 Query /*!40103 SET TIME_ZONE='+00:00' */ 2017-08-28T15:22:22.005487Z 110 Query FLUSH /*!40101 LOCAL */ TABLES 2017-08-28T15:22:22.005606Z 110 Query FLUSH TABLES WITH READ LOCK 2017-08-28T15:22:22.005684Z 110 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ 2017-08-28T15:22:22.005721Z 110 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */ 2017-08-28T15:22:22.005803Z 110 Query SHOW VARIABLES LIKE 'gtid\_mode' 2017-08-28T15:22:22.008043Z 110 Query SELECT @@GLOBAL.GTID_EXECUTED 2017-08-28T15:22:22.008193Z 110 Query SHOW MASTER STATUS 2017-08-28T15:22:22.008256Z 110 Query UNLOCK TABLES ………… 省略 2017-08-28T15:22:22.022739Z 110 Init DB b 2017-08-28T15:22:22.022825Z 110 Query SHOW CREATE DATABASE IF NOT EXISTS `b` 2017-08-28T15:22:22.022890Z 110 Query SAVEPOINT sp 2017-08-28T15:22:22.022956Z 110 Query show tables 2017-08-28T15:22:22.023106Z 110 Query RELEASE SAVEPOINT sp 2017-08-28T15:22:22.023157Z 110 Init DB e 2017-08-28T15:22:22.023200Z 110 Query SHOW CREATE DATABASE IF NOT EXISTS `e` 2017-08-28T15:22:22.023277Z 110 Query SAVEPOINT sp 2017-08-28T15:22:22.023353Z 110 Query show tables
结论:不管是全库备份还是单表备份使用了–single-transaction –master-data=2 参数会执行FLUSH /*!40101 LOCAL */ TABLES、FLUSH TABLES WITH READ LOCK 来获取 show master status的一致性。
通过进一步测试发现mysqldump备份的时候只使用 –single-transaction 不使用–master-data=2参数是不会进行锁表的。只使用 –master-data=2参数为了获取show master status,会执行 FLUSH TABLES WITH READ LOCK 的全局锁。
只使用 –single-transaction 备份单表的general log
2017-08-28T15:24:59.149626Z 111 Connect root@localhost on using Socket 2017-08-28T15:24:59.150404Z 111 Query /*!40100 SET @@SQL_MODE='' */ 2017-08-28T15:24:59.151672Z 111 Query /*!40103 SET TIME_ZONE='+00:00' */ 2017-08-28T15:24:59.151944Z 111 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ 2017-08-28T15:24:59.152007Z 111 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */ 2017-08-28T15:24:59.152178Z 111 Query SHOW VARIABLES LIKE 'gtid\_mode' 2017-08-28T15:24:59.154385Z 111 Query SELECT @@GLOBAL.GTID_EXECUTED 2017-08-28T15:24:59.154538Z 111 Query UNLOCK TABLES ……省略 2017-08-28T15:24:59.160621Z 111 Init DB test 2017-08-28T15:24:59.160790Z 111 Query SHOW TABLES LIKE 't' 2017-08-28T15:24:59.161636Z 111 Query SAVEPOINT sp 2017-08-28T15:24:59.161832Z 111 Query show table status like 't' 2017-08-28T15:24:59.162886Z 111 Query SET SQL_QUOTE_SHOW_CREATE=1 2017-08-28T15:24:59.163046Z 111 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:24:59.163226Z 111 Query show create table `t` 2017-08-28T15:24:59.163401Z 111 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:24:59.163994Z 111 Query show fields from `t` 2017-08-28T15:24:59.164420Z 111 Query show fields from `t` 2017-08-28T15:24:59.164831Z 111 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `t` 2017-08-28T15:24:59.165083Z 111 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:24:59.165263Z 111 Query use `test` 2017-08-28T15:24:59.165378Z 111 Query select @@collation_database 2017-08-28T15:24:59.165540Z 111 Query SHOW TRIGGERS LIKE 't' 2017-08-28T15:24:59.167415Z 111 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:24:59.167585Z 111 Query ROLLBACK TO SAVEPOINT sp 2017-08-28T15:24:59.167707Z 111 Query RELEASE SAVEPOINT spt
结论:只使用–single-transaction 不执行任何lock table,也就是说可以进行热备份。
只使用 –master-data=2参数备份单表的general log
22017-08-28T15:27:20.891774Z 112 Connect root@localhost on using Socket 2017-08-28T15:27:20.891932Z 112 Query /*!40100 SET @@SQL_MODE='' */ 2017-08-28T15:27:20.892941Z 112 Query /*!40103 SET TIME_ZONE='+00:00' */ 2017-08-28T15:27:20.893122Z 112 Query FLUSH /*!40101 LOCAL */ TABLES 2017-08-28T15:27:20.893404Z 112 Query FLUSH TABLES WITH READ LOCK 2017-08-28T15:27:20.893508Z 112 Query SHOW VARIABLES LIKE 'gtid\_mode' 2017-08-28T15:27:20.897696Z 112 Query SELECT @@GLOBAL.GTID_EXECUTED 2017-08-28T15:27:20.897990Z 112 Query SHOW MASTER STATUS 2017-08-28T15:27:20.906499Z 112 Init DB test …………省略 2017-08-28T15:27:20.906595Z 112 Query SHOW TABLES LIKE 't' 2017-08-28T15:27:20.907204Z 112 Query show table status like 't' 2017-08-28T15:27:20.907865Z 112 Query SET SQL_QUOTE_SHOW_CREATE=1 2017-08-28T15:27:20.907973Z 112 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:27:20.908068Z 112 Query show create table `t` 2017-08-28T15:27:20.908182Z 112 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:27:20.908294Z 112 Query show fields from `t` 2017-08-28T15:27:20.908705Z 112 Query show fields from `t` 2017-08-28T15:27:20.909113Z 112 Query SELECT /*!40001 SQL_NO_CACHE */ * FROM `t` 2017-08-28T15:27:20.910751Z 112 Query SET SESSION character_set_results = 'binary' 2017-08-28T15:27:20.910859Z 112 Query use `test` 2017-08-28T15:27:20.910947Z 112 Query select @@collation_database 2017-08-28T15:27:20.911098Z 112 Query SHOW TRIGGERS LIKE 't' 2017-08-28T15:27:20.911609Z 112 Query SET SESSION character_set_results = 'utf8' 2017-08-28T15:27:20.928296Z 112 Quit
也就是说通过这几次实验发现只要添加了–master-data=2参数就会执行 FLUSH TABLES WITH READ LOCK全局锁,即使你是备份单个表也会锁整个库来获取show master status;从而导致整个库的一个等待。在业务低峰期,获取全局锁会很快,但是由于整个库有700多个表,有部分表在上午10点也就是业务高峰期还没备份完成,由于高峰期有大量的DML操作,从而获取全局锁更加困难,Waiting for global read lock 的时间会更长,从而影响了业务的正常运行。
问题解决:
找到mbak.sh 脚本的pid号,进行kill操作,数据库恢复正常。
改善:
调整备份策略:
1、取消备份每个单表为一个文件,减少全局锁(经过生产环境实际测试mysqldump全库(17G数据)备份一次不到5分钟);
2、如果有必要进行单表备份的话,禁用–master-data=2参数,只使用–single-transaction 参数;
3、采用xtraback 物理备份替换mysqldump的逻辑备份,来进行在线热备数据库(innodb引擎)。
单表备份:禁用–master-data=2参数
全库备份:–master-data=2 和 –single-transaction 组合,实现innodb的在线热备。
附:
官档上的 — single-transaction
https://dev.mysql.com/doc/refman/5.7/en/mysqlpump.html#option_mysqlpump_single-transaction
–single-transaction
This option sets the transaction isolation mode to REPEATABLE READ and sends a START TRANSACTION SQL statement to the server before dumping data. It is useful only with transactional tables such as InnoDB, because then it dumps the consistent state of the database at the time when START TRANSACTION was issued without blocking any applications.
When using this option, you should keep in mind that only InnoDB tables are dumped in a consistent state. For example, any MyISAM or MEMORY tables dumped while using this option may still change state.
While a –single-transaction dump is in process, to ensure a valid dump file (correct table contents and binary log coordinates), no other connection should use the following statements: ALTER TABLE, CREATE TABLE,DROP TABLE, RENAME TABLE, TRUNCATE TABLE. A consistent read is not isolated from those statements, so use of them on a table to be dumped can cause the SELECT that is performed by mysqlpump to retrieve the table contents to obtain incorrect contents or fail.
–add-locks and –single-transaction are mutually exclusive.
官档上的 –master-data
https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_master-data
–master-data[=value]
Use this option to dump a master replication server to produce a dump file that can be used to set up another server as a slave of the master. It causes the dump output to include a CHANGE MASTER TO statement that indicates the binary log coordinates (file name and position) of the dumped server. These are the master server coordinates from which the slave should start replicating after you load the dump file into the slave.
If the option value is 2, the CHANGE MASTER TO statement is written as an SQL comment, and thus is informative only; it has no effect when the dump file is reloaded. If the option value is 1, the statement is not written as a comment and takes effect when the dump file is reloaded. If no option value is specified, the default value is 1.
This option requires the RELOAD privilege and the binary log must be enabled.
The –master-data option automatically turns off –lock-tables. It also turns on –lock-all-tables, unless–single-transaction also is specified, in which case, a global read lock is acquired only for a short time at the beginning of the dump (see the description for –single-transaction). In all cases, any action on logs happens at the exact moment of the dump.
It is also possible to set up a slave by dumping an existing slave of the master, using the –dump-slave option, which overrides –master-data and causes it to be ignored if both options are used.
官档上的 FLUSH TABLES WITH READ LOCK
https://dev.mysql.com/doc/refman/5.7/en/flush.html#flush-tables-with-read-lock
Closes all open tables and locks all tables for all databases with a global read lock. This is a very convenient way to get backups if you have a file system such as Veritas or ZFS that can take snapshots in time. Use UNLOCK TABLESto release the lock.
FLUSH TABLES WITH READ LOCK acquires a global read lock rather than table locks, so it is not subject to the same behavior as LOCK TABLES and UNLOCK TABLES with respect to table locking and implicit commits:
- UNLOCK TABLESimplicitly commits any active transaction only if any tables currently have been locked with LOCK TABLES. The commit does not occur for UNLOCK TABLES following FLUSH TABLES WITH READ LOCKbecause the latter statement does not acquire table locks.
- Beginning a transaction causes table locks acquired with LOCK TABLESto be released, as though you had executed UNLOCK TABLES. Beginning a transaction does not release a global read lock acquired with FLUSH TABLES WITH READ LOCK.
FLUSH TABLES WITH READ LOCK is not compatible with XA transactions.
FLUSH TABLES WITH READ LOCK does not prevent the server from inserting rows into the log tables (see Section 5.4.1, “Selecting General Query and Slow Query Log Output Destinations”).