Bug #42338 Backup of a database takes long time when there is high load on another database
Submitted: 26 Jan 2009 9:02    Modified: 11 Aug 2009 11:55
Reporter: Bernt Marius Johnsen
Status: Duplicate
Category: MySQL Server: Backup    Severity: S2 (Serious)
Version: mysql-6.0-backup    OS: Any
Assigned to: Assigned Account    CPU Architecture: Any

[26 Jan 2009 9:02] Bernt Marius Johnsen
Description:
Backup of a small database (26MB) takes approx. 2 seconds when there is no load on the server, but when there is high load on another database, it takes 4-6 minutes.

How to repeat:
Run the attached jar with the attached property file like this:

java -classpath jl.jar:mysql-connector-java-5.1.7-bin.jar com.sun.hadb.misc.jl.JL test-mysql.prop

When the test has been initialized and the test run has started, you will see a line like this:
2009-01-26 09:47:10.867** Starting test

Abort the test (Ctrl-C), make a copy of the longtest database and start the java program again. When the test has run for some minutes (10-15), try to take a backup of the copy.
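
At the SQL level, the copy-and-backup step might look roughly like this (a sketch only; the table name 'jltable' and the image file name are assumptions, not taken from the attached property file):

-- Copy the longtest database (repeat for each of its tables; the table
-- name 'jltable' is hypothetical):
CREATE DATABASE longtestcopy;
CREATE TABLE longtestcopy.jltable LIKE longtest.jltable;
INSERT INTO longtestcopy.jltable SELECT * FROM longtest.jltable;

-- With the java client running against longtest again, time a backup of
-- the copy (BACKUP DATABASE ... TO as used in the mysql-6.0-backup tree):
BACKUP DATABASE longtestcopy TO 'longtestcopy.bak';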
[26 Jan 2009 9:03] Bernt Marius Johnsen
Jar file which runs test

Attachment: jl.jar (application/java-archive, text), 31.95 KiB.

[26 Jan 2009 9:04] Bernt Marius Johnsen
Property file for jl.jar

Attachment: test-mysql.prop (application/octet-stream, text), 310 bytes.

[26 Jan 2009 9:10] Bernt Marius Johnsen
See also bug report #42339
[26 Jan 2009 12:29] Sveta Smirnova
Thank you for the report.

Verified as described. Problem only repeatable if updates happen on InnoDB tables. Probably duplicate of bug #34414
[24 Mar 2009 12:23] Rafal Somla
I could not repeat the same problem with the current tree. I ran measurements on one of the Trondheim lab machines. It is a 2x1.8GHz Athlon machine with 2GB RAM. uname -a on that machine prints:

Linux 2.6.18-92.1.10.el5 #1 SMP Wed Jul 23 03:55:54 EDT 2008 i686 athlon i386 GNU/Linux

Measurement setup was as follows:

1. I created a database with a single InnoDB table containing around 3.3M rows.
2. I ran random transactions against a different database, using one of the setup files provided with the Random Query Generator (transactions.{zz,yy}). I varied the number of parallel connections from 0 to 100.
3. While RQG was performing transactions, I ran a test script which performed BACKUP of the first database 15 times and reported the average time. The size of the backup image was around 39MB. (A rough sketch of steps 1 and 3 is shown below.)
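
For concreteness, steps 1 and 3 could be carried out roughly as follows (a sketch under assumptions: the database name, the table definition and the row-doubling trick are illustrative and are not taken from the actual setup or the attached test.test):

-- Step 1: a database with one InnoDB table of roughly 3.3M rows.
CREATE DATABASE bup_db;
CREATE TABLE bup_db.t1 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(100)
) ENGINE=InnoDB;
INSERT INTO bup_db.t1 (payload) VALUES (REPEAT('x', 100));
-- Double the row count by re-inserting the table into itself;
-- running this statement 21-22 times gives 2-4 million rows:
INSERT INTO bup_db.t1 (payload) SELECT payload FROM bup_db.t1;

-- Step 3: while RQG runs transactions against another database, repeat
-- and time (the image name is illustrative):
BACKUP DATABASE bup_db TO 'bup_db.bak';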

I attach results in measurements.xls file.

In the results one can see that the time does not vary with parallel load in the way reported earlier. Up to 50 parallel connections, performance is almost the same as with no load, i.e. it takes around 50 sec to back up the database. Starting with 80 connections and more, the time is doubled. This does not necessarily have to be related to backup; it can be a general characteristic of the server. I'll try to perform more measurements to confirm this hypothesis.
[24 Mar 2009 12:26] Rafal Somla
Results of measurements.

Attachment: measurements.xls (application/octet-stream, text), 9.50 KiB.

[24 Mar 2009 12:30] Rafal Somla
test script used to measure BACKUP times.

Attachment: test.test (application/octet-stream, text), 891 bytes.

[24 Mar 2009 12:33] Rafal Somla
Results of measurements in txt format:

# of connections    avg. BACKUP time (min:sec)
  0                 00:44
  1                 00:47
  2                 00:48
  5                 00:49
 10                 00:59
 20                 00:50
 50                 01:01
 60                 01:05
 80                 01:55
100                 02:06
[2 Apr 2009 12:31] Bernt Marius Johnsen
But does it perform well with the same load that I used (the same client)?
[11 Aug 2009 11:55] Jørgen Løland
I experimented with the attached jar file and found that BACKUP does indeed take a long time to execute when there is concurrent load on another database. I sampled what the BACKUP was doing a number of times, and found this stack:

#0  0xb8010430 in __kernel_vsyscall ()
#1  0xb7fe30e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08652e4e in safe_cond_wait (cond=0x8971fa0, mp=0x8971900, 
    file=0x875b097 "lock.cc", line=1161) at thr_mutex.c:423
#3  0x082955a9 in lock_global_read_lock (thd=0x9709670) at lock.cc:1161
#4  0x08704149 in backup::block_commits (thd=0x9709670, tables=0x0)
    at data_backup.cc:395
#5  0x0870480d in backup::write_table_data (thd=0x9709670, info=@0xa2c32f0, 
    s=@0x9d1b700) at data_backup.cc:700
#6  0x086f9f1b in Backup_restore_ctx::do_backup (this=0xb73e9b64)
    at kernel.cc:1315
#7  0x086fcca8 in execute_backup_command (thd=0x9709670, lex=0x970a5bc, 
    backupdir=0xb73ea140, overwrite=false, skip_gap_event=false)
    at kernel.cc:218
#8  0x082b335e in mysql_execute_command (thd=0x9709670) at sql_parse.cc:2432
#9  0x082bbe3b in mysql_parse (thd=0x9709670, 
    inBuf=0x9749110 "backup database longtestcopy to '6.bak'", length=39, 
    found_semicolon=0xb73eabd4) at sql_parse.cc:5942

As expected, this is the same stack as shown in BUG#44568, and shows that the problem lies in the commit blocker mechanism BACKUP currently uses (GlobalReadLock). 
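
The blocking can be reproduced at the SQL level with FLUSH TABLES WITH READ LOCK, which requests the same global read lock that backup::block_commits() takes in the stack above (a sketch; the unrelated database, the table and the SLEEP() trick are illustrative):

-- connection 1: a long-running write statement on an unrelated database
-- (assumes otherdb.t1 exists and has at least one row):
UPDATE otherdb.t1 SET val = val + 1 WHERE SLEEP(60) = 0;

-- connection 2: requests the global read lock, just as
-- backup::block_commits() does, and therefore waits in
-- lock_global_read_lock() until the UPDATE above finishes:
FLUSH TABLES WITH READ LOCK;
-- ... BACKUP's table-data phase would run under this lock ...
UNLOCK TABLES;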

Marking this bug as a duplicate of BUG#44568 since I've started adding info to that bug.