Bug #42338 Backup of a database takes long time when there is high load on another database
Submitted: 26 Jan 2009 9:02    Modified: 11 Aug 2009 11:55
Reporter: Bernt Marius Johnsen
Status: Duplicate
Category: MySQL Server: Backup    Severity: S2 (Serious)
Version: mysql-6.0-backup    OS: Any
Assigned to: Assigned Account    CPU Architecture: Any

[26 Jan 2009 9:02] Bernt Marius Johnsen
Description:
Backup of a small database (26MB) takes approx. 2 seconds when there is no load on the server, but when there is high load on another database, it takes 4-6 minutes.

How to repeat:
Run the attached jar with the attached property file like this:

java -classpath jl.jar:mysql-connector-java-5.1.7-bin.jar com.sun.hadb.misc.jl.JL test-mysql.prop

When the test has been initialized and the test run has started, you will see a line like this:
2009-01-26 09:47:10.867** Starting test

Abort the test (Ctrl-C), make a copy of the longtest database and start the java program again. When the test has run for some minutes (10-15), try to take a backup of the copy.
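
At the SQL level, the copy-and-backup step might look roughly like this (a sketch only; the table name 'jltable' and the image file name are assumptions, not taken from the attached property file):

-- Copy the longtest database (repeat for each of its tables; the table
-- name 'jltable' is hypothetical):
CREATE DATABASE longtestcopy;
CREATE TABLE longtestcopy.jltable LIKE longtest.jltable;
INSERT INTO longtestcopy.jltable SELECT * FROM longtest.jltable;

-- With the java client running against longtest again, time a backup of
-- the copy (BACKUP DATABASE ... TO as used in the mysql-6.0-backup tree):
BACKUP DATABASE longtestcopy TO 'longtestcopy.bak';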
[26 Jan 2009 9:03] Bernt Marius Johnsen
Jar file which runs test

Attachment: jl.jar (application/java-archive, text), 31.95 KiB.

[26 Jan 2009 9:04] Bernt Marius Johnsen
Property file for jl.jar

Attachment: test-mysql.prop (application/octet-stream, text), 310 bytes.

[26 Jan 2009 9:10] Bernt Marius Johnsen
See also bug report #42339
[26 Jan 2009 12:29] Sveta Smirnova
Thank you for the report.

Verified as described. Problem only repeatable if updates happen on InnoDB tables. Probably duplicate of bug #34414
[24 Mar 2009 12:23] Rafal Somla
I could not repeat the same problem with the current tree. I ran measurements on one of the Trondheim lab machines. It is a 2x1.8GHz Athlon machine with 2GB RAM. uname -a on that machine prints:

Linux 2.6.18-92.1.10.el5 #1 SMP Wed Jul 23 03:55:54 EDT 2008 i686 athlon i386 GNU/Linux

Measurement setup was as follows:

1. I created a database with a single InnoDB table containing around 3.3M rows.
2. I ran random transactions against a different database, using one of the setup files provided with the Random Query Generator (transactions.{zz,yy}). I varied the number of parallel connections from 0 to 100.
3. While RQG was performing transactions, I ran a test script which performed BACKUP of the first database 15 times and reported the average time. The size of the backup image was around 39MB. (A rough sketch of steps 1 and 3 is shown below.)
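
For concreteness, steps 1 and 3 could be carried out roughly as follows (a sketch under assumptions: the database name, the table definition and the row-doubling trick are illustrative and are not taken from the actual setup or the attached test.test):

-- Step 1: a database with one InnoDB table of roughly 3.3M rows.
CREATE DATABASE bup_db;
CREATE TABLE bup_db.t1 (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload VARCHAR(100)
) ENGINE=InnoDB;
INSERT INTO bup_db.t1 (payload) VALUES (REPEAT('x', 100));
-- Double the row count by re-inserting the table into itself;
-- running this statement 21-22 times gives 2-4 million rows:
INSERT INTO bup_db.t1 (payload) SELECT payload FROM bup_db.t1;

-- Step 3: while RQG runs transactions against another database, repeat
-- and time (the image name is illustrative):
BACKUP DATABASE bup_db TO 'bup_db.bak';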

I attach results in measurements.xls file.

In the results one can see that the time does not vary with parallel load in the way reported earlier. Up to 50 parallel connections, performance is almost the same as with no load, i.e. it takes around 50 sec to back up the database. Starting with 80 connections and more, the time is doubled. This does not necessarily have to be related to backup; it can be a general characteristic of the server. I'll try to perform more measurements to confirm this hypothesis.
[24 Mar 2009 12:26] Rafal Somla
Results of measurements.

Attachment: measurements.xls (application/octet-stream, text), 9.50 KiB.

[24 Mar 2009 12:30] Rafal Somla
test script used to measure BACKUP times.

Attachment: test.test (application/octet-stream, text), 891 bytes.

[24 Mar 2009 12:33] Rafal Somla
Results of measurements in txt format:

# of connections    avg. BACKUP time (min:sec)
  0                 00:44
  1                 00:47
  2                 00:48
  5                 00:49
 10                 00:59
 20                 00:50
 50                 01:01
 60                 01:05
 80                 01:55
100                 02:06
[2 Apr 2009 12:31] Bernt Marius Johnsen
But does it perform well with the same load that I used (the same client)?
[11 Aug 2009 11:55] Jørgen Løland
I experimented with the attached jar file and found that BACKUP does indeed take a long time to execute when there is concurrent load on another database. I sampled what the BACKUP was doing a number of times, and found this stack:

#0  0xb8010430 in __kernel_vsyscall ()
#1  0xb7fe30e5 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/tls/i686/cmov/libpthread.so.0
#2  0x08652e4e in safe_cond_wait (cond=0x8971fa0, mp=0x8971900, 
    file=0x875b097 "lock.cc", line=1161) at thr_mutex.c:423
#3  0x082955a9 in lock_global_read_lock (thd=0x9709670) at lock.cc:1161
#4  0x08704149 in backup::block_commits (thd=0x9709670, tables=0x0)
    at data_backup.cc:395
#5  0x0870480d in backup::write_table_data (thd=0x9709670, info=@0xa2c32f0, 
    s=@0x9d1b700) at data_backup.cc:700
#6  0x086f9f1b in Backup_restore_ctx::do_backup (this=0xb73e9b64)
    at kernel.cc:1315
#7  0x086fcca8 in execute_backup_command (thd=0x9709670, lex=0x970a5bc, 
    backupdir=0xb73ea140, overwrite=false, skip_gap_event=false)
    at kernel.cc:218
#8  0x082b335e in mysql_execute_command (thd=0x9709670) at sql_parse.cc:2432
#9  0x082bbe3b in mysql_parse (thd=0x9709670, 
    inBuf=0x9749110 "backup database longtestcopy to '6.bak'", length=39, 
    found_semicolon=0xb73eabd4) at sql_parse.cc:5942

As expected, this is the same stack as shown in BUG#44568, and shows that the problem lies in the commit blocker mechanism BACKUP currently uses (GlobalReadLock). 
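
The blocking can be reproduced at the SQL level with FLUSH TABLES WITH READ LOCK, which requests the same global read lock that backup::block_commits() takes in the stack above (a sketch; the unrelated database, the table and the SLEEP() trick are illustrative):

-- connection 1: a long-running write statement on an unrelated database
-- (assumes otherdb.t1 exists and has at least one row):
UPDATE otherdb.t1 SET val = val + 1 WHERE SLEEP(60) = 0;

-- connection 2: requests the global read lock, just as
-- backup::block_commits() does, and therefore waits in
-- lock_global_read_lock() until the UPDATE above finishes:
FLUSH TABLES WITH READ LOCK;
-- ... BACKUP's table-data phase would run under this lock ...
UNLOCK TABLES;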

Marking this bug as a duplicate of BUG#44568 since I've started adding info to that bug.