Bug #19221 RBR performance regression in write_row
Submitted: 20 Apr 2006 14:11 Modified: 3 Feb 2007 7:29
Reporter: Elliot Murphy Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Row Based Replication ( RBR ) Severity:S5 (Performance)
Version:5.1 OS:
Assigned to: Tomas Ulin CPU Architecture:Any

[20 Apr 2006 14:11] Elliot Murphy
Description:
Tomas found a perf degradation of about 10x, seems to be in write_row interface to binlog. Need to inspect/measure the event code and blob handling.

How to repeat:
Compare current with old versions.
[8 Nov 2006 10:54] Stewart Smith
callgrind call graph of injector thread during bank run.

Attachment: injector01.png (image/png, text), 134.66 KiB.

[9 Nov 2006 2:57] Stewart Smith
A simple patch such as the below, can get about a 10% speed boost on delete row (see attached callgrind graphs - opt01-before.png and opt01-after.png)

===== ../sql/sql_class.cc 1.294 vs edited =====
--- 1.294/sql/sql_class.cc      2006-10-13 11:29:55 +10:00
+++ edited/../sql/sql_class.cc  2006-11-08 22:19:44 +11:00
@@ -2682,7 +2682,8 @@ int THD::binlog_delete_row(TABLE* table,
 
   /* add_row_data copies row_data */
   if (table->write_row_record == 0)
-    my_free((gptr)row_data, MYF(MY_WME));
+    table->write_row_record= row_data;
+//    my_free((gptr)row_data, MYF(MY_WME));
 
   return error;
 }
[9 Nov 2006 2:58] Stewart Smith
before malloc removal in delete row

Attachment: opt01-before.png (image/png, text), 87.43 KiB.

[9 Nov 2006 2:58] Stewart Smith
after removal of malloc in delete row

Attachment: opt01-after.png (image/png, text), 84.83 KiB.

[9 Nov 2006 6:03] Stewart Smith
Ideas for areas to optimize:
- remove need for THD::pack_row in as many cases as possible
   - this is about 17% of injector thread run time
   - the memcpy calls take significant time
   - for many tables (without BLOBS or VARCHAR especially... maybe CHAR) we could remove the need to pack the row, or at least remove the memcpy and change in place (which for CHAR may be just space padding/removing)

- optimise ndb_unpack_record  - over 9% of injector runtime spent here

- inline NdbEventOperationImpl::getEventType
  - not inlined by gcc, called for every event (i.e. row), almost 2% of injector runtime spent here.

- inline Ndb::nextEvent, always called, maybe save up to 0.5-1%

- better allocate memory in Rows_log_event::do_add_row_data
  - in one run, 6% of injector thread time was spent performing my_realloc here

- avoid memcpy that my_b_safe_write does when called from Rows_log_event::write_data_body
  - perhaps hand over the buffer to the write routines to write directly?
  - this would only save about about 3% of injector runtime though...

have not fonud any explanation (yet) for apparrent 90% slowdown. IO has not shown up as significant during callgrind or oprofile runs. Possibly a debug build?

Next to look at: 5.1.7 (feb) and compare callgrind results.
[9 Nov 2006 7:55] Stewart Smith
from 5.1.7 run

Attachment: injector-5.1.7-01.png (image/png, text), 133.22 KiB.

[9 Nov 2006 7:56] Stewart Smith
5.1.13 run with relative % (where injector thread = 100%)

Attachment: injector02.png (image/png, text), 108.77 KiB.

[13 Nov 2006 15:15] Jonathan Miller
Okay, I have the latest 5.1 and will run both debug and release builds. What version/clone should we compare to?
[15 Nov 2006 15:23] Jonathan Miller
Hi,

I ran the tests that where listed, and there seem to show that a slowdown from mysql-5.1-wl2325-5.0 compared to 5.1.14:

Important notes for those that test this after me:
You should uncomment the line:
//#define RUN_NDB_BINLOG_TIMER
in ha_ndbcluster_binlog.cc

Results Example (mysql-5.1-wl2325-5.0):
061115 16:10:20 [Note] main_timer 577 ms
061115 16:10:21 [Note] gci 2061 event_count 220 write time 1(220 e/s), total time 2(110 e/s
)
061115 16:10:21 [Note] main_timer 590 ms
061115 16:10:21 [Note] gci 2062 event_count 220 write time 1(220 e/s), total time 2(110 e/s
)
061115 16:10:21 [Note] main_timer 601 ms
061115 16:10:22 [Note] gci 2063 event_count 226 write time 1(226 e/s), total time 2(113 e/s
)
061115 16:10:22 [Note] main_timer 566 ms
061115 16:10:22 [Note] gci 2064 event_count 302 write time 2(151 e/s), total time 2(151 e/s
)
061115 16:10:22 [Note] main_timer 589 ms
061115 16:10:23 [Note] gci 2065 event_count 222 write time 1(222 e/s), total time 2(111 e/s
)
061115 16:10:23 [Note] main_timer 577 ms
061115 16:10:24 [Note] gci 2066 event_count 223 write time 1(223 e/s), total time 2(111 e/s
)
061115 16:10:24 [Note] main_timer 590 ms
061115 16:10:24 [Note] gci 2067 event_count 266 write time 1(266 e/s), total time 2(133 e/s
)
061115 16:10:24 [Note] main_timer 577 ms
061115 16:10:25 [Note] gci 2068 event_count 215 write time 1(215 e/s), total time 2(107 e/s
)
061115 16:10:25 [Note] main_timer 577 ms
061115 16:10:25 [Note] gci 2069 event_count 235 write time 1(235 e/s), total time 2(117 e/s)

Results Example (mysql-5.1-14):
061115  1:03:55 [Note] gci 420 event_count 242 write time 5(48 e/s), tot
s)
061115  1:03:55 [Note] main_timer 578 ms
061115  1:03:55 [Note] gci 421 event_count 242 write time 5(48 e/s), tot
s)
061115  1:03:55 [Note] main_timer 576 ms
061115  1:03:56 [Note] gci 422 event_count 250 write time 5(50 e/s), tot
s)
061115  1:03:56 [Note] main_timer 578 ms
061115  1:03:56 [Note] gci 423 event_count 250 write time 5(50 e/s), tot
s)
061115  1:03:56 [Note] main_timer 589 ms
061115  1:03:57 [Note] gci 424 event_count 240 write time 5(48 e/s), tot
s)
061115  1:03:57 [Note] main_timer 578 ms
061115  1:03:58 [Note] gci 425 event_count 226 write time 5(45 e/s), tot
s)
061115  1:03:58 [Note] main_timer 565 ms
061115  1:03:58 [Note] gci 426 event_count 250 write time 5(50 e/s), tot
s)
061115  1:03:58 [Note] main_timer 577 ms
061115  1:03:59 [Note] gci 427 event_count 235 write time 5(47 e/s), tot
s)
061115  1:03:59 [Note] main_timer 577 ms
061115  1:03:59 [Note] gci 428 event_count 242 write time 5(48 e/s), tot
s)
061115  1:03:59 [Note] main_timer 553 ms
061115  1:04:00 [Note] gci 429 event_count 253 write time 6(42 e/s), tot
s)
061115  1:04:00 [Note] main_timer 578 ms
061115  1:04:00 [Note] gci 430 event_count 224 write time 5(44 e/s), tot
s)
061115  1:04:00 [Note] main_timer 576 ms
[15 Nov 2006 15:48] Stewart Smith
just double checking, it's a non-debug build with adequate optimizations enabled? (i.e. built with -O3)?
[16 Nov 2006 20:05] Jonathan Miller
Hi,

Jonas asked that this be redone using create_tab T1; hugoLoad -r 1000 T1; hugoScanUpdate -l 0 T1. But actually for Replication to work, I have to create the table using MySQLD.

mysql> CREATE TABLE T1(KOL1 INT UNSIGNED KEY, KOL2 INT UNSIGNED, KOL3 INT UNSIGNED, KO
L4 INT UNSIGNED, KOL5 INT UNSIGNED)ENGINE=NDB;

5.1.14 results: (will post the other shortly)

061116 22:45:59 [Note] main_timer 568 ms
061116 22:46:00 [Note] gci 11534 event_count 3128 write time 14(223 e/s), total time 15
(208 e/s)
061116 22:46:00 [Note] main_timer 618 ms
061116 22:46:01 [Note] gci 11535 event_count 3489 write time 15(232 e/s), total time 17
(205 e/s)
061116 22:46:01 [Note] main_timer 611 ms
061116 22:46:01 [Note] gci 11536 event_count 3447 write time 15(229 e/s), total time 16
(215 e/s)
061116 22:46:01 [Note] main_timer 596 ms
061116 22:46:02 [Note] gci 11537 event_count 3256 write time 14(232 e/s), total time 16
(203 e/s)
061116 22:46:02 [Note] main_timer 582 ms
061116 22:46:02 [Note] gci 11538 event_count 3169 write time 14(226 e/s), total time 15
(211 e/s)
061116 22:46:02 [Note] main_timer 600 ms
061116 22:46:03 [Note] gci 11539 event_count 3320 write time 15(221 e/s), total time 16
(207 e/s)
061116 22:46:03 [Note] main_timer 627 ms
061116 22:46:04 [Note] gci 11540 event_count 3383 write time 15(225 e/s), total time 16
(211 e/s)
061116 22:46:04 [Note] main_timer 630 ms
061116 22:46:04 [Note] gci 11541 event_count 3425 write time 15(228 e/s), total time 16
(214 e/s)
061116 22:46:04 [Note] main_timer 600 ms
061116 22:46:05 [Note] gci 11542 event_count 3447 write time 15(229 e/s), total time 17
(202 e/s)
061116 22:46:05 [Note] main_timer 641 ms
061116 22:46:06 [Note] gci 11543 event_count 3617 write time 16(226 e/s), total time 17
(212 e/s)
[16 Nov 2006 20:13] Jonathan Miller
mysql-5.1-wl2325-5.0 Results

061116 23:12:31 [Note] main_timer 751 ms
061116 23:12:32 [Note] gci 386 event_count 8319 write time 16(519 e/s), total time 18(4
62 e/s)
061116 23:12:32 [Note] main_timer 667 ms
061116 23:12:32 [Note] gci 387 event_count 6936 write time 13(533 e/s), total time 15(4
62 e/s)
061116 23:12:32 [Note] main_timer 622 ms
061116 23:12:33 [Note] gci 388 event_count 6872 write time 13(528 e/s), total time 15(4
58 e/s)
061116 23:12:33 [Note] main_timer 670 ms
061116 23:12:34 [Note] gci 389 event_count 7192 write time 14(513 e/s), total time 16(4
49 e/s)
061116 23:12:34 [Note] main_timer 622 ms
061116 23:12:34 [Note] gci 390 event_count 6128 write time 12(510 e/s), total time 14(4
37 e/s)
061116 23:12:34 [Note] main_timer 663 ms
061116 23:12:35 [Note] gci 391 event_count 6680 write time 13(513 e/s), total time 15(4
45 e/s)
061116 23:12:35 [Note] main_timer 642 ms
061116 23:12:36 [Note] gci 392 event_count 6384 write time 12(532 e/s), total time 14(4
56 e/s)
061116 23:12:36 [Note] main_timer 681 ms
061116 23:12:36 [Note] gci 393 event_count 6936 write time 13(533 e/s), total time 15(4
62 e/s)
061116 23:12:36 [Note] main_timer 638 ms
061116 23:12:37 [Note] gci 394 event_count 6808 write time 13(523 e/s), total time 15(4
53 e/s)
061116 23:12:37 [Note] main_timer 629 ms
061116 23:12:38 [Note] gci 395 event_count 6064 write time 12(505 e/s), total time 14(4
[6 Dec 2006 4:53] Stewart Smith
comparitive graph of wl2325 versus 5.1.14

Attachment: write_row_perf.svg (image/svg+xml, text), 32.16 KiB.

[6 Dec 2006 4:54] Stewart Smith
gnumeric spreadsheet source for perf graph

Attachment: write_row_performance.gnumeric (application/x-gnumeric, text), 3.51 KiB.

[6 Dec 2006 4:57] Stewart Smith
It looks rather certain that some performance change has happenned between the two trees.

Currently looking at cache simulation output from callgrind/cachegrind and formulating some theories.
[8 Dec 2006 5:39] Stewart Smith
After looking closer, including at cache performance runs (callgrind simulating CPU cache), the only real area differing in performance is an increase in the L2 Cache Write misses in the memory allocation code for events in ndbapi. Even so, the total cache misses is less than 1% - not indicative of any real performance impact.

Once fixing the code for displaying the events/sec (insead of ms), I'm getting (even with runinng all nodes on localhost) GCIs with 100,000 events being written in less than 0.2 seconds.

I think we'll have to wait for tomas to return to investigate further