Bug #49501 Inefficient information_schema check (system collation)
Submitted: 7 Dec 2009 10:53 Modified: 12 Mar 2010 16:35
Reporter: Domas Mituzas Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S5 (Performance)
Version:5.1.43-bzr OS:Any
Assigned to: Sergei Glukhov CPU Architecture:Any
Triage: Triaged: D3 (Medium)

[7 Dec 2009 10:53] Domas Mituzas
Description:
On my oprofile samples following stack has significant (few percents of mysqld time appearance):

#0  my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb301dafb0 "cswiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
#1  0x00000000006b090c in mysql_change_db (thd=0x7fbb404cbca0, new_db_name=0x40a37830, force_switch=false) at sql_db.cc:1561
#2  0x00000000005c81fa in dispatch_command (command=COM_INIT_DB, thd=0x7fbb404cbca0, packet=0x7fbb405d2f11 "cswiki", packet_length=6) at sql_parse.cc:1023
#3  0x00000000005c8ec6 in do_command (thd=0x7fbb404cbca0) at sql_parse.cc:872
#4  0x00000000005b8f28 in handle_one_connection (arg=0x7fbb404cbca0) at sql_connect.cc:1127
#5  0x00007fc20ff873f7 in start_thread () from /lib/libpthread.so.0
#6  0x00007fc20ef07b2d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()

It is always using expensive utf8 rules to verify if it is information schema. 

How to repeat:
 gdb -ex 'br my_strcasecmp_utf8' -ex 'continue' -ex 'bt'   -ex 'continue' -ex 'bt'    -ex 'continue' -ex 'bt'    -ex 'continue' -ex 'bt'      -batch -p $(pidof mysqld) 

Breakpoint 1, my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb3c1da200 "itwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
2472	ctype-utf8.c: No such file or directory.
	in ctype-utf8.c
Current language:  auto; currently c
#0  my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb3c1da200 "itwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
#1  0x00000000006b090c in mysql_change_db (thd=0x7fbb38b2ffe0, new_db_name=0x519d9830, force_switch=false) at sql_db.cc:1561
#2  0x00000000005c81fa in dispatch_command (command=COM_INIT_DB, thd=0x7fbb38b2ffe0, packet=0x7fbb38a9c8e1 "itwiki", packet_length=6) at sql_parse.cc:1023
#3  0x00000000005c8ec6 in do_command (thd=0x7fbb38b2ffe0) at sql_parse.cc:872
#4  0x00000000005b8f28 in handle_one_connection (arg=0x7fbb38b2ffe0) at sql_connect.cc:1127
#5  0x00007fc20ff873f7 in start_thread () from /lib/libpthread.so.0
#6  0x00007fc20ef07b2d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
[Switching to Thread 0x523ce950 (LWP 19171)]

Breakpoint 1, my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb40d9dea0 "ptwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
2472	in ctype-utf8.c
#0  my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb40d9dea0 "ptwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
#1  0x00000000006b090c in mysql_change_db (thd=0x7fbb38ce2fa0, new_db_name=0x523cd830, force_switch=false) at sql_db.cc:1561
#2  0x00000000005c81fa in dispatch_command (command=COM_INIT_DB, thd=0x7fbb38ce2fa0, packet=0x7fbb38ce5851 "ptwiki", packet_length=6) at sql_parse.cc:1023
#3  0x00000000005c8ec6 in do_command (thd=0x7fbb38ce2fa0) at sql_parse.cc:872
#4  0x00000000005b8f28 in handle_one_connection (arg=0x7fbb38ce2fa0) at sql_connect.cc:1127
#5  0x00007fc20ff873f7 in start_thread () from /lib/libpthread.so.0
#6  0x00007fc20ef07b2d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
[Switching to Thread 0x52587950 (LWP 19185)]

Breakpoint 1, my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb40c678e0 "itwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
2472	in ctype-utf8.c
#0  my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb40c678e0 "itwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
#1  0x00000000006b090c in mysql_change_db (thd=0x7fbb38d1de60, new_db_name=0x52586830, force_switch=false) at sql_db.cc:1561
#2  0x00000000005c81fa in dispatch_command (command=COM_INIT_DB, thd=0x7fbb38d1de60, packet=0x7fbb38d20711 "itwiki", packet_length=6) at sql_parse.cc:1023
#3  0x00000000005c8ec6 in do_command (thd=0x7fbb38d1de60) at sql_parse.cc:872
#4  0x00000000005b8f28 in handle_one_connection (arg=0x7fbb38d1de60) at sql_connect.cc:1127
#5  0x00007fc20ff873f7 in start_thread () from /lib/libpthread.so.0
#6  0x00007fc20ef07b2d in clone () from /lib/libc.so.6
#7  0x0000000000000000 in ?? ()
[Switching to Thread 0x52d91950 (LWP 19229)]

Breakpoint 1, my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb3c50e420 "zhwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
2472	in ctype-utf8.c
#0  my_strcasecmp_utf8 (cs=0xc36420, s=0x7fbb3c50e420 "zhwiki", t=0x86ea2b "information_schema") at ctype-utf8.c:2472
#1  0x00000000005be0bc in st_select_lex::add_table_to_list (this=0x7fbb38adffd8, thd=0x7fbb38ade1b0, table=0x7fbb3c50e0c0, alias=<value optimized out>, table_options=<value optimized out>, lock_type=TL_READ, index_hints_arg=0x0, option=0x0) at sql_parse.cc:6272
#2  0x00000000005fc667 in MYSQLparse (yythd=0x7fbb38ade1b0) at sql_yacc.yy:8602
#3  0x00000000005c77c7 in mysql_parse (thd=0x7fbb38ade1b0, inBuf=0x7fbb3c50dbe0 "SELECT /* LinkCache::addLinkObj Ws227 */  page_id,page_len,page_is_redirect  FROM `page`  WHERE page_namespace = '8' AND page_title = 'Gadget-specialchars.js'  LIMIT 1", length=167, found_semicolon=0x52d90848) at sql_parse.cc:7844
#4  0x00000000005c893b in dispatch_command (command=COM_QUERY, thd=0x7fbb38ade1b0, packet=<value optimized out>, packet_length=<value optimized out>) at sql_parse.cc:1231
#5  0x00000000005c8ec6 in do_command (thd=0x7fbb38ade1b0) at sql_parse.cc:872
#6  0x00000000005b8f28 in handle_one_connection (arg=0x7fbb38ade1b0) at sql_connect.cc:1127
#7  0x00007fc20ff873f7 in start_thread () from /lib/libpthread.so.0
#8  0x00007fc20ef07b2d in clone () from /lib/libc.so.6
#9  0x0000000000000000 in ?? ()

Suggested fix:
allow to change system collation or use more efficient methods to do this basic comparison.
[8 Dec 2009 17:18] Valeriy Kravchuk
Thank you for the problem report.
[11 Dec 2009 8:28] Alexander Barkov
This piece of code can be optimized by a new condition, using length:

  if (new_db_name->length == INFORMATION_SCHEMA_NAME.length &&
      my_strcasecmp(system_charset_info, new_db_name->str,
                    INFORMATION_SCHEMA_NAME.str) == 0) 

so in most cases my_strcasecmp() will even not be executed.

Quick search using "grep my_stecasecmp | grep INFORMATION_SCHEMA"
discovers three pieces where this optimization can be done:

mysql_priv.h:
#define is_schema_db(X) \
  !my_strcasecmp(system_charset_info, INFORMATION_SCHEMA_NAME.str, (X))
sql_db.cc:  if (!my_strcasecmp(system_charset_info, db, INFORMATION_SCHEMA_NAME.str))
sql_show.cc:    if (!my_strcasecmp(system_charset_info, INFORMATION_SCHEMA_NAME.str,
[15 Dec 2009 11:12] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/94104

3273 Sergey Glukhov	2009-12-15
      Bug#49501 Inefficient information_schema check (system collation)
      added check_length optimization for I_S_NAME comparison
     @ sql/event_data_objects.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/events.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/mysql_priv.h
        added check_length optimization for I_S_NAME comparison
     @ sql/repl_failsafe.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_db.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_parse.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_show.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_view.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/table.cc
        added check_length optimization for I_S_NAME comparison
[15 Dec 2009 11:33] Alexander Barkov
http://lists.mysql.com/commits/94104 is Ok to push.

But I'd also perhaps add an overloaded function:

inline bool is_schema_db(const LEX_STR *str);

because it is exactly LEX_STR which is tested in most of the cases.
[15 Dec 2009 11:34] Alexander Barkov
or even:

inline bool is_schema_db(const LEX_STR &);
[15 Dec 2009 15:50] Mark Callaghan
Can we continue the discussion on this topic?

"!my_strcasecmp" can be found throughout the code. Will all of those calls be faster if there is a check for string lengths prior to doing the comparison? Alas, some people might use 'my_strcasecmp(...) == 0' making the search harder. I wish that style was not used.

A similar optimization opportunity exists in sql_string.cc in the stringcmp function. This is used in several places where the caller checks for equality not for order. If a stringeq method where added it could optimize for this check by first checking string length.

int stringcmp(const String *s,const String *t)
{
  uint32 s_len=s->length(),t_len=t->length(),len=min(s_len,t_len);
  int cmp= memcmp(s->ptr(), t->ptr(), len);
  return (cmp) ? cmp : (int) (s_len - t_len);
}

callers to stringcmp that really want equality checks, from cscope for 5.0.84

item.cc         eq    874 return !stringcmp(&str_value, &item->str_value);
item.cc         eq   3117 return !stringcmp(&str_value, &item->str_value);
item.cc         eq   5045 return !stringcmp(&str_value, &arg->str_value);
item.cc         field_is_equal_to_item  
                     6486 return !stringcmp(&field_tmp,item_result);
[19 Jan 2010 8:13] Alexander Barkov
Hi Mark,

thank you very much for your suggestion about the further improvements way.
We cannot do all these improvements in MySQL-5.1 under the terms of this bug report. So we created a new WorkLog task:

WL#5230 Replace strcasecmp() used for checking equality to a more efficient function

It should appear soon under this location:
http://forge.mysql.com/worklog/task.php?id=5230

Thanks!
[19 Jan 2010 9:04] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97331

3320 Sergey Glukhov	2010-01-19
      Bug#49501 Inefficient information_schema check (system collation)
      added check_length optimization for I_S_NAME comparison
     @ sql/event_data_objects.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/events.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/mysql_priv.h
        added check_length optimization for I_S_NAME comparison
     @ sql/repl_failsafe.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_db.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_parse.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_show.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/sql_view.cc
        added check_length optimization for I_S_NAME comparison
     @ sql/table.cc
        added check_length optimization for I_S_NAME comparison
[19 Jan 2010 17:21] Mark Callaghan
Did you confirm that this change improves performance? The change made here is much more substantial than the change suggested by Domas and may result in many more calls to strlen. The change suggested by Domas required no additional calls to strlen.

The problem reported by Domas requires changes to mysql_change_db() and add_table_to_list(). In both cases, the length of both strings is known in the check for equality with "information_schema".
[21 Jan 2010 8:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97684

3329 Sergey Glukhov	2010-01-21
      Bug#49501 Inefficient information_schema check (system collation), addon
      removed wrongly introduced strlen calls
     @ sql/events.cc
        removed wrongly introduced strlen calls
     @ sql/mysql_priv.h
        removed wrongly introduced strlen calls
     @ sql/repl_failsafe.cc
        removed wrongly introduced strlen calls
     @ sql/sql_db.cc
        removed wrongly introduced strlen calls
     @ sql/sql_parse.cc
        removed wrongly introduced strlen calls
     @ sql/sql_show.cc
        removed wrongly introduced strlen calls
[22 Jan 2010 10:01] Alexander Barkov
The patch http://lists.mysql.com/commits/97684 removing
newly introduced strlen() calls looks ok.

So now as a result we have two overloaded functions:

is_schema_db(const char *name);

(for the cases when length is not know)

and 

is_schema_db(const char *name, size_t length)

(for the case when length is known).

The second version is about 3 times faster.

In the future we will do refactoring (as Sinisa suggests):

"WL#2894 change strlen() to LEX_STRING or String"

to make length always available and thus get rid
of the slower version.
[22 Jan 2010 10:03] Alexander Barkov
Mark,

> Did you confirm that this change improves performance?

Yes, according to our tests:

   is_schema_db("twiki", 5);

is 3 times faster than

   is_schema_db("twiki");

Thank you very much for helping us!
[22 Jan 2010 10:59] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/97841

3331 Sergey Glukhov	2010-01-22
      Bug#49501 Inefficient information_schema check (system collation), addon
      removed wrongly introduced strlen calls
     @ sql/events.cc
        removed wrongly introduced strlen calls
     @ sql/mysql_priv.h
        removed wrongly introduced strlen calls
     @ sql/repl_failsafe.cc
        removed wrongly introduced strlen calls
     @ sql/sql_db.cc
        removed wrongly introduced strlen calls
     @ sql/sql_parse.cc
        removed wrongly introduced strlen calls
     @ sql/sql_show.cc
        removed wrongly introduced strlen calls
[4 Feb 2010 10:19] Bugs System
Pushed into 5.1.44 (revid:joro@sun.com-20100204101444-2j32mhqroo0iiio6) (version source revid:dao-gang.qu@sun.com-20100125025505-zqa9v2mgdcfza0v6) (merge vers: 5.1.43) (pib:16)
[5 Feb 2010 11:48] Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100204063540-9czpdmpixi3iw2yb) (version source revid:alik@sun.com-20100130192433-9fckdf4o5xc37flw) (pib:16)
[5 Feb 2010 11:55] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20100205113942-oqovjy0eoqbarn7i) (version source revid:alik@sun.com-20100204064210-ljwanqvrjs83s1gq) (merge vers: 6.0.14-alpha) (pib:16)
[5 Feb 2010 12:00] Bugs System
Pushed into 5.5.2-m2 (revid:alik@sun.com-20100203172258-1n5dsotny40yufxw) (version source revid:alexey.kopytov@sun.com-20100123210923-lx4o1ettww9fdkqk) (merge vers: 5.5.2-m2) (pib:16)
[11 Feb 2010 13:49] Paul Dubois
Noted in 5.1.44, 5.5.2, 6.0.14 changelogs.

The method for comparing INFORMATION_SCHEMA names and database names
was nonoptimal and an improvement was made: When the database name
length is already known, a length check is made first and content
comparison skipped if the lengths are unequal.

Setting report to Need Merge pending push to Celosia.
[12 Mar 2010 14:08] Bugs System
Pushed into 5.1.44-ndb-7.0.14 (revid:jonas@mysql.com-20100312135944-t0z8s1da2orvl66x) (version source revid:jonas@mysql.com-20100312115609-woou0te4a6s4ae9y) (merge vers: 5.1.44-ndb-7.0.14) (pib:16)
[12 Mar 2010 14:24] Bugs System
Pushed into 5.1.44-ndb-6.2.19 (revid:jonas@mysql.com-20100312134846-tuqhd9w3tv4xgl3d) (version source revid:jonas@mysql.com-20100312060623-mx6407w2vx76h3by) (merge vers: 5.1.44-ndb-6.2.19) (pib:16)
[12 Mar 2010 14:38] Bugs System
Pushed into 5.1.44-ndb-6.3.33 (revid:jonas@mysql.com-20100312135724-xcw8vw2lu3mijrhn) (version source revid:jonas@mysql.com-20100312103652-snkltsd197l7q2yg) (merge vers: 5.1.44-ndb-6.3.33) (pib:16)
[12 Mar 2010 16:35] Paul Dubois
Fixed in earlier 5.1.x, 5.5.x.