MySQL Bugs: #72993: my_hash_sort_utf8 can be optimized if table has many fields

Bug #72993	my_hash_sort_utf8 can be optimized if table has many fields
Submitted:	13 Jun 2014 1:03	Modified:	11 Aug 2014 14:37
Reporter:	Hao Liu
Status:	Not a Bug
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.6 5.5	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
In many OLTP workloads, tables have many fields. MySQL uses my_hash_sort_utf8 as the hash function for the hash search in the find_field_in_table function.
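To make the lookup concrete, here is a minimal C sketch of a name-to-field hash table with a pluggable hash callback, loosely in the spirit of the my_hash_search / cset_hash_sort_adapter path in the profile below. It is not MySQL's HASH implementation. The types, the bucket count, the ASCII-only case folding, FNV-1a as the hash, and the column names are all hypothetical. The sketch shows two things: every lookup calls the hash function once, and whatever hash is plugged in must agree with the case-insensitive name comparison.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback type: the table does not care which hash it uses. */
typedef uint32_t (*name_hash_fn)(const char *name, size_t len);

#define NBUCKETS 16u /* power of two so we can mask instead of mod */

struct field_entry {
  const char *name;          /* column name as declared */
  int index;                 /* position of the column in the table */
  struct field_entry *next;  /* next entry in the same bucket */
};

struct field_hash {
  name_hash_fn hash;
  struct field_entry *buckets[NBUCKETS];
};

/* Case-insensitive (ASCII-only) FNV-1a: folds each byte before mixing. */
static uint32_t ci_fnv1a(const char *s, size_t len) {
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < len; i++) {
    h ^= (uint32_t)tolower((unsigned char)s[i]);
    h *= 16777619u;
  }
  return h;
}

/* Equality must agree with the hash: names equal here must hash equally. */
static int ci_equal(const char *a, const char *b) {
  for (; *a && *b; a++, b++)
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
  return *a == *b; /* equal only if both strings ended together */
}

static void field_hash_insert(struct field_hash *fh, struct field_entry *e) {
  uint32_t b = fh->hash(e->name, strlen(e->name)) & (NBUCKETS - 1);
  e->next = fh->buckets[b];
  fh->buckets[b] = e;
}

/* Each lookup hashes the name once; with many column references per
   statement, this hash call is what shows up in a CPU profile. */
static int field_hash_find(const struct field_hash *fh, const char *name) {
  uint32_t b = fh->hash(name, strlen(name)) & (NBUCKETS - 1);
  for (const struct field_entry *e = fh->buckets[b]; e != NULL; e = e->next)
    if (ci_equal(name, e->name)) return e->index;
  return -1; /* no such column */
}

/* Builds a small demo table (hypothetical column names) and looks one up. */
static int demo_lookup(const char *name) {
  struct field_entry cols[] = {
      {"id", 0, NULL}, {"Name", 1, NULL}, {"created_at", 2, NULL}};
  struct field_hash fh = {ci_fnv1a, {NULL}};
  for (size_t i = 0; i < sizeof cols / sizeof cols[0]; i++)
    field_hash_insert(&fh, &cols[i]);
  return field_hash_find(&fh, name);
}
```

For example, `demo_lookup("ID")` finds the column declared as `id`, because both the hash and the comparison fold case.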

We have a benchmark in which the table has many fields (54 fields in one table), and we found that my_hash_sort_utf8 can become a hotspot worth optimizing. The perf report output is below:

+   6.67%  mysqld_old  mysqld_old            [.] MYSQLparse(void*)
-   2.98%  mysqld_old  mysqld_old            [.] my_hash_sort_utf8
   - my_hash_sort_utf8
      - 97.93% cset_hash_sort_adapter
         - 62.04% my_hash_first
            - my_hash_search
               - 93.22% find_field_in_table(THD*, TABLE*, char const*, unsigned int, bool, unsigned int*)
                    find_field_in_table_ref(THD*, TABLE_LIST*, char const*, unsigned int, char const*, char const...
                    find_field_in_tables(THD*, Item_ident*, TABLE_LIST*, TABLE_LIST*, Item**, find_item_error_rep...
                  + Item_field::fix_fields(THD*, Item**)
               + 6.78% find_native_function_builder(THD*, st_mysql_lex_string)
         + 37.96% my_hash_first_from_hash_value
      + 1.74% my_hash_first
+   2.45%  mysqld_old  mysqld_old            [.] lex_one_token(void*, void*)
+   1.59%  mysqld_old  libc-2.12.so          [.] __strlen_sse42
+   1.58%  mysqld_old  mysqld_old            [.] build_template_field(row_prebuilt_t*, dict_index_t*, dict_index_...
+   1.49%  mysqld_old  mysqld_old            [.] rec_get_offsets_func(unsigned char const*, dict_index_t const*, ...
+   1.46%  mysqld_old  libc-2.12.so          [.] memcpy

I think we can replace my_hash_sort_utf8 with the murmur3_32 hash function.

I did a simple test:
The test is read-only and CPU-bound (using 4 cores).
After switching to murmur3_32:
QPS: 19600 => 20400 (about +4.1%).
The my_hash_sort_utf8 hotspot disappears from the perf report output: my_hash_sort_utf8 used 2.98% of CPU, whereas murmur3_32 uses 0.7%.
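For reference, here is a minimal C sketch of the standard MurmurHash3 x86_32 routine (the murmur3_32 named above). The attached patch's actual code is not reproduced here. This version assembles each 4-byte block in little-endian order so it gives the same result on any host, and, like any Murmur variant, it hashes raw bytes only.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, int r) {
  return (x << r) | (x >> (32 - r));
}

/* Final avalanche step: every input bit affects every output bit. */
static uint32_t fmix32(uint32_t h) {
  h ^= h >> 16;
  h *= 0x85ebca6bu;
  h ^= h >> 13;
  h *= 0xc2b2ae35u;
  h ^= h >> 16;
  return h;
}

/* MurmurHash3 x86_32: hashes len raw bytes; knows nothing about characters. */
static uint32_t murmur3_32(const void *key, size_t len, uint32_t seed) {
  const uint8_t *data = (const uint8_t *)key;
  const uint32_t c1 = 0xcc9e2d51u, c2 = 0x1b873593u;
  const size_t nblocks = len / 4;
  uint32_t h1 = seed;
  uint32_t k1;

  /* Body: mix in 4 bytes at a time, read as a little-endian word. */
  for (size_t i = 0; i < nblocks; i++) {
    const uint8_t *p = data + 4 * i;
    k1 = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    k1 *= c1;
    k1 = rotl32(k1, 15);
    k1 *= c2;
    h1 ^= k1;
    h1 = rotl32(h1, 13);
    h1 = h1 * 5 + 0xe6546b64u;
  }

  /* Tail: the remaining 0-3 bytes. */
  const uint8_t *tail = data + 4 * nblocks;
  k1 = 0;
  switch (len & 3) {
    case 3: k1 ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k1 ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k1 ^= tail[0];
      k1 *= c1;
      k1 = rotl32(k1, 15);
      k1 *= c2;
      h1 ^= k1;
  }

  /* Finalization: fold in the length, then avalanche. */
  h1 ^= (uint32_t)len;
  return fmix32(h1);
}
```

Because it sees only bytes, `murmur3_32("ID", 2, 0)` and `murmur3_32("id", 2, 0)` return different values, which matters for case-insensitive identifiers such as MySQL column names.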

How to repeat:
Read the code of find_field_in_table and my_hash_sort_utf8.

Suggested fix:
I'll attach a simple patch to demonstrate the performance improvement.

Attached the patch. I think it can be optimized further.

Attachment: bug#72993.diff (application/octet-stream, text), 4.22 KiB.

Murmur is basically a hash function that operates on a block of bytes, and that is all it does.

On the other hand, my_hash_sort_utf8() does collation-aware string hashing.
So it's comparing apples to apple seeds. We rely heavily on collation-aware hashing because we have to support hundreds of different collations.
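To illustrate the difference, here is a simplified C sketch of collation-aware hashing. Its assumptions are all hypothetical: ASCII case folding stands in for a real collation's weight table, trailing spaces are ignored (PAD SPACE semantics), the input is treated as single-byte rather than decoded UTF-8, and the mixing step and initial values (nr1 = 1, nr2 = 4) are only loosely modeled on the nr1/nr2 accumulators used by MySQL's hash_sort functions. The property that matters is that strings the collation treats as equal, such as "ID" and "id  ", hash to the same value, which a plain byte hash cannot guarantee.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a collation's weight lookup: ASCII case folding.
   A real collation maps each decoded character through a weight table. */
static unsigned fold_weight(unsigned char c) {
  return (unsigned)tolower(c);
}

/* Collation-aware hash sketch: strip trailing spaces (PAD SPACE assumption),
   map every character to its weight, and mix the weights, not the raw bytes. */
static uint64_t ci_hash_sort(const unsigned char *s, size_t len) {
  uint64_t nr1 = 1, nr2 = 4; /* hypothetical initial accumulator values */
  while (len > 0 && s[len - 1] == ' ')
    len--;
  for (size_t i = 0; i < len; i++) {
    uint64_t w = fold_weight(s[i]);
    nr1 ^= (((nr1 & 63) + nr2) * w) + (nr1 << 8);
    nr2 += 3;
  }
  return nr1;
}
```

The per-character weight lookup (and, for real UTF-8, the decoding step before it) is exactly the work a byte-oriented hash skips, which is why the two are not interchangeable.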

But thank you for trying to improve the performance of our server.