Bug #117165 load data local infile replace have poor performance
Submitted: 9 Jan 6:53 Modified: 7 Apr 9:57
Reporter: Wang Zhengmao (OCA) Email Updates:
Status: Analyzing Impact on me:
None 
Category:MySQL Server: DML Severity:S5 (Performance)
Version:8.0.36 OS:Any
Assigned to: MySQL Verification Team CPU Architecture:Any

[9 Jan 6:53] Wang Zhengmao
Description:
When the local file contains many duplicate rows, a `load data local infile replace` statement becomes much slower than when there are no duplicate rows.

A perf flame graph shows a hotspot in the ut_lock_free_hash_t::get function.

The execution steps for a `load data local infile replace` statement are:
1. Try inserting one tuple. If there is no error, continue with the next one.
2. On an HA_ERR_FOUND_DUPP_KEY or HA_ERR_FOUND_DUPP_UNIQUE error, fetch table->file->errkey in the ha_innobase::info_low function.
3. Use the duplicate key to read the existing tuple from InnoDB, then use update (or delete + insert) to update the duplicate row.

The problem exists in step 2.
1. The ha_innobase::info_low function calls the index_pct_cached function for every index of the current table to estimate what percentage of that index's pages is cached in the buffer pool. This is unnecessary here, because the caller only needs to fetch the duplicate key number.
2. The number of buffer-pool pages held by each index is stored in buf_stat_per_index, whose underlying data structure is ut_lock_free_hash_t.

Here is the call stack:
mysql_execute_command
  -- Sql_cmd_load_table::execute
    -- Sql_cmd_load_table::execute_inner
      -- Sql_cmd_load_table::read_sep_field
        -- write_record
          -- ha_write_row
          -- handler::get_dup_key
            -- ha_innobase::info_low
              -- index_pct_cached
                -- buf_stat_per_index_t::get
                  -- ut_lock_free_hash_t::get

Therefore, when there are many duplicate rows, the SQL execution time increases with the number of indexes.
`insert ... on duplicate key update` and `replace into` are affected as well.

How to repeat:
1. Create table t1 with a primary key.
2. Create local datafile1, use 'load data replace' to load the data into table t1, and record the time.
3. Create a new local datafile2 whose primary key values are the same as datafile1's but whose other columns differ. Then use 'load data replace' to load the data into table t1 and record the time.
4. Create more indexes on table t1 and repeat step 3.

You can see that the SQL execution time increases with the number of indexes.
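
The repro steps above can be sketched as SQL. The table layout, file paths, and field terminator below are illustrative assumptions, not taken from the original report:

```sql
-- Step 1: a table with a primary key (add secondary indexes for step 4).
CREATE TABLE t1 (
  id INT PRIMARY KEY,
  c1 VARCHAR(64),
  c2 VARCHAR(64),
  KEY idx_c1 (c1)   -- add more keys like this when repeating step 4
);

-- Step 2: first load; no duplicates yet, so the fast path is taken.
LOAD DATA LOCAL INFILE '/tmp/datafile1'
  REPLACE INTO TABLE t1
  FIELDS TERMINATED BY ',';

-- Step 3: datafile2 reuses the same primary key values with different
-- non-key columns, so every row hits HA_ERR_FOUND_DUPP_KEY and goes
-- through ha_innobase::info_low.
LOAD DATA LOCAL INFILE '/tmp/datafile2'
  REPLACE INTO TABLE t1
  FIELDS TERMINATED BY ',';
```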

Suggested fix:
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -17994,22 +17994,24 @@ int ha_innobase::info_low(uint flag, bool is_analyze) {
 
     KEY *key = &table->key_info[i];
 
-    double pct_cached;
-
-    /* We do not maintain stats for fulltext or spatial indexes.
-    Thus, we can't calculate pct_cached below because we need
-    dict_index_t::stat_n_leaf_pages for that. See
-    dict_stats_should_ignore_index(). */
-    if ((key->flags & HA_FULLTEXT) || (key->flags & HA_SPATIAL)) {
-      pct_cached = IN_MEMORY_ESTIMATE_UNKNOWN;
-    } else {
-      pct_cached = index_pct_cached(index);
-    }
+    if (!(flag & HA_STATUS_ERRKEY)) {
+      double pct_cached;
+
+      /* We do not maintain stats for fulltext or spatial indexes.
+      Thus, we can't calculate pct_cached below because we need
+      dict_index_t::stat_n_leaf_pages for that. See
+      dict_stats_should_ignore_index(). */
+      if ((key->flags & HA_FULLTEXT) || (key->flags & HA_SPATIAL)) {
+        pct_cached = IN_MEMORY_ESTIMATE_UNKNOWN;
+      } else {
+        pct_cached = index_pct_cached(index);
+      }
 
-    key->set_in_memory_estimate(pct_cached);
+      key->set_in_memory_estimate(pct_cached);
 
-    if (index == pk) {
-      stats.table_in_mem_estimate = pct_cached;
+      if (index == pk) {
+        stats.table_in_mem_estimate = pct_cached;
+      }
     }
 
     if (flag & HA_STATUS_CONST) {
[27 Feb 15:09] MySQL Verification Team
Hello Wang Zhengmao,

Thank you for the report and feedback.
My apologies for taking time on this. Even with multiple attempts (tried with the current GA version instead of the reported version, i.e. 8.0.36), I'm unable to see a difference between the default and patched instances. Am I missing something here? Please let me know, along with the exact make options used for the build and the configuration details if not on defaults. Thank you.

I'm joining the steps which I tried at my end for your reference. Thank you.

regards,
Umesh
[27 Feb 15:10] MySQL Verification Team
8.0.41 test results

Attachment: 117165.results (application/octet-stream, text), 9.91 KiB.

[28 Mar 1:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[7 Apr 9:33] Wang Zhengmao
ut_lock_free_hash_t is a lock-free hash table: a fixed-length array of (key, val) elements. The hash value is computed from the key and taken modulo the array size to find the corresponding slot. If that slot is occupied, the lookup probes to the right for an available slot.
For example, if there are many collisions in the hash array, finding the slot for a key means traversing slots one by one, which takes more time. When that happens, the more indexes there are, the higher the cost. Although this is not easy to reproduce, it can happen, and I have encountered it.
Additionally, when an HA_STATUS_ERRKEY request is made, ha_innobase::info_low only needs to fetch the duplicate index number; updating each key's m_in_memory_estimate is unnecessary.
[7 Apr 9:57] MySQL Verification Team
Thank you for the feedback.
Let me retry a few more times to see if I can reproduce this.

regards,
Umesh