Bug #57827 Adding UNIQUE index to disk data table is very slow with moderate amount of data
Submitted: 28 Oct 2010 22:08 Modified: 4 Nov 2010 15:35
Reporter: Daniel Smythe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Cluster: Cluster (NDB) storage engine Severity:S3 (Non-critical)
Version:7.1.6 OS:Any
Assigned to: Frazer Clement CPU Architecture:Any
Tags: disk data tables, ndb_restore, unique index

[28 Oct 2010 22:08] Daniel Smythe
Description:
When adding a UNIQUE key to a cluster table with disk data, it can take a very long time with moderate amounts of data. 

This issue appears most notably when using ndb_restore --rebuild-indexes.

Investigation is showing that full table scan of disk data is performed, and possibly in random sequence.

How to repeat:
Create a UNIQUE key on a table which is mostly disk data, and filled with a moderate amount of data. This is the test case I used.

#!/bin/bash

mysql -NBe "CREATE LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_1.log' ENGINE NDBCLUSTER;"
mysql -NBe "ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_2.log' ENGINE NDBCLUSTER;"
mysql -NBe "ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_3.log' ENGINE NDBCLUSTER;"
mysql -NBe "ALTER LOGFILE GROUP lg_1 ADD UNDOFILE 'undo_4.log' ENGINE NDBCLUSTER;"

mysql -NBe "CREATE TABLESPACE ts_1 ADD DATAFILE 'data_1.dat' USE LOGFILE GROUP lg_1 INITIAL_SIZE 1G ENGINE NDBCLUSTER;"
mysql -NBe "ALTER TABLESPACE ts_1 ADD DATAFILE 'data_2.dat' INITIAL_SIZE 1G ENGINE NDBCLUSTER;"

mysql -NBe "DROP TABLE IF EXISTS test.t1"
mysql -NBe "CREATE TABLE test.t1 ( id INT NOT NULL DEFAULT NULL PRIMARY KEY AUTO_INCREMENT STORAGE MEMORY, \
  a INT NOT NULL STORAGE MEMORY, \
  b INT NOT NULL STORAGE DISK, \
  c VARCHAR(255) NOT NULL STORAGE DISK, \
  d VARCHAR(255) NOT NULL STORAGE DISK \
  ) TABLESPACE ts_1 ENGINE NDBCLUSTER"

echo "Starting hugoFill, cancel it before you run out of space."
hugoFill -b 2500 -d test t1

Allow hugoFill to load the table up. I stopped it around 6 Mil rows.

Then run this:

#!/bin/bash

echo "How many rows did we get?"
mysql -NBe "SELECT count(*) FROM test.t1"

echo "Time to add the unique key on (a)"
time mysql -NBe "AlTER TABLE test.t1 ADD UNIQUE KEY(a)"

It will take a very, very, very long time to complete.

Here is some sample data from my system while this is running:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  4 598040  61156   4784 389500    0    2    40   417    3    6  3  2 88  7  0
 0  6 598028  54700   4852 395768    0    0  5914   387 1438 4849  2  2 15 82  0
 0  5 597988  52716   4936 398352   15    0  5390   330 1392 4596  1  2 16 81  0
 0  5 597912  52592   5152 398108   49    0  5105   290 1432 4445  2  2  3 93  0
 0  4 597908  51972   5240 398160    0    0  5629   417 1450 4792  1  2 11 86  0

root@linux bug]# pidstat -du -p 21664,21612,21613,21619,21620,21657,21658,21664,21665 30
Linux 2.6.34.7-56.fc13.x86_64 (linux)   10/28/2010      _x86_64_        (2 CPU)

02:07:50 PM       PID    %usr %system  %guest    %CPU   CPU  Command
02:08:20 PM     21664    0.00    0.00    0.00    0.00     0  ndbd
02:08:20 PM     21612    0.00    0.03    0.00    0.03     0  ndbd
02:08:20 PM     21613    0.47    0.80    0.00    1.27     1  ndbd
02:08:20 PM     21619    0.00    0.00    0.00    0.00     0  ndbd
02:08:20 PM     21620    0.53    0.67    0.00    1.20     1  ndbd
02:08:20 PM     21657    0.00    0.00    0.00    0.00     0  ndbd
02:08:20 PM     21658    0.53    0.73    0.00    1.27     1  ndbd
02:08:20 PM     21665    0.53    0.73    0.00    1.27     0  ndbd

02:07:50 PM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
02:08:20 PM     21664      0.00      0.00      0.00  ndbd
02:08:20 PM     21612      0.00      0.00      0.00  ndbd
02:08:20 PM     21613   1327.20     74.53      0.00  ndbd
02:08:20 PM     21619      0.00      0.00      0.00  ndbd
02:08:20 PM     21620   1367.47     72.27      0.00  ndbd
02:08:20 PM     21657      0.00      0.00      0.00  ndbd
02:08:20 PM     21658   1604.27     72.27      0.00  ndbd
02:08:20 PM     21665   1523.20     73.33      0.00  ndbd

And a 271 second strace summary of all the pids:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.59  191.609216        1138    168336       108 futex
 13.50   36.636437        1600     22902           select
  6.79   18.440219       10690      1725           poll
  5.34   14.491097         294     49369           pread
  2.52    6.849015        1839      3724           fsync
  0.70    1.888898           2   1236956           epoll_wait
  0.37    0.997801         692      1442           write
  0.09    0.242967           1    301570           writev
  0.04    0.117405          34      3480           pwrite
  0.04    0.105462         732       144           open
  0.02    0.044259           0    354596           recvfrom
  0.01    0.018998        4750         4           restart_syscall
  0.00    0.007077           0     56279           sendto
  0.00    0.002075          12       172           stat
  0.00    0.000113           1       144           close
  0.00    0.000000           0        57           times
------ ----------- ----------- --------- --------- ----------------
100.00  271.451039               2200900       108 total

Suggested fix:
Index building operation should not scan disk data.
[29 Oct 2010 15:23] Frazer Clement
Proposed patch

Attachment: bug57827.patch (text/x-patch), 3.41 KiB.

[29 Oct 2010 15:23] Frazer Clement
DICT tells TRIX to tell SUMA to tell LQH not to scan disk for Unique index build.
[29 Oct 2010 15:24] Frazer Clement
Should probably fix in 6.2/6.3+
[29 Oct 2010 18:06] Daniel Smythe
Without patch, the above mentioned test ran in 572m 5.767s
With patch, same test ran in 2m 44.467s

Tested on 7.1.6
[29 Oct 2010 19:34] Jonas Oreland
6.3 is sufficient
[1 Nov 2010 13:17] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/122429

3330 Frazer Clement	2010-11-01
      Bug#57827 Adding UNIQUE index to disk data table is very slow with moderate amount of data
      
      Modify unique index build not to read disk based columns.
[1 Nov 2010 14:33] Bugs System
Pushed into mysql-5.1-telco-7.0 5.1.51-ndb-7.0.20 (revid:frazer@mysql.com-20101101141607-cu5p3i9n0jepzwer) (version source revid:frazer@mysql.com-20101101141607-cu5p3i9n0jepzwer) (merge vers: 5.1.51-ndb-7.0.20) (pib:21)
[1 Nov 2010 14:34] Bugs System
Pushed into mysql-5.1-telco-6.3 5.1.51-ndb-6.3.39 (revid:frazer@mysql.com-20101101131522-33zap16jsi2lj8mw) (version source revid:frazer@mysql.com-20101101131522-33zap16jsi2lj8mw) (merge vers: 5.1.51-ndb-6.3.39) (pib:21)
[1 Nov 2010 15:40] Frazer Clement
Also pushed to 7.1.9
[4 Nov 2010 15:35] Jon Stephens
Documented fix in the NDB-6.3.39, 7.0.20, and 7.1.9 changelogs, as follows:

        Adding unique indexes to NDB tables could take an extremely long
        time. This was particularly noticeable when using ndb_restore
        --rebuild-indexes.

Closed.