Bug #914 MyISAM INSERT blocking SELECT
Submitted: 24 Jul 2003 17:32 Modified: 23 Jan 2004 19:35
Reporter: Paul Coldrey Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S2 (Serious)
Version:4.0.14 OS:Linux (RedHat 9.0)
Assigned to: Alexander Keremidarski CPU Architecture:Any

[24 Jul 2003 17:32] Paul Coldrey
Description:
Here is my follow up to the long and sad story that was bug #712.

Upon installing 4.0.14 I ran the piece of code that alerted me to bug #712 (and the related issues that are still outstanding) and the same dismal performance resulted. However, when I attempted to put together a nice test case I discovered that at least some portion of the problems with 4.0.13 had been fixed. I had wanted to isolate the problem a bit further but unfortunately time has escaped me so here is a rather onerous test suite that should expose the problem. As always seems to be the case, the error occurs in the most convolved part of our system and so I apologise in advance to anyone who tries to glean what I'm trying to achieve with this code but I promise it is actually logical.

The basic gist is:

. When permissions are rebuilt the insert in mysqlbug2 happens and insert around 120,000 rows. All these rows have the same SequenceId. Once the new permissions have been fully inserted the previous permissions (with a lower SequenceId) are deleted.
. When people are browsing and a menu needs to be drawn the select in mysqlbug3 is executed and is locked - as shown below. This results in users having to wait 10-20 seconds for a page to render. (Note: the query in mysqlbug3.sql is called recursively to build submenus (and the Grp value changes - I hard-coded it to 112). The SiteId and PersonId are also variables but they remain the same for a given set of calls of the query.)

+-----+------+----------------+----------+---------+------+--------------+-------------------------------------------------------------------------
| Id  | User | Host           | db       | Command | Time | State        | Info                                                                     
+-----+------+----------------+----------+---------+------+--------------+-------------------------------------------------------------------------
| 576 | root | localhost      | BugMe    | Sleep   | 521  |              |                                                                          
| 613 | root | dbserver:57522 | Intranet | Query   | 10   | Sending data | INSERT INTO MenuPermission(SequenceId, MenuId, PeopleId) SELECT 78, 
| 622 | root | dbserver:57533 | Intranet | Query   | 8    | Locked       | SELECT Distinct Id, Type, Item, Link, ILink, MenuGrp, Location, Args, 
| 623 | root | localhost      |          | Query   | 0    |              | show processlist                                                         
+-----+------+----------------+----------+---------+------+--------------+-------------------------------------------------------------------------

As an aside in 4.0.13 the query in mysqlbug2.sql appeared to lock the table for all accesses (I haven't tried this again but I am certain this was the case, as is the guy who was sitting next to me as a sanity check). In 4.0.14 I am to be able to do simple selects while the insert occurs (hence it appears the really heinous bug has been fixed).

Also interesting is that when I run the insert in mysqlbug2.sql on an empty table it inserts 118,000 rows in 2.75 seconds. If I run it again then the next 118,000 rows take 22.5 seconds to insert... is this the expected behaiour??

Cheers,

     Paul.

How to repeat:
I have uploaded the files to ftp://support.mysql.com:/pub/mysql/secret with filename mysqlbug_712.tgz and they can also be found at http://www.ensigma.com.au/mysqlbug.tgz (475k) if anyone else wants to play.

The salient parts of the database are included in the file mysqlbug1.sql. The query that is used to insert information into my permissions table is included in mysqlbug2.sql and the query that blocks is included in mysqlbug3.sql
[24 Jul 2003 20:08] Byron Kidd
I've been following the thread on bug #712 closely as I am also almost ready to take the plunge and upgrade from 3.23 to reap the rewards of improved full text searching and replication.  Investigation of bug #712 (and now #914) has stalled my enthusiasm.

Using the test case provided by Paul Coldrey on 4.0.14 under Windows XP Professional (SP1) I execute the SELECT (mysqlbug3.sql) while the INSERT (mysqlbug2.sql) is running at which point the SELECT blocks and no results are returned.  Once the INSERT finally does complete the SELECT returns its results in an instant.  

Unlike the results reported in the post above I never actually see the SELECT status as 'locked' in the process table.  Infact after numerous attempts I haven't actually seen the SELECT in the process table at all!!  

The only process information I receive is :

+-----+------+----------------+-------+---------+------+--------------+---------
--------------------------------------------------------------------------------
-------------+
| Id  | User | Host           | db    | Command | Time | State        | Info

             |
+-----+------+----------------+-------+---------+------+--------------+---------
--------------------------------------------------------------------------------
-------------+
|  52 | ODBC | localhost:1571 | NULL  | Query   | 0    | NULL         | show pro
cesslist
             |
| 124 | ODBC | localhost:1699 | bugme | Query   | 17   | Sending data | INSERT I
NTO MenuPermission(SequenceId, MenuId, PeopleId)
SELECT 1, Menu.Id, GroupMemberLookup.Perso |
+-----+------+----------------+-------+---------+------+--------------+---------
--------------------------------------------------------------------------------
-------------+

Although I have yet to test my own applications under 4.0.*, I have been able to reproduce the bug given the test procedure provided.
[25 Jul 2003 3:12] Tobias Lind
Be aware that it's normal behavior for INSERTS to block SELECTS with MyISAM-tables. Please run OPTIMIZE and do not delete any rows before running a test-case to show this bug. If SELECTS are still blocked, then we have an issue here, and it's serious. I think many users have optimized their application to benefit from non-blocking INSERTS in a table without holes. In my application I always mark rows in a huge table instead of deleting them right away. Later (about once a week) I run a script that removes all marked rows at once and runs OPTIMIZE... 

/Tobias Lind
[25 Jul 2003 5:46] Sergei Golubchik
I am not sure I understand "How to repeat" section.

There're three files in the archive:mysqlbug1.sql  mysqlbug2.sql  mysqlbug3.sql.

Am I supposed to load data with

mysql < mysqlbug1.sql

and then do insert with

mysql < mysqlbug2.sql

and while insert is working try to select with

mysql < mysqlbug3.sql

?

If yes, then I wasn't able to repeat the bug - mysqlbug3 was happily executed concurrently with mysqlbug2.
[25 Jul 2003 7:17] Alexander Keremidarski
Paul,

First of all thank you for your efforts and especially for good test case. However it was not good enoug to expose problem. I had to modify it before I was able to repeat "Locked" state.

With your test saw locked SELECT, but it happens once in a while, not everytime.

Interestingly your test case can be used for demonstration how Query cache works. Your SELECT query is perfect candindate for caching.

First thing is to make sure QCache is off otherwise your SELECT will be retrieved from Qcache at about 99.99% of time. 

This by the way can explain Byron Kidd's observation. Cached queries return result immediately so yiou can;'t see them in SHOW PROCESSLIST output.

I ran your INSERT in a loop and at another terminal I ran SELECT in a loop

while true; do mysql BugMe < mysqlbug2.sql; done
while true; do time mysql BugMe < mysqlbug3.sql; done

All times were within 0.02 - 0.5 seconds range and no Locked in SHOW PROCESSLIST

Then I tried to delete most of rows in MenuPermission without stoping above loops and I found that *sometimes* there are peaks in select time up to 3 seconds.

Once (just once) I saw select running for 2.5 minutes! and yes at this moment in SHOW PROCESLIST I was able to see select is Locked. Once!

All of the above is with Query Cache turned Off and single-session Select. 

With Query Cacne on select times are within 0.01 - 0.04 seconds range.

At next step I modified your test case in order to satsify:

"Once the new permissions have been fully inserted the previous permissions (with a lower SequenceId) are deleted."

As result SELECT query is Locked at the moment of DELETE which is normal. For subsequent INSERTs SELECT works as before. 

However repeating above procedure (INSERT then DELETE) leads to strange situation when it looks like there is Stalled lock hich remains after DELETE unbtil subsequent INSERT runs.

I have 2 SELECTing loops at thsi point and as per SHOW PROCESSLIST I can see states like "Locked", "System lock" and "Waiting for table".

SELECT time now varies betwen 0.024 sec and 6min 51.219sec !

So this is something to be investigated further.

Just for the record I tested with debug mysqld built from bk tree (post-4.0.14)
I'm running RedHat 9 too.
[4 Aug 2003 6:10] Paul Coldrey
As a quick update for anyone who may be working on this.... it appears the problem may be related to RedHat 9.0. Firstly I moved all the tables that might suffer from concurrency issues to InnoDB and this fixed some problems but the speed was still dismal.

I then installed an old machine with RedHat 7.3 and did some tests with both 4.0.14 and 3.23.57. Performance is similar for both versions. Despite the machine being low-spec the performance was significantly better than my high-end server running 4.0.14 on RedHat 9 (40-50 seconds compared with 7-8 minutes!). Also of interest is that the speed is still poor even when only one process is accessing the database. I hope to tie down a neat test case in the next couple of days.
[4 Aug 2003 23:17] Lenz Grimmer
Thanks a lot for your investigation! Yes, we too noticed degraded performance when using the Red Hat 9 kernel - it seems like its swapping behaviour is suboptimal. Make sure to use "vmstat" to monitor the memory situation. Do you run the latest Red Hat errata kernel? They seem to be working on this (notice the bug IDs below):

# rpm -q --changelog -p kernel-2.4.20-19.9.src.rpm | head
warning: kernel-2.4.20-19.9.src.rpm: V3 DSA signature: NOKEY, key ID db42a60e
* Wed Jul 16 2003 Arjan van de Ven <arjanv@redhat.com>

- add proc and execve fixes
- update ide drivers to fix critical bugs

* Sat Jul 12 2003 Rik van Riel <riel@redhat.com>

- upgrade to latest -rmap to fix #89226, #90668, etc.
[7 Aug 2003 20:34] Paul Coldrey
Just tried it with kernel 2.4.20-19.9 (I think that's the latest and greatest) and there was no change - still seriously slow. I don't want to be a pain but is there a plan on how to fix this - do you know if it's a RedHat problem or whether it's common to all 2.4.20 kernels? Either way I'd guess that it effects a fair portion of your users.

It's now nearly 7 weeks since I reported the bug (and about 9 weeks since I first stumbled across it) and I'm starting to get a lot of grief from people higher up the chain..... I need to tell the something!
[7 Aug 2003 21:27] Paul Coldrey
A little more...

I decided to take as many variables out the equation as possible so that I could generate a simple test case. Hence, I had PHP dump all the SQL commands it was running, to a file, so that I could run the scripts from a command line and show some combinations of MySQL/RedHat taking 30 seconds and others taking 5+ minutes.

Sadly, (at least in the interests of getting this fixed!) from the command-line I couldn't replicate the problem! Hence, it's only slow when run from PHP 4.3.2 on RedHat 9 with any MySQL 4.0 or 4.1.....

BUT!!!!!!

If I compile PHP 4.3.2 with MySQL 4.0.14 on RedHat 7.3 it runs a treat. 

Hence, perhaps the finger should be pointed at a specific version of GLIBC????
[8 Aug 2003 1:07] Lenz Grimmer
Hi Paul, thanks for providing us more details. Sorry, but I don't understand
"it runs a treat" - does this mean it works as expected, or is it still slow?

Can you try to catpture the output of "vmstat 2", while the slowness occurs? It would be interesting to see, if the kernel is busy shuffling memory around.

What about the output of "top" - do you see that mysqld is hogging the CPU, when it takes so long to return the query results?

The glibc C library on Red Hat 9 _might_ be a factor here. Do you use our statically linked Linux binary, or did you roll your own? I could not find this
info in this thread. If it's a dynamically linked binary, does using "export LD_ASSUME_KERNEL=2.2.5" before starting the applications (mysql/PHP) make a difference? This variable forces the application to use the "old" LinuxThreads implementation instead of NPTL from glibc 2.3.2.
[10 Aug 2003 22:53] Paul Coldrey
Hi Lenz,

"Works like a treat" is Australian for "worked correctly" - hence I meant that the slowness was not seen in this case... unfortuantely we speak with so many colloquialisms that I don't always succeed in editing them all out :-)

I am running the standard RPM versions of 4.0.14. I have compiled PHP using both the "in-built" MySQL support and using the installed libraries (--with-mysql and --with-mysql=/usr) and both perform the same way.

I have included at the bottom of this post the output from vmstat... it seems pretty normal - there are a couple of bursts of swapping but they only last a couple of seconds. The output from top also seemed OK.. the general stats looked like:

Size:    28824
RSS      14M
SHARE    2123
STAT     D
CPU      2.5%
Mem      1.4%
CPU      1

At the point marked in the vmstat output this changed to numbers more like:

Size     29000
RSS      16M
Share    2288
CPU      30%
Mem      1.6%

And CPU stayed between 15% and 35% for the rest of the execution. 

Some other testing I tried:

1. As suggested, I set "export LD_ASSUME_KERNEL=2.2.5" before starting MySQL... still slow. When I tried to restart Apache/PHP with this set it wouldn't start and I got "(38) Function not implemented: Couldn't create accept lock" in the Apache error log.

2. I upgraded my 7.3 test machine to 9.0, ran the test suite and surprisingly enough it ran at full speed! It appears to be running the same kernel version and the same glibc version (2.4.20-8 and 2.3.2). The server that is causing troubles was using an SMP kernel so I changed this to single processor and the problem still persisted. I then uploaded the full database to the machine that had been running 7.3 (including all the tables not required by the code that runs slowly) and the execution time went up to 4 minutes.

3. On a whim, I ran the test on a Windows XP machine and I gave up waiting for it to finish at 10 minutes. The CPU usage hovered between 2% and 8% and there was 200Mb of free physical memory and yet the disk thrashed. After 10 minutes I stopped the browser page loading and the browser closed normally but the disk continued to thrash for at least 5 minutes and the mysqld-nt process kept using arounf 3% CPU. I opened up the my.ini file in the c:\WINDOWS directory to see if there were any rediculous settings which might make it overly disk intensive and found a whole section that I was meant to uncomment if using InnoDB (which I am). I also found a note at the top that said to make the setting effective I had to copy it to c:\my.cnf, so I'm guessing that it was running with the default configuration. This isn't the intended behaviour running InnoDB with default settings,... right??

Hence the summary seems to be:
* InnoDB and MyISAM servers both have degraded performance
* Our problem doesn't appear to be due to OS virtual memory issues (since there is almost no paging)
* database size appears to have an effect even when the tables in the query do no change size.
* This may be OS dependant but since it occurs on RedHat 9.0 and Windows XP, the OS probably isn't at the crux of the problem.
* On two RedHat 9.0 machine my low spec machine out-performs my high spec machine which may suggest a problem with software-RAID code (or something similar)..... but this doesn't explain the XP performance
* Perhaps the myriad of seemingly contradictary evidence suggests that there is more that one bug?

I've now spend a few full days trying to help track down this bug and I cannot afford to spend any more... I quipped about moving to PostgreSQL earlier in this whole saga but now that is being examined seriously as a contingency plan. Honestly, I don't want to do this, but the pressure is mounting.... can do you guys give me any idea when you planning to look at this problem. I, personally, would like to wait until you have fixed the  bug that you have found and see if it solves my problems, however, if this is not going to happen in the next fortnight then I may be forced to follow one of our contingency plans.

   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  0  0  85852  18136 189516 721220    0    1     8     9   67    33  2  1  8
 0  0  0  85852  18136 189516 721220    0    0     0     0  121    21  0  0 100
 0  1  1  85852  17736 189680 721340   38    0    98   428  329   776  2  1 97
 0  1  1  85852  17268 189976 721440   40    0    90   792  502  1429  2  2 96
 0  1  1  85852  16864 190300 721452   32    0    38   888  579  1104  2  4 94
 0  1  1  85852  16500 190596 721460   30    0    34   826  512  1168  1  3 96
 0  1  1  85852  16088 190976 721476    8    0    16   792  493  1046  2  1 97
 0  1  1  85852  16476 191284 721660   86    0    90  1030  564  1245  1  3 96
 1  1  1  85852  16116 191680 721624    0    0     6   824  498  1193  1  2 97
 0  1  1  85852  15792 192044 721580    2    0     6   820  503  1081  1  1 98
 0  1  1  85852  15488 192340 721592    2    0     8   898  540  1133  2  2 96
 0  1  1  85852  15400 192592 721600    0    0     4   782  485  1000  1  1 97
 0  1  0  85852  14856 192984 721652    0    0     6   980  543  1099  1  3 96
 0  1  1  85852  15632 193484 721660    0    0     4   840  510  1126  1  2 97
 0  1  1  85852  15220 193888 721668    0    0     4   818  504  1201  1  2 97
 0  1  1  85852  14740 194356 721680    0    0     6   886  529  1268  1  2 97
 0  1  1  85852  14332 194752 721692    0    0     6   780  485  1006  1  3 95
 0  1  1  85852  15100 195172 721740   16    0    20  1014  531  1107  2  1 97
 0  1  1  85852  14608 195648 721752    0    0     6   860  520  1235  0  1 99
 0  1  1  85852  14244 196004 721760    0    0     4   812  499  1168  2  2 96
 0  1  1  85852  14700 196268 721772    2    4     8   836  506  1370  1  1 98
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  1  1  85852  14428 196532 721780    0    0     4   822  510  1235  1  2 97
 0  1  1  85852  14132 196780 721824    0    0     6   984  518  1387  1  1 97
 0  1  1  85852  13904 196988 721844    0    4     4   816  498  1159  1  2 97
 0  1  1  85852  14752 197180 721944    0    0     4   828  506  1099  1  2 97
 0  1  1  85852  14236 197384 721960    0    0     6   844  521  1329  2  1 97
 0  1  1  85852  13872 197580 722040    0    4     4   814  498  1221  1  1 97
 0  2  1  85852  13904 197828 722104    0    4    32   878  507  1188  3  3 94
 0  1  2  85852  13352 198104 722136    0    4     6   998  528  1240  1  2 97
 2  0  0  85852  13204 198312 722160    0    4     4   778  487  1102  2  3 95
 0  1  1  85852  13076 198460 722180    0    4     4   834  519  1096  1  2 97
 0  1  1  85852  12888 198548 722280    0    4     6   832  506  1165  1  1 97
 0  1  1  85852  12780 198728 722288    0    0     4   846  518  1111  1  2 97
 0  1  1  85852  12640 198832 722332   16    0    22  1062  529  1167  2  2 96
 0  1  1  85852  12484 198964 722352    0    4     6   808  506  1022  1  1 97
 0  1  1  85852  12344 199144 722344    0    4     4   888  534  1152  1  1 97
 0  1  2  85852  12348 199144 722340    0    0     6   840  521  1223  2  1 98
 0  1  1  85852  12320 199172 722336    0    0     4   806  493  1162  1  1 97
 0  1  1  85852  12220 199332 722268    0    0     6  1060  552  1251  2  1 97
 1  0  1  85852  12156 199556 722296    0    4     4   848  512  1143  1  1 98
 0  1  1  85852  11900 199752 722368    0    4     6   866  521  1129  2  2 96
 0  1  1  85852  12060 199964 722376    0    0     4   814  505  1056  2 10 88
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  1  85852  11848 200176 722388    0    0     6   828  508  1467  2  2 96
 0  1  1  85852  11552 200364 722500    0    4     4  1014  555  1143  1  2 96
 0  1  2  85852  11816 200584 722520    0    0     6   864  523  1087  1  2 97
 0  1  0  85852  11660 200740 722520    0    0     4   752  501   941  0  2 98
 0  1  1  85852  11532 200872 722508    0    0     4   822  508  1023  1  1 98
 0  1  1  85852  11532 200872 722508    0    0     6   798  500  1374  2  2 97
 0  1  1  85852  11532 200872 722392    0    0     4  1024  514  1211  0  2 98
 0  1  1  85852  11460 200980 722388    0    0     6   832  512  1216  1  1 98
 0  1  1  85852  11456 201064 722320    0    0     4   808  484  1164  2  2 96
 0  1  1  85852  11288 201220 722328    0    0     4   886  536  1231  1  1 97
 0  1  1  85852  11480 201296 722336    0    0     6   820  514  1083  1  1 97
 1  1  1  85852  11432 201344 722180   24    0    28  1162  555  1212  1  2 97
 0  1  1  85852  11308 201464 722192    0    0     6   780  486  1022  1  3 96
 0  1  1  85852  11192 201572 722200    0    0     4   884  534  1157  2  2 96
 0  1  1  85852  11180 201572 722212    0    0     6   850  527  1123  0  2 98
 0  1  1  85852  11152 201592 722220    0    0     4   818  501  1023  2  1 97
 0  1  1  85852  11132 201592 722260    0    0     4  1018  538  1150  2  2 96
 0  1  1  85852  11120 201592 722268    2    0     6   816  504  1028  1  1 98
 0  1  2  85852  11040 201664 722280    0    0     6   822  509  1066  1  2 97
 0  1  1  85852  11032 201664 722288    0    0     4   822  504  1261  0  1 99
 0  1  1  85852  11232 201748 722284    0    0     6   832  501  1061  1  2 97
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  1  85852  11144 201836 722280    0    0     6   994  517  1085  1  1 98
 0  1  1  85852  11144 201836 722272    0    0     4   896  535  1220  0  2 98
 0  1  1  85852  11144 201836 722272    0    0     6   828  513  1154  1  1 97
 0  1  1  85852  11144 201836 722268    0    0     4   826  506  1107  3  3 94
 1  1  2  85852  11144 201836 722268    0    0     6   862  526  1101  2  2 96
 0  1  1  85852  11116 201836 722108    0    0     4  1084  516  1099  1  2 97
 0  1  1  85852  11116 201836 722116    0    0     4   834  526  1047  0  1 99
 0  1  1  85852  11116 201836 722128    0    0     6   868  528  1105  1  1 99
 0  1  1  85852  11116 201836 722140    0    0     6   778  492  1045  1  2 98
 0  1  1  85852  11116 201836 722152    0    0     6   868  526  1133  2  2 96
 0  1  1  85852  11084 201868 722196    0    0     6  1038  527  1131  1  1 98
 1  0  1  85852  11084 201868 722204    0    0     4   774  483  1077  1  1 98
 0  1  0  85852  11084 201868 722216    0    0     6   810  500  1052  1  6 92
 0  1  1  85852  11084 201868 722224    0    0     4   776  482  1016  2  2 96
 0  1  1  85852  11084 201868 722232    0    0     4   824  498  1017  2  1 97
 0  1  1  85852  11084 201868 722284    0    0    26   924  494  1026  1  2 96
 0  1  1  85852  11080 201868 722296    0    0     6   846  521  1135  2  4 94
 0  1  2  85852  11068 201876 722304    0    0     4   806  492  1052  2  1 97
 0  1  1  85852  11064 201876 722316    0    0     6   888  540  1184  2  2 96
 0  1  1  85852  11064 201876 722324    0    0     4   848  516  1294  2  1 97
 0  1  2  85852  11040 201876 722400    0    0     6   976  516  1275  2  2 96
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  1  1  85852  11036 201876 722408    0    0     4   878  532  1120  1  1 98
 0  1  1  85852  11020 201876 722416    0    0     4   794  486  1044  2  3 95
 0  1  1  85852  11008 201876 722428    0    0     6   816  498  1194  1  1 97
 0  1  1  85852  10968 201876 722468    0    0    20   852  522  1101  2  1 97
 0  2  1  85852  10952 201876 722480    0    0     6   964  511  1453  2  1 97
 0  1  1  85852  10792 201876 722584    0    0     4   914  504  1366  1  2 96
 0  1  1  85852  10744 201924 722592    0    0     4   836  514  1062  0  1 98
 2  0  0  85852  10736 201932 722600    0    0     4   814  500  1065  2  4 94
 0  1  1  85852  10728 201932 722612    0    0     6   862  530  1106  0  1 98
 0  1  1  85852  10724 201932 722624    0    0     6   810  504  1087  1  2 96
 0  1  1  85852  10716 201932 722632    0    0     4   986  526  1050  1  2 97
 0  1  1  85852  10716 201932 722640    0    0     4   836  514  1059  1  1 98
 0  1  1  85852  10712 201932 722652    0    0     6   830  519  1071  1  1 97
 0  1  1  85852  10912 201932 722656    0    0     4   820  512  1082  1  1 97
 0  1  1  85852  10912 201932 722652    0    0     4   804  495  1077  1  2 97
 0  1  1  85852  10872 201932 722440    0    0     6  1064  512  1308  1  2 97
 0  1  1  85852  10872 201932 722448    0    0     4   842  522  1078  1  3 96
 0  1  1  85852  10732 202072 722460    0    0     6   842  510  1270  1  2 97
 0  1  1  85852  10708 202096 722468    0    0     4   890  534  1183  2  1 97
 0  1  1  85852  10172 202096 722496    0    0    22   822  509  1075  2  1 97
 0  1  2  85852  10172 202096 722496    0    0     4  1064  538  1107  1  3 96
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  1  1  85852  10144 202124 722496    0    0     4   854  522  1557  1  2 97
 0  1  1  85852  10048 202228 722492    0    0     6   850  516  1239  1  3 96
 0  1  1  85852   9996 202280 722492    0    0     4   852  519  1177  1  0 98
 0  1  1  85852  10156 202324 722488    0    0     4   822  498  1120  1  2 97
 0  1  1  85852   9988 202328 722512    0    0     8   980  527  1106  1  1 97
 0  1  1  85852   9988 202328 722520    0    0     4   884  531  1157  1  2 97
 1  0  1  85852   9988 202328 722532    0    0     6   816  511  1074  1  2 97
 0  1  1  85852   9988 202328 722540    0    0     4   834  509  1098  1  1 98
 0  1  1  85852   9988 202328 722552    0    0     6   890  546  1234  2  2 96
 0  1  1  85852   9988 202328 722592    0    0     4   962  499  1180  1  3 96
 1  1  1  85852   9988 202328 722600    0    0     4   878  532  1113  1  2 96
 0  1  1  85852   9988 202328 722612    0    0     6   816  503  1144  1  2 97
 0  1  1  85852   9984 202328 722624    0    0     6   850  519  1190  1  1 98
 0  1  1  85852   9984 202328 722632    0    0     4   904  544  1179  2  2 96
 1  0  1  85852   9984 202328 722748    0    0     4   980  506  1113  0  1 98
 0  1  0  85852   9984 202328 722744    0    0     6   814  501  1078  1  1 97
 0  1  1  85852   9984 202328 722744    0    0     4   872  534  1279  1  2 97
****************************************************
-> Jump in mysqld stats in 'top' occured here
****************************************************
 1  1  1  85852   9756 202328 722820   88    0   134   776  477  1208  6  2 93
 2  0  0  85852   9008 202328 722664   26    0   168   590  173   190 45  1 54
 2  0  0  85852   9220 202328 720904  156    0   520    90  201   403 65  1 33
 1  0  0  85852   9776 202300 720140  562    4   758  4050  573  1466 48  5 47
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 1  1  1  85852   9576 202308 720332    0    0   484  3472  346   989 36  4 60
 1  0  0  85852   9780 202312 720444    4    0   278  4280  501  1069 29  6 66
 0  1  4  85852  10488 202320 719956    0    0    84  7228  669  1484 26  7 66
 1  0  0  85852   9996 202320 720340   32    0    76  4136  429   847 31  6 63
 1  0  0  85852   9484 202320 720660  136    0   162  4600  541  1484 29  6 65
 1  0  0  85852   9480 202320 720664    0    0    78  6250  595  1276 30  4 66
 0  1  1  85852  10136 202320 722036    0    0   142  8380  717  1504 22  7 71
 0  1  1  85852   9832 202320 722520    0    0    80  7046  684   936 22  7 71
 0  2  1  85852   9660 202320 722624    0    0    20  6838  913  2413 13  6 80
 0  1  2  85852   9808 202320 722740    0    0    92  9664 1014  2703 13 10 76
 0  1  2  85852   9856 202320 722800    0    0    12  9644  954  2706 13  8 79
 1  0  0  85852   9820 202320 722800    0    0    16  7648  967  2723 12  6 82
 0  2  1  85852   9776 202320 722848    0    0     0 10556 1133  2943  9  9 83
 0  1  1  85852   9696 202320 722928    0    0    12 10482 1115  3088  9 11 80
 1  0  0  85852   9664 202320 723172    0    0   168  8902 1058  2998  9  9 82
 0  1  1  85852   9608 202320 723228    0    0    26  9546 1035  2913 12 10 78
 0  1  2  85852   9648 202320 723088    0    0     0  8008  977  1660 15  6 79
 0  1  1  85852   9768 202320 722888    0    0    24  9012  869  1537 12 14 74
 0  1  1  85852   9768 202320 723008    0    0     4 10470 1105  2496  9 11 80
 0  1  1  85852   9660 202320 723144    0    0    12  8968 1017  2628 10  9 81
 0  1  2  85852   9628 202320 723144    0    0     0 10312 1107  2530  9 10 81
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  2  85852   9608 202320 723184    0    0    28  8882 1022  2196 12  8 80
 2  0  0  85852   9868 202320 723008    0    0     0  9120  989  2659 12  7 81
 1  0  0  85852   9748 202320 723264    0    0    64  8546  900  1735 11  8 82
 0  1  2  85852   9720 202340 723156    0    0     8 10342  961  2100 11 11 79
 1  0  1  85852   9648 202340 723248    0    0    24 10216 1072  2297  8 10 83
 1  1  2  85852   9644 202340 723328    0    0     8  9360 1091  2398  9  9 83
 0  1  1  85852   9644 202340 723320    0    0     0 10364 1082  1785  8 13 79
 0  1  1  85852   9676 202340 723208    0    0     4 10348 1074  2722  9  8 83
 0  1  1  85852   9612 202340 723504    0    0    44  8740  981  2061 12  9 79
 0  1  1  85852   9736 202340 723288    0    0    44  9688  970  2613 11 10 79
 0  2  1  85852   9728 202340 723320    0    0     0  9546 1049  3063  9  9 82
 0  1  3  85852  10012 202340 723156    0    0    20 10420  969  2422 10 10 79
 0  1  1  85852   9880 202340 723328    0    0    38  8844  960  2457  8  9 83
 0  1  1  85852   9744 202340 723464    0    0    12  8902 1121  3192  8 10 81
 1  0  0  85852   9696 202340 723508    0    0    20  8580 1000  2818 13  9 78
 0  1  2  85852   9596 202340 723328    0    0    50 10276 1020  2377 15  9 76
 0  1  2  85852   9668 202340 723036    0    0     0  8934 1034  2306 10  7 83
 0  1  1  85852   9660 202340 723060    0    0    12  8904 1038  2614  8  9 83
 1  0  1  85852   9668 202340 723060    0    0     0 10376 1181  3440  9 10 81
 1  0  0  85852   9664 202340 723108    0    0     0  9796 1009  2672  8 11 81
 1  1  1  85852   9756 202328 722820   88    0   134   776  477  1208  6  2 93
 0  1  1  85852   9664 202340 723124    0    0     0 10418 1169  3808  9  9 82
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  1  85852   9588 202340 723364    0    0    10  8022  945  1934 16  8 76
 1  0  0  85852   9588 202340 723276    0    0     8  8948  923  1834 13  9 78
 0  1  1  85852   9600 202340 723076    0    0    32 10178  904  2108 12 12 76
 0  1  1  85852   9684 202340 723068    0    0     8  8818 1092  2136  9  8 83
 0  1  1  85852   9700 202340 723132    0    0    12 10384 1111  3423 10 10 80
 0  1  1  85852   9592 202340 723132    0    0     8  8196  946  2404 12  9 79
 0  1  2  85852   9596 202340 722900    0    0    32  9444 1030  2739  9  8 83
 0  1  2  85852   9608 202340 722724    0    0     0 10518 1079  2320 10 11 79
 1  0  0  85852   9608 202340 722820    0    0     0  8798 1123  2883  9  8 84
 0  1  1  85852   9608 202340 722884    0    0     8 11196 1106  3253 10 16 74
 1  0  0  85852   9608 202340 722980    0    0     8  9590 1072  2282  8 14 78
 0  1  1  85852   9608 202340 723068    0    0     4 10448 1144  2012  9  9 81
 0  1  2  85852   9604 202340 723124    0    0    12 10410 1050  2593  8 11 82
 0  1  1  85852   9828 202340 722984    0    0     0  9458 1056  2845  9  9 81
 1  0  1  85852   9808 202340 723064    0    0     0  8282 1017  2730 11  8 80
 0  1  1  85852   9660 202340 723048    0    0     8  8184  844  2003 18  8 74
 2  0  0  85852   9592 202352 723004  238    0   514  2824  522  1158 36  8 55
 2  1  2  84540   9988 202360 722428   52    0   270  3180  619  1843 50  9 42
 0  1  2  84540  10080 202376 721740    0    0   332  5510  690  1619 28  7 65
 1  0  0  84540  10080 202376 721740    0    0     0  3566  541  1367 29  4 67
 0  1  1  84540  10096 202376 721724    0    0     0  8724  987  3108 12  9 79
   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  1  1  84540  10096 202376 721724    0    0     0  7820  960  2736 12  7 81
 1  0  0  84540  10100 202376 721724    0    0     0  7758  975  2757 13  8 79
 0  1  2  84540  10100 202376 721724    0    0     0  8556  851  2026 17  8 74
 0  1  1  84540  10100 202376 721724    0    0     0  5450  765  1950 20  4 76
 0  1  1  84540  10112 202376 721724    0    0     0  6922  870  2438 19  7 75
 1  0  0  84540  10112 202376 721724    0    0     0  5230  701  2023 24  5 71
 0  1  1  84540  10116 202376 721724    0    0     0  5294  686  1933 25  5 70
 0  1  2  84540  10116 202376 721724    0    0     0  5048  572  1734 29  6 65
 0  1  1  84540  10116 202376 721724    0    0     0  4674  617  1548 27  4 69
 1  0  0  84540  10120 202376 721724    0    0     0  4420  677  1818 25  6 69
 1  0  0  84540  10120 202376 721724    0    0     0  3656  500  1309 31  4 65
 1  0  0  84540  10124 202376 721724    0    0     0  3748  473  1246 32  4 64
 1  0  0  84540  10124 202376 721724    0    0     0  1970  323   286 41  1 58
[11 Aug 2003 3:52] Alexander Keremidarski
Reading carefully your comments conclusion is that you found that MySQL 4.0.14 works slower on RedHat 9 than on RedHat 7.3 and also 4.0.14 is slower on Windows XP (compared to what?)

Is that true?
[12 Aug 2003 13:39] Sergei Golubchik
salle until you finish these investigations let this bug be assigned to you, ok ?
[27 Aug 2003 7:53] Michael Widenius
I don't have yet a solution to this problem, but some ideas that may
help us find the problem.

First some background:

- If your application worked good in 3.23 it should work at least as good in
  the same environment in 4.0.
- The way MyISAM does locking has not changed between 3.23 and 4.0.
  MyISAM still uses table locking and SELECT and INSERT can't be run at the
  same time (except in the case where you insert into a table without deleted
  rows).   Several SELECTS can however be run at the same time.

  This is described in detail in http://www.mysql.com/doc/en/Table_locking.html

- You can influence how MyISAM does locking by starting mysqld with:
  --low-priority-updates (in which case MySQL will prefer selects before
  inserts) or by using INSERT LOW_PRIORITY ...
  You can also schedule more SELECTS between inserts by setting the variable
  max_write_lock_count to a low value (for example 10).

- If you need to be able to read and write at the same time, then
  Innodb tables should be the way to go. If you set up buffers for InnoDB you
  should get very good speed in most case!
  (If you don't set up buffers InnoDB tables will work, but very slow).

- MyISAM doesn't work as good on windows as on Linux because Windows
  doesn't cache the disk as good as Linux (which is needed because MyISAM
  depends on the system to cache rows (not index).  This could explain the
  test results you got on windows.  Note that InnoDB doesn't have this
  problem on Windows and is know to deliver very good performance on
  this platform (if buffers are set up correctly).

- The first insert in a MyISAM table is much faster than future inserts
  because when you do an insert into an empty tables we create the
  index in a separate batch.  You can get future index faster by increasing
  the key_buffer_size, myisam_sort_buffer_size and bulk_insert_buffer_size.
  If you don't specify key_buffer_size at all, then your inserts will
  be significantly slower as your tables grow.

- MySQL 4.0 has more caches and much more parameters to tune than 3.23.
  (The most important ones are record_rnd_buffer_size, bulk_insert_buffer_size
  and query_cache_size).
  If we could get your my.cnf and status variables we could give you tips
  how to get more speed for your queries.
  (Note that the problem could be that you have set up a high value for
   some cache in 3.23 but not for a new cache in 4.0 that does the same
  job)

- If you are using a binary downloaded from www.mysql.com it should not
  be a glibc bug as our binaries are staticly compiled and thus doesn't
  use the glibc on RedHat 9.

- According to vmstat there was a lot of swap activity when you said that
  mysqld started to go slow.  Usually this indicates some very strange query
  or that another (maybe sleeping) program started.

Here is what you can try next:

- Can you post the output from 'mysqladmin var ext' when the
  slowdown happens?

- What options do use to start mysqld ?
  Please post your my.cnf files here.

  You can also get this by doing:
  my_print_defaults mysqld server mysqld_safe safe_mysqld

- When you tried with innodb, what settings for the innodb buffers did you use.
  The default settings are very low and not suitable for heavy load with
  large tables.  Check settings in my-large.cnf for an idea how to set them.
  You should also try to set innodb_flush_log_at_trx_commit to 0 for even
  higher speed.

- Did you try to run MySQL 3.23 on the new redhat 9.0 machine ?
  Does this work ok ?

- You said that the time went up to 4 minutes.
  Can you please describe in more detail what you meant ?
  Can you make a test for this ?

- Can you create a better test script that also gets the old 7.3
  machine to fail (now when it's upgraded to 9.0)

- Try a kernel from Suse 8.2; They have compiled their kernels with
  totally different options than Redhat.  If this works, then we know a lot
  more.

- I talked with the Redhat people on Linuxworld and they told me that
  their default kernels are optimized for desktop usage. If you want to have
  a (pre-compiled ?) kernel that is suitable for database usage you MUST
  buy RedHat advanced server.  They didn't tell me exactly what was changed
  except that at least the memory system was significantly different in
  advanced server.
  I wonder if they also changed how the file system was cached
  (which could explain a lot of the things you are seeing)

  Could you try if upgrading to RedHat advanced server would help ?

- Have you tested converting your tables to InnoDB (after setting up the
  innodb buffers to match your system). If yes, what changed and what
  performance numbers did you get ?

  The main reason for trying innodb is that innodb has it's own disk cache
  system and is not dependent on the system to cache data.  If you are mainly
  using innodb tables you should set up the innodb buffers to use 80 % of
  the free memory you have on your machine or enough memory to cache a 
  big part of your tables.

- Try moving swap and /tmp to another hardisk.

- Check that you have the same dma settings on both machines on your
  hard disks. (you can do this with hdparm)

- Check that you don't have any program that uses a lot of disk (like
  updatedb) when the slowdown happens!

Some ideas of where the problem could be:

- Bug in kernel.
- Kernel optimized for 'workstation' which is not suitable for
  live data processing with uses lots of memory.
- Some hardware problem on your 'faster' machine.
- Some cache option in MySQL 4.0 that is not optimal
- Some bug in MySQL 4.0 (but not that likely as MySQL 4.0 works ok
  your other machine)
- Other program accessing disk or taking up resources.

We are doing our best to try to repeat this problem.  To do this we
would however need a way to reproduce this.

I am now going to look at the problem Alexander Keremidarski
reported in the bug report and try to reproduce this.

I will also try today to run your tests script on my Redhat 9.0
machine to check things out, but I am worrying that I may not be able
to reproduce this without having access to the startup options you
used for MySQL.

Another option would be if we could get access to your 'new system'
with some scripts that we could use to reproduce this.

Other information:

According to what I have been informed after writing the above, there
is a follow up to this bug in #1027. There has also been some
discussion on the bugs@lists.mysql.com email list about this.

- The config file post in #1027 doesn't have any buffers setup and
  explains why MySQL doesn't work on full speed.
  (It doesn't however explain why the 9.0 machine was much slower)
- For not it looks like a bug in Redhat 9.0.  We are of course
  still interested to fix this (any bugs that causes MySQL to performance
  bad is interesting to us). We would however need a good test case
  to find out what could be the problem.
  One thing we could do is to put RedHat 9 on one on our benchmark machine
  and run our benchmarks on it to see if we can find any notable difference
  between this and Suse 8.2.  I will instruct our benchmark team to do this
  ASAP.
  It would be very interesting to know if RedHat Advanced server also have
  the same problem.

Regards,
Monty
[28 Aug 2003 16:53] Paul Coldrey
Hi Monty,

Firstly, thanks for all your comments :-)

From my point of view this bug could be moved into "cannot repeat" status. However, Alex may have a different opinion??

I do not feel that my test case properly illustrates the issue I am experiencing (sorry). My problems may be related to the issues that Alex saw, but I am not convinced this is the case (and, in fact, I strongly suspect it is not the case).

Here are some related comments from an email that I sent directly to Alex 
when we were discussing the fate of bug 1027...

> The innodb_flush_log_at_trx_commit does have an effect on our systems, as 
> you suggested. However, the server that all this started on, still seems to 
> have inexplicably bad performance regardless of this setting. Perhaps this 
> is tied to the issues you have found 914 in some way.
>
> Also, as you noted, some of the information I provided on 914 is probably 
> spurious because there were so many effects I was struggling with, that it 
> was hard to get a clear picture. Some spots where I was confused are:
>
> 1. I did not realise MyISAM tables blocked SELECTs when INSERTs occur - 
> previously they had always occured so quickly that I had not been effected 
> by this behaviour. Hence I reported this as a bug when it was actually just 
> an artefact of the really poor performance I was experiencing
>
> 2. To try and overcome the blocking issue I moved the tables that require 
> bulk inserts in InnoDB. I did not set the innodb_flush_log_at_trx_commit 
> flag and so I noticed hugely disparate performance between my old and new 
> server - some of which was clearly due to this flag.
>
> 3. The test on my Windows XP machine runs correctly now so I think that was 
> a random Microsoft'ism that occured at the end of a long day :-)
 
Notwithstanding all above notes, there is still definitely a problem with my machine. I have a routine I can run that performs REALLY poorly,.. the problem lies in isolating the key feature of the algorithm which is running slowly, in isolation all the constituent parts seem to run OK. Initially, I did not realise how subtle my problem was as it is quite easy to repeat in the environment in which I found it.

Recently, I have been trying to code around the problem and I believe I may have isolated a repeatable test case which shows some odd behaviour (related to my initial problems). However, I believe it is unrelated to the issues uncovered by Alex in this report and so is not relevant to this bug.

I have been considering changing the server over to FreeBSD and if I follow this path then I will let you guys know whether my problems still occur.  In light of your comments on RedHat, it seems this may be a wise step on any account.

Cheers,

Paul
[28 Aug 2003 17:07] Paul Coldrey
my.cnf file:

[mysqld]
ft_min_word_len=2
[18 Dec 2003 21:53] MySQL Verification Team
Are you tested the latest 4.0.X release (now we are with 4.0.17) 
after your experience with 4.0.14 ? or some relevant news about
hardware/OS experiences?
[21 Dec 2003 14:36] Paul Coldrey
I remain unable to make a repeatable test case. I found two machines on which I could repeat the problem (both running RedHat 9 with software RAID). I have found other machines (running a miscellany of operating systems - including one with RH9 and software RAID) on which the performance was notably better. As noted in one of the reports it appears as if "high-power" server machines run slower than "low-power" desktops (and I feel this looks more like a RedHat issue than a MySQL one). I know this is not much help to tracking down the problem!

In the end, I moved to InnoDB tables and changed the whole algorithm to get around the problem.... I have not run into any other speed issues.

I would not object if this bug were closed.

Thanks for all your help and comments.
[23 Jan 2004 19:35] MySQL Verification Team
According your last comment that wasn't possible to have a test case
and won't possible because you moved to InnoDB. Thank you.