Bug #42503 | "Lost connection" errors when using compression protocol | |
---|---|---|---
Submitted: | 31 Jan 2009 21:09 | Modified: | 12 Jan 2011 17:31 |
Reporter: | Tech Magos | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server | Severity: | S2 (Serious) |
Version: | 5.1.34, 5.1 bzr | OS: | Linux |
Assigned to: | Dmitry Shulga | CPU Architecture: | Any |
[31 Jan 2009 21:09]
Tech Magos
[1 Feb 2009 10:47]
Valeriy Kravchuk
Thank you for the problem report. Please check your server's error log to see whether there was a server crash at that time. What client version do you use? Please check with mysql --version.
[1 Feb 2009 14:38]
Tech Magos
The server reports nothing in the logs; it seems to have no idea of the problem. No server crash either. mysql client Ver 14.14, 5.1.30 (Linux) - the same version as the server. Note that exactly the same behaviour is seen from Windows client code connecting via direct use of the client API: if compression is on, the library returns "lost connection" after the call to mysql_real_query. If compression is off, the query works for the Windows client code too.
[1 Feb 2009 19:58]
MySQL Verification Team
I couldn't repeat this with the current source server. Could you please provide your my.cnf file and CREATE TABLE statement? Thanks in advance.
[2 Feb 2009 18:29]
Tech Magos
[mysqld]
slow_query_log
slow_query_log_file=/path/slow.log
skip-safemalloc
skip-name-resolve
skip-locking
default-character-set=latin1
default-storage-engine=INNODB
max_allowed_packet=250M
query_cache_type=1
query_cache_size=800M
query_cache_limit=100M
max_connections=150
max_connect_errors=100
wait_timeout=3600
connect_timeout=16
net_read_timeout=50
net_write_timeout=100
interactive_timeout=300
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
innodb_buffer_pool_size = 16000MB
innodb_log_files_in_group = 2
innodb_log_file_size = 320M
innodb_log_buffer_size=16M
innodb_data_file_path = ibdata1:100M:autoextend
innodb_additional_mem_pool_size=25M
innodb_file_per_table=1
innodb_open_files = 2048
innodb_max_dirty_pages_pct=95
innodb_checksums=0
innodb_support_xa=0
innodb_doublewrite=0
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_lock_wait_timeout=50
innodb_thread_concurrency=16
innodb_commit_concurrency=20
innodb_read_io_threads=16
innodb_write_io_threads=4
key_buffer_size=256M
sort_buffer_size=8M
myisam_sort_buffer_size=8M
read_rnd_buffer_size=32M
read_buffer_size=3M
join_buffer_size=16M
tmp_table_size=256M
max_heap_table_size=256M
max_tmp_tables=100
thread_cache_size=128
thread_concurrency=16

[safe_mysqld]
timezone=GMT

CREATE TABLE `ID2NAME` (
  `Id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `Name` varchar(255) NOT NULL,
  PRIMARY KEY (`Id`),
  UNIQUE KEY `iName` (`Name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
[12 Feb 2009 10:41]
Sveta Smirnova
Thank you for the feedback. I cannot repeat the described behavior. Please try the current version, 5.1.31, and let us know whether the problem still exists.
[13 Mar 2009 0:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[22 May 2009 11:40]
MySQL Verification Team
This might be verified in bug #45031. The crash is very hard to repeat and depends on certain properties of the compressed stream; hence, valgrind is often needed to catch the problem.
[28 May 2009 22:39]
Tech Magos
This is a real issue, and the last cross-reference seems to suggest where it is (valgrind report). It only seems to happen when the requested data is fairly large, e.g. 300K rows or more of roughly six 20-byte columns. It does not seem to be a memory-exhaustion problem, because it works fine when the compression protocol is off. The valgrind report on the cross-referenced issue suggests this is a memory-related bug, i.e. an out-of-bounds read. Could you please check whether things work in these cases:
- The client is 32-bit and the server is 64-bit; I recall seeing complen in my_real_read declared as size_t, which is probably a different size on 32-bit and 64-bit builds.
- Could it be that the size of the compressed block is not always well defined or aligned, so the reader gets confused, and this happens only when very large data is returned?
- It is not clear to me whether the problem originates on the server or the client. The cross-referenced bug report suggests it happens on the client side; the server reports nothing in the logs when the error hits.
- When it happens, it occurs from the Linux 64-bit client, the 32-bit client, and the Windows 32-bit client.
[1 Jun 2009 22:41]
Tech Magos
This bug also occurs in 5.1.34 and seems to be connected to the query cache. Here are some important findings:
- The problem happens AFTER the first time the query is run, i.e. after it stores itself in the query cache. The first time, the query works fine.
- If the query cache is disabled, the bug never happens.
- If SQL_NO_CACHE is placed after the SELECT, the bug never happens.
- After the problem happens, if RESET QUERY CACHE is run, the problem does not occur on the first run of the query after the reset. It does happen again on runs after that.
- If query_cache_type is 1 and the query result does not fit inside the query cache (e.g. if query_cache_limit is small), the query never fails.
So the problem seems to appear when the compressed protocol is used AND the query produces a large result that is already cached in the query cache. It is worth mentioning that when the problem appears, the server pipes data of the same size through the channel as when the query succeeds (as SHOW GLOBAL STATUS reveals). For 5.1.34, when valgrind is run on the client (mysql) or the server (mysqld), no issues are reported; the valgrind problem mentioned in the cross-referenced bug related to 6.x is NOT reported. So it looks like malformed data is probably produced when the query result is compressed from the query cache.
[30 Jun 2009 6:35]
Sveta Smirnova
Thank you for the feedback. Verified as described. To repeat:

1. Fill the table with insert into ID2NAME (Name) values(md5(rand()));, then different combinations of insert into ID2NAME (Name) select md5(rand()) from ID2NAME;, insert into ID2NAME (Name) select concat(Id, md5(rand()), Id) from ID2NAME; etc.

2. mysql> select count(*) from ID2NAME;
+----------+
| count(*) |
+----------+
|  1048576 |
+----------+
1 row in set (5.89 sec)

3. Run the following command:

while ./bin/mysql -hsol10-sparc-c -P33051 -uroot -C test -e "select Id, Name from ID2NAME" >/dev/null; do echo 1; done

Server started as:

./libexec/mysqld --defaults-file=support-files/my-small.cnf --basedir=. --datadir=./data --port=33051 --socket=/tmp/mysql_ssmirnova.sock -O thread_stack=1280000 --log-error --query_cache_type=1 --query_cache_size=800M --query_cache_limit=100M &
[30 Jun 2009 6:35]
Sveta Smirnova
Backtrace in my env:

(dbx) where
current thread: t@13
  [1] __lwp_kill(0x0, 0x6, 0xff06cbc0, 0xfef21c00, 0xff06c388, 0x6), at 0xff04155c
=>[2] my_write_core(sig = 6), line 310 in "stacktrace.c"
  [3] handle_segfault(sig = 6), line 2536 in "mysqld.cc"
  [4] __sighndlr(0x6, 0x0, 0xfe638ce0, 0x270964, 0x0, 0x0), at 0xff040494
  ---- called from signal handler with signal 6 (SIGABRT) ------
  [5] __lwp_kill(0x0, 0x6, 0x85274, 0xff3b3904, 0xff06fb20, 0x0), at 0xff04155c
  [6] raise(0x6, 0x6, 0x5, 0x6, 0x32aa0, 0x0), at 0xfefdfef8
  [7] abort(0xfe639058, 0xff0700b8, 0xff06928c, 0xa8274, 0xff068288, 0xff070904), at 0xfefc00d8
  [8] __assert(0x9c1c50, 0x9c1bd0, 0x212, 0xfe63930c, 0xa7f90, 0xb02750), at 0xfefc0354
  [9] Diagnostics_area::disable_status(this = 0x33b92568), line 530 in "sql_class.cc"
  [10] Query_cache::send_result_to_client(this = 0xc2c480, thd = 0x33b91808, sql = 0x33bc0840 "select Id, Name from ID2NAME", query_length = 28U), line 1527 in "sql_cache.cc"
  [11] mysql_parse(thd = 0x33b91808, inBuf = 0x33bc0840 "select Id, Name from ID2NAME", length = 28U, found_semicolon = 0xfe639cf0), line 5886 in "sql_parse.cc"
  [12] dispatch_command(command = COM_QUERY, thd = 0x33b91808, packet = 0x33ba412d "select Id, Name from ID2NAME", packet_length = 28U), line 1213 in "sql_parse.cc"
  [13] do_command(thd = 0x33b91808), line 854 in "sql_parse.cc"
  [14] handle_one_connection(arg = 0x33b91808), line 1115 in "sql_connect.cc"
(dbx)
[30 Jun 2009 6:49]
Sveta Smirnova
If the compression protocol is not used, running the same client test and then killing the connected thread several times crashes the server with the same trace as well.
[17 Aug 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[18 Aug 2009 16:49]
Konstantin Osipov
This can easily lead to disconnect on production.
[19 Aug 2009 9:49]
Andrei Elkin
A failure at http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2281048 may be related. Some details on that crash:

test-max-win_ws2008-x86_64 loki05 ps_stm innodb [ fail ]
mysqltest: At line 1001: query 'update t1 set c=a' failed: 2013: Lost connection to MySQL server during query
[24 Aug 2009 16:18]
MySQL Verification Team
Sveta, please check whether this bug is fixed by the patch made for bug #45031. Thanks.
[25 Aug 2009 7:54]
Sveta Smirnova
The bug is still repeatable after the fix for bug #45031. Please note

[10] Query_cache::send_result_to_client(this = 0xc30888, thd = 0x33b95d88, sql = 0x33bc3f80 "select Id, Name from ID2NAME", query_length = 28U), line 1624 in "sql_cache.cc"

in the trace. Also, the bug is not repeatable with the query cache disabled. So this is a compression + query cache problem.
[16 Aug 2010 6:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/115748

3480 Dmitry Shulga 2010-08-16
Fixed bug #42503 - "Lost connection" errors when using compression protocol.
@ sql/sql_cache.cc
  modified Query_cache::append_result_data: added checking for the max packet size limit (2^24) when forming the result data block.
[6 Sep 2010 4:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/117561

3480 Dmitry Shulga 2010-09-06
Fixed bug #42503 - "Lost connection" errors when using compression protocol.
@ sql/net_serv.cc
  net_realloc() modified: consider already used memory when comparing packet buffer length
@ sql/sql_cache.cc
  modified Query_cache::send_result_to_client: send result to client in chunks limited by 2^24-1 bytes.
[15 Sep 2010 9:24]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118285

3480 Dmitry Shulga 2010-09-15
Fixed bug#42503 - "Lost connection" errors when using compression protocol.
@ sql/net_serv.cc
  net_realloc() modified: consider already used memory when comparing packet buffer length
@ sql/sql_cache.cc
  modified Query_cache::send_result_to_client: send result to client in chunks limited by 1 megabyte.
[16 Sep 2010 8:02]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118345

3509 Dmitry Shulga 2010-09-16
Fixed bug#42503 - "Lost connection" errors when using compression protocol.

The loss of connection was caused by a malformed packet sent by the server when the query cache was in use. When storing data in the query cache, the query cache memory allocation algorithm had a tendency to reduce the number of memory blocks necessary to store a result set, up to finally storing the entire result set in a single block. With a significant result set, this memory block could turn out to be quite large - 30, 40 MB and more. When such a result set was sent to the client, the entire memory block was compressed and written to the network as a single network packet. However, the length of a network packet is limited to 0xFFFFFF (16MB), since the packet format only allows 3 bytes for the packet length. As a result, a malformed, overly large packet with a truncated length would be sent to the client and break the client/server protocol.

The solution is, when sending result sets from the query cache, to ensure that the data is chopped into network packets of size <= 16MB, so that there is no corruption of the packet length. This solution, however, has a shortcoming: since the result set is still stored in the query cache as a single block, at the time of sending we have lost the boundaries of individual logical packets (one logical packet = one row of the result set) and thus can end up sending a truncated logical packet in a compressed network packet. As a result, the client may require more memory than max_allowed_packet to keep both the truncated last logical packet and the compressed next packet.

This never (or in practice never) happens without compression, since without compression it is very unlikely that a) a truncated logical packet would remain on the client when it is time to read the next packet, and b) a subsequent logical packet being read would be so large that size-of-new-packet + size-of-old-packet-tail > max_allowed_packet. To remedy this issue, we send data in 1MB packets; that is below the current client default of 16MB for max_allowed_packet, but large enough to avoid unnecessary overhead from too many syscalls per result set.

@ sql/net_serv.cc
  net_realloc() modified: consider already used memory when comparing packet buffer length
@ sql/sql_cache.cc
  modified Query_cache::send_result_to_client: send result to client in chunks limited by 1 megabyte.
[16 Sep 2010 10:25]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118363

3510 Dmitry Shulga 2010-09-16
Fixed bug#42503 - "Lost connection" errors when using compression protocol.

The loss of connection was caused by a malformed packet sent by the server when the query cache was in use. When storing data in the query cache, the query cache memory allocation algorithm had a tendency to reduce the number of memory blocks necessary to store a result set, up to finally storing the entire result set in a single block. With a significant result set, this memory block could turn out to be quite large - 30, 40 MB and more. When such a result set was sent to the client, the entire memory block was compressed and written to the network as a single network packet. However, the length of a network packet is limited to 0xFFFFFF (16MB), since the packet format only allows 3 bytes for the packet length. As a result, a malformed, overly large packet with a truncated length would be sent to the client and break the client/server protocol.

The solution is, when sending result sets from the query cache, to ensure that the data is chopped into network packets of size <= 16MB, so that there is no corruption of the packet length. This solution, however, has a shortcoming: since the result set is still stored in the query cache as a single block, at the time of sending we have lost the boundaries of individual logical packets (one logical packet = one row of the result set) and thus can end up sending a truncated logical packet in a compressed network packet. As a result, the client may require more memory than max_allowed_packet to keep both the truncated last logical packet and the compressed next packet.

This never (or in practice never) happens without compression, since without compression it is very unlikely that a) a truncated logical packet would remain on the client when it is time to read the next packet, and b) a subsequent logical packet being read would be so large that size-of-new-packet + size-of-old-packet-tail > max_allowed_packet. To remedy this issue, we send data in 1MB packets; that is below the current client default of 16MB for max_allowed_packet, but large enough to avoid unnecessary overhead from too many syscalls per result set.

@ sql/net_serv.cc
  net_realloc() modified: consider already used memory when comparing packet buffer length
@ sql/sql_cache.cc
  modified Query_cache::send_result_to_client: send result to client in chunks limited by 1 megabyte.
[16 Sep 2010 10:39]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/118370

3206 Dmitry Shulga 2010-09-16 [merge]
Auto-merge from mysql-5.1-bugteam for bug#42503.
[20 Sep 2010 8:22]
Dmitry Shulga
Pushed to mysql-5.1-bugteam. Merged and pushed to mysql-5.5-merge.
[24 Sep 2010 12:04]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/119026

3516 Dmitry Shulga 2010-09-24
Follow-up for Bug#42503: fix a compilation warning.
@ sql/sql_cache.cc
  Added include of the send_data_in_chunks() definition when the EMBEDDED_LIBRARY macro is on.
[24 Sep 2010 12:13]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from:
http://lists.mysql.com/commits/119028

3212 Dmitry Shulga 2010-09-24 [merge]
Merged changes from 5.1-bugteam for bug#42503.
[28 Sep 2010 15:39]
Bugs System
Pushed into mysql-trunk 5.6.1-m4 (revid:alik@sun.com-20100928153607-tdsxkdm5cmuym5sq) (version source revid:alik@sun.com-20100928153508-0saa6v93dinqx1u7) (merge vers: 5.6.1-m4) (pib:21)
[28 Sep 2010 15:41]
Bugs System
Pushed into mysql-next-mr (revid:alik@sun.com-20100928153646-pqp8o1a92mxtuj3h) (version source revid:alik@sun.com-20100928153532-lr3gtvnyp2en4y75) (pib:21)
[28 Sep 2010 15:44]
Bugs System
Pushed into mysql-5.5 5.5.7-rc (revid:alik@sun.com-20100928153459-4nudf4zgzlou4s7q) (version source revid:alik@sun.com-20100928153459-4nudf4zgzlou4s7q) (merge vers: 5.5.7-rc) (pib:21)
[29 Sep 2010 19:50]
Paul DuBois
Noted in 5.5.7, 5.6.1 changelogs. A malformed packet sent by the server when the query cache was in use resulted in lost-connection errors. Setting to Need Merge pending push to 5.1.x.
[3 Oct 2010 1:21]
Paul DuBois
Setting to Need Merge pending push to 5.1.x.
[6 Oct 2010 19:34]
Konstantin Osipov
Bug #57074 "mysql_store_result() fails if compression is enabled" was marked as a duplicate of this bug.
[7 Oct 2010 23:03]
Paul DuBois
Noted in 5.1.52 changelog.
[1 Nov 2010 19:00]
Bugs System
Pushed into mysql-5.1 5.1.53 (revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (version source revid:build@mysql.com-20101101184443-o2olipi8vkaxzsqk) (merge vers: 5.1.53) (pib:21)
[11 Jan 2011 15:37]
Bugs System
Pushed into mysql-5.1 5.1.55 (revid:dmitry.shulga@oracle.com-20110111151825-d376jh0jefmsdslw) (version source revid:dmitry.shulga@oracle.com-20110111151825-d376jh0jefmsdslw) (merge vers: 5.1.55) (pib:24)
[11 Jan 2011 15:37]
Bugs System
Pushed into mysql-trunk 5.6.2 (revid:dmitry.shulga@oracle.com-20110111153355-uvff16i445mk2spu) (version source revid:dmitry.shulga@oracle.com-20110111153355-uvff16i445mk2spu) (merge vers: 5.6.2) (pib:24)
[11 Jan 2011 15:38]
Bugs System
Pushed into mysql-5.5 5.5.9 (revid:dmitry.shulga@oracle.com-20110111152631-jezu2dbglg05g6ff) (version source revid:dmitry.shulga@oracle.com-20110111152631-jezu2dbglg05g6ff) (merge vers: 5.5.9) (pib:24)