| Bug #42503 | "Lost connection" errors when using compression protocol | ||
|---|---|---|---|
| Submitted: | 31 Jan 22:09 | Modified: | 18 Aug 9:53 |
| Reporter: | Tech Magos | ||
| Status: | Verified | ||
| Category: | Server | Severity: | S2 (Serious) |
| Version: | 5.1.34, 5.1 bzr | OS: | Linux |
| Assigned to: | Lars-Erik Bjørk | Target Version: | 5.1+ |
| Triage: | Triaged: D1 (Critical) / R2 (Low) / E3 (Medium) | ||
[31 Jan 22:09]
Tech Magos
[1 Feb 11:47]
Valeriy Kravchuk
Thank you for a problem report. Please, check your in server's error log if there was a server crash at that time. What version of client do you use? Check with mysql --version, please.
[1 Feb 15:38]
Tech Magos
Server reports nothing in the logs. It seems to have no idea of the problem. No server crash either. mysql client Ver 14.14, 5.1.30 (linux); from same version as the server. Note exactly the same behaviour is seen by Windows client code, connecting by direct use of the client API. I.e. if compression is on, library returns lost connection, after call to mysq_real_query call. If compression is off, then query works for win client code too.
[1 Feb 20:58]
Miguel Solorzano
I couldn't repeat with current source server. Could you please provide your my.cnf file and create table statement. Thanks in advance.
[2 Feb 19:29]
Tech Magos
[mysqld] slow_query_log slow_query_log_file=/path/slow.log skip-safemalloc skip-name-resolve skip-locking default-character-set=latin1 default-storage-engine=INNODB max_allowed_packet=250M query_cache_type=1 query_cache_size=800M query_cache_limit=100M max_connections=150 max_connect_errors=100 wait_timeout=3600 connect_timeout=16 net_read_timeout=50 net_write_timeout=100 interactive_timeout=300 innodb_locks_unsafe_for_binlog=1 innodb_autoinc_lock_mode=2 innodb_buffer_pool_size = 16000MB innodb_log_files_in_group = 2 innodb_log_file_size = 320M innodb_log_buffer_size=16M innodb_data_file_path = ibdata1:100M:autoextend innodb_additional_mem_pool_size=25M innodb_file_per_table=1 innodb_open_files = 2048 innodb_max_dirty_pages_pct=95 innodb_checksums=0 innodb_support_xa=0 innodb_doublewrite=0 innodb_flush_method = O_DIRECT innodb_flush_log_at_trx_commit = 2 innodb_lock_wait_timeout=50 innodb_thread_concurrency=16 innodb_commit_concurrency=20 innodb_read_io_threads=16 innodb_write_io_threads=4 key_buffer_size=256M sort_buffer_size=8M myisam_sort_buffer_size=8M read_rnd_buffer_size=32M read_buffer_size=3M join_buffer_size=16M tmp_table_size=256M max_heap_table_size=256M max_tmp_tables=100 thread_cache_size=128 thread_concurrency=16 [safe_mysqld] timezone=GMT CREATE TABLE `ID2NAME` ( `Id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT, `Name` varchar(255) NOT NULL, PRIMARY KEY (`Id`), UNIQUE KEY `iName` (`Name`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1
[12 Feb 11:41]
Sveta Smirnova
Thank you for the feedback. I can not repeat described behavior. Please try current version 5.1.31 and inform us if problem still exists.
[13 Mar 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[22 May 13:40]
Shane Bester
the might be verified in bug #45031 . crash is very hard to repeat, and depends on certain properties of the compressed stream. hence, using valgrind is needed is alot of times to catch the problem.
[29 May 0:39]
Tech Magos
This is a real issue and the last cross-reference seems to suggest where it is (valgrind report) It only seems to happen when requested data is pretty large; e.g. 300K rows or more of data of 6 20 byte columns or so. It does not seem to be related to some memory exhaustion problem, as when the compression protocol is off, then this works fine. The valgrind report on the cross-referenced issue suggests that this is due to a memory-related bug, i.e. trying to read off bounds. Could you please check if things work ok in the case - The client is 32-bit and server is 64-bit; i recall seeeing complen in my_real_read being declared size_t; this is probably different size in 32 and 64 bit. - Could it be that the size of the compressed block is not always well defined or aligned so the reader gets confused? And this happens only when we get very large data returned? - It is not clear to me the problem originates on the server or the client. The cross-referenced bug report suggests this happens on the client side. Server reports nothing on the logs when the error hits. - When it happens, it occurs from both linux 64-client, 32-bit client and windows 32-bit client.
[2 Jun 0:41]
Tech Magos
This bug occurs also in 5.1.34 and seems to be also connected to the query cache. Here are some important findings: - The problem happens AFTER the first time the query is run, i.e. after it stores itself in the query cache. First time, the query works fine. - If query cache is disabled, the bug never happens. - If SQL_NO_CACHE is placed after the SELECT, the bug never happens. - After the problem happens, if RESET QUERY CACHE is run, the problem does not happen for run of the query after the reset. It does happens again for runs after that. - Seems that if qcache is 1 and the query result does not fit inside the qcache (e.g. if query_cache_limit is small), then the query never fails So the problem seems to appear when the compressed protocol is used AND the query produces a large result that is already cached in the query cache. It is worth mentioning that when the problem appears the server does pipe data through the channel of the same size as when the query succeeds(as SHOW GLOBAL STATUS reveals). For 5.1.34, when valgrind is run on client (mysql) or server (mysqld), no issues are reported. The valgrind problem as mentioned in the cross-referenced bug related to 6.x is NOT reported. So it looks like there is probably malformed data produced when the query result is compressed from the query cache.
[30 Jun 8:35]
Sveta Smirnova
Thank you for the feedback. Verified as described. To repeat: 1. filled table with insert into ID2NAME (Name) values(md5(rand()));, then different combinations of insert into ID2NAME (Name) select md5(rand()) from ID2NAME;, insert into ID2NAME (Name) select concat(Id, md5(rand()), Id) from ID2NAME; etc. 2. mysql> select count(*) from ID2NAME; +----------+ | count(*) | +----------+ | 1048576 | +----------+ 1 row in set (5.89 sec) 3. Run following command: while ./bin/mysql -hsol10-sparc-c -P33051 -uroot -C test -e "select Id, Name from ID2NAME" >/dev/null; do echo 1; done Server started as: ./libexec/mysqld --defaults-file=support-files/my-small.cnf --basedir=. --datadir=./data --port=33051 --socket=/tmp/mysql_ssmirnova.sock -O thread_stack=1280000 --log-error --query_cache_type=1 --query_cache_size=800M --query_cache_limit=100M &
[30 Jun 8:35]
Sveta Smirnova
Backtrace in my env: (dbx) where current thread: t@13 [1] __lwp_kill(0x0, 0x6, 0xff06cbc0, 0xfef21c00, 0xff06c388, 0x6), at 0xff04155c =>[2] my_write_core(sig = 6), line 310 in "stacktrace.c" [3] handle_segfault(sig = 6), line 2536 in "mysqld.cc" [4] __sighndlr(0x6, 0x0, 0xfe638ce0, 0x270964, 0x0, 0x0), at 0xff040494 ---- called from signal handler with signal 6 (SIGABRT) ------ [5] __lwp_kill(0x0, 0x6, 0x85274, 0xff3b3904, 0xff06fb20, 0x0), at 0xff04155c [6] raise(0x6, 0x6, 0x5, 0x6, 0x32aa0, 0x0), at 0xfefdfef8 [7] abort(0xfe639058, 0xff0700b8, 0xff06928c, 0xa8274, 0xff068288, 0xff070904), at 0xfefc00d8 [8] __assert(0x9c1c50, 0x9c1bd0, 0x212, 0xfe63930c, 0xa7f90, 0xb02750), at 0xfefc0354 [9] Diagnostics_area::disable_status(this = 0x33b92568), line 530 in "sql_class.cc" [10] Query_cache::send_result_to_client(this = 0xc2c480, thd = 0x33b91808, sql = 0x33bc0840 "select Id, Name from ID2NAME", query_length = 28U), line 1527 in "sql_cache.cc" [11] mysql_parse(thd = 0x33b91808, inBuf = 0x33bc0840 "select Id, Name from ID2NAME", length = 28U, found_semicolon = 0xfe639cf0), line 5886 in "sql_parse.cc" [12] dispatch_command(command = COM_QUERY, thd = 0x33b91808, packet = 0x33ba412d "select Id, Name from ID2NAME", packet_length = 28U), line 1213 in "sql_parse.cc" [13] do_command(thd = 0x33b91808), line 854 in "sql_parse.cc" [14] handle_one_connection(arg = 0x33b91808), line 1115 in "sql_connect.cc" (dbx)
[30 Jun 8:49]
Sveta Smirnova
If don't use compression protocol, run same client test, then kill thread connected several times server crashes with same trace as well.
[18 Aug 1:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[18 Aug 18:49]
Konstantin Osipov
This can easily lead to disconnect on production.
[19 Aug 11:49]
Andrei Elkin
A failure http://pb2.norway.sun.com/web.py?template=mysql_show_test_failure&test_failure_id=2281048 can relate. Some details on that crash: test-max-win_ws2008-x86_64 loki05 ps_stm innodb [ fail ] mysqltest: At line 1001: query 'update t1 set c=a' failed: 2013: Lost connection to MySQL server during query
[24 Aug 18:18]
Sinisa Milivojevic
Sveta, Please check whether this bug is fixed with a patch made for a bug #45031. Thanks ...
[25 Aug 9:54]
Sveta Smirnova
Bug is still repeatable after fix of bug #45031. Please note [10] Query_cache::send_result_to_client(this = 0xc30888, thd = 0x33b95d88, sql = 0x33bc3f80 "select Id, Name from ID2NAME", query_length = 28U), line 1624 in "sql_cache.cc" in the trace. Also bug is not repeatable with query cache disabled. So this is compression + query cache problem.
