Bug #89585 | Errcode: 24 - Too many open files on replication side | ||
---|---|---|---|
Submitted: | 8 Feb 2018 8:59 | Modified: | 20 Apr 2018 7:35 |
Reporter: | HengWay Ong | Email Updates: | |
Status: | Can't repeat | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S1 (Critical) |
Version: | 5.6.39-log | OS: | Windows |
Assigned to: | MySQL Verification Team | CPU Architecture: | Any |
Tags: | replication |
[8 Feb 2018 8:59]
HengWay Ong
[8 Feb 2018 9:46]
HengWay Ong
tmp_dir that accumulate 20k of temporary file
Attachment: mysql_error.png (image/png, text), 67.46 KiB.
[12 Feb 2018 9:32]
HengWay Ong
Look like this bug very similar with https://bugs.mysql.com/bug.php?id=41969, possible is the previous fix been remove?
[12 Feb 2018 9:36]
HengWay Ong
change category to mysql server: replication
[19 Feb 2018 9:15]
MySQL Verification Team
Hi, This looks like a duplicate of Bug #40013 Lemme investigate and get back to you Bogdan
[19 Feb 2018 14:12]
MySQL Verification Team
Hi, I have issues reproducing this. The test for bug #40013 passes ok for me. I discussed with the colleagues in development team and there are some questions. What's happening on your system with sessions on master? Replication will automatically generate "drop temp table" in binlog only if the session is closed. So if you have bunch of running sessions on the master and no "drop table" that's explicit the behavior is as expected. Thanks Bogdan
[20 Feb 2018 4:29]
HengWay Ong
Thank you for help checking, we have this issue after we upgrade mysql from 5.5 to 5.6. So possible is when master session closed the drop statement been write into binlog?
[20 Feb 2018 9:12]
HengWay Ong
Please ignore previous comment, we have this issue after we upgrade mysql from 5.5 to 5.6. So possible is when master session closed the drop statement NOT BEEN write into binlog?
[20 Feb 2018 9:16]
MySQL Verification Team
hi, it is possible that on windows there is some bug, I'll have to retest the bug on windows today and see how it behaves, on linux it works flawlessly. Are you 100% sure your sessions on master are closed explicitly? What does process list on master looks like? all best Bogdan
[20 Feb 2018 9:51]
HengWay Ong
Hi, the processlist most of the time only have less then 10 process. And we confirm that the connection been close properly, because we use php + mysql, that is always a close connection statement at the end of mysql query. Besides that we set interactive_timeout to 18M, so even there is an error, the connection will be force close after 18 minutes. The most curious part is, this issue just occur after we upgrade from 5.5 to 5.6.
[20 Feb 2018 10:08]
HengWay Ong
I checked on master binlog, found a lot of drop temporary statement like above which generated by server.
Attachment: mysql_binlog.jpeg (image/jpeg, text), 30.79 KiB.
[20 Feb 2018 10:09]
MySQL Verification Team
Thanks for the update, will retest today I just need few windows boxes for test (not something I normally use hence the delay) all best Bogdan
[20 Feb 2018 10:21]
HengWay Ong
I found one main different between mysql 5.5 and 5.6 on slave side which is, on mysql 5.6 will always have 2 worker running query, and the worker will work like round robin. So possible is when worker 1 created a temporary table, and workder 2 run drop temporary table statement.
[20 Feb 2018 12:39]
MySQL Verification Team
Hi, we can't reproduce the problem on windows neither. We can reproduce (as expected) if we are not closing connections on the master but if we do close connections on master it works ok. Are you sure php is closing sessions? if you are using persistent connections on php it will not close them! How easy is for you to reproduce this issue? Are you running any processes on slave (queries?)? Can you use mysqlbinlog .. | grep "DROP TEMPO" and send us the output? Since we can't reproduce by general description I will need your exact master and slave config files to try with them. thanks in advance Bogdan
[20 Feb 2018 12:47]
Venkatesh Duggirala
Post by Developer ================= Hi HengWay, I have answer for your question and I have a question for you :) 1) Your question related to "2 workers and round robin" A) No, it is not possible. DDLs are handled by only one thread (the coordinator), only DMLs goes to workers and as you said different workers can pick DMLs (as long as they do not conflict). And question from my end: I see some garbage (?) + NULLs in your attached binlog output image file? Any idea why is it ? We do not see that. Is it garbage or some issue at your end while displaying them ? Regards, Venkatesh.
[20 Feb 2018 16:14]
HengWay Ong
@Bogdan Kecman, can try reduce the tmp_table_size to minimum value, and try create some huge temporary table. This issue open happen when the temporary table need to create on hdd. Below is the my.ini we are using: [client] port=3317 [mysql] [mysqld] port=3317 thread_concurrency = 50 query_prealloc_size=819200 basedir="D:/database_3317" datadir="D:/database_3317/data/" tmpdir="D:/temporary/db_3317/1/;D:/temporary/db_3317/2/;D:/temporary/db_3317/3/;D:/temporary/db_3317/4/;D:/temporary/db_3317/5/;D:/temporary/db_3317/6/;D:/temporary/db_3317/7/;D:/temporary/db_3317/8/;D:/temporary/db_3317/9/;D:/temporary/db_3317/10/" default-storage-engine = MyISAM default_tmp_storage_engine=MyISAM sql-mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION" max_connections=1000 query_cache_size=300M tmp_table_size=500M max_heap_table_size=500M max_join_size=500M thread_cache_size=40 myisam_max_sort_file_size=100G myisam_sort_buffer_size=200M key_buffer_size=400M read_buffer_size=200M read_rnd_buffer_size=100M sort_buffer_size=10240K innodb_data_file_path=ibdata1:50M:autoextend:max:50M innodb_additional_mem_pool_size=6M innodb_flush_log_at_trx_commit=1 innodb_log_buffer_size=8M innodb_buffer_pool_size=50M innodb_log_file_size=50M innodb_thread_concurrency=10 innodb_mirrored_log_groups=1 innodb_log_files_in_group=2 innodb_lock_wait_timeout=60 innodb_file_io_threads=4 join_buffer_size=500M max_user_connections=500 max_allowed_packet=10M max_connect_errors=999999 max_tmp_tables=500 interactive_timeout=18M wait_timeout=1500 connect_timeout=5 open_files_limit=20000 innodb_open_files=4096 table_open_cache=3082 query_cache_type=1 query_cache_limit=3M long_query_time=5 #log-slow-queries="D:/log/mysql-err-slow.txt" log-bin="D:/database_3317/bindata/mysqld_master.bin.log" server-id = 180124 slave-compressed-protocol=1 log-error="D:/log/mysql-err_3317.txt" skip-delay-key-write myisam-recover=BACKUP,FORCE skip-concurrent-insert delayed_insert_limit=10 delayed_insert_timeout=1 # --- Replication Slave setting begin --- # log_slave_updates gtid-mode = OFF slave-skip-errors=1062,1146 slow-query-log log-output = TABLE binlog-format = STATEMENT log-bin-index = "D:/database_3317/bindata/mysqld_master.index" slave-load-tmpdir = "D:/database_3317/temp/1" replicate-wild-ignore-table = nllcom.% replicate-wild-ignore-table = mysql.% replicate-wild-ignore-table = %demo%.% replicate-wild-ignore-table = %close%.% replicate-wild-ignore-table = %_201%.% slave_checkpoint_group = 512 slave-parallel-workers = 2 relay_log_info_file = D:/database_3317/bindata/mysqld_relay.relay-log.info relay_log_index = D:/database_3317/bindata/mysqld_relay.relay-log.index slave_checkpoint_period = 300 relay-log = D:/database_3317/bindata/mysqld_relay.relay.log explicit_defaults_for_timestamp For mysqlbinlog .. | grep "DROP TEMPO", I will prepare is and attach to this post later. I will setup a test server for reproduce this issue as well, because currently, we face this issue on all our production db which bigger then 50GB of data. @Venkatesh Duggirala, thank you for the answer, for the garbage (?) + NULLs in the binlog, I got no idea what is that, normally when we open using notepad++ direct open the binlog, it always consist those garbage (?) + NULLs between query, we though is some marking mysql master put in. Thank you Bogdan Kecman and Venkatesh Duggirala for the helping.
[20 Feb 2018 16:27]
Venkatesh Duggirala
Hello HengWay, Please note that it is always better to see the content of binlogs using 'show binlog events' command on the server or use mysqlbinlog tool. notepad++ apps cannot understand binary file. One more question: What is the value of "slave_open_temporary_tables" variable on the slave when this is happening ? You can get it using "SHOW STATUS VARIABLE LIKE 'slave_open_temp_tables'; Also, just a small modification on what Bogdan asked. Please attach both "create temp" and "drop temp" queries from the binlogs. Count should be matched. Please let us know if you get reproducible scenario on your test environment, it will help us a lot in proceeding more quick on this bug. Regards, Venkatesh.
[22 Feb 2018 5:39]
HengWay Ong
This is the create and drop temp that we extract from binlog, look like the count is equal
Attachment: binlog.png (image/png, text), 2.72 MiB.
[22 Feb 2018 7:10]
MySQL Verification Team
Hi, can you get us the actual output (store to txt file and attach) instead of img of the mysqlbin output? also, any chance you can get the info my colleague asked for: "One more question: What is the value of "slave_open_temporary_tables" variable on the slave when this is happening ? You can get it using "SHOW STATUS VARIABLE LIKE 'slave_open_temp_tables'; " thanks in advance Bogdan
[22 Feb 2018 11:08]
HengWay Ong
sorry for miss the slave_open_temporary_table, the value not so huge now, because we regularly restart mysql to prevent issue
Attachment: mysql_slave_open_temporary_tables.jpeg (image/jpeg, text), 10.00 KiB.
[22 Feb 2018 11:08]
HengWay Ong
Can help provide me email, so I can email the sq
[22 Feb 2018 11:08]
HengWay Ong
Can please help provide email so I can email the sql script? Thank you
[24 Feb 2018 5:26]
Venkatesh Duggirala
Hello Hengway, Two things: 1) In the attached image, the first section, you are actually searching "Drop table" , you should be searching for "Drop Temporary table". 2) If you are talking about reproducible script, you can upload it here in the same you are uploading the other files (may be you can remove/modify the content if it is private). Regards, Venkatesh.
[26 Feb 2018 1:23]
HengWay Ong
Attachment is the binlog that extract. Please help check. thank you
Attachment: binlog_for_tmp_table.txt (text/plain), 190.98 KiB.
[26 Feb 2018 1:32]
HengWay Ong
this is the slave_open_temporary_tables for one of the replication instance that running for almost 5 days.
Attachment: mysql_1.png (image/png, text), 84.04 KiB.
[26 Feb 2018 1:54]
HengWay Ong
This is the result from mysqladmin status. Hope this data will help
Attachment: mysql_2.png (image/png, text), 14.00 KiB.
[26 Feb 2018 2:09]
HengWay Ong
Can I know possible for us to show temporary table? so I can check what temporary table no been drop properly. Thank you
[27 Feb 2018 11:09]
MySQL Verification Team
Hi, We have issues reproducing this. All temp tables created on master are dropped on slave after session is done. Are you 100% sure your clients actually close the connection? Any reuse of the connection could lead to this issue you are having. Anything between MySQL server and your clients (mysql router, mysql proxy, proxysql..)? Can you reproduce this problem on any setup other then your production setup? As for the status, you can get some data from show global status like '%table%'; all best Bogdan
[28 Feb 2018 3:56]
HengWay Ong
Hi, thank you for the feedback, we direct connect mysql with php, no other service been using in between. Currently, we are setting up a VM and trial reproduce this issue. Once we can reproduce will send VM to you so can make the debug more easier. Thank you
[28 Feb 2018 3:58]
MySQL Verification Team
Hi, That would be perfect! thanks Bogdan
[30 Mar 2018 7:53]
HengWay Ong
Hi, sorry for late update, we still trying to reproduce this issue , but along the way, we found the config parameter that causing this issue, which is slave-parallel-workers. This config parameter that been release on 5.6, which previously not exist in 5.5. We set this parameter to 2 at production server when we migrate from 5.5 to 5.6. During the issue reproduce process, we found once we set this parameter to 0 (slave-parallel-workers) which mean off the MTS, the temporary table do not drop issue been solved in production server. My hypothesis toward this is, temporary table been create at thread 1, but the drop statement been accidentally run on thread 2, so the temporary table not been drop properly. Hope the inform that we provided can help on debug, and we will still continue to reproduce the issue on VM. Once done the vm will update again. Thank you and have a nice day
[2 Apr 2018 12:55]
MySQL Verification Team
I still can't reproduce this but lemme discuss with devs maybe MTS is the culprit here all best Bogdan
[18 Apr 2018 2:42]
MySQL Verification Team
Hi, I can't reproduce this, not with MTS, not without it, whatever I do I never reproduce the issue. all best Bogdan
[19 Apr 2018 17:03]
MySQL Verification Team
Hi, Discussed this with dev team and we can't reproduce this and this makes not much sense to us. You mentioned you can send us the VM where you reproduce this so we can try ourselves? Is that offer still on the table, can you do that? The only suggestion I have for now is to reproduce with binary logging enabled on *slave* (and log-slave-updates), and then check whether the DROPs that the master logged were actually applied on the slave. Kind regards Bogdan
[20 Apr 2018 4:16]
HengWay Ong
Sorry for late reply, for the VM, we still in prepare, but we still cannot reproduce yet. Because in production sever, we have few million sql running per day, so still cannot figure out which sql causing this issue. But once we set slave-parallel-workers = 0, on all production server, this issue been solved.
[20 Apr 2018 4:58]
Venkatesh Duggirala
Hello Hengway, Just one quick question, in the attachment binlog_for_tmp_table.txt, I see one strange thing DROP TEMPORARY TABLE IF EXISTS `tmp_non_replicate_shorttext_5a8cee8362e1b`; CREATE TEMPORARY TABLE tmp_non_replicate_shorttext_5a8cee8362e1b (INDEX(st_id)) Where is column name in create temporary table ? What is the tool used to generate that .txt ? Can you please concentrate on that? May be we get some clues from there ? Regards, Venkatesh.
[20 Apr 2018 7:35]
HengWay Ong
we are using php function to generate this temporary table name, this is the code we use: $tbl_name = uniqid('tmp_non_replicate_shorttext_'); For the column, the create statement is like this:: CREATE TABLE `tmp_non_replicate_shorttext_123456` ( `st_id` varchar(200) NOT NULL DEFAULT '', `st_text` varchar(1000) DEFAULT NULL, `sortfield` int(2) NOT NULL DEFAULT '0' ) ENGINE=MyISAM DEFAULT CHARSET=latin1 For the .txt file, I use this command to generate it: mysqlbinlog "{binlog at slave side}" | findstr "TEMPORARY" > xxxxx.txt Please let me know if need more information Thank you and have a nice day
[20 Apr 2018 9:43]
Venkatesh Duggirala
Hello Hengway, With your above create table structure, ( I changed temporary keyword which is missing above) and used mysqlbinlog on the slave's binary log as you said and it is showing properly with column names. Could you please debug at your end why you are missing the column names in the output. Are you using mysql-5.6.39 downloaded from MySQL site ? Do you have any changes on top of it and rebuilt it by any chance ? Regards, Venkatesh.