Bug #44976 | Incorrect INSERT_ID in binary log | |
---|---|---|---
Submitted: | 20 May 2009 9:09 | Modified: | 3 Sep 2010 15:42 |
Reporter: | Vadim TKACHENKO | Email Updates: | |
Status: | Duplicate | Impact on me: | |
Category: | MySQL Server: Replication | Severity: | S2 (Serious) |
Version: | 5.0.77 | OS: | Any |
Assigned to: | | CPU Architecture: | Any
[20 May 2009 9:09]
Vadim TKACHENKO
[12 Jun 2009 8:40]
Susanne Ebrecht
Many thanks for writing a bug report. Unfortunately, I am not able to follow you. Where is the bug?
[12 Jun 2009 23:05]
Vadim TKACHENKO
Hi,

The bug is that before two transactions we have:

SET INSERT_ID=5535367/*!*/;
...
SET INSERT_ID=5535357/*!*/;

That is, two different inserts are working with the same INSERT_ID and trying to insert the same value into the table's auto_increment field.
[12 Jun 2009 23:08]
Vadim TKACHENKO
Sorry, I meant:

SET INSERT_ID=5535367/*!*/;
...
SET INSERT_ID=5535367/*!*/;

The same INSERT_ID.
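For reference: SET INSERT_ID forces the auto_increment value that the next INSERT will use when the log is replayed, so two events carrying the same value make the slave try to assign the same primary key twice. A minimal sketch of what the slave effectively executes; the table here is hypothetical, not from the original binlog:

    CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v CHAR(1)) ENGINE=InnoDB;

    SET INSERT_ID=5535367;           -- first Intvar event
    INSERT INTO t (v) VALUES ('a');  -- this row gets id = 5535367

    SET INSERT_ID=5535367;           -- second Intvar event, same value
    INSERT INTO t (v) VALUES ('b');  -- duplicate key 5535367 on PRIMARY

A plain INSERT fails here with error 1062; with ON DUPLICATE KEY UPDATE it would silently update the first row instead.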
[12 Jun 2009 23:09]
Vadim TKACHENKO
And the second bug is that within a single transaction TIMESTAMP is decreasing:

SET TIMESTAMP=1242388653/*!*/
...
SET TIMESTAMP=1242388652/*!*/

As the queries in a transaction are executed one after another, I would expect that the second query can't be executed earlier than the first one.
[18 Jun 2009 8:41]
Sveta Smirnova
Thank you for the report.

> And second bug is that in single transaction TIMESTAMP is decreasing
> SET TIMESTAMP=1242388653/*!*/
> ...
> SET TIMESTAMP=1242388652/*!*/

This is not a bug: it just means the following order of actions:

conn1> begin
conn1> do some work
conn2> begin
conn2> do some work
conn2> commit
conn1> commit
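To illustrate the interleaving described above: statement timestamps are taken when each query executes, but transactions reach the binary log in commit order, so a later-committing transaction can carry earlier timestamps. A schematic sketch; the table name and the exact statements are made up:

    -- conn1 (executes first, commits last)
    SET TIMESTAMP=1242388652; BEGIN;
    INSERT INTO t1 VALUES (1);

    -- conn2 (executes second, commits first)
    SET TIMESTAMP=1242388653; BEGIN;
    INSERT INTO t1 VALUES (2);
    COMMIT;   -- written to the binlog first, with TIMESTAMP=1242388653

    -- conn1
    COMMIT;   -- written to the binlog second, with TIMESTAMP=1242388652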
[18 Jun 2009 15:52]
Vadim TKACHENKO
Svetlana,

It is from the same transaction:

SET TIMESTAMP=1242388653/*!*/
BEGIN/*!*/;
...
SET TIMESTAMP=1242388652/*!*/;
INSERT INTO `iio_91`...
[19 Jun 2009 7:55]
Sveta Smirnova
Thank you for the feedback. I could not repeat the described behavior with a generic test. Do you use only InnoDB tables, or do you have both transactional and non-transactional tables in the same transaction? Could you please provide the configuration file for the master?
[24 Jun 2009 15:06]
Vadim TKACHENKO
Sveta,

It's only InnoDB tables. I do not expect the behavior to be repeatable in generic tests. I tried to emulate it myself but could not create a test case, so I posted the bug for reference; maybe someone else will run into a similar problem. I think the problem is with auto_increment, because when we removed that field, the problem disappeared.

my.cnf is below:

[client]
#password = [your_password]
port = 3306
socket = /tmp/mysql.sock

[mysqld]
port = 3306
socket = /tmp/mysql.sock
datadir = /r1/mysql
user = mysql
group = mysql
#
server-id = 44
# 2 being the number of servers in the cluster.
auto_increment_increment = 2
# 1 for first box, 2 for second box, up to n.
auto_increment_offset = 1
back_log = 50
skip-name-resolve
# In case of emergency.
#skip-networking
# disable binlogs
#disable-binlog-do-db
#slave-skip-errors=1062,1053
max_connections = 3000
max_connect_errors = 9999999
table_cache = 4096
max_allowed_packet = 32M
binlog_cache_size = 1M
max_heap_table_size = 300M
sort_buffer_size = 8M
join_buffer_size = 2M
thread_cache_size = 1024
thread_concurrency = 8
ft_min_word_len = 3
memlock
default_table_type = InnoDB
thread_stack = 192K
transaction_isolation = REPEATABLE-READ
tmp_table_size = 32M
query_cache_type = 2
log-bin=mysql-bin
max_binlog_size = 256M
expire_logs_days=3
log_slave_updates
replicate-ignore-db=mysql
replicate-ignore-db=pythian
replicate-ignore-db=test
replicate-wild-ignore-table=test.%
log_slow_queries
long_query_time = 2
#log_long_format
tmpdir = /r1/tmp
key_buffer_size = 256M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
bulk_insert_buffer_size = 64M
myisam_sort_buffer_size = 64M
myisam_max_sort_file_size = 10G
myisam_max_extra_sort_file_size = 10G
# Set to the number of CPUs
#myisam_repair_threads = 2
#myisam_recover
skip-bdb
#large_pages = True
innodb_additional_mem_pool_size = 32M
innodb_buffer_pool_size = 52G
#innodb_buffer_pool_size = 4G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_autoextend_increment = 128
innodb_data_home_dir = /r1/mysql
innodb_log_group_home_dir = /r1/mysql
innodb_io_capacity = 2500
innodb_ibuf_accel_rate = 200
# If you run into InnoDB tablespace corruption, setting this to a nonzero
# value will likely help you to dump your tables. Start from value 1 and
# increase it until you're able to dump the table successfully.
#innodb_force_recovery=5
# Big performance slider depending on how latent the underlying disks are.
# percona tunings
innodb_thread_concurrency = 0
innodb_commit_concurrency = 1
innodb_thread_sleep_delay = 0
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 16M
innodb_log_file_size = 1024M
innodb_log_files_in_group = 2
innodb_max_dirty_pages_pct = 25
innodb_commit_concurrency = 1
innodb_thread_sleep_delay = 0
innodb_flush_method=O_DIRECT
innodb_lock_wait_timeout = 120
innodb_file_per_table = 1

[mysqldump]
quick
max_allowed_packet = 32M

[mysql]
no-auto-rehash
# Only allow UPDATEs and DELETEs that use keys.
safe-updates

[isamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[myisamchk]
key_buffer = 512M
sort_buffer_size = 512M
read_buffer = 8M
write_buffer = 8M

[mysqlhotcopy]
interactive-timeout

[mysqld_safe]
open-files-limit = 16384
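As an aside, the auto_increment_increment = 2 / auto_increment_offset = 1 pair in this my.cnf is the usual dual-master arrangement: this server hands out only odd auto_increment values and leaves the even ones to its peer. A quick illustration with a hypothetical table:

    SET SESSION auto_increment_increment = 2, auto_increment_offset = 1;
    CREATE TABLE seq_demo (id INT AUTO_INCREMENT PRIMARY KEY, v CHAR(1)) ENGINE=InnoDB;
    INSERT INTO seq_demo (v) VALUES ('a'), ('b'), ('c');
    SELECT id FROM seq_demo;   -- returns 1, 3, 5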
[20 Jul 2009 8:14]
Susanne Ebrecht
I am still not able to see a bug here. What exactly makes you think there is a bug?
[20 Aug 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[21 Sep 2009 23:00]
Bugs System
No feedback was provided for this bug for over a month, so it is being suspended automatically. If you are able to provide the information that was originally requested, please do so and change the status of the bug back to "Open".
[15 Jan 2010 21:52]
justin dahlke
I think we've hit the same problem. It occurs many times a day, but it's definitely a race condition. After running a Maatkit checksum on the subscriberfield table on master/slave we found that 22% of the 100k-row batches aren't consistent (so probably <1% of rows overall). There are currently ~100 million rows in that table.

On the master:

select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+----------+-------+
| id         | subscriber_id | field_id | content  | mtime |
+------------+---------------+----------+----------+-------+
| 1839528432 |      90615540 |    42846 | francois | NULL  |
+------------+---------------+----------+----------+-------+

Here's how it replicates to the slave:

select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+---------+-------+
| id         | subscriber_id | field_id | content | mtime |
+------------+---------------+----------+---------+-------+
| 1839528432 |      90615540 |    42846 | 78633   | NULL  |
+------------+---------------+----------+---------+-------+

To troubleshoot we brought up a new slave taken from a backup and traced the state of a row we knew would replicate incorrectly. At this point that row hasn't been inserted yet:

select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
Empty set (0.00 sec)

show slave status \G
Empty set (0.00 sec)

To locate the insert statement we trawled through the bin logs on the slave and found:

# at 97813391
#100114 11:09:31 server id 13  end_log_pos 97813419  Intvar
SET INSERT_ID=1839528432/*!*/;
# at 97813419
#100114 11:09:31 server id 13  end_log_pos 97813767  Query  thread_id=44827176  exec_time=1  error_code=0
use mail_app/*!*/;
SET TIMESTAMP=1263485371/*!*/;
INSERT INTO subscriberfield SET subscriber_id = '90615540', field_id = '42846', content = 'francois' ON DUPLICATE KEY UPDATE content = 'francois'
/*!*/;
# at 97813767
#100114 11:09:31 server id 13  end_log_pos 97813794  Xid = 28190255
COMMIT/*!*/;

Next, we caught replication up to the point right before the insert executed:

CHANGE MASTER TO MASTER_LOG_FILE='db-bin.000239', MASTER_LOG_POS=324249907, MASTER_USER='******', MASTER_PASSWORD='******', MASTER_HOST='10.0.0.26', MASTER_PORT=3306;
START SLAVE UNTIL MASTER_LOG_FILE='db-bin.000247', MASTER_LOG_POS=97813760;

After it catches up...

show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.0.0.26
                Master_User: *******
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: db-bin.000282
        Read_Master_Log_Pos: 735993872
             Relay_Log_File: db-relay-bin.000025
              Relay_Log_Pos: 97813931
      Relay_Master_Log_File: db-bin.000247
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 97813794
            Relay_Log_Space: 38317566737
            Until_Condition: Master
             Until_Log_File: db-bin.000247
              Until_Log_Pos: 97813760
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

And there you have it, the insert statement executed:

mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+----------+-------+
| id         | subscriber_id | field_id | content  | mtime |
+------------+---------------+----------+----------+-------+
| 1839528432 |      90615540 |    42846 | francois | NULL  |
+------------+---------------+----------+----------+-------+
1 row in set (0.00 sec)
[15 Jan 2010 21:52]
justin dahlke
Continued... Let it roll through a little more:

START SLAVE UNTIL MASTER_LOG_FILE='db-bin.000247', MASTER_LOG_POS=97813766;

mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+----------+-------+
| id         | subscriber_id | field_id | content  | mtime |
+------------+---------------+----------+----------+-------+
| 1839528432 |      90615540 |    42846 | francois | NULL  |
+------------+---------------+----------+----------+-------+
1 row in set (0.00 sec)

We are now stepping through each commit one at a time (I'll not copy/paste each one), until now, when the data has changed on the slave:

mysql> START SLAVE UNTIL MASTER_LOG_FILE='db-bin.000247', MASTER_LOG_POS=97832543;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.0.0.26
                Master_User: *******
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: db-bin.000283
        Read_Master_Log_Pos: 295414923
             Relay_Log_File: db-relay-bin.000025
              Relay_Log_Pos: 97842972
      Relay_Master_Log_File: db-bin.000247
           Slave_IO_Running: Yes
          Slave_SQL_Running: No
            Replicate_Do_DB:
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 97842835
            Relay_Log_Space: 38950730252
            Until_Condition: Master
             Until_Log_File: db-bin.000247
              Until_Log_Pos: 97832543
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: NULL
1 row in set (0.00 sec)

mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+---------+-------+
| id         | subscriber_id | field_id | content | mtime |
+------------+---------------+----------+---------+-------+
| 1839528432 |      90615540 |    42846 | 78633   | NULL  |
+------------+---------------+----------+---------+-------+
1 row in set (0.00 sec)

"francois" has been swapped out for "78633". Just 5 or so commits later in the bin log we see this query:

# at 97837246
#100114 11:09:31 server id 13  end_log_pos 97837137  Intvar
SET INSERT_ID=1839528432/*!*/;
# at 97837274
#100114 11:09:31 server id 13  end_log_pos 97837343  Query  thread_id=44827175  exec_time=1  error_code=0
SET TIMESTAMP=1263485371/*!*/;
insert into subscriberfield set subscriber_id = '84779527', field_id = '53781', content = '78633' on duplicate key update content = '78633'
/*!*/;
# at 97837480
#100114 11:09:31 server id 13  end_log_pos 97837371  Intvar
SET INSERT_ID=2133638344/*!*/;

Notice that subscriber_id & field_id are different, yet the content field bled over (in addition to creating its own row).
If you look at that table:

CREATE TABLE `subscriberfield` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `subscriber_id` bigint(20) unsigned NOT NULL,
  `field_id` int(10) unsigned NOT NULL,
  `content` text,
  `mtime` datetime default NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `subscriberfield_subscriber_id_field_id_idx` (`subscriber_id`,`field_id`),
  KEY `field_id` (`field_id`),
  KEY `subscriberfield_field_id_content_idx` (`field_id`,`content`(10)),
  CONSTRAINT `subscriberfield_ibfk_1` FOREIGN KEY (`subscriber_id`) REFERENCES `subscriber` (`id`),
  CONSTRAINT `subscriberfield_ibfk_2` FOREIGN KEY (`field_id`) REFERENCES `field` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1839941935 DEFAULT CHARSET=latin1

There's a UNIQUE KEY on (`subscriber_id`,`field_id`) which should prevent that insert on duplicate key update from affecting a row with a different subscriber & field ID, yet it does. The problem most likely lies in the primary key. Prior to executing that last query, the binlog has an entry:

SET INSERT_ID=1839528432/*!*/;

That just so happens to be the primary key of our beloved francois:

mysql> select * from subscriberfield where subscriber_id = 90615540 and field_id = 42846;
+------------+---------------+----------+---------+-------+
| id         | subscriber_id | field_id | content | mtime |
+------------+---------------+----------+---------+-------+
| 1839528432 |      90615540 |    42846 | 78633   | NULL  |
+------------+---------------+----------+---------+-------+

Maybe MySQL 5.0.77 is occasionally writing an inappropriate SET INSERT_ID into the binlog? We have a multi-sharded environment and I've been able to see this same problem on all of our shards. Our master/slave pairs are running 5.0.77, which is what we did this troubleshooting on. We also have some shards with a 2nd slave running 5.1.34 which exhibit exactly the same symptoms. Any ideas?
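The failure mode described here can be reproduced deliberately, assuming the forced INSERT_ID is what goes wrong: INSERT ... ON DUPLICATE KEY UPDATE fires on a collision with any unique index, including the PRIMARY KEY, so a stale INSERT_ID silently rewrites an unrelated row. A minimal sketch with made-up data and a simplified copy of the table:

    CREATE TABLE sf (
      id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
      subscriber_id BIGINT UNSIGNED NOT NULL,
      field_id INT UNSIGNED NOT NULL,
      content TEXT,
      PRIMARY KEY (id),
      UNIQUE KEY (subscriber_id, field_id)
    ) ENGINE=InnoDB;

    -- "francois" goes in first and receives id = 1
    INSERT INTO sf (subscriber_id, field_id, content) VALUES (90615540, 42846, 'francois');

    -- now replay a statement for a *different* (subscriber_id, field_id),
    -- but with a forced auto_increment value that already exists:
    SET INSERT_ID = 1;
    INSERT INTO sf SET subscriber_id = 84779527, field_id = 53781, content = '78633'
      ON DUPLICATE KEY UPDATE content = '78633';

    -- the forced id collides on PRIMARY, so instead of inserting a new row the
    -- statement updates the existing one: francois's content becomes '78633',
    -- matching the corruption seen on the slave
    SELECT id, subscriber_id, field_id, content FROM sf;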
[28 Jan 2010 21:25]
justin dahlke
Found a very dirty workaround: we removed the transaction surrounding that insert-on-duplicate-key statement and made our DB wrapper smarter. We essentially had to shift from using a transaction in that one spot to retrying the query until it succeeds. After reloading our slaves, we've found no deviation for a week so far.
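For what it's worth, the retry half of that workaround could also be sketched server-side. This stored routine is hypothetical, not from the original report, and assumes a bounded retry on any SQL error is acceptable:

    DELIMITER //
    CREATE PROCEDURE retry_upsert(IN p_sid BIGINT UNSIGNED, IN p_fid INT UNSIGNED, IN p_content TEXT)
    BEGIN
      DECLARE attempts INT DEFAULT 0;
      DECLARE ok INT DEFAULT 0;
      -- on any SQL error, clear the flag so the loop tries again
      DECLARE CONTINUE HANDLER FOR SQLEXCEPTION SET ok = 0;
      WHILE ok = 0 AND attempts < 5 DO
        SET attempts = attempts + 1;
        SET ok = 1;
        INSERT INTO subscriberfield
          SET subscriber_id = p_sid, field_id = p_fid, content = p_content
          ON DUPLICATE KEY UPDATE content = p_content;
      END WHILE;
    END //
    DELIMITER ;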
[3 Sep 2010 15:42]
MySQL Verification Team
duplicate of bug #50413