Bug #50632 Purging records from large partitioned INNODB table into ARCHIVE memory error
Submitted: 26 Jan 2010 17:16 Modified: 22 Apr 2011 17:08
Reporter: Glenn Plas
Status: Duplicate
Category: MySQL Server: Archive storage engine  Severity: S1 (Critical)
Version: 5.1.41  OS: Linux
Assigned to:  CPU Architecture: Any

[26 Jan 2010 17:16] Glenn Plas
Description:
While performing a purge from a partitioned InnoDB table into an ARCHIVE table using the following mk-archiver (Maatkit) command:

mk-archiver --source h=localhost,P=3330,D=traceme,t=Eventdelijn,u=root,p=datacharmer --dest h=localhost,P=3330,a=Historics,D=Historics,u=root,p=datacharmer,t=his_Event_delijn --safe-auto-increment --txn-size 1000 --where "gps_date BETWEEN '2006-01-01 00:00:00' AND '2007-01-01 00:00:00'" --statistics --progress 1000 --limit 500

I got the following error:
ERROR 5 (HY000): Out of memory (Needed 110556 bytes)
Upon closer examination, the 'SHOW CREATE TABLE' command inside mk-archiver seemed to return a different column count, which I knew was impossible as the tables are exactly alike, except of course for the keys in the ARCHIVE version.

If I do this migration into another InnoDB table it works fine.  Both target and source table are partitioned on gps_date; I have used this design for a few years now (except the pruning to ARCHIVE partitions) without issues.

Immediately after this problem, I opened a shell and tried:
mysql> SHOW CREATE TABLE `Historics`.`his_Event_delijn`;
ERROR 5 (HY000): Out of memory (Needed 110556 bytes)

I then tried to create a small table without partitions; that worked, and the 'SHOW CREATE' command on it returned correct output.  A bit later I could run it on a partitioned version again.

Please note that in the source tables there is hardly any data between those dates.  Only a very small amount of data for 2007 exists (for the very first customers, in fact), so it's unlikely that large portions of memory are in use.

When monitoring with top I can see a steep increase in allocated memory and even the first signs of swapping (swappiness=0).  Without restarting my DB I tried the same command again a bit later, and it worked.  There are about 70 tables in the 'traceme' DB that need to be purged; some are small, some big (big here = over 20 million records).  I've seen tons of people with exponentially larger datasets, so I don't expect the volume to be an issue.

I have noticed something interesting too.  We are running a 32-bit Debian version on this machine, which is the whole reason I'm preparing a replacement, but I need to groom some data first.  Memory usage is at the max of 2 GiB at the moment of doing this archiving (4 GiB in the machine); before the archiving that wasn't the case, and the DB ran fine.  But the most interesting fact is that I have a slave running on a fresh 64-bit install using statement-based replication.  That machine also exhibits memory overuse, and it goes really wild there: it suddenly uses up the full 4 GiB of RAM plus another 4 GiB of swap during this process, which makes sense to me as it's not limited like a 32-bit install.  All that one replicates are the INSERTs into the ARCHIVE table and the DELETEs from the InnoDB version.

How to repeat:
See the attached schemas.

Use a large dataset; the raw_data blob in these tables is 33 bytes in every record, and the extra blob field is hardly ever used.  Create about 10 million rows spread over 2 years (mind the partitioning scheme).

Create the source table in InnoDB and the target table in the ARCHIVE storage engine.
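The schema attachment isn't reproduced in this report, but as a minimal sketch of the table pair: the column list below is taken from the trigger later in this report, while the column types, partition layout, and the TO_DAYS(gps_date) RANGE expression are my assumptions, not the real schema.

```sql
-- Sketch only: types and the partitioning expression are assumptions;
-- the column names match the trigger shown later in this report.
CREATE TABLE traceme.Eventdelijn (
  record_id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  gps_date   DATETIME NOT NULL,
  imei       VARCHAR(20) NOT NULL,
  switch     TINYINT,
  event_id   INT,
  latitude   DOUBLE,
  longtitude DOUBLE,      -- spelled as in the original trigger
  IO         INT,
  raw_data   BLOB,        -- ~33 bytes per record in practice
  extra      BLOB,
  PRIMARY KEY (record_id, gps_date)  -- PK must include the partitioning column
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(gps_date)) (
  PARTITION p2006 VALUES LESS THAN (TO_DAYS('2007-01-01')),
  PARTITION p2007 VALUES LESS THAN (TO_DAYS('2008-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Same columns in the ARCHIVE target; ARCHIVE supports no secondary
-- keys, so the target carries no PRIMARY KEY (hence the report's note
-- that the tables differ only in their keys).
CREATE TABLE Historics.his_Event_delijn (
  record_id  BIGINT UNSIGNED NOT NULL,
  gps_date   DATETIME NOT NULL,
  imei       VARCHAR(20) NOT NULL,
  switch     TINYINT,
  event_id   INT,
  latitude   DOUBLE,
  longtitude DOUBLE,
  IO         INT,
  raw_data   BLOB,
  extra      BLOB
) ENGINE=ARCHIVE
PARTITION BY RANGE (TO_DAYS(gps_date)) (
  PARTITION p2006 VALUES LESS THAN (TO_DAYS('2007-01-01')),
  PARTITION p2007 VALUES LESS THAN (TO_DAYS('2008-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```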

- Using Maatkit's archiver:

mk-archiver --source h=localhost,P=3330,D=traceme,t=Eventdelijn,u=root,p=datacharmer --dest h=localhost,P=3330,a=Historics,D=Historics,u=root,p=datacharmer,t=his_Event_delijn --safe-auto-increment --txn-size 1000 --where "gps_date BETWEEN '2006-01-01 00:00:00' AND '2007-01-01 00:00:00'" --statistics --progress 1000 --limit 500

Monitor memory usage.  I sync up the 70 tables, each with its own command line as described above; it may only happen after doing 20 tables.  But no data actually needs to be transferred for this error to happen.

- Using a trigger:

I found out that there is also another way of reproducing this problem: using an AFTER INSERT trigger like this:

use traceme;

DELIMITER $$

CREATE trigger tr_LastSeen_Eventdelijn AFTER INSERT ON `Eventdelijn` FOR EACH ROW

BEGIN

-- Doesn't do anything really
DECLARE MyCont int DEFAULT 0;

DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' SET MyCont=1;
-- DECLARE CONTINUE HANDLER FOR NOT FOUND, SQLEXCEPTION, SQLWARNING SET MyCont=1;

-- Store in the Archive right away
INSERT INTO Historics.his_Event_delijn (record_id, gps_date, imei, switch, event_id, latitude, longtitude, IO, raw_data, extra) VALUES (NEW.record_id, NEW.gps_date, NEW.imei, NEW.switch, NEW.event_id, NEW.latitude, NEW.longtitude, NEW.IO, NEW.raw_data, NEW.extra );

END $$

DELIMITER ;

Then, with a dump (a bunch of INSERTs) of the original source in hand, load up the first table with the trigger enabled and let it fire the rows into the ARCHIVE table as well.  (I realize this isn't the most excellent design choice, but it sure reproduces the problem here.)  Again, do this with an InnoDB target table and it works fine.
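To make that replay step concrete, a single illustrative row fired through the trigger might look like this (the column names come from the trigger above; the literal values are made up):

```sql
-- With tr_LastSeen_Eventdelijn installed, each INSERT into the InnoDB
-- source also inserts the same row into Historics.his_Event_delijn.
-- Values are illustrative only; record_id is left to AUTO_INCREMENT.
INSERT INTO traceme.Eventdelijn
  (gps_date, imei, switch, event_id, latitude, longtitude, IO, raw_data, extra)
VALUES
  ('2006-03-15 12:00:00', '350123450000001', 0, 1, 51.05, 3.72, 0, 'rawbytes', NULL);
```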

Suggested fix:
There is no real fix at the moment that I know of; playing with some of the mk-archiver options showed no change in behavior.

I believe there is a memory leak somewhere in the ARCHIVE engine code that for the moment only seems to show up in combination with the use of PARTITIONS.
If any additional information and/or tests are needed, let me know.

I would have suspected my 32-bit version and the fairly low amount of RAM at first, if it weren't for the fact that I have a 64-bit slave that shows the same leap in memory usage.  The second reason I believe this is a bug is that it doesn't happen when not using the ARCHIVE engine.
[26 Jan 2010 17:19] Glenn Plas
My apologies for attaching the wrong file.  Can someone please remove it?  Thanks.
[26 Jan 2010 17:20] Glenn Plas
Schema of both tables

Attachment: partitioned.txt (text/plain), 54.91 KiB.

[27 Jan 2010 8:17] Valeriy Kravchuk
Can be related to bug #49161 or even bug #48238. Do you use 32-bit mysqld binary on 64-bit Linux?
[27 Jan 2010 8:40] Sveta Smirnova
Could you also please decrease --txn-size to, say, 50 and check whether the problem still exists? Thanks in advance.
[27 Jan 2010 11:52] Glenn Plas
The master machine is 32-bit Linux (kernel + binaries) with a 32-bit mysqld on 64-bit-capable hardware.  There's nothing even remotely 64-bit on this box, AFAIK.

On the new slave we've installed the correct 64-bit kernel/binaries on a fresh Lenny install with the backports package of mysqld (64-bit); I have lots of experience with that package.  The point of the slave is to get promoted to master later on and replace the current machine.  So I need mk-table-sync and mk-archiver to work properly so I can sleep.

I'm attaching the output of the test case script with mk-archiver and MKDEBUG=1 on.  It's bizarre: I see this output now but nothing happens.  I've cleanly restarted the DBs a few times already and nothing is wrong; we're running production stuff on this with lots of queries/inserts per second all the time, so I can assume the DB is running fine.

I noticed, when I manually test some of the shown queries, that it only shows this output when there is something to do.  I don't think I've changed anything; it ran fine yesterday.  I've checked the target tables and yesterday's records (half a million) are there and work fine.  I really figured I had contained this enough at this point.  I will investigate.
[27 Jan 2010 11:54] Glenn Plas
Other table, same test  (one that worked for sure yesterday)

Attachment: output_txn50.txt (text/plain), 38.14 KiB.

[27 Jan 2010 12:06] Glenn Plas
As for bug #49161, I don't see a problem when I run CHECK TABLE on the table, but this is of course an ARCHIVE table, and I'm not sure how far CHECK works on them.  If that were the case and these files somehow got corrupted, why does it apparently only happen for ARCHIVE tables?

mysql> check table his_Event_syncomict;
+-------------------------------+-------+----------+----------+
| Table                         | Op    | Msg_type | Msg_text |
+-------------------------------+-------+----------+----------+
| Historics.his_Event_syncomict | check | status   | OK       |
+-------------------------------+-------+----------+----------+
1 row in set (1.00 sec)

Also, I don't use innodbbackup at all.  My .par files don't look broken or missing; if there is a problem, it's some logical corruption.

As for bug #48238, I'm not running out of memory during daily operations; I never get that error.  Let me know what you need.  Thanks for looking into this.
[3 Feb 2010 7:34] Sveta Smirnova
Thank you for the feedback.

I cannot repeat the described behavior with test data. Additionally, version 5.1.41 is outdated. Please try the current version, 5.1.43, and if the problem still exists provide your configuration file.
[17 Feb 2010 16:11] Glenn Plas
5.1.41 outdated?  With all due respect, there is nothing in the change lists of either 5.1.42 or 5.1.43 that even remotely connects to this issue.  Upgrading a production machine to the latest/greatest is not only a lousy practice, I cannot see how it would help me; I studied the change lists of those 2 more recent versions in detail.  Unless some changes that could apply here weren't documented, I see no value in upgrading.  I would if it could even remotely help me, but there is nothing new in the ARCHIVE engine since 5.1.41, so why even bother risking my data?

I can assure you it is there; I can reproduce it at will and bring the DB down in the process.  The last restart I did on this DB was the day I logged this bug, so it should be safe to assume it is stable.  Only when I start hitting ARCHIVE tables as described does it become unstable.

Anyway, I hope I did my duty and reported a severe problem.  I'm seriously considering a migration away from MySQL; this is the 10th bug or so I've hit in the last year... I'm willing to bet that you will see this one return in the near future.
[17 Feb 2010 18:23] MySQL Verification Team
Actually, I can confirm that ARCHIVE in general is pretty buggy in 5.1.41.  My random test case "bug47012.c" on bug #47012 still proves that in 5.1.43: there are memory errors and choices made depending on random values.  Note, no upgrade is needed to see this issue.  I'm guessing the table cache gets corrupted, and FLUSH TABLES might either crash or remedy the situation...
[6 Apr 2010 12:21] Glenn Plas
Hi Shane,

Thanks for what is probably the most useful reaction I got on MY problem.  I'm very sure there is something up with the ARCHIVE stuff, which I can reproduce in my sleep by now.  I've been hitting problem after problem with MySQL: InnoDB just keeps eating disk space, and "SHOW TRIGGERS FROM 'schema'" takes forever and in the meantime locks all my tables.  I just can't wrap my head around why something as simple as an information_schema query can destabilize a DB that otherwise runs fine for weeks if not touched.

DB start-ups are such a pain for me; I have 700 concurrent connections at all times.  When I start my DB up I cannot, at all, start all the client threads at the same time; all they do is simple but numerous inserts into tables (I'm talking 33 bytes per record).  Whenever I restart the database it's hell to pay: it takes me half an hour to an hour to get them all running, and once they do they work fine.  Everything hangs on OPEN TABLES or CLOSE TABLES etc.  So it's a diesel, so to speak, but that might still be a compliment by comparison.

Now really, when one issues "SHOW TABLES FROM 'schema'" and it takes like 30 minutes and locks everything else in the meantime, how useful is this DB?  I'm migrating to MariaDB now and hoping their fixes will help me; I can't look back here.  I used to really love MySQL, but I want a divorce now, the sooner the better.

What bothers me even more is that whenever I hit a problem, I search the bug DB a bit and usually find a bug report opened somewhere in 2003 and fixed in a version way above mine.  It just makes me cry.  How ignorant I was, thinking this one would be easy to reproduce and fast to fix.

Some examples: bug #19588, bug #1341

I've come to the point that I don't care anymore what happens with this bug report.  I think I did what I had to do, but as it stands it doesn't solve the real-world issue I have now.  Thanks to anyone who put effort into this, I do appreciate it, but I'm far too disillusioned in the product itself.  Feel free to close this and live happily ever after.
[6 Apr 2010 12:29] MySQL Verification Team
Could still be a duplicate of bug #51252.
[6 Apr 2010 12:37] Glenn Plas
It could very well be a duplicate of that one, actually...  I will keep an eye on it.  As it stands, ARCHIVE isn't an option for us anymore; I'll have to select another engine to store historical data.  Too bad, because partitioned ARCHIVE tables are very fast.
[22 Apr 2011 17:08] Valeriy Kravchuk
Let's consider this a duplicate of bug #51252 for now (that bug is not fixed yet anyway...)