Bug #20877 InnoDB data dictionary memory footprint is too big
Submitted: 6 Jul 2006 2:31 Modified: 27 May 17:40
Reporter: Boris Burtin
Status: Verified
Category:Server: InnoDB Severity:S4 (Feature request)
Version:5.1 OS:Linux (Linux)
Assigned to: Inaam Rana Target Version:5.1+
Triage: D2 (Serious) / R4 (High) / E4 (High)

[6 Jul 2006 2:31] Boris Burtin
Description:
The InnoDB dictionary cache never frees data
and hence limits how many tables we are able to open during the lifetime
of the mysql server process. This results in us not being able to
deploy a system where a large number of users are provisioned,
but only a small set are active at any given time.

For instance, visiting 100,000 tables causes a dict_table_t whose heap
size is 22144 to be added to dict_sys in dict_table_add_to_cache().
That is 2GB of RAM!

The code to trim this cache is commented out - possibly because
the mem_fix ref count is never decremented in all the right places.

As we scale to larger deployments we need MySQL to be able to support a
larger number of tables without running out of memory.

How to repeat:
Access 100,000 tables without restarting MySQL.
[6 Jul 2006 21:03] Valeriy Kravchuk
Thank you for a problem report. In fact, for 100000 tables to be used efficiently 2GB of
RAM is not so much. But I agree that some way to force cleaning of dictionary cache,
without restarting server, might be useful. So, I mark this report as a verified feature
request.
[12 Sep 2006 15:38] Marko Mäkelä
I have a patch that significantly reduces the size of dict_col_t and other structures by
making use of bit fields and omitting unnecessary fields, such as type->prec and some
magic numbers.

I may be able to reduce the sizes even further by replacing some pointers with inline
data, such as replacing linked lists with arrays.
[21 Sep 2006 12:44] Tim Smith
Patch queued to -maint team tree (expect to push to 5.1.12)
[3 Oct 2006 22:16] Chad MILLER
Available in 5.1.12-beta.
[6 Oct 2006 5:16] Paul DuBois
Noted in 5.1.12 changelog.

Memory consumption of the InnoDB data dictionary cache was roughly
halved by cleaning up the data structures.
[9 Jan 2007 8:29] Marko Mäkelä
The fix of this bug introduced Bug #24741: existing cascade clauses disappear when adding
foreign keys.
[19 Mar 2007 18:32] Stephan Fudeus
Sorry for posting to this long-closed bug, but is there any chance for this issue to be
fixed also for 5.0.x?
[19 Mar 2007 18:39] Heikki Tuuri
Stephan,

sorry, no. 5.0 is a 'production' version of MySQL, and no new features are allowed. As you
see, this fix caused a serious bug in 5.1.12. We do not want to risk the stability of
5.0.

Regards,

Heikki
[28 Nov 2007 11:07] kenichi yonekawa
Sorry, this bug is really fixed by 5.1?
I tried debugging while watching the dict_sys->size.
The procedure is written as follows. 

1. Starting mysqld from GDB
2. Execute create table query.
3. Watching dict_sys->size value.

MySQL5.0(5.0.46)
 before)
 p dict_sys->size => 19160
 
 after table create)
 p dict_sys->size => 45960

 45960 - 19160 = 26800(heap size of tab_mail_message)

MySQL5.1(5.1.22-rc)
 before)
 p dict_sys->size => 20888

 after table create)
 p dict_sys->size => 82264

 82264 - 20888 = 61376(heap size of tab_mail_message)

It seems not to be fixed like this. 
Please verify it.

The query I tried =>

CREATE TABLE  `test`.`tab_mail_message` (
  `_id` bigint(20) NOT NULL auto_increment,
  `col_abstract_data` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from_email` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from_name` char(128) collate utf8_general_ci default NULL,
  `col_abstract_subject` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to_email` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to_name` char(128) collate utf8_general_ci default NULL,
  `col_action` char(32) collate utf8_general_ci default NULL,
  `col_attached` int(11) default NULL,
  `col_bcc` text collate utf8_general_ci,
  `col_cc` text collate utf8_general_ci,
  `col_confirmation_final_recipient` text collate utf8_general_ci,
  `col_confirmation_org_message_id` char(255) collate utf8_general_ci default NULL,
  `col_confirmation_request` int(11) default NULL,
  `col_confirmation_response` int(11) default NULL,
  `col_confirmation_status` char(32) collate utf8_general_ci default NULL,
  `col_confirmation_to` text collate utf8_general_ci,
  `col_content_type` char(255) collate utf8_general_ci default NULL,
  `col_creator` bigint(20) default NULL,
  `col_creator_foreign_key` char(255) collate utf8_general_ci default NULL,
  `col_creator_name` char(100) collate utf8_general_ci default NULL,
  `col_ctime` int(11) default NULL,
  `col_data` longtext collate utf8_general_ci,
  `col_date` text collate utf8_general_ci,
  `col_draft` int(11) default NULL,
  `col_dtime` int(11) default NULL,
  `col_folder` bigint(20) default NULL,
  `col_from` text collate utf8_general_ci,
  `col_html_data` longtext collate utf8_general_ci,
  `col_in_reply_to` text collate utf8_general_ci,
  `col_message_id` char(255) collate utf8_general_ci default NULL,
  `col_modifier` bigint(20) default NULL,
  `col_modifier_foreign_key` char(255) collate utf8_general_ci default NULL,
  `col_modifier_name` char(100) collate utf8_general_ci default NULL,
  `col_mtime` int(11) default NULL,
  `col_origin_mail` bigint(20) default NULL,
  `col_read_ts` int(11) NOT NULL,
  `col_references` text collate utf8_general_ci,
  `col_reply_to` text collate utf8_general_ci,
  `col_reserve_blob1` blob,
  `col_reserve_blob2` blob,
  `col_reserve_blob3` blob,
  `col_reserve_int1` int(11) default NULL,
  `col_reserve_int2` int(11) default NULL,
  `col_reserve_int3` int(11) default NULL,
  `col_reserve_text1` char(100) collate utf8_general_ci default NULL,
  `col_reserve_text2` char(100) collate utf8_general_ci default NULL,
  `col_reserve_text3` char(100) collate utf8_general_ci default NULL,
  `col_send_ts` int(11) NOT NULL,
  `col_sent` int(11) NOT NULL,
  `col_sign_data` text collate utf8_general_ci,
  `col_signature` bigint(20) default NULL,
  `col_size` int(11) default NULL,
  `col_status` char(64) collate utf8_general_ci NOT NULL,
  `col_subject` text collate utf8_general_ci,
  `col_timestamp` int(11) NOT NULL,
  `col_to` text collate utf8_general_cs,
  `col_unsent` int(11) default NULL,
  `col_user` bigint(20) default NULL,
  PRIMARY KEY  (`_id`),
  KEY `cni_folder` (`col_folder`),
  KEY `cni_origin_mail` (`col_origin_mail`),
  KEY `cni_signature` (`col_signature`),
  KEY `idx_folder_data_n` (`col_folder`,`col_abstract_data`,`_id`),
  KEY `idx_folder_dts_n` (`col_folder`,`col_dtime`,`_id`),
  KEY `idx_folder_from_email_n` (`col_folder`,`col_abstract_from_email`,`_id`),
  KEY `idx_folder_from_n` (`col_folder`,`col_abstract_from`,`_id`),
  KEY `idx_folder_from_name_n` (`col_folder`,`col_abstract_from_name`,`_id`),
  KEY `idx_folder_rts_n` (`col_folder`,`col_read_ts`,`_id`),
  KEY `idx_folder_status` (`col_folder`,`col_status`),
  KEY `idx_folder_sts_n` (`col_folder`,`col_send_ts`,`_id`),
  KEY `idx_folder_subject_n` (`col_folder`,`col_abstract_subject`,`_id`),
  KEY `idx_folder_to_email_n` (`col_folder`,`col_abstract_to_email`,`_id`),
  KEY `idx_folder_to_n` (`col_folder`,`col_abstract_to`,`_id`),
  KEY `idx_folder_to_name_n` (`col_folder`,`col_abstract_to_name`,`_id`),
  KEY `idx_folder_ts_n` (`col_folder`,`col_timestamp`,`_id`),
  KEY `idx_user_messageid` (`col_user`,`col_message_id`),
  KEY `idx_user_status_ts` (`col_user`,`col_status`,`col_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
[28 Nov 2007 14:33] Heikki Tuuri
Inaam,

can you repeat the latest complaint about 5.1 taking more space?

Marko should have cut the space usage to about 50 % of the original.

Regards,

Heikki
[28 Nov 2007 14:45] Marko Mäkelä
Some of the relevant data structures take significantly more space on 64-bit platforms
than on 32-bit, mainly due to different alignment constraints and wider pointers.

I would like to know if Kenichi Yonekawa obtained the results in the same environment,
with mysqld compiled in the same way (both 32-bit, or both 64-bit). I am guessing that the
difference could be explained by comparing a 32-bit MySQL 5.0 to a 64-bit MySQL 5.1.
[29 Nov 2007 6:14] kenichi yonekawa
Marco,

Sorry, I had forgotten to write the environment. 
The environment was written, and it posted it here. 
http://bugs.mysql.com/32836

Regards.
[29 Nov 2007 7:06] kenichi yonekawa
Oh..., I didn't notice this bug was reopened.
Sorry.
[27 May 17:23] Calvin Sun
both bug#32836 and bug#36613 are marked as duplicates of this one.
[27 May 17:56] Mattias Jonsson
here is what I manage to get when I look at the core file after crashing the server with
about 3000 innodb tables with 500 columns (see bug#36613 oom-test case):
dict_sys->size = 3521749104.
[2 Jun 18:31] Ken Jacobs
While we appreciate that there are people running into this issue, and that it is
exacerbated by the use of partitioning, unfortunately this is simply not a bug.   There is
no memory leak, there is no reported crash related to this issue, and this is not a
regression in performance or memory use.  This bug report and related bug reports
represent a feature request.

The observed behavior is not a memory leak. (It is a stretch to say that because there is
no LRU treatment of meta data there is a "leak", which is an unintentional/undesigned
failure to release memory).   InnoDB works as designed and documented.  

There is no regression here (i.e., InnoDB per-table memory use is the same as before).  As
noted in bug 32836, some changes were made in 5.1.11 that considerably reduced memory
use.

Indeed, as noted in bug 20877, the system works fine for small/normal tables, perhaps up
to 10s of thousands of tables.  Alternative schema designs will reduce or eliminate the
need for large numbers of tables with large numbers of columns and large numbers of
partitions.   

Another "workaround" is to use larger memories (perhaps on a 64-bit machine), providing
more RAM (and sufficient swap space) for this meta data.

The design of partitions in MySQL (above the storage engines) means that each storage
engine will likely maintain repeated meta data for the underlying tables of identical
structure.  It is not clear that a simple LRU mechanism that allows such meta data to page
out would be sufficient to address this issue. (Since many of the partitions might be
quite active, there might be significant performance problems with this approach.)

Further, there would be substantial development investment and risk in making this sort of
change.  It simply cannot happen in the 5.1 (or likely 6.0) time frame.

Therefore we consider this bug report to amount to a feature request (and it should be
classified as such).  The InnoDB team will consider making changes in this area for a
future release, and we will discuss this topic with Sun/MySQL.
[13 Jun 4:09] Ken Jacobs
This issue may affect all storage engines.  Any "fix" could have significant performance
side-effects.  Making changes in the management of data dictionary information would
require substantial work and therefore be very risky to include.

Changing Triage information accordingly.