MySQL Bugs: #20877: InnoDB data dictionary memory footprint is too big

Bug #20877	InnoDB data dictionary memory footprint is too big
Submitted:	6 Jul 2006 0:31	Modified:	13 Feb 2012 1:12
Reporter:	Boris Burtin	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: InnoDB storage engine	Severity:	S4 (Feature request)
Version:	5.1	OS:	Any (Linux)
Assigned to:	Sunny Bains	CPU Architecture:	Any

Description:
The InnoDB dictionary cache never frees data
and hence limits how many tables we are able to open during the lifetime
of the mysql server process. This results in us not being able to
deploy a system where a large number of users are provisioned,
but only a small set are active at any given time.

For instance, visiting 100,000 tables causes a dict_table_t whose heap
size is 22144 to be added to dict_sys in dict_table_add_to_cache().
That is 2GB of RAM!

The code to trim this cache is commented out - possibly because
the mem_fix ref count is never decremented in all the right places.

As we scale to larger deployments we need MySQL to be able to support a
larger number of tables without running out of memory.

How to repeat:
Access 100,000 tables without restarting MySQL.

Thank you for a problem report. In fact, for 100000 tables to be used efficiently 2GB of RAM is not so much. But I agree that some way to force cleaning of dictionary cache, without restarting server, might be useful. So, I mark this report as a verified feature request.

I have a patch that significantly reduces the size of dict_col_t and other structures by making use of bit fields and omitting unnecessary fields, such as type->prec and some magic numbers.

I may be able to reduce the sizes even further by replacing some pointers with inline data, such as replacing linked lists with arrays.

Patch queued to -maint team tree (expect to push to 5.1.12)

Available in 5.1.12-beta.

Noted in 5.1.12 changelog.

Memory consumption of the InnoDB data dictionary cache was roughly
halved by cleaning up the data structures.

The fix of this bug introduced Bug #24741: existing cascade clauses disappear when adding foreign keys.

Sorry for posting to this long-closed bug, but is there any chance for this issue to be fixed also for 5.0.x?

Stephan,

sorry, no. 5.0 is a 'production' version of MySQL, and no new features are allowed. As you see, this fix caused a serious bug in 5.1.12. We do not want to risk the stability of 5.0.

Regards,

Heikki

Sorry, this bug is really fixed by 5.1?
I tried debugging while watching the dict_sys->size.
The procedure is written as follows. 

1. Starting mysqld from GDB
2. Execute create table query.
3. Watching dict_sys->size value.

MySQL5.0(5.0.46)
 before)
 p dict_sys->size => 19160
 
 after table create)
 p dict_sys->size => 45960

 45960 - 19160 = 26800(heap size of tab_mail_message)

MySQL5.1(5.1.22-rc)
 before)
 p dict_sys->size => 20888

 after table create)
 p dict_sys->size => 82264

 82264 - 20888 = 61376(heap size of tab_mail_message)

It seems not to be fixed like this. 
Please verify it.

The query I tried =>

CREATE TABLE  `test`.`tab_mail_message` (
  `_id` bigint(20) NOT NULL auto_increment,
  `col_abstract_data` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from_email` char(128) collate utf8_general_ci default NULL,
  `col_abstract_from_name` char(128) collate utf8_general_ci default NULL,
  `col_abstract_subject` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to_email` char(128) collate utf8_general_ci default NULL,
  `col_abstract_to_name` char(128) collate utf8_general_ci default NULL,
  `col_action` char(32) collate utf8_general_ci default NULL,
  `col_attached` int(11) default NULL,
  `col_bcc` text collate utf8_general_ci,
  `col_cc` text collate utf8_general_ci,
  `col_confirmation_final_recipient` text collate utf8_general_ci,
  `col_confirmation_org_message_id` char(255) collate utf8_general_ci default NULL,
  `col_confirmation_request` int(11) default NULL,
  `col_confirmation_response` int(11) default NULL,
  `col_confirmation_status` char(32) collate utf8_general_ci default NULL,
  `col_confirmation_to` text collate utf8_general_ci,
  `col_content_type` char(255) collate utf8_general_ci default NULL,
  `col_creator` bigint(20) default NULL,
  `col_creator_foreign_key` char(255) collate utf8_general_ci default NULL,
  `col_creator_name` char(100) collate utf8_general_ci default NULL,
  `col_ctime` int(11) default NULL,
  `col_data` longtext collate utf8_general_ci,
  `col_date` text collate utf8_general_ci,
  `col_draft` int(11) default NULL,
  `col_dtime` int(11) default NULL,
  `col_folder` bigint(20) default NULL,
  `col_from` text collate utf8_general_ci,
  `col_html_data` longtext collate utf8_general_ci,
  `col_in_reply_to` text collate utf8_general_ci,
  `col_message_id` char(255) collate utf8_general_ci default NULL,
  `col_modifier` bigint(20) default NULL,
  `col_modifier_foreign_key` char(255) collate utf8_general_ci default NULL,
  `col_modifier_name` char(100) collate utf8_general_ci default NULL,
  `col_mtime` int(11) default NULL,
  `col_origin_mail` bigint(20) default NULL,
  `col_read_ts` int(11) NOT NULL,
  `col_references` text collate utf8_general_ci,
  `col_reply_to` text collate utf8_general_ci,
  `col_reserve_blob1` blob,
  `col_reserve_blob2` blob,
  `col_reserve_blob3` blob,
  `col_reserve_int1` int(11) default NULL,
  `col_reserve_int2` int(11) default NULL,
  `col_reserve_int3` int(11) default NULL,
  `col_reserve_text1` char(100) collate utf8_general_ci default NULL,
  `col_reserve_text2` char(100) collate utf8_general_ci default NULL,
  `col_reserve_text3` char(100) collate utf8_general_ci default NULL,
  `col_send_ts` int(11) NOT NULL,
  `col_sent` int(11) NOT NULL,
  `col_sign_data` text collate utf8_general_ci,
  `col_signature` bigint(20) default NULL,
  `col_size` int(11) default NULL,
  `col_status` char(64) collate utf8_general_ci NOT NULL,
  `col_subject` text collate utf8_general_ci,
  `col_timestamp` int(11) NOT NULL,
  `col_to` text collate utf8_general_cs,
  `col_unsent` int(11) default NULL,
  `col_user` bigint(20) default NULL,
  PRIMARY KEY  (`_id`),
  KEY `cni_folder` (`col_folder`),
  KEY `cni_origin_mail` (`col_origin_mail`),
  KEY `cni_signature` (`col_signature`),
  KEY `idx_folder_data_n` (`col_folder`,`col_abstract_data`,`_id`),
  KEY `idx_folder_dts_n` (`col_folder`,`col_dtime`,`_id`),
  KEY `idx_folder_from_email_n` (`col_folder`,`col_abstract_from_email`,`_id`),
  KEY `idx_folder_from_n` (`col_folder`,`col_abstract_from`,`_id`),
  KEY `idx_folder_from_name_n` (`col_folder`,`col_abstract_from_name`,`_id`),
  KEY `idx_folder_rts_n` (`col_folder`,`col_read_ts`,`_id`),
  KEY `idx_folder_status` (`col_folder`,`col_status`),
  KEY `idx_folder_sts_n` (`col_folder`,`col_send_ts`,`_id`),
  KEY `idx_folder_subject_n` (`col_folder`,`col_abstract_subject`,`_id`),
  KEY `idx_folder_to_email_n` (`col_folder`,`col_abstract_to_email`,`_id`),
  KEY `idx_folder_to_n` (`col_folder`,`col_abstract_to`,`_id`),
  KEY `idx_folder_to_name_n` (`col_folder`,`col_abstract_to_name`,`_id`),
  KEY `idx_folder_ts_n` (`col_folder`,`col_timestamp`,`_id`),
  KEY `idx_user_messageid` (`col_user`,`col_message_id`),
  KEY `idx_user_status_ts` (`col_user`,`col_status`,`col_timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Inaam,

can you repeat the latest complaint about 5.1 taking more space?

Marko should have cut the space usage to about 50 % of the original.

Regards,

Heikki

Some of the relevant data structures take significantly more space on 64-bit platforms than on 32-bit, mainly due to different alignment constraints and wider pointers.

I would like to know if Kenichi Yonekawa obtained the results in the same environment, with mysqld compiled in the same way (both 32-bit, or both 64-bit). I am guessing that the difference could be explained by comparing a 32-bit MySQL 5.0 to a 64-bit MySQL 5.1.

Marco,

Sorry, I had forgotten to write the environment. 
The environment was written, and it posted it here. 
http://bugs.mysql.com/32836

Regards.

Oh..., I didn't notice this bug was reopened.
Sorry.

both bug#32836 and bug#36613 are marked as duplicates of this one.

here is what I manage to get when I look at the core file after crashing the server with about 3000 innodb tables with 500 columns (see bug#36613 oom-test case):
dict_sys->size = 3521749104.

While we appreciate that there are people running into this issue, and that it is exacerbated by the use of partitioning, unfortunately this is simply not a bug. There is no memory leak, there is no reported crash related to this issue, and this is not a regression in performance or memory use. This bug report and related bug reports represent a feature request.

The observed behavior is not a memory leak. (It is a stretch to say that because there is no LRU treatment of meta data there is a "leak", which is an unintentional/undesigned failure to release memory). InnoDB works as designed and documented.

There is no regression here (i.e., InnoDB per-table memory use is the same as before). As noted in bug 32836, some changes were made in 5.1.11 that considerably reduced memory use.

Indeed, as noted in bug 20877, the system works fine for small/normal tables, perhaps up to 10s of thousands of tables. Alternative schema designs will reduce or eliminate the need for large numbers of tables with large numbers of columns and large numbers of partitions.

Another "workaround" is to use larger memories (perhaps on a 64-bit machine), providing more RAM (and sufficient swap space) for this meta data.

The design of partitions in MySQL (above the storage engines) means that each storage engine will likely maintain repeated meta data for the underlying tables of identical structure. It is not clear that a simple LRU mechanism that allows such meta data to page out would be sufficient to address this issue. (Since many of the partitions might be quite active, there might be significant performance problems with this approach.)

Further, there would be substantial development investment and risk in making this sort of change. It simply cannot happen in the 5.1 (or likely 6.0) time frame.

Therefore we consider this bug report to amount to a feature request (and it should be classified as such). The InnoDB team will consider making changes in this area for a future release, and we will discuss this topic with Sun/MySQL.

This issue may affect all storage engines.  Any "fix" could have significant performance side-effects.  Making changes in the management of data dictionary information would require substantial work and therefore be very risky to include.

Changing Triage information accordingly.

The fix of this bug introduced Bug #40369: dtype_get_sql_null_size() returns 0 or 1, not the size. This affects tables in ROW_FORMAT=REDUNDANT that have fixed-length columns that can be set NULL.

I want to point that problem with data dictionary can be used for local DOS attack

any use with create table privileges can run simple bash script

x=1
while [ 1 ]
do
 mysql -e "CREATE TABLE test$x (id int) engine=InnoDB" test
 ((x++))
done

and in short period of time all memory will be used and mysql will be crashed with Out Of Memory problem.

I consider this bug as security vulnerability with possible DOS attack.

Ronald, means by which authorised users can use their privileges to exhaust system resources are not generally considered by us to be denial of service vulnerabilities.

Pushed into 5.1.47 (revid:joro@sun.com-20100505145753-ivlt4hclbrjy8eye) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100524190136-egaq7e8zgkwb9aqi) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (pib:16)

Pushed into 6.0.14-alpha (revid:alik@sun.com-20100524190941-nuudpx60if25wsvx) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100524185725-c8k5q7v60i5nix3t) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.5.5-m3 (revid:alik@sun.com-20100615080459-smuswd9ooeywcxuc) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (merge vers: 5.1.47) (pib:16)

Pushed into mysql-next-mr (revid:alik@sun.com-20100615080558-cw01bzdqr1bdmmec) (version source revid:mmakela@bk-internal.mysql.com-20100415070122-1nxji8ym4mao13ao) (pib:16)

Pushed into 5.1.47-ndb-7.0.16 (revid:martin.skold@mysql.com-20100617114014-bva0dy24yyd67697) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.2.19 (revid:martin.skold@mysql.com-20100617115448-idrbic6gbki37h1c) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

Pushed into 5.1.47-ndb-6.3.35 (revid:martin.skold@mysql.com-20100617114611-61aqbb52j752y116) (version source revid:vasil.dimov@oracle.com-20100331130613-8ja7n0vh36a80457) (merge vers: 5.1.46) (pib:16)

This bug http://bugs.mysql.com/bug.php?id=57480 shows the similar issue but with only 256 tables

Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)

Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)

Posted by developer:
 
Fixed as of the 5.6.2 release, and here's the changelog entry:

To ease the memory load on systems with huge numbers of tables, InnoDB
now frees up the memory associated with an opened table using an LRU
algorithm to select tables that have gone the longest without being
accessed. To reserve more memory to hold metadata for open InnoDB tables,
increase the value of the table_definition_cache configuration option.
InnoDB treats this value as a soft limit for the number of open table
instances in the InnoDB data dictionary cache. The actual number of tables
with cached metadata could be higher than the value specified for
table_definition_cache, because metadata for InnoDB system tables, and
parent and child tables in foreign key relationships, is never evicted
from memory. For additional information, refer to the
table_definition_cache documentation.