MySQL Bugs: #97053: Mysql instance crash after many "buffer pool occupied by lock heaps" messages

Bug #97053	Mysql instance crash after many "buffer pool occupied by lock heaps" messages
Submitted:	27 Sep 2019 13:57	Modified:	7 Oct 2019 12:52
Reporter:	Avi Vainshtein	Email Updates:
Status:	Not a Bug	Impact on me:	None
Category:	MySQL Server	Severity:	S3 (Non-critical)
Version:	5.7.23	OS:	CentOS (6.8)
Assigned to:		CPU Architecture:	Other (VMWare 64-bit)

Description:
Mission-critical DB Instance (5.7.23, community edition) crashed and started after numerous messages :
-------------------------------------------
...
2019-09-26T17:59:21.873376+02:00 2919048 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. Your buffer pool size is 3071 MB. Maybe you should make the buffer pool bigger?. Starting the InnoDB Monitor to print diagnostics, including lock heap and hash index sizes.
...
2019-09-26T18:16:43.366062+02:00 2884262 [ERROR] [FATAL] InnoDB: Over 95 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do no
t set too many row locks. Your buffer pool size is 3071 MB. Maybe you should make the buffer pool bigger? We intentionally generate a seg fault to print a stack trace on Linux!
2019-09-26 18:16:43 0x7f8ab76fd700 InnoDB: Assertion failure in thread 140233759774464 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%15:16:43 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=895
max_threads=1500
thread_count=781
connection_count=780
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 604254 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f89b021fb30
...
2019-09-26T18:16:43.366062+02:00 2884262 [ERROR] [FATAL] InnoDB: Over 95 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do no
t set too many row locks. Your buffer pool size is 3071 MB. Maybe you should make the buffer pool bigger? We intentionally generate a seg fault to print a stack trace on Linux!
2019-09-26 18:16:43 0x7f8ab76fd700 InnoDB: Assertion failure in thread 140233759774464 in file ut0ut.cc line 942
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%15:16:43 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.

Thread pointer: 0x7f89b021fb30
-------------------------------------------

No messages in /var/log/messages.

Prior to the crash - a few (three) applicative sessions Locks were discovered.
Tried to kill two of them - they remained in Killed status till the crash.

Following messages trapped at the Application front-end :
"Message: The total number of locks exceeds the lock table sizeSource..."

The host has 48GN of RAM, innodb_buffer_pool_size is 24GB.

How to repeat:
Almost two monthes ago, on 29-jul-2019 that DB Instance was upgraded
from 5.6.26 to 5.7.23.

On the next day 30-jul-2019 similar behavior happened :
a lot of messages "[Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index" in the ERR log, but the DB Instance did not crash then.
Sine 30-july and till yesterday 26-sep - that didn't happen.

Suggested fix:
Can't determine the reason of the behavior, so no idea how to fix.

Hi Mr. VAinshtein,

Thank you for your bug report.

However, this is not a bug. Simply, you have misconfigured your MySQL server.

You should either increase buffer pool drastically and / or disable Adaptive Hash Index and / or  rewrite your DMLs in order to decrease the locks involved.

All of this is described in our Reference Manual.

more details :
my script which monitors the locks reported twice following messages prior to DB crash :
ERROR 1206 (HY000) at line 27: The total number of locks exceeds the lock table size

HI,

As I wrote before, this is not a bug.

Our Reference Manual explains in full detail what to do in case when you get this error.

Hello Sinisa Milivojevic,

Many thanks for your explanations.

As from my checks of the trace files - it seems that the issue was related
rather to Locks than to AHI. 

From the InnoDB Monitor Diagnostics - i've detected at least two transactions, which were Active for about 1 hour priot to crash. Both transactions were doing Insert, into separated tables. (i have to mention that there are a lot of ForeignKey relations between our applicative tables.)

During that last hour, prior to instance crash, both transaction remained
Active and were increasing constantly the values of "heap size" and "row lock(s)".

So, just before the crash, those values were "heap size 11428847824" (10.6GB) and "heap size 12953870544" (12GB).

Our innodb_buffer_pool_size is set to 24GB, so the summary of "heap size" from those two Active transactions became finally 22.6GB from 24GB total (>94%).

I've searched in the MySQL documentation for explanations about "heap size" of Active InnoDB transactions, but didn't find something useful.

Could you please kindly refer my to the proper documents, which could explain : under which conditions the "heap size" of an Active transactions keeps increasing ?

Best regards and looking forward your kind assistance,
Avi Vainshtein

By the way: the total size of first Insert-involved table is 0.5GB and of the second table is 3GB. How could it be that the "heap size" became 10.6GB and 12GB ?

Hi Mr. VAinshtein,

Long running transactions are not a problem in the MySQL server, but a problem in your application. It is possible that you made a mistake, so that user's input is waited upon while transaction has already started. There are many other possibilities, like waiting on some processing or network traffic, while transaction is active. 

That is a bad application programming and not our bug.

Hello

I understand that there are many other possibilities when an application can behave wrong/unexpected.
I also understand your statement that this is not a MySQL bug.

I am just trying to clarify under which conditions the locks "heap size" of an Active transaction keeps increasing constantly ?

I will explain myself with the following data : two transactions,
locked at two different "Insert ... Into... Values..." statements at 17:25.
Fifty minutes later, at 18:15, the same two transactions , locked at the same statements as at 17:25, but their "heap size" increased dramatically.
How can this be explained ?

Thu Sep 26 17:25:01 IDT 2019
---TRANSACTION 33679120556, ACTIVE 229 sec inserting
3 lock struct(s), heap size 607527120, 6822790 row lock(s), undo log entries 4
INSERT INTO APP_CMT_MissionAttributes (`ServiceOrderId`,`MissionId`,`SuMissionAttributeId`,`Amount`,`SuActionId`,`IsActive`,`CreatedBy`,`ModifiedBy`,`CreatedOn`,`ModifiedOn`) VALUES (10902699,NULL,27,NULL,1,1,'CMSRV','CMSRV','2019-09-26 17:21:13.285','2019-09-26 17:21:13.285')
---TRANSACTION 33679127185, ACTIVE 152 sec inserting
LOCK WAIT 24 lock struct(s), heap size 356802768, 3397223 row lock(s), undo log entries 6
INSERT INTO APP_ORD_CartChangesSummaryProducts (`CartChangesSummaryId`,`ProductLevelId`,`ProductActionId`,`ActivityId`,`ActivityActionId`,`ProductComponentSocId`,`ProductComponentSocFeatureId`,`PpChangeInd`,`FieldName`,`PrevValue`,`NewValue`) VALUES (890455,2,3,14,1,8815984,11881204,0,'StepId',NULL,'506')
...
...
Thu Sep 26 18:15:01 IDT 2019
---TRANSACTION 33679120556, ACTIVE 3229 sec inserting
3 lock struct(s), heap size 11057733840, 124183508 row lock(s), undo log entries 4
INSERT INTO APP_CMT_MissionAttributes (`ServiceOrderId`,`MissionId`,`SuMissionAttributeId`,`Amount`,`SuActionId`,`IsActive`,`CreatedBy`,`ModifiedBy`,`CreatedOn`,`ModifiedOn`) VALUES (10902699,NULL,27,NULL,1,1,'CMSRV','CMSRV','2019-09-26 17:21:13.285','2019-09-26 17:21:13.285')
---TRANSACTION 33679127185, ACTIVE 3152 sec inserting
LOCK WAIT 24 lock struct(s), heap size 12520612048, 119214792 row lock(s), undo log entries 6
INSERT INTO APP_ORD_CartChangesSummaryProducts (`CartChangesSummaryId`,`ProductLevelId`,`ProductActionId`,`ActivityId`,`ActivityActionId`,`ProductComponentSocId`,`ProductComponentSocFeatureId`,`PpChangeInd`,`FieldName`,`PrevValue`,`NewValue`) VALUES (890455,2,3,14,1,8815984,11881204,0,'StepId',NULL,'506')

Best regards,
Avi

Hi Mr. Vainshtein,

The answer to your question is simple. 

Number of locks will increase by each new row that is locked, for whatever the reason.

I would also like to remind you that this is not a forum for free support, but a forum for the bugs with fully repeatable test cases that demonstrate a bug.

One more comment ........

You have a transaction lasting for 45 minutes and you ask why is heap size for locks increasing ......