Bug #115727 MySQL InnoDB Cluster - Systemd - Kill the process
Submitted: 30 Jul 11:37 Modified: 9 Aug 13:18
Reporter: Galamb Gergo Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Group Replication Severity:S3 (Non-critical)
Version:8.0.36 OS:Red Hat
Assigned to: MySQL Verification Team CPU Architecture:x86
Tags: Innodb Cluster Crash, stop MySQL, systemd

[30 Jul 11:37] Galamb Gergo
Description:
Today we edited some value in MySQL config file (my.cnf) group_replication_member_expel_timeout = 15, group_replication_unreachable_majority_timeout=10 after the edit we restarted the MySQL service on secondary node first, but when the MySQL came back we got a Duplicate Enrty (Error 1062).

"ERROR: applier thread of Group Replication Recovery channel stopped with an error: Worker 1 failed executing transaction '3d853dd0-312c-11eb-b427-005056862228:106407101' at source log , end_log_pos 194424293; Could not execute Write_rows event on table <database>.<table>; Duplicate entry '15740509230' for key '<tabe>.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's source log binlog.000312, end_log_pos 194424293 (1062) at 2024-07-25 10:03:34.786990", 

	
As far as I can tell - the trigger for this is not a new version of systemd, but a new version of the mysqld.service file in connection with one of the latest patch rounds of MYSQL.I.e. it is a bug in a patch we have received from Oracle
explanation here As I understand it now, the meaning of this sentence, "To maintain backwards compatibility, "0" continues to turn off previously existing timeout settings." , is that for service configuration files created before version 229 came into effect, the old interpretation of TimeoutSec=0 would be upheld, But for service configuration files newer than the release of 229 - the new interpretation - i.e. now - would be upheld.And - in one of the latest patch rounds of MYSQL - a newer mysqld.service file had been deployed - it was suddenly from march 27 2024 and not from 2022, which triggered the new interpretation and not the old one..... 

	
We tried the same stop process on PREPROD Env.

We stop the MySQL with systemctl stop mysqld command and then we checked the status with systemd : you can see the status (code) was killed. Zero note in the log file about shutdown process. After we edited the system service file Timeoutsec = 0 to infinity and we start and stop the MySQL with systemctl and checked the status. The status (code) was exited. The Shutdown process note in log file.

How to repeat:
	
We tried the same stop process on PREPROD Env.

We stop the MySQL with systemctl stop mysqld command and then we checked the status with systemd : you can see the status (code) was killed. Zero note in the log file about shutdown process. After we edited the system service file Timeoutsec = 0 to infinity and we start and stop the MySQL with systemctl and checked the status. The status (code) was exited. The Shutdown process note in log file.
[30 Jul 11:48] MySQL Verification Team
HI Mr. Gergo,

Thank you for your bug report.

However, let us inform you that this is a forum for the reports with fully repeatable test cases. Such a test case should consist of a set of SQL commands that always lead to the bug that is reported.

We are waiting on your proper and full test case.

Can't repeat.
[30 Jul 11:51] Galamb Gergo
We use these command when we recognize the issue: 
 - systemctl stop mysql and we see the MySQL process was killed by the systemd with (code: kill, signal=TERM)
[30 Jul 13:02] MySQL Verification Team
Hi,

I need you to clarify if you are reporting the problem with duplicate entry ( Error 1062 ) or on how systemd killed mysqld after you initiated stop?

> the latest patch rounds of MYSQL

I do not understand what you mean by this. We do not issue patches, we issue full releases. You were upgrading from what version to what version exactly?

Thanks
[30 Jul 13:14] Terje Røsten
Hi,

thanks for your report.

How does systemd check that this is true:

"for service configuration files newer than the release of 229"

Systemd service configuration files don't have versions as far as I know and TimeoutSec=0 should work as before?
[30 Jul 13:34] Galamb Gergo
So when we stopped the MySQL node with systemctl stop mysqld (which node was part of the cluster) stopped immediately (systemd killed the process) so the MySQL can not process the all transaction log as we see, and then we started the MySQL node we get few Duplicate Error.

I I guess this both issue is related to each other.
[30 Jul 13:42] Galamb Gergo
the point being we need
A. an explanation of why systemd stopped mysql with Kill-9
B. how to prevent it from happening again
C. therefore could it have caused duplication?
[30 Jul 14:55] Terje Røsten
Not sure if I follow, if you do

 systemctl mysqld stop

mysqld will of course stop as systemd will first send sigterm and mysqld will do a graceful shutdown. This should be clear from systemd and mysqld logs.

You see something different?
[30 Jul 19:39] Galamb Gergo
Yes, The MySQL immediately stop and does nothing to the log. No create a log wit eg. Shutdown Normal etc... just stop/cut the power.

If I stop the MySQL with mysqladmin shutdown / mysqlsh> shutdown  command then the MySQL stop normal and add some note to the log file also.
[31 Jul 7:26] Terje Røsten
What platform and packages are used?

Can you post output from:

$ rpm -qa | grep mysql

and

$ service mysqld status

when server is running.
[31 Jul 7:46] Galamb Gergo
systemctl status mysqld
● mysqld.service - MySQL Server
   Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/mysqld.service.d
           └─override.conf
   Active: active (running) since Fri 2024-07-26 22:19:19 UTC; 4 days ago
     Docs: man:mysqld(8)
           http://dev.mysql.com/doc/refman/en/using-systemd.html
  Process: 1501688 ExecStartPost=/bin/bash -c echo 2 > /sys/fs/cgroup/memory/system.slice/mysqld.service/memory.swappiness (code=exited, status=0/SUCCESS)
  Process: 1499198 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
 Main PID: 1499231 (mysqld)
   Status: "Server is operational"
    Tasks: 88 (limit: 1234377)
   Memory: 181.9G
   CGroup: /system.slice/mysqld.service
           └─1499231 /usr/sbin/mysqld

rpm -qa | grep mysql
mysql-commercial-common-8.0.36-1.1.el8.x86_64
mysql-commercial-client-8.0.36-1.1.el8.x86_64
mysql-commercial-debuginfo-8.0.36-1.1.el8.x86_64
mysql-commercial-icu-data-files-8.0.36-1.1.el8.x86_64
mysql-commercial-backup-debuginfo-8.0.36-1.1.el8.x86_64
mysql-commercial-server-8.0.36-1.1.el8.x86_64
mysql-connector-python3-commercial-8.0.32-1.1.el8.x86_64
mysql-commercial-client-plugins-8.0.36-1.1.el8.x86_64
mysql-shell-commercial-8.0.36-1.1.el8.x86_64
mysql-commercial-backup-8.0.36-1.1.el8.x86_64
mysql-commercial-libs-8.0.36-1.1.el8.x86_64
[31 Jul 10:22] Terje Røsten
I am not able to reproduce this.

After "service mysqld stop" I get in /var/log/mysqld.log:

2024-07-31T10:15:44.652573Z 0 [System] [MY-013172] [Server] Received SHUTDOWN from user <via user signal>. Shutting down mysqld (Version: 8.0.36-commercial).
2024-07-31T10:15:45.858143Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.36-commercial)  MySQL Enterprise Server - Commercial.

You seems to have a custom mysqld.service file, what is output from:

$ cat /etc/systemd/system/mysqld.service.d/override.conf

and

$ cat /usr/lib/systemd/system/mysqld.service
[31 Jul 12:33] Galamb Gergo
We use 
systemctl --version
systemd 239 (239-74.el8_8.5)

and the TimeoutSec value was 0 when we recognised the issue (kill the MySQL - (code=killed, signal=TERM) and then the TimeoutSec changed to 0.

We have about 30TB data in MySQL.

/etc/systemd/system/mysqld.service.d/override.conf
[Service]
LimitNOFILE=17000
LimitNICE=-20
ExecStartPost=+/bin/bash -c "echo 2 > /sys/fs/cgroup/memory/system.slice/mysqld.service/memory.swappiness"

# with the update to systemd 229 - 0 no longer means infinity
# Disable service start and stop timeout logic of systemd for mysqld service. Must ues infinity instead
TimeoutSec=infinity

/usr/lib/systemd/system/mysqld.service

[Unit]
Description=MySQL Server
Documentation=man:mysqld(8)
Documentation=http://dev.mysql.com/doc/refman/en/using-systemd.html
After=network-online.target
Wants=network-online.target
After=syslog.target

[Install]
WantedBy=multi-user.target

[Service]
User=mysql
Group=mysql

Type=notify

# Disable service start and stop timeout logic of systemd for mysqld service.
TimeoutSec=infinity

# Execute pre and post scripts as root
# hence, + prefix is used

# Needed to create system tables
ExecStartPre=+/usr/bin/mysqld_pre_systemd

# Start main service
ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS

# Use this to switch malloc implementation
EnvironmentFile=-/etc/sysconfig/mysql

# Sets open_files_limit
LimitNOFILE = 10000

Restart=on-failure

RestartPreventExitStatus=1

# Set enviroment variable MYSQLD_PARENT_PID. This is required for restart.
Environment=MYSQLD_PARENT_PID=1

PrivateTmp=false