MySQL Bugs: #102177: TPCC performance worse in NVMe PCIE4.0 SSD than in SATA SSD

Bug #102177	TPCC performance worse in NVMe PCIE4.0 SSD than in SATA SSD
Submitted:	7 Jan 2021 13:05	Modified:	21 Jan 2021 13:10
Reporter:	haochen he
Status:	Not a Bug
Category:	MySQL Server	Severity:	S2 (Serious)
Version:	8.0.22	OS:	Ubuntu (20.10-Desktop)
Assigned to:		CPU Architecture:	x86 (AMD Ryzen 9 3900XT)
Tags:	performance, SSD

Description:
Environment:
  CPU: AMD Ryzen 9 3900XT
  MEM: 64GB DDR4 3200MHz
  Filesystem: all are Ext4
  Disks:
    SATA SSD: Samsung 860 Evo 500GB (SATA)
    NVMe SSD: Samsung 980 Pro 500GB (M.2 PCIe-4.0)

Configurations:
  All default values.

Workload:
  TPCC: https://github.com/Percona-Lab/tpcc-mysql
  ./tpcc_start -hlocalhost -dtpcc10 -uroot -w10 -c32 -r10 -l100

Result (10 seconds per line):
  NVMe SSD
	Transactions	95% Latency(ms)	99% Latency(ms)	Max Latency(ms)
	591	421.557	560.722	824.984
	549	431.903	615.245	872.602
	583	458.413	589.994	700.353
	570	473.9	645.815	850.923
	558	504.646	619.124	877.969
	506	541.586	651.836	856.785
	505	541.1	680.547	1039.654
	588	496.555	642.922	846.336
	555	481.334	669.034	1023.519
	571	444.229	584.37	768.01
  SATA SSD
	Transactions	95% Latency(ms)	99% Latency(ms)	Max Latency(ms)
	1186	187.812	235.577	337.117
	1107	218.265	286.348	369.749
	1166	205.704	272.223	460.509
	1150	199.877	271.734	387.682
	1237	173.178	227.606	312.985
	1278	161.076	207.249	350.234
	1278	157.736	192.479	308.79
	1280	157.312	205.151	270.539
	1219	168.979	220.102	330.278
	1201	176.845	219.575	303.207
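
To put the two tables side by side, here is a minimal sketch that averages the transaction counts per 10-second interval and computes the ratio between the devices. The numbers are copied from the results above; the variable and function names are only illustrative.

```python
# Transactions per 10-second interval, copied from the result tables above.
nvme = [591, 549, 583, 570, 558, 506, 505, 588, 555, 571]
sata = [1186, 1107, 1166, 1150, 1237, 1278, 1278, 1280, 1219, 1201]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


nvme_mean = mean(nvme)
sata_mean = mean(sata)
ratio = sata_mean / nvme_mean

print(f"NVMe mean: {nvme_mean:.1f} transactions per 10 s")
print(f"SATA mean: {sata_mean:.1f} transactions per 10 s")
print(f"SATA / NVMe throughput ratio: {ratio:.2f}")
```

On these ten samples the SATA drive completes roughly 2.2 times as many transactions per interval as the NVMe drive.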

I am fairly sure this is not a setup mistake: this case is just one of many runs that show the same result. I also ran a `vdbench` test on both devices to confirm that the 980 Pro is faster than the 860 Evo:

  > ./vdbench -f test_config.txt

######### CONTENT OF test_config.txt ########
sd=ssd-fast,lun=/dev/nvme1n1p1
sd=ssd-slow,lun=/dev/sda1
sd=hdd-fast,lun=/dev/sdc1
sd=hdd-slow,lun=/dev/sdd1
wd=wd1,sd=ssd-fast,xfersize=4k,rdpct=0
wd=wd2,sd=ssd-slow,xfersize=4k,rdpct=0
wd=wd3,sd=hdd-fast,xfersize=4k,rdpct=0
wd=wd4,sd=hdd-slow,xfersize=4k,rdpct=0
wd=wd5,sd=ssd-fast,xfersize=4k,rdpct=100
wd=wd6,sd=ssd-slow,xfersize=4k,rdpct=100
wd=wd7,sd=hdd-fast,xfersize=4k,rdpct=100
wd=wd8,sd=hdd-slow,xfersize=4k,rdpct=100
rd=run1,wd=wd*,iorate=max,openflags=o_direct,elapsed=30,interval=3
######################################################

And the *IOPS* of the two devices:
  NVMe SSD: 271082.7
  SATA SSD: 54806.8

Quite interesting results. What happened?

How to repeat:
Run the workload above on each of the two devices.

Questions I have:
* What are the fsync, read, and write latencies for the devices?
* Are the devices healthy (check dmesg)?
* What is the device status ("nvme smart-log" for NVMe, smartctl for SATA)?
See http://smalldatum.blogspot.com/2017/10/wearing-out-ssd.html

Thanks for the reply.

The result of smart-log & smartctl:

➜  ~ sudo nvme smart-log /dev/nvme1
Smart Log for NVME device:nvme1 namespace-id:ffffffff
critical_warning			: 0
temperature				: 35 C
available_spare				: 100%
available_spare_threshold		: 10%
percentage_used				: 0%
endurance group critical warning summary: 0
data_units_read				: 302,863
data_units_written			: 949,569
host_read_commands			: 11,674,407
host_write_commands			: 31,466,241
controller_busy_time			: 255
power_cycles				: 23
power_on_hours				: 7
unsafe_shutdowns			: 9
media_errors				: 0
num_err_log_entries			: 0
Warning Temperature Time		: 0
Critical Composite Temperature Time	: 0
Temperature Sensor 1           : 35 C
Temperature Sensor 2           : 45 C
Thermal Management T1 Trans Count	: 0
Thermal Management T2 Trans Count	: 0
Thermal Management T1 Total Time	: 0
Thermal Management T2 Total Time	: 0

➜  ~ sudo smartctl --all /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.8.0-36-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 500GB
Serial Number:    S3Z3NB0NA09331Z
LU WWN Device Id: 5 002538 e90a06b62
Firmware Version: RVT04B6Q
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jan 14 09:16:51 2021 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x53) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  85) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       463
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       19
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       7
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   049   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       4
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       1696163175

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The fsync latency:

````````````````The code I use to get fsync latency``````````````
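#!/usr/bin/env python3

import mmap
import os

# Open the test file with O_DIRECT so that writes bypass the page cache.
fd = os.open("testfile", os.O_RDWR | os.O_CREAT | os.O_DIRECT)

# An anonymous mmap is page-aligned, which O_DIRECT requires of the buffer.
m = mmap.mmap(-1, 512)

# 999 iterations: rewrite the same 512-byte block and fsync after each write.
for i in range(1, 1000):
    os.lseek(fd, 0, os.SEEK_SET)  # argument order is (fd, position, whence)
    m[1] = ord("1")               # mmap items are integers in Python 3
    os.write(fd, m)
    os.fsync(fd)

# Close the file
os.close(fd)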

`````````````````````````````````
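
The script as posted does not record any timings itself, so the latency figures were presumably obtained by timing the whole run externally (an assumption; the thread does not say). As a hedged variation, the sketch below times each write plus fsync with `time.perf_counter`. The function name `measure_fsync_latency` and its parameters are hypothetical, and unlike the original it does not use `O_DIRECT`, so that it also runs on filesystems that reject that flag.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latency(path, iterations=1000, block_size=512):
    """Return the latency of each write+fsync pair, in milliseconds."""
    payload = b"1" * block_size
    latencies_ms = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)      # always rewrite the same block
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)                      # wait until the device acknowledges
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies_ms


if __name__ == "__main__":
    # The temporary directory is created under the current directory, so run
    # this from a directory on the device you want to measure.
    with tempfile.TemporaryDirectory(dir=".") as workdir:
        samples = measure_fsync_latency(os.path.join(workdir, "testfile"))
    print(f"mean   : {statistics.mean(samples):.3f} ms")
    print(f"median : {statistics.median(samples):.3f} ms")
    print(f"max    : {max(samples):.3f} ms")
```

Reporting the median and maximum next to the mean helps show whether a high average comes from every fsync being slow or from a few outliers.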

Result of fsync latency (which is very STRANGE):
  0.895 ms for Samsung 860 evo (SATA)
  5.352 ms for Samsung 980 pro (NVMe)
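
These latencies can be turned into a rough upper bound on how many strictly serial fsync calls each device can complete per second. The sketch below is a back-of-the-envelope calculation under simplifying assumptions: one fsync at a time with nothing overlapping. With the default `innodb_flush_log_at_trx_commit = 1`, commits wait for the redo log to be flushed, so fsync latency matters for a write-heavy workload like TPCC, although group commit and concurrent I/O mean the real transaction rate does not map one-to-one onto this number.

```python
# Measured fsync latencies in milliseconds, copied from the results above.
fsync_latency_ms = {
    "Samsung 860 EVO (SATA)": 0.895,
    "Samsung 980 PRO (NVMe)": 5.352,
}


def max_serial_fsyncs_per_second(latency_ms):
    """Upper bound on back-to-back fsync calls per second (no overlap assumed)."""
    return 1000.0 / latency_ms


for device, latency in fsync_latency_ms.items():
    rate = max_serial_fsyncs_per_second(latency)
    print(f"{device}: {latency} ms per fsync, at most {rate:.0f} fsyncs/s")

slowdown = (fsync_latency_ms["Samsung 980 PRO (NVMe)"]
            / fsync_latency_ms["Samsung 860 EVO (SATA)"])
print(f"NVMe fsync is {slowdown:.1f}x slower than SATA in this measurement")
```

Under these assumptions the SATA drive can sustain on the order of 1100 serial fsyncs per second and the NVMe drive fewer than 200, which points in the same direction as the TPCC results even though the raw IOPS figures favor the NVMe drive.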

For read/write latency, I can only refer to the official data sheets of the two devices:
https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_SSD_860_EVO_Data_Sheet_...
https://s3.ap-northeast-2.amazonaws.com/global.semi.static/Samsung_NVMe_SSD_980_PRO_Data_S...
According to these, the 980 Pro is superior to the 860 Evo in all listed aspects (i.e. random read/write IOPS and sequential read/write throughput).

I also noticed that when I disable the doublewrite buffer, the performance gap between the two devices narrows considerably. Please refer to
https://dba.stackexchange.com/questions/282927/using-double-write-buffer-is-8x-slower-in-s...

Using dmesg, I did not see any error messages after running several performance experiments on these two devices. In fact, I bought both devices from the Samsung official mall only two weeks ago.

Hi Mr. he,

Thank you for your bug report.

However, this is not a bug in MySQL.

Simply put, you have yourself diagnosed a latency that is entirely independent of our server.

Not a bug.

@haochen he: do you use a Ryzen CPU, and which motherboard is it?
I experienced this with a Ryzen 5 3600 CPU and a Gigabyte B550M DS3H motherboard.

I didn't test other systems, but I found other people experiencing the same latency in MySQL with the Samsung 980 Pro and Ryzen CPUs. I am trying to see whether the Ryzen CPU or the motherboard might be the cause.

With a Kingston DC1000B NVMe drive I didn't see this bad latency, but with many other NVMe drives I have seen the same bad latency.