Bug #98284 Low sysbench score in the case of a large number of connections
Submitted: 19 Jan 2020 8:58 Modified: 27 Feb 2020 13:17
Reporter: Zhang JiYang
Status: Can't repeat
Category:MySQL Server: Connection Handling Severity:S3 (Non-critical)
Version:8.0 OS:Linux (3.10.0-327.x86_64)
Assigned to: CPU Architecture:x86
Tags: point_select, poll, ppoll, sysbench

[19 Jan 2020 8:58] Zhang JiYang
Description:
I ran a sysbench test against MySQL 8.0:

sysbench oltp_point_select --mysql-user=sbtest --tables=16 --table_size=1000000 --threads=1024 prepare/ run.

The score is quite low: less than 1/3 of what the same benchmark achieves on MySQL 5.7.

I used perf to find out what was wrong and found that almost 60% of the time is spent in the kernel, with this stack:

_raw_spin_lock_irq
__set_current_blocked
sigprocmask
sys_ppoll
system_call_fastpath
ppoll
vio_socket_io_wait
vio_read
net_read_raw_loop
net_read_packet
my_net_read
Protocol_classic::read_packet
Protocol_classic::get_command
do_command
handle_connection
pfs_spawn_thread
start_thread

It seems that the ppoll system call acquires a global lock to set the signal mask. I switched from ppoll back to poll and ran the benchmark again; the score was then close to that of MySQL 5.7.

Is this expected or known behavior? Is it necessary to use ppoll instead of poll? What does ppoll achieve here?

Please tell me about this. Thanks a lot!
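
For readers comparing the two calls, here is a minimal sketch of the same wait written with poll() and with ppoll(). The helper names are hypothetical and this is not the code from vio_socket_io_wait. Per the ppoll(2) manual page, a non-null signal mask is installed for the duration of the wait and restored afterwards, which is presumably where the sigprocmask frames in the trace above come from, assuming the server passes a non-null mask. With a null mask, ppoll() differs from poll() only in timeout precision.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // ppoll() is a Linux/glibc extension
#endif
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: wait until fd is readable using plain poll().
// The kernel does no signal-mask work on this path.
int wait_readable_poll(int fd, int timeout_ms) {
  pollfd pfd{};
  pfd.fd = fd;
  pfd.events = POLLIN;
  return poll(&pfd, 1, timeout_ms);
}

// Hypothetical helper: the same wait using ppoll(). If mask is non-null,
// the kernel swaps it in for the wait and restores the old mask afterwards;
// if mask is null, no signal-mask manipulation is performed.
int wait_readable_ppoll(int fd, int timeout_ms, const sigset_t *mask) {
  pollfd pfd{};
  pfd.fd = fd;
  pfd.events = POLLIN;
  timespec ts{};
  ts.tv_sec = timeout_ms / 1000;
  ts.tv_nsec = (timeout_ms % 1000) * 1000000L;
  // A negative timeout means "wait forever", expressed as a null timespec.
  return ppoll(&pfd, 1, timeout_ms < 0 ? nullptr : &ts, mask);
}
```

Both helpers return the number of ready descriptors, 0 on timeout, or -1 on error, so they can be swapped for one another when comparing the two code paths under load.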

How to repeat:
Run sysbench:

sysbench oltp_point_select --mysql-user=sbtest --tables=16 --table_size=1000000 --threads=1024 prepare/ run.
[21 Jan 2020 2:49] zhai weixiang
I can repeat the issue on my dev machine, and poll behaves much better than ppoll when running a point-select workload (400k vs 1440k qps).
[21 Jan 2020 12:24] Ståle Deraas
Can the issue be related to: https://lkml.org/lkml/2016/9/27/213 ?
[3 Feb 2020 7:20] Fangxin Flou
Code to temporarily disable ppoll:

diff --git a/include/violite.h b/include/violite.h
index e3a3ccaa539..a6a5d76b8c0 100644
--- a/include/violite.h
+++ b/include/violite.h
@@ -53,6 +53,17 @@ struct Vio;
 #define USE_PPOLL_IN_VIO
 #endif

+#if defined(__linux__)
+# include <linux/version.h>
+#if LINUX_VERSION_CODE < KERNEL_VERSION(4, 19, 0)
+#ifdef USE_PPOLL_IN_VIO
+#undef USE_PPOLL_IN_VIO
+#include <signal.h>
+#include <atomic>
+#endif
+#endif
+#endif
+
 #if defined(__cplusplus) && defined(USE_PPOLL_IN_VIO)
 #include <signal.h>
 #include <atomic>
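
Note that the LINUX_VERSION_CODE test in the patch above is evaluated against the kernel headers of the build machine, not the kernel the server later runs on. A runtime check could look roughly like the sketch below. The helper names are hypothetical, the 4.19 cutoff is simply the one used in the patch and has not been verified here, and distribution kernels often backport fixes, so the version number is only a heuristic.

```cpp
#include <sys/utsname.h>
#include <cassert>
#include <cstdio>

// Hypothetical helper: compare the "major.minor" prefix of a kernel release
// string (for example "3.10.0-327.x86_64") against a wanted version.
// Returns false if the string cannot be parsed.
bool release_at_least(const char *release, int want_major, int want_minor) {
  int major = 0;
  int minor = 0;
  if (std::sscanf(release, "%d.%d", &major, &minor) != 2) return false;
  return major > want_major || (major == want_major && minor >= want_minor);
}

// Hypothetical helper: ask the running kernel for its release via uname(2).
bool running_kernel_at_least(int want_major, int want_minor) {
  utsname info{};
  if (uname(&info) != 0) return false;
  return release_at_least(info.release, want_major, want_minor);
}
```

A server could call running_kernel_at_least(4, 19) once at startup and select the poll or ppoll path from the result instead of fixing the choice at compile time.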
[3 Feb 2020 9:02] Zhang JiYang
It seems that there is something wrong with ppoll on older kernel versions.

But I still have a question:

Please let me know what the purpose of replacing poll with ppoll was. Thanks a lot.
[27 Feb 2020 13:17] MySQL Verification Team
Hi Mr. zjy,

Thank you for your bug report.

I have tested your benchmark on my iMac with both the latest 5.7 and the latest 8.0. The difference in speed is 10%.

Hence, I can't repeat your findings.

This is an issue in the Linux kernel, which is explained in the link posted by our colleague Ståle Deraas earlier in this bug.

We have to use ppoll() instead of poll(), in order to be able to reliably catch signals.

Hence, you should bring this issue up with the teams that maintain the Linux kernel.
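
The reply above does not spell out what "reliably catch signals" means. One common reading is the race-free pattern that ppoll() exists for: a signal is kept blocked during normal execution and is unblocked only for the duration of the wait, so it cannot slip in between a check and the blocking call. The sketch below illustrates that pattern with SIGUSR1. All names are hypothetical and it is not taken from the MySQL source.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // ppoll() is a Linux/glibc extension
#endif
#include <poll.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

static volatile sig_atomic_t got_signal = 0;
static void on_signal(int) { got_signal = 1; }

// Hypothetical demo: returns true if a signal that was blocked (and already
// pending) is delivered during ppoll() and interrupts the wait with EINTR.
bool ppoll_sees_blocked_signal() {
  struct sigaction sa{};
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;

  // Block SIGUSR1 for normal execution.
  sigset_t block, old_mask;
  sigemptyset(&block);
  sigaddset(&block, SIGUSR1);
  if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0) return false;

  got_signal = 0;
  raise(SIGUSR1);  // stays pending because the signal is blocked
  bool stayed_pending = (got_signal == 0);

  // Mask used only while waiting: the previous mask with SIGUSR1 unblocked.
  sigset_t wait_mask = old_mask;
  sigdelset(&wait_mask, SIGUSR1);

  int fds[2];
  if (pipe(fds) != 0) return false;
  pollfd pfd{};
  pfd.fd = fds[0];
  pfd.events = POLLIN;
  timespec ts{};
  ts.tv_sec = 1;

  // The kernel swaps in wait_mask and waits as one atomic step.
  int rc = ppoll(&pfd, 1, &ts, &wait_mask);
  int err = errno;

  close(fds[0]);
  close(fds[1]);
  sigprocmask(SIG_SETMASK, &old_mask, nullptr);
  return stayed_pending && rc == -1 && err == EINTR && got_signal == 1;
}
```

With plain poll(), the equivalent code would have to unblock the signal in a separate call before waiting, which leaves a window in which the handler can run just before poll() starts and the wait then sleeps for the full timeout.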