Description:
Log is full of these messages:
```
Jun 26 10:01:59 hostname mysqlrouter[3179808]: after_event_fired(1158, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 10:10:19 hostname mysqlrouter[3179808]: after_event_fired(1106, 00000000000000000000000000011000) not in after_event_fired(after_event_fired(11000000000000000000000000000000
Jun 26 10:10:19 hostname mysqlrouter[3179808]: after_event_fired(1106, 00000000000000000000000000011000) generic:33 1152Numerical argument out of domain
Jun 26 10:10:19 hostname mysqlrouter[3179808]: , 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 10:10:19 hostname mysqlrouter[3179808]: after_event_fired(1152, 00000000000000000000000000011000) 1149, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 10:10:19 hostname mysqlrouter[3179808]: generic:33 Numerical argument out of domain
Jun 26 10:10:19 hostname mysqlrouter[3179808]: after_event_fired(1149, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 10:55:06 hostname mysqlrouter[3179808]: after_event_fired(1122, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 10:55:06 hostname mysqlrouter[3179808]: after_event_fired(1122, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 11:16:11 hostname mysqlrouter[3179808]: after_event_fired(1102, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 11:16:11 hostname mysqlrouter[3179808]: after_event_fired(1102, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 11:55:51 hostname mysqlrouter[3179808]: after_event_fired(1133, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 11:55:51 hostname mysqlrouter[3179808]: after_event_fired(1133, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 12:53:45 hostname mysqlrouter[3179808]: after_event_fired(1116, after_event_fired(00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 12:53:45 hostname mysqlrouter[3179808]: after_event_fired(1116, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 12:53:45 hostname mysqlrouter[3179808]: 1010, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 12:53:45 hostname mysqlrouter[3179808]: after_event_fired(1010, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 13:20:10 hostname mysqlrouter[3179808]: after_event_fired(1162, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 13:20:10 hostname mysqlrouter[3179808]: after_event_fired(1162, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 13:26:27 hostname mysqlrouter[3179808]: after_event_fired(1134, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 13:26:27 hostname mysqlrouter[3179808]: after_event_fired(1134, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 14:21:20 hostname mysqlrouter[3179808]: after_event_fired(1136, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 14:21:20 hostname mysqlrouter[3179808]: after_event_fired(1136, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 15:58:20 hostname mysqlrouter[3179808]: after_event_fired(1154, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 15:58:20 hostname mysqlrouter[3179808]: after_event_fired(1154, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 16:12:57 hostname mysqlrouter[3179808]: after_event_fired(1139, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 16:12:57 hostname mysqlrouter[3179808]: after_event_fired(1139, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 16:37:23 hostname mysqlrouter[3179808]: after_event_fired(1096, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 16:37:23 hostname mysqlrouter[3179808]: after_event_fired(1096, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 22:40:38 hostname mysqlrouter[3179808]: after_event_fired(1218, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 22:40:38 hostname mysqlrouter[3179808]: after_event_fired(1218, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
Jun 26 23:00:22 hostname mysqlrouter[3179808]: after_event_fired(1200, 00000000000000000000000000011000) not in 11000000000000000000000000000000
Jun 26 23:00:22 hostname mysqlrouter[3179808]: after_event_fired(1200, 00000000000000000000000000011000) generic:33 Numerical argument out of domain
```
The epoll event mask is always the same in all error messages, only file descriptor number differs:
```
Sep 24 12:25:24 hostname mysqlrouter[3690615]: after_event_fired(285, 00000000000000000000000000011000) not in 11000000000000000000000000000000
```
Let's interpret both numbers (mask expected, event received):
```
$ cat test.cc ; g++ test.cc -o test && ./test
#include <iostream>
#include <bitset>
#include <sys/epoll.h>
int main() {
std::cout
<< std::bitset<32>(EPOLLIN) << " EPOLLIN
"
<< std::bitset<32>(EPOLLOUT) << " EPOLLOUT
"
<< std::bitset<32>(EPOLLERR) << " EPOLLERR
"
<< std::bitset<32>(EPOLLONESHOT) << " EPOLLONESHOT
"
<< std::bitset<32>(EPOLLHUP) << " EPOLLHUP
"
<< std::bitset<32>(EPOLLET) << " EPOLLET
"
<< "00000000000000000000000000011000 Received from epoll
"
<< "11000000000000000000000000000000 Mask set for this particular file descriptor"
<< std::endl;
return 0;
}
00000000000000000000000000000001 EPOLLIN
00000000000000000000000000000100 EPOLLOUT
00000000000000000000000000001000 EPOLLERR
01000000000000000000000000000000 EPOLLONESHOT
00000000000000000000000000010000 EPOLLHUP
10000000000000000000000000000000 EPOLLET
00000000000000000000000000011000 Received from epoll
11000000000000000000000000000000 Mask set for this particular file descriptor
```
Mask expected for the event is: EPOLLONESHOT combined with EPOLLET.
Technically, mysqlrouter disabled all events for this particular file descriptor.
What event kernel sends to mysqlrouter? It sends EPOLLERR with EPOLLHUP
In https://elixir.bootlin.com/linux/v4.18/source/net/ipv4/tcp.c#L524 we can see
```
if (sk->sk_shutdown == SHUTDOWN_MASK || state == TCP_CLOSE)
mask |= EPOLLHUP;
/* This barrier is coupled with smp_wmb() in tcp_reset() */
smp_rmb();
if (sk->sk_err || !skb_queue_empty(&sk->sk_error_queue))
mask |= EPOLLERR;
```
MySQL router expects the descriptor to have mask ONESHOT and ET; In https://github.com/mysql/mysql-server/blob/mysql-cluster-8.0.29/router/src/harness/tests/t... we can see:
```
SCOPED_TRACE("// check fd-interest after remove");
interest_res = io_svc.interest(fds.first);
ASSERT_TRUE(interest_res);
EXPECT_EQ(interest_res.value(), EPOLLET | EPOLLONESHOT);
```
Thus the problematic scenario is:
- The socket is closed
- mysqlrouter removes "interest" for fd
- kernel notifies (probably for the second time, regardless to ONESHOT and ET)
The whole situation could be related to https://lwn.net/Articles/864947/
Credit of analysis goes to Nickolay Ihalainen.
How to repeat:
Unknown; Happens in customer environment and we are unable to record strace or equivalent for long enough to capture the epoll sequence of events
Suggested fix:
- Handle the unexpected kernel reply
- Make those lines debug only (they don't seem to add any valuable information for users, and the issue should be innocuous)