Bug #119513 MySQL Router SSL Error Leading to Crash
Submitted: 5 Dec 1:27
Reporter: Hyun Young Jung
Status: Open
Category: MySQL Router
Severity: S2 (Serious)
Version: 8.0.40
OS: Linux (Rocky Linux 8.10 (Green Obsidian))
Assigned to:
CPU Architecture: x86
Tags: crash, OpenSSL 3.0, protocol shutdown, router, SSL

[5 Dec 1:27] Hyun Young Jung
Description:
### Problem Summary
MySQL Router 8.0.40 terminates unexpectedly after logging an SSL `protocol is shutdown` error. The error is raised during SSL handshake/communication in the routing layer while it processes a client connection, and the whole Router process then shuts down and needs a manual restart. **Important**: This happens even though most clients are configured with `useSSL=false` (explicitly requesting non-SSL connections), which suggests that Router may attempt SSL negotiation despite the client's non-SSL preference when its SSL mode is set to PREFERRED.

### Environment
- **MySQL Router Version**: 8.0.40 (MySQL Community - GPL)
- **Operating System**: Rocky Linux 8.10 (Green Obsidian)
- **Kernel**: Linux 4.18.0-553.34.1.el8_10.x86_64
- **Architecture**: x86_64
- **OpenSSL Version**: 3.0.15 (3 Sep 2024)
- **Router Name**: innoherab-myrouter01
- **Cluster Type**: InnoDB Cluster (Group Replication)
- **Cluster Name**: herab-cluster
- **Number of Cluster Members**: 5 (all ONLINE)

### Configuration
```
[DEFAULT]
client_ssl_mode = PREFERRED
server_ssl_mode = PREFERRED
server_ssl_verify = DISABLED
client_ssl_cert = /data/mysqlrouter/data/router-cert.pem
client_ssl_key = /data/mysqlrouter/data/router-key.pem

[routing:herab_rw]
bind_address = 0.0.0.0
bind_port = 6446
destinations = metadata-cache://herab-clusterr/?role=PRIMARY
routing_strategy = first-available
protocol = classic
connection_sharing = 1
```

### Error Message
```
2025-12-04 19:28:49 routing ERROR [7f71797fa700] classic::loop() processor failed: error:0A0000CF:SSL routines::protocol is shutdown (tls_err:167772367)
```

### Error Details
- **OpenSSL error code**: 0A0000CF (hexadecimal)
- **TLS error code**: 167772367 (the same value in decimal, as logged in `tls_err`)
- **Error location**: classic::loop() processor
- **Thread ID**: 7f71797fa700
- **Error type**: SSL routines::protocol is shutdown
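
The two codes above can be cross-checked with a little arithmetic. The sketch below assumes the OpenSSL 3.x error-code packing (reason code in the low 23 bits, library number in the 8 bits above them), under which library 20 is the SSL library and reason 207 is `SSL_R_PROTOCOL_IS_SHUTDOWN`. The helper name `decode_openssl_error` is made up for illustration.

```python
def decode_openssl_error(code: int) -> tuple[int, int]:
    """Split a packed OpenSSL 3.x error code into (library, reason).

    Assumes the OpenSSL 3.x layout: reason in the low 23 bits,
    library number in the 8 bits above them.
    """
    library = (code >> 23) & 0xFF
    reason = code & 0x7FFFFF
    return library, reason


hex_code = int("0A0000CF", 16)   # from "error:0A0000CF"
dec_code = 167772367             # from "tls_err:167772367"

print(hex_code == dec_code)            # True: same value, two notations
print(decode_openssl_error(dec_code))  # (20, 207)
```

So the log line carries a single error, printed once in hexadecimal and once in decimal.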

### Log Sequence
```
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (3015 us)> select cluster_type from mysql_innodb_cluster_metadata.v2_this_instance // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (1876 us)> select count(clusterset_id) from mysql_innodb_cluster_metadata.v2_this_instance i join mysql_innodb_cluster_metadata.v2_cs_members csm on i.cluster_id = csm.cluster_id where clusterset_id is not null // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (334 us)> SELECT member_state FROM performance_schema.replication_group_members WHERE CAST(member_id AS char ascii) = CAST(@@server_uuid AS char ascii) // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (317 us)> SELECT SUM(IF(member_state = 'ONLINE', 1, 0)) as num_onlines, SUM(IF(member_state = 'RECOVERING', 1, 0)) as num_recovering, COUNT(*) as num_total FROM performance_schema.replication_group_members // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (512 us)> select C.cluster_id, C.cluster_name, I.mysql_server_uuid, I.endpoint, I.xendpoint, I.attributes from mysql_innodb_cluster_metadata.v2_instances I join mysql_innodb_cluster_metadata.v2_gr_clusters C on I.cluster_id = C.cluster_id where C.group_name = 'a87f5f72-6c4e-11f0-ae3c-005056a9e37c' // OK 5 rows
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (1447 us)> select count(clusterset_id) from mysql_innodb_cluster_metadata.v2_this_instance i join mysql_innodb_cluster_metadata.v2_cs_members csm on i.cluster_id = csm.cluster_id where clusterset_id is not null // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (204 us)> COMMIT // OK
2025-12-04 19:28:49 metadata_cache DEBUG [7f71805d0700] Updating cluster status from GR for 'herab-cluster'
2025-12-04 19:28:49 metadata_cache DEBUG [7f71805d0700] Connected to cluster 'herab-cluster' through innoherab-mydb01:3306
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (927 us)> show status like 'group_replication_primary_member' // OK 1 row
2025-12-04 19:28:49 sql DEBUG [7f71805d0700] innoherab-mydb01:3306 (645 us)> SELECT member_id, member_host, member_port, member_state, @@group_replication_single_primary_mode FROM performance_schema.replication_group_members WHERE channel_name = 'group_replication_applier' // OK 5 rows
2025-12-04 19:28:49 metadata_cache DEBUG [7f71805d0700] Cluster 'herab-cluster' has 5 members in metadata, 5 in status table
2025-12-04 19:28:49 metadata_cache DEBUG [7f71805d0700] End updating cluster for 'herab-cluster'
2025-12-04 19:28:49 metadata_cache DEBUG [7f71805d0700] Finished refreshing the cluster metadata
2025-12-04 19:28:49 routing DEBUG [7f71737fe700] [routing:herab_rw] fd=198 connection accepted at 0.0.0.0:6446
2025-12-04 19:28:49 routing DEBUG [7f71797fa700] [routing:herab_rw] fd=198 -- 323: connection closed (up: 85b; down: 0b)
2025-12-04 19:28:49 routing ERROR [7f71797fa700] classic::loop() processor failed: error:0A0000CF:SSL routines::protocol is shutdown (tls_err:167772367)
-- MySQL Router Shutdown
-- MySQL Router Startup (systemctl)
2025-12-04 19:28:59 main SYSTEM [7f72a6632780] Starting 'MySQL Router', version: 8.0.40 (MySQL Community - GPL)
```
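
When triaging further occurrences, it may help to pull out the last event handled by the thread that logged the fatal error. This is a minimal sketch, assuming the log line layout shown above (`date time module LEVEL [thread] message`). The function name and the regular expression are illustrative and not part of any Router tooling.

```python
import re

# date, time, module, level, [thread id], message
LINE_RE = re.compile(r"^(\S+ \S+) (\S+) (\w+) \[([0-9a-f]+)\] (.*)$")


def last_event_before_error(lines):
    """Return (previous_message, error_message) for the first ERROR line,
    where previous_message is the last earlier line from the same thread.
    Returns None if no ERROR line is found."""
    last_by_thread = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skips annotations such as "-- MySQL Router Shutdown"
        _timestamp, _module, level, thread, message = match.groups()
        if level == "ERROR":
            return last_by_thread.get(thread), message
        last_by_thread[thread] = message
    return None


sample = [
    "2025-12-04 19:28:49 routing DEBUG [7f71737fe700] [routing:herab_rw] fd=198 connection accepted at 0.0.0.0:6446",
    "2025-12-04 19:28:49 routing DEBUG [7f71797fa700] [routing:herab_rw] fd=198 -- 323: connection closed (up: 85b; down: 0b)",
    "2025-12-04 19:28:49 routing ERROR [7f71797fa700] classic::loop() processor failed: error:0A0000CF:SSL routines::protocol is shutdown (tls_err:167772367)",
]
previous, error = last_event_before_error(sample)
print(previous)
print(error)
```

For the excerpt above, this shows that the thread that failed had just logged the close of the connection on `fd=198`.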

### Connection Details
- Connection was accepted successfully on routing port 6446
- **Client Connection**: Most clients connect with `useSSL=false` (non-SSL connection)
- Router SSL mode is set to `PREFERRED` (should allow both SSL and non-SSL connections)
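- Connection closed with 85 bytes counted in the `up` direction and 0 bytes in the `down` direction (`up: 85b; down: 0b`); the `323` in `fd=198 -- 323` matches neither byte counter and appears to identify the other end of the connection rather than a byte count
- Error occurred in the classic protocol loop processor (`classic::loop()`) during SSL communication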
- Router process terminated immediately after error
- **Important**: The error occurs even when clients explicitly request non-SSL connections (`useSSL=false`)

### Expected Behavior
MySQL Router should handle SSL errors gracefully without crashing. It should:
1. Log the SSL error appropriately with sufficient context
2. Close the problematic connection without affecting other connections
3. Continue serving other active connections
4. Not terminate the entire router process
5. Provide a retry mechanism for transient SSL errors

### Actual Behavior
1. SSL protocol shutdown error occurs during connection processing
2. Router crashes immediately without graceful error handling
3. All routing services stop (Read/Write, Read-Only, X Protocol)
4. Complete service outage requiring manual restart via systemctl
5. No recovery mechanism or error isolation

### Impact
- **Severity**: High - Complete service outage
- **Service Availability**: All routing services unavailable
- **Data Loss**: None (no data corruption)
- **Recovery**: Manual intervention required (systemctl restart)
- **Frequency**: Intermittent (occurs during normal operation)
- **Affected Services**: All routing endpoints (6446, 6447, 6448, 6449)

### Additional Context
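- OpenSSL 3.0.15 is in use. OpenSSL 3.x differs from 1.x in how some connection-termination conditions are reported; for example, 3.0 reports a peer closing the connection without `close_notify` as an SSL-level error (`unexpected eof while reading`) rather than as `SSL_ERROR_SYSCALL` with `errno` 0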
- SSL mode is set to PREFERRED (allows both SSL and non-SSL connections)
- SSL certificates are properly configured and present
- **Client Configuration**: Most clients are configured with `useSSL=false` (explicitly requesting non-SSL connections)
- **Mismatch Issue**: Router's SSL mode is PREFERRED and clients request non-SSL, yet the SSL protocol shutdown error still occurs
- Error occurs during normal operation, not during startup
- The router was operating normally before the error occurred
- This suggests a problem with Router's handling of SSL protocol negotiation when clients request non-SSL connections

How to repeat:
### Prerequisites
1. MySQL Router 8.0.40 installed
2. OpenSSL 3.0.x installed (tested with 3.0.15)
3. InnoDB Cluster configured with Group Replication
4. SSL certificates configured for Router
5. MySQL clients configured with `useSSL=false` (non-SSL connection mode)

### Steps to Reproduce
1. Configure MySQL Router with SSL mode PREFERRED for both client and server:
   ```
   [DEFAULT]
   client_ssl_mode = PREFERRED
   server_ssl_mode = PREFERRED
   server_ssl_verify = DISABLED
   client_ssl_cert = /path/to/router-cert.pem
   client_ssl_key = /path/to/router-key.pem
   ```

2. Start MySQL Router normally:
   ```bash
   systemctl start mysqlrouter
   # or
   /app/mysqlrouter/bin/mysqlrouter --config /data/mysqlrouter/mysqlrouter.conf
   ```

3. Verify Router is running and accepting connections:
   ```bash
   systemctl status mysqlrouter
   # Check logs for successful startup
   ```

4. Router operates normally for some time (may take minutes to hours)

5. Client connection is accepted on routing port (e.g., 6446):
   - Connection is established successfully
   - **Important**: Client connects with `useSSL=false` (explicitly requesting non-SSL connection)
   - Router SSL mode is PREFERRED (should handle both SSL and non-SSL)
   - SSL handshake/communication begins (or Router attempts SSL negotiation despite the client's non-SSL request)

6. SSL error occurs during connection processing:
   - Error: `error:0A0000CF:SSL routines::protocol is shutdown`
   - Error occurs in `classic::loop()` processor

7. Router crashes and shuts down completely:
   - Router process terminates immediately
   - All routing services stop
   - Manual restart required

### Reproducibility
- **Frequency**: Intermittent (not consistently reproducible)
- **Timing**: Occurs during normal operation, typically when processing client connections
- **Conditions**: SSL mode PREFERRED, OpenSSL 3.0.x, normal traffic conditions

### Test Case
```bash
# Monitor router logs
tail -f /log/mysqlrouter/mysqlrouter.log

# Connect through Router with SSL disabled
# (the mysql CLI counterpart of useSSL=false)
mysql -h router-host -P 6446 -u user -p --ssl-mode=DISABLED

# The same request as a connection string (not a shell command):
#   mysql://user:password@router-host:6446/database?useSSL=false

# The same request as a JDBC connection string (not a shell command):
#   jdbc:mysql://router-host:6446/database?useSSL=false

# Any MySQL client that connects through Router with useSSL=false can be used
```
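
Because the failure is intermittent, generating connection churn while watching the log may be more practical than connecting by hand. The sketch below is an assumption about what could help, not a confirmed trigger: it opens plain TCP connections, reads whatever the server sends first, and closes without completing a login. The function name `churn_connections` is hypothetical, and `router-host` is the placeholder host used above.

```python
import socket


def churn_connections(host, port, count, timeout=2.0):
    """Open `count` plain TCP connections, read whatever the server sends
    first, then close abruptly without authenticating.

    Returns the number of connections on which any bytes were received.
    """
    greeted = 0
    for _ in range(count):
        try:
            with socket.create_connection((host, port), timeout=timeout) as sock:
                if sock.recv(4096):
                    greeted += 1
        except OSError:
            # Refused, unresolved, or timed out: Router may be down; keep going.
            continue
    return greeted


# Example (requires a reachable Router):
#   print(churn_connections("router-host", 6446, 100))
```

A return value that drops to zero partway through a run would indicate that Router stopped accepting connections, which can then be matched against the log timestamps.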

### Client Configuration Note
- **Most clients are configured with `useSSL=false`** (explicitly requesting non-SSL connections)
- Router is configured with `client_ssl_mode = PREFERRED` (should allow non-SSL when client requests it)
- The error occurs even when clients explicitly request non-SSL connections
- This suggests Router may be attempting SSL negotiation even when client requests non-SSL, or there's a protocol mismatch issue

### Expected vs Actual
- **Expected**: Router should handle SSL errors gracefully and continue serving
- **Actual**: Router crashes immediately upon SSL protocol shutdown error

Suggested fix:
### 1. Error Handling Improvement
Add proper error handling for SSL protocol shutdown errors in the classic protocol loop processor:
- Catch SSL protocol shutdown errors specifically
- Log the error with sufficient context (connection details, SSL state, client SSL preference)
- Handle cases where the client requests non-SSL (`useSSL=false`) but Router attempts SSL negotiation
- Close the problematic connection gracefully
- Continue processing other connections
- **Important**: When client explicitly requests non-SSL (`useSSL=false`), Router should not attempt SSL negotiation

### 2. Graceful Error Recovery
Implement graceful error recovery instead of crashing:
- Isolate SSL errors to the affected connection only
- Do not propagate connection-level errors to the router process level
- Implement connection cleanup mechanism for failed SSL connections
- Add connection state validation before SSL operations
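
The isolation idea above can be sketched as follows. This is an illustrative Python model only, not Router's C++ code: each connection's processor is a hypothetical callable, and a failure is logged and confined to that connection instead of being propagated to the process level.

```python
import logging

log = logging.getLogger("routing_sketch")


def run_processors(connections):
    """Run each connection's processor; a failure closes only that connection.

    `connections` maps a connection id to a zero-argument callable that
    raises on error. Returns the ids of the connections that failed.
    """
    failed = []
    for conn_id, processor in connections.items():
        try:
            processor()
        except Exception as exc:  # connection-level error: isolate it
            log.error("connection %s failed: %s; closing it", conn_id, exc)
            failed.append(conn_id)
            # Deliberately no re-raise: other connections keep running.
    return failed


def tls_failure():
    raise RuntimeError("SSL routines::protocol is shutdown")


served = []
failed = run_processors({
    198: tls_failure,                 # the failing connection
    199: lambda: served.append(199),  # unrelated, healthy connection
})
print(failed, served)  # [198] [199]
```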

### 3. Retry Logic
Add retry logic for transient SSL errors:
- Implement exponential backoff for transient SSL errors
- Add configurable retry count and timeout
- Distinguish between transient and permanent SSL errors
- Provide a fallback mechanism (e.g., a non-SSL connection if SSL fails and the mode is PREFERRED)
- **Client Preference Handling**: When a client requests `useSSL=false`, Router should respect this and not attempt SSL negotiation
- If SSL negotiation fails and client requested non-SSL, fallback to non-SSL connection immediately
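
A minimal sketch of the retry behavior described above, in Python for illustration rather than as Router code. `TransientError` is a hypothetical marker class; which SSL errors should count as transient is a decision for the Router developers and is not settled here.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def retry_with_backoff(operation, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`; on TransientError wait base_delay * 2**attempt and retry.

    Any other exception is treated as permanent and propagates immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted
            sleep(base_delay * (2 ** attempt))


delays = []
attempts = {"n": 0}


def flaky():
    """Fails twice with a transient error, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("temporary failure")
    return "ok"


print(retry_with_backoff(flaky, sleep=delays.append))  # ok
print(delays)                                          # [0.1, 0.2]
```

Passing `sleep` as a parameter keeps the example fast and lets the delays be inspected; real code would use the default.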

### 4. Error Logging Enhancement
Improve error logging to provide more context about SSL state:
- Log SSL handshake state when error occurs
- Include connection metadata (client IP, port, connection ID)
- Log OpenSSL error details (error code, error string)
- Add SSL state dump capability for debugging

### 5. OpenSSL 3.0 Compatibility
Review and update SSL handling code for OpenSSL 3.0 compatibility:
- OpenSSL 3.0 has different behavior compared to 1.x
- Ensure proper initialization and cleanup of SSL contexts
- Review SSL protocol shutdown handling
- Test with both OpenSSL 1.x and 3.x

### 6. Connection Isolation
Implement better connection isolation:
- Ensure connection-level errors do not affect router process
- Add connection pool management for failed connections
- Implement circuit breaker pattern for repeated SSL failures
- Add health check mechanism for SSL connections
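
The circuit breaker item above could look roughly like this. It is a generic Python sketch of the pattern with made-up names and thresholds, and it makes no claim about how Router would scope the breaker (per client, per destination, or globally).

```python
import time


class CircuitBreaker:
    """Open after `threshold` consecutive failures; allow trial calls
    again once `reset_after` seconds have passed."""

    def __init__(self, threshold=3, reset_after=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open once the reset period has elapsed: trial calls are
        # allowed until the next success or failure is recorded.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


now = [0.0]  # fake clock so the example runs instantly
breaker = CircuitBreaker(threshold=2, reset_after=10.0, clock=lambda: now[0])
breaker.record_failure()
breaker.record_failure()
print(breaker.allow())  # False: open after two consecutive failures
now[0] = 10.0
print(breaker.allow())  # True: reset period elapsed, trial allowed
```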

### 7. Configuration Options
Add configuration options for SSL error handling:
- Configurable SSL error handling behavior (crash vs. graceful)
- SSL error retry configuration
- SSL connection timeout settings
- SSL error logging verbosity