Bug #87579 information_schema.processlist should handle utf8mb4 characters in query
Submitted: 29 Aug 2017 10:56
Reporter: Øystein Grøvlen
Status: Verified
Category: MySQL Server: Information schema
Severity: S3 (Non-critical)
Version: 5.7.19
OS: Any
Assigned to:
CPU Architecture: Any

[29 Aug 2017 10:56] Øystein Grøvlen
Description:
mysql> select info, "🔥" from information_schema.processlist;
+------------------------------------------------------+------+
| info                                                 | ?    |
+------------------------------------------------------+------+
| select info, "?" from information_schema.processlist | 🔥     |
+------------------------------------------------------+------+
1 row in set, 1 warning (0,00 sec)

Warning (Code 1366): Incorrect string value: '\xF0\x9F\x94\xA5" ...' for column 'INFO' at row 1
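
The bytes quoted in the warning can be checked outside the server. The following is a minimal Python sketch, not MySQL code: it decodes the byte sequence from the warning and tests whether a string contains characters that need four bytes in UTF-8. It assumes the INFO column uses a 3-byte utf8 character set, which can only hold characters up to U+FFFF. The helper name `needs_utf8mb4` is hypothetical.

```python
# Bytes exactly as quoted in the warning: '\xF0\x9F\x94\xA5'
raw = b"\xF0\x9F\x94\xA5"
char = raw.decode("utf-8")  # one character, encoded as four bytes


def needs_utf8mb4(text: str) -> bool:
    """Return True if any character lies above U+FFFF.

    Such characters take 4 bytes in UTF-8, so they do not fit a
    3-byte utf8 column (assumption: that is what INFO uses).
    """
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    query = 'select info, "\U0001F525" from information_schema.processlist'
    print(hex(ord(char)), len(raw))   # code point and encoded length
    print(needs_utf8mb4(query))       # the query contains a 4-byte character
    print(needs_utf8mb4("select 1"))  # plain ASCII does not
```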

If you try to store the result of this query in a table, the problem gets worse, since that causes an error.  For an example, see: https://stackoverflow.com/questions/45549840/trouble-inserting-4-byte-utf-8-characters-emo...

The problem seems to be even worse for the corresponding performance_schema table:

mysql> select sql_text, "🔥" from performance_schema.events_statements_current;
+--------------------+------+
| sql_text           | ?    |
+--------------------+------+
| select sql_text, " | 🔥     |
+--------------------+------+
1 row in set, 1 warning (0,00 sec)

Warning (Code 1366): Incorrect string value: '\xF0\x9F\x94\xA5" ...' for column 'SQL_TEXT' at row 1

Here the offending character and all text after it are missing.
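
The two outputs above differ in how the unrepresentable character is handled: in processlist it is replaced by "?", while in events_statements_current the string is cut off at that point. The Python sketch below only models these two observed behaviours so they can be compared side by side. It is not the server's conversion code, and both function names are hypothetical.

```python
def replace_unrepresentable(text: str) -> str:
    """Model of the processlist.INFO output: each character above
    U+FFFF is replaced by '?', the rest of the string is kept."""
    return "".join(ch if ord(ch) <= 0xFFFF else "?" for ch in text)


def truncate_at_unrepresentable(text: str) -> str:
    """Model of the events_statements_current.SQL_TEXT output: the
    string is cut at the first character above U+FFFF."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            return text[:i]
    return text


if __name__ == "__main__":
    fire = "\U0001F525"  # the emoji used in the report
    q1 = f'select info, "{fire}" from information_schema.processlist'
    q2 = f'select sql_text, "{fire}" from performance_schema.events_statements_current'
    print(replace_unrepresentable(q1))
    print(truncate_at_unrepresentable(q2))
```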

How to repeat:
select info, "🔥" from information_schema.processlist; 
select sql_text, "🔥" from performance_schema.events_statements_current;

Suggested fix:
Make sure query strings are stored in the character set that is defined for the information_schema/performance_schema columns.
For 8.0, change the character set of such columns to utf8mb4, since utf8mb4 is the default character set in 8.0 and the number of such issues is otherwise likely to increase.

[29 Aug 2017 10:59] Øystein Grøvlen
Posted by developer:
 
utf8mb4 characters were stripped from the internal bug report.  See the external bug report for the actual test cases.