Bug #73721 Request to remove non-ASCII characters from errmsg-utf8.txt for English errors
Submitted: 26 Aug 2014 3:22 Modified: 27 Aug 2014 23:54
Reporter: Roel Van de Paar
Status: Open
Category:MySQL Server: Errors Severity:S4 (Feature request)
Version:5.6 OS:Any
Assigned to: CPU Architecture:Any

[26 Aug 2014 3:22] Roel Van de Paar
Description:
In the past there have been a number of bugs discovered where an error message returned by the server was corrupted:
* http://bugs.mysql.com/bug.php?id=42685 
* http://bugs.mysql.com/bug.php?id=47412

RQG already checks for this automatically and terminates a trial when corruption is detected (characters outside the printable ASCII range). This is great; however, some messages in the MySQL error message file can trigger this detection incorrectly. This is likely the result of programmers worldwide adding error messages (even English ones) that contain characters outside the standard ASCII range.

The main benefit of fixing this would be that RQG runs are no longer terminated incorrectly; in addition, any tools which relay these messages to a GUI would display them more reliably.

Fixing this also has a very small optimization benefit: error messages (returned to the client and/or written to logs) will be smaller, since each replaced character takes 1 byte instead of 2 or 3 bytes in UTF-8. For frequently seen messages this makes a very small difference.
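
As a rough illustration of the size difference, the sketch below counts the UTF-8 bytes of an ASCII letter and of two of the characters listed further down (codes 960 and 47196). The helper name utf8_len is hypothetical, and the characters are written as octal byte escapes so that the snippet itself stays pure ASCII.

```shell
# utf8_len FORMAT: print the number of bytes produced by a printf format
# string. (Hypothetical helper, for illustration only.)
utf8_len() {
    printf "$1" | wc -c | tr -d ' '
}

utf8_len 'p'              # plain ASCII letter
utf8_len '\317\200'       # U+03C0, code 960 in the list below
utf8_len '\353\241\234'   # U+B85C, code 47196 in the list below
```

The Greek letters in the list fall in the 2-byte UTF-8 range, and only the character with code 47196 needs 3 bytes.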

How to repeat:
Finding the characters can be done with a good editor (Vim has the "ga" command, which shows the character value) or in a more manual fashion:

[roel@localhost share]$ cat errmsg-utf8.txt | grep "[ \t]*eng" | sed 's/[ \t]//g' | sed 's/\(.\)/\n\1/g' | sort -u | egrep -v "[-:';%\"\!\?=/_\#a-zA-Z0-9\@]" > out.txt
[roel@localhost share]$ vi out.txt    <---- remove some leftover valid chars
[roel@localhost share]$ cat out.txt   <---- I have added character codes 
로 = 47196
ž = 382
α = 945
Δ = 916
ε = 949
η = 951
ή = 942
ι = 953
ί = 943
κ = 954
λ = 955
μ = 956
ν = 957
ο = 959
Π = 928
π = 960
ρ = 961
σ = 963
τ = 964
χ = 967
ώ = 974
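
A shorter way to locate the affected messages might look like the sketch below. It assumes that English messages sit on lines that start with optional whitespace followed by "eng", and the function name list_non_ascii is hypothetical. Running grep in the C locale makes it compare bytes, so anything outside printable ASCII and whitespace is flagged.

```shell
# list_non_ascii FILE: print every English ("eng") message line of FILE that
# contains a byte outside printable ASCII, prefixed with its line number.
# Assumption: "eng" lines start with optional whitespace, then "eng".
list_non_ascii() {
    # -a: always treat the input as text; -n: prefix the line number
    LC_ALL=C grep -a -n '^[[:space:]]*eng[[:space:]]' "$1" |
        LC_ALL=C grep -a '[^[:print:][:space:]]'
}

# Example call, commented out because it needs the file in the current
# directory:
# list_non_ascii errmsg-utf8.txt
```

Unlike the pipeline above, this reports whole lines rather than individual characters, which makes it easier to jump to each message in an editor.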

Suggested fix:
Then find a simple way to replace them with ASCII equivalents (in an editor or with a small script).
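
A small script for this could be sketched as follows. It rewrites only "eng" lines and handles just two of the characters; the function name ascii_fix and the chosen replacements ("u" for code 956, "pi" for code 960) are illustrative assumptions, and the real substitutions would need to be picked per message.

```shell
# ascii_fix FILE: write FILE to stdout with two sample non-ASCII characters
# replaced on English ("eng") lines only. Other languages are left untouched.
# The replacement strings are illustrative, not a proposed final mapping.
ascii_fix() {
    mu=$(printf '\316\274')   # UTF-8 bytes of U+03BC (code 956)
    pi=$(printf '\317\200')   # UTF-8 bytes of U+03C0 (code 960)
    # The C locale makes sed operate on bytes instead of decoded characters.
    LC_ALL=C sed \
        -e "/^[[:space:]]*eng[[:space:]]/s/${mu}/u/g" \
        -e "/^[[:space:]]*eng[[:space:]]/s/${pi}/pi/g" \
        "$1"
}

# Example call, commented out because it needs the file in the current
# directory:
# ascii_fix errmsg-utf8.txt > errmsg-utf8.fixed.txt
```

Restricting the substitution to "eng" lines keeps the translations intact, where the same characters are legitimate.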