Bug #94761 "Upper ascii" or high-bit characters not mapping correctly in Terminal on Mac OS
Submitted: 24 Mar 2019 21:06
Modified: 26 Mar 2019 11:52
Reporter: Karen x
Status: Verified
Category: MySQL Server: Command-line Clients
Severity: S3 (Non-critical)
Version: MySQL Community Server 8.0.15
OS: MacOS (High Sierra 10.13.6)
Assigned to:
CPU Architecture: Any
Tags: Mac OS, Special Characters, terminal

[24 Mar 2019 21:06] Karen x
Description:
Using Terminal as my client, when entering a query such as

SELECT CHAR(N); any number N above 126 produces only a question mark in response.

The characters appear not to be mapping correctly.

The settings for my client, connection, database, etc. are as follows:

mysql> SHOW VARIABLES LIKE 'char%';
+--------------------------+-----------------------------------------------------------+
| Variable_name            | Value                                                     |
+--------------------------+-----------------------------------------------------------+
| character_set_client     | utf8mb4                                                   |
| character_set_connection | utf8mb4                                                   |
| character_set_database   | utf8mb4                                                   |
| character_set_filesystem | binary                                                    |
| character_set_results    | utf8mb4                                                   |
| character_set_server     | utf8mb4                                                   |
| character_set_system     | utf8                                                      |
| character_sets_dir       | /usr/local/mysql-8.0.15-macos10.14-x86_64/share/charsets/ |
+--------------------------+-----------------------------------------------------------+
8 rows in set (0.02 sec)

I am a very new SQL learner, and this is the best I can do to explain it. I posted about this on the Newbie forum here https://forums.mysql.com/read.php?10,673558,673558#msg-673558 and after much back and forth, it was suggested I report this as a possible bug.

How to repeat:
In Terminal on a Mac running High Sierra 10.13.6 with the latest version of MySQL Community Server installed, try SELECT CHAR( ) queries using all numbers 1-126 and you will see that the results are as expected.

Then type in CHAR( ) with any and all numbers 127 and up, and you will see only a response of ?.

Enter SET NAMES latin1; and you will see that this makes no difference in the output of these queries.

Run the following script:
set names latin1;
set @elat="é", @chr233lat=char(233);
set names utf8mb4;
set @eutf="é", @chr233utf=char(233);
select "utf8mb4", @elat, hex(@elat), @eutf, hex(@eutf), 
       @chr233lat, hex(@chr233lat), @chr233utf, hex(@chr233utf)\G
set names latin1;
select "latin1", @elat, hex(@elat), @eutf, hex(@eutf), 
       @chr233lat, hex(@chr233lat), @chr233utf, hex(@chr233utf)\G

And you should see these responses:

*************************** 1. row *************************** 
utf8mb4: utf8mb4 
@elat: é 
hex(@elat): C3A9 
@eutf: é 
hex(@eutf): C3A9 
@chr233lat: ? 
hex(@chr233lat): E9 
@chr233utf: ? 
hex(@chr233utf): E9 
1 row in set (0.00 sec)

*************************** 1. row *************************** 
latin1: latin1 
@elat: é 
hex(@elat): C3A9 
@eutf: ? 
hex(@eutf): C3A9 
@chr233lat: ? 
hex(@chr233lat): E9 
@chr233utf: ? 
hex(@chr233utf): E9 
1 row in set (0.02 sec)
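
The hex values in both result rows are consistent with a terminal that sends and expects UTF-8. Below is a minimal Python sketch of the byte arithmetic, assuming the terminal encoding is UTF-8. It is not MySQL code, and the variable names are illustrative, not part of the report.

```python
# Bytes a UTF-8 terminal sends for a typed "é" (U+00E9).
# This matches hex(@elat) and hex(@eutf) = C3A9 in the results above.
typed = "\u00e9".encode("utf-8")
print(typed.hex().upper())

# CHAR(233) yields the single byte 0xE9, matching hex(@chr233lat) = E9.
single = bytes([233])
print(single.hex().upper())

# On its own, 0xE9 is not a valid UTF-8 sequence, so a UTF-8
# terminal has no character to draw for it.
try:
    single.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# Interpreted as latin1, the same byte is "é".
as_latin1 = single.decode("latin-1")
print(as_latin1)
```

Under these assumptions, the typed literal and CHAR(233) never hold the same bytes, whichever SET NAMES value is active.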
[25 Mar 2019 0:47] Karen x
I also tried the following script:

set names macroman; 
set @erom="é", @chr233rom=char(233); 
select "macroman", @erom, hex(@erom), @chr233rom, hex(@chr233rom)\G

And received this response:

*************************** 1. row *************************** 
macroman: macroman 
@erom: é 
hex(@erom): C3A9 
@chr233rom: ? 
hex(@chr233rom): E9 
1 row in set (0.00 sec)
[26 Mar 2019 11:52] MySQL Verification Team
Hello Karen,

Thank you for the report and test case.
Verified as described on Mac OS X 10.14.3.

thanks,
Umesh
[3 Apr 2019 14:02] Erlend Dahl
Posted by developer:
 
Comments from Bernt:

mysql> select char(200);
+-----------+
| char(200) |
+-----------+
| �          |
+-----------+
1 row in set (0.00 sec)

According to the docs:

CHAR() interprets each argument N as an integer and returns a string consisting of the characters given by the code values of those integers.

mysql> select hex(char(200));
+----------------+
| hex(char(200)) |
+----------------+
| C8             |
+----------------+
1 row in set (0.00 sec)

C8 on its own is not a valid UTF-8 sequence: it is the lead byte of a two-byte sequence and must be followed by a continuation byte.
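
This can be checked outside the server. Below is a small Python sketch (not MySQL code; the names are illustrative). A strict decode of the lone byte fails, and a lenient decode substitutes U+FFFD, the replacement character shown in the char(200) output above.

```python
lone = bytes([0xC8])

# 0xC8 = 0b11001000: the 110xxxxx prefix marks the first byte
# of a two-byte UTF-8 sequence.
bits = format(lone[0], "08b")
print(bits)

# Strict decoding rejects the incomplete sequence.
try:
    lone.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)

# Lenient decoding substitutes U+FFFD (REPLACEMENT CHARACTER).
shown = lone.decode("utf-8", errors="replace")
print(shown == "\ufffd")
```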

But to display code point U+00C8 (È), we can do:

mysql> select char(50056);
+-------------+
| char(50056) |
+-------------+
| È           |
+-------------+

Because the docs say

CHAR() arguments larger than 255 are converted into multiple result bytes. For example, CHAR(256) is equivalent to CHAR(1,0), and CHAR(256*256) is equivalent to CHAR(1,0,0):
And 50056 (0xC388) becomes the bytes C3 88, which is the UTF-8 encoding of U+00C8.
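
The arithmetic can be verified without a server. In the minimal Python sketch below, `char_bytes` is a hypothetical helper that mimics only the documented rule that arguments larger than 255 become multiple bytes. It does not reproduce the full behaviour of CHAR().

```python
def char_bytes(n: int) -> bytes:
    """Hypothetical helper: big-endian bytes of n, without leading zero bytes."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")

# CHAR(256) is equivalent to CHAR(1,0); CHAR(256*256) to CHAR(1,0,0).
print(char_bytes(256) == bytes([1, 0]))
print(char_bytes(256 * 256) == bytes([1, 0, 0]))

# 50056 = 0xC388, i.e. the two bytes C3 88.
print(char_bytes(50056).hex().upper())

# Those two bytes decode as UTF-8 to U+00C8, and U+00C8 encodes back to them.
print(char_bytes(50056).decode("utf-8") == "\u00c8")
print("\u00c8".encode("utf-8") == char_bytes(50056))
```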