Bug #34843 Character sets are mapped on numbers till 99 only
Submitted: 26 Feb 2008 13:24 Modified: 25 Jul 2008 9:05
Reporter: Salman Rawala
Status: Not a Bug
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.1.22 OS:Any
Assigned to: Alexander Barkov CPU Architecture:Any
Tags: character sets

[26 Feb 2008 13:24] Salman Rawala
Description:
Character sets are mapped to numbers only up to 99: numbers from 100 onward give an error, and below 100 the 36 mapped character sets wrap around several times.

How to repeat:
You can reproduce this problem using the following code:

Sample Code:
SET @@character_set_filesystem = 1;
SELECT @@character_set_filesystem;
SET @@character_set_filesystem = 2;
SELECT @@character_set_filesystem;
SET @@character_set_filesystem = 3;
SELECT @@character_set_filesystem;
SET @@character_set_filesystem = 36;
SELECT @@character_set_filesystem;
SET @@character_set_filesystem = 99;
SELECT @@character_set_filesystem;

--Error ER_UNKNOWN_CHARACTER_SET
SET @@character_set_filesystem = 100;

Actual Output: 
SET @@character_set_filesystem = 1;
SELECT @@character_set_filesystem;
@@character_set_filesystem
big5
SET @@character_set_filesystem = 2;
SELECT @@character_set_filesystem;
@@character_set_filesystem
latin2
SET @@character_set_filesystem = 3;
SELECT @@character_set_filesystem;
@@character_set_filesystem
dec8
SET @@character_set_filesystem = 36;
SELECT @@character_set_filesystem;
@@character_set_filesystem
cp866
SET @@character_set_filesystem = 99;
SELECT @@character_set_filesystem;
@@character_set_filesystem
cp1250
SET @@character_set_filesystem = 100;
ERROR 42000: Unknown character set: '100'
[27 Feb 2008 20:50] Sveta Smirnova
Thank you for the report.

Could you please describe what the problem with this behavior is?
[11 Mar 2008 5:55] Rizwan Maredia
There are some numeric values that are not accepted, like 45 and 54. MySQL supports about 36 character sets, but when we set numeric values from 1 to 36 we see that they do not cover all of these character sets. Also, in this range some character sets repeat, like latin7 at values 20, 41, 42.

There is a character set 'filename' that is not shown in SHOW CHARACTER SET. Its numeric value is 17.

SET @@session.character_set_server = 17;
SELECT @@session.character_set_server;
@@session.character_set_server
filename

I think we should not allow numeric values for character sets, as their behavior is not consistent.
[11 Mar 2008 6:01] Rizwan Maredia
In my last comment I gave a wrong example: latin7 occurs only once in the range 1-36. latin1 occurs 4 times, at 5, 8, 15 and 31.
[12 Mar 2008 22:06] Sveta Smirnova
Thank you for the feedback.

A character set that can be set but cannot be observed with either SHOW CHARACTER SET or SHOW COLLATION is verified:

mysql> SET @@session.character_set_server = 17;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @@session.character_set_server;
+--------------------------------+
| @@session.character_set_server |
+--------------------------------+
| filename                       | 
+--------------------------------+
1 row in set (0.00 sec)

mysql> SHOW COLLATION where Id=17;
Empty set (0.00 sec)

The initial description is not verified:

mysql> SHOW COLLATION where Id>200;
+--------------------+---------+-----+---------+----------+---------+
| Collation          | Charset | Id  | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_turkish_ci    | utf8    | 201 |         | Yes      |       8 | 
| utf8_czech_ci      | utf8    | 202 |         | Yes      |       8 | 
| utf8_danish_ci     | utf8    | 203 |         | Yes      |       8 | 
| utf8_lithuanian_ci | utf8    | 204 |         | Yes      |       8 | 
| utf8_slovak_ci     | utf8    | 205 |         | Yes      |       8 | 
| utf8_spanish2_ci   | utf8    | 206 |         | Yes      |       8 | 
| utf8_roman_ci      | utf8    | 207 |         | Yes      |       8 | 
| utf8_persian_ci    | utf8    | 208 |         | Yes      |       8 | 
| utf8_esperanto_ci  | utf8    | 209 |         | Yes      |       8 | 
| utf8_hungarian_ci  | utf8    | 210 |         | Yes      |       8 | 
| utf8_general_cs    | utf8    | 254 |         | Yes      |       1 | 
+--------------------+---------+-----+---------+----------+---------+
11 rows in set (0.03 sec)
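
The outputs above are consistent with the numeric values being collation IDs rather than character set IDs, which would explain why one character set can appear under several numbers. That reading is an assumption here, not something the thread states outright. A minimal sketch of a query to check what the numbers used in the earlier comments resolve to, assuming information_schema.COLLATIONS is available on the server:

```sql
-- Assumption: the number accepted by SET @@character_set_... is a collation Id,
-- and the variable then reports the character set that collation belongs to.
SELECT ID, COLLATION_NAME, CHARACTER_SET_NAME
FROM information_schema.COLLATIONS
WHERE ID IN (1, 2, 3, 5, 8, 15, 17, 31, 36, 99)
ORDER BY ID;
```

Given the empty result of SHOW COLLATION where Id=17 above, no row would be expected for 17 ("filename").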
[25 Jul 2008 9:05] Alexander Barkov
It is true that there is some inconsistency here.

However, numeric notation like "SET character_set_filesystem=17" is
mostly for internal purposes (e.g. replication).
Users should normally use name notation.

Numeric notation is required even for "hidden" character sets,
such as "filename".

This is not a bug, this is intentional behavior.
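
For comparison, a short sketch of the name notation recommended above. The choice of latin1 is only an illustration; any name listed by SHOW CHARACTER SET should work the same way:

```sql
-- List the character set names that are visible to users.
SHOW CHARACTER SET;

-- Set the variable by name instead of by number (latin1 is an example value).
SET @@character_set_filesystem = 'latin1';
SELECT @@character_set_filesystem;
```

Setting the value by name avoids relying on the numeric mapping, which the thread shows to have gaps and repeats.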