Bug #52494 md5() on Windows cmd client returns unexpected results
Submitted: 31 Mar 2010 8:09 Modified: 30 Sep 2010 20:56
Reporter: Peter Laursen (Basic Quality Contributor) Email Updates:
Status: Patch pending Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:any OS:Windows
Assigned to: Assigned Account CPU Architecture:Any

[31 Mar 2010 8:09] Peter Laursen
Description:
The Windows cmd.exe application is not a proper environment for MySQL command line implementation.

Here is another example:

How to repeat:
1) A UTF8 capable client (Linux command line, SQLyog, Query Browser etc.): 

SET NAMES utf8;
SELECT md5(CONVERT('á' USING latin1)); -- 2eece4376cee1433d0e9f200deb75408 
SELECT md5(CONVERT('á' USING utf8)); -- 36b7148acc1b607c473a15a47fa17706 
SELECT md5(CONVERT('á' USING ucs2)); -- f1a1c5dbc5af4beea7511e48f23ca3ae

2) Command line on Windows: 

Welcome to the MySQL monitor.  Commands end with ; or \g
Your MySQL connection id is 54
Server version: 5.1.44-community MySQL Community Server

Type 'help;' or '\h' for help. Type '\c' to clear the cu

mysql> SELECT md5(CONVERT('á' USING latin1)); -- 2eece43
8
+----------------------------------+
| md5(CONVERT('á' USING latin1))    |
+----------------------------------+
| d1457b72c3fb323a2671125aef3eab5d |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING utf8)); -- 36b7148ac

+------------------------------+
| md5(CONVERT('á' USING utf8))  |
+------------------------------+
| NULL                         |
+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING ucs2)); -- f1a1c5dbc
+----------------------------------+
| md5(CONVERT('á' USING ucs2))      |
+----------------------------------+
| 1b759226eb6991d1214e639ea673d087 |
+----------------------------------+
1 row in set (0.00 sec)

mysql>
mysql> SELECT md5(CONVERT('á' USING latin1));
+----------------------------------+
| md5(CONVERT('á' USING latin1))    |
+----------------------------------+
| d1457b72c3fb323a2671125aef3eab5d |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING utf8));
+------------------------------+
| md5(CONVERT('á' USING utf8))  |
+------------------------------+
| NULL                         |
+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING ucs2));
+----------------------------------+
| md5(CONVERT('á' USING ucs2))      |
+----------------------------------+
| 1b759226eb6991d1214e639ea673d087 |
+----------------------------------+
1 row in set (0.00 sec)

mysql> set names utf8; -- has no effect
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING latin1));
+----------------------------------+
| md5(CONVERT('á' USING latin1))    |
+----------------------------------+
| d1457b72c3fb323a2671125aef3eab5d |
+----------------------------------+
1 row in set (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING utf8));
+------------------------------+
| md5(CONVERT('á' USING utf8))  |
+------------------------------+
| NULL                         |
+------------------------------+
1 row in set, 1 warning (0.00 sec)

mysql> SELECT md5(CONVERT('á' USING ucs2));
+----------------------------------+
| md5(CONVERT('á' USING ucs2))      |
+----------------------------------+
| 1b759226eb6991d1214e639ea673d087 |
+----------------------------------+
1 row in set (0.00 sec)

Nothing matches!

Suggested fix:
Maybe 'Windows Power Shell' as shipped with recent Windows versions can be used for the client environment (I noticed that it is fully unicode capable unlike cmd.exe).  

It is not the main question if this is a bug or not.  There is likely a technical explanation (the byte pattern for 'á' in the client), but I find such explanation largely uninteresting. The point is that Windows users constantly get such surprises when working with non-ASCII strings. And nothing is really documented about it.

Besides I would be surprised if only md5() function is affected.
[31 Mar 2010 8:35] Peter Laursen
Ooops .. please ignore the 3 first statements from the Windows environment.  They are duplicates and have garbled and irrelevant comments.
[31 Mar 2010 8:53] Peter Laursen
Clarification: 

In my opinion what is *THE BIG BUG* here is that MySQL since 4.1 release (where unicode support was introduced server-side) has shipped with an invalidated client environment on Windows.

But even a latin1 string with non-ASCII characters fails to return the correct checksum.  This issue is probably older than 4.1.
[31 Mar 2010 10:11] Peter Laursen
A few more experiments/observations:

SELECT hex('á');  -- returns 'A0' in cmd.exe and 'C3A1' in SQLyog/utf8 client;

SELECT hex(CONVERT('á' USING latin1)); -- returns '3F' in cmd.exe;
--  this surprises me. I would expect "CONVERT('á' USING latin1)" to return simply 'á' in that environment but ... 

mysql> SELECT CONVERT('á' USING latin1);
+---------------------------+
| CONVERT('á' USING latin1)  |
+---------------------------+
| ?                         |
+---------------------------+
1 ROW IN SET (0.00 sec)

SELECT hex(CONVERT('á' USING utf8)); -- returns 'C3A1' in SQLyog/utf8 client;
-- this is 100% consistent

-- avoiding non-ASCII characters as argument to md5() is same in all clients (of course): 
SELECT md5(CONVERT(unhex('C3A1') USING utf8)); -- 36b7148acc1b607c473a15a47fa17706
SELECT md5(CONVERT(unhex('A0') USING latin1)); -- 9af7c117d9de9a06fba7a5f1ea5fcc2d
[31 Mar 2010 11:37] MySQL Verification Team
Thank you for the bug report.
[31 Mar 2010 13:00] Alexander Barkov
Note, mysql.exe is a DOS-alike console application,
so mysql.exe (like all other console applications on Windows)
uses OEM character set, which is cp850 on a Windows machine.

To get similar results to what you get in GUI applications using utf8:

SET NAMES utf8;
SELECT md5(CONVERT('á' USING latin1)); -- 2eece4376cee1433d0e9f200deb75408 
SELECT md5(CONVERT('á' USING utf8)); -- 36b7148acc1b607c473a15a47fa17706 
SELECT md5(CONVERT('á' USING ucs2)); -- f1a1c5dbc5af4beea7511e48f23ca3ae

You need to do this in mysql.exe:

SET NAMES cp850;
SELECT md5(CONVERT('á' USING latin1));
SELECT md5(CONVERT('á' USING utf8));
SELECT md5(CONVERT('á' USING ucs2));

There are no bugs here.

You're right that it was confusing that mysql.exe did not set the session 
default character set to cp850. This problem was fixed in version 5.5,

Please find ChangeLog here:
http://dev.mysql.com/doc/refman/5.5/en/news-5-5-0.html

the item starting from these lines:

> Previously, in the absence of other information, the MySQL client programs 
> mysql, mysqladmin, mysqlcheck, mysqlimport, and mysqlshow  used the 
> compiled-in default character set, usually latin1.
> Now these clients can autodetect which character set to use based on the 
> operating system settings...
[31 Mar 2010 13:02] Alexander Barkov
Sorry for a typo in "which is cp850 on a Windows machine."

Please read this instead: "which is cp850 on a Western machine."
[31 Mar 2010 13:32] Peter Laursen
I had only reply to one issue of two I discussed.

I consider it the BIG BUG here that MySQL distributes a client not supporting unicode with the server for WIndows.  'á' was only an example.  How do I handle 'हिं' for instance?  

The culprit of my report was a request that the client should not use cmd.exe but a fully unicode compliant environment instead (and it exists in recent Windows as far as I can tell).

I am sorry but I have to open again.
[31 Mar 2010 13:43] Alexander Barkov
The limitation is not in mysql.exe.
mysql.exe (and other client tools) itself does support utf8.
You can prove that dumping its result to an utf8 text file,
and/or by loading utf8 dump files.

The limitation is in Windows console implementation, we cannot do anything with that.

You can use these workarounds:
- Find a utf8-aware console program running on Windows, and
then start mysql.exe in this console program instead of cmd.exe.

- Use GUI tools

Everything works as expected, according to its design.
I am closing the report as "Not a bug".
[31 Mar 2010 13:56] Peter Laursen
I believe you can use another and more recent console application (with integrated scripting environment)  ..  'Windows Power Shell' if it is available on the actual Windows. It does not have the DOS-like limitations at all.  I have been able to execute other command line programs on it as well as file system commands etc.  

Besides I cannot see that there is no compelling reason why the standard client will have to run in a command interpreter on Windows.

Barkov's reply here is of the type 'it is as per design so this is not a bug'.  Problem is that the design (choosing cmd.exe for the client environment) is itself a bug - at least since MySQL 4.1 was released - in my opinion.
[31 Mar 2010 14:45] Alexander Barkov
Peter,

> I believe you can use another and more recent console application (with 
> integrated scripting environment)  .. 

I am afraid this is Windows who decides which console program
to start when you launch a non-GUI program, like mysql.exe.

> 'Windows Power Shell' if it is 
> available on the actual Windows. It does not have the DOS-like limitations at 
> all.  I have been able to execute
> other command line programs on it as well as file system commands etc.  

Did you try to run "Windows Power Shell" and then start "mysql.exe" in it?
Does it work as expected with ? 

> Besides I cannot see that there is no compelling reason why the standard 
> client will have to run in a command interpreter on Windows.

Command line interpretor is needed when you want pipes and/or 
file/stream redirection, for example:

mysqldump.exe db1 table1 | mysql.exe db2

or

mysqldump.exe db1 >file.sql
mysql.exe db2 <file.sql
[31 Mar 2010 15:34] Peter Laursen
.. but you may face the problem that MySQL never supported 'Windows Unicode' (ie. UTF16 with little endian byte order' properly so you cannot "SET NAMES utf16_little".  Instead they have added support for obscure UTF32.  Total lack of judgement in my opinion.
[5 Apr 2010 23:42] MySQL Verification Team
Changing to documentation to left clear that the command prompt clients on Windows has limitations regarding some character sets and is necessary to use instead GUI tools.
[6 Apr 2010 4:03] Peter Laursen
One more rotten compromise that will not solve anything in my opinion.

A fully functional commmand line client is required for *verification*.  
Whenever you get weird results with an script/application (or a GUI tool) you will want to verify that result in command line client.  That is not possible on Windows if unicode data (or identifiers) are there.  Actually only fully possible with ASCII and not even non-ASCII ANSI (unless SETting NAMES to an obscure charset)

The only solution is to ship the server with a simple *stand alone* client taking commands similar to command line on unix.  The currently active client for MySQL provided by MySQL/Oracle (Workbench) is not an option as it does not provide direct client-server functionalities (it has its private 'model layer' in between the user interface and the server).  And other GUI clients may not be either as they may set SESSION variables, sql-mode etc. transparently for user (they all do because they will have to in order to populate the GUI).

I do not expect miracles overnight of course, but the situation has now been non-sustainable for 5+ years.  It demonstrates that MySQL developers try all they they can to avoid touching Windows.

And btw: nobody checked if 'Windows Power Shell' is an option for the future?
[7 Apr 2010 6:34] James Day
The results are hardly better if you switch Windows 7 cmd.exe to use utf-8 using CHCP 65001:

>chcp 65001
Active code page: 65001
> mysql
mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.03 sec)
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+

SELECT md5(CONVERT('á' USING latin1));
Bye
>

Straight back to the cmd.exe prompt as soon as I pressed enter after typing the ;.

It takes a little font setting experimenting to get cmd.exe to correctly display the á but it's doable - I switched to Lucida Console then back to raster fonts.

5.1.40-enterprise-gpl-advanced 5740 12:45.10 mysql> select unhex('a0');
+-------------+
| unhex('A0') |
+-------------+
| á           |
+-------------+
1 row in set (0.00 sec)

The character in that output is not the one displayed, it's how the display appears. Copying and pasting the displayed character simply produces a space here.

Not that this helps with the result:

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | utf8   |
| character_set_connection | utf8   |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | utf8   |
| character_set_server     | utf8   |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.00 sec)

mysql> select unhex('A0');
+-------------+
| unhex('A0') |
+-------------+
| á           |
+-------------+
1 row in set (0.00 sec)

The character in that output is not the one displayed, it's how the display appears. Copying and pasting the displayed character simply produces a space here.

5.1.40-enterprise-gpl-advanced 5740 23:32.8 mysql> select unhex('C3A1');
+---------------+
| unhex('C3A1') |
+---------------+
| á            |
+---------------+
1 row in set (0.00 sec)

The first displayed character is really shown as a line-drawing character, not an upper case letter with accent.

5.1.40-enterprise-gpl-advanced 5740 23:55.9 mysql> select md5('á');
Bye
[11 Apr 2010 17:00] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/105379

3144 Vladislav Vaintroub	2010-04-11
      Bug#52494, WL#5331 : Support Unicode for command line
      Windows client.
      
      This patch allows to use Unicode when working with mysql
      client in interactive mode. 
      
      Best results can be achieved if console is using TrueType
      font (Lucida Console/Consolas). Raster fonts are only 
      capable of showing 256 characters.
     @ client/mysql.cc
        Change default encoding of mysql command line client to UTF8.
        Use low-level console IO (Unicode) instead of printf/fprintf.
        Translate command line parameters to user charset.
     @ client/mysqltest.cc
        Correct my_popen() implementation  used in mysqltest.
        Command line is given in UTF8, which means on Windows
        we have to translate it to wide string and use _wpopen()
        so it works correctly.
     @ include/my_sys.h
        Remove unused function.
     @ mysys/CMakeLists.txt
        Remove my_conio.c, my_cgets is no more used.
     @ mysys/Makefile.am
        Remove my_conio.c, my_cgets() is no more used.
     @ mysys/my_conio.c
        Remove my_conio.c, my_cgets is no more used.
[11 Apr 2010 17:43] Vladislav Vaintroub
The patch shows that it is well possible to support unicode on the command line.
I created a worklog for it.
[11 Apr 2010 17:57] Vladislav Vaintroub
Just to give an idea of what client is able to do with the fix: 
No OEMCP encodings, no chcp or similar heresy, but Lucida console font for the client

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 17
Server version: 5.6.99-m4-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select '∫K∂A = 2πχ(M)';
+---------------------+
| ∫K∂A = 2πχ(M)       |
+---------------------+
| ∫K∂A = 2πχ(M)       |
+---------------------+
1 row in set (0.00 sec)

mysql> SELECT hex('á');
+-----------+
| hex('á')  |
+-----------+
| C3A1      |
+-----------+
1 row in set (0.00 sec)

mysql> select 'Küçükyılmaz dislikes Mötörhead!';
+---------------------------------------+
| Küçükyılmaz dislikes Mötörhead!       |
+---------------------------------------+
| Küçükyılmaz dislikes Mötörhead!       |
+---------------------------------------+
1 row in set (0.00 sec)

mysql> select 'Петер любит говорить про Юникод';
+------------------------------------------------------------+
| Петер любит говорить про Юникод                            |
+------------------------------------------------------------+
| Петер любит говорить про Юникод                            |
+------------------------------------------------------------+
1 row in set (0.00 sec)
[14 Apr 2010 11:25] Peter Laursen
Please ensure that this is properly documented at least like what commands should be executed before starting mysql client.
[14 Apr 2010 11:30] Peter Laursen
Please ensure that this is properly documented at least like what commands should be executed in cmd.exe before starting mysql client.

And @Vlad .. I am actually able to understand this (And I'll kick you one day! :-):
"select 'Петер любит говорить про Юникод';"
[14 Apr 2010 14:02] Vladislav Vaintroub
@Peter : in "select 'Петер любит говорить про Юникод';" actually MySQL Peter was meant, who also likes Unicode a lot and talks about it too:)

As for using Unicode on the command line, there is no command in cmd.exe.

It is a font setting (right-click on console's title bar => Properties => Font.
Choose Lucida Console, it has many Unicode characters in it. You can create a shortcut to cmd.exe on desktop and name it whatever you like, and modify font here. rightclick on shortcut=>Properties=>Font. 

You can also add a new TrueType font like here: http://www.hanselman.com/blog/UsingConsolasAsTheWindowsConsoleFont.aspx

Unfortunately there is not that many TrueType fonts that show well on console, only monospace fonts provide good experience (and in default installation it would be Lucida or Consolas)

You can also stay with raster codepage, it limits the character repertoire to the current OEM codepage, but at least "current locale non-ASCIIs" (like in my case ÄÖÜß) should be shown correctly. For lot of folks, it seems to work just fine (otherwise I guess MS would have changed default fonts for console to be TrueType long ago)
[14 Apr 2010 14:04] Vladislav Vaintroub
"You can also stay with raster codepage", I meant raster font of course there is not "raster codepage"..
[14 Apr 2010 14:04] Vladislav Vaintroub
"You can also stay with raster codepage", I meant raster font of course there is not "raster codepage"..
[15 Apr 2010 4:58] Peter Laursen
I think I will need to wait for a release including the patch? I still get

SELECT * FROM test;
/*
    id  txt   
------  ------
     1  ø      -- correct. 'ø' inserted from GUI tool (SQLyog)
     2  o      -- wrong. 'ø' inserted from command line with SET NAMES latin1 using Lucida console font;
     3  o      -- wrong. 'ø' inserted from command line with SET NAMES utf8 using Lucida console font;
*/  
     
     
SELECT id, hex(txt) FROM test;
/*
    id  hex(txt)
------  --------
     1  C3B8    
     2  6F      
     3  6F      
*/
[15 Apr 2010 7:14] Vladislav Vaintroub
>I think I will need to wait for a release including the patch? 
Yes, it is nowhere yet, not yet reviewed not pushed to main trees.
[16 Apr 2010 4:08] Peter Laursen
I was asked if I was able to run the mysql client in WPS.  I cannot reply to that as I am currently traveling and currently only have a XP system available.  It will take around 6 weeks more before I get my hands on a Vista/Win7 machine with WPS.
[16 Apr 2010 17:45] Peter Laursen
I request a re-triage of this.  This is sheer arrogance.
[16 Apr 2010 18:23] Peter Laursen
well Vlad, then you may code such machine too ... 

The problems are
1) lack of Windows knowledge in the vast majority in MySQL dev. team.
2) lack of will to improve knowledge in this respect.

I have heard this for 4-5 years now that "this is a limitation with the OS" (see earlier this thread for instance).  Now it is (due to Vlad's patch) clear that it is a *limitation with the knowledge (and ATTITUDE as well) of most MySQL developers* simply.
[22 Apr 2010 19:11] James Day
For the reason why this is a feature request, consider InnoDB's support for native asynchronous I/O on Windows but not on Linux. Adding that support for Linux was a feature request, not a bug, because the support was never there for Linux. This support for Linux is added in the 5.5 series.

As soon as Windows support for UTF-8 is added in any future release it'll probably be possible to use that new client version for all older server versions.
[23 Apr 2010 7:25] Daniel Fischer
The reason that it's a bug and not a feature request is that we are advertising UTF-8 support for all platforms, not just those that are not Windows. 

The MySQL CLI supports an option --default-character-set, Windows is not mentioned as an exception:
http://dev.mysql.com/doc/refman/5.5/en/mysql-command-options.html

This page says I should be able to use --default-character-set to set the client's character set to the system character set, Windows is again not mentioned as an exception:
http://dev.mysql.com/doc/refman/5.5/en/charset-configuration.html

But when I actually do that...

>chcp 65001
Active code page: 65001

(The system character set is now UTF-8.)

>mysql --default-character-set=utf8
Welcome to the MySQL monitor.  Commands end with ; or \g.
[...]

mysql> SELECT 'á';
Bye
[30 Sep 2010 20:11] Peter Laursen
Any chance that Vlad's patch will be applied.  It looks quite reasonable to me.
[30 Sep 2010 20:35] Vladislav Vaintroub
The internal discussion about the patch went as far as to support UTF16 (little endian) as valid MySQL charset, because the conversion routines between user charset and Windows wide char looked hackish :)

So the patch will likely be applied after UTF16-LE is supported, and this is planned in not-to-far future (can't be more specific due to policies)
[30 Sep 2010 20:56] Peter Laursen
I appreciate this reply. Server side support for UTF16 (LE) would be very nice and would solve quite a lot of problems/issues for Windows users working in a multi-lingual environment.