Bug #36458 Add an option to make general_log file pure utf8
Submitted: 1 May 2008 21:06 Modified: 21 Oct 2008 15:41
Reporter: Peter Laursen (Basic Quality Contributor)
Status: Verified
Category: MySQL Server: Logging Severity: S4 (Feature request)
Version: 5.0.51b (probably any) OS: Microsoft Windows (Vista 32 bit)
Assigned to: CPU Architecture: Any
Tags: qc
Triage: Triaged: D5 (Feature request)

[1 May 2008 21:06] Peter Laursen
Description:
The general log (I did not check the slow log) is created as an ANSI-encoded file on Windows.

That makes the log almost unusable when queries contain non-ASCII characters - depending on the complexity of the queries, of course!

The locale setting of my PC is Danish - log file is ANSI/western codepage.

How to repeat:
Start server with general log enabled and default charset UTF8.

Execute queries with non-ASCII characters like 

-- first some Danish
select 'æøå'; -- executed from both command line and SQLyog
-- now try some Hindi
select 'रामगड'; -- executed from SQLyog 
-- inspect log

.. images will be uploaded!

Suggested fix:
MySQL server should *force* either UTF8 (preferable) or UTF16 encoding for logs - at least when the server default charset is a Unicode charset (but preferably always).

If Windows 9x/ME is (still) supported then of course it does not apply to those!
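
A minimal sketch of the requested behaviour, in Python for illustration only (this is not MySQL code). It assumes the server knows each connection's character set and receives the statement as raw bytes; the function name `to_utf8_log_line` and the use of cp1252 as the "ANSI/western" codepage are assumptions made for the example.

```python
def to_utf8_log_line(raw_statement: bytes, connection_charset: str) -> bytes:
    """Transcode one statement from its connection charset to UTF-8."""
    text = raw_statement.decode(connection_charset)
    return text.encode("utf-8") + b"\n"


# Two clients using different connection character sets.
from_western_client = "select 'æøå';".encode("cp1252")
from_utf8_client = "select 'रामगड';".encode("utf-8")

log_bytes = b"".join([
    to_utf8_log_line(from_western_client, "cp1252"),
    to_utf8_log_line(from_utf8_client, "utf-8"),
])

# The whole file now decodes cleanly with a single encoding.
log_text = log_bytes.decode("utf-8")
print(log_text.count("\n"))  # 2
```

With such an option the log could be opened as UTF-8 regardless of which clients wrote to it.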
[1 May 2008 21:08] Peter Laursen
Notepad's save dialogue reports that the file is ANSI encoded

Attachment: start.jpg (image/jpeg, text), 71.54 KiB.

[1 May 2008 21:09] Peter Laursen
"select 'æøå'" as recorded in log

Attachment: query record.jpg (image/jpeg, text), 6.37 KiB.

[1 May 2008 21:10] Peter Laursen
"select 'रामगड'" as recorded in log

Attachment: query record 2.jpg (image/jpeg, text), 6.49 KiB.

[1 May 2008 21:48] Peter Laursen
I should probably add that both clients did SET NAMES UTF8 (I did manually from command line, SQLyog always does with servers >= 4.1).
[5 May 2008 21:03] Sveta Smirnova
Thank you for the report.

Data is written to general query log in encoding which used when it was inserted. So to see data in general log correctly just change encoding of your editor.

Notepad probably does not recognize UTF8 data, because log file does not contain BOM header. But I think this would be bad idea to put this header into general log file, because it can lead to problems with other editors.
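
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the start of a file. A small Python sketch (illustration only, not MySQL code; the variable names are made up for the example) of what a BOM-prefixed log line would look like, and of what a reader that does not strip the BOM would see:

```python
import codecs

line = "select 'रामगड';\n"

plain = line.encode("utf-8")         # UTF-8 without BOM
with_bom = line.encode("utf-8-sig")  # UTF-8 with a leading BOM

print(codecs.BOM_UTF8)                      # b'\xef\xbb\xbf'
print(with_bom == codecs.BOM_UTF8 + plain)  # True

# A BOM-aware reader strips the marker and gets the original text back.
print(with_bom.decode("utf-8-sig") == line)  # True

# A reader that treats the file as plain UTF-8 keeps U+FEFF as the
# first character, which is the kind of problem other tools may hit.
print(with_bom.decode("utf-8")[0] == "\ufeff")  # True
```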
[5 May 2008 21:59] Peter Laursen
I am sorry, but I need a few clarifications here!

You write "Data is written to general query log in encoding which used when it was inserted." 
>> well nothing was INSERTED, actually.  Only a 'literal string' was SELECTED.

Does this mean that if I

set names utf8;
select 'æøå';
set names latin1;
select 'æøå';

.. then the statement "select 'æøå';" will occur twice in the log with two different encodings (utf8 and ansi/western)?

* If so, is this behaviour the same on Unix/Linux?
* Is this documented?
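
A minimal sketch of what that would mean at the byte level, in Python for illustration only. It assumes the server writes the statement bytes to the log unchanged in the connection character set, and it uses cp1252 to stand in for "ansi/western"; the variable names are made up for the example.

```python
stmt = "select 'æøå';"

as_utf8 = stmt.encode("utf-8")      # bytes sent after "set names utf8"
as_western = stmt.encode("cp1252")  # bytes sent after "set names latin1"

# Both statements end up in the same log file.
log_bytes = as_utf8 + b"\n" + as_western + b"\n"

print(as_utf8)     # b"select '\xc3\xa6\xc3\xb8\xc3\xa5';"
print(as_western)  # b"select '\xe6\xf8\xe5';"

# No single encoding reads both lines correctly.
read_as_western = log_bytes.decode("cp1252").splitlines()
read_as_utf8 = log_bytes.decode("utf-8", errors="replace").splitlines()

print(read_as_western[0] == stmt, read_as_western[1] == stmt)  # False True
print(read_as_utf8[0] == stmt, read_as_utf8[1] == stmt)        # True False
```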

IMHO that makes the log practically unusable in a multilingual environment (unless all clients use the same unicode charset). I would really request an *option* then to encode everything in the logs as UTF8!

This I accept:
"Notepad probably does not recognize UTF8 data, because log file does not contain BOM header. But I think this would be bad idea to put this header into general log file, because it can lead to problems with other editors." 
... though using BOMs is de facto standard on Windows.  No Windows editors have problems with BOMs, I think.  No BOMs on Windows means ANSI! But ok ..let that go!
[6 May 2008 18:40] Sveta Smirnova
Peter,

you are right.

> * If so is this behaviour the same on Unix/Linux?

No, it is different.

> * Is this documented?

No.

So the bug is set to "Verified", as it should be clear which encoding the general query log uses.
[21 Oct 2008 15:40] Peter Laursen
I also think this is not true: "sends data (such as command line parameters) to it in the so called ANSI (non-wide) character set."

It only does so *if* the folder name is valid within one ANSI codepage. Try with a folder name in हिंदी (Hindi) ... I think it will use (little endian) UTF-16 (the native Windows Unicode implementation) for encoding of the folder name.  The same applies if there are both western non-ASCII characters and non-western characters at the same time (like 'æøåрусский'). Those simply cannot be represented as ANSI because 1) there is no ANSI codepage for Hindi and 2) no single ANSI codepage can hold this combination!
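
The codepage point can be checked with a small Python sketch (illustration only). It assumes that "ANSI codepage" means one of the Windows codepages 1250 to 1258 as implemented by Python's codecs; the helper name `codepages_that_fit` is made up for the example.

```python
# Windows "ANSI" codepages 1250..1258, as known to Python's codecs.
ANSI_CODEPAGES = [f"cp{n}" for n in range(1250, 1259)]


def codepages_that_fit(text: str) -> list:
    """Return the ANSI codepages that can encode every character of text."""
    fitting = []
    for codepage in ANSI_CODEPAGES:
        try:
            text.encode(codepage)
        except UnicodeEncodeError:
            continue
        fitting.append(codepage)
    return fitting


print("cp1252" in codepages_that_fit("æøå"))  # True
print(codepages_that_fit("हिंदी"))             # []
print(codepages_that_fit("æøåрусский"))       # []

# UTF-16 (little endian) can represent all of them.
print(len("æøåрусский".encode("utf-16-le")))  # 20
```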
[21 Oct 2008 15:41] Peter Laursen
My mistake. The last post was meant for this bug:
http://bugs.mysql.com/bug.php?id=37339