Bug #36458 | Add an option to make general_log file pure utf8 | ||
---|---|---|---|
Submitted: | 1 May 2008 21:06 | Modified: | 21 Oct 2008 15:41 |
Reporter: | Peter Laursen (Basic Quality Contributor) | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Server: Logging | Severity: | S4 (Feature request) |
Version: | 5.0.51b (probably any) | OS: | Windows (Vista 32 bit) |
Assigned to: | | CPU Architecture: | Any |
Tags: | qc |
[1 May 2008 21:06]
Peter Laursen
[1 May 2008 21:08]
Peter Laursen
Notepad's save dialogue reports that the file is ANSI encoded
Attachment: start.jpg (image/jpeg, text), 71.54 KiB.
[1 May 2008 21:09]
Peter Laursen
"select 'æøå'" as recorded in log
Attachment: query record.jpg (image/jpeg, text), 6.37 KiB.
[1 May 2008 21:10]
Peter Laursen
"select 'रामगड'" as recorded in log
Attachment: query record 2.jpg (image/jpeg, text), 6.49 KiB.
[1 May 2008 21:48]
Peter Laursen
I should probably add that both clients issued SET NAMES UTF8 (I did so manually from the command line; SQLyog always does with servers >= 4.1).
[5 May 2008 21:03]
Sveta Smirnova
Thank you for the report. Data is written to general query log in encoding which used when it was inserted. So to see the data in the general log correctly, just change the encoding of your editor. Notepad probably does not recognize UTF8 data, because log file does not contain BOM header. But I think this would be bad idea to put this header into general log file, because it can lead to problems with other editors.
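To make the BOM point concrete, here is a minimal Python sketch. It assumes a deliberately simplified editor model (trust a UTF-8 BOM if present, otherwise fall back to an ANSI code page, with cp1252 chosen for illustration); this is not Notepad's real detection logic, and the function names are hypothetical.

```python
import codecs


def starts_with_utf8_bom(data):
    """Return True if the bytes begin with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_like_simple_editor(data, ansi_codepage="cp1252"):
    """Simplified model (an assumption): trust a BOM, otherwise assume ANSI."""
    if starts_with_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(ansi_codepage)


# A log line as a utf8 client would send it, with no BOM in front.
line = "select 'æøå'\n".encode("utf-8")

print(starts_with_utf8_bom(line))                    # no BOM in the raw line
print(starts_with_utf8_bom(codecs.BOM_UTF8 + line))  # BOM prepended by hand
# With the BOM the text round-trips; without it the ANSI fallback garbles it.
print(decode_like_simple_editor(codecs.BOM_UTF8 + line) == "select 'æøå'\n")
print(decode_like_simple_editor(line) == "select 'æøå'\n")
```

Under this model the BOM-less line is shown with each two-byte UTF-8 sequence split into two Western characters, which would match the garbled display in the attached screenshots.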
[5 May 2008 21:59]
Peter Laursen
I am sorry, but I need a few clarifications here! You write "Data is written to general query log in encoding which used when it was inserted."

Well, nothing was INSERTED, actually. Only a literal string was SELECTED. Does this mean that if I run

set names utf8; select 'æøå'; set names latin1; select 'æøå';

then the statement "select 'æøå';" will occur twice in the log with two different encodings (utf8 and ansi/western)?

* If so is this behaviour the same on Unix/Linux?
* Is this documented?

IMHO that makes the log practically unusable in a multilingual environment (unless all clients use the same unicode charset). I would really request an *option* then to encode everything in the logs as UTF8!

This I accept: "Notepad probably does not recognize UTF8 data, because log file does not contain BOM header. But I think this would be bad idea to put this header into general log file, because it can lead to problems with other editors." ... though using BOMs is the de facto standard on Windows. No Windows editors have problems with BOMs, I think. No BOM on Windows means ANSI! But OK, let that go!
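The scenario in this comment can be sketched in Python. This is only an illustration under stated assumptions: cp1252 stands in for MySQL's latin1, the "log" is a plain byte string, and `normalize_to_utf8` is a hypothetical stand-in for the requested option, not anything the server actually provides.

```python
# Sketch only: cp1252 stands in for MySQL's latin1, and the "log" is just
# a byte string; none of this is MySQL's actual logging code.
STATEMENT = "select 'æøå'"

utf8_bytes = STATEMENT.encode("utf-8")     # as sent after SET NAMES utf8
latin1_bytes = STATEMENT.encode("cp1252")  # as sent after SET NAMES latin1

# A log that stores the client's bytes verbatim, one statement per line.
mixed_log = utf8_bytes + b"\n" + latin1_bytes + b"\n"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def normalize_to_utf8(data, client_encoding):
    """Hypothetical version of the requested option: re-encode to UTF-8 before logging."""
    return data.decode(client_encoding).encode("utf-8")


uniform_log = (
    normalize_to_utf8(utf8_bytes, "utf-8") + b"\n"
    + normalize_to_utf8(latin1_bytes, "cp1252") + b"\n"
)

print(utf8_bytes.hex())    # the two clients produce different byte sequences
print(latin1_bytes.hex())
print(try_decode(mixed_log, "utf-8") is None)        # mixed log is not valid UTF-8
print(try_decode(uniform_log, "utf-8") is not None)  # normalized log decodes cleanly
```

A log written the first way has no single encoding that renders every line correctly, which is the usability problem raised here; the second form is what a "pure utf8" option would be expected to produce.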
[6 May 2008 18:40]
Sveta Smirnova
Peter, you are right.

> * If so is this behaviour the same on Unix/Linux?

No, it is different.

> * Is this documented?

No.

So the bug is set to "Verified", as it should be made clear which encoding the general query log uses.
[21 Oct 2008 15:40]
Peter Laursen
I also think this is not true: "sends data (such as command line parameters) to it in the so called ANSI (non-wide) character set." That only holds *if* the folder name is valid within one ANSI codepage. Try with a folder name in हिंदी (Hindi) ... I think it will use (little endian) UTF-16 (the native Windows unicode implementation) for encoding the folder name. The same applies if there are both western non-ASCII characters and non-western characters at the same time (like 'æøåрусский'). Those simply cannot be represented in ANSI because 1) there is no ANSI codepage for Hindi and 2) no single ANSI codepage covers this combination!
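The code page argument can be checked with a small Python sketch. The two code pages tested (cp1252 for Western, cp1251 for Cyrillic) are chosen here as examples only, and the helper name is hypothetical.

```python
def encodable(text, codepage):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


names = ["æøå", "हिंदी", "æøåрусский"]
codepages = ["cp1252", "cp1251"]  # Western and Cyrillic ANSI code pages

for name in names:
    results = {cp: encodable(name, cp) for cp in codepages}
    # UTF-16 little endian can hold all of them.
    results["utf-16-le"] = encodable(name, "utf-16-le")
    print(ascii(name), results)
```

Only the purely Western name fits cp1252; the Hindi name and the mixed name fit neither code page, while UTF-16 represents all three.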
[21 Oct 2008 15:41]
Peter Laursen
My mistake. The last post was meant for this: http://bugs.mysql.com/bug.php?id=37339