Bug #33954 QB saves only in de facto standard UTF-8
Submitted: 21 Jan 2008 10:09
Modified: 16 Feb 2009 12:42
Reporter: Jim Michaels
Status: Not a Bug
Category: MySQL Query Browser
Severity: S3 (Non-critical)
Version: 1.2.12
OS: Microsoft Windows (XP Pro SP2)
Assigned to: Mike Lischke
CPU Architecture: Any

[21 Jan 2008 10:09] Jim Michaels
Description:
The character U+233B4 (a Chinese character meaning 'stump of tree') is prepended to a previously ASCII document.  

QB ignores this character and it does not appear in the script editor.

How to repeat:
Use a programmer's editor or Dreamweaver to create an SQL file.
Open it as a script in QB.
Make a change.
Save it. By default, it will be saved as UTF-8.
Open the document in the programmer's editor or in Dreamweaver (which will notify you that another application changed the document).
Notice the 3 garbage characters at the top: 0xEF, 0xBB, 0xBF.
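
The last step can also be checked without an editor by dumping the first bytes of the saved file. The following is a minimal Python sketch, not part of the original report. The function names `first_bytes_hex` and `file_first_bytes_hex` are made up for illustration.

```python
def first_bytes_hex(data: bytes, n: int = 3) -> str:
    """Format the first n bytes of data as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in data[:n])


def file_first_bytes_hex(path: str, n: int = 3) -> str:
    """Read only the first n bytes of the file at path and format them."""
    with open(path, "rb") as f:
        return first_bytes_hex(f.read(n), n)


if __name__ == "__main__":
    # Simulated file contents: one as re-saved by QB, one plain ASCII original.
    print(first_bytes_hex(b"\xef\xbb\xbfSELECT 1;"))
    print(first_bytes_hex(b"SELECT 1;"))
```

A file that shows `EF BB BF` in the first three positions carries the extra bytes described above. A plain ASCII script shows the hex values of its first three characters instead.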

I have not seen a UTF-8-encoded web document that contains these characters in it at the top.

Suggested fix:
Add more formats for saving.
phpMyAdmin may or may not appreciate UTF-8 scripts with this at the top.
According to RFC 3629, "Character numbers from U+0000 to U+007F (US-ASCII repertoire) correspond to octets 00 to 7F (7 bit US-ASCII values). A direct consequence is that a plain ASCII string is also a valid UTF-8 string."
Nowhere in the document do I see it stating (did I read it correctly?) that the document should start with this Chinese character. For some reason, it has become a Microsoft de facto standard.
[21 Jan 2008 10:20] Jim Michaels
UTF-8 standard is at http://tools.ietf.org/html/rfc3629
[21 Jan 2008 16:12] Valeriy Kravchuk
Thank you for the bug report. Verified just as described.
[22 Jan 2008 7:55] Jim Michaels
Note that Windows' notepad.exe in UTF-8 mode also inserts this Chinese character at the top of the file when it saves.
[16 Feb 2009 12:42] Mike Lischke
This is not a bug. The characters you see are called a BOM (byte order mark). Originally the BOM was used for UTF-16/32 to indicate the order in which bytes are stored (big endian or little endian). It is now also established for UTF-8 as a kind of type indicator, even though the BOM is not strictly needed there.
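
To illustrate the point, the BOM is the single code point U+FEFF, and the three bytes 0xEF, 0xBB, 0xBF from the report are simply its UTF-8 encoding. The Python sketch below encodes U+FEFF in several encodings to show how the byte order becomes visible in UTF-16/32. The helper name `bom_bytes` is an assumption made for this example.

```python
import codecs

BOM = "\ufeff"  # U+FEFF, the code point used as the byte order mark


def bom_bytes(encoding: str) -> bytes:
    """Encode U+FEFF in the given encoding and return the raw bytes."""
    return BOM.encode(encoding)


if __name__ == "__main__":
    for name in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        hex_bytes = " ".join(f"{b:02X}" for b in bom_bytes(name))
        print(f"{name:10} {hex_bytes}")

    # The standard library exposes the same byte sequences as constants.
    assert bom_bytes("utf-8") == codecs.BOM_UTF8
    assert bom_bytes("utf-16-le") == codecs.BOM_UTF16_LE
    assert bom_bytes("utf-16-be") == codecs.BOM_UTF16_BE
```

In UTF-16 the two possible byte orders give FF FE or FE FF, which is what lets a reader detect endianness. In UTF-8 there is only one possible byte sequence, so the mark can only serve as an encoding signature.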

Generally, every serious Unicode-enabled application should be able to handle a BOM by now. That's a standard task and an easy one too.
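
As a rough sketch of what handling the BOM can look like on the reading side, Python's `utf-8-sig` codec decodes UTF-8 and drops one leading BOM if it is present. This is only an illustration in Python and says nothing about how QB or any other tool does it. The function name `decode_sql_script` is hypothetical.

```python
def decode_sql_script(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a single leading BOM if there is one."""
    return raw.decode("utf-8-sig")


if __name__ == "__main__":
    with_bom = b"\xef\xbb\xbfSELECT 1;"
    without_bom = b"SELECT 1;"

    # Both inputs decode to the same text.
    print(repr(decode_sql_script(with_bom)))
    print(repr(decode_sql_script(without_bom)))

    # The plain "utf-8" codec keeps the BOM as a U+FEFF character instead.
    print(repr(with_bom.decode("utf-8")))
```

A reader written this way accepts files saved with or without the signature, so the same script opens cleanly either way.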

If you really need to get rid of the BOM, then save your documents in ANSI format (see Save As in the File menu).
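
If the file has to stay in UTF-8, another option is to remove the three leading bytes after saving. The following Python sketch is not a QB feature, and the function names `strip_utf8_bom` and `strip_bom_in_file` are invented for this example.

```python
UTF8_BOM = b"\xef\xbb\xbf"  # UTF-8 encoding of U+FEFF


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other bytes are untouched."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def strip_bom_in_file(path: str) -> bool:
    """Rewrite the file without its BOM. Return True if the file was changed."""
    with open(path, "rb") as f:
        data = f.read()
    stripped = strip_utf8_bom(data)
    if stripped == data:
        return False  # no BOM found, leave the file alone
    with open(path, "wb") as f:
        f.write(stripped)
    return True
```

Unlike saving as ANSI, this keeps the rest of the file as UTF-8, so any non-ASCII characters in the script are preserved.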