MySQL Bugs: #33954: QB saves only in defacto standard UTF-8

Bug #33954	QB saves only in defacto standard UTF-8
Submitted:	21 Jan 2008 10:09	Modified:	16 Feb 2009 12:42
Reporter:	Jim Michaels
Status:	Not a Bug
Category:	MySQL Query Browser	Severity:	S3 (Non-critical)
Version:	1.2.12	OS:	Windows (XP Pro SP2)
Assigned to:	Mike Lischke	CPU Architecture:	Any

Description:
When QB saves a script, the character U+233B4 (a Chinese character meaning 'stump of tree') is prepended to a document that was previously plain ASCII.

QB ignores this character and it does not appear in the script editor.

How to repeat:
Use a programmer's editor or Dreamweaver to create an SQL file.
Open it as a script in QB.
Make a change.
Save it. By default, it will be saved as UTF-8.
Reopen the document in the programmer's editor or in Dreamweaver (which will notify you that another application changed the document).
Notice the 3 garbage characters at the top: 0xEF, 0xBB, 0xBF.
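
The check in the last step can also be done without an editor. The following is a minimal Python sketch, not part of QB; the function name and the sample byte strings are made up for illustration and simply stand in for the contents of a saved script.

```python
import codecs


def starts_with_utf8_signature(data: bytes) -> bool:
    """Return True if the data begins with the bytes 0xEF, 0xBB, 0xBF."""
    # codecs.BOM_UTF8 is the constant b"\xef\xbb\xbf" from the standard library.
    return data.startswith(codecs.BOM_UTF8)


# Hypothetical file contents: one as re-saved with the prefix, one untouched.
saved_by_qb = b"\xef\xbb\xbfSELECT 1;\n"
original = b"SELECT 1;\n"

print(starts_with_utf8_signature(saved_by_qb))
print(starts_with_utf8_signature(original))
```

For a real file, the same function could be applied to the result of `open(path, "rb").read(3)`.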

I have not seen a UTF-8-encoded web document that contains these characters in it at the top.

Suggested fix:
Add more formats for saving.
phpMyAdmin may or may not appreciate UTF-8 scripts with this at the top.
According to RFC 3629, "Character numbers from U+0000 to U+007F (US-ASCII repertoire) correspond to octets 00 to 7F (7 bit US-ASCII values).  A direct consequence is that a plain ASCII string is also a valid UTF-8 string."
Nowhere in the document do I see it stating (did I read it correctly?) that the document should start with this Chinese character. For some reason, it has become a Microsoft de facto standard.
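
The quoted RFC property can be illustrated with a short Python sketch. The sample statement is hypothetical; the point is only that encoding ASCII-only text as UTF-8 yields the same bytes as encoding it as ASCII, with no prefix added by the encoding itself.

```python
# Hypothetical ASCII-only script content.
text = "SELECT * FROM t;"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For the US-ASCII repertoire both encodings produce identical octets.
print(ascii_bytes == utf8_bytes)

# Python's plain "utf-8" codec does not emit the 0xEF 0xBB 0xBF prefix.
print(utf8_bytes.startswith(b"\xef\xbb\xbf"))
```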

The UTF-8 standard is at http://tools.ietf.org/html/rfc3629

Thank you for the bug report. Verified just as described.

Note that Windows' notepad.exe in UTF-8 mode also inserts this Chinese character at the top of the file when it saves.

This is not a bug. The characters you see are called a BOM (byte order mark). Originally it was used for UTF-16/32 to indicate the order in which bytes are stored (big endian or little endian); however, it is now also established for UTF-8 as a kind of type indicator, even though the BOM is not strictly needed there.
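
As a rough illustration, the BOM is the single code point U+FEFF, and its byte form depends on the encoding. The Python sketch below is not taken from QB; it only shows how that one code point serializes in UTF-8 and in the two UTF-16 byte orders, and that the three bytes from the report decode back to it.

```python
bom = "\ufeff"  # the byte order mark as a single code point

# Byte form of the BOM in each encoding, printed as hex.
print(bom.encode("utf-8").hex(" "))      # ef bb bf
print(bom.encode("utf-16-le").hex(" "))  # ff fe
print(bom.encode("utf-16-be").hex(" "))  # fe ff

# The three bytes seen at the top of the saved file decode to U+FEFF.
decoded = b"\xef\xbb\xbf".decode("utf-8")
print(hex(ord(decoded)))                 # 0xfeff
```

In UTF-16 the two possible byte sequences distinguish little endian from big endian; in UTF-8 there is only one possible sequence, which is why it acts as a type indicator rather than an order marker.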

Generally, every serious Unicode-enabled application should be able to handle a BOM by now. That's a standard task, and an easy one too.

If you really need to get rid of the BOM, then save your documents in ANSI format (see Save As in the File menu).
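
A minimal sketch of what such handling might look like in Python, assuming the script is available as raw bytes. The helper names are hypothetical; `utf-8-sig` is the standard-library codec that accepts UTF-8 input with or without a leading BOM.

```python
import codecs


def decode_script(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")


def strip_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, leaving other input as is."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical script contents with and without the BOM.
print(decode_script(b"\xef\xbb\xbfSELECT 1;"))
print(decode_script(b"SELECT 1;"))
print(strip_bom(b"\xef\xbb\xbfSELECT 1;"))
```

Both inputs decode to the same text, so a reader written this way does not need to care which editor saved the file.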