Bug #2218 Updating a UTF-8 text field generates nonsense characters
Submitted: 26 Dec 2003 3:46 Modified: 15 Jan 2004 3:59
Reporter: Vaclav Vobornik
Status: Closed
Category: MySQL Server: MyISAM storage engine Severity: S2 (Serious)
Version: 4.1.1-alpha OS: Linux (Linux)
Assigned to: Alexander Barkov CPU Architecture: Any

[26 Dec 2003 3:46] Vaclav Vobornik
Description:
Updating a cp1250 text field with a _long_ UTF-8 string produces nonsense characters in the table row. Short strings work fine.

How to repeat:
Compile and install with:

./configure \
--prefix=/opt/mysql \
--with-charset=cp1250 \
--with-extra-charsets=complex \
--with-mysqld-user=mysql \
--localstatedir=/data/mysql \
--enable-local-infile \
--sysconfdir=/etc \
--without-innodb \
--with-low-memory \
--with-collation=cp1250_czech_ci

Then run this SQL:
http://www.blogator.com/bug/utfcrash.sql

The text field then starts with some nonsense characters:

root@linux:~# mysql -D test < utfcrash.sql
left(description,20)
lÆ%@lÆ%@e všeobecně 
I když je všeobecně 

The second line is correct (its UPDATE statement contains only a short string).
[26 Dec 2003 7:42] Dean Ellis
I cannot repeat this using the current 4.1.2 development sources.  The columns are updated correctly and return the correct results (HEX() also shows that the values are identical).

I am wondering if perhaps the encoding you are using in your shell/terminal is responsible for the garbled output.
[26 Dec 2003 8:59] Vaclav Vobornik
The hex output differs too:

root@linux:~# mysql -D test -e "select hex(left(description,10)) from items;"
+---------------------------+
| hex(left(description,10)) |
+---------------------------+
| 6CC62540706F3F086520      |
| 49A06B64799E206A6520      |
+---------------------------+

I will try this using 4.1.2 devel sources...
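
For reference, the two hex values above can be decoded outside MySQL. The following is a minimal Python sketch, assuming the column bytes are cp1250 as set by the configure options; the dictionary keys and variable names are illustrative only. The second row decodes to the start of the expected Czech text (byte A0 is a no-break space in cp1250), while the first row contains a control byte (0x08) and other bytes that do not belong to the text.

```python
# Hex values copied from the SELECT output above.
rows = {
    "long update": "6CC62540706F3F086520",
    "short update": "49A06B64799E206A6520",
}

decoded = {}
for label, hex_value in rows.items():
    raw = bytes.fromhex(hex_value)
    # Assumption: the column stores cp1250, as set by --with-charset=cp1250.
    decoded[label] = raw.decode("cp1250")
    # repr() makes control characters and the no-break space visible.
    print(label, repr(decoded[label]))
```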
[26 Dec 2003 15:51] Vaclav Vobornik
I got _THE SAME_ output with the new version 4.1.2-alpha, built with the configure options listed above. The data is wrong again:
  
root@linux:~# mysql -V
mysql  Ver 14.3 Distrib 4.1.2-alpha, for pc-linux-gnu (i686)
root@linux:~# mysql -D test < /data/domains/blogator.com/http/bug/utfcrash.sql 
left(description,20)
ôÆ%@ôÆ%@e všeobecně 
I když je všeobecně 
hex(left(description,10))
F4C62540F4C625406520
49A06B64799E206A6520
[26 Dec 2003 15:55] Vaclav Vobornik
(The output is not identical, but it shows the same error.)
[26 Dec 2003 16:27] Vaclav Vobornik
The output of the whole SQL file varies between runs:

hex(left(description,10))
CCC62540CCC625406520
49A06B64799E206A6520

hex(left(description,10))
D4C62540D4C625406520
49A06B64799E206A6520

hex(left(description,10))
6CC625406CC625406520
49A06B64799E206A6520

hex(left(description,10))
CCC62540CCC625406520
49A06B64799E206A6520

I do not think this can be a "terminal sensitive" error...
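
The pattern in the corrupted values can be checked with a short Python sketch. It uses only the hex strings quoted above; the split into 4 + 4 + 2 bytes and the little-endian reading are assumptions made for illustration. In every run the first four bytes are repeated once and only the last two bytes ("e ") are stable. Read as a 32-bit little-endian integer, each group falls in the 0x4025C6xx range, which might indicate a memory address rather than text, though nothing in this report confirms that.

```python
import struct

# hex(left(description,10)) of the corrupted row, one value per run,
# copied from the outputs above.
runs = [
    "F4C62540F4C625406520",
    "CCC62540CCC625406520",
    "D4C62540D4C625406520",
    "6CC625406CC625406520",
    "CCC62540CCC625406520",
]

summary = []
for hex_value in runs:
    raw = bytes.fromhex(hex_value)
    first, second, tail = raw[:4], raw[4:8], raw[8:]
    # Assumption: interpret the group as a little-endian 32-bit integer
    # (the reported platform is i686, which is little-endian).
    (word,) = struct.unpack("<I", first)
    summary.append((first == second, word, tail))
    print(f"{hex_value}: repeated={first == second} word=0x{word:08X} tail={tail!r}")
```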
[30 Dec 2003 7:47] Dean Ellis
Please attach utfcrash.sql to this issue (so you do not have to keep it online).

For whatever reason I am now occasionally (inconsistently) able to reproduce this.  I seem to have the best success repeating it if I have just started mysqld, as eventually it seems to start producing correct results for me and continues to do so until I restart (query cache disabled).
[30 Dec 2003 8:18] Vaclav Vobornik
sql

Attachment: utfcrash.sql (text/plain), 5.52 KiB.

[31 Dec 2003 4:18] Alexander Barkov
I found a problem in the sources. I'm not sure how to fix it in the most proper way right now.

While we're fixing this problem, I think this temporary workaround should work:

UPDATE ... SET description=CAST(_utf8'string' AS CHAR CHARACTER SET cp1250)

instead of the direct form:

UPDATE ... SET description=_utf8'string'
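
To illustrate what an explicit UTF-8 to cp1250 conversion is expected to produce, here is a small Python sketch that runs outside MySQL. The sample text is taken from the output earlier in this report; the helper name `utf8_to_cp1250` is made up for the example and is not a MySQL function. It is not a description of how the server performs the conversion.

```python
def utf8_to_cp1250(utf8_bytes: bytes) -> bytes:
    """Transcode UTF-8 bytes to cp1250 (hypothetical helper, not a MySQL API)."""
    return utf8_bytes.decode("utf-8").encode("cp1250")


# Sample text from the report; "ž", "š" and "ě" take two bytes each in UTF-8
# but one byte each in cp1250.
utf8_literal = "když je všeobecně".encode("utf-8")
converted = utf8_to_cp1250(utf8_literal)

print(len(utf8_literal), len(converted))
print(converted.hex().upper())
```

The converted bytes begin with 6B64799E206A6520, which matches bytes 3 to 10 of the hex value reported for the correctly updated row.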
[15 Jan 2004 3:59] Alexander Barkov
Thank you for your bug report. This issue has been committed to our
source repository of that product and will be incorporated into the
next release.

If necessary, you can access the source repository and build the latest
available version, including the bugfix, yourself. More information 
about accessing the source trees is available at
    http://www.mysql.com/doc/en/Installing_source_tree.html