Bug #33475 Incorrect string value when inserting unicode codepoint \xC2\x92
Submitted: 21 Dec 2007 23:33
Modified: 21 Jan 2008 11:26
Reporter: Andreas Götz
Status: Not a Bug
Category: MySQL Server: Charsets
Severity: S2 (Serious)
Version: 5.0.54
OS: Windows
Assigned to:
CPU Architecture: Any

[21 Dec 2007 23:33] Andreas Götz
Description:
I'm trying to insert some data into my DB. The data contains the HTML entity &#146; (displayed as ’), which is the right single quotation mark. PHP has decoded it into the UTF-8 byte sequence \xC2\x92.
On insert (after first running SET NAMES 'utf8'), an error message is shown: Incorrect string value: '\xC2\x92s'.
I've previously inserted other UTF-8 data successfully.

How to repeat:
Run these statements:

SET NAMES 'utf8';

INSERT INTO videodata SET md5 = '', title = 'Eastern Promises', subtitle = '', language = 'english, russian, turkish', diskid = '', mediatype = '50', comment = '', disklabel = '', imdbID = 'imdb:0765443', year = '2007', imgurl = 'http://ia.imdb.com/media/imdb/01/I/66/31/54/10m.jpg', director = 'David Cronenberg', actors = 'Josef Altin::Ekrem::imdb:nm1086981 Mina E. Mina::Azim::imdb:nm0590875 Aleksandar Mikic::Soyka::imdb:nm2233251 Sarah-Jeanne Labrosse::Tatiana::imdb:nm1436204 Lalita Ahmed::Customer::imdb:nm0014161 Badi Uzzaman::Chemist::imdb:nm0882732 Naomi Watts::Anna::imdb:nm0915208 Doña Croll::Nurse (as Dona Croll)::imdb:nm0188568 Raza Jaffrey::Doctor Aziz::imdb:nm1203005 Sinéad Cusack::Helen::imdb:nm0193661 Jerzy Skolimowski::Stepan::imdb:nm0804592 Tatiana Maslany::Tatiana\'s Voice (voice)::imdb:nm1137209 Viggo Mortensen::Nikolai::imdb:nm0001557 Vincent Cassel::Kirill::imdb:nm0001993 Armin Mueller-Stahl::Semyon::imdb:nm0000090 Shannon-Fleur Roux::Maria::imdb:nm2765707 Lillibet Langley::Violin Girl::imdb:nm2765699 Mia Soteriou::Azim’s Wife::imdb:nm0815420 Radoslaw Kaim::Head Waiter::imdb:nm1393968 Donald Sumpter::Yuri::imdb:nm0838910 Rhodri Wyn Miles::Senior Officer::imdb:nm0943872 Tereza Srbova::Kirilenko::imdb:nm2462293 Elisa Lasowski::Prostitute::imdb:nm2489799 Cristina Catalina::Prostitute::imdb:nm1692771 Alice Henley::Prostitute::imdb:nm2466446 Faton Gerbeshi::Pimp::imdb:nm2192132 David Papava::Chechen::imdb:nm1736565 Tamer Hassan::Chechen::imdb:nm1268748 Gergo Danka::Junior Waiter::imdb:nm1832830 Michael Sarne::Valery::imdb:nm0765398 Boris Isarov::Russian Boss::imdb:nm0410708 Yuri Klimov::Russian Boss::imdb:nm1585733 Andrzej Borkowski::The Gypsy::imdb:nm0096945 Olegar Fedoro::Tattooist::imdb:nm0270194', runtime = '100', country = 'UK, Canada, USA', plot = 'The mysterious and charismatic Russian-born Nikolai Luzhin is a driver for one of London\'s most notorious organized crime families of Eastern European origin. 
The family itself is part of the Vory V Zakone criminal brotherhood. Headed by Semyon, whose courtly charm as the welcoming proprietor of the plush Trans-Siberian restaurant impeccably masks a cold and brutal core, the family\'s fortunes are tested by Semyon\'s volatile son and enforcer, Kirill, who is more tightly bound to Nikolai than to his own father. But Nikolai\'s carefully maintained existence is jarred once he crosses paths at Christmastime with Anna Khitrova, a midwife at a North London hospital. Anna is deeply affected by the desperate situation of a young teenager who dies while giving birth to a baby. Anna resolves to try to trace the baby\'s lineage and relatives. The girl\'s personal diary also survives her; it is written in Russian, and Anna seeks answers in it. Anna\'s mother Helen does not discourage her, but Anna\'s irascible Russian-born uncle Stepan urges caution. He is right to do so; by delving into the diary, Anna has accidentally unleashed the full fury of the Vory. With Semyon and Kirill closing ranks and Anna pressing her inquiries, Nikolai unexpectedly finds his loyalties divided. The family tightens its grip on him; who can, or should, he trust? Several lives - including his own - hang in the balance as a harrowing chain of murder, deceit, and retribution reverberates through the darkest corners of both the family and London itself.', filename = '', filesize = NULL, filedate = NULL, audio_codec = '', video_codec = '', video_width = NULL, video_height = NULL, istv = 0, custom1 = '7.9', custom2 = '', custom3 = '', custom4 = '', created = NOW();
[21 Dec 2007 23:39] Andreas Götz
Might this be related to bug 30803 which encounters another problem with the same character?
[21 Dec 2007 23:44] Andreas Götz
A description of the character can be found here: http://www.fileformat.info/info/unicode/char/0092/index.htm
This appears to be a perfectly valid UTF-8 character. Could MySQL be confusing the Unicode character with its ' string delimiter?
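
A minimal Python sketch (not part of the original report) of what the bytes from the error message decode to. It only uses the standard library; the variable names are illustrative. It suggests that \xC2\x92 is indeed valid UTF-8, but that it encodes the control character U+0092 (146 decimal is 0x92), not the right single quotation mark U+2019.

```python
import unicodedata

# Bytes taken from the error message "Incorrect string value: '\xC2\x92s'".
raw = b"\xc2\x92"

# Decoding succeeds, so the byte sequence is well-formed UTF-8.
ch = raw.decode("utf-8")
codepoint = f"U+{ord(ch):04X}"
category = unicodedata.category(ch)  # "Cc" means a control character

# The typographic right single quotation mark is a different code point.
quote = "\u2019"
quote_name = unicodedata.name(quote)
quote_bytes = quote.encode("utf-8")

print(codepoint, category)
print(quote_name, quote_bytes)
```

So the string delimiter is probably not involved: the character is a C1 control code whose number merely matches the Windows-1252 byte for the curly quote.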
[3 Jan 2008 17:20] Susanne Ebrecht
Could you please add a smaller test case here?
[3 Jan 2008 19:21] Andreas Götz
Simple test case:

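<?php
require_once('config.inc.php');

// Decode the numeric entity &#146; to UTF-8 (per this report: bytes \xC2\x92).
$str = html_entity_decode('&#146;', ENT_NOQUOTES, 'UTF-8');
echo $str;

// Connect, select the database and set the connection character set to utf8.
$db_link = mysql_connect($config['db_server'], $config['db_user'], $config['db_password']);
mysql_select_db($config['db_database'], $db_link);
mysql_query("SET NAMES 'utf8'", $db_link);

// Insert the decoded character; this is the statement that fails.
mysql_query("INSERT INTO videodata SET title='".$str."'", $db_link);

// Print the server's error message, if any.
echo mysql_error($db_link);
?>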

I guess I cannot simply type the query in, since the exact character matters...
[7 Jan 2008 12:27] Andreas Götz
Test case in pure SQL (if the bug tracker does not mangle the character set):

set names utf8;
select '’';
[10 Jan 2008 10:46] Andreas Götz
Any chance of someone looking into this? This is a real issue for people working with UTF-8 data.
Please note that the pure SQL test case provided above does _not_ work. I have not been able to create a pure SQL test case.
[21 Jan 2008 11:26] Andreas Götz
Resolving as invalid.

Reason:

With the DB encoding set to LATIN1 and the connection encoding set to UTF8, the characters in question cannot be inserted: there is no LATIN1 equivalent for these Unicode characters, so the conversion fails.
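
The failed conversion can be illustrated with a short Python sketch. It assumes that MySQL's latin1 behaves like Windows-1252 (Python's "cp1252" codec) for the characters involved; the helper name fits_in_cp1252 is hypothetical and not part of MySQL or PHP.

```python
def fits_in_cp1252(text: str) -> bool:
    """Return True if every character of text has a cp1252 equivalent."""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


# U+0092, the character produced from &#146; in this report, has no mapping.
print(fits_in_cp1252("\u0092"))  # False

# U+2019, the real right single quotation mark, maps to the single byte 0x92.
print(fits_in_cp1252("\u2019"))  # True
print("\u2019".encode("cp1252"))  # b'\x92'
```

Under that assumption, a column stored as latin1 could hold the curly quote U+2019 but not U+0092, which matches the conversion failure described in the resolution.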
[23 Jan 2008 14:18] Tonci Grgin
Andreas, this is one of the most common mistakes users make, so don't feel bad. I am not a PHP guy, but I think I saw somewhere that PHP does not support UTF8 at all, which makes this even harder to fathom. As a general rule, in your next reports, attach your my.ini/cnf file, as that will definitely help in resolution. Glad the problem is solved.

Thanks for your interest in MySQL.
[9 Dec 2011 0:07] Arkadiy Kulev
This problem is easy to solve. Remember to set not only the database, table and collation to utf8, BUT THE COLUMNS ALSO!

That's what caused the problem for me. I created the table in latin, then switched to utf8, but forgot to change the columns as well.