Bug #54652 Workbench does not use utf8 encoding
Submitted: 21 Jun 2010 7:38 Modified: 21 Jun 2010 13:28
Reporter: Guido Van Hoecke Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Workbench Severity:S3 (Non-critical)
Version:5.2.21 beta OS:Linux (Ubuntu 10.4 amd64)
Assigned to: CPU Architecture:Any
Tags: cli, query browser, workbench

[21 Jun 2010 7:38] Guido Van Hoecke
Description:
(Environment: Ubuntu 10.4; Workbench 5.2.21 beta; mysql 5.1)

I have a mysql database with default utf8 setting:
mysql> show create database utf8test;
CREATE DATABASE `utf8test` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */

In this database, I have one single table:
mysql> show create table leeslijst;
CREATE TABLE `leeslijst` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`author` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`published` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`read_date` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`dutch_title` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`score` varchar(64) COLLATE utf8_unicode_ci NOT NULL,
`comment` varchar(1024) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=214 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

I have loaded this table with utf8 encoded data. When I query it from within an xterm, I get the utf8 results:

mysql> select author, dutch_title from leeslijst where id = 1 or id =5;
+-----------------+------------------------------------+
| author | dutch_title |
+-----------------+------------------------------------+
| Le Carré, John | Telefoon voor de dode              |
| Le Carré, John | De naïeve en sentimentele minnaar |
+-----------------+------------------------------------+

When I query the same table with Mysql Workbench, I get garbled data:
'Le Carré, John', 'Telefoon voor de dode'
'Le Carré, John', 'De naïeve en sentimentele minnaar'

Preceding the select with "set names 'utf8';" does not help.

I thought that WB would obey the character set specification of the
database. Apparently it doesn't.

My locale is fully utf-8:
~ $ locale
LANG=en_US.utf8
LANGUAGE=en_US:en
LC_CTYPE="en_US.utf8"
LC_NUMERIC=nl_BE.UTF-8
LC_TIME=nl_BE.UTF-8
LC_COLLATE="en_US.utf8"
LC_MONETARY=nl_BE.UTF-8
LC_MESSAGES="en_US.utf8"
LC_PAPER=nl_BE.UTF-8
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE=nl_BE.UTF-8
LC_MEASUREMENT=nl_BE.UTF-8
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=
~ $

How to repeat:
The above database can be created with:
create database utf8test character set utf8 collate utf8_unicode_ci;

The above table and its content can be created with following sql file:
-- use vanhoecke;
use utf8test;
drop table if exists leeslijst;
create table leeslijst (
id int not null auto_increment,
author varchar(256) not null,
published varchar(4) not null,
read_date varchar(10) not null,
dutch_title varchar(256) not null,
title varchar(256) not null,
score varchar(64) not null,
comment varchar(1024) not null,
primary key (id));
-- character set utf8 collate utf8_unicode_ci;
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1961","Le Carré, John","Telefoon voor de dode", "Call for the Dead","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1962","Le Carré, John","", "A Murder of Quality","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1965","Le Carré, John","Spion verspeeld", "The Looking Glass War","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1968","Le Carré, John","", "A Small Town in Germany","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1971","Le Carré, John","De naïeve en sentimentele minnaar", "The Naïve and Sentimental Lover","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1974","Le Carré, John","Edelman, bedelman, schutter, spion", "Tinker, Tailor, Soldier, Spy","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1977","Le Carré, John","Spion van nobel bloed", "The Honourable Schoolboy","","");
insert into leeslijst (read_date, published, author, dutch_title, title, score, comment) values ("","1979","Le Carré, John","Smiley''s prooi", "Smiley''s People","","");

Suggested fix:
I expect output to show the proper encoding
[21 Jun 2010 8:25] Guido Van Hoecke
It may be relevant that MySQL Query Browser v1.2.12 shows the same problem.
[21 Jun 2010 8:51] Mike Lischke
Works fine on Windows, fails on OS X.
[21 Jun 2010 10:20] Mike Lischke
WB on OS X also works fine, provided the data is properly inserted. When you use WB to create and fill the table then the data is correct. When you use the cli and SOURCE <file> that is not the case (which indicates that the data was not correctly imported). The table data inserted by WB is shown incorrectly in the CLI until you use "set name utf8" 

We get the same behavior on Ubuntu, so the entire issue seems rather to be a CLI problem.

How did you actually import your data? When you "set names utf8" prior to loading the script then the data is also correct in WB afterwards.
[21 Jun 2010 10:21] Mike Lischke
btw. please use the latest release when reporting a bug as it is well possible that it has been fixed already.
[21 Jun 2010 11:51] Guido Van Hoecke
Indeed, when creating and filling the table with wb, it is correct. The cli then requires a 'set names utf8' before showing it correctly. I had never suspected the cli, so did not try it that way. My fault.

Anyway, as far as I am concerned, this closes this ticket. As long as I am aware about this cli problem, I can work around it.

Thank you guys, for your time and effort,
[21 Jun 2010 13:28] MySQL Verification Team
Closing according last comment.