Bug #41830 Equal query not working for all ANSI letters
Submitted: 2 Jan 2009 15:01 Modified: 13 Jan 2009 0:57
Reporter: Jesper Norgaard Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Query Browser Severity:S3 (Non-critical)
Version:1.2.15 OS:Windows
Assigned to: CPU Architecture:Any

[2 Jan 2009 15:01] Jesper Norgaard
Description:
I am trying to make MySQL recognize all ANSI letters in a query from a table where a field is text (nvarchar(255)), but the equal operator does not distinguish between 'PINA' and 'PIÑA'. The field is encoded as latin1.

To reproduce use the following commands:

CREATE TABLE ANSICHAR (nam nvarchar(255));
INSERT ANSICHAR values ('PINA');
INSERT ANSICHAR values ('PIÑA');
SELECT * FROM ANSICHAR WHERE nam = 'PINA'

The expected result would be a recordset with only the first record, not the second, but both records are returned. It is true that it is possible to get the desired record with the following command

SELECT * FROM ANSICHAR WHERE binary nam = 'PINA'

however that is non-standard SQL behavior in my opinion, since this query would not be possible to submit to any ODBC data source and expect correct results, if the ODBC data source were not an MySQL data source, for instance using Microsoft SQL Server will not recognize the keyword "binary".

Yourse sincerely, Jesper Nørgaard Welen

How to repeat:
To reproduce use the following commands:

CREATE TABLE ANSICHAR (nam nvarchar(255));
INSERT ANSICHAR values ('PINA');
INSERT ANSICHAR values ('PIÑA');
SELECT * FROM ANSICHAR WHERE nam = 'PINA'
[2 Jan 2009 15:17] MySQL Verification Team
Thank you for the bug report. I couldn't repeat either with QB or with mysql client against server 5.1:

c:\dbs>c:\dbs\5.1\bin\mysql -uroot --port=3510 --prompt="mysql 5.1 >"
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.31-nt-log Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql 5.1 >use test
Database changed
mysql 5.1 >CREATE TABLE ANSICHAR (nam nvarchar(255));
Query OK, 0 rows affected (0.06 sec)

mysql 5.1 >INSERT ANSICHAR values ('PINA');
Query OK, 1 row affected (0.05 sec)

mysql 5.1 >INSERT ANSICHAR values ('PIÑA');
Query OK, 1 row affected (0.00 sec)

mysql 5.1 >SELECT * FROM ANSICHAR WHERE nam = 'PINA'
    -> ;
+------+
| nam  |
+------+
| PINA |
+------+
1 row in set (0.05 sec)

mysql 5.1 >

I used QB 1.2.15 please try this version. Thanks in advance.
[2 Jan 2009 15:59] Jesper Norgaard
The bug was repeated with MySQL Query Browser 1.2.15. I will try to upload a screenshot of this fact too.
[2 Jan 2009 16:00] Jesper Norgaard
Screen shot of reported error

Attachment: ANSI.png (image/png, text), 56.14 KiB.

[2 Jan 2009 23:07] MySQL Verification Team
select query output

Attachment: select-query.PNG (image/png, text), 58.33 KiB.

[2 Jan 2009 23:09] MySQL Verification Team
Please try to run the select query alone in the query pane like I did in the screenshot I attached. Thanks in advance.
[2 Jan 2009 23:20] Jesper Norgaard
Yes I tried to run the select query alone in the query pane like you did in the screenshot you attached. However same result for me as before. So your screenshots is proof that it can be done (easily) in MySQL. What is left is to figure out what we have different, for instance default charset or default collection, database charset or collection etc. I doubt that the querying program is at fault, rather this must be the way the database was created, the way the table was created, the way (or environment) the data was loaded, or the INI file for MySQL or something like that. What else could it be?
[2 Jan 2009 23:37] MySQL Verification Team
Thank you for the feedback. I am using the below my.ini file:

c:\dbs>cat 5.1\my.ini
[mysqld]
basedir=c:/dbs/5.1/
datadir=c:/dbs/5.1/data
server-id=51
log-bin=server51
default-character-set=utf8
sql_mode=strict_trans_tables
##log-error=server51.err
port=3510
#port=3500
#enable-shared-memory=1
ssl-ca=C:/dbs/5.0/cacert.pem
ssl-cert=C:/dbs/5.0/server-cert.pem
ssl-key=C:/dbs/5.0/server-key.pem
lower_case_table_names = 2

the QB with its defaults setup.

However, I didn't mentioned you I created the table and inserted the rows with the mysql client, not like you did with multiple commands lines in the query pane.
[2 Jan 2009 23:48] Jesper Norgaard
Quote "However, I didn't mentioned you I created the table and inserted the rows with the mysql client, not like you did with multiple commands lines in the query pane."

Should that make a difference? From my point of view, that would be a bug :)
[3 Jan 2009 11:09] Valeriy Kravchuk
Please, send the results of:

SHOW CREATE TABLE ANSICHAR;
SHOW VARIABLES LIKE 'collation%';

from the Query Browser.
[3 Jan 2009 20:22] Jesper Norgaard
SHOW CREATE TABLE ANSICHAR:
'ANSICHAR', 'CREATE TABLE `ansichar` (
  `nam` varchar(255) character set utf8 default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

SHOW VARIABLES LIKE 'collation%':
'collation_connection', 'utf8_general_ci'
'collation_database', 'latin1_swedish_ci'
'collation_server', 'latin1_swedish_ci'

Already I can see there is a mix of utf8 where I would
have expected everything to be latin1. In my my.ini
I have 

[mysql]
default-character-set=latin1

[mysqld]
# The default character set that will be used when a new schema or table is
# created and no character set is defined
default-character-set=latin1

What I want is simply the full ANSI character set (native Windows
characters from a simple unaltered installation). Still I am wondering
how any combination of character sets can effectively make two letters
that are displayed differently on the screen (e.g. using
SELECT * FROM ANSICHAR) could ever be deemed equal with a command
SELECT * FROM ANSICHAR Where nam = <somestring>.
That is surely counterintuitive. I think the MySQL development team should
fix this. But my problem will be solved if we can find a way to represent
the ANSI character set with all queries showing all letters are different.
[12 Jan 2009 13:14] Susanne Ebrecht
Many thanks for writing a bug report.

This is not a bug. You choosed latin1_swedish_ci as collation and for Swedish the rule is: Ñ is equal to N.

When you want a difference here then please choose a collation from another language where the rule says it isn't equal.
[13 Jan 2009 0:57] Jesper Norgaard
Thank you Susanne Ebrecht

It seems if I create my table with

CREATE TABLE `ansichar` (
  `nam` varchar(255) character set utf8 collate utf8_bin default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

instead of

CREATE TABLE `ansichar` (
  `nam` varchar(255) character set utf8 default NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

... then the table is ready to distinguish correctly between PINA and PIÑA and other types of special letters. So in that case it is not a bug, but a feature of the COLLATION technique that will make N and Ñ equal within the swedish collation, but it certainly is not easy to figure out how you turn this feature off. Thank you for the support.