MySQL Bugs: #68795: Datetime field comparisons do not work properly with utf8_unicode

Bug #68795	Datetime field comparisons do not work properly with utf8_unicode_ci collation
Submitted:	27 Mar 2013 12:22	Modified:	30 Jul 2013 13:47
Reporter:	Ovais Tariq	Email Updates:
Status:	Closed	Impact on me:	None
Category:	MySQL Server: Charsets	Severity:	S3 (Non-critical)
Version:	5.5.30	OS:	Any
Assigned to:		CPU Architecture:	Any

Description:
Server version:

mysql [localhost] {msandbox} (coll_test) > \s
--------------
/work/binaries/mysql-server/5.5.30/bin/mysql  Ver 14.14 Distrib 5.5.30, for linux2.6 (x86_64) using readline 5.1

Connection id:		1
Current database:	coll_test
Current user:		msandbox@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.30 MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/tmp/mysql_sandbox5530.sock
Uptime:			20 min 42 sec

Threads: 1  Questions: 72  Slow queries: 0  Opens: 40  Flush tables: 1  Open tables: 28  Queries per second avg: 0.057

When comparing datetime values to strings, the utf8_unicode_ci collation prevents correct comparisons. 

Suppose we have the following table:
mysql [localhost] {msandbox} (coll_test) > show create table lang\G
*************************** 1. row ***************************
       Table: lang
Create Table: CREATE TABLE `lang` (
  `dt` datetime NOT NULL,
  PRIMARY KEY (`dt`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

And the table contains the following data:
mysql [localhost] {msandbox} (coll_test) > select * from lang;
+---------------------+
| dt                  |
+---------------------+
| 1979-12-09 00:00:00 |
| 1980-11-09 00:00:00 |
| 2012-12-11 12:15:00 |
+---------------------+
3 rows in set (0.00 sec)

Then the results of comparison between the column dt and a date string depends on the collation of the string. 

When the collation is latin1_swedish_ci, the comparison is correctly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using latin1) collate latin1_swedish_ci;
+---------------------+-------------+---------------+
| dt                  | charset(dt) | collation(dt) |
+---------------------+-------------+---------------+
| 1979-12-09 00:00:00 | binary      | binary        |
+---------------------+-------------+---------------+
1 row in set (0.00 sec)

When the collation is utf8_general_ci, the comparison is also correctly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using utf8) collate utf8_general_ci;
+---------------------+-------------+---------------+
| dt                  | charset(dt) | collation(dt) |
+---------------------+-------------+---------------+
| 1979-12-09 00:00:00 | binary      | binary        |
+---------------------+-------------+---------------+
1 row in set (0.00 sec)

But when the collation is utf8_unicode_ci, then the comparison is incorrectly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using utf8) collate utf8_unicode_ci;
Empty set (0.00 sec)

The comparison results should be consistent and should not change as above depending on the collation used.

How to repeat:
create database coll_test default character set = utf8 default collate = utf8_unicode_ci;
use coll_test;

create table lang(dt datetime not null primary key) engine=myisam;
insert into lang values('1980-11-09 00:00:00'), ('1979-12-09 00:00:00'), ('2012-12-11 12:15:00');

set names utf8 collate utf8_general_ci;
select * from lang where dt='1979-12-09'; -- returns 1 row correctly

set names utf8 collate utf8_unicode_ci;
select * from lang where dt='1979-12-09'; -- returns 0 rows

set names latin1 collate latin1_swedish_ci;
select * from lang where dt='1979-12-09'; -- returns 1 row correctly

That exact version that introduced this change is 5.5.3 !

Hello Ovais,

Thank you for the report.
Verified as described.

Thanks,
Umesh

Umesh,

Thanks for verifying it. I am now looking forward to a fix and hopefully it will make into 5.5 as well

Also date and date literal comparison is broken for at least utf8_unicode_ci and utf8_danish_ci collations.

Consider next case, where '20130101' is a valid date literal, it will result in 0:
SET collation_connection = 'utf8_danish_ci'; SELECT CAST('20130101' AS DATE) = '20130101';

Please consider escalating issue's severity because it renders all queries with dates in WHERE condition return invalid results and it affects all Ubuntu releases up from 12.04 LTS.

Noted in 5.5.33, 5.6.13, 5.7.2 changelogs.

Comparison of a DATETIME value and a string did not work correctly
for the utf8_unicode_ci collation.