Bug #68795 Datetime field comparisons do not work properly with utf8_unicode_ci collation
Submitted: 27 Mar 2013 12:22 Modified: 30 Jul 2013 13:47
Reporter: Ovais Tariq Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Charsets Severity:S3 (Non-critical)
Version:5.5.30 OS:Any
Assigned to: CPU Architecture:Any

[27 Mar 2013 12:22] Ovais Tariq
Description:
Server version:

mysql [localhost] {msandbox} (coll_test) > \s
--------------
/work/binaries/mysql-server/5.5.30/bin/mysql  Ver 14.14 Distrib 5.5.30, for linux2.6 (x86_64) using readline 5.1

Connection id:		1
Current database:	coll_test
Current user:		msandbox@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.5.30 MySQL Community Server (GPL)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	utf8
Client characterset:	latin1
Conn.  characterset:	latin1
UNIX socket:		/tmp/mysql_sandbox5530.sock
Uptime:			20 min 42 sec

Threads: 1  Questions: 72  Slow queries: 0  Opens: 40  Flush tables: 1  Open tables: 28  Queries per second avg: 0.057

When comparing datetime values to strings, the utf8_unicode_ci collation prevents correct comparisons. 

Suppose we have the following table:
mysql [localhost] {msandbox} (coll_test) > show create table lang\G
*************************** 1. row ***************************
       Table: lang
Create Table: CREATE TABLE `lang` (
  `dt` datetime NOT NULL,
  PRIMARY KEY (`dt`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row in set (0.00 sec)

And the table contains the following data:
mysql [localhost] {msandbox} (coll_test) > select * from lang;
+---------------------+
| dt                  |
+---------------------+
| 1979-12-09 00:00:00 |
| 1980-11-09 00:00:00 |
| 2012-12-11 12:15:00 |
+---------------------+
3 rows in set (0.00 sec)

Then the results of comparison between the column dt and a date string depends on the collation of the string. 

When the collation is latin1_swedish_ci, the comparison is correctly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using latin1) collate latin1_swedish_ci;
+---------------------+-------------+---------------+
| dt                  | charset(dt) | collation(dt) |
+---------------------+-------------+---------------+
| 1979-12-09 00:00:00 | binary      | binary        |
+---------------------+-------------+---------------+
1 row in set (0.00 sec)

When the collation is utf8_general_ci, the comparison is also correctly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using utf8) collate utf8_general_ci;
+---------------------+-------------+---------------+
| dt                  | charset(dt) | collation(dt) |
+---------------------+-------------+---------------+
| 1979-12-09 00:00:00 | binary      | binary        |
+---------------------+-------------+---------------+
1 row in set (0.00 sec)

But when the collation is utf8_unicode_ci, then the comparison is incorrectly performed:
mysql [localhost] {msandbox} (coll_test) > select dt, charset(dt), collation(dt) from lang where dt = convert('1979-12-09' using utf8) collate utf8_unicode_ci;
Empty set (0.00 sec)

The comparison results should be consistent and should not change as above depending on the collation used.

How to repeat:
create database coll_test default character set = utf8 default collate = utf8_unicode_ci;
use coll_test;

create table lang(dt datetime not null primary key) engine=myisam;
insert into lang values('1980-11-09 00:00:00'), ('1979-12-09 00:00:00'), ('2012-12-11 12:15:00');

set names utf8 collate utf8_general_ci;
select * from lang where dt='1979-12-09'; -- returns 1 row correctly

set names utf8 collate utf8_unicode_ci;
select * from lang where dt='1979-12-09'; -- returns 0 rows

set names latin1 collate latin1_swedish_ci;
select * from lang where dt='1979-12-09'; -- returns 1 row correctly
[28 Mar 2013 10:48] Shane Bester
That exact version that introduced this change is 5.5.3 !
[28 Mar 2013 11:11] Umesh Shastry
Hello Ovais,

Thank you for the report.
Verified as described.

Thanks,
Umesh
[28 Mar 2013 14:18] Ovais Tariq
Umesh,

Thanks for verifying it. I am now looking forward to a fix and hopefully it will make into 5.5 as well
[22 Apr 2013 10:46] Pavel Zubkou
Also date and date literal comparison is broken for at least utf8_unicode_ci and utf8_danish_ci collations.

Consider next case, where '20130101' is a valid date literal, it will result in 0:
SET collation_connection = 'utf8_danish_ci'; SELECT CAST('20130101' AS DATE) = '20130101';

Please consider escalating issue's severity because it renders all queries with dates in WHERE condition return invalid results and it affects all Ubuntu releases up from 12.04 LTS.
[30 Jul 2013 13:47] Paul Dubois
Noted in 5.5.33, 5.6.13, 5.7.2 changelogs.

Comparison of a DATETIME value and a string did not work correctly
for the utf8_unicode_ci collation.