Bug #34921 | comparisons with Information schema tables don't honor collation | ||
---|---|---|---|
Submitted: | 28 Feb 2008 11:16 | Modified: | 7 Sep 2016 19:24 |
Reporter: | Roland Bouman | Email Updates: | |
Status: | Closed | Impact on me: | |
Category: | MySQL Server: Information schema | Severity: | S2 (Serious) |
Version: | 5.1.23/5.1BK | OS: | Linux |
Assigned to: | Assigned Account | CPU Architecture: | Any |
[28 Feb 2008 11:16]
Roland Bouman
[28 Feb 2008 12:25]
MySQL Verification Team
Thank you for the bug report.
[16 Apr 2008 19:44]
Peter Gulutzan
See also Trudy's comments on Bug #25630 "INFORMATION_SCHEMA collation wrong for filesystems with case sensitive filenames" It's true that this won't find 'CITY' SELECT ... WHERE TABLE_NAME = 'City' COLLATE utf8_general_ci; but this will: SELECT ... WHERE TABLE_NAME COLLATE utf8_general_ci = 'City';
[23 Apr 2008 15:27]
Martin Hansson
In fact, comparisons with fields in information_schema.columns should always be be performed non-case-sensitive, unless an explicit COLLATE <case-sensitive collation> is used. The collation coercion rules give the highest priority to explicit COLLATE specifications. After that, a column's collation takes precedence over a string constant. And since the columns in the information_schema.columns table use a non-case-sensitive collation, the comparison uses that collation, unless COLLATE is issued. Consider: select column_name, collation_name from information_schema.columns where table_name='columns'; +--------------------------+-----------------+ | column_name | collation_name | +--------------------------+-----------------+ . . . | TABLE_NAME | utf8_general_ci | . . . Yes, reflexivity is a beatiful thing. :-)
[24 Apr 2008 10:19]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45938 ChangeSet@1.2562, 2008-04-24 12:11:57+02:00, mhansson@riffraff.(none) +3 -0 Bug#34921: comparisons with Information schema tables don't honor collation When searching for tables using information_schema tables, the optimization to answer the query by only opening FRM files failed because a lookup key was created and the file system will generally disregard collations when using this key. Fixed by not creating the lookup key if the bespoke optimization is used.
[24 Apr 2008 10:20]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/45939 ChangeSet@1.2562, 2008-04-24 12:06:13+02:00, mhansson@riffraff.(none) +3 -0 Bug#34921: comparisons with Information schema tables don't honor collation When searching for tables using information_schema tables, the optimization to answer the query by only opening FRM files failed because a lookup key was created and the file system will generally disregard collations when using this key. Fixed by not creating the lookup key if the bespoke optimization is used.
[29 Apr 2008 16:09]
Martin Hansson
The below fix does not require any collations to be changed. Instead, the fix reimplements the optimization in WL#3732 so that we do not use the key to open files. There is still room for improvement, however, since pre-WL#3732 we would scan all databases, not just those matching the search key. The 2 benchmarks below illustrate where the performance gain is. Execution times were measured on an intel quad-core with 8G Ram. Benchmark 1 ----------- Creates 5000 tables in the same database. With this patch 0.022s Without this patch 0.001s Pre-WL#3732 0.028s Benchmark 2 ----------- Creates 2000 databases, 15 tables in each one. With this patch 0.014 Without this patch 0.003 Pre-WL#3732 0.797
[29 Apr 2008 16:12]
Martin Hansson
Benchmark 1
Attachment: benchmark.test (application/octet-stream, text), 269 bytes.
[29 Apr 2008 16:13]
Martin Hansson
Benchmark 2
Attachment: benchmark2.test (application/octet-stream, text), 829 bytes.
[29 Apr 2008 16:15]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/46198 ChangeSet@1.2569, 2008-04-29 18:07:26+02:00, mhansson@riffraff.(none) +3 -0 Bug#34921: comparisons with Information schema tables don't honor collation The bug introduced by WL 3732 is that the file system's open function is used as an index when searching for tables with a given table name and database name. The problem is that we support a diversity of file systems treating e.g. letter case differently. In general the file system cannot be trusted to implement the correct collation. Fixed by using the search key to prune database names from the result set before scanning the databases, as opposed to opening files by name.
[12 May 2008 8:37]
Martin Hansson
Summary of an IRC discussion regarding Bug#34921 on May 7 ------------------------------------------------------------------ Participants: Alexander Barkov (bar), Sergey Gluhov (gluh), Sergei Golubchik (serg), Peter Gulutzan (peter), Martin Hansson (mhansson), Georgi Kodinov (joro) This is a summary of a proposed fix for this bug - "the proposal" , that will also rectify other small problems related to table and database searches in information_schema. Worklog#3732: Information schema optimization introduced an optimization for table and database lookups in information_schema that broke behaviour on some platforms. It actually broke behaviour on all platforms, but the main problem is on platforms with case-sensitive filenames. This broken behaviour is reported as Bug#34921. Wl#3732 notwithstanding, the current proposal is to change the collation, utf8_general_ci, for all file-based SQL objects in information_schema. According to bar, it was wrong to use this collation in the first place since it does not respect accent differences. This should (says bar) have been fixed along with Bug#25630: INFORMATION_SCHEMA collation wrong for filesystems with case sensitive filenames. joro, bar, serg, and gluh all agree that that the I_S should reflect the general handling of identifiers. I.e. when two identifiers are considered syntactically equal on a given platform, the collation for the corresponding column in an information_schema table should treat strings consisting of the identifiers accordingly. Hence we need to have different collations on different platforms. By implementing this, Bug#34921 will disappear. peter's main objection is that this is a major change in behaviour and has to be approved by the server product team. Another objection is that this contradicts the current manual, which says that there should be only one collation inside information_schema. Another point of concern is whether search with an explicitly mentioned collation will still work. The answer is yes, and the reason is that the optimization from WL#3732 will be disabled in this case. It will be slow but correct. We also discussed whether the aforementioned change of collation is actually desired by users and since this is - as peter correctly pointer out - a matter of opinion, we took a vote on it. The veridict was to change the collations, and the verdict was accepted by everyone. "<serg> it's only logical to say that SELECT FROM I_S.TABLES WHERE TABLE_NAME='tablename' should use the same rules as SELECT * FROM tablename" "<peter> serg: asserting 'it is a bug fix' does not address the evidence, which is (a) the older behaviour was accepted in a clear and written decision, (b) the bug complaint is not about the older behaviour but about the newer behaviour, (c) the documentation says somewhere that we have one collation, (d) there was nothing in the 'specification' for the optimization (as far as i can see) that hinted about this change." "<serg> (a) if the bug is caused by a decided behavior, it's still a bug. (b) "newer" = "caused by the optimization" ? yes, I agree. (c) well, it's wrong then. (d) correct, it's an unexpected bug caused by the optimization. which does not mean that it should not be fixed." The following columns would be affected (according to gluh). Columns within parenteses should probably not be changed since we only populate them with NULL. Catalogs and sql_path's are not used in MySQL. SCHEMATA: (CATALOG_NAME) SCHEMA_NAME (SQL_PATH) TABLES, COLUMNS, STATISTICS, VIEWS, hidden table TABLE_NAMES, PARTITIONS: (TABLE_CATALOG) TABLE_SCHEMA TABLE_NAME EVENT: (EVENT_CATALOG) EVENT_SCHEMA PROC: (ROUTINE_CATALOG) ROUTINE_SCHEMA (SQL_PATH) USER_PRIVILEGES: (TABLE_CATALOG) SCHEMA_PRIVILEGES: (TABLE_CATALOG) TABLE_SCHEMA TABLE_PRIVILEGES, COLUMN_PRIVILEGES: (TABLE_CATALOG) TABLE_SCHEMA TABLE_NAME hidden table OPEN_TABLES: Database Table FILES: FILE_NAME TABLESPACE_NAME (TABLE_CATALOG) TABLE_SCHEMA TABLE_NAME PROCESSLIST: DB TABLE_CONSTRAINTS: (CONSTRAINT_CATALOG) CONSTRAINT_SCHEMA TABLE_SCHEMA TABLE_NAME KEY_COLUMN_USAGE: (CONSTRAINT_CATALOG) CONSTRAINT_SCHEMA (TABLE_CATALOG) TABLE_SCHEMA TABLE_NAME REFERENCED_TABLE_SCHEMA REFERENCED_TABLE_NAME REFERENTIAL_CONSTRAINTS: (CONSTRAINT_CATALOG) CONSTRAINT_SCHEMA (UNIQUE_CONSTRAINT_CATALOG) UNIQUE_CONSTRAINT_SCHEMA REFERENCED_TABLE_NAME TRIGGERS: (TRIGGER_CATALOG) TRIGGER_SCHEMA TRIGGER_NAME (EVENT_OBJECT_CATALOG) EVENT_OBJECT_SCHEMA EVENT_OBJECT_TABLE Another argument that was raised by serg is regarding constraints on the information_schema tables. The definition of base table "TABLES" includes: ... TABLE_CATALOG INFORMATION_SCHEMA.SQL_IDENTIFIER, TABLE_SCHEMA INFORMATION_SCHEMA.SQL_IDENTIFIER, TABLE_NAME INFORMATION_SCHEMA.SQL_IDENTIFIER, CONSTRAINT TABLES_PRIMARY_KEY PRIMARY KEY ( TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME ), ... Currently we cannot support this constraint since we can have two different tables on Linux named 't' and 'T'. The current collation for TABLES.TABLE_NAME treats them as equal, and hence we fail to fulfil this constraint. For Windows, this same applies if we have two tables name 'ta' and 'tä'. This may be of minor imortance, however: "<peter> roland_away: it baffles me that anybody cares about this 'implied pk'. it seems to me that it is a reintroduction of the confusion between "what we will look at when deciding if it's a duplicate" and "what we will use when searching information_schema". and our whole, sole, entire purpose is what we will use when searching information_schema, for now." Other databases: oracle will do searches with a case sensitive collation because that's the collation oracle normally uses. microsoft SQL server will do searches with a case insensitive collation because that is the collation that microsoft normally uses. We support both ways depanding on platform, which is the rationale behind the proposal. Subjects that were not addressed in the meeting are: logfile groups and falcon tablespaces. Discussions are unfolding this week per e-mail.
[10 Oct 2008 21:11]
Arjen Lentz
Is this stalled?
[12 Oct 2008 9:32]
Sveta Smirnova
Bug #39894 was marked as duplicate of this one.
[14 Sep 2009 6:29]
Roel Van de Paar
See bug #46771
[1 Oct 2009 7:10]
Sveta Smirnova
Bug #47755 was marked as duplicate of this one.
[21 Jan 2010 18:59]
Paul DuBois
See http://dev.mysql.com/doc/refman/5.1/en/charset-collation-information-schema.html.
[23 Jun 2010 15:15]
Georgi Kodinov
See bug #53095.
[2 Sep 2016 8:57]
Gopal Shankar
With WL#6599, we do proper comparison. See below. set names utf8; create table city (t1 int); select * from information_schema.columns where table_schema = schema() and table_name = 'City'; TABLE_CATALOG TABLE_SCHEMA TABLE_NAME COLUMN_NAME ORDINAL_POSITION COLUMN_DEFAULT IS_NULLABLE DATA_TYPE CHARACTER_MAXIMUM_LENGTH CHARACTER_OCTET_LENGTH NUMERIC_PRECISION NUMERIC_SCALE DATETIME_PRECISION CHARACTER_SET_NAME COLLATION_NAME COLUMN_TYPE COLUMN_KEY EXTRA PRIVILEGES COLUMN_COMMENT GENERATION_EXPRESSION select * from information_schema.columns where table_schema = schema() and table_name = 'City' collate utf8_general_ci; TABLE_CATALOG TABLE_SCHEMA TABLE_NAME COLUMN_NAME ORDINAL_POSITION COLUMN_DEFAULT IS_NULLABLE DATA_TYPE CHARACTER_MAXIMUM_LENGTH CHARACTER_OCTET_LENGTH NUMERIC_PRECISION NUMERIC_SCALE DATETIME_PRECISION CHARACTER_SET_NAME COLLATION_NAME COLUMN_TYPE COLUMN_KEY EXTRA PRIVILEGES COLUMN_COMMENT GENERATION_EXPRESSION def test city t1 1 NULL YES int NULL NULL 10 0 NULL NULL NULL int(11) select,insert,update,references
[7 Sep 2016 7:01]
Ståle Deraas
Posted by developer: Fixed by WL#6599
[7 Sep 2016 19:24]
Paul DuBois
Posted by developer: Noted in 8.0.0 changelog. For queries on INFORMATION_SCHEMA tables, comparisons of schema and table names could be case sensitive or insensitive, depending on the characteristics of the underlying file system and the lower_case_table_names system variable value. Schema and table name comparisons now are case insensitive. For some queries, this may produce results that differ from previous results; to handle this, provide a COLLATE clause specifying a collation that has the desired comparison properties.
[8 Sep 2016 16:35]
Paul DuBois
Revised changelog entry: For queries on INFORMATION_SCHEMA tables, comparisons of schema and table names could be case sensitive or insensitive, depending on the characteristics of the underlying file system and the lower_case_table_names system variable value. Furthermore, it was ineffective to provide a COLLATE clause to change the comparison properties because that clause was ignored. This has been changed so that COLLATE is no longer ignored and can be used to obtain the desired comparison properties.