Bug #114152 Invalid mysqldump file generated for some views with collate statements
Submitted: 28 Feb 2024 7:53 Modified: 29 Feb 2024 1:11
Reporter: Timothy Evetts Email Updates:
Status: Can't repeat Impact on me:
None 
Category:MySQL Server: Command-line Clients Severity:S2 (Serious)
Version:mysqldump Ver 8.0.36-0ubuntu0.22.04.1 OS:Ubuntu (Pop!_OS 22.04 LTS)
Assigned to: CPU Architecture:x86 (Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz)
Tags: collate, mysqldump, reference, Views

[28 Feb 2024 7:53] Timothy Evetts
Description:
When a database:

a) Contains two views
b) One of those views (the referrer) references an alphabetically subsequent view
c) The referrer view contains a COLLATE statement when referencing the other view

Then mysqldump creates an invalid SQL dump that results in an error relating to the latin1 charset, even when all character set variables are set to utf8mb4:

ERROR 1253 (42000) at line 78: COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'.

This occurs due to the "Temporary view structure" output by mysqldump using "SELECT 1" for values, which appears to be incompatible with the COLLATE statement in a view that references it.

How to repeat:
Please ignore the somewhat nonsensical database structure, as this is merely a simplification of a production structure intended to reproduce the error.

1. Create and select a database:

CREATE DATABASE `collation_test`;

USE collation_test;

2. Create a table:

CREATE TABLE `foo` (
  `bar` varchar(10) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

3. Create a view that references the table:

CREATE ALGORITHM = UNDEFINED DEFINER =`root`@`%` SQL SECURITY DEFINER VIEW `view_b` AS
SELECT `foo`.`bar` AS `foobar`
FROM `foo`;

4. Create a view that references the previous view but occurs alphabetically before it and that uses a COLLATE statement (obviously unnecessary in this instance, but necessary in the more complex production structure):

CREATE ALGORITHM = UNDEFINED DEFINER =`root`@`%` SQL SECURITY DEFINER VIEW `view_a` AS
SELECT `foo`.`bar` AS `foobar`
FROM `foo` JOIN `view_b` ON (`foo`.`bar` = `view_b`.`foobar` COLLATE utf8mb4_unicode_ci);

5. Check all character set variables:

SHOW VARIABLES LIKE '%char%';

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

6. Check all collation variables:

SHOW VARIABLES LIKE '%coll%';

+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database   | utf8mb4_general_ci |
| collation_server     | utf8mb4_general_ci |
+----------------------+--------------------+

7. Use mysqldump to create a dump file (the test.cnf file just contains authentication details):

mysqldump --defaults-extra-file='test.cnf' --default-character-set=utf8mb4 collation_test > collation_test.sql

8. Attempt to import the dump:

mysql --defaults-extra-file='test.cnf' --default-character-set=utf8mb4 collation_test < collation_test.sql

9. An error occurs:

ERROR 1253 (42000) at line 78: COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'

Suggested fix:
The resultant collation_test.sql dump file contains (in order):

1. Table structure for table `foo`
2. Dumping data for table `foo`
3. Temporary view structure for view `view_a`
4. Temporary view structure for view `view_b`
5. Final view structure for view `view_a`
6. Final view structure for view `view_b`

The problem arises because at the time the final view structure for view_a is created (#5), view_b is only a temporary view (#4) defined as:

CREATE VIEW `view_b` AS SELECT 
 1 AS `foobar`

The selected integer 1 is then referenced by view_a as "`view_b`.`foobar`" (#5):

VIEW `view_a` AS select `foo`.`bar` AS `foobar` from (`foo` join `view_b` on(`foo`.`bar` = `view_b`.`foobar` collate utf8mb4_unicode_ci))

The "collate utf8mb4_unicode_ci" appears to be incompatible with the integer 1 being selected by the temporary view_b.

The workaround is to manually edit the dump file and rearrange the final view structures to define view_b before view_a, which defeats the purpose of having placeholder views (that is, to resolve such dependency problems).

A possible fix to this is for mysqldump to "SELECT ''" for text columns rather than "SELECT 1".  My testing found that this works.  Some searching also indicated that in past versions of mysqldump, tables were used as temporary placeholders for views and were defined with the same column types as were expected in the view.  This may have been a better solution than the current method of a temporary view that selects invalid (in a narrow set of circumstances) integer values.
[28 Feb 2024 11:58] MySQL Verification Team
HI Mr. Evetts,

Thank you so much for your bug report.

However, we were not able to repeat it. 

Here it is what we have got:

mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------------------------------------------------+
| Variable_name            | Value                                                                     |
+--------------------------+---------------------------------------------------------------------------+
| character_set_client     | utf8mb4                                                                   |
| character_set_connection | utf8mb4                                                                   |
| character_set_database   | utf8mb4                                                                   |
| character_set_filesystem | binary                                                                    |
| character_set_results    | utf8mb4                                                                   |
| character_set_server     | utf8mb4                                                                   |
| character_set_system     | utf8mb3                                                                   |
| character_sets_dir       |  .....                                                                             |
+--------------------------+---------------------------------------------------------------------------+
8 rows in set (0.05 sec)

mysql> SHOW VARIABLES LIKE '%coll%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.session_variables WHERE VARIABLE_NAME IN (   'character_set_client', 'character_set_connection',   'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+--------------------+
| VARIABLE_NAME            | VARIABLE_VALUE     |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_results    | utf8mb4            |
| collation_connection     | utf8mb4_0900_ai_ci |
+--------------------------+--------------------+
4 rows in set (0.01 sec)

mysqldump --defaults-extra-file='test.cnf'  --default-character-set=utf8mb4 test > collation_test.sql
OK

mysql --defaults-extra-file='test.cnf'  --default-character-set=utf8mb4 test < collation_test.sql
OK

Then, a check once again:

mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+---------------------------------------------------------------------------+
| Variable_name            | Value                                                                     |
+--------------------------+---------------------------------------------------------------------------+
| character_set_client     | utf8mb4                                                                   |
| character_set_connection | utf8mb4                                                                   |
| character_set_database   | utf8mb4                                                                   |
| character_set_filesystem | binary                                                                    |
| character_set_results    | utf8mb4                                                                   |
| character_set_server     | utf8mb4                                                                   |
| character_set_system     | utf8mb3                                                                   |
| character_sets_dir       | ...  |
+--------------------------+---------------------------------------------------------------------------+
8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE '%coll%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)

mysql> SELECT * FROM performance_schema.session_variables WHERE VARIABLE_NAME IN (   'character_set_client', 'character_set_connection',   'character_set_results', 'collation_connection' ) ORDER BY VARIABLE_NAME;
+--------------------------+--------------------+
| VARIABLE_NAME            | VARIABLE_VALUE     |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_results    | utf8mb4            |
| collation_connection     | utf8mb4_0900_ai_ci |
+--------------------------+--------------------+
4 rows in set (0.00 sec)

mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| foo            |
| view_a         |
| view_b         |
+----------------+
3 rows in set (0.00 sec)

Table foo and views view_a and view_b were exactly the same as they were created.

Let us know if this has helped you ....... 

Please, especially try the above query from the Performance Schema.
[29 Feb 2024 1:11] Timothy Evetts
Thank you for attempting to replicate it.  The fact that it worked for you hinted to me there must be something different about my setup.

Digging further I realised it's because even though I'm running mysql-client locally and therefore using the MySQL version of mysqldump, the server I was connecting to is actually running MariaDB.  I have now re-tested importing the generated dump file into an actual MySQL server and it works as expected.

This is obviously a bug with MariaDB server and not with mysqldump.
[29 Feb 2024 10:50] MySQL Verification Team
Hi Mr. Evetts,

Thank you for your feedback.

We suppose that the company that makes the server that you are using has it's own site for bug reporting.