Bug #96449 scripts/sql_commands_help_data.h: clang-8 reports broken source encoding
Submitted: 7 Aug 2019 12:02
Modified: 16 Aug 2019 17:47
Reporter: Przemysław Skibiński (OCA)
Status: Closed
Category: MySQL Server: Compiling
Severity: S3 (Non-critical)
Version: 5.7.27
OS: Any
Assigned to:
CPU Architecture: Any

[7 Aug 2019 12:02] Przemysław Skibiński
Description:
clang-8 reported the following issue:

In file included from /data/mysql-server/5.7/sql/sql_initialize.cc:30:
/data/mysql-server/5.7-debug-clang8-tokudb/include/../scripts/sql_commands_help_data.h:316:121: error: 
      illegal character encoding in string literal [-Werror,-Winvalid-source-encoding]
  ...[COLLATE collation_name]\\n\\nA TEXT column with a maximum length of 16,777,215 (224 <E2><88>"
                                                                                          ^~~~~~~~
/data/mysql-server/5.7-debug-clang8/include/../scripts/sql_commands_help_data.h:317:4: error: 
      illegal character encoding in string literal [-Werror,-Winvalid-source-encoding]
  "<92> 1)\\ncharacters. The effective maximum length is less if the value contains\\nmultibyte ...
   ^~~~
2 errors generated.

Clang-8 is right: a UTF-8 encoded character (3 bytes in this case: E2 88 92) must not be split across two separate string literals, because each literal is then left holding an incomplete byte sequence.

How to repeat:
CC=clang-8 CXX=clang++-8 cmake -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON -DBUILD_CONFIG=mysql_release -DFEATURE_SET=community -DENABLE_DTRACE=OFF -DENABLE_DOWNLOADS=1 -DDOWNLOAD_BOOST=1 -DWITH_BOOST=../deps -DWITH_SSL=system

Suggested fix:
diff --git a/scripts/comp_sql.c b/scripts/comp_sql.c
index cf1a582d4b5..df458274904 100644
--- a/scripts/comp_sql.c
+++ b/scripts/comp_sql.c
@@ -79,7 +79,8 @@ static void print_query(FILE *out, const char *query)
   fprintf(out, "\"");
   while (*ptr)
   {
-    if (column >= 120)
+    /* utf-8 encoded characters are always < 0 (or >= 0x80 for unsigned) */
+    if ((column >= 120) && (*ptr >= 0))
     {
       /* Wrap to the next line, tabulated. */
       fprintf(out, "\"\n  \"");
[7 Aug 2019 12:52] MySQL Verification Team
Hello Mr. Skibinski,

Thank you for your bug report.

I have analysed it and concluded that you are correct.

Verified as reported.
[16 Aug 2019 17:47] Paul DuBois
Posted by developer:
 
Fixed in 5.7.28, 8.0.18.

When generating C source from SQL scripts, some UTF-8-encoded
characters were split across lines. Thanks to Przemysław Skibiński
for the patch.