MySQL Bugs: #36283: Some non-alphabetic characters seem broken in sql

Bug #36283	Some non-alphabetic characters seem broken in sql_locale.cc.
Submitted:	23 Apr 2008 10:10	Modified:	16 Nov 2018 11:59
Reporter:	Meiji KIMURA	Email Updates:
Status:	Won't fix	Impact on me:	None
Category:	MySQL Server: Compiling	Severity:	S3 (Non-critical)
Version:	5.0.58	OS:	Windows
Assigned to:		CPU Architecture:	Any

Description:
I built MySQL in the environment described in Bug#36281.
http://bugs.mysql.com/bug.php?id=36281

Non-alphabetic characters appear to be broken.

How to repeat:
I extracted the definition static const char *my_locale_month_names_ar_AE[13] = ... and compiled it.

I set a breakpoint immediately after it and checked it with the debugger; each pointer referred to a broken string.

Suggested fix:
[Original Code]

static const char *my_locale_month_names_ar_AE[13] = 	{"يناير","فبراير","مارس","أبريل","مايو","يونيو","يوليو","أغسطس","سبتمبر","أكتوبر","نوفمبر","ديسمبر", NullS };

If you continue to use char* as if it were void*, you have to add (const char*)L to each string, like this.

static const char *my_locale_month_names_ar_AE[13] = 
	{(const char*)L"يناير",(const char*)L"فبراير",...}

All of the my_locale_month_names_ arrays produce warning C4566.
After this modification, compilation completes without C4566, and each pointer refers to the right string.

[Additional Information]

If the programmer wants to produce UTF-8 strings, my suggestion is wrong.

In the source code, the Arabic characters are stored in UTF-8. For example, the first word is 'D9 8A D9 86 D8 A7 D9 8A D8 B1'. But after compiling, it is stored as '3f 3f 3f 3f 3f 20 3f 3f 3f 3f 3f 3f'. It becomes meaningless binary data.

I think the situation is the same for all strings that trigger C4566.
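
The byte comparison above can be reproduced without a debugger. The following is a minimal sketch, not code from sql_locale.cc: to_hex, looks_replaced and kJanuaryUtf8 are hypothetical names introduced here for illustration. The reference string is written with hex escapes so that its bytes do not depend on how the compiler decodes the source file.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: render each byte of a C string as two lowercase
// hex digits separated by spaces, e.g. "d9 8a".
std::string to_hex(const char *s) {
    std::string out;
    char buf[4];
    for (const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
         *p != 0; ++p) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(*p));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: true if the string contains at least one '?' (0x3f)
// and otherwise only spaces, which is the pattern the report describes
// for the compiled Arabic month names.
bool looks_replaced(const char *s) {
    bool saw_question_mark = false;
    for (; *s != 0; ++s) {
        if (*s == '?') {
            saw_question_mark = true;
        } else if (*s != ' ') {
            return false;
        }
    }
    return saw_question_mark;
}

// The first Arabic month name, spelled as the UTF-8 bytes quoted in the
// report. Hex escapes keep the literal independent of the source encoding.
const char *const kJanuaryUtf8 = "\xD9\x8A\xD9\x86\xD8\xA7\xD9\x8A\xD8\xB1";
```

Calling to_hex(my_locale_month_names_ar_AE[0]) in the affected build and comparing it with to_hex(kJanuaryUtf8) would show whether the compiler kept the UTF-8 bytes or replaced them.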

In a Japanese environment (code page 932), this source code compiles without error C4566.
I looked into this in detail. The Japanese characters are stored as UTF-8 in the source. For example, the first word is '20 31 E6 9C 88'.

static const char *my_locale_month_names_ja_JP[13] = 
 {"1月","2月","3月","4月","5月","6月","7月","8月","9月","10月","11月","12月", NULL };

But after compiling, it is stored in my_locale_month_names_ja_JP[0] in the Japanese local kanji encoding (Shift_JIS): '31 8c 8e'.

I wonder whether Visual C++ 2003 (7.1) or a POSIX compiler can compile and store these strings as UTF-8. Is there any option for handling UTF-8 string resources?

>>In a Japanese environment (code page 932), this source code compiles without error C4566.

In a Japanese environment (code page 932), the arrays of Japanese strings (e.g. my_locale_month_names_ja_JP) do not produce error C4566.

They are implicitly converted to the local code set (Shift_JIS).

I found a workaround for this situation. If I set the OS language setting to 'English_United States.1252', these errors do not occur, and the strings appear to be stored as UTF-8.

I tried to find a way to do this through the compiler or the source code rather than the OS.
But I could not find a compiler parameter for it. I found a way to specify it in the source code, like this:

#pragma locale("English_United States.1252")

But this does not change the error situation.

As a result, we have to change the OS language and code set in a Japanese OS environment.

From http://connect.microsoft.com/VisualStudio/feedback/details/341454/compile-error-with-sourc...

"our suggestion for fixing this issue would be to use a BOM".

So we just need to use UltraEdit or another tool to convert sql_locale.cc from UTF-8 to UTF-8 without BOM. Then we can compile it in VS 2005.

Transform link:
http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html

Sorry, the conversion should be to UTF-8 with BOM.
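
For anyone without such an editor, the conversion amounts to making sure the file starts with the three bytes EF BB BF. The following is a minimal sketch that works on file contents already loaded into memory; kUtf8Bom, has_utf8_bom and with_utf8_bom are hypothetical names, and reading and writing sql_locale.cc itself is left out.

```cpp
#include <string>

// The three-byte UTF-8 byte order mark (EF BB BF).
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: true if the buffer already starts with the BOM.
bool has_utf8_bom(const std::string &bytes) {
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Hypothetical helper: return the buffer with a BOM in front, leaving
// buffers that already carry one unchanged.
std::string with_utf8_bom(const std::string &bytes) {
    return has_utf8_bom(bytes) ? bytes : kUtf8Bom + bytes;
}
```

The helper does not validate that the remaining bytes are well-formed UTF-8; it only handles the marker.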

Posted by developer:
 
The source file in question is being compiled regularly on Windows with VS 2015.
Based on information found in

http://www.nubaria.com/en/blog/?p=289

it would appear that prefixing string literals containing UTF-8 with u8 could be a solution when compiling as C++11 (8.0 and up).
Unfortunately, we are not able to test on Windows/VS using other locales/environment settings, so it is impossible to know whether this fixes the problem, or indeed whether it is still a problem with VS 2015 and later, which is required for 8.0 and up.

Please indicate if this would be desirable.
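
A minimal sketch of what the u8 suggestion could look like for one entry. This is an illustration only and has not been verified on Windows/VS with a non-English locale. january_ar is a hypothetical stand-in for the first element of my_locale_month_names_ar_AE, and spelling the characters as \u escapes is an additional assumption made here so that the result does not depend on how the compiler decodes the source file.

```cpp
#include <cstring>

// Hypothetical stand-in for my_locale_month_names_ar_AE[0].
// The \u escapes name the code points directly, and the u8 prefix asks the
// compiler to encode the literal as UTF-8.
// The reinterpret_cast matters only from C++20 on, where u8 literals have
// type const char8_t[]; in C++11 through C++17 the literal is already
// const char[] and the cast changes nothing.
static const char *const january_ar =
    reinterpret_cast<const char *>(u8"\u064A\u0646\u0627\u064A\u0631");
```

With this form, std::strlen(january_ar) should be 10, matching the ten UTF-8 bytes quoted in the original description.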

Posted by developer:
 
Closing as there has been no feedback.