Bug #56847 Greek characters limited to 192 to 201
Submitted: 17 Sep 2010 17:50 Modified: 1 Oct 2010 14:07
Reporter: Rick Good Email Updates:
Status: Duplicate Impact on me:
None 
Category:MySQL Server: Charsets Severity:S1 (Critical)
Version:5.0.45 OS:Any
Assigned to: CPU Architecture:Any

[17 Sep 2010 17:50] Rick Good
Description:
The problem, briefly, is if you have more than about 200 characters in a column in Greek in a table - the field being utf8-general, then characters greater then about 200 are cut off. Here's some simple code.

--------------- 
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
</head>
<body>
<% 

Response.CharSet = "utf-8" 
Response.CodePage = 1252 

strConnString = "Driver={MySQL ODBC 3.51 Driver}; SERVER=localhost;" & "charset=UTF8; DATABASE=????; UID=?????; PASSWORD= ??????; OPTION=3" 

'Create the connection object 
Set objConn = Server.CreateObject("ADODB.Connection") 
objConn.Open strConnString 

'Create the recordset object 
Set objRec = Server.CreateObject("ADODB.RecordSet")
'Build the Demographic_Answers SQL string 
strSQL = "Select PIP_Statements FROM el_Statements WHERE ID_Statements = 37" 

' Open the recordset using the query built 
objRec.Open strSQL, objConn 

Response.Write("Statements is " & objRec("PIP_Statements") & "<br><br>") 

%> 

</body> 
</html>
<% 
objConn.Close 
Set objRec = Nothing 
Set objConn = Nothing 
%>
---------------------

Here's the column:

Σχεδóν σε óλα óσα κάνω, το φυσικότερο πράγμα για μένα είναι να αναπτυχθεί μία ρουτίνα ή προσωπική τελετουργία-ένα κανονικό σχέδιο δράσης που λειτσμργεί αποτελεσματικά για μένα, και που επιθυμώ να επαναλάβω.

And here's the result when I run the above code:

Σχεδόν σε όλα όσα κάνω, το φυσικότερο πράγμα για μένα είναι να αναπτυχθεί μία ρουτίνα ή προσωπική τελετουργία-ένα κανονικό σχέδιο δράσης που λειτουργεί αποτελεσματικά για μένα, και που επιθυμώ

You will see the "να επαναλάβω." is cut off - and that's happening on each column which is larger than 192 to 201 characters - but ONLY IF THE CHARACTERS ARE GREEK.

I have put 300+ characters in the same field and the above code works fine. I have also rewritten the field data in Greek with the same problem. Additionally my ISP has confirmed the error - and confirmed the data is good by using PHP. Additionally I have attempted to use the above code with FireFox, Safari, and Crome - all failed. The above code works fine with Danish, Dutch, English, Farsi, and Russian.

Rick

How to repeat:
Create a table with a varchar column which is utf8-generic and allows for, say, 300 characters.

Put more than 200 Greek characters into the column.

Attempt to view the complete column from within ASP.

Suggested fix:
Here is some additional information from my ISP:

Hi Rick,

I spent some time and have investigated this issue up. Duplicated and tested with different PHP, MySQL, ASP and ODBC settings. I installed the latest version of MySQL ODBC 5.1 driver and checked through it. It also showed the limited characters and an unknown character ('�') sitting at the end of line and replacing those extra characters that exceeded the limit.  The URL of a test page using ODBC 5.1 connector is http://shiesl.com/lptest/lptest.asp. After performing multiple tests, I came to the conclusion that its a bug in MySQL ODBC Connector Drivers since there is no other source of error other than this one. Years earlier, I witnessed a similar bug in connector driver 3.51 that got fixed by upgrading the driver, but not this one. Likely, MySQL drivers team is not aware about the issue unless they they are made! So, I'll be opening up a bug report with MySQL, if you won't mind. Meanwhile the only alternative on current shared hosting would be to switch to PHP. PHP5 URL here is working fine http://shiesl.com/lptest/lptest.php. 

Hope this helps saying the least! 

Good day!

Thanks for your email, by the way!

Cheers.

Abbas Khan.
MCTS, MCITP, RHCE.
System Administration Team.
Lunarpages.
[17 Sep 2010 19:09] Sveta Smirnova
Thank you for the report.

Please provide output of SHOW CREATE TABLE table_you_trying_to_insert_greek_characters and exact version of MySQL server which you use.
[17 Sep 2010 19:41] Rick Good
CREATE TABLE `el_statements` (\n `ID_Statements` int(6) NOT NULL,\n `PIP_Statements` varchar(355) character set utf8 default NULL,\n `Misc_Statements` varchar(255) character set utf8 default NULL,\n `Trait_Statements` varchar(255) character set utf8 default NULL,\n `Aspiration_Statements` varchar(255) character set utf8 default NULL,\n PRIMARY KEY (`ID_Statements`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8
COLLA TE=utf8_unicode_ci

MySQL client version: 5.0.22

Server version: 5.0.45-community-nt

phpMyAdmin - 2.11.6

Protocol version: 10
[17 Sep 2010 19:46] Sveta Smirnova
Thank you for the feedback.

Version 5.0.45 is very old and many bugs were fixed since. Please upgrade to version 5.0.91, try with it and inform if problem still exists.
[17 Sep 2010 19:56] Rick Good
Okay. I'll ask my ISP to upgrade and we'll see what happens. Thanks. Rick
[19 Sep 2010 13:08] Rick Good
Foolishness. The response from my ISP was as follows:

================

Hello Rick,

Since you are under a shared server where multiple  other customer's accounts are currently hosted at, upgrading the mysql connector version in the  server where the component resource is being shared by multiple accounts may bring up further risks with other accounts hosted on the same server as well. As we have gone through several tests and came up into conclusion that the issue may not related to the version of the mysql server verion being used since you're still able to view the contents through phpmyadmin but not when you try to query it from your application, there will be some time required before the latest stable mysql connector is released to replace the current one. As suggested by my colleague, there probably some other alternatives to this which you can evaluate.

-- 
Best Regards,

Zafriyal Afril
windows@lunarpages.com

============

I have written to them and pointed out their foolishness - and the security fixes in the newer version. I'm waiting for a reply. I do remember that they tried the same thing with 5.1 and had the same problem. That was in the note from them I sent you.

Suggestions?

Rick
[20 Sep 2010 11:10] Sveta Smirnova
Thank you for the feedback.

Unfortunately we can not help in this case: problem is not repeatable on my side and we don't backport bug fixes anyway. Please update the report when you are able to provide repeatable test case for current version.
[22 Sep 2010 16:07] Rick Good
I just got this from my ISP. Does it help?

[20 Sep 16:28] Bassu Bassu
Description:
Tables having greek characters are not getting fully displayed and some of characters
from end of each row are being cut off and replaced by a grumbled character ('�') when
fetching data from a db using standard odbc connector.

Some of the facts:

- Tables/fields have proper encoding (utf8), type (varchar) and char limit set.
- Fetching characters of other languages works just fine from the same table.
- ODBC connectors 3.51 and latest 5.1, both have same behaviour.
- ODBC connection encoding is set to 'utf8' of course!
- Setting odbc advanced connection properties options like 'allow big chars' and 'display
full results' also don't make it work.  

Why I think its a bug in odbc connector:

- Testing with odbc-less connection and PHP's mysql_connect, full line shows up
displaying all greek characters. Encoding is set with 'SET NAMES' here.
- Testing with classic ASP and odbc cuts off last few characters. Encoding is set with
Response.CharSet.
- Testing with odbc and PHP's odbc_connect does the same and cuts off last few characters
of selection. 

How to repeat:
1. Create a table having column of type varchar and collation utf8.
2. Insert this line of greek characters into it

Σχεδóν σε óλα óσα κάνω, το φυσικότερο πράγμα για
μένα είναι να αναπτυχθεί μία ρουτίνα ή προσωπική
τελετουργία-ένα κανονικό σχέδιο δράσης που
λειτσμργεί αποτελεσματικά για μένα, και που
επιθυμώ να επαναλάβω.

3. Do a SELECT using odbc connector either with PHP or with classic ASP and you'll see
that last few characters are missing from the full row. In this case the characters
missing are:
"να επαναλάβω"
[21 Sep 7:52] Tonci Grgin
Hi Bassu and thanks for your report.

There is no test case attached so I'm asking you to:
  o Attach small but complete test case demonstrating your problem.
  o Be sure to include connection string or complete description of DNS (type,
options...)
  o Start mysql.exe command line client and provide me with output of SHOW VARIABLES LIKE
"%char%"; statement.

Only after this I will be able to check on problem reported.
[22 Sep 8:50] Bassu Bassu
Screenshot demonstrating the problem

Attachment: screenshot1.png (image/png, text), 123.74 KiB.
[22 Sep 8:52] Bassu Bassu
Just 1K classic asp file using odbc driver to show the problem

Attachment: test.asp (application/octet-stream, text), 659 bytes.
[22 Sep 8:54] Bassu Bassu
Using odbc with PHP, same problem

Attachment: test2.php (application/x-httpd-php, text), 340 bytes.
[22 Sep 9:13] Bassu Bassu
Hi Tonci,

Thanks for replying back.
I believe with the screenshot attached to this bug now, demonstrates actual problem.
Sample test code including connection strings is attached as well.

This is output of charset variables you were looking for.

mysql> SHOW VARIABLES LIKE '%char%';
+--------------------------+------------------------------------------------------------------+
| Variable_name            | Value                                                       
    |
+--------------------------+------------------------------------------------------------------+
| character_set_client     | utf8                                                        
    |
| character_set_connection | utf8                                                        
    |
| character_set_database   | utf8                                                        
    |
| character_set_filesystem | binary                                                      
    |
| character_set_results    | utf8                                                        
    |
| character_set_server     | latin1                                                      
    |
| character_set_system     | utf8                                                        
    |
| character_sets_dir       | ...\MySQL\share\charsets\
+--------------------------+------------------------------------------------------------------+
8 rows in set (0.02 sec)
[22 Sep 2010 17:15] Sveta Smirnova
Thank you for the feedback.

So this is duplicate of bug #56872
[22 Sep 2010 18:06] Rick Good
It looks that way. The main difference seems to be that bug #56872 is marked as "non-critical" and, since this is stopping me from completing the Greek part of a website, I consider it a "Critical" bug.
[25 Sep 2010 22:24] Rick Good
Here is a major hint as to the reason for the problem. It's based on the number of varchars set in MySQL. This varies a little, but here is what I found:

        Number of Varchar characters             Number of characters downloaded

                      255                                                     139
                      355                                                     191
                      400                                                     216
                      450                                                     244
                      500                                                     251
                      600                                                     327
                      700                                                     382
                      750                                                     409
                      800                                                     433+

Thus there seems to be a direct correction. Hope this helps as nothing seems to be happening.

Rick
[1 Oct 2010 1:13] Rick Good
This is worse than I thought. I just tried a Russian statement:

Я – “лицо-организационное” – скорее, “лицо ре-организационное” – чья доминирующая сила выражается в проектировании, построении или эффективном управлении всем множеством деталей новейшего вида организаций (больших или малых), необходимых для того, чтобы помогать человечеству решить ее многие проблемы – будь они юридического, финансового, социального, религиозного или правительственного характера, итд.

It has 404 characters. The results were:

Number of Characters Displayed          Number of Characters Shown

            450                                                     239

            600                                                     299

            700                                                     379

            800                                                     404+

Not good. Not good at all, but at least it's easy to replicate.

Rick
[1 Oct 2010 14:07] Rick Good
And here's a statement in Farsi which is doing the same thing:

 من یک فرد "وابسته به قوانین تشکیلاتی و سازمانی" هستم. در حقیقت من فردی مناسب برای " سازماندهی مجدد قوانین موجود در یک تشکیلات و سازمان " نیز  هستم- کسی که کاملا مسلط بر طراحی,ایجاد و یا اداره  سازماندهی جدیدی است که با هدف کمک کردن به بشریت برای حل مسائل فراوانش بکار برود- این سازماندهی جدید ممکن است مربوط به مشکلات حقوقی , مالی, اجتماعی و مذهبی باشد یا مربوط به مسائل حکومتی و غیره.

(It should be right-to-left.)
[6 Oct 2010 10:20] Lawrenty Novitsky
Rick, I guess you are more interested in solving this problem. Please see the requests we did on the Buh#56872 page. and please share your findings there as this bug is duplicate and thus is not monitored(also because it is filed in wrong category).
Basically - we would like you or bassu to test with latest server version.
btw now, when you see problem with all languages - it makes more sense.