Bug #6568 UTF-8 Support Missing/Broken in 4.1.9 (was 4.1.7)
Submitted: 11 Nov 2004 5:42
Modified: 26 Apr 2005 5:52
Reporter: James Barwick
Status: No Feedback
Category: MySQL Server: Compiling
Severity: S3 (Non-critical)
Version: 4.1.9
OS: Linux (SuSE 9.0)
Assigned to:
CPU Architecture: Any

[11 Nov 2004 5:42] James Barwick
Description:
FIRST, I will preface by saying this MAY be my problem.  But I must explain so that we can be SURE it's not...I don't know...

Two Tests:
1) Download of the Binary 4.1.7 release for Linux x86 RPMs
    a) server-4.1.7-0
    b) client-4.1.7-0
    c) libraries and headers-4.1.7-0
    d) dynamic client libraries-4.1.7-0

2) Download of the Source 4.1.7 release tarball

Binary 4.1.7 compiled with default latin1 character set
Source 4.1.7 compiled on SuSE 9.0 with libiconv-1.8 and UTF-8 as default character set

I WILL provide the "configure script" that I wrote to compile the source if needed.

Problems:

Option 1) Binary download

    Compiling PHP against the 4.1.7 precompiled client libraries, ALL UTF-8
    code pulled from the database is corrupt (Japanese code block sections),
    even with default-character-set=utf-8 in the my.cnf [mysql] and [mysqld] sections.
    I assume the binary is compiled against glibc's ICONV library.

Option 2) Source download and compile

    Data extract from the 4.1.7 database with the compiled 4.1.7 client in PHP
    compiled with default-character-set=UTF-8 extracts data perfectly (all ICONV
    code sets use libiconv-1.8 with SJIS/CP932 libiconv patch).

    Error:  A simple statement such as "select user from user
    where user regexp '~{12}$' = 1" throws a STACK TRACE DUMP
    and kills the server...this isn't the client, this is a server crash.

See the previous bugs I reported as "James Barwick"; they include the stack
trace dump for 4.1.5-gamma...I will reproduce the stack trace and symbol list
as required/requested.

Assumption:  REGEXP library dies with UTF-8...only works with latin1.

NOW...THIS SOUNDS LIKE AN INSTALLATION SUPPORT ISSUE?  YES?  I don't think so. I don't need help installing, I just need to know if this is a BUG for UTF-8 in MySQL.  I don't want to have to start using PostgreSQL...I've waited a LONG time for MySQL 4.1...

3 things I need:

1)  I would love to use the binary 4.1.7...but the UTF-8 code doesn't seem to work (PHP clients return garbage)...my compiled versions are OK...wazzup?

2)  Tell me what parameter is needed to get PHP to return proper UTF-8 code that is
not corrupt, and Microsoft CP932 code pages are recognized in ICONV....or...help me figure out how to get glibc to use libiconv instead of internal iconv library...

3)  Give me a HINT on why REGEXP would crash with a compiled default character set of UTF-8....I will change my config parameters.

Note:  A simple statement such as "Select * from table where name like '%name%'" SOMETIMES crashes the server...but not always.  The binary version returns NO RESULTS because of UTF-8 code corruption.

HELP me make MySQL better...HELP me VERIFY that this is not a MySQL bug and it's just me...I LOVE you guys!  Let's make it better (oh, I noticed that the previous bug reports of mine that you said would be fixed in 4.1.7 are NOT fixed in 4.1.7, affecting 64-bit systems only.....just an FYI...also..RAID isn't compiling...but that's my problem)....

How to repeat:
jbarwick@knowledge:~> cat /usr/src/myconfig
make clean
./configure \
  --prefix=/var/lib/mysql \
  --exec-prefix=/usr \
  --bindir=/usr/bin \
  --sbindir=/usr/sbin \
  --libexecdir=/usr/libexec \
  --datadir=/usr/share \
  --sysconfdir=/etc \
  --sharedstatedir=/usr/com \
  --localstatedir=/var/lib/mysql \
  --libdir=/usr/lib \
  --includedir=/usr/include \
  --infodir=/usr/info \
  --mandir=/usr/man \
  --enable-thread-safe-client \
  --enable-local-infile \
  --with-unix-socket-path=/var/lib/mysql/mysql.sock \
  --with-tcp-port=3306 \
  --with-mysqld-user=mysql \
  --without-debug \
  --with-openssl \
  --with-charset=utf8 \
  --with-collation=utf8_general_ci \
  --with-extra-charsets=all \
  --with-vio \
  --with-isam \
  --with-pthreads \
  --without-readline \
  --without-libedit \
  --enable-assembler \
  --with-berkeley-db

#  --with-raid=yes   (disabled: RAID does not compile)

make
[24 Mar 2005 4:43] Jorge del Conde
Hi

I was unable to reproduce this using FC2 and 4.1.11 from bk.
[24 Mar 2005 4:44] Jorge del Conde
James, can you please tell me whether you can reproduce this behaviour using the latest code from our bk tree?

thanks.
[24 Mar 2005 12:38] Sergei Golubchik
The regex crash that you experienced was probably fixed in 4.1.8 (see bug#7111).

Incorrect characters: PHP does not read the [mysql] section in my.cnf.
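
A rough illustration of why a wrong connection character set shows up as garbage. If the PHP connection is still using latin1 while the columns hold utf8 (an assumption; the thread does not confirm the actual connection settings), each result has to be converted to latin1, and Japanese characters have no latin1 equivalent. The Python sketch below only models that lossy conversion; `convert_for_client` is a hypothetical helper, not a MySQL or PHP function.

```python
def convert_for_client(text: str, client_charset: str) -> bytes:
    """Model converting stored text to the client's character set.

    Characters the target charset cannot represent are replaced with '?',
    which is roughly what a lossy utf8 -> latin1 conversion looks like.
    """
    return text.encode(client_charset, errors="replace")


stored = "日本語"  # sample value, assumed to live in a utf8 column

# Connection left at latin1: every character is lost.
print(convert_for_client(stored, "latin-1"))  # b'???'

# Connection set to utf8: the text round-trips cleanly.
print(convert_for_client(stored, "utf-8").decode("utf-8"))  # 日本語
```

In MySQL 4.1 the connection character set can be chosen per session by sending `SET NAMES utf8` right after connecting, which may be the simplest thing to try from PHP since it does not depend on my.cnf.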
[26 Mar 2005 5:52] James Barwick
Couple of follow-ups...We have moved on to the 4.1.9 source and compiled.

We have removed all references to REGEXP in our application (we only had a couple) and are now doing the regular expression matching in the PHP scripts, not the database (we are afraid of mysql now).  Our 4.1.9 environment with PHP 4.3.8 is now running on 8 servers in UTF-8 mode in "production".  We are watching performance carefully...Note: LC_CTYPE and LANG are set to "en_US.UTF8", not "C".  (The SuSE root user configuration has an option to set LC_CTYPE to C; we are not doing that..the root user is UTF8...and subsequently the mysql user which the database runs under has LANG/LC_CTYPE set to UTF8 as well.)
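
A minimal sketch of the workaround described above, written in Python rather than PHP purely for illustration: fetch the rows with a plain query and apply the pattern in the application. The pattern is the one from the original report; `filter_users` and the sample names are hypothetical.

```python
import re

# Same pattern as the REGEXP in the original report: twelve '~' at the end.
TRAILING_TILDES = re.compile(r"~{12}$")


def filter_users(users):
    """Return the user names that the pattern matches."""
    return [u for u in users if TRAILING_TILDES.search(u)]


# Stand-in for the result of "select user from user" (made-up values).
rows = ["root", "tmp" + "~" * 12, "~" * 11]
print(filter_users(rows))
```

The cost is that every candidate row crosses the wire before it is filtered, so this only suits tables small enough to scan from the client.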

my.cnf default charset is utf8 and all table default charsets are UTF8

3 machines are SuSE 9.0, and 5 machines are RedHat 9.0.

We have 3 more test machines running SuSE 9.0, 1 development machine running SuSE 9.0, another development machine running SuSE 9.2, and another development machine running SuSE 9.0 (x86_64...soon to be upgraded to SuSE 9.2-x86_64).

We do not have any servers on a MySQL 4.x release older than 4.1.9.  We do have 3 more servers running RedHat 9.0 with MySQL 3.23 that we are scheduling for upgrade, and we are only upgrading the DB because of UTF8 character problems in the database. (I was dumb 2 years ago and thought that we could deploy enough checks in the application to keep bad UTF8 out of the database...the biggest problem is inserting into short fields with truncation errors...my guys keep forgetting to truncate the data in PHP or in Java)...oh, well...live and learn....we will soon see if MySQL 4.1.9 handles this nicely...hope so...not enough experience just yet...(very difficult to test in the lab...my QA team says the initial tests looked good)...anyway...
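
For the truncation problem mentioned above, a small sketch of cutting a string to a byte limit without splitting a multi-byte character. It assumes the field limit is counted in bytes, as it would be when raw UTF-8 is stored on a server without character set support; `truncate_utf8` is a hypothetical helper, shown in Python rather than PHP or Java.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text so its UTF-8 encoding fits in max_bytes bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input was valid UTF-8, so only the tail of the slice can be an
    # incomplete sequence; errors="ignore" drops those dangling bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


# Each of these characters takes 3 bytes, so a 4-byte limit keeps one.
print(truncate_utf8("日本語", 4))  # 日
```

Cutting the raw byte string at the limit instead would leave the first byte of the second character in the field, which is the kind of bad UTF8 described above.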

Over the next couple of days I will try to follow up on the 4.1.9 source to determine whether REGEXP is still dying and causing a server crash.

Sorry guys...please be patient.
[26 Apr 2005 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".