Bug #7707 Unable to full-text index utf8 column
Submitted: 6 Jan 2005 14:50
Modified: 26 Jan 2005 21:01
Reporter: digesh kapadia
Status: Can't repeat
Category: MySQL Server: MyISAM storage engine
Severity: S3 (Non-critical)
Version: 4.1.7 RPM
OS: Linux (Linux 2.6.5-1.327smp / x86_64)
Assigned to:
CPU Architecture: Any

[6 Jan 2005 14:50] digesh kapadia
Description:
Server Type: Dedicated MySQL Full-text server with Dual Opteron 64 Bit Processor with 16GB of memory.

I am trying to full-text index a utf8 column in a table containing 1 record. It has been 16 hours and the index still has not been created. The data content is the HTML code of a web page.

Some of the variable settings are as follows:

+----------------------------+----------------------------+
| Variable_name              | Value                      |
+----------------------------+----------------------------+
| character_set_client       | latin1                     |
| character_set_connection   | latin1                     |
| character_set_database     | latin1                     |
| character_set_results      | latin1                     |
| character_set_server       | latin1                     |
| character_set_system       | utf8                       |
| character_sets_dir         | /usr/share/mysql/charsets/ |
| collation_connection       | latin1_swedish_ci          |
| collation_database         | latin1_swedish_ci          |
| collation_server           | latin1_swedish_ci          |
| ft_boolean_syntax          | + -><()~*:""&|             |
| ft_max_word_len            | 84                         |
| ft_min_word_len            | 3                          |
| ft_query_expansion_limit   | 20                         |
| ft_stopword_file           | (built-in)                 |
| key_buffer_size            | 4294963200                 |
| key_cache_age_threshold    | 300                        |
| key_cache_block_size       | 1024                       |
| key_cache_division_limit   | 100                        |
| max_allowed_packet         | 33553408                   |
| max_binlog_cache_size      | 18446744073709551615       |
| max_binlog_size            | 1073741824                 |
| max_connect_errors         | 10                         |
| max_connections            | 1024                       |
| max_delayed_threads        | 20                         |
| max_error_count            | 64                         |
| max_heap_table_size        | 16777216                   |
| max_insert_delayed_threads | 20                         |
| max_join_size              | 18446744073709551615       |
| max_length_for_sort_data   | 1024                       |
| max_relay_log_size         | 0                          |
| max_seeks_for_key          | 18446744073709551615       |
| max_sort_length            | 4194304                    |
| max_tmp_tables             | 32                         |
| max_user_connections       | 0                          |
| max_write_lock_count       | 18446744073709551615       |
+----------------------------+----------------------------+

How to repeat:
To test the scenario, I created 2 tables as follows:

CREATE TABLE href_test_non_utf8
(
id int(10) unsigned NOT NULL auto_increment,
content text,
PRIMARY KEY  (id),
FULLTEXT KEY ft_content(content)
)ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=1935228928 AVG_ROW_LENGTH=40000;

CREATE TABLE href_test_utf8
(
id int(10) unsigned NOT NULL auto_increment,
content text CHARACTER SET UTF8,
PRIMARY KEY  (id),
FULLTEXT KEY ft_content(content)
)ENGINE=MyISAM DEFAULT CHARSET=latin1 MAX_ROWS=1935228928 AVG_ROW_LENGTH=40000;

The full-text index on the href_test_non_utf8 table was created within seconds, but the one on href_test_utf8 was not.
[6 Jan 2005 17:34] Hartmut Holzgraefe
Is this supposed to fail on the CREATE statement already, or does data need to be inserted into the tables first?

The CREATE statements worked just fine for me using 4.1.8 on Linux.
[6 Jan 2005 19:01] digesh kapadia
The CREATE statement works fine. The INSERT into href_test_utf8, which contains the utf8 column with the full-text key, stays in the state "Repair by Sorting".
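
The state named above is what the State column of the server's process list reports. A minimal sketch of how it could be observed, assuming a second client connection is opened while the statement is still running:

```sql
-- Run from a second connection while the long-running statement is still executing.
-- The State column shows what each thread is currently doing.
SHOW FULL PROCESSLIST;
```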
[10 Jan 2005 17:51] digesh kapadia
Hartmut,

Were you successful in inserting the attached data file and full-text indexing the utf8 column? Any feedback or comments?

Thanks,
Digesh
[26 Jan 2005 21:01] Jorge del Conde
I was unable to reproduce this behaviour in 4.1.7.

Can you please provide us with the INSERT statement that you used to reproduce this bug?

Thanks!
[27 Jan 2005 12:44] digesh kapadia
You will have to download the attached file, save it as a .txt file, and insert its contents into the href_test_utf8 table.
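
One possible way to do this from the mysql client is sketched below. It is only an illustration: the thread does not show the statement the reporter actually used. The sketch assumes the attachment was saved as /tmp/bug7707.txt (a hypothetical path) on the server host, that the account has the FILE privilege, and that the file is smaller than max_allowed_packet. LOAD_FILE() returns NULL if any of these conditions is not met.

```sql
-- Hypothetical path; the file must be on the server host and readable by the server.
INSERT INTO href_test_utf8 (content)
VALUES (LOAD_FILE('/tmp/bug7707.txt'));

-- Confirm that the row arrived and that content is not NULL.
SELECT id, LENGTH(content) AS content_bytes
FROM href_test_utf8;
```

Note that a TEXT column holds at most 65,535 bytes, so a larger page would not be stored in full by this statement.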