Bug #39640 ftb_query_add_word does not recognice MYSQL_FTFLAGS_NEED_COPY
Submitted: 25 Sep 2008 4:01 Modified: 3 Oct 2009 6:19
Reporter: Hiroaki Kawai (OCA) Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: FULLTEXT search Severity:S3 (Non-critical)
Version:5.1.26, 5.1.28, 6.0.6 OS:Any
Assigned to: Assigned Account CPU Architecture:Any
Tags: boolean mode, Contribution

[25 Sep 2008 4:01] Hiroaki Kawai
Description:
As of full-text parser param, mysql_add_word() should see MYSQL_FTPARSER_PARAM->flags and check MYSQL_FTFLAGS_NEED_COPY. The function ftb_query_add_word in storage/myisam/ft_boolean_search.c does not have that process.

How to repeat:
1. Write a parser plugin that pass param->mysql_add(param...) with param->flags=MYSQL_FTFLAGS_NEED_COPY (*1)
2. Create a table with a full-text index with that parser plugin
3. Issue a phrase query the table in boolean mode
4. printf debug (watch at ftb_phrase_add_word()).
5. See phrase_word in ftb_phrase_add_word() is overritten by someone 

*1) You can use 
$ svn co -r44 http://mysqlftppc.svn.sourceforge.net/svnroot/mysqlftppc/space/trunk/ space
$ cd space
$ automake --add-missing
$ automake
$ autoconf
$ ./configure --with-mysql-config=/path/to/mysql_config

Suggested fix:
--- mysql-5.1.26-rc/storage/myisam/ft_boolean_search.c.orig     2008-07-01 07:36:31.000000000 +0900
+++ mysql-5.1.26-rc/storage/myisam/ft_boolean_search.c  2008-09-25 12:41:05.006000000 +0900
@@ -192,6 +192,13 @@
   float weight= (float)
         (info->wasign ? nwghts : wghts)[(r>5)?5:((r<-5)?-5:r)];

+  if(word_len>0 && param->flags & MYSQL_FTFLAGS_NEED_COPY){
+    char *ptr;
+    ptr=alloc_root(&ftb_param->ftb->mem_root, word_len);
+    memcpy(ptr,word,word_len);
+    word=ptr;
+  }
+
   switch (info->type) {
     case FT_TOKEN_WORD:
       ftbw= (FTB_WORD *)alloc_root(&ftb_param->ftb->mem_root,
[25 Sep 2008 6:21] Hiroaki Kawai
ftb_phrase_add_word() has the same problem, but I think fixing the problem needs changing the st_my_ftb_phrase_param structure.

typedef struct st_my_ftb_phrase_param
{
  FTB *ftb; /** add this to access to mem_root */
  LIST *phrase;
  LIST *document;
  CHARSET_INFO *cs;
  uint phrase_length;
  uint document_length;
  uint match;
} MY_FTB_PHRASE_PARAM;
[25 Sep 2008 7:52] Sveta Smirnova
Thank you for the report.

Verified as described using code analysis. Your plugin crashes in my environment when trying to insert a record into test table.
[25 Sep 2008 9:55] Hiroaki Kawai
Created a patch for 5.1.28.
https://mysqlftppc.svn.sourceforge.net/svnroot/mysqlftppc/space/trunk/mysql-5.1.28-rc.patc...
[24 Mar 2009 6:53] Masood Mortazavi
Dear "Name Withheld" .... :-)

It is hard, if not next to impossible, to accept contributions from anonymous contributors.

Given you're interested in contribution to MySQL, please note that MySQL is using a new contributor agreement (i.e. the "Sun Contributor Agreement"). 

Before we can handle, review and absorb your code into the main development branches for MySQL or in updates on existing releases, etc., it is required that you sign the "Sun Contributor Agreement". 

Under the Sun Contributor Agreement (SCA), the contributor retains copyrights while also granting those same rights to Sun as the project sponsor. It supersedes the previously used MySQL Contributor License Agreement (CLA).

You can get a copy of the SCA document here:
http://www.sun.com/software/opensource/sca.pdf 

Once you sign and send your copy of the SCA document as described here: 
  http://forge.mysql.com/wiki/Sun_Contributor_Agreement , 
your signatory status will be reflected, along with your interest in contributing to MySQL, here: 
   https://sca.dev.java.net/CA_signatories.htm 

Having a clear SCA status will enable and facilitate the review and interaction regarding your code contributions to MySQL (as well as to any other Sun-managed Open-Source project).

Regards,
- m.
[28 May 2009 11:40] Sergey Vojtovich
What was verified doesn't conform to what was reported. I.e. crash on INSERT vs buffer overwrite on phrase search.

Generally the problem is that sometimes server is ignoring MYSQL_FTFLAGS_NEED_COPY. Let's try to find out when.

The parser may be called by the server in the following situations:

1. When inserting, updating, deleting records, as well as during checking/repairing a table. In all cases it is expected that parsed words are available after st_mysql_ftparser::parse() completes.

This is true for built-in parser, as it is pointing to the original data buffer. Availability of the original data buffer is guaranteed by the server.

If a parser plugin returning words, which are not pointing to the original data buffer, it is best to use MYSQL_FTFLAGS_NEED_COPY and free all relevant buffers before returning control to the server from the st_mysql_ftparser::parse().

Server is releasing memory occupied by copied words as soon as words are not needed any more.

So far so good. This should work.

2. When parsing natural language query. Generally the same as above.

3. When performing query expansion in NLQ mode. Server has already set MYSQL_FTFLAGS_NEED_COPY, so it must work.

4. When parsing boolean mode query. Words are always copied by the server. MYSQL_FTFLAGS_NEED_COPY doesn't make any sense here. So the proposed solution does nothing except for allocating memory for the same word twice.

On the other hand there are still two bug here. When compiling list of words for phrase search, we're pointing to the buffer passed from parser instead of pointing to the buffer that we allocated. Another problem is that buffer for stopwords is not allocated.

5. When checking if phrase is present in data buffer. Expecting words to be available until the end of ::parse() call. If a plugin parser is returning words one by one (::mysql_add_word()) and overwriting word buffer, currently there is no way to make it work.

Support for MYSQL_FTFLAGS_NEED_COPY in this case should be implemented or make it work without this flag.

6. When calculating relevancy in boolean mode. In this case the server is only expecting word buffer to be available during ::mysql_add_word(). It is safe to ignore MYSQL_FTFLAGS_NEED_COPY here.

So far we have problems in p.4 and p.5.
[2 Sep 2009 18:32] Sergei Golubchik
Sergey - for 5 (ftb_phrase_add_word) I think it's better to fix the algorithm to avoid comparing the same word twice, then we won't need to remember old words and would be able to ignore MYSQL_FTFLAGS_NEED_COPY safely.
[2 Oct 2009 23:00] Bugs System
No feedback was provided for this bug for over a month, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
[24 Nov 2009 12:43] Sergei Golubchik
bug#48781 is a duplicate of this one