Description:
Stop words appear to be handled incorrectly inside subexpressions in full-text boolean mode. The How to repeat section below gives four examples based on the boolean query "+history +of +exposure". Since "of" is a noise word, it should be ignored. It is ignored in the simple expression, but when it is placed in a subexpression, as in "+history +(of) +exposure", it appears to be treated as required (per the +) even though it does not exist in the full-text index. Hence, no rows are returned.
When a non-noise word is placed in a subexpression, it is handled properly.
How to repeat:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 64 to server version: 4.1.10-nt
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> # Baseline, without noise word "of"
mysql> select distinct ConceptID, String
-> from ConceptSynonym
-> where match (String) against ("+history +exposure" in boolean mode);
+-----------+------------------------------------------------------------------------------------+
| ConceptID | String |
+-----------+------------------------------------------------------------------------------------+
| 28165 | Personal history of exposure to nitrogen mustard compounds |
| 375812 | History of exposure to asbestos |
| 375813 | History of exposure to potentially hazardous body fluids |
| 375814 | History of exposure to lead |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 489535 | HISTORY OF INDUSTRIAL EXPOSURE |
| 489537 | HISTORY OF OCCUPATIONAL EXPOSURE |
| 375812 | Personal history of asbestos exposure |
| 28165 | Personal history of mustard gas exposure |
+-----------+------------------------------------------------------------------------------------+
16 rows in set (0.03 sec)
mysql>
mysql> # With noise word. Works
mysql> select distinct ConceptID, String
-> from ConceptSynonym
-> where match (String) against ("+history +of +exposure" in boolean mode);
+-----------+------------------------------------------------------------------------------------+
| ConceptID | String |
+-----------+------------------------------------------------------------------------------------+
| 28165 | Personal history of exposure to nitrogen mustard compounds |
| 375812 | History of exposure to asbestos |
| 375813 | History of exposure to potentially hazardous body fluids |
| 375814 | History of exposure to lead |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 489535 | HISTORY OF INDUSTRIAL EXPOSURE |
| 489537 | HISTORY OF OCCUPATIONAL EXPOSURE |
| 375812 | Personal history of asbestos exposure |
| 28165 | Personal history of mustard gas exposure |
+-----------+------------------------------------------------------------------------------------+
16 rows in set (0.03 sec)
mysql>
mysql> #|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
mysql> #VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
mysql> # Noise word in subexpression. DOES NOT WORK
mysql> select distinct ConceptID, String
-> from ConceptSynonym
-> where match (String) against ("+history +(of) +exposure" in boolean mode);
Empty set (0.03 sec)
mysql>
mysql> # Non-noise word in subexpression. Works
mysql> select distinct ConceptID, String
-> from ConceptSynonym
-> where match (String) against ("+history +of +(exposure)" in boolean mode);
+-----------+------------------------------------------------------------------------------------+
| ConceptID | String |
+-----------+------------------------------------------------------------------------------------+
| 28165 | Personal history of exposure to nitrogen mustard compounds |
| 375812 | History of exposure to asbestos |
| 375813 | History of exposure to potentially hazardous body fluids |
| 375814 | History of exposure to lead |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488499 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488500 | HISTORY OF INDUSTRIAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NARRATIVE:REPORTED |
| 488503 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NAR:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FINDING:POINT IN TIME:^PATIENT:NOMINAL:REPORTED |
| 488504 | HISTORY OF OCCUPATIONAL EXPOSURE:FIND:PT:^PATIENT:NOM:REPORTED |
| 489535 | HISTORY OF INDUSTRIAL EXPOSURE |
| 489537 | HISTORY OF OCCUPATIONAL EXPOSURE |
| 375812 | Personal history of asbestos exposure |
| 28165 | Personal history of mustard gas exposure |
+-----------+------------------------------------------------------------------------------------+
16 rows in set (0.05 sec)
mysql>
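The session above does not include the table definition, so here is a minimal sketch for reproducing the problem from scratch. The schema is an assumption (the column types, the index name ft_string, and the choice of MyISAM are mine, not taken from the real database), and the rows are a small sample copied from the result sets above. I have not confirmed that the problem shows up on a table this small; it should, provided the behaviour does not depend on table size.

```sql
-- Hypothetical schema: the report does not show the real table definition.
-- MyISAM is assumed because it supports FULLTEXT indexes in MySQL 4.1.
CREATE TABLE ConceptSynonym (
  ConceptID INT NOT NULL,
  String    VARCHAR(255) NOT NULL,
  FULLTEXT KEY ft_string (String)
) ENGINE=MyISAM;

-- A few rows copied from the result sets in the session above.
INSERT INTO ConceptSynonym (ConceptID, String) VALUES
  (375812, 'History of exposure to asbestos'),
  (375814, 'History of exposure to lead'),
  (489535, 'HISTORY OF INDUSTRIAL EXPOSURE'),
  (375812, 'Personal history of asbestos exposure');

-- Noise word as a plain term: reported above to return rows.
SELECT DISTINCT ConceptID, String
FROM ConceptSynonym
WHERE MATCH (String) AGAINST ('+history +of +exposure' IN BOOLEAN MODE);

-- Noise word inside a subexpression: reported above to return an empty set.
SELECT DISTINCT ConceptID, String
FROM ConceptSynonym
WHERE MATCH (String) AGAINST ('+history +(of) +exposure' IN BOOLEAN MODE);
```

Note that "of" is also shorter than the default ft_min_word_len of 4, so with default settings it would be left out of the full-text index regardless of the stopword list.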
Suggested fix:
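Look at how noise words are handled in boolean mode subexpressions. A subexpression that contains only noise words, such as "+(of)", should presumably be ignored in the same way as a plain "+of".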