Bug #66261 select distinct actually updates some rows to previous values
Submitted: 8 Aug 2012 15:17 Modified: 9 Aug 2012 12:49
Reporter: Gregory Book Email Updates:
Status: Not a Bug Impact on me:
None 
Category:MySQL Server: MyISAM storage engine Severity:S2 (Serious)
Version:5.1.61 OS:Linux
Assigned to: CPU Architecture:Any

[8 Aug 2012 15:17] Gregory Book
Description:
When running a select distinct query, it changed some column values of previously updated rows in the same table

How to repeat:
When running 
 select req_status from data_requests where request_id = 23884
it returns 'pending'

Then I run
 select distinct(req_groupid) 'req_groupid', req_modality, req_username from data_requests where req_status = 'pending' or req_status = '' order by req_date
and run
 select req_status from data_requests where request_id = 23884
it returns 'processing' which was a previous value for that column

a timestamp column (on update current timestamp) on the data_requests table shows no change, so I assume no formal update statement occured
[8 Aug 2012 15:24] Gregory Book
This problem is still happening in 5.1.61
[8 Aug 2012 17:24] Sveta Smirnova
Thank you for the report.

Do you have query cache enabled?
[8 Aug 2012 17:35] Gregory Book
It would seem I do not have the query cache enabled, since the query_cache size is 0

query_cache_limit 1048576
query_cache_min_res_unit 4096
query_cache_size 0
query_cache_type ON
query_cache_wlock_invalidate OFF
query_prealloc_size 8192
[8 Aug 2012 18:31] Sveta Smirnova
Thank you for the feedback.

> select req_status from data_requests where request_id = 23884
> it returns 'processing' which was a previous value for that column

Please explain when and how you insert new value in this table.

Please also provide output of queries: CHECK TABLE data_requests; SHOW TABLE STATUS LIKE 'data_requests'; and CREATE TABLE data_requests;
[8 Aug 2012 18:48] Gregory Book
Originally these queries were part of a perl script. The status field was changing and I couldn't find where it was being changed. Then I just ran the queries in phpMyAdmin and found the bug there.

CHECK TABLE data_requests:
Table Op Msg_type Msg_text
ado2.data_requests check status OK

SHOW TABLE STATUS LIKE 'data_requests':
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
data_requests MyISAM 10 Dynamic 22505 151 3420308 281474976710655 785408 0 23893 2012-08-08 10:51:29 2012-08-08 11:29:39 2012-08-08 14:36:21 utf8_general_ci NULL

CREATE TABLE data_requests:
#1050 - Table 'data_requests' already exists

While testing it again, just using phpMyAdmin, I updated 14 rows to have req_status = 'pending'. and 14 rows were modified. I run a select, and those 14 rows show req_status is 'pending'. I run this statement:
 select distinct(req_groupid) 'req_groupid', req_modality, req_username from data_requests where req_status = 'pending' or req_status = '' order by req_date
and there are 12 rows 'pending', 1 'processing', and 1 'complete'
[8 Aug 2012 18:52] Gregory Book
I take that back. 12 'pending' 2 'processing', 0 'complete'.

In the original process, a row is inserted by a PHP page with a req_status of 'pending'. It's then changed to 'processing' by a perl program. The perl program is where the select distinct statement is.
[8 Aug 2012 19:00] Sveta Smirnova
Thank you for the feedback.

I meant SHOW CREATE TABLE data_requests;, not CREATE TABLE data_requests;
 I am sorry for confusion
[8 Aug 2012 19:04] Gregory Book
CREATE TABLE `data_requests` (
 `request_id` int(11) NOT NULL AUTO_INCREMENT,
 `req_username` varchar(50) NOT NULL,
 `req_ip` varchar(30) NOT NULL,
 `req_groupid` int(11) NOT NULL,
 `req_pipelinedownloadid` int(11) NOT NULL COMMENT 'filled if this is part of a pipeline download',
 `req_modality` varchar(20) NOT NULL,
 `req_destinationtype` varchar(20) NOT NULL COMMENT 'nfs, localftp, remoteftp',
 `req_nfsdir` varchar(255) NOT NULL,
 `req_seriesid` int(11) NOT NULL,
 `req_filetype` varchar(20) NOT NULL,
 `req_gzip` tinyint(1) NOT NULL,
 `req_anonymize` int(11) NOT NULL,
 `req_preserveseries` tinyint(1) NOT NULL,
 `req_dirformat` varchar(50) NOT NULL,
 `req_ftpusername` varchar(50) NOT NULL,
 `req_ftppassword` varchar(50) NOT NULL,
 `req_ftpserver` varchar(100) NOT NULL,
 `req_ftpport` int(11) NOT NULL DEFAULT '21',
 `req_ftppath` varchar(255) NOT NULL,
 `req_behformat` varchar(35) NOT NULL,
 `req_behdirrootname` varchar(50) NOT NULL,
 `req_behdirseriesname` varchar(255) NOT NULL,
 `req_date` timestamp NULL DEFAULT NULL,
 `req_completedate` timestamp NULL DEFAULT NULL,
 `req_cputime` double NOT NULL,
 `req_status` varchar(25) NOT NULL,
 `req_results` varchar(255) NOT NULL,
 `lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`request_id`),
 KEY `req_groupid` (`req_groupid`),
 KEY `req_date` (`req_date`),
 KEY `req_status` (`req_status`)
) ENGINE=MyISAM AUTO_INCREMENT=23893 DEFAULT CHARSET=utf8
[9 Aug 2012 6:29] Shane Bester
selects don't change data.   you could enable general query log on the server to see who changes the data from another connection.

or easier, consider locking the table during the test :

lock table data_requests read;
select req_status from data_requests where request_id = 23884;
select distinct(req_groupid) 'req_groupid', req_modality, req_username from data_requests where req_status = 'pending' or req_status = '' order by req_date;
select req_status from data_requests where request_id = 23884;
unlock tables;

If you run the above in the mysql command line client, I bet results will be as you expect?
[9 Aug 2012 12:49] Gregory Book
Woops! That is correct, there was another connection. I assumed there wasn't because I am the only person with write privileges... except for the mirror server I just set up, which has the same automated perl programs running on it. Sorry about that, and thanks for your help!