Bug #8065 'federated' fails, loses connection, causes subsequent failures
Submitted: 21 Jan 2005 10:06 Modified: 5 Apr 2005 21:30
Reporter: Joerg Bruehe Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server Severity:S1 (Critical)
Version:5.0.3-pre OS:various Unix
Assigned to: Bugs System CPU Architecture:Any

[21 Jan 2005 10:06] Joerg Bruehe
Description:
Test 'federated' fails on several platforms with identical symptom:
At line 28: query 'insert into federated.t1 (name, other) values ('First Name', 11111)' failed: 1218: Error connecting to master: Can't connect to MySQL server on '127.0.0.1' (111)

This is the first data manipulation statement, extract of expected 'result' file:
 11 drop database if exists federated;
 12 create database federated;
 13 CREATE TABLE federated.t1 ( `id` int(20) NOT NULL auto_increment, `name` varchar(32) NOT NULL default '', `other` in        t(20) NOT NULL default '0', created datetime default '2004-04-04 04:04:04', PRIMARY KEY  (`id`), KEY `name` (`name`)        , KEY `other_key` (`other`)) ENGINE="FEDERATED" DEFAULT CHARSET=latin1 COMMENT='mysql://root@127.0.0.1:9308/federate        d/t1';
 14 insert into federated.t1 (name, other) values ('First Name', 11111);

A consequence of this test abort is that the test stream fails to drop the database 'federated', as a consequence at least 7 further tests fail
(certain: information_schema, ndb_autodiscover, rpl000009, rpl_create_database,  ps_1general, schema, show_check; 
probably same cause:  ndb_autodiscover2).
To prevent this, removal of the database 'federated' must be ensured even if this test aborts and so does not execute its own final cleanup.

This happened on all machines where the tests started:
build, buildqnx, butch, cane, ds20, ds9, hammer, hp3750, intelxeon3, ita2, nocona, pegasos1, powermacg4, powermacg5, quadita2, sunfire100b, and sunfire100c.

Current release build, based on changeSet
  1.1782 05/01/20 16:04:03 mskold@mysql.com +1 -0
  Merge mskold@bk-internal.mysql.com:/home/bk/mysql-5.0
  into mysql.com:/usr/local/home/marty/MySQL/mysql-5.0

How to repeat:
Attempt a release build + test.

Suggested fix:
In addition to the fix that prevents the test stream abort,
a new test 'federated_zzz' should be added that ensures removal of the database 'federated' so that the cleanup occurs even if the test aborts.
[31 Jan 2005 21:18] Patrick Galbraith
Fixed per commit, waiting on review:

my review request email:

Hello,

I committed a patch today that fixes all of my memory leaks as well as  some compile warnings, which I've successfully tested on both my own  SuSe 8.2 workstation and Brian's test server (64bit AMD), and am in the  process of testing on build.mysql.com.

The memory leaks only manifested themselves on Linux (32-bit). I do  most of my work on Mac OSX and never saw these errors while running the  test, but upon running the test, even though it's successful, there  were various places where memory was not freed. These were two places  mainly:

1. string.scheme wasn't freed (used in ha_federated::create)
2. 'result', allocated from 'mysql_store_result' (in various parts of  the code).

In addition to fixing these leaks, when I fixed the store_result leak,  it caused another problem to appear that worked before because  'mysql_free_result' wasn't being called. The snippit from master.trace  below shows essentially what was happening. Note that index_read_idx  sends a query it builds based on the index that is in the query run  against the local federated table, builds the query, building the where  clause with that index, then calls mysql_store_result, which negates  the need for a further table scan. Well, a table scan was still being  called in rnd_init, which follows after index_end (due to how  sql_update.cc calls). I realised that rnd_init has a bool scan variable  that is passed in, and surely enough, it is set to 0 in the case where  index_read_idx already has done a query for the one record. So, what  was happening is rnd_init would do a full table scan (via select * from  table) and then call store_result, wiping out the results of the query  in index_read_idx. This wasn't noticeable before mysql_free_result was  added to index_end, but once it was, made it so the record used in  update_row to build the update query was the wrong record, so the sql  query would update the wrong record!

Subsequently, I am utilising the scan flag on both rnd_init (to not do  the full table scan) and in rnd_pos, which results in rnd_next and  convert_row_to_internal_format from being called (it doesn't have to  because index_read_idx retrieved the correct record and then called  rnd_next on it already).

This fixed my memory problems and the problem of a full table scan  being called after index_read_idx.

Other changes I made are listed in the bk commit message below. Below  shows what happened prior to this fix:

Initial query on federated table:

query: update federated.t1 set name = 'Third name' where name = '3rd  name'
| | | enter: db: federated  want_access: 4  master_access: 33554431
| | | >ha_federated::info
| | | <ha_federated::info
| | | | | info: real table: federated.t1
| | | | ha_federated::scan_time:
| | | | >ha_federated::index_init
| | | | <ha_federated::index_init
| | | | | | >ha_federated::index_read
| | | | | | <ha_federated::index_read
| | | | | | >ha_federated::index_read_idx
| | | | | | | >ha_federated::create_where_from_key
| | | | | | | <ha_federated::create_where_from_key
| | | | | | | ha_federated::index_read_idx: mysql_store_result returned  result at address 8de8588
| | | | | | <ha_federated::index_read_idx
| | | | | | >ha_federated::rnd_next
| | | | | | <ha_federated::rnd_next

This is the correct record:

| | | | | | >ha_federated::convert_row_to_internal_format
| | | | | | | ha_federated::convert_row_to_internal_format: row[0] 3
| | | | | | | ha_federated::convert_row_to_internal_format: row[1] 3rd  name
| | | | | | | ha_federated::convert_row_to_internal_format: row[2] 33333
| | | | | | | ha_federated::convert_row_to_internal_format: row[3]  2004-04-04 04:04:04
| | | | | | <ha_federated::convert_row_to_internal_format
| | | >ha_federated::position
| | | <ha_federated::position
| | | | | | >ha_federated::index_next
| | | | | | <ha_federated::index_next
| | | | | | >ha_federated::rnd_next
| | | | | | <ha_federated::rnd_next
| | | | >ha_federated::index_end
| | | | | ha_federated::index_end: mysql_free_result address 8de8588
| | | | <ha_federated::index_end

This is the query (and following methods) that don't need to be run  since we already have the correct record!!!

| | | | >ha_federated::rnd_init
| | | | | ha_federated::rnd_init: share->select_query SELECT * FROM t1
| | | | | ha_federated::rnd_init: mysql_store_result returned result at  address 8de8548
| | | | <ha_federated::rnd_init
| | | >ha_federated::rnd_pos
| | | <ha_federated::rnd_pos
| | | >ha_federated::rnd_next
| | | <ha_federated::rnd_next

In the remaining lines, here are the incorrect record, and then  incorrect update statement!

| | | >ha_federated::convert_row_to_internal_format
| | | | ha_federated::convert_row_to_internal_format: row[0] 1
| | | | ha_federated::convert_row_to_internal_format: row[1] First Name
| | | | ha_federated::convert_row_to_internal_format: row[2] 11111
| | | | ha_federated::convert_row_to_internal_format: row[3] 2004-04-04  04:04:04
| | | <ha_federated::convert_row_to_internal_format
| | | >ha_federated::update_row
| | | | ha_federated::update_row: Final update query: UPDATE t1 SET  id=1, name='Third name', other=11111, created='2004-04-04 04:04:04'  WHERE id=1
| | | <ha_federated::update_row
| | | >ha_federated::rnd_end
| | | | ha_federated::index_end: mysql_free_result address 8de8548
| | | <ha_federated::index_end

Begin forwarded message:

From: Patrick Galbraith <patg@mysql.com>
Date: January 29, 2005 6:33:11 PM PST
To: dev-public@mysql.com
Subject: bk commit - 5.0 tree (patg:1.1814)

ChangeSet
  1.1814 05/01/29 18:33:06 patg@krsna.patg.net +2 -0
  WL# Federated Storage Handler 2094

  Fixed all memory leaks!

  ha_federated.h:
    added scan_flag, set defaults for scan_flag and result to 0 so  first initial call of rnd_init doesn't free result!
  ha_federated.cc:
    Fixed memory leaks:
    1. share->scheme that was created in parse_url was not being freed
    2. HASH federated_open_tables was being deleted, but not freed
    3. 'result' from mysql_store_result was not being free in several  instances
    4. Fixed the problem where a table scan was being performed after  index_read_idx,
    which didn't cause a problem because the result set from  idx_read_idx was not being
    freed, but once the result set was properly freed, it broke  update_row. Now, I'm using
    the bool 'scan' to determine if I need to perform a table scan,  which it magically
    is false when the query is an update with an index.
    5. Changed all stings containing the query to perform in  mysql_real_query calls from
    string.c_ptr_quick() to string.ptr() per Monty's suggestion  (better performance)
    6. Fixed various cast/type/truth compile warnings.
    7. Got rid of 'load_conn_info' and just let 'parse_url' handle it.  It made no sense before.
    8. ALL MEMORY LEAKS FIXED!!!!!
[17 Feb 2005 18:22] Patrick Galbraith
changeset 1.1856 fixes this, waiting on second review
[17 Feb 2005 18:24] Patrick Galbraith
changeset 1.1856 fixes this, waiting for second review.
[17 Feb 2005 18:29] Patrick Galbraith
This has been fixed with changeset 1.1856, waiting for second review
[5 Apr 2005 21:30] Patrick Galbraith
This and other memory leak issues have been fixed.