Bug #47851 main.ps_ddl1 and funcs_1.storedproc fail with signal 11
Submitted: 6 Oct 2009 8:19 Modified: 25 Oct 2009 20:56
Reporter: Rafal Somla Email Updates:
Status: Closed Impact on me:
None 
Category:Tools: MTR / mysql-test-run Severity:S3 (Non-critical)
Version:6.0 OS:Any
Assigned to: Ingo Strüwing CPU Architecture:Any
Tags: pb2, test failure

[6 Oct 2009 8:19] Rafal Somla
Description:
These tests failed with signal 11 (segmentation fault) in PB on many platforms. The failure seems to be persistent (also when tests were retried). Once, funcs_1.storedproc failed with signal 6 (abort).

How to repeat:
See PB report (look at tests ps_ddl1 and storedproc) : http://pb2.norway.sun.com/web.py?template=push_details&push=560630
[6 Oct 2009 8:34] Valeriy Kravchuk
Both tests work for me with recent 5.1.40 from bzr on Linux. Can you, please, add exact versions/trees/latfroms that fail (without a reference to resources NOT available to public).
[6 Oct 2009 11:01] Susanne Ebrecht
Rafal,

we need to know which server version you tested.

Is it failing on 4.1, 5.0, 5.1, 5.4, or whatever future release?
[6 Oct 2009 17:39] Sveta Smirnova
John,

thank you for the feedback.

Verified as described, although I saw on that page references to crashes in embedded mode on Mac and Solaris also.
[14 Oct 2009 14:50] Ingo Strüwing
What jumps into my mind immediately:
"let digits= 100;" is not conform with the documentation of the "let" command.
It should be "let $digits= 100;".
But a crash shouldn't happen anyway.
[15 Oct 2009 7:10] Rafal Somla
Indeed! Adding "$" makes the test pass. Probably it is a result of bad merge. Thanks Ingo for insight!

The problem is not as easy as just missing "$" in variable name. A simple test with single command "let digits= 100" passes fine. The preceding actions in storedproc test must create a situation where crash can happen.
[15 Oct 2009 20:46] Ingo Strüwing
I could repeat the crash on my Linux x86_64 box (with the original bad syntax). Unfortunately the use of --valgrind made the test pass. That means, a quick result is unlikely. So I'll leave it to the to-be assignee.
[16 Oct 2009 18:54] Ingo Strüwing
Further observations are: The crash happens (on my box) only on a valgrind build (but only without actually running valgrind). In a debug build with actually running with --debug, I found an "Unallocated data" printout in mysqltest.trace. But the corresponding fprintf(stderr,...) went to nowhere.

After digging a bit, I found that the memory was allocated, but with my_once_strdup(). So it bypassed safemalloc. Trying to free such memory is the bug.

5.1 is not affected. The change from my_strdup() to my_once_strdup() has been done in 6.0.
[16 Oct 2009 19:56] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87196

2881 Ingo Struewing	2009-10-16
      Bug#47851 - main.ps_ddl1 and funcs_1.storedproc fail with signal 11
      
      The named test cases failed with a crash in the mysqltest utility.
      The reason was an attempted my_free() of memeory, previously
      allocated with my_once_strdup().
      
      Memory allocated with my_once_*() must not be freed. Least of all
      with my_free().
      
      Since setting environment strings is not a frequent operation,
      we can live with the leak.
     @ client/mysqltest.cc
        Bug#47851 - main.ps_ddl1 and funcs_1.storedproc fail with signal 11
        Avoid freeing of memory allocated by my_once_strdup().
[17 Oct 2009 7:32] Rafal Somla
I am able to reproduce this bug locally using the following procedure:

1. Get mysql-6.0-backup-merge tree (this is one I've tried)
2. Run BUILD/autorun.sh
3. Configure using attached config.sh script (those are the same config options as used in PB)
4. Build tree
6. Run the following minimal test script:

--echo Testing.
let foo= 1;
let foo= 1;
--exit
[17 Oct 2009 7:34] Rafal Somla
Config options used by PB

Attachment: config.sh (, text), 721 bytes.

[17 Oct 2009 9:15] Rafal Somla
The problem (using my_once_strdup() for allocating memory) was introduced with a fix for BUG#47305. This fix was not pushed into 6.0-codebase but was merged into 6.0-backup-merge. This is why codebase tree was green and backup-merge not.
[17 Oct 2009 9:34] Bjørn Munch
Mea culpa. Confirmed that the patch fixes the problem for the two tests. Moved to the appropriate category.
[17 Oct 2009 9:35] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87229

2878 Rafal Somla	2009-10-17
      Remove fix for BUG#47305 because it causes BUG#47851.
[17 Oct 2009 10:28] Rafal Somla
Approved. Would be nice to add comment explaining why memory is not freed.
[17 Oct 2009 10:43] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/87231

2869 Bjorn Munch	2009-10-17
      Bug#47851 - main.ps_ddl1 and funcs_1.storedproc fail with signal 11
      
      The named test cases failed with a crash in the mysqltest utility.
      The reason was an attempted my_free() of memeory, previously
      allocated with my_once_strdup().
      
      Memory allocated with my_once_*() must not be freed. Least of all
      with my_free().
      
      Since setting environment strings is not a frequent operation,
      we can live with the leak.
      
      @ client/mysqltest.cc
        Bug#47851 - main.ps_ddl1 and funcs_1.storedproc fail with signal 11
        Avoid freeing of memory allocated by my_once_strdup().
        Add comment explaining why we don't free v->env_s
[17 Oct 2009 10:45] Bjørn Munch
Pushed to 6.0-codebase-mtr
[17 Oct 2009 16:03] Ingo Strüwing
Boy! I'm completely confused by the back and forth.

Yesterday I proposed a fix for the problem. Today Rafal said, how he can repeat the problem. Is this with my fix applied, or without (which would be no news as I've pointed out the real cause already)?

Then Rafal mentioned, how the problem was introduced. I avoided to mention it, as I was the reviewer for that change. :( According to the bug report it was pushed to mysql-6.0-codebase-mtr. And thus is has prbably been merged into 6.0-codebase and then into 6.0-backup-merge. Why does Rafal claim, it was not pushed into 6.0-codebase but was merged into 6.0-backup-merge? Which path did it take then?

Then, in spite of Bjorn confirmed that the patch (I guess, mine was meant) fixes the problem, Rafal proposed a reversion of the former patch. And seemingly he approved it himself?

Later, Bjorn proposed a patch, which has some similarities to mine.

And then Bjorn claimed to have pushed something. But into 6.0-codebase-mtr, which probably won't help 6.0-backup-merge to become green.

So, what exactly has been pushed? Rafal's patch? Bjorn's patch? Both? And who has now fixed the bug? Or is it still open?

Please make exact statements, what has been pushed where and what the status of the bug is.
[17 Oct 2009 17:19] Bjørn Munch
Ok sorry for the lack of explanation. What I pushed was essentially identical to your suggested patch. I couldn't set myself as responsible since I had acted as reviewer.

6.0-codebase-bugfixing had apparenyly cherry picked my patch for Bug #47305, which has not yet made it to 6.0-codebase. Rafal essentially undid that rather than pushing this fix, which had not been reviewed. I suppose undoing an entire patch does not need a new review.

Then I pushed the fix to my branch 6.0-codebase-mtr, so that when it hits 6.0-codebase (soon I hope) there will not be a problem. I can see in PB2 that the tests pass.

So, in my branch I have both 47305 and 47851, while in the branch Rafal was working on (6.0-codebase-bugfixing?) you should have neither, since 47305 has been reverted.
[17 Oct 2009 18:33] Ingo Strüwing
"I suppose undoing an entire patch does not need a new review." Admittedly, we don't have a rule for it. But perhaps we should.

Please take a breath and think, what might happen when both your patch and Rafal's undo-patch meet in a branch. Who will win? Will the problem pop up in form of a merge conflict?

I guess, Rafal's patch will overrule yours. The new patch will remove the my_free(). That will fix Bug#47305 too. But it may make a valgrind or safemalloc report pop up. The one, assigned to it, may add a my_free()...
[19 Oct 2009 5:54] Rafal Somla
Few missing details.

I, as a merge captain for the backup tree, decided to cherry-pick the fix for bug#47305 from 6.0-codebase-mtr into 6.0-backup-merge (the fix was not present in 6.0-codebase). Then, after 2 days of investigations, I found that this was the cause of making backup trees red. In the process I reported bug#47851.

After learning my lesson that I should never pull fixes which have not gone all the way to the main tree, I decided to remove the cherry-picked fix from backup tree and wait for the fix to come through 6.0-codebase as usual.

I hope there will be no big merge problems, because after cherry-picking and reverting bug#47305 the tree should be in a state as if this fix was never present there. When it comes from 6.0-codebase it should merge as usual I hope.

Note: The patch from [17 Oct 11:35] is not related to fixing this bug. This is the patch which removes bug#47305 patch from 6.0-backup-merge tree. It is listed here because commit message mentions the bug number.
[19 Oct 2009 6:20] Rafal Somla
Hmm, I might be over-optimistic regarding a smooth merge of all these patches. The good news is that the potential conflicts will happen only in backup trees, so we should resolve it locally.
[19 Oct 2009 7:39] Ingo Strüwing
"I hope there will be no big merge problems, because after cherry-picking and reverting bug#47305 the tree should be in a state as if this fix was never present there. When it comes from 6.0-codebase it should merge as usual I hope."

The tree is not "in a state as if this fix was never present there". The code is the same as before, but the repository contains the "bugfix revision" plus the "undo revision".

My guess is that the future merge from 6.0-codebase, which will include the "bugfix revision" detects that that revision does already exist in the tree. Hence its code changes are not applied again. But the fix for _this_ bug (the fix for the bugfix) will be merged in.

When our tree is merged to -codebase, the undo revision will be part of it, including the code change. So the original bug fix will be undone there too.

In the revision history we will have a criss-cross merge of the original bug fix plus an undo revision and a follow-up revision of the original patch.

What needs to be done: We need to set some marker to not forget about the problem. As soon as we pull the original revision through -codebase, we must add a "redo" patch for the original patch.
[22 Oct 2009 20:17] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091022201524-0efl2fzebfuuf0vk) (version source revid:bjorn.munch@sun.com-20091017103949-g7vafwnnjmeotfyx) (merge vers: 6.0.14-alpha) (pib:13)
[22 Oct 2009 23:20] Paul DuBois
Test suite change. No changelog entry needed.
[25 Oct 2009 13:38] Bugs System
Pushed into 6.0.14-alpha (revid:alik@sun.com-20091025133616-ca4inerav4vpdnaz) (version source revid:rafal.somla@sun.com-20091017093435-re0ajk0zuubncqg6) (merge vers: 6.0.14-alpha) (pib:13)
[26 Dec 2009 8:45] Sergei Golubchik
This fix is reverted together with the fix for Bug#47305 that caused the problem.
See Bug#47305 for details.