Bug #56654 pb2 log is very hard to read
Submitted: 8 Sep 2010 13:56 Modified: 14 Nov 2010 2:53
Reporter: Sven Sandberg Email Updates:
Status: Closed Impact on me:
None 
Category:Tools: MTR / mysql-test-run Severity:S7 (Test Cases)
Version:5.1 OS:Any
Assigned to: Bjørn Munch CPU Architecture:Any
Tags: mtr, mysqltest, pb2

[8 Sep 2010 13:56] Sven Sandberg
Description:
The pb2 log is sometimes very difficult to analyze. Here is a snippet from https://central.sun.net/http://pb2.norway.sun.com/?action=archive_download&archive_id=2239... where several things are unclear (lines have been broken and enumerated):

 1 sys_vars.rpl_init_slave_func w2 [ pass ] 11826
 2 worker[2] mysql-test-run: WARNING: Waited 60 seconds for /export/ho
   me2/pb2/test/sb_0-2236760-1283495420.63/mysql-5.6.1-m4-linux-x86_64
   -test/mysql-test/var-n_mix/2/mysqld.2/data/auto.cnf to be created, 
   still waiting for 1740 seconds...
 3 worker[1] mysql-test-run: WARNING: Waited 60 seconds for /export/ho
   me2/pb2/test/sb_0-2236760-1283495420.63/mysql-5.6.1-m4-linux-x86_64
   -test/mysql-test/var-n_mix/1/mysqld.3/data/auto.cnf to be created, 
   still waiting for 1740 seconds...
 4 worker[1] mysql-test-run: WARNING: Waited 120 seconds for /export/h
   ome2/pb2/test/sb_0-2236760-1283495420.63/mysql-5.6.1-m4-linux-x86_6
   4-test/mysql-test/var-n_mix/1/mysqld.3/data/auto.cnf to be created,
    still waiting for 1680 seconds...
 5 worker[2] mysql-test-run: WARNING: Waited 120 seconds for /export/h
   ome2/pb2/test/sb_0-2236760-1283495420.63/mysql-5.6.1-m4-linux-x86_6
   4-test/mysql-test/var-n_mix/2/mysqld.2/data/auto.cnf to be created,
    still waiting for 1680 seconds...
 6 rpl.rpl_heartbeat_2slaves w1 [ fail ]
 7         Test ended at 2010-09-03 14:06:10
 8 
 9 CURRENT_TEST: rpl.rpl_heartbeat_2slaves
10 
11 
12 The server [mysqld.2 - pid: 12530, winpid: 12530, exit: 256] crashe
   d while running 'check testcase before test'
13 Server log from this test:
14 ==12531== Memcheck, a memory error detector.
15 ==12531== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward 
   et al.
16 ==12531== Using LibVEX rev 1658, a library for dynamic binary trans
   lation.
17 ==12531== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
18 ==12531== Using valgrind-3.2.1, a dynamic binary instrumentation fr
   amework.
19 ==12531== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward 
   et al.
20 ==12531== For more details, rerun with: -v
21 ==12531== 
22 ==12531== Warning: set address range perms: large range 136000000 (
   undefined)
23 ==12531== Warning: set address range perms: large range 200000000 (
   undefined)
24 100903 15:01:44 [Note] Plugin 'FEDERATED' is disabled.
25 InnoDB: The InnoDB memory heap is disabled
26 InnoDB: Mutexes and rw_locks use GCC atomic builtins
27 InnoDB: Compressed tables use zlib 1.2.3
28 100903 15:01:47  InnoDB: highest supported file format is Barracuda
   .
29 100903 15:01:51 InnoDB 1.2.0 started; log sequence number 1595675
30 100903 15:01:52 [Warning] No existing UUID has been found, so we as
   sume that this is the first time that this server has been started.
    Generating a new UUID: 09ff8468-b753-11df-98f4-00144f20bc5c.
31 100903 15:01:54 [Note] Event Scheduler: Loaded 0 events
32 100903 15:01:54 [Note] /export/home2/pb2/test/sb_0-2236760-12834954
   20.63/mysql-5.6.1-m4-linux-x86_64-test/bin/mysqld: ready for connec
   tions.
33 Version: '5.6.1-m4-debug-log'  socket: '/export/home2/pb2/test/sb_0
   -2236760-1283495420.63/tmp/7U7uGkj2iw/1/mysqld.2.sock'  port: 13001
     Source distribution
34 mysqltest failed but provided no output
35 
36 
37  - saving '/export/home2/pb2/test/sb_0-2236760-1283495420.63/mysql-
   5.6.1-m4-linux-x86_64-test/mysql-test/var-n_mix/1/log/rpl.rpl_heart
   beat_2slaves/' to '/export/home2/pb2/test/sb_0-2236760-1283495420.6
   3/mysql-5.6.1-m4-linux-x86_64-test/mysql-test/var-n_mix/log/rpl.rpl
   _heartbeat_2slaves/'
38 
39 Retrying test, attempt(2/3)...
40 
41 worker[1] mysql-test-run: WARNING: Waited 60 seconds for /export/ho
   me2/pb2/test/sb_0-2236760-1283495420.63/mysql-5.6.1-m4-linux-x86_64
   -test/mysql-test/var-n_mix/1/mysqld.3/data/auto.cnf to be created, 
   still waiting for 1740 seconds...
42 rpl.rpl_heartbeat_basic                  w2 [ exp-fail ]
43         Test ended at 2010-09-03 14:09:17
44 
45 CURRENT_TEST: rpl.rpl_heartbeat_basic
46 ==12933== Memcheck, a memory error detector.
47 ==12933== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward 
   et al.
48 ==12933== Using LibVEX rev 1658, a library for dynamic binary trans
   lation.
49 ==12933== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
50 ==12933== Using valgrind-3.2.1, a dynamic binary instrumentation fr
   amework.
51 ==12933== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward 
   et al.
52 ==12933== For more details, rerun with: -v
53 ==12933==

The following needs to be presented in a more clear way:
 1. Line 2-5: What was mtr trying to do and what happened? Is this
    a host problem, a mysql problem or an mtr problem? Which test
    case might be responsible?
 2. Line 12: Which test case was the last one to run before the server
    crashed during 'check-testcase'? Which was next to run? Where
    is the stack trace?
 3. Line 13: What is the name of "this test"? Where does the server
    log end?
 4. Line 22-23: Is this a real problem or something that Valgrind
    just prints for no reason? Can we suppress it?
 5. Line 34: This message sounds like there is a bug in mysqltest.
    Can we report and fix it? (Btw, I think this means that we're
    no longer looking at a server log, right? So there are missing
    end-of-server-log markers somewhere.)
 6. Line 39: What is the name of the test?

How to repeat:
Read the log.

Suggested fix:
 1. It seems that mtr is sometimes waiting for some file to be created, and
    this can fail, and when it fails, a message is presented in the pb2 log.
    The log is read by people who develop mysql, but the error message is
    only understandable by people who develop mtr. The message needs to
    explain if this is likely to be a host problem, an mtr bug, a test case
    bug, or a mysqld bug. If it can be a test case bug or a mysqld bug, the
    message has to explain what mtr expects and how to fix it.

 2. A lot of things are mixed in the log, and it is hard to figure out where
    a message starts and ends. When check-testcase fails, it would be better
    if the error message contained the name of the previously running test.
    Also, it is probably not clear to most people that a failure in
    check-testcase is the result of something wrong in the previous test
    case. I suggest the message is reworded to something like:

      "The server [mysqld.2 - pid: 12530, winpid: 12530, exit: 256]
       crashed while running 'check testcase before test'. Most likely,
       this means that it was the *last* executed test that caused the
       server to crash, after the test finished. The last executed test
       was XYZ."

 3. It is not clear when the server log starts and ends, it is not clear
    which of the servers we get a log from, and the test name is
    not clear. It would be better to format this as:
      ==== BEGIN SERVER LOG FOR <server> AFTER <test case name> ====
      <server log>
      ...
      ==== END SERVER LOG FOR <server> AFTER <test case name> ====

 4. If possible, try to suppress useless information. If it is vital to
    know the Valgrind version and licence, consider if we can print it at
    most once per mtr run, and only when there are any real Valgrind
    warnings.

 5. Ensure that mysqltest always prints an error message when it fails.

 6. Please print the name of the test when retrying. E.g.:

      Retrying test <name of test>, attempt 2/3...
[8 Sep 2010 13:57] Sven Sandberg
See also BUG#56648.
[8 Sep 2010 14:19] Bjørn Munch
What caused all these failures would be Bug #56528 which causes memory to fill up.

1. When memory fills up, bad things happen, mtr cannot be expected to fully analyze the cause but will report the symptom.

2. Error messages relating to a test will refer to the last test mentioned in the log.

3. OK some wrapper lines could probably be added to the server log output.

4. When we dump server log, we dump server log, mtr cannot possibly decide what to include and what not to.

5. The code does but sometimes it still doesn't output, maybe due to resource constraints. Nothing we can do about that, apart from mentioning that it did produce no output.

6. OK can probably do that.
[8 Sep 2010 16:30] Sven Sandberg
1. I think maybe you misunderstood my request. This is a log where
   developers expect to find information to debug their test cases or
   mysqld. If there are messages that indicate host problems, it would be
   good if those messages were clearly marked as such. For example, we
   could print something like the following (assuming the file we are
   waiting for is supposed to be generated by mysqld):

     worker[2] mysql-test-run: WARNING: Waited X seconds for <FILE> to be created. I will wait for Y more seconds. (Explanation: mysql-test-run has started a mysqld process. Normally, the mysqld process creates this file soon after startup. Since mysql-test-run detected that the file is not yet created, it may indicate a host problem, perhaps out of memory or disk space. Beware that subsequent errors may also be due to a host problem.)

2. When running test case X, followed by test case Y, and check-testcase
   fails before test Y, it will print Y as a header and then an error
   message. But it is usually test X that caused a crash. This may lead
   to confusion. I'm only asking that we clarify the message and say that
   X is likely causing the problem.

3. Thank you!

4. OK, it wasn't clear to me that the valgrind version was part of the
   server log. I think it will be more clear after you implement 3. So
   nothing to do here.

5. Can we add something to guide the person who has to analyze the logs? For
   example, I think we could print something like this:

     mysql-test-run has detected that mysqltest failed without producing any output when executing the test <test name>. This is likely due to a host problem, maybe out of memory or disk space. Beware that subsequent errors may also be due to a host problem.

6. Thank you! :-)
[12 Oct 2010 13:32] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120546

2940 Bjorn Munch	2010-10-12
      Bug #56654 pb2 log is very hard to read
      Added some more info in a number of fail cases
[13 Oct 2010 7:52] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120627

2941 Bjorn Munch	2010-10-13 [merge]
      Bug #56654 pb2 log is very hard to read
      Added some more info in a number of fail cases
[13 Oct 2010 11:10] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/120669

2941 Bjorn Munch	2010-10-13
      Bug #56654 pb2 log is very hard to read
      Added some more info in a number of fail cases
[19 Oct 2010 12:02] Bugs System
A patch for this bug has been committed. After review, it may
be pushed to the relevant source trees for release in the next
version. You can access the patch from:

  http://lists.mysql.com/commits/121126

2942 Bjorn Munch	2010-10-19
      Bug #56654 pb2 log is very hard to read
      Added some more info in a number of fail cases
      (re-commit for administrative reasons)
[19 Oct 2010 13:10] Bjørn Munch
Pushed to -mtr branches
[29 Oct 2010 15:29] Paul Dubois
Changes to test suite. No changelog entry needed.
[13 Nov 2010 16:08] Bugs System
Pushed into mysql-trunk 5.6.99-m5 (revid:alexander.nozdrin@oracle.com-20101113155825-czmva9kg4n31anmu) (version source revid:alexander.nozdrin@oracle.com-20101113152450-2zzcm50e7i4j35v7) (merge vers: 5.6.1-m4) (pib:21)
[13 Nov 2010 16:39] Bugs System
Pushed into mysql-next-mr (revid:alexander.nozdrin@oracle.com-20101113160336-atmtmfb3mzm4pz4i) (version source revid:vasil.dimov@oracle.com-20100629074804-359l9m9gniauxr94) (pib:21)
[18 Nov 2010 15:53] Bugs System
Pushed into mysql-5.1 5.1.54 (revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (version source revid:build@mysql.com-20101118153531-693taxtxyxpt037i) (merge vers: 5.1.54) (pib:21)
[16 Dec 2010 22:30] Bugs System
Pushed into mysql-5.5 5.5.9 (revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (version source revid:jonathan.perkin@oracle.com-20101216101358-fyzr1epq95a3yett) (merge vers: 5.5.9) (pib:24)