Bug #45899 | api nodes (re-)connecting to starting data nodes too fast ... | ||
---|---|---|---|
Submitted: | 2 Jul 2009 8:23 | Modified: | 7 Apr 2014 9:32 |
Reporter: | Hartmut Holzgraefe | Email Updates: | |
Status: | Verified | Impact on me: | |
Category: | MySQL Cluster: Cluster (NDB) storage engine | Severity: | S2 (Serious) |
Version: | mysql-5.1-telco-6.3 | OS: | Any |
Assigned to: | CPU Architecture: | Any | |
Tags: | 5.1.32-ndb-6.3.23-cluster |
[2 Jul 2009 8:23]
Hartmut Holzgraefe
[3 Jul 2009 10:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/77868
[13 Jul 2009 14:08]
Magnus BlÄudd
Testing has shown that patch is no good. Will change tactic to make the server just return a good error and then leave it to the client to decide how fast it should try reconnect.
[17 Jul 2009 12:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78932 2999 jack andrews 2009-07-17 Bug #45899 api nodes (re-)connecting to starting data nodes too fast ... . committed on behalf of magnus Delay further connection attempts after 3 disconnects during "handshake" phase. Increase the delay with number of rejected connects, but limit to max 10 seconds. 3 failed connects -> 1 second "block" 4 -> 2 .. .. 13 -> 10 NOTE! Remove printouts storage/ndb/src/common/transporter/Transporter.cpp | 54 +++++++++++++ storage/ndb/src/common/transporter/Transporter.hpp | 8 + storage/ndb/src/common/transporter/TransporterRegistry.cpp | 12 ++ 3 files changed, 73 insertions(+), 1 deletion(-)
[23 Jul 2009 11:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79148 3000 jack andrews 2009-07-23 Bug #45899 api nodes (re-)connecting to starting data nodes too fast ... . removed ndbout_c print out
[24 Jul 2009 11:04]
jack andrews
hartmut reports the patch works well: > > count accept() calls in logs: > ---------------------------- > ndb-6.3.23 : 361286 > ndb-6.3.23-patch2: 4083 > > So the number of connections that came in during data ndoe restart > was reduced by almost 99% with the current patch and should solve > Italtels problem perfectly. > just need to check the ndb.ndb_partition_range failure is not related in CluB
[3 Aug 2009 16:13]
Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090803153854-un8jv9y2md8k221t) (version source revid:jonas@mysql.com-20090803153854-un8jv9y2md8k221t) (merge vers: 5.1.35-ndb-7.0.7) (pib:11)
[5 Aug 2009 13:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80166 2961 Jonas Oreland 2009-08-05 ndb - revert bug#45899
[12 Aug 2009 17:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80699 2962 Jonas Oreland 2009-08-12 ndb - bug#45899 Fix bug in previous patch, causing platforms with CLOCK_MONOTONIC but not PTHREAD_CONDATTR_SETCLOCK to misbehave (e.g dl145a-c)
[12 Aug 2009 18:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80706 3008 Jonas Oreland 2009-08-12 ndb - bug#45899 - disable previous fix as it cause many tsts to fail in autotest
[12 Aug 2009 18:13]
Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090812180907-gg0o88k94c5h4i7b) (version source revid:jonas@mysql.com-20090812180907-gg0o88k94c5h4i7b) (merge vers: 5.1.35-ndb-6.3.26) (pib:11)
[12 Aug 2009 18:13]
Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090812181054-euhuy5ncf49fzupv) (version source revid:jonas@mysql.com-20090812181054-euhuy5ncf49fzupv) (merge vers: 5.1.35-ndb-7.0.7) (pib:11)
[12 Aug 2009 18:22]
Jonas Oreland
reverted current patch is it causes many test-failures in autotest basically connect time went from 100ms to 10s new solution should be along the lines 1) if connect fails, retry again after 100ms 2) if server-side is not in CONNECTING instead of closing connection, send back BLOCKED (in text) and have client retry after 100ms (which might lead to a new BLOCKED) 3) This requires a new state CONNECTION_PENDING or similar on both server and client side. This will not consume any extra ports Maybe maybe a patch based on SO_LINGER (which also can remove TIME_WAIT) could be acceptable as a intermediate solution...