| Bug #45899 | api nodes (re-)connecting to starting data nodes too fast ... | ||
|---|---|---|---|
| Submitted: | 2 Jul 10:23 | Modified: | 12 Aug 20:22 |
| Reporter: | Hartmut Holzgraefe | ||
| Status: | In progress | ||
| Category: | Server: Cluster | Severity: | S2 (Serious) |
| Version: | mysql-5.1-telco-6.3 | OS: | Any |
| Assigned to: | Magnus Blaudd | Target Version: | |
| Tags: | 5.1.32-ndb-6.3.23-cluster | ||
| Triage: | Triaged: D2 (Serious) / R3 (Medium) / E3 (Medium) | ||
[2 Jul 10:23]
Hartmut Holzgraefe
[3 Jul 12:07]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/77868
[13 Jul 16:08]
Magnus Blaudd
Testing has shown that patch is no good. Will change tactic to make the server just return a good error and then leave it to the client to decide how fast it should try reconnect.
[17 Jul 14:01]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/78932 2999 jack andrews 2009-07-17 Bug #45899 api nodes (re-)connecting to starting data nodes too fast ... . committed on behalf of magnus Delay further connection attempts after 3 disconnects during "handshake" phase. Increase the delay with number of rejected connects, but limit to max 10 seconds. 3 failed connects -> 1 second "block" 4 -> 2 .. .. 13 -> 10 NOTE! Remove printouts storage/ndb/src/common/transporter/Transporter.cpp | 54 +++++++++++++ storage/ndb/src/common/transporter/Transporter.hpp | 8 + storage/ndb/src/common/transporter/TransporterRegistry.cpp | 12 ++ 3 files changed, 73 insertions(+), 1 deletion(-)
[23 Jul 13:51]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/79148 3000 jack andrews 2009-07-23 Bug #45899 api nodes (re-)connecting to starting data nodes too fast ... . removed ndbout_c print out
[24 Jul 13:04]
jack andrews
hartmut reports the patch works well: > > count accept() calls in logs: > ---------------------------- > ndb-6.3.23 : 361286 > ndb-6.3.23-patch2: 4083 > > So the number of connections that came in during data ndoe restart > was reduced by almost 99% with the current patch and should solve > Italtels problem perfectly. > just need to check the ndb.ndb_partition_range failure is not related in CluB
[3 Aug 18:13]
Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090803153854-un8jv9y2md8k221t) (version source revid:jonas@mysql.com-20090803153854-un8jv9y2md8k221t) (merge vers: 5.1.35-ndb-7.0.7) (pib:11)
[5 Aug 15:50]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80166 2961 Jonas Oreland 2009-08-05 ndb - revert bug#45899
[12 Aug 19:26]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80699 2962 Jonas Oreland 2009-08-12 ndb - bug#45899 Fix bug in previous patch, causing platforms with CLOCK_MONOTONIC but not PTHREAD_CONDATTR_SETCLOCK to misbehave (e.g dl145a-c)
[12 Aug 20:09]
Bugs System
A patch for this bug has been committed. After review, it may be pushed to the relevant source trees for release in the next version. You can access the patch from: http://lists.mysql.com/commits/80706 3008 Jonas Oreland 2009-08-12 ndb - bug#45899 - disable previous fix as it cause many tsts to fail in autotest
[12 Aug 20:13]
Bugs System
Pushed into 5.1.35-ndb-6.3.26 (revid:jonas@mysql.com-20090812180907-gg0o88k94c5h4i7b) (version source revid:jonas@mysql.com-20090812180907-gg0o88k94c5h4i7b) (merge vers: 5.1.35-ndb-6.3.26) (pib:11)
[12 Aug 20:13]
Bugs System
Pushed into 5.1.35-ndb-7.0.7 (revid:jonas@mysql.com-20090812181054-euhuy5ncf49fzupv) (version source revid:jonas@mysql.com-20090812181054-euhuy5ncf49fzupv) (merge vers: 5.1.35-ndb-7.0.7) (pib:11)
[12 Aug 20:22]
Jonas Oreland
reverted current patch is it causes many test-failures in autotest basically connect time went from 100ms to 10s new solution should be along the lines 1) if connect fails, retry again after 100ms 2) if server-side is not in CONNECTING instead of closing connection, send back BLOCKED (in text) and have client retry after 100ms (which might lead to a new BLOCKED) 3) This requires a new state CONNECTION_PENDING or similar on both server and client side. This will not consume any extra ports Maybe maybe a patch based on SO_LINGER (which also can remove TIME_WAIT) could be acceptable as a intermediate solution...
