| Bug #101256 | Regexp_engine::Replace doesn't reset error code after processing a record | ||
|---|---|---|---|
| Submitted: | 21 Oct 2020 6:02 | Modified: | 5 Nov 2020 22:56 |
| Reporter: | Hope Lee (OCA) | Email Updates: | |
| Status: | Closed | Impact on me: | |
| Category: | MySQL Server: Optimizer | Severity: | S2 (Serious) |
| Version: | 8.0.13, 8.0.22 | OS: | CentOS |
| Assigned to: | CPU Architecture: | Any | |
[21 Oct 2020 7:13]
MySQL Verification Team
Hello Lee, Thank you for the report and test case. - 8.0.22 (no crash in latest GA) mysql> SELECT REGEXP_REPLACE(a, ' \+', 'aamz') from t1; +----------------------------------+ | REGEXP_REPLACE(a, ' \+', 'aamz') | +----------------------------------+ | aamzaaa | | | +----------------------------------+ 2 rows in set (0.00 sec) Please ensure to re-send the patch via "Contribution" tab. Otherwise we would not be able to accept it. regards, Umesh
[21 Oct 2020 8:01]
Hope Lee
Fix the issue that Regexp_engine::Replace doesn't reset error code after processing a record. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Bugfix-Regexp_engine-Replace-doesn-t-reset-error-cod.patch (application/octet-stream, text), 4.24 KiB.
[21 Oct 2020 8:06]
Hope Lee
Update the patch. (*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.
Contribution: 0001-Bugfix-Regexp_engine-Replace-doesn-t-reset-error-cod.patch (application/octet-stream, text), 4.24 KiB.
[5 Nov 2020 22:56]
Jon Stephens
Documented fix as follows in the NDB 8.0.23 changelog:
When REGEXP_REPLACE() was used in an SQL statement, the internal
function Regexp_engine::Replace() did not reset the error code
value after handling a record, which could affect processing of
the next record, which lead to issues.
Our thanks to Hope Lee for the contribution.
Closed.

Description: When REGEXP_REPLACE is used in a SQL, the function Regexp_engine::Replace() is responsible for dealing with records one by one. But after processing on a record, the function doesn't reset the value of error code(Regexp_engine::m_error_code). And the error code will affect the execution of processing on the next record, which may cause a problem. How to repeat: In MySQL 8.0.13, the following SQL will cause a crash issue: CREATE TABLE t1 (a varchar(255)); INSERT INTO t1 VALUES (" \aaa"), (""); SELECT REGEXP_REPLACE(a, ' \+', 'aamz') from t1; I have tested it in the newer versions of MySQL server. It will not crash anymore. Because you guys have done the following changes(commit: e79e8bcbeb9c). diff --git a/sql/regexp/regexp_engine.cc b/sql/regexp/regexp_engine.cc index 7361ea5d46c..a89280ef51b 100644 --- a/sql/regexp/regexp_engine.cc +++ b/sql/regexp/regexp_engine.cc @@ -137,6 +137,7 @@ void Regexp_engine::AppendHead(size_t size) { } int Regexp_engine::TryToAppendReplacement(const std::u16string &replacement) { + if (m_replace_buffer.empty()) return 0; UChar *ptr = pointer_cast<UChar *>(&m_replace_buffer.at(0) + m_replace_buffer_pos); int capacity = m_replace_buffer.size() - m_replace_buffer_pos; @@ -170,6 +171,7 @@ void Regexp_engine::AppendReplacement(const std::u16string &replacement) { } int Regexp_engine::TryToAppendTail() { + if (m_replace_buffer.empty()) return 0; UChar *ptr = pointer_cast<UChar *>(&m_replace_buffer.at(0) + m_replace_buffer_pos); int capacity = m_replace_buffer.size() - m_replace_buffer_pos; But I think the above fix doesn't fix the root cause of the problem, just avoid crashing. When an empty string is passed, it should just return m_current_subject, no need to call function Regexp_engine::AppendTail(). Please take the above SQL for a try. Suggested fix: The simple fix is resetting Regexp_engine::m_error_code at the end of Regexp_engine::Replace() function: modified sql/regexp/regexp_engine.cc @@ -101,6 +101,7 @@ const std::u16string &Regexp_engine::Replace(const std::u16string &replacement, AppendTail(); check_icu_status(m_error_code); + m_error_code = U_ZERO_ERROR; m_replace_buffer.resize(m_replace_buffer_pos); return m_replace_buffer; } A more perfect way is resetting Regexp_engine::m_error_code in check_icu_status() function. Because the Regexp_engine::Matches() also has the same issue. And all these functions will call check_icu_status() to check the status at the end. modified sql/regexp/errors.cc @@ -74,8 +74,11 @@ std::unordered_map<UErrorCode, int, UErrorCodeHash> error_map = { {U_REGEX_INVALID_CAPTURE_GROUP_NAME, ER_REGEXP_INVALID_CAPTURE_GROUP_NAME}, {U_REGEX_INVALID_FLAG, ER_REGEXP_INVALID_FLAG}}; -bool check_icu_status(UErrorCode status, const UParseError *parse_error) { - if (status == U_ZERO_ERROR || U_SUCCESS(status)) return false; +bool check_icu_status(UErrorCode *status, const UParseError *parse_error) { + if (U_SUCCESS(status)) { + *status = U_ZERO_ERROR; + return false; + } int error_code = error_map[status]; if (error_code == 0) {