Bug #101256 Regexp_engine::Replace doesn't reset error code after processing a record
Submitted: 21 Oct 2020 6:02 Modified: 5 Nov 2020 22:56
Reporter: Hope Lee (OCA) Email Updates:
Status: Closed Impact on me:
None 
Category:MySQL Server: Optimizer Severity:S2 (Serious)
Version:8.0.13, 8.0.22 OS:CentOS
Assigned to: CPU Architecture:Any

[21 Oct 2020 6:02] Hope Lee
Description:
When REGEXP_REPLACE is used in a SQL, the function Regexp_engine::Replace() is responsible for dealing with records one by one. 

But after processing on a record, the function doesn't reset the value of error code(Regexp_engine::m_error_code). And the error code will affect the execution of processing on the next record, which may cause a problem.

How to repeat:
In MySQL 8.0.13, the following SQL will cause a crash issue:
CREATE TABLE t1 (a varchar(255));
INSERT INTO t1 VALUES (" \aaa"), ("");
SELECT REGEXP_REPLACE(a, ' \+', 'aamz') from t1;

I have tested it in the newer versions of MySQL server. It will not crash anymore. Because you guys have done the following changes(commit: e79e8bcbeb9c).
diff --git a/sql/regexp/regexp_engine.cc b/sql/regexp/regexp_engine.cc
index 7361ea5d46c..a89280ef51b 100644
--- a/sql/regexp/regexp_engine.cc
+++ b/sql/regexp/regexp_engine.cc
@@ -137,6 +137,7 @@ void Regexp_engine::AppendHead(size_t size) {
 }

 int Regexp_engine::TryToAppendReplacement(const std::u16string &replacement) {
+  if (m_replace_buffer.empty()) return 0;
   UChar *ptr =
       pointer_cast<UChar *>(&m_replace_buffer.at(0) + m_replace_buffer_pos);
   int capacity = m_replace_buffer.size() - m_replace_buffer_pos;
@@ -170,6 +171,7 @@ void Regexp_engine::AppendReplacement(const std::u16string &replacement) {
 }

 int Regexp_engine::TryToAppendTail() {
+  if (m_replace_buffer.empty()) return 0;
   UChar *ptr =
       pointer_cast<UChar *>(&m_replace_buffer.at(0) + m_replace_buffer_pos);
   int capacity = m_replace_buffer.size() - m_replace_buffer_pos;

But I think the above fix doesn't fix the root cause of the problem, just avoid crashing. When an empty string is passed, it should just return m_current_subject, no need to call function Regexp_engine::AppendTail(). Please take the above SQL for a try.
 

Suggested fix:
The simple fix is resetting Regexp_engine::m_error_code at the end of Regexp_engine::Replace() function:
modified   sql/regexp/regexp_engine.cc
@@ -101,6 +101,7 @@ const std::u16string &Regexp_engine::Replace(const std::u16string &replacement,
   AppendTail();

   check_icu_status(m_error_code);
+  m_error_code = U_ZERO_ERROR;
   m_replace_buffer.resize(m_replace_buffer_pos);
   return m_replace_buffer;
 }

A more perfect way is resetting Regexp_engine::m_error_code in check_icu_status() function. Because the Regexp_engine::Matches() also has the same issue. And all these functions will call check_icu_status() to check the status at the end.
modified   sql/regexp/errors.cc
@@ -74,8 +74,11 @@ std::unordered_map<UErrorCode, int, UErrorCodeHash> error_map = {
     {U_REGEX_INVALID_CAPTURE_GROUP_NAME, ER_REGEXP_INVALID_CAPTURE_GROUP_NAME},
     {U_REGEX_INVALID_FLAG, ER_REGEXP_INVALID_FLAG}};

-bool check_icu_status(UErrorCode status, const UParseError *parse_error) {
-  if (status == U_ZERO_ERROR || U_SUCCESS(status)) return false;
+bool check_icu_status(UErrorCode *status, const UParseError *parse_error) {
+  if (U_SUCCESS(status)) {
+    *status = U_ZERO_ERROR;
+    return false;
+  }

   int error_code = error_map[status];
   if (error_code == 0) {
[21 Oct 2020 7:13] MySQL Verification Team
Hello Lee,

Thank you for the report and test case.

- 8.0.22 (no crash in latest GA)
mysql> SELECT REGEXP_REPLACE(a, ' \+', 'aamz') from t1;
+----------------------------------+
| REGEXP_REPLACE(a, ' \+', 'aamz') |
+----------------------------------+
| aamzaaa                          |
|                                  |
+----------------------------------+
2 rows in set (0.00 sec)

Please ensure to re-send the patch via "Contribution" tab. Otherwise we would not be able to accept it.

regards,
Umesh
[21 Oct 2020 8:01] Hope Lee
Fix the issue that Regexp_engine::Replace doesn't reset error code after processing a record.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Bugfix-Regexp_engine-Replace-doesn-t-reset-error-cod.patch (application/octet-stream, text), 4.24 KiB.

[21 Oct 2020 8:06] Hope Lee
Update the patch.

(*) I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it.

Contribution: 0001-Bugfix-Regexp_engine-Replace-doesn-t-reset-error-cod.patch (application/octet-stream, text), 4.24 KiB.

[5 Nov 2020 22:56] Jon Stephens
Documented fix as follows in the NDB 8.0.23 changelog:

    When REGEXP_REPLACE() was used in an SQL statement, the internal
    function Regexp_engine::Replace() did not reset the error code
    value after handling a record, which could affect processing of
    the next record, which lead to issues.

    Our thanks to Hope Lee for the contribution.

Closed.