Description:
Decoding a UTF-8 string fails if the string starts with the byte \xEF. That byte is treated as the start of a BOM marker (the UTF-8 BOM is the three-byte sequence \xEF\xBB\xBF) and skipped, which leaves invalid UTF-8 data. This happens in the function str_decode() in <cdk/foundation/string.h>, which invokes rapidjson code to perform the UTF-8 conversion. It first creates a rapidjson input stream from the raw bytes that contain the UTF-8 data:
rapidjson::EncodedInputStream<FROM, Mem_stream<char> > input(bytes);
It also creates an output stream and calls rapidjson::Transcoder<>::Transcode(input, output). When EncodedInputStream<> is constructed, its constructor looks for a BOM marker at the beginning of the data and skips it if present:
EncodedInputStream(InputByteStream& is) : is_(is) {
current_ = Encoding::TakeBOM(is_);
}
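This leads to the issue: a valid UTF-8 string can legitimately start with \xEF (it is the lead byte of every character in the range U+F000 to U+FFFF), so skipping that byte corrupts the data passed to the transcoder.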
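The effect can be modelled without rapidjson or the connector. The following is a minimal sketch, assuming the BOM check consumes each byte before comparing it. Byte_stream, take_bom_eager() and bytes_seen_by_decoder() are hypothetical names made up for this illustration; this is not the actual rapidjson or cdk code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for an input byte stream (not a rapidjson/cdk class).
struct Byte_stream {
    std::string data;
    std::size_t pos;
    explicit Byte_stream(const std::string &d) : data(d), pos(0) {}
    // Returns the next byte and advances, or '\0' at end of input.
    char take() { return pos < data.size() ? data[pos++] : '\0'; }
};

// Assumed model of an eager BOM check: every byte is consumed before it is
// compared, so a lone leading 0xEF is lost when the next byte is not 0xBB.
char take_bom_eager(Byte_stream &is) {
    char c = is.take();
    if (static_cast<unsigned char>(c) != 0xEF) return c;
    c = is.take();
    if (static_cast<unsigned char>(c) != 0xBB) return c;
    c = is.take();
    if (static_cast<unsigned char>(c) != 0xBF) return c;
    return is.take();  // full BOM seen: continue with the byte after it
}

// Returns the bytes a decoder would see after the eager BOM check.
std::string bytes_seen_by_decoder(const std::string &input) {
    std::string seen;
    if (input.empty()) return seen;
    Byte_stream is(input);
    seen.push_back(take_bom_eager(is));
    while (is.pos < is.data.size()) seen.push_back(is.take());
    return seen;
}
```

Under this model, bytes_seen_by_decoder("\xEF\xBC\x88") yields "\xBC\x88": two continuation bytes without a lead byte, which is not valid UTF-8. A string that starts with the full BOM, or with any byte other than \xEF, passes through as expected.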
How to repeat:
Session sess(...);
auto res = sess.sql("SELECT '\xef\xbc\x88'").execute();
Value row = res.fetchOne().get(0); // throws an exception here
Suggested fix:
Use a different input stream implementation that does not interpret BOM markers. rapidjson may already provide such a class.
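One possible shape for such a stream is sketched below, under the assumption that the transcoder only needs Peek(), Take() and Tell() on its input. Raw_byte_stream, decode_one() and first_code_point() are hypothetical names for this illustration, and the small decoder stands in for the real transcoder; it handles only 1 to 3 byte sequences and does not reject overlong forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical raw stream: hands out bytes exactly as stored, with no BOM logic.
class Raw_byte_stream {
public:
    explicit Raw_byte_stream(const std::string &bytes) : bytes_(bytes), pos_(0) {}
    char Peek() const { return pos_ < bytes_.size() ? bytes_[pos_] : '\0'; }
    char Take() { return pos_ < bytes_.size() ? bytes_[pos_++] : '\0'; }
    std::size_t Tell() const { return pos_; }
private:
    std::string bytes_;
    std::size_t pos_;
};

// Decodes one UTF-8 code point (1 to 3 byte sequences only, for brevity).
// Returns U+FFFD for anything this sketch does not handle.
std::uint32_t decode_one(Raw_byte_stream &is) {
    unsigned char b0 = static_cast<unsigned char>(is.Take());
    if (b0 < 0x80) return b0;  // single-byte (ASCII) sequence
    int extra = (b0 & 0xE0) == 0xC0 ? 1 : (b0 & 0xF0) == 0xE0 ? 2 : 0;
    if (extra == 0) return 0xFFFD;
    std::uint32_t cp = b0 & (extra == 1 ? 0x1F : 0x0F);
    for (int i = 0; i < extra; ++i) {
        unsigned char b = static_cast<unsigned char>(is.Take());
        if ((b & 0xC0) != 0x80) return 0xFFFD;  // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

// Convenience wrapper: decodes the first code point of a UTF-8 string.
std::uint32_t first_code_point(const std::string &utf8) {
    Raw_byte_stream is(utf8);
    return decode_one(is);
}
```

With this stream, first_code_point("\xEF\xBC\x88") returns 0xFF08, the character from the test case above. A genuine BOM is no longer dropped silently: it reaches the decoder as U+FEFF, so the caller would have to strip it explicitly if that is wanted.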