Bug #84245 LOAD XML duplicates prior element's values when encountering empty element
Submitted: 17 Dec 2016 23:03 Modified: 28 Dec 2016 0:02
Reporter: Lawrence Gallagher Email Updates:
Status: Verified Impact on me:
None 
Category:MySQL Server: DML Severity:S2 (Serious)
Version:5.5/5.6/5.7/8.0 OS:MacOS
Assigned to: CPU Architecture:Any
Tags: XML

[17 Dec 2016 23:03] Lawrence Gallagher
Description:
When importing data from a survey in XML format, a question response formatted as an empty XML element (i.e., '<response questionid="102"/>') is not imported as empty or missing. Rather, the field in the database record acquires the value of the previously read response. This is clearly erroneous - if a user skips question 102, that question should not appear as blank or missing in the SQL data table, not filled with the most recent non-null value! This occurs whether the response is formatted as '<response questionid="102"/>' or '<response questionid="102"></response>'

How to repeat:
Sample XML file to import (test.xml):
Note that records survey id 2 and 3 contain empty responses for questioned 102. survey id 4 is simply missing the element altogether (and that record imports correctly) 

<?xml version="1.0" encoding="UTF-8"?>
<data>
	<record surveyid="1">
		<response questionid="101">One</response>
		<response questionid="102">Two</response>
		<response questionid="103">Three</response>
	</record>
	<record surveyid="2">
		<response questionid="101">One</response>
		<response questionid="102"/>
		<response questionid="103">Three</response>
	</record>
	<record surveyid="3">
		<response questionid="101">One</response>
		<response questionid="102"></response>
		<response questionid="103">Three</response>
	</record>
	<record surveyid="4">
		<response questionid="101">One</response>
		<response questionid="103">Three</response>
	</record>
</data>

Here is the SQL Import Code

create table demo (
	surveyid TEXT,
	questionid TEXT,
	response TEXT
);

LOAD XML local INFILE '<path...>/test.xml' 
INTO TABLE demo rows identified by '<response>';

select * from demo.demo;

Result Note replicated "One" values for survey id 2 and 3.

> select * from demo.demo

+ ------------- + --------------- + ------------- +
| surveyid      | questionid      | response      |
+ ------------- + --------------- + ------------- +
| 1             | 101             | One           |
| 1             | 102             | Two           |
| 1             | 103             | Three         |
| 2             | 101             | One           |
| 2             | 102             | One           |
| 2             | 103             | Three         |
| 3             | 101             | One           |
| 3             | 102             | One           |
| 3             | 103             | Three         |
| 4             | 101             | One           |
| 4             | 103             | Three         |
+ ------------- + --------------- + ------------- +
11 rows

Suggested fix:
Treat the empty elements as truly missing?  Use "NA" values or some such?  Whatever the fix, they should NOT be duplicating values from a previous field!
[18 Dec 2016 21:34] MySQL Verification Team
Thank you for the bug report. Verified as described.
[28 Dec 2016 0:02] Lawrence Gallagher
Typo in original submission. Change 

"if a user skips question 102, that question should not appear as blank or missing"

To

"if a user skips question 102, that question SHOULD appear as blank or missing"

The substance of the bug report is still correct, as is the example.
[6 Dec 2022 9:01] linda hd
LOAD XML duplicates the prior element's values while taking the Dgcustomerfirst survey.
Dollar General store offers a variety of delicious food items that everyone can enjoy. From tasty snacks to hearty meals, there is something for everyone at Dollar General. So be sure to take the time to participate in the survey to help improve the store and its offerings. Winners of the Dollar general survey can win a $100 gift card.
Visit the Dgcustomerfirst survey site https://dgcustomerfirst.blog/