Description:
MaxDB returns wrong results (less than would be correct) for a SUM() aggregation if joined with another table (1 record) and two (2) columns in GROUP BY clause (see queries).
Unfortunately I couldn't come up with a test scenario that will safely reproduce this behavior. But I regard the issue serious enough to not leave it unreported.
How to repeat:
Unfortunately I haven't been able to come up with a test case that certainly will reproduce this error. I will try to come up with more information as soon as possible. Here's what I got so far:
db parameter JOIN_OPERATOR_IMPLEMENTATION ='YES' or 'NO' (doesn't show up when set to 'IMPROVED').
DDL of involved tables and views:
CREATE TABLE ag_hourlymaster
(
timestampid TIMESTAMP NOT NULL,
groupid FIXED(5) NOT NULL,
urlid FIXED(9) NOT NULL,
categoryid FIXED(6) DEFAULT 1 NOT NULL,
userid FIXED(9) NOT NULL,
referrerid FIXED(9) NOT NULL,
serverinfoid FIXED(9) NOT NULL,
applianceid FIXED(9) NOT NULL,
transcodeid FIXED(9) NOT NULL,
actionid FIXED(5) NOT NULL,
status FIXED(9) NOT NULL,
userinfoid FIXED(9) NULL,
refcount FIXED(9) NULL,
avgresponsetime FIXED(14,3) NULL,
avgbytecount FIXED(14,3) NULL,
avgbytesdropped FIXED(14,3) NULL,
avgpacketsrequested FIXED(14,3) NULL,
avgpacketsserved FIXED(14,3) NULL,
avgpacketsresent FIXED(14,3) NULL,
avgpacketsdropped FIXED(14,3) NULL,
avgplaytime FIXED(14,3) NULL,
bandwidth FIXED(14,3) NULL,
avgconnecttime FIXED(14,3) NULL,
avgbrowsetime FIXED(14,3) NULL,
reqmodid FIXED(6) NULL,
respmodid FIXED(6) NULL,
virusid FIXED(9) DEFAULT 1 NOT NULL,
CONSTRAINT PK_hourlymaster PRIMARY KEY
( timestampid, applianceid, virusid, categoryid,
transcodeid, serverinfoid, actionid,
userid, status, urlid, referrerid, groupid)
)
//
CREATE INDEX IDX_hourly_userid ON ag_hourlymaster(userid)
//
CREATE INDEX IDX_hourly_appid ON ag_hourlymaster(applianceid)
//
CREATE VIEW v_ag_hourlymaster AS
SELECT timestampid,
groupid,
urlid,
categoryid,
virusid,
userid,
referrerid,
serverinfoid,
applianceid,
transcodeid,
actionid,
0 AS abortcodeid,
status,
userinfoid,
refcount,
avgresponsetime,
avgbytecount,
avgbytesdropped,
avgpacketsrequested,
avgpacketsserved,
avgpacketsresent,
avgpacketsdropped,
avgplaytime,
bandwidth,
avgconnecttime,
avgbrowsetime,
reqmodid,
respmodid,
YEAR(timestampid) AS dt_year,
MONTH(timestampid) AS dt_month,
DAY(timestampid) AS dt_day,
DAYOFWEEK(timestampid) AS dt_dayofweek,
HOUR(timestampid) AS dt_hour,
FLOOR((MONTH(timestampid)-1)/3)+1 AS dt_quarter,
DATE(timestampid) AS dt_date,
(CASE WHEN ((transcodeid > 40 AND transcodeid < 56 ) OR
transcodeid = 201 OR transcodeid = 230 OR transcodeid = 231)
THEN 1
ELSE 0
END) AS cache_hits,
(CASE WHEN (transcodeid = 90 OR transcodeid = 145 OR transcodeid = 96)
THEN 1
ELSE 0
END) AS reload_refs,
0 AS abort_refs,
(CASE WHEN (transcodeid > 89 AND transcodeid < 146)
THEN 1
ELSE 0
END) AS non_cacheable_object_refs,
(CASE WHEN ((transcodeid > 40 AND transcodeid < 56 ) OR
transcodeid = 201 OR transcodeid = 230 OR transcodeid = 231)
THEN 1
ELSE 0
END) AS obj_hit_rate,
(CASE WHEN ((transcodeid > 40 AND transcodeid < 56 ) OR
transcodeid = 201 OR transcodeid = 230 OR transcodeid = 231)
THEN avgbytecount
ELSE 0
END) AS byte_hit_rate,
(CASE WHEN transcodeid < 201
THEN 1
ELSE 0
END) AS tcp_refs,
(CASE WHEN (transcodeid > 200 and transcodeid < 230)
THEN 1
ELSE 0
END) AS icp_refs,
(CASE WHEN ((transcodeid > 70 AND transcodeid < 180) OR
transcodeid = 221 OR transcodeid = 232)
THEN 1
ELSE 0
END) AS miss_refs,
(CASE WHEN ((transcodeid > 40 AND transcodeid < 56 ) OR
transcodeid = 201 OR transcodeid = 230 OR transcodeid = 231)
THEN avgbytecount
ELSE 0
END) AS cache_bytes,
(CASE WHEN ((transcodeid > 70 AND transcodeid < 180) OR
transcodeid = 221 OR transcodeid = 232)
THEN avgbytecount
ELSE 0
END) AS miss_bytes,
(CASE WHEN (transcodeid > 89 AND transcodeid < 146)
THEN avgbytecount
ELSE 0
END) AS non_cacheable_bytes
FROM ag_hourlymaster
//
CREATE TABLE ag_abortcode
(
abortcodeid FIXED(5) CONSTRAINT pk_abortcode PRIMARY KEY,
abortcode VARCHAR(100) NOT NULL,
description VARCHAR(255) NULL
)
//
CREATE UNIQUE INDEX IDX_abortcode
ON ag_abortcode ( abortcode )
//
INSERT INTO ag_abortcode ( abortcodeid, abortcode, description ) VALUES ( 0, '-', '-' )
Query with correct result:
SELECT count(*) AS COL3, SUM(TA2.refcount)
FROM ag_abortcode TA1, v_ag_hourlymaster TA2
WHERE TA1.abortcodeid=TA2.abortcodeid AND ( TA2.applianceid=1 )
Queries with wrong result:
SELECT count(*) AS COL3, SUM(TA2.refcount)
FROM ag_abortcode TA1, v_ag_hourlymaster TA2
WHERE TA1.abortcodeid=TA2.abortcodeid AND ( TA2.applianceid=1 )
GROUP BY TA1.abortcode, TA1.description
SELECT TA1.abortcode AS COL1, TA1.description AS COL2, SUM(TA2.refcount) AS COL3
FROM ag_abortcode TA1, v_ag_hourlymaster TA2
WHERE TA1.abortcodeid=TA2.abortcodeid AND ( TA2.applianceid=1 )
GROUP BY TA1.abortcode, TA1.description
Note that the join is done on a column that is set to a const value in the view.
The error occurs as soon as there are two GROUP BY columns.
Another possible cause might be that statistics may be updated during bulk insert of data into ag_hourlymaster.
I wasn't able to reproduce this by just dumping the two tables and loading them into an empty database. I'm still searching for a simple test case.
Suggested fix:
The easiest fix would probably to disable JOIN_OPERATOR_IMPLEMENTATION and leave it at 'IMPROVED'. However, I don't know what side effects that would have.