[jira] [Comment Edited] (OFBIZ-5115) wrong COUNT DISTINCT if there's GROUP BY clause

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Comment Edited] (OFBIZ-5115) wrong COUNT DISTINCT if there's GROUP BY clause

Nicolas Malin (Jira)

    [ https://issues.apache.org/jira/browse/OFBIZ-5115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540723#comment-13540723 ]

Leon edited comment on OFBIZ-5115 at 12/29/12 1:57 AM:
-------------------------------------------------------

Jacques,

We use OFBiz as the back-end framework of our LIMS (Laboratory Information Management System). It requires very heavy data statistics. In this case, we defined a static view entity to do some basic summary job and use "select" with distinct options to get desired result data-set. The data itself is no problem, but we found the total count it returned is always a little bigger than expected.

After some research, we found the root cause: the SQL with group by implies to select all the fields to be grouped by, that's why "count" and /or "count(distinct)" does not work as usual. e.g.

SELECT COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID -> its count result is same as SELECT COUNTRY_GEO_ID, COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID. and,

SELECT COUNT(DISTINCT COUNTRY_GEO_ID) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID -> its data set result size equals that of  SELECT COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID, COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID.

               
      was (Author: utcb):
    Jacques,

We use OFBiz as the back-end framework of our LIMS (Laboratory Information Management System). It requires very heavy data statistics. In this case, we defined a static view entity to get some basic summary job and use "select" with distinct options to get desired result data-set. The data itself is no problem, but we found the total count it returned is always a little bigger than expected.

After some research, we found the root cause: the SQL with group by implies to select all the fields to be grouped by, that's why "count" and /or "count(distinct)" does not work as usual. e.g.

SELECT COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID -> its count result is same as SELECT COUNTRY_GEO_ID, COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID. and,

SELECT COUNT(DISTINCT COUNTRY_GEO_ID) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID -> its data set result size equals that of  SELECT COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID, COUNT(*) FROM POSTAL_ADDRESS GROUP BY COUNTRY_GEO_ID, STATE_PROVINCE_GEO_ID.

                 

> wrong COUNT DISTINCT if there's GROUP BY clause
> -----------------------------------------------
>
>                 Key: OFBIZ-5115
>                 URL: https://issues.apache.org/jira/browse/OFBIZ-5115
>             Project: OFBiz
>          Issue Type: Bug
>          Components: framework
>    Affects Versions: SVN trunk
>            Reporter: Leon
>            Priority: Minor
>             Fix For: SVN trunk
>
>         Attachments: OFBIZ-5115.patch
>
>
> if there's a "group by" and distinct find options set, then the count of select returned is always the total number of non-distinct.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira