ROLLUP, CUBE, GROUPING Functions and GROUPING SETS
This article gives an overview of the functionality available for aggregation in data warehouses, focusing specifically on the information required for the Oracle Database SQL Expert (1Z0-047) exam.- Setup
- GROUP BY
- ROLLUP
- CUBE
- GROUPING Functions
- GROUPING SETS
- Composite Columns
- Concatenated Groupings
GROUP BY
Let's start be reminding ourselves how theGROUP BY
clause works. An aggregate function takes multiple rows of data returned
by a query and aggregates them into a single result row.SELECT fact_1_id, fact_2_id, COUNT(*) AS num_rows, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY fact_1_id, fact_2_id ORDER BY fact_1_id, fact_2_id;
ROLLUP
In addition to the regular aggregation results we expect from theGROUP BY
clause, the ROLLUP
extension produces group subtotals from right to left and a grand total. If "n" is the number of columns listed in the ROLLUP
, there will be n+1 levels of subtotals.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY ROLLUP (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
CUBE
In addition to the subtotals generated by theROLLUP
extension, the CUBE
extension will generate subtotals for all combinations of the
dimensions specified. If "n" is the number of columns listed in the CUBE
, there will be 2n subtotal combinations.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value FROM dimension_tab GROUP BY CUBE (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
GROUPING Functions
GROUPING
It can be quite easy to visually identify subtotals generated by rollups and cubes, but to do it programatically you really need something more accurate than the presence of null values in the grouping columns. This is where theGROUPING
function comes in. It
accepts a single column as a parameter and returns "1" if the column
contains a null value generated as part of a subtotal by a ROLLUP
or CUBE
operation or "0" for any other value, including stored null values.The following query is a repeat of a previous cube, but the
GROUPING
function has been added for each of the dimensions in the cube.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value, GROUPING(fact_1_id) AS f1g, GROUPING(fact_2_id) AS f2g FROM dimension_tab GROUP BY CUBE (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
GROUPING_ID
TheGROUPING_ID
function provides an alternate and more
compact way to identify subtotal rows. Passing the dimension columns as
arguments, it returns a number indicating the GROUP BY
level.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id FROM dimension_tab GROUP BY CUBE (fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
GROUP_ID
It's possible to write queries that return the duplicate subtotals, which can be a little confusing. TheGROUP_ID
function assigns the value "0" to the first set, and all subsequent
sets get assigned a higher number. The following query forces duplicates
to show the GROUP_ID
function in action.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id, GROUP_ID() AS group_id FROM dimension_tab GROUP BY GROUPING SETS(fact_1_id, CUBE (fact_1_id, fact_2_id)) ORDER BY fact_1_id, fact_2_id
GROUPING SETS
Calculating all possible subtotals in a cube, especially those with many dimensions, can be quite an intensive process. If you don't need all the subtotals, this can represent a considerable amount of wasted effort. The following cube with three dimensions gives 8 levels of subtotals (GROUPING_ID: 0-7), shown here.If we only need a few of these levels of subtotaling we can use theSELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id FROM dimension_tab GROUP BY CUBE(fact_1_id, fact_2_id, fact_3_id) ORDER BY fact_1_id, fact_2_id, fact_3_id;
GROUPING SETS
expression and specify exactly which ones we need, saving us having to
calculate the whole cube. In the following query we are only interested
in subtotals for the "FACT_1_ID, FACT_2_ID
" and "FACT_1_ID, FACT_3_ID
" groups.SELECT fact_1_id, fact_2_id, fact_3_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id, fact_3_id) AS grouping_id FROM dimension_tab GROUP BY GROUPING SETS((fact_1_id, fact_2_id), (fact_1_id, fact_3_id)) ORDER BY fact_1_id, fact_2_id, fact_3_id;
Composite Columns
ROLLUP
and CUBE
consider each column independently when deciding which subtotals must be calculated. For ROLLUP
this means stepping back through the list to determine the groupings.ROLLUP (a, b, c) (a, b, c) (a, b) (a) ()
CUBE
creates a grouping for every possible combination of columns.Composite columns allow columns to be grouped together with braces so they are treated as a single unit when determining the necessary groupings. In the followingCUBE (a, b, c) (a, b, c) (a, b) (a, c) (a) (b, c) (b) (c) ()
ROLLUP
columns "a" and "b" have
been turned into a composite column by the additional braces. As a
result the group of "a" is not longer calculated as the column "a" is
only present as part of the composite column in the statement.In a similar way, the possible combinations of the followingROLLUP ((a, b), c) (a, b, c) (a, b) () Not considered: (a)
CUBE
are reduced because references to "a" or "b" individually are not
considered as they are treated as a single column when the groupings are
determined.CUBE ((a, b), c) (a, b, c) (a, b) (c) () Not considered: (a, c) (a) (b, c) (b)
Concatenated Groupings
Concatenated groupings are defined by putting together multipleGROUPING SETS
, CUBE
s or ROLLUP
s
separated by commas. The resulting groupings are the cross-product of
all the groups produced by the individual grouping sets. It might be a
little easier to understand what this means by looking at an example.
The following GROUPING SET
results in 2 groups of subtotals, one for the fact_1_id
column and one for the fact_id_2
column.SELECT fact_1_id, fact_2_id, SUM(sales_value) AS sales_value, GROUPING_ID(fact_1_id, fact_2_id) AS grouping_id FROM dimension_tab GROUP BY GROUPING SETS(fact_1_id, fact_2_id) ORDER BY fact_1_id, fact_2_id;
Fuente: http://www.oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets.php
No hay comentarios:
Publicar un comentario