To compile the most prominent iceberg algorithms in both processing perspectives, the pseudocode of each selection logic is presented below.
Centralized Perspective: Iceberg Algorithms of the STAR family
Algorithm 1: Top-down Computation 24
Function TDC(arrOrder, size)
Inputs:
arrOrder[]: attributes to order the select statement by
size: number of attributes in arrOrder
Globals:
minsup: minimum support
numTuples: number of tuples in the database table
results: table that contains the results of TDC()
Locals:
arrNext[]: holds the tuple being examined
arrHold[]: the last unique tuple examined
strOrder: string containing the ordering of fields
i, j: counters
curTable: temporary table that holds the results of the order-by query
Method:
1. strOrder = arrOrder[0];
2. for i = 1 to size - 1
3.   strOrder = strOrder & "," & arrOrder[i]
4. end for
5. Run("select " & strOrder & " from " & tableName & " order by " & strOrder);
6. for i = 0 to size - 1
7.   arrHold[i] = curTable[0, i];
8. end for
9. for i = 0 to numTuples
10.   arrNext[0] = curTable[i, 0]; // first attribute of current tuple
11.   for j = 0 to size - 1
12.     if j > 0 then
13.       arrNext[j] = arrNext[j - 1] & "-" & curTable[i, j];
14.     end if
15.     if arrNext[j] != arrHold[j] then
16.       if count[j] >= minsup then
17.         Run("insert into results (Combination, Count)
18.              values (" & arrHold[j] & "," & count[j] & ")");
19.       end if
20.       count[j] = 0;
21.       arrHold[j] = arrNext[j];
22.     end if
23.     count[j]++;
24.   end for
25. end for
26. return(0);
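To complement Algorithm 1, the following is a minimal Python sketch of the same top-down pass (illustrative only: the SQL calls are replaced with an in-memory sort, a final flush of the last open groups is added, and all names are hypothetical):

```python
def tdc(rows, size, minsup):
    """Top-down pass over rows sorted on all `size` attributes.

    Emits (prefix-combination, count) pairs whose count >= minsup,
    mirroring Algorithm 1's single scan with hold/next buffers.
    """
    rows = sorted(rows)                      # the "order by" step
    results = []
    # arrHold: the last unique prefix seen for each prefix length
    hold = ["-".join(map(str, rows[0][:j + 1])) for j in range(size)]
    count = [0] * size
    for row in rows:
        for j in range(size):
            nxt = "-".join(map(str, row[:j + 1]))   # prefix of length j+1
            if nxt != hold[j]:                      # prefix changed: close group
                if count[j] >= minsup:
                    results.append((hold[j], count[j]))
                count[j] = 0
                hold[j] = nxt
            count[j] += 1
    for j in range(size):                    # flush the final open groups
        if count[j] >= minsup:
            results.append((hold[j], count[j]))
    return results
```

Because the data is fully sorted once, every prefix length is counted in the same scan, which is what makes the top-down approach a single-pass computation.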
Algorithm 2: Bottom-up Computation 25
Inputs:
input: The relation to aggregate.
dim: The starting dimension for this iteration.
Globals:
Constant numDims: The total number of dimensions.
Constant cardinality[numDims]: The cardinality of each dimension.
Constant minsup: The minimum number of tuples in a partition for it to be output.
outputRec: The current output record.
dataCount[numDims]: Stores the size of each partition. dataCount[i] is a list of integers of size cardinality[i].
Outputs:
One record that is the aggregation of input.
Recursively, outputs CUBE(dim, ..., numDims) on input (with minimum support).
Method:
Procedure BottomUpCube(input, dim)
1.  Aggregate(input); // Places result in outputRec
2.  if input.count() == 1 then // Optimization
      WriteAncestors(input[0], dim); return;
3.  write outputRec;
4.  for d = dim; d < numDims; d++ do
5.    let C = cardinality[d];
6.    Partition(input, d, C, dataCount[d]);
7.    let k = 0;
8.    for i = 0; i < C; i++ do // For each partition
9.      let c = dataCount[d][i];
10.     if c >= minsup then // The BUC stops here
11.       outputRec.dim[d] = input[k].dim[d];
12.       BottomUpCube(input[k ... k+c], d+1);
13.     end if
14.     k += c;
15.   end for
16.   outputRec.dim[d] = ALL;
17. end for

Algorithm 3: Star-Cubing 26
Inputs:
(1) A relational table R
(2) an iceberg condition, min_sup (taking count as the measure)
Outputs:
The computed iceberg cube.
Each star-tree corresponds to one cube-tree node, and vice versa.
Method:
Begin
  scan R twice, create star-table S and star-tree T;
  output count of T.root;
  call starcubing(T, T.root);
End

Procedure starcubing(T, cnode) // cnode: current node
{
1.  for each non-null child C of T's cube-tree
2.    insert or aggregate cnode to the corresponding position or node in C's star-tree;
3.  if (cnode.count >= min_sup) {
4.    if (cnode != root)
5.      output cnode.count;
6.    if (cnode is a leaf)
7.      output cnode.count;
8.    else { // initiate a new cube-tree
9.      create CC as a child of T's cube-tree;
10.     let TC be CC's star-tree;
11.     TC.root's count = cnode.count;
12.   }
13. }
14. if (cnode is not a leaf)
15.   call starcubing(T, cnode.first_child);
16. if (CC is not null) {
17.   call starcubing(TC, TC.root);
18.   remove CC from T's cube-tree;
    }
19. if (cnode has sibling)
20.   call starcubing(T, cnode.sibling);
21. remove T;
}
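Before moving on to the distributed perspective, the bottom-up pruning at the heart of Algorithms 2 and 3 can be sketched in Python (a simplified, in-memory illustration; the function name, the dict-based partitioning, and the prefix encoding are assumptions, not the authors' implementation):

```python
def buc(rows, dim, num_dims, minsup, prefix, out):
    """Bottom-up cube on `rows`, starting at dimension `dim`.

    `prefix` holds the (dimension, value) pairs already fixed; every
    group-by whose partition reaches `minsup` is recorded in `out`.
    Partitions below `minsup` are never recursed into -- the iceberg
    pruning that gives BUC (and Star-Cubing) its efficiency.
    """
    out[tuple(prefix)] = len(rows)           # write outputRec
    for d in range(dim, num_dims):
        # Partition(input, d, ...): group rows by their value in dimension d
        parts = {}
        for r in rows:
            parts.setdefault(r[d], []).append(r)
        for val, part in sorted(parts.items()):
            if len(part) >= minsup:          # "The BUC stops here"
                buc(part, d + 1, num_dims, minsup,
                    prefix + [(d, val)], out)
        # dimension d reverts to ALL before the next dimension is tried
    return out
```

Note how the recursion only ever descends into partitions that already satisfy the iceberg condition, so the cost of sparse regions of the cube is never paid.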
Distributed Perspective: Iceberg Algorithms of the PnP family
Algorithm 4: Sequential Pipe'n Prune, in main memory 27
Input:
R[1..n]: a table representing the raw data set consisting of n rows R[i], i = 1, ..., n;
v: identifier for a group-by of R;
pv: a prefix of v.
Output:
The iceberg data cube.
Locals:
k = |v| − |pv|;
Rj: tables for storing rows of R;
b[1..k]: a buffer for storing k rows, one for each group-by v1, ..., vk;
h[1..k]: k integers;
i, j: integer counters.
Method:
1:  h[1..k] = [1..1]; b[1..k] = [null..null].
2:  for i = 1..n do
3:    for j = k..1 do
4:      if (b[j] = null) OR (the feature values of b[j] are a prefix of R[i]) then
5:        Aggregate Rj[i] into b[j].
6:      else
7:        if b[j] has minimum support then
8:          Output b[j] into group-by vj.
9:          if j ≤ k − 2 then
10:           Create a table Rj = R̂j+1[h[j]] ... R̂j+1[i − 1].
11:           Sort and aggregate Rj.
12:           Call PnP-1(Rj, v̂j+1, vj).
13:         end if
14:       end if
15:       Set b[j] = null and h[j] = i.
16:     end if
17:   end for
18: end for
19: Create a table R[1..n1] by sorting and aggregating R̂1[1] ... R̂1[n].
20: Call PnP-1(R, v̂1, ∅).

27 The pseudocode of the first PnP algorithm, covering main-memory processing, was taken from [Chen et al., 2005].
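The "pipe" phase of Algorithm 4, in which one scan of the sorted relation emits all k prefix group-bys at once, can be sketched as follows (a simplified Python illustration; the recursive PnP-1 calls on pruned partitions are omitted and all names are hypothetical):

```python
def pnp_pipe(rows, k, minsup):
    """Single scan of `rows` (sorted on all k dimensions) that emits,
    for every prefix length j = 1..k, the groups meeting `minsup`.
    Mirrors the buffered scan of Algorithm 4; the recursive PnP-1
    calls that re-sort and process pruned partitions are left out."""
    rows = sorted(rows)
    # b[j]: (prefix values, aggregated count) of the open group of length j+1
    b = [None] * k
    groupbys = [[] for _ in range(k)]
    for row in rows:
        for j in range(k - 1, -1, -1):       # j = k..1 in the pseudocode
            pref = row[:j + 1]
            if b[j] is None or b[j][0] == pref:
                b[j] = (pref, (b[j][1] + 1) if b[j] else 1)  # aggregate
            else:
                if b[j][1] >= minsup:        # minimum-support test
                    groupbys[j].append(b[j])
                b[j] = (pref, 1)             # open a new group
    for j in range(k):                       # flush groups still open
        if b[j] and b[j][1] >= minsup:
            groupbys[j].append(b[j])
    return groupbys
```

The key property being illustrated is that a single sort in the order v1...vk pays for k group-bys simultaneously, which is what the "pipe" in Pipe'n Prune refers to.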
Algorithm 5: Sequential Pipe'n Prune, in external memory 28
Input:
R[1..n]: a table (stored on disk) representing the raw data set consisting of n rows R[i], i = 1, ..., n;
v: identifier for a group-by of R;
pv: a prefix of v.
Output:
The iceberg data cube (stored on disk).
Locals:
k = |v| − |pv|;
Rj: tables for storing rows of R (called partitions);
Fj: disk files for storing multiple partitions Rj;
b[1..k]: a buffer for storing k rows, one for each group-by v1, ..., vk;
h[1..k]: k integers;
i, j: integer counters.
Method:
1.  b[1..k] = [null..null]; h[1..k] = [1..1].
2.  for i = 1..n (while reading R[i] from disk in streaming mode...) do
3.    for j = k..1 do
4.      if (b[j] = null) OR (the feature values of b[j] are a prefix of R[i]) then
5.        Aggregate Rj[i] into b[j].
6.      else
7.        if b[j] has minimum support then
8.          Output b[j] into group-by vj. Flush to disk if vj's buffer is full.
9.          if j ≤ k − 2 then
10.           Create a table Rj = R̂j+1[h[j]] ... R̂j+1[i − 1].
11.           Sort and aggregate Rj (using external memory sort if necessary).
12.           Write the resulting Rj and an "end-of-partition" symbol to file Fj.
13.         end if
14.       end if
15.       Set b[j] = null and h[j] = i.
16.     end if
17.   end for
18. end for
19. for j = k..1 do
20.   for each partition Rj written to disk file Fj in line 12 do
21.     Call PnP-2(Rj, v̂j+1, vj).
22.   end for
23. end for
24. Create a table R[1..n1] by sorting and aggregating R̂1[1] ... R̂1[n] (using external memory sort if necessary).
25. Call PnP-2(R, v̂1, ∅).
28 The pseudocode of the second PnP algorithm, covering external-memory processing, was taken from [Chen et al., 2005].
Algorithm 6: Distributed Pipe'n Prune 29
Input:
R: a table representing a d-dimensional raw data set consisting of n rows, stored on p processors. Every processor Pi stores (on disk) a table Ri of n/p rows of R.
min_sup: the minimum support.
Output:
The iceberg data cube (distributed over the disks of the p processors).
Variables:
On each processor Pi, a set of d tables Ti1 ... Tid.
Method:
1. for j = 1..d do
2.   Each processor Pi: Compute Tij from Ti(j-1) via sequential sort (Ti1 = Ri).
3.   Perform a parallel global sort on T1j ∪ T2j ∪ ... ∪ Tpj.
4. end for
5. for j = 1..d do
6.   Each processor Pi: Apply Algorithm 4 to Tij.
7. end for
29 The pseudocode of the third PnP algorithm, covering iceberg cube processing in a distributed environment, was taken from [Chen et al., 2005]. Note that this algorithm is a distributed extension of Algorithm 4.
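To make the data movement of Algorithm 6 concrete, here is a toy Python simulation (processors are modelled as plain lists; the local runs of Algorithm 4 and the min_sup handling are not included, and every name is an illustrative assumption):

```python
from itertools import chain

def pnp_distributed_sim(partitions, d):
    """Toy simulation of the sort rounds of Algorithm 6 on p
    "processors", each a plain Python list of rows. For every dimension
    rotation j, the data is globally sorted in that order and re-split
    evenly across the processors, leaving each one ready for a local
    run of the sequential PnP (Algorithm 4, not simulated here)."""
    p = len(partitions)
    rounds = []
    for j in range(d):
        # sort order for round j: the rotation d_j, d_(j+1), ..., d_(j-1)
        order = lambda row, j=j: row[j:] + row[:j]
        merged = sorted(chain.from_iterable(partitions), key=order)
        # re-split the globally sorted run into p balanced partitions
        size = -(-len(merged) // p)          # ceiling division
        partitions = [merged[i * size:(i + 1) * size] for i in range(p)]
        rounds.append(partitions)
    return rounds
```

The simulation shows why only d global sorts are needed: each rotation makes one dimension leading, so each processor's local PnP pass can then produce its share of the group-bys independently.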
OPTIMIZAÇÃO DE ESTRUTURAS MULTIDIMENSIONAIS DE DADOS EM AMBIENTES OLAP
Table 9. Comparison of the iceberg methods across different scenarios