Number: Name:


--- INSTITUTO SUPERIOR TÉCNICO

Decision Support Systems (Sistemas de Apoio à Decisão), Exam 2, 15 July 2010

---

1. (6 pts) Some easy questions to start with.

1) DW (2 pts) Mark the following statements as true (V) or false (F). Each correct answer is worth +0.5 points, each wrong answer -0.25 points, and an unanswered statement 0 points.

a. (0.5 pts)

(max()+min())*2 is an algebraic distributive aggregation function.

(F) Everyone got 0.25 for this answer!

b. (0.5 pts)

During multidimensional data analysis all cuboids are materialized.

(F)

c. (0.5 pts)

A cuboid with finer-granularity data can be generated from a coarser-granularity data cuboid.

(F)

d. (0.5 pts)

A covariance matrix is always symmetric.

(V)


2) DM (4 pts) Mark the following statements as true (V) or false (F). Each correct answer is worth +1 point, each wrong answer -0.5 points, and an unanswered statement 0 points.

(a) (1 pts)

Complete linkage and single linkage may give different results when used in tree clustering.

(V)

(b) (1 pts)

Sequence clustering suffers from the same problems as k-means: how to choose the number of clusters k, and a random initialization that leads to different results.

(V)

(c) (1 pts)

A multilayer perceptron may overfit.

(V)

(d) (1 pts)

A multilayer perceptron with ten hidden layers and a linear activation function is more powerful than a perceptron with one hidden layer and a linear activation function.

(F)
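The answer to the last statement rests on the fact that a composition of linear layers is itself a single linear map. A minimal sketch of that fact, assuming bias-free layers and hypothetical weight matrices `W1`, `W2`, `W3` that are not part of the exam:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in a]

# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output.
# Every layer uses the linear (identity) activation and no bias.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]
W3 = [[2.0, 1.0]]

def deep_linear(x):
    """Forward pass through all linear layers, one after the other."""
    return matvec(W3, matvec(W2, matvec(W1, x)))

# The whole stack collapses into one matrix, i.e. one linear layer.
W_single = matmul(W3, matmul(W2, W1))

def single_linear(x):
    """Forward pass through the single equivalent linear layer."""
    return matvec(W_single, x)
```

Whatever the number of linear hidden layers, the network computes the same family of functions as a single linear layer, which is why extra linear layers add no power.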


2. (3 pts) DW

Suppose that a data warehouse consists of three dimensions, time, employee and branch, and one measure, count. Suppose that the dimension time has the hierarchy day, month, quarter, year, and that the dimension employee is defined by the attributes name and programming skills.

(a) (1 pts)

Present a star schema for this DW.

(b) (1 pts)

Present a snowflake schema for this DW.

(c) (1 pts)

How many cuboids are present in this cube?

(a)


(c)

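L = 5*2*2 = 20, counting for each dimension its number of hierarchy levels plus the virtual top level "all": (4+1) for time and (1+1) each for employee and branch.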
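The count above is the product of (levels + 1) over all dimensions. A small sketch of that formula, assuming the level counts used in the solution (4 for time, 1 each for employee and branch); the function name `count_cuboids` is illustrative:

```python
from math import prod

def count_cuboids(levels_per_dimension):
    """Number of cuboids = product over dimensions of (levels + 1).

    The +1 accounts for the virtual top level 'all' of each dimension.
    """
    return prod(levels + 1 for levels in levels_per_dimension)

# Level counts as used in the solution: time has day, month, quarter, year.
levels = {"time": 4, "employee": 1, "branch": 1}
total = count_cuboids(levels.values())
print(total)
```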


3. (2 pts) fp-growth

Consider the following set of transactions:

| TID | List of Items |
|---|---|
| 1 | Beer, Diaper, Baby Powder, Bread, Umbrella |
| 2 | Diaper, Baby Powder |
| 3 | Beer, Diaper, Milk |
| 4 | Diaper, Beer, Detergent |
| 5 | Beer, Milk, Sumol |

(a) (1 pts) Construct an fp-tree from the transaction database with a minimum support of Min_sup=50%.

(a)

Min_sup = 50% of 5 transactions = 2.5, so a frequent item needs a support of at least 3/5.

| Item | Support |
|---|---|
| Beer | 4/5 |
| Diaper | 4/5 |
| Baby Powder | 2/5 |
| Bread | 1/5 |
| Umbrella | 1/5 |
| Milk | 2/5 |
| Detergent | 1/5 |
| Sumol | 1/5 |


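fp-tree

Transactions restricted to the frequent items (Beer, Diaper):

| TID | List of Items |
|---|---|
| 1 | Beer, Diaper |
| 2 | Diaper |
| 3 | Beer, Diaper |
| 4 | Beer, Diaper |
| 5 | Beer |

[fp-tree figure not reproduced. Inserting the transactions above gives the two branches root -> Beer:4 -> Diaper:3 and root -> Diaper:1.]

(b) (1 pts) Determine the conditional pattern base.

| Item | Conditional pattern base |
|---|---|
| Diaper | Beer:3 |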
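The construction can be checked mechanically. The following is a minimal sketch of fp-tree building and conditional-pattern-base extraction, not a full fp-growth implementation; the `Node` class, the function names, and the alphabetical tie-break between equally frequent items are assumptions made for illustration (the tie-break happens to place Beer before Diaper, as in the solution).

```python
from collections import Counter

transactions = [
    ["Beer", "Diaper", "Baby Powder", "Bread", "Umbrella"],
    ["Diaper", "Baby Powder"],
    ["Beer", "Diaper", "Milk"],
    ["Diaper", "Beer", "Detergent"],
    ["Beer", "Milk", "Sumol"],
]
MIN_COUNT = 3  # 50% of 5 transactions is 2.5, so at least 3 occurrences

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_count):
    """Return the tree root and, per frequent item, the list of its nodes."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = [item for item, c in counts.items() if c >= min_count]
    # Descending support; ties broken alphabetically (an assumption).
    order = sorted(frequent, key=lambda item: (-counts[item], item))
    root = Node(None, None)
    nodes = {item: [] for item in order}
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            if item not in node.children:
                node.children[item] = Node(item, node)
                nodes[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, nodes

def conditional_pattern_base(nodes, item):
    """Collect (prefix path, count) for every node of `item` with a non-empty prefix."""
    base = []
    for node in nodes[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
    return base

root, nodes = build_fp_tree(transactions, MIN_COUNT)
print(conditional_pattern_base(nodes, "Diaper"))
```

Diaper nodes that hang directly under the root contribute an empty prefix, so the sketch leaves them out of the pattern base.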


4. (3 pts) ID3

Calculate the decision tree for the examples given by the table (attributes: "Hair Length", "Weight" and "Age") with target "Class" using the ID3 algorithm. Show your calculation steps.

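p(M)=5/9, p(F)=4/9

I(table) = -5./9*log2(5./9) - 4./9*log2(4./9) = 0.99108

E(Hair Length) = 2/9*(-1/2*log2(1/2) - 1/2*log2(1/2)) + 2/9*(-1/2*log2(1/2) - 1/2*log2(1/2)) = 0.44444

Gain(Hair Length) = 0.99108 - 0.44444 = 0.54664

E(Weight) = 0 /* All weights are different */

Gain(Weight) = 0.99108

E(Age) = 0 /* All ages are different */

Gain(Age) = 0.99108

Done: Weight and Age tie for the highest gain, so ID3 splits on either one of them and every branch is already a pure leaf.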
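These gains can be reproduced with a few lines of code. The sketch below assumes the per-attribute (value, class) columns that this solution lists in its binary-tree alternative; since the original example table is not reproduced here, each attribute is evaluated on its own sorted column. The function names are illustrative.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def expected_entropy(values, labels):
    """Weighted entropy after a multiway split on every distinct value (ID3)."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# (values, classes) per attribute, each column sorted by value.
columns = {
    "Hair Length": ([0, 1, 2, 4, 6, 6, 8, 10, 10], list("MMMFFMFFM")),
    "Weight": ([20, 78, 90, 150, 160, 170, 180, 200, 250], list("FFMFFMMMM")),
    "Age": ([1, 8, 10, 34, 36, 38, 41, 45, 70], list("FFMFMMFMM")),
}

gains = {
    name: entropy(labels) - expected_entropy(values, labels)
    for name, (values, labels) in columns.items()
}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.5f}")
```

Attributes whose values are all distinct always reach the maximal gain, which is a known bias of plain ID3 towards many-valued attributes.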


Alternative way, binary decision tree (not required here):

Continuous-valued attributes (see slides): create a discrete attribute to test a continuous one. For example, for Temperature = 24.5°C the test (Temperature > 20.0°C) takes a value in {true, false}.

Where to set the threshold?

| Temperature | 15°C | 18°C | 19°C | 22°C | 24°C | 27°C |
|---|---|---|---|---|---|---|
| PlayTennis | No | No | Yes | Yes | Yes | No |

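Hair Length

| Hair Length | 0 | 1 | 2 | 4 | 6 | 6 | 8 | 10 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| Class | M | M | M | F | F | M | F | F | M |

Hair_Length > 2 (six examples, 4 F and 2 M; the remaining three are all M)

E(Hair_Length>2) = 6./9*(-4./6*log2(4./6) - 2./6*log2(2./6)) = 0.61220

Weight

| Weight | 20 | 78 | 90 | 150 | 160 | 170 | 180 | 200 | 250 |
|---|---|---|---|---|---|---|---|---|---|
| Class | F | F | M | F | F | M | M | M | M |

Weight >= 170 (four examples, all M; the remaining five are 4 F and 1 M)

E(Weight>=170) = 5./9*(-4./5*log2(4./5) - 1./5*log2(1./5)) = 0.40107

Age

| Age | 1 | 8 | 10 | 34 | 36 | 38 | 41 | 45 | 70 |
|---|---|---|---|---|---|---|---|---|---|
| Class | F | F | M | F | M | M | F | M | M |

Age < 45 (seven examples, 4 F and 3 M; the remaining two are both M)

E(Age<45) = 7./9*(-4./7*log2(4./7) - 3./7*log2(3./7)) = 0.76629

Among these three tests, Weight >= 170 has the lowest expected entropy and therefore the highest gain.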
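The three expected entropies can be recomputed as follows. This is a sketch assuming the predicates Hair_Length > 2, Weight >= 170 and Age < 45, which are the ones that match the branch sizes 6/9, 5/9 and 7/9 used in the calculations; `split_entropy` is an illustrative name.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def split_entropy(values, labels, predicate):
    """Expected entropy of the binary split induced by `predicate`."""
    inside = [y for v, y in zip(values, labels) if predicate(v)]
    outside = [y for v, y in zip(values, labels) if not predicate(v)]
    n = len(labels)
    return sum(len(part) / n * entropy(part) for part in (inside, outside) if part)

hair = ([0, 1, 2, 4, 6, 6, 8, 10, 10], list("MMMFFMFFM"))
weight = ([20, 78, 90, 150, 160, 170, 180, 200, 250], list("FFMFFMMMM"))
age = ([1, 8, 10, 34, 36, 38, 41, 45, 70], list("FFMFMMFMM"))

e_hair = split_entropy(*hair, lambda v: v > 2)
e_weight = split_entropy(*weight, lambda v: v >= 170)
e_age = split_entropy(*age, lambda v: v < 45)
print(round(e_hair, 5), round(e_weight, 5), round(e_age, 5))
```

In each of the three splits one branch is pure, so only the mixed branch contributes to the expected entropy.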


5. (3 pts) Clustering

Given the data

$$\vec{x}_j = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 2 \end{pmatrix} \right\}$$

with

$$\mu_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mu_2 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$$

and

$$\Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and P(C=i)=1, α=1. Perform one step of the EM clustering algorithm. What are the new values of $\mu_i, \Sigma_i, w_i$?

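By Bayes' rule

$$p_{ij} = \alpha \, P(\vec{x}_j \mid C=i) \, P(C=i)$$

Because we initialize our parameters arbitrarily, we set P(C=i)=1 and α=1. The general density is

$$P(\vec{x} \mid C=i) = \frac{1}{(2\pi)^{d/2} \, |\Sigma_{C=i}|^{1/2}} \exp\left( -\frac{1}{2} (\vec{x}-\mu_{C=i})^T \, \Sigma_{C=i}^{-1} \, (\vec{x}-\mu_{C=i}) \right)$$

With the given data d=2, and because of

$$\Sigma_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

we get

$$P(\vec{x} \mid C=i) = \frac{1}{2\pi} \exp\left( -\frac{1}{2} (\vec{x}-\mu_{C=i})^T (\vec{x}-\mu_{C=i}) \right)$$

The update rules are

$$p_i = \sum_{j=1}^{n} p_{ij}$$

$$\mu_i \leftarrow \sum_{j=1}^{n} \frac{p_{ij} \, \vec{x}_j}{p_i}$$

$$\Sigma_i \leftarrow \sum_{j=1}^{n} \frac{p_{ij} \, \vec{x}_j \vec{x}_j^T}{p_i}$$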

p11 = exp(-2./2)./(2*pi) = exp(-1)./(2*pi) = 0.058550

p12 = exp(-1./2)./(2*pi) = 0.096532

p13 = exp(-2./2)./(2*pi) = exp(-1)./(2*pi) = 0.058550

p21 = exp(-8./2)./(2*pi) = exp(-4)./(2*pi) = 0.0029150

p22 = exp(-5./2)./(2*pi) = 0.013064

p23 = exp(0)./(2*pi) = 0.15915

p1 = 0.058550+0.096532+0.058550 = 0.21363

p2 = 0.0029150+0.013064+0.15915 = 0.17513

µ1 = ([0 0]*0.058550+[1 0]*0.096532+[2 2]*0.058550)./0.21363 = [1 0.5481]

µ2 = ([0 0]*0.0029150+[1 0]*0.013064+[2 2]*0.15915)./0.17513 = [1.8921 1.8175]

Σ1 = ([0 0]'*[0 0]*0.058550+[1 0]'*[1 0]*0.096532+[2 2]'*[2 2]*0.058550)./0.21363 =

$$\Sigma_1 = \begin{pmatrix} 1.5481 & 1.0963 \\ 1.0963 & 1.0963 \end{pmatrix}$$

Σ2 = ([0 0]'*[0 0]*0.0029150+[1 0]'*[1 0]*0.013064+[2 2]'*[2 2]*0.15915)./0.17513 =

$$\Sigma_2 = \begin{pmatrix} 3.7096 & 3.6350 \\ 3.6350 & 3.6350 \end{pmatrix}$$
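The arithmetic above can be verified with a short script. The sketch follows the formulas of this solution as written, which is an assumption worth stating: the responsibilities p_ij are left unnormalized (P(C=i)=1, α=1) and Σ_i is updated with the uncentered products x_j x_j^T, which differs from the mean-centered covariance update commonly given for EM. The function names are illustrative.

```python
from math import exp, pi

X = [(0.0, 0.0), (1.0, 0.0), (2.0, 2.0)]   # data points x_j
MU = [(1.0, 1.0), (2.0, 2.0)]              # initial means mu_1, mu_2

def density(x, mu):
    """2-D Gaussian density with identity covariance."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, mu))
    return exp(-d2 / 2) / (2 * pi)

def em_step(X, MU):
    """One update per cluster; returns a list of (p_i, mu_i, Sigma_i)."""
    results = []
    for mu in MU:
        p = [density(x, mu) for x in X]    # p_ij with P(C=i)=1 and alpha=1
        p_i = sum(p)
        new_mu = tuple(
            sum(pj * x[k] for pj, x in zip(p, X)) / p_i for k in range(2)
        )
        # Uncentered second moment, as in the update rule of this solution.
        new_sigma = [
            [sum(pj * x[r] * x[c] for pj, x in zip(p, X)) / p_i for c in range(2)]
            for r in range(2)
        ]
        results.append((p_i, new_mu, new_sigma))
    return results

results = em_step(X, MU)
for i, (p_i, mu, sigma) in enumerate(results, start=1):
    print(f"p{i} = {p_i:.5f}, mu{i} = {mu}, Sigma{i} = {sigma}")
```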

Given the weights w1={w11=0,w12=0,w13=0,w14=0}, W1={W11=1,W21=1}, and the activation function σ(x), perform one step of stochastic gradient descent with η=1 for the input vector x={2,0,1,0}={x1=2,x2=0,x3=1,x4=0} and the target t={1,1}. Determine $\Delta W_{ij}$ and $\Delta w_{jk}$ for the first adaptation step.

Forward pass:

V1 = 1./(1+exp(-(0))) = 0.5

O1 = 1./(1+exp(-(0.5))) = 0.62246

O2 = 1./(1+exp(-(0.5))) = 0.62246

Output layer:

δ1 = (1-0.62246)*0.62246*(1-0.62246) = 0.088723

δ2 = (1-0.62246)*0.62246*(1-0.62246) = 0.088723

ΔW11 = ΔW21 = 0.088723*0.5 = 0.044362, so W11 = W21 = 1+0.088723*0.5 = 1.0444

Hidden layer:

δ1 = 0.5*(1-0.5)*(1*0.088723+1*0.088723) = 0.044361

w11 = 0+2*0.044361 = 0.088722

w12 = 0+0*0.044361 = 0

w13 = 0+1*0.044361 = 0.044361

w14 = 0+0*0.044361 = 0

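Formulas used:

$$f(x) = \sigma(x) = \frac{1}{1+e^{-x}}$$

$$\Delta W_{ij} = (t_i - o_i) \, \sigma(net_i) \, (1 - \sigma(net_i)) \, V_j$$

For the output layer:

$$\delta_1 = (t_1 - o_1) \, \sigma(net_1) \, (1 - \sigma(net_1)), \quad \Delta W_{1j} = \delta_1 V_j$$

For the hidden unit:

$$\delta_1 = \sigma(net_1) \, (1 - \sigma(net_1)) \sum_{i=1}^{2} W_{i1} \, \delta_i$$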
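The adaptation step can be reproduced with the sketch below. It assumes the architecture implied by the solution: four inputs, a single hidden unit with weights w, two output units with weights W, sigmoid activations everywhere and no bias terms. Variable names such as `delta_out` and `delta_hidden` are illustrative.

```python
from math import exp

def sigmoid(z):
    """Logistic activation function sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + exp(-z))

x = [2.0, 0.0, 1.0, 0.0]     # input vector
t = [1.0, 1.0]               # targets
w = [0.0, 0.0, 0.0, 0.0]     # input -> hidden weights w11..w14
W = [1.0, 1.0]               # hidden -> output weights W11, W21
eta = 1.0                    # learning rate

# Forward pass
V = sigmoid(sum(wk * xk for wk, xk in zip(w, x)))   # hidden activation
O = [sigmoid(Wi * V) for Wi in W]                   # output activations

# Backward pass: output deltas first, then the hidden delta
delta_out = [(ti - oi) * oi * (1 - oi) for ti, oi in zip(t, O)]
delta_hidden = V * (1 - V) * sum(Wi * di for Wi, di in zip(W, delta_out))

# Weight changes for this single training example
dW = [eta * d * V for d in delta_out]
dw = [eta * delta_hidden * xk for xk in x]

print("dW =", [round(v, 6) for v in dW])
print("dw =", [round(v, 6) for v in dw])
```

The hidden delta is computed with the old values of W, before the output weights are updated, which matches the order of the calculation in the solution.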