INSTITUTO SUPERIOR TÉCNICO Sistemas de Apoio à Decisão


Sistemas de Apoio à Decisão (Decision Support Systems), Exam 2, Solution, 25 January 2016

---

1. (5 pts) Some easy questions to start

1) Mark each of the following statements as true (V) or false (F). Each correct answer is worth +1 point, each wrong answer -0.5 points, and an unanswered statement 0 points.

(a) (1 pts) (V)

Suppose that a DW consists of five dimensions (time_1, time_2, geography, product, customer) and one measure, count, where count is the number of products sold.

Suppose that the dimension time_1 has the hierarchy: day, month.

Suppose that the dimension time_2 has the hierarchy: month, year.

Suppose that the dimension geography has the hierarchy: continent, country.

Then the number of cuboids is 108.

Each dimension contributes its number of hierarchy levels plus one (the extra level is the top level "all"), so the three two-level dimensions contribute a factor 3 each and the two dimensions without a hierarchy a factor 2 each:

(1+2)*(1+2)*(1+2)*2*2 = 3*3*3*2*2 = 108

(b) (1 pts) (F)

Shannon's entropy (information) measured in bits can have negative values.

$$I = H(F) = -\sum_i p_i \log_2 p_i$$

Since $0 \le p_i \le 1$, every term $-p_i \log_2 p_i$ is non-negative, so the entropy cannot be negative.
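The answers to (a) and (b) can be checked numerically. The sketch below is only an illustration: the function names `cuboid_count` and `entropy_bits` are hypothetical, and the level counts passed in are read off the statement of (a) (two levels for time_1, time_2 and geography, one level for product and customer).

```python
from math import log2, prod


def cuboid_count(levels_per_dimension):
    """Product of (levels + 1) over all dimensions; +1 is the top level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)


def entropy_bits(probabilities):
    """Shannon entropy in bits; terms with p = 0 are skipped (they contribute 0)."""
    return sum(-p * log2(p) for p in probabilities if p > 0)


# time_1, time_2, geography: 2 levels each; product, customer: 1 level each.
print(cuboid_count([2, 2, 2, 1, 1]))  # 108

# Entropy is 0 for a certain outcome and positive otherwise, never negative.
for distribution in ([1.0], [0.5, 0.5], [0.2, 0.2, 0.2, 0.4]):
    print(distribution, entropy_bits(distribution))
```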


(c) (1pts) (F)

In a good clustering, the members of each cluster should be as close to each other as possible, and the number of clusters should be greater than five.

(d) (1pts) (F)

A support vector machine with the kernel <x,y> is more powerful than a perceptron with two hidden layers and a linear activation function.

(e) (1pts) (V)

A multilayer perceptron with one hidden layer and the activation function f(x)=e*x can only classify linearly separable problems.
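The reason behind (d) and (e) is that stacked layers with a linear activation collapse into a single linear map. The sketch below illustrates this for f(x)=e*x; the weights, the input and the helper names are hypothetical, and biases are left out for brevity.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Matrix-matrix product for plain Python lists."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def linear_activation(values, slope=math.e):
    """The activation f(x) = e*x applied element-wise."""
    return [slope * v for v in values]


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output unit.
hidden = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0]]
output = [[2.0, -1.0, 0.5]]
x = [0.3, -0.7]

# Network with one hidden layer, f applied in both layers.
two_layer = linear_activation(matvec(output, linear_activation(matvec(hidden, x))))

# Equivalent single linear map: e^2 * (output @ hidden).
collapsed = [[math.e ** 2 * w for w in row] for row in matmul(output, hidden)]
single_layer = matvec(collapsed, x)

print(two_layer, single_layer)
```

Both results agree up to floating-point rounding, so the hidden layer adds no expressive power and the decision boundary stays a hyperplane.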


2. (3 pts) Apriori

TID   Items bought
100   {a, b, c, d, f}
200   {a, c}
300   {a}
400   {a, b}

Using the Apriori algorithm, determine all strong rules with a minimum support of 50% (min_sup = 2/4) and a minimum confidence of 50% (min_conf = 2/4). Please show all steps of the algorithm.

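C1             L1
a     4/4      a     4/4
b     2/4      b     2/4
c     2/4      c     2/4
d     1/4
f     1/4

C2             L2
a, b  2/4      a, b  2/4
a, c  2/4      a, c  2/4
b, c  1/4

C3 is empty: the only candidate {a, b, c} is pruned because its subset {b, c} is not in L2.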

Strong rules:

Rule     Support      Confidence
a -> b   2/4 = 50%    2/4 = 50%
b -> a   2/4 = 50%    2/2 = 100%
a -> c   2/4 = 50%    2/4 = 50%
c -> a   2/4 = 50%    2/2 = 100%
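The steps above can be reproduced with a few lines of Python. This is a minimal sketch rather than an optimized implementation: the function names `support`, `apriori` and `strong_rules` are hypothetical, and supports are recounted by scanning all transactions each time.

```python
from itertools import combinations

transactions = {
    100: {"a", "b", "c", "d", "f"},
    200: {"a", "c"},
    300: {"a"},
    400: {"a", "b"},
}
MIN_SUP = 0.5
MIN_CONF = 0.5


def support(itemset):
    """Fraction of transactions that contain every item of itemset."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def apriori():
    """Return {frozenset: support} for all frequent itemsets."""
    items = sorted(set().union(*transactions.values()))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= MIN_SUP]
    return frequent


def strong_rules(frequent):
    """Return (lhs, rhs, support, confidence) for all rules above MIN_CONF."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= MIN_CONF:
                    rules.append((set(lhs), set(itemset - lhs), sup, confidence))
    return rules


for lhs, rhs, sup, conf in strong_rules(apriori()):
    print(lhs, "->", rhs, "support", sup, "confidence", conf)
```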


3. (3 pts) PCA

Suppose we have the following data points:

$$\{\vec{x}_i\} = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}$$

(a) (2 pts) What is the K-L transformation matrix?

The covariance matrix for a sample is (for a population we divide by n instead of n-1):

$$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}, \qquad c_{ij} = \frac{\sum_{k=1}^{n} \left(x_i^{(k)} - m_i\right)\left(x_j^{(k)} - m_j\right)}{n-1}$$

m1=(0+4+1)/3=5/3

m2=(0+0+1)/3=1/3

c11=((0-5/3)^2+(4-5/3)^2+(1-5/3)^2)/2=13/3=4.3333

c12=c21=((0-5/3)*(0-1/3)+(4-5/3)*(0-1/3)+(1-5/3)*(1-1/3))/2=-1/3=-0.3333

c22=((0-1/3)^2+(0-1/3)^2+(1-1/3)^2)/2=1/3=0.3333

$$C = \begin{pmatrix} \tfrac{13}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix} = \begin{pmatrix} 4.3333 & -0.3333 \\ -0.3333 & 0.3333 \end{pmatrix}$$

The system has to become linearly dependent (singular), so the determinant has to be zero:

$$|\lambda I - C| = \left(\lambda - \tfrac{13}{3}\right)\left(\lambda - \tfrac{1}{3}\right) - \tfrac{1}{3} \cdot \tfrac{1}{3} = \lambda^2 - \tfrac{14}{3}\lambda + \tfrac{4}{3} = 0$$

λ1=0.3058, λ2=4.3609
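The covariance matrix and its eigenvalues can be recomputed with the standard library alone. The sketch below assumes the three data points from the statement and uses the closed form for the roots of a 2x2 characteristic polynomial; the names `sample_cov`, `lambda_1` and `lambda_2` are hypothetical.

```python
from math import sqrt

points = [(0.0, 0.0), (4.0, 0.0), (1.0, 1.0)]
n = len(points)
mean = [sum(p[d] for p in points) / n for d in range(2)]


def sample_cov(i, j):
    """Sample covariance c_ij with the divisor n - 1."""
    return sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in points) / (n - 1)


C = [[sample_cov(i, j) for j in range(2)] for i in range(2)]

# Roots of lambda^2 - trace * lambda + det = 0.
trace = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
root = sqrt(trace ** 2 - 4 * det)
lambda_1 = (trace - root) / 2
lambda_2 = (trace + root) / 2

print(C)
print(lambda_1, lambda_2)
```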

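For λ1=0.3058 we solve $(\lambda_1 I - C)\,\vec{u} = 0$ with the first component of $\vec{u}$ set to 1. The off-diagonal entries of $\lambda I - C$ are $-c_{12} = 0.3333$:

$$\begin{pmatrix} 0.3058-4.3333 & 0.3333 \\ 0.3333 & 0.3058-0.3333 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = \begin{pmatrix} -4.0275 & 0.3333 \\ 0.3333 & -0.0275 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$u_2 = \frac{4.0275}{0.3333} \approx 12.1, \qquad \vec{u}_1 \approx \begin{pmatrix} 1 \\ 12.1 \end{pmatrix}$$

For λ2=4.3609:

$$\begin{pmatrix} 4.3609-4.3333 & 0.3333 \\ 0.3333 & 4.3609-0.3333 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = \begin{pmatrix} 0.0276 & 0.3333 \\ 0.3333 & 4.0276 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$u_2 = -\frac{0.0276}{0.3333} \approx -0.0828, \qquad \vec{u}_2 \approx \begin{pmatrix} 1 \\ -0.0828 \end{pmatrix}$$

The two normalized eigenvectors are

$$\vec{u}_1 \approx \begin{pmatrix} 0.0825 \\ 0.9966 \end{pmatrix}, \qquad \vec{u}_2 \approx \begin{pmatrix} 0.9966 \\ -0.0825 \end{pmatrix}$$

with

$$U = \begin{pmatrix} 0.0825 & 0.9966 \\ 0.9966 & -0.0825 \end{pmatrix}$$

Eigenvectors are determined only up to sign, so flipping the sign of a column gives an equally valid matrix, for example

$$U = \begin{pmatrix} -0.0825 & 0.9966 \\ -0.9966 & -0.0825 \end{pmatrix}$$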
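A quick numerical check of the eigenvector directions, as a sketch under stated assumptions: it takes the covariance matrix C = (13/3, -1/3; -1/3, 1/3) and the rounded eigenvalues 0.3058 and 4.3609 as given, and the helper names `eigenvector` and `residual` are hypothetical. Because the covariance c12 is negative, the eigenvector of the larger eigenvalue has components of opposite sign.

```python
from math import hypot

C = [[13 / 3, -1 / 3], [-1 / 3, 1 / 3]]


def eigenvector(lam):
    """Unit eigenvector of C for eigenvalue lam, first component positive.

    Uses the first row of (C - lam*I) u = 0 with u = (1, u2):
    (c11 - lam) + c12 * u2 = 0.
    """
    u2 = -(C[0][0] - lam) / C[0][1]
    norm = hypot(1.0, u2)
    return (1.0 / norm, u2 / norm)


def residual(lam, u):
    """Largest absolute component of C u - lam u (0 for an exact eigenpair)."""
    cu = [C[r][0] * u[0] + C[r][1] * u[1] for r in range(2)]
    return max(abs(cu[r] - lam * u[r]) for r in range(2))


u_1 = eigenvector(0.3058)
u_2 = eigenvector(4.3609)

print(u_1, residual(0.3058, u_1))
print(u_2, residual(4.3609, u_2))
print("dot product:", u_1[0] * u_2[0] + u_1[1] * u_2[1])
```

The residuals are small but not exactly zero because the eigenvalues are rounded to four decimals.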


(b) (1 pts) Which eigenvector is more significant? Can we apply the Kaiser criterion?

$\vec{u}_2$, the eigenvector that belongs to λ2, is more significant, because its eigenvalue is the larger one: λ2=4.3609 > λ1=0.3058.

Since λ1=0.3058 < 1, the Kaiser criterion can be applied: it keeps only the components whose eigenvalue is greater than 1, so $\vec{u}_2$ is kept and $\vec{u}_1$ is dropped.


4. (3 pts) Belief Networks

Determine the probability of "Alarm" given that "Earthquake" and "WatsonCalls" are true. "Burglary" and "RadioNews" are unknown.

RadioNews is an irrelevant variable!

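$$P(A \mid e, w) = \alpha \sum_{b} P(w \mid A)\, P(A \mid e, b)\, P(b)\, P(e)$$

$$P(A \mid e, w) = \alpha\, P(e)\, P(w \mid A) \sum_{b} P(A \mid e, b)\, P(b)$$

$$P(A \mid e, w) = \alpha\, P(e)\, P(w \mid A)\, \bigl(P(A \mid e, b)\, P(b) + P(A \mid e, \neg b)\, P(\neg b)\bigr)$$

P(a | e,w) = α*0.001*0.95*(0.99*0.1+0.5*0.9) = α*0.00052155

P(¬a | e,w) = α*0.001*0.1*(0.01*0.1+0.5*0.9) = α*0.0000451

Normalizing removes α:

P(a | e,w) = 0.00052155/(0.00052155+0.0000451) = 0.920409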
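The enumeration can be written out as a short script. This is a sketch under an explicit assumption: the network figure is not reproduced in this text, so the conditional probabilities below are simply the numbers that appear in the arithmetic above, and the names `unnormalized` and `posterior_alarm` are hypothetical.

```python
# Probabilities as they appear in the arithmetic above (assumed to match
# the network's tables, which are not reproduced in this text).
P_E = 0.001                                  # P(e)
P_B = {True: 0.1, False: 0.9}                # P(b), P(not b)
P_A_GIVEN_E_B = {True: 0.99, False: 0.5}     # P(a | e, b), P(a | e, not b)
P_W_GIVEN_A = {True: 0.95, False: 0.1}       # P(w | a), P(w | not a)


def unnormalized(alarm):
    """P(e) * P(w | A) * sum over b of P(A | e, b) * P(b), without alpha."""
    total = 0.0
    for burglary in (True, False):
        p_alarm = P_A_GIVEN_E_B[burglary]
        if not alarm:
            p_alarm = 1.0 - p_alarm
        total += p_alarm * P_B[burglary]
    return P_E * P_W_GIVEN_A[alarm] * total


def posterior_alarm():
    """P(a | e, w) after normalizing over a and not a."""
    yes, no = unnormalized(True), unnormalized(False)
    return yes / (yes + no)


print(unnormalized(True), unnormalized(False), posterior_alarm())
```

RadioNews does not appear in the script at all, which reflects that it is irrelevant to this query.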


5. (2 pts) Decision Tree Cluster

F1   F2   F3   F4   Output
c    a    b    x    n
a    a    c    a    t
a    b    b    a    t
c    b    c    x    m
a    b    b    a    f

Determine the decision tree for this set of examples with the target "Output" using the ID3 algorithm. Show your calculations.

p(n)=1/5, p(t)=2/5, p(m)=1/5, p(f)=1/5

Log2[x] = Log[x]/Log[2]

$$E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|}\, I(C_i)$$

$$gain(P) = I(C) - E(P)$$

I(table) = -3*1/5*Log2[1/5] - 2/5*Log2[2/5] = 1.92193 bits

F1 = F4: Ca=(t,t,f), Cc=Cx=(n,m)

I(Ca) = -2/3*Log2[2/3] - 1/3*Log2[1/3] = 0.918296 bits

I(Cc) = I(Cx) = -1/2*Log2[1/2] - 1/2*Log2[1/2] = 1 bit

E(F1) = 2/5*1+3/5*0.918296 = 0.950978

Gain(F1) = Gain(F4) = 1.92193-0.950978 = 0.970952

F2: Ca=(n,t), Cb=(t,m,f)

I(Ca) = -1/2*Log2[1/2] - 1/2*Log2[1/2] = 1 bit

I(Cb) = -3*1/3*Log2[1/3] = 1.58496 bits

E(F2) = 2/5*1+3/5*1.58496 = 1.35098

Gain(F2) = 1.92193-1.35098 = 0.57095

F3: Cb=(n,t,f), Cc=(t,m)

I(Cb) = -3*1/3*Log2[1/3] = Log2[3] = 1.58496 bits

I(Cc) = -2*1/2*Log2[1/2] = Log2[2] = 1 bit

E(F3) = 3/5*1.58496+2/5*1 = 1.35098

Gain(F3) = 1.92193-1.35098 = 0.57095
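The entropy and gain values can be recomputed with a short script. It is a sketch only: the row encoding and the function names `entropy`, `expected_entropy` and `gain` are hypothetical, while the data is the example table of this question.

```python
from collections import Counter
from math import log2

COLUMNS = ["F1", "F2", "F3", "F4", "Output"]
TABLE = [
    ("c", "a", "b", "x", "n"),
    ("a", "a", "c", "a", "t"),
    ("a", "b", "b", "a", "t"),
    ("c", "b", "c", "x", "m"),
    ("a", "b", "b", "a", "f"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in TABLE]


def entropy(rows):
    """I(C): entropy of the Output column in bits."""
    counts = Counter(row["Output"] for row in rows)
    return sum(-c / len(rows) * log2(c / len(rows)) for c in counts.values())


def expected_entropy(rows, feature):
    """E(P): size-weighted entropy of the subsets induced by feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def gain(rows, feature):
    """gain(P) = I(C) - E(P)."""
    return entropy(rows) - expected_entropy(rows, feature)


print("I(table) =", entropy(ROWS))
for feature in COLUMNS[:-1]:
    print(feature, "E =", expected_entropy(ROWS, feature), "gain =", gain(ROWS, feature))
```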

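We choose F1 (or F4, since they are equal) as the root.

The remaining table for Ca (F1 = a):

F2   F3   Output
a    c    t
b    b    t
b    b    f

F2 and F3 split these examples in the same way, so they give the same gain. We choose F2 for the tree. The branch F2 = b keeps both outputs t and f, because its two examples agree on every feature and cannot be separated.

The remaining table for Cc (F1 = c):

F2   F3   Output
a    b    n
b    c    m

F2 and F3 are equal as well (they have the same gain). We choose F3 for the tree.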
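The whole tree construction can be sketched as a small recursive function. This is an illustration under assumptions, not the reference solution: ties between features are broken by column order, so the sketch picks F1 at the root and F2 in both subtrees (the solution above picks F3 in the second subtree, which is equally valid), and a branch that cannot be split further is returned as a list of its remaining outputs. All names are hypothetical.

```python
from collections import Counter
from math import log2

FEATURES = ["F1", "F2", "F3", "F4"]
TABLE = [
    ("c", "a", "b", "x", "n"),
    ("a", "a", "c", "a", "t"),
    ("a", "b", "b", "a", "t"),
    ("c", "b", "c", "x", "m"),
    ("a", "b", "b", "a", "f"),
]
ROWS = [dict(zip(FEATURES + ["Output"], values)) for values in TABLE]


def entropy(rows):
    counts = Counter(row["Output"] for row in rows)
    return sum(-c / len(rows) * log2(c / len(rows)) for c in counts.values())


def split(rows, feature):
    """Group rows by their value of feature."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row)
    return groups


def gain(rows, feature):
    parts = split(rows, feature).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)


def id3(rows, features):
    """Return a leaf label, a list of labels (impure leaf), or a nested dict."""
    outputs = sorted({row["Output"] for row in rows})
    if len(outputs) == 1:
        return outputs[0]
    # max() returns the first feature with the highest gain (tie-break by order).
    best = max(features, key=lambda f: gain(rows, f)) if features else None
    if best is None or gain(rows, best) <= 1e-12:
        return outputs  # no feature separates the remaining examples
    rest = [f for f in features if f != best]
    return {best: {value: id3(part, rest)
                   for value, part in split(rows, best).items()}}


tree = id3(ROWS, FEATURES)
print(tree)
```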


6. (3 pts) Neural Network (NN)

Given the weights

w1={w11=0,w12=0,w13=0,w14=0,w15=0}

w2={w21=0,w22=0,w23=0,w24=0,w25=0}

w3={w31=0,w32=0,w33=0,w34=0,w35=0}

W1={W11=1,W12=1,W13=1}

W2={W21=1,W22=1,W23=1}

and the activation function

$$f(x) = \sigma(x) = \frac{1}{1 + e^{(-6 \cdot x)}}$$

perform one step of stochastic gradient descent with η=1 for the input vector x={1,1,1,1,1} and the target t={2,2}. Determine $\Delta w_{jk}$ and $\Delta W_{ij}$ for the first adaptation step.

Forward pass, hidden layer:

netj=net1=net2=net3=0

V1=V2=V3=1/(1 + Exp[-6*0])=0.5

Forward pass, output layer:

neti=net1=net2=Wi1*V1+Wi2*V2+Wi3*V3=1*0.5+1*0.5+1*0.5=1.5

O1=O2=1/(1 + Exp[-6*1.5])=0.9999

Backward pass, output layer:

$$f'(x) = 6 \cdot \sigma(x) \cdot (1 - \sigma(x))$$

neti=net1=net2=1.5

δ1=δ2=(2-0.9999)*6*1/(1 + Exp[-6*1.5])*(1-1/(1 + Exp[-6*1.5]))=0.0007

ΔW_ij=δi*V_j

Because of symmetry (all weights in a layer are the same):

ΔW_ij=0.0007*0.5=0.00035

W_ij^new=1+0.00035=1.00035

Backward pass, hidden layer:

netj=net1=net2=net3=0

f'(netj)=6*1/(1 + Exp[-6*0])*(1 - 1/(1 + Exp[-6*0]))=1.5

δ1=δ2=δ3=0.0007*1*1.5+0.0007*1*1.5=0.0021

Δw_jk=δj*x_k=0.0021*1=0.0021

w_jk^new=0+0.0021=0.0021

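The update rules, with η=1:

$$\Delta W_{ij} = (t_i - o_i)\, f'(net_i)\, V_j$$

$$\Delta w_{jk} = \sum_{i=1}^{2} \delta_i \cdot W_{ij}\, f'(net_j) \cdot x_k$$

$$\Delta w_{jk} = \delta_j \cdot x_k$$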
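The adaptation step can be replayed in code. The sketch below assumes the network of this question (5 inputs, 3 hidden units, 2 outputs, no biases, activation 1/(1+e^(-6x))); the variable names are hypothetical. It keeps full precision, so its values come out slightly larger than the hand calculation, which rounds the output delta to 0.0007: about 0.00074 for the output delta, 0.00037 for ΔW_ij and 0.0022 for Δw_jk.

```python
from math import exp

ETA = 1.0
x = [1.0] * 5                        # input vector
t = [2.0, 2.0]                       # target
w = [[0.0] * 5 for _ in range(3)]    # hidden weights w_jk
W = [[1.0] * 3 for _ in range(2)]    # output weights W_ij


def sigma(z):
    """Activation function 1 / (1 + e^(-6 z))."""
    return 1.0 / (1.0 + exp(-6.0 * z))


def sigma_prime(z):
    """Derivative 6 * sigma(z) * (1 - sigma(z))."""
    return 6.0 * sigma(z) * (1.0 - sigma(z))


# Forward pass.
net_hidden = [sum(w[j][k] * x[k] for k in range(5)) for j in range(3)]
V = [sigma(net) for net in net_hidden]
net_out = [sum(W[i][j] * V[j] for j in range(3)) for i in range(2)]
O = [sigma(net) for net in net_out]

# Backward pass: output deltas first, then hidden deltas through W.
delta_out = [(t[i] - O[i]) * sigma_prime(net_out[i]) for i in range(2)]
delta_hidden = [
    sigma_prime(net_hidden[j]) * sum(delta_out[i] * W[i][j] for i in range(2))
    for j in range(3)
]

# Weight changes for the first adaptation step.
dW = [[ETA * delta_out[i] * V[j] for j in range(3)] for i in range(2)]
dw = [[ETA * delta_hidden[j] * x[k] for k in range(5)] for j in range(3)]

print("V =", V, "O =", O)
print("delta_out =", delta_out, "dW_ij =", dW[0][0])
print("delta_hidden =", delta_hidden, "dw_jk =", dw[0][0])
```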