INSTITUTO SUPERIOR TÉCNICO Sistemas de Apoio à Decisão


---


Exam 1: 16 January 2017

---

1. (5 pts) Some easy questions to start with

1) Mark the following statements as true (V) or false (F). Each correct answer is worth +1 point, each wrong answer -0.5 points, and no answer 0 points.

(a) (1 pts) (V)

Suppose that a DW consists of six dimensions: time_1, time_2, geography_1, geography_2, product, customer, and a measure count, where count is the number of products sold.

Suppose that the dimension time_1 has the hierarchy day, month; that the dimension time_2 has the hierarchy day, month, year; and that the dimension geography_1 has the hierarchy continent, country. Then the number of cuboids is 288.

Each dimension contributes (number of levels + 1) choices, the extra one being the "all" level. geography_2, product and customer have a single level each:

(2+1)*(3+1)*(2+1)*(1+1)*(1+1)*(1+1) = 3*4*3*8 = 288
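The product rule used in (a) can be checked with a few lines of Python. This is a minimal sketch; the dictionary `levels` and the function name `count_cuboids` are illustrative names, and the single-level assumption for geography_2, product and customer is taken from the arithmetic in the solution.

```python
from math import prod

# Number of hierarchy levels per dimension, not counting the implicit "all" level.
# geography_2, product and customer are assumed to have a single level each.
levels = {
    "time_1": 2,
    "time_2": 3,
    "geography_1": 2,
    "geography_2": 1,
    "product": 1,
    "customer": 1,
}

def count_cuboids(levels_per_dimension):
    """Product of (levels + 1) over all dimensions."""
    return prod(n + 1 for n in levels_per_dimension.values())

print(count_cuboids(levels))  # 288
```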

(b) (1 pts) (V)

Shannon's entropy (information) measured in bits can have the value zero.

$$I = H(F) = -\sum_i p_i \log_2 p_i$$

Given p = 1: -1*log2(1) = 0
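The entropy argument in (b) can be reproduced numerically. The sketch below assumes the usual convention that outcomes with probability 0 contribute nothing to the sum; `shannon_entropy` is an illustrative name.

```python
from math import log2

def shannon_entropy(probabilities):
    """Entropy in bits; terms with p == 0 are skipped by convention."""
    return sum(-p * log2(p) for p in probabilities if p > 0)

print(shannon_entropy([1.0]))       # a certain outcome carries no information
print(shannon_entropy([0.5, 0.5]))  # a fair coin carries one bit
```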

(c) (1 pts) (V) or (F)

In a good clustering the members of each cluster should be as close to each other as possible.

Compactness: the members of each cluster should be as close to each other as possible.
Separation: the clusters themselves should be widely spaced.

However, you could argue on logical grounds that a good clustering requires Compactness ∧ Separation to be true, in which case the answer is (F). Because of this argument everyone gets a point in (c)!

(d) (1 pts) (V)

A support vector machine with the kernel <x,y> + e can only classify linearly separable problems.

This is a linear kernel: just a scalar product plus a constant, e = 2.71828.

(e) (1 pts) (F)

A multilayer perceptron with one hidden layer and the activation function

$$f(x) = \sigma(x) = \frac{1}{1 + e^{-2 \cdot x}}$$

can only classify linearly separable problems.

2) (3 pts) fp-tree

TID   Items bought
100   {a, b, c, d, e}
200   {a, b, c, f}
300   {a, b, g}
400   {a, g}

(a) (2 pts) Given the transaction database, build an fp-tree using the minimum support threshold min_sup=50% (2/4).

Item supports:

a=4/4 b=3/4 c=2/4 d=1/4 e=1/4 f=1/4 g=2/4

The items d, e and f fall below min_sup and are removed:

TID   Items bought
100   {a, b, c}
200   {a, b, c}
300   {a, b, g}
400   {a, g}

Resulting fp-tree (node:count):

root
  a:4
    b:3
      c:2
      g:1
    g:1

(b) (1 pts) Indicate the resulting conditional pattern base.

b: a:3
c: (ab):2
g: a:1, (ab):1
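The construction in question 2 can be sketched in Python as follows. This is an illustrative implementation, not a reference one: the `Node` class, the function names, and the alphabetical tie-break between items with equal support (c and g) are assumptions made for the example.

```python
from collections import Counter

transactions = {
    100: ["a", "b", "c", "d", "e"],
    200: ["a", "b", "c", "f"],
    300: ["a", "b", "g"],
    400: ["a", "g"],
}
MIN_COUNT = 2  # min_sup = 50% of 4 transactions

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}

def build_fp_tree(transactions, min_count):
    counts = Counter(i for items in transactions.values() for i in items)
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Order by descending support; ties are broken alphabetically (an assumption).
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}
    root, header = Node(None, None), {item: [] for item in rank}
    for items in transactions.values():
        node = root
        for item in sorted((i for i in items if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header

def conditional_pattern_base(header, item):
    """Prefix path of every node holding `item`, with that node's count."""
    base = {}
    for node in header[item]:
        path, parent = [], node.parent
        while parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base["".join(reversed(path))] = node.count
    return base

root, header = build_fp_tree(transactions, MIN_COUNT)
for item in ["b", "c", "g"]:
    print(item, conditional_pattern_base(header, item))
```

The header table (`header`) links every node of an item, which is what makes reading off the prefix paths for the conditional pattern base a simple walk towards the root.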

3. (4 pts) PCA

Suppose we have the following data points, representing a sample of data from a population:

$$\vec{x}_i = \left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 4 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \begin{pmatrix} 6 \\ 3 \end{pmatrix} \right\}$$

(a) (3 pts) What is the K-L transformation matrix?

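The covariance matrix for a sample is (for a population we would divide by n instead of n-1):

$$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}, \qquad c_{ij} = \frac{\sum_{k=1}^{n} \left(x_i^{(k)} - m_i\right)\left(x_j^{(k)} - m_j\right)}{n-1}$$

m1=(0+4+2+6)/4=3
m2=(0+0+1+3)/4=1
c11=((0-3)^2+(4-3)^2+(2-3)^2+(6-3)^2)/3=20/3=6.6666
c12=c21=((0-3)*(0-1)+(4-3)*(0-1)+(2-3)*(1-1)+(6-3)*(3-1))/3=8/3=2.6666
c22=((0-1)^2+(0-1)^2+(1-1)^2+(3-1)^2)/3=6/3=2

$$C = \begin{pmatrix} 20/3 & 8/3 \\ 8/3 & 2 \end{pmatrix} = \begin{pmatrix} 6.6666 & 2.6666 \\ 2.6666 & 2 \end{pmatrix}$$

For a nontrivial eigenvector to exist, the system has to become linearly dependent (singular), so the determinant has to be zero:

$$|\lambda \cdot I - C| = 0$$

$$\left(\lambda - \frac{20}{3}\right) \cdot (\lambda - 2) - \frac{8}{3} \cdot \frac{8}{3} = \lambda^2 - \frac{26}{3} \cdot \lambda + \frac{56}{9} = 0$$

λ1=0.789951, λ2=7.87672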
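The covariance and eigenvalue computation can be reproduced with plain Python. The sketch below is limited to the 2x2 case, where the eigenvalues follow from the quadratic formula applied to the characteristic polynomial; the function names are illustrative.

```python
from math import sqrt

points = [(0, 0), (4, 0), (2, 1), (6, 3)]

def sample_covariance(points):
    """2x2 sample covariance matrix (divides by n - 1)."""
    n = len(points)
    means = [sum(p[i] for p in points) / n for i in range(2)]

    def c(i, j):
        return sum((p[i] - means[i]) * (p[j] - means[j]) for p in points) / (n - 1)

    return [[c(0, 0), c(0, 1)], [c(1, 0), c(1, 1)]]

def eigenvalues_2x2_symmetric(m):
    """Roots of lambda^2 - trace*lambda + det = 0, smallest first."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

C = sample_covariance(points)
lam1, lam2 = eigenvalues_2x2_symmetric(C)
print(C)
print(lam1, lam2)
```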

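In each case the first component of the eigenvector is fixed to 1 and the second component $u_2$ is solved for.

For λ1=0.789951:

$$\begin{pmatrix} 6.6666 - 0.789951 & 2.6666 \\ 2.6666 & 2 - 0.789951 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$\begin{pmatrix} 5.87665 & 2.6666 \\ 2.6666 & 1.21005 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$\begin{pmatrix} 5.87665 \\ 2.6666 \end{pmatrix} = -\begin{pmatrix} 2.6666 \\ 1.21005 \end{pmatrix} \cdot u_2$$

so $u_2 \approx -2.2$ and, up to sign, $\vec{u}_1 \approx \begin{pmatrix} -1 \\ 2.2 \end{pmatrix}$.

For λ2=7.87672:

$$\begin{pmatrix} 6.6666 - 7.87672 & 2.6666 \\ 2.6666 & 2 - 7.87672 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$\begin{pmatrix} -1.21012 & 2.6666 \\ 2.6666 & -5.87672 \end{pmatrix} \begin{pmatrix} 1 \\ u_2 \end{pmatrix} = 0$$

$$\begin{pmatrix} -1.21012 \\ 2.6666 \end{pmatrix} = \begin{pmatrix} -2.6666 \\ 5.87672 \end{pmatrix} \cdot u_2$$

so $u_2 \approx 0.45$ and $\vec{u}_2 \approx \begin{pmatrix} 1 \\ 0.45 \end{pmatrix}$.

The two normalized eigenvectors are

$$\vec{u}_1 \approx \begin{pmatrix} -0.41 \\ 0.91 \end{pmatrix} \text{ or } \vec{u}_1 \approx \begin{pmatrix} 0.41 \\ -0.91 \end{pmatrix}, \qquad \vec{u}_2 \approx \begin{pmatrix} 0.91 \\ 0.41 \end{pmatrix} \text{ or } \vec{u}_2 \approx \begin{pmatrix} -0.91 \\ -0.41 \end{pmatrix}$$

Both signs are valid because $C \cdot \vec{u}_i = \lambda_i \cdot \vec{u}_i$ holds for either one. Taking the first choice for each eigenvector, the K-L transformation matrix is

$$U = \begin{pmatrix} -0.41 & 0.91 \\ 0.91 & 0.41 \end{pmatrix}$$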
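The eigenvector step can be checked numerically. The sketch below follows the same approach as the solution (fix the first component to 1, solve the first row of (C - λI)u = 0 for the second, then normalize); the eigenvalues are copied from the solution text and the function names are illustrative. Because the sign of an eigenvector is arbitrary, the code returns each vector with a positive first component.

```python
from math import hypot

C = [[20 / 3, 8 / 3], [8 / 3, 2]]
eigenvalues = [0.789951, 7.87672]  # values taken from the solution text

def unit_eigenvector(c, lam):
    """Solve the first row of (C - lam*I) u = 0 with u = (1, u2), then normalize."""
    u2 = -(c[0][0] - lam) / c[0][1]
    norm = hypot(1.0, u2)
    return (1.0 / norm, u2 / norm)

def residual(c, lam, u):
    """Largest absolute component of C u - lam u; near zero for an eigenpair."""
    return max(abs(c[i][0] * u[0] + c[i][1] * u[1] - lam * u[i]) for i in range(2))

U = [unit_eigenvector(C, lam) for lam in eigenvalues]  # eigenvectors as rows
for lam, u in zip(eigenvalues, U):
    print(lam, u, residual(C, lam, u))
```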

(b) (1 pts) Which eigenvector is more significant? Can we apply the Kaiser criterion?

$\vec{u}_2 \approx \begin{pmatrix} 0.91 \\ 0.41 \end{pmatrix}$ is more significant: in this direction the points have the higher variance, because λ1 < λ2.

Since λ1=0.789951 < 1, the Kaiser criterion can be applied, which means that the eigenvector $\vec{u}_1$ can be discarded.
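As a small illustration of the significance argument, the share of total variance carried by each eigenvector and the Kaiser selection can be computed directly from the eigenvalues. This is a sketch under the assumption that the Kaiser criterion keeps components whose eigenvalue exceeds 1; the dictionary keys are illustrative labels.

```python
eigenvalues = {"u1": 0.789951, "u2": 7.87672}  # values taken from the solution text

total = sum(eigenvalues.values())
share = {name: lam / total for name, lam in eigenvalues.items()}

# Kaiser criterion (assumed form): keep components with eigenvalue greater than 1.
kept = [name for name, lam in eigenvalues.items() if lam > 1.0]

print(share)
print(kept)
```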

4. (3 pts) Belief Networks

Determine the probability of "RadioNews" given that "Alarm" is true. "Earthquake", "Burglary" and "WatsonCalls" are unknown.

Note that WatsonCalls is irrelevant.

$$P(R \mid a) = \alpha \cdot \sum_{e} \sum_{b} P(e) \cdot P(b) \cdot P(R \mid e) \cdot P(a \mid b, e)$$

$$P(R \mid a) = \alpha \cdot \sum_{e} \Big( P(e) \cdot P(R \mid e) \cdot \sum_{b} P(b) \cdot P(a \mid b, e) \Big)$$

$$P(R \mid a) = \alpha \cdot \Big( P(e) \cdot P(R \mid e) \cdot \big( P(b) \cdot P(a \mid b, e) + P(\neg b) \cdot P(a \mid \neg b, e) \big) + P(\neg e) \cdot P(R \mid \neg e) \cdot \big( P(b) \cdot P(a \mid b, \neg e) + P(\neg b) \cdot P(a \mid \neg b, \neg e) \big) \Big)$$

P(r|a) = α*(0.001*0.99*(0.1*0.99+0.9*0.5) + 0.999*0.03*(0.1*0.98+0.9*0.1)) = α*0.00617787
P(¬r|a) = α*(0.001*0.01*(0.1*0.99+0.9*0.5) + 0.999*0.97*(0.1*0.98+0.9*0.1)) = α*0.182183

Normalizing gives

P(r|a) = 0.00617787/(0.00617787+0.182183) = 0.0327981
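The enumeration over the hidden variables can be written out in Python. This is a minimal sketch: the conditional probability values are read off the arithmetic in the solution and are assumed to match the network's tables, and all names are illustrative.

```python
from itertools import product

# Probabilities read off the arithmetic in the solution (assumed to match the network).
P_E = {True: 0.001, False: 0.999}            # P(Earthquake)
P_B = {True: 0.1, False: 0.9}                # P(Burglary)
P_R_GIVEN_E = {True: 0.99, False: 0.03}      # P(RadioNews = true | Earthquake)
P_A_GIVEN_BE = {                             # P(Alarm = true | Burglary, Earthquake)
    (True, True): 0.99,
    (False, True): 0.5,
    (True, False): 0.98,
    (False, False): 0.1,
}

def unnormalized(r):
    """Sum over E and B of P(e) P(b) P(R=r | e) P(a | b, e)."""
    total = 0.0
    for e, b in product([True, False], repeat=2):
        p_r = P_R_GIVEN_E[e] if r else 1.0 - P_R_GIVEN_E[e]
        total += P_E[e] * P_B[b] * p_r * P_A_GIVEN_BE[(b, e)]
    return total

def posterior_radio_given_alarm():
    """Normalize the two unnormalized terms to obtain P(r | a)."""
    yes, no = unnormalized(True), unnormalized(False)
    return yes / (yes + no)

print(unnormalized(True), unnormalized(False))
print(posterior_radio_given_alarm())
```

WatsonCalls does not appear in the code because it is a descendant of Alarm that is neither observed nor queried, so summing it out contributes a factor of 1.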

5. (5 pts) Perceptron

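(a) (3 pts) Derive a gradient descent training rule for a single unit/neuron with output o, where:

$$o = e^{\sum_{j=1}^{N} (w_j \cdot x_j)^2}$$

With the squared error $E = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$ over the training examples d:

$$net = \sum_{j=1}^{N} w_j^2 \cdot x_j^2, \qquad \frac{\partial net}{\partial w_j} = 2 \cdot w_j \cdot x_j^2$$

$$\Delta w_j = -\eta \cdot \frac{\partial E}{\partial w_j} = -\eta \cdot \frac{1}{2} \cdot 2 \cdot \sum_{d \in D} (t_d - o_d) \cdot \frac{\partial}{\partial w_j} (t_d - o_d)$$

$$\Delta w_j = -\eta \cdot \frac{\partial E}{\partial w_j} = -\eta \cdot \sum_{d \in D} (t_d - o_d) \cdot \frac{\partial}{\partial w_j} \left( t_d - e^{\sum_{i=1}^{N} w_i^2 \cdot x_{i,d}^2} \right)$$

$$\Delta w_j = -\eta \cdot \frac{\partial E}{\partial w_j} = -\eta \cdot \sum_{d \in D} (t_d - o_d) \cdot \left( -e^{\sum_{i=1}^{N} w_i^2 \cdot x_{i,d}^2} \cdot 2 \cdot w_j \cdot x_{j,d}^2 \right)$$

$$\Delta w_j = -\eta \cdot \frac{\partial E}{\partial w_j} = \eta \cdot 2 \cdot w_j \cdot \sum_{d \in D} \left( t_d - e^{\sum_{i=1}^{N} w_i^2 \cdot x_{i,d}^2} \right) \cdot e^{\sum_{i=1}^{N} w_i^2 \cdot x_{i,d}^2} \cdot x_{j,d}^2$$

$$\Delta w_j = -\eta \cdot \frac{\partial E}{\partial w_j} = \eta \cdot 2 \cdot w_j \cdot \sum_{d \in D} (t_d - o_d) \cdot o_d \cdot x_{j,d}^2$$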
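One way to sanity-check a derived gradient is to compare it against central finite differences. The sketch below does this for a single training example, assuming the squared error E = 1/2 (t - o)^2; the test values for w, x and t are arbitrary and chosen only for illustration.

```python
from math import exp

def output(w, x):
    """o = exp(sum_j (w_j * x_j)^2)."""
    return exp(sum((wj * xj) ** 2 for wj, xj in zip(w, x)))

def error(w, x, t):
    """Squared error for a single example."""
    return 0.5 * (t - output(w, x)) ** 2

def analytic_gradient(w, x, t):
    """dE/dw_j = -(t - o) * o * 2 * w_j * x_j^2."""
    o = output(w, x)
    return [-(t - o) * o * 2 * wj * xj ** 2 for wj, xj in zip(w, x)]

def numeric_gradient(w, x, t, h=1e-6):
    """Central finite differences, used only as a sanity check."""
    grad = []
    for j in range(len(w)):
        up = [wi + (h if i == j else 0.0) for i, wi in enumerate(w)]
        down = [wi - (h if i == j else 0.0) for i, wi in enumerate(w)]
        grad.append((error(up, x, t) - error(down, x, t)) / (2 * h))
    return grad

w, x, t = [0.5, -0.3], [1.0, 2.0], 1.0  # arbitrary illustrative values
print(analytic_gradient(w, x, t))
print(numeric_gradient(w, x, t))
```

The weight update is then the negative gradient scaled by the learning rate, which matches the sign of the final rule.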

(b) (2 pts) Given the weights w={w1=1,w2=0}, perform one stochastic gradient descent step with η=2 for the input x={1,1}={x1=1,x2=1} and target t={0}. Determine Δw using the results of 5 (a).

o=Exp[1*1*1*1 + 0*0*1*1]=e

$$\Delta w_1 = 2 \cdot 2 \cdot 1 \cdot (0 - e) \cdot e \cdot 1 = -4 \cdot e^2, \qquad w_1^{new} = 1 - 4 \cdot e^2 = -28.5562$$

$$\Delta w_2 = 0, \qquad w_2^{new} = 0$$
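The single update in (b) can be reproduced with the rule Δw_j = η * 2 * w_j * (t - o) * o * x_j^2 from 5 (a). The function name `sgd_step` is illustrative.

```python
from math import e, exp

def sgd_step(w, x, t, eta):
    """One stochastic update: delta_w_j = eta * 2 * w_j * (t - o) * o * x_j^2."""
    o = exp(sum((wj * xj) ** 2 for wj, xj in zip(w, x)))
    delta = [eta * 2 * wj * (t - o) * o * xj ** 2 for wj, xj in zip(w, x)]
    w_new = [wj + dj for wj, dj in zip(w, delta)]
    return delta, w_new

delta, w_new = sgd_step(w=[1.0, 0.0], x=[1.0, 1.0], t=0.0, eta=2.0)
print(delta)  # first component equals -4 * e**2
print(w_new)
```

Because w2 = 0 multiplies the whole update for the second weight, that weight does not move in this step.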