INSTITUTO SUPERIOR TÉCNICO

Sistemas de Apoio à Decisão (Decision Support Systems)

Exam 1, 13 June 2014

1. (6 pts) Some easy questions to start

1) DW (2 pts) Mark the following statements as true (V) or false (F). Each correct answer is worth +0.5 points, each wrong answer -0.25, and no answer 0.

a. (0.5pts)

stdev(), the standard deviation, is a holistic aggregation function.

(F) stdev() is algebraic: it can be computed by an algebraic function with a bounded number M of arguments (the count, the sum and the sum of squares), each of which is obtained with a distributive aggregation function.
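To see why stdev() is algebraic rather than holistic, here is a minimal Python sketch. It assumes the sample (n − 1) standard deviation and two hypothetical partitions of measure values. Each partition keeps only three distributive aggregates, and merging them is enough to recover the overall standard deviation.

```python
import math

def partial_aggregates(values):
    """Distributive aggregates stored per partition: (count, sum, sum of squares)."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Combining two partitions only needs component-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def sample_stdev(agg):
    """Sample standard deviation from the M = 3 aggregates (divides by n - 1)."""
    n, s, sq = agg
    return math.sqrt((sq - s * s / n) / (n - 1))

left = [2.0, 4.0, 4.0]              # hypothetical values in one partition
right = [4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical values in another partition
combined = merge(partial_aggregates(left), partial_aggregates(right))
print(round(sample_stdev(combined), 4))  # 2.1381, the stdev of all 8 values
```

Because a fixed-size summary (count, sum, sum of squares) per cell is enough, a cube can roll stdev() up without revisiting the base data. A holistic function such as median() has no fixed-size summary of this kind.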

b. (0.5pts)

In a star schema, the tables are not normalized.

(V)

c. (0.5pts)

Suppose a DW consists of three dimensions, time, geography and product, and a measure count, where count is the number of products sold. Suppose the time dimension has the hierarchy year, month, and the geography dimension has the hierarchy country, city. Then the number of cuboids is 16.

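(F) The number of cuboids is the product, over all dimensions, of (number of hierarchy levels + 1), where the +1 accounts for the top level "all": (2+1)·(2+1)·(1+1) = 9·2 = 18.

d. (0.5pts)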

We can build a cube using only one SQL command.

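(F) The CUBE BY operator involves more than one command: GROUP BY CUBE (S#, P#) is equivalent to the union of GROUP BY (S#, P#), GROUP BY S#, GROUP BY P# and the grand total GROUP BY (), one aggregation per cuboid. For example: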

    SELECT S#, P#, SUM(QTY) AS TOTQTY
    FROM SP
    GROUP BY CUBE (S#, P#);
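The expansion of CUBE can be sketched in Python. The SP rows below are hypothetical and the helper names are illustrative only. The sketch lists every subset of (S#, P#) and computes SUM(QTY) for each, mirroring the four aggregations that GROUP BY CUBE (S#, P#) produces.

```python
from collections import defaultdict
from itertools import combinations

COLUMNS = ("S#", "P#")
# Hypothetical SP rows (supplier, part, quantity), for illustration only
SP = [("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 400)]

def grouping_sets(columns):
    """All subsets of `columns`, largest first: the GROUP BYs that CUBE stands for."""
    return [subset for k in range(len(columns), -1, -1)
            for subset in combinations(columns, k)]

def group_sum(rows, cols):
    """SUM(QTY) ... GROUP BY cols; the empty tuple gives the grand total."""
    idx = [COLUMNS.index(c) for c in cols]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[2]
    return dict(totals)

for cols in grouping_sets(COLUMNS):
    print(cols, group_sum(SP, cols))
```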


2) DM (4 pts) Mark the following statements as true (V) or false (F). Each correct answer is worth +1 point, each wrong answer -0.5, and no answer 0.

(a) (1pts)

The backpropagation algorithm (neural network) can overfit.

(V) One problem that can occur during neural network training is overfitting: the error on the training set is driven to a very small value, but when new data is presented to the network the error is large. The network has memorized the training examples, but it has not learned to generalize to new situations.

(b) (1pts)

The k-Means clustering algorithm gives better results than EM clustering.

(F) Not always; it depends on the parameters, such as the random initialization and the number of clusters.

(c) (1pts)

A Support Vector Machine with the kernel $K(\mathbf{x}, \mathbf{y}) = x_1 y_1 + x_2 y_2$ solves the OR problem.

(V) OR is linearly separable: a single line separates its two classes in two dimensions, so the linear kernel $x_1 y_1 + x_2 y_2$ (the plain dot product) is sufficient.

(d) (1pts)

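A Multilayer Perceptron with ten hidden layers and activation function $f(x) = \sigma(x) = \frac{1}{1 + e^{-2x}}$ is more powerful than a perceptron with one hidden layer and activation function $f(x) = x + e$.

(V) The activation function $f(x) = x + e$ is linear (affine), whereas $\sigma(x)$ is nonlinear. A composition of affine layers is itself affine, so the network with $f(x) = x + e$ can only represent linear decision boundaries, while the sigmoid MLP can also represent nonlinear ones.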
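A minimal numerical check of the collapse argument, in Python. It uses a scalar chain of units with hypothetical weights and a bias of 0.1. Ten layers with f(x) = x + e reduce to one affine map a·x + c. A single sigmoid unit, by contrast, already has a nonzero second difference, which no affine map can have.

```python
import math

def affine_act(x):
    # f(x) = x + e, the activation of the one-hidden-layer network
    return x + math.e

def sigmoid(x):
    # f(x) = sigma(x) = 1 / (1 + e^(-2x)), the activation of the ten-layer MLP
    return 1.0 / (1.0 + math.exp(-2.0 * x))

def chain(x, act, weights, bias=0.1):
    """Scalar chain of units h <- act(w * h + bias), one per entry in `weights`.
    The weights and bias are hypothetical values chosen only for illustration."""
    h = x
    for w in weights:
        h = act(w * h + bias)
    return h

def collapse(weights, bias=0.1):
    """Fold the affine chain into a single map x -> a * x + c."""
    a, c = 1.0, 0.0
    for w in weights:
        # act(w * (a x + c) + bias) = (w a) x + (w c + bias + e)
        a, c = w * a, w * c + bias + math.e
    return a, c

def second_difference(g, x, h=0.5):
    """Zero for every affine g; a nonzero value signals curvature."""
    return g(x + h) - 2.0 * g(x) + g(x - h)

weights = [0.9, -1.1, 0.7, 1.2, -0.8, 1.0, 0.6, -1.3, 1.1, 0.9]  # ten layers
a, c = collapse(weights)
for x in (-2.0, 0.0, 3.5):
    assert math.isclose(chain(x, affine_act, weights), a * x + c)

print(abs(second_difference(lambda x: chain(x, affine_act, weights), 0.3)) < 1e-9)  # True
print(abs(second_difference(lambda x: chain(x, sigmoid, weights[:1]), 0.3)) > 1e-3)  # True
```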


Given the base cuboid with the dimensions year, country and product, holding the euros sold with the aggregation function sum, list all the remaining cuboids.

3D cuboid (base cuboid)

Year  Country   Product   €
2013  Portugal  Computer   2000
2013  Poland    Car       20000
2014  Poland    Computer   2000

2D cuboids

Year  Product   €
2013  Computer   2000
2013  Car       20000
2014  Computer   2000

Year  Country   €
2013  Portugal   2000
2013  Poland    20000
2014  Poland     2000

Country   Product   €
Portugal  Computer   2000
Poland    Car       20000
Poland    Computer   2000

1D cuboids

Year  €
2013  22000
2014   2000

Country   €
Portugal   2000
Poland    22000

Product   €
Computer   4000
Car       20000

0D cuboid

€
24000
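The lattice can be recomputed with a short Python sketch. It takes the 2013/Poland/Car entry as 20000, the value implied by the aggregated 2D, 1D and 0D cuboids. It then aggregates the base cuboid with SUM over every subset of the three dimensions. The helper count_cuboids is hypothetical; it evaluates the product formula from question 1c.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

DIMS = ("Year", "Country", "Product")
# Base cuboid rows: (year, country, product, euros sold)
BASE = [
    (2013, "Portugal", "Computer", 2000),
    (2013, "Poland", "Car", 20000),
    (2014, "Poland", "Computer", 2000),
]

def cuboid(rows, dims):
    """Aggregate the base cuboid onto `dims` with SUM."""
    idx = [DIMS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def all_cuboids(rows):
    """Every subset of the dimensions, from the 3D base down to the 0D cuboid."""
    return {dims: cuboid(rows, dims)
            for k in range(len(DIMS), -1, -1)
            for dims in combinations(DIMS, k)}

def count_cuboids(levels):
    """Number of cuboids when dimension i has levels[i] hierarchy levels."""
    return prod(l + 1 for l in levels)

lattice = all_cuboids(BASE)
print(len(lattice))              # 8 = 2**3
print(lattice[("Country",)])     # {('Portugal',): 2000, ('Poland',): 22000}
print(lattice[()])               # {(): 24000}
print(count_cuboids([2, 2, 1]))  # 18, as in question 1c
```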


3. (3 pts) PCA

Suppose we have the following data points:

$$\mathbf{x}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{x}_2 = \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \quad \mathbf{x}_3 = \begin{pmatrix} 3 \\ 3 \end{pmatrix}$$

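(a) (2 pts) What is the matrix of the K-L (Karhunen-Loève) transformation?

The covariance matrix for a sample is

$$C = \begin{pmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{pmatrix}, \qquad c_{ij} = \frac{1}{n-1} \sum_{k=1}^{n} \left(x_i^{(k)} - m_i\right)\left(x_j^{(k)} - m_j\right)$$

(for a population we divide by n instead).

m₁=m₂=(1+2+3)/3=2

c11=c12=c21=c22=((1-2)*(1-2)+(2-2)*(2-2)+(3-2)*(3-2))/2=(1+1)/2=1

$$C = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

The eigenvalues are the values of λ for which λI − C becomes linearly dependent (singular), i.e. its determinant is zero:

$$|\lambda I - C| = (\lambda - 1)(\lambda - 1) - 1 = \lambda^2 - 2\lambda = 0$$

so λ1 = 0 and λ2 = 2.

For λ1 = 0:

$$\begin{pmatrix} -1 & -1 \\ -1 & -1 \end{pmatrix} \mathbf{u}_1 = \mathbf{0}, \qquad \mathbf{u}_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$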

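For λ2 = 2:

$$\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \mathbf{u}_2 = \mathbf{0}, \qquad \mathbf{u}_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

The two normalized eigenvectors are

$$\mathbf{u}_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{pmatrix}, \qquad \mathbf{u}_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \qquad U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix}$$

U is an orthonormal matrix: $UU^T = U^TU = I$.

The K-L transformation is given by $U^T$; for example,

$$U^T \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ \sqrt{2} \end{pmatrix}$$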

(b) (1 pts) Which eigenvector is more significant? Can the Kaiser criterion be applied?

The K-L transformation maps the two-dimensional data set into one dimension, because λ1 is zero. It follows that the eigenvector u2, corresponding to λ2 = 2, is more significant. The Kaiser criterion can be applied: since c11 = c22 = 1, C is also the correlation matrix, and keeping only the components with an eigenvalue greater than 1 again retains u2 alone.
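A pure-Python sketch of the computation in (a) and (b), without numerical libraries. It builds the sample covariance, solves the 2x2 characteristic equation in closed form, normalizes the eigenvectors and applies y = U^T x as in the worked example (no mean-centering). The closed-form eigenvector assumes c12 ≠ 0, which holds here.

```python
import math

# Data points from the question
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
n = len(points)

# Means m_1 and m_2
m = [sum(p[i] for p in points) / n for i in range(2)]

# Sample covariance: c_ij = sum_k (x_i^(k) - m_i)(x_j^(k) - m_j) / (n - 1)
C = [[sum((p[i] - m[i]) * (p[j] - m[j]) for p in points) / (n - 1)
      for j in range(2)] for i in range(2)]

# Roots of |lambda*I - C| = lambda^2 - tr(C)*lambda + det(C) = 0
tr = C[0][0] + C[1][1]
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
disc = math.sqrt(tr * tr / 4 - det)
lam1, lam2 = tr / 2 - disc, tr / 2 + disc

def unit_eigenvector(lam):
    """Normalized u with (lambda*I - C) u = 0; assumes C[0][1] != 0."""
    u = (C[0][1], lam - C[0][0])
    norm = math.hypot(u[0], u[1])
    return (u[0] / norm, u[1] / norm)

u1, u2 = unit_eigenvector(lam1), unit_eigenvector(lam2)
UT = [u1, u2]  # rows of U^T are the normalized eigenvectors

def kl_transform(x):
    """y = U^T x, as in the worked example (no mean-centering)."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in UT)

# Kaiser criterion: keep eigenvalues above 1 (C is also the correlation matrix here)
kept = [lam for lam in (lam1, lam2) if lam > 1]

print(lam1, lam2)                # 0.0 2.0
print(kl_transform((1.0, 1.0)))  # approximately (0.0, 1.41421...)
print(kept)                      # [2.0]
```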


4. (3 pts) Belief Networks

Compute the probability of "MaryCalls" given that "Burglary" and "Earthquake" are false and "Alarm" and "JohnCalls" are unknown.

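Since Alarm and JohnCalls are unknown, both are summed out. JohnCalls drops out because $\sum_j P(j \mid A) = 1$, which leaves a sum over the two values of Alarm:

$$P(M \mid \neg b, \neg e) = \alpha \sum_{A} P(\neg b)\, P(\neg e)\, P(A \mid \neg b, \neg e)\, P(M \mid A)$$

$$P(m \mid \neg b, \neg e) = \alpha\, P(\neg b)\, P(\neg e) \left( P(a \mid \neg b, \neg e)\, P(m \mid a) + P(\neg a \mid \neg b, \neg e)\, P(m \mid \neg a) \right)$$

$$P(\neg m \mid \neg b, \neg e) = \alpha\, P(\neg b)\, P(\neg e) \left( P(a \mid \neg b, \neg e)\, P(\neg m \mid a) + P(\neg a \mid \neg b, \neg e)\, P(\neg m \mid \neg a) \right)$$

$$P(m \mid \neg b, \neg e) = \alpha \cdot 0.999 \cdot 0.998 \cdot (0.001 \cdot 0.7 + 0.999 \cdot 0.01) = \alpha \cdot 0.0107$$

$$P(\neg m \mid \neg b, \neg e) = \alpha \cdot 0.999 \cdot 0.998 \cdot (0.001 \cdot 0.3 + 0.999 \cdot 0.99) = \alpha \cdot 0.9863$$

Normalizing:

$$P(m \mid \neg b, \neg e) = \frac{0.0107}{0.0107 + 0.9863} = 0.0107, \qquad P(\neg m \mid \neg b, \neg e) = \frac{0.9863}{0.0107 + 0.9863} = 0.9893$$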
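The same query can be answered by enumeration in Python. This minimal sketch uses the CPT entries that appear in the calculation above. The JohnCalls table is an assumption (the values from Russell and Norvig's textbook burglary network). It does not affect the result, because JohnCalls is summed out.

```python
from itertools import product

# CPT entries used in the calculation above
P_B = 0.001                       # P(b); P(not b) = 0.999
P_E = 0.002                       # P(e); P(not e) = 0.998
P_A_NB_NE = 0.001                 # P(a | not b, not e)
P_M = {True: 0.70, False: 0.01}   # P(m | a), P(m | not a)
# ASSUMPTION: JohnCalls CPT from the textbook network; it cancels out
P_J = {True: 0.90, False: 0.05}   # P(j | a), P(j | not a)

def bern(p, value):
    """P(X = value) for a Boolean X with P(X = true) = p."""
    return p if value else 1.0 - p

def joint_not_b_not_e(a, j, m):
    """P(not b, not e, a, j, m) = P(not b) P(not e) P(a | not b, not e) P(j | a) P(m | a)."""
    return ((1.0 - P_B) * (1.0 - P_E) * bern(P_A_NB_NE, a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def marycalls_given_not_b_not_e():
    """Sum out Alarm and JohnCalls, then normalize with alpha."""
    unnormalized = {
        m: sum(joint_not_b_not_e(a, j, m) for a, j in product((True, False), repeat=2))
        for m in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return {m: alpha * p for m, p in unnormalized.items()}

dist = marycalls_given_not_b_not_e()
print(round(dist[True], 4), round(dist[False], 4))  # 0.0107 0.9893
```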


5. (2 pts) Decision Tree

F1  F2  F3  Output
a   a   a   y
a   a   b   y
a   a   a   y
a   a   b   n
a   b   a   n
a   b   b   n

Compute the decision tree for this set of examples with target "Output" using the ID3 algorithm. Show your calculations.

p(y)=3/6=1/2, p(n)=3/6=1/2, I(table)=-1/2*log2(1/2)-1/2*log2(1/2)=log2(2)=1 bit

F1 (every example has F1=a) Ca=(y,y,y,n,n,n) I(Ca)=1, E(F1)=6/6*1=1 Gain(F1)=1-1=0

F2 Ca=(y,y,y,n) Cb=(n,n)

I(Ca)=-3/4*log2(3/4)-1/4*log2(1/4)= 0.8113 I(Cb)=-1*log2(1)=0

E(F2)=4/6*0.8113+2/6*0=0.5409 Gain(F2)= 0.4591

F3 Ca=(y,y,n) Cb=(y,n,n)

I(Ca)=I(Cb)=-2/3*log2(2/3)-1/3*log2(1/3)= 0.9183 E(F3)=3/6*0.9183+3/6*0.9183=0.9183

Gain(F3) =1-0.9183=0.0817

where

$$E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|}\, I(C_i), \qquad \text{gain}(P) = I(C) - E(P)$$


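F2 has the highest gain, so it is the root. The branch F2=b contains only n, so it becomes a leaf labeled n. The remaining table for the branch F2=a is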

F1  F3  Output
a   a   y
a   b   y
a   a   y
a   b   n

I(table F2=a)=-3/4*log2(3/4)-1/4*log2(1/4)=0.8113

F1 E(F1)=0.8113 Gain(F1)=0.8113-0.8113=0

F3 Ca=(y,y) Cb=(y,n) I(Ca)=0 I(Cb)=1

E(F3)=2/4*0+2/4*1=1/2 Gain(F3)=0.8113-1/2=0.3113

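The next attribute used is F3, with a gain of 0.3113 bit; the uncertainty remaining after this split is E(F3)=0.5 bit. The branch F3=a is a pure leaf labeled y. The branch F3=b still holds (y, n) and cannot be split further: F1 is constant, and the two examples (a, a, b) have identical attributes but different outputs.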
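The gains can be checked with a short Python sketch of the ID3 attribute-selection step. It uses the definitions of I, E and gain given above, with log base 2. It only scores attributes; building the full tree recursively is left out.

```python
from collections import Counter
from math import log2

# (F1, F2, F3, Output) rows from the table above
ROWS = [
    ("a", "a", "a", "y"),
    ("a", "a", "b", "y"),
    ("a", "a", "a", "y"),
    ("a", "a", "b", "n"),
    ("a", "b", "a", "n"),
    ("a", "b", "b", "n"),
]
ATTRS = {"F1": 0, "F2": 1, "F3": 2}

def info(rows):
    """I(C): entropy, in bits, of the Output column."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows).values()
    return sum(-(c / total) * log2(c / total) for c in counts)

def expected_info(rows, col):
    """E(P) = sum_i |C_i| / |C| * I(C_i) over the values of column `col`."""
    parts = {}
    for r in rows:
        parts.setdefault(r[col], []).append(r)
    return sum(len(p) / len(rows) * info(p) for p in parts.values())

def gain(rows, col):
    """gain(P) = I(C) - E(P)."""
    return info(rows) - expected_info(rows, col)

root = max(ATTRS, key=lambda a: gain(ROWS, ATTRS[a]))
print({a: round(gain(ROWS, c), 4) for a, c in ATTRS.items()}, root)
# {'F1': 0.0, 'F2': 0.4591, 'F3': 0.0817} F2

branch = [r for r in ROWS if r[ATTRS["F2"]] == "a"]
print({a: round(gain(branch, ATTRS[a]), 4) for a in ("F1", "F3")})
# {'F1': 0.0, 'F3': 0.3113}
```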


6. (3 pts) Neural Network

(a) (2 pts) Derive a gradient descent training rule for a single unit with output o, where:

$$o = -0.5 \sum_{i=1}^{n} x_i\, w_i^2$$

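With the squared error $E = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$ over the training set D:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = -\eta \cdot \frac{1}{2} \sum_{d \in D} 2\,(t_d - o_d)\, \frac{\partial}{\partial w_i}\left(t_d + \frac{1}{2} \sum_{j=1}^{n} x_{jd}\, w_j^2\right)$$

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = -\eta \sum_{d \in D} (t_d - o_d)\, x_{id}\, w_i$$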

(b) (1 pts) Given n=2 and the weights w={w1=0.1, w2=0.1}, perform stochastic gradient descent with η=1 for the input vector x={x1=1, x2=1} and target t={1}, and determine Δw1 and Δw2.

o=-1/2*0.1^2-1/2*0.1^2=-0.0100

$$\Delta w_1 = \Delta w_2 = -1 \cdot (1 + 0.0100) \cdot 1 \cdot 0.1 = -0.1010$$

w1=w2=0.1-0.101=-0.001
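The rule from (a) and the step in (b) can be checked in Python. This sketch applies the stochastic (single-example) form of the rule, Δw_i = −η(t − o)x_i w_i. It compares the rule against a central-difference estimate of ∂E/∂w_i for E = ½(t − o)². The function names are illustrative.

```python
def output(w, x):
    """o = -0.5 * sum_i x_i * w_i**2, the unit defined in (a)."""
    return -0.5 * sum(xi * wi ** 2 for xi, wi in zip(x, w))

def delta_w(w, x, t, eta):
    """Single-example rule: delta w_i = -eta * (t - o) * x_i * w_i."""
    o = output(w, x)
    return [-eta * (t - o) * xi * wi for xi, wi in zip(x, w)]

def numeric_grad(w, x, t, h=1e-6):
    """Central-difference dE/dw_i for E = 0.5 * (t - o)**2, to check the derivation."""
    grads = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        ep = 0.5 * (t - output(wp, x)) ** 2
        em = 0.5 * (t - output(wm, x)) ** 2
        grads.append((ep - em) / (2 * h))
    return grads

w, x, t, eta = [0.1, 0.1], [1.0, 1.0], 1.0, 1.0
dw = delta_w(w, x, t, eta)
# The analytic rule must equal -eta times the numerical gradient
assert all(abs(d + eta * g) < 1e-6 for d, g in zip(dw, numeric_grad(w, x, t)))
w_new = [wi + d for wi, d in zip(w, dw)]
print(round(output(w, x), 4), [round(d, 4) for d in dw], [round(v, 4) for v in w_new])
# -0.01 [-0.101, -0.101] [-0.001, -0.001]
```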