
Data Compression and Coding (Compressão e Codificação de Dados)

Second Exam and First/Second Test

MSc in Electrical and Computer Engineering (Mestrado em Engenharia Electrotécnica e de Computadores), IST, 29 January 2013

WARNING: there were several different versions of the exam, with the correct answers to the questions in Part I in different positions, and some small differences in the problems of Part II.

NOTES: (a) Exam (3 hours). Part I: everything (correct = 0.5; wrong = −0.25). Part II: everything.

(b) 1st test (90 minutes). Part I: questions 1–10 (correct = 1; wrong = −0.5). Part II: problems 1 and 2.

(c) 2nd test (90 minutes). Part I: questions 11–20 (correct = 1; wrong = −0.5). Part II: problem 3.

(d) Potentially useful facts: $\log_{10}(2) \simeq 0.30$; $\log_2(7) \simeq 2.807$; $\log_2(5) \simeq 2.322$; $\log_2(3) \simeq 1.585$.

1. Let $X \in \{0000, 0001, 0002, \ldots, 9999\}$ be the random variable (r.v.) representing the result of the annual raffle draw of a certain volunteer firefighters' association; then,

a) $H(X) = 13$ bits/symbol;

b) $H(X) < 13$ bits/symbol;

c) $H(X) > 13$ bits/symbol.

Solution: since $X$ takes 10000 different values with equal probability, $H(X) = \log_2 10000 = \log_2 10^4 > 13$ (because $2^{13} = 8192 < 10000 < 2^{14} = 16384$).

2. Let $A$, $B$, and $C$ be the random variables representing the first three digits of the result of the raffle defined in the previous question (random variable $X$); then,

a) $H(A, B, C) = H(X)$;

b) $H(A, B, C) < H(X)$;

c) $H(A, B, C) > H(X)$.

Solution: of course, only three digits of a four-digit number carry less uncertainty than the full number: $H(A, B, C) = \log_2 1000 < \log_2 10000 = H(X)$.

3. Let $Z$ be the r.v. representing the sum of the four digits of the result of the raffle defined in question 1; then,

a) $H(Z) = H(X)$;

b) $H(Z) < H(X)$;

c) $H(Z) > H(X)$.

Solution: $Z$ is a function of $X$ (if you know the number, you can compute the sum of its digits), thus $H(Z) \le H(X)$. It is not an injective function (for example, $X = 0002 \Rightarrow Z = 2$ and $X = 0101 \Rightarrow Z = 2$), thus $H(Z) < H(X)$.
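The inequality can also be checked numerically. The following is a minimal sketch, assuming (as in question 1) that all 10000 raffle numbers are equally likely; the helper name `entropy` is only illustrative.

```python
from collections import Counter
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Count how many of the 10000 equally likely numbers produce each digit sum.
counts = Counter(sum(map(int, f"{x:04d}")) for x in range(10000))

h_x = log2(10000)                                   # uniform over 10000 values
h_z = entropy(c / 10000 for c in counts.values())   # entropy of the digit sum

print(f"H(X) = {h_x:.3f} bits, H(Z) = {h_z:.3f} bits")
```

The digit sum takes only 37 distinct values (0 to 36), so $H(Z)$ cannot exceed $\log_2 37$, well below $H(X)$.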

4. Let $X$ be the r.v. defined in question 1 and $A$ the r.v. representing the first digit of the raffle result; then

a) $H(X|A) = 9$ bits/symbol;

b) $H(X|A) < 9$ bits/symbol;

c) $H(X|A) > 9$ bits/symbol.

Solution: given one of the digits, the remaining three digits can take 1000 different values with uniform probability, thus $H(X|A) = \log_2 1000 > 9$ (because $2^9 = 512 < 1000$).

5. Let $X$ and $A$ be the r.v. defined in the previous question; then

a) $H(A|X) = H(A)$;

b) $H(A|X) = H(X)$;

c) $H(A|X) = 0$.

Solution: of course, if you know the number $X$ you know any of its digits, thus $H(A|X) = 0$.

6. Let $X$ and $A$ be the r.v. defined in the previous question; then

a) $I(A; X) = H(A)$;

b) $I(A; X) = H(X)$;

c) $I(A; X) = 0$.

Solution: $I(A; X) = H(A) - H(A|X) = H(A)$ (we showed in the previous question that $H(A|X) = 0$). Intuitively, the amount of information that a digit has about the number to which it belongs is simply the amount of information it has about itself.

7. Let $L_X$ be the expected length of the optimal binary code for the r.v. $X$ (defined in question 1) and $L_A$ the expected length of the optimal binary code for the r.v. $A$ (defined in question 4); then,

a) $L_X = 4 L_A$;

b) $L_X < 4 L_A$;

c) $L_X > 4 L_A$.

Solution: this is exactly like a 4th-order source extension. We have four independent sources (the four digits), all with the same alphabet $\{0, 1, \ldots, 9\}$ and the same uniform probability distribution. Since the probabilities are not powers of 2 (they are all 1/10), it is better to code jointly than separately.
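To make the comparison concrete, the sketch below computes the expected length of a binary Huffman code for both distributions, assuming that "optimal binary code" means a Huffman code. It uses the standard fact that the expected length equals the sum of the probabilities of all merged nodes; the function name is illustrative.

```python
import heapq
from fractions import Fraction

def huffman_expected_length(probs):
    """Expected codeword length (in bits) of a binary Huffman code."""
    heap = list(probs)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b          # each merge adds one bit to every leaf below it
        heapq.heappush(heap, a + b)
    return total

l_a = huffman_expected_length([Fraction(1, 10)] * 10)        # one digit
l_x = huffman_expected_length([Fraction(1, 10000)] * 10000)  # four digits jointly

print(float(l_a), float(l_x), float(4 * l_a))
```

Coding a single digit costs 3.4 bits on average, so four separately coded digits cost 13.6 bits, whereas the joint code stays closer to $\log_2 10000 \approx 13.29$ bits.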

8. Consider a memoryless source $X \in \{a, b, c, d\}$ (with probabilities $P[X = a] = 1/2$, $P[X = b] = 1/4$, $P[X = c] = 1/8$, $P[X = d] = 1/8$) that generates 10000 symbols/second. An optimal binary encoder for the second-order extension of this source produces

a) < 35000 bits/second;

b) > 35000 bits/second;

c) = 35000 bits/second.

Solution: the probabilities are powers of 2, thus the extension has no effect on the expected length of the optimal code, which is simply $1 \cdot (1/2) + 2 \cdot (1/4) + 3 \cdot (1/8) + 3 \cdot (1/8) = 7/4 = 1.75$ bits/symbol. Since the source produces 10000 symbols/second, we have 17500 bits/second (which is less than 35000 bits/second).

9. Consider the Markov source $X = (X_1, X_2, \ldots, X_t, \ldots)$, with $X_t \in \{1, 2, 3\}$, with the following transition matrix

$P = \begin{bmatrix} 1/2 & 1/4 & 1/4 \\ 1/2 & 1/4 & 1/4 \\ 1 & 0 & 0 \end{bmatrix}.$

The expected length of an optimal binary coding scheme for this source is:

a) = 3/2 bit/symbol;

b) > 3/2 bit/symbol;

c) < 3/2 bit/symbol.

Solution: the expected code-length is the weighted (by the stationary distribution) average of the expected lengths of the optimal codes for each row of $P$. In the second and third row, the entropy and expected length of the optimal code are both zero. In the first row, the entropy and the expected length of the optimal code are both 3/2. Of course, the average of two zeros and 3/2 (with any non-zero weights) is less than 3/2.

10. Consider the discrete memoryless source $X \in \{a, b, c, d\}$, with probabilities $P[X = a] = 0.4$, $P[X = b] = 0.25$, $P[X = c] = 0.175$, $P[X = d] = 0.175$. The length of the arithmetic codeword for the sequence "bbbb" is

a) 8 bits;

b) 9 bits;

c) 10 bits.

Solution: the length of the arithmetic codeword for the sequence is $\lceil -\log_2 P(bbbb) \rceil + 1$. Since $P(b) = 0.25 = 1/4 = 2^{-2}$, we have $P(bbbb) = 2^{-8}$ and $\lceil -\log_2 P(bbbb) \rceil + 1 = \lceil 8 \rceil + 1 = 8 + 1 = 9$.
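The length formula used in this solution is easy to evaluate for any sequence. A minimal sketch, assuming a memoryless source so that the sequence probability is the product of the symbol probabilities (the function name is illustrative):

```python
from math import ceil, log2

def arithmetic_codeword_length(sequence, probs):
    """Length ceil(-log2 P(sequence)) + 1 for a memoryless source."""
    p = 1.0
    for symbol in sequence:
        p *= probs[symbol]
    return ceil(-log2(p)) + 1

probs = {"a": 0.4, "b": 0.25, "c": 0.175, "d": 0.175}
length_bbbb = arithmetic_codeword_length("bbbb", probs)
print(length_bbbb)
```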

11. Which of the following binary words is the Elias delta code for the natural number 19?

a) 001010011;

b) 0010110011;

c) 001010010.

Solution: option (c) must be wrong, because 19 is odd, and any binary number ending in 0 is even. Now, we know that $19_{10} = 10011_2$, which has length 5. So, we need to write 5 using the Elias gamma code, which is its binary expression 101 preceded by a sequence of 2 zeros, that is, 00101. Finally, we remove the leading 1 from the binary expression for 19 and append the remaining bits 0011, getting 001010011.
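The two steps described in this solution (gamma-code the length, then append the number without its leading 1) can be sketched as follows; the function names are illustrative.

```python
def elias_gamma(n):
    """Elias gamma code: (len - 1) zeros followed by the binary form of n."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def elias_delta(n):
    """Elias delta code: gamma code of the bit length, then n without its leading 1."""
    b = bin(n)[2:]
    return elias_gamma(len(b)) + b[1:]

code_19 = elias_delta(19)
print(code_19)
```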

12. Which of the following sequences corresponds to the Lempel-Ziv-Welch (LZW) encoding of the sequence "gaba a baga e a faca" over the alphabet {space, a, b, c, d, e, f, g}, assuming that the dictionary indices start at 1?

a) 8,2,3,2,1,12,11,9,...;

b) 8,2,3,1,1,12,11,9,...;

c) 9,2,3,2,1,12,11,8,...

Solution: it's easy to see that option (c) must be wrong, because the first symbol is "g", which is the 8th symbol of the alphabet {space, a, b, c, d, e, f, g}. The sequence "8,2,3,2" means "gaba", while "8,2,3,1" means "gab " (ending in a space), so option (b) is also wrong.
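For readers who want to reproduce the full output, here is a minimal sketch of a textbook LZW encoder, assuming the dictionary is initialised with the alphabet in the order given in the question and indexed from 1.

```python
def lzw_encode(text, alphabet):
    """Textbook LZW encoder; dictionary indices start at 1."""
    dictionary = {s: i for i, s in enumerate(alphabet, start=1)}
    w, out = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch                                   # keep extending the match
        else:
            out.append(dictionary[w])                 # emit the longest match
            dictionary[w + ch] = len(dictionary) + 1  # learn the new phrase
            w = ch
    if w:
        out.append(dictionary[w])
    return out

# Alphabet order: space, a, b, c, d, e, f, g
codes = lzw_encode("gaba a baga e a faca", " abcdefg")
print(codes)
```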

13. Considering the same alphabet as in the previous question (and dictionary indices starting at 1), which of the following sequences corresponds to the LZW decoding of the sequence "7, 2, 9, 11, 10, 13, 9"?

a) faffaafffaaaff;

b) fafafafafafafa;

c) none of the above.

Solution: simply use the LZW decoding procedure.
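The decoding procedure can be sketched as below, under the same assumptions as the question (alphabet order space, a, ..., g and indices starting at 1). The only subtle step is the case where a received index is not yet in the dictionary, which the sketch treats as the phrase currently being built.

```python
def lzw_decode(codes, alphabet):
    """Textbook LZW decoder; dictionary indices start at 1."""
    dictionary = {i: s for i, s in enumerate(alphabet, start=1)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Index not yet known: assumed to be the phrase under construction,
            # i.e. the previous phrase followed by its own first symbol.
            entry = prev + prev[0]
        dictionary[len(dictionary) + 1] = prev + entry[0]
        out.append(entry)
        prev = entry
    return "".join(out)

decoded = lzw_decode([7, 2, 9, 11, 10, 13, 9], " abcdefg")
print(decoded)
```

Here the indices 11 and 13 both arrive before they have been added to the dictionary, so the special case is exercised twice.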

14. Consider the r.v. $X \in [0, 1] \subset \mathbb{R}$ with a uniform probability density function (p.d.f.) on the interval $[0, 1]$ and the r.v. $Y \in \mathbb{R}$, also with support in $[0, 1]$, with the following p.d.f.: $f_Y(y) = 2 - 2y$. Which of the following statements is true?

a) $h(X) = h(Y)$;

b) $h(X) < h(Y)$;

c) $h(X) > h(Y)$.

15. Consider a real r.v. $Z \in \mathbb{R}$ whose differential entropy is $h(Z)$; for a variable $T = g(Z)$, where $g$ is an injective function, it necessarily holds that

a) $h(T) \le h(Z)$;

b) $h(T) = h(Z)$;

c) none of the above.

Solution: for example, consider the variable $X$ in the previous question, which has $h(X) = \log 1 = 0$. Take $Y = \alpha X$, which, of course, is an injective function (if $\alpha \neq 0$). Now, $h(Y) = 0 + \log|\alpha| = \log|\alpha|$, which is positive (thus larger than $h(X)$) if $|\alpha| > 1$, and negative (thus smaller than $h(X)$) if $|\alpha| < 1$.

16. Consider a source $X \in [-1, 1]$ with probability density function $f_X(x) = (3/2)\,x^2$, connected to a non-uniform quantizer with the following 4 regions: $R_0 = [-1, -2/3]$, $R_1 = \,]-2/3, -1/3]$, $R_2 = \,]-1/3, 0]$, and $R_3 = \,]0, 1]$. The optimal representative of each of these regions

a) always lies to the right of the center of the region;

b) always lies to the left of the center of the region;

c) none of the above.

Solution: for $x < 0$, the pdf $(3/2)\,x^2$ is decreasing with $x$, thus the optimal region representatives are towards the left side of the regions. For $x > 0$, the pdf $(3/2)\,x^2$ is increasing with $x$, thus the optimal region representatives are towards the right side of the region.

17. Consider the source and the quantizer defined in the previous question (16); the entropy of the quantizer output is

a) = 1 bit/symbol;

b) > 1 bit/symbol;

c) < 1 bit/symbol.

Solution: the probability $p_3 = P(X \in R_3) = P(X \in \,]0, 1]) = 1/2$ (because the pdf is symmetric). Thus, the other three probabilities satisfy $p_0 + p_1 + p_2 = 1/2$. So, we need one bit (the entropy of $(1/2, 1/2)$ is 1 bit/symbol) to distinguish between $R_3$ and the other three regions, plus some more information to distinguish among the other three regions; hence the entropy exceeds 1 bit/symbol.
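The four cell probabilities and the resulting entropy can be computed exactly from the antiderivative of the pdf. A minimal sketch, assuming the pdf and regions of question 16 (names are illustrative):

```python
from fractions import Fraction
from math import log2

def cell_probability(a, b):
    """Integral of (3/2) x^2 over [a, b], i.e. (b^3 - a^3) / 2."""
    return (b**3 - a**3) / 2

edges = [Fraction(-1), Fraction(-2, 3), Fraction(-1, 3), Fraction(0), Fraction(1)]
probs = [cell_probability(a, b) for a, b in zip(edges, edges[1:])]
entropy = -sum(float(p) * log2(float(p)) for p in probs)

print(probs, round(entropy, 3))
```

The probabilities come out as 19/54, 7/54, 1/54, and 1/2, and the entropy is roughly 1.52 bits/symbol.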

18. Still considering the source defined in question 16, now connected to a 2-bit quantizer with the following set of region representatives (the codebook): $\{y_0 = -3/4,\ y_1 = -1/4,\ y_2 = 0,\ y_3 = 3/4\}$. Then, the point separating region $R_1$ from region $R_2$, in the quantizer that minimizes the mean squared error for this codebook,

a) lies exactly at zero;

b) lies to the left of zero;

c) lies to the right of zero.

Solution: the boundary between two regions does not depend on the pdf of the input to the quantizer and is simply the mid-point between the two elements of the codebook. Thus, the separation between $R_1$ and $R_2$ is at $((-1/4) + 0)/2 = -1/8$.

19. Still considering the source defined in question 16, now connected to a 4-bit uniform quantizer. The resulting mean squared error is

a) $= 2^{-8}/3$;

b) $< 2^{-8}/3$;

c) $> 2^{-8}/3$.

Solution: the high-resolution approximation for the MSE of a uniform quantizer is $\Delta^2/12$, where $\Delta$ is the width of the regions. In this case, we have $2^4 = 16$ regions over $[-1, 1]$, thus $\Delta = 1/8 = 2^{-3}$, $\Delta^2 = 2^{-6}$, and $\Delta^2/12 = 2^{-8}/3$, because $12 = 3 \cdot 2^2$. However, this is not the exact value, because the pdf is not uniform; the exact MSE is lower than the high-resolution approximation.
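The comparison between the exact MSE and $\Delta^2/12$ can be verified with exact rational arithmetic. The sketch below assumes that each of the 16 cells uses its centroid (conditional mean) as representative, which is an assumption not stated explicitly in the question; with other representatives the exact MSE would differ.

```python
from fractions import Fraction

def cell_moments(a, b):
    """Zeroth, first and second moments of f(x) = (3/2) x^2 over [a, b]."""
    p = (b**3 - a**3) / 2                  # ∫ (3/2) x^2 dx
    m1 = Fraction(3, 8) * (b**4 - a**4)    # ∫ x (3/2) x^2 dx
    m2 = Fraction(3, 10) * (b**5 - a**5)   # ∫ x^2 (3/2) x^2 dx
    return p, m1, m2

delta = Fraction(1, 8)
edges = [-1 + k * delta for k in range(17)]   # 16 uniform cells over [-1, 1]

mse = Fraction(0)
for a, b in zip(edges, edges[1:]):
    p, m1, m2 = cell_moments(a, b)
    mse += m2 - m1**2 / p    # distortion when the representative is the centroid m1/p

high_res = delta**2 / 12
print(float(mse), float(high_res))
```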


20. Consider a pair of real random variables $X = [X_1, X_2] \in \mathbb{R}^2$, independent and both with a Gaussian probability density function, with zero mean and equal variances. Consider the optimal 4-bit vector quantizer (16 regions) for $X$; the regions of this vector quantizer

a) are squares;

b) are hexagons;

c) none of the above.

Part II

Problem 1

Consider a discrete memoryless source $X \in \{2, 3\}$, with probability $P[X = 2] = 2/3$. Consider another source $Y$ that generates symbols according to $Y = X^N$, where $N \in \{1, 2\}$ is a random variable, independent of $X$, with probability $P[N = 1] = 1/2$.

Solution: The following is the table of joint probabilities of the pair $(X, Y)$:

P(X, Y)   x = 2   x = 3
y = 2     1/3     0
y = 3     0       1/6
y = 4     1/3     0
y = 9     0       1/6

Notice that this table shows that $X$ is a function of $Y$ (that is, knowing $Y$ implies knowing $X$), thus $H(X|Y) = 0$. From this table we also immediately get the marginal of $Y$: $P[Y = 2] = P[Y = 4] = 1/3$ and $P[Y = 3] = P[Y = 9] = 1/6$.

a) Compute the following quantities:

$H(X) = (2/3)\log_2(3/2) + (1/3)\log_2 3 = -(2/3) + \log_2 3 \simeq 0.92$ bit/symbol

$H(Y) = (2/3)\log_2 3 + (2/6)\log_2 6 = \log_2 3 + (1/3) \simeq 1.92$ bit/symbol

$H(X, Y) = H(X|Y) + H(Y) = 0 + H(Y) = \log_2 3 + (1/3) \simeq 1.92$ bit/symbol

$H(Y|X) = H(X, Y) - H(X) = \log_2 3 + (1/3) + (2/3) - \log_2 3 = 1$ bit/symbol

$I(X; Y) = H(X) - H(X|Y) = H(X) - 0 = -(2/3) + \log_2 3 \simeq 0.92$ bit/symbol
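These quantities can be cross-checked by building the joint distribution directly from the definitions of $X$ and $N$. A minimal sketch, with illustrative variable names:

```python
from fractions import Fraction
from math import log2

p_x = {2: Fraction(2, 3), 3: Fraction(1, 3)}
p_n = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Joint distribution of (X, Y) with Y = X**N and N independent of X.
joint = {}
for x, px in p_x.items():
    for n, pn in p_n.items():
        joint[(x, x**n)] = px * pn

# Marginal of Y.
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(float(p) * log2(float(p)) for p in probs if p > 0)

h_x = entropy(p_x.values())
h_y = entropy(p_y.values())
h_xy = entropy(joint.values())

print(f"H(X)={h_x:.3f}  H(Y)={h_y:.3f}  H(X,Y)={h_xy:.3f}  "
      f"H(Y|X)={h_xy - h_x:.3f}  I(X;Y)={h_x + h_y - h_xy:.3f}")
```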

b) Compute the expected lengths of the optimal binary codes for the sources $X$ and $Y$:

Solution: $C^{\text{opt}}(X)$ is trivial, since $X$ can only take two values. For $C^{\text{opt}}(Y)$, simply use the Huffman procedure to design the optimal code.

$L[C^{\text{opt}}(X)] = 1$ bit/symbol

$L[C^{\text{opt}}(Y)] = 2$ bits/symbol

c) Compute the expected length of the optimal joint binary code for the pair $(X, Y)$ (that is, of the optimal code for the joint distribution of $(X, Y)$).

Solution: Simply use the Huffman procedure to design the optimal code (using the probabilities in the table above):

$L[C^{\text{opt}}(X, Y)] = 2$ bits/symbol

d) Compute the expected length of the optimal quaternary code for the second-order extension of the source $X$ (expressed in quadrits per symbol of the source $X$).

Solution: since Y only takes four different values, the solution is trivial:

L2[C opt


Problem 2

Consider a first-order Markov source, with alphabet $\{a, b, c\}$, with the following transition matrix:

P(X_t | X_{t-1})   X_t = a   X_t = b   X_t = c
X_{t-1} = a        1/4       1/4       1/2
X_{t-1} = b        0         0         1
X_{t-1} = c        1         0         0

a) Determine the stationary distribution of this source.

Solution:

$p = [1/2, 1/8, 3/8]$

as you can check by observing that $P^T [1/2, 1/8, 3/8]^T = [1/2, 1/8, 3/8]^T$.
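The check mentioned in this solution takes only a few lines with exact fractions. A minimal sketch, with states ordered a, b, c as in the table:

```python
from fractions import Fraction as F

# Transition matrix, rows indexed by the previous symbol (a, b, c).
P = [[F(1, 4), F(1, 4), F(1, 2)],
     [F(0),    F(0),    F(1)],
     [F(1),    F(0),    F(0)]]

pi = [F(1, 2), F(1, 8), F(3, 8)]

# One step of the chain: next_pi[j] = sum_i pi[i] * P[i][j], i.e. P^T applied to pi.
next_pi = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]

print(next_pi == pi)
```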

b) Knowing that the probability of the sequence abca is twice the probability of the sequence bcaa and three times that of the sequence cabc, that is, $P[abca] = 2P[bcaa]$ and $P[abca] = 3P[cabc]$, determine the initial distribution.

Solution: To simplify the notation, write $P[X_1 = a] = p_a$, $P[X_1 = b] = p_b$, and $P[X_1 = c] = p_c$. Knowing that $P[abca] = p_a \cdot (1/4) \cdot 1 \cdot 1 = (1/4)\,p_a$, that $P[bcaa] = (1/4)\,p_b$, and that $P[cabc] = (1/4)\,p_c$, we can write the two relations given in the question as $p_a = 2 p_b$ and $p_a = 3 p_c$. Combining with the condition $p_a + p_b + p_c = 1$, we have a system of three equations with solution

$p_1 = [p_a, p_b, p_c] = [6/11, 3/11, 2/11]$.

c) Determine the conditional entropy rate of this source and the expected length of the optimal coding scheme.

Solution: The entropies of the 2nd and 3rd rows are zero, and the entropy of the first row is 3/2. Thus,

$H(X) = (3/2) \cdot (1/2) + 0 \cdot (1/8) + 0 \cdot (3/8) = 3/4$ bits/symbol

$L[C^{\text{opt}}(X)] = (3/2) \cdot (1/2) + 0 \cdot (1/8) + 0 \cdot (3/8) = 3/4$ bits/symbol

where the equality between the two results from the fact that all non-zero probabilities are powers of 2.

d) Consider that this source is encoded with a fixed code, $\{C(a) = 0,\ C(b) = 10,\ C(c) = 11\}$; compute the resulting expected length.

Solution: The expected length under the distribution in the first row is $L_1 = 1 \cdot (1/4) + 2 \cdot (1/4) + 2 \cdot (1/2) = 7/4$ bits/symbol. For rows 2 and 3, we have $L_2 = 1 \cdot 0 + 2 \cdot 0 + 2 \cdot 1 = 2$ bits/symbol and $L_3 = 1 \cdot 1 + 2 \cdot 0 + 2 \cdot 0 = 1$ bit/symbol. Finally, weighting by the stationary distribution,

$L[C(X)] = (7/4) \cdot (1/2) + 2 \cdot (1/8) + 1 \cdot (3/8) = 12/8 = 3/2$ bits/symbol.
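As a cross-check of the expected length of the fixed code, the sketch below computes it in two ways, assuming the source is in its stationary regime: as the stationary-weighted average of the per-row expected lengths, and directly as the average codeword length under the stationary distribution. For the transition matrix in the problem statement both evaluate to 3/2 bits/symbol.

```python
from fractions import Fraction as F

lengths = {"a": 1, "b": 2, "c": 2}   # C(a) = 0, C(b) = 10, C(c) = 11

# Transition probabilities P(X_t | X_{t-1}), keyed by the previous symbol.
rows = {"a": {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2)},
        "b": {"a": F(0),    "b": F(0),    "c": F(1)},
        "c": {"a": F(1),    "b": F(0),    "c": F(0)}}
stationary = {"a": F(1, 2), "b": F(1, 8), "c": F(3, 8)}

# Expected length of the next codeword given the previous symbol.
row_length = {prev: sum(p * lengths[s] for s, p in row.items())
              for prev, row in rows.items()}

by_rows = sum(stationary[prev] * row_length[prev] for prev in rows)
direct = sum(stationary[s] * lengths[s] for s in lengths)

print(row_length, by_rows, direct)
```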

Problem 3

Consider a real random variable $X$ with probability density function

$f_X(x) = \begin{cases} B\,|x|, & x \in [-1, 1], \\ 0, & x \notin [-1, 1]. \end{cases}$

a) Determine $B$ so that $f_X$ is a valid probability density function.

Solution: The area under $f_X(x)$ is that of two triangles of base 1 and height $B$; each has area $B/2$, so the sum of the two areas is $B$. Since this total area has to be one, that is, $\int_{-1}^{1} f_X(x)\,dx = 1$, we get

$B = 1$.

b) Consider that the variable $X$ is applied to the input of a non-uniform quantizer with the following 3 cells:

Solution:

$y_0 = -y_2 = -\frac{7}{9}$ (due to the symmetry of the problem; see $y_2$)

$y_1 = 0$ (trivial, because the region is symmetric around 0 and so is the pdf)

$y_2 = \dfrac{\int_{1/2}^{1} x \cdot x\,dx}{\int_{1/2}^{1} x\,dx} = \dfrac{\int_{1/2}^{1} x^2\,dx}{\int_{1/2}^{1} x\,dx} = \dfrac{\left[x^3/3\right]_{1/2}^{1}}{\left[x^2/2\right]_{1/2}^{1}} = \dfrac{(1/3) - (1/24)}{(1/2) - (1/8)} = \dfrac{7}{9}$

c) Consider that the variable $X$ is applied to the input of a quantizer with the following 3 cells: $R_0 = [-1, -C]$, $R_1 = \,]-C, C]$, $R_2 = \,]C, 1]$. The output of this quantizer is a discrete random variable $\hat{X} = Q(X) \in \{y_0, y_1, y_2\}$. Determine $C$ so that the entropy $H(\hat{X})$ takes the maximum possible value.

Solution: Define $p_0 = P[X \in R_0] = P[\hat{X} = y_0]$, $p_1 = P[X \in R_1] = P[\hat{X} = y_1]$, and $p_2 = P[X \in R_2] = P[\hat{X} = y_2]$. Of course, by symmetry of the pdf and of $R_1$, we have $p_0 = p_2 = (1 - p_1)/2$. So, to maximize the entropy we need $p_1 = 1/3$ (which implies that $p_0 = p_2 = (1 - (1/3))/2 = 1/3$), that is,

$\int_{-C}^{C} |x|\,dx = \frac{1}{3} \;\Rightarrow\; 2\int_{0}^{C} x\,dx = \frac{1}{3} \;\Rightarrow\; 2\left[\frac{x^2}{2}\right]_{0}^{C} = \frac{1}{3} \;\Rightarrow\; C^2 = \frac{1}{3} \;\Rightarrow\; C = \sqrt{\frac{1}{3}}$

d) Consider now that the variable $X$ is applied to the input of an optimal 1-bit uniform quantizer (that is, with the following 2 cells: $R_0 = [-1, 0]$ and $R_1 = \,]0, 1]$). Determine the exact value of the mean squared error (MSE) and its high-resolution approximation (denoted $\widetilde{\mathrm{MSE}}$).

Solution: first we need the optimal representatives. Of course, by symmetry, $y_0 = -y_1$, and

$y_1 = \dfrac{\int_{0}^{1} x^2\,dx}{\int_{0}^{1} x\,dx} = \dfrac{\left[x^3/3\right]_{0}^{1}}{\left[x^2/2\right]_{0}^{1}} = \dfrac{1/3}{1/2} = \dfrac{2}{3}.$

Now, by symmetry of the regions,

$\mathrm{MSE} = 2\int_{0}^{1} \left(x - \tfrac{2}{3}\right)^2 x\,dx = 2\Bigl( \underbrace{\int_{0}^{1} x^3\,dx}_{[x^4/4]_0^1 = 1/4} - \tfrac{4}{3} \underbrace{\int_{0}^{1} x^2\,dx}_{[x^3/3]_0^1 = 1/3} + \left(\tfrac{2}{3}\right)^2 \underbrace{\int_{0}^{1} x\,dx}_{[x^2/2]_0^1 = 1/2} \Bigr) = \frac{2}{4} - \frac{8}{9} + \frac{4}{9} = \frac{1}{18}$

For the high-resolution approximation, since $\Delta = 1$, we have simply

$\widetilde{\mathrm{MSE}} = \dfrac{\Delta^2}{12} = \dfrac{1}{12}.$
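The centroid and the exact MSE can be reproduced with exact fractions from the moments of the density on the positive half-line. This is a minimal sketch assuming $B = 1$, so that $f_X(x) = x$ on $[0, 1]$; the helper `moment` is illustrative.

```python
from fractions import Fraction as F

def moment(k, a, b):
    """Integral of x^k * f(x) over [a, b] for f(x) = x, with 0 <= a <= b <= 1."""
    return (b**(k + 2) - a**(k + 2)) / (k + 2)

a, b = F(0), F(1)
p = moment(0, a, b)              # P[X in ]0, 1]]
y1 = moment(1, a, b) / p         # centroid of the cell ]0, 1]

# E[(X - y1)^2] over ]0, 1], doubled to account for the symmetric cell [-1, 0].
mse = 2 * (moment(2, a, b) - 2 * y1 * moment(1, a, b) + y1**2 * p)

high_res = F(1)**2 / 12          # Delta^2 / 12 with Delta = 1

print(y1, mse, high_res)
```

The same helper also gives the centroid of the cell $]1/2, 1]$ as `moment(1, F(1, 2), F(1)) / moment(0, F(1, 2), F(1))`.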

e) Consider now that the variable $X$ is applied to the input of a non-uniform quantizer with the following 11 cells: $R_0 = [-1, 0]$ and $R_i = \,](i-1)/10,\ i/10]$, for $i = 1, \ldots, 10$. Determine the high-resolution approximation of the mean squared error, denoted $\widetilde{\mathrm{MSE}}$.

Solution: using the facts that $p_0 = P[X \in R_0] = 1/2$, that $\Delta_0 = 1$, that $\Delta_i = 1/10$ for $i = 1, \ldots, 10$, and that $\sum_{i=1}^{10} p_i = 1 - p_0 = 1/2$, we obtain

$\widetilde{\mathrm{MSE}} = \dfrac{1}{12} \sum_{i=0}^{10} p_i \Delta_i^2 = \dfrac{1}{12}\Bigl( \Delta_0^2\, p_0 + \sum_{i=1}^{10} p_i \Delta_i^2 \Bigr) = \dfrac{1}{12}\Bigl( \dfrac{1}{2} + \dfrac{1}{100} \sum_{i=1}^{10} p_i \Bigr) = \dfrac{1}{12}\Bigl( \dfrac{1}{2} + \dfrac{1}{200} \Bigr) = \dfrac{1}{12} \cdot \dfrac{101}{200} = \dfrac{101}{2400}$
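The weighted sum above can be evaluated cell by cell. A minimal sketch, assuming $B = 1$ and using the antiderivative $x|x|/2$ of $|x|$ to get each cell probability:

```python
from fractions import Fraction as F

# Cell boundaries: R0 = [-1, 0] and Ri = ](i-1)/10, i/10] for i = 1..10.
cells = [(F(-1), F(0))] + [(F(i - 1, 10), F(i, 10)) for i in range(1, 11)]

def prob(a, b):
    """Integral of |x| over [a, b], using the antiderivative x|x|/2."""
    return (b * abs(b) - a * abs(a)) / 2

# High-resolution approximation: (1/12) * sum_i p_i * Delta_i^2.
mse_hr = sum(prob(a, b) * (b - a)**2 for a, b in cells) / 12

print(mse_hr)
```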
