• Nenhum resultado encontrado

Randomized Algorithms I

N/A
N/A
Protected

Academic year: 2023

Share "Randomized Algorithms I"

Copied!
89
0
0

Texto

(1)

Introduction to

Randomized Algorithms I

Joaquim Madeira

Version 0.3 – November 2019

(2)

Overview

Deterministic vs Non-Deterministic Algorithms

Randomized Algorithms

Randomness as a Source of Efficiency – Example

Discrete Probability

Statistical Experiments and Events

Probabilities and Random Variables

Simulation of Random Events

Application Examples

Statistical Experiments

Simple Games

(3)

Algorithms

Algorithm

Sequence of non-ambiguous instructions

Finite amount of time

Input to an algorithm

An instance of the problem the algorithm solves

How to classify / group algorithms?

Type of problems solved

Design techniques

(4)

Deterministic Algorithms

A deterministic algorithm

Returns the same answer no matter how many times it is called on the same data.

Always takes the same steps to complete the task when applied to the same data.

The most familiar kind of algorithm !

There is a more formal definition in terms of state machines…

(5)

Non-Deterministic Algorithms

A non-deterministic algorithm

Can exhibit different behavior, for the same input data, on different runs.

As opposed to a deterministic algorithm !

Often used to obtain approximate solutions to given problem instances

When it is too costly to find exact solutions using a deterministic algorithm

(6)

Non-Deterministic Algorithms

How to behave differently from run to run ?

Factors of non-deterministic behavior

External state other than the input data

User input / timer values / random values

Timing-sensitive operation on multiple processor machines

Hardware errors might force state to change in unexpected ways

(7)

Randomized Algorithms

Use a degree of randomness as part of an algorithm’s logic

Algorithm behavior can be guided by random bits as an auxiliary input

Take decisions by tossing coins !

Aiming at good performance on average !

(8)

Randomized Algorithms

What is the effect of randomness?

Algorithm running time and / or algorithm output are random variables

Determined by the random bits / by the coin tossing results

(9)

Randomness as a source of efficiency

Computers C1 and C2 at separate locations

Connected via a network

Initial copies of the same DB: DB1 and DB2

BUT, contents evolve over time !

DB changes have been done simultaneously

(10)

Deterministic approach

DBs of size n bits (e.g., n = 1016)

Is the data on both computers the same ?

Yes / No – Decision Problem

What is the number of bits that have to be exchaged, between C1 and C2, to solve the problem ?

At least n bits !!

Send the entire DB, without communication errors

(11)

Randomized approach

Contents of DB1 are a string X of n bits

Contents of DB2 are a string Y of n bits

C1 makes a uniform random choice of a prime number p from [2, n2]

Computes s = Number(X) mod p

String X is the binary rep. of natural Number(X)

And sends (s, p) to C2

(12)

Randomized approach

C2 reads (s, p)

Computes r = Number(Y) mod p

If s ≠ r, then C2 outputs “X ≠ Y”

If s = r, then C2 outputs “X = Y”

Reliable answers ?

(13)

Randomized approach

Size of the message (s, p) ?

At most,

4 × ceil( log2 n ) bits

Given that s ≤ p < n2

n = 1016 implies a message of, at most, 256 bits

(14)

Randomized approach

Reliability of the final answer ?

If X = Y, then the answer is always correct !!

If X ≠ Y, then the answer might be wrong !!

For X ≠ Y, the output might be “X = Y”, if the chosen prime was a “bad” prime for (X, Y)

Number(X) mod p = Number(Y) mod p, with X ≠ Y

(15)

Randomized approach

Choose p from {2, 3, 5, 7, 11, 13, 17, 19, 23}

p = 7

X = 01111 → Number(X) = 15

Y = 10110 → Number(Y) = 22

Number(X) mod p = Number(Y) mod p

BUT, X ≠ Y

(16)

Randomized approach

Error probability ?

At most, (ln n2) / n , which presents no real risk…

For n = 1016 the error probability is, at most, 0.36892 × 10-14

(17)

Randomized approach

If we want to be safer, we can use 10 rand.

chosen primes

10 independent repetitions

Message will be 10 times larger !

Error probability ?

Are all 10 primes “bad” primes ?

For n = 1016 the error probability is smaller

(18)

Random Number Generators

The source of randomness is usually a random number generator

Repeated calls return a stream of numbers

That appear to be randomly chosen

From some range / interval

In reality, they are pseudo-random numbers !

Generated by particular recurrence relations

It is possible to calculate each value from a sequence of preceeding values !!

(19)

Random Number Generators

Check the story of Daniel Corriveau at

http://www.americancasinoguide.com/gambling- stories/costly-casino-mistakes-the-keno-mix-

up.html

What happened ?

(20)

Python – The random Module

For integers

randint(…)

randrange(…)

For sequences

choice(…)

sample(…)

(21)

Python – The random Module

For generating real-valued distributions

random() # next random float in [0,1)

uniform(…)

gauss(…)

(22)

Python – The random Module

Reproducibility

It might be useful to reproduce the sequences given by a pseudo random number generator

Re-using of seed values

Same sequence should be reproducible from run to run, as long as multiple threads are not running

seed(…)

(23)

Python – The secrets Module

The pseudo-random generators of the random module should not be used for security purposes !

For security or cryptographic uses, use the secrets module instead !

Generation of secure random numbers

(24)

Discrete Probability

Chance enters into many attempts to understand the world we live in

A theory of probability allows us to calculate the likelihood of complex events

Probabilities are called “discrete” if we can compute the probabilities of all events by summation

(25)

Probability Space

Probability theory starts with the idea of a probability space (Ω,Pr)

A set Ω of all things that can happen

A rule assigning a probability Pr(ω) to each elementary event ω in Ω

(26)

Probability Distribution

For a discrete probability space

Pr(ω) ≥ 0

∑ Pr(ω) = 1

Pr is the probability distribution

It distributes the total probability among the elementary events

(27)

Example – Fair dice

Roll one fair 6-sided die

Each of the 6 possibilities has probability 1/6

Roll a pair of fair dice

Set of elementary events : D2 = ?

Probability of each event ?

(28)

Example – “Loaded” dice

Distribution of probabilities

Probability of each event in D2 ?

(29)

Example – Fair die + “Loaded” die

Consider the case of one fair die and one loaded die

Real-world dice do not turn up equally often on each side !!

No perfect symmetry !!

BUT, 1/6 is usually close to the truth…

(30)

Example – Doubles are thrown

The event that “doubles are thrown”

Probability of an event A

Pr(“doubles are thrown”) = ?

When is this event more probable ?

(31)

Statistical experiments and events

Statistical experiment

Repeatable experiment where the particular outcome of a trial cannot be predicted with certainty

Sample space, S

Set with the representation of all possible outcomes of an experiment

I.e., set of elementary events

Examples ?

(32)

Statistical experiments and events

Event, E

A set of elementary events

Any subset of a sample space

Examples

{2, 4, 6} – Getting an even number when throwing a 6- sided die

Probability ?

(33)

Probabilities

The probability of an event, E, describes the degree of uncertainty of that event

0 ≤ 𝑃 𝐸 ≤ 1

𝑃 𝑆 = 1

𝑃 Ø = 0

𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃[𝐵], if A and B are disjoint

If (𝐴 ∩ 𝐵) ≠ Ø, 𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃 𝐵 − 𝑃 𝐴 ∩ 𝐵

(34)

Simple problem

Throwing a 6-sided fair die

What is the probability of getting an even number ?

What is the probability of getting a number larger than 2 ?

What is the probability of getting an even number or a number larger than 2 ?

(35)

Independent events

Two events A and B are said to be independent, if the occurrence of one does not affect the

occurrence of the other

Two events A and B are independent, if 𝑃 𝐴 𝑎𝑛𝑑 𝐵 = 𝑃 𝐴 × 𝑃[𝐵]

(36)

Conditional probability

Conditional probability of event A given event B

𝑃 𝐴 𝐵 = 𝑃 𝐴 𝑎𝑛𝑑 𝐵

𝑃[𝐵] , 𝑃 𝐵 ≠ 0

What happens if they are independent ?

Bayes’ Theorem

𝑃 𝐴 𝐵 = 𝑃 𝐵 𝐴 𝑃 𝐴 𝑃[𝐵]

(37)

Uniform probability distribution on S

Each elementary event in S has the same probability

𝑃 𝐴 = 𝑃 𝐵 = 1 𝑆

(38)

Uniform Distribution

(39)

Uniform Distribution

(40)

Uniform Distribution

(41)

Uniform Distribution

(42)

Gaussian Distribution

(43)

Gaussian Distribution

(44)

Gaussian Distribution

(45)

Gaussian Distribution

(46)

Gaussian Distribution

(47)

Another simple problem

Tossing three fair coins

What is the sample space S3 ?

How high is the probability of getting at least one head ?

And at least two heads ?

Idea: relate to the binary representation Idea: triangular representation – paths

(48)

More difficult problem

Tossing n fair coins

How large is the probability to get “head” exactly k times ?

How large is the probability to get “head” at least k times ?

You can write some code to check your answers…

(49)

Binomial probability distribution

Characterizes the probability of obtaining k

“successes” in n experiments 𝑆 = 0,1,2, … , 𝑛

𝑃 𝑋 = 𝑘 = 𝑛

𝑘 𝑝𝑘(1 − 𝑝)𝑛−𝑘, 𝑘 = 0,1,2, … , 𝑛 Check your previous answers !!

(50)

Tasks

What is the probability of getting 6 heads in 15 tosses of a fair coin ?

Estimate the value with simulated experiments

Check that you got the correct value by

computing the probability from the binomial distribution

Now, consider that P[heads] = 2 × P[tails]

(51)

Random variable

A random variable is a function that assigns a real number to each elementary event of the sample space

Discrete r. v. – countable set of real values

Values obtained by throwing a dice numbered 1 to 6

Assigning 0 to tail and 1 to heads when tossing a coin

Continuous r. v. – interval or collection of intervals

Examples: temperature of a room or weight of product

(52)

Example – Throwing two dice

S(w) = sum of spots on the dice roll w

What is the probability that the spots total 7 ?

Fair dice ?

Loaded dice ?

(53)

Example – Throwing two dice

A random variable is characterized by the probability distribution of its values

s 2 3 4 5 6 7 8 9 10 11 12

Pr00[S=s] 1 36

2 36

3 36

4 36

5 36

6 36

5 36

4 36

3 36

2 36

1 36 Pr11[S=s] 4

64

4 64

5 64

6 64

7 64

12 64

7 64

6 64

5 64

4 64

4 64

(54)

Discrete random variable – Features

Mean – Expected value

𝜇 = 𝐸 𝑋 = ෍

𝑛

𝑋𝑛𝑃[𝑋 = 𝑋𝑛]

Variance

Measures how far a set of numbers are spread out

𝜎2 = 𝐸 (𝑋 − 𝜇)2 = 𝐸 𝑋2 − 𝜇2

𝜎 is the standard deviation

(55)

Sum of independent random vars.

Let 𝑍 = 𝑋 + 𝑌 be the sum of two independent random variables, defined on the same

probability space

𝐸 𝑍 = 𝐸 𝑋 + 𝑌 = 𝐸 𝑋 + 𝐸[𝑌]

𝜎

𝑧2

= 𝐸 (𝑋 + 𝑌 − 𝜇)

2

= 𝜎

𝑋2

+ 𝜎

𝑌2

(56)

Sum of independent random vars.

Let 𝑆𝑛 be the sum of 𝑛 independent and identically distributed random variables

𝑆𝑛 = 𝑋1 + 𝑋2 + … + 𝑋𝑛 𝐸 𝑆𝑛 = 𝑛 × 𝐸[𝑋]

𝜎

𝑆2𝑛

= 𝑛 × 𝜎

𝑋2

(57)

Sequence of numbers – Average value

Mean

Sum of all values, divided by the number of values

Median

Middle value, numerically

Mode

Value that occurs most often

(58)

statistics – Python module

Computing mathematical statistics of numeric data

Averages and measures of central location

mean(…)

median(…)

mode(…)

Measures of spread

stdev(…)

variance(…)

(59)

Estimating the mean of a rand. var.

Set of independent empirical observations

Sample mean is

𝜇 = 1

𝑛 ෍ 𝑋𝑖

Keep a record of the sum as the experiment progresses

Update the sample mean, when needed

(60)

Estimating the mean of a rand. var.

Sample variance is

𝜎2 = 1

𝑛 − 1 ෍ 𝑋𝑖2 1

𝑛(𝑛 − 1) ෍ 𝑋𝑖

2

Keep a record of the sums as the experiment progresses

Update the sample variance, when needed

Estimate the mean as

𝜇 ± ො𝜎/ 𝑛

(61)

Example – 10 rolls of two dice

Sample mean of the spot sum

Ƹ𝜇 = 7.4

Sample variance

𝜎

2

≈ 2.1

2

Estimate

(62)

Simulation of Random Events

Model random events, such that simulated

outcomes closely match real-world outcomes

Analyze simulated outcomes to gain insight !

Why approximate the real-world ?

No precise mathematical description…

OR

Less time / effort / cost than other approaches

(63)

Simulations have to be useful

How to mirror the real-world ?

1st – Prepare the experiment !

Identify the possible outcomes

Link each outcome to one (or more) random number(s)

Choose a source of random numbers

(64)

Simulations have to be useful

2nd – Run the experiment loop !

Choose one (or more) random number(s)

Record the simulated outcome

3rd – Analyze the data and report results

Histogram

(65)

Applications

Simulation of real-world systems for which the input is random in some way

Queueing in check-out lines

Simulation of statistical experiments

Tossing balanced / biased coins

Throwing fair / unfair dice

(66)

A Coin Experiment – V1

Toss a balanced coin n times

n >= 1 – parameter of the experiment

n independent replications of the simplest exp.

Record the total score of the experiment

1 for heads or 0 for tails

What do you expect ?

(67)

A Coin Experiment – V2

The coin is now biased !!

It turns up heads only 45% of the time

Again, toss the biased coin n times

And record the total score

Now, what do you expect ?

(68)

Tasks – Simulations

Simulate both coin experiments

For n = 1, 3, 5, and 7

Run the simulations 10, 100 and 1000 times

Observe the outcomes

Histograms

(69)

A Die Experiment – V1

Throw a standard 6-sided die n times

Record the total score of the experiment

What do you expect ?

(70)

A Die Experiment – V2

The die is now an unfair die !!

For which an ace is twice as likely to turn up as any other face

Again throw the unfair die n times

And record the total score

What do you expect ?

(71)

Tasks – Simulations

Simulate both die experiments

For n = 1, 3, 5, and 7

Run the simulations 10, 100 and 1000 times

Observe the outcomes

(72)

Another experiment with coins

Toss two balanced coins n times !!

Record the total score of the experiment

1 for heads or 0 for tails

What do you expect ?

(73)

Another experiment with dice

Throw a pair of fair dice n times !!

Record the sum of the faces that turn up

What do you expect ?

(74)

Tasks – Simulations

Simulate both experiments

For n = 1, 3, 5, and 7

Run the simulations 10, 100 and 1000 times

Observe the outcomes

Histograms

(75)

A Die-Coin Experiment

A standard die is thrown and then a coin is tossed the number of times shown on the die

Compound experiment

Second, dependent stage

Record the total coin score

Randomization of the first coin experiment !

(76)

A Coin-Dice Experiment

A coin is tossed

If the coin lands heads, a red die is throw

If the coin lands tails, a green die is thrown

Again, a compound experiment

Record the die color and score

(77)

Tasks – Simulations

Simulate both experiments

Run the simulations 10, 100 and 1000 times

Observe the outcomes

Histograms

(78)

Extra Tasks

Simulate experiments using k-sided dice

(79)

Task – A Simple Game

You pay 1 euro to roll two dice

Red + Green

You win 2 euros, if there are more eyes on red than on the green die

Should you play this game ?

Run a few simulations and decide !!

(80)

Task – A Simple Game

You roll two dice and, beforehand, guess the sum of the eyes: n eyes

If the guess turns out to be right, you earn n euros; otherwise, you pay 1 euro

Should you play this game ?

Run a few simulations and decide !!

(81)

Task – Problem 1

Consider the statistical experiment in which a fair die is thrown repeatedly until one of the faces appears for the second time

An outcome of the experiment is a list of the faces that are thrown

Possible outcomes: (3, 6, 2, 6) ; (5, 5) ; (1, 4, 3, 6, 2, 4)

Let Y denote the random variable that tells how many throws were necessary to produce a repetition of a face

For the examples above: 4, 2, 6

Simulate an observation of Y

(82)

Task – Problem 2

In a party with n people, what is the probability of at least two of them celebrating their birthday in the same day ?

What is the smallest n that guaranties that the previous probability is above 50% ?

Estimate the values with simulated experiments

Consider that each birthday is equally likely

Read about “The Birthday Paradox” !

(83)

Task – Problem 3

Consider n = 4000

Generate random numbers in the domain [n] until two have the same value

How many random trials did that take ?

Use k to represent this value

Repeat the experiment m = 300 times, and record for each how many random trials that took

(84)

Task – Problem 3

Plot that data as a cumulative density plot

The x-axis records the number k of trials required, and the y-axis records the fraction of experiments that succeeded (a collision) after k trials

Empirically estimate the expected number of k random trials in order to have a collision

That is, add up all values k, and divide by m

How long did it take ?

Carry out some tests for much larger n and m values !!

(85)

Task – Problem 4

Consider n = 200

Generate random numbers in the domain [n] until every value i in [n] has had one random number equal to i

How many random trials did that take ?

Use k to represent this value

Repeat the experiment m = 300 times, and record for

each how many random trials were required to collect all values

(86)

Task – Problem 4

Plot that data as a cumulative density plot

Empirically estimate the expected number of k random trials in order to collect all values

How long did it take ?

Carry out some tests for much larger n and m values !!

Read about “The Coupon Collectors Problem” !!

(87)

Extra Task – Problem 5

Consider a blind-folded game of darts

n darts are thrown to m targets

Each dart reaches one and only one target !

What is the probability of no target being hit more than once ?

What is the probability of at least one target being hit at least twice ?

(88)

References

D. Vrajitoru and W. Knight, Practical Analysis of Algorithms, Springer, 2014

Chapter 6

R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989

Chapter 8

J. Hromkovic, Design and Analysis of Randomized Algorithms, Springer, 2005

Chapter 1 + Chapter 2

H. P. Langtangen, A Primer on Scientific Programming with Python, 4th Ed., Springer, 2014

(89)

Acknowledgments

An earlier version of some of these slides was developed by Professor Carlos Bastos

Some of the initial examples with dice are from the “Concrete Mathematics” book

Problems 3 and 4 are from Jeff M. Phillips’ Data Mining course at the University of Utah

Referências

Documentos relacionados

O presente trabalho surge no âmbito da dissertação do Mestrado em Reabilitação Psicomotora da Faculdade de Motricidade Humana, Universidade de Lisboa e tem como objetivo

Assim, com este estudo tem-se como objetivo conhecer o perfil e avaliar a qualidade de vida e satisfação no trabalho dos profissionais de enfermagem em Atenção Básica (AB) no

a família das personalidades anti-sociais: ao contrário das anteriores, estas pessoas caracterizam-se por uma disposição persistente para o comportamento anti-social em geral

Some binary mixtures of fatty acids, mixtures of fatty esters or systems composed by fatty acids and fatty alcohols present this behavior.. Simple-eutectic mixtures are

A metodologia do índice de competitividade turística desenvolvida pela FGV em parceria com o MTur e o Sebrae Nacional destinos indutores surgiu como uma ferramenta de

Para esta discussão iremos definir as OBE como: acções coordenadas que procuram condicionar o estado de um sistema através da aplicação integrada de instrumentos de poder

Dados semelhantes foram igualmente encontrados em Portugal, nomeadamente nos estudos realizados pela equipa coordenada por Margarida Matos, onde se constatou, para além da

EpiReumaPt Investigation Team - Sociedade Portuguesa de Reumatologia; Instituto de Medicina Molecular da Faculdade de Medicina da Universidade de Lisboa; Serviço de Reumatologia