Design of Impact Evaluation Port-au-Prince

(1)

Impact evaluation and measurement issues

Training for University of Haiti-Tulane Health Monitoring and Evaluation Course May 2013

(2)

Learning Objectives

To become familiar with specific research designs used to answer impact and effectiveness questions
To be aware of measurement issues
To explore the full range of evaluation designs

(3)

Definition

Impact Evaluation: A type of outcome evaluation that focuses on the broad, longer-term impact or results – often health results – of an intervention in a population. For example, an impact evaluation could show that a decrease in vertical transmission of HIV was the direct result of a program designed to improve testing and referral services for pregnant women, provide high-quality delivery practices, pre- and post-natal treatment, and appropriate counseling and support for feeding.

(4)

Theory of Impact Analysis

 Measurement of impact requires an evaluation framework
 Measurement of effect and attribution require a theory of impact analysis
 Requires a theory about how the program or intervention works (treatment or program theory)
 Impact analysis draws on the work of Campbell and Stanley, Experimental and Quasi-experimental Designs for Research. Chicago: Rand McNally (1966)

(5)

Evaluation framework

Logical model of treatment with elements that can be observed and measured (sound working knowledge)
A design to measure the model of treatment (or effect)
A way to measure efficacy/effectiveness and coverage of the intervention (and to judge success), also called adequacy

(6)

Evaluation Framework

Combines 3 dimensions discussed in the course:
 The relationship of the intervention to the problem
 The “character” of the intervention and therefore the strategy for the design
 Judgement, to identify criteria for success

(7)

Impact analysis

The key to impact analysis is measuring:
 What did happen, and attributing it to the program, as compared to
 What would have happened if the program had not been implemented (the counterfactual)

(8)

Impact analysis

 Impact analysis is about “cause and effect”

 X produces Y1, Y2…YX

 Measured as a regression coefficient, a difference between two means, or a difference between two proportions, with tests of statistical significance
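As a rough illustration (not from the original slides), the two simplest versions of these tests look like the following Python sketch; the outcome values and counts are hypothetical.

import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical outcome values for a program group and a control group
y_program = np.array([12.1, 10.4, 13.0, 11.8, 12.7, 9.9])
y_control = np.array([10.2, 9.8, 11.1, 10.0, 9.5, 10.7])

# Difference between two means, with a t-test of statistical significance
t_stat, p_means = stats.ttest_ind(y_program, y_control)

# Difference between two proportions (e.g., 42/120 vs. 28/118 "successes"), z-test
z_stat, p_props = proportions_ztest(count=[42, 28], nobs=[120, 118])

print(p_means, p_props)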

(9)

Measuring impact

Problem:
 Measurement result: R (received treatment) – C (control) = E (Effect)
 But E doesn’t tell you if the effect is big enough to be a “success”
 So, compare to planned impact or adequacy
 Coverage: consider adequacy (the proportion of the problem covered by the intervention)
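A minimal worked example (hypothetical numbers, not from the slides) of comparing the measured effect to the planned impact and allowing for coverage:

# Hypothetical outcome levels
R = 0.62                  # outcome in the group that received the treatment
C = 0.48                  # outcome in the control group
E = R - C                 # measured effect

planned_impact = 0.10     # the impact the program planned to achieve
coverage = 0.55           # proportion of the target population actually reached

print(f"Effect E = {E:.2f}; meets planned impact? {E >= planned_impact}")
print(f"Effect diluted by coverage: {E * coverage:.2f}")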

(10)

Internal and External Validity

Internal validity
 Related closely to design
 Conclusions regarding what happened to subjects at that time and in that context are actual conclusions
External validity
 Are the results generalizable to other populations, settings, and times?

(11)

Internal Validity

The goal of the design strategy for impact questions is internal validity

(12)

Internal Validity

Internal validity refers to the extent to which the design enables you to determine that the program, rather than other factors, caused the changes you have observed.
This is important when answering impact and effectiveness questions.
(13)

Threats to Internal Validity

There are several threats:

 Selection

 Instrumentation

 History

 Contamination

Listing them does not mean they actually exist
You need to consider the plausibility of each threat

(14)

Threats to internal validity

Selection

 Something other than T accounts for the outcome: the 2 populations are different from the beginning
 Two kinds of selection bias:
 P = differences at pretest
 Q = all other selection biases (example: early adopters, people easier to influence)

(15)

Threats to Internal Validity

Instrumentation

 Is the factor you are exploring conceptualized correctly?
 Does your questionnaire operationalize the concept correctly?
(16)

Threats to internal validity

History

 Something else besides the T accounts for the outcome:
 External events
 Maturation
 Regression
 Attrition

(17)

Threats to internal validity

Contamination

 Intervention not delivered properly

(18)

Tips to reduce threats to internal validity
 Determine the comparability of program participants and the “typical” population
 Limit the time period between pretest and posttest and identify other changes in the community
 Carry out the pretest and posttest with the same methodology
 Ensure that participants are not “extreme”
 Ensure maximum control over the validity and reliability of measurement
 Identify any natural changes in the population over time
 Identify the effect of participants dropping out

(19)

Threats to external validity

Even if a program worked in a given population, how do you know if it would work in another?

(20)

Threats to external validity

Random selection, sociodemographic diversity, a large study
 How often do we have a chance to do that?
Time: will the intervention be effective over time?

(21)

Threats to external validity

Operationalization of:
 The intervention (radio vs. face-to-face; kind of condom)
 The selection and measurement of variables

(22)

Importance of Design

Designs attempt to eliminate or reduce other possible explanations.
Design is crucial in evaluations that want to show that the program caused the desired result or had an impact.

(23)

General Types of Designs for Answering Impact Questions

Experimental

Quasi-Experimental

(24)

Types of Design

Experimental Design

Key elements:
Central control of selection of participants

(25)

Experimental Design 1

 R: O1E X O2E
 R: O1C    O2C
 R indicates Random assignment
 O is the Observation or measurement
 E is the experimental group, C is the control group
 O1 = pretest; O2 = posttest
 X is the Program (or treatment)
 Tic-tac-toe

(26)

Experimental Design 2

R: X O2T
R:    O2C

(27)

What’s the difference between 1 and 2?

What don’t you get without the pretest?

What do you gain?

(28)

Experimental Design 1

 R: X1E T Y2E
 R: X1C    Y2C
 R indicates Random assignment
 E is the experimental group, C is the control group
 X = pretest; Y = posttest on outcomes of interest
 T is the treatment

(29)

Experimental Design 2

R: T Y2T
R:    Y2C

(30)

Experimental Design

 Y2T − Y2C (Y = mean) is tested for significance to see whether the two values could have come from the same population or from different populations; you also have to make assumptions about the randomization (happy or unhappy)
 If X (a pretest) exists, it can be included as a control variable
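A sketch of both analyses on simulated (hypothetical) data: a simple test of the posttest means, and an OLS model that adds the pretest X as a control variable. Variable names are illustrative.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),        # random assignment: 1 = treatment, 0 = control
    "pretest": rng.normal(50, 10, n),      # X, the pretest measure
})
df["posttest"] = 0.5 * df["pretest"] + 3 * df["treat"] + rng.normal(0, 5, n)   # Y2

# Y2T vs. Y2C: test whether the two posttest means could come from the same population
t_stat, p_val = stats.ttest_ind(df.loc[df["treat"] == 1, "posttest"],
                                df.loc[df["treat"] == 0, "posttest"])

# If a pretest X exists, include it as a control variable
model = smf.ols("posttest ~ treat + pretest", data=df).fit()
print(p_val, model.params["treat"])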

(31)

Sensitivity

How small the difference between Y2T and Y2C can be and still demonstrate impact of the program
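Sensitivity is usually worked out as a power calculation. A minimal sketch, assuming a two-arm design with 100 subjects per arm and conventional alpha and power (all numbers hypothetical):

from statsmodels.stats.power import TTestIndPower

# Smallest standardized difference between Y2T and Y2C the design can detect
mde = TTestIndPower().solve_power(effect_size=None, nobs1=100, ratio=1.0,
                                  alpha=0.05, power=0.80)
print(f"Minimum detectable effect: {mde:.2f} standard deviations")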

(32)

Sensitivity

(33)

Types of Design

Quasi-Experiment

“Quasi” means no random assignment
Key elements:
 Comparison (with and without the Program)

(34)

Quasi-Experimental Design

A/C: X1E T Y2E   Program Group
A/C: X1C    Y2C   Control Group
Groups:
 Matched pairs
 Non-equivalent comparison groups
A = autonomous; C = controlled

(35)

Types of Design

Quasi-Experimental Designs (cont’d)
Use when you cannot control the process for deciding who gets the treatment
Weak because there may be selection bias and other biases
But this is often more practical in public health settings
(36)

Quasi-Experimental Design: Interrupted Time Series
Key elements: many measures before and after the “treatment”
A/C: Y1 Y2 Y3 T Y4 Y5 Y6
(Some suggest that you should have at least 10 measures before and after the treatment (T))
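One common way to analyze an interrupted time series is a segmented regression with terms for the pre-existing trend, the level change at T, and the slope change after T. A sketch on a hypothetical six-point series (a real analysis would use the longer series the slide recommends):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"period": range(1, 7),
                   "y": [20, 21, 22, 30, 31, 33]})       # hypothetical measures Y1..Y6
df["post"] = (df["period"] > 3).astype(int)              # 1 after the treatment T
df["time_since_t"] = np.maximum(df["period"] - 3, 0)     # periods elapsed since T

# "post" estimates the level change at T; "time_since_t" the change in slope
model = smf.ols("y ~ period + post + time_since_t", data=df).fit()
print(model.params)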

(37)

Quasi-Experimental Design: Comparative Time Series
Key elements: many measures before and after the “treatment”
A/C: Y1E Y2E Y3E T Y4E Y5E Y6E
A/C: Y1C Y2C Y3C    Y4C Y5C Y6C
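The comparative series is often analyzed as a difference-in-differences: the change in the program group minus the change in the control group. A sketch on hypothetical data (the interaction term is the impact estimate):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group":  ["E"] * 6 + ["C"] * 6,
    "period": list(range(1, 7)) * 2,
    "y": [20, 21, 22, 30, 31, 33,      # program group, T introduced after period 3
          19, 20, 21, 22, 23, 24],     # control group
})
df["post"] = (df["period"] > 3).astype(int)
df["program"] = (df["group"] == "E").astype(int)

model = smf.ols("y ~ program * post", data=df).fit()
print(model.params["program:post"])    # difference-in-differences estimate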

(38)

Types of Design

Ex post facto and Non-Experimental Designs

Key elements:

 No random assignment

 Maybe no before-program measures

(39)

Ex post facto and Non-experimental Design
Ex post facto: no true sampling plan
Before-and-After Design: Y1 T Y2
One-Shot Design

(40)

Ex post facto and Non-experimental Design
 Very common!
 No evaluator control of selection or exposure to treatment
 Threat of “spuriousness”
 Threat of self-selection and “volunteerism”
 Can “control” by selecting a criterion population with some of the same characteristics as the volunteers.

(41)

Non-Experimental Designs: Pre- and Posttest
Provides a measure of change, with preliminary evidence when supported by strong process evaluation data, but no strong conclusive results.
Uses:
 To conduct a pilot test
 To demonstrate the impact of a short-term intervention
The period between O1 and O2 should be as short as possible.
Maintain maximum control over the validity and reliability of measurement and data collection methods.
This design is susceptible to almost all the threats to internal validity.

(42)

Learning from non-experiments
Analyses:
 Regression analysis
 Econometric techniques
 Propensity scoring
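A compact sketch of propensity scoring on simulated observational data: model the probability of receiving the program from observed covariates, then compare outcomes within propensity strata. The covariates and effect size are hypothetical, and stratification is only one of several ways to use the scores.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"age": rng.normal(30, 8, n),
                   "urban": rng.integers(0, 2, n)})
p_treat = 1 / (1 + np.exp(-(0.03 * (df["age"] - 30) + 0.5 * df["urban"])))
df["treated"] = rng.binomial(1, p_treat)                       # self-selected exposure
df["y"] = 2 * df["treated"] + 0.1 * df["age"] + rng.normal(0, 1, n)

# Propensity score: predicted probability of treatment given the covariates
ps = LogisticRegression().fit(df[["age", "urban"]], df["treated"])
df["pscore"] = ps.predict_proba(df[["age", "urban"]])[:, 1]

# Compare treated and untreated outcomes within propensity-score strata
df["stratum"] = pd.qcut(df["pscore"], 5, labels=False)
effect = (df.groupby("stratum")
            .apply(lambda g: g.loc[g["treated"] == 1, "y"].mean()
                           - g.loc[g["treated"] == 0, "y"].mean())
            .mean())
print(f"Stratified treatment-effect estimate: {effect:.2f}")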

(43)

Types of Evaluation Design

Is randomized assignment used?
 YES → Randomized or true experiment
 NO → Is there a control group or multiple measures?
   YES → Quasi-experiment
   NO → Non-experiment

(44)

Considerations in Choosing an Evaluation Design
 What is the strength of evidence required to address the purpose of the evaluation?
 Are there any ethical or legal considerations?
 What is the amount of resources available?
 Has the intervention been introduced already?
 What is the time frame required?

(45)

Discussion: Applying Design to the Case Study
 Given what you know about programs to interrupt vertical transmission, what type of design could you use to:
1) Determine the impact of counseling on mothers?
2) Determine the impact of the program on vertical transmission?
3) Determine whether Ministry staff are satisfied with the performance of clinicians who participated in the training?

(46)

Measurement Strategy

What do you want to know?

How will you know it?

(47)

Developing a Measurement Strategy
Conceptual definition
 Of key terms/concepts: training, counseling, attitudes
 Boundaries: in 9 district hospitals from 2005-2006
Operational definition
 How will each variable be measured?

(48)

Indicators/Monitoring

Monitoring program performance at repeated intervals to track progress requires the use of carefully identified and defined indicators so that meaningful comparisons can be made.
The definition and measurement issues we discuss here are common to both monitoring and evaluation.

(49)

Definitions

 An indicator is a word or phrase which “indicates” the level or extent of some phenomenon of interest
(Example: % of HIV+ mothers receiving nevirapine)
 A measure is the operational definition of how data are collected to assign a value to an indicator
(Example: % of pregnant antenatal attendees who accept an HIV test, test positive, are counseled, and receive nevirapine)
(50)

Defining Your Terms

It means translating vague words into specific meanings.
Defining your terms means obtaining agreement from the stakeholders about the question, the definitions, and the measures.
(51)

Defining Your Terms

Sometimes it is difficult to assign a number or to actually measure what you want to measure.
For example, you may not really be able to measure the quality of a program. Instead, you may have to be content with measuring whether people think it is a quality program.

(52)

Example: Training clinical staff

 Clinician attitudes:
 Measured by using a survey that asks clinicians about their attitudes
 Quality of care:
 Measured by having observers rate specific components of performance
 Effectiveness of the training system:
 Measured by the number of participants
 Measured by meeting set targets for % HIV
(53)

Case: Measures

1. Did clinician attitudes change after the training?
a. Indicator: attitudes
b. Measure: responses to a series of attitude questions about the kind of women who are HIV+ (0-4 scale)
2. Did patients counseled intend to test?
a. Indicator: % agreeing to test
b. Measure: # of eligible mothers tested / total # of eligible mothers
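For question 2 the measure is simply a proportion; a trivial sketch with hypothetical counts:

eligible_mothers = 480      # total number of eligible mothers (hypothetical)
tested = 312                # eligible mothers who agreed to test (hypothetical)
pct_agreeing = 100 * tested / eligible_mothers
print(f"% agreeing to test: {pct_agreeing:.1f}%")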

(54)

Some commonly used measures
 Frequencies, percents, proportions
 Means, medians, modes
 Cost, in currency
 Percent change over time or between groups
 Rates, ratios
 Treatment effect: the difference captured by βT
 Yi = α + βXi + ui, the same as Yi = α + βTTi + ui, where T = treatment (0,1)
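A sketch of that regression on simulated data, where treat is the 0/1 treatment indicator and its coefficient is the estimated treatment effect (numbers hypothetical):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 300
df = pd.DataFrame({"treat": rng.integers(0, 2, n)})        # Ti = 0 or 1
df["y"] = 10 + 2.5 * df["treat"] + rng.normal(0, 3, n)     # true effect = 2.5

model = smf.ols("y ~ treat", data=df).fit()                # Yi = a + bT*Ti + ui
print(model.params["treat"], model.pvalues["treat"])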

(55)

Key issues about measures

Are they relevant?

Are they valid?

Are they reliable?

Are they precise?

(56)

The problem of behavior

What is a behavior?

Are behaviors consistent?

(57)

Data Source Issue

What are the best sources of data?

 Validity  Reliability

Do the data already exist?

 Are they reliable?

(58)

Discussion: Where Can We Find Data?
 Monitoring (M): Number of training seminars held
 (M) Number of clinicians who completed training

 (E) Attitudes of clinicians

 (E) Quality of care

 (E) Quality of teaching material

 (M) Participation of clinicians

(59)

Data Source Lessons

Which ones might be easier to obtain?
Which ones might be very difficult to obtain?
How accurate and reliable is each of the data sources?
How valid are existing data?

(60)

Case discussion

Goal: Capability of health educators is upgraded
 How do they define capability?

(61)

Evaluation Grid

 One tool that some find useful is the evaluation grid
 This tool helps you see how you intend to answer each question
 For each question, you will need to identify the information needed, sources of that information, and how you will collect the data

(62)

Evaluation Grid

Columns: Evaluation criteria | Evaluation questions | Sub-questions (2a) | Basis for judgement | Data needed | Data sources | Data collection methods
Rows: Relevance | Effectiveness | Efficiency | Impact | Sustainability | Others

(63)

Exercise: Evaluation Grid

 For PMTCT:
 Identify two evaluation questions
 What data/measures would best answer your questions?
 What are likely sources of information?
 Complete the Data Needed column and the Data Source column

(64)

Bibliography

 Mohr, L.B. Impact Analysis for Program Evaluation. Thousand Oaks: Sage, 1995.
 Habicht, J.P., C.G. Victora and J.P. Vaughan. Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. Int J of Epidemiology.
