Impact evaluation and
measurement issues
Training for the University of Haiti-Tulane Health Monitoring and Evaluation Course, May 2013
Learning Objectives
To become familiar with specific
research designs used to answer impact
and effectiveness questions
To be aware of measurement issues
To explore the full range of evaluation designs
Definition
Impact Evaluation: A type of outcome evaluation
that focuses on the broad, longer-term impact or
results (often health results) of an intervention in a population.
For example, an impact evaluation could show that a decrease in vertical transmission of HIV was the direct result of a program designed to improve testing and referral services for pregnant women, provide high-quality delivery practices, pre- and post-natal treatment, and appropriate counseling and support for infant feeding.
Theory of Impact Analysis
Measurement of impact requires an
evaluation framework
Measurement of effect and attribution require
a theory of impact analysis
Requires a theory about how the program or
intervention works (treatment or program theory)
Impact Analysis draws on the work of
Campbell and Stanley, Experimental and
Quasi-Experimental Designs for Research. Chicago: Rand McNally, 1966
Evaluation framework
Logical model of treatment with
elements that can be observed and
measured (sound working knowledge)
A design to measure the model of
treatment (or effect)
A way to measure efficacy/effectiveness
and coverage of the intervention (and
to judge success) also called adequacy
Evaluation Framework
Combines 3 dimensions discussed in the
course:
The relationship of the intervention to the
problem
The “character” of the intervention and
therefore the strategy for the design
Judgement, to identify criteria for success
Impact analysis
The key to impact analysis is
measuring:
What did happen, attributing it to the
program, as compared to
What would have happened if the program had not existed (the counterfactual)
Impact analysis
Impact analysis is about “cause and effect”
X produces Y1, Y2…Yn
Measured as a regression coefficient, a
difference between two means, or a difference between two proportions, with tests of statistical significance (see the sketch below)
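A minimal Python sketch (hypothetical data and counts; all values are assumptions) of two of these common forms of the impact estimate, each with a significance test:

```python
# Impact as a difference between two means and between two proportions,
# each with a test of statistical significance (hypothetical data)
import numpy as np
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

rng = np.random.default_rng(0)
y_t = rng.normal(12, 3, 100)     # outcome in the treated group
y_c = rng.normal(10, 3, 100)     # outcome in the control group

# Difference between two means (two-sample t-test)
t_stat, p_means = stats.ttest_ind(y_t, y_c)

# Difference between two proportions (e.g., 60/100 vs. 45/100 accepting a test)
z_stat, p_props = proportions_ztest(count=[60, 45], nobs=[100, 100])
print(f"means test p = {p_means:.4f}; proportions test p = {p_props:.4f}")
```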
Measuring impact
Problem:
Measurement result:
R (received treatment) – C (control) = E
(Effect)
But E doesn’t tell you if the effect is big enough
to be a “success”
So, compare to planned impact or adequacy
Coverage:
Consider adequacy (the proportion of the problem covered by the program); a sketch follows
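A minimal sketch of the adequacy judgement, with assumed values for R, C, the planned impact, and coverage:

```python
# Judging adequacy: compare the measured effect E = R - C to the planned
# impact, then check coverage (all numbers here are assumptions)
r = 0.72            # outcome among those who received treatment (R)
c = 0.55            # outcome in the control group (C)
e = r - c           # measured effect E

planned_impact = 0.10
print(f"E = {e:.2f}; meets planned impact: {e >= planned_impact}")

# Coverage: proportion of the affected population actually reached
reached, in_need = 3200, 10000
print(f"Coverage: {reached / in_need:.0%}")
```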
Internal and External Validity
Internal validity
Related closely to design
Conclusions regarding what happened to
subjects at that time and in that context are sound conclusions
External validity
Are the results generalizable to other populations, places, and times?
Internal Validity
Goal of Design Strategy for impact
questions is Internal Validity
Internal Validity
Internal validity refers to the extent to
which the design enables you to
determine that the program, rather
than other factors, caused the changes
you have observed.
This is important when answering impact and effectiveness questions
Threats to Internal Validity
There are several threats:
Selection
Instrumentation
History
Contamination
Listing them does not mean they actually exist
You need to consider the plausibility of each threat
Threats to internal validity
Selection
Something other than T accounts for the
outcome: the 2 populations are different from the beginning
Two kinds of selection bias:
P = differences at pretest
Q = all other selection biases:
Example = early adopters, people easier to influence
Threats to Internal Validity
Instrumentation
Is the factor you are exploring
conceptualized correctly?
Does your questionnaire operationalize the concept correctly?
Threats to internal validity
History
Something else beside the T accounts for
outcome
External events
Maturation
Regression
Attrition
Threats to internal validity
Contamination
Intervention not delivered properly
Tips to reduce threats to internal
validity
Determine the comparability of program participants and
the “typical” population
Limit the time period between pretest and posttest and identify
other changes in the community
Carry out the pretest and posttest with the same methodology
Ensure that participants are not “extreme”
Ensure maximum control over validity and reliability of
measurement
Identify any natural changes in the population over time
Identify the effect of participants dropping out
Threats to external validity
Even if program worked in a given
population, how do you know if it would
work in another?
Threats to external validity
Random selection, sociodemographic
diversity, large study
How often do we have a chance to do
that?
Time: will the intervention be effective
over time?
Threats to external validity
Operationalization of:
The intervention (radio vs. face-to-face;
kind of condom)
The selection and measurement of
variables
Importance of Design
Designs attempt to eliminate or reduce
other possible explanations
Design is crucial in evaluations that
want to show that the program caused
the desired result or had an impact.
General Types of Designs for
Answering Impact Questions
Experimental
Quasi-Experimental
Types of Design
Experimental Design
Key elements
Central control of selection of
participants
Experimental Design 1
R: O1E  X  O2E
R: O1C     O2C
R indicates Random assignment
O is the Observation or measurement
E is the experimental group, C is the control group
O1 = pretest; O2 = posttest
X is the Program (or treatment)
(The diagram layout resembles a tic-tac-toe grid)
Experimental Design 2
R: X  O2T
R:    O2C
What’s the difference between Designs 1 and 2?
What don’t you get without the pretest?
What do you gain?
Experimental Design 1
R: X1E  T  Y2E
R: X1C     Y2C
R indicates Random assignment
E is the experimental group, C is the control group
X = pretest; Y = posttest on outcomes of interest
T is the treatment
Experimental Design 2
R: T  Y2T
R:    Y2C
Experimental Design
Ȳ2T - Ȳ2C (Ȳ = mean) is tested for significance, to see whether the two groups could have come from the same population or from different ones; you have to make assumptions about the randomization (happy or unhappy)
If a pretest X exists, it can be included as a control variable (see the sketch below)
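A minimal sketch (hypothetical data; variable names are assumptions) of testing Ȳ2T - Ȳ2C, and of adding a pretest X as a control variable:

```python
# Posttest-only comparison of means, then a regression that includes the
# pretest X as a covariate to sharpen the treatment-effect estimate
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)                  # pretest scores (X)
t = np.repeat([1.0, 0.0], 100)               # 1 = experimental, 0 = control
y = 0.8 * x + 5 * t + rng.normal(0, 8, 200)  # posttest (Y), true effect = 5

# Difference between the two posttest means
t_stat, p = stats.ttest_ind(y[t == 1], y[t == 0])

# With a pretest, the coefficient on t estimates the effect more precisely
fit = sm.OLS(y, sm.add_constant(np.column_stack([t, x]))).fit()
print(p, fit.params[1])
```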
Sensitivity
How small can the difference between Ȳ2T and Ȳ2C be and still demonstrate impact of the program?
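A minimal power sketch of sensitivity, using an assumed sample size, alpha, and power, to find the smallest standardized difference the design could detect:

```python
# Minimum detectable effect size for a two-group comparison
# (sample size, alpha, and power here are assumptions)
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
# Solve for the standardized effect size detectable with n = 100 per group
mde = analysis.solve_power(nobs1=100, alpha=0.05, power=0.80, ratio=1.0)
print(f"Minimum detectable effect (Cohen's d): {mde:.2f}")
```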
Types of Design
Quasi-Experiment
“Quasi” means no random assignment
Key elements:
Comparison (with and without the
Program)
Quasi-Experimental Design
A/C: X1E  T  Y2E   Program Group
A/C: X1C     Y2C   Control Group
Groups
Matched pairs
Non-equivalent comparison groups
A = autonomous; C = controlled
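One common way to analyze this pretest-posttest design with a non-equivalent comparison group is a difference-in-differences contrast; a minimal sketch with assumed group means:

```python
# Difference-in-differences for the quasi-experimental design above
# (all means are hypothetical, assumed values)

x_e, y_e = 42.0, 55.0   # program group: pretest (X1E) and posttest (Y2E) means
x_c, y_c = 40.0, 46.0   # control group: pretest (X1C) and posttest (Y2C) means

# Nets out pre-existing group differences and shared over-time trends
did = (y_e - x_e) - (y_c - x_c)
print(f"Estimated program effect: {did:.1f}")
```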
Types of Design
Quasi-Experimental Designs (cont’d)
Use when you cannot control the
process for deciding who gets the
treatment
Weak because there may be selection
bias and other biases
But this is often more practical in public health settings
Quasi-Experimental Design:
Interrupted Time Series
Key elements: many measures before
and after the “treatment”
A/C: Y1 Y2 Y3  T  Y4 Y5 Y6
(Some suggest that you should have at
least 10 measures before and after the
treatment (T))
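A minimal segmented-regression sketch for an interrupted time series (hypothetical monthly data; all variable names are assumptions):

```python
# Segmented regression for an interrupted time series: estimate the level
# change at the interruption and any change in trend afterward
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_pre, n_post = 12, 12
time = np.arange(n_pre + n_post)
post = (time >= n_pre).astype(float)            # 1 after the treatment T
time_since = np.where(post == 1, time - n_pre + 1, 0)

# Simulated outcome: baseline trend plus a level drop of 6 at the interruption
y = 50 + 0.3 * time - 6 * post + rng.normal(0, 2, n_pre + n_post)

X = sm.add_constant(np.column_stack([time, post, time_since]))
fit = sm.OLS(y, X).fit()
# Coefficient on `post` = immediate level change;
# coefficient on `time_since` = change in trend
print(fit.params)
```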
Quasi-Experimental Design:
Comparative Time Series
Key elements: many measures before
and after the “treatment”
A/C: Y1E Y2E Y3E  T  Y4E Y5E Y6E   Program Group
A/C: Y1C Y2C Y3C     Y4C Y5C Y6C   Control Group
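A minimal sketch extending the segmented regression above with a control series; the coefficient on the post*group interaction estimates the program effect net of shocks shared by both series (hypothetical data):

```python
# Comparative (controlled) time series: stack the program and control series
# and let the post*group interaction carry the program effect
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 12                                        # measures per segment
time = np.tile(np.arange(2 * n), 2)           # two stacked series
group = np.repeat([1.0, 0.0], 2 * n)          # 1 = program (E), 0 = control (C)
post = (time >= n).astype(float)

# Both series share a shock of +2 after T; the program series adds a
# true effect of 5 on top of it
y = 50 + 0.2 * time + 2 * post + 5 * post * group + rng.normal(0, 2, 4 * n)

X = sm.add_constant(np.column_stack([time, group, post, post * group]))
fit = sm.OLS(y, X).fit()
print(fit.params[-1])   # coefficient on post*group, roughly the effect of 5
```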
Types of Design
Ex post facto and Non-Experimental
Designs
Key elements:
No random assignment
Maybe no before-program measures
Ex post facto and
Non-experimental Design
Ex post facto: no true sampling plan
Before and After Design: Y1 T Y2
One Shot: T Y2
Ex post facto and
Non-experimental Design
Very common!
No evaluator control of selection or exposure
to treatment
Threat of “spuriousness”
Threat of self-selection and “volunteerism”
Can “control” by selecting a criterion population,
with some of the same characteristics as the volunteers.
Non Experimental Designs:
Pre and posttest
Provides a measure of change, with preliminary
evidence, when supported by strong process
evaluation data, but no strong conclusive results.
Uses:
To conduct a pilot test
To demonstrate impact of short term intervention
The period between O1 and O2 should be as short as possible
Maximum control over validity and reliability of measurement and data collection methods
This design is susceptible to almost all the threats to internal validity
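A minimal sketch of the usual paired analysis for this single-group design (hypothetical data); the significance test cannot, by itself, rule out the threats listed above:

```python
# Paired test of change for a one-group pretest-posttest (O1 T O2) design
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
o1 = rng.normal(60, 10, 50)            # pretest scores
o2 = o1 + 4 + rng.normal(0, 5, 50)     # posttest scores, mean change of 4

t_stat, p = stats.ttest_rel(o2, o1)
print(f"Mean change: {np.mean(o2 - o1):.1f}, p = {p:.3f}")
```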
Learning from
non-experiments
Analyses:
Regression analysis
Econometric techniques
Propensity scoring
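A minimal propensity-scoring sketch (hypothetical data and covariates): model the probability of receiving treatment from observed characteristics, then weight the comparison by the inverse of that probability:

```python
# Propensity scoring for a non-experiment: adjust for self-selection into
# treatment using inverse-probability-of-treatment weights
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 500
age = rng.normal(30, 8, n)
urban = rng.binomial(1, 0.5, n)
# Treatment uptake depends on covariates (self-selection)
p_treat = 1 / (1 + np.exp(-(0.05 * (age - 30) + 0.8 * urban - 0.4)))
t = rng.binomial(1, p_treat)
y = 2.0 * t + 0.1 * age + urban + rng.normal(0, 1, n)   # true effect = 2

X = np.column_stack([age, urban])
ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

w = np.where(t == 1, 1 / ps, 1 / (1 - ps))
effect = (np.average(y[t == 1], weights=w[t == 1])
          - np.average(y[t == 0], weights=w[t == 0]))
print(f"Weighted treatment-effect estimate: {effect:.2f}")
```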
Types of Evaluation Design
Is randomized assignment used?
  YES: Randomized or True Experiment
  NO: Is there a control group or multiple measures?
    YES: Quasi Experiment
    NO: Non Experiment
Considerations in Choosing an
Evaluation design
What is the strength of evidence required to address
the purpose of the evaluation?
Are there any ethical or legal considerations?
What is the amount of resources available?
Has the intervention been introduced already?
What is the time frame required?
Discussion:
Applying Design to Case study
Given what you know about programs to
interrupt vertical transmission,
what type of design could be used to:
1) Determine the impact of counseling on mothers?
2) Determine the impact of the program on vertical transmission?
3) Determine whether Ministry staff are
satisfied with the performance of clinicians who participated in the training?
Measurement Strategy
What do you want to know?
How will you know it?
Developing Measurement
Strategy
Conceptual definition
Of Key terms/concepts:
Training, counseling, attitudes
Boundaries:
In 9 district hospitals from 2005-2006
Operational definition
How will each variable be measured?
Indicators/Monitoring
Monitoring program performance at
repeated intervals to track progress
requires the use of carefully identified
and defined indicators so that
meaningful comparisons can be made.
The definition and measurement issues
we discuss here are common to both
monitoring and evaluation.
Definitions
An indicator is a word or phrase which
“indicates” the level or extent of some phenomenon of interest
(Example: % HIV+ mothers receiving nevirapine)
A measure is the operational definition of how
data are collected to assign a value to an indicator
(Example: % of pregnant antenatal attendees who accept an HIV test, test positive, are counseled, and who receive nevirapine)
Defining Your Terms
It means translating vague words into
specific meanings.
Defining your terms means obtaining
agreement from the stakeholders about
the question, the definitions, and the measures
Defining Your Terms
Sometimes it is difficult to assign a
number or to actually measure what
you want to measure.
For example, you may not really be able
to measure the quality of a program.
Instead, you may have to be content
with measuring whether people think it
is a quality program.
Example: Training clinical staff
Clinician attitudes:
Measured by using a survey that asks clinicians
about their attitudes
Quality of care:
Measured by having observers rate specific
components of performance
Effectiveness of the training system:
Measured by the number of participants trained
Measured by meeting set targets for % HIV+ mothers receiving nevirapine
Case: Measures
1. Did clinician attitudes change after the
training?
a. Indicator: attitudes
b. Measure: responses to a series of
attitude questions about the kind of women who are HIV+ (0-4 scale)
2. Did patients counseled intend to test?
a. Indicator: % agreeing to test
b. Measure: # of eligible mothers tested / total # of eligible mothers
Some commonly used
measures
Frequencies, percents, proportions
Means, Medians, Modes
Cost, in currency
Percent change over time or between groups
Rates, Ratios
Treatment effect: the coefficient βT (the treatment-control difference)
Yi = α + βXi + ui is the same as Yi = α + βT Ti + ui, where Ti = treatment (0, 1)
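A minimal sketch (hypothetical data) showing that βT from this regression equals the difference between the two group means:

```python
# The coefficient on a 0/1 treatment indicator is exactly the
# treatment-control difference in means
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
t = np.repeat([0.0, 1.0], 150)
y = 10 + 3 * t + rng.normal(0, 2, 300)      # true treatment effect = 3

beta_t = sm.OLS(y, sm.add_constant(t)).fit().params[1]
diff_means = y[t == 1].mean() - y[t == 0].mean()
print(beta_t, diff_means)                   # identical up to floating point
```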
Key issues about measures
Are they relevant?
Are they valid?
Are they reliable?
Are they precise?
The problem of behavior
What is a behavior?
Are behaviors consistent?
Data Source Issue
What are the best sources of data?
Validity
Reliability
Do the data already exist?
Are they reliable?
Discussion: Where Can we
Find Data?
Monitoring (M): Number of training seminars
held
(M) Number of clinicians who completed
training
(E) Attitudes of clinicians
(E) Quality of care
(E) Quality of teaching material
(M) Participation of clinicians
Data Source Lessons
Which ones might be easier to obtain?
Which ones might be very difficult to
obtain?
How accurate and reliable are each of
the data sources?
How valid are existing data?
Case discussion
Goal: Capability of health educators is
upgraded
How do they define capability?
Evaluation Grid
One tool that some find useful is the
evaluation grid
This tool helps you see how you intend to
answer each question
For each question, you will need to identify
the information needed, sources of that
information, and how you will collect the data
Evaluation Grid
Grid rows (evaluation criteria): Relevance, Effectiveness, Efficiency, Impact, Sustainability, Others
Grid columns: Evaluation Questions | Basis for Judgement | Data Needed | Data Sources | Data Collection Methods
Exercise: Evaluation Grid
For PMTCT:
Identify two evaluation questions
What data/measures would best answer your
questions?
What are likely sources of information?
Complete the Data Needed and Data Source columns
Bibliography
Mohr, L.B. Impact Analysis for Program Evaluation. Thousand Oaks: Sage, 1995.
Habicht, J.P., C.G. Victora and J.P. Vaughan. Evaluation designs for adequacy, plausibility and probability of public health programme performance and impact. Int J Epidemiol 1999; 28(1): 10-18.