RE-512-1
INTERNAL USE
PUBLIC UPON APPROVAL OR CONSIDERATION
ANNEX V. CLEAR QUALITY ASSESSMENT
OF IDB IMPACT EVALUATIONS
IDB’S IMPACT EVALUATIONS: PRODUCTION, USE, AND INFLUENCE
Office of Evaluation and Oversight, OVE
Inter-American Development Bank Washington, D.C. September 2017
This document contains confidential information relating to one or more of the ten exceptions of the Access to Information Policy and is made available only to Bank employees. The document will be disclosed and made available to the public upon approval or consideration.
TABLE OF CONTENTS
EXECUTIVE SUMMARY
I. DATABASE OF IMPACT EVALUATION STUDIES AT IDB
II. QUALITY ASSESSMENT: INSTRUMENTS AND IMPLEMENTATION
III. RESULTS
IV. PROPOSALS
APPENDIX 1
APPENDIX 2
APPENDIX 3
APPENDIX 4
QUALITY ASSESSMENT OF IDB’S IMPACT EVALUATIONS1
EXECUTIVE SUMMARY
Multilateral development banks (MDBs) and other international development agencies have increasingly emphasized the use of rigorous evidence for decision making. This trend has led to greater effort and more resources devoted to conducting impact evaluation studies. The quality of evaluation research significantly influences the way it is used.2 Useful evidence needs to come from applied research that is relevant, has a clearly stated purpose, is adequately designed, is methodologically rigorous, and relies on appropriate data to estimate results.
However, conducting rigorous evaluations entails several challenges, so it is necessary to constantly analyze the quality of available research to ensure that decision makers are indeed using sound, reliable evidence. In this context, this project3 performed a quality analysis of impact evaluations conducted by the Inter-American Development Bank (IDB), as part of the Office of Evaluation and Oversight’s (OVE) activities.
This document presents the results of a quality assessment of a sample of IDB impact evaluations, including both completed evaluations and proposals. The assessment instrument features ratings and opinions on relevance, methodology, data, results, and overall quality. This report is structured in three sections: 1) explanation of the impact evaluation documents database that OVE compiled; 2) description of the method used for the quality assessment, including the instrument elaborated by OVE and piloted by CLEAR and the Centro de Investigación y Docencia Económicas (CIDE), as well as its implementation process; 3) analysis of key results. The last section contains two subsections: completed evaluations and evaluation proposals.
This assessment yields two main takeaways. First, about half of the evaluation reports are at least partially satisfactory in terms of overall quality. Second, quality varies considerably by time period, sector, and specific quality criterion.
Overall, completed evaluation reports are relevant, since a large share of them intends to provide new evidence on how well a new approach or program works, including those supported by IDB. Most of them also include a pertinent literature review and adequate research protocols. Randomized control trials (RCTs) were the most popular methodology, particularly for Education and Social Protection and Health. The data were mostly obtained from ad hoc surveys, with little use of other sources such as administrative records or national surveys.
Only 43% of completed evaluations proved methodologically rigorous in most criteria. The criteria used for assessing rigor were a review of previous work, discussion of indicators for measuring relevant impacts, validity concerns, estimation strategies, and adequate sample sizes based on statistical power calculations. 80% of completed
1 Report prepared by CLEAR-LAC Center with inputs and supervision of OVE.
2 Kelly Johnson and others, ‘Research on Evaluation Use: A Review of the Empirical Literature From 1986 to 2005’, American Journal of Evaluation, 30.3 (2009), 377–410 <https://doi.org/10.1177/1098214009341660>.
3 The official name of the project is “RFP 17-012 Quality Assessment: IDB’s Impact Evaluations.”
evaluations are missing power calculations, and less than 50% have satisfactory sample sizes. Around 40% of completed evaluations fail either to describe how variables were constructed or to provide sufficient descriptions. As for implementation problems, 38% of evaluation reports mentioned attrition and non-response, 19% acknowledged spillover effects, and 10% dealt with multiple hypothesis testing. These problems are likely to affect the reliability of the results in one-third of the evaluations.
Completed evaluations show no obvious time trend in overall quality. However, if we exclude three 2014-2015 evaluations from the sample, overall quality during 2008-2013 is stable, with an average of 50% of reports rated satisfactory or partially satisfactory. Finally, overall assessment results match our recommendations on whether to publish the reports—around half of the reports were recommended for publication. However, only 26% of reports would be recommended for publication in top journals or top field journals.
As for non-completed evaluations (proposals), 44% are of satisfactory or partially satisfactory quality. More than half of the proposals lacked an adequate literature review, faced serious methodological shortcomings, and proposed an unacceptable selection of variables. RCTs remain the most popular evaluation method among proposals. 66% of proposals include adequate research protocols. Like completed evaluations, most proposals suggest gathering survey data, with little use of administrative records or national surveys.
On average, proposals rank lower than completed evaluations in methodological rigor. However, about 70% of proposals mention power calculations, and in slightly more than 50% the calculations are included and sufficient to detect meaningful differences. Completed evaluations and proposals receive broadly similar assessments on sample size adequacy and unit of analysis.
Most proposals state that their purpose is to fill a knowledge gap or an operational knowledge requirement, much like completed evaluations. In contrast, however, 20% of proposals also state the purpose of providing incentives to complete the implementation of a policy or program, while only 6% of completed evaluations mention that as their main driver.
In contrast to completed evaluations, the quality of proposals seems to improve over time. For 2015-2016, more than 60% of proposals are of satisfactory or partially satisfactory quality. However, external reviewers would recommend proposals in a lower proportion than completed evaluations.
In summary, proposals on average feature clearer and more complete research protocols, more frequent power calculations, and improvement over time. Completed evaluations rank higher in overall quality, relevance of literature reviews, methodological rigor, data quality, and authorization and recommendation for publication.
I. DATABASE OF IMPACT EVALUATION STUDIES AT IDB
1.1 The analysis in this report is based on data collected by OVE on impact evaluations proposed by IDB in loans and technical cooperations between 2006 and 2016. The database includes impact evaluation proposals in three stages of implementation: completed, ongoing, and cancelled. A total of 531 impact evaluations (IEs) were proposed. These IEs correspond to 416 IDB operations, 156 grants and 377 loans. 18% of all IEs have been completed, 29% have been cancelled, and more than 50% are ongoing.1
Table 1.1 - Impact evaluations by status
Status               Documents2   Additional IEs in Documents   Total number of IEs
Completed            91           6                             94
Proposal (ongoing)   229          51                            286
Cancelled            132          24                            151
Total                452          81                            531
Source: OVE. IE Database 2017.
1.2 Table 1.2 shows the distribution of IEs by division. The highest share of IEs is in Social Protection and Health (22%), followed by Education (16%) and Institutional Capacity of the State (14%). More than 60% of completed evaluations are concentrated in three divisions: Education (28%), Social Protection and Health (21%), and Labor Markets (13%).
1.3 Social Protection and Health has the highest percentage of cancelled IEs (23%). Energy, Trade and Investment, and Water and Sanitation do not have completed IEs. Labor Markets ranks first in completed IEs as a percentage of the division’s total (39%), followed by Gender and Diversity (33%), Education (31%), and Social Protection and Health (17%).
1 The original sample frame included 97 completed IEs, 280 ongoing, and 156 cancelled IEs. The rest of the document was based on the original universe; the results do not change materially after the adjustments.
2 The number of documents differs from the total number of identified IEs because some documents include more than one IE, evaluating different components of the programs or interventions.
Table 1.2 - Impact evaluations by division and status
Division                                  Total IE   %     CAN   %     COMP   %     PROP   %
Capital Markets & Financial Institutions  24         5%    13    9%    3      3%    8      3%
Competitiveness & Innovation              47         9%    11    7%    7      7%    29     10%
Education                                 83         16%   27    18%   26     28%   30     11%
Energy                                    4          1%    1     1%    0      0%    3      1%
Environment, Rural Dev. & Disaster Risk   56         11%   12    8%    7      7%    37     13%
Fiscal and Municipal Management           7          1%    2     1%    0      0%    5      2%
Gender and Diversity                      21         4%    5     3%    7      8%    9      3%
Housing and Urban Development             20         4%    9     6%    1      1%    10     4%
Institutional Capacity of the State       72         14%   20    13%   10     11%   42     15%
Labor Markets                             31         6%    11    7%    12     13%   8      3%
Social Protection & Health                119        22%   33    22%   20     21%   66     23%
Trade & Investment                        23         4%    3     2%    0      0%    20     7%
Transport                                 14         3%    0     0%    1      1%    13     5%
Water and Sanitation                      10         2%    4     3%    0      0%    6      2%
TOTAL                                     531        100%  151   100%  94     100%  286    100%
Source: OVE. IE Database 2017. Note: CAN refers to cancelled IEs; COMP, to completed; PROP, to proposals (ongoing).
1.4 All divisions have at least one proposal. 50% of proposals are concentrated in three divisions: Social Protection and Health, Institutional Capacity of the State, and Environment, Rural Development and Disaster Risk.
1.5 Cancellations have been declining substantially as a proportion of total IEs: 44% of IEs were cancelled in 2012, but only 6% in 2016 (see Graph 1.1). However, while 76 evaluations were completed between 2009 and 2013, only three were finalized between 2014 and 2016. Proposals increased significantly from 2011 to 2014, but decreased by more than 40% during the following years.
Graph 1.1 - Impact evaluations by year and status
[Figure data (number of IEs per year, 2006-2016): Cancelled: 8, 3, 8, 15, 15, 25, 38, 17, 10, 10, 2; Completed: 6, 3, 9, 19, 12, 16, 11, 15, 2, 1, 0; Ongoing: 0, 1, 1, 6, 12, 32, 42, 52, 58, 48, 34.]
Source: OVE. IE Database 2017.
II. QUALITY ASSESSMENT: INSTRUMENTS AND IMPLEMENTATION
2.1 For the quality assessment, two questionnaires were used—one for completed evaluations and one for proposals—to evaluate the relevance, methodology, data, results, and overall quality of IEs. As part of the project, CLEAR LAC piloted the questionnaires and shared the results with OVE. OVE’s feedback was incorporated to further refine these tools. The final version of the questionnaires appears in Appendices 1 and 2. They are structured in five sections:
a. Relevance
Aspects addressed: main purpose of the evaluation; inclusion of relevant literature review; formulation of a research protocol.
b. Methodology
Aspects addressed: issues with the method used; method adequacy and shortcomings; number and timing of post-treatment follow-ups; power calculations.
c. Data
Aspects addressed: sample size assessment; unit of analysis; type and quality of data.
d. Results
Aspects addressed: effect of treatment; expected sign; discussion of implementation problems (noncompliance, attrition, spillovers); reliability of the evaluation.
(This section was omitted from the questionnaire for proposals).
e. Overall quality assessment
Experts provide an overall quality assessment with opinions on authorization and publication.
2.2 The questionnaires’ variables are categorical, and many are ordinal (Likert-type scales). Most items also ask for a brief explanation of the expert’s opinion.
2.3 The collected information is adequate for a descriptive analysis of general aspects of quality, but it does not consider the specifics of each intervention.3 This means that the information we obtained is relevant but not sufficient for analyzing factors that may affect intervention quality, such as evaluation capacities, financial resources, or the political and administrative context. This assessment did not include an in-depth study in that regard. Moreover, the analysis is based only on what the evaluation reports state; there was no review of the original data or re-estimation of models.
2.4 The quality assessment instrument was applied to 86 completed IEs and a random sample of 59 proposals. The application was performed by researchers from Centro de Investigación y Docencia Económicas (CIDE) and other academic institutions with solid economic and quantitative expertise in impact evaluations. Evaluations were assigned to researchers based on their area of expertise. A subset of the assessment questionnaires was reviewed by two researchers. OVE received the completed questionnaires4 for each IE and the database of the assessment results. Appendix 3 and Appendix 4 contain a list with the full names of the completed IEs and proposals included in this study.
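For the subset of questionnaires reviewed by two researchers, one standard way to check rating consistency (not part of the original assessment, sketched here only as an illustration with hypothetical ratings) is Cohen's kappa, which corrects raw agreement for chance:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two reviewers'
    categorical ratings of the same set of evaluations."""
    n = len(ratings_a)
    # Observed share of evaluations where both reviewers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if each reviewer rated independently
    # according to their own marginal frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical ratings: S = satisfactory, U = unsatisfactory.
reviewer_1 = ["S", "S", "U", "S"]
reviewer_2 = ["S", "U", "U", "S"]
print(cohens_kappa(reviewer_1, reviewer_2))  # 0.5
```

A kappa near 1 indicates the double-reviewed subset was rated consistently; values near 0 suggest agreement no better than chance.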
III. RESULTS
Completed Evaluations
3.1 The following descriptive analysis is structured by the quality dimensions included in the questionnaire. We describe aggregate or general results first, and add subsections by division, time, or method as relevant.
1. Relevance
3.2 This section assesses three main characteristics of the IEs: their purpose, their connection with and contribution to the literature, and whether they have an adequate research protocol.
3 There are several studies, particularly on medical interventions, that assess the quality of RCTs considering aspects of the interventions. See, for example, Uman, L. S., Chambers, C. T., McGrath, P. J., & Kisely, S., ‘Assessing the Quality of Randomized Controlled Trials Examining Psychological Interventions for Pediatric Procedural Pain: Recommendations for Quality Improvement’, Journal of Pediatric Psychology, 35.7 (2010), 693–703.
4 This was part of the project’s terms of reference (TOR). OVE and CLEAR agreed on keeping the identity of reviewers anonymous to allow for an impartial assessment.
3.3 The first characteristic—the evaluations’ purpose—was classified into five options, defined by whether the evaluation: 1) fills a knowledge gap, specifically by providing new evidence on whether a new approach or program works; 2) fills an operational knowledge requirement for IDB—that is, it provides evidence on whether a new approach for an IDB program works; 3) provides evidence about a program being replicated in another country; 4) provides incentives to complete the implementation of a program; or 5) other.
3.4 Overall, 64% of IEs either fill a knowledge gap or provide operational knowledge for IDB (see Graph 3.1). An example of an evaluation that fills a knowledge gap is “Unraveling the Threads of Decentralized Community-Based Irrigation Systems on the Welfare of Rural Households in Bolivia” (code BO-L1084). It is innovative because it analyzes possible interconnections between the effects of community-managed irrigation systems on average income, agricultural productivity, and adoption of technology by participants.
3.5 An example of an evaluation that fills an operational knowledge requirement is “Programa de Mejora de la Enseñanza de las Ciencias Naturales y la Matemática” (Code AR-L1038). The aim was to measure the impact of three IDB-supported educational programs run by Argentina’s Ministry of Education. The evaluators used several dimensions and measurements of improvement in teacher skills and teaching practices.
Graph 3.1 - Purpose of impact evaluations
[Figure data: Knowledge gap 34%; Operational knowledge requirement for IDB 30%; Replication 16%; Policy completion 6%; Other 14%.]
Source: CLEAR-LAC Center.
3.6 Almost one-fifth of IEs provide evidence for replication of programs implemented in other countries. This is the case of the evaluations of conditional cash transfer programs (CCTs) that had been previously implemented in Mexico, Brazil, and Colombia, among other countries. For example, “Evaluación externa de impacto del programa de transferencias monetarias condicionadas” (code GU-T1089) analyzes the MIFAPRO program in Guatemala.
3.7 Less than 10% of evaluations fall in the policy completion category. An example is “The Effect of In-Service Teacher Training on Student Learning of English as a Second Language” (code ME-T1114). The assessment shows that a better
understanding of how in-service teacher training affects teacher behavior is required to improve future program implementation.
3.8 14% of IEs have a purpose that, in the reviewers’ opinion, does not fit into the first four categories. Some of these evaluations looked for the effect of a specific phenomenon rather than of a program or a new approach, or for specific outcomes in particular populations. This is the case of “Violence and Birth Outcomes: Evidence from Homicides in Brazil” (code RG-T2377). This evaluation does not assess a program or policy; its main objective was to estimate the causal effects of violence on newborn health parameters in Brazil. Another example is “The Effects of Tropical Storms on Early Childhood Development” (code RG-T2293). Other evaluations refer to impacts of national policies rather than specific programs or approaches. This is the case of “Evaluación de impacto de la reforma tributaria de 2012 a través de equilibrio general” (code CO-T1345), which seeks to evaluate whether Colombia’s fiscal reform created more formal jobs in the country.
3.9 An analysis by division (Graph 3.2) shows that the two5 with the highest proportion of IEs that fill a knowledge gap are Gender and Diversity (67%) and Environment, Rural Development and Disaster Risk (57%). The divisions that produce most IEs for operational knowledge are Social Protection and Health (50%) and Education (40%).
Graph 3.2 - Purpose of impact evaluations by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.10 Although 70% of completed IEs provide a relevant literature review, in most of these cases the review is incomplete. Thirty percent of IEs either provide an irrelevant review or do not provide one at all (Table 3.1).
5 Excluding the Fiscal and Municipal Management division, which only has one evaluation.
Table 3.1 - Literature review assessment
Literature review                                    Completed IEs   Percentage
Report does not provide a literature review          23              27%
Provides irrelevant literature                       3               3%
Provides relevant but incomplete literature review   35              41%
Provides relevant and exhaustive review              25              29%
Total                                                86              100%
Source: CLEAR-LAC Center.
3.11 Considering literature reviews per division (Graph 3.3), Social Protection and Health, Education, and Environment, Rural Development and Disaster Risk present the highest percentage of reports that do not provide a proper literature review. The divisions with the highest percentages of IEs with a relevant but incomplete literature review are Labor Markets and Competitiveness and Innovation. The division with the highest percentage of IEs that provide relevant and exhaustive literature review is Gender and Diversity.
Graph 3.3 - Literature review by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.12 The assessment shows that 62% of completed IEs have a complete or only slightly flawed research protocol (see Table 3.2). Some of the flaws pointed out by reviewers were the lack of strategies to overcome possible biases in estimations, weak explanations of interventions and theories of change, and the absence of problem background and justification of interventions (see “Evaluar para seguir adelante: resultados del programa Redes” [code AR-L1142] or “Computer-Assisted English Language Learning in Costa Rica Elementary Schools: An Experimental Study” [code CR-T1055]).
Table 3.2 - Research protocol assessment
Research protocol                                        Completed IEs   Percentage
Report does not include a research protocol              15              17%
Report provides a poor and not well-defined protocol     18              21%
Report provides a research protocol with minor flaws     28              33%
Report includes a clear and complete research protocol   25              29%
Total                                                    86              100%
Source: CLEAR-LAC Center.
3.13 When research protocols are analyzed by division (see Graph 3.4), the division that lacks protocols the most is Competitiveness and Innovation. Institutional Capacity of the State has the highest percentage of IEs that provide a poor and not well-defined research protocol. Labor Markets has the highest percentage of reports with a research protocol with minor flaws. Finally, the division that has the highest percentage of IEs that include a clear and complete research protocol is Gender and Diversity (5 IEs, 83%).
Graph 3.4 - Research protocol by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.14 “Literature review” and “Research protocol” (Table 3.3) are not independent characteristics. Good literature reviews are more frequent in evaluations that include adequate research protocols. Only one IE included a clear and complete research protocol without a literature review: “Primera evaluación de impacto del programa de transferencias monetarias ‘Bono 10,000’ en zonas rurales de la República de Honduras” (code HO-L1071). Another interesting case is “Evaluación de impacto del Programa de Acceso al Crédito y Competitividad para micro, pequeñas y medianas empresas (PACC) en Argentina” (code AR-L1033), which provides a relevant and exhaustive literature review but lacks a well-defined research protocol.
Table 3.3 - Crosstabs, literature review, and research protocol.
(% of IEs in each category)
Research protocol \ Literature review                    No review   Irrelevant   Relevant but incomplete   Relevant and exhaustive   Total
Report does not include research protocol                9%          1%           7%                        0%                        17%
Provides a poor and not well-defined research protocol   9%          1%           9%                        1%                        21%
Provides a research protocol with minor flaws            7%          1%           14%                       10%                       33%
Includes a clear and complete research protocol          1%          0%           10%                       17%                       29%
Grand Total                                              27%         3%           41%                       29%                       100%
Source: CLEAR-LAC Center.
2. Methodology
3.15 This section looks into the type of methodology used by IEs and how rigorous it is, the appropriateness of the post-treatment period and follow-ups, and explicit power calculations.
3.16 35% of completed evaluations used RCTs, and 56% used quasi-experimental methods, including matching, regression discontinuity, instrumental variables (IVs), and others. Evaluations that used mixed methods represent 9% (see Graph 3.5). It is worth noting the absence of IEs using synthetic controls. Of the evaluations using RCTs, two-thirds (20) come from the Social Protection and Health and Education divisions (see Graph 3.6).
3.17 Methodological rigor was assessed in three categories: a) rigor in almost all aspects; b) rigor in some aspects but weaknesses in others; and c) serious methodological shortcomings. IEs that were poorly ranked in methodological rigor either lack an analysis and justification for the estimation methods used, a discussion of possible biases, or a balance analysis between groups. One example is “Informe final de evaluación de impacto del Proyecto Espacios Educativos y Calidad de los Aprendizajes” (code PN-L1064). IEs ranked as rigorous in some aspects but weak in others lacked a clear theoretical justification to use certain methods, or overlooked some of the assumptions that those methods imply. An example is “Evaluación de impacto de un programa de inclusión social y prevención de violencia estudiantil” (code RG-T2321). Finally, IEs with rigor in almost all aspects had creative identification strategies and discussed how the evaluation complied with the relevant methodological assumptions. An example is “Evaluación de impacto del programa de salud materno-infantil ‘Bono Juana Azurduy’” (code BO-L1032).
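As a stylized illustration of the basic estimate these experimental reports present, the difference in mean outcomes between treatment and control arms, with its standard error, can be computed as follows (the function and the outcome data are hypothetical, shown only as a sketch):

```python
from statistics import mean, variance
from math import sqrt

def difference_in_means(treated, control):
    """Average treatment effect estimate for an RCT:
    mean(treated) - mean(control), with its standard error."""
    ate = mean(treated) - mean(control)
    se = sqrt(variance(treated) / len(treated) +
              variance(control) / len(control))
    return ate, se

# Hypothetical outcome data, for illustration only.
treated = [12.0, 15.0, 14.0, 16.0, 13.0]
control = [10.0, 11.0, 12.0, 9.0, 13.0]
ate, se = difference_in_means(treated, control)
print(ate, se)  # 3.0 1.0
```

A balance analysis between groups, one of the rigor criteria above, applies the same comparison to pre-treatment covariates rather than outcomes.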
Graph 3.5 - Methodology
Source: CLEAR-LAC Center.
Graph 3.6 - Methodology by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.18 Reviewers found that 42% of IEs exhibit rigorous methodology, 35% are rigorous in some respects but weak in others, and 23% present serious shortcomings, lacking an explanation of the methodology or a validation of the selected methods (Graph 3.7).
3.19 The assessment criteria were the existence of an adequate revision of previous evaluative work; an analysis of different indicators for measuring relevant impacts; a discussion on validity concerns and strategies for estimation; and an adequate sample size for statistical power calculations. An example of an evaluation with rigorous methodology is “Impact Evaluation of the National Youth Service of Jamaica” (code JA-T1035). The three divisions with the best results in
methodological rigor are Environmental, Rural Development and Disaster Risk, Gender and Diversity, and Labor Markets (see Graph 3.8).
Graph 3.7 - Methodological rigor
Source: CLEAR-LAC Center.
Graph 3.8 - Methodological rigor by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education.
3.21 Methodological rigor is most frequently found in RCTs (see Graph 3.9).
Graph 3.9 - Methodological rigor and type of method used
Source: CLEAR-LAC Center.
3.22 60% of IEs have a reasonable number of follow-ups and an adequate post-treatment period to capture effects. However, 26 reports do not provide enough information to judge whether the follow-ups and their timing are adequate. There seems to be a general improvement in follow-up and timing adequacy, as seen in Graph 3.10: from 2011 onward, no completed IEs had inadequate follow-ups or timing.
Graph 3.10 - Evaluation with adequate follow-ups and timing per year
Source: CLEAR-LAC Center.
3.23 Regarding follow-up adequacy by division (Graph 3.11), Transport, Institutional Capacity of the State, Social Protection and Health, and Education had IEs with inadequate follow-ups or timing. Meanwhile, Competitiveness and Innovation and Labor Markets had the highest percentages of IEs with adequate follow-ups.
Graph 3.11 - Adequacy of follow-ups and timing by division
Source: CLEAR-LAC Center.
3.24 As for power calculations (Graph 3.12), 67 IEs do not discuss them, 4 mention them but with important shortcomings, 6 include them but need further clarification, and 9 include calculations sufficient to detect meaningful differences. Regarding the 4 IEs with important shortcomings, the assessment shows that none of their power calculations were statistically valid or evidence-based.
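The ex-ante calculation the reviewers looked for can be sketched with the standard two-sample formula under a normal approximation (a minimal illustration; the function name and default values are ours, not taken from the reviewed reports):

```python
from math import ceil
from scipy.stats import norm

def n_per_arm(mde, sd, alpha=0.05, power=0.80):
    """Minimum sample size per arm to detect a mean difference
    `mde` (in outcome units) with a two-sided test of size alpha,
    assuming two equal arms with standard deviation `sd`."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the test
    z_power = norm.ppf(power)           # quantile for desired power
    return ceil(2 * ((z_alpha + z_power) * sd / mde) ** 2)

# Detecting a 0.2 standard-deviation effect at 80% power
# requires roughly 393 observations per arm.
print(n_per_arm(mde=0.2, sd=1.0))  # 393
```

An evaluation whose report states the assumed effect size, variance, and significance level allows a reviewer to reproduce exactly this kind of figure.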
Graph 3.12 - Power calculations by year
Source: CLEAR-LAC Center.
3.25 Graph 3.13 shows the results of the power calculations assessment by division. Fiscal and Municipal Management, Gender and Diversity, and Education appear to perform best on this attribute.
Graph 3.13 - Power calculations by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education.
3. Data
3.26 The third aspect reviewed is data quality. The criteria considered are sample size, unit of analysis, and overall data quality. The analysis reveals that the sample sizes of completed evaluations are quite heterogeneous, ranging from 6 to over 500,000 (excluding one case with a sample of over three million observations). Table 3.4 shows descriptive statistics of sample sizes by method used.
Table 3.4 - Sample size by method used
Methodology IEs Mean Std. Dev. Max. Min.
Experimental (RCT) 30 3,817.40 5,764.06 30,736.00 272.00
Regression discontinuity 4 19,267.50 24,762.99 55,617.00 3,087.00
IV 3 946.67 478.14 1,287.00 400.00
Matching 13 1,614.62 1,476.62 4,816.00 15.00
Multiple methods 7 44,011.43 75,880.60 198,333.00 355.00
Other quasi 24 29,365.50 103,195.48 505,253.00 6.00
Total 81 15,163.88 61,226.97 505,253.00 6.00
Source: CLEAR-LAC Center. Four impact evaluations do not provide the sample size in the report. Summary statistics exclude the case with the outlier of over three million observations.
3.27 The assessment shows that 37 IEs (45%) have a satisfactory sample size, 25 (30%) have a partially satisfactory sample, 10 (12%) have partially unsatisfactory
sample size, and 11 (13%) have unsatisfactory samples (Graph 3.14)6. Two unsatisfactory evaluations were “Análisis de las bases de datos de la evaluación de impacto del programa de salud sexual y reproductiva en adolescentes en Medellín” (code CO-T1020) and “English for Latin America” (code CR-T1084). The former shows confusion in the measurement of the effect, and its variables were extracted from different data sets that were not clearly explained. The latter did not describe how the sample was selected; therefore, there was no evidence to determine whether the sample was even partly satisfactory.
Graph 3.14 - Sample size adequacy
Source: CLEAR-LAC Center.
3.28 Almost all divisions rank at least partially satisfactory in this aspect, except for the Institutional Capacity of the State and Transport.
6 Three IEs did not have an answer for this question: “Evaluación de impacto de la reforma tributaria de 2012 a través de equilibrio general” (code CO-T1345), “Construcción de un modelo económico de evaluación de impacto local (informe final PNT)” (code NI-L1039), and “Informe final de evaluación de impacto del proyecto espacios educativos y calidad de los aprendizajes” (code PN-L1064).
Graph 3.15 - Adequacy of sample size by division7
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education.
3.29 As for the unit of analysis, results show that a majority of IEs are at the individual level. By method, individuals are the unit of analysis for over 70% of evaluations that use RCTs, regression discontinuity, and matching. Families are the second most used unit of analysis.
Graph 3.16 - Unit of analysis by method
Source: CLEAR-LAC Center.
7 For two IEs “Evaluación de impacto de la reforma tributaria de 2012 a través de equilibrio general”
(code CO-T1345) and “Construcción de un modelo económico de evaluación de impacto local (informe final PNT)” (code NI-L1039), this analysis was not applicable. Meanwhile, “Informe final de evaluación de impacto del proyecto espacios educativos y calidad de los aprendizajes” (code PN-L1064) did not report this information. Thus, these three IEs were excluded from the analysis.
Table 3.5 - Adequacy of unit of analysis
Adequacy of unit of analysis IE %
Yes 81 94%
No 2 2.5%
N/A 3 3.5%
Source: CLEAR-LAC Center. N/A: not enough information to assess adequacy of UA.
3.30 The results show that, for 94% of IEs, the unit of analysis is considered adequate. Three completed IEs did not have enough information to assess the adequacy of the unit of analysis. Besides these, only two IEs had an inadequate unit of analysis. For example, the report “Evaluar para seguir adelante: resultados del programa Redes” (code AR-L1142) did not have an adequate unit of analysis: its purpose was to find the effect of a program on antihypertensive treatment, but it used aggregate data instead of individual data.
3.31 Regarding the main type of data used, 45% of IEs use data that comes from specific surveys; 23% use administrative data; 27% use a combination of survey and administrative data; and only 5% of the data belong to other types (Graph 3.17). Other data sources are national surveys and bibliographic databases.
Graph 3.17 - Type of data by method
Source: CLEAR-LAC Center.
3.32 Regarding data quality (Table 3.6), 12% of the IEs do not indicate how variables were constructed; 27% include some description on how the variables were constructed, but no discussion on the reliability of the measures; 38% do describe how the variables were constructed; and 24% have a careful selection of relevant variables and most measures used are reliable.
Table 3.6 - Evaluation of data8
Data quality to estimate the impact of the program or policy IE %
No indication of how study variables were constructed or obtained 10 12%
Some description on how the variables were constructed, but no discussion on the reliability of the measures 23 27%
Includes a description on how the variables were constructed. Some reliability reported; not all measures demonstrated to be reliable 32 38%
Careful selection of relevant variables considering their prior use and reliability demonstrated for all or most of the measures (public codes, scales) 20 24%
Source: CLEAR-LAC Center.
3.33 Data quality proved to be stronger when the evaluation was an RCT (Graph 3.18).
Graph 3.18 - Data quality and method
Source: CLEAR-LAC Center.
3.34 Results of data quality by division are shown in Graph 3.19. Competitiveness and Innovation and Gender and Diversity ranked highly (more than 80% of IEs) on this criterion.
8 The IE “Programa de Educación Primaria e Integración Tecnológica (PEPIT): Componentes 2 y 3” (code HO-L1062) did not have an answer for this question and was thus excluded from the analysis.
Graph 3.19 - Data quality by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
4. Results
3.35 This section looks at the following criteria for evaluation results: sign of the treatment effect; consistency between findings and the expected sign according to the relevant literature; implementation problems; and reliability of results.
3.36 Many IEs (80%) consider more than one outcome, and some (23%) consider up to four. The sign of the treatment effect is positive for 57% of outcomes and negative for 16%. More than 20% of estimated outcomes show a null treatment effect.
3.37 The following implementation problems were identified: attrition and non-response, spillover effects, and multiple hypothesis testing. Attrition and non-response were reported in 35 completed IEs, spillover effects in 16, and multiple hypothesis testing in 9 (Graph 3.20). 54% of evaluations reported at least one implementation problem: 35 IEs mentioned only one problem, 11 mentioned two, and one mentioned all three. Attrition and non-response were present in almost all divisions except Competitiveness and Innovation. Spillover effects were most frequently found in Institutional Capacity of the State (Graph 3.21).
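Where an IE reports several outcomes, the multiple-hypothesis-testing adjustment reviewers check for can be illustrated as follows; the p-values are invented for the example.

```python
# Holm adjustment across several outcome p-values (values are made up).
# The correction controls the family-wise error rate when an evaluation
# tests many outcomes at once.
from statsmodels.stats.multitest import multipletests

raw_pvalues = [0.001, 0.012, 0.034, 0.048, 0.210]
reject, adjusted, _, _ = multipletests(raw_pvalues, alpha=0.05, method="holm")

for p, p_adj, sig in zip(raw_pvalues, adjusted, reject):
    print(f"raw p = {p:.3f}  adjusted p = {p_adj:.3f}  significant: {sig}")
```

Note how results that look significant in isolation (e.g. p = 0.034) can lose significance once the full family of outcomes is taken into account.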
Graph 3.20 - Implementation problems
Source: CLEAR-LAC Center.
Graph 3.21 - Implementation problems by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.38 The assessment asked reviewers whether these problems could cause unreliable results. In 66% of cases, reviewers did not consider the problems a reliability concern, but reliability was indeed an issue in the remaining 34%.
Table 3.7 - Reliability of results
Are these problems affecting the reliability of the evaluation?
Response Total IE %
Yes 16 34%
No 31 66%
Total 47 100%
Source: CLEAR-LAC Center. Note: 39 evaluations did not report problems of attrition and non-response, spillover effects, or multiple hypothesis testing.
5. Overall quality assessment
3.39 This last section refers to overall quality. The reviewers were asked to provide: an informed opinion on the evaluation’s quality (in four categories, from unsatisfactory to satisfactory); a statement on whether the report should be recommended for publication (from no publication to publication in a top academic journal); and a professional opinion on whether they would have recommended conducting the evaluation. Of the 86 completed and analyzed IEs, the overall quality assessment shows that 33% are satisfactory, 22% are partially satisfactory, 23% are partially unsatisfactory, and 22% are unsatisfactory (Graph 3.22).
Graph 3.22 - Quality assessment of IEs
Source: CLEAR-LAC Center.
3.40 By division (Graph 3.23), those with the highest percentages of unsatisfactory and partially unsatisfactory IEs are Transport, Competitiveness and Innovation, and Capital Markets and Financial Institutions. In contrast, Gender and Diversity; Environmental, Rural Development & Disaster Risk; and Labor Markets have the highest percentages of partially satisfactory and satisfactory IEs.
Graph 3.23 - Overall quality assessment by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education
3.41 Quality assessment by year of approved operation is shown in Graph 3.24. There is no obvious time trend in IE quality. In 6 out of 10 years, quality ranked satisfactory or partially satisfactory for more than half of evaluations. An analysis by year and quality of the three divisions that together account for more than 60% of evaluations shows no trend within divisions (Graph 3.25).
Graph 3.24 - Quality assessment by year
Source: CLEAR-LAC Center.
Graph 3.25 - Quality by division and year
Source: CLEAR-LAC Center.
3.42 Reviewers were asked to determine if they would recommend conducting the IE, considering the intervention, budget, and timeline. From 86 completed IEs, 56% would be recommended and 42% would not (Graph 3.26)9.
Graph 3.26 - Recommendation to conduct the evaluations
Source: CLEAR-LAC Center.
9 In two cases reviewers indicated that they did not consider the document to be an impact evaluation: “Evaluating the Impact of the Agricultural Technology Transfer Program on Peanut Production in Haiti” (code HA-L1059) and “The effects of natural disasters on labor markets: do hurricanes increase informality?” (code RG-T2293).
3.43 When IE recommendation data are combined with overall quality (Graph 3.27), we find that all satisfactory IEs would be recommended for implementation. It is also interesting that none of the unsatisfactory IEs would be recommended.
Graph 3.27 - Authorization and overall quality
Source: CLEAR-LAC Center.
3.44 As for publication (see Graph 3.28), 47% of completed IEs are not material for peer review, 27% could be published in low-ranked journals, 21% could be published in top field journals, and 5% in top journals.
Graph 3.28 - Recommendation for publication
Source: CLEAR-LAC Center.
3.45 Finally, publication recommendations by division appear in Graph 3.29. Labor Markets has the best outcome, with more than 60% of its documents (9) recommended for publication.
Graph 3.29 - Recommendation for publication by division
Source: CLEAR-LAC Center. FMM: Fiscal and Municipal Management; TSP: Transport; CMF: Capital Markets and Financial Institutions; GDI: Gender and Diversity; RND: Environmental, Rural Development & Disaster Risk; CTI: Competitiveness and Innovation; ICS: Institutional Capacity of the State; LKM: Labor Markets; SPH: Social Protection & Health; EDU: Education.
3.46 Overall quality for each IE was also compared with the other quality attributes. Unsatisfactory IEs generally exhibit serious methodological shortcomings, no indication of how study variables were constructed, and reliability issues due to implementation problems. Satisfactory IEs have mostly rigorous methodologies, careful variable selection, and no reliability issues. Thus, the overall quality assessment is consistent with the quality attributes included in the questionnaires.
IV. PROPOSALS
4.1 A stratified random sample of 59 evaluation proposals was selected for the quality assessment analysis. The main findings show that more than half of the proposals fell into the unsatisfactory or partially unsatisfactory categories for overall quality. This is related to the proposals’ relevance, methodological rigor, and data quality.
4.2 Proposals were mostly designed (59%) to fill a knowledge gap or requirement, that is, to provide new evidence on a new approach or on a program’s functioning, including IDB-supported programs. In contrast with completed IEs, a larger percentage of proposals are relevant for providing incentives to complete the implementation of a policy or program (20%). However, a high proportion of proposals (46%) do not include a relevant literature review.
4.3 RCTs are the most popular method in proposals and are concentrated in Social Protection and Health. Meanwhile, quasi-experimental methods (including all variants) surpass experiments by more than 30%. Data is mostly obtained through ad hoc surveys with little use of administrative information.
4.4 Less than one-third of the proposals have methodological rigor in almost all aspects; the majority (71%) are weak in some respects or have serious shortcomings. Power calculations are mentioned in more than 60% of proposals, and in 54% they are actually included or sufficient to detect meaningful differences. Power calculations are absent from proposals by the Fiscal and Municipal Management and Trade and Investment divisions. Labor Markets and Competitiveness and Innovation have the highest percentages of adequate power calculations. More than half of the proposals have satisfactory or partially satisfactory sample sizes. Almost 40% of proposals give no indication of how variables were constructed, a weakness that is especially pronounced in the Education division. The divisions with the highest percentages of adequate data are Competitiveness and Innovation and Labor Markets.
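For proposals whose sample is already fixed by the operation, sufficiency of power can equivalently be checked via the minimum detectable effect (MDE). The sample size and thresholds below are assumptions for illustration only.

```python
# Minimum detectable effect for a fixed design (parameters are assumed):
# the smallest standardized effect a two-sample t-test can detect with
# 200 units per arm, 80% power, and 5% significance.
from statsmodels.stats.power import TTestIndPower

mde = TTestIndPower().solve_power(nobs1=200, alpha=0.05, power=0.8)
print(f"MDE is about {mde:.2f} standard deviations")
```

If the resulting MDE is larger than any effect the intervention could plausibly produce, the proposal’s power is insufficient to detect meaningful differences.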
4.5 More than half of the proposals are unsatisfactory or partially unsatisfactory in overall quality. However, overall quality appears to be improving over time. Four divisions have only proposals ranked as unsatisfactory or partially unsatisfactory (12 proposals).10 More than half of the proposals would not be recommended by external reviewers and would not be recommended for publication. None of the proposals qualified for publication in top journals.
4.6 Graphs and tables for each section of the assessment instrument can be found below.
1. Relevance
Graph 4.1 - Purpose of IE proposals
Source: CLEAR-LAC Center.
10 The four divisions are CMF (Capital Markets and Financial Institutions); FMM (Fiscal and Municipal Management); HUD (Housing and Urban Development); and TRI (Trade and Investment).
Graph 4.2 - Purpose of IE proposals by year11
Source: CLEAR-LAC Center.
Graph 4.3 - Purpose of IE proposals by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions; ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
11 There was no information about the approval year of 14 proposals. Thus they were excluded from the analysis.
Table 4.1 - Literature review assessment
Literature review IE Proposals Percentage
Report does not provide a literature review 27 46%
Provides irrelevant literature 4 7%
Provides relevant but incomplete literature review 7 12%
Provides relevant and exhaustive review 21 36%
Total 59 100%
Source: CLEAR-LAC Center.
Graph 4.4 - Literature review by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
Table 4.2 - Research protocol assessment
Research protocol IE Proposals Percentage
Report does not include research protocol 16 27%
Report provides a poor and not well-defined research protocol 4 7%
Report provides a research protocol with minor flaws 24 41%
Report includes a clear and complete research protocol 15 25%
Total 59 100%
Source: CLEAR-LAC Center.
Graph 4.5 - Research protocol by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health
2. Methodology12
Graph 4.6 - Methodology
Source: CLEAR-LAC Center.
12 The proposal “Sistema Regional de Evaluación de Impacto de Políticas de Seguridad Ciudadana para América Latina” (code RG-T2009) did not include a methodology section and was excluded from this analysis.
Graph 4.7 - Methodology by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health
Graph 4.8 - Evaluation of methodology
Source: CLEAR-LAC Center.
Graph 4.9 - Evaluation of methodology by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health
Graph 4.10 - Methodological rigor and type of method used
Source: CLEAR-LAC Center.
Graph 4.11 - Adequate follow-ups and timing
Source: CLEAR-LAC Center.
Graph 4.12 - Power calculations
Source: CLEAR-LAC Center. NA: The proposal “Sistema Regional de Evaluación de Impacto de Políticas de Seguridad Ciudadana para América Latina” (code RG-T2009) did not include any power calculations.
Graph 4.13 - Power calculations by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions; ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
3. Data
Table 4.3 - Sample size by method used
Methodology IEs Mean Std. Dev. Max. Min.
Experimental (RCT) 14 5,861 11,213 43,764 60
Regression discontinuity 3 9,927 13,060 25,000 1,980
Matching 5 1,448 1,056 3,000 190
Multiple methods 6 2,582 4,073 10,800 400
Other quasi 6 485 741 1,840 6
Total 34* 4,044 8,418 43,764 6
Source: CLEAR-LAC Center. *Out of the 59 reviewed proposals, 25 did not specify their sample size.
Graph 4.14 - Sample size adequacy
Source: CLEAR-LAC Center. NA: From the total number of proposals the following 11 had no indication of how the sample was conceptualized or designed: AR-L1251, BR-L1175, CH-L1061, CH-L1064, DR-L1048, PE-L1135, PR-L1084, RG-T2009 and RG-T2095.
Graph 4.15 - Adequacy of sample size by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health
Graph 4.16 - Unit of analysis by method
Source: CLEAR-LAC Center.
Table 4.4 - Adequacy of unit analysis
Adequacy of unit analysis IE %
Yes 45 76%
No 3 5.08%
N/A 11 18.64%
Source: CLEAR-LAC Center. N/A – not enough information to assess adequacy of the following proposals: BO-L1063, BO-L1079, BO-T1193, BR-L1175, BR-L1176, BR-L1223, EC-T1236, EC-T1236, PR-L1084, RG-T2009 and UR-L1060.
Graph 4.17 - Type of data by method
Source: CLEAR-LAC Center.
Table 4.5 - Evaluation of data
Data Quality to estimate the impact of the program or policy IE %
No indication of how study variables were constructed or obtained 21 36%
Some description on how the variables were constructed, but no discussion on the reliability of the measures 23 39%
Includes a description on how the variables were constructed. Some reliability reported; not all measures demonstrated to be reliable 11 19%
Careful selection of relevant variables considering their prior use and reliability demonstrated for all or most of the measures (public codes, scales) 3 5%
NA 1 2%
Source: CLEAR-LAC Center.
Graph 4.18 - Data quality and method
Source: CLEAR-LAC Center.
Graph 4.19 - Data quality by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
4. Overall quality assessment
Graph 4.20 - Quality assessment of the evaluation
Source: CLEAR-LAC Center.
Graph 4.21 - General quality assessment by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions: ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
Graph 4.22 - Quality assessment by year13
Source: CLEAR-LAC Center.
13 Fourteen proposals did not have information regarding their approval year: BO-L1063, BO-L1079, BR-L1155, BR-L1223, BR-L141, CH-L1132, CR-L1043, EC-L1087, EC-T1236, EC-T1236, GU-L1087, HO-G1001, HO-G1001 and PN-L1115.
[Graph 4.21: stacked bars by division (GDI through SPH) showing the share of proposals rated Unsatisfactory, Partly unsatisfactory, Partly satisfactory, or Satisfactory, with the total number of proposals per division on a secondary axis (0-14). Graph 4.22: the same rating distribution by approval year (2010-2016), with the total number of proposals per year on a secondary axis (0-12).]
Graph 4.23 - IEs’ recommendation
Source: CLEAR-LAC Center.
Graph 4.24 - Authorization and overall quality
Source: CLEAR-LAC Center.
[Graph 4.23: pie chart of responses — N/A 3%, No 51%, Yes 46%. Graph 4.24: stacked bars showing the distribution of overall quality ratings (Satisfactory, Partly satisfactory, Partly unsatisfactory, Unsatisfactory) for authorization answers Yes and No.]
Graph 4.25 - Recommendation for publication
Source: CLEAR-LAC Center.
Graph 4.26 - Recommendation for publication by division
Source: CLEAR-LAC Center. GDI: Gender and Diversity; CMF: Capital Markets and Financial Institutions; ENE: Energy; FMM: Fiscal and Municipal Management; LKM: Labor Markets; TSP: Transport; HUD: Housing and Urban Development; TRI: Trade and Investment; CTI: Competitiveness and Innovation; RND: Environmental, Rural Development & Disaster Risk; EDU: Education; ICS: Institutional Capacity of the State; SPH: Social Protection & Health.
[Graph 4.25: bar chart of the share of evaluations (0%-60%) by publication recommendation — "It is not material for a peer-reviewed journal," "Low-rank journal," or "Top field journal." Graph 4.26: stacked bars with the same categories by division (GDI through SPH), with the total number of proposals per division on a secondary axis (0-14).]
Appendix 1 Page 1 of 8
APPENDIX 1
Office of Evaluation and Oversight (OVE), IDB Impact Evaluations: Questionnaire on the Quality of Complete Evaluations
General
Evaluation:
First author:
Division: (drop down)
Date of design:
Date finished:
Published: yes/ no Where:
External Link
Type of Funding
Some evaluations include other sub-evaluations for which evaluation results are reported separately. These sub-studies often use different methodologies and must be summarized separately. A "module" is defined as a study component for which a distinct treatment group is identified and separate evaluation results are reported. If a study reports on four different trials of an intervention, each trial having a different treatment group, the study has four modules.
1. Relevance
Purpose. What is the main purpose of this impact evaluation?
1. Fills a knowledge gap. Provides new evidence on the functioning of a new approach/program
2. Fills an operational knowledge requirement for the IDB. IE provides evidence on the functioning of a new approach for a program supported by the IDB
3. Replication. Provides evidence of the replication of a program used in another country
4. Policy completion. IE provides incentives to complete the implementation of a policy or program. IE is used as an accountability/transparency tool
5. Other (please justify)
I____|
Why? Please provide a summary of your previous classification.
Does the report of the IE results provide the relevant literature situating this evaluation in the relevant policy area?
1. Report does not provide a literature review
2. Provides irrelevant literature
3. Provides a relevant but incomplete literature review
4. Provides a relevant and exhaustive review
I____|
Please provide a brief explanation for your response in the previous question.
Does the report include a section on the research protocol followed in the evaluation, including clear evaluation question(s) and assumption(s) to be tested?
1. Report does not include a research protocol
2. Report provides a poor and not well-defined research protocol
3. Report provides a research protocol with minor flaws
4. Report includes a clear and complete research protocol
I____|
Please provide a brief justification for your response in the previous question.
2. Methodology
Identified Methodology. What is the main evaluation methodology followed by the authors?
1. Experimental (RCT)
2. Regression discontinuity
3. IV
4. Matching
5. Multiple methods
6. Synthetic controls
7. Other quasi-experimental
I____|
Please provide a brief explanation for your response in the previous question. If there are multiple methodologies, please justify the classification of the main one.
Evaluation of Methodology
1. Methodology with serious shortcomings. Results are not reliable
2. Methodology rigorous in some respects, weak in others
3. Methodology rigorous in almost all respects (influences of independent variables extraneous to the purpose of the study have been minimized through randomization, matching, or relevant statistical controls; error variance has been minimized: are the measures relatively free of error?; sufficient power to detect meaningful differences)
I____|
Please provide a brief explanation of your assessment of the previous question.
Post-treatment period (time after the baseline). How long after the baseline were the follow-ups measured?
Follow Up / Time after Baseline (months):
______________________________ / ______________________________
______________________________ / ______________________________
______________________________ / ______________________________
______________________________ / ______________________________
Do you consider the number of follow-ups and their timing adequate for the objective of the impact evaluation?
1. Yes
2. No
3. NA
Please provide a brief explanation of your assessment of the previous question.
Power calculations
1. No power calculations described in the document
2. Mention of power calculations but with some important shortcomings
3. Power calculations included in the document, but some clarifications needed
4. Sufficiency of power to detect meaningful differences
I____|
Please provide a brief explanation of your assessment of the previous question.
3. Data
Sample Size
|____|
How adequate is the sample size?
1. Unsatisfactory
2. Partly unsatisfactory
3. Partly satisfactory
4. Satisfactory
|____|
Please provide a brief explanation of your categorization in the previous question.
Unit of Analysis. Please characterize the main unit of analysis in the evaluation
1. Individuals
2. Families
3. Classrooms
4. Schools
5. Communities
6. Geographical unit (block, municipio)
7. Other: _____________________
I____|
Do you consider this unit of analysis adequate?
1. Yes
2. No
3. NA
I____|
Please provide a brief explanation of your categorization in the previous question.
Main type of data used in the IE
1. Administrative
2. Survey collected for the IE
3. Mixed
4. Other ______________
I____|
Data Quality to estimate the impacts of the program/policy
1. No indication of how study variables were constructed or obtained
2. Some description of how the variables were constructed, but no discussion of the reliability of the measures
3. Includes a description of how the variables were constructed. Some reliability reported; not all measures demonstrated to be reliable
4. Careful selection of relevant variables considering their prior use, and reliability demonstrated for all or most of the measures (public codes, scales)
I____|
Please provide a brief explanation of your categorization in the previous question.
4. Results
Outcome | Effect of the treatment | Is this the "expected sign" considering the relevant literature? | Comment
1 | Positive / Negative / Null | Yes / No / NA |
2 | Positive / Negative / Null | Yes / No / NA |
3 | Positive / Negative / Null | Yes / No / NA |
4 | Positive / Negative / Null | Yes / No / NA |
5 | Positive / Negative / Null | Yes / No / NA |
Please provide a general assessment of the reported results in the evaluation.
Are any of the following problems in the implementation discussed in the document?
Attrition and non-response: Yes / No
Spill-over effects: Yes / No
Multiple-hypothesis testing: Yes / No
In your opinion, are these problems affecting the reliability of the evaluation?
1. Yes
2. No
3. NA
Please provide a brief explanation of your categorization in the previous question.
5. Overall Quality Assessment
How would you assess the quality of this IE?
1. Unsatisfactory
2. Partly unsatisfactory
3. Partly satisfactory
4. Satisfactory
I____|
Where would you recommend this report be submitted for publication?
1. It is not material for a peer-reviewed journal
2. Low-rank journal
3. Top field journal
4. Top journal
I____|
In your professional opinion, and given the intervention, budget, and timeline, would you have recommended doing this IE?
1. Yes
2. No
3. NA
I____|
Why?
Comments and observations
Name of the person who filled this template
Code |____|____|____|____|____|
Sources of information Evaluation 1
IDB proposal 2
Interviews 3
External documents 4
Webpage 5
NA 9
Date of completion
|____|____| |____|____| 2017
Day Month
Validated by Validator Code
|____|____|____|____|____|
Date of validation
|____|____| |____|____| 2017
Day Month
___________________________
Appendix 2 Page 1 of 6
APPENDIX 2
Office of Evaluation and Oversight (OVE), IDB Impact Evaluations: Questionnaire on the Quality of Proposals
General
Evaluation:
First author:
Division: (drop down)
Date of design:
Date finished:
Published: yes/ no Where:
External Link
Type of Funding
Some evaluations include other sub-evaluations for which evaluation results are reported separately. These sub-studies often use different methodologies and must be summarized separately. A "module" is defined as a study component for which a distinct treatment group is identified and separate evaluation results are reported. If a study reports on four different trials of an intervention, each trial having a different treatment group, the study has four modules.
1. Relevance of the Evaluation
1.1 Purpose. What is the main purpose of this Impact evaluation?
1. Fills a knowledge gap. Provides new evidence on the functioning of new approach/program.
2. Fills an operational knowledge requirement for the IDB. IE provides evidence on the functioning of a new approach for a program supported by the IDB
3. Replication. Provides evidence of the replication of a program used in another country
4. Policy completion. IE provides incentives to complete the implementation of a policy or program
5. Other (please justify)
I____|
1.2 Why?
Please provide a summary of
your previous classification
1.3 Does the IE proposal provide the relevant literature for this evaluation?
1. Does not provide a literature review 2. Provides irrelevant literature 3. Provides a relevant but incomplete literature review 4. Provides a relevant and exhaustive review
I____|
1.4 Please provide a brief explanation for your response in the previous question
1.5 Does the proposal include a research protocol with clear question(s) and assumption(s)?
1. Proposal does not include a research protocol 2. Protocol is poor, with a not well-defined research question 3. Protocol has minor flaws 4. Proposal includes a clear and complete protocol
I____|
1.6 Please provide a brief justification for your response in the previous question
2. Methodology
2.1 Identified Methodology. What is the main evaluation methodology selected for the evaluation?
1. Experimental (RCT) 2. Regression discontinuity 3. IV 4. Matching 5. Multiple methods 6. Synthetic controls 7. Other quasi-experimental
I____|
2.2 Please provide a brief description for your response in the previous question. If there are multiple methodologies, please justify the classification of the main one.
2.3 Evaluation of Methodology. Please rate the selected methodology
1. Methodology with serious shortcomings. Results are not reliable
2. Methodology rigorous in some respects, weak in others
3. Methodology rigorous in almost all respects (influences of independent variables extraneous to the purpose of the study have been minimized through randomization, matching, or relevant statistical controls; error variance has been minimized: are the measures relatively free of error?; sufficient power to detect meaningful differences)
I____|
2.4 Please provide a brief explanation of your assessment of the previous question.
2.5 Post-treatment period (time after the baseline). How long after the baseline are the follow-ups scheduled?
Follow Up / Time after Baseline (months):
______________________________ / ______________________________
______________________________ / ______________________________
______________________________ / ______________________________
______________________________ / ______________________________
2.6 Do you consider the number of follow-ups and their timing adequate for the objective of the impact evaluation?
1. Yes 2. No 3. NA
2.7 Please provide a brief explanation of your assessment of the previous question.
2.8 Power calculations
1. No power calculations described in the document
2. Mention of power calculations but with some important shortcomings
3. Power calculations included in the proposal, but some clarifications needed
4. Sufficiency of power to detect meaningful differences
I____|
2.9 Please provide a brief explanation of your assessment of the previous question.
3. Data
3.1 Sample Size
|____|
3.2 How adequate is the sample size?
1. Unsatisfactory 2. Partly unsatisfactory 3. Partly satisfactory 4. Satisfactory
|____|
3.3 Please provide a brief explanation of your categorization in the previous question.
3.4 Unit of Analysis. Please characterize the main unit of analysis in the evaluation
1. Individuals 2. Families 3. Classrooms 4. Schools 5. Communities 6. Geographical unit (block, municipio) 7. Other: _____________________
I____|
3.5 Do you consider this unit of analysis adequate?
1. Yes 2. No 3. NA
I____|
3.6 Please provide a brief explanation of your categorization in the previous question.
3.7 Main type of data used in the IE
1. Administrative 2. Survey collected for the IE 3. Mixed 4. Other ______________
I____|
3.8 Data Quality to estimate the impacts of the program/policy
1. No indication of how study variables were constructed or obtained
2. Some description of how the variables were constructed, but no discussion of the reliability of the measures
3. Includes a description of how the variables were constructed. Some reliability reported; not all measures demonstrated to be reliable
4. Careful selection of relevant variables considering their prior use, and reliability demonstrated for all or most of the measures (public codes, scales)
I____|
3.9 Please provide a brief explanation of your categorization in the previous question.
4. Overall Quality Assessment
4.1 How would you assess the quality of this IE proposal?
1. Unsatisfactory 2. Partly unsatisfactory 3. Partly satisfactory 4. Satisfactory
I____|
4.2 Does this proposal have the potential to be submitted for publication?
1. It is not material for a peer-reviewed journal 2. Low-rank journal 3. Top field journal 4. Top journal
I____|
4.3 In your professional opinion, would you authorize this IE?
1. Yes 2. No 3. NA
I____|
4.4 Why?
4.5 Comments and observations
Name of the person who filled this template
Code |____|____|____|____|____|
Sources of information Evaluation 1
IDB proposal 2
Interviews 3
External documents 4
Webpage 5
NA 9
Date of completion
|____|____| |____|____| 2017
Day Month
Validated by Validator Code
|____|____|____|____|____|
Date of validation
|____|____| |____|____| 2017
Day Month
___________________________
Appendix 3 Page 1 of 4
APPENDIX 3
N° | Operation number | Evaluation Name
1 AR-L1022 El impacto del Programa de Crédito para el Desarrollo de la Producción y el Empleo en la
Provincia de San Juan
2 EC-L1073 Evaluación de impacto intermedia y final del fondo de crédito de segundo piso de la
corporación nacional de finanzas populares
3 PR-L1062 PCR Anexo V Informe de Evaluación de Impacto, Análisis de Atribución de Resultados
4 AR-L1033 Evaluación de impacto del Programa de Acceso al Crédito y Competitividad para MiPyMEs
(PACC) en Argentina
5 AR-L1073 Knowledge Spillovers of Innovation Policy through Labor Mobility: An Impact Evaluation of
the FONTAR Program in Argentina
6 AR-L1073 Evaluating a Program of Public Funding of Scientific Activity. A case study of FONCYT
Argentina
7 AR-L1111 Evaluación del diferencial del aumento en producción científica en investigadores apoyador
por PICT y PAE vs grupo de control
8 AR-L1111
Evaluación del diferencial en el aumento de inversión en actividades innovativas respecto a
ventas entre empresas beneficiarias del Programa de Innovación Tecnológica vs grupo de
control
9 PE-T1247 Information and Communication Technologies, Prenatal Care Services and Neonatal
Health
10 RG-T2095 Evaluación de programas de apoyo público del Caribe
11 AR-L1038 Programa de Mejora de Enseñanza de las Ciencias Naturales y la Matemática
12 AR-T1128 Do scholarships and mentoring improve student performance
13 CH-T1061 Education quality and teacher practices
14 CH-T1138 The consequences of educational voucher reform in Chile
15 CO-L1010 Evaluación de impacto, de resultados y análisis de indicadores al programa de equidad en
educación en Bogotá.
16 CR-T1055 Computer assisted English language learning in Costa Rica
17 DR-T1084 Informe final English for Latin America
18 HO-L1062 Evaluación de Impacto Programa de Educación Primaria e Integración Tecnológica
(PEPIT): Componentes 2 y 3
19 JA-T1035 Technical Note Social Fund Jamaica
20 ME-L1033 Experimental Evidence on Credit Constrains
21 ME-L1086 El Impacto del estado físico de los planteles educativos sobre la deserción y reprobación
de los estudiantes
22 ME-T1114 Effect on In Service Teacher on ESOL program
23 ME-T1190* Evaluación de impacto bécate
24 NI-L1009 Evaluación del programa PAININ Programa de Atención Integral a la niñez nicaragüenses
25 PN-L1064 Evaluación de impacto proyecto espacios educativos
26 PR-T1092 Evaluación pequeños matemáticos
27 RG-T1575 Using the infrastructure of conditional cash transfer for an integrated ECD program
28 RG-T1946 Challenges in educational reform
29 UR-L1093 Evaluación del Sistema CEIBAL
30 BO-L1040 Impacts of Technology Adoption in Small Subsistence Farmers in Bolivia
31 BO-L1084 Unraveling the Threads of Decentralized Community -Based Irrigation Systems on the
Welfare of Rural Households in Bolivia
32 DR-T1074 Evaluación de impacto al programa de apoyo a la innovación tecnológica agropecuaria
(PATCA II)
33 HA-L1059 Haiti Project of Technology Transfer to Small Farmers
34 HO-L1010 Evaluación de Impacto Pro negocios Rurales **
35 NI-L1020 PCR Anexo 3 Informe de la evaluación final y de impacto del programa apoyos productivos
agroalimentarios
36 NI-L1039 Informe Final PNT Preliminar
37 PE-T1155 Evaluando la efectividad del piloto de ciencia y ambiente Perú
38 ME-L1019 Informe de Evaluación de Impacto del programa Hábitat 2009-2011
39 CH-T1112 Childcare effect on maternal employment Evidence from Chile
40 ES-L1056 Evaluación de Impacto Proyecto Ciudad Mujer
41 RG-T1646 True Love Effectiveness of a School Based Program to Reduce Dating Violence among
adolescents in Mexico City
42 RG-T1908 Capacitación de funcionarias de Comisarías de Familia en Medellín sobre servicios
amigables para víctimas de violencia íntima de pareja
43 RG-T1908 Cómo marcar tres dígitos reduce la violencia de pareja, Colombia
44 RG-T2206 Sumq Warmi Reducing Violence Against Women in Microfinance
45 BL-L1014 Impact Evaluation for the Youth Development Program ***
46 CO-T1246 Capacitación para la planeación del servicio de policía y la reducción del crimen
47 ME-T1232 Aprendiendo valores a través del deporte México
48 PN-L1003 Evaluación de impacto de un programa de inclusión social
49 RG-T2321 Evaluación de impacto de un programa de inclusión social y prevención de violencia
estudiantil
50 RG-T2377 Evaluación de Impacto Viajemos Seguras en México
51 RG-T2377 Violence and Birth outcomes evidence from Brazil
52 RG-T2377 The effects of punishment of crime in Colombia
53 UR-L1062 Evaluación de impacto de medidas aplicadas en la policía de Montevideo
54 CO-T1345 Evaluación de impacto de la reforma tributaria de 2012 a través de equilibrio general
55 DR-T1049 Are you (not) expecting? The unforeseen benefits of job training on teenage pregnancy
56 DR-T1049 Life skills, employability and training for disadvantaged youth: Evidence from a randomized
evaluation design
57 DR-T1103 Experimental Evidence on the long-term impacts of a youth training program
58 JA-L1005 Impact evaluation of the career advancement program in Jamaica
59 ME-L1084 Evaluación Final del Componente 1 del PROFORHCOM Programa de Formación de
Recursos Humanos Basada en Competencias
60 ME-T1190 Evaluación de impacto Bécate
61 PE-T1233 Impact Evaluation of the Job Youth Training Program Projoven
62 RG-T2199 Privately Managed Public Secondary Schools and Academic Achievement in Trinidad and
Tobago
63 RG-T2199 The effect of single sex education on academic outcomes and crime
64 RG-T2293 Do Remittances Help Smooth Consumption During Health Shocks? Evidence from
Jamaica.
65 RG-T2293 Healthy to Work Impact of Free Public Healthcare on Health Status and Labor Supply in
Jamaica
66 RG-T2293 The effects of natural disasters on labor market, do hurricanes increase informality
67 RG-T2293 Effects of Tropical Storms on Early Childhood Development
68 AR-L1142 Evaluar para seguir adelante: Resultados del Programa de Redes, 1
69 AR-L1142 Evaluar para seguir adelante: Resultados del Programa de Redes, 2
70 BO-L1032 Evaluación de Impacto Programa Salud Materno Infantil
71 CO-T1020 Evaluación de impacto del programa de salud sexual y reproductiva en adolescentes en
Medellín
72 DR-L1039 Análisis del Sistema de Pagos del Programa de Transferencias Condicionadas de la
República Dominicana
73 DR-L1039 Casual effect of Competition on Prices and Quality: Evidence from field experiment
74 EC-T1111 On the effectiveness of child care centers
75 GU-T1089 Evaluación de impacto MIFAPRO
76 HO-L1071 Informe final primera evaluación de impacto del programa transferencias monetarias bono
10,00 Honduras
77 ME-L1052 Evaluación Externa del Programa de Desarrollo Humano Oportunidades
78 NI0155 Resumen de Resultados de la Evaluación de Impacto del programa Urbano
79 PE-L1154 Resultados - evaluación SAF FINAL limpio
80 PE-T1150 Impact evaluation of PARSALUD
81 PR-T1078 Evaluación de impacto Tekopora
82 RG-T1894 Integrating a Parenting Intervention with routine primary health care: a cluster randomized
trial
83 UR-L1046 Informe de resultados de evaluación de impacto del componente y consulta de los SOCAT
84 UR-L1046 Informe evaluación cercanías
85 VE-T1026 Evaluación nacional de orquestas y coros juveniles
86 PE-L1011 Evaluación de Impacto del Programa de Transporte Rural Descentralizado
Appendix 4 Page 1 of 3
APPENDIX 4
Num | Operation number | Proposal Name
1 AR-L1067 Diseño metodológico para una evaluación de impacto del Proyecto “Programa de Sustentabilidad
y Competitividad Forestal en Argentina.
2 AR-L1154 Programa de competitividad de economías regionales
3 AR-L1179 Nota metodológica para la evaluación de impacto. Programa de Mejoramiento de Barrios
4 AR-L1180 Programa de Apoyo a la Política de Mejoramiento de la Equidad Educativa (PROMEDU) IV.
5 AR-L1251 Programa de implementación del régimen nacional de ventanilla única de comercio exterior
argentino
6 BO-L1063 Programa de mejora de la gestión municipal
7 BO-L1079 Programa Multifase de Reordenamiento Urbano de La Ceja de El Alto
8 BO-L1096 Apoyos Directos para la Creación de Iniciativas Agroalimentarias Rurales II
9 BO-L1106 Programa Nacional de Riego con Enfoque de Cuenca III PRONAREC III
10 BO-T1193 Elaboración de línea base del proyecto SIPPASE-VRG
11 BO-T1259 Situación de salud, nutrición y saneamiento entre los niños menores de 12 meses en el Distrito 8
de El Alto.
12 BR-L1175 Acciones para el monitoreo y evaluación del programa (BR-L1175)
13 BR-L1176 Programa de Desarrollo Urbano de Polos Regionales de Ceará
14 BR-L1223 Plan de monitoreo y evaluación (PME) del programa de fortalecimiento de la prevención y
combate a la corrupción en la gestión pública brasilera
15 BR-L1287 Inclusión Social y Oportunidades para Jóvenes en Río de Janeiro.
16 BR-L1328 Programa de Aceleración del Desarrollo de la Educación de Amazonas.
17 BR-L1414 Programa Fortalecimiento la Inclusión Social y de Redes de Atención Proredes Fortaleza.
18 BR-L1415 Programa Fortalecimiento la Inclusión Social y de Redes de Atención Proredes Fortaleza.
19 CH-L1061 Apoyo para el establecimiento de un sistema integrado de comercio exterior (SICEX)
20 CH-L1064 Programa de Apoyo a la efectividad del SENCE
21 CH-L1064 Programa de apoyo a la efectividad del SENCE
22 CH-L1085 PROGRAMA DE MEJORA DE LA GESTION PUBLICA Y DE LOS SERVICIOS AL CIUDADANO
23 CH-L1105 PROGRAMA DE DESARROLLO Y FOMENTO INDIGENA
24 CO-L1126 Plan de monitoreo y evaluación
25 CO-L1127 Arreglos para Monitoreo y Evaluación
26 CH-L1132 TERCER POGRAMA PARA EL FINANCIAMIENTO DE PROYECTOS DE INVERSION,
RECONVERSION PRODUCTIVA Y DESARROLLO EMPRESARIAL Y EXPORTADOR
27 CR-G1001 Salud Mesoamérica 2015-Costa Rica
28 CR-L1043 PROGRAMA DE INNOVACIÓN Y CAPITAL HUMANO PARA LA COMPETITIVIDAD
29 DR-L1048 Sanidad e Inocuidad Alimentaria (componente ganadería)
30 EC-L1087 Resultados obtenidos del levantamiento de la línea base de proyectos FERUM
31 EC-T1236 Mejoramiento de la calidad de la educación básica.
32 EC-T1236 Mejoramiento de la calidad de la educación básica.
33 EC-T1298 Diseño de evaluación transición hacia los procesos orales, Consejo de la Judicatura Ecuador
34 ES-L1057 Programa de apoyo al desarrollo productivo para la inserción internacional
35 ES-L1075 Programa de corredores productivos
36 ES-L1089 Préstamo global de crédito para el financiamiento del desarrollo productivo de El Salvador.
37 ES-L1092 CIUDAD MUJER FASE II APOYANDO EL EMPODERAMIENTO DE LAS MUJERES EN EL
CONTEXTO DEL PLAN DE LA ALIANZA PARA LA PROSPERIDAD DEL TRIANGULO NORTE
EN EL SALVADOR
38 GU-L1022 Programa de Mejoramiento del Acceso y Calidad de los Servicios de Nutrición y Salud.
39 GU-L1087 Programa de Mejoramiento de la Cobertura y la Modalidad Educativa.
40 GY-L1042 Citizen Security Strengthening Programme
41 HA-L1097 Natural Disaster Mitigation Program II: Climate Proofing of Agriculture in the Centre-Artibonite
Loop Arra.
42 HO-G1001 SALUD MESOAMÉRICA 2015 (Modelo fortalecido de AINC17)
43 HO-G1001 SALUD MESOAMÉRICA 2015 (Financiamiento SM2015 a nivel de gestor)
44 JA-L1043 Citizen Security and Justice Program III
45 ME-G1004 Evaluación de incentivos a la demanda para y subsidios al transporte para promover el parto
institucional y el acceso a servicios materno-infantiles en Chiapas.
46 ME-L1144 PROGRAMA DE APOYO A LA CAPACITACION Y AL EMPLEO
47 NI-L1092 Programa de Integración Vial
48 PE-L1062 Programa de Modernización de la Gestión para la Cobertura Universal en Salud.
49 PE-L1135 Programa de apoyo al transporte subnacional (PATS).
50 PE-L1169 Programa de Mejoramiento de la Educación Inicial en Ayacucho, Huancavelica y Huánu.
51 PN-L1115 Fortalecimiento de Redes Integradas de Servicios de Salud.
52 PR-L1084 Evaluación de Impacto Programa de Mejoramiento de Caminos Vecinales Paraguay
53 RG-T2009 Sistema Regional de Evaluación de Impacto de Políticas de Seguridad Ciudadana para América
Latina
54 RG-T2095 Consulting services to design an experimental impact evaluation in Bolivia
55 SU-L1009 Support to improve the sustainability of the electricity service: Final Baseline Report.
56 UR-L1060 Programa de Apoyo a los servicios globales de exportación (Apoyos específicos a sectores
claves)
57 UR-L1064 Programa de Desarrollo Productivo Rural
58 UR-L1109 Programa de Mejora de los Servicios Públicos y de la Interacción Estado-Ciudadano
59 UR-L1110 Programa de Apoyo al Sistema Nacional Integrado de Cuidados.