Impactful Quantitative Analysis in Evaluation Research

Quantitative approaches can be the vehicle for deploying the scientific method in evaluation research. Good quantitative analysis starts by using a program’s logical framework (or theory of change) to construct a testable evaluation hypothesis and to propose a null hypothesis that the evaluator hopes to reject. Proper statistical evidence may yield probable estimates of cause and effect in the relationship stated in the evaluation hypothesis. But quite often survey findings are not cut-and-dried. When documenting a correlation between variables, the evaluator still has to demonstrate that the association is not spurious. Statistics should not be mistaken for scientific evidence in their own right. There could have been a full moon on the night of an exam that all students passed, but it would be laughable to infer a relationship between the two events. Even when statistical tests control for certain variables, this should not be confused with the experimental method. The experimental method is the gold standard, but it is often too expensive and can usually only be approximated in evaluation research. When there are two similar population groups (matched on key demographic indicators) and only one has received benefits from an intervention, the evaluator can attribute post-program differences to the program with greater confidence, provided that alternative explanations have been ruled out.
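To illustrate what testing a null hypothesis on two matched groups can look like, here is a minimal Python sketch of a permutation test on the difference in mean outcomes. The scores, the function name `permutation_test`, and the choice of a permutation test over other possible tests are assumptions made for illustration only; they do not come from a real program.

```python
import random
from statistics import mean


def permutation_test(treated, comparison, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the program has no effect, so group labels are
    interchangeable and any observed gap is due to chance.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(treated) - mean(comparison)
    pooled = list(treated) + list(comparison)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the gap.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical post-program test scores (illustrative numbers only).
treated_scores = [72, 75, 78, 80, 74, 79, 77, 81]
comparison_scores = [68, 70, 71, 69, 73, 72, 67, 70]

observed_diff, p_value = permutation_test(treated_scores, comparison_scores)
print(f"difference in means: {observed_diff:.2f}, p = {p_value:.4f}")
```

A small p-value only indicates that chance is an unlikely explanation for the gap. It does not by itself rule out history, maturation or other rival explanations.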

Numbers are a powerful summary of data, but without proper analytical framing they are meaningless. An evaluator must first justify the use of quantitative methods by explaining why statistical data are being consulted and by demonstrating specifically how they will help the analysis. The key advantage of statistics is that they allow the evaluator to make generalizations that apply beyond a few selected cases. But without a convincing explanation of the value and limitations of the research design and the collected data, the analysis will have little impact.
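One common way to express how far a sample result can be generalized is a confidence interval. The sketch below computes a normal-approximation interval for a proportion. It assumes a simple random sample, and the survey counts and the function name `proportion_confidence_interval` are hypothetical.

```python
import math


def proportion_confidence_interval(successes, sample_size, z=1.96):
    """Normal-approximation (Wald) interval for a population proportion.

    Assumes a simple random sample; z=1.96 gives roughly 95% confidence.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    p_hat = successes / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    # Clip to the valid range for a proportion.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical survey: 240 of 400 sampled beneficiaries report a benefit.
low, high = proportion_confidence_interval(240, 400)
print(f"sample share: {240 / 400:.1%}, 95% CI: {low:.1%} to {high:.1%}")
```

The interval describes sampling error only. It says nothing about bias from a poorly drawn sample or a flawed questionnaire, which is why the research design still has to be explained and defended.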

Internal and External Validity

Essential to high-quality quantitative analysis is recognition of possible threats to data accuracy and representativeness. Several internal validity threats usually loom large and must be examined:

  1. History: Larger events could cause change affecting samples irrespective of the project being evaluated. Attributing the change to the project could be highly questionable. The evaluator should acknowledge these events and their impact on the evaluation.
  2. Maturation: This refers to the tendency of people to improve as they grow older and gain experience. How can an evaluator attribute their response to the intervention and not to their maturation? Addressing this possible explanation is necessary to hypothesis testing, because otherwise the rival explanation that maturation caused the observed changes cannot be ruled out.
  3. Construct validity: How accurate is the data collection tool? Does it measure what it was designed to measure? Validity could become a problem if the survey questionnaire is not administered properly. For example, the extensive use of interpreters could lead to unreliable or inaccurate information being entered in questionnaire forms.
  4. Confounding variables: A third variable, perhaps a seasonal factor, could affect the dependent variables (DVs) independently of the program. An evaluator can mitigate the influence of such variables by bringing in other data, such as qualitative information, to rule them out.
  5. Dissimilar groups: If the treatment group and control group in an experimental design are not matched in terms of key indicators or demographic characteristics, a plethora of independent variables could be alternative explanations; their effect must be accounted for.
  6. Mortality: This happens when respondents drop out of the study, leaving too many missing values.

Threats to external validity determine whether the findings represent the whole population and whether cause and effect in one evaluation setting can apply to other settings. Reliable data collection instruments, randomization and appropriate hypothesis construction and testing are key to enhancing external validity.
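Two of the threats listed above, dissimilar groups and mortality, lend themselves to simple numeric checks. The following Python sketch is one possible approach, not a prescribed procedure: it computes a standardized mean difference on a baseline characteristic and an attrition rate. The data, the function names, and the 0.1 cutoff (a rule of thumb often cited in the matching literature) are assumptions for illustration.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated, comparison):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated) + variance(comparison)) / 2)
    if pooled_sd == 0:
        raise ValueError("no variation in either group")
    return (mean(treated) - mean(comparison)) / pooled_sd


def attrition_rate(enrolled, completed):
    """Share of enrolled respondents who dropped out before the end."""
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    return (enrolled - completed) / enrolled


# Hypothetical baseline ages for each group (illustrative numbers only).
treated_ages = [34, 29, 41, 38, 30, 36]
comparison_ages = [33, 31, 40, 37, 29, 35]

smd = standardized_mean_difference(treated_ages, comparison_ages)
# Assumed rule of thumb: an absolute value above 0.1 signals imbalance.
print(f"SMD for age: {smd:.3f}, flagged as imbalanced: {abs(smd) > 0.1}")

# Hypothetical panel: 200 respondents enrolled, 170 completed the endline.
print(f"attrition: {attrition_rate(200, 170):.0%}")
```

Such checks would normally be repeated for every key indicator and for each group separately, since attrition that differs between the treatment and control groups is itself a rival explanation.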

Conflict of Interest

There is also an inherent challenge in evaluation research: it is quite often funded by the organizations that have managed or sponsored the very programs under evaluation. Silvera and Neiland demonstrate that people tend to satisfy their self-esteem needs by affirming the efficacy of what they do. This creates a natural conflict for evaluators whose jobs hinge on developing objective criteria for assessing the work of those who pay them. The risk of having to deliver bad news to program sponsors may lead to another evaluation trap: setting unjustifiably low performance expectations, which could mask deficiencies in program implementation and/or impact.

This can happen even without an intentional attempt to hide the truth. For example, in 2006 the British magazine Education reported that the Independent Schools Council in the UK had criticized official claims of improved educational attainment in secondary schools, on the grounds that the national qualifications measures used by the government fell short of previously used international standards and thus artificially inflated performance levels. On the other hand, unjustifiably high expectations can downplay important successes that may have been achieved. In short, the abuse of statistics can render quantitative measures of program performance meaningless or even misleading. Balance, logic and evidence that weighs competing explanations are key to developing a sound evaluation framework.


Michael Bamberger, Jim Rugh and Linda Mabry, RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints. Thousand Oaks, California, Sage Publications, 2012, pp. 490-499.
Philip A. Schrodt, “Seven deadly sins of contemporary quantitative political analysis,” Journal of Peace Research, Vol. 51, No. 2, 2014, pp. 287-300.
David Silvera and Tor Neiland, “Interpreting the uninterpretable: The effect of self-esteem on the interpretation of meaningless feedback,” Scandinavian Journal of Psychology, 2004, 45, pp.61–66.
“Official GCSE stats ‘meaningless’, claims school council,” Education, October 20, 2006, p. 2.

About the Author

Dr. Mohamed Nimer has more than thirty years of research experience in international development and politics. He has administered surveys among hard-to-reach populations in the Middle East and North America, in conditions where systematic demographic data are scant or completely missing, and has improvised techniques and methods for random sampling and population estimates. He has focused particularly on local and community-based institution-building programs and on the national and international policies that affect them. Dr. Nimer has been teaching Quantitative Methods in Monitoring and Evaluation since 2014. He earned his Ph.D. from the University of Utah.
