The Social Science of Measuring Impact

Economics and other related social sciences have undergone a systematic evolution in randomized experimentation and in the econometric practices used to manage and analyze data over time.

Shereein Saraf

November 16, 2020 / 8:00 AM IST

For a long time, the physical sciences have proved complex theories using neat mathematical tools. In the social sciences, such concrete measures are simply unavailable, owing to the often irrational nature of human beings and their individual preferences – the extent of their risk tolerance, time preferences, and willingness to cooperate.

The rise of behavioral economics and its relevance to development economics have called the neoclassical assumptions into question in impact evaluation. Development economists, who relied on econometric and regression analysis from the 1950s onward, now also employ experimental and quasi-experimental techniques to test whether these assumptions hold.

The pioneers of Randomized Controlled Trials in the economic sciences – a method long established in pharmacology and the medical sciences – were the Nobel Prize-winning economists Abhijit Banerjee, Esther Duflo, and Michael Kremer. They led what is now a two-decade-long revolution in measuring impact and influencing local and national policies across the globe.

In theory, a Randomized Controlled Trial is an experimental approach to evaluate the impact of a policy intervention that alters behavior. In practice, the experiment begins by selecting two or more groups of people – a control group and a treatment group. The groups are statistically identical in terms of demographics and other variables, differing only in whether they receive the intervention.

The treatment group receives the intervention while the control group does not; comparing the two groups' outcomes after the experimentation period allows researchers to infer causality and pinpoint the origin of the behavior change.

The logic behind this technique is simple: we cannot both treat and not treat the same set of people at the same time, so we create a second group with similar characteristics and compare it with the treatment group after the experimentation period. The methodology averages outcomes within each group to make the estimate more reliable. These averaged effects are called Average Treatment Effects (ATE), a concept that comes from the Rubin Causal Model, commonly used in epidemiology.
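
In the standard textbook notation of the Rubin Causal Model (a sketch of the usual formulation, not notation taken from the studies discussed here), the idea can be written as follows:

```latex
% Potential outcomes: Y_i(1) is individual i's outcome if treated,
% Y_i(0) the outcome if untreated. The ATE averages their difference:
\[
\text{ATE} \;=\; \mathbb{E}\big[\, Y_i(1) - Y_i(0) \,\big]
\]
% Only one of the two potential outcomes is ever observed per person,
% but under random assignment the difference in group means is an
% unbiased estimator of the ATE:
\[
\widehat{\text{ATE}} \;=\; \frac{1}{n_T}\sum_{i \in T} Y_i \;-\; \frac{1}{n_C}\sum_{i \in C} Y_i
\]
```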

In other words, the Average Treatment Effect (ATE) is the difference between the average outcome in the treatment group and the average outcome in the control group. We can observe these group means even though individual treatment effects are unobservable, and under randomization the difference is an unbiased estimate. This is what gives the experiment internal validity – the assurance that the measured difference is attributable to the intervention rather than to pre-existing differences between the groups.
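
A minimal simulation makes the estimator concrete. This is an illustrative sketch, not one of the studies discussed here: the sample size, true effect, and noise levels are all invented.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical experiment: 1,000 people, a true treatment effect of 2.0
# on some outcome (say, test scores), plus individual-level noise.
n = 1_000
true_effect = 2.0
baseline = rng.normal(loc=50.0, scale=10.0, size=n)

# Randomly assign half the sample to treatment, half to control.
treated = rng.permutation(n) < n // 2
outcome = baseline + true_effect * treated + rng.normal(scale=5.0, size=n)

# The difference-in-means estimator of the Average Treatment Effect.
ate_hat = outcome[treated].mean() - outcome[~treated].mean()

# Its standard error, from the usual two-sample formula.
se = np.sqrt(outcome[treated].var(ddof=1) / treated.sum()
             + outcome[~treated].var(ddof=1) / (~treated).sum())

print(f"estimated ATE = {ate_hat:.2f} (true effect = {true_effect}), SE = {se:.2f}")
```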

What it fails to factor in is the external validity of the intervention – whether the results apply in other settings or to other groups of people. The available literature generally tests specific questions in distinct locations, with people from particular cultural, psychological, and socio-economic backgrounds, so the cause-and-effect finding may be only locally causal. To test such a hypothesis, researchers replicate field experiments across different countries and locations. If they obtain similar results, the experimental design is externally valid; otherwise it is not.

Contrary to what the proponents of Randomized Controlled Trials claim, Angus Deaton, another Nobel laureate, recognized for his analysis of consumption, poverty, and welfare, questions this school of thought for impact evaluation. In one of his recent papers – Understanding and Misunderstanding Randomized Controlled Trials – he presents a critical argument against the estimates produced by Randomized Controlled Trials.

Randomization does not equalize all the characteristics of the treatment and the control group in any single trial, since individuals retain different, even if quite similar, preferences. So the Average Treatment Effects (ATE) are not precise in estimating the impact of an intervention, and the method leaves open questions about other observed and unobserved covariates. (Deaton and Cartwright, 2018)
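
A small simulation illustrates the point (an illustrative sketch with an invented covariate): randomization equalizes characteristics only in expectation, so any single trial can leave the groups measurably different.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Hypothetical covariate (e.g., household income) for a modest sample.
n = 200
income = rng.lognormal(mean=10.0, sigma=0.5, size=n)

# Repeat the random assignment many times and record the gap in mean
# income between treatment and control for each single randomization.
gaps = []
for _ in range(1_000):
    treated = rng.permutation(n) < n // 2
    gaps.append(income[treated].mean() - income[~treated].mean())

gaps = np.array(gaps)
# On average the gap is ~0 (balance in expectation) ...
print(f"mean gap across randomizations: {gaps.mean():.1f}")
# ... but any one trial can be noticeably imbalanced.
print(f"typical gap in a single trial (std): {gaps.std():.1f}")
```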

Moreover, it estimates counterfactuals using another set of people – the control group – who may or may not resemble the treated individuals. The problem with such an estimation is that it assumes the counterfactuals are precise enough to confirm the outcomes of the intervention.

Confounding – outcomes driven by a hidden cause – is common in such experiments and can invalidate causal inference. Interestingly enough, the standard remedy for this threat to internal validity is randomizing the sample. There are other limitations too. Attrition – participants dropping out of the study – along with non-compliance, experimenter bias, and historical events that influence the experiment are some of the potential threats.

Even to conduct and analyze a Randomized Controlled Trial, other frameworks – theoretical or empirical – need to support the results and make them usable for policymakers and other practitioners. Rather than applying the method to every development question that arises, researchers need approaches suited to each question in order to maintain precision and eliminate bias.

When Randomized Controlled Trials are not feasible, researchers instead employ quasi-randomization techniques, using Instrumental Variables or Natural Experiments, whichever is available. In such cases, two shortcomings arise – misconstrued exogeneity of variables and mistreatment of heterogeneity.
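
For intuition, the simplest instrumental-variables (Wald) estimator can be sketched in a few lines. This is a hypothetical example – the instrument, treatment, and outcome are all simulated – and the code assumes the instrument is truly exogenous, which is exactly the assumption the critique says is often misconstrued.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n = 5_000

# Simulated setting: an unobserved confounder drives both treatment
# take-up and the outcome, so a naive comparison is biased.
confounder = rng.normal(size=n)
instrument = rng.integers(0, 2, size=n)          # e.g., random encouragement
treatment = (0.8 * instrument + 0.5 * confounder + rng.normal(size=n)) > 0.5
outcome = 2.0 * treatment + 1.5 * confounder + rng.normal(size=n)

# Naive difference in means is contaminated by the confounder.
naive = outcome[treatment].mean() - outcome[~treatment].mean()

# Wald/IV estimate: effect of the instrument on the outcome, scaled by
# the effect of the instrument on treatment take-up.
z1, z0 = instrument == 1, instrument == 0
iv = ((outcome[z1].mean() - outcome[z0].mean())
      / (treatment[z1].mean() - treatment[z0].mean()))

print(f"naive estimate: {naive:.2f}, IV estimate: {iv:.2f} (true effect = 2.0)")
```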

Although the primary data from such field experiments is valuable for policymaking and implementation, these studies are highly costly, require large field teams, and are hard to evaluate. At present, the method overwhelmingly crowds the development economics literature, displacing the older approach in which economists built models on fundamental assumptions and simulated novel results from repositories of secondary data. After two successful decades and growing popularity, will it soon be superseded by another methodology, or will it continue to dominate future research in this field?