So, the Netherlands and the discipline of social psychology have their big fraud affair! For years Diederik Stapel had been inventing data and publishing like hell in peer-reviewed journals until three rather courageous junior colleagues finally managed to alert the university that something was fishy… The University of Tilburg investigated the case and came to the conclusion that it revealed not only individual misbehaviour but also major dysfunctions in the academic field of social psychology.
I think that other social sciences would do well to keep their “Schadenfreude” deep inside, read chapter 5 of the report and take a seriously critical look at practices in their own fields. Many of the weaknesses the Tilburg committee identified in Stapel’s case can be found throughout the social sciences, even if not in this extreme form. It is the very nature of a good scandal that it is about extreme, exceptional acts. Yet the outright fraudulent papers are only part of Stapel’s “oeuvre”; the much larger part of his publications was found to be simply characteristic of “sloppy science”, as the report says. This sloppiness has reasons, and they are pretty much the same as in other cases of fraud and imposture: fast reputation, fast money and telling the world what the world wants to hear. Stapel’s confessions are sympathetic in this respect: “I have created a world in which almost nothing ever went wrong, and everything was an understandable success. The world was perfect: exactly as expected, predicted, dreamed. In a strange, naive way I thought I was doing everybody a favor with this. That I was helping people. …”
Wanting to do good, dreaming up a world and reaping the benefits of a place in the spotlight are a poisonous mixture for any scientist. In much of the social sciences, and in academia generally, being convinced of one’s own intelligence, intellectual beauty and importance is actually quite an essential quality for surviving the shark tank of competitors for grants, posts and honours. Being shown around as a poster child among the powerful, even if this happens only in a very small and secluded circle of, let’s say, “the development experts” or “the NGO advisors”, easily goes to the head of quite a lot of people. At an international studies conference, it is harder to count the colleagues who suffer from occasional fits of megalomania than those who are humble, devoted and reserved about their achievements. The latter also happen to be those who receive fewer prizes, are promoted less often and who would commit such follies as not applying for a new grant facility for the simple reason that it is not in their habitual area of research… in short, those colleagues who are less visible, quieter and, hence, often considered less successful than their brawling, boasting and, at times, overbearing colleagues. Yet it also happens that their research is often much more thorough, detailed and painstaking, “data scratching” rather than “data crunching”, and that their theoretical reflections and conclusions are hesitant, careful, obsessed with the specificities of their cases and, to put it in a nutshell, utterly “unsexy”. They refuse to be squeezed into two-word headlines or summarized in 300-word abstracts. Annoying, indeed. And disadvantaged in comparison to the loud researchers who have no problem washing away the cumbersome complexities of the social world and replacing them with catchy labels and categories that show “impact” and “larger audience” qualities.
There have long been many complaints that the peer-review system is not working well, and indeed the main malaise of the academic world remains the overbearing influence of “peers”. The Tilburg Report states: “In the case of the fraud committed by Mr Stapel, the critical function of science has failed on all levels. Fundamental principles of scientific method have been ignored, or set aside as irrelevant.” (p. 54). The committee says this not only with respect to the invented data but also with respect to other papers which display “sloppiness”. This sloppiness covers numerous statistical flaws, misleading or missing information on research procedures, and manipulation of the data so that it shows the desired results (for instance omitting variables, or “shaving” off outliers to enhance significance). The committee is appalled that these errors, omissions, mistakes and flaws were not detected and denounced by colleagues, journal reviewers, editors or simply attentive readers.
When talking about conflict studies, let’s examine for instance those econometric methods which are so en vogue. Of course, there is no study that uses fabricated data as Stapel did. Yet there is a lot of sloppiness and complacent in-circle reasoning that lets more than one dubious hypothesis and finding slip through the net of critical examination. Much of this is due to the relatively great institutional power and visibility this research area has gained in recent decades, among other things by advising international bodies and national development agencies on questions of development aid and security, the infamous “greed hypothesis”, which I will discuss later, being a case in point. At any given international studies conference of the past years, there will easily be three number crunchers for every qualitatively working social scientist, and at least five colleagues using some kind of decision-making model based on rational choice for every one colleague who has talked or at least listened to people in armed conflict (you do not always have to talk to them yourself, as I will discuss further below). Econometric methods cloak their analyses in the aura of natural-science precision and objectivity, and their users usually strictly avoid discussing any of their assumptions, methods or findings in a reflective and critical way.
What is particularly fascinating about the number-crunching colleagues is that they all tend to use the same data despite loud and recurrent criticism. It is for instance entirely normal to teach a critical understanding of GDP figures in any high school economics class; but it is still rare for econometricians working on conflicts and poverty to critically discuss the explanatory value of GDP figures. They are simply used as a “proxy” for economic performance, no matter whether we have major doubts that GDP actually tells us something about national economies, or whether the figures are, indeed, available in sufficient quality for those countries we are interested in when investigating the civil wars of the past two decades. If GDP figures are not available in good quality, other indicators which are derived from GDP are not either. And yet Gini coefficients, for instance, are widely used, particularly in those studies which aim at proving that there is a direct relationship between poverty and violence, like the infamous “Greed vs. Grievance Study” of Paul Collier, at the time advisor to the World Bank, and Anke Hoeffler, at the time junior scientist in Collier’s team at Oxford (so much for the glamour of research). Taking the same data set as used in the Collier and Hoeffler study of 2004, it was only possible to identify Gini coefficients of good quality for four out of the 79 cases. Crucially, the entire hypothesis that grievances do not play a major role in civil war outbreaks hinges upon the argument that inequality, measured by the proxy of the Gini coefficient, had no significant positive correlation with war outbreak.
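To make the data-quality point a little more tangible, here is a minimal sketch in Python of how a Gini coefficient is computed from income data, and of what can happen when a survey does not cover the whole national territory. Everything in it is an assumption for illustration: the function name `gini` is mine, the income figures are invented, and nothing here is taken from Deininger and Squire's or Collier and Hoeffler's data.

```python
def gini(incomes):
    """Gini coefficient of a list of non-negative incomes.

    0 means everybody has the same income; values close to 1 mean
    that almost all income is held by very few people.
    """
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive income")
    # Standard rank-weighted formula on the sorted incomes.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


# Invented incomes, for illustration only.
urban = [20, 30, 40, 50]   # households the hypothetical survey reached
rural = [2, 3, 4, 5]       # households it did not reach
national = urban + rural

print("survey covering the urban region only:", round(gini(urban), 3))
print("survey covering the whole territory:  ", round(gini(national), 3))
```

In this made-up example the partial survey reports much lower inequality than the full one, simply because the poorer region was left out. Whether real surveys err in this direction or the other depends on which parts of a country they miss, which is exactly why coverage matters for the quality of the coefficient.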
How much critique does it take to invalidate an analysis, and does this depend on the author’s status? It takes masses, and the more popular the author is, the less likely it is that sharp critique will be heard. Nicholas Sambanis and Håvard Hegre for instance, both by no means big critics of number crunching, showed in their article on civil wars and the PRIO dataset that slight changes to the coding of civil wars already had a major impact on the results. This critique was published; Sambanis’s very long and detailed discussion of every single proxy used by Collier and Hoeffler, and of how it does NOT contribute to our analysis of war, is only available as a working paper on his webpage. As another colleague said in 2011, “It took over 10 years argument to get over Collier’s and Hoeffler’s greed hypothesis; they have diverted much needed attention and energy from the study of civil wars”.
Sambanis’s discussion of proxies also points to the observation that many studies already contain major flaws in their very conception, not only in the data they use or the statistical methods they employ. An extraordinary example of such studies can be found in Macartan Humphreys’s and Jeremy Weinstein’s work. Methodologically their work is certainly flawless, and the way they make their data available for replication is extremely laudable. Yet the very conception of some of their studies is, to say the least, astonishing. For their survey of ex-fighters in Sierra Leone, which was published in 2008 under the title “Who fights?”, the authors interviewed members of the Sierra Leonean RUF and self-defence units who were being demobilised. The survey produced a wide array of interesting data on the origins of these fighters and also contained a large section that sought to explore their motives for taking up arms… and it is here that a look at the original questionnaire makes the critical mind wonder.
Both authors indicate that their interviewees were commonly in their early twenties at the time of the interview. The large majority were also of rural background. Most of them had merely finished elementary schooling before joining their respective combat unit. One of the questions meant to assess their political awareness asks: “Which political party or group did you support before the conflict began?”. What seems a perfectly fine question when asked in the run-up to a US presidential election becomes extremely unrealistic when put to Sierra Leoneans who, at the outbreak of the war 12 years earlier, were around 10 to 13 years old, who lived largely isolated from the capital city where party politics took place and who, as the authors’ own survey found, were barely literate.
Further down, Weinstein and Humphreys ask in several questions about the motives for joining the warring factions. In each question the answer choices that indicate material motives outnumber the other choices. Answers indicating material incentives are explicit and concrete; answers indicating political goals are worded in very abstract and cloudy sentences. For instance: “What did the group tell you you would gain from joining?” with the choice of answers “1. Money, 2. Diamonds, 3. Women/ Men, 4. Food, 5. A Job, 6. Land, 7. A way to improve the situation in Sierra Leone, 8. That my family would be protected, 9. A possibility to get revenge, 10. other” … the ex-fighters would have needed to be fine ideologists to pick answer 7 first and alone. There are other startling examples in the questionnaire which tell a lot about the authors’ preconceived ideas and about how the questionnaire was streamlined to produce the inevitable result that political motives were irrelevant compared to material motives; a conclusion that, so shortly after the war and at a moment of large international support for the conservative-liberal President Kabbah, was exactly what the UN and other international donors wanted to hear…
Both authors are very transparent about the data and the statistical methods they use (although I cannot find the link to the questionnaire anymore…). To mention them in a blog post that starts with a link to an ousted fraudster seems extremely unfair. Yet my aim is to drive the nail of sloppiness in the social sciences further in. It is actually not sloppiness but more or less conscious complacency and power schmoozing that is at the heart of the matter. In some quarters Humphreys and Weinstein’s work has been hailed as brilliant because they were supposedly the first to have asked fighters why they fight… a statement that ignores all the detailed, on-the-ground work that had been done before but which, unfortunately, had come to conclusions that pleased neither the UN nor Western donor agencies (for instance Paul Richards’s “Fighting for the Rain Forest” and Krijn Peters’s earlier publications of the research for his book). The statistics additionally give these findings the aura of the “scientific” and the “objective”, hence providing a legitimation for the results that rests rather on the reader’s (wilful) ignorance of the arbitrariness of survey methods. Such ignorance has a reason: it is not only authors who sometimes publish only what they like; readers, too, like to read only what they think they know already.
The formation of cliques, schools of thought, chapels and sects, and their grip on institutional power in the form of university chairs, tenure committees, professional association committees, editorial boards of journals and lucrative advisory jobs for governments and IOs, has yet to be broken. What the Stapel affair so brilliantly shows is that whoever has gained the admiration and confidence of those illustrious circles can go very far in writing whatever pleases and confirms received ideas. Critical voices are not only published less; they are also less solicited by those who confer external legitimacy on fashionable research, namely government agencies, international organisations etc. It is not only the scientific community that needs to rethink the way it pushes “likeable” papers and suppresses the annoying ones (a review of mine that contained the above criticism and more was rejected by one journal reviewer in one single paragraph which in essence said “this is too critical, I don’t like it”, an experience other critics of these approaches know all too well). Those to whom this research is addressed have to rethink, too, whether they prefer to read what they already know and think, or whether they want thoroughly researched, albeit uncomfortable, truths that could eventually lead to real policy change.
 Klaus Deininger and Lyn Squire, “A New Data Set Measuring Income Inequality”, World Bank Economic Review 10 (1996): 565-591.
 Deininger and Squire distinguish the quality of their data according to the reliability of their sources; acceptable quality means that the income surveys on which the calculations of the Gini coefficient are based cover the entire national territory and are representative of the populations’ income. In most of the cases here where the quality was not acceptable the weakness was that survey data did not cover the whole national territory.