The qualitative/quantitative divide is sort of useless. Focus on replicability instead.

I’ve decided to create a new “no longer useful” tag for posts about topics that social researchers seem to harp on a lot but for which it seems we have already derived all of the useful lessons to be had. I’ve gone back and appended this label to my post on the George Box quote that “all models are wrong but some are useful,” and to my most recent post on the lack of individual objectivity among scientists. I don’t think my posts will put these sorts of assertions to bed, but I feel better adding my voice to those who think they ought to be put to bed.

And now, my latest addition to the “no longer useful” category: the convention of distinguishing between qualitative and quantitative research, researchers, methods, etc. These two categories seem to be one of the most common ways to talk about social research. Many journals have built up reputations for publishing or being receptive to only one or the other category of research, and vocal members of different disciplines often criticize their journals and professional organizations for being too qualitative or too quantitative.

This debate may have been useful at some point, but it doesn’t seem very useful now. Part of the reason, I think, has to do with the ambiguity with which people use the terms. But mostly I’m concerned that qualitative stuff and quantitative stuff aren’t really different stuff. It doesn’t help to pretend that they are.

The Current Definitions Aren’t Clear

The qualitative/quantitative dichotomy seems to be used in two different ways. I’ve seen researchers use the terms in the ontological sense to characterize those things that made up the focus of their research, but just as often I’ve heard researchers use the same terms to characterize the manner in which they conducted their research. The sense in which people use the terms has implications for assessing the usefulness of those terms.

“Qualitative research” in the ontological sense is research employed to describe, predict, and/or explain differences in kind, whereas “quantitative research” in that same, ontological sense is employed to describe, predict, and/or explain differences in degree. I’ve tried hard to understand how it could be a good idea to make that ontological distinction, and it’s just not working for me. Quantitative differences are just qualitative differences that have been counted, and qualitative differences are often just quantitative differences that have been binned into categories for the sake of convenience. I don’t see what is to be gained by focusing on only one or the other side of the same coin.
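To make the two-sides-of-the-same-coin point concrete, here is a minimal Python sketch. The observations, the category labels, and the cut-off ages are all invented for illustration; none of it comes from an actual study.

```python
from collections import Counter

# Hypothetical "qualitative" observations: one employment status per respondent.
statuses = ["employed", "unemployed", "employed", "student", "employed", "student"]

# Counting differences in kind turns them into differences in degree.
counts = Counter(statuses)

# Hypothetical "quantitative" observations: respondents' ages in years.
ages = [17, 23, 35, 41, 68, 72]


def age_group(age):
    """Bin an age into a category; the cut-offs (18 and 65) are arbitrary assumptions."""
    if age < 18:
        return "minor"
    if age < 65:
        return "working-age"
    return "senior"


# Binning differences in degree turns them into differences in kind.
groups = [age_group(a) for a in ages]

print(counts)
print(groups)
```

The same respondents could be described either way; which form the data takes is a choice about convenience, not a fact about the thing being studied.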

The epistemological definitions are, for me, a little harder to pin down. It seems that “qualitative research” in the epistemological sense is often talked about as research that describes or interprets, as opposed to “quantitative research” which measures and counts. That seems like a weak distinction, since there are plenty of ways to use numbers to describe and plenty of ways to describe numbers, and quantified information is no easier to understand without interpretation – and is no less able to aid in interpretation – than is non-quantified information. I think a lot of researchers who fall at various points along the qualitative-quantitative spectrum of self-identification recognize this. Sometimes I get the sense that self-identified “qualitative” researchers consider themselves the researchers of un-measurable things like “identity” or “meaning,” but that ignores all the research into those issues by cognitive scientists and neuroscientists, who generally (I think) don’t place themselves under the qualitative banner, and it likewise ignores that just recording an observation is an act of measurement, even if it is not recorded using a technical instrument or a standardized scale.

Clearer Definitions Make the Categories Unnecessary

I do think there is an underlying difference in research priority that warrants categorization. I just don’t think the qualitative/quantitative distinction characterizes that difference very well.

The useful distinction, I think, deals with the issue of replicability. I don’t mean replicability of results – I know the fairy-tale version of science tells a story of how one researcher conducts a study and gets a certain set of results, and then other scientists use similar methods in a similar setting and get similar results, and that bolsters the validity of the original findings. I think that standard of replicability is kind of unrealistic. For one thing, it seems like it’s pretty hard to get the funding to copy someone else’s stuff, and even if you get the funding it’s hard to publish a copy of someone else’s stuff. It happens, but it seems that researchers are much more focused on doing something different from what other people have done, and it’s easy for me to attribute that focus to the dynamics of the research market itself.

But even if replication were as easy to fund and publish as non-replication, I still don’t think it would be realistic to expect that replication to appreciably increase the tendency for social and behavioral researchers to coalesce in their opinions about specific topics or theories. Replication of findings requires a large degree of control over the research settings, and identification of many if not most of the different factors that could impact the result. Most social and behavioral research doesn’t take place under controlled conditions, and our collective understanding of most social and behavioral problems is shaky enough – and our data collection methods are limited enough – that most studies measure only a very small subset of the things that could impact the topic of interest.

I think, instead, it’s useful to distinguish between studies that lay out data collection, cleaning, organization, analysis, and reporting methods clearly enough that any researcher could potentially do the study him- or herself from start to finish without requiring any hand-holding from the original authors, and studies that do not. The studies in the first category are replicable in that they are technically able to be replicated. Studies in the second category do not provide enough information for even an attempted replication to be possible.
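As a rough illustration of what “technically able to be replicated” could look like in practice, the Python sketch below writes every step from raw data to reported result as an explicit function, so anyone with the same input can rerun it and check that they get the same output. It is a minimal sketch under stated assumptions, not a standard: the function names, the toy responses, and the cleaning rule are all hypothetical.

```python
import hashlib
import json
import statistics

# Hypothetical raw data: survey scores as collected, including unusable entries.
RAW_RESPONSES = ["4", "5", "", "3", "n/a", "5", "2"]


def clean(raw):
    """Cleaning rule stated in full: keep only whole numbers from 1 to 5."""
    return [int(r) for r in raw if r.isdigit() and 1 <= int(r) <= 5]


def analyze(scores):
    """Analysis stated in full: sample size, mean, and median of the cleaned scores."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
    }


def fingerprint(result):
    """Hash the reported result so a second researcher can confirm an identical rerun."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def run_study(raw):
    """Run every step in order; nothing happens outside these functions."""
    cleaned = clean(raw)
    result = analyze(cleaned)
    return cleaned, result, fingerprint(result)


cleaned, result, digest = run_study(RAW_RESPONSES)
print(cleaned)
print(result)
print(digest)
```

Someone who disagrees with the cleaning rule can change `clean`, rerun the study, and show exactly how the result moves. That is the kind of critique a study in the second category never allows.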

No matter whether a researcher defines the qualitative/quantitative distinction in ontological or epistemological terms, the replicability distinction remains important. Research that is technically able to be replicated can be used as the basis for future research, but it’s also a fair admission into any argument about what people really tend to do or not do, or which theoretical concepts are really valid, or which findings have really stood the test of time.

No matter what other arguments someone might make, a researcher can always point to replicable research and say, “Look, I know you disagree with me. That’s fine. I’ve laid out exactly what I did and didn’t do. You can see all of it. Show me where I made a mistake with the data I had, or do your own replicable study and show me how I would have reached different conclusions if I had had different data. Otherwise, even though you may actually have valid points, you’re not giving me any reason to believe what you say.”

Basically, replicable research gives a researcher the right to make the put-up-or-shut-up argument. Non-replicable research doesn’t do that. I’ve done a lot of non-replicable research, and I think it’s valuable. It can help generate ideas, or counter preconceived notions, or even just give a researcher a “feel” for the subject he or she is studying. But it can’t be evidence. For observations to be evidence of a general tendency or pattern, a researcher needs to demonstrate that he or she did not cherry pick those observations (or the results based on an analysis of those observations). A non-replicable study basically says, “Trust me. I really saw what I said I saw and I really didn’t make any mistakes and I really didn’t omit any relevant information and I really understood the whole thing correctly.” No. I don’t trust you. I have no reason to trust you, and you have no reason to trust me. That’s why replicability is important.

Obviously, the replicable/non-replicable distinction is a spectrum: studies can be more or less replicable and there will always be fights over whether a study was replicable enough. But at least that’s a fight that matters. I don’t care if a researcher studies differences of kind or differences of degree, and I don’t care if a researcher prioritizes interpretation over description or the other way around. I care about having some reason to believe a set of findings, and the best way to find that reason is to be able to look at everything a researcher did, from start to finish, and not be able to find anything wrong that could have reasonably been done right. My minimum standard for trusting what a researcher says is the provision that the researcher gives me at least the opportunity to find something wrong.

It seems sort of beside the point to state that a qualitative or a quantitative or a “mixed-methods” approach would have been more appropriate in this or that research situation. The distinction doesn’t help us evaluate the research or even have a useful debate about the merits of the findings. All such statements do is point out that there are always other methods a researcher could have used to tackle a problem. I think everyone already knows that.


18 thoughts on “The qualitative/quantitative divide is sort of useless. Focus on replicability instead.”

  1. Nice essay. Another possible line of development is to consider the pragmatics of research. Got this from a methods book way back in the 1960s.

    Envision a standard four-cell table. In the lower left-hand cell, one question is asked of one subject. Sometimes the answer can be important, e.g., “Will you marry me?” In the upper left-hand cell is one question of many subjects, good for hypothesis-testing. In the lower right-hand cell is many questions of one subject, good if the subject is an expert and you know nothing at all. In the upper right-hand cell, many questions are asked of many subjects. For most academic researchers this approach is not feasible; it costs too much in time and effort. This, however, is where technology is making a difference. With big data sets and fast processing, it is now possible to conduct exploratory research involving multiple hypotheses, tested in sequence or simultaneously in a way that was literally impossible in the days when FORTRAN programs were key punched onto Hollerith cards and handed over to the computer center for feedback a day later. And what you’ll really like, this kind of research is replicable and challengeable in spades. If you can do it, so can anyone else with the same data sets and software, and if they have a better idea they can demonstrate what they’re talking about. Pretty cool, what.

  2. I’ve always ignored this distinction myself, for pretty much the same reasons as you state here, but your proposal of replicability as the basis of categorization really helps clarify the uses of research. Thanks. I liked John’s categories, too. I will be applying both your sets of categories to what I read.

  3. John(s),

    Thanks for your comments.

    I’ve seen that four-square breakdown of research approaches before, and I do think it’s useful. In my personal experience (which may be an entirely untrustworthy sample of what researchers tend to do in general), people who go with the many questions/one person approach tend to downplay the extent to which replicability is feasible or even desirable. On the other hand, I’ve seen a lot of people who do the many questions/many people approach who seem to assume that that approach is replicable by its very nature. Both sets of assumptions seem problematic. Any of the four approaches you outline can be replicable if the researcher lays out his or her collection, cleaning, organization, and analysis of the data, and all four approaches can be non-replicable if the researcher doesn’t do those things.

    The position I take issue with that originally prompted me to write the post is the assertion that a certain type of method is more or less able to contribute to some aspect of our understanding of people’s behavior. Every method has problems, if not in its design then at least in its implementation, so the extent to which any method is useful depends upon the extent to which findings from that method can be compared to and combined with other findings, and replicable research is the only type of research for which that comparison and combination can be done. The qualitative- or quantitative-ness of the method seems like a rather trivial detail.

  4. My work involves designing, distributing and analysing surveys. The easy part is measuring the quantitative parts. The harder, but more rewarding, the qualitative. Your comment: “qualitative differences are often just quantitative differences that have been binned into categories for the sake of convenience” has some truth, in that I take, say, 800 open responses, categorise them, code and node them and extract a measurable result, but the difference is that, unlike the quantitative questions, designed by the researcher which allow only a set reply, open responses can express and give some truly refreshing results showing human and unexpected answers. My qualitative analysis has changed the shape of town buildings and roads and been used in policy to improve vulnerable people’s lives. They are chalk and cheese, Schaun!

  5. Julia,

    You seem to be equating the extent to which an analysis informs decisions and the extent to which an analysis is valid. The fact that your analysis has been used to make changes to towns or policies doesn’t mean the analysis was right. That certainly doesn’t mean it was wrong, either! I’m just saying that adoption of findings and validity of findings are two different things.

    I agree with everything you say about open-response questions. I use them myself, and they often give me insights that I couldn’t have gotten from standardized measures. But standardized measures often give me insights that I never could have gotten from the open-ended responses. Neither qualitative nor quantitative approaches have a monopoly on insight or informativeness or anything else.

    And those open-ended responses don’t necessarily fall into the “non-replicable” category of research that I talked about in the original post. If you can explicitly lay out how you categorized, coded, and noded those responses to the point that someone else could do it without you needing to guide them along the process, then that would be replicable.

    I guess I’m still just not seeing how qualitative and quantitative approaches are so different, except on the very surface. Both involve choices on the part of the researcher about what information is relevant and what information is not relevant. Both are constrained by the realities of data collection technologies. Both can offer insight and surprises. Both can facilitate confirmation bias. Both can be adopted by third parties in the service of particular projects, policies, or programs.

    One last thing: as a researcher who has spent a lot of time doing both “qualitative” and “quantitative” analyses, I have to disagree with your characterization of qualitative research as both harder and more rewarding. I’ve done incredibly hard statistical analyses and incredibly hard interview and focus-group coding. And at different times I’ve had both types of analysis yield either incredibly useful, exciting, insightful results, or nothing at all.

  6. Dear Schaun,

    I think you may be missing the important complementary nature of mixed qualitative and quantitative methods, wherein qualitative methods can be employed to inform the creation of quantitative instruments that can then be administered to a broader sample of the population and the quantitative results can be brought back to a usually smaller (albeit, not necessarily) group of respondents from the target population to help you interpret the results from their “insider” perspective.

    For example, I recently conducted interviews with a randomly selected group of people who were involved in a matched savings program in which they made contributions to an account that was earmarked for a specified goal (education, starting a small business, buying a house) which were then matched by the federal government and once again by our nonprofit organization. I used SPSS Text Analytics to extract (data driven) and to search for (conceptually driven) themes (memes?) based on Natural Language Processing that has a psycholinguistic basis rooted in meaning. Yes, I created quantitative variables and assessed their frequencies and made a preliminary assessment of the bivariate relationships among these variables.

    This initial analysis helped me to create a survey instrument with at least three questions for each theme so that I could later create structural equation models in which the latent variables were “identified” (i.e., I had enough information for each theme to cover the cost of estimating the parameters for each latent variable). Once I had administered the instrument to the entire sample of participants, I performed my usual reliability analyses (the corrected item-total correlations, coefficient alphas and Item-Response analyses – I didn’t have longitudinal data at this point, so I couldn’t compute the test-retest reliability yet), then took the results, including the multiple regression and structural equation models I had confirmed and asked a randomly selected subset of the participants what they thought the results meant and asked how I might modify the structural equation models to better represent the nature of their experiences. As you know, many different structural equations can fit a single dataset, so asking the target population their opinion about which one reflected their experience, I gained insight that I would not have obtained by simply placing my way of thinking (based on theory and previous evidence) onto the quantitative dataset.

    My points are:

    1. It was cost-effective not to have to interview the entire group of participants; albeit, I could see an internet based assessment in which all of the participants typed in their responses or called into an automated system that could turn their responses into text via speech recognition software.

    2. The variables created via Natural Language Processing were possibly more objective than variables coded by humans according to themes I would otherwise have just made up in my head. Albeit, I also searched for themes that I expected to see based on theory and previous evidence, but these too were coded according to the “deep structure” (meaning-based aspects) of language that may be less susceptible to human bias based on self-fulfilling expectations. However, I could also see having human coders code the data in case there is something that they would find that would be different than the automatic analysis and it would be interesting to see the degree of agreement between human (at least two, usually three so that you can calculate coefficient alpha) and machine interpretations.

    3. The themes coded on the basis of the focus group responses were effectively manifest variables; thus, the power of being able to properly assign true score versus error variance was not contained in the dataset. It may be possible to construct latent variables by examining the bivariate relationships among the quantified qualitative responses, but that would be an ad hoc process that would undermine the validity of the latent variables so constructed.

    4. I developed a quantitative instrument that possessed enough information to represent each of the themes as latent variables and for which I could compute reliability analyses that are not possible with a single variable coding a single theme. So, I upped my ability to perform far more powerful exploratory analyses and conceptually-driven confirmatory analyses to model the dataset.

    5. I let the participants inform me about their interpretation of the results, something that gave me greater insight into their nature and allowed me to select the structural equation model that resonated most closely with the participants’ experience. In my wildest dreams, I could also compare the model they thought was the best with the ones various theories would support and find out which ones produced the best fit based on an analysis in which I would compute the difference in chi-squares.

    In conclusion, I think the distinction between quantitative and qualitative research is quite valid from both a definitional (ontological) and epistemological (our sources of knowledge) basis. The important point, however, is that these are complementary, not exclusionary methods of research.

    Very truly yours,
    Bill Lapp

  7. Dear Schaun,

    On the subject of replicability, I have found it to be quite efficacious for purposes of funding, publication and validity to use research designs that replicate previous research and then extend it to include new possibilities. This way I can assess whether or not my methods captured the phenomenon in question and then show why its occurrence is possibly attributable to or controlled by some other factor that I may have predicted on a theoretical basis. Yes, you can get money and publications on the basis of replicating what someone has done before (in fact, it’s kind of required), you just have to expand the study to include new possibilities – at least it has worked for me for thirty years of performing highly replicable research.

    Yours truly,
    Bill Lapp

  8. Bill,

    Thank you for your very thoughtful comments. I’ve read through them a couple times now and I’m not sure if I’ve misunderstood them, or if I wasn’t clear enough in my original post, since what you are arguing seems to be what I thought I was arguing in the first place!

    I do think the quantitative/qualitative distinction, if carefully defined to avoid the ambiguities I mentioned in the post, can be entirely valid. My argument is that that valid distinction is not a useful way of talking about research. If the distinction is used in an exclusionary way, then all it does is allow researchers to pretend that one or the other kind of approach is sufficient in and of itself. If the distinction is used in a complementary way, which is what I would generally advocate and what I think most researchers do whether they realize it or not, the distinction still isn’t very helpful in evaluating the research findings. In short, I tried to argue that the distinction is sometimes valid but very rarely useful, and that turning our attention to other issues could be a more fruitful way of discussing social research.

    That brings us to the issue of replicability. Remember, I carefully defined that word in the original post. In its most strict definition, it is impossible both to replicate what someone has done and to expand the study to include new possibilities. Including those new possibilities constitutes a change in the research design of the original study, and is therefore not a pure replication. It was replication in that pure sense that I said was neither very feasible nor likely to get funded.

    But the definition of replicability that I advocated was that of something being technically able to be replicated if someone desired to try. It means laying out all steps of the research so someone could at least try to do a pure replication. The purpose of that detailed record keeping is not to then expect that pure replication to take place. It’s to ensure that people have all the information they would ever need to critique one’s work. The extent to which a research project openly offers up the means for people to refute it is a better measure of that project’s findings than is the extent to which it used qualitative or quantitative (or both kinds of) methods.

  9. Hi Sean,

    Thanks for engaging me in dialogue because I love conversing about these matters and learning new things from other people’s perspective. I can give you a concrete example of how you can do a replication and extend the research to include new possibilities:

    Lapp, W. M., Collins, R. L., Zywiak, W. H., & Izzo, C. V. (1994). Psychopharmacological effects of alcohol on time perception: The extended balanced placebo design. Journal of Studies on Alcohol, 55, 96-112.

    In this study, I addressed a critique about the original balanced placebo design in which Ross and Pihl said it was limited to low doses of alcohol. The original design manipulates the actual dose of alcohol with the expectation of the subject that s/he is receiving alcohol, thereby teasing apart the pharmacological and psychological aspects of alcohol intoxication. Ross and Pihl created a “high dose” version of the balanced placebo design by telling/acting out the process of receiving a low versus a high dose of alcohol and then either actually giving them a low or high dose of alcohol. However, the high dose version leaves out the control condition of neither expecting nor receiving alcohol and the pure psychological effect of not getting alcohol but expecting to have received it and the pure pharmacological effects of expecting no alcohol but receiving some. What I did was to use a combination of both designs, while leaving out the impossible conditions: (a) you cannot tell someone they are getting a high dose of alcohol and then give them none, and (b) you cannot tell someone they received no alcohol and give them a high dose. That leaves some holes in the factorial design, but it is perfectly interpretable using the sequential sums of squares method (also called the “hierarchical method” or type I SS) for computing the ANOVA. Thus, I replicated both designs and extended them to include the merits of both designs. There was effectively no change in the conditions, just an extension of the two designs that made for a stronger, more inclusive research methodology.

  10. I like the mixed method approach which allows one to ‘pursue the scientific quarry into every fenced off field into which it may wander’. This also allows Sociological theory and hypotheses to fly free with every chance of being tested, proven or disproved.

  11. Bill,

    Sorry it’s taken me a while to get back to you. Two comments about your study:

    1. It is not a “pure” replication in the sense that I defined it because you did things in your study that had not been done in the previous studies. I don’t think that’s a bad thing at all – the only reason I mention it is to highlight that when I said pure replications tend to be in low demand, I was referring to that extremely restricted sense of doing exactly what someone had done before. It was just a very minor point I needed to make in order to move on to the more important point of why explicitly laid-out studies are more credible.

    2. It is a perfect example of what I was talking about when I said we needed to focus on replicability. I think now I should have said something clunky like “replicable-ability” instead of “replicability”, since the simpler word is loaded with all kinds of assumptions about actually reproducing results when I was trying to focus on the technical ability to try to reproduce results if one were to so choose. The only reason you were able to do your very interesting expansion of previous studies is because those studies laid out their methods and procedures in enough detail for you to be able to copy them exactly. Because you were technically able to copy them exactly, you were able to choose instead to systematically change your copy to explore possibilities that the previous studies hadn’t.

    Again, my point is that studies that are replicable-able deserve greater credence because they allow belief in research findings to be based on something other than personal trust in the individual researcher.

    Thanks for sharing your studies, by the way. It looks like an impressive body of work, and I always like to bring real-world examples into these discussions.

  12. Timothy,

    I really want to like the mixed-methods approach…but I just don’t. The idea of “mixed methods” assumes that qualitative and quantitative approaches are two actually different types of approaches that can be mixed. So the mixed-methods approach just advocates the combining of the two parts of a false dichotomy.

    I have heard some people talk about mixed methods in the more general sense of picking the right tools for the particular job at hand. I don’t see much use in calling that approach by a particular name – that’s kind of what we ought to be doing all the time. I’ve also heard people talk about mixed methods as if it were always wiser to get unsystematic insights as well as systematic observations rather than just get one or the other. That makes sense on the intuitive level, but I haven’t seen any evidence to give me reason to believe that it is actually true. Sometimes getting unsystematic findings just adds a time and resource burden to a study that could just as easily focus only on systematic observation, and sometimes a systematic study just wastes time by coding and categorizing things that we don’t yet understand well enough to study systematically.

    I guess I’m just trying to say that I find it more useful to ignore the qualitative/quantitative distinction, even in its mixed form.

  13. I agree with the arguments you make for your purpose. The two sides of this dichotomy are complementary; both are needed to achieve a deeper framework for understanding one reality. Replication is an academic consensus. Reality is dynamic, constantly changing. Maybe it lies in the dialogue between social researchers and our experiences as human beings. The market is no exception.

