Research and the tools we use to do it

Posted on February 4, 2012 by Paul Meinshausen

There’s probably no consensus definition of research. So I’ll take the entry from Wikipedia as a fairly safe summary: “Research can be defined as the search for knowledge, or as any systematic investigation, with an open mind, to establish novel facts, solve new or existing problems, prove new ideas, or develop new theories.” Search, knowledge, investigation, mind, facts, problems, ideas, theories; these are the concepts that anchor our notion of research.

I have a growing tendency to sidestep these and instead think: invention, construction, instrument, equipment, technology, and tools. It’s not that I think the conventional concepts aren’t important. I just think it’s usually more efficient to look at the instruments and technology used in research before taking the time to consider its ideas, theories, hypotheses, facts, or evidence. And that’s especially true if the research involves any kind of human behavior or social phenomena.

The human world is an extremely messy place. Science is about dividing, analyzing and reassembling. There are far more ways to divide, analyze and reassemble the human world than are possible to evaluate and compare. Clearly some ways are better than others, so how do we decide which? Science is also about evidence: connecting hypotheses forward and backward to observable data. So perhaps we could evaluate science and research based on its data and evidence. Yeah, we could and should. But there’s even more data out there than there are theories and ideas, and comparing huge surveys and other datasets quickly becomes only marginally more valuable than comparing the ideas and theories behind them.

So here’s what I tend to do. Instead of asking what a researcher is hypothesizing or what data they’re presenting, I look at HOW they acquired and developed their data. If they conducted a survey and ran a regression on it, I tend to not look much further. I only have a limited amount of time. How can I trust that this particular survey and regression are any more informative than the hundreds of thousands of other surveys and regression-based studies that have been done and that I haven’t had time to look at? Do I just trust the apparent coincidence that brought me into contact with that particular study as opposed to all the others out there?

The barrier to entry that I use to determine whether research is worth considering further is whether the researcher developed a new tool for observing the behavior she’s researching, or whether she used an older tool in an innovative way. It’s difficult to invent or develop research technology and tools. Researchers in social science disciplines aren’t usually taught that we have to do that kind of thing. It’s common to think we’re supposed to spend our time in the world of ideas. Nonetheless I say our real job is to import goods from that world into the world of hard practical realities. In his last post Schaun pointed out that good research needs hard walls of reality to approach and overcome. I think how we approach those walls is just as important as whether they are there. Especially for behavioral research, it’s really, really hard to approach such walls without really innovative tools.

Here are some examples.

Iain Couzin is a computational biologist who studies adaptive collective phenomena in animal groups – the complex and coordinated collective behaviors that result from social interactions among individuals. He presented several great streams of research in a talk he gave this past fall, most of which were characterized by innovative uses of technology. For example, he helped develop video technology to track the head direction (as a proxy for gaze direction) of individuals in large crowds. Then he used that technology to observe the perception and movement patterns that emerge in response to interventions like an individual engaging in surreptitious behavior (in this case, surreptitiously filming something in the crowd). So for instance, how do crowds respond when they start to realize that something isn’t quite right?

There are cell-phone studies that track movement, reveal routinized behavior over time, and expose social network information. Mobile phones and other kinds of information and communication technologies are being used more and more to observe large-scale social behavior. Innovative researchers have used internet traffic to investigate the spread and contagion of ideas and information. Others have examined the evolution of language using around 5 million books stored by Google. Using a robot capable of simulating facial expressions and filmed interaction, researchers have been able to explore how people use physical cues and facial expressions to evaluate the trustworthiness of others. Check out the MIT Media Lab and the range of research projects they’re involved in, almost all of which are characterized by innovative uses of technology as a way of observing and recording behavior.

What differentiates the examples above from the majority of the research that’s out there is that each brings a relatively new tool to the table, or uses an older tool in a relatively new way. The questions, hypotheses, conclusions, and even the actual data they use are all conditional on the equipment they used to get their data. The people behind the research are as much inventors and engineers as they are researchers.

I arrived at this position somewhat intuitively, mostly as something I built up while trying to deal with the massive amount of information one could try to assess and use. But it’s not really a new position to take. Just today a friend shared this article with me. It was written in 1994. The author argues that “what is fundamentally lacking in the social sciences is a genealogy of research technology, whose manipulation reliably produces new phenomena and a rapidly moving research front.” In the text he puts it this clever way: “What guides them is on a nonintellectual level, a sense of what kinds of physical manipulations have resulted in interesting phenomena in the past, and what sorts of modifications might be tried that will produce yet further phenomena…Research scientists lead a double life: as intellectuals in the game of argument for theoretical positions and as possessors of a genealogy of machines.”

And if you think about it, this isn’t just about the last decade of computer- and ICT-driven development. B.F. Skinner is one of the most famous psychologists of the 20th century, and he’s known primarily for the “Skinner box” – a tool for measuring behavior that wasn’t even developed to measure human behavior. There’s also the Implicit Association Test, a tool developed in the late 1990s to make implicit associations and attitudes observable. And it’s not just about the tools for observing behavior. Invention and innovation also occur within the tools used to analyze the resultant data. The R Project for Statistical Computing has been built by an entire research community devoted not just to using the same methods over and over, but instead to constantly developing new methods that in turn make it worth trying to observe behavior in yet newer ways.

I’m not denigrating theory or the importance of the questions and hypotheses posed by research. Nor am I calling for a mindless or uncritical use of technology in research. But here are some questions I think are worth considering:

If the tools we’re using to observe behavior have been around for decades, if not centuries, and if the data we’re using is just a new sample of the same kind of data that’s been around for decades, what are the chances that the difference between what we’re doing and what has been done over and over in the past is the cleverness or originality of our questions or hypotheses or conclusions? What gives us confidence that what we’re doing is worth doing, and that it’s not just our unconscious biases and inherent self-preference that’s making us satisfied with our own research? If the only way to answer the research questions we pose is to ask people to tell us what they think about something (for example), then what makes us think we’ve thought of a question that hasn’t been asked over and over again? Or if it hasn’t been asked, what gives us confidence that it’s worth asking?

It’s a question that I’ve had to ask myself about my own research over and over again. A year or two ago I wouldn’t have known how to build anything, and the only way I knew how to use a tool like a keyboard was to type in English or Turkish. I just don’t think that’s good enough anymore. So now I’m spending my evenings in computer science and physical computing courses. Those are the tools I’ve decided to learn how to use, so that eventually I can build my own. I think of it like this: Good research needs hard walls. When it meets them it should inform how to go through, over, around, or under them. All of that involves tools. We should help design, build, and test those tools; not just sit back and hope they get built one day by someone else. And when we’re looking to be informed by others’ research, it makes sense to spend most of our time on the research that has demonstrated originality and value as much in how it has observed behavior, and made behavior observable, as it has in the cleverness or originality of its theories and hypotheses.

10 thoughts on “Research and the tools we use to do it”

Schaun Wheeler says:

February 4, 2012 at 4:42 pm

In general, I think I’m sympathetic to Collins’s arguments about genealogies of research technologies creating rapid scientific discoveries, but unless someone has systematically measured the degree to which those arguments match reality, I’m left feeling that his claims are more truth-y than necessarily true. They feel like they ought to be true, but that’s not really good enough reason to believe them.

But if we assume that Collins is right, I think you might be giving too much weight in your post to the “research technologies” part of the “genealogies of research technologies” phrase and not enough weight to the “genealogies” part. The technologies themselves are less important than the fact that technologies are something that can be calibrated over time. For example, I think Collins uses the telescope as an example of a research technology that drove scientific discovery. I know that you are (or used to be) a fan of some of Feyerabend’s writing on science, so I know you’re aware of his writing on just how truly awful, from a measurement perspective, the early telescopes were. It seems that what made the telescope a useful research technology was that it could be calibrated. People could use the tool, see what didn’t work, then modify the tool to avoid some of those problems, and then look again for things that didn’t work.

So if the availability of calibrate-able research tools makes scientific discovery more probable, then surveys and regression analyses and other tools whose usefulness you rightfully (in my opinion) question are just as likely candidates for facilitators-of-science as anything else. For example, using Google’s numbers has a load of problems: their search numbers seem to be largely tied to the ups and downs of internet traffic associated with the cycle of the academic year at high schools and universities. That’s a problem, but it’s a totally surmountable one. Once we identify problems with the tool, we can modify it to adjust for seasonality or in some other way strain out the measurement error. Researchers can do the same thing with surveys and statistical analyses and pretty much anything else.
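
To make the seasonality adjustment concrete, here is a minimal sketch. It assumes monthly data and a purely additive yearly cycle, which real search data may not follow. The function name seasonally_adjust and all of the numbers are invented for illustration; none of this is Google's data or method.

```python
from collections import defaultdict
from statistics import mean


def seasonally_adjust(values, period=12):
    """Remove an additive repeating cycle from a series.

    For each position in the cycle (e.g. each calendar month), estimate
    how far that position sits from the overall mean on average, then
    subtract that offset from every observation at that position.
    """
    overall = mean(values)
    by_slot = defaultdict(list)
    for i, v in enumerate(values):
        by_slot[i % period].append(v)
    seasonal = {slot: mean(vs) - overall for slot, vs in by_slot.items()}
    return [v - seasonal[i % period] for i, v in enumerate(values)]


# Hypothetical index: constant underlying interest of 100 plus an
# academic-year swing (a dip over the summer months). Made-up numbers.
cycle = [10, 10, 5, 5, 0, -20, -25, -20, 10, 10, 10, 5]
raw = [100 + cycle[i % 12] for i in range(36)]  # three years of months
adjusted = seasonally_adjust(raw)
```

In this toy case the adjusted series is flat, because the only variation in the raw numbers was the school-year cycle. With real data, whatever remains after the adjustment is the part worth examining, and checking how well the adjustment worked is itself a form of calibration.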

It seems to me that the problem isn’t that the traditional tools of social science aren’t calibrate-able, but that very few people seem interested in calibrating them. For example, I’ve been delving into the issue of measurement scales for surveys. There is a truly astounding number of scales for psychological measurement. (Check out the International Personality Item Pool). It’s pretty rare, though, to find scales that have been calibrated beyond their initial use. For example, I easily found two studies (here and here) that showed that long-standing, expert-constructed professional scales fared just about the same as scales that were constructed by psychology grad students and even non-specialists. That’s a problem, but it’s not necessarily a problem of the tool itself, but rather a problem of people repeatedly using the tool without checking to see if the tool really did what it was supposed to do.

It’s possible to do some of that checking statistically, and possible to do even more of that checking through replication of studies (for example, see here). But in the end it’s really easy to define what, say, a personality scale is supposed to do as “measure where a person falls along the particular scale of choice.” That’s a cop-out answer to the question of what a tool is supposed to do. The purpose of a telescope isn’t to have something to look through. It’s to see things that are far away. No one measures something like personality just because he wants to know what a person’s personality is. He measures because he wants to know what he can expect that person to do. A tool that is administered to predict behavior and then rarely checked to see how well it actually predicts is a tool that is never going to get calibrated.

You asked the question: “If the only way to answer the research questions we pose is to ask people to tell us what they think about something (for example), then what makes us think we’ve thought of a question that hasn’t been asked over and over again?” We don’t need to ask questions that no one’s asked before (although it would be really nice if we could). I think it’s well worth asking already-asked questions, but asking in ways that get better answers. That requires calibration, and that actually requires us to ask and re-ask. New tools are great. I always want new tools. But I think I might actually prefer new calibration of old tools.

This ended up being longer than I thought it would be, and I still have other ideas. I’ll try to put some of those ideas into a post of their own and get it up within the next few days.
Paul Meinshausen says:

February 4, 2012 at 5:58 pm

I agree, Collins can’t be taken as evidence.

You’re right, neither Kepler, Galileo nor the people they were trying to convince were using the telescope appropriately. Telescopes were great at measuring distance on earth, from point A to point B, but most people didn’t realize that distance on earth is quite different than distance between earth and space, so the telescopes weren’t calibrated appropriately. I think that actually kind of supports my point, which was (broken down): 1) plenty of astronomers at the time were entirely content to continue on making observations of the sky using mostly the naked eye, so it took some initiative and innovation to begin using telescopes; 2) astronomers couldn’t begin to calibrate the telescopes until they had begun to use them, because they would never have realized they were flawed until after they observed those flaws through use; 3) In order to both use and calibrate the telescopes, astronomers had to learn how to deal with the physical construction of telescopes and the science of optics and how vision occurs (which probably provoked some reactions of “I care about stars, not about metal tubes and eyes”). I just mean to say that calibration presumes the invention and use of a tool in the first place.

That being said, I take your point about an over-concern with new technologies when that excludes the repeated and improved calibration of old tools. I think it’s absolutely valuable to re-use and calibrate old tools. But if we continue to use this historical analogy (and the one that Collins used) then I’d say that the current position of human and social behavioral research is more akin to the period before Galileo than to the period after him – in other words, most research involves very few tools, and that’s why it’s not sufficient to depend on calibration right now.

You use surveys and psychometric scales as an example of old tools and say we should spend more time calibrating them. I agree it would be more valuable to calibrate old surveys/scales than develop new ones. I don’t really see them as tools though. With surveys, how is behavior observed? It’s observed through asking people questions. The difference between surveys is between the particular questions asked and the ways in which they’re combined, tested, and analyzed. But ultimately they’re just questions. I don’t see that asking people questions is a tool. Again, I’m not saying that asking people questions is never useful (I’m conducting a study right now that does it), I just don’t really see it as a tool.

This connects me to your last point: that the questions/items in surveys and scales are used not just to ask questions, but to predict behavior. I don’t just tend to dislike questions because they don’t require tools, I tend to dislike them because I don’t think they predict behavior very well. I think answers (and answering) are more behaviors in themselves than indicators or predictors of other behavior. So my alternative? Say you want to get at consumer behavior (buying one brand over another) – the more we can develop tools that allow us to actually observe consumption as it occurs, the better we’re going to get at predicting it. Observing that kind of behavior at a large enough scale to be useful is practically impossible with naked-eye observation. So we need to use technologies that help us do it.

I’m looking forward to your post!
Schaun Wheeler says:

February 5, 2012 at 12:27 am

I don’t see how asking people questions isn’t a tool. It’s just another way of measuring behavior. I agree that it’s silly to assume that surveys are getting inside people’s heads, and it’s probably even more silly to assume that whatever answers they give exert some sort of causal effect upon their other behaviors. Maybe I shouldn’t have used personality tests as an example. Surveys can often indirectly measure aspects of the environment that we just aren’t very good at observing directly. People’s interactions with their environments produce outcomes that to varying degrees are either satisfying or frustrating. Satisfaction and frustration (in the sense of achieving or not achieving a particular outcome, not in the sense of feeling a particular emotion) are difficult to observe, especially in settings where we’re trying to play catch-up – trying to understand a situation as it unfolds. Surveys are admittedly very imprecise tools for capturing outcome information, but they are tools nonetheless, and at least in the hypothetical situation I just outlined, they could be valid ones.

I agree that direct observation is almost always better than indirect observation, but it seems to me that direct observation is often not possible (or so impractical or cost-prohibitive as to be as good as impossible), although I’ll grant that it’s probably more often possible than researchers may assume.

I don’t know why I’m stuck on this point since I personally really dislike the use of surveys, but whatever my biases against surveys as a research technology, it seems to me that we have to conclude that they are, in fact, a research technology – they allow us to do things that we wouldn’t be able to do (as easily or efficiently) without them.
Paul Meinshausen says:

February 5, 2012 at 7:35 pm

Surveys are a tool in the sense that language and mathematics are tools. That’s not how I’m using the term.

Say you watch 100 men purchase 10 items each at a store. Was the resultant data collected using a tool? I wouldn’t say so (unless you want to count the pen and paper).

Take the same scenario but this time you’re too far away to record the items. So you use binoculars. In this version of the scenario you are using a tool – the binoculars.

I’m talking about some material extension of our native abilities, and I’m talking specifically about tools that help us observe, not tools that help us analyze. So in the first scenario above, if you subsequently took the data and used a computer to analyze it, then you’d be using a tool for analysis, but not for observation.

I agree direct observation is often not possible. But I think that’s the case because we haven’t developed technologies that make it possible. Such technologies are beginning to be developed. (Speaking of which, this guy’s efforts are funny: http://feltron.com/ar09_05.html – the tools necessary for him to do this were computers, the internet, and his website url). I really think the potential is there, it’s just that most researchers that are interested in people aren’t yet very familiar with screen-based or physical design, engineering, or computing.

Astronomers could and did study the stars before they had telescopes. I’m just saying they were able to do it a lot better once they invented and started working with telescopes.
Schaun Wheeler says:

February 6, 2012 at 8:30 pm

I think now we’re getting into a discussion of what “tool” means, which is interesting, but probably not all that useful. I’d define tool as something along the lines of a material or collection of materials that enhances a person’s capability to perform an action beyond what is offered by that person’s innate capabilities. So a survey would be a tool because it allows a researcher to collect information in a way and on a scale not possible from just talking or watching. Watching people buy stuff wouldn’t be a tool because the only measurement instruments being used are the ones the researcher was born with.

But I wholeheartedly agree with you that we ought to be looking for a greater number and variety of ways to measure behavior. I think we’ve kept our toolkit pretty small for a pretty long time, and there’s certainly no need to keep it small now that there are so many opportunities to expand it. I think that means social scientists are going to need to learn more math, more computing skills, more of pretty much everything that researchers in the hard sciences are already required to learn in the course of their training. The bar for entry into social science discourse has been embarrassingly low for a very long time.
Paul Meinshausen says:

February 6, 2012 at 9:21 pm

Well put.
Ron Swanson says:

February 8, 2012 at 12:34 pm

“The bar for entry into social science discourse has been embarrassingly low for a very long time.” I’ve been feeling this for the last couple years. But I will say that from a purely educational perspective, there is a fairly high conceptual bar to clear among the different social science disciplines, relative to the natural sciences. “Fluffiness”–highly complex/contingent behavior–begets theory-heavy disciplines. (Oh, how social scientists long for some laws of social behavior!) Combining conceptual mastery in such disciplines with training in methodological innovation takes a long time–but streamlining this process in our PhD programs is essential.

A different angle on your discussion about surveys v. observations, etc.: What about attempting to predict behaviors where, to borrow Schaun’s language, future environmental constraints are significantly different from those in the past? I’m not saying that interviewees, themselves, understand how they would behave in hypothetical scenarios, but how else might we approach an understanding of behavioral phase shifts, threshold effects, etc.? Can new technologies improve our ability to predict behavior in novel environments?
Paul Meinshausen says:

February 8, 2012 at 5:00 pm

“Fluffiness”–highly complex/contingent behavior–begets theory-heavy disciplines. (Oh, how social scientists long for some laws of social behavior!)

This is a good point. There’s always a question about whether you approach a problem in a more deductive (lay out your concepts first) or an inductive (sort out your concepts based on your data) way. A while back I would have fallen heavily on the deductive side. I still think conceptual clarity and validity is hugely important. But I’m starting to think that part of the problem with the inductive approaches of the past was that they were really poor inductive approaches, because they didn’t have a very good sample of observations/data from which to work. It’s generally horrible to try and go out and just ask people a bunch of questions and then see what comes out. However, that might be a really bad example of induction. The alternative (a good example of induction) would be something that really gets a broad and large set of observations of actual behavior (something you can’t really hope to achieve with a survey). The reason I think that would be useful is that it might reveal that we’re dealing with a whole lot less variation and chaos than we would think based on our ground-level perceptions and observations.

This is kind of the problem I’m dealing with now with research on corruption. I’m hearing a lot of people talk about how we need to develop “typologies” and categories of corruption. But they all seem to be based on little more than the creative imagination of the researchers – who rarely can express or identify what concrete behaviors they’re talking about. The result is that there’s almost no constraint on the concepts and categories they can create. But what if that’s because we really just don’t have a good high-level view of the world, because we haven’t been able to get the kind of high-level data necessary for such a view?

This research (and here) is an example of what I think could be an alternative approach. She’s using robust collective outcomes to help identify the rules affecting the individuals’ behavior, and vice-versa. Yes, robots are much simpler than humans. On the other hand, if we can find large-scale social patterns of the behavior we really care about, then potentially we can set aside the immense variation at the individual level (whether a person wears a red or blue sweater, or is liberal or conservative) as more noise than signal. As individuals that variation matters a whole lot (“I don’t like red”), so I think it’s hard to set that variation aside until we’ve been repeatedly shown the large-scale patterns. In a sense we’re mired in details until we get a glimpse of the bigger picture. Which brings me back to the post – we can’t get a glimpse of the bigger picture until we have a bigger camera.

Anyway, this is a longer response than I intended. But it’s something I’ve been thinking about. I’ll probably write a post on it sometime soon. Thanks for bringing it up!
Schaun Wheeler says:

February 8, 2012 at 8:53 pm

Ron,

I agree that social scientists are usually trying to understand much more ambiguous concepts. On one hand, I sympathize. On the other hand, I can’t shake the feeling that our theoretical fuzziness is a self-inflicted wound. It’s one thing to have ambiguous concepts because we are dealing with a subject matter that is inherently fuzzy. It’s another thing to have ambiguous concepts because so few of us have become good enough philosophers to really lay out the logic of our theories in a rigorous way. Now, I’ll admit that in many ways hard-science researchers have it easier, in that more obviously physical subject matter is easier to study in ways that make your conclusions run up against walls of reality. But social scientists have long had similarly physical subject matter – behavior itself. It’s hard to read through the social science literature – especially the theoretical stuff – and not come away a bit overwhelmed by just how psychologized the whole endeavor has become. I can understand why so many researchers looked for intrinsic causes of behavior, and I can understand why, until the cognitive and later neuroscience revolutions in the study of human thinking, those intrinsic causes ended up mostly being a mishmash of assumption with some anecdote sprinkled in, but I think we lost sight of behavior somewhere along the way. A lot of us stopped trying to explain what people do and shifted to trying to explain what people think. Unfortunately, a lot of social research seems to still hold to those folk/pop psychology explanations.

I used to be pretty obsessed with the theoretical literature. I’ve read a fair amount of it (though certainly only a small fraction of the total amount…maybe I’ll write a post on that subject in the near future). I haven’t managed to find any theory that justifies belief in the existence of the various pieces into which the theory cuts up the world, and I’ve found even fewer attempts to justify belief in (or even mention) mechanisms by which the supposed pieces would interact. We either ought to limit our theoretical assumptions to just positing the existence of behaviors and a few basic environmental features, or we ought to do the work to really lay out a philosophically sound theory. I used to think the latter was the better way to go. I now lean towards the former option (although I still think the full-theory approach isn’t a bad idea).
Pingback: Surveys, Assumptions, and the Need for Data Collection Alternatives « House of Stones


House of Stones

"Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a collection of facts is not necessarily science." – Henri Poincare
