Small-Data Research in a Big-Data World – An Ethnographer’s Perspective on Bridging the Divide
As an organization scientist, I am interested in how knowledge workers create, use, and maintain their social networks to get their jobs done. I was trained as an ethnographer, and I observe people at work for a living. What, one might ask, does this small-data girl have to say about Big Data?
A lot, actually. I’m even willing to go as far as proposing that the inclusion of small data in Big Data research is a necessary (albeit not sufficient) condition for truly making sense of the latter. Let me clarify this proposition using the example of social networks: Our increasing ability to analyze massive data sets offers unprecedented insights into human social behavior, giving rise to a new discipline, computational social science, which employs sophisticated computational methods (e.g., complexity modeling, social simulation, and machine learning) to detect patterns in human interaction at a large scale. These patterns are essential to understanding the structure and dynamics of social networks, including phenomena such as information diffusion, social movements, and even the contagion of emotions. Furthermore, these methods have sharply decreased the impact of human error in data collection and analysis. When you have several billion data points, an individual data point carries very little weight. This has positive implications for a study’s validity and replicability, and thus makes for good science.
But there are huge caveats to this trend in social science research. The social sciences have long felt that they play second fiddle to the hard sciences in terms of soundness, and recent failures to replicate published social science studies have only worsened that perception.
Big Data has become the shiny new toy for social scientists, a redemption from the ridicule of “real” scientists. Yet, an almost religious respect for numbers and fancy computational methods can steer peers, and the public, away from scrutinizing the social science that underlies Big Data research. This is a core argument of data scientist Cathy O’Neil’s aptly titled book Weapons of Math Destruction, which I highly recommend.
In the field of social networks, Big Data studies tell us very little about how and why individuals connect to one another. I’ll draw from my research on scientific collaborations to illustrate my point. Big Data network studies are really good at identifying the structures of social networks (all the connections between scientists who have ever coauthored a paper or patent) and can tell us more and more about network dynamics. For example, the average size of scientific teams (as measured by the number of coauthors on a paper or patent) has increased; and successful scientists (identified by the number of citations of their publications) tend to attract less-successful scientists, a phenomenon called preferential attachment. One could therefore say that Big Data renders the Big Picture (which is of tremendous value!). Yet, as I have previously argued, networks are something that people do, rather than have. Our relational practices (anything we do that involves someone else) create and modify the networks we’re embedded in. Individual practices vary substantially, and although information gathered from the data traces we leave can yield important insights (see this cool project at MIT), it cannot yet fully explain what people do to relate (or not relate) to one another. This is an ethnographer’s work, requiring a focus on practices rather than structures, as well as an inductive approach (e.g., “Let’s observe what’s happening here,” rather than “Let’s test whether what we hypothesized is true”). Our research on scientific collaborations combined social network analysis with ethnographic case studies and found that relationships between two coauthors on a paper can vary significantly in terms of their evolution and relational practice; and these different practices affect how science is done.
Here are two extreme examples: The Principal Investigators (PIs) of two labs we observed had a long common history, having been postdocs in the same lab at the beginning of their careers. They considered each other friends and would talk shop over beers whenever they met at conferences. They finally coauthored a paper as a result of these discussions, almost two decades after their names had last appeared on the same publication. Looking only at the network data, these two PIs seemed to have a weak tie (measured in frequency of coauthorship); yet we knew from our fieldwork that they had had a profound influence on each other over the years. At the other extreme, we found instances where coauthors didn’t even know each other, despite being listed together on dozens of papers. Here the opposite was true: Although the network data indicated a strong tie between those coauthors, there was an absence of true cocreation of scientific knowledge.
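For readers who want to see how blunt this measure is, here is a minimal sketch of tie strength computed as coauthorship frequency. It is an illustration only, not the analysis pipeline from our study: the function name, the author labels, and the paper counts are all hypothetical, chosen to mirror the two extremes described above.

```python
from collections import Counter
from itertools import combinations


def coauthorship_tie_strength(papers):
    """Count how many papers each unordered pair of authors shares."""
    ties = Counter()
    for authors in papers:
        # set() drops duplicate names; sorting gives each pair one stable key.
        for pair in combinations(sorted(set(authors)), 2):
            ties[pair] += 1
    return ties


# Hypothetical toy records: each inner list is the author list of one paper.
papers = [
    ["PI_A", "PI_B"],  # joint paper from their shared postdoc years
    ["PI_A", "PI_B"],  # joint paper almost two decades later
]
# Two authors listed together on two dozen papers who have never met.
papers += [["Author_X", "Author_Y", "Author_Z"] for _ in range(24)]

ties = coauthorship_tie_strength(papers)
print(ties[("PI_A", "PI_B")])          # 2  -> reads as a weak tie
print(ties[("Author_X", "Author_Y")])  # 24 -> reads as a strong tie
```

The counts are correct as counts. What they cannot register is the two decades of conversations over beers behind the first pair, or the absence of any real exchange behind the second; that information only comes from fieldwork.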
In sum, it not only makes sense to combine small data with Big Data; it is a necessity in our quest to understand complex systems. However, there are several reasons why we don’t all happily zoom in and out of levels of analysis and switch between theoretical lenses. First of all, academic training is highly specialized, to the point that most of us are able to do very few things (but these we do extremely well). So we’re trying to solve complex problems, but each of us only understands a tiny part of them and we lack a common language to link the pieces together. Furthermore, most academic institutions insist on organization by traditional disciplines, enforced by disciplinary tenure and promotion structures that discourage researchers, especially those early in their careers, from venturing outside their disciplinary silos. Finally, even if we do manage to bridge the divides and to collaborate across disciplines, theories, and methods, it is extremely difficult to get these studies through traditional peer review, as few editors and reviewers have the intellectual generosity to consider papers that aren’t a precise fit for their expertise. No reason to despair, however. As we know from Big Data research, institutional barriers will not be able to withstand the onslaught of open data. Each one of us must learn to leverage this development by embracing, rather than fearing, a variety of different approaches. I don’t mean to imply that we all need to acquire in-depth knowledge in multiple disciplines, theories, and methods; but in order to bridge the gulf between small and Big Data, we must at least strive to become conversant with both sides of the great divide.