Who should drive the data?
We probably agree that relevant, effectively usable “data” are one key ingredient to approaching the grand challenges of the 21st century. Their central role is demonstrated daily in areas ranging from economics to climate science, from the digital humanities to malaria research. Liveable cities can be built only if we learn from data over longer time frames and, increasingly, these data are collected by citizens. Tackling climate change fundamentally relies on scientists’ ability to analyze reliable time-series data from diverse sources. For example, the highly acclaimed work on income inequality by economists Piketty and Saez has curated data at its heart. Our ability to analyze and assess the actions of politicians, media, organizations, and governments relies on the capacity to establish facts from digital evidence, as available in open data. Of course, today, the majority of these “data” are created, measured, captured, stored, processed, combined, duplicated, preserved, and used in digital form by software-intensive systems whose boundaries have become increasingly blurred.
Surprisingly, once we start unpacking the term “data,” few agree on exactly what the word means. Is it the bits used in computing or evidence used to represent phenomena of interest? Is that evidence unproblematic or constructed? Is there such a thing as “raw data”? While we often take technology for granted, one advantage of interdisciplinary access to digital data is the opportunity – even the necessity – to unpack its assumptions. As a faculty member in an information school, my favorite classes to teach are those in which future archivists, librarians, information systems analysts, and museum curators sit in one room to figure out how to reconcile the technical and the social in their systems design projects. It’s difficult, but rewarding.
The ability to manage “data” for current and future use is critical for a sustainable society. Digital curation – active involvement in managing and preserving digital resources for future use – is by definition oriented towards an uncertain future, whether that be tomorrow or in a hundred years. The longer-term perspective that comes with this future orientation has led me – trained in such fields as software engineering, computer science, and business informatics – to start a conversation within the software engineering discipline about sustainability, the "capacity to endure."
Sustainability has been emerging as a transformative challenge in many disciplines and professions, including computer science. To consider sustainability in designing the systems mentioned above requires simultaneous consideration of at least five dimensions: environmental resources, social well-being, individual well-being, economic prosperity, and long-term technical viability. Each of these is complex in itself, and to consider their interactions requires a cross-disciplinary approach to research and systems design. That approach must emphasize appreciation of the complexity of “wicked problems” – which are not just “difficult, hard-to-compute problems” but are fundamentally social in nature. Thus, it must abandon reductionist methods of systems analysis that focus on puzzles and pieces and, instead, must favor attempts to develop an understanding of socio-technical systems. These shifts do not come easily to a world focused on rapid “problem solving” through technology. Frankly, we have barely begun to ask the right questions.
Asking these questions takes time. In a recent Bridges blog article, Wolfgang Gatterbauer suggests that, while they are being discussed, technologists will drive the data innovation home and render the discussants obsolete. In that process, technology developers too often forget about the politics of their artefacts and suggest that ethics are merely a question of configuration parameters. But as Katja Schechtner highlighted in her blog piece “Culture Eats Technology for Breakfast,” technology is not neutral, and with its development comes responsibility. Ann Cavoukian, former Information & Privacy Commissioner of Ontario, also reminds us that fundamental values such as privacy must be considered as a central guiding principle of systems design.
The social sciences and humanities have plenty of insights to share about such questions as the role of human values in the social construction of technology. It’s hard to bring these perspectives to bear on fast-paced technology developments, but it is worth trying. As Maria Binz Scharf argues in a Bridges blog article, small data and big data should not be mutually exclusive. There are many wonderful examples of interdisciplinary research in human-centered data science, excellent studies of the immense variety of data-intensive research, and inspiring collaborations between researchers in data-intensive computing and history. I was fortunate in my own interdisciplinary research, having worked with cultural heritage organizations such as national libraries and archives across Europe (including the UK) on developing data-intensive approaches to preserving large-scale cultural heritage and electronic records. One challenge here is that the capacity of digital materials to endure is less dependent on storing the bits – they can always be copied to new media – than on retaining the ability to compute and make sense of the dynamic experiences that we care about. We would like to trust in the ability to read this blog and follow its hyperlinks, but don’t expect them to last!
There is much to gain from collaborations that take all these issues seriously. Data are now pervading disciplines that would never have dreamt of such a thing before. Too often, divisions between diverse disciplinary cultures cause polarized groups to press ahead on their own. And too often, the results are missed opportunities and collateral damage – be they privacy breaches, unexpected economic effects, reinforced cultural and social divides, or counterintuitive environmental impacts.
Interdisciplinarity is hard. It’s hard to get funded, it’s hard to do, it’s hard to get published, and it’s hard to receive appropriate credit across the disciplines. But I am convinced that only through research that takes discipline-crossing questions to heart and engages in a genuine debate about them can we make meaningful long-term contributions to societies on our planet.
Big Data presents enormous opportunities to work across disciplines. As we prepare for a promising debate this weekend, I look forward to discussing the opportunities that lie between the disciplines, and how we can make the best of them.