Digital Methods: First Steps

Amsterdam Workshop Report

In January 2009 the Digital Methods Initiative commenced its four-year program of seminars (two per year) and international progress conferences (one per year), the latter named after the once-annual event in Science & Technology Dynamics at the University of Amsterdam. The Digital Methods Initiative (http://www.digitalmethods.net) is an ongoing collaboration among new media researchers at the University of Amsterdam, seeking to rework method for the Web. Among the key concepts from digital methods that appear in the following are digital groundedness, the technicity of content and post-demographics. The workshop report concludes with the research opportunities discussed, including a Google theory of power.

Present at the inaugural Digital Methods research seminar were the Initiative’s director Richard Rogers, coordinator-researchers Sabine Niederer and Esther Weltevrede, researchers Erik Borra, Andrea Fiore, Marijn de Vries Hoogerwerff, Michael Stevenson and Laura van der Vlies, and invited participant Lanu Kim, an exchange student who had arrived from South Korea days before. Seven papers, distributed to participants beforehand, were introduced and discussed in four sessions.

Google as Inculpable Engine

In the first session, both Rogers and de Vries Hoogerwerff presented work on critical approaches to Google. In “The Googlization Question, and the Inculpable Engine,” Rogers explores what other effects might characterize ‘Googlization,’ generally understood as media concentration, and outlines a research agenda in terms of ‘back-end’ and ‘front-end’ Googlization, drawing from his book, Information Politics on the Web (MIT Press, 2004). Where back-end effects may refer to attempts (by Yahoo, Microsoft and others) to emulate Google’s search engine algorithm, front-end effects include the reproduction of the engine’s minimalist aesthetic. Subsequent discussion attempted to further identify a Google-specific form of power, and revolved primarily around the search engine company’s unique status as a “blame-avoidance” machine, first expressed in Google’s own parlance, “Don’t Be Evil.”

If this is to be an accurate description of Google’s strength, one must account for at least one anomaly. Marijn de Vries Hoogerwerff’s paper, “Cybercosmopolitanism: The Other Californian Ideology,” deals with the backlash that followed Google’s decision to censor search returns in China. He draws on Google’s decision and publicly declared rationale, as well as reactions from prominent bloggers and Google critics, to reflect on what Richard Barbrook and Andy Cameron previously termed “the Californian ideology,” and to speculate on its transformation through cultural, commercial and legal concerns. Perhaps most significantly, de Vries Hoogerwerff concludes by recasting the Google China issue as a missed opportunity for proponents and critics of Google to question what he calls a belief in the democracy of the algorithm.

“Wikipedia researchers forgot the bots”

The following session shifted focus to two novel concepts central to Digital Methods research. The first is networked content, which refers broadly to content that is ‘held together’ or otherwise maintained by virtue of its networked form. In “Wikipedia and the Vigilance of the Crowd,” Sabine Niederer details the role of bots and software-assisted users in content creation and upkeep on Wikipedia. Her findings include an inventory of the various kinds of bots and software used, the authority bots hold (having fewer permissions than administrators, but more than registered users) and the relative bot-dependency of different language Wikipedias. Additionally, Niederer’s work sheds new light on previous Wikipedia research. The various analysts and commentators who have entered false information in Wikipedia articles in attempts to confirm or debunk narratives about the site’s reliability, she argues, were perhaps not testing ‘reliability’ in the conventional sense, but rather the technicity of Wikipedia content – the ability of bots, RSS feeds, alerts and the like to point contributors to erroneous edits.

A second concept central to the Digital Methods project is digital groundedness, which inverts epistemological approaches that assess the Web’s knowledge against claims ‘on the ground,’ and asks instead what claims about reality may be made on the basis of digital measures. In her National Webs project, Esther Weltevrede has developed a periodization of the Web from the perspective of topology: authored first by notions of cyberspace and virtuality, the Web is increasingly organized regionally and nationally by technological arrangements (e.g. IP-to-Geo location technology), as demonstrated by such services as the Google and Yahoo search engines. As the Web is made more local, new avenues for inquiry are opened. Specifically, Weltevrede is developing methods and tools for demarcating and characterizing national Webs, sketching for example the prominence of non-governmental organizations on the Palestinian Web.

Web Machines as Interpolators as opposed to Extrapolators

The two afternoon sessions treated objects of study conventionally grouped under the heading Web 2.0. Erik Borra contributed a review of ‘post-demographics,’ a term used in past Digital Methods research to refer to the preferences, interests and connections entered into databases, distinguishable from variables traditionally accounted for by demographers (e.g. age, race, income). What story do one’s registered ‘features,’ from friends to favorite books, tell, and how is this information put to use? Crucial to business models that rely on product recommendation (canonically, Amazon’s system based on past purchases), post-demographics also provide sociologists with large data sets for research on social networking. Borra suggests that beyond commercial profiling and the focus on social networking behavior, key research in this area should focus on the derived or ‘hidden’ attribute. In other words, it should be research that does not extrapolate from existing data, but rather interpolates new or previously concealed properties from it. An example is the project vriendjespolitiek.net, also by Borra and colleagues, which profiles political parties in the Netherlands based on data from the popular social networking site Hyves.
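
The notion of a derived attribute can be illustrated with a minimal sketch. It assumes hypothetical profile records (a party affiliation plus a list of declared interests) with made-up values, and it does not reflect the actual data or method of vriendjespolitiek.net. It only shows how a group-level property that no individual user entered might be computed from fields that users did enter.

```python
from collections import Counter, defaultdict

# Hypothetical, made-up profile records for illustration only.
profiles = [
    {"party": "A", "interests": ["cycling", "jazz"]},
    {"party": "A", "interests": ["cycling", "film"]},
    {"party": "B", "interests": ["football", "film"]},
    {"party": "B", "interests": ["football", "jazz"]},
    {"party": "B", "interests": ["football"]},
]


def interests_by_group(records, group_key="party"):
    """Count how many profiles in each group declare each interest."""
    counts = defaultdict(Counter)
    for record in records:
        # set() ensures a profile counts at most once per interest.
        counts[record[group_key]].update(set(record["interests"]))
    return counts


def top_interest(records, group_key="party"):
    """Return the most common interest per group: a derived attribute
    that is present in no single profile."""
    return {
        group: counter.most_common(1)[0][0]
        for group, counter in interests_by_group(records, group_key).items()
    }


derived = top_interest(profiles)
```

Here the result is a profile of each group rather than of any one user, which is the sense in which the property is interpolated from the existing data.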

Theorizing Open Source and Blogging

With “The Bazaar and the Cloud,” Andrea Fiore presented research into the relationship between free and open source software (FLOSS) and the ‘software as service’ paradigm that relies on cloud computing. Following Tiziana Terranova, Fiore argues against a dichotomous view of open source struggling against the proprietary forces of such software providers as Google, and instead seeks to highlight their interdependence. After reviewing back-end symbiosis – the FLOSS-enabled infrastructure of cloud computing – Fiore turns to what he calls a front-end strategy of “controlled openness.” The strategy is epitomized by the application programming interface (API): “By releasing open source libraries that wrap its own APIs into the control language of specific programming languages or operating systems, a service like Google mobilizes crowds of programmers in the work of building new front-ends” for its services. Reflecting back on the question of Googlization, discussion turned to the possibility (and desirability) of open source developers maintaining a position outside the mix of proprietary software and open source that ‘software as service’ appears to require.

In the day’s final presentation, Michael Stevenson discussed recent work on the rise of blogging in the 1990s. Stevenson sees common ground in theories of blogging from Geert Lovink and Henry Jenkins, and builds on these to theorize the medium’s early development as the fabrication of an alternative media practice, one that deliberately sought to move beyond the subculture-mainstream divide. Drawing on initial attempts by users and commentators to characterize the medium, he argues that blogging signaled a turn away from the subcultural roots of the Web (including the virtual communities that blogging succeeded) and was simultaneously conflated with shifting definitions of the Web’s purpose.

Google’s Power Theory and Other Research Thoughts

The concluding plenary session discussed new research opportunities in a kind of top ten list provided by Rogers, together with the participants. (Only seven are covered here.) First, Google was described as an inculpable engine because of the increasing personalization of its results. One has oneself to blame, in part, for the results. Though not necessarily Google-specific, the ability to cast aside blame for its output is suggestive of a form of power Google has assumed. Relatedly, the idea of its democratic algorithm, with sites voting for other sites by linking, and the machine ultimately outputting what the search engine industry calls ‘organic results,’ also fits into thoughts of a blame-avoidance machine. (There was an invitation to think through how Lewis Mumford’s model of democratic versus authoritarian technics would be affected, as well as those built upon it.) The second point, again with respect to Google, is the larger issue of how to do user studies. Normally, users are observed, interviewed, or surveyed. In standard registrational approaches their eyes may be tracked with an apparatus. However, Google is performing its very own form of user studies by treating the user as a data set, with preferences, a history and a location. It is also providing ‘user feedback,’ that is, results tailored in a variety of ways. Should method follow object, how might Google’s example be followed to innovate in approaches to the user?

Third, the Web, especially the projects that order it, has seen a gradual decline in editors, with the algorithmic engine (in Yahoo and Google) winning out over the special directory projects (Yahoo’s own and Google’s engine on top of Dmoz.org, the Open Directory Project). Something similar holds for Wikipedia’s collective authorship, which has won out over the editorial authorship of old, however much Wikipedia struggles with and sometimes seeks to straddle the two models. It was pointed out, fourthly, that there is an urgency to study the relationship between Google and Wikipedia, as the ties between them are manifold. One of the more interesting ones is the collective editorship process of putting an article up for deletion because the subject matter returns few or no results in Google. Thus Google becomes the authorizing entity for Wikipedia. In the study of Wikipedia to date, much attention has been paid to the vigilance of the Wikipedian community, and especially the community’s capacity to spot errors and vandals. What is missing from Wikipedia studies – the fifth point – is a symmetrical attention to the software bots (as opposed to the heroic humans). The question arose as to the status of previous Wikipedia studies that have left out the bots.

In the discussion of Internet censorship and Google in China, it was noted that researchers worldwide are generally aware of how and when China censors, thanks to the work of the University of Toronto and Harvard University. However, what is remarkable (sixth point) is that Google, in complying with Chinese censorship practice, appears to make its own blacklists! Google fetches pages through machines in China to check whether they are blocked, and subsequently updates its own search engine outputs, removing those pages blocked in China. In a perverse sense such a practice would make Google into one of the more thorough Internet censorship research organizations.
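
The procedure described in this point (probe pages from inside a region, record those that are blocked, then filter results accordingly) can be sketched as follows. This is an illustration of the logic as reported in the discussion, not a description of Google’s actual implementation. The function names, the example URLs and the stand-in probe are all hypothetical.

```python
def build_blacklist(urls, is_reachable_from_region):
    """Return the set of URLs that the probe reports as unreachable."""
    return {url for url in urls if not is_reachable_from_region(url)}


def filter_results(results, blacklist):
    """Drop blacklisted URLs from a ranked result list, preserving order."""
    return [url for url in results if url not in blacklist]


# Stand-in probe backed by made-up data. A real probe would attempt to
# fetch each page from a machine located inside the region in question.
blocked_in_region = {"http://example.org/b"}


def probe(url):
    return url not in blocked_in_region


results = [
    "http://example.org/a",
    "http://example.org/b",
    "http://example.org/c",
]
blacklist = build_blacklist(results, probe)
filtered = filter_results(results, blacklist)
```

The blacklist here is a by-product of measurement rather than a list handed down in advance, which is why the practice resembles censorship research.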

The remaining points concern the politics of code, and how they affect research. One discussion (seventh and last point) revolved around the APIs, the data feeds provided by engines for the purposes of research and mash-ups. Owing to the small data sets they furnish (a limited number of queries per day), it is difficult to gather a sufficiently large body of data. Rather than rely on APIs fully – Google pulled one of its own, leaving researchers and others without data – there is the other data-collection practice called ‘scraping,’ or sometimes ‘screen-scraping.’ But too much scraping prompts the engines and other services to block the data collection. The larger issue concerns researchers’ abilities to analyze engines, if the APIs provide too little and scraping is punished.
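
The practical effect of a per-day query limit can be made concrete with a small sketch. The limit of 1,000 queries per day and the query list below are hypothetical figures chosen for illustration; they are not taken from any particular engine’s terms.

```python
import math


def plan_batches(queries, daily_limit):
    """Split a list of queries into consecutive per-day batches that
    each respect the daily limit."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return [
        queries[i:i + daily_limit]
        for i in range(0, len(queries), daily_limit)
    ]


def days_needed(n_queries, daily_limit):
    """Number of days required to issue n_queries under the limit."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return math.ceil(n_queries / daily_limit)


# Hypothetical research design: 2,500 queries, 1,000 allowed per day.
queries = [f"query {n}" for n in range(2500)]
batches = plan_batches(queries, daily_limit=1000)
```

Under these assumed figures the collection stretches over three days, during which the engine’s results may themselves change, so a quota affects not only the volume of data but also its comparability.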

Michael Stevenson is pursuing a PhD in Media Studies, University of Amsterdam. Richard Rogers holds the chair in New Media & Digital Culture, University of Amsterdam.