Digital Humanities Centers as Cyberinfrastructure

John Unsworth

Coalition for Networked Information
Washington, DC
Monday, December 10, 2007

John Unsworth would like to apologize for being unable to be at CNI to deliver this talk. Some very unfortunate events involving a student at the Graduate School of Library and Information Science have superseded his travel to this meeting.

The ACLS report on Cyberinfrastructure for Humanities and Social Sciences (available online) was a response to what is now called the Atkins report, after Dan Atkins, who chaired the NSF-appointed blue-ribbon panel on Cyberinfrastructure that produced it: he also served as an advisor to the ACLS Commission. I want to thank Dan for his leadership on this topic, and especially for the ecumenical breadth of his thinking on the subject, not only in the original NSF report, but also in his more recent position as director of the NSF division of cyberinfrastructure: that kind of openness is extremely important as the humanities and the social sciences work out their relationship to science, engineering, computer science, and commercial interests in an emergent and rapidly changing environment.

The ACLS report noted, in its introduction, that the original NSF document described cyberinfrastructure as consisting of

The ACLS report went on to note that

Humanities scholars and social scientists will require similar facilities but, obviously, not exactly the same ones: "grids of computational centers" are needed in the humanities and social sciences, but they will have to be staffed with different kinds of subject-area experts; comprehensive and well-curated libraries of digital objects will certainly be needed, but the objects themselves will be different from those used in the sciences; software toolkits for projects involving data-mining and data-visualization could be shared across the sciences, humanities, and social sciences, but only up to the point where the nature of the data begins to shape the nature of the tools. Science and engineering have made great strides in using information technology to understand and shape the world around us. This report is focused on how these same technologies could help advance the study and interpretation of the vastly more messy and idiosyncratic realm of human experience.

To that end, the ACLS report had eight recommendations: I'd like to look at each of those recommendations with an eye to the critical contributions that digital humanities centers can make in these areas, in order to ensure that the goals outlined in that report are realized. What I'll be arguing here is that digital humanities centers are cyberinfrastructure for humanities and social sciences--not the only kind, but one of the most important kinds, especially given where those disciplines are and where they need to go.

Recommendation 1: Invest in cyberinfrastructure for the humanities and social sciences, as a matter of strategic priority.

Centers are the most efficient way for institutions of higher education to make this investment: the collection of expertise, equipment, software, etc. that is required to facilitate digital humanities and social sciences requires some economy of scale: it can't be supported at the department level, and though it might be supported at the level of college or school, these bureaucratic units are never co-extensive with the humanities or the social sciences, in any university. If you are going to make an institutional investment in cyberinfrastructure for humanities and social sciences, as a university, you are obviously better off making that investment once, and in a high-impact, high-profile way, than many more times, with less impact, at a higher cost, across more units. Aside from the economies-of-scale argument, there is an argument to be made about the benefits of interdisciplinarity: it is still, in most universities, a relatively rare thing for faculty in humanities and social sciences to have ready access to compelling opportunities for interdisciplinary collaboration within their own institution.

Recommendation 2: Develop public and institutional policies that foster openness and access.

Centers, working closely with the library, can take point within the institution on promoting the development of these policies, and in promulgating them to faculty. The library is obviously a key player, and institutional repositories are an opportunity for what is probably best imagined as broad-and-shallow education of faculty, and these efforts will inevitably focus on the intellectual property that faculty members themselves produce; centers offer an opportunity for narrower and deeper engagements with the rights policies that govern the primary materials on which scholarship is based. In this engagement, the faculty member is the intellectual property (IP) consumer, rather than the IP creator, though access to these primary materials will be a necessary precondition for creation of the faculty member's IP. The library's role and that of the Centers are complementary, and should be coordinated, not least to make sure that a consistent message is being communicated to faculty at both moments--when they are IP consumers, and when they are IP producers. If there's a university press in the neighborhood, they should also be engaged in the discussion.

There's also another way in which Centers can play a particularly useful and important role, with respect to faculty members who are trying to negotiate questions of rights for access to primary source materials. As many of these materials will come from cultural institutions like libraries and museums and archives, and over time these may well be the same libraries, museums, and archives even though the faculty projects will be different, a Center can establish relationships with these institutions that span many years and many projects, providing a basis of trust and prior acquaintance that will ease negotiations in particular cases.

Recommendation 3: Promote cooperation between the public and private sectors.

The university has a hard time, especially in the humanities, in producing effective representatives or partners for the private sector. Humanities and, to a lesser extent, social science departments, have little or no experience, and often little or no interest, in partnering with the private sector. It's actually worse than that: the humanities tend to hold the private sector in contempt, as the culprit in the corporatization of the university. But just as Centers can provide continuity, build trust, and establish a track record with cultural institutions, so that individual faculty members don't have to start that process from scratch, Centers can do the same with private-sector partners: they can identify appropriate collaborators for the humanities, inculcate appropriate expectations for research outcomes, and match those partners with faculty who have congruent interests. Looking at it from the other side, the Center can match a researcher's interests with an appropriate private-sector partner, if one exists, and can create appropriate expectations for the nature and the outcomes of that partnership, on the faculty member's side. Private-sector partners might be interested in promoting cultural heritage, publishing or licensing scholarship for specialist or generalist audiences, or access to users with advanced requirements for general-interest content. Centers can represent the interests of the researcher in collaborations--for example, in something like the Google Book project, libraries represent one set of resources and requirements, but these are not necessarily always those of the faculty researcher. Centers could do all of these things more effectively if they networked with one another.

Recommendation 4: Cultivate leadership in support of cyberinfrastructure from within the humanities and social sciences.

Leadership in cyberinfrastructure, for the humanities, will no doubt emerge from large projects and from national centers, as it has done in other disciplines. And indeed, we already can see such leadership emerging in the centers that exist today, and also in the membership of the ACLS Commission, all of whom are people who have spent years of their academic lives developing, using, and promoting cyberinfrastructure for the humanities and social sciences. During the work of the Commission, Commission members heard a good deal about the need to change the reward system in the humanities--particularly tenure and promotion--to cultivate digital scholarship. At the same time, though, the Commission recognized that its own members had been rewarded by their disciplines and their home institutions for doing such work, so the situation is not a simple one: some individuals have been tenured for digital scholarship, and the chair of the Commission was so tenured, more than ten years ago, at what was then considered a conservative department and university. But if we want to encourage larger numbers of junior faculty to experiment with new methods of doing scholarship in the humanities, we need to lower the risk in such activities, and senior faculty, department chairs, and deans are the ones who can make that happen.

Recommendation 5: Encourage digital scholarship.

In 2001, two years before the publication of the Atkins report, Fran Berman (the Director of the San Diego Supercomputer Center) wrote this:

"We hear a lot about the impact on science and engineering of cyberinfrastructure hardware resources (computers, storage, instruments, networks) or software tools and interfaces. Less heard, perhaps, is a discussion of the element most critical to the success of the cyberinfrastructure--its human infrastructure. The cyberinfrastructure's human infrastructure is a synergistic collaboration of hundreds of researchers, programmers, software developers, tool builders, and others who understand the difficulties of developing applications and software for a complex, distributed, and dynamic environment. These people are able to work together to develop the software infrastructure, tools, and applications of the cyberinfrastructure. They provide the critical human network required to prototype, integrate, harden, and nurture ideas from concept to maturity."

Fran Berman, "The Human Side of the Cyberinfrastructure," Envision 17.2 (April-June 2001).

Human infrastructure is key to cyberinfrastructure in the humanities, as well, though we don't yet have (and may never have) "hundreds of researchers, programmers, software developers, tool builders, and others" helping "to prototype, integrate, harden, and nurture ideas from concept to maturity." We do have some such people, though, and they often work in centers. Some also work in libraries and in campus computing organizations, but in both of those cases we find human infrastructure that is less exclusively focused on bringing to fruition the concepts of faculty researchers in the disciplines of the humanities. That exclusive focus is important: there is considerable danger of "mission creep" in under-resourced academic settings, where computers are involved. Since the computer is a general purpose modeling machine, it can do lots of different things, and from the perspective of the person needing support, it isn't really that important whether the activity in question is research, or teaching, or publishing, or something else. But from the point of view of developing the kind of in-depth, long-term engagement with computational methods that actually produces new knowledge acquired by new means, that is a critical difference. Only research represents a long-term commitment on the part of the faculty member, and only that long-term commitment can justify the extremely taxing effort of what Daniel Pitti used to call "ontology and obstetrics"--that is, eliciting from the researcher his or her tacit knowledge of a subject, working with him or her to express that knowledge in an explicit and computable form, trying it on the data for size, and iterating--usually many, many times, before an acceptable computational model, or tool, or resource has been developed.
The long-term engagement of professional staff in this process is key, as well: it takes time to learn to understand the research paradigms, the vocabulary, the motivations, and the intellectual practices of scholars in the humanities--and without understanding these things, it is highly unlikely that a programmer, or tool-builder, or others in the human infrastructure can succeed in making cyberinfrastructure useful at any very high level in the humanities.

Recommendation 6: Establish national centers to support scholarship that contributes to and exploits cyberinfrastructure.

In that same 2001 article, Berman goes on to note that

"The personal networks, knowledge, and relationships of the human infrastructure take a long time to build and are critical to the usability of the resources. In particular, the advances we now enjoy in science and engineering are the fruit of the many years of cooperation in the national effort to unite computational and computer sciences."

Although it is likely that most centers have figured out some way to work with faculty members at universities other than the one that houses them, it is usually an ad hoc and/or an unfunded arrangement, and it is difficult to get real traction on those terms. It's also difficult, on those terms, to be strategic about what projects you support, or about building a national network in which faculty could be directed to centers with appropriate expertise, and so on.

The ad hoc and project-based funding that has, by and large, characterized the work done in digital humanities to date raises some real (and, in other domains, familiar) problems for building cyberinfrastructure. Earlier this year, Paul Edwards, Steven Jackson, Geoffrey Bowker, and Cory Knobel published a very interesting white paper, coming out of some meetings at the University of Michigan, titled "Understanding Infrastructure: Dynamics, Tensions, and Design." In this white paper, the authors write that

Social and historical analyses reveal some base-level tensions that complicate the work of infrastructural development. These include:

Although the white paper is primarily interested in cyberinfrastructure for computational science, which is still what most people are thinking of when they talk about cyberinfrastructure, the tensions articulated here are the same problems that we face. The authors go on to say:

Such complications challenge simple notions of infrastructure building as a planned, orderly, and mechanical act. They also suggest that boundaries between technical and social solutions are mobile, in both directions: the path between the technological and the social is not static and there is no one correct mapping. Robust cyberinfrastructure will develop only when social, organizational, and cultural issues are resolved in tandem with the creation of technology-based services. Sustained and proactive attention to these concerns will be critical to long-term success.

This passage suggests why it might be useful to talk not only about centers, but also about a national network or coalition of centers. Some such social structure is probably required if "social, organizational, and cultural issues are [going to be] resolved in tandem with the creation of technology-based services."

Recommendation 7: Develop and maintain open standards and robust tools.

No one wants to fund standards development--or if they do fund it, it is for a particular project, not with recurring operating funds. Maybe that's OK--after all, the argument can be made that if a standards organization doesn't have enough community support to survive on volunteer labor, it's not necessarily a good thing to keep it alive on external funding. On the other hand, some of the most profoundly important standards bodies operate on significant funding, with participation from government, private sector, and research communities. A middle road might be for funders to strongly encourage individual projects to write into their budgets membership fees for standards organizations, and funds to travel to and participate in meetings of those organizations.

Developing and maintaining robust tools is a bit more of a challenge--at least, we have examples of humanities open standards that have survived for a long time, but we can point to very few software tools that, at least in their robust form, emerge from academic software development. That's OK, because the role of academic software development is to provide workable proof-of-concept tools that serve their intended audience and purpose--albeit perhaps not robustly, but illustratively, at least. Designing and building the application is also a challenge, of course, and it also carries with it some research questions, but in both design and development, what distinguishes the research enterprise from its commercial equivalent is that we can imagine failures that are still useful outcomes, in the sense of being informative.

Bowker et al. talk about this too:

How can we learn more about "growing" infrastructures by studying current cyberinfrastructure projects, in an iterative and informative cycle potentially beneficial to those projects and future ones? [. . . .] Anecdotal evidence from many of the workshop participants suggests that standard forms of project reporting, given the incentives of both funder and grantee, will tend to over-report experiences of success and under-report those of difficulty or failure. Efforts to accommodate and encourage the honest reporting of failure could go a long way to supporting long-term and comparative learning across the varieties of cyberinfrastructural experience. As science itself has proceeded through the disciplined and even-handed study of failure, funders and proponents of cyberinfrastructure must learn to stop hiding the bodies.

I do think that existing science cyberinfrastructure, in the sense of tools and environments that support collaboration in large, interdisciplinary research projects, has been oversold, by quite a bit. But what's wrong with that is not the fact that it doesn't work all that well yet--the problem is that when we speak and write about it, and especially when that speaking and writing has funding in view, we pretend that it does work, that it's great, that it's whiz-bang. Happily, we are not far enough along, in developing humanities cyberinfrastructure, to have much to oversell. But let us agree to try to do this one thing better than the sciences have done, and make our difficulties, the shortcomings of our tools, the challenges we haven't yet overcome, something that we actually talk about, analyze, and explicitly learn from.

Recommendation 8: Create extensive and reusable digital collections.

We have left the hardest for last. This is an area where centers can help, to some extent, by being a source of best practices that can be brought to bear on the individual project from the beginning, but even centers won't necessarily provide enough pressure, or have enough experience, to really produce this result. This is an area where centers need to be a point of contact with libraries--the library on the same campus as the center, if there's appropriate interest and expertise there, but libraries elsewhere, if not. If, as Deanna Marcum says, preservation begins at creation, then libraries, who will eventually be faced with collecting the products of digital humanities research, need to be involved as early as possible, in the creation of those products. There is a reciprocal benefit, as well, when library collections are taken out of their domestic context and subjected to expectations and uses that go beyond the ones envisioned by their creators. Texts that are prepared with the notion that they will always be used in the same way, for browsing and searching, in the same environment for which they were originally prepared, have a tendency to leave certain kinds of information implicit--it's implicit elsewhere in the system, and not explicit anywhere in the text itself. Once you start to aggregate these resources and combine them in a new context and for a new purpose, you find out, in practical terms, what it means to say that their creators really only envisioned them being processed in their original context--for example, the texts don't carry within themselves a public URL, or any form of public identifier that would allow me to return a user to the public version of that text. They often don't have a proper DOCTYPE declaration that would identify the DTD or schema according to which they are marked up, and if they do, it usually doesn't point to a publicly accessible version of that DTD or schema. Things like entity references may be unresolvable, given only the text and not the system in which it is usually processed. The list goes on: in short, it's as though the data has suddenly found itself in Union Station in its pajamas: it is not properly dressed for its new environment. So, there's some benefit to the library, and to the long-term survivability and usefulness of their collections, or publishers' collections, to have them used in new ways, in research.
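
To make the "pajamas" problem concrete, here is a minimal sketch, in Python, of the kind of check an aggregator might run on a marked-up text before taking it out of its home system. It is an illustration under stated assumptions, not a tool used by any project mentioned here: the function name `portability_report`, the report keys, and the sample document are all hypothetical, and the regular expressions are a rough approximation of XML syntax rather than a real parser. It only asks whether the DOCTYPE names the DTD by an http(s) URL; it does not fetch that URL, and it does not look for a public identifier for the text itself, since where that would live depends on the markup scheme.

```python
import re

# The five entities an XML parser resolves without any DTD.
PREDEFINED_ENTITIES = {"amp", "lt", "gt", "apos", "quot"}

# Rough patterns (assumptions of this sketch, not a full XML grammar).
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+[^\[>]*(\[.*?\])?\s*>", re.DOTALL)
SYSTEM_ID_RE = re.compile(
    r"""(?:SYSTEM|PUBLIC\s+(?:"[^"]*"|'[^']*'))\s+(?:"([^"]*)"|'([^']*)')"""
)
ENTITY_DECL_RE = re.compile(r"<!ENTITY\s+([\w.-]+)\s")
ENTITY_REF_RE = re.compile(r"&([A-Za-z_][\w.-]*);")


def portability_report(text: str) -> dict:
    """Report which context a marked-up text fails to carry with it."""
    doctype = DOCTYPE_RE.search(text)
    doctype_text = doctype.group(0) if doctype else ""

    # Look for the DTD's system identifier only in the part of the
    # DOCTYPE that precedes any internal subset.
    header = doctype_text.split("[", 1)[0]
    match = SYSTEM_ID_RE.search(header)
    system_id = (match.group(1) or match.group(2)) if match else None

    # Entities declared in the internal subset travel with the text.
    declared = set(ENTITY_DECL_RE.findall(doctype_text))

    # Named entity references used after the DOCTYPE (numeric character
    # references such as &#233; are skipped: they need no declaration).
    body = text[doctype.end():] if doctype else text
    referenced = set(ENTITY_REF_RE.findall(body))

    return {
        "has_doctype": doctype is not None,
        "dtd_has_public_url": bool(system_id)
        and system_id.startswith(("http://", "https://")),
        "unresolvable_entities": sorted(
            referenced - declared - PREDEFINED_ENTITIES
        ),
    }


# A made-up sample: the DTD is named only by a local file name, and one
# entity (eacute) is defined somewhere in the home system, not in the text.
sample = '''<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd" [
<!ENTITY mdash "--">
]>
<TEI.2><text><p>Union Station&mdash;pajamas &amp; caf&eacute;</p></text></TEI.2>'''

print(portability_report(sample))
```

A report like this does not fix anything, but it turns "implicit elsewhere in the system" into a list that a library and a center can work through together before the text is aggregated.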

Closing this presentation and, I hope, opening a useful discussion, here are a few other benefits of digital humanities centers:

  1. Centers can function as institutions that mentor humanities faculty and graduate students in the fine art of collaboration.
  2. Centers can collect and sustain staff expertise that no individual project could afford.
  3. Centers can inculcate, in humanities faculty, an awareness of external funding opportunities and an understanding of how to pursue those opportunities, and a sense of why it's worth doing so.
  4. Centers can help faculty produce better grant proposals.
  5. Centers can provide funders with some long-term stability for individual research projects, and they can help to assure that the work funded in a particular project won't be orphaned, institutionally.
  6. Centers can provide graduate students with opportunities to work as part of a collective intellectual enterprise, which is quite unusual for them--and that work can give them valuable experience when they apply for faculty jobs, or open other career opportunities for them.
  7. Centers can involve humanities faculty in research projects that are collaborative, rely on staff support and computing infrastructure, and bring in external funding: all of these things make humanities faculty more difficult to relocate from one university to another, so the Center is an effective instrument of retention.
  8. Centers can be a point of connection between humanities faculty and LIS programs, which would be very fruitful. About half of LIS faculty come from other disciplines, and humanities computing is very much about information organization, ontologies, taxonomies, schemas, preservation, interface design, and other issues that are studied and taught in LIS programs. The LIS connection also would help to activate the NEH/IMLS connection, as well as the NSF cyberinfrastructure connection.

Thank you very much for your time, and my apologies for having to deliver this paper by proxy, but thanks very much to the proxy, and I look forward to hearing about the discussion.