When we began work on Perseus in 1985, support from the Annenberg/CPB Project allowed us to create a critical mass of information (textual, archaeological, and artistic) about the ancient Greek world. As the Greek collections in Perseus matured, we were able not only to include Roman civilization but also to explore other areas of the humanities, such as the history of science and early modern English, for which our collections and infrastructure proved useful.
This broader research agenda led in 1998 to a major grant from the Digital Library Initiative Phase 2, funded primarily by the National Endowment for the Humanities (NEH) and the National Science Foundation (NSF), to study the problems of creating a digital library for the humanities as a whole. With this foundational support, we produced collections on topics such as the History and Topography of London, the American Civil War, and early modern English; services such as historical named entity identification; a digital library environment that anticipated services offered by giants such as Yahoo and Google, and whose full functionality still exceeds that of any system with which we are familiar; and a stream of publications on the methods involved.
We had already begun building what the NSF would, in the early years of this century, call cyberinfrastructure: an aggregate of collections and services, automatically linked and analyzed, that begins to show emergent properties qualitatively distinct from those of the print world. Having identified a number of practices that distinguish digital from print infrastructure, we began to define the services and collections that this new digital infrastructure would require. We then turned to the problem of extracting the sophisticated knowledge needed for these new services from very large collections of hundreds of thousands, or even millions, of books available only as scanned page images. Planning grants from the Mellon Foundation allowed us to run a series of seminars and to conduct preliminary research on the general problem of "What do you do with a million books?" and on the more specific topic of "Classics in the Million Book Library."
Our research on a range of topics outside classics allowed us to see how the problems of classical studies related to those of other disciplines. In exploring the challenges and opportunities of very large collections, we looked at collections from the early modern period (which present the greatest challenges for automatic processing), from the 19th century (which provide a best case, with many documents in easily analyzed print and a wealth of detailed information about people, places, organizations, and other topics in a modern format), and from classical studies (with the complex layouts of its critical editions, lexica, and other reference works, and its need to manage materials not only in Greek and Latin but also in English, French, German, and Italian, and, if we wanted to cover the full classical tradition, in Arabic as well). Since classical editions were among the first printed books, and classical scholarship has not only continued ever since but particularly flourished in the 19th century, we realized that classical studies raised a superset of the challenges we had set out to study. When we further considered that classical studies covered not only literature and history but also ancient science and medicine, art, and archaeology, we concluded that classics presented a superset of the problems faced by many other fields within the humanities. Having worked on Shakespeare and early modern English, 19th-century newspapers and the American Civil War, and the city of London and the history of early modern science, we decided that classical studies provided the best space within which to advance our work on a cyberinfrastructure for the humanities in general.