As a digital infrastructure for the humanities in general and classics in particular takes shape, an unprecedented range of opportunities is emerging whereby students can contribute tangibly to their disciplines at an early stage of their education. New collections and services allow undergraduates to conduct meaningful research projects, the results of which can be disseminated immediately, linked to the primary sources to which they contribute and preserved in institutional repositories for generations as part of the core collections on which humanities research depends.
During the 2008-09 academic year, we particularly encourage students and classes to help produce, and to construct research projects based upon, treebanks for classical Greek and Latin now under development. A treebank is a large collection of syntactically parsed sentences, in which an annotator has specified the exact syntactic relationship for every word in a sentence (e.g., what the subject is, what the object is, where the prepositional phrase should be attached, which adjective modifies which noun, etc.) along with each word's morphological analysis (feminine singular nominative adjective) and the dictionary entry it's derived from (est as an inflection of edo rather than sum). The following diagram represents a syntactic annotation of ista meam norit gloria canitiem ("that glory will know my old age") from Propertius 1.8.
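In data terms, an annotation of this kind can be thought of as one record per word. The following is a minimal sketch in Python, not the treebank's actual file format: the `Token` class, its field names and the English relation labels ("subject", "object", "attribute", "predicate") are assumptions made for illustration, and the morphological descriptions are deliberately coarse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word. Field names are illustrative, not the treebank's schema."""
    id: int        # 1-based position of the word in the sentence
    form: str      # the word as it appears in the text
    lemma: str     # dictionary entry the form is derived from
    morph: str     # morphological analysis, written out in plain English
    head: int      # id of the word this token depends on; 0 marks the root
    relation: str  # syntactic relation to the head (illustrative labels)


# ista meam norit gloria canitiem (Propertius 1.8)
SENTENCE = [
    Token(1, "ista", "iste", "feminine singular nominative demonstrative", 4, "attribute"),
    Token(2, "meam", "meus", "feminine singular accusative adjective", 5, "attribute"),
    Token(3, "norit", "nosco", "third person singular verb", 0, "predicate"),
    Token(4, "gloria", "gloria", "feminine singular nominative noun", 3, "subject"),
    Token(5, "canitiem", "canities", "feminine singular accusative noun", 3, "object"),
]


def dependents(sentence, head_id):
    """Return the tokens that depend directly on the token with id `head_id`."""
    return [tok for tok in sentence if tok.head == head_id]


def describe(sentence):
    """Yield one readable line per token: form, relation and governing word."""
    forms = {tok.id: tok.form for tok in sentence}
    for tok in sentence:
        governor = forms.get(tok.head, "ROOT")
        yield f"{tok.form} --{tok.relation}--> {governor}"


if __name__ == "__main__":
    for line in describe(SENTENCE):
        print(line)
```

Calling `dependents(SENTENCE, 3)` returns the two words governed by norit: its subject gloria and its object canitiem. Each adjective hangs in turn from the noun it modifies, which is how the record captures "which adjective modifies which noun" even though the words are interleaved in the verse.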
Treebanks will be the most important new resources for the study of Greek and Latin that have appeared since the grand scholarly projects of 19th-century philology. On the one hand, treebanks make a whole new generation of reading support possible: we will be able not only to ask what individual words mean but also to view analyses of the syntactic structures of individual sentences. Treebanks, however, also allow us to place our current understanding of lexicography, linguistics and style on a wholly new, quantifiable, transparent basis and to ask new questions about Greek and Latin that were not feasible with print tools. Treebanks are the classical equivalent of the genome for the life sciences. We have released the first 50,000-word treebank for classical Latin and are beginning a major effort that is designed to produce a treebank with 1,000,000 words of classical Greek.
We urge classes and students of Greek and Latin to contribute to this larger effort in a variety of ways:
Create treebanks for particular works or sections of Greek and Latin and contribute these to the larger Greek and Latin Treebanks: Each sentence in the treebank lists who produced the syntactic analysis. These producers can be individuals or groups such as classes. In our work so far, two separate annotators have analyzed each sentence and an editor has gone through to resolve discrepancies to produce a single database with one best analysis per word. Instructors can use this model with classes, providing a final set of syntactic analyses that cites the students individually or the class as a whole. Individual contributors can submit their own syntactic analyses for review and publication. This exercise will provide a mechanism with which students can think about Greek and Latin in new ways while contributing to the sum of knowledge about the languages themselves. Many passages of Greek and Latin admit, of course, of multiple syntactic interpretations. We can accept data that includes multiple versions of the same sentence or alternate interpretations for sentences already in the treebank.
Publish variant readings and accompanying annotations: Some readings may have no impact upon the syntactic structure of a sentence (although substituting one verb or noun for another may have a huge impact on the meaning). Other readings will require substantial revision of the syntactic analysis. Adding syntactically interpreted analyses will allow us to evaluate the syntactic impact of different editorial choices between editions and thus provides an essential component for true digital editions.
Publish documented annotations with alternate interpretations: In this case, we do not simply list alternate interpretations but provide evidence that supports each interpretation. In practice, this involves comparing passages where the structure is uncertain with similar passages where the structure is less problematic. As our treebanks grow in size, we will be able to use them to place our interpretations on a basis that not only allows for quantification but is transparent: others can go beyond the numbers and check for themselves each passage on which the numbers are based. As the treebanks are still growing in size, however, we may find it easier to conduct the more focused studies listed next.
Conduct original research on Greek and Latin lexicography, linguistics or stylistic analysis and publish the results with hypertextual links directly connecting your conclusions with the passages on which they comment: Even a treebank with one million words covers only a small percentage of classical Greek or Latin. You can build on pre-existing treebanks and/or contribute new data to ask new questions that have never before been possible. The rising generation of students has an opportunity to develop a cumulative set of research projects that will shed increasing light upon Greek and Latin. Each research publication will be submitted to editorial review and, if accepted, stored permanently in the Perseus Digital Library and the emerging distributed Scaife Digital Library. This research can be as narrowly focused as "Does Cicero's use of the passive voice differ between his letters to Atticus and his Philippics?" or as broad as "the use of ποιέω in Greek." Other options (to give you some ideas) include the following:
Lexicographical: Traditional dictionaries like the Oxford Latin Dictionary and the LSJ provide plentiful citations to support their definitions of words, usually with the most frequent sense up front. Are their judgments reflected in actual usage? Does that usage differ between authors?
Linguistic: Classical Latin has been thought to use a word order in which the verb follows the direct object (SOV), but the word order of its daughter languages (like Italian and Spanish) puts the verb before the direct object (SVO). Did Classical Latin actually have this SOV word order, and if so, can we chart how it changed?
Stylistic: What kind of verbs is Sallust's Catiline the subject of (i.e., what kind of actions does he undertake)? How does this compare to the actions that Cicero attributes to him in his own text (In Catilinam)? Is one more sympathetic than the other?
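Questions like the word-order one above reduce to counting over the annotated relations. As a rough sketch, assuming each word is stored as a tuple of position, form, head position and relation label (a hypothetical layout chosen for this example, not the treebank's own format), one could tally how often a direct object precedes or follows the word that governs it. The second sentence below is an invented textbook-style example, not treebank data.

```python
from collections import Counter

# Each token is (position, form, head position, relation); head 0 marks the root.
# The relation labels are illustrative, not the treebank's own tagset.
PROPERTIUS = [
    (1, "ista", 4, "attribute"),
    (2, "meam", 5, "attribute"),
    (3, "norit", 0, "predicate"),
    (4, "gloria", 3, "subject"),
    (5, "canitiem", 3, "object"),
]

# Invented textbook-style sentence, included only to show the opposite order.
TOY = [
    (1, "puella", 3, "subject"),
    (2, "rosam", 3, "object"),
    (3, "amat", 0, "predicate"),
]


def object_verb_orders(sentences):
    """Count objects that precede their head (OV) or follow it (VO).

    Assumes the head of every token labeled "object" is its governing verb.
    """
    counts = Counter()
    for sentence in sentences:
        for position, _form, head, relation in sentence:
            if relation == "object" and head != 0:
                counts["OV" if position < head else "VO"] += 1
    return counts


ORDERS = object_verb_orders([PROPERTIUS, TOY])

if __name__ == "__main__":
    print(dict(ORDERS))
```

Here the Propertius line counts as VO (canitiem follows norit) and the invented sentence as OV. Run over a real treebank and grouped by author, date or genre, the same tally would give the kind of quantified and checkable evidence that the linguistic question calls for, since every count can be traced back to the sentences behind it.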
You do not have to create a complete treebank for an author or work to generate important results if you create a thoughtful subset. We also support forums whereby people working on the Treebank can identify the passages of interest to themselves, take responsibility for particular passages and discuss questions about how to annotate them. By coordinating efforts, we can combine the work of many classes to provide much broader coverage for individual texts, authors and genres.
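For classes that adopt the two-annotator model described above, finding the words on which two analyses disagree is a mechanical first step before any discussion or editorial decision. The sketch below assumes each analysis is a list of tuples holding form, head position and relation label. That layout, the function name `discrepancies` and the second annotator's mistaken attachment of meam are all invented for illustration.

```python
# Each analysis lists (form, head position, relation) per word; head 0 is the root.
# Both analyses are illustrative; ANNOTATOR_B contains a deliberately wrong attachment.
ANNOTATOR_A = [
    ("ista", 4, "attribute"),
    ("meam", 5, "attribute"),
    ("norit", 0, "predicate"),
    ("gloria", 3, "subject"),
    ("canitiem", 3, "object"),
]

ANNOTATOR_B = [
    ("ista", 4, "attribute"),
    ("meam", 4, "attribute"),  # attached to gloria instead of canitiem
    ("norit", 0, "predicate"),
    ("gloria", 3, "subject"),
    ("canitiem", 3, "object"),
]


def discrepancies(first, second):
    """Compare two analyses of one sentence and return the words that differ.

    Each result is (position, form, first analysis, second analysis), where an
    analysis is the (head, relation) pair assigned to that word.
    """
    if [tok[0] for tok in first] != [tok[0] for tok in second]:
        raise ValueError("the two analyses do not cover the same words")
    return [
        (position, a[0], a[1:], b[1:])
        for position, (a, b) in enumerate(zip(first, second), start=1)
        if a[1:] != b[1:]
    ]


if __name__ == "__main__":
    for position, form, a, b in discrepancies(ANNOTATOR_A, ANNOTATOR_B):
        print(f"word {position} ({form}): {a} vs {b}")
```

The function only reports disagreements; deciding which analysis is better, or whether both deserve to be kept as alternate interpretations, remains a matter for the editor or the class.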
Interested students and faculty should contact David Bamman or Gregory Crane at Perseus. Content submitted by February 1, 2009, will be included in the May 2009 release of the classical Greek and Latin treebanks.