Announcements

Contribute to the Greek and Latin Treebanks at Perseus!

We are currently looking for advanced students of Greek and Latin to contribute syntactic analyses (via a web-based system) to our existing treebanks. We particularly encourage students at various levels to design research projects around this new tool. We are looking in particular for the following:

To contribute, please contact David Bamman (david.bamman@tufts.edu) or Gregory Crane (gregory.crane@tufts.edu).

The Ancient Greek Dependency Treebank

The Ancient Greek Dependency Treebank is a 192,204-word collection of syntactically parsed Greek sentences. Currently in version 1.1, the treebank comprises Hesiod's Works and Days, Homer's Iliad and Odyssey, and the works of Aeschylus, in the following distribution:

Author     Work                  Word count
Hesiod     Works and Days             6,303
Homer      Iliad                     38,390
           Odyssey                   99,353
Aeschylus  Agamemnon                  9,796
           Eumenides                  6,376
           Libation Bearers           6,563
           Persians                   6,223
           Prometheus Bound           7,045
           Seven Against Thebes       6,206
           Suppliants                 5,949
Total                               192,204

The Latin Dependency Treebank

The Latin Dependency Treebank is a 53,143-word collection of syntactically parsed Latin sentences. Currently in version 1.5, the treebank comprises excerpts from eight authors, in the following distribution:

Author      Work                                  Word count
Caesar      B.G. (Book 2 selections)                   1,488
Cicero      In Catilinam 1.1-2.11                      6,229
Jerome      Vulgate: Apocalypse                        8,382
Ovid        Metamorphoses: Book I                      4,789
Petronius   Satyricon 26-78 (Cena Trimalchionis)      12,474
Propertius  Elegies: Book I                            4,857
Sallust     Catilina                                  12,311
Vergil      Aeneid (Book 6 selections)                 2,613
Total                                                 53,143

Format

The texts in this collection are syntactically annotated under the formalism of a Dependency Grammar (cf. Mel'cuk 1988), in which individual words themselves are linked to their heads. This leads to structures that are familiar to traditional Classical grammars (in which an adjective, for example, "modifies" or "depends on" its head noun), as in the following annotation of ista meam norit gloria canitiem ("that glory will know my old age") from Propertius 1.8.

To represent this structure in a flat text document, we encode it using XML, as illustrated below. Each word carries an id, and its head attribute gives the id of the word it depends on. Here ista, the first word in the sentence (id="1"), modifies the fourth word (id="4"), gloria, via the syntactic relation ATR (attributive). See the README document and syntactic guidelines below for more information on the explicit format and the syntax involved.

	<sentence id="2598662" document_id="Perseus:text:1999.02.0066" subdoc="book=1:poem=8b" span="ista0:canitiem0">
		<word id="1" form="ista" lemma="iste1" postag="p-s---fn-" head="4" relation="ATR" />
		<word id="2" form="meam" lemma="meus1" postag="a-s---fa-" head="5" relation="ATR" />
		<word id="3" form="norit" lemma="nosco1" postag="v3srsa---" head="0" relation="PRED" />
		<word id="4" form="gloria" lemma="gloria1" postag="n-s---fn-" head="3" relation="SBJ" />
		<word id="5" form="canitiem" lemma="canities1" postag="n-s---fa-" head="3" relation="OBJ" />
	</sentence>
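
As a rough illustration of how this encoding can be read back into head-dependent pairs, the following Python sketch parses the sentence above with the standard library. The function name dependencies and the "ROOT" label are hypothetical choices made for this example, and treating head="0" as the sentence root is an assumption based on this sample; the README remains the authoritative description of the format.

```python
import xml.etree.ElementTree as ET

# The sample sentence from the excerpt above (Propertius, poem 8b).
SENTENCE_XML = """
<sentence id="2598662" document_id="Perseus:text:1999.02.0066" subdoc="book=1:poem=8b" span="ista0:canitiem0">
  <word id="1" form="ista" lemma="iste1" postag="p-s---fn-" head="4" relation="ATR" />
  <word id="2" form="meam" lemma="meus1" postag="a-s---fa-" head="5" relation="ATR" />
  <word id="3" form="norit" lemma="nosco1" postag="v3srsa---" head="0" relation="PRED" />
  <word id="4" form="gloria" lemma="gloria1" postag="n-s---fn-" head="3" relation="SBJ" />
  <word id="5" form="canitiem" lemma="canities1" postag="n-s---fa-" head="3" relation="OBJ" />
</sentence>
"""


def dependencies(sentence_xml):
    """Return (form, relation, head_form) triples for one <sentence> element."""
    sentence = ET.fromstring(sentence_xml.strip())
    words = sentence.findall("word")
    # Map every word id to its surface form.
    form_by_id = {w.get("id"): w.get("form") for w in words}
    # Assumption: head="0" marks the root of the sentence ("ROOT" is our own label).
    form_by_id["0"] = "ROOT"
    return [(w.get("form"), w.get("relation"), form_by_id[w.get("head")])
            for w in words]


for form, relation, head in dependencies(SENTENCE_XML):
    print(f"{form} --{relation}--> {head}")
```

Run on the sample, the first triple links ista to gloria through ATR, matching the description above.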

	

Uses

The treebank data provides information about the syntactic relationship of every word in a sentence, along with its morphological analysis and the lemma it's derived from. In addition to providing crucial datasets for NLP tasks such as parsing and grammar induction, treebanks can be used by traditional scholars as well.
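
To give a sense of the kind of question this information can answer, the sketch below measures how many positions separate each attributive word from its head in a sentence, a simple word-order statistic. It is only an illustration: the function name modifier_distances is hypothetical, the input is the single Propertius sentence used as the example on this page, and the code assumes that word ids are consecutive positions in the sentence and that head="0" marks the root.

```python
import xml.etree.ElementTree as ET

# One sentence in the treebank's XML encoding (the Propertius example from this page).
SAMPLE = """<sentence id="2598662">
  <word id="1" form="ista" lemma="iste1" postag="p-s---fn-" head="4" relation="ATR" />
  <word id="2" form="meam" lemma="meus1" postag="a-s---fa-" head="5" relation="ATR" />
  <word id="3" form="norit" lemma="nosco1" postag="v3srsa---" head="0" relation="PRED" />
  <word id="4" form="gloria" lemma="gloria1" postag="n-s---fn-" head="3" relation="SBJ" />
  <word id="5" form="canitiem" lemma="canities1" postag="n-s---fa-" head="3" relation="OBJ" />
</sentence>"""


def modifier_distances(sentence_xml, relation="ATR"):
    """Return (dependent, head, distance) for every word bearing `relation`."""
    sentence = ET.fromstring(sentence_xml)
    # Assumption: ids are consecutive integers giving each word's position.
    forms = {int(w.get("id")): w.get("form") for w in sentence.iter("word")}
    results = []
    for w in sentence.iter("word"):
        if w.get("relation") != relation:
            continue
        dep_id, head_id = int(w.get("id")), int(w.get("head"))
        if head_id == 0:
            # Assumption: head 0 is the sentence root, so there is no head word.
            continue
        results.append((w.get("form"), forms[head_id], abs(head_id - dep_id)))
    return results


print(modifier_distances(SAMPLE))
```

In this sentence both attributives, ista and meam, stand three positions away from the nouns they modify; the same function could be applied to each sentence element of a full treebank file to collect such figures for an author or work.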

Student Research

Aside from the scholarly treebank of Aeschylus, each sentence in the Greek and Latin Dependency Treebanks is built from the work of two independent annotators, whose analyses are subsequently reconciled by a third. Many of our annotators are dedicated students, both graduate and undergraduate, and we are committed to engaging students in the act of scholarly research to produce scientific data that can be useful to the wider Classical community. We would like to recognize the contributions of the following individuals (and class) to the creation of these treebanks and thank them for their commitment to the advancement of Classical scholarship:

Jennifer Adams, James Artz, Jennifer Curtin, James C. D'Amico, W. B. Dolan, Calliopi Dourou, Scott J. Dube, C. Dan Earley, J. F. Gentile, Francis Hartel, Connor Hayden, Kenny Hickman, Tovah Keynton, Michael Kinney, Florin Leonte, Alex Lessie, Daniel Lim Libatique, Brian Livingston, Viet Luong, Meg Luthin, George Matthews, Molly Miller, Skylar Neil, Robin Ngo, Jessica Nord, Anthony D. Yates and the Tufts University LAT-181 class (Spring 2008).

Downloads

All files

latin-1.5.tar.gz
greek-1.1.tar.gz

Individual files

Greek             Latin
treebank-1.1.xml  treebank-1.5.xml
treebank-1.5.xsd  treebank-1.5.xsd
README.txt        README.txt
guidelines.pdf    guidelines.pdf

Publications

Bamman, David, Francesco Mambrini and Gregory Crane (2009), "An Ownership Model of Annotation: The Ancient Greek Dependency Treebank," Proceedings of the Eighth International Workshop on Treebanks and Linguistic Theories (TLT8) (Milan), forthcoming [pdf]

Bamman, David and Gregory Crane (2008), "Building a Dynamic Lexicon from a Digital Library," Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 2008) (Pittsburgh) [preprint]

Bamman, David and Gregory Crane (2008), "The Logic and Discovery of Textual Allusion," Proceedings of the Second Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2008) (Marrakech) [preprint]

Bamman, David, Marco Passarotti, Gregory Crane and Savina Raynaud (2007), "A Collaborative Model of Treebank Development," Proceedings of the Sixth International Workshop on Treebanks and Linguistic Theories (TLT2007) (Bergen) [pdf]

Bamman, David and Gregory Crane (2007), "The Latin Dependency Treebank in a Cultural Heritage Digital Library," Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007) (Prague: Association for Computational Linguistics), pp. 33-40 [pdf]

Bamman, David and Gregory Crane (2006), "The Design and Use of a Latin Dependency Treebank," Proceedings of the Fifth International Workshop on Treebanks and Linguistic Theories (TLT 2006) (Prague), pp. 67-78 [pdf]

Bamman, David, Marco Passarotti, Gregory Crane and Savina Raynaud (2007), "Guidelines for the Syntactic Annotation of Latin Treebanks," whitepaper (version 1.3) [pdf]