Computational Linguistics Datasets
A number of datasets that I've used in studies are available for research purposes.
License
The data on this page and its subpages is made available under the GNU GPL. That is, it can be used for academic (or any other) research purposes, but cannot be integrated into commercial software. By downloading the data, you acknowledge the terms and conditions of the GPL. If you use the data, please cite the papers indicated on the respective pages.
Datasets
- 2010. Textual Entailment Data with Discourse Annotation. (Mirkin, Dagan, and Pado 2010). The dataset and guidelines are stored externally. Please continue to http://www.cs.biu.ac.il/~nlp/downloads/discourse-for-entailment.html.
- 2010. Manual Named Entity annotation for German EUROPARL data. German classifiers for the Stanford CRF-based NER system (optimized in April 2010 and reported in Faruqui and Pado 2010) and manually annotated EUROPARL data as an out-of-domain test set. See the German NER page.
- 2010. Selectional Preferences for German and Spanish. (Peirsman and Pado 2010). Contact me.
- 2009. Projection of semantic roles. The 1000-sentence bilingual English-German corpus with role-semantic annotation (Pado and Lapata 2009, Pado and Erk 2010) is now available for download.
- 2008. Semi-supervised SRL for event nouns. The specification of Pado, Pennacchiotti, and Sporleder 2008 is here.
- 2007. Projection of frame-semantic classifications. Projected FrameNet predicate classes (Pado 2007) are available for German and French. Contact me.