German Named Entity Recognition (NER)

In Faruqui and Padó (2010), we developed a Named Entity Recognizer (NER) for German that builds on the Stanford Named Entity Recognizer, a Conditional Random Field (CRF) model, and includes semantic generalization information from large untagged German corpora. To our knowledge, our system is (as of June 2010) among the best systems for German NER. See the paper for a detailed evaluation.

The classifiers were trained on the German training set of the CoNLL 2003 Shared Task and use generalization data from two large German corpora: the HGC (Stuttgart University Newspaper Corpus) and deWaC (the .de top-level domain "web as corpus").

License

The data on this page is made available for academic purposes (teaching and research). The Stanford NER and the German classifiers may not be used in commercial software. If you use the data, please cite the paper as shown below.

Current classifiers

The current version of the classifiers is compatible with the Stanford Named Entity Recognizer version 1.2 and the Stanford CoreNLP tools version 1.0.4. They are available directly from the Stanford NLP group NER page (scroll down to the link "German NER"). See also the current README here.

Legacy classifiers

The following classifiers work with the (outdated) Stanford Named Entity Recognizer, version 1.1.1.

Sample input/output files

These files demonstrate the input and output format of the Stanford NER system with the German classifiers:

Out-of-domain evaluation data

For our out-of-domain evaluation, we manually annotated the first two German Europarl session transcripts with NER labels following the CoNLL 2003 annotation guidelines. They are also available, in the CoNLL 2003 column format:
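
As a rough illustration of working with CoNLL-style column data, the following Python sketch reads such a file into sentences of (token, label) pairs. It assumes one token per line with whitespace-separated columns, the token in the first column, the NER label in the last column, and a blank line between sentences. The exact number of columns in the released files is not stated here, so the reader ignores any columns in between. The function name read_conll_columns and the sample lines are invented for illustration and are not taken from the annotated Europarl data.

```python
from typing import Iterable, Iterator, List, Tuple

# A sentence is a list of (token, NER label) pairs.
Sentence = List[Tuple[str, str]]


def read_conll_columns(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-style column data.

    Assumption: first column is the token, last column is the NER label,
    and blank lines (or -DOCSTART- markers) separate sentences.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence boundary: emit what has been collected so far.
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        sentence.append((columns[0], columns[-1]))
    # Emit the final sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Invented two-column sample, for illustration only.
sample = [
    "Peter I-PER",
    "wohnt O",
    "in O",
    "Stuttgart I-LOC",
    ". O",
    "",
    "Das O",
    "Parlament I-ORG",
    "tagt O",
    ". O",
]

sentences = list(read_conll_columns(sample))
for sent in sentences:
    print(" ".join(f"{token}/{label}" for token, label in sent))
```

To read a real file, pass an open file handle instead of the sample list, for example `read_conll_columns(open(path, encoding="utf-8"))`, choosing the encoding that matches the downloaded file.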

Reference

@InProceedings{faruqui10:_training,
  author =       {Manaal Faruqui and Sebastian Pad\'o},
  title =        {Training and Evaluating a German Named Entity Recognizer 
                  with Semantic Generalization},
  booktitle = {Proceedings of KONVENS 2010},
  year =         2010,
  address =      {Saarbr\"ucken, Germany}}