German Named Entity Recognition (NER)

In Faruqui and Padó (2010), we developed a Named Entity Recognizer (NER) for German. It builds on the Stanford Named Entity Recognizer, which is based on Conditional Random Fields, and adds semantic generalization information drawn from large untagged German corpora. To our knowledge, our system is (as of June 2010) among the best systems for German NER. See the paper for a detailed evaluation.

License

The data on this page is made available under the GNU GPL. That is, it can be used for academic (or any other) research purposes, but cannot be integrated into commercial software. By downloading the software, you acknowledge the terms and conditions of the GPL. If you use the data, please cite the paper as shown below.

German NER Classifiers

Here you can download the two best classifiers that we have developed. These classifiers have been trained on the CoNLL 2003 Shared Task German train set and use generalization data from two large German corpora:

To use the classifiers, you will need the Stanford Named Entity Recognizer, version 1.1.1. For more details on the differences between the classifiers, and on how to use them, please consult the README.

Out-of-domain evaluation data

For our out-of-domain evaluation, we annotated the first two German Europarl session transcripts. They are also available in the CoNLL 2003 column format.
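For readers unfamiliar with the column format, the following Python sketch shows one way such a file could be read. It assumes only that each non-empty line holds whitespace-separated columns with the token first and the named-entity tag last, that blank lines separate sentences, and that `-DOCSTART-` lines mark document boundaries. The function names and the sample sentences are invented for illustration and are not taken from the annotated data; check the actual column layout against the files themselves.

```python
from typing import Iterable, Iterator, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NE tag) pairs


def read_conll(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-style column data.

    Assumption: first column is the token, last column is the NE tag.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        columns = line.split()
        sentence.append((columns[0], columns[-1]))
    if sentence:
        yield sentence


def extract_entities(sentence: Sentence) -> List[Tuple[str, str]]:
    """Group consecutive tagged tokens into (text, type) entity spans.

    A new span starts on a B- tag or when the entity type changes,
    so both I-/B- tagging variants are handled.
    """
    entities: List[Tuple[str, str]] = []
    tokens: List[str] = []
    current_type = None
    for token, tag in sentence:
        prefix, _, etype = tag.partition("-")  # "O" gives an empty etype
        starts_new = bool(etype) and (prefix == "B" or etype != current_type)
        if tokens and (not etype or starts_new):
            entities.append((" ".join(tokens), current_type))
            tokens, current_type = [], None
        if etype:
            tokens.append(token)
            current_type = etype
    if tokens:
        entities.append((" ".join(tokens), current_type))
    return entities


# Invented sample data (hypothetical, not from the Europarl annotation).
SAMPLE = """\
-DOCSTART- -X- -X- -X- O

Angela Angela NE I-NC I-PER
Merkel Merkel NE I-NC I-PER
besucht besuchen VVFIN I-VC O
Berlin Berlin NE I-NC I-LOC
. . $. O O

Das d ART I-NC O
Parlament Parlament NN I-NC I-ORG
tagt tagen VVFIN I-VC O
"""

sentences = list(read_conll(SAMPLE.splitlines()))
for sent in sentences:
    print(extract_entities(sent))
```

Reading only the first and last columns keeps the sketch independent of how many intermediate columns (lemma, part of speech, chunk tag) a given file contains.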

Reference

@InProceedings{faruqui10:_training,
  author =       {Manaal Faruqui and Sebastian Pad\'o},
  title =        {Training and Evaluating a German Named Entity Recognizer 
                  with Semantic Generalization},
  booktitle = {Proceedings of KONVENS 2010},
  year =         2010,
  address =      {Saarbr\"ucken, Germany},
  note =         {To appear}}