Pado et al. 2009b

S. Pado, M. Galley, D. Jurafsky, and C. Manning: Robust Machine Translation Evaluation with Entailment Features. Proceedings of ACL 2009. Singapore.


Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements.


@InProceedings{pado-EtAl:2009:ACLIJCNLP,
  author    = {Pado, Sebastian  and  Galley, Michel  and  
               Jurafsky, Dan  and  Manning, Christopher D.},
  title     = {Robust Machine Translation Evaluation with Entailment Features},
  booktitle = {Proceedings of the Joint Conference of the 47th Annual Meeting 
               of the ACL and the 4th International Joint Conference on 
               Natural Language Processing of the AFNLP},
  month     = {August},
  year      = {2009},
  address   = {Suntec, Singapore},
  publisher = {Association for Computational Linguistics},
  pages     = {297--305},
  url       = {http://www.aclweb.org/anthology/P/P09/P09-1034}
}