Choosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks
Michael J. Denkowski, A. Lavie
2010 · DBLP: conf/amta/DenkowskiL10
Conference of the Association for Machine Translation in the Americas · 54 Citations
TLDR
This paper examines the motivation, design, and practical results of several types of human evaluation tasks for machine translation. It also explores the practicality of tuning automatic evaluation metrics to each judgment type, in a comprehensive experiment using the METEOR-NEXT metric.
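The paper's core idea of tuning a metric to a particular human judgment type can be illustrated with a minimal sketch: search over a metric parameter and keep the setting whose segment-level scores correlate best with human scores for that judgment type. This is not the paper's METEOR-NEXT tuning code; the alpha parameter mimics METEOR's precision/recall weight, and all data values below are hypothetical.

```python
# Illustrative sketch only, not the authors' actual tuning procedure.
# Tunes a single METEOR-style parameter (alpha, the precision/recall
# weight in the harmonic mean) to maximize Pearson correlation with
# hypothetical human adequacy judgments.

def f_mean(precision: float, recall: float, alpha: float) -> float:
    """METEOR-style parameterized harmonic mean of precision and recall."""
    if precision == 0 or recall == 0:
        return 0.0
    return (precision * recall) / (alpha * precision + (1 - alpha) * recall)

def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-segment (precision, recall) pairs from a metric's
# matcher, paired with human judgment scores for the same segments.
segments = [(0.8, 0.6), (0.5, 0.7), (0.9, 0.9), (0.4, 0.3), (0.7, 0.5)]
human = [3.2, 3.0, 4.5, 1.8, 2.9]

# Grid search over alpha: keep the setting whose metric scores
# correlate best with the human judgments for this judgment type.
best = max(
    (a / 100 for a in range(1, 100)),
    key=lambda a: pearson([f_mean(p, r, a) for p, r in segments], human),
)
print(f"best alpha = {best:.2f}")
```

In practice, each human judgment type (adequacy, ranking, post-editing effort, and so on) would yield its own tuned parameter set, which is the comparison the paper carries out with METEOR-NEXT.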
