Choosing the Right Evaluation for Machine Translation: an Examination of Annotator and Automatic Metric Performance on Human Judgment Tasks

Michael J. Denkowski, A. Lavie

2010 · DBLP: conf/amta/DenkowskiL10
Conference of the Association for Machine Translation in the Americas · 54 Citations

TLDR

This paper examines the motivation, design, and practical results of several types of human evaluation tasks for machine translation. It also explores the practicality of tuning automatic evaluation metrics to each judgment type in a comprehensive experiment using the METEOR-NEXT metric.