
Effective Self-Training for Parsing

David McClosky, Eugene Charniak, Mark Johnson

2006 · DOI: 10.3115/1220835.1220855
North American Chapter of the Association for Computational Linguistics · Cited by 694

TLDR

This work presents a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data and shows that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker.
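The self-training loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not say which component is retrained, so the sketch assumes the first-stage parser is retrained on the union of the hand-labeled treebank and the reranker-selected parses of the unlabeled sentences. `self_train`, `toy_train`, and `toy_rerank` are hypothetical names, and the toy parser and reranker are stand-ins for a real n-best parser and discriminative reranker.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = str
Tree = str  # a bracketed parse string stands in for a real tree object
Treebank = List[Tuple[Sentence, Tree]]
NBestParser = Callable[[Sentence], List[Tree]]


def self_train(
    labeled: Treebank,
    unlabeled: Sequence[Sentence],
    train_parser: Callable[[Treebank], NBestParser],
    rerank: Callable[[Sentence, List[Tree]], Tree],
) -> Tuple[NBestParser, Treebank]:
    """One round of self-training a two-phase parser-reranker (sketch).

    1. Train the first-stage parser on hand-labeled trees.
    2. Parse each unlabeled sentence: the parser proposes n-best trees
       and the reranker picks one.
    3. Add the reranked trees to the training data and retrain the parser.
    """
    parser = train_parser(labeled)
    auto_labeled = [(s, rerank(s, parser(s))) for s in unlabeled]
    combined = list(labeled) + auto_labeled
    return train_parser(combined), combined


def toy_train(treebank: Treebank) -> NBestParser:
    # Hypothetical parser: returns the memorized tree (if any) plus a flat parse.
    known = dict(treebank)

    def parse(sentence: Sentence) -> List[Tree]:
        flat = "(S " + sentence + ")"
        return [known[sentence], flat] if sentence in known else [flat]

    return parse


def toy_rerank(sentence: Sentence, candidates: List[Tree]) -> Tree:
    # Hypothetical scorer: prefer the candidate with the most brackets.
    return max(candidates, key=lambda t: t.count("("))


labeled = [("dogs bark", "(S (NP dogs) (VP bark))")]
unlabeled = ["cats sleep"]
parser, data = self_train(labeled, unlabeled, toy_train, toy_rerank)
print(len(data), data[1])
```

Passing the parser trainer and reranker in as plain callables keeps the loop independent of any particular parsing toolkit; a real system would plug in its own n-best parser and trained reranker.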

Abstract

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an F-score of 92.1%, an absolute improvement of 1.1% (a 12% relative error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
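The 12% figure follows from the two numbers quoted in the abstract. Taking the previous best F-score as 92.1 - 1.1 = 91.0%, the error (100 minus F-score) falls from 9.0 to 7.9 points:

```latex
\begin{aligned}
F_{\text{prev}} &= 92.1 - 1.1 = 91.0\%, \\
\text{relative error reduction} &= \frac{(100 - 91.0) - (100 - 92.1)}{100 - 91.0}
  = \frac{1.1}{9.0} \approx 12.2\%.
\end{aligned}
```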