Detecting AI-Generated Filipino Text: A Synthetic Data Approach with Pre-Trained LLMs

Camron Ong, C. Cheng

2025 · DOI: 10.1109/IALP68296.2024.11156751
International Conference on Asian Language Processing

TLDR

This study investigates the baseline performance of various LLMs at classifying news articles as human-written or AI-generated in a low-resource language, Filipino, and highlights the need for better detection methods for low-resource languages.

Abstract

Large Language Models (LLMs) are increasingly used to generate text, raising concerns about the use of AI-generated content. While various AI detectors and techniques have been developed, their performance in non-English languages, especially low-resource languages like Filipino, is largely unexplored. This study investigates the baseline performance of various LLMs on classifying news articles as human-written or AI-generated. The results show that the 7 LLMs tested achieved low accuracy (0.27-0.47) in the classification task, with 5 of them heavily predicting texts as human-written. Furthermore, weak correlations between error-based metrics and classes suggest that new or improved strategies are needed. This study highlights the need for improvement in detection methods for low-resource languages, opening an avenue for further research in the field.
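
The two quantities the abstract reports, classification accuracy and a tendency to predict one class, can be computed as in the minimal Python sketch below. It is illustrative only: the function names `accuracy` and `prediction_share`, the label strings, and the toy labels are assumptions made for this example and are not taken from the paper or its data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of articles whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one example is required")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def prediction_share(y_pred, label):
    """Fraction of all predictions that equal `label` (a simple bias indicator)."""
    if not y_pred:
        raise ValueError("at least one prediction is required")
    return Counter(y_pred)[label] / len(y_pred)


# Toy labels, invented for illustration (NOT results from the study).
y_true = ["human", "ai", "ai", "human", "ai", "ai", "ai", "human", "ai", "ai"]
y_pred = ["human", "human", "human", "human", "ai",
          "human", "human", "human", "human", "human"]

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
print(f"share predicted human: {prediction_share(y_pred, 'human'):.2f}")
```

In this toy case a detector that labels almost everything as human-written scores low accuracy when most articles are AI-generated, which is the kind of pattern the abstract describes for 5 of the 7 models.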
