Prompting Generative LLMs for Consistent Thai Question Generation via Semantic Retrieval
Phantiga Wattanakul, Standa Na, Kritwara Rattanaopas
TLDR
These findings confirm that language-compatible pretraining, paired with effective retrieval and prompt design, enables fast and accurate Thai question generation that supports real-world applications across sectors, and they raise the question of how such approaches might generalize to other Asian languages.
Abstract
Thai Question Answering has primarily focused on extractive methods despite the availability of resources like the 17,000-pair ThaiWiki QA dataset. This study proposes a generative question similarity framework that combines FAISS retrieval with prompt-tuned large language models. To evaluate the semantic correctness of generated questions, we introduce the Semantic Answer Match (SAM) metric, which measures whether a generated question leads to the same answer as the original. Through extensive experimentation, we found that Qwen2.5 (3B) achieved 88.2% accuracy, surpassing LLaMA ChatQA 8B by 150.6%, while Granite 3.3 (3B) followed closely with 87.5% accuracy. These results highlight the importance of language compatibility over model size, especially in low-resource settings. Additionally, our findings show that prompt design significantly influences generative performance, with certain base prompts achieving consistent results across models. Remarkably, even simple base prompts and instruction formats yield strong results with lower inference time. These findings confirm that language-compatible pretraining, paired with effective retrieval and prompt design, enables fast and accurate Thai question generation, supporting real-world applications across sectors. They also raise the question of how such approaches might generalize to other Asian languages.
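The retrieval step of the framework described above can be sketched in a few lines. The example below is an illustrative stand-in, not the paper's implementation: it uses plain NumPy cosine similarity in place of a FAISS inner-product index, and the toy embedding vectors are invented for demonstration.

```python
import numpy as np

def retrieve_similar(query_vec, corpus_vecs, k=2):
    """Return indices of the k most similar corpus vectors, ranked by
    cosine similarity. Equivalent in spirit to searching a FAISS
    IndexFlatIP built over L2-normalized question embeddings."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity to each stored question
    return np.argsort(-scores)[:k]      # most similar first

# Toy 4-dimensional "embeddings" for three stored questions (illustrative only).
corpus = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.2, 0.0],
    [0.1, 0.0, 0.0, 0.9],
])
query = np.array([0.85, 0.15, 0.0, 0.05])
print(retrieve_similar(query, corpus))  # indices of the nearest stored questions
```

In the full pipeline, the retrieved questions would then be placed into the prompt given to the generative model; in production one would use real sentence embeddings and FAISS for scalable search.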
