
On the Usage of a Classical Arabic Corpus as a Language Resource

Ibrahim Bounhas

2019 · DOI: 10.1145/3277591
17 Citations

TLDR

The key issues for building generic language resources from hadiths are discussed, taking into account the relevance of related literature and the wide community of researchers that are interested in these narrations.

Abstract

This article presents a literature review of computer science research applied to hadith, a kind of Arabic narration that appeared in the 7th century. We study and compare existing work in several fields: Natural Language Processing (NLP), Information Retrieval (IR), and Knowledge Extraction (KE). We thereby elicit the main drawbacks of this work and identify some perspectives that the research community may consider. We also study the characteristics of these documents by enumerating the advantages and limitations of using hadith as a language resource. Moreover, our study shows that previous studies used different collections of hadiths, which makes it hard to compare their results objectively. Many preprocessing steps also recur across these applications, so effort is duplicated and time is wasted. Consequently, the key issues for building generic language resources from hadiths are discussed, taking into account the relevance of related literature and the wide community of researchers that are interested in these narrations. The ultimate goal is to structure hadith books for multiple uses, thereby building common collections that future applications can exploit.
