Enhancing Arabic Text Mining Using Linguistic Factors

Document Type: Original Article

Abstract

The World Wide Web overwhelms people with an immense amount of widely
distributed, interconnected, rich and dynamic hypertext information. Text mining is
concerned with extracting knowledge from unstructured textual data. The most important
task in achieving this goal is finding the rules that relate specific words and phrases.
This research shows how Arabic morphology and Arabic synonyms, as linguistic factors,
can be used to extract the required knowledge from Arabic texts.
The contribution of this research is the design and implementation of a
system that combines morphology, synonyms, indexing and databases for text mining and
information retrieval, with different modes with respect to the use of morphology and
synonyms.
The approach is based on preprocessing the Arabic text to convert it into a
semi-structured database. A suitable indexing method and an appropriate search
mechanism are then used to extract the required information. The proposed model was
tested and showed promising results. The work also revealed a shortage of Arabic
computational linguistics tools, such as an Arabic lexicon tagged with semantic features.
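
As a rough illustration of the kind of pipeline described above, the following Python sketch indexes a few toy documents and searches them with morphology and synonym modes switched on or off. It is not the system implemented in this research. The names ROOT_OF, SYNONYMS, DOCS, build_index and search are hypothetical, the words are Latin transliterations chosen for readability, and the two lookup tables stand in for a real morphological analyzer and synonym lexicon.

```python
from collections import defaultdict

# Hypothetical word -> root table (Latin transliteration).
# Assumption: a real system would call a morphological analyzer instead.
ROOT_OF = {
    "kitab": "ktb", "katib": "ktb", "maktaba": "ktb",
    "mudarris": "drs", "madrasa": "drs",
    "muallim": "elm",
}

# Hypothetical synonym lexicon over surface words.
SYNONYMS = {"mudarris": {"muallim"}, "muallim": {"mudarris"}}

# Toy document collection: document id -> whitespace-separated tokens.
DOCS = {
    1: "katib kitab",
    2: "maktaba madrasa",
    3: "muallim madrasa",
    4: "mudarris",
}


def build_index(docs):
    """Build an inverted index keyed by both surface form and root."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[("surface", token)].add(doc_id)
            # Unknown words fall back to their surface form as the root.
            index[("root", ROOT_OF.get(token, token))].add(doc_id)
    return index


def search(index, word, use_morphology=False, use_synonyms=False):
    """Return ids of documents matching the word under the chosen modes."""
    terms = {word}
    if use_synonyms:
        terms |= SYNONYMS.get(word, set())
    hits = set()
    for term in terms:
        if use_morphology:
            key = ("root", ROOT_OF.get(term, term))
        else:
            key = ("surface", term)
        hits |= index.get(key, set())
    return hits


if __name__ == "__main__":
    idx = build_index(DOCS)
    print(sorted(search(idx, "kitab")))
    print(sorted(search(idx, "kitab", use_morphology=True)))
    print(sorted(search(idx, "mudarris", use_synonyms=True)))
```

In this sketch an exact search for "kitab" matches only the document containing that surface form, the morphology mode also matches documents containing other words with the same root, and the synonym mode adds documents that contain a listed synonym. Storing surface and root keys in one index is a design choice made here for brevity so that the mode can be chosen at query time.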

Keywords