Hybrid Arabic Text Summarization Approach Based on Seq-to-Seq and Transformer

Document Type: Original Article

Author

Higher Institute of Computers and Information Technology, Computer Department, El Shorouk Academy, Cairo, Egypt

DOI: 10.21608/asc.2025.431306

Abstract

Text summarization is an essential task in natural language processing because the volume of available data is growing rapidly, and users need that data condensed into short, meaningful text in little time. Much work has addressed summarization of texts in Latin-script languages. Summarizing Arabic text, however, remains challenging for several reasons, including the language’s complexity, structure, and morphology. There is also a lack of benchmark datasets, gold-standard summaries, and evaluation metrics for Arabic. The contribution of this paper is threefold. First, it proposes a hybrid approach that combines a Modified Sequence-to-Sequence (MSTS) model with a transformer-based model. The Seq-to-Seq-based model is modified to use a multi-layer encoder and a one-layer decoder. The MSTS model outputs an extractive summary, which the transformer-based model then processes to generate an abstractive summary. Second, the paper introduces a new Arabic benchmark dataset, HASD, which contains 43k articles together with their extractive and abstractive summaries. Third, it extends the well-known extractive EASC benchmark by adding an abstractive summary to each text. The proposed model is tested on HASD and the Modified EASC benchmark and evaluated using ROUGE, BLEU, and Arabic ROUGE. The experimental results show performance on these quantitative metrics that is competitive with state-of-the-art methods.
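To illustrate the kind of n-gram overlap score that ROUGE reports, the following is a minimal sketch of ROUGE-N in Python. It is not the paper's evaluation code: the function names `ngrams` and `rouge_n` are hypothetical, tokenization is plain whitespace splitting, and no Arabic-specific normalization or stemming (which an Arabic ROUGE variant would presumably apply) is performed.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap between two strings.

    Assumption: whitespace tokenization, no normalization or stemming.
    """
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "القط جلس على السجادة"   # hypothetical system summary
    gold = "القط نام على السجادة"         # hypothetical reference summary
    print(rouge_n(generated, gold, n=1))
```

In this toy pair, three of the four unigrams match, so unigram precision and recall are both 0.75. Scores reported on real benchmarks depend on the tokenizer and normalization used, so values from this sketch would not be directly comparable to published results.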
