T5-LDSum: Leveraging T5 Transformer in Hybrid Abstractive-Extractive Long Document Summarization
In the era of digital information overload, the ability to summarize books efficiently emerges as an invaluable skill. Book summarization condenses extensive texts into digestible, concise summaries, enabling readers to grasp the essence of a book without committing to reading it in its entirety. In scholarly research, various techniques for text summarization are employed, including extractive summarization, abstractive summarization, and hybrid approaches. This paper presents a novel hybrid method that leverages the strengths of both extractive and abstractive techniques, with a particular focus on the T5 text-to-text transformer model. Our methodology begins with a BERT-based extractive model paired with the LexRank algorithm to generate preliminary summaries. These summaries are subsequently refined using the T5 transformer model, known for its powerful text generation capabilities. We conducted two empirical studies using Oscar Wilde's 'The Picture of Dorian Gray,' specifically focusing on Chapter 13. Our experiments analyzed the efficacy of summaries derived from the top 20% and top 60% of sentences ranked by the extractive model. The findings indicate a consistent pattern: summaries based on the 60% sentence extraction significantly outperform those from the 20% extraction across all assessed metrics: ROUGE, precision, recall, F1 scores, and human evaluations. The integration of the T5 transformer model in the refinement process is highlighted as a key component in achieving high-quality, coherent summaries. © 2024 IEEE.
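The extractive stage described above ranks sentences by graph centrality and keeps a top fraction (20% or 60%) before abstractive refinement. The following is a minimal, self-contained sketch of a LexRank-style ranker in pure Python, not the paper's actual implementation: it builds a cosine-similarity graph over term-frequency sentence vectors and scores sentences by power iteration (PageRank style, without the similarity threshold used in full LexRank). The downstream T5 refinement step, which would typically call a pretrained sequence-to-sequence model, is noted only in a comment.

```python
import math
from collections import Counter

def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality on a cosine-similarity graph
    (simplified LexRank: no similarity threshold, raw term frequencies)."""
    vecs = [Counter(s.lower().split()) for s in sentences]

    def cosine(a, b):
        num = sum(a[w] * b[w] for w in set(a) & set(b))
        den = (math.sqrt(sum(v * v for v in a.values()))
               * math.sqrt(sum(v * v for v in b.values())))
        return num / den if den else 0.0

    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    # Power iteration over the row-normalised similarity matrix.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] / (sum(sim[j]) or 1.0) * scores[j]
                            for j in range(n))
            for i in range(n)
        ]
    return scores

def extract_top(sentences, ratio):
    """Keep the top `ratio` fraction of sentences, in original document order.
    The resulting extract would then be passed to an abstractive model
    (e.g. T5) for refinement -- omitted in this sketch."""
    scores = lexrank_scores(sentences)
    k = max(1, round(len(sentences) * ratio))
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

With `ratio=0.2` and `ratio=0.6` this reproduces the two extraction settings compared in the experiments; preserving original sentence order keeps the intermediate extract coherent for the abstractive pass.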