Abstractive Text Summarisation using Transformers
Abstractive text summarization generates a summary that conveys the meaning of the source text, possibly using words and phrasings that do not appear in it. Extractive text summarization, by contrast, selects the most important sentences from the source and concatenates them. In recent years, most research on abstractive summarization has been based on recurrent neural networks (RNNs). However, RNN-based approaches perform poorly on long sequences, because they process tokens one at a time and must carry information about early tokens through every intermediate step. In this project, we explore the Transformer architecture and apply it to abstractive text summarization. We propose a Transformer-based neural network and compare its results with those of a bidirectional LSTM (Bi-LSTM) with attention on our dataset. Our experimental results show that the Transformer performs as well as the Bi-LSTM on this dataset while training considerably more efficiently, with a substantially lower training time per epoch. This gain is consistent with self-attention processing all positions of a sequence in parallel rather than one step at a time.
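To make the sequential-versus-parallel contrast concrete, here is a minimal pure-Python sketch on scalar toy inputs. It is not the project's model, and the weights `w_h` and `w_x` and all function names are hypothetical. In the toy RNN each hidden state needs the previous one. Because `w_h = 0.5` and the derivative of `tanh` is at most 1, the first input's influence on the last state shrinks by at least half per step. In the toy self-attention every output is a direct weighted sum over all inputs, so the first token is one step away from every position and all outputs can be computed independently.

```python
import math


def rnn_hidden_states(xs, w_h=0.5, w_x=1.0):
    """Toy scalar RNN: h_t = tanh(w_h * h_{t-1} + w_x * x_t).

    Each step reads the previous hidden state, so the loop is inherently sequential.
    """
    h = 0.0
    states = []
    for x in xs:
        h = math.tanh(w_h * h + w_x * x)
        states.append(h)
    return states


def self_attention_outputs(xs):
    """Toy scalar self-attention: out_i = sum_j softmax_j(x_i * x_j) * x_j.

    Each output depends only on the inputs, never on other outputs,
    so every position could be computed in parallel.
    """
    outputs = []
    for q in xs:
        scores = [q * k for k in xs]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        outputs.append(sum(w / total * v for w, v in zip(weights, xs)))
    return outputs


def influence_of_first_token(fn, xs, eps=1e-3):
    """How much the last output moves when the first input is nudged by eps."""
    nudged = [xs[0] + eps] + xs[1:]
    return abs(fn(nudged)[-1] - fn(xs)[-1])


if __name__ == "__main__":
    xs = [math.sin(i) for i in range(30)]
    print("RNN:", influence_of_first_token(rnn_hidden_states, xs))
    print("Self-attention:", influence_of_first_token(self_attention_outputs, xs))
```

On a 30-token input the RNN change is bounded by roughly `eps * 0.5**29`. The self-attention change, in contrast, stays on the order of `eps` times the attention weight placed on the first token.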
Transformer Seq2Seq Encoder-Decoder Architecture
flowchart LR
Input["Input Article Tokens"] -->|Positional Encoding| Enc["Encoder Stack"]
Enc -->|Self-Attention Blocks| EncOut["Context Representations"]
EncOut -->|Key/Value Vectors| Attn["Multi-Head Attention Alignment"]
PrevOut["Generated Summary Tokens"] -->|Positional Encoding| Dec["Decoder Stack"]
Dec -->|Queries from Masked Self-Attention| Attn
Attn -->|Attention Output| Linear["Linear Projection"]
Linear -->|Softmax probabilities| Prob["Next Token Prediction"]
style Enc fill:#3572A5,stroke:#fff,stroke-width:2px,color:#fff
style EncOut fill:#3572A5,stroke:#fff,stroke-width:2px,color:#fff
style Dec fill:#ff5277,stroke:#fff,stroke-width:2px,color:#fff
style Attn fill:#1DB954,stroke:#fff,stroke-width:2px,color:#fff
style Linear fill:#1DB954,stroke:#fff,stroke-width:2px,color:#fff
style Prob fill:#1DB954,stroke:#fff,stroke-width:2px,color:#fff
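Both stacks in the diagram receive positional encodings, because self-attention by itself is insensitive to token order. The text does not say which encoding the project uses. The sketch below assumes the fixed sinusoidal scheme from the original Transformer design, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function names are hypothetical, and plain lists stand in for tensors.

```python
import math


def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings as a max_len x d_model list of lists.

    Even dimensions use sine and odd dimensions use cosine; the wavelengths
    form a geometric progression controlled by the base 10000.
    """
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i plays the role of 2i in the formula
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(embeddings):
    """Add positional encodings element-wise to a list of token embedding vectors."""
    if not embeddings:
        return []
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, pe_row)] for row, pe_row in zip(embeddings, pe)]
```

Because the encoding is a fixed function of position, it needs no training and can be computed for any sequence length. A learned position embedding table is a common alternative.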
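The attention blocks in the diagram all rest on scaled dot-product attention, softmax(QK^T / sqrt(d_k))V. Below is a minimal single-head sketch in pure Python; the function names are hypothetical. If you pass the decoder's own states as queries, keys and values with `causal=True`, you get the decoder's masked self-attention, where position i cannot see later summary tokens. If you pass decoder states as queries and the encoder's context representations as keys and values, you get the encoder-decoder alignment. Real multi-head attention runs several such heads on learned linear projections of Q, K and V and concatenates their outputs; those projections are omitted here.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values, causal=False):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns (outputs, weights). With causal=True, query i may only attend
    to keys 0..i, as in the decoder's masked self-attention.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future tokens
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(d_v)])
    return outputs, all_weights
```

The causal mask only hides later positions; it does not change the scaling, so every row of weights still sums to 1.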
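The last two steps of the diagram map the attention output for the current position to a probability for every vocabulary entry. This sketch assumes a single dense layer followed by softmax, with greedy selection of the most probable token. The weights, vocabulary and function names are made up for illustration, and the project may use a different decoding strategy such as beam search.

```python
import math


def linear(x, weight, bias):
    """y = W x + b, where weight has one row per vocabulary entry."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_next_token(decoder_output, weight, bias, vocab):
    """Project one decoder state to vocabulary logits, convert them to
    probabilities, and pick the most likely token (greedy decoding)."""
    probs = softmax(linear(decoder_output, weight, bias))
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs
```

During generation, the chosen token would be appended to the generated summary tokens and fed back into the decoder, repeating until an end-of-sequence token or a length limit is reached.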
More on this project can be found here.