How to make text summarizer using Python - Ryzen Hunt

How to make a text summarization tool with machine learning, natural language processing, and AI in Python using the NLTK library. Explained by Ryzen Hunt.

Hello Earthians, in my previous posts, I explained a basic Arduino tutorial on how to interface a servo motor with an Arduino UNO. In this post, I will explain how you can develop your own text summarizer using the NLTK library in Python.

If you don't know what Python is or how to program in it, please check my previous post, mentioned below:

What is Text summarization?

Text summarization is the process of reducing a large text document into a condensed form while preserving its key information. This technique has a wide range of applications in fields such as natural language processing, information retrieval, and data analysis. In this blog, we will discuss how to create a text summarization module using Python.

There are two main types of text summarization techniques: extractive summarization and abstractive summarization. Extractive summarization involves selecting important sentences or phrases from the text and using them to generate a summary. Abstractive summarization involves generating new sentences that summarize the information in the text. In this blog, we will focus on extractive summarization.

The first step in creating a text summarization module is to clean the text data. This involves removing any irrelevant information such as punctuation, stop words, and special characters. Stop words are common words such as “the,” “an,” and “and” that do not carry much meaning and can be removed from the text without affecting its overall meaning.

Once the text data has been cleaned, the next step is to create a frequency distribution of the words in the text. This gives us a first measure of the importance of each word. We can use the Natural Language Toolkit (NLTK) library in Python to perform this task. Next, we can build on these word counts to calculate TF-IDF (term frequency-inverse document frequency) scores for the words in each sentence, treating each sentence as a small document.
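The cleaning and counting steps described above can be sketched as follows. This is only a minimal sketch that uses the standard library, so it runs without downloading any NLTK data. The small `STOP_WORDS` set and the `clean_words` helper are illustrative assumptions and not part of the original code. With NLTK installed, `stopwords.words("english")` would supply a full stop word list, and `nltk.FreqDist` (a subclass of `Counter`) would give the same counts.

```python
import re
from collections import Counter

# Illustrative stop word set (an assumption for this sketch); NLTK's
# stopwords.words("english") provides a much fuller list.
STOP_WORDS = {"the", "a", "an", "and", "is", "of", "to", "in", "it"}

def clean_words(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

text = "The cat sat. The cat ran, and the dog sat!"

# Count how often each remaining word occurs in the text.
frequencies = Counter(clean_words(text))
print(frequencies.most_common(2))
```

Words that survive the cleaning step and occur most often are the first candidates for "important" words in the text.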

The TF-IDF score measures how important a word is to one sentence relative to the whole text: it is high for words that occur often in that sentence but rarely in the others. Sentences whose words have high TF-IDF scores are likely to contain important information and should be included in the summary. Finally, we can use the TF-IDF scores to select the most important sentences in the text and generate a summary. The summary can be generated by concatenating the selected sentences and removing any redundant information.
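To make the scoring idea concrete, here is a rough sketch that computes TF-IDF by hand and keeps the highest-scoring sentences. It assumes the plain textbook formula, tf × log(N / df), with each sentence treated as one document. scikit-learn's `TfidfVectorizer` uses a smoothed and normalized variant, so its numbers will differ. The helper names `tokenize` and `summarize` are hypothetical and chosen only for this illustration.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic words."""
    return re.findall(r"[a-z]+", sentence.lower())

def summarize(sentences, num_sentences=1):
    """Return the num_sentences highest-scoring sentences, in original order."""
    tokenized = [tokenize(s) for s in sentences]
    total = len(sentences)

    # Document frequency: in how many sentences does each word appear?
    doc_freq = Counter(word for words in tokenized for word in set(words))

    def score(words):
        # Sum of tf * idf over the distinct words of one sentence.
        counts = Counter(words)
        return sum((count / len(words)) * math.log(total / doc_freq[word])
                   for word, count in counts.items())

    scores = [score(words) if words else 0.0 for words in tokenized]

    # Rank sentence indices by score, keep the best, restore text order.
    ranked = sorted(range(total), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]

sentences = [
    "Python is popular.",
    "Python is used for text summarization.",
    "Python is fun.",
]
print(summarize(sentences, num_sentences=1))
```

In this small example, "python" and "is" appear in every sentence, so their inverse document frequency is log(3/3) = 0 and they add nothing to any score. The second sentence wins because it carries the most words that are unique to it.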

Requirements:

Python (3.7 or above)
NLTK, scikit-learn, and NumPy libraries (pip install nltk scikit-learn numpy)
IDE (Visual Studio Code, PyCharm, etc.)
Internet connection

Source Code:

# This code is written by Anikesh at RyzenHunt
import nltk
import numpy as np
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

# One-time downloads of the sentence tokenizer data and the stop word list
# (newer NLTK releases look for "punkt_tab" instead of "punkt").
for resource in ("punkt", "punkt_tab", "stopwords"):
    nltk.download(resource, quiet=True)

def text_summarization(text, top_n=5):
    stop_words = list(stopwords.words("english"))

    # Split the text into sentences and score every non-stop word with TF-IDF.
    sentences = sent_tokenize(text)
    tfidf = TfidfVectorizer(stop_words=stop_words)
    matrix = tfidf.fit_transform(sentences)
    scores = zip(tfidf.get_feature_names_out(),
                 np.asarray(matrix.sum(axis=0)).ravel())
    sorted_scores = sorted(scores, key=lambda x: x[1], reverse=True)

    # Keep only the top_n highest-scoring words.
    top_words = {word for word, score in sorted_scores[:top_n]}

    # Select, in original order, the sentences that contain a top word.
    analyzer = tfidf.build_analyzer()
    selected_sentences = []
    for sentence in sentences:
        if top_words & set(analyzer(sentence)):
            selected_sentences.append(sentence)
    summary = " ".join(selected_sentences)
    return summary

Result:

In this code, we first remove the stop words and split the text into sentences. Then, we use the TfidfVectorizer from the scikit-learn library to calculate a TF-IDF score for each word. We sort the scores in descending order and select the sentences that contain the highest-scoring words to generate the summary. If it doesn't work, please write in the comment box or contact me.

I hope you learned something new here. Please share this post with your friends, and don't forget to leave your comments, suggestions, and feedback. If you have any questions or doubts, please feel free to ask; we will reply soon or answer in our next post.

