Covid19 Analysis with Python
Basic sentiment analysis using hand-written NLP in plain Python as well as NLTK. Sentiment analysis is the technique of analyzing the sentiment behind a given piece of text. In this article, we will look at ways to analyze the sentiment of a given text and then move on to analyzing Twitter data using the GetOldTweets3 library. You are expected to be familiar with Python, have it installed on your system, and have an IDE of your own. So without further ado, let’s get started!
Step 1: Clean the Data
We are going to start off by cleaning the text whose emotions will be analyzed. We create an external text file, let’s say “main.txt”, and put the passage to be analyzed into it. Cleaning happens in four sub-steps. First, we convert the text into one consistent case; here we will stick to lower case, so we read the text inside “main.txt” into Python and lower-case it. Next, we remove all the punctuation in the passage. Third, we tokenize the semi-cleaned data, which means splitting the text into a list of words (we will be using the split function). Finally, we look for stop words (words that don’t carry any emotion, such as “the” or “is”) and remove them from our final cleaned list of words.
Convert the text to lower case and remove all punctuation.
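Here is a minimal sketch of this sub-step. It assumes plain ASCII punctuation (string.punctuation does not cover curly quotes), and it uses a literal sample string instead of reading main.txt so that it runs on its own. The names clean_text and sample are illustrative, not taken from the original source code.

```python
import string


def clean_text(raw_text):
    """Lower-case the text and strip ASCII punctuation."""
    lower_case = raw_text.lower()
    # str.maketrans with a third argument maps each listed character to None,
    # so translate() deletes every ASCII punctuation mark.
    return lower_case.translate(str.maketrans("", "", string.punctuation))


# In the article the passage lives in main.txt; a literal string stands in
# for it here (assumption) so the sketch needs no external file.
sample = "I LOVE this game, but the ending made me sad!"
cleaned_text = clean_text(sample)
print(cleaned_text)  # i love this game but the ending made me sad
```

To work on the real passage, replace the sample string with open("main.txt", encoding="utf-8").read().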
Tokenize and remove the stop words (without NLTK).
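A rough sketch of tokenizing and filtering without NLTK could look like this. The stop-word set below is a short hand-written sample and only an assumption; a real list would be much longer. The function name tokenize_and_filter is also illustrative.

```python
# A small hand-written stop-word set (assumption: the article's full list is
# not shown, so this is only a sample).
STOP_WORDS = {
    "i", "me", "my", "we", "our", "you", "your", "he", "she", "it", "they",
    "this", "that", "is", "are", "was", "were", "be", "been", "a", "an",
    "the", "and", "but", "or", "of", "at", "by", "for", "with", "to",
    "from", "in", "on", "so", "very",
}


def tokenize_and_filter(cleaned_text, stop_words=STOP_WORDS):
    """Split on whitespace, then drop every word found in stop_words."""
    tokenized_words = cleaned_text.split()
    return [word for word in tokenized_words if word not in stop_words]


final_words = tokenize_and_filter("i love this game but the ending made me sad")
print(final_words)  # ['love', 'game', 'ending', 'made', 'sad']
```

Keeping the stop words in a set rather than a list makes each membership check fast, which starts to matter once the passage grows to thousands of words.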
Step 2: Emotion Algorithm (Basic Natural Language Processing)
Once we have cleaned the data, we need to create an external file (emotions.txt) containing words and the emotion attached to each one, for example “game”: ”happy”. Once the file is created, we read it into Python and clean it as well. After cleaning it, we check whether each word in emotions.txt also exists in the tokenized list of words we cleaned earlier. Every match contributes its emotion to a list, and that list tells us which emotions are present in the passage.
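The matching step can be sketched roughly as follows. The exact layout of emotions.txt is an assumption here: one "word": "emotion" pair per line, optionally followed by a comma. A list of lines stands in for the file so the sketch is self-contained, and the helper names are illustrative.

```python
# Characters stripped from both halves of a line: whitespace, commas,
# straight quotes and curly quotes.
STRIP_CHARS = " \t\n,'\"\u201c\u201d"


def parse_emotion_line(line):
    """Turn a line such as  "game": "happy",  into ('game', 'happy')."""
    if ":" not in line:
        return None  # blank or malformed line
    word, emotion = line.split(":", 1)
    return word.strip(STRIP_CHARS), emotion.strip(STRIP_CHARS)


def find_emotions(final_words, emotion_lines):
    """Collect the emotion of every lexicon word that occurs in final_words."""
    words = set(final_words)
    emotions = []
    for line in emotion_lines:
        parsed = parse_emotion_line(line)
        if parsed and parsed[0] in words:
            emotions.append(parsed[1])
    return emotions


# Stand-in for the lines of emotions.txt (assumed format).
emotion_lines = ['"game": "happy",', '"sad": "sad",', '"furious": "angry",']
emotion_list = find_emotions(["love", "game", "ending", "made", "sad"], emotion_lines)
print(emotion_list)  # ['happy', 'sad']
```

Note that this loop walks over the lexicon, so each lexicon word contributes its emotion at most once, no matter how often it appears in the passage.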
We have found the emotions attached to our passage, but what we really need are the statistics: we need to count how many times each emotion popped up, so that the counts can be plotted as a graph. (Make sure to add from collections import Counter in order to use Counter.)
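As a small illustration, counting could be done like this. The emotion_list below is made-up sample data, not output from the article’s passage.

```python
from collections import Counter

# Sample list of detected emotions (illustrative data only).
emotion_list = ["happy", "sad", "happy", "angry", "happy"]

# Counter builds a mapping of emotion -> number of occurrences.
emotion_counts = Counter(emotion_list)

print(emotion_counts["happy"])         # 3
print(emotion_counts.most_common(1))   # [('happy', 3)]
```

A Counter behaves like a dictionary, so emotion_counts.keys() and emotion_counts.values() can be handed straight to a plotting function.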
Final Step: Plot the Graph Using Matplotlib
Once the counting is done, it’s time to plot those counts as a bar graph, so that the result is visually pleasing and easy to read.
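One possible way to draw the bar graph is sketched below. It assumes Matplotlib is installed (pip install matplotlib); the function names, the axis labels and the default output file name graph.png are illustrative choices rather than the article’s originals.

```python
from collections import Counter


def bar_data(emotion_counts):
    """Split a Counter into parallel label and height lists, most frequent first."""
    pairs = emotion_counts.most_common()
    labels = [emotion for emotion, _ in pairs]
    heights = [count for _, count in pairs]
    return labels, heights


def plot_emotions(emotion_counts, output_path="graph.png"):
    """Draw one bar per emotion and save the figure to output_path."""
    # Imported inside the function so bar_data() stays usable even when
    # Matplotlib is not installed.
    import matplotlib.pyplot as plt

    labels, heights = bar_data(emotion_counts)
    fig, ax = plt.subplots()
    ax.bar(labels, heights)
    ax.set_xlabel("Emotion")
    ax.set_ylabel("Count")
    fig.autofmt_xdate()  # slants the x tick labels so long names do not overlap
    fig.savefig(output_path)
    plt.close(fig)
```

Calling plot_emotions(Counter(["happy", "sad", "happy"])) writes the chart to graph.png in the current directory; use plt.show() instead of savefig if you would rather see it in a window.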
Covid19 Tweet Analysis
The reason we started off with basic sentiment analysis is that tweet analysis is built entirely on it. The only thing that is going to be different here is the text we analyze. Previously, we loaded the passage from an external text file (main.txt) and analyzed it. Here, instead of an external text file, we are going to use a library called GetOldTweets3 by Mottl to fetch the text.
We are going to define a function called get_tweets, and inside that function we will write the code to fetch the tweets for a given date range, a given keyword, and a given limit on the number of tweets. The code is provided in the official GetOldTweets3 documentation. Apart from that, almost everything remains the same as in our sentiment analysis part. The source code for the tweet analysis is attached at the end of this article.
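A hedged sketch of such a get_tweets function, following the usage pattern shown in the GetOldTweets3 README, might look like this. It assumes the package is installed (pip install GetOldTweets3) and that the Twitter endpoint it relies on is still reachable, which this sketch does not check. The parameter names and the tweets_to_passage helper are illustrative.

```python
def get_tweets(keyword, since, until, max_tweets):
    """Fetch tweet texts matching keyword between two 'YYYY-MM-DD' dates."""
    # Third-party import kept inside the function so the helper below can be
    # used without the package installed.
    import GetOldTweets3 as got

    criteria = (
        got.manager.TweetCriteria()
        .setQuerySearch(keyword)
        .setSince(since)
        .setUntil(until)
        .setMaxTweets(max_tweets)
    )
    tweets = got.manager.TweetManager.getTweets(criteria)
    return [tweet.text for tweet in tweets]


def tweets_to_passage(tweet_texts):
    """Join the tweets into one passage, replacing the contents of main.txt."""
    return " ".join(tweet_texts)
```

With this in place, something like tweets_to_passage(get_tweets("coronavirus", "2020-01-01", "2020-04-01", 500)) would produce the text that then goes through the same cleaning, matching, counting and plotting steps as before. The keyword, dates and limit here are example values only.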
Natural Language Toolkit (NLTK)
NLTK helps us tokenize text, provides predefined stop-word lists, and makes it easier to determine the emotions. So now we can get rid of all the manual work we have had to do so far.
NLTK itself is fast and efficient! But it has to be installed first, and the installation can be pretty slow depending on your system. That is why I have also provided the source code without NLTK.
Tokenization, stop-word generation, and emotion generation using NLTK.
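The NLTK version of these steps could be sketched as below. Treating “emotion generation” as NLTK’s VADER sentiment scorer is an assumption, as are the function names. The sketch also assumes that NLTK is installed and that the data packages it needs (tokenizer models, the stop-word corpus and the VADER lexicon) have been fetched once with nltk.download.

```python
def analyze_with_nltk(text):
    """Return (filtered tokens, VADER polarity scores) for text."""
    # NLTK imports live inside the function so label_from_scores() below can
    # be used without NLTK installed.
    from nltk.corpus import stopwords
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    from nltk.tokenize import word_tokenize

    tokens = word_tokenize(text.lower(), "english")
    stop_words = set(stopwords.words("english"))
    final_words = [word for word in tokens if word not in stop_words]

    # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return final_words, scores


def label_from_scores(scores):
    """Reduce VADER scores to a single coarse label."""
    if scores["neg"] > scores["pos"]:
        return "negative"
    if scores["pos"] > scores["neg"]:
        return "positive"
    return "neutral"
```

Compared with the manual version, the hand-written stop-word list and emotions.txt lookup are replaced by NLTK’s own resources, at the cost of the extra installation and downloads.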
I hope you liked it. Stay tuned for more updates on new Python projects.