Twitter Data Analysis (Part 3) - Sentiment Analysis



In this blog, we will learn what sentiment analysis is and why we use it, and then carry out sentiment analysis on our own tweets.

What is Sentiment Analysis?

Sentiment analysis is the process of analyzing text to determine its attitude or emotion. The text can range from a single word to an entire document. Sentiment analysis is often used within businesses. For instance, a company can analyze reviews of a product or service to determine how its customers truly feel and benefit from that insight. If a company has a huge number of comments/reviews, a word cloud can surface the most common words in a matter of minutes (even seconds, depending on the volume), whereas analyzing the same text manually could take hours or even days.

In our Twitter analysis, we will be using TextBlob's sentiment function, which outputs two properties: subjectivity and polarity. Subjectivity tells you how opinionated a text is, ranging from 0 (very objective/factual) to +1 (very opinionated). Polarity tells you how positive, negative, or neutral a text is, ranging from -1 (very negative) to +1 (very positive).
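To make this concrete, here is a minimal sketch of what the sentiment function returns for a made-up sentence (the exact numbers come from TextBlob's internal lexicon, so treat them as illustrative):

from textblob import TextBlob

# Illustrative only: a made-up sentence to show the two properties
sample = TextBlob("I absolutely love this phone, but the battery is terrible.")
print(sample.sentiment)               # Sentiment(polarity=..., subjectivity=...)
print(sample.sentiment.polarity)      # A float between -1 and +1
print(sample.sentiment.subjectivity)  # A float between 0 and +1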

Let's Start

Before we carry out this analysis, we must preprocess our tweets. As you can see from the screenshot, many tweets include URL links, hashtags (#), @mentions, and RT markers. If we include these in our analysis, we will receive inaccurate results. Therefore, we must clean our text by removing them, leaving us with only the raw text of each tweet. We will be using Python's regular expressions module (re) to clean our text.

# Cleaning the text by removing hashtags, @mentions, URLs and RT markers.
import re

def CleanTxt(text):
  text = re.sub(r'@[A-Za-z0-9_]+', '', text)  # Remove any @mentions
  text = re.sub(r'#', '', text)               # Remove the hashtag (#) symbol
  text = re.sub(r'https?:\/\/\S+', '', text)  # Remove any URLs
  text = re.sub(r'RT[\s]+', '', text)         # Remove any RT (retweet) markers
  return text

df['Tweets'] = df['Tweets'].apply(CleanTxt)  # Cleaning the text
df  # Showing the cleaned data frame

When we run this, a table should be formed that contains our cleaned tweets. 
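As a quick sanity check, you can also call CleanTxt directly on a made-up raw tweet (purely illustrative, not from the real data):

# Illustrative only: a made-up raw tweet
print(CleanTxt("RT @user Loving the new update! #tech https://t.co/abc123"))
# The @mention, hashtag symbol, URL and RT marker are all stripped out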




Now that we have cleaned the text, we will use the TextBlob library and its sentiment function to calculate polarity and subjectivity. We will create two functions (one for subjectivity, one for polarity), apply them to every tweet, and store the results as additional columns.

# Creating a function to get the subjectivity.
from textblob import TextBlob

def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Creating a function to get the polarity. 
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Creating two columns to store subjectivity and polarity
df['Subjectivity'] = df['Tweets'].apply(getSubjectivity)
df['Polarity'] = df['Tweets'].apply(getPolarity)

# Outputting the new data frame
df

When running this cell, a similar output should be displayed:

Following this, we will combine all the words from the tweets and use the WordCloud library to produce a word cloud. A word cloud is an image that shows the most frequent words in a text: the more often a word is used, the bigger its font size.

# Creating a word cloud to visualise the most frequent words.
from wordcloud import WordCloud
import matplotlib.pyplot as plt

allWords = ' '.join([twts for twts in df['Tweets']])  # Joins all tweets into one string
wordCloud = WordCloud(width=1000, height=1000, random_state=2,
                      max_font_size=200).generate(allWords)
plt.imshow(wordCloud, interpolation='bilinear')  # Bilinear interpolation smooths the image
plt.axis('off')
plt.show()
      

This code will create a word cloud. Your image may differ from mine, as it depends entirely on the Twitter account you use.




In our table with all our tweets and their subjectivity and polarity values, we will create another column that labels each tweet as positive, negative, or neutral.

# Classifying each tweet as Positive, Neutral or Negative based on its polarity
def getAnalysis(score):
  if score < 0:
    return 'Negative'
  elif score == 0:
    return 'Neutral'
  else:
    return 'Positive'

df['Analysis'] = df['Polarity'].apply(getAnalysis)
df

This will create an output similar to this:


Now that we have many tweets, each with subjectivity and polarity values, we can create a scatter graph using Pyplot (plt).

# Plotting a graph of Subjectivity against Polarity
plt.figure(figsize=(8,6))
plt.scatter(df['Polarity'], df['Subjectivity'], color='Blue')  # One point per tweet
# Labelling the axes and title
plt.title('Twitter Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()

Now, of course, you can change the size of the graph and the colour of the points to your liking. Passing the full Polarity and Subjectivity columns to plt.scatter plots every tweet in one call, so it works regardless of whether you shifted the data frame index when displaying the table.
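For example, here is one possible variation (the colour mapping is purely my own choice, not part of the original walkthrough) that colours each point by its sentiment label:

# Illustrative variation: colour each point by its Analysis label
colours = {'Positive': 'green', 'Neutral': 'grey', 'Negative': 'red'}
plt.figure(figsize=(8,6))
plt.scatter(df['Polarity'], df['Subjectivity'], c=df['Analysis'].map(colours))
plt.title('Twitter Sentiment Analysis')
plt.xlabel('Polarity')
plt.ylabel('Subjectivity')
plt.show()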

Next, we will display the number of positive, negative, and neutral tweets and also calculate the percentage of each.

df['Analysis'].value_counts()

(Make sure each code snippet is run in a separate cell.)

# Finding the percentage of positive tweets
ptweets = df[df.Analysis == 'Positive']['Tweets']
round((ptweets.shape[0] / df.shape[0]) * 100, 1)

# Finding the percentage of negative tweets
ntweets = df[df.Analysis == 'Negative']['Tweets']
round((ntweets.shape[0] / df.shape[0]) * 100, 1)

# Finding the percentage of neutral tweets
neutweets = df[df.Analysis == 'Neutral']['Tweets']
round((neutweets.shape[0] / df.shape[0]) * 100, 1)
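As a shortcut (a minimal sketch, not how the walkthrough above does it), pandas can also produce all three percentages in a single call:

# All three percentages in one call
round(df['Analysis'].value_counts(normalize=True) * 100, 1)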

When running the first cell (df['Analysis'].value_counts()), you should see a similar table to this:



When running the other cells, I found that 36% of my tweets were positive, 11% negative, and 53% neutral.

Finally, now that we have the number of tweets that are positive, negative, or neutral, we can create a bar chart to represent this. This will be done in a very similar way to the subjectivity-against-polarity scatter graph.

# Creating a bar chart to visualise the counts
df['Analysis'].value_counts().plot(kind='bar')
plt.title('Twitter Sentiment Analysis')
plt.xlabel('Sentiment')
plt.ylabel('Count')
plt.show()




Thanks for reading this post! I hope all of you have learned something here! 
Be sure to leave a comment on anything you're unsure of, I will try my best to reply ASAP!
