Twitter Data Analysis (Part 2) - Google Colab

Without any written code, our Twitter analysis is not possible. I have chosen to write the code using Google Colab (Python 3) because I am fairly familiar with it, and because it is hassle-free for you: there is no need to download or set up any software.

In this post, we will discuss what Colab is and start coding!

What is Google Colab?

Colaboratory (Colab for short) is a hosted Jupyter Notebook environment provided by Google Research (thus the name Google Colab). As it is produced by Google, your notebooks are automatically saved to your Google Drive. This allows you to write, execute, and save your code entirely in your browser using your Google account, all absolutely free! In addition, sharing your work is as simple as sharing a normal document in Google Docs: click the Share button at the top right of the page.

Where do I start?

Start by going to https://colab.research.google.com/notebooks/intro.ipynb and signing in with your Google account.

Once signed in, this popup should appear at the center of your screen.


As you can see from the screenshot, there are 5 headers that are all useful at some point:

1. Examples - These are Notebooks already created by Google that guide you through features you may want to use (e.g. guides on using Charts, Forms, Data Tables, Accessing files, etc.).

2. Recent - Shows the Notebooks you have edited most recently.

3. Google Drive - This area includes any Notebooks you have saved on Google Drive and want to access.

4. GitHub - Allows you to enter a GitHub URL and open the Notebooks stored there. GitHub is a site where users publish their work so that other users can access it.

5. Upload - Allows you to drag and drop a Notebook into the given section to open it.

Let's Start Coding

Click on ‘NEW NOTEBOOK’. From here you can start coding.

We will begin by importing the libraries that we will be using in this post and in future posts.

# This is a program that will parse tweets fetched from Twitter using Python (Colab)
# Importing the libraries
import tweepy
from textblob import TextBlob
from wordcloud import WordCloud
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use("fivethirtyeight")

All of these libraries will be used: some are needed to access the Twitter API (tweepy), and others are for the sentiment analysis we will be doing later.
Once you have this, run the cell by clicking the play button next to the code. No errors should appear; if one does, check your spelling, as this is a common mistake. Once this is done, click the '+ Code' button to create a new cell.
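If the import cell fails for a reason other than a typo, one of the libraries may not be installed in your environment. The optional sketch below checks for that using only the standard library. The helper name missing_modules is my own, not part of Colab or any of the libraries, and the pip hint assumes the package names match the module names (which is the case for these six, as far as I know).

```python
import importlib.util

# Top-level module names imported in the cell above.
REQUIRED = ["tweepy", "textblob", "wordcloud", "pandas", "numpy", "matplotlib"]

def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
    # In a Colab cell, a leading "!" runs a shell command.
    print("Try running:  !pip install " + " ".join(missing))
else:
    print("All libraries are available.")
```

If anything is listed as missing, install it in its own cell and then re-run the import cell.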

 
# Twitter API credentials
consumerKey = '-------------------------'
consumerSecret = '--------------------------------------------------'
accessToken = '--------------------------------------------------'
accessTokenSecret = '---------------------------------------------'

In this cell, we manually assign our 4 keys to variables. Many other articles suggest storing the keys as environment variables set from the command line. This is the safer and more widely accepted route, but as long as you are not sharing your code with others, writing the keys in your code is fine. If you do want to use environment variables, set a variable named consumerKey to your key from the command line, then in your code write 'import os' (first cell) and consumerKey = os.environ['consumerKey'] (second cell). Repeat this for all 4 keys, changing the variable name each time. Note that a Colab notebook runs on Google's servers, so environment variables set on your own computer are not visible to it.
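For readers taking the environment-variable route, here is a minimal sketch of reading all 4 keys at once. It assumes the environment variables carry the same names as the variables in this post. The helper load_keys is hypothetical (my own name), and the demo uses placeholder values rather than real keys.

```python
import os

# The four variable names used in this post; the environment variables
# are assumed to use exactly the same names.
KEY_NAMES = ["consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"]

def load_keys(env=os.environ):
    """Return a dict of the four keys, with a clear error if any is missing."""
    missing = [name for name in KEY_NAMES if name not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in KEY_NAMES}

# Demo with placeholder values standing in for real keys.
demo_env = {name: "placeholder-" + name for name in KEY_NAMES}
keys = load_keys(demo_env)
print(sorted(keys))
```

In a real notebook you would call load_keys() with no argument and then assign, for example, consumerKey = keys["consumerKey"].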

Next, we will use our keys and the OAuthHandler class from the 'tweepy' library to create an authentication object. This object is created with both consumer keys. We then add our token keys, accessToken and accessTokenSecret.

# Creating the authentication object
authenticate = tweepy.OAuthHandler(consumerKey, consumerSecret)
# Setting the access token and access token secret
authenticate.set_access_token(accessToken, accessTokenSecret)
# Creating the API object while passing the authentication information
api = tweepy.API(authenticate, wait_on_rate_limit=True)

The wait_on_rate_limit parameter determines whether tweepy automatically waits for Twitter's rate limits to replenish. In our case, we have set it to True, so the code pauses and continues once the limit resets rather than stopping with an error.

Once this is done, we will retrieve a number of recent tweets from a Twitter account of the user's choice.

# Extracting tweets from the twitter user
# input() already returns a string, so only the count needs converting
account = input("Enter the twitter account you would like to see: @")
num = int(input("Enter the number of recent tweets you would like to use: "))
posts = api.user_timeline(screen_name=account, count=num, lang="en",
                          tweet_mode="extended")
print("Show the", num, "recent tweets \n")
# Print each tweet with a running number starting at 1
for i, tweet in enumerate(posts[0:num], start=1):
    print(str(i) + ') ' + tweet.full_text + '\n')

When we run this cell and enter the Twitter account along with the number of recent tweets to gather, we should see a list of tweets. In this case, I will be using @scottturner and 100 recent tweets. With this input, we receive the following output:

(The screenshot only shows 18, but the list goes up to 100.)
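If you want to see how the numbered output is built without calling the API, the sketch below reproduces the numbering step on stand-in objects. The fake_posts list and the helper number_tweets are my own illustrations. The only thing they share with the real results is the full_text attribute that the cell above reads.

```python
from types import SimpleNamespace

# Stand-in objects exposing the same full_text attribute that the
# cell above reads; the texts are made up for illustration.
fake_posts = [
    SimpleNamespace(full_text="Hello world"),
    SimpleNamespace(full_text="Second post"),
]

def number_tweets(posts):
    """Return one 'N) text' string per tweet, counting from 1."""
    return [f"{i}) {tweet.full_text}" for i, tweet in enumerate(posts, start=1)]

for line in number_tweets(fake_posts):
    print(line + "\n")
```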

With these tweets, we can create a table using the pd.DataFrame() function from the 'pandas' library.

# Creating a data frame with a column called Tweets so it looks nice
df = pd.DataFrame([tweet.full_text for tweet in posts], columns=['Tweets'])
df.index = df.index + 1
df.head()

By default, a data frame numbers its rows from 0. We add 1 to the index so that the table starts at 1 and matches the numbering of the printed tweets.
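To see the effect of the index shift on its own, here is a small sketch using made-up strings in place of real tweets. The names sample_texts and df_demo are for illustration only.

```python
import pandas as pd

# Stand-in tweet texts (made up for illustration, not real API output).
sample_texts = ["First tweet", "Second tweet", "Third tweet"]

df_demo = pd.DataFrame(sample_texts, columns=["Tweets"])
print(list(df_demo.index))       # default row labels: [0, 1, 2]

df_demo.index = df_demo.index + 1
print(list(df_demo.index))       # shifted row labels: [1, 2, 3]
print(df_demo.loc[1, "Tweets"])  # row label 1 is now the first tweet
```

Note that .loc looks rows up by label, so after the shift the first tweet is at label 1, while .iloc[0] still refers to the first row by position.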


In the next blog, we will be carrying out a sentiment analysis of the tweets we have gathered. 

Be sure to leave a comment on anything you're unsure of. I will try my best to reply ASAP!
