
Sentiment Analysis Tutorial in Python: Trump's Tweets

How other people perceive a product or service has a big impact on our everyday decisions. People used to rely on the opinions of friends and relatives or on published product reviews, but the Internet has changed that. Today, opinions from people around the world are collected from reviews on e-commerce sites, blogs, and social networks. To turn this data into useful information about how a product or service is perceived, we need sentiment analysis.

What is sentiment analysis and why do we need it

Sentiment analysis is the computational study of opinions, sentiments, and emotions expressed in text. Companies use it more and more because the extracted information helps them improve and monetize their products and services.

Words carry different kinds of sentiment: they can be positive, negative, or neutral (with no emotional overtone). To analyze the sentiment of a text, we need to know the polarity of its words so that we can classify the text as positive, negative, or neutral. Sentiment lexicons serve exactly this purpose.

Common approaches for classifying sentiment 

Sentiment analysis can be done in three ways: with machine learning (ML) algorithms, with dictionaries and lexicons, or with a combination of both.

The ML-based approach has become very popular because a trained model can learn to recognize many different ways in which sentiment is expressed in text.
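To make the ML approach concrete, here is a minimal sketch of a multinomial Naive Bayes classifier, a classic supervised baseline for sentiment, written with the standard library only. The function names and the four training sentences are made up for illustration; a real model needs a large labeled corpus.

```python
import math
from collections import Counter


def train_naive_bayes(texts, labels):
    """Count class frequencies and per-class word frequencies (whitespace tokens)."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior, using Laplace (add-one) smoothing."""
    class_counts, word_counts, vocab = model
    n_texts = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_label / n_texts)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training carry no evidence
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# A tiny, made-up training set; a real model needs a large labeled corpus
train_texts = ["great job honor", "wonderful great people", "terrible fake news", "bad sad disaster"]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_texts, train_labels)

print(predict(model, "great honor"))         # positive
print(predict(model, "fake news disaster"))  # negative
```

The model only knows the words it was trained on, which is why the size and quality of the labeled training set matter so much for this approach.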

For the lexicon-based approach, various dictionaries with polarity scores are available; they tell us the connotation of each word. One advantage of this approach is that it needs no training set, so even a small amount of data can be classified. The drawback is that sentiment lexicons still miss many words, which lowers the quality of the classification.
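A minimal sketch of the lexicon-based idea, assuming a tiny hand-made lexicon with AFINN-style integer scores (the words and values are illustrative, not taken from AFINN): sum the scores of the words found in the lexicon and classify by the sign. It also shows the weakness described above: a clearly negative word that is missing from the lexicon simply contributes nothing.

```python
# Toy lexicon with AFINN-style integer scores; words and values are illustrative, not taken from AFINN
LEXICON = {"great": 3, "honor": 2, "strong": 2, "bad": -3, "fake": -3, "sad": -2}


def lexicon_score(text):
    """Return (total score, number of lexicon hits, number of words) for a text."""
    words = text.lower().split()
    hits = [LEXICON[word] for word in words if word in LEXICON]
    return sum(hits), len(hits), len(words)


def classify(text):
    """Label a text by the sign of its summed lexicon score."""
    total, _, _ = lexicon_score(text)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


print(classify("a great honor"))                 # positive
print(classify("fake news"))                     # negative
print(classify("presidential harassment"))       # neutral
print(lexicon_score("presidential harassment"))  # (0, 0, 2): no word was found
```

No training data is involved, but "harassment" ends up neutral only because the toy lexicon lacks it.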

Combining ML and lexicon-based techniques is less common, but it can produce better results than either approach used on its own.
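There are many ways to combine the two. One simple scheme, shown here as a minimal sketch rather than any specific published method, is weak supervision: a lexicon labels the texts it can, and an ML model (here Naive Bayes) trained on those labels learns sentiment-bearing words that the lexicon lacks. The seed lexicon, the texts, and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical seed lexicon and unlabeled tweets, for illustration only
SEED_LEXICON = {"great": 1, "wonderful": 1, "bad": -1, "terrible": -1}
unlabeled = [
    "great rally tremendous crowd",
    "wonderful tremendous day",
    "terrible witch hunt",
    "bad deal witch hunt",
    "tremendous success",  # contains no seed word, so the lexicon cannot label it
]


def lexicon_label(text):
    """Weak label from the seed lexicon, or None when no seed word occurs."""
    score = sum(SEED_LEXICON.get(word, 0) for word in text.split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return None


# Step 1: the lexicon labels every text it can (weak supervision)
labeled = [(text, lexicon_label(text)) for text in unlabeled if lexicon_label(text) is not None]

# Step 2: train a multinomial Naive Bayes model on those weak labels
priors = Counter(label for _, label in labeled)
word_counts = {"positive": Counter(), "negative": Counter()}
for text, label in labeled:
    word_counts[label].update(text.split())
vocab = set(word_counts["positive"]) | set(word_counts["negative"])


def hybrid_predict(text):
    """Pick the label with the highest Laplace-smoothed log-posterior."""
    def log_posterior(label):
        total = sum(word_counts[label].values())
        return math.log(priors[label] / len(labeled)) + sum(
            math.log((word_counts[label][word] + 1) / (total + len(vocab)))
            for word in text.split()
            if word in vocab
        )
    return max(("positive", "negative"), key=log_posterior)


print(lexicon_label("tremendous success"))   # None
print(hybrid_predict("tremendous success"))  # positive, learned from co-occurring seed words
```

The lexicon removes the need for hand-labeled training data, while the model generalizes beyond the lexicon's vocabulary.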

Dictionaries are the core of lexicon-based sentiment analysis. The most popular general-purpose lexicons are AFINN, Bing, and NRC, and several of them are available from Python (for example, the afinn package on PyPI, and NLTK's opinion_lexicon corpus, which is Bing Liu's lexicon). Each lexicon assigns words a polarity, though in different forms: AFINN gives integer scores from -5 to +5, Bing labels words as positive or negative, and NRC tags words with positive/negative sentiment as well as emotions such as joy or anger.

For Python developers, two sentiment tools are especially useful: VADER and TextBlob. VADER is a rule- and lexicon-based tool tuned to the sentiment expressed in social media posts; its lexicon lists tokens labeled by their semantic orientation, and its rules adjust the scores for cues such as negation and punctuation. TextBlob is a general text-processing library with a simple API for tasks such as noun phrase extraction, part-of-speech tagging, sentiment analysis, and classification.
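To illustrate what "rule- and lexicon-based" means, the sketch below layers two simple rules (negation and exclamation emphasis) on top of a toy lexicon. It only imitates the idea on a small scale: the lexicon, the negation list, and both constants are invented, and this is not VADER's actual algorithm. For real work, use VADER itself, for example NLTK's SentimentIntensityAnalyzer (nltk.sentiment.vader) or the vaderSentiment package.

```python
# Toy lexicon and rule constants are invented for illustration; they are NOT VADER's actual values
RULE_LEXICON = {"good": 2.0, "great": 3.0, "bad": -2.0, "unfair": -2.0}
NEGATIONS = {"not", "no", "never", "isn't", "don't"}
NEGATION_FACTOR = -0.5    # a word right after a negation flips sign and is dampened
EXCLAMATION_BOOST = 0.3   # each "!" (up to 3) pushes a non-zero score further from zero


def rule_based_score(text):
    """Sum lexicon scores, then apply simple negation and exclamation rules."""
    tokens = text.lower().replace("!", " ! ").split()
    score = 0.0
    for i, token in enumerate(tokens):
        value = RULE_LEXICON.get(token, 0.0)
        if value and i > 0 and tokens[i - 1] in NEGATIONS:
            value *= NEGATION_FACTOR
        score += value
    bangs = min(tokens.count("!"), 3)
    if score > 0:
        score += EXCLAMATION_BOOST * bangs
    elif score < 0:
        score -= EXCLAMATION_BOOST * bangs
    return score


for sample in ["good", "not good", "great!!", "no comment"]:
    print(sample, "->", round(rule_based_score(sample), 2))
# good -> 2.0, not good -> -1.0, great!! -> 3.6, no comment -> 0.0
```

A pure lexicon lookup would score "not good" as positive; the negation rule is what turns it negative.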

Steps before sentiment analysis

In this tutorial, we will build a lexicon-based sentiment classifier for Donald Trump's tweets with the help of TextBlob and see which sentiments generally prevail in them.

As with any data exploration, some steps come before the analysis itself: stating the problem and preparing the data. Since our problem is already stated, let's concentrate on data preparation.

We will fetch tweets directly from Twitter. They arrive as raw, unstructured status objects, so we need to organize them into a DataFrame and clean the text by removing links and stopwords.

Building sentiment classifier

First of all, we import the packages needed for the task (install any that are missing with pip beforehand).

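In [41]:
import re

import matplotlib.pyplot as plt
import nltk
import nltk.corpus as corp
import numpy as np
import pandas as pd
import tweepy
from textblob import TextBlob

# NLTK's stopword list is a separate one-time download
nltk.download('stopwords')
# TextBlob's tokenizer needs its own corpora: run `python -m textblob.download_corpora` once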

The next step is to connect our app to the Twitter API. Provide your app's credentials: the function below uses them to authenticate, and the resulting API object lets us extract tweets from Donald Trump's account.

In [4]:
# Placeholders: replace with the keys and tokens of your own Twitter developer app
CONSUMER_KEY    = "Key"
CONSUMER_SECRET = "Secret"

ACCESS_TOKEN  = "Token"
ACCESS_SECRET = "Secret"
In [7]:
def twitter_access():
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    api = tweepy.API(auth)
    return api

twitter = twitter_access()
In [9]:
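# user_timeline returns at most 200 tweets per request, so Cursor pages through the timeline
tweets = list(tweepy.Cursor(twitter.user_timeline, screen_name="realDonaldTrump", count=200).items(600))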
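Because the timeline endpoint caps each request at 200 tweets, collecting more means paging backwards with the max_id parameter, which returns tweets with IDs less than or equal to the given value. Helpers such as tweepy's Cursor automate this pattern. The sketch below simulates the loop without network access; fetch_page and the fake timeline are hypothetical stand-ins for the real API call and data.

```python
def fetch_page(timeline, count, max_id=None):
    """Hypothetical stand-in for one API request: up to `count` newest tweets with id <= max_id."""
    eligible = [t for t in timeline if max_id is None or t["id"] <= max_id]
    return eligible[:count]


def fetch_all(timeline, wanted, page_size=200):
    """Page backwards through a newest-first timeline until `wanted` tweets are collected."""
    collected, max_id = [], None
    while len(collected) < wanted:
        page = fetch_page(timeline, min(page_size, wanted - len(collected)), max_id)
        if not page:
            break  # the timeline is exhausted
        collected.extend(page)
        # max_id is inclusive, so the next request asks for tweets older than the oldest one seen
        max_id = page[-1]["id"] - 1
    return collected


# Fake newest-first timeline of 450 tweets (ids 1000 down to 551)
fake_timeline = [{"id": i} for i in range(1000, 550, -1)]
fake_tweets = fetch_all(fake_timeline, wanted=600)
print(len(fake_tweets))  # 450: fewer tweets exist than were requested
```

Subtracting 1 from the oldest ID prevents the last tweet of one page from reappearing as the first tweet of the next.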

This is what a single raw tweet object looks like:

In [81]:
tweets[0]
Out[81]:
Status(_api=<tweepy.api.API object at 0x7f987f0ce240>, _json={'created_at': 'Wed Jun 26 02:34:41 +0000 2019', 'id': 1143709133234954241, 'id_str': '1143709133234954241', 'text': 'Presidential Harassment!', 'truncated': False, 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': []}, 'source': '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>', ..., 'user': {'id': 25073877, 'id_str': '25073877', 'name': 'Donald J. Trump', 'screen_name': 'realDonaldTrump', 'location': 'Washington, DC', 'description': '45th President of the United States of America🇺🇸', ..., 'followers_count': 61369691, 'friends_count': 47, ...}, ..., 'retweet_count': 10387, 'favorite_count': 48141, 'favorited': False, 'retweeted': False, 'lang': 'in'}, created_at=datetime.datetime(2019, 6, 26, 2, 34, 41), id=1143709133234954241, id_str='1143709133234954241', text='Presidential Harassment!', truncated=False, ..., source='Twitter for iPhone', source_url='http://twitter.com/download/iphone', ..., author=User(_api=<tweepy.api.API object at 0x7f987f0ce240>, _json={...}, id=25073877, name='Donald J. Trump', screen_name='realDonaldTrump', ...), user=User(...), geo=None, coordinates=None, place=None, contributors=None, is_quote_status=False, retweet_count=10387, favorite_count=48141, favorited=False, retweeted=False, lang='in')

Not very informative, huh? Let's make our dataset look more legible.

In [101]:
tweetdata = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=["tweets"])
In [102]:
tweetdata["Created at"] = [tweet.created_at for tweet in tweets]
tweetdata["retweets"] = [tweet.retweet_count for tweet in tweets]
tweetdata["source"] =  [tweet.source for tweet in tweets]
tweetdata["favorites"] = [tweet.favorite_count for tweet in tweets]

And this is how it looks now. Much better, isn't it?

In [103]:
tweetdata.head()
Out[103]:
   tweets                                              Created at           retweets  source              favorites
0  Presidential Harassment!                            2019-06-26 02:34:41  10387     Twitter for iPhone  48141
1  Senator Thom Tillis of North Carolina has real...   2019-06-25 22:20:42  11127     Twitter for iPhone  45202
2  Staff Sgt. David Bellavia - today, we honor yo...   2019-06-25 21:38:42  11455     Twitter for iPhone  48278
3  Today, it was my great honor to present the Me...   2019-06-25 20:27:19  10389     Twitter for iPhone  44485
4  ....Martha is strong on Crime and Borders, the...   2019-06-25 19:25:20  9817      Twitter for iPhone  52995

The next step is to clean the tweets of words that carry no meaning for sentiment (stopwords, links, and mentions) and to enrich the dataset so that, in addition to the default tweet data, it contains each tweet's sentiment status (positive, negative, or neutral), sentiment score, and subjectivity.

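In [104]:
# NLTK English stopwords plus Twitter-specific noise tokens; a set makes lookups fast
stopword = set(corp.stopwords.words('english') + ['rt', 'https', 'co', 'u', 'go'])

def clean_tweet(tweet):
    """Lowercase a tweet, strip @mentions, URLs and punctuation, and drop stopwords."""
    tweet = tweet.lower()
    # Raw string avoids invalid-escape warnings; \w in the mention pattern also covers underscores
    tweet_list = re.sub(r"(@\w+)|(\w+://\S+)|([^0-9A-Za-z \t])", " ", tweet).split()
    return ' '.join(word for word in tweet_list if word not in stopword)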
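In [105]:
scores = []    # polarity of each tweet, from -1.0 (negative) to 1.0 (positive)
status = []    # 'positive', 'neutral' or 'negative'
sub = []       # subjectivity, from 0.0 (objective) to 1.0 (subjective)
fullText = []  # all cleaned words (not used further in this tutorial)
for tweet in tweetdata['tweets']:
    analysis = TextBlob(clean_tweet(tweet))
    fullText.extend(analysis.words)
    value = analysis.sentiment.polarity
    subject = analysis.sentiment.subjectivity
    # Only a polarity of exactly 0.0 counts as neutral
    if value > 0:
        sent = 'positive'
    elif value == 0:
        sent = 'neutral'
    else:
        sent = 'negative'
    scores.append(value)
    status.append(sent)
    sub.append(subject)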
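This loop treats only a polarity of exactly 0 as neutral, so a tweet scoring 0.01 is counted as positive. If that is too strict for your data, a small neutral band is a common adjustment. The sketch below assumes a band of 0.05, an arbitrary value you would tune; the function name and sample values are hypothetical.

```python
def polarity_to_status(polarity, neutral_band=0.05):
    """Map a polarity in [-1, 1] to a label; |polarity| <= neutral_band counts as neutral.

    neutral_band is a hypothetical tuning knob; 0.0 reproduces the exact-zero rule.
    """
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# Illustrative polarity values, not taken from the dataset
for p in [0.0, 0.02, 0.1, -0.3]:
    print(p, polarity_to_status(p), polarity_to_status(p, neutral_band=0.0))
# 0.02 is neutral with the band but positive with the exact-zero rule
```

In the tutorial's loop, the if/elif chain could then be replaced with sent = polarity_to_status(value).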
In [106]:
tweetdata['sentimental_score'] = scores
tweetdata['sentiment_status'] = status
tweetdata['subjectivity'] = sub
# Drop the engagement columns we no longer need
tweetdata.drop(columns=['retweets', 'source', 'favorites'], inplace=True)
In [107]:
tweetdata.head()
Out[107]:
   tweets                                              Created at           sentimental_score  sentiment_status  subjectivity
0  Presidential Harassment!                            2019-06-26 02:34:41  0.000000           neutral           0.000000
1  Senator Thom Tillis of North Carolina has real...   2019-06-25 22:20:42  0.081481           positive          0.588889
2  Staff Sgt. David Bellavia - today, we honor yo...   2019-06-25 21:38:42  0.333333           positive          1.000000
3  Today, it was my great honor to present the Me...   2019-06-25 20:27:19  0.400000           positive          0.375000
4  ....Martha is strong on Crime and Borders, the...   2019-06-25 19:25:20  0.086667           positive          0.396667

For a better understanding of the obtained results, let's do some visualization.

In [109]:
positive = len(tweetdata[tweetdata['sentiment_status'] == 'positive'])
negative = len(tweetdata[tweetdata['sentiment_status'] == 'negative'])
neutral = len(tweetdata[tweetdata['sentiment_status'] == 'neutral'])
In [110]:
fig, ax = plt.subplots(figsize=(10, 5))
# One bar per sentiment status, labeled on the x-axis
labels = ['Negative', 'Neutral', 'Positive']
tweet_counts = [negative, neutral, positive]
ax.bar(labels, tweet_counts, color=['orange', 'grey', 'green'], edgecolor='black', width=0.8)
ax.set_xlabel('Sentiment status', fontsize=15)
ax.set_ylabel('Number of tweets', fontsize=15)
ax.set_title("Donald Trump's Twitter sentiment status", fontsize=20)
plt.show()
[Figure: Donald Trump's Twitter sentiment status, a bar chart of the number of negative, neutral, and positive tweets]

Conclusion

Sentiment analysis is a great way to explore emotions and opinions in society. We created a basic sentiment classifier that can be used to analyze textual data from social networks. The lexicon-based approach also lets you build your own lexicons, so you can fine-tune sentiment scoring to the task, the text, and the goal of your analysis.
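For example, the results table shows TextBlob giving "Presidential Harassment!" a polarity of exactly 0, although the phrase is clearly negative. A minimal sketch of lexicon tuning, assuming hypothetical base and domain-specific lexicons (both sets of words and scores are invented), merges domain entries into a base lexicon and averages the scores of the words it finds, which is one simple aggregation choice among many.

```python
# Hypothetical base lexicon and domain-specific additions, for illustration only
BASE_LEXICON = {"great": 0.8, "strong": 0.4, "honor": 0.5, "bad": -0.7}
DOMAIN_LEXICON = {"harassment": -0.6, "witch": -0.5, "hunt": -0.2}


def average_polarity(text, lexicon):
    """Average polarity of the words found in the lexicon (0.0 if none are found)."""
    hits = [lexicon[word] for word in text.lower().split() if word in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


tuned = {**BASE_LEXICON, **DOMAIN_LEXICON}  # domain entries override base ones on conflict
text = "presidential harassment"
print(average_polarity(text, BASE_LEXICON))  # 0.0
print(average_polarity(text, tuned))         # -0.6
```

Letting domain entries win on conflict means you can also re-score general words whose connotation differs in your domain.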