How To Scrape Hashtags with Twitter API using Python - Rapid Blog (2024)

Table of Contents

  • Get Access To Twitter API
    • 1. Signup for RapidAPI Account
    • 2. Navigate to Twitter API Console
    • 3. Subscribe to Twitter API
  • How To Use The Twitter API with Python
  • Building a Hashtag Generator Tool using the Twitter API and Python
    • Prerequisites
    • Development Steps
    • Testing Steps
  • Power Up Your Social Apps with RapidAPI

Twitter is a treasure trove of information. Despite the strict limit on content length, people have embraced it to hold meaningful conversations with celebrities and strangers alike. Put to smart use, it opens up even more opportunities, thanks to the advanced Twitter search feature.

Thankfully, Twitter search is also available as an API on the RapidAPI marketplace, courtesy of SocialMiner. In this blog post, we build a quick tool that leverages Twitter to generate reports on hashtags. Hashtags are an excellent way of instantly finding information related to a trend or a domain. But given the sheer volume and lightning speed of tweets, it is better to scrape the relevant hashtags than to follow them by hand.

If you are a developer, you can use the Twitter API alongside Python to build a tool that scrapes tweets containing the hashtags you are interested in and compiles them into a report. This post shows you the exact steps to build this Python script and demonstrates how you can put it to good use.

But first, let’s get to know the Twitter API.

Get Access To Twitter API

1. Signup for RapidAPI Account

To begin using the Twitter API, you’ll first need to sign up for a free RapidAPI developer account. With this account, you get a universal API Key to access all APIs hosted in RapidAPI.

RapidAPI is the world’s largest API marketplace, with over 10,000 APIs and a community of over 1,000,000 developers. Our goal is to help developers find and connect to APIs to help them build amazing apps.

2. Navigate to Twitter API Console

You can access the Twitter API directly through the API Console.


3. Subscribe to Twitter API

Once you are in the API console, click on the “Pricing” tab to look at the subscription tiers available for Twitter API.

Twitter API is available on a paid subscription model only. Under the BASIC plan, you get 1000 API calls per month for $5. To evaluate this API as part of this post, you can subscribe to the BASIC plan.

How To Use The Twitter API with Python

The Twitter API offers quite a few options. You can check out the API endpoints under the left panel on the main API console.


Apart from fetching specific tweets, profile details, and the followers and following lists for a Twitter handle, this API also supports advanced tweet search. The GET Search API endpoint offers a hashtag search that is equivalent to searching on the Twitter website.

Upon selecting the GET Search API endpoint, you get a bunch of optional parameters to refine your search query.


The parameters are mostly self-explanatory. Triggering the API with these default parameter values will fetch the tweets containing ‘#trump’ on CNN’s Twitter handle between 1st January 2018 and 10th October 2020.
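As a rough sketch of what such a request looks like from Python, the snippet below builds (but does not send) a GET request for the search endpoint using only the standard library. The URL, the host, and the `hashtag` parameter are the ones used by the script later in this post. The helper name `build_search_request` is hypothetical, and the names of the other console parameters (date range, account) are not shown in this post, so they are left out.

```python
import urllib.parse
import urllib.request

# Endpoint and host as used by the script later in this post.
SEARCH_URL = "https://twitter32.p.rapidapi.com/getSearch"
RAPIDAPI_HOST = "twitter32.p.rapidapi.com"


def build_search_request(hashtag, api_key):
    """Build (without sending) a GET request for the hashtag search endpoint."""
    query = urllib.parse.urlencode({"hashtag": hashtag})
    headers = {
        "x-rapidapi-key": api_key,
        "x-rapidapi-host": RAPIDAPI_HOST,
    }
    return urllib.request.Request(f"{SEARCH_URL}?{query}", headers=headers, method="GET")


request = build_search_request("api", "<YOUR_RAPIDAPI_KEY>")
print(request.full_url)
# prints: https://twitter32.p.rapidapi.com/getSearch?hashtag=api
```

Nothing is sent until the request is actually opened, so experimenting with the query string this way does not consume any of the monthly API calls.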

The API response contains a paginated subset of tweets matching the search criteria.


Expanding each tweet id reveals a lot of information that we typically associate with a tweet.

"1314406909319901184": {
    "created_at": "Fri Oct 09 03:26:42 +0000 2020",
    "id": 1314406909319901200,
    "id_str": "1314406909319901184",
    "full_text": "#HypocriteInChief @realDonaldTrump , whos is #ProLife and #AntiAbortion , was given #COVID19 drug #Regeneron that is developed from aborted fetuses . #Trump wants everyone to have #Regeneron https://t.co/NRUeDlZGR2 #auspol #uspoli",
    "truncated": false,
    "display_text_range": [2 items],
    "entities": {4 items},
    "source": "<a href=\"https://mobile.twitter.com\" rel=\"nofollow\">Twitter Web App</a>",
    "in_reply_to_status_id": null,
    "in_reply_to_status_id_str": null,
    "in_reply_to_user_id": null,
    "in_reply_to_user_id_str": null,
    "in_reply_to_screen_name": null,
    "user_id": 927388044,
    "user_id_str": "927388044",
    "geo": null,
    "coordinates": null,
    "place": null,
    "contributors": null,
    "is_quote_status": false,
    "retweet_count": 19,
    "favorite_count": 29,
    "reply_count": 1,
    "quote_count": 1,
    "conversation_id": 1314406909319901200,
    "conversation_id_str": "1314406909319901184",
    "favorited": false,
    "retweeted": false,
    "possibly_sensitive": false,
    "possibly_sensitive_editable": true,
    "card": {6 items},
    "lang": "en",
    "supplemental_language": null
}

You can see that this tweet contains the hashtag ‘#trump’. Apart from the tweet text, the creation time and the Twitter user id are the fields you will most commonly look for.

To get a better idea of using the API, try some other hashtags and different date ranges to mix things up.
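To make the field names concrete, here is a minimal sketch that pulls the commonly needed fields out of one tweet record. The record is an abbreviated, made-up example that reuses field names from the response above, and `summarize_tweet` is a hypothetical helper, not part of the API.

```python
import re
from datetime import datetime

# Abbreviated, made-up tweet record using field names from the response above.
sample_tweet = {
    "created_at": "Fri Oct 09 03:26:42 +0000 2020",
    "id_str": "1314406909319901184",
    "user_id_str": "927388044",
    "full_text": "Reading about #API design and #Python tooling",
}


def summarize_tweet(tweet):
    """Return the fields most reports need from one tweet record."""
    # created_at uses the format "Fri Oct 09 03:26:42 +0000 2020".
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        "tweet_id": tweet["id_str"],
        "user_id": tweet["user_id_str"],
        "created": created,
        # Every word that follows a '#' in the tweet text.
        "hashtags": re.findall(r"#(\w+)", tweet["full_text"]),
    }


summary = summarize_tweet(sample_tweet)
print(summary["created"].isoformat())  # prints: 2020-10-09T03:26:42+00:00
print(summary["hashtags"])             # prints: ['API', 'Python']
```

The sketch reads `id_str` rather than `id`. The numeric `id` in the listing above differs from `id_str` in its last digits, which looks like rounding of a 64-bit number by the JSON viewer, so the string form is the safer choice.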

Now that you have figured out a way of searching hashtags on Twitter without manual intervention, the next obvious question is: What can we do with it?

Building a Hashtag Generator Tool using the Twitter API and Python

If you use Twitter to keep tabs on many hashtags, you surely know how tedious the chore can be. Marketing executives do it to keep track of relevant hashtags about competitors. Professionals from different industries keep track of many hashtags to scout for collaboration, PR, or other opportunities.

There are many SaaS-based services available for managing hashtags. However, writing your own tool is a lot more fun. So, let us wear our programmer’s hat and get ready to write some Python code to generate a custom hashtag report.

Before proceeding, you must take into account a few prerequisites for setting up a Python development environment.

Prerequisites

To make the Python code work, you have to set up the Python runtime and development environment and a few libraries.

Follow the steps below to prepare your environment on your development computer.

  1. Install Anaconda: This is a complete Python platform with many libraries and its own set of tools. It also contains the conda CLI tool for Python environment and package management. Download and install Anaconda from the official link.
  2. Create a new Python3 environment: Open a command line terminal and use the conda CLI to create a new Python environment for this project, named twitterapi.

# conda create --name twitterapi

  3. Activate the twitterapi environment.

# conda activate twitterapi

  4. Install Python packages: We will use a few Python libraries in the course of developing the hashtag generator tool. Install them using the following conda commands.

# conda install requests

# conda install mako

These commands install the requests and mako libraries. Their roles become clear in the development steps that follow.

Development Steps

Launch your favorite Python code editor and get ready to write some code now. Create a new Python file, ‘hashtag_gen.py.’ Follow the steps below to add the code portions to build the tool.

Step 1: Imports and globals

Add the import statements and global declaration as follows:

import datetime
import json
import argparse

import requests
from mako.template import Template

RAPIDAPI_KEY = "<YOUR_RAPIDAPI_KEY>"

This is the place for importing all the libraries. A global constant is also declared to contain the RapidAPI key. Ensure that you replace the placeholder <YOUR_RAPIDAPI_KEY> with the actual API key generated within your RapidAPI subscription.
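If you would rather not keep the key in the source file, one option is to read it from an environment variable. This is only a sketch of that alternative, not part of the original script. The helper `load_rapidapi_key` and the variable name `RAPIDAPI_KEY` in the environment are assumptions made for illustration.

```python
import os


def load_rapidapi_key(env_var="RAPIDAPI_KEY"):
    """Read the RapidAPI key from the environment, failing early if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your RapidAPI key")
    return key


# In hashtag_gen.py this call could replace the hard-coded constant:
# RAPIDAPI_KEY = load_rapidapi_key()
```

Failing at startup with a clear message is friendlier than sending a request with an empty key and having to interpret the HTTP error that comes back.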

Step 2: Define the class TwitterAPISearch

Append the source code with a new Python class description as follows:

class TwitterAPISearch:
    """ Class for triggering search with Twitter API """

    url = "https://twitter32.p.rapidapi.com/getSearch"
    x_rapidapi_key = RAPIDAPI_KEY
    x_rapidapi_host = "twitter32.p.rapidapi.com"

The class TwitterAPISearch is our main module. It encapsulates the business logic for calling the Twitter API as well as generating the hashtag report.

Step 3: Define the function for triggering the API

Add the following function inside the scope of TwitterAPISearch class.

    def trigger_twitter_api(self, hashtag):
        """ Trigger Twitter API """
        # prepare query string with hashtag
        querystring = {"hashtag": hashtag}
        # set headers required for RapidAPI
        headers = {
            'x-rapidapi-key': self.x_rapidapi_key,
            'x-rapidapi-host': self.x_rapidapi_host
        }
        try:
            # call the API
            response = requests.request("GET", self.url, headers=headers, params=querystring)
            # raise an exception for 4xx/5xx status codes
            response.raise_for_status()
        except requests.exceptions.RequestException as err:
            # any request failure: connection error, timeout, or HTTP error status
            print(f'RequestException occurred: {err}')
            return None
        else:
            # There is no error while calling API
            if response.status_code == 200:
                return json.loads(response.text)
            return None

This function calls the Twitter API’s GET Search endpoint. It uses the Python requests library to trigger the API call and checks for API errors. In the case of a 200 OK status code, it parses the JSON body of the response and returns it as a Python dictionary. In every other case it returns None.

Also, note that the API response is limited only to the tweets on the first page of the response.

Step 4: Define the function for generating report

Add another function within TwitterAPISearch class for generating the hashtag report as follows:

    def generate_report(self, hashtag, api_data):
        """ Generate Hashtag Report """
        # Generate timestamp for report file
        search_timestamp = datetime.datetime.now()
        date = search_timestamp.strftime("%m%d%Y")
        search_filename = hashtag + "_" + date + ".html"
        # Prepare Mako Template
        hashtag_template = Template(
            filename='hashtag_search_result_template.html',
            input_encoding='utf-8',
            output_encoding='utf-8',
            encoding_errors='replace')
        try:
            # Write the report file to disk; render() returns bytes because
            # output_encoding is set, hence the binary mode
            with open(search_filename, 'wb+') as file_pointer:
                file_pointer.write(hashtag_template.render(
                    Search_String=hashtag,
                    Date=search_timestamp.strftime("%m/%d/%Y"),
                    Time=search_timestamp.strftime("%H:%M:%S"),
                    result_list=api_data))
            print("Hashtag Report Generated")
        except IOError:
            print("Error In Opening File For Writing Hashtag Report")
            # re-raise the original error so the caller sees what went wrong
            raise

This function runs the API response through a predefined Mako template. Mako is a Python templating library, used here to format the tweets into a report represented by an HTML file.

The report filename starts with the hashtag string, followed by ‘_’, followed by the date as MMDDYYYY and the extension .html.
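The naming scheme can be checked in isolation with a few lines. The helper name `report_filename` is hypothetical; the string formatting is the same as in generate_report.

```python
import datetime


def report_filename(hashtag, when):
    """Build '<hashtag>_<MMDDYYYY>.html', the naming scheme used by generate_report."""
    return hashtag + "_" + when.strftime("%m%d%Y") + ".html"


print(report_filename("api", datetime.datetime(2020, 10, 9, 3, 26, 42)))
# prints: api_10092020.html
```

Since the name contains only the date and not the time, running the tool twice on the same day for the same hashtag overwrites the earlier report.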

The predefined Mako template file is named ‘hashtag_search_result_template.html’

The contents of this file resemble HTML.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Your Search Results for Hashtag : ${Search_String}</title>
    <style type="text/css">
        body {
            font-family: sans-serif;
        }
        .tweet {
            border: 1px solid #8080808f;
            padding: 10px;
            margin: 10px;
            border-radius: 5px;
            font-family: sans-serif;
            box-shadow: 0 1px 10px rgba(0, 0, 0, 0.35);
            cursor: pointer;
        }
        a.tweet-link {
            text-decoration: none;
            color: #0f1419;
            font-family: sans-serif;
        }
        a.link {
            text-decoration: underline;
            color: #00e;
            font-family: sans-serif;
        }
    </style>
</head>
<body>
    <h1 style="text-align:center;">Your Search Results for Hashtag : ${Search_String}</h1>
    <h2 style="text-align:center;">Date : ${Date} , Time : ${Time}</h2>
    % if result_list["success"] == True:
    <div class="result">
        % for tweet in result_list["data"]["tweets"]:
        <div class="tweet" onclick="window.open('https://twitter.com/user/status/${result_list["data"]["tweets"][tweet]["id"]}','_blank')">
            <h4 style="font-family: sans-serif;">${result_list["data"]["tweets"][tweet]["full_text"]}</h4>
            % if result_list["data"]["tweets"][tweet]["is_quote_status"] == True:
            <a class="link" href="${result_list["data"]["tweets"][tweet]["quoted_status_permalink"]["url"]}" rel="nofollow" target="_blank">${result_list["data"]["tweets"][tweet]["quoted_status_permalink"]["url"]}</a><br/>
            % endif
            <h5 style="color: #5b7083">Created At : ${result_list["data"]["tweets"][tweet]["created_at"]} . <a style="color: #5b7083" href="http://twitter.com/download/android" rel="nofollow" target="_blank">Twitter for Android</a></h5>
        </div>
        % endfor
    </div>
    % else:
    <h5 style="text-align:center;">No Data Found!!!</h5>
    % endif
</body>
</html>

This file also contains additional programming constructs that generate HTML tags dynamically based on conditions. These follow Mako’s templating syntax: lines starting with % are control statements, and ${...} inserts the value of an expression. Here, the template loops through all the tweets and arranges each one in a card within an HTML page.

Be sure to save this template file with the name ‘hashtag_search_result_template.html’ in the same location as the Python file ‘hashtag_gen.py.’
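To see which data the template actually reads, its control flow can be mirrored in plain Python. The function below is a hypothetical helper written for illustration, and the sample dictionary is made up; only the key names ("success", "data", "tweets", "id", "full_text", "created_at") come from the template.

```python
def tweet_cards(result_list):
    """Mirror the template's logic: one (link, text, created_at) tuple per tweet."""
    if result_list.get("success") is not True:
        return []  # the template prints "No Data Found!!!" in this case
    tweets = result_list["data"]["tweets"]  # dictionary keyed by tweet id
    cards = []
    for tweet_id in tweets:
        tweet = tweets[tweet_id]
        link = f"https://twitter.com/user/status/{tweet['id']}"
        cards.append((link, tweet["full_text"], tweet["created_at"]))
    return cards


# Made-up response with the same shape the template expects.
sample = {
    "success": True,
    "data": {
        "tweets": {
            "1314406909319901184": {
                "id": 1314406909319901184,
                "full_text": "Example tweet about #api",
                "created_at": "Fri Oct 09 03:26:42 +0000 2020",
            }
        }
    },
}

for link, text, created_at in tweet_cards(sample):
    print(link, "|", text, "|", created_at)
```

Note that iterating over the "tweets" dictionary yields its keys, which is why the template indexes back into result_list["data"]["tweets"][tweet] for every field.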

Step 5: Add the main block

Append the code with this __main__ block containing the following logic:

if __name__ == '__main__':
    # create argument parser
    command_line_arg_parser = argparse.ArgumentParser(
        description='A hashtag to search')
    # add an argument -hashtag
    command_line_arg_parser.add_argument(
        '-hashtag', '--hashtag',
        help="A hashtag to search on Twitter",
        required=True)
    # parse the arguments
    args = command_line_arg_parser.parse_args()
    # get hashtag from argument list
    arg_hashtag = args.hashtag
    print("For hashtag:" + arg_hashtag)
    # Create object of TwitterAPISearch
    twitter_api = TwitterAPISearch()
    # Trigger twitter API
    resp = twitter_api.trigger_twitter_api(hashtag=arg_hashtag)
    if resp:
        print("Generating Hashtag Report For " + arg_hashtag)
        twitter_api.generate_report(hashtag=arg_hashtag, api_data=resp)
    else:
        # If response not found
        print("Error in Triggering Twitter Hashtag API")

At first, the program reads the hashtag from the command line switch -hashtag (or --hashtag). The command-line argument processing is done via the argparse library.

Subsequently, it creates an instance, twitter_api, of the TwitterAPISearch class. The twitter_api object is used to trigger the API and generate the report by calling trigger_twitter_api( ) and generate_report( ) in sequence.
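The argument parsing described above can be tried in isolation by passing an explicit argument list instead of reading the real command line. In this sketch, `build_parser` is a hypothetical helper, and stripping a leading ‘#’ is an added assumption for convenience: the original script passes the value through unchanged and is invoked without the ‘#’.

```python
import argparse


def build_parser():
    """Create the same command-line interface as hashtag_gen.py."""
    parser = argparse.ArgumentParser(description="A hashtag to search")
    parser.add_argument("-hashtag", "--hashtag", required=True,
                        help="A hashtag to search on Twitter")
    return parser


# Parse an explicit list rather than sys.argv, which is handy for experiments.
args = build_parser().parse_args(["--hashtag", "#api"])
# Assumption: normalise the input so that both "api" and "#api" are accepted.
hashtag = args.hashtag.lstrip("#")
print(hashtag)  # prints: api
```

Both spellings of the switch, -hashtag and --hashtag, store their value in args.hashtag.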

With this step, the development of the hashtag generation tool is complete. Save the file ‘hashtag_gen.py.’

Testing Steps

To test this tool, choose a hashtag that you want to track and run the Python script.

As an example, if you want to see the tweets containing ‘#api’, then the command is:

# python hashtag_gen.py -hashtag api

It takes a few seconds for the script to invoke the Twitter API and generate the HTML report file. Once completed, you can locate the report file in the same location as the Python file with filename format as per the code presented earlier in step 4.

Here is the sample output from the report file for ‘#api.’

[Image: sample HTML report for ‘#api’]

You can click on the individual tweet card to redirect to twitter.com.

It’s time now to scrape your favorite hashtag. So go ahead and give it a shot.

This tool is very handy when you want to track hashtags on a regular basis. For instance, if you want to keep a very close eye on the API space, you can execute this tool every day to generate a daily report of all tweets containing ‘#api’.

Power Up Your Social Apps with RapidAPI

Search is one of the many features of the Twitter API. If you are interested in following user activities or tracking tweets’ engagement, it provides more options. Now that you have an idea of how to wrap this API within Python, you can explore these other options to build more sophisticated programmable tools for managing Twitter.

If you are looking for alternatives on other social media channels, check out the top social media APIs to leverage the power of APIs to make your social media management and tracking chores easier.

Article information

Author: Saturnina Altenwerth DVM
