How to track Google “Top Stories” carousel

Author: Svet Petkov
Published on: October 13, 2022
Last Modified: August 9, 2024

What is the Google “Top Stories” carousel?

Appearing in Google's Top Stories carousel is a great way to get exposure for news content and boost organic web traffic. However, Google provides little data (clicks, impressions, etc.) about the traffic you receive from this carousel.

From a technical perspective, the Top Stories carousel is an AI-powered search engine results page (SERP) feature that displays useful and timely articles from a broad range of high-quality and trustworthy news providers.

What Python libraries do you need?

import requests 
import json
import pandas as pd

Setting up the parameters for the request

To scrape the “Top Stories” carousel with SerpApi, you need to create a free SerpApi account, which provides 100 free requests per month.

Once you have registered, copy the API key and paste it into the script.

SerpApi also provides a really nice feature, the SerpApi Playground, where you can adjust the parameters to your needs.

The current example is with the following parameters:

  • Search engine – Google
  • Query – “nfl predictions”
  • Location – New York
  • Domain – Google.com
  • Language – English
  • Region – US
  • Device – Mobile
params = {
    "engine": "google",
    "q": "nfl predictions",
    # "location" is the request parameter; "location_requested" and
    # "location_used" only appear in the API response
    "location": "New York, NY, United States",
    "google_domain": "google.com",
    "hl": "en",
    "gl": "us",
    "device": "mobile",
    "api_key": "API KEY",  # paste your SerpApi key here
}
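
Rather than pasting the key into the script, it can be read from an environment variable so it never ends up in a shared notebook or repository. The sketch below is one way to do that; the helper name `with_api_key` and the variable name `SERPAPI_API_KEY` are assumptions chosen for illustration, not something SerpApi requires.

```python
import os


def with_api_key(params, env_var="SERPAPI_API_KEY"):
    """Return a copy of params with "api_key" filled in from the environment.

    with_api_key and the SERPAPI_API_KEY name are hypothetical; use whatever
    variable name fits your setup.
    """
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Build a new dict so the caller's params are left untouched
    return {**params, "api_key": key}
```

Calling `params = with_api_key(params)` just before sending the request keeps the rest of the script unchanged.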

Sending the request to SerpApi

The variable “response” stores the parsed JSON response from the API. The get function from the “requests” library sends the request, appending the parameters to the URL as a query string.

response = requests.get("https://serpapi.com/search.json", params=params).json()
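
The one-liner above assumes the request succeeds. A slightly more defensive sketch adds a timeout, raises on HTTP errors, and checks for an `error` field in the JSON body, which is how SerpApi appears to report problems such as an invalid key (treat that field name as an assumption and check it against a real failed response). `fetch_serp` is a hypothetical helper name.

```python
import requests

SERPAPI_URL = "https://serpapi.com/search.json"


def fetch_serp(params, session=None, timeout=30):
    """Send the search request and return the parsed JSON payload.

    fetch_serp is a hypothetical helper. Pass a requests.Session to reuse
    connections across many keywords; otherwise requests.get is used.
    """
    http = session or requests
    resp = http.get(SERPAPI_URL, params=params, timeout=timeout)
    resp.raise_for_status()  # surface HTTP-level failures (4xx/5xx)
    payload = resp.json()
    # Assumption: API-level problems are reported under an "error" key
    if "error" in payload:
        raise RuntimeError(f"SerpApi error: {payload['error']}")
    return payload
```

With this in place, `response = fetch_serp(params)` is a drop-in replacement for the line above.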

Storing the Top Stories carousel from the API response in a Pandas DataFrame

The first step is to filter the whole API response, which contains information about the entire SERP (search engine results page), and store only the “Top Stories” information in a variable.

As you may know, on a mobile device there is more than one news carousel (Top stories, close topic and Also in News). That’s why a second variable loads only the “Top Stories” carousel into a Pandas DataFrame.

top_t = response['top_stories']
carousel = pd.DataFrame(top_t['carousel'])
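
`response['top_stories']` raises a `KeyError` when the query returns no Top Stories block, which is easy to hit when tracking many keywords. The sketch below returns an empty list in that case. It also accepts a plain list under `top_stories`, on the assumption that some layouts (for example desktop results) return the stories directly rather than under a `carousel` key; `extract_top_stories` is a hypothetical helper name.

```python
def extract_top_stories(response):
    """Return the Top Stories items from a SerpApi response as a list of dicts."""
    top = response.get("top_stories")
    if not top:
        return []  # no Top Stories block for this query
    if isinstance(top, dict):
        # Shape used in this article: {"carousel": [...], ...}
        return list(top.get("carousel", []))
    # Assumed alternative shape: the stories are already a list
    return list(top)
```

The result can be passed straight to pandas, for example `carousel = pd.DataFrame(extract_top_stories(response))`.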

Adding more data into the data frame from the API

  • In the variable "kwrd", I store the query, which goes into a "Keyword" column
  • In a second variable, I store the device type, which goes into a "Device" column
  • The third variable is the date and hour of the check, stored in the “Date of check” column
kwrd = response['search_parameters']['q']
device = response['search_parameters']['device']
date = response["search_metadata"]['processed_at']
carousel['Keyword'] = kwrd
carousel['Device'] = device
carousel['Date of check'] = date
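
The same three columns can be wrapped in a small function so the step is reusable across keywords. This is only a sketch: `annotate_carousel` is a hypothetical name, and the extra "Position" column (the 1-based order in which the API returned each item) is an addition that is not part of the original script.

```python
import pandas as pd


def annotate_carousel(carousel, response):
    """Return a copy of carousel with keyword, device, date and position columns."""
    params = response.get("search_parameters", {})
    meta = response.get("search_metadata", {})
    out = carousel.copy()  # leave the caller's DataFrame unchanged
    out["Keyword"] = params.get("q")
    out["Device"] = params.get("device")
    out["Date of check"] = meta.get("processed_at")
    # Hypothetical extra column: 1-based order of the item in the API response
    out["Position"] = list(range(1, len(out) + 1))
    return out
```

Usage would be `carousel = annotate_carousel(carousel, response)`.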

Exporting the data into a Google Sheets document

This is a standard piece of code that you can use in Google Colab to export a DataFrame into a Google Sheets file. It is also a great opportunity to use Looker Studio to visualize the data.

# Authenticate the current Google Colab user
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
from gspread_dataframe import set_with_dataframe

# Reuse the Colab credentials for the Google Sheets API
creds, _ = default()
gc = gspread.authorize(creds)

# Create a new spreadsheet and write the DataFrame to its first worksheet
doc_name = 'Top Stories'
sh = gc.create(doc_name)
worksheet = sh.sheet1

set_with_dataframe(worksheet, carousel)
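
The export above creates a new spreadsheet on every run and depends on Colab authentication. For tracking the carousel over time outside Colab, one simple alternative (a swapped-in technique, not the method used in this article) is to append each check to a local CSV file with pandas. The helper name `append_to_log` and the default file name are assumptions.

```python
from pathlib import Path

import pandas as pd


def append_to_log(carousel, path="top_stories_log.csv"):
    """Append the rows of carousel to a CSV file, writing the header only once."""
    path = Path(path)
    write_header = not path.exists()  # header only when the file is new
    carousel.to_csv(path, mode="a", header=write_header, index=False)
    return path
```

This only works cleanly if every run produces the same columns in the same order, since the header is written once and later rows are appended as-is.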

Exported data from the “Top Stories” carousel to Google Sheets

This is how your exported DataFrame should look, with all the needed values from the API.

You can find this simple script on GitHub or get in touch with me on X or LinkedIn.
