What Is a Google News XML Sitemap and How Can You Monitor It?
A Google News XML sitemap is a specialised XML file that helps Google News discover and process recent news articles more efficiently. It contains article URLs together with metadata such as publication name, publication date, and title.
The key difference between a regular XML sitemap and a Google News sitemap is freshness. A Google News XML sitemap should only include articles published in the last 48 hours, which makes it useful both for Google and for competitive monitoring.
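For context, a minimal entry in a Google News sitemap uses the news extension namespace documented by Google; the URL, publication name, and date below are illustrative placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/story-a</loc>
    <news:news>
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2024-05-01T08:00:00+00:00</news:publication_date>
      <news:title>Example headline</news:title>
    </news:news>
  </url>
</urlset>
```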
What a Google News XML sitemap does
Submitting a Google News sitemap helps Google crawl and index eligible news content faster. For publishers, that matters because speed and discoverability are often critical in Google News, Top Stories, and other freshness-driven surfaces.
| Regular XML sitemap | Google News XML sitemap |
|---|---|
| Can include a broad set of site URLs. | Should include only recent news articles. |
| Used for general discovery and crawl guidance. | Used to surface fresh content for Google News. |
| May stay relatively stable over time. | Changes rapidly as articles enter and leave the 48-hour window. |
Why monitor competitors' Google News sitemaps?
Monitoring competitor Google News sitemaps can reveal useful patterns in editorial output and distribution strategy. Because the file reflects only the most recent articles, it acts as a near-real-time signal of what a publisher is prioritising.
- Discover new content ideas: spot topics, entities, and story angles competitors are publishing.
- Track trends: understand which themes are accelerating in your niche or market.
- Identify coverage gaps: find areas where your newsroom or content operation is underrepresented.
- Measure publishing frequency: estimate how often competitors publish and which subjects dominate their recent output.
In practice, this kind of monitoring can support both Google News analysis and Google Discover research, especially when you combine sitemap data with headline classification or publisher tagging.
Python libraries used in the workflow
```python
import ssl
import time

import advertools as adv
import pandas as pd
```
- Advertools: an SEO-focused Python package for working with sitemaps, crawl data, and search marketing workflows.
- time: used to add a delay between runs so the script does not hammer the same sitemap continuously.
- Pandas: used to combine, deduplicate, and export the sitemap data.
- ssl: occasionally needed to work around certificate-verification issues when fetching HTTPS sitemaps in local environments.
Step 1: Load the sitemap into data frames
The core method here is adv.sitemap_to_df(), which fetches a sitemap and converts it into a Pandas DataFrame.
```python
while True:
    nj_1 = adv.sitemap_to_df("https://www.example.com/news-sitemap.xml", max_workers=8)
    nj_2 = adv.sitemap_to_df("https://www.example.com/news-sitemap.xml", max_workers=8)
    nj_3 = adv.sitemap_to_df("https://www.example.com/news-sitemap.xml", max_workers=8)
```

That example shows the basic concept, but it is not ideal as-is. Fetching the same sitemap three times in immediate succession creates unnecessary traffic and wastes local resources. If you are going to poll a competitor sitemap, you should introduce a delay and keep the frequency reasonable.
To find a competitor's Google News sitemap, the safest first step is usually to check the robots.txt file for a sitemap reference.
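The sitemap lines in robots.txt are simple enough to extract with the standard library alone. A minimal sketch, assuming you have already downloaded the robots.txt body as a string (the sample file below is illustrative, not a real publisher's):

```python
def extract_sitemaps(robots_txt: str) -> list[str]:
    """Return every URL declared on a 'Sitemap:' line (case-insensitive)."""
    sitemaps = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so URLs (which contain ':') stay intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap":
            sitemaps.append(value.strip())
    return sitemaps


sample = """User-agent: *
Disallow: /private/
Sitemap: https://www.example.com/news-sitemap.xml"""

print(extract_sitemaps(sample))  # ['https://www.example.com/news-sitemap.xml']
```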
Step 2: Combine the data frames
```python
all_sitemaps = [nj_1, nj_2, nj_3]
result = pd.concat(all_sitemaps, ignore_index=True)
```

This combines multiple DataFrames into a single variable so the full collection can be cleaned and exported together. Setting ignore_index=True avoids duplicate index labels across the concatenated snapshots; the other pd.concat arguments (axis, join, verify_integrity) can be left at their defaults here.
Step 3: Remove duplicates
```python
result.drop_duplicates(subset=["loc"], keep="first", inplace=True)
```

Because a Google News sitemap only covers the most recent 48 hours, repeated snapshots will often contain the same article URLs. Deduplicating on `loc` keeps the export clean.
Step 4: Export the data to CSV
```python
import os

csv_path = "sitemap_data.csv"
result.to_csv(
    csv_path,
    mode="a",
    index=False,
    header=not os.path.exists(csv_path),
)
```

Appending to the same CSV allows the file to grow over time as you collect more sitemap snapshots. The header should be written only once, on the first run, so the check tests whether the file already exists; checking the DataFrame's row count instead would skip the header precisely when there is data to write.
Step 5: Schedule the script responsibly
```python
time.sleep(43200)
```

Placed at the end of the `while` loop, a 43,200-second delay means the script polls every 12 hours. That is usually enough for this type of monitoring without creating aggressive crawl behaviour.
| Frequency | Seconds |
|---|---|
| 6 hours | 21600 |
| 12 hours | 43200 |
| 24 hours | 86400 |
| 48 hours | 172800 |
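Putting the steps together, the whole workflow can be sketched as one small script. The sitemap URL, CSV path, and polling interval are placeholders to adapt; advertools is imported lazily so the append helper can be reused and tested without it installed.

```python
import os
import time

import pandas as pd

SITEMAP_URL = "https://www.example.com/news-sitemap.xml"  # placeholder
CSV_PATH = "sitemap_data.csv"
POLL_SECONDS = 43_200  # 12 hours


def append_snapshot(df: pd.DataFrame, path: str = CSV_PATH) -> None:
    """Deduplicate a snapshot on 'loc' and append it, writing the header once."""
    df = df.drop_duplicates(subset=["loc"], keep="first")
    df.to_csv(path, mode="a", index=False, header=not os.path.exists(path))


def run() -> None:
    import advertools as adv  # third-party: pip install advertools

    while True:
        snapshot = adv.sitemap_to_df(SITEMAP_URL)
        append_snapshot(snapshot)
        time.sleep(POLL_SECONDS)


# To start monitoring (runs indefinitely):
# run()
```

Note that deduplication here happens per snapshot; duplicates across appends can be cleaned up later when you load the CSV for analysis.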
Summary
What this workflow gives you
- A simple way to collect recent competitor article URLs from Google News sitemaps.
- A historical CSV that can be analysed later in Looker Studio, spreadsheets, or Python notebooks.
- Better visibility into publishing cadence and topical focus.
This is a practical way to build a lightweight competitive intelligence dataset from Google News sitemaps. Keep it responsible, keep it clean, and the output can become genuinely useful for editorial and SEO analysis.