Visualising CTR with Python and Linear Regression

Author: Svet Petkov
Last Modified: November 29, 2024

Creating a CTR curve helps you understand the CTR you can expect in your niche at each ranking position on Google.

CTR depends on the position on the search results page as well as on the intent behind the query and the other features Google shows for it, for example the Top Stories carousel, shopping listings, featured snippets, AI Overviews and many more.

By calculating and visualising the CTR per position with your own Google Search Console data, you can see how high your CTR gets at each position and use this data for a better forecast.

In this blog post, I will show you how to create a CTR curve using Python. I will use data from Google Search Console, which provides the most reliable CTR and average position for each query.

For the few people who don't know what CTR is: it's short for Click-Through Rate, a metric that measures the percentage of people who click on your website’s search results after seeing them. CTR is calculated as follows: clicks ÷ impressions = CTR
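As a quick illustration of the formula, here is a minimal sketch. The function name `click_through_rate` and the numbers are hypothetical and only there to show the arithmetic:

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a fraction between 0 and 1 (0.0 when there are no impressions)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical numbers: 50 clicks from 1,000 impressions
ctr = click_through_rate(50, 1000)
print(f"{ctr:.1%}")  # 5.0%
```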

Analysing CTR curves for organic traffic forecast

The process I use to build the linear regression produces data that is valuable if you want to forecast your organic traffic based on your keyword research.

Step-by-step guide on how to create a CTR curve in Python with data from Google Search Console

Step 1: Importing the Required Libraries

To begin, I import the necessary libraries for data manipulation, visualisation, and statistical analysis:

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np

Step 2: Loading and Preparing the Data

To run the script, you first have to export your data from Google Search Console. The script uses the Pandas library to load the CTR data from a CSV file named “Queries.csv”, which is inside the .zip file that GSC exports:

data = pd.read_csv('Queries.csv')

Next, the “CTR” column is converted from a string to a numeric type and scaled to a value between 0 and 1:

data['CTR'] = data['CTR'].str.rstrip('%').astype('float') / 100.0
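To see what this conversion does without needing a real export, here is a small sketch on an in-memory DataFrame. The rows are made up; only the column names 'Top queries', 'CTR' and 'Position' come from the script above:

```python
import pandas as pd

# Hypothetical rows mimicking the layout of a GSC "Queries.csv" export
sample = pd.DataFrame({
    "Top queries": ["blue widgets", "buy widgets", "widget repair"],
    "CTR": ["12.5%", "3%", "0.4%"],
    "Position": [1.2, 4.8, 9.1],
})

# Strip the percent sign, convert to float, and scale to the 0-1 range
sample["CTR"] = sample["CTR"].str.rstrip("%").astype(float) / 100.0

print(sample["CTR"].tolist())
```

After the conversion the column holds plain floats, so it can be averaged and plotted directly.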

Step 3: Cleaning and filtering the exported data

To ensure accurate analysis, the script filters out branded keywords with a regular expression (regex). Branded keywords have a really high CTR, which would make the averages inaccurate.

Note: Make sure that you add all different variations of your brand name.

data = data[~data['Top queries'].str.contains('brand|bra nd|the brand', case=False, na=False)]
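If you have many brand variations, one option is to build the pattern from a list instead of typing it by hand. This is only a sketch: the brand names in `brand_variants` are hypothetical placeholders, and the sample DataFrame stands in for the real export.

```python
import re
import pandas as pd

# Hypothetical brand variants; replace with your own spellings and misspellings
brand_variants = ["acme", "ac me", "acme.com"]

# re.escape keeps characters such as "." from being read as regex operators
brand_pattern = "|".join(re.escape(v) for v in brand_variants)

sample = pd.DataFrame({
    "Top queries": ["acme login", "Ac Me shoes", "running shoes"],
})

# Same filter as in the script: case-insensitive match, then invert with ~
is_branded = sample["Top queries"].str.contains(brand_pattern, case=False, na=False)
non_branded = sample[~is_branded]

print(non_branded["Top queries"].tolist())  # ['running shoes']
```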

Furthermore, I refine the data by keeping only positions 1 to 15:

data = data[data['Position'].between(1, 15)]

Step 4: Calculating the Average CTR by Position

I group the data by position and calculate the average CTR for each position:

avg_ctr_by_position = data.groupby('Position')['CTR'].mean()
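One thing to keep in mind: GSC reports the average position with decimals, so grouping on the raw column creates a separate group for every decimal value. The sketch below shows two possible variations, not part of the original script: rounding positions to whole ranks, and weighting CTR by impressions so that low-volume queries do not dominate the average. It assumes the export also contains 'Clicks' and 'Impressions' columns, and the rows are made up.

```python
import pandas as pd

# Hypothetical rows; the column names are assumed to match the GSC export
sample = pd.DataFrame({
    "Clicks": [30, 10, 5, 1],
    "Impressions": [100, 400, 100, 50],
    "Position": [1.2, 1.4, 2.6, 3.3],
})
sample["CTR"] = sample["Clicks"] / sample["Impressions"]

# Bucket decimal positions into whole-number ranks (1.2 -> 1, 2.6 -> 3)
sample["Rank"] = sample["Position"].round().astype(int)

# Unweighted mean: every query counts equally
mean_ctr = sample.groupby("Rank")["CTR"].mean()

# Impression-weighted CTR: total clicks divided by total impressions per rank
totals = sample.groupby("Rank")[["Clicks", "Impressions"]].sum()
weighted_ctr = totals["Clicks"] / totals["Impressions"]

print(mean_ctr)
print(weighted_ctr)
```

Which of the two averages is more useful depends on your data. The weighted version usually sits closer to the CTR you see in the GSC totals.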

Step 5: Outputting the Results to a CSV File

The average CTR by position is saved to a CSV file for further analysis or reporting. The file contains the CTR for each position between 1 and 15, which you can use to analyse and forecast potential organic growth.

avg_ctr_by_position.to_csv('average_ctr_by_position.csv', header=True)
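As a rough idea of how the saved curve could feed a forecast, here is a minimal sketch. All numbers are hypothetical: `ctr_curve` stands in for the values in average_ctr_by_position.csv, and the `keywords` table stands in for your own keyword research with search volumes and target positions.

```python
import pandas as pd

# Hypothetical CTR curve, standing in for the saved average_ctr_by_position.csv
ctr_curve = pd.Series({1: 0.25, 2: 0.12, 3: 0.08}, name="CTR")

# Hypothetical keyword research: monthly search volume and target position
keywords = pd.DataFrame({
    "keyword": ["blue widgets", "widget repair"],
    "search_volume": [2000, 500],
    "target_position": [3, 1],
})

# Look up the CTR for each target position and estimate monthly clicks
keywords["expected_ctr"] = keywords["target_position"].map(ctr_curve)
keywords["estimated_clicks"] = keywords["search_volume"] * keywords["expected_ctr"]

print(keywords[["keyword", "estimated_clicks"]])
```

Treat the result as an estimate only, since search volume is not the same as impressions and the CTR for a query depends on the features Google shows for it.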

Step 6: Visualising the Data

The script employs the Matplotlib library to create a scatter plot of the average CTR by position:

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(avg_ctr_by_position.index, avg_ctr_by_position.values)

Step 7: Calculating Linear Regression

In addition, to understand the relationship between position and CTR, I calculate a linear regression model:

slope, intercept, r_value, p_value, std_err = stats.linregress(avg_ctr_by_position.index, avg_ctr_by_position.values)
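The regression also returns values that help judge the fit and that can be reused for estimates. The sketch below uses made-up averages for positions 1 to 5; the helper `predict_ctr` is a hypothetical name, and the clipping to the 0-1 range is my own assumption, because a straight line will eventually drop below zero for low positions.

```python
import numpy as np
from scipy import stats

# Hypothetical average CTRs for positions 1-5
positions = np.array([1, 2, 3, 4, 5])
avg_ctr = np.array([0.30, 0.25, 0.20, 0.15, 0.10])

result = stats.linregress(positions, avg_ctr)

# R squared: share of the variation in CTR explained by the straight line
r_squared = result.rvalue ** 2


def predict_ctr(position: float) -> float:
    """Estimate CTR from the fitted line, clipped to the valid 0-1 range."""
    estimate = result.intercept + result.slope * position
    return float(np.clip(estimate, 0.0, 1.0))


print(f"slope={result.slope:.4f}, intercept={result.intercept:.4f}, R^2={r_squared:.3f}")
print(predict_ctr(6))
```

A low R squared on your own data would suggest that a straight line describes your CTR curve poorly, so it is worth checking before relying on the estimates.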

Step 8: Plotting the Regression Line

The script creates a sequence of x-values and calculates the corresponding y-values for the regression line. The regression line is then added to the scatter plot:

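# Use the positions as x-values and evaluate the fitted line at each one
x = np.array(avg_ctr_by_position.index)
y = intercept + slope * x
# Four decimals keep small slopes readable; {:+.4f} prints the intercept with its sign
ax.plot(x, y, color='red', label='y={:.4f}x{:+.4f}'.format(slope, intercept))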

Step 9: Finalising the Plot

The remaining lines of code add labels, a title, a legend, and grid lines to the plot before displaying it:

plt.title('CTR Curve')
plt.xlabel('Position')
plt.ylabel('Average CTR')
plt.legend()
plt.grid(True)
plt.show()