Visualising CTR with Python and Linear Regression

Author: Svet Petkov
Last Modified: August 9, 2024

CTR curves are important for understanding how well your website performs in search engines. CTR depends on the position on the search results page, on the other features Google shows for the specific query, and on the search intent. By calculating and visualising the CTR per position with your own Google Search Console data, you can see what CTR each position achieves on your site and use those figures in a forecast. It’s also a useful way to measure your website’s visibility and effectiveness.

In this blog post, I will show you how to create a CTR curve using Python. I will use data from Google Search Console, which provides the most reliable CTR and average position for each query.

What is CTR?

CTR, short for Click-Through Rate, is a metric that measures the percentage of people who click on your website’s search results. CTR is calculated as follows: CTR = clicks ÷ impressions
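As a quick illustration of the formula, here is a minimal sketch. The function name and the numbers are made up for the example and are not part of the script in this post.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a fraction between 0 and 1 (0.0 when there are no impressions)."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


# Hypothetical numbers, for illustration only.
ctr = click_through_rate(clicks=50, impressions=1000)
print(f"{ctr:.1%}")  # prints 5.0%
```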

Analysing CTR Curves for Website Performance

By studying CTR curves, you can gain valuable information about your website’s performance and identify areas for improvement. They help you understand how well your website is capturing the attention of search engine users and guide your SEO strategy. Also, you can use this CTR curve to forecast your organic traffic based on your keyword research.
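To make the forecasting idea concrete, here is a rough sketch under stated assumptions: the CTR values, keywords, search volumes and target positions are all invented, and the column names are my own. The idea is simply to multiply each keyword's search volume by the CTR of the position you expect to reach.

```python
import pandas as pd

# Hypothetical CTR curve (position -> average CTR); replace with your own values.
ctr_curve = pd.Series({1: 0.25, 2: 0.12, 3: 0.08}, name="CTR")

# Hypothetical keyword research: monthly search volume and the position you aim for.
keywords = pd.DataFrame({
    "keyword": ["blue widgets", "buy widgets", "widget repair"],
    "search_volume": [1000, 500, 200],
    "target_position": [1, 3, 2],
})

# Look up the CTR for each target position, then estimate monthly clicks.
keywords["expected_ctr"] = keywords["target_position"].map(ctr_curve)
keywords["forecast_clicks"] = keywords["search_volume"] * keywords["expected_ctr"]

print(keywords[["keyword", "forecast_clicks"]])
```

Treat the result as a rough estimate only: it assumes every impression behaves like the average for that position on your site.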

Step-by-step guide on how to create a CTR curve with Python with data from Google Search Console

Step 1: Importing the Required Libraries

To begin, I import the necessary libraries for data manipulation, visualisation, and statistical analysis:

import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np

Step 2: Loading and Preparing the Data

To run the script, you first need to export your performance data from Google Search Console. The export is a .zip file that contains a CSV file named “Queries.csv”, which the script loads using the Pandas library:

data = pd.read_csv('Queries.csv')

Next, the “CTR” column is converted from a string to a numeric type and scaled to a value between 0 and 1:

data['CTR'] = data['CTR'].str.rstrip('%').astype('float') / 100.0
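If you want to try the conversion without an export at hand, the sketch below runs the same step on a small in-memory sample. The rows are invented, and the "Clicks" and "Impressions" columns are an assumption about the export layout; only "Top queries", "CTR" and "Position" are used in this post.

```python
import io

import pandas as pd

# Made-up sample rows that mimic the assumed layout of Queries.csv.
sample_csv = io.StringIO(
    "Top queries,Clicks,Impressions,CTR,Position\n"
    "python ctr curve,12,240,5%,3.4\n"
    "ctr by position,3,300,1%,8.7\n"
)
sample = pd.read_csv(sample_csv)

# Strip the percent sign and rescale to a fraction between 0 and 1.
sample["CTR"] = sample["CTR"].str.rstrip("%").astype("float") / 100.0

print(sample.dtypes)
print(sample["CTR"].tolist())
```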

Step 3: Filtering the Data

To keep the analysis accurate, the script filters out branded keywords with a regular expression. Branded keywords have a very high CTR and would skew the data.

Note: Make sure that you add all different variations of your brand name.

data = data[~data['Top queries'].str.contains('brand|bra nd|the brand', case=False, na=False)]
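If you have many brand variations, one option is to build the pattern from a list. This is a sketch, not part of the original script: the brand terms and the sample queries are hypothetical. `re.escape` stops characters such as "." from being read as regex operators.

```python
import re

import pandas as pd

# Hypothetical brand variations; replace with your own.
brand_terms = ["acme", "ac me", "acme.com"]

# Escape each term, then join them into one alternation pattern.
brand_pattern = "|".join(re.escape(term) for term in brand_terms)

queries = pd.DataFrame(
    {"Top queries": ["Acme login", "ctr curve python", "acme.com pricing"]}
)

# Keep only the rows whose query does not match any brand term.
is_branded = queries["Top queries"].str.contains(brand_pattern, case=False, na=False)
non_branded = queries[~is_branded]

print(non_branded["Top queries"].tolist())  # ['ctr curve python']
```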

Furthermore, I refine the data by keeping only positions 1 to 15.

data = data[data['Position'].between(1, 15)]

Step 4: Calculating the Average CTR by Position

I group the data by position and calculate the average CTR for each position:

avg_ctr_by_position = data.groupby('Position')['CTR'].mean()
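One thing to watch: the Position column in a Search Console export is an average and is typically fractional (for example 3.4), so grouping on the raw value can produce many small groups. If you prefer one point per whole position, a possible variation is to round first. The sketch below uses invented numbers, and the "Position bucket" column name is my own.

```python
import pandas as pd

# Invented rows with fractional average positions.
rows = pd.DataFrame({
    "Position": [1.2, 1.4, 2.3, 2.6, 3.1],
    "CTR": [0.30, 0.20, 0.12, 0.08, 0.06],
})

# Round each average position to the nearest whole position.
rows["Position bucket"] = rows["Position"].round().astype(int)

# Average the CTR within each whole-number position.
avg_by_bucket = rows.groupby("Position bucket")["CTR"].mean()

print(avg_by_bucket)
```

Whether to round, floor, or keep the decimals is a judgment call; rounding simply gives a curve that is easier to read and reuse in a forecast.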

Step 5: Outputting the Results to a CSV File

The average CTR by position is saved to a CSV file for further analysis or reporting. The file contains the average CTR for each position between 1 and 15, which you can use to analyse and forecast potential organic growth.

avg_ctr_by_position.to_csv('average_ctr_by_position.csv', header=True)

Step 6: Visualising the Data

The script employs the Matplotlib library to create a scatter plot of the average CTR by position:

fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(avg_ctr_by_position.index, avg_ctr_by_position.values)

Step 7: Calculating Linear Regression

To understand the relationship between position and CTR, I fit a linear regression:

slope, intercept, r_value, p_value, std_err = stats.linregress(avg_ctr_by_position.index, avg_ctr_by_position.values)
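To show how the returned values can be read, here is a small sketch on invented data. The slope is the change in CTR per position, the intercept is the fitted CTR at position 0, and the squared r-value indicates how much of the variation the straight line explains. The sample numbers are made up.

```python
import numpy as np
from scipy import stats

# Invented average CTRs for positions 1 to 5.
positions = np.array([1, 2, 3, 4, 5])
avg_ctr = np.array([0.25, 0.15, 0.10, 0.07, 0.05])

result = stats.linregress(positions, avg_ctr)

# Share of the variance in CTR explained by the straight line (0 to 1).
r_squared = result.rvalue ** 2

# Fitted CTR for a given position, here position 3.
predicted_ctr_pos3 = result.intercept + result.slope * 3

print(f"slope={result.slope:.4f}, intercept={result.intercept:.4f}")
print(f"r_squared={r_squared:.3f}, predicted CTR at position 3={predicted_ctr_pos3:.3f}")
```

A low r-squared is a hint that a straight line is a rough approximation of your curve, which is worth keeping in mind before using the line for a forecast.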

Step 8: Plotting the Regression Line

The script creates a sequence of x-values and calculates the corresponding y-values for the regression line. The regression line is then added to the scatter plot:

x = np.array(avg_ctr_by_position.index)
y = intercept + slope * x
# Four decimals and an explicit sign keep small CTR slopes readable in the legend
ax.plot(x, y, color='red', label='y={:.4f}x{:+.4f}'.format(slope, intercept))

Step 9: Finalising the Plot

The remaining lines of code add labels, a title, a legend, and grid lines to the plot before displaying it:

plt.title('CTR Curve')
plt.xlabel('Position')
plt.ylabel('Average CTR')
plt.legend()
plt.grid(True)
plt.show()