Creating a CTR curve helps you understand the click-through rate you can expect at each ranking position on Google in your niche.
CTR depends not only on the position on the search results page but also on the search intent and on which other features Google shows for a specific query, for example the Top Stories carousel, Shopping results, featured snippets, AI Overviews and many more.
By calculating and visualising the CTR per position with your own Google Search Console data, you can see which positions deliver the highest CTR and use these figures for more accurate forecasts.
In this blog post, I will show you how to create a CTR curve using Python. I will use data from Google Search Console, which provides the most reliable CTR and average position for each query.
For those who don't know: CTR is short for Click-Through Rate, a metric that measures the percentage of people who click on your website's search result after seeing it. CTR is calculated as follows: CTR = clicks ÷ impressions
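As a quick worked example of the formula, with made-up numbers: a query with 30 clicks from 1,200 impressions has a CTR of 30 ÷ 1,200 = 0.025, or 2.5%. The same calculation as a small Python helper (the function name click_through_rate is just illustrative, not part of the script):

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a fraction between 0 and 1 (clicks / impressions)."""
    if impressions == 0:
        # No impressions means CTR is undefined; this sketch treats it as 0.
        return 0.0
    return clicks / impressions


ctr = click_through_rate(30, 1200)
print(f"{ctr:.3f} = {ctr:.1%}")  # prints "0.025 = 2.5%"
```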
On the way to the linear regression, the process produces intermediate data (the average CTR per position) that is valuable on its own if you want to forecast your organic traffic based on your keyword research.
To begin, I import the necessary libraries for data manipulation, visualisation, and statistical analysis:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
To run the script, you first need to export the Performance report from Google Search Console. The export is a .zip file that contains “Queries.csv”, and the script loads the CTR data from this file using the pandas library:
data = pd.read_csv('Queries.csv')
Next, the “CTR” column is converted from a string to a numeric type and scaled to a value between 0 and 1:
data['CTR'] = data['CTR'].str.rstrip('%').astype('float') / 100.0
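To see what this conversion does, here is a minimal sketch on a few hypothetical rows. It assumes the export uses the columns “Top queries”, “Clicks”, “Impressions”, “CTR” and “Position”, with CTR stored as a percentage string. The “CTR_raw” column is an illustrative addition, not part of the original script: it recomputes CTR from the raw counts as a cross-check, since the percentage string may be rounded.

```python
import pandas as pd

# Hypothetical rows shaped like the GSC "Queries.csv" export
# (column names are an assumption based on the columns the script uses).
data = pd.DataFrame({
    "Top queries": ["ctr curve", "seo forecast", "python gsc"],
    "Clicks": [30, 12, 7],
    "Impressions": [1200, 400, 300],
    "CTR": ["2.5%", "3%", "2.33%"],
    "Position": [3.4, 5.8, 9.1],
})

# Same conversion as in the script: strip "%", cast to float, scale to 0-1.
data["CTR"] = data["CTR"].str.rstrip("%").astype("float") / 100.0

# Illustrative cross-check: recompute CTR from the raw counts, which avoids
# any rounding that may have been applied to the percentage string.
data["CTR_raw"] = data["Clicks"] / data["Impressions"]
print(data[["Top queries", "CTR", "CTR_raw"]])
```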
To keep the analysis accurate, the script filters out branded keywords with a regular expression (regex). Branded keywords usually have a very high CTR, which would inflate the averages.
Note: Make sure that you include all variations of your brand name in the pattern.
data = data[~data['Top queries'].str.contains('brand|bra nd|the brand', case=False, na=False)]
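If you have many brand variations, typing them into one long pattern gets error-prone. One option, sketched here with placeholder names (brand_variations and brand_pattern are hypothetical, as are the sample queries), is to build the pattern from a list and let re.escape handle any characters that have a special meaning in regular expressions:

```python
import re

import pandas as pd

# Hypothetical brand variations; replace them with your own brand names.
brand_variations = ["brand", "bra nd", "the brand"]

# re.escape protects characters such as "." or "+" that would otherwise be
# read as regex syntax; "|" joins the variations as alternatives.
brand_pattern = "|".join(re.escape(name) for name in brand_variations)

queries = pd.DataFrame({
    "Top queries": ["Brand login", "ctr curve", "the brand app", "bra nd review", "python seo"],
})

# Same filter as in the script, but driven by the generated pattern.
non_branded = queries[~queries["Top queries"].str.contains(brand_pattern, case=False, na=False)]
print(non_branded)
```

Note that a plain substring pattern also matches longer words that contain the brand name; adding word boundaries (\b) around each variation is one way to tighten it if that matters for your brand.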
Next, I refine the data by keeping only queries with an average position between 1 and 15:
data = data[data['Position'].between(1, 15)]
I group the data by position and calculate the average CTR for each position:
avg_ctr_by_position = data.groupby('Position')['CTR'].mean()
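One thing to keep in mind: Google Search Console reports an average position per query, so the “Position” column usually contains decimals such as 2.7 or 3.2, and grouping on the raw values produces one group per distinct decimal rather than one per ranking position. Below is a minimal sketch on hypothetical data that rounds positions to whole numbers before grouping. It also shows an impression-weighted CTR (total clicks ÷ total impressions) as an alternative to the simple mean; neither step is part of the original script.

```python
import pandas as pd

# Hypothetical filtered data with average positions as decimals.
data = pd.DataFrame({
    "Position": [1.2, 1.4, 2.7, 3.2, 2.9],
    "Clicks": [50, 30, 10, 4, 6],
    "Impressions": [200, 100, 80, 40, 60],
    "CTR": [0.25, 0.30, 0.125, 0.10, 0.10],
})

# Round each average position to the nearest whole position so that queries
# at 2.7, 2.9 and 3.2 share one bucket (3). Exact .5 values round to even.
position_bucket = data["Position"].round().astype(int)

# Simple mean of per-query CTRs, as in the script, but per whole position.
avg_ctr_by_position = data.groupby(position_bucket)["CTR"].mean()

# Alternative: impression-weighted CTR per bucket (total clicks / total impressions).
totals = data.groupby(position_bucket)[["Clicks", "Impressions"]].sum()
weighted_ctr_by_position = totals["Clicks"] / totals["Impressions"]

print(avg_ctr_by_position)
print(weighted_ctr_by_position)
```

The simple mean gives every query the same weight, while the weighted version lets high-impression queries dominate each bucket; which one suits your forecast is a judgment call.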
The average CTR by position is saved to a CSV file for further analysis or reporting. The file contains the average CTR for each position from 1 to 15, which you can use to analyse and forecast potential organic growth.
avg_ctr_by_position.to_csv('average_ctr_by_position.csv', header=True)
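As an example of how such a curve can feed into a forecast, here is a minimal sketch that multiplies each keyword's monthly search volume by the CTR at the position you expect to reach. The CTR values, keywords, volumes and target positions are hypothetical placeholders, and treating search volume as impressions is a simplifying assumption:

```python
import pandas as pd

# Hypothetical CTR curve indexed by whole position. With your own data you
# could load the saved file instead, e.g.
# pd.read_csv("average_ctr_by_position.csv", index_col="Position")["CTR"]
ctr_curve = pd.Series({1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05}, name="CTR")

# Hypothetical keyword research: monthly search volume and target position.
keywords = pd.DataFrame({
    "keyword": ["ctr curve", "seo forecast", "gsc python"],
    "search_volume": [1000, 2400, 500],
    "target_position": [1, 3, 5],
})

# Look up the expected CTR for each target position, then
# expected monthly clicks = search volume x CTR (rounded to whole clicks).
keywords["expected_ctr"] = keywords["target_position"].map(ctr_curve)
keywords["expected_clicks"] = (keywords["search_volume"] * keywords["expected_ctr"]).round()

print(keywords)
print("Total expected clicks:", keywords["expected_clicks"].sum())
```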
The script employs the Matplotlib library to create a scatter plot of the average CTR by position:
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(avg_ctr_by_position.index, avg_ctr_by_position.values)
In addition, to understand the relationship between position and CTR, I calculate a linear regression model:
slope, intercept, r_value, p_value, std_err = stats.linregress(avg_ctr_by_position.index, avg_ctr_by_position.values)
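The values returned by stats.linregress can be used beyond drawing the line. In the minimal sketch below, hypothetical CTR values stand in for avg_ctr_by_position. Squaring r_value gives R², the share of the variation in average CTR that the straight line explains, and slope and intercept give a predicted CTR for any position. The helper predict_ctr is illustrative, not part of the original script.

```python
import numpy as np
from scipy import stats

# Hypothetical average CTR per whole position (stand-in for avg_ctr_by_position).
positions = np.arange(1, 11)
avg_ctr = np.array([0.28, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.025, 0.02, 0.018])

slope, intercept, r_value, p_value, std_err = stats.linregress(positions, avg_ctr)

# R^2: share of the variation in average CTR explained by the straight line.
print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_value ** 2:.3f}")


def predict_ctr(position: float) -> float:
    """Predicted CTR at a position, read off the fitted straight line."""
    return intercept + slope * position


print(f"Predicted CTR at position 4: {predict_ctr(4):.2%}")
```

Keep in mind that a straight line keeps falling, so for positions far beyond your data it can predict a CTR below zero.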
The script takes the positions as x-values, calculates the corresponding y-values on the regression line, and adds the line to the scatter plot:
x = np.array(avg_ctr_by_position.index)
y = intercept + slope * x
ax.plot(x, y, color='red', label='y={:.2f}x+{:.2f}'.format(slope, intercept))
The remaining lines of code add labels, a title, a legend, and grid lines to the plot before displaying it:
plt.title('CTR Curve')
plt.xlabel('Position')
plt.ylabel('Average CTR')
plt.legend()
plt.grid(True)
plt.show()
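CTR usually drops steeply over the first few positions and then flattens out, which a straight line cannot capture well. One alternative, swapped in here and not part of the original script, is a power-law curve CTR = a · position^b. It can be fitted with the same stats.linregress by working on the logarithms of both axes. Here is a minimal sketch on hypothetical data (fit_power_law is an illustrative helper):

```python
import numpy as np
from scipy import stats


def fit_power_law(positions, ctr):
    """Fit CTR = a * position**b via linear regression on log-transformed data.

    All CTR values must be > 0, because log(0) is undefined.
    """
    log_fit = stats.linregress(np.log(positions), np.log(ctr))
    a = float(np.exp(log_fit.intercept))  # intercept in log space -> scale factor a
    b = float(log_fit.slope)              # slope in log space -> exponent b
    return a, b


# Hypothetical average CTR per whole position (stand-in for avg_ctr_by_position).
positions = np.arange(1, 11)
avg_ctr = np.array([0.28, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.025, 0.02, 0.018])

a, b = fit_power_law(positions, avg_ctr)
print(f"CTR ≈ {a:.3f} * position^{b:.2f}")

# The fitted curve could be drawn on the same axes as the linear line, e.g.:
# x_smooth = np.linspace(1, 15, 100)
# ax.plot(x_smooth, a * x_smooth ** b, color='green', label='power-law fit')
```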