CTR curves are important for understanding how well your website is performing on search engines. CTR depends on the position on the search results page, as well as which other features Google shows for the specific query and its intent. By calculating and visualising the CTR per position with your own Google Search Console data, you can see what CTR each position achieves and use that for a forecast. It's also a great way to measure your website's visibility and effectiveness.
In this blog post, I will show you how to create a CTR curve using Python. I will use data from Google Search Console, which provides the most reliable CTR and average position for each query.
CTR, short for Click-Through Rate, is a metric that measures the percentage of people who click on your website's search results. CTR is calculated as follows: CTR = clicks ÷ impressions. For example, 50 clicks on 1,000 impressions gives a CTR of 5%.
By studying CTR curves, you can gain valuable insights into your website's performance and identify areas for improvement. They help you understand how well your website captures the attention of search engine users and guide your SEO strategy. You can also use the CTR curve to forecast your organic traffic based on your keyword research.
To begin, I import the necessary libraries for data manipulation, visualisation, and statistical analysis:
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
import numpy as np
To run the script, you first need to export your data from Google Search Console. The script then loads the CTR data from the file named “Queries.csv”, which is inside the .zip export from GSC, using the Pandas library:
data = pd.read_csv('Queries.csv')
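If you don't want to unzip the export manually, you can also read the CSV straight from the archive. Here is a small sketch, assuming a placeholder filename for the zip:

import zipfile

# Read Queries.csv directly from the GSC export archive
# ('gsc-export.zip' is a placeholder filename)
with zipfile.ZipFile('gsc-export.zip') as archive:
    with archive.open('Queries.csv') as f:
        data = pd.read_csv(f)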
Next, the “CTR” column is converted from a string to a numeric type and scaled to a value between 0 and 1:
data['CTR'] = data['CTR'].str.rstrip('%').astype('float') / 100.0
To ensure accurate analysis, the script filters out branded keywords with a regex. Branded keywords have a really high CTR and would skew the data.
Note: Make sure that you add all different variations of your brand name.
data = data[~data['Top queries'].str.contains('brand|brand name|the brand', case=False, na=False)]
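Note that str.contains treats the pattern as a regular expression by default, so if any of your brand variations contain regex metacharacters (a '+', '.', or parentheses, for example), building the pattern programmatically avoids surprises. A sketch with hypothetical variations:

import re

# Hypothetical brand variations; escape each one so regex
# metacharacters in a name are matched literally
brand_variations = ['brand', 'brand name', 'the brand']
pattern = '|'.join(re.escape(v) for v in brand_variations)
data = data[~data['Top queries'].str.contains(pattern, case=False, na=False)]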
Furthermore, I refine the data by filtering for positions 1 to 15:
data = data[data['Position'].between(1, 15)]
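One caveat before grouping: the GSC export reports Position as a decimal average (for example 2.3), so grouping on the raw values would create one bucket per distinct decimal. If you prefer one bucket per integer position, you can round first; this is an optional step I'm adding, not part of the original export:

# Optional: round the decimal average positions so that each
# integer position (1, 2, 3, ...) forms a single bucket
data['Position'] = data['Position'].round().astype(int)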
I group the data by position and calculate the average CTR for each position:
avg_ctr_by_position = data.groupby('Position')['CTR'].mean()
The average CTR by position is saved to a CSV file for further analysis or reporting. The file contains the CTR for each position between 1 and 15, which you can use to analyse and forecast potential organic growth.
avg_ctr_by_position.to_csv('average_ctr_by_position.csv', header=True)
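As a minimal forecast sketch, assuming you rounded positions to integers above, you can multiply each keyword's monthly search volume by the CTR of its target position. The keyword list and its numbers below are hypothetical:

# Load the curve saved above and look up the CTR per target position
ctr_curve = pd.read_csv('average_ctr_by_position.csv', index_col='Position')

# Hypothetical keyword research data
keywords = pd.DataFrame({
    'keyword': ['blue running shoes', 'trail shoes'],
    'monthly_volume': [5000, 1200],
    'target_position': [3, 1],
})

# Forecast clicks = volume × CTR at the target position
keywords['forecast_clicks'] = (
    keywords['monthly_volume'] * keywords['target_position'].map(ctr_curve['CTR'])
)
print(keywords)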
The script employs the Matplotlib library to create a scatter plot of the average CTR by position:
fig, ax = plt.subplots(figsize=(10, 6))
ax.scatter(avg_ctr_by_position.index, avg_ctr_by_position.values)
In addition, to understand the relationship between position and CTR, I fit a linear regression model:
slope, intercept, r_value, p_value, std_err = stats.linregress(avg_ctr_by_position.index, avg_ctr_by_position.values)
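Before trusting the line, it's worth checking the fit: a straight line is only a rough approximation, since real CTR curves drop steeply in the top positions and flatten out further down. Printing the statistics is a small optional check:

# R² close to 1 means the line explains most of the variation in CTR
print(f"R² = {r_value ** 2:.3f}, p-value = {p_value:.4f}")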
The script creates a sequence of x-values and calculates the corresponding y-values for the regression line. The regression line is then added to the scatter plot:
x = np.array(avg_ctr_by_position.index)
y = intercept + slope * x
ax.plot(x, y, color='red', label='y={:.2f}x+{:.2f}'.format(slope, intercept))
The remaining lines of code add labels, a title, a legend, and grid lines to the plot before displaying it:
plt.title('CTR Curve')
plt.xlabel('Position')
plt.ylabel('Average CTR')
plt.legend()
plt.grid(True)
plt.show()
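If you need the chart as a file for a report, you can also save it; call this before plt.show(), since some backends close the figure once it is shown. The filename is a placeholder:

# Optional: write the plot to disk ('ctr_curve.png' is a placeholder name)
fig.savefig('ctr_curve.png', dpi=150, bbox_inches='tight')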