SEO Keyword Impact Analysis with Python

Author: Svet Petkov
Published on: October 13, 2023
Last Modified: July 25, 2024

As an SEO specialist, you are probably often asked about what would be the organic traffic that you will receive from some new page or new vertical.

That is why I decided to speed up the process of getting the estimation of expected organic traffic using basic Python automation.

I used a calculation that every SEO expert would easily understand and probably already knows. The script is using the current position of each keyword, avg. search volume and the CTR to calculate the expected organic traffic.

However, remember that this is just an estimation the actual traffic you are going to receive depends on many other different factors and SERP features.

What do you need to run the script?
Export of the “Organic keywords” report from Ahrefs – choose the data for the specific URL or subfolder that you are targetting. 

ahrefs-organic-keywords

IMPORTANT: Export the data in UTF-8 format

Export to CSV - UTF-8

2. Access to Google Colaboratory

What to expect from the script?

TL-DR:

  1. Based on the avg. monthly volume and the current position of each keyword the script is calculating the current traffic.
  2. By slider(available only on Google Colab), you can adjust with how many positions you want to improve the current ranking for scenarios one and two.
  3.  Then the script is calculating the potential traffic and stores the raw which you can download in a CSV format.

How does it work?

First, you should import all needed libraries:

import pandas as pd
import numpy as np
from google.colab import files

Then you have to upload the exported file from aHrefs > Organic Keywords report.

files.upload()

The third key moment is to copy the name of the file and paste it here. Then the additional columns that are not needed will be removed.

df2 = pd.read_csv('/content/www.fantasypros.com-nfl-depth-chart--organic__2022-08-31_08-47-55.csv')

Setting the improvement in the positions

The improvement has to be set up in the beginning and stored in different variables. scenario_one will keep the number of positions that you want to improve the ranking in scenario one.

Example: If you are calculating the traffic of your competitors, in scenario one you can increase the positions by 1 point and calculate the traffic if you outrank your competitor.

scenario_one = 1 #@param {type:"slider", min:1, max:100, step:1}
scenario_two = 4 #@param {type:"slider", min:1, max:100, step:1}

With df2.dropnayou are removing the keywords with a missing current position. These keywords are out of the top 100, most often keywords that are irrelevant and we should not add to the calculation.

df2 = df2.drop(["Previous traffic", "Traffic change", "Previous position", "Position change", "Previous URL inside", "Previous URL", "Current URL inside", "Current URL", "Previous date", "Current date", 'SERP features',"Current traffic"], axis=1) 
df2.dropna(inplace = True)

Calculating the current organic traffic

The ‘current_traffic’ is calculated based on two conditions – keyword position and CTR. Where the CTR is dependent on the current position.

#Calculating the current traffic based on the current position
position_rule = [
    (df2['Current position'] == 1),
    (df2['Current position'] >= 2) & (df2['Current position'] < 2.9),
    (df2['Current position'] >= 3) & (df2['Current position'] < 3.9),
    (df2['Current position'] >= 4) & (df2['Current position'] < 4.9),
    (df2['Current position'] >= 5) & (df2['Current position'] < 5.9),
    (df2['Current position'] >= 6) & (df2['Current position'] < 6.9),
    (df2['Current position'] >= 7) & (df2['Current position'] < 7.9),
    (df2['Current position'] >= 8) & (df2['Current position'] < 8.9),
    (df2['Current position'] >= 5)
]
ctr_curve = [(df2['Volume']*0.2569), #Current position 1
           (df2['Volume']*0.1561), #Current position 2
           (df2['Volume']*0.0986), #Current position 3
           (df2['Volume']*0.0678), #Current position 4
           (df2['Volume']*0.0483), #Current position 5
           (df2['Volume']*0.0356), #Current position 6
           (df2['Volume']*0.0269), #Current position 7
           (df2['Volume']*0.0191), #Current position 8
           (df2['Volume']*0.0147) #Current position 9
           ]
df2['current_traffic'] = np.select(position_rule, ctr_curve)

Note: Current CTR is based on the Advanced Web Ranking(AWR) in their Google Organic CTR History Study.

  • Device: Desktop
  • Market: US
  • Keywords: Non-Branded

Calculating the expected traffic for scenario number one

Based on the value stored in the variable “scenario_one” the following part of the script will change the current positions and will calculate the expected traffic based on the new positions.

Note: If the value of the position is under 0 will be replaced with 1, automatically.

df2['scenario_1'] = np.where(df2['Current position'] > 1, df2['Current position'] - scenario_one, 1 )
df2['scenario_1'] = np.where(df2['scenario_1'] <= 0, 5, df2['scenario_1'])
position_rule = [
    (df2['scenario_1'] == 1),
    (df2['scenario_1'] >= 2) & (df2['scenario_1'] < 2.9),
    (df2['scenario_1'] >= 3) & (df2['scenario_1'] < 3.9),
    (df2['scenario_1'] >= 4) & (df2['scenario_1'] < 4.9),
    (df2['scenario_1'] >= 5) & (df2['scenario_1'] < 5.9),
    (df2['scenario_1'] >= 6) & (df2['scenario_1'] < 6.9),
    (df2['scenario_1'] >= 7) & (df2['scenario_1'] < 7.9),
    (df2['scenario_1'] >= 8) & (df2['scenario_1'] < 8.9),
    (df2['scenario_1'] >= 5)
]
ctr_curve = [(df2['Volume']*0.2569), #CTR position 1
           (df2['Volume']*0.1561), #CTR position 2
           (df2['Volume']*0.0986), #CTR position 3
           (df2['Volume']*0.0678), #CTR position 4
           (df2['Volume']*0.0483), #CTR position 5
           (df2['Volume']*0.0356), #CTR position 6
           (df2['Volume']*0.0269), #CTR position 7
           (df2['Volume']*0.0191), #CTR position 8
           (df2['Volume']*0.0147) #CTR position 9
           ]
df2['exp_traffic-1'] = np.select(position_rule, ctr_curve)

Calculating the expected traffic for scenario number two

Similar to the previous part, here, based on the value stored in variable “scenario_two” the following part of the script will change the position in scenario one with the improved position and will calculate the expected traffic based on the new positions. The result of the calculation will be stored in column “exp_traffic2”

Note: If the value of the position is under 0 will be replaced with 1, automatically.

df2['scenario_2'] = np.where(df2['scenario_1'] > 1, df2['scenario_1'] - scenario_two, 1 )
df2['scenario_2'] = np.where(df2['scenario_2'] <= 0, 1, df2['scenario_2'])
position_rule = [
    (df2['scenario_2'] == 1),
    (df2['scenario_2'] >= 2) & (df2['scenario_2'] < 2.9),
    (df2['scenario_2'] >= 3) & (df2['scenario_2'] < 3.9),
    (df2['scenario_2'] >= 4) & (df2['scenario_2'] < 4.9),
    (df2['scenario_2'] >= 5) & (df2['scenario_2'] < 5.9),
    (df2['scenario_2'] >= 6) & (df2['scenario_2'] < 6.9),
    (df2['scenario_2'] >= 7) & (df2['scenario_2'] < 7.9),
    (df2['scenario_2'] >= 8) & (df2['scenario_2'] < 8.9),
    (df2['scenario_2'] >= 5)
]
ctr_curve = [(df2['Volume']*0.2569), #CTR position 1
           (df2['Volume']*0.1561), #CTR position 2
           (df2['Volume']*0.0986), #CTR position 3
           (df2['Volume']*0.0678), #CTR position 4
           (df2['Volume']*0.0483), #CTR position 5
           (df2['Volume']*0.0356), #CTR position 6
           (df2['Volume']*0.0269), #CTR position 7
           (df2['Volume']*0.0191), #CTR position 8
           (df2['Volume']*0.0147) #CTR position 9
           ]
df2['exp_traffic2'] = np.select(position_rule, ctr_curve)

Basic Keyword segmentation based on Keyword Difficulty(KD)

On purpose I kept the columns Cost-Per-Click(CPC) and the Keyword Difficulty(KD) and based on the KD, I categorised the keywords into three categories – Easy (<20), Medium, Hard(>60).

df2['Type'] = np.where(df2['KD'] > 60, 'hard', (np.where(df2['KD'] < 20, 'easy', 'medium')))
df2 = df2.round(decimals=1)
df2.head()

With the current manipulations the table should look like this:

Analysing the expected traffic

This simple analysis is done by creating a pivot table with an index of the type/category of the keywords and the sum of the values. For better understanding, I added also the “total” in a separate row.

Also, I remove the unnecessary columns, such as the current position, KD, and the positions from scenarios one and two.

summary_table = pd.pivot_table(data=df2,index=['Type'], aggfunc=np.sum)
summary_table = summary_table.round(decimals=1)
summary_table.loc["Total"] = summary_table.sum()
summary_df = pd.DataFrame(summary_table)
summary_df = summary_df.drop(["Current position", 'KD', 'scenario_1', 'scenario_2'], axis=1)
pivot-table-summary

Visualisation

As a final output, I created a bar chart with the current traffic, and the traffic from both scenarios divided by category.

summary_df = summary_df.drop(['Volume', 'CPC'], axis=1)
ax = summary_df.T.plot(kind='bar', ylabel='Type')
visualisation-plot-bars

Downloading the data

As a final step I added a few lines with which you can download the raw data with all keywords and also the pivot table with the summary.

Downloading in a CSV format the raw data.

df2.to_csv('/content/kw_forecast.csv')
files.download('/content/kw_forecast.csv')

Downloading in a CSV format the pivot table with the results

summary_table.to_csv('/content/summary.csv') 
files.download('/content/summary.csv')

Google Colaboratory: HERE

GitHub: HERE

Conclusion

This analysis will help you to get an overview of organic traffic on a specific page and subfolder of your competitor. The numbers are rough because of the many factors and fluctuations in the CTR, based on the features in the SERP, market, intent, etc. However, you can identify opportunities based on the expected traffic from not the competitive keywords.

linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram