As an SEO specialist, you are probably often asked about what would be the organic traffic that you will receive from some new page or new vertical.
That is why I decided to speed up the process of getting the estimation of expected organic traffic using basic Python automation.
I used a calculation that every SEO expert would easily understand and probably already knows. The script is using the current position of each keyword, avg. search volume and the CTR to calculate the expected organic traffic.
However, remember that this is just an estimation the actual traffic you are going to receive depends on many other different factors and SERP features.
What do you need to run the script?
Export of the “Organic keywords” report from Ahrefs – choose the data for the specific URL or subfolder that you are targetting.
IMPORTANT: Export the data in UTF-8 format
2. Access to Google Colaboratory
First, you should import all needed libraries:
import pandas as pd import numpy as np from google.colab import files
Then you have to upload the exported file from aHrefs > Organic Keywords report.
files.upload()
The third key moment is to copy the name of the file and paste it here. Then the additional columns that are not needed will be removed.
df2 = pd.read_csv('/content/www.fantasypros.com-nfl-depth-chart--organic__2022-08-31_08-47-55.csv')
The improvement has to be set up in the beginning and stored in different variables. scenario_one will keep the number of positions that you want to improve the ranking in scenario one.
Example: If you are calculating the traffic of your competitors, in scenario one you can increase the positions by 1 point and calculate the traffic if you outrank your competitor.
scenario_one = 1 #@param {type:"slider", min:1, max:100, step:1} scenario_two = 4 #@param {type:"slider", min:1, max:100, step:1}
With df2.dropna, you are removing the keywords with a missing current position. These keywords are out of the top 100, most often keywords that are irrelevant and we should not add to the calculation.
df2 = df2.drop(["Previous traffic", "Traffic change", "Previous position", "Position change", "Previous URL inside", "Previous URL", "Current URL inside", "Current URL", "Previous date", "Current date", 'SERP features',"Current traffic"], axis=1) df2.dropna(inplace = True)
The ‘current_traffic’ is calculated based on two conditions – keyword position and CTR. Where the CTR is dependent on the current position.
#Calculating the current traffic based on the current position position_rule = [ (df2['Current position'] == 1), (df2['Current position'] >= 2) & (df2['Current position'] < 2.9), (df2['Current position'] >= 3) & (df2['Current position'] < 3.9), (df2['Current position'] >= 4) & (df2['Current position'] < 4.9), (df2['Current position'] >= 5) & (df2['Current position'] < 5.9), (df2['Current position'] >= 6) & (df2['Current position'] < 6.9), (df2['Current position'] >= 7) & (df2['Current position'] < 7.9), (df2['Current position'] >= 8) & (df2['Current position'] < 8.9), (df2['Current position'] >= 5) ] ctr_curve = [(df2['Volume']*0.2569), #Current position 1 (df2['Volume']*0.1561), #Current position 2 (df2['Volume']*0.0986), #Current position 3 (df2['Volume']*0.0678), #Current position 4 (df2['Volume']*0.0483), #Current position 5 (df2['Volume']*0.0356), #Current position 6 (df2['Volume']*0.0269), #Current position 7 (df2['Volume']*0.0191), #Current position 8 (df2['Volume']*0.0147) #Current position 9 ] df2['current_traffic'] = np.select(position_rule, ctr_curve)
Note: Current CTR is based on the Advanced Web Ranking(AWR) in their Google Organic CTR History Study.
Based on the value stored in the variable “scenario_one” the following part of the script will change the current positions and will calculate the expected traffic based on the new positions.
Note: If the value of the position is under 0 will be replaced with 1, automatically.
df2['scenario_1'] = np.where(df2['Current position'] > 1, df2['Current position'] - scenario_one, 1 ) df2['scenario_1'] = np.where(df2['scenario_1'] <= 0, 5, df2['scenario_1']) position_rule = [ (df2['scenario_1'] == 1), (df2['scenario_1'] >= 2) & (df2['scenario_1'] < 2.9), (df2['scenario_1'] >= 3) & (df2['scenario_1'] < 3.9), (df2['scenario_1'] >= 4) & (df2['scenario_1'] < 4.9), (df2['scenario_1'] >= 5) & (df2['scenario_1'] < 5.9), (df2['scenario_1'] >= 6) & (df2['scenario_1'] < 6.9), (df2['scenario_1'] >= 7) & (df2['scenario_1'] < 7.9), (df2['scenario_1'] >= 8) & (df2['scenario_1'] < 8.9), (df2['scenario_1'] >= 5) ] ctr_curve = [(df2['Volume']*0.2569), #CTR position 1 (df2['Volume']*0.1561), #CTR position 2 (df2['Volume']*0.0986), #CTR position 3 (df2['Volume']*0.0678), #CTR position 4 (df2['Volume']*0.0483), #CTR position 5 (df2['Volume']*0.0356), #CTR position 6 (df2['Volume']*0.0269), #CTR position 7 (df2['Volume']*0.0191), #CTR position 8 (df2['Volume']*0.0147) #CTR position 9 ] df2['exp_traffic-1'] = np.select(position_rule, ctr_curve)
Similar to the previous part, here, based on the value stored in variable “scenario_two” the following part of the script will change the position in scenario one with the improved position and will calculate the expected traffic based on the new positions. The result of the calculation will be stored in column “exp_traffic2”
Note: If the value of the position is under 0 will be replaced with 1, automatically.
df2['scenario_2'] = np.where(df2['scenario_1'] > 1, df2['scenario_1'] - scenario_two, 1 ) df2['scenario_2'] = np.where(df2['scenario_2'] <= 0, 1, df2['scenario_2']) position_rule = [ (df2['scenario_2'] == 1), (df2['scenario_2'] >= 2) & (df2['scenario_2'] < 2.9), (df2['scenario_2'] >= 3) & (df2['scenario_2'] < 3.9), (df2['scenario_2'] >= 4) & (df2['scenario_2'] < 4.9), (df2['scenario_2'] >= 5) & (df2['scenario_2'] < 5.9), (df2['scenario_2'] >= 6) & (df2['scenario_2'] < 6.9), (df2['scenario_2'] >= 7) & (df2['scenario_2'] < 7.9), (df2['scenario_2'] >= 8) & (df2['scenario_2'] < 8.9), (df2['scenario_2'] >= 5) ] ctr_curve = [(df2['Volume']*0.2569), #CTR position 1 (df2['Volume']*0.1561), #CTR position 2 (df2['Volume']*0.0986), #CTR position 3 (df2['Volume']*0.0678), #CTR position 4 (df2['Volume']*0.0483), #CTR position 5 (df2['Volume']*0.0356), #CTR position 6 (df2['Volume']*0.0269), #CTR position 7 (df2['Volume']*0.0191), #CTR position 8 (df2['Volume']*0.0147) #CTR position 9 ] df2['exp_traffic2'] = np.select(position_rule, ctr_curve)
On purpose I kept the columns Cost-Per-Click(CPC) and the Keyword Difficulty(KD) and based on the KD, I categorised the keywords into three categories – Easy (<20), Medium, Hard(>60).
df2['Type'] = np.where(df2['KD'] > 60, 'hard', (np.where(df2['KD'] < 20, 'easy', 'medium'))) df2 = df2.round(decimals=1) df2.head()
With the current manipulations the table should look like this:
This simple analysis is done by creating a pivot table with an index of the type/category of the keywords and the sum of the values. For better understanding, I added also the “total” in a separate row.
Also, I remove the unnecessary columns, such as the current position, KD, and the positions from scenarios one and two.
summary_table = pd.pivot_table(data=df2,index=['Type'], aggfunc=np.sum) summary_table = summary_table.round(decimals=1) summary_table.loc["Total"] = summary_table.sum() summary_df = pd.DataFrame(summary_table) summary_df = summary_df.drop(["Current position", 'KD', 'scenario_1', 'scenario_2'], axis=1)
As a final output, I created a bar chart with the current traffic, and the traffic from both scenarios divided by category.
summary_df = summary_df.drop(['Volume', 'CPC'], axis=1) ax = summary_df.T.plot(kind='bar', ylabel='Type')
As a final step I added a few lines with which you can download the raw data with all keywords and also the pivot table with the summary.
df2.to_csv('/content/kw_forecast.csv') files.download('/content/kw_forecast.csv')
summary_table.to_csv('/content/summary.csv') files.download('/content/summary.csv')
Google Colaboratory: HERE
GitHub: HERE
This analysis will help you to get an overview of organic traffic on a specific page and subfolder of your competitor. The numbers are rough because of the many factors and fluctuations in the CTR, based on the features in the SERP, market, intent, etc. However, you can identify opportunities based on the expected traffic from not the competitive keywords.