Scraping Google Search Result Pages with Python
Google is the most powerful and most used search engine on the planet. As marketers, we go to Google almost every hour to look something up or to audit something.
When you are doing keyword research, it becomes tedious to search each and every keyword, copy the results, and then analyze them.
Today I will share a few tricks that will save your day and give you some space to grab a coffee and catch up on gossip while the machine works for you.
Let's start with the logic behind it.
Before automating anything, first jot down the steps you perform manually. In this use case we are automating the steps below:
Copy a keyword from the list.
Paste it into the Google search box.
Click the search icon.
Copy the results and paste them into a CSV/Excel file.
For the automation we will use the following Python packages:
requests
bs4
pandas
Let's begin coding now.
First things first, import the necessary packages. I assume you already have these packages installed in your environment. If not, then 'pip' them in :)
#Import Packages
import requests
import bs4
import pandas as pd
Now get your keyword list ready. In the example below I am reading it from a CSV.
csv_path = '/path_to_your.csv'
input_df = pd.read_csv(csv_path)
kwlist = input_df['Keyword']  # Assuming 'Keyword' is a column in the CSV
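If you just want to test the flow without a file, a plain Python list works the same as the column above. A minimal sketch, where the keywords are made-up placeholders:
# Quick test without a CSV (placeholder keywords)
kwlist = ['seo audit checklist', 'email marketing tools', 'best crm software']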
Create lists to store the scraped data.
a_text = []  # To store the anchor text of each result
h_ref = []   # To store the URLs of the results
k_wd = []    # To map each result back to the keyword that produced it
num_count = 1  # Progress counter used in the loop below
Now comes the core logic to mine your data.
for kwd in kwlist:
    url = 'https://google.com/search?q=' + kwd
    print("{} of {} keywords".format(num_count, len(kwlist)))
    request_result = requests.get(url)
    # Create soup from the fetched response
    soup = bs4.BeautifulSoup(request_result.text, "html.parser")
    # Get all major headings of the search results
    # (not used further here, but handy if you only want the result titles)
    heading_object = soup.find_all('h3')
    # Get all anchor tags on the page
    a_tag = soup.find_all('a')
    # Append results to the containers
    for i in a_tag:
        a_text.append(i.text)
        h_ref.append(i.get('href', ''))  # Some anchors have no href
        k_wd.append(kwd)
    num_count = num_count + 1
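A few small tweaks make the request above gentler and more robust when keywords contain spaces: let requests handle URL-encoding via params, send a browser-like User-Agent, and pause between keywords. This is only a sketch, assuming Google serves this plain-HTML page to requests at all (it may block you or change its markup); the header value and sleep time are arbitrary choices. Drop it inside the loop in place of the plain requests.get call:
import time

headers = {'User-Agent': 'Mozilla/5.0'}  # Arbitrary browser-like value
request_result = requests.get('https://google.com/search',
                              params={'q': kwd},  # requests URL-encodes the keyword
                              headers=headers)
time.sleep(2)  # Be polite: pause between keywords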
This will give you all the responses. Depending on your use case, you may want to filter this data.
For example, if you only want the URLs, the function below does exactly that.
# Function definition to keep only strings containing "https"
def find_match(string_list, wanted):
    str_list = []
    for string in string_list:
        if wanted in string:
            str_list.append(string)
    return str_list

https_list = find_match(h_ref, 'https')
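One thing to watch: depending on how Google renders the page for a plain requests session, the hrefs often come back as relative redirect links of the form /url?q=<target>&... rather than the destination URL itself. If that is what you see, a small helper like the sketch below (my own addition, not part of the original script) pulls the clean destination out of the q parameter:
from urllib.parse import urlparse, parse_qs

def clean_urls(href_list):
    cleaned = []
    for href in href_list:
        if href.startswith('/url?'):
            # The real destination sits in the 'q' query parameter
            qs = parse_qs(urlparse(href).query)
            cleaned.extend(qs.get('q', []))
    return cleaned

clean_list = clean_urls(h_ref)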
Time to get the output into a CSV now.
# Build a DataFrame from the collected lists and write it out
df = pd.DataFrame({'Keyword': k_wd, 'Text': a_text, 'URL': h_ref})
filepath_k = 'AIMarketer Pvt Ltd/SERP_Result_2.csv'
df.to_csv(filepath_k, index=False)
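If you only care about the rows whose URL contains 'https', you can also trim the DataFrame before saving. A minimal sketch, with a made-up output filename:
# Keep only rows whose URL contains 'https' (optional)
df_https = df[df['URL'].str.contains('https', na=False)]
df_https.to_csv('SERP_Result_https_only.csv', index=False)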
You can now safely step away from your desk and do anything but monitor the screen, provided you have told your machine not to go to sleep :D
Thanks for reading. Logging off until the next article.