Scraping Google Search Result Page with Python

Google, the most powerful and most used search engine on the planet. As marketers, we turn to Google almost every hour to look something up or audit something.

When you are doing keyword research, it becomes tedious to search each and every keyword, copy the results, and then analyze them.

Today I will share a few tricks that will save your day and give you some space to grab a coffee and catch up on gossip while the machine works for you.


Let's start with the logic behind it.

When automating, you should first jot down the steps you perform manually. In this use case we are automating the steps below:

  1. Copy Keyword from the list.

  2. Paste in google search box.

  3. Click on search icon.

  4. Copy responses and paste into CSV/Excel.

As part of the automation we will use the following Python packages:

  1. requests

  2. bs4

  3. pandas

Let's begin with some coding now.


First things first, import the necessary packages. I assume you already have them installed in your environment. If not, then 'pip' them in :)
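If you do need to install them, the usual pip command is below (note that the bs4 import is provided by the beautifulsoup4 package on PyPI):


pip install requests beautifulsoup4 pandas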

#Import Packages
import requests
import bs4
import pandas as pd   

Now get your keyword list ready; in the example below I am reading it from a CSV.


url = '/path_to_your.csv'
input_df = pd.read_csv(url)
kwlist = input_df['Keyword']  # Assuming 'Keyword' is the column name in the CSV
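For reference, the input file is assumed to be a simple one-column CSV like the hypothetical example below (the file path above and the keywords here are placeholders only):


Keyword
best running shoes
seo audit checklist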

Create storage lists for the scraped data.


a_text = []  # To store the link text of each result
h_ref = []   # To store the URLs of the results
k_wd = []    # To map the keyword which gave the results

Now comes the Core Logic to mine your data.



num_count = 1
for keyword in kwlist:

    url = 'https://google.com/search?q=' + keyword
    print("Keyword {} of {}".format(num_count, len(input_df['Keyword'])))
    request_result = requests.get(url)

    # Creating soup from the fetched request
    soup = bs4.BeautifulSoup(request_result.text,
                             "html.parser")
    # Get all the <h3> headings of the search results (collected here, not used further below)
    heading_object = soup.find_all('h3')
    a_tag = soup.find_all('a')

Append the results to the containers.



    for i in a_tag:
        a_text.append(i.text)
        h_ref.append(i.get('href', ''))  # .get() avoids a KeyError on <a> tags without an href
        k_wd.append(keyword)

    num_count = num_count + 1
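A couple of practical notes on the fetch step above: keywords containing spaces or special characters should be URL-encoded, Google may serve different (or blocked) markup to clients without a browser-like User-Agent header, and firing requests in a tight loop can get you rate-limited. Here is a minimal, hedged sketch of a more defensive fetch (the header value and the delay are example choices, not requirements):


import time
import urllib.parse

import requests

HEADERS = {"User-Agent": "Mozilla/5.0"}  # example value; any browser-like UA string

def fetch_serp(keyword):
    # URL-encode the keyword so spaces and special characters survive the query string
    url = "https://google.com/search?q=" + urllib.parse.quote_plus(keyword)
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # fail loudly on 429/5xx instead of parsing an error page
    time.sleep(2)                # be polite between requests; tune as needed
    return response.text


You could swap request_result = requests.get(url) in the loop above for a call to fetch_serp(keyword) and pass the returned text to BeautifulSoup instead.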

This will give you all the responses. Depending on your use case, you may want to filter this data.

For example, if you want to get only the URLs, below is a function definition to do that.



#Function definition to keep only the strings that contain "https"

def find_match(string_list, wanted):
    str_list=[]
    for string in string_list:
        if wanted in string:
            str_list.append(string)
    return str_list


https_list=find_match(h_ref,'https')
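Depending on the markup Google serves to plain requests, result links often come back wrapped as /url?q=<target>&sa=..., with the real destination tucked into the q parameter. If you see hrefs like that in h_ref, a small helper along these lines can unwrap them (a sketch only; adapt it to the hrefs you actually get):


from urllib.parse import urlparse, parse_qs

def unwrap_google_link(href):
    # Hrefs such as "/url?q=https://example.com/&sa=U&..." carry the real target in the "q" parameter
    if href.startswith('/url?'):
        params = parse_qs(urlparse(href).query)
        return params.get('q', [href])[0]
    return href

clean_urls = [unwrap_google_link(h) for h in h_ref]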

Time to get the output into a CSV now.


#Build the output table from the three lists (column names are just one sensible choice)
df = pd.DataFrame({'Keyword': k_wd, 'Text': a_text, 'URL': h_ref})
filepath_k = 'AIMarketer Pvt Ltd/SERP_Result_2.csv'
df.to_csv(filepath_k)


You can now safely step away from your desk and do anything but monitor the screen. Provided, of course, you have instructed your machine not to go to sleep :D.


Thanks for reading. Logging off until the next article.