
Scraping Google Search Result Page with Python

Google is the most powerful and most used search engine on the planet. As marketers, we turn to Google almost every hour to look something up or to audit something.

When you are doing keyword research, it becomes tedious to search each keyword by hand, copy the results, and then analyze them.

Today I will share a few tricks that will save your day and give you some time for a coffee and a bit of gossip while the machine works for you.


Let's start with the logic behind it.

Before automating anything, first jot down the steps you perform manually. In this use case we are automating the steps below:

  1. Copy Keyword from the list.

  2. Paste in google search box.

  3. Click on search icon.

  4. Copy responses and paste into CSV/Excel.

For the automation we will use the following Python packages:

  1. requests

  2. bs4

  3. pandas

Let's begin some coding now.


First things first, import the necessary packages. I assume you already have them installed in your environment; if not, just 'pip' them in :)
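If any are missing, a one-line install covers all three; note that the bs4 import is provided by the beautifulsoup4 package.

pip install requests beautifulsoup4 pandas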

#Import Packages
import requests
import bs4
import pandas as pd   

Now get your keyword list ready; in the example below I am reading it from a CSV.


input_path='/path_to_your.csv'
input_df=pd.read_csv(input_path)
kwlist=input_df['Keyword'] # Assuming 'Keyword' is a column name in the CSV
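For reference, the input file is assumed to look something like the snippet below. The header row must contain 'Keyword', since that is the column the code reads; the keywords themselves are just placeholders.

Keyword
best running shoes
crm software for small business
python web scraping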

Create storage lists for the scraped data.


a_text=[] # To store the text of each result link
h_ref=[]  # To store the URLs of the results
k_wd=[]   # To map each result back to the keyword that produced it

Now comes the core logic to mine your data.



num_count = 1 # Progress counter for the print below

for kwd in kwlist:

    url = 'https://google.com/search?q=' + kwd
    print("{} of Total {} keywords".format(num_count, len(kwlist)))
    request_result = requests.get(url)

    # Creating soup from the fetched response
    soup = bs4.BeautifulSoup(request_result.text,
                             "html.parser")
    # Get all major headings of the search results (handy if you only want result titles)
    heading_object = soup.find_all('h3')
    a_tag = soup.find_all('a')

Append the results to the containers.



    for i in a_tag:
        a_text.append(i.text)
        h_ref.append(i.get('href', '')) # .get() avoids a KeyError on <a> tags without an href
        k_wd.append(kwd)

    num_count = num_count + 1
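A word of caution before you run this at scale: Google often blocks or serves a consent page to the default requests user agent, and rapid-fire queries can get your IP rate limited. Below is a minimal sketch of a more polite fetch, assuming a browser-like User-Agent and a short pause between keywords; polite_get is a hypothetical helper name and the exact header string is only illustrative.

import time
import requests

def polite_get(url, pause=2.0):
    # Browser-like User-Agent; the exact string here is only illustrative
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
    response = requests.get(url, headers=headers)
    time.sleep(pause) # Short pause so consecutive keywords don't hammer Google
    return response

Inside the loop above, you would simply swap requests.get(url) for polite_get(url).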

This will give you all the responses. Depending on your use case, you may want to filter this data.

For example, if you want only the URLs, below is a function definition to extract them.



# Function definition to keep only the strings containing a wanted substring

def find_match(string_list, wanted):
    str_list = []
    for string in string_list:
        if wanted in string:
            str_list.append(string)
    return str_list


https_list=find_match(h_ref,'https')
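One more note: in the plain HTML that Google serves to requests, organic result links are often wrapped in the form /url?q=<target>&sa=..., so a bare 'https' substring match can let navigation links slip through. Below is a sketch that unwraps them with the standard library; unwrap_google_links is a hypothetical helper, and the wrapped format depends on which HTML variant Google returns to you.

from urllib.parse import urlparse, parse_qs

def unwrap_google_links(hrefs):
    clean = []
    for href in hrefs:
        if href.startswith('/url?'):
            # The real target URL sits in the 'q' query parameter
            target = parse_qs(urlparse(href).query).get('q', [''])[0]
            if target.startswith('https'):
                clean.append(target)
    return clean

https_list = unwrap_google_links(h_ref) # Alternative to find_match above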

Time to write the output to a CSV now.


# Build a DataFrame from the collected lists before writing it out
df = pd.DataFrame({'Keyword': k_wd, 'Text': a_text, 'URL': h_ref})

filepath_k = 'AIMarketer Pvt Ltd/SERP_Result_2.csv'
df.to_csv(filepath_k, index=False)


You can now safely step away from your desk and do anything but monitor the screen, provided you have told your machine not to go to sleep :D


Thanks for reading. Logging off until the next article.




