Python – Link with status code 200 redirect

I have a link that returns status code 200, but when I open it in my browser, it redirects to a different page.

When I fetch the same link with Python Requests, I only get the data from the original link, not from the page the browser ends up on. I tried both Requests and urllib, with the same result.

  1. How do I scrape the final URL and its data?

  2. How can a link that returns status 200 still redirect?

>>> import requests
>>> url = 'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r = requests.get(url)
>>> r.url
'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r.history
[]
>>> r.status_code
200

Solution

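This is not an HTTP redirect. The server answers with status 200 and an HTML page containing a meta refresh tag, and the browser performs the redirect after loading that page. Requests only follows HTTP redirects (3xx responses with a Location header), so r.history stays empty and requests.get(...) cannot give you the redirected link directly. The original URL has the following page source: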

<html>
    <head>
        <meta http-equiv="refresh" content="0; URL=http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18">
        <script type="text/javascript" src="http://gc.kis.v2.scr.kaspersky-labs.com/D5838D60-3633-1046-AA3A-D5DDF145A207/main.js" charset="UTF-8"></script>
    </head>
    <body bgcolor="#FFFFFF"></body>
</html>

Here you can see the redirected URL in the content attribute of the meta tag. All you need to do is extract it from the response text, using a regular expression or some simple string splitting.

For example:

import requests

r = requests.get('http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18')
redirected_url = r.text.split('URL=')[1].split('">')[0]
print(redirected_url)
# http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18

r = requests.get(redirected_url)
# Start scraping from this link...

Alternatively, use a regular expression:

import re

# [^"]* stops at the closing quote of the content attribute
redirected_url = re.findall(r'URL=(http[^"]*)"', r.text)[0]
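
Both snippets above depend on the exact spelling and quoting of the tag in this particular page. If that is a concern, a slightly more tolerant approach is to parse the HTML and read the meta tag's attributes. The following is only a minimal sketch using the standard library; the names MetaRefreshParser and extract_meta_refresh are made up for this example, and it assumes the content attribute has the usual "delay; URL=target" form.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Hypothetical helper: remembers the target of the first meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # Assumed format: "0; URL=http://example.com/page"
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")


def extract_meta_refresh(html, base_url):
    """Return the absolute meta refresh target, or None if the page has none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    # urljoin resolves relative targets such as "/next" against the page URL
    return urljoin(base_url, parser.target)
```

With Requests this could be called as extract_meta_refresh(r.text, r.url). A return value of None means no meta refresh tag was found, so the page you already have is the final one.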
