Link with status code 200 redirects
I have a link that returns status code 200, but when I open it in my browser, it redirects to another page.
When I fetch the same link with Python Requests, I only get the data from the original link. I tried both Requests and urllib, and neither followed the redirect.
How do I scrape the final URL and its data?
How can a link that returns status 200 redirect at all?
>>> import requests
>>> url = 'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r = requests.get(url)
>>> r.url
'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r.history
[]
>>> r.status_code
200
Solution
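This redirection is not an HTTP redirect. The server answers with 200, and the browser then performs the redirect client-side because the page contains a meta refresh tag. Requests only follows HTTP 3xx redirects, so requests.get(...) will not take you to the redirected link directly. The original URL has the following page source: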
<html>
<head>
<meta http-equiv="refresh" content="0; URL=http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18">
<script type="text/javascript" src="http://gc.kis.v2.scr.kaspersky-labs.com/D5838D60-3633-1046-AA3A-D5DDF145A207/main.js" charset="UTF-8"></script>
</head>
<body bgcolor="#FFFFFF"></body>
</html>
The redirect target is the URL in the content attribute of the meta tag. To follow the redirect, extract that URL from the response body, using either a regular expression or some simple string splitting.
For example:
import requests

r = requests.get('http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18')
redirected_url = r.text.split('URL=')[1].split('">')[0]
print(redirected_url)
# http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18
r = requests.get(redirected_url)
# Start scraping from this link...
Alternatively, use a regular expression:
import re

# Match up to the closing quote of the content attribute
redirected_url = re.findall(r'URL=(http[^"]*)"', r.text)[0]
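Both snippets above depend on the exact spelling and quoting of the tag on this particular page. If that is a concern, a slightly more tolerant approach is to parse the HTML and read the meta tag's attributes. The following is only a sketch using the standard library; the names MetaRefreshParser and extract_meta_refresh_url are made up for this example, and it assumes the content attribute has the usual "seconds; URL=target" form.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Hypothetical helper: remembers the target of the first meta refresh tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        content = attrs.get("content") or ""
        # Assumed format: "0; URL=http://example.com/next"
        _, _, rest = content.partition(";")
        key, sep, value = rest.strip().partition("=")
        if sep and key.strip().lower() == "url":
            self.target = value.strip().strip("'\"")


def extract_meta_refresh_url(html, base_url):
    """Return the absolute meta refresh target, or None if the page has none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.target is None:
        return None
    # Resolve relative targets against the URL the page was fetched from
    return urljoin(base_url, parser.target)
```

With a response r from requests.get(url), calling extract_meta_refresh_url(r.text, r.url) should give the URL to request next, or None when the page has no meta refresh tag.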