Python error when appending a variable containing a backslash to a URL string

I'm trying to scrape some data from a website that assigns a session cookie and generates HTML containing a crumb code, which I need to append to a URL to get the data. I run into a problem (HTTP 401 Unauthorized) when the crumb variable contains a backslash. Because crumb is a variable, I don't know how to apply the r prefix to it. I tried calling .encode('string-escape') and .replace('\\','\') on the crumb variable, but I couldn't get either to work.

My code looks like this in Python 2.7:

import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.open('http://www.sample.com')

# Some code here that looks for the crumb code in the HTML

# Placeholder for a scraped value that contains a literal backslash
crumb = 'abc\\xyz'

# This line fails when crumb contains a backslash
opener.open('http://www.sample.com/data=' + crumb)

cj.clear()

Does anyone know how to avoid a 401 error when trying to open a URL string that contains a backslash?

Also, if I loop through multiple crumbs, is it necessary to clear the session cookies every time?

Update: It turns out that the backslash came from a \u002F escape sequence in the HTML. I believe it will work if I convert these sequences to forward slashes before adding the string to the URL. How do I convert \u002F in a string to /?

Solution

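Because crumb is a variable, you can't apply the r prefix to it the way you would to a literal such as r'abc\xyz'. I believe str.encode('string-escape') (Python 2 only) might help. It returns a new string rather than modifying crumb, so assign the result. Try:

crumb = 'abc\\xyz'  # stands in for a value scraped at runtime
crumb = crumb.encode('string-escape')  # backslashes are escaped in the result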
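
For the case described in the update, where the backslash is really the start of a \u002F escape left in the scraped HTML, a different approach may be closer to what is needed: decode the escape to the character it stands for, then percent-encode the crumb before appending it. The following is only a rough sketch under those assumptions. The helper names unescape_crumb and build_data_url are made up for illustration, and whether the site expects the slash as %2F or as a plain / is something you would have to check against the real site.

```python
import re

try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

try:
    unichr  # Python 2 builtin
except NameError:
    unichr = chr  # Python 3: chr already handles Unicode code points

# Matches a literal backslash, the letter u, and four hex digits
_UNICODE_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def unescape_crumb(raw):
    # Hypothetical helper: replace each literal backslash-u escape
    # with the character it encodes (002F becomes a forward slash).
    return _UNICODE_ESCAPE.sub(lambda m: unichr(int(m.group(1), 16)), raw)


def build_data_url(base, raw_crumb):
    # Hypothetical helper: decode the crumb, then percent-encode it.
    # safe='' is an assumption so that '/' is encoded as %2F too.
    crumb = unescape_crumb(raw_crumb)
    return base + quote(crumb, safe='')


# What the scraped HTML contains: a backslash followed by u002F
raw = 'abc\\u002Fxyz'
print(unescape_crumb(raw))  # abc/xyz
print(build_data_url('http://www.sample.com/data=', raw))
# http://www.sample.com/data=abc%2Fxyz
```

The result of build_data_url can then be passed to opener.open() in place of the plain string concatenation. If the server turns out to want an unencoded slash, passing safe='/' to quote keeps it as is.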
