JavaScript – Web scraping requires authentication via JavaScript alerts

I’ve been trying to scrape some raw XML data from internal company sites (URLs omitted for security reasons). I’m currently doing this with Selenium and BeautifulSoup, but I’m open to other options. When I visit the site manually, the browser shows a pop-up asking for my username and password (see image). I tried to pass the credentials automatically as follows, but I am still not authenticated:

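from bs4 import BeautifulSoup
from selenium import webdriver

username, password = "my_user", "my_password"  # placeholder credentials

driver = webdriver.Chrome()

def main():
    # Gets the specified list of direct reports.
    # Credentials are embedded in the URL as username:password@host.
    # Note the f prefix: without it the braces are sent literally.
    url = f"http://{username}:{password}@myURL.com"
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    # parsing logic follows ...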

However, when the script runs, I still have to enter the username and password manually in the ChromeDriver-controlled browser window; after that, the rest of the program works as expected.

Is there a way to avoid this manual entry? I’ve also tried solutions based on driver.alert and on sending keys and credentials to the browser, without success. (I know this is hard to help with because the site isn’t accessible outside our network, so any insight is appreciated!)

EDIT: I should mention that this method worked a few weeks ago, but it stopped working after a Chrome update…

[Image: authentication pop-up]

Solution

Your login process may return some kind of access token, either as a value in the response body or in a header, typically an Authorization header or a Set-Cookie header.
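As a minimal sketch, assuming you already have the login response’s headers and body (for example `response.headers` and `response.text` from whatever HTTP client you use), the places listed above can be checked in turn. The cookie name `session` and the JSON field `access_token` are hypothetical; substitute whatever your server actually uses.

```python
import json
from http.cookies import SimpleCookie
from typing import Optional


def extract_token(headers: dict, body: str, cookie_name: str = "session") -> Optional[str]:
    """Look for an access token in a login response.

    Checks, in order: an Authorization header, a Set-Cookie header carrying
    `cookie_name` (a hypothetical name), and an "access_token" field
    (also hypothetical) in a JSON body. Returns None if nothing is found.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    # (A plain dict keeps only one Set-Cookie header; fine for a sketch.)
    lowered = {k.lower(): v for k, v in headers.items()}

    # 1. Authorization header, e.g. "Bearer abc123" -> "abc123"
    auth = lowered.get("authorization")
    if auth:
        scheme, _, value = auth.partition(" ")
        return value or scheme

    # 2. Set-Cookie header, e.g. "session=abc123; Path=/; HttpOnly"
    set_cookie = lowered.get("set-cookie")
    if set_cookie:
        cookie = SimpleCookie()
        cookie.load(set_cookie)
        if cookie_name in cookie:
            return cookie[cookie_name].value

    # 3. JSON body, e.g. {"access_token": "abc123"}
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return None  # body is not JSON (e.g. an HTML page)
    if isinstance(data, dict) and "access_token" in data:
        return data["access_token"]
    return None
```

Checking the header locations before the body reflects the fact that session-based sites (common for internal tools) usually hand the token back as a cookie rather than in the page content.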

In most cases, you then need to send that token with every request, as an Authorization header, a body parameter, or whatever else the page expects.

Your job is to find the token by checking the server’s response when you authenticate, store it somewhere, and send it back every time you make a page request to the server.
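The find, store, and resend loop can be sketched with only the Python standard library. This assumes the pop-up is an HTTP Basic authentication challenge (a native browser dialog rather than a JavaScript alert) and that a successful login returns a session cookie; internal sites sometimes use NTLM or Kerberos instead, which this sketch does not cover. A small local server stands in for the internal site, and `USER`, `PASSWORD`, `TOKEN`, `/login` and `/data` are all hypothetical.

```python
import base64
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical demo values; the real site's credentials and paths will differ.
USER, PASSWORD, TOKEN = "alice", "s3cret", "abc123"


class DemoHandler(BaseHTTPRequestHandler):
    """Stand-in for the internal site: /login requires Basic auth and sets a
    session cookie; /data only answers when that cookie is sent back."""

    def do_GET(self):
        if self.path == "/login":
            expected = "Basic " + base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
            if self.headers.get("Authorization") != expected:
                # This 401 challenge is what makes a browser show its credential pop-up.
                self.send_response(401)
                self.send_header("WWW-Authenticate", 'Basic realm="demo"')
                self.end_headers()
                return
            self.send_response(200)
            self.send_header("Set-Cookie", f"session={TOKEN}; Path=/")
            self.end_headers()
            self.wfile.write(b"logged in")
        elif self.path == "/data" and f"session={TOKEN}" in self.headers.get("Cookie", ""):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"<data>ok</data>")
        else:
            self.send_response(403)
            self.end_headers()

    def log_message(self, *args):
        pass  # silence per-request logging for the demo


def start_demo_server():
    # Port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fetch_with_session(base_url: str) -> bytes:
    # Answer the 401 challenge with the stored credentials...
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, USER, PASSWORD)
    # ...and keep every Set-Cookie token in a jar that is replayed on later requests.
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(passwords),
        urllib.request.HTTPCookieProcessor(jar),
    )
    opener.open(base_url + "/login").read()        # authenticate once, store the token
    return opener.open(base_url + "/data").read()  # token sent back automatically
```

Usage: `server, base = start_demo_server()`, then `fetch_with_session(base)`, and `server.shutdown()` when done. Against a real site you would drop the demo server and pass the real base URL. The second request carries no credentials at all, only the stored cookie, which is the "send it back every time" step.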

How the token is sent back depends on what the server in question requires. The two most likely cases are a request header or a request body parameter.
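For those two cases, here is a sketch of building the follow-up request with Python’s `urllib.request`. The `Bearer` scheme and the `token` field name are assumptions; check what your server actually expects, for example by watching the requests in the browser’s developer tools (Network tab) after logging in manually.

```python
import urllib.parse
import urllib.request


def request_with_header(url: str, token: str, scheme: str = "Bearer") -> urllib.request.Request:
    # Token sent back in the Authorization header on every request.
    # "Bearer" is an assumed scheme; some servers expect another one.
    return urllib.request.Request(url, headers={"Authorization": f"{scheme} {token}"})


def request_with_body_param(url: str, token: str, field: str = "token") -> urllib.request.Request:
    # Token sent back as a form-encoded body parameter in a POST.
    # "token" is a hypothetical field name; use whatever the server expects.
    data = urllib.parse.urlencode({field: token}).encode()
    return urllib.request.Request(url, data=data, method="POST")
```

Either request object can then be passed to `urllib.request.urlopen` (or an opener) as usual; building it separately just makes it easy to inspect what will actually be sent.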
