Python – Dynamic double quotes “keys” in text to form a valid JSON string in python

Dynamic double quotes “keys” in text to form a valid JSON string in python… here is a solution to the problem.

Dynamic double quotes “keys” in text to form a valid JSON string in python

I’m working on text contained in a JS variable on a web page and extracting a string using a regular expression and then converting it to a JSON object in Python using json.loads().

The problem I’m having is unquoted “keys”. Right now, I’m doing a series of substitutions for each key in each string (code below), but what I want is to dynamically identify any unquoted keys before passing the string to json.loads().

Example 1: No space after character

json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:" 999-999-9999",lat:99.9999,lng:-99.9999}]'

Example 2: A space after a character

json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: " 999-999-9999",lat: 99.9999,lng: -99.9999}]'

Example 3 is followed by the space,: character

json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: " 999-999-9999", lat: 99.9999, lng: -99.9999}]'

Example 4 is followed by spaces: characters and line breaks

json_data4 = '''[
{
    storeName: "testName", 
    address: "12345 Road", 
    address2: "Suite 500", 
    city: "testCity", 
    storeImage: "http://www.testLink.com", 
    state: "testState", 
    phone: "999-999-9999", 
    lat: 99.9999, lng: -99.9999
}]'''

I need to create patterns to identify which are keys, rather than random string values that contain characters, such as string links in storeImage. In other words, I want to dynamically look up keys and reference them in double quotes to use json.loads() and return a valid JSON object.

I’m currently replacing every key in the text this way

content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)

Returned as a string representing valid JSON

json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState" , "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]

I’m sure there’s a better way to do this, but I’ve been unable to find or come up with a regular expression pattern to deal with these issues. Thanks a lot for any help!

Solution

Something like this should get the job done: ([{,]\s*)([^"':]+)(\s*:)

Replace with: \1"\2"\3

Example: https://regex101.com/r/oV0udR/1

Related Problems and Solutions