Dynamically double-quote “keys” in text to form a valid JSON string in Python
I’m working with text held in a JS variable on a web page. I extract the string with a regular expression and then convert it to a JSON object in Python using json.loads().
The problem I’m having is unquoted “keys”. Right now, I’m doing a series of substitutions for each key in each string (code below), but what I want is to dynamically identify any unquoted keys before passing the string to json.loads().
Example 1: No space after the colon
json_data1 = '[{storeName:"testName",address:"12345 Road",address2:"Suite 500",city:"testCity",storeImage:"http://www.testLink.com",state:"testState",phone:" 999-999-9999",lat:99.9999,lng:-99.9999}]'
Example 2: A space after the colon
json_data2 = '[{storeName: "testName",address: "12345 Road",address2: "Suite 500",city: "testCity",storeImage: "http://www.testLink.com",state: "testState",phone: " 999-999-9999",lat: 99.9999,lng: -99.9999}]'
Example 3: A space after each comma and each colon
json_data3 = '[{storeName: "testName", address: "12345 Road", address2: "Suite 500", city: "testCity", storeImage: "http://www.testLink.com", state: "testState", phone: " 999-999-9999", lat: 99.9999, lng: -99.9999}]'
Example 4: Spaces after the colons, plus line breaks
json_data4 = '''[
{
storeName: "testName",
address: "12345 Road",
address2: "Suite 500",
city: "testCity",
storeImage: "http://www.testLink.com",
state: "testState",
phone: "999-999-9999",
lat: 99.9999, lng: -99.9999
}]'''
I need a pattern that identifies the keys, as opposed to string values that happen to contain a : character, such as the URL in storeImage. In other words, I want to find the keys dynamically and wrap them in double quotes, so that json.loads() returns a valid JSON object.
I’m currently replacing every key in the text this way:
content = re.sub('storeName:', '"storeName":', content)
content = re.sub('address:', '"address":', content)
content = re.sub('address2:', '"address2":', content)
content = re.sub('city:', '"city":', content)
content = re.sub('storeImage:', '"storeImage":', content)
content = re.sub('state:', '"state":', content)
content = re.sub('phone:', '"phone":', content)
content = re.sub('lat:', '"lat":', content)
content = re.sub('lng:', '"lng":', content)
The result is a string representing valid JSON:
json_data = [{"storeName": "testName", "address": "12345 Road", "address2": "Suite 500", "city": "testCity", "storeImage": "http://www.testLink.com", "state": "testState", "phone": "999-999-9999", "lat": 99.9999, "lng": -99.9999}]
I’m sure there’s a better way to do this, but I’ve been unable to find or come up with a regular expression pattern to deal with these issues. Thanks a lot for any help!
Solution
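Something like this should get the job done: ([{,]\s*)([^"':]+)(\s*:)
The first group matches the { or , that precedes a key (plus any whitespace), the second captures the key itself, and the third matches the colon that follows it. A : inside a value such as the storeImage URL is not touched, because it is preceded by a quote rather than by { or ,.
Replace with: \1"\2"\3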
Example: https://regex101.com/r/oV0udR/1
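For reference, here is a minimal sketch of how the pattern above might be applied in Python before calling json.loads(). The helper name quote_keys is made up for illustration, and the input is Example 4 from the question. One caveat is that a string value containing a comma followed later by a colon (for instance "Suite 5, attn: Bob") would be wrongly treated as a key, so this approach assumes values like that do not occur in the data.

```python
import json
import re

# Pattern from the answer: a key is whatever sits between a "{" or ","
# and the next ":", as long as it contains no quotes or colons.
KEY_PATTERN = re.compile(r'([{,]\s*)([^"\':]+)(\s*:)')


def quote_keys(text):
    """Wrap unquoted keys in double quotes (hypothetical helper name)."""
    return KEY_PATTERN.sub(r'\1"\2"\3', text)


# Example 4 from the question: spaces after colons, plus line breaks.
json_data4 = '''[
{
storeName: "testName",
address: "12345 Road",
address2: "Suite 500",
city: "testCity",
storeImage: "http://www.testLink.com",
state: "testState",
phone: "999-999-9999",
lat: 99.9999, lng: -99.9999
}]'''

stores = json.loads(quote_keys(json_data4))
print(stores[0]["storeName"])
print(stores[0]["storeImage"])
```

The same call should work unchanged on Examples 1 to 3, since the \s* parts of the pattern absorb whatever whitespace appears around the keys.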