Using a regular expression to match a character sequence inside a string
Setup
I have a large number of product images, some of which have the SKU of the product in the file name.
I need to check if the file name contains the SKU of the product.
All SKUs consist of 5 digits, an underscore, and 2 digits, for example '10008_01', '23521_18', etc.
My code
I'm using a regular expression I found here:
for image in product_image_list:
    if re.match(r"^[0-9]{5}$" + '_' + r"^[0-9]{2}$", image):
        print('Match: ' + image)
    else:
        print("NO match: " + image)
Where:
- image is the name of the image file, such as 'FINAL 10008_01_angle.jpeg' or 'FINAL 10008_detail_B.jpeg'
- product_image_list is a list of image file names.
Question
The above code never finds a match; it only prints 'NO match: ...' for every file.
How do I get it to work? I.e., how do I get:
'Match: FINAL 10008_01_angle.jpeg'
'NO match: FINAL 10008_detail_B.jpeg'
Solution
The ^[0-9]{5}$_^[0-9]{2}$ pattern will never match any string, because the $ anchor requires the end of the string, yet the pattern still has more to match after it (_, then the start of the string, 2 digits, and the end of the string).
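A quick check makes this concrete. The following is only a minimal sketch: the sample strings are chosen for illustration, and re.search is used instead of re.match to show that the anchors alone are enough to rule out a match.

```python
import re

# The pattern from the question, concatenated exactly as written there.
broken = r"^[0-9]{5}$" + '_' + r"^[0-9]{2}$"

# Illustrative inputs: a full file name, a bare SKU, and a bare 5-digit number.
samples = ["FINAL 10008_01_angle.jpeg", "10008_01", "10008"]

# `$` demands the end of the string, but `_` must still consume a character
# after it, so no input can satisfy the pattern.
broken_results = [bool(re.search(broken, s)) for s in samples]
print(broken_results)
```

Even the bare SKU '10008_01' is rejected, so the problem lies in the anchors rather than in the file names.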
You need to fix the regular expression so that it matches <5 digits>_<2 digits> substrings that are not enclosed in other digits, and use it with re.search (because re.match only looks for a match at the beginning of the string):
if re.search(r'(?<!\d)[0-9]{5}_[0-9]{2}(?!\d)', image):
Here,
- (?<!\d) – a negative lookbehind that matches a position in the string that is not immediately preceded by a digit
- [0-9]{5} – 5 digits
- _ – an underscore
- [0-9]{2} – 2 digits
- (?!\d) – a negative lookahead: there must be no digit immediately to the right of the current position.
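To see what the two lookarounds contribute, here is a small sketch. The last two file names are hypothetical; they were made up to place an extra digit on each side of an SKU-like substring.

```python
import re

sku_pattern = re.compile(r'(?<!\d)[0-9]{5}_[0-9]{2}(?!\d)')

names = [
    "FINAL 10008_01_angle.jpeg",   # SKU surrounded by non-digits
    "FINAL 10008_detail_B.jpeg",   # no 2-digit part after the underscore
    "FINAL 110008_01.jpeg",        # hypothetical: extra digit on the left
    "FINAL 10008_012.jpeg",        # hypothetical: extra digit on the right
]

# True only where a 5-digit/2-digit pair is not touching other digits.
lookaround_results = [bool(sku_pattern.search(n)) for n in names]
print(lookaround_results)
```

Only the first name matches. Without the lookbehind and lookahead, the last two names would match as well, on a fragment of a longer number.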
See also this regex demo
To print the matched value, use
import re

for image in product_image_list:
    # m is a match object if an SKU is found anywhere in the name, else None
    m = re.search(r'(?<!\d)[0-9]{5}_[0-9]{2}(?!\d)', image)
    if m:
        print('Matched SKU: {}'.format(m.group()))
    else:
        print("NO match found in '{}'.".format(image))
To match multiple occurrences, use re.findall:
re.findall(r'(?<!\d)[0-9]{5}_[0-9]{2}(?!\d)', image)
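As a short illustration of re.findall, assume a hypothetical file name that carries two SKUs. Because the pattern has no capturing groups, re.findall returns the full matched substrings.

```python
import re

sku_pattern = re.compile(r'(?<!\d)[0-9]{5}_[0-9]{2}(?!\d)')

# Hypothetical file name containing two SKUs.
image = "10008_01 and 23521_18 combined.jpeg"

# With no capturing groups, findall returns each whole match as a string.
found = sku_pattern.findall(image)
print(found)

# A name without any SKU yields an empty list.
none_found = sku_pattern.findall("FINAL 10008_detail_B.jpeg")
print(none_found)
```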