Python regular expression can't find a pattern when using PySpark on Apache Spark

Can anyone explain the behavior of this regular expression?

df = df2.withColumn("extracted", F.regexp_extract("title", "[Pp]ython", 0))

It finds the pattern “Python” or “python” in the following titles:

title
A fast PostgreSQL client library for Python: 3x faster than psycopg2
A project template for data science in Python
A simple python framework to build/train LUIS models
An Introduction to Stock Market Data Analysis with Python (Part 1)
Asynchronous Python
Cubr  A Rubiks Cube Solver Written in Python and using Webcam Input (2013)
Python 4 Kids: Python for Kids: Python 3  Project 10

But it does not find “Python” or “python” in the titles below:

title
Python Core Development Sprint 2016: 3.6 and beyond
Hypothesis.works articles: 3.5.0 and 3.5.1 Releases of Hypothesis for Python
Total pip packages downloaded, separated by Python versions (June  August 2016)
PEP 530: Asynchronous Comprehensions in Python 3.6
Python 2.7 still reigns supreme in pip installs
CheckiO  games for Python and JavaScript coders. ClassRoom support is included
VR Zero, Virtual Reality on the RaspberryPi, in Python

Thank you

Solution

Use a regular expression that ignores case.

(?i) – turns on case-insensitive matching for the pattern that follows it
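As a rough illustration of what the (?i) flag changes, here is a minimal sketch using Python's built-in re module rather than Spark. Spark's regexp_extract runs on Java regular expressions, so this only shows the flag's semantics and not Spark's exact behavior. The extract helper and the all-caps title are hypothetical additions made for this example.

```python
import re

titles = [
    "Python Core Development Sprint 2016: 3.6 and beyond",
    "CheckiO  games for python and JavaScript coders. ClassRoom support is included",
    "PYTHON in all caps",  # hypothetical title, not from the question
]

def extract(pattern, text):
    """Return the first match of pattern in text, or '' when nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else ""

# The character class only covers an upper- or lower-case first letter.
case_sensitive = [extract(r"[Pp]ython", t) for t in titles]

# The inline (?i) flag makes every letter of the pattern case-insensitive.
case_insensitive = [extract(r"(?i)python", t) for t in titles]

print(case_sensitive)
print(case_insensitive)
```

With the flag on, a plain python pattern is enough and the character class is no longer needed.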

data

# Assumes an active SparkSession named spark
data = [
    (1, "Python Core Development Sprint 2016: 3.6 and beyond"),
    (2, "Hypothesis.works articles: 3.5.0 and 3.5.1 Releases of Hypothesis for Python"),
    (3, "CheckiO  games for python and JavaScript coders. ClassRoom support is included"),
]
df = spark.createDataFrame(data, ['id', 'title'])
df.show(truncate=False)

solution

from pyspark.sql import functions as F
df.withColumn('extract', F.regexp_extract(F.col('title'), '(?i)python', 0)).show()

outcome

+---+--------------------+-------+
| id|               title|extract|
+---+--------------------+-------+
|  1|Python Core Devel...| Python|
|  2|Hypothesis.works ...| Python|
|  3|CheckiO  games fo...| python|
+---+--------------------+-------+
