Python: breaking words down into sub-words, e.g. motorbike -> motor, bike


I have a list of words like [bike, motorbike, copyright].
Now I want to check whether each word is made up of sub-words that are themselves independent words. The output of my algorithm should look something like: [bike, motor, motorbike, copy, right, copyright].

I already know how to check whether a word is an English word:

import enchant
english_words = []
arr = ["bike", "motorbike", "copyright", "apfel"]
d_brit = enchant.Dict("en_GB")  # British English dictionary
for word in arr:
    # keep only the words the dictionary recognises
    if d_brit.check(word):
        english_words.append(word)

I also found an algorithm that splits a word in every possible way: "Splitting a word into all possible 'subwords' - All possible combinations".

Unfortunately, it takes a long time to split a word like this and then check if it’s an English word because my dataset is huge.
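
To see why the exhaustive approach scales badly: a word of length n has n - 1 gaps between letters, and each gap is either cut or not, so there are 2^(n-1) ways to split it into contiguous pieces. The sketch below only illustrates that growth; it is not the code from the linked question, and the function name all_splits is made up for this example.

```python
from itertools import combinations

def all_splits(word):
    """Yield every way to cut `word` into contiguous, non-empty pieces."""
    n = len(word)
    for k in range(n):  # k = number of cut positions (0 .. n-1)
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]

# "motorbike" has 9 letters, so 2 ** 8 = 256 splits
print(sum(1 for _ in all_splits("motorbike")))
```

Every piece of every split would then need a dictionary lookup, which is where the time goes on a large dataset.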

Can someone help?

Solution

The nested for loops used in that code are very slow in Python. Since performance seems to be the main issue, I recommend looking for existing Python packages that do some of the work, writing your own extension (for example with Cython), or not using Python at all.
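
If only two-part compounds such as motor + bike are needed, one cheaper option is to try each single cut point and look both halves up in a set. This is a sketch under assumptions, not part of the original answer: known_words is a small hypothetical stand-in for a real word list (the d_brit.check call from the question could be used instead), and split_compound, expand and min_len are names invented here. min_len skips one- and two-letter fragments.

```python
def split_compound(word, known_words, min_len=3):
    """Return [left, right] for the first cut where both halves are known, else None."""
    for i in range(min_len, len(word) - min_len + 1):
        left, right = word[:i], word[i:]
        if left in known_words and right in known_words:
            return [left, right]
    return None

def expand(words, known_words):
    """Collect the known input words plus any sub-words found, sorted and de-duplicated."""
    result = set()
    for word in words:
        if word in known_words:
            result.add(word)
        parts = split_compound(word, known_words)
        if parts:
            result.update(parts)
    return sorted(result)

# Hypothetical toy dictionary; replace with a real word list.
known_words = {"bike", "motor", "motorbike", "copy", "right", "copyright"}
print(expand(["bike", "motorbike", "copyright"], known_words))
# ['bike', 'copy', 'copyright', 'motor', 'motorbike', 'right']
```

Each word costs at most len(word) cut points with two set lookups each, instead of an exponential number of splits. Compounds made of three or more parts would need a recursive or dynamic-programming variant.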
