Java – Randomly generate meaningful (valid) English words in Android applications


I’m making a dictionary app. To do this, I’m using the Pearson Dictionary API. I need to generate a word so I can look up the definition of that word.

Question

I know how to generate a random word, but I don’t know how to generate a meaningful English word.

I tried to solve this by requesting a JSON response for each generated word and checking the results[] array in the response (results[] contains the definitions of the word). If results[].length > 0, the word is a valid English word.

But this approach has a serious problem: a 5-letter word has 26^5 = 11,881,376 possible letter combinations, and only a small fraction of them are meaningful English words. The number of combinations grows by a factor of 26 with every added letter, so generating a meaningful word this way can take a long time.
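To make the growth concrete, here is a minimal sketch that computes the number of lowercase letter strings for a few word lengths. It assumes a 26-letter alphabet, as in the question; the class and method names are only illustrative.

```java
public class SearchSpace {
    /** Number of distinct strings of the given length over a 26-letter alphabet. */
    static long combinations(int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, 26L);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int length = 3; length <= 8; length++) {
            System.out.println(length + " letters: " + combinations(length) + " combinations");
        }
    }
}
```

Each of those candidates would need its own API request to validate, which is why random guessing does not scale.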

How can I check whether a generated word is a meaningful English word? Is there a viable programmatic way to do this?

Or is there any other way to fix this?

Solution

As far as I know, you can either generate random strings of letters and check whether they are words (a very slow, hit-or-miss approach, as you have found), or store a list of “known good” words and pick randomly from that list.
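The second option could look roughly like the sketch below. The five-word list is a hypothetical placeholder, and the class and method names are made up for illustration; a real app would hold a much larger list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class RandomWordPicker {
    // Hypothetical sample data; a real dictionary app would load thousands of words.
    static final List<String> WORDS =
            Arrays.asList("apple", "bridge", "candle", "garden", "river");

    /** Returns one word chosen uniformly at random from the given list. */
    static String pick(List<String> words, Random rng) {
        if (words.isEmpty()) {
            throw new IllegalArgumentException("word list is empty");
        }
        return words.get(rng.nextInt(words.size()));
    }

    public static void main(String[] args) {
        System.out.println(pick(WORDS, new Random()));
    }
}
```

Every word returned this way is valid by construction, so a single definition lookup per word is enough.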

How big the list needs to be depends on what you want to achieve.

According to this page, the OED has about 171,476 major entries, not counting plural forms (cat, cats), standard variants (sit, sitting), or the separate senses of words that belong to multiple categories (e.g. dog can be a noun [the animal] or a verb [to follow persistently]). According to this page, an adult knows 20,000 to 35,000 words on average, so 50,000 carefully chosen words should cover most general uses.

The answer to this question (now closed) lists sources for some word lists. Checking one of them (originally provided by infochimps.org, but available as a plain text list on GitHub) shows that the average length of its more than 350,000 words is less than 10 characters. On Linux (and possibly other Unix-like systems), /usr/share/dict/words can be a useful starting point.
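As a rough sketch of how such a list might be used, the code below reads a one-word-per-line text source, keeps only lowercase a-z words of a requested length, and picks one at random. The class name, the `load` helper, and the filtering rule are assumptions for illustration; the default path is the Linux file mentioned above and may not exist, or may use a different encoding, on a given system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class WordListLoader {
    /** Reads one word per line and keeps lowercase a-z words of exactly the given length. */
    static List<String> load(Reader source, int length) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                // Skips proper nouns, possessives, and anything with non-letter characters
                if (word.length() == length && word.matches("[a-z]+")) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass another path as the first argument
        Path path = Paths.get(args.length > 0 ? args[0] : "/usr/share/dict/words");
        List<String> fiveLetter = load(Files.newBufferedReader(path), 5);
        if (fiveLetter.isEmpty()) {
            System.out.println("No matching words found");
            return;
        }
        System.out.println(fiveLetter.get(new Random().nextInt(fiveLetter.size())));
    }
}
```

In an Android app the list would more likely be bundled with the app, for example as an asset, and passed to `load` through an `InputStreamReader`. Loading it once and keeping the filtered list in memory avoids re-reading the file for every word.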
