Java: search string contents for a partial match
I’m working on a project where I need to search for a specific string in a piece of text. However, I don’t need an exact match, only a partial one.
For example, this is the paragraph I’m searching in:
Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a
corticosteroid indicated for the management of the nasal symptoms of
perennial nonallergic rhinitis in adult and pediatric patients aged 4 years
and older.
Then I checked whether any of the words in each of the following lines appear in the paragraph:
1) Unspecified acute lower respiratory infection
2) Vasomotor rhinitis
3) Allergic rhinitis due to pollen
4) Other seasonal allergic rhinitis
5) Allergic rhinitis due to food
6) Allergic rhinitis due to animal (cat) (dog) hair and dander
7) Other allergic rhinitis
8) "Allergic rhinitis, unspecified"
9) Chronic rhinitis
10) Chronic nasopharyngitis
My initial approach was to compute a boolean with contains():
boolean found = med[x].toLowerCase().contains(condition[y].toLowerCase());
However, every check returns false.
The result I expected was:
1) False
2) True
3) True
4) True
5) True
6) True
7) True
8) True
9) True
10) False
I’m very new to Java and its methods. Basically, if any word in A matches any word in B, the result should be true. How do I do that?
Thanks!
Solution
You need to tokenize one of the strings first, i.e. split it into individual words. What you are doing now is checking whether the entire condition line appears in the text as a single substring.
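To make that concrete, here is a minimal sketch of the original check, using a shortened copy of the paragraph (the class and method names are made up for illustration). The whole condition "vasomotor rhinitis" is not a contiguous substring of the text, while a single word from it is.

```java
class WholeLineDemo {
    // The original check: is the entire condition one substring of the text?
    static boolean wholeLineMatch(String text, String condition) {
        return text.toLowerCase().contains(condition.toLowerCase());
    }

    public static void main(String[] args) {
        // Shortened from the paragraph in the question
        String text = "perennial nonallergic rhinitis in adult and pediatric patients";
        // "vasomotor rhinitis" never appears as one contiguous piece of the text
        System.out.println(wholeLineMatch(text, "Vasomotor rhinitis")); // false
        // but one word of the same condition does appear
        System.out.println(wholeLineMatch(text, "rhinitis"));           // true
    }
}
```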
Something like this should work:
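// requires: import java.util.Arrays;
String text = med[x].toLowerCase();
boolean found =
    Arrays.stream(condition[y].split(" "))     // split the condition into words
        .map(String::toLowerCase)
        .map(s -> s.replaceAll("\\W", ""))     // strip punctuation from each word
        .filter(s -> !s.isEmpty())             // drop words that were only punctuation
        .anyMatch(text::contains);             // true if any word occurs in the text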
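A complete, runnable version of the snippet above might look like this. It assumes the paragraph and the ten conditions from the question are plain strings; MED, CONDITIONS and matchesAnyWord are hypothetical names standing in for med[x] and condition[y].

```java
import java.util.Arrays;

class PartialMatch {
    // The paragraph from the question, joined into one string (hypothetical constant)
    static final String MED = "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a "
            + "corticosteroid indicated for the management of the nasal symptoms of "
            + "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years "
            + "and older.";

    // The ten conditions from the question (hypothetical constant)
    static final String[] CONDITIONS = {
        "Unspecified acute lower respiratory infection",
        "Vasomotor rhinitis",
        "Allergic rhinitis due to pollen",
        "Other seasonal allergic rhinitis",
        "Allergic rhinitis due to food",
        "Allergic rhinitis due to animal (cat) (dog) hair and dander",
        "Other allergic rhinitis",
        "\"Allergic rhinitis, unspecified\"",
        "Chronic rhinitis",
        "Chronic nasopharyngitis"
    };

    // True if any word of `condition` occurs anywhere in `text` (case-insensitive substring match)
    static boolean matchesAnyWord(String text, String condition) {
        String lowerText = text.toLowerCase();
        return Arrays.stream(condition.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))  // strip punctuation such as "(", ")" and ","
                .filter(w -> !w.isEmpty())           // drop tokens that were only punctuation
                .anyMatch(lowerText::contains);
    }

    public static void main(String[] args) {
        for (int i = 0; i < CONDITIONS.length; i++) {
            System.out.println((i + 1) + ") " + matchesAnyWord(MED, CONDITIONS[i]));
        }
    }
}
```

This prints false for 1 and 10 and true for 2 through 9, which matches the expected list in the question. Note that "allergic" counts as found because it occurs inside "nonallergic".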
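I added punctuation stripping and removal of empty strings so we don’t get false matches on those: text.contains("") is always true, so an empty token would match any text. (\\W matches any character not in [A-Za-z_0-9], so replacing it with "" removes those characters; you can change the pattern to strip whichever characters you like.)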
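A small sketch of how the pattern behaves and how you might change it. The helper names are made up, and keeping hyphens or enabling Unicode matching are just example variations, not something the code above requires. By default \W treats accented letters as non-word characters; the embedded flag (?U) turns on Pattern.UNICODE_CHARACTER_CLASS so they are kept.

```java
class StripDemo {
    // Default: \W removes every character outside [A-Za-z_0-9]
    static String stripDefault(String token) {
        return token.replaceAll("\\W", "");
    }

    // Example variation: also keep hyphens
    static String stripKeepHyphen(String token) {
        return token.replaceAll("[^\\w-]", "");
    }

    // Example variation: (?U) makes \w and \W Unicode-aware, so accented letters survive
    static String stripUnicode(String token) {
        return token.replaceAll("(?U)\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDefault("(cat)"));            // cat
        System.out.println(stripDefault("non-allergic,"));    // nonallergic
        System.out.println(stripKeepHyphen("non-allergic,")); // non-allergic
        System.out.println(stripDefault("caf\u00e9"));        // caf
        System.out.println(stripUnicode("caf\u00e9"));        // café
    }
}
```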
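If you need it to be more efficient because you have a lot of text, you may want to flip it around: tokenize the text once into a Set, which has fast lookups, and check each condition word against that set. Note that this matches whole words only, so "allergic" no longer matches inside "nonallergic".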
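// requires: import java.util.Arrays; import java.util.Set;
//           import java.util.stream.Collectors; import java.util.stream.Stream;
private static Stream<String> tokenize(String input) {
    return Arrays.stream(input.split(" "))
            .map(String::toLowerCase)
            .map(w -> w.replaceAll("\\W", ""))   // lambda parameter must not reuse the name "input"
            .filter(w -> !w.isEmpty());
}

// Build the set once per text, then check each condition against it
Set<String> words = tokenize(med[x]).collect(Collectors.toSet());
boolean found = tokenize(condition[y]).anyMatch(words::contains);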
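Here is a self-contained sketch of the Set-based version. The class name and the wordSet helper are hypothetical, and tokenize is made static so it can be called from main. The last call shows the whole-word behaviour: "allergic" on its own no longer matches "nonallergic".

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SetMatch {
    // Split on spaces, lowercase, strip non-word characters, drop empty tokens
    static Stream<String> tokenize(String input) {
        return Arrays.stream(input.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty());
    }

    // Hypothetical helper: build the lookup set for one text (do this once per text)
    static Set<String> wordSet(String text) {
        return tokenize(text).collect(Collectors.toSet());
    }

    // Whole-word match: true if any token of `condition` is also a token of the text
    static boolean matchesAnyWord(Set<String> textWords, String condition) {
        return tokenize(condition).anyMatch(textWords::contains);
    }

    public static void main(String[] args) {
        Set<String> words = wordSet("perennial nonallergic rhinitis in adult and pediatric patients");
        System.out.println(matchesAnyWord(words, "Vasomotor rhinitis"));      // true
        System.out.println(matchesAnyWord(words, "Chronic nasopharyngitis")); // false
        // Unlike String.contains on the text, "allergic" does not match "nonallergic" here
        System.out.println(matchesAnyWord(words, "Allergic"));                // false
    }
}
```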
You may also want to filter out stop words such as "to", "and", etc. You can take a standard stop-word list and add an extra filter, right after the one that removes empty strings, that drops any token found in that list.
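A minimal sketch of that extra filter, assuming a tiny hand-picked STOP_WORDS set (a hypothetical placeholder; in practice you would load a full list). With substring matching the problem is real: "to" is found inside "symptoms", so a condition like "due to food" matches the paragraph unless stop words are dropped. Set.of needs Java 9 or later.

```java
import java.util.Arrays;
import java.util.Set;

class StopWordMatch {
    // Tiny illustrative stop-word list (an assumption, not a standard list)
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "to", "of", "in", "for");

    static boolean matchesAnyWord(String text, String condition, boolean dropStopWords) {
        String lowerText = text.toLowerCase();
        return Arrays.stream(condition.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                // extra filter placed right after the empty-string check
                .filter(w -> !dropStopWords || !STOP_WORDS.contains(w))
                .anyMatch(lowerText::contains);
    }

    public static void main(String[] args) {
        String text = "the nasal symptoms of perennial nonallergic rhinitis";
        // "to" occurs inside "symptoms", so the unfiltered check reports a match
        System.out.println(matchesAnyWord(text, "due to food", false)); // true
        System.out.println(matchesAnyWord(text, "due to food", true));  // false
    }
}
```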