Python – Elasticsearch does not give accurate results


I’m searching in Elasticsearch using a match_phrase query, but I noticed that the returned results are incomplete.
Code:

res = es.search(
    index='indice_1',
    body={
        "_source": ["content"],
        "query": {
            "match_phrase": {
                "content": "xyz abc"
            }
        }
    },
    size=500,
    scroll='60s')

It doesn’t fetch the records where the content is
“Hi, my name is XYZ ABC.” and “Hey wassupxyz ABC. How’s life going”

A similar search in MongoDB using regular expressions fetches both records. Any help would be appreciated.

Solution

If you did not specify an analyzer, you are using the default standard analyzer. It does grammar-based tokenization, so the phrase “hi my name isxyz abc.” is tokenized into the terms [hi, my, name, isxyz, abc], and match_phrase looks for the terms [xyz, abc] next to each other (unless you specify a slop).
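To make this concrete, here is a rough Python sketch of the behavior described above. The helpers `standard_tokenize` and `phrase_matches` are hypothetical approximations of the standard analyzer and the match_phrase query (the real analyzer handles more cases), written only to show why the phrase does not match:

```python
import re

def standard_tokenize(text):
    # rough approximation of the standard analyzer:
    # lowercase the text and split on non-alphanumeric characters
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def phrase_matches(doc, phrase):
    # match_phrase succeeds only when the phrase tokens
    # appear consecutively in the document's token stream
    doc_tokens = standard_tokenize(doc)
    phrase_tokens = standard_tokenize(phrase)
    return any(doc_tokens[i:i + len(phrase_tokens)] == phrase_tokens
               for i in range(len(doc_tokens) - len(phrase_tokens) + 1))

print(standard_tokenize("hi my name isxyz abc."))
# ['hi', 'my', 'name', 'isxyz', 'abc'] -- "xyz" is buried inside "isxyz"
print(phrase_matches("hi my name isxyz abc.", "xyz abc"))    # False
print(phrase_matches("hi, my name is xyz abc.", "xyz abc"))  # True
```

Because the token is “isxyz”, not “xyz”, the phrase [xyz, abc] never lines up, which matches the behavior you observed.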

You can use a different analyzer or modify your query. If you use a match query, it will match on the term “abc” alone. If you want phrase matching, you need a different analyzer: nGrams should work for you.
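To see why nGrams help, here is a small Python sketch of what a trigram tokenizer (min_gram and max_gram both 3, as in the mapping below) produces for a single word. The `trigrams` helper is a simplified illustration, not the real tokenizer (it ignores token_chars splitting):

```python
def trigrams(token, n=3):
    # emit every n-character substring, like the ngram tokenizer
    # with min_gram = max_gram = n
    return [token[i:i + n] for i in range(len(token) - n + 1)]

print(trigrams("wassupxyz"))
# ['was', 'ass', 'ssu', 'sup', 'upx', 'pxy', 'xyz']
print(trigrams("isxyz"))
# ['isx', 'sxy', 'xyz']
```

Both “wassupxyz” and “isxyz” produce the trigram “xyz”, so a match_phrase query for “xyz abc” can find “xyz” followed by the trigrams of “abc” even when “xyz” is glued to the previous word.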

Here is an example:

PUT test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }, 
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}

PUT test_index/_doc/1
{
  "content": "hi my name isxyz abc."
}

PUT test_index/_doc/2
{
  "content": "hey wassupxyz abc. how is life"
}

POST test_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "content": "xyz abc"
    }
  }
}

This results in two documents being found.

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "content": "hey wassupxyz abc. how is life"
        }
      },
      {
        "_index": "test_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "content": "hi my name isxyz abc."
        }
      }
    ]
  }
}

Edit:
If you want to make a wildcard query instead, you can use the standard analyzer. The use case you specified in the comment would be indexed like this:

PUT test_index/_doc/3
{
  "content": "RegionLasit Pant0Q00B000001KBQ1SAO00"
}

You can then query it with a wildcard:

POST test_index/_doc/_search
{
  "query": {
    "wildcard": {
      "content.keyword": {
        "value": "*Lasit Pant*"
      }
    }
  }
}

Essentially, you’re doing a substring search without the nGram analyzer. Your query phrase will be “*&lt;my search terms&gt;*”.
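The semantics of that wildcard query on the keyword sub-field can be sketched in plain Python with the standard-library fnmatch module. Note that `content.keyword` stores the whole string untokenized and unlowercased, so the match is case-sensitive (and wildcard queries with a leading `*` can be slow on large indices):

```python
import fnmatch

# content.keyword holds the raw string, so "*Lasit Pant*" behaves
# like a case-sensitive substring test over the whole field
doc = "RegionLasit Pant0Q00B000001KBQ1SAO00"
print(fnmatch.fnmatchcase(doc, "*Lasit Pant*"))   # True
print(fnmatch.fnmatchcase(doc, "*lasit pant*"))   # False: keyword is not lowercased
```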
