22 Apr 2018

Elasticsearch - How to decide the best mapping scheme?

I'm super new to Elasticsearch and have been finding my way to make search work for the upcoming version of CrazyEngineers. I'd like to ask fellow CEans who've worked with ES: how do you decide which mapping scheme is most suitable? By mapping scheme, I'm referring to the analyzers and tokenizers that we need to define at the time of indexing documents.

I'm specifically interested in n-gram tokenizers but haven't been getting the desired results for my search terms. I would appreciate pointers.
Saandeep Sreerambatla

23 Apr 2018
What type of data are you working with? We need information on the business case.
I haven't worked with ES myself, but my team uses it heavily, so I need more information before I can go back to them and get answers.
23 Apr 2018
Thanks, @Saandeep Sreerambatla. Since posting the thread, I've learned more about ES. Right now, I'm in the 'fine-tuning' phase for @mention suggestions.

Let's say a user starts typing a name. Ideally, we should return multiple matches that begin with the typed characters, but these matches have to be relevant. I'm therefore experimenting with 'edge-ngram' tokenization.
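To make the idea concrete, here is a small PHP illustration of what I understand edge n-grams to be for a single, already lowercased token. The function name edgeNgrams is made up for this example, and the defaults of 2 and 12 are simply the gram sizes I'm experimenting with. It only mimics the behaviour; it is not Elasticsearch's own code, and it works on bytes, so it assumes plain ASCII names.

```php
<?php
// Illustration only: mimics the prefixes an edge n-gram filter would emit
// for one token. edgeNgrams() is a hypothetical helper, not an ES API.
function edgeNgrams(string $token, int $minGram = 2, int $maxGram = 12): array
{
    $grams = [];
    $length = strlen($token); // byte length; fine for ASCII names

    // Grow the prefix one character at a time, from $minGram up to
    // $maxGram or the token length, whichever is smaller.
    for ($size = $minGram; $size <= min($maxGram, $length); $size++) {
        $grams[] = substr($token, 0, $size);
    }

    return $grams;
}

echo implode(', ', edgeNgrams('johan')), PHP_EOL; // jo, joh, joha, johan
```

So a user typing "joh" would hit the stored prefix "joh" of both "john" and "johan", while a single typed character would match nothing when the minimum gram size is 2.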

Here's the query I posted on a forum -

Here's what my current settings look like -

protected $settings = [
   'analysis'  => [
       'filter'    =>  [
           'autocomplete_filter'   =>  [
               'type'  =>  'edge_ngram',
               'min_gram'  =>  2,
               'max_gram'  =>  12,
           ],
       ],
       'analyzer'  =>  [
           'autocomplete' =>  [
               'type'  =>  'custom',
               'tokenizer' =>  'standard',
               'filter'    =>  ['lowercase', 'autocomplete_filter']
           ]
       ]
   ]
];
and
public function buildQueryPayload()
{
   return [
       'must'  =>  [
           'multi_match'   =>  [
               'query'     => $this->builder->query,
               'fields'    => ['first_name', 'last_name', 'name^2'],
               'type'      =>  'phrase_prefix'
           ]
       ]
   ];
}
This is my best try so far, but I'm not happy with the results. I have the following in my User model -

public function toSearchableArray() {
   return [
       'first_name'    =>  $this->first_name,
       'last_name'     =>  $this->last_name,
       'name'          =>  $this->first_name . ' ' . $this->last_name
   ];
}
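One thing I haven't shown is how these three fields get mapped to the analyzer. A common pattern, as far as I understand it, is to apply the n-gram analyzer only at index time and a plain analyzer at search time, so the typed text isn't itself split into n-grams. The sketch below is an assumption on my part, not my actual mapping: the variable names and the 'text' field type are made up for illustration.

```php
<?php
// Hypothetical mapping sketch (not the actual mapping in use).
// Index time: the custom 'autocomplete' analyzer stores edge n-grams.
// Search time: the built-in 'standard' analyzer keeps the typed term whole.
$autocompleteField = [
    'type'            => 'text',
    'analyzer'        => 'autocomplete',
    'search_analyzer' => 'standard',
];

$mapping = [
    'properties' => [
        'first_name' => $autocompleteField,
        'last_name'  => $autocompleteField,
        'name'       => $autocompleteField,
    ],
];
```

With a mapping along these lines, a plain match-style query might be enough, since the prefixes are already in the index; I haven't verified how that interacts with 'phrase_prefix'.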
I'm thinking of adding an extra field from my database, last_activity_at, to the index and then sorting the suggested users so that the most recently active ones appear first.

Can someone suggest if I need to make changes to my filter, analyzer or the query? The end result I want to achieve is -

User types: @Joh . The system should return
  1. 'John Doe' [most recently active]
  2. 'John Doe' [second most recently active]
  3. 'Johan Kent'
  4. 'Johar Woe'
... and so on. If the last_activity_at field is null, ES should rank that user purely on how closely the name matches.
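One direction I'm considering is to sort by relevance first and use last_activity_at only as a tie-breaker, pushing users without a value to the end of their tie group. This is an untested sketch: buildSuggestionBody is a made-up function name, $typed stands in for $this->builder->query, and it assumes last_activity_at is indexed as a date field.

```php
<?php
// Untested sketch of a full search body; buildSuggestionBody() is hypothetical.
function buildSuggestionBody(string $typed): array
{
    return [
        'query' => [
            'bool' => [
                'must' => [
                    'multi_match' => [
                        'query'  => $typed,
                        'fields' => ['first_name', 'last_name', 'name^2'],
                        'type'   => 'phrase_prefix',
                    ],
                ],
            ],
        ],
        'sort' => [
            // Closest name match first.
            ['_score' => ['order' => 'desc']],
            // Among equal scores, most recently active first;
            // documents with no last_activity_at sort last.
            ['last_activity_at' => ['order' => 'desc', 'missing' => '_last']],
        ],
    ];
}
```

The catch is that the second sort key only matters when scores are exactly equal, so two 'John Doe' entries would be ordered by activity only if ES scores them identically.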

Would welcome suggestions.
