Building A Search Page with Elasticsearch and .NET

Part II

Search index population

Elasticsearch is completely document-oriented: it stores entire documents in its index. Before we can index anything, we need to create a client to communicate with Elasticsearch.

var node = new Uri("http://localhost:9200");

var settings = new ConnectionSettings(node);

settings.DefaultIndex("stackoverflow");

var client = new ElasticClient(settings);



Here we create a class that represents our document.



public class Post

{

public string Id { get; set; }

public DateTime? CreationDate { get; set; }

public int? Score { get; set; }

public int? AnswerCount { get; set; }

public string Body { get; set; }

public string Title { get; set; }



[String(Index = FieldIndexOption.NotAnalyzed)]

public IEnumerable<string> Tags { get; set; }



[Completion]

public IEnumerable<string> Suggest { get; set; }

}



By default, Elasticsearch dynamically resolves the document type and its fields at index time, but you can override field mappings or enable field features for more advanced use cases. In the example above we decorated our POCO class with attributes, so we need to generate the mappings with AutoMap.



var indexDescriptor = new CreateIndexDescriptor("stackoverflow")

.Mappings(ms => ms

.Map<Post>(m => m.AutoMap()));



Then we can create the index and apply the mappings:



client.CreateIndex("stackoverflow", i => indexDescriptor);



After defining our mappings and creating an index, we can feed it with documents.

Elasticsearch does not have a built-in handler to import file formats such as XML or CSV, but since it has client libraries for many languages, it is quite easy to build an importer. As the StackOverflow dump is in XML format, we use the .NET XmlReader class to read the question rows, map each one to an instance of Post, and add the objects to a collection.
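A minimal sketch of a LoadPostsFromFile method could look like the following. It assumes the Posts.xml layout of the StackOverflow data dump, where every post is a row element carrying its data in XML attributes and PostTypeId "1" marks questions; adjust the attribute names if your dump differs.

```csharp
// Requires: using System; using System.Collections.Generic; using System.Xml;
private static IEnumerable<Post> LoadPostsFromFile(string path)
{
    using (var reader = XmlReader.Create(path))
    {
        while (reader.ReadToFollowing("row"))
        {
            // PostTypeId == "1" marks questions; answers are "2"
            if (reader.GetAttribute("PostTypeId") != "1")
                continue;

            // Tags are stored as "<c#><.net>" — strip the brackets and split
            var tags = (reader.GetAttribute("Tags") ?? string.Empty)
                .Trim('<', '>')
                .Split(new[] { "><" }, StringSplitOptions.RemoveEmptyEntries);

            yield return new Post
            {
                Id = reader.GetAttribute("Id"),
                CreationDate = DateTime.Parse(reader.GetAttribute("CreationDate")),
                Score = int.Parse(reader.GetAttribute("Score") ?? "0"),
                AnswerCount = int.Parse(reader.GetAttribute("AnswerCount") ?? "0"),
                Title = reader.GetAttribute("Title"),
                Body = reader.GetAttribute("Body"),
                Tags = tags,
                // Feed the completion field with the title for autocomplete
                Suggest = new[] { reader.GetAttribute("Title") }
            };
        }
    }
}
```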

Next, we need to iterate over batches of 1,000-10,000 objects and call the IndexMany method on the client:

int batchSize = 1000;

IEnumerable<Post> data = LoadPostsFromFile(path);

foreach (var batch in data.Batch(batchSize))

{

client.IndexMany<Post>(batch, "stackoverflow");

}
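Note that Batch is not a standard LINQ operator; it comes from MoreLINQ or a similar helper library. If you prefer not to add a dependency, a minimal equivalent extension could look like this:

```csharp
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Splits a sequence into chunks of at most `size` items each.
    public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int size)
    {
        var bucket = new List<T>(size);
        foreach (var item in source)
        {
            bucket.Add(item);
            if (bucket.Count == size)
            {
                yield return bucket;
                bucket = new List<T>(size);
            }
        }
        if (bucket.Count > 0)
            yield return bucket; // emit the final, possibly smaller, chunk
    }
}
```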

Full text search

Now that the index is populated, let’s define the search service interface:

public interface ISearchService<T>

{

SearchResult<T> Search(string query, int page, int pageSize);



SearchResult<Post> SearchByCategory(string query, IEnumerable<string> tags, int page, int pageSize);



IEnumerable<string> Autocomplete(string query, int count);

}



and a search result class:



public class SearchResult<T>

{

public int Total { get; set; }

public int Page { get; set; }

public IEnumerable<T> Results { get; set; }

public int ElapsedMilliseconds { get; set; }

}



The search method will perform a multi-match query against the user input. The multi-match query is useful when running a query against multiple fields, and it lets us see how relevant the Elasticsearch results are with the default configuration.

First, call the Query method, which is a container for any specific query we want to perform. Next, call the MultiMatch method, passing the actual search phrase along with the list of fields to search against. In our case, these are Title, Body, and Tags.

var result = client.Search<Post>(x => x // use search method

.Query(q => q // define query

.MultiMatch(mp => mp // of type MultiMatch

.Query(query) // pass text

.Fields(f => f // define fields to search against

.Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))

.From((page - 1) * pageSize) // apply paging: skip items on previous pages

.Size(pageSize)); // limit to page size



return new SearchResult<Post>

{

Total = (int)result.Total,

Page = page,

Results = result.Documents,

ElapsedMilliseconds = (int)result.Took

};

The raw request to Elasticsearch will look like:

GET stackoverflow/post/_search

{

"query":

{

"multi_match":

{

"query": "elastic search",

"fields": ["title","body","tags"]

}

}

}

How to group by tags

Once the search returns results, we want to group them by tags so that users can refine their search. To group results into categories, we use bucket aggregations. They allow us to compose buckets of documents that fall into (or outside of) given criteria. As we want to group by tags, which is a text field, we will use the terms aggregation.

Let’s look at the attribute on the Tags field:

[String(Index = FieldIndexOption.NotAnalyzed)]

public IEnumerable<string> Tags { get; set; }



It tells Elasticsearch to index the field without analyzing or otherwise processing the input, so the ‘unit-testing’ tag is not split into ‘unit’ and ‘testing’.
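For reference, the mapping generated for the field should look roughly like the following (the exact output depends on your NEST and Elasticsearch versions):

```json
"tags": {
  "type": "string",
  "index": "not_analyzed"
}
```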

Now we extend the search result class with a dictionary containing each tag name and the number of posts tagged with it.



public Dictionary<string, long> AggregationsByTags { get; set; }

Next, we need to add Aggregation, of type Term, to our query and give it a name.

var result = client.Search<Post>(x => x

.Query(q => q

.MultiMatch(mp => mp

.Query(query)

.Fields(f => f

.Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))

.Aggregations(a => a // aggregate results

.Terms("by_tags", t => t // use term aggregations and name it

.Field(f => f.Tags) // on field Tags

.Size(10))) // limit aggregation buckets

.From((page - 1) * pageSize)

.Size(pageSize));

The search results now contain aggregation results, so we use the newly-added field to return them to the caller:

AggregationsByTags = result.Aggs.Terms("by_tags").Items

.ToDictionary(x => x.Key, y => y.DocCount)



The next step is to allow users to select one or more tags and use them as a filter. We add a new method to the interface so we can pass the selected tags to the search:



SearchResult<Post> SearchByCategory(string query, IEnumerable<string> tags, int page, int pageSize);



In the method implementation, we need to translate the tags into an array of filters.

var filters = tags

.Select(t => new Func<QueryContainerDescriptor<Post>, QueryContainer>(q => q

.Term(f => f.Tags, t)))

.ToArray();



Then, we need to build our search as a bool query. Bool queries combine multiple queries; the ones inside the Must clause are used to match documents and contribute to their relevance score.

We then append a Filter clause, which contains another Bool query that narrows the result set without affecting scoring.

var result = client.Search<Post>(x => x

.Query(q => q

.Bool(b => b

.Must(m => m // apply clause that must match

.MultiMatch(mp => mp // our initial search query

.Query(query)

.Fields(f => f

.Fields(f1 => f1.Title, f2 => f2.Body, f3 => f3.Tags))))

.Filter(f => f // apply filter on the results

.Bool(b1 => b1

.Must(filters))))) // with array of filters

.Aggregations(a => a

.Terms("by_tags", t => t

.Field(f => f.Tags)

.Size(10)))

.From((page - 1) * pageSize)

.Size(pageSize));

Aggregations work in the scope of the query, so they count the documents in the filtered set.
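For reference, the raw request produced by this query should look roughly like the following (assuming, as an example, that the user searched for "elastic search" and selected the single tag "c#"):

```json
GET stackoverflow/post/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "elastic search",
            "fields": ["title", "body", "tags"]
          }
        }
      ],
      "filter": {
        "bool": {
          "must": [
            { "term": { "tags": "c#" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "by_tags": {
      "terms": { "field": "tags", "size": 10 }
    }
  }
}
```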

Autocomplete

One of the features frequently used in search forms is autocomplete.

Searching big sets of text data by only a few characters is not an easy task. Elasticsearch provides the completion suggester, which works on a special field that is indexed in a way that enables very fast lookups.

You need to decide which field or fields the autocomplete should act on and what results will be suggested. Elasticsearch lets you define both the input and the output.
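As a sketch of how the Autocomplete method from our interface might be implemented with NEST: the exact suggest API differs between NEST versions, and "post-suggest" is just a name we pick for the suggestion. It assumes the Suggest field was populated at index time and mapped with the [Completion] attribute shown earlier.

```csharp
public IEnumerable<string> Autocomplete(string query, int count)
{
    var result = client.Suggest<Post>(s => s
        .Completion("post-suggest", c => c // name this suggestion
            .Text(query)                   // the user's partial input
            .Field(f => f.Suggest)         // run against the completion field
            .Size(count)));                // limit the number of options

    // Flatten the suggestion groups into plain strings for the caller
    return result.Suggestions["post-suggest"]
        .SelectMany(g => g.Options)
        .Select(o => o.Text);
}
```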

Summary

This article showed you how to build full-text search functionality with Elasticsearch and .NET.

Installation and configuration of Elasticsearch are very easy, and the default configuration is good enough to start working with. Elasticsearch doesn't need a schema file and exposes a friendly JSON-based HTTP API for configuration, index population, and searching. The engine is optimized to work with large amounts of data.

A high-level .NET client (NEST) handles communication with Elasticsearch, so it fits nicely into a .NET project.

Elasticsearch is an advanced search engine with many features and its own query DSL.
