Category: Elasticsearch

SQLCE: How to execute complex scripts

For one of the side projects I am working on, I needed a way to execute long SQL scripts to create some reports. I started the project using SQLCE because I didn’t want to bother with a full installation of SQL Server (even the Express edition), so I wrote all the code using Entity Framework 6 and the SQL Server Compact & SQLite Toolbox. I didn’t use SQLite because at the time EF6 didn’t have support for it (it was added in February 2014).

However, one of the drawbacks of SQLCE is that it doesn’t support some SQL commands, for example the “SELECT … INTO … FROM” syntax. Also, I had complex scripts that used the GO command to separate the blocks. To make these work, I wrote a very simple routine that splits the .sqlce file by line, looks for each GO and executes the query accumulated up to that point. You can find the code on GitHub 🙂
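To give an idea of the splitting step, here is a minimal sketch. It is not the code from the GitHub repository: the class and method names (SqlScriptSplitter, SplitBatches) are made up for this example, and it assumes that a separator is a line containing only GO.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, written for illustration only.
public static class SqlScriptSplitter
{
    // Splits a script into batches, treating a line that contains only "GO"
    // (case-insensitive, surrounding whitespace ignored) as a separator.
    public static IList<string> SplitBatches(string script)
    {
        var batches = new List<string>();
        var current = new StringBuilder();

        using (var reader = new StringReader(script))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Equals("GO", StringComparison.OrdinalIgnoreCase))
                {
                    Flush(current, batches);
                }
                else
                {
                    current.AppendLine(line);
                }
            }
        }

        Flush(current, batches); // trailing batch without a final GO
        return batches;
    }

    // Adds the accumulated text as a batch (unless it is blank) and resets the buffer.
    private static void Flush(StringBuilder current, List<string> batches)
    {
        var text = current.ToString().Trim();
        if (text.Length > 0)
        {
            batches.Add(text);
        }
        current.Clear();
    }
}
```

Each returned batch can then be executed on its own, for example with a SqlCeCommand and ExecuteNonQuery on an open connection. Note that this sketch does not try to detect a GO that appears inside a string literal or a comment.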

Multilanguage searching with Elasticsearch

This time I’ll start directly with the code. First, a utility method to create the connection:

[csharp]
private ElasticClient Connect()
{
    // "eng" is used as the default index when no index is specified.
    var defaultLanguageCode = "eng";
    var uri = new System.Uri(ConfigurationManager.AppSettings["ElasticSearchServer"]);
    var settings = new ConnectionSettings(uri).SetDefaultIndex(defaultLanguageCode);
    return new ElasticClient(settings);
}
[/csharp]

And here’s the interesting part:

[csharp]
public IEnumerable<SearchItem> Search(string text, int page, int pageSize, IEnumerable<string> languages)
{
    ElasticClient client = this.Connect();

    // "page" is zero-based; negative values are clamped to the first page.
    IQueryResponse<SearchItem> searchResults = client.Search<SearchItem>(s => this.GetIndexSearchDescriptor(s, languages)
        .QueryString(text)
        .Skip(System.Math.Max(0, page) * pageSize)
        .Take(pageSize));

    if (searchResults.Total != 0 && searchResults.Hits != null && searchResults.Hits.Hits != null)
    {
        // Computed for illustration only; this method does not return it.
        int totalPages = (int)System.Math.Ceiling((double)searchResults.Total / pageSize);
        var results = searchResults.Hits.Hits;

        return results.Select(h => h.Source).ToArray();
    }

    return Enumerable.Empty<SearchItem>();
}
[/csharp]

As you can see, the Search method takes the text parameter and a list of languages. In the last post we indexed the content translations using language codes (e.g. eng, ita, esp and so on…) as index names. So the idea here is to use the GetIndexSearchDescriptor method to get a SearchDescriptor instance from the language codes and run a query using the input text.
As a bonus I have added quick-and-dirty pagination just for the sake of it 😀

[csharp]
private SearchDescriptor<SearchItem> GetIndexSearchDescriptor(SearchDescriptor<SearchItem> s, IEnumerable<string> languages)
{
    // No languages requested: search across every index.
    if (languages == null || !languages.Any())
        return s.AllIndices();

    return s.Indices(languages);
}
[/csharp]
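Going back to the pagination in Search: the arithmetic can be isolated in a couple of small functions. This is only a sketch, and the Paging class and its method names are invented for this example. It assumes a zero-based page index, as in the Search method, and uses integer arithmetic instead of Math.Ceiling.

```csharp
using System;

// Hypothetical helper, written for illustration only.
public static class Paging
{
    // Number of items to skip for a zero-based page index;
    // negative page values are clamped to the first page.
    public static int SkipFor(int page, int pageSize)
    {
        return Math.Max(0, page) * pageSize;
    }

    // Number of pages needed to show totalHits items (rounds up).
    public static int TotalPages(long totalHits, int pageSize)
    {
        if (pageSize <= 0)
        {
            throw new ArgumentOutOfRangeException("pageSize");
        }
        return (int)((totalHits + pageSize - 1) / pageSize);
    }
}
```

For example, with 25 hits and a page size of 10, TotalPages gives 3 and SkipFor(2, 10) gives 20, so the third page holds the last 5 items.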

Multilanguage indexing with Elasticsearch

This time I’m rambling about Elasticsearch. For those who still don’t know, Elasticsearch is a very interesting search engine based on Lucene. It’s structured to work as a NoSQL database and exposes a very nice RESTful web interface.

OK, that’s enough, let’s get started with the code!
The first thing to do is download (manually or via NuGet) the NEST client and add it to your project.
Suppose you have a model like this in your application domain (yes, I’m using MongoDB as the persistence layer):

[csharp]
public class Content
{
    public ObjectId Id { get; set; }
    public IEnumerable<ContentTranslation> Translations { get; set; }
}

public class ContentTranslation
{
    public string Title { get; set; }
    public string FullText { get; set; }
    public string LanguageCode { get; set; }
}
[/csharp]

It’s a very simple document structure, modeled to store multilanguage content. How can we store it in the search engine?
The idea here is to create an index for each language and use an intermediary class that holds language-specific data. Something like this:

[csharp]
public class SearchItem
{
public string Id { get; set; }
public string Text { get; set; }
}
[/csharp]

And this is the indexing code:

[csharp]
private void IndexContents(IEnumerable<Content> contents)
{
    var defaultLanguageCode = "eng";
    var uri = new System.Uri(ConfigurationManager.AppSettings["ElasticSearchServer"]);
    var settings = new ConnectionSettings(uri).SetDefaultIndex(defaultLanguageCode);
    var client = new ElasticClient(settings);

    foreach (var content in contents) {
        foreach (var translation in content.Translations) {
            var searchItem = new SearchItem()
            {
                Id = content.Id.ToString(),
                Text = string.Format("{0} {1}", translation.Title, translation.FullText)
            };
            client.Index(searchItem,
                translation.LanguageCode,
                typeof(Content).FullName,
                content.Id.ToString()
            );
        }
    }
}
[/csharp]

OK, let’s analyze the code:

  • lines 3 to 6 initialize the Elasticsearch client and set “eng” as the default index.
  • lines 10 to 14 simply adapt the content translation to the intermediary class. Note that on line 12 we are specifying the Content Id.
  • And now the real indexing, on lines 15 to 19: here we tell the engine to index our searchItem, using translation.LanguageCode as the index name and the full name of the Content class as the item type (this is used somewhat like a collection name in a NoSQL database); lastly, we pass the current content Id.

That’s basically all 🙂

Bonus: the NEST client also exposes a nice ElasticClient.IndexMany method, which lets you index multiple items in just one call.
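Since each language lives in its own index, a bulk call needs the items grouped by language code first. Below is a minimal sketch of that grouping step only. The SearchItemBatcher class is invented for this example, and the model classes are simplified stand-ins (Id is a plain string rather than a MongoDB ObjectId) so that the snippet does not depend on any external library.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-ins for the classes shown in this post.
public class ContentTranslation
{
    public string Title { get; set; }
    public string FullText { get; set; }
    public string LanguageCode { get; set; }
}

public class Content
{
    public string Id { get; set; } // assumption: string instead of ObjectId
    public IEnumerable<ContentTranslation> Translations { get; set; }
}

public class SearchItem
{
    public string Id { get; set; }
    public string Text { get; set; }
}

// Hypothetical helper, written for illustration only.
public static class SearchItemBatcher
{
    // Builds one list of SearchItem per language code (that is, per index name).
    public static Dictionary<string, List<SearchItem>> GroupByLanguage(IEnumerable<Content> contents)
    {
        return contents
            .SelectMany(c => c.Translations.Select(t => new
            {
                t.LanguageCode,
                Item = new SearchItem
                {
                    Id = c.Id,
                    Text = string.Format("{0} {1}", t.Title, t.FullText)
                }
            }))
            .GroupBy(x => x.LanguageCode, x => x.Item)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

Each list could then be handed to IndexMany with the dictionary key as the index name. The exact overload depends on the NEST version in use, so that call is left out here.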

Next: OK, now I’ve indexed my content. How can I search it?

© 2017 Davide Guida
