The datalake for everyone by Laurent Mourer

  • By: jsoon
  • In: Uncategorized
  • Posted: February 18, 2018

Girls in Tech Paris is delighted to be a partner of the Elastic tour as they make their way to Paris and Munich.

Our generous partner gave Girls in Tech members a free opportunity to learn about Elastic on their sold-out tour. Thank you, Elastic!

Laurent Mourer describes his experience:

The datalake for everyone

This was the first time I had been to a conference. I have used the Elastic stack since 2015, starting with version 2, and I now use it daily. Specifically, I use Elasticsearch and Kibana to perform complex aggregations using the scripting language Painless.

In the past, I used Logstash and Beats for data input and built graphs using Kibana. Beats is used to collect system and network metrics and logs. I also used Logstash to extract data from an Oracle database with the JDBC plugin.

The principal goal of the conference was to present version 6.2 of the Elastic stack and to show how big companies use it.

For Logstash, the main feature discussed was support for multiple data inputs. Previously, you had to add “if” conditions on different types to handle different input sources. Multiple input sources are now supported natively. Logstash can now also be clustered. It was rewritten in Java to increase performance, but it remains quite resource-hungry.

Regarding Elasticsearch, the main change is that a full cluster restart is no longer needed to upgrade to a new version: a rolling upgrade, node by node, is enough. However, this involves following a precise process. It is impossible to upgrade from version 2 to version 6 without a full restart.

Another positive thing I noted is the arrival of an SQL plugin, which makes Elasticsearch more accessible to developers familiar with SQL.

The visual appearance of Kibana has constantly evolved to become more ergonomic. The interface is clearer, simpler and more elegant. I find that Kibana is more and more accessible to non-technical people. This impression is reinforced by the introduction of Canvas, by Kuery (the new data exploration language, which is simpler than the Elasticsearch DSL), and by the “getting started” wizard directly on the Kibana homepage. Canvas allows you to easily build slides with visualizations.

However, it wasn’t clearly explained whether Canvas would be available without X-Pack (the premium version of Elastic). To increase security and allow big companies to integrate Kibana more easily into their environment, authentication by SAML was introduced.

In Kibana, one can now monitor Logstash and Beats to follow the activity and metrics about data collection.

I was disappointed that they didn’t speak about Painless. I use this language every day and, although it performs very well, it lacks tools to help developers. I feel a little abandoned when dealing with this technology.

The company Renault presented how they integrated the Elastic stack into their ecosystem. They began the project in 2015.

They integrated the Elastic stack into their datalake. They keep improving their datalake even today, continually increasing its agility and improving the cost efficiency of the service. The team that supports Elastic products at Renault seems very knowledgeable about this technology.

To show the importance of the Elastic products within Renault, here are some key metrics:

  • 35 projects based on the Elastic stack
  • 66 datasources ingested in the datalake as a whole
  • 300 TB stored in datalake as a whole

However, they did not present the architecture of their solution.

Amadeus, on the other hand, was far more transparent and showed us their infrastructure.

Their cluster contains 8 nodes:

  • 3 master nodes
  • 3 data nodes
  • 2 coordinating nodes

The advantages of this architecture are that it is:

  • Reliable
  • Redundant
  • Flexible to scale up or down
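
To make this layout concrete, here is a minimal Python sketch of the 3/3/2 split. It is only an illustration: Amadeus did not show their configuration files, so the node names are hypothetical, and I am assuming the Elasticsearch 6.x role settings `node.master`, `node.data` and `node.ingest`, with ingest switched off everywhere. The quorum function uses the usual "half of the master-eligible nodes plus one" rule that guards against split brain.

```python
# Role flags per node type, written with the Elasticsearch 6.x setting names.
# Assumption: ingest is disabled on every node; the talk did not say.
ROLES = {
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    "coordinating": {"node.master": False, "node.data": False, "node.ingest": False},
}

# The layout described in the talk: 3 master, 3 data, 2 coordinating nodes.
LAYOUT = {"master": 3, "data": 3, "coordinating": 2}


def build_cluster(layout):
    """Return one settings dict per node (node names are hypothetical)."""
    nodes = []
    for role, count in layout.items():
        for i in range(1, count + 1):
            nodes.append({"name": f"{role}-{i}", **ROLES[role]})
    return nodes


def master_quorum(nodes):
    """Smallest number of master-eligible nodes that forms a majority."""
    eligible = sum(1 for node in nodes if node["node.master"])
    return eligible // 2 + 1


nodes = build_cluster(LAYOUT)
print(len(nodes), "nodes, master quorum =", master_quorum(nodes))
```

With three dedicated master nodes the quorum is two, so the cluster can lose one master and still elect a leader, which is where the reliability and redundancy come from.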

Both presentations helped me understand how huge companies use the Elastic stack and what some good practices are when using it. They also showed us examples of architectures.

The most interesting topic for me at the conference was the Machine Learning feature. It is a very fashionable term, but I did not really know what it was. I had never seen how Elastic handles machine learning, and the demo in Kibana was very impressive. The system is able to build a temporal model by analyzing the data collected. This model then makes it possible to detect anomalies in the data and to forecast how the data will evolve.

For example, at Christmas, connections to retailers’ websites increase. From the history of connections to such a website, the Machine Learning feature can learn that connections increase over the Christmas period. If they do not, the algorithms will consider it an anomaly.
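
The Christmas example can be sketched in a few lines of Python. This is not the algorithm Elastic uses (the talk did not go into that level of detail); it is a deliberately simple stand-in that learns a mean and standard deviation per period and flags values that fall too far from them. The function names, the period labels and the connection counts are all made up for illustration.

```python
from statistics import mean, stdev


def seasonal_baseline(history):
    """Summarise past observations per period.

    history: list of (period, connections) pairs.
    Returns {period: (mean, standard deviation)}.
    """
    by_period = {}
    for period, value in history:
        by_period.setdefault(period, []).append(value)
    return {p: (mean(v), stdev(v)) for p, v in by_period.items()}


def is_anomaly(baseline, period, value, threshold=3.0):
    """True if value is more than `threshold` deviations from the period mean."""
    mu, sigma = baseline[period]
    if sigma == 0:
        return value != mu
    return abs(value - mu) > threshold * sigma


# Made-up daily connection counts from previous years (illustrative only).
history = [
    ("ordinary", 100), ("ordinary", 110), ("ordinary", 90), ("ordinary", 105),
    ("christmas", 900), ("christmas", 1000), ("christmas", 950), ("christmas", 1050),
]
baseline = seasonal_baseline(history)

# An ordinary-looking day is normal in March but suspicious at Christmas.
print(is_anomaly(baseline, "ordinary", 105))
print(is_anomaly(baseline, "christmas", 105))
```

The point of the sketch is that the same number of connections can be normal or abnormal depending on the period, which is what a model with a notion of time captures and a fixed alert threshold does not.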

To conclude, version 6.2 of the Elastic stack was very exciting. The new features of Kibana allow anyone to use it, and the interface is more beautiful and ergonomic. The talks also showed me how some big companies use the Elastic stack, and I discovered some features that are included in X-Pack.


Laurent Mourer for Girls in Tech

