Content indexing and search is a growing need in an environment that produces more and more data, whose structure varies both by nature and over time. Structured or free-text search within this data is also of great importance, and the ever-increasing volume calls for a system whose scalability is guaranteed by robust mechanisms. The Elastic stack solutions, historically known by the acronym ELK (Elasticsearch, Logstash, Kibana), address these needs with open-source, highly distributable software.
Setting up an indexing and search infrastructure is therefore a growing need. Because users have grown accustomed to accessing data through text search (Google, Bing, etc.), demand keeps increasing for new applications, both within companies and on the Web.
To meet this need, an infrastructure must be put in place that provides the following types of components:
These components can be provided by a single solution or by combining several complementary ones.
Other components can be added to this infrastructure to ensure scalability and high availability, for example:
The Elastic stack is made up of several solutions. The best known are the three behind the acronym ELK, popular in the community: Elasticsearch, Logstash and Kibana.
In addition, there are collection components more specialized than Logstash: the Beats. They are optimized to have a very small footprint on the systems they collect from, and they provide ready-to-use data to Logstash or directly to an Elasticsearch cluster.
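To make "ready-to-use data" more concrete, here is a minimal Python sketch of what a Beat-like shipper might send directly to Elasticsearch: a request body for the `_bulk` API in newline-delimited JSON, with one action line followed by one document per event. The Beats themselves are written in Go and handle this internally; the `to_bulk_body` helper, the index name `logs` and the `message` / `@timestamp` fields are assumptions chosen for illustration.

```python
import json
from datetime import datetime, timezone


def to_bulk_body(lines, index="logs"):
    """Turn raw log lines into an Elasticsearch _bulk request body (NDJSON).

    Hypothetical helper: the index name and field names are illustrative.
    """
    now = datetime.now(timezone.utc).isoformat()
    parts = []
    for line in lines:
        # Action line: index the following document into `index`.
        # (Elasticsearch 5.x also expects a "_type" here; later versions drop it.)
        parts.append(json.dumps({"index": {"_index": index}}))
        # Source line: the event itself, ready to be indexed as-is.
        parts.append(json.dumps({"@timestamp": now, "message": line.rstrip("\n")}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(parts) + "\n"


body = to_bulk_body(["GET /index.html 200", "GET /missing 404"])
print(body)
```

The body would then be sent with a `POST` to the cluster's `_bulk` endpoint, which indexes all documents in a single round trip.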
Elasticsearch, Logstash, Kibana and Beats make up the Elastic stack, which (at the time of writing) has just been released as a unified version 5.0 that aligns the version numbers of all these solutions.
Equivalent stacks are offered by other vendors, for example InfluxData's, which is, however, more focused on time-series data.
The composable nature of Elastic's solution allows it to be combined with many other compatible tools.
Technically, the Elastic stack is made up of different solutions that are consistent with each other in terms of APIs and data exchange. However, they rely on different technical stacks.
Elasticsearch and Logstash are written in Java (version 8 at minimum) and therefore require a JVM to run.
Kibana bundles a version of Node.js to provide the visualization interface.
The Beats are written in Go so that they ship as executables with the smallest possible footprint.
It is important to install these tools in tightly controlled environments.
Most of the time, the tools use JSON as the lingua franca for data exchange (although data can also be sent untransformed or in other formats), and REST APIs to provide the distributed features of the search, indexing and monitoring cluster.
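As an illustration of this JSON-over-REST exchange, the sketch below builds (without sending) a full-text search request using only the Python standard library. It assumes a cluster reachable at `http://localhost:9200` and an index named `logs` with a `message` field; these names, and the `build_search_request` helper, are hypothetical. The `_search` endpoint and the `match` query belong to Elasticsearch's search API.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build (but do not send) a full-text search request for Elasticsearch."""
    # The query itself is a plain JSON document.
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("http://localhost:9200", "logs", "message", "404")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Against a running cluster, urllib.request.urlopen(req) would return a JSON
# response whose "hits" object lists the matching documents.
```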
Distributed message brokers such as RabbitMQ or Kafka can serve as a buffer between the systems being collected from and the indexing and search infrastructure. This is often necessary both to ensure high availability and to ease maintenance operations on the Elasticsearch cluster.
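The buffering role can be sketched with an in-memory toy model, a hypothetical `BufferedShipper` class standing in for the broker. It only illustrates the principle: collection continues while the cluster is unavailable, and buffered events are forwarded in order, in batches, once it comes back. A real RabbitMQ or Kafka deployment also persists and replicates messages, which this sketch does not do.

```python
from collections import deque


class BufferedShipper:
    """Toy stand-in for a message broker between collectors and the cluster."""

    def __init__(self, batch_size=2):
        self.buffer = deque()
        self.batch_size = batch_size
        self.cluster_up = True
        self.indexed = []  # what the simulated cluster has received, in order

    def collect(self, event):
        # Collection never blocks: events always go into the buffer first.
        self.buffer.append(event)
        self.flush()

    def flush(self):
        # While the cluster is under maintenance, events simply wait here.
        while self.cluster_up and self.buffer:
            size = min(self.batch_size, len(self.buffer))
            batch = [self.buffer.popleft() for _ in range(size)]
            self.indexed.extend(batch)


shipper = BufferedShipper()
shipper.collect("e1")
shipper.cluster_up = False  # e.g. rolling restart of the Elasticsearch cluster
shipper.collect("e2")
shipper.collect("e3")
print(len(shipper.buffer))  # → 2
shipper.cluster_up = True
shipper.flush()
print(shipper.indexed)  # → ['e1', 'e2', 'e3']
```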
The Elastic stack has reached a milestone with version 5.0.
The complete solution is more consistent, very well documented (one of the project's great strengths) and offers a somewhat less fragmented installation experience.
A good way to get started is to use it first as a diagnostic tool for application logs or system metrics, for example.
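For application logs, the first step is usually to turn raw lines into structured documents, which Logstash typically does with its grok filter. The plain-Python sketch below shows the same idea with a regular expression; the log format, the `parse_line` helper and the field names are hypothetical and would have to be adapted to your own applications.

```python
import json
import re

# Hypothetical log format:
# "2016-11-02 10:15:32 ERROR [payment] Timeout calling bank API"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"\[(?P<component>[^\]]+)\] "
    r"(?P<message>.*)"
)


def parse_line(line):
    """Turn one raw log line into a structured document, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return match.groupdict()


doc = parse_line("2016-11-02 10:15:32 ERROR [payment] Timeout calling bank API")
print(json.dumps(doc))
```

Once structured this way, fields such as `level` or `component` can be filtered and aggregated in Kibana instead of being searched as free text.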
It is worth noting that Elastic.co also provides support for this stack, as well as a proprietary, paid component, X-Pack, which adds security, alerting, monitoring and reporting features, as well as the ability to structure data as graphs and query it through those graphs.
The recent acquisition of the company Prelert suggests that the stack will offer more and more predictive analytics capabilities (based on machine learning). One to watch very closely!
Of course, if you have experience with the Elastic Stack (ELK) or equivalent systems, do not hesitate to contact us!