It’s no secret that Elasticsearch is probably the fastest-moving Big Data open-source product, and everyone who uses it for search or log analytics knows the importance of keeping up-to-date with all of the recent updates.
The previous major upgrade to the ELK Stack (from 1.7 to 2.x) introduced some major breaking changes. The new update introduces Elasticsearch 5.0, Logstash 5.0, and Kibana 5.0, and unites all of them with Beats for the first time.
ELK Stack 5.0, which is in its third alpha version, contains several key changes that should make everyone’s life a little easier. One of the most important ones is the introduction of the ingest node, which might put the final nail in Logstash’s coffin.
In this post, I will outline the changes to the various components of the stack and weigh the pros and cons of the new version.
What’s New in ELK Stack 5.0
As stated above, this new version introduces many changes and upgrades. To grasp the full extent of those changes, each component of the stack needs to be examined separately. Some components are being deprecated while others have been moved and integrated directly into Elasticsearch itself.
With the combination of the new ingest node (more about that later) and the improvements made to the Beats family, it seems that Logstash is starting to take a backseat when it comes to data ingestion and parsing.
Beats is a collection of agents that query, gather, and ship data. There are more than twenty such components that can work with different data sources, five of which are developed and maintained by the Elastic team (Filebeat, Metricbeat, Packetbeat, Topbeat, and Winlogbeat). The others are handled by the open source community and are called Community Beats.
Filebeat is used to ship various types of logs into Elasticsearch. Most of the work that was done on this Beat revolved around bug fixes in areas such as null data fields, shutdown behavior, new line parsing, and the handling of incorrect type keys.
Packetbeat sends data about the network traffic that is exchanged between various applications and servers. The most notable feature added to Packetbeat is support for versions 3 and 4 of NFS. Before version 5.0, issues relating to PostgreSQL parsing and a compilation problem were also fixed.
Metricbeat and Topbeat
Both Metricbeat and Topbeat ship server metrics such as CPU, RAM, and disk utilization. As support for additional metrics was added, the Elastic team decided to release Metricbeat, a new modular metric-shipping Beat that allows users to add new metrics to its configuration easily. Up through the second alpha version of ELK Stack 5.0, Topbeat received several changes, including the addition of usernames to processes, support for compilation on OpenBSD, and a fix for the parsing of Windows CPU values.
In version 5.0, Metricbeat was released with ZooKeeper, NGINX, MySQL, Redis, and system modules.
Winlogbeat sends Windows-generated event logs to Elasticsearch. This Beat received many enhancements:
- Data structure improvements that allow more fields to be published with each event
- Custom field configuration
- Event metadata cache for file handlers (this feature had an issue that was fixed)
- Event filtering and selection improvement
General Changes and Bug Fixes
The general stackwide changes and enhancements revolve mainly around configuration and output. The configuration changes include the addition of new options such as custom fields, CPU utilization, the addition of general and log configuration paths as CLI flags, and variable enhancements. Support for Kafka output has been added, and the Redis output has been enhanced.
In addition, filter behavior has been changed (with the addition of support for filter plugins and changes in field names). Compatibility with Elasticsearch versions 2.x has been added and Logstash test capabilities were enhanced.
Elasticsearch is the main component of the ELK Stack, so many changes were made to it in version 5.0, including an entirely new scripting language, a new way of processing logs, and a different approach to data aggregation.
The last time Elasticsearch introduced a new scripting language, it was taken off the shelf in no time. The new attempt is called Painless, and it addresses issues with the way that events and dynamic data types are declared and executed. Painless is similar to Groovy and keeps a familiar structure for referencing and reading objects, which makes it easy to use and implement and eases the migration of old scripts.
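To make the Painless discussion concrete, here is a minimal sketch of a scripted update request body, built in Python. The field name `retries` and the helper `painless_update_body` are hypothetical, and the `inline` key follows the 5.0 documentation as I understand it; alpha builds may use different syntax, so treat this as an illustration rather than a reference.

```python
import json


def painless_update_body(field: str, increment: int) -> dict:
    """Build the JSON body of an update request that runs a Painless script.

    Assumption: the script source lives under the "inline" key, as in the
    5.0 release line (later releases renamed it to "source").
    """
    return {
        "script": {
            "lang": "painless",
            # The script reads its input from "params" instead of having the
            # value concatenated into the script text.
            "inline": f"ctx._source.{field} += params.count",
            "params": {"count": increment},
        }
    }


# Hypothetical example: increment a "retries" counter on a document.
body = painless_update_body("retries", 1)
print(json.dumps(body, indent=2))
```

Passing the increment through `params` keeps the script text identical between calls, which is the pattern the Elasticsearch documentation recommends for scripts that run repeatedly.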
Data aggregation has slightly improved. Today, new and existing data is cached and aggregated “on the fly” relative to the current system’s date and time and is now checked on a microsecond level. Before this change, an initial timestamp had to be set to run the calculations, and the system did not always calculate the time difference correctly. The dashboard had to be refreshed and reconfigured every time that a change was made.
Shifting from Logstash to Elasticsearch
Log processing has been shifted from Logstash to Elasticsearch itself. Logs can now be shipped from the Filebeat forwarder directly into Elasticsearch, where they are then processed and indexed. This is a major change. The processing engine that is now inside Elasticsearch includes processors such as date, convert, and grok.
The underlying Lucene engine has also been upgraded to version 6.0.0. The upgrade includes a revision in the query language that introduces the use of a new type of data structure and adds new data types such as keyword and text.
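As a rough illustration of in-cluster processing, the sketch below builds a pipeline definition that chains the grok, convert, and date processors mentioned above. The pipeline contents, the field names, and the helper `build_log_pipeline` are assumptions made for the example, and the option names (`patterns`, `formats`) follow the 5.0 documentation as I understand it; they may differ in alpha builds.

```python
import json


def build_log_pipeline() -> dict:
    """Return an illustrative ingest pipeline definition.

    The body would be sent to Elasticsearch with a request such as
    PUT _ingest/pipeline/access_logs ("access_logs" is a hypothetical id).
    """
    return {
        "description": "Parse simple access log lines (illustrative only)",
        "processors": [
            # 1. grok: split the raw line into named fields.
            {
                "grok": {
                    "field": "message",
                    "patterns": [
                        "%{TIMESTAMP_ISO8601:timestamp} %{IP:client_ip} "
                        "%{WORD:method} %{NUMBER:bytes}"
                    ],
                }
            },
            # 2. convert: grok captures are strings, so cast the byte count.
            {"convert": {"field": "bytes", "type": "integer"}},
            # 3. date: parse the captured timestamp into the event time.
            {"date": {"field": "timestamp", "formats": ["ISO8601"]}},
        ],
    }


pipeline = build_log_pipeline()
print(json.dumps(pipeline, indent=2))
```

Processors run in the order in which they are listed, which is why grok comes first: the other two operate on fields that grok extracts.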
This upgrade improves the range query’s optimization and search capabilities and boosts the overall system performance because the new indexes and searches can be computed and executed in half the time.
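The keyword and text types mentioned above can be illustrated with a small index definition. This is a sketch under stated assumptions: the mapping type name `log`, the field names, and the helper `log_index_body` are hypothetical.

```python
import json


def log_index_body() -> dict:
    """Return an illustrative index body that uses the two new string types."""
    return {
        "mappings": {
            "log": {  # hypothetical mapping type name
                "properties": {
                    # text: analyzed into terms, meant for full-text search.
                    "message": {"type": "text"},
                    # keyword: stored as a single exact value, meant for
                    # filtering, sorting, and aggregations.
                    "status": {"type": "keyword"},
                }
            }
        }
    }


index_body = log_index_body()
print(json.dumps(index_body, indent=2))
```

In short, a free-form log message would normally be mapped as `text`, while a field with a small set of exact values, such as a status code or a host name, fits `keyword`.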
Another enhancement to systemwide optimization that is worthwhile to mention is related to a change in the internal Elasticsearch engine that improves performance.
Elasticsearch 5.0 also ensures that deleted indexes are “kept at bay” by using a new feature called Deleted Index Tombstones (this prevents deleted indices from “returning” after a cluster maintenance operation is performed).
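The idea behind tombstones can be shown with a toy model. This is not Elasticsearch's implementation; the class and method names are invented for the illustration. It only shows the principle: the cluster state remembers which indices were deleted, so a stale copy that reappears on a rejoining node is rejected instead of being imported back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyClusterState:
    """Toy cluster state: live indices plus tombstones for deleted ones."""

    indices: dict = field(default_factory=dict)    # index UUID -> index name
    tombstones: set = field(default_factory=set)   # UUIDs of deleted indices

    def create_index(self, uuid: str, name: str) -> None:
        self.indices[uuid] = name

    def delete_index(self, uuid: str) -> None:
        # Remove the index and record a tombstone for its UUID.
        self.indices.pop(uuid, None)
        self.tombstones.add(uuid)

    def import_dangling(self, uuid: str, name: str) -> bool:
        """Handle an index found on a node's disk but absent from the state."""
        if uuid in self.tombstones:
            return False  # deliberately deleted earlier: do not resurrect it
        self.indices[uuid] = name
        return True


state = ToyClusterState()
state.create_index("u1", "logs-old")
state.delete_index("u1")

# A node that was offline during the delete rejoins with a stale copy.
print(state.import_dangling("u1", "logs-old"))  # False
print(state.import_dangling("u2", "metrics"))   # True
```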
Support for IPv6 data has been significantly extended.
In Elasticsearch 5.0, many bugs were fixed in areas including analysis, the API, Painless language utilization, data aggregation, clustering, and search. In the specific area of data aggregation, fixes to the parsing of IP addresses, dates, and time units were introduced. In search, bugs related to query failures on non-indexed fields and to the performance of named queries were fixed.
Logstash 5.0 introduces many changes and bug fixes. The most important change is in package handling. Logstash’s installation has been aligned with the rest of the stack, and now the binary, log, and data files can be found in paths similar to those in Elasticsearch and Kibana.
Kibana 5.0 has a new look and feel and includes many improvements to its GUI (a new menu and buttons) and its API interaction component, “Sense,” which was renamed to “Console.”
Kibana’s GUI now has several new color palettes and themes in contrast to Kibana 4.0, which only had a few dominant colors that were mainly black and white. In addition, the installation capabilities of Timelion and third-party plugin packages were extended.
A status page can now be configured for all users, even unauthenticated ones, and date formats can be specified once date histograms are filtered. Another new feature is a persistent UUID for each Kibana instance in an Elasticsearch cluster for unique identification.
X-Pack, the newest component in the ELK Stack, unifies under Kibana the installation of various extensions and plugins such as Graph (which explores relations between indexed items in Elasticsearch) and Watcher (which provides internal alerting and monitoring capabilities). This component makes it easier to use many packages and plugins.
Other features include an installation history report and the ability to manage roles and users in the system for security purposes.
What Will Break in ELK Stack 5.0
As version 5.0 is still a work in progress and only in its alpha stage, many issues are being found, reported, and worked on. This means that a first time “out of the box” installation may perform horribly or not even work at all.
Each component of the stack has changes that prevent custom configurations from being imported into it as they are. These changes need to be studied in the release notes and documentation to prepare for any migration to the new release.
Configuration changes include the move of shipper settings inside the Beats configuration files.
The replacement of Topbeat with Metricbeat is a breaking change by itself because support shifts from Topbeat to Metricbeat.
Custom configurations and scripts that are related to the functionality of the Beats need to be changed because the binary file location has changed as a result of the new directory layout, the new settings, and the shift from rsyslog to built-in log file rotation.
The change in the directory layout and the use of a new configuration file will prevent existing configurations from being imported normally. Custom configurations may require the creation of new configuration files and adaptations to them.
The cluster node discovery method has changed with the introduction of a node handshake. Previously discovered nodes will have to be rediscovered when initiating light connections and upon initial cluster recovery. Some registry settings have been changed or moved internally, causing older settings not to work in the new version.
Due to a stricter plugin version policy in the new Kibana, upgrading might break existing plugins. In this new version, developers will be required to release a new version of their plugins for each and every update, including minor ones.
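The effect of such a strict policy can be sketched as an exact version comparison. The function names below are hypothetical, and the check is a simplification of whatever Kibana does internally; it only illustrates why a plugin built for one release stops loading after even a patch-level upgrade.

```python
def parse_version(version: str) -> tuple:
    """Split a version such as "5.0.0-alpha3" into ((5, 0, 0), "alpha3")."""
    core, _, label = version.partition("-")
    numbers = tuple(int(part) for part in core.split("."))
    return numbers, label


def plugin_is_compatible(kibana_version: str, plugin_version: str) -> bool:
    """Strict policy: the plugin must target exactly the running version."""
    return parse_version(kibana_version) == parse_version(plugin_version)


print(plugin_is_compatible("5.0.0", "5.0.0"))                # True
print(plugin_is_compatible("5.0.1", "5.0.0"))                # False
print(plugin_is_compatible("5.0.0-alpha3", "5.0.0-alpha2"))  # False
```

Under a policy like this, plugin authors have to publish a matching build for every Kibana release, which is the maintenance cost described above.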
Should You Upgrade?
This is always the question one asks when faced with new versions. My rule of thumb: If it ain’t broken, don’t fix (or upgrade) it.
If you have a stable stack that works for you and satisfies your requirements, there is no immediate reason to make a change. Of course, if you have the resources available, you could set up a sandbox for testing.
Here are some other points to consider:
- The first advantage is an obvious one – a new version is an opportunity for a system to get a clean slate. With the new features of version 5.0, an old system’s architecture can be redesigned and implemented.
- The new version provides better search and index capabilities and promises better performance, especially with the dimension point fields feature used by the new Lucene engine.
- Kibana’s refactored UI and extended plugin support make it more intuitive and easy to use. In addition, Kibana component integration was made easier with the addition of new configuration features and commands.
- “Painless,” the new scripting language, makes it easier to work with Elasticsearch and manipulate the data and events within it.
- The use of Beats together with the other components can boost log and data shipping performance. In addition, Beat configuration is relatively easy (and comes with much support for a JSON configuration format).
- The system is still in alpha testing, so there are many issues within the system as a whole. Things may change until the final release, so testing multiple versions (both stable and unstable) can be time-consuming.
- The parsing, processing, and indexing of data directly in Elasticsearch may affect overall cluster performance significantly, especially in heavy traffic environments. Good benchmarks, use cases, and demonstrations are needed to show how the system performs under load. In addition, current cluster performance issues may be solved within the “boundaries” of the current cluster version by changing configurations or adding system resources. Upgrading may not be the solution for a major performance issue.
- Each new release requires the upgrading of all of the components in clusters. Breaking changes and possible configuration changes may make the upgrade process very difficult and cumbersome.
A Final Note
Currently, the most recent version of ELK Stack 5.0 is Alpha 3. The release date of Alpha 4 (with even more bug fixes and new features) is still unknown.
ELK Stack 5.0 has many exciting new features and improvements, many of which are eagerly desired. These change the ELK Stack as we know it and may greatly change the world of logging and monitoring as a whole.