1 Production considerations for ELK stack
In my previous part 1 and part 2 posts, I provided the detailed steps for setting up ELK to monitor a Spring Boot application, but that setup lacks considerations for a production environment. In this post, I will provide some tips for running ELK in a production environment.
1.1 Security considerations
1.1.1 Using SSL for the communication between Filebeat and Logstash
The above setup does not use SSL for the communication between Filebeat and Logstash. If there is a need to set up SSL, you can read more details in the Filebeat and Logstash documentation.
For AWS, we can use a VPC and security groups to restrict the traffic between the Filebeat hosts and the ELK host, so implementing SSL might not be needed.
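If SSL is required, the general approach is to enable SSL on the Logstash beats input and point Filebeat at the certificate authority that signed the Logstash certificate. The sketch below uses Filebeat 5.x option names; the hostname, certificate, and key paths are examples only.
filebeat.yml (on the application host):
output.logstash:
  hosts: ["elk.example.com:5044"]
  ssl.certificate_authorities: ["/etc/pki/tls/certs/logstash-ca.crt"]
Logstash beats input (on the ELK host):
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"
    ssl_key => "/etc/pki/tls/private/logstash.key"
  }
}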
1.1.2 Login authentication for Kibana and Elasticsearch
Kibana and Elasticsearch do not come with login authentication functionality out of the box.
To improve security, we could:
- Run Elasticsearch on localhost or restrict access to a security group
- Pay for Shield (http://www.elasticsearch.org/overview/shield/), which is the Elasticsearch solution for securing a cluster. It supports role-based access control.
- Use a reverse proxy, such as the approach suggested at https://www.mapr.com/blog/how-secure-elasticsearch-and-kibana (see the sketch below)
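As an illustration of the reverse proxy option, a minimal nginx configuration that puts HTTP basic authentication in front of Kibana could look like the sketch below; the server name and htpasswd file path are examples, and the htpasswd.users file can be created with the htpasswd tool from httpd-tools:
server {
    listen 80;
    server_name kibana.example.com;

    # require a username/password before proxying to Kibana
    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
    }
}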
1.2 Disk space management
1.2.1 Use LVM for the Elasticsearch data
Logical Volume Management (LVM) allows us to dynamically create, resize, and delete logical volumes on a Linux file system. It is important to use LVM for /var/lib/elasticsearch so the data volume can be grown when the amount of log data increases.
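For example, if /var/lib/elasticsearch sits on a logical volume, growing it later could look like the commands below; the volume group and logical volume names (vg_elk, lv_elasticsearch) are examples, and resize2fs assumes ext4 (use xfs_growfs for XFS):
# lvextend -L +50G /dev/vg_elk/lv_elasticsearch
# resize2fs /dev/vg_elk/lv_elasticsearch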
1.2.2 Clean up old log data in Elasticsearch
We can use elasticsearch-curator to remove old log data from Elasticsearch.
Download curator
# wget https://packages.elastic.co/curator/4/centos/6/Packages/elasticsearch-curator-4.2.5-1.x86_64.rpm
Install curator
# yum install elasticsearch-curator-4.2.5-1.x86_64.rpm
Before creating a cron job, get the full path of curator
# which curator
/usr/local/bin/curator
Create a curator configuration file called /root/curator_config.yml
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: INFO
Create an action file called /root/curator_del_action.yml
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True. If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 5 days (based on index name), for custom-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 5
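Before scheduling the job, we can test the action with curator's --dry-run flag, which only logs what would be deleted without removing anything:
# /usr/local/bin/curator --config /root/curator_config.yml --dry-run /root/curator_del_action.yml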
Create a crontab entry (note: any user can access Elasticsearch from localhost, so the cron job can run as root)
# crontab -e
Add the following line to run curator at 20 minutes past midnight (system time); it connects to the Elasticsearch node on 127.0.0.1 and deletes all indices older than 5 days, as defined in the action file above.
20 0 * * * /usr/local/bin/curator --config /root/curator_config.yml /root/curator_del_action.yml
Note that the old command format from curator 3, such as curator close --older-than 90, no longer works: curator 4 does not accept these options on the command line and requires an action file instead. Closing indices older than 90 days therefore needs its own action file and crontab entry (see the sketch at the end of this section).
Options (these are curator 3 command line options, listed here for reference; with curator 4 the equivalents are expressed as filters in the action file):
--older-than: number of time units older than this
--prefix: prefix of the names of the indices. Default is logstash-
--time-unit: time unit used by --older-than. Default is days
You can find more information on the curator command line options at:
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/singleton-cli.html
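For completeness, closing older indices with curator 4 also uses an action file. The sketch below is modeled on the delete action above and is not part of the original setup; the file name /root/curator_close_action.yml and the crontab time are examples.
Create /root/curator_close_action.yml:
actions:
  1:
    action: close
    description: >-
      Close indices older than 90 days (based on index name).
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 90
The matching crontab entry would be:
30 0 * * * /usr/local/bin/curator --config /root/curator_config.yml /root/curator_close_action.yml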
1.3 High availability and scalability considerations
A one-node ELK stack is only good for small implementations, but it does save on costs. When the load increases, we could scale up (use a bigger EC2 instance) or scale out (break the ELK stack into multiple hosts with an Elasticsearch cluster).
We can also implement a highly available ELK stack spanning multiple availability zones within AWS, but this is out of scope for this document. You can get more information on how to implement an HA ELK stack from this link: http://logz.io/blog/deploy-elk-production/
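When scaling out, each Elasticsearch node must join the same cluster. A minimal elasticsearch.yml sketch for one node of a multi-node cluster, using the Zen discovery settings from Elasticsearch 2.x/5.x (the cluster name, node name, and IP addresses are examples), might look like:
cluster.name: elk-production
node.name: es-node-1
network.host: _site_
# list the other nodes so this node can discover the cluster
discovery.zen.ping.unicast.hosts: ["10.0.1.10", "10.0.2.10"]
# (number of master-eligible nodes / 2) + 1, to avoid split brain
discovery.zen.minimum_master_nodes: 2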
1.4 Considerations for the AWS security group
The ELK EC2 instance needs to have a security group with the following ports opened:
- 22 - SSH
- 5601 - Kibana web interface
- 5044 - Logstash listening port (Beats input)
The security group for the ELK EC2 instance should also only accept traffic from the security group of the application servers that need to push logs to the ELK stack.
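For example, allowing the Logstash Beats port only from the application servers' security group can be done with the AWS CLI as shown below; sg-11111111 (the ELK security group) and sg-22222222 (the application servers' security group) are placeholders:
# aws ec2 authorize-security-group-ingress --group-id sg-11111111 --protocol tcp --port 5044 --source-group sg-22222222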
1.5 ELK init script
The following init script can be used to start or stop the ELK stack.
#!/bin/bash
# Init script for ELK stack (bash is required because the script uses [[ ]])
#
### BEGIN INIT INFO
# Provides: ELK
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: ELK stack
# Description: ELK
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
export PATH
start() {
service elasticsearch start
r1=$?
service kibana start
r2=$?
initctl start logstash
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
stop() {
service kibana stop
r1=$?
initctl stop logstash
r2=$?
service elasticsearch stop
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
status() {
service elasticsearch status
r1=$?
service kibana status
r2=$?
initctl status logstash
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop && start
;;
status)
status
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|status|restart}"
;;
esac
exit $?
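Assuming the script is saved as /etc/init.d/elk (a name chosen here for illustration), it can be enabled and started on a CentOS host with the commands below; chkconfig --add registers it for the runlevels declared in the LSB header:
# chmod +x /etc/init.d/elk
# chkconfig --add elk
# service elk start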