1 Production considerations for ELK stack
In my previous part 1 and part 2 posts, I provided the detailed steps for setting up ELK to monitor a Spring Boot application, but that setup lacks considerations for a production environment. In this post, I will provide some tips for running ELK in a production environment.
1.1 Security considerations
1.1.1 Using SSL for the communication between Filebeat and Logstash
The above setup does not use SSL for the communication between Filebeat and Logstash. If there is a need to set up SSL, you can read more details in the Filebeat and Logstash documentation.
For AWS, we can use a VPC and security groups to restrict the traffic between the Filebeat hosts and the ELK host, so implementing SSL might not be needed.
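If SSL is required, the general approach is to enable SSL on the Logstash beats input and point Filebeat at the certificate authority that signed the Logstash certificate. The sketch below uses Filebeat 5.x option names; the hostname, certificate, and key paths are examples only.
filebeat.yml (on the application host):
output.logstash:
  hosts: ["elk.example.com:5044"]
  ssl.certificate_authorities: ["/etc/pki/tls/certs/logstash-ca.crt"]
Logstash beats input (on the ELK host):
input {
  beats {
    port => 5044
    ssl => true
    ssl_certificate => "/etc/pki/tls/certs/logstash.crt"
    ssl_key => "/etc/pki/tls/private/logstash.key"
  }
}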
1.1.2 Login authentication for Kibana and Elasticsearch
Kibana and Elasticsearch do not come with login authentication functionality out of the box.
To improve security, we could:
- Run Elasticsearch on localhost or restrict access to a security group
- Pay for Shield (http://www.elasticsearch.org/overview/shield/), which is the Elasticsearch solution for securing a cluster. It supports role-based access control.
- Use a reverse proxy, such as the approach suggested at https://www.mapr.com/blog/how-secure-elasticsearch-and-kibana (see the sketch below)
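As an illustration of the reverse proxy option, a minimal nginx configuration that puts HTTP basic authentication in front of Kibana could look like the sketch below; the server name and htpasswd file path are examples, and the htpasswd.users file can be created with the htpasswd tool from httpd-tools:
server {
    listen 80;
    server_name kibana.example.com;

    # require a username/password before proxying to Kibana
    auth_basic "Restricted Access";
    auth_basic_user_file /etc/nginx/htpasswd.users;

    location / {
        proxy_pass http://localhost:5601;
        proxy_set_header Host $host;
    }
}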
1.2 Disk space management
1.2.1 Use LVM for the Elasticsearch data
Logical Volume Management (LVM) allows us to dynamically create, resize, and delete logical volumes on a Linux file system. It is important to use LVM for /var/lib/elasticsearch so the data volume can be grown when the amount of log data increases.
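For example, if /var/lib/elasticsearch sits on a logical volume, growing it later could look like the commands below; the volume group and logical volume names (vg_elk, lv_elasticsearch) are examples, and resize2fs assumes ext4 (use xfs_growfs for XFS):
# lvextend -L +50G /dev/vg_elk/lv_elasticsearch
# resize2fs /dev/vg_elk/lv_elasticsearch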
1.2.2 Clean up old log data in Elasticsearch
We can use elasticsearch-curator to remove old log data from Elasticsearch.
Download curator
# wget https://packages.elastic.co/curator/4/centos/6/Packages/elasticsearch-curator-4.2.5-1.x86_64.rpm
Install curator
# yum install elasticsearch-curator-4.2.5-1.x86_64.rpm
Before creating a cron job, get the full path of curator
# which curator
/usr/local/bin/curator
Create a curator configuration file called /root/curator_config.yml
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
client:
  hosts:
    - 127.0.0.1
  port: 9200
  timeout: 30

logging:
  loglevel: INFO
Create an action file called /root/curator_del_action.yml
# Remember, leave a key empty if there is no value. None will be a string,
# not a Python "NoneType"
#
# Also remember that all examples have 'disable_action' set to True. If you
# want to use this action as a template, be sure to set this to False after
# copying it.
actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 5 days (based on index name), for custom-
      prefixed indices. Ignore the error if the filter does not result in an
      actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 5
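Before scheduling the job, we can test the action with curator's --dry-run flag, which only logs what would be deleted without removing anything:
# /usr/local/bin/curator --config /root/curator_config.yml --dry-run /root/curator_del_action.yml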
Create a crontab entry (note: any user can access Elasticsearch from localhost, so the cron job can run as root)
# crontab -e
Add the following line to run curator at 20 minutes past midnight (system time); it connects to the Elasticsearch node on 127.0.0.1 and deletes all indices older than 5 days, as defined in the action file above.
20 0 * * * /usr/local/bin/curator --config /root/curator_config.yml /root/curator_del_action.yml
Note that the old command format from curator 3, such as curator close --older-than 90, no longer works: curator 4 does not accept these options on the command line and requires an action file instead. Closing indices older than 90 days therefore needs its own action file and crontab entry (see the sketch at the end of this section).
Options (these are curator 3 command line options, listed here for reference; with curator 4 the equivalents are expressed as filters in the action file):
--older-than: number of time units older than this
--prefix: prefix of the names of the indices. Default is logstash-
--time-unit: time unit used by --older-than. Default is days
You can find more information on the curator command line options at:
https://www.elastic.co/guide/en/elasticsearch/client/curator/current/singleton-cli.html
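For completeness, closing older indices with curator 4 also uses an action file. The sketch below is modeled on the delete action above and is not part of the original setup; the file name /root/curator_close_action.yml and the crontab time are examples.
Create /root/curator_close_action.yml:
actions:
  1:
    action: close
    description: >-
      Close indices older than 90 days (based on index name).
    options:
      ignore_empty_list: True
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 90
The matching crontab entry would be:
30 0 * * * /usr/local/bin/curator --config /root/curator_config.yml /root/curator_close_action.yml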
1.3 High availability and scalability considerations
A one-node ELK stack is only good for small implementations, but it does save on costs. When the load increases, we could scale up (use a bigger EC2 instance) or scale out (break the ELK stack into multiple hosts with an Elasticsearch cluster).
We can also implement a highly available ELK stack spanning multiple availability zones within AWS, but this is out of scope for this document. You can get more information on how to implement an HA ELK stack from this link: http://logz.io/blog/deploy-elk-production/
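When scaling out, each Elasticsearch node must join the same cluster. A minimal elasticsearch.yml sketch for one node of a multi-node cluster, using the Zen discovery settings from Elasticsearch 2.x/5.x (the cluster name, node name, and IP addresses are examples), might look like:
cluster.name: elk-production
node.name: es-node-1
network.host: _site_
# list the other nodes so this node can discover the cluster
discovery.zen.ping.unicast.hosts: ["10.0.1.10", "10.0.2.10"]
# (number of master-eligible nodes / 2) + 1, to avoid split brain
discovery.zen.minimum_master_nodes: 2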
1.4 Considerations for the AWS security group
The ELK EC2 instance needs to have a security group with the following ports opened:
- 22 - SSH
- 5601 - Kibana web interface
- 5044 - Logstash listening port (Beats input)
The security group for the ELK EC2 instance should also only accept traffic from the security group of the application servers that need to push logs to the ELK stack.
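For example, allowing the Logstash Beats port only from the application servers' security group can be done with the AWS CLI as shown below; sg-11111111 (the ELK security group) and sg-22222222 (the application servers' security group) are placeholders:
# aws ec2 authorize-security-group-ingress --group-id sg-11111111 --protocol tcp --port 5044 --source-group sg-22222222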
1.5 ELK init script
The following init script can be used to start or stop the ELK stack.
#!/bin/bash
# Init script for ELK stack (bash is required because the script uses [[ ]])
#
### BEGIN INIT INFO
# Provides: ELK
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: ELK stack
# Description: ELK
### END INIT INFO
PATH=/sbin:/usr/sbin:/bin:/usr/bin
export PATH
start() {
service elasticsearch start
r1=$?
service kibana start
r2=$?
initctl start logstash
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
stop() {
service kibana stop
r1=$?
initctl stop logstash
r2=$?
service elasticsearch stop
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
status() {
service elasticsearch status
r1=$?
service kibana status
r2=$?
initctl status logstash
r3=$?
if [[ $r1 -ne 0 || $r2 -ne 0 || $r3 -ne 0 ]]; then
exit 1
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop && start
;;
status)
status
;;
*)
echo "Usage: $SCRIPTNAME {start|stop|status|restart}"
;;
esac
exit $?
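Assuming the script is saved as /etc/init.d/elk (a name chosen here for illustration), it can be enabled and started on a CentOS host with the commands below; chkconfig --add registers it for the runlevels declared in the LSB header:
# chmod +x /etc/init.d/elk
# chkconfig --add elk
# service elk start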