HAL Id: hal-01212015
https://hal.inria.fr/hal-01212015
Submitted on 5 Oct 2015
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Powering Monitoring Analytics with ELK stack
Abdelkader Lahmadi, Frédéric Beck
To cite this version:
Abdelkader Lahmadi, Frédéric Beck. Powering Monitoring Analytics with ELK stack. 9th International Conference on Autonomous Infrastructure, Management and Security (AIMS 2015), Jun 2015, Ghent, Belgium. ⟨hal-01212015⟩
Powering Monitoring Analytics with ELK stack
Abdelkader Lahmadi, Frédéric Beck
INRIA Nancy Grand Est, University of Lorraine, France
2015
(compiled on: June 23, 2015)
References
Online Tutorials
I Elasticsearch Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
I Elasticsearch, The Definitive Guide:
https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
I Logstash Reference:
https://www.elastic.co/guide/en/logstash/current/index.html
I Kibana User Guide:
https://www.elastic.co/guide/en/kibana/current/index.html
Books
I Rafał Kuć, Marek Rogoziński: Mastering Elasticsearch, Second Edition, February 27, 2015
Monitoring data
Machine-generated data
I Nagios logs
I Snort and suricata logs
I Web server and ssh logs
I Honeypot logs
I Network event logs: NetFlow, IPFIX, pcap traces
Snort log
[**] [1:2000537:6] ET SCAN NMAP -sS [**]
[Classification: Attempted Information Leak] [Priority: 2]
11/08-11:25:41.773271 10.2.199.239:61562 -> 180.242.137.181:5222
TCP TTL:38 TOS:0x0 ID:9963 IpLen:20 DgmLen:44
******S* Seq: 0xF7D028A7 Ack: 0x0 Win: 0x800 TcpLen: 24
TCP Options (1) => MSS: 1160
Web server log
128.1.0.0 - - [30/Apr/1998:22:38:48] "GET /english/venues/Paris/St-Denis/ HTTP/1.1" 404 333
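To make the structure of such a log line concrete, here is a minimal Python sketch that extracts the fields from the web server log above. The regex and field names are illustrative (chosen to mirror the apachelogs fields used in the Elasticsearch examples of these slides), not part of any tool discussed here:

```python
import re

# Simplified Common Log Format pattern; named groups mirror the
# apachelogs field names used elsewhere in these slides.
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)

line = ('128.1.0.0 - - [30/Apr/1998:22:38:48] '
        '"GET /english/venues/Paris/St-Denis/ HTTP/1.1" 404 333')
fields = LOG_PATTERN.match(line).groupdict()
print(fields['clientip'], fields['verb'], fields['response'])  # 128.1.0.0 GET 404
```

This hand-written regex is essentially what a grok pattern encapsulates for you, as shown in the Logstash filter slide.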
Monitoring data: properties
I Variety of data sources: flows, logs, pcap data
I Structure: structured, non structured
I Massive volume: grows at Moore's-Law kinds of speeds
Who is looking into them?
I IT administrators
I Researchers and scientists
Why?
I Continuous monitoring tasks
I Extract useful information and insights: finding meaningful events leading to new discoveries
I Looking for a needle in a haystack: forensics and Incident Handling
I Data search, aggregation, correlation
Traditional tools and techniques
Storage
I Text files, binary files
I Relational databases: MySQL, PostgreSQL, MS Access
I New trends: NoSQL databases (Redis, MongoDB, HBase, ...)
May 31st 2015, 15:44:06.039  com.whatsapp       10.29.12.166  50048  184.173.179.38  443  10   668
May 31st 2015, 15:55:40.268  com.facebook.orca  10.29.12.166  47236  31.13.64.3      443  10  1384
Tools and commands
I SQL, Grep, sed, awk, cut, find, gnuplot
I Perl, Python, Shell
perl -p -e 's/\t/ /g' raw | cut -d ' ' -f5-
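The same field extraction can be sketched in Python (the file name raw and the sample line are illustrative): replace tabs with spaces, then keep fields 5 onward, exactly as the Perl/cut pipeline does.

```python
# Python equivalent of: perl -p -e 's/\t/ /g' raw | cut -d ' ' -f5-
def fields_from(line, start=5):
    # Replace tabs with spaces, split on single spaces (as cut does),
    # then keep fields `start` onward (1-indexed, like cut -f).
    return ' '.join(line.replace('\t', ' ').split(' ')[start - 1:])

line = "May 31st 2015, 15:44:06.039\tcom.whatsapp 10.29.12.166 50048"
print(fields_from(line))  # com.whatsapp 10.29.12.166 50048
```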
ELK Components
Elasticsearch Logstash Kibana
I end-to-end solution for logging, analytics, search and visualisation
Logstash: architecture
Event processing engine: data pipeline
I Inputs: collect data from variety of sources
I Filters: parse, process and enrich data
I Outputs: push data to a variety of destinations
Logstash: Inputs and codecs
Receive data from files, network, etc.
I Network (TCP/UDP), File, syslog, stdin
I Redis, RabbitMQ, IRC, Twitter, IMAP, XMPP
I syslog, ganglia, snmptrap, jmx, log4j
input {
  file {
    path => "/var/log/messages"
    type => "syslog"
  }

  file {
    path => "/var/log/apache/access.log"
    type => "apache"
  }
}
Logstash: Filters
Enrich and process the event data
I grok, date, mutate, elapsed, ruby
I dns, geoip, useragent
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}
Logstash: Outputs
Send event data to other systems
I file, stdout, tcp
I elasticsearch, MongoDB, email
I syslog, Ganglia
output {
  stdout { codec => rubydebug }
  elasticsearch { host => localhost }
}
Elasticsearch: better than Grep!
Search engine
I Built on top of Apache Lucene: full-text-search engine library
I Schema-free
I JSON document oriented: store and REST API
I index, search, sort and filter documents
Distributed search and store
I scaling up (bigger servers) or scaling out (more servers)
I coping with failure: replication
RESTful API
I store, fetch, alter, and delete data
I Easy backup with snapshot and restore
I No transactions nor rollback
Elasticsearch: glossary
I Document: main data unit used during indexing and searching; contains one or more fields
I Field: a section of the document, a pair of name and value
I Term: unit of search representing a word from the text
I Index: named collection of documents (like a database in a relational database)
I Type: a class of similar documents (like a table in a relational database)
I Inverted index: data structure that maps the terms in the index to the documents
I Shard: a separate Apache Lucene index; a low-level "worker" unit
I Replica: an exact copy of a shard
Relational DB => Databases => Tables => Rows => Columns
Elasticsearch => Indices => Types => Documents => Fields
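The inverted index from the glossary can be sketched in a few lines of Python. The documents below are hypothetical; the point is only that a term lookup returns matching document ids directly, which is what makes full-text search fast:

```python
from collections import defaultdict

# Toy corpus: document id -> text (hypothetical log lines)
docs = {
    1: "GET /images/logo.gif 200",
    2: "GET /english/venues 404",
    3: "POST /login 200",
}

# Build the inverted index: term -> set of document ids containing it
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted[term].add(doc_id)

# Searching for a term is a single dictionary lookup
print(sorted(inverted["200"]))  # [1, 3]
```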
Elasticsearch: indexes and documents
Create Index
curl -XPUT 'localhost:9200/logs' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
Indexing a document
curl -XPUT 'localhost:9200/logs/apachelogs/1' -d '{
  "clientip": "8.0.0.0",
  "verb": "GET",
  "request": "/images/comp_stage2_brc_topr.gif",
  "httpversion": "1.0",
  "response": "200",
  "bytes": "163"
}'
Retrieving a document
curl -XGET 'localhost:9200/logs/apachelogs/1'
Elasticsearch: mapping
Define the schema of your documents
I elasticsearch is able to dynamically generate a mapping of field types
I It is better to provide your own mapping: date fields as dates, numeric fields as numbers, string fields as full-text or exact-value strings
Create a mapping
curl -XPUT 'localhost:9200/logs' -d '{
  "mappings": {
    "apachelogs": {
      "properties": {
        "clientip":    { "type": "ip" },
        "verb":        { "type": "string" },
        "request":     { "type": "string" },
        "httpversion": { "type": "string" },
        "response":    { "type": "string" },
        "bytes":       { "type": "integer" }
      }
    }
  }
}'
Elasticsearch: Querying
Search Lite
curl -XGET 'localhost:9200/logs/apachelogs/_search'
curl -XGET 'localhost:9200/logs/apachelogs/_search?q=response:200'
Search with Query DSL: match, filter, range
curl -XGET 'localhost:9200/logs/apachelogs/_search' -d '{
  "query": {
    "match": {
      "response": "200"
    }
  }
}'

curl -XGET 'localhost:9200/logs/apachelogs/_search' -d '{
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "bytes": { "gt": 1024 }
        }
      },
      "query": {
        "match": {
          "response": "200"
        }
      }
    }
  }
}'
Elasticsearch: Aggregation
Analyze and summarize data
I How many needles are in the haystack?
I What is the average length of the needles?
I What is the median length of the needles, broken down by manufacturer?
I How many needles were added to the haystack each month?
Buckets
I Collection of documents that meet a criterion
Metrics
I Statistics calculated on the documents in a bucket
SQL equivalent
SELECT COUNT(response) FROM table GROUP BY response
I COUNT(response) is equivalent to a metric
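The bucket/metric idea behind the SQL equivalent can be sketched in plain Python: bucketing by response code and counting each bucket is exactly a GROUP BY with COUNT (the response values below are illustrative):

```python
from collections import Counter

# Equivalent of: SELECT COUNT(response) FROM table GROUP BY response
# Each distinct response code is a bucket; the count is the metric.
responses = ["200", "200", "404", "200", "500", "404"]
buckets = Counter(responses)
print(buckets["200"], buckets["404"], buckets["500"])  # 3 2 1
```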
Aggregation: buckets
I partitioning documents based on criteria: by hour, by most-popular terms, by age ranges, by IP ranges
curl -XGET 'localhost:9200/logs/apachelogs/_search?search_type=count' -d '{
  "aggs": {
    "responses": {
      "terms": {
        "field": "response"
      }
    }
  }
}'
I Aggregations are placed under the top-level aggs parameter (the longer aggregations will also work if you prefer that)
I We then name the aggregation whatever we want: responses, in this example
I Finally, we define a single bucket of type terms
I Setting search_type=count avoids executing the fetch phase of the search
Aggregation: metrics
I metric calculated on those documents in each bucket: min, mean, max, quantiles
curl -XGET 'localhost:9200/logs/apachelogs/_search?search_type=count' -d '{
  "aggs": {
    "responses": {
      "terms": {
        "field": "response"
      },
      "aggs": {
        "avg_bytes": {
          "avg": {
            "field": "bytes"
          }
        }
      }
    }
  }
}'
Aggregation: combining them
An aggregation is a combination of buckets and metrics
I Partition documents by client IP (bucket)
I Then partition each client bucket by verb (bucket)
I Then partition each verb bucket by response (bucket)
I Finally, calculate the average bytes for each response (metric)
I Average bytes per <client IP, verb, response>
Many types are available
I terms
I range, date range, ip range
I histogram, date histogram
I stats/avg, min, max, sum, percentiles
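The nested bucket-plus-metric combination described above can be sketched in plain Python (the events and field values below are hypothetical; the keys mirror the apachelogs fields used earlier):

```python
from collections import defaultdict

# Hypothetical access-log events
events = [
    {"clientip": "10.0.0.1", "verb": "GET",  "response": "200", "bytes": 100},
    {"clientip": "10.0.0.1", "verb": "GET",  "response": "200", "bytes": 300},
    {"clientip": "10.0.0.1", "verb": "GET",  "response": "404", "bytes": 50},
    {"clientip": "10.0.0.2", "verb": "POST", "response": "200", "bytes": 80},
]

# Nested buckets keyed by (client IP, verb, response);
# the metric is the average of "bytes" inside each bucket.
groups = defaultdict(list)
for e in events:
    groups[(e["clientip"], e["verb"], e["response"])].append(e["bytes"])

avg_bytes = {key: sum(v) / len(v) for key, v in groups.items()}
print(avg_bytes[("10.0.0.1", "GET", "200")])  # 200.0
```

Elasticsearch computes the same kind of result in one request by nesting aggs clauses, as in the avg_bytes example above.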
Kibana: visualisation
Data visualisation
I browser-based interface
I search, view, and interact with data stored in Elasticsearch indices
I charts, tables, and maps
I dashboards to display changes to Elasticsearch queries in real time
Kibana: setting your index
curl 'localhost:9200/_cat/indices?v'
index               pri rep docs.count docs.deleted store.size pri.store.size
logstash-1998.04.30   5   1      98434            0     20.6mb         20.6mb
logstash-1998.05.01   5   1    1094910            0    205.8mb        205.8mb
Kibana: discover
I Explore your data with access to every document
I Submit search query, filter the search
I Save and Load searches
Kibana: search
Query language
lang:en to just search inside a field named "lang"
lang:e? wildcard expression
user.listed_count:[0 TO 10] range search on numeric fields
lang:ens regular expression search (very slow)
Kibana: visualize
I Area chart: Displays a line chart with filled areas below the lines. The areas can be displayed stacked, overlapped, or some other variations.
I Data table: Displays a table of aggregated data.
I Line chart: Displays aggregated data as lines.
I Markdown widget: Use the Markdown widget to display free-form information or instructions about your dashboard.
I Metric: Displays the result of a metric aggregation without buckets as a single large number.
I Pie chart: Displays data as a pie with different slices for each bucket or as a donut (depending on your taste).
I Tile map: Displays a map for results of a geohash aggregation.
I Vertical bar chart: A chart with vertical bars for each bucket.
Kibana: dashboard
Displays a set of saved visualisations in groups
I Save, load and share dashboards
What else ?
Elasticsearch is a very active and fast-evolving project
I Watch out when using wildcards!
I Deleting a mapping deletes the data
I Respect the procedures to update mappings
I Backup is easy (snapshot and restore), do it!