Eugeny Shtoltc

IT Cloud



Data: nested objects;

      ** Lucene engine;

      ** JSON join;

      ** Scalable: SolrCloud (setting) && ZooKeeper (setting);

      ** Documentation since 2004.

      Microservice architecture is now used more and more often. Thanks to the loose coupling between components and the simplicity of each one, it makes development, testing, and debugging easier. The system as a whole, however, becomes harder to analyze because it is distributed. To assess its overall state, logs are collected in a centralized place and converted into a readable form. Other data often needs analysis as well: the NGINX access_log to collect traffic metrics, the mail server log to detect password-guessing attempts, and so on. Take ELK as an example of such a solution. ELK is a bundle of three products: Logstash, Elasticsearch, and Kibana, the first and last of which are built around the central one and make it convenient to work with. More generally, ELK is called the Elastic Stack, because the log preparation tool Logstash can be replaced by analogs such as Fluentd or Rsyslog, and the Kibana visualizer can be replaced by Grafana. For example, although Kibana offers extensive analysis capabilities, Grafana provides notifications when events occur and can be used together with other products, such as cAdvisor, which analyzes the state of the system and of individual containers.

      ELK products can be installed separately, downloaded as individual containers between which you need to configure communication, or run as a single container.

      For Elasticsearch to work properly, the data needs to arrive in JSON format. If the data is submitted in text format (each log entry is written on one line, separated from the previous one by a line break), Elasticsearch can offer only full-text search, because each entry is interpreted as a single string. There are two ways to deliver logs in JSON. The first is to configure the product under observation to output this format; NGINX, for example, has this option. Often this is impossible, because a body of logs has already accumulated and they are traditionally written as text. The second way, for such cases, is to post-process the logs from text to JSON, which is what Logstash does. It is important to note that if data can be sent in a structured form (JSON, XML, and others) from the start, it should be: with detailed parsing, any deviation from the format breaks the parser, and with superficial parsing, valuable information is lost. Either way, parsing is the bottleneck in this system, although it can be scaled to a limited extent, down to one parser per service or log file. Fortunately, more and more products are starting to support structured logging; recent versions of NGINX, for example, support logs in JSON format.
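
      The cost of converting text logs to JSON can be sketched in Python. This is only an illustration and not part of the Elastic Stack: the sample line, the regular expression, and the field names (remote_addr, status, and so on) are assumptions chosen for the example. It shows what a parser has to do to turn one text access-log line into a JSON document, and how a line that deviates from the expected format fails to parse.

```python
import json
import re

# Assumed sample format: a simplified NGINX-style access log line.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+)'
)


def parse_access_line(line):
    """Return a dict of fields, or None if the line deviates from the format."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    record = m.groupdict()
    # Numeric fields are converted so they can be aggregated later.
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    return record


sample = ('203.0.113.7 - - [01/Jan/2021:10:00:00 +0000] '
          '"GET /index.html HTTP/1.1" 200 612')
record = parse_access_line(sample)
print(json.dumps(record))

# A line in an unexpected format cannot be parsed at all.
broken = parse_access_line("unexpected log line")
```

      For the sample line the function returns a dictionary with typed fields that serializes directly to JSON; for a line outside the expected format it returns None, which is the fragility of detailed parsing described above.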

      For systems that do not support this format, you can convert logs to it with programs such as Logstash, Filebeat, and Fluentd. The first is part of the vendor's standard Elastic Stack distribution and can be installed in one step as part of an ELK Docker container. It supports reading data from files, the network, and standard streams on both input and output, and, most importantly, the native Elasticsearch protocol. Logstash monitors log files by modification date or receives data over the network (telnet) from distributed systems, such as containers, and after transformation sends it to the output, usually Elasticsearch. It is simple and ships with the Elastic Stack, which makes it easy and hassle-free to configure. But because of the Java virtual machine inside, it is heavy, and it is not very feature-rich, although it supports plugins, for example synchronization with MySQL to send new data. Filebeat provides slightly more options. Fluentd can serve as an enterprise tool for every occasion thanks to its rich functionality (reading application logs, system logs, and so on), its scalability, and the ability to roll it out across Kubernetes clusters with a Helm chart and monitor an entire data center in the standard package; this is covered in the relevant section.

      To manage logs, you can use Curator, which can archive old logs from Elasticsearch or delete them, making Elasticsearch work more efficiently.
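
      The idea of selecting old logs for archiving or deletion can be sketched as follows. This is not Curator's code or API, only an illustration under assumptions: indices are assumed to be named with the prefix logstash- followed by a date in the form YYYY.MM.DD, and the function name indices_older_than is hypothetical.

```python
from datetime import date, datetime, timedelta


def indices_older_than(index_names, days, today, prefix="logstash-"):
    """Return the index names whose date suffix is older than `days` days."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not a log index, leave it alone
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, skip it
        if day < cutoff:
            old.append(name)
    return old


names = ["logstash-2021.01.01", "logstash-2021.02.20", "kibana-settings"]
old = indices_older_than(names, days=30, today=date(2021, 3, 1))
print(old)
```

      Only the indices returned by such a selection would then be archived or deleted; indices that do not follow the naming scheme are never touched.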

      Logs are obtained by special collectors: Logstash, Fluentd, Filebeat, or others.

      Fluentd is a less demanding and simpler analog of Logstash. It is configured in /etc/td-agent/td-agent.conf, which contains four blocks:

      ** source – contains information about the data source;

      ** match – contains settings for transferring received data;

      ** include – includes other configuration files;

      ** system – contains system settings.

      Logstash provides a much more capable configuration language. The Logstash agent daemon, logstash, monitors changes in files. If the logs are not local but spread across a distributed system, logstash is installed on each server and run in agent mode: bin/logstash agent -f /env/conf/my.conf. Because running logstash only as an agent for shipping logs is wasteful, you can use a product from the same developers, Logstash Forwarder (formerly Lumberjack), which forwards logs over the lumberjack protocol to the logstash server. You can use the Packetbeat agent to track and retrieve data from MySQL (https://www.8host.com/blog/sbor-metrik-infrastruktury-s-pomoshhyu-packetbeat-i-elk-v-ubuntu-14-04/).

      Logstash can also transform data with filters of different types:

      ** grok – define regular expressions to extract fields from a string, often to convert logs from text format to JSON;

      ** date – for archived logs, set the event date not to the current date but to the one taken from the log itself;

      ** kv – for logs of the form key=value;

      ** mutate – keep only the required fields and change the data in the fields, for example, replace the "/" character with "_";

      ** multiline – for multi-line logs with delimiters.
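
      What the kv, mutate, and date filters do to an event can be sketched in Python. This is a simplified model and not Logstash code: the function names, the sample message, and its field names are assumptions made for the illustration.

```python
from datetime import datetime


def kv_filter(message):
    """kv: split space-separated 'key=value' pairs into a dict."""
    fields = {}
    for pair in message.split():
        key, sep, value = pair.partition("=")
        if sep:  # ignore tokens without '='
            fields[key] = value
    return fields


def mutate_filter(event, keep, field, old, new):
    """mutate: keep only selected fields and replace characters in one field."""
    result = {k: v for k, v in event.items() if k in keep}
    if field in result:
        result[field] = result[field].replace(old, new)
    return result


def date_filter(event, field, fmt):
    """date: take the event timestamp from the log itself, not from 'now'."""
    event = dict(event)
    event["@timestamp"] = datetime.strptime(event[field], fmt).isoformat()
    return event


event = kv_filter("date=01.01.2021 path=/var/log/app user=alice")
event = mutate_filter(event, keep={"date", "path"}, field="path", old="/", new="_")
event = date_filter(event, field="date", fmt="%d.%m.%Y")
print(event)
```

      Each function takes an event and returns a new one, which mirrors how filters are applied one after another to the same event.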

      For example, a log in the format "date type number" can be split into its components: the "message" string "01.01.2021 INFO 1" is decomposed into a hash:

      filter {
        grok {
          type => "my_log"
          # MYDATE and ID are assumed to be custom patterns defined elsewhere
          match => ["message", "%{MYDATE:date} %{WORD:loglevel} %{ID:id:int}"]
        }
      }

      The %{ID:id:int} template takes the pattern class ID; the matched value is placed in the id field, and the string value is converted to the int type.
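
      The decomposition of "01.01.2021 INFO 1" can be reproduced with a small Python model of grok. This is a sketch and not the Logstash implementation: it assumes templates of the form %{NAME:field:type}, and the regular expressions given for MYDATE, WORD, and ID in the PATTERNS table are assumptions made for this example.

```python
import re

# Hypothetical pattern library: names map to regular expressions.
PATTERNS = {
    "MYDATE": r"\d{2}\.\d{2}\.\d{4}",
    "WORD": r"\w+",
    "ID": r"\d+",
}
CASTS = {"int": int, "float": float}


def compile_grok(expression):
    """Expand %{NAME:field[:type]} templates into named regex groups."""
    casts = {}

    def expand(m):
        name, field, cast = m.group(1), m.group(2), m.group(3)
        if cast:
            casts[field] = CASTS[cast]
        return "(?P<%s>%s)" % (field, PATTERNS[name])

    regex = re.sub(r"%\{(\w+):(\w+)(?::(\w+))?\}", expand, expression)
    return re.compile(regex), casts


def grok(expression, message):
    """Return the extracted fields, or None if the message does not match."""
    regex, casts = compile_grok(expression)
    m = regex.match(message)
    if m is None:
        return None
    fields = m.groupdict()
    for field, cast in casts.items():
        fields[field] = cast(fields[field])
    return fields


result = grok("%{MYDATE:date} %{WORD:loglevel} %{ID:id:int}", "01.01.2021 INFO 1")
print(result)
```

      The third part of the template is what turns the string "1" into the integer 1; without it every extracted field stays a string.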

      In the output block we can specify where to send the data: to the console with the stdout block, to a file with file, over HTTP through the JSON REST API with elasticsearch, or by mail with email. You can also set conditions on the fields obtained in the filter block. For example:

      output {

      if [type] == "Info" {

      elasticsearch