Eugeny Shtoltc

IT Cloud



not only for individual containers, at least 10%. During image builds or container startup, an error may be thrown saying that the specified limits have been exceeded. To change the default settings, pass them to the dockerd server: stop it with service docker stop (all containers will be stopped), then start it again with service docker start (the containers will be resumed). Settings can be given as options: /bin/dockerd --storage-opt dm.basesize=50G --storage-opt

      In Container, we have authorization and control over our containers, with the ability to create them for testing and to see CPU and memory graphs. Anything more requires a monitoring system. There are quite a few of them, for example Zabbix, Graphite, Prometheus, Nagios, InfluxData, OkMeter, DataDog, Bosun, Sensu and others, of which Zabbix and Prometheus are the most popular. The first is the traditional choice: it is a leading tool that admins love for its ease of use (all you need is SSH access to the server) and for being low-level, which lets it work not only with servers but also with other hardware, such as routers. The second is the opposite: it is focused exclusively on collecting metrics and monitoring, is positioned as a ready-made solution rather than a framework, and has won over programmers, who install it, choose metrics and get graphs. The key difference between Zabbix and Prometheus lies not in the preference of some to customize everything in detail and of others to spend much less time, but in scope. Zabbix is geared toward a specific piece of hardware, which can be anything and is often quite exotic in a corporate environment; for that entity, metric collection is written by hand and graphs are configured manually. This approach does not suit the dynamically changing environment of cloud solutions, even plain Docker containers and all the more so Kubernetes, where a huge number of entities are constantly created and the entities themselves, apart from the overall environment, are of no particular interest. Prometheus, by contrast, has Service Discovery built in and supports navigating Kubernetes by namespace, balancer (service) and group of containers (POD), which can be laid out in Grafana as tables.
According to The New Stack's 2017 Kubernetes User Experience survey, Prometheus is used in 63% of Kubernetes installations; the rest rely on less common cloud monitoring tools.

      Metrics can be system metrics (for example, CPU, RAM, ROM) and application metrics (service and application metrics). System metrics are divided into core metrics, which Kubernetes uses for scaling and similar tasks, and non-core metrics, which Kubernetes does not use. Here are example stacks for collecting metrics:

      * cAdvisor + Heapster + InfluxDB

      * cAdvisor + collectd + Heapster

      * cAdvisor + Prometheus

      * snapd + Heapster

      * snapd + SNAP cluster-level agent

      * Sysdig

      There are many monitoring systems and services on the market. We will consider only open-source ones that can be installed in your own cluster. They can be divided by how they obtain metrics: those that collect metrics by polling (pull), and those that expect metrics to be pushed to them. The latter are simpler both in structure and in small-scale use. An example is InfluxDB, a database that services write to. The downside of this approach is that it is hard to scale in terms of both support and load: if all services write at the same time they can overload the monitoring system, and scaling it is difficult because the endpoint is hard-coded in every service. The pull model is represented by Prometheus. It is also a database, with a daemon that polls services according to their registrations in the configuration file and pulls metrics with labels in a specific format, for example:

      cpu_usage 2

      cpu_usage{app="myapp"} 2
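      To make the pull model concrete, here is a minimal sketch of a service that exposes the two samples above for a Prometheus-style poller. It uses only the Python standard library; the port 8000, the /metrics path handling and the helper names are assumptions made for illustration, and label values are not escaped, so this is not a replacement for an official client library. In the text exposition format, labels are written as key="value" pairs in braces.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metric(name, value, labels=None):
    """Format one sample in the Prometheus text exposition format."""
    if labels:
        # Labels are rendered as key="value" pairs inside braces.
        # Note: this sketch does not escape quotes or backslashes in values.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_all(samples):
    """Join samples into a response body, one sample per line."""
    return "".join(render_metric(*s) + "\n" for s in samples)


# Hypothetical samples mirroring the example in the text.
SAMPLES = [
    ("cpu_usage", 2, None),
    ("cpu_usage", 2, {"app": "myapp"}),
]


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current samples whenever the poller requests /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_all(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("localhost", 8000), MetricsHandler).serve_forever()
```

      The service never contacts the monitoring system itself; it only answers when polled, which is why adding services does not require reconfiguring each of them with a monitoring endpoint.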

      Prometheus is a mature product: development began in 2012, and in 2016 it joined the CNCF (Cloud Native Computing Foundation). Prometheus consists of:

      * TSDB (Time Series Database), which looks more like a storage queue for metrics with a specified retention period, for example a week, and can process hundreds of thousands of metrics per second. This database is local to each Prometheus instance and does not support horizontal scaling; in Prometheus, scaling is achieved by running several instances and sharding across them. Prometheus supports data aggregation, which helps reduce the amount of accumulated data, as well as flushing the database from memory to disk.

      * Service Discovery, which supports Kubernetes out of the box through its public API by polling PODs, filtered according to the config, on TCP port 9121.

      * Grafana (a separate product, added by default) – a universal UI with dashboards and charts that supports Prometheus via PromQL.
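      The sharding mentioned for the TSDB can be pictured as splitting the list of scrape targets between several Prometheus instances by a stable hash. The sketch below only illustrates that idea under assumed names (shard_for, split_targets and the target addresses are hypothetical); Prometheus has its own hashmod relabeling action for this purpose, and the hash used here is not claimed to match it.

```python
import hashlib


def shard_for(target, shard_count):
    """Map a target address to a shard index using a stable hash."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Use the last 8 bytes of the digest as an integer, then take the modulus.
    return int.from_bytes(digest[-8:], "big") % shard_count


def split_targets(targets, shard_count):
    """Distribute targets so that each one belongs to exactly one shard."""
    shards = [[] for _ in range(shard_count)]
    for target in targets:
        shards[shard_for(target, shard_count)].append(target)
    return shards


if __name__ == "__main__":
    # Hypothetical targets; each Prometheus instance would scrape one sublist.
    targets = [f"app-{i}:8080" for i in range(10)]
    for index, shard in enumerate(split_targets(targets, 3)):
        print(index, shard)
```

      Because the hash depends only on the target address, every instance can compute the split independently and arrive at the same assignment.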

      To expose metrics, you can use ready-made solutions or develop your own. For the vast majority of system metrics there is an exporter, while for application metrics you often have to expose your own. Exporters can be general or specialized. For example, NodeExporter provides most of the metrics, including process metrics, but only two of those, and there are more specialized exporters for the rest. If you run Prometheus without exporters, it will expose almost a thousand metrics, but these are metrics of Prometheus itself, and none of them will have the node_* prefix. For such metrics to appear, you need to run NodeExporter and add its URL to the Prometheus configuration so that the metrics it provides are collected. For NodeExporter, this can be localhost or the node address with port 9100 (the node_exporter default). Usually, exporters specialize in product-specific metrics, for example:

      ** node_exporter – node metrics (CPU, Memory, Network);

      ** snmp_exporter – SNMP protocol metrics;

      ** mysqld_exporter – MySQL database metrics;

      ** consul_exporter – Consul database metrics;

      ** graphite_exporter – Graphite database metrics;

      ** memcached_exporter – Memcached database metrics;

      ** haproxy_exporter – HAProxy balancer metrics;

      ** cAdvisor – container metrics;

      ** process-exporter – detailed process metrics;

      ** metrics-server – CPU, Memory, File-descriptors, Disks;

      ** cAdvisor – Docker daemon metrics, container monitoring;

      ** kube-state-metrics – deployments, PODs, nodes.
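      To check which prefixes an endpoint actually exposes, for example whether node_* metrics have appeared, the text returned by an exporter can be parsed and filtered. This is a simplified sketch: the function names are hypothetical, the regular expressions cover only the common "name{labels} value" form, and the exporter URL is left to the caller.

```python
import re
import urllib.request

# name, optional {labels}, then the value; comments and blank lines are skipped.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # ignore lines this simplified parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def with_prefix(samples, prefix):
    """Keep only the samples whose metric name starts with the prefix."""
    return [s for s in samples if s[0].startswith(prefix)]


def fetch_metrics(url, timeout=5.0):
    """Download and parse the metrics page of an exporter at the given URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

      For instance, with_prefix(fetch_metrics(url), "node_") returns an empty list when the URL points at Prometheus itself, and a non-empty one when it points at a running NodeExporter.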

      Prometheus supports remote data writing (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_write), for example to Weaveworks Cortex, a distributed TSDB storage for Prometheus. It is enabled by a setting in the configuration and makes it possible to analyze data from multiple Prometheus instances:

      remote_write:
        - url: "http://localhost:9000/receive"

      Let's look at how it works on a ready-made instance. I'll take www.katacoda.com/courses/istio/deploy-istio-on-kubernetes for this and go through it. Our Prometheus is located on its standard port 9090:

      controlplane $ kubectl -n istio-system get svc prometheus
      NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
      prometheus   ClusterIP   10.99.70.170   <none>        9090/TCP   6m59s
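      Besides the web UI, a Prometheus server on this port also answers PromQL queries over its HTTP API at /api/v1/query. The following is a minimal sketch using only the standard library; the base URL http://localhost:9090 is an assumption (for example, after forwarding the service port to your machine), and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def extract_samples(payload):
    """Turn an instant-vector response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result item carries its labels and a [timestamp, "value"] pair.
    return [
        (item["metric"], float(item["value"][1]))
        for item in payload["data"]["result"]
    ]


def instant_query(base_url, promql, timeout=5.0):
    """Run a PromQL instant query and return the parsed samples."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_samples(json.load(response))


if __name__ == "__main__":
    # Assumed address; "up" is a metric Prometheus records for every target.
    for labels, value in instant_query("http://localhost:9090", "up"):
        print(labels, value)
```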

      To open its UI, I'll go to the WEB tab and change the port in the address from 80 to 9090: https://2886795314-9090-ollie08.environments.katacoda.com/graph. In the input line, enter the desired metric in PromQL (Prometheus Query Language), the counterpart of InfluxQL for InfluxDB and SQL for TimescaleDB. For example, I will enter "cpu", and it will show a list of metrics containing it. There are two tabs under the input line: one with a graph and one with a tabular view. I will be looking at the tabular view. I selected machine_cpu_cores and clicked Execute. Common metrics usually have similar names, for example machine_cpu_cores and node_cpu_cores. A metric consists of a name, labels in braces and a value; it is queried in the same form and displayed in the table in the same form.

      machine_cpu_cores{beta_kubernetes_io_arch="amd64", beta_kubernetes_io_os="linux", instance="controlplane", job="kubernetes-cadvisor", kubernetes_io_arch="amd64", kubernetes_io_hostname="controlplane", kubernetes_io_hostname="controlplane"

      machine_cpu_cores {beta_kubernetes_io_arch