Kubernetes Monitoring – 5 Key Metrics

Written by Teknita Team

September 26, 2022


Kubernetes is rapidly becoming one of the most important infrastructure platforms in modern IT environments. Also known as K8s, it is an open-source system for automating the deployment, scaling, and management of containerized applications.

How Kubernetes works

  1. When developers create a multi-container application, they plan out how all the parts fit and work together, how many of each component should run, and roughly what should happen when challenges (e.g., lots of users logging in at once) are encountered.
  2. They store their containerized application components in a container registry (local or remote) and capture this thinking in one or several text files comprising a configuration. To start the application, they “apply” the configuration to Kubernetes.
  3. Kubernetes’ job is to evaluate and implement this configuration and maintain it until told otherwise. It:
    • Analyzes the configuration, aligning its requirements with those of all the other application configurations running on the system
    • Finds resources appropriate for running the new containers (e.g., some containers might need resources like GPUs that aren’t present on every host)
    • Grabs container images from the registry, starts up the new containers, and helps them connect to one another and to system resources (e.g., persistent storage), so the application works as a whole
  4. Then Kubernetes monitors everything, and when real events diverge from desired states, Kubernetes tries to fix things and adapt. For example, if a container crashes, Kubernetes restarts it. If an underlying server fails, Kubernetes finds resources elsewhere to run the containers that node was hosting. If traffic to an application suddenly spikes, Kubernetes can scale out containers to handle the additional load, in conformance to rules and limits stated in the configuration.
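The desired-state behavior described in step 4 can be illustrated with a toy reconciliation loop. This is a minimal sketch, not how Kubernetes is implemented: `ToyCluster`, `reconcile`, and the `web-N` container names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """In-memory stand-in for a cluster; not a real Kubernetes client."""
    desired_replicas: int
    running: list = field(default_factory=list)
    next_id: int = 0

    def start_container(self) -> None:
        self.running.append(f"web-{self.next_id}")
        self.next_id += 1

    def stop_container(self) -> None:
        self.running.pop()


def reconcile(cluster: ToyCluster) -> int:
    """Bring the observed state in line with the desired state.

    Returns the number of containers started (positive) or stopped (negative).
    """
    diff = cluster.desired_replicas - len(cluster.running)
    for _ in range(diff):       # too few containers: start the missing ones
        cluster.start_container()
    for _ in range(-diff):      # too many containers: stop the surplus
        cluster.stop_container()
    return diff


cluster = ToyCluster(desired_replicas=3)
reconcile(cluster)                # initial rollout: starts three containers
cluster.running.remove("web-1")   # simulate a container crash
started = reconcile(cluster)      # the loop notices the gap and fills it
print(started, cluster.running)
```

The point of the sketch is that the loop never reacts to a specific event; it only compares what is running with what should be running and closes the difference.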

Here are five categories of metrics that help you manage your Kubernetes environments.

Kubernetes Cluster Metrics

Monitoring a Kubernetes cluster as a whole helps you understand which components affect its health. For example, you can learn how many resources the cluster uses in total and how many applications run on each node within the cluster. You can also learn whether your nodes are working well and at what capacity.

Here are several useful metrics to monitor:

  • Node resource utilization—metrics such as network bandwidth, memory and CPU utilization, and disk utilization. You can use these metrics to find out if you should decrease or increase the number and size of cluster nodes.
  • The number of nodes—this metric can help you learn what resources are being billed by the cloud provider and discover how the cluster is used.
  • Running pods—by tracking the number of running pods, you can understand if the available nodes are sufficient to handle current workloads if a node fails.
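As a rough illustration of the capacity questions above, the following sketch checks whether the CPU requested by all pods would still fit if the largest node failed. The node names, core counts, and pod requests are made-up values, and the helper functions are hypothetical; a real check would also consider memory and scheduling constraints.

```python
def utilization(used: float, capacity: float) -> float:
    """Return used/capacity as a fraction (above 1.0 means overcommitted)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def survives_node_loss(node_capacity: dict, pod_requests: list) -> bool:
    """Check whether total pod CPU requests still fit if the largest node fails.

    This compares totals only; real scheduling also depends on memory,
    affinity rules, and whether each pod fits on an individual node.
    """
    remaining = sum(node_capacity.values()) - max(node_capacity.values())
    return sum(pod_requests) <= remaining


# Hypothetical cluster: allocatable CPU cores per node, CPU requests per pod.
nodes = {"node-a": 4.0, "node-b": 4.0, "node-c": 8.0}
pods = [0.5, 1.0, 2.0, 2.0, 1.5]

requested = utilization(sum(pods), sum(nodes.values()))
print(f"cluster CPU requested: {requested:.0%}")
print("fits after losing largest node:", survives_node_loss(nodes, pods))
```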

Kubernetes Pod Metrics

The process of monitoring a Kubernetes pod can be divided into three components:

  • Kubernetes metrics – these allow you to monitor how an individual pod is being handled and deployed by the orchestrator. You can monitor information such as the number of running instances (replicas) of a pod at a given moment compared to the expected number (a lower number may indicate the cluster has run out of resources). You can also see in-progress deployments (the number of instances being switched to a newer version), check the health of your pods, and view network data.
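  • Pod container metrics – these come mostly from cAdvisor, which is built into the kubelet on each node and reports on the containers running there. Older clusters aggregated this data through Heapster, which has since been retired; Metrics Server is its usual replacement for basic CPU and memory figures. Important metrics include network, CPU, and memory usage, which can be compared with the maximum usage permitted.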
  • Application-specific metrics—these are developed by the actual application itself and relate to specific business rules. A database application, for example, will likely expose metrics on the state of an index, as well as relational statistics, while an eCommerce application might expose the data on the number of customers online and the revenue generated in a given timeframe. The application directly exposes these types of metrics, and you can link the app to a monitoring tool to track them more closely.
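The comparison between expected and actual instances can be sketched as follows. `DeploymentStatus` is a hypothetical container for replica counts, and the deployment names and numbers are invented for the example; in practice these counts would come from your monitoring tool or the Kubernetes API.

```python
from dataclasses import dataclass


@dataclass
class DeploymentStatus:
    """Replica counts for one deployment (values below are made up)."""
    name: str
    desired: int
    available: int
    updated: int


def describe(status: DeploymentStatus) -> str:
    """Classify a deployment as degraded, rolling out, or healthy."""
    if status.available < status.desired:
        return f"{status.name}: degraded ({status.available}/{status.desired} available)"
    if status.updated < status.desired:
        return f"{status.name}: rolling out ({status.updated}/{status.desired} updated)"
    return f"{status.name}: healthy"


statuses = [
    DeploymentStatus("checkout", desired=4, available=4, updated=4),
    DeploymentStatus("search", desired=6, available=3, updated=6),
    DeploymentStatus("cart", desired=5, available=5, updated=2),
]
for s in statuses:
    print(describe(s))
```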

State Metrics

kube-state-metrics is a Kubernetes add-on service that listens to the Kubernetes API server and generates metrics about the state of cluster objects, including pods, nodes, namespaces, and DaemonSets. It exposes these metrics over HTTP in Prometheus text format rather than through the Kubernetes Metrics API.

Here are several aspects you can monitor using state metrics:

  • Persistent Volumes (PVs) – a PV is a storage resource specified on the cluster and made available as persistent storage for any pod that requests it. A pod requests storage through a PersistentVolumeClaim (PVC), and the PV is bound to that claim for as long as the claim exists. When the claim is deleted, the PV is reclaimed according to its reclaim policy. Monitoring PVs can help you learn when reclamation processes fail, which signifies that something is not working properly with your persistent storage.
  • Disk pressure—occurs when a node uses too much disk space or when a node uses disk space too quickly. Disk pressure is defined according to a configurable threshold. Monitoring this metric can help you learn if the application truly requires additional disk space or if it prematurely fills up the disk in an unanticipated manner.
  • Crash loop—can happen when a pod starts, crashes, and then gets stuck in a loop of continuously trying to restart without success. When a crash loop occurs, the application cannot run. It may be caused by an application crashing within the pod, a pod misconfiguration, or a deployment issue. Since there are many possibilities, debugging a crash loop can be a tricky effort. However, you do need to learn of the crash immediately in order to quickly mitigate or implement emergency measures that can keep the application available.
  • Jobs – components designed to run pods temporarily. A job runs one or more pods until they complete their task, after which the pods stop. Sometimes, though, jobs do not complete successfully. This may happen because a node is rebooted or crashes, or because resources are exhausted. Monitoring job failures can help you learn when work your application depends on has not finished.
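Several of the conditions above can be spotted by scanning state metrics in Prometheus text format. The sketch below is an assumption-laden example: the metric and label names follow kube-state-metrics naming as commonly documented (verify them against your installed version), while the sample lines, pod names, and the `parse` and `find_alerts` helpers are made up for illustration.

```python
import re

# Sample lines in Prometheus text format. Metric names follow
# kube-state-metrics naming; the pods, nodes, and values are invented.
SAMPLE = '''
kube_pod_container_status_waiting_reason{namespace="shop",pod="cart-7d9f",container="cart",reason="CrashLoopBackOff"} 1
kube_pod_container_status_waiting_reason{namespace="shop",pod="web-5c2a",container="web",reason="CrashLoopBackOff"} 0
kube_node_status_condition{node="node-a",condition="DiskPressure",status="true"} 1
kube_node_status_condition{node="node-b",condition="DiskPressure",status="true"} 0
kube_job_status_failed{namespace="batch",job_name="nightly-report"} 2
'''

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse(text):
    """Yield (metric_name, labels_dict, value) for each labelled sample line."""
    for line in text.strip().splitlines():
        match = LINE.match(line.strip())
        if match:
            labels = dict(LABEL.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


def find_alerts(text):
    """Return human-readable alerts for crash loops, disk pressure, and failed jobs."""
    alerts = []
    for name, labels, value in parse(text):
        if value <= 0:
            continue  # a value of 0 means the condition is not active
        if (name == "kube_pod_container_status_waiting_reason"
                and labels.get("reason") == "CrashLoopBackOff"):
            alerts.append(f"crash loop: {labels['namespace']}/{labels['pod']}")
        elif (name == "kube_node_status_condition"
                and labels.get("condition") == "DiskPressure"
                and labels.get("status") == "true"):
            alerts.append(f"disk pressure: {labels['node']}")
        elif name == "kube_job_status_failed":
            alerts.append(f"failed job: {labels['namespace']}/{labels['job_name']}")
    return alerts


print(find_alerts(SAMPLE))
```

In a real setup, a monitoring system such as Prometheus would scrape these metrics and evaluate alert rules; the hand-written parser here only shows what the underlying data looks like.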

Container Metrics

You should monitor container metrics to ensure containers are properly utilizing resources. These metrics can help you understand if you are reaching a predefined resource limit and detect pods that are stuck in a CrashLoopBackOff state.

Here are several container metrics that you should monitor:

  • Container CPU usage – learn how much CPU your containers are using in relation to the resource limits you have defined for them.
  • Container memory utilization – discover how much memory your containers are using in relation to the resource limits you have defined for them.
  • Network usage—detect sent and received data packets as well as how much bandwidth is being used.
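To relate usage to limits, CPU usage is typically derived from a cumulative counter of CPU seconds (cAdvisor reports it this way), so two readings are needed. The following sketch assumes hypothetical readings and limits; the function names are illustrative only.

```python
def cpu_cores_used(counter_start: float, counter_end: float,
                   interval_seconds: float) -> float:
    """Average CPU cores used, from two readings of a cumulative CPU-seconds counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (counter_end - counter_start) / interval_seconds


def percent_of_limit(usage: float, limit: float) -> float:
    """Express usage as a percentage of the configured limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * usage / limit


# Hypothetical readings for one container, taken 60 seconds apart.
cores = cpu_cores_used(counter_start=1200.0, counter_end=1227.0, interval_seconds=60.0)
cpu_pct = percent_of_limit(cores, limit=0.5)           # limit of 500 millicores
mem_pct = percent_of_limit(384 * 2**20, 512 * 2**20)   # 384 MiB used, 512 MiB limit

print(f"CPU: {cores:.2f} cores, {cpu_pct:.0f}% of limit")
print(f"Memory: {mem_pct:.0f}% of limit")
```

A container that stays close to 100% of its CPU limit is likely being throttled, and one that reaches its memory limit is likely to be killed, so both percentages are worth alerting on before they reach the ceiling.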

Application Metrics

These metrics can help you measure the availability and performance of the applications running in pods. The business scope of the application determines the type of metrics provided. Here are several important metrics:

  • Application availability—can help you measure the uptime and response times of the application. This metric can help you assess optimal user experience and performance.
  • Application health and performance—can help you learn about performance issues, latency, responsiveness, and other user experience issues. This metric can surface errors that should be fixed within the application layer.
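Availability and response time can be computed from simple health-check results. This is a minimal sketch using invented sample data; `availability` and `percentile` are hypothetical helpers, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math


def availability(successes: int, total: int) -> float:
    """Percentage of health checks that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * successes / total


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Invented sample data: 1000 health checks and 10 response times.
checks = [True] * 997 + [False] * 3
latencies_ms = [120, 95, 110, 480, 105, 99, 130, 101, 98, 102]

print(f"availability: {availability(sum(checks), len(checks)):.1f}%")
print(f"p50 latency: {percentile(latencies_ms, 50)} ms")
print(f"p90 latency: {percentile(latencies_ms, 90)} ms")
```

Percentiles are used here instead of the average because a single slow request (480 ms in the sample) distorts the mean while leaving the median almost unchanged.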

You can read more about Kubernetes Monitoring here.

Teknita has the expert resources to support all your technology initiatives.
We are always happy to hear from you.

Click here to connect with our experts!
