Apache SkyWalking – EBPF

Blog: SkyWalking 10 Release: Service Hierarchy, Kubernetes Network Monitoring by eBPF, BanyanDB, and More

Mon, 13 May 2024 00:00:00 +0000

The Apache SkyWalking team today announced the 10 release. SkyWalking 10 provides a host of groundbreaking features and enhancements. The introduction of Layer and Service Hierarchy streamlines monitoring by organizing services and metrics into distinct layers and providing seamless navigation across them. Leveraging eBPF technology, Kubernetes Network Monitoring delivers granular insights into network traffic, topology, and TCP/HTTP metrics. BanyanDB emerges as a high-performance native storage solution, while expanded monitoring support encompasses Apache RocketMQ, ClickHouse, and Apache ActiveMQ Classic. Support for Multiple Labels Names enhances flexibility in metrics analysis, while enhanced exporting and querying capabilities streamline data dissemination and processing.

This release blog briefly introduces these new features and enhancements as well as some other notable changes.

Layer and Service Hierarchy

Layer concept was introduced in SkyWalking 9.0.0, it represents an abstract framework in computer science, such as Operating System(OS_LINUX layer), Kubernetes(k8s layer). It organizes services and metrics into different layers based on their roles and responsibilities in the system. SkyWalking provides a suite of monitoring and diagnostic tools for each layer, but there is a gap between the layers, which can not easily bridge the data across different layers.

In SkyWalking 10, SkyWalking provides new abilities to jump/connect across different layers and provide a seamless monitoring experience for users.

Layer Jump

In the topology graph, users can click on a service node to jump to the dashboard of the service in another layer. The following figures show the jump from the GENERAL layer service topology to the VIRTUAL_DATABASE service layer dashboard by clicking the topology node. Figure 1: Layer Jump

Figure 2: Layer jump Dashboard

Service Hierarchy

SkyWalking 10 introduces a new concept called Service Hierarchy, which defines the relationships of existing logically same services in various layers. OAP will detect the services from different layers, and try to build the connections. Users can click the Hierarchy Services in any layer’s service topology node or service dashboard to get the Hierarchy Topology. In this topology graph, users can see the relationships between the services in different layers and the summary of the metrics and also can jump to the service dashboard in the layer. When a service occurs performance issue, users can easily analyze the metrics from different layers and track down the root cause:

The examples of the Service Hierarchy relationships:

The application song deployed in the Kubernetes cluster with SkyWalking agent and Service Mesh at the same time. So the application song across the GENERAL, MESH, MESH_DP and K8S_SERVICE layers which could be monitored by SkyWalking, the Service Hierarchy topology as below: Figure 3: Service Hierarchy Agent With K8s Service And Mesh With K8s Service.

And can also have the Service Instance Hierarchy topology to get the single instance status across the layers as below: Figure 4: Instance Hierarchy Agent With K8s Service(Pod)
The PostgreSQL database psql deployed in the Kubernetes cluster and used by the application song. So the database psql across the VIRTUAL_DATABASE, POSTGRESQL and K8S_SERVICE layers which could be monitored by SkyWalking, the Service Hierarchy topology as below: Figure 5: Service Hierarchy Agent(Virtual Database) With Real Database And K8s Service

For more supported layers and how to detect the relationships between services in different layers please refer to the Service Hierarchy. how to configure the Service Hierarchy in SkyWalking, please refer to the Service Hierarchy Configuration section.

Monitoring Kubernetes Network Traffic by using eBPF

In the previous version, skyWalking provides Kubernetes (K8s) monitoring from kube-state-metrics and cAdvisor, which can monitor the Kubernetes cluster status and the metrics of the Kubernetes resources.

In SkyWalking 10, by leverage Apache SkyWalking Rover 0.6+, SkyWalking has the ability to monitor the Kubernetes network traffic by using eBPF, which can collect and map access logs from applications in Kubernetes environments. Through these data, SkyWalking can analyze and provide the Service Traffic, Topology, TCP/HTTP level metrics from the Kubernetes aspect.

The following figures show the Topology and TCP Dashboard of the Kubernetes network traffic:

Figure 6: Kubernetes Network Traffic Topology

Figure 7: Kubernetes Network Traffic TCP Dashboard

More details about how to monitor the Kubernetes network traffic by using eBPF, please refer to the Monitoring Kubernetes Network Traffic by using eBPF.

BanyanDB - Native APM Database

BanyanDB 0.6.0 and BanyanDB Java client 0.6.0 are released with SkyWalking 10, As a native storage solution for SkyWalking, BanyanDB is going to be SkyWalking’s next-generation storage solution. This is recommended to use for medium-scale deployments from 0.6 until 1.0.
It has shown high potential performance improvement. Less than 50% CPU usage and 50% memory usage with 40% disk volume compared to Elasticsearch in the same scale.

Apache RocketMQ Server Monitoring

Apache RocketMQ is an open-source distributed messaging and streaming platform, which is widely used in various scenarios including Internet, big data, mobile Internet, IoT, and other fields. SkyWalking provides a basic monitoring dashboard for RocketMQ, which includes the following metrics:

Cluster Metrics: including messages produced/consumed today, total producer/consumer TPS, producer/consumer message size, messages produced/consumed until yesterday, max consumer latency, max commitLog disk ratio, commitLog disk ratio, pull/send threadPool queue head wait time, topic count, and broker count.
Broker Metrics: including produce/consume TPS, producer/consumer message size.
Topic Metrics: including max producer/consumer message size, consumer latency, producer/consumer TPS, producer/consumer offset, producer/consumer message size, consumer group count, and broker count.

The following figure shows the RocketMQ Cluster Metrics dashboard: Figure 8: Apache RocketMQ Server Monitoring

For more metrics and details about the RocketMQ monitoring, please refer to the Apache RocketMQ Server Monitoring,

ClickHouse Server Monitoring

ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real-time, it is widely used for online analytical processing (OLAP). ClickHouse monitoring provides monitoring of the metrics 、events and asynchronous metrics of the ClickHouse server, which includes the following parts of metrics:

Server Metrics
Query Metrics
Network Metrics
Insert Metrics
Replica Metrics
MergeTree Metrics
ZooKeeper Metrics
Embedded ClickHouse Keeper Metrics

The following figure shows the ClickHouse Cluster Metrics dashboard: Figure 9: ClickHouse Server Monitoring

For more metrics and details about the ClickHouse monitoring, please refer to the ClickHouse Server Monitoring, and here is a blog that can help for a quick start Monitoring ClickHouse through SkyWalking.

Apache ActiveMQ Server Monitoring

Apache ActiveMQ Classic is a popular and powerful open-source messaging and integration pattern server. SkyWalking provides a basic monitoring dashboard for ActiveMQ, which includes the following metrics:

Cluster Metrics: including memory usage, rates of write/read, and average/max duration of write.
Broker Metrics: including node state, number of connections, number of producers/consumers, and rate of write/read under the broker. Depending on the cluster mode, one cluster may include one or more brokers.
Destination Metrics: including number of producers/consumers, messages in different states, queues, and enqueue duration in a queue/topic.

The following figure shows the ActiveMQ Cluster Metrics dashboard: Figure 10: Apache ActiveMQ Server Monitoring

For more metrics and details about the ActiveMQ monitoring, please refer to the Apache ActiveMQ Server Monitoring, and here is a blog that can help for a quick start Monitoring ActiveMQ through SkyWalking.

Support Multiple Labels Names

Before SkyWalking 10, SkyWalking does not store the labels names in the metrics data, which makes MQE have to use _ as the generic label name, it can’t query the metrics data with multiple labels names.

SkyWalking 10 supports storing the labels names in the metrics data, and MQE can query or calculate the metrics data with multiple labels names. For example: The k8s_cluster_deployment_status metric has labels namespace, deployment and status. If we want to query all deployment metric values with namespace=skywalking-showcase and status=true, we can use the following expression:

k8s_cluster_deployment_status{namespace='skywalking-showcase', status='true'}

related enhancement:

Since Alarm rule configuration had migrated to the MQE in SkyWalking 9.6.0, the alarm rule also supports multiple labels names.
PromeQL service supports multiple labels names query.

Metrics gRPC exporter

SkyWalking 10 enhanced the metrics gPRC exporter, it supports exporting all types of metrics data to the gRPC server.

SkyWalking Native UI Metrics Query Switch to V3 APIs

SkyWalking Native UI metrics query deprecate the V2 APIs, and all migrated to V3 APIs and MQE.

Other Notable Enhancements

Support Java 21 runtime and oap-java21 image for Java 21 runtime.
Remove CLI(swctl) from the image.
More MQE functions and operators supported.
Enhance the native UI and improve the user experience.
Several bugs and CVEs fixed.

Zh: SkyWalking 10 发布：服务层次结构、基于 eBPF 的 Kubernetes 网络监控、BanyanDB 等

Mon, 13 May 2024 00:00:00 +0000

Apache SkyWalking 团队今天宣布发布 SkyWalking 10。SkyWalking 10 提供了一系列突破性的功能和增强功能。Layer 和 Service Hierarchy 的引入通过将服务和指标组织成不同的层次，并提供跨层无缝导航，从而简化了监控。利用 eBPF 技术，Kubernetes 网络监控提供了对网络流量、拓扑和 TCP/HTTP 指标的详细洞察。BanyanDB 作为高性能的原生存储解决方案出现，同时扩展的监控支持包括 Apache RocketMQ、ClickHouse 和 Apache ActiveMQ Classic。对多标签名称的支持增强了指标分析的灵活性，而增强的导出和查询功能简化了数据分发和处理。

本文简要介绍了这些新功能和增强功能以及其他一些值得注意的变化。

Layer 和 Service Hierarchy

Layer 概念是在 SkyWalking 9.0.0 中引入的，它代表计算机科学中的一个抽象框架，例如操作系统（OS_LINUX layer）、Kubernetes（k8s layer）。它根据系统中服务和指标的角色和职责将其组织到不同的层次。SkyWalking 为每个层提供了一套监控和诊断工具，但层之间存在 gap，无法轻松跨层桥接数据。

在 SkyWalking 10 中，SkyWalking 提供了跨层跳转/连接的新功能，为用户提供无缝的监控体验。

Layer Jump

在拓扑图中，用户可以点击服务节点跳转到另一层服务的仪表板。下图显示了通过点击拓扑节点从 GENERAL 层服务拓扑跳转到 VIRTUAL_DATABASE 服务层仪表板的过程。

Service Hierarchy

SkyWalking 10 引入了一个新概念，称为 Service Hierarchy，它定义了各层中现有逻辑相同服务的关系。OAP 将检测不同层次的服务，并尝试建立连接。用户可以点击任何层的服务拓扑节点或服务仪表板中的 Hierarchy Services 获取 Hierarchy Topology。在此拓扑图中，用户可以看到不同层次服务之间的关系和指标摘要，并且可以跳转到该层的服务仪表板。当服务发生性能问题时，用户可以轻松分析不同层次的指标并找出根本原因：

以下是 Service Hierarchy 关系的示例：

应用程序 song 同时在 Kubernetes 集群中部署了 SkyWalking agent 和 Service Mesh。因此，应用程序 song 跨越了 GENERAL、MESH、MESH_DP 和 K8S_SERVICE 层，SkyWalking 可以监控这些层次，Service Hierarchy 拓扑如下：

还可以有 Service Instance Hierarchy 拓扑来获取跨层的单实例状态，如下所示：

在 Kubernetes 集群中部署并由应用程序 song 使用的 PostgreSQL 数据库 psql。因此，数据库 psql 跨越 VIRTUAL_DATABASE、POSTGRESQL 和 K8S_SERVICE 层，SkyWalking 可以监控这些层次，Service Hierarchy 拓扑如下：

有关更多支持的层次以及如何检测不同层次服务之间的关系，请参阅 Service Hierarchy。有关如何在 SkyWalking 中配置 Service Hierarchy，请参阅 Service Hierarchy Configuration 部分。

使用 eBPF 监控 Kubernetes 网络流量

在之前的版本中，SkyWalking 提供了来自 kube-state-metrics 和 cAdvisor 的 Kubernetes (K8s) 监控，它可以监控 Kubernetes 集群状态和 Kubernetes 资源的指标。

在 SkyWalking 10 中，通过利用 Apache SkyWalking Rover 0.6+，SkyWalking 具有使用 eBPF 监控 Kubernetes 网络流量的能力，可以收集和映射 Kubernetes 环境中应用程序的访问日志。通过这些数据，SkyWalking 可以从 Kubernetes 角度分析和提供服务流量、拓扑、TCP/HTTP 级别指标。

下图显示了 Kubernetes 网络流量的拓扑和 TCP 仪表板：

有关如何使用 eBPF 监控 Kubernetes 网络流量的更多详细信息，请参阅使用 eBPF 监控 Kubernetes 网络流量。

BanyanDB - 原生 APM 数据库

BanyanDB 0.6.0 和 BanyanDB Java 客户端 0.6.0 随 SkyWalking 10 一起发布。作为 SkyWalking 的原生存储解决方案，BanyanDB 将成为 SkyWalking 的下一代存储解决方案。推荐在 0.6 到 1.0 期间用于中等规模的部署。它展示了高性能改进的潜力。与 Elasticsearch 在同一规模下相比，CPU 使用率降低 50%，内存使用率降低 50%，磁盘使用量减少 40%。

Apache RocketMQ 服务器监控

Apache RocketMQ 是一个开源的分布式消息和流平台，广泛应用于互联网、大数据、移动互联网、物联网等领域。SkyWalking 为 RocketMQ 提供了一个基本的监控仪表板，包括以下指标：

Cluster Metrics：包括当天产生/消费的消息数、总生产者/消费者 TPS、生产者/消费者消息大小、截至昨天产生/消费的消息数、最大消费者延迟、最大 commitLog 磁盘比、commitLog 磁盘比、拉取/发送线程池队列头等待时间、topic count 和 broker count。
Broker Metrics：包括生产/消费 TPS、生产者/消费者消息大小。
Topic Metrics：包括最大生产者/消费者消息大小、消费者延迟、生产/消费 TPS、生产/消费偏移、生产/消费消息大小、消费者组数和代理数。

下图显示了 RocketMQ Cluster Metrics 仪表板：

有关 RocketMQ 监控的更多指标和详细信息，请参阅 Apache RocketMQ 服务器监控。

ClickHouse Server 监控

ClickHouse 是一个开源的列式数据库管理系统，可以实时生成分析数据报告，广泛用于在线分析处理 (OLAP)。ClickHouse 监控提供了 ClickHouse 服务器的指标、事件和异步指标的监控，包括以下部分的指标：

Server Metrics
Query Metrics
Network Metrics
Insert Metrics
Replica Metrics
MergeTree Metrics
ZooKeeper Metrics
Embedded ClickHouse Keeper Metrics

下图显示了 ClickHouse Cluster Metrics 仪表板：

有关 ClickHouse 监控的更多指标和详细信息，请参阅 ClickHouse 服务器监控，以及一篇可以帮助快速入门的博客通过 SkyWalking 监控 ClickHouse。

Apache ActiveMQ 服务器监控

Apache ActiveMQ Classic 是一个流行且强大的开源消息和集成模式服务器。SkyWalking 为 ActiveMQ 提供了一个基本的监控仪表板，包括以下指标：

Cluster Metrics：包括内存使用率、写入/读取速率和平均/最大写入持续时间。
Broker Metrics：包括节点状态、连接数、生产者/消费者数和代理下的写入/读取速率。根据集群模式，一个集群可以包含一个或多个代理。
Destination Metrics：包括生产者/消费者数、不同状态的消息、队列和队列/主题中的入队持续时间。

下图显示了 ActiveMQ Cluster Metrics 仪表板：

有关 ActiveMQ 监控的更多指标和详细信息，请参阅 Apache ActiveMQ 服务器监控，以及一篇可以帮助快速入门的博客通过 SkyWalking 监控 ActiveMQ。

支持多标签名称

在 SkyWalking 10 之前，SkyWalking 不会在指标数据中存储标签名称，这使得 MQE 必须使用 _ 作为通用标签名称，无法使用多个标签名称查询指标数据。

SkyWalking 10 支持在指标数据中存储标签名称，MQE 可以使用多个标签名称查询或计算指标数据。例如：k8s_cluster_deployment_status 指标具有 namespace、deployment 和 status 标签。如果我们想查询所有 namespace=skywalking-showcase 和 status=true 的部署指标值，可以使用以下表达式：

k8s_cluster_deployment_status{namespace='skywalking-showcase', status='true'}

Metrics gRPC 导出器

SkyWalking 10 增强了 metrics gPRC exporter，支持将所有类型的指标数据导出到 gRPC 服务器。

SkyWalking 原生 UI 指标查询切换到 V3 API

SkyWalking 原生 UI 指标查询弃用 V2 API，全部迁移到 V3 API 和 MQE。

其他值得注意的增强功能

支持 Java 21 运行时和 oap-java21 镜像用于 Java 21 运行时。
从镜像中删除 CLI（swctl）。
支持更多的 MQE 函数和操作。
增强原生 UI 并改善用户体验。
修复了一些 Bug 和 CVE。

Blog: Activating Automatical Performance Analysis -- Continuous Profiling

Sun, 25 Jun 2023 00:00:00 +0000

Background

In previous articles, We have discussed how to use SkyWalking and eBPF for performance problem detection within processes and networks. They are good methods to locate issues, but still there are some challenges:

The timing of the task initiation: It’s always challenging to address the processes that require performance monitoring when problems occur. Typically, manual engagement is required to identify processes and the types of performance analysis necessary, which cause extra time during the crash recovery. The root cause locating and the time of crash recovery conflict with each other from time to time. In the real case, rebooting would be the first choice of recovery, meanwhile, it destroys the site of crashing.
Resource consumption of tasks: The difficulties to determine the profiling scope. Wider profiling causes more resources than it should. We need a method to manage resource consumption and understand which processes necessitate performance analysis.
Engineer capabilities: On-call is usually covered by the whole team, which have junior and senior engineers, even senior engineers have their understanding limitation of the complex distributed system, it is nearly impossible to understand the whole system by a single one person.

The Continuous Profiling is a new created mechanism to resolve the above issues.

Automate Profiling

As profiling is resource costing and high experience required, how about introducing a method to narrow the scope and automate the profiling driven by polices creates by senior SRE engineer? So, in 9.5.0, SkyWalking first introduced preset policy rules for specific services to be monitored by the eBPF Agent in a low-energy manner, and run profiling when necessary automatically.

Policy

Policy rules specify how to monitor target processes and determine the type of profiling task to initiate when certain threshold conditions are met.

These policy rules primarily consist of the following configuration information:

Monitoring type: This specifies what kind of monitoring should be implemented on the target process.
Threshold determination: This defines how to determine whether the target process requires the initiation of a profiling task.
Trigger task: This specifies what kind of performance analysis task should be initiated.

Monitoring type

The type of monitoring is determined by observing the data values of a specified process to generate corresponding metrics. These metric values can then facilitate subsequent threshold judgment operations. In eBPF observation, we believe the following metrics can most directly reflect the current performance of the program:

Monitor Type	Unit	Description
System Load	Load	System load average over a specified period.
Process CPU	Percentage	The CPU usage of the process as a percentage.
Process Thread Count	Count	The number of threads in the process.
HTTP Error Rate	Percentage	The percentage of HTTP requests that result in error responses (e.g., 4xx or 5xx status codes).
HTTP Avg Response Time	Millisecond	The average response time for HTTP requests.

Monitoring network type metrics is not as simple as obtaining basic process information. It requires the initiation of eBPF programs and attaching them to the target process for observation. This is similar to the principles of network profiling task we introduced in the previous article, except that we no longer collect the full content of the data packets. Instead, we only collect the content of messages that match specified HTTP prefixes.

By using this method, we can significantly reduce the number of times the kernel sends data to the user space, and the user-space program can parse the data content with less system resource usage. This ultimately helps in conserving system resources.

Metrics collector

The eBPF agent would report metrics of processes periodically as follows to indicate the process performance in time.

Name	Unit	Description
process_cpu	(0-100)%	The CPU usage percent
process_thread_count	count	The thread count of process
system_load	count	The average system load for the last minute, each process have same value
http_error_rate	(0-100)%	The network request error rate percentage
http_avg_response_time	ms	The network average response duration

Threshold determination

For the threshold determination, the judgement is made by the eBPF Agent based on the target monitoring process in its own memory, rather than relying on calculations performed by the SkyWalking backend. The advantage of this approach is that it doesn’t have to wait for the results of complex backend computations, and it reduces potential issues brought about by complicated interactions.

By using this method, the eBPF Agent can swiftly initiate tasks immediately after conditions are met, without any delay.

It includes the following configuration items:

Threshold: Check if the monitoring value meets the specified expectations.
Period: The time period(seconds) for monitoring data, which can also be understood as the most recent duration.
Count: The number of times(seconds) the threshold is triggered within the detection period, which can also be understood as the total number of times the specified threshold rule is triggered in the most recent duration(seconds). Once the count check is met, the specified Profiling task will be started.

Trigger task

When the eBPF Agent detects that the threshold determination in the specified policy meets the rules, it can initiate the corresponding task according to pre-configured rules. For each different target performance task, their task initiation parameters are different:

On/Off CPU Profiling: It automatically performs performance analysis on processes that meet the conditions, defaulting to 10 minutes of monitoring.
Network Profiling: It performs network performance analysis on all processes in the same Service Instance on the current machine, to prevent the cause of the issue from being unrealizable due to too few process being collected, defaulting to 10 minutes of monitoring.

Once the task is initiated, no new profiling tasks would be started for the current process for a certain period. The main reason for this is to prevent frequent task creation due to low threshold settings, which could affect program execution. The default time period is 20 minutes.

Data Flow

The figure 1 illustrates the data flow of the continuous profiling feature:

Figure 1: Data Flow of Continuous Profiling

eBPF Agent with Process

Firstly, we need to ensure that the eBPF Agent and the process to be monitored are deployed on the same host machine, so that we can collect relevant data from the process. When the eBPF Agent detects a threshold validation rule that conforms to the policy, it immediately triggers the profiling task for the target process, thereby reducing any intermediate steps and accelerating the ability to pinpoint performance issues.

Sliding window

The sliding window plays a crucial role in the eBPF Agent’s threshold determination process, as illustrated in the figure 2:

Figure 2: Sliding Window in eBPF Agent

Each element in the array represents the data value for a specified second in time. When the sliding window needs to verify whether it is responsible for a rule, it fetches the content of each element from a certain number of recent elements (period parameter). If an element exceeds the threshold, it is marked in red and counted. If the number of red elements exceeds a certain number, it is deemed to trigger a task.

Using a sliding window offers the following two advantages:

Fast retrieval of recent content: With a sliding window, complex calculations are unnecessary. You can know the data by simply reading a certain number of recent array elements.
Solving data spikes issues: Validation through count prevents situations where a data point suddenly spikes and then quickly returns to normal. Verification with multiple values can reveal whether exceeding the threshold is frequent or occasional.

eBPF Agent with SkyWalking Backend

The eBPF Agent communicates periodically with the SkyWalking backend, involving three most crucial operations:

Policy synchronization: Through periodic policy synchronization, the eBPF Agent can keep processes on the local machine updated with the latest policy rules as much as possible.
Metrics sending: For processes that are already being monitored, the eBPF Agent periodically sends the collected data to the backend program. This facilitates real-time query of current data values by users, who can also compare this data with historical values or thresholds when problems arise.
Profiling task reporting: When the eBPF detects that a certain process has triggered a policy rule, it automatically initiates a performance task, collects relevant information from the current process, and reports it to the SkyWalking backend. This allows users to know when, why, and what type of profiling task was triggered from the interface.

Demo

Next, let’s quickly demonstrate the continuous profiling feature, so you can understand more specifically what it accomplishes.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.5.0
export SW_UI_IMAGE=apache/skywalking-ui:9.5.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.5.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Create Continuous Profiling Policy

Currently, continues profiling feature is set by default in the Service Mesh panel at the Service level.

Figure 3: Continuous Policy Tab

By clicking on the edit button aside from the Policy List, the polices of current service could be created or updated.

Figure 4: Edit Continuous Profiling Policy

Multiple polices are supported. Every policy has the following configurations.

Target Type: Specifies the type of profiling task to be triggered when the threshold determination is met.
Items: For profiling task of the same target, one or more validation items can be specified. As long as one validation item meets the threshold determination, the corresponding performance analysis task will be launched.
1. Monitor Type: Specifies the type of monitoring to be carried out for the target process.
2. Threshold: Depending on the type of monitoring, you need to fill in the corresponding threshold to complete the verification work.
3. Period: Specifies the number of recent seconds of data you want to monitor.
4. Count: Determines the total number of seconds triggered within the recent period.
5. URI Regex/List: This is applicable to HTTP monitoring types, allowing URL filtering.

Done

After clicking the save button, you can see the currently created monitoring rules, as shown in the figure 5:

Figure 5: Continuous Profiling Monitoring Processes

The data can be divided into the following parts:

Policy list: On the left, you can see the rule list you have created.
Monitoring Summary List: Once a rule is selected, you can see which pods and processes would be monitored by this rule. It also summarizes how many profiling tasks have been triggered in the last 48 hours by the current pod or process, as well as the last trigger time. This list is also sorted in descending order by the number of triggers to facilitate your quick review.

When you click on a specific process, a new dashboard would show to list metrics and triggered profiling results.

Figure 6: Continuous Profiling Triggered Tasks

The current figure contains the following data contents:

Task Timeline: It lists all profiling tasks in the past 48 hours. And when the mouse hovers over a task, it would also display detailed information:
1. Task start and end time: It indicates when the current performance analysis task was triggered.
2. Trigger reason: It would display the reason why the current process was profiled and list out the value of the metric exceeding the threshold when the profiling was triggered. so you can quickly understand the reason.
Task Detail: Similar to the CPU Profiling and Network Profiling introduced in previous articles, this would display the flame graph or process topology map of the current task, depending on the profiling type.

Meanwhile, on the Metrics tab, metrics relative to profiling policies are collected to retrieve the historical trend, in order to provide a comprehensive explanation of the trigger point about the profiling.

Figure 7: Continuous Profiling Metrics

Conclusion

In this article, I have detailed how the continuous profiling feature in SkyWalking and eBPF works. In general, it involves deploying the eBPF Agent service on the same machine where the process to be monitored resides, and monitoring the target process with low resource consumption. When it meets the threshold conditions, it would initiate more complex CPU Profiling and Network Profiling tasks.

In the future, we will offer even more features. Stay tuned!

Twitter, ASFSkyWalking
Slack. Send Request to join SkyWalking slack mail to the mail list(dev@skywalking.apache.org), we will invite you in.
Subscribe to our medium list.

Zh: 自动化性能分析——持续剖析

Sun, 25 Jun 2023 00:00:00 +0000

背景

在之前的文章中，我们讨论了如何使用 SkyWalking 和 eBPF 来检测性能问题，包括进程和网络。这些方法可以很好地定位问题，但仍然存在一些挑战：

任务启动的时间: 当需要进行性能监控时，解决需要性能监控的进程始终是一个挑战。通常需要手动参与，以标识进程和所需的性能分析类型，这会在崩溃恢复期间耗费额外的时间。根本原因定位和崩溃恢复时间有时会发生冲突。在实际情况中，重新启动可能是恢复的第一选择，同时也会破坏崩溃的现场。
任务的资源消耗: 确定分析范围的困难。过宽的分析范围会导致需要更多的资源。我们需要一种方法来管理资源消耗并了解哪些进程需要性能分析。
工程师能力: 通常由整个团队负责呼叫，其中有初级和高级工程师，即使是高级工程师也对复杂的分布式系统有其理解限制，单个人几乎无法理解整个系统。

持续剖析（Continuous Profiling） 是解决上述问题的新机制。

自动剖析

由于性能分析的资源消耗和高经验要求，因此引入一种方法以缩小范围并由高级 SRE 工程师创建策略自动剖析。因此，在 9.5.0 中，SkyWalking 首先引入了预设策略规则，以低功耗方式监视特定服务的 eBPF 代理，并在必要时自动运行剖析。

策略

策略规则指定了如何监视目标进程并确定在满足某些阈值条件时应启动何种类型的分析任务。

这些策略规则主要包括以下配置信息：

监测类型: 这指定了应在目标进程上实施什么样的监测。
阈值确定: 这定义了如何确定目标进程是否需要启动分析任务。
触发任务: 这指定了应启动什么类型的性能分析任务。

监测类型

监测类型是通过观察指定进程的数据值来生成相应的指标来确定的。这些指标值可以促进后续的阈值判断操作。在 eBPF 观测中，我们认为以下指标最能直接反映程序的当前性能：

监测类型	单位	描述
系统负载	负载	在指定时间段内的系统负载平均值。
进程 CPU	百分比	进程的 CPU 使用率百分比。
进程线程计数	计数	进程中的线程数。
HTTP 错误率	百分比	导致错误响应（例如，4xx 或 5xx 状态代码）的 HTTP 请求的百分比。
HTTP 平均响应时间	毫秒	HTTP 请求的平均响应时间。

指标收集器

eBPF 代理会定期报告以下进程度量，以指示进程性能：

名称	单位	描述
process_cpu	(0-100)%	CPU 使用率百分比
process_thread_count	计数	进程中的线程数
system_load	计数	最近一分钟的平均系统负载，每个进程的值相同
http_error_rate	(0-100)%	网络请求错误率百分比
http_avg_response_time	毫秒	网络平均响应持续时间

阈值确定

对于阈值的确定，eBPF 代理是基于其自身内存中的目标监测进程进行判断，而不是依赖于 SkyWalking 后端执行的计算。这种方法的优点在于，它不必等待复杂后端计算的结果，减少了复杂交互所带来的潜在问题。

通过使用此方法，eBPF 代理可以在条件满足后立即启动任务，而无需任何延迟。

它包括以下配置项：

阈值: 检查监测值是否符合指定的期望值。
周期: 监控数据的时间周期（秒），也可以理解为最近的持续时间。
计数: 检测期间触发阈值的次数（秒），也可以理解为最近持续时间内指定阈值规则触发的总次数（秒）。一旦满足计数检查，指定的分析任务将被开始。

触发任务

当 eBPF Agent 检测到指定策略中的阈值决策符合规则时，根据预配置的规则可以启动相应的任务。对于每个不同的目标性能任务，它们的任务启动参数都不同：

On/Off CPU Profiling: 它会自动对符合条件的进程进行性能分析，缺省情况下监控时间为 10 分钟。
Network Profiling: 它会对当前机器上同一 Service Instance 中的所有进程进行网络性能分析，以防问题的原因因被收集进程太少而无法实现，缺省情况下监控时间为 10 分钟。

一旦任务启动，当前进程将在一定时间内不会启动新的剖析任务。主要原因是为了防止因低阈值设置而频繁创建任务，从而影响程序执行。缺省时间为 20 分钟。

数据流

图 1 展示了持续剖析功能的数据流：

图 1: 持续剖析的数据流

eBPF Agent进行进程跟踪

首先，我们需要确保 eBPF Agent 和要监测的进程部署在同一台主机上，以便我们可以从进程中收集相关数据。当 eBPF Agent 检测到符合策略的阈值验证规则时，它会立即为目标进程触发剖析任务，从而减少任何中间步骤并加速定位性能问题的能力。

滑动窗口

滑动窗口在 eBPF Agent 的阈值决策过程中发挥着至关重要的作用，如图 2 所示：

图 2: eBPF Agent 中的滑动窗口

数组中的每个元素表示指定时间内的数据值。当滑动窗口需要验证是否负责某个规则时，它从最近的一定数量的元素 (period 参数) 中获取每个元素的内容。如果一个元素超过了阈值，则标记为红色并计数。如果红色元素的数量超过一定数量，则被认为触发了任务。

使用滑动窗口具有以下两个优点：

快速检索最近的内容：使用滑动窗口，无需进行复杂的计算。你可以通过简单地读取一定数量的最近数组元素来了解数据。
解决数据峰值问题：通过计数进行验证，可以避免数据点突然增加然后快速返回正常的情况。使用多个值进行验证可以揭示超过阈值是频繁还是偶然发生的。

eBPF Agent与OAP后端通讯

eBPF Agent 定期与 SkyWalking 后端通信，涉及三个最关键的操作：

策略同步：通过定期的策略同步，eBPF Agent 可以尽可能地让本地机器上的进程与最新的策略规则保持同步。
指标发送：对于已经被监视的进程，eBPF Agent 定期将收集到的数据发送到后端程序。这就使用户能够实时查询当前数据值，用户也可以在出现问题时将此数据与历史值或阈值进行比较。
剖析任务报告：当 eBPF 检测到某个进程触发了策略规则时，它会自动启动性能任务，从当前进程收集相关信息，并将其报告给 SkyWalking 后端。这使用户可以从界面了解何时、为什么和触发了什么类型的剖析任务。

演示

接下来，让我们快速演示持续剖析功能，以便你更具体地了解它的功能。

部署 SkyWalking Showcase

SkyWalking Showcase 包含完整的示例服务，并可以使用 SkyWalking 进行监视。有关详细信息，请查看官方文档。

在此演示中，我们只部署服务、最新发布的 SkyWalking OAP 和 UI。

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.5.0
export SW_UI_IMAGE=apache/skywalking-ui:9.5.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.5.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

部署完成后，请运行以下脚本以打开 SkyWalking UI：http://localhost:8080/。

kubectl port-forward svc/ui 8080:8080 --namespace default

创建持续剖析策略

目前，持续剖析功能在 Service Mesh 面板的 Service 级别中默认设置。

图 3: 持续策略选项卡

通过点击 Policy List 旁边的编辑按钮，可以创建或更新当前服务的策略。

图 4: 编辑持续剖析策略

支持多个策略。每个策略都有以下配置。

Target Type：指定符合阈值决策时要触发的剖析任务的类型。
Items：对于相同目标的剖析任务，可以指定一个或多个验证项目。只要一个验证项目符合阈值决策，就会启动相应的性能分析任务。
1. Monitor Type：指定要为目标进程执行的监视类型。
2. Threshold：根据监视类型的不同，需要填写相应的阈值才能完成验证工作。
3. Period：指定你要监测的最近几秒钟的数据数量。
4. Count：确定最近时间段内触发的总秒数。
5. URI 正则表达式/列表：这适用于 HTTP 监控类型，允许 URL 过滤。

完成

单击保存按钮后，你可以看到当前已创建的监控规则，如图 5 所示：

图 5: 持续剖析监控进程

数据可以分为以下几个部分：

策略列表：在左侧，你可以看到已创建的规则列表。
监测摘要列表：选择规则后，你可以看到哪些 pod 和进程将受到该规则的监视。它还总结了当前 pod 或进程在过去 48 小时内触发的性能分析任务数量，以及最后一个触发时间。该列表还按触发次数降序排列，以便你快速查看。

当你单击特定进程时，将显示一个新的仪表板以列出指标和触发的剖析结果。

图 6: 持续剖析触发的任务

当前图包含以下数据内容：

任务时间轴：它列出了过去 48 小时的所有剖析任务。当鼠标悬停在任务上时，它还会显示详细信息：
1. 任务的开始和结束时间：它指示当前性能分析任务何时被触发。
2. 触发原因：它会显示为什么会对当前进程进行剖析，并列出当剖析被触发时超过阈值的度量值，以便你快速了解原因。
任务详情：与前几篇文章介绍的 CPU 剖析和网络剖析类似，它会显示当前任务的火焰图或进程拓扑图，具体取决于剖析类型。

同时，在 Metrics 选项卡中，收集与剖析策略相关的指标以检索历史趋势，以便在剖析的触发点提供全面的解释。

图 7: 持续剖析指标

结论

在本文中，我详细介绍了 SkyWalking 和 eBPF 中持续剖析功能的工作原理。通常情况下，它涉及将 eBPF Agent 服务部署在要监视的进程所在的同一台计算机上，并以低资源消耗监测目标进程。当它符合阈值条件时，它会启动更复杂的 CPU 剖析和网络剖析任务。

在未来，我们将提供更多功能。敬请期待！

Twitter：ASFSkyWalking
Slack：向邮件列表 (dev@skywalking.apache.org) 发送“Request to join SkyWalking Slack”，我们会邀请你加入。
订阅我们的 Medium 列表。

Blog: eBPF enhanced HTTP observability - L7 metrics and tracing

Thu, 12 Jan 2023 00:00:00 +0000

Background

Apache SkyWalking is an open-source Application Performance Management system that helps users collect and aggregate logs, traces, metrics, and events for display on a UI. In the previous article, we introduced how to use Apache SkyWalking Rover to analyze the network performance issue in the service mesh environment. However, in business scenarios, users often rely on mature layer 7 protocols, such as HTTP, for interactions between systems. In this article, we will discuss how to use eBPF techniques to analyze performance bottlenecks of layer 7 protocols and how to enhance the tracing system using network sampling.

This article will show how to use Apache SkyWalking with eBPF to enhance metrics and traces in HTTP observability.

HTTP Protocol Analysis

HTTP is one of the most common Layer 7 protocols and is usually used to provide services to external parties and for inter-system communication. In the following sections, we will show how to identify and analyze HTTP/1.x protocols.

Protocol Identification

In HTTP/1.x, the client and server communicate through a single file descriptor (FD) on each side. Figure 1 shows the process of communication involving the following steps:

Connect/accept: The client establishes a connection with the HTTP server, or the server accepts a connection from the client.
Read/write (multiple times): The client or server reads and writes HTTPS requests and responses. A single request-response pair occurs within the same connection on each side.
Close: The client and server close the connection.

To obtain HTTP content, it’s necessary to read it from the second step of this process. As defined in the RFC, the content is contained within the data of the Layer 4 protocol and can be obtained by parsing the data. The request and response pair can be correlated because they both occur within the same connection on each side.

Figure 1: HTTP communication timeline.

HTTP Pipeline

HTTP pipelining is a feature of HTTP/1.1 that enables multiple HTTP requests to be sent over a single TCP connection without waiting for the corresponding responses. This feature is important because it ensures that the order of the responses on the server side matches the order of the requests.

Figure 2 illustrates how this works. Consider the following scenario: an HTTP client sends multiple requests to a server, and the server responds by sending the HTTP responses in the same order as the requests. This means that the first request sent by the client will receive the first response from the server, the second request will receive the second response, and so on.

When designing HTTP parsing, we should follow this principle by adding request data to a list and removing the first item when parsing a response. This ensures that the responses are processed in the correct order.

Figure 2: HTTP/1.1 pipeline.

Metrics

Based on the identification of the HTTP content and process topology diagram mentioned in the previous article, we can combine these two to generate process-to-process metrics data.

Figure 3 shows the metrics that currently support the analysis between the two processes. Based on the HTTP request and response data, we can analyze the following data:

Metrics Name	Type	Unit	Description
Request CPM(Call Per Minute)	Counter	count	The HTTP request count
Response Status CPM(Call Per Minute)	Counter	count	The count of per HTTP response status code
Request Package Size	Counter/Histogram	Byte	The request package size
Response Package Size	Counter/Histogram	Byte	The response package size
Client Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the client side
Server Duration	Counter/Histogram	Millisecond	The duration of single HTTP response on the server side

Figure 3: Process-to-process metrics.

HTTP and Trace

During the HTTP process, if we unpack the HTTP requests and responses from raw data, we can use this data to correlate with the existing tracing system.

Trace Context Identification

In order to track the flow of requests between multiple services, the trace system usually creates a trace context when a request enters a service and passes it along to other services during the request-response process. For example, when an HTTP request is sent to another server, the trace context is included in the request header.

Figure 4 displays the raw content of an HTTP request intercepted by Wireshark. The trace context information generated by the Zipkin Tracing system can be identified by the “X-B3” prefix in the header. By using eBPF to intercept the trace context in the HTTP header, we can connect the current request with the trace system.

Figure 4: View of HTTP headers in Wireshark.

Trace Event

We have added the concept of an event to traces. An event can be attached to a span and consists of start and end times, tags, and summaries, allowing us to attach any desired information to the Trace.

When performing eBPF network profiling, two events can be generated based on the request-response data. Figure 5 illustrates what happens when a service performs an HTTP request with profiling. The trace system generates trace context information and sends it in the request. When the service executes in the kernel, we can generate an event for the corresponding trace span by interacting with the request-response data and execution time in the kernel space.

Previously, we could only observe the execution status in the user space. However, by combining traces and eBPF technologies, we can now also get more information about the current trace in the kernel space, which would impact less performance for the target service if we do similar things in the tracing SDK and agent.

Figure 5: Logical view of profiling an HTTP request and response.

Sampling

To ensure efficient data storage and minimize unnecessary data sampling, we use a sampling mechanism for traces in our system. This mechanism triggers sampling only when certain conditions are met. We also provide a list of the top N traces, which allows users to quickly access the relevant request information for a specific trace.

To help users easily identify and analyze relevant events, we offer three different sampling rules:

Slow Traces: Sampling is triggered when the response time for a request exceeds a specified threshold.
Response Status [400, 500): Sampling is triggered when the response status code is greater than or equal to 400 and less than 500.
Response Status [500, 600): Sampling is triggered when the response status code is greater than or equal to 500 and less than 600.

In addition, we recognize that not all request or response raw data may be necessary for analysis. For example, users may be more interested in requesting data when trying to identify performance issues, while they may be more interested in response data when troubleshooting errors. As such, we also provide configuration options for request or response events to allow users to specify which type of data they would like to sample.

Profiling in a Service Mesh

The SkyWalking and SkyWalking Rover projects have already implemented the HTTP protocol analyze and trace associations. How do they perform when running in a service mesh environment?

Deployment

Figure 6 demonstrates the deployment of SkyWalking and SkyWalking Rover in a service mesh environment. SkyWalking Rover is deployed as a DaemonSet on each machine where a service is located and communicates with the SkyWalking backend cluster. It automatically recognizes the services on the machine and reports metadata information to the SkyWalking backend cluster. When a new network profiling task arises, SkyWalking Rover senses the task and analyzes the designated processes, collecting and aggregating network data before ultimately reporting it back to the SkyWalking backend service.

Figure 6: SkyWalking rover deployment topology in a service mesh.

Tracing Systems

Starting from version 9.3.0, the SkyWalking backend fully supports all functions in the Zipkin server. Therefore, the SkyWalking backend can collect traces from both the SkyWalking and Zipkin protocols. Similarly, SkyWalking Rover can identify and analyze trace context in both the SkyWalking and Zipkin trace systems. In the following two sections, network analysis results will be displayed in the SkyWalking and Zipkin UI respectively.

SkyWalking

When SkyWalking performs network profiling, similar to the TCP metrics in the previous article, the SkyWalking UI will first display the topology between processes. When you open the dashboard of the line representing the traffic metrics between processes, you can see the metrics of HTTP traffic from the “HTTP/1.x” tab and the sampled HTTP requests with tracing in the “HTTP Requests” tab.

As shown in Figure 7, there are three lists in the tab, each corresponding to a condition in the event sampling rules. Each list displays the traces that meet the pre-specified conditions. When you click on an item in the trace list, you can view the complete trace.

Figure 7: Sampled HTTP requests within tracing context.

When you click on an item in the trace list, you can quickly view the specified trace. In Figure 8, we can see that in the current service-related span, there is a tag with a number indicating how many HTTP events are related to that trace span.

Since we are in a service mesh environment, each service involves interacting with Envoy. Therefore, the current span includes Envoy’s request and response information. Additionally, since the current service has both incoming and outgoing requests, there are events in the corresponding span.

Figure 8: Events in the trace detail.

When the span is clicked, the details of the span will be displayed. If there are events in the current span, the relevant event information will be displayed on a time axis. As shown in Figure 9, there are a total of 6 related events in the current Span. Each event represents a data sample of an HTTP request/response. One of the events spans multiple time ranges, indicating a longer system call time. It may be due to a blocked system call, depending on the implementation details of the HTTP request in different languages. This can also help us query the possible causes of errors.

Figure 9: Events in one trace span.

Finally, we can click on a specific event to see its complete information. As shown in Figure 10, it displays the sampling information of a request, including the SkyWalking trace context protocol contained in the request header from the HTTP raw data. The raw request data allows you to quickly re-request the request to solve any issues.

Figure 10: The detail of the event.

Zipkin

Zipkin is one of the most widely used distributed tracing systems in the world. SkyWalking can function as an alternative server to provide advanced features for Zipkin users. Here, we use this way to bring the feature into the Zipkin ecosystem out-of-box. The new events would also be treated as a kind of Zipkin’s tags and annotations.

To add events to a Zipkin span, we need to do the following:

Split the start and end times of each event into two annotations with a canonical name.
Add the sampled HTTP raw data from the event to the Zipkin span tags, using the same event name for corresponding purposes.

Figures 11 and 12 show annotations and tags in the same span. In these figures, we can see that the span includes at least two events with the same event name and sequence suffix (e.g., “Start/Finished HTTP Request/Response Sampling-x” in the figure). Both events have separate timestamps to represent their relative times within the span. In the tags, the data content of the corresponding event is represented by the event name and sequence number, respectively.

Figure 11: Event timestamp in the Zipkin span annotation.

Figure 12: Event raw data in the Zipkin span tag.

Demo

In this section, we demonstrate how to perform network profiling in a service mesh and complete metrics collection and HTTP raw data sampling. To follow along, you will need a running Kubernetes environment.

Deploy SkyWalking Showcase

SkyWalking Showcase contains a complete set of example services and can be monitored using SkyWalking. For more information, please check the official documentation.

In this demo, we only deploy service, the latest released SkyWalking OAP, and UI.

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

After deployment is complete, please run the following script to open SkyWalking UI: http://localhost:8080/.

kubectl port-forward svc/ui 8080:8080 --namespace default

Start Network Profiling Task

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In figure 13, we have selected an instance with a list of tasks in the network profiling tab.

Figure 13: Network Profiling tab in the Data Plane.

When we click the Start button, as shown in Figure 14, we need to specify the sampling rules for the profiling task. The sampling rules consist of one or more rules, each of which is distinguished by a different URI regular expression. When the HTTP request URI matches the regular expression, the rule is used. If the URI regular expression is empty, the default rule is used. Using multiple rules can help us make different sampling configurations for different requests.

Each rule has three parameters to determine if sampling is needed:

Minimal Request Duration (ms): requests with a response time exceeding the specified time will be sampled.
Sampling response status code between 400 and 499: all status codes in the range [400-499) will be sampled.
Sampling response status code between 500 and 599: all status codes in the range [500-599) will be sampled.

Once the sampling configuration is complete, we can create the task.

Figure 14: Create network profiling task page.

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

When you click on the line between processes, you can view the data between the two processes, which is divided into three tabs:

TCP: displays TCP-related metrics.
HTTP/1.x: displays metrics in the HTTP 1 protocol.
HTTP Requests: displays the analyzed request and saves it to a list according to the sampling rule.

Figure 16: TCP metrics in a network profiling task.

Figure 17: HTTP/1.x metrics in a network profiling task.

Figure 18: HTTP sampled requests in a network profiling task.

Conclusion

In this article, we detailed the overview of how to analyze the Layer 7 HTTP/1.x protocol in network analysis, and how to associate it with existing trace systems. This allows us to extend the scope of data we can observe from just user space to also include kernel-space data.

In the future, we will delve further into the analysis of kernel data, such as collecting information on TCP packet size, transmission frequency, network card, and help on enhancing distributed tracing from another perspective.

Additional Resources

Zh: 使用 eBPF 提升 HTTP 可观测性 - L7 指标和追踪

Thu, 12 Jan 2023 00:00:00 +0000

背景

Apache SkyWalking 是一个开源应用性能管理系统，帮助用户收集和聚合日志、追踪、指标和事件，并在 UI 上显示。在上一篇文章中，我们介绍了如何使用 Apache SkyWalking Rover 分析服务网格环境中的网络性能问题。但是，在商业场景中，用户通常依靠成熟的第 7 层协议（如 HTTP）来进行系统之间的交互。在本文中，我们将讨论如何使用 eBPF 技术来分析第 7 层协议的性能瓶颈，以及如何使用网络采样来增强追踪系统。

本文将演示如何使用 Apache SkyWalking 与 eBPF 来增强 HTTP 可观察性中的指标和追踪。

HTTP 协议分析

HTTP 是最常用的 7 层协议之一，通常用于为外部方提供服务和进行系统间通信。在下面的章节中，我们将展示如何识别和分析 HTTP/1.x 协议。

协议识别

在 HTTP/1.x 中，客户端和服务器通过两端的单个文件描述符（File Descriptor）进行通信。图 1 显示了涉及以下步骤的通信过程：

Connect/Accept：客户端与 HTTP 服务器建立连接，或者服务器接受客户端的连接。
Read/Write（多次）：客户端或服务器读取和写入 HTTPS 请求和响应。单个请求 - 响应对在每边的同一连接内发生。
Close：客户端和服务器关闭连接。

为了获取 HTTP 内容，必须从此过程的第二步读取它。根据 RFC 定义，内容包含在 4 层协议的数据中，可以通过解析数据来获取。请求和响应对可以相关联，因为它们都在两端的同一连接内发生。

图 1：HTTP 通信时间线。

HTTP 管线化

HTTP 管线化（Pipelining）是 HTTP/1.1 的一个特性，允许在等待对应的响应的情况下在单个 TCP 连接上发送多个 HTTP 请求。这个特性很重要，因为它确保了服务器端的响应顺序必须与请求的顺序匹配。

图 2 说明了这是如何工作的，考虑以下情况：HTTP 客户端向服务器发送多个请求，服务器通过按照请求的顺序发送 HTTP 响应来响应。这意味着客户端发送的第一个请求将收到服务器的第一个响应，第二个请求将收到第二个响应，以此类推。

在设计 HTTP 解析时，我们应该遵循这个原则，将请求数据添加到列表中，并在解析响应时删除第一个项目。这可以确保响应按正确的顺序处理。

图 2： HTTP/1.1 管道。

指标

根据前文提到的 HTTP 内容和流程拓扑图的识别，我们可以将这两者结合起来生成进程间的指标数据。

图 3 显示了目前支持两个进程间分析的指标。基于 HTTP 请求和响应数据，可以分析以下数据：

指标名称	类型	单位	描述
请求 CPM（Call Per Minute）	计数器	计数	HTTP 请求计数
响应状态 CPM (Call Per Minute)	计数器	计数	每个 HTTP 响应状态码的计数
请求包大小	计数器 / 直方图	字节	请求包大小
响应包大小	计数器 / 直方图	字节	响应包大小
客户端持续时间	计数器 / 直方图	毫秒	客户端单个 HTTP 响应的持续时间
服务器持续时间	计数器 / 直方图	毫秒	服务器端单个 HTTP 响应的持续时间

图 3：进程到进程指标。

HTTP 和追踪

在 HTTP 过程中，如果我们能够从原始数据中解包 HTTP 请求和响应，就可以使用这些数据与现有的追踪系统进行关联。

追踪上下文标识

为了追踪多个服务之间的请求流，追踪系统通常在请求进入服务时创建追踪上下文，并在请求 - 响应过程中将其传递给其他服务。例如，当 HTTP 请求发送到另一个服务器时，追踪上下文包含在请求头中。

图 4 显示了 Wireshark 拦截的 HTTP 请求的原始内容。由 Zipkin Tracing 系统生成的追踪上下文信息可以通过头中的 “X-B3” 前缀进行标识。通过使用 eBPF 拦截 HTTP 头中的追踪上下文，可以将当前请求与追踪系统连接起来。

图 4：Wireshark 中的 HTTP Header 视图。

Trace 事件

我们已经将事件这个概念加入了追踪中。事件可以附加到跨度上，并包含起始和结束时间、标签和摘要，允许我们将任何所需的信息附加到追踪中。

在执行 eBPF 网络分析时，可以根据请求 - 响应数据生成两个事件。图 5 说明了在带分析的情况下执行 HTTP 请求时发生的情况。追踪系统生成追踪上下文信息并将其发送到请求中。当服务在内核中执行时，我们可以通过与内核空间中的请求 - 响应数据和执行时间交互，为相应的追踪跨度生成事件。

以前，我们只能观察用户空间的执行状态。现在，通过结合追踪和 eBPF 技术，我们还可以在内核空间获取更多关于当前追踪的信息，如果我们在追踪 SDK 和代理中执行类似的操作，将对目标服务的性能产生较小的影响。

图 5：分析 HTTP 请求和响应的逻辑视图。

抽样

该机制仅在满足特定条件时触发抽样。我们还提供了前 N 条追踪的列表，允许用户快速访问特定追踪的相关请求信息。为了帮助用户轻松识别和分析相关事件，我们提供了三种不同的抽样规则：

慢速追踪：当请求的响应时间超过指定阈值时触发抽样。
响应状态 [400,500)：当响应状态代码大于或等于 400 且小于 500 时触发抽样。
响应状态 [500,600)：当响应状态代码大于或等于 500 且小于 600 时触发抽样。

此外，我们认识到分析时可能并不需要所有请求或响应的原始数据。例如，当试图识别性能问题时，用户可能更感兴趣于请求数据，而在解决错误时，他们可能更感兴趣于响应数据。因此，我们还提供了请求或响应事件的配置选项，允许用户指定要抽样的数据类型。

服务网格中的分析

SkyWalking Rover 项目已经实现了 HTTP 协议的分析和追踪关联。当在服务网格环境中运行时它们的表现如何？

部署

图 6 演示了 SkyWalking 和 SkyWalking Rover 在服务网格环境中的部署方式。SkyWalking Rover 作为一个 DaemonSet 部署在每台服务所在的机器上，并与 SkyWalking 后端集群通信。它会自动识别机器上的服务并向 SkyWalking 后端集群报告元数据信息。当出现新的网络分析任务时，SkyWalking Rover 会感知该任务并对指定的进程进行分析，在最终将数据报告回 SkyWalking 后端服务之前，收集和聚合网络数据。

图 6：服务网格中的 SkyWalking rover 部署拓扑。

追踪系统

从版本 9.3.0 开始，SkyWalking 后端完全支持 Zipkin 服务器中的所有功能。因此，SkyWalking 后端可以收集来自 SkyWalking 和 Zipkin 协议的追踪。同样，SkyWalking Rover 可以在 SkyWalking 和 Zipkin 追踪系统中识别和分析追踪上下文。在接下来的两节中，网络分析结果将分别在 SkyWalking 和 Zipkin UI 中显示。

SkyWalking

当 SkyWalking 执行网络分析时，与前文中的 TCP 指标类似，SkyWalking UI 会首先显示进程间的拓扑图。当打开代表进程间流量指标的线的仪表板时，您可以在 “HTTP/1.x” 选项卡中看到 HTTP 流量的指标，并在 “HTTP Requests” 选项卡中看到带追踪的抽样的 HTTP 请求。

如图 7 所示，选项卡中有三个列表，每个列表对应事件抽样规则中的一个条件。每个列表显示符合预先规定条件的追踪。当您单击追踪列表中的一个项目时，就可以查看完整的追踪。

图 7：Tracing 上下文中的采样 HTTP 请求。

当您单击追踪列表中的一个项目时，就可以快速查看指定的追踪。在图 8 中，我们可以看到在当前的服务相关的跨度中，有一个带有数字的标签，表示与该追踪跨度相关的 HTTP 事件数。

由于我们在服务网格环境中，每个服务都涉及与 Envoy 交互。因此，当前的跨度包括 Envoy 的请求和响应信息。此外，由于当前的服务有传入和传出的请求，因此相应的跨度中有事件。

图 8：Tracing 详细信息中的事件。

当单击跨度时，将显示跨度的详细信息。如果当前跨度中有事件，则相关事件信息将在时间轴上显示。如图 9 所示，当前跨度中一共有 6 个相关事件。每个事件代表一个 HTTP 请求 / 响应的数据样本。其中一个事件跨越多个时间范围，表示较长的系统调用时间。这可能是由于系统调用被阻塞，具体取决于不同语言中的 HTTP 请求的实现细节。这也可以帮助我们查询错误的可能原因。

图 9：一个 Tracing 范围内的事件。

最后，我们可以单击特定的事件查看它的完整信息。如图 10 所示，它显示了一个请求的抽样信息，包括从 HTTP 原始数据中的请求头中包含的 SkyWalking 追踪上下文协议。原始请求数据允许您快速重新请求以解决任何问题。

图 10：事件的详细信息。

Zipkin

Zipkin 是世界上广泛使用的分布式追踪系统。SkyWalking 可以作为替代服务器，提供高级功能。在这里，我们使用这种方式将功能无缝集成到 Zipkin 生态系统中。新事件也将被视为 Zipkin 的标签和注释的一种。

为 Zipkin 跨度添加事件，需要执行以下操作：

将每个事件的开始时间和结束时间分别拆分为两个具有规范名称的注释。
将抽样的 HTTP 原始数据从事件添加到 Zipkin 跨度标签中，使用相同的事件名称用于相应的目的。

图 11 和图 12 显示了同一跨度中的注释和标签。在这些图中，我们可以看到跨度包含至少两个具有相同事件名称和序列后缀的事件（例如，图中的 “Start/Finished HTTP Request/Response Sampling-x”）。这两个事件均具有单独的时间戳，用于表示其在跨度内的相对时间。在标签中，对应事件的数据内容分别由事件名称和序列号表示。

图 11：Zipkin span 注释中的事件时间戳。

图 12：Zipkin span 标签中的事件原始数据。

演示

在本节中，我们将演示如何在服务网格中执行网络分析，并完成指标收集和 HTTP 原始数据抽样。要进行操作，您需要一个运行中的 Kubernetes 环境。

部署 SkyWalking Showcase

SkyWalking Showcase 包含一套完整的示例服务，可以使用 SkyWalking 进行监控。有关详细信息，请参阅官方文档。

在本演示中，我们只部署了服务、最新发布的 SkyWalking OAP 和 UI。

export SW_OAP_IMAGE=apache/skywalking-oap-server:9.3.0
export SW_UI_IMAGE=apache/skywalking-ui:9.3.0
export SW_ROVER_IMAGE=apache/skywalking-rover:0.4.0

export FEATURE_FLAGS=mesh-with-agent,single-node,elasticsearch,rover
make deploy.kubernetes

部署完成后，运行下面的脚本启动 SkyWalking UI：http://localhost:8080/。

kubectl port-forward svc/ui 8080:8080 --namespace default

启动网络分析任务

目前，我们可以通过单击服务网格面板中的 Data Plane 项和 Kubernetes 面板中的 Service 项来选择要监视的特定实例。

在图 13 中，我们已在网络分析选项卡中选择了一个具有任务列表的实例。

图 13：数据平面中的网络分析选项卡。

当我们单击 “开始” 按钮时，如图 14 所示，我们需要为分析任务指定抽样规则。抽样规则由一个或多个规则组成，每个规则都由不同的 URI 正则表达式区分。当 HTTP 请求的 URI 与正则表达式匹配时，将使用该规则。如果 URI 正则表达式为空，则使用默认规则。使用多个规则可以帮助我们为不同的请求配置不同的抽样配置。

每个规则都有三个参数来确定是否需要抽样：

最小请求持续时间（毫秒）：响应时间超过指定时间的请求将被抽样。
在 400 和 499 之间的抽样响应状态代码：范围 [400-499) 中的所有状态代码将被抽样。
在 500 和 599 之间的抽样响应状态代码：范围 [500-599) 中的所有状态码将被抽样。

抽样配置完成后，我们就可以创建任务了。

图 14：创建网络分析任务页面。

完成

几秒钟后，你会看到页面的右侧出现进程拓扑结构。

图 15：网络分析任务中的流程拓扑。

当您单击进程之间的线时，您可以查看两个过程之间的数据，它被分为三个选项卡：

TCP：显示与 TCP 相关的指标。
HTTP/1.x：显示 HTTP 1 协议中的指标。
HTTP 请求：显示已分析的请求，并根据抽样规则保存到列表中。

图 16：网络分析任务中的 TCP 指标。

图 17：网络分析任务中的 HTTP/1.x 指标。

图 18：网络分析任务中的 HTTP 采样请求。

总结

在本文中，我们详细介绍了如何在网络分析中分析 7 层 HTTP/1.x 协议，以及如何将其与现有追踪系统相关联。这使我们能够将我们能够观察到的数据从用户空间扩展到内核空间数据。

在未来，我们将进一步探究内核数据的分析，例如收集 TCP 包大小、传输频率、网卡等信息，并从另一个角度提升分布式追踪。

其他资源

Blog: Diagnose Service Mesh Network Performance with eBPF

Tue, 27 Sep 2022 00:00:00 +0000

Background

This article will show how to use Apache SkyWalking with eBPF to make network troubleshooting easier in a service mesh environment.

Apache SkyWalking is an application performance monitor tool for distributed systems. It observes metrics, logs, traces, and events in the service mesh environment and uses that data to generate a dependency graph of your pods and services. This dependency graph can provide quick insights into your system, especially when there’s an issue.

However, when troubleshooting network issues in SkyWalking’s service topology, it is not always easy to pinpoint where the error actually is. There are two reasons for the difficulty:

Traffic through the Envoy sidecar is not easy to observe. Data from Envoy’s Access Log Service (ALS) shows traffic between services (sidecar-to-sidecar), but not metrics on communication between the Envoy sidecar and the service it proxies. Without that information, it is more difficult to understand the impact of the sidecar.
There is a lack of data from transport layer (OSI Layer 4) communication. Since services generally use application layer (OSI Layer 7) protocols such as HTTP, observability data is generally restricted to application layer communication. However, the root cause may actually be in the transport layer, which is typically opaque to observability tools.

Access to metrics from Envoy-to-service and transport layer communication can make it easier to diagnose service issues. To this end, SkyWalking needs to collect and analyze transport layer metrics between processes inside Kubernetes pods - a task well suited to eBPF. We investigated using eBPF for this purpose and present our results and a demo below.

Monitoring Kubernetes Networks with eBPF

With its origins as the Extended Berkeley Packet Filter, eBPF is a general purpose mechanism for injecting and running your own code into the Linux kernel and is an excellent tool for monitoring network traffic in Kubernetes Pods. In the next few sections, we'll provide an overview of how to use eBPF for network monitoring as background for introducing Skywalking Rover, a metrics collector and profiler powered by eBPF to diagnose CPU and network performance.

How Applications and the Network Interact

Interactions between the application and the network can generally be divided into the following steps from higher to lower levels of abstraction:

User Code: Application code uses high-level network libraries in the application stack to exchange data across the network, like sending and receiving HTTP requests.
Network Library: When the network library receives a network request, it interacts with the language API to send the network data.
Language API: Each language provides an API for operating the network, system, etc. When a request is received, it interacts with the system API. In Linux, this API is called syscalls.
Linux API: When the Linux kernel receives the request through the API, it communicates with the socket to send the data, which is usually closer to an OSI Layer 4 protocol, such as TCP, UDP, etc.
Socket Ops: Sending or receiving the data to/from the NIC.

Our hypothesis is that eBPF can monitor the network. There are two ways to implement the interception: User space (uprobe) or Kernel space (kprobe). The table below summarizes the differences.

	Pros	Cons
uprobe	• Get more application-related contexts, such as whether the current request is HTTP or HTTPS. • Requests and responses can be intercepted by a single method	• Data structures can be unstable, so it is more difficult to get the desired data. • Implementation may differ between language/library versions. • Does not work in applications without symbol tables.
kprobe	• Available for all languages. • The data structure and methods are stable and do not require much adaptation. • Easier correlation with underlying data, such as getting the destination address of TCP, OSI Layer 4 protocol metrics, etc.	• A single request and response may be split into multiple probes. • Contextual information is not easy to get for stateful requests. For example header compression in HTTP/2.

For the general network performance monitor, we chose to use the kprobe (intercept the syscalls) for the following reasons:

It’s available for applications written in any programming language, and it’s stable, so it saves a lot of development/adaptation costs.
It can be correlated with metrics from the system level, which makes it easier to troubleshoot.
As a single request and response are split into multiple probes, we can use technology to correlate them.
For contextual information, It’s usually used in OSI Layer 7 protocol network analysis. So, if we just monitor the network performance, then they can be ignored.

Kprobes and network monitoring

Following the network syscalls of Linux documentation, we can implement network monitoring by intercepting two types of methods: socket operations and send/receive methods.

Socket Operations

When accepting or connecting with another socket, we can get the following information:

Connection information: Includes the remote address from the connection which helps us to understand which pod is connected.
Connection statics: Includes basic metrics from sockets, such as round-trip time (RTT), lost packet count in TCP, etc.
Socket and file descriptor (FD) mapping: Includes the relationship between the Linux file descriptor and socket object. It is useful when sending and receiving data through a Linux file descriptor.

Send/Receive

The interface related to sending or receiving data is the focus of performance analysis. It mainly contains the following parameters:

Socket file descriptor: The file descriptor of the current operation corresponding to the socket.
Buffer: The data sent or received, passed as a byte array.

Based on the above parameters, we can analyze the following data:

Bytes: The size of the packet in bytes.
Protocol: The protocol analysis according to the buffer data, such as HTTP, MySQL, etc.
Execution Time: The time it takes to send/receive the data.

At this point (Figure 1) we can analyze the following steps for the whole lifecycle of the connection:

Connect/Accept: When the connection is created.
Transform: Sending and receiving data on the connection.
Close: When the connection is closed.

Figure 1

Protocol and TLS

The previous section described how to analyze connections using send or receive buffer data. For example, following the HTTP/1.1 message specification to analyze the connection. However, this does not work for TLS requests/responses.

Figure 2

When TLS is in use, the Linux Kernel transmits data encrypted in user space. In the figure above, The application usually transmits SSL data through a third-party library (such as OpenSSL). For this case, the Linux API can only get the encrypted data, so it cannot recognize any higher layer protocol. To decrypt inside eBPF, we need to follow these steps:

Read unencrypted data through uprobe: Compatible multiple languages, using uprobe to capture the data that is not encrypted before sending or after receiving. In this way, we can get the original data and associate it with the socket.
Associate with socket: We can associate unencrypted data with the socket.

OpenSSL Use case

For example, the most common way to send/receive SSL data is to use OpenSSL as a shared library, specifically the SSL_read and SSL_write methods to submit the buffer data with the socket.

Following the documentation, we can intercept these two methods, which are almost identical to the API in Linux. The source code of the SSL structure in OpenSSL shows that the Socket FD exists in the BIO object of the SSL structure, and we can get it by the offset.

In summary, with knowledge of how OpenSSL works, we can read unencrypted data in an eBPF function.

Introducing SkyWalking Rover, an eBPF-based Metrics Collector and Profiler

SkyWalking Rover introduces the eBPF network profiling feature into the SkyWalking ecosystem. It’s currently supported in a Kubernetes environment, so must be deployed inside a Kubernetes cluster. Once the deployment is complete, SkyWalking Rover can monitor the network for all processes inside a given Pod. Based on the monitoring data, SkyWalking can generate the topology relationship diagram and metrics between processes.

Topology Diagram

The topology diagram can help us understand the network access between processes inside the same Pod, and between the process and external environment (other Pod or service). Additionally, it can identify the data direction of traffic based on the line flow direction.

In Figure 3 below, all nodes within the hexagon are the internal process of a Pod, and nodes outside the hexagon are externally associated services or Pods. Nodes are connected by lines, which indicate the direction of requests or responses between nodes (client or server). The protocol is indicated on the line, and it’s either HTTP(S), TCP, or TCP(TLS). Also, we can see in this figure that the line between Envoy and Python applications is bidirectional because Envoy intercepts all application traffic.

Figure 3

Metrics

Once we recognize the network call relationship between processes through the topology, we can select a specific line and view the TCP metrics between the two processes.

The diagram below (Figure 4) shows the metrics of network monitoring between two processes. There are four metrics in each line. Two on the left side are on the client side, and two on the right side are on the server side. If the remote process is not in the same Pod, only one side of the metrics is displayed.

Figure 4

The following two metric types are available:

Counter: Records the total number of data in a certain period. Each counter contains the following data: a. Count: Execution count. b. Bytes: Packet size in bytes. c. Execution time: Execution duration.
Histogram: Records the distribution of data in the buckets.

Based on the above data types, the following metrics are exposed:

Name	Type	Unit	Description
Write	Counter and histogram	Millisecond	The socket write counter.
Read	Counter and histogram	Millisecond	The socket read counter.
Write RTT	Counter and histogram	Microsecond	The socket write round trip time (RTT) counter.
Connect	Counter and histogram	Millisecond	The socket connect/accept with another server/client counter.
Close	Counter and histogram	Millisecond	The socket with other socket counter.
Retransmit	Counter	Millisecond	The socket retransmit package counter.
Drop	Counter	Millisecond	The socket drop package counter.

Demo

In this section, we demonstrate how to perform network profiling in the service mesh. To follow along, you will need a running Kubernetes environment.

NOTE: All commands and scripts are available in this GitHub repository.

Install Istio

Istio is the most widely deployed service mesh, and comes with a complete demo application that we can use for testing. To install Istio and the demo application, follow these steps:

Install Istio using the demo configuration profile.
Label the default namespace, so Istio automatically injects Envoy sidecar proxies when we’ll deploy the application.
Deploy the bookinfo application to the cluster.
Deploy the traffic generator to generate some traffic to the application.

export ISTIO_VERSION=1.13.1

# install istio
istioctl install -y --set profile=demo
kubectl label namespace default istio-injection=enabled

# deploy the bookinfo applications
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/platform/kube/bookinfo.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/destination-rule-all.yaml
kubectl apply -f https://raw.githubusercontent.com/istio/istio/$ISTIO_VERSION/samples/bookinfo/networking/virtual-service-all-v1.yaml

# generate traffic
kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/traffic-generator.yaml

Install SkyWalking

The following will install the storage, backend, and UI needed for SkyWalking:

git clone https://github.com/apache/skywalking-kubernetes.git
cd skywalking-kubernetes
cd chart
helm dep up skywalking
helm -n istio-system install skywalking skywalking \
 --set fullnameOverride=skywalking \
 --set elasticsearch.minimumMasterNodes=1 \
 --set elasticsearch.imageTag=7.5.1 \
 --set oap.replicas=1 \
 --set ui.image.repository=apache/skywalking-ui \
 --set ui.image.tag=9.2.0 \
 --set oap.image.tag=9.2.0 \
 --set oap.envoy.als.enabled=true \
 --set oap.image.repository=apache/skywalking-oap-server \
 --set oap.storageType=elasticsearch \
 --set oap.env.SW_METER_ANALYZER_ACTIVE_FILES='network-profiling'

Install SkyWalking Rover

SkyWalking Rover is deployed on every node in Kubernetes, and it automatically detects the services in the Kubernetes cluster. The network profiling feature has been released in the version 0.3.0 of SkyWalking Rover. When a network monitoring task is created, the SkyWalking rover sends the data to the SkyWalking backend.

kubectl apply -f https://raw.githubusercontent.com/mrproliu/skywalking-network-profiling-demo/main/resources/skywalking-rover.yaml

Start the Network Profiling Task

Once all deployments are completed, we must create a network profiling task for a specific instance of the service in the SkyWalking UI.

To open SkyWalking UI, run:

kubectl port-forward svc/skywalking-ui 8080:80 --namespace
istio-system

Currently, we can select the specific instances that we wish to monitor by clicking the Data Plane item in the Service Mesh panel and the Service item in the Kubernetes panel.

In the figure below, we have selected an instance with a list of tasks in the network profiling tab. When we click the start button, the SkyWalking Rover starts monitoring this instance’s network.

Figure 5

Done!

After a few seconds, you will see the process topology appear on the right side of the page.

Figure 6

When you click on the line between processes, you can see the TCP metrics between the two processes.

Figure 7

Conclusion

In this article, we detailed a problem that makes troubleshooting service mesh architectures difficult: lack of context between layers in the network stack. These are the cases when eBPF begins to really help with debugging/productivity when existing service mesh/envoy cannot. Then, we researched how eBPF could be applied to common communication, such as TLS. Finally, we demo the implementation of this process with SkyWalking Rover.

For now, we have completed the performance analysis for OSI layer 4 (mostly TCP). In the future, we will also introduce the analysis for OSI layer 7 protocols like HTTP.

Get Started with Istio

To get started with service mesh today, Tetrate Istio Distro is the easiest way to install, manage, and upgrade Istio. It provides a vetted upstream distribution of Istio that’s tested and optimized for specific platforms by Tetrate plus a CLI that facilitates acquiring, installing, and configuring multiple Istio versions. Tetrate Istio Distro also offers FIPS certified Istio builds for FedRAMP environments.

For enterprises that need a unified and consistent way to secure and manage services and traditional workloads across complex, heterogeneous deployment environments, we offer Tetrate Service Bridge, our flagship edge-to-workload application connectivity platform built on Istio and Envoy.

Additional Resources

Blog: Pinpoint Service Mesh Critical Performance Impact by using eBPF

Tue, 05 Jul 2022 00:00:00 +0000

Content

Background

Apache SkyWalking observes metrics, logs, traces, and events for services deployed into the service mesh. When troubleshooting, SkyWalking error analysis can be an invaluable tool helping to pinpoint where an error occurred. However, performance problems are more difficult: It’s often impossible to locate the root cause of performance problems with pre-existing observation data. To move beyond the status quo, dynamic debugging and troubleshooting are essential service performance tools. In this article, we’ll discuss how to use eBPF technology to improve the profiling feature in SkyWalking and analyze the performance impact in the service mesh.

Trace Profiling in SkyWalking

Since SkyWalking 7.0.0, Trace Profiling has helped developers find performance problems by periodically sampling the thread stack to let developers know which lines of code take more time. However, Trace Profiling is not suitable for the following scenarios:

Thread Model: Trace Profiling is most useful for profiling code that executes in a single thread. It is less useful for middleware that relies heavily on async execution models. For example Goroutines in Go or Kotlin Coroutines.
Language: Currently, Trace Profiling is only supported in Java and Python, since it’s not easy to obtain the thread stack in the runtimes of some languages such as Go and Node.js.
Agent Binding: Trace Profiling requires Agent installation, which can be tricky depending on the language (e.g., PHP has to rely on its C kernel; Rust and C/C++ require manual instrumentation to make install).
Trace Correlation: Since Trace Profiling is only associated with a single request it can be hard to determine which request is causing the problem.
Short Lifecycle Services: Trace Profiling doesn’t support short-lived services for (at least) two reasons:
1. It’s hard to differentiate system performance from class code manipulation in the booting stage.
2. Trace profiling is linked to an endpoint to identify performance impact, but there is no endpoint to match these short-lived services.

Fortunately, there are techniques that can go further than Trace Profiling in these situations.

Introduce eBPF

We have found that eBPF — a technology that can run sandboxed programs in an operating system kernel and thus safely and efficiently extend the capabilities of the kernel without requiring kernel modifications or loading kernel modules — can help us fill gaps left by Trace Profiling. eBPF is a trending technology because it breaks the traditional barrier between user and kernel space. Programs can now inject bytecode that runs in the kernel, instead of having to recompile the kernel to customize it. This is naturally a good fit for observability.

In the figure below, we can see that when the system executes the execve syscalls, the eBPF program is triggered, and the current process runtime information is obtained by using function calls.

Using eBPF technology, we can expand the scope of Skywalking’s profiling capabilities:

Global Performance Analysis: Before eBPF, data collection was limited to what agents can observe. Since eBPF programs run in the kernel, they can observe all threads. This is especially useful when you are not sure whether a performance problem is caused by a particular request.
Data Content: eBPF can dump both user and kernel space thread stacks, so if a performance issue happens in kernel space, it’s easier to find.
Agent Binding: All modern Linux kernels support eBPF, so there is no need to install anything. This means it is an orchestration-free vs an agent model. This reduces friction caused by built-in software which may not have the correct agents installed, such as Envoy in a Service Mesh.
Sampling Type: Unlike Trace Profiling, eBPF is event-driven and, therefore, not constrained by interval polling. For example, eBPF can trigger events and collect more data depending on a transfer size threshold. This can allow the system to triage and prioritize data collection under extreme load.

eBPF Limitations

While eBPF offers significant advantages for hunting performance bottlenecks, no technology is perfect. eBPF has a number of limitations described below. Fortunately, since SkyWalking does not require eBPF, the impact is limited.

Linux Version Requirement: eBPF programs require a Linux kernel version above 4.4, with later kernel versions offering more data to be collected. The BCC has documented the features supported by different Linux kernel versions, with the differences between versions usually being what data can be collected with eBPF.
Privileges Required: All processes that intend to load eBPF programs into the Linux kernel must be running in privileged mode. As such, bugs or other issues in such code may have a big impact.
Weak Support for Dynamic Language: eBPF has weak support for JIT-based dynamic languages, such as Java. It also depends on what data you want to collect. For Profiling, eBPF does not support parsing the symbols of the program, which is why most eBPF-based profiling technologies only support static languages like C, C++, Go, and Rust. However, symbol mapping can sometimes be solved through tools provided by the language. For example, in Java, perf-map-agent can be used to generate the symbol mapping. However, dynamic languages don’t support the attach (uprobe) functionality that would allow us to trace execution events through symbols.

Introducing SkyWalking Rover

SkyWalking Rover introduces the eBPF profiling feature into the SkyWalking ecosystem. The figure below shows the overall architecture of SkyWalking Rover. SkyWalking Rover is currently supported in Kubernetes environments and must be deployed inside a Kubernetes cluster. After establishing a connection with the SkyWalking backend server, it saves information about the processes on the current machine to SkyWalking. When the user creates an eBPF profiling task via the user interface, SkyWalking Rover receives the task and executes it in the relevant C, C++, Golang, and Rust language-based programs.

Other than an eBPF-capable kernel, there are no additional prerequisites for deploying SkyWalking Rover.

CPU Profiling with Rover

CPU profiling is the most intuitive way to show service performance. Inspired by Brendan Gregg‘s blog post, we’ve divided CPU profiling into two types that we have implemented in Rover:

On-CPU Profiling: Where threads are spending time running on-CPU.
Off-CPU Profiling: Where time is spent waiting while blocked on I/O, locks, timers, paging/swapping, etc.

Profiling Envoy with eBPF

Envoy is a popular proxy, used as the data plane by the Istio service mesh. In a Kubernetes cluster, Istio injects Envoy into each service’s pod as a sidecar where it transparently intercepts and processes incoming and outgoing traffic. As the data plane, any performance issues in Envoy can affect all service traffic in the mesh. In this scenario, it’s more powerful to use eBPF profiling to analyze issues in production caused by service mesh configuration.

Demo Environment

If you want to see this scenario in action, we’ve built a demo environment where we deploy an Nginx service for stress testing. Traffic is intercepted by Envoy and forwarded to Nginx. The commands to install the whole environment can be accessed through GitHub.

On-CPU Profiling

On-CPU profiling is suitable for analyzing thread stacks when service CPU usage is high. If the stack is dumped more times, it means that the thread stack occupies more CPU resources.

When installing Istio using the demo configuration profile, we found there are two places where we can optimize performance:

Zipkin Tracing: Different Zipkin sampling percentages have a direct impact on QPS.
Access Log Format: Reducing the fields of the Envoy access log can improve QPS.

Zipkin Tracing

Zipkin with 100% sampling

In the default demo configuration profile, Envoy is using 100% sampling as default tracing policy. How does that impact the performance?

As shown in the figure below, using the on-CPU profiling, we found that it takes about 16% of the CPU overhead. At a fixed consumption of 2 CPUs, its QPS can reach 5.7K.

Disable Zipkin tracing

At this point, we found that if Zipkin is not necessary, the sampling percentage can be reduced or we can even disable tracing. Based on the Istio documentation, we can disable tracing when installing the service mesh using the following command:

istioctl install -y --set profile=demo \
   --set 'meshConfig.enableTracing=false' \
   --set 'meshConfig.defaultConfig.tracing.sampling=0.0'

After disabling tracing, we performed on-CPU profiling again. According to the figure below, we found that Zipkin has disappeared from the flame graph. With the same 2 CPU consumption as in the previous example, the QPS reached 9K, which is an almost 60% increase.

Tracing with Throughput

With the same CPU usage, we’ve discovered that Envoy performance greatly improves when the tracing feature is disabled. Of course, this requires us to make trade-offs between the number of samples Zipkin collects and the desired performance of Envoy (QPS).

The table below illustrates how different Zipkin sampling percentages under the same CPU usage affect QPS.

Zipkin sampling %	QPS	CPUs	Note
100% (default)	5.7K	2	16% used by Zipkin
1%	8.1K	2	0.3% used by Zipkin
disabled	9.2K	2	0% used by Zipkin

Access Log Format

Default Log Format

In the default demo configuration profile, the default Access Log format contains a lot of data. The flame graph below shows various functions involved in parsing the data such as request headers, response headers, and streaming the body.

Simplifying Access Log Format

Typically, we don’t need all the information in the access log, so we can often simplify it to get what we need. The following command simplifies the access log format to only display basic information:

istioctl install -y --set profile=demo \
   --set meshConfig.accessLogFormat="[%START_TIME%] \"%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%\" %RESPONSE_CODE%\n"

After simplifying the access log format, we found that the QPS increased from 5.7K to 5.9K. When executing the on-CPU profiling again, the CPU usage of log formatting dropped from 2.4% to 0.7%.

Simplifying the log format helped us to improve the performance.

Off-CPU Profiling

Off-CPU profiling is suitable for performance issues that are not caused by high CPU usage. For example, when there are too many threads in one service, using off-CPU profiling could reveal which threads spend more time context switching.

We provide data aggregation in two dimensions:

Switch count: The number of times a thread switches context. When the thread returns to the CPU, it completes one context switch. A thread stack with a higher switch count spends more time context switching.
Switch duration: The time it takes a thread to switch the context. A thread stack with a higher switch duration spends more time off-CPU.

Write Access Log

Enable Write

Using the same environment and settings as before in the on-CPU test, we performed off-CPU profiling. As shown below, we found that access log writes accounted for about 28% of the total context switches. The “__write” shown below also indicates that this method is the Linux kernel method.

Disable Write

SkyWalking implements Envoy’s Access Log Service (ALS) feature which allows us to send access logs to the SkyWalking Observability Analysis Platform (OAP) using the gRPC protocol. Even by disabling the access logging, we can still use ALS to capture/aggregate the logs. We’ve disabled writing to the access log using the following command:

istioctl install -y --set profile=demo --set meshConfig.accessLogFile=""

After disabling the Access Log feature, we performed the off-CPU profiling. File writing entries have disappeared as shown in the figure below. Envoy throughput also increased from 5.7K to 5.9K.

Conclusion

In this article, we’ve examined the insights Apache Skywalking’s Trace Profiling can give us and how much more can be achieved with eBPF profiling. All of these features are implemented in skywalking-rover. In addition to on- and off-CPU profiling, you will also find the following features:

Continuous profiling, helps you automatically profile without manual intervention. For example, when Rover detects that the CPU exceeds a configurable threshold, it automatically executes the on-CPU profiling task.
More profiling types to enrich usage scenarios, such as network, and memory profiling.

Apache SkyWalking – EBPF

Blog: SkyWalking 10 Release: Service Hierarchy, Kubernetes Network Monitoring by eBPF, BanyanDB, and More

Layer and Service Hierarchy

Layer Jump

Service Hierarchy

Monitoring Kubernetes Network Traffic by using eBPF

BanyanDB - Native APM Database

Apache RocketMQ Server Monitoring

ClickHouse Server Monitoring

Apache ActiveMQ Server Monitoring

Support Multiple Labels Names

Metrics gRPC exporter

SkyWalking Native UI Metrics Query Switch to V3 APIs

Other Notable Enhancements

Zh: SkyWalking 10 发布：服务层次结构、基于 eBPF 的 Kubernetes 网络监控、BanyanDB 等

Layer 和 Service Hierarchy

Layer Jump

Service Hierarchy

使用 eBPF 监控 Kubernetes 网络流量

BanyanDB - 原生 APM 数据库

Apache RocketMQ 服务器监控

ClickHouse Server 监控

Apache ActiveMQ 服务器监控

支持多标签名称

Metrics gRPC 导出器

SkyWalking 原生 UI 指标查询切换到 V3 API

其他值得注意的增强功能

Blog: Activating Automatical Performance Analysis -- Continuous Profiling

Background

Automate Profiling

Policy

Monitoring type

Network related monitoring

Metrics collector

Threshold determination

Trigger task

Data Flow

eBPF Agent with Process

Sliding window

eBPF Agent with SkyWalking Backend

Demo

Deploy SkyWalking Showcase

Create Continuous Profiling Policy

Done

Conclusion

Zh: 自动化性能分析——持续剖析

背景

自动剖析

策略

监测类型

相关网络监测

指标收集器

阈值确定

触发任务

数据流

eBPF Agent进行进程跟踪

滑动窗口

eBPF Agent与OAP后端通讯

演示

部署 SkyWalking Showcase

创建持续剖析策略

完成

结论

Blog: eBPF enhanced HTTP observability - L7 metrics and tracing

Background

HTTP Protocol Analysis

Protocol Identification

HTTP Pipeline

Metrics

HTTP and Trace

Trace Context Identification

Trace Event

Sampling

Profiling in a Service Mesh

Deployment

Tracing Systems

SkyWalking

Zipkin

Demo

Deploy SkyWalking Showcase