Building Unified Observability for Kong with DeepFlow
2024-06-06

This article is the second in a series on building unified observability for API gateways with DeepFlow. It explains how to eliminate the observability data silos that typically form around API gateways.
This article introduces how to use DeepFlow’s zero-code, eBPF-based capabilities to build an observability solution for Kong Gateway, and how to integrate the rich data sources of existing Kong plugins on top of it to eliminate silos and create a unified platform for monitoring and analyzing Kong Gateway. With DeepFlow, Kong Gateway gains comprehensive observability spanning traffic monitoring, tracing analysis, and performance optimization, replacing fragmented data with a centralized monitoring view. This accelerates fault diagnosis and performance tuning, making the work of DevOps and SRE teams more efficient.
0x0: Installing Kong and DeepFlow
To build unified observability for Kong on DeepFlow, you need to deploy both DeepFlow and the Kong Gateway. For convenience, this article deploys them as K8s services in an all-in-one K8s cluster; the Kong deployment includes the Kong Ingress Controller (control plane) and the Kong Gateway (data plane) components. The entire deployment takes approximately 5 minutes. For detailed instructions, refer to the DeepFlow official deployment documentation and the Kong official documentation.
Note: To leverage DeepFlow’s eBPF-based observability capabilities, ensure that the server’s Linux operating system kernel is version 4.14 or above (version 3.10 is also acceptable when using CentOS or Red Hat distributions).
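For reference, the deployment can be sketched with the following commands. This is a minimal outline based on the two projects’ Helm documentation; chart names, namespaces, and flags are assumptions that may differ across versions and environments.

```bash
# Verify the kernel version first (4.14+, or 3.10 on CentOS/Red Hat)
uname -r

# Deploy DeepFlow (all-in-one) via Helm
helm repo add deepflow https://deepflowio.github.io/deepflow
helm repo update
helm install deepflow -n deepflow deepflow/deepflow --create-namespace

# Deploy Kong Ingress Controller + Kong Gateway via Helm
helm repo add kong https://charts.konghq.com
helm install kong kong/ingress -n kong --create-namespace
```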
0x1: Distributed Tracing
This section focuses on the distributed call chain between Kong’s data plane (Kong Gateway) and its backend services. There are two ways to implement distributed tracing for the Kong Gateway and backend services with DeepFlow: 1) Using eBPF, DeepFlow achieves out-of-the-box API-level distributed tracing without modifying the code or configuration of the Kong Gateway or backend services. 2) When the backend services already have APM (Application Performance Monitoring) capabilities, such as OpenTelemetry or SkyWalking, they can be combined with Kong Gateway’s tracing plugin (OpenTelemetry) to feed all tracing data into DeepFlow, achieving full-stack, function-level distributed tracing.
Two Methods for Implementing Distributed Tracing of Kong and Backend Services in DeepFlow
DeepFlow eBPF AutoTracing
DeepFlow’s distributed tracing (AutoTracing) capability is ready to use and requires no plugins to be enabled on the Kong Gateway; you only need to deploy the deepflow-agent on the server where the Kong Gateway runs. By opening the Distributed Tracing Dashboard provided by DeepFlow in Grafana, you can initiate a trace for a specific call and see its end-to-end path through the Kong Gateway and its backend services, as shown in the diagram below:
- ①: The request reaches the port of the K8s Node hosting the Kong Gateway service via a cloud LB.
- ②: It enters the network interface of the Pod backing the Kong Gateway service.
- ③: It enters the nginx process inside the Kong Gateway service.
- ④: After business processing completes, the nginx process forwards the request toward the backend service.
- ⑤: The request leaves through the network interface of the Kong Gateway Pod.
- ⑥/⑦: The request is forwarded on to the backend service.
DeepFlow eBPF AutoTracing
DeepFlow eBPF + OpenTelemetry
In this method, the Kong Gateway uses the OpenTelemetry plugin to generate trace data, and the backend services also have APM capabilities and can convert their generated trace data into the OpenTelemetry format. When both the Kong Gateway and backend services send their trace data to DeepFlow, DeepFlow can generate a full-stack call chain tracing flame graph with no blind spots, including APM application spans, eBPF system spans, and cBPF network spans.
This method is suitable when you want function-level distributed tracing inside the backend service processes, or when the backend services use thread pools to handle calls (which can break the chain in DeepFlow AutoTracing).
1. Deploy Backend Services with APM
To demonstrate the complete tracing effect, we first deploy a Demo application with OpenTelemetry support behind the Kong Gateway. For deployment instructions, refer to: DeepFlow Demo - one-click deployment of the WebShop application, composed of five Spring Boot microservices. Then create a route on the Kong Ingress Controller to expose the backend services, as shown below.
The following Ingress manifest is a sketch; the namespace, service name, and port are illustrative assumptions that should match your Demo deployment.
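```yaml
## Create Ingress Resource for Service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-shop
  namespace: deepflow-otel-spring-demo    # assumed namespace of the Demo app
spec:
  ingressClassName: kong
  rules:
    - host: kong.deepflow.demo            # matches the Host header used in the demo below
      http:
        paths:
          - path: /
            pathType: ImplementationSpecific
            backend:
              service:
                name: web-shop            # assumed front-end service of the Demo
                port:
                  number: 18090           # assumed service port
```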
2. Enable the OpenTelemetry Plugin in Kong Gateway
Add the OpenTelemetry plugin in the Kong Gateway configuration:
The manifest below is a sketch based on the plugin’s schema; the DeepFlow Agent endpoint shown is an assumption taken from DeepFlow’s integration documentation and should be verified against your deployment.
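```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: opentelemetry
  namespace: kong
plugin: opentelemetry
config:
  # Assumed OTLP/HTTP trace endpoint of the DeepFlow Agent; adjust to your environment
  endpoint: "http://deepflow-agent.deepflow/api/v1/otel/trace"
  resource_attributes:
    service.name: kong
```

The plugin then needs to be attached, for example by annotating the Ingress with konghq.com/plugins: opentelemetry. Note that Kong 3.x also requires tracing to be enabled in the Gateway itself (e.g., KONG_TRACING_INSTRUMENTATIONS=all and KONG_TRACING_SAMPLING_RATE=1.0).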
3. Integrate OpenTelemetry Trace Data with DeepFlow
Integrate OpenTelemetry span data through the DeepFlow Agent. This feature is enabled by default and requires no additional configuration.
You can confirm this in the agent group’s default configuration; the keys shown below are taken from the example configuration and may vary across DeepFlow versions.
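```bash
## Display DeepFlow-Agent Default Configuration
deepflow-ctl agent-group-config example | grep external_agent_http_proxy
# Expected defaults (assumed from the example configuration):
#   external_agent_http_proxy_enabled: 1   # integration of external data sources enabled
#   external_agent_http_proxy_port: 38086  # listening port for integration traffic
```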
4. Demonstration of OpenTelemetry Integration
We initiate a command from the client to access the WebShop service: curl -H "HOST: kong.deepflow.demo" $CLUSTER_IP/shop/full-test.
In Grafana, open the Distributed Tracing Dashboard provided by DeepFlow. Find the corresponding call and initiate tracing. You will see that both the Kong Gateway and the backend services are traced, and the application spans generated by APM are fully correlated with the network spans and system spans generated by DeepFlow on a single flame graph:
Note: In the flame graph, ‘A’ represents the application spans generated by APM, while ‘N’ and ‘S’ represent the network spans and system spans generated by DeepFlow.
DeepFlow eBPF + OTel
0x2: Performance Metrics
For performance metrics, DeepFlow also provides out-of-the-box viewing of RED (Rate, Error, Duration) performance metrics at the endpoint level, as well as rich TCP network performance metrics (throughput, retransmissions, zero window, connection anomalies, etc.). Similarly, instance and route-level metrics data such as HTTP status codes, bandwidth, connection counts, and latency, obtained from Kong Gateway’s metrics plugins (e.g., Prometheus, StatsD), can be integrated into DeepFlow and viewed in the Grafana Dashboard provided by Kong.
Collecting Kong Performance Metrics with DeepFlow
Out-of-the-Box eBPF Performance Metrics
After the deepflow-agent is deployed on the server where the Kong Gateway runs, it automatically collects fine-grained metrics at the application and network layers, including request rates, response latencies, and error states for specific clients and endpoints, as well as TCP connection latencies and anomalies. The full list of metrics is available on the DeepFlow website. In Grafana, the Application - xxx Ingress Dashboard provided by DeepFlow shows application-layer performance metrics for the Kong Gateway; network-related metrics are in the Network - xxx Dashboard.
DeepFlow eBPF Performance Metrics (Application)
DeepFlow eBPF Performance Metrics (Network)
Enable the Prometheus Plugin in Kong Gateway
Add the Prometheus plugin in the Kong Gateway configuration. For specific plugin configuration, refer to the Kong Prometheus Plugin documentation.
The manifest below is a sketch; the config fields follow the Kong 3.x plugin schema and should be checked against your Kong version.
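```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: prometheus
  namespace: kong
plugin: prometheus
config:
  # Optional metric groups (field names per the Kong 3.x plugin schema)
  status_code_metrics: true
  latency_metrics: true
  bandwidth_metrics: true
  upstream_health_metrics: true
```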
Use Prometheus to Pull Kong Gateway Metrics
In this setup, Prometheus is deployed using the Kube-Prometheus project. First, update the values.yaml file in the Kong Helm Chart package:
The snippet below is a sketch for the kong/ingress chart; the ServiceMonitor label is an assumption tied to how your Prometheus instance selects targets.
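```yaml
gateway:
  serviceMonitor:
    enabled: true                # let the chart create a ServiceMonitor for the Gateway
    labels:
      release: kube-prometheus   # assumed label matched by your Prometheus's serviceMonitorSelector
```

Apply the change with, for example, helm upgrade kong kong/ingress -n kong -f values.yaml.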
Create permissions for Prometheus to scrape resources in the kong namespace:
The manifests below follow the kube-prometheus convention of a prometheus-k8s ServiceAccount in the monitoring namespace; adjust the names if your deployment differs.
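```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: kong
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: kong
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
  - kind: ServiceAccount
    name: prometheus-k8s       # kube-prometheus's default Prometheus ServiceAccount
    namespace: monitoring
```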
Next, a Prometheus server is needed to collect the metrics generated by the Kong Gateway plugin. Since storing these metrics does not rely on prometheus-server, it can be deployed in Agent Mode, or the more lightweight grafana-agent can be used instead. Assuming prometheus-server is already deployed, enable RemoteWrite to send the metrics to DeepFlow:
The fragment below is a sketch of the Prometheus custom resource; the DeepFlow Agent RemoteWrite endpoint is an assumption taken from DeepFlow’s integration documentation.
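```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # ... existing spec fields unchanged ...
  remoteWrite:
    # Assumed RemoteWrite endpoint of the DeepFlow Agent; verify for your deployment
    - url: "http://deepflow-agent.deepflow/api/v1/prometheus"
```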
Integrate Prometheus Metrics Data with DeepFlow
Integrate Prometheus metrics data through the DeepFlow Agent. This feature is enabled by default and requires no additional configuration.
As before, this can be confirmed in the agent group’s default configuration; the keys shown below are taken from the example configuration and may vary across DeepFlow versions.
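```bash
## Display DeepFlow-Agent Default Configuration
deepflow-ctl agent-group-config example | grep external_agent_http_proxy
# Expected defaults (assumed from the example configuration):
#   external_agent_http_proxy_enabled: 1   # receiving external data sources enabled
#   external_agent_http_proxy_port: 38086  # port the agent listens on for RemoteWrite data
```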
Demonstration of Prometheus Integration
Since DeepFlow supports PromQL, you only need to switch the data source of Kong’s Grafana Dashboard to DeepFlow to view the Kong Gateway’s rich native performance metrics. For details on these metrics, refer to the official Prometheus plugin documentation.
Display Kong Dashboard via DeepFlow Data Source
0x3: Access Logs and Continuous Profiling
Collecting Kong Access Logs and Profiling Data with DeepFlow
For access logs, the logs recorded by Kong Gateway can be forwarded to DeepFlow via logging tools such as Vector. Even without plugins, DeepFlow requires no modifications to the Kong Gateway: simply deploy the deepflow-agent on the server where the Kong Gateway runs, then open the Application - Request Log Dashboard provided by DeepFlow in Grafana to view the access logs. The logs include header information from requests and responses and support analyzing the response latency and error codes of each request.
Access Logs Dashboard Provided by DeepFlow
DeepFlow also uses eBPF to capture snapshots of application function call stacks (an Enterprise Edition feature), allowing it to generate On-CPU/Off-CPU profiles for the Kong Gateway process. Beyond business functions, the call stacks also reveal the time spent in dynamically linked libraries and kernel system calls.
On-CPU Continuous Profiling Feature in DeepFlow Enterprise Edition
Off-CPU Continuous Profiling Feature in DeepFlow Enterprise Edition
0x4: What is DeepFlow
DeepFlow is an observability product developed by Yunshan Networks, designed to provide deep observability for complex cloud infrastructure and cloud-native applications. Based on eBPF, DeepFlow collects observability signals such as application performance metrics, distributed traces, and continuous profiles with zero-code (Zero Code) instrumentation, and integrates its SmartEncoding tag technology to achieve full-stack (Full Stack) correlation. With DeepFlow, cloud-native applications automatically gain deep observability, relieving developers of the instrumentation burden and providing DevOps/SRE teams with monitoring and diagnostic capabilities that span from code to infrastructure.
GitHub Repository: https://github.com/deepflowio/deepflow
Visit the DeepFlow Demo to experience zero-code, full-coverage, and fully correlated observability.