v7.0 EE Release Notes
Created: 2025-08-19 Last Modified: 2025-08-19
This document was translated by ChatGPT
#1. Business and Applications
#1.1 Business Observability
- (New UI) Added automatic correlation display of system metrics, events, and application logs in the right-slide page.
- Usability improvements
- Added Metrics Analysis capability to the right-slide page for quick classification and comparison of application and network performance metrics.
- (New UI) Optimized the usability of business definition operations, supporting drag-and-drop to define services in the topology.
#1.2 Application Observability
- AutoTracing
- ⭐ Support for RocketMQ protocol collection and tracing, Documentation.
- ⭐ Support for Tars protocol collection and tracing, Documentation.
- Support for Ping protocol collection and tracing, Documentation.
- Support for Dubbo protocol collection and tracing when using Fastjson serialization, Documentation.
- Support for parsing compressed MySQL calls, Documentation.
- Support for parsing MySQL Login Response statements and truncated MySQL protocol content.
- Optimized parsing of unary-type gRPC calls, Documentation.
- Support for parsing multiple DNS requests in TCP Payload, and parsing SRV-type DNS call logs, Documentation.
- Support for collecting Unix Socket call logs, and automatic tracing between TCP/UDP Socket call logs and Unix Socket call logs.
- Enriched eBPF hook points for file read/write event collection to improve adaptability.
- Support for parsing TraceID and SpanID from Borui and Yunzhihui APM.
- Support for cross-thread analysis of the parent Span of the current Span (system Span at the client process location).
- AutoMetrics
- Added timeout ratio metric (timeout_ratio) to application performance metrics (application, application_map).
- AutoTagging
- ⭐ Optimized the meaning of the process_kname field in call logs and file read/write event data, changing from kernel thread name to system process name for better readability.
- ⭐ Optimized the meaning of the response_status field in call logs and improved page prompt information.
- Normal: Response code is normal.
- Client Error: Response code indicates a client-side error, e.g., HTTP 4XX.
- Server Error: Response code indicates a server-side error, e.g., HTTP 5XX.
- Timeout: If no response is collected within a certain time, the request is marked as timed out.
- Collector Application Session Merge Timeout setting: DNS and TLS default to 15s, other protocols default to 120s, Documentation.
- Unknown: When concurrent requests exceed the collector's cache capacity, the oldest requests are marked as unknown.
- Collector Session Aggregate Max Entries setting: Default cache of 64K requests, Documentation.
- Parse Failed: Response was collected but the response code could not be parsed due to truncation or compression.
- Collector Payload Truncation setting: By default, the first 1024 bytes of Payload are parsed, Documentation.
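The response_status categories and the timeout_ratio metric described above can be illustrated with a short Python sketch. This is not DeepFlow's implementation: the `classify` and `timeout_ratio` helpers, their parameters, and the HTTP-style code ranges are hypothetical, and defining the ratio as timed-out requests divided by all requests is an assumption. Only the category names and the 120s default merge timeout come from these notes.

```python
from enum import Enum
from typing import Iterable, Optional


class ResponseStatus(Enum):
    NORMAL = "Normal"
    CLIENT_ERROR = "Client Error"
    SERVER_ERROR = "Server Error"
    TIMEOUT = "Timeout"
    UNKNOWN = "Unknown"
    PARSE_FAILED = "Parse Failed"


def classify(
    response_seen: bool,
    code: Optional[int],
    waited_s: float,
    evicted: bool = False,
    merge_timeout_s: float = 120.0,  # default for protocols other than DNS/TLS
) -> ResponseStatus:
    """Hypothetical helper: map one request to a response_status value."""
    if not response_seen:
        if evicted:
            # Pushed out of the session cache before a response was matched.
            return ResponseStatus.UNKNOWN
        if waited_s >= merge_timeout_s:
            return ResponseStatus.TIMEOUT
        raise ValueError("request is still pending")
    if code is None:
        # Response collected, but its code was truncated or compressed.
        return ResponseStatus.PARSE_FAILED
    if 400 <= code < 500:
        return ResponseStatus.CLIENT_ERROR
    if 500 <= code < 600:
        return ResponseStatus.SERVER_ERROR
    return ResponseStatus.NORMAL


def timeout_ratio(statuses: Iterable[ResponseStatus]) -> float:
    """Assumed definition: timed-out requests divided by all requests."""
    items = list(statuses)
    if not items:
        return 0.0
    return sum(s is ResponseStatus.TIMEOUT for s in items) / len(items)
```

In this sketch, a request that never receives a response is reported as Unknown only when it was evicted from the cache; otherwise it becomes Timeout once the merge timeout elapses.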
#1.3 Code Observability
- Usability improvements
- Collect Java/Python OnCPU profiling data by default.
- Collect deepflow-* OnCPU profiling data by default.
#2. Infrastructure
#2.1 Asset Observability
- ⭐ Added asset observability feature, supporting viewing observability data from the perspective of cloud hosts and container resources.
#2.2 Network Observability
- Changed the end status (close_type) of non-TCP traffic in network flow logs from timeout to normal end (1).
- Changed the default unit for all traffic rates on the page from bytes per second (Bps) to bits per second (bps).
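Because traffic rates are now shown in bits per second, the same traffic reads eight times larger than it did in bytes per second. A minimal conversion sketch follows; the function names are hypothetical, and the decimal (1000-based) unit prefixes are an assumption about how the page formats values.

```python
def bytes_per_sec_to_bits_per_sec(rate_in_bytes: float) -> float:
    """Convert a rate in bytes per second (Bps) to bits per second (bps)."""
    return rate_in_bytes * 8


def format_bps(rate_bps: float) -> str:
    """Render a bit rate with a decimal prefix (assumed display convention)."""
    for unit in ("bps", "Kbps", "Mbps", "Gbps"):
        if rate_bps < 1000:
            return f"{rate_bps:.2f} {unit}"
        rate_bps /= 1000
    return f"{rate_bps:.2f} Tbps"
```

For example, a rate previously displayed as 12,500,000 Bps corresponds to `format_bps(bytes_per_sec_to_bits_per_sec(12_500_000))`, which yields a value in Mbps.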
#2.3 Traffic Distribution
- Distribution strategy supports specifying collector groups.
#3. Customization
#3.1 Dashboards
- When using PromQL queries, support setting metric aliases, units, and thresholds.
#4. Others
#4.1 Resource List
- AutoTagging
- Process resources
- ⭐ Automatically use the jar/py file name as the gprocess name, so that processes no longer all appear as java/python.
- Aggregate processes with the same cmdline within the same cloud host or the same K8s workload into a unique gprocess to reduce redundant process information.
- Optimized default values for process matcher, Documentation.
- By default, ignore collection of sleep/sh/bash/pause/runc process information.
- By default, collect process information for Java/Python.
- By default, collect process information for deepflow-*.
- By default, collect process information in containers.
- Support for collecting and associating change events of K8s resource definitions and ConfigMaps.
- Usability improvements
- ⭐ Performance: Added KV search capability to list pages to improve search experience in large-scale resource scenarios with millions of entries.
- Added ID column to VPC resource list to align with cloud platforms.
- When entering a peering connection, VPC can be left empty to establish peering with all VPCs under the specified cloud platform.
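The process matcher defaults and gprocess aggregation listed under Process resources in this section can be sketched roughly as follows. This is an illustration only: the function names, the pattern list, and the rule that the ignore list takes precedence over the container and name rules are all assumptions, not the agent's actual matching logic.

```python
import fnmatch
import os
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Defaults taken from the notes; the glob patterns themselves are assumed.
IGNORED_NAMES = {"sleep", "sh", "bash", "pause", "runc"}
COLLECTED_PATTERNS = ("java", "python*", "deepflow-*")


def should_collect(name: str, in_container: bool) -> bool:
    """Assumed precedence: ignore list first, then containers, then names."""
    if name in IGNORED_NAMES:
        return False
    if in_container:
        return True
    return any(fnmatch.fnmatch(name, p) for p in COLLECTED_PATTERNS)


def gprocess_name(cmdline: Sequence[str]) -> str:
    """Prefer the jar/py file name over the interpreter name."""
    for arg in cmdline[1:]:
        if arg.endswith((".jar", ".py")):
            return os.path.basename(arg)
    return os.path.basename(cmdline[0])


def aggregate(
    processes: Iterable[Tuple[str, int, Sequence[str]]],
) -> Dict[Tuple[str, Tuple[str, ...]], List[int]]:
    """Group PIDs sharing a cmdline within one scope (host or K8s workload)."""
    groups: Dict[Tuple[str, Tuple[str, ...]], List[int]] = defaultdict(list)
    for scope, pid, cmdline in processes:
        groups[(scope, tuple(cmdline))].append(pid)
    return dict(groups)
```

Each key of the dictionary returned by `aggregate` stands for one gprocess, so two replicas of the same jar on one host collapse into a single entry.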
#4.2 System Management
- Server
- ⭐ Support for MCP Server.
- ⭐ Support for defining indexes for fields such as attribute.X, metrics.X to speed up retrieval of commonly used fields.
- Support for terminating remote upgrades of collectors and optimizing CPU resource usage of Server during upgrades.
- Support for setting maximum query duration to avoid excessive resource consumption for large time-scale queries.
- Agent
- ⭐ OneAgent: Support for using deepflow-agent to collect application logs, host system metrics, and K8s container system metrics.
- ⭐ OneAgent: Support for using deepflow-agent for continuous probing.
- ⭐ Security: Support for limiting the number of Sockets used by deepflow-agent, Documentation.
- ⭐ Adaptability: Support for collecting traffic from Pod internal NICs, suitable for scenarios where Pod NIC traffic cannot be directly collected under the Root network namespace (e.g., Huawei Cloud CCE Turbo CNI), Documentation.
- ⭐ Performance: Support for compressed transmission of PCAP data, with compression ratio up to 5:1 ~ 10:1, Documentation.
- ⭐ Performance: Support for compressed sending of call logs and flow logs, with call log compression ratio up to 8:1 in test environments, Documentation.
- ⭐ Performance: Optimized memory usage of Cache for application performance metrics in Agent by timely cleaning up expired LRU entries, reducing overall memory consumption by 43% in test environments.
- ⭐ Performance: Aggregate and store flow logs generated by LB health checks, reducing flow log storage overhead by nearly 50% in a production environment, Documentation.
- ⭐ Performance: Improved call log merge success rate on the agent side, significantly reducing the proportion of response_status = Unknown call logs, with a 50% reduction in unknown rate observed in test environments.
- ⭐ Support for collecting virtual and physical NIC traffic on non-Open vSwitch DPDK KVM hosts, Documentation.
- Adapted to K8s CNI with identical MAC addresses for virtual NICs on the same host.
- Optimized resource overhead protection mechanism when application protocol recognition fails to avoid mistakenly disabling application protocol parsing, Documentation.
- Collector list supports displaying associated VPC information.
- Support for limiting the bandwidth consumed by agent data sending, with a default limit of 100Mbps, Documentation.
- When agent traffic reaches the rate limit, support choosing between drop or wait strategies; default is drop, can be configured to wait to improve data sending success rate, Documentation.
- Added circuit breaker mechanism for free disk space in Agent runtime environment, Documentation.
- Support for disabling Agent use of Swap memory, Documentation.
- Optimization: Reduced work performed by Agent when in disabled state.
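The difference between the drop and wait strategies for the agent sending rate limit noted above can be illustrated with a token-bucket sketch. The token bucket is a swapped-in, generic technique chosen for illustration, not necessarily what deepflow-agent does internally; the `RateLimitedSender` class and its parameters are hypothetical, and only the strategy names and the 100Mbps default come from these notes.

```python
import time
from typing import Callable


class RateLimitedSender:
    """Hypothetical token-bucket limiter with 'drop' and 'wait' strategies."""

    def __init__(
        self,
        rate_bps: float = 100_000_000,  # 100Mbps default from the notes
        strategy: str = "drop",
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if strategy not in ("drop", "wait"):
            raise ValueError("strategy must be 'drop' or 'wait'")
        self.rate_bps = rate_bps
        self.strategy = strategy
        self._clock = clock
        self._sleep = sleep
        self._tokens = float(rate_bps)  # allow up to one second of burst
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        earned = (now - self._last) * self.rate_bps
        self._tokens = min(self.rate_bps, self._tokens + earned)
        self._last = now

    def send(self, payload: bytes) -> bool:
        """Return True if the payload may be sent, False if it was dropped."""
        need = len(payload) * 8  # tokens are counted in bits
        self._refill()
        if need > self._tokens:
            if self.strategy == "drop":
                return False
            # 'wait': block until enough tokens should have accumulated.
            self._sleep((need - self._tokens) / self.rate_bps)
            self._refill()
        # May go negative for oversized payloads; later sends repay the debt.
        self._tokens -= need
        return True
```

With drop, `send` returns immediately and data over the limit is lost; with wait, the caller is delayed instead, which trades latency for a higher sending success rate.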