v6.6 CE Release Notes

Created: 2025-08-19 | Last Modified: 2025-08-19

This document was translated by ChatGPT

#1. Backport From 7.0

  • AutoTracing
    • [2025/01/02] Support collection and tracing of the Tars protocol, documentation.
    • [2025/01/16] For non-TCP traffic in network flow logs (l4_flow_log), change the end status (close_type) from timeout to normal end (1).
    • [2025/04/02] Support collection and tracing of the Ping protocol, documentation.
    • [2025/04/02] Support collection and tracing of the Dubbo protocol when using Fastjson serialization, documentation.
    • [2025/04/15] Support parsing MySQL Login Response statements.
    • [2025/04/15] Support parsing multiple DNS requests in a TCP Payload.
    • [2025/04/28] Enrich eBPF hook points for collecting file read/write events (io_event) to improve adaptability.
    • [2025/05/29] Support collecting Unix Socket call logs (l7_flow_log) and automatic tracing between TCP/UDP Socket call logs and Unix Socket call logs.
    • [2025/05/29] Support parsing SRV type DNS call logs, documentation.
    • [2025/05/29] Support parsing truncated MySQL protocol content.
  • AutoTagging
    • [2025/04/28] Optimize the meaning of the process_kname field in call logs and file read/write event data, changing from kernel thread name to system process name for better readability.
    • [2025/04/28] Aggregate processes with the same cmdline within the same cloud host or the same K8s workload into a unique gprocess to reduce redundant process information.
    • [2025/04/28] Optimize default values for the process matcher, documentation.
      • By default, ignore collection of process information for sleep/sh/bash/pause/runc.
      • By default, collect process information and OnCPU profiling data for Java/Python, and automatically record the gprocess name as the jar/py file name so that these processes are not all displayed as java/python.
      • By default, collect process information and OnCPU profiling data for deepflow-*.
      • By default, collect process information in containers.
    • [2025/04/28] Optimize the meaning of the response_status field in call logs (l7_flow_log).
      • Normal: Response code is normal.
      • Client Error: Response code indicates a client-side error, e.g., HTTP 4XX.
      • Server Error: Response code indicates a server-side error, e.g., HTTP 5XX.
      • Timeout: If no response is collected within a certain time, the request is marked as timed out.
        • Agent Application session merge timeout configuration: DNS and TLS default 15s, other protocols default 120s, documentation.
      • Unknown: When concurrent requests exceed the collector's cache capacity, the oldest requests are marked as unknown.
        • Agent Maximum session aggregation entries configuration: Default cache of 64K requests, documentation.
      • Parse Failed: Response was collected but the response code could not be parsed due to truncation or compression.
        • Agent Payload truncation configuration: by default, only the first 1024 bytes of the Payload are parsed, documentation.
    • [2025/06/11] Optimize parsing of unary type gRPC calls, documentation.
    • [2025/08/21] Support collecting multiple HTTP2/gRPC requests and responses in a single packet.
    • [2025/08/21] Support obtaining the full file path for file read/write events:
      • Obtain complete NAS file paths, supporting NFS, SMB, CIFS, and other protocols.
      • Obtain the complete absolute path for file reads/writes inside container Pods.
  • AutoMetrics
    • [2025/08/21] Support aggregation to generate eBPF profiling metric data with 1s granularity to speed up profiling metric queries.
  • AutoTagging
    • [2025/08/21] Simplify process sync blacklist configuration, documentation.
    • [2025/08/21] Adapt to K8s v1.32+ API.
  • Server
    • [2025/02/11] Support terminating remote upgrades of collectors and optimize CPU resource usage of the Server during upgrades.
  • Agent
    • [2025/02/11] Support limiting the number of sockets used by deepflow-agent, documentation.
    • [2025/03/18] Support collecting traffic from Pod internal NICs, applicable to scenarios where Pod NIC traffic cannot be directly collected under the Root network namespace (e.g., Huawei Cloud CCE Turbo CNI), documentation.
    • [2025/04/15] Limit the bandwidth consumption of data sent by the agent, default allowing 100Mbps, documentation.
    • [2025/04/28] Optimize memory usage of the cache for application performance metrics in the Agent by timely cleaning up expired LRU entries, reducing overall memory consumption by 43% in test environments.
    • [2025/04/28] Aggregate and store flow logs (l4_flow_log) generated by LB health checks, reducing flow log storage overhead by nearly 50% in some production environments, documentation.
    • [2025/04/28] Optimize resource overhead protection mechanism when application protocol recognition fails to avoid mistakenly disabling application protocol parsing, documentation.
    • [2025/05/16] Support compressed transmission of call logs and flow logs, with a compression ratio of up to 8:1 in test environments, documentation.
    • [2025/05/29] When Agent traffic reaches the rate limit, support choosing between a drop strategy and a wait strategy. The default is drop; configuring wait improves the data transmission success rate, documentation.
    • [2025/06/11] Add a circuit breaker mechanism for free disk space in the Agent runtime environment, documentation.
    • [2025/06/11] Support disabling Agent's use of swap memory, documentation.
    • [2025/06/11] Adapt to K8s CNI with identical MAC addresses for virtual NICs on the same host.
    • [2025/06/11] Optimization: Reduce work done by the Agent when disabled.
    • [2025/08/21] Support Watchdog mechanism to ensure circuit breakers execute properly in extreme cases.
    • [2025/08/21] Support compressed transmission of application logs, with compression ratios between 5:1 and 20:1, documentation.
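The response_status meanings listed under AutoTagging above can be read as a small decision function. The following is a minimal sketch for an HTTP-like protocol, not the Agent's actual implementation: the function name, its parameters, and the order in which the conditions are checked are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class ResponseStatus(Enum):
    NORMAL = "Normal"
    CLIENT_ERROR = "Client Error"
    SERVER_ERROR = "Server Error"
    TIMEOUT = "Timeout"
    UNKNOWN = "Unknown"
    PARSE_FAILED = "Parse Failed"


def classify_response(
    evicted_from_cache: bool,
    response_collected: bool,
    status_code: Optional[int],
) -> ResponseStatus:
    """Map the outcome of one request to a response_status value.

    All three parameters are hypothetical inputs:
    - evicted_from_cache: the request was pushed out of the session cache
      because concurrent requests exceeded its capacity.
    - response_collected: a response was seen before the session merge timeout.
    - status_code: the parsed response code, or None when it could not be
      parsed (for example because of truncation or compression).
    """
    if evicted_from_cache:
        # Assumption: eviction is checked first, since the request is gone.
        return ResponseStatus.UNKNOWN
    if not response_collected:
        return ResponseStatus.TIMEOUT
    if status_code is None:
        return ResponseStatus.PARSE_FAILED
    if 400 <= status_code < 500:
        return ResponseStatus.CLIENT_ERROR
    if 500 <= status_code < 600:
        return ResponseStatus.SERVER_ERROR
    return ResponseStatus.NORMAL
```

For example, `classify_response(False, True, 404)` yields `ResponseStatus.CLIENT_ERROR`, while a response whose code was lost to truncation yields `ResponseStatus.PARSE_FAILED`.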

#2. v6.6.9 [2024/12/12]

#2.1 Stable Feature

  • AutoTracing
    • Support collection and tracing of the Memcached protocol, documentation.
    • cBPF data supports Tars protocol parsing, documentation.
    • File read/write events support collecting the full file path and the read/write offset within the file.
  • AutoProfiling
    • Support CPU performance profiling for Python and CUDA.
    • Optimize Java process symbol table synchronization mechanism, reducing transient CPU consumption introduced to business processes by about 50%.
    • Improve function stack merging efficiency, reducing resource overhead for function stack reporting, with significant performance improvement in scenarios with many threads of the same name.
  • AutoTagging
    • When TraceID exists in the protocol header, support disabling eBPF syscall_trace_id calculation (via syscall_trace_id_disabled) to reduce impact on business performance.
    • Support completely disabling cBPF data collection (by setting tap_interface_regex to an empty string) to reduce memory overhead.
    • Enhance process synchronization capability, documentation.
      • Support synchronizing only processes inside containers.
      • Support not synchronizing Socket information (only process information).
    • When a region whitelist is configured for the cloud platform (Domain), calling the Region API is no longer required.
    • Failure to obtain NAT gateway, routing table, or load balancer information from Alibaba Cloud or Tencent Cloud will not affect synchronization of other resource information.
  • Server
    • Optimize storage performance of genesis* related MySQL tables.
    • Support using ByConity instead of ClickHouse, documentation.
    • Support using ClickHouse Enterprise Edition (currently only supported on Alibaba Cloud), documentation.
  • Agent
    • Support compressed transmission of profiling data, reducing bandwidth consumption by 30%.
    • Application log data supports compressed transmission, reducing bandwidth consumption by 95% (CPU consumption increases by 3%).
    • Support deepflow-agent using a single socket to transmit all observability data, and allow disabling this feature via multiple_sockets_to_ingester to use multiple sockets for improved transmission performance.
    • When BTF (BPF Type Format) is enabled on Linux, and the kernel is >= 5.5 on X86 architecture or >= 6.0 on ARM architecture, the agent will automatically use fentry/fexit instead of kprobe/kretprobe, resulting in about 15% performance improvement.
    • The original environment variable ONLY_WATCH_K8S_RESOURCE has been replaced with K8S_WATCH_POLICY, documentation.
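The fentry/fexit selection rule in the Agent list above (BTF enabled, kernel >= 5.5 on X86 or >= 6.0 on ARM) can be expressed as a short check. This is only an illustrative sketch under stated assumptions: the function name, the architecture strings `x86_64` and `aarch64`, and the release-string parsing are hypothetical and are not taken from the agent's code.

```python
import re
from typing import Dict, Tuple

# Minimum kernel versions from the release note; the architecture keys
# are assumed names, not identifiers used by deepflow-agent.
MIN_KERNEL_FOR_FENTRY: Dict[str, Tuple[int, int]] = {
    "x86_64": (5, 5),
    "aarch64": (6, 0),
}


def select_probe_type(btf_enabled: bool, arch: str, kernel_release: str) -> str:
    """Return which probe family the rule in the release note would pick."""
    match = re.match(r"(\d+)\.(\d+)", kernel_release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {kernel_release!r}")
    version = (int(match.group(1)), int(match.group(2)))

    minimum = MIN_KERNEL_FOR_FENTRY.get(arch)
    if btf_enabled and minimum is not None and version >= minimum:
        return "fentry/fexit"
    # Fall back to the older mechanism in every other case.
    return "kprobe/kretprobe"
```

Under this sketch, a BTF-enabled `x86_64` host running `5.15.0-91-generic` selects fentry/fexit, while an `aarch64` host on `5.10.0` stays on kprobe/kretprobe.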

#3. v6.6.8 [2024/11/14]

#3.1 Stable Feature

  • Server
    • By default, aggregate and generate network performance metrics and application performance metrics with granularity of 1h and 1d.
  • Agent

#4. v6.6.7 [2024/10/31]

#4.1 Beta Feature

  • AutoTagging
    • Enhance process synchronization capability, documentation.
      • Support synchronizing only processes inside containers.
      • Support not synchronizing Socket information (only process information).

#5. v6.6.6 [2024/10/11]

#5.1 Backward Incompatible Change

  • AutoTracing
    • To reduce resource overhead and avoid misidentification, the agent will by default only parse the following application protocols (to enable parsing of other protocols, configure l7-protocol-enabled):
      • HTTP, HTTP2/gRPC, MySQL, Redis, Kafka, DNS, TLS.
      • Reminder: When using Wasm to parse private protocols, please add Custom to l7-protocol-enabled.
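The effect of this change can be modeled as a simple allow-list. The sketch below is an assumption-laden illustration, not the agent's logic: the helper names are hypothetical, the protocol strings are copied from the list above, and the single `HTTP2` entry is assumed to cover gRPC. The exact value spelling accepted by l7-protocol-enabled should be taken from the configuration documentation.

```python
from typing import Iterable, List

# Protocols parsed by default according to the release note.
# Assumption: "HTTP2" also stands for gRPC, which is carried over HTTP2.
DEFAULT_L7_PROTOCOLS = ("HTTP", "HTTP2", "MySQL", "Redis", "Kafka", "DNS", "TLS")


def build_enabled_list(extra: Iterable[str] = ()) -> List[str]:
    """Return the default protocols plus any extras, without duplicates."""
    enabled = list(DEFAULT_L7_PROTOCOLS)
    for name in extra:
        if name not in enabled:
            enabled.append(name)
    return enabled


def should_parse(protocol: str, enabled: Iterable[str]) -> bool:
    """A protocol is parsed only if it appears in the enabled list."""
    return protocol in set(enabled)
```

With the defaults alone, `should_parse("Dubbo", build_enabled_list())` is false. Passing `["Dubbo", "Custom"]` as `extra` enables both Dubbo and, per the reminder above, Wasm-based parsing of private protocols.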

#5.2 Stable Feature

#6. v6.6.5 [2024/09/24]

#6.1 Beta Feature

  • AutoProfiling
    • Optimize Java process symbol table synchronization mechanism, reducing transient CPU consumption introduced to business processes by about 50%.
    • Improve function stack merging efficiency, reducing resource overhead for function stack reporting, with significant performance improvement in scenarios with many threads of the same name.
  • Server
    • Optimize storage performance of genesis* related MySQL tables.
    • AutoTagging: When a region whitelist is configured for the cloud platform (Domain), calling the Region API is no longer required.
    • AutoTagging: Failure to obtain NAT gateway, routing table, or load balancer information from Alibaba Cloud or Tencent Cloud will not affect synchronization of other resource information.
  • Agent
    • When BTF (BPF Type Format) is enabled on Linux, and the kernel is >= 5.5 on X86 architecture or >= 6.0 on ARM architecture, the agent will automatically use fentry/fexit instead of kprobe/kretprobe, resulting in about 15% performance improvement.
    • Support compressed transmission of profiling data, reducing bandwidth consumption by 30%.
    • The original environment variable ONLY_WATCH_K8S_RESOURCE has been replaced with K8S_WATCH_POLICY, documentation.

#6.2 Stable Feature

  • AutoTracing
    • Support enhancing HTTP2/gRPC call logs using Wasm Plugin (currently not supporting enhancement of eBPF uprobe data), documentation.
  • AutoProfiling
    • Support stack unwinding using DWARF when Frame Pointer is missing.
  • AutoTagging
    • Support Alibaba Cloud resource synchronization using a regular account's AK/SK with ResourceGroupId.

#7. v6.6.4 [2024/08/29]

#7.1 Beta Feature

  • AutoTracing
  • AutoProfiling
    • Support stack unwinding using DWARF when Frame Pointer is missing.
  • AutoTagging
    • Support Alibaba Cloud resource synchronization using a regular account's AK/SK with ResourceGroupId.
  • Server

#7.2 Stable Feature

  • AutoTracing
    • Automatically correct minor clock drift between different machines in distributed tracing flame graphs.
  • AutoTagging
    • Support customizing K8s workload abstraction rules using Lua Plugin, documentation.
    • Support synchronizing LoadBalancer type container services.
  • Server
    • Support using OceanBase instead of MySQL.

#8. v6.6.3 [2024/08/15]

#8.1 Beta Feature

  • AutoTracing
    • When TraceID exists in the protocol header, support disabling eBPF syscall_trace_id calculation (via syscall_trace_id_disabled) to reduce impact on business performance.
    • Automatically correct minor clock drift between different machines in distributed tracing flame graphs.
  • AutoTagging
    • Support customizing K8s workload abstraction rules using Lua Plugin, documentation.
  • Agent
    • Support completely disabling cBPF data collection (by setting tap_interface_regex to an empty string) to reduce memory overhead.
    • Support deepflow-agent using a single socket to transmit all observability data, and allow disabling this feature via multiple_sockets_to_ingester to use multiple sockets for improved transmission performance.

#8.2 Stable Feature

  • AutoProfiling
  • AutoMetrics
    • Support aligning timestamps of request and response metrics within the same session to help AIOps systems better perform root cause analysis (thanks to pegasusljn: FR).
  • AutoTagging
    • Correctly tag Universal Tag for loopback NIC traffic on K8s Nodes.
  • Agent
    • Reduce the number of sockets used by deepflow-agent when sending data.
      • Merge sockets used for transmitting open_telemetry and open_telemetry_compressed data when integrating with OpenTelemetry.
      • Merge sockets used for agent self-monitoring, transmitting deepflow_stats and agent_log data.
      • Merge sockets used for transmitting prometheus and telegraf metrics when integrating with Prometheus and Telegraf.

#9. v6.6.2 [2024/08/01]

#9.1 Beta Feature

  • AutoMetrics
    • Support aligning timestamps of request and response metrics within the same session to help AIOps systems better perform root cause analysis (thanks to pegasusljn: FR).

#9.2 Stable Feature

  • AutoTracing
    • Optimize default values for NTP clock offset (host_clock_offset_us) and network delay (network_delay_us) configuration parameters used in network span tracing to reduce mismatch probability.

#10. v6.6.1 [2024/07/18]

#10.1 Beta Feature

  • AutoTagging
    • Correctly tag Universal Tag for loopback NIC traffic on K8s Nodes.

#10.2 Stable Feature

  • AutoTracing
    • Add URL masking capability for HTTP protocol, enable Redis protocol masking by default.
  • AutoTagging
    • Support synchronizing Volcano Engine resource tags, documentation.
    • Stop synchronizing Pods in K8s Evicted state to reduce resource overhead.
  • Integration
    • Optimize mapping of schema/target and other fields in OTel Span to l7_flow_log, documentation.
  • Agent
    • Support aggregated collection of traffic from multiple member physical NICs of an Open vSwitch bond interface.

#11. v6.6.0 [2024/07/04]

#11.1 Backward Incompatible Change

          #Functions  Response Size (Byte)  Download Time
Before    450,000     21.9M                 6.16s
After     450,000     3.07M                 0.78s
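To put the Before/After figures above in proportion, the short calculation below derives the reduction factor and percentage saved for both response size and download time. It uses only the numbers from the table; the function name is hypothetical, and the sizes are treated as being in the same unit so that the ratio is unit-free.

```python
from typing import Tuple


def reduction(before: float, after: float) -> Tuple[float, float]:
    """Return (times smaller, percent saved), each rounded to one decimal."""
    factor = before / after
    percent_saved = (1 - after / before) * 100
    return round(factor, 1), round(percent_saved, 1)


# Figures taken directly from the table (450,000 functions in both rows).
size_factor, size_saved = reduction(21.9, 3.07)
time_factor, time_saved = reduction(6.16, 0.78)
```

This gives a response roughly 7.1 times smaller (about 86.0% saved) and a download roughly 7.9 times faster (about 87.3% saved).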

#11.2 Beta Feature

  • AutoTagging
    • Support synchronizing Volcano Engine resource tags, documentation.
  • Agent
    • Support aggregated collection of traffic from multiple member physical NICs of an Open vSwitch bond interface.