AutoProfiling
This document was translated by ChatGPT
#1. AutoProfiling
By using eBPF to capture snapshots of an application's function call stack, DeepFlow can generate profiling flame graphs for any process, helping developers quickly pinpoint function performance bottlenecks. In addition to business functions, the function call stack can also display the time consumption of dynamic link libraries, language runtimes, and kernel functions. Furthermore, when collecting function call stacks, DeepFlow generates a unique identifier that can be associated with call logs, enabling the linkage between distributed tracing and function performance profiling.

DeepFlow's AutoProfiling
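
To make the idea of sampled call stacks concrete, the sketch below aggregates stack snapshots into the "folded" text form that flame graph renderers commonly accept (one line per unique stack, followed by its sample count). It is only an illustration, not DeepFlow's implementation: the function name `fold_stacks` and the input convention (one stack per line, frames separated by `;`, root frame first) are assumptions made for this example.

```shell
#!/bin/sh
# fold_stacks: read one sampled call stack per line on stdin
# (frames separated by ';', root frame first) and print
# "<stack> <count>" for every unique stack.
fold_stacks() {
  sort | uniq -c | awk '{
    count = $1                              # sample count emitted by uniq -c
    sub(/^[ \t]*[0-9]+[ \t]+/, "")          # drop the count prefix, keep the stack
    print $0, count
  }'
}

# Example usage (samples.txt is a hypothetical file of sampled stacks):
# fold_stacks < samples.txt
```

A stack that appears in more samples gets a larger count and therefore a wider bar in the flame graph, which is how hot functions stand out.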
#2. Capabilities and Limitations
Supported eBPF profiling data types:
| Type | Supported Languages/Libraries | Community Edition | Enterprise Edition |
|---|---|---|---|
| on-cpu | Java | ✔ | ✔ |
| on-cpu | C/C++ | ✔ | ✔ |
| on-cpu | Rust | ✔ | ✔ |
| on-cpu | Golang | ✔ | ✔ |
| on-cpu | Python *** | ✔ | ✔ |
| on-cpu | CUDA | ✔ | ✔ |
| on-cpu | Lua * | ✔ | ✔ |
| off-cpu | Java | | ✔ |
| off-cpu | C/C++ | | ✔ |
| off-cpu | Rust | | ✔ |
| off-cpu | Golang | | ✔ |
| off-cpu | Python *** | | ✔ |
| off-cpu | CUDA | | ✔ |
| off-cpu | Lua * | | ✔ |
| on-gpu | CUDA * | | ✔ |
| mem-alloc | Java ** | | ✔ |
| mem-alloc | Rust | | ✔ |
| mem-alloc | Golang * | | ✔ |
| mem-alloc | Python * *** | | ✔ |
| mem-inuse | Rust | | ✔ |
| hbm-alloc | CUDA * | | ✔ |
| hbm-inuse | CUDA * | | ✔ |
| rdma | C/C++ * | | ✔ |

Notes:
- *: Features in development
- **: The JVM running the Java program must have a symbol table; see the check method (JVM Symbol Table Check in the FAQ)
- ***: Currently supports Python 3.10
- Types:
  - on-cpu: Time a function spends on the CPU
  - off-cpu: Time a function waits for the CPU
  - on-gpu: Time a function spends on the GPU
  - mem-alloc: Total memory allocated by objects and the function call stack
  - mem-inuse: Current memory usage of objects and the function call stack
  - hbm-alloc: Total GPU memory allocated by objects and the function call stack
  - hbm-inuse: Current GPU memory usage of objects and the function call stack
- Languages:
  - Languages compiled into ELF format executables: Golang, Rust, C/C++
  - Languages using the JVM: Java
  - Interpreted languages: Python

Two prerequisites must be met to obtain profiling data:
- The application process must have Frame Pointer enabled, or the Agent's DWARF stack unwinding capability must be enabled
  - Enable Frame Pointer (frame pointer register) for the application process:
    - Compile C/C++: `gcc -fno-omit-frame-pointer`
    - Compile Rust: `RUSTFLAGS="-C force-frame-pointers=yes"`
    - Compile Golang: Enabled by default, no extra compile parameters needed
    - Run Java: `-XX:+PreserveFramePointer`
      - Enabling this parameter disables certain compiler optimizations. However, based on real-world measurements from Netflix and Brendan Gregg, this configuration typically introduces less than 1% performance overhead. Netflix has therefore used it widely in production since 2015 to support daily performance analysis of its Java applications.
  - To enable the Agent's DWARF stack unwinding capability, please refer to the documentation
- For compiled languages, ensure the symbol table is preserved during compilation
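
The symbol table prerequisite can be spot-checked on a compiled binary before deploying it. The following is a minimal sketch, assuming GNU binutils `readelf` is installed; the helper name `has_symtab` and the path `./app` are hypothetical. It reads the section header listing and succeeds only when a `.symtab` section is present (a stripped binary typically keeps only `.dynsym`).

```shell
#!/bin/sh
# has_symtab: read `readelf -WS <binary>` output on stdin and
# return 0 only if a .symtab section is listed.
has_symtab() {
  grep -Eq '[[:space:]]\.symtab[[:space:]]'
}

# Example usage (./app is a placeholder for your own binary):
# if readelf -WS ./app | has_symtab; then
#   echo "symbol table present"
# else
#   echo "symbol table missing: rebuild without stripping"
# fi
```

Reading from stdin keeps the helper independent of where the section listing comes from, so the same check can be reused on saved `readelf` output.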
The Off-CPU profiling feature only collects the following call stacks:
- Call stacks where the process state is `TASK_INTERRUPTIBLE` (interruptible sleep) or `TASK_UNINTERRUPTIBLE` (uninterruptible sleep) when yielding the CPU
- Call stacks excluding process 0 (Idle process)
- Call stacks containing at least one user-space function
- Call stacks where the CPU wait time is no more than 1 hour
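
The four rules above can be read as a single predicate applied to each off-CPU sample. The sketch below expresses them as a shell function purely for illustration; it is not how DeepFlow implements the filter. The function name `should_collect`, its argument order, and the use of seconds as the wait-time unit are all assumptions.

```shell
#!/bin/sh
# should_collect STATE PID USER_FRAMES WAIT_SECONDS
# Returns 0 when a sampled off-CPU stack passes all four rules, 1 otherwise.
should_collect() {
  state=$1; pid=$2; user_frames=$3; wait_seconds=$4

  # Rule 1: the task yielded the CPU in one of the two sleep states.
  case "$state" in
    TASK_INTERRUPTIBLE|TASK_UNINTERRUPTIBLE) ;;
    *) return 1 ;;
  esac
  # Rule 2: skip process 0 (the idle process).
  [ "$pid" -ne 0 ] || return 1
  # Rule 3: the stack contains at least one user-space function.
  [ "$user_frames" -ge 1 ] || return 1
  # Rule 4: the wait lasted no more than 1 hour (3600 seconds).
  [ "$wait_seconds" -le 3600 ] || return 1
  return 0
}

# Example usage with made-up values:
# should_collect TASK_INTERRUPTIBLE 1234 3 12 && echo "collected"
```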
#3. FAQ
#3.1 JVM Symbol Table Check
- Find the process ID of the Java process that requires memory profiling, denoted as `$pid`
- Check the location of the loaded `libjvm.so` for the process, denoted as `$path`:
  `grep libjvm.so /proc/$pid/maps`
- Check whether the file contains a symbol table:
  `readelf -WS $path | grep symtab`
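
The steps above can be chained into one small script. This is a sketch under stated assumptions: the function names `libjvm_path` and `check_jvm_symtab` are hypothetical, `readelf` is assumed to be installed, and the sixth whitespace-separated field of each `/proc/<pid>/maps` line is taken as the mapped file path (so paths containing spaces are not handled).

```shell
#!/bin/sh
# libjvm_path MAPS_FILE
# Print the first libjvm.so path found in a maps file such as /proc/<pid>/maps.
libjvm_path() {
  awk '$6 ~ /\/libjvm\.so$/ { print $6; exit }' "$1"
}

# check_jvm_symtab PID
# Locate libjvm.so for the process, then look for a symbol table section.
check_jvm_symtab() {
  pid=$1
  path=$(libjvm_path "/proc/$pid/maps")
  if [ -z "$path" ]; then
    echo "libjvm.so is not mapped in process $pid" >&2
    return 2
  fi
  if readelf -WS "$path" | grep -q symtab; then
    echo "$path: symbol table found"
  else
    echo "$path: no symbol table"
    return 1
  fi
}

# Example usage (replace 12345 with the real Java process ID):
# check_jvm_symtab 12345
```

`libjvm.so` is usually mapped several times in one process, so the helper stops at the first match instead of printing every mapping.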