View Data
This document was translated by ChatGPT
#1. Retrieve Profile for a Specific Process
#1.1 Grafana Panel
After selecting a process name in app_service
, the display effect in Grafana is as follows:
On-CPU - Process
#1.2 API
Profile query API example:
# Confirm the listening IP and port of deepflow-server
deepflow_server_node_ip=FIXME # Remember to modify
port=$(kubectl get --namespace deepflow -o jsonpath="{.spec.ports[0].nodePort}" services deepflow-server)
# Request the API of deepflow-server
curl -X POST http://${deepflow_server_node_ip}:$port/v1/profile/ProfileTracing \
-H 'Content-Type: application/json' \
-d '{
"app_service": "deepflow-agent",
"profile_language_type": "eBPF",
"profile_event_type": "on-cpu",
"tag_filter": "",
"time_start": 1708479421,
"time_end": 1708480321
}'
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Explanation of API request parameters:
- app_service: Process name
- profile_language_type: Use
eBPF
when retrieving eBPF Profile data - profile_event_type: For eBPF On-CPU Profile data, set the value to
on-cpu
- tag_filter: When process names conflict, other tags can be used for filtering
- For example,
"tag_filter": "pod_cluster='prod-cluster' AND pod_ns='app'"
- For example,
- time_start, time_end: Time range
The profile.in_process
table supports the following tag_filter
fields:
Name | DisplayName | Description |
---|---|---|
_id | UID | |
time | Time | Round end_time to seconds. |
region | Region | |
az | Availability Zone | |
host | VM Hypervisor | Host running virtual machine. |
chost | Cloud Host | Including virtual machines |
vpc | VPC | |
router | Router | |
dhcpgw | DHCP Gateway | |
lb | Load Balancer | |
lb_listener | Load Balancer Listener | |
natgw | NAT Gateway | |
redis | Redis | |
rds | RDS | |
pod_cluster | K8s Cluster | |
pod_ns | K8s Namespace | |
pod_node | K8s Node | |
pod_ingress | K8s Ingress | |
pod_service | K8s Service | |
pod_group_type | K8s Workload Type | |
pod_group | K8s Workload | Such as Deployment |
pod | K8s POD | |
service | Service | Deprecated,please use pod_service |
auto_instance_type | Auto Instance Type | The type of 'auto_instance'. |
auto_instance | Auto Instance Tag | The instance of IP |
auto_service_type | Auto Service Type | The type of 'auto_service'. |
auto_service | Auto Service Tag | On the basis of 'auto_instance' |
gprocess | Process | |
host_ip | VM Hypervisor | The management IP address of VM Hypervisor. |
host_hostname | VM Hypervisor | The hostname of VM Hypervisor. |
chost_ip | Cloud Host | The primary IP address of Cloud Host. |
chost_hostname | Cloud Host | The hostname of Cloud Host. |
pod_node_ip | K8s Node | The primary IP address of K8s Node. |
pod_node_hostname | K8s Node | The hostname of K8s Node. |
k8s.label | K8s Label | |
k8s.annotation | K8s Annotation | |
k8s.env | K8s Env | |
cloud.tag | Cloud Tag | |
ip | IP Address | |
is_ipv4 | IPv4 Flag | |
app_service | Application Service | |
app_instance | Application Instance | |
process_id | Process ID | |
trace_id | TraceID | |
span_name | Span Name | |
vtap | DeepFlow Agent | Deprecated,please use agent. |
agent | DeepFlow Agent | |
profile_value_unit | 单位 | |
profile_event_type | 剖析类型 | |
profile_create_timestamp | 聚合时间 | |
profile_in_timestamp | 写入时间 | |
profile_language_type | 语言类型 | |
profile_id | ProfileID |
generate from csv file: in_process.en
Example of API return result:
{
"OPT_STATUS": "SUCCESS",
"DESCRIPTION": "",
"result": {
"functions": [
"deepflow-agent",
"[t] platform-synchr",
"[k] entry_SYSCALL_64_after_hwframe",
// ...
"[l] __write"
],
"function_types": [
"P",
"T",
"K",
// ...
"K"
],
"function_values": {
"columns": ["self_value", "total_value"],
"values": [
[0, 640352895],
[0, 6292923],
[0, 46848438],
// ...
[0, 1797978]
]
},
"node_values": {
"columns": ["function_id", "parent_node_id", "self_value", "total_value"],
"values": [
[0, -1, 0, 640352895],
[1, 0, 0, 6292923],
[2, 1, 0, 1444443],
// ...
[3, 2, 0, 10101]
]
}
},
"debug": null
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Explanation of API return result:
- functions: Function names
- function_types: Function types
[t] thread_name
: Thread, only appears in the second layer of the flame graph[k] function_name
: Linux kernel functions, CUDA dynamic library functions (libcuda (opens new window), libcublas (opens new window), etc.)[l] function_name
: Functions in dynamic librariesfunction_name
: Represents business functions of the application$app_service
: The top node of the flame graph, named after the process name- Additionally, when the function name translation fails, it may appear in one of the following forms:
[/tmp/perf-29887.map]
: The file name of the Java process symbol file for process number 29887 in square brackets, the function address was not found in this file. Java process symbol files are automatically generated periodically, usually because the function was not loaded when the symbol file was generated.[/lib/ld-musl-x86_64.so.1]
: The file path of the dynamic library (withso
) in square brackets, the function address belongs to this file but failed to translate, usually due to the symbol table being trimmed.[/usr/local/bin/kube-apiserver]
: The file path of the executable file in square brackets, the function address belongs to this file but failed to translate, usually due to the symbol table being trimmed.[unknown] 0x0000000003932388
: In all other cases, when an address cannot be successfully translated into a function name, it is displayed as such. Specifically, when the third layer function address of the flame graph (usually the entry function of the thread) fails to translate into a function name, it is displayed as[unknown start_thread?]
.
- function_values: CPU duration of the function, in microseconds (us).
- node_values: CPU duration of the function as a node, in microseconds (us).
- function_id: Unique identifier of the function
- parent_node_id: Unique identifier of the parent node of the function in the flame graph
- total_value: CPU duration of the function, in microseconds (us).
- On-CPU Profile: This value indicates the CPU time spent by the function
- Off-CPU Profile: This value indicates the CPU waiting time of the function
- self_value: The
net
CPU duration of the function as a leaf node (bottom-level function), in microseconds (us).- The difference between On-CPU and Off-CPU is the same as above
Using the API return result, you can draw a CPU flame graph for the specified process.
#2. Retrieve Profile for a Specific Host
#2.1 Grafana Panel
After selecting Total
in app_service
, the display effect in Grafana is as follows (the flame graph has only three layers):
On-CPU - Host
#2.2 API
提示
Currently, only On-CPU Profile supports querying the overall data of the host.
When the request parameter carries "app_service": "Total"
, you can obtain a special On-CPU Profile data named Total
, which is a Profile of all processes on a host, detailed to the thread level. It can be used to quickly locate bottleneck processes and threads when the On-CPU regex
is not configured for a certain process. An example of the return result at this time:
{
"OPT_STATUS": "SUCCESS",
"DESCRIPTION": "",
"result": {
"functions": [
"Total",
"[p] java",
// ...
"[t] DefaultTimer10-"
],
"function_types": [
"H",
"P",
// ...
"T"
],
"function_values": {
"columns": ["self_value", "total_value"],
"values": [
[0, 4563875153],
[4616160, 325630360],
// ...
[6565660, 6565660]
]
},
"node_values": {
"columns": ["function_id", "parent_node_id", "self_value", "total_value"],
"values": [
[0, -1, 0, 4563875153],
[1, 0, 4616160, 325630360],
// ...
[2, 1, 6565660, 6565660]
]
}
},
"debug": null
}
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Additional explanation of functions in the above return result:
$app_service
: The top node of the flame graph, the name is fixed as Total[p] name
: The name of a process[t] name
: The name of a thread, its parent node is a[p] name
type node, indicating the process to which this thread belongs
Using the API return result, you can draw an On-CPU flame graph for the specified host.
#3. About Function Type
Function Type | Meaning | Profile Event Type | Characteristics |
---|---|---|---|
O | Object type | mem-* | Leaf node of Memory Profile |
H | Cloud host | * | Root node equal to Total |
P | Process | * | Starts with [p] , and root nodes not equal to Total |
T | Thread | * | Starts with [t] |
K | Kernel function | * | Starts with [k] |
C | CUDA driver function | * | Starts with [c] |
L | Dynamic library function | * | Starts with [l] |
? | Unknown function | * | Other starts with [ |
A | Application function | * | Functions other than the above |