eBPF BTF (BPF Type Format) Programming Guide
1. Introduction to BTF
What is BTF?
BTF (BPF Type Format) is a type metadata format provided by the Linux kernel, used to describe type information for eBPF programs and kernel data structures.
Core Advantages of BTF
- ✅ Compile Once, Run Everywhere (CO-RE): No need to recompile on target machines
- ✅ Kernel Structure Access: Safely read kernel data structures
- ✅ Type Safety: Compile-time type compatibility checking
- ✅ Debug Friendly: Provides rich type information
Problems Solved by BTF
Before BTF, eBPF programs faced the following problems:
task_struct Structure Example (Simplified)
task_struct is the core data structure the Linux kernel uses to describe a process (task). Its size and layout differ across kernel versions, architectures, and build configurations, so the offsets shown below are illustrative only.
Example 1: task_struct in Linux 5.10 Kernel (Simplified)
struct task_struct {
struct thread_info thread_info; // Offset: 0 (Size: 16 bytes)
volatile long state; // Offset: 16 (Size: 8 bytes; renamed to __state and changed to unsigned int in 5.14)
void *stack; // Offset: 24 (Size: 8 bytes)
refcount_t usage; // Offset: 32 (Size: 4 bytes)
unsigned int flags; // Offset: 36 (Size: 4 bytes)
// ... hundreds of bytes of other fields omitted ...
pid_t pid; // Offset: 1232 (Size: 4 bytes) ⬅️ Here!
pid_t tgid; // Offset: 1236 (Size: 4 bytes)
struct task_struct *real_parent; // Offset: 1256 (Size: 8 bytes)
struct task_struct *parent; // Offset: 1264 (Size: 8 bytes)
char comm[16]; // Offset: 1784 (Size: 16 bytes)
struct mm_struct *mm; // Offset: 1848 (Size: 8 bytes)
// ... more fields ...
};
Example 2: task_struct in Linux 6.1 Kernel (Simplified)
struct task_struct {
struct thread_info thread_info; // Offset: 0 (Size: 16 bytes)
unsigned int __state; // Offset: 16 (Size: 4 bytes)
void *stack; // Offset: 24 (Size: 8 bytes)
refcount_t usage; // Offset: 32 (Size: 4 bytes)
unsigned int flags; // Offset: 36 (Size: 4 bytes)
// ⚠️ The fields between flags and pid differ from 5.10 (added, resized, or reordered)
unsigned int ptrace; // Offset: 40
int on_rq; // Offset: 44
// ... other fields omitted ...
pid_t pid; // Offset: 1368 (Size: 4 bytes) ⬅️ Offset changed!
pid_t tgid; // Offset: 1372 (Size: 4 bytes)
struct task_struct *real_parent; // Offset: 1392 (Size: 8 bytes) ⬅️ Also changed!
struct task_struct *parent; // Offset: 1400 (Size: 8 bytes)
char comm[16]; // Offset: 1920 (Size: 16 bytes) ⬅️ Also changed!
struct mm_struct *mm; // Offset: 1984 (Size: 8 bytes)
// ... more fields ...
};
Offset Calculation Example
Suppose we want to read the pid field:
// ❌ Wrong way: Hard-coded offsets
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
int pid;
// On Linux 5.10
bpf_probe_read(&pid, sizeof(pid), (void *)task + 1232); // pid at offset 1232
// But on Linux 6.1, the same code reads the wrong location!
bpf_probe_read(&pid, sizeof(pid), (void *)task + 1232); // ❌ Should be 1368!
BTF's Solution
// BTF + CO-RE approach - Automatically handles offsets
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t pid = BPF_CORE_READ(task, pid); // ✅ Automatic adaptation!
Advantages:
- ✅ The compiler records which fields are accessed; libbpf resolves the correct offsets against the target kernel's BTF at load time
- ✅ The same compiled object adapts to different kernel versions
- ✅ Type-safe access method
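The idea behind these advantages can be sketched in ordinary user-space C. This is a conceptual illustration only, not how libbpf is implemented: `struct task_old`, `struct task_new`, the `btf_old`/`btf_new` tables, `resolve_offset`, and `read_int_at` are all hypothetical names invented for this sketch. The tables stand in for BTF (a per-kernel description of field offsets), and the name-based lookup stands in for a CO-RE relocation.

```c
#include <stddef.h>
#include <string.h>

/* Two hypothetical layouts of the "same" struct on different kernels. */
struct task_old {
    long state;
    int  flags;
    int  pid;
};

struct task_new {
    unsigned int state;
    int          flags;
    int          ptrace;  /* extra fields shift everything after them */
    int          on_rq;
    int          pid;
};

/* Stand-in for BTF: field name -> offset, as described by each kernel. */
struct field_info {
    const char *name;
    size_t      offset;
};

static const struct field_info btf_old[] = {
    { "flags", offsetof(struct task_old, flags) },
    { "pid",   offsetof(struct task_old, pid)   },
};

static const struct field_info btf_new[] = {
    { "flags", offsetof(struct task_new, flags) },
    { "pid",   offsetof(struct task_new, pid)   },
};

/* Stand-in for a CO-RE relocation: find the offset by field name.
 * Returns -1 if the field does not exist in this layout. */
static long resolve_offset(const struct field_info *btf, size_t count,
                           const char *field)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(btf[i].name, field) == 0)
            return (long)btf[i].offset;
    return -1;
}

/* Read an int located at a byte offset inside an opaque object. */
static int read_int_at(const void *obj, size_t offset)
{
    int value;
    memcpy(&value, (const char *)obj + offset, sizeof(value));
    return value;
}
```

Reading a `struct task_new` object at the `pid` offset taken from `struct task_old` returns whatever field happens to sit at that position, whereas `read_int_at(&t, resolve_offset(btf_new, 2, "pid"))` finds the real `pid` because the offset is looked up by name for the layout actually in use.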
2. BTF Core Concepts
2.1 vmlinux.h
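vmlinux.h is a header file containing the kernel's type definitions (structs, unions, enums, typedefs), generated by bpftool from BTF information. It does not contain preprocessor macros (#define constants), because BTF does not record them.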
Generating vmlinux.h
# Generate from current kernel
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
# Check if kernel supports BTF
ls /sys/kernel/btf/vmlinux
Advantages of vmlinux.h
// Traditional way - Need to include multiple header files
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/mm.h>
// ... potentially dozens of header files
// BTF way - Only one header file needed
#include "vmlinux.h" // ✅ Contains all kernel type definitions
2.2 BPF_CORE_READ Macro
BPF_CORE_READ is the core macro of CO-RE, used to safely read kernel structure fields.
Syntax Format
// Basic usage
BPF_CORE_READ(ptr, field)
// Single-level access equivalent to traditional pointer access
ptr->field
// Multi-level nested access
BPF_CORE_READ(ptr, field1, field2, field3)
// Multi-level nested access equivalent to traditional pointer access
ptr->field1->field2->field3
Usage Examples
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
// Read single field
pid_t pid = BPF_CORE_READ(task, pid);
// Read nested field
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
// Equivalent to
// task->real_parent->pid
2.3 BPF_CORE_READ_INTO() Macro
BPF_CORE_READ_INTO (Read to Variable)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid;
// Read value into specified variable
BPF_CORE_READ_INTO(&ppid, task, real_parent, pid);
2.4 BPF_CORE_READ_STR_INTO() Macro
BPF_CORE_READ_STR_INTO (Read String)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
char comm[16];
// Read process name (pass &comm: the macro sizes the read with sizeof(*dst), which must be the full 16-byte array)
BPF_CORE_READ_STR_INTO(&comm, task, comm);
2.5 bpf_probe_read vs bpf_core_read vs BPF_CORE_READ Detailed Explanation
These three are different ways to read memory data in eBPF and are easily confused. Let's compare them in detail:
Core Differences Overview
| Feature | bpf_probe_read | bpf_core_read | BPF_CORE_READ |
|---|---|---|---|
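| Type | BPF helper function | Function-like macro | Macro |
| Definition Location | Kernel | libbpf header file (bpf_core_read.h) | libbpf header file (bpf_core_read.h) |
| CO-RE Support | ❌ No | ✅ Yes | ✅ Yes |
| Type Safety | ❌ Weak (void *, manual size) | ⚠️ Partial (void *, manual size) | ✅ Strong (type and size inferred from the field) |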
| Use Case | Read arbitrary memory | Read single field | Read nested fields |
| Recommendation | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
1. bpf_probe_read - Traditional Memory Read Function
Function Prototype:
long bpf_probe_read(void *dst, u32 size, const void *unsafe_ptr);
Characteristics:
- Lowest-level memory read function
- Requires manual size specification
- No type checking
- Does not support CO-RE
Usage Example:
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
struct task_struct *parent;
pid_t ppid;
// Read real_parent pointer
bpf_probe_read(&parent, sizeof(parent), &task->real_parent);
// Read parent->pid
bpf_probe_read(&ppid, sizeof(ppid), &parent->pid);
Problems:
- ❌ Field offsets are fixed at compile time from the headers used for the build
- ❌ Nested access requires multiple calls
- ❌ No CO-RE, cannot work across kernel versions
- ❌ Verbose code
Applicable Scenarios:
- Reading arbitrary memory addresses (on kernels >= 5.5, the explicit bpf_probe_read_kernel / bpf_probe_read_user variants are preferred)
- Scenarios unrelated to BTF/CO-RE
- Need precise control over read behavior
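2. bpf_core_read - CO-RE Read Macro
Macro Definition (libbpf's bpf_core_read.h):
#define bpf_core_read(dst, sz, src) \
bpf_probe_read_kernel(dst, sz, (const void *)__builtin_preserve_access_index(src))
Characteristics:
- Function-like macro provided by libbpf, wrapping the bpf_probe_read_kernel() helper
- Supports CO-RE relocation
- Requires manual size specification
- Reads one field per call; it cannot follow pointers on its own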
Usage Example:
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t pid;
// Read single field - Correct usage
bpf_core_read(&pid, sizeof(pid), &task->pid); // ✅
// Read nested field - Wrong usage!
// bpf_core_read(&ppid, sizeof(ppid), &task->real_parent->pid); // ❌ Compiles, but dereferences task->real_parent directly, which the verifier rejects
Limitations:
- ⚠️ Cannot directly access nested fields (like task->real_parent->pid)
- ⚠️ Need to manually specify size
- ⚠️ Still relatively verbose
Correct Nested Access Method:
// Need to read in two steps
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
struct task_struct *parent;
pid_t ppid;
// Step 1: Read parent pointer
bpf_core_read(&parent, sizeof(parent), &task->real_parent);
// Step 2: Read parent->pid
bpf_core_read(&ppid, sizeof(ppid), &parent->pid);
Applicable Scenarios:
- Reading a single simple field
- Need CO-RE but want each read spelled out explicitly
- Need to check the return code of each read (BPF_CORE_READ returns the value itself, not a status)
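One practical reason to spell out each step is that every call returns its own status code, so a failure can be handled where it happens. The sketch below shows that two-step pattern in plain user-space C. It assumes a hypothetical `struct mini_task` and a `mock_read_kernel()` stand-in built on `memcpy`; in a real BPF program the reads would be `bpf_core_read()` calls against `struct task_struct`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of task_struct, for illustration only. */
struct mini_task {
    int               pid;
    struct mini_task *real_parent;
};

/* User-space stand-in for the kernel read helper:
 * returns 0 on success, a negative errno value on failure. */
static int mock_read_kernel(void *dst, size_t sz, const void *src)
{
    if (src == NULL)
        return -EFAULT;
    memcpy(dst, src, sz);
    return 0;
}

/* Read task->real_parent->pid in two explicit steps,
 * checking the status of each read before continuing. */
static int read_parent_pid(const struct mini_task *task, int *ppid)
{
    struct mini_task *parent;
    int err;

    if (task == NULL)
        return -EFAULT;

    /* Step 1: read the parent pointer. */
    err = mock_read_kernel(&parent, sizeof(parent), &task->real_parent);
    if (err)
        return err;

    /* A real helper would fault on a NULL source; the mock checks explicitly. */
    if (parent == NULL)
        return -EFAULT;

    /* Step 2: read the pid behind that pointer. */
    return mock_read_kernel(ppid, sizeof(*ppid), &parent->pid);
}
```

A caller can then distinguish "no parent could be read" from "parent pid is 0" by looking at the return value rather than at the output variable.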
3. BPF_CORE_READ - Recommended CO-RE Macro ⭐⭐⭐⭐⭐
Macro Definition (Simplified):
#define BPF_CORE_READ(src, a, ...) \
({ \
/* Record access path at compile time */ \
/* Generate CO-RE relocation information */ \
/* Return read value */ \
})
Characteristics:
- Macro provided by libbpf
- Full CO-RE support
- Supports nested field access
- Automatically infers type and size
- Most concise code
Usage Example:
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
// Read single field
pid_t pid = BPF_CORE_READ(task, pid);
// Read nested field - One line! ✅
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
Advantages:
- ✅ Most concise code (nested access in one line)
- ✅ Type safe (compile-time checking)
- ✅ Automatic offset handling
- ✅ Full CO-RE support
Applicable Scenarios:
- Reading kernel structure fields (Recommended!)
- Need CO-RE support
- Want concise, readable code
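How a macro can infer the type and size of a field, and hide the intermediate pointer read, can be sketched in user-space C. This is a simplified illustration, not libbpf's actual definition: `MINI_READ`, `MINI_READ2`, `struct mini_task`, and `mock_read_kernel` are hypothetical names, and no CO-RE relocation is involved. The macros rely on the GNU C extensions `__typeof__` and statement expressions, which GCC and Clang both accept.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature of task_struct, for illustration only. */
struct mini_task {
    int               pid;
    struct mini_task *real_parent;
};

/* User-space stand-in for the kernel read helper. Unlike the real
 * helper, it does not guard against invalid source pointers. */
static int mock_read_kernel(void *dst, size_t sz, const void *src)
{
    memcpy(dst, src, sz);
    return 0;
}

/* One-level read: the temporary's type and size come from the field itself,
 * so the caller never writes sizeof() or declares a destination variable. */
#define MINI_READ(src, field)                                            \
    ({                                                                   \
        __typeof__((src)->field) val_;                                   \
        (void)mock_read_kernel(&val_, sizeof(val_), &(src)->field);      \
        val_;                                                            \
    })

/* Two-level read: fetch the intermediate pointer, then the final field. */
#define MINI_READ2(src, f1, f2)                                          \
    ({                                                                   \
        __typeof__((src)->f1) ptr_ = MINI_READ(src, f1);                 \
        MINI_READ(ptr_, f2);                                             \
    })
```

With a `struct mini_task *task` in scope, `MINI_READ2(task, real_parent, pid)` reads the parent pointer and then the parent's `pid` in a single expression, which mirrors the shape of `BPF_CORE_READ(task, real_parent, pid)`.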
Practical Comparison: Reading Parent Process PID
Scenario: Read task->real_parent->pid
Method 1: bpf_probe_read (Not Recommended)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
struct task_struct *parent;
pid_t ppid;
// Needs two helper calls plus an intermediate pointer variable
bpf_probe_read(&parent, sizeof(parent),
(void *)task + offsetof(struct task_struct, real_parent));
bpf_probe_read(&ppid, sizeof(ppid),
(void *)parent + offsetof(struct task_struct, pid));
// ❌ Problems:
// 1. offsetof is evaluated at compile time against the build headers, so it may not match the running kernel
// 2. No CO-RE, cannot work across kernel versions
// 3. Verbose code, error-prone
Method 2: bpf_core_read (Usable, but Verbose)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
struct task_struct *parent;
pid_t ppid;
// Need 2 steps, 5 lines of code
bpf_core_read(&parent, sizeof(parent), &task->real_parent); // ✅ CO-RE
bpf_core_read(&ppid, sizeof(ppid), &parent->pid); // ✅ CO-RE
// ⚠️ Drawbacks:
// 1. Need intermediate variable parent
// 2. Need two function calls
// 3. Manual size specification
Method 3: BPF_CORE_READ (Recommended!) ⭐⭐⭐⭐⭐
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
// Only need 1 line! ✅
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
// ✅ Advantages:
// 1. Concise and clear code
// 2. Full CO-RE support
// 3. Automatic type and size handling
// 4. Nested access in one call
Common Misconceptions
Misconception 1: Confusing bpf_core_read and the BPF_CORE_READ Macro
// ❌ Wrong: passes the field's value, but bpf_core_read expects the field's address
bpf_core_read(&ppid, sizeof(ppid), task->real_parent->pid);
// ✅ Correct: Use the BPF_CORE_READ macro, which takes the field path
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
Misconception 2: Directly Accessing Nested Fields in bpf_core_read
// ❌ Wrong: bpf_core_read doesn't support nested access
pid_t ppid;
bpf_core_read(&ppid, sizeof(ppid), &task->real_parent->pid); // ❌
// ✅ Correct: Use BPF_CORE_READ macro
ppid = BPF_CORE_READ(task, real_parent, pid); // ✅
Misconception 3: Using BPF_CORE_READ Where bpf_probe_read_user Should Be Used
// ❌ Wrong: BPF_CORE_READ is for kernel structures, cannot read user-space memory
const char *user_str; // assume this holds a user-space address, e.g. a syscall argument
char buf[64];
// BPF_CORE_READ(buf, user_str); // ❌ Wrong!
// ✅ Correct: Use bpf_probe_read_user_str to read user-space strings
bpf_probe_read_user_str(buf, sizeof(buf), user_str); // ✅
Selection Guide
Decision Tree:
Need to read memory data
│
├─ Reading user-space memory?
│ └─ Yes → Use bpf_probe_read_user / bpf_probe_read_user_str
│
└─ Reading kernel structures?
│
├─ Need CO-RE support?
│ ├─ No → Use bpf_probe_read (not recommended unless special reason)
│ └─ Yes ↓
│
├─ Accessing nested fields?
│ ├─ Yes → Use BPF_CORE_READ macro ⭐⭐⭐⭐⭐ (Recommended!)
│ └─ No → Use bpf_core_read or BPF_CORE_READ
│
└─ Conclusion: Default to BPF_CORE_READ macro!
Best Practice Recommendations
- Prefer BPF_CORE_READ macro
- Avoid using bpf_probe_read to read kernel structures
  - Only use for reading user-space memory
  - Or in scenarios that completely don't need CO-RE
- bpf_core_read function has few use cases
  - Only use when special control is needed
  - In most cases, BPF_CORE_READ macro is sufficient
4. Common Incorrect Usage Comparison
Error Example 1: Direct Pointer Access
// ❌ Wrong: Direct access (will cause verifier failure)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid = task->real_parent->pid; // Verifier error!
Error Reason:
- The value returned by bpf_get_current_task() is a plain integer to the verifier, so it cannot prove the dereference is safe
- Different kernel versions have different offsets
Error Example 2: Using bpf_probe_read
// ❌ Not recommended: Using bpf_probe_read (can work, but not best practice)
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
struct task_struct *parent;
pid_t ppid;
bpf_probe_read(&parent, sizeof(parent), &task->real_parent);
bpf_probe_read(&ppid, sizeof(ppid), &parent->pid);
Problems:
- Verbose code
- No CO-RE portability
- Need to manually handle each level of pointer
Correct Example
// ✅ Correct: Use BPF_CORE_READ
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
// ✅ Better (alternative to the two lines above): Use bpf_get_current_task_btf(), no cast needed
struct task_struct *task = bpf_get_current_task_btf();
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
2.6 bpf_get_current_task_btf() Function
This is a helper function that returns a BTF-typed pointer, safer than bpf_get_current_task().
Comparison of Two Ways to Get task_struct
| Method | Function | Return Type | Type Safety | Recommendation |
|---|---|---|---|---|
| Traditional way | bpf_get_current_task() | unsigned long (requires casting) | ❌ Weak | Not recommended |
| BTF way | bpf_get_current_task_btf() | struct task_struct * | ✅ Strong | Recommended |
Usage Example
// Method 1: Traditional way
struct task_struct *task = (struct task_struct *)bpf_get_current_task();
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
// Method 2: BTF way (Recommended)
struct task_struct *task = bpf_get_current_task_btf(); // already typed, no cast needed
pid_t ppid = BPF_CORE_READ(task, real_parent, pid);
Key Differences:
- bpf_get_current_task_btf() returns a pointer carrying BTF type information
- eBPF verifier can perform stricter type checking
- Better error messages and debugging experience
3. Practical Example: Monitoring open System Call
Complete eBPF Kernel Program
File: btf.bpf.c
// Example code would go here
User-Space Program
File: btf.c
// Example code would go here
4. Common Questions
Q1: What's the relationship between BTF and CO-RE?
Answer:
- BTF: Type metadata format (data format)
- CO-RE: Compile Once, Run Everywhere technology (application using BTF)
- Relationship: CO-RE depends on type information provided by BTF
Q2: Do all kernels support BTF?
Answer: No, the following conditions must be met:
- Linux kernel >= 5.2 (BTF support)
- Kernel compiled with CONFIG_DEBUG_INFO_BTF=y enabled
- Check method: ls /sys/kernel/btf/vmlinux
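A loader program can perform the same check programmatically before it tries to load a CO-RE object. The following is a minimal sketch using the POSIX `access()` call; the function names `btf_file_readable` and `kernel_has_btf` are hypothetical and chosen for this example.

```c
#include <unistd.h>

/* Path where the kernel exposes its own BTF, as used in the check above. */
#define VMLINUX_BTF_PATH "/sys/kernel/btf/vmlinux"

/* Returns 1 if the file at path exists and is readable, 0 otherwise. */
static int btf_file_readable(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Returns 1 if the running kernel exposes vmlinux BTF, 0 otherwise. */
static int kernel_has_btf(void)
{
    return btf_file_readable(VMLINUX_BTF_PATH);
}
```

If `kernel_has_btf()` returns 0, the loader can print a clear message about CONFIG_DEBUG_INFO_BTF instead of failing later with a less obvious load error.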
Q3: What's the difference between bpf_probe_read, bpf_core_read, and BPF_CORE_READ?
Answer: These three are different ways to read memory data in eBPF. For detailed comparison, please refer to Section 2.5.
Quick Summary:
| Feature | bpf_probe_read | bpf_core_read | BPF_CORE_READ |
|---|---|---|---|
| Type | Helper function | Function-like macro | Macro |
| CO-RE Support | ❌ No | ✅ Yes | ✅ Yes |
| Nested Access | ❌ Need multiple calls | ❌ Need multiple calls | ✅ One line |
| Type Safety | ❌ Weak | ✅ Strong | ✅ Strong |
| Recommendation | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Decision Guide:
- 🥇 Prefer BPF_CORE_READ macro: Read kernel structure fields (especially nested fields)
- 🥈 Occasionally use bpf_core_read function: Single field and need special control
- 🥉 Avoid bpf_probe_read: Only for reading user-space memory or scenarios that completely don't need CO-RE
Example:
// ⭐⭐⭐⭐⭐ Recommended: BPF_CORE_READ macro
pid_t ppid = BPF_CORE_READ(task, real_parent, pid); // One line!
// ⭐⭐⭐ Usable: bpf_core_read function
bpf_core_read(&parent, sizeof(parent), &task->real_parent);
bpf_core_read(&ppid, sizeof(ppid), &parent->pid); // Need two steps
// ⭐⭐ Not recommended: bpf_probe_read
bpf_probe_read(&parent, sizeof(parent), &task->real_parent);
bpf_probe_read(&ppid, sizeof(ppid), &parent->pid); // No CO-RE
Q4: Why sometimes use bpf_get_current_task(), and sometimes bpf_get_current_task_btf()?
Answer:
| Function | Return Type | Kernel Requirement | Recommendation |
|---|---|---|---|
| bpf_get_current_task() | unsigned long (requires casting) | >= 4.8 | High compatibility |
| bpf_get_current_task_btf() | struct task_struct * | >= 5.11 | Type safe |
Recommendation:
- If only supporting new kernels (>= 5.11): Use bpf_get_current_task_btf()
- If need to support old kernels: Use bpf_get_current_task() + casting
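If a user-space loader ships both variants, it needs some way to decide which one to load. One rough approach is to compare the kernel release string (the text `uname -r` prints) against a minimum version. The sketch below assumes hypothetical helper names `parse_release` and `release_at_least`, and takes the minimum version as a parameter rather than hard-coding it. Distribution kernels often backport features, so a version comparison is only a heuristic.

```c
#include <stdio.h>

/* Parse the leading "MAJOR.MINOR" from a release string such as
 * "6.1.0-13-amd64". Returns 0 on success, -1 if it cannot be parsed. */
static int parse_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/* Returns 1 if release >= want_major.want_minor, 0 if it is older,
 * and -1 if the release string cannot be parsed. */
static int release_at_least(const char *release, int want_major, int want_minor)
{
    int major, minor;

    if (parse_release(release, &major, &minor) != 0)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}
```

The release string itself can be obtained at run time from the `release` field filled in by `uname(2)`; the loader would then pick the bpf_get_current_task_btf() variant when the check succeeds and fall back to bpf_get_current_task() plus a cast otherwise.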