During this week's Hackathon at Grafana Labs, I tried to get Delve's eBPF tracing to work using a remote-controlled agent. To do this, I took their trace.bpf.c eBPF program, which extracts the parameters passed to a function and writes them to a ring buffer.
Turns out, just compiling this didn't work. The verifier always refused to load the program with the following error:
dereference of modified ctx ptr r0

I found the article Ebpf: Dereference of Modified Ctx Ptr Disallowed, which told me why the verifier refuses to load the program (arbitrary memory accesses at offsets from the ctx pointer are not allowed) but doesn't offer any fixes. The most likely cause for these issues is the compiler optimizing the output in a way that no longer passes verification.
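To make the restriction concrete, here is a minimal sketch of the two access patterns (my own illustration, not from the article, assuming the kernel-style struct pt_regs with an ax field, e.g. from vmlinux.h):

// Illustration only: what the verifier accepts vs. rejects.
__always_inline void sketch(struct pt_regs *ctx, void *dest) {
    // Fine: a load at a constant offset directly through ctx.
    __builtin_memcpy(dest, &ctx->ax, sizeof(ctx->ax));

    // Rejected if the compiler emits it this way: the ctx pointer is
    // modified first and then dereferenced at offset 0 -- the
    // "dereference of modified ctx ptr" pattern.
    void *reg = (char *)ctx + __builtin_offsetof(struct pt_regs, ax);
    __builtin_memcpy(dest, reg, sizeof(ctx->ax));
}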
By removing different parts of the module, I narrowed the issue down to this switch statement:
__always_inline void get_value_from_register(struct pt_regs *ctx, void *dest,
                                             int reg_num) {
    switch (reg_num) {
    case 0: // RAX
        __builtin_memcpy(dest, &ctx->ax, sizeof(ctx->ax));
        break;
    case 1: // RDX
        __builtin_memcpy(dest, &ctx->dx, sizeof(ctx->dx));
        break;
    // ...
    }
}

Whenever this was included in the program, it refused to load. In this case the ctx pointer isn't actually modified, it's just accessed at different offsets. So how come this still caused the verifier to fail?
In a message on the iovisor-dev mailing list I found my answer! Yonghong Song explains:
Now I remembered that we had this issue before in bcc. it is a compiler optimization likes this:
if (...) *(ctx + 60) else *(ctx + 56)

The compiler translates it to

if (...) ptr = ctx + 60 else ptr = ctx + 56
*(ptr + 0)
In my case, this would look like this (in pseudo-C code):
reg = ctx
switch (reg_num) {
case 0: // RAX
    reg += ax_offset
    break;
case 1: // RDX
    reg += dx_offset
    break;
}
__builtin_memcpy(dest, reg, some_const); // all fields have the same size, so the per-case copies collapse into this one call
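Written out as real C, the merged body would look roughly like this (a reconstruction for illustration, not actual compiler output):

// Reconstruction: the switch now only selects a pointer, and a single
// memcpy below it loads through that pointer -- exactly the
// "modified ctx ptr" access the verifier rejects.
void *reg = NULL;
switch (reg_num) {
case 0: // RAX
    reg = &ctx->ax;
    break;
case 1: // RDX
    reg = &ctx->dx;
    break;
}
__builtin_memcpy(dest, reg, sizeof(ctx->ax));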
Luckily, Yonghong also offers a solution. By sprinkling __asm__ __volatile__("" : : : "memory"); after the offending memory operations, we stop the compiler from moving them across that point, so each branch keeps its own direct ctx access instead of being merged into a single copy through a modified pointer. This avoids the optimization taken by LLVM and results in a valid program!
__always_inline void get_value_from_register(struct pt_regs *ctx, void *dest,
                                             int reg_num) {
    switch (reg_num) {
    case 0: // RAX
        __builtin_memcpy(dest, &ctx->ax, sizeof(ctx->ax));
        __asm__ __volatile__("" : : : "memory");
        break;
    case 1: // RDX
        __builtin_memcpy(dest, &ctx->dx, sizeof(ctx->dx));
        __asm__ __volatile__("" : : : "memory");
        break;
    // ...
    }
}
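For completeness, here is a sketch of how such a helper might be wired into a uprobe handler. The section name, probe target, and the bpf_printk call are my own illustration, not Delve's actual agent code:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

SEC("uprobe/example") // hypothetical probe target
int read_rax(struct pt_regs *ctx) {
    u64 value = 0;
    // DWARF register number 0 maps to RAX on x86-64.
    get_value_from_register(ctx, &value, 0);
    bpf_printk("rax = %llu", value);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";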