RSoC: Implementing ptrace for Redox OS - part 5

By jD91mZM2 on Monday, July 22, 2019

Introduction

This week I’ve decided to skip trying to get GDB working for now (there are so many issues it’ll take forever to solve them), and instead decided to finally give focus to the final concerns I had about ptrace. Most changes this week was related to getting decent behavior of child processes, although the design feels… suboptimal, somehow (not sure why), so I feel I must be able to improve it better later.

Another change was security: Tracers running as a non-root user can now in addition to only tracing processes running as the same user, only trace processes that are directly or indirectly children of the tracer. In the future this can easily be allowed with some kind of capability, but currently in Redox there isn’t a capability-like system other than the simple (but really powerful) namespacing system which sadly I don’t think can be used for this.

Catching child processes with ptrace

The first and most pressing issue was that there needed to be a way to catch child processes (also threads, Redox makes no distinction) from ptrace. It should be possible to both stop the child process, or let it keep running uninterupted.

My design revolves around an event system: When a child process is created it sends an event with the PID, as well as pauses the child until we’ve determined if we want to trace it. Events interrupt any wait, but returns Ok(0) to signify that the breakpoint may not have been reached yet. Then one can read from the proc:<pid>/trace file handle to obtain a few events at a time, which will also clear them. Setting a breakpoint while an event is not yet read, will not wait for that breakpoint to be reached, but rather return Ok(0) immediately.

With this design, you will be able to write code like this:

let mut written = tracer.write(&[PTRACE_SYSCALL])?;
while written == 0 {
    let mut event = PtraceEvent::default();
    // Assert we read exactly one event, as we know there must be at
    // least one because of `written == 0`, and we only have storage
    // for one event (room for optimization: you *can* read more at
    // once)
    assert_eq!(tracer.read(&mut event)?, mem::size_of::<PtraceEvent>());

    // TODO: Handle event

    // Retry the wait, the tracer is still running at this point and
    // may or may not have reached the breakpoint we set.
    written = tracer.write(&[PTRACE_WAIT])?
}

The implementation was actually quite simple, which was actually somewhat of a goal for this design! (Redox is a microkernel, remember?)

Events are sent to a VecDeque that handles them. If the queue was previously empty, send a EVENT_READ notification to the event: scheme. Either way, notify the tracer file handle waiting for a breakpoint to be reached.

I love the WaitCondition structure, it makes blocking and notifying so easy :)

Here you can see the code

Catching signals

Another change I made was related to signals. It’s tricky to come up with a design that just feels right, and I didn’t really succeed here. Let’s start with the critera though. I’m not very used to ptrace so I am far from an expert in even the simple task of giving criteria, but I’ll start with the most basic: The design has to be able to trace system calls and instructions ran in signal handlers. The design also has to be able to obtain the actual signal number, as well as run the program normally up to a signal.

My design is requires 2 new constants:

PTRACE_EVENT_SIGNAL is an event similar to the child-process event above, but on each signal we only send the signal number.
PTRACE_SIGNAL for running the program until any signal is reached, which in case it stops before the signal handler is invoked. So if the child sends a signal to itself, it will be somewhere inside the SYS_KILL syscall.

Turns out these simple criteria are still pretty tricky to meet for two reasons

Firstly, as a signal may happen inside a syscall, they will break the unwritten rule that each syscall will raise 2 breakpoints in order. If we take the example above about a program killing itself, we’ll see the following behavior

(pre-)syscall: SYS_KILL
(pre-)syscall: <inside the signal handler>
(post-)syscall: <inside the signal handler>
...
(pre-)syscall: SYS_SIGRETURN
(post-)syscall: SYS_KILL

This is annoying mostly because the program can’t actually detect whether it’s the event is pre or post syscall. So it’ll have to make an intelligent guess using the signal event, which I guess could work. But this is suboptimal and I have yet to solve this somehow.

Secondly, it is ~~possible~~, nay, probable that a signal is handled using either SIG_DFL or SIG_IGN. Both of these are handled in the kernel, and should probably not let you control. Plus, even I wanted to allow this, the current way of setting breakpoints only works on user-triggered syscalls (which is probably for the best). So no signal breakpoint will be caused then, only a signal event. And you can’t abort the signal.

I think something to set as a goal with signals is to hit two birds with one stone and use them for handling the int3 instruction which is used by debuggers to set breakpoints. That instruction causes a SIGTRAP signal, something which I think could be used. Somehow there has to be a way to inject code into a program (on Linux you search memory maps for free space and inject machine code there, see this awesome article)

Once again, I’d be extremely thankful for any kind of feedback, even negative. I’ll go read the Linux man page on ptrace a few times now - we’re at that point.

RFC changes