In other words, it swaps GS directly at the start of the syscall
handler, and then swaps it back at the end of the syscall handler.
I cannot tell for sure why this is necessary, but it is probably
because some interrupt handler would otherwise execute swapgs in the
wrong order.
The reason for these types of rewrites is that more recent Rust
compilers have started to deprecate naked functions that consist of
more than a single asm block, as they can trigger all sorts of UB.
Previously there was a triple fault, due to a combination of reasons
(e.g. rsp and rbp being ordered differently in the struct and in the
assembly).
Now, the locks will be held __all the way until the new context has
been switched to__, which completely eliminates any possibility that
the "pcid fault" originates here.
While I am unsure whether this will work, it could also be an
opportunity to remove CONTEXT_SWITCH_LOCK entirely.
This is due to a warning in more recent compilers, which forbid
anything but a single inline assembly block in naked functions. It does
unfortunately triple fault right now, but I hope to be able to fix it
soon.
So, when I first introduced io_uring, it was not compiled with the
`multi_core` kernel feature, mainly to make development easier (or so I
thought). However, since io_uring allows multiple simultaneous system
calls, we can no longer make the in-kernel contexts block, for example
when receiving a message from a pipe, if there can be multiple such
requests in flight simultaneously.
This has required me to change WaitCondition to allow multiple
simultaneous tasks; this however introduces a potential race condition:
since a future can only return Pending, and cannot block directly
before releasing the lock (condvar logic), we need some way to make
sure that nothing happens between the point where the context finds out
that it has to wait, and the actual waiting. If a message is pushed in
between, and the waker (Context::unblock) is called just before the
context was going to block itself, then we miss the message and
potentially cause a deadlock.
Fortunately, in order to block and unblock contexts, we need to
exclusively lock the context. So, to ensure that waking while running
is no longer a no-op, we can introduce a "wake flag", which is set only
if the context is currently running and Runnable.
But this still caused all kinds of weird hard-to-debug problems, with
arbitrary CPU exceptions and possibly memory corruption. The reason for
this is that the context-switching logic uses really unsafe operations,
which is why context switching (at the moment) requires an exclusive
lock. Before this commit, it would modify the `running` field after the
lock had been released, which can obviously cause a data race when the
regular context-waker code that runs within a system call locks the
context but not the global switching lock.
The solution was to make sure that the locks were held all the way
until the actual switch, which is done in assembly. There can still be
a race condition here, since the code modifies memory containing
registers after the lock has been released; even though that memory may
be behind &mut on another context, which can be UB, it has not
contributed to any actual bugs... yet.
* I have not yet done rigorous testing, but it appears to work well
  enough; I have not encountered the bug after around 10 tries.
This solves a bug that allowed a process to be the target of a futex
wakeup call issued from a different address space, even though the
futex in question belonged to another address space entirely!