Programming my own locking and why I shouldn't have

Part 2: inspiration


Linux's wait queue

In the previous post I demonstrated the limitations of the condition variable in RIOT OS and teased that the Linux kernel has a solution. Indeed, Linux has the wait queue, which serves the same purpose as the condition variable, except that it does not require a mutex to make [reading the condition data + going to sleep] atomic w.r.t. [writing the condition data + signaling the condition variable]. In fact, with a wait queue that sequence is not even required to be atomic, and it isn't.
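To make this concrete, here is a minimal sketch of how a piece of kernel code would use Linux's high-level wait queue API, wait_event() and wake_up() from <linux/wait.h>. cond_data and VALUE are placeholders carried over from the pseudocode used throughout this post, not kernel API:

#include <linux/wait.h>

#define VALUE 1                          /* placeholder condition value */

static DECLARE_WAIT_QUEUE_HEAD(wq);      /* the wait queue */
static int cond_data;                    /* the condition data */

/* waker -- may run in a thread or in interrupt context */
static void waker(void)
{
    cond_data = VALUE;                   /* publish the condition data ...  */
    wake_up(&wq);                        /* ... then wake the threads on wq */
}

/* waiter -- thread context */
static void waiter(void)
{
    /* wait_event() re-checks the condition around every sleep;
     * note that there is no mutex anywhere in this picture */
    wait_event(wq, cond_data == VALUE);
}

The rest of this post is essentially about why this is safe without a mutex.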

Previously we demonstrated the need for atomicity by looking at some examples. Now let's first have a look at a flattened-out pseudocode comparison of condition variable and wait queue usage. By flattened out I mean that the locking code is inlined in the application code instead of being neatly packed away in a function; we'll see in a second why. Also, we assume the following: every single line of pseudocode executes atomically, and ISRs as a whole are not atomic, i.e. their lines may interleave with those of other threads.

So, for example, a thread executing the lines [A, B] and another thread executing [C, D] in parallel may result in [A, C, D, B] or [A, C, B, D], but not in [B, A, C, D]. The reason we don't consider ISRs to be atomic is that we can replace an ISR with a thread and still have the same concurrency guarantees.

RIOT condition variables without mutex (ISR signaling):


WAKER:
  cond_data = VALUE
  wake_up_waiters(cond_var)

WAITER:
  while (cond_data != VALUE):
      enqueue(cond_var)
      yield()
  dequeue(cond_var)
      

The first thing to notice is that we left out the mutex completely. Since the mutex is not locked in the ISR, it serves no purpose other than to clutter our representation.
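For reference, here is roughly what the non-flattened, ISR-signaling pattern looks like with RIOT's actual cond/mutex API. Treat it as a sketch; cond_data and VALUE are again placeholders, and the key point is that the mutex only synchronizes waiters with each other, since an ISR cannot lock it:

#include "mutex.h"
#include "cond.h"

#define VALUE 1                      /* placeholder condition value */

static mutex_t lock = MUTEX_INIT;
static cond_t cv;                    /* cond_init(&cv) somewhere at startup */
static volatile int cond_data;

void some_isr(void)
{
    cond_data = VALUE;               /* the ISR cannot lock the mutex ...    */
    cond_broadcast(&cv);             /* ... so it signals without holding it */
}

void waiter_thread(void)
{
    mutex_lock(&lock);
    while (cond_data != VALUE) {
        cond_wait(&cv, &lock);       /* unlock, sleep, re-lock */
    }
    mutex_unlock(&lock);
}

The condition check in the while loop and the go-to-sleep inside cond_wait() are therefore not atomic with respect to the ISR, which is the kind of race discussed below.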

The race condition we discussed previously happens if the waiter misses both the condition data update and the condition variable signaling:


WAKER:
  cond_data = VALUE              # (2)
  wake_up_waiters(cond_var)      # (3) no waiter is enqueued yet, so the wake-up is lost

WAITER:
  while (cond_data != VALUE):    # (1) reads the old value and decides to wait
      enqueue(cond_var)          # (4) too late, the wake-up already happened
      yield()                    # (5) sleeps forever; both the update and the wake-up were missed
  dequeue(cond_var)              # never reached


Linux kernel wait queue:


WAKER:
  cond_data = VALUE
  wake_up_waiters(wq)

WAITER:
  enqueue(wq)
  while (cond_data != VALUE):
      yield()
      enqueue(wq)
  dequeue(wq)
      

The waker code didn't change at all. The waiter code, however, swaps the positions of the condition data check cond_data != VALUE and enqueue(wq): the condition is now always checked after the current thread has been enqueued in the wait queue. In other words, the waiter "queries" the waker's two events in reverse order. The implication is that the waiter can miss either the condition data update or the wake-up, but never both. This concept is neatly explained in the SMP BARRIER PAIRING section of the Linux kernel's memory-barriers documentation.
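This is also exactly the structure of the kernel's open-coded wait loop. Here is a minimal sketch with the real API (prepare_to_wait(), schedule(), finish_wait()); as before, cond_data and VALUE are placeholders, and, roughly speaking, the task-state handshake between prepare_to_wait() and wake_up() is what provides the barrier pairing mentioned above on real SMP hardware:

#include <linux/wait.h>
#include <linux/sched.h>

#define VALUE 1                      /* placeholder condition value */

static DECLARE_WAIT_QUEUE_HEAD(wq);
static int cond_data;

static void waker(void)
{
    cond_data = VALUE;
    wake_up(&wq);
}

static void waiter(void)
{
    DEFINE_WAIT(wait);

    for (;;) {
        /* enqueue first and mark ourselves as about to sleep ... */
        prepare_to_wait(&wq, &wait, TASK_UNINTERRUPTIBLE);
        /* ... only then check the condition */
        if (cond_data == VALUE)
            break;
        /* schedule() only sleeps if no wake_up() has made us
         * runnable again since prepare_to_wait() */
        schedule();
    }
    finish_wait(&wq, &wait);
}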

Let's have a look at some possible execution scenarios to see why this works.

Waiter sees cond_data == VALUE and doesn't enter the loop:


WAKER:
  cond_data = VALUE              # (1)
  wake_up_waiters(wq)            # (2) wakes nobody, but that's fine

WAITER:
  enqueue(wq)                    # (3)
  while (cond_data != VALUE):    # (4) already sees the new value, skips the loop
      yield()
      enqueue(wq)
  dequeue(wq)                    # (5)


yield() will return because the thread is back in the runqueue:


WAKER:
  cond_data = VALUE              # (3)
  wake_up_waiters(wq)            # (4) the waiter is already enqueued, put it back into the runqueue

WAITER:
  enqueue(wq)                    # (1)
  while (cond_data != VALUE):    # (2) old value, enter the loop; (7) new value, exit the loop
      yield()                    # (5) returns because the thread is back in the runqueue
      enqueue(wq)                # (6)
  dequeue(wq)                    # (8)


wake_up_waiters(wq) will put the waiter back into the runqueue:


WAKER:
  cond_data = VALUE              # (3)
  wake_up_waiters(wq)            # (5) puts the sleeping waiter back into the runqueue

WAITER:
  enqueue(wq)                    # (1)
  while (cond_data != VALUE):    # (2) old value, enter the loop; (8) new value, exit the loop
      yield()                    # (4) goes to sleep, (6) returns after the wake-up
      enqueue(wq)                # (7)
  dequeue(wq)                    # (9)


I'll leave it as an exercise for the reader to come up with other scenarios and convince themselves that the waiter can never miss the condition. In the next episode of this series, I'm going to implement a Linux-style wait queue for RIOT OS.
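As a teaser, here is a purely hypothetical sketch of what a waiter-side helper could look like, following the enqueue-before-check structure above. None of these names (wait_queue_t, wq_enqueue(), wq_yield(), wq_wake_all(), WQ_WAIT()) are existing RIOT API; they are invented here for illustration:

/* Hypothetical interface, invented for illustration only. */
typedef struct wait_queue wait_queue_t;

void wq_enqueue(wait_queue_t *wq);   /* put the calling thread on the queue, keep running */
void wq_dequeue(wait_queue_t *wq);   /* take the calling thread off the queue, if still on it */
void wq_yield(void);                 /* sleep, unless a wake-up arrived since wq_enqueue() */
void wq_wake_all(wait_queue_t *wq);  /* make every queued thread runnable; callable from an ISR */

/* The waiter-side pattern from this post, wrapped in a macro so that
 * the condition expression is re-evaluated on every iteration. */
#define WQ_WAIT(wq, cond)       \
    do {                        \
        wq_enqueue(wq);         \
        while (!(cond)) {       \
            wq_yield();         \
            wq_enqueue(wq);     \
        }                       \
        wq_dequeue(wq);         \
    } while (0)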