Please just use the lock. Don't make up your own locking crap. Really.
For almost a year now I've been working with the RIOT embedded operating system on a daily basis. At my previous job I fooled around with the Linux kernel. Nothing too serious, but since I had a LOT of free time on my hands I invested it in learning one or two things about locking. I have a weird kink for concurrent programming.
Anyway, one feature I'm really missing from the Linux kernel is an elegant way of signaling an event from either an interrupt routine or a kernel thread. In RIOT you have condition variables, which are semantically identical to pthread condition variables. However, you cannot use them to signal a condition from an interrupt routine. Well, technically you can, but it's almost impossible to do without running into a race condition.
But before we get to ISR signaling, let's have a look at a canonical usage example of a condition variable where both signaling and waiting happen in thread context. Below, take_measurement() uses a RIOT condition variable to signal that a sensor measurement has been performed, while wait_for_critical_value() waits for the measurement to happen, but only continues if the value is past a certain threshold. These two functions are, of course, called by different threads.
static uint64_t measurement;

mutex_t cond_lock = MUTEX_INIT;
cond_t cond = COND_INIT;

void take_measurement(void)
{
    mutex_lock(&cond_lock);
    measurement = measure();
    cond_broadcast(&cond);
    mutex_unlock(&cond_lock);
}

void wait_for_critical_value(void)
{
    mutex_lock(&cond_lock);
    while (measurement < THRESHOLD) {
        /* cond_wait() unlocks the mutex and locks it again upon return */
        cond_wait(&cond, &cond_lock);
    }
    mutex_unlock(&cond_lock);
}
The mutex bound to the condition variable plays two roles. Firstly, it ensures that writing the condition data is atomic with respect to reading it. Secondly, and most importantly, it guarantees that writing the condition data + signaling the condition variable is atomic with respect to reading the condition data + going to sleep. To put it differently, without the mutex the following sequence of events could happen:
1. measurement is read by the waiter and is below the threshold.
2. take_measurement() is called, writes a value above the threshold and calls cond_broadcast(), but no one is waiting.
3. The waiter calls cond_wait() and goes to sleep, missing the cond_broadcast() and possibly sleeping forever.

Now let's do the same thing with the signaling code in an interrupt routine:
static uint64_t measurement;

mutex_t cond_lock = MUTEX_INIT;
cond_t cond = COND_INIT;

void measurement_irq(void)
{
    measurement = measure();
    cond_broadcast(&cond);
}

void wait_for_critical_value(void)
{
    mutex_lock(&cond_lock);
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        cond_wait(&cond, &cond_lock);
    }
    mutex_unlock(&cond_lock);
}
It shouldn't take long to notice something is off: firstly, we're mixing a mutex and atomics; and secondly, the mutex is a dummy. We only lock and unlock it to satisfy the cond_wait() API, but we're not pairing it in the ISR. And we do this for a good reason: we may not call any function that can block in an interrupt routine, including mutex_lock(). Since measurement is an integer larger than the native word size (to the best of my knowledge, 64-bit MCUs are uncommon), we need the atomic call to ensure we're not reading some half-baked value.
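To make the tearing problem concrete, here's a minimal sketch of what such a 64-bit load boils down to on a 32-bit MCU: interrupts are briefly masked so the ISR cannot update the value between reading its two halves. RIOT's atomic_utils.h provides atomic_load_u64() for this; the helper below only illustrates the idea and is not the actual implementation.

#include <stdint.h>
#include "irq.h"

static uint64_t load_u64_illustrative(const volatile uint64_t *var)
{
    /* mask interrupts so measurement_irq() can't fire mid-read */
    unsigned state = irq_disable();
    uint64_t value = *var;
    /* restore the previous interrupt state */
    irq_restore(state);
    return value;
}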
Again, the main purpose of the mutex is to guarantee that the waiter doesn't miss the signal. But since we're not locking in the ISR, the mutex is useless and the race condition mentioned above can happen.
Another thing we can do is to ditch the condition variable and use an initially locked mutex instead:
static uint64_t measurement;

mutex_t cond_lock = MUTEX_INIT_LOCKED;

void measurement_irq(void)
{
    measurement = measure();
    /* we may unlock an already unlocked mutex */
    mutex_unlock(&cond_lock);
}

void wait_for_critical_value(void)
{
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        mutex_lock(&cond_lock);
    }
}
This code is race-free, because a thread cannot possibly see an unlocked mutex as locked (obviously), so we can never miss the signaling. Note that we may unlock an already unlocked mutex, so the MUTEX_INIT_LOCKED is not strictly necessary, but it saves the waiter an extra mutex_lock().
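As a usage sketch (the thread name, priority and setup here are illustrative, not part of the original code), a waiter built on this pattern would be spawned like any other RIOT thread:

#include "thread.h"

static char waiter_stack[THREAD_STACKSIZE_DEFAULT];

/* thread body: block until the critical value is reached */
static void *waiter_thread(void *arg)
{
    (void)arg;
    wait_for_critical_value();
    /* ... react to the critical measurement ... */
    return NULL;
}

int main(void)
{
    thread_create(waiter_stack, sizeof(waiter_stack),
                  THREAD_PRIORITY_MAIN - 1, 0,
                  waiter_thread, NULL, "waiter");
    /* ... configure the sensor so that measurement_irq() fires ... */
    return 0;
}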
However, we have another problem: what if there are multiple waiters, i.e. what if wait_for_critical_value() is called from multiple threads? Picture this:

1. measurement is below the threshold.
2. wait_for_critical_value() is called by thread A, which blocks on the locked mutex.
3. wait_for_critical_value() is called by thread B, which also blocks on the mutex.
4. measurement_irq() fires, sets measurement above the threshold and unlocks the mutex, waking up only one of the two threads.

Not good. But we can fix this too, by unlocking the mutex after we're satisfied with the condition:
void wait_for_critical_value(void)
{
    while (atomic_load_u64(&measurement) < THRESHOLD) {
        mutex_lock(&cond_lock);
    }
    /* signal other potential waiters */
    mutex_unlock(&cond_lock);
}
This works, but it's not very elegant. Also, every time a thread wants to wait for the condition, it will most likely have to lock the mutex twice. And there's another, more subtle bug: we're assuming that each waiter waits for the same condition data. But what happens if a thread with lower priority waits for some different condition data? For example:
void wait_for_half_the_critical_value(void)
{
    while (atomic_load_u64(&measurement) < THRESHOLD / 2) {
        mutex_lock(&cond_lock);
    }
    /* signal other potential waiters */
    mutex_unlock(&cond_lock);
}
As long as there's a higher-priority thread waiting in wait_for_critical_value(), the lower-priority thread calling wait_for_half_the_critical_value() will never wake up: whenever the ISR unlocks the mutex, the higher-priority thread wins the lock, sees that its own condition is still unmet and immediately locks the mutex again, swallowing the wake-up before the lower-priority thread gets a chance to run.
To be honest, I never had to wait on a condition variable with different condition data checks, but it's nevertheless a valid use case, and one that the condition variable API handles well. In any case: at this point, it's pretty obvious that we're abusing the mutex semantics.
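For comparison, here's a sketch of the mixed-predicate case done with a real condition variable, reusing cond and cond_lock from the first, thread-only example: cond_broadcast() wakes every waiter, and each one re-checks its own predicate under the mutex, so no waiter can starve another. Of course, this only works while the signaling happens in thread context, which is exactly the limitation we started with.

void wait_for_half_the_critical_value(void)
{
    mutex_lock(&cond_lock);
    while (measurement < THRESHOLD / 2) {
        /* every cond_broadcast() wakes us up to re-check our predicate */
        cond_wait(&cond, &cond_lock);
    }
    mutex_unlock(&cond_lock);
}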
In the following part, we're going to look at how the Linux kernel solves this, and (try to) implement something similar in RIOT.