Kernel-v4: Description

Architecture, concepts, and programming model of the preemptive, event-driven kernel-v4

Kernel-v4 is a preemptive, event-driven kernel for ARM Cortex-M microcontrollers.

It provides:

actor-based concurrency with priority-driven scheduling;
background tasks in thread mode;
a tick service for periodic scheduling;
kernel alarms for microsecond-level timing (currently RP2350 only);
message passing between interrupt handlers and actors;
semaphores for mutual exclusion.

Architecture Overview

Key properties:

all kernel code runs in handler mode as interrupt handlers, except background tasks, which run in thread mode;
the NVIC serves as the hardware scheduler: each ready queue is backed by a software interrupt at a configured priority, and the NVIC's priority-based preemption determines which actor runs next;
actors follow a run-to-completion model – when scheduled, an actor's run procedure executes and returns; there is no blocking within a single run;
this eliminates the need for per-thread stacks – all handler-mode code shares the main stack (MSP);
the kernel itself maintains no persistent state: all state resides in actors, queues, and messages.

Actors

An actor is a data structure (ActorDesc) with a run procedure (ActorRun), representing a logical thread of control.

Actor lifecycle:

InitAct binds a run procedure and an identifier to the actor;
RunAct places the actor on a ready queue;
the kernel calls run when the actor reaches the head of the queue;
after run returns, the actor is no longer queued unless it re-subscribes.

Re-subscription options:

GetTick – wake at the next tick
GetMsg – wake when a message arrives
Submit – schedule for background execution

The ActorDesc record can be extended to carry per-actor state: counters, device handles, state arrays, or any application-specific data. To implement a state machine, an actor swaps its run procedure between activations. The K4sema example demonstrates this pattern: each actor cycles through claim, print, and release states by assigning different run procedures from a state array.

The id field identifies the actor, and the msg field holds the current message (if any) when run is called.
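
The lifecycle and the state-machine pattern can be sketched as a small host-side model. This is an illustration only: the kernel is not written in Python, and every name below (Actor, run_act, run_queue, the states list) is a stand-in chosen to echo the terms above, not the kernel's actual API.

```python
# Host-side model only. The kernel's real API is not shown in this document;
# the names below echo its terms and are assumptions made for illustration.
from collections import deque


class Actor:
    """Stands in for an extended ActorDesc."""

    def __init__(self, ident, states):
        self.id = ident           # InitAct binds an identifier ...
        self.states = states      # hypothetical state array of run procedures
        self.run = states[0]      # ... and an initial run procedure
        self.msg = None           # current message, unused in this sketch
        self.log = []             # application-specific per-actor data


def claim(act):
    act.log.append("claim")
    act.run = act.states[1]       # the next activation will execute show

def show(act):
    act.log.append("print")
    act.run = act.states[2]       # the next activation will execute release

def release(act):
    act.log.append("release")
    act.run = act.states[0]       # back to claim


ready_q = deque()

def run_act(act):
    """Model of RunAct: place the actor on the ready queue."""
    ready_q.append(act)

def run_queue():
    """Dequeue and run each queued actor; every run returns before the next starts."""
    while ready_q:
        act = ready_q.popleft()
        act.run(act)              # afterwards the actor is no longer queued


actor = Actor(1, [claim, show, release])
for _ in range(4):                # four separate activations
    run_act(actor)
    run_queue()
print(actor.log)
```

Each activation executes exactly one state and returns, so the log after four activations reads claim, print, release, claim. No state is kept on a stack between activations; it lives entirely in the actor record.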

Ready Queues

A ready queue is created with NewRdyQ and installed with InstallRdyQ, which binds it to a software interrupt number and an NVIC priority. The RunHandler procedure passed to InstallRdyQ calls RunQueue, which dequeues and runs all actors on that ready queue.

Multiple ready queues at different NVIC priorities enable priority levels:

a high-priority ready queue preempts a low-priority one through the NVIC's hardware preemption mechanism;
actors are placed on a ready queue with RunAct (initially) or indirectly through GetTick, GetMsg, or Submit.
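
The interplay of several ready queues can be illustrated with a toy model in which nested function calls stand in for interrupt preemption. The real NVIC does this in hardware; the Nvic and ReadyQ classes, their methods, and the priority numbers used here are assumptions for illustration (lower number means more urgent, as on Cortex-M).

```python
# Toy model: nested calls stand in for NVIC preemption. All names are
# illustrative assumptions, not the kernel's API.
from collections import deque


class ReadyQ:
    def __init__(self, prio):
        self.prio = prio              # lower number = more urgent
        self.actors = deque()


class Nvic:
    def __init__(self):
        self.queues = []              # installed queues, most urgent first
        self.active = []              # stack of priorities currently executing

    def install(self, queue):
        self.queues.append(queue)
        self.queues.sort(key=lambda item: item.prio)

    def run_act(self, queue, actor):
        queue.actors.append(actor)    # queue the actor and "pend" the interrupt
        self.dispatch()

    def dispatch(self):
        for queue in self.queues:
            current = self.active[-1] if self.active else None
            if current is not None and queue.prio >= current:
                break                 # only strictly more urgent queues preempt
            if queue.actors:
                self.active.append(queue.prio)
                while queue.actors:
                    queue.actors.popleft()(self)   # run to completion
                self.active.pop()


log = []
nvic = Nvic()
high_q, low_q = ReadyQ(prio=1), ReadyQ(prio=3)
nvic.install(low_q)
nvic.install(high_q)

def high_actor(n):
    log.append("high")

def low_actor(n):
    log.append("low start")
    n.run_act(high_q, high_actor)     # the high-priority queue preempts here
    log.append("low end")

nvic.run_act(low_q, low_actor)
print(log)
```

In this model the high-priority actor runs in the middle of the low-priority actor's run, which is the effect the NVIC provides. In the opposite direction, a low-priority actor made ready by a high-priority one waits until the high-priority handler has returned.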

Tick Service

Kernel.Install configures SysTick at a given interval (in milliseconds) and NVIC priority, and creates the per-core kernel context.

Tick-driven scheduling:

an actor calls GetTick to be woken at the next SysTick event;
the SysTick handler moves all waiting actors to their respective ready queues, where they are dispatched in the normal way.

Background tasks:

Kernel.Run enables SysTick and enters the background loop;
Submit places an actor on the background loop's actor queue with a tick count;
the background loop, running in thread mode, decrements each actor's counter on every tick and dispatches the actor when the counter reaches zero;
background tasks are suitable for non-time-critical work such as periodic monitoring or LED toggling.
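
The countdown behaviour of the background loop can be sketched as follows. The Background class, its method names, and the assumption that a tick count of n means "dispatch on the n-th tick after submission" are illustrative; the document does not specify these details.

```python
# Model of the background loop's tick countdown. Names and exact tick
# semantics are assumptions made for this sketch.
class Background:
    def __init__(self):
        self.ticks = 0                # ticks seen so far
        self.waiting = []             # [remaining_ticks, actor] entries

    def submit(self, actor, ticks):
        """Model of Submit: queue an actor with a tick count."""
        self.waiting.append([ticks, actor])

    def on_tick(self):
        """One pass of the loop per tick: decrement, then dispatch due actors."""
        self.ticks += 1
        due, still_waiting = [], []
        for entry in self.waiting:
            entry[0] -= 1
            (due if entry[0] <= 0 else still_waiting).append(entry)
        self.waiting = still_waiting
        for _, actor in due:
            actor(self)               # runs in "thread mode" in the real kernel


toggles = []

def led_actor(bg):
    toggles.append(bg.ticks)          # stand-in for toggling an LED
    bg.submit(led_actor, 3)           # re-subscribe for periodic execution

loop = Background()
loop.submit(led_actor, 3)
for _ in range(9):
    loop.on_tick()
print(toggles)
```

With a tick count of 3 and nine ticks, the model dispatches the actor on ticks 3, 6, and 9. The actor stays periodic only because it re-submits itself on each run.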

Event Queues and Messages

Event queues decouple producers and consumers. An event queue (EventQ) pairs a message queue with an actor queue.

Producer side (PutMsg):

if an actor is already waiting, the message is delivered directly and the actor is placed on its ready queue;
if no actor is waiting, the message is enqueued for later retrieval.

Consumer side (GetMsg):

if a message is available, it is delivered immediately and the actor is placed on its ready queue;
if not, the actor parks on the event queue's actor queue until a message arrives.

Message pools:

pre-allocate message objects at initialisation, avoiding heap allocation at run time;
a producer obtains a message with GetFromMsgPool, fills in its data, and sends it with PutMsg;
after the consumer's run procedure returns, the kernel automatically returns the message to its pool.

PutMsgAwaited is a variant for interrupt handlers: if no actor is waiting, the message is returned to its pool rather than enqueued. This prevents unbounded message accumulation when the consumer is not ready.
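
The producer and consumer rules, the message pool, and the PutMsgAwaited variant can be modelled in a few lines. This is a sketch under assumptions: a single ready queue is used for brevity, each message remembers its pool, and the behaviour of an empty pool (returning None here) is a guess, not documented behaviour.

```python
# Model of an event queue with a message pool. Names mirror the document's
# terms; signatures and the empty-pool behaviour are assumptions.
from collections import deque

ready = deque()                       # single ready queue, for brevity


class Msg:
    def __init__(self, pool):
        self.pool = pool              # the pool this message returns to
        self.data = None


class Actor:
    def __init__(self, name):
        self.name = name
        self.msg = None


class EventQ:
    def __init__(self):
        self.msgs = deque()           # message queue
        self.actors = deque()         # actor queue


def new_msg_pool(count):
    pool = deque()
    for _ in range(count):            # pre-allocate at initialisation
        pool.append(Msg(pool))
    return pool

def get_from_msg_pool(pool):
    return pool.popleft() if pool else None   # assumption: None when empty

def put_msg(evq, msg):
    if evq.actors:                    # an actor is waiting: deliver directly
        act = evq.actors.popleft()
        act.msg = msg
        ready.append(act)
    else:                             # otherwise keep the message for later
        evq.msgs.append(msg)

def get_msg(evq, act):
    if evq.msgs:                      # a message is available: deliver now
        act.msg = evq.msgs.popleft()
        ready.append(act)
    else:                             # otherwise park the actor
        evq.actors.append(act)

def put_msg_awaited(evq, msg):
    if evq.actors:
        put_msg(evq, msg)
    else:
        msg.pool.append(msg)          # nobody waiting: drop back into the pool

def run_ready(run):
    while ready:
        act = ready.popleft()
        run(act)
        act.msg.pool.append(act.msg)  # message goes back to its pool after run
        act.msg = None
```

A typical sequence in this model: a producer takes a message with get_from_msg_pool, sets its data, and calls put_msg; the consumer calls get_msg and is placed on the ready queue; run_ready executes it and returns the message to the pool. With put_msg_awaited and no parked actor, the message queue stays empty and the pool is immediately whole again.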

Semaphores

The Semaphores module provides binary semaphores for mutual exclusion among actors. A semaphore is built on top of an event queue with a single message that acts as the token.

Operations:

Init – creates the semaphore and places the token on the event queue, marking it as free;
Claim – calls GetMsg on the underlying event queue; if the token is available, the actor proceeds; if not, the actor parks until the token is released;
Release – calls PutMsg, which either unblocks a waiting actor or returns the token to the queue.

The K4sema example demonstrates this pattern: two actors compete for a shared terminal, each cycling through claim, print, and release states.
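
The token mechanism can be modelled directly. The sketch below collapses the underlying event queue into two deques and uses hypothetical Python names; it is meant to show the Claim and Release rules, not the Semaphores module's actual interface.

```python
# Model of a binary semaphore built on a one-token event queue.
# All names are illustrative assumptions.
from collections import deque

ready = deque()                       # actors that may proceed


class Semaphore:
    def __init__(self):
        self.tokens = deque(["token"])    # Init: token on the queue = free
        self.waiting = deque()            # actors parked on the event queue


def claim(sem, actor):
    """GetMsg on the underlying queue: take the token or park."""
    if sem.tokens:
        sem.tokens.popleft()
        ready.append(actor)
    else:
        sem.waiting.append(actor)

def release(sem):
    """PutMsg on the underlying queue: hand over the token or put it back."""
    if sem.waiting:
        ready.append(sem.waiting.popleft())
    else:
        sem.tokens.append("token")


sem = Semaphore()
claim(sem, "actor A")                 # token free: A proceeds
claim(sem, "actor B")                 # token taken: B parks
release(sem)                          # A releases: B is unblocked
release(sem)                          # B releases: token returns to the queue
print(list(ready), list(sem.tokens))
```

Because there is exactly one token, at most one actor holds the semaphore at any time. A release with a parked actor passes the token on directly rather than making the semaphore free.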

Kernel Alarms

The KernelAlarms module provides microsecond-level scheduling using hardware timer alarms on the RP2350.

Operations:

Init – binds an alarm to a specific timer and alarm number at a given interrupt priority;
Arm – schedules an actor to be woken at an absolute time (in microseconds);
Rearm – reschedules relative to the last trigger time, avoiding cumulative drift.

When the alarm fires, the interrupt handler delivers a message via the alarm's event queue, activating the waiting actor.
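
The difference between rescheduling from the current time and rescheduling from the last trigger time is simple arithmetic, sketched below. The period, the latency values, and both function names are made up for illustration; the first function stands for a hypothetical use of an absolute-time call with "now plus period", the second for the approach described for Rearm.

```python
# Arithmetic illustration of cumulative drift. Values and names are
# hypothetical; times are in microseconds.
PERIOD_US = 10_000


def deadlines_from_now(latencies_us):
    """Each new deadline is computed from the time the actor actually ran."""
    deadline, fired = PERIOD_US, []
    for latency in latencies_us:
        fired.append(deadline)
        now = deadline + latency          # actor runs a little after the alarm
        deadline = now + PERIOD_US        # the latency leaks into the schedule
    return fired


def deadlines_from_trigger(latencies_us):
    """Each new deadline is computed from the previous trigger time."""
    deadline, fired = PERIOD_US, []
    for _latency in latencies_us:
        fired.append(deadline)
        deadline += PERIOD_US             # activation latency has no effect
    return fired


latencies = [3, 2, 4]                     # made-up activation latencies
print(deadlines_from_now(latencies))
print(deadlines_from_trigger(latencies))
```

With these inputs the first schedule is 10000, 20003, 30005 and the second is 10000, 20000, 30000. The error in the first grows with every period, while the second stays on the original time grid.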

The K4print example uses a kernel alarm to toggle a GPIO pin with a 10 ms period, running at high priority alongside lower-priority UART output actors.

Data Protection

Shared queues – ready queues, actor queues, message queues, and message pools – require protection against concurrent access from interrupt handlers and the main loop.

Kernel-v4 uses the BASEPRI register. Each queue operation:

saves the current BASEPRI value;
raises BASEPRI to block all interrupts at or below the kernel's blocking priority level;
performs the queue manipulation (a few instructions for head and tail pointer updates);
restores the previous BASEPRI value.

Queues are always left in a consistent state. Messages are removed from a queue before execution and are then owned by the consumer until the kernel returns them to their pool.

BASEPRI and PRIMASK

Two Cortex-M mechanisms can provide mutual exclusion for queue operations:

PRIMASK disables all interrupts except NMI and HardFault. Setting and clearing it take a single instruction each (CPSID I and CPSIE I), which makes it the simplest approach, but it blocks all interrupts, including those unrelated to the kernel.
BASEPRI masks only interrupts at or below a configurable priority level. It requires slightly more instructions (save, load, restore), but allows higher-priority interrupts – those that do not interact with kernel queues – to continue firing during the critical section.

Kernel-v4 uses BASEPRI, which preserves responsiveness for interrupts with priorities above the blocking level. Both mechanisms are single-core only: each register masks interrupts only on its own core.
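
The masking rules and the save, raise, restore sequence can be modelled on the host. The Core class and critical_section helper are hypothetical; priorities are plain integers where a lower number is more urgent, and a BASEPRI value of 0 means no masking, as on Cortex-M. The hardware's encoding of priorities in the upper register bits is ignored.

```python
# Model of PRIMASK and BASEPRI masking. Names are assumptions; real code
# accesses these registers with MSR/MRS instructions, not Python.
from contextlib import contextmanager


class Core:
    def __init__(self):
        self.basepri = 0              # 0 = BASEPRI masking disabled
        self.primask = False

    def can_fire(self, prio):
        """Would a configurable interrupt at this priority be taken now?"""
        if self.primask:
            return False              # PRIMASK blocks every such interrupt
        return self.basepri == 0 or prio < self.basepri


@contextmanager
def critical_section(core, blocking_prio):
    saved = core.basepri              # save the current BASEPRI value
    core.basepri = blocking_prio      # block priorities at or below this level
    try:
        yield                         # the short queue manipulation goes here
    finally:
        core.basepri = saved          # restore the previous value


core = Core()
with critical_section(core, blocking_prio=3):
    inside = [core.can_fire(p) for p in (1, 3, 5)]
outside = [core.can_fire(p) for p in (1, 3, 5)]
print(inside, outside)
```

Inside the critical section only the priority-1 interrupt can still fire; afterwards all three can. The sketch does not handle nesting inside a section that already uses a stricter mask, which real code would have to consider.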

Performance

The K4perf example measures actor activation latency on an RP2350 running at 125 MHz (Cortex-M33). The following figures are indicative values measured with GPIO pins and an oscilloscope:

interrupt handler entry: approximately 230 ns (29 cycles);
full actor activation (interrupt, handler, message delivery, actor run): approximately 2.3 µs (281 cycles) with a minimal handler;
with a more realistic handler that includes GPIO event discrimination: approximately 3.1 µs;
flash cache cold start: approximately 29 µs on first execution, before the XIP cache is warm.

For comparison, the Cortex-M33 hardware interrupt latency is 12 cycles, or about 96 ns at 125 MHz.

See the K4perf example for details.
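
The cycle counts and times above are related by the clock frequency. A small conversion helper, with a hypothetical function name, makes the arithmetic explicit:

```python
# Convert CPU cycles to nanoseconds at the clock rate stated above.
CLOCK_HZ = 125_000_000                # 125 MHz


def cycles_to_ns(cycles, clock_hz=CLOCK_HZ):
    return cycles * 1_000_000_000 / clock_hz


for label, cycles in (("hardware latency", 12),
                      ("handler entry", 29),
                      ("full activation", 281)):
    print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.0f} ns")
```

At 125 MHz one cycle takes 8 ns, so 29 cycles correspond to 232 ns and 281 cycles to 2248 ns, consistent with the rounded figures of 230 ns and 2.3 µs.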

Real-Time Characteristics

Response times are consistent and reproducible across runs. This follows from:

the Cortex-M33 hardware interrupt model;
the NVIC's deterministic priority resolution;
the kernel's run-to-completion actor model with short, bounded critical sections;
the Astrobe compiler's code generation: the compiler produces compact code that directly represents the source program, with no hidden optimisations that could introduce timing variability.

The developer can read the source code and predict the generated code – a property that supports guaranteed response times for hard real-time applications.

Priority Layering

The use of BASEPRI for data protection enables a horizontal layering of the system for responsiveness:

above the blocking level – interrupt handlers that do not interact with kernel queues are never blocked by kernel critical sections, not even during a queue operation; these handlers can respond to external events with latency limited only by the Cortex-M hardware, in the sub-microsecond range at typical clock frequencies;
at or below the blocking level – actors and handlers interact through the kernel's queues and message-passing mechanisms, with response times in the low microsecond range.

This separation allows time-critical interrupt handlers to coexist with the kernel without interference, while the kernel provides the structuring benefits of actors, message passing, and scheduling for the remainder of the application.

Dual-Core Partitioning

On dual-core MCUs such as the RP2350, vertical partitioning extends this approach:

one core runs a pure interrupt-driven workload without the kernel, achieving the lowest possible response times bounded only by hardware interrupt latency;
the other core runs a kernel instance, providing actor-based concurrency and message passing with the response times shown above.

Each core operates independently with its own interrupt priorities and critical section state.

Multi-Core

Kernel-v4 does not support cross-core actor sharing. Each core runs its own independent kernel instance with its own ready queues, tick service, and background loop. This is a fundamental constraint: neither BASEPRI nor PRIMASK can control another core's interrupt masking, so the critical section mechanism protects only within a single core.

UART Stream Drivers

Two RP2350 driver modules demonstrate idiomatic kernel-v4 I/O patterns:

UARTstrKbw uses blocking busy-wait: the driver actor writes each character to the UART transmit FIFO, spinning until space is available. Higher-priority actors can still preempt the driver while it spins.
UARTstrKint uses interrupt-driven output: the driver actor writes the first character, then parks on a device event queue. When the UART transmit interrupt fires, the handler either writes the next character or delivers a message to wake the actor for the next string.

Both drivers follow the message-passing pattern: PutString obtains a message from a pool, fills it with text, and sends it to the driver's event queue. The driver actor receives the message and writes to the UART hardware.

The K4print example demonstrates both drivers.

Modules and Repository

Core modules, available on all supported platforms:

Kernel – actors, queues, message pools, tick service, background loop
Semaphores – binary semaphores for mutual exclusion
SysTick – SysTick configuration for the kernel tick

RP2350 extensions:

KernelAlarms – microsecond-level scheduling via hardware timer alarms
UARTstrKbw – UART output driver (busy-wait)
UARTstrKint – UART output driver (interrupt-driven)

Last updated: 13 March 2026