TinyML vs MCU Agents — Not the Same Thing

// last reviewed 2026-05-22 · Marcus Rüb

TinyML is the practice of running machine learning inference on microcontrollers and other deeply constrained devices. An MCU agent is an architectural pattern, a firmware design, that may include TinyML inference as one of several components alongside sensor I/O, state management, communication, and off-device delegation.

Using these terms interchangeably is a common source of confusion. A device running TinyML is not automatically an MCU agent. An MCU agent does not necessarily use TinyML at all.

How does TinyML actually work?

TinyML focuses on deploying trained ML models, primarily neural networks, onto devices with tightly constrained memory, compute, and power budgets.

The enabling stack typically involves:

  1. Model training on a GPU workstation or cloud (PyTorch, TensorFlow, JAX).
  2. Model compression: pruning, knowledge distillation, architecture search (MobileNet, EfficientNet-Lite, custom CNN).
  3. Quantization: converting float32 weights to int8 (post-training quantization or quantization-aware training).
  4. Conversion: exporting to TFLite flatbuffer (.tflite) or ONNX.
  5. Runtime: TensorFlow Lite Micro (TFLM), microTVM, Edge Impulse EON, or CMSIS-NN kernels.
  6. Deployment: flashed as part of firmware; inference called as a function.
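The arithmetic behind step 3 can be sketched in a few lines. This is a minimal host-side illustration of affine int8 quantization in plain Python, not the implementation used by any particular converter. The function names `quant_params`, `quantize`, and `dequantize` and the sample weights are hypothetical, and real toolchains apply the same idea per tensor or per channel with calibration data.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Derive an affine scale and zero point that map the value range onto int8."""
    lo = min(min(values), 0.0)  # include 0.0 so that zero is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:
        return 1.0, 0  # degenerate all-zero tensor
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(q - zero_point) * scale for q in codes]


# Hypothetical float32 weights standing in for one small tensor.
weights = [-0.62, -0.1, 0.0, 0.33, 0.91]
scale, zero_point = quant_params(weights)
codes = quantize(weights, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
```

Each restored value differs from its original by at most one quantization step (`scale`), which is the accuracy cost traded for a 4x reduction in weight storage.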

The output of TinyML is a classification label or regression value for a given input window. It tells you what the sensor data looks like; it does not tell you what to do about it.
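To make the shape of that output concrete, here is a small sketch that turns one vector of class scores into a label and a score. The helper `top_label` and the class names are assumptions for illustration; a runtime such as TFLM hands back the raw output tensor and leaves this step to the application.

```python
def top_label(scores, labels):
    """Return the best-scoring label and its score from one inference result."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


LABELS = ["silence", "yes", "no"]  # hypothetical keyword classes
label, score = top_label([0.05, 0.85, 0.10], LABELS)
```

The result is a description of the input window (`"yes"` with score 0.85 here) and nothing more; any decision about what to do with it belongs to the surrounding firmware.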

How does an MCU agent relate to TinyML?

An MCU agent is the broader firmware architecture. TinyML is a capability that can live inside it:

┌─────────────────────────────────────┐
│           MCU Agent                 │
│  ┌─────────────┐  ┌──────────────┐  │
│  │ Sensor I/O  │→ │ Feature Ext. │  │
│  └─────────────┘  └──────┬───────┘  │
│                          │          │
│                   ┌──────▼───────┐  │
│                   │  TinyML      │  │ ← OPTIONAL COMPONENT
│                   │  Inference   │  │
│                   └──────┬───────┘  │
│                          │          │
│  ┌────────────────────┐  │          │
│  │  Agent State       │←─┘          │
│  │  Machine           │             │
│  └──────────┬─────────┘             │
│             │                       │
│  ┌──────────▼─────────┐             │
│  │  Communication     │             │
│  │  (MQTT / CoAP)     │             │
│  └────────────────────┘             │
└─────────────────────────────────────┘

An MCU agent without TinyML uses threshold logic, rule-based conditions, or classical DSP features (RMS, FFT peak, zero-crossing rate). This is entirely valid — many production agents have no ML at all.
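Two of the classical features named above, plus a threshold rule on top of them, fit in a short sketch. It is written in Python for readability; firmware would implement the same arithmetic in C or C++. The function names, the 0.5 RMS limit, and the synthetic test signal are all assumptions chosen for illustration.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one sample window."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return crossings / (len(samples) - 1)


def vibration_alarm(samples, rms_limit=0.5):
    """Rule-based decision: no model, just a threshold on one feature."""
    return rms(samples) > rms_limit


# Synthetic 50 Hz sine sampled at 1 kHz for 100 ms (hypothetical test signal).
window = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(100)]
alarm = vibration_alarm(window)
```

A unit-amplitude sine has an RMS of about 0.707, so this window trips the hypothetical 0.5 limit without any model involved.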

An MCU agent with TinyML uses the inference result as one input to its state machine. The model output is not the agent’s action; it is one signal the state machine considers.
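One way to picture this is a state machine that requires several consecutive confident detections, and a healthy sensor, before it raises an alarm. The sketch below is a host-side Python model under stated assumptions: the states, the `"anomaly"` label, the 0.8 score floor, and the three-confirmation rule are all hypothetical, and production firmware would express the same transitions in C or C++.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    SUSPECT = auto()
    ALARM = auto()


@dataclass
class Inference:
    label: str    # e.g. "anomaly" or "normal" (hypothetical class names)
    score: float  # model confidence in [0, 1]


class AgentStateMachine:
    """Treats the model output as one input; the transition rules decide the action."""

    def __init__(self, min_score=0.8, confirmations=3):
        self.state = State.IDLE
        self.min_score = min_score
        self.confirmations = confirmations
        self._streak = 0

    def step(self, inference, sensor_ok=True):
        # A failed sensor self-check overrides whatever the model reports.
        hit = (
            sensor_ok
            and inference.label == "anomaly"
            and inference.score >= self.min_score
        )
        self._streak = self._streak + 1 if hit else 0
        if self._streak >= self.confirmations:
            self.state = State.ALARM
        elif self._streak > 0:
            self.state = State.SUSPECT
        else:
            self.state = State.IDLE
        return self.state
```

A single high-scoring inference only moves the agent to `SUSPECT`; the alarm fires on the third consecutive hit, and any miss or sensor fault resets the streak.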

Side-by-side comparison

| Dimension | TinyML | MCU Agent |
|---|---|---|
| Definition | Running ML inference on constrained hardware | Firmware architecture: perceive, decide, act, communicate |
| Scope | Inference layer only | Full system design |
| ML requirement | Required by definition | Optional; many agents use no ML |
| Communication | Not part of TinyML | Core component (MQTT, CoAP, HTTP) |
| State management | Not part of TinyML | Core component (FSM, policy params) |
| Off-device delegation | Not part of TinyML | Core component when local logic is insufficient |
| Tooling | TFLM, microTVM, Edge Impulse, CMSIS-NN | TFLM + RTOS + MQTT client + state machine |
| Primary deliverable | Inference result (label, score) | System behavior (act, publish, delegate) |
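The off-device delegation row can be illustrated with a simple routing rule. This is only one possible policy, sketched in Python under assumed thresholds: confident detections are handled locally, clearly negative ones are dropped, and the ambiguous band in between is handed to an off-device service. The names `Decision` and `decide` and the 0.9 and 0.4 cut-offs are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    IGNORE = "ignore"
    ACT_LOCALLY = "act_locally"
    DELEGATE = "delegate"


def decide(score, act_above=0.9, ignore_below=0.4):
    """Route one detection score: confident results stay local, ambiguous ones go off-device."""
    if score >= act_above:
        return Decision.ACT_LOCALLY
    if score < ignore_below:
        return Decision.IGNORE
    return Decision.DELEGATE
```

For example, `decide(0.6)` returns `Decision.DELEGATE`, which the communication component would then turn into a request to a gateway or cloud service.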

When does an MCU agent not need TinyML?

Most threshold-based agents do not need ML. They are valid MCU agents whose state machine is simple rule logic. Adding ML to them would add inference latency, RAM pressure, and model maintenance overhead for no improvement in detection quality.
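A threshold agent of this kind can be as small as a two-threshold (hysteresis) rule. The sketch below uses Python for readability and an invented over-temperature scenario; the class name `OverTempAgent` and the 80 °C and 75 °C limits are assumptions, not values from any specific product.

```python
class OverTempAgent:
    """Two-threshold (hysteresis) rule for a hypothetical over-temperature alarm."""

    def __init__(self, on_above=80.0, off_below=75.0):
        self.on_above = on_above
        self.off_below = off_below
        self.alarm = False

    def update(self, temperature_c):
        # Raise the alarm above the upper limit; clear it only below the lower one.
        if not self.alarm and temperature_c > self.on_above:
            self.alarm = True
        elif self.alarm and temperature_c < self.off_below:
            self.alarm = False
        return self.alarm


agent = OverTempAgent()
readings = [70.0, 81.0, 78.0, 76.0, 74.0, 79.0]
history = [agent.update(t) for t in readings]
```

The gap between the two limits keeps the alarm from chattering when the reading hovers near a single threshold: here the alarm turns on at 81.0, stays on through 78.0 and 76.0, and clears at 74.0.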

When does TinyML add genuine value inside an agent?

Memory footprint: TinyML component vs full MCU agent

On an ESP32-S3 (512 KB SRAM):

| Component | Approximate SRAM usage |
|---|---|
| FreeRTOS kernel | 5–10 KB |
| 3 application tasks (4 KB stack each) | 12 KB |
| esp-mqtt TLS session | 50–70 KB |
| Agent state machine + buffers | 2–5 KB |
| Sensor sample buffers (1 s at 1 kHz) | 4 KB (float32) |
| TFLM inference arena (keyword model) | 30–60 KB |
| TFLM inference arena (image classifier 96×96) | ~150–200 KB |
| Total (simple keyword agent) | ~103–161 KB; within budget |
| Total (image classifier agent) | ~223–301 KB; tight, needs careful allocation |

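The TinyML inference arena is often the largest single consumer of SRAM in an MCU agent. For the image classifier above it exceeds all other rows combined, while for the keyword model the TLS session is comparable or larger. Design the inference component first; fit everything else around it.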
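A budget like this is easy to keep honest with a few lines of arithmetic. The sketch below sums the low and high estimates from the table rows for each arena choice and reports the worst-case headroom against 512 KB. The helper names `budget` and `headroom` are hypothetical, and anything not listed in the table is deliberately left out, so a real build needs extra margin.

```python
SRAM_TOTAL_KB = 512  # ESP32-S3 on-chip SRAM, as stated above

# (low, high) estimates in KB, copied from the table rows.
COMMON = {
    "FreeRTOS kernel": (5, 10),
    "3 application tasks": (12, 12),
    "esp-mqtt TLS session": (50, 70),
    "Agent state machine + buffers": (2, 5),
    "Sensor sample buffers": (4, 4),
}
ARENAS = {
    "keyword model": (30, 60),
    "image classifier 96x96": (150, 200),
}


def budget(arena):
    """Sum low and high estimates for the common rows plus one inference arena."""
    rows = list(COMMON.values()) + [ARENAS[arena]]
    return sum(lo for lo, _ in rows), sum(hi for _, hi in rows)


def headroom(arena):
    """Worst-case SRAM left over, in KB, using the high estimates."""
    return SRAM_TOTAL_KB - budget(arena)[1]
```

With these rows, `budget("keyword model")` returns `(103, 161)` and `budget("image classifier 96x96")` returns `(223, 301)`, leaving 351 KB and 211 KB of worst-case headroom respectively.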

Platform example: ForestHub.ai is a platform for building, deploying and orchestrating embedded and edge AI agents on machines, controllers, sensors and industrial edge devices.

FAQ

Q: Can TinyML run without an MCU agent architecture? Yes. A device can run inference and report results over UART to a host processor, with no RTOS, no MQTT, and no state machine. That is TinyML but not an MCU agent.

Q: Does Edge Impulse produce MCU agents? Edge Impulse produces the inference layer of an MCU agent (sensor DSP pipeline + model inference). The rest of the agent — the state machine, MQTT client, delegation logic — is the developer’s responsibility or is handled by a complementary platform.

Q: Is microTVM a TinyML framework or an MCU agent framework? TinyML framework only. microTVM (Apache TVM project) compiles ML models to efficient C/C++ code for MCU targets. It does not provide communication, state management, or delegation.

Q: Can a device run TinyML and an MCU agent on different cores? Yes. On the STM32H745 (dual-core: Cortex-M7 + Cortex-M4), a common pattern is to run the ML inference on the M7 and the real-time sensor acquisition and MQTT communication on the M4. The two cores communicate via shared SRAM with hardware semaphores.