
Researchers propose RoboGuard, a two-stage architecture to prevent unsafe LLM-powered robot behavior

Researchers have introduced RoboGuard, a safety framework that addresses the gap between LLM vulnerabilities and physical robot risks. The system uses a root-of-trust LLM to contextualize safety rules and temporal logic control to prevent harmful robot actions, reducing unsafe plan execution from over 92% to below 3% in tests.



A new research paper (arXiv:2503.07885) addresses a critical safety gap: while large language models enable powerful robot capabilities, they also introduce vulnerabilities—from hallucinations to adversarial jailbreaking—that can cause dangerous physical actions in real-world environments.

The Problem

Traditional robot safety systems don't account for LLM-specific risks like prompt injection attacks. Conversely, current LLM safety approaches ignore the physical consequences of robot failures. This mismatch has created an unprotected layer in autonomous systems.

RoboGuard: Two-Stage Architecture

The proposed framework operates in two stages:

Stage 1: Context-Aware Safety Grounding

A root-of-trust LLM—shielded from malicious prompts—uses chain-of-thought reasoning to contextualize pre-defined safety rules based on the robot's specific environment. Rather than applying generic constraints, this stage generates context-dependent safety specifications, expressed as temporal logic constraints that reflect actual physical conditions.
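To make the grounding step concrete, here is a minimal sketch of what contextualization could look like: generic safety rules are specialized into linear temporal logic (LTL) constraints over the objects actually present in the scene. The scene representation, rule set, and LTL syntax below are illustrative assumptions, not RoboGuard's actual API—in the real system, the root-of-trust LLM performs this step with chain-of-thought reasoning.

```python
# Hypothetical sketch of Stage 1: grounding generic safety rules into
# environment-specific temporal logic constraints. All names here are
# illustrative; RoboGuard's actual representations may differ.

def ground_rules(scene):
    """Specialize generic rules ("never enter restricted zones",
    "never collide with humans") into LTL constraints over the
    entities actually present in this scene.
    'G' is the LTL "globally" (always) operator."""
    constraints = []
    for zone in scene.get("restricted_zones", []):
        constraints.append(f"G !at({zone})")          # never be at this zone
    for person in scene.get("humans", []):
        constraints.append(f"G !collide({person})")   # never collide with this person
    return constraints

scene = {"restricted_zones": ["stairwell"], "humans": ["operator_1"]}
print(ground_rules(scene))
# ['G !at(stairwell)', 'G !collide(operator_1)']
```

The point of the design is that the constraints are scene-specific: a rule like "avoid restricted zones" only becomes checkable once the grounding step binds it to the stairwell that actually exists in this environment.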

Stage 2: Conflict Resolution

When the LLM generates a plan that conflicts with contextual safety specifications, RoboGuard uses temporal logic control synthesis to automatically resolve the conflict. This ensures safety compliance while minimizing deviation from user preferences.
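A toy sketch of the conflict-resolution idea: check each step of a proposed plan against the grounded constraints and keep only the safe steps, as a stand-in for minimizing deviation from the user's intent. Real control synthesis would use an LTL model checker over the robot's transition system; the simple "G !at(x)" safety check and the `goto(x)` action format below are hypothetical simplifications.

```python
# Hypothetical sketch of Stage 2: filtering a proposed plan against
# grounded constraints. A real synthesizer would model-check the full
# plan; this toy version only handles "G !at(x)" constraints and
# "goto(x)" actions, both of which are illustrative assumptions.

def violates(action, constraints):
    """True if a 'goto(x)' action breaks a 'G !at(x)' constraint."""
    if not action.startswith("goto("):
        return False
    dest = action[len("goto("):-1]          # extract x from goto(x)
    return f"G !at({dest})" in constraints

def resolve(plan, constraints):
    """Keep the safe subsequence of the plan, a crude stand-in for
    'comply with safety while minimizing deviation from the user'."""
    return [a for a in plan if not violates(a, constraints)]

constraints = ["G !at(stairwell)"]
plan = ["goto(kitchen)", "goto(stairwell)", "goto(lab)"]
print(resolve(plan, constraints))
# ['goto(kitchen)', 'goto(lab)']
```

Note that dropping a single unsafe step while preserving the rest captures the "minimal deviation" objective in miniature: the user still gets the kitchen and lab visits they asked for.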

Experimental Results

In both simulation and real-world tests that included worst-case jailbreaking attacks:

  • Unsafe plan execution reduced from >92% to <3% without degrading performance on safe operations
  • The system remained resource-efficient and robust against adaptive attacks
  • Chain-of-thought reasoning in the root-of-trust LLM enhanced safety effectiveness

These results held across scenarios designed to stress-test the system with adversarial inputs.

What This Means

RoboGuard represents a practical bridge between LLM safety and robot safety—two fields that have developed independently despite their intersection mattering enormously. The framework's reduction of unsafe plan execution from over 92% to under 3% suggests that deliberate architectural choices (root-of-trust grounding, temporal logic synthesis) can meaningfully constrain LLM behavior in high-stakes physical systems.

The sub-3% unsafe execution rate is not perfect, but it marks a significant improvement for systems that will eventually operate around humans. The real-world testing is particularly noteworthy, since simulations often overestimate robustness.

Key limitations remain: the framework assumes access to a trustworthy LLM for grounding, scalability to complex multi-robot systems is unaddressed, and temporal logic specification requires domain expertise. Nevertheless, this work signals a necessary direction—safety mechanisms specifically engineered for LLM-robot integration rather than borrowed wholesale from either parent field.

Researchers have published additional resources and implementation details at https://robo-guard.github.io/.