AI Network: A Guardrailed Agentic Framework for Safe Network Automation

Technologies Used

Python, AI

Abstract

We need the speed of AI-assisted operations without losing control of safety, auditability, and change discipline. This project implements that balance: a practical, prompt-driven network automation system where read operations are fast and write operations are explicitly gated.

It combines agentic orchestration patterns, reusable SKILL modules, MCP-style structured tool interactions, and policy guardrails into one operator-friendly workflow.

All examples in this paper are intentionally anonymized and secret-safe.

1. Why This Architecture Matters

Traditional network automation often fails in one of two ways:

It is too manual, slow, and error-prone.
It is too powerful, with weak controls around destructive actions.

This project takes a third path:

Natural-language UX for speed.
Explicitly constrained execution for safety.
Strong telemetry for audit and compliance.
Modular extensibility via SKILLs and integration connectors.

2. Design Principles

The system is built on a few non-negotiables:

Read-only by default.
No direct config push from the generic fetch path.
Two-step confirmation (preview, then explicit apply) before any write.
Structured logs for every operation.
Secrets handled out-of-band and stored with restricted permissions.
Extensibility through modular SKILL packages.

3. System Architecture

flowchart LR
    U["Operator"] --> I["Prompt Interface"]
    I --> P["Policy Layer<br/>(mode checks + guardrails)"]
    P --> R["Read Path<br/>fetch_script.py"]
    P --> W["Write Path<br/>config_push_script.py"]
    R --> T["Transport Layer<br/>cli_executor.py via jump host"]
    W --> T
    R --> X["Parser Layer<br/>parsers.py"]
    R --> O1["Observability<br/>execution_log.jsonl + command_outputs/"]
    W --> O2["Observability<br/>config_push_log.jsonl + config_push_outputs/"]
    R --> S["SKILL Integrations<br/>Notification"]

3.1 Interface Layer

Operators use plain-English prompts. The assistant maps intent into safe CLI execution plans, not raw shell passthrough.

3.2 Policy and Guardrails Layer

The policy layer enforces mode-aware behavior:

Unsafe command patterns are blocked.
Shell chaining patterns are blocked.

3.3 Read Execution Path

Single commands or multi-step playbooks.
Inventory fan-out (--devices ALL).
Risk detection for dangerous markers.
Optional parser extraction into JSON payloads.
Large-output offloading to files.
JSONL execution logging.
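
As a rough illustration of the inventory fan-out step, the sketch below resolves a --devices value into a concrete device list. The function name resolve_devices, the comma-separated selector syntax, and the in-memory inventory list are assumptions made for this example; the paper only documents the ALL keyword.

```python
from typing import List


def resolve_devices(selector: str, inventory: List[str]) -> List[str]:
    """Expand a --devices value into concrete device names.

    "ALL" (case-insensitive) fans out to the whole inventory. Any other
    value is treated as a comma-separated list (an assumed syntax) and
    every name must exist in the inventory.
    """
    if selector.strip().upper() == "ALL":
        return list(inventory)
    requested = [name.strip() for name in selector.split(",") if name.strip()]
    unknown = [name for name in requested if name not in inventory]
    if unknown:
        raise ValueError(f"unknown devices: {', '.join(unknown)}")
    return requested
```

Rejecting unknown names up front keeps a typo from silently shrinking the target set of a multi-device check.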

3.4 Write Execution Path

Full config preview shown before apply.
--preview-only workflow for human validation.
Apply requires explicit --apply APPLY.
Destructive markers blocked unless deliberate opt-in.
Structured push audit logs and transcript retention.

3.5 Transport Layer

Connection/session handling.
Retry behavior on session drops.
Interactive config mode transcript capture.
Error marker detection.
Device-family-specific commit handling safeguards.
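
The retry behavior on session drops could look roughly like the sketch below. SessionDropped, run_with_retry, and the exponential backoff schedule are hypothetical choices for illustration and are not taken from cli_executor.py.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionDropped(Exception):
    """Raised by the (hypothetical) transport when the device session is lost."""


def run_with_retry(
    run_once: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_once, retrying only on SessionDropped with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return run_once()
        except SessionDropped:
            if attempt == attempts:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")
```

Only the session-drop error is retried, so command errors reported by the device are never masked by a second attempt.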

3.6 Parser Layer

show_version
show_bgp_summary
generic parser for unknown formats

This is critical for turning terminal output into machine-usable telemetry.
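
A minimal sketch of how named parsers with a generic fallback might be organized is shown below. The regular expressions, the returned field names, and the PARSERS registry are assumptions; real device output varies by platform and the actual parsers.py may differ.

```python
import re
from typing import Any, Callable, Dict, Optional


def parse_show_version(output: str) -> Dict[str, Any]:
    """Extract version and uptime using deliberately loose patterns."""
    version = re.search(r"Version\s*:?\s*([\w.()\-]+)", output, re.IGNORECASE)
    uptime = re.search(r"uptime is (.+)", output, re.IGNORECASE)
    return {
        "version": version.group(1) if version else None,
        "uptime": uptime.group(1).strip() if uptime else None,
    }


def parse_generic(output: str) -> Dict[str, Any]:
    """Fallback for unknown formats: keep the non-empty lines as-is."""
    lines = [line for line in output.splitlines() if line.strip()]
    return {"line_count": len(lines), "lines": lines}


# Registry keyed by the value passed to --parser (show_bgp_summary omitted here).
PARSERS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    "show_version": parse_show_version,
}


def parse(name: Optional[str], output: str) -> Dict[str, Any]:
    """Dispatch to a named parser, falling back to the generic one."""
    return PARSERS.get(name, parse_generic)(output)
```

Returning None for fields that do not match, rather than raising, keeps a single unusual device from failing a multi-device run.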

3.7 Skills and External Integrations

The implemented SKILLs demonstrate the extensibility model:

Notification: sends workflow outcomes as notifications, with tokens handled securely. Other SKILLs follow the same model.

4. Agentic Model

Even without a complex distributed agent mesh, this repo follows clear agentic role separation:

Planner role: converts prompt intent into an execution plan.
Policy role: validates whether operation is read or write and enforces rules.
Executor role: runs device operations over controlled transport.
Analyzer role: parses and summarizes outputs.
Notifier role: publishes outcomes to collaboration channels.

This role separation gives the benefits of agentic systems while remaining operationally deterministic.
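
One way to picture the fixed ordering of these roles is the sketch below, where each role is a plain callable and the policy step can stop the run before any device is contacted. The Plan dataclass and run_pipeline function are illustrative names, not identifiers from the repo.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Plan:
    device: str
    commands: List[str]
    mode: str = "read"  # "read" or "write"


def run_pipeline(
    prompt: str,
    planner: Callable[[str], Plan],
    policy: Callable[[Plan], None],
    executor: Callable[[Plan], Any],
    analyzer: Callable[[Any], Any],
    notifier: Callable[[Any], None],
) -> Any:
    """Run the roles in a fixed order; policy raises to abort before execution."""
    plan = planner(prompt)
    policy(plan)
    raw = executor(plan)
    summary = analyzer(raw)
    notifier(summary)
    return summary
```

Because the order is hard-coded rather than chosen by a model at run time, the same plan always passes through the same checks.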

5. MCP-Style Integration Pattern

The system already follows an MCP-style pattern:

Tool executions are represented as structured operation events.
JSONL logs capture normalized fields for downstream tooling.
Single-operation traces demonstrate predictable machine-readable outputs.

Example event shape:

{
  "timestamp": "2026-04-01T17:26:07Z",
  "device": "<DEVICE_NAME>",
  "command": "show version",
  "success": true,
  "elapsed_seconds": 1.8,
  "parser": null,
  "error": ""
}

This contract-style output is what makes future connector/MCP expansion straightforward.
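
The event shape above maps naturally onto a small typed record. The sketch below is one possible representation; the OperationEvent class and its to_jsonl method are hypothetical names, while the field names mirror the example event.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class OperationEvent:
    timestamp: str
    device: str
    command: str
    success: bool
    elapsed_seconds: float
    parser: Optional[str] = None
    error: str = ""

    def to_jsonl(self) -> str:
        """Serialize to a single JSON line suitable for appending to a .jsonl log."""
        return json.dumps(asdict(self))


event = OperationEvent(
    timestamp="2026-04-01T17:26:07Z",
    device="R11",
    command="show version",
    success=True,
    elapsed_seconds=1.8,
)
line = event.to_jsonl()
```

Keeping the record definition in one place means any downstream connector can rely on the same field names and types.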

6. Guardrails Deep Dive

This is the core strength of the platform.

6.1 Read-Only Enforcement

Allowed prefixes only (show, display, get, etc.).
Blocks write/control verbs in read mode.
Blocks command chaining (;, &&, ||, substitutions).
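
The three checks above can be sketched as a single validation function. The prefix list, the verb list, and the name check_read_only are assumptions for illustration; the project's actual lists are not reproduced in this paper.

```python
import re
from typing import Tuple

# Assumed lists: the text names show/display/get and leaves the rest open.
READ_PREFIXES = ("show", "display", "get")
WRITE_VERBS = ("configure", "commit", "set", "delete", "reload", "write")
# Chaining and substitution markers:  ;  &&  ||  `  $(
CHAINING = re.compile(r";|&&|\|\||`|\$\(")


def check_read_only(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a command submitted in read mode."""
    text = command.strip()
    if CHAINING.search(text):
        return False, "command chaining or substitution is not allowed"
    first = text.split(None, 1)[0].lower() if text else ""
    if first in WRITE_VERBS:
        return False, "write/control verb is not allowed in read mode"
    if first not in READ_PREFIXES:
        return False, "command does not start with an allowed read prefix"
    return True, "ok"
```

A single pipe is left alone in this sketch because device CLIs commonly use it for output filtering (for example, show running-config | include bgp); a stricter deployment could block it as well.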

6.2 Write Gating

Config changes only through dedicated write script.
Full configuration plan shown before execution.
Human-in-the-loop approval flow.
Explicit apply token required.
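
A minimal sketch of the preview/apply gate, using the flag names that appear in the examples in this paper, might look like the following. The helper names and the choice to make --preview-only and --apply mutually exclusive are assumptions, not a description of config_push_script.py.

```python
import argparse
from typing import List

APPLY_TOKEN = "APPLY"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Gated config push (sketch)")
    parser.add_argument("--devices", required=True)
    parser.add_argument("--config-command", action="append", required=True)
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--preview-only", action="store_true")
    mode.add_argument("--apply", metavar="TOKEN")
    return parser


def render_plan(devices: str, commands: List[str]) -> str:
    """Build the full change plan shown to the operator before anything runs."""
    lines = [f"Target: {devices}"]
    lines += [f"  {i}. {cmd}" for i, cmd in enumerate(commands, 1)]
    return "\n".join(lines)


def decide(args: argparse.Namespace) -> str:
    """Return 'preview', 'apply', or 'refuse' for the parsed arguments."""
    if args.preview_only:
        return "preview"
    if args.apply == APPLY_TOKEN:
        return "apply"
    return "refuse"  # any token other than the exact literal is rejected
```

Requiring the exact literal token, rather than a boolean flag, makes it harder to trigger a write through shell history or a copied preview command.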

6.3 Destructive Action Controls

Detects high-risk patterns (reload, write erase, etc.).
Requires explicit destructive override when needed.
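
Destructive-marker detection with an explicit override could be sketched as below. Only reload and write erase come from the text; the remaining patterns, the function names, and the allow_destructive parameter are illustrative assumptions.

```python
import re
from typing import List

# "reload" and "write erase" are named in the text; the others are assumed examples.
DESTRUCTIVE_PATTERNS = [
    r"\breload\b",
    r"\bwrite\s+erase\b",
    r"\berase\s+startup-config\b",
    r"\bformat\b",
]


def find_destructive(commands: List[str]) -> List[str]:
    """Return the commands that match any high-risk pattern."""
    return [
        cmd
        for cmd in commands
        if any(re.search(p, cmd, re.IGNORECASE) for p in DESTRUCTIVE_PATTERNS)
    ]


def authorize(commands: List[str], allow_destructive: bool = False) -> List[str]:
    """Raise unless destructive commands were deliberately opted into."""
    hits = find_destructive(commands)
    if hits and not allow_destructive:
        raise PermissionError(f"destructive commands blocked: {hits}")
    return hits
```

The matching is intentionally conservative: a harmless line that merely contains the word reload is also flagged, which errs on the side of asking the operator.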

6.4 Auditability

Every operation is logged with timestamp, target, command, result, and duration.
Large payloads stored in file artifacts; logs retain references.
Enables post-incident forensics and compliance review.
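
The logging and offloading behavior described above might be implemented along these lines. The file and directory names follow the architecture diagram; the size threshold, the output and output_file field names, and the log_operation function are assumptions for this sketch.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict

MAX_INLINE_CHARS = 2000  # assumed threshold for keeping output inline


def log_operation(
    log_dir: str, device: str, command: str, output: str,
    success: bool, elapsed_seconds: float,
) -> Dict[str, Any]:
    """Append one JSONL record; offload large output to a file and keep a reference."""
    base = Path(log_dir)
    base.mkdir(parents=True, exist_ok=True)
    record: Dict[str, Any] = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "device": device,
        "command": command,
        "success": success,
        "elapsed_seconds": elapsed_seconds,
    }
    if len(output) > MAX_INLINE_CHARS:
        out_dir = base / "command_outputs"
        out_dir.mkdir(exist_ok=True)
        artifact = out_dir / f"{device}_{time.time_ns()}.txt"
        artifact.write_text(output)
        record["output_file"] = str(artifact)
    else:
        record["output"] = output
    with (base / "execution_log.jsonl").open("a") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Opening the log in append mode keeps earlier records untouched, and storing only a path for large payloads keeps each log line small enough to scan quickly.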

6.5 Secret Hygiene

SKILL-level token storage in protected .secrets paths.
Reusable secure token handling without exposing secrets in prompts.

7. Technical Workflows (Examples)

7.1 Read-Only Multi-Device Check

python3 fetch_script.py \
  --devices ALL \
  --cli-command "show version" \
  --parser show_version \
  --assume-yes

Outcome:

Safe command fan-out.
Structured version/uptime extraction.
Per-device audit trail.

7.2 Controlled Config Push

Preview phase:

python3 config_push_script.py \
  --devices R11 \
  --config-command "interface TenGigE0/0/0/0" \
  --config-command "description LINK_TO_CORE" \
  --preview-only

Apply phase after approval:

python3 config_push_script.py \
  --devices R11 \
  --config-command "interface TenGigE0/0/0/0" \
  --config-command "description LINK_TO_CORE" \
  --apply APPLY

Outcome:

No accidental writes.
Human-approved, auditable change path.

7.3 Alerting Integration

The alerting workflow runs periodic checks and sends notifications only when failures occur, reducing alert noise while preserving operational visibility.
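
The notify-only-on-failure behavior can be sketched as a filter over the operation events produced by a check run. The function maybe_notify and the send callback are hypothetical; the event fields follow the example event shape in section 5.

```python
from typing import Any, Callable, Dict, List


def maybe_notify(events: List[Dict[str, Any]], send: Callable[[str], None]) -> bool:
    """Send one summary message if any event failed; stay silent otherwise."""
    failures = [e for e in events if not e.get("success")]
    if not failures:
        return False
    lines = [
        f"{e['device']}: {e['command']} failed ({e.get('error') or 'no error text'})"
        for e in failures
    ]
    send("\n".join(lines))
    return True
```

Batching all failures from one run into a single message is a design choice made here to limit noise further; per-failure messages would work with the same filter.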

8. Why It Is Easy to Integrate

Adoption is intentionally low-friction:

One-time setup installs dependencies.
Populate the device inventory list.
Start with read-only prompts and commands.
Add SKILL packages by folder copy convention.
Extend parsers or playbooks incrementally.

No complex platform rewrite is required. Teams can start with immediate operational value and scale gradually.

9. Enterprise Value

This approach delivers practical gains:

Faster operator workflows with natural language entry.
Lower risk via strict mode separation and apply gating.
Better reliability through standardized execution and retries.
Better compliance via immutable operation logs.
Better extensibility via modular SKILL and connector pattern.

Conclusion

This project demonstrates that AI-driven network operations do not need to trade safety for speed.

By combining guardrails, explicit write controls, structured telemetry, and modular SKILL integration, it creates an automation platform that is both technically rigorous and easy to operationalize.