Abstract
We need the speed of AI-assisted operations without losing control of safety, auditability, and change discipline. This project implements that balance: a practical, prompt-driven network automation system where read operations are fast and write operations are explicitly gated.
It combines agentic orchestration patterns, reusable SKILL modules, MCP-style structured tool interactions, and policy guardrails into one operator-friendly workflow.
All examples in this paper are intentionally anonymized and secret-safe.
1. Why This Architecture Matters
Traditional network automation often fails in one of two ways:
- It is too manual, slow, and error-prone.
- It is too powerful, with weak controls around destructive actions.
This project takes a third path:
- Natural-language UX for speed.
- Explicitly constrained execution for safety.
- Strong telemetry for audit and compliance.
- Modular extensibility via SKILLs and integration connectors.
2. Design Principles
The system is built on a few non-negotiables:
- Read-only by default.
- No direct config push from the generic fetch path.
- Two-step confirmation before any write apply.
- Structured logs for every operation.
- Secrets handled out-of-band and stored with restricted permissions.
- Extensibility through modular SKILL packages.
3. System Architecture
flowchart LR
U["Operator"] --> I["Prompt Interface"]
I --> P["Policy Layer<br/>(mode checks + guardrails)"]
P --> R["Read Path<br/>fetch_script.py"]
P --> W["Write Path<br/>config_push_script.py"]
R --> T["Transport Layer<br/>cli_executor.py via jump host"]
W --> T
R --> X["Parser Layer<br/>parsers.py"]
R --> O1["Observability<br/>execution_log.jsonl + command_outputs/"]
W --> O2["Observability<br/>config_push_log.jsonl + config_push_outputs/"]
R --> S["SKILL Integrations<br/>Notification"]
3.1 Interface Layer
Operators use plain-English prompts. The assistant maps intent into safe CLI execution plans, not raw shell passthrough.
3.2 Policy and Guardrails Layer
The policy layer enforces mode-aware behavior:
- Unsafe command patterns are blocked.
- Shell chaining patterns are blocked.
3.3 Read Execution Path
- Single commands or multi-step playbooks.
- Inventory fan-out (--devices ALL).
- Risk detection for dangerous markers.
- Optional parser extraction into JSON payloads.
- Large-output offloading to files.
- JSONL execution logging.
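The inventory fan-out step can be sketched as a small selector resolver. This is a minimal illustration, not the actual logic in fetch_script.py: the function name `resolve_devices`, the comma-separated selector syntax, and the in-memory inventory list are all assumptions made for the example. Only the `ALL` keyword comes from the text above.

```python
def resolve_devices(selector: str, inventory: list[str]) -> list[str]:
    """Expand a --devices value into concrete device names.

    Assumption: "ALL" selects the full inventory; anything else is treated
    as a comma-separated list of names (hypothetical syntax).
    """
    if selector.strip() == "ALL":
        return list(inventory)

    requested = [name.strip() for name in selector.split(",") if name.strip()]
    unknown = [name for name in requested if name not in inventory]
    if unknown:
        # Fail closed: never run against a device that is not in the inventory.
        raise ValueError(f"unknown devices: {unknown}")
    return requested


inventory = ["R11", "R12", "R13"]  # hypothetical device names
print(resolve_devices("ALL", inventory))
print(resolve_devices("R11, R13", inventory))
```

Rejecting unknown names up front keeps a typo from turning into a partial, hard-to-audit run.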
3.4 Write Execution Path
- Full config preview shown before apply.
- --preview-only workflow for human validation.
- Apply requires explicit --apply APPLY.
- Destructive markers blocked unless deliberate opt-in.
- Structured push audit logs and transcript retention.
3.5 Transport Layer
- Connection/session handling.
- Retry behavior on session drops.
- Interactive config mode transcript capture.
- Error marker detection.
- Device-family-specific commit handling safeguards.
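Retry on session drops can be illustrated with a bounded retry wrapper. This is a sketch under stated assumptions: `SessionDropped`, `run_with_retry`, and the `execute` callable are hypothetical names, and the real cli_executor.py may detect and handle drops differently.

```python
import time


class SessionDropped(Exception):
    """Hypothetical error raised when the transport session is lost."""


def run_with_retry(execute, command, retries=2, delay_seconds=0.0):
    """Call execute(command), retrying only on SessionDropped.

    `retries` is the number of extra attempts after the first failure.
    Other exceptions propagate immediately, so real errors are not masked.
    """
    attempt = 0
    while True:
        try:
            return execute(command)
        except SessionDropped:
            attempt += 1
            if attempt > retries:
                raise  # retry budget exhausted; surface the failure
            time.sleep(delay_seconds)
```

Keeping the retry budget small and explicit matters most on the write path, where silently repeating a command could apply a change twice.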
3.6 Parser Layer
- show_version
- show_bgp_summary
- generic parser for unknown formats
This is critical for turning terminal output into machine-usable telemetry.
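A parser registry with a generic fallback might look like the sketch below. The regular expressions and the sample output are assumptions chosen for illustration. Real device output varies by platform, and the actual parsers.py may extract different fields.

```python
import re


def parse_show_version(output: str) -> dict:
    """Extract version and uptime from 'show version' style text (assumed format)."""
    version = re.search(r"Version\s+([\w.()\-]+)", output)
    uptime = re.search(r"uptime is\s+(.+)", output, re.IGNORECASE)
    return {
        "version": version.group(1) if version else None,
        "uptime": uptime.group(1).strip() if uptime else None,
    }


def parse_generic(output: str) -> dict:
    """Fallback for unknown formats: keep the non-empty lines as-is."""
    lines = [line for line in output.splitlines() if line.strip()]
    return {"line_count": len(lines), "lines": lines}


# Registry keyed by the --parser name; unknown names fall back to generic.
PARSERS = {"show_version": parse_show_version}


def run_parser(name: str, output: str) -> dict:
    return PARSERS.get(name, parse_generic)(output)


sample = "Example Network OS Software, Version 7.3.2\nrouter uptime is 2 weeks, 3 days\n"
print(run_parser("show_version", sample))
```

Returning None for a missing field, rather than raising, lets a fan-out across many devices finish even when one device prints an unexpected banner.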
3.7 Skills and External Integrations
Several implemented SKILLs demonstrate the extensibility model, for example:
- Notification: sends workflow outcomes to notification channels, with tokens handled securely.
4. Agentic Model
Even without a complex distributed agent mesh, this repo follows clear agentic role separation:
- Planner role: converts prompt intent into an execution plan.
- Policy role: validates whether operation is read or write and enforces rules.
- Executor role: runs device operations over controlled transport.
- Analyzer role: parses and summarizes outputs.
- Notifier role: publishes outcomes to collaboration channels.
This role separation gives the benefits of agentic systems while remaining operationally deterministic.
5. MCP-Style Integration Pattern
The system already follows an MCP-style pattern:
- Tool executions are represented as structured operation events.
- JSONL logs capture normalized fields for downstream tooling.
- Single-operation traces demonstrate predictable machine-readable outputs.
Example event shape:
{
  "timestamp": "2026-04-01T17:26:07Z",
  "device": "<DEVICE_NAME>",
  "command": "show version",
  "success": true,
  "elapsed_seconds": 1.8,
  "parser": null,
  "error": ""
}
This contract-style output is what makes future connector/MCP expansion straightforward.
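One way to produce events of this shape is an append-only JSONL writer. The field names below follow the example event. The helper names `build_event` and `append_event` are hypothetical and do not necessarily match the project's code.

```python
import json
from datetime import datetime, timezone


def build_event(device, command, success, elapsed_seconds, parser=None, error=""):
    """Build one operation event with the same keys as the example above."""
    return {
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "device": device,
        "command": command,
        "success": success,
        "elapsed_seconds": elapsed_seconds,
        "parser": parser,
        "error": error,
    }


def append_event(path, event):
    """Append a single JSON object per line so the log stays stream-parseable."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
```

Because every line is an independent JSON object, downstream tools can tail the file or load it line by line without parsing the whole log.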
6. Guardrails Deep Dive
This is the core strength of the platform.
6.1 Read-Only Enforcement
- Allowed prefixes only (show, display, get, etc.).
- Blocks write/control verbs in read mode.
- Blocks command chaining (;, &&, ||, substitutions).
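A read-mode validator along these lines could be sketched as follows. The allow-list contains only the three prefixes named above, and the chaining pattern is an assumption about what "substitutions" covers (backticks and `$(`). The project's actual rules may be broader.

```python
import re

# Assumed allow-list; the text names show, display, get "etc."
ALLOWED_PREFIXES = ("show", "display", "get")

# Chaining and substitution markers: ;  &&  ||  backtick  $(
# A single "|" is deliberately not blocked here, because device CLIs
# commonly use it for output filters such as "| include".
CHAINING_PATTERN = re.compile(r";|&&|\|\||`|\$\(")


def validate_read_command(command: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a command submitted in read mode."""
    text = command.strip()
    if not text:
        return False, "empty command"
    if CHAINING_PATTERN.search(text):
        return False, "command chaining or substitution is not allowed"
    first_word = text.split()[0].lower()
    if first_word not in ALLOWED_PREFIXES:
        return False, f"'{first_word}' is not an allowed read-only prefix"
    return True, "ok"


print(validate_read_command("show version"))
print(validate_read_command("show version; reload"))
```

An allow-list fails closed: a verb nobody anticipated is rejected by default, which is the safer direction for a read path.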
6.2 Write Gating
- Config changes only through dedicated write script.
- Full configuration plan shown before execution.
- Human-in-the-loop approval flow.
- Explicit apply token required.
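The two-step gate can be illustrated with a small argparse sketch. The flag names (`--devices`, `--config-command`, `--preview-only`, `--apply APPLY`) are taken from the examples in this paper. The function names and the decision logic are assumptions, not the actual config_push_script.py.

```python
import argparse

APPLY_TOKEN = "APPLY"  # literal token used in this paper's examples


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Gated config push (sketch)")
    parser.add_argument("--devices", required=True)
    parser.add_argument("--config-command", action="append", default=[])
    parser.add_argument("--preview-only", action="store_true")
    parser.add_argument("--apply", default=None)
    return parser


def render_preview(devices: str, commands: list[str]) -> str:
    """Show the full plan so a human can review it before any apply."""
    lines = [f"Target devices: {devices}", "Planned configuration:"]
    lines += [f"  {cmd}" for cmd in commands]
    return "\n".join(lines)


def decide_action(args: argparse.Namespace) -> str:
    """Return 'apply' only when the explicit token is present, else 'preview'."""
    if args.preview_only and args.apply is not None:
        raise ValueError("--preview-only and --apply are mutually exclusive")
    if args.apply is None:
        return "preview"  # the default is never a write
    if args.apply != APPLY_TOKEN:
        raise ValueError("apply token mismatch; refusing to write")
    return "apply"
```

Defaulting to preview when no token is given means a forgotten flag can never result in a write.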
6.3 Destructive Action Controls
- Detects high-risk patterns (reload, write erase, etc.).
- Requires explicit destructive override when needed.
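Destructive-marker detection with an explicit override might be sketched like this. Only `reload` and `write erase` come from the text. The pattern list, the function names, and the `allow_destructive` parameter are hypothetical.

```python
import re

# Assumed marker list; the text names reload and write erase "etc."
# Patterns are anchored to the start of the command so that a word such as
# "reload" inside an interface description is not flagged.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"^\s*reload\b", re.IGNORECASE),
    re.compile(r"^\s*write\s+erase\b", re.IGNORECASE),
]


def find_destructive(commands):
    """Return the commands that match any high-risk pattern."""
    return [c for c in commands if any(p.search(c) for p in DESTRUCTIVE_PATTERNS)]


def check_destructive(commands, allow_destructive=False):
    """Block flagged commands unless the operator opted in explicitly."""
    flagged = find_destructive(commands)
    if flagged and not allow_destructive:
        raise PermissionError(f"destructive commands blocked: {flagged}")
    return flagged
```

Returning the flagged list even when the override is set lets the caller record exactly what was permitted in the audit log.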
6.4 Auditability
- Every operation is logged with timestamp, target, command, result, and duration.
- Large payloads stored in file artifacts; logs retain references.
- Enables post-incident forensics and compliance review.
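Large-output offloading can be sketched as a helper that either keeps output inline or writes it to a file and returns a reference. The threshold, the file naming scheme, and the returned field names (`output`, `output_file`) are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

MAX_INLINE_CHARS = 2000  # assumed threshold; tune to the log consumer


def store_output(output, out_dir, device, command):
    """Keep small outputs inline; write large ones to a file and reference it."""
    if len(output) <= MAX_INLINE_CHARS:
        return {"output": output, "output_file": None}

    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # A short digest of device and command gives a filesystem-safe file name.
    digest = hashlib.sha256(f"{device}:{command}".encode("utf-8")).hexdigest()[:12]
    path = directory / f"{device}_{digest}.txt"
    path.write_text(output, encoding="utf-8")
    return {"output": None, "output_file": str(path)}
```

The log entry then carries only the file reference, which keeps each JSONL line small while the full transcript remains available for forensics.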
6.5 Secret Hygiene
- SKILL-level token storage in protected .secrets paths.
- Reusable secure token handling without exposing secrets in prompts.
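Restricted-permission token storage can be illustrated on POSIX systems as below. The helper names and the file layout are hypothetical. The only detail taken from the text is that tokens live under a protected .secrets path and are never echoed in full.

```python
import os
from pathlib import Path


def save_token(secrets_dir, name, token):
    """Write a token so that only the owning user can read it (POSIX modes)."""
    directory = Path(secrets_dir)
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = directory / name
    # Create the file with 0600 from the start, not world-readable then fixed.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(token)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return path


def load_token(path):
    return Path(path).read_text(encoding="utf-8").strip()


def mask(token):
    """Return a display-safe form for logs and prompts."""
    if len(token) <= 4:
        return "****"
    return "*" * (len(token) - 4) + token[-4:]
```

Logging only the masked form lets operators confirm which token is in use without the secret ever appearing in prompts or audit files.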
7. Technical Workflows (Examples)
7.1 Read-Only Multi-Device Check
python3 fetch_script.py \
--devices ALL \
--cli-command "show version" \
--parser show_version \
--assume-yes
Outcome:
- Safe command fan-out.
- Structured version/uptime extraction.
- Per-device audit trail.
7.2 Controlled Config Push
Preview phase:
python3 config_push_script.py \
--devices R11 \
--config-command "interface TenGigE0/0/0/0" \
--config-command "description LINK_TO_CORE" \
--preview-only
Apply phase after approval:
python3 config_push_script.py \
--devices R11 \
--config-command "interface TenGigE0/0/0/0" \
--config-command "description LINK_TO_CORE" \
--apply APPLY
Outcome:
- No accidental writes.
- Human-approved, auditable change path.
7.3 Alerting Integration
The alerting SKILL runs periodic checks and sends notifications only when failures occur, reducing alert noise while preserving operational visibility.
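The failure-only behavior can be sketched as a filter over operation events. The event keys follow the example event shape in Section 5. The `notify` callable and the function names are hypothetical stand-ins for whatever channel the Notification SKILL uses.

```python
def select_failures(events):
    """Keep only events whose 'success' field is not true."""
    return [e for e in events if not e.get("success", False)]


def run_alert_cycle(events, notify):
    """Send one summary message if anything failed; stay silent otherwise.

    Returns the number of failures reported.
    """
    failures = select_failures(events)
    if not failures:
        return 0  # nothing to report, so no notification is sent
    lines = [
        f"{e['device']}: {e['command']} failed ({e.get('error') or 'no error text'})"
        for e in failures
    ]
    notify("\n".join(lines))
    return len(failures)


sent = []
events = [
    {"device": "R11", "command": "show version", "success": True, "error": ""},
    {"device": "R12", "command": "show version", "success": False, "error": "timeout"},
]
print(run_alert_cycle(events, sent.append), sent)
```

Batching all failures from one cycle into a single message keeps the channel quiet during a wide outage.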
8. Why It Is Easy to Integrate
Adoption is intentionally low-friction:
- One-time setup installs dependencies.
- Populate the device inventory.
- Start with read-only prompts and commands.
- Add SKILL packages by folder copy convention.
- Extend parsers or playbooks incrementally.
No complex platform rewrite is required. Teams can start with immediate operational value and scale gradually.
9. Enterprise Value
This approach delivers practical gains:
- Faster operator workflows with natural language entry.
- Lower risk via strict mode separation and apply gating.
- Better reliability through standardized execution and retries.
- Better compliance via immutable operation logs.
- Better extensibility via modular SKILL and connector pattern.
Conclusion
This project demonstrates that AI-driven network operations do not need to trade safety for speed.
By combining guardrails, explicit write controls, structured telemetry, and modular SKILL integration, it creates an automation platform that is both technically rigorous and easy to operationalize.