AI Security Engineering: Building a Mini RMF Package

Introduction

Over the past week, I completed a hands-on security engineering project that blends traditional NIST Risk Management Framework (RMF) practices with modern AI security and agentic AI threat modeling. The result was a fully functional Mini RMF Security Package + Agentic AI Threat Model built around a Flask API, complete with automated tests, guardrails, observability, and formal security documentation.

My goal was to simulate real-world responsibilities of an Information Systems Security Engineer (ISSE) from system assessment and RMF documentation to securing AI workloads and evaluating emerging threat surfaces.

This project is ideal for anyone building a security engineering portfolio, transitioning into AI security or AI Solutions Engineering, or preparing for GRC/ISSE roles that require hands-on security knowledge.


AI-Assisted Development Notice

This project was built using a human-in-the-loop approach. ChatGPT and Claude were used as development assistants to refine code patterns, troubleshoot errors, and speed up iteration—mirroring real-world use of AI copilots in cybersecurity engineering. All RMF decisions, threat models, and architectural designs were created manually.


What I Built (Project Overview)

The mission was simple:

➡️ Assess a web application using NIST RMF methods
➡️ Apply AI guardrails and observability
➡️ Document everything in RMF-aligned artifacts

Project deliverables included:

  • Flask-based API assessment target
  • Input validation + security test cases
  • Guardrails for PII + toxicity filtering
  • NIST 800-53 control mappings
  • SAST scanning with Bandit
  • Agentic AI threat model
  • Automated security scripts
  • Arize Phoenix AI observability

This blend of traditional and AI security mirrors what modern cybersecurity roles now expect.


Day 1–3: Building Security Foundations With RMF

Setting Up the Flask Assessment Environment

To create a realistic and repeatable security test environment, I built a small REST API using:

  • Flask (backend)
  • OpenAI API (model inference)
  • Guardrails AI (security guardrails)
  • Bandit (static analysis)
  • Arize Phoenix (AI observability + logging)

This architecture created a baseline to evaluate input validation, inference behavior, API security, and AI output risks.
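
As a rough illustration, here is a minimal sketch of the kind of /ask endpoint the assessment targeted. The route, field names, and validation limits are stand-ins, not the exact project code:

# app.py: minimal sketch of the assessment target (names and limits are illustrative)
from flask import Flask, jsonify, request

app = Flask(__name__)

MAX_PROMPT_LENGTH = 2000  # assumed cap; tuned during the risk assessment

@app.route("/ask", methods=["POST"])
def ask():
    data = request.get_json(silent=True) or {}
    prompt = (data.get("prompt") or "").strip()
    user_id = data.get("user_id", "anonymous")

    # Input validation: reject empty or oversized prompts before any model call
    if not prompt:
        return jsonify({"error": "prompt must not be empty"}), 400
    if len(prompt) > MAX_PROMPT_LENGTH:
        return jsonify({"error": "prompt exceeds maximum length"}), 400

    # Guardrails checks and the OpenAI call sit here in the real app (see later sections)
    return jsonify({"response": "placeholder", "user_id": user_id}), 200

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=8080)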


Security Test Cases (Practical RMF Validation)

1. Input Validation Test

curl -X POST http://localhost:8080/ask \
-H "Content-Type: application/json" \
-d '{"prompt": "", "user_id": "test_user"}'

✔️ API blocks empty inputs.

2. PII Exposure Test

curl -X POST http://localhost:8080/ask \
-H "Content-Type: application/json" \
-d '{"prompt": "My email is test@example.com"}'

✔️ Guardrails detects and sanitizes personal data.

3. Toxic Language Detection

curl -X POST http://localhost:8080/ask \
-H "Content-Type: application/json" \
-d '{"prompt": "You are stupid"}'

✔️ Toxic messages are blocked safely.

These tests exercise controls from multiple NIST 800-53 families, including SC-7 (Boundary Protection), IA-2 (Identification and Authentication), and SI-2 (Flaw Remediation).
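
The same probes can also be scripted for repeatable validation. Below is a minimal sketch using Python's requests library; the payloads mirror the manual curl tests above, and the endpoint URL is assumed to be the local setup:

# test_api_security.py: scripted version of the manual curl probes (illustrative)
import requests

BASE_URL = "http://localhost:8080/ask"  # assumed local endpoint

CASES = [
    ("empty input rejected", {"prompt": "", "user_id": "test_user"}),
    ("PII redacted or blocked", {"prompt": "My email is test@example.com"}),
    ("toxic language blocked", {"prompt": "You are stupid"}),
]

if __name__ == "__main__":
    for name, payload in CASES:
        resp = requests.post(BASE_URL, json=payload, timeout=10)
        # A hardened endpoint should handle all of these without a 5xx error
        print(f"{name}: HTTP {resp.status_code} -> {resp.text[:120]}")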


NIST 800-53 Control Mapping (Portfolio-Ready RMF Work)

I mapped controls directly into a lightweight System Security Plan (SSP), covering:

  • AC-2 – Account management
  • IA-2 – Identification and authentication
  • SC-7 – Boundary protection
  • AU-2 / AU-6 – Audit + monitoring
  • CM-2 – Configuration baselines
  • SI-2 – Flaw remediation
  • PL-8 – Security architecture

This documentation is ideal for showcasing practical RMF and ISSE skills.


Static Code Analysis with Bandit

Using Bandit 1.9.2, I scanned for:

  • Hardcoded secrets
  • Command injection
  • SQL injection patterns
  • Unsafe functions
  • Weak cryptographic methods

Each finding included remediation steps—another critical skill expected in DevSecOps and ISSE roles.
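
To keep the scan repeatable, it can be wrapped in a small script. A sketch along these lines (the target directory and report path are illustrative) runs Bandit recursively and stores a JSON report as assessment evidence:

# run_sast.py: wraps the Bandit scan for repeatable evidence collection (paths are illustrative)
import subprocess
from pathlib import Path

def run_bandit(target_dir: str = ".", report_path: str = "reports/bandit_report.json") -> int:
    Path(report_path).parent.mkdir(parents=True, exist_ok=True)
    # -r scans recursively, -f json selects machine-readable output, -o writes the report file
    result = subprocess.run(["bandit", "-r", target_dir, "-f", "json", "-o", report_path])
    # Bandit exits non-zero when findings are present, which is handy for gating builds
    return result.returncode

if __name__ == "__main__":
    raise SystemExit(run_bandit())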


Day 4–7: Agentic AI Security, Threat Modeling, and Automation

Agentic AI Threat Model (High-Value Portfolio Asset)

As agentic systems gain autonomy, their risks increase.
My threat model analyzed five high-severity risk categories:

  1. Prompt Injection
  2. Data Leakage
  3. Over-Permissioned Agents
  4. Hallucinations
  5. Unsafe Tool Actions

Each risk includes:
✔ likelihood
✔ impact
✔ mitigation strategies
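
Keeping every entry in the same structure makes the model easy to review and extend. Here is a sketch of one entry; the ratings and mitigations shown are illustrative, not the full analysis:

# threat_model.py: one illustrative entry from the agentic AI threat model
from dataclasses import dataclass, field
from typing import List

@dataclass
class ThreatEntry:
    risk: str
    likelihood: str                      # low / moderate / high
    impact: str                          # low / moderate / high
    mitigations: List[str] = field(default_factory=list)

prompt_injection = ThreatEntry(
    risk="Prompt Injection",
    likelihood="high",
    impact="high",
    mitigations=[
        "Sanitize and constrain user input before it reaches the agent",
        "Validate model output with guardrails before acting on it",
        "Grant the agent least-privilege access to tools and data",
    ],
)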

This is a powerful differentiator for AI governance, AI safety, and security engineering roles.


Implementing Guardrails AI

To secure the LLM inference pipeline, I implemented:

  • PII detection + redaction
  • Toxic language filters
  • Output schema validation
  • Input sanitization
  • Abuse prevention
  • Error handling

These methods are widely used in production-grade AI applications.
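
A rough sketch of how such a guard can be wired up with Guardrails AI is below. Note that DetectPII and ToxicLanguage are Guardrails Hub validators that must be installed separately, and constructor arguments vary between versions, so treat this as an outline rather than the exact implementation:

# guards.py: sketch of the Guardrails setup (assumes the DetectPII and ToxicLanguage
# validators are installed from the Guardrails Hub; argument names may differ by version)
from guardrails import Guard
from guardrails.hub import DetectPII, ToxicLanguage

guard = Guard().use_many(
    DetectPII(pii_entities=["EMAIL_ADDRESS", "PHONE_NUMBER"], on_fail="fix"),
    ToxicLanguage(threshold=0.5, validation_method="sentence", on_fail="exception"),
)

def check_prompt(prompt: str) -> str:
    # Raises on toxic content and redacts detected PII before the text reaches the model
    outcome = guard.validate(prompt)
    return outcome.validated_output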


AI Observability With Arize Phoenix

AI observability is increasingly expected as a core control area for AI systems.

Phoenix allowed me to monitor:

  • Prompt–response logs
  • Latency and performance
  • Anomaly detection
  • Drift detection
  • Tracing for audit and compliance

This aligns with AI governance frameworks like NIST AI RMF.
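
Wiring Phoenix in is mostly configuration. Below is a sketch of the tracing setup, assuming the arize-phoenix-otel and OpenInference OpenAI instrumentation packages and a locally running Phoenix instance; the project name is a placeholder:

# tracing.py: sketch of the Phoenix observability setup (APIs from arize-phoenix-otel and
# openinference-instrumentation-openai; exact signatures may vary by version)
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Register an OpenTelemetry tracer provider pointed at the Phoenix collector
tracer_provider = register(project_name="mini-rmf-flask-api")

# Auto-instrument OpenAI client calls so prompts, responses, and latency are traced
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)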


Security Automation Scripts

I automated:

  • SAST scans
  • API security tests
  • Log analysis
  • Environment validation
  • Report generation

This aligns with DevSecOps best practices and demonstrates engineering maturity.
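
As one example, a small report generator can fold the SAST output into a single artifact for the RMF package; the file layout and summary format below are illustrative:

# generate_report.py: illustrative report generator that folds scan output into one artifact
import datetime
import json
from pathlib import Path

def summarize_bandit(report_path: str = "reports/bandit_report.json") -> str:
    path = Path(report_path)
    if not path.exists():
        return "Bandit report not found; run the SAST scan first."
    findings = json.loads(path.read_text()).get("results", [])
    return f"Bandit findings: {len(findings)}"

def write_summary(output_path: str = "reports/security_summary.md") -> None:
    lines = [
        "# Security Assessment Summary",
        f"Generated: {datetime.datetime.now().isoformat(timespec='seconds')}",
        summarize_bandit(),
    ]
    Path(output_path).parent.mkdir(parents=True, exist_ok=True)
    Path(output_path).write_text("\n".join(lines) + "\n")

if __name__ == "__main__":
    write_summary()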


Key Takeaways

Skills Demonstrated

  • NIST RMF application
  • AI safety engineering
  • Secure coding
  • SAST integration (Bandit)
  • AI threat model creation
  • Flask API security
  • Observability & telemetry
  • DevSecOps automation
  • Security documentation

Why This Project Matters for Cybersecurity Careers

This project directly supports roles such as:

  • Security Engineer
  • ISSE
  • Cloud Security Engineer
  • DevSecOps Engineer
  • AI Governance / AI Safety Specialist
  • AI Solutions Engineer (Security)

It is portfolio-ready and showcases both technical depth and security documentation skills.


Lessons Learned

What Worked Well

  • Guardrails AI reduced harmful outputs effectively
  • Phoenix gave real-time visibility into LLM behavior
  • Modular tests made validation fast

Challenges

  • Tuning thresholds for PII detection
  • Managing complex dependencies
  • Ensuring security didn’t reduce usability

Future Enhancements

  • Add container scanning (Trivy)
  • Add IaC scanning (Checkov)
  • Build CI/CD pipelines
  • Add adversarial ML tests
  • Add performance benchmarking

Conclusion

This 7-day sprint wasn’t just a security engineering project—it was a deep, hands-on learning experience that pushed me into new technologies, new tools, and new ways of thinking about cybersecurity in the age of AI. Working across Flask, Guardrails AI, Phoenix, Bandit, and NIST RMF reinforced how fast the security landscape is evolving and how important it is to stay curious, experimental, and adaptable.

By combining traditional RMF practices with modern AI security techniques, I learned how to bridge two worlds: established cybersecurity frameworks and emerging agentic AI architectures. Exploring observability, threat modeling, and guardrail design helped me understand not only how these systems work, but how they can fail—and what controls are needed to strengthen them.

Most importantly, this project reminded me that learning new technologies is the fastest way to grow as an agentic AI security or solutions engineer. Every tool I used led to another question, another insight, or another experiment. That constant iteration is what transforms theory into expertise.

As I continue moving toward AI cybersecurity and AI solutions engineering, I plan to keep taking on projects like this: projects that challenge me, expose me to new technologies, and help me build hands-on experience in the future of security.

The full codebase, security documentation, threat model, and automation scripts will be available on GitHub for anyone who wants to explore or build on this work.

As someone aiming to transition into AI-agentic cybersecurity or AI solutions engineering, I am using these projects to showcase my skills to prospective employers.


Tools & Technologies Used

  • Flask 3.1.2
  • OpenAI API 1.109.1
  • Guardrails AI 0.7.0
  • Arize Phoenix OTEL 0.14.0
  • Bandit 1.9.2
  • Python 3.x
  • NIST RMF + 800-53

Project Duration: 7 days
Lines of Code: ~500
Documentation: 8 pages
Controls Mapped: 10

About the author

Shirin

QA Automation & GRC professional specializing in Playwright, Cypress, AI prompt engineering, and security testing. I bridge technology and compliance to help organizations reduce risk, improve software quality, and innovate responsibly.
