Data engineering, expanded to architecture.

Diego Lafuente Data Systems Architect

Systems-oriented builder with 12+ years architecting data pipelines, platform implementations, and autonomous workflow systems across insurance technology, benefits administration, and pension domains. Designing and orchestrating enterprise data solutions while building agentic workflow systems. Managed operations for 15 clients, directed teams of 20+ analysts and developers, and delivered $800K+ in change order revenue. Expanding deep data engineering expertise across wider architectural surfaces.

12+

Years Experience

Data engineering & systems architecture

30+

Customer Migrations

Legacy-to-AWS cloud migration

1M+

Records at Scale

Enterprise data pipeline throughput

70+

Data Integrations

Concurrent vendor data pipelines

25+

Integration Contracts

Vendor data feed architectures

70+

Business Objects

DTO catalog from legacy template analysis

Proof of Work

Flagship case studies from real engagements. Expand for the full narrative.

Multi-Employer Pension System Stabilization & $600K Revenue Recovery

Tags: architecture · leadership · pension · problem-solving · agile
Key result: $600K+ change order revenue
Situation

Inherited an unstable pension system for a multi-employer pension plan with critical operational issues. The platform had accumulated technical debt and the client relationship was at risk due to recurring system failures and delayed deliverables.

Task

Stabilize system operations, restore client confidence, and deliver pending change orders worth $600K+ while simultaneously managing ongoing pension administration needs.

Action

Led a structured stabilization initiative by triaging critical defects, implementing modular fixes, and establishing a prioritized backlog. Decomposed complex pension domain into structured system requirements. Designed automation for retroactive coverage processing with downstream life event automation. Directed technical stabilization across development and QA tracks with SQL-heavy backend analysis and automation design, delivering incremental improvements while maintaining operational stability.

Result

Stabilized the pension system, delivered $600K+ in change order revenue, and restored the client relationship. Established repeatable patterns for managing unstable legacy platforms through structured decomposition and agile delivery.

Public Pension Fund Domain Architecture & System Decomposition

Tags: architecture · data-modeling · data-engineering · pension · agile · communication
Key results:
1500+ structured requirements
200K-300K participant lives
3.5-year implementation
Situation

A public pension fund for unionized workers required a comprehensive multi-employer pension implementation covering eligibility, benefit calculations, disability, death benefits, payment methods, and participant statements — a complex, interlocking rule system serving 200K-300K participant lives. The pension rules were complex and poorly documented, requiring deep domain expertise to decompose into implementable system architecture.

Task

Decompose the full scope of the pension fund domain into implementable system architecture, ensuring complete coverage of all pension modules across a multi-year implementation.

Action

Decomposed the fund's complex pension domain into implementable system architecture across 1500+ structured requirements spanning eligibility logic, benefit calculations, disability processing, death benefits, payment methods, and participant statements. Mapped pension rules, calculations, and business logic to system modules enabling parallel development streams. Drove first scaled agile program for the value stream, earning SAFe certification through hands-on release train engineering.

Result

Delivered a comprehensive pension domain architecture across 1500+ structured requirements, enabling parallel development across multiple teams. The 3.5-year implementation was delivered through the SAG-AFTRA strike and COVID, bringing the client relationship from red to yellow status.

Enterprise Benefits Data Integration Architecture — 70+ Concurrent Pipelines

Tags: architecture · data-engineering · integrations · sql · benefits
Key results:
~70 concurrent integrations
15 enterprise clients
1M+ participants
20+ team members
Situation

The benefits platform served 15 enterprise clients requiring data integration pipelines to 10-15 vendors each — insurance carriers, payroll systems, DCFSA administrators. At operational steady state, approximately 70 integrations ran concurrently across SQL/SSIS pipelines handling EDI 834, flat files, position-based formats, feedback loops, and change detection.

Task

Design and maintain the data integration architecture serving the full client portfolio — pipeline design, format engineering, and vendor feed reliability across all 15 enterprise accounts.

Action

Led integration pipeline design across 20+ analysts and developers, establishing SQL/SSIS data export pipelines for vendor integrations after participant elections. Engineered SQL/SSIS integration pipelines for EDI 834, flat files, position-based formats (e.g., a retirement provider with ~5,000 positions per row), multi-row files, feedback loops, and change detection logic. Designed data transformation architectures mapping internal platform data to vendor-specific output formats across insurance carriers, payroll providers, and benefits vendors. Maintained COBRA, ACA, and PHI compliance across all data integration touchpoints.
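The change-detection logic mentioned above can be sketched as a snapshot diff: compare the current enrollment extract against the last transmitted one and emit only adds, terminations, and plan changes, so vendors receive deltas rather than full files. This is a minimal illustration in Python/SQLite — table and column names are hypothetical, not the production SQL/SSIS schema.

```python
import sqlite3

def detect_changes(conn: sqlite3.Connection) -> dict:
    """Diff `current` vs `previous` enrollment snapshots keyed by participant.

    Assumes two snapshot tables with illustrative columns
    (participant_id, plan_code); the real feeds carry far more fields.
    """
    # Participants present now but absent from the last transmission.
    adds = conn.execute(
        "SELECT c.participant_id FROM current c "
        "LEFT JOIN previous p ON p.participant_id = c.participant_id "
        "WHERE p.participant_id IS NULL"
    ).fetchall()
    # Participants who dropped off since the last transmission.
    terms = conn.execute(
        "SELECT p.participant_id FROM previous p "
        "LEFT JOIN current c ON c.participant_id = p.participant_id "
        "WHERE c.participant_id IS NULL"
    ).fetchall()
    # Participants whose election changed between snapshots.
    changes = conn.execute(
        "SELECT c.participant_id FROM current c "
        "JOIN previous p ON p.participant_id = c.participant_id "
        "WHERE c.plan_code <> p.plan_code"
    ).fetchall()
    return {
        "add": [r[0] for r in adds],
        "term": [r[0] for r in terms],
        "change": [r[0] for r in changes],
    }
```

The same three-way diff generalizes across formats: once a vendor feed is normalized into a snapshot table, the delta logic is identical whether the output is EDI 834 or a position-based flat file.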

Result

Architected and maintained a data integration platform serving 1M+ benefit plan participants with ~70 concurrent pipelines across 15 enterprise clients. Self-initiated sprint cycles drove continuous pipeline improvement without a formal agile mandate.

What I Build

Autonomous systems built end-to-end — from architecture decisions through production deployment. Each project demonstrates system design, engineering tradeoffs, and operational thinking.

AI-Powered Orchestration Systems

active
What it is

Model-agnostic multi-agent orchestration framework that coordinates Claude (Anthropic), Codex (OpenAI), Cursor, and Gemini (Google) through structured execution control plans with human governance loops. Powers development workflows across Cascade, Runner's Review, and Career-Ops.

Why it exists

Demonstrates end-to-end system architecture capability: SQLite-backed state management, tmux split-pane multi-agent coordination, Slack integration for human oversight, and event-driven agent activation across multiple AI model providers.

Key decisions
  • SQLite over message queue for state management — Single-writer workload with read-heavy queries. SQLite provides ACID guarantees, zero-config deployment, and portable state that can be inspected with standard tools.
  • File-based watchers over WebSocket for agent activation — OS-native file watching (inotify/FSEvents) avoids a persistent server process. Agents run a lightweight SQLite poll query every 3 seconds, keeping resource usage minimal.
  • STOP gates with evidence-based handoffs — Prevents unbounded agent autonomy. Each gate requires structured evidence before the orchestrator approves continuation, creating an audit trail of decisions and artifacts.
Components
Orchestration DB · DB Watcher · Agent Harness · Slack Listener
Python · SQLite · tmux · Slack API · Claude API · OpenAI Codex · Multi-Model Orchestration

Cascade

active
What it is

Cloud-native benefits intelligence SaaS platform (Aquaduct Cascade) delivering regulatory intelligence, vendor sentiment analysis, compliance alerts, and plan-design monitoring for benefits professionals through dashboards and API feeds.

Why it exists

Built and deployed a production SaaS platform on AWS with Next.js SSR on ECS Fargate, PostgreSQL RDS, CloudFront, Application Load Balancer routing, and EventBridge-scheduled collector tasks. Operational maturity includes 41 ADRs, 34 postmortems, and 46+ runbooks supporting blue/green releases, incident learning loops, and reliable data collection.

Key decisions
  • Next.js over plain React SPA — Server-side rendering keeps dashboard and content pages fast while preserving API-route flexibility for authenticated workflows and operational endpoints.
  • ECS Fargate plus ALB over static hosting — The product needs SSR, authenticated application flows, health checks, and blue/green delivery controls that fit containerized compute behind AWS load balancing.
  • Prisma ORM over raw SQL — Type-safe database access with auto-generated migrations. Schema changes propagate to TypeScript types automatically.
  • EventBridge-triggered collectors over manual research — Scheduled collectors surface regulatory changes, M&A activity, and vendor signals within hours instead of waiting on manual review cycles.
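The collector pattern above reduces to a small, repeatable step: a scheduled task fetches an RSS/Atom source and normalizes its entries before they land in the intelligence store. A minimal sketch of the parsing half, using only the Python standard library — the feed shape is the generic Atom schema, not Cascade's actual collector code:

```python
import xml.etree.ElementTree as ET

# Atom namespace as defined by RFC 4287.
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def extract_entries(atom_xml: str) -> list[dict]:
    """Pull title/updated pairs out of an Atom feed document.

    A real collector would also normalize timestamps, dedupe against
    previously stored entries, and tag each item by source.
    """
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.iter(f"{ATOM_NS}entry"):
        entries.append({
            "title": entry.findtext(f"{ATOM_NS}title"),
            "updated": entry.findtext(f"{ATOM_NS}updated"),
        })
    return entries
```

On a schedule, EventBridge launches the collector task, which fetches each configured feed, runs this extraction, and writes new entries to PostgreSQL — hours-level latency instead of manual review cycles.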
Components
Edge Delivery Layer · Web Application · Intelligence Store · Collector Runtime · Delivery Operations
Next.js 15 · React · TypeScript · Prisma · PostgreSQL · AWS ECS Fargate · AWS RDS · AWS EventBridge Scheduler · AWS CloudFront · AWS Application Load Balancer · Docker · CloudWatch · Terraform · RSS/Atom · REST API

Runner's Review

active
What it is

Data-driven running performance analysis application that ingests training logs and race results from multiple sources (Garmin, Strava APIs), normalizes heterogeneous data formats, and produces pacing recommendations, split analysis, and progression tracking through automated pipelines.

Why it exists

End-to-end data engineering in a personal domain: multi-source API ingestion, schema normalization across vendor formats, transformation pipelines for pace/HR/elevation analytics, and automated report generation with trend detection.

Key decisions
  • Pandas for transformation over raw SQL — Complex pace/HR analytics require windowed aggregations and statistical functions that are more expressive in pandas than SQL. The dataset fits comfortably in memory.
  • SQLite as the analytical store — Local-first design with no server dependency. Race results and training logs are append-mostly data that benefits from SQL query flexibility for ad-hoc analysis.
  • Matplotlib for static report generation — Reports are generated as static images for inclusion in weekly summaries. No interactive dashboard requirement — static charts provide reproducible, version-controlled output.
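The pandas-over-SQL decision above is easiest to see with a windowed aggregation: a rolling pace trend is one line in pandas but verbose in SQLite. A minimal sketch — the column names (`date`, `distance_km`, `duration_min`) are illustrative, not the app's actual schema:

```python
import pandas as pd

def rolling_pace_trend(runs: pd.DataFrame, window: int = 4) -> pd.DataFrame:
    """Compute per-run pace and a rolling mean pace over recent runs.

    Expects illustrative columns: date, distance_km, duration_min.
    """
    runs = runs.sort_values("date").copy()
    # Per-run pace in minutes per kilometer.
    runs["pace_min_per_km"] = runs["duration_min"] / runs["distance_km"]
    # Rolling mean smooths day-to-day noise into a progression signal;
    # min_periods=1 keeps early runs in the output.
    runs["pace_trend"] = (
        runs["pace_min_per_km"].rolling(window, min_periods=1).mean()
    )
    return runs
```

The same frame feeds Matplotlib directly for the static weekly report charts, keeping the whole pipeline in one process with no server dependency.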
Components
API Ingestion Layer · Unified Data Store · Analytics Engine · Report Generator
Python · pandas · REST APIs · SQLite · Data Modeling · Matplotlib

Career-Ops

active
What it is

Autonomous career lifecycle management platform covering resume engineering, document generation, job hunting automation, and career trajectory planning. Uses multi-agent orchestration with Scaled Agile Framework methodology adapted for solo operation.

Why it exists

Full-stack autonomous system: schema-driven document generation pipeline, multi-format output (DOCX, PDF, HTML, TXT), variant-based resume targeting, and agent-coordinated workflow execution.

Key decisions
  • YAML source of truth over database-per-domain — Single file enables version-controlled career data with diff-friendly format. JSON Schema validation ensures structural integrity while keeping the data human-readable and editable.
  • Jinja2 templates with variant system — Variants allow targeted resume generation (data-engineer vs architect) from the same data model. Jinja2 provides inheritance, macros, and filters for complex document layouts.
  • Astro for static site generation — Zero-JS-by-default architecture produces fast, accessible pages. Career data is loaded at build time — no runtime API calls or client-side rendering required.
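The variant system described above can be sketched in a few lines: tag entries in the single data model, then let the template filter by the chosen variant. This is an illustrative miniature — the profile fields, tags, and template are hypothetical, not the real YAML schema:

```python
from jinja2 import Template

# Hypothetical data model; the real source of truth is YAML
# validated against a JSON Schema.
PROFILE = {
    "name": "Jane Doe",
    "experience": [
        {"title": "Data Engineer", "tags": ["data-engineer"]},
        {"title": "Solutions Architect", "tags": ["architect"]},
    ],
}

# Jinja2's inline loop filter keeps variant selection in the template,
# so one data model drives every targeted resume.
RESUME_TMPL = Template(
    "{{ name }}\n"
    "{% for job in experience if variant in job.tags %}"
    "- {{ job.title }}\n"
    "{% endfor %}"
)

def render_variant(profile: dict, variant: str) -> str:
    """Render only the experience entries tagged for the chosen variant."""
    return RESUME_TMPL.render(variant=variant, **profile)
```

Rendering `render_variant(PROFILE, "architect")` emits only the architect-tagged entries; the data-engineer variant comes from the same file with a different argument, which is the whole point of the single source of truth.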
Components
Career Profile Schema · Document Generation Pipeline · Portfolio Website · Job Intelligence Engine
Python · JSON Schema · Jinja2 · YAML · SQLite · GitHub Actions

Positioning

Systems-oriented builder whose anchor discipline is data engineering, deploying the same core skillset across wider architectural surfaces. Expanding from a proven foundation into system architecture, autonomous workflows, and enterprise-scale data design.

Ready to Work Together?

The resume provides a structured overview. This site contains the full proof — metrics, case studies, system architectures, and provenance for every claim.