MCP Architecture: Government Data Platform

Overview

This document defines the architecture for a modular government data platform built on MCP (Model Context Protocol) groups. The architecture emphasizes domain-specific engines over a single “mega-MCP” to ensure scalability, clarity, and maintainability.

Core Principles

Avoid a Single Mega-MCP

A single monolithic MCP leads to:

  • Cognitive overload: Too many tools, unclear boundaries
  • Tight coupling: Changes ripple across unrelated domains
  • Slow iteration: Hard to evolve individual domains independently
  • Blurry ownership: Unclear who owns what logic
  • Scaling issues: Becomes unmaintainable as it grows

Solution: Use MCP groups, each owning a coherent “decision surface” (type of question).

Use MCP Groups

Each MCP group:

  • Owns a coherent decision surface (e.g., “How big/fragmented is the market?”)
  • Has stable inputs (NAICS, geography, time)
  • Produces reusable analytical outputs
  • Builds engines, not endpoints

Core Utilities MCP

A small, shared layer for primitives:

  • Normalization (NAICS codes, geography codes, time alignment)
  • Unit conversion
  • Inflation adjustment
  • Mappings (NAICS/geography concordances)

All domain MCPs depend on Core Utilities.
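A minimal sketch of the NAICS normalization primitive, assuming codes arrive as strings that may carry padding zeros or surrounding whitespace (the provenance example in this document shows '0054' → '54'). The name `normalize_naics` and the 2-to-6-digit check are illustrative assumptions, not the platform's actual API. Range-style sector labels such as '31-33' are rejected rather than guessed at.

```python
def normalize_naics(code: str) -> str:
    """Return a canonical NAICS code: ASCII digits only, no padding zeros."""
    cleaned = code.strip().lstrip("0")  # '0054' -> '54'
    is_numeric = cleaned.isascii() and cleaned.isdigit()
    if not is_numeric or not 2 <= len(cleaned) <= 6:
        raise ValueError(f"not a valid NAICS code: {code!r}")
    return cleaned
```

Because every domain MCP would call the same function, two datasets that spell the same industry differently end up with one join key.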

Composition Over Duplication

Products combine MCP outputs:

  • TAM Calculator = Market Structure MCP
  • Expansion Tool = Market Structure + Geography MCPs
  • Viability Score = Market Structure + Industry Economics + Macro Conditions MCPs
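The composition idea can be sketched as follows. This is an illustration under stated assumptions: `McpResult` and `compose` are hypothetical names, and the sketch only merges outputs and source lists; it does not attempt to reproduce any scoring methodology, which the document does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McpResult:
    """Hypothetical container for one domain MCP's output."""
    mcp: str                  # e.g. "market_structure"
    values: dict              # analytical outputs from that MCP
    sources: tuple = ()       # dataset names used, for provenance


def compose(*results: McpResult) -> dict:
    """Combine several domain MCP outputs without re-implementing their logic."""
    return {
        # Each MCP's output stays under its own key, so ownership remains visible.
        "inputs": {result.mcp: result.values for result in results},
        # The union of all upstream sources, de-duplicated and sorted.
        "sources": sorted({s for result in results for s in result.sources}),
    }
```

A product such as the Viability Score would call `compose(market, economics, macro)` and then apply its own product-level logic to the merged inputs.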

When a Single MCP Works

Only for shared primitives:

  • Normalization
  • Unit conversion
  • Inflation adjustment
  • Mappings (NAICS/geography)

Not for domain logic or multi-domain joins.
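As an example of such a shared primitive, inflation adjustment can be sketched with the standard price-index ratio: a nominal amount from one year is multiplied by the base-year index and divided by the index of its own year. The function name, the `price_index` mapping, and the choice of index series are assumptions for illustration; the document does not say which deflator the platform uses.

```python
def adjust_for_inflation(
    nominal: float,
    year: int,
    base_year: int,
    price_index: dict[int, float],
) -> float:
    """Restate a nominal amount from `year` in `base_year` prices."""
    try:
        return nominal * price_index[base_year] / price_index[year]
    except KeyError as missing:
        # Fail loudly instead of silently returning an unadjusted value.
        raise ValueError(f"no price index value for year {missing}") from None
```

Keeping this in one place means every domain MCP reports dollar figures against the same base year.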

Architecture Diagram

graph TB
    subgraph "Core Layer"
        CoreGovData["Core Government Data MCP<br/>(Raw Data Access)"]
        CoreUtils["Core Utilities MCP<br/>(Normalization, Mappings)"]
    end
    
    subgraph "Domain MCPs"
        MarketStruct["Market Structure MCP<br/>(Size, Fragmentation, Reachability)"]
        IndustryEcon["Industry Economics MCP<br/>(Margins, Costs, Cyclicality)"]
        MacroCond["Macro Conditions MCP<br/>(Regime Indicators, Pressures)"]
        LaborSkills["Labor & Skills MCP<br/>(Hiring, Wages, Automation)"]
        Geography["Geography & Real Assets MCP<br/>(Location, Costs, Demand)"]
    end
    
    subgraph "Products"
        TAMCalc["TAM Calculator"]
        ExpansionTool["Expansion Planner"]
        ViabilityScore["Market Viability Tool"]
        IndustrySnapshot["Industry Snapshots"]
    end
    
    CoreGovData -->|"Raw Data"| MarketStruct
    CoreGovData -->|"Raw Data"| IndustryEcon
    CoreGovData -->|"Raw Data"| MacroCond
    CoreGovData -->|"Raw Data"| LaborSkills
    CoreGovData -->|"Raw Data"| Geography
    
    CoreUtils -->|"Normalization"| MarketStruct
    CoreUtils -->|"Normalization"| IndustryEcon
    CoreUtils -->|"Normalization"| MacroCond
    CoreUtils -->|"Normalization"| LaborSkills
    CoreUtils -->|"Normalization"| Geography
    
    MarketStruct -->|"Market Insights"| TAMCalc
    MarketStruct -->|"Market Insights"| ExpansionTool
    MarketStruct -->|"Market Insights"| IndustrySnapshot
    
    MarketStruct -->|"Compose"| ViabilityScore
    IndustryEcon -->|"Compose"| ViabilityScore
    MacroCond -->|"Compose"| ViabilityScore
    
    MarketStruct -->|"Compose"| ExpansionTool
    Geography -->|"Compose"| ExpansionTool

MCP Group Structure

1. Core Government Data MCP

Purpose: Raw, authoritative access to U.S. government economic/business data APIs.

Role: Data-access layer, not an analytical engine.

Sources:

  • Census Bureau (ABS, BDS, BFS, CBP, ACS)
  • BEA (GDP, regional, industry)
  • BLS (OES, CES, PPI)
  • FRED (macro time series)
  • Treasury (yield curves, debt)
  • SEC EDGAR (company filings)

Rules:

  • ✅ Provide raw API access
  • ✅ Minimal normalization (format only)
  • ✅ Return provenance metadata
  • ❌ No joins across agencies
  • ❌ No derived metrics
  • ❌ No business logic
  • ❌ No interpretation

Naming Convention: All tools prefixed with fetch_raw_* to signal raw data access.
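A hedged sketch of what a `fetch_raw_*` tool might look like under these rules. The function name `fetch_raw_bds`, the injected `http_get` callable, the placeholder URL, and the `request` field in the provenance are all assumptions made for illustration. The point is what the function leaves out: no joins, no derived metrics, no interpretation.

```python
from typing import Any, Callable

# Placeholder endpoint for illustration only; not a real API URL.
BDS_URL = "https://example.invalid/bds"


def fetch_raw_bds(
    params: dict[str, str],
    http_get: Callable[[str, dict[str, str]], Any],
) -> dict[str, Any]:
    """Return the upstream payload untouched, plus provenance metadata."""
    payload = http_get(BDS_URL, params)  # transport is injected, so tests need no network
    return {
        "data": payload,  # passed through as-is
        "provenance": {
            "agency": "Census Bureau",
            "dataset": "BDS",
            "request": {"url": BDS_URL, "params": dict(params)},
        },
    }
```

Injecting the transport keeps the data-access layer testable in-process, which fits the current in-process composition approach.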

Downstream Consumers:

  • Market Structure MCP
  • Industry Economics MCP
  • Macro Conditions MCP
  • Labor & Skills MCP
  • Geography & Real Assets MCP

2. Market Structure MCP

Purpose: Market size, structure, and reachability insights.

Decision Surface: “How big/fragmented is the market? Where are businesses forming? Who can be reached?”

Sources (via Core Gov Data MCP):

  • Census (ABS, BDS, BFS, CBP)
  • BEA (GDP by industry/region)
  • SBA (business formation data)

Outputs:

  • Market size (TAM/SAM/SOM)
  • Firm counts by size category
  • Entry/exit rates
  • Market fragmentation (HHI, concentration)
  • Market reachability by firm size

Powers:

  • TAM/SAM/SOM calculators
  • Market viability tools
  • Expansion logic
  • Industry snapshots

Tools:

  • get_market_size(naics, geography, year)
  • get_firm_counts(naics, geography, year)
  • get_entry_exit_rates(naics, geography, year_range)
  • get_market_fragmentation(naics, geography, year)
  • get_market_reachability(naics, geography, firm_size)
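The fragmentation output relies on the Herfindahl-Hirschman Index (HHI), which is the sum of squared market shares. With shares expressed in percent, it ranges from near 0 (highly fragmented) to 10,000 (a single firm). The sketch below assumes firm-level revenues are available as input; how `get_market_fragmentation` would approximate shares from size-class data is not specified in this document, and `herfindahl_index` is a hypothetical helper name.

```python
from typing import Iterable


def herfindahl_index(revenues: Iterable[float]) -> float:
    """HHI on the 0-10,000 scale: sum of squared percentage market shares."""
    values = list(revenues)
    if any(value < 0 for value in values):
        raise ValueError("revenues must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum((100.0 * value / total) ** 2 for value in values)
```

Rejecting a zero total up front avoids the division-by-zero case that the error model lists under computation errors.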

3. Industry Economics MCP ✅

Purpose: Profitability, cost structure, and cyclicality analysis.

Decision Surface: “Are margins real? What drives costs? How cyclical is the industry?”

Sources (via Core Gov Data MCP):

  • BEA (industry GDP, profits)
  • IRS SOI (corporate/partnership income by sector)
  • BLS (PPI, cost indices)

Outputs:

  • Margins (gross, operating, net)
  • Cost mix (labor, materials, overhead)
  • Cyclicality indicators
  • Structural profitability benchmarks

Powers:

  • Profitability benchmarks
  • Cost structure analysis
  • Reality checks (hype correction)

Status: ✅ Complete - Standalone implementation with its own API clients

4. Macro Conditions MCP ✅

Purpose: Macroeconomic regime indicators and pressures.

Decision Surface: “Are conditions tightening/easing? What regime are we in? How does it affect businesses?”

Sources (via Core Gov Data MCP):

  • FRED (inflation, rates, GDP, employment)
  • Treasury (yield curves, debt) - Planned
  • Federal Reserve (non-FRED: supervision, stress tests) - Planned

Outputs:

  • Regime indicators (expansion/recession/tightening/easing)
  • Industry-specific macro effects
  • Cost indices
  • Margin risk indicators

Powers:

  • Dashboards
  • Scenario engines
  • Outlook reports

Rule: Never used standalone; its output always feeds other MCPs or composed products.

Status: ✅ Complete - All 3 tools implemented with FRED and BEA data

5. Labor & Skills MCP ✅

Purpose: Hiring viability, wage pressure, and automation exposure.

Decision Surface: “Can the industry hire? At what cost? What’s automatable?”

Sources (via Core Gov Data MCP):

  • BLS (OES, CES)
  • O*NET (occupation skills, tasks, automation) - Bulk download
  • Census (demographics, labor force)

Outputs:

  • Skill demand/availability
  • Wage pressure indicators
  • Automation exposure scores
  • Workforce planning metrics

Powers:

  • Hiring viability tools
  • Wage pressure indicators
  • Workforce planning
  • AI readiness assessment

Status: ✅ Complete - All 3 tools implemented with BLS and Census data

6. Geography & Real Assets MCP ✅

Purpose: Location recommendations, cost/demand tradeoffs, growth arbitrage.

Decision Surface: “Where is it cheapest to operate? Where is demand growing? What regions make sense?”

Sources (via Core Gov Data MCP):

  • Census (demographics, business formation)
  • BEA (regional GDP)
  • BLS (regional employment/wages)
  • FHFA/HUD (housing, real assets) - Planned
  • Market Structure MCP (via composition)
  • Industry Economics MCP (via composition)

Outputs:

  • Location rankings
  • Cost/demand tradeoffs
  • Growth arbitrage opportunities
  • Regional viability scores

Powers:

  • Location recommendations
  • Expansion planning
  • Regional strategy decisions

Status: ✅ Complete - All 3 tools implemented with MCP composition

Data Flow

Request Flow

  1. Product calls domain MCP (e.g., Market Structure)
  2. Domain MCP calls Core Government Data MCP for raw data
  3. Domain MCP calls Core Utilities MCP for normalization
  4. Domain MCP performs joins and analysis
  5. Domain MCP returns interpreted results with provenance
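The request flow above can be sketched for a firm-count query. Everything named here is an assumption for illustration: the row fields (`naics`, `size_class`, `firms`), the injected `fetch_raw` and `normalize_naics` callables, and the shape of the return value. The step comments map back to the numbered list.

```python
from collections import defaultdict
from typing import Callable


def get_firm_counts(
    naics: str,
    geography: str,
    year: int,
    *,
    fetch_raw: Callable[..., dict],
    normalize_naics: Callable[[str], str],
) -> dict:
    """Domain MCP tool: fetch raw rows, normalize, aggregate, attach provenance."""
    raw = fetch_raw(naics=naics, geography=geography, year=year)  # step 2
    target = normalize_naics(naics)                               # step 3
    counts: dict[str, int] = defaultdict(int)
    for row in raw["rows"]:
        if normalize_naics(row["naics"]) == target:               # step 3
            counts[row["size_class"]] += row["firms"]             # step 4
    return {                                                      # step 5
        "data": dict(counts),
        "provenance": {
            "sources": raw["sources"],
            "transforms": ["Aggregated firm counts by size category"],
        },
    }
```

Passing the Core dependencies in as arguments keeps the domain logic separate from data access and normalization, matching the layer boundaries described earlier.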

Composition Flow

  1. Product calls multiple domain MCPs
  2. Product combines outputs (e.g., Market Structure + Industry Economics)
  3. Product returns composite insights

Provenance Envelope

Every output from domain MCPs must include:

{
  "provenance": {
    "sources": [
      {
        "agency": "Census Bureau",
        "dataset": "BDS",
        "dataset_id": "BDS_2023",
        "release_date": "2024-03-15",
        "variables": ["ESTABS_ENTRY", "ESTABS_EXIT"]
      }
    ],
    "transforms": [
      "Normalized NAICS code: '0054' → '54'",
      "Aggregated firm counts by size category",
      "Calculated HHI from firm size distribution"
    ],
    "units": {
      "market_size": "USD",
      "inflation_base_year": 2023,
      "geography": "us:*"
    },
    "methodology": "Market size calculated as sum of firm revenues from ABS, adjusted for inflation to 2023 base year."
  }
}
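One way to enforce the envelope is a small check that reports which fields are absent. The sketch below takes its required keys directly from the example above; the function name `missing_provenance_fields` and the dotted-path reporting format are assumptions.

```python
REQUIRED_TOP_LEVEL = ("sources", "transforms", "units", "methodology")
REQUIRED_PER_SOURCE = ("agency", "dataset", "dataset_id", "release_date", "variables")


def missing_provenance_fields(output: dict) -> list[str]:
    """Return dotted paths of provenance fields missing from a domain MCP output."""
    provenance = output.get("provenance")
    if not isinstance(provenance, dict):
        return ["provenance"]
    missing = [
        f"provenance.{key}" for key in REQUIRED_TOP_LEVEL if key not in provenance
    ]
    for index, source in enumerate(provenance.get("sources", [])):
        missing += [
            f"provenance.sources[{index}].{key}"
            for key in REQUIRED_PER_SOURCE
            if key not in source
        ]
    return missing
```

An empty list means the envelope is structurally complete. A test suite could assert this for every tool's output.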

Error Model

Error Categories

  1. Data Not Available: Requested data doesn’t exist or is suppressed
  2. Invalid Input: Invalid NAICS code, geography, or year
  3. API Error: Upstream API failure (with retry guidance)
  4. Computation Error: Analysis failed (e.g., division by zero)

Error Response Format

{
  "success": false,
  "error": {
    "error_code": "data_not_available",
    "message": "Market size data not available for NAICS 999999 in geography us:* for year 2023",
    "retry_after": null,
    "suggestions": [
      "Try a broader NAICS code (e.g., '54' instead of '541211')",
      "Try a different geography (e.g., 'state:06' instead of 'us:*')"
    ]
  }
}
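A sketch of a helper that builds responses in this format. Only `data_not_available` appears in the example above; the other three code strings are assumed snake_case renderings of the listed categories, and `error_response` is a hypothetical name.

```python
from enum import Enum
from typing import Optional, Sequence


class ErrorCode(str, Enum):
    DATA_NOT_AVAILABLE = "data_not_available"  # shown in the example response
    INVALID_INPUT = "invalid_input"            # assumed spelling
    API_ERROR = "api_error"                    # assumed spelling
    COMPUTATION_ERROR = "computation_error"    # assumed spelling


def error_response(
    code: ErrorCode,
    message: str,
    retry_after: Optional[int] = None,
    suggestions: Sequence[str] = (),
) -> dict:
    """Build an error payload with the same keys as the documented format."""
    return {
        "success": False,
        "error": {
            "error_code": code.value,
            "message": message,
            "retry_after": retry_after,  # None when retrying would not help
            "suggestions": list(suggestions),
        },
    }
```

Routing every failure through one builder keeps the response shape identical across domain MCPs, so products can handle errors uniformly.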

Versioning Strategy

MCP Versioning

  • Semantic versioning: v1.0.0, v1.1.0, v2.0.0
  • Breaking changes: New major version (e.g., v2.0.0)
  • New features: New minor version (e.g., v1.1.0)
  • Bug fixes: New patch version (e.g., v1.0.1)
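The bump rules above can be expressed as a small helper. This is a sketch that assumes versions always look like `vMAJOR.MINOR.PATCH`; the function name and the change labels (`breaking`, `feature`, `fix`) are illustrative.

```python
def bump_version(version: str, change: str) -> str:
    """Apply the semantic-versioning bump that matches the kind of change."""
    major, minor, patch = (int(part) for part in version.removeprefix("v").split("."))
    if change == "breaking":
        return f"v{major + 1}.0.0"           # breaking change: new major, reset the rest
    if change == "feature":
        return f"v{major}.{minor + 1}.0"     # new feature: new minor, reset patch
    if change == "fix":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Note that `str.removeprefix` requires Python 3.9 or later.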

Tool Versioning

  • Tools are versioned as part of their MCP's version
  • Deprecated tools are marked with @deprecated in the docs
  • A new version of a tool takes a version suffix (e.g., get_market_size_v2)

Schema Versioning

  • Input/output schemas versioned separately
  • Schema changes documented in docs/contracts/
  • Breaking schema changes trigger MCP major version bump

Communication Patterns

In-Process Composition (Current)

Domain MCPs import the Core Utilities and Core Government Data layers as libraries.

  • Communication: Function calls with typed DTOs
  • Pros: Fastest dev, easiest testing, no network overhead
  • Cons: Less independently deployable; versioning is repo-level

Current approach: Start with in-process composition.

Service Composition (Future)

Each MCP is a service exposing an API (HTTP/JSON).

  • Communication: HTTP with strict schemas + versioning
  • Pros: Deploy/scale independently, clearer boundaries
  • Cons: More infra + failure modes; contract management matters

Future approach: Design interfaces as if moving to service composition later.
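One way to keep interfaces service-ready while still composing in-process is to define typed DTOs and have callers depend on an interface rather than a concrete class. The sketch below is an assumption-laden illustration: the DTO fields, `MarketStructureClient`, and `InProcessMarketStructure` are hypothetical names, not the platform's actual types.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class MarketSizeRequest:
    naics: str
    geography: str
    year: int


@dataclass(frozen=True)
class MarketSizeResponse:
    market_size_usd: float
    provenance: dict


class MarketStructureClient(Protocol):
    """What products depend on, regardless of how the call is transported."""

    def get_market_size(self, request: MarketSizeRequest) -> MarketSizeResponse: ...


class InProcessMarketStructure:
    """Current approach: a plain function call behind the shared interface."""

    def __init__(self, engine: Callable[[MarketSizeRequest], MarketSizeResponse]):
        self._engine = engine

    def get_market_size(self, request: MarketSizeRequest) -> MarketSizeResponse:
        return self._engine(request)
```

Because the DTOs are plain dataclasses, `dataclasses.asdict` turns them into JSON-ready dictionaries. A later HTTP-backed client could implement the same `get_market_size` method without changing any product code.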

Build Order

Phase 0: Core Utilities (Weeks 0-1)

Stop Criteria: Datasets can be joined on industry × geography × time.

Deliverables:

  • NAICS normalization
  • Geography normalization
  • Time alignment utilities
  • Basic mappings

Phase 1: Market Structure MCP (Weeks 1-3)

Stop Criteria: Answer “How big/fragmented/where/for whom?” for any NAICS.

Deliverables:

  • 5 core tools implemented
  • TAM tool integration
  • Industry snapshots
  • Selectors

Phase 2: Industry Economics MCP (Weeks 3-6)

Stop Criteria: Identify “big but structurally bad” markets.

Deliverables:

  • Profitability panels
  • Cost structure charts
  • Reality checks

Phase 3: Macro Conditions MCP (Weeks 6-8)

Stop Criteria: Macro outputs answer “So what?” concisely.

Deliverables:

  • Industry-specific explanations
  • Regime indicators
  • Scenario toggles

Phase 4: Labor & Skills + Geography MCPs (Weeks 8-12)

Stop Criteria: Users can decide “where/whether to operate.”

Deliverables:

  • Hiring viability tools
  • Location recommendations
  • Expansion widgets

Benefits

  • Speed: Parallel development of domain MCPs
  • Clarity: Clear ownership and boundaries
  • Extensibility: Easy to add new domain MCPs
  • Defensible IP: Methodology and composition logic
  • Flexibility: Ready for LLMs or APIs

Non-Goals

This architecture does NOT:

  • Create a single mega-MCP
  • Mix domain logic in Core Government Data MCP
  • Chain MCPs at request-time without proper error handling
  • Duplicate logic across MCPs
  • Skip provenance tracking
