MCP Architecture: Government Data Platform
Overview
This document defines the architecture for a modular government data platform built on MCP (Model Context Protocol) groups. The architecture emphasizes domain-specific engines over a single “mega-MCP” to ensure scalability, clarity, and maintainability.
Core Principles
Avoid a Single Mega-MCP
A single monolithic MCP leads to:
- Cognitive overload: Too many tools, unclear boundaries
- Tight coupling: Changes ripple across unrelated domains
- Slow iteration: Hard to evolve individual domains independently
- Blurry ownership: Unclear who owns what logic
- Scaling issues: Becomes unmaintainable as it grows
Solution: Use MCP groups, each owning a coherent “decision surface” (type of question).
Use MCP Groups
Each MCP group:
- Owns a coherent decision surface (e.g., “How big/fragmented is the market?”)
- Has stable inputs (NAICS, geography, time)
- Produces reusable analytical outputs
- Builds reusable analytical engines, not one-off endpoints
Core Utilities MCP
A small, shared layer for primitives:
- Normalization (NAICS codes, geography codes, time alignment)
- Unit conversion
- Inflation adjustment
- Mappings (NAICS/geography concordances)
All domain MCPs depend on Core Utilities.
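Below is a minimal sketch of three Core Utilities primitives. It rests on a few assumptions:

- NAICS codes arrive as strings that may carry stray zero padding, as in the `'0054' → '54'` transform recorded in the provenance example.
- Geographies use Census-API-style `level:code` predicates such as `us:*` and `state:06`.
- Inflation adjustment is a simple price-index ratio.

The function names are hypothetical, and hyphenated sector ranges such as 31-33 are out of scope here.

```python
import re

# Zero-pad widths for FIPS codes (state = 2 digits, county = 3 digits).
FIPS_WIDTH = {"state": 2, "county": 3}


def normalize_naics(code: str) -> str:
    """Strip whitespace and leading zero padding, then require a 2-6 digit NAICS code."""
    digits = code.strip().lstrip("0")
    if not re.fullmatch(r"\d{2,6}", digits):
        raise ValueError(f"invalid NAICS code: {code!r}")
    return digits


def normalize_geography(geo: str) -> str:
    """Lowercase 'level:code' predicates and zero-pad FIPS codes (e.g. 'State:6' -> 'state:06')."""
    level, _, code = geo.strip().lower().partition(":")
    if not code:
        raise ValueError(f"invalid geography: {geo!r}")
    if code != "*" and level in FIPS_WIDTH:
        if not code.isdigit():
            raise ValueError(f"invalid FIPS code: {geo!r}")
        code = code.zfill(FIPS_WIDTH[level])
    return f"{level}:{code}"


def adjust_for_inflation(nominal: float, price_index: dict[int, float], year: int, base_year: int) -> float:
    """Restate a nominal value in base-year dollars using a price index (e.g. CPI) keyed by year."""
    return nominal * price_index[base_year] / price_index[year]
```

Keeping these as pure functions with no I/O makes them easy to share across every domain MCP and to unit-test in isolation.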
Composition Over Duplication
Products combine MCP outputs:
- TAM Calculator = Market Structure MCP
- Expansion Tool = Market Structure + Geography MCPs
- Viability Score = Market Structure + Industry Economics + Macro Conditions MCPs
When a Single MCP Works
Only for shared primitives:
- Normalization
- Unit conversion
- Inflation adjustment
- Mappings (NAICS/geography)
Not for domain logic or multi-domain joins.
Architecture Diagram
```mermaid
graph TB
    subgraph "Core Layer"
        CoreGovData["Core Government Data MCP<br/>(Raw Data Access)"]
        CoreUtils["Core Utilities MCP<br/>(Normalization, Mappings)"]
    end
    subgraph "Domain MCPs"
        MarketStruct["Market Structure MCP<br/>(Size, Fragmentation, Reachability)"]
        IndustryEcon["Industry Economics MCP<br/>(Margins, Costs, Cyclicality)"]
        MacroCond["Macro Conditions MCP<br/>(Regime Indicators, Pressures)"]
        LaborSkills["Labor & Skills MCP<br/>(Hiring, Wages, Automation)"]
        Geography["Geography & Real Assets MCP<br/>(Location, Costs, Demand)"]
    end
    subgraph "Products"
        TAMCalc["TAM Calculator"]
        ExpansionTool["Expansion Planner"]
        ViabilityScore["Market Viability Tool"]
        IndustrySnapshot["Industry Snapshots"]
    end
    CoreGovData -->|"Raw Data"| MarketStruct
    CoreGovData -->|"Raw Data"| IndustryEcon
    CoreGovData -->|"Raw Data"| MacroCond
    CoreGovData -->|"Raw Data"| LaborSkills
    CoreGovData -->|"Raw Data"| Geography
    CoreUtils -->|"Normalization"| MarketStruct
    CoreUtils -->|"Normalization"| IndustryEcon
    CoreUtils -->|"Normalization"| MacroCond
    CoreUtils -->|"Normalization"| LaborSkills
    CoreUtils -->|"Normalization"| Geography
    MarketStruct -->|"Market Insights"| TAMCalc
    MarketStruct -->|"Market Insights"| IndustrySnapshot
    MarketStruct -->|"Compose"| ViabilityScore
    IndustryEcon -->|"Compose"| ViabilityScore
    MacroCond -->|"Compose"| ViabilityScore
    MarketStruct -->|"Compose"| ExpansionTool
    Geography -->|"Compose"| ExpansionTool
```
MCP Group Structure
1. Core Government Data MCP
Purpose: Raw, authoritative access to U.S. government economic/business data APIs.
Role: Data-access layer, not an analytical engine.
Sources:
- Census Bureau (ABS, BDS, BFS, CBP, ACS)
- BEA (GDP, regional, industry)
- BLS (OES, CES, PPI)
- FRED (macro time series)
- Treasury (yield curves, debt)
- SEC EDGAR (company filings)
Rules:
- ✅ Provide raw API access
- ✅ Minimal normalization (format only)
- ✅ Return provenance metadata
- ❌ No joins across agencies
- ❌ No derived metrics
- ❌ No business logic
- ❌ No interpretation
Naming Convention: All tools prefixed with fetch_raw_* to signal raw data access.
Downstream Consumers:
- Market Structure MCP
- Industry Economics MCP
- Macro Conditions MCP
- Labor & Skills MCP
- Geography MCP
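The rules above can be illustrated with a minimal sketch of one `fetch_raw_*` tool. It assumes the upstream endpoint returns the Census Data API's array-of-arrays JSON, where the first row holds column names. It also takes the HTTP call as an injected function so the tool can be tested offline. `fetch_raw_cbp`, `RawResult`, and the provenance keys are hypothetical.

The tool only reshapes rows into dicts (format-only normalization) and attaches provenance. It performs no joins, derived metrics, or interpretation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable

# Injected transport: (dataset path, query params) -> decoded JSON table.
Fetcher = Callable[[str, dict[str, str]], list[list[str]]]


@dataclass
class RawResult:
    rows: list[dict[str, str]]
    provenance: dict = field(default_factory=dict)


def fetch_raw_cbp(fetch: Fetcher, year: int, variables: list[str], geography: str) -> RawResult:
    """Hypothetical raw-access tool for County Business Patterns: format-only reshaping, no business logic."""
    dataset = f"{year}/cbp"
    params = {"get": ",".join(variables), "for": geography}
    header, *body = fetch(dataset, params)
    # Values stay strings exactly as returned; typing and derivation belong to domain MCPs.
    rows = [dict(zip(header, record)) for record in body]
    provenance = {
        "agency": "Census Bureau",
        "dataset": "CBP",
        "dataset_id": f"CBP_{year}",
        "variables": variables,
        "retrieved": date.today().isoformat(),
    }
    return RawResult(rows=rows, provenance=provenance)
```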
2. Market Structure MCP
Purpose: Market size, structure, and reachability insights.
Decision Surface: “How big/fragmented is the market? Where are businesses forming? Who can be reached?”
Sources (via Core Gov Data MCP):
- Census (ABS, BDS, BFS, CBP)
- BEA (GDP by industry/region)
- SBA (business formation data)
Outputs:
- Market size (TAM/SAM/SOM)
- Firm counts by size category
- Entry/exit rates
- Market fragmentation (HHI, concentration)
- Market reachability by firm size
Powers:
- TAM/SAM/SOM calculators
- Market viability tools
- Expansion logic
- Industry snapshots
Tools:
- get_market_size(naics, geography, year)
- get_firm_counts(naics, geography, year)
- get_entry_exit_rates(naics, geography, year_range)
- get_market_fragmentation(naics, geography, year)
- get_market_reachability(naics, geography, firm_size)
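For the fragmentation output, here is a minimal sketch of two common concentration measures.

- **Herfindahl-Hirschman Index (HHI):** the sum of squared market shares in percentage points. It ranges from near 0 (highly fragmented) to 10,000 (monopoly).
- **k-firm concentration ratio (e.g. CR4):** the combined share of the largest firms.

The sketch assumes firm-level revenues are available. When only size-class distributions are published, the result is an approximation. The helper names are hypothetical and are not the signature of `get_market_fragmentation`.

```python
def herfindahl_hirschman_index(revenues: list[float]) -> float:
    """HHI on a 0-10,000 scale: sum of squared percentage market shares."""
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum((100.0 * r / total) ** 2 for r in revenues)


def concentration_ratio(revenues: list[float], k: int = 4) -> float:
    """Combined share (0-1) of the k largest firms, e.g. CR4 when k=4."""
    total = sum(revenues)
    if total <= 0:
        raise ValueError("total revenue must be positive")
    return sum(sorted(revenues, reverse=True)[:k]) / total
```

For three firms with revenues 50, 30, and 20, the HHI is 2,500 + 900 + 400 = 3,800, and the top-two share is 0.8.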
3. Industry Economics MCP ✅
Purpose: Profitability, cost structure, and cyclicality analysis.
Decision Surface: “Are margins real? What drives costs? How cyclical is the industry?”
Sources (via Core Gov Data MCP):
- BEA (industry GDP, profits)
- IRS SOI (corporate/partnership income by sector)
- BLS (PPI, cost indices)
Outputs:
- Margins (gross, operating, net)
- Cost mix (labor, materials, overhead)
- Cyclicality indicators
- Structural profitability benchmarks
Powers:
- Profitability benchmarks
- Cost structure analysis
- Reality checks (hype correction)
Status: ✅ Complete - Standalone implementation with own API clients
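Here is a minimal sketch of two Industry Economics outputs, under stated assumptions:

- Margins are simple ratios to revenue.
- The cyclicality indicator is taken to be the least-squares beta of industry output growth on GDP growth. A beta above 1 suggests the industry swings more than the economy.

The `IncomeStatement` fields and the choice of beta as the cyclicality measure are hypothetical. BEA or IRS SOI inputs would need to be mapped onto them.

```python
from dataclasses import dataclass


@dataclass
class IncomeStatement:
    revenue: float
    cost_of_goods_sold: float
    operating_expenses: float
    net_income: float


def margins(s: IncomeStatement) -> dict[str, float]:
    """Gross, operating, and net margins as fractions of revenue."""
    if s.revenue <= 0:
        raise ValueError("revenue must be positive")
    gross = s.revenue - s.cost_of_goods_sold
    operating = gross - s.operating_expenses
    return {"gross": gross / s.revenue, "operating": operating / s.revenue, "net": s.net_income / s.revenue}


def growth_rates(levels: list[float]) -> list[float]:
    """Period-over-period growth rates from a level series."""
    return [b / a - 1.0 for a, b in zip(levels, levels[1:])]


def cyclicality_beta(industry_levels: list[float], gdp_levels: list[float]) -> float:
    """OLS slope of industry growth on GDP growth over aligned periods."""
    y, x = growth_rates(industry_levels), growth_rates(gdp_levels)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need aligned series with at least three periods")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    if var == 0:
        raise ValueError("GDP growth has no variance")
    return cov / var
```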
4. Macro Conditions MCP ✅
Purpose: Macroeconomic regime indicators and pressures.
Decision Surface: “Are conditions tightening/easing? What regime are we in? How does it affect businesses?”
Sources (via Core Gov Data MCP):
- FRED (inflation, rates, GDP, employment)
- Treasury (yield curves, debt) - Planned
- Federal Reserve (non-FRED: supervision, stress tests) - Planned
Outputs:
- Regime indicators (expansion/recession/tightening/easing)
- Industry-specific macro effects
- Cost indices
- Margin risk indicators
Powers:
- Dashboards
- Scenario engines
- Outlook reports
Rule: Never standalone; always feeds other MCPs.
Status: ✅ Complete - All 3 tools implemented with FRED and BEA data
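Regime indicators could be derived from a small number of FRED-style series. The sketch below is one hedged way to label regimes. It assumes two inputs: year-over-year real GDP growth, and the 12-month change in a policy rate such as the federal funds rate. The thresholds and rule ordering are illustrative assumptions, not calibrated values.

```python
from enum import Enum


class Regime(Enum):
    RECESSION = "recession"
    TIGHTENING = "tightening"
    EASING = "easing"
    EXPANSION = "expansion"


def classify_regime(real_gdp_growth_pct: float, policy_rate_change_pp: float,
                    rate_threshold_pp: float = 0.5) -> Regime:
    """Label the regime from GDP growth (%) and the 12-month policy-rate change (percentage points)."""
    if real_gdp_growth_pct < 0:
        return Regime.RECESSION
    if policy_rate_change_pp >= rate_threshold_pp:
        return Regime.TIGHTENING
    if policy_rate_change_pp <= -rate_threshold_pp:
        return Regime.EASING
    return Regime.EXPANSION
```

Because this MCP never stands alone, a label like this would be passed on to other MCPs (for example, to weight margin-risk indicators) rather than reported by itself.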
5. Labor & Skills MCP ✅
Purpose: Hiring viability, wage pressure, and automation exposure.
Decision Surface: “Can the industry hire? At what cost? What’s automatable?”
Sources (via Core Gov Data MCP):
- BLS (OES, CES)
- O*NET (occupation skills, tasks, automation) - Bulk download
- Census (demographics, labor force)
Outputs:
- Skill demand/availability
- Wage pressure indicators
- Automation exposure scores
- Workforce planning metrics
Powers:
- Hiring viability tools
- Wage pressure indicators
- Workforce planning
- AI readiness assessment
Status: ✅ Complete - All 3 tools implemented with BLS and Census data
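Here is a minimal sketch of an industry wage pressure indicator. It assumes the indicator is defined as the employment-weighted real wage growth across the industry's occupational mix. Real growth is approximated as nominal growth minus inflation. The formula, `OccupationWage`, and `industry_wage_pressure` are hypothetical. BLS OES/CES would supply the wage and employment inputs, and a price index would supply the inflation input.

```python
from dataclasses import dataclass


@dataclass
class OccupationWage:
    soc_code: str       # occupation code, e.g. "15-1252" (Software Developers)
    employment: float   # jobs in this occupation within the industry
    wage_growth: float  # year-over-year nominal wage growth, as a fraction


def industry_wage_pressure(mix: list[OccupationWage], inflation: float) -> float:
    """Employment-weighted real wage growth (nominal growth minus inflation) across occupations."""
    total = sum(o.employment for o in mix)
    if total <= 0:
        raise ValueError("occupational mix has no employment")
    return sum(o.employment * (o.wage_growth - inflation) for o in mix) / total
```

A positive value means wages in the industry's occupations are outpacing inflation. Weighting by employment lets large occupations dominate the signal.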
6. Geography & Real Assets MCP ✅
Purpose: Location recommendations, cost/demand tradeoffs, growth arbitrage.
Decision Surface: “Where is it cheapest to operate? Where is demand growing? What regions make sense?”
Sources (via Core Gov Data MCP):
- Census (demographics, business formation)
- BEA (regional GDP)
- BLS (regional employment/wages)
- FHFA/HUD (housing, real assets) - Planned
- Market Structure MCP (via composition)
- Industry Economics MCP (via composition)
Outputs:
- Location rankings
- Cost/demand tradeoffs
- Growth arbitrage opportunities
- Regional viability scores
Powers:
- Location recommendations
- Expansion planning
- Regional strategy decisions
Status: ✅ Complete - All 3 tools implemented with MCP composition
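Location rankings and cost/demand tradeoffs could be computed as a weighted score over min-max-normalized regional metrics. The sketch assumes two hypothetical inputs per region:

- a demand growth rate (higher is better)
- an operating cost index (lower is better)

It also uses an illustrative 60/40 weighting. Real inputs would come from BEA regional GDP, BLS regional wages, and composed Market Structure outputs.

```python
from dataclasses import dataclass


@dataclass
class RegionMetrics:
    geography: str        # e.g. "state:06"
    demand_growth: float  # higher is better
    cost_index: float     # lower is better


def _min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; all-equal inputs map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_locations(regions: list[RegionMetrics], demand_weight: float = 0.6) -> list[tuple[str, float]]:
    """Return (geography, score) pairs sorted best-first; scores lie in [0, 1]."""
    demand = _min_max([r.demand_growth for r in regions])
    cheapness = [1.0 - c for c in _min_max([r.cost_index for r in regions])]
    scores = [demand_weight * d + (1.0 - demand_weight) * c for d, c in zip(demand, cheapness)]
    return sorted(zip((r.geography for r in regions), scores), key=lambda p: p[1], reverse=True)
```

Min-max scaling keeps each metric's contribution comparable. The weight makes the cost/demand tradeoff explicit and tunable per user.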
Data Flow
Request Flow
- Product calls domain MCP (e.g., Market Structure)
- Domain MCP calls Core Government Data MCP for raw data
- Domain MCP calls Core Utilities MCP for normalization
- Domain MCP performs joins and analysis
- Domain MCP returns interpreted results with provenance
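The request flow can be sketched with in-process stand-ins for the Core layer. `CoreData`, `CoreUtils`, `fetch_raw_firm_revenues`, and the returned dictionary shape are hypothetical. In this sketch the request codes are normalized before the raw fetch so that Core Government Data receives canonical identifiers; that ordering is a design choice. The domain tool then does the analysis and attaches provenance.

```python
from typing import Protocol


class CoreData(Protocol):
    def fetch_raw_firm_revenues(self, naics: str, geography: str, year: int) -> tuple[list[float], dict]: ...


class CoreUtils(Protocol):
    def normalize_naics(self, code: str) -> str: ...
    def normalize_geography(self, geo: str) -> str: ...


def get_market_size(core_data: CoreData, utils: CoreUtils, naics: str, geography: str, year: int) -> dict:
    """Domain MCP tool: normalize inputs, fetch raw data, analyze, and return the result with provenance."""
    code = utils.normalize_naics(naics)
    geo = utils.normalize_geography(geography)
    revenues, source = core_data.fetch_raw_firm_revenues(code, geo, year)
    return {
        "market_size": sum(revenues),
        "provenance": {
            "sources": [source],
            "transforms": [f"Normalized NAICS code: {naics!r} -> {code!r}", "Summed firm revenues"],
            "units": {"market_size": "USD", "geography": geo},
        },
    }
```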
Composition Flow
- Product calls multiple domain MCPs
- Product combines outputs (e.g., Market Structure + Industry Economics)
- Product returns composite insights
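Here is a sketch of product-level composition, modeled loosely on the Viability Score (Market Structure + Industry Economics + Macro Conditions). It assumes each domain output carries a 0-1 `score` and a `provenance` dict with a `sources` list. The weights and field names are illustrative assumptions. The product merges upstream provenance so the composite result stays traceable.

```python
def compose_viability(market: dict, economics: dict, macro: dict,
                      weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> dict:
    """Combine domain MCP outputs into one weighted score and merge their provenance sources."""
    parts = (market, economics, macro)
    score = sum(w * p["score"] for w, p in zip(weights, parts)) / sum(weights)
    sources = [s for p in parts for s in p["provenance"]["sources"]]
    return {
        "viability_score": score,
        "components": {
            "market_structure": market["score"],
            "industry_economics": economics["score"],
            "macro_conditions": macro["score"],
        },
        "provenance": {"sources": sources, "transforms": ["Weighted average of domain scores"]},
    }
```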
Provenance Envelope
Every output from domain MCPs must include:
```json
{
  "provenance": {
    "sources": [
      {
        "agency": "Census Bureau",
        "dataset": "BDS",
        "dataset_id": "BDS_2023",
        "release_date": "2024-03-15",
        "variables": ["ESTABS_ENTRY", "ESTABS_EXIT"]
      }
    ],
    "transforms": [
      "Normalized NAICS code: '0054' → '54'",
      "Aggregated firm counts by size category",
      "Calculated HHI from firm size distribution"
    ],
    "units": {
      "market_size": "USD",
      "inflation_base_year": 2023,
      "geography": "us:*"
    },
    "methodology": "Market size calculated as sum of firm revenues from ABS, adjusted for inflation to 2023 base year."
  }
}
```
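Domain MCPs could build this envelope from typed DTOs rather than ad hoc dictionaries. Here is a minimal sketch using standard-library dataclasses. The class names are hypothetical; the field names mirror the JSON example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Source:
    agency: str
    dataset: str
    dataset_id: str
    release_date: str  # ISO 8601 date, e.g. "2024-03-15"
    variables: list[str]


@dataclass
class Provenance:
    sources: list[Source]
    transforms: list[str] = field(default_factory=list)
    units: dict[str, object] = field(default_factory=dict)
    methodology: str = ""

    def to_json(self) -> str:
        """Serialize as {"provenance": {...}}, matching the envelope shape; non-ASCII is kept as-is."""
        return json.dumps({"provenance": asdict(self)}, ensure_ascii=False)
```

`asdict` recurses into the nested `Source` list, so the serialized output has the same nesting as the example envelope.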
Error Model
Error Categories
- Data Not Available: Requested data doesn’t exist or is suppressed
- Invalid Input: Invalid NAICS code, geography, or year
- API Error: Upstream API failure (with retry guidance)
- Computation Error: Analysis failed (e.g., division by zero)
Error Response Format
```json
{
  "success": false,
  "error": {
    "error_code": "data_not_available",
    "message": "Market size data not available for NAICS 999999 in geography us:* for year 2023",
    "retry_after": null,
    "suggestions": [
      "Try a broader NAICS code (e.g., '54' instead of '541211')",
      "Try a different geography (e.g., 'state:06' instead of 'us:*')"
    ]
  }
}
```
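The four error categories map naturally onto an enum plus an exception that renders the response format. In this sketch only `data_not_available` comes from the example response. The other code strings, the `McpError` class, and the use of seconds for `retry_after` are assumptions.

```python
from enum import Enum
from typing import Optional


class ErrorCode(Enum):
    DATA_NOT_AVAILABLE = "data_not_available"
    INVALID_INPUT = "invalid_input"          # assumed code string
    API_ERROR = "api_error"                  # assumed code string
    COMPUTATION_ERROR = "computation_error"  # assumed code string


class McpError(Exception):
    def __init__(self, code: ErrorCode, message: str,
                 retry_after: Optional[int] = None, suggestions: Optional[list[str]] = None):
        super().__init__(message)
        self.code = code
        self.message = message
        self.retry_after = retry_after  # assumed to be seconds; typically set only for API_ERROR
        self.suggestions = suggestions or []

    def to_response(self) -> dict:
        """Render the error response envelope."""
        return {
            "success": False,
            "error": {
                "error_code": self.code.value,
                "message": self.message,
                "retry_after": self.retry_after,
                "suggestions": self.suggestions,
            },
        }
```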
Versioning Strategy
MCP Versioning
- Semantic versioning: v1.0.0, v1.1.0, v2.0.0
- Breaking changes: New major version (e.g., v2.0.0)
- New features: New minor version (e.g., v1.1.0)
- Bug fixes: New patch version (e.g., v1.0.1)
Tool Versioning
- Tools are versioned within their MCP version
- Deprecated tools are marked with @deprecated in docs
- New versions of tools use a suffix (e.g., get_market_size_v2)
Schema Versioning
- Input/output schemas are versioned separately
- Schema changes are documented in docs/contracts/
- Breaking schema changes trigger an MCP major version bump
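The bump rules can be expressed as a small helper. This sketch assumes plain `MAJOR.MINOR.PATCH` strings with an optional leading `v`. The `ChangeKind` names are hypothetical. Breaking schema changes are folded into `BREAKING`, matching the rule that they trigger a major bump.

```python
from enum import Enum


class ChangeKind(Enum):
    BREAKING = "breaking"  # includes breaking input/output schema changes
    FEATURE = "feature"
    BUGFIX = "bugfix"


def bump_version(version: str, change: ChangeKind) -> str:
    """Return the next semantic version, e.g. v1.0.0 + FEATURE -> v1.1.0."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change is ChangeKind.BREAKING:
        return f"v{major + 1}.0.0"
    if change is ChangeKind.FEATURE:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```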
Communication Patterns
In-Process Composition (Current)
Domain MCPs import Core utilities and Core data-access as libraries.
- Communication: Function calls with typed DTOs
- Pros: Fastest dev, easiest testing, no network overhead
- Cons: Less independently deployable; versioning is repo-level
Current approach: Start with in-process composition.
Service Composition (Future)
Each MCP is a service exposing an API (HTTP/JSON).
- Communication: HTTP with strict schemas + versioning
- Pros: Deploy/scale independently, clearer boundaries
- Cons: More infra + failure modes; contract management matters
Future approach: Design interfaces as if moving to service composition later.
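One way to design interfaces as if moving to service composition later is to make products depend on a `Protocol` whose requests and responses are JSON-serializable DTOs. The in-process implementation below is a plain class. A future HTTP client could satisfy the same protocol without changing callers. All names here are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Protocol


@dataclass(frozen=True)
class MarketSizeRequest:
    naics: str
    geography: str
    year: int


@dataclass(frozen=True)
class MarketSizeResponse:
    market_size_usd: float
    provenance: dict


class MarketStructureApi(Protocol):
    def get_market_size(self, req: MarketSizeRequest) -> MarketSizeResponse: ...


class InProcessMarketStructure:
    """Current approach: a direct function call with typed DTOs, no network hop."""

    def __init__(self, revenues_by_key: dict[tuple[str, str, int], list[float]]):
        self._revenues = revenues_by_key

    def get_market_size(self, req: MarketSizeRequest) -> MarketSizeResponse:
        revenues = self._revenues[(req.naics, req.geography, req.year)]
        return MarketSizeResponse(sum(revenues), {"sources": [], "transforms": ["Summed firm revenues"]})


def tam_calculator(api: MarketStructureApi, req: MarketSizeRequest) -> str:
    """Product code sees only the protocol; serializing to JSON shows the DTO could cross a service boundary."""
    return json.dumps(asdict(api.get_market_size(req)))
```

Because callers never touch the concrete class, swapping `InProcessMarketStructure` for a service client becomes a wiring change rather than a rewrite.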
Build Order
Phase 0: Core Utilities (Weeks 0-1)
Stop Criteria: Datasets join by industry × geography × time.
Deliverables:
- NAICS normalization
- Geography normalization
- Time alignment utilities
- Basic mappings
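The Phase 0 stop criterion (datasets join by industry × geography × time) can be checked with a small keyed join once codes are normalized. This sketch assumes each dataset is a list of dicts with `naics`, `geography`, and `year` fields. The field names and `join_on_key` are hypothetical. A duplicate key on the lookup side is treated as an error, since it usually signals a normalization gap.

```python
Record = dict[str, object]
KEY_FIELDS = ("naics", "geography", "year")


def join_on_key(left: list[Record], right: list[Record]) -> list[Record]:
    """Inner join on (naics, geography, year); right-hand fields win on name clashes."""
    index: dict[tuple, Record] = {}
    for row in right:
        key = tuple(row[f] for f in KEY_FIELDS)
        if key in index:
            raise ValueError(f"duplicate key in right dataset: {key}")
        index[key] = row
    joined = []
    for row in left:
        match = index.get(tuple(row[f] for f in KEY_FIELDS))
        if match is not None:
            joined.append({**row, **match})
    return joined
```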
Phase 1: Market Structure MCP (Weeks 1-3)
Stop Criteria: Answer “How big/fragmented/where/for whom?” for any NAICS.
Deliverables:
- 5 core tools implemented
- TAM tool integration
- Industry snapshots
- Selectors
Phase 2: Industry Economics MCP (Weeks 3-6)
Stop Criteria: Identify “big but structurally bad” markets.
Deliverables:
- Profitability panels
- Cost structure charts
- Reality checks
Phase 3: Macro Conditions MCP (Weeks 6-8)
Stop Criteria: Macro answers “So what?” concisely.
Deliverables:
- Industry-specific explanations
- Regime indicators
- Scenario toggles
Phase 4: Labor & Skills + Geography MCPs (Weeks 8-12)
Stop Criteria: Users decide “where/whether to operate.”
Deliverables:
- Hiring viability tools
- Location recommendations
- Expansion widgets
Benefits
- Speed: Parallel development of domain MCPs
- Clarity: Clear ownership and boundaries
- Extensibility: Easy to add new domain MCPs
- Defensible IP: Value lives in the methodology and composition logic rather than in the public data itself
- Flexibility: The same MCPs can serve LLM agents or conventional API clients
Non-Goals
This architecture does NOT:
- Create a single mega-MCP
- Mix domain logic in Core Government Data MCP
- Chain MCPs at request-time without proper error handling
- Duplicate logic across MCPs
- Skip provenance tracking