The previous article in this series defined enterprise context management and explained why it matters. This article is the engineering blueprint. We'll build a context engine from the ground up — covering architecture, ingestion, storage, retrieval, and delivery through the Model Context Protocol (MCP).

This isn't a theoretical exercise. The patterns described here are running in production, serving context to AI systems across a multi-service platform. Every code example is drawn from real implementation, simplified for clarity.

Architecture Overview

A context engine has five layers, each with a distinct responsibility. Data flows from left to right: organizational knowledge enters through ingestion, gets structured and stored, and exits through delivery to AI consumers.

graph LR
    subgraph Sources
        S1[Git Repos]
        S2[Documents]
        S3[Project Tools]
        S4[APIs & Services]
    end
    subgraph Ingestion
        I1[Source Adapters]
        I2[Content Extractors]
    end
    subgraph Processing
        P1[Chunker]
        P2[Entity Extractor]
        P3[Embedder]
        P4[Classifier]
    end
    subgraph Storage
        D1[(Relational DB)]
        D2[(Vector Store)]
    end
    subgraph Delivery
        E1[MCP Server]
        E2[REST API]
        E3[Webhook Push]
    end
    S1 --> I1
    S2 --> I1
    S3 --> I1
    S4 --> I1
    I1 --> I2
    I2 --> P1
    P1 --> P2
    P2 --> P3
    P3 --> P4
    P4 --> D1
    P3 --> D2
    D1 --> E1
    D2 --> E1
    D1 --> E2
    D2 --> E2
    D1 --> E3
    style D1 fill:#1e3a8a,stroke:#93c5fd,color:#fff
    style D2 fill:#7c3aed,stroke:#c4b5fd,color:#fff
    style E1 fill:#059669,stroke:#6ee7b7,color:#fff
*Figure 1: Context engine architecture — five layers from source ingestion to AI delivery*

Layer 1: Ingestion

Every organization stores knowledge differently — code in Git, decisions in Confluence, processes in Notion, relationships in CRMs, architecture in ADRs. The ingestion layer normalizes this chaos into a consistent format.

Source Adapters

Each knowledge source gets a dedicated adapter. The adapter knows how to authenticate, paginate, and extract content from its source. The critical design decision is making adapters incremental — they track what's changed since the last sync rather than re-ingesting everything.

This PHP interface defines the contract that every source adapter must implement. The fetchSince() method accepts a timestamp and returns only records modified after that point, keeping synchronization efficient even as your knowledge base grows:

interface SourceAdapter
{
    /**
     * Return records modified since the given timestamp.
     * Returns an iterable to handle large result sets without memory exhaustion.
     */
    public function fetchSince(Carbon $since): iterable;

    /**
     * Unique identifier for this source (e.g., 'github:org/repo').
     */
    public function sourceIdentifier(): string;
}

A Git repository adapter, for example, uses git log --since to identify changed files, then reads only those files. This PHP class shows how the SourceAdapter interface is implemented for Git repositories — it filters files by configurable include paths and yields RawContent objects with source metadata attached:

class GitRepositoryAdapter implements SourceAdapter
{
    public function __construct(
        private string $repoPath,
        private array $includePaths = ['docs/', 'README.md', 'ARCHITECTURE.md'],
    ) {}

    public function fetchSince(Carbon $since): iterable
    {
        $changedFiles = $this->getChangedFiles($since);

        foreach ($changedFiles as $file) {
            if (!$this->shouldInclude($file)) continue;

            yield new RawContent(
                source: $this->sourceIdentifier(),
                path: $file,
                content: file_get_contents($this->repoPath . '/' . $file),
                lastModified: $this->getFileModifiedDate($file),
                metadata: [
                    'repository' => basename($this->repoPath),
                    'file_type' => pathinfo($file, PATHINFO_EXTENSION),
                ],
            );
        }
    }

    public function sourceIdentifier(): string
    {
        return 'git:' . basename($this->repoPath);
    }

    private function shouldInclude(string $file): bool
    {
        foreach ($this->includePaths as $pattern) {
            if (str_contains($pattern, '*') ? fnmatch($pattern, $file) : str_starts_with($file, $pattern)) {
                return true;
            }
        }
        return false;
    }
}

Implementation Tip Store a `last_synced_at` timestamp per source adapter. On each sync run, fetch only records modified after the last successful sync. This keeps ingestion fast even as your knowledge base grows. If a sync fails, don't update the timestamp — the next run will retry the same window.
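
The tip above can be sketched as a small sync loop. This JavaScript sketch uses an in-memory watermark store; the adapter shape and ingest callback are illustrative, not part of the article's codebase:

```javascript
// Incremental sync loop with an in-memory watermark per source.
const watermarks = new Map();

function runSync(adapter, ingest) {
  // Fall back to the epoch on the first run (full ingest).
  const since = watermarks.get(adapter.sourceIdentifier()) ?? new Date(0);
  const startedAt = new Date();

  try {
    for (const record of adapter.fetchSince(since)) {
      ingest(record); // hand off to the extraction/processing layers
    }
    // Advance the watermark only after a fully successful run.
    watermarks.set(adapter.sourceIdentifier(), startedAt);
  } catch (err) {
    // Watermark untouched — the next run retries the same window.
  }
}
```

Note that the watermark is captured before the fetch begins, so records modified during a long sync are picked up again on the next run rather than silently missed.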

Content Extractors

Raw content from source adapters comes in many formats: Markdown, HTML, PDF, code files, JSON API responses. Content extractors normalize everything to clean text while preserving structural information (headings, lists, code blocks).

This PHP class shows a Markdown content extractor. It parses headings, identifies code blocks and their languages, and strips formatting to produce clean text — while preserving the structural metadata that downstream processing steps need:

class MarkdownExtractor implements ContentExtractor
{
    public function extract(RawContent $raw): ExtractedContent
    {
        $headings = $this->extractHeadings($raw->content);
        $codeBlocks = $this->extractCodeBlocks($raw->content);
        $plainText = $this->stripToText($raw->content);

        return new ExtractedContent(
            text: $plainText,
            structure: [
                'headings' => $headings,
                'has_code' => !empty($codeBlocks),
                'languages' => array_unique(array_column($codeBlocks, 'language')),
                'word_count' => str_word_count($plainText),
            ],
        );
    }
}

Layer 2: Processing

Raw extracted content must be transformed into structured, searchable context records. This is where the magic happens — and where most context engines differentiate themselves.

Semantic Chunking

The most important processing step is chunking: breaking long documents into pieces that each contain a coherent thought or concept. Naive chunking by token count produces fragments that confuse retrieval systems. Semantic chunking respects content boundaries.

The following PHP class splits content at natural boundaries — headings and paragraph breaks — rather than arbitrary token counts. When a section is too large for a single chunk, it subdivides by paragraph while preserving the parent heading for context:

class SemanticChunker
{
    public function chunk(ExtractedContent $content, int $maxTokens = 512): array
    {
        $sections = $this->splitBySections($content->text);
        $chunks = [];

        foreach ($sections as $section) {
            $tokenCount = $this->countTokens($section['content']);

            if ($tokenCount <= $maxTokens) {
                // Section fits in one chunk
                $chunks[] = new Chunk(
                    content: $section['content'],
                    heading: $section['heading'],
                    tokenCount: $tokenCount,
                );
            } else {
                // Split large sections by paragraph, respecting boundaries
                $chunks = array_merge(
                    $chunks,
                    $this->splitByParagraphs($section, $maxTokens)
                );
            }
        }

        return $chunks;
    }

    private function splitBySections(string $text): array
    {
        // Split on markdown headings (##, ###, etc.)
        $parts = preg_split('/^(#{2,4}\s+.+)$/m', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
        $sections = [];
        $currentHeading = null;
        $currentContent = '';

        foreach ($parts as $part) {
            if (preg_match('/^#{2,4}\s+/', $part)) {
                if ($currentContent) {
                    $sections[] = ['heading' => $currentHeading, 'content' => trim($currentContent)];
                }
                $currentHeading = trim($part);
                $currentContent = '';
            } else {
                $currentContent .= $part;
            }
        }

        if ($currentContent) {
            $sections[] = ['heading' => $currentHeading, 'content' => trim($currentContent)];
        }

        return $sections;
    }
}

Entity Extraction

After chunking, each chunk is analyzed for entities — the people, systems, concepts, and relationships it references. Entity extraction enables the context engine to answer queries like "find all context related to the authentication system" even when a chunk doesn't use the word "authentication" but does mention "JWT tokens" and "SSO."

This PHP class performs entity extraction using a combination of pattern matching and domain terminology cross-referencing. It identifies system names, technologies, and domain concepts within each chunk and assigns confidence scores to the matches:

class EntityExtractor
{
    public function __construct(
        private array $domainTerms = [], // term => category map
    ) {}

    public function extract(Chunk $chunk): array
    {
        $entities = [];

        // Pattern-based extraction for known entity types
        $entities = array_merge($entities, $this->extractSystemNames($chunk->content));
        $entities = array_merge($entities, $this->extractTechnologies($chunk->content));
        $entities = array_merge($entities, $this->extractDomainConcepts($chunk->content));

        // Cross-reference with known domain terminology
        foreach ($this->domainTerms as $term => $category) {
            if (stripos($chunk->content, $term) !== false) {
                $entities[] = new Entity(
                    name: $term,
                    type: $category,
                    confidence: 0.85,
                );
            }
        }

        // Deduplicate by lowercased name — array_unique() can't compare objects
        $byName = [];
        foreach ($entities as $entity) {
            $byName[strtolower($entity->name)] ??= $entity;
        }

        return array_values($byName);
    }
}

Embedding Generation

Each chunk gets a vector embedding — a numerical representation of its semantic meaning. These embeddings power similarity search in the vector store. The choice of embedding model matters: you want a model trained on technical/enterprise content, not just general web text.

This PHP class generates embeddings using OpenAI's API. It prepends the chunk's heading to its content for richer semantic representation, and supports batch processing for efficiency — up to 100 chunks per API call:

class EmbeddingGenerator
{
    public function __construct(
        private OpenAIClient $client,
        private string $model = 'text-embedding-3-small',
    ) {}

    public function embed(Chunk $chunk): array
    {
        $response = $this->client->embeddings()->create([
            'model' => $this->model,
            'input' => $chunk->heading
                ? "{$chunk->heading}\n\n{$chunk->content}"
                : $chunk->content,
        ]);

        return $response->embeddings[0]->embedding;
    }

    /**
     * Batch embedding for efficiency — process up to 100 chunks at once.
     */
    public function embedBatch(array $chunks): array
    {
        $inputs = array_map(fn (Chunk $c) => $c->heading
            ? "{$c->heading}\n\n{$c->content}"
            : $c->content, $chunks);

        $response = $this->client->embeddings()->create([
            'model' => $this->model,
            'input' => $inputs,
        ]);

        return array_map(fn ($e) => $e->embedding, $response->embeddings);
    }
}

Layer 3: Storage

Context records need two types of storage, each optimized for different query patterns.

Relational Store

The relational database stores context metadata — everything except the vector embedding. This enables precise filtering: "find all architecture records verified in the last 90 days that mention the authentication system."

This Laravel migration defines the context_records table schema. Notice the confidence score, freshness_policy, and supersedes fields — these enable the delivery layer to prioritize current, verified, high-confidence records over stale or uncertain ones:

// Migration
Schema::create('context_records', function (Blueprint $table) {
    $table->id();
    $table->string('context_id')->unique();
    $table->string('domain')->index();
    $table->string('source');
    $table->string('title');
    $table->longText('content');
    $table->string('heading')->nullable();
    $table->json('entities')->nullable();
    $table->json('metadata')->nullable();
    $table->float('confidence')->default(0.5);
    $table->integer('token_count');
    $table->string('verified_by')->nullable();
    $table->timestamp('verified_at')->nullable();
    $table->string('supersedes')->nullable();
    $table->string('freshness_policy')->default('90d');
    $table->timestamp('deprecated_at')->nullable();
    $table->timestamps();

    $table->index(['domain', 'deprecated_at']);
    $table->index(['verified_at']);
});

Vector Store

The vector store holds embeddings alongside record identifiers for similarity search. In production, we use a dedicated vector database, but you can start with SQLite and a cosine similarity function for prototyping.
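
For that prototyping path, cosine similarity plus a brute-force scan is all you need. This JavaScript sketch shows both; the record shape is illustrative:

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest-neighbour search over stored embeddings.
// Fine for thousands of records; switch to a vector database beyond that.
function topK(queryEmbedding, records, k = 10) {
  return records
    .map((r) => ({ ...r, score: cosineSimilarity(queryEmbedding, r.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```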

graph TB
    subgraph "Query Flow"
        Q[Query: 'How does auth token rotation work?'] --> E[Embed Query]
        E --> VS[Vector Similarity Search]
        E --> FS[Full-Text Search]
        VS --> M[Merge & Deduplicate]
        FS --> M
        M --> R[Re-rank by Relevance]
        R --> F[Filter by Freshness & Confidence]
        F --> T[Fit to Token Budget]
        T --> P[Context Package]
    end
    style Q fill:#f59e0b,stroke:#d97706,color:#000
    style P fill:#059669,stroke:#6ee7b7,color:#fff
    style VS fill:#7c3aed,stroke:#c4b5fd,color:#fff
*Figure 2: Dual retrieval — vector similarity and full-text search are merged and re-ranked for optimal context delivery*

Layer 4: Retrieval & Ranking

When a query arrives, the retrieval layer pulls candidates from both stores and produces a ranked list. The ranking formula balances four signals:

Signal Weight Rationale
Semantic similarity 0.40 How closely the record's meaning matches the query
Entity overlap 0.25 How many extracted entities match between query and record
Freshness 0.20 How recently the record was verified (exponential decay)
Confidence 0.15 The record's stated confidence score

This PHP class implements the weighted scoring formula. The freshnessDecay() method uses an exponential decay function with a 90-day half-life — a record verified yesterday scores nearly 1.0, while one verified six months ago (two half-lives) scores about 0.25:

class RelevanceScorer
{
    public function score(ContextRecord $record, QueryContext $query): float
    {
        $similarity = $this->cosineSimilarity($query->embedding, $record->embedding);
        $entityOverlap = $this->entityOverlap($query->entities, $record->entities);
        $freshness = $this->freshnessDecay($record->verified_at);
        $confidence = $record->confidence;

        return (0.40 * $similarity)
             + (0.25 * $entityOverlap)
             + (0.20 * $freshness)
             + (0.15 * $confidence);
    }

    private function freshnessDecay(?Carbon $verifiedAt): float
    {
        if (!$verifiedAt) return 0.3; // Unverified records get a penalty

        $daysSinceVerification = $verifiedAt->diffInDays(now());
        // Exponential decay: half-life of 90 days
        return exp(-0.693 * $daysSinceVerification / 90);
    }

    private function entityOverlap(array $queryEntities, array $recordEntities): float
    {
        if (empty($queryEntities)) return 0;

        $queryNames = array_map(fn ($e) => strtolower($e->name), $queryEntities);
        $recordNames = array_map(fn ($e) => strtolower($e->name), $recordEntities);
        $overlap = count(array_intersect($queryNames, $recordNames));

        return min(1.0, $overlap / count($queryNames));
    }
}

Token Budget Fitting

AI models have finite context windows. The token fitter selects the highest-scoring records that fit within the available budget, using a greedy algorithm that maximizes total relevance score per token spent.

This PHP class implements the greedy selection algorithm. It iterates through records sorted by relevance score, adding each one to the package if it fits within the remaining token budget, and stops when the budget drops below the minimum useful chunk size of 50 tokens:

class TokenBudgetFitter
{
    public function fit(Collection $scored, int $budget): ContextPackage
    {
        $selected = collect();
        $remaining = $budget;

        foreach ($scored->sortByDesc('score') as $record) {
            if ($record->token_count <= $remaining) {
                $selected->push($record);
                $remaining -= $record->token_count;
            }

            if ($remaining < 50) break; // Minimum useful chunk size
        }

        return new ContextPackage(
            records: $selected,
            totalTokens: $budget - $remaining,
            budget: $budget,
        );
    }
}

Layer 5: Delivery via MCP

The Model Context Protocol (MCP) is the delivery mechanism that connects your context engine to AI systems. MCP defines a standardized interface for AI tools to request and receive context.

An MCP server exposes your context engine as a set of tools that AI systems can call. The following JavaScript implementation creates a minimal MCP server using the official @modelcontextprotocol/sdk package. It exposes two tools — search_context for natural language queries and get_context_for_files for file-based context retrieval — and communicates over standard I/O:

// mcp-server.js — Context Engine MCP Server
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server({
  name: "context-engine",
  version: "1.0.0",
}, {
  capabilities: { tools: {} },
});

// Tool: Search context by natural language query
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  const { name, arguments: args } = request.params;

  if (name === "search_context") {
    const results = await searchContext(args.query, args.domain, args.limit);
    return {
      content: [{
        type: "text",
        text: formatContextResults(results),
      }],
    };
  }

  if (name === "get_context_for_files") {
    const results = await getContextForFiles(args.files);
    return {
      content: [{
        type: "text",
        text: formatContextResults(results),
      }],
    };
  }

  throw new Error(`Unknown tool: ${name}`);
});

// List available tools
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "search_context",
      description: "Search organizational context by natural language query",
      inputSchema: {
        type: "object",
        properties: {
          query: { type: "string", description: "Natural language search query" },
          domain: { type: "string", description: "Optional domain filter" },
          limit: { type: "number", description: "Max results", default: 10 },
        },
        required: ["query"],
      },
    },
    {
      name: "get_context_for_files",
      description: "Get relevant context for a set of source files",
      inputSchema: {
        type: "object",
        properties: {
          files: {
            type: "array",
            items: { type: "string" },
            description: "File paths to find context for",
          },
        },
        required: ["files"],
      },
    },
  ],
}));

const transport = new StdioServerTransport();
await server.connect(transport);

Configuring MCP in VS Code

With the MCP server built, connect it to your development environment. In VS Code with GitHub Copilot, add the following JSON configuration to your MCP settings file (.vscode/mcp.json or your user settings). This tells Copilot to launch the MCP server as a child process, connecting over standard I/O with your context engine's API URL and key passed as environment variables:

{
  "servers": {
    "context-engine": {
      "type": "stdio",
      "command": "node",
      "args": ["./mcp-server.js"],
      "env": {
        "CONTEXT_API_URL": "https://api.yourplatform.com/context",
        "CONTEXT_API_KEY": "${env:CONTEXT_API_KEY}"
      }
    }
  }
}

Now when Copilot encounters a coding task, it can call search_context or get_context_for_files to retrieve relevant organizational knowledge before generating code. The AI system stops guessing about your architecture and starts knowing it.
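
On the wire, those tool calls are JSON-RPC 2.0 messages as defined by the MCP specification. A search_context invocation from a client looks roughly like this (the id and argument values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 7,
  "method": "tools/call",
  "params": {
    "name": "search_context",
    "arguments": {
      "query": "How does auth token rotation work?",
      "domain": "architecture",
      "limit": 5
    }
  }
}
```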

Key Insight MCP turns your context engine from an internal tool into an AI-native service. Any MCP-compatible AI client — Copilot, Claude, custom agents — can consume your organizational context through a standardized protocol. Build once, deliver everywhere.

Operational Concerns

Cache Invalidation

Context that changes frequently (sprint status, deployment state) should have short TTLs. Stable context (architecture decisions, coding standards) can be cached aggressively. Implement a TTL-based cache with manual invalidation for forced refreshes.

This PHP class wraps Laravel's cache system with domain-aware TTLs. Architecture context is cached for 24 hours since it rarely changes, while sprint status is cached for only 15 minutes. The get() method also checks whether any source records have been updated since the cache was built, automatically invalidating stale entries:

class ContextCache
{
    public function get(string $key, string $domain): ?ContextPackage
    {
        $cached = Cache::get("context:{$domain}:{$key}");
        if (!$cached) return null;

        // Check if any source records have been updated since cache was built
        if ($this->hasStaleRecords($cached)) {
            Cache::forget("context:{$domain}:{$key}");
            return null;
        }

        return $cached;
    }

    public function put(string $key, string $domain, ContextPackage $package): void
    {
        $ttl = $this->ttlForDomain($domain);
        Cache::put("context:{$domain}:{$key}", $package, $ttl);
    }

    private function ttlForDomain(string $domain): int
    {
        return match($domain) {
            'architecture' => 86400,     // 24 hours
            'coding-standards' => 86400, // 24 hours
            'sprint-status' => 900,      // 15 minutes
            'deployments' => 300,        // 5 minutes
            default => 3600,             // 1 hour
        };
    }
}

Rate Limiting

Your context engine will receive bursts of queries during active development sessions. Implement rate limiting per consumer with generous defaults — context queries should never block productive work.
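
A token bucket is a good fit for this burst-friendly profile: it absorbs short spikes while capping sustained throughput. This JavaScript sketch keeps one bucket per consumer; the capacity and refill rate are assumptions to tune, not recommendations from measurement:

```javascript
// Per-consumer token bucket. Bursts drain the bucket; a steady refill
// rate caps sustained throughput without blocking normal work.
class TokenBucket {
  constructor(capacity = 120, refillPerSecond = 2) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillPerSecond = refillPerSecond;
    this.lastRefill = Date.now();
  }

  allow() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// One bucket per consumer key (e.g. API key or developer id).
const buckets = new Map();
function allowRequest(consumer) {
  if (!buckets.has(consumer)) buckets.set(consumer, new TokenBucket());
  return buckets.get(consumer).allow();
}
```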

Monitoring

Track these metrics in production:

  • Query latency (p50, p95, p99) — context delivery should be <200ms at p95
  • Cache hit rate — target >60% to keep query latency low
  • Records per query — how many context records are returned per request
  • Token utilization — what percentage of the token budget is used (low utilization suggests poor coverage)
  • Staleness ratio — percentage of records past their freshness threshold
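
The latency percentiles above can be computed from a rolling sample of recent query durations. A minimal JavaScript sketch (the sample data and alert threshold mirror the p95 target in the checklist):

```javascript
// Nearest-rank percentile over a sample of durations in milliseconds.
function percentile(samples, p) {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

const latencies = [12, 45, 38, 220, 51, 47, 90, 33, 41, 180];
const p95 = percentile(latencies, 95);

// Alert when the p95 target from the checklist above is breached.
if (p95 > 200) {
  console.warn(`context delivery p95 ${p95}ms exceeds 200ms target`);
}
```

In production you would feed these samples to your metrics system (Prometheus histograms, StatsD timers) rather than computing percentiles in-process.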

Putting It All Together

The complete pipeline from organizational knowledge to AI context delivery:

  1. Source adapters continuously sync changes from your knowledge sources
  2. Content extractors normalize raw content to clean text with structure
  3. Semantic chunker breaks content into coherent pieces
  4. Entity extractor identifies systems, concepts, and relationships
  5. Embedding generator creates vector representations for similarity search
  6. Dual storage persists structured metadata and vector embeddings
  7. Retrieval engine merges structured filtering with semantic similarity
  8. Relevance scorer ranks candidates by similarity, freshness, confidence, and entity overlap
  9. Token fitter selects optimal records within the model's context window
  10. MCP server delivers assembled context to any AI consumer
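
The ingestion half of the steps above (1–6) can be sketched as one pipeline function. Every dependency here — extract, chunk, embedBatch, the store — stands in for the classes shown earlier; the names are illustrative:

```javascript
// Indexing pipeline: incremental sync through dual storage (steps 1–6).
async function syncAndIndex(adapter, deps) {
  const { extract, chunk, extractEntities, embedBatch, store } = deps;

  for (const raw of adapter.fetchSince(store.lastSyncedAt())) { // 1. incremental sync
    const extracted = extract(raw);                             // 2. normalize to clean text
    const chunks = chunk(extracted);                            // 3. semantic chunking
    const embeddings = await embedBatch(chunks);                // 5. embeddings (batched)

    chunks.forEach((c, i) => store.save({                       // 6. dual storage
      chunk: c,
      entities: extractEntities(c),                             // 4. entity extraction
      embedding: embeddings[i],
    }));
  }
}
```

Steps 7–10 run at query time rather than index time — that is the retrieval engine, RelevanceScorer, TokenBudgetFitter, and MCP server shown earlier.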

This is not a weekend project — but it doesn't have to be a year-long initiative either. Start with a single source adapter (your main code repository's documentation), a simple chunker, and a basic MCP server. Get context flowing to your AI tools, measure the quality, and iterate from there.

The complete architecture is documented on our Enterprise Context Management platform, and the companion article on context management versus knowledge management explores the strategic differences in depth.