EAAPL-KNW001: Enterprise Knowledge Graph

Pattern ID: EAAPL-KNW001
Status: Proven
Complexity: High
Tags: knowledge-graph, embedding, enterprise-only, high-complexity
Version: 1.0
Last Updated: 2026-06-12


1. Executive Summary

An Enterprise Knowledge Graph (EKG) is a persistent, curated, machine-readable representation of an organisation's entities, relationships, and facts. Unlike a document corpus, a knowledge graph stores structured semantic relationships that AI applications can traverse, reason over, and cite precisely.

This pattern covers the full lifecycle: ontology design via domain expert workshops; ingestion pipelines that combine NLP-extracted facts with structured database mappings; graph database selection aligned to workload profile; versioning and rollback; entity resolution and quality management; and integration with RAG pipelines where graph traversal augments vector similarity search.

For a CIO/CTO audience, the primary value proposition is twofold. First, the EKG becomes the enterprise's durable, governed knowledge asset — independent of any single AI vendor or model. Second, AI applications powered by an EKG produce answers that are traceable, auditable, and correctable because every claim maps back to a specific node, edge, and source document. This directly addresses model risk and regulatory explainability requirements without requiring per-answer human review.

Typical ROI realisation occurs within 12–18 months for organisations with mature data catalogues and identifiable high-value knowledge domains such as compliance, product, or customer relationship data.


2. Problem Statement

2.1 Business Problem

Enterprise knowledge is fragmented across wikis, SharePoint sites, ERP systems, policy repositories, and individual email threads. AI applications built on top of this fragmentation inherit its inconsistencies — two AI answers to the same question may contradict each other depending on which documents were retrieved. Business users lose trust quickly. Compliance and legal teams cannot accept AI outputs that cannot be audited back to authoritative sources.

2.2 Technical Problem

Large Language Models have no persistent memory of enterprise-specific entities or their relationships. Vector search retrieves semantically similar passages but cannot answer multi-hop relational questions such as "Which compliance policies apply to product X sold in jurisdiction Y to customer class Z?" without explicit relationship traversal. Retrieval-augmented generation on raw documents alone cannot guarantee consistent answers when the same fact appears in several slightly different forms across documents.

2.3 Symptoms

  • AI answers contradict each other for equivalent questions posed in different phrasings
  • AI cannot answer cross-entity relational queries reliably
  • Regulatory auditors cannot obtain a complete audit trail for AI-generated decisions
  • Duplicate entities (same person, product, or policy represented multiple times) cause incorrect AI behaviour
  • Knowledge updates (e.g., a policy change) do not propagate consistently to AI responses

2.4 Cost of Inaction

  • Regulatory non-compliance risk in high-stakes domains (financial advice, medical, legal)
  • AI adoption stalls due to trust deficit — business users revert to manual processes
  • Knowledge fragmentation compounds: each new AI application builds its own ad-hoc corpus, creating N siloed knowledge stores instead of one governed asset
  • Entity deduplication effort grows super-linearly as data volumes increase without a resolution strategy

3. Context

3.1 When to Apply

  • Organisation has ≥3 distinct knowledge domains (e.g., product, customer, compliance, HR) that AI applications must reason across
  • Answers require multi-hop relational reasoning, not just passage retrieval
  • Regulatory or audit requirements demand traceable reasoning chains
  • Persistent entity identity matters (same customer, product, or policy referenced consistently)
  • Organisation has sufficient data engineering maturity to operate a graph database in production

3.2 When NOT to Apply

  • Single-domain, single-document-type RAG use cases (plain vector RAG is simpler and sufficient)
  • Organisations without a data governance function — ontology without governance degrades rapidly
  • Proof-of-concept or MVP phases — graph infrastructure investment is only justified at production scale
  • Highly dynamic knowledge where relationships change faster than graph update pipelines can process (sub-minute freshness requirement)

3.3 Prerequisites

  • Functioning data catalogue with documented data domains
  • At least one domain expert per knowledge domain available for ontology workshops
  • Master data management (MDM) or entity resolution capability for at least one entity type
  • Engineering team with graph database operational experience or vendor-managed service option

3.4 Industry Applicability

Industry | Applicability | Primary Use Case
Financial Services | High | Regulatory compliance, customer 360, product eligibility
Healthcare | High | Clinical pathways, drug interactions, patient history
Manufacturing | High | Product genealogy, supplier relationships, maintenance knowledge
Legal / Professional Services | High | Case precedent, contract clause relationships, jurisdiction mapping
Retail / CPG | Medium | Product taxonomy, supplier network, customer segmentation
Government | High | Policy cross-reference, citizen services, regulatory mapping

4. Architecture Overview

The Enterprise Knowledge Graph architecture is organised into four layers (Ingestion, Graph Store, Quality Management, and AI Integration), with ontology governance applying across all of them.

4.1 Ingestion Layer

Knowledge enters the graph through three parallel pipelines.

NLP Extraction Pipeline processes unstructured documents — policy PDFs, contracts, technical specifications, wiki pages. A document pre-processor performs OCR, layout analysis, and language detection. Named Entity Recognition (NER) identifies entity mentions: people, organisations, products, locations, dates, regulatory references. Relationship Extraction (RE) models identify semantic relationships between co-occurring entities. Coreference Resolution links pronoun and alias references to their canonical entity. The output is a set of candidate triples (subject, predicate, object) with confidence scores. Triples above a high-confidence threshold are auto-ingested; triples in the middle band enter a human validation queue; low-confidence triples are discarded.
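
The three-way routing of candidate triples can be sketched as follows. This is a minimal illustration only: the threshold values, the `Triple` class, and the destination names are assumptions, since the pattern does not prescribe them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; real values are tuned per knowledge domain.
AUTO_INGEST_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # extraction confidence in [0, 1]


def route(triple: Triple) -> str:
    """Return the destination for a candidate triple based on its confidence."""
    if triple.confidence >= AUTO_INGEST_THRESHOLD:
        return "auto_ingest"
    if triple.confidence >= REVIEW_THRESHOLD:
        return "human_validation"
    return "discard"


candidates = [
    Triple("Policy-17", "APPLIES_TO", "Product-X", 0.97),
    Triple("Policy-17", "SUPERSEDES", "Policy-09", 0.72),
    Triple("Product-X", "SOLD_IN", "Jurisdiction-Y", 0.31),
]
routed = {t.predicate: route(t) for t in candidates}
```

Keeping the thresholds as named constants makes it straightforward to vary them by domain, for example a stricter auto-ingest threshold for legal or medical content.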

Structured Data Mapping Pipeline ingests from databases, APIs, and data warehouses. A schema mapper translates relational tables to graph entities and edges using pre-defined mapping rules maintained in a mapping registry. Incremental change capture (CDC) ensures graph updates propagate within a defined SLO (typically minutes for critical domains). Foreign key relationships become graph edges. Referential integrity is validated before loading.
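
As a rough sketch of how a mapping rule might turn relational rows into nodes and edges, consider the following. The rule format (`PRODUCT_RULE`), the column names, and the ID scheme are all hypothetical; an actual mapping registry will have its own schema.

```python
# Hypothetical mapping rule for one relational table (assumed format).
PRODUCT_RULE = {
    "label": "Product",
    "key": "product_id",
    "properties": ["name"],
    # foreign-key column -> (target node label, relationship type)
    "foreign_keys": {"supplier_id": ("Supplier", "SUPPLIED_BY")},
}


def map_rows(rows, rule):
    """Translate relational rows into graph nodes and edges."""
    nodes, edges = [], []
    for row in rows:
        node_id = f"{rule['label']}:{row[rule['key']]}"
        nodes.append({
            "id": node_id,
            "label": rule["label"],
            "props": {col: row[col] for col in rule["properties"]},
        })
        for fk_col, (target_label, rel_type) in rule["foreign_keys"].items():
            if row.get(fk_col) is None:
                continue  # nullable foreign key: no edge to create
            edges.append((node_id, rel_type, f"{target_label}:{row[fk_col]}"))
    return nodes, edges


rows = [
    {"product_id": 1, "name": "Widget", "supplier_id": 7},
    {"product_id": 2, "name": "Gadget", "supplier_id": None},
]
nodes, edges = map_rows(rows, PRODUCT_RULE)
```

Each foreign key becomes one edge, and a null foreign key simply produces no edge. Referential integrity checks (does `Supplier:7` exist?) would run on the output before loading.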

Manual Curation Pipeline handles high-value facts that are too important to risk extraction errors — regulatory requirements, executive decisions, product pricing. Subject matter experts enter facts through a governed curation UI with mandatory source citation, effective date, and expiry date fields.

4.2 Graph Store Layer

The graph database stores nodes (entities), edges (relationships), and properties (attributes). The ontology — defined in OWL or a property graph schema — governs which node types, relationship types, and property names are valid. Schema validation is enforced at write time. The store maintains full version history: every node and edge has created_at, updated_at, deleted_at, and source_document_id fields. This enables point-in-time graph snapshots.
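
A point-in-time snapshot can be derived from the `created_at` and `deleted_at` fields alone. The sketch below assumes soft deletes (a deleted element keeps its record with `deleted_at` set) and uses plain dictionaries in place of a real graph store; it covers element existence only, not the history of property values.

```python
from datetime import datetime


def visible_at(element, as_of):
    """True if the node or edge existed and was not yet deleted at `as_of`."""
    deleted_at = element.get("deleted_at")
    return element["created_at"] <= as_of and (deleted_at is None or deleted_at > as_of)


def snapshot(elements, as_of):
    """Return the elements that make up the graph at a point in time."""
    return [e for e in elements if visible_at(e, as_of)]


edges = [
    {"id": "e1", "created_at": datetime(2025, 1, 10), "deleted_at": None},
    {"id": "e2", "created_at": datetime(2025, 3, 1), "deleted_at": datetime(2025, 6, 1)},
    {"id": "e3", "created_at": datetime(2025, 9, 15), "deleted_at": None},
]
ids_in_april = [e["id"] for e in snapshot(edges, datetime(2025, 4, 1))]
ids_in_july = [e["id"] for e in snapshot(edges, datetime(2025, 7, 1))]
```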

4.3 Quality Management Layer

An entity resolution service runs continuously, identifying candidate duplicates using a configurable matching strategy (exact match on canonical identifiers, fuzzy match on names and attributes, embedding similarity for semantic duplicates). Duplicate candidates above a merge threshold are automatically merged; candidates in the uncertain band are routed to a human review queue. Each node and edge carries a confidence score that is propagated to any AI answer derived from it. A quality dashboard tracks entity count, duplicate rate, confidence distribution, and validation queue depth.
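
The merge / review / distinct decision might look like the sketch below. It covers only two of the matching strategies mentioned above (exact match on a canonical identifier, then fuzzy name matching via the standard library's `difflib.SequenceMatcher`); the thresholds and the `canonical_id` field name are assumptions, and embedding similarity is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; tune against a labelled set of known duplicates.
MERGE_THRESHOLD = 0.92
REVIEW_THRESHOLD = 0.70


def match_score(a, b):
    """Score two entity records in [0, 1]; exact ID match short-circuits to 1.0."""
    if a.get("canonical_id") and a.get("canonical_id") == b.get("canonical_id"):
        return 1.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def decide(a, b):
    """Classify a candidate pair as merge, human_review, or distinct."""
    score = match_score(a, b)
    if score >= MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "distinct"


same_id = decide(
    {"name": "Acme Corporation", "canonical_id": "LEI-1"},
    {"name": "ACME Corp.", "canonical_id": "LEI-1"},
)
similar_name = decide({"name": "Acme Corporation"}, {"name": "Acme Corp"})
unrelated = decide({"name": "Acme Corporation"}, {"name": "Zenith Holdings"})
```

Here the shared identifier yields an automatic merge, the similar names fall into the uncertain band for human review, and the unrelated names are kept apart.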

4.4 AI Integration Layer

AI applications query the graph in two modes. Direct graph traversal executes Cypher, SPARQL, or Gremlin queries when the application knows the specific relationship pattern it needs (e.g., "find all policies applicable to this product category in this jurisdiction"). Hybrid RAG combines vector retrieval with graph traversal: the vector store retrieves relevant document passages; the graph store enriches the context with structured relationships between entities mentioned in those passages; the LLM receives both unstructured passages and structured graph context. This hybrid approach typically outperforms pure vector RAG on multi-hop questions, because the relationships needed to connect the hops are supplied explicitly rather than inferred from passage text.
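
The context-assembly step of hybrid RAG can be sketched as follows. Vector retrieval is stubbed out: `passages` stands in for the vector store's results, each already annotated with the entities it mentions, and `graph` is an in-memory adjacency map rather than a real graph database. All names and the output format are illustrative assumptions.

```python
def graph_facts(graph, entities, max_facts=20):
    """Collect outgoing relationships for each mentioned entity, capped in size."""
    facts = []
    for entity in entities:
        for predicate, target in graph.get(entity, []):
            facts.append(f"({entity}) -[{predicate}]-> ({target})")
    return facts[:max_facts]


def build_context(passages, graph):
    """Combine retrieved passages with graph relationships into one LLM context."""
    mentioned = sorted({e for p in passages for e in p["entities"]})
    lines = ["Retrieved passages:"]
    lines += [f"- {p['text']}" for p in passages]
    lines.append("Graph facts:")
    lines += [f"- {fact}" for fact in graph_facts(graph, mentioned)]
    return "\n".join(lines)


graph = {
    "Product-X": [("GOVERNED_BY", "Policy-17")],
    "Policy-17": [("APPLIES_IN", "Jurisdiction-Y")],
}
passages = [
    {"text": "Product-X is subject to Policy-17.", "entities": ["Product-X", "Policy-17"]},
]
context = build_context(passages, graph)
```

The passage alone never mentions the jurisdiction; the graph facts add the `APPLIES_IN` relationship so the LLM can answer the second hop.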

4.5 Ontology Governance

The ontology evolves through a formal change management process. Proposed changes go through a domain expert review, impact analysis (which existing nodes/edges would be affected), and approval. Schema migrations are versioned and applied through a controlled deployment pipeline, not ad-hoc.


5. Architecture Diagram

flowchart TD
    subgraph Ingestion["Ingestion Layer"]
        A[Unstructured Documents]
        B[Structured Databases]
        C[Manual Curation]
    end
    subgraph Store["Graph Store"]
        D{Confidence Router}
        E[(Graph Database)]
        F[Entity Resolution]
    end
    subgraph Integration["AI Integration"]
        G[Graph Traversal]
        H[Hybrid RAG]
        I[LLM Application]
    end
    A -->|NLP extraction| D
    B -->|schema mapping| D
    C -->|validated facts| D
    D -->|high confidence| E
    D -->|uncertain| F
    F -->|resolved| E
    E --> G
    E --> H
    G --> I
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#dbeafe,stroke:#3b82f6
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f0fdf4,stroke:#22c55e
    style I fill:#d1fae5,stroke:#10b981

6. Components

Component | Type | Responsibility | Technology Options | Criticality
NLP Extraction Pipeline | Processing | NER, relationship extraction, and coreference resolution from unstructured text | spaCy + custom models, Amazon Comprehend, Azure AI Language, Hugging Face models | High
Schema Mapper | Processing | Translate relational schema to graph triples; CDC from source databases | Apache Kafka + custom mapper, Debezium CDC, AWS DMS | High
Curation UI | Application | Human entry of high-confidence facts with mandatory source citation | Custom React app, Stardog Designer, PoolParty | Medium
Graph Database | Storage | Store and serve nodes, edges, and properties with full version history | Neo4j Enterprise, Amazon Neptune, Azure Cosmos DB Gremlin, TigerGraph | Critical
Ontology Engine | Governance | Enforce schema validity; manage ontology versions and change lifecycle | OWL ontologies via Protégé, Neo4j schema constraints, custom schema registry | High
Entity Resolution Service | Quality | Identify and merge duplicate entities across sources | Splink (probabilistic), OpenRefine, custom embedding-based matcher | High
Human Validation Queue | Workflow | Route medium-confidence triples and uncertain merge candidates to human reviewers | Custom workflow app, Jira-integrated task queue, Label Studio | Medium
Quality Dashboard | Observability | Monitor confidence distribution, coverage gaps, staleness, duplicate rate | Grafana + custom metrics, Tableau, Superset | Medium
Graph Query API | Integration | Expose graph queries to AI applications via REST/GraphQL | Neo4j Bolt, Neptune SPARQL endpoint, custom GraphQL wrapper | Critical
Hybrid RAG Orchestrator | Integration | Combine vector and graph retrieval into unified LLM context | LangChain graph retrievers, LlamaIndex KnowledgeGraphIndex, custom | High

7. Data Flow

7.1 Primary Data Flow — Document Ingestion to AI Query

Step | Actor | Action | Output
1 | Document Source | Pushes new or updated document to ingestion queue | Document + metadata in queue
2 | NLP Pipeline | Performs OCR, NER, RE, coreference resolution | Candidate triples with confidence scores
3 | Confidence Router | Routes triples by confidence threshold | High → auto-ingest; medium → human validation queue; low → discard log
4 | Entity Resolution | Checks candidate entities against existing graph nodes for duplicates | Merged entity or new entity node
5 | Ontology Validator | Validates candidate nodes/edges against ontology schema at write time | Accepted or rejected with error code
6 | Graph Writer | Writes validated triples to graph database with source provenance | Nodes and edges with version metadata
7 | AI Application | Issues graph traversal or hybrid RAG query | Structured query (Cypher/SPARQL)
8 | Graph Query API | Executes query, returns nodes/edges with confidence and source metadata | Structured result set
9 | LLM | Receives graph context + optional retrieved passages | AI answer with traceable evidence chain

7.2 Error Flow

Error | Detection | Recovery | Escalation
NLP extraction failure (malformed document) | Pipeline error log; dead letter queue | Retry with fallback OCR; manual review flag | Alert ingestion ops team
Schema validation rejection (ontology violation) | Graph writer rejects write; error logged | Return error to upstream with violation details; route to ontology change process | Ontology governance review
Entity resolution conflict (ambiguous merge) | Confidence below merge threshold | Route to human review queue | SME review within SLA
Graph database write failure | Graph writer exception; retry with backoff | Retry ×3 with exponential backoff; dead letter queue after the final retry | PagerDuty alert; DBA on-call
Stale source document (past expiry) | Quality monitor scheduled job | Flag node as stale; remove from AI context until refreshed | Document owner notified for refresh
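
The recovery path for a graph database write failure (three retries with exponential backoff, then dead-lettering) could be sketched like this. The `write` callable, the use of `ConnectionError` as the transient failure type, and the 0.5-second base delay are assumptions; a real graph driver raises its own exception classes.

```python
import time


def write_with_retry(write, triple, dead_letters, retries=3, base_delay=0.5, sleep=time.sleep):
    """Try one write, then up to `retries` retries with exponential backoff.

    Returns True on success. After the final failed retry the triple is
    appended to `dead_letters` and False is returned.
    """
    for attempt in range(retries + 1):
        try:
            write(triple)
            return True
        except ConnectionError:
            if attempt == retries:
                break  # retries exhausted
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    dead_letters.append(triple)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays. Alerting the on-call engineer would hang off the dead letter queue rather than this function.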

8. Security Considerations

8.1 Authentication and Authorisation

All graph query API endpoints require service-to-service authentication via mTLS or OAuth 2.0 client credentials. Human-facing interfaces (curation UI, validation queue, quality dashboard) require MFA-enabled SSO. Graph access is controlled per element, the graph equivalent of row-level security: nodes and edges carry data classification labels that the query API layer enforces, so an application with "INTERNAL" clearance cannot traverse edges labelled "CONFIDENTIAL".

8.2 Secrets Management

Graph database credentials, NLP API keys, and CDC pipeline credentials are stored in a secrets vault (HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault). No credentials appear in application code or configuration files committed to source control. Secrets are rotated on a 90-day schedule with zero-downtime rotation procedures documented and tested.

8.3 Data Classification

Each node and edge in the graph carries a data classification label (Public, Internal, Confidential, Restricted). Classification is inherited from the source document's classification label. AI applications receive only nodes and edges at or below their authorised classification level. Cross-classification graph paths (a path that traverses a Restricted edge to reach a Public conclusion) trigger a security review before the traversal is permitted.
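
A classification-aware filter in the query layer might be sketched as below. The numeric ordering of the four labels and the dictionary-based edge representation are assumptions made for illustration; the security review workflow for cross-classification paths is reduced here to a simple deny.

```python
# Assumed ordering of the four classification labels, lowest to highest.
LEVELS = {"Public": 0, "Internal": 1, "Confidential": 2, "Restricted": 3}


def is_authorised(label, clearance):
    """An element is readable if its label is at or below the caller's clearance."""
    return LEVELS[label] <= LEVELS[clearance]


def visible_edges(edges, clearance):
    """Drop edges the calling application is not cleared to see."""
    return [e for e in edges if is_authorised(e["classification"], clearance)]


def path_permitted(path, clearance):
    """A path is traversable only if every edge on it is authorised."""
    return all(is_authorised(e["classification"], clearance) for e in path)


path = [
    {"id": "e1", "classification": "Public"},
    {"id": "e2", "classification": "Restricted"},
    {"id": "e3", "classification": "Public"},
]
internal_view = [e["id"] for e in visible_edges(path, "Internal")]
```

Checking the whole path, not only its endpoints, is what prevents a Public conclusion from being reached through a Restricted edge.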

8.4 Encryption

Data at rest in the graph database is encrypted using AES-256. Data in transit between all components uses TLS 1.3 minimum. Backup snapshots are encrypted with customer-managed keys (CMK). The encryption key lifecycle is managed independently of the graph database service.

8.5 Auditability

Every write to the graph database is logged with: actor identity, source document, timestamp, confidence score, and operation type (create/update/delete/merge). Audit logs are immutable (append-only) and retained for the regulatory retention period of the organisation (minimum 7 years for financial services). AI application queries against the graph are logged with the application identity, query text, and result set hash.

8.6 OWASP LLM Top 10 Mapping

OWASP LLM Risk (2023 list numbering) | Relevance to EKG | Mitigation
LLM01 Prompt Injection | Adversarial document content could inject instructions into NLP extraction pipeline | Input sanitisation before NLP processing; NLP models do not execute instructions
LLM02 Insecure Output Handling | Graph-derived content passed to LLM could contain malicious content | Sanitise graph node content before including in LLM prompt; structured output parsing only
LLM03 Training Data Poisoning | Malicious documents ingested into graph poison downstream AI responses | Document approval workflow; source authentication; confidence scoring surfaces anomalies
LLM04 Model Denial of Service | Adversarially complex graph traversal queries could exhaust compute | Query complexity limits; traversal depth caps; rate limiting on graph query API
LLM05 Supply Chain Vulnerabilities | NLP model dependencies could introduce vulnerabilities | Signed model artefacts; model provenance registry; dependency scanning in CI/CD
LLM06 Sensitive Information Disclosure | Graph traversal could expose relationships across classification levels | Node- and edge-level security; classification-aware query layer
LLM07 Insecure Plugin Design | External knowledge sources plugged into the graph via APIs could be compromised | API source authentication; input validation on all external data; schema validation at ingest
LLM08 Excessive Agency | AI applications given graph write access could modify authoritative knowledge | Graph write access restricted to ingestion pipeline service accounts only; AI apps have read-only access
LLM09 Overreliance | AI answers derived from stale or low-confidence graph data presented as authoritative | Confidence scores surfaced to end users; staleness flags; human review before high-stakes use
LLM10 Model Theft | Graph database contains proprietary enterprise knowledge, so its theft is equivalent to model theft | Network isolation; encrypted backups; DLP controls on graph export endpoints
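
The traversal depth cap listed against LLM04 can be illustrated with a bounded breadth-first search. This is a sketch over an in-memory adjacency map, not a graph database query; in Cypher or Gremlin the same limit would normally be expressed in the query itself or enforced by the query API.

```python
from collections import deque


def bounded_traverse(graph, start, max_depth):
    """Breadth-first traversal that never expands nodes beyond `max_depth` hops.

    Returns (node, depth) pairs in visit order, excluding the start node.
    The `seen` set also guarantees termination on cyclic graphs.
    """
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth cap reached: do not expand further
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append((neighbour, depth + 1))
                queue.append((neighbour, depth + 1))
    return reached


graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"]}
two_hops = bounded_traverse(graph, "A", 2)
three_hops = bounded_traverse(graph, "A", 3)
```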

9. Governance Considerations

9.1 Responsible AI

The knowledge graph is an AI artefact that encodes enterprise facts. Bias can enter via selective ingestion (if certain document types, geographies, or business units are over- or under-represented). A coverage audit is performed quarterly: which domains have strong graph coverage, which have gaps? Gaps are documented and ingestion is prioritised accordingly. Any domain where the graph is used to make consequential decisions must have designated human review as a backstop.

9.2 Model Risk Management

The NLP extraction models (NER, RE) are subject to the same model risk management process as predictive models. Each extraction model has a model card documenting training data, performance metrics on enterprise-domain validation sets, known failure modes, and refresh schedule. Model performance is monitored in production via a golden validation set — if extraction recall drops below threshold, a model retraining or replacement is triggered.
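
A scheduled golden-set evaluation could compute recall along these lines. The default thresholds mirror the 0.85 SLO target and 0.80 alerting threshold given for extraction model recall in section 10.1; representing triples as tuples in a set, and the status names, are assumptions.

```python
def recall(extracted, gold):
    """Fraction of golden-set triples that the extraction model recovered."""
    if not gold:
        raise ValueError("golden validation set is empty")
    return len(extracted & gold) / len(gold)


def evaluate(extracted, gold, slo=0.85, alert=0.80):
    """Return (recall, status) where status is ok, below_slo, or alert."""
    score = recall(extracted, gold)
    if score < alert:
        return score, "alert"
    if score < slo:
        return score, "below_slo"
    return score, "ok"


gold = {("Policy-%d" % i, "APPLIES_TO", "Product-X") for i in range(10)}
# The model finds 8 of the 10 golden triples plus one spurious triple.
extracted = {("Policy-%d" % i, "APPLIES_TO", "Product-X") for i in range(8)}
extracted.add(("Policy-99", "APPLIES_TO", "Product-X"))
score, status = evaluate(extracted, gold)
```

The spurious triple does not affect recall; it would lower precision, which a fuller evaluation job would track alongside.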

9.3 Human Approval Gates

Consequential ontology changes (adding a new entity type, deprecating a relationship type, merging two entity classes) require domain expert sign-off and a minimum review period of 5 business days. Automated entity merges above the high-confidence threshold are permitted but are logged and auditable. Human validation queues for medium-confidence triples must be cleared within a defined SLA (e.g., 5 business days) to prevent knowledge staleness.

9.4 Policy Ownership

Each knowledge domain in the graph has a designated Data Steward who owns the quality, accuracy, and freshness of that domain's nodes and edges. The Data Steward approves new ingestion sources for their domain and is notified of staleness alerts. An ontology governance committee (cross-domain) resolves conflicts when two domain stewards disagree on a shared entity or relationship type.

9.5 Traceability

Every node and edge in the graph maintains full provenance: source document ID, source system, extraction method (NLP/structured/manual), extractor version, confidence score, human validator ID (if applicable), and effective date range. This provenance chain is the foundation for AI answer explainability (see EAAPL-KNW005).
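
One way to model the provenance fields listed above is a small immutable record with basic validation. The field names, types, and the three extraction-method strings are assumptions chosen to mirror the prose, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EXTRACTION_METHODS = {"nlp", "structured", "manual"}


@dataclass(frozen=True)
class Provenance:
    source_document_id: str
    source_system: str
    extraction_method: str  # one of EXTRACTION_METHODS
    extractor_version: str
    confidence: float
    effective_from: date
    effective_to: Optional[date] = None  # None means open-ended
    human_validator_id: Optional[str] = None  # set only if a human validated it

    def __post_init__(self):
        if self.extraction_method not in EXTRACTION_METHODS:
            raise ValueError(f"unknown extraction method: {self.extraction_method}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.effective_to is not None and self.effective_to < self.effective_from:
            raise ValueError("effective_to precedes effective_from")

    def is_effective(self, on: date) -> bool:
        """True if the fact is within its effective date range on `on`."""
        return self.effective_from <= on and (
            self.effective_to is None or on <= self.effective_to
        )


record = Provenance(
    source_document_id="DOC-123",
    source_system="policy-repo",
    extraction_method="nlp",
    extractor_version="2.4.0",
    confidence=0.93,
    effective_from=date(2025, 1, 1),
    effective_to=date(2025, 12, 31),
)
```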

9.6 Governance Artefacts

Artefact | Owner | Frequency | Location
Ontology specification (OWL/schema) | Ontology Governance Committee | Updated per change | Version-controlled schema repository
Domain coverage audit report | Data Stewards + Data Office | Quarterly | Data governance platform
NLP model cards | ML Engineering | Per model version | ML model registry
Entity resolution configuration | Data Engineering | Per major configuration change | IaC repository
Ingestion source register | Domain Data Stewards | Updated per new source | Data catalogue
Human validation queue SLA report | Data Governance | Monthly | Governance dashboard

10. Operational Considerations

10.1 Monitoring and SLOs

Metric | SLO Target | Alerting Threshold | Tool
Graph query API p99 latency | ≤200 ms for simple traversal; ≤2 s for multi-hop | >500 ms p99 over 5 min | Prometheus + Grafana
Ingestion pipeline lag (CDC) | ≤5 min from source change to graph update | >15 min lag | Kafka consumer lag metrics
Entity resolution queue depth | ≤500 pending merges | >2,000 pending | Custom metric + alert
Human validation queue SLA | 100% cleared within 5 business days | Any item older than 3 days | Workflow system alert
Graph database availability | 99.9% | <99.5% over 1-hour window | Cloud provider health checks
Extraction model recall (golden set) | ≥0.85 recall on golden validation set | <0.80 recall | Scheduled evaluation job

10.2 Logging

All graph write operations, query API calls, entity resolution decisions, and human validation actions are logged with structured JSON to a centralised log aggregation platform (Splunk, Elastic, CloudWatch Logs). Log retention: 90 days hot, 7 years cold archive. Sensitive node content is masked in logs; only node IDs and metadata are logged.

10.3 Incident Management

P1 incidents (graph database unavailable, ingestion pipeline halted) trigger immediate on-call escalation with a 15-minute response SLA. P2 incidents (entity resolution backlog exceeds threshold, extraction model performance degradation) are addressed within 4 business hours. Incident post-mortems are conducted for all P1 and P2 incidents and findings are reviewed by the ontology governance committee.

10.4 Disaster Recovery

Scenario | RTO | RPO | Recovery Procedure
Graph database node failure | 5 min (failover to replica) | 0 (synchronous replication) | Automatic failover; validate with health check query
Graph database corruption | 4 hours | 15 min (from last backup) | Restore from most recent validated backup; replay CDC from checkpoint
Ingestion pipeline failure | 30 min | 5 min (CDC offset retention) | Restart pipeline; replay from last committed offset
Regional cloud outage | 4 hours | 1 hour | Promote DR region replica; update DNS; validate query functionality

10.5 Capacity Planning

Graph database storage grows at approximately 10–50 GB per million nodes (depending on property richness). Query throughput scales with read replica count. Plan for 3× initial storage capacity to accommodate growth and index overhead. NLP extraction compute is bursty — autoscaling worker pools are preferred over fixed allocation.
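
The sizing rule of thumb above can be worked through in a few lines. The sketch reads "3× initial storage capacity" as provisioning three times the estimated initial footprint; that interpretation, and the function name, are assumptions.

```python
def storage_estimate_gb(node_count, gb_per_million=(10, 50), headroom=3):
    """Estimate raw and provisioned storage (GB) as (low, high) ranges."""
    millions = node_count / 1_000_000
    raw_low = millions * gb_per_million[0]
    raw_high = millions * gb_per_million[1]
    return {
        "raw_gb": (raw_low, raw_high),
        "provisioned_gb": (raw_low * headroom, raw_high * headroom),
    }


# Example: a 10-million-node graph.
estimate = storage_estimate_gb(10_000_000)
```

For 10 million nodes this gives 100 to 500 GB of raw storage and 300 to 1,500 GB provisioned, with the spread driven by how many properties each node carries.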


11. Cost Considerations

11.1 Cost Drivers

Cost Driver | Description | Typical Range
Graph database licensing/hosting | Neo4j Enterprise licence or managed service (Neptune/Cosmos DB) | $2,000–$20,000/month depending on instance size
NLP model inference | Document extraction at ingestion; billed per document or per compute hour | $0.001–$0.01 per document page
Graph query compute | CPU/memory for traversal queries; scales with query volume and complexity | $500–$5,000/month
Human validation labour | Data stewards and domain experts reviewing medium-confidence extractions | $5,000–$30,000/month depending on ingestion volume
Vector store (for hybrid RAG) | Companion vector DB for hybrid retrieval mode | $500–$3,000/month
Engineering and operations | FTE cost for graph engineers, ontology maintainers | 1–3 FTE at senior level

11.2 Scaling Risks

  • NLP extraction cost grows linearly with document volume; large document libraries (>1M documents) require batching strategies and model efficiency optimisation
  • Human validation queue is the primary bottleneck at scale; automation improvements (better-calibrated extraction models that leave fewer triples in the medium-confidence band) are needed to keep review labour from growing in step with ingestion volume
  • Graph query complexity (deep multi-hop traversals) can cause latency and cost spikes; query depth limits and result caching are essential

11.3 Optimisations

  • Cache frequent graph traversal results (TTL aligned to update frequency of those node types)
  • Segment graph into hot (frequently traversed) and cold (archival) partitions; hot partition on in-memory or SSD-backed storage
  • Use embedding-based pre-filtering to reduce traversal space before deep graph queries
  • Batch document ingestion during off-peak hours to leverage lower spot/preemptible compute pricing
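
The first optimisation, caching traversal results with a TTL, might be sketched as follows. The class name, the injectable `clock`, and the 60-second TTL in the example are assumptions; production systems would more likely use an existing cache such as Redis with per-key expiry.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value, or call `compute()` and cache its result."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


# Usage with a fake clock so expiry can be shown without waiting.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
calls = []


def run_traversal():
    calls.append("executed")
    return ["Policy-17"]


first = cache.get_or_compute("policies:Product-X", run_traversal)
second = cache.get_or_compute("policies:Product-X", run_traversal)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_compute("policies:Product-X", run_traversal)  # expired, recomputed
```

Aligning the TTL to how often the underlying node types change bounds how stale a cached answer can be.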

11.4 Indicative Cost Ranges

Organisation Scale | Monthly Infrastructure Cost | Annual Total Cost (incl. labour)
Mid-market (100K nodes, 10 AI apps) | $5,000–$15,000 | $200,000–$400,000
Enterprise (10M nodes, 50+ AI apps) | $30,000–$80,000 | $800,000–$2,000,000
Large Enterprise (100M+ nodes, enterprise-wide) | $100,000–$300,000 | $3,000,000–$8,000,000

12. Trade-Off Analysis

12.1 Graph Database Technology Options

Option | Strengths | Weaknesses | Best For
Neo4j Enterprise | Full Cypher query language; mature ecosystem; strong OLTP performance; native graph storage | Licence cost; self-managed complexity; limited native analytics | Complex relationship traversal; enterprise OLTP workloads
Amazon Neptune | Fully managed; SPARQL + Gremlin support; native AWS integration; high availability built-in | Higher latency than self-managed Neo4j; Cypher support limited to openCypher; AWS lock-in | AWS-native organisations; reduced operational overhead priority
Azure Cosmos DB (Gremlin) | Globally distributed; multi-model (also supports SQL API); Azure AD integration | Gremlin is less expressive than Cypher; higher latency for complex traversals; cost at scale | Azure-native organisations; global distribution requirement
TigerGraph | Superior analytics/OLAP graph workloads; GSQL for complex algorithms; handles very large graphs | Steeper learning curve; smaller ecosystem; higher upfront cost | Fraud detection; large-scale analytics-heavy graph workloads
PostgreSQL + graph extension (e.g., Apache AGE), with pgvector for embeddings | Low operational overhead; existing PostgreSQL skills; cost-effective | Limited graph query expressiveness; does not scale to enterprise graph sizes | Small graphs (<1M nodes) embedded in existing PostgreSQL estate

12.2 Architectural Tensions

Tension | Option A | Option B | Recommended Resolution
Ingestion automation vs. accuracy | Maximise auto-ingestion (lower confidence thresholds) for speed and coverage | Maximise human validation for accuracy at cost of speed | Domain-dependent: high-stakes domains (legal, medical) require higher human validation rate; informational domains can accept higher auto-ingestion
Graph schema rigidity vs. flexibility | Strict ontology enforcement (rejects any node/edge not in ontology) | Schema-optional property graph (allows ad-hoc properties) | Hybrid: core entity types and relationships are ontology-enforced; optional properties are schema-flexible with metadata tagging
Graph freshness vs. consistency | Near-real-time ingestion (CDC; minutes lag) for freshness | Batch ingestion with consistency verification | Critical domains: near-real-time; analytical domains: batch acceptable
In-house vs. managed graph service | Self-managed Neo4j for maximum control and query performance | Managed service (Neptune/Cosmos DB) for reduced ops burden | Organisations without dedicated graph DBA capability should use managed services despite performance trade-off

13. Failure Modes

Failure | Likelihood | Impact | Detection | Recovery
Ontology drift (graph evolves without ontology update) | Medium | High: AI answers become inconsistent with business reality | Schema validation failure rate increase; data steward complaints | Ontology audit; schema migration to realign; tighten change control process
Entity resolution false merge (two distinct entities merged incorrectly) | Medium | High: all AI answers about the merged entity are incorrect | User-reported AI errors; data steward audit | Demerge operation; add negative match rule; re-extract affected documents
NLP extraction model degradation (new document types not in training set) | Medium | Medium: knowledge gaps for new document types | Recall drop on golden validation set | Model fine-tuning on new document examples; temporary manual curation fallback
Circular relationship injection (ingestion creates graph cycle where none should exist) | Low | Medium: graph traversal may loop; AI reasoning corrupted | Graph integrity check job detects cycles | Remove offending edges; add cycle-detection validation to ingestion
Data steward turnover (domain knowledge lost when steward leaves) | High | High: ontology maintenance and validation quality drops | Validation queue SLA misses; ontology change requests unanswered | Documented ontology rationale per entity/relationship; successor handover process
Graph database replication lag during peak ingestion | Medium | Low: briefly stale reads from replicas | Replica lag metric alert | Reduce ingestion batch size; scale ingestion workers; route time-sensitive reads to primary
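
The graph integrity check for circular relationships can be sketched as a depth-first search over the edges of a relationship type that is meant to be acyclic (for example a hypothetical `SUPERSEDES` relation between policies). The edge-list input format is an assumption, and the recursive form is only suitable for modest graph sizes.

```python
def has_cycle(edges):
    """Detect a directed cycle among (source, target) edges using DFS colouring."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour = dict.fromkeys(graph, WHITE)

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:
                return True  # back edge to a node on the current path
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


acyclic = has_cycle([("P3", "P2"), ("P2", "P1")])
cyclic = has_cycle([("P3", "P2"), ("P2", "P1"), ("P1", "P3")])
```

Running the same check on a candidate batch before it is written turns the scheduled detection job into an ingestion-time validation.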

13.1 Cascading Failure Scenarios

Scenario 1: Ontology Breaking Change Cascade. An ontology change renames a core entity type without a migration. The NLP extraction pipeline continues writing new nodes with the old type (type mismatch). Entity resolution stops matching new nodes to existing ones because their types differ. AI applications receive duplicate entities and produce contradictory answers. The human validation queue floods. Resolution requires: freeze ingestion, apply schema migration, re-run entity resolution, validate AI outputs.

Scenario 2: Bad Batch Ingestion Cascade. A malformed data export from an ERP system is ingested without sufficient validation. 50,000 incorrect relationship records are written to the graph. AI applications begin producing wrong answers for all queries touching those relationships. Detection is delayed because monitoring covers latency and availability but not answer correctness. Resolution requires: identify and rollback the ingestion batch; purge affected edges; re-ingest from correct source; add data validation pre-check to prevent recurrence.
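
Rolling back a bad batch is far simpler if every written edge records which ingestion batch produced it. The sketch below assumes a `batch_id` property on each edge (not part of the field list in section 4.2, so treat it as a suggested addition) and uses soft deletion via `deleted_at` so the rollback itself stays in the version history.

```python
from datetime import datetime, timezone


def rollback_batch(edges, batch_id, now=None):
    """Soft-delete every live edge written by `batch_id`; return the count."""
    now = now or datetime.now(timezone.utc)
    affected = 0
    for edge in edges:
        if edge["batch_id"] == batch_id and edge.get("deleted_at") is None:
            edge["deleted_at"] = now  # keep the record for audit, hide it from queries
            affected += 1
    return affected


edges = [
    {"id": "e1", "batch_id": "erp-0412", "deleted_at": None},
    {"id": "e2", "batch_id": "erp-0413", "deleted_at": None},
    {"id": "e3", "batch_id": "erp-0413", "deleted_at": None},
]
removed = rollback_batch(edges, "erp-0413")
live_ids = [e["id"] for e in edges if e["deleted_at"] is None]
```

Because the function skips edges that are already deleted, running it twice for the same batch is harmless.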


14. Regulatory Considerations

Regulation | Relevant Clause | Requirement | How EKG Addresses It
APRA CPS 230 (Operational Risk Management) | CPS 230 §36–§38 | Material service providers and critical data must have documented recovery capability | Graph database DR procedures; RTO/RPO defined and tested; backup validation
APRA CPS 234 (Information Security) | CPS 234 §15–§17 | Information assets classified and protected proportionate to criticality | Node-level data classification; encryption at rest and in transit; access control per classification
Australian Privacy Act 1988 | APP 11 (Security of Personal Information) | Personal information must be protected from misuse and unauthorised access | PII nodes tagged and access-controlled; audit log of all PII node access; destruction or de-identification procedures for personal information that is no longer needed (APP 11.2)
EU AI Act | Article 13 (Transparency) | High-risk AI systems must be designed to allow transparency of operation | Knowledge graph provenance chain enables AI answer traceability to source facts
EU AI Act | Article 14 (Human Oversight) | High-risk AI systems must allow human oversight and intervention | Human validation queues; confidence scores surfaced; graph corrections possible by authorised stewards
ISO/IEC 42001 | §6.1 (Risk Assessment) | AI management system must document AI-related risks and controls | Risk register includes NLP extraction risk, entity resolution risk, ontology drift risk
NIST AI RMF | GOVERN 1.1, MAP 1.5 | AI risks identified and assigned to organisational roles | Data steward ownership model maps to RMF GOVERN function

15. Reference Implementations

15.1 AWS

Component | AWS Service
Graph database | Amazon Neptune (serverless for variable workloads)
NLP extraction | Amazon Comprehend + custom SageMaker NER models
Document ingestion queue | Amazon SQS + S3 trigger
CDC from RDS | AWS Database Migration Service + Amazon Kinesis
Secrets management | AWS Secrets Manager
Human validation workflow | AWS Step Functions + custom UI
Monitoring | Amazon CloudWatch + Amazon Managed Grafana
Hybrid RAG | Amazon Bedrock Knowledge Bases (Neptune integration)

15.2 Azure

Component | Azure Service
Graph database | Azure Cosmos DB for Apache Gremlin
NLP extraction | Azure AI Language (NER + relation extraction)
Document ingestion | Azure Event Hubs + Blob Storage trigger
CDC from SQL databases | Azure Data Factory CDC
Secrets management | Azure Key Vault
Human validation workflow | Azure Logic Apps + Power Apps
Monitoring | Azure Monitor + Azure Managed Grafana
Hybrid RAG | Azure AI Search (graph + vector hybrid)

15.3 GCP

Component | GCP Service
Graph database | Neo4j on GKE or managed Neo4j Aura
NLP extraction | Google Cloud Natural Language API + Vertex AI custom models
Document ingestion | Cloud Pub/Sub + Cloud Storage trigger
Secrets management | Google Cloud Secret Manager
Monitoring | Google Cloud Monitoring + Grafana
Hybrid RAG | Vertex AI Search + custom graph context enrichment

15.4 On-Premises

Component | Technology
Graph database | Neo4j Enterprise or TigerGraph on-prem
NLP extraction | Hugging Face models on GPU servers; spaCy pipeline
Document ingestion | Apache Kafka + custom Spark pipeline
Secrets management | HashiCorp Vault
Monitoring | Prometheus + Grafana
Hybrid RAG | LangChain + Weaviate or Qdrant

16. Related Patterns

Pattern ID | Pattern Name | Relationship Type | Notes
EAAPL-KNW002 | Semantic Data Layer | Complementary | Semantic layer provides business ontology that EKG implements; together they create natural language data access
EAAPL-KNW003 | AI Knowledge Corpus Management | Complementary | Corpus management governs the documents that feed the NLP extraction pipeline
EAAPL-KNW005 | Knowledge Graph for Explainability | Extension | EKG is the substrate; KNW005 adds the explainability presentation layer
EAAPL-KNW006 | Corpus Quality Assurance | Dependency | Quality assurance must run on documents before they enter NLP extraction
EAAPL-RAG001 | Retrieval Augmented Generation | Consumer | RAG pattern consumes the knowledge graph via hybrid retrieval mode
EAAPL-GOV002 | AI Model Risk Management | Governance | NLP extraction models within EKG are subject to model risk management

17. Maturity Assessment

Overall Maturity Label: Proven

Dimension | Score (1–5) | Rationale
Technology readiness | 4 | Graph databases, NLP extraction, and hybrid RAG are all production-proven; tooling is mature
Organisational capability | 2 | Most enterprises lack dedicated graph engineers and ontology governance experience; this is the primary constraint
Standards availability | 3 | OWL, RDF, and SPARQL are mature W3C standards; the property graph query standard GQL (ISO/IEC 39075) was published in 2024 and vendor adoption is still emerging
Vendor ecosystem | 4 | Multiple mature commercial and open-source vendors; managed cloud services available on all major clouds
Case evidence | 4 | Strong evidence from financial services (Goldman Sachs KG), healthcare, and tech companies; patterns well-documented
Regulatory alignment | 4 | EU AI Act and SR 11-7 requirements are well-addressed by the provenance and explainability capabilities
Overall | 3.5 / 5 | Proven pattern with high technology readiness; primary constraint is the organisational capability uplift required

18. Revision History

Version | Date | Author | Changes
1.0 | 2026-06-12 | EAAPL Editorial Board | Initial publication; covers ontology design, ingestion pipelines, graph DB selection, versioning, quality management, and RAG integration