EAAPL-KNW001: Enterprise Knowledge Graph
Pattern ID: EAAPL-KNW001
Status: Proven
Complexity: High
Tags: knowledge-graph embedding enterprise-only high-complexity
Version: 1.0
Last Updated: 2026-06-12
1. Executive Summary
An Enterprise Knowledge Graph (EKG) is a persistent, curated, machine-readable representation of an organisation's entities, relationships, and facts. Unlike a document corpus, a knowledge graph stores structured semantic relationships that AI applications can traverse, reason over, and cite precisely.
This pattern covers the full lifecycle: ontology design via domain expert workshops; ingestion pipelines that combine NLP-extracted facts with structured database mappings; graph database selection aligned to workload profile; versioning and rollback; entity resolution and quality management; and integration with RAG pipelines where graph traversal augments vector similarity search.
For a CIO/CTO audience, the primary value proposition is twofold. First, the EKG becomes the enterprise's durable, governed knowledge asset — independent of any single AI vendor or model. Second, AI applications powered by an EKG produce answers that are traceable, auditable, and correctable because every claim maps back to a specific node, edge, and source document. This directly addresses model risk and regulatory explainability requirements without requiring per-answer human review.
Typical ROI realisation occurs within 12–18 months for organisations with mature data catalogues and identifiable high-value knowledge domains such as compliance, product, or customer relationship data.
2. Problem Statement
2.1 Business Problem
Enterprise knowledge is fragmented across wikis, SharePoint sites, ERP systems, policy repositories, and individual email threads. AI applications built on top of this fragmentation inherit its inconsistencies — two AI answers to the same question may contradict each other depending on which documents were retrieved. Business users lose trust quickly. Compliance and legal teams cannot accept AI outputs that cannot be audited back to authoritative sources.
2.2 Technical Problem
Large Language Models have no persistent memory of enterprise-specific entities or their relationships. Vector search retrieves semantically similar passages but cannot answer multi-hop relational questions such as "Which compliance policies apply to product X sold in jurisdiction Y to customer class Z?" without explicit relationship traversal. Retrieval-augmented generation on raw documents alone cannot guarantee answer consistency when the same fact appears in multiple slightly different forms across documents.
2.3 Symptoms
- AI answers contradict each other for equivalent questions posed in different phrasings
- AI cannot answer cross-entity relational queries reliably
- Regulatory auditors cannot obtain a complete audit trail for AI-generated decisions
- Duplicate entities (same person, product, or policy represented multiple times) cause incorrect AI behaviour
- Knowledge updates (e.g., a policy change) do not propagate consistently to AI responses
2.4 Cost of Inaction
- Regulatory non-compliance risk in high-stakes domains (financial advice, medical, legal)
- AI adoption stalls due to trust deficit — business users revert to manual processes
- Knowledge fragmentation compounds: each new AI application builds its own ad-hoc corpus, creating N siloed knowledge stores instead of one governed asset
- Entity deduplication effort grows super-linearly as data volumes increase without a resolution strategy
3. Context
3.1 When to Apply
- Organisation has ≥3 distinct knowledge domains (e.g., product, customer, compliance, HR) that AI applications must reason across
- Answers require multi-hop relational reasoning, not just passage retrieval
- Regulatory or audit requirements demand traceable reasoning chains
- Persistent entity identity matters (same customer, product, or policy referenced consistently)
- Organisation has sufficient data engineering maturity to operate a graph database in production
3.2 When NOT to Apply
- Single-domain, single-document-type RAG use cases (plain vector RAG is simpler and sufficient)
- Organisations without a data governance function — ontology without governance degrades rapidly
- Proof-of-concept or MVP phases — graph infrastructure investment is only justified at production scale
- Highly dynamic knowledge where relationships change faster than graph update pipelines can process (sub-minute freshness requirement)
3.3 Prerequisites
- Functioning data catalogue with documented data domains
- At least one domain expert per knowledge domain available for ontology workshops
- Master data management (MDM) or entity resolution capability for at least one entity type
- Engineering team with graph database operational experience or vendor-managed service option
3.4 Industry Applicability
| Industry | Applicability | Primary Use Case |
|---|---|---|
| Financial Services | High | Regulatory compliance, customer 360, product eligibility |
| Healthcare | High | Clinical pathways, drug interactions, patient history |
| Manufacturing | High | Product genealogy, supplier relationships, maintenance knowledge |
| Legal / Professional Services | High | Case precedent, contract clause relationships, jurisdiction mapping |
| Retail / CPG | Medium | Product taxonomy, supplier network, customer segmentation |
| Government | High | Policy cross-reference, citizen services, regulatory mapping |
4. Architecture Overview
The Enterprise Knowledge Graph architecture is organised into four layers: Ingestion, Graph Store, Quality Management, and AI Integration. Ontology governance spans all four.
4.1 Ingestion Layer
Knowledge enters the graph through three parallel pipelines.
NLP Extraction Pipeline processes unstructured documents — policy PDFs, contracts, technical specifications, wiki pages. A document pre-processor performs OCR, layout analysis, and language detection. Named Entity Recognition (NER) identifies entity mentions: people, organisations, products, locations, dates, regulatory references. Relationship Extraction (RE) models identify semantic relationships between co-occurring entities. Coreference Resolution links pronoun and alias references to their canonical entity. The output is a set of candidate triples (subject, predicate, object) with confidence scores. Triples above a high-confidence threshold are auto-ingested; triples in the middle band enter a human validation queue; low-confidence triples are discarded.
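The three-band routing described above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the threshold values, the `Triple` class, and the destination names are all hypothetical, since the pattern does not fix them.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the pattern does not prescribe values.
AUTO_INGEST_THRESHOLD = 0.90
REVIEW_THRESHOLD = 0.60


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float


def route(triple: Triple) -> str:
    """Return the destination for a candidate triple based on its confidence band."""
    if triple.confidence >= AUTO_INGEST_THRESHOLD:
        return "auto_ingest"
    if triple.confidence >= REVIEW_THRESHOLD:
        return "human_validation_queue"
    return "discard_log"


candidates = [
    Triple("Policy-12", "APPLIES_TO", "Product-X", 0.97),
    Triple("Policy-12", "SUPERSEDES", "Policy-07", 0.72),
    Triple("Product-X", "SOLD_IN", "Jurisdiction-Y", 0.31),
]
routed = {t.predicate: route(t) for t in candidates}
```

In practice the two thresholds would likely be set per domain, with high-stakes domains sending a larger share of triples to human validation.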
Structured Data Mapping Pipeline ingests from databases, APIs, and data warehouses. A schema mapper translates relational tables to graph entities and edges using pre-defined mapping rules maintained in a mapping registry. Change data capture (CDC) propagates source changes to the graph incrementally, within a defined SLO (typically minutes for critical domains). Foreign key relationships become graph edges. Referential integrity is validated before loading.
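As a rough sketch of how a mapping rule might turn a relational row into a node plus one edge per foreign key: the `ORDER_MAPPING` structure, the table and column names, and the `Label:key` identifier format below are assumptions made for illustration, not part of the pattern.

```python
from typing import Any

# Hypothetical mapping-registry entry for an "orders" table.
ORDER_MAPPING = {
    "node_label": "Order",
    "key_column": "order_id",
    "property_columns": ["status"],
    # foreign-key column -> (edge type, target node label)
    "foreign_keys": {"customer_id": ("PLACED_BY", "Customer")},
}


def map_row(row: dict[str, Any], mapping: dict[str, Any]) -> tuple[dict, list[dict]]:
    """Translate one relational row into a node and its outgoing edges."""
    node_id = f'{mapping["node_label"]}:{row[mapping["key_column"]]}'
    node = {
        "id": node_id,
        "label": mapping["node_label"],
        "properties": {c: row[c] for c in mapping["property_columns"]},
    }
    edges = []
    for column, (edge_type, target_label) in mapping["foreign_keys"].items():
        if row.get(column) is None:
            continue  # nullable foreign key: no edge to create
        edges.append(
            {"from": node_id, "type": edge_type, "to": f"{target_label}:{row[column]}"}
        )
    return node, edges


node, edges = map_row(
    {"order_id": 1001, "status": "shipped", "customer_id": 42}, ORDER_MAPPING
)
```

Keeping the rule as data rather than code is what allows it to live in a mapping registry and be versioned alongside the ontology.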
Manual Curation Pipeline handles high-value facts that are too important to risk extraction errors — regulatory requirements, executive decisions, product pricing. Subject matter experts enter facts through a governed curation UI with mandatory source citation, effective date, and expiry date fields.
4.2 Graph Store Layer
The graph database stores nodes (entities), edges (relationships), and properties (attributes). The ontology — defined in OWL or a property graph schema — governs which node types, relationship types, and property names are valid. Schema validation is enforced at write time. The store maintains full version history: every node and edge has created_at, updated_at, deleted_at, and source_document_id fields. This enables point-in-time graph snapshots.
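A point-in-time snapshot can be derived from the version fields by keeping only elements that had been created and not yet deleted at the requested instant. The sketch below assumes a simple in-memory `Edge` record (hypothetical) and uses only `created_at` and `deleted_at`; reconstructing past property values would need additional history beyond what is shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str
    target: str
    created_at: datetime
    deleted_at: Optional[datetime]  # None means the edge is still live
    source_document_id: str


def snapshot(edges: list[Edge], as_of: datetime) -> list[Edge]:
    """Return edges that existed at `as_of` (created on or before it, not yet deleted)."""
    return [
        e for e in edges
        if e.created_at <= as_of and (e.deleted_at is None or e.deleted_at > as_of)
    ]


edges = [
    Edge("Policy-7", "APPLIES_TO", "Product-X", datetime(2024, 1, 1), datetime(2024, 6, 1), "doc-1"),
    Edge("Policy-12", "APPLIES_TO", "Product-X", datetime(2024, 6, 1), None, "doc-2"),
]
march = [e.source for e in snapshot(edges, datetime(2024, 3, 1))]
july = [e.source for e in snapshot(edges, datetime(2024, 7, 1))]
```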
4.3 Quality Management Layer
An entity resolution service runs continuously, identifying candidate duplicates using a configurable matching strategy (exact match on canonical identifiers, fuzzy match on names and attributes, embedding similarity for semantic duplicates). Duplicate candidates above a merge threshold are automatically merged; candidates in the uncertain band are routed to a human review queue. Each node and edge carries a confidence score that is propagated to any AI answer derived from it. A quality dashboard tracks entity count, duplicate rate, confidence distribution, and validation queue depth.
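A minimal sketch of the tiered matching strategy follows, covering exact identifier match and fuzzy name match only; embedding similarity is omitted. The thresholds, field names, and the use of Python's standard-library `difflib.SequenceMatcher` as the fuzzy matcher are assumptions for illustration.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.92   # hypothetical auto-merge threshold
REVIEW_THRESHOLD = 0.75  # hypothetical lower bound of the uncertain band


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def resolve(candidate: dict, existing: dict) -> str:
    """Decide whether a candidate entity duplicates an existing node."""
    # An exact match on a canonical identifier wins outright.
    cid = candidate.get("canonical_id")
    if cid and cid == existing.get("canonical_id"):
        return "merge"
    score = name_similarity(candidate["name"], existing["name"])
    if score >= MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "human_review"
    return "distinct"


same_id = resolve(
    {"canonical_id": "ABN-123", "name": "Acme"},
    {"canonical_id": "ABN-123", "name": "A.C.M.E. Holdings"},
)
near_name = resolve({"name": "Acme Pty Ltd"}, {"name": "Acme Pty Limited"})
unrelated = resolve({"name": "Acme Pty Ltd"}, {"name": "Globex Corporation"})
```

Here `near_name` lands in the uncertain band and would be routed to the human review queue rather than merged automatically.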
4.4 AI Integration Layer
AI applications query the graph in two modes. Direct graph traversal executes Cypher, SPARQL, or Gremlin queries when the application knows the specific relationship pattern it needs (e.g., "find all policies applicable to this product category in this jurisdiction"). Hybrid RAG combines vector retrieval with graph traversal: the vector store retrieves relevant document passages; the graph store enriches the context with structured relationships between entities mentioned in those passages; the LLM receives both unstructured passages and structured graph context. Because the graph supplies the explicit relationships that passage retrieval alone cannot, this hybrid approach is better suited than pure vector RAG to multi-hop questions.
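The hybrid flow can be illustrated with toy in-memory stand-ins. Everything below is hypothetical: `PASSAGES` and `GRAPH_EDGES` replace a real vector store and graph database, and word overlap replaces embedding similarity. Only the shape of the orchestration is the point.

```python
# Toy stand-ins; a real system would call a vector store and a graph database.
PASSAGES = [
    {"text": "Policy-12 restricts sale of Product-X to retail customers.",
     "entities": ["Policy-12", "Product-X"]},
    {"text": "Office opening hours are 9 to 5.", "entities": []},
]
GRAPH_EDGES = [
    ("Policy-12", "APPLIES_TO", "Product-X"),
    ("Policy-12", "EFFECTIVE_IN", "Jurisdiction-Y"),
    ("Policy-99", "APPLIES_TO", "Product-Z"),
]


def retrieve_passages(question: str, k: int = 1) -> list[dict]:
    """Stand-in for vector similarity: rank passages by shared lowercase words."""
    words = set(question.lower().split())
    ranked = sorted(
        PASSAGES,
        key=lambda p: len(words & set(p["text"].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def graph_context(entities: list[str]) -> list[tuple[str, str, str]]:
    """Return edges that touch any entity mentioned in the retrieved passages."""
    wanted = set(entities)
    return [e for e in GRAPH_EDGES if e[0] in wanted or e[2] in wanted]


def build_context(question: str) -> dict:
    """Combine retrieved passages with graph facts about the entities they mention."""
    passages = retrieve_passages(question)
    entities = [ent for p in passages for ent in p["entities"]]
    facts = [f"{s} -{r}-> {o}" for s, r, o in graph_context(entities)]
    return {"passages": [p["text"] for p in passages], "graph_facts": facts}


context = build_context("Which policy restricts sale of Product-X?")
```

The resulting `context` carries both the passage and the jurisdiction edge, which the passage text alone does not mention.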
4.5 Ontology Governance
The ontology evolves through a formal change management process. Proposed changes go through a domain expert review, impact analysis (which existing nodes/edges would be affected), and approval. Schema migrations are versioned and applied through a controlled deployment pipeline, not ad-hoc.
5. Architecture Diagram
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| NLP Extraction Pipeline | Processing | NER, relationship extraction, coreference resolution from unstructured text | spaCy + custom models, AWS Comprehend, Azure AI Language, Hugging Face NLP | High |
| Schema Mapper | Processing | Translate relational schema to graph triples; CDC from source databases | Apache Kafka + custom mapper, Debezium CDC, AWS DMS | High |
| Curation UI | Application | Human entry of high-value facts with mandatory source citation | Custom React app, Stardog Designer, PoolParty | Medium |
| Graph Database | Storage | Store and serve nodes, edges, properties with full version history | Neo4j Enterprise, Amazon Neptune, Azure Cosmos DB Gremlin, TigerGraph | Critical |
| Ontology Engine | Governance | Enforce schema validity; manage ontology versions and change lifecycle | OWL ontologies via Protégé, Neo4j schema constraints, custom schema registry | High |
| Entity Resolution Service | Quality | Identify and merge duplicate entities across sources | Splink (probabilistic), OpenRefine, custom embedding-based matcher | High |
| Human Validation Queue | Workflow | Route medium-confidence triples and uncertain merge candidates to human reviewers | Custom workflow app, Jira-integrated task queue, Label Studio | Medium |
| Quality Dashboard | Observability | Monitor confidence distribution, coverage gaps, staleness, duplicate rate | Grafana + custom metrics, Tableau, Superset | Medium |
| Graph Query API | Integration | Expose graph queries to AI applications via REST/GraphQL | Neo4j Bolt, Neptune SPARQL endpoint, custom GraphQL wrapper | Critical |
| Hybrid RAG Orchestrator | Integration | Combine vector and graph retrieval into unified LLM context | LangChain graph retrievers, LlamaIndex KnowledgeGraphIndex, custom | High |
7. Data Flow
7.1 Primary Data Flow — Document Ingestion to AI Query
| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Document Source | Pushes new or updated document to ingestion queue | Document + metadata in queue |
| 2 | NLP Pipeline | Performs OCR, NER, RE, coreference resolution | Candidate triples with confidence scores |
| 3 | Confidence Router | Routes triples by confidence threshold | High → auto-ingest; medium → HVQ; low → discard log |
| 4 | Entity Resolution | Checks candidate entities against existing graph nodes for duplicates | Merged entity or new entity node |
| 5 | Ontology Validator | Validates new nodes/edges against ontology schema | Accepted or rejected with error code |
| 6 | Graph Writer | Writes validated triples to graph database with source provenance | Nodes and edges with version metadata |
| 7 | AI Application | Issues graph traversal or hybrid RAG query | Structured query (Cypher/SPARQL/Gremlin) |
| 8 | Graph Query API | Executes query, returns nodes/edges with confidence and source metadata | Structured result set |
| 9 | LLM | Receives graph context + optional retrieved passages | AI answer with traceable evidence chain |
7.2 Error Flow
| Error | Detection | Recovery | Escalation |
|---|---|---|---|
| NLP extraction failure (malformed document) | Pipeline error log; dead letter queue | Retry with fallback OCR; manual review flag | Alert ingestion ops team |
| Schema validation rejection (ontology violation) | Graph writer rejects write; error logged | Return error to upstream with violation details; route to ontology change process | Ontology governance review |
| Entity resolution conflict (ambiguous merge) | Confidence below merge threshold | Route to human review queue | SME review within SLA |
| Graph database write failure | Graph writer exception | Retry ×3 with exponential backoff; dead letter queue after final failure | PagerDuty alert; DBA on-call |
| Stale source document (past expiry) | Quality monitor scheduled job | Flag node as stale; remove from AI context until refreshed | Document owner notified for refresh |
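The write-failure recovery in the table (three retries with exponential backoff, then dead letter queue) might look like the sketch below. The function name, the `ConnectionError` failure type, the 0.5-second base delay, and the list used as a dead letter queue are assumptions.

```python
import time


def write_with_retry(write, triple, dead_letter, max_retries=3, base_delay=0.5,
                     sleep=time.sleep):
    """Try once, then retry up to `max_retries` times with exponential backoff.

    Returns True on success. On final failure the triple is appended to
    `dead_letter` and False is returned.
    """
    for attempt in range(max_retries + 1):
        try:
            write(triple)
            return True
        except ConnectionError:
            if attempt == max_retries:
                break
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s with the defaults
    dead_letter.append(triple)
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays; production code would typically also add jitter to the backoff.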
8. Security Considerations
8.1 Authentication and Authorisation
All graph query API endpoints require service-to-service authentication via mTLS or OAuth 2.0 client credentials. Human-facing interfaces (curation UI, validation queue, quality dashboard) require MFA-enabled SSO. Graph access is controlled at node and edge level: nodes and edges carry data classification labels that are enforced by the query API layer. For example, an application with Internal clearance cannot traverse edges labelled Confidential.
8.2 Secrets Management
Graph database credentials, NLP API keys, and CDC pipeline credentials are stored in a secrets vault (HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault). No credentials appear in application code or configuration files committed to source control. Secrets are rotated on a 90-day schedule with zero-downtime rotation procedures documented and tested.
8.3 Data Classification
Each node and edge in the graph carries a data classification label (Public, Internal, Confidential, Restricted). Classification is inherited from the source document's classification label. AI applications receive only nodes and edges at or below their authorised classification level. Cross-classification graph paths (a path that traverses a Restricted edge to reach a Public conclusion) trigger a security review before the traversal is permitted.
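Because the four labels are ordered, "at or below the authorised level" reduces to an integer comparison. The sketch below assumes dictionary-shaped graph elements with a `classification` key (hypothetical). It also simplifies cross-classification paths to a plain deny, where the pattern calls for a security review.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def visible(items, clearance):
    """Keep only nodes or edges at or below the caller's clearance."""
    return [i for i in items if i["classification"] <= clearance]


def path_allowed(path_edges, clearance):
    """A path is traversable only if every edge on it is within clearance."""
    return all(e["classification"] <= clearance for e in path_edges)


edges = [
    {"id": "e1", "classification": Classification.PUBLIC},
    {"id": "e2", "classification": Classification.CONFIDENTIAL},
]
internal_view = [e["id"] for e in visible(edges, Classification.INTERNAL)]
```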
8.4 Encryption
Data at rest in the graph database is encrypted using AES-256. Data in transit between all components uses TLS 1.3 minimum. Backup snapshots are encrypted with customer-managed keys (CMK). The encryption key lifecycle is managed independently of the graph database service.
8.5 Auditability
Every write to the graph database is logged with: actor identity, source document, timestamp, confidence score, and operation type (create/update/delete/merge). Audit logs are immutable (append-only) and retained for the regulatory retention period of the organisation (minimum 7 years for financial services). AI application queries against the graph are logged with the application identity, query text, and result set hash.
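One way to produce the query audit record is to hash a canonical JSON encoding of the result set, so that the log proves what was returned without storing the content. The field names and the choice of SHA-256 are assumptions in this sketch.

```python
import hashlib
import json
from datetime import datetime, timezone


def result_set_hash(rows: list[dict]) -> str:
    """Hash a canonical JSON encoding so equal result sets give equal digests."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def query_audit_record(app_id: str, query_text: str, rows: list[dict]) -> dict:
    """Build one log entry for a query against the graph."""
    return {
        "application_id": app_id,
        "query_text": query_text,
        "result_set_hash": result_set_hash(rows),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Key order inside each row does not affect the digest, but row order does, so queries would need a deterministic ordering for hashes to be comparable across runs.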
8.6 OWASP LLM Top 10 Mapping
| OWASP LLM Risk | Relevance to EKG | Mitigation |
|---|---|---|
| LLM01 Prompt Injection | Adversarial document content could inject instructions into NLP extraction pipeline | Input sanitisation before NLP processing; NLP models do not execute instructions |
| LLM02 Insecure Output Handling | Graph-derived content passed to LLM could contain malicious content | Sanitise graph node content before including in LLM prompt; structured output parsing only |
| LLM03 Training Data Poisoning | Malicious documents ingested into graph poison downstream AI responses | Document approval workflow; source authentication; confidence scoring surfaces anomalies |
| LLM04 Model Denial of Service | Adversarially complex graph traversal queries could exhaust compute | Query complexity limits; traversal depth caps; rate limiting on graph query API |
| LLM05 Supply Chain Vulnerabilities | NLP model dependencies could introduce vulnerabilities | Signed model artefacts; model provenance registry; dependency scanning in CI/CD |
| LLM06 Sensitive Information Disclosure | Graph traversal could expose relationships across classification levels | Node- and edge-level access control; classification-aware query layer |
| LLM07 Insecure Plugin Design | External knowledge sources plugged into the graph via APIs could be compromised | API source authentication; input validation on all external data; schema validation at ingest |
| LLM08 Excessive Agency | AI applications given graph write access could modify authoritative knowledge | Graph write access restricted to ingestion pipeline service accounts only; AI apps have read-only access |
| LLM09 Overreliance | AI answers derived from stale or low-confidence graph data presented as authoritative | Confidence scores surfaced to end users; staleness flags; human review before high-stakes use |
| LLM10 Model Theft | Graph database contains proprietary enterprise knowledge — theft is equivalent to model theft | Network isolation; encrypted backups; DLP controls on graph export endpoints |
9. Governance Considerations
9.1 Responsible AI
The knowledge graph is an AI artefact that encodes enterprise facts. Bias can enter via selective ingestion (if certain document types, geographies, or business units are over- or under-represented). A coverage audit is performed quarterly: which domains have strong graph coverage, which have gaps? Gaps are documented and ingestion is prioritised accordingly. Any domain where the graph is used to make consequential decisions must have designated human review as a backstop.
9.2 Model Risk Management
The NLP extraction models (NER, RE) are subject to the same model risk management process as predictive models. Each extraction model has a model card documenting training data, performance metrics on enterprise-domain validation sets, known failure modes, and refresh schedule. Model performance is monitored in production via a golden validation set — if extraction recall drops below threshold, a model retraining or replacement is triggered.
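The golden-set check amounts to computing recall over a fixed set of expected triples and comparing it with the thresholds in section 10.1 (SLO of 0.85, alert below 0.80). The status names in this sketch are hypothetical.

```python
RECALL_SLO = 0.85      # SLO target from section 10.1
RECALL_ALERT = 0.80    # alerting threshold from section 10.1


def recall(extracted: set, golden: set) -> float:
    """Fraction of golden triples that the model actually extracted."""
    if not golden:
        raise ValueError("golden set must not be empty")
    return len(extracted & golden) / len(golden)


def model_status(extracted: set, golden: set) -> str:
    """Map recall on the golden set to an action."""
    r = recall(extracted, golden)
    if r >= RECALL_SLO:
        return "ok"
    if r >= RECALL_ALERT:
        return "below_slo"
    return "retrain_or_replace"


golden = {("Policy-%d" % i, "APPLIES_TO", "Product-X") for i in range(10)}
extracted = set(list(golden)[:7]) | {("Policy-99", "APPLIES_TO", "Product-X")}
status = model_status(extracted, golden)
```

Spurious extractions such as the `Policy-99` triple do not change recall; they would be caught by a separate precision measure.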
9.3 Human Approval Gates
Consequential ontology changes (adding a new entity type, deprecating a relationship type, merging two entity classes) require domain expert sign-off and a minimum review period of 5 business days. Automated entity merges above the high-confidence threshold are permitted but are logged and auditable. Human validation queues for medium-confidence triples must be cleared within a defined SLA (e.g., 5 business days) to prevent knowledge staleness.
9.4 Policy Ownership
Each knowledge domain in the graph has a designated Data Steward who owns the quality, accuracy, and freshness of that domain's nodes and edges. The Data Steward approves new ingestion sources for their domain and is notified of staleness alerts. An ontology governance committee (cross-domain) resolves conflicts when two domain stewards disagree on a shared entity or relationship type.
9.5 Traceability
Every node and edge in the graph maintains full provenance: source document ID, source system, extraction method (NLP/structured/manual), extractor version, confidence score, human validator ID (if applicable), and effective date range. This provenance chain is the foundation for AI answer explainability (see EAAPL-KNW005).
9.6 Governance Artefacts
| Artefact | Owner | Frequency | Location |
|---|---|---|---|
| Ontology specification (OWL/schema) | Ontology Governance Committee | Updated per change | Version-controlled schema repository |
| Domain coverage audit report | Data Stewards + Data Office | Quarterly | Data governance platform |
| NLP model cards | ML Engineering | Per model version | ML model registry |
| Entity resolution configuration | Data Engineering | Per major configuration change | IaC repository |
| Ingestion source register | Domain Data Stewards | Updated per new source | Data catalogue |
| Human validation queue SLA report | Data Governance | Monthly | Governance dashboard |
10. Operational Considerations
10.1 Monitoring and SLOs
| Metric | SLO Target | Alerting Threshold | Tool |
|---|---|---|---|
| Graph query API p99 latency | ≤200ms for simple traversal; ≤2s for multi-hop | >500ms p99 for simple traversal over 5 min | Prometheus + Grafana |
| Ingestion pipeline lag (CDC) | ≤5 min from source change to graph update | >15 min lag | Kafka consumer lag metrics |
| Entity resolution queue depth | ≤500 pending merges | >2,000 pending | Custom metric + alert |
| Human validation queue SLA | 100% cleared within 5 business days | Any item >3 days | Workflow system alert |
| Graph database availability | 99.9% | <99.5% over 1-hour window | Cloud provider health checks |
| Extraction model recall (golden set) | ≥0.85 recall on golden validation set | <0.80 recall | Scheduled evaluation job |
10.2 Logging
All graph write operations, query API calls, entity resolution decisions, and human validation actions are logged with structured JSON to a centralised log aggregation platform (Splunk, Elastic, CloudWatch Logs). Log retention: 90 days hot, 7 years cold archive. Sensitive node content is masked in logs; only node IDs and metadata are logged.
10.3 Incident Management
P1 incidents (graph database unavailable, ingestion pipeline halted) trigger immediate on-call escalation with a 15-minute response SLA. P2 incidents (entity resolution backlog exceeds threshold, extraction model performance degradation) are addressed within 4 business hours. Incident post-mortems are conducted for all P1 and P2 incidents and findings are reviewed by the ontology governance committee.
10.4 Disaster Recovery
| Scenario | RTO | RPO | Recovery Procedure |
|---|---|---|---|
| Graph database node failure | 5 min (failover to replica) | 0 (synchronous replication) | Automatic failover; validate with health check query |
| Graph database corruption | 4 hours | 15 min (from last backup) | Restore from most recent validated backup; replay CDC from checkpoint |
| Ingestion pipeline failure | 30 min | 5 min (CDC offset retention) | Restart pipeline; replay from last committed offset |
| Regional cloud outage | 4 hours | 1 hour | Promote DR region replica; update DNS; validate query functionality |
10.5 Capacity Planning
Graph database storage grows at approximately 10–50 GB per million nodes (depending on property richness). Query throughput scales with read replica count. Plan for 3× initial storage capacity to accommodate growth and index overhead. NLP extraction compute is bursty — autoscaling worker pools are preferred over fixed allocation.
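As a worked example of the sizing rule above, the following sketch applies the 10–50 GB per million nodes range and the 3× headroom factor to a given node count. The function name and return shape are illustrative only.

```python
def storage_plan_gb(node_count, gb_per_million_low=10, gb_per_million_high=50,
                    headroom=3):
    """Estimate initial and provisioned storage as (low, high) ranges in GB."""
    millions = node_count / 1_000_000
    low = millions * gb_per_million_low
    high = millions * gb_per_million_high
    return {
        "initial_gb": (low, high),
        "provisioned_gb": (low * headroom, high * headroom),
    }


plan = storage_plan_gb(10_000_000)
```

For a 10-million-node graph this gives an initial footprint of 100 to 500 GB and a provisioning target of 300 to 1,500 GB.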
11. Cost Considerations
11.1 Cost Drivers
| Cost Driver | Description | Typical Range |
|---|---|---|
| Graph database licensing/hosting | Neo4j Enterprise licence or managed service (Neptune/Cosmos DB) | $2,000–$20,000/month depending on instance size |
| NLP model inference | Document extraction at ingestion; billed per document or per compute hour | $0.001–$0.01 per document page |
| Graph query compute | CPU/memory for traversal queries; scales with query volume and complexity | $500–$5,000/month |
| Human validation labour | Data stewards and domain experts reviewing medium-confidence extractions | $5,000–$30,000/month depending on ingestion volume |
| Vector store (for hybrid RAG) | Companion vector DB for hybrid retrieval mode | $500–$3,000/month |
| Engineering and operations | FTE cost for graph engineers, ontology maintainers | 1–3 FTE at senior level |
11.2 Scaling Risks
- NLP extraction cost grows linearly with document volume; large document libraries (>1M documents) require batching strategies and model efficiency optimisation
- Human validation queue is the primary bottleneck at scale; automation improvements (better extraction models, and auto-ingest thresholds lowered only where measured accuracy allows) are needed so that validation labour does not grow in step with ingestion volume
- Graph query complexity (deep multi-hop traversals) can cause latency and cost spikes; query depth limits and result caching are essential
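A traversal depth limit can be illustrated with a breadth-first search that stops expanding once a hop budget is reached. The adjacency-dictionary representation is an assumption; a graph database would normally enforce the cap in the query itself or at the query API.

```python
from collections import deque


def traverse(adjacency: dict, start: str, max_depth: int) -> dict:
    """Breadth-first traversal that never expands nodes beyond `max_depth` hops.

    Returns a mapping of reachable node -> hop distance from `start`.
    """
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue  # hop budget exhausted on this branch
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


adjacency = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}
within_two_hops = traverse(adjacency, "A", max_depth=2)
```

The `seen` map also guards against cycles, so the traversal terminates even without the depth cap; the cap bounds cost rather than correctness.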
11.3 Optimisations
- Cache frequent graph traversal results (TTL aligned to update frequency of those node types)
- Segment graph into hot (frequently traversed) and cold (archival) partitions; hot partition on in-memory or SSD-backed storage
- Use embedding-based pre-filtering to reduce traversal space before deep graph queries
- Batch document ingestion during off-peak hours to leverage lower spot/preemptible compute pricing
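The first optimisation, caching traversal results with a time-to-live, could be sketched as below. The `TTLCache` class is hypothetical and deliberately minimal (no size bound, no invalidation on graph writes); the injectable clock exists only to make expiry testable.

```python
import time


class TTLCache:
    """Cache computed values for `ttl_seconds`, then recompute on next access."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

A caller would key the cache on the query text plus its parameters and choose the TTL per node type, shorter for fast-changing domains.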
11.4 Indicative Cost Ranges
| Organisation Scale | Monthly Infrastructure Cost | Annual Total Cost (incl. labour) |
|---|---|---|
| Mid-market (100K nodes, 10 AI apps) | $5,000–$15,000 | $200,000–$400,000 |
| Enterprise (10M nodes, 50+ AI apps) | $30,000–$80,000 | $800,000–$2,000,000 |
| Large Enterprise (100M+ nodes, enterprise-wide) | $100,000–$300,000 | $3,000,000–$8,000,000 |
12. Trade-Off Analysis
12.1 Graph Database Technology Options
| Option | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Neo4j Enterprise | Richest Cypher query language; mature ecosystem; strong OLTP performance; native graph storage | Licence cost; self-managed complexity; limited native analytics | Complex relationship traversal; enterprise OLTP workloads |
| Amazon Neptune | Fully managed; SPARQL + Gremlin support; native AWS integration; high availability built-in | Higher latency than self-managed Neo4j; limited Cypher support; AWS lock-in | AWS-native organisations; reduced operational overhead priority |
| Azure Cosmos DB (Gremlin) | Globally distributed; multi-model (also supports SQL API); Azure AD integration | Gremlin is less expressive than Cypher; higher latency for complex traversals; cost at scale | Azure-native organisations; global distribution requirement |
| TigerGraph | Superior analytics/OLAP graph workloads; GSQL for complex algorithms; handles very large graphs | Steeper learning curve; smaller ecosystem; higher upfront cost | Fraud detection; large-scale analytics-heavy graph workloads |
| PostgreSQL with a graph extension (e.g. Apache AGE) + pgvector | Low operational overhead; existing PostgreSQL skills; cost-effective | Limited graph query expressiveness; does not scale to enterprise graph sizes | Small graphs (<1M nodes) embedded in existing PostgreSQL estate |
12.2 Architectural Tensions
| Tension | Option A | Option B | Recommended Resolution |
|---|---|---|---|
| Ingestion automation vs. accuracy | Maximise auto-ingestion (lower confidence thresholds) for speed and coverage | Maximise human validation for accuracy at cost of speed | Domain-dependent: high-stakes domains (legal, medical) require higher human validation rate; informational domains can accept higher auto-ingestion |
| Graph schema rigidity vs. flexibility | Strict ontology enforcement (rejects any node/edge not in ontology) | Schema-optional property graph (allows ad-hoc properties) | Hybrid: core entity types and relationships are ontology-enforced; optional properties are schema-flexible with metadata tagging |
| Graph freshness vs. consistency | Near-real-time ingestion (CDC; minutes lag) for freshness | Batch ingestion with consistency verification | Critical domains: near-real-time; analytical domains: batch acceptable |
| In-house vs. managed graph service | Self-managed Neo4j for maximum control and query performance | Managed service (Neptune/Cosmos DB) for reduced ops burden | Organisations without dedicated graph DBA capability should use managed services despite performance trade-off |
13. Failure Modes
| Failure | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Ontology drift (graph evolves without ontology update) | Medium | High — AI answers become inconsistent with business reality | Schema validation failure rate increase; data steward complaints | Ontology audit; schema migration to realign; tighten change control process |
| Entity resolution false merge (two distinct entities merged incorrectly) | Medium | High — all AI answers about the merged entity are incorrect | User-reported AI errors; data steward audit | Demerge operation; add negative match rule; re-extract affected documents |
| NLP extraction model degradation (new document types not in training set) | Medium | Medium — knowledge gaps for new document types | Recall drop on golden validation set | Model fine-tuning on new document examples; temporary manual curation fallback |
| Circular relationship injection (ingestion creates graph cycle where none should exist) | Low | Medium — graph traversal may loop; AI reasoning corrupted | Graph integrity check job detects cycles | Remove offending edges; add cycle-detection validation to ingestion |
| Data steward turnover (domain knowledge lost when steward leaves) | High | High — ontology maintenance and validation quality drops | Validation queue SLA misses; ontology change requests unanswered | Documented ontology rationale per entity/relationship; successor handover process |
| Graph database replication lag during peak ingestion | Medium | Low — briefly stale reads from replicas | Replica lag metric alert | Reduce ingestion batch size; scale ingestion workers; route time-sensitive reads to primary |
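The graph integrity check for circular relationships can be sketched as a depth-first search with three node states. It assumes the edges of a relationship type that should be acyclic (for example a hypothetical SUPERSEDES relation) have been loaded into an adjacency dictionary. The recursive form is fine for illustration, but very deep graphs would need an iterative version.

```python
def find_cycle(adjacency: dict):
    """Return one cycle as a list of nodes, or None if the directed graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored
    colour = {n: WHITE for n in adjacency}
    for targets in adjacency.values():
        for t in targets:
            colour.setdefault(t, WHITE)
    stack = []

    def visit(node):
        colour[node] = GREY
        stack.append(node)
        for nxt in adjacency.get(node, []):
            if colour[nxt] == GREY:  # back edge: nxt is already on the current path
                return stack[stack.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


cyclic = find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
acyclic = find_cycle({"A": ["B", "C"], "B": ["C"], "C": []})
```

The returned path identifies the offending edges, which supports the recovery step of removing them.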
13.1 Cascading Failure Scenarios
Scenario 1: Ontology Breaking Change Cascade. An ontology change renames a core entity type without a migration. The NLP extraction pipeline keeps writing new nodes under the old type name (type mismatch). Entity resolution stops matching new nodes to existing ones because their types differ. AI applications receive duplicate entities and produce contradictory answers. The human validation queue floods. Resolution requires: freeze ingestion, apply schema migration, re-run entity resolution, validate AI outputs.
Scenario 2: Bad Batch Ingestion Cascade. A malformed data export from an ERP system is ingested without sufficient validation. 50,000 incorrect relationship records are written to the graph. AI applications begin producing wrong answers for all queries touching those relationships. Detection is delayed because monitoring covers latency and availability but not answer correctness. Resolution requires: identify and roll back the ingestion batch; purge affected edges; re-ingest from correct source; add data validation pre-check to prevent recurrence.
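Rolling back a bad batch is far simpler if every edge records which ingestion batch wrote it. The sketch below assumes a `batch_id` field on each edge, which the pattern does not list explicitly among its provenance fields, and soft-deletes by setting `deleted_at` so that version history is preserved.

```python
from datetime import datetime, timezone


def roll_back_batch(edges: list[dict], batch_id: str, now=None) -> int:
    """Soft-delete every live edge written by `batch_id`; return how many were purged."""
    now = now or datetime.now(timezone.utc)
    purged = 0
    for edge in edges:
        if edge["batch_id"] == batch_id and edge["deleted_at"] is None:
            edge["deleted_at"] = now
            purged += 1
    return purged
```

Because the operation only touches live edges, running it twice for the same batch is harmless: the second call purges nothing.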
14. Regulatory Considerations
| Regulation | Relevant Clause | Requirement | How EKG Addresses It |
|---|---|---|---|
| APRA CPS 230 (Operational Risk Management) | CPS 230 §36–§38 | Material service providers and critical data must have documented recovery capability | Graph database DR procedures; RTO/RPO defined and tested; backup validation |
| APRA CPS 234 (Information Security) | CPS 234 §15–§17 | Information assets classified and protected proportionate to criticality | Node-level data classification; encryption at rest and in transit; access control per classification |
| Australian Privacy Act 1988 | APP 11 (Security of Personal Information) | Personal information must be protected from misuse and unauthorised access | PII nodes tagged and access-controlled; audit log of all PII node access; right to erasure procedures |
| EU AI Act | Article 13 (Transparency) | High-risk AI systems must be designed to allow transparency of operation | Knowledge graph provenance chain enables AI answer traceability to source facts |
| EU AI Act | Article 14 (Human Oversight) | High-risk AI systems must allow human oversight and intervention | Human validation queues; confidence scores surfaced; graph corrections possible by authorised stewards |
| ISO/IEC 42001 | §6.1 (Risk Assessment) | AI management system must document AI-related risks and controls | Risk register includes NLP extraction risk, entity resolution risk, ontology drift risk |
| NIST AI RMF | GOVERN 1.1, MAP 1.5 | AI risks identified and assigned to organisational roles | Data steward ownership model maps to RMF GOVERN function |
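The node-level classification and PII audit-log controls listed in the table can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the classification levels, the dictionary shape of a node, and the in-memory `audit_log` standing in for an append-only audit store. Treating PII as the most sensitive level on a single linear scale is also a simplification.

```python
from datetime import datetime, timezone

# Hypothetical classification levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2, "pii": 3}

# Stand-in for an append-only audit store (assumption for this sketch).
audit_log = []


def read_node(node, user, clearance):
    """Return node properties if the caller's clearance covers the node."""
    allowed = LEVELS[clearance] >= LEVELS[node["classification"]]
    # Record every attempt on a PII node, whether or not it succeeds.
    if node["classification"] == "pii":
        audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "node_id": node["id"],
            "allowed": allowed,
        })
    if not allowed:
        raise PermissionError(f"{user} may not read node {node['id']}")
    return node["properties"]
```

Logging denied attempts as well as successful reads is a design choice in this sketch; it gives reviewers evidence of attempted access, not only of granted access.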
15. Reference Implementations
15.1 AWS
| Component | AWS Service |
|---|---|
| Graph database | Amazon Neptune (serverless for variable workloads) |
| NLP extraction | Amazon Comprehend + custom SageMaker NER models |
| Document ingestion queue | Amazon SQS + S3 trigger |
| CDC from RDS | AWS Database Migration Service + Amazon Kinesis |
| Secrets management | AWS Secrets Manager |
| Human validation workflow | AWS Step Functions + custom UI |
| Monitoring | Amazon CloudWatch + Amazon Managed Grafana |
| Hybrid RAG | Amazon Bedrock Knowledge Bases (Neptune integration) |
15.2 Azure
| Component | Azure Service |
|---|---|
| Graph database | Azure Cosmos DB for Apache Gremlin |
| NLP extraction | Azure AI Language (NER + relation extraction) |
| Document ingestion | Azure Event Hubs + Blob Storage trigger |
| CDC from SQL databases | Azure Data Factory CDC |
| Secrets management | Azure Key Vault |
| Human validation workflow | Azure Logic Apps + Power Apps |
| Monitoring | Azure Monitor + Azure Managed Grafana |
| Hybrid RAG | Azure AI Search (graph + vector hybrid) |
15.3 GCP
| Component | GCP Service |
|---|---|
| Graph database | Neo4j on GKE or managed Neo4j Aura |
| NLP extraction | Google Cloud Natural Language API + Vertex AI custom models |
| Document ingestion | Cloud Pub/Sub + Cloud Storage trigger |
| Secrets management | Google Cloud Secret Manager |
| Monitoring | Google Cloud Monitoring + Grafana |
| Hybrid RAG | Vertex AI Search + custom graph context enrichment |
15.4 On-Premises
| Component | Technology |
|---|---|
| Graph database | Neo4j Enterprise or TigerGraph on-prem |
| NLP extraction | Hugging Face models on GPU servers; spaCy pipeline |
| Document ingestion | Apache Kafka + custom Spark pipeline |
| Secrets management | HashiCorp Vault |
| Monitoring | Prometheus + Grafana |
| Hybrid RAG | LangChain + Weaviate or Qdrant |
16. Related Patterns
| Pattern ID | Pattern Name | Relationship Type | Notes |
|---|---|---|---|
| EAAPL-KNW002 | Semantic Data Layer | Complementary | Semantic layer provides the business ontology that the EKG implements; together they enable natural-language data access |
| EAAPL-KNW003 | AI Knowledge Corpus Management | Complementary | Corpus management governs the documents that feed the NLP extraction pipeline |
| EAAPL-KNW005 | Knowledge Graph for Explainability | Extension | EKG is the substrate; KNW005 adds the explainability presentation layer |
| EAAPL-KNW006 | Corpus Quality Assurance | Dependency | Quality assurance must run on documents before they enter NLP extraction |
| EAAPL-RAG001 | Retrieval Augmented Generation | Consumer | RAG pattern consumes the knowledge graph via hybrid retrieval mode |
| EAAPL-GOV002 | AI Model Risk Management | Governance | NLP extraction models within EKG are subject to model risk management |
17. Maturity Assessment
Overall Maturity Label: Proven
| Dimension | Score (1–5) | Rationale |
|---|---|---|
| Technology readiness | 4 | Graph databases, NLP extraction, and hybrid RAG are all production-proven; tooling is mature |
| Organisational capability | 2 | Most enterprises lack dedicated graph engineers and ontology governance experience — this is the primary constraint |
| Standards availability | 3 | OWL, RDF, SPARQL are mature W3C standards; the property graph query language GQL (ISO/IEC 39075:2024) was published only in 2024 and vendor support is still maturing |
| Vendor ecosystem | 4 | Multiple mature commercial and open-source vendors; managed cloud services available on all major clouds |
| Case evidence | 4 | Strong evidence from financial services (Goldman Sachs KG), healthcare, and tech companies; patterns well-documented |
| Regulatory alignment | 4 | EU AI Act and SR 11-7 requirements are well-addressed by the provenance and explainability capabilities |
| Overall | 3.5 / 5 | Proven pattern with high technology readiness; the primary constraint is the organisational capability uplift required |
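The table does not state how the overall score is derived. Assuming it is an unweighted mean of the six dimension scores, the short check below reproduces the published figure and identifies the lowest-scoring dimension.

```python
from statistics import mean

# Dimension scores copied from the maturity table above.
scores = {
    "Technology readiness": 4,
    "Organisational capability": 2,
    "Standards availability": 3,
    "Vendor ecosystem": 4,
    "Case evidence": 4,
    "Regulatory alignment": 4,
}

# Assumption: the overall score is an unweighted mean of the dimensions.
overall = mean(list(scores.values()))
weakest = min(scores, key=scores.get)
print(f"{overall:.1f} / 5, weakest dimension: {weakest}")
```

A weighted scheme would give a different result, so the agreement here only shows that the published 3.5 is consistent with equal weighting.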
18. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-06-12 | EAAPL Editorial Board | Initial publication — covers ontology design, ingestion pipelines, graph DB selection, versioning, quality management, and RAG integration |