
EAAPL-KNW001: Enterprise Knowledge Graph

Pattern ID: EAAPL-KNW001
Status: Proven
Complexity: High
Tags: knowledge-graph embedding enterprise-only high-complexity
Version: 1.0
Last Updated: 2026-06-12

1. Executive Summary

An Enterprise Knowledge Graph (EKG) is a persistent, curated, machine-readable representation of an organisation's entities, relationships, and facts. Unlike a document corpus, a knowledge graph stores structured semantic relationships that AI applications can traverse, reason over, and cite precisely.

This pattern covers the full lifecycle: ontology design via domain expert workshops; ingestion pipelines that combine NLP-extracted facts with structured database mappings; graph database selection aligned to workload profile; versioning and rollback; entity resolution and quality management; and integration with RAG pipelines where graph traversal augments vector similarity search.

For a CIO/CTO audience, the primary value proposition is twofold. First, the EKG becomes the enterprise's durable, governed knowledge asset — independent of any single AI vendor or model. Second, AI applications powered by an EKG produce answers that are traceable, auditable, and correctable because every claim maps back to a specific node, edge, and source document. This directly addresses model risk and regulatory explainability requirements without requiring per-answer human review.

Typical ROI realisation occurs within 12–18 months for organisations with mature data catalogues and identifiable high-value knowledge domains such as compliance, product, or customer relationship data.

2. Problem Statement

2.1 Business Problem

Enterprise knowledge is fragmented across wikis, SharePoint sites, ERP systems, policy repositories, and individual email threads. AI applications built on top of this fragmentation inherit its inconsistencies — two AI answers to the same question may contradict each other depending on which documents were retrieved. Business users lose trust quickly. Compliance and legal teams cannot accept AI outputs that cannot be audited back to authoritative sources.

2.2 Technical Problem

Large Language Models have no persistent memory of enterprise-specific entities or their relationships. Vector search retrieves semantically similar passages but cannot answer multi-hop relational questions such as "Which compliance policies apply to product X sold in jurisdiction Y to customer class Z?" without explicit relationship traversal. Retrieval-augmented generation on raw documents alone cannot guarantee answer consistency when the same fact appears in multiple, slightly different forms across documents.

2.3 Symptoms

AI answers contradict each other for equivalent questions posed in different phrasings
AI cannot answer cross-entity relational queries reliably
Regulatory auditors cannot obtain a complete audit trail for AI-generated decisions
Duplicate entities (same person, product, or policy represented multiple times) cause incorrect AI behaviour
Knowledge updates (e.g., a policy change) do not propagate consistently to AI responses

2.4 Cost of Inaction

Regulatory non-compliance risk in high-stakes domains (financial advice, medical, legal)
AI adoption stalls due to trust deficit — business users revert to manual processes
Knowledge fragmentation compounds: each new AI application builds its own ad-hoc corpus, creating N siloed knowledge stores instead of one governed asset
Entity deduplication effort grows super-linearly as data volumes increase without a resolution strategy

3. Context

3.1 When to Apply

Organisation has ≥3 distinct knowledge domains (e.g., product, customer, compliance, HR) that AI applications must reason across
Answers require multi-hop relational reasoning, not just passage retrieval
Regulatory or audit requirements demand traceable reasoning chains
Persistent entity identity matters (same customer, product, or policy referenced consistently)
Organisation has sufficient data engineering maturity to operate a graph database in production

3.2 When NOT to Apply

Single-domain, single-document-type RAG use cases (plain vector RAG is simpler and sufficient)
Organisations without a data governance function — ontology without governance degrades rapidly
Proof-of-concept or MVP phases — graph infrastructure investment is only justified at production scale
Highly dynamic knowledge where relationships change faster than graph update pipelines can process (sub-minute freshness requirement)

3.3 Prerequisites

Functioning data catalogue with documented data domains
At least one domain expert per knowledge domain available for ontology workshops
Master data management (MDM) or entity resolution capability for at least one entity type
Engineering team with graph database operational experience or vendor-managed service option

3.4 Industry Applicability

Industry	Applicability	Primary Use Case
Financial Services	High	Regulatory compliance, customer 360, product eligibility
Healthcare	High	Clinical pathways, drug interactions, patient history
Manufacturing	High	Product genealogy, supplier relationships, maintenance knowledge
Legal / Professional Services	High	Case precedent, contract clause relationships, jurisdiction mapping
Retail / CPG	Medium	Product taxonomy, supplier network, customer segmentation
Government	High	Policy cross-reference, citizen services, regulatory mapping

4. Architecture Overview

The Enterprise Knowledge Graph architecture is organised into four layers (Ingestion, Graph Store, Quality Management, and AI Integration), with ontology governance applied across all of them.

4.1 Ingestion Layer

Knowledge enters the graph through three parallel pipelines.

NLP Extraction Pipeline processes unstructured documents — policy PDFs, contracts, technical specifications, wiki pages. A document pre-processor performs OCR, layout analysis, and language detection. Named Entity Recognition (NER) identifies entity mentions: people, organisations, products, locations, dates, regulatory references. Relationship Extraction (RE) models identify semantic relationships between co-occurring entities. Coreference Resolution links pronoun and alias references to their canonical entity. The output is a set of candidate triples (subject, predicate, object) with confidence scores. Triples above a high-confidence threshold are auto-ingested; triples in the middle band enter a human validation queue; low-confidence triples are discarded.
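The three-band routing described above can be sketched as follows. This is a minimal illustration, not the pattern's prescribed implementation: the threshold values `AUTO_INGEST_MIN` and `REVIEW_MIN`, and the `Triple` and `Route` names, are hypothetical and would be tuned per domain.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO_INGEST = "auto_ingest"
    HUMAN_VALIDATION = "human_validation"
    DISCARD = "discard"


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    confidence: float  # extractor score in [0.0, 1.0]


# Hypothetical thresholds; the pattern does not prescribe values.
AUTO_INGEST_MIN = 0.90
REVIEW_MIN = 0.60


def route_triple(triple: Triple) -> Route:
    """Return the destination for a candidate triple based on its confidence."""
    if not 0.0 <= triple.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {triple.confidence}")
    if triple.confidence >= AUTO_INGEST_MIN:
        return Route.AUTO_INGEST
    if triple.confidence >= REVIEW_MIN:
        return Route.HUMAN_VALIDATION
    return Route.DISCARD
```

High-stakes domains would likely raise `AUTO_INGEST_MIN` so that more triples pass through human validation.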

Structured Data Mapping Pipeline ingests from databases, APIs, and data warehouses. A schema mapper translates relational tables to graph entities and edges using pre-defined mapping rules maintained in a mapping registry. Change data capture (CDC) propagates source changes to the graph incrementally, within a defined SLO (typically minutes for critical domains). Foreign key relationships become graph edges. Referential integrity is validated before loading.
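As a rough sketch of how a mapping rule might turn one relational row into a node plus edges, consider the following. The `ORDERS_MAPPING` entry, its keys, and the `orders` table are all hypothetical; a real mapping registry would define its own format.

```python
from typing import Any

# Hypothetical mapping-registry entry for an "orders" table.
ORDERS_MAPPING = {
    "node_label": "Order",
    "key_column": "order_id",
    "property_columns": ["status", "total"],
    # foreign-key column -> (edge type, target node label)
    "foreign_keys": {"customer_id": ("PLACED_BY", "Customer")},
}


def map_row(row: dict[str, Any], mapping: dict[str, Any]) -> tuple[dict, list[dict]]:
    """Translate one relational row into a graph node and its outgoing edges."""
    node_id = f'{mapping["node_label"]}:{row[mapping["key_column"]]}'
    node = {
        "id": node_id,
        "label": mapping["node_label"],
        "properties": {col: row[col] for col in mapping["property_columns"]},
    }
    edges = []
    for fk_column, (edge_type, target_label) in mapping["foreign_keys"].items():
        fk_value = row.get(fk_column)
        if fk_value is None:
            continue  # nullable foreign key: no edge to create
        edges.append(
            {"from": node_id, "type": edge_type, "to": f"{target_label}:{fk_value}"}
        )
    return node, edges


node, edges = map_row(
    {"order_id": 7, "status": "open", "total": 12.5, "customer_id": 42},
    ORDERS_MAPPING,
)
```

The referential integrity check mentioned above would then confirm that each edge target (here `Customer:42`) exists before the batch is loaded.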

Manual Curation Pipeline handles high-value facts that are too important to risk extraction errors — regulatory requirements, executive decisions, product pricing. Subject matter experts enter facts through a governed curation UI with mandatory source citation, effective date, and expiry date fields.

4.2 Graph Store Layer

The graph database stores nodes (entities), edges (relationships), and properties (attributes). The ontology — defined in OWL or a property graph schema — governs which node types, relationship types, and property names are valid. Schema validation is enforced at write time. The store maintains full version history: every node and edge has created_at, updated_at, deleted_at, and source_document_id fields. This enables point-in-time graph snapshots.
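One way to read "point-in-time snapshot" is as a filter over the version fields: an edge belongs to the snapshot if it was created on or before the requested time and had not yet been deleted. The sketch below assumes soft deletes via `deleted_at`; the `VersionedEdge` type and the sample data are hypothetical, and reconstructing earlier property values would need additional history that this example omits.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class VersionedEdge:
    source: str
    relation: str
    target: str
    created_at: datetime
    deleted_at: Optional[datetime] = None  # None means the edge is still live


def snapshot(edges: list[VersionedEdge], as_of: datetime) -> list[VersionedEdge]:
    """Return the edges that existed at `as_of`."""
    return [
        e
        for e in edges
        if e.created_at <= as_of and (e.deleted_at is None or e.deleted_at > as_of)
    ]


def day(d: int) -> datetime:
    return datetime(2025, 1, d, tzinfo=timezone.utc)


history = [
    VersionedEdge("Product X", "GOVERNED_BY", "Policy v1", day(1), deleted_at=day(10)),
    VersionedEdge("Product X", "GOVERNED_BY", "Policy v2", day(10)),
]
before_change = snapshot(history, day(5))   # contains only the Policy v1 edge
after_change = snapshot(history, day(10))   # contains only the Policy v2 edge
```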

4.3 Quality Management Layer

An entity resolution service runs continuously, identifying candidate duplicates using a configurable matching strategy (exact match on canonical identifiers, fuzzy match on names and attributes, embedding similarity for semantic duplicates). Duplicate candidates above a merge threshold are automatically merged; candidates in the uncertain band are routed to a human review queue. Each node and edge carries a confidence score that is propagated to any AI answer derived from it. A quality dashboard tracks entity count, duplicate rate, confidence distribution, and validation queue depth.
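The tiered matching strategy can be illustrated with a small sketch that tries an exact identifier match first and falls back to fuzzy name similarity. It uses Python's standard `difflib.SequenceMatcher` as a stand-in for a production matcher; the thresholds, the `canonical_id` and `name` field names, and the decision labels are assumptions, and the embedding-similarity tier is left out.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.92   # hypothetical: auto-merge at or above this score
REVIEW_THRESHOLD = 0.75  # hypothetical: human review at or above this score


def match_score(a: dict, b: dict) -> float:
    """Score two entity records: exact identifier match first, then fuzzy name match."""
    id_a, id_b = a.get("canonical_id"), b.get("canonical_id")
    if id_a and id_b:
        # Both records carry an authoritative identifier, so it decides the outcome.
        return 1.0 if id_a == id_b else 0.0
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(a: dict, b: dict) -> str:
    """Return 'merge', 'review', or 'distinct' for a candidate pair."""
    score = match_score(a, b)
    if score >= MERGE_THRESHOLD:
        return "merge"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "distinct"
```

Treating differing canonical identifiers as conclusive is a design choice that guards against false merges, which the failure-mode table rates as high impact.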

4.4 AI Integration Layer

AI applications query the graph in two modes. Direct graph traversal executes Cypher, SPARQL, or Gremlin queries when the application knows the specific relationship pattern it needs (e.g., "find all policies applicable to this product category in this jurisdiction"). Hybrid RAG combines vector retrieval with graph traversal: the vector store retrieves relevant document passages; the graph store enriches the context with structured relationships between entities mentioned in those passages; the LLM receives both unstructured passages and structured graph context. This hybrid approach targets the multi-hop questions that pure vector RAG handles poorly, because the relationships needed to answer them are supplied explicitly rather than inferred from passage similarity.
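The hybrid mode can be sketched with in-memory stand-ins for the two stores. Everything here is illustrative: the sample passage, the edge list, and the function names are hypothetical, entity linking is assumed to have already produced the set of mentioned entities, and only one-hop enrichment is shown.

```python
# Toy stand-ins for the vector store result and the graph store (hypothetical data).
RETRIEVED_PASSAGES = [
    "Product X is subject to the cooling-off policy in jurisdiction Y.",
]
GRAPH_EDGES = [
    ("Product X", "GOVERNED_BY", "Cooling-Off Policy"),
    ("Cooling-Off Policy", "APPLIES_IN", "Jurisdiction Y"),
    ("Product Q", "GOVERNED_BY", "Export Policy"),
]


def graph_context(entities: set[str], edges: list[tuple[str, str, str]]) -> list[str]:
    """Return a text rendering of every edge that touches a mentioned entity."""
    return [
        f"({s}) -[{p}]-> ({o})"
        for s, p, o in edges
        if s in entities or o in entities
    ]


def build_prompt(question: str, passages: list[str], facts: list[str]) -> str:
    """Assemble unstructured passages and structured facts into one LLM context."""
    return "\n".join(
        ["Passages:", *passages, "", "Graph facts:", *facts, "", f"Question: {question}"]
    )


facts = graph_context({"Product X"}, GRAPH_EDGES)
prompt = build_prompt(
    "Which policies apply to Product X?", RETRIEVED_PASSAGES, facts
)
```

A multi-hop variant would repeat the expansion from the newly reached entities, subject to a traversal depth cap.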

4.5 Ontology Governance

The ontology evolves through a formal change management process. Proposed changes go through a domain expert review, impact analysis (which existing nodes/edges would be affected), and approval. Schema migrations are versioned and applied through a controlled deployment pipeline rather than ad hoc.

5. Architecture Diagram


flowchart TD
    subgraph Ingestion["Ingestion Layer"]
        A[Unstructured Documents]
        B[Structured Databases]
        C[Manual Curation]
    end
    subgraph Store["Graph Store"]
        D{Confidence Router}
        E[(Graph Database)]
        F[Entity Resolution]
    end
    subgraph Integration["AI Integration"]
        G[Graph Traversal]
        H[Hybrid RAG]
        I[LLM Application]
    end
    A -->|NLP extraction| D
    B -->|schema mapping| D
    C -->|validated facts| D
    D -->|high confidence| E
    D -->|uncertain| F
    F -->|resolved| E
    E --> G
    E --> H
    G --> I
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#dbeafe,stroke:#3b82f6
    style C fill:#dbeafe,stroke:#3b82f6
    style D fill:#f3e8ff,stroke:#a855f7
    style E fill:#fef9c3,stroke:#eab308
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#f0fdf4,stroke:#22c55e
    style H fill:#f0fdf4,stroke:#22c55e
    style I fill:#d1fae5,stroke:#10b981

6. Components

Component	Type	Responsibility	Technology Options	Criticality
NLP Extraction Pipeline	Processing	NER, relationship extraction, coreference resolution from unstructured text	spaCy + custom models, AWS Comprehend, Azure AI Language, Hugging Face NLP	High
Schema Mapper	Processing	Translate relational schema to graph triples; CDC from source databases	Apache Kafka + custom mapper, Debezium CDC, AWS DMS	High
Curation UI	Application	Human entry of high-value facts with mandatory source citation	Custom React app, Stardog Designer, PoolParty	Medium
Graph Database	Storage	Store and serve nodes, edges, properties with full version history	Neo4j Enterprise, Amazon Neptune, Azure Cosmos DB Gremlin, TigerGraph	Critical
Ontology Engine	Governance	Enforce schema validity; manage ontology versions and change lifecycle	OWL ontologies via Protégé, Neo4j schema constraints, custom schema registry	High
Entity Resolution Service	Quality	Identify and merge duplicate entities across sources	Splink (probabilistic), OpenRefine, custom embedding-based matcher	High
Human Validation Queue	Workflow	Route medium-confidence triples and uncertain merge candidates to human reviewers	Custom workflow app, Jira-integrated task queue, Label Studio	Medium
Quality Dashboard	Observability	Monitor confidence distribution, coverage gaps, staleness, duplicate rate	Grafana + custom metrics, Tableau, Superset	Medium
Graph Query API	Integration	Expose graph queries to AI applications via REST/GraphQL	Neo4j Bolt, Neptune SPARQL endpoint, custom GraphQL wrapper	Critical
Hybrid RAG Orchestrator	Integration	Combine vector and graph retrieval into unified LLM context	LangChain graph retrievers, LlamaIndex KnowledgeGraphIndex, custom	High

7. Data Flow

7.1 Primary Data Flow — Document Ingestion to AI Query

Step	Actor	Action	Output
1	Document Source	Pushes new or updated document to ingestion queue	Document + metadata in queue
2	NLP Pipeline	Performs OCR, NER, RE, coreference resolution	Candidate triples with confidence scores
3	Confidence Router	Routes triples by confidence threshold	High → auto-ingest; medium → human validation queue; low → discard log
4	Entity Resolution	Checks candidate entities against existing graph nodes for duplicates	Merged entity or new entity node
5	Graph Writer	Writes validated triples to graph database with source provenance	Nodes and edges with version metadata
6	Ontology Validator	Validates new nodes/edges against ontology schema	Accepted or rejected with error code
7	AI Application	Issues graph traversal or hybrid RAG query	Structured query (Cypher/SPARQL)
8	Graph Query API	Executes query, returns nodes/edges with confidence and source metadata	Structured result set
9	LLM	Receives graph context + optional retrieved passages	AI answer with traceable evidence chain

7.2 Error Flow

Error	Detection	Recovery	Escalation
NLP extraction failure (malformed document)	Pipeline error log; dead letter queue	Retry with fallback OCR; manual review flag	Alert ingestion ops team
Schema validation rejection (ontology violation)	Graph writer rejects write; error logged	Return error to upstream with violation details; route to ontology change process	Ontology governance review
Entity resolution conflict (ambiguous merge)	Confidence below merge threshold	Route to human review queue	SME review within SLA
Graph database write failure	Graph writer exception; retry with backoff	Retry ×3 with exponential backoff; dead letter queue after	PagerDuty alert; DBA on-call
Stale source document (past expiry)	Quality monitor scheduled job	Flag node as stale; remove from AI context until refreshed	Document owner notified for refresh
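The graph write failure row (retry three times with exponential backoff, then dead-letter) might look like the sketch below. The base delay, the use of `ConnectionError` as the retryable exception, and the list standing in for a dead letter queue are assumptions made for illustration.

```python
import time
from typing import Callable

MAX_RETRIES = 3           # matches the three retries in the table above
BASE_DELAY_SECONDS = 0.5  # hypothetical starting delay


def write_with_retry(
    write: Callable[[dict], None],
    record: dict,
    dead_letters: list[dict],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Attempt a write, retrying up to MAX_RETRIES times with exponential backoff."""
    for attempt in range(MAX_RETRIES + 1):  # one initial attempt plus the retries
        try:
            write(record)
            return True
        except ConnectionError:
            if attempt == MAX_RETRIES:
                break
            sleep(BASE_DELAY_SECONDS * 2 ** attempt)  # 0.5s, then 1s, then 2s
    dead_letters.append(record)  # retries exhausted: park the record for replay
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays; production code would typically also add jitter so that many workers do not retry in lockstep.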

8. Security Considerations

8.1 Authentication and Authorisation

All graph query API endpoints require service-to-service authentication via mTLS or OAuth 2.0 client credentials. Human-facing interfaces (curation UI, validation queue, quality dashboard) require MFA-enabled SSO. Graph access is controlled at the element level: nodes and edges carry data classification labels that are enforced by the query API layer, so an application with "INTERNAL" clearance cannot traverse edges labelled "CONFIDENTIAL".

8.2 Secrets Management

Graph database credentials, NLP API keys, and CDC pipeline credentials are stored in a secrets vault (HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault). No credentials appear in application code or configuration files committed to source control. Secrets are rotated on a 90-day schedule with zero-downtime rotation procedures documented and tested.

8.3 Data Classification

Each node and edge in the graph carries a data classification label (Public, Internal, Confidential, Restricted). Classification is inherited from the source document's classification label. AI applications receive only nodes and edges at or below their authorised classification level. Cross-classification graph paths (a path that traverses a Restricted edge to reach a Public conclusion) trigger a security review before the traversal is permitted.
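A classification-aware filter could be sketched as an ordered enumeration plus two checks. The ordering follows the four labels named above; interpreting a "cross-classification path" as any path whose edges do not all share one level is an assumption of this sketch, as are the function and field names.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def level_of(element: dict) -> Classification:
    """Parse the element's classification label, e.g. 'Confidential'."""
    return Classification[element["classification"].upper()]


def visible_edges(edges: list[dict], clearance: Classification) -> list[dict]:
    """Keep only edges whose label is at or below the caller's clearance."""
    return [e for e in edges if level_of(e) <= clearance]


def path_needs_review(path: list[dict]) -> bool:
    """Flag a path that mixes classification levels (assumed trigger for review)."""
    return len({level_of(e) for e in path}) > 1
```

An unknown label raises `KeyError`, which fails closed rather than silently exposing an unclassified element.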

8.4 Encryption

Data at rest in the graph database is encrypted using AES-256. Data in transit between all components uses TLS 1.3 minimum. Backup snapshots are encrypted with customer-managed keys (CMK). The encryption key lifecycle is managed independently of the graph database service.

8.5 Auditability

Every write to the graph database is logged with: actor identity, source document, timestamp, confidence score, and operation type (create/update/delete/merge). Audit logs are immutable (append-only) and retained for the regulatory retention period of the organisation (minimum 7 years for financial services). AI application queries against the graph are logged with the application identity, query text, and result set hash.
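For the query-side audit entry, one possible shape is shown below: the result set is serialised to canonical JSON and hashed with SHA-256, so the log can prove what was returned without storing the content. The field names and the choice of SHA-256 are assumptions, not requirements stated by the pattern.

```python
import hashlib
import json
from datetime import datetime, timezone


def result_set_hash(rows: list[dict]) -> str:
    """Hash a result set deterministically: canonical JSON, then SHA-256."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def query_audit_record(app_id: str, query_text: str, rows: list[dict]) -> dict:
    """Build one append-only audit entry for a graph query."""
    return {
        "application_id": app_id,
        "query_text": query_text,
        "result_set_hash": result_set_hash(rows),
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

Sorting keys makes the hash independent of property order within a row; row order still affects the hash, so queries intended to be comparable should return rows in a defined order.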

8.6 OWASP LLM Top 10 Mapping (2023 list numbering)

OWASP LLM Risk	Relevance to EKG	Mitigation
LLM01 Prompt Injection	Adversarial document content could inject instructions into NLP extraction pipeline	Input sanitisation before NLP processing; NLP models do not execute instructions
LLM02 Insecure Output Handling	Graph-derived content passed to LLM could contain malicious content	Sanitise graph node content before including in LLM prompt; structured output parsing only
LLM03 Training Data Poisoning	Malicious documents ingested into graph poison downstream AI responses	Document approval workflow; source authentication; confidence scoring surfaces anomalies
LLM04 Model Denial of Service	Adversarially complex graph traversal queries could exhaust compute	Query complexity limits; traversal depth caps; rate limiting on graph query API
LLM05 Supply Chain Vulnerabilities	NLP model dependencies could introduce vulnerabilities	Signed model artefacts; model provenance registry; dependency scanning in CI/CD
LLM06 Sensitive Information Disclosure	Graph traversal could expose relationships across classification levels	Element-level security on graph nodes/edges; classification-aware query layer
LLM07 Insecure Plugin Design	External knowledge sources plugged into the graph via APIs could be compromised	API source authentication; input validation on all external data; schema validation at ingest
LLM08 Excessive Agency	AI applications given graph write access could modify authoritative knowledge	Graph write access restricted to ingestion pipeline service accounts only; AI apps have read-only access
LLM09 Overreliance	AI answers derived from stale or low-confidence graph data presented as authoritative	Confidence scores surfaced to end users; staleness flags; human review before high-stakes use
LLM10 Model Theft	Graph database contains proprietary enterprise knowledge — theft is equivalent to model theft	Network isolation; encrypted backups; DLP controls on graph export endpoints

9. Governance Considerations

9.1 Responsible AI

The knowledge graph is an AI artefact that encodes enterprise facts. Bias can enter via selective ingestion (if certain document types, geographies, or business units are over- or under-represented). A coverage audit is performed quarterly: which domains have strong graph coverage, which have gaps? Gaps are documented and ingestion is prioritised accordingly. Any domain where the graph is used to make consequential decisions must have designated human review as a backstop.

9.2 Model Risk Management

The NLP extraction models (NER, RE) are subject to the same model risk management process as predictive models. Each extraction model has a model card documenting training data, performance metrics on enterprise-domain validation sets, known failure modes, and refresh schedule. Model performance is monitored in production via a golden validation set — if extraction recall drops below threshold, a model retraining or replacement is triggered.

9.3 Human Approval Gates

Consequential ontology changes (adding a new entity type, deprecating a relationship type, merging two entity classes) require domain expert sign-off and a minimum review period of 5 business days. Automated entity merges above the high-confidence threshold are permitted but are logged and auditable. Human validation queues for medium-confidence triples must be cleared within a defined SLA (e.g., 5 business days) to prevent knowledge staleness.

9.4 Policy Ownership

Each knowledge domain in the graph has a designated Data Steward who owns the quality, accuracy, and freshness of that domain's nodes and edges. The Data Steward approves new ingestion sources for their domain and is notified of staleness alerts. An ontology governance committee (cross-domain) resolves conflicts when two domain stewards disagree on a shared entity or relationship type.

9.5 Traceability

Every node and edge in the graph maintains full provenance: source document ID, source system, extraction method (NLP/structured/manual), extractor version, confidence score, human validator ID (if applicable), and effective date range. This provenance chain is the foundation for AI answer explainability (see EAAPL-KNW005).
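The provenance fields listed above map naturally onto a small immutable record. The following is a sketch under assumed field names and an assumed set of extraction method codes; the actual storage shape would depend on the chosen graph database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

EXTRACTION_METHODS = {"nlp", "structured", "manual"}


@dataclass(frozen=True)
class Provenance:
    source_document_id: str
    source_system: str
    extraction_method: str  # one of EXTRACTION_METHODS
    extractor_version: str
    confidence: float
    effective_from: date
    effective_to: Optional[date] = None        # None means open-ended
    human_validator_id: Optional[str] = None   # set only when a person reviewed it

    def __post_init__(self) -> None:
        if self.extraction_method not in EXTRACTION_METHODS:
            raise ValueError(f"unknown extraction method: {self.extraction_method}")

    def is_effective(self, on: date) -> bool:
        """True when `on` falls inside the effective date range (inclusive)."""
        return self.effective_from <= on and (
            self.effective_to is None or on <= self.effective_to
        )
```

An answer-time filter could call `is_effective` with the current date so that expired facts are excluded from the LLM context.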

9.6 Governance Artefacts

Artefact	Owner	Frequency	Location
Ontology specification (OWL/schema)	Ontology Governance Committee	Updated per change	Version-controlled schema repository
Domain coverage audit report	Data Stewards + Data Office	Quarterly	Data governance platform
NLP model cards	ML Engineering	Per model version	ML model registry
Entity resolution configuration	Data Engineering	Per major configuration change	IaC repository
Ingestion source register	Domain Data Stewards	Updated per new source	Data catalogue
Human validation queue SLA report	Data Governance	Monthly	Governance dashboard

10. Operational Considerations

10.1 Monitoring and SLOs

Metric	SLO Target	Alerting Threshold	Tool
Graph query API p99 latency	≤200ms for simple traversal; ≤2s for multi-hop	>500ms p99 (simple traversal) over 5 min	Prometheus + Grafana
Ingestion pipeline lag (CDC)	≤5 min from source change to graph update	>15 min lag	Kafka consumer lag metrics
Entity resolution queue depth	≤500 pending merges	>2,000 pending	Custom metric + alert
Human validation queue SLA	100% cleared within 5 business days	Any item >3 days	Workflow system alert
Graph database availability	99.9%	<99.5% over 1-hour window	Cloud provider health checks
Extraction model recall (golden set)	≥0.85 recall on golden validation set	<0.80 recall	Scheduled evaluation job

10.2 Logging

All graph write operations, query API calls, entity resolution decisions, and human validation actions are logged with structured JSON to a centralised log aggregation platform (Splunk, Elastic, CloudWatch Logs). Log retention: 90 days hot, 7 years cold archive. Sensitive node content is masked in logs; only node IDs and metadata are logged.

10.3 Incident Management

P1 incidents (graph database unavailable, ingestion pipeline halted) trigger immediate on-call escalation with a 15-minute response SLA. P2 incidents (entity resolution backlog exceeds threshold, extraction model performance degradation) are addressed within 4 business hours. Incident post-mortems are conducted for all P1 and P2 incidents and findings are reviewed by the ontology governance committee.

10.4 Disaster Recovery

Scenario	RTO	RPO	Recovery Procedure
Graph database instance failure (server node)	5 min (failover to replica)	0 (synchronous replication)	Automatic failover; validate with health check query
Graph database corruption	4 hours	15 min (from last backup)	Restore from most recent validated backup; replay CDC from checkpoint
Ingestion pipeline failure	30 min	5 min (CDC offset retention)	Restart pipeline; replay from last committed offset
Regional cloud outage	4 hours	1 hour	Promote DR region replica; update DNS; validate query functionality

10.5 Capacity Planning

Graph database storage grows at approximately 10–50 GB per million nodes (depending on property richness). Query throughput scales with read replica count. Plan for 3× initial storage capacity to accommodate growth and index overhead. NLP extraction compute is bursty — autoscaling worker pools are preferred over fixed allocation.
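The sizing guidance above reduces to simple arithmetic, shown here as a worked estimate. The 10 million node figure is an example input only; the per-million range and the 3× factor are taken from the paragraph above.

```python
GB_PER_MILLION_NODES = (10, 50)  # low and high ends of the range stated above
HEADROOM_FACTOR = 3              # plan for 3x initial storage capacity


def storage_plan_gb(node_count: int) -> tuple[float, float]:
    """Return (low, high) provisioned storage in GB, including headroom."""
    millions = node_count / 1_000_000
    low, high = GB_PER_MILLION_NODES
    return (millions * low * HEADROOM_FACTOR, millions * high * HEADROOM_FACTOR)


plan = storage_plan_gb(10_000_000)  # (300.0, 1500.0)
```

For a 10 million node graph this gives a provisioning range of roughly 300 GB to 1.5 TB, with the spread driven entirely by property richness.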

11. Cost Considerations

11.1 Cost Drivers

Cost Driver	Description	Typical Range
Graph database licensing/hosting	Neo4j Enterprise licence or managed service (Neptune/Cosmos DB)	$2,000–$20,000/month depending on instance size
NLP model inference	Document extraction at ingestion; billed per document or per compute hour	$0.001–$0.01 per document page
Graph query compute	CPU/memory for traversal queries; scales with query volume and complexity	$500–$5,000/month
Human validation labour	Data stewards and domain experts reviewing medium-confidence extractions	$5,000–$30,000/month depending on ingestion volume
Vector store (for hybrid RAG)	Companion vector DB for hybrid retrieval mode	$500–$3,000/month
Engineering and operations	FTE cost for graph engineers, ontology maintainers	1–3 FTE at senior level

11.2 Scaling Risks

NLP extraction cost grows linearly with document volume; large document libraries (>1M documents) require batching strategies and model efficiency optimisation
Human validation queue is the primary bottleneck at scale; automation improvements (better-calibrated extraction models that allow more triples to be auto-ingested safely) are needed before adding reviewer headcount
Graph query complexity (deep multi-hop traversals) can cause latency and cost spikes; query depth limits and result caching are essential

11.3 Optimisations

Cache frequent graph traversal results (TTL aligned to update frequency of those node types)
Segment graph into hot (frequently traversed) and cold (archival) partitions; hot partition on in-memory or SSD-backed storage
Use embedding-based pre-filtering to reduce traversal space before deep graph queries
Batch document ingestion during off-peak hours to leverage lower spot/preemptible compute pricing
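The first optimisation, caching traversal results with a time-to-live, can be sketched in a few lines. The `TTLCache` class is a hypothetical, single-threaded illustration; a shared cache such as Redis would usually replace it in production.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Tiny time-to-live cache for traversal results (not thread-safe)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for `key`, recomputing it when missing or expired."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = compute()  # miss or expired: run the traversal again
        self._entries[key] = (now, value)
        return value
```

Using one cache instance per node type lets each TTL follow that type's update frequency, for example short for pricing and long for jurisdiction hierarchies.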

11.4 Indicative Cost Ranges

Organisation Scale	Monthly Infrastructure Cost	Annual Total Cost (incl. labour)
Mid-market (100K nodes, 10 AI apps)	$5,000–$15,000	$200,000–$400,000
Enterprise (10M nodes, 50+ AI apps)	$30,000–$80,000	$800,000–$2,000,000
Large Enterprise (100M+ nodes, enterprise-wide)	$100,000–$300,000	$3,000,000–$8,000,000

12. Trade-Off Analysis

12.1 Graph Database Technology Options

Option	Strengths	Weaknesses	Best For
Neo4j Enterprise	Richest Cypher query language; mature ecosystem; strong OLTP performance; native graph storage	Licence cost; self-managed complexity; limited native analytics	Complex relationship traversal; enterprise OLTP workloads
Amazon Neptune	Fully managed; SPARQL, Gremlin, and openCypher support; native AWS integration; high availability built-in	Higher latency than self-managed Neo4j; openCypher support does not cover every Neo4j Cypher feature; AWS lock-in	AWS-native organisations; reduced operational overhead priority
Azure Cosmos DB (Gremlin)	Globally distributed; multi-model (also supports SQL API); Azure AD integration	Gremlin is less expressive than Cypher; higher latency for complex traversals; cost at scale	Azure-native organisations; global distribution requirement
TigerGraph	Superior analytics/OLAP graph workloads; GSQL for complex algorithms; handles very large graphs	Steeper learning curve; smaller ecosystem; higher upfront cost	Fraud detection; large-scale analytics-heavy graph workloads
PostgreSQL with a graph extension (e.g., Apache AGE) + pgvector	Low operational overhead; existing PostgreSQL skills; cost-effective	Limited graph query expressiveness; does not scale to enterprise graph sizes	Small graphs (<1M nodes) embedded in existing PostgreSQL estate

12.2 Architectural Tensions

Tension	Option A	Option B	Recommended Resolution
Ingestion automation vs. accuracy	Maximise auto-ingestion (lower confidence thresholds) for speed and coverage	Maximise human validation for accuracy at cost of speed	Domain-dependent: high-stakes domains (legal, medical) require higher human validation rate; informational domains can accept higher auto-ingestion
Graph schema rigidity vs. flexibility	Strict ontology enforcement (rejects any node/edge not in ontology)	Schema-optional property graph (allows ad-hoc properties)	Hybrid: core entity types and relationships are ontology-enforced; optional properties are schema-flexible with metadata tagging
Graph freshness vs. consistency	Near-real-time ingestion (CDC; minutes lag) for freshness	Batch ingestion with consistency verification	Critical domains: near-real-time; analytical domains: batch acceptable
In-house vs. managed graph service	Self-managed Neo4j for maximum control and query performance	Managed service (Neptune/Cosmos DB) for reduced ops burden	Organisations without dedicated graph DBA capability should use managed services despite performance trade-off

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Ontology drift (graph evolves without ontology update)	Medium	High — AI answers become inconsistent with business reality	Schema validation failure rate increase; data steward complaints	Ontology audit; schema migration to realign; tighten change control process
Entity resolution false merge (two distinct entities merged incorrectly)	Medium	High — all AI answers about the merged entity are incorrect	User-reported AI errors; data steward audit	Demerge operation; add negative match rule; re-extract affected documents
NLP extraction model degradation (new document types not in training set)	Medium	Medium — knowledge gaps for new document types	Recall drop on golden validation set	Model fine-tuning on new document examples; temporary manual curation fallback
Circular relationship injection (ingestion creates graph cycle where none should exist)	Low	Medium — graph traversal may loop; AI reasoning corrupted	Graph integrity check job detects cycles	Remove offending edges; add cycle-detection validation to ingestion
Data steward turnover (domain knowledge lost when steward leaves)	High	High — ontology maintenance and validation quality drops	Validation queue SLA misses; ontology change requests unanswered	Documented ontology rationale per entity/relationship; successor handover process
Graph database replication lag during peak ingestion	Medium	Low — briefly stale reads from replicas	Replica lag metric alert	Reduce ingestion batch size; scale ingestion workers; route time-sensitive reads to primary
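The cycle check mentioned for circular relationship injection can be implemented as a depth-first search over the edges of any relationship type that is supposed to be acyclic (for example a part-of hierarchy, which is an assumed example). The function below is a sketch; it is recursive, so very deep graphs would need an iterative version.

```python
from collections import defaultdict
from typing import Optional


def find_cycle(edges: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return one cycle as a node path (first node repeated at the end), or None."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, fully explored
    colour: dict[str, int] = defaultdict(int)
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(graph):
        if colour[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None
```

Run as an ingestion pre-check, the returned path identifies exactly which edges to reject or remove.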

13.1 Cascading Failure Scenarios

Scenario 1: Ontology Breaking Change Cascade. An ontology change renames a core entity type without a migration. The NLP extraction pipeline continues writing new nodes with the old type name (type mismatch). Entity resolution stops matching new nodes to existing ones because their types differ. AI applications receive duplicate entities and produce contradictory answers. The human validation queue floods. Resolution requires: freeze ingestion, apply the schema migration, re-run entity resolution, and validate AI outputs.

Scenario 2: Bad Batch Ingestion Cascade. A malformed data export from an ERP system is ingested without sufficient validation. 50,000 incorrect relationship records are written to the graph. AI applications begin producing wrong answers for all queries touching those relationships. Detection is delayed because monitoring covers latency and availability but not answer correctness. Resolution requires: identify and roll back the ingestion batch; purge affected edges; re-ingest from the correct source; add a data validation pre-check to prevent recurrence.

14. Regulatory Considerations

Regulation	Relevant Clause	Requirement	How EKG Addresses It
APRA CPS 230 (Operational Resilience)	CPS 230 §36–§38	Material service providers and critical data must have documented recovery capability	Graph database DR procedures; RTO/RPO defined and tested; backup validation
APRA CPS 234 (Information Security)	CPS 234 §15–§17	Information assets classified and protected proportionate to criticality	Node-level data classification; encryption at rest and in transit; access control per classification
Australian Privacy Act 1988	APP 11 (Security of Personal Information)	Personal information must be protected from misuse and unauthorised access	PII nodes tagged and access-controlled; audit log of all PII node access; destruction or de-identification procedures for personal information that is no longer needed
EU AI Act	Article 13 (Transparency)	High-risk AI systems must be designed to allow transparency of operation	Knowledge graph provenance chain enables AI answer traceability to source facts
EU AI Act	Article 14 (Human Oversight)	High-risk AI systems must allow human oversight and intervention	Human validation queues; confidence scores surfaced; graph corrections possible by authorised stewards
ISO/IEC 42001	§6.1 (Risk Assessment)	AI management system must document AI-related risks and controls	Risk register includes NLP extraction risk, entity resolution risk, ontology drift risk
NIST AI RMF	GOVERN 1.1, MAP 1.5	AI risks identified and assigned to organisational roles	Data steward ownership model maps to RMF GOVERN function

15. Reference Implementations

15.1 AWS

Component	AWS Service
Graph database	Amazon Neptune (serverless for variable workloads)
NLP extraction	Amazon Comprehend + custom SageMaker NER models
Document ingestion queue	Amazon SQS + S3 trigger
CDC from RDS	AWS Database Migration Service + Amazon Kinesis
Secrets management	AWS Secrets Manager
Human validation workflow	AWS Step Functions + custom UI
Monitoring	Amazon CloudWatch + Amazon Managed Grafana
Hybrid RAG	Amazon Bedrock Knowledge Bases (Neptune integration)

15.2 Azure

Component	Azure Service
Graph database	Azure Cosmos DB for Apache Gremlin
NLP extraction	Azure AI Language (NER + relation extraction)
Document ingestion	Azure Event Hubs + Blob Storage trigger
CDC from SQL databases	Azure Data Factory CDC
Secrets management	Azure Key Vault
Human validation workflow	Azure Logic Apps + Power Apps
Monitoring	Azure Monitor + Azure Managed Grafana
Hybrid RAG	Azure AI Search (graph + vector hybrid)

15.3 GCP

Component	GCP Service
Graph database	Neo4j on GKE or managed Neo4j Aura
NLP extraction	Google Cloud Natural Language API + Vertex AI custom models
Document ingestion	Cloud Pub/Sub + Cloud Storage trigger
Secrets management	Google Cloud Secret Manager
Monitoring	Google Cloud Monitoring + Grafana
Hybrid RAG	Vertex AI Search + custom graph context enrichment

15.4 On-Premises

Component	Technology
Graph database	Neo4j Enterprise or TigerGraph on-prem
NLP extraction	Hugging Face models on GPU servers; spaCy pipeline
Document ingestion	Apache Kafka + custom Spark pipeline
Secrets management	HashiCorp Vault
Monitoring	Prometheus + Grafana
Hybrid RAG	LangChain + Weaviate or Qdrant
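Each stack above ends with a hybrid RAG component, in which graph traversal augments vector similarity search. Independent of vendor, the core idea can be sketched in plain Python: rank nodes by embedding similarity, then expand the top hits by one hop to collect connected facts as extra context. The function names, the toy two-dimensional embeddings, and the one-hop expansion are illustrative assumptions, not the API of any service listed.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_retrieve(query_vec, embeddings, edges, top_k=2):
    """Return (seed nodes, graph facts touching those seeds).

    embeddings: dict of node id -> vector
    edges: list of (source, relation, target) triples
    """
    ranked = sorted(
        embeddings,
        key=lambda node: cosine(query_vec, embeddings[node]),
        reverse=True,
    )
    seeds = ranked[:top_k]
    # One-hop expansion: keep every triple that touches a seed node.
    facts = [(s, r, t) for (s, r, t) in edges if s in seeds or t in seeds]
    return seeds, facts


if __name__ == "__main__":
    embeddings = {"policy_a": [1.0, 0.0], "product_x": [0.0, 1.0]}
    edges = [("product_x", "GOVERNED_BY", "policy_a")]
    print(hybrid_retrieve([1.0, 0.0], embeddings, edges, top_k=1))
```

The returned triples would then be serialised into the prompt alongside the retrieved text, which is what lets an answer cite a specific node and edge.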

16. Related Patterns

Pattern ID	Pattern Name	Relationship Type	Notes
EAAPL-KNW002	Semantic Data Layer	Complementary	Semantic layer provides business ontology that EKG implements; together they create natural language data access
EAAPL-KNW003	AI Knowledge Corpus Management	Complementary	Corpus management governs the documents that feed the NLP extraction pipeline
EAAPL-KNW005	Knowledge Graph for Explainability	Extension	EKG is the substrate; KNW005 adds the explainability presentation layer
EAAPL-KNW006	Corpus Quality Assurance	Dependency	Quality assurance must run on documents before they enter NLP extraction
EAAPL-RAG001	Retrieval Augmented Generation	Consumer	RAG pattern consumes the knowledge graph via hybrid retrieval mode
EAAPL-GOV002	AI Model Risk Management	Governance	NLP extraction models within EKG are subject to model risk management

17. Maturity Assessment

Overall Maturity Label: Proven

Dimension	Score (1–5)	Rationale
Technology readiness	4	Graph databases, NLP extraction, and hybrid RAG are all production-proven; tooling is mature
Organisational capability	2	Most enterprises lack dedicated graph engineers and ontology governance experience — this is the primary constraint
Standards availability	3	OWL, RDF, and SPARQL are mature W3C standards; the property graph query standard GQL (ISO/IEC 39075) was published in 2024 and vendor adoption is still emerging
Vendor ecosystem	4	Multiple mature commercial and open-source vendors; managed cloud services available on all major clouds
Case evidence	4	Strong evidence from financial services (Goldman Sachs KG), healthcare, and tech companies; patterns well-documented
Regulatory alignment	4	EU AI Act and SR 11-7 requirements are well-addressed by the provenance and explainability capabilities
Overall	3.5 / 5	Proven pattern with high technology readiness; primary constraint is organisational capability uplift required
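The overall figure is consistent with an unweighted mean of the six dimension scores. The source does not state how the overall score is derived, so the averaging rule below is an assumption, shown only as a quick arithmetic check.

```python
from statistics import mean

# Dimension scores copied from the table above.
scores = {
    "Technology readiness": 4,
    "Organisational capability": 2,
    "Standards availability": 3,
    "Vendor ecosystem": 4,
    "Case evidence": 4,
    "Regulatory alignment": 4,
}

# Assumption: the overall score is a simple unweighted mean (21 / 6).
overall = mean(scores.values())
print(f"{overall:.1f} / 5")
```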

18. Revision History

Version	Date	Author	Changes
1.0	2026-06-12	EAAPL Editorial Board	Initial publication — covers ontology design, ingestion pipelines, graph DB selection, versioning, quality management, and RAG integration
