King Klown & KOA


SenTient (Semantic Entity Intelligent Transformation)

Version: 1.0.0-RC2
Status: Production Ready (Hybrid Architecture)


1. Executive Summary and Core Philosophy

SenTient is a next-generation Entity Reconciliation and Relation Extraction engine designed to bridge the gap between messy, unstructured text and structured Knowledge Graphs (Wikidata/Wikibase).

The core philosophy is a Hybrid Orchestration System that combines three distinct technological lineages into a single "Funnel" pipeline to achieve high performance and accuracy:

Key Architectural Features


2. The Three-Layer Funnel and Processing Pipeline

The system operates on a "Funnel" logic: broad and fast at the top, narrow and precise at the bottom. The unit of work is the SmartCell object, which acts as the immutable contract across all layers.

2.1. Layer 1: Ingestion & Fast Tagging (The Sieve)

2.2. Layer 2: The Semantic Linguist (Falcon 2.0)

2.3. Layer 3: The Core Orchestrator (Final Adjudication)


3. QA, Validation & Benchmarking

The QA strategy rests on three pillars that together measure, against fixed targets, whether the system improves over time.

3.1. The Scrutinizers (Runtime Validation)

Scrutinizers are "Linting Rules for Data" located in config/qa/scrutinizer_rules.yaml. They run in the Java Core before export.
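The "linting rules for data" idea can be sketched as a list of small checks, each returning a message when a cell violates it. This is only an illustration in Python: the real Scrutinizers run in the Java Core and are configured in scrutinizer_rules.yaml, whose contents are not shown here. The two rules below (`score_in_range`, `matched_has_match`) and the `scrutinize` helper are hypothetical, derived only from the SmartCell fields described in Section 4.

```python
from typing import Callable, Dict, List, Optional

Cell = Dict[str, object]
# A rule returns an error message, or None when the cell passes.
Rule = Callable[[Cell], Optional[str]]


def score_in_range(cell: Cell) -> Optional[str]:
    """Hypothetical rule: consensus_score, when present, lies in [0.0, 1.0]."""
    score = cell.get("consensus_score")
    if score is not None and not 0.0 <= score <= 1.0:
        return f"consensus_score out of range: {score}"
    return None


def matched_has_match(cell: Cell) -> Optional[str]:
    """Hypothetical rule: a MATCHED cell must carry a winning candidate."""
    if cell.get("status") == "MATCHED" and cell.get("match") is None:
        return "status is MATCHED but match is empty"
    return None


RULES: List[Rule] = [score_in_range, matched_has_match]


def scrutinize(cell: Cell) -> List[str]:
    """Run every rule and collect the violations (empty list = clean)."""
    return [msg for rule in RULES if (msg := rule(cell)) is not None]


if __name__ == "__main__":
    bad = {"raw_value": "Paris", "status": "MATCHED", "consensus_score": 1.4}
    for problem in scrutinize(bad):
        print(problem)
```

Keeping each rule a pure function of one cell makes it possible to run the whole set as a gate before export and to report all violations at once rather than stopping at the first.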

3.2. Golden Standard Datasets

Accuracy is measured against ground truth using industry-standard datasets:

3.3. Benchmarking & Deployment Guardrail

The evaluate_falcon_api.py script runs the full pipeline.

| Metric | Target (v1.0) | Acceptable Range |
|---|---|---|
| Precision | 0.85 | > 0.80 |
| Recall | 0.82 | > 0.75 |
| F-Score | 0.83 | > 0.78 |
| Latency (p95) | 200ms | < 500ms |

Deployment Rule: If Precision drops by more than 2% after a model update (e.g., a new SBERT model or a rebuilt Solr FST index), the deployment is rejected.
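A minimal sketch of how the acceptable ranges and the deployment rule could be checked together. It is not taken from evaluate_falcon_api.py; `Metrics` and `deployment_allowed` are hypothetical names. The sketch assumes the F-Score is the harmonic mean of precision and recall (F1), which is consistent with the targets in the table, and it assumes "2%" means 0.02 in absolute precision rather than a relative change.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float
    latency_p95_ms: float

    @property
    def f_score(self) -> float:
        # Harmonic mean of precision and recall (F1).
        denom = self.precision + self.recall
        return 0.0 if denom == 0 else 2 * self.precision * self.recall / denom


# Acceptable ranges from the table above.
MIN_PRECISION = 0.80
MIN_RECALL = 0.75
MIN_F_SCORE = 0.78
MAX_LATENCY_P95_MS = 500.0

# Assumption: the 2% threshold is an absolute drop in precision.
MAX_PRECISION_DROP = 0.02


def deployment_allowed(baseline: Metrics, candidate: Metrics) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons); reasons lists every failed check."""
    reasons: List[str] = []
    if not candidate.precision > MIN_PRECISION:
        reasons.append("precision outside acceptable range")
    if not candidate.recall > MIN_RECALL:
        reasons.append("recall outside acceptable range")
    if not candidate.f_score > MIN_F_SCORE:
        reasons.append("F-score outside acceptable range")
    if not candidate.latency_p95_ms < MAX_LATENCY_P95_MS:
        reasons.append("p95 latency outside acceptable range")
    if baseline.precision - candidate.precision > MAX_PRECISION_DROP:
        reasons.append("precision dropped by more than 2% against baseline")
    return (not reasons, reasons)


if __name__ == "__main__":
    baseline = Metrics(precision=0.85, recall=0.82, latency_p95_ms=200.0)
    candidate = Metrics(precision=0.82, recall=0.82, latency_p95_ms=210.0)
    print(deployment_allowed(baseline, candidate))
```

Note that a candidate can sit inside every acceptable range and still be rejected: the drop rule compares against the previous run, not against the fixed thresholds.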


4. Data Dictionary: The SmartCell Protocol

The SmartCell is the immutable data contract defined in schemas/data/smart_cell.json.

| Logical Field | JSON Type | Java Type | Python Type | Description |
|---|---|---|---|---|
| raw_value | String | String | str | Original user input (never modified) |
| status | Enum (String) | Recon.Judgment | str | Current lifecycle state (NEW, PENDING, MATCHED, etc.) |
| consensus_score | Float | float (transient) | float | Final calculated confidence (0.0 to 1.0) |
| match | Candidate Obj | ReconCandidate | dict | The single winning entity (if reconciled) |
| vector | Array<Float> | double[] | np.ndarray | SBERT embedding payload |
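As an illustration of the Python side of this contract, the fields above could be modeled as a frozen dataclass, so that a layer produces a new cell instead of mutating the one it received. This is a sketch, not the project's actual class: the authoritative definition is schemas/data/smart_cell.json. The `Status` enum lists only the three states named in the table, the `vector` field is held as a tuple of floats to keep the example dependency-free (the table specifies np.ndarray), and the `"id"` key inside `match` is hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class Status(str, Enum):
    # Only the states named in the data dictionary; the full set is not shown.
    NEW = "NEW"
    PENDING = "PENDING"
    MATCHED = "MATCHED"


@dataclass(frozen=True)
class SmartCell:
    raw_value: str                                # original input, never modified
    status: Status = Status.NEW
    consensus_score: Optional[float] = None       # 0.0 to 1.0 once calculated
    match: Optional[dict] = None                  # winning candidate, if reconciled
    vector: Optional[Tuple[float, ...]] = None    # embedding payload

    def __post_init__(self) -> None:
        score = self.consensus_score
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError("consensus_score must lie in [0.0, 1.0]")


# A layer returns an updated copy; the original cell is left untouched.
cell = SmartCell(raw_value="Douglas Adams")
matched = replace(
    cell,
    status=Status.MATCHED,
    consensus_score=0.93,
    match={"id": "Q42"},  # hypothetical candidate shape
)
```

Because `dataclasses.replace` re-runs the constructor, the range check on `consensus_score` also applies to every derived cell.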

Telemetry (features)

The Candidate object contains a features object used for UI visualization and debugging. The Frontend renders a stacked bar chart based on these weights:


5. Wiring & Configuration Strategy

Network Topology (Port Map)

All services are bound strictly to 127.0.0.1 for security.

| Service | Port | Protocol | Timeout |
|---|---|---|---|
| Java Core (Orchestrator) | 3333 | HTTP/1.1 | - |
| Falcon (Python) | 5005 | HTTP/1.1 | 120s (Throttled) |
| Solr (Tapioca) | 8983 | HTTP/2 | 500ms (Strict) |
| ElasticSearch | 9200 | HTTP/TCP | - |
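For client code, the port map can be mirrored as a small lookup table. The sketch below is illustrative only: the dictionary keys, the `Service` tuple, and `base_url` are hypothetical names, a "-" timeout in the table is represented as `None`, and a plain `http://` scheme is assumed for every service.

```python
from typing import Dict, NamedTuple, Optional

# All services are bound to the loopback interface only.
BIND_HOST = "127.0.0.1"


class Service(NamedTuple):
    port: int
    protocol: str
    timeout_s: Optional[float]  # None where the table shows "-"


SERVICES: Dict[str, Service] = {
    "java_core": Service(3333, "HTTP/1.1", None),
    "falcon": Service(5005, "HTTP/1.1", 120.0),   # throttled
    "solr": Service(8983, "HTTP/2", 0.5),         # strict 500 ms budget
    "elasticsearch": Service(9200, "HTTP/TCP", None),
}


def base_url(name: str) -> str:
    """Build the loopback URL for a service (assumes a plain http scheme)."""
    return f"http://{BIND_HOST}:{SERVICES[name].port}"


if __name__ == "__main__":
    for name, svc in SERVICES.items():
        print(name, base_url(name), svc.timeout_s)
```

Keeping the timeouts next to the ports makes the contrast explicit: Solr gets a strict sub-second budget, while Falcon is allowed two minutes.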

File System Layout

The central configuration file is config/orchestration/environment.json; the remaining configuration files live elsewhere under the config/ directory.