Abstract Wiki Architect is a family-based, data-driven NLG toolkit for Abstract Wikipedia and Wikifunctions.
Instead of writing one renderer per language (“300 scripts for 300 languages”), this project organises NLG as:
The goal is to provide a professional, testable architecture for rule-based NLG across many languages, aligned with the ideas behind Abstract Wikipedia and Wikifunctions, but usable independently.
Abstract Wikipedia and Wikifunctions aim to:
To do that at scale, you need more than “one script per language”. You need:
This repository is a way to:
It is not an official Wikimedia component, but it is designed with that ecosystem in mind.
Konnaxion is a broader socio-technical platform project, focused on:
Konnaxion is not an AI product and does not expose AI features to end users. AI is used on the builder side: to design, generate, and refactor entire files, modules, and documentation. The running system is conventional software.
The intended relationship between Abstract Wiki Architect and Konnaxion is:
Shared semantic structures
Renderer for multi-lingual narratives
Clear separation of concerns
Alignment with Wikimedia
In short: Abstract Wiki Architect is the NLG / language layer; Konnaxion is the larger socio-technical platform that may reuse it.
Very roughly, the architecture is:
Engines (families) + Configs (languages) + Constructions (sentence patterns)
+ Lexica + Frames (semantics) + Discourse + Router/API
engines/ – family-level engines (Romance, Slavic, Agglutinative, Germanic, Bantu, Semitic, Indo-Aryan, Iranic, Japonic, Koreanic, etc.).
morphology/ – family-specific morphology modules:
family-level configuration in data/morphology_configs/ (e.g. romance_grammar_matrix.json, slavic_matrix.json, …),
per-language configuration (e.g. data/romance/it.json, data/slavic/ru.json).
Under constructions/:
copula_equative_simple – “X is a Y”
copula_equative_classification – “X is a Polish physicist”
copula_attributive_np, copula_attributive_adj
copula_existential – “There is a Y in X”
copula_locative
possession_have – “X has Y”
intransitive_event, transitive_event, ditransitive_event, passive_event
relative_clause_subject_gap
coordination_clauses
comparative_superlative
causative_event
topic_comment_copular
apposition_np
Constructions are family-agnostic:
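One way to read "family-agnostic" is that a construction fixes only the sentence pattern and delegates every inflected piece to whatever morphology object it is handed. The sketch below illustrates that idea. The Morphology protocol, the two toy classes, and the function signature are hypothetical and are not taken from the repository.

```python
from typing import Protocol


class Morphology(Protocol):
    """Hypothetical interface that a family engine would implement."""

    def copula(self, subject_gender: str) -> str: ...

    def indefinite_np(self, noun: str, gender: str) -> str: ...


class EnglishLikeMorphology:
    """Toy morphology: no gender agreement, a/an chosen by first letter."""

    def copula(self, subject_gender: str) -> str:
        return "is"

    def indefinite_np(self, noun: str, gender: str) -> str:
        article = "an" if noun[0].lower() in "aeiou" else "a"
        return f"{article} {noun}"


class ItalianLikeMorphology:
    """Toy morphology: the article depends on grammatical gender."""

    def copula(self, subject_gender: str) -> str:
        return "è"

    def indefinite_np(self, noun: str, gender: str) -> str:
        article = "una" if gender == "female" else "un"
        return f"{article} {noun}"


def copula_equative_simple(subject: str, noun: str, gender: str, morph: Morphology) -> str:
    """'X is a Y': the pattern is fixed, every inflected piece comes from morph."""
    return f"{subject} {morph.copula(gender)} {morph.indefinite_np(noun, gender)}."


print(copula_equative_simple("Marie Curie", "physicist", "female", EnglishLikeMorphology()))
print(copula_equative_simple("Marie Curie", "fisica", "female", ItalianLikeMorphology()))
```

The same construction function serves both toy families; only the object passed as morph changes.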
Under semantics/ and docs/FRAMES_*.md:
Core types: Entity, Location, TimeSpan, Event, quantities, etc.
Entity frames (FRAMES_ENTITY.md): persons, organisations, places, works, products, laws, projects, etc.
Event frames (FRAMES_EVENT.md): single events / episodes with participants, time, and location.
Relational frames (FRAMES_RELATIONAL.md): statement-level facts (definitions, attributes, measurements, memberships, roles, part–whole, comparisons, etc.).
Narrative frames (FRAMES_NARRATIVE.md): timelines, careers, developments, receptions, comparisons, lists.
Meta frames (FRAMES_META.md): article / section structure and sources.
semantics/normalization.py and semantics/aw_bridge.py map “loose” inputs (dicts, CSV rows, Z-objects) into typed frames that the engines and constructions can consume.
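To make the normalization step concrete, here is a minimal sketch of turning a loose dict (for example, one CSV row) into a typed frame. It does not reproduce semantics/normalization.py: the Entity and BioFrame classes below are local stand-ins, and the input keys (name, profession, nationality) and the ";" separator are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass
class Entity:  # stand-in, not the real semantics.types.Entity
    name: str
    id: str | None = None
    gender: str = "unknown"
    human: bool = False


@dataclass
class BioFrame:  # stand-in, not the real semantics.types.BioFrame
    main_entity: Entity
    primary_profession_lemmas: list[str] = field(default_factory=list)
    nationality_lemmas: list[str] = field(default_factory=list)


def _split_lemmas(value: Any) -> list[str]:
    """Accept a list or a ';'-separated string and return lower-case lemmas."""
    if value is None:
        return []
    items = value.split(";") if isinstance(value, str) else list(value)
    return [str(item).strip().lower() for item in items if str(item).strip()]


def normalize_bio(raw: Mapping[str, Any]) -> BioFrame:
    """Validate a loose mapping and build a typed biography frame."""
    name = str(raw.get("name", "")).strip()
    if not name:
        raise ValueError("a biography row needs a non-empty 'name'")
    entity = Entity(
        name=name,
        id=raw.get("id") or None,
        gender=str(raw.get("gender") or "unknown").strip().lower(),
        human=True,
    )
    return BioFrame(
        main_entity=entity,
        primary_profession_lemmas=_split_lemmas(raw.get("profession")),
        nationality_lemmas=_split_lemmas(raw.get("nationality")),
    )


row = {"id": "Q7186", "name": "Marie Curie", "gender": "Female",
       "profession": "Physicist; Chemist", "nationality": "polish"}
frame = normalize_bio(row)
print(frame.primary_profession_lemmas)  # ['physicist', 'chemist']
```

Rejecting rows without a name at this boundary keeps the engines and constructions free of input validation.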
A key example is the biography frame:
from semantics.types import Entity, BioFrame

marie = Entity(
    id="Q7186",
    name="Marie Curie",
    gender="female",
    human=True,
)

frame = BioFrame(
    main_entity=marie,
    primary_profession_lemmas=["physicist"],
    nationality_lemmas=["polish"],
)
This BioFrame can be passed to the internal router or the public NLG API.
Under discourse/:
DiscourseState tracks mentioned entities, current topic, and simple salience.
info_structure.py assigns topic vs focus labels to frames and arguments.
referring_expression.py chooses between full name, short name, pronoun, or zero subject.
planner.py orders several frames into short multi-sentence descriptions.
This is what allows outputs like:
“Marie Curie is a Polish physicist. She discovered radium.”
instead of:
“Marie Curie is a Polish physicist. Marie Curie discovered radium.”
and makes it possible to build topic–comment variants for languages where that matters.
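The pronoun choice in the example above can be sketched with a very small discourse state. This is an illustration only: the DiscourseState class here is a local stand-in, not the class in discourse/, and the rule (pronoun when the entity is already the current topic, full name otherwise) is an assumed simplification that covers English subject pronouns only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PRONOUNS = {"female": "She", "male": "He"}  # English subject pronouns only


@dataclass
class DiscourseState:  # stand-in for the real discourse state
    mentioned: set[str] = field(default_factory=set)
    topic: str | None = None


def subject_expression(state: DiscourseState, entity_id: str, name: str, gender: str) -> str:
    """Return a pronoun if the entity is the current topic, else the full name."""
    already_topic = entity_id in state.mentioned and state.topic == entity_id
    if already_topic and gender in PRONOUNS:
        expression = PRONOUNS[gender]
    else:
        expression = name
    # Record the mention so that the next sentence can pronominalise.
    state.mentioned.add(entity_id)
    state.topic = entity_id
    return expression


state = DiscourseState()
predicates = ["is a Polish physicist", "discovered radium"]
sentences = [
    f"{subject_expression(state, 'Q7186', 'Marie Curie', 'female')} {predicate}."
    for predicate in predicates
]
print(" ".join(sentences))
# Marie Curie is a Polish physicist. She discovered radium.
```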
Under lexicon/:
Lexeme, Form, etc.
Lexicon data in data/lexicon/ (e.g. en_lexicon.json, fr_lexicon.json, it_lexicon.json, ru_lexicon.json, ja_lexicon.json, …) typically contains:
part of speech (NOUN, ADJ, VERB, …),
features (human, nationality, …),
Supporting tools include:
language_profiles/ – per-language profiles (family, default constructions, key settings).
router.py – internal entry point:
given a language code and either:
loads the language profile and lexicon,
selects the appropriate family engine and constructions,
returns a surface string.
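The routing steps listed above (load profile and lexicon, pick the family engine, return a string) can be sketched as a simple dispatch table. Everything in this block is hypothetical: the in-memory PROFILES and LEXICA dictionaries stand in for language_profiles/ and data/lexicon/, the two engine functions are toys, and route_bio is an illustrative name rather than the signature of the real router.py.

```python
# Hypothetical in-memory stand-ins for language_profiles/ and data/lexicon/.
PROFILES = {
    "en": {"family": "germanic", "copula": "is", "article": "a"},
    "fr": {"family": "romance", "copula": "est", "article": "une"},
}
LEXICA = {
    "en": {"physicist": "physicist", "polish": "Polish"},
    # Toy simplification: only feminine forms are stored for French.
    "fr": {"physicist": "physicienne", "polish": "polonaise"},
}


def germanic_engine(profile, name, noun, adjective):
    """Toy family engine: adjective precedes the noun."""
    return f"{name} {profile['copula']} {profile['article']} {adjective} {noun}."


def romance_engine(profile, name, noun, adjective):
    """Toy family engine: adjective follows the noun."""
    return f"{name} {profile['copula']} {profile['article']} {noun} {adjective}."


ENGINES = {"germanic": germanic_engine, "romance": romance_engine}


def route_bio(lang_code, name, profession_lemma, nationality_lemma):
    """Load profile and lexicon, select the family engine, return a string."""
    if lang_code not in PROFILES:
        raise KeyError(f"no language profile for {lang_code!r}")
    profile = PROFILES[lang_code]
    lexicon = LEXICA[lang_code]
    engine = ENGINES[profile["family"]]
    return engine(profile, name, lexicon[profession_lemma], lexicon[nationality_lemma])


for lang in ("en", "fr"):
    print(lang, route_bio(lang, "Marie Curie", "physicist", "polish"))
```

Under this layout, supporting another language of an existing family means adding a profile and a lexicon entry, not a new engine.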
Examples:
render_bio(...) for biography-like sentences,
render_from_semantics(frame, lang_code=...) for semantic-frame inputs.
On top of this, a small public NLG API (docs/FRONTEND_API.md) exposes:
from nlg.api import generate_bio, generate
from semantics.types import Entity, BioFrame

bio = BioFrame(
    main_entity=Entity(name="Douglas Adams", gender="male", human=True),
    primary_profession_lemmas=["writer"],
    nationality_lemmas=["british"],
)

result = generate_bio(lang="en", bio=bio)
print(result.text)       # "Douglas Adams was a British writer."
print(result.sentences)  # ["Douglas Adams was a British writer."]

result2 = generate(lang="fr", frame=bio)
print(result2.text)
The API returns a GenerationResult (final text, sentence list, debug info), and hides router / engine / lexicon details from callers.
The toolkit is built around test suites and regression checks:
CSV-based test suites (qa_tools/generated_datasets/test_suite_*.csv):
Test suite generator:
qa_tools/test_suite_generator.py produces language-specific CSV templates.
Test runner:
qa/test_runner.py loads frames from CSV rows, calls the renderer, and compares actual vs expected outputs,
Lexicon QA:
The intent is to make it easy to:
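The CSV-driven regression loop described in this section can be sketched with the standard library alone. This is not the code in qa/test_runner.py: the column names (name, profession, nationality, expected) are assumed, the render function is a toy stand-in for the real renderer, and the second row is deliberately wrong so that the failure report is visible.

```python
import csv
import io

# Assumed column layout; the real test_suite_*.csv files may differ.
SUITE = """\
name,profession,nationality,expected
Marie Curie,physicist,Polish,Marie Curie is a Polish physicist.
Douglas Adams,writer,British,Douglas Adams is a British author.
"""


def render(row):
    """Toy renderer standing in for the real router call."""
    return f"{row['name']} is a {row['nationality']} {row['profession']}."


def run_suite(csv_text):
    """Render every row and collect (line number, expected, actual) mismatches."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    failures = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        actual = render(row)
        if actual != row["expected"]:
            failures.append((line_no, row["expected"], actual))
    return len(rows), failures


total, failures = run_suite(SUITE)
print(f"{total - len(failures)}/{total} passed")
for line_no, expected, actual in failures:
    print(f"line {line_no}: expected {expected!r}, got {actual!r}")
```

Reporting the CSV line number alongside expected and actual strings makes a regression easy to trace back to the row that caused it.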
Beyond the Python API, the stack can be exposed as a web service:
/abstract_wiki_architect/).
The HTTP API simply wraps the same frame-based generate(...) / generate_bio(...) calls and returns JSON, making it easier to integrate with other systems (including potential Wikifunctions prototypes or Konnaxion).
Details are in docs/hosting.md.
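As a rough illustration of "wrap the generate call and return JSON", the handler below maps a JSON request body to a status code and a JSON response body, independent of any web framework. The request fields (lang, name, profession, nationality), the response shape, and the toy English-only generator are all assumptions; the actual service contract is the one documented in docs/hosting.md.

```python
from __future__ import annotations

import json


def generate_bio_text(lang: str, name: str, profession: str, nationality: str) -> str:
    """Toy English-only generator standing in for the real frame-based call."""
    if lang != "en":
        raise ValueError(f"unsupported language: {lang}")
    return f"{name} is a {nationality} {profession}."


def handle_generate(body: bytes) -> tuple[int, str]:
    """Turn a JSON request body into (HTTP status, JSON response body)."""
    try:
        payload = json.loads(body)
        text = generate_bio_text(
            payload["lang"],
            payload["name"],
            payload["profession"],
            payload["nationality"],
        )
    except (KeyError, TypeError, ValueError) as exc:
        # Malformed JSON, missing fields, or an unsupported language.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"text": text, "sentences": [text]}, ensure_ascii=False)


request = json.dumps({
    "lang": "en",
    "name": "Douglas Adams",
    "profession": "writer",
    "nationality": "British",
}).encode("utf-8")
status, response = handle_generate(request)
print(status, response)
```

Keeping the handler a pure function of the request body means it can be tested without starting a server and then mounted in whichever framework the deployment uses.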
The project emphasises architecture and file-level organisation:
AI is used on the builder side:
Automated tests and QA suites provide stability as the architecture evolves.
The aim is a codebase that is:
Repository: https://github.com/Rejean-McCormick/abstract-wiki-architect
Meta-Wiki (Abstract Wikipedia tools page): https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Tools/abstract-wiki-architect
Konnaxion (platform reusing these ideas): https://github.com/Rejean-McCormick/Konnaxion https://github.com/Rejean-McCormick/Konnaxion/wiki