Consumers / AI Agent Integration
AI agents are a primary consumer of Ariane.
They use the UI graphs stored in Atlas as a reference for planning and executing actions inside existing software, turning high-level user goals into concrete sequences of UI operations.
This page gives a conceptual overview of how an AI agent can integrate with Ariane.
High-Level Flow
From an agent’s point of view, the interaction with Ariane typically follows this loop:
- Understand the user’s goal.
- Identify the current UI state.
- Query Atlas for available actions and intents.
- Plan a sequence of transitions toward the goal.
- Execute (or instruct) the steps in the live UI.
- Observe the resulting state and adjust if necessary.
Ariane supports steps 2–4 by providing structured, semantic information about the UI.
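As a rough sketch, the loop could be driven by code like the following. The `ui`, `atlas`, and `executor` objects and the `plan_path` helper are hypothetical placeholders, not part of any real Ariane API; they stand in for the observation, Atlas-query, and execution layers an agent supplies itself. Step 1, mapping the goal to an intent, is covered in the next section.

```python
# A minimal sketch of the agent loop. `ui`, `atlas`, `executor`, and
# `plan_path` are hypothetical stand-ins, not a real Ariane API.

def run_agent(goal_intent: str, ui, atlas, executor, max_steps: int = 20) -> bool:
    for _ in range(max_steps):
        observation = ui.observe()                      # 2. capture the current UI state
        state_id = atlas.match_state(observation)       # 3. resolve it against Atlas
        path = plan_path(atlas, state_id, goal_intent)  # 4. plan transitions to the goal
        if not path:
            return False                                # no known route to the goal
        step = path[0]
        executor.perform(step)                          # 5. execute one step in the live UI
        if step.intent == goal_intent:
            return True                                 # the goal transition just fired
        # 6. loop: re-observe the resulting state and re-plan if the UI diverged
    return False
```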
1. Goal Representation
An agent starts with a goal, often expressed in natural language:
- “Export this document as PDF.”
- “Change the default font to Arial.”
- “Turn on dark mode.”
Internally, the agent should map this to:
- One or more intents known to Ariane (e.g., ExportToPDF, ChangeDefaultFont, EnableDarkMode).
- Optional constraints or preferences (e.g., “use the simplest path”, “avoid destructive steps”).
Ariane does not perform this mapping itself; it exposes a vocabulary of intents that the agent can align to.
See: Atlas/Ontology-Vocabulary
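One simple way to perform this alignment is to compare the goal against the intent descriptions in the vocabulary. The sketch below assumes the vocabulary is available as (intent_id, description) pairs and that the agent supplies its own text-embedding function; Ariane mandates none of this.

```python
from typing import Callable, Sequence

def map_goal_to_intent(
    goal: str,
    vocabulary: Sequence[tuple[str, str]],    # (intent_id, description) pairs
    embed: Callable[[str], Sequence[float]],  # any text-embedding function
) -> str:
    """Return the intent whose description is most similar to the goal."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    goal_vec = embed(goal)
    best_id, _ = max(
        ((intent_id, cosine(goal_vec, embed(desc))) for intent_id, desc in vocabulary),
        key=lambda pair: pair[1],
    )
    return best_id
```

An LLM prompt or plain keyword matching over intent names can serve equally well; what matters is that the output is an intent ID from Ariane's vocabulary.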
2. State Recognition Against Atlas
To act correctly, the agent must know which state in Atlas corresponds to the user’s current screen.
A typical process:
- Observe the live UI via:
  - Direct access to accessibility APIs, or
  - Screen capture + OCR + element detection.
- Compute or approximate fingerprints compatible with Atlas:
  - Structural representation (if a tree is accessible).
  - Visual/perceptual hash (if screenshots are available).
  - Semantic hints from labels and titles.
- Query Atlas:
  - “Given these fingerprint components and context (app, version, platform), what is the closest state_id?”
Atlas returns:
- Candidate state_id(s).
- Similarity or confidence scores (implementation-dependent).
- References to interactive elements and their semantics.
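For illustration, such a response might be shaped like the following Python literal (the field names and values are assumptions, not a fixed Atlas schema):

```python
# Hypothetical shape of a state-match response; not a fixed Atlas schema.
match_response = {
    "candidates": [
        {"state_id": "state_export_dialog", "confidence": 0.92},
        {"state_id": "state_print_dialog", "confidence": 0.41},
    ],
    "elements": [
        {"element_id": "btn_export", "role": "button", "label": "Export"},
    ],
}
```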
The agent then:
- Selects the most plausible state.
- Optionally confirms via additional checks (e.g., checking that key labels match expectations).
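A minimal sketch of this selection step, operating on a response shaped like the example above (the confidence threshold and the label check are illustrative choices, not Ariane requirements):

```python
def select_state(match_response: dict, observed_labels: set[str],
                 min_confidence: float = 0.8) -> str | None:
    """Pick the most plausible state_id, then sanity-check it against the screen."""
    candidates = sorted(match_response["candidates"],
                        key=lambda c: c["confidence"], reverse=True)
    if not candidates or candidates[0]["confidence"] < min_confidence:
        return None  # too uncertain: re-observe, refine fingerprints, or ask the user

    # Confirmation: at least one key label Atlas associates with this state
    # should actually be visible on screen.
    expected = {e["label"] for e in match_response.get("elements", [])}
    if expected and not (expected & observed_labels):
        return None
    return candidates[0]["state_id"]
```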
3. Inspecting Available Actions
Once the agent has a state_id, it can ask:
- What actions are available?
  - Query Atlas for all outgoing transitions from this state.
  - Retrieve each transition’s:
    - Action type.
    - Target element.
    - Target state.
    - Optional intent.
- How are actions presented in the UI?
  - For each associated element, retrieve:
    - Role (button, menu item, etc.).
    - Label text.
    - Bounds/coordinates.
    - Patterns (primaryAction, destructiveAction, etc.).
The agent now knows, for this state:
- Which UI elements exist.
- What they do structurally.
- What they likely mean semantically.
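Put together, the agent can assemble a local action table for the current state. A sketch, again using a hypothetical `atlas` client whose methods mirror the queries above:

```python
from dataclasses import dataclass

@dataclass
class AvailableAction:
    transition_id: str
    action_type: str                    # e.g., "click", "type"
    intent: str | None                  # optional semantic intent on the transition
    target_state: str
    role: str                           # element role: button, menu item, ...
    label: str                          # visible label text
    bounds: tuple[int, int, int, int]   # x, y, width, height
    patterns: list[str]                 # e.g., ["primaryAction"]

def available_actions(atlas, state_id: str) -> list[AvailableAction]:
    """Join each outgoing transition with the element that triggers it."""
    actions = []
    for t in atlas.outgoing_transitions(state_id):  # hypothetical query
        el = atlas.element(t.target_element_id)     # hypothetical query
        actions.append(AvailableAction(
            transition_id=t.id,
            action_type=t.action_type,
            intent=t.intent,
            target_state=t.target_state_id,
            role=el.role,
            label=el.label,
            bounds=el.bounds,
            patterns=el.patterns,
        ))
    return actions
```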
4. Planning with Intents
Given a goal intent (e.g., ExportToPDF), the agent can treat the UI graph as a planning space.
Local Decision
In many cases, the next step is local:
- If any outgoing transition from the current state has intent == goalIntent, choose that transition.
Example:
```
current_state: state_home
goal_intent: ExportToPDF
```

Atlas returns:

```
- transition: trans_home_to_export_dialog
- intent: Export
```
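A minimal sketch of this local check, reusing the AvailableAction records from the previous section. Note that in the example above the returned intent (Export) is broader than the goal (ExportToPDF), so a real agent may need hierarchical or fuzzy intent matching rather than the strict equality shown here:

```python
def local_step(actions: list[AvailableAction],
               goal_intent: str) -> AvailableAction | None:
    """Return a transition that directly carries the goal intent, if any exists."""
    for action in actions:
        if action.intent == goal_intent:
            return action
    return None  # no direct match: the agent must search deeper in the graph
```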