
Theseus / Exploration Engine

The Exploration Engine controls how Theseus moves through an application.

Its goal is to build a useful UI graph with bounded effort, while minimizing risk.


Inputs and Outputs

Inputs:

Outputs:


Core Loop

Conceptually, the Exploration Engine runs a loop like this:

  1. Identify the current state (using state identification).
  2. If the state is new:
    • Record it.
    • Enumerate candidate actions.
  3. Pick a safe, unexplored action.
  4. Execute the action via the driver.
  5. Observe the resulting UI and compute the next state.
  6. Record a transition from the previous state to the new state.
  7. Repeat until no further actions are available within constraints.
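
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the engine's actual implementation: `ToyDriver`, `TOY_APP`, `Graph`, `explore`, and the `max_steps` parameter are all hypothetical names, and the "application" is a small in-memory dictionary rather than a real UI.

```python
from dataclasses import dataclass, field

# Hypothetical toy application: each screen maps action names to the screen they open.
TOY_APP = {
    "home": {"open_settings": "settings", "open_about": "about"},
    "settings": {"back": "home"},
    "about": {"back": "home"},
}


class ToyDriver:
    """Stand-in for a real driver; it only tracks which toy screen is showing."""

    def __init__(self, start="home"):
        self.screen = start

    def identify_state(self):
        return self.screen

    def candidate_actions(self):
        return list(TOY_APP[self.screen])

    def execute(self, action):
        self.screen = TOY_APP[self.screen][action]


@dataclass
class Graph:
    states: set = field(default_factory=set)
    transitions: list = field(default_factory=list)  # (source, action, target)
    unexplored: dict = field(default_factory=dict)   # state -> actions not yet tried


def explore(driver, is_safe=lambda action: True, max_steps=100):
    graph = Graph()
    for _ in range(max_steps):
        state = driver.identify_state()                         # step 1
        if state not in graph.states:                           # step 2
            graph.states.add(state)
            graph.unexplored[state] = [
                a for a in driver.candidate_actions() if is_safe(a)
            ]
        if not graph.unexplored[state]:                         # step 7
            break
        action = graph.unexplored[state].pop(0)                 # step 3
        driver.execute(action)                                  # step 4
        next_state = driver.identify_state()                    # step 5
        graph.transitions.append((state, action, next_state))   # step 6
    return graph


graph = explore(ToyDriver())
```

Note that this sketch stops as soon as the current state has nothing left to try. It does not navigate back to earlier states that still have unexplored actions, which is what a traversal strategy adds.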

Traversal Strategy

The engine treats the UI as an implicit graph that it reveals progressively: each state it identifies is a node, and each transition it records is an edge.

Depth-First Exploration

A simple and useful default is depth-first search (DFS):

Benefits:

Other traversal modes (e.g., breadth-first, heuristic-guided search) can be added, but DFS is a good baseline.
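
As an illustration of the depth-first order, the sketch below walks a small hypothetical screen graph with an explicit stack. `dfs_order` and `APP` are made-up names for this example. A real engine would also have to physically navigate back to a state when it backtracks, which this in-memory version does not model.

```python
def dfs_order(app, start):
    """Return states in the order a depth-first traversal discovers them."""
    discovered = [start]
    seen = {start}
    # Each stack frame holds a state and an iterator over its outgoing actions.
    stack = [(start, iter(app[start].items()))]
    while stack:
        _state, actions = stack[-1]
        for _action, target in actions:
            if target not in seen:
                seen.add(target)
                discovered.append(target)
                stack.append((target, iter(app[target].items())))
                break  # go deeper before trying this state's remaining actions
        else:
            stack.pop()  # every action tried: backtrack to the previous state
    return discovered


# Hypothetical application: screen -> {action: resulting screen}.
APP = {
    "home": {"settings": "settings", "help": "help"},
    "settings": {"display": "display", "back": "home"},
    "display": {"back": "settings"},
    "help": {"back": "home"},
}

order = dfs_order(APP, "home")
```

Here the traversal reaches `display` (two levels down) before it ever visits `help`, which is the defining behavior of DFS.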


Loop Detection and State History

To avoid infinite loops and redundant work:

If an action leads to a state that is already known:

This ensures the exploration converges even if the application allows cycling between screens.
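
One way to implement this, shown here as a sketch under stated assumptions, is to reduce each observed screen to a fingerprint and keep a set of fingerprints already seen. The `fingerprint` function, the `(role, label)` element format, and the `StateHistory` class are hypothetical. The policy shown (record the transition to a known state but do not expand that state again) is one reasonable choice, not necessarily the engine's.

```python
import hashlib


def fingerprint(elements):
    """Hash a screen's visible elements into a stable, order-insensitive ID."""
    canonical = "|".join(sorted(f"{role}:{label}" for role, label in elements))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class StateHistory:
    def __init__(self):
        self.known = set()
        self.transitions = []

    def observe(self, source, action, elements):
        """Record the transition and report whether the target needs expanding."""
        target = fingerprint(elements)
        if source is not None:
            self.transitions.append((source, action, target))
        is_new = target not in self.known
        self.known.add(target)
        return target, is_new


history = StateHistory()
home = [("button", "Settings"), ("button", "Help")]
settings = [("button", "Back"), ("toggle", "Dark mode")]

home_id, home_new = history.observe(None, None, home)
settings_id, settings_new = history.observe(home_id, "tap Settings", settings)
# Going back yields the same elements in a different order: same state, not new.
back_id, back_new = history.observe(settings_id, "tap Back", list(reversed(home)))
```

Because each state is expanded at most once and the set of states is finite, a cycle such as home, settings, home adds an edge to the graph without triggering further work.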


Action Selection and Prioritization

Not all actions are equally important. The engine can prioritize:

Examples of filters:

Exact heuristics can be tuned per application or platform.
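
The sketch below shows one possible shape for filtering and ranking candidate actions. Everything in it is an assumption made for illustration: the `Action` fields, the `ROLE_WEIGHT` values, the visibility and enabled-state filters, and the novelty bonus are not taken from Theseus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    label: str
    role: str            # e.g. "button", "link", "text_field"
    visible: bool = True
    enabled: bool = True


# Hypothetical weights; a higher value means "try sooner".
ROLE_WEIGHT = {"tab": 3, "menu_item": 3, "button": 2, "link": 2, "text_field": 1}


def passes_filters(action):
    """Drop actions that cannot be performed right now."""
    return action.visible and action.enabled


def score(action, already_tried):
    base = ROLE_WEIGHT.get(action.role, 0)
    novelty = 0 if action.label in already_tried else 5
    return base + novelty


def prioritize(actions, already_tried=frozenset()):
    candidates = [a for a in actions if passes_filters(a)]
    # sorted() is stable, so ties keep their original on-screen order.
    return sorted(candidates, key=lambda a: score(a, already_tried), reverse=True)


actions = [
    Action("Name", "text_field"),
    Action("Save", "button", enabled=False),
    Action("Profile", "tab"),
    Action("Help", "link"),
]
ranked = [a.label for a in prioritize(actions, already_tried={"Profile"})]
```

Keeping the weights in a plain dictionary makes them easy to override per application or platform.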


Safety and Risk Management

Some actions are potentially destructive (e.g., delete, reset, format, uninstall). The engine should avoid them by default unless running in a controlled sandbox.

Safety mechanisms include:

The Exploration Engine should treat safety as a first-class concern.
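
A minimal sketch of such a default-deny check follows. The keyword list reuses only the examples named above (delete, reset, format, uninstall). The `is_safe` function, its `sandboxed` flag, and the `allowlist` parameter are hypothetical names, and a keyword match on labels is deliberately conservative rather than precise.

```python
import re

# Keywords taken from the examples in the text; a real list would be longer.
DESTRUCTIVE = re.compile(r"\b(delete|reset|format|uninstall)\b", re.IGNORECASE)


def is_safe(label, sandboxed=False, allowlist=frozenset()):
    """Return True if an action with this label may be executed."""
    if sandboxed or label in allowlist:
        return True
    return DESTRUCTIVE.search(label) is None


labels = ["Open settings", "Delete account", "UNINSTALL", "Reset filters"]
allowed = [l for l in labels if is_safe(l, allowlist={"Reset filters"})]
```

The allowlist covers labels that match a keyword but are known to be harmless in a given application, so the default can stay strict.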


Handling Non-Standard UIs

Some UIs do not expose sufficient accessibility information to support tree-based exploration.

In these cases, the engine may enable fallback modes via the driver:

  1. Vision-based candidate detection

    • Capture a screenshot.
    • Detect potential interactive regions (buttons, inputs, menus) using vision heuristics.
    • Use OCR to infer text labels where possible.
  2. Coordinate-based interaction

    • Treat candidate regions as elements.
    • Perform actions by clicking/tapping at their coordinates.
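
The coordinate-based step can be sketched as follows, assuming the vision stage has already produced bounding boxes. `Region`, `RecordingDriver`, `click_at`, and `tap` are hypothetical names, and the driver here only records clicks instead of sending them to a real device.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A candidate interactive region found in a screenshot (pixel units)."""
    left: int
    top: int
    width: int
    height: int
    label: Optional[str] = None  # OCR text, if any was recovered

    def center(self):
        return (self.left + self.width // 2, self.top + self.height // 2)

    def element_id(self):
        # With no accessibility ID available, fall back to label plus position.
        return f"{self.label or 'unlabeled'}@{self.left},{self.top}"


class RecordingDriver:
    """Toy driver that records clicks instead of performing them."""

    def __init__(self):
        self.clicks = []

    def click_at(self, x, y):
        self.clicks.append((x, y))


def tap(driver, region):
    """Treat the region as an element and click its center point."""
    x, y = region.center()
    driver.click_at(x, y)


driver = RecordingDriver()
ok_button = Region(left=100, top=40, width=80, height=30, label="OK")
tap(driver, ok_button)
```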

Trade-offs:


Exploration Limits

To keep exploration tractable, the engine uses explicit limits:

When a limit is reached, the engine:

These limits can be configured depending on:
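
As an illustration only, limits of this kind could be represented as a small configuration object that the engine checks before each step. The specific limits shown (`max_depth`, `max_states`, `max_actions`, `max_seconds`), their default values, and the `Limits` and `Budget` classes are assumptions for this sketch, not the engine's documented settings.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Limits:
    # Hypothetical limits and defaults, chosen for illustration.
    max_depth: int = 10
    max_states: int = 500
    max_actions: int = 5000
    max_seconds: float = 600.0


@dataclass
class Budget:
    limits: Limits
    states: int = 0
    actions: int = 0
    started: float = field(default_factory=time.monotonic)

    def exceeded(self, depth):
        """Return the name of the first limit hit, or None to keep exploring."""
        if depth > self.limits.max_depth:
            return "max_depth"
        if self.states >= self.limits.max_states:
            return "max_states"
        if self.actions >= self.limits.max_actions:
            return "max_actions"
        if time.monotonic() - self.started >= self.limits.max_seconds:
            return "max_seconds"
        return None


budget = Budget(Limits(max_states=2))
before = budget.exceeded(depth=1)
budget.states = 2
after = budget.exceeded(depth=1)
```

Returning the name of the limit, rather than a bare boolean, lets the caller log why exploration stopped.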


Error Handling and Recovery

During exploration, actions can fail in various ways:

The engine should:

Failures are data too: they indicate unreachable paths or restricted states.
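
A sketch of treating failures as data: the wrapper below retries a failed action and, if it still fails, stores a failure record alongside the successful transitions. `ActionError`, `ExplorationLog`, `try_action`, the `retries` parameter, and `FlakyDriver` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field


class ActionError(Exception):
    """Hypothetical base class for failures raised by a driver."""


@dataclass
class ExplorationLog:
    transitions: list = field(default_factory=list)
    failures: list = field(default_factory=list)


def try_action(driver, state, action, log, retries=1):
    """Execute an action, retrying on failure; record the outcome either way."""
    last_error = None
    for _attempt in range(retries + 1):
        try:
            target = driver.execute(action)
        except ActionError as exc:
            last_error = str(exc)
            continue
        log.transitions.append((state, action, target))
        return target
    log.failures.append(
        {"state": state, "action": action, "error": last_error,
         "attempts": retries + 1}
    )
    return None


class FlakyDriver:
    """Toy driver: 'open_admin' always fails; other actions fail on odd calls."""

    def __init__(self):
        self.calls = 0

    def execute(self, action):
        self.calls += 1
        if action == "open_admin":
            raise ActionError("element not clickable")
        if self.calls % 2 == 1:
            raise ActionError("timeout")
        return f"after_{action}"


log = ExplorationLog()
flaky = FlakyDriver()
help_result = try_action(flaky, "home", "open_help", log)
admin_result = try_action(flaky, "home", "open_admin", log)
```

The first call recovers on its retry and yields a normal transition. The second exhausts its attempts and is kept as a failure record that marks the path as unreachable from that state.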


Output for Atlas

The Exploration Engine provides Atlas with:

Atlas then persists this information and makes it available to consumers.
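
The actual format Atlas expects is not specified here. Purely as an illustration, the recorded states and transitions could be handed over as a JSON document like the one built below. The `to_atlas_payload` function and every field name in it are assumptions.

```python
import json


def to_atlas_payload(states, transitions, failures=()):
    """Bundle exploration results into a JSON-serializable dictionary."""
    return {
        "states": sorted(states),
        "transitions": [
            {"source": s, "action": a, "target": t} for s, a, t in transitions
        ],
        "failures": list(failures),
    }


payload = to_atlas_payload(
    states={"home", "settings"},
    transitions=[
        ("home", "open_settings", "settings"),
        ("settings", "back", "home"),
    ],
)
encoded = json.dumps(payload, indent=2, sort_keys=True)
```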