Introduction: Simulating Visualisation for Smarter Agents
Autonomous AI agents operate in increasingly complex visual worlds, from physical robots navigating cluttered spaces to virtual agents interacting in rich simulations. Success requires more than reacting to immediate sensory input; it demands the ability to anticipate visual futures, explore unseen possibilities, and learn efficiently from limited real-world visual experience. Computational visual imagination (online generation and manipulation of visual scenarios) and visual dreams (offline synthesis and processing of visual data) are becoming essential capabilities for next-generation autonomous agents.
We define:
Visual Imagination: The agent's ability to use its internal models to actively generate and manipulate visual representations (images, scenes, viewpoints, object transformations) of states, actions, or outcomes not currently perceived. This includes predicting future visual states, visualizing counterfactuals, and synthesizing novel visual concepts.
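One way to picture this definition is as a forward rollout of a learned world model: the agent "imagines" the consequences of a candidate action plan by stepping its internal dynamics model without receiving any new observations. The sketch below is purely illustrative; the names (`WorldModel`, `imagine_rollout`) and the toy linear dynamics are assumptions, not anything prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Toy linear latent dynamics z' = A z + B a, a stand-in for a learned model."""
    def __init__(self, latent_dim=4, action_dim=2):
        self.A = np.eye(latent_dim) * 0.9
        self.B = rng.normal(size=(latent_dim, action_dim)) * 0.1

    def step(self, z, action):
        # Predict the next latent (visual) state without perceiving it.
        return self.A @ z + self.B @ action

def imagine_rollout(model, z0, actions):
    """Generate predicted latent states for a candidate action plan."""
    trajectory = [z0]
    z = z0
    for a in actions:
        z = model.step(z, a)
        trajectory.append(z)
    return trajectory

model = WorldModel()
z0 = rng.normal(size=4)                      # current perceived state
plan = [np.array([1.0, 0.0])] * 5            # action sequence to "visualize"
traj = imagine_rollout(model, z0, plan)      # initial state + 5 imagined steps
print(len(traj))
```

In a real agent the latent states would be decoded back into images, viewpoints, or object configurations, and competing plans could be compared entirely in imagination before acting.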
Visual Dreams: The agent's ability to engage in offline processing involving the generation, replay, augmentation, or recombination of visual experiences (real or synthetic) solely for the purpose of improving its internal models (visual perception, world dynamics, policy) without external interaction.
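A minimal sketch of this offline mode, under assumed names (`ReplayBuffer`, `dream_batch`), is a replay-and-augment loop: stored visual experience is resampled and recombined into synthetic training batches while the agent takes no actions in the world.

```python
import numpy as np

rng = np.random.default_rng(1)

class ReplayBuffer:
    """Store of previously observed frames (real experience)."""
    def __init__(self):
        self.frames = []
    def add(self, frame):
        self.frames.append(frame)
    def sample(self, n):
        idx = rng.integers(0, len(self.frames), size=n)
        return [self.frames[i] for i in idx]

def augment(frame):
    """Recombine a frame: random horizontal flip plus small pixel noise."""
    out = frame[:, ::-1] if rng.random() < 0.5 else frame
    return out + rng.normal(scale=0.01, size=out.shape)

def dream_batch(buffer, batch_size):
    """Synthesize a training batch purely from stored experience."""
    return np.stack([augment(f) for f in buffer.sample(batch_size)])

buffer = ReplayBuffer()
for _ in range(10):                          # experience collected earlier, online
    buffer.add(rng.random((8, 8)))

batch = dream_batch(buffer, batch_size=32)   # offline, no external interaction
print(batch.shape)
```

The resulting batches would feed updates to the agent's perception, dynamics, or policy models; the "dream" is the data-generation loop itself, not any subjective experience.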
Crucially, we address these as functional computational processes, not conscious experiences. Their power lies in enabling agents to "see" possibilities, learn from synthetic sight, and act with visual foresight.