Vid2Coach: Transforming How-To Videos into Task Assistants
Instead of waiting for a user to make a mistake and ask for help, Vid2Coach looks ahead to prevent errors before they ruin a project.
Vid2Coach first breaks a long how-to video down into actionable, high-level steps: instead of making the user watch a 20-minute video from start to finish, the system extracts the key milestones of the task. Each step is then paired with multimodal demonstration details drawn from the video.
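One way to picture the output of this stage is as a list of timestamped steps. The following is a minimal sketch under stated assumptions, not Vid2Coach's implementation: it assumes each transcript segment has already been labeled with a step number (for example, by a language model), and the names `Segment`, `Step`, and `group_into_steps` are hypothetical. Consecutive segments that share a label are merged into one milestone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """A piece of the video's transcript (hypothetical structure)."""
    start: float  # seconds from the start of the video
    end: float
    text: str
    step_id: int  # assumed to be assigned by an upstream model


@dataclass
class Step:
    """One high-level milestone of the task."""
    step_id: int
    start: float
    end: float
    instructions: list[str] = field(default_factory=list)


def group_into_steps(segments: list[Segment]) -> list[Step]:
    """Merge consecutive segments that share a step label into a single Step."""
    steps: list[Step] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if steps and steps[-1].step_id == seg.step_id:
            # Same milestone: extend its time span and collect the instruction.
            steps[-1].end = max(steps[-1].end, seg.end)
            steps[-1].instructions.append(seg.text)
        else:
            # A new label starts a new milestone.
            steps.append(Step(seg.step_id, seg.start, seg.end, [seg.text]))
    return steps


if __name__ == "__main__":
    demo = [
        Segment(0.0, 12.0, "Peel the onion.", 1),
        Segment(12.0, 30.0, "Cut off both ends.", 1),
        Segment(30.0, 95.0, "Dice it into small cubes.", 2),
    ]
    for step in group_into_steps(demo):
        print(step.step_id, step.start, step.end, step.instructions)
```

Because merging only happens between neighbors, a step label that reappears later in the video starts a new milestone rather than being folded into the earlier one, which keeps the steps in the order the user will perform them.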
Using the wearable camera built into commercial smart glasses, the system monitors the user's actions and provides proactive feedback on their progress and on whether each step has succeeded.
The same approach could also extend beyond cooking, for example to providing form corrections in real time during exercise routines.
Vid2Coach also uses retrieval-augmented generation (RAG) to suggest alternative techniques, such as using a plunge chopper instead of a knife.
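The retrieval half of RAG can be sketched in a few lines. This is a minimal, hypothetical illustration: the tip corpus, the function names, and the prompt wording are all assumptions, and simple word-overlap (Jaccard) scoring stands in for the embedding search a production RAG pipeline would more likely use. The retrieved tips would then be handed to a language model together with the current video step.

```python
from __future__ import annotations

import re

# Hypothetical corpus of accessibility tips; a real system would index a larger collection.
TIPS = [
    "Use a plunge chopper instead of a knife to dice onions safely.",
    "Place a damp towel under the cutting board so it does not slide.",
    "Use a liquid level indicator that beeps when a cup is nearly full.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Return the k tips whose wording overlaps most with the query."""
    q = tokenize(query)
    return sorted(corpus, key=lambda doc: jaccard(q, tokenize(doc)), reverse=True)[:k]


def build_prompt(step: str, tips: list[str]) -> str:
    """Combine the video step and retrieved tips into a prompt for a generator model."""
    context = "\n".join(f"- {tip}" for tip in tips)
    return (
        f"Video step: {step}\n"
        f"Relevant accessibility tips:\n{context}\n"
        "Suggest an alternative technique a blind or low-vision cook could use."
    )


if __name__ == "__main__":
    step = "Dice the onion finely with a sharp knife."
    print(build_prompt(step, retrieve(step, TIPS)))
```

Grounding the prompt in retrieved tips, rather than asking a model to invent an alternative from scratch, is the core idea of RAG: the generator is steered toward techniques that already exist in a trusted collection.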
Vid2Coach is a system that reimagines how we learn from video content. Rather than simply watching a tutorial, users actively perform the task while the system watches and assists them through the camera in their smart glasses. Vid2Coach was developed as a research project and presented at the ACM UIST 2025 conference; its primary goal is to make instructional videos accessible to blind and low-vision (BLV) individuals.
It also connects abstract visual landmarks to non-visual sensory indicators that a user can identify, such as texture, scent, and temperature.
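A toy version of this mapping is a lookup from visual phrases to non-visual checks. Everything here is a hypothetical assumption for illustration: the cue table, its wording, and the `annotate` function. A system like Vid2Coach would more plausibly generate such cues with a model than read them from a fixed table.

```python
from __future__ import annotations

# Hypothetical table mapping visual cues to non-visual checks (scent, texture, temperature).
SENSORY_CUES = {
    "golden brown": "a toasted, nutty smell",
    "melted": "a smooth, runny texture when stirred",
    "hot pan": "steady heat felt with a hand held well above the surface",
}


def annotate(instruction: str, cues: dict[str, str] = SENSORY_CUES) -> str:
    """Append a non-visual check for every visual cue found in the instruction."""
    lowered = instruction.lower()
    checks = [check for phrase, check in cues.items() if phrase in lowered]
    if not checks:
        # No visual landmark detected: leave the instruction unchanged.
        return instruction
    return f"{instruction} (Check for: {'; '.join(checks)}.)"


if __name__ == "__main__":
    print(annotate("Fry the onions until golden brown."))
    print(annotate("Add the garlic."))
```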
In an increasingly digital world, learning a new skill, from cooking a complex recipe to assembling furniture, often involves watching how-to videos. For blind and low-vision (BLV) individuals, however, these visually dense resources can be inaccessible, making independent task completion frustrating or impossible. Enter Vid2Coach, a pioneering AI-driven system that transforms standard how-to videos into personalized, interactive, and accessible task assistants.
As a user performs a task, the camera in their smart glasses gives the system a live view of the workspace. Vid2Coach analyzes this feed to track the user's progress against the reference video, classifying the user's actions as "irrelevant," "in-progress," or "complete." It also delivers proactive spoken feedback, such as "Your knife angle looks good, now focus on keeping the slices even," to help users self-correct.
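The three-way classification can be approximated with a small state tracker. This sketch assumes, hypothetically, that an upstream vision model already scores each camera frame for similarity to the current step (`step_sim`) and to that step's finished state (`end_sim`); the thresholds, window size, and class names are illustrative choices, not values from Vid2Coach. A majority vote over recent frames keeps the label from flickering, and `update` reports when the status changes, which is a natural moment to speak feedback.

```python
from __future__ import annotations

from collections import Counter, deque
from enum import Enum


class Status(Enum):
    IRRELEVANT = "irrelevant"
    IN_PROGRESS = "in-progress"
    COMPLETE = "complete"


def classify_frame(step_sim: float, end_sim: float,
                   relevant: float = 0.5, done: float = 0.8) -> Status:
    """Label one frame from two hypothetical similarity scores in [0, 1]."""
    if end_sim >= done:
        return Status.COMPLETE
    if step_sim >= relevant:
        return Status.IN_PROGRESS
    return Status.IRRELEVANT


class ProgressTracker:
    """Smooths per-frame labels with a majority vote over a sliding window."""

    def __init__(self, window: int = 5) -> None:
        self.recent: deque[Status] = deque(maxlen=window)
        self.status: Status | None = None

    def update(self, step_sim: float, end_sim: float) -> tuple[Status | None, bool]:
        """Add a frame; return the current status and whether it just changed."""
        self.recent.append(classify_frame(step_sim, end_sim))
        label, count = Counter(self.recent).most_common(1)[0]
        # Only switch when one label holds a strict majority of the window.
        if count > len(self.recent) // 2 and label != self.status:
            self.status = label
            return self.status, True
        return self.status, False


if __name__ == "__main__":
    tracker = ProgressTracker(window=3)
    for step_sim, end_sim in [(0.2, 0.1), (0.7, 0.2), (0.7, 0.2), (0.9, 0.9), (0.9, 0.9)]:
        status, changed = tracker.update(step_sim, end_sim)
        if changed:
            print(f"Status is now {status.value}")
```

A larger window makes the tracker steadier but slower to notice that a step has finished; the window size trades responsiveness against false alarms.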
Standard AI video description tools typically summarize scenes at a high level, often omitting specific mechanics to keep the narrative flowing. Vid2Coach works differently: it slices an instructional video into structured, high-level steps while analyzing the audio track and the individual frames together, so that each step retains the specific mechanics a high-level summary would leave out.
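Analyzing audio and frames together starts with aligning them on a shared timeline. The following is a hypothetical illustration, not Vid2Coach's method: it assigns sampled frame timestamps to the narration segment that covers them and lists frames that fall where nobody is speaking, which are moments a transcript alone would miss and that would need a visual description. The segment format and function names are assumptions.

```python
from __future__ import annotations

from bisect import bisect_left

# A narration segment: (start_seconds, end_seconds, text), treated as half-open [start, end).
Segment = tuple[float, float, str]


def frames_per_segment(segments: list[Segment],
                       frame_times: list[float]) -> list[tuple[str, list[float]]]:
    """Pair each narration segment with the frame timestamps it covers."""
    times = sorted(frame_times)
    paired = []
    for start, end, text in segments:
        # Binary search finds the slice of sorted timestamps inside [start, end).
        lo, hi = bisect_left(times, start), bisect_left(times, end)
        paired.append((text, times[lo:hi]))
    return paired


def unnarrated_frames(segments: list[Segment], frame_times: list[float]) -> list[float]:
    """Return frame timestamps not covered by any narration segment."""
    return [t for t in sorted(frame_times)
            if not any(start <= t < end for start, end, _ in segments)]


if __name__ == "__main__":
    segs = [(0.0, 4.0, "Peel the onion."), (6.0, 10.0, "Now dice it.")]
    frames = [0.0, 2.0, 4.0, 5.0, 6.0, 8.0]
    print(frames_per_segment(segs, frames))
    print(unnarrated_frames(segs, frames))
```

Treating every segment as half-open means a frame sitting exactly on a boundary belongs to the later segment, so no frame is counted twice.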
The research behind Vid2Coach demonstrates remarkable results: