ChromeOS FaceGaze

FaceGaze (publicly named “Face control”) is a ChromeOS accessibility feature that allows users to control the cursor with their head and perform various actions using facial gestures.

Summary

User flow

FaceGaze can be enabled either in the accessibility quick settings menu or in the ChromeOS settings app under Accessibility > Cursor and touchpad > Face control. Once FaceGaze is enabled, the face recognition model and its backing WebAssembly are downloaded via DLC (downloadable content). When the download succeeds, the face model is initialized and the webcam is turned on. The user can then move the cursor with their head and perform actions with facial gestures. Recognized gestures and their associated actions are posted to the FaceGaze bubble UI, a floating UI component positioned at the top of the display.

FaceGaze has several actions that temporarily put it into a different state. Examples include enter/exit scroll mode, start/end long click, pause/resume FaceGaze, and start/stop Dictation. When scroll mode is active, for example, head movements do not move the mouse but are instead used to determine a scroll direction. Whenever FaceGaze is in an alternate state, the bubble UI indicates it.
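To make the idea of an alternate state concrete, here is a minimal sketch of how a scroll-mode toggle could reroute head movement. All names (`ModeSketch`, `HeadDelta`, `Output`) are hypothetical and are not taken from the extension. The sketch also assumes screen coordinates where positive `dy` points down, and that the scroll direction follows the dominant axis of movement.

```typescript
// Illustrative only: names and behavior are assumptions, not the extension's API.
type HeadDelta = {dx: number; dy: number};

type Output =
    | {kind: 'moveCursor'; dx: number; dy: number}
    | {kind: 'scroll'; direction: 'up' | 'down' | 'left' | 'right'}
    | {kind: 'none'};

class ModeSketch {
  private scrollMode = false;

  /** Bound to a hypothetical "enter/exit scroll mode" action. */
  toggleScrollMode(): void {
    this.scrollMode = !this.scrollMode;
  }

  /** Text a bubble UI could display while an alternate state is active. */
  statusText(): string {
    return this.scrollMode ? 'Scroll mode' : '';
  }

  /** Routes a head movement to either a cursor move or a scroll direction. */
  onHeadMovement(delta: HeadDelta): Output {
    if (!this.scrollMode) {
      return {kind: 'moveCursor', dx: delta.dx, dy: delta.dy};
    }
    // In scroll mode the cursor stays put; pick the dominant axis instead.
    if (Math.abs(delta.dx) >= Math.abs(delta.dy)) {
      if (delta.dx === 0) {
        return {kind: 'none'};
      }
      return {kind: 'scroll', direction: delta.dx > 0 ? 'right' : 'left'};
    }
    return {kind: 'scroll', direction: delta.dy > 0 ? 'down' : 'up'};
  }
}
```

Keeping the mode flag in one place means the same head-movement input can be interpreted differently without the caller knowing which state is active.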

Note that if the DLC download fails, FaceGaze will automatically turn off and a notification will be shown with a failure message.

Technical overview

FaceGaze is implemented primarily as a Chrome extension in TypeScript. It also has a few browser-side components (DLC hook and APIs), as well as ash-side components (bubble UI). The high-level components of the feature are:

The Chrome extension, which is where most of the logic lives
A hook in the extension to connect to the device's webcam
An ML model, called FaceLandmarker, which processes video frames and returns results containing the location of all relevant face points, confidences for facial gestures, and the amount of head rotation. This is the technology that makes FaceGaze possible.
Extension APIs to update the cursor position, send synthetic mouse and key events, and interact with the FaceGaze bubble in the browser (among other things)
The ash-side implementation for the bubble UI
Settings page implementation, where users can configure their cursor settings and update their gesture-to-action bindings

Once FaceGaze is initialized, here's a high-level flow of how it responds to a single camera frame:

FaceGaze will grab the latest frame from the webcam feed
The frame is forwarded to the FaceLandmarker, which returns a raw result with face points, gesture confidences, and head rotation
FaceGaze will further interpret this result and convert facial gestures to actions (called “macros” in the code) depending on the user's preferences
FaceGaze will update the mouse location, perform actions, and update the floating bubble UI
The above process is repeated many times per second to give the user a feeling of responsiveness, e.g. mouse movement responds quickly to head movement
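The third step of the flow above, turning gesture confidences into macros according to the user's preferences, could look roughly like the sketch below. The type and function names, the gesture and macro strings, and the use of a single confidence threshold are all assumptions made for illustration. The real extension may structure this differently.

```typescript
// Illustrative only: every name below is hypothetical.
interface GestureScore {
  name: string;
  confidence: number;  // Assumed to be in the range [0, 1].
}

interface Preferences {
  // User-configured gesture-to-macro bindings.
  gestureToMacro: {[gesture: string]: string | undefined};
  // Assumption: one threshold for all gestures.
  confidenceThreshold: number;
}

/** Returns the macros to run for a single frame's result. */
function macrosForFrame(
    scores: GestureScore[], prefs: Preferences): string[] {
  const macros: string[] = [];
  for (const score of scores) {
    if (score.confidence < prefs.confidenceThreshold) {
      continue;  // Too uncertain; ignore this gesture.
    }
    const macro = prefs.gestureToMacro[score.name];
    if (macro !== undefined) {
      macros.push(macro);  // Only gestures the user has bound produce a macro.
    }
  }
  return macros;
}
```

A gesture therefore has to pass two checks before anything happens: it must be detected with enough confidence, and the user must have bound it to an action.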

As mentioned above, FaceGaze uses a DLC to supply the FaceLandmarker model and the backing WebAssembly.

Accessing the webcam feed

FaceGaze uses the WebRTC API, specifically the ImageCapture API, to grab video frames and pass them to the FaceLandmarker model.
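As a rough sketch of the frame-grabbing loop, the code below pulls frames from anything that exposes a `grabFrame()` method returning a promise, which is the shape of the browser's `ImageCapture.grabFrame()`. The `FrameSource` interface and the `pumpFrames` function are hypothetical names introduced here. The actual loop in the extension may be scheduled differently.

```typescript
// Illustrative only. FrameSource is a structural stand-in for ImageCapture,
// so the loop can be exercised without a real webcam.
interface FrameSource<Frame> {
  grabFrame(): Promise<Frame>;
}

/**
 * Repeatedly grabs a frame and hands it to `onFrame` while `isActive()`
 * returns true. Resolves with the number of frames processed.
 */
async function pumpFrames<Frame>(
    source: FrameSource<Frame>,
    onFrame: (frame: Frame) => void,
    isActive: () => boolean): Promise<number> {
  let count = 0;
  while (isActive()) {
    const frame = await source.grabFrame();
    onFrame(frame);  // E.g. forward the frame to the landmarker.
    count++;
  }
  return count;
}
```

In a browser context, the source would typically be built by calling `navigator.mediaDevices.getUserMedia({video: true})`, taking the first entry of `stream.getVideoTracks()`, and passing that track to `new ImageCapture(track)`.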

Code structure

The majority of FaceGaze code lives in the facegaze/ extension directory. Settings code lives in chrome/browser/resources/ash/settings/os_a11y_page. Code for the bubble UI lives in ash/system/accessibility.

FaceGaze extension classes

The facegaze/ extension directory contains several noteworthy classes:

FaceGaze, which is the main object. It handles setup/teardown, interacts with APIs like chrome.settingsPrivate, and owns the other essential classes.
WebCamFaceLandmarker, which requests the DLC download, initializes the FaceLandmarker API, starts the webcam, continually passes frames from the video stream into the FaceLandmarker while the video stream is active, and returns results to the main FaceGaze object.
GestureDetector, which computes which gestures were detected, filtering out those with low confidence scores. It also transforms raw gestures into ones supported by FaceGaze; for example, FaceGaze doesn't support “blink left eye” and “blink right eye” individually. Instead, it supports a compound “blink eyes” gesture.
GestureHandler, which does additional processing of FaceLandmarker results and converts recognized gestures into executable macros.
MouseController, which similarly processes FaceLandmarker results to convert recognized face points and rotation into a new cursor location. This class also contains logic to smooth cursor movement so that the cursor moves naturally rather than jumping between positions.
ScrollModeController, which gives users scroll functionality with FaceGaze.
BubbleController, which controls all interaction with the FaceGaze bubble UI.
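Two of the behaviors described above lend themselves to a short sketch: collapsing the two single-eye blinks into one compound gesture, and smoothing cursor positions. The gesture names, the rule that both eyes must pass the threshold, and the choice of a moving average are all assumptions made for illustration. They are not necessarily what GestureDetector or MouseController do.

```typescript
// Illustrative only: names and techniques below are assumptions.
interface ScoredGesture {
  name: string;
  confidence: number;
}

const BLINK_LEFT = 'eyeBlinkLeft';
const BLINK_RIGHT = 'eyeBlinkRight';
const BLINK_EYES = 'eyesBlink';  // Hypothetical compound gesture name.

/** Filters out low-confidence gestures and merges the two blink gestures. */
function detectGestures(scores: ScoredGesture[], threshold: number): string[] {
  const confident = scores.filter((s) => s.confidence >= threshold)
                        .map((s) => s.name);
  // Assumption: the compound gesture fires only if both eyes are confident.
  const bothBlink = confident.indexOf(BLINK_LEFT) !== -1 &&
      confident.indexOf(BLINK_RIGHT) !== -1;
  // Single-eye blinks are never reported on their own.
  const detected =
      confident.filter((n) => n !== BLINK_LEFT && n !== BLINK_RIGHT);
  if (bothBlink) {
    detected.push(BLINK_EYES);
  }
  return detected;
}

interface Point {
  x: number;
  y: number;
}

/** Smooths positions by averaging the most recent `windowSize` points. */
class MovingAverageSmoother {
  private readonly recent: Point[] = [];
  private readonly windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  add(raw: Point): Point {
    this.recent.push(raw);
    if (this.recent.length > this.windowSize) {
      this.recent.shift();  // Drop the oldest point.
    }
    let sumX = 0;
    let sumY = 0;
    for (const p of this.recent) {
      sumX += p.x;
      sumY += p.y;
    }
    return {x: sumX / this.recent.length, y: sumY / this.recent.length};
  }
}
```

A larger window gives steadier movement at the cost of the cursor lagging further behind the head, so any real implementation has to balance the two.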

FaceGaze ash-side classes

FaceGazeBubbleController manages the FaceGaze UI from ash and provides an entry point for updating/changing the UI.
FaceGazeBubbleView is the actual implementation of the FaceGaze UI.

FaceGaze browser-side classes

AccessibilityManager contains logic for setting up/tearing down the extension, forwarding requests and results for DLC downloads, and showing notifications to the user.
AccessibilityDlcInstaller performs the install of the facegaze-assets DLC and passes the contents through to the extension.
DragEventRewriter is a common class that helps implement drag and drop for Autoclick and FaceGaze. While the class is active, all mouse movement events will be rewritten into mouse drag events.
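The rewriting behavior of DragEventRewriter can be illustrated with a small sketch. The real class is a browser-side component, not part of the extension. The TypeScript below only models the idea with hypothetical types and does not mirror its actual interface.

```typescript
// Illustrative only: a simplified model of move-to-drag rewriting.
type MouseEventKind = 'move' | 'drag' | 'press' | 'release';

interface SketchMouseEvent {
  kind: MouseEventKind;
  x: number;
  y: number;
}

class DragRewriterSketch {
  private active = false;

  setActive(active: boolean): void {
    this.active = active;
  }

  /** While active, turns every move event into a drag at the same location. */
  rewrite(event: SketchMouseEvent): SketchMouseEvent {
    if (this.active && event.kind === 'move') {
      return {kind: 'drag', x: event.x, y: event.y};
    }
    return event;  // Other events, or any event while inactive, pass through.
  }
}
```

With this arrangement the code that produces cursor movement needs no knowledge of drag and drop. Starting a long click only has to activate the rewriter, and ending it deactivates the rewriter again.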

FaceGaze settings

TODO

Testing

See the facegaze/ extension directory for all extension tests.
See facegaze_browsertest.cc for C++ integration tests. Note that these tests hook into a JavaScript class called FaceGazeTestSupport, which allows the C++ tests to execute JavaScript or wait for information to propagate to the extension side before continuing. facegaze_test_utils.cc contains support code for writing these tests.
See facegaze.go, which provides infrastructure for FaceGaze in tast. Also see idle_perf.go, which runs FaceGaze idly for ten minutes and collects performance metrics across many different types of physical devices.