Fri Jan 10 2025

The Nvidia artificial intelligence avatar appeared on my screen and left me unsettled.

Nvidia introduced a prototype artificial intelligence assistant at CES 2025 that lives on the user's desktop. The assistant, named R2X, resembles a video game character and is designed to help users navigate the applications on their computer. The R2X avatar is rendered and animated using Nvidia's artificial intelligence models, and users can pair it with popular large language models (LLMs) such as OpenAI's GPT-4o or xAI's Grok.

Users can interact with R2X via text and voice, upload files for processing, and let the assistant see in real time what is happening on the screen or through the camera. Several tech companies have recently been developing artificial intelligence avatars, not only for gaming but also for business clients and consumers. Although the initial demonstrations are somewhat strange, some believe these avatars could become a promising interface for AI assistants.

With R2X, Nvidia aims to combine generative capabilities from the gaming world with the latest advancements in LLMs, aspiring to create an AI assistant that simulates human appearance and behavior. The company plans to release these avatars as open source in the first half of 2025. It views them as a new interface for developers to build on, one that lets users integrate their favorite AI software products or even run the avatars locally.

Similar to Microsoft's Recall feature, which has been delayed due to privacy concerns, R2X can continuously take screenshots and process them through an AI model, although this option is disabled by default. When activated, it can provide feedback on the applications in use and assist, for example, with complex programming tasks. However, R2X is still in a prototype phase, and Nvidia acknowledges that some issues persist. During demonstrations, the avatar exhibited an "uncanny valley" effect, with unusual facial expressions and a somewhat aggressive tone at times.

Despite these drawbacks, R2X provided useful instructions and could accurately see the content on the screen. Nevertheless, there were moments when it gave incorrect directions, and it eventually lost the ability to see the screen. This could be due to limitations in the AI model being used, in this case GPT-4o. In one of the demonstrations, an Nvidia product lead showed how R2X could observe and assist users with applications on the screen, even helping with the generative fill function in Adobe Photoshop. However, the avatar gave mistaken instructions for this function and lost its view of the screen partway through.

In another demonstration, R2X was able to process a PDF from the desktop and answer questions about its content, thanks to a retrieval-augmented generation (RAG) function. To render these avatars, Nvidia uses its RTX Neural Faces algorithm and a new model called Audio2Face™-3D to automate facial movements. This model, however, also experienced difficulties, sometimes leaving the avatar's face stuck in strange positions.

Additionally, R2X avatars are expected to be able to join Microsoft Teams meetings as personal assistants. Nvidia is also working on giving these avatars more autonomous capabilities, allowing R2X to take actions on the desktop. However, these capabilities seem to be a long way off and would require partnerships with software developers like Microsoft and Adobe, which are also exploring similar systems. It is currently unclear how Nvidia generates R2X's voices, although the voice heard when using GPT-4o is noticeably different from ChatGPT's default voices, and xAI's Grok chatbot does not yet have a voice option.