ChatGPT can now read some desktop applications on your Mac.
OpenAI's ChatGPT has begun to integrate with other applications on your computer. On Thursday, the startup revealed that the ChatGPT desktop app for macOS can now read code in a handful of developer-focused coding apps.
The supported applications at launch are VS Code, Xcode, TextEdit, Terminal, and iTerm2. The feature removes a routine step for developers: instead of copying and pasting code into ChatGPT, they can enable it and have the app automatically send the section of code they're working on, along with their query, as added context.
That said, ChatGPT still cannot write code directly inside development applications the way AI tools such as Cursor or GitHub Copilot can. The feature, called "Work with Apps," does not act as an autonomous agent, though OpenAI describes it as a "key element" in building more advanced systems: one of the main hurdles in creating AI agents is understanding what appears on a user's screen, not just what they type. In this initial phase, OpenAI is focusing on coding applications, since programming assistance has become one of the most popular uses of language models.
The new feature is already available to Plus and Team users, with plans to bring it to the Enterprise and Edu tiers in the coming weeks. OpenAI also plans to expand ChatGPT's compatibility to other kinds of applications, particularly text-based apps used for writing tasks.
In a demonstration for TechCrunch, an OpenAI employee opened ChatGPT alongside an Xcode project that modeled the solar system but was missing Earth. The employee selected Xcode from within ChatGPT so the chatbot could access the application, then asked it to "add the missing planets." The chatbot completed the task, generating the code to represent Earth, though the employee still had to paste ChatGPT's response back into Xcode by hand.
To read other applications, OpenAI primarily relies on the macOS accessibility API, which exposes on-screen text. This approach is relatively reliable, but for certain applications, such as VS Code, it requires installing an extension. It also has a hard limitation: the API can only surface text, so ChatGPT cannot see visual elements such as images or videos. Depending on the app, "Work with Apps" sends up to the last 200 lines of code along with a request; in other cases, it uses the full contents of the active window as input.
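The context-capping behavior described above can be sketched in a few lines. The helper below is a hypothetical illustration of the reported "last 200 lines" rule, not OpenAI's actual implementation; reading the focused window's text through the accessibility API is only indicated in a comment, since it requires macOS-specific permissions and bindings.

```python
def build_context(editor_text: str, max_lines: int = 200) -> str:
    """Keep only the trailing max_lines lines of the focused editor's text.

    Mirrors the reported behavior: "Work with Apps" sends up to the
    last 200 lines of code from the active window alongside a request.
    """
    lines = editor_text.splitlines()
    return "\n".join(lines[-max_lines:])


# Hypothetical usage: in practice, editor_text would come from the
# focused app via the macOS accessibility API (e.g. an AXUIElement's
# value attribute), then be trimmed before attaching it to the prompt.
source = "\n".join(f"line {i}" for i in range(1, 501))  # a 500-line file
context = build_context(source)
print(len(context.splitlines()))  # 200
```

Capping the context this way keeps prompts within the model's limits while still sending the code the user is most likely working on, at the cost of dropping anything above the trailing window.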
An open question is how OpenAI plans to extend the feature to applications that don't work with Apple's screen-reading technology. Competitors like Anthropic have built systems that analyze screenshots of the desktop to interact with other programs, though those implementations remain buggy and slow.
In a recent briefing, an OpenAI representative clarified that the new feature is not meant to function as an agent, but rather to make collaboration with coding tools easier, adding that more tools are on the way. Still, the step toward agents is notable: OpenAI is reportedly close to launching a general-purpose AI agent, known as "Operator," expected to arrive in early 2025.
For now, the update is limited to macOS, arriving just ahead of Apple's planned ChatGPT integration in December. There is still no word on when "Work with Apps" will come to Windows, the operating system made by Microsoft, OpenAI's largest investor.