Sun Oct 27 2024

Anthropic aims for its artificial intelligence to manage your computer while also influencing the market.

Claude becomes the first artificial intelligence model able to operate a computer to carry out useful tasks.

People are still gradually getting used to tools like chatbots, but the next step may be giving artificial intelligence the ability to operate our computers. Anthropic, a major competitor in the AI field, has introduced a version of its Claude model that not only browses the internet but also opens applications and uses the keyboard and mouse to perform common tasks on a PC.

Jared Kaplan, chief scientist at Anthropic and a professor at Johns Hopkins University, says we are on the verge of an era in which an AI model can use the same tools as humans to carry out a wide range of tasks. Around the same time, the chatbot service suffered an outage, likely caused by a surge in requests.

In a recent demonstration, Kaplan showed Claude helping to plan an outing to watch the sunrise from the Golden Gate Bridge. Claude opened the Chrome browser, searched for relevant information, and created a calendar event to arrange the meeting with a friend, although it did not include directions to the location.

In another demonstration, Claude was asked to build a basic website to promote itself. The model wrote the necessary code in its own interface and then used Visual Studio Code, a code editor, to develop the site. When an error appeared, it identified the problematic section of code and removed it to fix the problem.

Mike Krieger, head of product at Anthropic, expects these AI agents to automate routine tasks, freeing people to focus on other work. The 'agentic' features are now available to developers through the Claude 3.5 Sonnet API; Anthropic also announced a smaller model, Claude 3.5 Haiku.

Impressive as these demonstrations are, getting the technology to work consistently and without errors remains a significant challenge. Today's models, which power chatbots such as ChatGPT and Gemini, can answer questions with near-human proficiency and carry out tasks from simple commands, but reliably operating a computer is a harder test.

Anthropic claims that Claude outperforms other AI systems on several benchmarks, such as SWE-bench and OSWorld, although these claims have not yet been independently verified. On OSWorld, Claude reportedly completes its tasks 14.9% of the time: far below humans, who score around 75%, but nearly double GPT-4's 7.7%.

Companies such as Canva and Replit are already testing the 'agentic' version of Claude to automate design and coding tasks. However, experts such as Ofir Press caution that 'agentic' AI often struggles with long-term planning and with recovering from errors, and that it will need to perform well on more complex tests before it can prove its usefulness in practical applications.