If you’re worried about artificial intelligence taking your job, you might want to sit down for this one. AI startup Anthropic has demonstrated a new “Claude” model that can look at a computer screen and operate a virtual mouse and keyboard, “the way people do,” according to promotional material.
In the video demo, researcher Sam Ringer shows Claude performing a bit of data entry “drudge work,” with the AI model using screenshots of a Mac desktop to find relevant information and submit a form. It is indeed the kind of thing that employees all over the world do every day, though Ringer notes that this is a “representative example.” Exactly how much of the video is edited isn’t known.
But you don’t need to take Anthropic’s word for it. An early version of the Claude 3.5 Sonnet API is available to try out now, and Ethan Mollick, a professor studying AI at the University of Pennsylvania’s Wharton School, did just that. Mollick tested out the AI with Universal Paperclips, an online clicker game with some wonderfully subtle science fiction going on in its background.
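If you’d rather poke at it yourself than play a clicker game, the general shape of the API is worth knowing: the model is handed a “computer” tool, and instead of moving the mouse itself it replies with actions that your own program is expected to carry out. The sketch below is a minimal example using the Anthropic Python SDK with the beta identifiers Anthropic published at launch (the “computer_20241022” tool type and “computer-use-2024-10-22” beta flag); treat the exact names, parameters, and the Universal Paperclips prompt as assumptions and check the current documentation before relying on them.

```python
# Minimal sketch of calling the computer-use beta of the Claude 3.5 Sonnet API.
# Assumes the Anthropic Python SDK and the beta identifiers published at launch
# ("computer_20241022" tool type, "computer-use-2024-10-22" beta flag); these
# may have changed, so verify against the current docs.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[
        {
            "type": "computer_20241022",
            "name": "computer",
            "display_width_px": 1280,   # resolution of the screen Claude will "see"
            "display_height_px": 800,
            "display_number": 1,
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "Open the browser tab showing Universal Paperclips and try to win.",
        }
    ],
)

# The model replies with tool_use blocks describing actions (take a screenshot,
# click at x/y, type text). The calling program executes each action on a real
# or virtual machine, returns the resulting screenshot as a tool_result, and
# repeats the loop until the model stops requesting actions.
for block in response.content:
    if block.type == "tool_use":
        print("Requested action:", block.input)
```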
Mollick pointed the program at the game’s browser window and “told it to win,” then sat back and watched it operate. The result was fascinating. The AI was able to identify the point of the game by extrapolating from its text-based interface, then use trial and error to try to win, which in this case basically means making the numbers go up. It was able to fiddle with the price of paperclips to increase its fantasy revenue with some basic A/B testing, the way a real player would. But it didn’t quite put together the steps needed to optimize the process, something that would be fairly obvious to a human player.
The real-world AI was “playing” a game about fictional AI. It ran into a few logic loops that prevented it from making meaningful progress, and Mollick’s virtual machine crashed multiple times before the hours-long game could be completed. But with a pointed nudge from the human operator, “you are a computer, use your abilities,” it was coaxed into writing some basic code to automate its processes.
This is an example of a virtual computer writing virtual code to play a virtual game — we’re going full Inception here, albeit with a fairly basic goal and outcome. Claude declared that it had “successfully ‘won’” the game by reaching a milestone “within the given constraints” after multiple VM crashes.
It didn’t win Universal Paperclips, not by a long shot. But bear in mind that playing this context-heavy game is far beyond the simple data-entry automation shown in Anthropic’s demo video. The AI’s ability to identify a goal and make progress with minimal prodding was impressive. Mollick’s full breakdown is well worth a read.
“[Claude] was flexible in the face of most errors, and persistent,” writes Professor Mollick. “It did clever things like A/B testing. And most importantly, it just did the work, operating for nearly an hour without interruption.”
Anthropic’s Claude AI is available as a free text-based tool on the web and as an app on iOS and Android, with the ability to ask about images and text documents. The latest model (version 3.5) is live for the free tier, but more advanced access requires the $20 per person, per month Pro account, which adds priority bandwidth and more models. Anthropic claims dozens of corporate clients, notably Notion, Intuit (makers of TurboTax), and Zoom.