Opinion

Voice-Controlled AI Gadgets Could Fix the Worst Part of Using a Computer

What if instead of spending time doing stuff on a laptop, you could simply tell an AI to operate it for you?

by Ian Carlos Campbell

We’ve covered several of the ways large language models, and more generally a new wave of artificial intelligence software and hardware, could change the way we play games, work with our own data, and find information online. No one is quite sure what generative AI is actually good for yet (and we’re accepting pretty subpar performance while developers try to figure it out), but it’s at least clear that leaning into natural interactions with digital assistants could change a lot about how we use our computers every day, mainly by removing the need for complicated interfaces.

So far, current AI gadgets are being used to imagine a future without apps, or at least one where they’re far less important. Humane’s Ai Pin rejects them entirely, preferring to work with services directly to create “experiences” that are streamed on demand. Meanwhile, Rabbit’s R1 uses what it calls a “large action model” to navigate web and app interfaces for you, and even gives you the option to train the R1 on new skills Rabbit never imagined.

The 01 Light, a new open-source device from Open Interpreter, could take things even further. With a little setup, the pebble-shaped 01 Light can run and control your existing computer entirely locally. If talking to something in natural language was what got people hooked on the potential of generative AI, the ability to train an AI model to handle all of the repetitive clicking, tapping, and swiping on the devices you already own could realize the actual science fiction vision of an AI assistant, not decades from now, but this year.

The 01 Light

Yes, the 01 Light is another AI gadget, but it might also be the most powerful yet.


Judging by a short demo video Open Interpreter shared on X, the 01 Light is really just one component of a larger project that aims to become the “Linux” of artificially intelligent hardware.

“By combining code-interpreting language models (“interpreters”) with speech recognition and voice synthesis, the 01’s flagship operating system (“01OS”) can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin,” Open Interpreter claims in its open-source documentation. The goal is to achieve the functionality of those devices but with an open-source operating system and hardware platform that’s “modular, and free for personal or commercial use.”

The 01 Light, which Open Interpreter is currently selling for $99, runs locally on your laptop or desktop, but could eventually be hosted entirely on Open Interpreter’s servers. Press a button and ask for the weather, and the local model opens Chrome, heads to the Weather Channel’s website, and reads off the forecast. Ask it to check your calendar and add an event, and it opens Apple’s Calendar app and creates the event. The effect is that there’s a ghostly user at your beck and call, one that already understands your computer’s app interfaces and can be taught new skills with voice instructions alone.
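To make that loop a little more concrete, here’s a minimal sketch of how a button-press request could flow through a code-interpreting model running on your own machine. It uses the `interpreter.chat()` call from the open-interpreter Python package, but the `transcribe()` and `speak()` helpers are hypothetical stand-ins for the speech-recognition and voice-synthesis pieces 01OS would supply, and the settings shown are assumptions for illustration, not Open Interpreter’s actual defaults.

```python
# Minimal sketch: voice request -> code-interpreting model -> spoken reply.
# Assumes the open-interpreter package (`pip install open-interpreter`);
# transcribe() and speak() are hypothetical placeholders for the speech
# recognition and voice synthesis that 01OS itself would handle.
from interpreter import interpreter

interpreter.auto_run = True  # let generated code run without confirmation (convenient for a demo, risky in practice)

def transcribe(audio_bytes: bytes) -> str:
    # Placeholder: a real build would pass the audio to a speech-to-text model.
    return "What's the weather like today?"

def speak(text: str) -> None:
    # Placeholder: a real build would hand this to a text-to-speech engine.
    print(f"[spoken] {text}")

def handle_button_press(audio_bytes: bytes) -> None:
    request = transcribe(audio_bytes)
    # The model writes and executes code locally (open a browser, read a
    # calendar, and so on) and returns its messages when it's finished.
    messages = interpreter.chat(request, display=False)
    # Read the model's final text reply back to the user.
    reply = next((m["content"] for m in reversed(messages)
                  if m.get("role") == "assistant" and m.get("type") == "message"),
                 "Done.")
    speak(reply)

handle_button_press(b"")  # stand-in for the 01 Light's push-to-talk audio
```

The point of the sketch is the shape of the pipeline, not the specific calls: the heavy lifting is the model deciding what code to write and run against your own machine, while the hardware is little more than a microphone, a speaker, and a button.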


The ability to “learn” new information is a key component of the AI products you can use today. OpenAI’s “GPTs” feature is a consumer-friendly way to create customized chatbots based on GPT-4’s existing skills and whatever parameters and new information you introduce. Rabbit’s introduction video demonstrated the R1’s ability to learn how to use Discord as a selling point. The model gets the gist of how software is laid out; you just teach it the specifics so it can repeat the task reliably. It’s less glamorous than our existing smartphone and desktop operating systems somehow being reinvented to anticipate our needs and work for us, but the key thing is that what Rabbit and Open Interpreter are doing is possible now, without a behemoth like Google, Apple, or Microsoft having to do anything.

Computer, Enhance

Science fiction, whether literature, film, or television, is littered with examples of people talking to computers: Captain Kirk talking to the Enterprise’s computer, David Bowman dealing with HAL 9000 in both the 2001: A Space Odyssey film and novel, Theodore Twombly falling in love with Samantha in Her. There are countless examples, often with the added wrinkle of an emotional relationship layered on top of the more utilitarian one of a computer that does things when you talk to it. Some people, to be dangerously general, are very hung up on the idea of a computer servant who does things for you… until it doesn’t. And whether or not you feel the same, it’s easy to see how such an assistant could be interesting and even useful.

We’ve lived with the influence of that interest in AI assistants for years, as evidenced by the smart speaker you might already have in your living room, or the smartphone in your pocket. But the idea of these assistants using your devices for you hasn’t been nearly as thoroughly explored. Samsung’s Bixby was supposed to navigate your Galaxy phone for you, flipping on and off settings you might have a hard time finding yourself. Bixby didn’t light the world on fire, but it was a good enough idea for Siri and Google Assistant to adopt not long after. Cortana was meant to do the same thing for Windows and the Office suite, and Microsoft is only now starting to seriously explore the idea again with the Copilot experience on Windows 11.

The promise of the Rabbit R1 and especially the 01 Light, given its open-source bona fides (you can download the CAD files and schematics to make one right now), is that anything that happens on the web is fair game for an AI assistant. You don’t have to wait for official support; the model can already “see” and “understand” what’s there.

Never Click, Tap, or Swipe Again

There are more than a few tasks for which a physical button or a simple software interface will remain more convenient than talking into what amounts to a walkie-talkie with an AI model inside. Accepting that, though, if Open Interpreter’s concept and implementation catch on, there’s a real chance our relationships with our existing computers could genuinely change, or at the very least that the way apps and interfaces are designed could be radically warped.

Is there a way an interface might be made more friendly to these models? Do we even need to learn how to use professional software like Adobe Photoshop to add a drop shadow if an AI assistant in a device like the 01 Light can navigate the dropdown menus and layers for us? (Or could Adobe sell an AI assistant that does it? That sounds like an easy way to raise subscription fees!) These are the kinds of ripple effects an AI model that understands and can run software could have.

Apps have been the primary way we’ve understood how to get things done on our smartphones, tablets, and laptops. As our needs have changed, apps have gotten more complicated and more functional, but not necessarily easier to use. Apps will stick around, and developers and designers will continue trying to make them accessible, but in the meantime, if AI can make using an app simpler and less time-consuming, that’s an option I want to have.
