Two months ago, I wrote Against Humanoid Robots, defending my bet that humanoid robots capable of replacing humans in domestic tasks would not be commercially available within 10 years.
Since then, enough has happened in this space to warrant the promised addendum. I expect this will become a semi-regular update as long as the field keeps moving forward.
On today’s agenda:
The 1X NEO is announced and deploys into an apartment
Tesla’s We, Robot fiasco
Unitree does very little at all
Boston Dynamics plugs along
A fun surprise…
I. 1X NEO’s Maiden Voyage
Just as I was ready to hit publish on my original post, another domestic robot hit the news: the 1X NEO Beta.
NEO is a lightweight, softer robot explicitly designed for domestic purposes. Its creators announced their intentions to begin deploying to client homes in the thousands by 2025, scaling up to millions by 2028. I figured that was way, way overoptimistic, and decided to come back around to it once there was more meat on those bones. That time has come.
Though 1X have their own channel, most NEO footage is on S3, a fair-sized tech channel, including both a showcase of the robot and long interviews with 1X leadership.
NEO is almost like a detailed answer to my original piece: its lighter and softer body is designed to minimize risk in a home environment, and 1X plans to deploy in homes first in order to get enhanced training data, instead of deploying in industry and slowly working up to domestic applications. That said, I haven’t been able to find the kind of physical specifications on the NEO that are commonplace for its competitors, including its degrees of freedom and whether it uses LIDAR.
Which brings us to the NEO’s first deployment, into the apartment of S3 creator Jason Carman. Released just a week after my piece, this video features the NEO operating in an apartment, accompanied by a squad of 1X engineers and a ton of film equipment. This is slickly made, but it’s not hiding its artificiality. One thing I do like about 1X is that they’re fairly transparent compared to companies like Tesla and Unitree, and seem to be truly serious about developing a robot for home use, with all the challenges and risks that entails.
NEO: Capabilities
Up front, 1X highlight tasks like preparing coffee and making your bed. If it can’t do things like this, it’s not ready for home use. So, how does the NEO perform?
No word on making a bed, but it does make coffee! Kinda…
Rather, what the video shows is the NEO pouring hot water through a filter, and b-roll at the end shows it pressing the button on an electric kettle and shaking coffee grounds into the filter. It specifically does not pour the coffee into a mug. (1:51)
In my original article and going forward, I go by ‘if you don’t see it, it’s not there’ rules. Any capacity that is implied but not specifically shown, for our purposes, does not exist. Do not be fooled by the illusion of motion: these are still frames.
Everything between the moments shown is a blank, and we should not assume that the robot did things like get the coffee grounds or roll up the filter, or that it performed any of the tasks we did see on the first try.
If you’re a big fan of robotics, it might seem like I’m just pouring cold water for no reason, but it’s necessary to counter the obscene hype things like this can get. People, from online enthusiasts to otherwise sober business leaders, are too willing to assume vast capabilities which aren’t actually shown, or to believe that performance in a controlled lab on specifically-trained tasks matches live performance in the wild.
The NEO also performs some other tasks: in a series of cuts, it grabs a pepper from an open fridge, the pepper is taken from its hand, and it closes the fridge. This looks nice and smooth in the edit, but we should expect that it was much slower live, and it may not have worked on the first try or with just one set of instructions. (3:34)
It’s also seen to pick up an egg and hand it over, this time in a smooth motion, though you can see a change in the position of the egg carton which suggests some intermediary action or multiple attempts. (4:08)
And then the NEO is asked to prove that the eggs aren’t hard boiled, which it accomplishes by taking an egg and dropping it on the counter. (4:42)
All of these moments are staged, but this one is the most obviously so, even contrived. Not just because of the humor of the moment, but because this is not the behavior you’d want in the real thing: in response to a question like that, we do not want it to wordlessly make a mess in our very Calvinist kitchen, which we must then clean up. At the very least, a clarifying question would be in order.
This does bring up the question of whether the NEO is actually acting autonomously or being teleoperated. We know it’s accompanied by a team of off-camera engineers, and there are a couple of moments, like a little wave at 2:46, that seem like human timing. Still, I do believe the substantive actions discussed above were performed autonomously; if they weren’t, and 1X were willing to be a lot more deceptive (like certain companies who will have to wait their turn), they’d be showing something much smoother.
That said, I’m also confident that it’s not acting autonomously based on verbal commands alone. Instead, I expect their behind-the-scenes work involved a lot of manual operation, repositioning, and setting up the environment in between verbal commands.
A few more small notes: as I expected, 1X’s plan for a NEO deployment begins with a guided tour of its environment, as well as setting preferences like off-limits areas. It remains to be seen how long such a tour takes and how granular it needs to be.
It’s not until late in the video that we see the NEO walk, and it’s more of an undignified waddle. Though it’s mechanically capable of smoother motion, like we saw in the original announcement, it’s not quite there in autonomous motion. My thesis that software is a much bigger bottleneck than hardware remains triumphant.
All in all, the NEO’s present capabilities aren’t shocking, and, nice editing aside, 1X isn’t making outrageous claims here. According to the machine itself, its ability to do something like cook autonomously is at least several years out.
Rather, its present capabilities are less interesting than 1X’s intended deployment model.
NEO: The Plan
The NEO not only isn’t autonomous today, it won’t even be partially autonomous for years after its intended launch in 2025, and that’s all part of the plan.
Unlike everyone else in this space, 1X isn’t trying to capture industry or development first and then go into the home: they identify the home as the endgame, but want to go there directly without the intermediary steps.
How are they going to do this? By having the first NEOs be teleoperated, and use the resulting data to train autonomous systems, one task at a time, until whole workflows can be automated.
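To make the shape of that pipeline concrete, here’s a minimal sketch of how teleoperation logs could feed per-task imitation learning. Every name and structure below is my own illustration; 1X hasn’t published the details of their training stack.

```python
from dataclasses import dataclass, field

@dataclass
class TeleopEpisode:
    """One teleoperated session: paired sensor frames and operator commands."""
    task: str                                         # e.g. "make_coffee"
    observations: list = field(default_factory=list)  # camera/joint states, per tick
    actions: list = field(default_factory=list)       # operator commands, per tick
    succeeded: bool = False                           # used to filter out failed demos

def demos_for(log: list[TeleopEpisode], task: str) -> list[TeleopEpisode]:
    """Keep only successful demonstrations of a single task."""
    return [ep for ep in log if ep.task == task and ep.succeeded]

def behavior_cloning_pairs(demos: list[TeleopEpisode]) -> list[tuple]:
    """Assemble (observation, action) pairs, the supervised examples a
    behavior-cloning trainer consumes; a real system would fit a neural
    policy on them, while this stub just lines them up."""
    return [(obs, act)
            for ep in demos
            for obs, act in zip(ep.observations, ep.actions)]

# Roughly the plan: accumulate demos, automate one task, move to the next,
# until whole workflows run without an operator on the line.
```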
This is, by far, the most plausible way of acquiring the training data needed for domestic tasks, especially if you can get people to pay you as you do it. But it comes with a whole host of very interesting problems.
First and foremost, the issue of privacy. It’s actually remarkably non-obvious where the issue comes in: after all, we’re expecting these robots to see first adoption by the kinds of people who already have domestic servants and want to replace them. If you’re comfortable having a flesh-and-blood person in your home to do work, why does having the robot controlled remotely seem so different?
I believe it’s because, in the former case, you can actually develop a personal relationship and rapport with a real person, which the robot is designed to obviate. Instead, this faceless thing, whose void-like head keeps it safely on the other side of the uncanny valley, is actually controlled by a person, anonymous and at a remove of hundreds or thousands of miles. It’s a bit eerie, and I’m not sure we’ve seen anything quite like it before in consumer technology. Instead of being either a welcomed guest or a distinct non-person, the teleoperated NEO takes on a voyeuristic quality.
1X’s preferred solution is a set of guardrails on what the operator can see and where they can go, combining geofencing of off-limits areas with an (apparently still in development) machine-vision censor that prevents the operator from seeing certain things and places. 1X must understand how weird this sounds, given that they intentionally compare it to the Black Mirror episode ‘White Christmas’.
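As a thought experiment, here’s what that guardrail stack might look like in miniature; the geometry, names, and ordering are all my own assumptions, not 1X’s actual design.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    image: bytes
    position: tuple[float, float]  # robot (x, y) in the home's mapped coordinates

# Rectangles marked off-limits during the initial guided tour (made-up values).
OFF_LIMITS = [((4.0, 1.0), (7.0, 3.5))]

def in_off_limits(pos: tuple[float, float]) -> bool:
    return any(x0 <= pos[0] <= x1 and y0 <= pos[1] <= y1
               for (x0, y0), (x1, y1) in OFF_LIMITS)

def censor(image: bytes) -> bytes:
    """Stand-in for the machine-vision censor, reportedly still in development;
    the real model would blur faces, documents, and screens here."""
    return image  # placeholder: passes the image through unmodified

def frame_for_operator(frame: Frame) -> bytes | None:
    """Geofence first, censor second: the operator never receives raw footage."""
    if in_off_limits(frame.position):
        return None  # inside an off-limits area, the feed drops entirely
    return censor(frame.image)
```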
But beyond the privacy of the client vis-a-vis the operator, we must also concern ourselves with their privacy with regard to the AI models and anyone else who may view the data.
Unlike other customer details that are rather more nebulous and difficult to get your head around, like internet browsing behavior, this data consists of long-run audio-visual recordings of the client’s domestic life. Even for people comfortable with an Alexa in their home, full A/V from an ambulatory robot is likely a hard sell.
This is doubly important because that data will be the main training set for an AI model, placing considerable practical, ethical, and legal burdens on 1X’s technicians. I also wonder how much 1X, a Norwegian company, has looked into possible conflicts with the EU AI Act. This subject is entirely outside of my expertise, and I would welcome comments from anyone informed on it.
This is going to form an additional client bottleneck early on. In my experience, most people rich and techno-optimist enough to be early adopters of robot servants are also quite cautious when it comes to their personal data. I doubt this will be fatal, but it will slow things down in the beginning.
The selling point of a domestic robot is that, at a given price point and lifetime, it’s cheaper and more reliable than a human being. If the robots are going to be controlled by people anyway, that advantage won’t materialize until they are almost entirely autonomous: upwards of a decade in my bet, some three years on 1X’s much more optimistic timeline.
That is, customers getting a NEO next year (assuming that plan holds) almost certainly won’t be making a one-time purchase. Unless 1X is willing to dip into cash reserves for as long as this training process takes, the NEO is going to require some ongoing payment for the labor of the operators themselves.
Absent time-gating on daily NEO usage, 1X will also need a pool of operators some multiple of the number of robots in circulation, many of them waiting on standby for the robot they control to be activated. Depending on what kind of lag we see on long-distance teleoperation, they’ll probably need to open operation centers in several time zones. Given 1X’s goal of millions of robots by 2028, we should assume their internal estimates aim to have the majority of domestic workflows automated by then, assuming they don’t plan to become one of the world’s largest employers along the way. This ‘robot as a service’ model is meant to be temporary, but how long it will actually take to deprecate is unclear.
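For a sense of scale, here’s the back-of-envelope version with numbers I made up entirely: if each robot is active half the day and operators work standard shifts, the operator pool ends up at more than double the fleet.

```python
# Illustrative staffing math; none of these figures come from 1X.
robots = 10_000
active_hours_per_robot = 12   # per day, absent time-gating
shift_hours = 8               # one operator drives one robot at a time
standby_overhead = 1.5        # idle operators waiting for their robot to wake

operators = robots * active_hours_per_robot / shift_hours * standby_overhead
print(f"{operators:,.0f} operators for {robots:,} robots")  # 22,500 for 10,000
```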
We should also consider what this would be like for the operators. 1X has a lot of options, but it’s not inconceivable that operating a robot servant hundreds of miles away could turn into an unusually alienating job.
As I pointed out in my original piece, human domestic service workers aren’t treated well to begin with, and replacing them with something that just isn’t an ethical subject is certainly part of the appeal of robots for many. But what happens when the robot is, in fact, controlled by a nameless, faceless person across the horizon? What would it be like, however briefly, to run domestic work on a call-center model, and what are the effects of working, visibly and measurably, towards your own obsolescence?
Hopefully, 1X has some notion of how it’s going to do this in an ethical manner (maybe it’s addressed in one of those S3 interviews), but keep it in the back of your mind as we go on.
NEO: Assessment
With all that said, it may seem like I’m hating on 1X, but that’s not the case.
To the contrary, I really like these guys and find them extremely refreshing. Though their videos have the usual issues with slick and deceptive editing, they’re far more transparent than the standard among their competitors. These videos are dense documents, and I’ll often find myself stopping to make a note only to have that concern addressed directly just moments later. In particular, their very frank discussion of cybersecurity issues, and the threat that even small vulnerabilities would pose to their clients and the company, puts me at ease much more effectively than the blank positivity you get from competitors with more developed PR departments.
Would I invest in them? Based on the information available publicly at present, no. But I really do wish them success. Out of the robots I’ve examined so far, 1X is the only one that has seriously set its sights on the home and critically examined what that requires. Where others LARP in this direction, 1X is taking serious and intentional action.
And as the most serious actors I’ve seen in this field, they come in for the most serious critique, but not mockery (wait your turn, Tesla).
In particular, their vision of domestic robotics has highlighted some deficiencies in my original article.
One aspect I didn’t consider directly then, though I was on the threshold, was the robot’s ability not only to infer unstated desires from verbal instructions, but also to take in non-verbal body language. Though the 1X team still expects the NEO to do its more detailed work based on verbal instruction, they also emphasize its ability to respond to body language in moment-to-moment interaction.
Finally, I now see there was a hole in my original bet. Both my friend and I just assumed that any robot deployed into the home would need to already be fully autonomous; in my view, this is the major blocker on deployment and my biggest issue with the 10-year timeline.
I’d have egg on my face if the NEO deployed and met all my original criteria, except that it was completely remote-controlled!
So, with the consent of all parties, I retroactively add the following language into our bet:
That the robot should perform all such tasks autonomously, without the actuality or expectation of teleoperator supervision or interference.
II. Shot, Chaser: Tesla’s Remote Bartending
I’m rather late to the party on this one, but it’s worth covering anyway.
On October 10th, a month and change after my article, Tesla held its We, Robot event. I learned about it the next morning, when I received this link from my aforementioned friend:
Hundreds of videos, like this one, showing people interacting with the Tesla Optimus in natural language, complete with lightning-fast responses and dynamic body language, swept social media. It was a triumph of robotics and AI, indistinguishable from interacting with a human being.
Because they were, in fact, interacting with human beings.
It became clear in short order, between clips in which the Optimus admitted that it was being ‘assisted’ by a human and later announcements, that autonomous behavior was minimal outside of walking from place to place. Everything else, from the conversation to the rock-paper-scissors to pouring drinks from the tap, was performed by a teleoperator.
While outlets like The Verge take a very rosy view of the event, claiming that
‘It’s obvious when you watch the videos from the event, of course’
and
‘It doesn’t feel like Tesla was going out of its way to make anyone think the Optimus machines were acting on their own.’
this is bull hockey. According to reporting by TechCrunch, the illusion broadly worked on the people present, all of whom were investors in or fans of Tesla. They came away from the event fully believing not just that it was moving autonomously but that it was speaking to them autonomously, powered either by ChatGPT or Musk’s competitor chatbot Grok.
These were robots milling about at an event centered around autonomous vehicles, of which Musk said
It can be a teacher, babysit your kids. It can walk your dog, mow your lawn, get the groceries, just be your friend, serve drinks. Whatever you can think of, it will do.
Clearly, most attendees were convinced that the robots were autonomous, as were hundreds of thousands of viewers on social media, including my friend and no small number of influential and otherwise canny entrepreneurs.
This wasn’t obviously false to its target audience, and Tesla made no effort to make them aware of it. Considering how the operators in the videos sound nervous and evasive when asked if ‘they’ are autonomous, yet continue to role-play as robots, I expect they were instructed to maintain the illusion without making direct claims which could be held against them.
But this is not a Disneyland environment in which actors must playfully keep up the illusion of talking to Mickey Mouse; this is a robotics event centered on the promise of autonomous vehicles, run by a company with a long track record of claiming it has cracked autonomy when it clearly hasn’t.
The fact that, as far as I can tell, Tesla never explicitly claimed in as many words that the Optimus robots present were speaking or moving autonomously is the fig leaf between them and allegations of lying to investors.
Needless to say, ‘technically not lying’ to your investors is not a good position to be in, and Tesla’s stock dipped considerably the next day, though it has since recovered. Then again, it’s unclear whether the cause of the dip was the robots themselves or the autonomous cab and van unveiled at the same event, which, broadly, don’t inspire confidence.
It contrasts rather nicely with 1X: a new competitor putting their technical challenges on display, albeit with a digital glamer, versus an established firm covering their deficiencies in layers of deniable illusions. If you think this is acceptable, get your head checked.
As far as getting an actual view into the Optimus’ capabilities, Tesla claims they can share persistent models of their environment, climb stairs, and perform basic interactions with humans, such as passing out canned drinks and protecting their personal space. Neat, though not revolutionary, if true. That said, I’m downgrading my confidence in all claims about the Optimus accordingly.
III. Missing the Forest for the Unitrees
Where Tesla has had a newsworthy couple of months, including for reasons wholly unrelated to robotics I won’t touch on here, Unitree has shown very little.
I included Unitree’s G1 in my original post as the third corner alongside Tesla and Boston Dynamics: a less well-known Chinese firm that revealed their humanoid platform shortly before I wrote my article.
I say ‘platform’ intentionally. Both the existing footage and the little English-language documentation available show only anemic efforts towards AI integration. Instead, it seems more like an engineering and education platform which consumers can program or train themselves. That said, between my original article and the present, it seems to have come back in stock, though I’m not spending my own money to confirm.
Its big announcements surround the G1’s capacity to autonomously right itself, achieve a smoother walk cycle, and jump a considerable distance. The jump is probably irrelevant for domestic purposes, but self-righting is genuinely valuable, and the smoother walk matters so long as that smoothness holds up in less sanitized environments.
IV. A Very Dynamic Halloween
Finally, and as expected, Boston Dynamics continues to deliver updates with little fanfare. Commentary-free footage of the Atlas performing simple but effortful tasks in a factory environment, supposedly fully autonomously, dropped a couple of months after my article, followed the next day by a Halloween variant with the Atlas in a hot dog costume.
To be clear, the Atlas simply is not in the running for domestic applications: it’s expensive and heavy-duty, designed for industry and emergencies. I highlight it in these posts as a benchmark for other robots in development. It’s hard to get too excited about the Unitree G1 doing a long jump when the Atlas has been doing backflips and running obstacle courses for years.
The other reason I bring up the Atlas is highlighted in these videos: where the rest of the humanoid robot industry is headed towards teleoperation, Boston Dynamics seems dedicated to an ML-based approach.
They claim, below this video,
There are no prescribed or teleoperated movements; all motions are generated autonomously online.
which would imply that teleoperation data is a smaller part of the training process for the Atlas than for its contemporaries. That said, I’m not totally clear on this, and plan to look into available details on its training process in the future.
V. Next Plodding Steps
That just about brings us to a close. If all goes well, I just might make this a quarterly post, catching up with the state of the field and figuring out whether I should surrender on my bet.
By next time, I hope to take a broader look at emerging humanoids, including Figure and the Apptronik Apollo.
Until then -
Wait… is that…
VI. Scent Teleportation
In my original piece, I closed with an aspect of domestic robots that I didn’t see people paying attention to: machine olfaction, the ability of a robot to smell and taste. I considered this, though not an essential aspect of domestic robotics, a blind spot that would considerably hamper a robot’s ability to perform domestic tasks such as cooking where smell and taste are crucial.
I also mentioned that the field is far, far less developed than machine vision and hearing, both because smell is not a key sense for humans and because it’s a much tougher problem in the abstract.
But just a week ago, the field experienced a considerable advance.
A team at Osmo claims to have successfully ‘teleported’ a scent from one location to another, analyzing it here and reconstituting it molecularly over there. It’s an amazing accomplishment, and I do recommend reading their account of it.
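In pipeline terms, the claim reduces to something like the cartoon below; the data structures and the odorant mix are entirely my own invention, not Osmo’s published method.

```python
# A cartoon of 'scent teleportation': analyze on one end, mix on the other.

def analyze(sample: str) -> dict[str, float]:
    """Sender side: a GC-MS run identifies the odorant molecules and their
    proportions. The mix below is invented for illustration."""
    return {"linalool": 0.42, "limonene": 0.31, "cis-3-hexenol": 0.27}

def reconstitute(formula: dict[str, float]) -> None:
    """Receiver side: a formulation machine dispenses the same molecules in
    the same ratios from a stocked palette of odorants."""
    for molecule, fraction in formula.items():
        print(f"dispense {fraction:.0%} {molecule}")

reconstitute(analyze(sample="sample A"))
```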
Its most immediate applications would probably be in fields like perfumes and other fragrances, but could this mean that robots might have a sense of smell and taste within a decade?
I’m going with a very strong no.
While this has shown that machines can, in principle, be given wide-ranging olfaction, the hardware required is not only bulky but very, very expensive. A GC-MS (gas chromatograph-mass spectrometer) like the one Osmo used is a hefty piece of desktop lab kit, and goes for… well, it’s expensive enough that when you try to shop for one, you’re looking for a quote, not a market price. That said, Google thinks purchases from second-hand vendors are plausibly in the $1 million range… with 14-day free shipping.
Could mass spectrometers be miniaturized enough to fit inside a robotic frame and become cheap enough to mass-produce? Maybe, but definitely not within 10 years, and barring some extraordinary use case that puts serious investment to that end, not in the next 30 either, and that’s assuming there aren’t fundamental limitations here I’m not aware of.
This is, counter to my usual thesis, a case where hardware is the bottleneck, not software. We could plausibly have an AI model capable of identifying and reconstructing subtle scents within 5 years… but it wouldn’t be able to get input outside a lab.
That downer is my cue to sign off. Until next time, have an excellent week.