Two Rules of AI Business and Startups That Ignore Them

These rules are not new, and they are not mine; I stole them from Andrew Ng and Benedict Evans, two men with a huge following. Still, a large majority of AI entrepreneurs and engineers don’t pay attention to them, maybe because these rules show why their AI project will fail.

AI’s Law of Diminishing Returns

To paraphrase Andrew’s words from Coursera’s Deep Learning Specialization course:

The effort to halve an AI system’s error rate is similar, regardless of the starting error rate.

This is not very intuitive. If an AI system passes 90% of test cases and errors on 10%, then you are 90% done, right? Fix the remaining 10% of errors, and you will have 100% accuracy? Absolutely not. If it took you six months to halve the error rate from 20% to 10%, it will take you approximately another six months to halve 10% to 5%. And another six months to halve 5% to 2.5%. Ad infinitum. You will never achieve a 0% error rate on a real-world AI system. For an illustrative example, see this typical chart of error rate vs the number of training samples:

Notice that later in the training process, training set size increases exponentially with each error rate halving, and the error rate never reaches zero. Sure, you will get more efficient with acquiring training data (e.g., by using low-quality sources or synthetic data). Still, it is hard to believe that acquiring 10X more data is going to be much easier than acquiring the initial set. 
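The halving rule lends itself to a quick back-of-the-envelope calculation. The sketch below is not from Andrew Ng’s course; it simply assumes, as in the example above, that every halving costs the same six months, and the function names are made up for illustration.

```python
import math


def halvings_needed(start_error: float, target_error: float) -> int:
    """Number of times the error rate must be halved to reach the target."""
    if not 0 < target_error <= start_error:
        raise ValueError("need 0 < target_error <= start_error")
    return math.ceil(math.log2(start_error / target_error))


def months_to_target(start_error: float, target_error: float,
                     months_per_halving: float = 6.0) -> float:
    """Total effort, assuming each halving costs the same (an assumption)."""
    return halvings_needed(start_error, target_error) * months_per_halving


print(months_to_target(0.10, 0.01))    # 10% -> 1%:    24.0
print(months_to_target(0.10, 0.0001))  # 10% -> 0.01%: 60.0
```

Under that assumption, the "last 10%" takes two years to shrink to 1% and five years to shrink to 0.01%, and it still never reaches zero.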

This rule becomes more intuitive when dissecting what an AI system error rate represents: uncovered real-world special cases. There are an infinite number of them. For example, one of the easiest machine learning (ML) tasks is classifying images of dogs and cats. It is an introductory task with online tutorials that get 99% accuracy. But solving the last 1% is incredibly hard. For example, is the creature in the image below a dog or a cat?

It is Atchoum the cat, who rose to fame because half of humans recognized him as a dog. The human accuracy on dog/cat classification within 30 seconds is 99.6%. A dog/cat classifier with less than a 0.4% error rate would be superhuman. But it is possible. A training set with hundreds of thousands of strange-looking dogs and cats would teach a neural network to focus just on details encoded in dog or cat chromosomes (e.g., cat eyes). However, building such a dataset is orders of magnitude more complex than a tutorial with 99% accuracy. Other problems lurk in that 1% error rate: photos that are too dark, low-resolution photos, compression artifacts, post-processing by modern smartphones (which adds non-existent details), dogs and cats with medical conditions, etc. The problem space is infinite. This is still considered a solved ML problem, though, because a 1% error rate is low enough for all practical purposes.

But for some problems, such as full self-driving (FSD), even a 0.01% error rate is not satisfactory. Elon Musk said in a 2015 article with Forbes:

“We’re going to end up with complete autonomy, and I think we will have complete autonomy in approximately two years.”

Tesla was so confident in that prediction that they started selling a full self-driving add-on package in 2016, and they weren’t the only ones. Kyle Vogt, CEO of Cruise, wrote a piece called How we built the first real self-driving car (really) in 2017, in which he claimed:

“the most critical requirement for deployment at scale is actually the ability to manufacture the cars that run that software”

So, the software and the working prototype are done; they just need to mass-produce “100,000 vehicles per year.” 

Fast forward to 2024. Elon Musk’s predictions for autonomous Tesla vehicles have earned a lengthy Wikipedia table, mostly in red:

What about Kyle Vogt? In October of 2023 Cruise’s car dragged a pedestrian for 20 feet, after which California’s DMV suspended Cruise’s self-driving taxi license. Kyle “resigned” as CEO in November 2023.  

Don’t misunderstand me: I believe autonomous cars will have a significant market share, probably in the next decade. The failed predictions above illustrate what happens when entrepreneurs don’t respect the AI law of diminishing returns. Elon and Kyle probably saw a demo of a full self-driving car that could drive on its own, on a sunny day, on a marked road. Sure, a safety driver needed to intervene sometimes, but that was only 1% of the drive time. It is easy to conclude that “autonomous driving is a solved problem,” as Elon said in 2016. Notice how ML scientists and engineers didn’t make such bombastic claims. They were aware of many edge cases, some of which are described in crash reports. Edge cases include:

Why so many companies promised a drastic reduction in self-driving error rates in such a short time without having a completely new ML architecture is an open question. Scaling laws for convolutional neural networks have been known for some time, and the new transformer architecture obeys a similar scaling law. 

AI’s Product vs Feature Rule

When is an AI system a good stand-alone product, and when is it just a feature? In the words of Benedict Evans from The AI Summer podcast: “Is this feature or a product? Well, if you can’t guarantee it is right, it’s a feature. It needs to be wrapped in something that manages or controls expectations.” I love that statement. The “it is right” part can be broken down using error rate:

If your AI system has a higher error rate than target users, you have an AI feature in an existing workflow, not a stand-alone AI product.

This rule is more intuitive than the law of diminishing returns. If target users are better at a task, they will not like stand-alone AI system results. They could still use AI to save them effort and time, but they will want to review and edit AI output. If AI completely fails at a task, humans will use the old workflow and the old software to finish the task.

Let’s take MidJourney for example, which generates whole images based on a text prompt. When I used it for a hobby project last year, satisfying artistic images appeared instantly, like magic. But then I spent hours fixing creepy hands, similar to the ones below:

Each time MidJourney created a new image, one of the hands had strange artifacts. Finally, it generated an image with two normal hands—but then it destroyed the ears in another part of the image. The problem was less with wrong details and more with bad UI, which didn’t allow correction of the AI’s mistakes.

Adobe’s approach is different—it treats generative AI as just one feature in its product suite. You use an existing tool, select an area, and then do a generative fill:

You can use it for the smallest of tasks, like removing cigarette butts from grass in a wedding photoshoot. If you dislike AI grass, no problem—revert to the old designer joy of manually cloning grass. Also, Adobe Illustrator has generative Vector AI that generates vector shapes you can edit to your liking.

MidJourney makes more impressive demos, but Adobe’s approach is more useful to professional designers. That doesn’t mean MidJourney doesn’t make sense as a product; its target users are the ones who don’t care about details. For example, last Christmas, I got the following greetings image over WhatsApp:

Did you notice baby Jesus’ hands and eyes? Take another look:

That would never pass with a designer, but that is not the point. There is a whole army of users who don’t care about image composition and details; they just want images that go with their content. In other words, MidJourney is not a replacement for Adobe’s Creative Suite; it is a replacement for stock photo libraries like Shutterstock and Getty Images. And judging by the recent popularity of AI-generated images on social media and the web, people like artsy MidJourney images more than stock photos.

Low-hanging fruit in stand-alone AI products are use cases where a high error rate doesn’t matter or is still better than the human error rate. An unfortunate example is guided missiles; in the Gulf War, the accuracy of Tomahawk missiles was less than 60%. But the army was happy to buy Tomahawks because they were still much more accurate than older alternatives, as fewer than 1 in 14 unguided bombs hit their targets.

Evaluating startups based on the above rules

The great thing is that error rates are measurable, so the above rules give a framework to judge an AI startup quickly. Below is a simple startup example.

Devin AI made quite a splash in March of 2024 with a video demo of developer AI that can create fully working software projects. The announcement says that Devin was “evaluated on the SWE-Bench” (a relevant benchmark), and “correctly resolves 13.86% of the issues unassisted, far exceeding the previous state-of-the-art model performance of 1.96% unassisted.” So, the previous state-of-the-art (SOTA) had a 98% error rate, and they claim to have an 86% error rate. Even if that claim is valid (it wasn’t independently verified), why do their promo videos show success after success? It turns out that the video examples were cherry-picked, the task description was changed, and Devin took hours to complete the tasks.

In my opinion, Microsoft took the right approach with GitHub Copilot. Although LLMs work surprisingly well for coding, they still make a ton of mistakes and don’t make sense as a stand-alone product. Copilot is a feature integrated into popular IDEs that pops up with suggestions when they are likely to help. You can review, edit, or improve on each suggestion.  

Again, don’t get me wrong. I think coding SOTA will drastically improve over the next few years, and one day, AI will be able to solve 80% of GitHub issues. Devin AI is still far away from that day, although the company has a valuation of $2 billion in 2024.

More formally, the framework for evaluation is:

  1. Find a relevant benchmark for a specific AI use case. 
  2. Find the current state-of-the-art (SOTA) error rate and human error rate on that benchmark.
  3. Is the SOTA better or comparable to the human error rate?
    1. If yes (unlikely): Great, the problem is solved, and you can create a stand-alone AI product by reproducing SOTA results.
    2. If no (likely): Check if there is a niche customer segment that is more tolerant of errors. If yes, you can still have a niche stand-alone product. If you can’t find such a niche, go to the next step.
  4. You can’t release a stand-alone AI product. Wait for SOTA to get better, pour money into research, or go to the next step.
  5. Think about how to integrate AI as a feature into the existing product. Make it easy for users to detect and correct AI’s mistakes. Then, measure AI’s return on investment:

    AI_ROI = Effort_saved_by_AI / Effort_lost_correcting_AI

    If too much user time is spent checking and correcting AI errors (AI_ROI<=1), you don’t even have a feature.
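The checklist above can be sketched as a small decision function. This is only one possible reading of the steps: “comparable” is simplified to “less than or equal,” and every name and number below is hypothetical except the two benchmark percentages quoted earlier.

```python
from typing import Optional


def error_rate(success_percent: float) -> float:
    """Convert a benchmark success rate (in %) into an error rate (in %)."""
    return 100.0 - success_percent


def ai_roi(effort_saved: float, effort_lost_correcting: float) -> float:
    """AI_ROI = Effort_saved_by_AI / Effort_lost_correcting_AI (same units)."""
    if effort_lost_correcting == 0:
        return float("inf")  # nothing to correct: pure gain
    return effort_saved / effort_lost_correcting


def evaluate(sota_error: float, human_error: float,
             tolerant_niche: bool, roi: Optional[float] = None) -> str:
    """Walk through steps 3-5 of the framework and return a verdict."""
    if sota_error <= human_error:        # step 3.1
        return "stand-alone product"
    if tolerant_niche:                   # step 3.2
        return "niche stand-alone product"
    if roi is not None and roi > 1:      # step 5
        return "feature"
    return "not viable yet"              # step 4: wait or fund research


# Error rates implied by the resolve rates quoted for SWE-Bench.
print(round(error_rate(13.86), 2))  # 86.14
print(round(error_rate(1.96), 2))   # 98.04

# Hypothetical inputs: a 20% human error rate, 10 hours saved,
# 4 hours lost reviewing and fixing AI output.
print(evaluate(sota_error=86.14, human_error=20.0, tolerant_niche=False,
               roi=ai_roi(10, 4)))  # feature
```

The hard part in practice is not the function but the inputs: finding a benchmark that matches the use case and measuring correction effort honestly.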

Or, to summarize everything discussed here in one sentence:

Every innovative AI use case will eventually become a feature or a product, once the error rates allow it. If you want to make it happen faster, become a researcher. OpenAI’s early employees spent seven years on AI research before overnight success with ChatGPT. Ilya Sutskever, OpenAI’s chief scientist, still didn’t want to release ChatGPT 3.5 because he was afraid it hallucinated too much. Science takes time.

If you found this article useful, please share.

 

 

Daddy, did you really need to buy an electric car?

I made a mistake: I bought an electric car. The EV articles I had read on Hacker News and Reddit didn’t prepare me for a dozen EV infrastructure problems in my part of the EU. The anecdotes below explain the lessons I learned the hard way.

When I see a cool new gadget, my rationality goes away. I tell myself, “Don’t buy another device that will somehow be discharged when you need it,” but to no avail. The siren call sings, “It has USB-C!” until my wallet opens wider than the shark’s mouth in Spielberg’s “Jaws”.

This time, the siren call was a government incentive of 10,600 EUR ($12,100) for a new electric vehicle purchase. “Great,” I thought, “I need to replace my old car.” My experience of driving electric car-shares in Oxford and Berlin was great. So, I purchased a 1.5-ton gadget, with no USB-C ports included, called the Hyundai Kona EV:

Why not a Tesla Model 3, you ask? Fun fact: Nikola Tesla was the naming inspiration for both Tesla cars and Nikola trucks. Tesla was an ethnic Serb born in present-day Croatia, but you can’t officially buy a Tesla or Nikola vehicle in Croatia (where I live) or Serbia. Elon Musk tweeted his hope to open sales in the region early in 2020, but as of August 2020, that hasn’t happened.

First, the good parts: as soon as I picked up my car in Zagreb, Croatia, I was impressed. EVs are so quiet that EU law requires them to produce a buzzing sound when going slower than 30 km/h. Acceleration is instant; torque is strong enough, even in Eco mode, to make the tires scream if I push the “gas” pedal too fast. There are no vibrations and no jerking, because there are no gear shifts. EVs use regenerative braking, again completely silent. EV charging is currently free in Croatia, and depending on rainfall, 44% of that electricity is carbon-free. I was in love with the car.

Lessons 1-6, Slovenia

The first weekend, I drove with a group of friends to a Toastmasters competition in Ljubljana, Slovenia, 143 km (89 mi) from Zagreb. The Kona EV with a 64 kWh battery has a declared range of 450 km (280 mi), so even with a 90% battery, I had enough for a round trip. So I thought. I soon learned the first lesson:

#1: EVs have 20% shorter range in cold weather.

It was November. But still, 450 x 90% x 0.8 = 324 km, and the round trip is 286 km, so we were good. While driving 130 km/h on a highway (80 mph, the legal limit), I noticed my range dropping significantly. The car’s spec sheet lied in the worst possible way: it was technically correct. High torque: true. Range of 450 km: true. But:

#2: EVs can’t deliver both performance and declared range at the same time; you need to choose one.
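The optimistic estimate from the start of the trip is easy to reproduce. This is a minimal sketch using only the figures quoted above; the function name is made up, and it deliberately ignores driving speed, which is exactly the gap lesson #2 points at.

```python
def estimated_range_km(declared_km: float, state_of_charge: float,
                       cold_penalty: float = 0.20) -> float:
    """Declared range scaled by battery level and a cold-weather penalty."""
    return declared_km * state_of_charge * (1.0 - cold_penalty)


one_way_km = 143
round_trip_km = 2 * one_way_km
naive_range_km = estimated_range_km(450, 0.90)

print(round(naive_range_km))                  # 324
print(round_trip_km)                          # 286
print(round(naive_range_km - round_trip_km))  # 38
```

A 38 km margin looks comfortable on paper, but it holds only if the car actually achieves its declared consumption.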

I had the wrong intuition that cars have a longer range on the highway than in the city. That is true for gas cars because gas cars are extremely inefficient in cities. All cylinders are running, swallowing gas when accelerating, and that kinetic energy is lost when braking. As EVs use only the energy needed and recuperate that energy when braking, they really do achieve their declared range when driving in the city. When driving fast, EV range shortens because air resistance is proportional to the square of speed. Gas engines are more efficient when all cylinders are firing at constant, high RPMs. As a result, gas cars have a maximum range at speeds of 89-97 km/h (55-60 mph). EVs have a maximum range at around 55 km/h (35 mph), and range falls linearly as you drive faster. For example, this is the range decrease for the Tesla Model S, depending on speed and temperature:

As I slowed down, the estimated range increased. Unfortunately, part of the highway was closed for construction, and we needed to take a detour. The round trip just got longer, so I decided to stop at an EV charging station. But it was a Type 2 AC charger, meaning another lesson was coming:

#3: To charge at Type 2 AC plugs in the EU, you need to bring your own cable.

Of course, I had none. The fast chargers I had used before had attached cables, and I had wondered what the point of the cable lock button on the Hyundai Kona was. It is there because:

#4: Lock your 230 EUR Type 2 cable if you don’t want it stolen.

Between event sessions, I was Googling “fast EV chargers in Slovenia.” Highway stops by “Petrol” had them, great. Around 11 PM, we left Ljubljana in high spirits. We arrived at the Petrol station and found a fast charger, with cables. Guess what happened next:

#5: EU fast chargers are activated via a proprietary chip card or smartphone app. Each company in each country uses a different card / app.

Of course, I only had a smartphone charging app for Croatia. Lessons were coming fast:

#5A: Petrol gas station clerks can’t sell you a Petrol charging card, and they can’t activate EV charging even when you want to pay for it.

“Download and register in the Petrol app,” the night clerk said. Yes, but:

#5B: Petrol Android app is 50 MB, so if a station is covered by a slow EDGE signal, you can’t download the app.

We got back to the highway, where my friend caught an H+ signal and downloaded the app. At the next Petrol station, we discovered:

#5C: When you hit the “Start charging” button in the Petrol app, it redirects you to a registration form to enter payment and address.

No problem, I can fill in the form, even on an EDGE signal. But then:

#5D: To register in the Petrol app, needed for charging, you need a valid address in Slovenia.

It is good the EV charging station was 20 meters away from the gas station, because I was cursing Petrol so loudly that a clerk would have had every right to call the police. I should have read the Petrol app comments beforehand:

It was past midnight when we got back on the highway. To extend the range, I slowed down to 75 km/h (47 mph). The limit was 130 km/h, so a trailer truck started overtaking us. Then I remembered a Tour de France strategy I saw on TV:

#6: In an emergency, you can extend the range of your EV by tailgating a trailer truck, a strategy called aerodynamic drafting.

Don’t hold me responsible for traffic tickets or a beating from an angry trucker. There I was, hidden behind a trailer truck, to save electricity while driving 90 km/h. The car showed it was using less power, and the range extended. When we entered Zagreb, there were 19 km left:

I went to a fast-charging station I could actually activate, and I fell asleep in the car.

Lessons 7-9, Croatia

I made fun of Petrol for their unfriendliness towards tourists, but at least the EV chargers at Petrol stations are working. In Croatia, you can’t always count on that. In my modest opinion, that stems from the fact that charging is free. Why is free electricity a problem, you ask? There is no commercial incentive to build and maintain charging stations. Instead, most chargers are built when the EU donates money or when the government is pressed to improve EV infrastructure. Local politicians come when a charging station is finished to deliver speeches: “This is an example of Croatia using EU funds wisely, for a green and sustainable future.” After the journalists leave, there is no economic interest in sustaining chargers in the future.

For example, this city-provided Type 2 charger has been broken for three months:

In front of the Zagreb city hall, one charging station displays an error:

While another just states it is out of order:

What I learned is:

#7: Free chargers are often unmaintained, and you are better off with a commercial provider you can hold accountable.

“But, there is nothing to break on a charging station!” I thought. Boy, was I wrong. EV charging stations use an internal computer for authorization and for controlling the display and the charging protocol. Before charging starts, the car locks the cable with a mechanical pin and signals that back to the charger. When the car is unlocked, it signals the charger to drop the voltage to zero, so the mechanical pin holding the cable can be released. Guess what?

#8: Some chargers are not compatible with some EVs and refuse to stop charging, resulting in the “Hotel California effect.” You can check out in the app, but you can never leave.

Croatian Telecom chargers are one example. The first time their cable got stuck in my car, I panicked. Their app was not working, the charger was not responding to car signals, and their charging stations don’t have a stop button:

They have a contact phone number, but it was Sunday afternoon. I called them, expecting three levels of menus and a “we are not working on weekends” message. To my great surprise, a female voice answered after two rings. “I have a big problem with your EV charging station: I can’t get my car out because…” My explanation was cut short by “What is the serial number written on the station?” As I dictated the serial number, I could hear fast typing. A few seconds later, the charging station stopped buzzing, the display went blank for a moment, and I could unplug my car. “Wow, that was fast, thank you very much!” I said. Although they currently don’t charge for electricity, Croatian Telekom is a commercial operation. They have a direct 24/7 support line where a representative can reset any charging station in Croatia, given the serial number. It seems that such a solution was easier than fixing the charging stations or the charging app. I now just call them and say, “Can you reset the station with serial X?” If you work there, let me know; I would like to buy a box of chocolates for the team.
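The lock-and-release handshake behind lesson #8, and why a remote reset frees the cable, can be pictured as a toy state machine. This models only the plain-language description in this section, not any real charging standard or vendor firmware; the class, its methods, and the `honors_unlock_signal` flag are all invented for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # voltage off, pin can be released
    CHARGING = auto()  # cable pin locked, voltage on


class ToyCharger:
    """Toy model of the handshake described in the text (hypothetical)."""

    def __init__(self, honors_unlock_signal: bool = True):
        self.honors_unlock_signal = honors_unlock_signal
        self.state = State.IDLE

    def car_locked_cable(self) -> None:
        # The car reports that the pin is locked, so charging starts.
        self.state = State.CHARGING

    def car_unlocked(self) -> None:
        # An incompatible charger ignores this signal and keeps charging.
        if self.honors_unlock_signal:
            self.state = State.IDLE

    def remote_reset(self) -> None:
        # The support line resets the station regardless of car signals.
        self.state = State.IDLE

    @property
    def cable_can_be_released(self) -> bool:
        return self.state is State.IDLE


charger = ToyCharger(honors_unlock_signal=False)
charger.car_locked_cable()
charger.car_unlocked()
print(charger.cable_can_be_released)  # False: you can never leave
charger.remote_reset()
print(charger.cable_can_be_released)  # True
```

In this picture, the support call is simply a second path back to the idle state that does not depend on the car and charger understanding each other.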

Unfortunately, EV chargers are not the only thing broken here. Our driving culture is also broken:

#9: It is common to find a gas car parked at an EV charging spot, a practice called ICEing (internal combustion engine-ing).

One time I urgently needed to charge, and this was the charging spot:

ICEing is common around the world. Tesla implemented parking locks in China. German police are lifting cars instead of giving parking tickets. Croatian EV owners decided to stage a short protest, where they blocked access to a gas station with EVs, for five minutes:

Online reactions were not sympathetic:

I find Paul’s “they’re blocking poor people” comment particularly insightful. EV owners are currently perceived as rich geeks. There was a similar dilemma in the early 1900s: why should taxpayers’ money be used to build asphalt roads when only rich people can afford cars?

Lesson 10, Austria

After the New Year, my 10-year-old daughter and I were going on a skiing vacation to Austria. This time I prepared like it was D-Day. For obvious reasons, I decided to skip Slovenia and charge somewhere in Austria. The first charging option was a fast charger next to IKEA Graz, but that required registration with SMATRICS, a setup fee, and a monthly subscription. The second option was a charger at our ski village, but I found that to be a tourist trap, as they charged 230.40 EUR for 8 hours. In other words, they were selling electricity at 20 times the Austrian rate. My third option was a valid one: IONITY 50 kW chargers on a highway near Graz, no registration needed, and a full charge for 6.5 EUR. We stopped there, activated charging via an app, and went to a nearby restaurant for Schnitzel. By the time we finished our coffee, the car was fully charged. I could easily have gone to the wrong charging station, so the lesson was clear:

#10: Before a long EV trip, research all charging options, download necessary apps, and read negative user reviews to see if other people had a problem with charging.
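Part of that research is converting time-based prices into a price per kWh. The sketch below does this for the ski village charger; it assumes the Kona would draw its 7.2 kW AC maximum (the figure that comes up in lesson #11) for the whole 8 hours, which is my assumption rather than something the operator stated, and the function name is made up.

```python
def price_per_kwh(total_eur: float, hours: float,
                  charge_power_kw: float) -> float:
    """Effective price per kWh if the car draws charge_power_kw throughout."""
    return total_eur / (hours * charge_power_kw)


# Assumption: the car charges at a constant 7.2 kW for the full 8 hours.
ski_village = price_per_kwh(230.40, 8, 7.2)
print(round(ski_village, 2))       # 4.0 EUR per kWh

# The base rate implied by the "20 times" comparison.
print(round(ski_village / 20, 2))  # 0.2 EUR per kWh
```

If the car drew less than 7.2 kW, the effective price per kWh would be even higher.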

While leaving Graz, I remembered that Tesla studied at, but never graduated from, the nearby Graz Polytechnic. The minor problem was that he came into conflict with a professor over the Gramme dynamo, when Tesla suggested that commutators were not necessary. The bigger problem was that he spent his nights gambling and got into gambling debt.

Lesson 11, Croatia

It was a May trip to Velebit mountain that finally convinced me buying an EV was a mistake.

My daughter and I, along with a dozen other people, were going for a weekend of camping. The camping site confirmed I could use one of the electric sockets for motor homes. Just in case, I found two charging stations in a nearby town. Perfect.

As we arrived at the camping site, I started charging. The Hyundai portable wall charger pulls a maximum of 2.8 kW, so I planned two nights of charging. I joined a barbecue and grabbed a beer. An hour later, the camp owner was searching for me.

“You plugged your EV, right?” he asked.

“Yes, it is charging fine,” I replied.

“Not anymore. The camp’s main electric fuse went off. We don’t have any electricity,” he explained. “Can you unplug it?”

He was apologetic. “Sorry for the situation, can you plug it again after 10 PM when everybody goes to sleep?”

As told, I plugged my car at 10:30 PM. Twenty minutes later, I saw the camp owner walking around with a lamp and asking people if they knew where the EV owner was.

“The main fuse again?” I asked.

“Yes,” he replied, “sorry, but you will have to charge your car somewhere else.”

That was a small private camp, and it seemed the owner had just extended electric cabling from the fuse box of his house. Electricity was used by his house, laundry room, a dozen plugs across the camp, and an electric heater for a glamping tent.

The next day, after hiking during the day, I drove 30 minutes to the Type 2 charger in Gospić. Unfortunately:

#11: AC Type 2 charging speed depends not on the station’s declared power (22 kW), but on the power of the on-board AC charger in your car. That makes Type 2 chargers useless for travel.
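To make lesson #11 concrete: an AC session runs at the smaller of the station’s rating and the car’s on-board limit. This is a minimal sketch using the 64 kWh Kona battery and the power figures quoted in this article; the function name is hypothetical, and charging losses and tapering near full charge are ignored.

```python
def ac_charge_hours(energy_kwh: float, supply_kw: float,
                    onboard_kw: float) -> float:
    """Hours to add energy_kwh when limited by the weaker side."""
    effective_kw = min(supply_kw, onboard_kw)
    return energy_kwh / effective_kw


KONA_BATTERY_KWH = 64
KONA_ONBOARD_KW = 7.2

# 22 kW Type 2 station: the car's 7.2 kW on-board charger is the bottleneck.
print(round(ac_charge_hours(KONA_BATTERY_KWH, 22, KONA_ONBOARD_KW), 1))   # 8.9

# 2.8 kW portable wall charger: now the supply is the bottleneck.
print(round(ac_charge_hours(KONA_BATTERY_KWH, 2.8, KONA_ONBOARD_KW), 1))  # 22.9

# What the 22 kW label would suggest if the car could use all of it.
print(round(ac_charge_hours(KONA_BATTERY_KWH, 22, 22), 1))                # 2.9
```

The gap between roughly three hours on the label and roughly nine in practice is the whole lesson, and the 23-hour figure shows why the wall socket plan needed two nights.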

From an AC charger rated at 22 kW, a Tesla Model 3 can draw 11 kW, and a Kona EV can draw 7.2 kW. I needed to leave my car for seven more hours, so I asked a friend to pick me up. When we got back to the camp, the joke was on me and my fancy electric car. The joke continued the following day because we needed to check out of the camp without a car. My daughter went in one car, I went in another, and our bags were stuffed in a third car. Then our gas car convoy took a detour to Gospić to drop us all off at the charging station.

As my daughter finally sat down in our car, she asked, “Daddy, did you really need to buy an electric car?”

There is a funny twist to the story. There was a charger closer to our camping site, at the Nikola Tesla museum, because, of all places, Tesla was born in the nearby village of Smiljan. I didn’t try that charger, as online comments explained that it is located behind the museum fence. And the fence is closed when the museum is closed, as it was that weekend:

Lesson 12, Conclusion

Humans rationalize their mistakes, and I am no different. Buying an EV was a mistake, but I convinced myself it is not so bad. Over six months, I have wasted 10+ hours on the above EV charging issues. But that is still less than the 54 hours Americans spend stalled in traffic annually or the 44 hours Brits spend searching for parking annually. EV charging is a hassle, but it is only a problem when going on a longer trip. In the city, you deal with rush hours and parking problems every day. The final lesson I learned:

#12: Today’s cars are great, but the car infrastructure sucks. With an EV, in addition to traffic jams and parking, you will regularly have to deal with charger infrastructure.

Nikola Tesla understood that the biggest obstacle to electricity adoption was infrastructure. He championed AC, which is easier to transform to high voltages and transport over large distances.

After winning the War of the Currents, Tesla aimed even higher. He wanted to use currents with specific frequencies for wireless power transfer via earth or air. Early radio receivers used this idea and had large antennas that would provide both signal and electric power wirelessly. But Tesla dreamed bigger, envisioning high-frequency transmitting towers powering electric airplanes:

Instead, we got a world where it is difficult to charge an EV even when standing next to an electrical plug. But this time dreams failed because of people, not because of technology.

 

UPDATE: check discussions on Hacker News, r/TrueReddit, and r/ElectricVehicles.