I read McKinsey's report on GenAI project failures the same way I read my credit card statement: with a mix of denial and resignation. "Surely our GenAI pilots are different," I thought. Spoiler alert: they're not. But here's what the report gets wrong.
The headline screams that most GenAI projects fail. True-ish. But the reason isn't what you think. It's not because the models are dumb. It's not because the technology doesn't work. Projects fail because someone in a meeting room promised 99% accuracy in Q2, and now it's August and the chatbot keeps hallucinating about product lines your company doesn't have.
Let me be blunt: We've forgotten that LLM outputs require serious optimization work.
The Gap Between "Working AI" and "Production AI"
Here's what nobody tells you when you greenlight a GenAI project. Getting an LLM to generate something is easy. Getting it to generate something reliable enough for your actual business is a completely different beast.
I've watched brilliant engineering teams build beautiful proof-of-concepts where the model cranks out creative, coherent responses. Everyone nods. The executive sponsor gets excited. PowerPoint slides multiply like rabbits. Then someone asks the simple question: "But will it actually work for our customers?"
And that's when the lights start flickering.
The problem isn't that GenAI doesn't work. The problem is that we're treating LLM output optimization like it's some nice-to-have afterthought, when it should be the entire game from day one.
What Everyone Forgets: Optimization is 80% of the Work
Let me walk you through what we've learned the hard way:
Grounding and Guardrails: Your model needs to stay in its lane. If you're asking an LLM to help customers troubleshoot billing issues, it needs to know the difference between a possible answer and an accurate answer. That requires guardrails. Serious ones. And they take time to build.
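To make the idea concrete, here's a deliberately toy grounding check: before an answer ships, verify that its content is actually supported by the retrieved documentation and decline otherwise. Real guardrails use entailment models or citation checks, not word overlap; the function name, threshold, and fallback message here are all illustrative assumptions, not a real API.

```python
def grounded_answer(answer: str, context_passages: list[str], min_overlap: float = 0.6) -> str:
    """Toy grounding check: return the answer only if enough of its
    content words appear in the retrieved context; otherwise decline.
    The 0.6 threshold is an arbitrary placeholder you would tune."""
    fallback = "I can't answer that from the billing documentation."
    answer_words = {w.lower().strip(".,!?") for w in answer.split() if len(w) > 3}
    context_words = {w.lower().strip(".,!?") for p in context_passages for w in p.split()}
    if not answer_words:
        return fallback
    overlap = len(answer_words & context_words) / len(answer_words)
    return answer if overlap >= min_overlap else fallback
```

The point isn't the overlap heuristic; it's that "possible answer vs. accurate answer" becomes an explicit, testable gate in the pipeline rather than a hope.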
RAG (Retrieval-Augmented Generation): You think you can just feed your knowledge base into a model and call it a day? No. RAG requires tuning. Which documents get retrieved? In what order? How do you prevent the model from mixing signals across conflicting sources? This is where projects actually live or die, and it's unglamorous work.
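A sketch of what "which documents, in what order, without conflicting sources" looks like as code. This stands in for a real retriever (which would use embeddings, not keyword overlap); the `source`/`version` fields and the dedup-by-newest-revision policy are assumptions chosen to show one way to keep two revisions of the same policy out of the same prompt.

```python
def retrieve(query: str, docs: list[dict], k: int = 3) -> list[dict]:
    """Toy retriever: keep only the newest revision of each source so
    conflicting versions of the same policy never reach the prompt
    together, then rank survivors by keyword overlap with the query."""
    q = set(query.lower().split())
    latest = {}
    for d in docs:
        if d["source"] not in latest or d["version"] > latest[d["source"]]["version"]:
            latest[d["source"]] = d
    scored = sorted(
        latest.values(),
        key=lambda d: len(q & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]
```

Every line of this is a tuning decision: the dedup policy, the ranking function, the cutoff `k`. That's the unglamorous work.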
Fine-tuning and Evaluation: Want your model to sound like your brand? To handle edge cases your industry cares about? To not confidently give wrong answers? That's fine-tuning. And before you can fine-tune, you need to know what "good" looks like, which means building evaluation frameworks before you even start optimizing.
The Evaluation Problem Nobody Wants to Solve: Here's the kicker: you can't just eyeball LLM outputs and call it good. You need systematic evaluation. Metrics. Baselines. Continuous testing. Most teams skip this because it feels like overhead. Then they launch, and suddenly they're dealing with production incidents that could have been caught in week two of the POC.
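"Systematic evaluation" can start far smaller than teams assume. A minimal sketch, assuming you have a labeled golden set and any callable model: run every case, report the aggregate accuracy, and keep every failing case so regressions are diagnosable. Production harnesses add fuzzy matching and LLM-as-judge scoring; exact-match comparison here is a simplifying assumption.

```python
def evaluate(model, golden_set: list[dict]) -> dict:
    """Minimal eval harness: score a model against a labeled golden set.
    Returns overall accuracy plus every failing case, so each run tells
    you not just 'how good' but 'wrong on exactly what'."""
    failures = []
    for case in golden_set:
        got = model(case["input"])
        if got != case["expected"]:
            failures.append({"input": case["input"], "expected": case["expected"], "got": got})
    return {"accuracy": 1 - len(failures) / len(golden_set), "failures": failures}
```

Run this in CI on every prompt or retrieval change and "could have been caught in week two of the POC" stops being a hypothetical.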
The Promise Problem: Why Your Timeline is Already Broken
I've been in enough rooms to recognize the pattern. Someone says: "We need 99% accuracy on customer support responses by end of Q2."
My internal alarm bells go ding ding ding.
Here's the thing about 99% accuracy with LLMs: it's achievable, but not in 12 weeks from a cold start. It requires grounding and guardrails, a tuned retrieval pipeline, fine-tuning against a real evaluation framework, and enough iteration cycles to actually close the gap.
If you're committing to 99%+ accuracy on a complex task in a compressed timeline, yellow lights should be flashing everywhere. Not because it's impossible, but because you're either underestimating the work or setting yourself up to ship something that will embarrass you.
What We've Learned: The Incremental Approach
Here's where I'm going to give you something more useful than hand-wringing.
Start with honest expectations. Your first GenAI implementation won't hit production-grade accuracy. Maybe it gets to 85%. That's fine. That's actually good if it's on a well-defined problem with proper evaluation metrics.
Build evaluation into the POC stage. Don't wait until you're in production to ask "Is this actually working?" Instrument your POC to measure accuracy from day one. Set a baseline. Track improvement. Know exactly which cases the model handles well and which ones make it lose its mind. This single practice will save you from 80% of GenAI project disasters.
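"Know exactly which cases the model handles well" implies slicing results, not just averaging them. A sketch of that instrumentation, assuming each POC result is tagged with a query category (the `category`/`correct` field names are illustrative):

```python
from collections import defaultdict

def accuracy_by_category(results: list[dict]) -> dict:
    """Break POC results down per query category, so the baseline says
    'strong on billing, weak on fraud' instead of one blended number."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        correct[r["category"]] += int(r["correct"])
    return {cat: correct[cat] / totals[cat] for cat in totals}
```

A single blended accuracy number hides exactly the failure modes that surface as production incidents; the per-category view is what makes the baseline actionable.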
Create incremental delivery milestones. Don't aim for "90% accuracy across all use cases" in one sprint. Aim for "65% accuracy on high-volume, low-complexity queries, with built-in escalation to humans." Then iterate. Then expand. Then optimize. Each milestone should move you toward your goal, not bet everything on a single launch.
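The milestone-one shape described above ("65% on high-volume, low-complexity queries, with escalation to humans") is ultimately a routing rule. A hedged sketch, where the complexity labels and confidence threshold are placeholder assumptions you would calibrate against your own data:

```python
def route(query_complexity: str, confidence: float, threshold: float = 0.8) -> str:
    """Milestone-one routing: the model answers only low-complexity
    queries it is confident about; everything else goes to a human.
    The 0.8 threshold is a placeholder to be tuned from eval data."""
    if query_complexity == "low" and confidence >= threshold:
        return "model"
    return "human"
```

Iterating then means widening this gate deliberately, complexity tier by complexity tier, as the eval numbers earn it.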
Be ruthlessly honest about where 95%+ accuracy is non-negotiable. In customer success, some things matter more than others. Answering "What's your return policy?" can tolerate higher error rates than "Is this charge fraudulent?" Know the difference. Invest optimization effort where it actually matters for your business.
Partner with humans, don't replace them. At least initially. GenAI works best as a force multiplier for humans, not as a human replacement. This isn't romantic; it's pragmatic. It also means your accuracy requirements can be more realistic.
The Real Reason Projects Fail
McKinsey will tell you projects fail because of organizational resistance, poor change management, or inadequate talent. All true.
But dig deeper, and you'll find the real reason: misaligned expectations between what people think GenAI can do and what it actually takes to build production-grade GenAI systems.
People see ChatGPT making magic happen and think it's easy. They don't see the optimization work, the evaluation cycles, the guardrail engineering, the fine-tuning data preparation. They see the finished product and work backward, imagining it was fast and straightforward.
It wasn't.
The Pragmatic Path Forward
If you're leading a GenAI initiative, here's what I'd actually do: set honest accuracy expectations up front, instrument evaluation from the first POC, ship incremental milestones with human escalation built in, and spend your optimization budget where accuracy is genuinely non-negotiable.
The Honest Conclusion
McKinsey's right that most GenAI projects fail. But it's not because GenAI doesn't work. It's because we're treating it like magic instead of like the engineering problem it actually is.
LLM outputs don't become production-grade through hope and executive enthusiasm. They become production-grade through systematic optimization, rigorous evaluation, and realistic timelines.
The companies winning with GenAI aren't the ones moving fastest. They're the ones being most honest about what it takes to move well.
Here's my question for you: In your organization, where are people making promises about GenAI accuracy that feel optimistic? And where do you actually have time to build the evaluation infrastructure that makes those promises realistic?
I'm genuinely curious. This is the gap where most projects live—and where the interesting conversations actually happen.