90% of your data is garbage nobody knows how to process

TL;DR

  • 90% of your data is unstructured (emails, PDFs, photos, notes)
  • 97% of companies have invested in Big Data, but only 40% use analytics effectively
  • AI needs clean data; if you don’t have it, it gives you garbage with more confidence
  • Before buying tools: inventory, basic pipelines, single source of truth

The numbers

  • 181 zettabytes of data generated in 2025
  • 90% is unstructured
  • 97% of companies have invested in Big Data
  • Only 40% use analytics effectively

Translation: almost every company has data. Almost none know what to do with it.

What “unstructured” means

Structured data:

SELECT name, date, amount FROM sales

Easy. A table. Clear columns. SQL and done.

Unstructured data:

  • Emails from complaining customers
  • Scanned PDFs of contracts
  • Slack messages from the team
  • Call recordings
  • Product photos in WhatsApp
  • Notes on photographed post-its

90% of your company’s data is this. And it doesn’t fit in a table.

The real problem

Companies buy:

  • Power BI licenses
  • Snowflake subscriptions
  • “Enterprise AI” platforms

And then discover their data is in:

  • 47 Excel files shared via email
  • An Access database from 2008 that “only John knows how to use”
  • Network folders named “FINAL_v3_GOOD_THIS_ONE_YES”
  • The inbox of a CEO who never forwards anything

It’s not a tools problem. It’s a plumbing problem.

What I see as a data engineer

80% of my job isn’t analysis. It’s:

1. Finding where the data lives

  • “Who has the 2019 sales history?”
  • “In an Excel that Maria had before she left”

2. Cleaning garbage

  • Dates in 15 different formats
  • “NULL”, “N/A”, “-”, “ ”, “not applicable” → all the same thing
  • Possible duplicates that nobody can confirm are duplicates rather than different records
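
A minimal sketch of that cleanup in Python, standard library only. The token list and the date formats are assumptions for illustration; the real ones come from looking at your own files. Ambiguous values like 05/03/2019 are resolved by format order (day-first here), and that is a choice someone has to make explicitly.

```python
from datetime import date, datetime
from typing import Optional

# Assumed placeholders that all mean "no value"; extend after profiling real data.
NULL_TOKENS = {"", "null", "n/a", "-", "not applicable"}

# Assumed formats, tried in order. Day-first is listed before month-first,
# so "05/03/2019" is read as 5 March, not 3 May.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y", "%m/%d/%Y", "%Y%m%d"]


def normalize_null(value: Optional[str]) -> Optional[str]:
    """Return None for any placeholder meaning 'missing', else the stripped text."""
    if value is None:
        return None
    cleaned = value.strip()
    return None if cleaned.lower() in NULL_TOKENS else cleaned


def parse_date(value: Optional[str]) -> Optional[date]:
    """Try each known format in order; return None when nothing matches."""
    cleaned = normalize_null(value)
    if cleaned is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # unparseable: flag it for review instead of guessing
```

Returning None for anything unparseable keeps bad values visible. The alternative, silently coercing them, is how the garbage gets into the dashboard.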

3. Connecting systems that don’t talk

  • The CRM doesn’t talk to the ERP
  • The ERP exports CSV with broken encoding
  • Someone has a Python script that “fixes it” but nobody knows where it is
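
For the broken-encoding case, a hedged sketch of what such a “fix it” script often looks like. The candidate encodings are a guess (cp1252 is a common one for Windows-based exports), not a fact about any particular ERP, and the function names are made up for this example.

```python
import csv
import io

# Assumed candidate encodings, tried in order. "utf-8-sig" also strips a BOM.
CANDIDATE_ENCODINGS = ["utf-8-sig", "cp1252"]


def decode_export(raw: bytes) -> str:
    """Decode with the first encoding that works for the whole file."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep every row and mark unreadable bytes visibly (U+FFFD).
    return raw.decode("utf-8", errors="replace")


def read_rows(raw: bytes) -> list[dict[str, str]]:
    """Parse the decoded export as CSV, using the first line as the header."""
    return list(csv.DictReader(io.StringIO(decode_export(raw), newline="")))
```

The important part isn’t the code. It’s that this lives in a repository everyone can find, not on one person’s laptop.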

4. Convincing people to use the system

  • “Yeah, but I have it in my Excel and it works fine”

Why AI won’t save you

The fantasy:

“We’ll put in AI and it will analyze all our data automatically.”

The reality:

AI needs clean, structured, accessible data. If you don’t have it, AI will give you garbage with more confidence.

Garbage in, garbage out. But now with a chatbot telling you the garbage is gold.

What to do before buying AI

1. Data inventory

What data do you have? Where is it? Who maintains it?

If you can’t answer this, you’re not ready for AI.
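
A first pass at the “where is it?” question can be as simple as walking the shared drive. This is a rough sketch, assuming the data sits in files with the extensions listed below (the list is hypothetical). It won’t tell you who maintains each file; that part still means asking people.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions worth flagging; adjust to what your company actually uses.
DATA_EXTENSIONS = {".xlsx", ".xls", ".csv", ".mdb", ".accdb"}


def scan_data_files(root: Path) -> list[Path]:
    """Return every file under root whose extension looks like a data file."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in DATA_EXTENSIONS
    )


def count_by_extension(files: list[Path]) -> Counter:
    """Summarize the scan: how many files of each type were found."""
    return Counter(path.suffix.lower() for path in files)
```

Run it against one network folder first. The count alone is usually enough to start the conversation.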

2. Single source of truth

Per process. Per metric. One place where the good data lives.

Not “John’s Excel” vs “Maria’s report.”

3. Basic pipelines

Extract → Transform → Load. The basics. No glamour.

If your data doesn’t flow, no tool will help you.
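
To show how unglamorous it is, here is a minimal sketch of the three steps using only Python’s standard library and SQLite. It assumes a CSV with name, date and amount columns, matching the sales query earlier; the function names and the skip-empty-amounts rule are illustrative choices, not a recommendation for every case.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict[str, str]]:
    """Extract: read raw CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[str, str, float]]:
    """Transform: trim whitespace, convert amounts, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # skip rather than guess a value
        cleaned.append((row["name"].strip(), row["date"].strip(), float(amount)))
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple[str, str, float]]) -> None:
    """Load: write the cleaned records into a single sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (name TEXT, date TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (name, date, amount) VALUES (?, ?, ?)", records
    )
    conn.commit()
```

Used like this, with made-up sample rows:

```python
raw = "name,date,amount\nAcme,2025-01-15,100.50\nGlobex,2025-01-20,\nInitech,2025-01-22,49.50\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(raw)))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

One table, one query, one number.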

4. Data governance

Who decides what an “active customer” is? Who approves changes to definitions?

Without this, every department has its own truth.
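
One way to make a definition enforceable is to write it down once, in code, and have every report call it. A sketch under loud assumptions: the 90-day window and the owning team below are placeholders I made up, not a standard definition of “active customer”.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical agreed definition: a purchase within the last 90 days.
ACTIVE_WINDOW_DAYS = 90
# Hypothetical owner: the team that must approve any change to the line above.
DEFINITION_OWNER = "sales-ops"


def is_active_customer(last_purchase: Optional[date], today: date) -> bool:
    """Apply the single shared definition of 'active customer'."""
    if last_purchase is None:
        return False  # never bought anything
    age = today - last_purchase
    return timedelta(0) <= age <= timedelta(days=ACTIVE_WINDOW_DAYS)
```

If marketing wants 180 days instead, that becomes a reviewed change to one constant, not a second spreadsheet.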

Conclusion

90% of your data is unstructured.

97% of companies have invested in Big Data.

Only 40% use it effectively.

The difference isn’t the tool. It’s the plumbing.

Before buying AI, make sure you can answer: “How much did we sell last month?” without three people giving you three different numbers.


This connects to another problem: 95% of companies see no results with AI. It’s not the tool, it’s the plumbing.

Someone says the numbers don’t match? Read my Power BI debugging guide to find where the problem is.
