90% of your data is garbage nobody knows how to process
TL;DR
- 90% of your data is unstructured (emails, PDFs, photos, notes)
- 97% of companies invest in Big Data, only 40% use it well
- AI needs clean data; if you don’t have it, it gives you garbage with more confidence
- Before buying tools: inventory, basic pipelines, single source of truth
The numbers
- An estimated 181 zettabytes of data generated in 2025
- 90% is unstructured
- 97% of companies have invested in Big Data
- Only 40% use analytics effectively
Translation: almost every company has data. Almost none know what to do with it.
What “unstructured” means
Structured data:
SELECT name, date, amount FROM sales
Easy. A table. Clear columns. SQL and done.
Unstructured data:
- Emails from complaining customers
- Scanned PDFs of contracts
- Slack messages from the team
- Call recordings
- Product photos in WhatsApp
- Photos of handwritten post-its
90% of your company’s data is this. And it doesn’t fit in a table.
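To make the difference concrete, here is a minimal Python sketch of what "processing" one of those complaint emails might look like. The order-ID format (`ORD-` plus five digits), the ISO date pattern and the field names are hypothetical. Real emails rarely follow a pattern this tidy, which is the point: the regexes recover a couple of columns, and everything else stays as raw text.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: order IDs like "ORD-48213" and ISO dates like "2025-03-14".
ORDER_ID = re.compile(r"\bORD-\d{5}\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass
class Complaint:
    sender: str
    order_id: Optional[str]    # None when the email never mentions an order
    order_date: Optional[str]  # None when no ISO date appears in the text
    body: str                  # the raw text; the patterns only recover part of it


def parse_complaint(sender: str, body: str) -> Complaint:
    """Pull the few fields a table can hold out of free-form email text."""
    order = ORDER_ID.search(body)
    when = ISO_DATE.search(body)
    return Complaint(
        sender=sender,
        order_id=order.group(0) if order else None,
        order_date=when.group(0) if when else None,
        body=body,
    )


email = "Hi, my order ORD-48213 placed on 2025-03-14 arrived broken. Again."
print(parse_complaint("ana@example.com", email))
```

Anything the patterns miss ("arrived broken. Again.") is exactly the part a table can't hold.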
The real problem
Companies buy:
- Power BI licenses
- Snowflake subscriptions
- “Enterprise AI” platforms
And then discover their data is in:
- 47 Excel files shared via email
- An Access database from 2008 that “only John knows how to use”
- Network folders named “FINAL_v3_GOOD_THIS_ONE_YES”
- The inbox of a CEO who never forwards anything
It’s not a tools problem. It’s a plumbing problem.
What I see as a data engineer
80% of my job isn’t analysis. It’s:
1. Finding where the data lives
- “Who has the 2019 sales history?”
- “In an Excel that Maria had before she left”
2. Cleaning garbage
- Dates in 15 different formats
- “NULL”, “N/A”, “-”, “ ”, “not applicable” → all the same thing
- Records that might be duplicates or might be different customers, and nobody knows which
3. Connecting systems that don’t talk
- The CRM doesn’t talk to the ERP
- The ERP exports CSVs with broken encoding
- Someone has a Python script that “fixes it” but nobody knows where it is
4. Convincing people to use the system
- “Yeah, but I have it in my Excel and it works fine”
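Point 2 is where most of the hours go. Below is a minimal Python sketch of that cleanup, assuming the null tokens and date formats listed are the ones your data actually contains (they are illustrative, not exhaustive). Two design choices matter: unknown dates return `None` instead of a guess, and the duplicate check only surfaces candidate groups for a human to decide. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, List, Optional, Tuple

# Illustrative: every spelling of "no value" seen in this hypothetical dataset.
NULL_TOKENS = {"", "null", "n/a", "-", "not applicable"}

# Illustrative formats, tried in order. Putting "%d/%m/%Y" here assumes
# day-first dates, so "03/04/2025" becomes 3 April, not 4 March.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%d %b %Y", "%Y%m%d"]


def clean_null(value: Optional[str]) -> Optional[str]:
    """Map every spelling of 'empty' to None; otherwise return the trimmed text."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.lower() in NULL_TOKENS else stripped


def parse_date(value: Optional[str]) -> Optional[date]:
    """Try each known format; return None rather than guess when none matches."""
    cleaned = clean_null(value)
    if cleaned is None:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None  # unknown format: route to a human instead of inventing a date


def dedup_key(record: Dict[str, str]) -> Tuple[str, str]:
    """Normalize case and spacing so near-identical customers share a key."""
    name = (clean_null(record.get("name")) or "").casefold()
    email = (clean_null(record.get("email")) or "").casefold()
    return (" ".join(name.split()), email)


def duplicate_candidates(records: List[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group records by key; groups of 2+ are candidates, not confirmed duplicates."""
    groups: Dict[Tuple[str, str], List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

The format list is the part you keep extending: every new format you find gets added once, here, instead of being patched in somebody's personal script.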
Why AI won’t save you
The fantasy:
“We’ll put in AI and it analyzes all our data automatically”
The reality:
AI needs clean, structured, accessible data. If you don’t have it, AI will give you garbage with more confidence.
Garbage in, garbage out. But now with a chatbot telling you the garbage is gold.
What to do before buying AI
1. Data inventory
What data do you have? Where is it? Who maintains it?
If you can’t answer this, you’re not ready for AI.
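The inventory doesn't need a tool: a spreadsheet with three columns answers the three questions. Sketched in Python so the gaps become queryable; the `DataAsset` type, the asset names and the locations are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataAsset:
    name: str             # what the data is
    location: str         # where it lives: a table, a network share, an inbox
    owner: Optional[str]  # who maintains it; None means nobody knows


def unowned(inventory: List[DataAsset]) -> List[str]:
    """List the assets nobody maintains: the first gaps to close."""
    return [asset.name for asset in inventory if not asset.owner]


inventory = [
    DataAsset("Monthly sales", "warehouse table sales_monthly", "Finance"),
    DataAsset("2019 sales history", "Maria's old Excel file, location unknown", None),
    DataAsset("Customer list", "CRM", "Sales ops"),
]
print(unowned(inventory))  # ['2019 sales history']
```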
2. Single source of truth
Per process. Per metric. One place where the good data lives.
Not “John’s Excel” vs “Maria’s report.”
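One way to make "one place per metric" enforceable instead of aspirational is a registry that refuses a second source for the same metric. This is a hypothetical `MetricRegistry` sketch, not a feature of any particular product; in practice the same rule often lives in a data catalog or in code review.

```python
from typing import Dict


class MetricRegistry:
    """Hypothetical registry: each metric maps to exactly one source."""

    def __init__(self) -> None:
        self._sources: Dict[str, str] = {}

    def register(self, metric: str, source: str) -> None:
        """Record the source for a metric; a second, different source is an error."""
        existing = self._sources.get(metric)
        if existing is not None and existing != source:
            raise ValueError(
                f"{metric!r} already comes from {existing!r}; refusing {source!r}"
            )
        self._sources[metric] = source

    def source_of(self, metric: str) -> str:
        """Where the good data for this metric lives (KeyError if nobody defined it)."""
        return self._sources[metric]


registry = MetricRegistry()
registry.register("monthly_sales", "warehouse.sales_monthly")
try:
    registry.register("monthly_sales", "johns_report.xlsx")
except ValueError as err:
    print(err)
```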
3. Basic pipelines
Extract → Transform → Load. The basics. No glamour.
If your data doesn’t flow, no tool will help you.
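A minimal sketch of those three steps using only the Python standard library (`csv` and `sqlite3`). Assumptions: an in-memory SQLite database stands in for the warehouse, and a hard-coded string stands in for the ERP's CSV export; the column names are illustrative.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]


def extract(csv_text: str) -> List[Dict[str, str]]:
    """Extract: read raw rows from a CSV export (a string here, a file in real life)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: List[Dict[str, str]]) -> List[Row]:
    """Transform: trim text, skip rows without an amount, convert amounts to numbers."""
    clean: List[Row] = []
    for row in rows:
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # a real pipeline should log skipped rows, not drop them silently
        clean.append((row["name"].strip(), row["date"].strip(), float(amount)))
    return clean


def load(rows: List[Row], conn: sqlite3.Connection) -> int:
    """Load: write the clean rows into the one table everyone queries."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales (name, date, amount) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


raw_export = "name,date,amount\nAcme ,2025-03-14,120.50\nGlobex,2025-03-15,\nInitech,2025-03-16,80\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(raw_export)), conn)
print(conn.execute("SELECT name, date, amount FROM sales ORDER BY name").fetchall())
# [('Acme', '2025-03-14', 120.5), ('Initech', '2025-03-16', 80.0)]
```

Boring on purpose: each step is a plain function you can test on its own, and at the end the data is back in a table you can query with SQL.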
4. Data governance
Who decides what an “active customer” is? Who approves changes to definitions?
Without this, every department has its own truth.
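In practice, governance ends up as something very concrete: one definition, one owner, one version, in one place that every report reuses. A sketch, where the 90-day window, the owner and the version number are hypothetical placeholders for whatever your organization actually decides:

```python
from datetime import date, timedelta
from typing import Optional

# One agreed definition, in one place. All three values are hypothetical
# placeholders for whatever your organization actually decides.
ACTIVE_CUSTOMER_WINDOW_DAYS = 90
ACTIVE_CUSTOMER_OWNER = "Head of Sales"  # who approves changes to the definition
ACTIVE_CUSTOMER_VERSION = 2  # bump on every approved change, so reports can cite it


def is_active_customer(last_order: Optional[date], today: date) -> bool:
    """A customer is active if their last order falls within the agreed window."""
    if last_order is None:
        return False  # never ordered: not active under this definition
    return today - last_order <= timedelta(days=ACTIVE_CUSTOMER_WINDOW_DAYS)


today = date(2025, 6, 30)
print(is_active_customer(date(2025, 6, 1), today))   # True
print(is_active_customer(date(2025, 1, 15), today))  # False
```

Every department calls this one function instead of writing its own filter. Changing the window becomes a request to the owner, not a silent edit in someone's report.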
Conclusion
90% of your data is unstructured.
97% of companies have invested in Big Data.
Only 40% use it effectively.
The difference isn’t the tool. It’s the plumbing.
Before buying AI, make sure you can answer: “How much did we sell last month?” without three people giving you three different numbers.
This connects to another problem: 95% of companies see no results from AI, and for the same reason.
Someone says the numbers don’t match? Read my Power BI debugging guide to find where the problem is.