AI for spreadsheets: cleaning, analyzing, and charting data
A practical guide to using AI to handle spreadsheets — cleaning messy data, writing formulas, building pivot tables, and turning rows of numbers into clear answers. Works in Excel, Google Sheets, or via ChatGPT directly.
If you have ever spent forty minutes trying to remember the syntax for a VLOOKUP, or stared at a column of dates in seven different formats, or pasted data from a PDF that came out as a single column of garbled text — this article is for you.
Spreadsheets are arguably the second-most-improved category of work in the AI era, after writing. The improvement is not about replacing spreadsheets (you still need them). It is about removing the friction between "I have data" and "I have answers."
Three things changed:
- The major spreadsheet tools now have AI features built in — Excel's Copilot, Google Sheets' Gemini, both surprisingly capable.
- ChatGPT, Claude, and Gemini can read uploaded spreadsheets, do analysis, and produce results without you needing to write formulas at all.
- Reasoning models are good enough at multi-step data work that "describe the analysis you want in plain English" usually works.
This article walks through how to actually use these capabilities, in the order most people will encounter them.
Three modes of AI + spreadsheets
Roughly speaking, you have three options. Each is right in different situations.
Mode 1: AI inside the spreadsheet. Use Excel's Copilot or Sheets' Gemini directly. Type your question into a sidebar; it writes formulas, builds charts, summarises data. Best when your data is already in a spreadsheet and you want answers without leaving it.
Mode 2: AI outside the spreadsheet. Upload a CSV or Excel file to ChatGPT, Claude, or Gemini. Ask questions; the model runs analysis (often by writing and executing Python in the background) and returns results, charts, and a cleaned file. Best for messy data, ad-hoc questions, or anything beyond the basics.
Mode 3: AI as a formula writer. Describe what you want in plain English and ask the model for the exact formula. Paste it into your spreadsheet. Best when you know roughly what you want but cannot remember the syntax.
We will cover all three.
Mode 1: AI inside the spreadsheet
Both Excel Copilot and Google Sheets' Gemini sidebar work in a similar way: you select a range or open a document, type a request, and the AI returns formulas, charts, or summaries.
Reliable prompts for inside-the-spreadsheet AI:
Look at the data in columns A to F. Summarise it: tell me the date range, the column types, and any obvious issues (missing values, inconsistent formats, outliers).
Add a column to the right that classifies each row as "high," "medium," or "low" priority based on the value in column D.
Make a chart that compares monthly revenue across the three product categories. Make it readable on a slide.
Find the rows where the "status" column has anything other than "open," "closed," or "in progress" — these are likely typos.
Calculate the running 7-day average of column C and put it in column G.
The pattern is consistent: be specific about which columns, what operation, and what the output should look like. The AI will write the formulas or build the chart for you. Watch the results and tweak.
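To spot-check the running-average request, it helps to know what the calculation should produce. A minimal Python sketch, assuming one value per day with no gaps; `running_average` is an illustrative name, not something Copilot or Gemini generates:

```python
from collections import deque


def running_average(values, window=7):
    """Trailing moving average; early rows average over the values seen so far."""
    recent = deque(maxlen=window)  # holds at most `window` most recent values
    total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(value)
        total += value
        averages.append(total / len(recent))
    return averages
```

Compare a handful of rows from column G against this output. If the spreadsheet leaves the first six rows blank instead of averaging a shorter window, that is a design choice rather than an error.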
The limits: complex multi-step analysis still sometimes trips up the inside-the-spreadsheet AI. Anything that requires reasoning across multiple sheets, conditional logic stacked three deep, or substantial cleanup is usually better done in mode 2.
Mode 2: AI outside the spreadsheet
Upload your CSV or Excel file to ChatGPT (Plus or Pro), Claude, or Gemini Advanced. The model can read it, analyse it, write code to manipulate it, and return both a written answer and a downloadable processed file.
A reliable workflow:
I'm uploading a spreadsheet of [what it is]. First, give me a quick overview:
1. What's in it: rows, columns, what each column appears to be.
2. Any data quality issues (missing values, weird formats, likely typos, duplicates).
3. A first round of obvious-but-useful summary stats (counts, ranges, distributions).
Then wait for my specific question.
This first pass takes the model 20 seconds. You get an honest assessment of what you are dealing with. Now ask the real question:
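Behind the scenes, an overview like this usually comes down to a few counts. A rough sketch of the kind of check the model might run, using only the Python standard library; the sample data and the `profile_rows` helper are hypothetical, and the code a model actually writes will differ:

```python
import csv
import io
from collections import Counter


def profile_rows(rows):
    """Summarise a list of dict rows: size, blanks per column, exact duplicates."""
    columns = list(rows[0].keys()) if rows else []
    missing = {
        col: sum(1 for r in rows if not (r[col] or "").strip()) for col in columns
    }
    seen = Counter(tuple(r[col] for col in columns) for r in rows)
    duplicates = sum(count - 1 for count in seen.values())
    return {
        "rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicates": duplicates,
    }


# Hypothetical sample standing in for an uploaded file.
SAMPLE = """country,amount
Estonia,100
Estonia,100
Finland,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile_rows(rows)
print(report)
```

Knowing which counts to expect makes it easier to tell whether the model's overview is complete.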
Of the entries in this spreadsheet:
1. How many came from Estonia, Finland, and Germany?
2. What is the average deal size in each country?
3. Which sales rep has the highest conversion rate?
Give me a clean summary table. Also: are there any rows where the country field is ambiguous (e.g., "DE" vs "Germany" vs "Deutschland")? Flag those.
The model will produce the answer, often along with a chart, and a cleaned version of the file you can download. The "flag ambiguous rows" instruction is the part most people forget — without it, data quality issues hide inside the summary.
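To see why the flagging step matters, here is a small sketch of a per-country summary that keeps ambiguous labels visible instead of silently merging or dropping them. The column names `country` and `deal_size` and the alias table are assumptions for illustration; your file will have its own:

```python
from collections import defaultdict

# Hypothetical alias table: extend it with the spellings found in your own data.
COUNTRY_ALIASES = {
    "de": "Germany", "germany": "Germany", "deutschland": "Germany",
    "ee": "Estonia", "estonia": "Estonia", "eesti": "Estonia",
    "fi": "Finland", "finland": "Finland", "suomi": "Finland",
}


def summarise_by_country(rows):
    """Return (summary, flagged); flagged lists rows with unknown or non-canonical labels."""
    totals = defaultdict(lambda: [0, 0.0])  # country -> [deal count, amount sum]
    flagged = []
    for index, row in enumerate(rows):
        raw = row["country"].strip()
        canonical = COUNTRY_ALIASES.get(raw.lower())
        if canonical is None:
            flagged.append((index, raw, "unrecognised"))
            continue  # left out of the summary until a human decides
        if raw != canonical:
            flagged.append((index, raw, f"mapped to {canonical}"))
        totals[canonical][0] += 1
        totals[canonical][1] += float(row["deal_size"])
    summary = {
        country: {"deals": count, "avg_deal_size": amount / count}
        for country, (count, amount) in totals.items()
    }
    return summary, flagged
```

The flagged list is the part to read first: every entry is a row whose country the summary either reinterpreted or left out.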
For multi-step analysis:
I want to understand which customers churned, why, and what predicts churn. Walk me through your analysis in three steps:
1. First, identify churned customers and define your criterion (e.g., no activity in 90 days).
2. Then compare churned vs active customers on the dimensions I have data for.
3. Then tell me which dimensions show the biggest gap and would be the best leading indicators of churn.
Use a reasoning model and show your work so I can sanity-check.
The "use a reasoning model" hint matters for anything multi-step. The fast default model may take shortcuts. The reasoning variant (GPT-5 Thinking, Claude Extended Thinking, o3) will be slower but produce more reliable analysis.
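When the model shows its work for the churn analysis, the first two steps should look roughly like the sketch below. This assumes each customer record has a `last_active` date and at least one numeric field; both names are hypothetical, and the 90-day cutoff is just the example criterion from the prompt:

```python
from datetime import date, timedelta
from statistics import mean


def split_by_churn(customers, today, inactive_days=90):
    """Step 1: a customer counts as churned if last active before the cutoff date."""
    cutoff = today - timedelta(days=inactive_days)
    churned = [c for c in customers if c["last_active"] < cutoff]
    active = [c for c in customers if c["last_active"] >= cutoff]
    return churned, active


def compare_dimension(churned, active, field):
    """Step 2: gap between group means for one numeric field (churned minus active).

    Raises statistics.StatisticsError if either group is empty.
    """
    return mean(c[field] for c in churned) - mean(c[field] for c in active)
```

A large gap on one field suggests where to look, not a cause. Step 3 still needs judgment about which gaps appear before the churn rather than after it.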
Mode 3: AI as a formula writer
The simplest mode. You know what you want; you just cannot remember the formula. Open ChatGPT (no upload needed), describe the goal, get the formula. Paste into your spreadsheet. Done.
Examples that work consistently well:
I have a list of email addresses in column A. I want column B to show just the domain. What Google Sheets formula should I use?
In Excel, how do I count the rows where column D is between 10 and 20 AND column E says "active"?
I have dates in column A that look like "2026-04-12T14:30:00Z" (full ISO 8601 format). I want column B to show just the date in DD/MM/YYYY format, in the Estonian time zone.
I have a column of currency strings like "€1,234.56", "$987.00", "£42.10". I want to split it into two columns: numeric value, and currency symbol.
I have two sheets. Sheet1 has customer IDs and names. Sheet2 has customer IDs and dollar amounts. I want a third sheet that joins them, with customer name, customer ID, and amount.
You will get the exact formula. For the last one, you will probably get a VLOOKUP, INDEX/MATCH, or XLOOKUP example with a clear explanation. Paste it in and adjust the cell ranges.
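If it helps to see what a lookup formula is doing conceptually, the two-sheet join can be sketched in Python as a dictionary lookup. This is an illustration of the idea, not a spreadsheet formula; the field names `customer_id`, `name`, and `amount` are assumptions:

```python
def join_sheets(names_rows, amount_rows):
    """Look up each amount row's customer name by ID, keeping unmatched IDs visible."""
    name_by_id = {row["customer_id"]: row["name"] for row in names_rows}
    joined = []
    for row in amount_rows:
        joined.append({
            # "NOT FOUND" plays the role of a lookup formula's fallback value.
            "name": name_by_id.get(row["customer_id"], "NOT FOUND"),
            "customer_id": row["customer_id"],
            "amount": row["amount"],
        })
    return joined
```

Whichever formula you end up with, check how it handles IDs that exist in one sheet but not the other. That is where joins most often go wrong.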
A particularly useful follow-up for formulas you use regularly: "Now explain what this formula is doing, piece by piece, so I understand it." This is one of the better ways I have seen to actually learn spreadsheet syntax. It beats tutorials because it is grounded in your actual problem.
Cleaning messy data
Data cleaning is the silent killer of spreadsheet work. AI is at its very best here. Some common cleanup operations and how to ask:
Inconsistent date formats:
Look at column A. Dates are in mixed formats: some "2026-04-12", some "12/04/2026", some "April 12, 2026", some weirdly typed. Normalise them all to ISO format (YYYY-MM-DD). Flag any rows you couldn't parse.
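For reference, a minimal sketch of this date normalisation in Python. It assumes slash dates are day-first (so "12/04/2026" means 12 April, matching the examples in the prompt) and English month names; the format list is an illustration, not exhaustive:

```python
from datetime import datetime

# Assumption: slash dates are day-first. Month names are parsed as English,
# which holds under Python's default locale.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y"]


def to_iso(text):
    """Return the date as YYYY-MM-DD, or None so the row can be flagged."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None
```

The day-first assumption is the one to confirm with the model: "03/04/2026" parses successfully either way, so a wrong guess never shows up as an error.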
Mismatched company names:
Look at column B (Company). Some rows say "Apple Inc.", some "Apple", some "apple inc", some "Apple Computer Inc.". Group these into canonical names. Add a new column with the canonical name for each row.
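A rule-based version of this grouping shows both what is easy and where judgment is needed. The sketch below lowercases names, strips punctuation and trailing legal suffixes, and uses the first spelling seen as the canonical one. The suffix list and function names are assumptions for illustration:

```python
import re

# Hypothetical suffix list: extend it for the company types in your data.
LEGAL_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "ltd", "llc", "co"}


def company_key(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def canonical_names(names):
    """Map each raw name to the first spelling seen for its key."""
    first_seen = {}
    return [first_seen.setdefault(company_key(name), name) for name in names]
```

Note that this rule keeps "Apple Computer Inc." separate from "Apple". Deciding that the two are the same company takes outside knowledge, which is why the model's groupings deserve a manual review.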
Phone numbers in many formats:
Column D has phone numbers. Some have country codes, some don't; some have parentheses, dashes, spaces. Normalise them to E.164 international format (+372...). Assume Estonia if no country code is given. Flag any you cannot parse.
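A simplified sketch of that phone normalisation, assuming Estonia (+372) as the default. The 8-digit lower bound is an arbitrary sanity check of mine, not part of E.164, which only caps numbers at 15 digits. National trunk prefixes used in some other countries are not handled here:

```python
import re


def to_e164(raw, default_country_code="372"):
    """Return the number as +<digits>, or None so the row can be flagged."""
    cleaned = re.sub(r"[^\d+]", "", raw)  # keep only digits and plus signs
    if cleaned.startswith("+"):
        digits = cleaned[1:]
    elif cleaned.startswith("00"):
        digits = cleaned[2:]  # 00 is a common international dialling prefix
    else:
        digits = default_country_code + cleaned  # assumption: local number
    if not digits.isdigit() or not 8 <= len(digits) <= 15:
        return None
    return "+" + digits
```

For production use, a dedicated library such as `phonenumbers` knows per-country rules that a sketch like this does not.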
Email addresses with hidden errors:
Column E has email addresses. Find any that are likely invalid (missing @, missing TLD, spaces, or typos like ".con" instead of ".com"). Flag them in a new column.
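The checks named in this prompt are simple enough to sketch directly. The version below returns a reason for each suspicious address so the flag column explains itself. The typo table is a hypothetical starting point, and passing these checks does not prove an address exists:

```python
import re

# Hypothetical typo table: extend it with the mistakes you see in your own data.
TLD_TYPOS = {"con": "com", "cmo": "com", "ogr": "org", "nte": "net"}


def email_problem(address):
    """Return a short reason the address looks invalid, or None if it passes."""
    if re.search(r"\s", address):
        return "contains whitespace"
    if address.count("@") != 1:
        return "missing or repeated @"
    local, domain = address.split("@")
    if not local or "." not in domain or domain.startswith(".") or domain.endswith("."):
        return "missing local part or TLD"
    tld = domain.rsplit(".", 1)[1].lower()
    if tld in TLD_TYPOS:
        return f"TLD looks like a typo for .{TLD_TYPOS[tld]}"
    return None
```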
The model often returns the cleaned file along with a summary of what it changed. Review the summary before trusting the result.
A trap to watch for: silent errors
AI doing data analysis can produce silently wrong answers in two ways.
Hallucinated calculations. The model writes plausible-sounding numbers without actually computing them. This is rare on modern frontier models with built-in code execution (Python/Code Interpreter), but it does happen. The protection: explicitly ask "show me the code or formula you used, and give me a way to verify."
Misinterpreting the data. The model assumes a column is something other than what it is — "total" when it is "subtotal," "active customers" when it is "all customers." Always show the model the actual column names and ask it to confirm interpretation:
Before doing the analysis, tell me what each column is. Confirm with me that your interpretation matches what I intend. Wait for confirmation before computing anything.
This adds 20 seconds and catches the most common analysis-level mistakes.
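One way to verify a reported number is to recompute it yourself from the raw column. A tiny sketch, assuming you have the column values as numbers; `matches_reported` and the tolerance of 0.01 are illustrative choices:

```python
import math


def matches_reported(values, reported_total, tolerance=0.01):
    """True if an independent sum of the column agrees with the model's figure."""
    return math.isclose(math.fsum(values), reported_total, abs_tol=tolerance)
```

The same check works in the spreadsheet itself: a plain SUM over the column should agree with the total the model reported.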
When the spreadsheet gets too big
ChatGPT, Claude, and Gemini have practical limits on file size for analysis. As of 2026, you can comfortably analyse spreadsheets up to roughly 50,000 rows or a few hundred MB. Beyond that, dedicated data analysis tools (Python, R, SQL, or BI tools like Hex or Mode) become the better choice.
A useful intermediate move: for a million-row dataset, sample a representative slice (10,000–50,000 rows), upload it, do the analysis, and confirm the result on the full data through a more capable tool. AI is great for "tell me what's in the data and what to look for"; specialised tools are right for the actual production analysis.
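Taking that slice does not require loading the whole file into memory. A minimal sketch using reservoir sampling (Algorithm R), which draws a uniform random sample in one pass over rows of unknown total count; the fixed seed is an assumption made so the sample is reproducible:

```python
import random


def reservoir_sample(rows, k, seed=0):
    """Uniform random sample of k rows from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for index, row in enumerate(rows):
        if index < k:
            sample.append(row)  # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = row  # replace with probability k / (index + 1)
    return sample
```

You can pass a `csv.reader` or `csv.DictReader` over the big file directly as `rows`. A uniform sample can miss rare groups, so if a small segment matters to your question, consider sampling within each segment instead.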
The takeaway
Three modes — inside the spreadsheet, outside the spreadsheet, formula writer. Pick the right one for the task and your spreadsheet work goes from "twenty minutes of remembering VLOOKUP" to "two minutes of describing the problem in plain English."
You do not need to learn formula syntax anymore unless you want to. You do need to be specific about the data, the operation, and the output. And you always need to spot-check the results — but that takes thirty seconds, and now the rest of the spreadsheet work happens at AI speed.
Try one real spreadsheet task today using the right mode. The thing that used to be friction will not be friction anymore.