
Excel AI tools have a problem. They work great on demo data: clean tables, simple formulas, one sheet. But the moment you bring in actual work data (multiple sheets with inconsistent headers, broken formulas you inherited from someone who left two years ago, thousands of customer responses that need categorizing), most of them fall apart.
I spent a few weeks testing Claude in Excel across eight real tasks. Not toy examples. Real work data, the kind that actually causes problems. Here is what I found, including the exact prompts I used for each one.
Getting Set Up
Claude in Excel is an add-in, not a built-in feature. Open Excel, go to Insert, click Get Add-ins, search for Claude, and install the Anthropic add-in. A panel shows up on the right side of your Excel window once it is installed.
When you open the panel for the first time, you pick a model. There are two main options: Opus and Sonnet. Opus is the more powerful one, better for complex reasoning and anything involving understanding context across a large spreadsheet. Sonnet is faster and handles most standard tasks just fine. I used Opus for anything involving data interpretation or complex formulas, and Sonnet for the simpler stuff.
Task 1. Merging Multiple Sheets With Different Column Headers

This is the task that made me give up on Copilot for Excel work.
Picture this: sales data split across three sheets, Q1, Q2, and Q3. The problem is that whoever built these sheets was not consistent. Q1 says “Customer Name,” Q2 says “Client,” Q3 says “customer_name.” A normal VLOOKUP fails. Most AI tools fail too because they match text literally.
Claude reads what the data actually means. It understood that “Customer Name,” “Client,” and “customer_name” all refer to the same thing. It mapped the columns by meaning rather than by exact string match, merged all three sheets into a clean combined table with consistent headers, and finished on the first try. ChatGPT and Copilot both needed manual column mapping before they could do anything.
Prompt I used:
“I have three sheets named Q1, Q2, and Q3. Each has sales data but the column headers do not match across sheets. Merge all three into a new sheet called Combined. Map columns that refer to the same data even if the names are different. For example, Customer Name, Client, and customer_name should all become one Customer Name column in the output. If you are uncertain about any column, flag it.”
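To make the target of that prompt concrete, here is a minimal Python sketch of the same merge done by hand. It is not how Claude maps columns: the ALIASES table is a fixed lookup I made up for illustration, and the sample rows, the Amount column, and the Rep column are all hypothetical. The point is only to show the shape of the result: one consistent header per concept, plus a list of headers that could not be mapped.

```python
import re

# Hypothetical alias table: canonical header -> normalized spellings to accept.
ALIASES = {
    "Customer Name": {"customer name", "client", "customer"},
    "Amount": {"amount", "sales", "revenue"},
}

def normalize(header):
    """Lowercase and turn underscores, hyphens, and repeated spaces into single spaces."""
    return re.sub(r"[\s_\-]+", " ", header).strip().lower()

def canonical(header):
    """Return the canonical header, or None if the header is not recognized."""
    key = normalize(header)
    for name, variants in ALIASES.items():
        if key in variants:
            return name
    return None

def merge_sheets(sheets):
    """Merge {sheet_name: [row dicts]} into one list of rows with consistent headers.

    Unrecognized headers are kept unchanged and reported, which mirrors the
    "flag it if you are uncertain" instruction in the prompt.
    """
    combined, flagged = [], set()
    for sheet_name, rows in sheets.items():
        for row in rows:
            out = {"Source": sheet_name}
            for header, value in row.items():
                name = canonical(header)
                if name is None:
                    flagged.add((sheet_name, header))
                    name = header
                out[name] = value
            combined.append(out)
    return combined, sorted(flagged)

# Made-up sample data with the three header spellings from the example.
sheets = {
    "Q1": [{"Customer Name": "Acme", "Amount": 100}],
    "Q2": [{"Client": "Birch", "Sales": 250}],
    "Q3": [{"customer_name": "Cedar", "revenue": 75, "Rep": "JD"}],
}
combined, flagged = merge_sheets(sheets)
print(combined)
print(flagged)
```

A fixed table like this only covers the spellings you thought of in advance, which is exactly the limitation that matching by meaning avoids.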
Task 2. Writing Formulas That Actually Update When Data Changes
Hard-coded formulas are one of the most common Excel headaches in real work. Someone writes =SUM(B2:B50) and when new rows get added, the formula does not expand. Or they hard-code a threshold number directly into a formula instead of referencing a cell, so changing that number means hunting through every formula manually.
Claude rewrites formulas to be dynamic. It uses named ranges, OFFSET, INDEX/MATCH, and structured references so ranges expand automatically and comparison values reference cells rather than hard-coded numbers.
Prompt I used:
“Look at this formula in cell D5: [paste formula]. It has hard-coded values. Rewrite it so the range expands automatically when new rows are added, any threshold or comparison values reference a specific cell rather than a number typed directly into the formula, and it still returns the same result for the current data. Tell me what you changed and why.”
The explanation at the end matters more than people realize. Claude does not just fix it and move on. It walks you through what was wrong with the original, which is what helps you avoid writing the same broken pattern next time.
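If you want a rough idea of which formulas are worth pasting into that prompt, a small script can flag the two patterns described above. This is a heuristic sketch of my own, not part of the add-in: the regular expressions are assumptions that cover simple A1-style formulas and will miss or misread more exotic syntax.

```python
import re

# Heuristic: a number that is not part of a cell reference (B2, $B$50),
# a row range (2:50), or a name that contains digits (Q1, LOG10).
NUMBER = re.compile(r"(?<![A-Za-z_$\d.:!])\d+(?:\.\d+)?(?![\d.:A-Za-z_(])")

# A1-style range with fixed end rows, such as B2:B50 or $B$2:$B$50.
FIXED_RANGE = re.compile(r"\$?[A-Za-z]{1,3}\$?\d+:\$?[A-Za-z]{1,3}\$?\d+")

def hardcoded_numbers(formula):
    """Return numeric literals typed directly into a formula string."""
    without_text = re.sub(r'"[^"]*"', '""', formula)  # ignore quoted text
    return NUMBER.findall(without_text)

def fixed_ranges(formula):
    """Return ranges that will not grow when rows are added below them."""
    return FIXED_RANGE.findall(formula)

# Hypothetical formula combining both problems.
formula = '=IF(SUM(B2:B50)>1000,"Over 1000","Under")'
print(hardcoded_numbers(formula))
print(fixed_ranges(formula))
```

For the sample formula this flags the threshold 1000 and the range B2:B50, while ignoring the 1000 inside the quoted label.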
Task 3. Debugging Formula Errors From Typos to Logic Problems

Formula debugging is where Claude separates itself from everything else. The difference is not just that it finds the error. It follows a structured process through three categories of problems.
First it looks for syntax errors: wrong parentheses, missing commas, misspelled function names. Then data type errors: doing math on text, date formatting mismatches. Then logic errors, where the formula runs without any error message but returns the wrong number. That third category is the one that catches people off guard, and it is where most tools completely miss the problem.
Prompt I used:
“This formula returns an error: [paste formula]. The data it should be working with is in columns A through F. Walk me through what the error message means, what specifically in the formula is causing it, and the corrected version. If there are multiple issues, fix them in order of priority.”
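One case sits between the second and third categories: numbers stored as text. SUM over a cell range skips text cells without raising an error, so the total comes back silently too low. The Python sketch below imitates that behavior on a made-up column and lists the cells that look like numbers but are stored as text. The function names and sample values are mine, for illustration only.

```python
def excel_like_sum(cells):
    """Sum only true numbers, the way SUM over a cell range skips text cells."""
    return sum(
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

def numbers_stored_as_text(cells):
    """Return the positions of text cells that would parse as numbers."""
    found = []
    for i, c in enumerate(cells):
        if isinstance(c, str):
            try:
                float(c.strip().replace(",", ""))
            except ValueError:
                continue
            found.append(i)
    return found

# Hypothetical column: two values were imported as text.
column = [100, "250", 75, "1,200", "n/a"]
print(excel_like_sum(column))
print(numbers_stored_as_text(column))
```

Here the total is 175 rather than the 1,625 a reader would expect, and positions 1 and 3 are the cells to convert.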
Task 4. Turning Raw Sales Data Into Actual Insights

This is where Claude stops being a formula tool and starts acting like an analyst.
Paste your sales data and ask it to analyze. What comes back is not a summary of the numbers. It is an interpretation. Claude identifies which products are declining and in which regions, whether the decline is consistent over time or concentrated in specific months, and whether the pattern looks more like a pricing problem, a distribution issue, or seasonality.
Prompt I used:
“Here is our sales data for the past 12 months across five product categories and three regions: [paste data]. Tell me the three most significant trends you see, which category or region is underperforming relative to the others and why that might be, any anomalies worth investigating, and what the data suggests we should focus on next quarter. Start with the most important finding.”
The quality of what you get back scales directly with how specific your question is. The more context you give Claude about what you are actually trying to understand, the more targeted the analysis becomes.
Task 5. Building Pivot Tables Without the Setup
Most people who need pivot tables spend more time configuring them than reading them. Claude cuts that out entirely.
Describe what you want to see in plain English and Claude builds the configuration, applies it, and formats it. It also picks the aggregation method the question actually requires (sum, average, or count) rather than defaulting to whatever you last used.
Prompt I used:
“Using the data in the Sales sheet, create a pivot table showing total revenue by product category, broken down by region, for each quarter. Sort by revenue descending within each quarter. Add conditional formatting to highlight the top-performing combination in each quarter.”
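For anyone who wants to sanity-check the pivot table that comes back, this is the aggregation the prompt describes, written as a short Python sketch. The field names and sample rows are hypothetical, and this is just the arithmetic, not what the add-in runs internally.

```python
from collections import defaultdict

def pivot_revenue(rows):
    """Total revenue per (quarter, category, region), sorted by revenue
    descending within each quarter."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["quarter"], r["category"], r["region"])] += r["revenue"]

    by_quarter = defaultdict(list)
    for (quarter, category, region), revenue in totals.items():
        by_quarter[quarter].append((category, region, revenue))

    return {
        quarter: sorted(items, key=lambda item: item[2], reverse=True)
        for quarter, items in sorted(by_quarter.items())
    }

# Made-up rows standing in for the Sales sheet.
rows = [
    {"quarter": "Q1", "category": "Hardware", "region": "East", "revenue": 1200.0},
    {"quarter": "Q1", "category": "Hardware", "region": "East", "revenue": 300.0},
    {"quarter": "Q1", "category": "Software", "region": "West", "revenue": 900.0},
    {"quarter": "Q2", "category": "Software", "region": "West", "revenue": 400.0},
    {"quarter": "Q2", "category": "Hardware", "region": "East", "revenue": 700.0},
]
pivot = pivot_revenue(rows)

# The first entry per quarter is the combination the conditional formatting should highlight.
top = {quarter: items[0] for quarter, items in pivot.items()}
print(pivot)
print(top)
```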
The conditional formatting request in the same prompt was what surprised me. Claude applied it as part of the same operation. No second prompt, no manual formatting afterward.
Task 6. Picking the Right Chart and Building It

The most common charting mistake in Excel is using the wrong chart type for what the data shows. A line chart implies that data points connect continuously, so if your data is discrete categories, a bar chart is the right call. A pie chart implies the segments add up to something meaningful, and if they do not, the chart is misleading rather than helpful.
Claude recommends based on what the data actually shows, not just what looks familiar.
Prompt I used:
“I want to visualize the data in columns A through C. Column A is month, column B is revenue, column C is number of transactions. What chart type would best show both the revenue trend over time and how transaction volume relates to revenue? Build that chart, label it clearly, and explain why you chose this format over a dual-axis chart or two separate charts.”
Task 7. Handling Errors Gracefully With IFERROR and Conditional Logic
Real spreadsheets have messy data. Lookups that return nothing. Calculations that divide by zero. Conditions that return text where you expected a number. Left alone, these create error cascades where one broken cell breaks every formula that references it.
Claude handles these by building IFERROR wrappers, adding conditional branches, and replacing blank or error cells with appropriate defaults. The key thing it does differently is explain the pattern so you can apply it to other formulas yourself.
Prompt I used:
“This VLOOKUP occasionally returns #N/A when it cannot find a match: [paste formula]. I want it to return Not Found as text when there is no match, return Check Data if the lookup value itself is blank, and return the normal result in all other cases. Write the updated formula and show me how to extend this pattern to other lookups in the same sheet.”
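The three-way branch in that prompt is easier to reason about when written out as plain logic. Below is a minimal Python sketch of it, with a made-up price table standing in for the lookup range. In Excel terms it roughly corresponds to a formula like =IF(A2="","Check Data",IFERROR(VLOOKUP(A2,Prices!A:B,2,FALSE),"Not Found")), where the cell A2 and the sheet Prices are hypothetical names of mine, not taken from a real workbook.

```python
def safe_lookup(key, table):
    """Mirror the three cases from the prompt for a single lookup."""
    if key is None or str(key).strip() == "":
        return "Check Data"   # the lookup value itself is blank
    if key not in table:
        return "Not Found"    # the case where VLOOKUP would return #N/A
    return table[key]         # the normal result

# Hypothetical lookup table and lookup values.
prices = {"A-100": 19.99, "B-200": 4.5}
results = [safe_lookup(k, prices) for k in ["A-100", "", "Z-999", None]]
print(results)
```

The order of the checks matters: testing for a blank key first keeps an empty cell from being reported as a missing match.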
Task 8. Categorizing Thousands of Customer Messages in Seconds

This one genuinely surprised me. Most people do not know Excel can do this at all, but with Claude it just works.
The scenario: 3,000 rows of customer inquiry text, one message per row, and you need to sort each one into one of six categories: billing question, technical issue, shipping problem, return request, product feedback, or other. Doing it manually takes days. Claude reads each message, understands the intent, and assigns the right category in seconds.
Prompt I used:
“In column A I have customer inquiry messages, one per row, 3,000 rows total. In column B, classify each message into one of these categories: Billing Question, Technical Issue, Shipping Problem, Return Request, Product Feedback, Other. If a message fits multiple categories, choose the most prominent one. If you are uncertain about a classification, mark it Review Needed in column B and add a brief reason in column C.”
The Review Needed flag is worth including every time you do this kind of task. It gives you a quality check column you can scan quickly rather than having to trust every single output. In practice, maybe 3 to 5 percent of rows get flagged, and most of those are genuinely ambiguous cases worth a second look.
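The flag-when-uncertain pattern is worth seeing in isolation. The sketch below uses a crude keyword count as a stand-in for the model, which is an assumption made purely for illustration: Claude reads intent, it does not count keywords, and the KEYWORDS lists are made up. What carries over is the output contract: a category in one column, and Review Needed plus a short reason when two categories compete.

```python
# Hypothetical keyword lists; a real run relies on the model reading intent.
KEYWORDS = {
    "Billing Question": ["invoice", "charged", "refund", "payment"],
    "Technical Issue": ["error", "crash", "login", "bug"],
    "Shipping Problem": ["delivery", "tracking", "shipped", "package"],
    "Return Request": ["return", "send back", "exchange"],
    "Product Feedback": ["love", "suggest", "feature", "wish"],
}

def classify(message):
    """Return (category, reason). Ties between categories are flagged for review."""
    text = message.lower()
    scores = {
        category: sum(word in text for word in words)
        for category, words in KEYWORDS.items()
    }
    best = max(scores.values())
    if best == 0:
        return "Other", ""
    winners = [category for category, score in scores.items() if score == best]
    if len(winners) > 1:
        return "Review Needed", "Tied between " + " and ".join(winners)
    return winners[0], ""

# Made-up messages standing in for column A.
messages = [
    "I was charged twice on my invoice",
    "The app shows an error after login",
    "My package arrived broken and I want to return it",
    "What are your opening hours?",
]
labels = [classify(m) for m in messages]
for message, (category, reason) in zip(messages, labels):
    print(category, "|", reason, "|", message)
```

The third message is the interesting one: it is both a shipping problem and a return request, so it lands in the review column with the reason attached.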
Why Claude Works Better Than ChatGPT and Copilot for Excel
Three things stood out across all eight tasks.
Claude reads context across the whole spreadsheet, not just the cell or range you point it at. When it writes a formula or merges sheets, it is understanding what the surrounding data means, not just what it says literally.
It explains what it did. ChatGPT often gives you the output without the reasoning. Claude gives you both, which means you learn something from it rather than just getting a result you cannot replicate next time.
And it handles ambiguity better than anything else I have used. Real work data is full of ambiguity. Headers that kind of match but not exactly. Categories that overlap. Values that could mean two different things depending on context. Claude makes judgment calls in those situations and tells you what judgment it made, so you can correct it if it got something wrong.
For clean, well-structured demo data, ChatGPT and Copilot are fine. For the kind of spreadsheet that has been touched by multiple people over multiple years, Claude is in a different category. That is the honest summary after a few weeks of running it through eight real tasks.