ChatGPT can't reliably add 61 numbers in random order, giving 12 different wrong answers despite showing convincing fake work. The AI optimizes for speed over accuracy.

ChatGPT Is Gaslighting You With Math

ChatGPT can write efficient Python code and draft complex SQL queries in seconds. Heck, it’ll brainstorm your entire marketing campaign if you let it. We’re rushing to trust this “genius,” latest models and all, with our legal documents and the code that runs our business operations.

As a Business Intelligence Analyst with over a decade of experience at companies like Amazon and Microsoft, I was curious. I’ve watched this technology go from a toy to a tool that many claim is ready to replace me. But while everyone tests AI with harder problems, I decided to do the opposite.

The results revealed what’s really going on: a major design choice, a cost-versus-performance tradeoff, that governs the entire system. It’s a transparency problem, and it’s a gap big enough to make any data professional skeptical. My test showed that while AI is great at writing code, it can’t be trusted with the most basic building block of my job: simple arithmetic.

And the scariest part? It’s confidently wrong.

The math problem that revealed the trick

My test was a simple list of all the numbers from -3000 to 3000 (at intervals of 100), which came to 61 numbers total. The list was specifically designed so the final, correct answer would be a simple “zero.” This way, I’d know instantly if it was right.
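For anyone who wants to reproduce the setup, here’s a minimal Python sketch of the same list. (This is a reconstruction for illustration, not the literal prompt I pasted in.)

```python
# Build the test list: every multiple of 100 from -3000 to 3000.
numbers = list(range(-3000, 3001, 100))

print(len(numbers))  # 61 values in total
print(sum(numbers))  # 0 -- each number cancels its negative twin
```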

Here’s where the “shortcut” behavior became clear. When I gave it the list sorted in ascending or descending order, it aced the test every time. It correctly recognized the simple pattern: -3000 cancels 3000, -2900 cancels 2900, and so on. This, as it turns out, tested pattern matching, which is a core AI strength.

But what happens when you remove the pattern? I put those exact same numbers into a random order. This broke the simple shortcut and forced the AI to actually calculate.
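Breaking the pattern takes one line. Continuing the sketch above:

```python
import random

random.seed(7)            # any seed works; fixed here so the run is repeatable
random.shuffle(numbers)   # same 61 numbers, no obvious cancellation pairs

print(sum(numbers))       # still 0 -- only the presentation changed
```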

It failed, and not by a small margin.

The “performative” failure

The AI didn’t calculate. It put on a performance of calculation.

It’s like watching Noah Wyle on The Pitt or ER: he plays a convincing doctor, but you should never trust him to perform an actual medical procedure on you. In the same way, the AI goes through the motions of calculation. It replies with a “step-by-step” breakdown that looks perfectly logical, but the final answer is confidently and fundamentally wrong.

When I challenged it, it “double-checked” and got a different wrong answer.

The failure was more fundamental than I first thought. This process involved extensive hand-holding, like asking it to break the list into small groups of 10 numbers. It still got the math wrong on each of those simple, 10-integer sums. Even after I gave it the correct answers for each group, it still failed to correctly add the six group subtotals.
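For the curious, the hand-holding amounted to a decomposition like this (sketched in Python; the exact grouping I dictated may have differed slightly):

```python
# Split the shuffled list into chunks of 10 and sum each chunk --
# the same decomposition I walked ChatGPT through step by step.
chunks = [numbers[i:i + 10] for i in range(0, len(numbers), 10)]
subtotals = [sum(c) for c in chunks]

print(subtotals)       # 61 numbers -> six chunks of 10 plus one leftover
print(sum(subtotals))  # 0 -- the final step it still got wrong
```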

In all, I got 12 different incorrect answers from this back-and-forth, all from the same prompt.

This wasn’t a failure to handle a “complex” list. It was a complete failure to perform basic addition. In the end, it apologized for its “mental calculation errors,” a priceless admission that it was simulating calculation rather than doing it.

This shortcut-first design is the same reason the AI has historically flubbed other simple tasks, like counting letters in a word. Its main goal is finding the cheapest shortcut, not necessarily the correct answer.

Then I found the hidden calculator

So, is ChatGPT just hopelessly bad at math? Not exactly. I went back to that very first wrong answer and, instead of replying, I clicked the “Think Longer” option.

Instantly, this happened:

```python
# Define the list of numbers
numbers = [
    2100, 800, -1800, 2400, 1000, -2400, 1200, ...
]

# Calculate the sum
total_sum = sum(numbers)
total_sum

# Result: 0
```

It got the correct answer (zero) in one second. This is the “gotcha.” It had the calculator all along. It was just choosing not to use it.

What’s actually happening here?

This isn’t a bug. What we’re seeing is a design tradeoff. ChatGPT is optimized for speed first and accuracy second in its default mode.

The default, fast path is text-only. It attempts to solve the problem by relying on its vast training data, which includes learned arithmetic patterns. For simple sums (like 2+2), this is reliable. But for tasks that require real precision across many steps, like our 61-number list, it fails. It’s like a person trying to multiply large numbers in their head: they understand the concept of math, but quickly lose track of the intermediate steps and “carries.” This text-only approach is very fast and, more than that, it’s cheap to run.

The thinking path (using the Python calculator) is perfectly accurate, but it’s expensive to run. Don’t get me wrong, this “slow” path still returned an answer in a split second. But for OpenAI, the system cost is enormous. Instead of just predicting the next word, the AI has to do a lot more work: it must spin up a secure Python interpreter, write the code, execute it, and then read the output.

This is the cost-performance tradeoff in action. It’s the why behind everything we just saw.

Here’s where it gets personal

Look, here’s the real risk for data professionals. We assume the AI is analyzing. It’s not. It’s just simulating what the analysis should look like.

Now, I know what the response will be: this is all part of the new “dynamic” system. OpenAI’s own press releases boast about this, calling it a “real-time router” that “quickly decides” whether to respond quickly or to “think longer” for hard problems. This sounds great on paper. But my test shows this so-called smart router can fail on seemingly simple tasks that lack obvious patterns.

When I gave it a 61-number math problem in random order, its internal logic misjudged the difficulty. It seemed to think this was a simple task it could crush. This tells me the router’s heuristics aren’t tuned to catch this kind of “deceptively simple” problem. It’s probably just looking at query length or whether there are math symbols. So, instead of correctly identifying this as a “hard problem” and automatically engaging its thinking model, it chose the fast, text-only path and proceeded to fail.

That’s the career risk right there. Imagine asking the AI to “check the subtotals on this expense report.” It replies with a confident, text-only “Looks correct!” You pass that report to an executive, who quickly does some mental math and realizes your calculations are wrong.

In that moment, you’ve damaged your credibility by relying on a tool for a task it wasn’t designed to handle reliably. The AI’s failure was that it simulated the act of checking instead of actually calculating. And you, the professional, are left holding the bag.
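The defense is mechanical: recompute before you forward anything. Here’s a sketch with pandas, where the file and column names are hypothetical, purely for illustration:

```python
import pandas as pd

# Hypothetical expense report -- the file and column names are made up.
df = pd.read_csv("expense_report.csv")

# Recompute every department's subtotal from the raw line items.
recomputed = df.groupby("department")["amount"].sum()
print(recomputed)

# Compare these figures against the subtotals printed on the report.
# A text-only "Looks correct!" from a chatbot is not a check.
```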

When AI math actually works

So, it’s important to know when AI is reliable for these tasks. My test was designed to hit a specific vulnerability. AI math is generally trustworthy in a few other places:

  • When it explicitly uses its code interpreter (like the “Think Longer” path).
  • For simple, in-context arithmetic (e.g., “I have 3 apples and buy 2 more, how many do I have?”).
  • For symbolic math and explaining concepts (e.g., “Explain the Pythagorean theorem”).

The risk, as my test shows, is not knowing which mode the AI is in.

What you should actually do about this

This doesn’t mean “don’t use AI.” It means we need to use it like a pro, not a novice. We have to treat it like the “shortcut engine” it is. So here’s my practical guide for data analysts based on my findings:

Look, if you remember nothing else: if it shows you code, you can trust the result. If it just talks at you, be skeptical.

Use “Think Longer” for any “right-or-wrong” answer. Don’t wait for it to fail first.

Use the right tool for the job. For straightforward arithmetic, use Excel. It’s built for it and is infinitely more reliable. Why make a “creative writing” engine do a calculator’s job? However, for generating an analytical workflow or cleaning data before the calculation, using the AI with its code execution on is a powerful and genuinely useful way to get work done.
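To make that division of labor concrete, here’s a minimal sketch: let the AI draft the cleanup logic, but keep the arithmetic in code you run and audit yourself. (The messy values below are invented for illustration.)

```python
# Messy inputs, invented for illustration.
raw = ["$2,100", "800", "(1,800)", " 2400 "]

def clean(value: str) -> int:
    """Normalize a messy currency string to an integer."""
    v = value.strip().replace("$", "").replace(",", "")
    if v.startswith("(") and v.endswith(")"):  # accounting-style negative
        v = "-" + v[1:-1]
    return int(v)

amounts = [clean(v) for v in raw]
print(amounts)       # [2100, 800, -1800, 2400]
print(sum(amounts))  # deterministic arithmetic -- no "mental math" involved
```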

This all comes down to transparency. The AI isn’t flawed because it failed the math; it’s flawed because it hid the failure behind a mask of confidence. It has a perfectly good calculator in its back pocket but defaults to the fast, unreliable method without telling you which one it’s using. As data professionals, that’s just not a foundation we can build on. Until these systems tell us how they’re getting an answer, the rule is simple: if it’s not in a code block, it’s not an answer. It’s just a performance.
