AI Language Models Struggle With Quantitative Reasoning Problems


AI language models may have far surpassed humans in some computational areas, but quantitative reasoning continues to be a difficulty for them.

Written By
David Curry
Nov 30, 2022

One of the advantages of deploying artificial intelligence is its immense computational power: it can run calculations in a fraction of the time it would take a human.

While number crunching and calculations may be the realm of computers now, it appears that quantitative reasoning, or applying mathematics to real-world problems, has stumped even the most sophisticated of AI language models. 

The Center for AI Safety developed a data set for quantitative reasoning, called MATH, and put some of the top-of-the-line AI language models to the test. The results were not great: the models averaged seven percent on the test, much lower than the human grad student average of 40 percent. Math Olympiad champions scored 90 percent on the test.

The low score stems from the fact that quantitative reasoning requires a combination of skills, including parsing a question, recalling formulas, and working through the problem correctly in step-by-step solutions. If the AI slips up at any one of the steps, the final answer can deviate significantly from the correct one.
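A rough way to see why a single slip matters is to treat each step as an independent event with its own chance of being right. The sketch below is only an illustration under that simplifying assumption; the function name and the example figures are hypothetical and do not come from the MATH evaluation.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Chance that every step in a solution is correct.

    Assumes (for illustration only) that steps succeed or fail
    independently and with the same probability.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    # Hypothetical figures: a solver that is right 90 percent of the time
    # per step, on problems of increasing length.
    for steps in (1, 5, 10):
        print(steps, round(chain_success_probability(0.9, steps), 3))
```

Under these assumed numbers, a solver that gets each step right 90 percent of the time completes a five-step problem correctly only about 59 percent of the time.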

According to reporting from technology and science magazine IEEE Spectrum, AI researchers from the University of California, Berkeley, OpenAI, and Google have all seen better results on a less demanding data set, called GSM8K, which was produced by OpenAI and features grade-school level problems.

Google’s Minerva, which is built on the company’s Pathways Language Model (PaLM), has seen the most success: Google announced in June that the model had reached 78 percent accuracy on GSM8K. That was ahead of OpenAI’s expectations, as the company had previously said its GPT model would need to be trained on 100 times more data to achieve 80 percent accuracy.

Google says it achieved this improvement in accuracy with minimal scaling upwards, through “chain-of-thought prompting”, which breaks down larger problems into more manageable chunks, alongside majority voting, which runs the same problem 100 times instead of just once and chooses the answer the model reached most often. Minerva’s accuracy on the MATH data set has also improved, recently reaching 50 percent.
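The majority-voting idea can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: `majority_vote` and `toy_sampler` are hypothetical names, and the toy sampler simply cycles through canned answers in place of a real language model.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Optional


def majority_vote(sample_answer: Callable[[], Optional[str]],
                  num_samples: int = 100) -> Optional[str]:
    """Ask for an answer num_samples times and return the most common one.

    sample_answer is any zero-argument callable that returns a final
    answer as a string, or None when no answer could be extracted.
    """
    answers = [sample_answer() for _ in range(num_samples)]
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    # most_common(1) returns a list holding the single (answer, count) pair
    # with the highest count.
    return Counter(valid).most_common(1)[0][0]


# Hypothetical stand-in for a model: it answers "42" three times out of five
# and gives a wrong answer otherwise.
_canned = cycle(["42", "41", "42", "40", "42"])


def toy_sampler() -> Optional[str]:
    return next(_canned)


if __name__ == "__main__":
    print(majority_vote(toy_sampler, num_samples=100))
```

With the toy sampler, any single run is wrong 40 percent of the time, yet the vote over 100 runs settles on "42". Note that voting only compares final answers, so it does not check whether the reasoning behind the winning answer was sound.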

“Our approach to quantitative reasoning is not grounded in formal mathematics,” said Ethan Dyer and Guy Gur-Ari, research scientists at Google Research. “Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure. This approach has an important limitation, in that the model’s answers cannot be automatically verified. Even when the final answer is known and can be verified, the model can arrive at a correct final answer using incorrect reasoning steps, which cannot be automatically detected.”

Artificial intelligence has been marketed as more than just calculations and arithmetic, with the promise that it could make more accurate data-driven decisions than humans. To that end, building AI language models that can employ critical thinking and quantitative reasoning at a high level seems like a necessity if humans are ever going to seriously contemplate shifting parts of decision making to algorithms.

David Curry

David is a technology writer with several years’ experience covering all aspects of IoT, from technology to networks to security.
