
Using Smaller ML Models To Train Large Language Models



A new research project by MIT has developed a way to train large language models on smaller machine learning algorithms.

Written By David Curry
Apr 26, 2023

Large language models, the frameworks that OpenAI, Google, and others have used to build chatbots such as Bard, BlenderBot, and ChatGPT, are enormous endeavors, trained on billions of parameters that take a lot of time to source, arrange, and feed into the model.

Often, at the start of a new project, developers begin the process anew, sourcing billions of bits of data to feed into a new large language model (LLM). This is time-consuming and can drive up the cost of developing a model, while also harming the environment by running computers for weeks or months to train it.


Researchers at MIT have developed a way for creators of these LLMs to integrate old models into new development, through a method called Linear Growth Operator (LiGO). 

This method uses smaller models, which may run in the millions of parameters, to train a much larger language model. The smaller model encodes the knowledge learned during its own training and uses it to teach the LLM, which can cut computational cost by up to 50 percent.

“It’s been estimated that training models at the scale of what ChatGPT is hypothesized to run on could take millions of dollars, just for a single training run. Can we improve the efficiency of these training methods, so we can still get good models in less time and for less money? We propose to do this by leveraging smaller language models that have previously been trained,” said Yoon Kim, assistant professor in MIT’s Department of Electrical Engineering and Computer Science and co-author of the paper.

The LiGO method can be utilized by developers working on vision and language models, often improving performance while lowering computational costs. It expands the width and depth of the model by creating a linear map of the operation that transforms input values into output values.
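To make the idea of a linear growth operator concrete, the sketch below shows how a small model's weight matrix can be mapped into a wider model's weight matrix with linear expansion operators. Note this is a simplified illustration, not the paper's method: LiGO *learns* its growth operators from data, whereas the expansion matrices here (`width_expansion_operator`, `grow_weight`) are hypothetical stand-ins that simply copy and duplicate existing dimensions.

```python
import numpy as np

def width_expansion_operator(d_small, d_large, rng):
    # A simple linear map from a small width to a larger one.
    # The first d_small dimensions are copied via an identity block;
    # each remaining new dimension duplicates a randomly chosen
    # existing one (a stand-in for LiGO's learned operator).
    E = np.zeros((d_large, d_small))
    E[:d_small, :d_small] = np.eye(d_small)
    for i in range(d_small, d_large):
        j = rng.integers(0, d_small)
        E[i, j] = 1.0  # duplicate an existing dimension
    return E

def grow_weight(W_small, d_in_large, d_out_large, rng):
    # Grow a (d_out_small x d_in_small) weight matrix to the larger
    # shape by applying expansion operators on both sides:
    #   W_large = E_out @ W_small @ E_in.T
    d_out_small, d_in_small = W_small.shape
    E_out = width_expansion_operator(d_out_small, d_out_large, rng)
    E_in = width_expansion_operator(d_in_small, d_in_large, rng)
    return E_out @ W_small @ E_in.T

rng = np.random.default_rng(0)
W_small = rng.normal(size=(4, 4))   # a layer from the small model
W_large = grow_weight(W_small, 8, 8, rng)
print(W_large.shape)  # (8, 8)
```

Because the expansion operators start from an identity block, the small model's weights survive intact in the top-left corner of the grown matrix, so the larger model starts training from the knowledge the smaller one already encodes rather than from random initialization.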

LLMs have continued to increase in size over the past half decade, with Google’s BERT, one of the first notable LLMs to use the transformer mechanism in 2018, being trained on 340 million parameters. By 2020, OpenAI was training GPT-3 on 175 billion parameters, and Google has trained GLaM on 1.2 trillion. OpenAI’s GPT-4 is estimated to have been trained on over 1.5 trillion parameters, although that has not been confirmed by OpenAI.

Finding ways to train these LLMs more efficiently is imperative, especially for developers who do not have the resources or capacity to compete with OpenAI (backed by Microsoft) and Google.

Speaking on the subject of ever-increasing resource needs for LLMs, Kim said: “This has led to an arms race of companies trying to train larger and larger transformers on larger and larger datasets. More so than other architectures, it seems that transformer networks get much better with scaling. We’re just not exactly sure why this is the case.” 

David Curry

David is a technology writer with several years' experience covering all aspects of IoT, from technology to networks to security.
