Data Privacy Concerns Deter Enterprises From Commercial LLMs


A new survey reveals insights on challenges, open-source alternatives, and the future of commercial LLMs.

Businesses of all shapes, sizes, and sectors are trying to figure out how to integrate large language models (LLMs) into their workflows, but so far, very few are willing to integrate commercial LLMs in production, primarily due to data privacy concerns. 

That’s according to a new survey published by low-code platform Predibase, which asked over 150 people in the artificial intelligence and machine learning industry about their company’s investments in LLMs, challenges, open-source interest, and customization efforts. 

In the report, more than 85 percent of respondents said they had no plans to use commercial LLMs in the production stage. That figure breaks down into 44 percent of respondents using LLMs for experimentation, 27 percent expecting to start using LLMs in the next 12 months, and 15 percent having no immediate plans.

Only 14 percent of respondents had LLMs in production, with one percent having more than two. 

“It is now open season for LLMs. Thanks to the widespread recognition of OpenAI’s ChatGPT, businesses are in an arms race to gain a competitive edge using the latest AI capabilities. Still, they require more customized LLMs to meet domain-specific use cases,” said Piero Molino, co-founder and CEO of Predibase, in a statement.

The most prominent reason for delaying or avoiding LLMs in production is reluctance to give up control of proprietary data. Some of the most popular LLMs are not open-source, and by feeding proprietary data into them, organizations may have to cede ownership of that data to the LLM owner, who can then use it to further train the model. According to Predibase, organizations can get over this roadblock by running open-source LLMs in a virtual private cloud, which lets them maintain full ownership of their data.

Another hurdle, cited by 30 percent of organizations, is customization and fine-tuning, as this often requires a large amount of proprietary data and expertise in how to train LLMs. Outside of leading-edge businesses, most do not have the resources or talent to pull this off, which forces them to work with third-party providers or consultancy firms. Open-source platforms aimed at reducing the complexity of model training are becoming available, which should enable more businesses to succeed.

To train state-of-the-art LLMs, organizations need to be willing to spend millions on cloud servers, AI tooling, and other software. In the survey, 17 percent said this cost was a roadblock to putting LLMs into production.

Other roadblocks included LLM hallucinations, which affect all types of generative AI services. Even ChatGPT, which is seen by some as the gold standard for generative AI sophistication, is susceptible to hallucinations and can make up statistics and information, although OpenAI and others are adopting new techniques to reduce how often this happens. Latency was the last concern mentioned by Predibase, as LLMs need a lot of compute resources to respond in under a second.

Even with these roadblocks, there is a keen awareness in the technology industry of the benefits that integrating LLMs into workflows can bring. The growth of open-source alternatives should make the market more accessible to organizations of all sizes, and the gap between proprietary and open-source solutions appears to be narrowing every year.

David Curry

About David Curry

David is a technology writer with several years’ experience covering all aspects of IoT, from technology to networks to security.
