A Return to Small Data for 2021


The goal of pursuing small data is to create a truly human-like intelligent machine capable of learning the way children do when exploring the world for the first time.

Big data gets a ton of press. It’s glamorous and mysterious. Humans can’t fully comprehend it, so it seems to belong solely in the realm of machines. The (re)emergence of small data aims to change that.

Businesses do need large volumes of information to generate insights, but collecting and processing it is expensive. The AI black box makes it difficult to understand conclusions or maintain oversight. And companies are looking for faster, more efficient ways to train models to do what they need.

Enter small data, the “realm of humans.” A new push to reduce training times is bringing it back. Until explainable AI arrives, conclusions drawn from small data are easier to grasp. And it looks like small data could be this year’s newest trend.

What is it?

Small data is just what it sounds like: datasets of limited size. Machines may think “like” us, but they require so much information to reach a conclusion that the average human brain is still the superior learner.

Such datasets are typically in the sub-terabyte range. They’re processable with a regular computing system in a normal amount of time. Based on those parameters, much of the data companies currently have falls into small data. By focusing on big data, companies are missing great opportunities.

Small data can make small impacts and help drive more significant projects. Such data is perfect for quick decisions and near-instant analysis. Companies can dive into the specific data sets to answer smaller but immediately impactful questions. This could be the answer to overcoming the plateau we’ve seen in AI applications.

Why the big deal?

In the early days of machine learning, it took hundreds of thousands of examples to learn something simple, making training time-consuming and expensive. By contrast, consider how babies learn to grasp an object. Once they learn the basic motion of grasping a toy, it doesn’t take much to apply those same motions to pick up a bottle. From there they move on to a spoon, and from there to anything in the world.

This ability to carry what was learned on one task over to another, cutting the number of steps needed, is called transfer learning. Machines are not good at this process. Once a machine learns to pick up one object, say a box (through thousands if not hundreds of thousands of examples), it cannot just move that knowledge to pick up a rug. It has to start almost completely over.

So, a human needs one repetition to know the difference between a box and a rug. Even with the introduction of transfer learning in artificial intelligence, machines still need a few thousand repetitions. The goal here is to move to the kind of small data humans need, i.e., as few as one, to teach a machine a new task.
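To make the "as few as one" idea concrete, here is a minimal sketch of one-example-per-class classification using a nearest-neighbor rule. It is an illustration only, not a method described in this article: the feature names, the numbers, and the `nearest_example` helper are all hypothetical, and the sketch assumes the features are already meaningful, which in practice is the hard part.

```python
import math

def nearest_example(query, examples):
    """Return the label of the single stored example closest to `query`.

    `examples` maps each label to ONE feature vector, i.e. one "repetition".
    """
    best_label, best_dist = None, math.inf
    for label, vector in examples.items():
        dist = math.dist(query, vector)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical features: (rigidity from 0 to 1, thickness in cm).
# Exactly one example per class.
one_shot_examples = {
    "box": (0.9, 30.0),
    "rug": (0.1, 1.5),
}

prediction = nearest_example((0.2, 2.0), one_shot_examples)
print(prediction)
```

A new object that is soft and thin lands closer to the single rug example than to the single box example, so no further training examples are needed for this toy case.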

Potential use cases

It’s not just the expense spurring small data exploration. Launching specific applications will require machines to train on smaller data sets. For example, remote monitoring in medical applications would thrive on smaller data sets.

In cardio remote monitoring, for example, patient heart irregularities are a single data source. Each patient’s heartbeat is a unique thing. Machines would need to learn quickly based on that single patient’s data how to detect when something is going on.
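As a rough illustration of learning from a single patient's data, the sketch below builds a baseline from a handful of that patient's beat-to-beat intervals and flags beats that fall far outside it. Everything here is an assumption made for the example: the function names, the sample values, and the three-standard-deviation threshold are hypothetical, and real monitoring systems are considerably more involved.

```python
from statistics import mean, stdev

def fit_baseline(intervals_ms):
    """Summarize one patient's typical beat-to-beat intervals (mean, spread)."""
    return mean(intervals_ms), stdev(intervals_ms)

def is_irregular(interval_ms, baseline, threshold=3.0):
    """Flag an interval more than `threshold` standard deviations
    away from this patient's own mean."""
    avg, spread = baseline
    if spread == 0:
        return interval_ms != avg
    return abs(interval_ms - avg) / spread > threshold

# Hypothetical small dataset: ten intervals (in ms) from a single patient.
patient_intervals = [810, 795, 805, 800, 790, 815, 800, 805, 795, 785]
baseline = fit_baseline(patient_intervals)

# Check three new beats against that patient's own baseline.
flags = [is_irregular(x, baseline) for x in (802, 1150, 520)]
print(flags)
```

The point of the design is that the baseline belongs to one patient: the same 1150 ms interval might be unremarkable for someone with a slower resting rhythm.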

In cases like that, humans are still heavily involved with the monitoring process, making small data sets even more critical. Researchers sometimes struggle to make any definitive conclusions about the “why” behind AI’s insights in the medical research field because of the incomprehensibility of big datasets.

In another example, new manufacturing operations do not produce enough training data to fully deploy AI the way manufacturers would like. Without volumes of historical quality-control information, for instance, manufacturers haven’t been able to take full advantage of the models.

The problem isn’t a lack of information. It’s a serious lack of the right kind of information. For example, when we have a production line, a machine can certainly learn all about what goes right. But without human-level transfer learning, a single irregularity or defect isn’t enough to trigger the AI learning response. Manufacturers need examples (again, thousands of them) of defects to create effective training data.

In both cases, serious inefficiencies are slowing progress. With a better grasp of small datasets, even down to a single example or “less-than-one”-shot learning, we could reach a new era of artificial intelligence.

The end goal for small data

The goal of pursuing small data is to create a truly human-like intelligent machine capable of learning the way children do when exploring the world for the first time. Companies know that deploying AI is going to become a differentiator in Industry 4.0, but compute-intensive, expensive training is a serious barrier.

Tiny AI would allow companies to deploy artificial intelligence projects with far less investment. It could also inspire new use cases in remote medical monitoring, manufacturing, or other types of monitoring that suffer from a lack of information for making decisions.

Companies are under pressure to generate growth from technology initiatives, but deploying data-hungry AI models slows things down. They are also well aware of the money left on the table as AI becomes a differentiator.

Small data offers ample opportunities. It can answer strategic questions in your business. It can help you get your data management in order. As companies wrangle all kinds of information, focusing exclusively on larger datasets often introduces complexity that derails even the best projects. Exploring small datasets in depth is shaping up to be a must-do for 2021.

Elizabeth Wallace

About Elizabeth Wallace

Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain - clearly - what it is they do.
