Data Annotation Feeds the AI Beast

The demand for AI-enabled applications that deliver increasingly refined results is driving the need for high-quality annotated data to train AI models.

Dec 3, 2019

Many continuous intelligence (CI) applications need trained AI models to work. An autonomous vehicle relies on models trained on sample datasets that help it differentiate objects and identify road markings and traffic signs. Similarly, an automated video surveillance system needs a dataset to learn how to distinguish between a raccoon and an intruder. If that training data is of poor quality, the AI models will not perform satisfactorily.

The booming demand for AI-enabled applications that deliver increasingly refined results is driving the need for data suited to training AI models. Meeting that demand is challenging. Classifying cat images on social media is one thing; building a high-quality dataset for facial recognition or autonomous vehicles is far more complex.

In the past, facial recognition used only a few points on a human face. Today, facial key-point labeling can involve more than 200 points, with dozens used to precisely define each eyebrow, the lips, the jawline, and other features. Such detail is needed to train AI models that go beyond simple determinations, such as whether a person is male or female. Models might now also be used to determine race, age, and emotions.
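
To make key-point labeling more concrete, here is a minimal Python sketch of how a single face annotation might be stored: named (x, y) points grouped by facial region, plus a simple completeness check a reviewer could run. The class names, region names, and expected point counts are hypothetical and chosen for illustration; they do not reflect any particular labeling tool or vendor schema.

```python
from dataclasses import dataclass, field

@dataclass
class KeyPoint:
    """A single labeled point, in image pixel coordinates."""
    x: float
    y: float

@dataclass
class FaceAnnotation:
    """Hypothetical record: key points grouped by facial region."""
    image_id: str
    regions: dict[str, list[KeyPoint]] = field(default_factory=dict)

    def total_points(self) -> int:
        # Sum the labeled points across all regions.
        return sum(len(points) for points in self.regions.values())

    def incomplete_regions(self, expected: dict[str, int]) -> list[str]:
        # Regions whose point count differs from the expected count.
        return [name for name, count in expected.items()
                if len(self.regions.get(name, [])) != count]

# Illustrative counts only; detailed schemas use many more points per feature.
EXPECTED = {"left_eyebrow": 3, "right_eyebrow": 3, "lips": 4}

ann = FaceAnnotation("img_001", {
    "left_eyebrow": [KeyPoint(10, 20), KeyPoint(15, 18), KeyPoint(20, 20)],
    "right_eyebrow": [KeyPoint(40, 20), KeyPoint(45, 18), KeyPoint(50, 20)],
    "lips": [KeyPoint(25, 60), KeyPoint(35, 58)],
})
print(ann.total_points())                # 8
print(ann.incomplete_regions(EXPECTED))  # ['lips']
```

A mechanical check like this can catch a skipped or half-labeled feature before a human reviewer judges whether each point is placed accurately.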

One indication of the need for such data comes from China. There, the data service company Testin set up shop in Hengdian World Studios, also known as “Chinawood,” the largest film studio in Asia. Instead of making motion pictures as other tenants of the facility do, Testin photographs and films actors performing facial expressions depicting laughing, crying, anger, and more. The images and videos are then used in facial key-point labeling for Chinese AI companies.

Self-driving Cars Need Data, Too

The quest for data to train autonomous systems is also booming. To get a sense of the complexity and level of detail needed for autonomous vehicles, consider the Waymo Open Dataset. The dataset includes high-resolution sensor data collected by Waymo self-driving cars in a wide variety of conditions. The data can be used by companies trying to train AI driving algorithms. This public database includes roughly 3,000 driving scenes, 16.7 hours of video data, 600,000 frames, and approximately 25 million 3D bounding boxes and 22 million 2D bounding boxes. (The most impressive thing about these numbers is that they represent just a tiny fraction of Waymo’s private autonomous driving database.)
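
The two kinds of bounding boxes in those numbers encode different things. A 2D box outlines an object in a camera image, while a 3D box places it in the space around the vehicle, commonly as a center point, a length, width, and height, and a heading (yaw) angle. The sketch below assumes that seven-parameter form; the field names are our own and are not taken from the Waymo Open Dataset's actual schema.

```python
import math
from dataclasses import dataclass

@dataclass
class Box2D:
    """Axis-aligned box in image space, in pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a malformed (inverted) box has no area.
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)

@dataclass
class Box3D:
    """Assumed seven-parameter 3D box: center, size, and heading in radians."""
    cx: float
    cy: float
    cz: float        # vertical center; unused by the ground-plane footprint below
    length: float
    width: float
    height: float
    heading: float

    def volume(self) -> float:
        return self.length * self.width * self.height

    def bev_corners(self) -> list[tuple[float, float]]:
        """Bird's-eye-view footprint: four ground-plane corners rotated by heading."""
        c, s = math.cos(self.heading), math.sin(self.heading)
        half_l, half_w = self.length / 2, self.width / 2
        corners = []
        for dx, dy in [(half_l, half_w), (half_l, -half_w),
                       (-half_l, -half_w), (-half_l, half_w)]:
            # Rotate the offset (dx, dy) by the heading, then translate to the center.
            corners.append((self.cx + dx * c - dy * s, self.cy + dx * s + dy * c))
        return corners

car = Box3D(cx=10.0, cy=5.0, cz=0.8, length=4.0, width=2.0, height=1.5, heading=0.0)
print(car.volume())       # 12.0
print(car.bev_corners())  # [(12.0, 6.0), (12.0, 4.0), (8.0, 4.0), (8.0, 6.0)]
print(Box2D(100.0, 50.0, 180.0, 110.0).area())  # 4800.0
```

The heading is what makes 3D labeling harder than drawing a rectangle: an annotator must estimate which way a vehicle is facing, not just where it is.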

A typical high-quality self-driving dataset might include large volumes of metadata and annotations, such as:

  • pixel-wise semantic annotation
  • 3D semantic annotation
  • pixel-wise object instance annotation
  • fine-grained road segmentation
  • moving object trajectory
  • high-precision GPS data.
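
The difference between pixel-wise semantic annotation and pixel-wise object instance annotation can be shown with a toy example. In a semantic annotation every pixel carries a class label, so two touching cars merge into one region; an instance annotation gives each object its own ID. The tiny masks and class IDs below are invented for illustration and do not follow any specific dataset format.

```python
from collections import Counter

# Hypothetical 4x6 instance mask: each pixel stores the ID of the object
# it belongs to; 0 means background.
INSTANCE_MASK = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [3, 3, 0, 0, 0, 0],
]
# Class of each instance (illustrative IDs: 1 = car, 2 = pedestrian).
INSTANCE_CLASS = {1: 1, 2: 1, 3: 2}

def to_semantic(instance_mask, instance_class):
    """Collapse instance IDs into class IDs, yielding a semantic mask."""
    return [[instance_class.get(i, 0) for i in row] for row in instance_mask]

def pixels_per_instance(instance_mask):
    """Count labeled pixels for each object instance, excluding background."""
    return dict(Counter(i for row in instance_mask for i in row if i != 0))

print(to_semantic(INSTANCE_MASK, INSTANCE_CLASS)[0])  # [0, 1, 1, 1, 1, 0]
print(pixels_per_instance(INSTANCE_MASK))             # {1: 4, 2: 4, 3: 2}
```

In the semantic output the two adjacent cars become one undivided run of class 1, which is why instance annotation is listed as a separate item.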

The Role of Data Annotators

Businesses that want to build CI applications that use AI need high-quality data to train their AI models. That need has created a new market for data annotation services. The companies in this market aim to provide greater value than a public crowdsourcing service might offer: rather than relying on an open crowd, this new breed of companies uses highly trained data labelers, and many develop their own advanced annotation tools.

The new data labeling companies differentiate themselves from traditional crowdsourcing platforms that offer labeling services. The companies in this new category often tout their offerings as managed data labeling services. They deliver domain-specific labeled data that undergoes quality control.
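
Quality control practices vary by vendor and are not detailed here. One common check for bounding-box labels, shown as an assumption rather than any company's documented process, is to have two annotators label the same image and flag objects whose boxes overlap too little, measured by intersection over union (IoU). The object IDs and the 0.7 threshold are arbitrary illustrative values.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def flag_disagreements(labels_a, labels_b, threshold=0.7):
    """Return object IDs that only one annotator labeled, or whose boxes disagree."""
    flagged = []
    for obj_id in sorted(set(labels_a) | set(labels_b)):
        if obj_id not in labels_a or obj_id not in labels_b:
            flagged.append(obj_id)  # missed by one annotator
        elif iou(labels_a[obj_id], labels_b[obj_id]) < threshold:
            flagged.append(obj_id)  # both labeled it, but the boxes differ too much
    return flagged

annotator_a = {"car_1": (0, 0, 10, 10), "car_2": (20, 20, 30, 30)}
annotator_b = {"car_1": (1, 1, 10, 10), "car_2": (25, 25, 35, 35),
               "sign_1": (50, 0, 55, 5)}
print(flag_disagreements(annotator_a, annotator_b))  # ['car_2', 'sign_1']
```

Flagged items would then go to a reviewer; agreement between independent labelers is a cheap signal, though it cannot catch mistakes both annotators make the same way.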

If funding is any measure of the need for, or value of, these new companies, the services they provide are indeed in great demand. Earlier this year, Scale AI closed $100 million in funding, bringing its valuation above $1 billion. And last month, CloudFactory announced that it had raised an additional $65 million in venture funding, bringing its total funding to $78 million.

Why the high level of investment? The human insight these annotation companies provide helps minimize labeling bias and yields more precise, more accurate labels. That, in turn, produces higher-quality data for training AI models, which leads to more resilient and reliable AI systems.

Salvatore Salamone

Salvatore Salamone is a physicist by training who writes about science and information technology. During his career, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He is also the author of three business technology books.
