New Tool Offers Help with Data Annotation

Innovative tools and data annotation service companies help produce much higher quality data to train AI models in much faster time frames.

Feb 4, 2020

Data labeling and annotation are critical in training the machine learning (ML) models and artificial intelligence (AI) algorithms used in continuous intelligence (CI) applications. Cloud Annotations, a new data annotation tool from IBM, gives businesses another option to help with this time-consuming yet vital task.

See also: Data Annotation Feeds the AI Beast

The tool, released on GitHub, is a fast, easy, and collaborative open-source image annotation tool for teams and individuals. It uses AI to automate data annotation, reducing manual steps such as drawing outlines around objects.

Cloud Annotations supports uploading both photos and videos, but there are a few limitations to consider. IBM’s best-practice guidance for getting good results with the tool includes:

  • Object Type: The model is optimized for photographs of objects in the real world. It is unlikely to work well for x-rays, hand drawings, scanned documents, receipts, etc.
  • Object Environment: The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution images (such as from a security camera), your training data should be composed of blurry, low-resolution images. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training images.
  • Difficulty: The model generally can’t predict labels that humans can’t assign. So, if a human can’t be trained to assign labels by looking at the image for 1-2 seconds, the model likely can’t be trained to do it either.
  • Label Count: IBM recommends at least 50 labels per object category for a usable model, but using hundreds or thousands would provide better results.
  • Image Dimensions: The model resizes the image to 300×300 pixels, so keep that in mind when training the model with images where one dimension is much longer than the other.
  • Object Size: The object of interest should cover at least ~5% of the image area to be detected. For example, on the resized 300×300 pixel image, the object should cover ~60×60 pixels.
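
The numeric guidelines above (label count, image dimensions, and object size) lend themselves to an automated pre-check before training. The following is a minimal sketch in Python, not part of Cloud Annotations itself: the Box class, the function names, and the 2:1 aspect-ratio threshold are all assumptions made for illustration. Note that 60×60 pixels is 4% of a 300×300 image, so the sketch uses the 60×60 figure as its cutoff rather than a strict 5%.

```python
from collections import Counter
from dataclasses import dataclass

MODEL_SIDE = 300   # images are resized to 300x300, per the guidance above
MIN_LABELS = 50    # "at least 50 labels per object category"
# Cutoff taken from the ~60x60 example (4% of 300x300); the guidance rounds to "~5%".
MIN_AREA_FRACTION = (60 * 60) / (MODEL_SIDE * MODEL_SIDE)
MAX_ASPECT_RATIO = 2.0  # assumption: what counts as "much longer" is not specified


@dataclass
class Box:
    """Hypothetical bounding-box label, measured in original-image pixels."""
    label: str
    width: float
    height: float


def area_fraction(box, image_width, image_height):
    """Share of the image covered by the box (unchanged by the 300x300 resize)."""
    return (box.width * box.height) / (image_width * image_height)


def check_image(image_width, image_height, boxes):
    """Return a list of warnings for one labeled image."""
    warnings = []
    ratio = max(image_width, image_height) / min(image_width, image_height)
    if ratio > MAX_ASPECT_RATIO:
        warnings.append(f"aspect ratio {ratio:.1f}:1 is far from square")
    for box in boxes:
        fraction = area_fraction(box, image_width, image_height)
        if fraction < MIN_AREA_FRACTION:
            warnings.append(f"'{box.label}' covers only {fraction:.1%} of the image")
    return warnings


def underlabeled_categories(all_boxes):
    """Return categories that have fewer than MIN_LABELS labels, with their counts."""
    counts = Counter(box.label for box in all_boxes)
    return {label: n for label, n in counts.items() if n < MIN_LABELS}
```

Running check_image over each image and underlabeled_categories over the whole dataset gives a quick list of images and categories that may need attention before training.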

Expanding the Market

Cloud Annotations is the latest tool designed to help with annotating the data used in ML and AI training. Other tools that offer help in this area include Intel’s Computer Vision Annotation Tool (CVAT) and Google’s Fluid Annotation.

The Computer Vision Annotation Tool (CVAT) is an open-source tool for annotating digital images and videos. The main function of the application is to provide users with convenient annotation instruments. For that purpose, Intel designed CVAT as a versatile service with many features.

CVAT is a browser-based application for both individuals and teams that supports different work scenarios. The main tasks of supervised machine learning can be divided into three groups:

  • Object detection
  • Image classification
  • Image segmentation

CVAT lets users annotate data for each of these cases.
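
To make the difference between the three tasks concrete, here is a rough Python sketch of what a single annotation might carry in each case. These classes are hypothetical and do not reflect CVAT’s actual data model or export formats; they only illustrate that classification labels a whole image, detection adds a box, and segmentation adds an outline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassificationLabel:
    """Image classification: one label for the whole image."""
    image: str
    label: str


@dataclass
class DetectionLabel:
    """Object detection: a label plus a box given as (x, y, width, height)."""
    image: str
    label: str
    box: Tuple[float, float, float, float]


@dataclass
class SegmentationLabel:
    """Image segmentation: a label plus a polygon outlining the object."""
    image: str
    label: str
    polygon: List[Tuple[float, float]]


def bounding_box(polygon):
    """Derive an (x, y, width, height) box that encloses a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
```

The bounding_box helper shows why segmentation is the most detailed of the three: a detection box can be derived from an outline, but not the other way around.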

Fluid Annotation first runs an image through a pre-trained semantic segmentation model (Mask R-CNN). This generates around 1,000 image segments, each with a class label and a confidence score. The segments with the highest confidence scores are used to initialize the labeling, which is presented to the annotator. The annotator can then: (1) Change the label of an existing segment, choosing from a shortlist generated by the machine. (2) Add a segment to cover a missing object. The machine identifies the most likely pre-generated segments, and the annotator scrolls through them to select the best one. (3) Remove an existing segment. (4) Change the depth order of overlapping segments.
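
The workflow described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Google’s implementation: the Segment class and all function names are hypothetical, the annotator is assumed to point at a pixel when adding a segment, and depth order is modeled as list position with the last item drawn on top.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Segment:
    """Hypothetical pre-generated segment from the segmentation model."""
    segment_id: int
    label: str
    confidence: float
    pixels: frozenset  # set of (row, col) positions covered by the segment


def initialize_labeling(candidates, max_segments):
    """Start from the highest-confidence candidate segments."""
    ranked = sorted(candidates, key=lambda s: s.confidence, reverse=True)
    return ranked[:max_segments]


def change_label(labeling, segment_id, new_label):
    """Action 1: relabel one existing segment."""
    return [replace(s, label=new_label) if s.segment_id == segment_id else s
            for s in labeling]


def suggest_segments(candidates, labeling, pixel):
    """Action 2: unused candidates covering a pixel, most confident first."""
    used = {s.segment_id for s in labeling}
    hits = [s for s in candidates
            if pixel in s.pixels and s.segment_id not in used]
    return sorted(hits, key=lambda s: s.confidence, reverse=True)


def remove_segment(labeling, segment_id):
    """Action 3: drop one segment from the labeling."""
    return [s for s in labeling if s.segment_id != segment_id]


def move_to_front(labeling, segment_id):
    """Action 4: change depth order so the chosen segment is drawn on top."""
    behind = [s for s in labeling if s.segment_id != segment_id]
    chosen = [s for s in labeling if s.segment_id == segment_id]
    return behind + chosen
```

Each function returns a new list rather than modifying the labeling in place, which would make it straightforward to add undo to an annotation interface built this way.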

Meeting Market Demands

Businesses that want to build CI applications that use AI need high-quality data to train the AI models. That need has created a new market for data annotation tools and services. Complementing tools like IBM’s Cloud Annotations, a booming industry has emerged, composed of companies that specialize in speedy and highly accurate data annotation services. Some of the companies in this market deliver domain-specific labeled data.

The companies that offer such services provide greater value than a public crowdsourcing service might. Rather than relying on a crowd, this new breed of company uses highly trained data labelers, and many develop their own advanced annotation tools. Many of those tools are AI-based and work either on their own or in tandem with a human operator.

One example of such a company is Samasource, which uses a secured cloud annotation platform, SamaHub, to manage the annotation lifecycle. This includes image upload, annotation, data sampling and QA, data delivery, and overall collaboration.

Taken together, these innovative tools and data annotation service companies help produce much higher quality data to train AI models in much faster time frames. Faster access to high-quality annotated data can only help businesses build more resilient and reliable AI and CI systems and applications.

Salvatore Salamone

Salvatore Salamone is a physicist by training who writes about science and information technology. During his career, he has been a senior or executive editor at many industry-leading publications including High Technology, Network World, Byte Magazine, Data Communications, LAN Times, InternetWeek, Bio-IT World, and Lightwave, The Journal of Fiber Optics. He also is the author of three business technology books.
