
The State of AI: Better Computer Vision, Faster NLP


AI has become part of real-life scenarios, including managing national electric grids, supermarket warehousing optimization, drug discovery, and healthcare.

Written By
Joe McKendrick
Nov 30, 2021

Artificial intelligence (AI) is making strides in image recognition and natural language processing. This year has also brought record funding for AI startups, as well as IPOs by data infrastructure and cybersecurity companies that help enterprises retool for the AI-first era.

These are some of the observations in Nathan Benaich and Ian Hogarth’s fourth annual, densely packed “State of AI” report, which reviews developments in the field over the past year. Over that period, they report, AI has become part of real-life scenarios, including managing national electric grids, automated supermarket warehousing optimization, drug discovery, and healthcare.

The report also tracked the following developments:

Self-supervision is taking over computer vision. The report’s authors point to Facebook AI’s introduction of SEER, a self-supervised model pre-trained on a billion Instagram images that achieves 84.2% top-1 accuracy on ImageNet, comfortably surpassing all existing self-supervised models. SEER is also “a good few-shot learner,” they note: “It still achieves 77.9% accuracy on ImageNet when trained with only 10% of the dataset. It also outperforms supervised methods on other tasks like object detection and segmentation.”

Transformers extend into efficient self-attention-based architectures. Benaich and Hogarth also document the rise of “transformers,” deep-learning architectures built on a mechanism called self-attention, as a key part of AI. Transformers have emerged as a general-purpose architecture for machine learning and are increasingly applied to natural language processing (NLP) and computer vision. “DeepMind’s Perceiver is one such architecture,” they observe.
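The report itself contains no code. As a rough illustration of the self-attention mechanism behind transformers, here is a minimal single-head scaled dot-product self-attention in plain Python. The function names and the toy inputs are hypothetical; production transformers add multiple heads, learned projection weights, and many other layers, and this is not the Perceiver architecture. Because every position scores every other position, the cost grows with the square of the sequence length, which is the cost that “efficient” attention variants aim to reduce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is a sequence of n token vectors (n x d_model); w_q, w_k, w_v are
    hypothetical projection matrices mapping d_model to d_k, d_k, and d_v.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    out = []
    for q_i in q:
        # Score this query against every key: n scores per position, n*n overall.
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / scale for k_j in k]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Toy example: two 2-d tokens with identity projections (illustrative only).
identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
result = self_attention(tokens, identity, identity, identity)
print(result)
```

With identity projections, each output row is simply that token’s attention weights, so each token attends most strongly to itself while still mixing in information from the other.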

“Textless” natural language processing emerges. Textless NLP is based on Generative Spoken Language Modeling (GSLM), which addresses the “task of learning speech representations directly from raw audio without any labels or text.”

Less is more: watching a few clips is enough to learn how to caption a video. To solve video-and-language (V&L) tasks such as video captioning, a new model called ClipBERT “only uses a few sparsely sampled short clips,” according to Benaich and Hogarth. “It still outperforms existing methods that exploit full-length videos.” As they note, “a natural improvement of this process would be end-to-end learning of vision and text encoders. But due to the length of the video clips, this is usually computationally unaffordable.” Working from a few short clips instead of full-length videos is what makes end-to-end learning affordable for ClipBERT.
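ClipBERT’s own code is not reproduced here. As a hypothetical sketch of what “sparsely sampled short clips” can look like, the helper below divides a video’s valid clip start positions into equal segments and draws one short clip at random from each, so a model sees a handful of frames rather than the whole video. The function name, its parameters, and the segment-based random sampling scheme are assumptions for illustration, not ClipBERT’s exact procedure.

```python
import random


def sample_sparse_clips(num_frames, num_clips, clip_len, rng=None):
    """Pick a few short clips spread across a video instead of using every frame.

    Hypothetical helper: the valid clip start positions are split into
    num_clips equal segments, and one start is drawn at random from each.
    Returns a list of frame-index lists, one per clip.
    """
    if clip_len < 1 or clip_len > num_frames:
        raise ValueError("clip_len must be between 1 and num_frames")
    rng = rng or random.Random()
    valid_starts = num_frames - clip_len + 1  # starts that keep the clip in bounds
    clips = []
    for i in range(num_clips):
        lo = valid_starts * i // num_clips
        # Guard against empty segments when there are more clips than starts.
        hi = max(lo, valid_starts * (i + 1) // num_clips - 1)
        start = rng.randint(lo, hi)
        clips.append(list(range(start, start + clip_len)))
    return clips


# A 300-frame video, sampled as 4 clips of 2 frames each.
clips = sample_sparse_clips(num_frames=300, num_clips=4, clip_len=2, rng=random.Random(0))
frames_used = sum(len(c) for c in clips)
print(clips)
print(f"{frames_used} of 300 frames used")
```

In this toy setup the model would process 8 of 300 frames, the kind of saving that makes training the vision and text encoders end to end more affordable. Predictions from the separate clips would then need to be combined, for example by averaging them.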

Joe McKendrick

Joe McKendrick is RTInsights Industry Editor and an industry analyst focusing on artificial intelligence, digital, cloud, and Big Data topics. His work also appears in Forbes and Harvard Business Review. Over the last three years, he served as co-chair for the AI Summit in New York, as well as on the organizing committee for IEEE's International Conferences on Edge Computing. Follow him on Twitter @joemckendrick.


Analysis and market insights on real-time analytics including Big Data, the IoT, and cognitive computing. Business use cases and technologies are discussed.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.