Dynaboard: Holistic Next-Generation AI Model Benchmarking

Dynaboard allows users to interact with uploaded models in real time to assess their quality and permits the collection of additional metrics such as memory use, throughput, and robustness.

One exciting application of artificial intelligence in business is its ability to talk to and understand humans. Facebook created Dynabench, a first-of-its-kind benchmarking platform that allows humans to evaluate uploaded models. Facebook has recently announced an expansion of this platform called Dynaboard.

Dynaboard is software designed to broaden evaluation from accuracy alone to a more holistic approach. Its aim is to help train, evaluate, and improve AI language models based on a series of interconnected characteristics, much as human language itself depends on many interrelated qualities.

The software lets developers visualize the tradeoffs they make when training for one characteristic over another. A developer can prioritize higher accuracy, for example, or emphasize fairness instead of weighting accuracy above everything else.

Additionally, according to Facebook, “the software evaluates Natural Language Processing (NLP) models directly instead of relying on self-reported metrics or predictions on a single dataset. Under this paradigm, models are submitted to be evaluated in the cloud, circumventing the issues of reproducibility, accessibility, and backwards compatibility that often hinder benchmarking in NLP. This allows users to interact with uploaded models in real time to assess their quality, and permits the collection of additional metrics such as memory use, throughput, and robustness.”
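To make the idea of collecting metrics beyond accuracy concrete, here is a minimal Python sketch that measures throughput and peak memory for a model's prediction function. It is an illustration only, not how Dynaboard measures these values: `profile_model` and `toy_predict` are hypothetical names, the toy model is a stand-in for a real NLP model, and `tracemalloc` only tracks memory allocated by Python itself.

```python
import time
import tracemalloc
from typing import Callable, Dict, List


def profile_model(predict: Callable[[str], str], examples: List[str]) -> Dict[str, float]:
    """Run predict over examples; report throughput and peak Python memory."""
    tracemalloc.start()
    start = time.perf_counter()
    for text in examples:
        predict(text)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "examples_per_second": len(examples) / elapsed if elapsed > 0 else float("inf"),
        "peak_memory_bytes": float(peak_bytes),
    }


def toy_predict(text: str) -> str:
    # Hypothetical stand-in for a real NLP model
    return "positive" if "good" in text.lower() else "negative"


stats = profile_model(toy_predict, ["Good movie", "Bad plot"] * 1000)
print(stats)
```

The exact numbers will vary from machine to machine, which is one reason running every submitted model in the same hosted environment makes such measurements easier to compare.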

How it works: Evaluation-as-a-Service

Dynaboard addresses the challenges inherent in treating benchmarking as a problem with a single correct solution. Instead, it approaches the process using a human-machine loop built around a “Dynascore” that the developer configures. By placing more or less weight on each component of the Dynascore, developers and researchers can evaluate the real-world implications of their designs.

Facebook has collected over 400,000 examples so far and uploaded two challenging datasets. With an overall focus on language understanding, Facebook wants to lower the obstacles associated with rigorous testing.

Dynaboard requires minimal overhead, allowing developers to test new solutions with this all-in-one software. Score components include:

Accuracy
Compute
Memory
Robustness
Fairness

The metric also leaves room for refinement by Facebook, the community, and other developers.
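The effect of weighting the score components can be sketched in a few lines of Python. This is a simplified weighted average, not Facebook's actual Dynascore formula, and it assumes every metric has already been normalized to a 0-to-1 scale where higher is better. The function name, the two models, and all of the numbers are made up for illustration.

```python
from typing import Dict

COMPONENTS = ("accuracy", "compute", "memory", "robustness", "fairness")


def weighted_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of normalized metrics (each in [0, 1], higher is better)."""
    total = sum(weights[name] for name in COMPONENTS)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[name] * metrics[name] for name in COMPONENTS) / total


# Hypothetical, made-up results for two models
model_a = {"accuracy": 0.90, "compute": 0.40, "memory": 0.50, "robustness": 0.70, "fairness": 0.60}
model_b = {"accuracy": 0.80, "compute": 0.80, "memory": 0.80, "robustness": 0.70, "fairness": 0.90}

# Two hypothetical weightings: one dominated by accuracy, one balanced
accuracy_heavy = {"accuracy": 8, "compute": 0.5, "memory": 0.5, "robustness": 0.5, "fairness": 0.5}
balanced = {name: 1 for name in COMPONENTS}

for label, weights in (("accuracy-heavy", accuracy_heavy), ("balanced", balanced)):
    score_a = weighted_score(model_a, weights)
    score_b = weighted_score(model_b, weights)
    print(f"{label}: model A = {score_a:.2f}, model B = {score_b:.2f}")
```

With these made-up numbers, model A ranks first when accuracy dominates the weighting, while model B ranks first when all five components count equally. That reversal is the kind of tradeoff the adjustable weights are meant to expose.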

Improving AI Benchmarks

Researchers can now upload their own models through Dynalab, a command-line interface tool and library. The ultimate goal is to show what state-of-the-art models can accomplish. Facebook hopes to contribute to the long-term development of fair, unbiased AI that’s beneficial and useful in the real world.

About Elizabeth Wallace
