SHARE
Facebook X Pinterest WhatsApp

AI Security Leaderboard Reveals Model Cybersecurity

thumbnail
AI Security Leaderboard Reveals Model Cybersecurity

Introducing a new benchmark for AI security, the CalypsoAI Security Leaderboard evaluates the cybersecurity of leading language models using advanced simulation tools and a comprehensive new metric, CASI.

Apr 30, 2025

The use of AI and large language models (LLMs) introduces several critical security concerns, with data privacy and prompt injection attacks among the most pressing. Additionally, LLMs can inadvertently expose sensitive or proprietary information if not properly secured, especially when trained on or interacting with confidential datasets. Ensuring secure deployment, rigorous access control, and continuous monitoring is essential to mitigate these threats in enterprise environments.

To that end, Calypso AI has launched a new leaderboard aptly named “the CalypsoAI Security Leaderboard.” It’s designed to rank the security of popular AI models by simulating attacks on them using the startup’s AI agent. It ranks the models using its Inference Platform. This could provide organizations with valuable input as they make decisions to adopt third-party models into their workflows.

How does the leaderboard work?

The Leaderboard employs a specific component known as Red-Team, which is part of CalypsoAI’s Inference Platform. This component conducts security checks by simulating malicious prompts that aim to exploit weaknesses in large language models (LLMs). Red-Team’s toolkit includes over 10,000 prompts and an AI agent capable of crafting bespoke cyberattacks, such as tricking a banking chatbot into revealing sensitive credit card information.

The results of these evaluations are quantified through the CalypsoAI Security Index (CASI). This metric offers a detailed measure of an LLM’s security by considering the severity of potential vulnerabilities and the sophistication of potential cyberattacks. Unlike the common Attack Success Rate (ASR) metric, CASI provides a deeper insight into security levels by incorporating the severity and complexity of threats.

See also: With AI, It’s a Complex Future for Cybersecurity

Advertisement

How do the rankings look at this stage?

The initial release of the Leaderboard has ranked twelve popular LLMs. Topping the list is Anthropic PBC’s Claude 3.5 Sonnet with a CASI score of 96.25, followed closely by Microsoft Corp.’s Phi4-14B and Claude 3.5 Haiku. Notably, there is a significant gap after the top three, with OpenAI’s GPT-4o landing a distant fourth with a score of 75.06.

The Leaderboard also introduces additional metrics like the risk-to-performance ratio and the cost of security, which aid organizations in balancing security against performance and assessing financial risks linked to AI breaches. The tool may help companies more fully understand where their adopted tools stand in cybersecurity and may offer critical decision-making criteria. The CalypsoAI Security Leaderboard aims to set a benchmark for AI’s safe and scalable integration into business operations.

thumbnail
Elizabeth Wallace

Elizabeth Wallace is a Nashville-based freelance writer with a soft spot for data science and AI and a background in linguistics. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain - clearly - what it is they do.

Recommended for you...

AI Agents Need Keys to Your Kingdom
The Rise of Autonomous BI: How AI Agents Are Transforming Data Discovery and Analysis
Why the Next Evolution in the C-Suite Is a Chief Data, Analytics, and AI Officer
Digital Twins in 2026: From Digital Replicas to Intelligent, AI-Driven Systems

Featured Resources from Cloud Data Insights

The Difficult Reality of Implementing Zero Trust Networking
Misbah Rehman
Jan 6, 2026
Cloud Evolution 2026: Strategic Imperatives for Chief Data Officers
Why Network Services Need Automation
The Shared Responsibility Model and Its Impact on Your Security Posture
RT Insights Logo

Analysis and market insights on real-time analytics including Big Data, the IoT, and cognitive computing. Business use cases and technologies are discussed.

Property of TechnologyAdvice. © 2026 TechnologyAdvice. All Rights Reserved

Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.