AI models could be hacked using a new type of attack called Skeleton Key, Microsoft warns

Microsoft has shared details about a new jailbreak technique that bypasses the safety guardrails built into AI models, causing them to return malicious and dangerous content.

The researchers call the technique Skeleton Key, and they successfully tested it against well-known models including Meta Llama3-70b-instruct (base), Google Gemini Pro (base), OpenAI GPT 3.5 Turbo (hosted), OpenAI GPT 4o (hosted), Mistral Large (hosted), Anthropic Claude 3 Opus (hosted), and Cohere Commander R Plus (hosted).