The future of penetration testing and vulnerability hunting will most likely depend not on a single AI, but on teams of AI agents, as security experts have warned on multiple occasions.
Researchers at the University of Illinois Urbana-Champaign (UIUC) found that a team of large language model (LLM) agents outperformed a single AI agent and significantly outperformed the ZAP and Metasploit vulnerability-scanning tools.
“Although individual AI agents are incredibly powerful, they are limited by existing LLM capabilities. For example, if an AI agent goes down a path (for example, trying to exploit an XSS), it is difficult for the agent to backtrack and try to exploit another vulnerability,” said researcher Daniel Kang. “In addition, LLMs work best when focused on a single task.”
Effective system
AI's shortcoming in finding vulnerabilities is at the same time its greatest strength: once it commits to one route, it cannot backtrack and take a different one. It also works best when focused on a single task.
Therefore, the group designed a system called Hierarchical Planning and Task-Specific Agents (HPTSA), which consists of a planner, a manager, and multiple agents. In this system, the planner examines the application (or website) to determine which exploits to attempt and then passes them to the manager. The manager then delegates the different exploit paths to different task-specific LLM agents.
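The paper's actual implementation is not public, but the planner/manager/agent hierarchy described above can be sketched roughly as follows. Everything here is illustrative: the class names, the stubbed "LLM" decisions, and the target representation are all assumptions, not the researchers' code.

```python
# Hypothetical sketch of an HPTSA-style hierarchy: a planner surveys the
# target, a manager delegates each candidate exploit class to a
# task-specific agent. Stub logic stands in for real LLM calls.
from dataclasses import dataclass


@dataclass
class Finding:
    vuln_type: str   # e.g. "XSS", "SQLi"
    exploited: bool


class TaskSpecificAgent:
    """Specialist for one vulnerability class (stands in for an LLM agent)."""
    def __init__(self, vuln_type: str):
        self.vuln_type = vuln_type

    def attempt(self, target: dict) -> Finding:
        # Stub: "succeeds" only if the target actually has this flaw.
        return Finding(self.vuln_type, self.vuln_type in target["vulns"])


class Manager:
    """Delegates each planned exploit path to the matching specialist."""
    def __init__(self, agents):
        self.agents = {a.vuln_type: a for a in agents}

    def run(self, target: dict, candidate_types):
        return [self.agents[t].attempt(target)
                for t in candidate_types if t in self.agents]


class Planner:
    """Examines the target and decides which exploit classes to try."""
    def plan(self, target: dict):
        # Stub heuristic in place of an LLM's reconnaissance step.
        return ["XSS", "SQLi"] if target["has_forms"] else ["CSRF"]


def hptsa(target: dict):
    planner = Planner()
    manager = Manager([TaskSpecificAgent("XSS"),
                       TaskSpecificAgent("SQLi"),
                       TaskSpecificAgent("CSRF")])
    return manager.run(target, planner.plan(target))


results = hptsa({"has_forms": True, "vulns": {"SQLi"}})
```

Because each specialist explores its own path independently, a failed XSS attempt does not block the SQLi agent, which is the backtracking limitation Kang describes for a single agent.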
While the system may seem complex, in practice it has proven to be quite effective. Of the 15 vulnerabilities tested in the experiment, HPTSA exploited 8 of them. A single GPT-4 agent exploited only 3, meaning HPTSA was more than twice as effective. In comparison, the ZAP and Metasploit tools were unable to exploit a single vulnerability.
There was one case where a single GPT-4 agent performed better than HPTSA: when a description of the vulnerability was included in the prompt. With that help, it managed to exploit 11 of the 15 vulnerabilities. However, this requires the researcher to carefully craft the prompt, something many attackers may not be able to replicate.
The researchers said the prompts used in this experiment will not be shared publicly and will only be provided to other researchers who request them.
Via Tom's Hardware