GPT-4 can exploit vulnerabilities simply by reading about them online

Marijan Hassan - Tech Journalist
Apr 22, 2024
2 min read

AI agents built on GPT-4 can now exploit most public vulnerabilities affecting real-world systems, according to recent findings from the University of Illinois Urbana-Champaign (UIUC).

This is a major leap from existing use cases for AI in cyber attacks. In the past, cybercriminals have only been able to use large language models (LLMs) for tasks such as producing phishing emails and basic malware.

The new research indicates that with only GPT-4 and an open-source framework to wrap it, threat actors could automate the exploitation of vulnerabilities as soon as they become public.

The experiment

To test their theory, the UIUC researchers developed an LLM agent consisting of four components: a prompt, a base LLM, a framework (ReAct, implemented in LangChain), and necessary tools such as a terminal and code interpreter.
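To make the four-part design concrete, here is a minimal sketch of a ReAct-style loop, in which the model alternates between proposing an action and reading the tool's result. It is not the researchers' code and does not use LangChain. The `run_agent` function, the "Action:" / "Final Answer:" text format, the scripted stand-in model, and the harmless `word_count` tool are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def run_agent(llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              prompt: str,
              max_steps: int = 5) -> str:
    """Alternate between model replies and tool calls until the model answers."""
    transcript = prompt
    for _ in range(max_steps):
        reply = llm(transcript).strip()
        transcript += "\n" + reply
        if reply.startswith("Final Answer:"):
            return reply[len("Final Answer:"):].strip()
        if reply.startswith("Action:"):
            # Assumed reply format: "Action: <tool name>: <tool input>"
            name, _, tool_input = reply[len("Action:"):].partition(":")
            tool = tools.get(name.strip())
            observation = tool(tool_input.strip()) if tool else "unknown tool"
            # Feed the tool result back so the next model call can see it.
            transcript += "\nObservation: " + observation
    return "stopped: step limit reached"


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for the base LLM: replays canned replies in order."""
    remaining = iter(replies)
    return lambda _transcript: next(remaining)


def word_count(text: str) -> str:
    """Harmless example tool standing in for a terminal or code interpreter."""
    return str(len(text.split()))


llm = make_scripted_llm([
    "Action: word_count: the quick brown fox",
    "Final Answer: the text has 4 words",
])
result = run_agent(llm, {"word_count": word_count}, "Task: count the words.")
print(result)
```

In a real agent, the scripted model would be replaced by calls to an actual LLM, and the tool dictionary would hold whatever tools the agent is allowed to use.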

The agent was tested on 15 known vulnerabilities in open-source software (OSS), including bugs affecting websites, containers, and Python packages. Notably, eight vulnerabilities were rated "high" or "critical" in terms of Common Vulnerability Scoring System (CVSS) severity. Of the 15, 11 were disclosed after GPT-4 had been trained, meaning the experiment was the first time the model was exposed to them.

The AI agent was then tasked with exploiting each bug in turn with only the security advisories as guidance.
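For readers unfamiliar with the "high" and "critical" labels, the sketch below maps a numeric CVSS base score to its severity label. It assumes the CVSS v3.x qualitative rating scale. The function name `cvss_severity` is made up for this example, and the two sample scores are the ones quoted elsewhere in this article.

```python
def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity label."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"       # 0.1 to 3.9
    if score < 7.0:
        return "medium"    # 4.0 to 6.9
    if score < 9.0:
        return "high"      # 7.0 to 8.9
    return "critical"      # 9.0 to 10.0


for score in (4.6, 9.8):
    print(score, cvss_severity(score))
```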

The results

Ten LLMs, including GPT-3.5 and Meta's Llama 2 Chat, were tested. Nine could not complete even one exploit successfully. GPT-4 failed in only two of the 15 cases.

In the first case, CVE-2024-25640, a 4.6 CVSS-rated issue in the Iris incident response platform, the agent failed because of difficulties navigating the Iris app.

In the second case, CVE-2023-51653, a 9.8 "critical" bug in the Hertzbeat monitoring tool, the researchers believe the agent failed because the vulnerability's description is written in Chinese.
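Taken together, the figures reported above imply a GPT-4 success rate of roughly 87%. The short calculation below spells that out, assuming each of the 15 vulnerabilities counts as a single pass-or-fail attempt. The variable names are illustrative.

```python
total = 15            # vulnerabilities in the test set
gpt4_failures = 2     # the two failed cases described above
post_training = 11    # disclosed after GPT-4 had been trained

gpt4_successes = total - gpt4_failures
rate = gpt4_successes / total
post_training_share = post_training / total

print(f"GPT-4 successes: {gpt4_successes}/{total} = {rate:.1%}")
print(f"Disclosed after training: {post_training}/{total} = {post_training_share:.1%}")
```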

The takeaway

While the feat is nothing a skilled human couldn't do, the speed and scale at which an AI agent can carry out such exploits is the real concern. As such, businesses will need to re-evaluate how they patch their systems.

More specifically, businesses may need to start looking into how to leverage LLMs themselves to patch new vulnerabilities as they become public.