NVIDIA Triton Inference Server is an open-source AI model inference platform that supports multiple deep learning frameworks and is widely used for deploying machine learning models in production environments. Recently, a critical vulnerability was disclosed on NVIDIA’s official website (Security Bulletin: NVIDIA Triton Inference Server – September 2025 | NVIDIA): NVIDIA Triton Inference Server for Windows and Linux contains a vulnerability in the Python backend, where an attacker could achieve remote code execution by manipulating the model name parameter in the model control APIs. A successful exploit might lead to remote code execution, denial of service, information disclosure, and data tampering.

All versions earlier than 25.08 are affected. Automated detection for this vulnerability is now available in Tencent’s open-source AI-Infra-Guard framework (GitHub – Tencent/AI-Infra-Guard: A.I.G (AI-Infra-Guard) is a comprehensive, intelligent, and easy-to-use AI Red Teaming platform developed by Tencent Zhuque Lab.), as verified through testing.
Python Backend and Security Audits
The Triton backend for Python (Python Backend) lets you serve models written in Python through Triton Inference Server without having to write any C++ code.
We can quickly set up the environment using a Docker container. To facilitate GDB debugging, a few extra options can be added when starting the container to lift the default restrictions.
docker run --shm-size=2g --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -p 8000:8000 -p 1234:1234 --name tritonserver25.07 -it -d nvcr.io/nvidia/tritonserver:25.07-py3
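Once the container is up, we can exec into it and install the debugging tools in advance (a minimal sketch; the NGC image is Ubuntu-based, but gdb and strace are assumed here not to be preinstalled):

docker exec -it tritonserver25.07 bash
apt-get update && apt-get install -y gdb strace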
NVIDIA Triton Inference Server provides an API for loading and unloading models (server/docs/protocol/extension_model_repository.md at main · triton-inference-server/server · GitHub):
Direct access will return an error: load / unload is not allowed.

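For example, sending a load request to a server started without explicit model control is rejected (a sketch; the model name add_sub is only a placeholder, and the exact error wording may differ between versions):

curl -X POST http://localhost:8000/v2/repository/models/add_sub/load
# -> explicit model load / unload is not allowed if polling is enabled (use --model-control-mode=explicit)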
Adding --model-control-mode=explicit to the startup parameters lets you manually load and unload models, which resolves this issue (Error when load model with http api · Issue #2633 · triton-inference-server/server · GitHub).
/opt/tritonserver/bin/tritonserver --model-repository=/tmp/models --model-control-mode=explicit
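With explicit mode enabled, a model placed under /tmp/models can then be loaded and unloaded on demand through the repository API. A minimal sketch (the model name add_sub and the layout shown here are illustrative, not the article’s actual test model):

# Layout expected for a python-backend model:
#   /tmp/models/add_sub/config.pbtxt   (declares backend: "python" plus the input/output definitions)
#   /tmp/models/add_sub/1/model.py     (implements the TritonPythonModel class)
mkdir -p /tmp/models/add_sub/1

# Load and unload the model by name via the model control API
curl -X POST http://localhost:8000/v2/repository/models/add_sub/load
curl -X POST http://localhost:8000/v2/repository/models/add_sub/unload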
In a previous article (https://medium.com/@Qubit18/from-xss-to-rce-critical-vulnerability-chain-in-anthropic-mcp-inspector-cve-2025-58444-7092ba4ac442), I happened to discover an open-source AI security detection framework called AI-Infra-Guard (GitHub – Tencent/AI-Infra-Guard: A.I.G (AI-Infra-Guard) is a comprehensive, intelligent, and easy-to-use AI Red Teaming platform developed by Tencent Zhuque Lab.). What impressed me is that it can not only detect MCP services but also scan AI infrastructure for vulnerabilities.

The local deployment process is as follows:
git clone https://github.com/Tencent/AI-Infra-Guard.git
cd AI-Infra-Guard
docker-compose -f docker-compose.images.yml up -d
We only need to input the target URL to initiate detection. The detection results are as follows:

It can help us harden our AI infrastructure. To understand the root cause of the vulnerability, however, further analysis is required.
How to trigger command injection?
Enable process monitoring, set the model’s backend to python, and then make the load request again. You will observe that the handler spawns several subprocesses to execute commands, and that the command string contains parameters controllable from the API request.

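One simple way to do this is to trace execve calls made by the server process while replaying the load request (a sketch; it relies on strace being installed in the container as above and on the SYS_PTRACE capability granted at docker run time):

strace -f -e trace=execve -s 512 -p "$(pidof tritonserver)" 2>&1 | grep bash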
Start GDB and you will see that the program loads the shared object for the Python backend, libtriton_python.so.

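Attaching to the running server process from inside the container is enough to confirm this (a sketch; assumes gdb is installed as above):

gdb -p "$(pidof tritonserver)"
# Inside gdb:
#   info sharedlibrary libtriton_python    # confirms the Python backend .so is mapped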
Let’s analyze the processing flow of the API request. The URL is handled by HTTPAPIServer::Handle, and the corresponding processing function is HandleRepositoryControl.
// modelcontrol_regex_ is
//   R"(/v2/repository(?:/([^/]+))?/(index|models/([^/]+)/(load|unload)))"
else if (RE2::FullMatch(
             std::string(req->uri->path->full), modelcontrol_regex_,
             &repo_name, &kind, &model_name, &action)) {
  // model repository
  if (kind == "index") {
    HandleRepositoryIndex(req, repo_name);
    return;
  } else if (kind.find("models", 0) == 0) {
    HandleRepositoryControl(req, repo_name, model_name, action);
    return;
  }
}
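Note that the only constraint the route places on the model name capture group is [^/]+, i.e. anything without a slash passes the match. This can be checked quickly outside the server (a sketch using grep -P as a stand-in for RE2::FullMatch):

regex='^/v2/repository(?:/([^/]+))?/(index|models/([^/]+)/(load|unload))$'
echo '/v2/repository/models/foo$(id)/load' | grep -P "$regex"                    # matches: shell metacharacters survive routing
echo '/v2/repository/models/foo/bar/load' | grep -P "$regex" || echo 'no match'  # "/" is the only rejected character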
In HandleRepositoryControl, the parameters of the POST request are parsed, after which triton::core::CreateModel is called to attempt to create the model. Based on the specified backend type, the corresponding backend component is selected. For example, when the backend value is python, the triton::backend::python component is invoked, and execution enters the StubLauncher::Launch function. The call stack obtained during dynamic debugging is as follows:

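A similar call stack can be reproduced by breaking on the stub launcher and re-sending the load request (a sketch; the fully qualified symbol name follows the python_backend source and only resolves if debug symbols are available in the image):

gdb -p "$(pidof tritonserver)" \
    -ex 'break triton::backend::python::StubLauncher::Launch' \
    -ex 'continue'
# Once the breakpoint fires, `bt` inside gdb prints the call stack shown above.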
In the StubLauncher::Launch function in stub_launcher.cc, a character array stub_args is created to hold the arguments for the execvp call. The concatenated string ss, parts of which originate from the request parameters, is passed as the bash -c argument, resulting in a command injection vulnerability.
const char* stub_args[4];
stub_args[0] = "bash";
stub_args[1] = "-c";
stub_args[3] = nullptr; // Last argument must be nullptr [0]
...
ss << "source " << path_to_activate_
<< " && exec env LD_LIBRARY_PATH=" << path_to_libpython_
<< ":$LD_LIBRARY_PATH " << python_backend_stub << " " << model_path_
<< " " << shm_region_name_ << " " << shm_default_byte_size_ << " "
<< shm_growth_byte_size_ << " " << parent_pid_ << " " << python_lib_
<< " " << ipc_control_handle_ << " " << stub_name << " "
<< runtime_modeldir_;
ipc_control_->uses_env = true;
bash_argument = ss.str(); //[1]
...
stub_args[2] = bash_argument.c_str(); //[2]
...
execvp("bash", (char**)stub_args); //[3]

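The root cause is that the entire concatenated string built at [1] is handed to an inner bash, which re-parses it. A minimal, standalone illustration of the same pattern (the model path value here is hypothetical and does not touch Triton at all):

MODEL_PATH='/tmp/models/evil$(id)/1/model.py'     # attacker-influenced value interpolated into the command string
bash -c "echo launching stub for $MODEL_PATH"     # the inner bash performs command substitution on $(id)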
We can exploit this vulnerability to achieve Remote Code Execution.
Patch Commit
Input validation has been added to the model load path (fix: Add input validation to model load by mattwittwer · Pull Request #404 · triton-inference-server/python_backend · GitHub).


