Hugging Face Model Security: Malicious Pickle Files Disguised with 7z Compression Bypass Picklescan Detection, Implanted in 200+ Development Environments

Hugging Face, the world's largest open-source model repository (over 4 million developers), is becoming a frontier for AI supply chain attacks. This incident exposes the inherent tension between open collaboration platforms and security.
Technical details of the attack:

Pickle file abuse: Python's standard-library pickle module is widely used for ML model serialization, but deserialization can execute arbitrary Python code (via the __reduce__ hook). PyTorch relies on pickle by default, making model files a perfect vehicle for hiding malicious code.
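The core risk is easy to demonstrate. A minimal sketch: any class can define __reduce__, and pickle calls the returned function during deserialization. Here print stands in for what would be an attacker's shell command.

```python
import pickle

class MaliciousPayload:
    def __reduce__(self):
        # A real attacker would return something like
        # (os.system, ("<reverse shell command>",));
        # print is a harmless stand-in for the side effect.
        return (print, ("arbitrary code ran during unpickling",))

blob = pickle.dumps(MaliciousPayload())
pickle.loads(blob)  # executes print() -- no model weights required
```

Nothing in the file needs to look like code to a casual inspection; the payload is just a reference to a callable plus its arguments.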

Picklescan bypass: Hugging Face uses the Picklescan tool, which detects dangerous imports (e.g., os.system, subprocess) via a blacklist. The attackers bypassed it in two ways:
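To see what a blacklist scanner is up against, here is a simplified sketch of the approach (an assumption about how such tools work, not Picklescan's actual code): walk the pickle opcode stream with the stdlib pickletools and flag imports from dangerous modules. Note that os.system may serialize under the platform alias posix or nt.

```python
import os
import pickle
import pickletools

# Simplified blacklist in the spirit of Picklescan.
DANGEROUS_MODULES = {"os", "posix", "nt", "subprocess"}

def scan_pickle(data: bytes) -> list[str]:
    """Flag GLOBAL imports from blacklisted modules (protocol <= 2 opcodes)."""
    findings = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":  # arg is "module name"
            module = arg.split(" ", 1)[0]
            if module in DANGEROUS_MODULES:
                findings.append(arg)
    return findings

class Evil:
    def __reduce__(self):
        return (os.system, ("echo pwned",))

# Scan only -- genops parses opcodes without executing them.
blob = pickle.dumps(Evil(), protocol=2)
print(scan_pickle(blob))  # e.g. ['posix system'] on Linux
```

This works only if the scanner can reach and parse the pickle stream at all, which is exactly what both bypasses below defeat.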

Compression format switch: the standard PyTorch ZIP container was replaced with a 7z archive, which Picklescan could not unpack, so the pickle inside was never scanned.
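A minimal defensive counterpart: identify the container by magic bytes rather than trusting the file extension. A standard torch.save file is a ZIP archive (header PK\x03\x04); the malicious uploads used a 7z archive (header 7z\xbc\xaf\x27\x1c), which a ZIP-only scanner never opens.

```python
# Map of leading magic bytes to container labels.
MAGIC = {
    b"PK\x03\x04": "zip (expected PyTorch container)",
    b"7z\xbc\xaf\x27\x1c": "7z (suspicious for a model file)",
}

def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes."""
    for magic, label in MAGIC.items():
        if header.startswith(magic):
            return label
    return "unknown"

with_7z = b"7z\xbc\xaf\x27\x1c" + b"\x00" * 10
print(sniff_container(with_7z))
```

In practice one would read the first few bytes of the model file and reject or quarantine anything that is not the expected ZIP container.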

Broken-file exploit: corrupting bytes at the end of the pickle stream causes Picklescan's validation to fail, so it waves the file through; the Python unpickler, however, executes opcodes sequentially, so the malicious code has already been triggered by the time the stream breaks.

Hidden payload: the malicious code sits at the beginning of the serialized stream, and the file "breaks" immediately after it executes, confusing detection tools.
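The ordering described above can be reproduced in a few lines (again with print standing in for the attacker's payload): truncating the stream's final STOP marker makes the file invalid to a validator, yet the unpickler still runs the payload before it hits the broken tail.

```python
import pickle

class Payload:
    def __reduce__(self):
        # Harmless stand-in for the attacker's reverse-shell call.
        return (print, ("payload ran before the stream broke",))

blob = pickle.dumps(Payload(), protocol=2)
corrupted = blob[:-1]  # drop the final STOP ('.') opcode

try:
    pickle.loads(corrupted)      # print() fires first...
except Exception as exc:         # ...then the truncated stream errors out
    print("unpickling failed:", type(exc).__name__)
```

A scanner that treats "fails to parse" as "nothing to report" therefore passes exactly the files that still execute code on load.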

Real-world samples:

Model repositories "glockr1/ballr7" and "who-r-u0000/0000000000000000000000000000000000000" contained reverse shells

Embedded IP address: 107.173.7.141 (C2 server)

Scope of impact: it is unknown how many developers downloaded these models and integrated them into production code

Defensive Tiers:

Platform level: Hugging Face upgraded Picklescan to unpack 7z archives and detect broken pickle files, and promoted SafeTensors as a safer alternative to pickle.

Developer level: avoid loading models from unknown sources, enable model signature verification, isolate the model-loading process, and scan the dependency chain for pickle files
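The last of these steps is easy to automate. A minimal sketch (the extension list follows common PyTorch conventions and is an assumption, not an exhaustive standard): enumerate pickle-bearing files in a project tree so they can be reviewed before anything loads them.

```python
from pathlib import Path

# Extensions that conventionally contain pickle data in ML projects.
PICKLE_EXTS = {".pkl", ".pickle", ".pt", ".pth", ".bin", ".ckpt"}

def find_pickle_files(root: str) -> list[Path]:
    """Recursively list files that may contain pickle streams."""
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in PICKLE_EXTS
    )
```

Running this against a vendored dependency tree or model cache gives an inventory to audit; each hit should then be opened only in a sandbox or converted to SafeTensors.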

Detection level: load models inside a runtime sandbox and monitor for network connections and process creation during loading
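Full syscall-level sandboxing is beyond a snippet, but a complementary load-time control is worth showing: an allowlisting Unpickler (the "restricting globals" pattern from the Python documentation, the inverse of Picklescan's blacklist), which refuses to resolve any global not explicitly approved. The SAFE_GLOBALS set here is an illustrative assumption.

```python
import io
import pickle

# Only these (module, name) pairs may be resolved during unpickling.
SAFE_GLOBALS = {("builtins", "list"), ("builtins", "dict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Evil:
    def __reduce__(self):
        # print stands in for a dangerous callable; it is not allowlisted.
        return (print, ("should never run",))

print(restricted_loads(pickle.dumps([1, 2, 3])))  # plain data loads fine
try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print(exc)
```

Unlike a blacklist, this fails closed: anything the defender did not anticipate is rejected rather than silently allowed.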

The takeaway: the price of democratizing AI is accumulating security debt. Every pickle file is a potential Trojan horse, and blacklist-based defenses are breaking down.
