Wed Nov 20 2024

Supply chain threats highlight vulnerabilities in LLMs and artificial intelligence.

The risks in the artificial intelligence supply chain highlight the vulnerabilities in open-source language models.

Concerns about risks in the software supply chain have grown recently, especially regarding open-source artificial intelligence (AI) software. Researchers have reported that Hugging Face, a platform similar to GitHub but dedicated to AI models, may host as many as one hundred potentially malicious models. The finding underscores the risks that can catch development teams off guard when they work with machine learning (ML) models, datasets, or demo applications.

Hugging Face is not the only source of risk. PyTorch, an open-source ML library originally developed by Facebook's AI research lab, faces similar concerns. Although it is widely used in deep learning and offers a flexible platform for building and deploying neural networks, its recent compromise raises questions about how far AI models from open-source sites can be trusted, amid fears that their contents may have been tampered with by malicious actors.

While the open-source community has fostered collaboration and the exchange of ideas, malicious actors who infiltrate the supply chain can jeopardize security by corrupting CI/CD pipelines or altering components that were assumed to be safe.

As organizations show interest in incorporating large language models (LLMs) into their portfolios, cybersecurity leaders must adapt to new supply chain security challenges, not only in the traditional DevSecOps domain but also in ML operations (MLOps) and LLM operations (LLMOps). It is essential for CISOs and security professionals to be vigilant in detecting malicious datasets and to react quickly to potential supply chain attacks.

The Open Worldwide Application Security Project (OWASP) has published a list of vulnerabilities specific to LLMs, stressing that these differ from traditional vulnerabilities and matter specifically in the context of AI development. Among the most common are prompt injections, which allow attackers to manipulate an LLM through carefully crafted inputs, potentially producing corrupted outputs and propagating insecure code to connected systems.
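One way to limit that propagation is to treat everything an LLM returns as untrusted input to the systems behind it. The following is a minimal sketch, not a complete defense: it assumes the model has been asked to reply with a small JSON object, and the action names, field names, and length limit are hypothetical values chosen for illustration.

```python
import json

# Hypothetical allowlist of actions the backend is willing to perform.
ALLOWED_ACTIONS = {"search_docs", "summarize"}
# Hypothetical upper bound on the size of the argument string.
MAX_ARGUMENT_LENGTH = 200


def parse_model_action(raw_output: str) -> dict:
    """Validate an LLM reply before any backend system acts on it.

    Raises ValueError for anything outside the expected shape.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc

    # Require exactly the two expected fields, nothing more and nothing less.
    if not isinstance(data, dict) or set(data) != {"action", "argument"}:
        raise ValueError("unexpected fields in model output")

    action, argument = data["action"], data["argument"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    if not isinstance(argument, str) or len(argument) > MAX_ARGUMENT_LENGTH:
        raise ValueError("argument must be a short string")

    return {"action": action, "argument": argument}
```

A reply that was steered by a crafted input into requesting an unlisted action is rejected here instead of reaching the backend. The check narrows what a manipulated model can trigger; it does not detect the manipulation itself.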

Another issue is the tendency of LLMs to "hallucinate," generating inaccurate or misleading information. Malicious actors could exploit these hallucinations to introduce insecure code, creating new types of supply chain attack. Input manipulation may also produce harmful outputs that evade detection, with far-reaching effects on the software development lifecycle.

To address these challenges, it is essential to adopt a Zero Trust approach, which includes filtering the data that flows from LLMs to users and backend systems. It is also important to use cryptographic techniques such as digital signatures, which verify the authenticity and integrity of models. This is especially important in environments where models are distributed via cloud services.
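The integrity half of that check can be sketched with Python's standard library alone. The example below is a simplified stand-in, not a digital signature: it compares a downloaded model file against a SHA-256 digest pinned in advance, so it detects tampering but proves nothing about who published the file. A real signature scheme would add asymmetric keys on top. The function names and the idea of a separately stored digest are assumptions made for illustration.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in fixed-size chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: Path, expected_sha256: str) -> bool:
    """Compare the file's digest with a pinned value in constant time."""
    actual = sha256_of_file(path)
    return hmac.compare_digest(actual, expected_sha256.lower())
```

In use, a deployment step would call `verify_model_file(model_path, pinned_digest)` and refuse to load the model when it returns `False`. The pinned digest has to come from a channel the attacker cannot also modify, otherwise the check adds little.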

The use of watermarks within models also offers an additional layer of protection, allowing for the tracking of ownership even if the model is duplicated or stolen. Tools like Model Cards and the Software Bill of Materials (SBOM) are useful for increasing transparency over complex software systems, helping to identify vulnerabilities and better understand software composition.

As innovations like ML-BOM (Machine Learning Bill of Materials) gain relevance, they provide clear visibility into the components and processes of ML development and deployment. This is vital for identifying and addressing vulnerabilities and for ensuring compliance with transparency and governance regulations.
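To make the idea concrete, here is a minimal sketch of the kind of inventory an ML-BOM records. The schema is hypothetical and deliberately small; it does not follow any published ML-BOM standard, and every name, version, and source in the example is a placeholder.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Component:
    """One item in the bill of materials (hypothetical schema)."""
    kind: str     # e.g. "model", "dataset", "library"
    name: str
    version: str
    source: str   # where the component was obtained
    sha256: str   # digest recorded at intake, empty if none was recorded


@dataclass
class MLBom:
    """Inventory of everything that went into one deployed model."""
    model_name: str
    components: list[Component] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the inventory so it can be stored or audited."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def unpinned(self) -> list[str]:
        """Names of components recorded without a digest."""
        return [c.name for c in self.components if not c.sha256]


if __name__ == "__main__":
    # Placeholder entries; the digest is a dummy value, not a real hash.
    bom = MLBom(
        model_name="example-classifier",
        components=[
            Component("model", "example-base-model", "1.0",
                      "internal-registry", "0" * 64),
            Component("dataset", "example-training-set", "2024-01",
                      "public-hub", ""),
        ],
    )
    print(bom.to_json())
    print("Components without a digest:", bom.unpinned())
```

Even a record this small answers the questions an auditor asks first: which components went into the model, where they came from, and which of them were never verified.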

Finally, extending DevSecOps strategies to LLMOps includes selecting appropriate models, cleaning training data, and establishing adequate monitoring in systems. All these measures constitute a framework for ensuring the secure development of LLMs while fostering creativity and innovation in the use of AI under a Zero Trust approach.