# Understanding the Risks of Using AI in Code Generation
Written on
Chapter 1: The Role of AI in Software Development
Recently, I engaged in a discussion with a friend about the utilization of AI models like GPT for software development. He is a startup founder who immerses himself in the intricacies of his product and leverages GPT to grasp new technologies and swiftly implement features in unfamiliar programming languages.
I, too, rely on Copilot to assist me in developing software using new languages and to help me get acquainted with various APIs. These LLM (Large Language Model) tools significantly reduce the time needed to acquire new programming skills. Engineers are increasingly using solutions such as Copilot and ChatGPT to produce code that meets production standards.
However, I wonder if you share my experience: the suggestions I receive from Copilot and the scripts generated by GPT are frequently inaccurate. I often encounter inefficient solutions, code with peculiar logic flaws, or snippets that appear correct but fail to function as intended. While I haven’t encountered outright vulnerable code from Copilot, it’s not unreasonable to suspect that such risks may exist.
Section 1.1: How LLM Code Generation Functions
The process of code completion using LLMs resembles that of text completion. The model is trained on sample code collected from diverse sources, predicting the next most probable token or symbol based on the current context. For instance, Copilot is trained on a substantial amount of public code from GitHub. If you are trying to implement a graph cycle-detection algorithm, Copilot will reference similar examples from its training data and provide suggestions tailored to your existing code.
Subsection 1.1.1: The Challenge of Insecure Code Generation
The crux of the issue is that there is no assurance that the generated code is secure. The sources of code include public repositories, blogs, and StackOverflow, which may contain vulnerabilities and outdated best practices. Relying on these inferior or outdated examples could lead the model to produce ineffective or insecure code. From my own experience with Copilot and GPT, these models are currently more often incorrect than accurate.
Even with high-quality training data, coding best practices can quickly become obsolete. Once secure coding guidelines, dependencies, and methodologies may be found vulnerable as new research surfaces. Code generation tools can mislead users by presenting outdated practices as secure.
Without adequate security oversight, this raises the risk of common software vulnerabilities. Some potential vulnerabilities due to insecure coding practices include SQL or command injections, XSS, and CSRF.
Implementing human oversight—having engineers thoroughly review AI-generated code for correctness, efficiency, and security—can help mitigate these risks. Historically, engineers invested significant time understanding and adapting code from platforms like StackOverflow to align with their specific needs. This level of scrutiny may diminish when relying on AI programming tools.
Section 1.2: The Risk of Code Poisoning in AI Training
A further concern is the potential for attackers to "poison" training data by marking deliberately vulnerable code in blog posts and open-source repositories as secure. If this occurs on a large scale, it can skew AI models into accepting these as secure examples—leading to the inclusion of insecure practices in code, or misguiding those unfamiliar with security requirements of specific technologies into believing they are best practices. Attackers might even insert intentional vulnerabilities into online code samples to embed backdoors in applications.