SecureGPT: Vulnerability Scanner for ChatGPT

Example output on the ChatGPT frontend.

Here's the problem...

Picture this. Your software team is trying to streamline their development process by using AI code generation tools, like the ChatGPT we're all so familiar with now. A few weeks later, you've got a slew of new reports of critical security bugs. As it turns out, ChatGPT-generated code suffers from a lot of common security vulnerabilities [1][2][3]. Though these vulnerabilities can be mitigated through corrective prompting, many developers may not notice these code vulnerabilities until it's too late. While both static and dynamic vulnerability analysis tools exist for device code, what if you could check for code vulnerabilities long before integrating GPT-generated code into a larger project?

Our Solution

We developed a prototype of a browser extension, written in JavaScript and compatible with browsers that support Manifest V2, that interacts with the chat.openai.com frontend to grab code blocks generated by ChatGPT and run them through a static vulnerability scanner. The results of the scan are consolidated into a letter grade (A through F) for the code's overall security, along with the details of the specific vulnerabilities found. We display the results of the scan after the GPT code block, so developers can easily compare the results with the code. Currently, this extension only works for generated JavaScript code snippets, but we plan on extending compatibility to other programming languages.

How does it work?

Monitor chat.openai.com for generated code blocks.
Capture any generated code blocks.
Send each code block to the vulnerability scanner (WIP).
Consolidate a report on any returned vulnerabilities.
Display a letter grade (A through F) for the generated code's overall security, along with details on specific vulnerabilities.
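
The last two steps can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual grading logic: the function name `gradeFindings`, the penalty weights, and the grade thresholds are all hypothetical. It assumes findings shaped like ESLint lint messages, where `severity` is 1 for a warning and 2 for an error, and it assumes the scale skips E (A, B, C, D, F).

```javascript
// Hypothetical thresholds: the highest penalty score still allowed for each grade.
const GRADE_THRESHOLDS = [
  { maxPenalty: 0, grade: "A" },
  { maxPenalty: 2, grade: "B" },
  { maxPenalty: 5, grade: "C" },
  { maxPenalty: 9, grade: "D" },
];

// Consolidate scanner findings into a letter grade plus readable details.
// Each finding is assumed to look like an ESLint message:
// { ruleId, severity (1 = warning, 2 = error), message, line }.
function gradeFindings(findings) {
  // Hypothetical weighting: an error costs 3 points, a warning costs 1.
  const penalty = findings.reduce(
    (sum, finding) => sum + (finding.severity === 2 ? 3 : 1),
    0
  );

  // Pick the first band whose limit is not exceeded; anything above is an F.
  const band = GRADE_THRESHOLDS.find((t) => penalty <= t.maxPenalty);

  return {
    grade: band ? band.grade : "F",
    penalty,
    details: findings.map(
      (f) => `${f.ruleId} (line ${f.line}): ${f.message}`
    ),
  };
}
```

With this weighting, a snippet with no findings gets an A, while a single error-level finding already drops it to a C. The real cutoffs would need tuning against actual scan results.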

Challenges

The main initial challenge was identifying static code analysis libraries. Several exist, but they differ in their capabilities and in the range of vulnerabilities they look for. Additionally, several of these libraries only review a single programming language. For this initial proof of concept, we are using ESLint to analyze JavaScript code.
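
To give a sense of what a security-oriented ESLint setup might look like, here is a small configuration sketch. The rule names are ESLint core rules, but this particular selection is an assumption made for illustration. The variable names `securityRules` and `eslintConfig` are hypothetical, and this is not necessarily the rule set the prototype enables.

```javascript
// Hypothetical rule selection: each name is an ESLint core rule that flags
// a pattern commonly associated with code-injection risks.
const securityRules = {
  "no-eval": "error",         // direct eval() of strings
  "no-implied-eval": "error", // e.g. setTimeout("code as a string", 100)
  "no-new-func": "error",     // new Function("...") built from strings
  "no-script-url": "error",   // "javascript:" URLs
};

// Flat-config style entry applying the rules to JavaScript files.
const eslintConfig = [
  {
    files: ["**/*.js"],
    rules: securityRules,
  },
];
```

In an `eslint.config.js` file, this array would be the exported value. Core rules cover only a narrow slice of security issues, so a fuller scanner would likely need additional plugins or tools.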

Accomplishments

Finding a way to extract the embedded code blocks and convert them into a form the static code analyzer can consume was a big accomplishment. Because ChatGPT pushes text updates to the client side token by token, we had to experiment with sleep times to find a wait that doesn't take too long to load but is still long enough for the full code block to be printed. A 10-second wait accomplishes this for most scripts. To improve on this, we could look further into the front-end scripts to see whether any events fire when GPT finishes printing client-side text, rather than when it starts.
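
One possible alternative to a fixed sleep, sketched here under assumptions, is to treat a code block as finished once it has stopped changing for a short quiet period. This is not what the prototype currently does. The helper names `createQuietWatcher` and `watchCodeBlock` and the 1500 ms default are hypothetical. `MutationObserver` is the standard browser API for observing DOM changes.

```javascript
// Hypothetical helper: calls onQuiet once no update has arrived for quietMs.
// The timers argument exists so the logic can be exercised without real delays.
function createQuietWatcher(onQuiet, quietMs, timers) {
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  };
  let handle = null;

  return {
    // Call this on every observed change; it restarts the quiet-period timer.
    touch() {
      if (handle !== null) t.clear(handle);
      handle = t.set(() => {
        handle = null;
        onQuiet();
      }, quietMs);
    },
    // Stop waiting without firing onQuiet.
    cancel() {
      if (handle !== null) t.clear(handle);
      handle = null;
    },
  };
}

// Browser-only wiring (assumption: element is the node ChatGPT streams code into).
function watchCodeBlock(element, onComplete, quietMs = 1500) {
  const observer = new MutationObserver(() => watcher.touch());
  const watcher = createQuietWatcher(() => {
    observer.disconnect();
    onComplete(element.textContent);
  }, quietMs);

  observer.observe(element, {
    childList: true,
    subtree: true,
    characterData: true,
  });
  watcher.touch(); // start the timer in case no further mutation arrives
  return observer;
}
```

A quiet period scales with the length of the response: short snippets are scanned soon after they finish, and long ones are not cut off. The trade-off is that a stall in streaming longer than the quiet period would trigger an early scan.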

What we learned

The page structure of chat.openai.com makes code blocks very easy to identify, and each code block is labeled with its programming language. This was an extremely helpful find. We learned a bit more about HTML traversal and content scraping, as well as how to connect a browser-side JS extension to another client tool (the vulnerability scanner).
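
The kind of traversal described above might look like the following sketch. The `pre code` selector and the `language-...` class naming are assumptions about the page markup, which may differ or change over time. The function names `languageFromClassName` and `collectCodeBlocks` are hypothetical.

```javascript
// Extract a language label from a class string such as "hljs language-javascript".
// Returns null when no language-* class is present.
function languageFromClassName(className) {
  const match = /(?:^|\s)language-([\w+#-]+)/.exec(className || "");
  return match ? match[1].toLowerCase() : null;
}

// Collect code blocks under root (e.g. document), optionally keeping only
// one language. Assumes each block is rendered as <pre><code class="...">.
function collectCodeBlocks(root, onlyLanguage = null) {
  return Array.from(root.querySelectorAll("pre code"))
    .map((node) => ({
      language: languageFromClassName(node.className),
      code: node.textContent,
    }))
    .filter(
      (block) => onlyLanguage === null || block.language === onlyLanguage
    );
}
```

In a content script, calling `collectCodeBlocks(document, "javascript")` would return only the snippets the current scanner can handle, which matches the JavaScript-only scope of the prototype.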

What's next?

Extend the grader to incorporate more or different static analysis libraries to support additional languages.
Make wait times for GPT text dynamic on the client side so that long code blocks can be fully captured.

Citations

[1] R. Khoury, A. R. Avila, J. Brunelle and B. M. Camara, "How Secure is Code Generated by ChatGPT?," 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, Oahu, HI, USA, 2023, pp. 2445-2451, doi: 10.1109/SMC53992.2023.10394237.

[2] S. Hamer, M. d'Amorim, and L. Williams, "Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers," IEEE Symposium on Security and Privacy Workshops (SPW), 2024, doi: 10.48550/arXiv.2403.15600.

[3] Z. Liu, Y. Tang, X. Luo, Y. Zhou and L. F. Zhang, "No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT," IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2024.3392499.