EUT03: Unit Testing to Evaluate LLM-Generated Code
Important!
Please read this entire document carefully before starting the assignment. Understanding all requirements and the workflow is essential for completing this exercise successfully.
Assignment Context
This assignment is part of the Unit Testing Module, where you will apply unit testing methodologies to evaluate AI-generated code. You will use Jest and TypeScript to write comprehensive test suites that verify the correctness of LLM-generated implementations.
Purpose
This assignment challenges you to critically examine and find faults or issues with code generated by Large Language Models (LLMs) such as ChatGPT and Copilot. Rather than simply accepting AI-generated code, you are encouraged to question its correctness, assess the extent to which it adheres to the requirements, identify its limitations, and reflect on how these tools impact software engineering practice. You will experience both the strengths and weaknesses of LLMs and gain insight into the importance of critical evaluation when using AI for code generation.
By completing this assignment, you will be in a better position to:
- Apply unit testing principles to verify code correctness.
- Recognize how LLMs can assist in code generation.
- Notice common pitfalls, errors, or oversights in LLM-generated code.
- Explore strategies for testing, validating, and improving AI-generated code.
- Reflect on the pros and cons of using LLMs in real-world software engineering.
Objective
This assignment is not just about writing correct code. It is about evaluating the output of a Large Language Model (LLM) through the lens of unit testing, understanding its strengths and limitations, and learning how to critically assess and improve AI-generated code using comprehensive test coverage.
In this exercise, you will:
- Use an LLM (e.g., ChatGPT, Copilot) to generate your implementation of the email validation function.
- Write your own test cases to verify the correctness of your implementation.
- After you have completed your implementation and written your own tests, download and run the instructor’s test cases to:
  - Identify test cases you may have missed
  - Discover edge cases you did not consider
  - Evaluate the robustness of the LLM-generated code
Getting Started: A zip file containing skeleton code is provided to help you get started. Extract the contents and follow the setup instructions below.
Setup
- Download the provided skeleton code from here
- Navigate to the project directory
- Install dependencies:
npm install
- Verify setup by running:
npm test
Requirements
Implementation
- Create a function isValidEmail(email: string): boolean in src/isValidEmail.ts.
- It should return true if the input is a valid email address, false otherwise.
- Do not use any third-party libraries.
- You must use an LLM (e.g., ChatGPT, Copilot) for your implementation.
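For orientation, here is a minimal sketch of the shape such a function often takes when generated by an LLM: a single regular-expression check. This is a hypothetical illustration, not a reference solution; part of the assignment is discovering where a naive implementation like this falls short.

```typescript
// Hypothetical sketch of a naive, LLM-style implementation: one regex check.
// It accepts a single local part, one "@", and a domain containing at least
// one dot, rejecting whitespace and extra "@" signs throughout.
export function isValidEmail(email: string): boolean {
  const pattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
  return pattern.test(email);
}
```

A sketch like this passes obvious cases (e.g. user@example.com) but typically mishandles subtler inputs, such as consecutive dots in the local part or overlong addresses — exactly the gaps your test cases should probe.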
Unit Tests
- The file tests/student_isValidEmail.test.ts is provided in the skeleton code for your test cases.
- Write a comprehensive set of test cases using Jest to verify your implementation. You must design these test cases yourself, based on your understanding of the requirements.
- Apply unit testing best practices: test edge cases, boundary conditions, and valid/invalid inputs.
- Do not use an LLM to generate the test cases.
- From the project root, run your test cases:
npm test tests/student_isValidEmail.test.ts
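One way to design such tests is table-driven: enumerate input categories with expected results, then assert each. The sketch below is framework-free so it stands on its own; in tests/student_isValidEmail.test.ts each row would become a Jest assertion, e.g. expect(isValidEmail(row.input)).toBe(row.expected). The placeholder validator is a hypothetical stand-in — your real tests import the LLM-generated function from ../src/isValidEmail instead.

```typescript
// Placeholder validator so this sketch runs on its own; replace with an
// import of your LLM-generated function in the real Jest file.
function isValidEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

interface Case { input: string; expected: boolean; why: string }

// A few of the categories worth covering: typical valid input, empty-string
// boundary, structural violations, and whitespace handling.
const cases: Case[] = [
  { input: "user@example.com",  expected: true,  why: "typical valid address" },
  { input: "",                  expected: false, why: "empty-string boundary" },
  { input: "no-at-sign.com",    expected: false, why: "missing @" },
  { input: "a@b@c.com",         expected: false, why: "multiple @ signs" },
  { input: " user@example.com", expected: false, why: "leading whitespace" },
];

// Report any rows where the validator disagrees with the expectation.
const failures = cases.filter(c => isValidEmail(c.input) !== c.expected);
console.log(failures.length === 0 ? "all cases pass" : failures.map(c => c.why));
```

This is deliberately a starting point, not a complete suite: a comprehensive set would also cover boundary conditions such as very long inputs, unusual but valid characters, and domains with multiple labels.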
Validating with Instructor Tests
- To help you evaluate the LLM-generated code and identify its shortcomings, we have provided a set of instructor test cases.
- Download the instructor test file:
  - Download instructor_isValidEmail.test.ts from here
  - Place it in the tests/ directory of your project
- Do not download or look at the instructor test file until you have completed your own tests and implementation.
- From the project root, after placing the instructor test file in the tests/ directory, run the instructor test cases:
npm test tests/instructor_isValidEmail.test.ts
- Important: Assume all instructor test cases are valid and correct. If any fail, the issue is with your LLM-generated implementation.
- Use failing instructor tests to:
  - Identify edge cases or scenarios you missed in your own tests (for reflection purposes)
  - Fix the LLM-generated implementation to handle these cases correctly
  - Document the shortcomings and limitations of the LLM’s initial output
  - Understand where the LLM misunderstood or oversimplified the requirements
Note: Do not update your student test cases after seeing the instructor tests. The goal is to understand and fix the LLM implementation, not to retrofit your tests.
Process Documentation
Include a PDF file named llm_notes.pdf that contains:
Prompts Used
- The prompt(s) you used with the LLM.
Analysis of LLM Output
- Which requirements were fulfilled?
- Which were missed, misunderstood, or incorrectly implemented?
Debugging and Improvements
- How you debugged or improved the LLM-generated implementation.
- How you used the instructor test cases to identify shortcomings in the LLM-generated code.
- What scenarios did the instructor tests cover that you missed in your own tests? (Reflection only - do not update your student tests)
- What specific changes did you make to the LLM code to pass the instructor tests?
- What does this reveal about the limitations of LLM-generated code?
(Optional) Linting Notes
- What issues did the linter find?
- Which ones did you fix vs. keep (and why)?
- To what extent did you use an LLM to fix the linting issues, and how?
- Did your code have more linting issues than the LLM-generated code?
- Did the linter catch anything your tests did not?
Reflections
- How useful was the LLM?
- What did the LLM get right? What did it miss?
- What did it assume or overlook?
- How did you spot the missing or incorrect parts?
- How would/did you rewrite your prompt to get better output?
- How did your understanding of the requirements influence the prompts or corrections?
Submission
Upload the following files to Lamaku:
- src/isValidEmail.ts - Your LLM-generated implementation
- tests/student_isValidEmail.test.ts - Your own test cases
- llm_notes.pdf - Documentation and reflection
Note: Remember to download instructor_isValidEmail.test.ts from the link above and place it in your tests/ directory only after you have completed your implementation and written your own tests. These tests serve as a validation tool to help you discover what you or the LLM may have missed.