Prompt Engineering for Code Analysis: Lessons from Snapfix
How I iterated on LLM prompts to get consistent, accurate error analysis across GPT-4, Claude, and CodeLlama — the patterns that work and the ones that don't.
The Challenge
When you send a Python traceback to an LLM, the response varies widely with the model, the prompt structure, and the amount of context you provide. For Snapfix to be reliable, I needed consistent, accurate analysis across three different AI providers.
What Doesn't Work
1. Raw Traceback Dumping
```
Analyze this error: [paste entire traceback]
```
This gives you a vague explanation and a generic fix. Not useful.
2. Asking for 'The Answer'
```
What's wrong with this code and how do I fix it?
```
Too open-ended. The model doesn't know what level of detail you want.
What Works
Structured Prompts with Explicit Sections
```python
SYSTEM_PROMPT = """
You are a Python debugging expert. Analyze the error and respond in EXACTLY this format:

## Error Type
[One line: the exception class and what it means]

## Root Cause
[2-3 sentences explaining WHY this error occurred]

## Fix
[The specific code change needed, with before/after]

## Prevention
[One sentence on how to avoid this in the future]
"""
```
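A fixed format is only useful if you check that responses follow it. The block below is a minimal sketch of such a check, not Snapfix's actual code: `REQUIRED_SECTIONS`, `parse_sections`, and `missing_sections` are hypothetical names, and the parsing assumes each section starts with a `## ` heading on its own line, as the prompt above requests.

```python
import re

# Headings copied from the system prompt's required format.
REQUIRED_SECTIONS = ["Error Type", "Root Cause", "Fix", "Prevention"]


def parse_sections(response: str) -> dict[str, str]:
    """Split a model response into {heading: body} on '## ' heading lines."""
    sections: dict[str, str] = {}
    current = None
    for line in response.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # Start a new section; later lines accumulate under it.
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def missing_sections(response: str) -> list[str]:
    """Return required headings that are absent or empty, in prompt order."""
    sections = parse_sections(response)
    return [name for name in REQUIRED_SECTIONS if not sections.get(name)]
```

An empty list from `missing_sections` means the response has all four sections with some content. One known limitation of this sketch: a line starting with `## ` inside a code snippet in the Fix section would be mistaken for a heading.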
Context Extraction
Don't send the entire traceback. Parse it first:
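```python
def extract_context(tb_text: str) -> dict:
    """Reduce a raw traceback to the fields the prompt actually needs."""
    # Parameter is named tb_text to avoid shadowing the stdlib `traceback` module.
    # The parse_* helpers and get_surrounding_lines are not shown here.
    return {
        'error_type': parse_exception_class(tb_text),
        'error_message': parse_message(tb_text),
        'file_path': parse_file(tb_text),
        'line_number': parse_line(tb_text),
        'code_snippet': get_surrounding_lines(tb_text, context=5),
    }
```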
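The helpers called in `extract_context` are not shown in this post. As a rough illustration of what they might look like, here is a sketch that parses a standard CPython traceback with regular expressions. The function names match the calls above, but the bodies, the `_last_frame` and `_exception_line` helpers, and the two regexes are my assumptions for this example. It assumes the exception is on the last non-empty line and that the innermost frame is the last `File "...", line N` entry.

```python
import linecache
import re

# Matches frame lines such as:  File "app.py", line 12, in main
FRAME_RE = re.compile(r'^\s*File "(?P<path>.+?)", line (?P<line>\d+)', re.MULTILINE)
# Matches a final line such as:  KeyError: 'user_id'
EXC_RE = re.compile(r"^(?P<cls>[A-Za-z_][\w.]*)(?::\s*(?P<msg>.*))?$")


def _last_frame(tb_text):
    """Return (path, line_number) for the innermost frame, or (None, None)."""
    frames = FRAME_RE.findall(tb_text)
    if not frames:
        return None, None
    path, line = frames[-1]
    return path, int(line)


def _exception_line(tb_text):
    """Return (class_name, message) parsed from the last non-empty line."""
    lines = [ln for ln in tb_text.strip().splitlines() if ln.strip()]
    match = EXC_RE.match(lines[-1].strip()) if lines else None
    if not match:
        return None, None
    return match.group("cls"), match.group("msg") or ""


def parse_exception_class(tb_text):
    return _exception_line(tb_text)[0]


def parse_message(tb_text):
    return _exception_line(tb_text)[1]


def parse_file(tb_text):
    return _last_frame(tb_text)[0]


def parse_line(tb_text):
    return _last_frame(tb_text)[1]


def get_surrounding_lines(tb_text, context=5):
    """Read `context` lines either side of the failing line from disk."""
    path, line_number = _last_frame(tb_text)
    if path is None:
        return ""
    start = max(1, line_number - context)
    # linecache.getline returns '' for unreadable files or out-of-range lines.
    lines = [linecache.getline(path, n) for n in range(start, line_number + context + 1)]
    return "".join(lines)
```

This sketch will not handle exceptions whose message spans several lines (for example `SyntaxError` output), and `get_surrounding_lines` only works when the source file is readable from the machine doing the parsing.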
Model-Specific Adjustments
Results
After iterating on prompts for 3 weeks:
Key Takeaway
Prompt engineering is software engineering. Version control your prompts, measure their performance, and iterate based on data — not vibes.
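To make "version and measure" concrete, here is a small sketch of one way to track a prompt revision alongside its results. `PromptVersion` is a hypothetical class for illustration, not part of Snapfix, and what counts as a passing response (for example, one that follows the required format) is left to the caller.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PromptVersion:
    """One tracked revision of a prompt plus its pass/fail tallies."""
    name: str
    text: str
    passed: int = 0
    failed: int = 0

    @property
    def fingerprint(self) -> str:
        # Short content hash: changes whenever the prompt text changes.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]

    def record(self, ok: bool) -> None:
        """Tally one evaluated response as a pass or a fail."""
        if ok:
            self.passed += 1
        else:
            self.failed += 1

    @property
    def pass_rate(self) -> float:
        """Fraction of recorded responses that passed (0.0 if none recorded)."""
        total = self.passed + self.failed
        return self.passed / total if total else 0.0
```

Logging the fingerprint next to each pass rate ties every measurement to the exact prompt text that produced it, so two revisions can be compared on the same set of tracebacks.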
Written by Ansh Gautam
Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.