AI Engineering · 2 min read · Jun 25, 2026

Prompt Engineering for Code Analysis: Lessons from Snapfix

How I iterated on LLM prompts to get consistent, accurate error analysis across GPT-4, Claude, and CodeLlama — the patterns that work and the ones that don't.

Tags: AI, LLM, Prompt Engineering, Python, OpenAI, Claude

The Challenge

When you send a Python traceback to an LLM, the response varies wildly with the model, the prompt structure, and the amount of context you provide. For Snapfix to be reliable, I needed consistent, accurate analysis across three AI providers.

What Doesn't Work

1. Raw Traceback Dumping

Analyze this error: [paste entire traceback]

This gives you a vague explanation and a generic fix. Not useful.

2. Asking for 'The Answer'

What's wrong with this code and how do I fix it?

Too open-ended. The model doesn't know what level of detail you want.

What Works

Structured Prompts with Explicit Sections

python
SYSTEM_PROMPT = """
You are a Python debugging expert. Analyze the error and respond in EXACTLY this format:

## Error Type
[One line: the exception class and what it means]

## Root Cause
[2-3 sentences explaining WHY this error occurred]

## Fix
[The specific code change needed, with before/after]

## Prevention
[One sentence on how to avoid this in the future]
"""
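A fixed format also makes responses machine-checkable. The sketch below shows one way to split a response into its sections and confirm that the expected headings are present and in order. The names `parse_sections` and `follows_format` are hypothetical, not part of Snapfix, and the parser assumes every line starting with `## ` is a section heading, which could misfire if the model puts such a line inside a code sample.

```python
import re

# Section names taken from the prompt format above.
EXPECTED_SECTIONS = ["Error Type", "Root Cause", "Fix", "Prevention"]


def parse_sections(response: str) -> dict[str, str]:
    """Split a model response into {heading: body} using '## ' headings."""
    sections: dict[str, str] = {}
    current = None
    for line in response.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            # Start a new section; text before the first heading is ignored.
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def follows_format(response: str) -> bool:
    """True if the response has exactly the expected headings, in order."""
    return list(parse_sections(response)) == EXPECTED_SECTIONS
```

A response that fails `follows_format` can be retried or flagged rather than shown to the user as-is.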

Context Extraction

Don't send the entire traceback. Parse it first:

python
def extract_context(traceback: str) -> dict:
    """Reduce a raw traceback string to the fields the prompt needs."""
    # The parse_* helpers and get_surrounding_lines are not shown here.
    return {
        'error_type': parse_exception_class(traceback),
        'error_message': parse_message(traceback),
        'file_path': parse_file(traceback),
        'line_number': parse_line(traceback),
        'code_snippet': get_surrounding_lines(traceback, context=5),
    }
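The helper functions are not shown in the snippet above. As a rough illustration of what the parsing step might involve, here is a minimal sketch that pulls the same fields out of a standard CPython traceback with a regular expression. `parse_traceback` and `FRAME_RE` are hypothetical names, and the code assumes the last non-empty line has the form `ExceptionClass: message`, which does not hold for every traceback (exception groups and attached notes, for example). Code snippet extraction is left out because it needs access to the source file.

```python
import re

# Matches frame lines such as:  File "app.py", line 12, in main
FRAME_RE = re.compile(r'^\s*File "(?P<path>.+?)", line (?P<line>\d+)', re.MULTILINE)


def parse_traceback(tb_text: str) -> dict:
    """Extract the exception class, message, and innermost frame from a traceback."""
    frames = FRAME_RE.findall(tb_text)  # list of (path, line) tuples
    if frames:
        # The last frame is where the exception was actually raised.
        file_path, line_number = frames[-1][0], int(frames[-1][1])
    else:
        file_path, line_number = None, None

    # Assumption: the last non-empty line is "ExceptionClass: message".
    lines = [ln for ln in tb_text.strip().splitlines() if ln.strip()]
    last_line = lines[-1].strip() if lines else ""
    error_type, _, error_message = last_line.partition(": ")

    return {
        "error_type": error_type,
        "error_message": error_message,
        "file_path": file_path,
        "line_number": line_number,
    }
```

Sending only these fields, plus a few surrounding source lines, keeps the prompt short and focused on the frame that matters.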

Model-Specific Adjustments

GPT-4: Handles structured output well. Use explicit format instructions.
Claude: Better with natural language instructions. Give it context about the user's skill level.
CodeLlama: Needs simpler prompts. Don't ask for multiple sections — ask one question at a time.
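These adjustments can be captured in a small dispatch function. The following is only a sketch: `build_prompts`, the model family labels, and the prompt wording are illustrative assumptions, not Snapfix's actual prompts or official model identifiers. It returns a list so the one-question-at-a-time case fits the same interface.

```python
# Single-topic questions for models that do better with one question per request.
QUESTIONS = [
    "What type of error is this and what does it mean?",
    "Why did this error occur?",
    "What code change fixes it?",
    "How can it be avoided in the future?",
]


def build_prompts(model_family: str, error_summary: str,
                  skill_level: str = "intermediate") -> list[str]:
    """Return the prompt(s) to send for one error, adapted to the model family."""
    if model_family == "gpt-4":
        # Explicit format instructions.
        return [
            "Respond in EXACTLY this format: ## Error Type, ## Root Cause, "
            "## Fix, ## Prevention.\n\n" + error_summary
        ]
    if model_family == "claude":
        # Natural-language instructions plus context about the user.
        return [
            f"The user is a {skill_level} Python developer. Please explain what "
            "went wrong, why it happened, how to fix it, and how to avoid it "
            f"next time.\n\n{error_summary}"
        ]
    if model_family == "codellama":
        # One simple question per request.
        return [f"{question}\n\n{error_summary}" for question in QUESTIONS]
    raise ValueError(f"Unknown model family: {model_family}")
```

Keeping the per-model differences in one place makes it easier to see which wording was tried for which model.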

Results

After iterating on prompts for 3 weeks:

Fix accuracy improved from 62% to 89% (measured by manual review)
Response format consistency went from 45% to 97%
Average response time decreased by 30% (fewer tokens means faster responses)

Key Takeaway

Prompt engineering is software engineering. Version control your prompts, measure their performance, and iterate based on data — not vibes.
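As one possible way to put this into practice, the sketch below ties each prompt to a content hash and records pass/fail results against it, so a score can always be traced to the exact prompt text that produced it. `PromptVersion` is a hypothetical class for illustration; the article does not describe the tooling actually used.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    """A prompt plus the evaluation results recorded against it."""
    name: str
    text: str
    results: list[bool] = field(default_factory=list)

    @property
    def version_id(self) -> str:
        # Short content hash: any edit to the prompt text yields a new ID.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:12]

    def record(self, passed: bool) -> None:
        """Store the outcome of one reviewed response."""
        self.results.append(passed)

    @property
    def pass_rate(self) -> float:
        # Fraction of recorded responses that passed; 0.0 if none recorded yet.
        return sum(self.results) / len(self.results) if self.results else 0.0
```

Comparing `pass_rate` between two `version_id` values on the same set of tracebacks gives a data-backed reason to keep or revert a prompt change.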


Written by Ansh Gautam

Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.