AI Engineering · 2 min read · Jun 25, 2026

Prompt Engineering for Code Analysis: Lessons from Snapfix

How I iterated on LLM prompts to get consistent, accurate error analysis across GPT-4, Claude, and CodeLlama — the patterns that work and the ones that don't.

AI · LLM · Prompt Engineering · Python · OpenAI · Claude

The Challenge

When you send a Python traceback to an LLM, the response varies widely with the model, the prompt structure, and how much context you provide. For Snapfix to be reliable, I needed consistent, accurate analysis across three AI providers.

What Doesn't Work

1. Raw Traceback Dumping

Analyze this error: [paste entire traceback]

This gives you a vague explanation and a generic fix, neither of which helps you resolve the actual error.

2. Asking for 'The Answer'

What's wrong with this code and how do I fix it?

Too open-ended. The model doesn't know what level of detail you want.

What Works

Structured Prompts with Explicit Sections

python
SYSTEM_PROMPT = """
You are a Python debugging expert. Analyze the error and respond in EXACTLY this format:

## Error Type
[One line: the exception class and what it means]

## Root Cause
[2-3 sentences explaining WHY this error occurred]

## Fix
[The specific code change needed, with before/after]

## Prevention
[One sentence on how to avoid this in the future]
"""
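A fixed format is only useful if you check that responses follow it. The sketch below is one way to do that, assuming the model echoes the four `## ` headings verbatim; `REQUIRED_SECTIONS`, `parse_sections`, and `follows_format` are illustrative names, not part of Snapfix.

```python
import re

# Headings the system prompt asks for, in the order it asks for them.
REQUIRED_SECTIONS = ["Error Type", "Root Cause", "Fix", "Prevention"]


def parse_sections(response: str) -> dict[str, str]:
    """Split a response into {heading: body} using lines that start with '## '."""
    sections: dict[str, str] = {}
    current = None
    for line in response.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


def follows_format(response: str) -> bool:
    """True if exactly the required headings appear, in order, each with a body."""
    sections = parse_sections(response)
    return list(sections) == REQUIRED_SECTIONS and all(sections.values())
```

A response that fails `follows_format` can be retried or logged as a format miss. Text before the first heading is ignored, so a short preamble from the model does not count as a failure in this sketch.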

Context Extraction

Don't send the entire traceback. Parse it first:

python
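def extract_context(tb_text: str) -> dict:
    """Reduce a raw traceback string to the fields the prompt needs."""
    # The parameter is named tb_text so it does not shadow the stdlib
    # `traceback` module. The parse_* and get_surrounding_lines helpers
    # are defined elsewhere and not shown in this post.
    return {
        'error_type': parse_exception_class(tb_text),
        'error_message': parse_message(tb_text),
        'file_path': parse_file(tb_text),
        'line_number': parse_line(tb_text),
        'code_snippet': get_surrounding_lines(tb_text, context=5),
    }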
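The helpers called above are not shown in the post. As a rough idea of what such parsing could look like, here is a minimal sketch that assumes a standard CPython traceback (frames printed as `File "...", line N` and the exception on the last non-empty line). The function names are hypothetical stand-ins, and the sketch does not handle `SyntaxError` output, chained exceptions, or exception notes.

```python
import re
from typing import Optional

# Matches frame lines such as:   File "app.py", line 6, in main
FRAME_RE = re.compile(r'^\s*File "(?P<path>.+?)", line (?P<line>\d+)', re.MULTILINE)
# Matches the final line such as: KeyError: 'id'   (the message part is optional)
EXC_RE = re.compile(r"^(?P<type>[A-Za-z_][\w.]*)(?::\s*(?P<message>.*))?$")


def parse_last_frame(tb_text: str) -> Optional[tuple[str, int]]:
    """Return (file_path, line_number) of the innermost frame, or None."""
    frames = list(FRAME_RE.finditer(tb_text))
    if not frames:
        return None
    last = frames[-1]
    return last.group("path"), int(last.group("line"))


def parse_exception(tb_text: str) -> Optional[tuple[str, str]]:
    """Return (exception_class, message) from the last non-empty line, or None."""
    lines = [line for line in tb_text.splitlines() if line.strip()]
    if not lines:
        return None
    match = EXC_RE.match(lines[-1].strip())
    if match is None:
        return None
    return match.group("type"), match.group("message") or ""


def surrounding_lines(source: str, line_number: int, context: int = 5) -> str:
    """Return the 1-indexed line plus up to `context` lines on either side."""
    lines = source.splitlines()
    start = max(line_number - 1 - context, 0)
    end = min(line_number + context, len(lines))
    return "\n".join(lines[start:end])
```

Taking the last frame is a design choice: it is where the exception was raised, though the real cause sometimes sits a frame or two higher in your own code rather than in a library.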

Model-Specific Adjustments

GPT-4: Handles structured output well. Use explicit format instructions.

Claude: Better with natural language instructions. Give it context about the user's skill level.

CodeLlama: Needs simpler prompts. Don't ask for multiple sections — ask one question at a time.
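One way to encode these adjustments is a small prompt builder that branches on the model family. This is a sketch under assumptions: the family labels (`"gpt-4"`, `"claude"`, `"codellama"`) are plain dictionary-style keys rather than API model IDs, `ErrorContext` and `build_prompts` are hypothetical names, and the prompt wording is illustrative rather than the text Snapfix uses.

```python
from dataclasses import dataclass


@dataclass
class ErrorContext:
    error_type: str
    error_message: str
    code_snippet: str


def build_prompts(model_family: str, ctx: ErrorContext,
                  skill_level: str = "intermediate") -> list[str]:
    """Return the prompt(s) to send, in order, for a given model family."""
    summary = f"{ctx.error_type}: {ctx.error_message}\n\nCode:\n{ctx.code_snippet}"

    if model_family == "gpt-4":
        # Explicit format instructions in a single prompt.
        return [
            "Respond in EXACTLY this format, with these headings: "
            "## Error Type, ## Root Cause, ## Fix, ## Prevention.\n\n" + summary
        ]
    if model_family == "claude":
        # Natural-language request that includes the user's skill level.
        return [
            f"My Python experience level: {skill_level}. "
            "Please explain what this error means, why it happened, "
            "how to fix it, and how to avoid it next time.\n\n" + summary
        ]
    if model_family == "codellama":
        # One question per prompt instead of a multi-section request.
        return [
            "What does this error mean?\n\n" + summary,
            "Why did this error occur?\n\n" + summary,
            "What code change fixes this error?\n\n" + summary,
        ]
    raise ValueError(f"unknown model family: {model_family}")
```

Returning a list keeps the calling code uniform: it sends each prompt in order and joins the answers, whether the model needs one request or three.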

Results

After iterating on prompts for 3 weeks:

Fix accuracy improved from 62% to 89% (measured by manual review)

Response format consistency went from 45% to 97%

Average response time decreased by 30% (fewer tokens = faster)

Key Takeaway

Prompt engineering is software engineering. Version control your prompts, measure their performance, and iterate based on data — not vibes.
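As a minimal sketch of what "version and measure" could look like in code, the class below derives a short ID from the prompt text and tracks pass/fail outcomes for that exact version. `PromptVersion` and its methods are hypothetical names, and what counts as a "pass" (a correct fix, a well-formed response) is left to the reviewer or checker.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class PromptVersion:
    name: str
    text: str
    results: list[bool] = field(default_factory=list)

    @property
    def version_id(self) -> str:
        """Short content hash: any edit to the prompt text yields a new ID."""
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()[:8]

    def record(self, passed: bool) -> None:
        """Store the outcome of one evaluated response."""
        self.results.append(passed)

    def pass_rate(self) -> float:
        """Fraction of recorded responses that passed (0.0 if none recorded)."""
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)
```

Logging `version_id` next to each evaluation result makes it possible to compare two prompt revisions on the same set of tracebacks instead of relying on impressions.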

Written by Ansh Gautam

Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.
