AI Engineering3 min read·Jun 25, 2026

Building a Multi-Provider LLM Pipeline with LiteLLM

Why I designed Snapfix to swap between GPT-4, Claude, and Ollama with zero code changes — and how the architecture works.

AILLMLiteLLMOllamaOpenAIArchitecturePython

ShareLinkedIn X / Twitter

Why Multi-Provider?

When I started building Snapfix, I locked it to OpenAI. But contributors wanted local model support, and I realized Claude handled certain error types better. The question became: how do you support 3+ AI providers without maintaining 3 separate codebases?

The Problem with Single-Provider Lock-in

python
# The naive approach — tightly coupled to OpenAI
from openai import OpenAI

def analyze_error(traceback: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': traceback}]
    )
    return response.choices[0].message.content

This works until you want to add Claude. Then Ollama. Each has different APIs, different auth patterns, different response formats. You end up with a mess of if provider == 'openai' conditionals.

Enter LiteLLM

LiteLLM provides a unified interface across 100+ AI providers. One function call, same parameters, different providers.

python
from litellm import completion

def analyze_error(traceback: str, model: str = 'gpt-4') -> str:
    response = completion(
        model=model,  # 'gpt-4', 'claude-3-sonnet', 'ollama/codellama'
        messages=[{'role': 'user', 'content': traceback}]
    )
    return response.choices[0].message.content

My Architecture

Provider Configuration

python
PROVIDERS = {
    'openai': {'model': 'gpt-4', 'max_tokens': 2000},
    'anthropic': {'model': 'claude-3-5-sonnet-20240620', 'max_tokens': 2000},
    'ollama': {'model': 'ollama/codellama', 'max_tokens': 2000},
}

Fallback Chain

If the primary provider fails (rate limit, timeout, API down), the system automatically falls back:

1. Try user's selected provider

2. If it fails, try the next in the chain

3. If all cloud providers fail, fall back to local Ollama

4. If everything fails, return a structured error message

Streaming Support

All providers stream through the same SSE (Server-Sent Events) endpoint:

python
@app.post('/api/analyze')
async def analyze(request: AnalyzeRequest):
    async def generate():
        response = completion(
            model=request.model,
            messages=build_prompt(request.traceback),
            stream=True
        )
        for chunk in response:
            if chunk.choices[0].delta.content:
                yield f'data: {chunk.choices[0].delta.content}\n\n'
    
    return StreamingResponse(generate(), media_type='text/event-stream')

The Prompt Pipeline

The real secret isn't the provider abstraction — it's the prompt. I built a structured prompt pipeline that works consistently across all models:

1. Context Extraction: Parse the traceback, identify the error type, extract the relevant code

2. Prompt Construction: Build a structured prompt with sections for error analysis, root cause, and fix suggestion

3. Response Parsing: Normalize the response into a consistent format regardless of provider

Results

Zero code changes needed to swap providers

Claude handles complex type errors 23% better (measured by fix accuracy)

GPT-4 responds 40% faster for simple syntax errors

Ollama provides offline support with no API costs

Key Takeaway

Don't build for one AI provider. The landscape changes too fast. Use an abstraction layer, build your prompts to be model-agnostic, and always have a local fallback.

Written by Ansh Gautam

Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.

Hire Me →View Projects →