Building a Multi-Provider LLM Pipeline with LiteLLM
Why I designed Snapfix to swap between GPT-4, Claude, and Ollama with zero code changes — and how the architecture works.
Why Multi-Provider?
When I started building Snapfix, I locked it to OpenAI. But contributors wanted local model support, and I realized Claude handled certain error types better. The question became: how do you support 3+ AI providers without maintaining 3 separate codebases?
The Problem with Single-Provider Lock-in
```python
# The naive approach: tightly coupled to OpenAI
from openai import OpenAI

def analyze_error(traceback: str) -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4',
        messages=[{'role': 'user', 'content': traceback}]
    )
    return response.choices[0].message.content
```
This works until you want to add Claude. Then Ollama. Each has different APIs, different auth patterns, different response formats. You end up with a mess of if provider == 'openai' conditionals.
Enter LiteLLM
LiteLLM provides a unified interface across 100+ AI providers. One function call, same parameters, different providers.
```python
from litellm import completion

def analyze_error(traceback: str, model: str = 'gpt-4') -> str:
    response = completion(
        model=model,  # 'gpt-4', 'claude-3-5-sonnet-20240620', 'ollama/codellama'
        messages=[{'role': 'user', 'content': traceback}]
    )
    return response.choices[0].message.content
```
My Architecture
Provider Configuration
```python
PROVIDERS = {
    'openai': {'model': 'gpt-4', 'max_tokens': 2000},
    'anthropic': {'model': 'claude-3-5-sonnet-20240620', 'max_tokens': 2000},
    'ollama': {'model': 'ollama/codellama', 'max_tokens': 2000},
}
```
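To show how a table like this can feed the unified call, here is a minimal sketch. The helper name `completion_kwargs` is hypothetical and not taken from Snapfix; it only merges a provider's settings with the chat messages so the result can be unpacked into LiteLLM's `completion`.

```python
PROVIDERS = {
    'openai': {'model': 'gpt-4', 'max_tokens': 2000},
    'anthropic': {'model': 'claude-3-5-sonnet-20240620', 'max_tokens': 2000},
    'ollama': {'model': 'ollama/codellama', 'max_tokens': 2000},
}


def completion_kwargs(provider: str, traceback: str) -> dict:
    """Merge a provider's settings with the chat messages for one request.

    Hypothetical helper for illustration only.
    """
    try:
        settings = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f'unknown provider: {provider!r}') from None
    # Copy the settings so the shared PROVIDERS table is never mutated.
    return {**settings, 'messages': [{'role': 'user', 'content': traceback}]}
```

With LiteLLM installed, the call site would then look like `completion(**completion_kwargs('anthropic', tb))`, and adding a provider means adding one entry to the table.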
Fallback Chain
If the primary provider fails (rate limit, timeout, API down), the system automatically falls back:
1. Try user's selected provider
2. If it fails, try the next in the chain
3. If all cloud providers fail, fall back to local Ollama
4. If everything fails, return a structured error message
Streaming Support
All providers stream through the same SSE (Server-Sent Events) endpoint:
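```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from litellm import acompletion

app = FastAPI()

# AnalyzeRequest (the request model) and build_prompt are assumed to be
# defined elsewhere in the project.
@app.post('/api/analyze')
async def analyze(request: AnalyzeRequest):
    async def generate():
        # acompletion is LiteLLM's async variant; the synchronous completion()
        # would block the event loop while waiting for each chunk.
        response = await acompletion(
            model=request.model,
            messages=build_prompt(request.traceback),
            stream=True
        )
        async for chunk in response:
            content = chunk.choices[0].delta.content
            if content:
                yield f'data: {content}\n\n'
    return StreamingResponse(generate(), media_type='text/event-stream')
```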
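One caveat with hand-built `data:` frames: in the SSE wire format a blank line ends an event, so a chunk that itself contains newlines (common in code suggestions) would be cut apart on the client. A possible fix, sketched here with a hypothetical helper named `sse_event`, is to emit one `data:` line per line of text, which the browser's `EventSource` joins back together with newlines.

```python
def sse_event(text: str) -> str:
    """Frame one chunk as a single SSE event (hypothetical helper).

    Each line of the payload gets its own `data:` field; the blank line
    at the end terminates the event.
    """
    # SSE treats CRLF, CR, and LF all as line endings, so normalize first.
    normalized = text.replace('\r\n', '\n').replace('\r', '\n')
    fields = ''.join(f'data: {line}\n' for line in normalized.split('\n'))
    return fields + '\n'
```

Inside the generator, `yield sse_event(content)` would then replace the hand-formatted f-string.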
The Prompt Pipeline
The real secret isn't the provider abstraction — it's the prompt. I built a structured prompt pipeline that works consistently across all models:
1. Context Extraction: Parse the traceback, identify the error type, extract the relevant code
2. Prompt Construction: Build a structured prompt with sections for error analysis, root cause, and fix suggestion
3. Response Parsing: Normalize the response into a consistent format regardless of provider
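To make the three stages concrete, here is a minimal sketch for Python tracebacks. Everything in it is an assumption made for illustration: the function names, the section names in `SECTIONS`, and the prompt wording are not taken from Snapfix, and a real pipeline would likely also extract the relevant source code.

```python
import re

SECTIONS = ('Error Analysis', 'Root Cause', 'Fix')  # hypothetical section names

# Matches a section heading at the start of a line, tolerating Markdown
# decoration such as '## Fix:' or '**Root Cause:**'.
_HEADING = re.compile(
    r'^[#*\s]*(' + '|'.join(SECTIONS) + r')[*\s]*:[*\s]*',
    re.MULTILINE | re.IGNORECASE,
)


def extract_context(traceback: str) -> dict:
    """Step 1: pull the error type, message, and innermost frame."""
    lines = [ln for ln in traceback.strip().splitlines() if ln.strip()]
    error_type, _, message = (lines[-1] if lines else '').partition(': ')
    frames = re.findall(r'File "(.+?)", line (\d+)', traceback)
    file, line = frames[-1] if frames else (None, None)
    return {
        'error_type': error_type.strip(),
        'message': message,
        'file': file,
        'line': int(line) if line else None,
    }


def build_prompt(traceback: str) -> list:
    """Step 2: build chat messages that ask for the same sections every time."""
    ctx = extract_context(traceback)
    system = (
        'You are a debugging assistant. Reply with exactly three sections, '
        'each starting on its own line: ' + ', '.join(f'"{s}:"' for s in SECTIONS)
    )
    user = (
        f"Error type: {ctx['error_type']}\n"
        f"Message: {ctx['message']}\n"
        f"Location: {ctx['file']}:{ctx['line']}\n\n"
        f"Traceback:\n{traceback}"
    )
    return [
        {'role': 'system', 'content': system},
        {'role': 'user', 'content': user},
    ]


def parse_response(text: str) -> dict:
    """Step 3: normalize free-form model output into one dict shape."""
    # re.split with one capture group yields [preamble, name, body, name, body, ...]
    parts = _HEADING.split(text)
    result = {name: '' for name in SECTIONS}
    for heading, body in zip(parts[1::2], parts[2::2]):
        key = next(s for s in SECTIONS if s.lower() == heading.lower())
        result[key] = body.strip()
    return result
```

Because the parser keys on headings rather than on any provider-specific response field, the same code can handle output from whichever model answered. Sections the model omits simply come back as empty strings.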
Results
Key Takeaway
Don't build for one AI provider. The landscape changes too fast. Use an abstraction layer, build your prompts to be model-agnostic, and always have a local fallback.
Written by Ansh Gautam
Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.