Building Real-Time Data Sync at Scale: WebSocket Patterns for Sub-50ms Latency
A deep dive into the architectural patterns and optimization techniques I used to achieve sub-50ms synchronization across 300+ concurrent WebSocket connections in a production attendance system.
The Problem
When I started building the attendance management system, the initial requirement seemed straightforward: track student locations via GPS and sync attendance status in real time. But "real time" has teeth. Administrators expected instant feedback when a student checked in—no lag, no batching delays, no excuses. With 300+ concurrent users on a modest VPS, I needed to move fast without burning CPU or network bandwidth.
The naive approach would've been to broadcast every GPS update to every connected client. That's a recipe for message explosion. Instead, I had to think about what actually needs to sync, when it needs to sync, and how to pack it efficiently over the wire.
Understanding the Bottleneck
WebSocket latency doesn't come from a single source. It's a stack:
1. Message serialization overhead: JSON encoding on every update.
2. Database round-trips: Querying state before broadcasting.
3. Inefficient message routing: Sending data to clients that don't need it.
4. Event loop contention: Blocking operations starving async tasks.
5. Network buffering: Small, frequent messages getting delayed by TCP's Nagle algorithm.
I measured baseline performance with a simple ping-pong test: a client sends a message, the server echoes it back. With unoptimized code, round-trip latency hovered around 150–200ms. That's unacceptable for attendance tracking.
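A ping-pong measurement of this kind can be sketched with the standard library alone. The version below is a stand-in, not the original test harness: it uses plain asyncio TCP streams on loopback instead of a WebSocket, and the names `handle_echo` and `measure_round_trips` are hypothetical. Loopback numbers will be far lower than anything measured over a real network, so treat it as a template for the timing logic only.

```python
import asyncio
import statistics
import time


async def handle_echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Echo each newline-terminated message straight back to the sender.
    while line := await reader.readline():
        writer.write(line)
        await writer.drain()
    writer.close()


async def measure_round_trips(samples: int = 20) -> list[float]:
    """Return round-trip times in milliseconds against a local echo server."""
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    rtts: list[float] = []
    try:
        for i in range(samples):
            message = f"ping {i}\n".encode()
            start = time.perf_counter()
            writer.write(message)
            await writer.drain()
            reply = await reader.readline()
            rtts.append((time.perf_counter() - start) * 1000)
            if reply != message:
                raise RuntimeError("echo mismatch")
    finally:
        writer.close()
        await writer.wait_closed()
        server.close()
    return rtts


if __name__ == "__main__":
    rtts = asyncio.run(measure_round_trips())
    print(f"median round trip: {statistics.median(rtts):.3f} ms")
```

Using `time.perf_counter()` on the client for both timestamps avoids any dependence on clock agreement between client and server.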
Architecture: Separation of Concerns
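The key insight was to split responsibilities: the server ingests every GPS ping, but it broadcasts over WebSocket only when a student's attendance status changes.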
This meant not every GPS update triggered a WebSocket message. A student could send GPS pings every 5 seconds, but their attendance status only changed when they crossed a geofence or an admin marked them absent.
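The idea of turning a stream of GPS pings into rare status events can be sketched in isolation. This is a minimal in-memory illustration under stated assumptions: `StatusTracker` and `haversine_km` are hypothetical names, the geofence is a single circle, and the `"absent"` label for a student outside the fence is an assumption made for the example.

```python
import math
from typing import Dict, Optional

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class StatusTracker:
    """Remembers each student's last status and reports only changes."""

    def __init__(self, center_lat: float, center_lon: float, radius_km: float):
        self.center = (center_lat, center_lon)
        self.radius_km = radius_km
        self.status: Dict[str, str] = {}

    def update(self, student_id: str, lat: float, lon: float) -> Optional[str]:
        """Return the new status if it changed, otherwise None."""
        inside = haversine_km(lat, lon, *self.center) <= self.radius_km
        new_status = "present" if inside else "absent"  # labels are illustrative
        if self.status.get(student_id) == new_status:
            return None  # same side of the fence: nothing to broadcast
        self.status[student_id] = new_status
        return new_status
```

With this shape, a student pinging every 5 seconds from inside the fence produces one event on arrival and nothing afterwards, which is what keeps the broadcast volume low.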
Implementation: FastAPI + Async I/O
FastAPI's async foundation is essential here. Every blocking operation—database queries, file I/O, external API calls—needs to be non-blocking, or the event loop freezes and all 300 connections suffer.
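Before the full endpoint, a small stdlib-only sketch of why this matters. It counts how often a background "heartbeat" coroutine gets to run while a simulated blocking query executes, first directly on the event loop and then offloaded with `asyncio.to_thread` (Python 3.9+). The function names and the 0.2 second duration are made up for the demonstration.

```python
import asyncio
import time


def blocking_query(seconds: float) -> str:
    # Stand-in for a synchronous driver call or file read.
    time.sleep(seconds)
    return "done"


async def count_ticks(stop: asyncio.Event, interval: float = 0.01) -> int:
    # Heartbeat: every tick is a chance for other connections to be served.
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(interval)
        ticks += 1
    return ticks


async def ticks_during(offload: bool, seconds: float = 0.2) -> int:
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        # Runs in a worker thread; the event loop stays free.
        await asyncio.to_thread(blocking_query, seconds)
    else:
        # Runs on the event loop thread; nothing else can progress.
        blocking_query(seconds)
    stop.set()
    return await ticker


if __name__ == "__main__":
    print("ticks while blocking:", asyncio.run(ticks_during(offload=False)))
    print("ticks while offloaded:", asyncio.run(ticks_during(offload=True)))
```

The heartbeat barely advances in the blocking case and keeps ticking in the offloaded one. A truly async driver is still preferable to a thread for database access; `to_thread` is the fallback for calls that have no async equivalent.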
```python
import json
import math
import time
from typing import Dict, Set

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine
from sqlalchemy.orm import sessionmaker

from models import Student, Geofence  # the application's own ORM models

app = FastAPI()

# Connection pool for async database access
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/attendance",
    echo=False,
    pool_size=20,
    max_overflow=10,
)
AsyncSessionLocal = sessionmaker(
    engine, class_=AsyncSession, expire_on_commit=False
)

# Track active connections per class
active_connections: Dict[str, Set[WebSocket]] = {}


@app.websocket("/ws/attendance/{class_id}")
async def websocket_endpoint(websocket: WebSocket, class_id: str):
    await websocket.accept()
    active_connections.setdefault(class_id, set()).add(websocket)
    try:
        while True:
            # Receive one GPS update, or a client-side batch under "updates"
            data = await websocket.receive_json()
            for update in data.get("updates") or [data]:
                await process_gps_update(class_id, update)
    except WebSocketDisconnect:
        pass  # Client went away; closing again can raise RuntimeError
    except Exception as e:
        print(f"WebSocket error: {e}")
    finally:
        active_connections[class_id].discard(websocket)


async def process_gps_update(class_id: str, update: dict):
    student_id = update.get("student_id")
    lat, lon = update.get("lat"), update.get("lon")
    if student_id is None or lat is None or lon is None:
        return  # Skip malformed updates

    # Async database operation
    async with AsyncSessionLocal() as session:
        # Check if this location crosses a geofence
        status_changed = await check_geofence(session, student_id, lat, lon)

    if status_changed:
        # Only broadcast if state actually changed
        await broadcast_to_class(class_id, {
            "type": "status_update",
            "student_id": student_id,
            "status": status_changed["status"],
            "timestamp": status_changed["timestamp"],
        })


async def check_geofence(session, student_id, lat, lon):
    """Return the new status if the student just entered the geofence, else None."""
    # Assumption: Student.geofence_id references Geofence.id; adjust the join
    # to your schema. Without a join condition this query is a cross product.
    result = await session.execute(
        select(Student, Geofence)
        .join(Geofence, Student.geofence_id == Geofence.id)
        .where(Student.id == student_id)
    )
    row = result.first()
    if row is None:
        return None
    student, geofence = row

    # Flat-earth approximation converted to km (in production, use PostGIS)
    km_per_degree = 111.32
    dy = (lat - geofence.lat) * km_per_degree
    dx = (lon - geofence.lon) * km_per_degree * math.cos(math.radians(geofence.lat))
    inside = math.hypot(dx, dy) < geofence.radius_km

    # Check if status changed
    if inside and student.status != "present":
        student.status = "present"
        student.check_in_time = time.time()  # wall clock, not the loop's monotonic clock
        await session.commit()
        return {"status": "present", "timestamp": student.check_in_time}
    return None


async def broadcast_to_class(class_id: str, payload: dict):
    """Broadcast to all connected clients in a class."""
    connections = active_connections.get(class_id)
    if not connections:
        return
    message = json.dumps(payload)  # Serialize once, not once per client
    disconnected = set()
    # Iterate over a snapshot: the set can change while a send is awaited
    for websocket in list(connections):
        try:
            await websocket.send_text(message)
        except Exception:
            disconnected.add(websocket)
    # Clean up dead connections
    connections.difference_update(disconnected)
```
Optimization: Message Batching
Even with async I/O, sending 300 individual JSON messages for each state change is wasteful. I implemented a batching layer that collects updates over a small time window (10–20ms) and sends them in one payload.
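```python
import asyncio
import time
from collections import defaultdict
from typing import Dict


class MessageBatcher:
    def __init__(self, batch_window_ms: int = 15, max_batch: int = 50):
        self.batch_window = batch_window_ms / 1000.0
        self.max_batch = max_batch
        self.pending: Dict[str, list] = defaultdict(list)
        self.timers: Dict[str, asyncio.Task] = {}

    async def add_message(self, class_id: str, payload: dict):
        """Add a message to the batch queue."""
        self.pending[class_id].append(payload)
        if len(self.pending[class_id]) >= self.max_batch:
            # Batch is large: flush now instead of waiting for the timer
            await self.flush(class_id)
        elif class_id not in self.timers:
            # First message of a new window: schedule a flush so a lone
            # update never waits for a later message to push it out
            self.timers[class_id] = asyncio.create_task(self._flush_later(class_id))

    async def _flush_later(self, class_id: str):
        await asyncio.sleep(self.batch_window)
        self.timers.pop(class_id, None)
        await self.flush(class_id)

    async def flush(self, class_id: str):
        """Send all pending messages for a class."""
        timer = self.timers.pop(class_id, None)
        if timer is not None:
            timer.cancel()
        # Detach the batch before awaiting so updates that arrive during
        # the broadcast start a fresh batch instead of being cleared
        updates = self.pending.pop(class_id, [])
        if not updates:
            return
        await broadcast_to_class(class_id, {
            "type": "batch_update",
            "updates": updates,
            "timestamp": time.time(),
        })


batcher = MessageBatcher(batch_window_ms=15)
```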
This reduced message volume by ~70% in typical scenarios while keeping latency under 50ms.
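How much a window saves depends entirely on how bursty the updates are. The helper below is a hypothetical back-of-the-envelope model, not the production batcher: given arrival times in milliseconds, it counts how many batches a window of a given length would produce, so the reduction can be estimated for any traffic pattern. The arrival times in the usage note are invented for illustration.

```python
from typing import Iterable


def count_batches(arrival_times_ms: Iterable[float],
                  window_ms: float = 15.0,
                  max_batch: int = 50) -> int:
    """Count batches when each window opens at the first message it holds."""
    batches = 0
    window_start = None
    size = 0
    for t in sorted(arrival_times_ms):
        window_expired = window_start is None or t - window_start >= window_ms
        if window_expired or size >= max_batch:
            # This message opens a new batch.
            batches += 1
            window_start = t
            size = 0
        size += 1
    return batches


if __name__ == "__main__":
    arrivals = [0, 3, 9, 14, 40, 41, 100]  # illustrative, not measured
    batches = count_batches(arrivals)
    saved = 1 - batches / len(arrivals)
    print(f"{len(arrivals)} updates -> {batches} batches ({saved:.0%} fewer sends)")
```

Evenly spaced updates that are further apart than the window gain nothing from batching, while a burst (a whole class arriving at once) collapses into a handful of sends.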
Client-Side: Efficient State Management
The server isn't the only place to optimize. The client (built with Leaflet.js) caches attendance state locally and only re-renders when the server sends a delta.
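The caching logic itself is small. Here it is sketched in Python to match the server-side examples; the JavaScript class that follows is the actual client, and `apply_batch` is a hypothetical name. The function applies a batch of updates to a local cache and reports which students actually changed, so the caller can skip rendering when nothing did.

```python
from typing import Dict, List, Set


def apply_batch(local_state: Dict[str, str], updates: List[dict]) -> Set[str]:
    """Apply updates in place and return the IDs whose status changed."""
    changed: Set[str] = set()
    for update in updates:
        student_id, status = update["student_id"], update["status"]
        if local_state.get(student_id) != status:
            local_state[student_id] = status
            changed.add(student_id)
    return changed


if __name__ == "__main__":
    state = {"s1": "present"}
    changed = apply_batch(state, [
        {"student_id": "s1", "status": "present"},  # no-op: already cached
        {"student_id": "s2", "status": "present"},  # new information
    ])
    if changed:
        print("re-render markers for:", sorted(changed))
```

Returning the changed IDs rather than a boolean leaves room to redraw only the affected markers instead of the whole map.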
```javascript
class AttendanceManager {
  constructor(classId) {
    this.classId = classId;
    this.localState = new Map(); // student_id -> status
    this.ws = null;
    this.pendingGpsUpdates = [];
    this.gpsFlushInterval = 5000; // Send GPS every 5 seconds
    this.connect();
    // Flush queued GPS updates on a timer owned by this instance
    this.flushTimer = setInterval(() => this.flushGps(), this.gpsFlushInterval);
  }

  connect() {
    this.ws = new WebSocket(`wss://api.example.com/ws/attendance/${this.classId}`);
    this.ws.onmessage = (event) => this.handleMessage(JSON.parse(event.data));
  }

  handleMessage(payload) {
    if (payload.type === "batch_update") {
      // Process multiple updates at once
      for (const update of payload.updates) {
        this.localState.set(update.student_id, update.status);
      }
    } else if (payload.type === "status_update") {
      // Unbatched single update
      this.localState.set(payload.student_id, payload.status);
    } else {
      return; // Unknown message type: nothing to render
    }
    // Single re-render per message
    this.renderMap();
  }

  renderMap() {
    // Redraw Leaflet markers from this.localState (rendering code omitted)
  }

  sendGpsUpdate(studentId, lat, lon) {
    // Batch GPS updates locally before sending
    this.pendingGpsUpdates.push({ student_id: studentId, lat, lon });
  }

  flushGps() {
    // Keep updates queued until the socket is open
    if (this.pendingGpsUpdates.length === 0 || this.ws.readyState !== WebSocket.OPEN) {
      return;
    }
    this.ws.send(JSON.stringify({
      type: "gps_batch",
      updates: this.pendingGpsUpdates,
    }));
    this.pendingGpsUpdates = [];
  }
}
```
Measuring and Validating
I built a latency profiler to track end-to-end timing:
```python
import time
from dataclasses import dataclass


@dataclass
class LatencyMetric:
    client_send_time: float
    server_receive_time: float
    server_process_time: float
    broadcast_time: float
    client_receive_time: float

    @property
    def round_trip_ms(self):
        return (self.client_receive_time - self.client_send_time) * 1000

    @property
    def server_process_ms(self):
        return (self.broadcast_time - self.server_receive_time) * 1000


# Client includes timestamp in payload
# Server echoes it back with server timestamps
# Client calculates round-trip on echo
```
After optimization, median latency settled at 38–45ms under normal load, peaking at ~70ms during database contention.
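Figures like these come from reducing many samples to a few summary statistics. A minimal sketch of that reduction follows, assuming round-trip samples have already been collected in milliseconds. `summarize` and `percentile` are hypothetical helpers, the percentile uses the simple nearest-rank method, and the sample values in the usage note are invented rather than taken from the production system.

```python
import math
import statistics
from typing import Dict, List, Sequence


def percentile(sorted_samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty sequence."""
    rank = max(1, math.ceil(pct * len(sorted_samples) / 100))
    return sorted_samples[rank - 1]


def summarize(samples_ms: List[float]) -> Dict[str, float]:
    """Median, 95th percentile, and maximum of round-trip samples."""
    ordered = sorted(samples_ms)
    return {
        "median": statistics.median(ordered),
        "p95": percentile(ordered, 95),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    samples = [12.0, 15.5, 11.2, 90.0, 13.1, 14.8, 12.9, 13.4, 16.0, 12.2]  # made up
    print(summarize(samples))
```

Reporting the median alongside a high percentile matters because a single slow outlier barely moves the median but is exactly what an administrator notices.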
Key Takeaways
Delta-only updates: Don't broadcast state that hasn't changed. GPS is noise; attendance status is signal.
Async all the way down: A single blocking database query in your WebSocket handler will stall all 300 connections. Use async SQLAlchemy, async database drivers, and async utilities.
Batch at the edges: Collect updates over milliseconds, send once. Reduces network overhead and client re-renders.
Connection pooling matters: A small pool (20–30 connections) with sensible overflow settings prevents database exhaustion.
Profile under load: Latency at 10 concurrent users tells you nothing. Test with your actual target concurrency and measure both client and server timing.
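To make the pooling takeaway concrete, here is a toy model of a bounded pool under load. It is only a stand-in: `asyncio.Semaphore` plays the role of the database pool, `fake_query` replaces a real query with a short sleep, and all names are hypothetical. The point is that 300 handlers can wait their turn on a small number of slots without the number of in-flight queries ever exceeding the limit.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


class BoundedPool:
    """Caps concurrent work the way a fixed-size connection pool does."""

    def __init__(self, size: int):
        self._slots = asyncio.Semaphore(size)
        self.in_use = 0
        self.peak = 0

    async def run(self, work: Callable[[], Awaitable[str]]) -> str:
        async with self._slots:  # wait here when every slot is taken
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            try:
                return await work()
            finally:
                self.in_use -= 1


async def fake_query() -> str:
    await asyncio.sleep(0.01)  # pretend to talk to the database
    return "ok"


async def simulate(handlers: int = 300, pool_size: int = 30) -> Tuple[int, int]:
    """Return (peak concurrent queries, completed queries)."""
    pool = BoundedPool(pool_size)
    results = await asyncio.gather(*(pool.run(fake_query) for _ in range(handlers)))
    return pool.peak, len(results)


if __name__ == "__main__":
    peak, completed = asyncio.run(simulate())
    print(f"{completed} queries completed, at most {peak} in flight")
```

The cost of a small pool is queueing delay rather than failure, which is why pool size should be tuned while watching the latency percentiles and not in isolation.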
WebSocket performance isn't magic—it's discipline. Measure, batch, cache, and keep the event loop clean.
Written by Ansh Gautam
Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.