technical · 6 min read · Jun 25, 2026

Building Real-Time Data Sync at Scale: WebSocket Patterns for Sub-50ms Latency

A deep dive into the architectural patterns and optimization techniques I used to achieve sub-50ms synchronization across 300+ concurrent WebSocket connections in a production attendance system.

websockets · fastapi · real-time · async-python · performance · system-design


The Problem

When I started building the attendance management system, the initial requirement seemed straightforward: track student locations via GPS and sync attendance status in real time. But "real time" has teeth. Administrators expected instant feedback when a student checked in: no lag, no batching delays, no excuses. With 300+ concurrent users on a modest VPS, I needed low latency without burning CPU or network bandwidth.

The naive approach would've been to broadcast every GPS update to every connected client. That's a recipe for message explosion. Instead, I had to think about what actually needs to sync, when it needs to sync, and how to pack it efficiently over the wire.
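To put a rough number on it: with each of 300 clients pinging every 5 seconds (the interval used later in this post), naive broadcasting copies every ping to every socket. The back-of-the-envelope sketch below assumes, as a worst case, that all 300 clients share one room. The `broadcast_rate` helper and the class size of 30 are hypothetical.

```python
def broadcast_rate(clients: int, ping_interval_s: float, recipients: int) -> float:
    """Outbound messages per second if every GPS ping is fanned out.

    clients          -- connected clients that send GPS pings
    ping_interval_s  -- seconds between pings from one client
    recipients       -- sockets each ping is copied to
    """
    pings_per_second = clients / ping_interval_s
    return pings_per_second * recipients


# Naive: every ping goes to all 300 sockets
naive = broadcast_rate(clients=300, ping_interval_s=5, recipients=300)

# Per-class routing (hypothetical 30 students per class):
# a ping only reaches its own class
per_class = broadcast_rate(clients=300, ping_interval_s=5, recipients=30)

print(f"naive: {naive:,.0f} msg/s, per-class: {per_class:,.0f} msg/s")
# prints "naive: 18,000 msg/s, per-class: 1,800 msg/s"
```

Sending only attendance-status changes cuts the remaining traffic much further, because a status changes a few times per session rather than on every ping.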

Understanding the Bottleneck

WebSocket latency doesn't come from a single source. It's a stack:

1. Message serialization overhead: JSON encoding on every update.

2. Database round-trips: Querying state before broadcasting.

3. Inefficient message routing: Sending data to clients that don't need it.

4. Event loop contention: Blocking operations starving async tasks.

5. Network buffering: Small, frequent messages getting delayed by TCP's Nagle algorithm (made worse by delayed ACKs) when TCP_NODELAY is not set.
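Item 4 is the easiest one to trip over in async Python. The standard-library sketch below runs a heartbeat task that should tick every 10 ms and records how late it gets while a 200 ms blocking call (standing in for a synchronous database driver) runs either inline on the event loop or in a worker thread via `asyncio.to_thread` (Python 3.9+). The names `heartbeat`, `blocking_query`, and `measure_lag` are made up for this illustration.

```python
import asyncio
import time


async def heartbeat(interval: float, stop: asyncio.Event) -> float:
    """Tick every `interval` seconds; return the worst observed lateness (s)."""
    worst = 0.0
    while not stop.is_set():
        start = time.perf_counter()
        await asyncio.sleep(interval)
        late = time.perf_counter() - start - interval
        worst = max(worst, late)
    return worst


def blocking_query() -> str:
    """Stand-in for a synchronous DB call (e.g. a non-async driver)."""
    time.sleep(0.2)
    return "row"


async def measure_lag(offload: bool) -> float:
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(0.01, stop))
    await asyncio.sleep(0.05)  # let the heartbeat run a few ticks
    if offload:
        await asyncio.to_thread(blocking_query)  # loop stays free
    else:
        blocking_query()  # freezes the event loop for 200 ms
    stop.set()
    return await hb


async def main() -> None:
    inline = await measure_lag(offload=False)
    offloaded = await measure_lag(offload=True)
    print(f"worst heartbeat lag: inline {inline * 1000:.0f} ms, "
          f"offloaded {offloaded * 1000:.0f} ms")


if __name__ == "__main__":
    asyncio.run(main())
```

In a WebSocket server, every connection's handler is in the same position as the heartbeat: while the loop is blocked, none of them can receive or send.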

I measured baseline performance with a simple ping-pong test: a client sends a message, the server echoes it back. With unoptimized code, round-trip latency hovered around 150–200ms. That's unacceptable for attendance tracking.
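A ping-pong test of this kind can be reproduced with the standard library alone. The sketch below is a stand-in, not the original harness: it uses a plain asyncio TCP echo server on localhost instead of a WebSocket endpoint (so it measures the asyncio stack without WebSocket framing or real network distance), sends newline-delimited pings one at a time, and reports the median and 95th-percentile round trip. The `echo` and `ping_pong` names are hypothetical.

```python
import asyncio
import statistics
import time


async def echo(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Echo each newline-terminated message back to the sender."""
    while line := await reader.readline():
        writer.write(line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ping_pong(n: int = 200) -> list[float]:
    """Return n round-trip times in milliseconds."""
    # Port 0 lets the OS pick a free port
    server = await asyncio.start_server(echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    rtts = []
    for i in range(n):
        start = time.perf_counter()
        writer.write(f"ping {i}\n".encode())
        await writer.drain()
        await reader.readline()  # wait for the echo
        rtts.append((time.perf_counter() - start) * 1000)

    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return rtts


if __name__ == "__main__":
    samples = asyncio.run(ping_pong())
    p95 = statistics.quantiles(samples, n=20)[-1]  # 95th percentile cut point
    print(f"median {statistics.median(samples):.3f} ms, p95 {p95:.3f} ms")
```

Reporting a percentile alongside the median matters here, because the latency problems described in this post show up in the tail first.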

Architecture: Separation of Concerns

The key insight was to split responsibilities:

GPS ingestion layer: Accept location updates, validate, store asynchronously.

Attendance state machine: Track status (present, absent, marked, etc.) separately from location.

Broadcast layer: Only send deltas—changes to attendance state, not every GPS ping.

Client-side caching: Keep local state, only sync when server says it changed.

This meant not every GPS update triggered a WebSocket message. A student could send GPS pings every 5 seconds, but their attendance status only changed when they crossed a geofence or an admin marked them absent.
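The split between noisy GPS input and attendance state can be illustrated with a tiny in-memory tracker that returns a delta only when a status actually changes. This is a hypothetical sketch, not the production code: the `AttendanceState` class, its methods, and the boolean `inside_geofence` input (standing in for the real geofence check) are assumptions; the status names come from the list above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AttendanceState:
    """Per-class attendance status that emits a delta only on real changes."""
    status: Dict[str, str] = field(default_factory=dict)  # student_id -> status

    def set_status(self, student_id: str, new_status: str) -> Optional[dict]:
        """Apply a status; return a delta to broadcast, or None if unchanged."""
        if self.status.get(student_id) == new_status:
            return None  # nothing to sync
        self.status[student_id] = new_status
        return {"type": "status_update", "student_id": student_id,
                "status": new_status}

    def on_gps(self, student_id: str, inside_geofence: bool) -> Optional[dict]:
        """A GPS ping only matters if it changes the student's status."""
        if inside_geofence:
            return self.set_status(student_id, "present")
        return None  # outside the geofence: no automatic change in this sketch


state = AttendanceState()

# Five pings from one student, 5 seconds apart; only one crosses the geofence
pings = [False, False, True, True, True]
deltas = []
for inside in pings:
    delta = state.on_gps("s1", inside)
    if delta is not None:
        deltas.append(delta)

# An admin action goes through the same change detection
admin_delta = state.set_status("s2", "absent")

print(len(deltas), admin_delta)
```

Five pings produce a single delta; everything else stays on the server.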

Implementation: FastAPI + Async I/O

FastAPI's async foundation is essential here. Every potentially blocking operation (database queries, file I/O, external API calls) must either be awaited through an async library or moved off the event loop; otherwise the loop freezes and all 300 connections suffer.

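```python
import asyncio
import json
import logging
import math
from datetime import datetime, timezone
from typing import Dict, Optional, Set

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from sqlalchemy import select
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

logger = logging.getLogger(__name__)
app = FastAPI()

# Connection pool for async database access
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@localhost/attendance",
    echo=False,
    pool_size=20,
    max_overflow=10,
)

# async_sessionmaker (SQLAlchemy 2.0) produces AsyncSession objects
AsyncSessionLocal = async_sessionmaker(engine, expire_on_commit=False)

# Track active connections per class
active_connections: Dict[str, Set[WebSocket]] = {}


@app.websocket("/ws/attendance/{class_id}")
async def websocket_endpoint(websocket: WebSocket, class_id: str):
    await websocket.accept()
    active_connections.setdefault(class_id, set()).add(websocket)

    try:
        while True:
            # Accept a single GPS fix or a {"type": "gps_batch", "updates": [...]}
            # envelope (the format the browser client below sends)
            data = await websocket.receive_json()
            updates = data.get("updates", [data])

            async with AsyncSessionLocal() as session:
                for update in updates:
                    student_id = update.get("student_id")
                    lat, lon = update.get("lat"), update.get("lon")

                    # Cheap validation before touching the database
                    if (student_id is None
                            or not isinstance(lat, (int, float))
                            or not isinstance(lon, (int, float))):
                        continue

                    # Check whether this location crosses a geofence
                    status_changed = await check_geofence(
                        session, student_id, lat, lon
                    )

                    if status_changed:
                        # Only broadcast if state actually changed
                        await broadcast_to_class(class_id, {
                            "type": "status_update",
                            "student_id": student_id,
                            "status": status_changed["status"],
                            "timestamp": status_changed["timestamp"],
                        })

    except WebSocketDisconnect:
        pass  # Client went away; the socket is already closed
    except Exception:
        logger.exception("WebSocket error in class %s", class_id)
        try:
            await websocket.close(code=1011)  # 1011 = internal server error
        except RuntimeError:
            pass  # Socket was already closed
    finally:
        conns = active_connections.get(class_id)
        if conns is not None:
            conns.discard(websocket)
            if not conns:
                del active_connections[class_id]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


async def check_geofence(session, student_id, lat, lon) -> Optional[dict]:
    """Mark the student present if they are inside their geofence."""
    from models import Geofence, Student  # application ORM models (not shown)

    # Join explicitly: select(Student, Geofence) without a join condition is
    # a cross product. Student.geofence_id is an assumed foreign-key column.
    result = await session.execute(
        select(Student, Geofence)
        .join(Geofence, Student.geofence_id == Geofence.id)
        .where(Student.id == student_id)
    )
    row = result.first()
    if row is None:
        return None
    student, geofence = row

    # Distance in km so it is comparable with radius_km
    # (in production, use PostGIS, e.g. ST_DWithin in the query itself)
    distance_km = haversine_km(lat, lon, geofence.lat, geofence.lon)
    inside = distance_km < geofence.radius_km

    # Check if status changed
    if inside and student.status != "present":
        student.status = "present"
        # Wall-clock UTC time (loop.time() is a monotonic clock, not a
        # timestamp); assumes a timezone-aware timestamp column
        student.check_in_time = datetime.now(timezone.utc)
        await session.commit()
        return {
            "status": "present",
            "timestamp": student.check_in_time.isoformat(),
        }

    return None


async def broadcast_to_class(class_id: str, payload: dict):
    """Broadcast to all connected clients in a class."""
    # Snapshot the set: other handlers may add or remove sockets while we await
    clients = list(active_connections.get(class_id, ()))
    if not clients:
        return

    message = json.dumps(payload)  # serialize once, not once per client

    # Send concurrently so one slow client does not delay the others
    results = await asyncio.gather(
        *(ws.send_text(message) for ws in clients),
        return_exceptions=True,
    )

    # Clean up dead connections
    conns = active_connections.get(class_id)
    if conns is not None:
        for ws, result in zip(clients, results):
            if isinstance(result, Exception):
                conns.discard(ws)
```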
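The engine settings cap each process at `pool_size + max_overflow` = 20 + 10 = 30 database connections, but that budget is per process: running the app with several workers (for example uvicorn's `--workers` option) gives each worker its own pool. The hypothetical helpers below check the worst case against PostgreSQL's `max_connections` (default 100) minus `superuser_reserved_connections` (default 3), assuming nothing else connects to the database.

```python
def peak_db_connections(workers: int, pool_size: int = 20, max_overflow: int = 10) -> int:
    """Worst-case connections opened by all worker processes together."""
    return workers * (pool_size + max_overflow)


def fits(workers: int, max_connections: int = 100, reserved: int = 3) -> bool:
    """True if every worker can max out its pool without exhausting Postgres.

    `reserved` approximates PostgreSQL's superuser_reserved_connections,
    which ordinary application roles cannot use.
    """
    return peak_db_connections(workers) <= max_connections - reserved


for w in (1, 2, 3, 4):
    print(w, peak_db_connections(w), fits(w))
```

With these defaults, three workers (90 connections) fit and four (120) do not, so the pool size and worker count have to be chosen together.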

Optimization: Message Batching

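Even with async I/O, pushing a separate message to every client for each individual state change is wasteful when changes arrive in bursts (for example, a whole class checking in at once). I implemented a batching layer that collects updates over a short window (10–20ms) and sends them to each client as one payload; status changes go through `batcher.add_message` instead of calling `broadcast_to_class` directly.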

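```python
import asyncio
import time
from typing import Dict, List, Set


class MessageBatcher:
    """Coalesce updates per class; each batch is sent one window after it opens."""

    def __init__(self, batch_window_ms: int = 15, max_batch: int = 50):
        self.batch_window = batch_window_ms / 1000.0
        self.max_batch = max_batch
        self.pending: Dict[str, List[dict]] = {}
        self._timers: Set[asyncio.Task] = set()  # strong refs to timer tasks

    async def add_message(self, class_id: str, payload: dict):
        """Add a message to the batch queue."""
        is_new_batch = class_id not in self.pending
        batch = self.pending.setdefault(class_id, [])
        batch.append(payload)

        if len(batch) >= self.max_batch:
            # Large batch: send now rather than waiting for the timer
            await self.flush(class_id)
        elif is_new_batch:
            # Arm a one-shot timer so a lone update is never left waiting
            task = asyncio.create_task(self._flush_later(class_id, batch))
            self._timers.add(task)
            task.add_done_callback(self._timers.discard)

    async def _flush_later(self, class_id: str, batch: List[dict]):
        await asyncio.sleep(self.batch_window)
        # Skip if this batch was already sent early by the size limit
        if self.pending.get(class_id) is batch:
            await self.flush(class_id)

    async def flush(self, class_id: str):
        """Send all pending messages for a class."""
        # Detach the batch before awaiting, so updates that arrive during
        # the send start a new batch instead of being cleared unsent
        batch = self.pending.pop(class_id, None)
        if not batch:
            return

        await broadcast_to_class(class_id, {
            "type": "batch_update",
            "updates": batch,
            "timestamp": time.time(),
        })


batcher = MessageBatcher(batch_window_ms=15)
```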

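This reduced message volume by ~70% in typical scenarios while keeping latency under 50ms. The cost is that an update can wait up to one batch window (15ms here) before it is sent, so the window has to fit inside the latency budget.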
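How much batching saves depends on how bursty the updates are. The sketch below is a pure-Python simulation, not a measurement from this system: given hypothetical, sorted arrival times in milliseconds, it counts how many batched sends one client receives when a batch opens at its first update and is flushed `window_ms` later, or immediately once it holds `max_batch` updates. The `count_sends` name and the sample arrival times are made up for illustration.

```python
from typing import List


def count_sends(arrivals_ms: List[float], window_ms: float = 15,
                max_batch: int = 50) -> int:
    """Number of batched messages sent for sorted arrival times (ms)."""
    sends = 0
    batch_start = None  # arrival time of the first update in the open batch
    batch_len = 0
    for t in arrivals_ms:
        if batch_start is not None and t >= batch_start + window_ms:
            sends += 1  # the timer flushed the previous batch
            batch_start = None
        if batch_start is None:
            batch_start, batch_len = t, 0  # this update opens a new batch
        batch_len += 1
        if batch_len >= max_batch:
            sends += 1  # size-triggered flush
            batch_start = None
    # Any batch still open is flushed by its timer
    return sends + (1 if batch_start is not None else 0)


burst = [float(i) for i in range(10)]  # 10 check-ins, 1 ms apart
spread = [0.0, 20.0, 40.0]             # updates further apart than the window
print(count_sends(burst), count_sends(spread))  # prints "1 3"
```

A burst collapses into a single send, while sparse updates gain nothing and still pay up to one window of extra delay.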

Client-Side: Efficient State Management

The server isn't the only place to optimize. The client (built with Leaflet.js) caches attendance state locally and only re-renders when the server sends a delta.

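```javascript
class AttendanceManager {
  constructor(classId) {
    this.classId = classId;
    this.localState = new Map(); // student_id -> status
    this.ws = null;
    this.pendingGpsUpdates = [];
    this.gpsFlushInterval = 5000; // Send GPS every 5 seconds
    this.connect();
    // Flush queued GPS fixes on a fixed interval
    this.flushTimer = setInterval(() => this.flushGps(), this.gpsFlushInterval);
  }

  connect() {
    this.ws = new WebSocket(`wss://api.example.com/ws/attendance/${this.classId}`);
    this.ws.onmessage = (event) => this.handleMessage(JSON.parse(event.data));
  }

  handleMessage(payload) {
    // Single deltas and batches both update the local cache
    let updates = [];
    if (payload.type === "batch_update") {
      updates = payload.updates;
    } else if (payload.type === "status_update") {
      updates = [payload];
    }
    if (updates.length === 0) return;

    for (const update of updates) {
      this.localState.set(update.student_id, update.status);
    }
    // Single re-render per message, however many updates it carried
    this.renderMap();
  }

  renderMap() {
    // Update Leaflet markers from this.localState (omitted here)
  }

  sendGpsUpdate(studentId, lat, lon) {
    // Batch GPS updates locally before sending
    this.pendingGpsUpdates.push({ student_id: studentId, lat, lon });
  }

  flushGps() {
    // Keep the queue while the socket is not open yet
    if (this.pendingGpsUpdates.length === 0 || this.ws.readyState !== WebSocket.OPEN) {
      return;
    }
    this.ws.send(JSON.stringify({
      type: "gps_batch",
      updates: this.pendingGpsUpdates,
    }));
    this.pendingGpsUpdates = [];
  }
}

// Example usage; "class-101" is a placeholder class id
const manager = new AttendanceManager("class-101");
```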

Measuring and Validating

I built a latency profiler to track end-to-end timing:

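```python
from dataclasses import dataclass


@dataclass
class LatencyMetric:
    client_send_time: float     # client clock
    server_receive_time: float  # server clock
    server_process_time: float  # server clock
    broadcast_time: float       # server clock
    client_receive_time: float  # client clock

    @property
    def round_trip_ms(self) -> float:
        # Both timestamps come from the client clock, so clock skew cancels
        return (self.client_receive_time - self.client_send_time) * 1000

    @property
    def server_process_ms(self) -> float:
        # Both timestamps come from the server clock
        return (self.broadcast_time - self.server_receive_time) * 1000

    @property
    def network_ms(self) -> float:
        # Everything outside the server (wire, buffering, client handling);
        # a difference of two durations, so also independent of clock skew
        return self.round_trip_ms - self.server_process_ms

# Client includes timestamp in payload
# Server echoes it back with server timestamps
# Client calculates round-trip on echo
```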

After optimization, median latency settled at 38–45ms under normal load, peaking at ~70ms during database contention.
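Turning collected round-trip samples into median and tail figures takes only a few lines of standard library code. The sketch assumes the `round_trip_ms` values have already been gathered into a list; the `summarize` helper is hypothetical, and the input below is a synthetic 1–100 ms range, not data from this system.

```python
import statistics
from typing import Dict, List


def summarize(round_trips_ms: List[float]) -> Dict[str, float]:
    """Median, 95th percentile, and worst case of round-trip samples."""
    if len(round_trips_ms) < 2:
        raise ValueError("need at least two samples")
    # "inclusive" treats the samples as the whole population (min..max)
    cuts = statistics.quantiles(round_trips_ms, n=100, method="inclusive")
    return {
        "p50": statistics.median(round_trips_ms),
        "p95": cuts[94],  # the 95th of 99 cut points
        "max": max(round_trips_ms),
    }


synthetic = [float(ms) for ms in range(1, 101)]  # synthetic samples, 1..100 ms
print(summarize(synthetic))
```

Tracking the maximum next to the percentiles keeps short contention spikes visible even when they are too rare to move the p95.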

Key Takeaways

Delta-only updates: Don't broadcast state that hasn't changed. GPS is noise; attendance status is signal.

Async all the way down: A single blocking database query in your WebSocket handler will stall all 300 connections. Use async SQLAlchemy, async database drivers, and async utilities.

Batch at the edges: Collect updates over milliseconds, send once. Reduces network overhead and client re-renders.

Connection pooling matters: A small pool (here `pool_size=20` plus `max_overflow=10`, so at most 30 connections per process) prevents database exhaustion.

Profile under load: Latency at 10 concurrent users tells you nothing. Test with your actual target concurrency and measure both client and server timing.

WebSocket performance isn't magic—it's discipline. Measure, batch, cache, and keep the event loop clean.

Written by Ansh Gautam

Full-stack engineer building production systems with FastAPI, React, and AI/LLM integrations. Currently looking for backend engineering & AI integration roles.
