Configuring async queues for high-volume claims ingestion

The single decision that governs whether a high-volume ingestion tier survives peak switch traffic is queue depth: a bounded asyncio.Queue with a hard maxsize and deterministic backpressure, versus an unbounded queue that silently absorbs load until garbage-collection thrashing stalls adjudication and claims miss their SLA. This page tunes that decision for the ingestion stage of Asynchronous Batch Adjudication Workflows, where the correctness stakes are concrete — an unbounded queue that drops or delays a B1 billing transaction produces a point-of-sale timeout the pharmacy reads as a hard reject, and a backpressure path that blocks the event loop stalls every in-flight claim behind it. The goal is bounded memory, predictable latency, and a dead-letter path that never loses a payload, tuned specifically for NCPDP D.0 ingestion rather than generic web throughput.

Bounded vs. unbounded queue depth: the tradeoffs

Before parsing the full payload, the ingestion layer only needs the routing fields — 440-E4 (Product/Service ID Qualifier), 407-D7 (Product/Service ID / NDC), and the B4 transaction code for priority — so queue sizing is driven by routing throughput, not payload weight. The table below is the decision matrix for depth and backpressure strategy under sustained load; numbers assume a 2 vCPU / 2 GiB container running four adjudication workers.

Strategy	`maxsize`	Peak RSS	Backpressure behaviour	NCPDP correctness impact
Unbounded `asyncio.Queue()`	none	grows until OOM kill	none — producer never blocks	GC pauses delay `B1`/`B2` past POS timeout → spurious rejects
Bounded, blocking `put()`	`workers × 1.5`	flat, ~65% cap	producer awaits a slot	slow worker suspends the producer indefinitely → head-of-line blocking
Bounded, `wait_for()` + DLQ	`workers × 1.5`	flat, ~65% cap	timeout routes to dead-letter queue	claim preserved with reject `70`/`75`, producer wait bounded by the timeout
Bounded + memory gate	`workers × 1.5`	capped at `memory_threshold`	ingestion pauses above 75% RSS	absorbs formulary-refresh bursts without dropping claims

The bounded queue with a wait_for() timeout and a memory gate is the only combination that keeps RSS flat and guarantees no payload is silently lost. Set maxsize to adjudication worker concurrency multiplied by 1.5 so a transient worker stall has a small buffer without letting the queue grow into GC territory. Claims that fail the timeout are not dropped; they route to a dead-letter queue (DLQ) carrying the NCPDP reject code so downstream reconciliation can replay them. Segment-level parsing rules for those routing fields live in NCPDP D.0 Message Parsing Strategies, and the severity model that decides which reject codes are terminal comes from Schema Validation & Error Categorization.
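
The sizing rule above can be sketched as a small helper. The function name `ingestion_queue_maxsize` and the choice to round up for odd worker counts are assumptions made for illustration; the text only specifies worker concurrency multiplied by 1.5.

```python
import math


def ingestion_queue_maxsize(worker_count: int, headroom: float = 1.5) -> int:
    """Return a queue depth of worker_count * headroom, rounded up.

    Rounding up is an assumption: the sizing rule does not say how to
    handle fractional results such as 3 workers * 1.5 = 4.5.
    """
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    return math.ceil(worker_count * headroom)


# Four adjudication workers, as in the decision matrix above.
print(ingestion_queue_maxsize(4))  # 6
```

Deriving the depth from the worker count, rather than hard-coding it, keeps the buffer proportional if the worker pool is resized.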

Queue topology and deterministic backpressure

Unbounded ingestion pipelines fail under sustained switch bursts. Enforce a hard maxsize on the primary ingestion queue and couple it with a synchronous memory-pressure gate: read RSS from /proc/self/status (or psutil) and pause ingestion when utilisation exceeds 75% of the container’s allocated memory. Wrap queue.put() in asyncio.wait_for() with a strict timeout so a full queue applies time-bounded backpressure. An awaited put() on a full queue suspends only the producer coroutine, not the event loop itself, but without a timeout that producer, and every claim behind it on the inbound stream, waits indefinitely. When the timeout fires, route the payload to the DLQ rather than continuing to wait for a slot.
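
Before the full controller, the timeout-then-shed behaviour described above can be seen in isolation. This is a minimal sketch: `put_or_shed` and `demo` are hypothetical names, the payloads are plain strings rather than claims, and the 0.05-second timeout is only there to keep the demo fast.

```python
import asyncio


async def put_or_shed(queue: asyncio.Queue, dlq: asyncio.Queue, item, timeout: float = 2.0) -> bool:
    """Try to enqueue item; on timeout, shed it to the dead-letter queue.

    Returns True if the item reached the primary queue, False if it was shed.
    """
    try:
        await asyncio.wait_for(queue.put(item), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # The DLQ is unbounded, so put_nowait cannot raise QueueFull here.
        dlq.put_nowait(item)
        return False


async def demo():
    queue = asyncio.Queue(maxsize=1)   # bounded primary queue, nobody draining it
    dlq = asyncio.Queue()              # unbounded dead-letter queue
    first = await put_or_shed(queue, dlq, "claim-1", timeout=0.05)
    second = await put_or_shed(queue, dlq, "claim-2", timeout=0.05)
    return first, second, queue.qsize(), dlq.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (True, False, 1, 1)
```

The second call returns after the timeout instead of waiting for a consumer, and the payload is still held in the DLQ rather than discarded.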

Field-mapping validation happens synchronously at the queue boundary, on the raw bytes, before any full payload expansion. A malformed 440-E4 qualifier or a missing 407-D7 (NDC) triggers immediate rejection with NCPDP reject code 70 (Product/Service Not Covered) or 75 (Prior Authorization Required / missing required field), and the claim is diverted to the DLQ. Only structurally valid claims proceed to the worker pool. Deserialising just the routing segments here — never the member-identifying segments — is also a PHI-minimisation control: sensitive fields like 302-C2 (Cardholder ID) and 310-CA (Patient First Name) stay out of the ingestion layer entirely and are only hydrated downstream where the encryption and audit-logging obligations from Security & Compliance Boundaries for Claims Data apply.

Priority routing and SLA enforcement

Standard FIFO queues starve stat and prior-authorization claims during peak ingestion. Add a secondary asyncio.PriorityQueue keyed on the B4 transaction code, where a lower integer means a tighter SLA (for example 1 for stat, 5 for prior-auth, 100 for standard retail; the reference controller below collapses stat and prior-auth into a single priority of 10). Workers drain the priority queue first and fall back to the standard queue, so an SLA-bound claim never waits behind a retroactive reprice batch, and because everything runs on one event loop this needs no locks and causes no thread contention. Because a PriorityQueue compares tuples element by element, always insert (priority, monotonic_seq, claim) so that ties between equal-priority claims are broken by the sequence counter and Python never has to compare two ClaimPayload objects, which define no ordering.
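
The tuple-ordering rule can be checked with a few lines. This sketch uses a stand-in `Claim` dataclass (hypothetical, not the controller's ClaimPayload) and the example priorities 1, 5, and 100 from the paragraph above.

```python
import asyncio
import itertools
from dataclasses import dataclass


@dataclass
class Claim:
    # A plain dataclass defines == but not <, so two Claims cannot be ordered.
    label: str


async def drain_in_priority_order():
    pq = asyncio.PriorityQueue()
    seq = itertools.count()  # monotonic tiebreaker for equal priorities
    arrivals = [(100, "retail-a"), (1, "stat-a"), (5, "prior-auth"), (1, "stat-b")]
    for priority, label in arrivals:
        pq.put_nowait((priority, next(seq), Claim(label)))

    order = []
    while not pq.empty():
        _, _, claim = pq.get_nowait()
        order.append(claim.label)
    return order


if __name__ == "__main__":
    print(asyncio.run(drain_in_priority_order()))
    # ['stat-a', 'stat-b', 'prior-auth', 'retail-a']
```

The two stat claims come out in arrival order because the sequence number decides the tie before the comparison ever reaches the Claim objects.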

Memory optimisation relies on object pooling and zero-copy slicing: use memoryview over the raw byte stream to avoid string allocation during segment extraction, and flush the batch buffer to the adjudication API when it reaches 5,000 records or 30 seconds elapse, whichever comes first. The official asyncio queue documentation covers the coroutine-safe producer-consumer contract these queues rely on.
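
As an illustration of the zero-copy slicing mentioned above, the sketch below returns a memoryview window over the original buffer instead of copying bytes out. The helper name `slice_after_marker`, the `D7=` marker, and the pipe-delimited sample are all made up for the example and do not represent real D.0 framing.

```python
from typing import Optional


def slice_after_marker(raw: bytes, marker: bytes, width: int) -> Optional[memoryview]:
    """Return a zero-copy view of the width bytes that follow marker, or None."""
    start = raw.find(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    # Slicing a memoryview creates a new view over the same buffer; no bytes
    # are copied until the caller explicitly materialises them.
    return memoryview(raw)[begin:begin + width]


if __name__ == "__main__":
    sample = b"HDR|D7=12345678901|E4=03|"   # hypothetical layout and placeholder NDC
    view = slice_after_marker(sample, b"D7=", 11)
    print(bytes(view))         # b'12345678901' (copy made only here, for display)
    print(view.obj is sample)  # True: the view still points at the original buffer
```

One design consequence: a view keeps the whole underlying buffer alive, so a long-lived view over a large payload pins that payload in memory until the view is released.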

Figure: Dual-queue priority routing — validated claims split into a priority queue (B4 stat / prior-auth) or the standard queue, workers drain the priority queue first, and the batch buffer flushes to adjudication at 5,000 records or 30 seconds.

Figure: The four queue-depth strategies from the decision matrix — unbounded (RSS climbs to an OOM kill), bounded-blocking (a slow worker stalls the event loop), bounded wait_for() + DLQ (flat RSS, timed-out claims routed to the dead-letter queue), and bounded + memory gate (ingestion pauses at a 75% RSS line) — each with its NCPDP correctness outcome.
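
The size-or-age flush rule from this section (5,000 records or 30 seconds, whichever comes first) can be sketched as a small buffer with an injectable clock so it is testable without sleeping. `BatchBuffer` is a hypothetical helper, and it measures age from the first record of the current batch, which is an assumption; the controller later on this page measures from the last flush instead.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Accumulate records and hand back a batch when it is full or old enough."""

    def __init__(self, max_records: int = 5000, max_age_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._items: List[object] = []
        self._max_records = max_records
        self._max_age_s = max_age_s
        self._clock = clock
        self._opened_at = clock()

    def add(self, item: object) -> Optional[list]:
        """Append one record; return the batch if this made it due, else None."""
        if not self._items:
            # Age is counted from the first record of the batch (assumption).
            self._opened_at = self._clock()
        self._items.append(item)
        return self.take_if_due()

    def take_if_due(self) -> Optional[list]:
        """Return and reset the batch if the size or age limit is reached."""
        if not self._items:
            return None
        too_big = len(self._items) >= self._max_records
        too_old = (self._clock() - self._opened_at) >= self._max_age_s
        if not (too_big or too_old):
            return None
        batch, self._items = self._items, []
        return batch
```

Calling `take_if_due()` from an idle timer as well as from `add()` is what lets the age limit fire when no new claims are arriving.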

Step-by-step implementation

The controller below implements bounded queues, a memory-aware backpressure gate, structured rejection routing, and priority-aware dispatch. Follow the numbered stages; each maps to a method on ClaimsIngestionController.

1. Size the queues to worker_count × 1.5 and create a separate unbounded DLQ so a full primary queue can always shed load without blocking.
2. Gate on memory by comparing RSS from /proc/self/status against a percentage of total container memory before every enqueue.
3. Validate on raw bytes: check the routing segments and derive priority before any full-payload hydration, so PHI never enters the ingestion buffer.
4. Dispatch with a timeout, routing timed-out or invalid claims to the DLQ with their NCPDP reject code.
5. Drain priority-first in the workers, buffer, and flush on size or age.

python

import asyncio
import os
import gc
import time
import itertools
import logging
from typing import Dict, Any, Optional
from dataclasses import dataclass
from collections import deque

logger = logging.getLogger(__name__)


@dataclass
class ClaimPayload:
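    # raw_bytes is meant to hold ONLY the routing segments (440-E4, 407-D7, B4).
    # Member identifiers (302-C2 Cardholder ID, 310-CA Patient First Name) must
    # not be placed here; they are hydrated downstream under audit controls.
    # Note: parse_and_validate() below wraps the whole inbound buffer for
    # brevity, so slice it down to the routing segments before enqueueing.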
    raw_bytes: memoryview
    routing_data: Dict[str, Any]
    priority: int = 100
    is_valid: bool = True
    reject_code: Optional[str] = None  # NCPDP 511-FB reject code on failure


class ClaimsIngestionController:
    def __init__(self, max_queue_size: int = 6, memory_threshold_pct: float = 75.0, worker_count: int = 4):
        # maxsize = worker_count * 1.5 keeps a small buffer without drifting into
        # GC-thrash territory. Priority + standard queues are sized identically.
        self.queue = asyncio.Queue(maxsize=max_queue_size)
        self.priority_queue = asyncio.PriorityQueue(maxsize=max_queue_size)
        self.dlq = asyncio.Queue()  # unbounded: shedding load must never block
        self.memory_threshold = memory_threshold_pct
        self.worker_count = worker_count
        self._shutdown = asyncio.Event()
        # No maxlen: a bounded deque would silently evict the oldest claim once
        # full, dropping payloads before they are flushed to adjudication.
        self._batch_buffer: deque = deque()
        self._last_flush = time.monotonic()
        # Monotonic tiebreaker so equal-priority items never force a comparison
        # of (uncomparable) ClaimPayload objects inside the PriorityQueue.
        self._seq = itertools.count()

    def _get_rss_mb(self) -> float:
        try:
            with open("/proc/self/status", "r") as f:
                for line in f:
                    if line.startswith("VmRSS:"):
                        return int(line.split()[1]) / 1024
        except FileNotFoundError:
            return 0.0
        return 0.0

    def _check_memory_pressure(self) -> bool:
        try:
            total_mem_mb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / (1024 * 1024)
            if total_mem_mb == 0:
                return False
            return (self._get_rss_mb() / total_mem_mb) * 100 > self.memory_threshold
        except (AttributeError, ValueError):
            return False

    async def parse_and_validate(self, raw_stream: bytes) -> ClaimPayload:
        # Zero-copy slice over the raw bytes — no str allocation per segment.
        mv = memoryview(raw_stream)
        routing: Dict[str, Any] = {}
        is_valid = True
        reject_code = None

        # Synchronous boundary validation on the routing segments only.
        if b"407-D7" not in raw_stream:            # missing NDC / Product-Service ID
            is_valid = False
            reject_code = "75"                      # NCPDP: missing required field
        elif b"440-E4" not in raw_stream:          # missing Product/Service ID Qualifier
            is_valid = False
            reject_code = "70"                      # NCPDP: product/service not covered

        # Priority from B4 transaction code: 01/02 = stat / prior-auth.
        priority = 100
        if b"B4=01" in raw_stream or b"B4=02" in raw_stream:
            priority = 10

        return ClaimPayload(
            raw_bytes=mv,
            routing_data=routing,
            priority=priority,
            is_valid=is_valid,
            reject_code=reject_code,
        )

    async def ingest(self, raw_stream: bytes):
        # PHI guardrail: never log raw_stream — it may carry 302-C2 / 310-CA.
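        if self._check_memory_pressure():
            logger.warning("Memory pressure threshold breached. Applying backpressure.")
            # Pause, then fall through to validation and enqueue. Returning here
            # would silently drop the claim, which the design forbids.
            await asyncio.sleep(0.5)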

        claim = await self.parse_and_validate(raw_stream)

        if not claim.is_valid:
            await self.dlq.put(claim)
            logger.info("Routed to DLQ. Reject Code: %s", claim.reject_code)  # code only, no payload
            return

        try:
            if claim.priority <= 10:
                await asyncio.wait_for(
                    self.priority_queue.put((claim.priority, next(self._seq), claim)),
                    timeout=2.0,
                )
            else:
                await asyncio.wait_for(self.queue.put(claim), timeout=2.0)
        except asyncio.TimeoutError:
            # Queue full: shed to DLQ instead of blocking the event loop.
            logger.error("Queue full. Backpressure triggered. Routing to DLQ.")
            await self.dlq.put(claim)

    async def _process_worker(self, worker_id: int):
        while not self._shutdown.is_set():
            try:
                if not self.priority_queue.empty():
                    _, _, claim = await self.priority_queue.get()
                    source_queue = self.priority_queue
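                else:
                    # Caveat: while parked here, a newly arrived priority claim
                    # waits until this get() returns or times out. Shorten the
                    # timeout if stat latency matters more than idle wake-ups.
                    claim = await asyncio.wait_for(self.queue.get(), timeout=5.0)
                    source_queue = self.queue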

                # Downstream stage owns full 407-D7 -> GPI resolution, formulary
                # lookup, and adjudication; ingestion only buffers routing shells.
                self._batch_buffer.append(claim)
                source_queue.task_done()

                now = time.monotonic()
                if len(self._batch_buffer) >= 5000 or (now - self._last_flush) >= 30.0:
                    await self._flush_batch()

                if self._check_memory_pressure():
                    gc.collect()
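            except asyncio.TimeoutError:
                # Idle poll: still honour the 30-second age flush so a partial
                # batch does not sit in the buffer while no claims arrive.
                if time.monotonic() - self._last_flush >= 30.0:
                    await self._flush_batch()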

    async def _flush_batch(self):
        if not self._batch_buffer:
            return
        batch = list(self._batch_buffer)
        self._batch_buffer.clear()
        self._last_flush = time.monotonic()
        logger.info("Flushing batch of %d claims to adjudication API.", len(batch))
        # Dispatch to the adjudication API (HTTP/gRPC) under the rate limiter.

    async def start_workers(self):
        tasks = [asyncio.create_task(self._process_worker(i)) for i in range(self.worker_count)]
        await asyncio.gather(*tasks)

    async def run(self):
        await self.start_workers()

The downstream calls that _flush_batch makes are governed by the token-bucket limiter in PBM API Sync & Rate Limiting and the circuit breakers in Fallback Routing Logic Design; resolving each 407-D7 to a stable GPI before pricing is the job of NDC to GPI Crosswalk Automation.

Verification and testing pattern

Backpressure correctness is not observable from a happy-path run — you have to assert that an over-full queue sheds to the DLQ deterministically and that priority claims land ahead of standard ones. Drive the controller with known NCPDP fixtures and assert on the DLQ and queue ordering rather than on timing.

python

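import asyncio
import pytest  # the asyncio marker below requires the pytest-asyncio plugin

# ClaimsIngestionController is the controller class defined above; import it
# from whichever module you saved it in.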

# Minimal NCPDP-shaped fixtures — routing segments only, no PHI.
STAT_CLAIM = b"...440-E4...407-D7...B4=01..."     # priority path (stat)
STD_CLAIM = b"...440-E4...407-D7...B4=03..."      # standard retail path
BAD_NDC = b"...440-E4...B4=03..."                 # missing 407-D7 -> reject 75


@pytest.mark.asyncio
async def test_missing_ndc_routes_to_dlq_with_reject_75():
    c = ClaimsIngestionController(max_queue_size=2, worker_count=1)
    await c.ingest(BAD_NDC)
    claim = c.dlq.get_nowait()
    assert claim.is_valid is False
    assert claim.reject_code == "75"          # NCPDP missing-required-field
    assert c.queue.empty() and c.priority_queue.empty()


@pytest.mark.asyncio
async def test_full_queue_sheds_to_dlq_without_blocking():
    # maxsize=1 with no workers draining: the 2nd standard claim must time out.
    c = ClaimsIngestionController(max_queue_size=1, worker_count=1)
    await asyncio.wait_for(c.ingest(STD_CLAIM), timeout=3.0)   # fills the slot
    await asyncio.wait_for(c.ingest(STD_CLAIM), timeout=3.0)   # must NOT hang
    assert c.dlq.qsize() == 1                 # shed, not lost, not blocked


@pytest.mark.asyncio
async def test_stat_claim_takes_priority_tuple():
    c = ClaimsIngestionController(max_queue_size=4, worker_count=1)
    await c.ingest(STAT_CLAIM)
    priority, seq, claim = c.priority_queue.get_nowait()
    assert priority == 10                     # B4=01 mapped to priority path
    assert isinstance(seq, int)               # tiebreaker present, no obj compare

The second test is the load-bearing one: it proves the wait_for() timeout converts a full queue into a DLQ route instead of a hung coroutine — the exact failure the unbounded strategy hides. Run it in CI with a hard asyncio.wait_for wrapper so a regression that reintroduces blocking put() shows up as a test timeout, not a silent stall in production.

Gotchas and PHI guardrails

Never log the raw stream. raw_stream and claim.raw_bytes can carry 302-C2 (Cardholder ID) and 310-CA (Patient First Name). Log the reject code and a hashed routing key only — never the payload, even in the DLQ path.
Unbounded batch buffer, not unbounded queue. The primary queue must be bounded, but the _batch_buffer deque must not use maxlen — a bounded deque silently evicts the oldest claim on overflow, dropping a payload before it is flushed. Bound it by flush cadence (size or age), not by capacity.
PriorityQueue tuple ordering. Enqueue (priority, seq, claim). Omit the monotonic seq and two equal-priority claims force Python to compare ClaimPayload objects, raising TypeError mid-drain and stalling the worker.
Memory gate must use container limits, not host. The controller above derives total memory from SC_PHYS_PAGES, which reports host memory and so overstates the ceiling inside a cgroup-limited container. Read the cgroup memory limit (/sys/fs/cgroup/memory.max on cgroup v2) where available so the 75% gate reflects the real OOM boundary.
DLQ is not a graveyard. A claim in the DLQ still has an SLA. Reconcile DLQ contents against inbound 835 remittance and replay transient rejects; a missing 407-D7 is terminal, but a queue-full shed is retryable once pressure clears.
Decimal downstream, not here. Ingestion carries no money, but the adjudication stage it feeds must use decimal.Decimal with explicit quantize for every copay and MAC value — binary float drifts pennies that fail 835 reconciliation.
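
For the memory-gate gotcha above, a minimal sketch of reading the container limit might look like the following. It assumes cgroup v2 (the /sys/fs/cgroup/memory.max path named above) and falls back to host memory when no limit is set or the file is absent; the function names are hypothetical and cgroup v1 is not handled.

```python
import os
from pathlib import Path
from typing import Optional

CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")


def container_memory_limit_bytes(limit_file: Path = CGROUP_V2_LIMIT) -> Optional[int]:
    """Return the cgroup v2 memory limit in bytes, or None if unlimited or unreadable."""
    try:
        raw = limit_file.read_text().strip()
    except OSError:
        return None
    if raw == "max":  # cgroup v2 writes the literal "max" when no limit is set
        return None
    try:
        return int(raw)
    except ValueError:
        return None


def effective_memory_limit_bytes(limit_file: Path = CGROUP_V2_LIMIT) -> int:
    """Prefer the container limit; fall back to host physical memory."""
    limit = container_memory_limit_bytes(limit_file)
    if limit is not None:
        return limit
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def gate_tripped(rss_bytes: int, limit_bytes: int, threshold_pct: float = 75.0) -> bool:
    """True when RSS exceeds threshold_pct of the effective memory limit."""
    return rss_bytes / limit_bytes * 100.0 > threshold_pct
```

In the controller, the value from `effective_memory_limit_bytes()` would replace the SC_PHYS_PAGES product inside `_check_memory_pressure`, so the percentage is measured against the limit the OOM killer actually enforces.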

Related

Asynchronous Batch Adjudication Workflows — parent workflow: how queued claims are dispatched, adjudicated, and reconciled.
NCPDP D.0 Message Parsing Strategies — how the routing segments this queue reads are extracted from raw D.0 bytes.
Schema Validation & Error Categorization — the severity model that decides which reject codes are terminal vs. retryable.
PBM API Sync & Rate Limiting — token-bucket control for the downstream calls the batch flush makes.
Fallback Routing Logic Design — circuit breakers that keep the worker pool resilient when adjudication endpoints degrade.
