Technical Architecture

Ingest Architecture: From File Drop to Searchable Asset

The moment a file enters the system is the most important moment in its lifecycle. Everything that happens downstream — search, curation, deduplication, portfolio distillation — depends on what the ingest pipeline captures. A well-designed ingest architecture extracts maximum information at minimum cost, transforms raw files into richly indexed assets, and does it fast enough that the artist never waits.

February 25, 2026 · 11 min · Numonic Team

An artist finishes a ComfyUI session and drags two hundred images into their asset manager. Within seconds, they expect to see thumbnails. Within minutes, they expect to search by prompt text. Within an hour, they expect the system to have identified duplicates, clustered sessions, and surfaced the best work. The ingest pipeline is the engine that makes all of this possible — transforming raw files into richly indexed, searchable, deduplicated assets through a sequence of extraction, transformation, and indexing stages.

The architecture of the ingest pipeline determines the ceiling for every feature built on top of it. If the pipeline misses embedded generation parameters, prompt-based search is impossible. If it skips content hashing, deduplication cannot work. If it processes files sequentially, bulk imports take hours instead of minutes. Every design decision in the ingest pipeline cascades through the entire system.

The Forces at Work

  • Generation tools embed metadata inconsistently: ComfyUI stores complete workflow JSON in image metadata. Midjourney provides prompt text but not generation parameters. Stable Diffusion WebUI uses a custom text format. DALL-E provides almost nothing. The ingest pipeline must handle all of these formats, extracting maximum information from whatever is available.
  • Bulk imports are the norm, not the exception: Artists do not upload one image at a time. They import entire session folders — two hundred, five hundred, a thousand images. The pipeline must handle bulk imports efficiently, processing files in parallel without degrading the experience for files already in the system.
  • Availability trumps completeness: An artist who just finished a session wants to see their work immediately. They will tolerate incomplete search results for a few minutes; they will not tolerate waiting ten minutes for thumbnails to appear. The pipeline must prioritize fast visual availability over complete indexing.
  • Extraction is the only chance: Metadata embedded in image files is fragile. If the artist re-exports, crops, or converts the file, the embedded metadata may be lost. The ingest pipeline must capture everything at the point of entry, because there may not be a second opportunity.

The Problem

Traditional DAM systems treat ingestion as a simple file upload: store the bytes, generate a thumbnail, done. This works for photography where EXIF data is standardized, but fails for generative AI where the most valuable metadata — the prompt, the model, the generation parameters — is embedded in tool-specific formats that require specialized parsing. Without this metadata, the asset is just a nameless image in a sea of nameless images.

The Solution: Staged Extraction Pipeline

The ingest architecture separates processing into synchronous stages (must complete before the user sees the asset) and asynchronous stages (complete in the background while the user works). This split ensures fast availability without sacrificing depth of analysis.
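The split can be sketched in a few lines. This is an illustration, not Numonic's actual code: the `Asset` fields and `ingest` signature are hypothetical, and the synchronous stages are stubbed down to the content hash. The key property is that `ingest` returns as soon as the synchronous work is done, while enrichment is merely enqueued.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Asset:
    # Illustrative record; a real schema carries far more fields.
    content_hash: str
    metadata: dict = field(default_factory=dict)
    thumbnails: dict = field(default_factory=dict)
    enriched: bool = False

def ingest(file_bytes: bytes, enrichment_queue: list) -> Asset:
    # Synchronous stages (hash, metadata, thumbnails) complete before
    # returning; here they are reduced to hashing alone.
    asset = Asset(content_hash=hashlib.sha256(file_bytes).hexdigest())
    # Asynchronous stages are only enqueued -- the caller never waits.
    enrichment_queue.append(asset)
    return asset

queue: list = []
asset = ingest(b"raw image bytes", queue)
```

The caller sees a complete, browsable asset immediately; everything expensive happens later against the queue.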

Stage 1: Content Hashing and Deduplication

The first operation on any incoming file is computing its content hash. If the hash already exists in the system, the file is a duplicate — no new storage is consumed, and the import creates a reference to the existing asset. This stage is synchronous and takes milliseconds per file, even for large images. For a bulk import of five hundred images with twenty percent duplicates, this stage saves one hundred file writes before any other processing begins.
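A minimal sketch of this stage, assuming a SHA-256 content hash and an in-memory hash-to-asset map standing in for the real store:

```python
import hashlib

class DedupStore:
    """Sketch: map content hash -> canonical asset id."""
    def __init__(self):
        self._by_hash = {}
        self._next_id = 0

    def ingest(self, data: bytes):
        h = hashlib.sha256(data).hexdigest()
        if h in self._by_hash:
            # Duplicate: reference the existing asset, write nothing new.
            return self._by_hash[h], True
        self._next_id += 1
        self._by_hash[h] = self._next_id
        return self._next_id, False

store = DedupStore()
first, dup_first = store.ingest(b"image-bytes-1")
second, dup_second = store.ingest(b"image-bytes-1")  # identical bytes
```

Both calls resolve to the same asset id; only the first consumes storage or triggers downstream stages.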

Stage 2: Metadata Extraction

For files that pass the deduplication check, the pipeline extracts embedded metadata using tool-specific parsers. ComfyUI images yield complete workflow graphs with every node, connection, and parameter. Midjourney exports provide prompt text and job identifiers. Standard EXIF and IPTC fields are captured for all image formats. The extracted metadata is normalized into a common schema and stored alongside the asset, enabling cross-tool search regardless of origin.
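A parser-dispatch sketch of the normalization step. The parser functions, raw field names (e.g. `"Description"`, `"Job ID"`), and the output schema are all assumptions for illustration; real tool formats require far more careful parsing.

```python
import json

def parse_comfyui(raw: dict) -> dict:
    # ComfyUI embeds the full workflow graph as JSON in the image.
    return {"tool": "comfyui",
            "workflow": json.loads(raw.get("workflow", "{}")),
            "prompt": raw.get("prompt")}

def parse_midjourney(raw: dict) -> dict:
    # Hypothetical field names standing in for Midjourney's export format.
    return {"tool": "midjourney",
            "prompt": raw.get("Description"),
            "job_id": raw.get("Job ID")}

PARSERS = {"comfyui": parse_comfyui, "midjourney": parse_midjourney}

def extract_metadata(tool_hint: str, raw: dict) -> dict:
    """Normalize tool-specific metadata into one common schema."""
    base = {"tool": tool_hint or "unknown", "prompt": None}
    parser = PARSERS.get(tool_hint)
    if parser:
        base.update(parser(raw))
    return base

meta = extract_metadata("midjourney", {"Description": "violet orbs",
                                       "Job ID": "abc-123"})
```

Unknown tools fall through to the base schema, so an asset with no recognized parser still gets a well-formed metadata record.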

Stage 3: Thumbnail Generation

Multiple thumbnail sizes are generated synchronously — browse size for grid views, preview size for detail panels, and a tiny placeholder for progressive loading. These thumbnails are what the artist sees within seconds of import. The original full-resolution file is stored but not served for browsing, keeping the interface responsive even for libraries with tens of thousands of high-resolution images.
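The size computation behind multi-tier thumbnails is simple to state precisely. The tier names and pixel edges below are hypothetical; the `fit_within` function mirrors the aspect-ratio-preserving behavior of a typical thumbnail call (e.g. Pillow's `Image.thumbnail`):

```python
def fit_within(width: int, height: int, max_edge: int) -> tuple:
    # Scale down (never up) so the longest edge fits in max_edge,
    # preserving aspect ratio.
    scale = min(max_edge / width, max_edge / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

# Hypothetical tiers: progressive-loading placeholder, grid, detail panel.
SIZES = {"placeholder": 32, "browse": 256, "preview": 1024}

def thumbnail_plan(width: int, height: int) -> dict:
    return {name: fit_within(width, height, edge)
            for name, edge in SIZES.items()}

plan = thumbnail_plan(4096, 2048)
# e.g. a 4096x2048 original yields (256, 128) at the browse tier
```

Serving only these derived sizes keeps grid scrolling cheap regardless of how large the originals are.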

Stage 4: Asynchronous Enrichment

After the synchronous stages complete — typically within seconds per file — the asset enters the asynchronous enrichment queue. This is where cost-aware processing determines the depth of analysis. Assets promoted to higher processing tiers receive visual embedding generation for similarity search, quality assessment, style classification, and session clustering. This enrichment happens in the background, progressively improving the asset's searchability and usefulness for curation over a span of minutes to hours.
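Tiered scheduling can be sketched with a priority queue; the tier numbering and class interface here are illustrative assumptions, not a real scheduler:

```python
import heapq

class EnrichmentQueue:
    """Sketch: lower tier number = deeper analysis, processed first."""
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a tier

    def submit(self, asset_id: str, tier: int):
        heapq.heappush(self._heap, (tier, self._counter, asset_id))
        self._counter += 1

    def next_asset(self):
        if not self._heap:
            return None
        _, _, asset_id = heapq.heappop(self._heap)
        return asset_id

q = EnrichmentQueue()
q.submit("bulk-001", tier=2)
q.submit("promoted-hero", tier=0)  # promoted asset jumps the queue
q.submit("bulk-002", tier=2)
```

Background workers drain the queue in priority order, so promoted assets get embeddings and quality scores first while bulk imports fill in behind them.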

Stage 5: Index Update

As each stage completes, the search index is updated incrementally. After Stage 2, the asset becomes searchable by prompt text and generation parameters. After Stage 4, it becomes findable by visual similarity and quality filters. This incremental indexing means the asset's discoverability improves continuously after import, without requiring a full re-index of the library.
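Incremental indexing amounts to upserting fields into the same document as stages finish. The index class and field names below are a toy stand-in for a real search engine:

```python
class SearchIndex:
    """Sketch: each pipeline stage merges fields into one document,
    so discoverability grows without re-indexing the library."""
    def __init__(self):
        self._docs = {}

    def upsert(self, asset_id: str, fields: dict):
        self._docs.setdefault(asset_id, {}).update(fields)

    def search_prompt(self, term: str):
        return [a for a, d in self._docs.items()
                if term.lower() in str(d.get("prompt", "")).lower()]

idx = SearchIndex()
idx.upsert("a1", {"prompt": "violet orbs, black backdrop"})  # after Stage 2
idx.upsert("a1", {"quality": 0.92, "has_embedding": True})   # after Stage 4
```

After the first upsert the asset is text-searchable; the second adds quality and similarity fields to the same document in place.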

Consequences

  • Fast import, progressive enrichment: Artists see their imports within seconds. Search and similarity features become available progressively as async stages complete. This matches the natural workflow: import, browse recent work, then search for specific assets — by the time search is needed, the indexing is complete.
  • Parser maintenance burden: Supporting multiple generation tool formats means maintaining multiple specialized parsers. When ComfyUI changes its metadata format, or a new generation tool gains popularity, the pipeline needs a new or updated parser. This maintenance cost is ongoing and scales with the number of supported tools.
  • Metadata format fragility: Some tools embed metadata in ways that are easily stripped. Image conversion, social media upload, or even some editing tools remove embedded metadata. Assets that arrive without metadata still function — they get thumbnails and content hashes — but lose the rich searchability that metadata enables.
  • Queue management complexity: The asynchronous enrichment queue must handle priority scheduling, retry logic for failed processing, and backpressure during bulk imports. Batch processing patterns address this complexity, but the queue itself adds operational overhead compared to a simple synchronous pipeline.
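The retry logic mentioned above is one concrete source of that overhead. A minimal sketch with exponential backoff, where the delays are returned rather than slept so the example runs instantly; the function names and attempt counts are illustrative:

```python
def process_with_retry(task, worker, max_attempts=3):
    """Retry a flaky worker, recording exponential backoff delays."""
    delays = []
    for attempt in range(max_attempts):
        try:
            return worker(task), delays
        except Exception:
            delays.append(2 ** attempt)  # would wait 1s, 2s, 4s, ...
    return None, delays  # exhausted: surface as a failed enrichment

calls = {"n": 0}
def flaky_enrich(task):
    # Simulate two transient failures before success.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return f"enriched:{task}"

result, backoffs = process_with_retry("asset-42", flaky_enrich)
```

A production queue layers backpressure on top of this, for example by bounding queue depth and shedding low-tier work first during bulk imports.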

Import Fast, Search Everything

Numonic's ingest pipeline extracts every piece of metadata from your generations — making every prompt, parameter, and workflow searchable from the moment of import.
