Technical Architecture

Ingest Architecture: From File Drop to Searchable Asset

The moment a file enters the system is the most important moment in its lifecycle. Everything that happens downstream — search, curation, deduplication, portfolio distillation — depends on what the ingest pipeline captures. A well-designed ingest architecture extracts maximum information at minimum cost, transforms raw files into richly indexed assets, and does it fast enough that the artist never waits.

February 25, 2026 · 11 min · Numonic Team

An artist finishes a ComfyUI session and drags two hundred images into their asset manager. Within seconds, they expect to see thumbnails. Within minutes, they expect to search by prompt text. Within an hour, they expect the system to have identified duplicates, clustered sessions, and surfaced the best work. The ingest pipeline is the engine that makes all of this possible — transforming raw files into richly indexed, searchable, deduplicated assets through a sequence of extraction, transformation, and indexing stages.

The architecture of the ingest pipeline determines the ceiling for every feature built on top of it. If the pipeline misses embedded generation parameters, prompt-based search is impossible. If it skips content hashing, deduplication cannot work. If it processes files sequentially, bulk imports take hours instead of minutes. Every design decision in the ingest pipeline cascades through the entire system.

The Forces at Work

  • Generation tools embed metadata inconsistently: ComfyUI stores complete workflow JSON in image metadata. Midjourney provides prompt text but not generation parameters. Stable Diffusion WebUI uses a custom text format. DALL-E provides almost nothing. The ingest pipeline must handle all of these formats, extracting maximum information from whatever is available.
  • Bulk imports are the norm, not the exception: Artists do not upload one image at a time. They import entire session folders — two hundred, five hundred, a thousand images. The pipeline must handle bulk imports efficiently, processing files in parallel without degrading the experience for files already in the system.
  • Availability trumps completeness: An artist who just finished a session wants to see their work immediately. They will tolerate incomplete search results for a few minutes; they will not tolerate waiting ten minutes for thumbnails to appear. The pipeline must prioritize fast visual availability over complete indexing.
  • Extraction is the only chance: Metadata embedded in image files is fragile. If the artist re-exports, crops, or converts the file, the embedded metadata may be lost. The ingest pipeline must capture everything at the point of entry, because there may not be a second opportunity.

The Problem

Traditional DAM systems treat ingestion as a simple file upload: store the bytes, generate a thumbnail, done. This works for photography where EXIF data is standardized, but fails for generative AI where the most valuable metadata — the prompt, the model, the generation parameters — is embedded in tool-specific formats that require specialized parsing. Without this metadata, the asset is just a nameless image in a sea of nameless images.

The Solution: Staged Extraction Pipeline

The ingest architecture separates processing into synchronous stages (must complete before the user sees the asset) and asynchronous stages (complete in the background while the user works). This split ensures fast availability without sacrificing depth of analysis.
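The split can be sketched in a few lines. This is an illustration, not Numonic's actual code: the `Asset` fields and `ingest` signature are hypothetical, and the synchronous stages are stubbed down to the content hash. The key property is that `ingest` returns as soon as the synchronous work is done, while enrichment is merely enqueued.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Asset:
    # Illustrative record; a real schema carries far more fields.
    content_hash: str
    metadata: dict = field(default_factory=dict)
    thumbnails: dict = field(default_factory=dict)
    enriched: bool = False

def ingest(file_bytes: bytes, enrichment_queue: list) -> Asset:
    # Synchronous stages (hash, metadata, thumbnails) complete before
    # returning; here they are reduced to hashing alone.
    asset = Asset(content_hash=hashlib.sha256(file_bytes).hexdigest())
    # Asynchronous stages are only enqueued -- the caller never waits.
    enrichment_queue.append(asset)
    return asset

queue: list = []
asset = ingest(b"raw image bytes", queue)
```

The caller sees a complete, browsable asset immediately; everything expensive happens later against the queue.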

Stage 1: Content Hashing and Deduplication

The first operation on any incoming file is computing its content hash. If the hash already exists in the system, the file is a duplicate — no new storage is consumed, and the import creates a reference to the existing asset. This stage is synchronous and takes milliseconds per file, even for large images. For a bulk import of five hundred images with twenty percent duplicates, this stage saves one hundred file writes before any other processing begins.
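A minimal sketch of this stage, assuming a SHA-256 content hash and an in-memory hash-to-asset map standing in for the real store:

```python
import hashlib

class DedupStore:
    """Sketch: map content hash -> canonical asset id."""
    def __init__(self):
        self._by_hash = {}
        self._next_id = 0

    def ingest(self, data: bytes):
        h = hashlib.sha256(data).hexdigest()
        if h in self._by_hash:
            # Duplicate: reference the existing asset, write nothing new.
            return self._by_hash[h], True
        self._next_id += 1
        self._by_hash[h] = self._next_id
        return self._next_id, False

store = DedupStore()
first, dup_first = store.ingest(b"image-bytes-1")
second, dup_second = store.ingest(b"image-bytes-1")  # identical bytes
```

Both calls resolve to the same asset id; only the first consumes storage or triggers downstream stages.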

Stage 2: Metadata Extraction

For files that pass the deduplication check, the pipeline extracts embedded metadata using tool-specific parsers. ComfyUI images yield complete workflow graphs with every node, connection, and parameter. Midjourney exports provide prompt text and job identifiers. Standard EXIF and IPTC fields are captured for all image formats. The extracted metadata is normalized into a common schema and stored alongside the asset, enabling cross-tool search regardless of origin.
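A parser-dispatch sketch of the normalization step. The parser functions, raw field names (e.g. `"Description"`, `"Job ID"`), and the output schema are all assumptions for illustration; real tool formats require far more careful parsing.

```python
import json

def parse_comfyui(raw: dict) -> dict:
    # ComfyUI embeds the full workflow graph as JSON in the image.
    return {"tool": "comfyui",
            "workflow": json.loads(raw.get("workflow", "{}")),
            "prompt": raw.get("prompt")}

def parse_midjourney(raw: dict) -> dict:
    # Hypothetical field names standing in for Midjourney's export format.
    return {"tool": "midjourney",
            "prompt": raw.get("Description"),
            "job_id": raw.get("Job ID")}

PARSERS = {"comfyui": parse_comfyui, "midjourney": parse_midjourney}

def extract_metadata(tool_hint: str, raw: dict) -> dict:
    """Normalize tool-specific metadata into one common schema."""
    base = {"tool": tool_hint or "unknown", "prompt": None}
    parser = PARSERS.get(tool_hint)
    if parser:
        base.update(parser(raw))
    return base

meta = extract_metadata("midjourney", {"Description": "violet orbs",
                                       "Job ID": "abc-123"})
```

Unknown tools fall through to the base schema, so an asset with no recognized parser still gets a well-formed metadata record.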

Stage 3: Thumbnail Generation

Multiple thumbnail sizes are generated synchronously — browse size for grid views, preview size for detail panels, and a tiny placeholder for progressive loading. These thumbnails are what the artist sees within seconds of import. The original full-resolution file is stored but not served for browsing, keeping the interface responsive even for libraries with tens of thousands of high-resolution images.
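The size computation behind multi-tier thumbnails is simple to state precisely. The tier names and pixel edges below are hypothetical; the `fit_within` function mirrors the aspect-ratio-preserving behavior of a typical thumbnail call (e.g. Pillow's `Image.thumbnail`):

```python
def fit_within(width: int, height: int, max_edge: int) -> tuple:
    # Scale down (never up) so the longest edge fits in max_edge,
    # preserving aspect ratio.
    scale = min(max_edge / width, max_edge / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))

# Hypothetical tiers: progressive-loading placeholder, grid, detail panel.
SIZES = {"placeholder": 32, "browse": 256, "preview": 1024}

def thumbnail_plan(width: int, height: int) -> dict:
    return {name: fit_within(width, height, edge)
            for name, edge in SIZES.items()}

plan = thumbnail_plan(4096, 2048)
# e.g. a 4096x2048 original yields (256, 128) at the browse tier
```

Serving only these derived sizes keeps grid scrolling cheap regardless of how large the originals are.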

Stage 4: Asynchronous Enrichment

After the synchronous stages complete — typically within seconds per file — the asset enters the asynchronous enrichment queue. This is where cost-aware processing determines the depth of analysis. Assets promoted to higher processing tiers receive visual embedding generation for similarity search, quality assessment, style classification, and session clustering. This enrichment happens in the background, progressively improving the asset's searchability and usefulness for curation over a span of minutes to hours.
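Tiered scheduling can be sketched with a priority queue; the tier numbering and class interface here are illustrative assumptions, not a real scheduler:

```python
import heapq

class EnrichmentQueue:
    """Sketch: lower tier number = deeper analysis, processed first."""
    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps FIFO order within a tier

    def submit(self, asset_id: str, tier: int):
        heapq.heappush(self._heap, (tier, self._counter, asset_id))
        self._counter += 1

    def next_asset(self):
        if not self._heap:
            return None
        _, _, asset_id = heapq.heappop(self._heap)
        return asset_id

q = EnrichmentQueue()
q.submit("bulk-001", tier=2)
q.submit("promoted-hero", tier=0)  # promoted asset jumps the queue
q.submit("bulk-002", tier=2)
```

Background workers drain the queue in priority order, so promoted assets get embeddings and quality scores first while bulk imports fill in behind them.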

Stage 5: Index Update

As each stage completes, the search index is updated incrementally. After Stage 2, the asset becomes searchable by prompt text and generation parameters. After Stage 4, it becomes findable by visual similarity and quality filters. This incremental indexing means the asset's discoverability improves continuously after import, without requiring a full re-index of the library.
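Incremental indexing amounts to upserting fields into the same document as stages finish. The index class and field names below are a toy stand-in for a real search engine:

```python
class SearchIndex:
    """Sketch: each pipeline stage merges fields into one document,
    so discoverability grows without re-indexing the library."""
    def __init__(self):
        self._docs = {}

    def upsert(self, asset_id: str, fields: dict):
        self._docs.setdefault(asset_id, {}).update(fields)

    def search_prompt(self, term: str):
        return [a for a, d in self._docs.items()
                if term.lower() in str(d.get("prompt", "")).lower()]

idx = SearchIndex()
idx.upsert("a1", {"prompt": "violet orbs, black backdrop"})  # after Stage 2
idx.upsert("a1", {"quality": 0.92, "has_embedding": True})   # after Stage 4
```

After the first upsert the asset is text-searchable; the second adds quality and similarity fields to the same document in place.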

Consequences

  • Fast import, progressive enrichment: Artists see their imports within seconds. Search and similarity features become available progressively as async stages complete. This matches the natural workflow: import, browse recent work, then search for specific assets — by the time search is needed, the indexing is complete.
  • Parser maintenance burden: Supporting multiple generation tool formats means maintaining multiple specialized parsers. When ComfyUI changes its metadata format, or a new generation tool gains popularity, the pipeline needs a new or updated parser. This maintenance cost is ongoing and scales with the number of supported tools.
  • Metadata format fragility: Some tools embed metadata in ways that are easily stripped. Image conversion, social media upload, or even some editing tools remove embedded metadata. Assets that arrive without metadata still function — they get thumbnails and content hashes — but lose the rich searchability that metadata enables.
  • Queue management complexity: The asynchronous enrichment queue must handle priority scheduling, retry logic for failed processing, and backpressure during bulk imports. Batch processing patterns address this complexity, but the queue itself adds operational overhead compared to a simple synchronous pipeline.
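The retry logic mentioned above is one concrete source of that overhead. A minimal sketch with exponential backoff, where the delays are returned rather than slept so the example runs instantly; the function names and attempt counts are illustrative:

```python
def process_with_retry(task, worker, max_attempts=3):
    """Retry a flaky worker, recording exponential backoff delays."""
    delays = []
    for attempt in range(max_attempts):
        try:
            return worker(task), delays
        except Exception:
            delays.append(2 ** attempt)  # would wait 1s, 2s, 4s, ...
    return None, delays  # exhausted: surface as a failed enrichment

calls = {"n": 0}
def flaky_enrich(task):
    # Simulate two transient failures before success.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return f"enriched:{task}"

result, backoffs = process_with_retry("asset-42", flaky_enrich)
```

A production queue layers backpressure on top of this, for example by bounding queue depth and shedding low-tier work first during bulk imports.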

Import Fast, Search Everything

Numonic's ingest pipeline extracts every piece of metadata from your generations — making every prompt, parameter, and workflow searchable from the moment of import.
