Technical Architecture

The Metadata Normalization Pipeline

ComfyUI embeds JSON in PNG text chunks. Midjourney stores prompts in Discord. DALL-E returns metadata in API responses. Each tool speaks its own metadata dialect. A normalization pipeline translates these dialects into a unified schema that enables cross-tool search, comparison, and lineage.

February 25, 2026 · 11 min · Numonic Team

Search for “all images generated with model X.” Simple query. Except ComfyUI records the model as a file path in a JSON blob embedded in a PNG text chunk. Midjourney records the model as a version flag (--v 6.1) in a Discord message. DALL-E records the model as a field in an API response object. Stable Diffusion Web UI records it as a line in a plain-text parameters string embedded in PNG metadata. Same concept — “which model made this” — four completely different representations.

A metadata normalization pipeline sits between the tool-specific extractors and the search index, translating each tool's metadata dialect into a unified canonical schema. This translation layer is what makes cross-tool queries possible — without it, every search must understand every tool's metadata format, and adding a new tool means updating every query.
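A minimal sketch of such a canonical schema, in Python. The field names (`prompt`, `model`, `extensions`, and so on) are illustrative assumptions, not Numonic's actual schema; the point is that the same concept lands in the same field regardless of which tool produced it, with an `extensions` dict preserving the tool-specific remainder:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class CanonicalRecord:
    """Unified schema: fields shared across tools, plus a structured
    remainder for tool-specific metadata with no canonical slot."""
    tool: str                        # e.g. "comfyui", "midjourney", "dalle"
    prompt: str                      # the text that guided generation
    model: Optional[str] = None      # file path, version flag, or API field
    width: Optional[int] = None
    height: Optional[int] = None
    created_at: Optional[str] = None # ISO 8601 timestamp
    extensions: dict[str, Any] = field(default_factory=dict)  # lossless remainder

# The same concept — "which model made this" — normalized from two dialects:
comfy = CanonicalRecord(tool="comfyui", prompt="a red fox",
                        model="checkpoints/sdxl_base.safetensors",
                        extensions={"workflow_graph": {"nodes": []}})
mj = CanonicalRecord(tool="midjourney", prompt="a red fox",
                     model="6.1", extensions={"discord_channel_id": "123"})
```

A query for “all images with prompt X” now reads one field instead of four formats.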

The Forces at Work

  • Semantic equivalence, structural divergence: The same concept appears in different formats across tools. A “prompt” in ComfyUI is distributed across multiple text nodes in a workflow graph. In Midjourney it is a single string with appended parameters. In DALL-E it is an API request field. The semantic meaning is identical — the text that guided generation — but the structural representation is completely different.
  • Different metadata richness: ComfyUI provides a hundred or more fields per generation. Midjourney provides five to ten (prompt, version, aspect ratio, a few style parameters). DALL-E provides three to five. The normalization pipeline must handle this asymmetry — mapping rich metadata to a schema that also accommodates sparse metadata without losing information from the rich sources.
  • Evolving tool formats: Every tool updates its metadata format over time. ComfyUI adds new node types with new parameter structures. Midjourney adds new parameters (--sref, --cref, --personalize). The normalization pipeline must evolve with the tools without breaking compatibility with historical data.
  • Lossless preservation: The normalized schema must not discard tool-specific metadata that does not map cleanly to canonical fields. A ComfyUI workflow graph has no equivalent in Midjourney — but it is valuable and must be preserved. Normalization is translation with a structured remainder, not lossy compression.

The Problem

Without normalization, every downstream system must understand every tool's metadata format. The search engine needs one parser for ComfyUI, another for Midjourney, another for DALL-E. The lineage tracker needs the same set of parsers. The export system needs them again. Every new tool requires changes in every downstream system. This is the two metadata problem amplified across the entire architecture.

The Solution: Three-Stage Normalization

The normalization pipeline operates in three stages: extraction, mapping, and enrichment.

Stage 1: Tool-Specific Extraction

Each tool has a dedicated extractor that understands its specific metadata format. The ComfyUI extractor parses PNG text chunks and validates JSON structures. The Midjourney extractor parses Discord message formats. The DALL-E extractor processes API response objects. Each extractor produces a tool-specific intermediate representation that preserves every available field. See tool-specific extraction for the detailed patterns.
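As a sketch of what a Stage 1 extractor involves, the following walks a PNG's chunk stream and pulls out the `tEXt` chunks where ComfyUI embeds its `prompt` and `workflow` JSON. This is a simplified reading of the PNG format (CRCs are not verified, `zTXt`/`iTXt` chunks are ignored), and the intermediate-representation dict shape is an assumption:

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_text_chunks(png_bytes: bytes) -> dict[str, str]:
    """Walk the PNG chunk stream, collecting tEXt chunks as keyword -> value."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks, pos = {}, len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, value = data.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = value.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
    return chunks

def extract_comfyui(png_bytes: bytes) -> dict:
    """Intermediate representation: every available field, parsed but unmapped."""
    text = read_text_chunks(png_bytes)
    ir = {"tool": "comfyui", "raw_text_chunks": text}
    if "prompt" in text:      # ComfyUI stores the executed graph as JSON
        ir["prompt_graph"] = json.loads(text["prompt"])
    if "workflow" in text:    # and the full editor workflow as JSON
        ir["workflow_graph"] = json.loads(text["workflow"])
    return ir
```

Note that the extractor validates and parses but does not map: canonical field names are Stage 2's job.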

Stage 2: Canonical Mapping

The mapper translates tool-specific intermediate representations into a canonical schema with standard field names and value formats. The canonical schema includes fields that exist across tools (prompt text, model reference, creation timestamp, dimensions) and extension points for tool-specific fields that have no canonical equivalent (workflow graph, Discord channel ID, API request ID).

The mapping is not one-to-one. A ComfyUI workflow may have multiple text encoding nodes — the mapper must identify which ones contribute to the final prompt and concatenate or structure them appropriately. A Midjourney prompt with --ar 16:9 must be decomposed into prompt text (everything before the first flag) and parameters (the flags), each mapped to the appropriate canonical field.
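The Midjourney decomposition described above can be sketched as follows. The splitting rule (everything before the first ` --` is prompt text; the rest is `--flag value` pairs) is a simplification that ignores edge cases like multi-word flag values, and the mapping of the `--v` flag to the canonical `model` field is an assumed convention:

```python
def decompose_midjourney(raw: str) -> dict:
    """Split a Midjourney prompt into prompt text and a parameter dict."""
    head, sep, tail = raw.partition(" --")
    params = {}
    if sep:
        for part in ("--" + tail).split(" --"):
            name, _, value = part.lstrip("-").partition(" ")
            params[name] = value.strip() or True  # bare flags like --tile
    return {"prompt_text": head.strip(), "parameters": params}

def map_midjourney(raw: str) -> dict:
    """Map the decomposed parts onto canonical fields, keeping the rest
    as extensions rather than discarding it."""
    parts = decompose_midjourney(raw)
    canonical = {"tool": "midjourney", "prompt": parts["prompt_text"]}
    if "v" in parts["parameters"]:
        canonical["model"] = parts["parameters"]["v"]  # version flag -> model
    canonical["extensions"] = {k: v for k, v in parts["parameters"].items()
                               if k != "v"}
    return canonical
```

So `map_midjourney("a red fox in snow --ar 16:9 --v 6.1")` yields prompt text, a `model` of `"6.1"`, and `ar` preserved as an extension.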

Stage 3: Enrichment

After canonical mapping, the enrichment stage adds derived information. It computes content hashes for deduplication. It generates embeddings for semantic search. It infers session boundaries from temporal clustering. It tags the asset with detected visual characteristics (style, subject, color palette). Enrichment operates on the canonical schema, not on tool-specific formats — so every enrichment step works for every tool's output without modification.
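Two of those enrichment steps can be sketched directly against canonical records: content hashing and session inference. Hashing the prompt here stands in for hashing the image bytes, and the 30-minute session gap is an assumed heuristic, not a documented threshold:

```python
import hashlib
from datetime import datetime, timedelta

def enrich(records: list[dict],
           session_gap: timedelta = timedelta(minutes=30)) -> list[dict]:
    """Enrichment reads only canonical fields, so it works for every tool:
    add a content hash for deduplication and infer session IDs from
    gaps between creation timestamps."""
    ordered = sorted(records, key=lambda r: r["created_at"])
    session, prev = 0, None
    for rec in ordered:
        rec["content_hash"] = hashlib.sha256(rec["prompt"].encode()).hexdigest()
        ts = datetime.fromisoformat(rec["created_at"])
        if prev is not None and ts - prev > session_gap:
            session += 1  # a long gap starts a new working session
        rec["session_id"] = session
        prev = ts
    return ordered
```

Because `enrich` never inspects `tool`, a new extractor gets deduplication and session grouping for free.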

Consequences

  • New tools require only an extractor: When a new generation tool emerges, the system needs only a new Stage 1 extractor that produces the intermediate representation. Stages 2 and 3 work unchanged because they operate on the canonical schema. This reduces the cost of supporting new tools from cross-cutting changes to a single module.
  • Schema evolution is centralized: When the canonical schema evolves — adding a new standard field, changing a value format — the change happens in one place. All extractors feed into the same schema, and all downstream consumers read from it. This is dramatically simpler than maintaining per-tool schemas throughout the system.
  • Information loss is explicit: The mapping stage tracks which tool-specific fields mapped to canonical fields and which were preserved as extensions. This creates an explicit record of information loss — if a canonical field is “model” and the ComfyUI extractor provides a file path while Midjourney provides only a version number, the mapping records this asymmetry rather than hiding it.
  • Pipeline complexity: Three stages with tool-specific extractors create a meaningful engineering surface. Each extractor must be maintained as its tool evolves. The canonical schema must balance generality (covering all tools) with specificity (preserving meaningful distinctions). This is a deliberate trade-off: pipeline complexity in exchange for downstream simplicity.
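The first consequence above — that a new tool costs only a Stage 1 extractor — is commonly realized with a registry: one module registers one function, and the mapping and enrichment stages never change. A minimal sketch (the registry shape and the `"newtool"` example are hypothetical):

```python
from typing import Callable

# tool name -> Stage 1 extractor producing an intermediate representation
EXTRACTORS: dict[str, Callable[[bytes], dict]] = {}

def register(tool: str):
    """Decorator: adding a tool is a single registration, not a
    cross-cutting change to search, lineage, and export."""
    def wrap(fn: Callable[[bytes], dict]) -> Callable[[bytes], dict]:
        EXTRACTORS[tool] = fn
        return fn
    return wrap

@register("newtool")
def extract_newtool(blob: bytes) -> dict:
    # Hypothetical new generation tool: this function is the whole integration.
    return {"tool": "newtool", "raw": blob.decode()}
```

Downstream code dispatches through `EXTRACTORS` and never names a tool directly.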

Related Patterns

  • Tool-Specific Extraction details the Stage 1 extractors for each generation tool.
  • The Two Metadata Problem describes the cross-tool metadata divergence that the normalization pipeline resolves.
  • Hybrid Search consumes the normalized schema to enable cross-tool queries.
  • Metadata Inversion explains how generative AI reverses the traditional metadata creation model — from human-assigned to machine-embedded.

One Library. Every Tool. Fully Searchable.

Numonic normalizes metadata from ComfyUI, Midjourney, DALL-E, and more into a unified searchable library — no manual tagging required.

Try Numonic Free