The PNG specification allows arbitrary text metadata through tEXt, iTXt, and zTXt chunks. ComfyUI uses this mechanism to embed two large JSON structures — the workflow graph and the prompt execution data — under specific keywords. Stable Diffusion Web UI (A1111) uses a different approach: a plain-text “parameters” string with a specific formatting convention. InvokeAI uses yet another format. And tools like Midjourney do not embed metadata in images at all.
Part of our AI-Native DAM Architecture
An asset management system that claims to support AI-generated content must extract metadata from all of these formats. This is not a simple parsing problem — each format has edge cases, size limits, encoding variations, and failure modes that a production extractor must handle gracefully. Getting extraction right is the foundation of everything that follows: search, lineage, reproducibility, and compliance all depend on correct, complete metadata extraction.
The Forces at Work
- Format diversity: Even within the PNG specification, tools use different chunk types (tEXt vs. iTXt), different keywords (workflow, prompt, parameters, Dream, invokeai_metadata), and different encoding strategies (plain text, JSON, base64). The extractor must probe for multiple formats and identify which tool produced the file before attempting to parse its metadata.
- Large payload sizes: ComfyUI workflow JSON can exceed 200 kilobytes for complex workflows with forty or more nodes. Many PNG metadata libraries have fixed buffer sizes that silently truncate large text chunks, producing corrupted JSON that fails to parse. Both JSON blobs can be very large, and both must be extracted in full.
- Custom node contamination: ComfyUI's extensibility means custom nodes can inject arbitrary data into the workflow and prompt structures. Some custom nodes add fields with non-standard types, circular references, or extremely large values. The extractor must handle these gracefully — extracting what it can and flagging what it cannot — without crashing on malformed input.
- Non-image sources: Not all generation metadata comes from image files. Midjourney metadata comes from Discord messages. Some tools produce metadata in sidecar files (JSON or XML alongside the image). Some export metadata through APIs. The extraction layer must support file-based, message-based, and API-based sources.
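The truncation risk above is why a production extractor often walks the PNG chunk stream itself rather than trusting a library's buffer limits. The sketch below is a minimal, stdlib-only tEXt/zTXt reader under that assumption; it handles only the two chunk types ComfyUI and A1111 commonly use (iTXt and CRC validation are omitted for brevity), and the function name is illustrative, not from any particular library.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"

def read_text_chunks(data: bytes) -> dict[str, str]:
    """Walk a PNG byte stream and collect tEXt/zTXt chunks into a dict.

    Each chunk body is read in full, so 200 KB+ ComfyUI blobs are never
    truncated. iTXt and CRC validation are omitted for brevity.
    """
    if not data.startswith(PNG_SIG):
        raise ValueError("not a PNG file")
    chunks: dict[str, str] = {}
    pos = len(PNG_SIG)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt layout: keyword, NUL separator, latin-1 text
            key, _, value = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # zTXt layout: keyword, NUL, compression-method byte, zlib data
            key, _, rest = body.partition(b"\x00")
            chunks[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if ctype == b"IEND":
            break
    return chunks
```

Reading chunk-by-chunk like this also makes probing cheap: the extractor can stop at IEND having seen every text keyword without decoding pixel data.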
The Problem
Most metadata extraction tools are built for traditional photography metadata: EXIF data measured in bytes, with well-defined field types and standard value ranges. AI generation metadata breaks these assumptions in every dimension:
Traditional vs. AI Generation Metadata
| Dimension | Photography EXIF | AI Generation Metadata |
|---|---|---|
| Size per image | 1-10 KB | 10-400 KB |
| Format | Standardized (EXIF/XMP) | Tool-specific, non-standard |
| Structure | Flat key-value pairs | Nested JSON, graphs, free text |
| Stability | Mature, rarely changes | Evolves with each tool update |
| Validation | Schema-defined types | Arbitrary, unvalidated content |
A metadata extraction library designed for EXIF will fail on ComfyUI output — not with an error, but with silent data loss. It will read the standard EXIF fields (resolution, color space) and ignore the ComfyUI-specific text chunks that contain the generation parameters. The result is an image with “extracted metadata” that is missing everything that matters for AI asset management.
The most dangerous extraction failure is the silent one. The tool reports success, returns metadata, and the user assumes the extraction was complete. But the generation parameters — the prompt, the seed, the model, the workflow — were never read because the extractor did not know to look for them.
The Solution: Probing Extractors with Tool Detection
A robust extraction system uses a probe-first architecture: before attempting to parse metadata in any specific format, it probes the file to determine which tool produced it and which metadata format to expect.
Tool Detection
The first extraction step is tool identification. The system checks for signature metadata markers: a “workflow” text chunk indicates ComfyUI. A “parameters” text chunk with the A1111 formatting pattern indicates Stable Diffusion Web UI. An “invokeai_metadata” chunk indicates InvokeAI. The absence of any AI-specific text chunks, combined with characteristic EXIF patterns, may indicate DALL-E or Midjourney output. The detection step routes the file to the correct tool-specific parser.
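The detection logic above reduces to a keyword check over the file's text chunks. A minimal sketch, assuming the chunks have already been read into a dict; the returned labels are illustrative names, not a fixed taxonomy, and a real router would also verify the “parameters” chunk matches the A1111 formatting pattern before committing to that parser.

```python
def detect_tool(text_chunks: dict[str, str]) -> str:
    """Route a file to a parser based on signature text-chunk keywords."""
    if "workflow" in text_chunks or "prompt" in text_chunks:
        return "comfyui"
    if "parameters" in text_chunks:
        # A production check would also validate the A1111 text layout here.
        return "a1111"
    if "invokeai_metadata" in text_chunks or "Dream" in text_chunks:
        # "Dream" is the keyword used by older InvokeAI releases.
        return "invokeai"
    return "generic"  # fall back to EXIF/XMP/IPTC extraction
```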
Format-Specific Parsers
Each tool gets a dedicated parser that understands its specific metadata format:
- ComfyUI parser: Reads “workflow” and “prompt” text chunks. Validates both as JSON. Handles large payloads (no fixed buffer). Extracts node types, connections, widget values from the workflow blob. Extracts resolved parameters, seeds, model paths from the prompt blob. Handles custom node fields gracefully.
- A1111 parser: Reads “parameters” text chunk. Parses the structured text format: first line is positive prompt, “Negative prompt:” line is negative prompt, remaining lines are key-value parameters (Steps, Sampler, CFG scale, Seed, Model, etc.). Handles multi-line prompts and BREAK tokens.
- Midjourney parser: Operates on Discord message data rather than image file metadata. Extracts prompt text, parameter flags, job ID, variation/upscale relationships from message content and formatting.
- Generic parser: For images with no recognized AI-specific metadata, extracts standard EXIF, XMP, and IPTC data. Captures technical metadata (dimensions, color space, camera info if present) and flags the asset as having no AI generation metadata detected.
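To make the A1111 case concrete, here is a simplified sketch of parsing the “parameters” string: first line(s) are the positive prompt, a “Negative prompt:” line starts the negative prompt, and the final line carries comma-separated key-value settings. The function name is illustrative, and quoted values, BREAK tokens, and extension-specific fields are not handled.

```python
def parse_a1111_parameters(text: str) -> dict:
    """Split an A1111 'parameters' string into prompt, negative prompt,
    and key-value settings. Simplified: no quoted values or extensions."""
    lines = text.strip().split("\n")
    # The settings line is conventionally the last line ("Steps: 20, ...").
    settings_line = lines[-1] if ":" in lines[-1] else ""
    body = lines[:-1] if settings_line else lines

    prompt_lines, negative_lines, in_negative = [], [], False
    for line in body:
        if line.startswith("Negative prompt:"):
            in_negative = True
            negative_lines.append(line[len("Negative prompt:"):].strip())
        elif in_negative:
            negative_lines.append(line)  # multi-line negative prompt
        else:
            prompt_lines.append(line)    # multi-line positive prompt

    settings = {}
    for pair in settings_line.split(", "):
        key, _, value = pair.partition(": ")
        if key and value:
            settings[key] = value

    return {
        "prompt": "\n".join(prompt_lines).strip(),
        "negative_prompt": "\n".join(negative_lines).strip(),
        **settings,
    }
```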
Extraction Quality Reporting
Each extractor reports what it found and what it could not parse. The extraction result includes a quality assessment: full extraction (all expected fields parsed successfully), partial extraction (some fields parsed, some failed), or minimal extraction (only basic technical metadata). This quality signal flows into the normalization pipeline and eventually to the user, who can see which assets have rich metadata and which have gaps.
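One way to model the quality signal described above is a result object that carries both parsed fields and recorded failures, deriving the full/partial/minimal assessment from them. The type names and derivation rule here are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from enum import Enum

class Quality(Enum):
    FULL = "full"        # all expected fields parsed successfully
    PARTIAL = "partial"  # some fields parsed, some failed
    MINIMAL = "minimal"  # only basic technical metadata

@dataclass
class ExtractionResult:
    tool: str
    fields: dict = field(default_factory=dict)   # successfully parsed values
    errors: list = field(default_factory=list)   # what could not be parsed

    @property
    def quality(self) -> Quality:
        if self.fields and not self.errors:
            return Quality.FULL
        if self.fields:
            return Quality.PARTIAL
        return Quality.MINIMAL
```

Because the errors list travels with the result, the normalization pipeline can surface per-asset gaps to the user instead of collapsing them into a single pass/fail flag.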
Consequences
- Extractor maintenance burden: Each tool-specific parser must be maintained as tools evolve. When ComfyUI adds new node types or changes its JSON structure, the parser must be updated. When Midjourney changes its Discord message formatting, the parser must adapt. This is a continuous maintenance cost, but it is isolated to the extraction layer — downstream systems work with the normalized schema and are unaffected by extractor changes.
- Extensibility: New tools can be supported by adding a new detection signature and a new parser. The architecture is designed for this — the probe-first approach means the system gracefully handles unknown formats (falling back to the generic parser) while providing rich extraction for recognized tools.
- Testing complexity: Each parser needs its own test suite with real-world examples from its tool. Edge cases differ per tool: ComfyUI edge cases involve custom nodes and large workflows; A1111 edge cases involve multi-line prompts and extension-specific parameters; Midjourney edge cases involve message threading and variation chains. A comprehensive test suite requires a library of real metadata samples from each tool.
- Foundation for everything: Correct extraction is the prerequisite for every other capability. Search quality depends on extraction completeness. Reproducibility depends on extracting the right parameters. Lineage tracking depends on extracting relationships. Investing in extraction quality pays compound returns across the entire system.
Related Patterns
- The Normalization Pipeline consumes extractor output and translates it into a unified schema.
- The Two JSON Blobs details the specific extraction challenges for ComfyUI's dual metadata structure.
- Midjourney Metadata describes extraction from Discord messages rather than image files.
- The Two Metadata Problem provides the broader context for why tool-specific extraction is necessary.
