Prompt Dataset Methodology
This page describes the methodology behind the GPT Image Hub public prompt dataset: its fields, provenance, update cadence, and machine-readable distributions.
Dataset scope
The public dataset contains prompt templates that are visible in the GPT Image Hub library and intended for discovery, retrieval, and reuse.
- Each record includes the full prompt text and its canonical URL.
- Each record includes category, tags, model defaults, aspect ratio, and attribution fields.
- Translations are included when available, so agents can match localized prompts to the same underlying template.
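To make the record shape above concrete, here is a minimal sketch of how a consumer might model one record in Python. Every field name in it (`prompt`, `url`, `category`, `tags`, `model`, `aspect_ratio`, `attribution`, `translations`) is an assumption chosen for illustration, as is the idea that translations are keyed by locale code; the published JSON Schema is the authority on the real names and types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptRecord:
    """Illustrative model of one dataset record.

    All field names here are hypothetical; check the dataset's JSON Schema
    for the actual names and types.
    """

    prompt: str                          # full prompt text
    url: str                             # canonical URL of the prompt page
    category: str
    tags: list[str] = field(default_factory=list)
    model: Optional[str] = None          # assumed name for the model default
    aspect_ratio: Optional[str] = None   # assumed to be a string such as "16:9"
    attribution: Optional[str] = None
    # Assumed layout: locale code -> translated prompt text.
    translations: dict[str, str] = field(default_factory=dict)

    def text_for(self, locale: str) -> str:
        """Return the translation for `locale`, or the original prompt text."""
        return self.translations.get(locale, self.prompt)


record = PromptRecord(
    prompt="A red fox in fresh snow",
    url="https://example.com/prompts/red-fox",  # placeholder URL
    category="animals",
    translations={"es": "Un zorro rojo en la nieve fresca"},
)
print(record.text_for("es"))
print(record.text_for("fr"))  # no French translation, so the original text
```

Falling back to the original text when a translation is missing mirrors the "when available" wording above: a consumer should not assume every locale is present on every record.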
Formats and discovery
The dataset is available in multiple machine-readable formats for search engines, AI agents, and data pipelines.
- JSONL is the recommended format for bulk ingestion.
- CSV is available for spreadsheets and BI tools.
- A JSON Schema and manifest describe field semantics and distributions.
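Because JSONL stores one JSON object per line, bulk ingestion can stream records without loading the whole file into memory. The sketch below shows one way to do that with the Python standard library. The `category` and `tags` keys in the sample data are assumed field names, and the sample lines are made up for the example rather than taken from the dataset.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Made-up sample data; the key names are assumptions, not the real schema.
sample = io.StringIO(
    '{"category": "portrait", "tags": ["studio"]}\n'
    "\n"
    '{"category": "logo", "tags": []}\n'
)
records = list(iter_jsonl(sample))
print(len(records), records[0]["category"])
```

In real use the stream would be an opened file or a decoded HTTP response body. Reporting the line number on failure makes it easier to locate a damaged record in a large export.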
Provenance and freshness
Records expose source fields when known and include timestamps so agents can evaluate freshness and provenance.
- The manifest includes version, generated_at, provenance, and same_as fields.
- Responses from dataset routes include ETag and Last-Modified headers, so clients can make conditional requests.
- Category-level distributions allow smaller, targeted crawls.
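The ETag and Last-Modified headers mentioned above let a crawler avoid downloading a distribution again when it has not changed. The following is a minimal sketch of that pattern using standard HTTP conditional requests and Python's `urllib`. The function names are hypothetical, and the URL passed to `fetch_if_changed` would be whichever dataset route the manifest lists; none is assumed here.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_headers(
    etag: Optional[str] = None, last_modified: Optional[str] = None
) -> dict:
    """Build request headers from validators saved on a previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(
    url: str, etag: Optional[str] = None, last_modified: Optional[str] = None
) -> Tuple[Optional[bytes], Optional[str], Optional[str]]:
    """Return (body, etag, last_modified); body is None if nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as exc:
        if exc.code == 304:  # Not Modified: keep using the cached copy
            return None, etag, last_modified
        raise


# Header construction alone, with a made-up validator value:
print(conditional_headers(etag='"abc123"'))
```

A crawler would store the returned validators next to its cached copy and pass them back on the next run. Combined with category-level distributions, this keeps repeat crawls limited to the slices that actually changed.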