JSONStructuredExtraction
Extract structured JSON with LLMs
Convert unstructured text into a JSON object with predefined fields. Provide a schema name and the list of fields to extract. Compatible with all supported model providers, including OpenAI, Gemini, and Ollama.
type: "io.kestra.plugin.ai.completion.JSONStructuredExtraction"
Examples
Extract person fields (Gemini)

```yaml
id: json_structured_extraction
namespace: company.ai

tasks:
  - id: extract_person
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Person
    jsonFields:
      - name
      - city
      - country
      - email
    prompt: |
      From the text below, extract the person's name, city, country, and email.
      If a field is missing, leave it blank.

      Text:
      "Hi! I'm John Smith from Paris, France. You can reach me at john.smith@example.com."
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
```
Extract order details (OpenAI)

```yaml
id: json_structured_extraction_order
namespace: company.ai

tasks:
  - id: extract_order
    type: io.kestra.plugin.ai.completion.JSONStructuredExtraction
    schemaName: Order
    jsonFields:
      - order_id
      - customer_name
      - city
      - total_amount
    prompt: |
      Extract the order_id, customer_name, city, and total_amount from the message.
      For the total amount, keep only the number without the currency symbol.
      Return only JSON with the requested keys.

      Message:
      "Order #A-1043 for Jane Doe, shipped to Berlin. Total: 249.99 EUR."
    provider:
      type: io.kestra.plugin.ai.provider.OpenAI
      apiKey: "{{ kv('OPENAI_API_KEY') }}"
      modelName: gpt-5-mini
```
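The extractedJson output is a JSON string that downstream tasks can parse and reference. As a sketch (the task id and the use of Kestra's fromJson() expression function are assumptions based on current Kestra versions; older versions used json() instead), a follow-up Log task could read individual fields:

```yaml
  - id: log_city
    type: io.kestra.plugin.core.log.Log
    # fromJson() parses the extracted JSON string so individual keys can be referenced
    message: "Customer city: {{ fromJson(outputs.extract_order.extractedJson).city }}"
```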
Properties
- jsonFields (array, required): JSON fields. List of fields to extract from the text.
- prompt (string, required): Text prompt. The input prompt for the AI model.
- provider (required, non-dynamic): Language model provider. One of AmazonBedrock, Anthropic, AzureOpenAI, DeepSeek, GoogleGemini, GoogleVertexAI, MistralAI, Ollama, or OpenAI.
- schemaName (string, required): Schema name. The name of the JSON schema for structured extraction.
- configuration (ChatConfiguration, non-dynamic, default {}): Chat configuration.
Outputs
- extractedJson (string): Extracted JSON. The structured JSON output.
- finishReason (string): Finish reason. One of STOP, LENGTH, TOOL_EXECUTION, CONTENT_FILTER, or OTHER.
- schemaName (string): Schema name. The schema name used for the structured JSON extraction.
- tokenUsage (TokenUsage): Token usage.
Definitions

Azure OpenAI Model Provider
- endpoint (string, required): API endpoint. The Azure OpenAI endpoint in the format https://{resource}.openai.azure.com/
- modelName (string, required): Model name.
- type (object, required)
- apiKey (string): API key.
- clientId (string): Client ID.
- clientSecret (string): Client secret.
- serviceVersion (string): API version.
- tenantId (string): Tenant ID.
Google VertexAI Model Provider
- endpoint (string, required): Endpoint URL.
- location (string, required): Project location.
- modelName (string, required): Model name.
- project (string, required): Project ID.
- type (object, required)
Google Gemini Model Provider
- apiKey (string, required): API key.
- modelName (string, required): Model name.
- type (object, required)
Mistral AI Model Provider
- apiKey (string, required): API key.
- modelName (string, required): Model name.
- type (object, required)
- baseUrl (string): API base URL.
Ollama Model Provider
- endpoint (string, required): Model endpoint.
- modelName (string, required): Model name.
- type (object, required)
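Since Ollama serves models locally, no API key property is listed. A minimal provider block might look like this sketch (the endpoint assumes Ollama's default local port, and the model name is illustrative):

```yaml
provider:
  type: io.kestra.plugin.ai.provider.Ollama
  # Default local Ollama endpoint; adjust if your server runs elsewhere
  endpoint: http://localhost:11434
  modelName: llama3.2
```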
OpenAI Model Provider
- apiKey (string, required): API key.
- modelName (string, required): Model name.
- type (object, required)
- baseUrl (string): API base URL.
io.kestra.plugin.ai.domain.ChatConfiguration
- logRequests (boolean or string): Log LLM requests. If true, prompts and configuration sent to the LLM are logged at INFO level.
- logResponses (boolean or string): Log LLM responses. If true, raw responses from the LLM are logged at INFO level.
- responseFormat (ChatConfiguration-ResponseFormat): Response format. Defines the expected output format; the default is plain text. Some providers allow requesting JSON or schema-constrained outputs, but support varies and may be incompatible with tool use. When using a JSON schema, the output is returned under the key jsonOutput.
- seed (integer or string): Seed. Optional random seed for reproducibility. Provide a positive integer (e.g., 42, 1234). Using the same seed with identical settings produces repeatable outputs.
- temperature (number or string): Temperature. Controls randomness in generation. Typical range is 0.0–1.0. Lower values (e.g., 0.2) make outputs more focused and deterministic, while higher values (e.g., 0.7–1.0) increase creativity and variability.
- topK (integer or string): Top-K. Limits sampling to the top K most likely tokens at each step. Typical values are between 20 and 100. Smaller values reduce randomness; larger values allow more diverse outputs.
- topP (number or string): Top-P (nucleus sampling). Selects from the smallest set of tokens whose cumulative probability is ≤ topP. Typical values are 0.8–0.95. Lower values make the output more focused, higher values increase diversity.
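Combining several of these options, a task-level configuration block for deterministic extraction might look like this sketch (the specific values are illustrative, not recommendations from this reference):

```yaml
configuration:
  temperature: 0.2   # low temperature keeps extraction output focused and repeatable
  seed: 42           # fixed seed for reproducibility where the provider supports it
  logRequests: true  # log prompts sent to the LLM at INFO level for debugging
```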
io.kestra.plugin.ai.domain.TokenUsage
- inputTokenCount (integer)
- outputTokenCount (integer)
- totalTokenCount (integer)
DeepSeek Model Provider
- apiKey (string, required): API key.
- modelName (string, required): Model name.
- type (object, required)
- baseUrl (string, default https://api.deepseek.com/v1): API base URL.
io.kestra.plugin.ai.domain.ChatConfiguration-ResponseFormat
- jsonSchema (object): JSON Schema (used when type = JSON). Provide a JSON Schema describing the expected structure of the response. In Kestra flows, define the schema in YAML (it is still a JSON Schema object). Example (YAML):

```yaml
responseFormat:
  type: JSON
  jsonSchema:
    type: object
    required: ["category", "priority"]
    properties:
      category:
        type: string
        enum: ["ACCOUNT", "BILLING", "TECHNICAL", "GENERAL"]
      priority:
        type: string
        enum: ["LOW", "MEDIUM", "HIGH"]
```

Note: Provider support for strict schema enforcement varies. If unsupported, guide the model toward the expected output structure via the prompt and validate downstream.
- jsonSchemaDescription (string): Schema description (optional). Natural-language description of the schema to help the model produce the right fields. Example: "Classify a customer ticket into category and priority."
- type (string, default TEXT): Response format type. Specifies how the LLM should return output. Allowed values:
  - TEXT (default): free-form natural language.
  - JSON: structured output validated against a JSON Schema.
Amazon Bedrock Model Provider
- accessKeyId (string, required): AWS access key ID.
- modelName (string, required): Model name.
- secretAccessKey (string, required): AWS secret access key.
- type (object, required)
- modelType (string, default COHERE): Amazon Bedrock embedding model type. One of COHERE or TITAN.
Anthropic AI Model Provider
- apiKey (string, required): API key.
- modelName (string, required): Model name.