curl --request POST \
  --url https://api.sierra.absconsulting.com/v1/inference/{model} \
  --header 'Api-Key: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "max_completion_tokens": 100,
  "max_tokens": 100,
  "project_code": "NA",
  "prompt": "What is the capital of France?",
  "response_format": {
    "json_schema": {
      "description": "<string>",
      "name": "<string>",
      "schema": {},
      "strict": true
    },
    "type": "text"
  },
  "stream": false,
  "system_prompt": "You are a helpful assistant.",
  "temperature": 0.7
}
'

Response:

{
  "model": "<string>",
  "response": "<string>",
  "usage": {
    "completion_tokens": 123,
    "prompt_tokens": 123,
    "total_tokens": 123
  }
}

Send a request to an LLM to generate a completion. This endpoint is deprecated; use the completions endpoint instead.
Api-Key
Access the API as yourself. You can find your API key in your profile menu in Portal.
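For example, the key can be exported once and passed via the Api-Key header. This is a sketch; SIERRA_API_KEY is just an illustrative variable name, not something the API defines:

export SIERRA_API_KEY='<api-key>'   # value copied from your Portal profile menu
curl --request POST \
  --url https://api.sierra.absconsulting.com/v1/inference/gpt-4o-mini \
  --header "Api-Key: $SIERRA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "What is the capital of France?", "project_code": "NA"}'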
model
The model to use for inference. Currently, gpt-4.1, gpt-4o, gpt-4o-mini, o1, o3, o3-mini, o4-mini, llama-4-maverick, and nemo are supported. The model name is substituted directly into the request path, as in the example above.
The request body parameters for inference. Only prompt and project_code are required; project_code can be either 'NA' or a Royal Caribbean Group project code. The other parameters (system_prompt, temperature, max_tokens, max_completion_tokens, stream, response_format) are optional and model-dependent. A minimal body is shown below.
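A minimal request body, relying on model-specific defaults for everything else (a sketch built from the required fields above):

{
  "prompt": "What is the capital of France?",
  "project_code": "NA"
}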
max_completion_tokens
The maximum number of tokens the model can generate in the completion response. Use this parameter for models that support it (e.g., GPT-4o, GPT-4o-mini). Model-specific defaults apply if not provided.
Example: 100
max_tokens
The maximum total number of tokens (prompt plus completion) allowed in a single request. Use this parameter for models that don't support max_completion_tokens. Model-specific defaults apply if not provided.
Example: 100
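Which limit field a given model accepts is the main thing to get right. A sketch of the two payload variants, assuming (per the descriptions above) that GPT-4o-family models take max_completion_tokens and that models without that support take max_tokens; verify the mapping for the model you call.

For models that support max_completion_tokens (e.g., gpt-4o, gpt-4o-mini):

{
  "prompt": "What is the capital of France?",
  "project_code": "NA",
  "max_completion_tokens": 100
}

For models that do not:

{
  "prompt": "What is the capital of France?",
  "project_code": "NA",
  "max_tokens": 100
}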
project_code
The Royal Caribbean Group project code for tracking and billing purposes. Use 'NA' if not associated with a specific project. This is a required field.
Example: "NA"
prompt
The user's input message or question that the AI model will process and respond to. This is a required field.
Example: "What is the capital of France?"
response_format
Optional format specification for the response. Can be used to request structured JSON output or enforce a specific JSON schema. Its child attributes are type and json_schema (with name, description, schema, and strict), as shown in the request example above.
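For example, to ask for schema-constrained JSON output. This is a sketch: the request example above only shows "type": "text", so the "json_schema" type value and the OpenAI-style semantics of strict are assumptions to verify against your model:

curl --request POST \
  --url https://api.sierra.absconsulting.com/v1/inference/gpt-4o \
  --header 'Api-Key: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "What is the capital of France?",
  "project_code": "NA",
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "capital_answer",
      "description": "The capital city for the country asked about",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "capital": { "type": "string" }
        },
        "required": ["capital"],
        "additionalProperties": false
      }
    }
  }
}
'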
stream
If true, the response will be streamed back as it is generated, allowing for real-time output. If false, the complete response is returned after generation finishes.
Example: false
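A streaming sketch: set stream to true and pass curl's --no-buffer flag so chunks print as they arrive. The wire format of the streamed chunks is not documented here, so inspect the output before writing a parser for it:

curl --no-buffer --request POST \
  --url https://api.sierra.absconsulting.com/v1/inference/gpt-4o-mini \
  --header 'Api-Key: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{"prompt": "Write a limerick about ships.", "project_code": "NA", "stream": true}'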
system_prompt
Optional system message that sets the behavior and context for the AI assistant. If not provided, a default system prompt will be used.
Example: "You are a helpful assistant."
temperature
Controls the randomness of the model's output. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 1.0) make it more creative and varied. The range is typically 0.0 to 2.0. Model-specific defaults apply if not provided.
Example: 0.7