POST /v1/inference/{model}

Generate completion
curl --request POST \
  --url https://api.sierra.absconsulting.com/v1/inference/{model} \
  --header 'Api-Key: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "max_completion_tokens": 100,
  "max_tokens": 100,
  "project_code": "NA",
  "prompt": "What is the capital of France?",
  "response_format": {
    "json_schema": {
      "description": "<string>",
      "name": "<string>",
      "schema": {},
      "strict": true
    },
    "type": "text"
  },
  "stream": false,
  "system_prompt": "You are a helpful assistant.",
  "temperature": 0.7
}
'
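The curl call above can be sketched in Python. This is a minimal client-side sketch, not an official SDK: it only assembles the URL, headers, and JSON body shown in the example, and leaves the actual HTTP call (e.g. via `requests` or `urllib`) commented out so no network access or real API key is needed.

```python
API_BASE = "https://api.sierra.absconsulting.com/v1/inference"

def build_inference_request(model: str, prompt: str, api_key: str,
                            project_code: str = "NA", **options):
    """Assemble the URL, headers, and JSON body for a completion request.

    Only prompt and project_code are required by the API; anything passed
    via **options (system_prompt, temperature, max_completion_tokens, ...)
    is forwarded into the body as-is.
    """
    url = f"{API_BASE}/{model}"
    headers = {"Api-Key": api_key, "Content-Type": "application/json"}
    body = {"prompt": prompt, "project_code": project_code, **options}
    return url, headers, body

url, headers, body = build_inference_request(
    "gpt-4o-mini", "What is the capital of France?", "<api-key>",
    system_prompt="You are a helpful assistant.",
    temperature=0.7, max_completion_tokens=100)

# To actually send the request, e.g. with the third-party requests library:
# resp = requests.post(url, headers=headers, json=body, timeout=30)
# print(resp.json()["response"])
```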
{
  "model": "<string>",
  "response": "<string>",
  "usage": {
    "completion_tokens": 123,
    "prompt_tokens": 123,
    "total_tokens": 123
  }
}

Authorizations

Api-Key
string
header
required

Access the API as yourself. You can find your API key in your profile menu in Portal.

Path Parameters

model
string
required

The model to use for inference. Currently, gpt-4.1, gpt-4o, gpt-4o-mini, o1, o3, o3-mini, o4-mini, llama-4-maverick, and nemo are supported.
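Because an unsupported model name fails only at request time, a client may want to validate the path parameter up front. A hypothetical client-side guard, with the model list copied from the description above (it is not part of the API itself and may drift as models are added):

```python
# Model identifiers listed as supported by this endpoint, per the docs above.
SUPPORTED_MODELS = {
    "gpt-4.1", "gpt-4o", "gpt-4o-mini", "o1", "o3", "o3-mini",
    "o4-mini", "llama-4-maverick", "nemo",
}

def validate_model(model: str) -> str:
    """Raise early on a model name this endpoint does not list as supported."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(
            f"unsupported model {model!r}; choose one of {sorted(SUPPORTED_MODELS)}")
    return model
```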

Body

application/json

The request parameters for the inference. Only prompt and project_code are required. The project code can be either 'NA' or a Royal Caribbean Group project code. The other parameters (system_prompt, temperature, max_tokens, max_completion_tokens, stream, response_format) are optional and model-dependent.

max_completion_tokens
integer

The maximum number of tokens the model can generate in the completion response. Use this parameter for models that support it (e.g., GPT-4o, GPT-4o-mini). Model-specific defaults apply if not provided.

Example:

100

max_tokens
integer

The maximum total number of tokens (prompt plus completion) allowed in a single request. Use this parameter for models that don't support max_completion_tokens. Model-specific defaults apply if not provided.

Example:

100

project_code
string

The Royal Caribbean Group project code for tracking and billing purposes. Use 'NA' if not associated with a specific project. This is a required field.

Example:

"NA"

prompt
string

The user's input message or question that the AI model will process and respond to. This is a required field.

Example:

"What is the capital of France?"

response_format
object

Optional format specification for the response. Can be used to request structured JSON output or enforce a specific JSON schema.
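Building on the fields shown in the request example (name, description, schema, strict), a small helper can assemble a schema-constrained response_format. This is a hypothetical sketch: the `"json_schema"` type value is an assumption based on common structured-output conventions, not confirmed by this page, and the helper name is illustrative.

```python
def json_schema_format(name: str, description: str, schema: dict,
                       strict: bool = True) -> dict:
    """Build a response_format object requesting schema-constrained JSON.

    The "json_schema" type value is an assumption; the docs only show the
    shape of the json_schema sub-object, not the accepted type values.
    """
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "description": description,
            "schema": schema,
            "strict": strict,
        },
    }

fmt = json_schema_format(
    "capital_answer",
    "Answer with a single capital city",
    {"type": "object",
     "properties": {"capital": {"type": "string"}},
     "required": ["capital"]})
```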

stream
boolean

If true, the response will be streamed back as it is generated, allowing for real-time output. If false, the complete response is returned after generation finishes.

Example:

false
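When stream is true, the client receives the completion incrementally. The page does not specify the wire format of the stream, so the sketch below stays format-agnostic: it assumes the transport layer has already been decoded into text fragments and only shows the accumulation step.

```python
def collect_stream(chunks) -> str:
    """Accumulate streamed text fragments into the full completion.

    Assumes `chunks` is an iterable of already-decoded text pieces; the
    actual wire format of streamed responses is not specified in the docs.
    """
    parts = []
    for chunk in chunks:
        parts.append(chunk)  # a real client might also echo each chunk live
    return "".join(parts)
```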

system_prompt
string

Optional system message that sets the behavior and context for the AI assistant. If not provided, a default system prompt will be used.

Example:

"You are a helpful assistant."

temperature
number

Controls the randomness of the model's output. Lower values (e.g., 0.2) make the output more deterministic and focused, while higher values (e.g., 1.0) make it more creative and varied. Range typically 0.0-2.0. Model-specific defaults apply if not provided.

Example:

0.7
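Since the typical range is 0.0-2.0 but out-of-range values may only fail server-side, a client can clamp temperature before sending. A hypothetical convenience guard, not part of the API:

```python
def clamp_temperature(t: float, lo: float = 0.0, hi: float = 2.0) -> float:
    """Keep temperature inside the typical 0.0-2.0 range described above."""
    return max(lo, min(hi, t))
```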

Response

OK

model
string

The model used for the inference.

response
string

The response from the model.

usage
object

Token usage for the inference: prompt_tokens, completion_tokens, and total_tokens.
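For cost tracking it can be useful to sanity-check the usage object from the response. The sketch below assumes total_tokens equals prompt_tokens plus completion_tokens, which is the conventional accounting for this kind of usage object but is not stated explicitly on this page.

```python
def check_usage(usage: dict) -> dict:
    """Verify that a response usage object is internally consistent.

    Assumes total_tokens = prompt_tokens + completion_tokens, the usual
    convention for token-usage accounting.
    """
    expected = usage["prompt_tokens"] + usage["completion_tokens"]
    if usage["total_tokens"] != expected:
        raise ValueError(f"inconsistent usage object: {usage}")
    return usage
```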