New completions endpoint
We've added a new endpoint to the SIERRA API at /v1/completions. It sends a request to an LLM to generate a completion, just as the /v1/inference endpoint does, but the completions endpoint has two distinct advantages:
- You can pass full message arrays to the LLM, instead of just a prompt as a string. Here’s an example of what that looks like:
{
  "model": "gpt-5-mini",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is the capital of France?"
    }
  ]
}
This allows you to maintain conversation context, include system instructions, and structure complex multi-turn conversations.
- The completions endpoint is OpenAI compatible: it follows the same request/response format as OpenAI's API, making it a drop-in replacement for OpenAI endpoints. That means you can connect tools like Obsidian, LangChain, or any application that supports OpenAI-compatible APIs to this endpoint, enabling free (for you), secure access to LLMs without changing your existing code.
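As a minimal sketch of the request format above, here's how a POST to /v1/completions might look using only Python's standard library. The host name and Bearer-token auth scheme are assumptions, not part of this changelog; substitute your actual SIERRA base URL and credentials.

```python
"""Sketch: sending an OpenAI-compatible request to POST /v1/completions."""
import json
import urllib.request

BASE_URL = "https://sierra.example.com"  # hypothetical SIERRA host


def build_completion_request(model: str, messages: list[dict]) -> dict:
    """Assemble an OpenAI-compatible request body (model + message array)."""
    return {"model": model, "messages": messages}


def create_completion(api_key: str, body: dict) -> dict:
    """POST the body to /v1/completions and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build the same request body shown in the example above.
body = build_completion_request(
    "gpt-5-mini",
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
```

Because the format is OpenAI compatible, any OpenAI client library that lets you override the base URL should also work against this endpoint without code changes.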
New models
We've refreshed the models available through the new completions endpoint. Here's when to use each of them:
- gpt-5-1 - The most capable GPT-5 model, ideal for complex reasoning tasks, code generation, and sophisticated analysis. Use this when you need the highest quality output and can tolerate longer response times.
- gpt-5-mini - A balanced model offering strong performance at a lower cost. Great for general-purpose tasks, content generation, and most production workloads where you need reliable quality without the premium price.
- gpt-5-nano - The fastest and most cost-effective GPT-5 model. Perfect for simple queries, quick completions, and high-volume applications where speed and cost efficiency are priorities.
- claude-haiku-4-5 - Anthropic’s fastest model, excellent for quick responses, simple Q&A, and high-throughput scenarios. Use this when you need Claude’s safety-focused approach with minimal latency.
- claude-sonnet-4-5 - A balanced Claude model offering strong reasoning capabilities. Ideal for analysis, summarization, and tasks requiring careful consideration of context.
- claude-opus-4-1 - Anthropic’s most capable model, best for complex reasoning, long-form content generation, and tasks requiring deep understanding. Use when you need Claude’s highest quality output.
- grok-4-fast-non-reasoning - Grok’s fast model without reasoning capabilities. Good for straightforward completions and when you need Grok’s unique perspective on current events and real-time information.
- grok-4-fast-reasoning - Grok's fast model with reasoning capabilities. Useful when you need Grok's reasoning combined with its real-time knowledge, though note that this model currently experiences occasional downtime.
We've also added a GET /v1/completions/models endpoint, which lists all models available through POST /v1/completions, with key information about each. Use this endpoint to see the latest models, as we will periodically refresh the models in the API.
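A quick sketch of fetching that model catalog, again with only the standard library. The host, auth header, and the shape of the response body are assumptions; check the API reference for the exact contract.

```python
"""Sketch: listing available models via GET /v1/completions/models."""
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the catalog URL, tolerating a trailing slash on the base URL."""
    return f"{base_url.rstrip('/')}/v1/completions/models"


def list_models(base_url: str, api_key: str) -> list:
    """Fetch the model catalog; the response field name is an assumption."""
    req = urllib.request.Request(
        models_url(base_url),
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumed shape: {"models": [...]}; fall back to a bare list.
    return payload.get("models", payload) if isinstance(payload, dict) else payload


url = models_url("https://sierra.example.com/")  # hypothetical host
```

Since the catalog changes over time, fetching it at startup (rather than hard-coding model names) keeps clients working as models are rotated in and out.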
Deprecation of inference endpoint
With the OpenAI-compatible completions endpoint now available and equipped with the latest models, we are deprecating the /v1/inference endpoint.
Important: The /v1/inference endpoint will be retired on December 17th. Please migrate your workloads to /v1/completions before this date to avoid service interruptions.
New embedding model
We added support for text-embed-3-large to the /v1/embed endpoint. The text-embed-3-small model remains suitable for most use cases, but the larger model's higher-dimensional embeddings provide better semantic understanding and improved accuracy for similarity comparisons. We have also added GET /v1/embed/models, which lists the models available through the POST endpoint, mirroring the completions catalog. Use it to see the latest models, as we will periodically refresh them.
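The main consumer of those embeddings is a similarity comparison, so here's a hedged sketch: a request helper for POST /v1/embed (host, auth, and response field names are assumptions) plus the standard cosine-similarity computation that the larger model's dimensions improve.

```python
"""Sketch: embedding texts via POST /v1/embed and comparing vectors."""
import json
import math
import urllib.request


def embed(base_url: str, api_key: str, texts: list,
          model: str = "text-embed-3-large") -> list:
    """Request embeddings; request/response field names are assumptions."""
    req = urllib.request.Request(
        f"{base_url}/v1/embed",
        data=json.dumps({"model": model, "input": texts}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return [item["embedding"] for item in json.load(resp)["data"]]


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0 and orthogonal directions score 0.0, which is why higher-dimensional embeddings, with more room to separate meanings, tend to rank near-duplicates and unrelated texts more accurately.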
New tools endpoints
We added four new tools endpoints to the API:
/v1/tools/web/crawl - Give it a list of URLs and it returns the contents of those URLs. This is useful for extracting and processing content from specific web pages you want to analyze or incorporate into your applications.
/v1/tools/web/search - Give it a search query and optional filters, and it returns top search results with content. Perfect for finding relevant information across the web, gathering current data, or performing research queries programmatically.
/v1/tools/web/research - Kick off a long-running web research task that runs asynchronously (think Deep Research in ChatGPT). This endpoint initiates comprehensive research on a topic, gathering information from multiple sources and synthesizing findings.
/v1/tools/web/research/{research_id} - Get the status of a research task (useful to poll the research task for completion, then see the research results). Use this endpoint to check on the progress of your research tasks and retrieve the final results once complete.
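The research endpoints above form a start-then-poll pattern, sketched below. The endpoint paths come from this changelog; the host, auth scheme, request field names, and the exact status values are assumptions to verify against the API reference.

```python
"""Sketch: starting an async research task and polling it to completion."""
import json
import time
import urllib.request

BASE_URL = "https://sierra.example.com"  # hypothetical SIERRA host


def _request(method: str, path: str, api_key: str, body: dict = None) -> dict:
    """Minimal JSON request helper for the SIERRA API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_terminal(status: str) -> bool:
    """Assumed terminal states for a research task."""
    return status in ("completed", "failed")


def run_research(api_key: str, topic: str, poll_seconds: int = 30) -> dict:
    """Kick off a research task, then poll its status endpoint until done."""
    task = _request("POST", "/v1/tools/web/research", api_key,
                    {"query": topic})  # request field name is an assumption
    research_id = task["research_id"]  # response field name is an assumption
    while True:
        status = _request("GET",
                          f"/v1/tools/web/research/{research_id}", api_key)
        if is_terminal(status.get("status", "")):
            return status
        time.sleep(poll_seconds)
```

Because research tasks are long-running, polling at a generous interval (or backing off over time) avoids hammering the status endpoint while the task gathers and synthesizes sources.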
You will see these tools in use in SIERRA soon, with the addition of web agents.
Explore
Ready to get started? Check out our API reference for detailed documentation and examples. We've also recently launched the Documents Service, another powerful tool for your AI workflows.
Your feedback helps us build better tools. If you have ideas, questions, or run into issues, please reach out. Our goal is to empower you to build custom automations and stay at the forefront of how modern companies leverage AI.