How to choose an LLM

When creating a customized agent, you need to select a model (also known as an LLM, or Large Language Model) to handle a given task: retrieving information from the web or a data source to answer a user's query, analyzing data, summarizing content, creating content, and much more.

It is therefore crucial to choose the right LLM for your use case. This page gives you some insights to help you select a model from the ones we offer.

Available LLMs

Below is the list of available LLMs with their main characteristics:

EU region

Model name           | Provider | Context window (max tokens)
---------------------|----------|----------------------------
GPT 4o (EU)          | Azure    | 128k
GPT 4o mini (EU)     | Azure    | 128k
Mistral Mini (EU)    | Azure    | 128k
Mistral Large (EU)   | Azure    | 128k
Claude Haiku (EU)    | AWS      | 200k
Claude Sonnet (EU)   | AWS      | 200k
Llama 3.3 70B (EU)   | Azure    | 128k
Mistral Mini (EU)    | Mistral  | 128k
Mistral Large (EU)   | Mistral  | 128k
Mistral Small (EU)   | Mistral  | 128k
Cohere Rerank 3 (EU) | Azure    | 8k per article

US region

NOT YET AVAILABLE FOR USE

Model name         | Provider  | Context window (max tokens)
-------------------|-----------|----------------------------
o1 (US)            | Azure     | 200k
GPT 4o (US)        | Azure     | 128k
GPT 4o mini (US)   | Azure     | 128k
Mistral Mini (US)  | Azure     | 128k
Mistral Large (US) | Azure     | 128k
Claude Haiku (US)  | Anthropic | 200k
Claude Sonnet (US) | Anthropic | 200k
Claude Haiku (US)  | AWS       | 200k
Claude Sonnet (US) | AWS       | 200k
Gemini Flash (US)  | Google    | 250k
Gemini Pro (US)    | Google    | 250k
Llama 3.3 70B (US) | Azure     | 128k
Llama 3.3 70B (US) | AWS       | 128k
Llama 3.1 8B (US)  | AWS       | 128k
Nova Lite (US)     | AWS       | 128k
Nova Pro (US)      | AWS       | 128k

Selection criteria

Selecting an LLM for an agent depends on the following criteria:

  • the type of task performed by the agent. Some LLMs excel at specific tasks, such as image analysis or creative writing.

  • the context window of the LLM. This is the total number of input and output tokens the LLM can process.

  • the price of the processed tokens. Some LLMs are more expensive than others.

  • the response time of the LLM. Some LLMs answer faster than others.

  • the stability of the LLM. A model is stable if, asked the same question twice, it gives two answers that are very similar in both content and structure.

  • the verbosity/conciseness of the answers. Some LLMs provide more concise answers, while others are more verbose.
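The criteria above can be combined into a simple weighted shortlist. The sketch below is purely illustrative: the scores and weights are made-up placeholders (not benchmark results from this page), and you should replace them with your own evaluations of each model on your own tasks.

```python
# Illustrative only: each criterion is rated 1-5 per model (higher is better,
# so "price" here means affordability). The numbers are placeholders.
CRITERIA_WEIGHTS = {"task_fit": 3, "context_window": 1, "price": 2, "speed": 2, "stability": 2}

MODEL_SCORES = {
    "GPT 4o":        {"task_fit": 5, "context_window": 3, "price": 2, "speed": 3, "stability": 5},
    "GPT 4o mini":   {"task_fit": 3, "context_window": 3, "price": 5, "speed": 5, "stability": 5},
    "Claude Sonnet": {"task_fit": 5, "context_window": 4, "price": 2, "speed": 3, "stability": 5},
}

def rank_models(scores, weights):
    """Return model names sorted by weighted score, best first."""
    def total(model):
        return sum(weights[c] * scores[model][c] for c in weights)
    return sorted(scores, key=total, reverse=True)

print(rank_models(MODEL_SCORES, CRITERIA_WEIGHTS))
```

With these placeholder numbers, the cheaper and faster model wins; shifting weight toward task fit would favor the larger models instead.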

Best performing models

Below are our insights on the best-performing models.

Type of task

Image analysis

If you intend for your agents to analyze images, we recommend choosing one of the following models, which tend to perform better for this task:

  • Claude Sonnet

  • Claude Haiku

  • GPT 4o

  • Gemini Flash

  • Gemini Pro

Creative writing

For creative writing tasks, we recommend choosing one of the smaller models, as they tend to perform better for these tasks:

  • GPT 4o mini

  • Claude Haiku

  • Gemini Flash

  • Llama 3.1 8B

  • Mistral Mini

  • Mistral Small

Knowledge-based tasks (RAG)

For tasks needing to retrieve information from a data source (RAG), we recommend using one of the larger models as they provide better answers and source their answers:

  • Claude Sonnet

  • GPT 4o

  • Gemini Pro

Context window

Some LLMs have larger context windows than others, which is useful if you intend to process large documents. Here is the list of models with larger context windows:

  • Claude Haiku - 200k tokens maximum

  • Claude Sonnet - 200k tokens maximum

  • Gemini Pro - 250k tokens maximum

  • Gemini Flash - 250k tokens maximum

Conversely, if you want to process smaller documents, you might want to choose an LLM with a smaller context window, such as:

  • GPT 4o mini - 128k tokens maximum

  • GPT 4o - 128k tokens maximum

  • Llama 3.1 70B - 128k tokens maximum

  • Llama 3.1 8B - 128k tokens maximum

  • Llama 3.3 70B - 128k tokens maximum

  • Mistral Mini - 128k tokens maximum

  • Mistral Small - 128k tokens maximum

  • Mistral Large - 128k tokens maximum

The context window size also matters if you want your agent to retain more information from earlier in the conversation.
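To estimate whether a document will fit a given context window, you can approximate its token count before sending it. The sketch below uses the rough "about 4 characters per token" heuristic; this is an assumption for illustration only, as actual counts depend on each model's tokenizer and language.

```python
# Rough fit check using the common ~4 characters per token heuristic.
# This is an approximation; real tokenizers vary by model and language.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_context(document: str, context_window: int, reserved_for_output: int = 4_000) -> bool:
    """True if the document plus a reserved output budget fits the window."""
    return estimate_tokens(document) + reserved_for_output <= context_window

doc = "word " * 100_000                # ~500,000 characters -> ~125k estimated tokens
print(fits_context(doc, 128_000))      # False: over budget on a 128k model
print(fits_context(doc, 200_000))      # True: fits comfortably in a 200k model
```

Reserving part of the window for the model's output matters: a document that technically fits a 128k window can still leave no room for the answer.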

Price

If one of your selection criteria is the price, you can choose one of the less expensive models per request, such as:

  • Claude Haiku

  • GPT 4o mini

  • Gemini Flash

  • Llama 3.1 8B

  • Mistral Mini

  • Mistral Small

Response time

If you want your agents to respond quickly, we recommend you choose one of the models with the best response times:

  • Gemini Flash

  • GPT 4o mini

Note that the differences in response time are very small for the other models.

Out of all the models, Mistral Large has the slowest response time.

Stability

The models that provide the most stable answers are:

  • Claude Haiku

  • Claude Sonnet

  • GPT 4o

  • GPT 4o mini

  • Gemini Pro
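One hedged way to check stability yourself is to ask a model the same question several times and compare the answers pairwise. The sketch below uses Python's standard-library `difflib.SequenceMatcher` as a crude similarity proxy; the sample answers are invented for illustration, and a real evaluation would use your own prompts and many runs.

```python
import difflib

# Crude stability proxy: SequenceMatcher ratio ranges from 0.0 (no overlap)
# to 1.0 (identical), reflecting similarity in wording and structure.
def answer_similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()

run1 = "Paris is the capital of France."
run2 = "The capital of France is Paris."

print(answer_similarity(run1, run1))   # 1.0 -- identical answers
print(answer_similarity(run1, run2))   # below 1.0 -- same content, reworded
```

Averaging this ratio over many repeated runs of the same prompt gives a rough, model-agnostic stability score to compare candidates with.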

Verbosity/Conciseness

If you want your agents to provide longer, more detailed answers, we recommend you choose one of the following models:

  • Claude Haiku

  • Claude Sonnet

  • GPT 4o mini

If you want your agents to provide more concise answers, you can choose one of the following models:

  • GPT 4o

  • Llama 3.3 70B

  • Gemini Flash

  • Mistral Mini

  • Mistral Small

  • Mistral Large

If you want the answers to be neither too verbose nor too concise, you can choose either Gemini Pro or Llama 3.1 8B.
