When creating a customized agent, you will need to select an LLM (Large Language Model): the model that will handle a given task, such as retrieving information from the web or a data source to generate an answer relevant to the user's query, analyzing data, summarizing content, creating content, and much more.
...
Model name | Provider | Context window (max number of tokens)
---|---|---
GPT 4o (EU) | Azure OpenAI | 128k
GPT 4o (East US) | Azure OpenAI | 128k
GPT 4o (South US) | Azure OpenAI | 128k
GPT 4o mini (EU) | Azure OpenAI | 128k
GPT 4o mini (East US) | Azure OpenAI | 128k
GPT 4o mini (South US) | Azure OpenAI | 128k
Mistral Mini (EU) | Mistral | 128k
Mistral Large (EU) | Mistral | 128k
Mistral Small (EU) | Mistral | 128k
Azure - Mistral Mini (US) | Azure | 128k
Azure - Mistral Large (US) | Azure | 128k
Claude Haiku (EU) | AWS | 200k
Claude Sonnet (EU) | AWS | 200k
Claude Haiku (US) | AWS | 200k
Claude Sonnet (US) | AWS | 200k
Groq - Llama 3.2 11B | Groq | 128k
Groq - Llama 3.2 90B | Groq | 128k
Gemini Flash (US) | Google | 250k
Gemini Pro (US) | Google | 250k
Llama 3.1 70B (US) | AWS | 128k
Llama 3.1 8B (US) | AWS | 128k
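The context window caps how many tokens (prompt plus generated answer) a model can process in a single call. As a rough sketch of what this means in practice, the helper below checks whether a prompt fits a model's window. The function names, the ~4-characters-per-token heuristic, and the answer budget are illustrative assumptions, not part of the product:

```python
# Rough context-window check (illustrative sketch, not an official API).
# Assumes ~4 characters per token, a common rule of thumb for English text.

CONTEXT_WINDOWS = {  # tokens, from the table above (subset)
    "GPT 4o (EU)": 128_000,
    "GPT 4o mini (EU)": 128_000,
    "Mistral Large (EU)": 128_000,
    "Claude Sonnet (EU)": 200_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_context(model: str, prompt: str, reserved_for_answer: int = 4_000) -> bool:
    """True if the prompt plus the reserved answer budget fits the model's window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + reserved_for_answer <= window

print(fits_context("GPT 4o (EU)", "Summarize this document. " * 100))
```

For a long source document, a 200k-window model (e.g. a Claude model) leaves more room than a 128k one; for precise counts you would use the provider's own tokenizer rather than this heuristic.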
Selection criteria
Note |
---|
Need more info from comparison tests (Alban will run them when he has time). |
...
Below are our insights on the best-performing models (these notes may be outdated).
Overall:

- GPT 4o is the best-performing model.
- GPT 4o mini offers good performance and is very fast and cheap.
- Mistral Large is slightly worse than GPT 4o but slightly cheaper.
- The Groq models are ideal when the text to generate is long.
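The insights above can be sketched as a simple selection rule. This is a hypothetical helper: the priorities and default choices are one possible reading of the guidance, not part of the product:

```python
# Hypothetical model-selection helper encoding the insights above.
# Priorities and defaults are illustrative only.

def pick_model(need_top_quality: bool = False,
               long_output: bool = False,
               cost_sensitive: bool = False) -> str:
    if long_output:
        return "Groq - Llama 3.2 90B"  # Groq models suit long generations
    if need_top_quality:
        return "GPT 4o (EU)"           # best-performing model overall
    if cost_sensitive:
        return "GPT 4o mini (EU)"      # good performance, very fast and cheap
    return "Mistral Large (EU)"        # slightly below GPT 4o, slightly cheaper
```

In practice you would weigh these criteria together (and per region), but a rule of this shape makes the trade-offs explicit.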