When creating a customized agent, you will need to select an LLM (Large Language Model): the model that will handle a given task, such as retrieving information from the web or a data source to generate an answer relevant to the user's query, analyzing data, summarizing content, creating content, and much more.
...
Model name | Provider | Context window (max number of tokens)
---|---|---
GPT 4o (EU) | Azure OpenAI | 128k
GPT 4o (East US) | Azure OpenAI | 128k
GPT 4o (South US) | Azure OpenAI | 128k
GPT 4o mini (EU) | Azure OpenAI | 128k
GPT 4o mini (East US) | Azure OpenAI | 128k
GPT 4o mini (South US) | Azure OpenAI | 128k
Mistral Mini (EU) | Mistral | 128k
Mistral Large (EU) | Mistral | 128k
Mistral Small (EU) | Mistral | 128k
Azure - Mistral Mini (US) | Azure | 128k
Azure - Mistral Large (US) | Azure | 128k
Claude Haiku (EU) | AWS | 200k
Claude Sonnet (EU) | AWS | 200k
Claude Haiku (US) | AWS | 200k
Claude Sonnet (US) | AWS | 200k
Groq - Llama 3.2 11B | Groq | 128k
Groq - Llama 3.2 90B | Groq | 128k
Gemini Flash (US) | Google | 250k
Gemini Pro (US) | Google | 250k
Llama 3.1 70B (US) | AWS | 128k
Llama 3.1 8B (US) | AWS | 128k
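The context window caps how many tokens (prompt plus generated answer) a model can process in a single call. As a rough sketch of what this means in practice, the helper below checks whether a prompt fits a model's window. The function names, the ~4-characters-per-token heuristic, and the answer budget are illustrative assumptions, not part of the product:

```python
# Rough context-window check (illustrative sketch, not an official API).
# Assumes ~4 characters per token, a common rule of thumb for English text.

CONTEXT_WINDOWS = {  # tokens, from the table above (subset)
    "GPT 4o (EU)": 128_000,
    "GPT 4o mini (EU)": 128_000,
    "Mistral Large (EU)": 128_000,
    "Claude Sonnet (EU)": 200_000,
}

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_context(model: str, prompt: str, reserved_for_answer: int = 4_000) -> bool:
    """True if the prompt plus the reserved answer budget fits the model's window."""
    window = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + reserved_for_answer <= window

print(fits_context("GPT 4o (EU)", "Summarize this document. " * 100))
```

For a long source document, a 200k-window model (e.g. a Claude model) leaves more room than a 128k one; for precise counts you would use the provider's own tokenizer rather than this heuristic.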
Selection criteria
Note |
---|
Need more info from comparison tests (Alban will run them when he has time). |
...
Below are our insights on the best-performing models (these notes may be outdated).
Overall:

- GPT 4o is the best-performing model.
- GPT 4o mini offers good performance and is very fast and cheap.
- Mistral Large is slightly worse than GPT 4o but slightly cheaper.
- The Groq models are ideal when the text to generate is long.
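The insights above can be sketched as a simple selection rule. This is a hypothetical helper: the priorities and default choices are one possible reading of the guidance, not part of the product:

```python
# Hypothetical model-selection helper encoding the insights above.
# Priorities and defaults are illustrative only.

def pick_model(need_top_quality: bool = False,
               long_output: bool = False,
               cost_sensitive: bool = False) -> str:
    if long_output:
        return "Groq - Llama 3.2 90B"  # Groq models suit long generations
    if need_top_quality:
        return "GPT 4o (EU)"           # best-performing model overall
    if cost_sensitive:
        return "GPT 4o mini (EU)"      # good performance, very fast and cheap
    return "Mistral Large (EU)"        # slightly below GPT 4o, slightly cheaper
```

In practice you would weigh these criteria together (and per region), but a rule of this shape makes the trade-offs explicit.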