Ollama models report

Abstract

This report provides a comprehensive overview of the AI models listed on the Ollama website, focusing on their types, developers, parameter sizes, and key capabilities. The models are categorized into general language models, code-specific models, multimodal models, embedding models, and other specialized models to help users understand their applications and strengths.

Introduction

Research suggests that deepseek-r1, developed by DeepSeek and available in sizes from 1.5B to 671B parameters, excels in reasoning. It seems likely that llama3.3, a 70B model from Meta, performs similarly to Llama 3.1 405B, making it well suited to general language tasks. The evidence leans toward phi4, a 14B model from Microsoft, being strong in complex reasoning, especially math. Many models, such as qwen2.5 from Alibaba Cloud (0.5B to 72B), offer multilingual support, enhancing global usability.

General Language Models

General language models are designed for natural language processing tasks, such as text generation and conversation. Below is a table summarizing key models in this category:

| Model Name | Developer | Parameter Size | Description |
| --- | --- | --- | --- |
| deepseek-r1 | DeepSeek | 1.5B to 671B | Reasoning models, comparable to OpenAI-o1. |
| llama3.3 | Meta | 70B | State-of-the-art, similar to Llama 3.1 405B. |
| phi4 | Microsoft | 14B | Excels in complex reasoning, especially math. |
| qwen2.5 | Alibaba Cloud | 0.5B to 72B | Multilingual, large context window. |
| mistral | Mistral AI | 7B | High-performing, version 0.3. |
| gemma2 | Google | 2B, 9B, 27B | Lightweight, state-of-the-art open models. |
| hermes3 | Nous Research | 3B to 405B | Performant in various benchmarks. |
| wizardlm2 | Microsoft AI | Not specified | State-of-the-art for complex chat and reasoning. |
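
As a concrete illustration of how these general language models are typically used, the following minimal Python sketch sends a single chat message to a locally running Ollama server. It assumes the server is on the default port 11434 and that the chosen model (llama3.3 here, taken from the table above) has already been pulled with `ollama pull llama3.3`; treat it as an illustrative sketch rather than an official recipe.

```python
import requests

# Minimal sketch: chat with a general language model through a local Ollama
# server (assumed to be running on the default port 11434, with the model
# already pulled, e.g. `ollama pull llama3.3`).
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.3",
        "messages": [
            {"role": "user", "content": "Summarize transfer learning in two sentences."}
        ],
        "stream": False,  # return one JSON object instead of a token stream
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["message"]["content"])
```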

Code-Specific Models

These models are optimized for code generation and related tasks, crucial for developers and coding applications:

| Model Name | Developer | Parameter Size | Description |
| --- | --- | --- | --- |
| deepseek-coder-v2 | DeepSeek | 16B, 236B | MoE code model, comparable to GPT-4 Turbo. |
| codellama | Meta | 7B, 13B, 34B, 70B | Designed for code generation tasks. |
| codegemma | Google | 2B, 7B | Lightweight models for coding tasks. |
| qwen2.5-coder | Alibaba Cloud | 0.5B to 32B | Specialized for code generation and reasoning. |
| starcoder2 | BigCode | 3B, 7B, 15B | State-of-the-art code generation. |
| codebooga | Unknown | 34B | High-performing code instruct model. |
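
The same local API can serve code generation. The sketch below is a minimal example that assumes a local Ollama server and the qwen2.5-coder model from the table above; it requests a small code completion through the /api/generate endpoint.

```python
import requests

# Minimal sketch: request a code completion from a code-specific model via
# the /api/generate endpoint of a local Ollama server (model assumed pulled).
payload = {
    "model": "qwen2.5-coder",
    "prompt": "Write a Python function that returns the n-th Fibonacci number iteratively.",
    "stream": False,
}
result = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
result.raise_for_status()
print(result.json()["response"])
```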

Multimodal Models

Multimodal models handle both text and image inputs, expanding their utility in vision-language tasks:

| Model Name | Developer | Parameter Size | Description |
| --- | --- | --- | --- |
| llava | Unknown | 7B to 34B | Combines vision encoder and Vicuna for understanding. |
| llava-phi3 | Unknown | 3.8B | Fine-tuned from Phi 3 Mini for multimodal tasks. |
| minicpm-v | Unknown | 1.8B | Designed for vision-language understanding. |
| moondream | Unknown | 1.8B | Efficient for edge devices in vision tasks. |
| bakllava | Unknown | 7B | Augments Mistral 7B with LLaVA for multimodal use. |
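
Vision-capable models such as llava accept images alongside text. The hedged sketch below assumes a local Ollama server and a placeholder image file (photo.jpg); it base64-encodes the image and passes it in the images field of /api/generate, which is how Ollama exposes image inputs for multimodal models.

```python
import base64
import requests

# Minimal sketch: describe a local image with a multimodal model.
# "photo.jpg" is a placeholder path; replace it with a real image file.
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

reply = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llava",
        "prompt": "Describe this image in one sentence.",
        "images": [image_b64],  # base64-encoded images for vision models
        "stream": False,
    },
    timeout=300,
)
reply.raise_for_status()
print(reply.json()["response"])
```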

Embedding Models

Embedding models map texts to vectors, useful for tasks like clustering or semantic search:

| Model Name | Developer | Parameter Size | Description |
| --- | --- | --- | --- |
| nomic-embed-text | Nomic AI | Not specified | High-performing open embedding model. |
| mxbai-embed-large | Mixedbread AI | 335M | State-of-the-art large embedding model. |
| snowflake-arctic-embed2 | Snowflake | 568M | Frontier embedding model with multilingual support. |
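
To make the embedding use case concrete, the sketch below embeds two sentences and scores them with cosine similarity, the basic building block of semantic search and clustering. It assumes a local Ollama server and the nomic-embed-text model from the table above.

```python
import math
import requests

# Minimal sketch: embed two texts with a local Ollama embedding model and
# compare them with cosine similarity.
def embed(text: str) -> list[float]:
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=60,
    )
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

print(cosine(embed("How do I reset my password?"),
             embed("Steps to recover a forgotten account password")))
```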

Other Specialized Models

This category includes models for niche applications like medical, safety, and content conversion:

| Model Name | Developer | Parameter Size | Description |
| --- | --- | --- | --- |
| medllama2 | Unknown | 7B | Fine-tuned Llama 2 for medical questions. |
| meditron | EPFL | 7B, 70B | Adapted from Llama 2 for the medical domain. |
| llama-guard3 | Meta | 1B, 8B | Content safety classification models. |
| bespoke-minicheck | Bespoke Labs | 7B | State-of-the-art fact-checking model. |
| reader-lm | Jina AI | 0.5B, 1.5B | Converts HTML content to Markdown. |
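
As an example of one of these specialized models in action, the sketch below feeds a small HTML snippet to reader-lm through a local Ollama server. Passing the raw HTML as the prompt follows the model's described purpose (HTML-to-Markdown conversion), but the exact prompting convention is an assumption here, so treat the example as illustrative.

```python
import requests

# Minimal sketch: convert an HTML snippet to Markdown with reader-lm via a
# local Ollama server. The raw-HTML-as-prompt convention is an assumption.
html = "<h1>Release notes</h1><ul><li>Faster startup</li><li>Bug fixes</li></ul>"
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "reader-lm", "prompt": html, "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # expected: the Markdown rendition of the HTML
```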

Analysis of AI Models from Ollama Website

This survey note provides an in-depth analysis of the AI models listed on the Ollama website, as of February 26, 2025, based on the provided descriptions and additional research. The models are categorized into general language models, code-specific models, multimodal models, embedding models, and other specialized models, ensuring a thorough understanding of their capabilities and applications.

General Language Models

General language models are pivotal for natural language processing tasks, such as text generation, conversation, and reasoning. The following models were identified:

  • deepseek-r1 (DeepSeek): This family of reasoning models, with parameter sizes ranging from 1.5B to 671B, is developed by DeepSeek and is known for its strong reasoning capabilities, comparable to OpenAI-o1. It includes dense models distilled from DeepSeek-R1, based on Llama and Qwen, making it suitable for complex problem-solving tasks. Research suggests it performs well in logical inference, as noted in various benchmarks (DeepSeek R1: open source reasoning model | LM Studio Blog).

  • llama3.3 (Meta): A 70B parameter model from Meta, released as part of the Llama series, it is state-of-the-art and performs similarly to Llama 3.1 405B, offering high efficiency for general language tasks. It supports a context length of 128,000 tokens, enhancing its ability to handle lengthy texts and complex reasoning, as highlighted in recent comparisons (8 Top Open-Source LLMs for 2024 and Their Uses | DataCamp).

  • phi4 (Microsoft): A 14B parameter model from Microsoft, phi4 is a state-of-the-art open model excelling in complex reasoning, particularly in mathematics. Trained on synthetic datasets, public domain websites, and academic books/Q&A datasets, it supports a 16K token context window, ideal for long-text processing (Microsoft phi-4: The best smallest LLM | Medium).

  • qwen2.5 (Alibaba Cloud): With sizes from 0.5B to 72B, this model offers multilingual support and a large context window, pretrained on up to 18 trillion tokens. It’s designed for diverse language tasks, enhancing global usability (Top 10 Open-Source LLMs Models For 2025 | Analytics Vidhya).

  • mistral (Mistral AI): A 7B model, version 0.3, known for high performance, it is part of Mistral AI’s suite, which also includes MoE models like Mixtral, noted for their efficiency in benchmarks (LLM Benchmarks in 2024: Overview, Limits and Model Comparison | Vellum AI).

  • gemma2 (Google): Lightweight models with sizes 2B, 9B, and 27B, developed by Google, these are state-of-the-art open models suitable for various NLP tasks, offering efficiency for resource-constrained environments (Best Open Source LLMs of 2024 — Klu).

  • hermes3 (Nous Research): Ranging from 3B to 405B, these models from Nous Research are noted for their performance across various benchmarks, making them versatile for general language tasks (The Top 10 Open Source LLMs: 2024 Edition | Scribble Data).

  • wizardlm2 (Microsoft AI): A state-of-the-art model for complex chat, multilingual, reasoning, and agent use cases. Specific parameter sizes were not detailed in the listing, but it is part of Microsoft AI's efforts to enhance conversational AI (Top 5 Open-Source LLMs to watch out for in 2024 — Upstage).

Code-Specific Models

Code-specific models are optimized for tasks such as code generation, reasoning, and code fixing, essential for software development; see the table earlier in this report.

Multimodal Models

Multimodal models integrate vision and language, expanding their utility beyond text; see the table earlier in this report.

Embedding Models

Embedding models map texts to vectors, crucial for tasks like semantic search and clustering; see the table earlier in this report.

Other Specialized Models

This category includes models for niche applications, such as medical question answering, content safety, and content conversion; see the table earlier in this report.

Summary

This analysis covers the models listed on the Ollama website and provides detailed insights into their capabilities, supported by recent benchmarks and comparisons as of February 2025.

References

AI-Pro. (n.d.). A comparison of all leading LLMs. https://ai-pro.org/learn-ai/articles/a-comprehensive-comparison-of-all-llms/

Analytics Vidhya. (2024, April). Top 10 open-source LLMs models for 2025. https://www.analyticsvidhya.com/blog/2024/04/top-open-source-llms/

DataCamp. (2024). 8 top open-source LLMs for 2024 and their uses. https://www.datacamp.com/blog/top-open-source-llms

DagsHub. (2024). Best open source LLMs of 2024 (costs, performance, latency). https://dagshub.com/blog/best-open-source-llms/

Fello AI. (2024, August). Ultimate comparison of the best LLM AI models in August 2024. https://felloai.com/2024/08/ultimate-comparison-of-the-best-llm-ai-models-in-august-2024/

Genai.works. (2024). Best LLM 2024: Top models for speed, accuracy, and price. Medium. https://medium.com/@genai.works/best-llm-2024-top-models-for-speed-accuracy-and-price-d07ae29f41c4

GPU-Mart. (2024). Top open-source LLMs for 2024. https://www.gpu-mart.com/blog/top-open-source-llms-for-2024

Hugging Face. (n.d.). The big benchmarks collection - an open-llm-leaderboard collection. https://huggingface.co/collections/open-llm-leaderboard/the-big-benchmarks-collection-64faca6335a7fc7d4ffe974a

HumanLoop. (n.d.). LLM benchmarks: Understanding language model performance. https://humanloop.com/blog/llm-benchmarks

Hyscaler. (2024). Best open source LLMs in 2024: A comprehensive guide. https://hyscaler.com/insights/best-open-source-llms-in-2024/

Klu. (2024). Best open source LLMs of 2024. https://klu.ai/blog/open-source-llm-models

LM Studio. (n.d.). DeepSeek R1: Open source reasoning model. https://lmstudio.ai/blog/deepseek-r1

Microsoft Community Hub. (n.d.). Introducing Phi-4: Microsoft’s newest small language model specializing in complex reasoning. https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%25E2%2580%2599s-newest-small-language-model-specializing-in-comple/4357090

Monster API. (n.d.). Latest updates on Monster API. https://blog.monsterapi.ai/blogs/

Pankaj, S. (n.d.). Deepseek-R1: The best open-source model, but how to use it?. Medium. https://medium.com/accredian/deepseek-r1-the-best-open-source-model-but-how-to-use-it-fb0dd28c1557

Pankaj, S. (n.d.). Microsoft phi-4: The best smallest LLM. Medium. https://medium.com/data-science-in-your-pocket/microsoft-phi-4-the-best-smallest-llm-1cbaa5706e9e

Scribble Data. (2024). The top 10 open source LLMs: 2024 edition. https://www.scribbledata.io/blog/the-top-10-open-source-llms-2024-edition/

Shakudo. (2025, February). Top 9 large language models as of February 2025. https://www.shakudo.io/blog/top-9-large-language-models

Symflower. (2024). Comparing LLM benchmarks for software development. https://symflower.com/en/company/blog/2024/comparing-llm-benchmarks/

Symflower. (2024). What are the most popular LLM benchmarks?. https://symflower.com/en/company/blog/2024/llm-benchmarks/

TIMETOACT GROUP. (2024, September). LLM performance benchmarks – September 2024 update. https://www.timetoact-group.at/en/details/llm-benchmarks-september-2024

Trustbit. (2024, July). LLM benchmarks: July 2024. https://www.trustbit.tech/en/llm-leaderboard-juli-2024

Unite.AI. (2025, February). 5 best open source LLMs (February 2025). https://www.unite.ai/best-open-source-llms/

Upstage. (2024). Top 5 open-source LLMs to watch out for in 2024. https://www.upstage.ai/blog/insight/top-open-source-llms-2024

Vellum AI. (2024). LLM benchmarks in 2024: Overview, limits and model comparison. https://www.vellum.ai/blog/llm-benchmarks-overview-limits-and-model-comparison

YourGPT. (2024). LLM leaderboard: Compare top AI models for 2024. https://yourgpt.ai/tools/llm-comparison-and-leaderboard

Zapier. (2025). The best large language models (LLMs) in 2025. https://zapier.com/blog/best-llm/

This post is licensed under CC BY 4.0 by the author.