Best Large Language Models - Page 9

Compare the Top Large Language Models as of July 2025 - Page 9

  • 1
    VideoPoet
    VideoPoet is a simple modeling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator. It contains a few simple components. An autoregressive language model learns across video, image, audio, and text modalities to autoregressively predict the next video or audio token in the sequence. A mixture of multimodal generative learning objectives are introduced into the LLM training framework, including text-to-video, text-to-image, image-to-video, video frame continuation, video inpainting and outpainting, video stylization, and video-to-audio. Furthermore, such tasks can be composed together for additional zero-shot capabilities. This simple recipe shows that language models can synthesize and edit videos with a high degree of temporal consistency.
  • 2
    Aya

    Aya

    Cohere AI

    Aya is a new state-of-the-art, open-source, massively multilingual, generative large language research model (LLM) covering 101 different languages — more than double the number of languages covered by existing open-source models. Aya helps researchers unlock the powerful potential of LLMs for dozens of languages and cultures largely ignored by most advanced models on the market today. We are open-sourcing both the Aya model, as well as the largest multilingual instruction fine-tuned dataset to-date with a size of 513 million covering 114 languages. This data collection includes rare annotations from native and fluent speakers all around the world, ensuring that AI technology can effectively serve a broad global audience that have had limited access to-date.
  • 3
    Tune AI

    Tune AI

    NimbleBox

    Leverage the power of custom models to build your competitive advantage. With our enterprise Gen AI stack, go beyond your imagination and offload manual tasks to powerful assistants instantly – the sky is the limit. For enterprises where data security is paramount, fine-tune and deploy generative AI models on your own cloud, securely.
  • 4
    Command R

    Command R

    Cohere AI

    Command’s model outputs come with clear citations that mitigate the risk of hallucinations and enable the surfacing of additional context from the source materials. Command can write product descriptions, help draft emails, suggest example press releases, and much more. Ask Command multiple questions about a document to assign a category to the document, extract a piece of information, or answer a general question about the document. Where answering a few questions about a document can save you a few minutes, doing it for thousands of documents can save a company years. This family of scalable models balances high efficiency with strong accuracy to enable enterprises to move from proof of concept into production-grade AI.
  • 5
    CodeGemma
    CodeGemma is a collection of powerful, lightweight models that can perform a variety of coding tasks like fill-in-the-middle code completion, code generation, natural language understanding, mathematical reasoning, and instruction following. CodeGemma has 3 model variants, a 7B pre-trained variant that specializes in code completion and generation from code prefixes and/or suffixes, a 7B instruction-tuned variant for natural language-to-code chat and instruction following; and a state-of-the-art 2B pre-trained variant that provides up to 2x faster code completion. Complete lines, and functions, and even generate entire blocks of code, whether you're working locally or using Google Cloud resources. Trained on 500 billion tokens of primarily English language data from web documents, mathematics, and code, CodeGemma models generate code that's not only more syntactically correct but also semantically meaningful, reducing errors and debugging time.
  • 6
    Defense Llama
    Scale AI is proud to announce Defense Llama, the Large Language Model (LLM) built on Meta’s Llama 3 that is specifically customized and fine-tuned to support American national security missions. Defense Llama, available exclusively in controlled U.S. government environments within Scale Donovan, empowers our service members and national security professionals to apply the power of generative AI to their unique use cases, such as planning military or intelligence operations and understanding adversary vulnerabilities. Defense Llama was trained on a vast dataset, including military doctrine, international humanitarian law, and relevant policies designed to align with the Department of Defense (DoD) guidelines for armed conflict as well as the DoD’s Ethical Principles for Artificial Intelligence. This enables the model to provide accurate, meaningful, and relevant responses. Scale is proud to enable U.S. national security personnel to use generative AI safely and securely for defense.
  • 7
    OpenAI o3-mini
    OpenAI o3-mini is a lightweight version of the advanced o3 AI model, offering powerful reasoning capabilities in a more efficient and accessible package. Designed to break down complex instructions into smaller, manageable steps, o3-mini excels in coding tasks, competitive programming, and problem-solving in mathematics and science. This compact model provides the same high-level precision and logic as its larger counterpart but with reduced computational requirements, making it ideal for use in resource-constrained environments. With built-in deliberative alignment, o3-mini ensures safe, ethical, and context-aware decision-making, making it a versatile tool for developers, researchers, and businesses seeking a balance between performance and efficiency.
  • 8
    Hunyuan-TurboS
    Tencent's Hunyuan-TurboS is a next-generation AI model designed to offer rapid responses and outstanding performance in various domains such as knowledge, mathematics, and creative tasks. Unlike previous models that require "slow thinking," Hunyuan-TurboS enhances response speed, doubling word output speed and reducing first-word latency by 44%. Through innovative architecture, it provides superior performance while lowering deployment costs. This model combines fast thinking (intuition-based responses) with slow thinking (logical analysis), ensuring quicker, more accurate solutions across diverse scenarios. Hunyuan-TurboS excels in benchmarks, competing with leading models like GPT-4 and DeepSeek V3, making it a breakthrough in AI-driven performance.
  • 9
    Amazon Titan
    Amazon Titan is a series of advanced foundation models (FMs) from AWS, designed to enhance generative AI applications with high performance and flexibility. Built on AWS's 25 years of AI and machine learning experience, Titan models support a range of use cases such as text generation, summarization, semantic search, and image generation. Titan models are optimized for responsible AI use, incorporating built-in safety features and fine-tuning capabilities. They can be customized with your own data through Retrieval Augmented Generation (RAG) to improve accuracy and relevance, making them ideal for both general-purpose and specialized AI tasks.
  • 10
    OpenAI o4-mini
    The o4-mini model is a compact and efficient version of the o3 model, released following the launch of GPT-4.1. It offers enhanced reasoning capabilities, with improved performance in tasks that require complex reasoning and problem-solving. The o4-mini is designed to meet the growing demand for advanced AI solutions, serving as a more efficient alternative while maintaining the capabilities of its predecessor. This model is part of OpenAI's strategy to refine and advance their AI technologies ahead of the anticipated GPT-5 launch.
  • 11
    Grok 4
    Grok 4, developed by xAI, is an advanced AI model designed to provide highly accurate and contextually relevant answers to a wide range of questions. Building on its predecessors, it offers enhanced reasoning capabilities, improved natural language understanding, and the ability to process complex queries with greater depth. Accessible through platforms like grok.com, x.com, and mobile apps, Grok 4 supports features such as voice mode (iOS only) and specialized modes like DeepSearch for iterative web analysis. With a focus on accelerating human scientific discovery, it delivers concise, truthful responses, making it a powerful tool for users seeking reliable insights.
  • 12
    Llama

    Llama

    Meta

    Llama (Large Language Model Meta AI) is a state-of-the-art foundational large language model designed to help researchers advance their work in this subfield of AI. Smaller, more performant models such as Llama enable others in the research community who don’t have access to large amounts of infrastructure to study these models, further democratizing access in this important, fast-changing field. Training smaller foundation models like Llama is desirable in the large language model space because it requires far less computing power and resources to test new approaches, validate others’ work, and explore new use cases. Foundation models train on a large set of unlabeled data, which makes them ideal for fine-tuning for a variety of tasks. We are making Llama available at several sizes (7B, 13B, 33B, and 65B parameters) and also sharing a Llama model card that details how we built the model in keeping with our approach to Responsible AI practices.
  • 13
    OPT

    OPT

    Meta

    Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.
  • 14
    T5

    T5

    Google

    With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.
  • 15
    PanGu-α

    PanGu-α

    Huawei

    PanGu-α is developed under the MindSpore and trained on a cluster of 2048 Ascend 910 AI processors. The training parallelism strategy is implemented based on MindSpore Auto-parallel, which composes five parallelism dimensions to scale the training task to 2048 processors efficiently, including data parallelism, op-level model parallelism, pipeline model parallelism, optimizer model parallelism and rematerialization. To enhance the generalization ability of PanGu-α, we collect 1.1TB high-quality Chinese data from a wide range of domains to pretrain the model. We empirically test the generation ability of PanGu-α in various scenarios including text summarization, question answering, dialogue generation, etc. Moreover, we investigate the effect of model scales on the few-shot performances across a broad range of Chinese NLP tasks. The experimental results demonstrate the superior capabilities of PanGu-α in performing various tasks under few-shot or zero-shot settings.
  • 16
    Megatron-Turing
    Megatron-Turing Natural Language Generation model (MT-NLG), is the largest and the most powerful monolithic transformer English language model with 530 billion parameters. This 105-layer, transformer-based MT-NLG improves upon the prior state-of-the-art models in zero-, one-, and few-shot settings. It demonstrates unmatched accuracy in a broad set of natural language tasks such as, Completion prediction, Reading comprehension, Commonsense reasoning, Natural language inferences, Word sense disambiguation, etc. With the intent of accelerating research on the largest English language model till date and enabling customers to experiment, employ and apply such a large language model on downstream language tasks - NVIDIA is pleased to announce an Early Access program for its managed API service to MT-NLG mode.
  • 17
    Galactica
    Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. Galactica is a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%.
  • 18
    PanGu-Σ

    PanGu-Σ

    Huawei

    Significant advancements in the field of natural language processing, understanding, and generation have been achieved through the expansion of large language models. This study introduces a system which utilizes Ascend 910 AI processors and the MindSpore framework to train a language model with over a trillion parameters, specifically 1.085T, named PanGu-{\Sigma}. This model, which builds upon the foundation laid by PanGu-{\alpha}, takes the traditionally dense Transformer model and transforms it into a sparse one using a concept known as Random Routed Experts (RRE). The model was efficiently trained on a dataset of 329 billion tokens using a technique called Expert Computation and Storage Separation (ECSS), leading to a 6.3-fold increase in training throughput via heterogeneous computing. Experimentation indicates that PanGu-{\Sigma} sets a new standard in zero-shot learning for various downstream Chinese NLP tasks.
  • 19
    OpenELM

    OpenELM

    Apple

    OpenELM is an open-source language model family developed by Apple. It uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy compared to existing open language models of similar size. OpenELM is trained on publicly available datasets and achieves state-of-the-art performance for its size.
  • 20
    LTM-2-mini

    LTM-2-mini

    Magic AI

    LTM-2-mini is a 100M token context model: LTM-2-mini. 100M tokens equals ~10 million lines of code or ~750 novels. For each decoded token, LTM-2-mini’s sequence-dimension algorithm is roughly 1000x cheaper than the attention mechanism in Llama 3.1 405B1 for a 100M token context window. The contrast in memory requirements is even larger – running Llama 3.1 405B with a 100M token context requires 638 H100s per user just to store a single 100M token KV cache.2 In contrast, LTM requires a small fraction of a single H100’s HBM per user for the same context.
  • 21
    OpenAI o3-mini-high
    The o3-mini-high model from OpenAI advances AI reasoning by refining deep problem-solving in coding, mathematics, and complex tasks. It features adaptive thinking time with adjustable reasoning modes (low, medium, high) to optimize performance based on task complexity. Outperforming the o1 series by 200 Elo points on Codeforces, it delivers high efficiency at a lower cost while maintaining speed and accuracy. As part of the o3 family, it pushes AI problem-solving boundaries while remaining accessible, offering a free tier and expanded limits for Plus subscribers.
  • 22
    Grounded Language Model (GLM)
    Contextual AI introduces its Grounded Language Model (GLM), engineered specifically to minimize hallucinations and deliver highly accurate, source-based responses for retrieval-augmented generation (RAG) and agentic applications. The GLM prioritizes faithfulness to the provided data, ensuring responses are grounded in specific knowledge sources and backed by inline citations. With state-of-the-art performance on the FACTS groundedness benchmark, the GLM outperforms other foundation models in scenarios requiring high accuracy and reliability. The model is designed for enterprise use cases like customer service, finance, and engineering, where trustworthy and precise responses are critical to minimizing risks and improving decision-making.
  • 23
    ERNIE 4.5 Turbo
    ERNIE 4.5 Turbo, unveiled by Baidu at the 2025 Baidu Create conference, is a cutting-edge AI model designed to handle a variety of data inputs, including text, images, audio, and video. It offers powerful multimodal processing capabilities that enable it to perform complex tasks across industries such as customer support automation, content creation, and data analysis. With enhanced reasoning abilities and reduced hallucinations, ERNIE 4.5 Turbo ensures that businesses can achieve higher accuracy and reliability in AI-driven processes. Additionally, this model is priced at just 1% of GPT-4.5’s cost, making it a highly cost-effective alternative for enterprises looking for top-tier AI performance.
  • 24
    Chinchilla

    Chinchilla

    Google DeepMind

    Chinchilla is a large language model. Chinchilla uses the same compute budget as Gopher but with 70B parameters and 4× more more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.