[B! reasoning model] arrowKato's Bookmarks

arrowKato id:arrowKato

arrowKato's bookmarks tagged "reasoning model" (3)

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
In recent years, Large Language Models (LLMs) have been undergoing rapid iteration and evolution (OpenAI, 2024a; Anthropic, 2024; Google, 2024), progressively diminishing the gap towards Artificial General Intelligence (AGI). Recently, post-training has emerged as an important component of the full training pipeline. It has been shown to enhance accuracy on reasoning tasks, align with social value
arrowKato 2025/01/27
The DeepSeek R1 paper

LLM

reasoning model

DeepSeek-R1
GitHub - MoonshotAI/Kimi-k1.5
There are a few key ingredients about the design and training of k1.5. Long context scaling. We scale the context window of RL to 128k and observe continued improvement of performance with an increased context length. A key idea behind our approach is to use partial rollouts to improve training efficiency---i.e., sampling new trajectories by reusing a large chunk of previous trajectories, avoiding
arrowKato 2025/01/21
I'm not sure whether it will be released as OSS, but this is a reasoning model from someone other than OpenAI or Google.

LLM

reasoning model
GitHub - deepseek-ai/DeepSeek-R1
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Z
arrowKato 2025/01/21
I didn't expect an OSS reasoning model to appear this soon. And its performance is on par with o1.

LLM

DeepSeek

reasoning model