arXiv.org e-Print archive: new entries and ratings - Hatena Bookmark

Outcome-based Reinforcement Learning to Predict the Future
3 users
arxiv.org

Reinforcement learning with verifiable rewards (RLVR) has boosted math and coding in large language models, yet there has been little effort to extend RLVR into messier, real-world domains like forecasting. One sticking point is that outcome-based reinforcement learning for forecasting must learn from binary, delayed, and noisy rewards, a regime where standard fine-tuning is brittle. We show that
- Learning
- 2025/05/28 12:20

Harnessing the Universal Geometry of Embeddings
7 users
arxiv.org

We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity
- Learning
- 2025/05/22 06:05
- Read later
Robin: A multi-agent system for automating scientific discovery
3 users
arxiv.org

Scientific discovery is driven by the iterative process of background research, hypothesis generation, experimentation, and data analysis. Despite recent advancements in applying artificial intelligence to scientific discovery, no system has yet automated all of these stages in a single workflow. Here, we introduce Robin, the first multi-agent system capable of fully automating the key intellectua
- Learning
- 2025/05/21 11:11
- Read later
LLMs Get Lost In Multi-Turn Conversation
4 users
arxiv.org

Large Language Models (LLMs) are conversational interfaces. As such, LLMs have the potential to assist their users not only when they can fully specify the task at hand, but also to help them define, explore, and refine what they need through multi-turn conversational exchange. Although analysis of LLM conversation logs has confirmed that underspecification occurs frequently in user instructions,
- Learning
- 2025/05/15 12:32
- AI
DynamicRAG: Leveraging Outputs of Large Language Model as Feedback for Dynamic Reranking in Retrieval-Augmented Generation
3 users
arxiv.org

Retrieval-augmented generation (RAG) systems combine large language models (LLMs) with external knowledge retrieval, making them highly effective for knowledge-intensive tasks. A crucial but often under-explored component of these systems is the reranker. Since irrelevant documents in RAG systems can mislead the generator, the reranker plays a vital role in refining retrieved documents to enhance
- Technology
- 2025/05/14 21:07
- Read later
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
3 users
arxiv.org

Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of hig
- Learning
- 2025/05/11 22:06
- Read later
Machine Learning: a Lecture Note
10 users
arxiv.org

This lecture note is intended to prepare early-year master's and PhD students in data science or a related discipline with foundational ideas in machine learning. It starts with basic ideas in modern machine learning with classification as a main target task. These basic ideas include loss formulation, backpropagation, stochastic gradient descent, generalization, model selection as well as fundame
- Technology
- 2025/05/08 23:14
- Machine learning
MVDRAM: Enabling GeMV Execution in Unmodified DRAM for Low-Bit LLM Acceleration
3 users
arxiv.org

General matrix-vector multiplication (GeMV) remains a critical latency bottleneck in large language model (LLM) inference, even with quantized low-bit models. Processing-Using-DRAM (PUD), an analog in-DRAM computing technique, has the potential to repurpose on-device DRAM as a GeMV engine, offering additional high-throughput processing capabilities to widespread consumer devices without DRAM modif
- Technology
- 2025/05/05 11:44
UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents
3 users
arxiv.org

Usability testing is a fundamental research method that user experience (UX) researchers use to evaluate and iterate a web design, but how to evaluate and iterate the usability testing study design itself? Recent advances in Large Language Model-simulated Agent (LLM Agent) research inspired us to design UXAgent to support UX researchers in evaluating and reiterating the
- Technology
- 2025/04/28 09:06
PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch
4 users
arxiv.org

CUDA Graphs -- a recent hardware feature introduced for NVIDIA GPUs -- aim to reduce CPU launch overhead by capturing and launching a series of GPU tasks (kernels) as a DAG. However, deploying CUDA Graphs faces several challenges today due to the static structure of a graph. It also incurs performance overhead due to data copy. In fact, we show a counter-intuitive result -- deploying CUDA Graphs h
- Technology
- 2025/04/26 00:33
- Read later
AgentA/B: Automated and Scalable Web A/BTesting with Interactive LLM Agents
3 users
arxiv.org

A/B testing experiment is a widely adopted method for evaluating UI/UX design decisions in modern web applications. Yet, traditional A/B testing remains constrained by its dependence on the large-scale and live traffic of human participants, and the long time of waiting for the testing result. Through formative interviews with six experienced industry practitioners, we identified critical bottlene
- Technology
- 2025/04/23 09:17
- Read later
BitNet b1.58 2B4T Technical Report
4 users
arxiv.org

We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale. Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability. Our results demonstrate that BitNet b1.58 2B4T achieves performanc
- Learning
- 2025/04/18 13:05
NNN: Next-Generation Neural Networks for Marketing Mix Modeling
3 users
arxiv.org

We present NNN, a Transformer-based neural network approach to Marketing Mix Modeling (MMM) designed to address key limitations of traditional methods. Unlike conventional MMMs which rely on scalar inputs and parametric decay functions, NNN uses rich embeddings to capture both quantitative and qualitative aspects of marketing and organic channels (e.g., search queries, ad creatives). This, combine
- Learning
- 2025/04/09 14:10
An Introduction to Logical Relations
3 users
arxiv.org

Logical relations (LR) have been around for many years, and today they are used in many formal results. However, it can be difficult for LR beginners to find a good place to start learning. Papers often use highly specialized LRs that use the latest advances of the technique, which makes it impossible to give a proper presentation within the page limit. This note is a good starting point for beginne
- Technology
- 2025/04/05 15:49
Large Language Model Agent: A Survey on Methodology, Applications and Challenges
5 users
arxiv.org

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architec
- Technology
- 2025/03/30 12:26
- Read later
ZeroMerge: Parameter-Free KV Cache Compression for Memory-Efficient Long-Context LLMs
3 users
arxiv.org

The linear growth of key-value (KV) cache memory and quadratic computational complexity pose significant bottlenecks for large language models (LLMs) in long-context processing. While existing KV cache optimization methods address these challenges through token pruning or feature merging, they often suffer from irreversible information loss or require costly parameter retraining. We propose ZeroMe
- Technology
- 2025/03/28 10:48
- Read later
AI-native Memory 2.0: Second Me
4 users
arxiv.org

Human interaction with the external world fundamentally involves the exchange of personal memory, whether with other individuals, websites, applications, or, in the future, AI agents. A significant portion of this interaction is redundant, requiring users to repeatedly provide the same information across different contexts. Existing solutions, such as browser-stored credentials, autofill mechanism
- Technology
- 2025/03/23 10:09
- Read later
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
4 users
arxiv.org

We introduce SmolDocling, an ultra-compact vision-language model targeting end-to-end document conversion. Our model comprehensively processes entire pages by generating DocTags, a new universal markup format that captures all page elements in their full context with location. Unlike existing approaches that rely on large foundational models, or ensemble solutions that rely on handcrafted pipeline
- Learning
- 2025/03/21 13:11
Hilbert's sixth problem: derivation of fluid equations via Boltzmann's kinetic theory
4 users
arxiv.org

In this paper, we rigorously derive the fundamental PDEs of fluid mechanics, such as the compressible Euler and incompressible Navier-Stokes-Fourier equations, starting from the hard sphere particle systems undergoing elastic collisions. This resolves Hilbert's sixth problem, as it pertains to the program of deriving the fluid equations from Newton's laws by way of Boltzmann's kinetic theory. The
- Learning
- 2025/03/20 13:46
Transformers without Normalization
5 users
arxiv.org

Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique. We introduce Dynamic Tanh (DyT), an element-wise operation $\mathrm{DyT}(x) = \tanh(\alpha x)$, as a drop-in replacement for normalization layers in Transforme
- Technology
- 2025/03/16 18:22
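
The element-wise operation quoted in the abstract above, tanh(alpha * x), can be sketched in a few lines. This is only an illustration of that one formula, not the paper's implementation: the function name `dyt`, the plain-list input, and the default `alpha=0.5` are assumptions made here, and anything the truncated abstract does not state (how alpha is set, any additional parameters) is left out.

```python
import math


def dyt(xs, alpha=0.5):
    """Apply tanh(alpha * x) to each element of xs.

    alpha=0.5 is an arbitrary illustrative value (an assumption),
    not a setting taken from the paper.
    """
    return [math.tanh(alpha * x) for x in xs]


# Large-magnitude inputs are squashed toward -1 or 1, while inputs
# near zero pass through almost linearly (scaled by alpha).
print(dyt([-10.0, -0.1, 0.0, 0.1, 10.0]))
```

Because tanh is bounded, every output lies strictly between -1 and 1 for these inputs, whatever the input scale.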
Dudeney's Dissection is Optimal
3 users
arxiv.org

In 1907, Henry Ernest Dudeney posed a puzzle: "cut any equilateral triangle ... into as few pieces as possible that will fit together and form a perfect square" (without overlap, via translation and rotation). Four weeks later, Dudeney demonstrated a beautiful four-piece solution, which today remains perhaps the most famous example of a dissection. In this paper (over a century later), we fin
- Learning
- 2025/03/10 15:53
Chain of Draft: Thinking Faster by Writing Less
13 users
arxiv.org

Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD),
- Technology
- 2025/03/09 11:59
- LLM
- Read later
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
3 users
arxiv.org

We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework which enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning by recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverag
- Technology
- 2025/03/07 16:40
Probabilistic Artificial Intelligence
3 users
arxiv.org

Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have en
- Learning
- 2025/03/05 18:47
Fixed point theorem in metric spaces and its application to the Collatz conjecture
8 users
arxiv.org

In this paper, we present a new fixed point theorem in metric spaces. Furthermore, we apply this fixed point theorem to the Collatz conjecture.
- Learning
- 2025/03/03 17:26
- Read later
Byte Latent Transformer: Patches Scale Better Than Tokens
3 users
arxiv.org

We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented based on the entropy of the next byte, allocating
- Technology
- 2025/02/24 20:35
RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation
14 users
arxiv.org

Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for b
- Technology
- 2025/02/22 16:53
- Artificial intelligence
- Read later
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
3 users
arxiv.org

We present AlphaGeometry2, a significantly improved version of AlphaGeometry introduced in Trinh et al. (2024), which has now surpassed an average gold medalist in solving Olympiad geometry problems. To achieve this, we first extend the original AlphaGeometry language to tackle harder problems involving movements of objects, and problems containing linear equations of angles, ratios, and distances
- Learning
- 2025/02/10 11:38
s1: Simple test-time scaling
9 users
arxiv.org

Test-time scaling is a promising new approach to language modeling that uses extra test-time compute to improve performance. Recently, OpenAI's o1 model showed this capability but did not publicly share its methodology, leading to many replication efforts. We seek the simplest approach to achieve test-time scaling and strong reasoning performance. First, we curate a small dataset s1K of 1,000 ques
- Technology
- 2025/02/06 05:43
- Machine learning
