Comprehensive Benchmarking System for CodeWiki

Python 6 4 Updated Nov 23, 2025

The Cloud Sandbox Built for AI Agents

Python 632 36 Updated Dec 23, 2025

Commit0: Library Generation from Scratch

Python 174 11 Updated May 8, 2025

SWE-Swiss: A Multi-Task Fine-Tuning and RL Recipe for High-Performance Issue Resolution

Python 102 5 Updated Sep 24, 2025

Unleashing the Power of Reinforcement Learning for Math and Code Reasoners

Python 737 44 Updated Jun 6, 2025
Python 448 51 Updated Aug 13, 2024

Trae Agent is an LLM-based agent for general purpose software engineering tasks.

Python 10,425 1,100 Updated Sep 24, 2025

MCP server integrating GEPA (Genetic-Evolutionary Prompt Architecture) for automatic prompt optimization with Claude Desktop

Python 46 3 Updated Nov 10, 2025

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Python 57 3 Updated Jul 24, 2025

Code and Data for Tau-Bench

Python 1,037 165 Updated Aug 28, 2025
Python 153 21 Updated Oct 29, 2025

Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving

Python 300 42 Updated Dec 18, 2025

OpenDeepWiki is the open-source version of the DeepWiki project, aiming to provide a powerful knowledge management and collaboration platform. The project is mainly developed using C# and TypeScrip…

C# 2,594 329 Updated Dec 20, 2025

The true story of the heartache and darkness behind the development of the Noah Pangu large model.

11,368 1,346 Updated Jul 9, 2025

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,555 190 Updated Jan 1, 2026
Python 1,376 120 Updated Sep 12, 2025

Go ahead and axolotl questions

Python 11,017 1,227 Updated Jan 1, 2026

A project to improve skills of large language models

Python 729 135 Updated Jan 1, 2026

A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Python 280 13 Updated Sep 25, 2025

Seed-Coder is a family of lightweight open-source code LLMs comprising base, instruct and reasoning models, developed by ByteDance Seed.

719 54 Updated Jun 6, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Python 25,970 1,823 Updated Oct 13, 2025

[ACL2024] Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Python 66 2 Updated Aug 5, 2025

Code and data for "Measuring and Narrowing the Compositionality Gap in Language Models"

Jupyter Notebook 323 36 Updated Dec 28, 2023

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,656 1,303 Updated Dec 17, 2025

Hammer: Highly Agile Masks Made Effortlessly from RTL

Python 308 71 Updated Oct 10, 2025
Python 404 32 Updated Oct 16, 2025

The evaluation benchmark on MCP servers

Python 232 15 Updated Sep 3, 2025

Model Context Protocol Servers

TypeScript 75,339 9,135 Updated Dec 19, 2025

Lightweight coding agent that runs in your terminal

Rust 55,048 7,025 Updated Jan 1, 2026

Function Calling Benchmark & Testing

Jupyter Notebook 92 5 Updated Jul 10, 2024