Did you know Martian sponsors mechanistic interpretability and AI safety research? We provide grants, resources, and free API access, and mentor upcoming researchers getting into the field.

This week we're highlighting work that came out of a hackathon we sponsored with Apart Research. Today's highlighted project is:

Guardian-Loop: Mechanistically Interpretable Micro-Judges with Adversarial Self-Improvement by Efstathios Siatras (@efsiatras) & Man Kit Tom Chan (@chantomkit)

From the abstract: Guardian-Loop is a mechanistically interpretable judge system designed to enhance the Expert Orchestration Architecture through transparent and efficient safety evaluation. We train lightweight classifiers that pre-filter prompts for safety using a Llama 3.1 8B model, fine-tuning only the upper layers to directly output True or False responses.

This project had so many things we're excited about:
- Model-agnostic safety filtering
- Predicting model behavior without running inference
- Cost reduction
- Mechanistic interpretability leading to conclusions that are understandable by normal people

You can read @adammichaelwood's write up on the Martian Blog: https://lnkd.in/gMiURRZg

Efstathios and Man also wanted us to mention some of the people that inspired their work.

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs by Seungju Han (@seungjuhan3) & Kavel Rao (@kavel_rao), et al https://lnkd.in/gS-Amtdz

"Seungju Han & Kavel Rao’s WildGuard has strong foundations in taxonomy-based safety classification and adversarial training data which inspired our project to extend in this direction with self-improving mechanisms and interpretability."

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts by Mikayel Samvelyan (@samvelyan) & Andrei Lupu, et al https://lnkd.in/gx2cyuDA

"Our project extends Mikayel Samvelyan & Andrei Lupu's Rainbow Teaming adversarial prompt generation framework to make it more efficient by reusing existing datasets and aiming to create new instances more efficiently."

This is the kind of research that Martian is committed to supporting, so a big thank you to Apart Research (@apartresearch) for partnering with us, organizing the hackathon, and supporting these researchers as they further their exciting work.

Martian's research powers our product. You can try out the Research Preview of our LLM Gateway and Code Router today. https://lnkd.in/ghE8PF7n

We have more research highlights coming from the Apart Hackathon over the next several days. Follow @withmartian so you won't miss anything.
Martian
Software Development
San Francisco, California 5,138 followers
Outperform any AI model with model routing
About us
Martian built the first model router, backed by $9M from NEA, General Catalyst, and Prosus Ventures. You can think of us like Google for LLMs: every time you send us a request, we automatically find and use the LLM that will give you the best result at the lowest cost. Engineers at 300+ companies, from Amazon to Zapier, have used Martian to achieve higher performance and lower costs, with greater security and reliability. The team consists of former AI researchers from Stanford, Harvard, University of Pennsylvania, the Google Bard Team, and Microsoft Research who have built and sold multiple NLP companies and published in the leading AI research journals.
- Website: https://withmartian.com
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2022
- Specialties: Artificial Intelligence, AI interpretability, research, AI safety, AI cost reduction, model deployment, enterprise solutions, and startups
Locations
- Primary: 301 Lyon St, San Francisco, California 94117, US
Updates
Announcing a research preview and a new paper!

`martian/code` uses MI research to out-perform existing models on codegen: withmartian.com/code (openai-api compatible)

One focus: text2sql. A peek behind the research preview: TinySQL, a paper on how models generate SQL.

We bridge mech interp research and commercialization. MI is stuck between toy & real tasks. Text-to-SQL is ideal: structured yet realistic. Extracting features about how models generate SQL helps choose the best model for each individual codegen request.

Key discoveries:
- 🧩 We identified minimal circuits that can generate SQL!
- 📊 Models process SQL in stages - first SQL keywords for intent, then table/column names for execution
- ⚠️ Even similar queries activate different circuits - current interpretability methods have major limitations

Open Science: We're releasing everything!
- TinySQL datasets
- 18 fine-tuned models
- Sparse autoencoders

Paper: https://lnkd.in/g6Te8sFB
Models: https://lnkd.in/gfUrRG3N

Credits to our wonderful team and collaborators at Gretel AI and NVIDIA! 🎉

Full thread with all the technical details: https://lnkd.in/gzXahpJZ

If you're doing codegen or text2sql, play around with our research preview! 🤖