Split runner into execution and transport modules #34

mwotton · 2026-01-01T03:27:36Z

Summary

split runner logic into execution and transport submodules with stable exports
move schema helper functions into runner execution module and update tests
reorganize runner unit tests to target new module paths

Testing

cargo test --workspace --all-features
cargo llvm-cov --workspace --fail-under-lines 100 --features tooltest-cli/cli-test-server

llamapreview

AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

Recommendation: Approve with suggestions

This PR successfully refactors the runner module into execution and transport submodules, improving modularity and maintainability without breaking the public API. All changes are backward compatible, with thorough test updates ensuring no regressions.

📄 Documentation Diagram

This diagram illustrates the new modular workflow of the runner module after refactoring, highlighting the separation between execution and transport logic.

sequenceDiagram
    participant C as Caller
    participant R as Runner Module
    participant E as Execution Module
    participant T as Transport Module
    note over R: PR #35;34 splits runner into execution and transport modules
    C->>R: call run_stdio or run_http
    R->>T: connect to endpoint
    T-->>R: return SessionDriver
    R->>E: call run_with_session
    E->>E: execute sequence with assertions and coverage
    E-->>R: return RunResult
    R-->>C: return final result

🌟 Strengths

Backward Compatibility: Public API remains unchanged, preventing disruption for external consumers like the CLI and integration tests.
Clean Separation of Concerns: Modularization isolates execution logic from transport handling, enhancing code organization and testability.

Priority	File	Category	Impact Summary (≤12 words)	Anchors
P2	tooltest-core/.../execution.rs	Architecture	Core execution logic moved to dedicated module, preserving API.	runner_tests.rs, output_schema_validation_tests.rs
P2	tooltest-core/.../transport.rs	Architecture	Transport entry points isolated, maintaining identical signatures for CLI and tests.	tooltest-cli/lib.rs, runner_tests.rs
P2	tooltest-core/.../execution.rs	Testing	Test-only function marked with dead code allowance, clarifying its purpose.	runner_unit_tests.rs
P2	tooltest-core/.../mod.rs	Architecture	Root module acts as clean facade, re-exporting public API without changes.	(omit)
P2	tooltest-core/.../runner_unit_tests.rs	Testing	Tests updated to reference moved types, ensuring alignment with new module boundaries.	transport.rs

🔍 Notable Themes

Consistent Architectural Alignment: The refactoring consistently organizes code by functional boundaries (execution vs. transport), reducing cognitive load and improving maintainability.
Thorough Test Updates: All unit tests have been meticulously updated to reflect new module paths, preserving test coverage and reliability.

💡 Have feedback? We'd love to hear it in our GitHub Discussions.
✨ This review was generated by LlamaPReview Advanced, which is free for all open-source projects. Learn more.

llamapreview · 2026-01-01T03:37:29Z

tooltest-core/src/runner/execution.rs

+/// assertion rules.
+///
+/// Requires a multi-thread Tokio runtime; current-thread runtimes are rejected.
+pub async fn run_with_session(


P2 | Confidence: High

The refactoring successfully moves the core execution logic to a dedicated module without altering the public API. The related_context confirms that external test callers (e.g., runner_tests.rs, output_schema_validation_tests.rs) continue to invoke run_with_session unchanged, demonstrating backward compatibility. This modularization isolates execution from transport concerns, improving separation of concerns and testability. No functional regressions are introduced.

llamapreview · 2026-01-01T03:37:29Z

tooltest-core/src/runner/transport.rs

+/// Execute a tooltest run against a stdio MCP endpoint.
+///
+/// Uses the same default and declarative assertions as [`run_with_session`].
+pub async fn run_stdio(


P2 | Confidence: High

The transport-specific entry points (run_stdio, run_http) are now cleanly separated into their own module. The related_context shows these functions are directly used by the CLI (tooltest-cli/src/lib.rs) and integration tests (runner_tests.rs). The refactoring preserves the exact same function signatures and behavior, ensuring no breaking changes for external consumers. This improves architectural clarity by isolating transport setup from core execution logic.

llamapreview · 2026-01-01T03:37:29Z

tooltest-core/src/runner/execution.rs

+#[cfg_attr(not(test), allow(dead_code))]
+pub(super) async fn execute_sequence(
+    session: &SessionDriver,
+    validators: &BTreeMap<String, jsonschema::Validator>,
+    assertions: &AssertionSet,
+    sequence: &[ToolInvocation],
+) -> Result<Vec<TraceEntry>, FailureContext> {


P2 | Confidence: High

The execute_sequence function is now marked #[cfg_attr(not(test), allow(dead_code))], indicating it's only used in tests. The related_context confirms it is called by unit tests in runner_unit_tests.rs. This is a maintainability improvement—it clearly signals the function's test-only purpose and prevents dead code warnings in production builds. However, ensure no production code paths (e.g., legacy generator mode) inadvertently depend on it; the PR's state-machine focus suggests this is safe.

llamapreview · 2026-01-01T03:37:29Z

tooltest-core/src/runner/mod.rs

 mod assertions;
 mod coverage;
-mod schema;
+mod execution;
+mod transport;

 #[cfg(test)]
 #[path = "../../tests/internal/runner_unit_tests.rs"]
 mod tests;



P2 | Confidence: High

The root mod.rs now acts as a clean facade, re-exporting only the public API (run_with_session, RunnerOptions, run_stdio, run_http) from the new submodules. This preserves the exact same external interface for the runner module, preventing any breaking changes for downstream crates (e.g., tooltest-cli). The internal module structure (assertions, coverage, execution, transport) is now explicitly organized by responsibility, enhancing code navigation and reducing cognitive load.

llamapreview · 2026-01-01T03:37:29Z

tooltest-core/tests/internal/runner_unit_tests.rs

+fn connect_result(
+    result: Result<SessionDriver, SessionError>,
+) -> transport::ConnectFuture<'static> {
    Box::pin(async move { result })
 }


P2 | Confidence: High

The unit tests have been updated to reference the moved ConnectFuture type alias from the new transport module (instead of the old root module). This demonstrates that the test updates are thorough and align with the new module boundaries. The related_context shows the alias is defined in transport.rs, confirming the test's correctness. This ensures the test suite remains functional and aligned with the refactored codebase.

llamapreview

LlamaPReview: Follow-up Review for PR #34

1. Status Update

This follow-up review shows significant progress since my initial assessment. The PR has successfully implemented the runner module split into execution and transport components as planned, with all major refactoring complete. The changes demonstrate a thoughtful approach to modularization while maintaining functionality.

Key Improvements Since Last Review:

✅ Completed the runner module split into focused submodules (execution, transport, etc.)
✅ Moved schema helpers into the execution module
✅ Updated all tests to target new module paths
✅ Maintained 100% test coverage throughout the refactoring
✅ Added new issue tracking for test helper extraction (tooltest-4zr)

New Additions:

Added execution.rs (216 lines) - Core runner logic
Added transport.rs (66 lines) - Transport-specific entry points
Added prepare.rs (70 lines) - Run preparation logic
Added result.rs (151 lines) - Result handling utilities
Added sequence.rs (146 lines) - Sequence execution
Added state_machine.rs (127 lines) - State machine execution

2. Detailed Analysis

2.1 Module Structure Improvements

The refactoring has successfully decomposed the monolithic runner.rs (730 lines → 18 lines) into focused modules:

// New modular structure
mod assertions;
mod coverage;
mod execution;    // NEW
mod prepare;      // NEW
mod result;       // NEW
mod schema;
mod sequence;     // NEW
mod state_machine;// NEW
mod transport;    // NEW

Technical Assessment:

The decomposition follows clear responsibility boundaries
Public API surface remains stable through re-exports
Internal dependencies are properly encapsulated
Module interfaces are well-defined and focused

2.2 Execution Flow Improvements

The new prepare_run function (in prepare.rs) demonstrates excellent separation of concerns:

pub(super) async fn prepare_run(
    session: &SessionDriver,
    config: &RunConfig,
) -> Result<PreparedRun, RunResult> {
    // 1. List tools
    // 2. Validate tools
    // 3. Collect warnings
    // 4. Build validators
    // 5. Return prepared state
}

Impact Analysis:

✅ Clear separation between preparation and execution phases
✅ Better error handling isolation
✅ Improved testability of individual components
✅ Maintains all existing functionality

2.3 Transport Layer Abstraction

The transport module provides clean entry points:

pub async fn run_stdio(...) -> RunResult {
    run_with_transport(...).await
}

pub async fn run_http(...) -> RunResult {
    run_with_transport(...).await
}

Architectural Benefits:

Transport-specific logic is properly isolated
Common execution path through run_with_session
Clean separation between transport initialization and execution
Maintains backward compatibility

2.4 Test Updates and Coverage

The test updates show thorough attention to detail:

// Updated imports in tests
use super::assertions::{...};
use super::coverage::{...};
use super::result::{...};
use super::sequence::{...};
use super::state_machine::{...};
use super::transport::{...};

Testing Assessment:

✅ All tests updated to new module paths
✅ Test coverage maintained at 100%
✅ Test helper types properly imported
✅ No regression in test functionality

2.5 Error Handling Improvements

The refactored error handling in result.rs maintains all previous functionality while being better organized:

pub(super) fn failure_result(
    failure: RunFailure,
    trace: Vec<TraceEntry>,
    minimized: Option<MinimizedSequence>,
    warnings: Vec<RunWarning>,
    coverage: Option<CoverageReport>,
    corpus: Option<CorpusReport>,
) -> RunResult {
    // ... unchanged logic ...
}

3. Outstanding Concerns

3.1 Minor Technical Debt

Issue: The #[cfg_attr(not(test), allow(dead_code))] annotations suggest some functions might be unused in production. While this was present before, the refactoring provides an opportunity to clean this up.

Example:

#[cfg_attr(not(test), allow(dead_code))]
pub(super) async fn execute_sequence(...)

Recommendation: Review these annotations to determine if:

The functions are truly needed for tests only
They should be promoted to public API
They can be removed entirely

3.2 Module Visibility

Issue: Some modules use pub(super) visibility which might be too restrictive for future extensions.

Example:

pub(super) struct PreparedRun {
    // ...
}

Recommendation: Consider whether pub(crate) would be more appropriate for some internal types that might be needed across the crate.

3.3 Future Extensibility

Observation: The current structure is excellent for the immediate needs but could benefit from:

Documentation: Module-level docs explaining the separation of concerns
Feature Flags: Potential to make some modules optional via Cargo features
Trait Boundaries: Clearer trait definitions for extension points

4. Recommendations

4.1 High Priority (Should Address Before Merge)

Documentation Update:
- Add module-level documentation explaining the new architecture
- Update README.md if it references runner internals
- Example:

     //! # Runner Execution Module
     //!
     //! Contains the core execution logic for tooltest runs, separated from
     //! transport concerns and result handling. This module focuses on:
     //! - Session preparation
     //! - Sequence execution
     //! - State machine handling
     //! - Assertion application

Visibility Review:
- Audit all pub(super) usages to ensure appropriate visibility
- Consider promoting some types to pub(crate) where beneficial

4.2 Medium Priority (Can Address Post-Merge)

Technical Debt Cleanup:
- Remove or document all #[allow(dead_code)] instances
- Verify if any can be eliminated through the new structure
Extension Points:
- Consider adding extension traits for custom execution strategies
- Example:

     pub trait ExecutionStrategy {
         async fn execute(
             &self,
             session: &SessionDriver,
             sequence: &[ToolInvocation]
         ) -> Result<Vec<TraceEntry>, RunFailure>;
     }

4.3 Long-term Considerations

Performance Benchmarking:
- The refactoring shouldn't impact performance but worth verifying
- Consider adding benchmarks for critical paths
Error Type Consolidation:
- The error handling is good but could benefit from a more structured approach
- Consider a custom error enum that unifies all failure cases
Test Helper Extraction (tooltest-4zr):
- The new issue for extracting test helpers is well-timed
- This would further reduce coupling in the test suite

5. Final Assessment

Strengths:

✅ Excellent modular decomposition
✅ Maintained all existing functionality
✅ 100% test coverage preserved
✅ Clean separation of concerns
✅ Backward-compatible public API
✅ Proper error handling maintained
✅ Thoughtful transport abstraction

Areas for Improvement:

Documentation could be more comprehensive
Some visibility choices could be more permissive
Minor technical debt remains from previous code

Verdict: The PR successfully addresses all major concerns from the initial review and implements a clean, maintainable architecture. The changes demonstrate excellent software engineering practices with proper separation of concerns, maintained functionality, and comprehensive test coverage.

Recommendation: Approve with suggestions - The changes are ready to merge, with the high-priority recommendations (documentation and visibility review) being nice-to-haves that could be addressed in follow-up commits if time is constrained.

The refactoring sets an excellent foundation for future development and makes the codebase more maintainable. The author should be commended for the thorough and careful implementation of this significant architectural improvement.

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

llamapreview

LlamaPReview: Comprehensive Follow-up Review for PR #34

1. Status Update

Significant Progress Since Initial Review:

This follow-up shows excellent progress in addressing the runner module refactoring. The PR has successfully implemented the planned decomposition into execution and transport modules while maintaining all functionality and test coverage.

Key Achievements:

✅ Module Decomposition: Successfully split the monolithic runner.rs (730 lines → 18 lines) into focused modules
✅ Functionality Preservation: All existing behavior maintained through careful refactoring
✅ Test Coverage: 100% coverage maintained with all tests updated to new module paths
✅ Architectural Improvements: Clear separation of concerns between execution, transport, and result handling
✅ New Issue Tracking: Added tooltest-4zr for future test helper extraction

New Additions Since Last Review:

execution.rs (216 lines) - Core runner logic
transport.rs (66 lines) - Transport-specific entry points
prepare.rs (70 lines) - Run preparation logic
result.rs (151 lines) - Result handling utilities
state_machine.rs (127 lines) - State machine execution

2. Detailed Analysis

2.1 Module Structure Improvements

The refactoring demonstrates excellent architectural decisions:

// Before: Single monolithic module (730 lines)
// After: Well-structured modules
mod assertions;
mod coverage;
mod execution;    // NEW - Core execution logic
mod prepare;      // NEW - Preparation phase
mod result;       // NEW - Result handling
mod schema;
mod state_machine;// NEW - State machine execution
mod transport;    // NEW - Transport abstractions

Technical Assessment:

Separation of Concerns: Each module has clear, focused responsibilities
Public API Stability: Maintained through careful re-exports
Internal Encapsulation: Proper boundaries between modules
Testability: Improved isolation of components

2.2 Execution Flow Improvements

The new prepare_run function shows excellent design:

pub(super) async fn prepare_run(
    session: &SessionDriver,
    config: &RunConfig,
) -> Result<PreparedRun, RunResult> {
    // 1. List tools
    // 2. Validate tools
    // 3. Collect warnings
    // 4. Build validators
    // 5. Return prepared state
}

Impact Analysis:

✅ Clear Phases: Separation between preparation and execution
✅ Error Isolation: Better containment of preparation failures
✅ Testability: Individual components can be tested independently
✅ Functionality: All existing behavior preserved

2.3 Transport Layer Abstraction

The transport module provides clean, consistent entry points:

pub async fn run_stdio(...) -> RunResult {
    run_with_transport(...).await
}

pub async fn run_http(...) -> RunResult {
    run_with_transport(...).await
}

Architectural Benefits:

Transport Isolation: Specific logic properly contained
Common Path: All transports flow through run_with_session
Backward Compatibility: Existing API unchanged
Extensibility: Easy to add new transport types

2.4 Error Handling and Result Processing

The refactored error handling in result.rs maintains robustness:

pub(super) fn failure_result(
    failure: RunFailure,
    trace: Vec<TraceEntry>,
    minimized: Option<MinimizedSequence>,
    warnings: Vec<RunWarning>,
    coverage: Option<CoverageReport>,
    corpus: Option<CorpusReport>,
) -> RunResult {
    // Comprehensive error construction
}

Evaluation:

Consistency: Maintains all previous error information
Structure: Well-organized error construction
Context: Preserves complete failure context
Compatibility: Identical behavior to previous implementation

2.5 Test Updates and Coverage

The test updates demonstrate thorough attention:

// Updated test imports
use super::assertions::{...};
use super::coverage::{...};
use super::result::{...};
use super::state_machine::{...};
use super::transport::{...};

Testing Assessment:

✅ Comprehensive Updates: All tests reference new module paths
✅ Coverage Maintained: 100% coverage preserved
✅ Functionality Verified: All test scenarios still pass
✅ Integration Tested: Transport and execution layers properly tested

3. Outstanding Concerns

3.1 Minor Technical Debt

Issue: Some #[allow(dead_code)] annotations remain, particularly:

#[cfg_attr(not(test), allow(dead_code))]
pub(super) async fn execute_sequence(...)

Recommendation:

Audit these functions to determine if they're:
- Truly test-only utilities (keep annotation)
- Should be promoted to public API (remove annotation)
- Can be eliminated (remove function)
Consider adding documentation explaining why test-only functions exist

3.2 Module Visibility Considerations

Issue: Some types use pub(super) visibility which might be overly restrictive:

pub(super) struct PreparedRun {
    // ...
}

Recommendation:

Review whether pub(crate) would be more appropriate for types that might be needed across the crate
Consider the tradeoff between encapsulation and extensibility

3.3 Documentation Opportunities

Observation: While the code is well-structured, some modules could benefit from:

Module-level documentation explaining the separation of concerns
Examples of how to extend or customize the runner
Architecture diagrams (in comments) showing the flow between modules

4. Recommendations

4.1 High Priority (Address Before Merge)

Documentation Update:

   //! # Runner Execution Module
   //!
   //! Contains the core execution logic for tooltest runs, focusing on:
   //! - Session preparation and validation
   //! - Sequence execution with state machine support
   //! - Assertion application and result handling
   //!
   //! ## Architecture
   //! ```
   //!
   //! The execution flow follows these phases:
   //! 1. Preparation (prepare.rs)
   //! 2. Execution (execution.rs)
   //! 3. Result Processing (result.rs)
   //! ```

Visibility Review:
- Audit all pub(super) usages
- Consider promoting to pub(crate) where beneficial for crate-wide usage
- Document the visibility decisions

4.2 Medium Priority (Post-Merge Improvements)

Technical Debt Cleanup:
- Systematically review all #[allow(dead_code)] instances
- Either:
  - Remove the annotation if the code is actually used
  - Add documentation explaining why it's test-only
  - Remove unused code
Extension Points:
- Consider adding traits for custom execution strategies:

   pub trait ExecutionStrategy {
       async fn execute(
           &self,
           session: &SessionDriver,
           sequence: &[ToolInvocation]
       ) -> Result<Vec<TraceEntry>, RunFailure>;
   }

4.3 Long-term Considerations

Performance Benchmarking:
- While the refactoring shouldn't impact performance, add benchmarks for:
  - Session preparation time
  - Sequence execution throughput
  - State machine generation efficiency
Error Type Consolidation:
- Consider a unified error enum that covers all failure cases:

   #[derive(Debug, thiserror::Error)]
   pub enum RunnerError {
       #[error("Session error: {0}")]
       Session(#[from] SessionError),
       #[error("Validation error: {0}")]
       Validation(String),
       #[error("Execution error: {0}")]
       Execution(String),
       // ... other variants
   }

Test Helper Extraction (tooltest-4zr):
- This new issue is well-timed and should:
  - Reduce coupling in the test suite
  - Improve test maintainability
  - Enable better test reuse

5. Final Assessment

Strengths:

✅ Excellent Modularization: Clear, focused modules with proper boundaries
✅ Functionality Preserved: All existing behavior maintained
✅ Test Coverage: 100% coverage with comprehensive test updates
✅ Architectural Improvements: Better separation of concerns
✅ Backward Compatibility: Public API unchanged
✅ Error Handling: Robust error processing maintained
✅ Transport Abstraction: Clean separation of transport concerns

Areas for Improvement:

Documentation could be more comprehensive
Some visibility choices could be more permissive
Minor technical debt remains from previous code

Verdict: This PR represents a significant architectural improvement that successfully addresses all major concerns from the initial review. The refactoring demonstrates excellent software engineering practices with proper separation of concerns, maintained functionality, and comprehensive test coverage.

Recommendation: Approve with minor suggestions - The changes are ready to merge. The high-priority recommendations (documentation and visibility review) are valuable but not blocking, and could be addressed in follow-up commits if time is constrained.

The author should be commended for this thorough and careful implementation of a complex refactoring that significantly improves the codebase's maintainability and extensibility.

💡 LlamaPReview Insight:
This refactoring exemplifies how to properly decompose a monolithic module while maintaining all functionality. The key success factors were:

Incremental Changes: Breaking down the refactoring into logical components
Test-Driven Approach: Ensuring 100% coverage throughout the process
Clear Boundaries: Establishing well-defined module responsibilities
Backward Compatibility: Preserving the public API surface

The result is a codebase that will be much easier to maintain and extend in the future.

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

llamapreview

LlamaPReview: Comprehensive Follow-up Review for PR #34

1. Status Update

Excellent Progress Since Initial Review:

This follow-up review shows the PR has made significant strides in addressing all major concerns from my previous assessment. The runner module refactoring is now substantially complete with all planned decompositions implemented while maintaining full functionality and test coverage.

Key Achievements:

✅ Completed module decomposition into execution/transport components
✅ Maintained 100% test coverage throughout refactoring
✅ Preserved all existing functionality with no regressions
✅ Improved architectural clarity with focused module responsibilities
✅ Added new issue tracking for future test helper extraction (tooltest-4zr)

New Additions Since Last Review:

execution.rs (216 lines) - Core runner logic
transport.rs (66 lines) - Transport-specific entry points
prepare.rs (70 lines) - Run preparation logic
result.rs (151 lines) - Result handling utilities
state_machine.rs (127 lines) - State machine execution

2. Detailed Analysis

2.1 Module Structure Improvements

The refactoring demonstrates excellent architectural decisions with clear separation of concerns:

// Before: Single monolithic runner.rs (730 lines)
// After: Well-structured focused modules
mod assertions;
mod coverage;
mod execution;    // NEW - Core execution logic
mod prepare;      // NEW - Preparation phase
mod result;       // NEW - Result handling
mod schema;
mod state_machine;// NEW - State machine execution
mod transport;    // NEW - Transport abstractions

Technical Assessment:

Separation of Concerns: Each module has clear, focused responsibilities
Public API Stability: Maintained through careful re-exports
Internal Encapsulation: Proper boundaries between modules
Testability: Improved isolation of components for testing

2.2 Execution Flow Improvements

The new prepare_run function shows excellent design with clear phase separation:

pub(super) async fn prepare_run(
    session: &SessionDriver,
    config: &RunConfig,
) -> Result<PreparedRun, RunResult> {
    // 1. List tools
    // 2. Validate tools
    // 3. Collect warnings
    // 4. Build validators
    // 5. Return prepared state
}

Impact Analysis:

✅ Clear Phases: Separation between preparation and execution
✅ Error Isolation: Better containment of preparation failures
✅ Testability: Individual components can be tested independently
✅ Functionality: All existing behavior preserved

2.3 Transport Layer Abstraction

The transport module provides clean, consistent entry points:

pub async fn run_stdio(...) -> RunResult {
    run_with_transport(...).await
}

pub async fn run_http(...) -> RunResult {
    run_with_transport(...).await
}

Architectural Benefits:

Transport Isolation: Specific logic properly contained
Common Path: All transports flow through run_with_session
Backward Compatibility: Existing API unchanged
Extensibility: Easy to add new transport types

2.4 Error Handling and Result Processing

The refactored error handling in result.rs maintains robustness:

pub(super) fn failure_result(
    failure: RunFailure,
    trace: Vec<TraceEntry>,
    minimized: Option<MinimizedSequence>,
    warnings: Vec<RunWarning>,
    coverage: Option<CoverageReport>,
    corpus: Option<CorpusReport>,
) -> RunResult {
    // Comprehensive error construction
}

Evaluation:

Consistency: Maintains all previous error information
Structure: Well-organized error construction
Context: Preserves complete failure context
Compatibility: Identical behavior to previous implementation

2.5 Test Updates and Coverage

The test updates demonstrate thorough attention:

// Updated test imports
use super::assertions::{...};
use super::coverage::{...};
use super::result::{...};
use super::state_machine::{...};
use super::transport::{...};

Testing Assessment:

✅ Comprehensive Updates: All tests reference new module paths
✅ Coverage Maintained: 100% coverage preserved
✅ Functionality Verified: All test scenarios still pass
✅ Integration Tested: Transport and execution layers properly tested

3. Outstanding Concerns

3.1 Minor Technical Debt

Issue: Some #[allow(dead_code)] annotations remain:

#[cfg_attr(not(test), allow(dead_code))]
pub(super) async fn execute_sequence(...)

Recommendation:

Audit these functions to determine if they're:
- Truly test-only utilities (keep annotation)
- Should be promoted to public API (remove annotation)
- Can be eliminated (remove function)
Consider adding documentation explaining why test-only functions exist

3.2 Module Visibility Considerations

Issue: Some types use pub(super) visibility which might be overly restrictive:

pub(super) struct PreparedRun {
    // ...
}

Recommendation:

Review whether pub(crate) would be more appropriate for types that might be needed across the crate
Consider the tradeoff between encapsulation and extensibility

3.3 Documentation Opportunities

Observation: While the code is well-structured, some modules could benefit from:

Module-level documentation explaining the separation of concerns
Examples of how to extend or customize the runner
Architecture diagrams (in comments) showing the flow between modules

4. Recommendations

4.1 High Priority (Address Before Merge)

Documentation Update:

   //! # Runner Execution Module
   //!
   //! Contains the core execution logic for tooltest runs, focusing on:
   //! - Session preparation and validation
   //! - Sequence execution with state machine support
   //! - Assertion application and result handling
   //!
   //! ## Architecture
   //! ```
   //!
   //! The execution flow follows these phases:
   //! 1. Preparation (prepare.rs)
   //! 2. Execution (execution.rs)
   //! 3. Result Processing (result.rs)
   //! ```

Visibility Review:
- Audit all pub(super) usages
- Consider promoting to pub(crate) where beneficial for crate-wide usage
- Document the visibility decisions

4.2 Medium Priority (Post-Merge Improvements)

Technical Debt Cleanup:
- Systematically review all #[allow(dead_code)] instances
- Either:
  - Remove the annotation if the code is actually used
  - Add documentation explaining why it's test-only
  - Remove unused code
Extension Points:
- Consider adding traits for custom execution strategies:

   pub trait ExecutionStrategy {
       async fn execute(
           &self,
           session: &SessionDriver,
           sequence: &[ToolInvocation]
       ) -> Result<Vec<TraceEntry>, RunFailure>;
   }

4.3 Long-term Considerations

Performance Benchmarking:
- While the refactoring shouldn't impact performance, add benchmarks for:
  - Session preparation time
  - Sequence execution throughput
  - State machine generation efficiency
Error Type Consolidation:
- Consider a unified error enum that covers all failure cases:

   #[derive(Debug, thiserror::Error)]
   pub enum RunnerError {
       #[error("Session error: {0}")]
       Session(#[from] SessionError),
       #[error("Validation error: {0}")]
       Validation(String),
       #[error("Execution error: {0}")]
       Execution(String),
       // ... other variants
   }

Test Helper Extraction (tooltest-4zr):
- This new issue is well-timed and should:
  - Reduce coupling in the test suite
  - Improve test maintainability
  - Enable better test reuse

5. Final Assessment

Strengths:

✅ Excellent Modularization: Clear, focused modules with proper boundaries
✅ Functionality Preserved: All existing behavior maintained
✅ Test Coverage: 100% coverage with comprehensive test updates
✅ Architectural Improvements: Better separation of concerns
✅ Backward Compatibility: Public API unchanged
✅ Error Handling: Robust error processing maintained
✅ Transport Abstraction: Clean separation of transport concerns

Areas for Improvement:

Documentation could be more comprehensive
Some visibility choices could be more permissive
Minor technical debt remains from previous code

Verdict: This PR represents a significant architectural improvement that successfully addresses all major concerns from the initial review. The refactoring demonstrates excellent software engineering practices with proper separation of concerns, maintained functionality, and comprehensive test coverage.

Recommendation: Approve with minor suggestions - The changes are ready to merge. The high-priority recommendations (documentation and visibility review) are valuable but not blocking, and could be addressed in follow-up commits if time is constrained.

The author should be commended for this thorough and careful implementation of a complex refactoring that significantly improves the codebase's maintainability and extensibility.

💡 LlamaPReview Insight:
This refactoring exemplifies how to properly decompose a monolithic module while maintaining all functionality. The key success factors were:

Incremental Changes: Breaking down the refactoring into logical components
Test-Driven Approach: Ensuring 100% coverage throughout the process
Clear Boundaries: Establishing well-defined module responsibilities
Backward Compatibility: Preserving the public API surface

The result is a codebase that will be much easier to maintain and extend in the future.

💡 LlamaPReview Community
Have feedback on this AI Code review tool? Join our GitHub Discussions to share your thoughts and help shape the future of LlamaPReview.

mwotton added 3 commits January 1, 2026 10:24

Split runner into execution and transport modules

99649c9

Update bd issues for d7e.3 closure

63f2d43

Sync bd issue statuses

f75403c

llamapreview bot reviewed Jan 1, 2026

View reviewed changes

mwotton added 3 commits January 1, 2026 10:54

Split runner preparation and reduce test coupling

d7eb752

Sync beads issues

3312697

Add runner test types follow-up task

95621dc

mwotton marked this pull request as draft January 1, 2026 04:14

mwotton marked this pull request as ready for review January 1, 2026 04:14

llamapreview bot reviewed Jan 1, 2026

View reviewed changes

Remove unused sequence helpers

0798571

mwotton marked this pull request as draft January 1, 2026 04:50

mwotton marked this pull request as ready for review January 1, 2026 04:50

llamapreview bot reviewed Jan 1, 2026

View reviewed changes

Remove unused run finalizer helper

cfbc7f7

mwotton marked this pull request as draft January 2, 2026 04:46

mwotton marked this pull request as ready for review January 2, 2026 04:48

llamapreview bot reviewed Jan 2, 2026

View reviewed changes

mwotton merged commit 1bef0cb into main Jan 2, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Split runner into execution and transport modules #34

Split runner into execution and transport modules #34

mwotton commented Jan 1, 2026

Uh oh!

llamapreview bot left a comment

Uh oh!

llamapreview bot Jan 1, 2026

Uh oh!

llamapreview bot Jan 1, 2026

Uh oh!

llamapreview bot Jan 1, 2026

Uh oh!

llamapreview bot Jan 1, 2026

Uh oh!

llamapreview bot Jan 1, 2026

Uh oh!

llamapreview bot left a comment

Uh oh!

llamapreview bot left a comment

Uh oh!

llamapreview bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Split runner into execution and transport modules #34

Split runner into execution and transport modules #34

Conversation

mwotton commented Jan 1, 2026

Summary

Testing

Uh oh!

llamapreview bot left a comment

Choose a reason for hiding this comment

AI Code Review by LlamaPReview

🎯 TL;DR & Recommendation

📄 Documentation Diagram

🌟 Strengths

🔍 Notable Themes

Uh oh!

llamapreview bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

llamapreview bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

llamapreview bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

llamapreview bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

llamapreview bot Jan 1, 2026

Choose a reason for hiding this comment

Uh oh!

llamapreview bot left a comment

Choose a reason for hiding this comment

LlamaPReview: Follow-up Review for PR #34

1. Status Update

2. Detailed Analysis

2.1 Module Structure Improvements

2.2 Execution Flow Improvements

2.3 Transport Layer Abstraction

2.4 Test Updates and Coverage

2.5 Error Handling Improvements

3. Outstanding Concerns

3.1 Minor Technical Debt

3.2 Module Visibility

3.3 Future Extensibility

4. Recommendations

4.1 High Priority (Should Address Before Merge)

4.2 Medium Priority (Can Address Post-Merge)

4.3 Long-term Considerations

5. Final Assessment

Uh oh!

llamapreview bot left a comment

Choose a reason for hiding this comment

LlamaPReview: Comprehensive Follow-up Review for PR #34

1. Status Update

2. Detailed Analysis

2.1 Module Structure Improvements

2.2 Execution Flow Improvements

2.3 Transport Layer Abstraction

2.4 Error Handling and Result Processing

2.5 Test Updates and Coverage

3. Outstanding Concerns

3.1 Minor Technical Debt

3.2 Module Visibility Considerations

3.3 Documentation Opportunities

4. Recommendations

4.1 High Priority (Address Before Merge)

4.2 Medium Priority (Post-Merge Improvements)

4.3 Long-term Considerations

5. Final Assessment

Uh oh!

llamapreview bot left a comment

Choose a reason for hiding this comment

LlamaPReview: Comprehensive Follow-up Review for PR #34

1. Status Update

2. Detailed Analysis

2.1 Module Structure Improvements

2.2 Execution Flow Improvements

2.3 Transport Layer Abstraction

2.4 Error Handling and Result Processing

2.5 Test Updates and Coverage