GitRepo::getRevCount(): Compute revcount in parallel #245
For repos with a lot of non-linearity in the commit graph (like Nixpkgs), this speeds up getting the revcount a lot, e.g. `nix flake metadata /path/to/nixpkgs?rev=9dc7035bbee85ffc740d893e02cb64460f11989f` went from 9.1s to 3.7s.
Walkthrough
The code refactors repository revision counting to use concurrent multithreading. It introduces a
Changes
Sequence Diagram
sequenceDiagram
participant Caller as Caller
participant GRI as GitRepoImpl
participant Pool as ThreadPool
participant RepoPool as RepoPool
participant Worker as Worker Lambda
Caller->>GRI: getRevCount(start)
activate GRI
Note over GRI: Initialize
GRI->>GRI: getPool() → RepoPool
GRI->>RepoPool: create concurrent_flat_set
GRI->>RepoPool: insert startOid → done set
GRI->>GRI: create ThreadPool
Note over GRI: Parallel Traversal
GRI->>Pool: spawn workers
activate Pool
loop Concurrent Processing
Pool->>Worker: process commit parent
activate Worker
Worker->>Worker: retrieve parentOID
Worker->>Worker: check if seen
alt Not yet processed
Worker->>RepoPool: insert into done set
Worker->>Pool: enqueue next parent
else Already processed
Worker->>Worker: skip
end
deactivate Worker
end
Pool->>GRI: synchronize (wait completion)
deactivate Pool
GRI->>Caller: return revision count
deactivate GRI
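The traversal in the diagram can be sketched roughly as follows. This is not the code from `git-utils.cc`; it is a minimal, self-contained illustration under several assumptions: the commit graph is a hypothetical in-memory `CommitGraph` (a vector of parent indices) instead of libgit2 lookups, a mutex-guarded `std::unordered_set` stands in for the concurrent set, and plain `std::thread` workers stand in for the thread pool. The function name `countReachable` is made up for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a repository: parents[i] lists the parents of commit i.
// Assumes every parent index is a valid index into the vector.
using CommitGraph = std::vector<std::vector<std::size_t>>;

// Count the commits reachable from `start` (including `start` itself),
// visiting each commit exactly once even when history branches and merges.
std::size_t countReachable(const CommitGraph & parents, std::size_t start, unsigned nrThreads = 4)
{
    assert(start < parents.size());
    if (nrThreads == 0) nrThreads = 1;

    std::mutex m;
    std::condition_variable wakeup;
    std::deque<std::size_t> queue{start};        // commits waiting to be processed
    std::unordered_set<std::size_t> done{start}; // commits already seen
    std::size_t active = 0;                      // workers currently processing a commit

    auto worker = [&]() {
        std::unique_lock<std::mutex> lock(m);
        while (true) {
            // Sleep until there is work, or until no worker can produce more work.
            wakeup.wait(lock, [&] { return !queue.empty() || active == 0; });
            if (queue.empty()) return; // nothing queued and nobody active: finished

            std::size_t commit = queue.front();
            queue.pop_front();
            ++active;

            // In a real repository the parent lookup is the expensive step,
            // so it runs outside the lock.
            lock.unlock();
            const std::vector<std::size_t> & ps = parents[commit];
            lock.lock();

            // Enqueue only parents that were not seen before.
            for (std::size_t p : ps)
                if (done.insert(p).second) queue.push_back(p);

            --active;
            wakeup.notify_all();
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < nrThreads; ++i) threads.emplace_back(worker);
    for (auto & t : threads) t.join();

    return done.size();
}
```

The "insert into the done set, and enqueue only if the insert succeeded" step is what keeps a merge-heavy history from being walked more than once: a commit reachable through several paths is enqueued by whichever worker reaches it first and skipped by the rest.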
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/libfetchers/git-utils.cc (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/libfetchers/git-utils.cc (2)
src/libutil/include/nix/util/pool.hh (2)
Pool(66-75)
Pool(93-99)
src/libfetchers/include/nix/fetchers/git-utils.hh (5)
rev(31-31)
rev(33-33)
rev(85-85)
rev(92-92)
rev(107-107)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build_aarch64-darwin / build
- GitHub Check: build_x86_64-linux / build
upstream ref NixOS#14462
Motivation
For repos with a lot of non-linearity in the commit graph (like Nixpkgs), this speeds up getting the revcount a lot, e.g. `nix flake metadata /path/to/nixpkgs?rev=9dc7035bbee85ffc740d893e02cb64460f11989f` went from 9.1s to 3.7s.
Context
Summary by CodeRabbit