rustdoc search: prefer stable items in search results #141658

lolbinarycat · 2025-05-27T17:45:59Z

this does add a new field to the search index, but since we're only listing unstable items instead of adding a boolean flag to every item, it should only increase the search index size of sysroot crates, since those are the only ones using the staged_api feature, at least as far as the rust project is concerned.
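A rough sketch of the size argument, in plain JavaScript. The item count and the JSON encodings are made up for illustration; the real index stores the list as an encoded bitmap string, not a JSON array, so only the relative sizes are meaningful here.

```javascript
// Hypothetical crate: 10,000 items, none unstable (no staged_api).
const itemCount = 10000;
const unstableIds = []; // hypothetical: row ids of unstable items

// Option A: one flag (0 or 1) stored for every item.
const perItemFlags = new Array(itemCount).fill(0);
for (const id of unstableIds) {
    perItemFlags[id] = 1;
}

// Option B: only list the row ids of unstable items.
const sparseList = unstableIds;

// Compare the serialized sizes of the two encodings.
const sizeA = JSON.stringify(perItemFlags).length;
const sizeB = JSON.stringify(sparseList).length;
console.log(sizeA, sizeB);
```

With no unstable items, option B costs only the empty container, while option A grows with the number of items in the crate.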

rustbot · 2025-05-27T17:46:03Z

r? @GuillaumeGomez

rustbot has assigned @GuillaumeGomez.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2025-05-27T17:46:05Z

Some changes occurred in HTML/CSS/JS.

cc @GuillaumeGomez, @jsha, @lolbinarycat

GuillaumeGomez · 2025-06-10T14:45:24Z

~~I'm concerned here about the search index size increase. In particular for very big crates like windows. Do you have some before/after numbers?~~

Never mind, the only difference is a new array in the search index, so the increase should be very limited and should only affect the std/core/alloc crates.

src/librustdoc/html/render/mod.rs

GuillaumeGomez · 2025-06-10T14:50:16Z

src/librustdoc/html/static/js/search.js

+                // sort unstable items later
+                a = Number(
+                    // @ts-expect-error
+                    this.searchIndexUnstable.get(aaa.item.crate).contains(aaa.item.bitIndex),


Instead of retrieving this information every time, why not store it in the related item when we build the search index?

yeah, you're right, we should keep linear scans out of inner loops if possible.

if we have a binary search impl, or if we ever pull one in in the future, we could actually use that to speed up the unpacking time for the search index (since we can pre-sort them when generating the index), though it's probably not worth embedding a binary search impl for something that only speeds up the standard libraries.
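For reference, a minimal sketch of what such a binary search over pre-sorted row ids could look like. This is not code from rustdoc, and `containsSorted` is a hypothetical name; it only assumes the ids arrive sorted in ascending order.

```javascript
// Returns true if `target` is in the ascending-sorted array `sortedIds`.
function containsSorted(sortedIds, target) {
    let lo = 0;
    let hi = sortedIds.length - 1;
    while (lo <= hi) {
        // Unsigned shift gives the floor of the midpoint.
        const mid = (lo + hi) >>> 1;
        const v = sortedIds[mid];
        if (v === target) {
            return true;
        }
        if (v < target) {
            lo = mid + 1;
        } else {
            hi = mid - 1;
        }
    }
    return false;
}

console.log(containsSorted([2, 5, 9, 14], 9));
```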

Ah, right, I copied this from the implementation of deprecated items sorting, so if we want to change one, we should probably change all of them (especially because those ones will be applicable to all crates).

Also, we have RoaringBitmap, so it's not a linear scan, actually.

I'm not sure what the memory impact of having 3 more boolean fields on each row is (especially when one of those is going to be unused in every non-sysroot crate), so I would kinda wanna do some degree of perf testing to see if this is worth it.

One thing we could actually do would be to add this data during transformResults. That way we're not adding a field to everything in the search index, only to the results that are currently being shown, so we would only ever do 1 bitmap lookup per item per search. That may be better than doing it per comparison, except that we're not doing it once per comparison: we only do it when the items being compared have the same edit distance, and the number of such comparisons is actually fairly likely to be less than the number of results.

TL;DR it's actually not obvious what the most performant solution would be, so I'd rather stick with what the code is already doing and then maybe make a followup PR focused on performance.
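To make the short-circuiting point concrete, here is a small sketch that counts lookups. It uses a `Set` as a stand-in for the bitmap, and the `results` data, the `dist` field, and `isUnstable` are all hypothetical; the real sort in search.js has more criteria than these two.

```javascript
// Stand-in for the per-crate bitmap of unstable items (hypothetical data).
const unstable = new Set([1, 3]);
let lookups = 0;
function isUnstable(id) {
    lookups += 1;
    return unstable.has(id);
}

// Hypothetical results: `dist` is the edit distance, `id` the row id.
const results = [
    { id: 0, dist: 2 },
    { id: 1, dist: 0 },
    { id: 2, dist: 0 },
    { id: 3, dist: 1 },
];

// Sort by edit distance first; consult stability only to break ties.
results.sort((a, b) => {
    if (a.dist !== b.dist) {
        return a.dist - b.dist;
    }
    // Stable items (0) sort before unstable ones (1).
    return Number(isUnstable(a.id)) - Number(isUnstable(b.id));
});

console.log(results.map(r => r.id), lookups);
```

Only the pair with equal edit distance (ids 1 and 2) ever reaches the stability check, so the lookup count depends on how many ties there are, not on the total number of comparisons.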

Adding this data when we build the search index seems like the way to go.

As for performance checking, @notriddle wrote a tool for it (which you can find here if I'm not wrong).

There are multiple ways of limiting the impact on performance. Another idea I had would be to put the new comparison in a function which would be created during search index creation: if we don't have any stability information, then the function is empty.

Anyway, it'll likely impact performance of all crates using rustdoc (both std/core/alloc and the others without stability info) so I'd really prefer to ensure what the impact is before approving it.

> Another idea I had would be to put the new comparison in a function which would be created during search index creation: if we don't have any stability information, then the function is empty.

Is a dynamic function call faster than searching an empty bitmap?
I suppose JITs have the advantage that they can inline closures dynamically, but I'm not sure how significant it would be.
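A minimal sketch of the "function created during search index creation" idea, for comparison. `makeStabilityCompare` is a hypothetical name and the `Set` stands in for the bitmap; whether this beats a lookup in an empty bitmap would still need measuring.

```javascript
// Build the stability comparison once, when the index is loaded.
// `unstableIds` is a hypothetical Set of row ids; rustdoc uses a bitmap.
function makeStabilityCompare(unstableIds) {
    if (unstableIds.size === 0) {
        // No stability info: every pair compares equal.
        return () => 0;
    }
    // Stable items (0) sort before unstable ones (1).
    return (a, b) => Number(unstableIds.has(a)) - Number(unstableIds.has(b));
}

const cmpStd = makeStabilityCompare(new Set([7]));
const cmpOther = makeStabilityCompare(new Set());
console.log(cmpStd(7, 1), cmpOther(7, 1));
```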

I'm trying to run the perf tool, but i'm having a bit of trouble making a full custom toolchain, probably due to bootstrap settings.

I got it "working" and I think it's a bit out of date:

Testing T ... TypeError: searchModule.execQuery is not a function
    at Object.doSearch (/home/binarycat/src/rs/rustdoc-js-profile/src/tester.js:229:37)
    at main (/home/binarycat/src/rs/rustdoc-js-profile/src/tester.js:316:49)
    at Object.<anonymous> (/home/binarycat/src/rs/rustdoc-js-profile/src/tester.js:327:1)
    at Module._compile (node:internal/modules/cjs/loader:1734:14)
    at Object..js (node:internal/modules/cjs/loader:1899:10)
    at Module.load (node:internal/modules/cjs/loader:1469:32)
    at Module._load (node:internal/modules/cjs/loader:1286:12)
    at TracingChannel.traceSync (node:diagnostics_channel:322:14)
    at wrapModuleLoad (node:internal/modules/cjs/loader:235:24)
    at Module.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:152:5)

This issue is still not fixed, right? (not the bench tool, the "unstable" value not being stored in the item)

lolbinarycat · 2025-06-10T14:50:42Z

@GuillaumeGomez the windows crate will not see any size increase (beyond the 6 bytes of the empty u field), as it does not use the staged_api feature, and we only store a list of unstable items.

i should look at how much bigger the standard library index is, though.

GuillaumeGomez · 2025-06-10T14:52:04Z

src/librustdoc/html/render/search_index.rs

@@ -736,6 +741,7 @@ pub(crate) fn build_index(
            crate_data.serialize_field("r", &re_exports)?;
            crate_data.serialize_field("b", &self.associated_item_disambiguators)?;
            crate_data.serialize_field("c", &bitmap_to_string(&deprecated))?;
+            crate_data.serialize_field("u", &bitmap_to_string(&unstable))?;


Could we add this field only if unstable is not empty?

we could, although i'm pretty sure it would make most doc bundles bigger, as the logic for handling the undef field would be more than 6 bytes, even when minified (i believe this is why we don't omit other fields either, even tho many crates have no reexports or aliases)

Good point, I'll keep that in a corner of my memory until I have some idea on how to make it better.

we can get part of the way there by noticing js truthiness rules are just sane enough for this case, the only falsy string value is the empty string, so we can do | "" to cleanse the undef. we're still using a few chars for the field access though, so i think in order to make this profitable we have to put this in a loop and apply it to multiple fields. unfortunately our fields are not homogeneous, so we'll need a different default value for different fields, meaning we'd be barely breaking even in the average case.
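A small sketch of that defaulting trick, assuming the logical OR operator (`||`) is what's meant (a bitwise `|` on `undefined` would yield `0`, not a string). The `rawCrate` object and the `defaults` table are hypothetical, and the field names are only borrowed from the diff for illustration.

```javascript
// Hypothetical crate entry where the empty "u" field was omitted.
const rawCrate = { c: "abc" };

// `undefined || ""` evaluates to "", so a missing field reads as empty.
const u = rawCrate.u || "";

// Applying the same idea to several fields needs a default per field,
// since the fields are not all strings.
const defaults = { c: "", u: "", r: [] };
const filled = {};
for (const key of Object.keys(defaults)) {
    filled[key] = rawCrate[key] || defaults[key];
}

console.log(JSON.stringify(u), filled);
```

The per-field `defaults` table is the overhead the comment refers to: it has to be shipped in search.js for every crate, which eats into whatever the omitted fields save.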

honestly if we want to make the search index smaller, we're much better off chasing an improvement that scales, like delta compressing the row ids, instead of trying to save a single digit number of bytes.

GuillaumeGomez · 2025-06-10T14:53:28Z

> @GuillaumeGomez the windows crate will not see any size increase (beyond the 6 bytes of the empty u field), as it does not use the staged_api feature, and we only store a list of unstable items.
>
> i should look at how much bigger the standard library index is, though.

Yeah, after re-reading, I realized I was wrong. So yeah, only impact should be in std/core/alloc crates. Would be nice to have some numbers for them. Although I don't expect the impact to be that big.

So seems like a very good start, just a few nits and it should be ready for merge.

fixes rust-lang#138067

lolbinarycat · 2025-06-10T18:27:04Z

Based on my limited testing, this adds 8KiB to the std search index, which is currently at 1.3MiB, meaning this is an increase in size of 0.6%.

It's a bit worse in terms of compressed size, 0.8%, but that's still not significant: it's only 1.8KB compressed.
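The percentages can be checked with a few lines of arithmetic. The compressed total below is not stated in the thread; it is derived by working backwards from the two compressed figures, so treat it as an estimate.

```javascript
const KiB = 1024;
const added = 8 * KiB;         // reported increase
const total = 1.3 * KiB * KiB; // reported std index size
const pct = (added / total) * 100;
console.log(pct.toFixed(1) + "%");

// Working backwards from the compressed figures (1.8KB being 0.8%),
// the compressed index would be roughly 1.8 / 0.008 = 225KB.
const compressedTotalKB = 1.8 / 0.008;
console.log(Math.round(compressedTotalKB) + "KB");
```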

GuillaumeGomez · 2025-06-10T18:28:19Z

It's acceptable, but now we have numbers.

lolbinarycat · 2025-06-23T20:31:20Z

@GuillaumeGomez would it be acceptable to merge this as-is, then do followup perf testing on the best way to handle unstable, deprecated, and empty description items? I have a hunch that the short circuiting nature of search result sorting might mean this is a situation where worst case algorithmic complexity is not the factor that matters in practice.

src/librustdoc/html/render/search_index.rs

GuillaumeGomez · 2025-06-24T08:59:24Z

src/librustdoc/html/render/search_index.rs

@@ -642,6 +643,7 @@ pub(crate) fn build_index(
            let mut parents_backref_queue = VecDeque::new();
            let mut functions = String::with_capacity(self.items.len());
            let mut deprecated = Vec::with_capacity(self.items.len());
+            let mut unstable = Vec::with_capacity(self.items.len());


I just realized that the way we pre-allocate everything is pretty bad since we have x times the necessary memory used. Well, this one can be tweaked in a follow-up.

rustbot · 2025-08-08T16:39:40Z

⚠️ Warning ⚠️

This PR is based on an upstream commit that is 63 days old.

It's recommended to update your branch according to the rustc-dev-guide.

GuillaumeGomez · 2025-08-08T21:17:20Z

Thanks!

@bors r+ rollup

bors · 2025-08-08T21:17:22Z

📌 Commit fdbc8d0 has been approved by GuillaumeGomez

It is now in the queue for this repository.

Rollup of 23 pull requests

Successful merges:

 - #141658 (rustdoc search: prefer stable items in search results)
 - #141828 (Add diagnostic explaining STATUS_STACK_BUFFER_OVERRUN not only being used for stack buffer overruns if link.exe exits with that exit code)
 - #144823 (coverage: Extract HIR-related helper code out of the main module)
 - #144883 (Remove unneeded `drop_in_place` calls)
 - #144923 (Move several more float tests to floats/mod.rs)
 - #144988 (Add annotations to the graphviz region graph on region origins)
 - #145010 (Couple of minor abi handling cleanups)
 - #145017 (Explicitly disable vector feature on s390x baseline of bad-reg test)
 - #145027 (Optimize `char::is_alphanumeric`)
 - #145050 (add member constraints tests)
 - #145073 (update enzyme submodule to handle llvm 21)
 - #145080 (Escape diff strings in MIR dataflow graphviz)
 - #145082 (Fix some bad formatting in `-Zmacro-stats` output.)
 - #145083 (Fix cross-compilation of Cargo)
 - #145096 (Fix wasm target build with atomics feature)
 - #145097 (remove unnecessary `TypeFoldable` impls)
 - #145100 (Rank doc aliases lower than equivalently matched items)
 - #145103 (rustc_metadata: remove unused private trait impls)
 - #145115 (defer opaque type errors, generally greatly reduce tainting)
 - #145119 (rustc_public: fix missing parenthesis in pretty discriminant)
 - #145124 (Recover `for PAT = EXPR {}`)
 - #145132 (Refactor map_unit_fn lint)
 - #145134 (Reduce indirect assoc parent queries)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of #141658 - lolbinarycat:rustdoc-search-stability-rank-138067, r=GuillaumeGomez

rustdoc search: prefer stable items in search results

fixes #138067

this does add a new field to the search index, but since we're only listing unstable items instead of adding a boolean flag to every item, it should only increase the search index size of sysroot crates, since those are the only ones using the `staged_api` feature, at least as far as the rust project is concerned.

jieyouxu · 2025-08-09T12:20:53Z

Hi bors this already merged
@bors r-

jieyouxu · 2025-08-09T12:21:43Z

I am bors
@rustbot label: -S-waiting-on-author +S-waiting-on-bors +merged-by-bors

rustbot assigned GuillaumeGomez May 27, 2025

lolbinarycat force-pushed the rustdoc-search-stability-rank-138067 branch from e40eefb to d35829b Compare May 27, 2025 19:28

GuillaumeGomez reviewed Jun 10, 2025

src/librustdoc/html/render/mod.rs (outdated)

GuillaumeGomez reviewed Jun 10, 2025

src/librustdoc/html/render/mod.rs (outdated)

GuillaumeGomez reviewed Jun 10, 2025

rustdoc search: prefer stable items in search results

1140e90

fixes rust-lang#138067

lolbinarycat force-pushed the rustdoc-search-stability-rank-138067 branch from d35829b to 341866a Compare June 10, 2025 18:36

GuillaumeGomez reviewed Jun 24, 2025

src/librustdoc/html/render/search_index.rs (outdated)

GuillaumeGomez reviewed Jun 24, 2025

lolbinarycat added 2 commits August 8, 2025 11:40

rustdoc: IndexItem::{stability -> is_unstable}

5e8ebd5

rustdoc search: add performance note about searchIndexUnstable check

fdbc8d0

lolbinarycat force-pushed the rustdoc-search-stability-rank-138067 branch from e15e1b2 to fdbc8d0 Compare August 8, 2025 16:55

GuillaumeGomez approved these changes Aug 8, 2025

bors added S-waiting-on-bors and removed S-waiting-on-review labels Aug 8, 2025

lolbinarycat added the relnotes label Aug 8, 2025

rustbot mentioned this pull request Aug 8, 2025

Tracking issue for release notes of #141658: rustdoc search: prefer stable items in search results #145136

Open

Zalathar mentioned this pull request Aug 9, 2025

Rollup of 23 pull requests #145142

Merged

bors merged commit 48f5929 into rust-lang:master Aug 9, 2025
10 checks passed

rustbot added this to the 1.91.0 milestone Aug 9, 2025

bors added S-waiting-on-author and removed S-waiting-on-bors labels Aug 9, 2025
