Canonical urls for deduplication of google results in rustdoc #9461

Description

@Seldaek
Contributor

When multiple versions of the documentation are available, they tend to pollute Google results. To prevent that, it would be good to always have the latest stable release available under /current/, and have all previous versions plus the master docs contain canonical links to the current docs, like:

<link rel="canonical" href="http://.../current/..." />

That way it consolidates all results under the current URL which will always be correct, and it also encourages people linking to docs in blog posts and such to use links that will not rot.
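
As a rough illustration of the proposal, here is a minimal sketch of how a doc generator could derive such a tag. The function name `canonical_link`, the `/<version>/<rest>` path layout, and the `base` parameter are assumptions made for this example; none of them are existing rustdoc APIs or options.

```rust
/// Builds a canonical link tag for a versioned docs path.
///
/// Assumption (not from rustdoc): pages live under `/<version>/<rest>`,
/// and the latest stable docs are mirrored under `/current/<rest>`.
/// Returns `None` when the path does not have that shape.
/// Note: no HTML escaping is done here; a real generator would need it.
fn canonical_link(base: &str, path: &str) -> Option<String> {
    // "/0.8/std/vec.html" -> "0.8/std/vec.html"
    let trimmed = path.strip_prefix('/')?;
    // Drop the version segment and keep the rest of the path.
    let (_version, rest) = trimmed.split_once('/')?;
    let base = base.trim_end_matches('/');
    Some(format!(
        r#"<link rel="canonical" href="{}/current/{}" />"#,
        base, rest
    ))
}
```

Every version of a page, including master, would then emit the same tag, so search engines can fold them into the single /current/ URL.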

/cc @alexcrichton

Activity

alexcrichton commented on Sep 30, 2013

@alexcrichton
Member

Do you know how search engines handle situations where pages go away or pages are just created? In theory old documentation could refer to a canonical location which no longer exists (if the module were removed), and new documentation could refer to canonical locations which do not yet exist (because they're newly added modules).

Do you know of special attributes to handle these cases?

thestinger commented on Sep 30, 2013

@thestinger
Contributor

If a module is removed, a 404 is correct. In theory it would be better to redirect them on renames but it's not going to be possible because it's not tracked.

The point of a canonical URL is to say that the page is only a non-canonical version of another URL and shouldn't show up separately in search. When we eventually have supported versions, the newest release (or master) can be given as the canonical one so the older pages won't clutter search results but will be available via a drop-down menu.

Of course, if the newer version does not have the module, you would have to omit stating it is the canonical URL - meaning you need to regenerate the old documentation every time you do the new ones. I don't think it's worth the complexity.
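
To make the cost concrete, here is a minimal sketch of the check described above, assuming the generator has a set of page paths present in the current docs. The names `canonical_if_present` and `current_pages` are hypothetical. That set changes with every release, which is why old docs would need regenerating each time.

```rust
use std::collections::HashSet;

/// Returns the canonical URL for `page` only if the current docs still
/// contain that page. `None` means the canonical tag should be omitted.
///
/// Assumption: `current_pages` holds paths relative to the docs root,
/// e.g. "std/vec/index.html". It is a hypothetical input, not something
/// rustdoc provides.
fn canonical_if_present(
    current_pages: &HashSet<String>,
    base: &str,
    page: &str,
) -> Option<String> {
    if current_pages.contains(page) {
        Some(format!("{}/current/{}", base.trim_end_matches('/'), page))
    } else {
        // The module was removed or renamed in the current release.
        None
    }
}
```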

thestinger commented on Sep 30, 2013

@thestinger
Contributor

FWIW I think we should only have documentation on the site for releases we still support. Until we get to 1.0, we can make an exception for the last 0.x snapshot :).

chris-morgan commented on Oct 15, 2013

@chris-morgan
Member

When a module is removed, 404 is indeed correct, but just remember that that's not the end of the story, as I wrote about recently at http://chrismorgan.info/blog/github-links-case-study.html.

What the Django docs do is worth considering: https://docs.djangoproject.com/. They make it easy to switch between versions and show a warning banner on the development build suggesting you may want to look at the latest stable instead. They don't, however, have a banner reminding you "this isn't the latest stable version" on old versions, which continues to surprise me a little. I reckon old versions (though not pre-1.0 ones after a while) should stay in existence, but with a banner at the top indicating that this is an unsupported release and that docs for the latest version, X.Y, are available in such-and-such a place. Of course, these things become much more directly applicable once we get to 1.0 and beyond.
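
A minimal sketch of the banner rule suggested above, assuming versions can be compared as `(major, minor)` pairs and that anything newer than the latest stable is a development build. The function name `version_banner`, the wording, and the `/current/` location are all assumptions for illustration.

```rust
use std::cmp::Ordering;

/// Picks banner text for a docs page, or `None` for the latest stable docs.
/// Versions are (major, minor) pairs; tuples compare lexicographically.
fn version_banner(page: (u32, u32), latest: (u32, u32)) -> Option<String> {
    match page.cmp(&latest) {
        // Older than the latest stable: point readers at the current docs.
        Ordering::Less => Some(format!(
            "This is documentation for an unsupported release, {}.{}. \
             Docs for the latest version, {}.{}, are available under /current/.",
            page.0, page.1, latest.0, latest.1
        )),
        // Newer than the latest stable: assumed to be a development build.
        Ordering::Greater => Some(format!(
            "This is documentation for the development build. \
             You may want the latest stable version, {}.{}, instead.",
            latest.0, latest.1
        )),
        Ordering::Equal => None,
    }
}
```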

@alexcrichton I guess in the no-longer-exists case you'd need to either implement something so that you can conveniently reprocess the old docs, or do a little bit of post-processing to fix the "errors". For the doesn't-yet-exist case, checking online or comparing crates (which sounds risky) would be the only real ways, I suppose.

steveklabnik commented on Apr 20, 2015

@steveklabnik
Member

Triage: no change.

steveklabnik commented on Jun 27, 2016

@steveklabnik
Member

Triage: no changes

SamWhited commented on Jan 22, 2017

@SamWhited
Contributor

(sorry for the duplicate; moving relevant link here)

See also: https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Choosing_between_www_and_non-www_URLs#Using_%3Clink_relcanonical%3E

This page from Google's help center [1] appears to suggest that they only use the canonical URL as a hint. While it doesn't say so explicitly, this matches behavior I've seen in the past: if the page pointed to by the canonical URL is a 404, Google simply uses the original URL. I suspect that is what we want in this case since it keeps things easy: point canonical URLs to the /stable path, and if the module is deleted it doesn't really matter.

sanmai-NL commented on Mar 2, 2017

@sanmai-NL

I would like to re-open discussion on this issue (@steveklabnik, I think you would be the one to ping).

With https://docs.rs/ in place, I think all rustdoc documentation for public crates should provide a canonical link to the appropriate documentation there. Why? Search for any popular crate on Google and you get a litter of confusing, often outdated self-hosted versions of the docs. This may lead the programmer to accidentally study outdated publications of documentation (e.g. when not depending on a specific version of a library), and use and perhaps even bookmark publications that may be partially broken or are not as continuously available as on https://docs.rs/.

Google's former SEO representative has indicated that Google may disregard canonical links that result in 404 HTTP response codes. Since Google is by far the most used search engine, and it addressed this issue in a sensible way already, I personally take little issue with the possibility of 404 canonical links.

Here's my logic:

steveklabnik commented on Mar 2, 2017

@steveklabnik
Member

> I would like to re-open discussion on this issue (@steveklabnik, I think you would be the one to ping).

No need to re-open anything 😄 It's an open issue.

> With https://docs.rs/ in place, I think all rustdoc documentation for public crates should provide a canonical link to the appropriate documentation there

That would be nice, but without some improvements to docs.rs, it's not feasible. There are several people who do extra things to make their docs nicer and explicitly don't want their docs hosted on docs.rs at all.

sanmai-NL commented on Mar 2, 2017

@sanmai-NL

Interesting, could you point me to examples of what those people do?

steveklabnik commented on Mar 2, 2017

@steveklabnik
Member

I believe @briansmith, @retep998, and @bluss are three of those people?

    Labels

    C-feature-request (Category: A feature request, i.e: not implemented / a PR.)
    T-rustdoc (Relevant to the rustdoc team, which will review and decide on the PR/issue.)

        Participants

        @briansmith @steveklabnik @ehuss @skade @alexcrichton

          Canonical urls for deduplication of google results in rustdoc · Issue #9461 · rust-lang/rust