[OSM-talk] Could we just pause any wikidata edits for a month or two?
Martin Koppenhoefer
dieterdreist at gmail.com
Wed Oct 11 13:02:13 UTC 2017
2017-10-11 13:42 GMT+02:00 Christoph Hormann <osm at imagico.de>:
> * Wikidata is definitely not suited as an universal meta-database
> connecting OSM with other open data sets. This is because of the
> Notability concept (https://www.wikidata.org/wiki/Wikidata:Notability)
> which practically means the vast majority of the >500 million tagged
> features in OSM will never be able to get a Wikidata ID and will
> therefore never be able to be connected to other data sets through
> Wikidata.
>
From what I have seen so far, this should probably be less of a concern,
but it is an uncertainty (because it could be interpreted more rigidly in
the future), I agree. Requirements seem to be much lower than they are for
wikipedia inclusion, for one thing because a link to any of these wikimedia
projects is sufficient: Wikipedia, Wikivoyage, Wikisource, Wikiquote,
Wikinews, Wikibooks, Wikidata, Wikispecies, Wikiversity, or Wikimedia
Commons (this paragraph is followed by some clarification and limitation).
In other words, if you want to save your pet wikidata object from deletion
it is sufficient to take a picture of it and upload it to wm commons.
There's also a very soft criterion in the next paragraph which allows
objects that "[refer] to an instance of a *clearly identifiable conceptual
or material entity*. The entity must be notable, in the sense that it *can
be described using serious and publicly available references*."
It requires references to be "serious" (how subjective is this?).
On the other hand, even stuff like objects for osm tags don't get removed:
https://www.wikidata.org/wiki/Q29637965
likely because it fulfills a structural need.
>
> > * What is the qualification of Wikidata for having its IDs in OSM
> > (both for wikidata=* and X:wikidata=*)? Is there a particular
> > objective criterion that qualifies it? Would there be other external
> > IDs that would also qualify under these criteria? Is there a limit
> > in the number of different external IDs OSM is going to accept?
>
Is it something we have to decide now, or can we wait until we can see how
many external IDs mappers actually put into OSM, and whether this can
become a serious problem?
For me, the criteria in favour of wikidata are:
- it has a very permissive license (cc0)
- it is openly accessible
- it is fully downloadable as a dump (i.e. I don't have to use APIs which
might log what I look at or limit the speed or quantity of my access)
- there is overlap with our field of interest
>
> > > * To what extent has there been information transferred
> > > systematically from Wikidata and Wikipedia to OSM based on wikidata
> > > ID references (like adding names in different languages). As
> > > others have explained this would be legally problematic and it
> > > would be important to know how common this is.
> >
> > I agree that there are questions about OSM's acceptance of labels and
> > statements copied from Wikidata, though I would've expected this
> > phenomenon to be at least as common with Wikipedia long before the
> > introduction of the wikidata tag.
>
> But my question was specifically to what extent data has been
> transferred based on wikidata ID references. The question if such data
> transfer happened before based on other connections has nothing to do
> with this.
>
Is this about reliability of the information, or about licensing questions?
As wikidata is published under cc0, the latter shouldn't matter here: it is
the wikimedia foundation that guarantees that they can release this data as
cc0, no?
Although, admittedly, wikimedia foundation themselves have not yet formed a
definitive view on database protection:
https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
Disclaimer: "Note: This page shares the Wikimedia Foundation’s preliminary
perspective on a legal issue. This page is not final - if you have
additional information, or want to provide a different perspective, please
feel free to expand or add to it."
And there are also some sentences on this page (
https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#Conclusion ) that
read like WMF would be suggesting to copy protected material from EU
databases "under the radar": "Extraction and use of data should be kept to
a minimum and *limited to unprotected material, such as uncopyrightable
facts and short phrases*, rather than extensive text. For EU databases,
bots or other automated ways of extracting data should also be avoided
because of the Directive’s prohibition on “repeated and systematic
extraction” of even insubstantial amounts of data."
First, under the database directive there is no "unprotected material such
as ... facts and short phrases", and secondly, if you take information from
a database to build another database like wikidata it is a given that all
users together likely do "repeated and systematic extraction", regardless
of using a bot or doing it "manually".
Cheers,
Martin