# Wikidata Integrator #
[<img src="https://img.shields.io/badge/slack-@genewiki/wdi_bot_dev-green.svg?logo=slack">](https://suwulab.slack.com/archives/C014ADW3SGZ)
# Slack channel
We have a Slack channel for Wikidata bot developers using WikidataIntegrator. Send us a [request](mailto:[email protected]) to join this channel.
# Installation #
The easiest way to install WikidataIntegrator is with `pip` or `pip3`. WikidataIntegrator supports Python 3.6 and higher, hence the suggestion to use `pip3`. If `pip` belongs to a Python 2 installation, the install will fail with an error indicating missing dependencies.
```
pip3 install wikidataintegrator
```
You can also clone the repo and run the installer, either with administrator rights or inside a virtualenv.
```bash
git clone https://github.com/sebotic/WikidataIntegrator.git
cd WikidataIntegrator
python3 setup.py install
```
To test the installation, start a Python console and execute the following, which retrieves the Wikidata item for ['Human'](http://www.wikidata.org/entity/Q5):
```python
from wikidataintegrator import wdi_core
my_first_wikidata_item = wdi_core.WDItemEngine(wd_item_id='Q5')
# to check successful installation and retrieval of the data, you can print the json representation of the item
my_first_wikidata_item.get_wd_json_representation()
```
# Introduction #
WikidataIntegrator is a library for reading and writing to Wikidata/Wikibase. We created it for populating [Wikidata](http://www.wikidata.org) with content from authoritative resources on Genes, Proteins, Diseases, Drugs and others.
Details on the different tasks can be found on [the bot's Wikidata page](https://www.wikidata.org/wiki/User:ProteinBoxBot).
[Pywikibot](https://www.mediawiki.org/wiki/Manual:Pywikibot) is an existing framework for interacting with the [MediaWiki](https://www.mediawiki.org/) API. We came up with our own solution because we need tight integration with the [Wikidata SPARQL endpoint](https://query.wikidata.org) in order to ensure data consistency (duplicate checks, consistency checks, correct item selection, etc.).
Compared to Pywikibot, WikidataIntegrator is currently not a full Python wrapper for the MediaWiki API. It focuses solely on providing an easy means to build Python-based Wikidata bots, and therefore resembles a basic database connector like JDBC or ODBC.
## Note: Rate Limits ##
New users may hit rate limits (8 edits per minute) when editing or creating items. [Autoconfirmed users](https://www.wikidata.org/wiki/Wikidata:User_access_levels#Autoconfirmed_users) (accounts that are at least 4 days old and have made at least 50 edits) should not need to worry about hitting these limits. Users who anticipate making large numbers of edits to Wikidata should create a separate [bot](https://www.wikidata.org/wiki/Wikidata:Bots) account and [request approval](https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot).
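For accounts that are still subject to the limit mentioned above, a script can space out its own writes. The following is a minimal sketch and not part of WikidataIntegrator: the class name `EditThrottle` and its parameters are assumptions made for illustration, and the default of 8 edits per 60 seconds is taken from the note above.

```python
import time


class EditThrottle:
    """Hypothetical helper: spaces calls evenly so that no more than
    `max_edits` of them happen within `period` seconds."""

    def __init__(self, max_edits=8, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Minimum gap between two consecutive edits (7.5 s for 8 per minute).
        self.min_interval = period / max_edits
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, if any

    def wait(self):
        """Block until the next edit is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A bot loop would call `throttle.wait()` immediately before each write. The clock and sleep functions are injectable only so the behaviour can be checked without actually waiting.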
# The Core Parts #
wdi_core can be operated in two modes: a normal mode, which updates one item at a time, and a 'fastrun' mode, which pre-loads data locally and then updates only those items whose new data differs from what is already in Wikidata. The latter mode allows for great speedups (measured up to 9x) when tens of thousands of Wikidata items need to be checked for required updates but only a small number will actually be updated, a situation usually encountered when keeping Wikidata in sync with an external resource.
The core consists of a central class called WDItemEngine, plus WDLogin for authenticating with Wikidata/Wikipedia.
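The idea behind the 'fastrun' mode can be pictured with a small, library-independent sketch. This is not WikidataIntegrator's implementation; the function name, the dictionary layout and the item keys are assumptions chosen for illustration. It compares incoming data against a locally held snapshot and reports only the items that would need a write.

```python
def items_needing_update(local_snapshot, new_data):
    """Return ids of items whose incoming values differ from the snapshot.

    Both arguments map an item id to a dict of property -> set of values.
    Items missing from the snapshot are reported as needing an update.
    """
    changed = []
    for item_id, new_props in new_data.items():
        current = local_snapshot.get(item_id, {})
        # One differing or missing property is enough to require a write.
        if any(current.get(prop) != values
               for prop, values in new_props.items()):
            changed.append(item_id)
    return changed


# Placeholder item keys and values, for illustration only.
snapshot = {"item_a": {"P351": {"1017"}}, "item_b": {"P351": {"1018"}}}
incoming = {"item_a": {"P351": {"1017"}}, "item_b": {"P351": {"1019"}}}
print(items_needing_update(snapshot, incoming))  # ['item_b']
```

Because the comparison runs against local data, unchanged items cost no request at all, which is where the speedup for mostly unchanged datasets comes from.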
## wdi_core.WDItemEngine ##
This is the central class which does all the heavy lifting.
Features:
* Load a Wikidata item based on data to be written (e.g. a unique central identifier)
* Load a Wikidata item based on its Wikidata item id (aka QID)
* Checks for conflicts automatically (e.g. multiple items carrying a unique central identifier will trigger an exception)
* Checks automatically if the correct item has been loaded by comparing it to the data provided
* All Wikidata data types implemented
* A dedicated WDItemEngine.write() method allows loading and consistency checks of data before any write to Wikidata is performed
* Full access to the whole Wikidata item as a JSON document
* Minimize the number of HTTP requests for reads and writes to improve performance
* Method to easily execute [SPARQL](https://query.wikidata.org) queries on the Wikidata endpoint
There are two ways of working with Wikidata items:
* A user can provide data, and WDItemEngine will search for and load/modify an existing item or create a new one, solely based on the data provided (preferred). This also performs consistency checks based on a set of SPARQL queries.
* A user can work with a selected QID to specifically modify the data on the item. This requires that the user knows what they are doing and should only be used with great care, as this does not perform consistency checks.
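The item selection and conflict check described in the first approach can be sketched without the library as follows. This is a simplified illustration, not WDItemEngine's actual logic: `ItemConflictError`, `select_item` and the in-memory index are hypothetical stand-ins for what the library derives from SPARQL queries.

```python
class ItemConflictError(Exception):
    """Hypothetical error: one identifier value is carried by several items."""


def select_item(identifier_index, value):
    """Pick the single item carrying `value`, or None if no item has it.

    identifier_index maps an identifier value to the set of item ids
    carrying it. More than one match is treated as a conflict.
    """
    matches = identifier_index.get(value, set())
    if len(matches) > 1:
        raise ItemConflictError(
            f"{value!r} is carried by several items: {sorted(matches)}")
    if matches:
        return next(iter(matches))  # exactly one item: load and modify it
    return None  # no item found: a new one would be created
```

Returning `None` for an unknown identifier mirrors the "create a new one" branch, while the exception mirrors the conflict case in the feature list.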
Examples below illustrate the usage of WDItemEngine.
## wdi_login.WDLogin ##
### Login with username and password ###
To write bots for Wikidata, a bot account is required, and each script needs to go through a login procedure. To obtain a bot account on Wikidata, a specific task needs to be defined and then proposed to the Wikidata community. If the community discussion concludes that your bot code and account are useful for Wikidata, you are ready to go.
However, the code of wdi_core can also run with normal user accounts; the main difference is a lower limit on writes per minute.
wdi_login.WDLogin provides the login functionality and also stores the required cookies and edit tokens (for security reasons, every Wikidata edit requires an edit token).
The constructor takes two essential parameters, username and password. Additionally, the server (default www.wikidata.org) and the token renewal period can be specified.
```Python
from wikidataintegrator import wdi_login
login_instance = wdi_login.WDLogin(user='<bot user name>', pwd='<bot password>')
```
### Login using OAuth1 ###
The Wikimedia universe currently supports only OAuth1 for OAuth authentication. If WDI is used as a backend for a web app, or the bot should authenticate via OAuth, WDI supports this.
You just need to specify the consumer token and consumer secret when instantiating wdi_login.WDLogin. In contrast to username and password login, OAuth is a two-step process, because manual user confirmation is required for an OAuth login. This means that the method continue_oauth() needs to be called after creating the wdi_login.WDLogin instance.
Example:
```Python
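from wikidataintegrator import wdi_login
# the consumer secret is passed as consumer_secret, not as pwd
login_instance = wdi_login.WDLogin(consumer_key='<your_consumer_key>', consumer_secret='<your_consumer_secret>')
login_instance.continue_oauth()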
```
The method continue_oauth() either prompts the user for a callback URL (normal bot runs) or accepts it as a parameter. The latter is meant for cases where WDI is used as a backend for, e.g., a web app: the callback provides the authentication information directly to the backend, so no copy and paste of the callback URL is required.
## Wikidata Data Types ##
Currently, Wikidata supports 17 different data types. The data types are represented as their own classes in wdi_core. Each data type has its peculiarities, which means that some of them require special parameters (e.g. Globe Coordinates).
The data types currently implemented:
* wdi_core.WDCommonsMedia
* wdi_core.WDExternalID
* wdi_core.WDForm
* wdi_core.WDGeoShape
* wdi_co