Ocean Protocol
Beyond online PDFs
Marcus Jones
Monday November 3rd, 2019
Blockchain for Science @ Beta Haus, Berlin
Overview
- Motivation: The Data Economy => The AI Economy
- Ocean Protocol Overview
- Marketplace
- Data Science
- Static assets in Ocean
- Generalized assets = Services
- No Data Escape / Federated Learning / Compute to Data
- Beyond PDFs
Motivation - What is the Data Economy?
The Data Economy measures the overall impact of the data market on the economy as a whole. It covers the activities
listed below, all enabled by digital technologies, and includes the direct, indirect, and induced effects of the data
market on the economy.
* Source: A study commissioned by the EU
https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=44400
In 2016 the data economy was worth nearly 2% of European GDP (330 billion euros).
By 2020 this is expected to rise to 729 billion euros.
Globally, the figure runs to many trillions of euros.
● Data Generation
● Data Collection
● Data Storage
● Data Processing
● Data Distribution
● Data Analysis
● Data Exploitation
Incentive to hoard and silo data
“It is not who has the best
algorithm that wins. It’s who
has the most data.”
- Banko, M., & Brill, E. (2001). Scaling to very very large corpora for
natural language disambiguation.
https://doi.org/10.3115/1073012.1073017
Banko + Brill proven by Deep Learning
Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2018). Viewpoint:
When will AI exceed human performance? Evidence from AI experts.
Journal of Artificial Intelligence Research.
https://doi.org/10.1613/jair.1.11222
Unlocking the data economy: Data Orchestration
The end-to-end automation of data-driven processes, raw data -> value
- data fragmentation
- confusing provenance
- organizational silos
- complicated technology integrations
______________________
loss of potential value
Unlocking the data economy: AI Orchestration
Challenges in AI-driven processes
- data fragmentation
- even more confusing provenance
- organizational silos
- complicated technology integrations
- code authorship
- training history
- productionisation
- logging
______________________
loss of potential value!
The AI Ecosystem
Composition & registration of static AI assets into Ocean
Data science workflow
Datasets registered into Ocean Protocol
AI assets registered into Ocean Protocol
Entire workflow as an asset
(A real data-science workflow…)
Composition of a static ‘asset’
Main components
Satellite imagery:
did:op:3fdcc402b9994d88828e82f9be16e40eaf8eed10036c48ae9a826415e3ca46ce
Jupyter Notebook:
did:op:5268ca64d7d843acae995f1712f0941c0dae57fb3ec0491bb6dda83d93f534c0
Pre-trained model:
did:op:4b8a4bd15e8e429ba7918e8d9005dc58923d0ce408834e0ea9089cd41fc780b3
Example: Deforestation detection in the Amazon
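The three identifiers above can be treated as one composite workflow. The following is a minimal sketch, not Ocean's own tooling: it assumes a DID is `did:op:` followed by 64 lowercase hex characters (the shape of the three identifiers listed), and the `AssetRef` class, its `role` labels, and `describe` are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass

# Assumption: "did:op:" plus 64 lowercase hex characters, inferred only from
# the three identifiers shown on the slide.
DID_PATTERN = re.compile(r"^did:op:([0-9a-f]{64})$")


@dataclass(frozen=True)
class AssetRef:
    """A reference to one registered asset (hypothetical helper type)."""
    role: str  # illustrative label: "dataset", "notebook", "model"
    did: str

    def asset_id(self) -> str:
        """Return the hex identifier that follows the 'did:op:' prefix."""
        match = DID_PATTERN.match(self.did)
        if match is None:
            raise ValueError(f"not a did:op identifier: {self.did!r}")
        return match.group(1)


# The components of the deforestation example, copied from the slide.
WORKFLOW = [
    AssetRef("dataset", "did:op:3fdcc402b9994d88828e82f9be16e40eaf8eed10036c48ae9a826415e3ca46ce"),
    AssetRef("notebook", "did:op:5268ca64d7d843acae995f1712f0941c0dae57fb3ec0491bb6dda83d93f534c0"),
    AssetRef("model", "did:op:4b8a4bd15e8e429ba7918e8d9005dc58923d0ce408834e0ea9089cd41fc780b3"),
]


def describe(workflow):
    """Summarise each component as 'role: <first 8 hex chars>...'."""
    return [f"{ref.role}: {ref.asset_id()[:8]}..." for ref in workflow]
```

Calling `describe(WORKFLOW)` gives a short, human-readable manifest of the workflow while rejecting any string that does not match the assumed DID shape.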
Example: (Publishing an asset)
https://commons.nile.dev-ocean.com/
Key concepts
Open source
Decentralized
Permissionless
Community governed
Incentivized
_______________
Unlocking data
Developer Resources
https://oceanprotocol.com/developers
https://github.com/oceanprotocol
Ocean Protocol Enhancement Proposals
Join the discussion and ideation at:
https://github.com/oceanprotocol/OEPs
Networks
• Local test network
• Public test network
• Ethereum Mainnet
• Commons network
• Priced network
Going “beyond data”
The generalized AI asset - a service
The generalized AI asset (it’s a service)
- SSH access to my GPU @ 2k OCEAN / hr
- Calculate the deforested areas in an uploaded image @ 10 OCEAN / image
- Download the stream of IoT data @ 50 OCEAN / hr
- Train a federated model on our genetic data @ 500 OCEAN / 1%-increase
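One way to picture a service as an asset is as a description plus a price per billing unit. The sketch below is illustrative only: `ServiceOffer`, its field names, and the `OFFERS` keys are hypothetical and do not reflect any Ocean metadata schema; the prices and units are the ones quoted on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceOffer:
    """A priced service (hypothetical structure, not an Ocean schema)."""
    description: str
    price_ocean: float  # price per billing unit, in OCEAN tokens
    unit: str           # billing unit as quoted on the slide

    def cost(self, quantity: float) -> float:
        """Total price in OCEAN for `quantity` billing units."""
        if quantity < 0:
            raise ValueError("quantity must be non-negative")
        return self.price_ocean * quantity


OFFERS = {
    "gpu_ssh": ServiceOffer("SSH access to a GPU", 2000, "hour"),
    "deforestation": ServiceOffer(
        "Calculate deforested areas in an uploaded image", 10, "image"),
    "iot_stream": ServiceOffer("Download a stream of IoT data", 50, "hour"),
    # The slide does not say which metric the 1% increase refers to.
    "federated_training": ServiceOffer(
        "Train a federated model on genetic data", 500, "1% increase"),
}
```

For example, `OFFERS["deforestation"].cost(25)` prices a batch of 25 images at 250 OCEAN.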
Research topics: Ownership of AI Assets
- Tied to identity
- Ownership loss through escape of
– Data
– AI Model
– ?
- Shared ownership?
- Delegated rights?
Research topics: Pricing of Digital Assets
- Fixed price
– Conditional
– Prorated over time, accuracy, etc.
- Negotiated
- Bonding-curve based
- Bounties/contests
(See the “Data Pricing” blog series)
https://blog.oceanprotocol.com/tagged/bonding-curves
https://blog.oceanprotocol.com/lets-talk-about-data-pricing-part-i-bbc9cf781d9f
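To make the bonding-curve option concrete: on a bonding curve, the price of the next token depends on how many tokens already exist. The sketch below assumes a linear curve, price(s) = base + slope * s, purely for illustration; the curve shape, the parameter values, and the function names are assumptions and are not taken from the blog series.

```python
def spot_price(supply: float, base: float = 1.0, slope: float = 0.01) -> float:
    """Price of the next token at the current supply (assumed linear curve)."""
    return base + slope * supply


def buy_cost(supply: float, amount: float,
             base: float = 1.0, slope: float = 0.01) -> float:
    """Cost of minting `amount` tokens starting from `supply`.

    This is the area under the linear price curve between supply and
    supply + amount: base*amount + slope*(supply*amount + amount**2 / 2).
    """
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return base * amount + slope * (supply * amount + amount ** 2 / 2)
```

With these assumed parameters, the first 100 tokens cost less in total than the next 100, so early curators of a dataset pay less than later ones.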
Killer app: “No Data-Escape” or “Compute to Data” protocol with incentives
Bring model to the data
[Diagram: private data, modeling pipeline f(x), privately train model, private model, model predictions]
Data stays behind firewall
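The idea of bringing the model to the data can be sketched in a few lines. This is a toy illustration under stated assumptions, not Ocean's implementation: `DataProvider` and `fit_line` are hypothetical names, the model is a one-feature least-squares line, and nothing here stops malicious training code from leaking rows through its return value, which a real system would have to address.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, float]    # (feature x, target y)
Model = Tuple[float, float]  # (slope, intercept)


class DataProvider:
    """Holds private rows and runs submitted training code next to them."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self._rows: List[Row] = list(rows)  # no method returns these rows

    def run(self, train: Callable[[Sequence[Row]], Model]) -> Model:
        # Only the trained model crosses the boundary back to the caller.
        return train(self._rows)


def fit_line(rows: Sequence[Row]) -> Model:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x
```

A consumer would call `provider.run(fit_line)` and receive only the two fitted parameters, never the rows themselves.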
Compute to data modes
- On-premise compute
- Third-party premise compute
- Federated learning
- Homomorphic encryption
- Production model as an AI asset
Federated / On-line / Incremental learning
- Horizontal: same feature space
- Vertical: same ID space
- Transfer: different feature and ID space
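For the horizontal case, where every participant holds different samples over the same features, a common approach is federated averaging: each client trains locally and a coordinator averages the resulting weights by sample count. The sketch below is a minimal pure-Python illustration with a one-feature linear model; the function names, learning rate, step count, and round count are assumptions chosen for the example, not values from the talk.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]   # (x, y): all clients share this feature space
Weights = Tuple[float, float]  # (slope, intercept)


def local_update(weights: Weights, data: Sequence[Sample],
                 lr: float = 0.05, steps: int = 5) -> Weights:
    """Run a few gradient-descent steps on one client's private samples."""
    w, b = weights
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error of y ~ w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_average(updates: Sequence[Weights],
                      sizes: Sequence[int]) -> Weights:
    """Average client weights, weighted by each client's sample count."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return w, b


def train(clients: Sequence[Sequence[Sample]], rounds: int = 200) -> Weights:
    """Coordinate training; only weights, never samples, are exchanged."""
    weights: Weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in clients]
        weights = federated_average(updates, [len(d) for d in clients])
    return weights
```

Only the weight pairs travel between clients and coordinator, which is what lets each dataset stay where it is.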
Beyond PDFs
Orchestration of a Research Ecosystem
- 100 OCEAN / hour
- 594 upvotes
- Complete version history
- Delegated partial payments
- Dataset never leaves premise, compute-to-data
(Demo links)
https://commons.nile.dev-ocean.com/asset/did:op:62a453025c26438cbb5b0d3060de3a0efe316c44d32c4345a5acbe60e9e26fbd
https://commons.nile.dev-ocean.com/asset/did:op:c947f751c1c547c6a601d5036c3b41623528a2feacdf47f999b75d8df97f72e8
https://commons.nile.dev-ocean.com/asset/did:op:9b8791e65b5440049a512a8815495591445919b265654bb3ad7e3b2bdcb4e2bc
https://commons.nile.dev-ocean.com/asset/did:op:162382ec40f64742929feb7900112538c65a711f6dea4c1e80a28137f100eb96
http://a1e6efedde7f611e8887b1225ec03c49-1029831930.us-east-1.elb.amazonaws.com/hub/user/marcusjones/lab?
https://www.youtube.com/watch?v=27LHfS4xrWg
The roadmap, going beyond data
The road ahead (Ocean Protocol)
Unlock data
1) Allow and incentivize discovery and access to data
2) Broaden the scope to allow composite general assets
3) Build compute-to-data
Build a permissionless ecosystem
4) Expand incentivization mechanisms
5) Fully permissionless and ownerless
Research and Experiment
Provenance
Curation
Bonding curves
Federated learning
Identity
Shared ownership
Delegated ownership
Non-fungible tokens
The road ahead (BigChainDB)
Find product <> market fit