Workshop 7d, 30 October 2014, eChallenges e-2014. Copyright 2014, Ubitech Ltd.
Business Value Creation from Linked
Data Analytics: The LinDA Approach
Anastasios Zafeiropoulos, Eleni Fotopoulou
Ubitech Ltd./R&D Department
Athens, Greece
azafeiropoulos@ubitech.eu
Introduction
• A wide range of data sources is available today, but the means to
exploit them optimally and to carry out advanced analysis are lacking.
• Concepts must be properly interconnected so that relationships among
entities represented in different datasets can be examined.
• An approach for the extraction of Linked Data analytics is presented,
based on the exploitation of a set of tools for proper transformation and
interlinking of public and private datasets and the realization of analysis
over them.
• The proposed approach aims to enhance the ability of public and
private sector organizations to provide usable Linked Data, while
offering SMEs the opportunity to perform advanced algorithmic
analysis.
Challenges (1)
• Need to manage structured and unstructured data in multiple
formats that in some cases lack representation based on defined
schemas.
• Data aggregation from distributed sources
– increasing wealth of dataset cross-linkage;
– SPARQL queries cannot readily be executed as their constituent triple
patterns span across multiple datasets.
• Compilation of proper and meaningful datasets to be provided to the
analytics tools
– review the datasets and prepare them in proper format;
– extract knowledge from the data through interlinking, inferences as well as
analytics extraction;
– maintain and update the data regularly;
– eliminate co-references among the available data.
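The cross-dataset query and co-reference points above can be illustrated with a minimal Python sketch. It is not LinDA code: the two datasets, the example.org and a.example/b.example URIs, and the helper names are all hypothetical, and triples are modelled as plain tuples rather than with an RDF library. It shows that two triple patterns sharing a subject find no match while the same company carries two different URIs, and that they do match once owl:sameAs links are used to collapse co-referent URIs.

```python
# Hypothetical example data: two small datasets that describe the same company
# under different URIs, linked by an owl:sameAs statement.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

dataset_a = {
    ("http://a.example/org/1", "http://example.org/vocab#name", "Acme Ltd."),
    ("http://a.example/org/1", SAME_AS, "http://b.example/company/acme"),
}
dataset_b = {
    ("http://b.example/company/acme", "http://example.org/vocab#revenue", "1200000"),
}


def canonical_map(triples):
    """Map every URI in an owl:sameAs cluster to one representative (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(s), find(o)
            if root_s != root_o:
                # keep the lexicographically smaller URI as representative
                small, large = sorted((root_s, root_o))
                parent[large] = small
    return {uri: find(uri) for uri in list(parent)}


def merge(*datasets):
    """Union the datasets, rewrite co-referent URIs, and drop the sameAs links."""
    all_triples = set().union(*datasets)
    canon = canonical_map(all_triples)
    return {
        (canon.get(s, s), p, canon.get(o, o))
        for s, p, o in all_triples
        if p != SAME_AS
    }


def name_and_revenue(triples):
    """Join two triple patterns on the shared subject: ?s name ?n . ?s revenue ?r"""
    names = {s: o for s, p, o in triples if p.endswith("#name")}
    revenues = {s: o for s, p, o in triples if p.endswith("#revenue")}
    return sorted((names[s], revenues[s]) for s in names.keys() & revenues.keys())
```

Calling `name_and_revenue(dataset_a | dataset_b)` on the raw union finds nothing, whereas `name_and_revenue(merge(dataset_a, dataset_b))` returns the joined name and revenue. A production setup would more likely rely on a triple store or federated SPARQL; the sketch only makes the underlying problem concrete.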
Challenges (2)
• Handle data with different quality characteristics regarding
their accuracy, consistency, timeliness, completeness,
relevance, interpretability and trustworthiness
– set of information quality assessment metrics;
– some of the indicators cannot be automatically assessed;
– data quality assessment is performed only on a small sample of the
data which results in a decrease of the precision of the quality
scores;
• Need to process high-volume data in some cases, and to have the
capacity to apply proper algorithms and evaluate their results.
• Learning curve for the adoption of Linked Data technologies
by SMEs and public administrations.
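The point about sample-based quality assessment can be made concrete with a small Python sketch. It assumes one simple metric, completeness (the share of expected field values that are present), and a synthetic dataset; neither the metric definition nor the field names come from LinDA. The score computed on the full dataset is exact, while the score computed on a small random sample is only an estimate of it.

```python
import random


def completeness(records, fields):
    """Share of expected field values that are actually present (non-empty)."""
    expected = len(records) * len(fields)
    if expected == 0:
        return 0.0
    present = sum(1 for r in records for f in fields if r.get(f) not in (None, ""))
    return present / expected


# Hypothetical dataset: 1000 records, every 4th one lacks a "city" value.
FIELDS = ("name", "city")
records = [
    {"name": f"org-{i}", "city": "" if i % 4 == 0 else "Athens"}
    for i in range(1000)
]

# Exact score over all records: 1750 of 2000 values are present, i.e. 0.875.
full_score = completeness(records, FIELDS)

# Score estimated from a small random sample (fixed seed for repeatability).
rng = random.Random(42)
sample_score = completeness(rng.sample(records, 20), FIELDS)
```

With only 20 sampled records the estimate can only take values in steps of 1/40, so it will generally deviate from the full score; larger samples narrow the gap at the cost of more processing.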
Linked Data Analytics: the LinDA approach
• Linked Data: a set of best practices for representing and
connecting structured information on the web.
• LinDA addresses one of the most significant challenges in
the usage and publication of Linked Data: the renovation
and conversion of existing data formats into structures that
support the semantic enrichment and interlinking of data.
• The proposed approach builds upon the collection of
data from available data sources, their transformation into a
proper format (e.g. RDF) and their interlinking for the
creation of extended linked datasets, which are fed as input to the
analytics extraction process.
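As an illustration of the transformation step, the following Python sketch converts tabular data into RDF serialized as N-Triples. It is a simplified stand-in, not the LinDA transformation tooling: the `http://example.org/` namespace, the function names and the column-to-predicate mapping are assumptions, and every value is emitted as a plain string literal without datatypes or links to other resources.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a LinDA URI


def escape_literal(value):
    """Escape the characters that N-Triples forbids inside a quoted literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text, id_column):
    """Turn each CSV row into one N-Triples line per non-empty, non-id column."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id column, empty cells, and surplus cells without a header.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}vocab#{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical input: two organisations, the second one without a city.
example_csv = "id,name,city\n1,Acme Ltd.,Athens\n2,Beta,\n"
triples = csv_to_ntriples(example_csv, "id")
```

For the example input, `triples` holds three statements: a name and a city for resource 1, and a name for resource 2. Interlinking would then replace literals such as the city name with URIs from other datasets.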
The LinDA Approach
Linked Data Analytics – Algorithms Categorization
• A library of basic and robust data analytic functionality is
provided.
• Design and deployment of workflows for algorithm
execution based on their categorization:
– Classifiers for identifying to which of a set of categories a new
observation belongs based on a training set;
– Clusterers for grouping a set of objects in such a way that objects in
the same group are more similar to each other than to objects in other groups;
– Statistical and Forecasting Analysis for discovering interesting
relations between variables and providing information regarding
future trends;
– Attribute Selection (evaluators and search methods) algorithms for
selecting a subset of relevant features for use in model construction
based on evaluation metrics;
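To make the classifier category concrete, here is a minimal Python sketch of one very simple classifier, a nearest-centroid rule. It is an illustration chosen for brevity, not one of the algorithms the LinDA library is stated to provide; the feature meanings and segment labels are hypothetical. The training set yields one mean vector per category, and a new observation receives the label of the closest mean.

```python
import math
from collections import defaultdict


def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns label -> mean vector."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(vectors) for col in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical training set: (monthly visits, avg. basket value) -> customer segment
training = [
    ((1.0, 10.0), "occasional"),
    ((2.0, 12.0), "occasional"),
    ((9.0, 40.0), "frequent"),
    ((11.0, 44.0), "frequent"),
]
centroids = train_centroids(training)
```

Here `classify(centroids, (8.0, 35.0))` falls in the "frequent" segment and `classify(centroids, (2.0, 9.0))` in the "occasional" one. `math.dist` requires Python 3.8 or later.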
Integrated Components
• Based on existing open-source platforms for extracting data
analytics.
• Weka open-source tool.
• R project for statistical computing.
• Customized end-user applications for selected business
domains, targeted at reducing the overall complexity in the
configuration of the algorithms and the preparation and
management of the linked datasets.
Analytics Ecosystem Components
Interconnection with LinDA Components
Production of Linked Data Analytics
• Supporting RDF input and output.
• RDF to CSV transformation in the Publication and
Consumption Framework.
• Enriched CSV input loaded to the analytics tool: metadata
for initiating RDF URI, submitted query, analytics process
id, analytics process description, storage options at LinDA
repository.
• RDF output available in the LinDA repository.
• Analytics output available for creation of visualisations
(where appropriate).
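The RDF to CSV step with attached metadata might look roughly like the Python sketch below. The actual format used by the Publication and Consumption Framework is not given in the slides, so everything here is an assumption: triples are pivoted into one row per subject, and metadata is written as leading `# key: value` comment lines with illustrative key names.

```python
import csv
import io


def triples_to_csv(triples, metadata):
    """Pivot (subject, predicate, object) triples into one CSV row per subject.

    `metadata` is written as leading '# key: value' comment lines; the keys used
    below are illustrative, not the actual LinDA field names.
    """
    predicates = sorted({p for _, p, _ in triples})
    rows = {}
    for s, p, o in triples:
        rows.setdefault(s, {})[p] = o  # a repeated predicate keeps only one value

    out = io.StringIO()
    for key, value in metadata.items():
        out.write(f"# {key}: {value}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["uri", *predicates])
    for subject in sorted(rows):
        writer.writerow([subject, *(rows[subject].get(p, "") for p in predicates)])
    return out.getvalue()


# Hypothetical input; short predicate names are used for readability.
example_triples = [
    ("http://example.org/resource/1", "name", "Acme Ltd."),
    ("http://example.org/resource/1", "city", "Athens"),
    ("http://example.org/resource/2", "name", "Beta"),
]
example_metadata = {"process_id": "demo-1", "source": "http://example.org/dataset"}
enriched_csv = triples_to_csv(example_triples, example_metadata)
```

The resulting text starts with the two metadata lines, followed by a `uri,city,name` header and one row per resource, with an empty cell where a resource has no value for a predicate. An analytics tool reading it would need to skip or parse the comment lines first.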
Elaborated Ontologies
• FOAF Ontology: describes people, their activities and their relations to other people and objects;
• PROV Ontology: represent and interchange provenance information
generated in different systems and under different contexts;
• SIO Ontology: a simple upper-level ontology comprising essential types and
relations for the rich description of arbitrary (real, hypothesized,
virtual, fictional) objects, processes and their attributes.
Pilots & Business Value
• Business Intelligence Pilot
• Media Analytics Pilot
• Environment Analytics Pilot
• Redesign of current business processes
• Reduction in overall complexity and
administration overhead
• Deployment of novel services
• Short Demo
info @
LinDA-project.eu
@LinDA_FP7
+
LinDA-project.eu
LinDAFP7
Thank you! Questions?
Anastasios Zafeiropoulos | azafeiropoulos@ubitech.eu
Senior R&D Architect
Ubitech Ltd. | www.ubitech.eu
