DATAIA & TransAlgo

Data Science, Intelligence & Society
March 2018
DATAIA Institute
Data Science, Artificial Intelligence & Society
Nozha Boujemaa
Director at DATAIA Institute
Research Director at Inria
nozha.boujemaa@inria.fr

Aim of convergence Institutes
•  Structuring a few centres that gather large-scale, highly visible multidisciplinary
scientific task forces to address major challenges at the crossroads of
societal and economic issues and questions from the scientific community.
•  Advanced research-training integration.
•  Effective coupling with the socio-economic world – industry partnership.
•  DATAIA is the Convergence Institute in Data Science, AI & Society gathering 130
affiliated researchers and targeting 300 within 3 years, Kick-off => 15 February 2018

DATAIA Institute
•  4 Overarching Challenges:
o  From Machine Learning to Artificial Intelligence,
o  From Data to Knowledge, from Data to Decision,
o  Transparency, Responsible AI & Ethics,
o  Data Protection, Regulation and Economy
•  Scientific and disciplinary foundations: Math, Computer Sciences, Management and Economy,
Social Sciences, Legal Sciences
•  Application domains: Internet of people and things, Urbanization 4.0 & Mobility, Optimal Energy
Management, Business Analytics, Health, Well being & personal nutrition, e-Sciences.
•  Roadmap for 8 years, 10M€ -180 M€ Global Budget, with 14 academic founding institutions
•  Kick-off => February 15th 2018

Founding members
•  The DATAIA Institute is hosted by Université Paris-Saclay and led by the
Inria Saclay – Île-de-France research centre:
•  The consortium brings together universities, national research institutes
and Grandes Écoles:

Industrial Affiliation Program
•  Contributions: research support, data and use cases
•  Participation in the definition, selection and monitoring of programs
•  Participation in defining the long-term strategic vision
•  Workshops, S&T work exchange sessions, brainstorming sessions (open problems), etc
•  IP will follow the rules defined in a consortium agreement
•  First look at IP.
Based on what is done in American universities (Stanford model)

International partners
•  Alan Turing Institute (UK)
•  IVADO (Canada)
•  Advanced Core Technologies for Big Data Integration (Japan)
•  DSI (Data Science Institute – Columbia University)

Data & Algorithms
« 2 sides of the same coin »
•  Rising benefits from Big Data and AI technologies have wide impact on our economy and
social organization ;
•  Transparency and trust of such Algorithmic Systems (data & algorithms) are becoming
competitiveness factors for the data-driven economy ;
•  Data analytics is shifting from describing the past to predictive and prescriptive analytics for
decision support ;
•  Importance of remedying the information asymmetry between the producer of the digital
service and its consumer, be it citizen or professional – B2C or B2B => civil rights, competition,
sovereignty.

Algorithmic systems in every day life
•  Some dominant platforms on the market play the role of “prescriber”
by directing a large share of user traffic:
•  Ranking mechanisms (search engine),
•  Recommendation mechanisms and content selection
•  Product or service recommendation: is it most appropriate for the consumer
(personalization) or the most appropriate to the seller (given the stock)?
•  Opacity of the use made of the personal data and how they are processed,
•  What about the consent? Is it always respected? Mobilitics CNIL-Inria (Privatics)
•  Credit scoring, how fair is it?
•  Predictive justice?
⇒  New discrimination between those who know how algorithms work and those who do not,
in addition to economic and geostrategic effects on persons and societies

Algorithmic Systems Bias
Mastering Big Data Technologies: bias problems could affect the accuracy of data
technologies and people’s lives
Challenge 1: Data Inputs to an Algorithm
o  Poorly selected data
o  Incomplete, incorrect, or outdated data
o  Data sets that lack or disproportionately represent certain populations
o  Malicious attack
Challenge 2: The Design of Algorithmic Systems and Machine Learning
o  Poorly designed matching systems
o  Unintentional perpetuation and promotion of historical biases
o  Decision-making systems that assume correlation implies causation
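
The data-input challenge of populations that are missing or over-represented can be checked mechanically before training. The following is a minimal sketch, not a method from the slides: the function name `representation_gap`, the sample, and the 50/50 reference shares are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def representation_gap(groups, reference_shares):
    """Observed share minus reference share, per group (hypothetical helper)."""
    counts = Counter(groups)
    total = len(groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }

# Hypothetical sample in which group "A" makes up 80% of the records,
# compared against an assumed 50/50 reference population.
sample = ["A"] * 80 + ["B"] * 20
gaps = representation_gap(sample, {"A": 0.5, "B": 0.5})
```

A positive gap flags a group that is over-represented relative to the reference, and a negative gap flags one that is under-represented. Choosing the reference population is itself a judgment call that the code cannot make.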

Challenges / Efforts
•  It is a mistake to assume algorithms are objective simply because they are data-driven
•  Algorithms are encapsulated opinions through decision parameters and learning data
•  Mastering the accuracy and robustness of Big Data & AI techniques: bias, reproducibility,
source of unintentional discrimination
•  Implementing the “Transparent-by-design”: fairness/equity, loyalty, neutrality, etc.
•  Interdisciplinary co-conception of solutions, How responsible is a ML algorithm?
•  Interdisciplinary training of Data Scientists: law, sociology and economy, Careful software reuse
=> mastering information leaks (SRE)
AI is part of the solution and not only the law!
Transparency Tools vs GDPR vs Having the Choice

Transparent-by-design, auditable-by-design, fairness & non-discrimination-by-design
§ Explainability, reproducibility & robustness of ML,
§ Data provenance and usage monitoring
§ Progressive user-centric analytics (Mix of Dataviz and Analytics)
§ New paradigms for information flow monitoring
§ Fact-checking requiring explicit & verifiable integration of heterogeneous
data sources

•  Complex concepts, Dependent on cultural context, law context, etc.
International collaboration is key
Transparency, Asymmetry, Accountability, Loyalty, Fairness, Equity, Intelligibility, Explainability, Traceability,
Auditability, Proof and Certification, Performance, Ethics, Responsibility
Ethical ≠ Responsible, Transparent ≠ Make available the source code
•  Pedagogy and explanation, awareness, use cases (for all audiences, including scientists)
•  Auditability and Building Transparent-by-Design tools and algorithms
ML algorithms are shared in open-source but NOT Data (governance of AS!)

Interdisciplinary challenges
•  From Machine Learning to Artificial Intelligence
o  Innovative machine learning and AI: common sense, adaptability, generalization
o  Deep learning and adversarial learning
o  Machine learning and hyper-optimization
o  Optimization for learning, stochastic gradient method improvements, Bayesian
optimization, combinatorial optimization
o  Link between learning and modelling, integration of a priori into learning
o  Repeatability and robust learning
o  Statistical Inference and Validation
o  Composition of deep architectures

•  From Data to Knowledge, from Data to Decision
o  Heterogeneous, semi-structured, complex, incomplete and/or uncertain data
o  Fast big data: new methodologies to use data
o  Online learning, methodology for massive data, efficient methods
o  Improved storage, calculation and estimation for data science
o  Modeling of interactions between agents (human or artificial) by game theory
o  Multiscale and multimodal representation and algorithms
o  Theoretical analysis of heuristic methods (complexity theory, information geometry, Markov
chain theory)
o  Human-machine co-evolution in autonomous systems: conversational agents, autonomous
systems , social robots

•  Transparency & digital trust
o  Responsibility-by-design, Explicability-by-design
o  Transparency-by-design, equity-by-design
o  Audit of algorithmic systems: non-discrimination, loyalty, technical bias, neutrality, fairness
o  Measuring digital trust and ownership
o  Progressive user-centric analytics (interactive monitoring of decision systems: dataviz,
dashboards, HMI)
o  Responsibility for information processing and decision-making: data usage control and fact-
checking
o  Causal discovery, traceability of inferences from source data, interpretability of deep
architectures
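
One concrete form an audit for non-discrimination can take is comparing the rate of favourable decisions across groups (often called demographic parity). The sketch below is an assumption-laden illustration, not a tool from the slides: the function names and the toy decisions are hypothetical, and demographic parity is only one of several fairness criteria.

```python
def positive_rate(decisions):
    """Share of favourable (1) decisions in a list of 0/1 outcomes."""
    return sum(decisions) / len(decisions)

def demographic_parity_difference(decisions, groups):
    """Largest gap in favourable-decision rate between any two groups."""
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {group: positive_rate(ds) for group, ds in by_group.items()}
    return max(rates.values()) - min(rates.values()), rates

# Hypothetical audit data: 1 = favourable decision, 0 = unfavourable.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, rates = demographic_parity_difference(decisions, groups)
```

A gap of zero means every group receives favourable decisions at the same rate. A large gap is a signal to investigate, not proof of discrimination on its own.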

•  Data protection, regulation and economy
o  "Privacy-by-design", GDPR
o  Distributed Machine Learning preserving privacy
o  Development of ethically responsible methodologies and technologies to
regulate the collection, use and process of personal data, and the
exploitation of the knowledge derived from this data.
o  Computer security of data processing chains
o  Security/crypto: block-chain and trusted third parties

Training and research
•  Three doctoral schools of the Université Paris-Saclay: EDMH, ED STIC & ED SHS.
•  Reinforce the mathematics and computer science crossover in data science training, new
interdisciplinary curricula more open to the social sciences and humanities (SHS): awareness
of the responsibility of algorithmic systems, economic models, rights and uses of data.
•  Research Projects – 3 years, 2 thesis scholarships (or 1 PhD + 1 Post-Doc/engineer).
•  International student mobility (incoming and outgoing) with 2 thesis scholarships
(excellence scholarships) per year.
•  Thematic Semesters for MSc / PhD /E-C, Biennial Conference, Annual Self-Assessment
Symposium, Workshops, Challenges, Junior Conference, Summer-school.

Co-working
•  Workspaces are available for teams affiliated to the DATAIA Institute in the Alan Turing
building, an emblematic venue :
o  1800 sqm of which approximately 300 sqm for the new teams
o  8 teams on site
o  800 sqm of meeting spaces
•  Implementation of telepresence screens in progress.

•  National Scientific Platform for Transparency &
Accountability Tools and Methods for Data and
Algorithms (Fairness, Neutrality, Loyalty); B2B &
B2C.
•  Support for the new “Law for a Digital Republic”: the
right to an explanation of algorithmic decisions made by
public services (APB service stopped!)
•  Contributors: CNNum and DGCCRF alongside academia
(Grenoble, Paris, Lille, Rennes, etc.), industry and
associations.

Objectives:
o  Resource center, Empowerment tools: reports, publications,
software, controlled data sets & testing protocols ;
o  Awareness raising: workshops & MOOCs ;
o  Best practices recommendation & sharing ;
o  Research & Dev. Programs.
Working Groups :
o  Auditability of Recommendation and Ranking systems ;
o  Explainability, Reproducibility and Bias of ML ;
o  Privacy, Data Usage Control & Information-flow-monitoring ;
o  Influence, Nudging, Fact-checking.

Thank you for your attention
Data Science, Intelligence & Society
Need for interdisciplinary efforts
nozha.boujemaa@inria.fr

Summer School
•  DATAIA Institute co-organizes the DS3
Summer School with École polytechnique :
o  Speakers confirmed: Cédric Villani, Yann
LeCun, Adrian Weller, Krishna Gummadi,
Jean-Philippe Vert …
o  Format: plenary and parallel sessions on
several sites
o  Attendees: between 400 and 500
participants (students, academics and
professionals)
DATA SCIENCE SUMMER SCHOOL
TUTORIALS ON
Deep Learning
Yann LECUN [Facebook - New York University]
Interpretable Machine Learning
Adrian WELLER [University of Cambridge - Alan Turing Institute]
Fairness in Machine Learning
Krishna GUMMADI [Max Planck Institute]
Probabilistic Numerical Methods
Mark GIROLAMI [Imperial College London]
Online Learning Algorithms
Nicolò CESA-BIANCHI [University of Milano]
Non-convex Optimization
Suvrit SRA [MIT]
... other speakers will be confirmed soon
PARALLEL SESSIONS
on Health and Social Sciences
PRACTICAL SESSIONS
on Deep Learning, Reinforcement
Learning, Recommender Systems, Precision Medicine...
POSTER SESSION
ROUND TABLE DISCUSSION
Targeted for students, academics and professionals
More information to come on:
www.ds3-datascience-polytechnique.fr
June 25-29, 2018, at Campus Polytechnique
OPENING by Cédric VILLANI

France-Japan Symposium
•  The DATAIA Institute co-organizes with JST a France-Japan workshop
on Deep Learning and Artificial Intelligence, in partnership with the
French Embassy in Japan and the Ministry of Higher Education,
Research and Innovation (MESRI)
o  Dates: 11-12 July 2018
o  Location: Amphitheatre of MESRI
o  Format: Plenary sessions
o  Attendees: between 150 and 200 participants (academics and
professionals)
o  With the winners of the CREST Program (equivalent to ERC
senior) funded by the JST
