Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework
Citation: Grigis A, Goyard D, Cherbonnier R, Gareau T, Papadopoulos Orfanos D, Chauvat N, Di Mascio A, Schumann G, Spooren W, Murphy D and Frouin V (2017) Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework. Front. Neuroinform. 11:18. doi: 10.3389/fninf.2017.00018

Health research strategies using neuroimaging have shifted in recent years: the focus has moved from patient care only to a combination of patient care and prevention. In the case of neurodegenerative and psychiatric diseases, this drives the creation of increasingly numerous massive imaging studies, also known as Population Imaging (PI) surveys (Hurko et al., 2012; Poldrack and Gorgolewski, 2014). Note that PI studies no longer consist of image data only. The recent wide availability of high-throughput genomics has augmented the subject data with genetics, epigenetics, and functional genomics. Likewise, the standardization of personality, demographics, and deficit tests in psychiatry facilitates the acquisition of clinical/behavioral records to enrich the subject data in large population studies. Moreover, PI studies now classically encompass more than one imaging session per subject and cover heterogeneous experiments at multiple time points.
Ultimately, these studies with complex imaging and extended data (PIx) require multi-center acquisitions to build a large target population.

A regular PIx infrastructure has to cover the following three main topics: (1) data collection, (2) quality control (QC) with data processing, and (3) data indexing and publication with controlled data sharing mechanisms. Furthermore, PIx infrastructures must evolve during the life cycle of a population imaging project, and they must also be resilient to extreme evolutions of the data content and management. In the projects we manage, we have experienced several such extreme evolutions. The first kind of evolution may affect the published dataset, such as adding a new modality for all subjects, a new time point, or a new subcohort. Second, the amount of data requested evolves dramatically as the project consortium grows (Gorgolewski et al., 2015). Finally, internal ontologies have to evolve constantly in order to match the ongoing initiatives on interoperability (Scheufele et al., 2014; Gorgolewski et al., 2016).

Several existing open-source frameworks support one or several of the described topics, sometimes only for one specific data type. We give below a brief overview of existing systems, some of which have also been reviewed by Nichols and Pohl (2015). IDA (Horn and Toga, 2009) is a neuroimaging data repository and management system that supports data collection (topic (1)) and data sharing (topic (3)). With this system, the published datasets can be searched using automatically extracted metadata. The XNAT framework (Marcus et al., 2013) is widely used for neuroimaging data and supports all the PIx infrastructure topics, focusing on tools to pipeline and audit the processing of image data (topic (2)). The LORIS (Das et al., 2012) and NiDB (Book et al., 2013) frameworks represent a significant effort to account for the multimodal data involved in PIx studies. These frameworks, although addressing all the required topics, mainly support neuroimaging data. Openclinica (2015) and REDCap (Harris et al., 2009) facilitate the collection of electronic data such as electronic case report forms (eCRFs) or questionnaires and are recognized in projects of various sizes that support data collection (topic (1)). Likewise, laboratory information management systems such as SIMBioMS (Krestyaninova et al., 2009) were developed for the collection of genomic measurements. Finally, the COINS framework brings essential tools for multimodal data support and, more interestingly, emphasizes the importance of providing sharing tools (topics (1) and (3)) (Scott et al., 2011).

The two European studies we manage require a tailored PIx infrastructure. Existing frameworks neither completely handle the diversity of our PIx requirements and project life cycle nor provide efficient tools to collect, check the quality of, and publish evolving data. Additional developments were therefore required to build such a complete infrastructure. We based these developments on a more general framework than the dedicated applications described above. In collaboration with the Logilab company (Logilab SA, Paris, France), we developed three highly adaptive web services, based on the CubicWeb (CW) pure-Python framework, aimed at creating (1) a multi-center upload platform, (2) a collaborative quality assessment platform, and (3) a publication platform with massive-download features (Logilab, 2000). These developments were originally initiated for the IMAGEN and EU-AIMS projects in order to host their data about mental health in adolescents (Schumann et al., 2010) and autism (Murphy and Spooren, 2012), respectively. The corresponding studies require key features such as uploading and browsing published data from the web, dynamic selection and filtering of displayed data, support for flexible download operations, a high-level request language, multilevel access rights, remote data access, remote management of user access rights, collaborative QC, and interoperability.

2. MATERIALS AND METHODS

The three services described in the introduction were handled in distinct developments. Section 2.1 presents the capabilities of the CW framework; Sections 2.2 and 2.4 introduce the upload and publication web services through which the tailored requirements of PIx studies are satisfied. Section 2.3 describes a collaborative rating web service that helps users assess data quality, and section 2.5 describes a Python API that remotely queries these web services.

2.1. CubicWeb Overview

All the implemented services are based on the CW framework (Logilab, 2000). We chose a high-level pure-Python framework that bridges web technologies and database engines. This choice was also based on the expertise and experience of people from our laboratory and a tight collaboration with Logilab (Michel et al., 2013; Papadopoulos Orfanos et al., 2015). The CW distribution is organized into a core part and a set of basic Python modules, referred to as cubes, which can be used to efficiently generate web applications. The core of the CW framework, developed under the LGPL license, is built from well-established technologies (SQL, Python, and web technologies such as HTML5 and JavaScript). The main characteristics of the CW framework are as follows:

1. CW defines its data model with Python classes and automatically generates the underlying database structure.
2. Queries are expressed in the RQL language, which is similar to W3C's SPARQL (W3C, 2013). All the persistent data are retrieved and modified using this language.
3. CW implements a mechanism, referred to as views, that exposes information in several ways. This mechanism implements the classical model-view-controller software architecture pattern. Defined in Python, the views are applied to query results and can produce HTML pages and/or trigger external processes. The separation of queries and views offers major advantages: first, the same data selection may have several web representations, and second, retrieved data can be exported in several other formats without modifying the underlying data storage.
4. All the views and triggers are recorded in a registry and are automatically selected depending on the current context, which is inferred from the type of data returned by the RQL query.
5. Thanks to the semantic nature of CW, all developments inherit the possibility to follow existing or emerging ontologies, thereby facilitating sharing, access, and processing.
6. CW has a security system that grants fine-grained access to the data. This system is similar to the row-level security policies available in the most recent versions of PostgreSQL and links access rights to entities/relations in the schema. Each entity type has a set of attributes and relations, and permissions that define who can add, read, update, or delete such an entity and its associated relations.
7. CW may run either as a standalone application or behind an Apache front server. We refer to both settings as a data sharing service (DSS) (cf. Figure 1).
8. CW can be configured to run with various database engines. For the best performance, PostgreSQL is recommended.

Starting from the basic CW distribution, our suite of services is composed of an assembly of Python modules, also referred to as cubes. The Python language is widely used in scientific communities and facilitates interfacing with major or emerging processing tools such as Nipype (Gorgolewski et al., 2011), Biopython (Chapman and Chang, 2000), Nilearn (Abraham et al., 2014), and Morphologist (Fischer et al., 2012). Three kinds of cubes can be distinguished: system cubes, business logic cubes, and application cubes built on top of the system cubes. The system cubes ensure interactions with the operating system and middleware. For example, they connect to LDAP for user credentials and information or invoke FUSE (2002) as a module to construct virtual file systems in a user repository for downloading. The business logic cubes essentially provide the database schema, and the application cubes define the access rights and the web interface.

Among the available Python-based frameworks, we chose CW. A major advantage of CW is the RQL language, which provides end users with a query interface adapted to PIx data sharing. It simplifies and improves the user experience when searching for custom datasets. RQL also avoids the use of a complicated object relational mapper (ORM), is focused on browsing relations, and allows several DSS to be queried at once. Because of the semantic nature of this request language, the user only needs to know the data model, defined as a graph, and nothing about the underlying low-level relational model. This simplified data model and the expressiveness of RQL help users write custom requests, whereas most existing DSS do not expose a query language but offer a limited, predefined set of operations that can be carefully designed to be efficient (e.g., RESTful APIs). Critics of systems that expose a query language to end users emphasize the risk of denial of service.

Figure 1 | Architecture of a CubicWeb data sharing service (DSS) integrated in an Apache platform with LDAP. The business logic cubes provide a schema that can be instantiated in the database management system (DBMS: red puzzle piece). The system cubes ensure low-level system interactions (green puzzle piece), and the application cube proposes a web user interface (blue puzzle piece). End users access the database content through a web browser, a Python API scripting the DSS, or an FTP solution, where virtual folders (acting as filters on the central repository) are offered for download.
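The sketch below gives a rough picture of the single query interface discussed above. Using only the Python standard library, it builds one HTTP request URL per DSS for the same RQL query. It assumes that each DSS accepts the `rql` and `vid` URL parameters of a CubicWeb `view` controller and provides a JSON export view named `jsonexport`. The host names, the `Subject` entity type, and its `handedness` attribute are hypothetical placeholders, not the schema actually deployed for IMAGEN or EU-AIMS.

```python
from urllib.parse import urlencode

# Hypothetical base URLs of two CubicWeb-based DSS (placeholders only).
DSS_URLS = [
    "https://dss-a.example.org",
    "https://dss-b.example.org/",
]

# One RQL query written against the data model (entities and relations),
# not against SQL tables. "Subject" and "handedness" are illustrative names.
RQL = 'Any S WHERE S is Subject, S handedness "right"'


def build_rql_url(base_url, rql, vid="jsonexport"):
    """Return a URL asking a CubicWeb instance to run `rql` and render the
    result set with the view `vid` (assumed here to be a JSON export view)."""
    params = urlencode({"rql": rql, "vid": vid})
    # Strip any trailing slash so the path is always ".../view?...".
    return f"{base_url.rstrip('/')}/view?{params}"


def build_requests(base_urls, rql):
    """Target several DSS with the same query: one URL per instance."""
    return [build_rql_url(url, rql) for url in base_urls]


if __name__ == "__main__":
    for url in build_requests(DSS_URLS, RQL):
        print(url)
```

Fetching each URL (for instance with `urllib.request.urlopen`) and decoding the JSON body would complete the round trip. That step is left out because the endpoints above are placeholders.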
To avoid this issue (i.e., overloading the server with arbitrarily complex requests), CW allows the usable resources to be limited (RAM per request, CPU per request, number of requests per user, CPU time per request). We believe that users should be able to select and download only what they specifically need using a query language. This avoids filtering the data locally and saves bandwidth.

2.2. Structured Data Upload Service

In PIx studies, massive and complex data are gathered from multiple data acquisition centers or devices (topic (1)). Each collected dataset must be mapped to definitions that follow consensus representation rules. Those definitions are grouped in data dictionaries that ideally follow standards (Rockhold and Bishop, 2012), but they are mostly manufacturer and/or site specific. Thus, an efficient and versatile tool is required for mapping the different data dictionaries during the collection process.

Building on these ideas, we implemented a flexible upload mechanism as a system cube named rql_upload (http://neurospin.github.io/rql_upload) and provided a web frontend by integrating this cube with the application cube named PIWS (Population Imaging Web Service, http://neurospin.github.io/piws; cf. Figure 1). Based on a CW feature that allows the database to be populated through online HTML forms, these two cubes were developed to collect both raw data and metadata in a DSS. CW also enables the customization of triggers that check the integrity of the uploaded data: synchronous and asynchronous validation filters can be specified and applied to each uploaded dataset. The upload proceeds as follows (cf. Figure 2):

1. Synchronous validations are applied to each form field (e.g., to check the extension of a file or the structure of an Excel table). If validation fails, the web form is refreshed and appropriate feedback is displayed.
2. After synchronous validation, all the uploaded raw data/metadata are stored in generic entities and a "Quarantine" status is set. To avoid cluttering the database and to ease file manipulation, files are stored in the central repository but remain accessible through the database. File hashes are also automatically computed and indexed in order to assess data integrity.
3. To update the upload status from "Quarantine" to "Rejected" or "Validated", automatic asynchronous validations can be configured in the service as looping tasks. Those validation filters are project, data, and/or upload specific and generate appropriate feedback for users and data managers.

Moreover, any entity or relation may be endowed with access permission rules (Logilab, 2000). Based on the CW security mechanisms, a customized security model was implemented for our upload DSS (it can be extended later). Only specific groups are authorized to upload, and users can only access the uploads that concern them. The customization of these core features allowed the creation of an upload web service that is completely described in a single JSON file. This file links the web form fields with customized or CW-internal controllers that manage the type of data to be collected.

Figure 2 | Illustration of the upload process. (A) Syntax of a form description JSON file, (B) corresponding web form as presented to users (here an error message returned by synchronous validation is displayed in the top red box), (C) "Quarantine" status, and (D) "Validated" status (obtained after asynchronous validation) as displayed to users: note that no feedback is shown here.

2.3. Collaborative Quality Control Service

Owing to the large amount of data gathered/analyzed in PIx studies, we must consider more sophisticated operating procedures than simple quality controls (QCs), where datasets are usually rated only once by a handful of individuals. This issue can be addressed by implementing a web-based collaborative quality control process, which also reduces the bias introduced by isolated raters (topic (2)). Moreover, for the studies we manage, we also added controlled vocabulary descriptions to the ratings.

We achieve these goals with a flexible collaborative rating mechanism, i.e., an application cube named zeijemol (http://neurospin.github.io/zeijemol). As in section 2.2, a collaborative quality control DSS is entirely described in a single JSON file. This file consists, on the one hand, of the list of elements that will be rated (e.g., a NIfTI image, a FreeSurfer segmentation, or a motion curve in a diffusion sequence of an individual) and, on the other hand, of related quality indicators (e.g., binary good/bad, controlled vocabulary, scaled rating). Each element is displayed by one of the embedded viewers, such as a triplanar view or mesh rendering (cf. Figure 3). The QC results are stored directly in the database.

The emergence of such DSS could allow machine learning techniques to learn new classifiers that automate the quality control task. The QC scores may also be directly used as prior knowledge during the analysis stage.

2.4. Publication Service

In PIx studies, data collection and QC are followed by data anonymization, ordering, and analysis. Ultimately, data are made available to the acquisition partners or the scientific community (topic (3)). While browsing the database content through the web interface, users expect to be able to download the displayed files as well as the data description and rich links between the data, also referred to as metadata. An intuitive and reliable sharing mechanism is therefore crucial, as large amounts of heterogeneous, evolving data must be provided. Furthermore, for the studies we manage, access rights are split along time points, scan types, questionnaires, or questions to match the multilevel access permissions of the consortia.

Therefore, we implemented a system cube named rql_download (http://neurospin.github.io/rql_download) and provided a web frontend by integrating this cube with PIWS (http://neurospin.github.io/piws) whenever it was used in a publication service (cf. Figure 1). The rql_download cube converts the result of any RQL query into files on a virtual file system that, in turn, can be accessed through a secure file transfer protocol (sFTP) (cf. Figure 1). Section 2.4.1 introduces the business logic cubes used to describe the neuroimaging genetics data and metadata and the relationships between
these data. Section 2.4.2 shows how users can save the content of their current search from the DSS web interface. Section 2.4.3 describes two approaches implemented in rql_download, based on two basic software packages (FUSE or Twisted), that give users access to their saved searches, and discusses the pros and cons of both. Section 2.4.4 presents a suitable strategy for setting user rights with the CW security system. Finally, section 2.4.5 presents a descriptive data insertion mechanism, implemented as a set of Python scripts.

2.4.1. A Dedicated Structure for Imaging Genomics Questionnaire Data

The database schema for handling multi-time point/multimodal datasets was developed in the brainomics business logic cube (https://github.com/neurospin/brainomics2). This schema supports general information such as subject data and associated metadata (age, handedness, sex, …), acquisition center definitions, multimodal imaging datasets, clinical/behavioral records, processed data, and some genomic concepts (including chromosomes, genes, SNPs, or genomic platforms). An excerpt of the produced schema is shown in Figure 4.

Figure 3 | The collaborative quality control web service for a FreeSurfer segmentation element of one subject. (A) The quality indicators (in this case, a controlled vocabulary with an accept/prescribe manual edit/reject decision and an optional check-box justification), (B) a triplanar view of the white and pial surfaces overlaid on the anatomical image, and (C) the white and pial meshes with statistical indicators.

Figure 4 | A snippet of the schema used in a publication DSS. The green boxes show that all entities are related to an "Assessment" entity through an "in_assessment" relation. This design follows from the access rights mechanism described in section 2.4.4.

2.4.2. Efficient Data Selection and Download Tool: The Data Shopping Cart Mechanism

When an RQL query result set is returned by the DSS, the most appropriate view is automatically selected, and facets are attached to each webpage, thereby providing filtering rules. Facets allow interactive and graphical search refinements according to selected attributes (e.g., a sex or handedness filter for a subject result set). The developed shopping cart mechanism serves to save user searches, which consist of data, possibly large files, and metadata. This mechanism and the facet filtering are smoothly integrated: activating a filter option from the web interface automatically updates the search query result set and, thus, the list of files that will be made available for download (cf. Figure 5). Data added to a cart have an expiration date that can be configured in the service. Access rights are set so that users can only access their own searches. For the sake of the EU-AIMS project
hosted in our laboratory, a video explaining the data shopping cart mechanism is available (ftp://ftp.cea.fr/pub/unati/euaims/download_euaims_data.mp4).

Figure 5 | Illustration of the download process via the proposed shopping cart mechanism. (A) The facet filter bar when all the scans ("Scan" entities) are requested (as highlighted in bold, the user has selected only the "FU2" time point and the diffusion MRI "DTI" scans), (B) the view corresponding to the filtered dataset, (C) adding this new search to the cart (when these filtering options are activated, the saved RQL search path is automatically updated), (D) the newly created search, and (E) the download of the search and associated files as presented in FileZilla.

2.4.3. The Transfer of the Shopping Cart Content: Data Download

When saved, the cart content is made available as virtual files and folders. A major advantage of the developed solution is that data compression and duplication are avoided, which in turn puts no extra load on the publication DSS. Data download operations are delegated to sFTP servers to ensure secure transfers. sFTP is standard and supported by numerous client programs on most systems.

Two approaches are implemented in the rql_download cube and can be selected through configuration settings:

1. FUSE virtual folders: For each search, the system builds a list of files to be downloaded and subsequently creates a virtual FUSE directory acting as a filter on the central repository. Users can only see the subsets of files/directories corresponding to their queries, built in accordance with their access rights. Finally, the system delegates the data transfers to the sFTP server. The major advantage of this approach is the use of the standard sFTP port. However, additional system-level configuration is required during the installation of the DSS in order to set up the user home directories and system accounts.
2. Twisted server: This approach relies on a Python process that creates a Twisted (https://twistedmatrix.com/trac/) event-driven networking server, retrieves all the searches in the database, and exposes the files via sFTP through the created server. Again, this process acts as a filter on the central repository, where a user only sees a subset of files/directories. In this case, authentication and file transfers are handled directly by CW. The major advantage of this strategy is that no system-level configuration is required. However, it sometimes requires listening on a non-default sFTP port, which could lead to firewall issues.

2.4.4. Access Rights Mechanism

In the CW security model, any entity or relation may be endowed with permission rules. To fulfill the consortia's criteria, we propose an operational setup of the CW security model for our publication DSS. We built our security model around "pivotal entities" rather than specifying rights on all entities. Pivotal entities are those on which access rights are defined. Through a specific relation (the "in_assessment" relation in Figure 4), they are related to all entities that must be covered by the security model. Each time an entity covered by the security model is requested, the system automatically requests its related pivotal entity and propagates the corresponding access rights.
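The pivotal-entity strategy can be sketched in plain Python to show how rights declared on a single entity propagate to everything attached to it. This is not the CubicWeb security API. It is a minimal, self-contained model in which a hypothetical `in_assessment` mapping (mirroring the relation of Figure 4) links each covered entity to its pivotal `Assessment`. Read rights are granted to user groups on assessments only. Identifiers, group names, and time-point labels are invented for illustration. Denying access to entities not attached to any pivot is a simplification chosen for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Pivotal entity: the only place where read rights are declared."""
    eid: str
    timepoint: str
    readers: set = field(default_factory=set)  # groups allowed to read


@dataclass
class Entity:
    """Any entity covered by the security model (scan, questionnaire, ...)."""
    eid: str
    etype: str


class PivotalSecurityModel:
    """Hypothetical, simplified model of rights propagated from a pivot."""

    def __init__(self):
        self.assessments = {}    # assessment eid -> Assessment
        self.in_assessment = {}  # covered entity eid -> assessment eid

    def add_assessment(self, assessment):
        self.assessments[assessment.eid] = assessment

    def link(self, entity, assessment):
        # Plays the role of the "in_assessment" relation of Figure 4.
        self.in_assessment[entity.eid] = assessment.eid

    def can_read(self, user_groups, entity):
        """Look up the entity's pivot and reuse the pivot's rights."""
        pivot_eid = self.in_assessment.get(entity.eid)
        if pivot_eid is None:
            return False  # not attached to any pivot: denied in this sketch
        return bool(self.assessments[pivot_eid].readers & set(user_groups))

    def visible(self, user_groups, entities):
        """Filter a result set as a secured query would."""
        return [e for e in entities if self.can_read(user_groups, e)]


if __name__ == "__main__":
    model = PivotalSecurityModel()
    baseline = Assessment("a1", "BL", readers={"all_partners"})
    follow_up = Assessment("a2", "FU2", readers={"fu2_partners"})
    model.add_assessment(baseline)
    model.add_assessment(follow_up)
    scan = Entity("s1", "Scan")
    answers = Entity("q1", "QuestionnaireRun")
    model.link(scan, baseline)
    model.link(answers, follow_up)
    print([e.eid for e in model.visible(["all_partners"], [scan, answers])])
```

Rights are resolved through the pivot at query time. Adding a group to one Assessment's `readers` therefore immediately changes the visibility of every scan or questionnaire linked to it. This is what keeps the number of rules small when access is split along time points.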
2.4.5. The Unified Insertion Procedure

A unified insertion module is provided as a set of Python scripts to insert neuroimaging, genomic, and clinical data such as scans, genomic measures, questionnaires, and processing steps. These scripts proved helpful in efficiently managing the large amount of evolving data in our projects. The indexed data are uniformly organized according to the schema structure. They thus take advantage of all the aforementioned developments (e.g., the shopping cart mechanism, cf. section 2.4.2; the security model, cf. section 2.4.4; and common renderings, cf. Figure 6). Generating such a DSS with these scripts requires no specific CW knowledge: only a rich description of the data to be published is needed, as a set of Python dictionary objects.

Figure 6 | Summary views of the database status. Global information, for example the (A) gender or (B) handedness distributions, (C) acquisition status, and (D) age distribution, or longitudinal information, such as (E) the answers of subject 2 to specific questions across the study time points.

2.5. A Transverse Python Module to Remotely Connect a CubicWeb DSS

With the aforementioned capabilities of the DSS, a user manually selects and downloads data through graphical interfaces in order to analyze them locally (cf. sections 2.4.2 and 2.4.3). In the case of an evolving DSS, the downloaded data must be regularly updated. This manual process becomes time consuming and error prone when large and heterogeneous data are considered. Moreover, the metadata used to specify the dataset to download, such as quality scores, are also likely to change. Therefore, to analyze up-to-date data stored in a DSS, direct programmatic interaction with the DSS is recommended. In the neuroimaging and neuroscience communities, data are typically analyzed with Python scripts. Classically, systems such as XNAT provide RESTful web services with a Python API (Schwartz et al., 2012). Thanks to the RQL request language, our publication DSS (cf. section 2.4) offers a rich interface to access the data.

We provide a regular Python module, named cwbrowser (http://neurospin.github.io/rql_download/cwbrowser), that implements a Python API to connect and send RQL to a remote DSS based on the CW framework. This module is completely independent of CW (no CW installation is required) and is similar to the cwclientlib cube of the CW distribution. A publication DSS, as described in section 2.4, can be queried by the cwbrowser module, which embeds the previously described data selection and shopping cart capabilities. It automatically fills and saves a shopping cart from a custom RQL request, downloads the associated virtual directories onto the local file system, and returns the complete requested dataset. The returned dataset contains metadata stored in the DSS, such as subject sex or quality scores, and the paths to the downloaded directories. These resources are organized following the DSS layout of files and folders. All users therefore get the same local tree, which helps in writing shareable analysis scripts.

3. RESULTS

Our laboratory operates several DSS for the IMAGEN project about mental health in adolescents (Schumann et al., 2010) and the EU-AIMS project about autism (Murphy and Spooren, 2012). Other DSS are currently under development to support new and ongoing initiatives. Note that access to both the IMAGEN and EU-AIMS datasets is (to date) restricted.

In the IMAGEN project, 2,000 subjects are monitored over at least two visits (the third follow-up is underway). T1, T2, FLAIR, DWI, B0, task fMRI, and resting-state fMRI scans are acquired, as well as clinical/behavioral records, genotyping, gene expression, and methylation. A publication DSS at https://
REFERENCES

Abraham, A., Pedregosa, F., Eickenberg, M., Gervais, P., Mueller, A., Kossaifi, J., et al. (2014). Machine learning for neuroimaging with scikit-learn. Front. Neuroinform. 8:14. doi:10.3389/fninf.2014.00014
Book, G., Anderson, B., Stevens, M., Glahn, D., Assaf, M., and Pearlson, G. D. (2013). Neuroinformatics database (nidb) a modular, portable database for the storage, analysis, and sharing of neuroimaging data. Neuroinformatics 11, 495–505. doi:10.1007/s12021-013-9194-1
Chapman, B., and Chang, J. (2000). Biopython: python tools for computational biology. SIGBIO Newsl. 20, 15–19. doi:10.1145/360262.360268
Das, S., Zijdenbos, A. P., Vins, D., Harlap, J., and Evans, A. C. (2012). LORIS: a web-based data management system for multi-center studies. Front. Neuroinform. 5:37. doi:10.3389/fninf.2011.00037
Dumontier, M., Callahan, A., Cruz-Toledo, J., Ansell, P., Emonet, V., Belleau, F., et al. (2014). "Bio2rdf release 3: a larger connected network of linked data for the life sciences," in Proceedings of the 2014 International Conference on Posters & Demonstrations Track – Volume 1272, ISWC-PD'14, 401–404. Available at: http://CEUR-WS.org
Fischer, C., Operto, G., Laguitton, S., Perrot, M., Denghien, I., Rivière, D., et al. (2012). "Morphologist 2012: the new morphological pipeline of brainvisa," in Human Brain Mapping HBM'12, Beijing, China.
FUSE. (2002). Filesystem in Userspace. Available at: http://fuse.sourceforge.net/
Gibaud, B., Kassel, G., Dojat, M., Batrancourt, B., Michel, F., Gaignard, A., et al. (2011). Neurolog: sharing neuroimaging data using an ontology-based federated approach. AMIA Annu. Symp. Proc. 2011, 472–480.
Gorgolewski, K., Auer, T., Calhoun, V., Craddock, R., Das, S., Duff, E., et al. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Sci. Data 3, 160044. doi:10.1038/sdata.2016.44
Gorgolewski, K. J., Burns, C. D., Madison, C., Clark, D., Halchenko, Y. O., Waskom, M. L., et al. (2011). Nipype: a flexible, lightweight and extensible neuroimaging data processing framework. Front. Neuroinform. 5:13. doi:10.3389/fninf.2011.00013
Gorgolewski, K. J., Varoquaux, G., Rivera, G., Schwartz, Y., Ghosh, S. S., Maumet, C., et al. (2015). Neurovault.org: a web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Front. Neuroinform. 9:8. doi:10.3389/fninf.2015.00008
Harris, P., Taylor, R., Thielke, R., Payne, J., Gonzalez, N., and Conde, J. (2009). Research electronic data capture (redcap) – a metadata-driven methodology and workflow process for providing translational research informatics support. J. Biomed. Inform. 42, 377–381. doi:10.1016/j.jbi.2008.08.010
Horn, J. V., and Toga, A. (2009). Is it time to re-prioritize neuroimaging databases and digital repositories? Neuroimage 47, 1720–1734. doi:10.1016/j.neuroimage.2009.03.086
Hurko, O., Black, S. E., Doody, R., Doraiswamy, P. M., Gamst, A., Kaye, J., et al. (2012). The ADNI publication policy: commensurate recognition of critical contributors who are not authors. Neuroimage 59, 4196–4200. doi:10.1016/j.neuroimage.2011.10.085
Keator, D., Helmer, K., Steffener, J., Turner, J., Erp, T. V., Gadde, S., et al. (2013). Towards structured sharing of raw and derived neuroimaging data across existing resources. Neuroimage 82, 647–661. doi:10.1016/j.neuroimage.2013.05.094
Krestyaninova, M., Zarins, A., Viksna, J., Kurbatova, N., Rucevskis, P., Neogi, S. G., et al. (2009). A system for information management in BioMedical studies – SIMBioMS. Bioinformatics 25, 2768–2769. doi:10.1093/bioinformatics/btp420
Logilab. (2000). CubicWeb – The Semantic Web Is a Construction Game. Available at: https://www.cubicweb.org/
Marcus, D. S., Harms, M. P., Snyder, A. Z., Jenkinson, M., Wilson, J. A., Glasser, M. F., et al. (2013). Human connectome project informatics: quality control, database services, and data visualization. Neuroimage 80, 202–219. doi:10.1016/j.neuroimage.2013.05.077
Michel, V., Schwartz, Y., Pinel, P., Cayrol, O., Moreno, A., Poline, J.-B., et al. (2013). "Brainomics – a management system for exploring and merging heterogeneous brain mapping data," in Human Brain Mapping HBM'13, Seattle.
Murphy, D., and Spooren, W. (2012). EU-AIMS: a boost to autism research. Nat. Rev. Drug Discov. 11, 815–816. doi:10.1038/nrd3881
Nichols, B., and Pohl, K. (2015). Neuroinformatics software applications supporting electronic data capture, management, and sharing for the neuroimaging community. Neuropsychol. Rev. 25, 356–368. doi:10.1007/s11065-015-9293-x
Openclinica. (2015). OpenClinica Reference Guide. Available at: https://docs.openclinica.com/
Papadopoulos Orfanos, D., Michel, V., Schwartz, Y., Pinel, P., Moreno, A., Bihan, D. L., et al. (2015). The brainomics/localizer database. Neuroimage 144(Pt B), 309–314. doi:10.1016/j.neuroimage.2015.09.052
Poldrack, R., Kittur, A., Kalar, D., Miller, E., Seppa, C., Gil, Y., et al. (2011). The cognitive atlas: toward a knowledge foundation for cognitive neuroscience. Front. Neuroinform. 5:17. doi:10.3389/fninf.2011.00017
Poldrack, R. A., and Gorgolewski, K. J. (2014). Making big data open: data sharing in neuroimaging. Nat. Neurosci. 17, 1510–1517. doi:10.1038/nn.3818
Rockhold, F., and Bishop, S. (2012). Extracting the value of standards: the role of CDISC in a pharmaceutical research strategy. Clin. Eval. 40, 91–96.
Scheufele, E., Aronzon, D., Coopersmith, R., McDuffie, M. T., Kapoor, M., Uhrich, C. A., et al. (2014). tranSMART: an open source knowledge management and high content data analytics platform. AMIA Jt. Summits Transl. Sci. Proc. 2014, 96–101.
Schumann, G., Loth, E., Banaschewski, T., Barbot, A., Barker, G., Büchel, C., et al. (2010). The IMAGEN study: reinforcement-related behaviour in normal brain function and psychopathology. Mol. Psychiatry 15, 1128–1139. doi:10.1038/mp.2010.4
Schwartz, Y., Barbot, A., Thyreau, B., Frouin, V., Varoquaux, G., Siram, A., et al. (2012). Pyxnat: Xnat in Python. Front. Neuroinform. 6:12. doi:10.3389/fninf.2012.00012
Scott, A., Courtney, W., Wood, D., de la Garza, R., Lane, S., King, M., et al. (2011). COINS: an innovative informatics and neuroimaging tool suite built for large heterogeneous datasets. Front. Neuroinform. 5:33. doi:10.3389/fninf.2011.00033
W3C. (2013). SPARQL Query Language for RDF. Available at: http://www.w3.org/TR/rdf-sparql-query/

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2017 Grigis, Goyard, Cherbonnier, Gareau, Papadopoulos Orfanos, Chauvat, Di Mascio, Schumann, Spooren, Murphy and Frouin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.