HDDM: Hierarchical Document Data Model resource (CSDN download)

97 files in total

cpp: 39

h: 23

xml: 4

234KB ZIP, uploaded 2021-02-09 04:33:30

HDDM-main.zip (97 files)

HDDM-main

.gitignore 17B

src

xml-hddm.cpp 16KB

XParsers.cpp 12KB

hddmcat.cpp 4KB

hddm-cpp.cpp 167KB

hddm-py.cpp 110KB

XString.cpp 2KB

hddm-xml.cpp 19KB

particleType.h 74KB

VersionConfig.hpp.in 96B

hddm-root.cpp 29KB

XParsers.hpp 2KB

md5.h 3KB

XString.hpp 1KB

CMakeLists.txt 2KB

md5.c 12KB

LICENSE 11KB

schema

hddm-schema.xsl 7KB

schema-hddm.xsl 17KB

examples

hddm_t.c 10KB

event.xsd.test 20KB

exam2.py 2KB

hddm_t.h 2KB

hitexam.xml 405B

exam1.xml 330B

testhddm.cpp 3KB

hddmcp.py 1KB

exam2.c 3KB

test.xml 901B

exam2.cpp 2KB

exam2.xml 375B

README.md 8KB

INSTALL 4KB

CMakeLists.txt 513B

xstream

src

dater.cpp 2KB

bz.cpp 28KB

common.cpp 1KB

debug.cpp 552B

xdr.cpp 5KB

tee.cpp 4KB

digest.cpp 3KB

md5_t.h 1KB

md5_t.pl 107B

posix.cpp 3KB

fd.cpp 3KB

z.cpp 22KB

base64.cpp 8KB

CMakeLists.txt 397B

md5.cpp 8KB

z_digest.cpp 1KB

debug.h 453B

SConscript 873B

doc

doxygen_devel.cfg.in 47KB

doxygen.cfg 47KB

doxygen_devel.cfg 47KB

doxygen.cfg.in 47KB

index 8KB

configure.ac 4KB

COPYING 26KB

examples

dater.cpp 649B

xdr_test.cpp 627B

fd_read.cpp 683B

adler.cpp 763B

b64_encode.cpp 1KB

z_decompress.cpp 1KB

crc.cpp 760B

xdr_out.cpp 367B

xdr_in.cpp 553B

xdr_test.h 274B

bz_decompress.cpp 1KB

z_compress.cpp 1KB

bz_compress.cpp 1KB

n_tee.cpp 565B

b64_decode.cpp 1KB

md5.cpp 792B

fd_write.cpp 601B

include

xstream.h 599B

xstream

stamp-h1 39B

base64.h 5KB

common.h 2KB

except

base64.h 1KB

posix.h 936B

z.h 2KB

bz.h 2KB

posix.h 3KB

except.h 1KB

config.h 2KB

z.h 7KB

xdr.h 5KB

fd.h 2KB

tee.h 2KB

bz.h 6KB

digest.h 6KB

dater.h 2KB

README 8KB

AUTHORS 43B

CMakeLists.txt 22B

# HDDM

## Hierarchical Document Data Model

HDDM is a tool for automatically building a full-featured C++ library for the in-memory representation of highly structured scientific data, complete with a performant i/o library for integrated storage and retrieval of unlimited amounts of repetitive data with associated metadata. Starting from a structured plain-text document in which the user describes the data values and the relationships among them, the HDDM tools automatically generate custom C++ header and source files. These files define new user classes for building an object-oriented representation of the data in memory, for storing it in a standard format in disk files for later retrieval, and for browsing and manipulating it efficiently with familiar OO semantics in the user's C++ or python analysis application.

All of the following features that a user expects from a big-data modeling and i/o library are supported by HDDM:

- uses standard C++11 language features; compiles with g++ 4.8.5 `-std=c++11`
- python support through automatic generation of custom C++ extension modules
- STL list container iteration semantics in C++ for repeated data
- standard python list iteration semantics for repeated data
- efficient handling of sparse lists and tables through nested variable-length lists
- configurable on-the-fly compression/decompression during i/o
- configurable on-the-fly data integrity validation during i/o
- browsable data representation on disk, with a run-time choice between **HDF5** and native stream formats
- platform independence through standard byte-ordered formats for ints and IEEE floats in streams and on disk
- automatic detection of, and conversion between, standard and native formats
- multi-threaded, multi-buffered i/o for high throughput with compression

In addition to meeting the above requirements, this package combines the following features in a unique way:
- automatically generated user library code that uses the user's own terminology for class and member names
- the user's data model document can be validated against an xml schema using standard open-source tools
- the user's data model is compactly represented in a plain-text template that looks like real data
- HDDM streams are highly compact even before compression, smaller than equivalent block-tables in an RDB (relational database)
- thousands of lines of efficient C++ code are generated from a few dozen lines of xml written by the user
- only the actual binary data values are stored in the stream; the structure and fixed metadata are saved in the header
- general tools are provided for rendering the contents of hddm data streams as plain-text xml

## Applications

HDDM was designed in response to the needs of particle physics experiments producing petabytes of data per year, but nothing in the design is specific to that application. HDDM is of general utility for any application with large datasets of highly structured data. In contrast to simple data, such as photographic images consisting of regular arrays of floats or color vectors, structured data consist of heterogeneous values of various types (variable-length strings, variable-length lists of ntuples, lists of variable-length lists, ...) that are related to one another through a hierarchical graph. Data from advanced scientific instruments typically contain repeated blocks of a basic pattern of such relationships, with the number of nodes connected to each point in the graph varying from one block to the next.

An xml document provides a flexible means to represent such a hierarchical graph: the xml tags represent the data, and their nesting reflects the relationships in the graph. At the top of the graph is the largest repeating pattern in the dataset, also called a record. At the bottom are the individual values representing the measured data as integers and floats, together with their units.
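HDDM generates its own classes from the user's xml template. Their real names and methods are documented in The HDDM User's Guide and are not reproduced here. Purely as a hypothetical illustration of the record hierarchy just described, the sketch below models one record with plain C++11 STL containers. The `Record`, `Hit`, and `Measurement` types and the `count_values`/`sum_values` helpers are invented for this example and are not part of the HDDM API. The middle level stands in for an intermediate graph node. Each level holds a variable-length `std::list`, so the number of children can differ from one record to the next.

```cpp
// Hypothetical sketch, NOT code generated by HDDM: plain C++11 structs that
// mimic the record -> intermediate node -> value-with-units hierarchy.
#include <cstddef>
#include <list>
#include <string>

// Bottom of the graph: one measured value together with its units.
struct Measurement {
    double value;
    std::string unit;
};

// Intermediate node (hypothetical name): a detector hit that owns a
// variable-length list of measurements.
struct Hit {
    int channel;
    std::list<Measurement> measurements;
};

// Top of the graph: one record, whose number of hits varies from record to record.
struct Record {
    long id;
    std::list<Hit> hits;
};

// Walks the hierarchy with ordinary STL iteration and counts the leaf values.
std::size_t count_values(const Record& rec) {
    std::size_t n = 0;
    for (const Hit& h : rec.hits) {
        n += h.measurements.size();
    }
    return n;
}

// Sums the leaf values that carry the requested unit, e.g. all "GeV" entries.
double sum_values(const Record& rec, const std::string& unit) {
    double total = 0.0;
    for (const Hit& h : rec.hits) {
        for (const Measurement& m : h.measurements) {
            if (m.unit == unit) {
                total += m.value;
            }
        }
    }
    return total;
}
```

For example, a record built with `rec.hits.push_back({7, {{12.5, "ns"}, {0.5, "GeV"}}})` and `rec.hits.push_back({9, {{1.5, "GeV"}}})` has `count_values(rec) == 3`. The nesting mirrors how an xml template nests tags. In HDDM itself, the corresponding classes and their i/o are produced by the code generator rather than written by hand.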
In between are the intermediate nodes of the graph. These represent the ways the different values come together to form a single record from the instrument. Once this graph has been written down as a structured xml document, the HDDM tools read the xml and automatically generate a custom set of C++/python classes. The user's application can then include/import these classes and use them to read the raw data from the instrument into C++ objects, both for subsequent storage on disk in a standard format and for final analysis.

## Documentation

The documentation for HDDM consists of three parts. The first is a description of the data modeling language used by the xml record template and the associated schema. The second is a description of the user application interface in C++ and python, which gives access to the generated data and i/o classes provided by the library. The third is a set of instructions on how to use the tools, with the examples provided in the package serving as a guide for users writing their own custom applications. All three have now been combined into The HDDM User's Guide. Instructions for building the HDDM tools from source are in the INSTALL file distributed with the sources.

## Dependencies

HDDM relies on the following external open-source packages. Some must be installed on the user's platform before HDDM can be built; others are optional.
- gcc/g++ compiler version 4.8.5 or above: the compiler must support the `-std=c++11` standard language features
- python 2.7 or above: a standard python installation, including the shutil and distutils modules and their dependencies
- apache xerces-c version 3: the standard implementation of the xerces xml library in C++, required
- apache xalan-c version 1: standard tools for schema-based xml validation and translation, optional
- HDF5 version 1.12+: public-domain library for standard disk representation of structured data, optional

Many other dependencies exist on features of a standard unix/linux platform environment, such as the ld link loader, the standard glibc and system libraries, and the compression libraries libz, libbz2, etc.

## Acknowledgements

HDDM contains, as part of its source codebase, a sub-package named xstream. This is a fork of an earlier open-source package that its author, Claudio Valente, released as xstream 2.1 in 1999 under the GNU LESSER GENERAL PUBLIC LICENSE. The original author credits and license are included unchanged in xstream/AUTHORS and xstream/COPYING. The original README written by Claudio Valente is also included. The HDDM fork of xstream 2.1 was made in 2004 to correct some bugs in the original v2.1 code and to add new features related to stream repositioning and multi-threaded compression/decompression. These changes made the HDDM fork of xstream no longer backward-compatible with xstream 2.1. With open acknowledgement of the important contribution of xstream 2.1 by Claudio Valente to this project, the release here of the modified xstream code under an Apache open-source license is deemed consistent with the terms of the original LGPL license that accompanied Valente's release of xstream 2.1.
The original C++ xstream 2.1 package released in 1999 is apparently unrelated to a number of other currently active open-source projects named xstream. These include the Java project XStream by Joe Walnes et al. and the JavaScript project xstream by Andre Staltz, among others.

The author acknowledges support from the United States National Science Foundation. This support has enabled the development of this package within the University of Connecticut nuclear physics research group, where the author is a professor.

## Contact

HDDM is released as a public GitHub project under an Apache open-source license by its designer and developer, Richard Jones, richard.t.jones(at)uconn.edu. The author provides ongoing development of HDDM and user support to the GlueX Collaboration as part of his contribution to the GlueX Experiment at Jefferson Lab in Newport News, Virginia. The author will provide support to other users of HDDM on a best-effort basis.