# HDDM
## Hierarchical Document Data Model
HDDM is a tool that automatically builds a full-featured C++ library for representing highly structured scientific data in memory, complete with a performant i/o library for integrated storage and retrieval of unlimited amounts of repetitive data with associated metadata. Starting from a structured document written in plain text, where the user describes the data values and relationships to be expressed, the HDDM tools generate custom C++ header and source files. These files define new user classes for building an object-oriented representation of the data in memory, storing it in a standard format in disk files for later retrieval, and efficiently browsing and manipulating the data using familiar OO semantics in the user's C++ or python analysis application. HDDM supports all of the following features that a user expects from a big-data modeling and i/o library.
- uses standard c++11 language features, compiles with g++ 4.8.5 -std=c++11
- python support through automatic generation of custom C++ extension modules
- stl list container iteration semantics in C++ for repeated data
- standard python list iteration semantics for repeated data
- efficient handling of sparse lists and tables through nested variable-length lists
- configurable on-the-fly compression / decompression during i/o
- configurable on-the-fly data integrity validation during i/o
- browsable data representation on disk, choice at run-time between **HDF5** and native stream formats
- platform-independence through standard byte-ordered formats of int, IEEE float in streams and on disk
- automatic detection and conversion between standard and native formats
- multi-threaded, multi-buffered i/o for high throughput with compression
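
As an illustration of the platform-independence item above, the following is a minimal Python sketch of writing an int and an IEEE float in a fixed byte order, so that the bytes are identical on any host. The choice of big-endian order, the record layout, and the names `STANDARD`, `to_standard`, and `from_standard` are assumptions made for this example; it does not use or reproduce the HDDM library itself.

```python
import struct

# Assumed layout, for illustration only: one 32-bit signed int followed by
# one IEEE 754 single-precision float, both in big-endian byte order.
# The ">" prefix fixes the byte order regardless of the host platform.
STANDARD = struct.Struct(">if")


def to_standard(count, energy):
    """Encode two values into the fixed (standard) byte order."""
    return STANDARD.pack(count, energy)


def from_standard(buf):
    """Decode two values from the fixed byte order into native Python types."""
    return STANDARD.unpack(buf)


encoded = to_standard(3, 1.5)
decoded = from_standard(encoded)
print(encoded.hex(), decoded)
```

Because the byte order is stated explicitly in the format string, a reader on a little-endian machine and a reader on a big-endian machine decode the same values from the same file.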
In addition to meeting the above requirements, this package combines the following features in a unique way.
- automatic user library code generated using the user's own terminology for class and member names
- user's data model document can be validated against xml schema using standard open-source tools
- user's data model is compactly represented in a plain-text template that looks like real data
- HDDM streams are highly compact prior to compression, smaller than equivalent block-tables in an RBD
- thousands of lines of efficient C++ code generated from a few dozen lines of xml written by the user
- only the actual binary data values are stored in the stream, the structure and fixed metadata are saved in the header
- general tools provided for rendering the contents of hddm data streams in plain-text xml
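
The idea of keeping the structure in a header and only the values in the stream can be sketched in Python as follows. This is a toy format invented for this example, not the actual HDDM stream layout: the header text, the count prefix, and the names `HEADER`, `write_stream`, and `read_stream` are all hypothetical.

```python
import io
import struct

# Hypothetical layout: the structure is described once, in a text header.
HEADER = b"record: hits[] { channel:int32, energy:float32 }\n"
HIT = struct.Struct(">if")    # one hit: int32 channel, float32 energy
COUNT = struct.Struct(">I")   # length prefix for each variable-length list


def write_stream(records):
    """Write the header once, then only counts and binary values."""
    out = io.BytesIO()
    out.write(HEADER)
    for hits in records:
        out.write(COUNT.pack(len(hits)))
        for channel, energy in hits:
            out.write(HIT.pack(channel, energy))
    return out.getvalue()


def read_stream(data):
    """Read the header line, then rebuild the nested lists from the values."""
    src = io.BytesIO(data)
    header = src.readline()
    records = []
    while True:
        raw = src.read(COUNT.size)
        if not raw:
            break
        (n,) = COUNT.unpack(raw)
        records.append([HIT.unpack(src.read(HIT.size)) for _ in range(n)])
    return header, records


records = [[(1, 0.5), (7, 2.25)], [], [(3, 1.0)]]
data = write_stream(records)
header, restored = read_stream(data)
```

In this sketch an empty list costs only its 4-byte count, which is one way nested variable-length lists keep sparse data compact.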
## Applications
HDDM was designed in response to the needs of particle physics experiments producing petabytes of data per year, but nothing in the design is specific to that application. HDDM is of general utility for any application with large datasets consisting of highly structured data. In contrast to simple data, like photographic images consisting of regular arrays of floats or color vectors, structured data consist of heterogeneous values of various types (variable-length strings, variable-length lists of ntuples, lists of variable-length lists...) that are related to one another through a hierarchical graph. Data from advanced scientific instruments typically contain repeated blocks of a basic pattern of such relationships, with variations in the number of nodes connecting to each point in the graph from one block to the next.
An xml document provides a flexible means to represent such a hierarchical graph, where the xml tags represent the data and their nesting reflects the relationships in the graph. At the top of the graph is the largest repeating pattern in the dataset, also called a record. At the bottom are the individual values representing the measured data in terms of integers and floats, together with their units. In between are the intermediate nodes of the graph that represent the ways the different values come together to form a single record from the instrument. Once this graph has been written down in the form of a structured xml document, the HDDM tools read the xml and automatically generate a custom set of C++ / python classes. The user's application can then include / import these classes and use them to read the raw data from the instrument into C++ objects for subsequent storage on disk in a standard format, and for final analysis.
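
To make the record graph concrete, here is a small Python sketch that parses an xml record description and lists its nodes by depth, using only the standard library. The tag names, attribute names, and type strings in `TEMPLATE` are hypothetical and are not taken from the HDDM schema; the point is only that the nesting of tags encodes the hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical record description: tags are nodes of the graph, attributes
# are the values stored at each node. Not the actual HDDM template syntax.
TEMPLATE = """
<record>
  <run number="int"/>
  <detector name="string">
    <hit channel="int" energy="float" unit="GeV"/>
  </detector>
</record>
"""


def walk(node, depth=0):
    """Yield (depth, tag, attribute names) for every node, top down."""
    yield depth, node.tag, sorted(node.attrib)
    for child in node:
        yield from walk(child, depth + 1)


root = ET.fromstring(TEMPLATE.strip())
outline = list(walk(root))
for depth, tag, attrs in outline:
    print("  " * depth + tag, attrs)
```

A code generator could use a traversal like this one to emit one class per tag and one member per attribute, though the rules HDDM actually applies are described in its own documentation.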
## Documentation
The documentation for HDDM consists of three parts: a description of the data modeling language used by the xml record template and the associated schema, a description of the user application interface in C++ and python that gives access to the generated data and i/o classes provided by the library, and instructions on how to use the tools through the examples provided with the package as a guide to users writing their own custom applications. All three of these have now been combined into The HDDM User's Guide. Instructions for building the HDDM tools from sources are found in the INSTALL file distributed with the sources.
## Dependencies
HDDM relies on the following external open-source packages. Some must be installed on the user platform before HDDM can be built, and others are optional.
- gcc/g++ compiler version 4.8.5 or above : compiler must support -std=c++11 standard language features
- python 2.7 or above : standard python installation, including shutil, distutils modules and dependencies
- apache xerces-c version 3 : standard implementation of the xerces xml library in C++, required
- apache xalan-c version 1 : standard tools for schema-based xml validation and translation, optional
- HDF5 version 1.12+ : open-source library for standard disk representation of structured data, optional
Numerous other dependencies come from the standard unix/linux platform environment, such as the ld link loader, standard glibc and system libraries, compression libraries libz, libbz2, etc.
## Acknowledgements
HDDM contains as a part of its source codebase a sub-package named xstream, which is a fork of an earlier open-source package that was released as xstream 2.1 by its author Claudio Valente in 1999 under the GNU LESSER GENERAL PUBLIC LICENSE. The original author and license files are included unchanged under xstream/AUTHOR and xstream/COPYING. The original README written by Claudio Valente is also included. The HDDM fork of xstream 2.1 was made in 2004 in order to correct some bugs in the original v2.1 code and to add new features related to stream repositioning and multi-threaded compression/decompression. These changes made the HDDM fork of xstream no longer backward-compatible with xstream 2.1. With open acknowledgement of the important contribution of xstream 2.1 by Claudio Valente to this project, the release here of the modified xstream code under an Apache open-source license is deemed consistent with the terms of the original LGPL license that accompanied Valente's release of xstream 2.1. The original C++ xstream 2.1 package released in 1999 is apparently unrelated to a number of other currently active open-source projects named xstream, including the java project XStream by Joe Walnes et al and the javascript project xstream by Andre Staltz.
The author acknowledges support from the United States National Science Foundation that has enabled the development of this package within the context of the University of Connecticut nuclear physics research group, where the author serves as a professor.
## Contact
HDDM is released as a public github project under an Apache Open-Source license by its designer and developer, Richard Jones, richard.t.jones(at)uconn.edu. Ongoing development of HDDM and user support are provided by the author to the GlueX Collaboration as a part of his contribution to the GlueX Experiment at Jefferson Lab in Newport News, Virginia. Support for other users of HDDM will be provided by the author as time permits.
