Applications

Showing new listings for Monday, 2 June 2025

Total of 20 entries

[1] arXiv:2505.23778 [pdf, html, other]: Title: Unpaired Test for the Comparison of Frequency Response Functions Groups

Vittorio Lippi

Comments: 11 pages, 6 figures. Published as additional material for doi: https://doi.org/10.3389/fnsys.2025.1466809. arXiv admin note: substantial text overlap with arXiv:2504.20588

Subjects: Applications (stat.AP)

The frequency response function (FRF) is a typical way to describe the outcome of experiments where posture control is perturbed with an external stimulus. The FRF is an empirical transfer function between an input stimulus and the induced body segment sway profile, represented as a vector of complex values associated with a vector of frequencies. This work proposes an unpaired test based on bootstrap to compare the averages of the outcomes of posture control experiments.
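
As a rough illustration of how an unpaired bootstrap comparison of two FRF groups can work, the sketch below pools the FRFs under the null hypothesis, resamples two pseudo-groups with replacement, and counts how often the distance between the resampled group means reaches the observed one. The test statistic (Euclidean norm of the difference of mean FRFs), the function names, and the toy data are all assumptions made for illustration; they are not taken from the paper.

```python
import random

def mean_frf(group):
    """Component-wise mean of a list of complex FRF vectors."""
    n = len(group)
    return [sum(frf[k] for frf in group) / n for k in range(len(group[0]))]

def distance(group_a, group_b):
    """Euclidean norm of the difference between the two group-mean FRFs."""
    ma, mb = mean_frf(group_a), mean_frf(group_b)
    return sum(abs(a - b) ** 2 for a, b in zip(ma, mb)) ** 0.5

def bootstrap_unpaired_test(group_a, group_b, n_boot=2000, seed=0):
    """Return (observed statistic, bootstrap p-value) under a pooled null."""
    rng = random.Random(seed)
    observed = distance(group_a, group_b)
    pooled = group_a + group_b
    exceed = 0
    for _ in range(n_boot):
        # Resample both pseudo-groups from the pooled data (null: one common distribution).
        boot_a = [rng.choice(pooled) for _ in group_a]
        boot_b = [rng.choice(pooled) for _ in group_b]
        if distance(boot_a, boot_b) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)

# Hypothetical toy data: 12 FRFs per group, 3 frequencies each.
_rng = random.Random(1)

def toy_group(center, n):
    return [[center + complex(_rng.gauss(0, 0.1), _rng.gauss(0, 0.1)) for _ in range(3)]
            for _ in range(n)]

group_a = toy_group(1 + 0j, 12)
group_b = toy_group(2 + 0j, 12)
stat, p_value = bootstrap_unpaired_test(group_a, group_b)
```

Because the groups are unpaired, each pseudo-group is drawn independently from the pooled sample; a paired design would instead resample subject-wise differences.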
[2] arXiv:2505.24078 [pdf, html, other]: Title: Estimation of Gender Wage Gap in the University of North Carolina System

Zihan Zhang, Jan Hannig

Subjects: Applications (stat.AP); General Economics (econ.GN)

Gender pay equity remains an open challenge in academia despite decades of movements. Prior studies, however, have relied largely on descriptive regressions, leaving causal analysis underexplored. This study examines gender-based wage disparities among tenure-track faculty in the University of North Carolina system using both parametric and non-parametric causal inference methods. In particular, we employed propensity score matching and causal forests to estimate the causal effect of gender on academic salary while controlling for university type, discipline, titles, working years, and scholarly productivity metrics. The results indicate that on average female professors earn approximately 6% less than their male colleagues with similar qualifications and positions.
[3] arXiv:2505.24235 [pdf, html, other]: Title: Alternate Groundwater Modelling Strategies: A Multi-Faceted Data-Driven Approach

Muralidharan K., Agniva Das, Shrey Pandya, Jong Min Kim

Comments: 43 Pages, 5 figures. Communicated for publication in the International Journal of Systems Assurance and Engineering

Subjects: Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)

The impact of statistical methodologies on studying groundwater has been significant in the last several decades, due to cheaper computation and the availability of technologies that enable us to extract and measure more and more data. This paper focuses on the validation of statistical methodologies that are in practice and continue to be at the earliest disposal of the researcher, demonstrating how traditional time-series models and modern neural networks may be a viable option for analyzing and forecasting from data commonly available in this domain, and suggesting a copula-based strategy to obtain spatial directional dependencies of groundwater level. This paper also proposes an aspect of model validation seldom addressed in this domain: model longevity, or model shelf-life. Use of such validation techniques not only ensures lower computational cost while maintaining reasonably high accuracy but also, in some cases, ensures robust predictions or forecasts and assists in comparing multiple models.
[4] arXiv:2505.24412 [pdf, html, other]: Title: A Time-Scaled ETAS Model for Earthquake Forecasting

Agniva Das, Muralidharan K

Comments: 11 pages, 4 figures, Accepted for publication as a chapter in the Asset Analytics Book Series, Springer

Subjects: Applications (stat.AP); Methodology (stat.ME)

The Himalayan region, particularly Nepal, is highly susceptible to frequent and severe seismic activity, underscoring the urgent need for robust earthquake forecasting models. This study introduces a suite of time-scaled Epidemic-Type Aftershock Sequence (ETAS) models tailored for earthquake forecasting in Nepal, leveraging seismic data from 2000 to 2020. By incorporating alternative time-scaling approaches - such as calibration, proportional hazards, log-linear, and power time scales - the models capture nuanced temporal patterns of aftershocks, improving event classification between background and triggered occurrences. We evaluate model performance under various assumptions of earthquake magnitude distributions (exponential, gamma, and radially symmetric) and employ optimization techniques including the Davidon-Fletcher-Powell algorithm and Iterative Stochastic De-clustering. The results reveal that time-scaling significantly enhances model interpretability and predictive accuracy, with the ISDM-based ETAS model achieving the best fit. This work not only deepens the statistical understanding of earthquake dynamics in Nepal but also lays a foundation for implementing more effective early warning systems in seismically active regions.
[5] arXiv:2505.24506 [pdf, html, other]: Title: Enhancing the Accuracy of Spatio-Temporal Models for Wind Speed Prediction by Incorporating Bias-Corrected Crowdsourced Data

Eamonn Organ, Maeve Upton, Denis Allard, Lionel Benoit, James Sweeney

Subjects: Applications (stat.AP)

Accurate high-resolution spatial and temporal wind speed data is critical for estimating the wind energy potential of a location. For real-time wind speed prediction, statistical models typically depend on high-quality (near) real-time data from official meteorological stations to improve forecasting accuracy. Personal weather stations (PWS) offer an additional source of real-time data and broader spatial coverage than official stations. However, they are not subject to rigorous quality control and may exhibit bias or measurement errors. This paper presents a framework for incorporating PWS data into statistical models for validated official meteorological station data via a two-stage approach. First, bias correction is performed on PWS wind speed data using reanalysis data. Second, we implement a Bayesian hierarchical spatio-temporal model that accounts for varying measurement error in the PWS data. This enables wind speed prediction across a target area, and is particularly beneficial for improving predictions in regions sparse in official monitoring stations. Our results show that including bias-corrected PWS data improves prediction accuracy compared to using meteorological station data alone, with a 7% reduction in prediction error on average across all sites. The results are comparable with popular reanalysis products, but unlike these numerical weather models our approach is available in real-time and offers improved uncertainty quantification.
[6] arXiv:2505.24767 [pdf, other]: Title: A survey of using EHR as real-world evidence for discovering and validating new drug indications

Nabasmita Talukdar, Xiaodan Zhang, Shreya Paithankar, Hui Wang, Bin Chen

Subjects: Applications (stat.AP); Artificial Intelligence (cs.AI)

Electronic Health Records (EHRs) have been increasingly used as real-world evidence (RWE) to support the discovery and validation of new drug indications. This paper surveys current approaches to EHR-based drug repurposing, covering data sources, processing methodologies, and representation techniques. It discusses study designs and statistical frameworks for evaluating drug efficacy. Key challenges in validation are discussed, with emphasis on the role of large language models (LLMs) and target trial emulation. By synthesizing recent developments and methodological advances, this work provides a foundational resource for researchers aiming to translate real-world data into actionable drug-repurposing evidence.
[7] arXiv:2505.24771 [pdf, html, other]: Title: Supporting product launching decisions with adversarial risk analysis

Pablo G. Arce, Sonali Das, David Ríos Insua

Subjects: Applications (stat.AP)

In a world of utility-driven marketing, each company acts as an adversary to other contenders, with all having competing interests. A major challenge for companies launching a new product is that, despite testing, flaws in their product can remain, potentially risking a loss in market share. However, delayed launch decisions can lead to losing first-mover advantages. Furthermore, each company generally has incomplete information on the launch strategy and the product quality of competing brands. From a buyer's perspective, along with the price, customers need to make their buying decisions based on noisy signals, e.g., regarding the quality of competing brands. This paper proposes how to support product launch decisions by a company in the presence of several competitors and multiple buyers, with the aid of adversarial risk analysis methods. We illustrate applications in two software launch cases that require deciding about timing, pricing, and quality, referring to single and multiple product purchases.
[8] arXiv:2505.24775 [pdf, other]: Title: Numerical Simulation Informed Rapid Cure Process Optimization of Composite Structures using Constrained Bayesian Optimization

Madhura Limaye, Yezhuo Li, Qiong Zhang, Gang Li

Subjects: Applications (stat.AP)

The present study aimed to solve the cure optimization problem of laminated composites through a statistical approach. The approach consisted of using constrained Bayesian Optimization (cBO) along with a Gaussian process model as a surrogate to rapidly solve the cure optimization problem. The approach was applied to two case studies: the cure of a simpler flat rectangular laminate and of a more complex L-shaped laminate. The cure optimization problem, with the objective of minimizing cure-induced distortion, was defined for both case studies. The former case study was two-variable, that is, it used two cure cycle parameters as design variables, and was constrained to achieve full cure, while the latter was four-variable and had to satisfy constraints of full cure as well as other cure cycle parameters. The performance of cBO for both case studies was compared to the traditional optimization approach based on Genetic Algorithm (GA). The comparison of results from GA and cBO, including deformation and final degree of cure, showed significant agreement (error < 4%). The computational efficiency of cBO was calculated by comparing the convergence steps for GA (>1000) and cBO (<50). The computational efficiency of cBO for all optimization cases was found to be > 96%. The case studies conclude that cBO is promising in terms of computational time and accuracy for solving the cure optimization problem.

[9] arXiv:1811.03437 (cross-list from cs.CY) [pdf, other]: Title: Integrating Project Spatial Coordinates into Pavement Management Prioritization

Omar Elbagalati, Mustafa Hajij

Subjects: Computers and Society (cs.CY); Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)

To date, pavement management software products and studies on optimizing the prioritization of pavement maintenance and rehabilitation (M&R) have been mainly focused on three parameters: the pre-treatment pavement condition, the rehabilitation cost, and the available budget. Yet, the role of the candidate projects' spatial characteristics in the decision-making process has not been deeply considered. Such a limitation, predominantly, allows the recommended M&R projects' schedule to involve simultaneously running but spatially scattered construction sites, which are very challenging to monitor and manage. This study introduces a novel approach to integrate pavement segments' spatial coordinates into the M&R prioritization analysis. The introduced approach aims at combining the pavement segments with converged spatial coordinates to be repaired in the same timeframe without compromising the allocated budget levels or the overall target Pavement Condition Index (PCI). Such a combination would result in minimizing the routing of crews, materials and other equipment among the construction sites and would provide better collaborations and communications between the pavement maintenance teams. Proposed herein is a novel spatial clustering algorithm that automatically finds the projects within a certain budget and spatial constraints. The developed algorithm was successfully validated using 1,800 pavement maintenance projects from two real-life examples of the City of Milton, GA and the City of Tyler, TX.
[10] arXiv:2505.23852 (cross-list from cs.CL) [pdf, html, other]: Title: Large Language Model-Based Agents for Automated Research Reproducibility: An Exploratory Study in Alzheimer's Disease

Nic Dobbins, Christelle Xiong, Kristine Lan, Meliha Yetisgen

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Applications (stat.AP)

Objective: To demonstrate the capabilities of Large Language Models (LLMs) as autonomous agents to reproduce findings of published research studies using the same or similar dataset.
Materials and Methods: We used the "Quick Access" dataset of the National Alzheimer's Coordinating Center (NACC). We identified highly cited published research manuscripts using NACC data and selected five studies that appeared reproducible using this dataset alone. Using GPT-4o, we created a simulated research team of LLM-based autonomous agents tasked with writing and executing code to dynamically reproduce the findings of each study, given only study Abstracts, Methods sections, and data dictionary descriptions of the dataset.
Results: We extracted 35 key findings described in the Abstracts across 5 Alzheimer's studies. On average, LLM agents approximately reproduced 53.2% of findings per study. Numeric values and range-based findings often differed between studies and agents. The agents also applied statistical methods or parameters that varied from the originals, though overall trends and significance were sometimes similar.
Discussion: In some cases, LLM-based agents replicated research techniques and findings. In others, they failed due to implementation flaws or missing methodological detail. These discrepancies show the current limits of LLMs in fully automating reproducibility assessments. Still, this early investigation highlights the potential of structured agent-based systems to provide scalable evaluation of scientific rigor.
Conclusion: This exploratory work illustrates both the promise and limitations of LLMs as autonomous agents for automating reproducibility in biomedical research.
[11] arXiv:2505.24127 (cross-list from stat.ME) [pdf, html, other]: Title: Estimating dynamic transmission rates with a Black-Karasinski process in stochastic SIHR models using particle MCMC

Avery Drennan, Jeffrey Covington, Dan Han, Andrew Attilio, Jaechoul Lee, Richard Posner, Eck Doerry, Joseph Mihaljevic, Ye Chen

Subjects: Methodology (stat.ME); Quantitative Methods (q-bio.QM); Applications (stat.AP)

Compartmental models are effective in modeling the spread of infectious pathogens, but have remaining weaknesses in fitting to real datasets exhibiting stochastic effects. We propose a stochastic SIHR model with a dynamic transmission rate, where the rate is modeled by the Black-Karasinski (BK) process - a mean-reverting stochastic process with a stable equilibrium distribution, making it well-suited for modeling long-term epidemic dynamics. To generate sample paths of the BK process and estimate static parameters of the system, we employ particle Markov Chain Monte Carlo (pMCMC) methods due to their effectiveness in handling complex state-space models and jointly estimating parameters. We designed experiments on synthetic data to assess estimation accuracy and its impact on inferred transmission rates; all BK-process parameters were estimated accurately except the mean-reverting rate. We also assess the sensitivity of pMCMC to misspecification of the mean-reversion rate. Our results show that estimation accuracy remains stable across different mean-reversion rates, though smaller values increase error variance and complicate inference results. Finally, we apply our model to Arizona flu hospitalization data, finding that parameter estimates are consistent with published survey data.
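
To make the mean-reverting behaviour of a Black-Karasinski transmission rate concrete, the sketch below simulates one path by treating the log of the rate as an Ornstein-Uhlenbeck process and using its exact Gaussian transition. The constant-parameter form d ln(beta) = kappa (mu - ln(beta)) dt + sigma dW, the parameter names, and the numerical values are assumptions chosen for illustration; the paper's own parameterization and its pMCMC inference are not reproduced here.

```python
import math
import random

def simulate_bk(beta0, kappa, mu, sigma, dt, n_steps, seed=0):
    """Simulate a Black-Karasinski path: log(beta) follows an OU process.

    kappa is the mean-reversion rate, mu the long-run mean of log(beta),
    and sigma the volatility on the log scale (all assumed constant).
    """
    rng = random.Random(seed)
    decay = math.exp(-kappa * dt)
    # Standard deviation of the exact OU transition over one step of length dt.
    step_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * kappa))
    x = math.log(beta0)
    path = [beta0]
    for _ in range(n_steps):
        x = mu + (x - mu) * decay + step_sd * rng.gauss(0.0, 1.0)
        path.append(math.exp(x))  # exponentiating keeps the rate positive
    return path

# Hypothetical daily transmission-rate path reverting toward 0.3.
path = simulate_bk(beta0=0.3, kappa=0.5, mu=math.log(0.3), sigma=0.2, dt=1.0, n_steps=200)
```

Small values of kappa make the decay factor close to 1, so the path wanders for longer before returning to its long-run level.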
[12] arXiv:2505.24397 (cross-list from stat.ME) [pdf, html, other]: Title: Bayesian Inference for Spatially-Temporally Misaligned Data Using Predictive Stacking

Soumyakanti Pan, Sudipto Banerjee

Comments: 34 pages, 14 figures

Subjects: Methodology (stat.ME); Applications (stat.AP)

Air pollution remains a major environmental risk factor that is often associated with adverse health outcomes. However, quantifying and evaluating its effects on human health is challenging due to the complex nature of exposure data. Recent technological advances have led to the collection of various indicators of air pollution at increasingly high spatial-temporal resolutions (e.g., daily averages of pollutant levels at spatial locations referenced by latitude-longitude). However, health outcomes are typically aggregated over several spatial-temporal coordinates (e.g., annual prevalence for a county) to comply with survey regulations. This article develops a Bayesian hierarchical model to analyze such spatially-temporally misaligned exposure and health outcome data. We introduce Bayesian predictive stacking, which optimally combines multiple predictive spatial-temporal models and avoids iterative estimation algorithms such as Markov chain Monte Carlo that struggle due to convergence issues inflicted by the presence of weakly identified parameters. We apply our proposed method to study the effects of ozone on asthma in the state of California.
[13] arXiv:2505.24436 (cross-list from stat.ME) [pdf, html, other]: Title: Joint space-time modelling for upper daily maximum and minimum temperature record-breaking

Jorge Castillo-Mateo (1), Zeus Gracia-Tabuenca (1), Jesús Asín (1), Ana C. Cebrián (1), Alan E. Gelfand (2) ((1) University of Zaragoza, (2) Duke University)

Comments: 31 pages (+25 pages supplement), 13 figures (+14 figures supplement), 3 tables (+4 tables supplement)

Subjects: Methodology (stat.ME); Applications (stat.AP)

Record-breaking temperature events are now frequently in the news, proffered as evidence of climate change, and often bring significant economic and human impacts. Our previous work undertook the first substantial spatial modelling investigation of temperature record-breaking across years for any given day within the year, employing a dataset consisting of over sixty years of daily maximum temperatures across peninsular Spain. That dataset also supplies daily minimum temperatures (which, in fact, are now available through 2023). Here, the dataset is converted into a daily pair of binary events, indicators, for that day, of whether a yearly record was broken for the daily maximum temperature and/or for the daily minimum temperature. Joint modelling addresses several inference issues: (i) defining/modelling record-breaking with bivariate time series of yearly indicators, (ii) strength of relationship between record-breaking events, (iii) prediction of joint, conditional and marginal record-breaking, (iv) persistence in record-breaking across days, (v) spatial interpolation across peninsular Spain. We substantially expand our previous work to enable investigation of these issues. We observe strong correlation between both processes but a growing trend of climate change that is well differentiated between them both spatially and temporally as well as different strengths of persistence and spatial dependence.
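
The conversion of a temperature series into record-breaking indicators can be sketched as follows: for one calendar day at one site, each year receives a 1 if its value exceeds every earlier year's value and a 0 otherwise, separately for the daily maximum and the daily minimum. The function name, the toy temperatures, the strict-inequality treatment of ties, and counting the first year as a trivial record are assumptions made for illustration.

```python
def record_indicators(values):
    """Return 1 where a value strictly exceeds all earlier values, else 0.

    The first year is counted as a (trivial) record; ties do not count.
    """
    indicators = []
    running_max = float("-inf")
    for v in values:
        is_record = v > running_max
        indicators.append(1 if is_record else 0)
        if is_record:
            running_max = v
    return indicators

# Hypothetical yearly temperatures (deg C) for one calendar day at one site.
tmax = [30.1, 29.5, 31.0, 30.8, 31.0, 32.4]
tmin = [15.2, 16.0, 15.9, 16.0, 17.1, 16.5]

# One (max-record, min-record) pair of binary events per year.
pairs = list(zip(record_indicators(tmax), record_indicators(tmin)))
```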
[14] arXiv:2505.24501 (cross-list from stat.ME) [pdf, html, other]: Title: Inhomogeneous mark correlation functions for general marked point processes

Mehdi Moradi, Matthias Eckardt

Comments: submitted for publication

Subjects: Methodology (stat.ME); Applications (stat.AP)

Spatial phenomena in environmental and biological contexts often involve events that are unevenly distributed across space and carry attributes, whose associations/variations are space-dependent. In this paper, we introduce the class of inhomogeneous mark correlation functions, capturing mark associations/variations, while explicitly accounting for the spatial inhomogeneity of events. The proposed functions are designed to quantify how, on average, marks vary or associate with one another as a function of pairwise spatial distances. We develop nonparametric estimators and evaluate their performance through simulation studies covering a range of scenarios with mark association or variation, spanning from nonstationary point patterns without spatial interaction to those characterised by clustering tendencies. Our simulations reveal the shortcomings of traditional methods in the presence of spatial inhomogeneity, underscoring the necessity of our approach. Furthermore, the results show that our estimators accurately identify both the positivity/negativity and effective spatial range for detected mark associations/variations. The proposed inhomogeneous mark correlation functions are then applied to two distinct forest ecosystems: Longleaf pine trees in southern Georgia, USA, marked by their diameter at breast height, and Scots pine trees in Pfynwald, Switzerland, marked by their height. Our findings reveal that the inhomogeneous mark correlation functions provide deeper and more detailed insights into tree growth patterns compared to traditional methods.
[15] arXiv:2505.24594 (cross-list from stat.ME) [pdf, html, other]: Title: Two-stage MCMC for Fast Bayesian Inference of Large Spatio-temporal Ordinal Data, with Application to US Drought

Staci Hepler, Rob Erhardt

Subjects: Methodology (stat.ME); Applications (stat.AP)

High dimensional space-time data pose known computational challenges when fitting spatio-temporal models. Such data show dependence across several dimensions of space as well as in time, and can easily involve hundreds of thousands of observations. Many spatio-temporal models result in a dependence structure across all observations and can be fit only at a substantial computational cost, arising from dense matrix inversion, high dimensional parameter spaces, poor mixing in Markov Chain Monte Carlo, or the impossibility of utilizing parallel computing due to a lack of independence anywhere in the model fitting process. These computational challenges are exacerbated when the response variable is ordinal, and especially as the number of ordered categories grows. Some spatio-temporal models achieve computational feasibility for large datasets but only through overly restrictive model simplifications, which we seek to avoid here. In this paper we demonstrate a two-stage algorithm to fit a Bayesian spatio-temporal model to large datasets when the response variable is ordinal. The first stage models locations independently in space, capturing temporal dependence, and can be run in parallel. The second stage resamples from the first stage posterior distributions with an acceptance probability computed to impose spatial dependence from the full spatio-temporal model. The result is fast Bayesian inference which samples from the full spatio-temporal posterior and is computationally feasible even for large datasets. We quantify the substantial computational gains our approach achieves, and demonstrate the preservation of the posterior distribution as compared to the more costly single-stage model fit. We apply our approach to a large spatio-temporal drought dataset in the United States, a dataset too large for many existing spatio-temporal methods.
[16] arXiv:2505.24694 (cross-list from stat.ME) [pdf, html, other]: Title: Bayesian nonparametric clustering for spatio-temporal data, with an application to air pollution

Luca Aiello, Raffaele Argiento, Sirio Legramanti, Lucia Paci

Subjects: Methodology (stat.ME); Applications (stat.AP)

Air pollution is a major global health hazard, with fine particulate matter (PM10) linked to severe respiratory and cardiovascular diseases. Hence, analyzing and clustering spatio-temporal air quality data is crucial for understanding pollution dynamics and guiding policy interventions. This work provides a review of Bayesian nonparametric clustering methods, with a particular focus on their application to spatio-temporal data, which are ubiquitous in environmental sciences. We first introduce key modeling approaches for point-referenced spatio-temporal data, highlighting their flexibility in capturing complex spatial and temporal dependencies. We then review recent advancements in Bayesian clustering, focusing on spatial product partition models, which incorporate spatial structure into the clustering process. We illustrate the proposed methods on PM10 monitoring data from Northern Italy, demonstrating their ability to identify meaningful pollution patterns. This review highlights the potential of Bayesian nonparametric methods for environmental risk assessment and offers insights into future research directions in spatio-temporal clustering for public health and environmental science.

[17] arXiv:2412.07134 (replaced) [pdf, html, other]: Title: A Bayesian Mixture Model Approach to Examining Neighborhood Social Determinants of Health Disparities in Endometrial Cancer Care in Massachusetts

Carmen B. Rodríguez, Stephanie M. Wu, Stephanie Alimena, Alecia J McGregor, Briana JK Stephenson

Subjects: Applications (stat.AP)

Many studies have examined social determinants of health (SDoH) independently, overlooking their interconnected nature. Our study uses a multidimensional approach to construct a neighborhood-level measure that explores how multiple SDoH jointly impact care received for endometrial cancer (EC) patients in Massachusetts (MA). Using 2015-2019 American Community Survey data, we implemented a Bayesian multivariate Bernoulli mixture model to identify neighborhoods with similar SDoH features in MA. Five neighborhood SDoH (NSDoH) profiles were derived and characterized: (1) advantaged non-Hispanic White; (2) disadvantaged racially/ethnically diverse, more renter-occupied housing with limited English proficiency; (3) working class, lower educational attainment; (4) racially/ethnically diverse and greater economic security and educational attainment; and (5) racially/ethnically diverse, more renter-occupied housing with limited English proficiency. Based on residential information, we assigned these profiles to EC patients in the Massachusetts Cancer Registry. We used these profile assignments as the primary exposure in a Bayesian logistic regression to estimate the odds of receiving optimal EC care, adjusting for patient-level sociodemographic and clinical characteristics. NSDoH profiles were not significantly associated with receiving optimal EC care. However, compared to patients assigned to Profile 1, patients in all other profiles had lower odds of receiving optimal care. Our findings demonstrate how a flexible model-based clustering approach can account for the interconnected and multidimensional nature of NSDoH in a practical and interpretable way. Deriving and geospatially mapping NSDoH profiles may allow for identifying areas of need and inform targeted public health interventions tailored to each neighborhood's specific social determinants to improve healthcare delivery.
[18] arXiv:2504.00459 (replaced) [pdf, html, other]: Title: Graphical Models and Efficient Inference Methods for Multivariate Phase Probability Distributions

Andrew S. Perley, Todd P. Coleman

Subjects: Methodology (stat.ME); Information Theory (cs.IT); Statistics Theory (math.ST); Quantitative Methods (q-bio.QM); Applications (stat.AP)

Multivariate phase relationships are important to characterize and understand numerous physical, biological, and chemical systems, from electromagnetic waves to neural oscillations. These systems exhibit complex spatiotemporal dynamics and intricate interdependencies among their constituent elements. While classical models of multivariate phase relationships, such as the wave equation and Kuramoto model, give theoretical models to describe phenomena, the development of statistical tools for hypothesis testing and inference for multivariate phase relationships in complex systems remains limited. This paper introduces a novel probabilistic modeling framework to characterize multivariate phase relationships, with wave-like phenomena serving as a key example. This approach describes spatial patterns and interactions between oscillators through a pairwise exponential family distribution. Building upon the literature of graphical model inference, including methods like Ising models, graphical lasso, and interaction screening, this work bridges the gap between classical wave dynamics and modern statistical approaches. Efficient inference methods are introduced, leveraging the Chow-Liu algorithm for directed tree approximations and interaction screening for general graphical models. Simulated experiments demonstrate the utility of these methods for uncovering wave properties and sparse interaction structures, highlighting their applicability to diverse scientific domains. This framework establishes a new paradigm for statistical modeling of multivariate phase relationships, providing a powerful toolset for exploring the complexity of these systems.
[19] arXiv:2505.17836 (replaced) [pdf, html, other]: Title: Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means

Anna Van Elst, Igor Colin, Stephan Clémençon

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)

This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for situations where communication is constrained. A fundamental challenge in existing mean-based gossip algorithms is their vulnerability to malicious or corrupted nodes. In this paper, we show that an outlier-robust mean can be computed by globally estimating a robust statistic. More specifically, we propose a novel gossip algorithm for rank estimation, referred to as GoRank, and leverage it to design a gossip procedure dedicated to trimmed mean estimation, coined GoTrim. In addition to a detailed description of the proposed methods, a key contribution of our work is a precise convergence analysis: we establish an $\mathcal{O}(1/t)$ rate for rank estimation and an $\mathcal{O}(\log(t)/t)$ rate for trimmed mean estimation, where $t$ denotes the number of iterations. Moreover, we provide a breakdown point analysis of GoTrim. We empirically validate our theoretical results through experiments on diverse network topologies, data distributions and contamination schemes.
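
The vulnerability of mean-based gossip to a corrupted node can be illustrated with a small sketch: standard randomized pairwise gossip drives every node to the plain average, which a single outlier dominates, whereas a rank-based trimmed mean discards it. The code below is not GoRank or GoTrim; it pairs the textbook pairwise-averaging gossip with a centralized trimmed mean used only as a reference value, and the graph, data, and function names are assumptions for illustration.

```python
import random

def gossip_average(values, edges, n_rounds, seed=0):
    """Randomized pairwise gossip: the endpoints of a random edge average their values."""
    rng = random.Random(seed)
    x = list(values)
    for _ in range(n_rounds):
        i, j = rng.choice(edges)
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

def trimmed_mean(values, alpha):
    """Centralized reference: mean of the points whose rank lies outside both alpha tails."""
    n = len(values)
    k = int(alpha * n)
    order = sorted(range(n), key=lambda i: values[i])
    rank = {node: r for r, node in enumerate(order)}
    kept = [values[i] for i in range(n) if k <= rank[i] < n - k]
    return sum(kept) / len(kept)

values = [1.0, 2.0, 3.0, 4.0, 1000.0]               # last node holds a corrupted value
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]    # ring communication graph
estimates = gossip_average(values, edges, n_rounds=500)
robust = trimmed_mean(values, alpha=0.2)
```

Each node's gossip estimate ends up near the contaminated average of the five values, while trimming one point from each tail recovers a value representative of the uncorrupted nodes.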
[20] arXiv:2505.20929 (replaced) [pdf, html, other]: Title: Two-step dimensionality reduction of human mobility data: From potential landscapes to spatiotemporal insights

Yunhan Du, Takaaki Aoki, Naoya Fujiwara

Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph); Applications (stat.AP)

Understanding the spatiotemporal patterns of human mobility is crucial for addressing societal challenges, such as epidemic control and urban transportation optimization. Despite advancements in data collection, the complexity and scale of mobility data continue to pose significant analytical challenges. Existing methods often result in losing location-specific details and fail to fully capture the intricacies of human movement. This study proposes a two-step dimensionality reduction framework to overcome existing limitations. First, we construct a potential landscape of human flow from origin-destination (OD) matrices using combinatorial Hodge theory, preserving essential spatial and structural information while enabling an intuitive visualization of flow patterns. Second, we apply principal component analysis (PCA) to the potential landscape, systematically identifying major spatiotemporal patterns. By implementing this two-step reduction method, we reveal significant shifts during a pandemic, characterized by an overall decline in mobility and stark contrasts between weekdays and holidays. These findings underscore the effectiveness of our framework in uncovering complex mobility patterns and provide valuable insights into urban planning and public health interventions.

New submissions (showing 8 of 8 entries)

Cross submissions (showing 8 of 8 entries)

Replacement submissions (showing 4 of 4 entries)