


ICS 2020: Barcelona, Spain
- Eduard Ayguadé, Wen-mei W. Hwu, Rosa M. Badia, H. Peter Hofstee:

ICS '20: 2020 International Conference on Supercomputing, Barcelona Spain, June, 2020. ACM 2020, ISBN 978-1-4503-7983-0
Algorithms I
- Max Carlson, Robert M. Kirby, Hari Sundar:

A scalable framework for solving fractional diffusion equations. 2:1-2:11
- Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran:

CFDNet: a deep learning-based accelerator for fluid simulations. 3:1-3:12
- Kanak Mahadik, Qingyun Wu, Shuai Li, Amit Sabne:

Fast distributed bandits for online recommendation systems. 4:1-4:13
- Robin Kumar Sharma, Marc Casas:

Wavefront parallelization of recurrent neural networks on multi-core architectures. 5:1-5:12
- Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. D. K. Panda:

NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. 6:1-6:12
- Shaoshuai Zhang, Ruchi Shah, Panruo Wu:

TensorSVM: accelerating kernel machines with tensor engine. 7:1-7:11
Algorithms II
- Brian Donnelly, Michael Gowanlock:

A coordinate-oblivious index for high-dimensional distance similarity searches on the GPU. 8:1-8:12
- Azin Heidarshenas, Serif Yesil, Dimitrios Skarlatos, Sasa Misailovic, Adam Morrison, Josep Torrellas:

V-Combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging. 9:1-9:13
- Kshitij Shukla, Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli:

Efficient parallel algorithms for betweenness- and closeness-centrality in dynamic graphs. 10:1-10:12
- Ruoming Jin, Zhen Peng, Wendell Wu, Feodor F. Dragan, Gagan Agrawal, Bin Ren:

Parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms. 11:1-11:13
- Marco Minutoli, Maurizio Drocco, Mahantesh Halappanavar, Antonino Tumeo, Ananth Kalyanaraman:

cuRipples: influence maximization on multi-GPU systems. 12:1-12:11
- Hans Vandierendonck:

Graptor: efficient pull and push style vectorized graph processing. 13:1-13:13
- Babak Falsafi:

Post-moore server architecture. 14:1
Architecture I
- Laith M. AlBarakat, Paul V. Gratz, Daniel A. Jiménez:

SB-Fetch: synchronization aware hardware prefetching for chip multiprocessors. 15:1-15:12
- Vladimir Dimic, Miquel Moretó, Marc Casas, Jan Ciesko, Mateo Valero:

RICH: implementing reductions in the cache hierarchy. 16:1-16:13
- Xianwei Cheng, Hui Zhao, Mahmut T. Kandemir, Beilei Jiang, Gayatri Mehta:

AMOEBA: a coarse grained reconfigurable architecture for dynamic GPU scaling. 17:1-17:13
- Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, Josep Torrellas:

Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors. 18:1-18:13
- Xin He, Subhankar Pal, Aporva Amarnath, Siying Feng, Dong-Hyeon Park, Austin Rovinski, Haojie Ye, Kuan-Yu Chen, Ronald G. Dreslinski, Trevor N. Mudge:

Sparse-TPU: adapting systolic arrays for sparse matrices. 19:1-19:12
Architecture II
- Fei Lei, Dezun Dong, Xiangke Liao, José Duato:

Bundlefly: a low-diameter topology for multicore fiber. 20:1-20:11
- Zaid Salamah A. Alzaid, Saptarshi Bhowmik, Xin Yuan, Michael Lang:

Global link arrangement for practical Dragonfly. 21:1-21:11
- Shivani Tripathy, Debiprasanna Sahoo, Manoranjan Satpathy, Madhu Mutyam:

Fuzzy fairness controller for NVMe SSDs. 22:1-22:12
- Imran Fareed, Mincheol Kang, Wonyoung Lee, Soontae Kim:

Leveraging intra-page update diversity for mitigating write amplification in SSDs. 23:1-23:12
- Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Xiaolong Ma, Hayden Kwok-Hay So, Martin C. Herbordt, Ang Li, Yanzhi Wang:

CSB-RNN: a faster-than-realtime RNN acceleration framework with compressed structured blocks. 24:1-24:12
- Gongjin Sun, Seongyoung Kang, Sang-Woo Jun:

BurstZ: a bandwidth-efficient scientific computing accelerator platform for large-scale data. 25:1-25:12
Performance
- Keren Zhou, Mark W. Krentel, John M. Mellor-Crummey:

Tools for top-down performance analysis of GPU-accelerated applications. 26:1-26:12
- Benjamin Welton, Barton P. Miller:

Identifying and (automatically) remedying performance problems in CPU/GPU applications. 27:1-27:13
- Gleison Souza Diniz Mendonca, Chunhua Liao, Fernando Magno Quintão Pereira:

AutoParBench: a unified test framework for OpenMP-based parallelizers. 28:1-28:10
- Zhengchun Liu, Ryan Lewis, Rajkumar Kettimuthu, Kevin Harms, Philip H. Carns, Nageswara S. V. Rao, Ian T. Foster, Michael E. Papka:

Characterization and identification of HPC applications at leadership computing facility. 29:1-29:12
- Jaemin Choi, David F. Richards, Laxmikant V. Kalé, Abhinav Bhatele:

End-to-end performance modeling of distributed GPU applications. 30:1-30:12
- Yehia Arafa, Abdel-Hameed A. Badawy, Gopinath Chennupati, Atanu Barai, Nandakishore Santhi, Stephan J. Eidenbenz:

Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles. 31:1-31:12
- Michael Wolfe:

Optimizing supercompilers for supercomputers. 32:1
Runtime
- Jesús Carretero, Emmanuel Jeannot, Guillaume Pallez, David E. Singh, Nicolas Vidal:

Mapping and scheduling HPC applications for optimizing I/O. 33:1-33:12
- Isaac Sánchez Barrera, David Black-Schaffer, Marc Casas, Miquel Moretó, Anastasiia Stupnikova, Mihail Popov:

Modeling and optimizing NUMA effects and prefetching with machine learning. 34:1-34:13
- Rohit Zambre, Aparna Chandramowlishwaran, Pavan Balaji:

How I learned to stop worrying about user-visible endpoints and love MPI. 35:1-35:13
- Masab Ahmad, Mohsin Shan, Akif Rehman, Omer Khan:

Accelerating relax-ordered task-parallel workloads using multi-level dependency checking. 36:1-36:11
- Yudong Wu, Mingyao Shen, Yi-Hui Chen, Yuanyuan Zhou:

Tuning applications for efficient GPU offloading to in-memory processing. 37:1-37:12
- Martin Winter, Daniel Mlakar, Mathias Parger, Markus Steinberger:

Ouroboros: virtualized queues for dynamic memory management on GPUs. 38:1-38:12
Compilers
- Ji Liu, Abdullah-Al Kafi, Xipeng Shen, Huiyang Zhou:

MKPipe: a compiler framework for optimizing multi-kernel workloads in OpenCL for FPGA. 39:1-39:12
- Indu K. Prabhu, V. Krishna Nandivada:

Chunking loops with non-uniform workloads. 40:1-40:12
- Tyler Coy, Shuibing He, Bin Ren, Xuechen Zhang:

Compiler aided checkpointing using crash-consistent data structures in NVMM systems. 41:1-41:13
- Jialiang Tan, Shuyin Jiao, Milind Chabbi, Xu Liu:

What every scientific programmer should know about compiler optimizations? 42:1-42:12
- Tao Wang, Nikhil Jain, David Böhme, David Beckingsale, Frank Mueller, Todd Gamblin:

CodeSeer: input-dependent code variants selection via machine learning. 43:1-43:11
