A Parallel, Energy Efficient Hardware Architecture
for the merAligner on FPGA using Chisel HCL
Lorenzo Di Tucci, Marco Santambrogio {lorenzo.ditucci, marco.santambrogio}@polimi.it
Alessandro Comodi, Davide Conficconi {alessandro.comodi,davide.conficconi}@mail.polimi.it
Steven Hofmeyr, David Donofrio {shofmeyr, ddonofrio}@lbl.gov
RAW @ JW Marriott, Vancouver
May 22 2018
speaker: Alessandro Comodi
Context
Large amounts of genomic data
Algorithm complexity
In such a scenario, efficient solutions are needed in terms of both
performance and power consumption
Sequence Alignment
Sequence alignment algorithms are among the most
compute-intensive ones
Pure software solution
• Poor performance
• High power consumption
merAligner
To overcome performance issues, Lawrence Berkeley National
Laboratory and UC Berkeley have proposed the merAligner
• High performance
• Low power efficiency (90 kW per cabinet)
• More than 15,000 cores
Contributions
• The design and development of a hardware
architecture for the Smith-Waterman algorithm
on FPGA, using Chisel HCL
• The development of a wrapper written in Chisel,
used to integrate RTL cores into the Xilinx
SDAccel Framework
Smith-Waterman
The main bottleneck of the merAligner tool is the
Smith-Waterman algorithm implementation
Highly parallel computation
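
To make the computation concrete, here is a minimal software sketch of the Smith-Waterman scoring recurrence. The function name and the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions and are not taken from merAligner. Each cell depends only on its left, upper, and upper-left neighbours, so all cells on the same anti-diagonal are independent of each other, which is where the parallelism comes from.

```python
def smith_waterman_score(read, reference, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between read and reference.

    The scoring values are illustrative assumptions, not merAligner's.
    """
    cols = len(reference) + 1
    prev = [0] * cols              # row i-1 of the score matrix
    best = 0
    for r in read:
        curr = [0] * cols          # row i; column 0 stays at 0
        for j in range(1, cols):
            # Upper-left neighbour plus the substitution score
            diag = prev[j - 1] + (match if r == reference[j - 1] else mismatch)
            up = prev[j] + gap         # gap in the reference
            left = curr[j - 1] + gap   # gap in the read
            # Local alignment: scores never drop below zero
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Only two rows are kept in memory because this sketch returns the score alone; recovering the alignment itself would need the full matrix or a traceback structure.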
Architecture
Systolic-array-based design
Each processing element is fed
with the result of the previous one
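
The dataflow described above can be imitated in software. The following is a cycle-by-cycle sketch under stated assumptions, not the authors' Chisel design: it assumes one processing element (PE) per read character, one reference character entering the array per cycle, and the same illustrative scoring values as a plain Smith-Waterman scorer. The names systolic_sw_score, out, and diag are hypothetical.

```python
def systolic_sw_score(read, reference, match=2, mismatch=-1, gap=-1):
    """Simulate a linear systolic array computing a Smith-Waterman score.

    Assumption: PE i holds read[i] and handles reference[j] at cycle i + j.
    Returns (best_score, number_of_cycles).
    """
    m, n = len(read), len(reference)
    out = [0] * m    # out[i]: score PE i produced in the previous cycle
    diag = [0] * m   # diag[i]: upstream score PE i consumed in the previous cycle
    best = 0
    cycles = max(m + n - 1, 0)
    for t in range(cycles):
        # Compute from the old register values, then commit all at once,
        # mimicking registers that update on a clock edge.
        new_out, new_diag = list(out), list(diag)
        for i in range(m):
            j = t - i                      # reference index seen by PE i
            if not 0 <= j < n:
                continue                   # PE i is idle in this cycle
            up = out[i - 1] if i > 0 else 0   # result of the previous PE
            left = out[i]                     # this PE's own previous result
            sub = match if read[i] == reference[j] else mismatch
            score = max(0, diag[i] + sub, up + gap, left + gap)
            new_out[i] = score
            new_diag[i] = up               # becomes the diagonal next cycle
            best = max(best, score)
        out, diag = new_out, new_diag
    return best, cycles


if __name__ == "__main__":
    print(systolic_sw_score("ACGT", "TTACGTTT"))
```

Under these assumptions the array finishes in len(read) + len(reference) - 1 cycles, instead of the len(read) * len(reference) steps of a sequential loop, because every active PE updates one cell per cycle.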
Results
Read    Reference  Frequency  Performance  Speed-up       Power efficiency  Speed-up
                   [MHz]      [GCUPS]      (performance)  [GCUPS/W]         (power efficiency)
128[*]  1024       -          3.87         -              0.0165            -
128     1024       150        3.542        0.91X          0.141             8.54X
128     2048       140        5.616        1.45X          0.224             14.35X
128     4096       180        6.529        1.68X          0.261             15.81X
128     16384      110        11.443       2.84X          0.457             27.69X
256     1024       160        6.123        1.58X          0.244             14.78X
256     2048       160        8.393        2.16X          0.335             20.30X
256     4096       130        15.225       3.93X          0.609             36.90X
256     16384      140        27.312       7.05X          1.092             66.18X
[*] State-of-the-art Smith-Waterman software implementation
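
As a reading aid for these metrics, the sketch below shows how such figures are commonly derived. GCUPS is giga cell updates per second, that is, the number of score-matrix cells (read length times reference length) divided by the run time. The slides do not state how the speed-up columns were computed, so treating them as simple ratios against the [*] baseline row is an assumption, and recomputed values need not match every table entry. The implied power is likewise only an estimate obtained by dividing two table columns. All function names are hypothetical.

```python
def gcups(read_len, ref_len, seconds):
    """Giga cell updates per second for one read/reference alignment."""
    return read_len * ref_len / seconds / 1e9


def speedup(value, baseline):
    """Ratio of a measured value to the baseline value (assumed formula)."""
    return value / baseline


def implied_power_watts(perf_gcups, eff_gcups_per_watt):
    """Power implied by dividing performance by power efficiency."""
    return perf_gcups / eff_gcups_per_watt


# Values copied from the table: software baseline [*] and the 256 x 16384 row
baseline_perf, baseline_eff = 3.87, 0.0165
fpga_perf, fpga_eff = 27.312, 1.092

if __name__ == "__main__":
    print(f"performance speed-up: {speedup(fpga_perf, baseline_perf):.2f}X")
    print(f"efficiency speed-up:  {speedup(fpga_eff, baseline_eff):.2f}X")
    print(f"implied FPGA power:   {implied_power_watts(fpga_perf, fpga_eff):.1f} W")
```

For this row the efficiency gain is roughly nine times larger than the raw performance gain, which suggests that most of the benefit comes from the lower power draw rather than from raw throughput.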
Concluding Remarks
Thank you for your attention!
Hardware architecture for the acceleration of the
Smith-Waterman step of the merAligner on FPGA
using Chisel HCL

