Publications
-
FPGAs for Software Programmers: OpenCL
Springer
This book makes powerful Field Programmable Gate Array (FPGA) and reconfigurable technology accessible to software engineers by covering different state-of-the-art high-level synthesis approaches (e.g., OpenCL and several C-to-gates compilers). It introduces FPGA technology, its programming model, and how various applications can be implemented on FPGAs without going through low-level hardware design phases.
-
Higher Level Programming Abstractions for FPGAs using OpenCL
DATE : Design Automation and Test in Europe 2011
-
Line-level incremental resynthesis techniques for FPGAs
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
FPGA logic density is roughly doubling at every process generation. Consequently, it is becoming increasingly challenging for FPGA CAD tools to keep up with the growing complexities of high-speed designs while keeping CAD runtimes reasonable. In this paper, we present a novel incremental resynthesis tool called Line-Level Incremental reSynthesis (LLIS), integrated within an industrial tool suite, that addresses the problems of timing closure as well as CAD runtime (patent pending). We describe a general framework that can incrementally reuse results from a previous compile based on automatic differencing of HDL changes. We show that it is possible to reduce synthesis runtime by 6.5x for common HDL changes. As compared with complete resynthesis, we preserve known good timing solutions more than 82% of the time. This represents a 3x improvement vs. non-incremental techniques.
-
Improving FPGA designer productivity using OpenCL
FPGA 2011 Pre-Conference Workshop
Today's FPGAs have logic capacities that are steadily increasing. The FPGA is a large array of fine-grained programmable elements that can be configured to efficiently solve many complex problems. For many applications, FPGAs are a tremendously efficient computational fabric; however, the primary method of design entry for FPGAs is through Hardware Description Languages (HDLs) such as VHDL or Verilog. These languages model the FPGA at an extremely low level where the programmer is expected to understand cycle-accurate details of how data is moved and transformed through the FPGA. While this programming model is required to achieve the highest possible efficiency from FPGAs, it is akin to "assembly language" programming for processors. In this talk, we explore techniques that allow us to program FPGAs at a level of abstraction that is closer to traditional software-centric approaches. These techniques allow us to trade off some efficiency for added designer productivity. Our language of choice is OpenCL, which is an industry-standard parallel language based on C. OpenCL offers numerous compelling advantages that enable designers to harness the computational power of FPGAs and yet ease the programming burden to a significant extent.
-
Parallelizing FPGA Technology Mapping Using Graphics Processing Units (GPUs)
Field Programmable Logic and Applications (FPL) 2010
GPUs are becoming an increasingly attractive option for obtaining performance speedups for data-parallel applications. FPGA technology mapping is an algorithm that is heavily data parallel; however, it has many features that make it unattractive to implement on a GPU. The algorithm uses data in irregular ways since it is a graph-based algorithm. In addition, it makes heavy use of constructs like recursion, which is not supported by GPU hardware. In this paper, we take a state-of-the-art FPGA technology mapping algorithm within Berkeley's ABC package and attempt to parallelize it on a GPU. We show that runtime gains of 3.1× are achievable while maintaining identical quality, as demonstrated by running these netlists through Altera's Quartus II place-and-route tool.
-
A comprehensive approach to modeling, characterizing and optimizing for metastability in FPGAs
Symposium on Field Programmable Gate Arrays (FPGA) 2010
Metastability is a phenomenon that can cause system failures in digital circuits. It may occur whenever signals are being transmitted across asynchronous or unrelated clock domains. The impact of metastability is increasing as process geometries shrink and supply voltages drop faster than transistor Vts. FPGA technologies are significantly affected since leading-edge FPGAs are amongst the first devices to adopt the most recent process nodes. In this paper, we present a comprehensive suite of techniques for modeling, characterizing and optimizing metastability effects in FPGAs. We first discuss a theoretical model of metastability, and verify the predictions using both circuit-level simulations and board measurements. Next, we show how designers have traditionally dealt with metastability problems and contrast that with the automatic CAD algorithms described in this paper that both analyze and optimize metastability-related issues. Through our detailed experimental results, we show that we can improve the metastability characteristics of a large suite of industrial benchmarks by an average of 268,000 times with our optimization techniques.
-
Predicting Interconnect Delay for Physical Synthesis in an FPGA CAD Flow
IEEE Transactions on Very Large Scale Integration Systems
This paper studies the difficulty of predicting interconnect delay in an industrial setting. Industrial circuits and two industrial FPGA architectures were used in the study. We show that there is a large amount of inherent randomness in a state-of-the-art FPGA placement algorithm. Thus, it is impossible to predict interconnect delay with a high degree of accuracy. Furthermore, we show that a simple timing model can be used to predict some aspects of interconnect timing with just as much accuracy as predictions obtained by running the placement tool itself. Next, we present a metric for predicting the accuracy of our interconnect delay model and show how this metric can be used to improve a timing-driven physical synthesis flow. Finally, we examine the benefits of using the simple timing model in a timing-driven physical synthesis flow, and attempt to establish an upper bound on these possible gains, given the difficulty of interconnect delay prediction.
-
FPGA PLB Architecture Evaluation and Area Optimization Techniques using Boolean Satisfiability
IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems
This work presents a Field-Programmable Gate Array (FPGA) logic synthesis technique based upon Boolean Satisfiability (SAT). This work shows how to map any Boolean function into an arbitrary PLB architecture without any custom decomposition techniques. The authors illustrate several useful applications of this technique by showing how it can be used for architecture evaluation and area optimization. When evaluating FPGA architecture, the authors focus on the basic building block of the FPGA, which they refer to as a programmable logic block (PLB). In order to illustrate the flexibility of their evaluation framework, several unrelated PLB architectures are evaluated in an automated fashion. Furthermore, the authors show that their technique is able to reduce FPGA resource usage by 27% on average in common subcircuits found in digital design.
-
Two-Stage Physical Synthesis for FPGAs
Proceedings of the IEEE Custom Integrated Circuits Conference
This paper presents an overview of an industrial physical synthesis CAD flow for FPGAs. The flow provides a performance speedup of 10% to 15% for most circuits, and a significant number of circuits show a speedup of 20% to 180%. We describe the algorithms used to achieve this result, including incremental retiming, BDD-based resynthesis, local rewiring, and logic replication. The effectiveness of these operations depends on the ability to accurately determine which portions of logic are timing critical at a stage of the CAD flow where there is still freedom to perform logic restructuring. We show how this problem can be effectively solved by inserting prediction and restructuring operations at multiple points of the FPGA CAD flow.
Patents
-
Method and apparatus for implementing configurable streaming networks
Issued US 9,515,658
A method of configuring a programmable integrated circuit device. A channel source within the virtual fabric is configured to receive input data from a first kernel outside of the virtual fabric and on the programmable integrated circuit device, and a channel sink within the virtual fabric is configured to transmit output data to the first kernel. The configuring of the channel source is modified such that the channel source receives input data from a second kernel in response to detecting a change in operation of the programmable integrated circuit device.
-
Configuring a programmable device using high-level language
Issued US 9,449,132
A method of preparing a programmable integrated circuit device for configuration using a high-level language includes compiling a plurality of virtual programmable devices from descriptions in said high-level language. The compiling includes compiling configurations of configurable routing resources from programmable resources of said programmable integrated circuit device, and compiling configurations of a plurality of complex function blocks from programmable resources of said programmable integrated circuit device. A machine-readable data storage medium may be encoded with a library of such compiled configurations. A virtual programmable device may include a stall signal network, and routing switches of the virtual programmable device may include stall signal inputs and outputs.
-
OpenCL compilation
Issued US 9,134,981
Systems and methods for increasing speed and reducing processing power of a compile process of programmable logic of an integrated circuit (IC) are provided. For example, in one embodiment, a method includes obtaining a high level program, comprising computer-readable instructions for implementation on programmable logic of an integrated circuit (IC); translating the high level program into low level code representative of functional components needed to execute functionalities of the high level program; generating a host program comprising computer-readable instructions for implementing the low level code based upon the high level program; obtaining modifications to the high level program; determining whether the modifications can be implemented by a new host program utilizing the low level code; and generating the new host program to implement the modifications, when the modifications can be implemented by the new host program utilizing the low level code.
-
Adaptable programs using partial reconfiguration
Issued US 9,100,012
Systems and methods for dynamically adjusting programs implemented on an integrated circuit (IC) are provided. During runtime, characteristics of the application may change or become known. Accordingly, the embodiments described herein allow for partial reconfiguration of kernels implemented on an IC during runtime to dynamically alter performance based upon these characteristics.
-
Method and apparatus for performing fast incremental resynthesis
Issued US 8,732,634
-
Integrated circuit compilation
Issued US 8,650,525
Systems and methods for increasing speed and reducing processing power of a compile process of programmable logic of an integrated circuit (IC) are provided. For example, in one embodiment, a method includes obtaining a high level program, comprising computer-readable instructions for implementation on programmable logic of an integrated circuit (IC); translating the high level program into low level code representative of functional components needed to execute functionalities of the high level program; generating a host program comprising computer-readable instructions for implementing the low level code based upon the high level program; obtaining modifications to the high level program; determining whether the modifications can be implemented by a new host program utilizing the low level code; and generating the new host program to implement the modifications, when the modifications can be implemented by the new host program utilizing the low level code.
-
Method And Apparatus For Implementing Soft Constraints In Tools Used For Designing Programmable Logic Devices
Issued US 8,589,849
A method for designing a system on a target device utilizing programmable logic devices (PLDs) includes generating options for utilizing resources on the PLDs in response to user specified constraints. The options for utilizing the resources on the PLDs are refined independent of the user specified constraints.
-
Methods and systems for measuring and presenting performance data of a memory controller system
Issued US 8,499,201
Mechanisms for measuring, analyzing, and presenting performance data associated with a memory controller system are described. The mechanisms include a performance monitor that detects and analyzes performance including efficiency and latency of a memory controller system. In addition to determining performance, the system identifies reasons for loss of memory controller system efficiency. Moreover, the reasons, the efficiency, and the latency are analyzed and presented in a manner easily understandable to a user.
-
Method and apparatus for performing fast incremental resynthesis
Issued US 8,484,596
A method for designing a system on a target device is disclosed. Extraction is performed on a first version of the system during synthesis in a first compilation resulting in a first netlist. Optimizations are performed on the first version of the system during synthesis in the first compilation resulting in a second netlist. Placement and routing are performed on the first version of the system in the first compilation. Extraction is performed on a second version of the system having a changed portion during synthesis in a second compilation resulting in a third netlist. The first version of the system in the first netlist and the second version of the system in the third netlist are differentiated to identify identical regions, wherein at least one of the performing and differentiating is performed by a processor.
-
Method and apparatus for performing fast incremental resynthesis
Issued US 8,296,695
A method for designing a system on a target device is disclosed. A first netlist with a first set of functionally invariant boundaries (FIBs) is generated after performing extraction during synthesis of a first version of the system in a first compilation. One or more of the FIBs is invalidated from the first set after performing optimizations during synthesis in the first compilation resulting in a second netlist with a second set of FIBs. A third netlist with a third set of FIBs is generated after performing extraction during synthesis of a second version of the system having a changed portion in a second compilation. Connectivity of matching nodes from the first netlist and the third netlist reaching FIBs is traversed to identify equivalent nodes associated with identical regions. The identical region in the third netlist is replaced with an optimized synthesized region from the second netlist.
-
Method and apparatus for performing simultaneous register retiming and combinational resynthesis during physical synthesis
Issued US 8,296,696
A method for designing a system on a target device includes synthesizing the system. The system is mapped. The system is placed on the target device. Physical synthesis is performed on the system by identifying a plurality of register retiming solutions for each register in the system, performing combinational resynthesis on each of the register retiming solutions, and selecting a combinational resynthesis solution for the system.
-
Method and apparatus for performing multiple stage physical synthesis
Issued US 7,996,797
-
Method and apparatus for performing post-placement routability optimization
Issued US 7,620,925
-
Method and apparatus for performing retiming on field programmable gate arrays
Issued US
-
Leveraging combinations of synthesis, placement and incremental optimizations
Issued US 7,290,240
-
Method and apparatus for designing systems using logic regions
Issued US 7,197,734
-
Method and apparatus for placement of components onto programmable logic devices
Issued US 7,181,717
-
Programmable logic devices with skewed clocking signals
Issued US 7,107,477
-
Programmable logic devices with skewed clocking signals
Issued US 7,464,286
-
Method and apparatus for placement of components onto programmable logic devices
Issued US 6,779,169
Honors & Awards
-
Design Tool of the Year
Elektra European Electronics Industry Awards
The Elektra European Electronics Industry Awards are the most prestigious product, technology and business awards in Europe. Founded by Electronics Weekly in 2003, the Elektra Awards recognize the achievements of individuals and companies in the electronics industry. The awards are presented to companies whose products demonstrate advanced technical capabilities and usefulness. A panel of independent industry experts and representatives from Electronics Weekly selected the Altera SDK for OpenCL as the winner in its "Design Tools and Development Software" category.
-
EE Times Ultimate Product - Software Award
EE Times
Altera's SDK for OpenCL was recognized by this prestigious Ace Award for its ability to allow software programmers to access the performance and low-power advantages of FPGAs. Altera is the industry's first company to offer an SDK for OpenCL that targets FPGAs and today offers a full production release of the high-level design tool. The EE Times Ultimate Product - Software award recognizes the company with the most innovative software product of the year.