Namaste FPGA Technologies

Professional Training and Coaching

Mumbai, Maharashtra 11,315 followers

Empowering Tomorrow's Innovators through Specialized FPGA Training for Semiconductor Applications

About us

We offer a comprehensive learning path designed specifically for Front-End VLSI enthusiasts. Our curated program takes you from the fundamentals to advanced topics, ensuring a smooth and successful learning journey. We understand the challenges of navigating the vast world of VLSI learning; Namaste FPGA solves this problem with a structured curriculum whose courses are sequenced for optimal learning. We emphasize practical skills, with courses that are 95% coding and 5% theory, so you learn by doing and solidify your understanding. We've helped over 50K students on Udemy master Front-End VLSI since 2019. We even made UVM training super affordable ($5!), unlike others charging $100-$500, making it accessible to everyone.

Imagine having HackerRank's challenges, Udemy's in-depth lessons, and Internshala's internships, all rolled into one platform. That's Namaste FPGA: easy to use and affordable, offering everything you need to master Front-End VLSI.

Namaste FPGA offers:
• Low-latency microservice architecture for an uninterrupted learning experience
• Best-in-class user data encryption for security
• Always-available cloud-native application
• Secure payment methods compliant with PCI-DSS, ISO 27001, and SOC 2
• Modern UI with mobile-friendly design
• Dedicated Discord servers for 24/7 connectivity with instructors
• 48-hour turnaround time for all support inquiries
• Curated learning paths for Design, Verification, and SoC
• Courses on essential job skills (RTL Design & Verification), foundational skills, and soft skills
• Remote internship available for all participants
• Verified Certificate of Completion
• Coding exercises and coding contests with unique badges
• Learn at your own pace
• Affordable, fixed pricing for all courses

Website
https://namaste-fpga.com/
Industry
Professional Training and Coaching
Company size
2-10 employees
Headquarters
Mumbai, Maharashtra
Type
Partnership
Founded
2024
Specialties
RTL Design, RTL Verification, Formal Verification, SoC, Verilog, SystemVerilog, and UVM


Updates

  • Namaste FPGA Technologies reposted this

    Choosing the Right STA Violation Correction Technique for FPGAs: Small, Medium, and Large Fixes Explained

    Numerous violation correction techniques have been developed over time, each suited to specific requirements. Here is a simple cheat sheet that helps decide which technique to use in different situations.

    Violation correction techniques can be classified into three categories by the size of the violation: a violation of less than 1 ns can be considered small, a medium violation falls between 1 ns and 3 ns, and a large violation is anything above 3 ns. Small violations are usually corrected with built-in tool strategies or simple manual techniques, medium violations require more aggressive strategies, and large violations often need structural changes. This also means that small violations can be corrected quickly, whereas large violations significantly increase development time.

    Small setup violations (less than 1 ns) can be corrected in two ways: automatic correction using built-in optimization strategies in tools like phys_opt_design, or manual methods such as forward or backward retiming, logic replication to reduce fan-out, using faster primitives, and manually optimizing physical placement. These techniques can help correct setup violations in the range of 0.3 ns to 1 ns without major structural changes.

    Medium violations (1 ns to 3 ns) can be corrected using tool-based phys_opt_design optimization, as well as manual methods such as replacing critical logic with dedicated DSP blocks or carry chains, or forward/backward retiming. These techniques can typically close violations in this range, but may require structural changes.

    For large violations beyond 3 ns, tool-based optimization techniques are usually insufficient, and we mostly rely on adding pipeline registers or reducing the operating frequency.

    Similarly, there are techniques to correct hold violations, which we can explore another day. If you wish to build a strong foundation in STA for the Vivado Design Suite, explore our beginner-friendly STA course: https://lnkd.in/dWB-Y9B8
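The "add pipeline registers" fix for large violations can be sketched in a few lines of Verilog. This is our own illustrative fragment, not from the post: a single-cycle multiply-accumulate path is split into two stages, trading one cycle of latency for a much shorter critical path.

```verilog
// Hypothetical example: splitting a long multiply-add path.
// Before: y <= (a * b) + c in one cycle often fails timing at high
// clock rates. After: the product is registered first, cutting the
// critical path roughly in half at the cost of one cycle of latency.
module mac_pipelined #(parameter W = 16) (
    input  wire            clk,
    input  wire [W-1:0]    a, b, c,
    output reg  [2*W-1:0]  y
);
    reg [2*W-1:0] prod_q;  // pipeline register on the multiplier output
    reg [W-1:0]   c_q;     // delay c one cycle to stay aligned with prod_q

    always @(posedge clk) begin
        prod_q <= a * b;         // stage 1: multiply
        c_q    <= c;
        y      <= prod_q + c_q;  // stage 2: add
    end
endmodule
```

Any consumer of y must of course tolerate the extra cycle of latency, which is why this counts as a structural change rather than a tool-side fix.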

  • Namaste FPGA Technologies reposted this

    Divergence, Convergence & Reconvergence in CDC?

    A) Divergence (or fan-out) in Clock Domain Crossing (CDC) happens when a single signal in the source clock domain splits into multiple independent paths before crossing to the destination domain through separate synchronizers. We have two situations here. If the signals are never merged and each follows its own independent logic path in the destination domain, there is no collective data incoherency, so the best strategy is simply to use a standard flop synchronizer on each independent path. But if those signals later merge again in the destination domain, flip-flop synchronizers alone are not enough, because each path can arrive at a different time; in that case we need to adopt the same strategies that prevent reconvergence hazards, described next.

    B) Reconvergence happens when multiple signals from one clock domain take different paths, each crosses the boundary through its own synchronizer, and they are finally combined again (for example, ANDed, ORed, compared, decoded) by combinational logic in the destination domain to produce some output. Reconvergence can also happen when two signals are generated in the same clock domain but have different path delays and are merged in combinational logic; this local reconvergence can still cause a glitch if the logic is sensitive to momentary mismatches. The preferred approach here is to register the output in the same clock domain before crossing. When the signals merge in a different domain, strategies vary based on the source and destination clock frequencies. The most common approach is to treat the related signals as one unit: combine them into a bus in the source domain, freeze that bus so it stays stable, and send a single control signal (valid/ready style) across. Then, depending on the frequency relationship between the two domains, we choose the mechanism.

    If the source clock is slower than the destination clock, we can hold the grouped signals stable and send a valid pulse that the destination synchronizes and uses to latch the whole bus at once, which is basically a lightweight handshake. If we need a stronger guarantee or bidirectional control, we use a full request/acknowledge handshake. If the source clock is faster than the destination clock, we should use an asynchronous FIFO to safely transfer the merged data, because the FIFO can absorb rate differences without losing information. If both clocks are frequency-related (for example, derived from the same PLL with known phase), we may be able to treat that crossing with normal STA-based timing techniques instead of generic asynchronous CDC.

    A more detailed coding walkthrough and demo of these techniques will be covered in the next post. If you want to learn the fundamentals of CDC for Vivado FPGA engineers, enroll in our beginner-friendly CDC course here: https://lnkd.in/dFvsAM_n
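A minimal Verilog sketch of the "freeze the bus, send a single control" idea described above (our own illustration, with assumed signal names, not the course material): only a 1-bit toggle flag crosses domains; the data bus is held stable while the destination samples it.

```verilog
// Hypothetical sketch: multi-bit CDC using a frozen bus plus a
// synchronized toggle flag (slow source -> fast destination).
// Only the 1-bit flag crosses asynchronously.
module bus_cdc #(parameter W = 8) (
    input  wire         src_clk, dst_clk,
    input  wire         src_send,   // pulse in the source domain
    input  wire [W-1:0] src_data,
    output reg  [W-1:0] dst_data,
    output reg          dst_valid   // 1-cycle pulse in the destination domain
);
    // Source domain: freeze the data, toggle a flag
    reg         src_flag = 1'b0;
    reg [W-1:0] src_hold;
    always @(posedge src_clk)
        if (src_send) begin
            src_hold <= src_data;   // bus stays stable until the next send
            src_flag <= ~src_flag;
        end

    // Destination domain: 2-flop synchronizer plus edge detect on the flag
    reg f1 = 1'b0, f2 = 1'b0, f3 = 1'b0;
    always @(posedge dst_clk) begin
        {f3, f2, f1} <= {f2, f1, src_flag};
        dst_valid    <= (f3 ^ f2);  // flag toggled -> new data arrived
        if (f3 ^ f2)
            dst_data <= src_hold;   // bus is stable here, safe to sample
    end
endmodule
```

This only works when the source toggles no faster than the destination can observe, i.e. the slow-to-fast case from the post; a faster source needs the request/acknowledge handshake or an asynchronous FIFO instead.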

  • Namaste FPGA Technologies reposted this

    Verilator Lint Rule Walkthrough – Part 1: Combinational & Sequential Best Practices

    Verilator is one of the most interesting free-to-use lint checkers, offering far more checks than Vivado and sufficient for most hobbyist projects and even graduate and post-graduate projects. Verilator supports around 128 lint checks, of which approximately 48% (about 60) belong to the synthesizable RTL category. These can be run before Vivado lint checks to ensure there are no unexpected syntax issues in the code that could lead to functional failures.

    Synthesizable rules can be divided into six categories: combinational & sequential circuit recommended practices, width & type practices, block connection practices, constants & parameters practices, simulation-only construct practices, and coding style practices. Let us cover most of them in a six-day series, handling one category at a time.

    We start with combinational & sequential best practices, which help avoid simulation-synthesis mismatches and ensure the intended hardware blocks are correctly inferred. This category includes six rules:

    1) always@(): Combinational always blocks must be sensitive to all desired inputs of the system and cannot be empty. An empty sensitivity list means the block will never run. The recommended practice is to always use * with combinational always blocks.

    2) All variables in a combinational always block must be assigned before being read, to avoid unintended feedback paths or combinational loops. The recommended approach is to declare the variable, assign a value to it, and then read it.

    3) A combinational always block should only use blocking assignments. Mixing blocking and non-blocking assignments in a combinational always block leads to synthesis-simulation mismatches. Ensure that the same signal is not assigned with both blocking and non-blocking assignments. A variable should be updated in only one always block, and the assignment operator must be chosen based on the type of circuit and remain consistent within that block.

    4) Sequential always blocks should only use non-blocking assignments.

    5) Avoid initial blocks in synthesizable RTL code. Instead, use reset logic to assign initial or default values to signals.

    6) Delay constructs are non-synthesizable and must be avoided in both sequential and combinational circuits. Instead, use counters to generate hardware delays.

    Follow all of the above rules to create synthesizable designs that correctly infer the intended hardware behavior. Learn more about other commonly used lint rules in Verilog for FPGA engineering here: https://lnkd.in/dYR8BnbS
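The six rules above fit into one small illustrative module (our own sketch, not part of the series): blocking assignments and a default assignment in the combinational block, non-blocking assignments and reset-based initialization in the sequential block, no initial blocks, and no # delays.

```verilog
// Illustrative pattern following the six rules above.
module count_enable (
    input  wire       clk,
    input  wire       rst_n,
    input  wire       en,
    output reg  [3:0] count
);
    reg [3:0] next_count;

    // Rules 1-3: always @(*), assign-before-read, blocking only
    always @(*) begin
        next_count = count;          // default assignment: no latch, no loop
        if (en)
            next_count = count + 4'd1;
    end

    // Rules 4-5: non-blocking only; reset instead of an initial block
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= 4'd0;           // reset provides the initial value
        else
            count <= next_count;     // rule 6: no # delays, time passes in cycles
    end
endmodule
```

Note that count is updated in exactly one always block, and each block sticks to a single assignment operator.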

  • Namaste FPGA Technologies reposted this

    Build Your Own Processor in a Week: A Beginner's Guide to CPU Design in Verilog

    Beginners love to build their own processors after learning Verilog to get hands-on experience with complex designs. Since processor architectures are part of most college curricula, many of us already have some familiarity with them, making it easier to understand the sub-blocks of a processor than to build Ethernet MACs, DDR controllers, or encryption modules. Let's look at how we can approach building our own processor in a week.

    A processor performs operations based on the instructions provided to it, but every instruction requires either the data itself or the address where the data is stored. So we usually begin by adding general-purpose registers (GPRs), which provide data for the instructions. These GPRs are placed close to the CPU on the same die, so we can't add too many of them without increasing the die size; typically, we limit the GPR count to 16 or 32. Once we have the GPRs, we can start adding arithmetic and logic instructions that operate on them and store their results back into the GPRs. This is the first step, where our processor supports arithmetic and logical instructions with register addressing. However, the program flow will always be sequential, since the program counter can only increment.

    Most processors need jump operations to perform repetitive tasks. Hence, the next step is adding condition flags. The most common condition flags are zero, carry, sign, and overflow. We look at the register used to store the result of the current instruction and generate values for these flags. Then we can add instructions based on each flag, for example, "jump if zero flag is set" or "jump if zero flag is clear." These add about eight more instructions for the four flags. So far, our processor includes jumps and supports register addressing.

    Next, we can add secondary memory by using BRAM as data memory, which adds direct addressing mode support to the processor. We also need memory to store the program. To keep the processor simple, we maintain separate data and instruction memories, following the Harvard architecture: the instruction memory (IMEM) is ROM, while the data memory (DMEM) is RAM. We then add separate instructions for direct addressing mode and, likewise, support for immediate data.

    This step completes a Very Small Instruction Set Processor with Harvard architecture, supporting register, direct, and immediate addressing, with 16 general-purpose registers and about 20 instructions (3 arithmetic, 5 logic, 8 jump, and 3 addressing-mode instructions). These are ideal starting points for learning computer architecture and stepping stones toward more complex architectures. If you wish to learn how to build a processor from scratch in Verilog, explore our beginner-friendly processor course here: https://lnkd.in/dDKt6j8j
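The condition-flag step described above can be sketched as follows. This is our own illustrative fragment with assumed widths and signal names, not the course's actual design: the ALU result drives the four flags that the jump instructions later test.

```verilog
// Hypothetical sketch of flag generation for a tiny 8-bit CPU.
module alu_flags (
    input  wire       clk,
    input  wire [7:0] a, b,
    input  wire       sub,            // 0: add, 1: subtract
    output reg  [7:0] result,
    output reg        zf, cf, sf, vf  // zero, carry, sign, overflow
);
    // 9-bit result so the carry/borrow out is visible in bit 8
    wire [8:0] sum = sub ? ({1'b0, a} - {1'b0, b})
                         : ({1'b0, a} + {1'b0, b});

    always @(posedge clk) begin
        result <= sum[7:0];
        zf <= (sum[7:0] == 8'd0);  // zero flag
        cf <= sum[8];              // carry (or borrow) out
        sf <= sum[7];              // sign: MSB of the result
        // signed overflow: effective operand signs agree, result sign differs
        vf <= (a[7] ^ sub ^ b[7]) ? 1'b0 : (a[7] ^ sum[7]);
    end
endmodule
```

A "jump if zero flag is set" instruction then simply loads the program counter with the target address whenever zf is 1.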

  • Namaste FPGA Technologies reposted this

    Online PCIe Courses review 🙃

    We recently went through a few tutorials available on the web about PCIe, and these are our observations. Most of them cover only theoretical aspects, e.g. in PCIe we use the LTSSM to negotiate link parameters, so we went through the numerous LTSSM states on the way to the link-up/L0 state. Most courses describe what we do in each state, but none of them focus on role-specific skills: a verification engineer requires only an overview of the LTSSM to build test sequences, while a design engineer needs to know the entry and exit conditions. In fact, when design engineers build the LTSSM, it actually consists of three sub-blocks: LTSSM_TX working on the TX side, LTSSM_RX working on the RX side, and the central LTSSM coordinating both. The TX LTSSM assumes the maximum number of ordered sets to be sent to the RX side so that the minimum condition required to move to the next state can be achieved within the timeout.

    Courses focusing on design should cover the design aspects of these sub-blocks: the role of each, the typical coding of each block, and, if it's a control path, how we enter and exit it. The commonly referenced PCIe books that skip implementation discuss most topics, e.g. the LTSSM, from a single perspective combining RX and TX, while an actual design separates them. Take an LTSSM state like DETECT, which has two substates, Quiet and Active. Quiet simply waits for a certain period to allow all modules to initialize before we start detecting the receiver in the Active substate. Most courses teach that we wait for a certain period in Quiet and then move to Active, where we find receiver presence. But from the TX perspective, we actually turn off the PIPE interface to keep the entire TX path idle, and the Active substate just checks the receiver status returned by the main LTSSM.

    These lectures make sense to those who have already worked on the design aspects, while those learning the LTSSM for the first time may think we have a single FSM where we simply wait and move to the next state. The RX LTSSM doesn't have a Detect state, since nothing is sent from the TX side, so it starts directly at the Polling state, something none of the courses teach. From a design perspective this matters, while for verification engineers we simply build a sequence that sends Idle for 12 ms and then reads the receiver status before the timeout is reached. As soon as we detect one receiver's presence, we move on to the next state.

    You can see that design engineers focus on how the TX and RX LTSSMs are built, while verification engineers focus on generating sequences independent of the internal LTSSM. However, most courses focus on neither design nor verification, offering only theoretical overviews. Be ready for our upcoming course covering the design aspects of the PCIe PHY in the first week of November. If you are not yet familiar with PCI, go ahead and build your PCI foundation before the PCIe release: https://lnkd.in/dFvsAM_n
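To make the Quiet/Active split concrete, here is a purely illustrative FSM sketch in the spirit of the description above. State names, the timeout value, and the PIPE-style signal names are our assumptions for illustration only; a real Detect state machine must follow the PCIe Base Specification.

```verilog
// Illustrative only, not spec-accurate: Detect with Quiet/Active substates.
module detect_fsm #(parameter TIMEOUT = 24'd1_200_000) ( // assumed 12 ms @ 100 MHz
    input  wire clk, rst_n,
    input  wire rx_detected,   // receiver status returned by the main LTSSM
    output reg  pipe_tx_idle,  // keep the entire TX path idle
    output reg  goto_polling   // handoff toward the next LTSSM state
);
    localparam QUIET = 1'b0, ACTIVE = 1'b1;
    reg        state;
    reg [23:0] timer;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            state        <= QUIET;
            timer        <= 24'd0;
            pipe_tx_idle <= 1'b1;
            goto_polling <= 1'b0;
        end else case (state)
            QUIET: begin                  // wait for all modules to initialize
                pipe_tx_idle <= 1'b1;
                timer        <= timer + 24'd1;
                if (timer == TIMEOUT)
                    state <= ACTIVE;
            end
            ACTIVE: begin                 // check receiver presence
                if (rx_detected)
                    goto_polling <= 1'b1;
            end
        endcase
    end
endmodule
```

Even this toy version shows the point made in the post: from the TX side, Detect is mostly about holding the interface idle and reacting to a status bit, not about "waiting and moving on" in a single monolithic FSM.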

