Last week, I described four design patterns for AI agentic workflows that I believe will drive significant progress: Reflection, Tool Use, Planning, and Multi-agent Collaboration. Instead of having an LLM generate its final output directly, an agentic workflow prompts the LLM multiple times, giving it opportunities to build step by step to higher-quality output.

Here, I'd like to discuss Reflection. It's relatively quick to implement, and I've seen it lead to surprising performance gains.

You may have had the experience of prompting ChatGPT/Claude/Gemini, receiving unsatisfactory output, delivering critical feedback to help the LLM improve its response, and then getting a better response. What if you automate the step of delivering critical feedback, so the model automatically criticizes its own output and improves its response? This is the crux of Reflection.

Take the task of asking an LLM to write code. We can prompt it to generate the desired code directly to carry out some task X. Then, we can prompt it to reflect on its own output, perhaps as follows:

"Here's code intended for task X: [previously generated code]. Check the code carefully for correctness, style, and efficiency, and give constructive criticism for how to improve it."

Sometimes this causes the LLM to spot problems and come up with constructive suggestions. Next, we can prompt the LLM with context including (i) the previously generated code and (ii) the constructive feedback, and ask it to use the feedback to rewrite the code. This can lead to a better response. Repeating the criticism/rewrite process might yield further improvements. This self-reflection process allows the LLM to spot gaps and improve its output on a variety of tasks, including producing code, writing text, and answering questions.
And we can go beyond self-reflection by giving the LLM tools that help it evaluate its output; for example, running its code through a few unit tests to check whether it generates correct results on test cases, or searching the web to double-check text output. Then it can reflect on any errors it found and come up with ideas for improvement.

Further, we can implement Reflection using a multi-agent framework. I've found it convenient to create two agents, one prompted to generate good outputs and the other prompted to give constructive criticism of the first agent's output. The resulting discussion between the two agents leads to improved responses.

Reflection is a relatively basic type of agentic workflow, but I've been delighted by how much it improved my applications' results. If you're interested in learning more about Reflection, I recommend:

- Self-Refine: Iterative Refinement with Self-Feedback, by Madaan et al. (2023)
- Reflexion: Language Agents with Verbal Reinforcement Learning, by Shinn et al. (2023)
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, by Gou et al. (2024)

[Original text: https://lnkd.in/g4bTuWtU ]
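The generate/critique/rewrite loop described above fits in a few lines of Python. This is a minimal sketch: `call_llm` stands in for whatever chat-completion client you use (it is a placeholder, not a real API), and the demo stubs it with canned responses just so the control flow is runnable:

```python
def reflect(task, call_llm, rounds=2):
    """Generate an answer, then repeatedly critique and rewrite it."""
    draft = call_llm(f"Carry out this task:\n{task}")
    for _ in range(rounds):
        critique = call_llm(
            f"Here's output intended for task: {task}\n\n{draft}\n\n"
            "Check it carefully for correctness, style, and efficiency, "
            "and give constructive criticism for how to improve it."
        )
        draft = call_llm(
            f"Task: {task}\n\nPrevious output:\n{draft}\n\n"
            f"Feedback:\n{critique}\n\nRewrite the output using the feedback."
        )
    return draft

# Toy stand-in for a real LLM client: returns canned responses in order,
# just to show that generate -> critique -> rewrite fires as expected.
responses = iter(["draft code", "critique: add error handling", "improved code"])
print(reflect("task X", lambda prompt: next(responses), rounds=1))  # improved code
```

In practice you would stop early when the critique reports no remaining issues, rather than always running a fixed number of rounds.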
Performance Optimization Techniques
-
Apache Spark has levels to it:

Level 0
You can run spark-shell or pyspark; it means you can start.

Level 1
You understand the Spark execution model:
• RDDs vs DataFrames vs Datasets
• Transformations (map, filter, groupBy, join) vs Actions (collect, count, show)
• Lazy execution & the DAG (Directed Acyclic Graph)
Master these concepts, and you'll have a solid foundation.

Level 2
Optimizing Spark queries:
• Understand the Catalyst optimizer and how it rewrites queries for efficiency.
• Master columnar storage and Parquet vs JSON vs CSV.
• Use broadcast joins to avoid shuffle nightmares.
• Shuffle operations are expensive; reduce them with partitioning and good data modeling.
• Coalesce vs repartition: know when to use them.
• Avoid UDFs unless absolutely necessary (they bypass Catalyst optimization).

Level 3
Tuning for performance at scale:
• Master spark.sql.autoBroadcastJoinThreshold.
• Understand how task parallelism works and set spark.sql.shuffle.partitions properly.
• Skewed data? Use adaptive query execution.
• Use EXPLAIN and queryExecution.debug to analyze execution plans.

Level 4
Deep dive into cluster resource management:
• Spark on YARN vs Kubernetes vs Standalone: know the tradeoffs.
• Understand executor vs driver memory; tune spark.executor.memory and spark.driver.memory.
• Dynamic allocation (spark.dynamicAllocation.enabled=true) can save costs.
• When to use RDDs over DataFrames (spoiler: almost never).

What else did I miss for mastering Spark and distributed compute?
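The Level 3 and Level 4 knobs above can be collected into a spark-defaults.conf fragment. The values here are illustrative starting points to tune against your own workload, not recommendations:

```
# Broadcast tables smaller than 10 MB instead of shuffling both join sides
spark.sql.autoBroadcastJoinThreshold   10485760
# Match shuffle partition count to your cluster's parallelism (default is 200)
spark.sql.shuffle.partitions           200
# Let adaptive query execution coalesce partitions and split skewed ones at runtime
spark.sql.adaptive.enabled             true
spark.sql.adaptive.skewJoin.enabled    true
# Scale executors with load instead of pinning a fixed count
spark.dynamicAllocation.enabled        true
spark.executor.memory                  8g
spark.driver.memory                    4g
```

The same settings can also be passed per job via `spark-submit --conf key=value`.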
-
6 proven techniques to increase productivity,
And reclaim your time:

This sheet highlights the
↳What
↳When
↳Why
↳And how
So you can start putting these to work today:

1) Eisenhower Matrix
What it is - A system to prioritize
When to use it - You feel busywork is keeping you from "real" work
Why it works - The least important tasks keep rising to the top because they're the easiest
How to use it - Sort your tasks into quadrants:
↳Important and urgent: do it now
↳Important but less urgent: schedule it
↳Not important but urgent: delegate it
↳Not important and not urgent: delete it

2) 80/20 Rule
What - A rule for focusing only on the most impactful work
When - You feel over-capacity and need to cut things
Why - 80% of outcomes come from 20% of causes, and results diminish quickly after that
How - Focus on just the most critical 20%:
↳20% of effort → 80% of results
↳20% of products → 80% of sales
↳20% of habits → 80% of impact
↳20% of innovations → 80% of growth

3) 1-3-5 Method
What - A tool for simplifying your to-do list so you can actually complete it
When - Your list is never-ending, and it's hard to know what to tackle
Why - In reality, committing to less work lets you finish more
How - The night before or morning of, choose for the day just:
↳1 key project (only 1!)
↳3 medium items
↳5 smaller items
↳Leave everything else off

4) Eat Your Frog
What - A commitment to do your most critical item first
When - You keep putting off an important (but scary or intimidating) task
Why - Doing it likely won't be as bad as you thought, and it builds momentum
How - Follow these 4 simple steps:
↳Identify the big task you're avoiding
↳Schedule time for it early in the day
↳Eat your frog: actually complete the task
↳Celebrate an early win and progress

5) Deep Work
What - A block of distraction-free time to work on a key item
When - You constantly get interrupted and can't focus
Why - Multitasking doesn't work; you dramatically increase productivity by focusing on just one thing
How - Create a deep work environment:
↳Schedule a block on your calendar
↳Put away your phone, exit your email, close Slack, shut the door
↳Focus on just 1 task for at least an hour (and preferably 2 to 3)

6) Pomodoro Technique
What - A style of working in intervals
When - You feel your energy fade over time, or your work seems too big
Why - Short bursts paired with breaks keep your energy and productivity up
How - Alternate medium work, short break:
↳Typical: work for 25 minutes, break for 5
↳Experiment to find what's best for you
↳Your break should be restful (breathing, time outside), not staring at your phone or answering email

The most productive people you know aren't superhuman,
They're simply using these strategies.

Put these to work,
And you'll soon get much more done AND have more time.

Any you'd add to this list?

---
♻️ Repost to help your network reclaim their time.
And follow me George Stern for more productivity content
-
Imagine using video game technology to solve one of the toughest challenges in nuclear fusion: detecting high-speed particle collisions inside a reactor with lightning-fast precision.

A team of researchers at UNIST has developed a groundbreaking algorithm inspired by collision detection in video games. This new method dramatically speeds up identifying particle impacts inside fusion reactors, essential for improving reactor stability and design. By cutting down unnecessary calculations, the algorithm enables real-time visualization and analysis, paving the way for safer and more efficient fusion energy development.

🎮 Gaming tech meets fusion science: The algorithm borrows from video game bullet-hit detection to track particle collisions.
⚡ 15x faster detection: It outperforms traditional methods by speeding up collision detection by up to fifteen times.
🔍 Smart calculation: Eliminates 99.9% of unnecessary computations with simple arithmetic shortcuts.
🌐 3D digital twin: Applied in the Virtual KSTAR, a detailed virtual model of the Korean KSTAR fusion reactor.
🚀 Future-ready: Plans to leverage GPU supercomputers for faster processing and enhanced reactor simulations.

#FusionEnergy #VideoGameTech #ParticleDetection #NuclearFusion #Innovation #AIAlgorithm #VirtualKSTAR #CleanEnergy #ScientificBreakthrough #HighSpeedComputing

https://lnkd.in/gfcssNTC
-
My next tutorial on pretraining an LLM from scratch is now out. It starts with a step-by-step walkthrough of understanding, calculating, and optimizing the loss. After training, we update the text generation function with temperature scaling and top-k sampling. And finally, we also load openly available pretrained weights into our scratch-built model architecture.

Along with this pretraining tutorial, I also have bonus material on speeding up LLM training. These tips apply not just to LLMs but also to other transformer-based models like vision transformers:

1. Instead of saving the causal mask, create it on the fly to reduce memory usage (here it has minimal effect, but it can add up in long-context models like Llama 3.2 with its 131k-input-token support)
2. Use tensor cores (only works on Ampere GPUs like the A100 and newer)
3. Use the fused CUDA kernels for `AdamW` by setting `fused=True`
4. Pre-allocate and re-use GPU memory via the pinned-memory setting in the data loader
5. Switch from 32-bit float to 16-bit brain float (bfloat16) precision
6. Replace from-scratch implementations of attention mechanisms, layer normalizations, and activation functions with PyTorch counterparts that have optimized CUDA kernels
7. Use FlashAttention for more efficient memory read and write operations
8. Compile the model
9. Optimize the vocabulary size
10. After saving memory with the steps above, increase the batch size

Video tutorial: https://lnkd.in/gDRycWea
PyTorch speed-ups: https://lnkd.in/gChvGCJH
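As a sketch of the temperature-scaling and top-k sampling step mentioned above, here is the idea in plain Python over a list of logits. This is an illustration of the technique, not the tutorial's actual PyTorch code:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from logits with temperature scaling and top-k filtering."""
    scaled = [l / temperature for l in logits]          # temperature scaling
    if top_k is not None:                               # top-k: mask all but the k largest
        kth = sorted(scaled, reverse=True)[min(top_k, len(scaled)) - 1]
        scaled = [l if l >= kth else float("-inf") for l in scaled]
    m = max(scaled)                                     # numerically stable softmax
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()                                    # inverse-CDF sampling
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

print(sample_next_token([1.0, 3.0, 2.0], temperature=0.7, top_k=1))  # 1 (top_k=1 is greedy)
```

Lower temperatures sharpen the distribution toward the argmax; `top_k=1` degenerates to greedy decoding regardless of temperature.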
-
Yes, even errors have to be on budget! 2 simple metrics you need to know before you call your system resilient.

You publish your source code into the wild with a promise: it will work most of the time. To make sure your system is reliable, you need to understand:

𝗘𝗿𝗿𝗼𝗿 𝗕𝘂𝗱𝗴𝗲𝘁𝗶𝗻𝗴
Error budgeting defines how much downtime is acceptable over a certain period. Let's say our goal is 99.9% uptime; then the allowable downtime can be calculated as follows:

SLO = 99.9% uptime
Total Time Period = 30 days (43,200 minutes)
Allowable Downtime = Total Time * (1 − Uptime %)
Allowable Downtime = 43,200 minutes * (1 − 0.999) = 43.2 minutes

So, your system can have a maximum of 43.2 minutes of downtime in 30 days. Knowing your error budget helps you decide when to add new features and when to focus on fixing problems.

𝗠𝗲𝗮𝗻 𝗧𝗶𝗺𝗲 𝘁𝗼 𝗥𝗲𝗰𝗼𝘃𝗲𝗿𝘆 (𝗠𝗧𝗧𝗥)
Mean Time to Recovery is the average time to fix a problem and get your system back up and running after an issue occurs. Let's say:

Total Downtime = 240 minutes
Number of Incidents = 6
MTTR = Total Downtime / Number of Incidents
MTTR = 240 minutes / 6 = 40 minutes

So, the average recovery time per incident is 40 minutes.

𝗞𝗲𝘆 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆𝘀
• Error Budget -> It's about balancing innovation with system reliability.
• MTTR -> How quickly you can bounce back from failures.
• Lower MTTR = Higher Resilience!

Resilience isn't just dodging failures; it's about planning for them and bouncing back fast.
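Both formulas above are one-liners; a quick Python sketch to sanity-check the arithmetic:

```python
def allowable_downtime_minutes(slo, period_minutes):
    """Error budget: the downtime an SLO permits over a given period."""
    return period_minutes * (1 - slo)

def mttr(total_downtime_minutes, incidents):
    """Mean Time to Recovery: average downtime per incident."""
    return total_downtime_minutes / incidents

print(allowable_downtime_minutes(0.999, 30 * 24 * 60))  # ~43.2 minutes over 30 days
print(mttr(240, 6))                                     # 40.0 minutes per incident
```

Spending the budget faster than it refills is the usual trigger to freeze feature work and focus on reliability.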
-
Guide to Building an AI Agent

1️⃣ 𝗖𝗵𝗼𝗼𝘀𝗲 𝘁𝗵𝗲 𝗥𝗶𝗴𝗵𝘁 𝗟𝗟𝗠
Not all LLMs are equal. Pick one that:
- Excels in reasoning benchmarks
- Supports chain-of-thought (CoT) prompting
- Delivers consistent responses
📌 Tip: Experiment with models & fine-tune prompts to enhance reasoning.

2️⃣ 𝗗𝗲𝗳𝗶𝗻𝗲 𝘁𝗵𝗲 𝗔𝗴𝗲𝗻𝘁’𝘀 𝗖𝗼𝗻𝘁𝗿𝗼𝗹 𝗟𝗼𝗴𝗶𝗰
Your agent needs a strategy:
- Tool Use: Call tools when needed; otherwise, respond directly.
- Basic Reflection: Generate, critique, and refine responses.
- ReAct: Plan, execute, observe, and iterate.
- Plan-then-Execute: Outline all steps first, then execute.
📌 Choosing the right approach improves reasoning & reliability.

3️⃣ 𝗗𝗲𝗳𝗶𝗻𝗲 𝗖𝗼𝗿𝗲 𝗜𝗻𝘀𝘁𝗿𝘂𝗰𝘁𝗶𝗼𝗻𝘀 & 𝗙𝗲𝗮𝘁𝘂𝗿𝗲𝘀
Set operational rules:
- How to handle unclear queries? (Ask clarifying questions)
- When to use external tools?
- Formatting rules? (Markdown, JSON, etc.)
- Interaction style?
📌 Clear system prompts shape agent behavior.

4️⃣ 𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁 𝗮 𝗠𝗲𝗺𝗼𝗿𝘆 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆
LLMs forget past interactions. Memory strategies:
- Sliding Window: Retain recent turns, discard old ones.
- Summarized Memory: Condense key points for recall.
- Long-Term Memory: Store user preferences for personalization.
📌 Example: A financial AI recalls risk tolerance from past chats.

5️⃣ 𝗘𝗾𝘂𝗶𝗽 𝘁𝗵𝗲 𝗔𝗴𝗲𝗻𝘁 𝘄𝗶𝘁𝗵 𝗧𝗼𝗼𝗹𝘀 & 𝗔𝗣𝗜𝘀
Extend capabilities with external tools:
- Name: Clear, intuitive (e.g., "StockPriceRetriever")
- Description: What does it do?
- Schemas: Define input/output formats
- Error Handling: How to manage failures?
📌 Example: A support AI retrieves order details via a CRM API.

6️⃣ 𝗗𝗲𝗳𝗶𝗻𝗲 𝘁𝗵𝗲 𝗔𝗴𝗲𝗻𝘁’𝘀 𝗥𝗼𝗹𝗲 & 𝗞𝗲𝘆 𝗧𝗮𝘀𝗸𝘀
Narrowly defined agents perform better. Clarify:
- Mission: (e.g., "I analyze datasets for insights.")
- Key Tasks: (Summarizing, visualizing, analyzing)
- Limitations: ("I don’t offer legal advice.")
📌 Example: A financial AI focuses on finance, not general knowledge.
7️⃣ 𝗛𝗮𝗻𝗱𝗹𝗶𝗻𝗴 𝗥𝗮𝘄 𝗟𝗟𝗠 𝗢𝘂𝘁𝗽𝘂𝘁𝘀
Post-process responses for structure & accuracy:
- Convert AI output to structured formats (JSON, tables)
- Validate correctness before user delivery
- Ensure correct tool execution
📌 Example: A financial AI converts extracted data into JSON.

8️⃣ 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝘁𝗼 𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗦𝘆𝘀𝘁𝗲𝗺𝘀 (𝗔𝗱𝘃𝗮𝗻𝗰𝗲𝗱)
For complex workflows:
- Info Sharing: What context is passed between agents?
- Error Handling: What if one agent fails?
- State Management: How to pause/resume tasks?
📌 Example:
1️⃣ One agent fetches data
2️⃣ Another summarizes
3️⃣ A third generates a report

Master the fundamentals, experiment, and refine. Now go build something amazing! Happy agenting! 🤖
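To make one step of the guide concrete, the sliding-window memory strategy from step 4 can be sketched as follows. This is a minimal illustration, not tied to any particular agent framework; the class and method names are my own:

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the most recent conversation turns in the prompt context."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # older turns fall off automatically

    def add(self, user_msg, assistant_msg):
        self.turns.append((user_msg, assistant_msg))

    def as_context(self):
        """Render the retained turns as text to prepend to the next prompt."""
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)

memory = SlidingWindowMemory(max_turns=2)
memory.add("Hi", "Hello!")
memory.add("What's Spark?", "A distributed compute engine.")
memory.add("And Ray?", "A Python framework for distributed workloads.")
print(memory.as_context())  # only the last two turns survive
```

Summarized and long-term memory follow the same interface; they differ only in what `as_context` reconstructs.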
-
A/B testing can increase conversions by 161%+. Yet most people don't know how to do A/B testing properly. Here are 10 copy elements you should test (and how to do it). From a $200M marketer:

Before we start... When running an A/B test, it's important to test only one variable at a time. This way, you'll know exactly which change impacted the metric you're optimizing for. Various tips below pertain to landing pages, emails, ads, etc.

1) Headline
Testing headlines is crucial as they directly impact how many people read the rest of your copy. You can test headlines by changing:
- Tone
- Length
- Emotional appeal
- Use of numbers
This will help you understand what catches your target's attention best.

2) Call-to-Actions
Testing CTAs is vital because it can lead to higher click rates & purchases. You can test CTAs by changing:
- Copy
- Placement
- Color & design
- Urgency and scarcity
This will give you insights into what combo of attributes drives the most clicks.

3) Value Proposition
Testing different value propositions allows you to communicate what your product offers in a better way. Test your value props by changing:
- Format
- Benefits
- Pain points
This will uncover the value proposition that resonates most with your target.

4) Body Copy Length
Testing the body copy length helps you find the balance between information and engagement. This testing is done by comparing short-form copy against long-form copy. This experimentation will reveal the ideal copy length that keeps readers engaged.

5) Emotional Appeal
Testing different emotional triggers allows you to tap into your target's desires and motivations. To test this, experiment with different emotions:
- Fear
- Anger
- Desire
This will help you create copy that deeply connects with your prospect.

6) Social Proof
Testing different presentations of social proof helps establish trust and credibility with your audience.
To test this, try different formats such as:
- UGC
- Testimonials
- Case studies
This will highlight the most compelling way to present social proof.

7) Pricing Strategies
Testing pricing strategies helps you optimize your pricing model for maximum conversions. Test pricing strategies by offering:
- Discounts
- Bonuses
- Financing
This will uncover the pricing approach that resonates best with your target.

8) Storytelling
Using storytelling techniques allows you to captivate your audience more easily. To test this, try incorporating:
- Stories
- Characters
- Narrative arcs
into your copy. This will help you create emotion-evoking and thought-provoking copy.

9) Formatting
Testing different types of formatting enhances the readability and scannability of your copy. Test formatting by varying:
- Text alignment
- Sentence length
- Word styling (bold/italics)
This will improve your copy's presentation & lead to higher engagement.

( #10 in the comments )
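The post doesn't cover how to decide which variant actually won, so as an aside: a standard way to compare two variants' conversion rates is a two-proportion z-test. A self-contained sketch in pure Python (the example counts are made up for illustration):

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: returns (z score, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)        # pooled conversion rate
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided, via the normal CDF
    return z, p_value

# Hypothetical numbers: variant A converts 200/4000, variant B 260/4000.
z, p = ab_test_z(conv_a=200, n_a=4000, conv_b=260, n_b=4000)
print(f"z={z:.2f}, p={p:.4f}")  # a small p suggests B's lift is unlikely to be chance
```

The usual convention is to declare a winner only when p falls below a threshold (often 0.05) chosen before the test starts, and to fix the sample size in advance rather than peeking.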
-
A sluggish API isn't just a technical hiccup – it's the difference between retaining and losing users to competitors. Let me share some battle-tested strategies that have helped many achieve 10x performance improvements: 1. 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁 𝗖𝗮𝗰𝗵𝗶𝗻𝗴 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝘆 Not just any caching – but strategic implementation. Think Redis or Memcached for frequently accessed data. The key is identifying what to cache and for how long. We've seen response times drop from seconds to milliseconds by implementing smart cache invalidation patterns and cache-aside strategies. 2. 𝗦𝗺𝗮𝗿𝘁 𝗣𝗮𝗴𝗶𝗻𝗮𝘁𝗶𝗼𝗻 𝗜𝗺𝗽𝗹𝗲𝗺𝗲𝗻𝘁𝗮𝘁𝗶𝗼𝗻 Large datasets need careful handling. Whether you're using cursor-based or offset pagination, the secret lies in optimizing page sizes and implementing infinite scroll efficiently. Pro tip: Always include total count and metadata in your pagination response for better frontend handling. 3. 𝗝𝗦𝗢𝗡 𝗦𝗲𝗿𝗶𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻 This is often overlooked, but crucial. Using efficient serializers (like MessagePack or Protocol Buffers as alternatives), removing unnecessary fields, and implementing partial response patterns can significantly reduce payload size. I've seen API response sizes shrink by 60% through careful serialization optimization. 4. 𝗧𝗵𝗲 𝗡+𝟭 𝗤𝘂𝗲𝗿𝘆 𝗞𝗶𝗹𝗹𝗲𝗿 This is the silent performance killer in many APIs. Using eager loading, implementing GraphQL for flexible data fetching, or utilizing batch loading techniques (like DataLoader pattern) can transform your API's database interaction patterns. 5. 𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀 GZIP or Brotli compression isn't just about smaller payloads – it's about finding the right balance between CPU usage and transfer size. Modern compression algorithms can reduce payload size by up to 70% with minimal CPU overhead. 6. 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗣𝗼𝗼𝗹 A well-configured connection pool is your API's best friend. 
Whether it's database connections or HTTP clients, maintaining an optimal pool size based on your infrastructure capabilities can prevent connection bottlenecks and reduce latency spikes. 7. 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁 𝗟𝗼𝗮𝗱 𝗗𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 Beyond simple round-robin – implement adaptive load balancing that considers server health, current load, and geographical proximity. Tools like Kubernetes horizontal pod autoscaling can help automatically adjust resources based on real-time demand. In my experience, implementing these techniques reduces average response times from 800ms to under 100ms and helps handle 10x more traffic with the same infrastructure. Which of these techniques made the most significant impact on your API optimization journey?
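To make the first technique concrete, here is a minimal cache-aside sketch with TTL-based invalidation. An in-process dict stands in for Redis/Memcached, and the loader function is hypothetical:

```python
import time

class CacheAside:
    """Check the cache first; on a miss, load from the source and cache the result."""

    def __init__(self, ttl_seconds=60.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key, loader):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                      # cache hit
        value = loader(key)                      # cache miss: hit the real source
        self._store[key] = (value, time.monotonic() + self.ttl)
        return value

    def invalidate(self, key):
        self._store.pop(key, None)  # call whenever the source data changes

calls = []
def load_user(key):          # stands in for a slow database query
    calls.append(key)
    return {"id": key, "name": f"user-{key}"}

cache = CacheAside(ttl_seconds=60)
cache.get(42, load_user)
cache.get(42, load_user)     # served from cache; the loader runs only once
print(len(calls))  # 1
```

With a real Redis deployment the pattern is the same; only the `get`/`set` calls change, and explicit invalidation on writes is what keeps the cache from serving stale data.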
-
Excited to share our end-to-end LLM workflows guide that we've used to help our industry customers fine-tune and serve OSS LLMs that outperform closed-source models in quality, performance, and cost.

Key LLM workloads with docs.ray.io and Anyscale:
- 🔢 Preprocess our dataset (filter, schema, etc.) with batch data processing.
- 🛠️ Fine-tune our LLMs (e.g., Meta Llama 3) with full control (LoRA/full param, compute, loss, etc.) and optimizations (parallelism, mixed precision, flash attention, etc.) with distributed training.
- ⚖️ Evaluate our fine-tuned LLMs with batch inference using Ray + vLLM.
- 🚀 Serve our LLMs as a production application that can autoscale, swap between LoRA adapters, optimize for latency/throughput, etc.

Key Anyscale infra capabilities that keep these workloads efficient and cost-effective:
- ✨ Automatically provision worker nodes (e.g., GPUs) based on the workload's needs. They spin up, run the workload, and then scale back to zero (you only pay for compute when needed).
- 🔋 Execute workloads (e.g., fine-tuning) with commodity hardware (A10s) instead of waiting for inaccessible resources (H100s), using data/model parallelism.
- 🔙 Configure spot-instance to on-demand fallback (or vice versa) for cost savings.
- 🔄 Swap between multiple LoRA adapters using one base model (optimized with multiplexing).
- ⚡️ Autoscale to meet demand and scale back to zero.

🆓 You can run this guide entirely for free on Anyscale (no credit card needed). Instructions in the links below 👇

🔗 Links:
- Blog post: https://lnkd.in/gvPQGzjh
- GitHub repo: https://lnkd.in/gxzzuFAE
- Notebook: https://lnkd.in/gmMxb36y