Can you really build an AI pipeline in just 72 hours?
That was the challenge Joshua Teo, Lucas Sam, and I took on this past weekend!
In just three days, we:
⚡ Built our very first ML pipeline from scratch
⚡ Engineered custom features and integrated 3 open-source models
⚡ Processed nearly 200,000 Google Maps reviews
⚡ Trained an AI model that filters out spam, irrelevant, and rant reviews
It was an intense but rewarding sprint, and we’re proud of what we managed to put together.
👉 Check out our submission here: https://lnkd.in/g53UA55J?
If you found it interesting, we’d love your support with a vote. Feel free to drop any questions! We’re happy to share what we learned!
Hi, my name is Cameron, and together with Lucas and Joshua we are Team Brothers, and this is our submission. For our submission, we've built an AI/ML pipeline for filtering Google location reviews. I'll walk you through the actual code, organized across five notebooks: feature engineering, labeling, training, active learning, and evaluation. Each one is modular, so the pipeline can be repeated or scaled.
Starting with feature engineering, we generate structured features from reviews, users, and images, capturing counts, text metrics, and behavioral signals. We then move to labeling, where we use multiple LLMs to assign ground-truth labels and run agreement checks to ensure consistency. Next is model training, where we train our model on the engineered features and optimize with validation feedback. We then move on to active learning, where the model predicts labels on new data and we do class-aware sampling, which ensures balanced categories; those samples are fed back into labeling so the pipeline can be repeated multiple times. And lastly, we do model evaluation, where we test against held-out data and measure accuracy, precision, recall, and reliability metrics.
Let me walk you through the exact code now. Here is feature engineering, and you can see the exact features we have engineered. They range from sentiment analysis to time-based features, topic modeling, keyword extraction, policy insights, stylistic features, emotional and expressive markers, as well as user information and photo information.
After these features are done, we move on to labeling. For labeling, we use multiple open-source LLMs to assign the true label to each data point. As you can see here, we are using Mistral, Gemma, and Llama for this. At the end of the prompts we have a labeled dataset. After that, we do encoding and data preparation, such as standardization, normalization, and data splitting, to get our data ready for training the model.
Next we go to the model training notebook. The model has three branches: a text branch, a categorical branch, and a continuous branch, which you can see over here. In the forward pass, the three branches are concatenated and passed through a classifier to predict the class logits. The training loop is handled by the train_enhanced_model method: it computes class weights, tokenizes the text with AutoTokenizer, and trains with the Hugging Face Trainer using early stopping. It then saves the trained weights and the tokenizer, and you can see our final output over here.
After that, we head to the active learning notebook. Here we score our unlabeled reviews using the trained model. The method calculate_uncertainty computes entropy-based uncertainty, and class_aware_sampling ensures balanced sampling by choosing the most uncertain samples per class to avoid imbalanced training. At the end, we return the IDs of these samples, which we feed back into the labeling cycle.
Last but not least, we have our model evaluation, where we compute per-class precision, recall, and F1.
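For readers following along without the repo, here is a minimal sketch of the kind of three-branch classifier the walkthrough describes (text, categorical, and continuous branches concatenated and passed through a classifier head). The class name, branch sizes, and the choice of bert-base-uncased are illustrative assumptions, not the team's actual code:

```python
# Illustrative sketch only: a three-branch review classifier of the kind
# described above, not the team's actual implementation.
import torch
import torch.nn as nn
from transformers import AutoModel

class ThreeBranchClassifier(nn.Module):
    def __init__(self, text_model="bert-base-uncased",
                 n_categorical=10, n_continuous=20, n_classes=4):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)      # text branch
        hidden = self.text_encoder.config.hidden_size
        self.cat_branch = nn.Sequential(nn.Linear(n_categorical, 64), nn.ReLU())
        self.cont_branch = nn.Sequential(nn.Linear(n_continuous, 64), nn.ReLU())
        self.classifier = nn.Linear(hidden + 64 + 64, n_classes)       # fused head

    def forward(self, input_ids, attention_mask, cat_feats, cont_feats):
        # Use the [CLS] token embedding as the text representation.
        text_out = self.text_encoder(input_ids=input_ids,
                                     attention_mask=attention_mask)
        text_vec = text_out.last_hidden_state[:, 0]
        # Concatenate the three branches and predict the class logits.
        fused = torch.cat([text_vec,
                           self.cat_branch(cat_feats),
                           self.cont_branch(cont_feats)], dim=-1)
        return self.classifier(fused)
```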
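The active-learning step is also easy to picture in code: entropy-based uncertainty, then class-aware sampling of the most uncertain examples per class. The function names mirror those mentioned in the walkthrough (calculate_uncertainty, class_aware_sampling), but the bodies below are an assumed sketch, not the submitted notebook:

```python
# Sketch of entropy-based uncertainty + class-aware sampling for active learning.
import numpy as np

def calculate_uncertainty(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of softmax probabilities (higher = more uncertain)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def class_aware_sampling(probs: np.ndarray, ids: np.ndarray, per_class: int = 50):
    """Pick the most uncertain examples per predicted class so the batch
    sent back for re-labeling stays balanced across categories."""
    uncertainty = calculate_uncertainty(probs)
    predicted = probs.argmax(axis=1)
    selected = []
    for cls in np.unique(predicted):
        idx = np.where(predicted == cls)[0]
        top = idx[np.argsort(-uncertainty[idx])][:per_class]
        selected.extend(ids[top].tolist())
    return selected  # IDs fed back into the labeling notebook
```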
You can see what's happening over here. To calibrate our probabilities, we have a helper method that converts raw softmax scores into confidence levels. This notebook also includes error analysis and even a cost-of-false-positives table to show business impact. So in summary, we have five main notebooks, and that is what each of them does. You can find more information in our GitHub repository, and we'll be happy to take any questions. Thank you.
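As a rough illustration of that evaluation step, per-class precision/recall/F1 plus a softmax-to-confidence helper could look like the sketch below. The bucket thresholds and label names are assumptions for illustration, not values from the submission:

```python
# Sketch of per-class metrics and a softmax-to-confidence helper,
# loosely following the evaluation notebook described above.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def per_class_report(y_true, y_pred, labels):
    """Precision, recall, F1, and support for each class label."""
    p, r, f1, support = precision_recall_fscore_support(
        y_true, y_pred, labels=labels, zero_division=0)
    return {lab: {"precision": p[i], "recall": r[i],
                  "f1": f1[i], "n": int(support[i])}
            for i, lab in enumerate(labels)}

def probs_to_confidence(probs: np.ndarray):
    """Map each row's max softmax score to a coarse confidence level."""
    top = probs.max(axis=1)
    return ["high" if s >= 0.9 else "medium" if s >= 0.7 else "low" for s in top]
```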
"Don't fall in love with the solution / tool, fall in love with the problem"
Okay cool but... when something truly novel comes along (GenAI in the last few years), why WOULDN'T you brainstorm all sorts of use cases for it? You might find solutions for problems that weren't even on your radar because, with the previously available tech, they were completely intractable.
By all means check those brainstormed use cases to see if they're actually solving a problem, but don't limit yourself to the already identified ones.
---
📧 Daily newsletter on building AI/software projects that deliver. Link in profile.
I just got an invite to try Comet, the new AI assistant from Perplexity.
The pitch is bold: browse at the speed of thought.
From what I see, it goes beyond search.
→ Real-time answers pulled from the web.
→ Summaries of long documents and articles.
→ Email and calendar integration.
→ A built-in assistant that can actually complete tasks, not just suggest them.
That makes it feel less like a search engine and more like a true productivity layer.
Here is my question to the network:
→ Has anyone here tested Comet hands-on?
→ What stood out compared to other AI assistants or browsers?
→ Would you use it daily, or does it still feel like hype?
Curious to hear real experiences before I dive in.
Struggling to understand all the terms around RAG, LLMs, and vector search?
You’re building pipelines, tweaking prompts, and deploying agents — but the jargon slows you down.
This cheat sheet covers the 25 most important RAG terms — with simple, clear definitions to help you speak the language of GenAI.
✅ Learn terms like: Chunking, Hybrid Search, Reranking, Few-shot Prompting, Context Injection & more.
➕ Follow Naresh Edagotti for more content that makes complex AI topics feel simple.
Already know the basics? Time to build something real.
Here’s Chapter 1 where you’ll create your first simple AI model and explore real-world applications.
🛠 Hands-on projects
📊 No-code & code options
📌 Resources to follow along
📖 Read the first chapter for free and get the full step-by-step guide here:
🔗 https://lnkd.in/dSMM2zbj
Attention, fellow digital marketers and tech enthusiasts! Ever wondered what happens when GPT-5 gets a thought-provoking makeover as "Research Goblin"? Spoiler alert: It turns searching for information into an exhilarating adventure!
Our latest blog post dives deep into how this sharp-witted AI not only enhances your search game but does so with a flair that’ll leave you wondering why you ever relied on traditional methods. Say goodbye to tedious scrolling and hello to a smarter way of sifting through data.
Curious yet? Check out the full scoop right here: https://ift.tt/JqVAosd. Get ready to level up your search strategy!
AI usage—and costs—are climbing fast. I’ve been trying to get the most out of AI in my programming workflow without spending a tonne.
I’ve compared API pricing and subscription plans across the major providers, and I’ve ended up leaning on the daily allocations from subscriptions instead of metered API tokens.
In the video, I’ll dig into why that’s a bit strange—and why these subscriptions are surprisingly compelling.
As a heavy user of GPT and other GenAI tools, I’ve learned an important lesson: stop arguing with the models and fix what I know I need to fix myself.
It’s fine to circle back a few times, but after spending hours in unproductive iterations, I realized it’s far more effective to invest upfront in better prompt engineering.
Asking AI something vague like “refactor my codebase with no errors” only starts an unnecessary fight—one with no winners, and usually just one whiner.
In this video I cover the 4 biggest ways AI-generated code can mess up your codebase, and strategies to keep things clean and maintainable.
Here’s what I get into:
- How to handle code inflation brought in by AI
- How to minimize "standards drift"
- How to handle AI's inherent "Context blindness"
- 7 practical strategies you can use right now to keep your codebase under control
If you’re using AI in your dev workflow, this will help you ship faster without turning your repo into chaos.
🎤 Curious to hear from you:
What’s the worst AI-generated mess you’ve had to fix?
And how are you keeping AI in check in your codebase?
Just spent the weekend test-driving 40+ AI tools so you don't have to.
Honest take? The landscape has shifted dramatically in the last year.
Standouts that actually delivered:
- Synthesia for AI video (way better than shooting stuff yourself)
- Manus for automation (finally replaced my VA for data tasks)
- NotebookLM turned my chaotic research into an audio summary I could listen to while running
Most overhyped? Those AI search engines still spitting out hallucinations half the time. And don't get me started on how many "AI email assistants" just reword things into corporate jargon.
But here's what nobody's talking about: The real game-changer isn't any single tool - it's how founders are building entire workflows by connecting 3-4 specialized AI tools together.
Who's actually using these in interesting ways? Drop your setup in the comments. Always looking to upgrade my stack.
If AI doesn’t cite you, your content might as well not exist.
Here’s how to increase your visibility chances:
▸ Quotes: Add concise, credible statements worth pulling
▸ Stats: Original data or fresh angles that models want to reference
▸ Structure: Clear sections, headers, and formatting for easy parsing
▸ Metadata: Titles and descriptions written for humans and machines
▸ Reddit & community signals: Conversations that validate your authority
You don’t need more blog posts, you need to make the ones you have AI-ready.
Read the full Labs blog breakdown on what Generative Engine Optimization (GEO) is and how to do it, with examples.
Link in the first comment. 👇
Congrats Cameron!! I agree, TechJam was fun and exciting 🔥