Joachim Schork’s Post


Data Science Education & Consulting

Overfitting is a critical issue in machine learning and data modeling. It occurs when a model learns the training data too well, capturing noise and minor fluctuations rather than the underlying pattern. The result is a model that performs exceptionally on training data but poorly on new, unseen data.

Overfitting can lead to:

✅ Poor generalization, meaning the model fails to perform well on new data.
✅ High variance, where the model's predictions vary greatly across different data sets.
✅ Increased error rates on unseen data, making the model unreliable for real-world applications.
✅ Misleading insights, as the model's decisions are based on noise rather than true patterns.
✅ Inefficient models, which are overly complex and harder to interpret.

A visualization from a Wikipedia article illustrates this issue (link: https://lnkd.in/eP5gS4U6). The green line represents an overfitted model and the black line a regularized model. While the green line follows the training data most closely, it is too dependent on that data and is likely to have a higher error rate on new, unseen data (illustrated by the black-outlined dots) than the black line.

For regular tips on data science, statistics, Python, and R programming, check out my free email newsletter. More info: http://eepurl.com/gH6myT

#datascienceeducation #businessanalyst #database #datasciencecourse
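The training-versus-test gap described above can be reproduced in a few lines of Python. This is a minimal illustrative sketch (the sine data, noise level, and polynomial degrees are my own choices, not from the post): a high-degree polynomial drives the training error toward zero while the error on fresh data stays high.

```python
import numpy as np

rng = np.random.default_rng(42)

# Noisy samples from a simple underlying pattern (a sine curve)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, 20)
x_test = np.sort(rng.uniform(0, 1, 20))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, 20)

def mse(degree):
    # Fit a polynomial of the given degree to the training data only,
    # then measure mean squared error on both training and test points.
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_err, test_err

for degree in (1, 3, 15):
    train_err, test_err = mse(degree)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-15 fit plays the role of the green line in the figure: its training error is tiny because it chases the noise, but its test error is clearly worse than its training error.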

Dmitrii Bychenko

Senior Software Developer at SOTI

2mo

I doubt we can show overfitting in just one picture: overfitting appears when a model absorbs random noise into itself, which is why we see a (near) perfect fit on the training data (noise included) but a poor fit on cross-validation or test data (the training noise baked into the model spoils the fun). From one picture we can only suspect overfitting; to be sure, we need at least some test data.
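This point — that only held-out data can confirm overfitting — can be sketched with a small hand-rolled k-fold cross-validation in numpy (the data, fold count, and degrees here are illustrative assumptions, not from the comment):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 30)

def cv_mse(degree, k=5):
    """Mean squared error averaged over k held-out folds."""
    idx = np.arange(len(x))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # everything not in this fold
        coefs = np.polyfit(x[train], y[train], degree)
        errs.append(np.mean((np.polyval(coefs, x[fold]) - y[fold]) ** 2))
    return np.mean(errs)

for degree in (1, 3, 12):
    print(f"degree {degree:2d}: CV MSE {cv_mse(degree):.3f}")
```

On training data alone the degree-12 polynomial would look like the best model; the cross-validated error exposes it as the worst of the flexible fits.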

Imogen Hull

Data Scientist | AI Ethics & Age Bias Research | Age-Inclusive AI Advocate | Writer & Researcher at Beyond the Average

2mo

A very common problem!

Alex Brooks

IT Business Analyst | Data Analytics | Math & Statistics

2mo

When I first started doing ML in school I got pretty excited when my accuracy was 90%…that is, until I tested it.

Robert Rachford

CEO of Better Biostatistics 🔬 A Biometrics Consulting Network for the Life Sciences 🌎 Father 👨🏻🍼

2mo

Overfitting is one of those concepts that seems simple in theory, but in practice, I’ve seen it sneak into everything from clinical trial modeling to operational forecasting. What resonates most with me is your point on misleading insights. In applied settings, an overfit model doesn’t just “underperform,” it can quietly shape decisions on faulty assumptions. That’s why I’m such a believer in balancing model accuracy with interpretability and stress-testing results across multiple datasets or simulation scenarios. Appreciate you highlighting this so clearly. Joachim Schork

Yogesh Mudliar

Data Scientist @ iPUMPNET | Empowering Businesses with Data-Driven AI Solutions | Generative & Agentic AI | Driving Innovation Through AI-Powered Insights & Automation

2mo

Thanks for sharing, Joachim


