Overfitting is a critical issue in machine learning and data modeling. It occurs when a model learns the training data too well, capturing noise and minor fluctuations rather than the underlying pattern. The result is a model that performs exceptionally well on training data but poorly on new, unseen data. Overfitting can lead to:

✅ Poor generalization, meaning the model fails to perform well on new data.
✅ High variance, where the model's predictions vary greatly across different data sets.
✅ Increased error rates on unseen data, making the model unreliable for real-world applications.
✅ Misleading insights, as the model's decisions are based on noise rather than true patterns.
✅ Inefficient models, which are overly complex and harder to interpret.

A visualization from a Wikipedia article illustrates this issue (link: https://lnkd.in/eP5gS4U6). The green line represents an overfitted model and the black line a regularized model. While the green line follows the training data most closely, it is too dependent on that data and is likely to have a higher error rate on new, unseen data (illustrated by the black-outlined dots) than the black line. The sketch below reproduces this contrast in code.

For regular tips on data science, statistics, Python, and R programming, check out my free email newsletter. More info: http://eepurl.com/gH6myT

#datascienceeducation #businessanalyst #database #datasciencecourse
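A minimal sketch of the green-line/black-line contrast, assuming scikit-learn and NumPy are available (the sine-curve data, polynomial degree, and alpha value are illustrative choices, not from the figure): an unregularized high-degree polynomial chases the noise, while a ridge-regularized fit of the same degree stays closer to the underlying pattern.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)

# Noisy samples from a simple underlying pattern: y = sin(x) + noise
X = np.sort(rng.uniform(0, 6, 30)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 30)

# Fresh, noise-free points stand in for "new, unseen data"
X_new = np.linspace(0, 6, 200).reshape(-1, 1)
y_new = np.sin(X_new).ravel()

# "Green line": unregularized degree-15 polynomial, free to chase the noise
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X, y)

# "Black line": same features, but ridge regularization shrinks the wiggles
regularized = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=1.0))
regularized.fit(X, y)

for name, model in [("overfit", overfit), ("regularized", regularized)]:
    train_mse = mean_squared_error(y, model.predict(X))
    new_mse = mean_squared_error(y_new, model.predict(X_new))
    print(f"{name:12s} train MSE = {train_mse:.3f}, unseen MSE = {new_mse:.3f}")
```

Typically the unregularized fit wins on training error but loses badly on the unseen points, which is exactly the trade-off the figure shows.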
A very common problem!
When I first started doing ML in school, I got pretty excited when my accuracy was 90%… that is, until I tested it on unseen data.
Overfitting is one of those concepts that seems simple in theory, but in practice, I’ve seen it sneak into everything from clinical trial modeling to operational forecasting. What resonates most with me is your point on misleading insights. In applied settings, an overfit model doesn’t just “underperform,” it can quietly shape decisions on faulty assumptions. That’s why I’m such a believer in balancing model accuracy with interpretability and stress-testing results across multiple datasets or simulation scenarios. Appreciate you highlighting this so clearly. Joachim Schork
Thanks for sharing, Joachim
I doubt we can show overfitting in just one picture: overfitting appears when a model tries to include random noise in itself, which is why we see a (near) perfect fit on the training data (noise included) but a poor fit on cross-validation or test data (the training noise baked into the model spoils the fun). From a single picture we can only suspect overfitting; to be sure, we need at least test data.
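To that point, one way to confirm rather than just suspect overfitting is to compare training scores against held-out scores. A minimal sketch using scikit-learn's cross_validate (the synthetic dataset and unlimited-depth tree are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_validate

# Synthetic classification data; a fully grown tree can memorize its noise
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           random_state=0)

# No depth limit, so the tree is prone to overfitting
tree = DecisionTreeClassifier(random_state=0)

scores = cross_validate(tree, X, y, cv=5, return_train_score=True)
print("mean train accuracy:", np.mean(scores["train_score"]))  # near 1.0
print("mean CV accuracy:   ", np.mean(scores["test_score"]))   # noticeably lower

# A large gap between training and cross-validation accuracy is the
# evidence of overfitting that a single fitted-curve picture can't provide.
```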