The Mother of All Database Normalization Debates on Coding Horror
http://highscalability.com/blog/2008/7/16/the-mother-of-all-database-normalization-debates-on-coding-h.html
Normalization is not magical fairy dust you sprinkle over your database
to cure all ills; it often creates as many problems as it solves. (Jeff)
Normalize until it hurts, denormalize until it works. (Jeff)
Use materialized views, which are tables created and maintained by
your RDBMS. A materialized view will act exactly like a denormalized
table would - except you keep your original normalized structure, and
any change to the original data will propagate to the view
automatically. (Goran)
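Goran's idea can be sketched in miniature. SQLite has no native materialized views, but a cache table kept in sync by triggers behaves the same way: reads hit the denormalized copy, while the normalized tables remain the source of truth and the database itself propagates changes. The schema and names below are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE post (id INTEGER PRIMARY KEY,
                   author_id INTEGER REFERENCES author(id), title TEXT);

-- The "materialized view": a denormalized copy maintained by the database.
CREATE TABLE post_with_author (post_id INTEGER PRIMARY KEY,
                               title TEXT, author_name TEXT);

CREATE TRIGGER post_ins AFTER INSERT ON post BEGIN
  INSERT INTO post_with_author VALUES
    (NEW.id, NEW.title, (SELECT name FROM author WHERE id = NEW.author_id));
END;

-- A change to the original normalized data propagates automatically.
CREATE TRIGGER author_upd AFTER UPDATE OF name ON author BEGIN
  UPDATE post_with_author SET author_name = NEW.name
  WHERE post_id IN (SELECT id FROM post WHERE author_id = NEW.id);
END;
""")

conn.execute("INSERT INTO author VALUES (1, 'Jeff')")
conn.execute("INSERT INTO post VALUES (10, 1, 'On Normalization')")
conn.execute("UPDATE author SET name = 'Jeff Atwood' WHERE id = 1")

row = conn.execute("SELECT title, author_name FROM post_with_author").fetchone()
print(row)  # ('On Normalization', 'Jeff Atwood')
```

A real RDBMS with materialized views (Oracle, PostgreSQL, SQL Server indexed views) handles the maintenance for you; the triggers here just make the mechanism visible.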
According to Codd and Date, table names should be singular, but what
did they know. (Pablo, LOL)
Denormalization is something that should only be attempted as an
optimization when EVERYTHING else has failed. Denormalization
brings with it its own set of problems. You have to deal with the
increased set of writes to the system (which increases your I/O costs), you
have to make changes in multiple places when data changes (which means
either taking giant locks - ugh - or accepting that there might be temporary
or permanent data integrity issues) and so on. (Dare Obasanjo)
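Dare's "changes in multiple places" cost is easy to demonstrate. In the hypothetical schema below, a customer's email has been copied onto every order row for read speed; a single logical change now requires coordinated writes to both tables inside one transaction, or the copies drift apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT);
-- Denormalized: customer_email is duplicated onto every order.
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER, customer_email TEXT);
""")
conn.execute("INSERT INTO customer VALUES (1, 'old@example.com')")
conn.executemany("INSERT INTO orders VALUES (?, 1, 'old@example.com')",
                 [(n,) for n in range(3)])

# One logical change, two physical writes. Wrapping them in a transaction
# is what costs you the "giant locks"; skipping the second UPDATE is what
# costs you the data integrity.
with conn:
    conn.execute("UPDATE customer SET email = 'new@example.com' WHERE id = 1")
    conn.execute("UPDATE orders SET customer_email = 'new@example.com' "
                 "WHERE customer_id = 1")

emails = {e for (e,) in conn.execute("SELECT customer_email FROM orders")}
print(emails)  # {'new@example.com'}
```

In the normalized design the second UPDATE simply does not exist, which is the whole of Dare's argument.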
What happens is that people see "Normalisation = Slow", and that makes
them assume that normalisation isn't needed: "My data retrieval needs to
be fast, therefore I am not going to normalise!" (Tubs)
You can read fast and store slow, or you can store fast and read slow.
The biggest performance killer is the so-called physical read: finding and
accessing data on disk is the slowest operation. Unless the child table is
cluster-indexed and you're using the cluster index in the join, you will be
making lots of small random-access reads on the disk to find and access
the child table data. This will be slow. (Goran)
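Goran's clustering point can be approximated in SQLite, where a WITHOUT ROWID table is physically clustered on its primary key: declaring the key as (parent_id, id) keeps each parent's child rows adjacent on disk, so the join walks a contiguous range instead of scattering random reads. The table names are made up for the sketch; EXPLAIN QUERY PLAN shows the key being used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
-- Clustered on (parent_id, id): all children of one parent sit together.
CREATE TABLE child (
  parent_id INTEGER,
  id INTEGER,
  payload TEXT,
  PRIMARY KEY (parent_id, id)
) WITHOUT ROWID;
""")

plan = conn.execute("""
  EXPLAIN QUERY PLAN
  SELECT p.name, c.payload
  FROM parent p JOIN child c ON c.parent_id = p.id
""").fetchall()
for row in plan:
    print(row[-1])  # the planner resolves the join via a primary-key search
```

The exact plan text varies by SQLite version, but the join side resolved through a PRIMARY KEY search is the cheap sequential case; drop the composite key and the engine falls back to the scattered lookups Goran warns about.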
The biggest scalability problems I face are with human processes, not
computer processes. (John)
Don't forget that the fastest database query is the one that doesn't
happen, i.e. caching is your friend. (Chris)
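Chris's point in miniature: put a memoizing cache in front of the query and most calls never reach the database at all. This is a deliberately naive sketch (a real cache needs invalidation when the underlying row changes); the schema and counter are illustrative only.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO user VALUES (1, 'Chris')")

queries = 0  # counts how often the database is actually hit

@lru_cache(maxsize=1024)
def user_name(user_id):
    global queries
    queries += 1
    row = conn.execute("SELECT name FROM user WHERE id = ?",
                       (user_id,)).fetchone()
    return row[0] if row else None

for _ in range(1000):
    user_name(1)   # 999 of these calls never touch the database

print(queries)  # 1
```

The fastest query really is the one that doesn't happen: a thousand lookups cost one physical read.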
Normalization is about design, denormalization is about optimization.
(Peter Becker)
You're just another knucklehead. (BuggyFunBunny)
Let's unroll our loops next. RDBMS is about shared *transactional
data*. If you really don't care about keeping the data right all the time,
then how you store it doesn't matter. (Christog)
Jeff, are you awake? (wiggle)
Denormalization may be all well and good when you need the
performance and your system is STABLE enough to support it. Doing this
in a business environment is a recipe for disaster - ask anyone who has
spent weeks digging through thousands of lines of legacy code, making
sure to support the new and absolutely required affiliation_4. Then do the
whole thing over again 3 months later when some crazy customer has five
affiliations. (Sean)
Do you sex a cat, or do you gender it? (Simon)
This is why this article is wrong, Jeff. This is why you're an idiot, in
case the first statement wasn't clear enough. You just gave an excuse to be
[…] (Greg)
How is the data being used? Rapid inserts, like Twitter? New user
registration? Heavy reporting? How one stores data vs. how one uses data
vs. how one collects data vs. how quickly new data must be visible to the
world vs. whether OLTP data should be put into an OLAP cube each night, etc. are
all factors that matter. (Steve)
It might be possible to overdo it, but trust me, I have had 20 times the
problems with denormalized data than with normalized. (PRMAN)
Your LOGICAL model should *always* be fully normalized. After all,
it is the engine that you derive everything else from. Your PHYSICAL
model may be denormalized or use system specific tools (materialized
views, cache tables, etc) to improve performance, but such things should
be done *after* the application level solutions are exhausted (page
caching, data caches, etc.) (Jonn)
For very large scale applications I have found that application
partitioning can go a long way to giving levels of scalability that
monolithic systems fail to provide: each partition is specialized for the
function it provides and can be optimized heavily, and when you need the
parts to co-operate you bind the partitions together in higher level code.
(John)
People don't care what you put into a database. They care what you can
get out of it. They want freedom and flexibility to grow their business
beyond "3 affiliations". (PRMan)
I normalise, then have distinct (conceptually transient) denormalised
cache tables which are hammered by the front-end. I let my model deal
with the nitty-gritty of the fix-ups where appropriate (handy beginUpdate/
endUpdate-type methods mean the denormalised views don't get rebuilt
more than necessary). (Mo)
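Mo's beginUpdate/endUpdate trick is a batching pattern: while a batch is open, changes only mark the denormalised cache dirty, and the expensive rebuild runs once when the batch closes. A minimal sketch, with hypothetical class and method names standing in for whatever Mo's model layer actually looks like:

```python
class DenormalisedCache:
    """Transient cache table rebuilt from the normalised model.
    begin_update/end_update batch changes so the rebuild runs at most
    once per batch instead of once per change."""

    def __init__(self, rebuild):
        self._rebuild = rebuild   # callable that regenerates the cache table
        self._depth = 0           # nesting level of open batches
        self._dirty = False
        self.rebuilds = 0         # instrumentation for the example

    def begin_update(self):
        self._depth += 1

    def end_update(self):
        self._depth -= 1
        if self._depth == 0 and self._dirty:
            self._rebuild()
            self.rebuilds += 1
            self._dirty = False

    def mark_dirty(self):
        if self._depth == 0:
            self._rebuild()       # no batch open: rebuild immediately
            self.rebuilds += 1
        else:
            self._dirty = True    # batch open: defer until end_update


cache = DenormalisedCache(rebuild=lambda: None)
cache.begin_update()
for _ in range(100):
    cache.mark_dirty()            # 100 changes inside one batch...
cache.end_update()
print(cache.rebuilds)  # 1 - the denormalised view is rebuilt once, not 100 times
```

The nesting counter lets batches compose, which is the point of the begin/end pairing: inner code can open its own batch without triggering a premature rebuild.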
Stop playing with mySQL. (Jonathan)
OK, more than a few quotes. There's certainly no lack of passion on the
issue!