Intro Physics 2.a4
by
Robert G. Brown
Duke University Physics Department
Durham, NC 27708-0305
[email protected]
Copyright Notice
Copyright Robert G. Brown 1993, 2007, 2013
Notice
This physics textbook is designed to support my personal teaching activities at Duke
University, in particular teaching its Physics 141/142, 151/152, or 161/162 series (Introduc-
tory Physics for life science majors, engineers, or potential physics majors, respectively).
It is freely available in its entirety in a downloadable PDF form or to be read online at:
http://www.phy.duke.edu/~rgb/Class/intro_physics_2.php
It is also available in an inexpensive (really!) print version via Lulu press here:
http://www.lulu.com/shop/product-21025164.html
where readers/users can voluntarily help support or reward the author by purchasing either
this paper copy or one of the even more inexpensive electronic copies.
By making the book available in these various media at a cost ranging from free to
cheap, I enable the text to be used by students all over the world, where each student can
pay (or not) according to their means.
Nevertheless, I am hoping that students who truly find this work useful will purchase
a copy through Lulu or a bookseller (when the latter option becomes available), if only
to help subsidize me while I continue to write inexpensive textbooks in physics or other
subjects.
This textbook is organized for ease of presentation and ease of learning. In particular,
it is hierarchically organized in a way that directly supports efficient learning.
It is also remarkably complete in its presentation and contains moderately detailed
derivations of many of the important equations and relations from first principles while not
skimping on simpler heuristic or conceptual explanations as well.
As a “live” document (one I actively use and frequently change, adding or deleting
material or altering the presentation in some way), this textbook may have errors great
and small, “stub” sections where I intend to add content at some later time but haven’t yet
finished it, and it covers and omits topics according to my own view of what is or isn’t
important to cover in a one-semester course. Expect it to change with little warning or
announcement as I add content or correct errors.
Purchasers of the paper version should be aware of its probable imperfection and be
prepared to either live with it or mark up their copy with corrections or additions as need
be. The latest (and hopefully most complete and correct) version is always available for
free online anyway, and people who have paid for a paper copy are especially welcome
to access and retrieve it.
I cherish good-hearted communication from students or other instructors pointing out
errors or suggesting new content (and have in the past done my best to implement many
such corrections or suggestions).
Books by Robert G. Brown
Physics Textbooks
• Introductory Physics I and II
A lecture note style textbook series intended to support the teaching of introductory
physics, with calculus, at a level suitable for Duke undergraduates.
• Classical Electrodynamics
A lecture note style textbook intended to support the second semester (primarily
the dynamical portion, little statics covered) of a two semester course of graduate
Classical Electrodynamics.
Computing Books
• How to Engineer a Beowulf Cluster
An online classic for years, this is the print version of the famous free online book on
cluster engineering. It too is being actively rewritten and developed, no guarantees,
but it is probably still useful in its current incarnation.
Fiction
• The Book of Lilith
ISBN: 978-1-4303-2245-0
Web: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lilith is the first person to be given a soul by God, and is given the job of giving all
the things in the world souls by loving them, beginning with Adam. Adam is given the
job of making up rules and the definitions of sin so that humans may one day live in
an ethical society. Unfortunately Adam is weak, jealous, and greedy, and insists on
being on top during sex to “be closer to God”.
Lilith, however, refuses to be second to Adam or anyone else. The Book of Lilith is
a funny, sad, satirical, uplifting tale of her spiritual journey through the ancient world
soulgiving and judging to find at the end of that journey – herself.
Poetry
• Who Shall Sing, When Man is Gone
Original poetry, including the epic-length poem about an imagined end of the world
brought about by a nuclear war that gives the collection its name. Includes many long
and short works on love and life, pain and death.
• Hot Tea!
More original poetry with a distinctly Zen cast to it. Works range from funny and
satirical to inspiring and uplifting, with a few erotic poems thrown in.
All of these books can be found on the online Lulu store here:
http://stores.lulu.com/store.php?fAcctID=877977
The Book of Lilith is available on Amazon, Barnes and Noble and other online book-
seller websites.
Contents
I: Preliminaries
Preface
Summary
II: Electrostatics
I Optics
Preface
This introductory electromagnetism and optics text is intended to be used in the second
semester of a two-semester series of courses teaching introductory physics at the college
level, following a first semester course in (Newtonian) mechanics and thermodynamics.
The text is intended to support teaching the material at a rapid, but advanced level – it
was developed to support teaching introductory calculus-based physics to potential physics
majors, engineers, and other natural science majors at Duke University over a period of
more than twenty-five years.
Students who hope to succeed in learning physics from this text will need, as a min-
imum prerequisite, a solid grasp of mathematics. It is strongly recommended that all
students have mastered mathematics at least through single-variable differential calculus
(typified by the AB advanced placement test or a first-semester college calculus course).
Students should also be taking (or have completed) single variable integral calculus (typ-
ified by the BC advanced placement test or a second-semester college calculus course).
In the text it is presumed that students are competent in geometry, trigonometry, algebra,
and single variable calculus; more advanced multivariate calculus is used in a number of
places but it is taught in context as it is needed and is always “separable” into two or three
independent one-dimensional integrals.
Many students are, unfortunately, weak in their mastery of mathematics at the time they
take physics. This enormously complicates the process of learning for them, especially if
they are years removed from when they took their algebra, trig, and calculus classes as
is frequently the case for pre-medical students. For that reason, several supplemental
materials are available: an online textbook (work in progress) on the math required specifically
to learn introductory physics quickly and efficiently; a short collection of “One Sheet Math
Review” pages that cover individual requirements such as the “needed algebra” skills or
“needed calculus” skills on a single side of a single sheet of paper; and finally, a chapter
in this textbook that falls somewhere in between – more than the one sheet review, but far
from a textbook and concentrating more on the new math required for E&M specifically.
The online book is located here:
http://www.phy.duke.edu/~rgb/Class/math_for_intro_physics.php
The “One Sheet Math Review” is located here:
http://www.phy.duke.edu/~rgb/Class/math_for_intro_physics.php
The chapter is located at the end of this section, right before we begin actual content.
Again, I strongly suggest that all students who are reading these words while preparing to
begin studying physics pause for a moment, and at least visit the One Sheet Review site,
print out its pages, put them into their physics notebook, and go over them enough to feel
comfortable with their content.
Note that Getting Ready to Learn Physics in this Preliminaries section is not part of
the course per se, but I usually do a quick review of this material (as well as the course
structure, grading scheme, and so on) in my first lecture of any given semester, the one
where students are still finding the room, dropping and adding courses, and one cannot
present real content in good conscience unless one plans to present it again in the second lecture
as well. Students greatly benefit from guidance on how to study, as most enter physics
thinking that they can master it with nothing but the memorization and rote learning skills
that have served them so well for their many other fact-based classes. Of course this is
completely false – physics is reason based and conceptual and it requires a very different
pattern of study than simply staring at and trying to memorize lists of formulae or examples.
Students, however, should not count on their instructor doing this – they need to be
self-actualized in their study from the beginning. It is therefore strongly suggested that
all students read this preliminary chapter right away as their first “assignment” whether
or not it is covered in the first lecture or assigned. In fact, (if you’re just such a student
reading these words) you can always decide to read it right now (as soon as you finish this
Preface). It won’t take you an hour, and might make as much as a full letter difference (to
the good) in your final grade. What do you have to lose?
Even if you think that you are an excellent student and learn things totally effortlessly,
I strongly suggest reading it. It describes a new perspective on the teaching and learning
process supported by very recent research in neuroscience and psychology, and makes
very specific suggestions as to the best way to proceed to learn physics.
Finally, the Introduction is a rapid summary of the entire course! If you read it and
look at the pictures before beginning the course proper you can get a good conceptual
overview of everything you’re going to learn. If you begin by learning in a quick pass the
broad strokes for the whole course, when you go through each chapter in all of its detail,
all those facts and ideas have a place to live in your mind.
That’s the primary idea behind this textbook – in order to be easy to remember, ideas
need a house, a place to live. Most courses try to build you that house by giving you one
nail and piece of wood at a time, and force you to build it in complete detail from the ground
up.
Real houses aren’t built that way at all! First a foundation is established, then the frame
of the whole house is erected, and then, slowly but surely, the frame is wired and plumbed
and drywalled and finished with all of those picky little details. It works better that way. So
it is with learning.
Textbook Layout and Design
This textbook has a design that is just about perfectly backwards compared to most text-
books that currently cover the subject. Here are its primary design features:
• There are only thirteen substantive chapters. The book is organized so that it can be
sanely taught in a single college semester with at most a chapter a week. I teach it
in a five week summer session at the Duke Marine Lab in Beaufort, NC and (at three
chapters a week plus startup and wind-down) that works too!
• It begins each chapter with an “abstract” and chapter summary. Detail, especially
lecture-note style mathematical detail, follows the summary rather than the other
way around.
• This text does not spend page after page trying to explain in English how physics
works (prose which in my experience nobody reads anyway). Instead, a terse “lecture
note” style presentation outlines the main points and presents considerable mathe-
matical detail to support solving problems.
• Each chapter ends with a short (by modern standards) selection of challenging
homework problems that are specifically chosen to precisely span the primary con-
cepts and examples, often requiring a student to rederive for themselves things that
were presented as primary content or examples in lecture. A good student might well
get through all of the problems in the book, rather than at most 10% of them as is
the general rule for other texts. Students that really, really want more problems to
solve to shoot for an ‘A’ can find them in a supplementary (online) book
filled with nothing but problems, but students that can do the homework perfectly will
almost certainly get a ‘B’ or better without them.
• The homework problems are weakly sorted out by level, as this text is intended
to support non-physics science and pre-health profession students, engineers, and
physics majors all three. The material covered is of course the same for all three,
but the level of detail and difficulty of the math used and required is a bit different.
• The textbook is entirely algebraic in its presentation and problem solving require-
ments – with very few exceptions no calculators should be required to solve prob-
lems. The author assumes that any student taking physics is capable of punching
numbers into a calculator, but it is algebra that ultimately determines the formula that
they should be computing. Numbers are used in problems only to illustrate what
“reasonable” numbers might be for a given real-world physical situation or where the
problems cannot reasonably be solved algebraically (e.g. resistance networks).
Getting Ready to Learn Physics
If you are reading this, I assume that you are either taking a course in physics or wish
to learn physics on your own. If this is the case, I want to begin by teaching you the
importance of your personal engagement in the learning process. If it comes right down
to it, how well you learn physics, how good a grade you get, and how much fun you have all
depend on how enthusiastically you tackle the learning process. If you remain disengaged,
detached from the learning process, you almost certainly will do poorly and be miserable
while doing it. If you can find any degree of engagement – or open enthusiasm – with the
learning process you will very likely do well, or at least as well as possible.
Note that I use the term learning, not teaching – this is to emphasize from the beginning
that learning is a choice and that you are in control. Learning is active; being taught is pas-
sive. It is up to you to seize control of your own educational process and fully participate,
not sit back and wait for knowledge to be forcibly injected into your brain.
You may find yourself stuck in a course that is taught in a traditional way, by an instructor
that lectures, assigns some readings, and maybe on a good day puts on a little dog-and-
pony show in the classroom with some audiovisual aids or some demonstrations. The
standard expectation in this class is to sit in your chair and watch, passive, taking notes.
No real engagement is “required” by the instructor, and lacking activities or a structure that
encourages it, you lapse into becoming a lecture transcription machine, recording all kinds
of things that make no immediate sense to you and telling yourself that you’ll sort it all out
later.
You may find yourself floundering in such a class – for good reason. The instructor
presents an ocean of material in each lecture, and you’re going to actually retain at most
a few cupfuls of it functioning as a scribe and passively copying his pictures and symbols
without first extracting their sense. And the lecture makes little sense, at least at first, and
reading (if you do any reading at all) does little to help. Demonstrations can sometimes
make one or two ideas come clear, but only at the expense of twenty other things that the
instructor now has no time to cover and expects you to get from the readings alone. You
continually postpone going over the lectures and readings to understand the material any
more than is strictly required to do the homework, until one day a big test draws nigh and
you realize that you really don’t understand anything and have forgotten most of what you
did, briefly, understand. Doom and destruction loom.
Sound familiar?
On the other hand, you may be in a course where the instructor has structured the
course with a balanced mix of open lecture (held as a freeform discussion where questions
aren’t just encouraged but required) and group interactive learning situations such as a
carefully structured recitation and lab where discussion and doing blend together, where
students teach each other and use what they have learned in many ways and contexts. If
so, you’re lucky, but luck only goes so far.
Even in a course like this you may still be floundering because you may not understand
why it is important for you to participate with your whole spirit in the quest to learn anything
you ever choose to study. In a word, you simply may not give a rodent’s furry behind about
learning the material so that studying is always a fight with yourself to “make” yourself do
it – so that no matter what happens, you lose. This too may sound very familiar to some.
The importance of engagement and participation in “active learning” (as opposed to
passively being taught) is not really a new idea. Medical schools were four year programs
in the year 1900. They are four year programs today, even though the amount of information that
a physician must master in those four years is probably ten times greater than
it was back then. Medical students are necessarily among the most efficient learners on
earth, or they simply cannot survive.
In medical schools, the optimal learning strategy is compressed to a three-step adage:
See one, do one, teach one.
See a procedure (done by a trained expert).
Do the procedure yourself, with the direct supervision and guidance of a trained expert.
Teach a student to do the procedure.
See, do, teach. Now you are a trained expert (of sorts), or at least so we devoutly hope,
because that’s all the training you are likely to get until you start doing the procedure over
and over again with real humans and with limited oversight from an attending physician
with too many other things to do. So you practice and study on your own until you achieve
real mastery, because a mistake can kill somebody.
This recipe is quite general, and can be used to increase your own learning in almost
any class. In fact, lifelong success in learning with or without the guidance of a good
teacher is a matter of discovering the importance of active engagement and participation
that this recipe (non-uniquely) encodes. Let us rank learning methodologies in terms of
“probable degree of active engagement of the student”. By probable I mean the degree of
active engagement that I as an instructor have observed in students over many years and
which is significantly reinforced by research in teaching methodology, especially in physics
and mathematics.
Listening to a lecture as a transcription machine with your brain in “copy machine” mode
is almost entirely passive and is for most students probably a nearly complete waste of
time. That’s not to say that “lecture” in the form of an organized presentation and review
of the material to be learned isn’t important or is completely useless! It serves one very
important purpose in the grand scheme of learning, but by being passive during lecture
you cause it to fail in its purpose. Its purpose is not to give you a complete, line by line
transcription of the words of your instructor to ponder later and alone. It is to convey, for a
brief shining moment, the sense of the concepts so that you understand them.
It is difficult to sufficiently emphasize this point. If lecture doesn’t make sense to you
when the instructor presents it, you will have to work much harder to achieve the sense of
the material “later”, if later ever comes at all. If you fail to identify the important concepts
during the presentation and see the lecture as a string of disconnected facts, you will
have to remember each fact as if it were an abstract string of symbols, placing impossible
demands on your memory even if you are extraordinarily bright. If you fail to achieve some
degree of understanding (or synthesis of the material, if you prefer) in lecture by asking
questions and getting expert explanations on the spot, you will have to build it later out of
your notes on a set of abstract symbols that made no sense to you at the time. You might
as well be trying to translate Egyptian Hieroglyphs without a Rosetta Stone, and the best
of luck to you with that.
Reading is a bit more active – at the very least your brain is more likely to be somewhat
engaged if you aren’t “just” transcribing the book onto a piece of paper or letting the words
and symbols happen in your mind – but is still pretty passive. Even watching nifty movies
or cool-ee-oh demonstrations is basically sedentary – you’re still just sitting there while
somebody or something else makes it all happen in your brain while you aren’t doing
much of anything. At best it grabs your attention a bit better (on average) than lecture, but
you are mentally passive.
In all of these forms of learning, the single active thing you are likely to be doing is
taking notes or moving an eye muscle from time to time. For better or worse, the human
brain isn’t designed to learn well in passive mode. Parts of your brain are likely to take
charge and pull your eyes irresistibly to the window to look outside where active things are
going on, things that might not be so damn boring!
With your active engagement, with your taking charge of and participating in the learn-
ing process, things change dramatically. Instead of passively listening in lecture, you can
at least try to ask questions and initiate discussions whenever an idea is presented that
makes no initial sense to you. Discussion is an active process even if you aren’t the one
talking at the time. You participate! Even a tiny bit of participation in a classroom setting
where students are constantly asking questions, where the instructor is constantly answer-
ing them and asking the students questions in turn makes a huge difference. Humans
being social creatures, it also makes the class a lot more fun!
In summary, sitting on your ass[1] and writing meaningless (to you, so far) things down as
somebody says them in the hopes of being able to “study” them and discover their meaning
on your own later is boring. For most students, later never comes, because you are busy
with many classes, because you haven’t discovered anything beautiful or exciting (which
is the reward for figuring it all out – if you ever get there), and because there is partying and
hanging out with friends and having fun. Even if you do find the time and really want to
succeed, in a complicated subject like physics you are less likely to be able to discover
the meaning on your own (unless you are so bright that learning methodology is irrelevant
and you learn in a single pass no matter what). Most introductory students are swamped
by the details, and have small chance of discovering the patterns within those details that
constitute “making sense” and make the detailed information much, much easier to learn
by enabling a compression of the detail into a much smaller set of connected ideas.
[1] I mean, of course, your donkey. What did you think I meant?
Articulation of ideas, whether it is to yourself or to others in a discussion setting, re-
quires you to create tentative patterns that might describe and organize all the details you
are being presented with. Using those patterns and applying them to the details as they
are presented, you naturally encounter places where your tentative patterns are wrong, or
don’t quite work, where something “doesn’t make sense”. In an “active” lecture students
participate in the process, and can ask questions and kick ideas around until they do make
sense. Participation is also fun and helps you pay far more attention to what’s going on than
when you are in passive mode. It may be that this increased attention, this consideration
of many alternatives and rejecting some while retaining others with social reinforcement, is
what makes all the difference. To learn optimally, even “seeing” must be an active process,
one where you are not a vessel waiting to be filled through your eyes but rather part of a
team studying a puzzle and looking for the patterns together that will help you eventually
solve it.
Learning is increased still further by doing, the very essence of activity and engage-
ment. “Doing” varies from course to course, depending on just what there is for you to do,
but it always is the application of what you are learning to some sort of activity, exercise,
problem. It is not just a recapitulation of symbols: “looking over your notes” or “(re)reading
the text”. The symbols for any given course of study (in a physics class, they very likely will
be algebraic symbols for real although I’m speaking more generally here) do not, initially,
mean a lot to you. If I write $\vec{F} = q(\vec{v} \times \vec{B})$ on the board, it means a great deal to me, but
if you are taking this course for the first time it probably means zilch to you, and yet I pop
it up there, draw some pictures, make some noises that hopefully make sense to you at
the time, and blow on by. Later you read it in your notes to try to recreate that sense, but
you’ve forgotten most of it. Am I describing the income I expect to make selling $\vec{B}$ tons of
barley with a market value of $\vec{v}$ and a profit margin of q?
To learn this expression (for yes, this is a force law of nature and one that we very
much must learn this semester) we have to learn what the symbols stand for – q is the
charge of a point-like object in motion at velocity $\vec{v}$ in a magnetic field $\vec{B}$, and $\vec{F}$ is the
resulting force acting on the particle. We have to learn that the × symbol is the cross
product of evil (to most students at any rate, at least at first). In order to get a gut feeling
for what this equation represents, for the directions associated with the cross product, for
the trajectories it implies for charged particles moving in a magnetic field in a variety of
contexts one has to use this expression to solve problems, see this expression in action in
laboratory experiments that let you prove to yourself that it isn’t bullshit and that the world
really does have cross product force laws in it. You have to do your homework that involves
this law, and be fully engaged.
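As a small taste of what "using the expression" can look like, here is a minimal sketch in Python that evaluates the force on a moving charge from its components. The numbers are made up for illustration, and the function names `cross` and `magnetic_force` are hypothetical helpers written for this sketch, not part of any library.

```python
def cross(a, b):
    """Return the cross product a x b of two 3-component vectors."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)


def magnetic_force(q, v, B):
    """Return F = q (v x B), in newtons, for q in C, v in m/s, B in T."""
    return tuple(q * c for c in cross(v, B))


# Illustrative (assumed) values: a positive charge moving along +x
# through a uniform magnetic field that points along +z.
q = 1.6e-19              # charge in coulombs
v = (2.0e5, 0.0, 0.0)    # velocity in m/s
B = (0.0, 0.0, 0.5)      # magnetic field in tesla

F = magnetic_force(q, v, B)
print(F)  # only the y-component is nonzero, and it is negative
```

With these values the force points along the negative y direction, perpendicular to both the velocity and the field. Flipping the sign of q, or reversing either vector, flips the direction of the force, which is a quick way to check your right-hand-rule intuition against the algebra.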
The learning process isn’t exactly linear, so if you participate fully in the discussion
and the doing while going to even the most traditional of lectures, you have an excellent
chance of getting to the point where you can score anywhere from a 75% to an 85% in the
course. In most schools, say a C+ to B+ performance. Not bad, but not really excellent.
A few students will still get A’s – they either work extra hard, or really like the subject, or
they have some sort of secret, some way of getting over that barrier at the 90’s that is only
crossed by those that really do understand the material quite well.
Here is the secret for getting yourself over that 90% hump, even in a physics class
(arguably one of the most difficult courses you can take in college), even if you’re not a
super-genius (or have never managed in the past to learn like one, a glance and you’re
done): Work in groups! In fact, a really good course (in my opinion) is one where the
entire learning process is organized around student teams, basically carefully constructed,
semi-permanent groups where each member is at least partly responsible for the effective
learning of all the team members, not just themselves!
That’s it. Nothing really complex or horrible, just get together with your friends who are
also taking the course and do your homework together. In a well designed physics course
(and many courses in mathematics, economics, and other subjects these days) you’ll have
some aspects of the class, such as a recitation or lab, where you are required to work in
groups/teams, and the teams and team activities may be highly structured or freeform.
“Studio” or “Team Based Learning” approaches to teaching physics have even interleaved the
lecture itself with team-based active learning, so everything is done in teams. This makes
it nearly impossible to be disengaged and sit passively in class waiting for learning to
“happen”. It also yields measurable improvements (all things being equal) on at least
some objective instruments for measurement of learning, although (long story) measuring
learning is a lot harder than you might think...
If you take charge of your own learning, though, you will quickly see that in any course,
however it is formally organized and taught, you can study in a group! This is true even in
a course where “the homework” is to be done alone by fiat of the (unfortunately ignorant
and misguided) instructor. Just study “around” the actual assignment – assign yourselves
problems “like” the actual assignment – most textbooks have plenty of extra problems and
then there is the Internet and other textbooks – and do them in a group, then (afterwards!)
break up and do your actual assignment alone. Note that if you use a completely different
textbook to pick your group problems from and do them together before looking at your
assignment in your textbook, you can’t even be blamed if some of the ones you pick turn
out to be ones your instructor happened to assign.
Oh, and not-so-subtly – give the instructor a (link to a) PDF copy of this book (it’s as
free for instructors as it is for students, after all, just a click away on the Internet). Who
knows? Maybe they will give some of these ideas a try!
Let’s understand in more detail why working on hard problems in teams often has a
dramatic effect on learning. What happens when a team works together? Well, a lot of
discussion happens, because humans working on a common problem like to talk. There
is plenty of doing going on, presuming that the group has a common task list to work
through, like a small mountain of really difficult problems that nobody can possibly solve
working on their own and are barely within their abilities working as a group backed up
by the course instructor! Finally, in team-based learning everybody has the opportunity to
teach!
The importance of teaching – not only seeing the lecture presentation with your whole
brain actively engaged and participating in an ongoing discussion so that it makes sense at
the time, not only doing lots of homework problems and exercises that apply the material
in some way, but articulating what you have discovered in this process and answering
questions that force you to consider and reject alternative solutions or pathways (or not) –
cannot be overemphasized. Teaching each other in a peer setting (ideally with mentorship
and oversight to keep you from teaching each other mistakes) is essential!
This problem you “get”, and teach others (and actually learn it better from teaching
it than they do from your presentation – never begrudge the effort required to teach your
fellow team members even if some of them are very slow to understand). The next problem
you don’t get but some other group member does – they get to teach you. In the end you
all learn far more about every problem as a consequence of the struggle, the exploration
of false paths, the discovery and articulation of the correct path, the process of discussion,
resolution and agreement in teaching whereby everybody in the team hopefully reaches
full understanding.
Note that success in this last key metric depends on you and you alone. No teach-
ing/learning approach will help you learn if you quit halfway there. Some approaches make
it easier, some harder, but in the end you bear the ultimate responsibility for your own ac-
tive, engaged learning. When you have completed see, do, teach, you have achieved a
critical milestone on the path to comprehension.
I would assert that it is all but impossible for someone to become a (halfway decent)
teacher of anything without learning along the way that the absolute best way to learn any
set of material deeply is to teach it – it is the very foundation of Academe and has been for
two or three thousand years. It is, as we have noted, built right into the intensive learning
process of medical school and graduate school in general. For some reason, however, we
don’t incorporate a teaching component in most undergraduate classes, which is a shame,
and it is basically nonexistent in nearly all K-12 schools, which is an open tragedy.
As an engaged student you don’t have to live with that! Put it there yourself, by in-
corporating group study and mutual teaching into your learning process with or without
the help or permission of your teachers! A really smart and effective team soon learns
to iterate the teaching – I teach you, and to make sure you got it you immediately use
the material I taught you and try to articulate it back to me. Eventually everybody on the
team understands, everybody on the team benefits, everybody on the team gets the best
possible grade on the material. This process will actually make you (quite literally) more
intelligent. You may or may not manage to lock down an A, but you will get the best grade
you are capable of getting, for your given investment of effort.
This is close to the ultimate in engagement – highly active learning, with all cylinders of
your brain firing away on the process. You can see why learning is enhanced. It is simply
a bonus, a sign of a just and caring God, that it is also a lot more fun to work in a team,
especially in a relaxed context with food and drink present. Yes, I’m encouraging you to
have “physics study parties” (or history study parties, or psychology study parties). Hold
contests. Give silly prizes. See. Do. Teach.
Learning isn’t only dependent on the engagement pattern implicit in the See, Do, Teach
rule. Let’s absorb a few more True Facts about learning, in particular let’s come up with a
handful of things that can act as “switches” and turn your ability to learn on and off quite
independent of how your instructor structures your courses. Most of these things aren’t
binary switches – they are more like dimmer switches that can be slid up between dim (but
not off) and bright (but not fully on). Some of these switches, or environmental parameters,
act together more powerfully than they act alone. We’ll start with the most important pair,
a pair that research has shown work together to potentiate or block learning.
Instead of just telling you what they are, arguing that they are important for a paragraph
or six, and moving on, I’m going to give you an early opportunity to practice active learning
in the context of reading a chapter on active learning. That is, I want you to participate in a
tiny mini-experiment. It works a little bit better if it is done verbally in a one-on-one meeting,
but it should still work well enough even if it is done in this text that you are reading.
I am going to give you a string of ten or so digits and ask you to glance at it one time for a
count of three and then look away. No fair peeking once your three seconds are up! Then
I want you to do something else for at least a minute – anything else that uses your whole
attention, such as holding your mind blank or thinking of the phone numbers of friends or
your social security number, and that interrupts your ability to rehearse the numbers in your
mind in the way that you’ve doubtless learned permits you to learn other strings of digits. Even
rereading this paragraph will do.
At the end of the minute, try to recall the number I gave you and write down what you
remember. Then turn back to right here and compare what you wrote down with the actual
number.
Ready? (No peeking yet...) Set? Go!
Ok, here it is, in a footnote at the bottom of the page to keep your eye from naturally
reading ahead to catch a glimpse of it while reading the instructions above2.
How did you do?
If you are like most people, this string of numbers is a bit too long to get into your
immediate memory or visual memory in only three seconds. There was very little time for
rehearsal, and then you went and did something else for a bit right away that was supposed
to keep you from rehearsing whatever of the string you did manage to verbalize in three
seconds. Most people will get anywhere from the first three to as many as seven or eight
of the digits right, but probably not in the correct order, unless...
...they are particularly smart or lucky and in that brief three second glance have time to
notice that the number consists of all the digits used exactly once! Folks that happened to
“see” this at a glance probably did better than average, getting all of the correct digits but
maybe in not quite the correct order.
People who are downright brilliant (and equally lucky) realized in only three seconds
(without cheating an extra second or three, you know who you are) that it consisted of
the string of odd digits in ascending order followed by the even digits in descending or-
der. Those people probably got it all perfectly right even without time to rehearse and
“memorize” the string! Look again at the string, see the pattern now?
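For readers who like to think in code, the pattern can be written down as a rule rather than as ten separate digits. The following is only a minimal Python sketch; the function name is made up for illustration, and the only thing taken from the text is the digit string itself.

```python
# Hypothetical helper: rebuilds the ten-digit string from its rule
# (odd digits ascending, then even digits descending) instead of from memory.
def odd_up_even_down() -> str:
    odds = [d for d in range(10) if d % 2 == 1]  # 1, 3, 5, 7, 9
    evens = sorted((d for d in range(10) if d % 2 == 0), reverse=True)  # 8, 6, 4, 2, 0
    return "".join(str(d) for d in odds + evens)


digits = odd_up_even_down()
print(digits)  # 1357986420

# Every digit from 0 to 9 appears exactly once.
print(sorted(digits) == list("0123456789"))  # True
```

The point of the sketch is that the rule is far shorter to hold in mind than the ten digits it regenerates.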
The moral of this little mini-demonstration is that it is easy to overwhelm the mind’s
2
1357986420 (one, two, three, quit and do something else for one minute...)
A chess master, on the other hand, can play umpty games at once, blindfolded, against
pitiful fools like myself and when they’ve finished winning them all they can go back and
reconstruct each one move by move, criticizing each move as they go. Often they can
remember the games in their entirety days or even years later.
This isn’t just because they are smarter – they might be completely unable to derive
the Lorentz group from first principles, and I can, and this doesn’t automatically make me
smarter than them either. It is because chess makes sense to them – they’ve achieved a
deep understanding of the game, as it were – and they’ve built a complex meta-structure
memory in their brains into which they can poke chess moves so that they can be retrieved
extremely efficiently. This gives them the attendant capability of searching vast portions of
the game tree at a glance, where I have to tediously work through each branch, one step
at a time, usually omitting some really important possibility because I don’t realize that that
particular knight on the far side of the board can affect things on this side where we are
both moving pieces.
This sort of “deep” (synthetic) understanding of physics is very much the goal of this
course (the one in the textbook you are reading, since I use this intro in many textbooks),
and to achieve it you must not memorize things as if they are random factoids; you must
work to abstract the beautiful intertwining of patterns that compress all of those apparently
random factoids into things that you can easily remember offhand, that you can easily
reconstruct from the pattern even if you forget the details, and that you can search through
at a glance. But the process I describe can be applied to learning pretty much anything,
as patterns and structure exist in abundance in all subjects of interest. There are even
sensible rules that govern or describe the anti-pattern of pure randomness!
There’s one more important thing you can learn from thinking over the digit experiment.
Some of you reading this very likely didn’t do what I asked, you didn’t play along with the
game. Perhaps it was too much of a bother – you didn’t want to waste a whole minute
learning something by actually doing it, just wanted to read the damn chapter and get it
over with so you could do, well, whatever the hell else it is you were planning to do today
that’s more important to you than physics or learning in other courses.
If you’re one of these people, you probably don’t remember any of the digit string at this
point from actually seeing it – you never even tried to memorize it. A very few of you may
actually be so terribly jaded that you don’t even remember the little mnemonic formula
I gave above for the digit string (although frankly, people that are that disengaged are
probably not about to do things like actually read a textbook in the first place, so possibly
not). After all, either way the string is pretty damn meaningless, pattern or not.
Pattern and meaning aren’t exactly the same thing. There are all sorts of patterns one
can find in random number strings, they just aren’t “real” (where we could wax poetic at
this point about information entropy and randomness and monkeys typing Shakespeare
or seeing fluffy white sheep in the clouds if this were a different course). So why bother
wasting brain energy on even the easy way to remember this string when doing so is utterly
unimportant to you in the grand scheme of all things?
From this we can learn the second humble and unsurprising conclusion I want you to
draw from this one elementary thought experiment. Things are easier to learn when you
care about learning them! In fact, they are damn near impossible to learn if you really don’t
care about learning them.
Let’s put the two observations together and plot them as a graph, just for fun (and
because graphs help one learn for reasons we will explore just a bit in a minute). If you
care about learning what you are studying, and the information you are trying to learn
makes sense (if only for a moment, perhaps during lecture), the chances of your learning
it are quite good. This alone isn’t enough to guarantee that you’ll learn it, but they are
basically both necessary conditions, and one of them is directly connected to degree of
engagement.
[Figure: surface plot of Learning Performance (vertical axis) as a function of Care and Sense (horizontal axes, each running from 0 to 1).]
On the other hand, if you care but the information you want to learn makes no sense,
or if it makes sense but you hate the subject, the instructor, your school, your life and just
don’t care, your chances of learning it aren’t so good, probably a bit better in the first case
than in the second as if you care you have a chance of finding someone or some way that
will help you make sense of whatever it is you wish to learn, where the person who doesn’t
care, well, they don’t care. Why should they remember it?
If you don’t give a rat’s ass about the material and it makes no sense to you, go home.
Leave school. Do something else. You basically have almost no chance of learning the
material unless you are gifted with a transcendent intelligence (wasted on a dilettante who
lives in a state of perpetual ennui) and are miraculously gifted with the ability to learn things
effortlessly even when they make no sense to you and you don’t really care about them.
All the learning tricks and study patterns in the world won’t help a student who doesn’t try,
doesn’t care, and for whom the material never makes sense.
If we worked at it, we could probably find other “logistic” controlling parameters to as-
sociate with learning – things that increase your probability of learning monotonically as
they vary. Some of them are already apparent from the discussion above. Let’s list a few
more of them with explanations just so that you can see how easy it is to sit down to study
and try to learn and have “something wrong” that decreases your ability to learn in that
particular place and time.
Learning is actual work and involves a fair bit of biological stress, just like working out.
Your brain needs food – it burns a whopping 20-30% of your daily calorie intake all
by itself just living day to day, even more when you are really using it or are somewhat
sedentary in your physical habits so your consumption in the form of physical motion is
smaller than normal or healthy. Note that your brain runs on pure, energy-rich glucose, so
when your blood sugar drops your brain activity drops right along with it. This can happen
(paradoxically) because you just ate a carbohydrate rich meal. A balanced diet containing
foods with a lower glycemic index3 tends to be harder to digest and provides a longer
period of sustained energy for your brain. A daily multivitamin (and sometimes various
antioxidant or metabolic supplements such as alpha lipoic acid) can also help maintain
your body’s energy release mechanisms at the cellular level.
Blood sugar is typically lowest first thing in the morning, so this is a lousy time to actively
study. On the other hand, a good hearty breakfast, eaten at least an hour before plunging
in to your studies, is a great idea and is a far better habit to develop for a lifetime than
eating no breakfast and instead eating a huge meal right before bed4.
Learning requires adequate sleep. Sure, this is tough to manage at college – there are
no parents to tell you to go to bed, lots of things to do, and of course you’re in class during
the day and then you study, so late night is when you have fun. Unfortunately, learning is
clearly correlated with engagement, activity, and mental alertness, and all of these tend to
shut down when you’re tired. Furthermore, the formation of long term memory of any kind
from a day’s experiences has been shown in both animal and human studies to depend on
the brain undergoing at least a few natural sleep cycles of deep sleep alternating with REM
(Rapid Eye Movement) sleep, dreaming sleep. Rats taught a maze and then deprived of
REM sleep cannot run the maze well the next day; rats that are taught the same maze but
that get a good night of rat sleep with plenty of rat dreaming can run the maze well the
next day. People conked on the head who remain unconscious for hours and are thereby
deprived of normal sleep often have permanent amnesia of the previous day – it never gets
turned into long term memory.
Sleep apnea (Wikipedia: http://www.wikipedia.org/wiki/Sleep Apnea) is also a great undiagnosed epidemic
(e.g. 24% of all males by late middle age, most of them untreated) and can seriously
affect learning. Indeed, if you have any variation of Attention Deficit Disorder (ADD) and
snore, or have any symptoms of interrupted sleep due to breathing interruption or
e.g. restless legs, you should probably read about the co-morbidity of sleep disorders and
ADD5 and talk to your doctor to make sure that you really have ADD and are not suffering
from a sleep disorder, as the two can actually result in nearly identical daytime symptoms,
including difficulty learning!
This is hardly surprising. Pure common sense and experience tell you that your brain
won’t work too well if it is hungry and tired or oxygen deprived. Common sense (and
yes, experience) will rapidly convince you that learning generally works better if you’re not
stoned or drunk when you study. Learning works much better when you have time to learn
3
Wikipedia: http://www.wikipedia.org/wiki/glycemic index.
4
...which is, alas, my own pattern unless I’m careful, made into a habit back in college. It seemed to work
a lot better at age 20 than it does at age 60...
5
A Clinical Overview of Sleep and Attention-Deficit/Hyperactivity Disorder in Children and Adolescents
and haven’t put everything off to the last minute. In fact, all of Maslow’s hierarchy of needs6
are important parameters that contribute to the probability of success in learning.
There is one more set of very important variables that strongly affect our ability to learn,
and they are in some ways the least well understood. These are variables that describe
you as an individual, that describe your particular brain and how it works. Pretty much
everybody will learn better if they are self-actualized and fully and actively engaged, if the
material they are trying to learn is available in a form that makes sense and clearly com-
municates the implicit patterns that enable efficient information compression and storage,
and above all if they care about what they are studying and learning, if it has value to them.
But everybody is not the same, and the optimal learning strategy for one person is
not going to be what works well, or even at all, for another. This is one of the things that
confounds “simple” empirical research that attempts to find benefit in one teaching/learning
methodology over another. Some students do improve – even dramatically improve – when
this or that teaching/learning methodology is introduced. In others there is no change. Still
others actually do worse. In the end, the beneficial effect to a selected subgroup of the
students may be lost in the statistical noise of the study and the fact that no attempt is
made to identify commonalities among students that succeed or fail.
The point is that finding an optimal teaching and learning strategy is technically an op-
timization problem on a high dimensional space. We’ve discussed some of the important
dimensions above, isolating a few that appear to have a monotonic effect on the desired
outcome in at least some range (relying on common sense to cut off that range or suggest
trade-offs – one cannot learn better by simply discussing one idea for weeks at the ex-
pense of participating in lecture or discussing many other ideas of equal and coordinated
importance; sleeping for twenty hours a day leaves little time for experience to fix into long
term memory with all of that sleep). We’ve omitted one that is crucial, however. That is
your brain!
Your brain is more than just a unique instrument. In some sense it is you. You could
imagine having your brain removed from your body and being hooked up to machinery that
provided it with sight, sound, and touch in such a way that “you” remain7. It is difficult to
imagine that you still exist in any meaningful sense if your brain is taken out of your body
and destroyed while your body is artificially kept alive.
Your brain, however, is an instrument. It has internal structure. It uses energy. It does
“work”. It is, in fact, a biological machine of sublime complexity and subtlety, one of the true
wonders of the world! Note that this statement can be made quite independent of whether
“you” are your brain per se or a spiritual being who happens to be using it (a debate that
6
Wikipedia: http://www.wikipedia.org/wiki/Maslow’s hierarchy of needs. In a nutshell, in order to become
self-actualized and realize your full potential in activities such as learning you need to have your physiological
needs met, you need to be safe, you need to be loved and secure in the world, you need to have good self-
esteem and the esteem of others. Only then is it particularly likely that you can become self-actualized and
become a great learner and problem solver.
7
Imagine very easily if you’ve ever seen The Matrix movie trilogy...
need not concern us at this time, however much fun it might be to get into it) – either way
the brain itself is quite marvelous.
For all of that, few indeed are the people who bother to learn to actually use their brain
effectively as an instrument. It just works, after all, whether or not we do this. Which is fine.
If you want to get the most mileage out of it, however, it helps to read the manual.
So here’s at least one user manual for your brain. It is by no means complete or
authoritative, but it should be enough to get you started, to help you discover that you are
actually a lot smarter than you think, or that you’ve been in the past, once you realize that
you can change the way you think and learn and experience life and gradually improve it.
In the spirit of the learning methodology that we eventually hope to adopt, let’s simply
itemize in no particular order the various features of the brain8 that bear on the process
of learning. Bear in mind that such a minimal presentation is more of a metaphor than
anything else because simple (and extremely common) generalizations such as “creativity
is a right-brain function” are not strictly true as the brain is far more complex than that.
• The brain is bicameral: it has two cerebral hemispheres9 , right and left, with brain
functions asymmetrically split up between them.
• The brain’s hemispheres are connected by a thick band of nerve fibers called the corpus
callosum, which is how the two halves talk to each other.
• The human brain consists of layers with a structure that recapitulates evolutionary
phylogeny; that is, the core structures are found in very primitive animals and com-
mon to nearly all vertebrate animals, with new layers (apparently) added by evolution
on top of this core as the various phyla differentiated, fish, amphibian, reptile, mam-
mal, primate, human. The outermost layer where most actual thinking occurs (in
animals that think) is known as the cerebral cortex.
• The cerebral cortex 10 – especially the outermost layer of it called the neocortex – is
where “higher thought” activities associated with learning and problem solving take
place, although the brain is a very complex instrument with functions spread out over
many regions.
actual chemicals responsible for the triggered functioning of neurons and hence the
neural network in the cortex that spans the halves of the brain.
• Parts of the cortex are devoted to the senses. These parts often contain a map of
sorts of the world as seen by the associated sense mechanism. For example, there
exists a topographic map in the brain that roughly corresponds to points in the retina,
which in turn are stimulated by an image of the outside world that is projected onto
the retina by your eye’s lens in a way we will learn about later in this course! There is
thus a representation of your visual field laid out inside your brain!
• Similar maps exist for the other senses, although sensations from the right side of
your body are generally processed in a laterally inverted way by the opposite hemisphere
of the brain. What you see in your right visual field, what your right hand touches, is
ultimately transmitted to a sensory area in your left brain hemisphere and vice versa,
and volitional muscle control flows from these brain halves the other way.
• You can also block neurotransmitters by chemical means, put neurotransmitter ana-
logues into your system, and alter the chemical trigger potentials of your neurons
by taking various drugs, poisons, or hormones. The biochemistry of your brain is
extremely important to its function, and (unfortunately) is not infrequently a bit “out
of whack” for many individuals, resulting in e.g. attention deficit or mood disorders
that can greatly affect one’s ability to easily learn while leaving one otherwise highly
functional.
• Intelligence15 , learning ability, and problem solving capabilities are not fixed; they
can vary (often improving) over your whole lifetime! Your brain is highly plastic and
can sometimes even reprogram itself to full functionality when it is e.g. damaged by
a stroke or accident. On the other hand neither is it infinitely plastic – any given brain
has a range of accessible capabilities and can be improved only to a certain point.
However, for people of supposedly “normal” intelligence and above, it is by no means
clear what that point is! Note well that intelligence is an extremely controversial
subject and you should not take things like your own measured “IQ” too seriously.
• Intelligence is not even fixed within a population over time. A phenomenon known
as “the Flynn effect”16 (after its discoverer) suggests that scores on IQ tests have increased
almost six points a decade, on average, over a timescale of tens of years, with most
of the increases coming from the lower half of the distribution of intelligence. This
is an active area of research (as one might well imagine) and some of that research
has demonstrated fairly conclusively that individual intelligences can be improved by
five to ten points (a significant amount) by environmentally correlated factors such as
nutrition, education, complexity of environment.
15
Wikipedia: http://www.wikipedia.org/wiki/intelligence.
16
Wikipedia: http://www.wikipedia.org/wiki/flynn effect.
• The best time for the brain to learn is right before sleep. The process of sleep appears
to “fix” long term memories in the brain and things one studies right before going to
bed are retained much better than things studied first thing in the morning. Note that
this conflicts directly with the party/entertainment schedule of many students, who
tend to study early in the evening and then amuse themselves until bedtime. It works
much better the other way around.
• Sensory memory17 corresponds to the roughly 0.5 second (for most people) that
a sensory impression remains in the brain’s “active sensory register”, the sensory
cortex. It can typically hold fewer than 12 “objects” that can be retrieved. It quickly
decays and cannot be improved by rehearsal, although there is some evidence that
its object capacity can be improved over a longer term by practice.
• Short term memory is where some of the information that comes into sensory mem-
ory is transferred. Just which information is transferred depends on where one’s
“attention” is, and the mechanics of the attention process are not well understood
and are an area of active research. Attention acts like a filtering process, as there is
a wealth of parallel information in our sensory memory at any given instant in time
but the thread of our awareness and experience of time is serial. We tend to “pay
attention” to one thing at a time. Short term memory lasts from a few seconds to as
long as a minute without rehearsal, and for nearly all people it holds 4–5 objects18.
However, its capacity can be increased by a process called “chunking” that is basi-
cally the information compression mechanism demonstrated in the earlier example
with numbers – grouping of the data to be recalled into “objects” that permit a larger
set to still fit in short term memory.
• Studies of chunking show that the ideal size for data chunking is three. That is, if you
try to remember the string of letters:
FBINSACIAIBMATTMSN
with the usual three second look you’ll almost certainly find it impossible. If, however,
I insert the following spaces:
FBI NSA CIA IBM ATT MSN
It is suddenly much easier to get at least the first four. If I parenthesize:
(FBI NSA CIA) (IBM ATT MSN)
so that you can recognize the first three are all government agencies in the gen-
eral category of “intelligence and law enforcement” and the last three are all market
symbols for information technology mega-corporations, you can once again recall
the information a day later with only the most cursory of rehearsals. You’ve taken
eighteen “random” objects that were meaningless and could hence be recalled only
through the most arduous of rehearsal processes, converted them to six “chunks” of
three that can be easily tagged by the brain’s existing long term memory (note that
you are not learning the string FBI, you are building an association to the already
17
Wikipedia: http://www.wikipedia.org/wiki/memory. Several items in a row are connected to this page.
18
From this you can see why I used ten digits, gave you only a few seconds to look, and blocked rehearsal
in our earlier exercise.
existing memory of what the string FBI means, which is much easier for the brain to
do), and chunked the chunks into two objects.
Eighteen objects without meaning – difficult indeed! Those same eighteen objects
with meaning – umm, looks pretty easy, doesn’t it...
Short term memory is still that – short term. It typically decays on a time scale that
ranges from minutes for nearly everything to order of a day for a few things unless
the information can be transferred to long term memory. Long term memory is the
big payoff – learning is associated with formation of long term memory.
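The chunking of the letter string above can be mimicked in a few lines of Python. This is only an illustrative sketch of the bookkeeping (how many "objects" have to be held at each stage), not a model of memory; the function and variable names are made up for the example.

```python
# Hypothetical helper: split a string into fixed-size chunks.
def chunk(text: str, size: int = 3) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]


letters = "FBINSACIAIBMATTMSN"

chunks = chunk(letters)             # six three-letter chunks
groups = [chunks[:3], chunks[3:]]   # two groups of three chunks each

print(chunks)  # ['FBI', 'NSA', 'CIA', 'IBM', 'ATT', 'MSN']
print(len(letters), len(chunks), len(groups))  # 18 6 2
```

Eighteen letters become six chunks and then two groups, which is comfortably inside the 4–5 object capacity quoted above for short term memory.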
• Now we get to the really good stuff. Long term memory is memory that you form that lasts
a long time in human terms. A “long time” can be days, weeks, months, years, or
a lifetime. Long term memory is encoded completely differently from short term or
sensory/immediate memory – it appears to be encoded semantically 19 , that is to
say, associatively in terms of its meaning. There is considerable evidence for this,
and it is one reason we focus so much on the importance of meaning in the previous
sections.
To miraculously transform things we try to remember from “difficult” to learn random
factoids that have to be brute-force stuffed into disconnected semantic storage units
created as it were one at a time for the task at hand into “easy” to learn factoids,
all we have to do is discover meaning associations with things we already know, or
create a strong memory of the global meaning or conceptualization of a subject that
serves as an associative home for all those little factoids.
A characteristic of this as a successful process is that when one works systematically
to learn by means of the latter process, learning gets easier as time goes on. Every
factoid you add to the semantic structure of the global conceptualization strengthens
it, and makes it even easier to add new factoids. In fact, the mind’s extraordinary
rational capacity permits it to interpolate and extrapolate, to fill in parts of the struc-
ture on its own without effort and in many cases without even being exposed to the
information that needs to be “learned”!
• One area where this extrapolation is particularly evident and powerful is in mathematics.
Any time we can learn, or discover from experience, a formula for some
phenomenon, a mathematical pattern, we don’t have to actually see something to
be able to “remember” it. Once again, it is easy to find examples. If I give you data
from sales figures over a year such as January = $1000, October = $10,000, Decem-
ber = $12,000, March=$3000, May = $5000, February = $2000, September = $9000,
June = $6000, November = $11,000, July = $7000, August = $8000, April = $4000, at
first glance they look quite difficult to remember. If you organize them temporally by
month and look at them for a moment, you recognize that sales increased linearly by
month, starting at $1000 in January, and suddenly you can reduce the whole series
to a simple mental formula (straight line) and a couple pieces of initial data (slope
and starting point). One amazing thing about this is that if I asked you to “remember”
something that you have not seen, such as sales in February in the next year, you
could make a very plausible guess that they will be $14,000!
19
Wikipedia: http://www.wikipedia.org/wiki/semantics.
Note that this isn’t a memory, it is a guess. Guessing is what the mind is designed
to do, as it is part of the process by which it “predicts the future” even in the most
mundane of ways. When I put ten dollars in my pocket and reach in my pocket for it
later, I’m basically guessing, on the basis of my memory and experience, that I’ll find
ten dollars there. Maybe my guess is wrong – my pocket could have been picked20 ,
maybe it fell out through a hole. My concept of object permanence plus my memory
of an initial state permit me to make a predictive guess about the Universe!
This is, in fact, physics! This is what physics is all about – coming up with a set of
rules (like conservation of matter) that encode observations of object permanence,
more rules (equations of motion) that dictate how objects move around, and allow
me to conclude that “I put a ten dollar bill, at rest, into my pocket, and objects at
rest remain at rest. The matter the bill is made of cannot be created or destroyed
and is bound together in a way that is unlikely to come apart over a period of days.
Therefore the ten dollar bill is still there!” Nearly anything that you do or that happens
in your everyday life can be formulated as a predictive physics problem.
• The hippocampus21 appears to be partly responsible for both forming spatial maps
or visualizations of your environment and also for forming the cognitive map that or-
ganizes what you know and transforms short term memory into long term memory,
and it appears to do its job (as noted above) in your sleep. Sleep deprivation prevents
the formation of long term memory. Being rendered unconscious for a long period
often produces short term amnesia as the brain loses short term memory before it
gets put into long term memory. The hippocampus shows evidence of plasticity –
taxi drivers who have to learn to navigate large cities actually have larger than nor-
mal hippocampi, with a size proportional to the length of time they’ve been driving.
This suggests (once again) that it is possible to deliberately increase the capacity of
your own hippocampus through the exercise of its functions, and consequently in-
crease your ability to store and retrieve information, which is an important component
(although not the only component) of intelligence!
• Memory is improved by increasing the supply of oxygen to the brain, which is best
accomplished by exercise. Unsurprisingly. Indeed, as noted above, having good gen-
eral health, good nutrition, good oxygenation and perfusion – having all the biomech-
anism in tip-top running order – is quite reasonably linked to being able to perform
at your best in anything, mental activity included.
• Finally, the amygdala22 is a brain organ in our limbic system (part of our “old”, reptile
brain). The amygdala is an important part of our emotional system. It is associated
with primitive survival responses, with sexual response, and appears to play a key
role in modulating (filtering) the process of turning short term memory into long term
memory. Basically, any short term memory associated with a powerful emotion is
much more likely to make it into long term memory.
20 With three sons constantly looking for funds to attend movies and the like, it isn’t as unlikely as you might
think!
21 Wikipedia: http://www.wikipedia.org/wiki/hippocampus.
22 Wikipedia: http://www.wikipedia.org/wiki/amygdala.
There are clear evolutionary advantages to this. If you narrowly escape being killed
by a saber-toothed tiger at a particular pool in the forest, and then forget that this
happened by the next day and return again to drink there, chances are decent that
the saber-tooth is still there and you’ll get eaten. On the other hand, if you come
upon a particular fruit tree in that same forest and get a free meal of high quality food
and forget about the tree a day later, you might starve.
We see that both negative and positive emotional experiences are strongly correlated
with learning! Powerful experiences, especially, are correlated with learning. This
translates into learning strategies in two ways, one for the instructor and one for the
student. For the instructor, there are two general strategies open to helping students
learn. One is to create an atmosphere of fear, hatred, disgust, anger – powerful
negative emotions. The other is to create an atmosphere of love, security, humor,
joy – powerful positive emotions. In between there is a great wasteland of bo-ring,
bo-ring, bo-ring where students plod along, struggling to form memories because
there is nothing “exciting” about the course in either a positive or negative way and
so their amygdala degrades the memory formation process in favor of other more
“interesting” experiences.
Now, in my opinion, negative experiences in the classroom do indeed promote the for-
mation of long term memories, but they aren’t the memories the instructor intended.
The student is likely to remember, and loathe, the instructor for the rest of their life but
is not more likely to remember the material except sporadically in association with
particularly traumatic episodes. They may well be less likely, as we naturally avoid
negative experiences and will study less and work less hard on things we can’t stand
doing.
For the instructor, then, positive is the way to go. Creating a warm, nurturing class-
room environment, ensuring that the students know that you care about their learning
and about them as individuals helps to promote learning. Making your lectures and
teaching processes fun – and funny – helps as well. Many successful lecturers make
a powerful positive impression on the students, creating an atmosphere of amaze-
ment or surprise. A classroom experience should really be a joy in order to optimize
learning in so many ways.
For the student, be aware that your attitude matters! As noted in previous sections,
caring is an essential component of successful learning because you have to attach
value to the process in order to get your amygdala to do its job. However, you can do
much more. You can see how many aspects of learning can be enhanced through
the simple expedient of making it a positive experience! Working in groups is fun,
and you learn more when you’re having fun (or quavering in abject fear, or in an
interesting mix of the two). Attending an interesting lecture is fun, and you’ll retain
more than average. Participation is fun, especially if you are “rewarded” in some way
that makes a moment or two special to you, and you’ll remember more of what goes
on.
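The sales-figure example from the list above can be made concrete with a few lines of code. What follows is only a minimal sketch of the "organize, spot the pattern, extrapolate" idea; the variable and function names (`sales`, `slope`, `start`, `predicted_sales`) are hypothetical and chosen purely for illustration. It sorts the jumbled figures by month, reduces them to a slope and a starting point, checks that the straight line reproduces the whole year, and then "remembers" a month that was never seen.

```python
# Illustrative sketch only: all names here are assumptions, not from the text.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Sales figures in dollars, in the same jumbled order the text lists them.
sales = {
    "January": 1000, "October": 10000, "December": 12000, "March": 3000,
    "May": 5000, "February": 2000, "September": 9000, "June": 6000,
    "November": 11000, "July": 7000, "August": 8000, "April": 4000,
}

# Step 1: organize temporally, as (month number, sales) with January = 1.
ordered = [(i + 1, sales[name]) for i, name in enumerate(MONTHS)]

# Step 2: compress to a straight line, using the first two months for the slope.
slope = ordered[1][1] - ordered[0][1]          # dollars per month
start = ordered[0][1] - slope * ordered[0][0]  # extrapolated value at month 0


def predicted_sales(month_number):
    """Sales predicted by the straight-line 'mental formula'."""
    return start + slope * month_number


# Step 3: confirm the two stored numbers reproduce all twelve data points.
fits_all_year = all(predicted_sales(m) == s for m, s in ordered)

# Step 4: guess something never seen: February of the next year is month 14.
print(fits_all_year, predicted_sales(14))  # prints: True 14000
```

Twelve numbers have been replaced by two (slope and starting point) plus a rule, which is the compression the text describes. The final value is still a guess, not a memory.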
From all of these little factoids (presented in a way that I’m hoping helps you to build
at least the beginnings of a working conceptual model of your own brain) I’m hoping that
you are coming to realize that all of this is at least partially under your control! Even if
your instructor is scary or boring, the material at first glance seems dry and meaningless,
and so on – all the negative-neutral things that make learning difficult, you can decide to
make it fun and exciting, you can ferret out the meaning, you can adopt study strategies
that focus on the formation of cognitive maps and organizing structures first and then on
applications, rehearsal, factoids, and so on, you can learn to study right before bed, get
enough sleep, become aware of your brain’s learning biorhythms.
Finally, you can learn to increase your functional learning capabilities by a significant
amount. Solving puzzles, playing mental games, doing crossword puzzles or sudoku,
working homework problems, writing papers, arguing and discussing, just plain thinking
about difficult subjects and problems even when you don’t have to all increase your active
intelligence in initially small but cumulative ways. You too can increase the size of your
hippocampus, learn to engage your amygdala by choosing in a self-actualized way what
you value and learning to discipline your emotions accordingly, and create more conceptual
maps within your brain that can be shared as components across the various things you
wish to learn. The more you know about anything, the easier it is to learn everything and
vice versa! This is the pure biology underlying the value of the liberal arts education.
Use your whole brain, exercise it often, don’t think that you “just” need math and not
spatial relations, visualization, verbal skills, a knowledge of history, a memory of performing
experiments with your hands or mind or both – you need it all! Remember, just as is the
case with physical exercise (which you should get plenty of), mental exercise gradually
makes you mentally stronger, so that you can eventually do easily things that at first appear
insurmountably difficult. You can learn to learn three to ten times as fast as you did in high
school, to have more fun while doing it, and to gain tremendous reasoning capabilities
along the way just by trying to learn to learn more efficiently instead of continuing to use
learning strategies that worked (possibly indifferently) back in elementary and high school.
The next section, at long last, will make a very specific set of suggestions for one
very good way to study physics (or nearly anything else) in a way that maximally takes
advantage of your own volitional biology to make learning as efficient and pleasant as it is
possible to be.
By now in your academic career (and given the information above) it should be very appar-
ent just where homework exists in the grand scheme of (learning) things. Ideally, you attend
a class where a warm and attentive professor clearly explains some abstruse concept and
a whole raft of facts in some moderately interactive way that encourages engagement and
“being earnest”. Alas, there are too many facts to fit in short term/immediate memory and
too little time to move most of them through into long term/working memory before finish-
ing with one and moving on to the next one. The material may appear to be boring and
random so that it is difficult to pay full attention to the patterns being communicated and
remain emotionally enthusiastic all the while to help the process along. As a consequence,
by the end of lecture you’ve already forgotten many if not most of the facts, but if you were
paying attention, asked questions as needed, and really cared about learning the material
you would remember a handful of the most important ones, the ones that made your brief
understanding of the material hang (for a brief shining moment) together.
This conceptual overview, however initially tenuous, is the skeleton you will eventu-
ally clothe with facts and experiences to transform it into an entire system of associative
memory and reasoning where you can work intellectually at a high level with little effort
and usually with a great deal of pleasure associated with the very act of thinking. But you
aren’t there yet.
You now know that you are not terribly likely to retain a lot of what you are shown in
lecture without engagement. In order to actually learn it, you must stop being a passive re-
cipient of facts. You must actively develop your understanding, by means of discussing the
material and kicking it around with others, by using the material in some way, by teaching
the material to peers as you come to understand it.
To help facilitate this process, associated with lecture your professor almost certainly
gave you an assignment. Amazingly enough, its purpose is not to torment you or to be
the basis of your grade (although it may well do both). It is to give you some concrete
stuff to do while thinking about the material to be learned, while discussing the material
to be learned, while using the material to be learned to accomplish specific goals, while
teaching some of what you figure out to others who are sharing this whole experience while
being taught by them in turn. The assignment is much more important than lecture, as it
is entirely participatory, where real learning is far more likely to occur. You could, once you
learn the trick of it, blow off lecture and do fine in a course in all other respects. If you fail
to do the assignments with your entire spirit engaged, you are doomed.
In other words, to learn you must do your homework, ideally at least partly in a group
setting. The only question is: how should you do it to both finish learning all that stuff you
sort-of-got in lecture and to re-attain the moment(s) of clarity that you then experienced,
until eventually it becomes a permanent characteristic of your awareness and you know
and fully understand it all on your own?
There are two general steps that need to be iterated to finish learning anything at all.
They are a lot of work. In fact, they are far more work than (passively) attending lecture,
and are more important than attending lecture. You can learn the material with these steps
without ever attending lecture, as long as you have access to what you need to learn in
some media or human form. You in all probability will never learn it, lecture or not, without
making a few passes through these steps. They are:
a) Review the whole (typically lecture, textbooks and/or notes, the Internet, videos...)
b) Work on the parts (do homework, and otherwise try to use what you are learning
for something)
(iterate until you thoroughly understand whatever it is you are trying to learn).
Let’s examine these steps.
The first is pretty obvious. You generally don’t “get it” (where “it” is almost anything
nontrivial you are trying to learn) from one lecture, from reading one textbook one time.
There is too much material, and it doesn’t initially make sense to you. If you are lucky
and well prepared and blessed with a good instructor, perhaps you grasp some of it for
a moment (and if your instructor is poor or you are particularly poorly prepared you may
not manage even that) but what you do momentarily understand is fading, flitting further
and further away with every moment that passes. You need to review the entire topic,
as a whole, as well as all its parts. A set of good summary notes might contain all the
relevant factoids, but there are relations between those factoids – a temporal sequencing,
mathematical derivations connecting them to other things you know, a topical association
with other things that you know. They tell a story, or part of a story, and you need to know
that story in broad terms, not try to memorize it word for word.
Reviewing the material should be done in layers, skimming the textbook and your notes,
creating a new set of notes out of the text in combination with your lecture notes, maybe
reading in more detail to understand some particular point that puzzles you, reworking
a few of the examples presented. Lots of increasingly deep passes through it (starting
with the merest skim-reading or reading a summary of the whole thing) are much better
than trying to work through the whole text one line at a time and not moving on until you
understand it. Many things you might want to understand will only come clear from things
you are exposed to later, as it is not the case that all knowledge is ordinal, hierarchical,
and derivatory.
You especially do not have to work on memorizing the content. In fact, it is not desirable
to try to memorize content at this point – you want the big picture first so that
facts have a place to live in your brain. If you build them a house, they’ll move right in
without a fuss, where if you try to grasp them one at a time with no place to put them,
they’ll (metaphorically) slip away again as fast as you try to take up the next one. Let’s
understand this a bit.
As we’ve seen, your brain is fabulously efficient at storing information in a compressed
associative form. It also tends to remember things that are important – whatever that
means – and forget things that aren’t important to make room for more important stuff, as
your brain structures work together in understandable ways on the process. Building the
cognitive map, the “house”, is what it’s all about. But as it turns out, building this house
takes time.
This is the goal of your iterated review process. At first you are memorizing things
the hard way, trying to connect what you learn to very simple hierarchical concepts such
as this step comes before that step. As you do this over and over again, though, you
find that absorbing new information takes you less and less time, and you remember it
much more easily and for a longer time without additional rehearsal. Sometimes your
brain even outruns the learning process and “discovers” a missing part of the structure
before you even read about it! By reviewing the whole, well-organized structure over and
over again, you gradually build a greatly compressed representation of it in your brain and
tremendously reduce the amount of work required to flesh out that structure with increasing
levels of detail and remember them and be able to work with them for a long, long time.
Now let’s understand the second part of doing homework – working problems. As you
can probably guess on your own at this point, there are good ways and bad ways to do
homework problems. The worst way to do homework (aside from not doing it at all, which is
far too common a practice and a bad idea if you have any intention of learning the material)
is to do it all in one sitting, right before it is due, and to never again look at it.
Doing your homework in a single sitting, working on it just one time fails to repeat and
rehearse the material (essential for turning short term memory into long term in nearly
all cases). It exhausts the neurons in your brain (quite literally – there is metabolic energy
consumed in thinking) as one often ends up working on a problem far too long in one sitting
just to get done. It fails to incrementally build up in your brain’s long term memory the
structures upon which the more complex solutions are based, so you have to constantly
go back to the book to get them into short term memory long enough to get through a
problem. Even this simple bit of repetition does initiate a learning process. Unfortunately,
by not repeating the steps associated with the solution to this kind of problem after this one
sitting they soon fade, often without a discernable trace in long term memory.
Just as was the case in our experiment with memorizing the number above, the prob-
lems almost invariably are not going to be a matter of random noise. They have certain key
facts and ideas that are the basis of their solution, and those ideas are used over and over
again. There is plenty of pattern and meaning there for your brain to exploit in information
compression, and it may well be very cool stuff to know and hence important to you once
learned, but it takes time and repetition and a certain amount of meditation for the “gestalt”
of it to spring into your awareness and burn itself into your conceptual memory as “high
order understanding”.
You have to give it this time, and perform the repetitions, while maintaining an optimistic,
philosophical attitude towards the process. You have to do your best to have fun with it.
You don’t get strong by lifting light weights a single time. You get strong lifting weights re-
peatedly, starting with light weights to be sure, but then working up to the heaviest weights
you can manage. When you do build up to where you’re lifting hundreds of pounds, the
fifty pounds you started with seems light as a feather to you.
As with the body, so with the brain. Repeat broad strokes for the big picture with
increasingly deep and “heavy” excursions into the material to explore it in detail as the
overall picture emerges. Intersperse this with sessions where you work on problems and
try to use the material you’ve figured out so far. Be sure to discuss it and teach it to others
as you go as much as possible, as articulating what you’ve figured out to others both uses
a different part of your brain than taking it in (and hence solidifies the memory) and it helps
you articulate the ideas to yourself! This process will help you learn more, better, faster
than you ever have before, and to have fun doing it!
Your brain is more complicated than you think. You are very likely used to working hard
to try to make it figure things out, but you’ve probably observed that this doesn’t work very
well. A lot of times you simply cannot “figure things out” because your brain doesn’t yet
know the key things required to do this, or doesn’t “see” how those parts you do know fit
together. Learning and discovery are not, alas, “intentional” – they are more like trying to get a
bird to light on your hand that flits away the moment you try to grasp it.
People who do really hard crossword puzzles (one form of great brain exercise) have
learned the following. After making a pass through the puzzle and filling in all the words
they can “get”, and maybe making a couple of extra passes through thinking hard about
ones they can’t get right away, looking for patterns, trying partial guesses, they arrive at an
impasse. If they continue working hard on it, they are unlikely to make further progress, no
matter how long they stare at it.
On the other hand, if they put the puzzle down and do something else for a while –
especially if the something else is go to bed and sleep – when they come back to the puzzle
they often can immediately see a dozen or more words that the day before were absolutely
invisible to them. Sometimes one of the long theme answers (perhaps 25 characters long)
where they have no more than two letters just “gives up” – they can simply “see” what the
answer must be.
Where do these answers come from? The person has not “figured them out”, they have
“recognized” them. They come all at once, and they don’t come about as the result of a
logical sequential process.
Often they come from the person’s right brain23 . The left brain tries to use logic and
simple memory when it works on crossword puzzles. This is usually good for some words,
but for many of the words there are many possible answers and without any insight one
can’t even recall one of the possibilities. The clues don’t suffice to connect you up to a word.
Even as letters get filled in this continues to be the case, not because you don’t know the
word (although in really hard puzzles this can sometimes be the case) but because you
don’t know how to recognize the word “all at once” from a cleverly nonlinear clue and a few
letters in this context.
The right brain is (to some extent) responsible for insight and non-linear thinking. It
sees patterns, and wholes, not sequential relations between the parts. It isn’t intentional
– we can’t “make” our right brains figure something out, it is often the other way around!
Working hard on a problem, then “sleeping on it” (to get that all important hippocampal
involvement going) is actually a great way to develop “insight” that lets you solve it without
really working terribly hard after a few tries. It also utilizes more of your brain – left and right
brain, sequential reasoning and insight, and if you articulate it, or use it, or make some-
thing with your hands, then it exercises these parts of your brain as well, strengthening
the memory and your understanding still more. The learning that is associated with this
process, and the problem solving power of the method, is much greater than just working
on a problem linearly the night before it is due until you hack your way through it using
information assembled a part at a time from the book.
The following “Method of Three Passes” is a specific strategy that implements many
of the tricks discussed above. It is known to be effective for learning by means of do-
ing homework (or in a generalized way, learning anything at all). It is ideal for “problem
oriented homework”, and will pay off big in learning dividends should you adopt it, espe-
cially when supported by a group oriented recitation with strong tutorial support and many
opportunities for peer discussion and teaching.
23 Note that this description is at least partly metaphor, for while there is some hemispherical specialization
of some of these functions, it isn’t always sharp. I’m retaining the terms here (oh you brain specialists who might
be reading this) because they are a valuable metaphor.
Pass 1 Three or more nights before recitation (or when the homework is due), make a fast
pass through all problems. Plan to spend 1-1.5 hours on this pass. With roughly 10-
12 problems, this gives you around 6-8 minutes per problem. Spend no more than
this much time per problem and if you can solve them in this much time fine, otherwise
move on to the next. Try to do this the last thing before bed at night (seriously) and
then go to sleep.
Pass 2 After at least one night’s sleep, make a medium speed pass through all problems.
Plan to spend 1-1.5 hours on this pass as well. Some of the problems will already be
solved from the first pass or nearly so. Quickly review their solution and then move on
to concentrate on the still unsolved problems. If you solved 1/4 to 1/3 of the problems
in the first pass, you should be able to spend 10 minutes or so per problem in the
second pass. Again, do this right before bed if possible and then go immediately to
sleep.
Pass 3 After at least one night’s sleep, make a final pass through all the problems. Begin
as before by quickly reviewing all the problems you solved in the previous two passes.
Then spend fifteen minutes or more (as needed) to solve the remaining unsolved
problems. Leave any “impossible” problems for recitation – there should be no more
than three from any given assignment, as a general rule. Go immediately to bed.
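The time budget behind the three passes is simple arithmetic, and it can be checked with a short sketch. The function name `minutes_per_problem` and its parameters are hypothetical, introduced only for illustration; the session lengths, problem counts, and solved fractions are the ones quoted in the passes above.

```python
# Illustrative sketch only: the function name and parameters are assumptions.
def minutes_per_problem(session_hours, n_problems, already_solved=0):
    """Minutes available for each still-unsolved problem in one pass."""
    remaining = n_problems - already_solved
    if remaining <= 0:
        raise ValueError("no unsolved problems left to budget time for")
    return session_hours * 60.0 / remaining


# Pass 1: 1 to 1.5 hours spread over roughly 10 to 12 problems.
print(minutes_per_problem(1.0, 10))   # 6.0
print(minutes_per_problem(1.5, 12))   # 7.5

# Pass 2: same session length, with 1/4 to 1/3 of 12 problems already solved.
print(minutes_per_problem(1.5, 12, already_solved=3))   # 10.0
print(minutes_per_problem(1.5, 12, already_solved=4))   # 11.25
```

These numbers match the "6-8 minutes" and "10 minutes or so" quoted in the passes. The time spent reviewing already-solved problems is ignored here, so the second-pass figures are slightly optimistic.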
This is an extremely powerful prescription for deeply learning nearly anything. Here
is the motivation. Memory is formed by repetition, and this obviously contains a lot of
that. Permanent (long term) memory is actually formed in your sleep, and studies have
shown that whatever you study right before sleep is most likely to be retained. Physics is
actually a “whole brain” subject – it requires a synthesis of both right brain visualization and
conceptualization and left brain verbal/analytical processing – both geometry and algebra,
if you like, and you’ll often find that problems that stumped you the night before just solve
themselves “like magic” on the second or third pass if you work hard on them for a short,
intense, session and then sleep on it. This is your right (nonverbal) brain participating as it
develops intuition to guide your left brain algebraic engine.
Other suggestions to improve learning include working in a study group for that third
pass (the first one or two are best done alone to “prepare” for the third pass). Teaching
is one of the best ways to learn, and by working in a group you’ll have opportunities to
both teach and learn more deeply than you would otherwise as you have to articulate your
solutions.
Make the learning fun – the right brain is the key to forming long term memory and it
is the seat of your emotions. If you are happy studying and make it a positive experience,
you will increase retention, it is that simple. Order pizza, play music, make it a “physics
homework party night”.
Use your whole brain on the problems – draw lots of pictures and figures (right brain)
to go with the algebra (left brain). Listen to quiet music (right brain) while thinking through
the sequences of events in the problem (left brain). Build little “demos” of problems where
possible – even using your hands in this way helps strengthen memory.
Avoid memorization. You will learn physics far better if you learn to solve problems
and understand the concepts rather than attempt to memorize the umpty-zillion formulas,
factoids, and specific problems or examples covered at one time or another in the class.
That isn’t to say that you shouldn’t learn the important formulas, Laws of Nature, and all of
that – it’s just that the learning should generally not consist of putting them on a big sheet
of paper all jumbled together and then trying to memorize them as abstract collections of
symbols out of context.
Be sure to review the problems one last time when you get your graded homework
back. Learn from your mistakes or you will, as they say, be doomed to repeat them.
If you follow this prescription, you will have seen every assigned homework problem a
minimum of five or six times – three original passes, recitation itself, a final write up pass
after recitation, and a review pass when you get it back. At least three of these should
occur after you have solved all of the problems correctly, since recitation is devoted to
ensuring this. When the time comes to study for exams, it should really be (for once) a
review process, not a cram. Every problem will be like an old friend, and a very brief
review will form a seventh pass or eighth pass through the assigned homework.
With this methodology (enhanced as required by the physics resource rooms, tutors,
and help from your instructors) there is no reason for you to do poorly in the course and every
reason to expect that you will do well, perhaps very well indeed! And you’ll still be spending
only the 3 to 6 hours per week on homework that is expected of you in any college course
of this level of difficulty!
This ends our discussion of course preliminaries (for nearly any serious course you
might take, not just physics courses) and it is time to get on with actual material for this
course. The first “chapter” of course material is still placed in the Preliminaries section,
for good reason. The topic of this chapter is math, not physics. Math is a very important
component of learning physics, and while you can, and should, significantly improve your
math skills while taking physics and actually using them for something instead of drilling in
them, it is still quite important that you enter the course with a certain amount of compe-
tence. The following “Week 0” chapter, then, is intended both to help guide you in finding
resources for review and math support for the course and to include a self-contained de-
scription of a math topic only lightly touched on up to now – integrating over two and three
dimensional distributions in cartesian, cylindrical and spherical polar coordinates, the “big
three” as far as electricity and magnetism (and a lot of other physics!) are concerned.
Week 0: Math Needed for
Introductory E&M (and Optics)
Physics in general, as was noted in the preface, requires a solid knowledge of all mathe-
matics through calculus. Newton invented calculus so that he could invent physics. Yes,
there are “algebraic physics” textbooks out there, but I don’t think much of them, just as I
don’t think much of memorization of physics formulae (as opposed to learning them at a
much deeper level than memorization). To use a (possibly poor) metaphor – if mathemat-
ics is the language of physics, algebraic physics is the equivalent of memorizing a book of
phrases in a foreign language for a traveller who expects to spend only a couple of weeks
using it and then never use that language again. Of course some stuff will stick – a few
phrases here or there – and you may even remember in detail what those words mean.
You won’t be able to engage in an actual conversation in the language, however, even if
you can ask where the train station is, or order a beer and say please and thank you while
doing so because you remember those particular very important phrases.
To engage in a meaningful “conversation” in physics – in order to master both the con-
cepts and apply them in solving real problems to the point where you could read a scientific
or medical paper involving physics and not be utterly lost – you really do have to be com-
petent in a wide range of mathematics so that you can understand the physical principles
expressed in the language of mathematics, and understand the mathematical derivations
and relations that connect them and tell you what they mean. Meaning in physics is often
very difficult to convey in words, and requires a lot of them to make things precise, where a
mathematical formula that is linked to many other formulas in a meaningful way condenses
it all into a single short statement.
Here’s some of what you need to get started in introductory mechanics (the first semester
course):
a) A bit of Number Theory – what integers, rational numbers, irrational numbers, real
numbers, and complex numbers are, an understanding of their arithmetical opera-
tions, and how to represent them and these operations with symbols and manipulate
them using...
b) Algebra. Physics with calculus is still mostly algebra, it just includes the algebraic op-
erations not only of symbols representing physical quantities that can have numerical
values but those of...
c) Geometry. This includes traditional plane geometry – knowing various true facts
about opposite interior angles and the number of degrees/radians inside a triangle
to analytic geometry and trigonometry – coordinate systems and sine, cosine and
tangents. This is used mostly to understand...
d) Vectors. Many of the quantities of greatest interest in physics are vectors (or if you
prefer, tensors with a rank greater than zero). Finally, you use all of this in physics
problems ultimately based on...
e) Simple Differential and Integral Calculus. You don’t have to know a lot of calculus
to take the course. Five derivatives and their corresponding integrals will do (seven
if you want to master a few “advanced” topics). In addition to the derivatives them-
selves, you of course need to know the chain rule and its linked cousin, u-substitution,
and the derivative of a product and its linked cousin, integration by parts. Finally, you
need to be at least a little bit familiar with the Taylor series expansion of a smooth
function and the binomial expansion.
For second semester E&M and Optics, we still need all of this but we also need to add
a few items specific to E&M. The most important of these are:
f) Coordinate Systems. One can “get by” in mechanics with cartesian coordinates
plus the plane polar coordinate for 2D problems. Only a few problems emphasize
integration over more than one dimension, cylindrical symmetry, or spherical symmetry,
and that is mostly to prepare you for E&M. Here we will really need to understand
cylindrical coordinate frames and spherical polar coordinate frames because
many problems have precisely those symmetries. Spherical coordinates, especially,
are the “natural” coordinates for much of E&M (and for gravitation in the mechanics
semester as well, although this isn’t really emphasized beyond radial dependencies).
This leads us to...
g) Integration over Two and Three Dimensions. This sounds like multivariate cal-
culus, and of course it is, but the textbook itself only presents problems where the
integration effectively separates so integrating over (say) a three dimensional charge
density distribution is equivalent to doing three independent one-dimensional inte-
grals. In this way one can encounter and solve much more “interesting” and realistic
problems but still use nothing but one dimensional integrals covered on the One
Sheet Math Review pages available on the internet24 .
Let’s go over the crucial new math content (only) and take a quick look at cartesian,
cylindrical, and spherical polar coordinate systems and see how to set up separable in-
tegrals like the ones you will encounter in this course (with two examples that are not
separable, one easily enough solvable, the other very difficult indeed to solve, so you can
see what the fuss is about and why – eventually – you might have to take multivariate
calculus to go on in physics as a major or tackle similarly multivariate problems in other
disciplines).
24
http://www.phy.duke.edu/~rgb/Class/one-sheet-math-review.php As noted above, students are strongly
encouraged to download and print out a copy of these review sheets and stick them in their notes for quick
reference, although their general content will be covered in this chapter.
Before we dig in to the electric field produced by continuous charge distributions, we need
to review the three most useful coordinate systems, or frames, that we will use to evaluate
those fields. Up to now we have mostly used ordinary cartesian coordinates (x, y) or
(x, y, z), sometimes used plane polar coordinates (r, θ), and in the first semester course
have even used the radial coordinate r of spherical polar coordinates (r, θ, φ) or cylin-
drical coordinates (r, θ, z), but only in contexts where the other two coordinates don’t
really matter.
This will no longer do. We will be doing integrals and expressing relations in all three
of these coordinate frames! This means that we have to both know what the coordinates
are, and be able to go back and forth between the coordinate systems and do calculus
– specifically integrating over continuous charge distributions – using them! We will stop
far short of invoking the full power and range of multivariate calculus – the integrals we
will ultimately do will still be one-dimensional integrals over one coordinate at a time with
each coordinate integral fully independent of the others (which is why you don’t need to
have taken multivariate calculus to take this course). We will therefore develop concepts
like “length elements”, “area elements”, and “volume elements” informally at a level suffi-
cient to support their intelligent use in this course, without the use of partial derivatives or
jacobians.
Let’s start with the easiest and most familiar of the three: cartesian coordinates!
[Figure: axes +x, +y, +z; the point P = (x, y, z) with its coordinates x, y, and z marked]
Figure 0.1: The classic rectilinear, orthogonal cartesian coordinate system, with a point
P = (x, y, z) illustrated.
In figure 0.1 the well-known cartesian coordinate frame is used to represent the position
in 3-dimensional space of a single point, P = (x, y, z). This point P can also be thought of
as the tip of the position vector from the origin of the particular coordinate frame to the point
P:
$$\vec{r} = x\hat{x} + y\hat{y} + z\hat{z} \quad \text{(or)} \quad \vec{r} = x\hat{i} + y\hat{j} + z\hat{k} \tag{0.1}$$
(depending on how you learned to represent your unit vectors in the x, y, and z directions).
Since we use i, j, and k for many things in physics and engineering – the imaginary unit,
currents, constants such as the spring constant or electric and magnetic field constants –
I prefer to use x̂, ŷ, ẑ as the unit vectors in the three orthogonal directions.
If you are taking this course, you have learned to do one dimensional integrals – inte-
grals along lines of functions expressed in linear coordinates, and if you took the preceding
mechanics course you were required to integrate over mass distributions in one or two
dimensions to evaluate centers of mass or moments of inertia.
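[Figure 0.2: the cartesian differential elements drawn as finite-sized chunks ∆x, ∆y, and ∆z at the point (x, y, z), with the area element ∆A = ∆x∆y and the volume element ∆V = ∆x∆y∆z.]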
In figure 0.2 the differential elements in cartesian coordinates are displayed as finite-
sized “chunks” of each coordinate. In your mind you should look at e.g. the ∆x as a finite
chunk from x to x + ∆x on the x axis, and mentally shrink it (at the end) to “differential”
sized – dx. In this way we can see the length, area, and volume elements as actual
(small) lengths, areas, and volumes where of course the ‘size’ of an infinitesimal is, well,
infinitesimal, a point. It’s hard to visualize points squared or points cubed, but easy to
visualize smaller and smaller square areas or cubic volumes.
From this figure, you can immediately see:
Three length elements: ∆x ⇒ dx, ∆y ⇒ dy, ∆z ⇒ dz. Using these three differential
lengths, we can integrate functions of x (but not y or z), y (but not x or z), and z (but
not x or y).
One (of three) area elements: ∆A = ∆x∆y ⇒ dA = dx dy. Note that there are area el-
ements for each of the orthogonal planes: dA = dx dz and dA = dy dz allow surfaces
parallel to the x-z plane or y-z plane to be integrated over as well, again assuming
that we have no dependence on the omitted coordinate in each case.
This is pretty obvious if you’ve taken calculus at all, so I’ll just write down a few trivial
examples. A power law integral:
$$\int_{x_1}^{x_2} a x^2\,dx = \left.\frac{a x^3}{3}\right|_{x_1}^{x_2} = \frac{a}{3}\left(x_2^3 - x_1^3\right) \tag{0.2}$$
In order for us to proceed, we have to assume that the x-coordinate itself appearing in
a function does not depend on the y-coordinate. Usually this is true in the expression of
the function itself in x and y coordinates, but it may well not be true at the boundary of
integration.
For example, it is very easy to find the area inside a rectangle that runs from (0, 0) to
(a, 0), from (a, 0) to (a, b), from (a, b) to (0, b), and finally from (0, b) to (0, 0):
$$A_{\rm rectangle} = \int_{\rm rectangle} dx\,dy = \int_0^a dx \int_0^b dy = ab \tag{0.5}$$
It is a bit more difficult, but still pretty easy, to integrate over a right triangle in cartesian
coordinates provided one lines the sides up with the coordinate system. For a triangle that
runs from (0, 0) to (a, 0), then to (a, b), then back to (0, 0) one can determine its area using
the function y(x) = (b/a) x as its upper limit of integration in y. Then:
$$A_{\rm triangle} = \int_{\rm triangle} dx\,dy = \int_0^a dx \int_0^{y(x)} dy = \int_0^a y(x)\,dx = \frac{b}{a}\int_0^a x\,dx = \frac{1}{2}\frac{b}{a}a^2 = \frac{1}{2}ab \tag{0.6}$$
as expected. The complication of this integral is that it no longer separates into two inde-
pendent integrals – the x-integral depends on doing the y-integral first, and the result of
the integration is y as a function of x and not a (possibly dimensioned) number.
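A quick numerical cross-check can make the order of integration concrete. The following is a minimal Python sketch, not part of the course materials; the helper name `midpoint` and the sample sides a = 3, b = 4 are assumptions chosen purely for illustration. It does the y-integral first, up to the limit y(x) = (b/a) x, and only then the x-integral:

```python
def midpoint(f, lo, hi, n=400):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def triangle_area(a, b):
    """Area of the right triangle (0,0), (a,0), (a,b) by nested integration."""
    def inner(x):
        # The y-integral of 1 from 0 to y(x) = (b/a) x; its result depends on x.
        return midpoint(lambda y: 1.0, 0.0, (b / a) * x)

    # The x-integral of the inner result from 0 to a.
    return midpoint(inner, 0.0, a)


# Assumed sample sides, for illustration only.
area = triangle_area(3.0, 4.0)
print(area)  # should agree with a*b/2 up to rounding
```

The inner function returns a value that still depends on x, which is exactly why this integral does not split into two independent pieces.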
However, it is rather absurdly difficult to integrate the area under (say) a semicircle in
cartesian coordinates. Here’s what it looks like. Remember that the cartesian formula for
a circle of radius R is R² = x² + y². If we try what we just did for a triangle, and solve this
for y(x) = +√(R² − x²):
$$A_{\rm semicircle} = \int_{\rm semicircle} dx\,dy = \int_{-R}^{R} dx \int_0^{y(x)} dy = \int_{-R}^{R} y(x)\,dx = \int_{-R}^{R} (R^2 - x^2)^{\frac{1}{2}}\,dx \tag{0.7}$$
and we’re stuck. There is no simple u-substitution to give us the answer! This is the kind
of nasty integral that requires both integration by parts and trigonometric substitution to
evaluate25 . Assuming, I think safely enough, that most of my readers cannot evaluate this
integral without help, they are left with the somewhat unsatisfying method called looking up
the answer (nowadays, on the internet). While remarkably efficient, this is hardly useful on
a quiz or exam unless you are presented with an integral table along with the exam, and
besides, at some point this stops testing your understanding of basic physics and starts
simply connecting your grade to how good you are at moderately advanced calculus!
In a bit, we’ll do this integral another way entirely that makes it trivial to evaluate – at
the cost of changing coordinate systems to one where the boundaries of integration are
“natural” ones and the integral separates into two independent integrals. However, just to
finish the example, if we use the look-it-up method, we continue to get:
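$$A_{\rm semicircle} = \int_{-R}^{R} (R^2 - x^2)^{\frac{1}{2}}\,dx = \left[\frac{x}{2}\sqrt{R^2 - x^2} + \frac{R^2}{2}\sin^{-1}\left(\frac{x}{R}\right)\right]_{-R}^{R} = 2 \times \frac{R^2}{2}\,\frac{\pi}{2} = \frac{1}{2}\pi R^2 \tag{0.8}$$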
or half the area inside a circle of radius R, as expected. But ouch!
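If you would rather trust a computer than an integral table, the same area can be estimated numerically. This is a rough Python sketch under stated assumptions: the helper `midpoint` and the sample radius R = 2 are mine, not the text's, and the midpoint rule is just one simple way to approximate a one dimensional integral.

```python
import math


def midpoint(f, lo, hi, n=100000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


R = 2.0  # assumed sample radius, for illustration only

# Integrate y(x) = sqrt(R^2 - x^2) from -R to R, as in the cartesian setup.
area = midpoint(lambda x: math.sqrt(R * R - x * x), -R, R)
exact = 0.5 * math.pi * R ** 2
print(area, exact)  # the two numbers should be close
```

Note that this only confirms the number; it does nothing to make the cartesian integral easier to do by hand.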
We’ll conclude with an integral over a function – one we might shortly consider to be
a surface charge distribution function – of x and y inside the rectangular boundary used
in the first example. The result of such an integral is the total charge of the surface, so
it means something useful physically. Let dQ/dA = σ(x, y) = σ0 xy. Then dQ = σ dA =
σ(x, y) dx dy and:
$$Q = \int dQ = \sigma_0 \int_0^a\int_0^b xy\,dx\,dy = \sigma_0 \int_0^a \left.\frac{x y^2}{2}\right|_0^b dx = \sigma_0 \frac{b^2}{2}\int_0^a x\,dx = \sigma_0 \frac{a^2 b^2}{4} \tag{0.9}$$
25
https://www.emathzone.com/tutorials/calculus/integration-of-square-root-of-a2-x2.html As of the time of
my writing this, this page had a fairly concise listing of the steps used to evaluate this integral, all ten or
twelve of them. This course does not require integration at this level of skill!
This integral still separates! The result is the product of an integral over x and an integral
over y! We will learn to write this sort of integral more or less automatically as:
$$Q = \sigma_0 \int_0^a x\,dx \int_0^b y\,dy = \sigma_0\,\frac{a^2}{2}\,\frac{b^2}{2} = \sigma_0 \frac{a^2 b^2}{4} \tag{0.10}$$
as it is now the product of two trivial integrals, ones we can do in our heads!
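Separability is easy to see numerically as well. The sketch below (Python, with an assumed helper `midpoint` and assumed sample values for σ0, a, and b) computes the total charge once as a brute-force nested integral and once as a product of two one dimensional integrals, and compares both with σ0 a²b²/4:

```python
def midpoint(f, lo, hi, n=200):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Assumed sample values, for illustration only.
sigma0, a, b = 2.0, 1.5, 0.5

# Nested form: for each x, integrate sigma0*x*y over y, then integrate over x.
q_nested = midpoint(
    lambda x: midpoint(lambda y: sigma0 * x * y, 0.0, b), 0.0, a
)

# Separated form: a product of two independent one dimensional integrals.
q_product = sigma0 * midpoint(lambda x: x, 0.0, a) * midpoint(lambda y: y, 0.0, b)

q_closed = sigma0 * a ** 2 * b ** 2 / 4
print(q_nested, q_product, q_closed)
```

The nested version needs n × n function evaluations while the separated version needs only 2n, which is a small hint of why separable integrals are so much more pleasant.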
By this point, you should be ready for an example that uses exactly the kind of reasoning
that will suffice for nearly all of the integrals we will need to do in this course. We may
still need to do some work (largely, choosing the right coordinate frame and doing some
math to express the problem in that frame) to get integrals that separate, but once we do
they are all going to be 1-3 one dimensional, independent integrals, easily done using the
methods of ordinary calculus covered on the One Sheet Math Review pages provided for
the course26 .
Separable volume integrals in cartesian coordinates must therefore have rectilinear
boundaries – for example from (0, 0, 0) to (a, b, c) enclosing a total volume of abc – and
must be over functions or distributions that factor into a function of x alone times a function of y alone times a function of z alone.
We’ll do a single example, just like the previous one – we’ll find the total charge in exactly
this volume given a volume charge distribution:
$$\rho(x, y, z) = \frac{dQ}{dV} = \rho_0\,e^{-\kappa x}\cos(ky)\,z^2 \tag{0.11}$$
which corresponds to no realistic physics problem I can think of but which does let us
practice writing down rectilinear separable integrals and doing them:
$$Q = \int dQ = \int \rho\,dV = \int\!\!\int\!\!\int \rho(x, y, z)\,dx\,dy\,dz = \rho_0 \int_0^a e^{-\kappa x}\,dx \int_0^b \cos(ky)\,dy \int_0^c z^2\,dz \tag{0.12}$$
I get:
$$Q = \frac{\rho_0}{3\kappa k}\left(1 - e^{-\kappa a}\right)\sin(kb)\,c^3 \tag{0.13}$$
How about you?
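One way to check your answer is to evaluate the three one dimensional integrals numerically and compare their product with the closed form, keeping the overall factor ρ0. This is only a sketch: the helper `midpoint` and every numerical value below are assumptions picked for illustration.

```python
import math


def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Assumed sample parameters, for illustration only.
rho0, kappa, k = 1.0, 0.7, 1.3
a, b, c = 2.0, 1.0, 1.5

# Three independent one dimensional integrals, one per coordinate.
ix = midpoint(lambda x: math.exp(-kappa * x), 0.0, a)
iy = midpoint(lambda y: math.cos(k * y), 0.0, b)
iz = midpoint(lambda z: z * z, 0.0, c)
q_numeric = rho0 * ix * iy * iz

# Closed form: rho0 (1 - e^{-kappa a}) sin(k b) c^3 / (3 kappa k).
q_closed = (
    rho0 * (1 - math.exp(-kappa * a)) * math.sin(k * b) * c ** 3 / (3 * kappa * k)
)
print(q_numeric, q_closed)
```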
In figure 0.3 the cylindrical coordinate system is illustrated, including a typical point P =
(r, φ, z). The angle φ is swept out, by convention, counterclockwise (or in the ẑ direction
using the right hand rule in this right handed coordinate frame) from the positive x-axis.
r is the perpendicular distance of P from the z axis – the radius of the cylinder on which P
lies. z is the usual cartesian z component of P .
26
http://www.phy.duke.edu/~rgb/Class/one sheet math review.php
[Figure: axes +x, +y, +z; the point P = (r, φ, z) with r, φ, and its cartesian components x and y marked]
Figure 0.3: Cylindrical coordinates are basically plane polar coordinates in the x-y plane,
plus the regular cartesian z-axis perpendicular to that plane. The point P = (r, φ, z) can
be found by rotating a point at x = r around z in the z direction (RHR) through the angle φ
and then lifting it straight up a distance z.
An important note about the symbols chosen: Note well, some math and physics
textbooks (more math than physics) use θ instead of φ for the azimuthal angle in cylindrical
coordinates. Other textbooks, both math and physics, may well use ρ instead of r (to avoid
colliding with r as defined for spherical polar coordinates, covered next). I cannot guess
the symbols you, dear reader, might have used in the course(s) you took that hopefully
covered coordinate systems, so please take a few minutes now to make sure you know
what we will use in this course.
We will not use ρ as the radius of the cylinder, as ρ is already getting plenty of a workout
in the textbook as both a charge density ρ and as conductivity ρc of a material, and we
would rather not have to make sense of expressions like ρ(ρ, φ, z). We will use φ for the
azimuthal angle around the z-axis because it then matches the same azimuthal angle
used in spherical polar coordinates most commonly used in physics (as opposed to math)
textbooks and research papers. However, nearly everybody learns plane polar coordinates
as (r, θ), not (r, φ) or (ρ, θ), in high school because of those pesky math textbooks, so I’m
left trading off the possibility of confusion now against the certainty of confusion later for
some students no matter what I do.
Defining the angle to be φ fortunately agrees with the Wikipedia article on the subject27,
but that article does use ρ instead of r, which we will not do for the reason given above. I think
that the spherical vs cylindrical contexts of physics problems are so obviously different
that less confusion will result than that which can follow from overloading the symbol
ρ. Note well, Wikipedia itself isn’t even consistent on this – it uses (r, φ) for plane polar
coordinates, which (again, fortunately) corresponds with our usage, allowing us to define
cylindrical coordinates as “Wikipedia’s plane polar coordinates plus a z-axis!” If you want to
have a better idea of the confusion and lack of consistency among even primary University
mathematical physics textbooks, there is a nice table in Wolfram’s Mathworld article on the
subject28.
27
Wikipedia: http://www.wikipedia.org/wiki/Cylindrical coordinate system. I highly recommend that you look
at this article while reading this section, especially if you have had multivariate calculus.
28
https://mathworld.wolfram.com/CylindricalCoordinates.html This table still doesn’t do the subject justice.
If math is the language of physics and engineering, then there are distinct dialects of math used in different
contexts and by different authors, which can easily lead to confusion if you aren’t aware of it.
[Figure 0.4: the cylindrical differential elements drawn as finite-sized chunks ∆r, ∆φ, and ∆z at the point (r, φ, z), with the arc length ∆s = r∆φ, the area element ∆A = r∆φ∆r, and the volume element ∆V = r∆r∆φ∆z.]
In figure 0.4 the differential elements in cylindrical coordinates are displayed as finite-sized
“chunks” of each coordinate (as before). Again we can think of these as going to
the point P and then swinging around the z-axis through an angle ∆φ to create a small
arc of length ∆s = r∆φ as one length element, then “pushing out” that arc by a second
length element ∆r to make an area element ∆A = r∆φ∆r, and then lifting this surface up
a length element ∆z in the z-direction to make a volume element ∆V = r∆r∆φ∆z.
We will summarize this as:
One volume element: ∆V = ∆r∆s∆z ⇒ dV = rdr dφ dz. We can view this as either
dA in the polar plane times dz or as the area dA on the cylinder pushed out by the
distance dr.
Here is an example we will frequently encounter in Gauss’s Law problems with cylindrical
symmetry. The electric field flux through a tiny chunk of the surface of a cylinder of
radius r in these problems turns out to be dΦe = Er dA where Er is constant everywhere
on the cylinder, so all we have to do to find the total flux is evaluate A = ∫ dA for a cylinder
of radius r and length (say) ℓ.
While we can do this one in our heads (see below) let’s do it “the hard way” with explicit
integration. We need to integrate:
$$dA\ (= dA_2) = r\,d\phi\,dz$$
where φ goes from 0 to 2π and z from 0 to ℓ. Even the hard way is trivial:
$$A = \int dA = \int_0^{2\pi}\int_0^{\ell} r\,d\phi\,dz = r\int_0^{2\pi} d\phi \times \int_0^{\ell} dz = 2\pi r\ell \tag{0.16}$$
This is obviously correct! It is the area you would measure if you imagine that the area
in question is the area of the label of a soup can with radius r and height ℓ! We can use
our “mental scissors” to snip this label right off of the can and unroll it into a rectangle with
one side equal to 2πr (the circumference of the can) and the other equal to ℓ in length. The
area is then just width times height or A = 2πrℓ!
We don’t have a lot of occasions to integrate over dA1 or dA2 explicitly, but dA1 is the most
convenient one to use to form the (one) volume element so I gave it precedence in figure
0.4 above.
Again we can get this result heuristically easily enough, but let’s integrate it the hard way
and use the result to inspire a more difficult example following. To find the volume of a
cylinder of radius R and height H, we start with:
dV = r dr dφ dz (0.17)
The integral separates, as the limits of integration of each coordinate are not functions of the
other two, and because there is no other function inside which one coordinate is a function
of the others. Hence:
$$V = \int_0^R r\,dr \times \int_0^{2\pi} d\phi \int_0^H dz = \frac{R^2}{2} \times 2\pi \times H = \pi R^2 H \tag{0.19}$$
Again this is obviously the area of the base of the cylinder times its height, so we
worked way harder than we had to. Note that two of these integrals are evaluating the area
of the base as ∫ dA1 = πR² above; our example includes finding the area inside a circle!
This is a tricky one! Indeed, it is the cylindrical equivalent of finding the area of a right
triangle, a non-separable example! To find the volume of a right circular cone of radius R
and height H, we will mentally place its apex at the origin and its base at the height H.
This may seem strange at first, but it makes it very easy to find the radius of the cone as a
function of z.
$$r(z) = \frac{R}{H}\,z$$
To see this, note that when z = 0, r = 0 (apex at the origin) and when z = H, r = R
(circular base at height H). The volume is then the integral of dV as before, but the upper
limit of integration for dr is a function of z!
We need to do the integrals in a certain order, then, and distinguish r(z), the limit of
integration at height z, from r′, a variable we integrate over. Let’s write it down, as that
makes it clear enough:
$$V = \int dV = \int_0^H\int_0^{r(z)}\int_0^{2\pi} r'\,dr'\,dz\,d\phi = \int_0^H dz \times \left(\int_0^{r(z)} r'\,dr'\right) \times \int_0^{2\pi} d\phi \tag{0.20}$$
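The first (φ) integral just gives us 2π. The next two are tricksy! Let’s do the r′ integral
next:
$$\int_0^{r(z)} r'\,dr' = \left.\frac{r'^2}{2}\right|_0^{r(z)} = \frac{r(z)^2}{2} = \frac{R^2}{2H^2}\,z^2$$
and then integrate that result over z:
$$\int_0^H \frac{R^2}{2H^2}\,z^2\,dz = \frac{R^2}{2H^2} \times \frac{H^3}{3}$$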
Note that this one integral is the result of both of the remaining integrals – we had to
integrate (over z) the result of integrating over r ′ because it was a function of z.
We combine all of the pieces to get:
$$V = 2\pi \times \frac{R^2}{2H^2} \times \frac{H^3}{3} = \frac{\pi R^2 H}{3} \tag{0.21}$$
You might even remember that this is the correct answer from a previous geometry or
calculus class – the volume of a right circular cone is a third of the volume of the cylinder
with the same radius and height!
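The order of integration for the cone can also be traced numerically. In this hedged Python sketch (the helper `midpoint`, the function name `cone_volume`, and the sample dimensions are my own assumptions), the r′ integral is done first for each height z, with upper limit r(z) = (R/H) z, and its result is then integrated over z:

```python
import math


def midpoint(f, lo, hi, n=400):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def cone_volume(R, H):
    """Volume of a right circular cone, apex at the origin, base at z = H."""
    def radial(z):
        r_max = (R / H) * z                          # limit r(z) depends on z
        return midpoint(lambda rp: rp, 0.0, r_max)   # integral of r' dr'

    # The phi integral contributes 2*pi; the z integral is done last.
    return 2 * math.pi * midpoint(radial, 0.0, H)


# Assumed sample dimensions, for illustration only.
R, H = 1.5, 4.0
v_cone = cone_volume(R, H)
v_cylinder = math.pi * R ** 2 * H
print(v_cone, v_cylinder / 3)  # these should be close
```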
Finally, we’ll do one non-trivial separable example of integrating a radial volume charge
density distribution to find the total charge in a cylinder of radius R and height H. Suppose
the charge density varies only with the radius r such that $\rho(r, \phi, z) = \frac{A}{r}\,e^{-\kappa r}$ where A
has appropriate dimensions to make the volume integral have units of charge. This is
a physically plausible model of charge that is concentrated along the z axis but dies off
exponentially the further you are away from the axis. Then (letting u = −κr, du = −κ dr
and recalling that dV = r dr dφ dz):
$$\begin{aligned}
Q &= \int_{\rm cylinder} \rho(r, \phi, z)\,dV \\
  &= \int_0^R\int_0^{2\pi}\int_0^H \frac{A}{r}\,e^{-\kappa r}\,(r\,dr\,d\phi\,dz) \\
  &= A\int_0^R e^{-\kappa r}\,dr \int_0^{2\pi} d\phi \int_0^H dz = A\int_0^R e^{-\kappa r}\,dr\ [2\pi]\,[H] \\
  &= -\frac{2\pi H A}{\kappa}\int_0^{-\kappa R} e^{u}\,du = \frac{2\pi H A}{\kappa}\left(1 - e^{-\kappa R}\right)
\end{aligned} \tag{0.22}$$
In figure 0.5 the spherical coordinate system is illustrated, including a typical point P =
(r, θ, φ). In this coordinate frame, r is the distance of the point P from the origin – basically
the radius of the sphere that contains P . The remaining two angles are used to locate P
on the surface of that sphere. If you start with a point at the “north pole”, where z = r, and
then rotate the point around the y axis in the direction of the x axis by the angle θ ∈ [0, π],
you locate the circle concentric to the z-axis on which the point P lies. One then rotates
the point azimuthally around the z axis through the angle φ (counterclockwise, or “in the z
direction” according to the right-hand rule) to end up with it as the point P .
[Figure: axes x, y, z; the point P = (r, θ, φ) with r, θ, and φ marked, and the radius r sin θ of its projection onto the x-y plane]
Figure 0.5: Spherical polar coordinates represent an arbitrary point as P = (r, θ, φ).
Note that φ is the same as in cylindrical coordinates when the coordinates of point P are
projected onto the x-y plane, but r is quite different. θ is the angle between the positive
z-axis and the line from the origin to the point P .
Alternatively (as figure 0.5 most clearly illustrates), you can visualize projecting the
point P into the x-y plane so that φ is the usual plane polar angle of the projection, but the
“cylindrical” radius of its projection is now r sin θ. From the picture it is also obvious that
z = r cos θ, that r = √(x² + y² + z²), and so on, see below.
An important note about the symbols chosen: Some math and physics textbooks
(more math than physics) switch θ and φ. Other textbooks, both math and physics, may
well use ρ instead of r (to avoid colliding with r as defined for cylindrical coordinates, the
same collision as before but in the other direction). The convention I’m using matches that used in the Wikipedia
article on spherical coordinates29 and again there is a table in Wolfram’s related Mathworld
article30 that is even longer than the table for cylindrical coordinates!
As before, there are unit vectors (r̂, θ̂, φ̂) defined in spherical polar coordinates, but working
with them is again beyond our scope because the unit vectors themselves are functions of the
coordinates. We will just give the spherical polar components $\vec{A} = (A_r, A_\theta, A_\phi)$
of a vector when necessary and avoid expressing e.g. r̂ in cartesian
components or as a function of θ and φ as it will not be needed at this time.
29
Wikipedia: http://www.wikipedia.org/wiki/Spherical coordinate system. I highly recommend that you look
at this article while reading this section, especially if you have had multivariate calculus.
30
https://mathworld.wolfram.com/SphericalCoordinates.html Sheesh! Seven of them!
From the picture in figure 0.5 we can easily see how to go back and forth between
cartesian and spherical polar coordinates:
$$x = r\sin\theta\cos\phi \qquad y = r\sin\theta\sin\phi \qquad z = r\cos\theta \tag{0.23}$$
and:
$$r = \left(x^2 + y^2 + z^2\right)^{\frac{1}{2}} \qquad \phi = \tan^{-1}\left(\frac{y}{x}\right) \qquad \theta = \cos^{-1}\left(\frac{z}{(x^2 + y^2 + z^2)^{\frac{1}{2}}}\right) \tag{0.24}$$
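The relations in equation (0.24), together with the projections z = r cos θ and r sin θ read off figure 0.5, can be sketched as a pair of Python functions. The function names are hypothetical, and using `math.atan2` in place of a bare arctangent of y/x is my own choice (it picks the correct quadrant automatically), not something the text prescribes.

```python
import math


def spherical_to_cartesian(r, theta, phi):
    """(r, theta, phi) -> (x, y, z); theta is measured from +z, phi from +x."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (r, theta, phi); assumes the point is not the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)   # polar angle in [0, pi]
    phi = math.atan2(y, x)     # quadrant-aware version of arctan(y/x)
    return r, theta, phi


# Assumed sample point, for illustration only.
point = spherical_to_cartesian(2.0, 0.6, 1.1)
round_trip = cartesian_to_spherical(*point)
print(point, round_trip)
```

One caveat: `math.atan2` returns angles in (−π, π], so points with y < 0 come back with a negative φ rather than one between π and 2π; add 2π if you want the latter convention.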
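[Figure: a volume element at the point (r, θ, φ) with edge lengths dr, r dθ, and r sin θ dφ, the projected radius r sin θ, and dV = r² sin θ dφ dθ dr]
Figure 0.6: The differential elements appropriate to spherical polar coordinates, expressed
directly as differentials (d’s).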
In figure 0.6 the differential elements in spherical polar coordinates are displayed as finite-sized
“chunks” of each coordinate, but this time I didn’t bother to label them as ∆’s, I
went straight to differential form with d’s. The easiest way to visualize and remember the
elements is to go to the point P = (r, θ, φ) and rotate it out from the z-axis through
an angle dθ on the great circle of radius r to create a small arc of length r dθ as one
length element. The second length element is obtained by again starting at the point P
and rotating it around the z-axis by the angle dφ to make a length element r sin θ dφ (note
the length is obtained from the projection of this arc onto the x-y plane as it lies in a plane
parallel to this plane). The third length element comes from “pushing out” P radially by dr.
We can then assemble three area elements by taking these orthogonal length elements
two at a time, and one volume element from the product of all three:
We will summarize this as:
Three area elements: dA1 = r dr dθ, dA2 = r sin θ dr dφ, and dA3 = r² sin θ dθ dφ. Of
these three, dA3 is the most important/useful (see example below).
One volume element: dV = r² sin θ dr dθ dφ (basically, the area element dA3 pushed
out by a radial distance dr to form our differentially rectilinear volume).
We can use the differential elements described in detail above to easily find e.g. the surface
area of a sphere of (fixed) radius R as follows. We start with the area element of the
constant r = R spherical surface (dA3 above) and integrate the θ and φ contributions
independently. This seems as if it is pretty trivial (and it is) but note well the trick I use to do
the θ integral as it will often be the only or best way to proceed for more difficult spherical
integrals that involve θ explicitly!
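So:
$$dA = R^2 \sin\theta\,d\theta\,d\phi$$
$$A = \int dA = \int_0^{\pi}\int_0^{2\pi} R^2 \sin\theta\,d\theta\,d\phi$$
$$A = R^2 \int_0^{\pi} \sin\theta\,d\theta \times \int_0^{2\pi} d\phi$$
$$A = 2\pi R^2 \int_0^{\pi} \sin\theta\,d\theta \tag{0.25}$$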
Note Well that the limits of the θ integral are 0 to π, not 2π. If you integrate both θ and φ
from 0 to 2π you will double count every point (look until you see why).
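At this point we could easily do the remaining integral using the standard formula to get
“2” as the result, but instead I’m going to use u-substitution. This is overkill for this integral,
but in the next example, it will be the only game in town! Let:
$$u = \cos\theta \qquad du = -\sin\theta\,d\theta$$
so that u runs from 1 (at θ = 0) to −1 (at θ = π). Then:
$$A = 2\pi R^2 \int_0^{\pi} \sin\theta\,d\theta = 2\pi R^2 \int_{1}^{-1} (-du) = 2\pi R^2 \int_{-1}^{1} du$$
$$A = 4\pi R^2 \tag{0.28}$$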
To conclude: A very useful thing to remember is that the double integral over θ and φ
by themselves for a sphere is 4π. This is called “the integral over the solid angle of the
sphere”. The units in this integral are dimensionless (angles are always dimensionless)
but just as planar angles are expressed in dimensionless radians in the SI, solid angles
are expressed in the SI as “stereo radians”, or steradians. The area of a sphere is thus:
$$A = R^2 \times (4\pi\ \text{steradians}) = 4\pi R^2$$
where we simply drop the “steradians” in the final expression as they are dimensionless!
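The 4π of solid angle is easy to confirm numerically, both in the θ form and in the u = cos θ form. The following Python sketch assumes a hypothetical helper `midpoint` and a sample radius; it is an illustration, not part of the course materials.

```python
import math


def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Theta form: integral of sin(theta) d(theta) from 0 to pi (should be 2).
theta_part = midpoint(math.sin, 0.0, math.pi)

# u form: with u = cos(theta) the same integral becomes the integral of du
# from -1 to 1, which is trivially 2.
u_part = midpoint(lambda u: 1.0, -1.0, 1.0)

solid_angle = theta_part * 2 * math.pi   # should be close to 4*pi

R = 3.0  # assumed sample radius, for illustration only
area = R ** 2 * solid_angle
print(theta_part, u_part, solid_angle, area)
```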
It isn’t always the case that you integrate only constants over the surface of a sphere.
Sometimes you have to integrate a function of θ! Arbitrary functions of θ can easily be
challenging, but a very common case you will encounter (more in later courses than in this
one, but a bit even in this one) is integrating a function of not θ, but cos θ, over the surface
of a sphere. This example embraces integrals over sin θ as well, because one can express
sin θ in terms of cos θ.
Let’s pick a case you are actually likely to encounter sooner or later. Suppose:
$$f(\theta, \phi) = \sin^2\theta$$
over the spherical surface with (constant) radius R. We immediately use the u-substitution
trick illustrated above to convert this into a simple one dimensional integral. Recall u =
cos θ, du = − sin θ dθ, and observe that
$$\sin^2\theta = 1 - \cos^2\theta = 1 - u^2$$
so that
$$\int f\,dA = R^2 \int_0^{\pi} \sin^3\theta\,d\theta \int_0^{2\pi} d\phi = 2\pi R^2 \int_{-1}^{1} \left(1 - u^2\right) du = \frac{8\pi R^2}{3}$$
Observe that I simply ‘did’ the φ integral over its appropriate limits to get the 2π right
away (since f was independent of φ) and in a single step converted the sin³ integral into
u-form, where it was a trivial power law integral! This trick isn’t always the best one to use
– a different trick is used to integrate:
$$\int_0^{\pi} \sin^2\theta\,d\theta = \frac{\pi}{2}$$
because integrating
$$\int_{-1}^{1} \left(1 - u^2\right)^{\frac{1}{2}} du$$
doesn’t get us very far in terms of our five simple integral forms!
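Both statements can be checked numerically. In this sketch (Python; `midpoint` is an assumed helper and R = 1 an assumed sample radius), the sin³ θ integral is compared with its u-form, which is a simple power law integral, and the sin² θ integral is compared with π/2:

```python
import math


def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


R = 1.0  # assumed sample radius, for illustration only

# sin^2(theta) integrated over the sphere picks up one more sin(theta) from dA.
theta_form = midpoint(lambda t: math.sin(t) ** 3, 0.0, math.pi)
u_form = midpoint(lambda u: 1.0 - u * u, -1.0, 1.0)   # power law in u
surface_integral = 2 * math.pi * R ** 2 * u_form       # close to 8*pi*R^2/3

# The plain sin^2 integral, for which the u-form is not helpful.
plain = midpoint(lambda t: math.sin(t) ** 2, 0.0, math.pi)
print(theta_form, u_form, surface_integral, plain)
```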
Evaluating the volume of a sphere of radius R is now straightforward. The spherical ele-
ment is:
$$dV = r^2\,dr\,\sin\theta\,d\theta\,d\phi$$
so we perform the usual u-substitution for sin θ dθ and sort this into three independent
integrals:
$$V = \int dV = \int_0^R r^2\,dr \int_0^{\pi} \sin\theta\,d\theta \int_0^{2\pi} d\phi = \frac{R^3}{3}\,(2)\,(2\pi) = \frac{4\pi R^3}{3} \tag{0.31}$$
A second (and faster!) way to visualize and express this is to start with the known result
that the area of a sphere of radius r is A = 4πr² so that dV = A dr is the volume of a spherical
shell of radius r and thickness dr:
$$V = \int dV = \int_0^R A\,dr = 4\pi \int_0^R r^2\,dr = \frac{4\pi R^3}{3} \tag{0.32}$$
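The two routes to the volume of a sphere can be compared side by side. This Python sketch uses an assumed helper `midpoint` and an assumed sample radius; it computes the product of three independent integrals and the sum over spherical shells, and both should land on 4πR³/3.

```python
import math


def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


R = 1.5  # assumed sample radius, for illustration only

# Route 1: three independent integrals over r, theta, and phi.
v_three = (
    midpoint(lambda r: r * r, 0.0, R)
    * midpoint(math.sin, 0.0, math.pi)
    * 2 * math.pi
)

# Route 2: add up shells of area 4*pi*r^2 and thickness dr.
v_shells = midpoint(lambda r: 4 * math.pi * r * r, 0.0, R)

v_exact = 4 * math.pi * R ** 3 / 3
print(v_three, v_shells, v_exact)
```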
A very common problem you will encounter is evaluating, for example, the total charge in
a sphere of radius R when it is distributed according to some function of r on the interior.
Let’s pick a simple/integrable charge distribution function:
$$\rho(r) = \frac{dQ}{dV} = A r^2$$
where A has the appropriate dimensions to produce a total charge Q from the integral:
$$Q = \int dQ = \int_{\rm sphere} A r^2\,dV = \int_0^R\int_0^{\pi}\int_0^{2\pi} A r^2 \times r^2 \sin\theta\,dr\,d\theta\,d\phi \tag{0.33}$$
We perform the usual u-substitution for sin θ dθ and sort this into three independent integrals:
$$Q = A \int_0^R r^4\,dr \int_{-1}^{1} du \int_0^{2\pi} d\phi \tag{0.34}$$
or
$$Q = \frac{4\pi A R^5}{5} \tag{0.35}$$
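Here is a short numerical check of this total charge, keeping the constant A in the closed form 4πAR⁵/5. As before this is a sketch under assumptions: `midpoint` is a hypothetical helper and the values of A and R are arbitrary.

```python
import math


def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule estimate of the integral of f(t) dt from lo to hi."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Assumed sample parameters, for illustration only.
A, R = 2.0, 1.5

radial = midpoint(lambda r: r ** 4, 0.0, R)    # r^2 from rho times r^2 from dV
polar = midpoint(lambda u: 1.0, -1.0, 1.0)     # the u = cos(theta) integral
azimuthal = 2 * math.pi                        # the phi integral

q_numeric = A * radial * polar * azimuthal
q_closed = 4 * math.pi * A * R ** 5 / 5
print(q_numeric, q_closed)
```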
In figure 0.7 a tiny chunk of mass (or charge!) with volume dV is drawn. The mass of
this chunk is (using the litany we learned in mechanics and will learn in the context of
electromagnetism as well herein) “the mass of the chunk is the mass per unit volume times
the volume of the chunk”, or (for uniformly distributed mass M in a sphere of radius R as
shown):
$$dm = \rho_M\,dV = \frac{3M}{4\pi R^3} \times r^2 \sin\theta\,dr\,d\theta\,d\phi \tag{0.36}$$
[Figure: a solid sphere of radius R; the chunk dV sits at (r, θ, φ), a perpendicular distance r sin θ from the z-axis]
Figure 0.7: A differential chunk of mass (or charge!) of volume dV in a solid sphere
moves in a circle of radius r sin θ as the sphere is rotated around the z-axis.
This chunk moves in a circle of radius r sin θ if the solid sphere is rotated around its z-axis. This figure has enough symmetry that this is a “principal axis”, so we can find e.g. its total moment of inertia around the z-axis by summing up:
$$dI = dm\,(r \sin\theta)^2 = \frac{3M}{4\pi R^3}\, r^4 \sin^3\theta\, dr\, d\theta\, d\phi \qquad (0.37)$$
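Carrying the sum through is a useful check on (0.37). As a sketch, assuming the uniform density of (0.36) and using the same u = cos θ substitution as before, so that $\int_0^\pi \sin^3\theta\, d\theta = \int_{-1}^{1} (1 - u^2)\, du = 4/3$, the three independent integrals give:
$$I = \int dI = \frac{3M}{4\pi R^3} \int_0^R r^4\, dr \int_0^\pi \sin^3\theta\, d\theta \int_0^{2\pi} d\phi = \frac{3M}{4\pi R^3} \cdot \frac{R^5}{5} \cdot \frac{4}{3} \cdot 2\pi = \frac{2}{5} M R^2$$
which is the familiar moment of inertia of a uniform solid sphere about an axis through its center.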
That’s enough preliminary stuff. At this point, if you’ve read all of this “week”’s material
and vowed to adopt the method of three passes in all of your homework efforts, if you’ve
bookmarked the math help or downloaded it to your personal ebook viewer or computer,
if you’ve realized that your brain is actually something that you can help and enhance in
various ways as you try to learn things, then my purpose is well-served and you are as
well-prepared as you can be to tackle physics.
There isn’t really any homework for the preliminary part of the book other than to read
over it to accomplish these goals, but here are a few things you could do on your own – or
even just think about how you would do them – if you want to put it into practice:
• Skim read the How to Learn Physics section, then read it like a novel, front to back. Think about the connection between engagement and learning and how important it is to try to have fun in a physics course. Think about at least one time in the past where you were extremely engaged in a course you were taking, had lots of fun in the class, and had a really great learning experience. Contrast it to a course where you were lost and hated it. What made the two experiences so different? Sure, maybe the material was boring, maybe the book was terrible, maybe the teacher was awful – none of these are directly under your control – but were there things that were under your control that you could have used to both have more fun and do better?
• Skim-read the entire content of the Math Needed for Introductory E&M section
above. Identify things that it covers that you don’t remember or never learned, and
don’t understand (yet).
• Apply the Method of Three Passes to learning the things you just identified. Over a
few days, you can learn each coordinate system well enough to draw the essential
pictures defining its coordinates, how to go back and forth between it and cartesian
coordinates, and its differential elements. See if you can get to where you can work
the examples (at least) without looking and without real pain.
Note well: You may well have found the content boring on the third pass because
it was so familiar to you, but that’s not a bad thing! If you learn physics and its
requisite math so thoroughly that its laws become boring, not because they confuse
you and you’d rather play World of Warcraft but because you know them so well that
reviewing them isn’t adding anything to your understanding, well damn you’ll do well
on the exams testing the concept, won’t you?
II: Electrostatics
Week 1: Discrete Charge and the
Electrostatic Field
Summary
• Charge
Objects can carry a (net) charge q when “electrified” in various ways. This charge
comes in two flavors, + and -. Like charges exert a long range (action at a distance)
repulsive force on one another. Unlike charges attract. The SI unit of charge is called
the Coulomb (C).
• Charge Quantization
Charge is discrete and quantized in units of e/3, where $e = 1.6 \times 10^{-19}$ C, but we can never directly observe bare particles with the thirds (quarks). All charges we can directly measure on independent particles come in units of e, the charge of the electron or proton.
$$\rho = \frac{dq}{dV}$$
$$\sigma = \frac{dq}{dA}$$
$$\lambda = \frac{dq}{dx}$$
• Charge Conservation
Net charge is a conserved quantity in nature. Later we will learn to write the conser-
vation law mathematically in terms of the flux of the current density, but we don’t yet
have the mathematical tools to do this with.
– Insulators
– Conductors
– Semiconductors
• Coulomb’s Law
From performing many careful experiments directly measuring the forces between
static charges and from the consistent observations of many other things such as
the electric structure of atoms, the conductivity of metals, the motion of charged
particles, and much, much more, we infer that for any two stationary charges, the
experimentally verified electrostatic force acting on charge 1 due to charge 2 is:
$$\vec{F}_{12} = \frac{k_e q_1 q_2 (\vec{r}_1 - \vec{r}_2)}{|\vec{r}_1 - \vec{r}_2|^3}$$
Note that it acts on a line from charge 2 to charge 1, is proportional to both charges,
and is inversely proportional to the distance that separates them squared.
$$k_e = \frac{1}{4\pi\epsilon_0} = 9 \times 10^9\ \frac{\text{N}\cdot\text{m}^2}{\text{C}^2}$$
This is accurate to something like 3 significant figures, which is plenty for our pur-
poses. Note also that you don’t have to remember the units of ke per se, you can
figure them out by just remembering Coulomb’s Law (which you have to know any-
way). Newtons on the left, coulombs squared on top and meters squared on the
bottom on the right.
• Electrostatic Field
The fundamental definition of electrostatic field produced by a charge q at position $\vec{r}$ is that it is the electrostatic force per unit charge on a small test charge $q_0$ placed at each point in space $\vec{r}_0$ in the limit that the test charge vanishes:
$$\vec{E} = \lim_{q_0 \to 0} \frac{\vec{F}}{q_0}$$
or
$$\vec{E}(\vec{r}_0) = \frac{k_e q (\vec{r}_0 - \vec{r})}{|\vec{r}_0 - \vec{r}|^3}$$
If we locate the charge q at the origin and relabel $\vec{r}_0 \to \vec{r}$, we obtain the following simple expression for the electrostatic field of a point charge:
$$\vec{E}(\vec{r}) = \frac{k_e q\, \hat{r}}{r^2}$$
• Superposition Principle
Given a collection of charges located at various points in space, the total electric field
at a point is the sum of the electric fields of the individual charges:
$$\vec{E}(\vec{r}) = \sum_i \frac{k_e q_i (\vec{r} - \vec{r}_i)}{|\vec{r} - \vec{r}_i|^3}$$
To find the electrostatic field produced by a charge density distribution, we use the superposition principle in integral form:
$$\vec{E}(\vec{r}) = k_e \int \frac{\rho(\vec{r}_0)(\vec{r} - \vec{r}_0)\, d^3r_0}{|\vec{r} - \vec{r}_0|^3}$$
Because one has to integrate over the vectors, this integral is remarkably difficult. We’ll revisit it in a much simpler form when we get to electrostatic potential, a scalar quantity.
• Electric Dipoles
When two electric charges of equal magnitude and opposite sign are bound together,
they form an electric dipole. The dipole moment of this arrangement is the source
of a characteristic electrostatic field, the dipole field. The dipole moment of the two
charges is defined to be:
$$\vec{p} = q\,\vec{l}$$
where q is the magnitude of the charge and $\vec{l}$ is the vector that points from the negative charge to the positive charge.
When an electric dipole $\vec{p}$ is placed in a uniform electric field $\vec{E}$, the following expressions describe the force and torque acting on the dipole (which tries to align itself with the applied field):
$$\vec{F} = 0$$
$$\vec{\tau} = \vec{p} \times \vec{E}$$
$$U = -\vec{p} \cdot \vec{E}$$
and from this, we can see that the force on the dipole in a more general (non-uniform) field should be:
$$\vec{F} = -\vec{\nabla} U = \vec{\nabla}(\vec{p} \cdot \vec{E})$$
This completes the chapter/week summary. The sections below illuminate these basic
facts and illustrate them with examples.
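The vector formulas in the summary translate almost line for line into code. The following is a minimal sketch, not anything from the text: it assumes plain Python with 3-tuples for vectors, the rounded value of k_e quoted above, and hypothetical function names (`point_charge_field`, `total_field`, `coulomb_force`, `dipole_torque_and_energy`).

```python
import math

K_E = 9.0e9  # N m^2 / C^2, the rounded value quoted in the summary

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def point_charge_field(q, r_src, r_obs):
    """Field at r_obs of a point charge q located at r_src."""
    d = sub(r_obs, r_src)
    dist = math.sqrt(dot(d, d))
    return scale(K_E * q / dist**3, d)

def total_field(charges, r_obs):
    """Superposition: sum the fields of (q, position) pairs at r_obs."""
    field = (0.0, 0.0, 0.0)
    for q, r_src in charges:
        field = add(field, point_charge_field(q, r_src, r_obs))
    return field

def coulomb_force(q1, r1, q2, r2):
    """Force on charge 1 (at r1) due to charge 2 (at r2)."""
    return scale(q1, point_charge_field(q2, r2, r1))

def dipole_torque_and_energy(p, E):
    """Torque p x E and potential energy -p . E in a uniform field E."""
    return cross(p, E), -dot(p, E)

# Two 1 microcoulomb charges one meter apart (illustrative values only).
print(coulomb_force(1e-6, (0.0, 0.0, 0.0), 1e-6, (1.0, 0.0, 0.0)))
```

Note that the force on charge 1 comes out pointing away from charge 2 for like charges, as the $(\vec{r}_1 - \vec{r}_2)$ in Coulomb's Law requires.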
1.1: Charge
In nature we can readily observe electromagnetic forces. In fact, we can do little else. In a
very fundamental sense, we are electromagnetism. Electromagnetic forces bind electrons
to atomic nuclei, bond atoms together to form molecules, mediate the interactions between
molecules that allow them to change and organize and, eventually, live. The energy that
is used to support life processes is electromagnetic energy. The objects that we touch, or
hear, or taste, or smell, the light that we see, the organized pattern of neural impulses that
we use to think about the input from our senses – all are electromagnetic.
Given its ubiquity, it should come as no surprise that the directed observation and study
of electricity is quite ancient. It was studied, and written about, at least 3000 years ago,
and artifacts that may have been primitive electrical batteries have been discovered in the
Middle East that date back to perhaps 250 BCE. It is revealing that the very word electricity and the name of the elementary particle most visibly responsible for its transport are derived from the Greek word for amber, elektron. One of the first recorded observations of electrical
force was the static electrical force created between amber, charged by rubbing it with wool,
and small bits of wool or hair.
However, it took until the Enlightenment (roughly 1600) and the invention of physics and calculus for the scientific method to develop to where systematic studies of the phenomenon could occur, and it wasn’t until the middle 1700s that the correct model for electrical charge[31] was proposed. From that point rapid progress was made over a period of 250 years, culminating in our contemporary understanding of electromagnetic forces as one aspect of a unified field theory.
As pointed out above, even our prehistoric ancestors no doubt knew about “charge”.
The experience of rubbing one’s body against fur on a cold, dry day and thereby picking
up enough charge to generate a spark is probably tens of thousands of years old. By the
historic time of the Greeks, it was known that rubbing amber with wool or fur would charge
the amber, and the term electricity is derived from the Greek word for amber, elektron. We
now know that the charge produced on the amber is negative.
During the Enlightenment much more systematic studies were made of this phenomenon.
It is possible to charge many objects by rubbing them against other objects. For example, if one rubs glass with silk, one literally rubs electrons off of the molecules that make up the glass and transfers them to the silk. The silk becomes negatively charged and the glass becomes positively charged. The study of this continues today, where this sort of charge transfer due to friction is called the triboelectric effect[32]. Recall that the study of friction is called “tribology”, so this makes sense.
In order to do the experimental work that led to the identification of the two kinds of
charge and our ability to manipulate electrostatic charges and measure forces quantita-
tively, it was necessary to find ways of systematically charging up conductors with specific
increments of charge. One could use the triboelectric effect to charge up a piece of glass
or amber or bone or metal, but the amount and even the sign of the charge produced was
not always consistent. Charge also has a habit of “leaking away” from anything that is
[31] Wikipedia: http://www.wikipedia.org/wiki/electric charge.
[32] Wikipedia: http://www.wikipedia.org/wiki/Triboelectric effect.
Figure 1.1: Kids! Don’t try this at home! The angels in this figure are simply waiting for
lightning to follow the graphite covered string down and fry Benjamin Franklin so they can
escort him to heaven!
One of the premier figures in the earliest days of the study of electricity was Benjamin
Franklin, an individual who can only be described as a “polymath” – physicist, inventor, pub-
lisher, politician, diplomat. Franklin conducted a series of experiments in the mid-1700’s
(long before the American revolution!) that determined that lightning was electrical in na-
ture, that charging an object generally involved moving charge of a single sign (an invisible
electrical “fluid”) from one object that otherwise contained equal, balanced amounts of both
signs of charge, to another, leaving behind a surplus of the other sign.
Figure 1.1 is an apocryphal illustration of one of Franklin’s experiments with charge –
flying a kite in a thunderstorm using string that had been rubbed with graphite to make it
conductive to charge up a Leyden jar[33] and demonstrate that lightning itself is a triboelectric
static electrical phenomenon. Note that this is incredibly risky as it provides an easy path to
ground for the massive charge collected on an overhead cloud, making it not at all unlikely
to attract a lightning strike, which would of course then kill anyone holding onto the string
(and quite possibly any nearby onlookers).
Franklin’s discoveries were a tremendous achievement, as they set the stage for investigating electricity in the context of Newtonian mechanics. Unfortunately, he misguessed the sign of the mobile charge, thinking it to be the one that he named positive, but as it happens, mobile charge in solid conductors is almost always electrons, which are negative. This mistake persists today as generations of physics students have had to draw arrows indicating the motion of electrical current one way, associated with the movement of (negative) electrical charge the opposite way. Sorry about that, but be warned: it will come up again when we get to the chapter on electrical current.
[33] Wikipedia: http://www.wikipedia.org/wiki/Leyden Jar. A Leyden Jar is a primitive capacitor, which we will study in more detail in three more weeks.
In 1756 Franklin was elected as a Fellow of the Royal Society in England, which in some ways was the “heart” of the Enlightenment, and remained engaged in natural philosophy (as science was then called) for most of the rest of his life, but his energies from then on were largely diverted to politics, revolution, and ultimately, diplomacy.
Once charge was correctly identified as the “source” of electrical force, many natural philosophers of the time felt strongly that the force between electric charges would follow the inverse square law Newton guessed and then demonstrated for the gravitational field (possibly influenced by other contemporary researchers in the late 17th century). In the end, Charles-Augustin de Coulomb[34], the inventor of a very sensitive torsional balance, was able to use the balance and his ability to precisely divide charges to demonstrate the correctness of the inverse square law hypothesis and make electrostatics quantitative. He published his results over the period from 1785 to 1789, thirty full years after Franklin first demonstrated the existence of two opposed electrical charges.
The primary way one can use charge generated by any of several simple electrostatic generators to create conducting objects with at least controlled increments of charge upon them is by induction and charge transfer or charge sharing. We will discuss these in more detail next week after establishing the electrostatic properties of conductors.
Charge, as we shall see, is the fundamental quantity that permits objects to “couple” –
affect one another – via the electromagnetic interaction. It therefore will serve us well to
know some of the most important True Facts about charge. This initial listing is just to
prime the pump, as it were – we will go over all of these ideas in much more detail, and
repeatedly, later!
• Experimentally, objects can carry a (net) charge q when “electrified” in various ways (for
example by rubbing materials together).
• Charge comes in two flavors, + and -, but most matter is approximately charge-
neutral most of the time. Consequently, as Benjamin Franklin correctly guessed,
most charged objects end up that way by adding or taking away charge from this
neutral base.
• The SI unit of charge is called the coulomb (C). As we shall see, a coulomb is a
lot of charge, far more than one can usually place on or remove from a macroscopic
object in the lab to do experiments on. Microcoulombs or even nanocoulombs are
much more reasonable lab-scale electrostatic charges!
• “Like” charges exert a long range (action at a distance) repulsive force on one an-
other. “Unlike” charges attract.
• The force varies with the inverse square of the distance between the charges and acts along a line connecting them. Coulomb’s Law (covered next) describes this force.
[34] Wikipedia: http://www.wikipedia.org/wiki/Charles-Augustin de Coulomb.
A quantity that is constant throughout all known interactions, neither created nor destroyed, is said (in physics) to be “conserved”. In the first semester of this course, you learned of a number of quantities that were conditionally conserved – momentum or angular momentum, conserved when the net force or torque acting on a system is zero – or unconditionally conserved, such as net energy (or more properly, mass-energy). Our final True Fact is that:
• Net charge is a conserved quantity in nature.
Later we will learn to write this conservation law mathematically in terms of the flux of the current density, but since we have not yet covered the mathematical tools to do this with, we will for now learn the experimental result that charge can be neither created nor destroyed; we can only move charge that already exists from one place to another[35].
Experimentally, we can readily see that charge can be isolated and moved around in very
large to extremely small quantities. A natural question is then: Can we continue dividing
charge indefinitely, and move an infinitesimal (in the formal sense of calculus) amount of
charge? Is charge a continuous quantity, the way we classically imagine space and time
to be? In Franklin’s time it appeared so, and he spoke of at least one of the two kinds of
charge as being a “fluid” that could be moved around in arbitrary amounts.
However, just as we learned in mechanics that solids and fluids are themselves not
continuous, but rather microscopically particulate, composed of things like atoms and
molecules, it turns out that atoms and molecules are in turn constructed out of quan-
tized elementary particles, that many of these named particles are charged, and that the
charge of each elementary particle is an integer multiple of an “elementary” quantum of
charge. Indeed, we characterize elementary particles by their unique signature consisting
of (among other things) their (rest) mass and their charge!
Even though this is a course in classical physics, we must never lose sight of the
fact that somewhere down there underneath it all, a quantum universe is lurking, and
sometimes it matters. In particular, it is very important for us to build an accurate mental
model for things like the “matter” we wish to apply our theory to even as we treat most
of its properties and interactions classically. In the spirit of this, let’s try to understand in
very simple terms – mostly pictures and ideas – the way everyday matter is put together,
especially as it relates to the idea of charge.
[35] Later in the study of physics you may learn of quantum field interactions that lead to e.g. pair production (or annihilation) – the simultaneous creation (destruction) of e.g. an electron-positron pair. Note well that while charges are indeed produced (destroyed) in this sort of interaction, the total charge of a produced (destroyed) pair is zero, justifying the careful use of the term “net” above in formulating the law. At the “everyday” energies of normal matter at normal temperatures, one pretty much can ignore this sort of thing.
There are two “kinds” of elementary particles observed in nature that form the massive
building blocks of nearly everything we see, usually grouped into families. One family
consists of the quarks[36], which carry a charge that is quantized in units of e/3, where
Figure 1.2: Simple model for protons and neutrons built out of three quarks. Note that the “diameter” of a proton or neutron is on the order of $10^{-15}$ meters, and atomic nuclei are made up of protons and neutrons that are basically “touching” and hence are of this same order in size.
state (bound together by nuclear forces we will not discuss here) into the nucleons: the proton (charge +e) and neutron (charge 0). In fact, a proton is made up of three quarks: uud, with charge $\frac{2}{3}e + \frac{2}{3}e - \frac{1}{3}e = e$, while the neutron is also made up of three quarks: udd, with charge $\frac{2}{3}e - \frac{1}{3}e - \frac{1}{3}e = 0$, as illustrated in figure 1.2 above. Experimentally, we only see particles with a net charge quantized in units of ±e outside of a nucleon, something called “confinement” in particle physics circles.
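The quark bookkeeping is easy to verify with exact fractions. This is only an illustrative sketch, assuming Python's standard `fractions` module and a hypothetical helper name `nucleon_charge`; charges are in units of e.

```python
from fractions import Fraction

# Quark charges in units of e, as given in the text.
QUARK_CHARGE = {"u": Fraction(2, 3), "d": Fraction(-1, 3)}

def nucleon_charge(quarks):
    """Sum the charges of a string of quark labels such as 'uud'."""
    return sum(QUARK_CHARGE[q] for q in quarks)

print("proton (uud):", nucleon_charge("uud"))
print("neutron (udd):", nucleon_charge("udd"))
```

Using `Fraction` rather than floating point keeps the thirds exact, so the sums come out as whole multiples of e with no rounding noise.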
Protons have a (rest) mass around 938.3 MeV/c² ($1.67 \times 10^{-27}$ kg). This is tiny, but is still almost 2000 times larger than the mass of an electron at 0.511 MeV/c² ($9.11 \times 10^{-31}$ kg). Neutrons are just a hair more massive than a proton (939.6 MeV/c²). Protons and neutrons are bound together by the strong interaction into an atomic nucleus on the order of $10^{-15}$ meters in diameter. This (positively charged) nucleus strongly attracts negatively charged electrons via the electrostatic force that is the first object of our study, which then arrange themselves around the nucleus to create a structured, electrically neutral object – the atom.
Since neutral atoms must contain an integer number of protons (charge +e) and an
equal integer number of electrons (charge -e), we can name atoms according to the num-
ber of protons in their nucleus (the number of electrons is somewhat variable as we can
comparatively easily add or take electrons away from most atoms). This is the basis of the
periodic table of the elements, where every element is distinguished by its atomic number,
symbol (usually) Z, which is the number of protons or electrons in the electrically neutral
atom, arranged according to the chemical properties of the material, which vary dramatically with atomic number in families due to quantum mechanics (and are, sigh, beyond the scope of this course).
Still, we need a simple mental model for atoms in order to understand some very im-
portant things in this course! I therefore encourage you to visualize atoms using one of
the two pictures in figure 1.3, more the second model than the first. We will develop these
models much more completely, and even semi-quantitatively, later in the course.
On the left, a heavy (carbon) nucleus – not to scale! – has its six electrons in “classical”
elliptical orbits, as expected from Kepler’s first law (or solution to the classical equations of
motion) for inverse square force laws like Coulomb’s Law and Newton’s Law of Gravitation.
On the right, however, an actual nanoscale “photograph” of a hydrogen atom reveals the
quantum reality – an invisibly tiny, heavy nucleus surrounded by an electron smeared out
in a “cloud” with colors that illustrate electron (probability) density.
The former figure is actually one of the models that led to the death of classical physics
as it turns out that any sort of classical orbit is inconsistent with Maxwell’s Equations (the
fundamental subject matter of this entire course) but it is still sometimes useful. It is best,
however, to keep the photograph on the right in figure 1.3 – revealing a very massive and invisibly tiny nucleus more or less symmetrically surrounded by a much larger “cloud” of light, relatively mobile electrons with the same total charge – as your mental
model of a neutral atom. This picture will turn out to be enormously useful to us as we
seek to understand electronic properties of matter.
Finally, atoms in turn are “glued” together by electrostatic forces to form molecules (the
object of the study of chemistry, so we will not dwell much on this in this text beyond noting
the fact). Molecules, as it turns out, also tend to stick together for reasons we will explore a
bit later on, and hence bulk ordinary matter is made up of molecules, that are in turn
made up of atoms, that are in turn made up of electrons and nuclei, and the nuclei
in turn are made up of protons and neutrons, which are (finally!) made up of quarks!
Get it? As you can see from the small mass of the protons and neutrons that make up
more than 99.9% of the weight of atoms, there are a lot of protons and neutrons – and
atoms – in even micrograms of ordinary matter!
From our model we can see that nearly all the mobile charge in solid matter is made up
of electrons, as the nucleus of any given atom is much more massive and is surrounded
by a nearly “impenetrable” ball of negative electrical charge, locked into solids in a rigid
structure in such a way that it isn’t terribly mobile. However, in fluids ionic charge can move
around with either sign. In semiconductors the mobile charge can even be something
called electron “holes” – de facto positive charge carriers consisting of regions of electron
deficit that move against an otherwise stationary neutral electronic background, in a weird
quantum sort of way.
Franklin, unfortunately, thought that the flavor of mobile charge in ordinary conductors
was positive. In fact, as noted above, it is negative – associated with moving electrons.
This is “Franklin’s mistake” – the bane of physics students for over two hundred years,
where the current in a wire generally points in the opposite direction to the actual motion
of the (negative) electrons in the wire. This will – rarely – matter in particular problems, so
keep it in mind.
With our picture of “an atom” in mind, we can proceed to figure out what happens when we
consider “chunks” of matter in any of its common forms – solids, liquids or gases. First, we
should note that atoms themselves – let alone the point-like elementary charges they are
ultimately made up of – are quite tiny in terms of their mass and physical extent compared
to the SI units describing bulk matter. This is so much so that physicists actually keep a
few other sets of units in their pockets to use when doing atomic or nuclear physics! We
won’t do much of this now, but three important unit conversion numbers (two of them for
length scales) to keep in mind are:
$$1\ \text{fermi} = 10^{-15}\ \text{m}$$
$$1\ \text{angstrom} = 10^{-10}\ \text{m}$$
$$N_A \approx 6.02 \times 10^{23}\ \text{per mole}$$
A fermi is the typical length scale of the diameter of an atomic nucleus. An angstrom
is the length scale of the diameter of an atom. Avogadro’s number NA is the number of
atoms or molecules in one “mole” of matter – a quantity that has a mass on the order of
tens of grams, centimeter sized chunks of liquids or solids. A nucleus is invisibly small
indeed, relative to an atom, and an atom is invisibly small relative to “macroscopic” chunks
of matter (which we will arbitrarily consider to be the smallest chunks of matter we can resolve and hence see through an optical microscope, around one micron in size), and even these smallest chunks are made up of a lot of atoms!
There is therefore a very large number of discrete charges in nearly any macroscopic piece of solid or liquid matter. We can easily estimate how many within a factor of two or three by noting that nearly all of the mass of matter consists of the protons and neutrons in the nuclei of the atoms that make it up, and that anywhere from 100% (in the case of hydrogen) to roughly 40% (in the case of uranium) of these nucleons are protons. Protons and neutrons are themselves made up of three elementary charged particles (quarks, see above), and in neutral matter for every proton there is an electron. We can reasonably expect that close-packed solid hydrogen thus has the fewest charges per cubic micron, and we can estimate this number as 4 (three quarks and an electron) times $(10^4)^3$ (the volume of a cubic micron in cubic angstroms, with roughly one hydrogen atom per cubic angstrom) or $4 \times 10^{12}$ individual charged particles[40]! The mass of such a chunk would be of order $1.67 \times 10^{-27} \times 10^{12} \approx 1.67 \times 10^{-15}$ kg. Our smallest visible chunk, the cubic micron of just about anything solid or liquid, will have at least trillions of discrete charges in it, or quite likely even more!
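The estimate above is simple enough to reproduce in a few lines. The sketch below assumes the same rough inputs as the text (one hydrogen atom per cubic angstrom, four charged particles per atom, one proton mass per atom); the function name `charges_in_cubic_micron` and its parameters are hypothetical.

```python
ANGSTROM = 1.0e-10     # meters
MICRON = 1.0e-6        # meters
PROTON_MASS = 1.67e-27 # kg

def charges_in_cubic_micron(atoms_per_cubic_angstrom=1.0, charges_per_atom=4):
    """Return (number of discrete charges, mass in kg) for one cubic micron.

    Defaults follow the rough solid-hydrogen picture in the text:
    three quarks plus one electron per atom, one atom per cubic angstrom.
    """
    cubic_angstroms = (MICRON / ANGSTROM) ** 3  # about 10^12
    atoms = atoms_per_cubic_angstrom * cubic_angstroms
    return atoms * charges_per_atom, atoms * PROTON_MASS

n_charges, mass = charges_in_cubic_micron()
print(f"charges: {n_charges:.2e}, mass: {mass:.2e} kg")
```

Changing `atoms_per_cubic_angstrom` or `charges_per_atom` by a factor of a few shifts the answer by the same factor, which is the sense in which the estimate is only good to an order of magnitude.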
This makes precisely summing up fields produced by all of these charges in chunks of
matter much bigger than atoms all but impossible, even with computers. It is also generally
pointless to even try – with so many objects, surely an average would do for most purposes!
We will therefore have frequent cause to “coarse-grain” our description of bulk matter – to ignore the discrete particulate nature of charge and average out the total charge ∆q in a finite but still invisibly small volume of matter ∆V, as illustrated in figure 1.4. By choosing ∆V small enough that we can treat it like a volume differential but large enough that it contains a very large number of discrete charges (of either or both signs), we can define a quasi-continuous charge density:
$$\rho = \lim_{\Delta V \to \text{“0”}} \frac{\Delta q}{\Delta V} \approx \frac{dq}{dV} \qquad (1.5)$$
and use calculus to manage all of our sums!
Figure 1.4: The “idea” behind coarse-graining: Any tiny block of ordinary matter – even one as small as a cubic micron in size – contains a lot of charges – so many that we can fairly use calculus instead of discrete summation to add it all up!
[40] Note that I say “roughly”. Estimation is an important practice in physics! The estimates for the number of charges in this or that that I am presenting could easily be off by a factor of 10 by the time I lop off this or that smaller factor and treat it as ‘1’, or solid hydrogen might well not be arranged with precisely one atom per cubic angstrom, but as you will see, this will not matter as the conclusion will still easily hold.
This works because as individual charges of order e – electrons, or nuclei – move across the boundary of the “infinitesimal” micron-scaled volume ∆V, they make changes in the total charge inside the volume that only show up in some irrelevant decimal digit in the estimated total electrical charge in ∆V. After all, there are trillions of approximately equally balanced positive and negative charges in there! Even if we move a million charges across the line, that’s only about $1.6 \times 10^{-13}$ of a coulomb of charge – roughly a ten-thousandth of a nanocoulomb. Our calculus-based computation will surely work to well within any reasonable experimental accuracy.
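As a rough check of the scale involved, the arithmetic for a batch of elementary charges crossing the boundary can be sketched as follows. This assumes e ≈ 1.6 × 10⁻¹⁹ C as quoted earlier and the solid-hydrogen figure of about 4 × 10¹² charges per cubic micron; the function names are hypothetical.

```python
E_CHARGE = 1.6e-19  # coulombs, the value of e used in the text

def charge_moved(n_charges):
    """Total charge in coulombs carried by n_charges elementary charges."""
    return n_charges * E_CHARGE

def fraction_of_population(n_moved, n_total):
    """Fraction of the discrete charges in the volume that crossed the boundary."""
    return n_moved / n_total

moved = charge_moved(1.0e6)  # a million elementary charges
print(f"{moved:.2e} C = {moved / 1e-9:.2e} nC")
print(f"fraction of charges moved: {fraction_of_population(1.0e6, 4.0e12):.1e}")
```

Both numbers are tiny, which is the point: moving even a million charges barely perturbs the contents of a micron-scale volume.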
Similarly, we can associate surface charge densities with “two dimensional” distribu-
tions of charge (for example, a charged piece of paper or a charged metal plate) and linear
charge densities with thin “lines” of charged matter (for example, a wire or piece of fishing
line). The calculus of all three of these distributions is illustrated in figure 1.5. In all of
these forms, at the length scales associated with everyday objects, it is indeed better to
think of charge as being the “fluid” that Franklin imagined it to be, and unimaginably difficult
to consider solving problems using full sums over the trillions of trillions of discrete charges
that make up the object.
Note that two of these are further approximations of truly three dimensional volume
charge distributions. The surface charge distributions on conductors we will treat in this
course aren’t truly two-dimensional (with zero thickness); typically the unbalanced charges
are confined to a few layers of atoms at the surface. However, this distance is on the order
of nanometers, and it seems safe to ignore this thickness relative even to the side length
of square micron “chunks” of the area, let alone the centimeter and meter length scales of
actual charged objects. Similarly, linear charges might in reality be confined to a similarly thin layer on the surface of a “thin” wire or insulating string (perhaps one with a diameter on the order of 100 microns), but as long as the string is much thinner than it is long, we will make little error if we assume it is mathematically a one dimensional distribution.
$$\lambda = \frac{dq}{dx} \qquad (1.6)$$
$$\sigma = \frac{dq}{dA} \qquad (1.7)$$
$$\rho = \frac{dq}{dV} \qquad (1.8)$$
Figure 1.5: Three charge density distributions we will use in this course – linear, surface, and volume.
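Once a density is given, the total charge is just the sum of the dq's. A minimal sketch, assuming plain Python, a midpoint-rule sum, and hypothetical function names, for a line of charge with density λ(x) and a disk with a radially symmetric surface density σ(r):

```python
import math

def total_charge_on_line(lam, length, n=1000):
    """Sum dq = lam(x) dx along a line from x = 0 to x = length."""
    dx = length / n
    return sum(lam((i + 0.5) * dx) * dx for i in range(n))

def total_charge_on_disk(sigma, radius, n=1000):
    """Sum dq = sigma(r) dA over a disk, using rings of area dA = 2 pi r dr."""
    dr = radius / n
    return sum(sigma((i + 0.5) * dr) * 2.0 * math.pi * (i + 0.5) * dr * dr
               for i in range(n))

# Illustrative values only: 2 nC/m on a 0.5 m rod, 1 microcoulomb/m^2 on a 10 cm disk.
print(total_charge_on_line(lambda x: 2.0e-9, 0.5))
print(total_charge_on_disk(lambda r: 1.0e-6, 0.1))
```

For uniform densities these reduce to λL and σπR², which makes a handy sanity check before trying a density that actually varies with position.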
The last property associated with the charges that make up matter that we wish to at least
mention this early (although we’ll examine it in more detail later) is that various materials
can often be categorized, broadly speaking, into one of three types with quite distinct
properties:
• Insulators. The charge in the atoms and molecules from which an insulating material
is built tends to not be mobile – electrons tend to stick to their associated atoms and
molecules tightly enough that ordinary electric fields cannot remove them (as we’ll
see, strong enough fields still can). Surplus charge placed on an insulator tends
to remain where you put it. Vacuum is usually considered an insulator, as is air,
although neither is a perfect insulator and even vacuum responds to and modifies
electromagnetic fields41 .
Insulators still respond measurably to an applied field, however – the charges in the
atoms or molecules distort as the molecules polarize, and the resulting microscopic
dipoles modify the applied field inside the material. Since we live in air (a material)
we do not generally see the true electric field produced by a charge but one that is
very slightly reduced by the polarization of the air molecules through which the field
travels. This is called dielectric response and we’ll discuss it extensively later.
41
Wikipedia: http://www.wikipedia.org/wiki/Vaccum polarization. Quantum field theory,
vacuum polarization, and pair production are beyond the scope of this course, but beyond the scope or not,
in nature even an initially charge-free vacuum responds to strong electromagnetic fields.
• Conductors. For many materials, notably metals but also ionic solutions and suffi-
ciently hot gases (plasmas), at least one electron per atom or molecule is only weakly
bound to its parent and can easily be pushed from one atom/molecule to the next by
small electric fields. We say that these conduction electrons are free to move in
response to an applied field and that the material conducts electricity.
Conductors also have some other special properties in their response to applied fields
that we’ll learn about later. Since electrons are bound to atoms by forces
with a finite magnitude, all matter becomes a conductor in a strong enough field! Di-
electric insulators that are placed in such a strong field experience something called
dielectric breakdown and shift suddenly from an insulating to a conducting state.
Lightning is a spectacular example of dielectric breakdown.
• Semiconductors. These are special “quantum” materials that can be shifted be-
tween being a conductor or an insulator depending on the potential difference at
the interfaces between different “kinds” of semiconducting materials. This is an en-
tirely quantum mechanical effect and is hence a bit beyond the classical bounds of
this course, but it certainly doesn’t hurt to know that semiconductors exist even in
this course, as semiconductors are extremely important to our society. In particular,
semiconductors are used in three critical ways: to make diodes
(one-way gates that allow electrical current to pass only in one direction, which we
will discuss as electrical circuit elements when we talk of rectification in AM radios),
as amplifiers (transistors) (used to make electronically played music and speech
adjustably loud enough to listen to, for example), and as switches from which the digital
information processing devices that dominate modern existence are built. This list is
far from exhaustive – see Wikipedia: http://www.wikipedia.org/wiki/Semiconductor
for a more complete discussion. An important contemporary use of semiconductors is
as the basis for solar cells. One very crude way to think of at least some kinds of solar
cell is as diodes where light (in the form of a photon) “kicks” an electron across the
one-way semiconductor barrier.
This concludes our discussion of charge per se for now. At this point you can see that
charge is indeed ubiquitous! We (and everything around us) are made up of charged
particles – even the neutral neutrons in the nuclei that make up most of our mass are
made up of charged particles! But just what force is it that binds electrons to nuclei, and
in turn binds one atom to another to form a molecule? What force pushes atoms apart,
so that we can set a coffee cup down on a table and not have it fall right through or get
stuck there? It is time to learn about one of the most important force laws in the Universe,
the one that is perhaps the most directly responsible for chemistry and biology: Coulomb’s
Law.
If one charges various objects (for example, two conducting balls suspended from an in-
sulating string so that they are near to one another but not touching) and measures the
deflection of the string when the balls are in force equilibrium, one can verify that:
• The force between the charges is proportional to each charge separately. The force
is bilinear in the charge.
• The force acts along the line connecting the two charges.
• The force is repulsive if the charges have the same sign, attractive if they have differ-
ent signs.
• The force is inversely proportional to the square of the distance between them.
These four experimental observations are summarized as Coulomb’s Law. It is
a law of nature, on a par with Newton’s Law of Gravitation (which it greatly resembles),
although we will spend most of our time studying an equivalent (and slightly more
fundamental) version of this law, Gauss’s Law for Electrostatics.
In general, while we like to understand laws like this verbally, they are more useful to
us if we can formulate them algebraically, expressed in a suitable coordinate frame. Let’s
draw just such a frame in figure 1.6.
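[Figure 1.6: charges q1 and q2 at positions $\vec{r}_1$ and $\vec{r}_2$, with the relative vector $\vec{r}_{12}$ and the forces $\vec{F}_{12}$ and $\vec{F}_{21}$.]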
We need the force acting on one of the charges, say q1, to be bilinear in the charges
(proportional to both of them as the labelling of each as 1 or 2 is arbitrary). It has to
act along the line connecting the charges, so we make a unit vector from 2 to 1 as
$(\vec{r}_1 - \vec{r}_2)/|\vec{r}_1 - \vec{r}_2|$. It has to be inversely proportional to
$|\vec{r}_1 - \vec{r}_2|^2$. Finally, we need a dimensioned
constant to connect up the (SI) units on both sides of the equal sign.
If we put these things together, we get:
$$\vec{F}_{12} = \frac{k_e q_1 q_2}{|\vec{r}_1 - \vec{r}_2|^2} \, \frac{(\vec{r}_1 - \vec{r}_2)}{|\vec{r}_1 - \vec{r}_2|} = k_e q_1 q_2 \frac{(\vec{r}_1 - \vec{r}_2)}{|\vec{r}_1 - \vec{r}_2|^3} \tag{1.9}$$
as the force acting on charge q1 at position $\vec{r}_1$ (in an arbitrary coordinate frame) due to
charge q2 at position $\vec{r}_2$. It acts on a line from charge 2 to charge 1 independent of frame
(contains only relative vectors). It is proportional to both charges and satisfies Newton’s
third law. It is inversely proportional to the distance that separates them squared. It even
has the benefit of being repulsive if both charges have the same sign and attractive if they
have opposite signs, in either order! It is a perfect rendition of the verbal statements of the
observations of Franklin and Coulomb, but now we can compute the force in a specific set
of coordinates – if we have the constant of proportionality.
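As a concrete illustration, equation 1.9 translates almost line for line into a short program. The following is only a minimal sketch: the function name `coulomb_force`, the example charges and positions, and the numerical value used for the constant (the standard SI value of roughly 8.99 × 10^9 N·m²/C²) are assumptions made for this example.

```python
import math

K_E = 8.9875517923e9  # N m^2 / C^2; assumed SI value of the constant k_e

def coulomb_force(q1, q2, r1, r2):
    """Force on charge q1 at position r1 due to charge q2 at position r2 (equation 1.9)."""
    d = [a - b for a, b in zip(r1, r2)]        # relative vector r1 - r2 (points from 2 to 1)
    dist = math.sqrt(sum(c * c for c in d))    # |r1 - r2|
    scale = K_E * q1 * q2 / dist**3
    return [scale * c for c in d]

# Hypothetical example: two like charges of 1 microcoulomb, 10 cm apart on the x-axis.
f12 = coulomb_force(1e-6, 1e-6, (0.1, 0.0, 0.0), (0.0, 0.0, 0.0))
f21 = coulomb_force(1e-6, 1e-6, (0.0, 0.0, 0.0), (0.1, 0.0, 0.0))
print(f12)  # points along +x: charge 1 is pushed away from charge 2
print(f21)  # equal and opposite, as Newton's third law requires
```

Flipping the sign of either charge flips the sign of every component, which is the “attractive if they have opposite signs” observation in algebraic form.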
Following this plan, then, let us propose an electrostatic field that is the supposed “local
cause” of the electrostatic force between two charged objects. This means that every
charged object in the Universe produces its own electrostatic field that emanates from the
charge and presumably contributes (according to the superposition principle) to the total
force any other charge experiences at any given point in space and time. Note well that
this field is present everywhere in space whether or not we measure it, whether or not
there is a charge there to be acted on by it.
42
Actually, a coulomb is not a fundamental unit in the SI system. Its size is defined in terms of the ampere
– the unit of electrical current – and seconds via measurements made of magnetic forces. We’ll learn about
this in a few weeks when we study magnetism but it won’t matter to us much if at all except as a historical
curiosity.
We leave until a future electrodynamics course the question of whether or not this field
is instantaneously linked to the position of the charge (so that moving a source charge
moves its field with it everywhere, no matter how far away, synchronously) or if the field
change propagates away from the charge at some finite speed. Properly speaking and
empirically it is the latter (one of many ways we come to believe in the reality of our con-
struct) but in the context of the stationary or slowly moving charges of electrostatics it won’t
really matter as the speed of propagation of the field – the speed of light, $c \approx 3 \times 10^8$ m/sec
– is so rapid that it might as well be “instantaneous” in the laboratory scale experiments
and problems we will encounter in this course.
The fundamental definition of electrostatic field produced by a “source” charge $q_s$ at
position $\vec{r}_s$ is that it is the electrostatic force per unit charge acting on a small test
charge $q_0$ placed at any given point in space $\vec{r}$ in the limit that the test charge vanishes:
$$\vec{E} = \lim_{q_0 \to 0} \frac{\vec{F}}{q_0} \tag{1.11}$$
$$\vec{E}(\vec{r}) = \lim_{q_0 \to 0} \frac{1}{q_0} \frac{k_e q_0 q_s (\vec{r} - \vec{r}_s)}{|\vec{r} - \vec{r}_s|^3} = k_e q_s \frac{(\vec{r} - \vec{r}_s)}{|\vec{r} - \vec{r}_s|^3} \tag{1.12}$$
A much more compact way to understand this results from putting the source charge
at the origin of our coordinate system and (since now there is only one charge) labelling it
q. In this case we get the following extremely simple way of representing the electrostatic
field of a point charge q:
$$\vec{E}(\vec{r}) = \frac{k_e q}{r^2}\,\hat{r} \tag{1.13}$$
This result can be re-expressed in words. A point charge q produces a radially symmetric
electrostatic field proportional to q that drops off like $1/r^2$ where r is the distance of the point
of observation from q. It points radially away from positive charges and radially in towards
negative charges (basically preserving the sign of q, in other words). If you remember this,
then it is easy to figure out the changes needed when you have multiple charges and at
least some of them are not at the origin.
A question that students often ask is: “Why all of the hassle with letting test
charges go to zero after dividing if you’re just going to divide it out anyway?” The reason
is that – as we will see later – the presence of the test charge exerts a force in turn on
the source distribution of charge that produced the field! If that charge distribution is not
metaphorically “nailed down”, if it can move at all in response to the test charge, it will
rearrange and thereby change the field one is trying to measure. One is no longer mea-
suring the field produced by (say) a charged conducting sphere, one is measuring the field
produced by a charged conducting sphere in the presence of another charge that alters
the charge distribution on the conducting sphere! By letting the test charge in the definition
go to zero, one formally causes any disturbance caused by the measurement itself to go
to zero, leaving you with the field that is (presumably) still there in the limit in the absence
of any charges but those in the provided distribution.
However, students are also correct – we do start with the force and factor out and
eliminate that test charge because the only way we have of measuring the field is via the
force, and the measurement (like all measurements) is bound to alter the thing we are
measuring; formally minimizing this disturbance is confusing! It makes just as much
sense to skip the force entirely – since we are making up the entire idea of electrostatic
field in the first place – and just assert that an isolated point charge – a charged elementary
particle from Table 1 above, for example – simply produces a radial electrostatic field as
given in equation 1.12 above by definition.
As it happens, this works quite well. The fundamental definition of the electrostatic
field starts with the force between two charges and infers the field, but almost always, we’ll
actually work the other way around. In general we’ll be given a distribution of charges
(either discrete or a continuous charge distribution), from which we must determine the
field. With the field known, we can then evaluate the force we expect these charges to
exert on an arbitrary (e.g. test) charge placed in the field at an arbitrary point in
space by means of the following rule:
$$\vec{F} = q\vec{E} \tag{1.14}$$
For two point charges and the definition of field given by 1.12 this result is exact. Indeed, it
correctly describes the alterations in the force observed, when one accounts for the force-
driven alterations in the positions of the source charges! So we might as well use equation
1.12 as our definition for the field of a point charge and move on from there!
So much for a single charge, but as we noted above, there are lots of charges in even
tiny chunks of matter. We need a way of finding the total field produced by many charges,
not just one. Furthermore, that way needs to work for charges counted “one at a time”
(when there are only a few and they are enumerable) and it also needs to be useful in the
limit of so many charges that a coarse-grained average yields an approximately continuous
charge distribution in bulk matter.
Fortunately for all concerned, the fields of many charges simply add right up! This too
is a principle of nature (and is related to the linearity of the underlying equations that are
the laws of nature). We call it the Superposition Principle.
Figure 1.7: Geometry needed to evaluate the field of “many” (three) charges at an arbitrary
point P at position $\vec{r}$. Only the field of the third charge $\vec{E}_3$ is shown explicitly. Note well the
magnitude and direction of the vector $\vec{r} - \vec{r}_3$: head at $\vec{r}$, tail at $\vec{r}_3$. This is a vector from
the position of the charge q3 to the point of observation P at $\vec{r}$.
Given a collection of charges located at various points in space, the total electric field
at a point is the sum of the electric fields of the individual charges:
$$\vec{E}(\vec{r}) = \sum_i \frac{k_e q_i (\vec{r} - \vec{r}_i)}{|\vec{r} - \vec{r}_i|^3} \tag{1.15}$$
This is illustrated in figure 1.7. Note that this is basically the force superposition rule we
learned above, divided by the “test charge” q0 in the standard definition. For fixed-position
point charges, as we’ve seen, there can be no charge rearrangement of the sources so
the limiting step $q_0 \to 0$ is not strictly necessary, but it doesn’t hurt to use it. As before,
the vector field contributions from each charge carry the sign of the source charge – away
from the source as in the figure for (assumed) positive charges, but if any charge is actually
negative its field is directed towards the source charge.
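The sum in equation 1.15 is simple enough to write out as a loop. What follows is a minimal sketch rather than a definitive implementation: the function name `total_field`, the example charges, and the numerical value of the constant are all assumptions made for illustration.

```python
import math

K_E = 8.9875517923e9  # N m^2 / C^2; assumed SI value of the constant k_e

def total_field(charges, r):
    """Sum the point-charge fields of equation 1.15 at the point r.

    charges is a list of (q_i, r_i) pairs; r and each r_i are 3-tuples in meters.
    """
    field = [0.0, 0.0, 0.0]
    for q_i, r_i in charges:
        d = [a - b for a, b in zip(r, r_i)]      # r - r_i: tail at the charge, head at P
        dist = math.sqrt(sum(c * c for c in d))
        for k in range(3):
            field[k] += K_E * q_i * d[k] / dist**3
    return field

# Hypothetical example: equal positive charges at x = -1 m and x = +1 m.
charges = [(1e-9, (-1.0, 0.0, 0.0)), (1e-9, (1.0, 0.0, 0.0))]
print(total_field(charges, (0.0, 0.0, 0.0)))  # the two fields cancel at the midpoint
print(total_field(charges, (0.0, 1.0, 0.0)))  # x-components cancel, y-components add
```

Note that the sign of each q_i is carried along automatically, so a negative charge contributes a field directed back towards itself without any special handling.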
Simple as it is, the superposition principle is extremely important in physics. It tells us
that the electrostatic field results from a linear field theory and later in a study of physics you
will learn that this means that the differential equations that describe the field produced by
charge distributions are linear differential equations. Field theories don’t have to be linear,
but it turns out that the ones of the greatest importance in physics are43 .
Observe that the total $\vec{E}$-field in equation 1.15 is a vector sum! This means that in
most cases one will have to decompose the field produced by each charge into its
vector components in the coordinate frame in question, then add the components
separately, and finally reconstruct the total vector! It is easier to show you how that
all works than tell you, so let’s look at two simple examples of evaluating the total electric
field produced by only two point charges. Both of these are very useful examples quite
43
Mostly, anyway. In quantum mechanics things like vacuum polarization make even electrodynamics – the
“poster child” of a linear theory – somewhat nonlinear at very short length scales very near a point charge.
aside from illustrating the fairly simple math associated with summing up the field of point
charges.
Example 1.4.1: Finding the Field of Two Point Charges – An ‘Electric Dipole’
Two charges ±q located symmetrically on (say) the y-axis produce a field that is easy to
evaluate at points on the x and y-axis. This arrangement of charges is called an electric
dipole and is a very important concept that we will work with extensively this semester and
beyond. So, suppose we have two point charges of magnitude −q and +q, located on the
y-axis at y = −a and y = +a, respectively. We’d like to find the electric field first at an
arbitrary point on the y axis, and again at an arbitrary point on the x axis.
At that point you should be able to find an expression for the electric field at an arbitrary
Cartesian point (x, y) by generalizing the steps we take in these problems. Note that we
are not really ignoring z even though the field is three dimensional and we are omitting it in
the example, because we expect the $\vec{E}$-field for this example to be azimuthally symmetric,
that is, not to change as we rotate the solution around the y-axis, turning the solution in
the x-y plane into the solution in other planes containing the y-axis.
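Let’s start by drawing a good figure like the one on the left
of the two charges on the y-axis, as well as the arbitrary
point y on the axis where we wish to evaluate the field.
Recall that the field of a point charge is:
$$\vec{E} = \frac{k_e q}{r^2}\,\hat{r}$$
[Figure: the charge +q on the y-axis, the point of observation y at a distance y − a from it, and the field contributions $\vec{E}_+$ and $\vec{E}_-$.]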
As we might have expected just from the figure, the problem is basically one dimen-
sional – there is only one field component to add up with no vector decomposition per se
really needed. The total field on the y axis is just:
$$\vec{E}_{tot}(0, y) = k_e q \left( \frac{1}{|y - a|^2} - \frac{1}{|y + a|^2} \right) \hat{y} \tag{1.18}$$
The field on the x-axis is a tiny bit more difficult. As before, we start with a good figure
defining the coordinate system and the point on the x-axis where we want to evaluate the
field:
Figure 1.9: The coordinate frame and geometry needed to evaluate the $\vec{E}$-field at an
arbitrary point on the x-axis. Note Well that we defined an angle θ in the figure to aid us in
decomposing the vector components of the field even though there is no ‘θ’ in the
coordinate system!
Here the field produced by each charge has two (both x and y) components. We now
have to solve the problem in three steps:
• Find the magnitude of the field of each charge, using e.g. the Pythagorean theorem.
• Find the vector components of the field of each charge. This is tricky! Remember
θ is not given so we will have to find things like sin θ and cos θ in terms of the givens
and the coordinates of the point in question!
• Finally, we have to add up the components and reconstruct the full vector field. As
always, there are multiple ways we might represent the final vector and we’re not
picky as long as your answer uniquely and correctly specifies that vector!
To find the vector field, we must first find the magnitude of the field. Observe that the
distance from either charge to the point of observation drawn above is $r = (x^2 + a^2)^{1/2}$.
Then the magnitude of the electric field vector of either charge is just:
$$|\vec{E}| = \frac{k_e q}{r^2} = \frac{k_e q}{(x^2 + a^2)} \tag{1.19}$$
This magnitude is represented by the length of the field arrows in figure 1.9. We orient the
arrows away from the upper (+) charge and toward the lower (-) charge.
Now look at the right triangle formed by x, a and r with θ drawn in one corner. By
definition (think about it):
$$\cos\theta = \frac{x}{r} = \frac{x}{(x^2 + a^2)^{1/2}} \tag{1.20}$$
$$\sin\theta = \frac{a}{r} = \frac{a}{(x^2 + a^2)^{1/2}} \tag{1.21}$$
(where we are writing down the positive, quadrant 1 values, and will handle the signs
needed in our final algebraic expressions from the picture). Using these, we can find the
components:
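$$E_x = |\vec{E}| \cos\theta = \frac{k_e q}{(x^2 + a^2)} \cdot \frac{x}{(x^2 + a^2)^{1/2}} = \frac{k_e q x}{(x^2 + a^2)^{3/2}} \tag{1.22}$$
and
$$E_y = -|\vec{E}| \sin\theta = -\frac{k_e q}{(x^2 + a^2)} \cdot \frac{a}{(x^2 + a^2)^{1/2}} = -\frac{k_e q a}{(x^2 + a^2)^{3/2}} \tag{1.23}$$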
This is for a single charge (+q). The other charge has components that are the same
magnitude but its Ex obviously cancels Ex from the first charge while its Ey obviously adds
to it (doubling it). The total field is thus:
$$\vec{E}_{tot}(x, 0) = -\frac{2 k_e q a}{(x^2 + a^2)^{3/2}}\,\hat{y} \tag{1.24}$$
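It is a useful habit to check a result like equation 1.24 against a brute-force vector sum. The sketch below adds the fields of the two charges component by component and compares the total with the closed form; the function names and the numerical values of q, a, and x are hypothetical choices, and the value of the constant is the assumed SI one.

```python
K_E = 8.9875517923e9  # N m^2 / C^2; assumed SI value of the constant k_e

def point_field(q, src, r):
    """Field at the point r of a point charge q located at src (both in the x-y plane)."""
    dx, dy = r[0] - src[0], r[1] - src[1]
    dist3 = (dx * dx + dy * dy) ** 1.5
    return (K_E * q * dx / dist3, K_E * q * dy / dist3)

def dipole_field_on_x_axis(q, a, x):
    """Superpose the fields of +q at y = +a and -q at y = -a at the point (x, 0)."""
    e_plus = point_field(+q, (0.0, +a), (x, 0.0))
    e_minus = point_field(-q, (0.0, -a), (x, 0.0))
    return (e_plus[0] + e_minus[0], e_plus[1] + e_minus[1])

q, a, x = 1e-9, 0.01, 0.05   # hypothetical values: 1 nC, a = 1 cm, x = 5 cm
ex, ey = dipole_field_on_x_axis(q, a, x)
ey_formula = -2 * K_E * q * a / (x * x + a * a) ** 1.5   # equation 1.24
print(ex)               # the x-components cancel
print(ey, ey_formula)   # the y-components add, and agree with the closed form
```

The same two helper functions work unchanged at any other point in the x-y plane, where no cancellation occurs and both components survive.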
Our next topic will be the further investigation of the electric dipole. In it, we will define
the electric dipole moment and see that the dipole moment of this arrangement of charges
is:
$$\vec{p} = 2qa\,\hat{y}. \tag{1.25}$$
Thus the $\vec{E}$-field can be expressed in terms of the magnitude of the dipole moment
$p = |\vec{p}| = 2qa$ as:
$$\vec{E}_{tot}(x, 0) = -\frac{k_e p}{(x^2 + a^2)^{3/2}}\,\hat{y}. \tag{1.26}$$
The field on both the x and y axes drops off like one over the (approximate) distance
from the origin cubed in the limits where x ≫ a or y ≫ a (you can easily show this using
the binomial expansion). This goes to zero faster than the one over r-squared field of an
“electric monopole” (single point charge) but is still certainly not zero anywhere in space
any more than the field of a single charge is!
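For the y-axis, the inverse-cube behavior can be seen directly from equation 1.18. The following is a sketch of the algebra, assuming y > a > 0 so that the absolute value signs can be dropped, and using p = 2qa from equation 1.25:

```latex
\begin{aligned}
\vec{E}_{tot}(0, y)
  &= k_e q \left( \frac{1}{(y - a)^2} - \frac{1}{(y + a)^2} \right) \hat{y}
   = k_e q \, \frac{(y + a)^2 - (y - a)^2}{(y^2 - a^2)^2} \, \hat{y}
   = \frac{4 k_e q a \, y}{(y^2 - a^2)^2} \, \hat{y} \\
  &= \frac{2 k_e p}{y^3} \left( 1 - \frac{a^2}{y^2} \right)^{-2} \hat{y}
   \approx \frac{2 k_e p}{y^3} \left( 1 + \frac{2 a^2}{y^2} + \ldots \right) \hat{y}
\end{aligned}
```

To leading order the field on the dipole axis is therefore twice as large as the field at the same distance on the x-axis given by equation 1.26, and points in the opposite direction (along +ŷ rather than −ŷ).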
At this point you should be able to evaluate the electric field vector of a y-directed dipole
at an arbitrary point (x, y) in the x-y plane, not just on the x and y axes themselves! The
geometry and trigonometry are just a little bit more difficult, but are still pretty straightforward
if you use the Pythagorean theorem and carefully draw the triangles needed to find
the x and y components of the field of each charge. Now however, there will be no
fortuitous cancellation of field components – the $\vec{E}$-field will generally not point along the y
axis unless you are at a point on the y axis itself or on the x-z plane through the origin.
Sadly, Cartesian coordinates are not the ideal coordinate system in which to study electric
dipoles, and it will turn out to be much easier to define the electric potential (a scalar field)
of the dipole rather than the vector electrostatic field, once we know what that is, so we’ll
come back to dipoles again later when we’ve reviewed a couple more coordinate systems
and learned about potential.
Before we leave, however, we do need to spend a bit of time on just what an “electric
dipole” is, and why they are important to us. We’ll start, then, by formally defining an
electric dipole and learning how it interacts with the electric field and at least look at the
shape of the field it produces at more points than just those on the x or y axis.
As we just noted, the arrangement of two equal but opposite charges above is called an
electric dipole44 . Dipole fields play an enormously important role in physics! That is
because dipolar arrangements of charge and the forces and torques that act on them and
the fields that they produce are common in nature and play a critical role in things like, well,
life! Our life! Let’s see why.
Figure 1.10: An atom in an electric field polarizes as its nucleus is displaced relative to its
electron cloud when it is pushed one way while the cloud is pulled the other.
In a few weeks we will consider the field produced by polarized atoms on average
inside a solid as this field modifies the field that polarizes the atoms and we will learn
some wonderful things. The model for “an atom” we develop will be very much like that
illustrated in figure 1.10 above. In particular we will develop this picture into something
called the Lorentz Oscillator Model which idealizes an atom as a uniform ball of negative
charge symmetrically surrounding a small (pointlike) positively charged nucleus such that
the total charge is zero. This particular atomic model predicts harmonic oscillation of an
atomic dipole moment as well as a linear response model for the polarization of an atom. It
actually works remarkably well all the way up to graduate electrodynamics classes to help
students understand the general principles of dielectric polarization, electric conduction,
radiation, and more!
For the moment, however, it suffices for us to recognize that since we are a big pile
of atoms and those atoms spontaneously polarize in electrical fields (which are also ubiq-
uitous), the forces and torques acting on dipoles, and the fields produced by dipoles, are
both of great interest to us as we seek to understand ourselves and everyday “stuff” about
the world around us such as why charged balloons stick to walls, why the sky is blue and
the sunset is red, and why matter hangs together even though it is generally electrically neutral.
We will eventually learn that dipoles are literally everywhere,
and are very important indeed in our efforts to build a rational worldview that explains the
world of our everyday experience in simple, intuitive terms.
Let’s start by modelling the resulting charge distribution of a polarized atom (or any
other dipolar system) as a “basic neutral electric dipole” constructed directly out of two
pointlike charges of opposite sign separated by a vector distance $\vec{l}$ from the negative to
the positive charge.
Figure 1.11: The basic dipole consists of two equal and opposite charges ±q separated by
a vector displacement $\vec{l}$, in which case the dipole moment of the arrangement is defined to
be $\vec{p} = q\vec{l}$.
Note that we are not in this figure assuming that any particular E-field (like the one drawn)
or other force is creating the dipole; rather we are assuming that
it is fixed, with the charges rigidly separated by e.g. a massless rod in between. Because
we are interested in determining what the force and torque acting on the dipole are when
it is located in a field, in figure 1.11 we have gone ahead and placed our basic dipole in a
uniform field pointing to the right.
$$\vec{p} = q\vec{l} \tag{1.27}$$
where q is the magnitude of the charge and $\vec{l}$ is the vector that points from the negative
charge to the positive charge. As our work evaluating the electric field of a dipole in the
previous section might suggest, this definition is far from arbitrary – it turns out that this
is precisely the quantity that behaves somewhat “like a charge” in the equations that describe
its field. To emphasize the similarity, note well that we can refer to isolated charges
as electric monopoles, and will eventually learn to speak of monopolar and dipolar (and
quadrupolar!) moments of arrangements or distributions of charge. In future electrody-
namics courses, you will spend a considerable amount of time learning to expand electro-
magnetic fields in terms of the multipolar moments of source distributions, using math that
is considerably more complex than the simple stuff we will use here, but the idea will be
exactly the same, which is why it makes sense to understand the terminology and point of
it all here, where it is still very simple!
We’ll start by considering the force and the torque exerted by a uniform electric field on a
dipole as illustrated in figure 1.11 above. When an electric dipole is placed in a uniform
electrical field, the forces on the two poles are equal in magnitude and opposite in
direction, that is, they form what we learned to call a force couple in the mechanics course
preceding this one.
In that course, we learned (and it is easy to see explicitly) that the net force exerted
by a force couple (and hence the net force acting on a dipole in a uniform field) is zero.
Algebraically:
$$\vec{F} = -q\vec{E} + q\vec{E} = 0 \tag{1.28}$$
If the dipole is not aligned or antialigned with the uniform field, however, the force
couple produced by the field clearly exerts a pure torque on the dipole. In particular, this
torque is independent of our choice of pivot 45 .
If we pick (say) the negative charge as the pivot, we can evaluate the torque most
easily, as it is due to the force exerted on the positive charge only, at position $\vec{l}$ relative to
the pivot. The torque is therefore:
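$$\begin{aligned}
\vec{\tau} &= \vec{r} \times \vec{F} \\
&= \vec{l} \times q\vec{E} \\
&= q\vec{l} \times \vec{E} \\
&= \vec{p} \times \vec{E}
\end{aligned} \tag{1.29}$$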
45
Do not hesitate to look back at Introductory Physics I if necessary to review this and other aspects of
torque.
Note well that charge is a scalar quantity so we can pull it back through the cross
product!
This is a very important result; learn this picture and mini-derivation well so you can
easily remember and apply it46 .
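Both the result in equation 1.29 and the claim that the torque of a couple does not depend on the pivot can be checked numerically. Here is a minimal sketch that sums $\vec{r} \times \vec{F}$ for the two charges about two different pivots and compares each with $\vec{p} \times \vec{E}$; the helper names and all of the numerical values are hypothetical.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3-vectors."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def couple_torque(pivot, q, r_minus, r_plus, e_field):
    """Sum of r x F about pivot for -q at r_minus and +q at r_plus in a uniform field."""
    f_plus = tuple(q * c for c in e_field)
    f_minus = tuple(-q * c for c in e_field)
    t1 = cross(sub(r_plus, pivot), f_plus)
    t2 = cross(sub(r_minus, pivot), f_minus)
    return tuple(u + v for u, v in zip(t1, t2))

# Hypothetical numbers: q = 1 nC, |l| = 1 mm at 30 degrees above the x-axis,
# in a uniform field of 1000 N/C along +x.
q = 1e-9
theta = math.radians(30.0)
r_minus = (0.0, 0.0, 0.0)
r_plus = (1e-3 * math.cos(theta), 1e-3 * math.sin(theta), 0.0)
e_field = (1000.0, 0.0, 0.0)

p = tuple(q * c for c in sub(r_plus, r_minus))     # p = q l
tau_formula = cross(p, e_field)                    # equation 1.29
tau_a = couple_torque(r_minus, q, r_minus, r_plus, e_field)
tau_b = couple_torque((3.0, -2.0, 5.0), q, r_minus, r_plus, e_field)
print(tau_formula, tau_a, tau_b)
```

Only the z-component is nonzero, and it is negative: the torque twists the dipole clockwise, towards alignment with the field.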
Associated with this torque is the following potential energy which is clearly minimized
when the dipole moment aligns with the applied field. We look at the picture above, and
consider the amount of work done by only the component of the force tangent to the
arc of motion as we twist the dipole from a position at right angles to the field (where we
define the potential energy to be zero) to an arbitrary angle. A bit of consideration and a
good picture (see homework) should convince you that:
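$$\begin{aligned}
U &= -\int F_t \, ds \quad \left(\text{or } = -\int \tau \, d\theta\right) \\
&= -\int_{\pi/2}^{\theta} \left(-qE \sin(\theta)\right) \ell \, d\theta \\
&= -pE \cos(\theta)
\end{aligned}$$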
(Note well! The force/torque has the opposite sign to that of the angle θ! There’s bound
to be a harmonic oscillator in there somewhere...) or
$$U = -\vec{p} \cdot \vec{E} \tag{1.30}$$
Note that U (θ) is minimum (maximally negative) when the dipole is aligned with the field,
maximum (maximally positive) when antialigned. Alignment (when both force and torque
are zero and energy is minimum) is a point of stable equilibrium for this system!
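The integral above is easy to check numerically. The sketch below estimates $-\int \tau \, d\theta$ from π/2 to θ with a simple midpoint rule, using the restoring torque τ = −pE sin θ, and compares it with −pE cos θ; the function name, the number of steps, and the values of p and E are hypothetical choices.

```python
import math

def potential_energy_numeric(p, e, theta, steps=10000):
    """Midpoint-rule estimate of U = -(integral of tau d(theta)) from pi/2 to theta,
    with restoring torque tau(theta) = -p * e * sin(theta)."""
    start = math.pi / 2
    h = (theta - start) / steps          # negative when theta < pi/2
    total = 0.0
    for i in range(steps):
        mid = start + (i + 0.5) * h
        total += -p * e * math.sin(mid) * h
    return -total

p, e = 2e-12, 500.0   # hypothetical dipole moment (C m) and field strength (N/C)
for deg in (0, 60, 90, 180):
    theta = math.radians(deg)
    print(deg, potential_energy_numeric(p, e, theta), -p * e * math.cos(theta))
```

The two columns agree closely, with the most negative value at 0 degrees (aligned) and the most positive at 180 degrees (antialigned).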
These two expressions are only generally exact if $\vec{E}$ is uniform – we need it to be at
least approximately the same at the two ends of the dipole so the forces form a couple
and the energy is strictly due to the torque. More practically, however, they are usable (and
quite accurate) whenever the dipole is short (ideally “point-like”) relative to the scale over
which $\vec{E}$ varies, so that the value of $\vec{E}$ “at the position of the dipole”, varying or not, is a
well-defined quantity.
Note well that while $\vec{\tau} = \vec{p} \times \vec{E}$ is going to always be true in this limit and context, from
the picture above and our general knowledge of intro-level mechanics, we do not expect
that the force on a dipole, point-like or not, in a more general non-uniform field will be zero!
Recall that:
$$\vec{F} = -\vec{\nabla} U \tag{1.31}$$
was the inverse of our definition of work in mechanics! Each component of a conservative
force was related to a (partial) derivative of the scalar potential energy. From this we expect
that:
$$\vec{F} = \vec{\nabla}(\vec{p} \cdot \vec{E}) \tag{1.32}$$
46
Since this is the first time this semester that you have seen a cross product, if you have started to forget
it is also a really good time to go back and review that, as well! You need to be very comfortable with its
pictorial representation, its algebra and geometry, and of course the good old right hand rule!
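$$\begin{aligned}
\vec{F} &= -q\vec{E} + q\vec{E}' \\
&= -q\vec{E} + q\vec{E} + q\Delta\vec{E} \\
&= q\Delta\vec{E} \\
&= \vec{\nabla}(\vec{p} \cdot \vec{E})
\end{aligned} \tag{1.33}$$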
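Equations 1.32 and 1.33 can be compared numerically in a simple one dimensional case. The sketch below places a short dipole on the x-axis, pointing along +x, in the non-uniform field of a point charge at the origin; it computes the net force on the two charges directly and compares it with a finite-difference estimate of the derivative of $p E_x$. All of the names and numbers are assumptions made for this example, including the value of the constant.

```python
K_E = 8.9875517923e9  # N m^2 / C^2; assumed SI value of the constant k_e

def e_x(big_q, x):
    """x-component of the field of a point charge big_q at the origin, for x > 0."""
    return K_E * big_q / x**2

def force_two_charges(big_q, q, l, x0):
    """Net x-force on -q at x0 - l/2 and +q at x0 + l/2 (a dipole pointing along +x)."""
    return q * e_x(big_q, x0 + l / 2) - q * e_x(big_q, x0 - l / 2)

def force_gradient(big_q, p, x0, h=1e-6):
    """Central-difference estimate of d(p * E_x)/dx at x0 (equation 1.32 in one dimension)."""
    return p * (e_x(big_q, x0 + h) - e_x(big_q, x0 - h)) / (2 * h)

big_q, q, l, x0 = 1e-6, 1e-9, 1e-4, 0.5   # hypothetical values with l << x0
print(force_two_charges(big_q, q, l, x0))
print(force_gradient(big_q, q * l, x0))
```

Both numbers are negative and nearly equal: a dipole aligned with the field is pulled towards the region of stronger field, here back towards the source charge.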
We’ve already started to get a feel for the field produced by an electric dipole, in at least two
possible (rather symmetric) directions. Not to worry! It is already familiar to any student
who has done the simple school experiment of sprinkling iron filings onto a sheet of paper
sitting on a small bar magnet (which as we will eventually learn is a magnetic dipole) as
the “butterfly pattern” that emerges as the filings line up with the magnetic field. It is even
possible to do the same thing with an electric dipole under a sheet of paper or plexiglass,
but one has to use something like husks of rice in place of the iron filings. Both of these
experimental results suggest that the “lines of force” associated with our ‘imaginary’ electric
field (and later magnetic field) as an explanation or cause of electrostatic or magnetostatic
forces are a bit more than just a metaphor: they are supported by observations of nature!
In many cases, the physical length of the dipole (2a in the case worked out above and
extended below) will be small compared to x, the distance of the point of observation to
the dipole. In this limit, the field (or later, potential) produced is that of an ideal dipole, or
a point dipole. The general butterfly shape of the electric dipolar field of a point dipole is
illustrated in figure 1.12 using the visualization enabled by drawing field lines of force that
are everywhere tangent to the electric field and proceed smoothly from positive to negative
charges.
47
Students who have never seen the gradient operator $\vec{\nabla}$ before and who are not potential physics or math
majors will not be tested on this, but are still advised to read and study it and to try to understand it, because
it actually explains a lot of things very compactly that otherwise (as we have seen and will see further below)
are actually more difficult to derive and evaluate than the gradient.
Figure 1.12: The characteristic “butterfly” electric field of an electric dipole oriented in
the positive y-direction (up, in the figure) located at the origin. The field has azimuthal
symmetry around the axis of the dipole.
We can actually find the dipolar field to very high accuracy in the limit that x ≫ a
very easily by factoring out the larger of the two quantities (x) in the denominator, moving
the remaining part of the denominator to the numerator by giving it a negative exponent,
and then performing a binomial expansion on that part and keeping terms to any desired
degree of precision (but usually, keeping just the first surviving, that is, non-zero term as
that is the only one important at long distances).
Let’s apply this process to find the field of our y-directed electric dipole at an arbitrary
point on the x-axis in the limit where x ≫ a! In this case we get:
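$$\begin{aligned}
\vec{E}_{\text{tot}}(x, 0) &= -\frac{k_e |\vec{p}|}{(x^2 + a^2)^{3/2}}\,\hat{y} \\
&= -\frac{k_e |\vec{p}|}{x^3 \left(1 + \left(\frac{a}{x}\right)^2\right)^{3/2}}\,\hat{y} \\
&= -\frac{k_e |\vec{p}|}{x^3} \left(1 + \left(\frac{a}{x}\right)^2\right)^{-3/2} \hat{y} \\
&\approx -\frac{k_e |\vec{p}|}{x^3} \left(1 - \frac{3}{2}\left(\frac{a}{x}\right)^2 + \ldots\right) \hat{y} \\
&\approx -\frac{k_e |\vec{p}|}{x^3}\,\hat{y} + O\!\left(\frac{1}{x^5}\right)
\end{aligned} \tag{1.34}$$
(where the last term is read in mathematese as: “plus neglected terms of order $1/x^5$”).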
It turns out that the field of a point dipole generally scales like $1/r^3$, where r is the
distance from the dipole to the point of observation. It thus vanishes more rapidly with
distance than the electric monopolar field (the field of a single bare charge, which goes like
$1/r^2$), but that does not mean the field is negligible, because the electric force is very
powerful, far stronger than gravity, and the strongest force of nature outside of the nucleus
of an atom.
Indeed, for most problems in physics that don’t involve planet-sized masses, the elec-
tromagnetic forces – whatever form or magnitude they might have – are by far the largest
forces acting within a system. To decide whether or not any algebraic expression for the
field can be neglected requires specific numbers; for that reason many problems will have
you find the leading order term(s) in a binomial or Taylor series expansion of the field or
potential.
Please go review both the binomial and Taylor series expansions, as they will be very
useful to us as we solve problems and work examples in this course. The binomial expan-
sion in particular is a wonderful way to do “in your head” estimates of quantities that would
otherwise require a calculator to evaluate.
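To see how good the leading term of equation 1.34 is, one can compare it with the exact on-axis expression numerically. The Python sketch below is an illustration only: it assumes the dipole is made of charges ±q at y = ±a, so that |p| = 2aq, and the function names and numerical values are hypothetical choices for this example.

```python
K_E = 8.9875517923e9  # Coulomb constant k_e in N m^2 / C^2


def dipole_field_exact(p, a, x):
    """Exact y-component of the dipole field on the x-axis (first line of 1.34)."""
    return -K_E * p / (x**2 + a**2) ** 1.5


def dipole_field_leading(p, x):
    """Leading (point dipole) term: -k_e p / x^3."""
    return -K_E * p / x**3


def dipole_field_next_order(p, a, x):
    """Leading term times the first binomial correction, 1 - (3/2)(a/x)^2."""
    return dipole_field_leading(p, x) * (1.0 - 1.5 * (a / x) ** 2)


if __name__ == "__main__":
    q, a = 1.0e-9, 0.01        # illustrative values: 1 nC charges, a = 1 cm
    p = 2.0 * a * q            # dipole moment magnitude |p| = 2aq (assumed geometry)
    for x in (0.05, 0.1, 1.0):
        exact = dipole_field_exact(p, a, x)
        lead = dipole_field_leading(p, x)
        rel_err = abs(lead - exact) / abs(exact)
        print(f"x/a = {x / a:6.1f}   relative error of leading term = {rel_err:.2e}")
```

The relative error of the leading term should shrink roughly like (3/2)(a/x)^2, which is the first neglected term of the binomial expansion.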
Note well that there are “no numbers” in the following problems. Most problems are for “all
students of physics”. Some problems are marked with a * as “advanced” and are intended
to be assigned primarily to physics majors or engineering students, who are expected to
know and use a bit more calculus than life science students, but note well that there is
plenty of calculus in the general problems! It is impossible to learn and understand physics
without calculus; Newton invented calculus just so he could formulate physics and this
course teaches the correct use of algebra, geometry, trigonometry, calculus in general
including simple differential equations (e.g. the harmonic oscillator, the wave equation) in
the solving of problems.
Problem 1.
Physics Concepts
In order to solve the following physics problems for homework, you will need to have the
following physics and math concepts first at hand, then in your long term memory, ready to
bring to bear whenever they are needed. Every week (or day, in a summer course) there
will be new ones.
To get them there efficiently, you will need to carefully organize what you learn as you
go along. This organized summary will be a standard, graded part of every homework
assignment!
Your homework will be graded in two equal parts. Ten points will be given for a complete
crossreferenced summary of the physics concepts used in each of the assigned problems.
One problem will be selected for grading in detail – usually one that well-exemplifies the
material covered that week – for ten more points.
Points will be taken off for egregiously missing concepts or omitted problems in the
concept summary. Don’t just name the concepts; if there is an equation and/or diagram
associated with the concept, put that down too. Indicate (by number) all of the homework
problems where a concept was used.
This concept summary will eventually help you prioritize your study and become your
own personal study guide to review for exams! To help you understand what I have in mind,
I’m building you a list of the concepts for this week, and indicating the problems that (will)
need them:
• Coulomb’s Law:
$$\vec{F}_{ij} = \frac{k_e q_i q_j (\vec{r}_i - \vec{r}_j)}{|\vec{r}_i - \vec{r}_j|^3}$$
• Electric Field:
$$\vec{E} = \lim_{q_0 \to 0} \frac{\vec{F}}{q_0} = \frac{k_e q (\vec{r}_0 - \vec{r})}{|\vec{r}_0 - \vec{r}|^3}$$
$$\vec{E} = \frac{k_e q}{r^2}\,\hat{r}$$
Needed in nearly all of the problems.
• This definition ensures that we can find the force on a charge as follows:
$$\vec{F} = q\vec{E}$$
which is the version of Coulomb’s Law that we will most often use in the problems –
find field first, then find force if necessary. Used in nearly all of the problems in this
context.
• We should keep in mind that charge is conserved. The net charge of objects cannot
change; charge can only move around, not be created or destroyed. A basic concept.
• The electric dipole moment of a pair of equal and opposite point charges of magnitude
q separated by a vector $\vec{l}$ is:
$$\vec{p} = q\vec{l}$$
We sometimes need the idea of quadrupole moments and monopole moments in this
chapter. Needed in problems 2, 3, 5, 6, 9.
$$\vec{\tau} = \vec{p} \times \vec{E}$$
Needed in problems 2, 3, 5, 6, 9.
$$\vec{F} = m\vec{a}$$
$$\tau = I\alpha$$
(problem 9); our knowledge of the Simple Harmonic Oscillator equation and its solutions:
$$\frac{d^2x}{dt^2} + \omega^2 x = 0$$
(problems 9 and 11); and gravity near the Earth’s surface:
$$\vec{F}_g = -mg\hat{y}$$
F
(down, in problems 7 and 8); and the ideas associated with stable versus unstable
equilibrium in problem 3.
Our knowledge of Newton’s Laws, rotation and oscillation and gravity near the earth’s
surface from the Mechanics part of this course is essential in this part as well!
• Two pieces of math that we will use repeatedly in this part of the course are the
Taylor Series Expansion of a function in terms of its derivatives:
Note well the similarity between this concepts summary needed for the homework and
the concepts summary that started the chapter. This is no accident; the chapter summary
is there at the start for a reason! However, there may be additions or deletions – don’t just
copy the summary, and be sure to cross-reference the problems. The latter step is what
will really help you when you are studying for a quiz or exam. What are the most important
ideas, the ones you must know for the exam? Your concept review will (eventually) let you
see at a glance...
Also, I included more concepts than are strictly needed by the problems – don’t hesitate
to add important concepts to your list (including concepts from Introductory Physics 1 in
this series) even if none of the problems seem to need them! Some concepts are ideas
and underlie problems even when they aren’t actually/obviously used in an algebraic way
in the solution! Remember, anything that you needed to know to solve the problems should
(in the end) be in this list along with a list of the problems where it is needed.
Problem 2.
c) Two equal positive charges +q located at y = −a and y = +a, and a third charge of
−2q at the origin. Note that in this arrangement, the net charge is zero (so we expect
no monopolar field far away). The two visible dipoles also cancel, so we expect no
dipolar field far away. What might we call the first surviving term in the distant field?
(Note that there are four monopoles in this distribution.)
Problem 3.
c) Find the magnitude and sign of a specific charge q0 that, when placed at the ori-
gin (0, 0) exactly results in a net force of zero on each of the three charges. What
will happen if any of the charges are displaced slightly from equilibrium in different
directions (that is, is the equilibrium stable or unstable)?
This (stable and unstable, in different directions) equilibrium is called a saddle point. We’ll
see why in a couple of chapters.
Problem 4.
Problem 5.
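[Figure: an electron with speed v0 enters deflection plates of length l, then travels a distance L to the phosphorescent screen; x points to the right and z out of the page.]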
From the first commercial production in the mid-1930s until the year 2000, “television”
and other electronic video displays were predominantly cathode ray tubes (CRTs). They
were subsequently superseded by the various high resolution flat panel displays, and
CRT-based TVs ceased production in the US and Canada by 2010 (a good thing, since the
screens contained lead, a toxic heavy metal, to prevent X-ray damage to the skin and eyes
of viewers). They are, however, good examples of physics-based engineering!
In a (somewhat oversimplified) “electrostatic” CRT design, an electron of mass m and
charge −e (boiled off of a negatively charged heated “cathode”) emerges from a collimating
hole after a “fall” across an accelerating field to move directly to the right with speed v0 along
the axis of a cathode ray tube. Assume that there is a uniform electric field $\vec{E} = -E_0 \hat{y}$ in
the region between the vertical deflection plates (of length l) and that everywhere else,
$\vec{E} = 0$. A nearly flat phosphorescent screen (that glows where the electron beam strikes)
is a distance L from the end of the plates.
Ignoring the effect of the gravitational force on the electron, as it is irrelevant for electrons
travelling at such high speed, find ∆y, the deflection from the center point where an
undeflected electron beam would hit the screen. Hint: You might want to break the trajectory
problem up into two parts, across l and then across L.
Problem 6.
$$F_x = p_x \frac{dE_x}{dx} = -\frac{dU}{dx}$$
evaluated “at” x.
Problem 7.
A dipole $\vec{p} = q\,\Delta\vec{r}$ points in the radial $\hat{r}$ direction a distance r (where r ≫ ∆r) away
from a charge +Q at the origin as shown.
[Figure: the charge +Q at the origin of the x-y axes and the dipole (−q, +q, separation ∆r) a distance r away.]
a) Show that the force exerted on the dipole by the point charge is attractive and has the
approximate magnitude:
$$F_r \approx \frac{2kQp_r}{r^3}$$
b) Now assume that it is the dipole that is centered at the origin and that the point charge Q
is located a distance r away along the line of the dipole. Use Newton’s third law, your result
for part a), and the definition of the electric field to show that the field at a distance r along
this line has the approximate magnitude:
$$E_r \approx \pm\frac{2kp_r}{r^3}$$
and is parallel to the dipole moment of the dipole on either side. You will probably need to
use a Binomial/Taylor expansion to deal with the “r ≫ ∆r” condition and arrive at the first
surviving term for the approximate result.
Problem 8.
Note that numbers are given in this problem so you can confront just what a reasonable
“size” is for macroscopic isolated electric charges in the laboratory. Note well that it is
much, much smaller than a Coulomb!
Problem 9.
[Figure: side view and top view of the pivoted dumbbell carrying charges +q and −q, at angle θ to the field E.]
Suppose you have a “dumbbell” consisting of two identical (pointlike) masses m at-
tached to the ends of a thin (massless) rod of length ℓ that is suspended by a string and
pivoted at its center so that it can rotate freely in the horizontal plane. The masses carry
a charge of +q and −q, and the system is located in a uniform horizontal electric field $\vec{E}$
parallel to the plane of rotation.
Show that for small values of the angle θ between the direction of the dipole and the
electric field, the system displays simple harmonic motion, and obtain an expression for
the period of that motion. You may want to review simple harmonic motion and torsional
Problem 10.
$$\omega = (2g/y_{eq})^{1/2}$$
[Figure: the bead (−q, m) on the rod, its displacement x, the distance L, and the charge +Q.]
A small bead of mass m and carrying a negative charge −q is constrained to move along a
long, thin, frictionless rod. A distance L from the center of this rod is a positive charge Q.
Show that if the bead is displaced a distance x from the center (where x ≪ L) and released,
it will exhibit simple harmonic motion. Obtain an expression for the period of this motion in
terms of the parameters L, Q, q, and m.
You will need to use expansions to solve this problem.
Week 2: Continuous Charge and
Gauss’s Law
• Continuous Charge
Charge distributions can often be continuous. We therefore define the following
charge densities:
$$\rho = \frac{dq}{dV}$$
$$\sigma = \frac{dq}{dA}$$
$$\lambda = \frac{dq}{dL}$$
for the charge per unit volume, per unit area, and per unit length respectively.
• Superposition Principle
To find the electrostatic field produced by a continuous charge density distribution,
we use the superposition principle in integral form:
$$\vec{E}(\vec{r}) = k \int \frac{\rho(\vec{r}\,')\,(\vec{r} - \vec{r}\,')\,d^3r'}{|\vec{r} - \vec{r}\,'|^3}$$
$$\vec{E}(\vec{r}) = k \int \frac{\sigma(\vec{r}\,')\,(\vec{r} - \vec{r}\,')\,d^2r'}{|\vec{r} - \vec{r}\,'|^3}$$
$$\vec{E}(\vec{r}) = k \int \frac{\lambda(\vec{r}\,')\,(\vec{r} - \vec{r}\,')\,dr'}{|\vec{r} - \vec{r}\,'|^3}$$
where in all cases the integral is over the entire charge distribution in question. Note
that $dA' = d^2r'$ and $dL' = dr'$ are the “area element” and “length element” one uses
in an infinitesimal chunk of the distribution in the last two expressions.
$$\oint_{S/V} \vec{E} \cdot \hat{n}\,dA = 4\pi k \int_V \rho\,dV = \frac{Q_{\text{in } S}}{\epsilon_0}$$
or in words, the flux of the electric field through a closed surface S equals the total
charge inside S divided by $\epsilon_0$, the permittivity of free space.
Gauss’s law can be used to easily evaluate the electric field for charge density distri-
butions that have the symmetry of a coordinate system, but its real importance is that
it is one of Maxwell’s Equations, the fundamental laws of nature that govern charge
and the electromagnetic field.
$$\vec{E}_{\parallel} = 0$$
• 5 Since the field at the surface of a conductor in ESE can at best be $\vec{E}_\perp$ only outside
and zero inside, if we consider an infinitesimally thin Gaussian pillbox with inner
surface in the conductor and outer surface just outside, we can easily show that:
$$E_\perp = 4\pi k_e \sigma = \frac{\sigma}{\epsilon_0}$$
The field at the surface is directly proportional to the surface charge density!
In natural matter, charges are very, very small compared to the length scales we can
directly perceive. An atom is of order 1 Å ($10^{-10}$ meters) in size, whereas a nucleus is of
order 1 fermi ($10^{-15}$ meters) in size. An electron is a pointlike particle with no physical extent
at all. In a tiny piece of solid matter – one only $10^{-6}$ meters cubed, say – there are around
$(10^4)^3 = 10^{12}$ atoms, and each atom is made up of 2 to 200 electric charges in its electron
cloud and nucleus, and this is still only a chunk one micron in size!
Clearly, if we want to evaluate the electric field produced by a macroscopic piece of
matter, we’re going to have to do something other than just sum over the $\vec{E}_i$ fields produced
by all of these charges. Instead we average over the amount of charge inside all of the tiny
micron-scale blocks that might make up a large object. For each block there is a certain
net charge ∆Q, in the block of size (volume) ∆V . We can use this to define the average
charge density of the object:
$$\rho = \frac{\Delta Q}{\Delta V} \tag{2.1}$$
Now we can sum over a lot fewer objects. There aren’t as many blocks a micron in
size as there were charges, but there are still way, way too many blocks in an object even
the size of a centimeter – $10^{12}$ of them, in fact – too many for us to actually sum up with
a calculator. Generally, however, ρ varies only a little from block to block. Also, on a
centimeter-plus scale, those micron sized blocks are infinitesimal, small enough to treat as
if they are differential in size. We can then consider using calculus to do our sums. Here’s
how it works:
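[Figure: a blob of charge divided into small chunks, showing the position $\vec{r}_i$ of chunk i and the vector $\vec{r} - \vec{r}_i$ to the point P.]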
In the amoebic blob shaped object above, we’ve chopped the whole volume up into little
chunks ∆V in size (highly exaggerated in the picture so you can see them). We’ve tallied
up the charge in each block ∆Q, and labelled (in our minds) each block with an index i at
position $\vec{r}_i$. We can then compute the field using the superposition principle at the point P
(position $\vec{r}$) as:
$$\vec{E}_{\text{tot}}(\vec{r}) = \sum_i \frac{k\,\Delta Q_i}{|\vec{r} - \vec{r}_i|^2}\,\widehat{(\vec{r} - \vec{r}_i)} \tag{2.2}$$
As noted, there are too many chunks in the blob for us to sum over. So we pretend that
the charge is continuously distributed according to:
$$\rho = \lim_{\Delta V \to 0} \frac{\Delta Q}{\Delta V} = \frac{dQ}{dV} \tag{2.3}$$
and turn the summation into an integral (remember both Σ and ∫ stand for S(um), they are
both summation symbols, the latter the one we use for continuous things):
$$\vec{E}_{\text{tot}}(\vec{r}) = \sum_i \frac{k\,\Delta Q_i}{|\vec{r} - \vec{r}_i|^2}\,\widehat{(\vec{r} - \vec{r}_i)} = \int_V \frac{k\rho(\vec{r}\,')\,dV'}{|\vec{r} - \vec{r}\,'|^2}\,\widehat{(\vec{r} - \vec{r}\,')} \tag{2.4}$$
where we’ve used dQ = ρdV (in the primed coordinates we use to replace the $\vec{r}_i$’s). This is
just the field of every little differential sized chunk that makes up the entire object, summed
over all the chunks!
This is a lot to remember, so we’ll create a little mnemonic to help you. Just as we
found the electric field last week by using the field of a single point charge in its simplest
form and then putting it into suitable coordinates, we’ll find it this week the exact same way,
but the point charge in question will be dq and not q. That is:
$$\vec{E} = \frac{kq}{r^2}\,\hat{r} \quad\Longleftrightarrow\quad d\vec{E} = \frac{k\,dq}{r^2}\,\hat{r} \tag{2.5}$$
To use the latter, we just have to find dq for the particular kind of distribution, and be able
to do the final integrals.
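Before doing any integrals, it can help to see the chunk-by-chunk sum of equation 2.2 carried out directly. The Python sketch below is an illustration only, not a recommended way to solve the problems: the helper names field_from_chunks and rod_chunks, the choice of a short uniformly charged rod, and all numerical values are assumptions made for this example.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant k_e in N m^2 / C^2


def field_from_chunks(chunks, point):
    """Sum k dq (r - r_i) / |r - r_i|^3 over chunks given as (dq, (x, y, z))."""
    ex = ey = ez = 0.0
    for dq, (cx, cy, cz) in chunks:
        dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        scale = K_E * dq / dist**3
        ex += scale * dx
        ey += scale * dy
        ez += scale * dz
    return ex, ey, ez


def rod_chunks(total_q, length, n):
    """Split a uniformly charged rod on the x-axis, centered at the origin, into n chunks."""
    dq = total_q / n
    dx = length / n
    return [(dq, (-length / 2 + (i + 0.5) * dx, 0.0, 0.0)) for i in range(n)]


if __name__ == "__main__":
    chunks = rod_chunks(total_q=1.0e-9, length=0.02, n=200)
    # Far from the rod the summed field should be close to that of a point charge.
    print(field_from_chunks(chunks, (0.0, 10.0, 0.0)))
    print(K_E * 1.0e-9 / 10.0**2)
```

Each term in the loop is just the point charge field of equation 2.5 with q replaced by the chunk charge dq; the integral is the limit of this sum as the chunks shrink.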
We used charge per unit volume in this discussion, but we will find that charge often
distributes itself on surfaces, and we’ll often need to find the field produced by lines as well.
We therefore define all of the charge densities we might need to handle these cases as:
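$$\rho = \frac{dq}{dV} \quad\Longleftrightarrow\quad dq = \rho\,dV \tag{2.6}$$
$$\sigma = \frac{dq}{dA} \quad\Longleftrightarrow\quad dq = \sigma\,dA \tag{2.7}$$
$$\lambda = \frac{dq}{d\ell} \quad\Longleftrightarrow\quad dq = \lambda\,d\ell \tag{2.8}$$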
the charge per unit volume, per unit area, and per unit length respectively. In each equation
I put the way we will need to use it – to find dq – after the defining expression.
There are thus three steps associated with solving an actual problem:
a) Draw a picture, add a suitable coordinate system, identify the right differential chunk
(one you can integrate over) and draw in the vectors needed to express $d\vec{E}$ as given
above.
b) Put down an expression for $d\vec{E}$ (or rather, usually, $|d\vec{E}|$) in terms of the coordinates,
and find its vector components in terms of those same coordinates, using symmetry
to eliminate unnecessary work.
The first two are pretty simple, and are worth most of the credit. The last will be easy
enough if you’ve done the homework and are working hard to relearn all the calculus you
need to do the integrals required in this course48 ! If your grader has a generous heart, at
the beginning of this course you won’t be too heavily penalized if you do the first two steps
correctly, but this is the second semester in a calculus-based physics course and by now
you really should have mastered all the math but the “new” calculus specific to E&M – line,
surface, and volume integrals, especially those in cylindrical and spherical coordinates –
that were only touched on in the Mechanics course.
Let’s try some examples.
Figure 2.2: A charged ring with charge per unit length λ.
In figure 2.2 above we see a circular ring of charge of radius R and uniform charge per
unit length:
$$\lambda = \frac{Q}{L} = \frac{Q}{2\pi R} \tag{2.9}$$
Our job is to find the electric field at an arbitrary point on the z-axis, a point with sufficient
symmetry to make the evaluation fairly straightforward49 .
48
http://www.phy.duke.edu/ rgb/Class/one-sheet-math-review.php The “One Sheet Math Review” pages
here may help! These encapsulate just about all of the math you will need for this course – algebra, cal-
culus, vectors – with everything you need to know in each topic on a single sheet. Print them out and put them
in with your notes and study them from time to time as needed on problems!
49
We could use the same general approach to find the field at an arbitrary point in space, but the calculus
and geometry required to get an actual answer would become very difficult in cartesian coordinates – so
difficult that in real life one would be very likely to concede finding an analytic solution as too difficult and resort
to the use of a computer instead, or at least try it in a coordinate system more adapted to the symmetry of the
ring.
We begin by finding a small chunk of charge on the ring expressed in some coordinate
we can integrate over. In this case the best possible coordinate system to use is (fairly
obviously) cylindrical coordinates, so that we can locate a small chunk on the ring at an
angle θ swung around in the counterclockwise direction from the positive x-axis. The
angular width of the chunk is then dθ, and the length of the arc subtended is ds = R dθ.
From the previous section we recall that we need to find the charge of this little chunk
of arc, repeating the litany: “the charge in the chunk is the charge per unit length, times
the length of the chunk”. That is:
$$dq = \lambda\,ds = \lambda R\,d\theta = Q\,\frac{d\theta}{2\pi} \tag{2.10}$$
where the last form is clearly the fraction of the total charge that lies inside the tiny
subtended arc. The magnitude of the field produced by this little chunk of charge at the point
z on the axis is:
$$|d\vec{E}| = \frac{k_e\,dq}{r^2} = \frac{k_e \lambda R\,d\theta}{z^2 + R^2} \tag{2.11}$$
where we have used the Pythagorean theorem to evaluate $r = \sqrt{z^2 + R^2}$ as drawn in the
figure.
This vector has three components. All we need to worry about is the z-component, from
the symmetry of the ring. The field at a point on the axis cannot change as we rotate the
coordinate system around the z-axis because the ring of charge looks the same as we
do. Therefore it cannot have x or y components as these would change as we rotated the
coordinate system. However, for the sake of completeness (and to give you something to
figure out on the picture) I’ll put down the x and y components as well:
$$dE_x = -|d\vec{E}| \sin\phi \cos\theta \tag{2.12}$$
$$dE_y = -|d\vec{E}| \sin\phi \sin\theta \tag{2.13}$$
$$dE_z = |d\vec{E}| \cos\phi \tag{2.14}$$
In these equations, φ is the angle between $d\vec{E}$ and the z-axis, and we must evaluate
sin φ and cos φ using the right triangle Rzr:
$$\sin\phi = \frac{R}{r} = \frac{R}{(z^2 + R^2)^{1/2}} \tag{2.15}$$
$$\cos\phi = \frac{z}{r} = \frac{z}{(z^2 + R^2)^{1/2}} \tag{2.16}$$
so that:
$$E_z = \int_0^{2\pi} \frac{k_e \lambda z R\,d\theta}{(z^2 + R^2)^{3/2}} = \frac{k_e \lambda (2\pi R)\,z}{(z^2 + R^2)^{3/2}} = \frac{k_e Q z}{(z^2 + R^2)^{3/2}} \tag{2.17}$$
Physicists are as lazy as they can be when it is possible to be lazy without compro-
mising correctness! We will often invoke symmetry – as I did above to conclude that
Ex = Ey = 0 – to avoid doing a tedious computation whose outcome we can already see
and concentrate our effort on the one integral we actually cannot avoid (at least not without
more practice than you’ve had so far in avoiding integrals:-).
You are encouraged to follow this practice yourselves, whenever you can “see” where
symmetry can help out! However, you may well be confused as to exactly what the argu-
ment is that leads us to this hands-free conclusion, so just this once let’s work through it in
words.
The problem itself has cylindrical symmetry. In particular, if we mentally rotate the ring
around the z-axis (or rotate the coordinate frame itself around its z-axis without moving the
ring), nothing in the problem changes! If we had a nonzero, say, x component in the electric field,
then as we rotate the ring or coordinate, the direction of this component would have to
change along with the ring, but since the problem itself doesn’t change the solution cannot
change either and the only way this is possible is if Ex = 0 (and ditto for Ey ). Ez , on the
other hand, would not change with this rotation even if it isn’t zero, so it is allowed!
There are several other arguments that work just as well. For every chunk ds on the
ring, there is an identical chunk that is its mirror image across the z axis, and the vector
components of these two chunks in the plane of the ring cancel just like the Ex field of two
point charges at x = ±R cancel (while the vertical components add).
Now, this all sounds (I hope) just great to you, and illustrates verbally some of the
reasoning that flashes through a physicist’s mind before digging in to compute Ez only,
but you might well still be suspicious. Words are all well and good, but math is the real
language of physics – does this intuitive conclusion actually work? So again, just this once,
let’s do the calculus explicitly:
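$$E_x = -\int_0^{2\pi} \frac{k_e \lambda R^2 \cos\theta\,d\theta}{(z^2 + R^2)^{3/2}} = -\frac{k_e \lambda R^2}{(z^2 + R^2)^{3/2}} \cdot \sin\theta\,\Big|_0^{2\pi} = 0 \tag{2.18}$$
(and ditto, of course, for Ey )! The reason Ex = Ey = 0 mathematically is because:
$$\int_0^{2\pi} [\sin\theta, \cos\theta]\,d\theta = 0$$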
but why bother setting this up and doing it when we can just see that it must be so?
Invoking symmetry when appropriate is thus a perfectly legitimate step in solving many
of the physics problems you will encounter in this textbook and course. It will take a bit of
practice to see just when that is, and you won’t lose points if you get answers the hard way,
but it will take time, time you might wish to spend some other way during an exam, so do
give it a try.
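If you would like to see both the symmetry argument and equation 2.17 confirmed without doing any calculus, a direct sum over small chunks of the ring works well. The Python sketch below is an illustration under stated assumptions: the ring is split into n equal chunks treated as point charges, and the function names and numerical values are hypothetical choices for this example.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant k_e in N m^2 / C^2


def ring_field_numeric(total_q, radius, z, n=1000):
    """Sum the point-charge fields of n equal chunks of a ring at the axis point (0, 0, z)."""
    dq = total_q / n
    ex = ey = ez = 0.0
    for i in range(n):
        theta = 2.0 * math.pi * (i + 0.5) / n
        # Vector from the chunk at (R cos theta, R sin theta, 0) to the field point.
        dx, dy, dz = -radius * math.cos(theta), -radius * math.sin(theta), z
        dist = math.sqrt(dx * dx + dy * dy + dz * dz)
        scale = K_E * dq / dist**3
        ex += scale * dx
        ey += scale * dy
        ez += scale * dz
    return ex, ey, ez


def ring_field_axis(total_q, radius, z):
    """Closed form on the axis: k_e Q z / (z^2 + R^2)^(3/2)."""
    return K_E * total_q * z / (z * z + radius * radius) ** 1.5


if __name__ == "__main__":
    q_ring, radius, z = 1.0e-9, 0.1, 0.2   # illustrative values only
    print(ring_field_numeric(q_ring, radius, z))
    print(ring_field_axis(q_ring, radius, z))
```

The x and y sums should come out negligibly small compared to the z sum, which is the numerical face of the statement that the integrals of sin θ and cos θ over a full turn vanish.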
In figure 2.3 we see a long straight line of charge. As before, we have to choose a coordi-
nate system in terms of which to do the integral to add up the field components produced
by all the little chunks of charge that make up the line.
At first glance, it seems as though cartesian components are a natural choice for the
problem, so we start by using them. We want to find the field at an arbitrary point P in
space, so we pick one, make the x-axis lie along the line of charge, and draw a y-axis
through the x-axis such that P is the (shortest) perpendicular distance y from the x-axis.
In this coordinate frame, the left-hand end of the rod is x1 and the right-hand end is x2. Either
x1 or x2 may be positive or negative depending on where P is relative to the line, and by
making P and the x-y coordinate frame all lie in one plane, we have made z irrelevant (that
is, we expect Ez = 0 from good old “symmetry”).
Next, we pick a chunk of charge of length dx, a distance x out from the origin directly
under the point P . The charge of our chunk is again given by our ‘magic’ incantation:
Figure 2.3: A straight line of charge with uniform charge per unit length λ.
“The charge of the chunk is the charge per unit length of the chunk times the length of the
chunk”, or:
dq = λ dx (2.19)
Finally, the magnitude of the field is given by:
$$|d\vec{E}| = dE = \frac{k_e\,dq}{r^2} = \frac{k_e \lambda\,dx}{(x^2 + y^2)} \tag{2.20}$$
We need in this case to evaluate both the (red) dEx and dEy , components in figure 2.3
as Ex and Ey will in general both be nonzero (unless P happens to be in the middle of the
line, in which case we expect Ex = 0 – from symmetry). From the triangles in the figure it
is pretty obvious that:
$$dE_x = -|d\vec{E}| \sin\theta \tag{2.21}$$
$$dE_y = |d\vec{E}| \cos\theta \tag{2.22}$$
where we will assume that the θ we have drawn is positive when swung out to the right in
the positive x direction, and negative when it swings out in the direction of negative x.
Noting (from the xyr right triangle) that cos θ = y/r we get:
$$dE_y = \frac{k_e \lambda\,dx}{r^2} \cos\theta = \frac{k_e \lambda\,dx}{(x^2 + y^2)} \cos\theta = k_e \lambda y\,\frac{dx}{(x^2 + y^2)^{3/2}} \tag{2.23}$$
(for example). This, unfortunately, doesn’t look terribly easy to integrate!
In fact, this is one of the most difficult integrals we have to do in this course, not be-
cause it is particularly difficult but because it is one of the few times we have to integrate
something other than $x^n\,dx$, a simple trig function, or an exponential function with fairly
obvious u-substitutions. The problem is that we have too many mutually dependent variables
– as we vary x, both r and θ vary as well and vice versa!
It turns out that this problem is easier to do if we convert it into a trigonometric form
using nothing but y (which is fixed) and θ as our one independent variable. Here’s how it
If we substitute equation 2.25 into the expressions for dEx and dEy above we get:
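$$dE_y = \frac{k_e \lambda\,dx}{r^2} \cos\theta = \frac{k_e \lambda}{r^2}\,\frac{y\,d\theta}{\cos^2\theta}\,\cos\theta = \frac{k_e \lambda\,y}{y^2} \cos\theta\,d\theta = \frac{k_e \lambda}{y} \cos\theta\,d\theta \tag{2.27}$$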
which looks easy to integrate!
Figure 2.4: Definition of the limits of integration θ1 and θ2 in terms of our coordinate frame.
The limits of integration are the angles to the dotted lines that point at the ends of the
line, which we will call θ1 on the left, θ2 on the right as indicated in figure 2.4. Using these
limits:
$$E_y = \frac{k_e \lambda}{y} \int_{\theta_1}^{\theta_2} \cos\theta\,d\theta = \frac{k_e \lambda}{y} (\sin\theta_2 - \sin\theta_1) \tag{2.28}$$
where we should carefully note that the specific θ1 in the figure above is a negative angle
as drawn above (just as x1 happens to be negative) and would go into this formula as a
negative number in radians.
If we evaluate Ex everything is the same except that there is an overall minus sign and
we integrate over sin θ dθ instead, to get:
$$E_x = -\frac{k_e \lambda}{y} \int_{\theta_1}^{\theta_2} \sin\theta\,d\theta = \frac{k_e \lambda}{y} (\cos\theta_2 - \cos\theta_1) \tag{2.29}$$
An interesting consequence of this result is that we can easily evaluate the field a
distance y away from an infinite line of charge (that still has a uniform charge per unit
length λ). In that case, θ1 = −π/2 and θ2 = π/2. We get:
$$E_x(\infty) = 0 \tag{2.30}$$
$$E_y(\infty) = \frac{2 k_e \lambda}{y} \tag{2.31}$$
where we should recall that every point P has an x-coordinate in the middle of an infinite
line of charge so that Ex = 0 from symmetry in this case! Isn’t symmetry useful?
Remember this result for an infinite line of charge for later, where we will obtain it again
using Gauss’s Law and hence use it to check that Gauss’s Law works as expected.
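The results in equations 2.28 through 2.31 can be checked against a brute-force sum over small chunks dx of the line. The Python sketch below is an illustration under stated assumptions: the angles are computed from tan θ = x/y as in the figure, the cartesian integrands are summed with a simple midpoint rule, and the function names and numerical values are hypothetical choices for this example.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant k_e in N m^2 / C^2


def line_field_formula(lam, y, x1, x2):
    """(Ex, Ey) from equations 2.29 and 2.28, with tan(theta) = x / y and y > 0."""
    theta1, theta2 = math.atan2(x1, y), math.atan2(x2, y)
    ex = K_E * lam / y * (math.cos(theta2) - math.cos(theta1))
    ey = K_E * lam / y * (math.sin(theta2) - math.sin(theta1))
    return ex, ey


def line_field_numeric(lam, y, x1, x2, n=20000):
    """Midpoint-rule sum of dEx and dEy over chunks dq = lam * dx of the line."""
    dx = (x2 - x1) / n
    ex = ey = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * dx
        r3 = (x * x + y * y) ** 1.5
        ex += -K_E * lam * x * dx / r3   # chunk at +x pushes the field toward -x
        ey += K_E * lam * y * dx / r3
    return ex, ey


if __name__ == "__main__":
    lam, y = 1.0e-9, 0.5                       # illustrative values only
    print(line_field_formula(lam, y, -1.0, 2.0))
    print(line_field_numeric(lam, y, -1.0, 2.0))
    # A very long line should approach Ex = 0 and Ey = 2 k_e lam / y.
    print(line_field_formula(lam, y, -1.0e6, 1.0e6), 2.0 * K_E * lam / y)
```

Note that x1 = −1.0 corresponds to a negative θ1, in keeping with the sign convention described above.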
[Figure 2.5: a disk of radius R with uniform surface charge density σ; the area element dA = r dr dθ at (r, θ); and the field point P on the z-axis, a distance $(z^2 + r^2)^{1/2}$ from the element.]
In figure 2.5 above we see a disk of charge with a uniform charge density:
$$\sigma = \frac{Q}{\pi R^2} \tag{2.32}$$
As before with a ring, we can only easily evaluate the field on the z-axis where we know
from symmetry that Ex = Ey = 0. Also as before, we will proceed by finding the field of a
tiny chunk of charge in suitable coordinates and sum it up using integration(s).
In order for us to be able to sum over all of the chunks of charge that make up the disk,
we have to use coordinates in which integrating over the disk’s area is easy. It will not be
easy at all in cartesian coordinates (try it, if you enjoy suffering)! Instead, we use plane
polar coordinates (r, θ) for the disk itself, but keep the cartesian coordinate z to describe
the point P .
We have just invented a 3D coordinate frame called cylindrical coordinates (r, θ, z).
They are in all respects equivalent to the cartesian coordinates (x, y, z), and one can freely
go from a description in one frame to the other if it turns out the solution is easier there.
Cylindrical coordinates are often quite useful when a problem has cylindrical symmetry –
does not change when rotating around the z (polar) axis – or when a domain we must
integrate over has clean boundaries in this coordinate frame.
We actually implicitly used them to do the ring of charge example above, but in that
case r = R and our integral was basically both one-dimensional and trivial, so it wasn’t
worth pointing out. Later in the chapter (after working out Gauss’s Law) we will spend some
time discussing the three coordinate frames that are the most useful in electrodynamics
problems: cartesian, cylindrical, and spherical polar coordinates.
At the moment we can get by without the full discussion if we note that the easiest way
to integrate over the disk of charge in plane polar coordinates locates a point at (r, θ) inside
the disk. There we swing out a small chunk of arc length r dθ as before for the ring, and
then (mentally) push the tiny arc out in the r-direction by a distance dr to sweep out a tiny
“rectangular” differential chunk of area dA. This area is then:
dA = r dθ dr. (2.33)
As an exercise, let’s use this differential area element to find the area of the disk of
radius R itself. To do this, all we have to do is integrate dA between appropriate limits that
cover the disk exactly once. The advantage of using plane polar (or cylindrical) coordinates
is that in these coordinates, the two-dimensional integral separates into two independent
one dimensional integrals which are easy to do using our standard set of integrals (from,
say, the one-sheet reviews). Here’s the algebra:
$$A = \int dA = \int_0^R \int_0^{2\pi} r\, dr\, d\theta = \int_0^R r\, dr \int_0^{2\pi} d\theta = \frac{R^2}{2}(2\pi) = \pi R^2. \tag{2.34}$$
I hope you already know that the area of a disk is πR2 , but you may have wondered how
we know it! Now you can see – we’ve explicitly evaluated the area of a disk using calculus!
This is an important exercise, as it shows that the integral can be grouped so that it
separates. That is, the r integration and θ integration are independent. We will only do
integrals over more than one coordinate in this course when they separate (in a suitable
coordinate frame!), so that a student can easily work enough non-trivial problems to master
physics at this level if they have mastered (a rather small subset of) one-dimensional inte-
gration methods. These separable problems are trivially multivariate, so to speak, and do
not require that a student have taken a course in multivariate calculus to fully understand.
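The separation in equation 2.34 can also be checked numerically. The sketch below is only an illustration (the helper names `midpoint` and `disk_area` are invented for this example): it evaluates the r and θ integrals independently with a simple midpoint rule and multiplies the results.

```python
import math

def midpoint(f, a, b, n=1000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def disk_area(R):
    """Area of a disk as (integral of r dr) times (integral of d theta)."""
    radial = midpoint(lambda r: r, 0.0, R)                  # one-dimensional r integral
    angular = midpoint(lambda th: 1.0, 0.0, 2.0 * math.pi)  # one-dimensional theta integral
    return radial * angular

if __name__ == "__main__":
    R = 2.0
    print("separated integrals:", disk_area(R))
    print("pi R^2:             ", math.pi * R ** 2)
```

Because the integrand r is a product of a function of r alone and a function of θ alone (here just 1), and the limits are constants, the double integral is the product of two ordinary integrals.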
At any rate, we are now ready to proceed to solve our actual problem. We can easily
find dq, the charge of a tiny chunk of the disk at the specific coordinates in the plane r, θ
from our mantra: “The charge of the chunk is the charge per unit area times the area of
the chunk”, or:
$$dq = \sigma\, dA = \sigma\, r\, dr\, d\theta = \frac{Q}{\pi R^2}\, r\, dr\, d\theta \tag{2.35}$$
As before, we find
$$|d\vec{E}| = dE = \frac{k_e\, dq}{(r^2 + z^2)} = \frac{k_e \sigma\, r\, dr\, d\theta}{(r^2 + z^2)} \tag{2.36}$$
and
$$dE_z = dE \cos\phi = \frac{k_e \sigma z\, r\, dr\, d\theta}{(r^2 + z^2)^{3/2}} \tag{2.37}$$
(where now $\cos\phi = z/(r^2 + z^2)^{1/2}$ from the 0rz right triangle in figure 2.5 above).
Finally:
$$E_z = \int_{disk} dE_z = k_e \sigma z \int_0^R \int_0^{2\pi} \frac{r\, dr\, d\theta}{(r^2 + z^2)^{3/2}} = k_e \sigma z \int_0^R \frac{r\, dr}{(r^2 + z^2)^{3/2}} \int_0^{2\pi} d\theta \tag{2.38}$$
Note that this integral exactly covers the disk! It runs from r = 0 to r = R, and for each
r it runs from θ = 0 to θ = 2π, catching the entire ring with radius r (and differential
thickness dr). These integrals are independent since r is independent of θ and the limits
of integration are fixed and do not vary with either coordinate.
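As a hedged numerical check of equation 2.38, the sketch below does the r integral with a midpoint rule and multiplies by the trivial θ integral, 2π. It compares the result with the standard closed form for the on-axis field of a uniform disk, E_z = 2π k_e σ (1 − z/√(z² + R²)) for z > 0, and with the point-charge field k_e Q/z² far from the disk. The function names and the sample values of σ, R, and z are assumptions made for illustration only.

```python
import math

KE = 8.99e9  # Coulomb constant in N m^2 / C^2

def disk_field_numeric(sigma, R, z, n=20000):
    """E_z on the axis of a uniform disk from equation 2.38 (midpoint rule in r)."""
    h = R / n
    radial = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        radial += r / (r * r + z * z) ** 1.5 * h
    return KE * sigma * z * radial * 2.0 * math.pi  # the theta integral is just 2 pi

def disk_field_closed_form(sigma, R, z):
    """Standard closed-form result for z > 0."""
    return 2.0 * math.pi * KE * sigma * (1.0 - z / math.sqrt(z * z + R * R))

if __name__ == "__main__":
    sigma, R = 1.0e-6, 0.10          # C/m^2 and m (sample values)
    Q = sigma * math.pi * R ** 2     # total charge on the disk
    for z in (0.05, 0.10, 10.0):
        print(z, disk_field_numeric(sigma, R, z),
              disk_field_closed_form(sigma, R, z), KE * Q / z ** 2)
```

At z = 10 m, one hundred disk radii away, the last two columns nearly agree: the disk has a net charge, so its far field is that of a point charge.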
As we saw in examples done in the previous chapter, when we are far away from a
charge distribution the details of that distribution are averaged away and we are left with a
field whose leading order behavior is determined by what is called its multipolar moment
– if the distribution has a net charge it is monopolar; if it has no net charge but has a
+/− asymmetry it is dipolar; if it has no net charge but two balanced dipolar charges it is
quadrupolar; and so on. This means that we can often guess or very simply calculate what
the field of a charge distribution will look like (to leading order) far away from the distribution;
all we need to know (or calculate) are the total charge and/or the total separated charge
and distance and direction of separation.
In future electrodynamics courses, you will learn how to express the multipolar mo-
ments of the electric and magnetic fields as specific integrals of the charge and current
distributions multiplied with some very special functions that make at least formulating the
electromagnetic field produced by those distributions simple enough – if you can do the
integrals. Fortunately, with modern computers we can basically always do the integrals
numerically, even if it would be better to give yourself a root canal with a rusty Black and
Decker drill than try to do them analytically...
At this point we are almost finished with examples of how to use direct integration over
a charge distribution to find the vector electric field. You should now be able to
tackle all of the problems on the homework and/or the in-class problem sets (or, for that
matter, in other textbooks at this introductory level). We will do one more (optional) highly
advanced example below, after we cover Gauss’s Law, where it will serve both to directly
prove the ‘shell theorem’ covered with Newton’s Law of Gravitation in the first semester
of this course and to help validate Gauss’s Law in application to a nontrivial spherically
symmetric charge distribution.
Gauss’s Law for the electrostatic field is, as we shall see, one of Maxwell’s Equations.50
Maxwell’s equations are, in turn, the equations of motion for the unified dynamic electromagnetic
field, considered to be ‘laws of nature’, and in my opinion at least, are one of the
most beautiful things (mathematically and conceptually speaking) in all of physics. It is
therefore of critical importance that you work hard developing a conceptual understanding
of this law that permits you to visualize the relationship between the mathematics of its
expression and the geometry of the field in addition to “just” learning to solve problems
with it.
For that reason we will begin this chapter with a derivation of this law from the field
equation of the point charge (which in turn is basically Coulomb’s Law in disguise) and
the superposition principle. Derivations, of course, work both ways, and physicists today
generally consider Gauss’s Law the fundamental law of nature, with the field of a point
charge and Coulomb’s law as consequences to be derived from it instead of the
other way around. You will not be responsible for being able to “do” the derivation yourself
in a problem or on an exam, but it is strongly advised that you work through it a couple
50
Wikipedia: http://www.wikipedia.org/wiki/Maxwell’s Equations.
of times anyway and get to where you intuitively understand the relationship between flux
integrals and conservation, as we’ll use this idea in a critical way later when we add the
Maxwell Displacement Current to Ampere’s Law in order to be able to show that light is an
electromagnetic wave!
Figure 2.6: Geometry of the flux integral over a small surface area
We begin our derivation of Gauss’s Law by considering the flux of the electrostatic vector
field through a small rectangular patch of surface ∆S. To compute this, we first must
understand what the flux of an arbitrary vector field $\vec{F}$ through a surface S is.
Mathematically, the flux of a vector field through some surface is defined to be:
$$\phi_f = \int_{\Delta S} \vec{F}\cdot\hat{n}\, dS \tag{2.42}$$
Note that the word flux means flow, and this integral measures the flow of the field
through the surface. Its mathematical purpose is to detect the conservation of flow in the
vector field. Basically it takes the magnitude of the field $\vec{F}$ at all points on the surface,
computes the component of $\vec{F}$ that goes through the surface at right angles (instead of tangent
to the surface, which doesn’t really go “through”), multiplies it times a tiny differential chunk
of the area, and then adds up all the differential chunks thus computed.
Let’s look at this in more detail, specializing to the case of the electric field. Consider
figure 2.6, where we show electric field lines flowing through a small ∆S = ab at right
angles to the field lines (so that a unit vector n̂ normal to the surface is parallel to the
electric field). ∆S is small enough that the continuous field is approximately uniform across
it (we will eventually make it differentially small, of course, so this is no problem).
Since the field is uniform and at right angles to the surface, the flux through just this little
chunk is easy to evaluate. It is just:
$$\Delta\phi_e = |\vec{E}|\,\Delta S = |\vec{E}|\, ab \tag{2.43}$$
That was easy enough! Let’s make things a little more complicated.
Suppose that we consider a rectangular surface ∆S′ = a′b that is tipped with respect
to the first surface at an angle θ, that shares the length b of the first surface, and that has a
length a′ that is long enough that it precisely subtends the same “stream” of the vector field
$\vec{E}$ as shown. Basically, all the field lines that pass through the first surface pass through
the second surface, and again we are assuming that the field is continuous and we can
make the picture as small as we like (differentially small in the limit) so that a conserved $\vec{E}$
doesn’t change its magnitude or direction in between the two surfaces.
Note that a = a′ cos(θ), so that:
$$\Delta S' = a' b = \frac{ab}{\cos(\theta)} \tag{2.44}$$
If we just multiply $|\vec{E}|$ by ∆S′, we see that we’ll get ∆φ′e = ∆φe / cos(θ), right? And we’d
like to get the same thing, as we’d like the flux integral to measure the continuity and
conservation of the electric field across the tiny region between the two surfaces. So we
multiply by cos(θ) on top to compensate and get:
$$\begin{aligned}
\Delta\phi'_e &= |\vec{E}| \cos(\theta)\, a' b \\
&= |\vec{E}| \cos(\theta)\, \frac{ab}{\cos(\theta)} \\
&= |\vec{E}|\, ab \\
&= \Delta\phi_e
\end{aligned} \tag{2.45}$$
$$\lim_{\Delta S \to 0} \Delta\phi_e = \vec{E}\cdot\hat{n}\,\Delta S = \vec{E}\cdot\hat{n}'\,\Delta S' \tag{2.46}$$
which does not vary for any possible tipping of a surface element ∆S originally perpendic-
ular to the field lines as long as the tipped surface ∆S ′ stretches to cover the exact same
field lines as illustrated above.
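The cancellation of the cos(θ) factors can be tried with numbers. In this small sketch (the function name and sample values are made up for illustration), the tipped patch is stretched to a′ = a / cos(θ) so that it catches the same field lines, and the flux is computed as |E| cos(θ) ∆S′.

```python
import math

def flux_through_tipped_patch(E, a, b, theta):
    """Flux E . n' dS' through a patch tipped by theta that spans the same field lines."""
    a_prime = a / math.cos(theta)             # a = a' cos(theta)
    area_prime = a_prime * b                  # tipped area, as in equation 2.44
    return E * math.cos(theta) * area_prime   # E . n' = |E| cos(theta)

if __name__ == "__main__":
    E, a, b = 100.0, 0.02, 0.03               # N/C, m, m (sample values)
    for theta_deg in (0, 30, 60, 85):
        flux = flux_through_tipped_patch(E, a, b, math.radians(theta_deg))
        print(theta_deg, flux)
```

Every angle gives the same flux, |E| a b, up to floating point rounding.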
This is all very specific to the case where the field is uniform and points in a single
direction, but in fact it is easy to see that the result is more general than that.
Suppose that we consider a single point charge, producing the usual, symmetric
electric field:
$$\vec{E} = \frac{k_e q}{r^2}\, \hat{r} \tag{2.47}$$
Some of that field streams out in the narrow solid cone with apex on the charge q pictured
in figure 2.7 – this is basically a cone with a “two-dimensional angle” – called a solid
angle – in the apex. At the radius r, this cone chops off a chunk of a surrounding (closed)
spherical surface with area:
dA = r 2 dΩ (2.48)
Figure 2.7: A spherical surface area dA subtended by the cone with solid angle dΩ with a
charge q at the apex, a distance r away, and its outward directed unit vector normal n̂ = r̂
(in red), and an area at the same distance r tipped so it is still subtended by the same
solid angle, but now has an angle θ between its unit normal n̂′ (in blue) and r̂ (the direction
of the outflowing electric field).
As one can see in the section discussing spherical polar coordinates, one can actually
formulate dΩ in coordinates and integrate it over the entire solid angle around the charge
such that (for fixed r):
$$A = \oint dA = \oint r^2\, d\Omega = r^2 \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta = 4\pi r^2 \tag{2.49}$$
Thus the solid angle of 4π exactly “covers the sphere” a single time, and the area of a sphere of
radius r is 4πr².
If one tips up this surface so that it is still at the distance r from the charge (within
differential scales) and still is subtended by the same conical solid angle so that the same
field lines flow through both, its area still increases to $dA' = \frac{r^2 d\Omega}{\cos\theta}$ (because all of the
lengths on this tipped surface along the tipping angle increase precisely as they do in
figure 2.6 above, while the ones at right angles to this are unchanged). On a differential
scale, then, this is still precisely compensated by taking a dot product with r̂, the direction
of the electric field.
Now let’s think about what this means if we deform a closed spherical surface S
surrounding the charge q into an arbitrary closed surface S′ that still completely contains the
charge as illustrated in figure 2.8. Consider the tipped differential patch dA′ that osculates
(kisses) dA at one end but is tipped up through an angle θ so that it is actually a part of the
blob-shaped arbitrary closed surface S′. From the arguments given above, the flux of the
electric field from the point charge through the two patches is the same:
$$d\phi'_e = \vec{E}\cdot\hat{n}'\, dA' = |\vec{E}| \cos\theta\, \frac{r^2 d\Omega}{\cos\theta} = |\vec{E}|\, r^2 d\Omega = d\phi_e \tag{2.50}$$
Figure 2.8: Point charge inside a closed surface S′ and inside the closed spherical surface
S of radius r that osculates S′ at the patch dA. We have just shown that the electric field
flux dφe through dA is equal to the flux through the tipped dA′.
In the differential limit, then, we can compute the flux through a small chunk of the
surface S′:
$$\begin{aligned}
d\phi_e &= \vec{E}\cdot\hat{n}'\, dA' \\
&= |\vec{E}|\, r^2 d\Omega \\
&= \frac{k_e q}{r^2}\, r^2 d\Omega \\
&= k_e q\, d\Omega
\end{aligned} \tag{2.51}$$
which is independent of the shape of S ′ and involves only the differential solid angle swept
out from the charge as one does the integral. The point is that the r 2 in the differential
area precisely cancels the r 2 in the electric field, while the cos θ in the dot product precisely
compensates for the increased area of the tipped differential patch that still is subtended
by the (same) solid angle dΩ.
This result no longer depends on anything but the solid angle! To compute the
total flux through the closed surface S or S′, then, we just have to integrate the complete
solid angle surrounding the point charge! We already did this above – the result in spherical
polar coordinates was:
$$\oint d\Omega = \int_0^{\pi} \sin(\theta)\, d\theta \int_0^{2\pi} d\phi = 4\pi \ \text{“steradians”} \tag{2.52}$$
(where steradians is the name of the dimensionless solid angle “coordinate” just as radians
is the name of the usual planar angle “coordinate”). Thus we let S itself be any closed
surface (since it doesn’t matter if it is spherical and concentric or neither) to get:
$$\phi_e = \oint_S \vec{E}\cdot\hat{n}\, dS = 4\pi k_e q \tag{2.53}$$
independent of the shape of the closed surface S that we integrate over that encloses
the charge q!
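The shape independence claimed in equation 2.53 can be tested numerically on a surface that is clearly not a sphere. The sketch below is an illustration under stated assumptions: the surface is a cube of side 2 m centered on the origin, the flux integral is approximated with a midpoint rule on each face, and the function name and charge positions are invented for the example.

```python
import math

KE = 8.99e9  # Coulomb constant in N m^2 / C^2

def flux_through_cube(q, charge_pos, half=1.0, n=100):
    """Midpoint-rule flux of a point charge's field through the surface of a
    cube of side 2*half centered on the origin."""
    h = 2.0 * half / n
    total = 0.0
    for axis in range(3):             # coordinate held fixed on this pair of faces
        for sign in (-1.0, 1.0):      # outward normal is sign * (unit vector along axis)
            for i in range(n):
                for j in range(n):
                    p = [0.0, 0.0, 0.0]
                    p[axis] = sign * half
                    p[(axis + 1) % 3] = -half + (i + 0.5) * h
                    p[(axis + 2) % 3] = -half + (j + 0.5) * h
                    d = [p[k] - charge_pos[k] for k in range(3)]
                    r2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2
                    # E . n for this tile: k_e q (d . n) / r^3
                    total += KE * q * sign * d[axis] / r2 ** 1.5 * h * h
    return total

if __name__ == "__main__":
    q = 1.0e-9
    for pos in ((0.0, 0.0, 0.0), (0.3, -0.2, 0.5)):
        ratio = flux_through_cube(q, pos) / (4.0 * math.pi * KE * q)
        print(pos, ratio)
```

The printed ratio of the numerical flux to 4π k_e q should be close to 1 for both charge positions, even though the field is far from uniform over the faces.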
Figure 2.9: In this figure, closed surface S is a sphere concentric with the charge q, while
the red closed surface S ′ consists of two hemispheric surfaces and a flat plane bisecting S
and does not contain the charge q.
But what about closed surfaces with no point charge inside? It is easy enough to
see that if the charge q is outside the closed surface S ′ , the net flux through S ′ is zero.
Consider figure 2.9. The lower half of the sphere S subtends 2π steradians (half the total
solid angle of the sphere), so exactly half of the flux from charge q, 2πke q, emerges from
it. I deliberately drew S ′ so that all of the electric flux that enters the closed (red) surface S ′
on the inner red hemispheric surface also exits through the outer red hemispheric surface.
There is no contribution to the flux through S′ from the bisecting plane part of S′ as the
field is parallel to the surface. Hence:
$$\oint_{S'} \vec{E}\cdot\hat{n}\, dA = 0 \tag{2.54}$$
It might look like this result is a special case, but really it is quite general. Anytime
the field enters on one side of a closed surface but exits on the other, the contribution of
the flux through that part of the surface cancels because the two sides subtend the same
solid angle with one side going in and the other out (hence having opposite signs in the
flux integral) for a net contribution of zero. Closed surfaces don’t even have to form a
simply connected domain (that’s mathspeak for “they can have tunnels through them”, the
topology of e.g. a donut-shaped surface or torus) – they just have to have a surface that
splits space into two pieces such that the only way to go from the inside to the outside is
through a surface, not around it the way one can go around a piece of paper to get from
one side to the other without going through.
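Here is a hedged numerical illustration of the statement that a charge outside a closed surface contributes no net flux. The closed surface is assumed to be a sphere of radius a centered on the origin, with the charge on the z-axis a distance d from the center, so the flux reduces to a one-dimensional integral over u = cos(θ). The function name and the sample distances are assumptions for this example.

```python
import math

KE = 8.99e9  # Coulomb constant in N m^2 / C^2

def flux_through_sphere(q, a, d, n=20000):
    """Flux through a sphere of radius a (centered on the origin) from a point
    charge q on the z-axis a distance d from the center, with d != a."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        u = -1.0 + (i + 0.5) * h                  # u = cos(theta) of the surface patch
        s2 = a * a + d * d - 2.0 * a * d * u      # squared distance from charge to patch
        e_dot_n = KE * q * (a - d * u) / s2 ** 1.5
        total += e_dot_n * 2.0 * math.pi * a * a * h   # ring of area 2 pi a^2 du
    return total

if __name__ == "__main__":
    q, a = 1.0e-9, 1.0
    for d in (0.5, 1.5, 3.0):
        print(d, flux_through_sphere(q, a, d) / (4.0 * math.pi * KE * q))
```

For d = 0.5 the charge is inside and the ratio is close to 1. For d = 1.5 and d = 3.0 the charge is outside, the inward and outward contributions cancel, and the ratio is close to 0.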
To get our final result, Gauss’s Law itself, all that remains is using the superposition
principle. If we enclose more than one charge in S:
$$\oint_S \vec{E}_{tot}\cdot\hat{n}\, dA = \oint_S (\vec{E}_1 + \vec{E}_2 + \dots)\cdot\hat{n}\, dA = 4\pi k_e (q_1 + q_2 + \dots) = 4\pi k_e Q_{tot} \tag{2.55}$$
because integration is a linear operation – the integral of a sum is the sum of the inte-
grals. Clearly this result doesn’t depend on the charges being “point charges”, as we can
follow the usual ritual and coarse grain the sum to convert it to an integral so it applies to
continuous charge distributions as well, as usual. When we do so, we (finally!) arrive at:
Gauss’s Law for the Electric Field
$$\oint_{S/V} \vec{E}\cdot\hat{n}\, dA = 4\pi k_e \int_V \rho\, dV = \frac{Q_{\text{in } S}}{\epsilon_0} \tag{2.56}$$
in integral form. In this expression, we have (re)introduced a new quantity called the
permittivity of free space, ε0, seen but not remarked upon in equation 1.10. It is trivially
related to the Coulomb constant we have been using exclusively to find the electric field or
force up to now:
$$k_e = \frac{1}{4\pi\epsilon_0} \iff \epsilon_0 = \frac{1}{4\pi k_e} = 8.85 \times 10^{-12}\ \frac{\text{Coulomb}^2}{\text{Newton-meter}^2} \tag{2.57}$$
The reason we have ignored it up to now is that it is more difficult to remember than ke ,
and we don’t need it yet to help us understand, for example, electric fields in matter and
polarizability. However, I will start using it occasionally in algebraic expressions of Gauss’s
Law just so you can get used to it and learn where it goes gradually without having to work
very hard at it, so it is there when we need it. At that time we will also learn more useful
ways of expressing its SI units!
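The arithmetic in equation 2.57 is quick to confirm. This sketch assumes the rounded value k_e = 8.99 × 10⁹ N m²/C² for the Coulomb constant; the variable names are chosen for the example only.

```python
import math

KE = 8.99e9                        # Coulomb constant in N m^2 / C^2 (rounded)
EPS0 = 1.0 / (4.0 * math.pi * KE)  # permittivity of free space in C^2 / (N m^2)

if __name__ == "__main__":
    print("epsilon_0 =", EPS0)
    # Both forms of the right-hand side of Gauss's Law agree for any enclosed charge:
    Q = 2.5e-9
    print(4.0 * math.pi * KE * Q, Q / EPS0)
```

The two expressions 4π k_e Q and Q/ε0 are the same number written two ways, which is all that equation 2.56 asserts about the constants.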
In words, Gauss’s Law for Electrostatics is:
The flux of the electric field through a closed surface S equals the total charge
inside S divided by ε0 (the permittivity in a vacuum of the electric field).
This is our first of four Maxwell’s Equations that we will cover, in considerable detail, this
semester. Those four equations, plus perhaps a few definitions of things like force in terms
of field, (almost) completely determine electrodynamics – the study of the unified
electromagnetic field! Because we will use it often all semester long, I will usually abbreviate it
as “GLE” instead of writing it out.
Note well that I used integration to express the total charge of a continuous distribution
in the boxed expression, but of course Gauss’s Law is equally well the discrete sum
immediately preceding it that I got from directly considering superposition and the linearity
of integrals. The main reason to write GLE in just this way is that if we (again) apply the
multivariate calculus result known as the divergence theorem, we can convert the integral
form into a partial differential equation that is also GLE:
$$\vec{\nabla}\cdot\vec{E} = \frac{\rho}{\epsilon_0} \tag{2.58}$$
So, what’s GLE good for? Lots! But for the moment, we’ll start by using GLE to
easily evaluate the electric field for charge density distributions that have the symmetry of
a coordinate system that we’d otherwise have to evaluate using painful direct integration.
We will also use it to help us reason about things like the distribution of charge on a
conductor in electrostatic equilibrium. And don’t forget, we consider it to be the actual Law
of Nature for the electrostatic field, so things like the field of a point charge and Coulomb’s
Law and so on are actually consequences of Gauss’s Law (or consistently equivalent to
Gauss’s Law) rather than the other way around. So basically, everything else we do with
the electrostatic field this semester will be a “use” of Gauss’s Law.
One of the first and most important applications of Gauss’s law for our current purposes
will be to easily evaluate the electric field for certain symmetric charge distributions that
we’d otherwise have to integrate over, painfully. There are precisely three symmetries we
can manage in this way: spherical, cylindrical, and planar.
That’s it! No more. For charge distributions that are spherically symmetric, cylindrically
symmetric, or planarly symmetric, we can do the flux integral in Gauss’s law once and for
all for the symmetry. As we’ll see, all that remains for us to be able to easily obtain the field
from algebra is for us to evaluate the total charge inside a Gaussian surface for any given
symmetric distribution. Here’s the recipe:
a) Draw a closed Gaussian Surface that has the symmetry of the charge distribution.
The various pieces that make up the closed surface should either be perpendicular
to the field (which should also be constant on those pieces) or parallel to the field
(which may then vary but which produces no flux through the surface).
b) Evaluate the flux through this surface. The flux integral will have exactly the same
form for every problem with each given symmetry, so we will do this once and for all
for each surface type and be done with it!
c) Compute the total charge inside this surface. This is the only part of the solution that
is “work”, or that might be different from problem to problem. Sometimes it will be
easy, adding it up on fingers and toes. Sometimes it will be fairly easy, multiplying a
constant charge per unit volume times a volume to obtain the charge, say. At worst
it will be a problem in integration if the associated density of charge is a function of
position.
d) Set the (once and for all) flux integral equal to the (computed per problem) charge
inside the surface divided by ε0 and solve for $|\vec{E}|$. That’s all there is to it!
Now, you don’t want to be memorizing these steps, you want to be learning them, so
please use exactly these steps and show all of your work doing them in every homework
problem that requires using them. If you use them five or six times in a row, in slightly
different contexts, it will get quite easy! At the very least, even if you get a problem where
you can’t “do” (say) an integral to find the charge inside a given surface, you’ll get most of
the credit for laying out the precisely correct method except for an integral you can’t quite
do.
Note Well: You cannot use Gauss’s Law to e.g. evaluate the field of a ring of charge,
or a disk of charge, or a line segment of charge or any other continuous distribution that
does not have the symmetry of a sphere, infinite cylinder, or infinite plane. Sorry, that’s just
the way it is. It isn’t that it isn’t true for these distributions, it is that we cannot compute the
flux integral. Let’s do some examples, at least one for each symmetry.
Figure 2.10: A spherical shell of radius a, carrying a uniform charge per unit area σ0. Two
spherical concentric Gaussian surfaces S1 (with radius r < a) and S2 (with radius r > a)
are shown.
Suppose you are given a spherical shell of charge with a uniform charge per
unit area σ0 and radius a. Find the field everywhere in space.
As you can see in figure 2.10, there are two distinct regions where we must find the
field: inside the shell and outside the shell. Draw a spherical Gaussian surface S1 inside
the sphere (for r < a). From the symmetry of the distribution we know that the field $\vec{E}$ must
point in the direction of $\vec{r}$ and (hence) be perpendicular and constant in magnitude at all
points on the Gaussian surface S1. Hence:
$$\phi_e = \oint_{S_1} \vec{E}\cdot\hat{r}\, dA = E_r \oint_{S_1} dA = E_r (4\pi r^2) \tag{2.59}$$
where it is presumed that everybody knows how to integrate to evaluate the area of a sphere
and knows the result.
The total charge $Q_{S_1}$ inside this sphere is zero by inspection – the fingers and toes thing.
That was easy! Now we write Gauss’s law:
$$\phi_e = \oint_{S_1} \vec{E}\cdot\hat{r}\, dA = E_r (4\pi r^2) = \frac{Q_{S_1}}{\epsilon_0} = 0 \tag{2.60}$$
and solve for Er :
$$\begin{aligned}
E_r (4\pi r^2) &= 0 \\
E_r &= \frac{0}{4\pi r^2} \\
E_r &= 0 \quad \text{for } r < a
\end{aligned} \tag{2.61}$$
We’ve just shown that in general the electric field of a spherical shell of charge (like the
gravitational field of a spherical shell of mass last semester) vanishes inside, but using
Gauss’s law the derivation was trivial!
Outside the shell we draw a second spherical Gaussian surface S2 at r > a. Again,
the field must be constant and normal to all points on this surface from symmetry. The flux
integral is algebraically identical:
$$\phi_e = \oint_{S_2} \vec{E}\cdot\hat{r}\, dA = E_r \oint_{S_2} dA = E_r (4\pi r^2) \tag{2.62}$$
and in fact it will always have this algebraic form for a spherical problem, to the point where
we will get bored writing this line out umpty times doing homework. Don’t let that stop you!
Do it every time, as when you know something well enough to be slightly bored writing it
out, that’s just about perfect, isn’t it?
Again we can count up the charge inside S2 on the thumbs of one hand. It is the total
charge on the shell! Which is, in fact (noting that dA for a spherical shell of radius a is
a2 sin(θ) dθ dφ):
$$Q_S = \int_S \sigma_0\, dA = \int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta\; a^2 \sigma_0 = 2\pi a^2 \sigma_0 \int_{-1}^{1} d(\cos\theta) = 4\pi a^2 \sigma_0 \tag{2.63}$$
which we could have done using our heads instead of calculus, but there is a clever trick
in this example (using sin θdθ = −d(cos θ) to change variables and limits on the θ integral)
which we used when explicitly integrating above and which we’ll have occasion to
use again in other problems.
Finally, we write out Gauss’s law and solve for Er :
$$\phi_e = E_r (4\pi r^2) = \frac{Q_S}{\epsilon_0} \tag{2.64}$$
or
$$E_r = \frac{Q_S}{4\pi\epsilon_0 r^2} = \frac{k_e Q_S}{r^2} \tag{2.65}$$
where once again Gauss’s law gets us extremely simply something we probably should
remember from last semester, which is that the field of a spherically symmetric charge
distribution outside that distribution is the same as that of a point charge with the same net
charge located at the origin.
This is exactly what we got the hard way earlier in this chapter! The hard way being
an explicit (and quite difficult) integral over the actual charge distribution. The fact that we
get the same answer should give us some confidence that Gauss’s Law is true and correct.
It also convinces us that when we can use it, it is much easier than explicit integration!
In lecture your instructor will probably do a few more difficult problems – perhaps a
solid sphere of charge, or multiple spherical shells, or even a solid sphere with a charge
distribution like ρ(r) = Ar where A is a constant! You should be able to do any problem with
a spherical distribution of charge that you can integrate or sum inside any given Gaussian
sphere using this method.
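For a density such as ρ(r) = Ar, the only real work is finding the charge inside the Gaussian sphere. The sketch below is one way to do that step numerically under stated assumptions: the ball has radius R, the density is zero outside it, and the enclosed charge is summed over thin shells of volume 4πr′² dr′ with a midpoint rule. The function names and the sample values of A and R are hypothetical.

```python
import math

KE = 8.99e9  # Coulomb constant in N m^2 / C^2

def charge_inside(rho, r, n=2000):
    """Charge within radius r for a spherically symmetric density rho(r'),
    summing thin shells of volume 4 pi r'^2 dr' (midpoint rule)."""
    h = r / n
    total = 0.0
    for i in range(n):
        rp = (i + 0.5) * h
        total += rho(rp) * 4.0 * math.pi * rp * rp * h
    return total

def radial_field(rho, R, r):
    """E_r at radius r for a ball of radius R with density rho inside, zero outside."""
    q_in = charge_inside(rho, min(r, R))   # only charge inside the Gaussian sphere counts
    return KE * q_in / (r * r)             # from E_r (4 pi r^2) = 4 pi k_e Q_in

if __name__ == "__main__":
    A, R = 2.0e-6, 0.10                    # sample values: C/m^4 and m
    rho = lambda rp: A * rp
    for r in (0.05, 0.10, 0.20):
        print(r, radial_field(rho, R, r))
```

For this density the integral can also be done by hand: the enclosed charge is πAr⁴ for r < R and πAR⁴ for r > R, so the field is k_e πAr² inside and k_e πAR⁴/r² outside.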
Also note that once one has done a single spherical shell, one can easily do as many
concentric shells as you might have on your fingers and toes using the superposition prin-
ciple. Simply add the field produced by each shell at the point in question (which might
be inside or outside the given shell) to that produced by all the other shells! There’s a
homework problem to help you learn that – do it!
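The superposition of concentric shells can be sketched in a few lines. In this illustration (the function name, the list-of-pairs format, and the sample shells are all assumptions for the example), a shell contributes its total charge 4πa²σ0 to the field only when the field point lies outside it.

```python
import math

KE = 8.99e9  # Coulomb constant in N m^2 / C^2

def field_of_shells(r, shells):
    """E_r at radius r from concentric shells given as (radius a, surface density sigma0).
    The field point is assumed not to lie exactly on any shell."""
    q_enclosed = sum(4.0 * math.pi * a ** 2 * sigma0
                     for a, sigma0 in shells if a < r)
    return KE * q_enclosed / r ** 2

if __name__ == "__main__":
    # Two shells chosen to carry equal and opposite total charge.
    shells = [(0.10, 1.0e-6), (0.20, -0.25e-6)]
    for r in (0.05, 0.15, 0.30):
        print(r, field_of_shells(r, shells))
```

With these sample shells the field vanishes inside the inner shell, equals that of the inner shell alone between the two, and vanishes again (to rounding) outside both because the net charge is zero.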
Figure 2.11: Geometry for finding the field of a uniform spherical shell of constant charge
density σ by direct integration, both inside and outside. Note that θ is the angle swept
down from the positive z axis (the equivalent of ‘latitude’, although measured down from
the north pole and not up from the equator) and φ is the angle to the x-y projection of the
point, measured counterclockwise from the positive x-axis (the equivalent of ‘longitude’).
We call φ the azimuthal angle.
We will now proceed to set up and find the electric field inside and outside a uniform
spherical shell of charge by direct integration. This is just difficult enough that this section
is marked “Advanced”. However, even normal humans – that is, humans who don’t plan
to major in physics or mathematics – who probably won’t spend a lot of their lifetime inte-
grating nontrivial functions and solving partial differential equations in spherical coordinate
systems might want to look the solution over just to see how it works and so that they can
use it as a check for Gauss’s Law, which we will cover next.
We begin by choosing a spherical polar coordinate system, where a point is repre-
sented by the triplet (r, θ, φ). Physicists usually use θ and φ as represented on the figure
above, although in recent years some mathematics texts (and even a few physics texts)
swap them so that θ is the usual polar angle in the x-y plane. Sadly, I am an ‘old guy’ and
learned it so thoroughly the other way that I just don’t want to change, so we’ll stick with
the variable representation as given above.
Because the charge distribution (and hence the field) has spherical symmetry we lose
nothing by choosing the point P where we want to evaluate the field on the z-axis and giving
it a z-coordinate R (which is also the distance of the point from the origin). Furthermore,
although it is not strictly necessary, we can ignore dE⊥ in the figure above because the
problem has azimuthal symmetry and hence cannot have a total field component in the
x-y plane. I’m assuming that you have some familiarity with spherical polar coordinates51
and things like the area element on the surface of a sphere:
This trick doesn’t always work, but in physics a lot of time it does and when it does it is
really useful!
Consider, then, the small differential chunk of area dA of charge in figure 2.11. We
know from our usual rule that the charge in the chunk is the charge per unit area times
the area of the chunk, or:
$$dq = \sigma\, dA = \sigma\, r^2\, d(\cos\theta)\, d\phi$$
We know that the field of just this chunk at the point P has a magnitude:
$$dE = \frac{k_e\, dq}{s^2} = k_e \sigma\, \frac{r^2\, d(\cos\theta)\, d\phi}{s^2} \tag{2.70}$$
Finally, we only care (for the moment, anyway) about dEz so we might as well write it
down too:
$$dE_z = dE \cos(\alpha) = k_e \sigma\, \frac{r^2\, d(\cos\theta)\, d\phi}{s^2}\, \cos(\alpha) \tag{2.71}$$
which we can rewrite using the geometry in figure 2.12 as:
$$dE_z = k_e \sigma\, \frac{r^2\, d(\cos\theta)\, d\phi}{s^2}\, \frac{R - r\cos(\theta)}{s} \tag{2.72}$$
Piece of cake, right? Well, not quite. Sadly, s and cos(α) depend on P , r and θ via e.g.
the law of cosines52 for s and the geometry of the triangle with sides s, R − r cos(θ) and
r sin(θ) for the other. On the other hand, the result still has azimuthal symmetry, which is
51
Wikipedia: http://www.wikipedia.org/wiki/Spherical Coordinate Systems. Note well that I’m using the
physics convention, that is, the second of the two pictures on the right.
52
Wikipedia: http://www.wikipedia.org/wiki/Law of Cosines.
Figure 2.12: Geometry for the vector decomposition of $d\vec{E}$ into dEz.
good! This means we can immediately do the (trivial) φ integral and rearrange the result
so we can tackle it:
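$$\begin{aligned}
E_z &= 2\pi r^2 \sigma k_e \int_{-1}^{1} \frac{(R - r\cos\theta)\, d(\cos\theta)}{s^3} \\
&= 2\pi r^2 \sigma k_e \int_{-1}^{1} \frac{(R - r\cos\theta)\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} \\
&= 2\pi r^2 \sigma k_e \left\{ \int_{-1}^{1} \frac{R\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} - \int_{-1}^{1} \frac{r\cos\theta\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} \right\}
\end{aligned} \tag{2.73}$$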
This integral looks difficult, and perhaps it is, but it isn’t that difficult. The worst thing
about it is that we have to integrate the second piece of it by parts. Let’s start with the first
(fairly easy) piece:
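$$\begin{aligned}
\int_{-1}^{1} \frac{R\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} &= -\frac{1}{2r} \int_{-1}^{1} (R^2 + r^2 - 2rR\cos\theta)^{-3/2}\, (-2rR\, d(\cos\theta)) \\
&= \frac{1}{r} \left. \frac{1}{(R^2 + r^2 - 2rR\cos\theta)^{1/2}} \right|_{-1}^{1} \\
&= \frac{1}{r} \left\{ \frac{1}{(R^2 + r^2 - 2rR)^{1/2}} - \frac{1}{(R^2 + r^2 + 2rR)^{1/2}} \right\} \\
&= \frac{1}{r} \left\{ \frac{1}{(R - r)} - \frac{1}{(R + r)} \right\} \\
&= \frac{1}{r}\, \frac{2r}{(R^2 - r^2)} \\
&= \frac{2}{(R^2 - r^2)}
\end{aligned} \tag{2.74}$$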
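The second integral is also easy enough, at least if you remember how to integrate
by parts:
$$\int u\, dv = uv - \int v\, du \tag{2.75}$$
We write the second integral as:
$$\int_{-1}^{1} \frac{r\cos\theta\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} = \frac{1}{-2R} \int_{-1}^{1} \frac{\cos\theta\, (-2rR\, d(\cos\theta))}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}} \tag{2.76}$$
(where I’ve gone ahead and multiplied and divided by −2R, thinking ahead).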
Let’s let:
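$$u = \cos(\theta) \tag{2.77}$$
and
$$\zeta = R^2 + r^2 - 2rR\cos(\theta) \tag{2.78}$$
so that:
$$dv = \frac{-2rR\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos(\theta))^{3/2}} = \zeta^{-3/2}\, d\zeta \tag{2.79}$$
$$v = \int dv = -2\zeta^{-1/2} = \frac{-2}{(R^2 + r^2 - 2rR\cos(\theta))^{1/2}} \tag{2.80}$$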
Note that this is just the first integral before we plugged in the limits!
So let’s dig into the algebra. This bit isn’t exactly trivial – be patient and try to under-
stand each step.
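$$
\begin{aligned}
\frac{1}{-2R} \int_{-1}^{1} \frac{\cos\theta\, (-2rR\, d(\cos\theta))}{(R^2 + r^2 - 2rR\cos\theta)^{3/2}}
  &= \frac{1}{-2R} \left\{ \left. \frac{-2\cos\theta}{(R^2 + r^2 - 2rR\cos\theta)^{1/2}} \right|_{-1}^{1}
     - \int_{-1}^{1} \frac{-2\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{1/2}} \right\} \\
  &= \frac{1}{-2R} \left\{ \frac{-2}{R - r} - \frac{2}{R + r}
     - \frac{1}{rR} \int_{-1}^{1} \frac{-2Rr\, d(\cos\theta)}{(R^2 + r^2 - 2rR\cos\theta)^{1/2}} \right\} \\
  &= \frac{1}{-2R} \left\{ \frac{-4R}{R^2 - r^2}
     - \frac{2}{rR} \left. (R^2 + r^2 - 2rR\cos\theta)^{1/2} \right|_{-1}^{1} \right\} \\
  &= \frac{1}{-2R} \left\{ \frac{-4R}{R^2 - r^2} - \frac{2}{rR} \left[ (R - r) - (R + r) \right] \right\} \\
  &= \frac{1}{-2R} \left\{ \frac{-4R}{R^2 - r^2} + \frac{4}{R} \right\} \\
  &= \frac{2}{R^2 - r^2} - \frac{2}{R^2}
\end{aligned}
\tag{2.81}
$$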
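As a sketch of how the two pieces close the calculation, one can substitute (2.74) and (2.81) back into (2.73). This assumes the field point is outside the shell (R > r) and writes Q = 4πr²σ for the total charge on the shell, a symbol introduced here only for illustration:

```latex
\begin{aligned}
E_z &= 2\pi r^2 \sigma k_e \left\{ \frac{2}{R^2 - r^2}
       - \left( \frac{2}{R^2 - r^2} - \frac{2}{R^2} \right) \right\} \\
    &= 2\pi r^2 \sigma k_e\, \frac{2}{R^2}
     = \frac{k_e\, (4\pi r^2 \sigma)}{R^2}
     = \frac{k_e Q}{R^2}
\end{aligned}
```

The awkward $1/(R^2 - r^2)$ terms cancel, leaving the field of a point charge Q at the origin.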
Ouch! That was a lot of work! And technically, we’re not even done – we should really
pick a point where R < r (inside the sphere) to prove that the electric field vanishes inside.
At an interior point, one has to break the cos(θ) integral up into two pieces with different
signs because the charge from the part of the sphere above R creates a field that points
down, where the charge from the part of the sphere below R points up. The integral limits
change to:
$$ E_z = 2\pi r^2 \sigma k_e \left\{ \int_{-1}^{R/r} \ldots \; - \int_{R/r}^{1} \ldots \right\} \tag{2.83} $$
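For readers who would rather check the claim numerically before grinding through the algebra, here is a minimal sketch that integrates the signed integrand of (2.73) with a simple midpoint rule. The function name, the unit choices (σ = k_e = 1) and the sample distances are illustrative assumptions, not part of the text. The factor (R − r cos θ) changes sign on its own at cos θ = R/r, so the same sum covers exterior and interior points.

```python
import math

def shell_axis_field(R, r, sigma=1.0, k_e=1.0, n=20000):
    """Midpoint-rule estimate of E_z from (2.73) for a shell of radius r,
    evaluated at distance R from its center (hypothetical helper)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h                      # x stands for cos(theta)
        s_cubed = (R * R + r * r - 2.0 * r * R * x) ** 1.5
        total += (R - r * x) / s_cubed * h
    return 2.0 * math.pi * r * r * sigma * k_e * total

outside = shell_axis_field(R=2.0, r=1.0)               # point outside the shell
inside = shell_axis_field(R=0.5, r=1.0)                # point inside the shell
point_charge = 1.0 * (4.0 * math.pi * 1.0**2 * 1.0) / 2.0**2   # k_e Q / R^2

print("outside:", outside, "point charge:", point_charge)
print("inside :", inside)
```

Outside, the estimate should agree with the point-charge value k_e Q/R²; inside, it should be zero to within the integration error.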
[Figure: Gaussian surfaces S (inner) and S (outer), each labeled with a radius r.]
Find the electric field at all points in space of a solid insulating sphere with
uniform charge density ρ and radius R
Just for grins, let’s do a teensy bit of your homework together. Note well that you don’t
get to just copy this onto your paper! In order to learn this and get it right three weeks
from now on an exam, you have to be able to do it without looking, or copying. So by all
means, go through the example, study it, figure it out, then close this book or put aside
your digital interface, get out paper, and do it on your own without looking – as many times
as necessary to make the steps, and reasoning, easy for you. Go over it in multiple passes,
work on it in your groups, review it in your notes (your teacher/professor probably did this
example in class), discuss it in recitation. Learn it.
We begin by writing Gauss’s Law for the outer surface in the figure above:
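$$
\begin{aligned}
\oint_{S_{\rm outer}} \vec{E} \cdot \hat{n}\, dA &= 4\pi k_e \int_{V/S} \rho\, dV \\
E_r\, 4\pi r^2 &= 4\pi k_e \left\{ \int_0^R \int_0^{2\pi} \int_0^{\pi} \rho\, r^2 \sin(\theta)\, d\theta\, d\phi\, dr
                  + \int_R^r 0\; dV \right\} \\
               &= 4\pi k_e\, (2\pi\rho) \int_0^R r^2\, dr \int_{-1}^{1} d(\cos(\theta)) \\
               &= 4\pi k_e \left( \frac{4\pi R^3}{3}\, \rho \right) \\
               &= 4\pi k_e\, Q_{\rm total}
\end{aligned}
\tag{2.84}
$$
$$ E_r = \frac{k_e Q}{r^2} \qquad r > R \tag{2.85} $$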
or (as by now you should come to expect) the spherical distribution of charge creates a
field outside of the sphere that is identical to that of a point charge of the same total value
at the origin.
Note that we did a bunch of stuff that we didn’t really “have” to do – in an actual solution
you’d be tempted to skip those steps or do them by inspection, which is fine, but that risks
confusing at least some of you who don’t just see what we are skipping and why it is
OK to do so. So note well – to find the total charge inside $S_{\rm outer}$, we integrated over the
charge distribution from 0 to r including the region where it was zero – getting, of course,
a zero contribution from that region. Zero regions drop out, and we’d usually just integrate over the
support of ρ (the volume where it is nonzero) without thinking about it. Note also that this
integral explicitly illustrates doing multiple integrals of a symmetric function – we just do the
integrals over each coordinate independently (which is then really easy).
Finally, note the clever trick for integrating θ in spherical coordinates: sin(θ)dθ =
−d(cos(θ)), so we change variables from θ → cos(θ) (and swap the order of the
limits to get rid of the minus sign). It is very often much easier to integrate with cos(θ) as
the variable instead of θ in spherical coordinates – in this case one can just look at it and
see that one gets “2” from the integral in your head, for example.
This is a common, and important, example – so let’s plot it to make it easier to remember:

Figure 2.14: Electric field produced by a uniform sphere of charge both inside and outside,
as a function of r. (Axes: $E_r$ versus $r$, with $k_e Q/R^2$ marked on the vertical axis and $R$ on the horizontal axis.)

Things to note and remember: The field increases linearly inside the sphere and is
zero at the origin, not infinite! Outside, the field drops off like $1/r^2$ – as you do more and
more of these, you’ll come to expect this to the point where you don’t think twice about it.
Any charge distribution with compact support and a net charge (spherical or not) produces
a field that is dominantly monopolar and drops off like $1/r^2$ far away from the distribution.
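The shape of this plot can be sketched in a few lines. The exterior branch is (2.85). The interior branch, $k_e Q r / R^3$, is the standard Gauss's Law result for the charge enclosed inside radius r and is assumed here rather than derived. The function name and the unit choices Q = R = k_e = 1 are illustrative only.

```python
def uniform_sphere_field(r, Q=1.0, R=1.0, k_e=1.0):
    """Radial field E_r of a uniformly charged solid sphere (hypothetical helper)."""
    if r < R:
        # Enclosed charge is Q * (r/R)**3, so the field grows linearly with r.
        return k_e * Q * r / R**3
    # Outside, the sphere looks like a point charge at the origin, as in (2.85).
    return k_e * Q / r**2

for r in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"r = {r:3.1f}  E_r = {uniform_sphere_field(r):.4f}")
```

The two branches agree at r = R, where the field takes its largest value $k_e Q/R^2$.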
This is very cool! The fact that the field is bounded at the origin means that the sin-
gularity that appears implicitly in the electrostatic field of a point charge need not trouble
us if the charge isn’t really a point charge but is rather a small ball of charge. However,
if charge is bound up in a small finite size ball it produces other problems – such as the
need for a force to hold it all together, as electrostatic charge of a single sign repels itself.
In the case of a proton, there is such a binding force – the strong nuclear force. In the case
of electrons, quarks, elementary particles, there is (as far as we can tell experimentally or
predict theoretically) no such force, and hence those particles “should” be, and experimen-
tally appear to be, truly pointlike. Which leads to a whole new set of problems (oops, that
nasty infinity is back and has to be dealt with), the invention of renormalizable quantum
field theories that soften or throw away the infinity – and in the process, makes physics an
enormously interesting discipline! Much as we do understand at this point, the problem of
understanding our Universe, especially at the smallest length and time scales, is far from
solved53 .
The uniform ball of charge is the basis for a model of the neutral atom – a positive
nucleus surrounded by a uniform ball of negative charge – that helps us understand po-
larization in a few weeks. This model is still used (dressed up with damping and a time
dependent driving field) in physics graduate school where the model is called the Lorentz
Oscillator Model for the atom and where the result of analyzing the model is understanding
of dispersion – basically time dependent dielectric response and the absorption of electro-
magnetic energy by matter! It sounds complicated, but it isn’t, not really. It is almost within
your reach at the end of taking this introductory course (where we will cover the static
part of the result perfectly adequately) – all that separates you is a bit more work with the
damped driven harmonic oscillator to help you manage even the dynamics. The reward for
the effort is that afterwards, you understand microscopically why, e.g. rainbows happen,
why the sky is blue, how light from the sun warms the earth, and much more. So keep it in
mind for later.
Figure 2.15: A cylindrical shell of radius a, carrying a uniform charge per unit area $\sigma_0$. Two
cylindrical concentric Gaussian surfaces $S_1$ (with radius r < a) and $S_2$ (with radius r > a)
are shown.
53
Students who are interested in reading something accessible for the lay person on the subject are encour-
aged to pick up a copy of The Black Hole War: My Battle with Stephen Hawking to Make the World Safe for
Quantum Mechanics by Leonard Susskind. Great fun, and it will help make many of the concepts discussed
here clearer in context.
Suppose you are given an infinite cylindrical shell of charge with a uniform
charge per unit area σ0 and radius a. Find the field everywhere in space.
We solve this problem exactly like we did the sphere. In fact, I block-copied the solution
from above to write this and changed only a few minimal things.
There are two distinct regions, inside the cylinder and outside the cylinder. Draw a
cylindrical Gaussian surface $S_1$ of length L inside the cylinder (for r < a). We don’t know
what the field is on this surface yet, but we do know that on the cylinder part it must lie
along $\hat{r}$ and be constant in magnitude and perpendicular to the surface at all points on our
Gaussian surface from the symmetry of the distribution. On the end caps the field may well
vary with r, but it is parallel to those surfaces and therefore there is no net flux through the
caps. Hence:
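$$
\begin{aligned}
\phi_e &= \oint_{S_1} \vec{E} \cdot \hat{r}\, dA \\
       &= \phi_{\rm caps} + \int_{\rm Cyl} E_r\, dA \\
       &= 0 + E_r (2\pi r) L
\end{aligned}
\tag{2.88}
$$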
where it is presumed that everybody knows how to integrate to evaluate the area of a cylindrical
surface of radius r and length L and knows the result54. Note that I indicate explicitly
that the flux through the end caps is zero even though the field there may not be.
The total charge QS1 inside this cylinder is zero by inspection – the fingers and toes
thing. That was easy! Now we write Gauss’s law:
$$ \phi_e = \oint_{S_1} \vec{E} \cdot \hat{r}\, dA = E_r (2\pi r L) = \frac{Q_{S_1}}{\epsilon_0} = 0 \tag{2.89} $$
and solve for Er :
$$
\begin{aligned}
E_r (2\pi r L) &= 0 \\
E_r &= \frac{0}{2\pi r L} \\
E_r &= 0 \quad \text{for } r < a
\end{aligned}
\tag{2.90}
$$
We’ve just shown that in general the electric field of a cylindrical shell of charge vanishes
inside.
Outside the shell we draw a second cylindrical Gaussian surface S2 with length L at
r > a. Again, the field must be constant and normal to all points on this surface from
symmetry, again the flux through the end caps must be zero even though the field on the
end caps may not be. The flux integral is identical:
$$
\begin{aligned}
\phi_e &= \oint_{S_2} \vec{E} \cdot \hat{r}\, dA \\
       &= \phi_{\rm caps} + \int_{C} E_r\, dA \\
       &= E_r (2\pi r) L
\end{aligned}
\tag{2.91}
$$
54
Think of the label of a soup can. Use mental scissors to snip, snip, snip it off. Unroll it in your mind. It is
2πr long and L wide.
and in fact it will always be this algebraic form for a cylindrical problem, to the point where
we will get bored writing this line out umpty times doing homework. Don’t let that stop you!
Do it every time, as when you know something well enough to be slightly bored writing it
out, that’s just about perfect, isn’t it?
Again we can count up the charge inside S2 on the thumbs of one hand. It is the total
charge on the shell inside the Gaussian surface of length L! Which is, in fact (noting that
dA for a cylindrical shell of radius a is adθ dz):
$$
\begin{aligned}
Q_{S_2} &= \int_S \sigma_0\, dA = \int_0^{2\pi} d\theta \int_{-L/2}^{L/2} a\sigma_0\, dz \\
        &= 2\pi a L \sigma_0
\end{aligned}
\tag{2.92}
$$
which we could have done using our heads instead of calculus, but again this way you get
to see how to do a two dimensional integral that separates into two trivial one dimensional
integrals.
Finally, we write out Gauss’s law and solve for Er :
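$$
\begin{aligned}
\phi_e = E_r (2\pi r L) &= \frac{Q_{S_2}}{\epsilon_0} \\
E_r &= \frac{2\pi a L \sigma_0}{2\pi L \epsilon_0}\, \frac{1}{r} \\
    &= \frac{\sigma_0 a}{\epsilon_0 r} \\
    &= \frac{2 k_e \lambda_0}{r}
\end{aligned}
\tag{2.93}
$$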
where I’ve used the fact that λ0 = QS /L = 2πaσ0 to help show that the field of a cylindrically
symmetric charge distribution outside that distribution is the same as that of a line of
charge with the same net charge per unit length on its axis.
Note well: The parameter L (which you made up when you drew your Gaussian sur-
face) cancels from the problem. Of course it does! And a good thing, too!
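A short numerical sketch can confirm both points: the shell field equals that of a line charge with the same charge per unit length, and the made-up length L really does drop out. The function names and the sample values of a, σ0 and r below are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12              # vacuum permittivity, F/m
K_E = 1.0 / (4.0 * math.pi * EPS0)   # Coulomb constant, N m^2 / C^2

def shell_field(r, a, sigma0):
    """E_r for an infinite cylindrical shell of radius a, from (2.90) and (2.93)."""
    if r < a:
        return 0.0
    return sigma0 * a / (EPS0 * r)

def gauss_field(r, a, sigma0, L):
    """Same field, computed from the flux balance with an explicit length L."""
    q_enclosed = 2.0 * math.pi * a * L * sigma0 if r > a else 0.0
    return q_enclosed / (EPS0 * 2.0 * math.pi * r * L)

a, sigma0, r = 0.02, 1.0e-6, 0.05    # illustrative values in m, C/m^2, m
lam0 = 2.0 * math.pi * a * sigma0    # charge per unit length of the shell

print(shell_field(r, a, sigma0), 2.0 * K_E * lam0 / r)
print(gauss_field(r, a, sigma0, L=0.1), gauss_field(r, a, sigma0, L=10.0))
```

Each printed pair should agree to rounding error, whatever length L is chosen for the Gaussian cylinder.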
In lecture your instructor will probably do a few more difficult problems – perhaps a
solid cylinder of charge, or multiple cylindrical shells, or even a solid cylinder with a charge
distribution like ρ(r) = Ar where A is a constant! You should be able to do any problem
with a cylindrical distribution of charge that you can integrate or sum inside any given
Gaussian cylinder using this method.
Suppose you are given an infinite sheet of charge with a uniform charge per
unit area σ0 . Find the field everywhere in space.
We solve this problem exactly like we did the two above. You (by now) should know the
drill.
Figure 2.16: An (infinite) plane sheet of uniform charge per unit area $\sigma_0$. The Gaussian
surface in this case is a simple “pillbox” symmetrically drawn so it intersects the sheet as
drawn.

Here we only need to draw a single Gaussian surface, as indicated in figure 2.16.
We will again draw a cylindrical Gaussian surface of length z, but this time it must be
symmetrically located so that it intersects the plane of charge with z/2 of its
length above and below the plane. This cylinder has an end-cap area of A which (like L
in the previous problem) will cancel when we go to evaluate the field. We don’t know what
the field is on this surface yet, but we do know that on the end-caps it must lie parallel to $\hat{z}$
and be constant in magnitude and perpendicular to the end caps at all points. On the side
of the cylinder the field may well vary with z, but it is parallel to this surface and therefore
there is no net flux through it. Hence:
$$
\begin{aligned}
\phi_e &= \oint_S \vec{E} \cdot \hat{z}\, dA \\
       &= \phi_{\rm side} + 2 E_z A \\
       &= 2 E_z A
\end{aligned}
\tag{2.94}
$$
where you should note that we have two end caps, each of which contributes Ez A to the
flux.
The total charge inside this Gaussian surface is trivial:
$$ Q_S = \int_A \sigma_0\, dA = \sigma_0 A \tag{2.95} $$
where there really isn’t much of anything to integrate or evaluate.
Finally, we write out Gauss’s law and solve for Ez :
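$$
\begin{aligned}
\phi_e = 2 E_z A &= \frac{Q_S}{\epsilon_0} = \frac{\sigma_0 A}{\epsilon_0} \\
E_z &= \frac{\sigma_0}{2\epsilon_0} \\
    &= 2\pi k_e \sigma_0
\end{aligned}
\tag{2.96}
$$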
where we note that the field is uniform – it doesn’t depend on z, and of course it cannot
depend on x and y either as every point is in the middle of an infinite plane! This last result
is very important.
Note well: The parameter A (which you made up when you drew your Gaussian sur-
face) cancels from the problem. Also note that this is exactly the result we got for the field
on the axis of a disk of charge when we let the radius go to ∞. This gives us confidence
that Gauss’s Law works!
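The disk limit can be sketched numerically. The closed form used for the on-axis field of a uniform disk, $2\pi k_e \sigma\,(1 - z/\sqrt{z^2 + R^2})$, is the standard result and is assumed here rather than quoted from this section. The function names and the unit choices σ = k_e = 1 are illustrative.

```python
import math

def disk_axis_field(z, radius, sigma=1.0, k_e=1.0):
    """On-axis field of a uniform disk (standard closed form, assumed here)."""
    return 2.0 * math.pi * k_e * sigma * (1.0 - z / math.hypot(z, radius))

def sheet_field(sigma=1.0, k_e=1.0):
    """Field of an infinite sheet of charge, from (2.96)."""
    return 2.0 * math.pi * k_e * sigma

z = 1.0
for radius in (1.0, 10.0, 100.0, 1000.0):
    ratio = disk_axis_field(z, radius) / sheet_field()
    print(f"R = {radius:7.1f}  E_disk / E_sheet = {ratio:.6f}")
```

The ratio climbs toward 1 as the disk radius grows relative to z, which is the sense in which the infinite sheet is the large-radius limit of the disk.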
As before, in lecture your instructor will probably do a few more problems, perhaps a
slab of charge of finite thickness or the field produced by two infinite sheets of charge, one
with charge σ0 and the other with charge −σ0 (a model for a parallel plate capacitor that
we will study in great detail shortly).
A conductor is a material that contains many “free” charges that are bound to the material
so that they cannot easily jump from the conductor into a surrounding insulating material
(where a vacuum is considered an insulator for the time being, as is air) but are free to move
within the material itself if a force (e.g. from an electric field) acts on them.
In a typical conductor – for example a metal such as silver or copper – there is on average
roughly one free electron per atom in the material. That is in the ballpark of $10^{24}$ free
electrons per mole of metal, which in turn is somewhere between $10^4$ and $10^5$ Coulombs
of free charge! As we discussed in class, two charges of one Coulomb each separated by
one meter exert a force of $9 \times 10^9$ Newtons on each other, more than enough to rip apart
any normal material (releasing roughly ten billion joules of energy) as they “explosively”
rejoin. Consequently we have no hope of either removing all of the free electrons from
(say) 50 or 60 grams of solid metal and separating them by any appreciable macroscopic
distance, or adding enough electrons so that every atom has an extra one. From energy
balance alone, it would require the entire energy output of a city-sized gigawatt electrical
generator for more than a day to accomplish it, and the material itself would come apart in
a cloud of superheated plasma long before we succeeded.
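The orders of magnitude quoted above are easy to reproduce. This is a minimal sketch assuming exactly one free electron per atom and using the point-charge potential energy $k_e q_1 q_2 / r$ for the "ten billion joules" estimate. The variable names are illustrative.

```python
AVOGADRO = 6.02214076e23      # atoms per mole (exact SI value)
E_CHARGE = 1.602176634e-19    # elementary charge in C (exact SI value)
K_E = 8.9875517923e9          # Coulomb constant in N m^2 / C^2

free_charge_per_mole = AVOGADRO * E_CHARGE   # one free electron per atom
force_1C_at_1m = K_E * 1.0 * 1.0 / 1.0**2    # Coulomb's law, 1 C and 1 C at 1 m
energy_1C_at_1m = K_E * 1.0 * 1.0 / 1.0      # potential energy k q1 q2 / r

print(f"free charge per mole : {free_charge_per_mole:.3e} C")
print(f"force, 1 C at 1 m    : {force_1C_at_1m:.3e} N")
print(f"energy, 1 C at 1 m   : {energy_1C_at_1m:.3e} J")
```

The free charge per mole comes out just under $10^5$ C, consistent with the range quoted in the text.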
This means that we can consider the free charge in a macroscopic chunk of
conductor to be ‘inexhaustible’. As far as we’re concerned, we can always add charge
to a conductor, or take it away, or rearrange it as we please with fields and forces, and
never run a risk of “saturating” the conductor’s ability to supply still more free charge, at
least not with the work we are willing to do (and pay for!) and while keeping the conductor
itself intact.
Now let’s think a moment about the “free” bit without worrying about actual equations –
we’ll use logic and reason to figure out what we should expect to see when we try to push
free charges around inside (say) a metallic conductor. First, we have to try to imagine
a force we can use to push on those free charges with. So far, we know of only two
forces that can push on massive charged particles – gravitation and the electrostatic force.
Obviously Earthly gravitation is much, much weaker than the electrostatic forces that bind
the electrons to atoms and molecules, as we don’t observe electrons “dripping” out of solid
matter and falling to the ground under a conductor (and besides, with luck you did an in-class
problem where you concluded at the end that the Coulomb force acting on a bound
electron in (say) a hydrogen atom is some $10^{39}$ times stronger than the gravitational force
between the electron and proton in the nucleus at any separation). Finally, it is a True Fact
(well beyond the scope of the course to treat in detail) that the quantum interaction of an
electron with protons or neutrons via the strong and weak nuclear forces is essentially
irrelevant.
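The comparison between the Coulomb and gravitational forces on the electron in hydrogen is a one-line calculation, sketched here with rounded CODATA constants. Both forces fall off as $1/r^2$, so the separation cancels from the ratio. The variable names are illustrative.

```python
K_E = 8.9875517923e9         # Coulomb constant, N m^2 / C^2
G = 6.67430e-11              # gravitational constant, N m^2 / kg^2
E_CHARGE = 1.602176634e-19   # elementary charge, C
M_ELECTRON = 9.1093837e-31   # electron mass, kg
M_PROTON = 1.67262192e-27    # proton mass, kg

# F_coulomb = k_e e^2 / r^2 and F_gravity = G m_e m_p / r^2, so r cancels.
ratio = (K_E * E_CHARGE**2) / (G * M_ELECTRON * M_PROTON)
print(f"F_coulomb / F_gravity = {ratio:.2e}")
```

With these constants the ratio comes out at a few times $10^{39}$, independent of the separation.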
Consequently, if we want to exert a force on the free charges in an isolated conductor
to move them independent of the underlying atomic/molecular lattice they are bound to
(they will still fall with the object they are bound to as gravity acts on the protons, neutrons
and electrons alike) it will have to be using the electric field itself (or, as we will see
later, with the electromagnetic field as magnetism will also push free charges around, but
only if they are moving and never doing any actual work as it does so).
Next, those charges are (by hypothesis) free to move and hence will accelerate in
the direction of the net electrostatic field/force we apply to push them around to the
extent permitted by their interaction with the underlying lattice of atoms or molecules that
make up the conducting medium (something we will discuss in detail in Week/Chapter 5).
They will continue to move (accelerating or in a steady state of motion) until they encounter
the boundaries of the isolated material (where a potential energy barrier holds them inside
of the conductor, exerting an electrostatic surface force perpendicular to the surface
of the conductor sufficient to prevent any motion across those boundaries). At the boundaries,
free charges build up and rearrange until they create a macroscopic electrostatic
field of their own.
Empirically, this rearrangement of charge in an isolated conductor placed in an external
electrostatic field only lasts for a very, very short time – nanoseconds to microseconds, de-
pending on a variety of things we’ll learn about later. By the end of this time, the conductor
will reach a state we will call electrostatic equilibrium (which I will routinely abbreviate
‘ESE’ as we will refer to this state quite frequently for the next few chapters) where the
net free charge in the conductor stops moving around as all of the forces acting on it,
including the surface force confining it to the material, cancel. Since the force that caused
the rearrangement was, by the argument above, almost certainly caused by an external
electrostatic field, and since only electrostatic fields exert non-negligible forces on the con-
duction charges, we can conclude that the charges have, at that time, arranged themselves
in such a way that the total electrostatic force on the free charges in the material is zero.
At last we are ready to bring our threads of logic and reason above to a conclusion, one
where GLE will play a crucial, and quantitative, role! When the conductor is in ESE, we
can see that the following five propositions must be true, and of course, you should learn
them, and more importantly, master the chain of reasoning we use to arrive at them!
1 The electric field vanishes inside a conductor in ESE. The argument for this is
pretty much given in its entirety above. By definition, in ESE the free charges in the
bulk interior of the conductor are not moving/accelerating. Only forces exerted by a
nonzero electric field are capable of making them move relative to the underlying
lattice, and evidently this force is zero. Consequently:
$$ \vec{E}_{\rm inside} = 0 \tag{2.97} $$
Figure 2.17: An arbitrary chunk of conducting material in electrostatic equilibrium can have
no field inside, or else it wouldn’t be in equilibrium. It can have no field tangent to its
surface, or it wouldn’t be in equilibrium. From these facts we can deduce several useful
things about conductors in electrostatic equilibrium using Gauss’s Law.
Note that when we leave our coarse-grained macroscopic world where a micrometer
is “infinitesimal” and look at things at the atomic/molecular scale, $\vec{E}$ really vanishes
across the first few layers of atoms, a distance of a few nanometers, not at a math-
ematically precise “surface”. At the macroscopic scale where we consider charge to
be continuously distributed along lengths, surfaces, or volumes we will consider this
layer a few tens of angstroms thick as being, for all practical purposes, an “infinitely”
thin surface where the field “instantly” vanishes. Bear this in mind as we continue.
2 There is no net charge in the volume of matter inside a conductor in ESE. This
follows from 1 and from Gauss’s Law applied “backwards”. Consider the gaussian
surface S drawn just inside the surface of the arbitrary “blob” of conducting material
drawn in figure 2.17 above. For this surface or any other closed surface drawn en-
tirely inside the conductor (that is, not containing any part of its surface or the space
outside):
$$ \oint_S (\vec{E} = 0) \cdot \hat{n}\, dA = 0 = \frac{1}{\epsilon_0} \int_{V/S} \rho_{\rm inside}\, dV = \frac{Q_{\rm inside}}{\epsilon_0}
   \;\Rightarrow\; \rho_{\rm inside},\, Q_{\rm inside} = 0 \tag{2.98} $$
This is “backwards” because instead of using knowledge of Q or ρ to find $\vec{E}$, we use
our knowledge of $\vec{E}$ from 1 to conclude something important about ρ and/or Q!
outside the dashed line representing the largest surface S we can construct that is
completely “inside” the figure 2.17 (which is really a few nanometers inside its “true”
surface) where it will form a coarse-grained surface charge density $\sigma \neq 0$ consisting
of charges unbalanced only in a layer a few atoms/molecules thick.
The argument is identical to that used to deduce rule 1. Suppose that there were
a nonzero electric field parallel to the surface of the conductor “just outside” of the
conductor. The only thing that can cancel that field so that it goes to zero by the time
one is inside the conductor is the field created by free charge inside the conductor
itself. The electric field is continuous everywhere but at the location of true point
charges, so the field outside must penetrate at least that surface layer a few atoms
thick into the conductor before it can be completely cancelled in the coarse grained
limit, but even in this layer, if it were macroscopically nonzero, it would act on the free
charges there and push them around, contradicting the assertion that the conductor
is in ESE!
From this we can conclude that by the time the charge on the isolated conductor
stops rearranging (reaches ESE) when we put it in an electrostatic field or add a net
charge to it, there can be no electric field component parallel to its surface at – “just”
outside – that surface.
$$ E_\perp = 4\pi k_e \sigma = \frac{\sigma}{\epsilon_0} \tag{2.101} $$
Note well that all of these properties are for ESE only! As we will shortly learn, con-
ductors that carry nonzero currents are not in ESE and do have nonzero electric fields
inside that are parallel to the surfaces. I often ask questions that test whether or not you
understand this on exams, so be careful!
Figure 2.18: A conductor with an arbitrary shape near an external charge rearranges its
charge into a surface charge that cancels the field inside and causes the field near the
surface to be perpendicular to the surface.
• Where is the field inside strongest? (The field inside is zero everywhere, trick ques-
tion.)
• Given the conductor and the charges, can we sketch a guesstimate of the field in the
plane of the figure? (Yes, done for you above. Note the use of the rule that the field
lines enter or leave the surface of the conductor at right angles. Of course in reality
the conductor and location of external charge could/would be three dimensional and
everything could be more complicated...)
• Is the entire conductor electrically neutral? (No, charge on the surface only has rear-
ranged, with negative electrons being attracted to the positive charges and getting as
“close as they can” to them (while still remaining as far apart as possible from each
other, in competition) and leaving behind positive charges on the atoms as “close
as possible” to the nearby negative charges ditto. The + and - signs on the figure
represent a possible visualization of this surface charge, which is related to the field
outside by:
$$ E_\perp = 4\pi k_e \sigma $$
from Gauss’s Law plus our knowledge that the field vanishes inside. )
• Is the interior of the conductor electrically neutral? (Sure, it must be. If it weren’t
the charges there would create a field (see Gauss’s Law!) and move away from
one another until they reach the surface and become part of the surface charge
distribution.)
• Can we tell just from the figure whether or not the conductor is overall electrically
neutral (has a net charge or not)? (No, not really. The lines of force in the figure
above suggest that it might be, but we drew them in response to the question above,
right? So there isn’t any real reason to rely on them. What we do know is that if it
isn’t neutral, all of the surplus charge will be located on the surface of the conductor,
arranged in just the right way that the field lines leave the surface at right angles.)
Make sure that you understand the ideas underlying all of these answers.
In the figure above, two conducting plates with facing area A, with wires attached to them
are schematically illustrated. The plates are deliberately drawn to be thick and the gap be-
tween the plates is similarly exaggerated. We assume that the plates are large compared
to this gap.
Suppose equal and opposite charges ±Q are placed on the plates (and prevented from
flowing together through the conducting wires). We know that the field inside the shaded
metal region must be zero once the plates are in electrostatic equilibrium. We also know
that the charges have to spread out on the surface(s) of the conductors. Finally, we know
that the opposite charges will attract across the gap between the plates.
The charge distribution illustrated above, with the charges spread out uniformly on the
facing surfaces of the plates as ±σ = ±Q/A satisfies all of these conditions. As we have
seen, the field of a single plane sheet of charge is $E = \frac{\sigma}{2\epsilon_0} = 2\pi k_e \sigma$, directed away from
a positive surface charge density.
The field lines from the upper plate go up above the surface layer +σ and down below
it. Similarly the field lines go down above the surface layer −σ and up beneath it. The
idealized field lines from each surface charge layer go all the way to infinity, where the
total field is the vector sum of the two fields, one from the upper layer +σ, the other from
the lower layer −σ.

Figure 2.19: Opposite charges placed on two facing conducting plates spread out to form
surface charge layers. This is exactly what is needed to cancel the fields of the two layers
in the plates themselves while adding together in the space between the plates.
(Region labels: “cancel” above +σ, “add!” between the layers, “cancel” below −σ.)
As you can see in the figure, above +σ the up field from the upper layer and the down
field from the lower layer cancel, making the field zero (as desired) everywhere in the
metal plate above +σ. The same is true below the lower layer −σ. In between the plates,
though, the field from the upper layer is down, the field from the lower layer is down also
and hence the total field is:
$$ E_{\rm tot} = E_u + E_l = \frac{\sigma}{2\epsilon_0} + \frac{\sigma}{2\epsilon_0} = \frac{\sigma}{\epsilon_0} $$
down. The field runs from the positive surface layer to the negative surface layer and is
zero everywhere inside the bulk conductor and for that matter in the air above and below the
plates!
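The superposition argument can be checked with a tiny sketch that adds the fields of two idealized infinite sheets. The coordinate choice (z pointing up, sheets at z = ±gap/2), the function names and the units σ = ε0 = 1 are all illustrative assumptions.

```python
def sheet_field_z(z, z_sheet, sigma, eps0=1.0):
    """z-component of the field of an infinite sheet located at z_sheet.

    The magnitude is sigma / (2 eps0) and it points away from a positive sheet.
    """
    direction = 1.0 if z > z_sheet else -1.0
    return direction * sigma / (2.0 * eps0)

def two_plate_field_z(z, gap=1.0, sigma=1.0, eps0=1.0):
    """Superpose +sigma at z = +gap/2 and -sigma at z = -gap/2."""
    upper = sheet_field_z(z, +gap / 2.0, +sigma, eps0)
    lower = sheet_field_z(z, -gap / 2.0, -sigma, eps0)
    return upper + lower

for z in (2.0, 0.0, -2.0):
    print(f"z = {z:+.1f}  E_z = {two_plate_field_z(z):+.2f}")
```

Above and below both layers the two contributions cancel. Between them they add to σ/ε0, with the negative sign of E_z indicating that the field points down, from the positive layer toward the negative one.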
This is an important example as finding this field in terms of σ = Q/A is a required step
for finding first the potential difference between the two plates (next chapter) and then the
capacitance of this arrangement of conductors (the chapter after that).
Note well! The charges spread out on these surfaces must be equal and opposite!
This is true even if one puts different charges on the two plates! You will work some
examples for spherical conducting shells for homework and should pay attention to this
happening there as well, and for the same reasons.
As noted at the beginning of week 1, the ability to demonstrate things like Coulomb’s Law
revolves around several things. One is the ability to accurately measure very small forces
– this Coulomb was able to do with his personally invented torsional balance. The other
was the ability to create controlled amounts of charge and place it on isolated conductors
on his balance.
This section is intended to give you some idea of how one can generate charge (by
means of friction or induction) and how one can then use it to generate like amounts of
charge for experiments. The primary two means for the latter are charging by induction
and charge transfer.
Charging by induction is illustrated below:
[Figure 2.20: charging by induction, showing a nearby charge −q and a conducting sphere that starts with Q = 0.]
In the first panel (a), a neutral, spherical conductor is connected to “ground”, which
can be thought of as a really, really big conductor, a reservoir of charge that generates
essentially no additional field no matter how much charge you pull from it or deliver to it.
Note well the symbol used for ground.
Second (b) a charged object (perhaps prepared by the triboelectric effect, rubbing a
hard rubber rod with fur to produce the negative charge shown, or using a crude electrostatic
generator) is brought near the conductor. There it attracts charge of the opposite sign and
repels charge of the same sign which tries to get as far away as possible, which happens
to be the ground.
Third (c) the connection to ground is removed, isolating the charge on the sphere, and
the induction charge is removed, producing:
(d) a charged, isolated conducting sphere.
[Figure 2.21, panels (a) and (b): a charged object −q brought near two conducting spheres, which end up carrying charges −Q and +Q.]
It is not strictly necessary to use the ground. You can also produce equal and opposite
charges by using two spheres connected with a wire, bringing the charged object near
one and pushing charge over to the other before disconnecting the wire as before. This
is schematically illustrated in figure 2.21 above. Since the two objects began electrically
neutral, they will have equal and opposite charges!
To produce the same charge on two identical conducting spheres, it suffices to charge
one sphere up as shown in figures 2.20 or 2.21 and then bring it into contact with an
identical sphere. The charge then splits onto the two spheres symmetrically, leaving them
both with half of the original charge. This process can be repeated with more spheres,
producing a series of spheres with Q, Q/2, Q/4, Q/8 on them. This suffices to be able
to demonstrate the needed bilinearity in charge in Coulomb’s Law, provided only that one
can measure very small forces and distances with some accuracy.
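The halving procedure and the linearity it is meant to expose can be sketched as follows. The helper names, the idealization of the spheres as point-like charges and the units Q = k_e = r = 1 are illustrative assumptions.

```python
def halving_series(q0, n_spheres):
    """Charges obtained by repeatedly sharing with an identical neutral sphere."""
    charges = [q0]
    for _ in range(n_spheres - 1):
        charges.append(charges[-1] / 2.0)   # contact splits the charge evenly
    return charges

def coulomb_force(q1, q2, r, k_e=1.0):
    """Magnitude of the Coulomb force between two point-like charges."""
    return k_e * abs(q1 * q2) / r**2

charges = halving_series(q0=1.0, n_spheres=4)
reference = coulomb_force(charges[0], charges[0], r=1.0)
for q in charges:
    # Force on a test sphere carrying q, relative to the force when it carries Q.
    print(q, coulomb_force(charges[0], q, r=1.0) / reference)
```

Halving one charge at fixed separation halves the force, and halving both quarters it. That is the bilinearity a torsion balance experiment would be looking for.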
[Figure: Van de Graaff generator. Labels: charged conducting sphere (electrode), corona effect transfer, rubber belt, friction charge transfer, motor.]
surface of the sphere. One has to push further charge up through the hole against the
force exerted by the charge already on the sphere, so the motor at the bottom has to do
work in order to increase or maintain the charge on the sphere.
Van de Graaff generators were the basis of the very first “atom smashing” particle accelerators
used to probe nuclear structure. They are still in use today in research accelerators55.
They were quickly largely replaced by e.g. cyclotrons – described elsewhere in
this text – and other accelerators capable of achieving more than the 1-30 MeV particle
energies they can produce. While Van de Graaff generators were for a time used or considered
for the production of radionuclides used in nuclear medicine, I was able to find no
real evidence that they are currently in any sort of medical production environment. The
much more compact cyclotron, on the other hand, has almost become a standard piece of
hospital equipment, because many of the most useful isotopes have very short half-lives
(deliberately!) and hence have to be produced right next to where they will be used (as
close as “down the hall”) in order for the isotopes not to decay below useful levels during
the time required for transportation.
55
Duke University has a high-resolution tandem Van de Graaff accelerator as of the time of this writing – I
helped to design its beam optics as a project in my senior year at Duke as an undergraduate.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
[Figure: line of charge on the x axis; labels: θ1, θ2, y, λ, x1, x2, x.]
A uniform line of charge with charge per unit length λ0 runs from x1 to x2 (where x1 < x2
by convention) on the x axis. Find both components of the electric field at an arbitrary point
y on the y axis. Note that x1 and x2 are arbitrary aside from their ordering, so your answer
should make sense for e.g. x1 < 0 and x1 > 0.
Note that this problem is worked for you as an example both in class and in the text.
Why, then, you might ask yourself, is it also on the homework? Many of the examples
worked in class or the text are very nearly the only problems of their type that can be
sanely solved by e.g. integration by ordinary mortal humans. You will not learn it from
just seeing me present it, or reading its presentation in a textbook. You must do it yourself
– ideally enough times and carefully enough to be able to do it yourself without looking
back at the solution, easily – in order to learn this problem and the ideas it archetypically
represents and make it/them your own.
All problems that are presented in lecture, in the textbook, and as homework
problems are extremely likely to show up as quiz or exam problems! In some cases,
“extremely likely” means certain. In a subset of those cases I might even tell you that
it is certain. But regardless, a good student will always be able to solve every homework
problem perfectly, without looking, by exam time. An excellent student – one who deserves
an A in the course – will be able to explain what they are doing as they do so (for example,
to other members of their study group) and will be able to handle minor variations that
make the problem not quite identical to the lecture/text/homework it is based on.
Just something to keep in mind while working on these problems in groups. The home-
work in this textbook is (unlike that of many textbooks) carefully designed to direct your
study activities to where they will pay off. The “gold standard of learning” is being able
to articulate your solutions well enough that you could teach a novice how to solve every
problem of your homework a month after doing it.
Problem 3.
θ0
R
Problem 4.
λ0
A ‘point dipole’ $\vec{p}$ is located a distance r from an infinitely long line of charge with a uniform
linear charge density +λ0 . Assume that the dipole is aligned with the field produced
by the line charge. Determine the force acting on the dipole. Is it attracted to or repelled
by the line?
Note that there are two ways to do this problem. One uses calculus and
$$F_r = -\frac{dU}{dr} = \frac{d(\vec{p} \cdot \vec{E})}{dr}$$
The other assumes a finite size dipole $\vec{p} = q\vec{\ell}$ and uses e.g. the binomial expansion of the
force in the limit ℓ ≪ r as you did on your homework in the previous chapter to arrive at the
result for a ‘point dipole’. You may want to look back at those problems as you do this, and
compare the difficulty of the two methods.
Problem 5.
ρ0
b
a
A thick, nonconducting spherical shell of inner radius a and outer radius b has a uniform
volume charge density ρ(r) = ρ0 .
Problem 6.
ρ0
a) Find the total charge in a chopped-off section of the infinite cylinder of (finite) length
ℓ.
c) Let a = 0. Find the electric field (now that of a uniform cylinder of charge) everywhere.
Problem 7.
b
a
+q
A spherical conducting shell with zero net charge has inner radius a and outer radius
b. A point charge q is placed at the center of the shell (the origin) as shown.
a) Use Gauss’s Law and the properties of conductors in equilibrium to find the electric
field in the three regions:
b) Find the charge density on the inner and outer surfaces of the shell.
Problem 8.
Problem 9.
+Q 0
−Q0
Consider three “thin” concentric conducting spherical shells with radii a < b < c respec-
tively. Initially all three shells are neutral. Then a negative charge −Q0 is placed on the
innermost sphere, a matching positive charge +Q0 is placed on the outermost sphere, and
the arrangement allowed to come to equilibrium.
a) Find the electric field everywhere and plot it. You will probably find this easier to do if
you let each shell have a small (relative to a) finite thickness as drawn above.
b) Make a table showing the net charge on the inner and outer surfaces of each con-
ducting shell.
Problem 10.
The electric field vanishes inside a uniform spherical shell of charge because the shell
has exactly the right geometry to make the 1/r 2 field produced by opposite sides of the
shell cancel according to the intuition we developed from our derivation of Gauss’s Law. It
isn’t a general result for arbitrary symmetries, however.
Consider a ring of charge of radius R and linear charge density λ. Pick a point P that
is in the plane of the ring but not at the center.
a) Write an expression for the field produced by the small pieces of arc subtended by
opposed small angles with vertex P , along the line that bisects this small angle.
b) Does this field point towards the nearest arc of the ring or the farthest arc of the ring?
c) Suppose a charge −q is placed at the center of the ring (at equilibrium). Is this
equilibrium stable56 ?
d) Suppose the electric field dropped off like 1/r instead of 1/r 2 . Would you expect the
electric field to vanish in the plane inside of the ring? Would this be a good form for
the electric field in Edwin Abbott’s novel Flatland so that they could have a Gauss’s
Law too57 ?
Problem 11.
56
As a parenthetical aside, note that this is the problem with the ringworld described in Larry Niven’s famous
Ringworld series of science fiction novels, as gravitational attraction has the same form as the electrostatic
attraction discussed in this problem.
57
Alternatively, could a flatlander speculate that reality was really three dimensional because of the apparent
failure of an expected 1/r force law? Questions such as this are highly relevant to modern field theorists hoping
to infer extra/hidden dimensions.
ρ0
a) Show that at a point within the sphere a distance r from the center the electric field
is given by:
$$\vec{E} = \frac{\rho_0 \vec{r}}{3\epsilon_0} = \frac{4\pi k \rho_0 \vec{r}}{3}$$
b) Material is removed from the sphere to create a spherical cavity of radius b = a/2
with center at x = b on the x axis (shown above). Show that the electric field inside
the cavity is uniform and equal to:
$$\vec{E} = \frac{\rho_0 \vec{b}}{3\epsilon_0} = \frac{4\pi k \rho_0 \vec{b}}{3}$$
Hint: By far the easiest way to attack this problem is to imagine that the “hole” is made
up of a sphere of uniform charge density −ρ0 and radius b that is superposed on the
uniform sphere of charge density ρ0 and radius a. In that way the two charge densities
cancel and leave “the cavity”, while you can easily find the fields using the results of part
(a) with a bit of algebra. Also, draw big pictures of the spheres. You have to add vectors
in the hole! If you don’t make a big sphere with a hole large enough to draw vectors in, it’s
going to be really hard to visualize what’s going on accurately enough to guide you when
you try to add up the field. If you do a really good picture, you may see the trivial way to do
the addition that actually makes this problem rather easy (given (a)) instead of a matter of
adding up vector components the hard way!
[Figure: a small cube with one corner at (x0, y0, z0) and edges ∆x, ∆y, ∆z.]
Consider a small gaussian surface in the shape of a cube with faces parallel to the
xy, xz, and yz planes sitting in a region where there is a continuous electric field. Let the
corner nearest the origin be located at $\vec{r}_0 = (x_0, y_0, z_0)$ and the cube edge lengths be
∆x = ∆y = ∆z in the directions parallel to the different axes.
Since the electric field is continuous, each component of the field can be expanded in
a Taylor series:
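\begin{align}
\vec{E}(\vec{r}_0 + \Delta\vec{r}) = {}& \left( E_x(\vec{r}_0) + \left.\frac{\partial E_x}{\partial x}\right|_{\vec{r}_0} \Delta x + \left.\frac{\partial E_x}{\partial y}\right|_{\vec{r}_0} \Delta y + \left.\frac{\partial E_x}{\partial z}\right|_{\vec{r}_0} \Delta z + \ldots \right) \hat{x} \notag \\
&+ \left( E_y(\vec{r}_0) + \left.\frac{\partial E_y}{\partial x}\right|_{\vec{r}_0} \Delta x + \left.\frac{\partial E_y}{\partial y}\right|_{\vec{r}_0} \Delta y + \left.\frac{\partial E_y}{\partial z}\right|_{\vec{r}_0} \Delta z + \ldots \right) \hat{y} \notag \\
&+ \left( E_z(\vec{r}_0) + \left.\frac{\partial E_z}{\partial x}\right|_{\vec{r}_0} \Delta x + \left.\frac{\partial E_z}{\partial y}\right|_{\vec{r}_0} \Delta y + \left.\frac{\partial E_z}{\partial z}\right|_{\vec{r}_0} \Delta z + \ldots \right) \hat{z} \tag{2.102}
\end{align}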
∆x∆y∆z, show that the net electric flux out of this box is:
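$$\sum_{\rm sides} \vec{E} \cdot \hat{n}\, \Delta A = \phi_{\rm net} = \left( \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \right) \Delta V = \vec{\nabla} \cdot \vec{E}\, \Delta V$$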
Note well, to get this result you need to eliminate certain components in the full expansion.
To accomplish this, you will need to neglect any term that is second order in ∆x, ∆y, or
∆z.
This is justified by taking the differential limit: ∆x → dx, etc. Then Gauss’s Law as we
have thus far learned it becomes the following vector differential form:
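$$\sum_{\rm sides} \vec{E} \cdot \hat{n}\, dA = \vec{\nabla} \cdot \vec{E}\, dV = \frac{\rho\, dV}{\epsilon_0}$$
or
$$\vec{\nabla} \cdot \vec{E} = \frac{\rho}{\epsilon_0} \tag{2.103}$$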
Congratulations! You’ve just derived Gauss’s Law in its vector differential form (and,
incidentally, have derived the divergence theorem for vector fields if we extend the sums
above back to integrals by summing over all the little differential cubes in an extended
volume with interior surface contributions cancelling out). We won’t use this this semester,
but it is very important to start to think about how the one (integral) form is equivalent to
the other (differential) form, as the latter turns out to be very useful!
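As a numerical sanity check of the flux-per-volume result, here is a minimal Python sketch. It sums the flux through the six faces of a small cube and divides by the cube volume. The test field (the interior field of a uniformly charged sphere, E = ρ r/(3ε0), in units where ρ = ε0 = 1) and all function names are illustrative assumptions, not part of the problem statement.

```python
def sphere_interior_field(x, y, z, rho=1.0, eps0=1.0):
    """Assumed test field: interior of a uniform ball of charge, E = rho * r / (3 * eps0)."""
    c = rho / (3.0 * eps0)
    return (c * x, c * y, c * z)


def flux_per_volume(field, corner, edge):
    """Net outward flux through a small cube, divided by the cube volume.

    Each face integral is approximated by the normal field component at
    the face center times the face area (midpoint rule).
    """
    x0, y0, z0 = corner
    xm, ym, zm = x0 + edge / 2, y0 + edge / 2, z0 + edge / 2
    area = edge * edge
    flux = 0.0
    # Faces perpendicular to x: outward normal is +x on the far face, -x on the near face.
    flux += (field(x0 + edge, ym, zm)[0] - field(x0, ym, zm)[0]) * area
    # Faces perpendicular to y.
    flux += (field(xm, y0 + edge, zm)[1] - field(xm, y0, zm)[1]) * area
    # Faces perpendicular to z.
    flux += (field(xm, ym, z0 + edge)[2] - field(xm, ym, z0)[2]) * area
    return flux / edge**3


div_estimate = flux_per_volume(sphere_interior_field, (0.1, 0.2, 0.3), 1e-3)
print(div_estimate)  # should be close to rho / eps0 = 1 in these units
```

For this particular field the divergence is the same everywhere inside the sphere, so moving the corner or shrinking the edge should not change the estimate beyond rounding.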
Week 3: Potential Energy and
Potential
• The change in electrostatic potential energy moving a charge between two points in
the field of other charges is:
$$\Delta U(\vec{x}_0 \to \vec{x}_1) = -\int_{\vec{x}_0}^{\vec{x}_1} \vec{F} \cdot d\vec{x} \tag{3.1}$$
where $\vec{F}$ is the total force due to all other charges.
• The vector electrostatic force can be found from the potential energy function by
taking its negative gradient:
$$\vec{F} = -\vec{\nabla} U \tag{3.2}$$
• For charge density distributions with “compact support” (ones we can draw a ball
around, basically) we by convention define the zero of the potential energy function
to be at ∞:
$$U(\vec{x}) = -\int_{\infty}^{\vec{x}} \vec{F} \cdot d\vec{x} \tag{3.3}$$
For point charges q1 and q2 , it is just:
$$U(\vec{x}_1, \vec{x}_2) = \frac{k q_1 q_2}{|\vec{x}_1 - \vec{x}_2|} \tag{3.4}$$
• Since the potential energy is just a scalar and satisfies the superposition principle,
we can evaluate the total energy of a system of point charges as:
$$U_{\rm tot} = \frac{1}{2} \sum_{i \ne j} \frac{k q_i q_j}{|\vec{x}_i - \vec{x}_j|} \tag{3.5}$$
(there is a similar integral expression for continuous charge distributions we will address
later) where the 1/2 is to compensate for double counting in the sum.
so that the potential of a point charge in coordinates centered on the charge is just:
$$V(\vec{r}) = \frac{kq}{r} \tag{3.7}$$
• The potential is to the field as the potential energy is to the force, so:
$$V(\vec{x}) = -\int \vec{E} \cdot d\vec{x} + V_0 \tag{3.8}$$
with V0 an arbitrary constant of integration, used to set a suitable zero of the potential.
For compact charge distributions:
$$V(\vec{x}) = -\int_{\infty}^{\vec{x}} \vec{E} \cdot d\vec{x} \tag{3.9}$$
and
$$\vec{E} = -\vec{\nabla} V \tag{3.10}$$
or
$$V_{\rm tot}(\vec{x}) = \int \frac{k\, dq_0}{|\vec{x} - \vec{x}_0|} = \int \frac{k \rho(\vec{x}_0)\, d^3r_0}{|\vec{x} - \vec{x}_0|} \tag{3.12}$$
• Conductors at electrostatic equilibrium are equipotential. We can therefore speak of
the potential difference between two conductors in electrostatic equilibrium where it
doesn’t matter what path we use to go from one conductor to the other. This also
means that if we charge one isolated conductor to some potential and then connect
it to another isolated conductor, charge will flow until the two conductors (now one)
are at the same potential, a process called charge sharing.
• In a strong enough electric field, dielectric breakdown occurs and insulators “sud-
denly” become conductors (e.g. lightning in air). Strong fields are often induced in
the vicinity of a sharp conducting point, causing a slower corona effect discharge that
is the basis for lightning rods.
This completes the chapter/week summary. The sections below illuminate these basic
facts and illustrate them with examples.
The electrostatic force is conservative. That is, the work done moving a charge between
any two points in an electrostatic field is independent of the path taken. For conservative
forces we can define the change in potential energy to be the negative work done by the
electrostatic force moving between two points:
$$\Delta U(\vec{x}_0 \to \vec{x}_1) = -\int_{\vec{x}_0}^{\vec{x}_1} \vec{F} \cdot d\vec{x} \tag{3.13}$$
The corresponding relation between the potential energy thus defined and the force is
(as usual):
$$\vec{F} = -\vec{\nabla} U \tag{3.14}$$
Consequently we see that we could equally well define the electrostatic potential energy in
terms of an indefinite integral and an arbitrary constant of integration:
$$\Delta U(\vec{x}) = -\int \vec{F} \cdot d\vec{x} + U_0 \tag{3.15}$$
that effectively sets the point where the potential energy is zero.
By convention, for charge densities that have compact support – ones that one can
draw a ball of finite radius (however large that radius might be) so that it completely contains
all of the charge – we define the potential energy to be zero at ∞, just as we did for the
gravitational potential energy:
$$\Delta U(\vec{x}) = -\int_{\infty}^{\vec{x}} \vec{F} \cdot d\vec{x} \tag{3.16}$$
(so that U0 is zero, if you prefer). We remain free to choose a different zero, however, in
any problem where doing so is computationally convenient.
Using the relations above, it is easy to show that the potential energy of two point
charges is:
$$U = \frac{k q_1 q_2}{|\vec{x}_1 - \vec{x}_2|} \tag{3.17}$$
which again looks very much like that for gravity as might be expected.
One important advantage of working with the potential energy is that it is a scalar. To
find the total potential energy of a collection of charges, we just add it up pairwise:
$$U_{\rm tot} = \frac{1}{2} \sum_{i \ne j} \frac{k q_i q_j}{|\vec{x}_i - \vec{x}_j|} \tag{3.18}$$
Note that in this sum the 1 → 2 interaction is counted twice, once as q1 q2 and once as q2 q1 .
We only wish to count it once, so we multiply the result by 1/2. Another way to deal with this
issue is to order the sum so that we simply never do a pair twice:
$$U_{\rm tot} = \sum_{i < j} \frac{k q_i q_j}{|\vec{x}_i - \vec{x}_j|} \tag{3.19}$$
This stands for “sum over all qj and all qi such that i < j” which excludes all the self-energy
i = j terms. Good thing, too, since they are all infinite!
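The two ways of organizing the pairwise sum can be compared directly. The short Python sketch below is only an illustration: the function names and the sample charges and positions are assumptions chosen for the example. It computes the total potential energy once as half of the double sum over i ≠ j and once as the ordered sum over i < j.

```python
import itertools
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def pair_energy(q1, r1, q2, r2):
    """Potential energy of two point charges at positions r1 and r2."""
    return K_E * q1 * q2 / math.dist(r1, r2)


def energy_half_double_sum(charges):
    """(1/2) * sum over all i != j; every pair appears twice, hence the 1/2."""
    total = 0.0
    for i, (qi, ri) in enumerate(charges):
        for j, (qj, rj) in enumerate(charges):
            if i != j:  # skip the (infinite) self-energy terms
                total += pair_energy(qi, ri, qj, rj)
    return 0.5 * total


def energy_ordered_sum(charges):
    """Sum over i < j; every pair appears exactly once."""
    return sum(
        pair_energy(qi, ri, qj, rj)
        for (qi, ri), (qj, rj) in itertools.combinations(charges, 2)
    )


# Hypothetical example: three microcoulomb-scale charges, positions in meters.
charges = [
    (1e-6, (0.0, 0.0, 0.0)),
    (-2e-6, (1.0, 0.0, 0.0)),
    (3e-6, (0.0, 1.0, 0.0)),
]
print(energy_half_double_sum(charges), energy_ordered_sum(charges))
```

The ordered sum does half the arithmetic, which is why it is usually the form one would actually code.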
3.2: Potential
The good thing about potential energy is that it is a scalar and easier to evaluate than the
vector force or field. However, it isn’t terribly easy! It is still a two-body interaction term and
requires us to do a nasty double sum (that becomes an even nastier double integral) when
we have a large collection of charges.
A couple of weeks ago we introduced the idea of the field to eliminate two body com-
putations for electric force and to give us the comfort of an apparent action-at-a-distance
cause of the electric force. Let us do exactly the same thing here. We will define the elec-
trostatic potential to be a scalar field of “potential energy per unit charge” that is the cause
of a charged particle placed in it having a potential energy.
The formal definition of the potential is that it is the potential energy of a small test
charge q0 interacting with all the other charges that create the potential, per unit test
charge, in the limit that this small test charge vanishes:
$$V(\vec{x}) = \lim_{q_0 \to 0} \frac{U(\vec{x})}{q_0} \tag{3.20}$$
Note that this strange-seeming condition ensures that the test charge itself doesn’t perturb
the charge distribution that produces the potential.
The SI units for potential are:
$$1\ {\rm Volt} = \frac{1\ {\rm Joule}}{1\ {\rm Coulomb}} \tag{3.21}$$
where $r = |\vec{x}|$. Alternatively we could use the definition of the field relative to the force to
define:
$$V(\vec{x}) = -\int \vec{E} \cdot d\vec{x} + V_0 \tag{3.23}$$
For charge distributions with compact support, we by convention pick the zero of potential
at ∞ so that:
$$V(\vec{x}) = -\int_{\infty}^{\vec{x}} \vec{E} \cdot d\vec{x} \tag{3.24}$$
In many cases (especially when we start to treat conductors more thoroughly in later
chapters) we will be interested in potential differences. If the field is known and well be-
haved, they can be easily computed by means of:
$$\Delta V(\vec{x}_1 \to \vec{x}_2) = -\int_{\vec{x}_1}^{\vec{x}_2} \vec{E} \cdot d\vec{x} \tag{3.25}$$

$$\vec{E} = -\vec{\nabla} V \tag{3.26}$$
which in some cases will give us a relatively easy path to find the field. If the potential is
relatively easy to find by (say) superposition (because it is a straight scalar sum or integral
over the potentials of all the contributing charges) then one can find the field by doing
relatively easy derivatives instead of sums or integrals over vector components.
Note that this relation gives us a new way to write the strength of a field in SI units
as volts per meter. Note also that there is a precise analogy between force and potential
energy and field and potential. Finally, note that once we know the potential produced by
a collection of fixed charges, we can compute the potential energy of a charge q placed in
the potential subject to the condition that the presence of the charge in the potential does
not cause significant rearrangement of the charges that create that potential as:
U = qV (3.27)
This will not always be the case! In fact, if we were picky we’d say that it is almost
never the case in nature, because atoms aren’t “solid” objects and inevitably distort in
the presence of the field of the perturbing charge. However, that doesn’t really stop us
from using this expression; we merely have to compute the potential energy in the self-
consistent perturbed potential of the other charges. It does make it a bit more difficult,
though.
3.3: Superposition
As we noted in the previous section, a major motivation for introducing potential is that it
is a scalar quantity that we can evaluate by doing sums that don’t involve the complexity
of vector components or charge-charge interactions. The rule for finding the potential of
a collection of charges is simple: We just add up the scalar potential of each (point-like)
charge independent of all the rest!
This is once again the superposition principle for electrostatics, now applied to the
scalar potential:
$$V_{\rm tot}(\vec{x}) = \sum_i \frac{k q_i}{|\vec{x} - \vec{x}_i|} \tag{3.28}$$
In words, the potential at a point in space is the simple (scalar) sum of the individual
potentials of all the charges that contribute to that total potential.
As before, when we are working at scales where there are many many elementary
point charges contributing to the potential, we can coarse grain average. That is, we can
look at a volume ∆V that is large enough to contain sufficient charge for a smooth average
charge density to result that is also small enough that we can sum over it as if it is the
integration volume element dV (or ditto for surface or linear distributions with elements dA
and dx respectively).
Then the sum becomes:
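\begin{align}
V_{\rm tot}(\vec{x}) &= \int \frac{k\, dq_0}{|\vec{x} - \vec{x}_0|} \notag \\
&= \int \frac{k\, \rho(\vec{x}_0)\, d^3r_0}{|\vec{x} - \vec{x}_0|} \quad \text{(volume)} \tag{3.29} \\
&= \int \frac{k\, \sigma(\vec{x}_0)\, d^2r_0}{|\vec{x} - \vec{x}_0|} \quad \text{(area)} \tag{3.30} \\
&= \int \frac{k\, \lambda(\vec{x}_0)\, dr_0}{|\vec{x} - \vec{x}_0|} \quad \text{(line)} \tag{3.31}
\end{align}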
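To see the sum of equation (3.28) turn into the line integral of equation (3.31), one can chop a continuous distribution into many point-like chunks and add their potentials. The Python sketch below does this for a uniform rod. The rod geometry, the sample numbers and the function names are all assumptions made for illustration, and the closed form used for comparison, 2kλ asinh(L/2y) on the perpendicular bisector, comes from doing the standard integral of 1/(x² + y²)^(1/2).

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def potential_of_points(point, charges):
    """Scalar superposition: sum of k q_i / |x - x_i| over (q_i, x_i) pairs."""
    return sum(K_E * q / math.dist(point, pos) for q, pos in charges)


def rod_as_points(lam, length, n):
    """Approximate a uniform rod on the x axis, centered on the origin,
    by n point charges dq = lam * dx placed at the chunk centers."""
    dx = length / n
    return [
        (lam * dx, (-length / 2 + (i + 0.5) * dx, 0.0, 0.0))
        for i in range(n)
    ]


lam, length, y = 1e-9, 1.0, 0.5  # hypothetical values: C/m, m, m
v_sum = potential_of_points((0.0, y, 0.0), rod_as_points(lam, length, 2000))
v_exact = 2 * K_E * lam * math.asinh(length / (2 * y))
print(v_sum, v_exact)
```

Increasing the number of chunks should bring the sum closer to the integral, which is the coarse-graining argument of the text run in reverse.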
The rules above give us two distinct ways to evaluate the potential in any given problem,
and we must look at the problem carefully to assess which one is best.
a) If the field is known, varies only in one dimension, and is integrable in some system
of coordinates, we can integrate
$$-\int E_x\, dx$$
to find the potential. For all practical purposes in this course, problems involving the
symmetric distributions of charge whose fields we can find using Gauss’s Law are
precisely the ones where it is likely to be most convenient to evaluate the potential in
this way.
It is necessary to use this approach to find the potential differences of a non-compact
charge density distribution such as an infinite line or infinite sheet. This is because
the sum of the potential of an infinite amount of charge (however it is distributed)
is infinite, which is in turn why we restrict the use of the superposition forms of the
potential that vanish at ∞ to compact charge distributions.
b) If the field is not known or discoverable from Gauss’s Law and/or is not “one dimen-
sional” in the sense that we can easily find a line to integrate over where the vector
components of the field don’t enter in a non-trivial way, we will probably be better off
computing the potential directly from the superposition principle – summing or integrating
all of the contributions to the potential from all the point charges or point-like elements
of a charge distribution to find the total.
Note that both of these approaches will yield the same answer for charge distributions
with compact support within the inevitable constant V0 for all problems to which they are
consistently applied. In fact, even for non-compact distributions they will yield the same
answer for the part that varies with the coordinates of the point once one “renormalizes” the
limiting form of the superposition answer by subtracting the appropriate infinite constant.
That’s because the negative gradient of the two forms must, of course, return the same
field!
[Figure: a dipole with charge +q at +a and charge −q at −a, and a point x on the x-axis.]
This is the same dipole studied in the chapter on field. Find the potential at
an arbitrary point on the x-axis.
This problem is deceptively simple. We know from the superposition principle that the
potential is:
$$V(x) = \sum_{i=1}^{2} \frac{k_e q_i}{r_i} = \frac{k_e q}{(x^2 + a^2)^{1/2}} - \frac{k_e q}{(x^2 + a^2)^{1/2}} = 0 \tag{3.32}$$
This is absolutely correct – the potential of a dipole vanishes on the entire plane that
symmetrically bisects the line connecting the charges.
The “deception” occurs when we try to compute the field by using $\vec{E} = -\vec{\nabla} V$. We are
ever so tempted to go e.g.:
$$E_z = -\frac{dV}{dz} = -\frac{d0}{dz} = 0 \tag{3.33}$$
which is simple, easy, and wrong! The problem is that even though the function V (x, y, z)
is zero at a point that does not mean that its slope is zero at the point! We have to use
L’Hopital’s Rule to evaluate a derivative at a point where its lower order derivatives or value
are zero.
What this means is that we have to evaluate the function for V (x, y, z) near but not on
the point where the function is zero, take the desired derivative, and then let the parameter
that describes that nearness go to zero. In this case, we need to find V (x, z) for some
small z (near zero), take the derivative, and let the value of z in the derivative go to zero.
See if you can draw pictures to verify the following algebra, for a point z ≪ a ≪ x above
the point on the x-axis.
$$V(x, z) = \frac{k_e q}{(x^2 + (a - z)^2)^{1/2}} - \frac{k_e q}{(x^2 + (a + z)^2)^{1/2}} \tag{3.34}$$
Now we can differentiate:
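\begin{align}
E_z &= -\frac{d}{dz} \frac{k_e q}{(x^2 + (a - z)^2)^{1/2}} + \frac{d}{dz} \frac{k_e q}{(x^2 + (a + z)^2)^{1/2}} \notag \\
&= -\frac{k_e q (a - z)}{(x^2 + (a - z)^2)^{3/2}} - \frac{k_e q (a + z)}{(x^2 + (a + z)^2)^{3/2}} \tag{3.35}
\end{align}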
NOW we can let z → 0 to find out what the field is on the x-axis (adding and cancelling
terms as necessary, and substituting pz = 2qa in for the dipole moment):
$$E_z = -\frac{2 k_e q a}{(x^2 + a^2)^{3/2}} = -\frac{k_e p_z}{(x^2 + a^2)^{3/2}} \tag{3.36}$$
Compare this to equation (1.26)! Hmmm, looks the same58 ! And it wasn’t that difficult,
although it was certainly more difficult than we might have expected. To see how really
easy it was, consider. We actually just obtained the exact Ez field for all points in space,
since the answer is azimuthally symmetric and we could rotate the answer to tell us the
field in planes other than the xz plane! And the Ex field is equally easy to find.
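The limiting procedure can also be checked numerically. The Python sketch below (function names and the sample values of q, a and x are illustrative assumptions) evaluates V(x, z) from equation (3.34) slightly above and slightly below the x-axis, forms a central difference in z, and compares the result with equation (3.36). It also confirms that the potential itself is zero on the axis even though its slope is not.

```python
K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def dipole_potential(x, z, q, a):
    """Equation (3.34): +q at z = +a and -q at z = -a, observed at (x, z)."""
    r_plus = (x**2 + (a - z) ** 2) ** 0.5
    r_minus = (x**2 + (a + z) ** 2) ** 0.5
    return K_E * q / r_plus - K_E * q / r_minus


def ez_numeric(x, q, a, h=1e-6):
    """E_z = -dV/dz on the x axis, from a central difference around z = 0."""
    return -(dipole_potential(x, h, q, a) - dipole_potential(x, -h, q, a)) / (2 * h)


def ez_formula(x, q, a):
    """Equation (3.36): E_z = -2 k_e q a / (x^2 + a^2)^(3/2)."""
    return -2 * K_E * q * a / (x**2 + a**2) ** 1.5


q, a, x = 1e-9, 0.1, 0.5  # hypothetical values: C, m, m
print(dipole_potential(x, 0.0, q, a))        # the potential on the axis
print(ez_numeric(x, q, a), ez_formula(x, q, a))
```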
It will turn out that Cartesian coordinates suck in so many ways when doing physics
problems. Physics is if anything naturally spherical or cylindrical – nature is only rarely
rectilinear. Let’s redo the potential problem above, but now let’s find the potential at an
arbitrary point in space in spherical polar coordinates. Remember, the math section has
a lovely little review of Cartesian, Cylindrical and Spherical coordinate systems – the big
three one needs to work with in this course – in case you have never seen spherical
coordinates before (or don’t remember them, effectively the same thing).
58
Allowing, of course, for the change in the name of the vertical axis...
Figure 3.2: A simple dipole aligned with the z-axis, in a spherical coordinate system.
Find the potential of this dipole at an arbitrary point P = (r, φ, θ). Because the
problem is manifestly azimuthally symmetric the answer cannot depend in any
way on φ (the azimuthal/longitude coordinate), so we might as well label the
point P = (r, θ) in the plane of the figure, where the answer can be azimuthally
rotated by φ about the z-axis to any other plane without changing the form of
the answer.
The potential in this problem is extremely easy to find if you can remember the law of
cosines:
$$r_1 = +\sqrt{r^2 + a^2 - 2ar\cos(\theta)} \tag{3.37}$$
$$r_2 = +\sqrt{r^2 + a^2 + 2ar\cos(\theta)} \tag{3.38}$$
Of course, if you don’t remember the law of cosines, you should visit the math chapter and
learn to derive it in two or three lines so you don’t ever forget it again, as we will use it fairly
often and you don’t want this to be an obstacle to your learning!
To find the field now, one can take the gradient of this exact result. However, actually
taking gradients is beyond the immediate scope of this course, so just bear in mind that
you can (and if you are a physics major, almost certainly sooner or later will) and otherwise
forget it. Doing so isn’t particularly simple in any event because of the fairly complicated
denominators (although it is still much easier than finding the field directly).
Consider what happens, though, when one looks at the potential at a point r ≫ a, so far
away that the dipole looks like a “point object”. To find the potential then, we must use the
binomial expansion to factor out the leading r dependence and to move the complicated
stuff from the denominator to the numerator (losing the square roots in the process). That
is:
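\begin{align}
\lim_{r \gg a} V(r, \theta) &= \frac{k_e q}{(r^2 + a^2 - 2ar\cos(\theta))^{1/2}} - \frac{k_e q}{(r^2 + a^2 + 2ar\cos(\theta))^{1/2}} \notag \\
&= \frac{k_e q}{r} \left\{ \left(1 - 2\frac{a}{r}\cos(\theta) + \frac{a^2}{r^2}\right)^{-1/2} - \left(1 + 2\frac{a}{r}\cos(\theta) + \frac{a^2}{r^2}\right)^{-1/2} \right\} \notag \\
&= \frac{k_e q}{r} \left\{ \left(1 + \frac{a}{r}\cos(\theta) - \frac{a^2}{2r^2} + \ldots\right) - \left(1 - \frac{a}{r}\cos(\theta) - \frac{a^2}{2r^2} + \ldots\right) \right\} \notag \\
&= \frac{k_e q}{r} \left\{ 2\frac{a}{r}\cos(\theta) + O\!\left(\frac{a^3}{r^3}\right) \right\} \notag \\
&\approx \frac{k_e\, 2qa}{r^2}\cos(\theta) \notag \\
&\approx \frac{k_e p_z}{r^2}\cos(\theta) = k_e \frac{\vec{p} \cdot \hat{r}}{r^2} \tag{3.40}
\end{align}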
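It is worth seeing how good the far-field approximation actually is. The following Python sketch (function names and sample values are assumptions for illustration) evaluates the exact two-charge potential using the law-of-cosines distances of equations (3.37) and (3.38) and compares it with the point-dipole form of equation (3.40). Since the first neglected term is of relative order a²/r², the fractional difference should shrink by roughly a factor of 100 each time r grows by a factor of 10.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def dipole_exact(r, theta, q, a):
    """Exact potential of +q at z = +a and -q at z = -a at the point (r, theta)."""
    r1 = math.sqrt(r * r + a * a - 2 * a * r * math.cos(theta))
    r2 = math.sqrt(r * r + a * a + 2 * a * r * math.cos(theta))
    return K_E * q / r1 - K_E * q / r2


def dipole_far_field(r, theta, q, a):
    """Point-dipole limit: k_e p_z cos(theta) / r^2 with p_z = 2 q a."""
    p_z = 2 * q * a
    return K_E * p_z * math.cos(theta) / r**2


q, a, theta = 1e-9, 0.01, math.pi / 3  # hypothetical values
for r in (0.1, 1.0, 10.0):
    exact = dipole_exact(r, theta, q, a)
    approx = dipole_far_field(r, theta, q, a)
    print(r, exact, approx, abs(approx - exact) / abs(exact))
```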
Figure 3.3: A ring of charge in the xy-plane, concentric with the z-axis. (Labels: radius a, linear charge density λ, arc element dl = a dθ.)
Suppose you are given a ring of charge with charge per unit length λ and radius
a on the xy-plane concentric with the z-axis. Find the potential at an arbitrary
point on the z axis.
Although there is a quick and easy answer to this problem (that will be apparent at
the end, if not at the beginning) we will work through this problem in detail to illustrate the
general methodology of finding a potential by integrating over a continuous distribution of
charge. The steps are:
b) Determine the differential charge of the chunk as “the charge of the chunk is the
charge per unit whatever times the differential whatever of the chunk” where ‘what-
ever’ might be length, area or volume (in this case length).
c) Write a simple expression in suitable coordinates for the differential potential pro-
duced at the point of interest by the differential (point-like) chunk of charge:
$$dV = \frac{k_e\, dq}{r}$$
where r is the distance from the chunk to the point of observation. Note well that this
is a scalar integral, making it relatively simple!
d) Integrate both sides. The left hand side becomes V (~r) at the point of observation (in
suitable coordinates). The right hand side becomes the algebraic expression of the
potential (the answer).
f) If one wishes to find the field from the potential, remember e.g.
$$E_z = -\frac{dV}{dz}$$
Beware L’Hopital’s Rule! That is, if differentiating someplace that the function itself
vanishes (or its functional dependence on certain coordinates vanishes) be sure that
you differentiate at a general point near the limit point and then take the limit!
dq = λ dl (3.42)
We integrate over all of the chunks of charge that make up the ring by integrating θ from 0
to 2π:
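\begin{align}
V(z) = \int dV &= \int_0^{2\pi} \frac{k_e \lambda a\, d\theta}{(z^2 + a^2)^{1/2}} \notag \\
&= \frac{k_e (2\pi a) \lambda}{(z^2 + a^2)^{1/2}} \notag \\
&= \frac{k_e Q}{r} \tag{3.44}
\end{align}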
where we used the fact that 2πaλ = Q, the total charge of the ring!
This final answer we can easily understand and might have even guessed without
doing an integral. All of the charge of the ring is the same distance r from the point of
observation, and potential depends only on this distance (not on direction) so the potential
is just ke times the total charge divided by that distance.
If we do indeed try to find the electric field by differentiating this last result:
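\begin{align}
E_z &= -\frac{d}{dz} \frac{k_e (2\pi a) \lambda}{(z^2 + a^2)^{1/2}} \notag \\
&= \frac{k_e (2\pi a) \lambda z}{(z^2 + a^2)^{3/2}} \notag \\
&= \frac{k_e Q z}{(z^2 + a^2)^{3/2}} \tag{3.45}
\end{align}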
Compare this to equation (2.17) above. Hmmm, looks like they are the same! However,
evaluating the potential integral and then taking its derivative seems (to me, at any rate) to
be much easier than doing the integral to find the field directly, with all of its components,
and that’s before we evaluated the Ex and Ey fields explicitly.
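The chunk-by-chunk recipe translates almost line for line into code. The Python sketch below is a minimal illustration under assumed names and sample values: it adds up k_e dq/r over small arcs dl = a dθ of the ring, compares the sum with the closed form of equation (3.44), and then recovers E_z by numerically differentiating the potential, to be compared with equation (3.45).

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def ring_potential_numeric(z, a, lam, n=1000):
    """Sum k_e dq / r over n chunks of the ring, with dq = lam * a * dtheta."""
    dtheta = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        chunk = (a * math.cos(theta), a * math.sin(theta), 0.0)
        dq = lam * a * dtheta
        total += K_E * dq / math.dist(chunk, (0.0, 0.0, z))
    return total


def ring_potential_exact(z, a, lam):
    """Equation (3.44): k_e Q / (z^2 + a^2)^(1/2) with Q = 2 pi a lam."""
    return K_E * 2 * math.pi * a * lam / math.sqrt(z * z + a * a)


def ring_ez_exact(z, a, lam):
    """Equation (3.45): k_e Q z / (z^2 + a^2)^(3/2)."""
    return K_E * 2 * math.pi * a * lam * z / (z * z + a * a) ** 1.5


def ring_ez_from_potential(z, a, lam, h=1e-5):
    """E_z = -dV/dz estimated with a central difference."""
    return -(ring_potential_exact(z + h, a, lam) - ring_potential_exact(z - h, a, lam)) / (2 * h)


z, a, lam = 0.3, 0.2, 1e-9  # hypothetical values: m, m, C/m
print(ring_potential_numeric(z, a, lam), ring_potential_exact(z, a, lam))
print(ring_ez_from_potential(z, a, lam), ring_ez_exact(z, a, lam))
```

Because every chunk sits at the same distance from the observation point, the sum agrees with the closed form essentially to rounding error, even for a small number of chunks.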
Note that we can exploit the insight we gained from this problem in a variety of ways to
answer certain questions concerning the potential “by inspection”. For example:
• An arc of charge Q that has angular width θ and radius R, at the center of curvature;
• Six charges each with charge Q/6 arranged in a hexagon that has a distance 2R
between opposing corners, at the center;
all produce a potential ke Q/R at the point of observation indicated! In all these cases a
total charge of Q is arranged in various ways a distance R from the point of observation.
For the potential, direction doesn’t matter, so all of the potentials of all of the charges that make
up these systems add to the one simple result.
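A quick numerical check of two of these “by inspection” results is sketched below in Python. The helper names, the sample values of Q and R, and the choice of a 500-chunk discretization for the arc are assumptions made for illustration. In both cases every bit of charge sits a distance R from the origin, so the scalar sum should return k_e Q/R regardless of how the charge is spread in angle.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2


def potential_at_origin(charges):
    """Scalar sum of k_e q / distance for (q, (x, y)) pairs in a plane."""
    return sum(K_E * q / math.hypot(x, y) for q, (x, y) in charges)


def hexagon(Q, R):
    """Six charges Q/6 at the corners of a hexagon with opposing corners 2R apart."""
    return [
        (Q / 6, (R * math.cos(k * math.pi / 3), R * math.sin(k * math.pi / 3)))
        for k in range(6)
    ]


def arc(Q, R, width, n=500):
    """Arc of total charge Q, radius R and angular width `width`, as n point chunks."""
    dtheta = width / n
    return [
        (Q / n, (R * math.cos((i + 0.5) * dtheta), R * math.sin((i + 0.5) * dtheta)))
        for i in range(n)
    ]


Q, R = 1e-9, 0.25  # hypothetical values: C, m
print(potential_at_origin(hexagon(Q, R)))
print(potential_at_origin(arc(Q, R, math.pi / 2)))
print(K_E * Q / R)
```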
[Figure: sphere of charge with Gaussian surfaces S (inner) and S (outer), each of radius r.]
If we want to find the potential produced by a spherical shell (or other spherical distri-
bution of charge) and try to find it by direct integration of the potential of all the charges
that make up the shell, we’ll quickly discover that while it is easy to write down the integral
we need to solve in some system of coordinates, it isn’t so easy to do the integral. It’s
still possible – good students of calculus or students who just want a challenge can tackle
it with a reasonable chance of success – but it isn’t terribly easy. It’s a useful example,
though, useful enough that I include it in the book after this “easy way” example, for those
very students who want to give it a try on their own and then have some way to check or
correct their work.
On the other hand, finding the electric field from Gauss’s Law is very easy (and is done
in detail in Week 2 above, so we won’t repeat the steps here). Try it on your own to make
sure that you get:
$$\vec{E} = 0 \qquad (r < R)$$
$$\vec{E} = \frac{k_e Q}{r^2}\, \hat{r} \qquad (r > R)$$
in sphere-centered spherical coordinates. We recall that the potential of any charge dis-
tribution with compact support can be found from the field by directly integrating the field
according to:
$$V(\vec{r}) = -\int_{\infty}^{\vec{r}} \vec{E} \cdot d\vec{l} \tag{3.46}$$
In this case, we integrate piecewise from the outside in to find the potential outside and inside
of the sphere, accordingly. Outside:
$$V(\vec{r}) = -\int_{\infty}^{r} \frac{k_e Q}{r^2}\, dr = \frac{k_e Q}{r} \tag{3.47}$$
Figure 3.5: Geometry for finding the potential of a uniform spherical shell of constant
charge density σ by direct integration.
Consider figure 3.5. You should recognize it as being almost exactly the same geometry
as was used to integrate to find the (much more difficult) electric field of the spherical
shell last week in a similarly advanced example. In a way, it would be a lot easier to just
do these two examples in the opposite order, as it is a lot easier to integrate to find the
potential than the field in the first place, and once we have done so we can always find the
field by differentiating.
As before, we lose nothing by putting a point P at a distance R from the origin. We
consider the charge dq of a tiny patch dA on the surface of the sphere, and write down the
potential of this patch at P :
$$
dV = \frac{k_e\, dq}{s} = \frac{k_e \sigma r^2\, d\cos(\theta)\, d\phi}{(R^2 + r^2 - 2Rr\cos(\theta))^{1/2}} \tag{3.49}
$$
We integrate both sides, the right hand side over the entire solid angle:
$$
V = \int dV = \int \frac{k_e\, dq}{s} = \int_{-1}^{1} \int_{0}^{2\pi} \frac{k_e \sigma r^2\, d\cos(\theta)\, d\phi}{(R^2 + r^2 - 2Rr\cos(\theta))^{1/2}} \tag{3.50}
$$
We can do the φ integral immediately and factor out all the constants:
$$
V = 2\pi r^2 \sigma k_e \int_{-1}^{1} \frac{d\cos(\theta)}{(R^2 + r^2 - 2Rr\cos(\theta))^{1/2}} \tag{3.51}
$$
This is much easier to integrate than the vector relation of the field chapter example:
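$$
\begin{aligned}
V &= 2\pi r^2 \sigma k_e \int_{-1}^{1} \frac{d\cos(\theta)}{(R^2 + r^2 - 2Rr\cos(\theta))^{1/2}} \\
  &= \frac{2\pi r^2 \sigma k_e}{-2Rr} \int_{-1}^{1} \frac{-2Rr\, d\cos(\theta)}{(R^2 + r^2 - 2Rr\cos(\theta))^{1/2}} \\
  &= \frac{2\pi r^2 \sigma k_e}{-2Rr}\; 2\left(R^2 + r^2 - 2Rr\cos(\theta)\right)^{1/2} \Big|_{-1}^{1} \\
  &= \frac{2\pi r^2 \sigma k_e}{-2Rr}\; 2\left((R - r) - (R + r)\right) \\
  &= \frac{2\pi r^2 \sigma k_e}{-2Rr}\; 2(-2r) \\
  &= \frac{k_e (4\pi r^2 \sigma)}{R} = \frac{k_e Q}{R}
\end{aligned}
\tag{3.52}
$$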
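If you would like to check the integral numerically before (or after) doing it by hand, here is a minimal Python sketch. It evaluates equation (3.51) with a simple midpoint rule and compares the result with the closed form. The function name, the charge, and the radii are illustrative choices of mine, not anything fixed by the derivation; for the one point inside the shell the comparison uses the constant value ke Q/r that goes with the zero field there.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def shell_potential_numeric(Q, r, R, n=20_000):
    """Midpoint-rule estimate of eq. (3.51): shell of radius r, observed at distance R."""
    sigma = Q / (4.0 * math.pi * r**2)        # uniform surface charge density
    du = 2.0 / n                              # u = cos(theta) runs over [-1, 1]
    total = 0.0
    for i in range(n):
        u = -1.0 + (i + 0.5) * du
        total += du / math.sqrt(R**2 + r**2 - 2.0 * R * r * u)
    return 2.0 * math.pi * r**2 * sigma * K_E * total

Q, r = 1.0e-9, 0.10              # assumed values: 1 nC on a 10 cm shell
for R in (0.05, 0.20, 1.00):     # one point inside the shell, two outside
    numeric = shell_potential_numeric(Q, r, R)
    closed = K_E * Q / max(r, R)  # k_e Q / R outside, constant k_e Q / r inside
    print(f"R = {R:4.2f} m   numeric = {numeric:.6f} V   closed form = {closed:.6f} V")
```

The midpoint rule is used only because it needs nothing beyond the standard library; any quadrature routine would do as well.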
[Figure: a solid sphere of charge with Gaussian surfaces S (inner) and S (outer) of radius r.]
Find the field and the potential at all points in space of a solid insulating sphere
with uniform charge density ρ and radius R.
If you will recall, finding the field of a solid sphere of charge is both an example in
the text above and was a homework assignment a couple of weeks ago – so by now you
should have gone over it repeatedly and made it your own. The result was:
$$
E_r = \frac{k_e \left(\frac{4\pi R^3 \rho}{3}\right)}{r^2} = \frac{k_e Q}{r^2} \qquad r > R
$$
and
$$
E_r = k_e \frac{4\pi\rho}{3}\, r = \frac{\rho r}{3\epsilon_0} \qquad r < R
$$
for the exterior and interior of the sphere (where we used 4πke = 1/ε0 in the last equation
just so you don’t completely forget this relation as we prefer to work with ke but one day
you’ll need to be able to work with ε0). So just to humor me, get out paper and prove (to
yourself, if nobody else) that you can still get this result, starting with Gauss’s Law and
without looking.
With the field(s) in hand, we now recapitulate the reasoning of the previous example.
The distribution of charge has compact support, so we can integrate in from infinity to find
the potential (relative to infinity):
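$$
\begin{aligned}
V(r) &= -\int_{\infty}^{r} \vec{E} \cdot d\vec{l} = -\int_{\infty}^{r} E_{r>R}\, dr \\
     &= -\int_{\infty}^{r} k_e Q\, r'^{-2}\, dr' \\
     &= \frac{k_e Q}{r} \qquad r > R
\end{aligned}
\tag{3.53}
$$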
and we find, as hopefully you had already anticipated, that the potential of the solid sphere
outside is that of a point charge with the same total charge at the origin, in perfect
correspondence with the field.
The place things get more interesting is when we try to evaluate the potential inside the
sphere. The potential is defined as an integral in from ∞, but the field changes functional
form at r = R. We therefore have to do the integral piecewise, doing first the integral
from ∞ to R, then from R to r. This is why we wrote out both terms in the spherical shell
example above, even though the field inside was zero (and so was that part of the integral)
– we want to get in the habit of always doing the integral piecewise and simply being happy
when one or another piece is zero, rather than either expecting it or forgetting that this is
what we are really doing. Thus:
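$$
\begin{aligned}
V(r) &= -\int_{\infty}^{r} \vec{E} \cdot d\vec{l} = -\int_{\infty}^{R} E_{r>R}\, dr - \int_{R}^{r} E_{r<R}\, dr \\
     &= -\int_{\infty}^{R} k_e \frac{4\pi R^3 \rho}{3}\, r'^{-2}\, dr' - \int_{R}^{r} k_e \frac{4\pi\rho}{3}\, r'\, dr' \\
     &= k_e \frac{4\pi R^2 \rho}{3} + k_e \frac{2\pi\rho}{3}\left(R^2 - r^2\right) \\
     &= 2\pi k_e \rho R^2 - k_e \frac{2\pi\rho}{3}\, r^2 \qquad r < R
\end{aligned}
\tag{3.54}
$$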
Let’s think a teensy bit about this result, and then plot it (as we did for the field) to help
us remember it, as (recall) the uniform ball of charge is the basis of the simplest model
for an atom and hence the key to easily understanding lots of things such as polarization,
ionization, and more. First of all, note that the potential is (by the meaning of integrals
in the first place) the area under the Er(r) curve from r to ∞. $\vec{E}$ is continuous but not
smooth (look back at figure 2.14 and note the cusp at r = R), but V (r) is continuous
and smooth at r = R – the function and its first derivative match at the point, although
the second derivatives differ. Outside the potential drops off like 1/r, a monopolar potential
that corresponds to the monopolar field. Inside, the potential increases like an upside down
quadratic all the way to the origin, where it has its maximum value!
[Plot: V(r) versus r, with the value ke Q/R marked on the V axis and R marked on the r axis.]
Figure 3.7: The potential produced by a uniform sphere of charge both inside and outside,
as a function of r.
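To see the matching at r = R concretely, here is a small Python sketch of the piecewise potential from equations (3.53) and (3.54). It checks that the potential at the centre is 3/2 of its surface value and estimates Er = −dV/dr on either side of the surface by a central difference. The helper names and the sample values of Q and R are my own assumptions for illustration.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def sphere_potential(r, Q, R):
    """Potential of a uniformly charged ball (total charge Q, radius R), zero at infinity."""
    if r >= R:
        return K_E * Q / r                                    # eq. (3.53)
    rho = 3.0 * Q / (4.0 * math.pi * R**3)                    # uniform charge density
    return (2.0 * math.pi * K_E * rho * R**2
            - K_E * (2.0 * math.pi * rho / 3.0) * r**2)       # eq. (3.54)

def slope(f, r, h=1.0e-6):
    """Central-difference estimate of df/dr."""
    return (f(r + h) - f(r - h)) / (2.0 * h)

Q, R = 1.0e-9, 0.10                          # assumed values: 1 nC in a 10 cm ball
V = lambda r: sphere_potential(r, Q, R)

print("V(0) / V(R)      =", V(0.0) / V(R))              # expect about 1.5
print("E_r just inside  =", -slope(V, R - 1.0e-4))      # nearly equal to the next line
print("E_r just outside =", -slope(V, R + 1.0e-4))
```

The two field estimates agree to within the small offset from the surface, which is the numerical face of the statement that V and its first derivative both match at r = R.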
Find the field and the potential relative to the reference radius r0 at all points
in space around an infinite line of charge. Explore the necessity of a reference
point (because the indefinite integral is infinite at 0 and ∞).
As before, we will assume that you already know and can easily show that the field of an
where we use the convenient property of natural logs: ln(a) + ln(b) = ln(ab) to simplify the
final expression. If we let r0 = 1 (in whatever units we are considering) this can be further
simplified to:
$$
V(r) = -2 k_e \lambda \ln(r) \tag{3.57}
$$
but this obscures the units – recall that the argument of any function with a power series
expansion e.g. ln must be dimensionless, so the “r” in this is the ratio of r in the units of
choice to “1” in the unit of choice. Note well that this does not matter whenever we compute
potential difference, which is the quantity that will be the most important one in the next
chapter/week:
$$
\Delta V(r_1 \to r_2) = -\int_{r_1}^{r_2} \frac{2 k_e \lambda}{r'}\, dr' = 2 k_e \lambda \ln\left(\frac{r_1}{r_2}\right) \tag{3.58}
$$
where the natural log is negative (recall) when r1 < r2 so r1 /r2 < 1. This makes sense!
Note well that the potential decreases when we move away from the line in the direction of
the field (as the potential energy decreases when we move in the direction of its associated
conservative force).
On your own, show that we also get this expression if we form ∆V (r1 → r2 ) = V (r2 ) −
V (r1 ) using any of the forms for V (r) given above (even the one with ∞ in it, as long as
we are permitted to subtract ∞ − ∞ = 0, which of course is not necessarily or generally
true but which can be true as the setting of the zero of the potential).
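Here is a short Python sketch of that exercise, under the assumption that the potential relative to a finite reference radius takes the form V(r) = −2ke λ ln(r/r0), which reduces to equation (3.57) for r0 = 1. It integrates the field numerically to recover equation (3.58) and then shows that V(r2) − V(r1) comes out the same for several reference radii. The function names and the sample values of λ, r1, r2 are illustrative only.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def line_potential(r, lam, r0):
    """Potential of an infinite line charge, with the zero chosen at r = r0 (assumed form)."""
    return -2.0 * K_E * lam * math.log(r / r0)

def delta_v_numeric(r1, r2, lam, n=100_000):
    """Midpoint-rule estimate of -integral from r1 to r2 of (2 k_e lam / r) dr, eq. (3.58)."""
    dr = (r2 - r1) / n
    total = 0.0
    for i in range(n):
        r = r1 + (i + 0.5) * dr
        total += 2.0 * K_E * lam / r * dr
    return -total

lam, r1, r2 = 1.0e-9, 0.02, 0.50               # assumed: 1 nC/m, moving from 2 cm to 50 cm
exact = 2.0 * K_E * lam * math.log(r1 / r2)    # closed form, negative because r1 < r2
print("numeric:", delta_v_numeric(r1, r2, lam), " closed form:", exact)

for r0 in (0.01, 1.0, 100.0):                  # the reference radius drops out of the difference
    print("r0 =", r0, " V(r2) - V(r1) =",
          line_potential(r2, lam, r0) - line_potential(r1, lam, r0))
```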
Find the field and the potential relative to the plane itself at all points in space
around an infinite plane of charge. Explore the necessity of a finite reference
point (where e.g. z = 0 is the most convenient) because the potential integrated
in from ∞ is clearly infinite.
Using Gauss’s Law (or taking the limit of e.g. a disk on its axis) you can easily show
that the electric field a distance z above an infinite plane of charge with charge density σ
is:
Ez = 2πke σ
(pointing away from the plane symmetrically on both sides) independent of z. That is, the
plane of charge creates a uniform electric field that reaches from the plane to (in principle)
∞ without change.
If we try to evaluate the potential at a finite point z relative to ∞ we get into trouble once
again because the charge distribution is non-compact:
$$
V(z) = -\int_{\infty}^{z} 2\pi k_e \sigma\, dz = \infty - 2\pi k_e \sigma z \tag{3.59}
$$
We feel uncomfortable with infinite quantities, so we either subtract away the infinity with
a new (infinite) constant of integration, or just measure the potential difference relative to
some other zero. A common and convenient one (that leads to the same result as throwing
away the infinity) is z = 0, on the plane itself. Interestingly, this is still well defined!
$$
V(z) = -\int_{0}^{z} 2\pi k_e \sigma\, dz = 0 - 2\pi k_e \sigma z = -2\pi k_e \sigma z \tag{3.60}
$$
Again we will most often be interested in computing potential differences rather than
potentials in the subsequent chapters, especially for non-compact charge distributions.
We note that the functional variation with z is such that the potential decreases when one
moves away from the plane; this is the most important thing to keep in mind when trying to
assign or check the sign of the potential (or potential difference). The field always points in
the direction of decreasing potential.
In our initial discussion above, we went from finding the electrostatic potential energy of a
charge in the field of other charges to the “potential” – a scalar field that exists at all points
in space due to the presence of charges and is the “cause” of the electrostatic potential
energy of a (test) charge placed at that point in space. We wrote down a couple of equa-
tions that each represented the potential energy of a collection of discrete charges using
the superposition principle – we added up the potential energy for each pair of charges in
the collection (counted only once!) to get the total potential energy:
$$
U_{tot} = \frac{1}{2} \sum_{i \neq j} \frac{k_e q_i q_j}{|\vec{x}_i - \vec{x}_j|} = \sum_{i < j} \frac{k_e q_i q_j}{|\vec{x}_i - \vec{x}_j|} \tag{3.61}
$$
where the inequality in the latter expression effectively causes us to count each ij pair
only once while the former just counts them in both orders but divides by two.
One way to think about the “count each pair only once” rule is to think of the energy as
being associated with a sort of bond between the two charges – there is only one “bond”
in between the two ends. We actually do the same thing when we think of the work stored
in a stretched spring with masses at both ends – we don’t count it twice just because two
objects will share that potential energy if the masses are released.
These formulas are simple enough, to be sure, but we should without any doubt do at
least one simple example just to see how this works.
Suppose we have four identical charges q arranged in a square, and would like to compute
their total potential energy. There are a variety of ways we could draw a picture to
represent this – one such way is presented in figure 3.10, where we place the four charges
at the corners with coordinates (±a, ±a) in the x-y plane.
Figure 3.10: Four identical charges q arranged on the corners of a square with sides of
length 2a, centered on the origin in the x-y plane. Note that the length of a diagonal of this
square is 2√2 a.
We could just write down the answer by inspection, but I’m going to walk you through
a particular way of understanding this result. Suppose we start with no charges anywhere
closer than ∞. In that case, there is no field anywhere in space and it costs us no work to
move a charge q from ∞ and locate it at (say) the lower left corner.
Now we want to bring in the second charge, but when we do, we have to do work
against the field/force produced by the charge that is already there. As we showed above,
the work I have to do against that charge to end up with it at the upper left hand corner
is:
$$
W_{me,12} = U_{12} = \frac{k_e q^2}{2a} \tag{3.62}
$$
where I introduced dummy indices to help us keep track of what charge we are working on
and which corner we are going to put it on (1234 starting at the lower left hand corner).
To bring in a third charge, I have to do work against the fields of both of these charges.
The work I do is stored as potential energy in the system, the same way the work I do lifting
a book of mass m off the ground by a height H against gravity is stored as the potential
energy of the book, mgH. In this case, the work done to put the third charge in the upper
right hand corner is:
$$
W_{me,13} + W_{me,23} = U_{13} + U_{23} = \frac{k_e q^2}{2\sqrt{2}\,a} + \frac{k_e q^2}{2a} \tag{3.63}
$$
(you can see I’m just using $k_e q^2/r$ for whatever the r is between the final resting points
of the charges).
Finally, bringing in the fourth charge to the lower right hand corner is clearly:
$$
W_{me,14} + W_{me,24} + W_{me,34} = U_{14} + U_{24} + U_{34} = \frac{k_e q^2}{2a} + \frac{k_e q^2}{2\sqrt{2}\,a} + \frac{k_e q^2}{2a} \tag{3.64}
$$
Now when I add them, sure, I get exactly what I expected and could have written down
directly:
$$
U_{tot} = \sum_{j=2}^{4} \sum_{i=1}^{j-1} \frac{k_e q_i q_j}{|\vec{x}_i - \vec{x}_j|} = U_{12} + U_{13} + U_{14} + U_{23} + U_{24} + U_{34} \tag{3.65}
$$
where we do not count “self-energy”, that is, any sort of Uii contribution. Self-energy is a
tricky subject! We’ll work on a specific model for it in a bit, and in the process encounter
one of the “mysteries” of physics that points along the path to a consistent (eventually) field
theory.
Summing up the terms, we get:
$$
U_{tot} = 2\,\frac{k_e q^2}{a} + \frac{k_e q^2}{\sqrt{2}\,a} \tag{3.66}
$$
This will guide us as we seek to compute the potential energy of continuous charge
distributions.
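The pair-counting rule is easy to check by brute force. The following Python sketch sums ke qi qj / rij over each pair once (the i < j form of equation (3.61)) for the four charges on the square and compares the total with equation (3.66). The function name and the sample values of q and a are assumptions made for illustration.

```python
import math
from itertools import combinations

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def total_energy(charges, positions):
    """Sum k_e q_i q_j / r_ij over each pair exactly once (i < j), as in eq. (3.61)."""
    return sum(
        K_E * charges[i] * charges[j] / math.dist(positions[i], positions[j])
        for i, j in combinations(range(len(charges)), 2)
    )

q, a = 1.0e-9, 0.05                                # assumed: 1 nC charges, square of side 2a = 10 cm
corners = [(-a, -a), (-a, a), (a, a), (a, -a)]     # charges 1..4, starting at the lower left
pair_sum = total_energy([q] * 4, corners)
closed_form = 2.0 * K_E * q**2 / a + K_E * q**2 / (math.sqrt(2.0) * a)   # eq. (3.66)

print("sum over pairs:", pair_sum)
print("eq. (3.66):    ", closed_form)
```

`combinations(range(4), 2)` generates exactly the six pairs 12, 13, 14, 23, 24, 34, so no factor of one half is needed in this form.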
We are now ready to compute the potential energy of a continuous distribution of charges.
We’ll start by generalizing the sum rules from before by coarse graining and treating each
little chunk of charge as a point charge. This is illustrated in figure 3.11.
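[Figure 3.11: two charge elements dq1 and dq2 of a distribution with density ρ, located at r1 and r2 and separated by r12.]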
We start by applying the usual ritual to express the charges in terms of differential
volumes:
$$
dq_1 = \rho(\vec{r}_1)\, d^3r_1 \qquad dq_2 = \rho(\vec{r}_2)\, d^3r_2 \tag{3.67}
$$
where $d^3r$ is another way of writing “the volume element in three dimensions for coordinate
$\vec{r}$”. We use it because in this particular context, we’re about to discover a potential V inside
an integral and it wouldn’t do to confuse it with a volume element written as dV !
Now we can easily write the (differential) potential energy of just these two chunks!
We have to integrate both sides of this over the entire distribution, but we have a problem!
If we fix, say, $\vec{r}_2$ and compute the potential energy of just dq2 at this point in the field of all
of the other charge in the distribution, we still have to sum over all of the chunks dq2, using
a second integral. But if we integrate over both $\vec{r}_1$ and $\vec{r}_2$ to get every chunk in the field of
every other chunk, we’ll count each pair twice! This means that for this to work, we’ll have
to multiply by one half:
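$$
U_{tot} = \int dU = \frac{1}{2} \int_{V_1} \int_{V_2} \frac{k_e\, dq_1\, dq_2}{r_{12}} = \frac{1}{2} \int_{V_1} \rho(\vec{r}_1)\, d^3r_1 \left\{ \int_{V_2} \frac{k_e \rho(\vec{r}_2)\, d^3r_2}{|\vec{r}_1 - \vec{r}_2|} \right\} \tag{3.69}
$$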
where both integrals are over the entire support volume V of the charge distribution and
are labelled with the coordinate one is integrating over to keep all of this straight.
We can make this just a bit simpler if we identify the second integral in the big {} above:
$$
V(\vec{r}_1) = \int_{V_2} \frac{k_e \rho(\vec{r}_2)\, d^3r_2}{|\vec{r}_1 - \vec{r}_2|} \tag{3.70}
$$
so:
$$
U_{tot} = \frac{1}{2} \int_{V_1} V(\vec{r}_1)\, \rho(\vec{r}_1)\, d^3r_1 \tag{3.71}
$$
In words, one way of evaluating the electrostatic potential energy of a charge distribu-
tion is to find the potential of that distribution at an arbitrary point inside by integrating
dV over the distribution, find the potential energy of a tiny chunk of that distribution at that
point, sum the chunks (with the integral) over the entire distribution, and then (recognizing
that this double counted the sum) divide by two. This clearly corresponds to one of the
two forms for the potential energy as a sum over discrete charges. But what of the other
form?
The second form, as we saw in the example above, is the “build a distribution” approach.
We start with no charge, then bring a tiny chunk dq to (say) the origin for free – this
“for free” corresponds again to not counting self-energy, something we’ll talk a bit about
below once we have solved a key example. Then we bring in a second chunk, counting the
increase in potential energy as the work we do bringing that charge in. We bring in the
third, fourth, etc, only instead of doing this with discrete charges, we do this with differential
chunks in a suitable coordinate system!
Formulating this algebraically in a way that doesn’t depend on a specific coordinate
frame is a bit tricky. That’s because we have to break the two integrals involved into two
disjoint pieces – an “interior” volume that represents the amount of charge brought in so
far and an “exterior” part that represents the integral of all of the charge outside of that
volume. By partitioning the integral in this way, we avoid double counting the same way we
avoid it with the double sum where one index (say j) runs from 1 to N and the other index
(i) runs from 1 to j − 1, with no self-energy, so i < j for all terms in the sum. With hopefully
obvious notation, then:
$$
U_{tot} = \int_{V_1,\ \text{exterior to 2}} \rho(\vec{r}_1)\, d^3r_1 \int_{V_2,\ \text{interior to 1}} \frac{k_e \rho(\vec{r}_2)\, d^3r_2}{|\vec{r}_1 - \vec{r}_2|} \tag{3.72}
$$
Both integrals are over the entire volume V of the support of ρ, note well! They are simply
arranged so that as each value of (say) $\vec{r}_2$ is included in the evaluation of $V(\vec{r}_1)$, that point
is subsequently excluded from the integral over $\vec{r}_2$.
If you are reading this and trying to learn both what this means so you understand
it and how to have a faint hope of putting it into practice solving problems, you are very
likely feeling pretty insecure right about now. Hopefully you do get the idea – we do both
integrals (somehow) in such a way that we avoid double counting, great, fine, super, but
how the heck do we do that?
The easiest way to get the rest of the way there is (as usual) to work a good example!
Figure 3.12: A ball of charge in the process of being built. So far it has accumulated out
to a radius r, and the figure illustrates a (dark shaded) thin layer of charge being added to
the ball.
Let’s find the potential energy of a uniform ball of charge with total charge Q and radius R.
We will use the method that implements the disjoint integral above and (more importantly)
interpret this as summing the work required to assemble the ball of charge from infinity –
the “build a ball” method, if you like.
Figure 3.12 shows just such a ball at an intermediate stage of being built, in spherical
coordinates centered on the ball. At the instant portrayed, a uniform charge density ρ has
filled the ball of eventual radius R (dashed sphere) out to the radius r (light grey sphere).
We have grabbed a handful of charge dQ = ρdV at infinity and pushed it, working against
the field of the ball in to form a differentially thin layer with volume dV on the surface of this
ball.
Now, the field of the ball of charge (so far) is, from GLE, using a (red) gaussian surface
S of radius r ′ outside of the ball:
$$
\oint_S \vec{E} \cdot \hat{n}\, dA = E_r\, 4\pi r'^2 = \frac{1}{\epsilon_0} \int_{V/S} \rho\, dV = \frac{4\pi\rho r^3}{3\epsilon_0} \tag{3.73}
$$
$$
\rho = \frac{3Q}{4\pi R^3} \tag{3.74}
$$
so:
$$
E_r(r') = \frac{k_e Q \frac{r^3}{R^3}}{r'^2} = \frac{k_e Q_{\text{ball so far}}}{r'^2} \tag{3.75}
$$
expressed in terms of the givens Q and R. In this:
$$
Q_{\text{ball so far}}(r) = Q\,\frac{r^3}{R^3}
$$
is the fraction of the total charge inside r. At this point, if you have mastered the homework
and examples done so far I could probably have just written the last two equations down
and started here and you would have followed the argument, but it never hurts to practice
using GLE as we go.
Next, we find the work we do pushing the charge dQ in from ∞ against this field to the
radius r. We push in the negative r′ direction, and $F_r(r') = dQ\, E_r(r')$, so:
$$
dW = -dQ \int_{\infty}^{r} \frac{k_e Q \frac{r^3}{R^3}}{r'^2}\, dr' = dQ\, \frac{k_e Q \frac{r^3}{R^3}}{r} = \frac{k_e Q r^2}{R^3}\, dQ \tag{3.76}
$$
As we have argued before, the potential energy we add to the system equals the work
we do adding this charge. We recognize the fraction multiplying the dQ at the end as just
V (Qball so far , r). We also use our usual litany to write:
$$
dQ = \rho\, dV = \frac{3Q}{4\pi R^3} \times 4\pi r^2\, dr = \frac{3Q r^2\, dr}{R^3} \tag{3.77}
$$
$$
dU = dW = V(Q_{\text{ball so far}}, r)\, dQ = \frac{k_e Q r^2}{R^3}\, dQ = \frac{3 k_e Q^2 r^4\, dr}{R^6} \tag{3.78}
$$
We have now reduced the two 3D integrals in the formal algebraic expression to a single
one-dimensional integral that (I sincerely hope) makes sense! If you like, the increase in
potential energy brought about by increasing the thickness of the charged ball of radius r
so far by a differential amount dr is:
dU = V dQ (3.79)
where V is the potential at the surface of the ball so far and dQ is the charge added at that
radius. V of the ball at its surface is the interior integral at the specific radius r, and we’re
about to do the exterior integral to add up the work required to build the ball out of many
infinitesimal layers, like an onion!
Now it is easy!
$$
U_{tot} = \int dU = \frac{3 k_e Q^2}{R^6} \int_{0}^{R} r^4\, dr = \frac{3}{5} \frac{k_e Q^2}{R} \tag{3.80}
$$
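The onion picture translates almost line for line into a loop. This Python sketch adds up dU = V dQ over many thin layers, using the charge accumulated so far to get the potential at the current surface, and compares the total with equation (3.80). The function name, the number of layers, and the values of Q and R are illustrative assumptions.

```python
K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def build_a_ball_energy(Q, R, n=10_000):
    """Add up dU = V(Q_so_far, r) dQ over n thin layers (midpoint rule in r)."""
    dr = R / n
    U = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        q_so_far = Q * r**3 / R**3          # charge already assembled inside radius r
        v_surface = K_E * q_so_far / r      # potential at the surface of the ball so far
        dq = 3.0 * Q * r**2 * dr / R**3     # charge in the new layer, eq. (3.77)
        U += v_surface * dq                 # work done bringing this layer in from infinity
    return U

Q, R = 1.0e-9, 0.10                         # assumed values: 1 nC, 10 cm
ratio = build_a_ball_energy(Q, R) / (K_E * Q**2 / R)
print("U_tot / (k_e Q^2 / R) =", ratio)     # should come out close to 3/5
```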
Let’s do this example a second time, but this time we’ll use the potential inside the ball of
the whole ball times dQ and divide the result by 2. We’ll start with the potential inside a
ball of charge, computed as an example above in equation 3.54:
$$
V(r) = 2\pi k_e \rho R^2 - k_e \frac{2\pi\rho}{3}\, r^2 = \frac{3 k_e Q}{2R} - \frac{k_e Q r^2}{2R^3} \tag{3.81}
$$
Now we have to multiply this by dQ (in spherical polar coordinates as usual) and divide
by 2 to form dU :
$$
dU = \frac{1}{2} V(r)\, dQ = \frac{1}{2} \left( \frac{3 k_e Q}{2R} - \frac{k_e Q r^2}{2R^3} \right) \times \frac{3Q}{R^3}\, r^2\, dr \tag{3.82}
$$
where I’ve simplified ρ dV. We have two terms, then, to integrate from 0 to R to cover the
ball. First:
$$
U_1 = \frac{9 k_e Q^2}{4R^4} \int_{0}^{R} r^2\, dr = \frac{3 k_e Q^2}{4R}
$$
and second:
$$
U_2 = -\frac{3 k_e Q^2}{4R^6} \int_{0}^{R} r^4\, dr = -\frac{3 k_e Q^2}{20R}
$$
so that:
$$
U_{tot} = U_1 + U_2 = \frac{3 k_e Q^2}{R} \left( \frac{1}{4} - \frac{1}{20} \right) = \frac{3 k_e Q^2}{5R} \tag{3.83}
$$
as we got before.
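As a cross-check on the bookkeeping (and in particular on the sign of the second term), here is a minimal Python sketch that integrates the two pieces of (1/2) V(r) dQ separately, using the potential of equation (3.81). The function name and the values of Q and R are assumptions chosen for illustration.

```python
import math

K_E = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def half_v_dq_terms(Q, R, n=10_000):
    """Midpoint-rule estimates of the two pieces U1, U2 of (1/2) * integral of V(r) dQ."""
    rho = 3.0 * Q / (4.0 * math.pi * R**3)      # uniform charge density
    dr = R / n
    u1 = u2 = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        dq = rho * 4.0 * math.pi * r**2 * dr    # charge in the shell between r and r + dr
        u1 += 0.5 * (3.0 * K_E * Q / (2.0 * R)) * dq          # constant term of eq. (3.81)
        u2 += 0.5 * (-K_E * Q * r**2 / (2.0 * R**3)) * dq     # r^2 term of eq. (3.81)
    return u1, u2

Q, R = 1.0e-9, 0.10                             # assumed values: 1 nC, 10 cm
scale = K_E * Q**2 / R
u1, u2 = half_v_dq_terms(Q, R)
print("U1 / scale =", u1 / scale)               # close to  3/4
print("U2 / scale =", u2 / scale)               # close to -3/20
print("sum        =", (u1 + u2) / scale)        # close to  3/5
```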
Note that this method is arguably slightly more difficult than adding up the work required
to build the ball, once you factor in the work involved in both finding V (r) for r < R and
doing the resulting integrals. For spheres, at least, it is easy enough to jump straight to
V (r) = ke Qball so far (r)/r, evaluate the charge in the ball, and just write down the integral
to be done in one step, because GLE provides rules for the field and potential outside of a
spherical ball.
Before we go on to a final topic of interest regarding the potential energy of a spherical dis-
tribution of charge with compact support, let’s find the potential energy of a simple spherical
shell of charge Q at radius R. Now we know that $E_r = k_e Q/r^2$ for r > R and:
$$
V(R) = \frac{k_e Q}{R} \tag{3.84}
$$
This lets us find U using the rule with the factor of 1/2 in it in one step:
$$
U_{tot} = \frac{1}{2} V(R)\, Q = \frac{k_e Q^2}{2R} \tag{3.85}
$$
There are several other ways of getting this result, including a “build a ball” solution that
actually uses the first method, summing up the work needed to add small increments of
charge dq to a ball that already has the charge q:
$$
dW = dU = V(q)\, dq = \frac{k_e q}{R}\, dq \tag{3.86}
$$
so:
$$
U_{tot} = \int dU = \int_{0}^{Q} \frac{k_e q}{R}\, dq = \frac{k_e Q^2}{2R} \tag{3.87}
$$
Note that in this latter case we actually end up integrating over q dq, not r, but we end up
getting the same result. It turns out that this approach is slightly easier to understand when
we think about charging up a conductor, which is the primary topic of the next chapter.
Up to now, we have ignored the so-called self-energy of a point charge – why, when we
sum over the potential energy of all pairs of charges do we not include a term that describes
a charge interacting with itself, the diagonal terms omitted in the discrete sum rule? Also,
why does it turn out to be “OK” (or at least, give us reasonable answers) when we do not
omit this double counting when we do a double integral over a smooth charge distribution,
presumably counting dq interacting with itself at zero-ish distance?
The answer is both complicated and extremely interesting. To understand it, we have
to build a model for a charged particle. Or rather, we just finished building something that
would work very well as a model for a charged particle – a charged ball with charge Q and
radius R. We see that when we do so, we end up with a potential “self-energy” of the ball
equal to something like:
$$
U_Q = \frac{3 k_e Q^2}{5R} \quad \text{or} \quad \frac{k_e Q^2}{2R} \quad \text{or} \quad \sim O(1) \times \frac{k_e Q^2}{R}
$$
The first result was for a uniform ball of charge. The second was for a spherical shell
with the same charge and radius. Presumably other similar ways of distributing charge Q
inside a ball of radius R will differ in the dimensionless numerical factor, but will likely scale
like a constant of order unity times $k_e Q^2/R$!
Now we can determine what happens when we imagine compressing a given finite
charge Q into smaller and smaller balls! As R → 0, we see that U (R → 0) → ∞! If we
forget the factor of 3/5, or 1/2 (which depends on the details of the charge distribution) and
focus on the rest, we can compute a couple of extremely interesting quantities that give
us insight into nuclear physics, certain properties of electrons, and field theories involving
pointlike charged particles in general!
Consider a model for a proton – where we know Q = +e and R ≈ $10^{-15}$ meters (one
fermi) – as a ball of charge. If one computes $k_e e^2/R$ in eV, one gets +1.44 MeV (try it!). This
is the order of magnitude of the energy bound up in the electrostatic field of the charge
of a proton. This energy is repulsive – the nucleus wants to blow apart and release the
electrostatic potential energy in the form of e.g. kinetic energy of its constituents, which we
believe to be three quarks if you look back at the first chapter!
Why don’t the quarks fly apart? They have to be bound together by even stronger
forces than this remarkably strong electrostatic repulsion! The so-called “strong nuclear
forces” that glue all of this charge together (with gluons, yet) must be much stronger than
electrostatic forces to make the total energy negative or a proton would not be a stable
bound state, and they are. Electronic energy levels in atoms are scale eV, nuclear energy
levels are scale MeV (and higher) which explains why stars burn slowly and release far, far
more energy than can be explained by “atomic” electronic bonding (conventional burning).
Nuclear fusion releases on the order of ten million times as much energy per fusion event
as does e.g. burning one carbon atom into carbon dioxide.
To consider the electron, we require a “true fact” (that is, fortunately, fairly common
knowledge): Mass and energy are interchangeable, and the “rest mass” of an object corresponds
to a “rest energy” of $mc^2$ where $c = 3 \times 10^8$ meters/second is the speed of light.
Now we suppose that an electron’s rest mass is all due to its electrostatic energy of con-
finement, the energy tied up in the charge e confined to some radius, and we seek that
radius, which we will call “the classical radius of the electron”. This is the same computation
as above, only backwards – we know the energy already, we know ke and the charge
−e, we solve for re. If you do this, using $U \sim mc^2 = 0.5$ MeV for an electron, one gets
$2.8 \times 10^{-15}$ meters. Note well that this is somewhat larger than the size of a proton (as the
electron has less energy). The classical radius of the electron turns out to be an important
quantity in determining the properties of electromagnetic radiation from point charges.
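Both of the “try it!” numbers above take only a few lines to reproduce. This Python sketch evaluates ke e²/R for a one-fermi proton and then solves ke e²/re = mc² for the classical electron radius, dropping the order-unity geometric factor exactly as the text does. The constants are standard SI values typed in by hand, and the function names are mine.

```python
K_E = 8.9875517923e9           # Coulomb constant in N m^2 / C^2
E_CHARGE = 1.602176634e-19     # elementary charge in C
M_ELECTRON = 9.1093837015e-31  # electron mass in kg
C_LIGHT = 2.99792458e8         # speed of light in m/s

def coulomb_scale_mev(radius_m):
    """k_e e^2 / R expressed in MeV (the O(1) geometric factor is dropped)."""
    joules = K_E * E_CHARGE**2 / radius_m
    return joules / E_CHARGE / 1.0e6       # J -> eV -> MeV

def classical_electron_radius():
    """Radius r_e at which k_e e^2 / r_e equals the electron rest energy m c^2."""
    return K_E * E_CHARGE**2 / (M_ELECTRON * C_LIGHT**2)

print("proton, R = 1 fm:", coulomb_scale_mev(1.0e-15), "MeV")      # about 1.44
print("classical electron radius:", classical_electron_radius(), "m")  # about 2.8e-15
```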
However, this leaves us with a serious problem. We can imagine a proton made up
of quarks bound with a force so strong that it overcomes the electrostatic force trying to
blow the quarks apart, and can even find some evidence that such a force exists and that
the proton is a composite particle. However, when we examine the electron in very high
energy collisions, we find no evidence of “structure” that might be expected to exist if it were
a composite particle. Furthermore, in all cases the electron behaves (within the bounds of
quantum mechanics) as if it were a true point-like particle with zero radius!
Last week we learned that Gauss’s Law and the notion of equilibrium together
give us important information about conductors – material with an “inexhaustible” supply
of charged particles such as electrons that are free to move within the conductor and
behave like an “electrical fluid”. In particular, we determined that $\vec{E} = 0$ inside a conductor
in electrostatic equilibrium and that $\vec{E}_{\parallel} = 0$ at the surface, so that any electrical field
immediately outside its surface must be perpendicular to the surface.
This suffices to show that conductors are equipotential – the potential difference be-
tween any two points in the conductor or on its surface is:
$$
\Delta V = -\int_{\vec{x}_0}^{\vec{x}_1} \vec{E} \cdot d\vec{x} = 0 \tag{3.88}
$$
Note that this doesn’t mean that the potential of the conductor is zero, only that it is a
constant. That is consistent:
$$
\vec{E} = -\vec{\nabla} V_0 = 0 \tag{3.89}
$$
either of them and not have their interaction break the symmetry of the charge’s redistri-
bution, we can compute the potential difference between the conducting pair as a function
of the charge difference between them. This potential difference will turn out to be propor-
tional to the charge transferred and will only otherwise depend on the geometry of their
arrangement. In the next chapter this will be the basis of the notion of capacitance.
Figure 3.13: Charge sharing between two distant conductors connected by a wire. They
become equipotential, with charge transferred (shared) between them to make it so.
Here is an important example of equipotentiality. Suppose one has two conducting spheres,
one with radius a and one with radius b such that a ≪ b (as seen in figure 3.13 above). Let
us further suppose that the spheres are very distant from one another so that the field of
one is very weak in the vicinity of the other (so that very little charge redistribution occurs
if one or the other is charged up). We begin by imagining that we have put a charge Q on
sphere b.
In that case it is easy to see or show that:
$$
V_b = -\int_{\infty}^{b} E_r\, dr = \frac{kQ}{b} \tag{3.90}
$$
on the other sphere. There is clearly a potential difference between the two spheres. Now
imagine that we connect the two with a thin conducting wire. They form a single conductor
and therefore quickly equalize their potentials as charge flows from b to a.
Charge is conserved. They will reach equilibrium when:
$$
\frac{k(Q - q)}{b} = \frac{kq'}{b} = \frac{kq}{a} \tag{3.92}
$$
where q is the net charge transferred from b to a and q ′ is the remaining charge on b. This
can be rewritten as:
$$
\frac{q}{q'} = \frac{a}{b} \tag{3.93}
$$
The smaller the sphere the smaller the fraction of charge on it, which makes sense since
the ratio of charge to radius must be the same.
Now, however, we compute the radial field at the surface of the two conductors. It is:
$$
E_a = \frac{kq}{a^2} \tag{3.94}
$$
$$
E_b = \frac{kq'}{b^2} \tag{3.95}
$$
If we take the ratio of the field strengths we get:
$$
\frac{E_a}{E_b} = \frac{q\, b^2}{q'\, a^2} = \frac{b}{a} \tag{3.96}
$$
and conclude that the field is much stronger on the surface of the smaller conductor. In
fact, it becomes infinite in the limit that a → 0 relative to a finite b.
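A few numbers make the point vivid. The Python sketch below solves equation (3.92) for the shared charges, q = Qa/(a + b) and q′ = Q − q, and then evaluates the surface potentials and fields. The function name and the sample values (a 1 cm sphere wired to a 1 m sphere carrying 1 μC) are illustrative assumptions, and the spheres are treated as isolated, as in the text.

```python
K = 8.9875517923e9  # Coulomb constant in N m^2 / C^2

def share_charge(Q, a, b):
    """Solve k(Q - q)/b = k q / a for the charge q on sphere a and q' left on sphere b."""
    q = Q * a / (a + b)
    q_prime = Q - q
    return q, q_prime

Q, a, b = 1.0e-6, 0.01, 1.0            # assumed: 1 uC total, radii 1 cm and 1 m
q, q_prime = share_charge(Q, a, b)

V_a, V_b = K * q / a, K * q_prime / b            # equal once the wire is connected
E_a, E_b = K * q / a**2, K * q_prime / b**2      # surface fields, eqs. (3.94)-(3.95)

print("q / q'    =", q / q_prime, " (a / b =", a / b, ")")
print("V_a, V_b  =", V_a, V_b)
print("E_a / E_b =", E_a / E_b, " (b / a =", b / a, ")")
```

With these values the small sphere holds only about one percent of the charge, yet the field at its surface is about a hundred times stronger.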
What this tells us is that the field in the vicinity of a conductor in electrostatic equilibrium
at some non-zero potential is much stronger at sharp points than it is on smooth surfaces
with a large radius of curvature. This has important consequences, as we shall see!
Insulators are not ever perfect, because electrons as charge carriers are not bound to
the conducting substrate by an infinite potential energy barrier. In a sufficiently large field
electrons are torn from their parent atoms and insulators “suddenly” become conductors,
a process called dielectric breakdown. Lightning is a spectacular example of dielectric
breakdown in nature.
The way lightning (or any sort of arc discharge) works is that charge builds up on clouds
and/or the ground to create a large potential difference. At some point the field strength
associated with this potential difference becomes great enough that the force it exerts on
electrons exceeds the force binding the electrons to their parent atoms in the insulator (or
alternatively, they get enough potential energy to overcome the potential energy barrier
that confines them). At first only a few electrons get away, and are quickly accelerated by
the field as they get over the confining potential barrier.
These electrons in turn collide with other nearby atoms, transferring momentum to them
and knocking still more electrons loose. A cascading chain reaction occurs that heats the
atoms in the path of the ever increasing flow of charge and knocks still more charge loose
to join that flow. In a fraction of a second, the superheated air becomes a white-hot plasma
that conducts electricity quite well and the enormous charge difference between ground
and cloud or cloud and cloud neutralizes in a burst of millions of amperes of current.
Bang! Zap! Ouch!
It is important to remember whenever working with high voltages that few materials are
terribly good insulators against the strong fields associated with large potential differences
over a short distance. That is, if you get close enough to a high voltage line it will simply
arc over and electrocute you. It may well arc through a piece of glass or plastic and kill
you. Wood is an insulator for ordinary voltages but conducts more than enough to kill you
if you try to touch a high voltage power line with a stick.
Note also that if one approaches a conductor with a charge, one induces a charge
on the part of the conductor nearest the charge. If that part happens to be a sharp point,
the properties of charge sharing on an equipotential conductor create an extremely strong
field in the immediate vicinity of the point. The field at a sharp point can easily be strong
enough to ionize air molecules in the immediate vicinity of the tip and make them conduct!
The ionized air molecules recover electrons from their surroundings, which emit light as
they rebind. This light (visible in the dark as a faint blue-violet glow on a thumbtack point
attached to an electrostatic generator) is called the corona.
Figure 3.14: External charge +Q induces a charge −q on the sharp tip of a nearby grounded conductor. Electric field lines leave the tip at right angles, producing a field that looks like that of a very large point charge, which is extremely strong very close to the tip. This in turn ionizes nearby air molecules, creating the glow of the corona (and spraying/repelling negatively charged ions out into the air where they are attracted to +Q and eventually neutralize it).
Those molecules quickly pick up charge from the tip and are then repelled by it. They
literally spray away from it, carrying charge and momentum and flowing towards the
inducing charge. This process is called corona discharge and is how lightning rods work. A
lightning rod does not attract lightning (you never want to attract lightning); it allows
charge to be pulled up gradually from the ground and sprayed onto an approaching strongly
charged cloud, slowly neutralizing it.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
Problem 3.
Now let’s assume a charge −q at both positions z = ±a on the z-axis and a charge +2q at
the origin. Note that this is a pair of opposed electric dipoles. a) Write an exact expression
for the electrostatic potential of this charge arrangement at $\vec{r} = (r, \theta, \phi)$. Note that the potential must be
φ-independent because of azimuthal symmetry. b) Expand your answer to a) for r ≫ a to
leading (surviving) order. c) What might we call this term? (Hint: Count the poles.)
Problem 4.
Find by direct integration the potential on the axis of a thin disk of charge with surface
charge density σ and radius R. Then expand the result to leading order in the two limits
R ≫ z and z ≫ R and interpret the potentials in both of these cases.
Problem 5.
How much work is required to assemble a uniform ball of charge with total (final) charge
Q and radius R? Hint: This is the same as the potential energy of the sphere, so use
dU = V dq and imagine “building” the sphere a layer of thickness dr at a time. Alternatively,
compute the work directly by bringing a charge dq in from infinity against the electric field
of the charge already there (and distributed as a sphere of radius r).
Problem 6.
Compute the potential difference ∆V between: a) Two conducting spherical shells of radius
a and b with a charge +Q on the inner one and charge −Q on the outer one. b) Two
(infinitely long) conducting cylindrical shells of radius a and b with a charge per unit length
+λ on the inner one and charge per unit length −λ on the outer one. c) Two (infinite)
conducting sheets of charge, one with charge per unit area +σ on the xy plane and one with
charge per unit area −σ parallel to the first one but at z = d. Great! Now you’ve done almost all the work required
to understand Capacitance!
Problem 7.
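[Figure: the concentric conducting shells, labelled with radii a and b, charges +Q and −Q, and the connection made at t = 0.]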
Three thin conducting spherical shells have radii a < b < c respectively. Initially the
shell with radius a has a charge +Q and the shell with radius b has a charge −Q. At
t = 0, you connect the shells with radii a and c using a thin wire that passes through a tiny
(insulated!) hole through the middle shell and wait for the charges on all three to reach a
new equilibrium. Find:
c) The electric field at all points in space (in terms of Qa , Qb and Qc is OK).
Problem 8.
Two rings of charge Q and radius R (uniformly distributed) are located at z = ±R and have
the same (z) axis. A small bead of mass m with charge q is threaded on a frictionless
string along the z axis. If the bead is displaced a small distance +z0 ≪ R from the origin,
describe the subsequent motion of the bead in detail. (Hint: That means find z(t) and the
approximate period T or angular frequency ω of harmonic oscillation for the bead, in case
that wasn’t clear.)
Problem 9.
Suppose you have a solid sphere with a radius R and a uniform charge density ρ. Find
the potential at all points in space. Now repeat this for a non-uniform charge density of the
form ρ(r) = ρ0 r/R (starting by using Gauss’s Law to find the field). Note that this is right on
the edge of being an “advanced” problem as it requires you to do an integral to evaluate the
total charge inside a Gaussian surface. To keep it from being “just” an exercise in calculus,
note the following:
The volume of a differentially thin spherical shell is its area 4πr′² times its thickness dr′:
$$dV = 4\pi r'^2\, dr'$$
So integrate both sides between sensible limits to find the charge inside a Gaussian
sphere of a given radius inside or outside of the sphere. You can do it! (BTW, I use r′
instead of r so you can make r a limit of integration – remember how that works?)
Let’s try to use this to understand a little bit about nuclear fission. Suppose that the charge
Q in the previous problem is distributed uniformly in an incompressible fluid. Now imagine
that sphere splitting into two identical, smaller spheres. Find the radius R′ of these two
spheres. Obviously, each sphere has a charge of Q/2. Find the total electrostatic energy
of these two spheres once they have stabilized and are separated by a large distance.
Compare the answer to the answer from the previous problem. Was energy released?
What form would you expect this energy to take?
Week 4: Capacitance
• Conductors store charge and as they do so, their potential (difference) increases
relative to ground.
$$C = \frac{|\Delta Q|}{|\Delta V|} \tag{4.1}$$
or, the capacitance is the amount of charge we can store that creates a potential
difference of one volt between the conductors. Note the absolute value bars –
capacitance is given as a positive quantity.
$$1\ \text{F} = \frac{1\ \text{Coulomb}}{1\ \text{Volt}} \tag{4.2}$$
A farad is an enormous capacitance. Typical values for capacitors in devices range
from picofarads to microfarads, although one can actually buy one farad capacitors
for special projects these days. Large capacitors are dangerous! Especially when
strung together to make a large capacitor at high voltage! Anything over a few hun-
dred microfarads at a potential of 100+ volts or so can be lethal!
You should be able to derive the following quantities (from Gauss’s Law, integration
of potential difference, dividing into the presumed total charge):
• Cylindrical capacitor:
$$C = \frac{2\pi L \epsilon_0}{\ln(b/a)} \tag{4.4}$$
where a is the outer radius of the inner conductor, b the inner radius of the outer
conductor, and L is its length (where we assume L ≫ (b − a)).
• Spherical capacitor:
$$C = 4\pi\epsilon_0 \frac{ab}{(b - a)} \tag{4.5}$$
where a is the outer radius of the inner conductor and b the inner radius of the outer
conductor.
$$U = \frac{1}{2} Q V = \frac{1}{2} C V^2 = \frac{1}{2} \frac{Q^2}{C} \tag{4.6}$$
where the first form is the simplest to understand.
One question that is very important is where is all this energy stored in the capacitor?
The “best” answer will be: in the electric field! If we write the energy in terms of the
electric field, we find that the energy density of the electric field is given by:
$$\eta_e = \frac{1}{2} \epsilon_0 E^2 \tag{4.7}$$
• Dielectrics are insulators that polarize when placed in an electric field. This builds up
a surface charge that reduces the electric field inside the material – it displaces it
from its usual value. For “weak fields” this reduced field is:
$$\vec{E} = \frac{\vec{E}_0}{\epsilon_r} \tag{4.10}$$
where $\vec{E}_0$ is the external field, $\vec{E}$ is the field inside the dielectric, and εr ≥ 1 is the
relative permittivity (also called the dielectric constant κ in many “standard” physics
textbooks, although this usage has been deprecated as being too ambiguous) and is
characteristic of the material.
One can consistently describe both conductors and insulators in terms of their di-
electric properties by evaluating their permittivity (relative to the vacuum permittivity
ε0 we’ve used so far) and using it to compute the electric field inside the material:
$$\epsilon = \epsilon_r \epsilon_0 \tag{4.11}$$
This is the actual permittivity of the material, and in the general case of a time depen-
dent applied electric field is a complex-valued function of frequency, leading (eventu-
ally) to a consistent description of resistance and Ohm’s Law, and to dispersion and
the rainbow!
a) They physically separate the plates (which, recall, experience a possibly strong
force of attraction).
b) They reduce the field in between the plates, which reduces the potential dif-
ference, which increases the amount of charge one can store per volt – the
capacitance. If the material fills the space between the plates you should be
able to (easily) show that:
$$C = \epsilon_r C_0 \tag{4.12}$$
where C0 is the capacitance without the dielectric.
c) They prevent dielectric breakdown, so the physical separation of the plates d can
be much smaller (and the capacitance much larger) at some design voltage.
4.1: Capacitance
In the previous chapter we noted that conductors in electrostatic equilibrium are equipo-
tential. If you imagine charging up any given conductor, every new bit of charge we add to
it spreads itself out the same way. One expects the field produced at its surface to scale
up or down proportional to the amount of charge on the conductor but not change its basic
shape. As a consequence, one expects the potential produced by the conductor to be
proportional to its total charge at all points in space, in particular inside the equipotential
conductor itself.
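This has been apparent in all of our Gauss’s Law examples up to now. For example, a
conducting sphere of radius R, charged with a total charge Q, has a field:
$$E_r = \frac{k_e Q}{r^2} \quad (r > R) \tag{4.13}$$
$$E_r = 0 \quad (r < R,\ \text{inside the conductor}) \tag{4.14}$$
and a potential (relative to zero at infinity):
$$V(r) = \frac{k_e Q}{r} \quad (r > R) \tag{4.15}$$
$$V(r) = \frac{k_e Q}{R} \quad (r \le R) \tag{4.16}$$
The only parameters in the potential besides the charge are ke and things that describe its geometry,
such as its physical dimensions and shape.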
We could thus define a quantity we might call the “volticitance” of the conductor $\mathcal{V}$ so
that (in the case of this example):
$$V = \mathcal{V} Q \tag{4.17}$$
with
$$\mathcal{V} = \frac{k_e}{R} = \frac{1}{4\pi\epsilon_0 R} \tag{4.18}$$
It is more conventional, however, to write the inverse relation:
$$Q = CV \tag{4.19}$$
where we have introduced the capacitance, the constant of proportionality that depends
only on the geometry of the conductor.
To be specific, we define the capacitance of an arrangement of conductors used to
store charge to be:
$$C = \frac{Q}{V} \tag{4.20}$$
where V is the potential difference across the arrangement as a function of the common
charge Q used to create it. In the case of our example, the capacitance of an isolated
conducting sphere is:
$$C = 4\pi\epsilon_0 R \tag{4.21}$$
In general the SI units of capacitance are easily remembered (as always) from the
defining relation:
$$1\ \text{Farad} = \frac{1\ \text{Coulomb}}{1\ \text{Volt}}$$
and it is useful to remember that the dielectric permittivity of free space is:
$$\epsilon_0 \approx 8.85 \times 10^{-12}\ \frac{\text{farads}}{\text{meter}}$$
so that the farad can also be recognized as the natural unit of ε0 (or 1/ke) times a length.
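To get a feel for the size of a farad, here is a minimal sketch that evaluates the isolated-sphere result C = 4πε0R of equation 4.21. The value of EPS0 is the standard SI vacuum permittivity; the function name and the radii are illustrative assumptions, not values from the text.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (standard SI value)


def sphere_capacitance(radius):
    """Capacitance of an isolated conducting sphere, C = 4 pi eps0 R."""
    return 4.0 * math.pi * EPS0 * radius


# Illustrative radii (assumed values)
for label, R in (("10 cm generator ball", 0.10), ("1 m sphere", 1.0)):
    print(f"{label}: C = {sphere_capacitance(R):.3e} F")

# Radius an isolated sphere would need in order to have C = 1 F
R_one_farad = 1.0 / (4.0 * math.pi * EPS0)
print(f"radius for 1 F: {R_one_farad:.3e} m")
```

A laboratory-sized sphere comes out in the picofarad range, while a one farad sphere would need a radius of billions of meters, which is one way to see why a farad is an enormous capacitance.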
Although we might have occasion to refer to the capacitance of an isolated conductor
used (for example) as the storage ball on a Van de Graaff generator, we will almost always
use capacitance in the context of specific arrangements of two conductors that are
designed and intended just to store charge in this way. The three arrangements we will consider are:
• A parallel plate capacitor. This is our template model, and you should thoroughly
learn it as it is quite simple and informative.
• A cylindrical capacitor.
• A spherical capacitor.
The latter two are primarily useful as teaching models, as you know everything you need to
know in order to compute their capacitance from Gauss’s Law and the definition of potential
difference. Let’s examine these three cases in some detail.
Figure 4.1: An “ideal” parallel plate capacitor of cross-sectional area A and plate separation d.
In figure 4.1 two parallel, flat, conducting plates are arranged so that they are sepa-
rated by an insulating (empty/vacuum) gap d. A metaphorical “blue devil” armed with a
metaphorical micro-pitchfork (that is, a still undefined process we will discuss later) forks
up charge from one plate and shoves it, working against an ever increasing electric field,
over to the other plate, eventually creating (after doing an amount of work that we will
of course calculate shortly) the situation portrayed, with a charge +Q on the lower plate
and −Q on the upper plate. We will invariably assume that a charged capacitor has the
same magnitude of opposing charges on the two plates – in the static limit this is an exact
result (see footnote 61).
Drawing all of this is a bit much, so we will idealize the figure as shown in figure 4.2,
seen from a perspective that shows us the cross-sectional area A and the field between
the plates. The two wires connected to the upper and lower plates are used to charge them
up or connect them into a circuit.
We will name this arrangement a parallel plate capacitor – this is our archetypical
capacitor, and finding the capacitance of other geometries, even when some dielectric
material is inserted between the plates instead of a vacuum, will follow exactly the same
steps illustrated below. This means that you should pay careful attention to those steps,
as they reinforce pretty much everything learned in the first three chapters and will help to
keep you from forgetting any of it as we move on to new material!
Figure 4.2: An “ideal” parallel plate capacitor of cross-sectional area A and plate separation
d, with charge −Q on the upper plate and +Q on the lower plate. In order for us to use Gauss’s Law to compute the electric field between the plates, the
condition d ≪ √A should hold.
Footnote 61: Why? Consider the properties of a conductor in electrostatic equilibrium, which requires perfect cancellation
of the fields inside the conductors just inside the opposing surfaces...
To compute the capacitance we execute the following steps, in order, every time!
a) Compute the electric field at all points in space, but in particular in between the plates,
using a mix of Gauss’s Law and the superposition principle. The field will, of course,
be directly proportional to Q. We will idealize the field at the edges of the plates,
something that is permissible if d ≪ √A and that in any event will not substantively
affect their potential difference.
b) Compute the potential difference between the plates. Like the field, this will depend
on the charge Q transferred from one plate to the other. Note well that we will always
be computing a potential difference but we will often be lazy and write it as V , not
bothering to add the ∆ as in ∆V . It just makes the algebra a bit simpler, and keeps
us from having to do the same thing for Q vs ∆Q.
c) Form the capacitance, C = Q/V. Note that the Q will always cancel out and leave
us with something that depends on ε0 and the geometric parameters of the plate.
Pay close attention to the dimensions and units, as you will need to be able to tell if
your answers to problems “make dimensional sense” on the fly!
So here are the steps. First we note that the charges distribute themselves (approx-
imately) uniformly on the facing surfaces of the two plates, getting as close together as
they can. This forms two equal and opposite sheets of charge with charge per unit area
±σ = ±Q/A. Applying Gauss’s Law to either one of them, say the lower, we get:
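$$\oint_S \vec{E} \cdot \hat{n}\, dA = 4\pi k_e Q_{\text{in } S}$$
$$|E_z|\, 2A = \frac{\sigma A}{\epsilon_0}$$
$$E_z = \frac{\sigma}{2\epsilon_0} = 2\pi k_e \sigma \tag{4.22}$$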
(pointing away from the sheet of charge above and below it). We get exactly the same for
the upper plate, except that the field points toward the negative sheet of charge.
Figure 4.3: The $\vec{E}$-field in between the two oppositely charged plates adds, while above
and below it cancels. (The regions are labelled “cancel”, “add”, and “cancel” from top to bottom.)
We then apply the superposition principle using figure 4.3 as a guide. Above and below
both sheets, the fields produced by the upper and lower charges cancel, as e.g. the field from
the upper one (in green) points down and the field from the lower one (in red) points up,
and the fields have equal magnitudes. In between the plates, the field from the upper plate
points up and so does the field from the lower one – the two fields add. Thus we obtain a
total field of:
$$E_z = 4\pi k_e \sigma = \frac{\sigma}{\epsilon_0} \tag{4.23}$$
directed upwards between the plates, as drawn, and conclude that Ez = 0 should hold
above and below the plates, at least for “infinitely wide” plates. Note well that this field is
automagically zero inside the conducting metal of the plates themselves and in the wires
above and below the plates! Our assumption of charge distributing itself in two uniform
sheets is consistent as it leads to the field vanishing inside the conductor, as we expect.
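The superposition argument above can be checked with a small sketch. It models the plates as two infinite uniform sheets, +σ at z = 0 and −σ at z = d, each producing a field of magnitude σ/(2ε0) as in equation 4.22. The function names and the numerical values are illustrative assumptions.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (standard SI value)


def sheet_field(sigma, z_sheet, z):
    """z-component of the field of an infinite uniform sheet located at z_sheet.

    The magnitude is |sigma| / (2 eps0); the field points away from a
    positive sheet and toward a negative one.
    """
    half = sigma / (2.0 * EPS0)
    return half if z > z_sheet else -half


def total_field(sigma, d, z):
    """Superpose the lower sheet (+sigma at z = 0) and the upper sheet (-sigma at z = d)."""
    return sheet_field(+sigma, 0.0, z) + sheet_field(-sigma, d, z)


sigma, d = 1.0e-6, 1.0e-3   # assumed values: C/m^2 and metres
for label, z in (("below", -d / 2), ("between", d / 2), ("above", 3 * d / 2)):
    print(f"{label:8s} E_z = {total_field(sigma, d, z):.4e} V/m")
print(f"sigma / eps0 = {sigma / EPS0:.4e} V/m")
```

The two contributions cancel outside the plates and add to σ/ε0, directed upwards, in between.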
Figure 4.4 (panels: “Actual” and “Ideal”): Fringe fields at the edge of an actual pair of parallel plates carrying opposite
charge compared to the idealized field that vanishes sharply at the edge and is uniform
in between the plates. Note that the field, and hence the potential difference, is almost
identical in most of the volume between the plates.
Of course, our plates cannot really be infinite in area. What happens at the edges of
the plate? There, the field “bulges” out from between the plates and forms curved field
lines that resemble those of an electric dipole (because after all, the plates do form an
electric dipole of a peculiar form). However, this “fringing field” rapidly falls off in magnitude
compared to its strength between the plates, so much so that we won’t go far wrong if
we assume that it is zero – so that the electric field is effectively confined to the volume
directly in between the facing plates. In this course we will therefore always idealize this
by asserting that the $\vec{E}$-field “vanishes” just beyond the edges of the plates and is perfectly
uniform in between, even though this isn’t precisely true. This situation is portrayed in
figure 4.4.
With the fields in hand, it is but the work of a moment to compute the potential difference
of the upper plate relative to the lower (or vice versa):
$$V = \Delta V = -\int_0^d E_z\, dz = -4\pi k_e \sigma d = -\frac{Qd}{\epsilon_0 A} \tag{4.24}$$
Note that the integral we computed is negative, which simply means that the upper plate
is at a lower potential than the lower plate (consistent with the field pointing from the lower
to the upper plate).
We are ready to form the capacitance. Our potential difference is negative, but when
we form the capacitance we by convention make it a positive number – obviously the
capacitance is symmetric and we can charge the plates in either direction, so there is no
point in giving it a sign. We correspondingly form:
$$C = \frac{|Q|}{|V|} = \frac{Q}{Qd/(\epsilon_0 A)} = \frac{\epsilon_0 A}{d} \tag{4.25}$$
Note well the dependence of this archetypical capacitance on the dimensions of the capacitor.
The dielectric permittivity of free space ε0 appears on top and clearly has SI units
(among others) of farads per meter. The capacitance varies linearly with the cross-sectional area
of the facing plates and inversely with their separation. Bigger plates (more area) means
bigger capacitance; closer plates (smaller separation) also means bigger capacitance.
This is an important enough result that you should probably try to remember it as well
as being able to derive it in detail, following all three steps outlined above. Note that this is
a great problem to practice because this one problem requires you to use Gauss’s Law for
the electric field, the superposition principle, the definition of potential (difference) in terms
of an integral of the field, the definition of capacitance, and a certain amount of common
sense as far as idealization of the plate fields and the self-consistent distribution of charge
in static equilibrium.
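As a practice aid, the three steps can be followed numerically. This is only a sketch under the idealizations used above (uniform field σ/ε0 between the plates, no fringing); the function name, the plate dimensions, and the charges are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (standard SI value)


def parallel_plate_capacitance(Q, A, d):
    """Follow the three steps for an ideal parallel plate capacitor."""
    sigma = Q / A        # uniform charge per unit area on the facing surfaces
    E = sigma / EPS0     # step a): field between the plates (equation 4.23)
    V = E * d            # step b): magnitude of the potential difference
    return Q / V         # step c): C = Q / |V|; the charge cancels


A, d = 1.0e-2, 1.0e-3    # assumed plate area (m^2) and separation (m)
C_small_Q = parallel_plate_capacitance(1.0e-9, A, d)
C_large_Q = parallel_plate_capacitance(5.0e-6, A, d)
print(f"C = {C_small_Q:.4e} F with 1 nC, {C_large_Q:.4e} F with 5 uC")

# Plate area that would be needed for C = 1 F at the same separation
area_for_one_farad = 1.0 * d / EPS0
print(f"area for 1 F at d = 1 mm: {area_for_one_farad:.3e} m^2")
```

The capacitance comes out the same for both charges, as it must, and agrees with ε0 A/d. The last line shows how large the plates would have to be to reach one farad with a vacuum gap of this size.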
We’ll now quickly indicate the key step for cylindrical and spherical capacitors, but with-
out presenting all of the steps. Your very first homework problem is to fill in the missing
steps yourself, creating “perfect” derivations of the capacitance for conducting plates with
all three Gauss’s Law geometries. Don’t forget to draw your own figures!
Given two concentric cylindrical conducting shells of length L and radii a and b such that
δ = b − a ≪ L, find their capacitance. This is pictured in figure 4.5, although the figure
exaggerates the size of a and b relative to each other or L. Usually the shells would be very
close together, effectively trapping the field in between them everywhere but quite close to
their edges.
Figure 4.5: A cylindrical capacitor of length L, inner radius a, and outer radius b.
Solution: As before, assume that they are charged up to +Q on the inner and −Q on
the outer, perhaps by means of work done by our little blue devil dude and his charged-
particle pitchfork. This puts a charge per unit length of ±λ = ±Q/L on the inner and outer
shell, respectively.
Again, we assume that L ≫ b − a so that we can use Gauss’s Law to find the $\vec{E}$-field:
$$E_r = \frac{2 k_e \lambda}{r} \qquad a < r < b$$
in between the cylindrical shells and Er = 0 otherwise – both for r < a and r > b (and
as before we neglect the fringing fields that we expect to bulge out at the ends of the
cylinders). Then:
$$V = \Delta V = -\int_a^b E_r\, dr = -2 k_e \lambda \ln\frac{b}{a} = -\frac{1}{2\pi\epsilon_0} \frac{Q}{L} \ln\frac{b}{a} \tag{4.26}$$
This is negative because we integrated from inside out (in the direction of the field). We
could just as easily have integrated from outside in and gotten a positive potential differ-
ence. As always, the only thing that matters is that the potential must decrease when
moving in the direction of the field.
The capacitance is now easy:
$$C = \frac{Q}{|V|} = \frac{2\pi\epsilon_0 L}{\ln\frac{b}{a}} \tag{4.27}$$
which has the right units – ε0 times a length. Still, it isn’t at all obvious that this has the
limiting form of ε0 A/d. For homework, you are asked to show that it does, after all, have
this form. You might want to remember that ln(1 + x) ≈ x for x ≪ 1 is the limiting form of
the power series expansion for the natural log function when you get to this part of the first
problem.
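Before doing the algebra, it can be reassuring to check the limit numerically. The sketch below compares equation 4.27 with ε0 A/d as the gap δ = b − a shrinks, taking A = 2πaL and d = δ (that identification is my assumption about which area and separation to use). It is a sanity check, not a substitute for the derivation; the names and dimensions are illustrative.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (standard SI value)


def cylindrical_capacitance(a, b, L):
    """Equation 4.27: C = 2 pi eps0 L / ln(b / a)."""
    return 2.0 * math.pi * EPS0 * L / math.log(b / a)


def ratio_to_parallel_plate(a, delta, L):
    """Exact C divided by eps0 * A / d, taking A = 2 pi a L and d = delta."""
    exact = cylindrical_capacitance(a, a + delta, L)
    plate = EPS0 * (2.0 * math.pi * a * L) / delta
    return exact / plate


a, L = 0.01, 1.0   # assumed inner radius and length in metres
for delta in (1e-3, 1e-4, 1e-5):
    ratio = ratio_to_parallel_plate(a, delta, L)
    print(f"delta = {delta:.0e} m   C_exact / C_plate = {ratio:.6f}")
```

The ratio approaches one as δ/a becomes small, which is what the ln(1 + x) ≈ x expansion predicts.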
Figure 4.6: A spherical capacitor with inner radius a and outer radius b. (Figure labels: charges +Q and −Q, radii a and b, and a surface S of radius r.)
Given two concentric spherical conducting shells with the radius of the inner one a and the
outer one b such that δ = b − a, find their capacitance. This is pictured in figure 4.6.
Solution: At this point, the steps should be familiar. Imagine a charge ±Q on the inner
and outer shell respectively, put there by our intrepid devil. From Gauss’s Law:
$$E_r = \frac{k_e Q}{r^2} \qquad a < r < b$$
and Er = 0 otherwise, with no idealization or fringing fields. From this we trivially find:
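$$\begin{aligned}
V = \Delta V &= -\int_b^a E_r\, dr \\
&= k_e Q \left( \frac{1}{a} - \frac{1}{b} \right) \\
&= k_e Q\, \frac{b - a}{ab} \\
&= \frac{1}{4\pi\epsilon_0}\, Q\, \frac{b - a}{ab}
\end{aligned} \tag{4.28}$$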
This time I cleverly integrated from the outside in, recognizing that this would give me
a positive potential difference as I integrate against the direction of the field. Now finding
the capacitance is easy:
$$C = \epsilon_0 \frac{4\pi ab}{b - a} \tag{4.29}$$
where I’ve deliberately arranged it this way as a hint as to how to proceed to answer the
“limiting form” part of the first homework problem.
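Two limits of equation 4.29 are easy to probe numerically: a very distant outer shell, which should recover the isolated sphere of equation 4.21, and a very thin gap, which should look like a parallel plate capacitor with A = 4πa² and d = b − a (that identification is my assumption). The sketch below is a numerical check only; names and dimensions are illustrative.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (standard SI value)


def spherical_capacitance(a, b):
    """Equation 4.29: C = eps0 * 4 pi a b / (b - a)."""
    return EPS0 * 4.0 * math.pi * a * b / (b - a)


a = 0.05   # assumed inner radius in metres

# Limit 1: outer shell very far away, compared with C = 4 pi eps0 a
far_ratio = spherical_capacitance(a, 1.0e9 * a) / (4.0 * math.pi * EPS0 * a)

# Limit 2: thin gap, compared with eps0 * A / d using A = 4 pi a^2
delta = 1.0e-5
thin_ratio = spherical_capacitance(a, a + delta) / (EPS0 * 4.0 * math.pi * a**2 / delta)

print(f"distant outer shell: C / (4 pi eps0 a) = {far_ratio:.9f}")
print(f"thin gap:            C / (eps0 A / d)  = {thin_ratio:.6f}")
```

Both ratios are close to one, which suggests where the algebra in the homework problem should end up.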
It’s time to compute how much work our little devil dude does shovelling charge from one
plate over to the other. Imagine that he starts with the plates uncharged. The first pitchfork
full of charge ∆Q ⇒ dQ that he moves over is “free”. There is no field to push against yet.
The second one, however, he must push against the field of the first one. The third one he
must push against the field of the total charge of the first two. And so on.
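Figure 4.7: The energy as the area underneath the curve V (Q) = Q/C. (Axes: V(Q) versus Q. The line has slope 1/C, a thin strip under it has area dU = V dQ, and the triangle out to the point (Q0, V0) has area U = 1/2 Q0 V0.)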
Suppose he has been shovelling for a while on a capacitor C (where the particular
geometry of the capacitor does not matter as long as we know the capacitance) and at
this moment the total charge on capacitor plates is ±Q, so that:
$$V = \frac{Q}{C} \tag{4.30}$$
is the potential difference between the plates. Then the next chunk of charge that he
moves over with his little pitchfork (against the field/potential difference of the charge that
is already there) requires him to do the work of the devil (see footnote 62):
$$dW_{\text{devil}} = dU_{\text{capacitor}} = V\, dQ \tag{4.31}$$
This is illustrated in figure 4.7. The work the blue devil does charging up the plates is equal
to the change in the potential energy of the charged plates (see footnote 63). We therefore write:
$$dU = V\, dQ = \frac{Q\, dQ}{C} \tag{4.32}$$
and can easily integrate both sides to find the total energy stored on the capacitor when
we begin with no charge and charge it up to a total charge Q0 :
$$U = \int dU = \frac{1}{C} \int_0^{Q_0} Q\, dQ = \frac{1}{2} \frac{Q_0^2}{C} \tag{4.33}$$
Footnote 62: Metaphorically speaking, of course...
Footnote 63: Think of the work you do lifting a book over your head being equal to the increase in its gravitational
potential energy – the work done by gravity, or the electric field in the case of the capacitor, is the opposite of
the work done by you or the devil.
We can thus easily write the total energy stored three ways:
$$U = \frac{1}{2} \frac{Q_0^2}{C} = \frac{1}{2} C V_0^2 = \frac{1}{2} V_0 Q_0 \tag{4.34}$$
(where we use Q0 = CV0 to go from the first to the second, then use it again to go
to the third). Which of these you end up using in any given problem most
often depends on the givens – in some problems, you’ll know Q and C; in others, Q and
V. My usual advice to students is to be certain to learn at least one of these forms plus
C = Q/V – then it is easy to find the other two as needed.
Of these, the third form is perhaps the best one to learn as it has a very simple graphical
interpretation. If we plot V (Q) = Q/C, we get a straight line of slope 1/C. The integral
of dU = V dQ is just the area under this straight line at the particular values Q0 and
V0 = Q0/C. This, in turn, is just the area of a triangle – one half the base times the height.
Which is, as you can easily see in figure 4.7, $U = \frac{1}{2} Q_0 V_0$.
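The "area under the line" picture can be mimicked by adding up many small pitchforks of charge. The sketch below sums dU = V dQ = (Q/C) dQ with a midpoint rule and compares the result with the three closed forms of equation 4.34. The function name and the capacitor values are illustrative assumptions.

```python
import math


def charging_work(C, Q0, steps=1000):
    """Sum dU = V dQ = (Q / C) dQ from Q = 0 up to Q = Q0 (midpoint rule)."""
    dQ = Q0 / steps
    work = 0.0
    for i in range(steps):
        Q_mid = (i + 0.5) * dQ        # charge already on the plates for this chunk
        work += (Q_mid / C) * dQ      # work needed to move the next chunk dQ
    return work


C, V0 = 100.0e-6, 12.0    # assumed capacitance (F) and final voltage (V)
Q0 = C * V0               # final charge from Q0 = C V0

U_sum = charging_work(C, Q0)
forms = (0.5 * Q0**2 / C, 0.5 * C * V0**2, 0.5 * V0 * Q0)
print(f"summed work: {U_sum:.6e} J")
print("closed forms:", ", ".join(f"{U:.6e} J" for U in forms))
```

Because V(Q) is a straight line, the midpoint sum reproduces the triangle area essentially exactly, however coarse the steps are.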
A very important question to ask is: just where is all of this energy in the capacitor stored?
We did a lot of work charging up the capacitor, and all of the work we can get back comes
from charge we’ve stored in this way being driven by the electric field of the charge itself
back into equilibrium as the separated charges neutralize and the field collapses. It is
therefore reasonable to guess that the energy is stored in the electric field we create as we
rearrange the charge in the first place.
Can we write the energy of the capacitor in terms of the field strength? Yes we can!
For simplicity, we’ll as usual in this chapter consider the parallel plate capacitor to see how.
In this course, we will then limit ourselves to verifying that this works in every case we can
compute directly from the potential as well as from the electric field energy density, that
is, that the result is consistent with the energy computed for e.g. spherical or cylindrical
capacitors, or with just the energy stored creating a uniform ball of charge or spherical shell
of charge. This isn’t quite a proof that it is general, but it certainly seems as though it makes
it more likely. We will defer to your next (more advanced) course in electrodynamics to
derive the result more precisely, where energy conservation in electromagnetism is known
as Poynting’s Theorem (see footnote 64).
Consider, then, the energy stored in a parallel plate capacitor and write it in terms of
the electric field strength:
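$$\begin{aligned}
U = \frac{1}{2} C V^2 &= \frac{1}{2} \frac{\epsilon_0 A}{d} (Ed)^2 \\
&= \frac{1}{2} \epsilon_0 E^2 (Ad) = \frac{1}{2} \epsilon_0 E^2 \times (\text{Volume where field isn't zero})
\end{aligned} \tag{4.35}$$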
2 2
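where Ad is the volume of the region in between the plates where the field is nonzero and
constant in our idealized picture (neglecting fringing fields). If we divide both sides of this
by that volume, we obtain the energy density of the electric field given in equation 4.7, ηe = 1/2 ε0 E².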
Footnote 64: Wikipedia: http://www.wikipedia.org/wiki/Poynting’s Theorem. We could almost do the integral form of the
theorem in this course, but its proper derivation and formulation requires both Maxwell’s equations in differential
form and some “real” multivariate calculus in the form of differential vector identities...
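One of the consistency checks mentioned above can be done numerically. The sketch below integrates the energy density (1/2) ε0 E² over thin spherical shells between the conductors of a spherical capacitor, using Er = ke Q/r² from Gauss’s Law, and compares the result with Q²/(2C) using equation 4.29. The function name, step count, and dimensions are illustrative assumptions.

```python
import math

EPS0 = 8.8541878128e-12            # vacuum permittivity in F/m (standard SI value)
K_E = 1.0 / (4.0 * math.pi * EPS0)  # Coulomb constant k_e


def field_energy_spherical(Q, a, b, steps=20000):
    """Integrate (1/2) eps0 E^2 over the shells a < r < b (midpoint rule)."""
    dr = (b - a) / steps
    total = 0.0
    for i in range(steps):
        r = a + (i + 0.5) * dr
        E = K_E * Q / r**2                      # field between the shells
        shell_volume = 4.0 * math.pi * r**2 * dr
        total += 0.5 * EPS0 * E**2 * shell_volume
    return total


Q, a, b = 1.0e-8, 0.05, 0.10       # assumed charge (C) and radii (m)
C = EPS0 * 4.0 * math.pi * a * b / (b - a)

U_field = field_energy_spherical(Q, a, b)
U_capacitor = 0.5 * Q**2 / C
print(f"energy from the field density: {U_field:.6e} J")
print(f"energy from Q^2 / (2C):        {U_capacitor:.6e} J")
```

The two numbers agree to within the accuracy of the numerical integration, which supports (but of course does not prove) the idea that the energy lives in the field.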
At this point, we know how to compute the capacitance of our three “simple” geometries,
and know in principle how to proceed for more complicated cases (although the integrals
and so on may be very difficult in the general case, as always). Once we’ve either com-
puted or, even better, measured the capacitance of a capacitor, we won’t really care much
what the geometry is. We can start to treat a capacitor as an “object” in its own right, and
give it a symbol to use in designing e.g. electrical circuits. Our “standard symbol” for a
capacitor will be a pair of stylized “plates” viewed edgewise, with a wire running into each
plate.
Let’s use this symbol (and our knowledge that C = Q/V ) and compute the total capac-
itance of series and parallel arrangements of capacitors. We’ll start with series.
In figure 4.8 we see two arrangements. The top arrangement consists of three capac-
itors, labelled C1 , C2 , C3 , in a line, so that the tail of each is connected to the head of the
next one by a conducting wire (which appears as a simple straight line in the figure). This
arrangement is called series as each capacitor “follows” the next. Underneath this is a
single capacitor labelled Ctot .
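[Figure 4.8: Top: three capacitors C1, C2, C3 in series, each with charge +Q and −Q on its plates and potential differences V1, V2, V3. Bottom: a single capacitor Ctot with charge +Q and −Q and potential difference Vtot.]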
We need to find what Ctot has to be for these two arrangements to behave identically
in an electrical circuit. That is, when our devil-dude moves a charge Q from one end to
the other end, we want the potential difference between the ends to be exactly the same.
Here’s how you can understand what goes on.
Suppose you have a charge +Q on the leftmost plate as shown (which came from the
rightmost plate in either arrangement, leaving behind a charge of −Q). This pair of charges
creates a field in between. However, there can be no field in the conducting plates and
wires in the middle of the top row – they are in equilibrium! To cancel the field produced
by the first plate, a charge −Q is attracted to the plate facing it. But it cannot come from
any part of the conducting plates or wires in between, it has to come from the surface of
the next plate (leftmost of capacitor C2 ) charging it up to +Q. This in turn attracts −Q
to the right plate of C2 , leaving a charge +Q on the left plate of C3 . At this point (and
you should check this) the capacitors should all be happy. Each one has a charge ±Q on
it, with a field confined to live only between its plates. The field is zero inside the plates
themselves and in the connecting wires. Note that all we really used in this reasoning is
charge conservation – we couldn’t create charges anywhere, only move charges around –
and the idea that conductors in equilibrium can have no field inside.
Now consider the potential differences across each capacitor on top. Clearly the po-
tential difference across C1 is V1 = Q/C1 , the potential difference across C2 is V2 = Q/C2 ,
across C3 is V3 = Q/C3 . Similarly the potential difference across our desired total ca-
pacitance is Vtot = Q/Ctot , since it has to have the same charge on its left plate as the
arrangement on top.
Each wire between the capacitors is equipotential, because conductors in electrostatic
equilibrium have no field inside and are thus equipotential. If we want to find the total
potential difference across the top row of capacitors, we just have to add up the potential
difference across each capacitor. You can think of this as doing a piecewise continuous
integral across the wire at one end (get zero), the gap (pick up potential difference V1 ),
across the next wire (get zero), across the next capacitor’s gap, (get V2 ) etc. We end up
with the two equations for the upper and lower arrangements:
$$V_{tot} = V_1 + V_2 + V_3 + \ldots = \frac{Q}{C_1} + \frac{Q}{C_2} + \frac{Q}{C_3} + \ldots \tag{4.38}$$
$$V_{tot} = \frac{Q}{C_{tot}} \tag{4.39}$$
where the dots indicate that there was nothing special about three capacitors in a row –
there could have been any number! We just add the potentials across as many as we have
(with the same charge on each capacitor) to get the total potential difference for the series
row.
These two forms must be equal for equal Q on the two arrangements. That’s the defini-
tion of the total capacitance of the upper arrangement – the equivalent single capacitor one
could replace the row with and get the same potential difference for the given Q. Equating
them and cancelling the common Q, we get:
$$\frac{1}{C_{tot}} = \frac{1}{C_1} + \frac{1}{C_2} + \frac{1}{C_3} + \ldots = \sum_i \frac{1}{C_i} \tag{4.40}$$
where again the ... and final summation indicate that we just sum over as many capacitors
as there are in the series row. For capacitors in series, the reciprocal of the total
capacitance equals the sum of the reciprocals of the individual capacitors in series.
Why is this rule so odd? Because in series, we would get a more intuitive result by
thinking of adding capacitors as if they were volticitors, and “volticitance” is the reciprocal
of the capacitance!
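As a quick numerical check of the series rule, here is a minimal Python sketch. The function names and the capacitor values are illustrative assumptions, not anything from the text; the code simply evaluates equations 4.38 through 4.40 for three capacitors that share the same charge Q.

```python
def series_capacitance(capacitances):
    """Equivalent capacitance of capacitors in series: 1/Ctot = sum(1/Ci)."""
    if not capacitances:
        raise ValueError("need at least one capacitor")
    return 1.0 / sum(1.0 / c for c in capacitances)


def series_voltages(charge, capacitances):
    """Potential difference Vi = Q/Ci across each capacitor for a shared charge Q."""
    return [charge / c for c in capacitances]


# Illustrative values (assumed): 1, 2 and 3 microfarads, Q = 6 microcoulombs.
caps = [1e-6, 2e-6, 3e-6]
Q = 6e-6

C_tot = series_capacitance(caps)
V_each = series_voltages(Q, caps)

print("Ctot =", C_tot)                    # smaller than the smallest capacitor
print("sum of Vi =", sum(V_each))         # total potential difference, eq. 4.38
print("Q / Ctot  =", Q / C_tot)           # the same number from eq. 4.39
```

Note that the total capacitance comes out smaller than the smallest capacitor in the row, as the reciprocal rule requires.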
Why is series addition of capacitors important and useful? Putting capacitors in series
reduces the total capacitance (check this for yourself!) and isn’t a big capacitor better
than a small one? Well, yes and no. It turns out that most capacitors can only support
a finite voltage across them before dielectric breakdown occurs across the intervening
gap, shorting them out and burning them out. If you want to put more voltage than that
maximum across a capacitor in a circuit (and don’t have any rated at the desired voltage)
you can put a bunch of capacitors rated at a lower voltage in series until you can put the
desired voltage across them without exceeding the maximum for any single capacitor in the
series leg. Or, you might have a bunch of big capacitors in your box and need a smaller
one that wasn’t in your box – adding several up in series can let you save a trip to Radio
Shack!
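The voltage-rating trick can be sketched in a few lines of Python. This is an idealized estimate under stated assumptions: the capacitors are identical, so the total voltage divides equally among them, and the ratings and capacitance used are hypothetical numbers chosen for illustration. Real capacitors have tolerances, so a practical design would leave extra margin.

```python
import math


def capacitors_needed(v_required, v_rated):
    """Smallest number n of identical series capacitors with V/n <= v_rated."""
    return math.ceil(v_required / v_rated)


def stack_capacitance(c_single, n):
    """n identical capacitors in series: 1/Ctot = n/C, so Ctot = C/n."""
    return c_single / n


# Hypothetical parts: 10 microfarad capacitors rated at 300 V, 1000 V needed.
V_REQUIRED = 1000.0
V_RATED = 300.0
C_SINGLE = 10e-6

n = capacitors_needed(V_REQUIRED, V_RATED)
C_stack = stack_capacitance(C_SINGLE, n)
V_per_capacitor = V_REQUIRED / n

print(n, "capacitors,", C_stack, "F total,", V_per_capacitor, "V across each")
```

The price of the higher voltage rating is visible in the result: the stack has only 1/n of the capacitance of a single capacitor.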
So how about parallel? When several circuit elements are connected on both sides
by a common conductor, the conductor on each side is equipotential. That means that all
of the elements have the same potential difference across them. Note that this time I am
not bothering to explicitly indicate the charge −Q1 etc on the other plate of each capacitor.
Recall, a capacitor is presumed to always have equal and opposite charges on its plates
unless someone goes far out of their way to make up a problem with something different.
In figure 4.9 each capacitor in the top arrangement has a potential V across it. There-
fore the first capacitor has a charge Q1 = C1 V , the second has a charge Q2 = C2 V , the
third Q3 = C3 V . The equivalent total capacitance Ctot with the same voltage V across it
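has a charge Qtot = Ctot V on it. For them to be the same, the total charge stored on the
top arrangement has to equal that on the bottom.
[Figure 4.9: top, three capacitors C1, C2, C3 in parallel carrying charges Q1, Q2, Q3 with a common potential difference V; bottom, the equivalent single capacitor Ctot carrying charge Qtot with the same potential difference V.]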
This makes the problem of finding the total capacitance really easy!
$$Q_{tot} = Q_1 + Q_2 + Q_3 + \ldots$$
$$C_{tot} V = C_1 V + C_2 V + C_3 V + \ldots$$
$$C_{tot} = C_1 + C_2 + C_3 + \ldots = \sum_i C_i \tag{4.41}$$
where we note that our rule works for any number of capacitors in parallel and write the final
rule accordingly. Capacitors in parallel add!
We can understand these two rules intuitively in the following way. Capacitors in parallel
increase the effective area where charge is stored, and hence just add. Capacitors in
series increase the effective separation of the plates for a given area, and hence reduce
the capacitance, adding reciprocally.
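The area and separation picture is easy to test numerically. The sketch below assumes ideal parallel plate capacitors with C = ε0 A/d (fringe fields ignored); the plate size and gap are made-up values and the function names are ours. Two identical capacitors in parallel should match one capacitor with twice the area, and two in series should match one with twice the gap.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space in F/m


def plate_capacitance(area, gap):
    """Ideal parallel plate capacitor, C = eps0 * A / d (fringe fields ignored)."""
    return EPS0 * area / gap


def parallel_capacitance(capacitances):
    """Capacitors in parallel add: Ctot = sum(Ci)."""
    return sum(capacitances)


def series_capacitance(capacitances):
    """Capacitors in series add reciprocally: 1/Ctot = sum(1/Ci)."""
    return 1.0 / sum(1.0 / c for c in capacitances)


# Hypothetical plates: 10 cm x 10 cm with a 0.1 mm gap.
A, d = 0.01, 1.0e-4
C = plate_capacitance(A, d)

C_parallel = parallel_capacitance([C, C])  # should equal a capacitor of area 2A
C_series = series_capacitance([C, C])      # should equal a capacitor of gap 2d

print(math.isclose(C_parallel, plate_capacitance(2 * A, d)))
print(math.isclose(C_series, plate_capacitance(A, 2 * d)))
```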
Before moving on, it is important to make one final observation. Capacitors (as we shall
see) behave in electrical circuits the way springs behave in mechanical systems – they
store energy and exert a restoring force on the charges that are stored that is proportional
to the charge. Note well the analogy:
$$F_x = -k_s x \tag{4.42}$$
$$V = -\frac{1}{C} Q \tag{4.43}$$
where 1/C behaves like a “spring constant” and where the minus sign indicates that the
potential created opposes the addition of more charge (we ignore this in the definition of
C, but used it in the computation of U ). If one computes the effective spring constant of
springs in parallel or in series, one obtains very similar results. Springs in parallel add,
with a total spring constant equal to the sum of the spring constants. Springs in series add
as reciprocals, where the total spring constant is less than the smallest constant of the
springs in the series.
Later we will learn that this analogy is nearly exact, after we discover the quantities
which behave like “friction” or “drag forces” in circuits and even discover a quantity that
behaves like a “mass”. In the end we will find ourselves solving an equation that is identical
in form to the damped, driven harmonic oscillator studied last semester, only this equation
will yield the currents flowing in the circuit as a function of time. At that time it will be very
fruitful to be thinking “the capacitor is like a spring” to help us understand what is going on.
4.4: Dielectrics
We have taken some care to study electric dipoles as the most common arrangement of
matter that leads to an electric field, given the generally neutral character of matter. In-
deed, all of the capacitors studied above can be thought of as stylized “dipoles” storing
energy by separating charge. We have also observed that conductors placed in an electric
field polarize and create a (mostly dipolar) arrangement of surface charge that completely
cancels the electric field inside. But what of insulators? They too are made up of neutral
atoms and molecules, but lack the “free charges” that carry current, as the electrons as-
sociated with each molecule prefer to stay home instead of wandering off long distances
under the influence of any vagrant electric field.
To understand what a neutral atom does in the presence of an electric field, it will be
very useful to have a model of an atom. We know that an atom consists of a tiny, massive
nucleus with a charge +Ze where Z is the atomic number of the atom. Surrounding this
nucleus is a “cloud” of Z electrons (for a total charge of −Ze resulting in an electrically
neutral atom), bound to the nucleus by the electrostatic force. We rather expect the neutral
atom to be spherically symmetric in its distribution of charge so that there is little or no
electric field outside of the charge cloud.
We still don’t know all of Maxwell’s equations, but when we do, we will be forced to con-
front the unpleasant truth that it is impossible for the electrons to be moving in “convenient”
planetary-style classical orbits and for Maxwell’s equations to be true. Of course we also
don’t know how to solve the associated quantum problem. So we might as well construct
the simplest possible model and hope that it provides us with some insight.
The model we will build is to imagine the atom to consist of a pointlike nucleus surrounded
by a uniform ball of negative charge with a total charge of −Ze and a radius a (where
a is around one angstrom). This is called the Lorentz model for the atom, and works
surprisingly well – so much so that physics graduate students still use a dynamical version
to understand dielectric polarization and dispersion! See figure 4.10:
Figure 4.10: An “atom” consisting of a tiny massive nucleus surrounded by a uniform ball
of negative charge modelling the “electron cloud”.
Now we can easily compute what will happen when we place this atom into a “weak”
electric field! We imagine that the field doesn’t change the shape or size of the electron
cloud but simply displaces the nucleus away from its equilibrium position in the center to a
new equilibrium where the force exerted on it by the external electric field $\vec{E}_0$ balances the
force on it due to the electron cloud.
The upward field is E0 in the +z direction. The electric field of a uniform distribution of
−Ze in a ball of radius a is (see above or better yet, use Gauss’s Law to derive it again for
yourself):
$$E_{atom} = -\frac{k_e (Ze) z}{a^3} \tag{4.44}$$
(down). Thus the forces balance when:
$$ZeE_0 - \frac{k_e (Ze)^2 z_0}{a^3} = 0 \tag{4.45}$$
We can then solve for the dipole moment of the polarized atom:
$$p_z = (Ze) z_0 = \frac{a^3}{k_e} E_0 = 4\pi\epsilon_0 a^3 E_0 \tag{4.46}$$
There are two very important things to note about this. One is that the polarization of
the model atom is directly proportional to the applied field. Second, since each atom has
a dipole moment of this magnitude, one can compute the average dipole moment per unit
volume by dividing this estimate by the approximate volume occupied by each polarized
atom in a solid or liquid or gas. We call this “dipole moment per unit volume” the polarization
of the material and give it the (vector) symbol $\vec{P}$. If (for example) we imagine a simple cubic
lattice of spherical atoms, there is one atom per cube of side 2a, with volume 8a³. Thus:
$$P = \frac{p_z}{8a^3} = \frac{\pi}{2}\epsilon_0 E_0 \tag{4.47}$$
where E0 is the field in the immediate vicinity of the atom (which in general will be the field
inside the material, not necessarily the applied external field).
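To get a feel for the sizes involved, here is a small Python sketch of the model atom. The field strength E0 and the choice Z = 1 are hypothetical values picked for illustration, and the function names are ours; the radius a = 1 angstrom is the rough value quoted above. The sketch solves the force balance (4.45) for the displacement z0 and checks that (Ze)z0 agrees with the closed form (4.46).

```python
import math

EPS0 = 8.854e-12                      # permittivity of free space, F/m
K_E = 1.0 / (4.0 * math.pi * EPS0)    # Coulomb constant, N m^2 / C^2
E_CHARGE = 1.602e-19                  # elementary charge, C


def nucleus_displacement(E0, Z, a):
    """Equilibrium displacement z0 from Z e E0 = k_e (Z e)^2 z0 / a^3."""
    return a**3 * E0 / (K_E * Z * E_CHARGE)


def dipole_moment(E0, a):
    """Induced dipole moment p = 4 pi eps0 a^3 E0 (independent of Z)."""
    return 4.0 * math.pi * EPS0 * a**3 * E0


a = 1.0e-10    # atomic radius, about one angstrom
Z = 1          # hypothetical atomic number
E0 = 1.0e6     # hypothetical applied field in V/m

z0 = nucleus_displacement(E0, Z, a)
p = dipole_moment(E0, a)

print("z0 / a =", z0 / a)                        # tiny compared to the atom
print("(Ze) z0 =", Z * E_CHARGE * z0, " p =", p)  # the two forms agree
```

Even in this fairly strong field the nucleus moves only a minute fraction of the atomic radius, which is why treating the field as “weak” and the response as linear is reasonable.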
There was nothing special about our guesstimate of a volume of 8a³ per atom, and of
course the actual field will probably not be exactly what we compute above in the model
– we might well expect it to depend on the kind of atom and its quantum structure, on the
time dependence of the field (if any) and perhaps on still other things – but we nevertheless
expect that the restoring force will be linear in the charge displacement for weak fields because
of the usual argument: a Taylor series expansion of the energy about the equilibrium
position gets its leading contribution from the quadratic piece, corresponding to a
linear restoring force.
[Figure: the displaced nucleus +Ze inside the electron cloud −Ze, acted on by the applied force +ZeE0 and the opposing force −ZeEatom from the cloud.]
Overall, we expect quite generally that an insulating material will polarize, that the polarization
for weak to moderate field strengths will be linear in the field, and that the order of
the polarization density will be some pure number times ε0 E. We give that dimensionless
number a special name and its own symbol – we call it the electric susceptibility χe such
that:
$$\vec{P} = \chi_e \epsilon_0 \vec{E} \tag{4.48}$$
Note well that the units of polarization are coulombs per square meter – those of surface
charge density. It remains to find a surface for which the polarization tells us a surface
charge density.
To continue our observations above, χe will, in general, be characteristic of the material;
it will depend on whether the material is solid or liquid or gas (gases usually have a very
weak polarization response because of the large volume occupied per atom) and of course
upon the neglected details of the material in our model – the quantum structure and/or
molecular structure of the material. For solids and liquids it will generally be of the order
of unity – in our example, χe = π/2 ≈ 1.57 – whereas for gases it will usually be “small” as
there simply aren’t a lot of atoms or molecules per unit volume, so no matter how well they
polarize individually you won’t build up much of a polarization density.
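The contrast between dense matter and a gas can be estimated with the same model. In the sketch below we assume that every atom has the model dipole moment p = 4πε0 a³E, so that P = np gives χe = 4πa³n for n atoms per unit volume. That formula, the ideal-gas number density n = p/(kB T) at one atmosphere and 20°C, and the use of a = 1 angstrom for both cases are our simplifying assumptions, not results from the text, so treat the output as an order of magnitude only.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def susceptibility(number_density, a):
    """chi_e = n * 4 pi a^3, from P = n p with p = 4 pi eps0 a^3 E."""
    return number_density * 4.0 * math.pi * a**3


a = 1.0e-10                           # model atomic radius, m

n_solid = 1.0 / (8.0 * a**3)          # one atom per cube of side 2a
n_gas = 101325.0 / (K_B * 293.15)     # ideal gas at 1 atm and 20 C (assumed)

chi_solid = susceptibility(n_solid, a)
chi_gas = susceptibility(n_gas, a)

print("chi_e (packed atoms) =", chi_solid)   # pi/2, as in the example above
print("chi_e (ideal gas)    =", chi_gas)     # several thousand times smaller
```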
We are only interested in the static limit of the susceptibility in this intro course, but
it really depends on the time dependent behavior of the electric field, on temperature,
and much more. It takes the charge in a real material time to respond to changes in the
applied field and response times depend on the natural frequencies and damping times of
the charges that are responding. Many physicists have spent their entire careers studying
quantities that amount to general susceptibilities for various materials (which can have very
odd properties indeed!)
Now that we understand what each atom in an insulating material does when the material
is placed in an external field, let’s try to understand what the material as a whole does – in
particular, what happens to the electric field inside, which is now the sum of the external
field and the field produced by all of those dipoles!
[Figure 4.12: a lattice of polarized atoms in a block of thickness t and cross-sectional area A in an applied field E0, with surface charge +σ on the top face and −σ on the bottom face, charge density ρ = 0 in the interior, and Gaussian surfaces S (interior), Stop, and Sbottom.]
In figure 4.12, we see an imaginary lattice of atoms, all polarized by an external field in
the direction indicated. Note well that we’ve erased the details of even our simple model
– we represent each atom as a neutral object with a small dipole moment where “some”
charge is split by “some” distance by the general process derived and discussed in the
previous section. We’ve drawn several possible Gaussian Surfaces inside the material.
Now let us use Gauss’s Law. On the inside, if we draw any Gaussian Surface S large
enough to contain “many atoms”, since the atoms are neutral the average charge inside
will be zero 65 .
Note that even where it contains an extra charge or two of either sign by splitting an
atom, those charges are almost always paired with charges above or below on the neigh-
boring atoms and the bulk remains neutral, with an average charge density ρ ≈ 0. The
interior atoms, then, do not directly modify the average field.
This is not true on the surface. If we draw a Gaussian surface Stop so that it just
contains the upper half of the polarized atoms we see that it contains a nonzero positive
charge; inside a similar surface Sbottom on the lower surface there is an equal and opposite
negative charge. These charges make up a surface charge layer with a surface charge
density ±σb that is directly proportional to E, the net field in the medium.
65. If it contained an integer number of whole atoms, it would be exactly zero. If the surface cuts through
atoms to include or exclude some of their charge, the surplus charge is limited to be some fraction of the
charge on the atoms on the surface. But the number of atoms on the surface scales with the characteristic
length scale of the volume D like D2 where the volume inside the surface scales like D3 , so the average
charge scales smoothly to zero as the volume gets larger.
Note Well: I put a subscript “b” on σ to indicate that this kind of “surface charge”
produced by the polarization of neutral insulator atoms or molecules where the plus
and minus charge is “bound” together and not “free” to move as it is in a conductor is
generally referred to as bound charge. We will only consider bound surface charge σb in
this course as the most common important case, but in principle one can generate bound
bulk charge distributions ρb .
In contrast, the charge we have discussed up to this point is primarily “bare”, isolated,
normal, unbalanced charge, the charge that is directly producing electric fields or potentials
that we have evaluated various ways. In contexts where both are present, we will usually
differentiate them by means of “f” and “b” subscripts: A net free charge might be referred
to as Qf and a net bound charge might be referred to as Qb . Now, back to the thread of
our discussion.
Let us understand this in this particularly simple case, where the upper and lower sur-
faces are conveniently perpendicular to the field and the cross-section of the material is
rectangular. The total dipole moment of the system is given by the total charge on the
upper or lower surface, times the thickness t (recall that all the charges in between sum to
zero). That is:
$$p_{system} = Q_{surface} t = (\sigma_b A) t = P V = P (At) \tag{4.49}$$
(all in the direction of the field) or clearly:
$$\sigma_b = P \tag{4.50}$$
This argument is actually more general than one might suspect – if you think about it in
terms of calculus you can see why it would be true for less conveniently shaped objects in
a uniform field and how it might be changed to accommodate an angle between the polarization
density direction at a surface and the normal to the surface there. In any event, the
modifications of the field we deduce from this below are reasonably general and hold for
arbitrary objects in nearly arbitrary fields66 .
Now let’s imagine this figure redrawn on a length scale where atoms are tiny – too small
to be seen in the figure (as they are in any macroscopic chunk of matter large enough to be
seen with the naked eye). When we consider the field between the surface charge layers,
the block of matter starts to look like, and behave like, a capacitor internally, with a reaction
field Er that flows from the positive to the negative charge layers in the opposite direction
to the applied external field. This situation is portrayed in figure 4.13.
Applying Gauss’s Law to the induced surface charge layers in this simple rectangular
geometry, we expect:
$$E_r = \frac{\sigma_b}{\epsilon_0} \tag{4.51}$$
66. Truly advanced students might look ahead at a book on electrodynamics and learn how this statement
is not precisely true, how the polarization density itself satisfies certain partial differential equations,
and how our entire picture at this level relies on a linear response that is at best an (often quite good)
approximation.
Figure 4.13: The polarized material generates a reaction field Er that opposes the applied
field and partially cancels it, making the total field in the material smaller, E = E0/εr. A dielectric
material thus reduces the applied electric field inside the material.
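Adding the (oppositely directed) reaction field to the applied field and using σb = P = χe ε0 E, the net
field in the material is E = E0 − Er = E0 − χe E, so that E = E0/(1 + χe) = E0/εr, where the relative
permittivity of the material is:
$$\epsilon_r = (1 + \chi_e) \tag{4.55}$$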
Note Well: Most introductory physics books written for college or high school physics
courses omit any explicit mention of the susceptibility (leaving students with quite a chore
later if they go on in physics and have never seen it the next time they take electricity and
magnetism) and use the symbol κ to represent 1 + χe and call it the dielectric constant for
the material, as in:
$$\kappa = (1 + \chi_e) = \epsilon_r \tag{4.56}$$
This may seem very confusing to you, so let me review. ε0 is functionally equivalent to
ke, a constant of nature that connects the units of charge and length to those of field and
force at the microscopic scale of elementary particles (or in a vacuum), where of course
ke = 1/(4πε0). The presence of bulk neutral matter modifies the electric field $\vec{E}_0$ produced
by bare/isolated/free charges Qf that would be there in a vacuum; the field polarizes the
material, which creates a reaction field that strictly reduces the applied field inside the
material. The polarization density (dipole moment per unit volume) of the medium is related
to the net field in the medium $\vec{E}$ by $\vec{P} = \chi_e \epsilon_0 \vec{E}$. The net field itself is related to the applied
field by $\vec{E} = \vec{E}_0/\epsilon_r$ where εr = 1 + χe.
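The chain of relations in this review can be checked for self-consistency with a few lines of Python. The applied field and the susceptibility below are hypothetical values chosen for illustration, and the function name is ours. Starting from E0 and χe, the sketch computes the net field, the polarization, the bound surface charge and the reaction field, and confirms that E0 − Er reproduces the net field E = E0/εr.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def slab_response(E0, chi_e):
    """Fields and bound charge for a linear dielectric slab in an applied field E0."""
    eps_r = 1.0 + chi_e          # relative permittivity
    E = E0 / eps_r               # net field inside the material
    P = chi_e * EPS0 * E         # polarization density, C/m^2
    sigma_b = P                  # bound surface charge density (surface normal to field)
    E_r = sigma_b / EPS0         # reaction field of the bound charge layers
    return {"eps_r": eps_r, "E": E, "P": P, "sigma_b": sigma_b, "E_r": E_r}


# Hypothetical numbers: 1000 V/m applied to a material with chi_e = 1.5.
E0 = 1000.0
result = slab_response(E0, chi_e=1.5)

print(result)
print("E0 - E_r =", E0 - result["E_r"], " E =", result["E"])
```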
There is one more thing we can do with the relative permittivity, the thing that gives it
its name. We can use it to define the permittivity of any medium:
$$\epsilon = \epsilon_r \epsilon_0 \tag{4.57}$$
This form proves to be most useful in the more advanced treatments of electrodynamics
that e.g. physics majors will take that build on this course, but is beyond the scope of this
course. It is still worth reading about in passing for “culture”, or to plant a seed or two that
might flower later if you continue studying physics. If this does not describe you (and it well
might not!) feel free to skip the material between the next two separator lines.
We see that the field produced by the usual free charge we considered in the first three
chapters changes form “suddenly” – is displaced – at the surface of neutral dielectric
materials. It is useful to define a new field, closely related to the electric field (and force)
experienced by a bare test charge anywhere in space in a medium of some sort or not. We
will think of this new field as being produced only by bare unbalanced charge, and explic-
itly exclude from consideration the “bound” neutral charge that we have been discussing
above. We will call this non-bound charge free charge. This field will not change form as it
propagates from one material to another!
The field in question is called the electric displacement:
$$\vec{D} = \epsilon \vec{E} \tag{4.58}$$
Note well that this is a very odd name. One would be inclined to call the reaction
field produced by the surface bound charge the “displacement” of the vacuum field inside
a medium, but this is incorrect. On the other hand, the electric displacement does not
change at the surface of a dielectric medium, totally counterintuitively! This drove me batty
for years of study as a physics major and even as a graduate student because it is in some
sense an abuse of the English language.
Don’t fight it, accept it! The electric displacement “is what it is” according to this defi-
nition, and is the un-displaced version of the electric field. Sure, it might have been more
useful and descriptive to call it the “charge field”, but we are at this point all stuck with the
name, so if you plan to go on in physics you might as well learn it.
The fundamental advantage of this electric displacement (field) is that we can write
Gauss’s Law anywhere, inside a dielectric, conductor, or vacuum, in a form that depends
only on the free charge present, not on any dielectric response of the medium. Since we’ve
cancelled out all dependence on permittivity, this form is just:
$$\oint_S \vec{D}\cdot\hat{n}\, dA = \int_{V/S} \rho_f\, dV \tag{4.59}$$
where ρf is the free charge density only. Note the absence of any form of the dielectric
permittivity! If we solve this, we can find the resulting field inside any linear medium by just
dividing $\vec{D}$ by ε = εr ε0.
Following this reasoning, the electric displacement of a point charge is even simpler
than the electric field of a point charge in charge centered coordinates:
$$\vec{D} = \frac{1}{4\pi}\frac{Q}{r^2}\hat{r} \tag{4.60}$$
Note well the absence of ε0! The displacement itself has the units of charge per unit area
and completely captures the geometry of Gauss’s Law, but it is a vector that does not
correspond in any way to an actual surface charge density. In some sense it corresponds
to the imaginary (as in pretend, not complex) surface charge density one would get if
one took the central charge, displaced it uniformly by a distance r, producing the same
charge smeared out uniformly over the spherical surface of radius r, and then made it a
vector directed outward for positive charge and inwards for negative charge.
All clear now? Well, probably not so much. Possibly even as clear as mud! But if you
think about it even a bit now, and pay attention to my warnings about the undisplaced dis-
placement field that depends only on the free charge and never on the dielectrics present,
it will make the more mathematically involved treatments of this in intermediate and ad-
vanced electrodynamics a whole lot easier later.
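If a concrete number helps, the sketch below evaluates the displacement of a point charge and then recovers the electric field by dividing by the permittivity. It assumes the charge sits in a uniform linear medium; the charge, distance and relative permittivity are hypothetical values and the function names are ours. The flux of D through a sphere of radius r returns the free charge Q regardless of the medium.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def displacement(q_free, r):
    """Magnitude of D at distance r from a point free charge: D = Q / (4 pi r^2)."""
    return q_free / (4.0 * math.pi * r**2)


def field_from_displacement(D, eps_r=1.0):
    """Electric field in a linear medium: E = D / (eps_r * eps0)."""
    return D / (eps_r * EPS0)


Q = 1.0e-9     # hypothetical free charge, 1 nC
r = 0.05       # distance from the charge, m
EPS_R = 2.0    # hypothetical relative permittivity

D = displacement(Q, r)
E_vacuum = field_from_displacement(D)
E_medium = field_from_displacement(D, EPS_R)
flux_of_D = D * 4.0 * math.pi * r**2   # Gauss's Law for D over a sphere

print("D =", D, "C/m^2")
print("E in vacuum =", E_vacuum, "V/m, E in medium =", E_medium, "V/m")
print("flux of D =", flux_of_D, "C")
```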
At this point you hopefully understand how a dielectric insulator is polarized by a field,
how the polarization appears as a surface charge layer, how the surface charge creates a
reaction field that opposes the applied field and reduces it inside the dielectric so that we
can wrap all of that up in the simple relation:
$$E_{material} = \frac{E_0}{\epsilon_r} \tag{4.61}$$
where εr is the relative dielectric permittivity of the material. It seems like a good time to
list a few useful relative permittivities in a table:
Table 2: Table of relative dielectric permittivities at room temperature (20°C) and some
associated dielectric strengths.
So fine, so what are dielectrics good for? Dielectric insulators are often inserted between
the plates of capacitors! Dielectrics have three purposes in capacitor design:
c) They prevent dielectric breakdown (most dielectrics have a dielectric strength greater
and more reliable than that of air, which is relatively small and varies with pressure
and humidity).
You can easily experience all three benefits by building your own capacitor. Take a roll
of aluminum foil, and cut two square pieces 10 cm by 10 cm. Use tape to fasten an unbent
paper clip to each one. Cut a piece of white printer paper 12 cm by 12 cm.
For grins, try setting up the two pieces of foil so they are separated by a perfect 0.1 mm
air gap. Don’t worry, if you wreck the foil you can cut new pieces. Can’t do it, right? And if
you did, somehow, manage it, the first time you put an equal and opposite charge on the
“plates” they would attract, and being as how they are made out of foil, they’d bend until
they touched, pop, end of capacitor.
Now just lay down one sheet of foil on the table. Cover it (symmetrically) with the paper.
Top it with the second piece of foil. Tape the foil to the paper on both sides. Congratulations!
You’ve made a capacitor! When the foil is pressed tight to the paper, the gap d is roughly
0.1 mm (a ream of 500 sheets of printer paper is roughly 5 cm = 50 mm thick) and the plates
have an area A = (0.1 m)² = 0.01 square meters. The paper prevents the foil plates from touching
and is more resistant to arcing than 0.1 mm = 10⁻⁴ meters of air!
To compute the capacitance, we have to solve the parallel plate capacitor problem all
over again. Suppose you put a charge ±Q on your capacitor (e.g. moving a net charge Q
from one plate and putting it on the other). This charge is free charge, unbalanced charge
that distributes itself on the conducting plates of the capacitor, so perhaps we should refer
to it as Qf to cleanly differentiate it from bound charge on the surface of the dielectric
paper.
The capacitor plates have an area A, so the magnitude of the free surface charge density is
σf = Qf/A and Gauss’s Law tells you that the magnitude of the field in between the plates if
there were no paper there would be:
$$E_0 = 4\pi k_e \sigma_f = \frac{\sigma_f}{\epsilon_0} \tag{4.62}$$
However, now there is a dielectric in that space. The field is modified to become:
$$E = \frac{E_0}{\epsilon_r} = \frac{\sigma_f}{\epsilon_r \epsilon_0} = \frac{\sigma_f}{\epsilon} \tag{4.63}$$
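With the field in hand we can estimate the capacitance of the foil-and-paper capacitor. Since the potential difference is V = Ed = Qf d/(εA), the capacitance is C = Qf/V = εr ε0 A/d. The Python sketch below evaluates this for the 10 cm by 10 cm plates and the 0.1 mm gap described above. The relative permittivity of paper used here, 3.7, is an assumed, commonly quoted ballpark value (real paper varies), and the function name is ours.

```python
EPS0 = 8.854e-12  # permittivity of free space, F/m


def dielectric_plate_capacitance(area, gap, eps_r=1.0):
    """Parallel plate capacitor filled with a linear dielectric: C = eps_r eps0 A / d."""
    return eps_r * EPS0 * area / gap


A = 0.1 ** 2         # plate area in m^2 (10 cm x 10 cm)
d = 1.0e-4           # gap in m (about one sheet of printer paper)
EPS_R_PAPER = 3.7    # ASSUMED relative permittivity of paper, for illustration only

C_air = dielectric_plate_capacitance(A, d)
C_paper = dielectric_plate_capacitance(A, d, EPS_R_PAPER)

print("air gap:   C =", C_air, "F")     # a bit under a nanofarad
print("paper gap: C =", C_paper, "F")   # larger by the factor eps_r
```

Either way the homemade capacitor is of order a nanofarad, which gives some appreciation for how large a farad really is.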
Figure 4.14: Bound and free charge in a capacitor filled with a dielectric.
In figure 4.14 we can write the field in the dielectric in two ways:
$$E = \frac{E_0}{\epsilon_r} = E_0 - E_r \tag{4.67}$$
where recall that Er is the reaction field generated by the surface charge σb, which is also
equal to the local polarization density at the surface. If we write out the fields E0 and Er
in terms of the charges that produce them (basically using Gauss’s law on the two surface
charges), we get:
$$\frac{4\pi k_e \sigma_f}{\epsilon_r} = 4\pi k_e \sigma_f - 4\pi k_e \sigma_b \tag{4.68}$$
If we cancel out the common factor of 4πke = 1/ε0, we get:
$$\frac{\sigma_f}{\epsilon_r} = \sigma_f - \sigma_b \tag{4.69}$$
or
$$\sigma_b = \left(1 - \frac{1}{\epsilon_r}\right)\sigma_f = \frac{\epsilon_r - 1}{\epsilon_r}\,\sigma_f = \frac{\chi_e}{1 + \chi_e}\,\sigma_f \tag{4.70}$$
where the last form is in terms of the material’s susceptibility instead of the more commonly
used εr.
Note that an alternate, perhaps simpler, route to this relation is through the observation
that the magnitude of the bound surface charge density σb = P = ε0 χe E (from our previous
discussion of polarization density and the definition of the susceptibility):
$$\sigma_b = \epsilon_0 \chi_e E = \epsilon_0 \chi_e \frac{E_0}{\epsilon_r} = \epsilon_0 \chi_e \frac{\sigma_f}{\epsilon_0 \epsilon_r} = \frac{\chi_e}{1 + \chi_e}\,\sigma_f \tag{4.71}$$
where we once again used εr = 1 + χe by definition. In this case one must put in the sign
relation (the bound charge always has the opposite sign of the free charge that it faces) by
hand.
We see that the bound surface charge on the dielectric σb is closely related to the
free surface charge σf on the actual plate of the conductor. Note well that Qf = σf A is
the actual charge stored on the conductor, but the presence of the bound charge layer
reduces the field that charge produces across the dielectric and therefore reduces the
potential difference between the plates of the capacitor for any given charge. This is, by
definition, an increase in the capacitance of the arrangement – more charge stored per volt
of potential difference.
Although we’ve done all of our derivation and examples in the cases above in the
context of a parallel plate capacitor, they hold in the general case for fields in materials,
even where the fields vary. The electric field in a medium is always given by E = E0/εr,
even where the field is varying as a function of coordinates. This latter derivation has the
advantage in that the first two lines hold for any source of the free-space field E0 , not
just a presumed external parallel plate capacitor with its uniform field. For example, if we
surround a bare point charge with a dielectric shell as portrayed in figure 4.15:
Figure 4.15: A bare charge +Q surrounded by a dielectric shell with relative permittivity εr.
Hopefully we all know quite well at this point that the “bare” field of the free charge +Q
in the center is just
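$$E_r = \frac{Q}{4\pi\epsilon_0 r^2}$$
From the reasoning above, on the inner (r = a) and outer (r = b) surfaces of the shell:
$$\sigma_b(a) = -\chi_e \frac{Q}{(4\pi a^2)\epsilon_r} = -\frac{Q(\epsilon_r - 1)}{(4\pi a^2)\epsilon_r} \tag{4.72}$$
$$\sigma_b(b) = +\chi_e \frac{Q}{(4\pi b^2)\epsilon_r} = +\frac{Q(\epsilon_r - 1)}{(4\pi b^2)\epsilon_r} \tag{4.73}$$
Note well that the total bound charge on either surface has magnitude:
$$Q_b = \frac{\epsilon_r - 1}{\epsilon_r}\, Q \tag{4.74}$$
The bound charge on the inner surface reduces the bare field “just right” to produce a field
of Er/εr in the dielectric; the bound charge on the outer surface puts it back so that
the usual field obtains outside of the dielectric shell!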
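Here is a short Python sketch of the dielectric shell, under the assumptions of this section (a linear dielectric and perfect spherical symmetry). The charge, radii and relative permittivity are hypothetical values and the function names are ours. It computes the bound charge on each surface and checks that applying Gauss’s Law to the total enclosed charge, free plus bound, reproduces the reduced field inside the shell.

```python
import math

EPS0 = 8.854e-12  # permittivity of free space, F/m


def bound_charge(q_free, eps_r):
    """Magnitude of the bound charge on either surface: Qb = Q (eps_r - 1) / eps_r."""
    return q_free * (eps_r - 1.0) / eps_r


def bound_surface_densities(q_free, eps_r, a, b):
    """Bound surface charge densities at the inner (r = a) and outer (r = b) surfaces."""
    qb = bound_charge(q_free, eps_r)
    return -qb / (4.0 * math.pi * a**2), +qb / (4.0 * math.pi * b**2)


def radial_field(r, q_free, eps_r, a, b):
    """Radial field: bare Coulomb field outside the shell, reduced by eps_r inside it."""
    bare = q_free / (4.0 * math.pi * EPS0 * r**2)
    return bare / eps_r if a < r < b else bare


# Hypothetical numbers: 1 nC at the center of a shell from 1 cm to 2 cm, eps_r = 2.
Q, eps_r, a, b = 1.0e-9, 2.0, 0.01, 0.02

Qb = bound_charge(Q, eps_r)
sigma_inner, sigma_outer = bound_surface_densities(Q, eps_r, a, b)

r_inside = 0.015   # a point within the dielectric
E_inside = radial_field(r_inside, Q, eps_r, a, b)
E_from_total_charge = (Q - Qb) / (4.0 * math.pi * EPS0 * r_inside**2)

print("Qb =", Qb, "sigma_b(a) =", sigma_inner, "sigma_b(b) =", sigma_outer)
print("E in dielectric:", E_inside, "vs Gauss with Q - Qb:", E_from_total_charge)
```

Outside the shell the enclosed bound charges −Qb and +Qb cancel, so the function simply returns the bare field there.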
Advanced: This can safely be skipped to the next separator line if you are not a physics major.
Before we go on to energy density, we should at least put down the more advanced
relations that you will derive and learn in a more advanced course in Electrodynamics
and hint at how such a derivation would proceed. Suppose n̂ is a unit vector normal to a
dielectric surface (pointing out of the material), where the polarization density is e.g.
$\vec{P} = \epsilon_0 \chi_e \vec{E}$ for $\vec{E}$ just inside the material. Then σb, which is a scalar, is given by:
$$\sigma_b = \vec{P} \cdot \hat{n}$$
Our treatment above was valid for the special case that $\hat{n} \parallel \vec{P}$, but note that the dot
product gets the sign of σb right and corrects for the “tilt” of the surface relative to the field!
We had to put the former in “by hand” above, and had no clue about the latter (although
you can show it easily enough if you recapitulate the original argument connecting σb to P
above for a tilted surface).
The last important relation involving bound charge is well beyond the scope of this
course to discuss, but note well that one can in principle generate a dielectric material with
nonzero bulk bound charge, that is, with a bound charge density ρb distributed throughout
the material itself and not just confined to the surface. In this case, the polarization density
becomes a function of this bound charge that is given by solving:
$$\vec{\nabla} \cdot \vec{P} = -\rho_b$$
or equivalently:
$$\oint_S \vec{P} \cdot \hat{n}\, dA = -\int_{V/S} \rho_b\, dV$$
(the two are equivalent due to the divergence theorem).
This expression looks a lot like Gauss’s Law, but for the polarization density, which in
turn is related to the local field, which is in turn related to the total charge inside Gaussian
surfaces, and in fact one derives this expression in the next course up from this one by con-
sidering all of these things and working out how the total field is modified by the presence
of a (e.g. linear response) dielectric material and extra bound charge distributed through
the dielectric.
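That is:
$$\vec{E} = \vec{E}_0 - \frac{\vec{P}}{\epsilon_0}$$
(see above) and if we take the divergence of both sides:
$$\vec{\nabla} \cdot \vec{E} = \frac{\rho_{tot}}{\epsilon_0} = \vec{\nabla} \cdot \vec{E}_0 - \frac{\vec{\nabla} \cdot \vec{P}}{\epsilon_0} = \frac{\rho_f + \rho_b}{\epsilon_0}$$
where we used $\vec{\nabla} \cdot \vec{E}_0 = \rho_f/\epsilon_0$ from Gauss’s Law for the free charge only. It all works out
just as it should!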
So much to look forward to, if you are going on in physics!
As a last remark, consider field energy density inside a dielectric. If we recapitulate the
argument for field energy density for a parallel plate capacitor filled with a dielectric, we
get:
$$U = \frac{1}{2} C V^2 = \frac{1}{2} \frac{\epsilon_r \epsilon_0 A}{d} (Ed)^2 \qquad (4.75)$$
where $E$ is still the field between the plates, in this case the field inside the dielectric.
Hence
$$\eta_e = \frac{dU}{dV} = \frac{1}{2} \epsilon E^2 \qquad (4.76)$$
where ǫ = ǫr ǫ0 is the dielectric permittivity of the material. This is the correct form of the
energy density to use inside a linear dielectric material.
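A short numerical check can make this concrete. The sketch below computes the stored energy of a dielectric-filled parallel plate capacitor from $\frac{1}{2}CV^2$ and compares the energy per unit volume with $\frac{1}{2}\epsilon E^2$. The function names and the particular plate area, gap, field, and $\epsilon_r$ are assumptions chosen only for illustration.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m

def capacitor_energy(eps_r, area, gap, field):
    """U = (1/2) C V^2 with C = eps_r*eps0*A/d and V = E*d."""
    capacitance = eps_r * EPS0 * area / gap
    voltage = field * gap
    return 0.5 * capacitance * voltage**2

def energy_density(eps_r, field):
    """eta_e = (1/2) * eps * E^2 with eps = eps_r * eps0."""
    return 0.5 * eps_r * EPS0 * field**2

# Illustrative (assumed) values.
eps_r = 3.0      # relative permittivity
area = 0.01      # plate area in m^2
gap = 1.0e-3     # plate separation in m
field = 2.0e5    # field inside the dielectric in V/m

U = capacitor_energy(eps_r, area, gap, field)
eta_from_U = U / (area * gap)          # energy divided by the volume A*d
eta_direct = energy_density(eps_r, field)

print(f"U = {U:.4e} J")
print(f"U/(A d)         = {eta_from_U:.4e} J/m^3")
print(f"(1/2) eps E^2   = {eta_direct:.4e} J/m^3")
```

The two energy densities agree, as they must, since the second is just the first with the volume $Ad$ divided out.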
This is all we need to know about dielectrics, although the problems below will challenge
you with half-filled capacitors and the like to make sure you understand it well enough to
be able to use it.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
b) A cylindrical capacitor with inner conductor radius a, outer conductor radius b, and
length L (where L ≫ b − a);
c) A spherical capacitor with inner conductor radius a and outer conductor radius b.
$$C \approx \frac{\epsilon_0 A}{d}$$
where $A$ is the area of the cylinder/sphere and $d = b - a \ll a$ (“small” separation). You
will need to use the power series expansion $\ln(1 + x) \approx x + O(x^2)$ to first order to do the
cylinder.
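Before attempting the algebra, it can help to see the limit numerically. The sketch below is not a proof: it takes the standard closed-form capacitances of the cylindrical and spherical geometries as given (an assumption here, not something derived in this problem) and compares them with $\epsilon_0 A/d$ as the gap shrinks. The function names and the radius are hypothetical.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m

def c_parallel_plate(area, gap):
    return EPS0 * area / gap

def c_cylinder(a, b, length):
    # Standard result for coaxial cylinders, assumed rather than derived here.
    return 2.0 * math.pi * EPS0 * length / math.log(b / a)

def c_sphere(a, b):
    # Standard result for concentric spheres, assumed rather than derived here.
    return 4.0 * math.pi * EPS0 * a * b / (b - a)

def thin_gap_ratios(a, d, length=1.0):
    """Ratios C_exact / (eps0 * A / d), using the inner conductor's area for A."""
    b = a + d
    cyl = c_cylinder(a, b, length) / c_parallel_plate(2.0 * math.pi * a * length, d)
    sph = c_sphere(a, b) / c_parallel_plate(4.0 * math.pi * a**2, d)
    return cyl, sph

a = 0.10  # inner radius in m (assumed)
for d in (1e-2, 1e-3, 1e-4):
    cyl, sph = thin_gap_ratios(a, d)
    print(f"d/a = {d / a:.0e}: cylinder ratio = {cyl:.6f}, sphere ratio = {sph:.6f}")
```

Both ratios approach 1 as $d/a \to 0$, and the leftover difference is of order $d/a$, which is the size of the terms the first order expansion drops.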
Problem 3.
Prove that the energy stored on the capacitor can be written as either side of:
$$U = \frac{1}{2} Q V = \int_V \frac{1}{2} \epsilon_0 E^2\, dV$$
for all three geometries (where the integral is over the volume V between the plates).
Problem 4.
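[Figure: three capacitors. The first two are labeled with plate area A and separation d; the third is labeled a, b, and c.]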
where the first two are parallel plate capacitors half-filled with a dielectric material with
relative dielectric permittivity $\epsilon_r$ as shown, and the third is a spherical capacitor partially
filled with the same dielectric as shown.
Problem 5.
Problem 6.
Problem 7.
You are given a square parallel plate capacitor of side L and plate separation d and a
slab of dielectric material with relative dielectric permittivity ǫr that exactly fills the volume
between the plates if fully inserted. At the moment, however, the slab is inserted only a
distance x. The capacitor has a constant free charge Q0 on it.
b) Is the potential energy minimal when the dielectric slab is fully inserted or fully re-
moved? Explain why.
c) By using
$$F_x = -\frac{dU}{dx}$$
find the force on the partially inserted dielectric slab. Does the force pull the dielectric
slab in (to fill the plate volume) or does it push it out from between the plates?
d) Draw a simple picture involving the probable bound charge distribution on the partially
inserted dielectric slab that physically explains this force.
Advanced Problem 8.
You are given a square parallel plate capacitor of side L and plate separation d and a
slab of dielectric material with relative dielectric permittivity ǫr that exactly fills the volume
between the plates if fully inserted. At the moment, however, the slab is inserted only
a distance x. The capacitor has a constant voltage V0 connected across it that can do
work adding charge to or taking charge away from the capacitor as the slab is inserted or
removed (!).
b) Draw a simple picture involving the probable bound charge distribution on the partially
inserted slab and the plates. Do you, based on this distribution, expect the slab to be
pulled into or expelled from between the plates?
c) Find the amount the potential energy of the capacitor changes when one inserts the
slab an additional amount ∆x. Does the energy increase or decrease? Is this result
surprising given your gut level physical expectation from the picture? (Don’t worry,
your gut is correct...)
d) To find the “missing energy”, determine the amount of work done by the voltage
source as one inserts the slab an additional amount ∆x. Note that this is related
to the additional charge that flows onto the capacitor at constant voltage and
represents the decrease in the potential energy of the voltage source.
e) By using
$$F_x = -\frac{dU}{dx}$$
where U is the total potential energy of the voltage source and capacitor, find the
force on the partially inserted slab. Does the force pull the dielectric slab in (to fill
the plate volume) or does it push it out from between the plates after all?
Week 5: Resistance
Technically, this symbol is for an electrical cell, and a battery is a collection of cells in
series (with their voltages adding to create a higher voltage than we could otherwise
create with the chemical process) but the terms will be used interchangeably in this
introductory work.
$$\frac{1}{R_{\rm tot}} = \frac{1}{R_1} + \frac{1}{R_2} + \ldots \qquad (5.7)$$
• Kirchhoff’s Rules:
a) Loop Rule: The sum of the voltage changes around a circuit loop must be zero
(conservation of energy).
b) Junction Rule: The sum of the currents flowing into a circuit junction must be
zero (conservation of charge).
• The battery described above is an “ideal” battery that can in principle deliver any
amount of power. A real battery (or other power supply) can never deliver an arbi-
trarily large electrical power to a circuit. One model that (quite accurately) describes
the limiting of power delivered from a battery is that of internal resistance. In this
model, a “real world” battery consists of two components integrated inside the bat-
tery housing – a source of electrical energy (usually chemical energy for traditional
batteries) and an effective internal resistance of the chemical medium and the rate
limiting aspects of the chemistry itself. When power limitation is important, batteries
will usually be represented as:
[Figure: circuit symbol for a real battery, showing the internal voltage $V_{\rm int}$ with its + and − terminals marked.]
• When a real battery is delivering no current, the voltage drop across the internal
resistance is zero, and if the chemical “fuel” of the battery is not totally depleted, you
will usually measure an internal voltage determined by the chemical potential of the
reaction itself. In this case, the terminal voltage will be equal to the internal voltage,
which is generally the nominal/rated voltage of the battery or cell. When the cell is
delivering current I, however, the terminal voltage (between the physical terminals
on the ends of the battery) is:
$$V = V_{\rm int} - I r \qquad (5.8)$$
where $r$ is the internal resistance.
The internal resistance determines the maximum current and power deliverable by
the battery when the battery is short circuited – its terminals connected by a pre-
sumed perfect conductor. They are:
$$I_{\rm max} = \frac{V_{\rm int}}{r} \qquad (5.9)$$
and:
$$P_{\rm max} = V_{\rm int} I_{\rm max} = \frac{V_{\rm int}^2}{r} \qquad (5.10)$$
• RC circuits are simple loops where a capacitor is charged or discharged through a
resistance. You should be able to derive the time-dependent discharge of a capacitor
through a resistor as the following exponential decay :
$$V(t) = V_0 e^{-t/RC} \qquad (5.11)$$
or:
$$Q(t) = Q_0 e^{-t/RC} \qquad (5.12)$$
where Q0 is the initial charge on the capacitor and V0 = Q0 /C is the initial poten-
tial across the capacitor. This result follows from applying Kirchhoff’s voltage law
around a loop and converting it into a first order, linear, ordinary differential equation
of motion that can be directly integrated.
• The “exponential time constant” of this decay is τ = RC. Recall that the time constant
τ is the fixed time interval in which the initial charge/potential decays to 1/e of its value
at the start of the interval. Exponential processes always gain/lose the same fraction
of their initial value in any given interval of time.
• A capacitor that is charged through a resistance by a fixed potential approaches its final
potential and charge exponentially, with the same time constant:
$$V(t) = V_0 \left(1 - e^{-t/RC}\right) \qquad (5.13)$$
or:
$$Q(t) = Q_0 \left(1 - e^{-t/RC}\right) \qquad (5.14)$$
where V0 is the magnitude of the charging potential and Q0 = CV0 , in both cases the
final values found on the capacitor after a very long time, specifically many exponen-
tial time constant intervals.
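The exponential results (5.12) and (5.14) are easy to tabulate. The following is a minimal sketch, with hypothetical function names and assumed component values (1 kΩ, 1 µF, 5 V), that prints the charge on a discharging and a charging capacitor at integer multiples of the time constant $\tau = RC$.

```python
import math

def discharging_charge(t, Q0, R, C):
    """Q(t) = Q0 * exp(-t/RC), equation (5.12)."""
    return Q0 * math.exp(-t / (R * C))

def charging_charge(t, Q0, R, C):
    """Q(t) = Q0 * (1 - exp(-t/RC)), equation (5.14)."""
    return Q0 * (1.0 - math.exp(-t / (R * C)))

# Illustrative (assumed) component values.
R = 1.0e3    # ohms
C = 1.0e-6   # farads
V0 = 5.0     # volts
tau = R * C  # time constant in seconds
Q0 = C * V0  # initial (discharging) or final (charging) charge

for k in range(6):
    t = k * tau
    qd = discharging_charge(t, Q0, R, C) / Q0
    qc = charging_charge(t, Q0, R, C) / Q0
    print(f"t = {k} tau: discharging Q/Q0 = {qd:.4f}, charging Q/Q0 = {qc:.4f}")
```

Each additional interval $\tau$ multiplies the remaining charge on the discharging capacitor by the same factor $1/e$, and after about five time constants both capacitors are within one percent of their final values.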
Note on notation: At one time the voltage produced by e.g. a battery or mechanical
power supply was called (by Alessandro Volta, one of the original discoverers of the
chemical electrical cell) an electromotive force, and this usage was continued by later re-
searchers such as Faraday. This was a horrible misnomer – Volta’s model for the cause of
the voltage (that “motivated” the choice) was incorrect, and of course the units of force,
Newtons, are completely different from the units of voltage, Joules per Coulomb.
The SI unit of potential and potential difference, the Volt, is named after Volta.
Unfortunately many physics textbooks perpetuate the tradition of referring to the voltage
produced by any means as an electromotive force or use the acronym “EMF” to describe
this voltage without actually using the word force. In addition, the symbol $\mathcal{E}$ is often used
in place of the symbol V to label the voltage of a cell or induced voltage (discussed in a
few chapters) as an “$\mathcal{E}$-MF”. Although this is a calligraphic/script font version of E, it is still
remarkably easy to confuse with the electric field and of course a voltage isn’t conceptually
or dimensionally an electric field, either!
This book will (hopefully consistently) use the symbol V to describe the voltage sources
or sinks of a circuit element or the circuit itself, including electrical cells or induced voltages,
and will eschew the use of the symbol $\mathcal{E}$ or the descriptors EMF or (worse) “electromotive
force” used to describe a potential or potential difference no matter what it results from.
This should do no conceptual harm to the general topic of electricity and magnetism; in-
deed it should simplify the treatment of potential differences. Students should be aware
of the more common usage, however, to the extent that they use additional textbooks or
references to supplement this one as they study.
Up to now, we haven’t really considered how the capacitors in the sections above got
charged up. Our model of matter is electrically neutral atoms and molecules, and while
conductors have lots of mobile charge we don’t know how to grab that charge and push it
around yet. Or rather, we do – one way to push it around is to use the electric field itself to
do the pushing!
This is how one charges things like amber and glass or clouds by rubbing them. The
fields of the atoms rub together and knock off charges and transfer them preferentially in
one direction or the other. But another way of grabbing things with fields is to exploit the
electrostatic field that holds atoms and molecules together in chemistry – a battery 67 .
It is probably instructive to look at the actual chemical reaction associated with at least one
specific kind of battery, even though one can make a cell out of two different kinds of almost
any metal stuck into an electrolyte solution (e.g. an acid). So let’s look at the two reactions
associated with a lead-acid battery, the kind you probably have in your car.
A lead-acid battery consists of two plates. The anode (positive pole) is made out of
ordinary lead. The cathode (negative pole) is made of lead coated with lead oxide. Both
are immersed in a solution of water and sulphuric acid. At the anode68 :
$$\mathrm{Pb} + \mathrm{HSO}_4^- \rightarrow \mathrm{PbSO}_4 + \mathrm{H}^+ + 2e^-$$
67
Technically, a single device that generates a voltage in this way is called a cell – a battery is composed
of several cells – but we’ll just call anything that generates electricity a battery because nobody speaks of
“flashlight cells” when they go to the store to get a pack of D’s, they say “I’m going to get some batteries for
the flashlight”.
68
Wikipedia: http://www.wikipedia.org/wiki/Lead-acid battery. There are more complete ways of writing out
the chemical reaction that show more of what is going on with the water in all of this, but this is sufficient.
Either way, you are of course encouraged to visit the link and read more about it.
At the cathode:
$$\mathrm{PbO}_2 + \mathrm{HSO}_4^- + 3\mathrm{H}^+ + 2e^- \rightarrow \mathrm{PbSO}_4 + 2\mathrm{H}_2\mathrm{O}$$
or overall:
$$\mathrm{Pb} + \mathrm{PbO}_2 + 2\mathrm{H}_2\mathrm{SO}_4 \rightarrow 2\mathrm{PbSO}_4 + 2\mathrm{H}_2\mathrm{O}$$
plus the transfer of two electrons, driven by the chemical energy of the reaction, between
the cathode and the anode.
The electrolyte provides both the (ionized) sulphuric acid required at both ends and
a conducting pathway for the electrons to be transported from the anode to the cathode.
Energy is released by this reaction; the end products are more stable than the original
ones so the reaction is favored.
However, once a few atoms in the anode have given up their electrons and they’ve
been pulled over to the cathode, the reaction stops! The poles are then charged up and it
costs too much work to remove any more electrons, more than one gains in the chemical
reaction. The anode is then charged up positively (as an electron donor to the reaction in
the battery itself) while the cathode is charged up negatively (having received the elec-
trons). The top and bottom plates behave just like the plates of a capacitor and maintain
an electrical potential difference of around 2 volts (per cell in a battery of six cells, in a
typical twelve volt battery in a car) between them that just balances the chemical potential
of the arrangement.
There is, however, an important difference. If one provides a conducting pathway
between the anode and the cathode outside of the solution, then the negative charge
surplus on the cathode can flow back over to the anode and participate in another reaction,
then another, then another. Charge continues to be driven in this way until all of the lead
and lead oxide is converted into lead sulphate and water. For every mole of lead converted
into lead sulphate, two moles of electrons have to move from cathode to anode. That
is $1.2 \times 10^{24} \times 1.6 \times 10^{-19} \approx 1.9 \times 10^5$ Coulombs of charge, enough to drive an Ampere of
current (one Coulomb/second) for a little over two days. A mole of lead is around 207 grams, which
weighs around a half a pound. Allowing for the electrolyte and sulphuric acid, roughly a
pound of battery will drive a load of two watts (one ampere at two volts) for about two
days (where we’ll work out energy relations below to justify this in a moment).
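The charge arithmetic for two moles of electrons is worth checking explicitly. The sketch below uses rounded values of Avogadro’s number and the elementary charge, and assumes (as in the text) a steady current of one ampere at two volts per cell. The variable names are illustrative only.

```python
AVOGADRO = 6.022e23           # particles per mole (rounded)
ELEMENTARY_CHARGE = 1.602e-19 # coulombs per electron (rounded)
SECONDS_PER_DAY = 86400.0

moles_of_electrons = 2.0      # two electrons transferred per lead atom converted
current = 1.0                 # amperes (assumed load current)
cell_voltage = 2.0            # volts per cell (approximate)

# Total charge = (number of electrons) * (charge per electron).
charge = moles_of_electrons * AVOGADRO * ELEMENTARY_CHARGE
run_time_days = charge / current / SECONDS_PER_DAY
energy = charge * cell_voltage

print(f"charge moved per mole of lead: {charge:.3e} C")
print(f"run time at {current:.0f} A: {run_time_days:.2f} days")
print(f"energy delivered at {cell_voltage:.0f} V: {energy:.3e} J")
```

Note that the number of electrons is multiplied by the charge per electron; dividing the two numbers instead underestimates the charge by a factor of about 2.5.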
A second advantage of this particular battery is that it is rechargeable. If one simply
places a voltage across the cell that exceeds its terminal voltage, charge flows the other
way, reversing the reaction and turning lead sulphate back into lead or lead oxide. By
careful design, one can charge and discharge the battery many times before too much
lead sulphate falls off of the electrodes or crystallizes out across the space in between the
terminals and shorts out the battery, at which time the battery must be remanufactured (to
avoid dumping toxic lead into the environment).
Vehicle batteries, of course, weigh many pounds – as many as fifty or sixty – and
have six cells, and therefore can drive bigger currents at higher voltages, currents that can
easily be large enough to be dangerous. In fact, a car battery69 can easily kill you if
69
http://www.darwinawards.com/darwin/darwin1999-50.html Not just a car battery. You can kill yourself with
a nine volt transistor radio battery, and one of my favorite Darwin awards went to a Navy officer who demon-
strated this the hard way after being warned about the danger.
you handle it carelessly by the poles with e.g. wet hands or cuts on your fingers! I’ve gotten
“hit” this way myself handling a car battery by the poles in a rainstorm, and it hurts! This
kind of battery can (multiplying out the coulombs and volts, roughly $1.9 \times 10^5$ coulombs
at two volts) do around 390,000 joules of work per pound in the ideal case, probably less
than half this in the real world case.
However, all batteries have a finite rate at which they can do work, determined by the
physical limitations on the rate at which the chemical reaction can proceed. So even if
one shorts out a battery with a perfect conductor, one won’t get an infinite current at a
constant voltage. As the current goes up, the voltage goes down, until at some point all
of the energy is released as the heat of reaction in the electrolyte and none to the battery
load. Some batteries are designed to provide a fixed voltage and low current for a long
time; others are designed to produce a fixed voltage and a large current for a short time.
Car batteries in particular are usually pretty good at both.
All of this is too complicated for intro physics, of course. We want to start by idealizing
a battery and replacing it in all circuits we consider with a single simple symbol. In the
symbol we will use, V is the nominal potential difference maintained by the battery between
its terminals (its “terminal voltage”) and the + sign (and longer plate) indicates the
anode, the side of the battery from which positive current flows (where we are suffering
from Franklin’s Mistake, because the actual motion of charge in the chemical reaction
above is negative electrons flowing the other way). Again, the battery behaves like an
“inexhaustible capacitor” in an electrical circuit, increasing the potential by V as one moves
from the cathode (small plate) to the anode (large plate) in any circuit diagram containing
this symbol.
Our ideal battery never runs out of power, has no limitations on the amount of current
it can provide at its rated voltage, and its voltage is rigorously constant. None of these
is going to be true in practice for real batteries, and after we define resistance and work
out Ohm’s Law, series resistance addition rules, and Kirchhoff’s rules below, we’ll revisit the
battery and see how we can compensate for these features by assigning an internal resis-
tance r to the battery itself. This internal resistance is not entirely a fiction – batteries and
other power supplies do have some actual internal resistance – but it often also represents
the practical effect of other rate limiting physics, such as the maximum rate that some given
force can do work on a piece of generating apparatus.
This internal resistance will quite naturally cap the power and current the battery can
provide as one cranks up the load on it. It still doesn’t indicate the way voltage and current
depend on things like temperature, the degree to which the battery is discharged already,
and how old the battery is – all of these things and more affect real batteries, dynamos
(electric generators), solar cells, and any other method we have of turning (potential) en-
ergy into electrical power. But we will do quite well with our idealized battery, and even
better with our idealized battery with an internal resistance – the rest is a mix of more ad-
vanced physics and associated engineering and doesn’t change the idea, only the details.
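To see how an internal resistance caps the current and power, here is a small sketch of the usual series model: an ideal source $V_{\rm int}$ in series with $r$, driving a load $R$. The terminal voltage $V_{\rm int} - Ir$, the function name, and the numerical values (a 12 volt source with $r = 0.05$ ohms) are assumptions made for illustration.

```python
def battery_with_load(V_int, r, R_load):
    """Return (current, terminal voltage, power delivered to the load).

    Assumed model: ideal source V_int in series with internal resistance r
    and an external load resistance R_load.
    """
    current = V_int / (r + R_load)
    terminal_voltage = V_int - current * r
    load_power = current * terminal_voltage
    return current, terminal_voltage, load_power

V_int = 12.0   # volts (assumed)
r = 0.05       # ohms of internal resistance (assumed)

for R_load in (10.0, 1.0, 0.1, 0.05, 0.0):
    I, V_term, P_load = battery_with_load(V_int, r, R_load)
    print(f"R = {R_load:5.2f} ohm: I = {I:7.2f} A, "
          f"V_terminal = {V_term:6.3f} V, P_load = {P_load:7.1f} W")

I_max = V_int / r        # short circuit current, equation (5.9)
P_max = V_int**2 / r     # power dissipated inside the battery at short circuit, (5.10)
print(f"I_max = {I_max:.1f} A, P_max = {P_max:.1f} W")
```

As the load resistance drops, the current rises toward $V_{\rm int}/r$ while the terminal voltage falls toward zero. In this model the power delivered to the load peaks when $R = r$, and at short circuit all of the power is dissipated inside the battery itself.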
Before we move on to resistance, it is worth pointing out that battery physics and engineer-
ing are important in our society, and becoming more important as we move in the direction
of renewable energy sources, hybrid or flat-out electric cars, rechargeable electronic devices
galore and more.
One of the biggest obstacles to the widespread adoption of solar or wind generated
power is the difficulty of storing power that is generated when the sun is high and bright
or when the wind blows strongly for use at night or on calm days. With fuel-generated
energy, as long as one provides the fuel one can produce the energy! This is not possible
with sunlight, and parts of the Earth get no sunshine at all for months at a time (as well
as sunshine 24 hours a day other months at a time). Similarly, even “windy” locations can
have calm weather for days or even weeks at a time.
It requires hundreds of pounds of lead-acid batteries per person just to store the av-
erage power needed for a single day (say) generated from solar energy or wind energy
collected in intervals during that same day. Lithium batteries that store the same amount
of energy are much smaller and lighter, but lithium is an alkali metal and burns when ex-
posed to air, making it more difficult to safely engineer high-capacity batteries. Alternative
battery technologies (say, zinc-oxide batteries, lithium batteries, and more with very dif-
ferent chemistry, both wet and dry) are constantly being explored, driven by the need to
store at least a few days’ worth of power from intermittent sources to bridge those times
when the source is not available, as well as to make it possible for our laptops, tablets, and
phones to run for days on a single charge and for electric cars to travel long distances on
a charge and recharge quickly.
The inventor(s) of a really, really compact and efficient way of storing energy would both
make a well-deserved fortune from the idea and would enable any number of beneficial
changes to our energy hungry society. In the meantime, rechargeable batteries have and
are likely to continue to have many problems: They are (so far) bulky and massive, they get
hot while operating at high power levels (due to their internal resistance!), they are often
made with toxic or comparatively scarce materials, they are consequently difficult to safely
dispose of, they (so far) wear out and can store much less energy after a few hundred or at
most a few thousand charges, they can explode or catch on fire if overdriven (making them
very nearly a munition in the hands of the unscrupulous or violent). Put all of this together
and so far, batteries are very expensive, both in direct dollar cost per unit of energy stored
and in terms of environmental cost and risk! Yet there is little doubt that within the decade,
batteries will be running many if not most of our homes and cars in addition to all of the
things that they are used for now.
If this topic interests you, you can learn a great deal about rechargeable/secondary battery
technology (which is very much a moving target, where the costs per unit of energy
stored by a rechargeable battery have decreased by some 60 to 80% over the last ten or
fifteen years) by visiting:
Wikipedia: http://www.wikipedia.org/wiki/Rechargable Battery
To summarize: at this point (with this paragraph being written in 2017) large capacity,
high density, long lifetime rechargeable batteries capable of running a “typical” U.S.
household (that uses, say, around 30 kilowatt-hours of energy a day) cost less than $5000
at full retail to an individual consumer. I personally expect that by 2020 battery technology
will advance so that the full retail cost crosses the $0.10/watt-hour threshold (where now
it is more like $0.15), so that a full-day battery will cost roughly $3000 (with a two day
supply or supply for a larger household still quite affordable). There is no good reason to
think that retail costs will not continue to fall beyond this point as technology improves and
manufacturing capacity increases and enables various economies of scale. Well within the
decade, individual houses will be easily and cheaply equippable with “backup batteries”
that can store days’ worth of energy for the entire house that will last for a decade or more
with most of their charge storage capability intact.
Similarly, as of this writing an array of rooftop solar cells capable of recharging these
batteries with the energy received in a single “typical” sunny day in most of the United
States costs around $5000, and this number will also continue to decrease to 2020 and
beyond as new technologies emerge and manufacturing capacity increases.
Electrical energy purchased from a utility company in the United States currently costs
an average of 12 cents per kilowatt-hour, so a year’s worth of electrical energy for a “typical
household” is around $1300 in 2017. If one invests approximately $10,000 (plus $3000
for installation), one can already go “off-grid” in most U.S. locations (ones with adequate
insolation) and generate very close to 100% of the electrical energy needed to run a typical
household, and break even on the investment in roughly a decade, for about what a top of
the line high efficiency air-conditioner/heat pump for that same household would cost.
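The break-even estimate is simple arithmetic that can be reproduced directly from the figures quoted above (30 kilowatt-hours per day, 12 cents per kilowatt-hour, $10,000 of hardware plus $3000 of installation). The sketch below ignores interest, maintenance, and any change in electricity prices, which is an assumption of this illustration.

```python
daily_energy_kwh = 30.0      # "typical" household use per day
price_per_kwh = 0.12         # dollars per kilowatt-hour
hardware_cost = 10000.0      # batteries plus solar array, in dollars
installation_cost = 3000.0   # dollars

# Yearly utility bill avoided by generating the energy oneself.
annual_utility_cost = daily_energy_kwh * 365 * price_per_kwh
total_investment = hardware_cost + installation_cost
break_even_years = total_investment / annual_utility_cost

print(f"annual utility cost: ${annual_utility_cost:.0f}")
print(f"total investment:    ${total_investment:.0f}")
print(f"break-even time:     {break_even_years:.1f} years")
```

With these numbers the avoided utility bill is about $1300 per year and the investment is recovered in roughly ten years.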
The amortization time required to recover the investment will very likely drop to seven
years or even less within the next few years, making this a no-brain decision for most
households – one can borrow the money required to convert over and pay off the loan in a
matter of years for about what one would pay for a new car and entirely funded by reduced
electrical utility bills and enjoy “free” electrical power for the rest of the useful lifetime of the
hardware, estimated at this time to be in excess of twenty years.
Fine, so now we have a battery. We place a chunk of conducting matter between the
poles/terminals of the battery, and what happens? Well, current flows, that’s what happens!
We have created a situation where a conductor is not in electrostatic equilibrium, and
charge moves in time through the conductor in response to the force created by the battery,
with energy released in the process. This is actually fine, and we might even say, it’s about
time that we got out of statics (which are kind of boring, as not much happens, right?) and
into dynamics, where things happen. All we need, then, is to come up with a model for what
goes on inside the conductor as the current flows, and we can start to analyze dynamical
electrical systems once again, which has to be more interesting than just thinking about a
charged capacitor sitting around all day doing nothing much but just storing charge.
A microscopic picture, of course, begins with atoms, each with a heavy nucleus and
surrounded by electrons, arranged in some sort of solid lattice, with some of the electrons
“free” to move within the lattice. Free to move, however, is not the same thing as non-interacting.
Electrons that move through the lattice interact with the lattice and transfer
their momentum to the lattice so that (in equilibrium) their average velocity is zero. The
lattice therefore exerts a kind of drag force on the electrons that brings them back to
equilibrium.
The simplest model for conduction of electrons through a material that “resists” their
motion via a drag force caused by the collision of the moving electrons with each other and
the underlying atoms in the lattice is one with a linear drag force – one that is proportional
to the average velocity of transport of the electrons through the resistive lattice. If the
electrons are being pushed through the conductor by some constant force, then, they’ll
arrive quickly at a terminal velocity that is proportional to that force, where the forces
balance.
We now use this model for “linear drag” to build a working description of voltage changes,
electric fields, and electrical currents and current densities inside conductors that are not
in electrostatic equilibrium. They are not really static because charge is being pushed by
electric fields and is moving, but they are still in a kind of dynamic equilibrium where forces
on the charges balance. This model will also work for “slowly varying” currents – currents
we can treat as being approximately constant on small time intervals ∆t – but ultimately it
will fail when we take into account the possibility of the conductor radiating energy and mo-
mentum into space for rapidly varying currents (a consequence of the Maxwell Equations
we haven’t learned yet in electrodynamics). It is thus a “quasi-static” theory and should not
be taken too seriously or considered to be completely general or correct.
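[Figure: a segment of wire of cross-sectional area A and length $v_d \Delta t$ containing charges q, each moving with drift velocity $v_d$ under the opposing forces $F = qE$ and $F = b v_d$.]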
Figure 5.1: The simple linear model for conduction in a resistive lattice.
In figure (5.1) we see a model for a conducting wire. This wire has a cross-sectional
area of A and contains an “inexhaustible” supply of free charged particles (recall, order
of one free charge per atom) each with charge q. An electric field is created within the
wire by a battery (not shown) that exerts a force on any given charge carrier to the right of
F = qE. The wire resists the flow of that charge carrier with a “drag force” bv to the left,
where b is a phenomenological “drag coefficient” characteristic of the imperfect conductor.
Microscopically, we can initially mentally picture this drag force as being the result of an
ongoing average loss of momentum as each free charged particle speeds up in the direction
of the electric field for a time but then is suddenly slowed down enough to “start again”
as it collides with the atoms or molecules of the material (incidentally heating the material).
In “dynamic equilibrium” (steady, or nearly steady currents) we require these two forces
to balance:
$$v_d = \frac{qE}{b} \qquad (5.15)$$
where we introduce the drift velocity vd , defined to be the average “terminal velocity” of
charges in the conductor70 . It is important to keep in mind that in a typical normal metal
our charge carriers are negatively charged electrons (recall “Franklin’s mistake”) and all of
the vectors are reversed for a current and field that still go from left to right, but this makes
no difference in anything we care about (yet!); the argument given below works for either
sign of the charge carrier.
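One way to see why the charges settle at this terminal velocity is to integrate the equation of motion for a single carrier with a linear drag force, $m\,dv/dt = qE - bv$. The sketch below does this with simple Euler steps. The carrier mass `m`, the step size, and the arbitrary units are assumptions introduced only for this illustration; they are not part of the model in the text.

```python
def relax_to_drift_velocity(qE, b, m, dt, steps):
    """Euler-integrate m dv/dt = qE - b v starting from rest; return the final v."""
    v = 0.0
    for _ in range(steps):
        v += dt * (qE - b * v) / m
    return v

# Arbitrary illustrative units (assumed): force qE, drag coefficient b, mass m.
qE = 2.0
b = 4.0
m = 1.0

v_final = relax_to_drift_velocity(qE, b, m, dt=0.01, steps=1000)
v_drift = qE / b   # equation (5.15)

print(f"velocity after integration: {v_final:.6f}")
print(f"predicted drift velocity:   {v_drift:.6f}")
```

Starting from rest, the velocity approaches $qE/b$ on a time scale of order $m/b$ and then stays there, which is the “dynamic equilibrium” where the two forces balance.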
Let’s carefully examine the picture and see what we can deduce. We are interested
in computing the electric current, defined to be the charge per unit time that is being
carried by the conductor. Ordinarily, we’d think of this as the charge per unit time travelling
in some chosen direction that passes some point in the conducting wire under the influence
of the force created by the battery (or other source of potential difference across the wire).
However, what does “passing a point” mean? How can we manage our choice of direction?
All of the charge may not be travelling in the same direction! The conductor may not be a
simple cylinder like that pictured above but instead be some contorted shape cast in metal
with many branches! We need a better, less ambiguous definition.
We can unambiguously estimate how much charge passes through a given surface cut-
ting across the metal. Since a surface in three dimensions has one dimension perpendicu-
lar to the surface, we can always assign our direction unambiguously in this perpendicular
direction. The wise student should already be saying to themselves “But that sounds a lot
like our reasoning when we talked about the flux of the electric field a few chapters back..”
and that wise student would be quite right!
But first things first. For the time being, let’s confine ourselves to the simple case of the
cylinder above with the surface in question being one that cuts across it at right angles, the
surface A pictured above. From the picture we can see that all of the charge ∆Q in the
volume between the plane surface bounded by the dashed circle and the plane surface A
bounded by the circle at the far right of the conductor passes through the cross-sectional
area A perpendicular to the direction of motion of the charges in a time ∆t. So how much
is that?
To answer this, we need to define a few quantities. One is:
$$n = \frac{\text{\# of charge carriers}}{\text{unit volume}} \qquad (5.16)$$
the number of (free!) charge carriers q per unit volume. We can then turn this into the free
charge density:
$$\rho_{\rm free} = nq \qquad (5.17)$$
70
We will give a particular, simple, classical model called the Drude model for the drift velocity that will give
us an actual functional form for b in a more advanced section below that can safely be omitted by students
uninterested in majoring in physics or more advanced studies in e.g. engineering (although it is not terribly
difficult and is a worthwhile exercise in mechanics).
The charge in the volume $v_d A \Delta t$ is then:
$$\Delta Q = n q v_d A \Delta t \qquad (5.18)$$
which we read as “the number of charge carriers per unit volume times the charge per
carrier times the volume vd A∆t”. This means (dividing out the ∆t) that the total charge per
unit time that goes through A is:
$$I = \frac{\Delta Q}{\Delta t} \approx \frac{dQ}{dt} = n q v_d A = \rho_{\rm free} v_d A \qquad (5.19)$$
In passing we note that the SI units of current are Amperes (or Amps for short) where
$$1\ \text{Ampere} = \frac{1\ \text{Coulomb}}{1\ \text{Second}} \qquad (5.20)$$
The result I = nqvd A will occur again and again when we pass from a microscopic
description of e.g. magnetic forces on charges to macroscopic forces on current carrying
wires, so keep it in mind! It isn’t just a transient “use once” result; it is the key to under-
standing many things.
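To get a feel for the size of vd implied by I = nqvd A, here is a minimal Python sketch. The carrier density (a copper-like 8.5 × 10^28 per cubic meter), the 1 mm wire radius and the 1 ampere current are illustrative assumptions, not values taken from this text, and the function names are made up for the example.

```python
import math

E_CHARGE = 1.602e-19  # magnitude of the electron charge, coulombs


def current(n, q, v_d, area):
    """Current I = n q v_d A through a cross-section of the given area."""
    return n * q * v_d * area


def drift_speed(i, n, q, area):
    """Invert I = n q v_d A for the drift speed v_d."""
    return i / (n * q * area)


# Assumed illustrative values (not from the text).
n_cu = 8.5e28                     # free carriers per cubic meter (assumed)
area = math.pi * (1.0e-3) ** 2    # cross-section of a 1 mm radius wire, m^2
v_d = drift_speed(1.0, n_cu, E_CHARGE, area)
print(f"drift speed ~ {v_d:.1e} m/s")
```

With these assumed numbers the drift speed comes out as a small fraction of a millimeter per second, which is worth keeping in mind when we compare it to the thermal speeds estimated later.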
Note well that in the picture above, we determine the current that passes “a point in the
wire” by evaluating how much charge passes through some surface that contains the point!
The particular surface we chose in our simple derivation is one perpendicular to the direc-
tion of the motion of the charge, but we cannot possibly guarantee that all conductors
carrying a current will have some simple known surface where this is true. Also, as we
noted above, this picture should remind you of something – it is very similar to the pictures
we used to talk about electric flux in the context of Gauss’s Law!
The problem we face is that there are many surfaces that pass through any given point,
so talking about how much charge passes a point on the wire isn’t very well defined. We
would do better talking about how much charge passes through a closed curve drawn
around the wire (or other, arbitrarily shaped conductor), but even so, there are an infinite
number of surfaces bounded by any closed curve. We need the electric current through
such a loop not to depend on the surface chosen, at least in the (quasi) steady-state
dynamical equilibrium currents we are talking about here.
We can achieve this by recapitulating the reasoning of electric flux for (again) a single,
simple cylindrical wire where we can count on help from geometry. We want the current
through our surface A perpendicular to the direction of motion of the charge to be the same
as the current through a second surface A′ that is cut through the wire at more or less the
same place but is tipped at an angle θ relative to the direction of the current. As before, the
tipped surface area A′ = A/ cos(θ) is larger than A. In order to get the same current
I from these two surfaces, we need to compensate for the cosine on the bottom with one
on the top:
$$ I = nqAv_d = nq\,\frac{A}{\cos(\theta)}\cos(\theta)\,v_d = nqA' v_d \cos(\theta) \tag{5.21} $$
We can get the cosine out of a dot product between the local direction of the vector
drift velocity $\vec{v}_d$ (assumed to be parallel to the actual current at any point in the wire) and
n̂, the directed normal unit vector to the surface A or A′ :
$$ I = nqA\,\vec{v}_d \cdot \hat{n} = nqA'\,\vec{v}_d \cdot \hat{n}' \tag{5.22} $$
We have a single choice to make in this expression – there are two possible directions
perpendicular to the surface and we have to choose (for example) either left to right or
right to left as being positive n̂.
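The claim that the perpendicular and tipped surfaces carry the same current is easy to check numerically. The sketch below is only an illustration: the function names and the sample values of n, q, vd and A are assumptions chosen for the example.

```python
import math


def current_perpendicular(n, q, v_d, area):
    """I = n q A v_d for a surface at right angles to the flow."""
    return n * q * area * v_d


def current_tipped(n, q, v_d, area, theta):
    """I = n q A' (v_d . n_hat') for a surface tipped by theta (radians)."""
    area_tipped = area / math.cos(theta)   # the tipped surface is larger
    v_dot_n = v_d * math.cos(theta)        # dot product with the tipped normal
    return n * q * area_tipped * v_dot_n


# Assumed sample values, for illustration only.
n, q, v_d, area = 8.5e28, 1.6e-19, 1.0e-4, 1.0e-6
for theta in (0.0, 0.3, 1.0, 1.5):
    print(theta, current_tipped(n, q, v_d, area, theta))
```

Every angle short of π/2 gives the same current, because the larger area is exactly compensated by the smaller normal component of the drift velocity.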
Again as before in our discussions of electric flux, we can take an arbitrary curved
surface and break it up into tiny differential chunks dA, each with its own normal vector
n̂ selected with the same left-to-right or vice-versa sense. The chunks are small enough
that we can treat all the charges that pass through them as locally all going in the same,
unambiguous direction $\vec{v}_d$. For each of these, the differential current through the chunk is:
$$ dI = nq\,\vec{v}_d \cdot \hat{n}\,dA \tag{5.23} $$
and we can now unambiguously sum up all of the current through an arbitrary curved sur-
face or through plane surfaces where the flow of charge is not all parallel and perpendicular
to the surface.
If we choose as our surface any open surface S that cuts completely across a branch of
our conductor, we will find that it is always bounded by a closed curve C on the surface of
the branch. We can then write the following, completely general and correct definition for
the “current in the branch” in the steady state:
$$ I_C = \int_{S/C} nq\,\vec{v}_d \cdot \hat{n}\,dA = \int_{S/C} \vec{J} \cdot \hat{n}\,dA \tag{5.24} $$
where S/C is read “through the surface S bounded by the closed curve C” and where:
$$ \vec{J} = nq\,\vec{v}_d = \rho_{\rm free}\,\vec{v}_d \tag{5.25} $$
is called the current density. In other words, the current through an open surface S
bounded by a closed curve C is the flux of the current density through that surface.
Note well that this is still just I = nqvd A = ρfree vd A = JA for the simple cylindrical wire and
perpendicular surface A we began with, but it can now handle far more general flows of
current.
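As a rough illustration of what “the flux of the current density” means in practice, the following sketch sums J dA over thin rings of a circular cross-section. The parabolic profile, the wire radius and the value of J0 are hypothetical choices made only so that the answer can be checked by hand (the exact results are J0 πR² and J0 πR²/2).

```python
import math


def current_through_disk(j_of_r, radius, rings=10000):
    """Midpoint-rule sum of J(r) dA over thin rings, dA = 2 pi r dr.

    Assumes J is parallel to the normal of the disk and depends only on
    the distance r from the axis.
    """
    dr = radius / rings
    total = 0.0
    for k in range(rings):
        r = (k + 0.5) * dr
        total += j_of_r(r) * 2.0 * math.pi * r * dr
    return total


R = 1.0e-3    # wire radius in meters (assumed)
J0 = 1.0e6    # current density on the axis in A/m^2 (assumed)

i_uniform = current_through_disk(lambda r: J0, R)
i_parabolic = current_through_disk(lambda r: J0 * (1.0 - (r / R) ** 2), R)
print(i_uniform, i_parabolic)
```

The uniform case reproduces I = JA, while the non-uniform case shows why the integral form is needed as soon as J varies across the surface.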
We are now in a position to be able to derive a beautiful form for the Law of Charge
Conservation. Consider a simple closed surface S (like the ones we considered for
Gauss’s Law) located anywhere in space. We already know that the closed surface S
encloses some volume V /S, and we already know how to compute the total charge inside:
$$ Q_{\text{in } S} = \int_{V/S} \rho\,dV \tag{5.26} $$
or, the total charge inside S is the integral of the charge density inside.
If charge can never be created nor destroyed, the only way the total charge in V can
change is if charge moves across the surface S! Charge can flow in to the volume through
S or out of the volume through S, or both at the same time, but if S is impervious to charge
(say it is a “perfect insulator”), the charge inside can never change.
Quantitatively, then, the total current through S has to equal the rate of change of the
total charge inside. All we have to do is assign a choice for the direction of n̂ – into or out of
the volume – and write this in differential/integral form. Let’s choose “out” because it then
is consistent with Gauss’s Law (which will prove strangely useful to us later on!):
$$ I_{\rm out} = \oint_S \vec{J} \cdot \hat{n}\,dA = -\frac{d}{dt}\int_{V/S} \rho\,dV = -\frac{dQ_{\text{in } S}}{dt} \tag{5.27} $$
This equation is very important! It is, in fact, a law of nature, based on substantial
empirical evidence. It is the law of charge conservation written in mathematical form.
Basically, it says that the amount of charge inside any volume bounded by a closed surface
can only decrease (increase) if charge flows out (or in) through the surface! The net
charge inside cannot just poof into or out of existence, it has to get there by coming in from
outside71 .
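One way to see this bookkeeping at work is a one-dimensional toy version, sketched below under stated assumptions: the region is chopped into cells, the current density is specified on the faces between cells, and each cell gains what flows in and loses what flows out. The grid, the flux values and the function names are all invented for the illustration.

```python
def step_density(rho, j_faces, dx, dt):
    """One conservative update of the charge density in each cell.

    rho has N cells; j_faces has N + 1 entries, with j_faces[i] the current
    density (positive to the right) on the left face of cell i.
    """
    return [
        rho[i] - dt * (j_faces[i + 1] - j_faces[i]) / dx
        for i in range(len(rho))
    ]


def total_charge(rho, dx):
    """Charge per unit cross-sectional area contained in all the cells."""
    return sum(rho) * dx


# Hypothetical densities and face currents, for illustration only.
rho = [1.0, 2.0, 3.0, 2.0]
j_faces = [0.0, 0.5, -0.2, 0.3, 0.4]
dx, dt = 0.1, 0.01

q_before = total_charge(rho, dx)
q_after = total_charge(step_density(rho, j_faces, dx, dt), dx)
net_out = j_faces[-1] - j_faces[0]   # current out through the two ends
print(q_after - q_before, -dt * net_out)
```

The interior fluxes cancel in pairs, so the total charge changes only by what crosses the two end faces, which is the content of equation 5.27 in this toy setting.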
If/when you take a more advanced course in electromagnetism, one of the very first things
you will do is apply the divergence theorem to the Law of Charge Conservation, Gauss’s
Law, and expressions containing flux integrals in general and convert them to vector differ-
ential form. Treating the divergence theorem and doing this algebra is beyond the scope
of this course (although advanced students may have done it in the starred homework
problem in the Gauss’s Law chapter earlier and can get the same result with the same
procedure here) but we put down the result (only) here for completeness and to make it
easier to make the connection in a future course.
The law of charge conservation in differential form is:
$$ \vec{\nabla} \cdot \vec{J} + \frac{\partial \rho_e}{\partial t} = 0. \tag{5.29} $$
This ends up being much more convenient for doing the math associated with solving
serious electrodynamics problems. It also has a critical invariance property when one
learns about the four-dimensional geometry associated with the theory of special relativity
– basically charge is conserved in all inertial reference frames even when relativity is taken
into account.
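For readers who have already met the divergence theorem, the step from the integral form to the differential form can be sketched in a few lines. This is only an outline, assuming a fixed volume V bounded by S and fields smooth enough for the theorem to apply; it is not needed for anything that follows.

```latex
\begin{align}
\oint_S \vec{J}\cdot\hat{n}\,dA
  &= -\frac{d}{dt}\int_{V/S}\rho_e\,dV
  && \text{(charge conservation, integral form)} \\
\int_{V/S}\vec{\nabla}\cdot\vec{J}\,dV
  &= -\int_{V/S}\frac{\partial\rho_e}{\partial t}\,dV
  && \text{(divergence theorem on the left, fixed } V \text{ on the right)} \\
\int_{V/S}\left(\vec{\nabla}\cdot\vec{J}
  + \frac{\partial\rho_e}{\partial t}\right)dV
  &= 0
\end{align}
```

Since this has to hold for every volume V, however small, the integrand itself must vanish, which is the differential form quoted above.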
We can also look ahead a bit at this point. Soon we will discover that Maxwell’s equa-
tions are called Maxwell’s equations because Maxwell more or less discovered an incon-
sistency in the treatment of current in the original form of one of the laws that could only
71
There is another way charges can appear inside the box that doesn’t violate this law – they can be created
or destroyed a pair at a time in such a way that the net charge of the pairs remains zero. This actually happens
in high energy quantum mechanical collisions – making it beyond the scope of this course – but the creation
of a positron-electron pair does not violate net charge conservation.
be made consistent by adding a term to it to account for the implications of charge conser-
vation and the arbitrariness of the infinity of surfaces “through” which charge can flow that
are all bounded by a single closed curve C.
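[Figure 5.2: a closed curve C divides the closed surface into S1 and S2, with normals n̂, current density J, and enclosed charge Q.]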
To help the interested or advanced student out, consider figure 5.2. In this figure, we
split the closed surface S bounding V into two pieces, S = S1 +S2 by drawing a single closed
curve C all the way around it. S1 on the left and S2 on the right are ballooned out so that
they resemble two “fishing nets” placed face to face, through which charge can flow.
If current is flowing in a “steady state” way and charge is conserved, the current from
left to right through the two surfaces S1 and S2 must be equal – the current through the
first must equal the current through the second because in the steady state, no charge
is building up in between.
However, if we put e.g. a capacitor plate in between the two surfaces (or charge is
accumulating in some other way), current may not be flowing in a steady state way – charge
may be building up inside the closed surface S. In that case the difference between the
current through S1 and the current through S2 is the rate at which charge builds up inside
V:
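$$ \int_{S_1} \vec{J}\cdot\hat{n}\,dA - \int_{S_2} \vec{J}\cdot\hat{n}\,dA = I_{\text{into } V \text{ thru } S_1} - I_{\text{out of } V \text{ thru } S_2} = \frac{d}{dt}\int_{V/S} \rho_e\,dV = \frac{dQ_{S/V}}{dt} \tag{5.30} $$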
(for Q in S/V ). Note that all we really did to get this result is split the integral over the
closed surface S in our previous discussion into two pieces, and change the sign/direction
of n̂ for the first surface so that they both go “left to right” instead of “out”. You should verify
that this makes sense on your own.
Armed with this result, students are encouraged to “play Maxwell” as they go along, and
see if they can discover and fix this inconsistency all by themselves without looking ahead
to see how it is done when Ampere’s Law is introduced. You now have all the information
you need to do so except for, of course, the actual equation that needs to be repaired which
is covered in a later chapter. When you cover it, your instructor may refer you back to this
section and suggest again that you give it a try.
The earliest forms of the passive pinball game 72 worked by dropping a small metal sphere
(usually a ball bearing of some sort) into a vertical box fronted by a piece of glass and
studded on the inside with “pins” as pictured in figure 5.3. The ball would bounce down
through the pins in not-quite-random ways and end up in one of several slots at the bottom.
One could then gamble on just which slot a ball would end up in, or try to use skill in the
way the ball was dropped to determine the outcome. However, because the (essentially
classical) motion is effectively chaotic, it is nearly impossible to drop a ball into the array
in such a way that the final outcome could be controlled or predicted in anything but a
statistical way after the first two or three collisions with pins.
[Figure 5.3 labels: pins/bumpers, ball bearings, trajectory, gravity.]
Figure 5.3: An early pinball machine. Balls (typically small ball bearings) dropped in at the
top fall into an array of “pins” that function as bumpers but are vertically “stopped” by the
pins after falling a short time τ so they only build up a finite downward average speed.
Note well that the pinballs cannot escape through the sides, and to avoid complications
such as a ball striking a side and falling straight down to the bottom along a side, we will
assume that the sides are perfectly elastic bumpers that effectively reflect a ball back into the
lattice of pins in the horizontal direction without affecting its vertical motion.
Physicists, mathematicians and statisticians got involved in the game at a very early
point – for example the Bean Machine 73 was built specifically to demonstrate the central
limit theorem, an important result in the theory of probability and statistics. This sort of
machine is equally useful in the context of understanding classical resistance. Let us build
a very simple “pinball” model for conduction where the electric field that pushes charge
through a lattice of atoms is replaced by gravity pulling down ball bearings and where the
atoms in a lattice are replaced by the pins. One can still sometimes find simple pinball
machines of this sort (sometimes called Pachinko machines) sold as toys.
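The statistical behavior of a Bean Machine is easy to mimic on a computer. The sketch below is an idealization, not a model of the real (chaotic but deterministic) motion: it simply assumes each pin kicks the ball left or right with equal probability, and the number of rows, the number of balls and the function names are arbitrary choices for the example.

```python
import random


def drop_ball(rows, rng):
    """Return the final slot index (0..rows) after passing `rows` pins.

    Each pin is assumed to deflect the ball right (counted as 1) or left
    (counted as 0) with equal probability.
    """
    return sum(rng.random() < 0.5 for _ in range(rows))


def run_machine(balls, rows, seed=0):
    """Histogram of final slots for `balls` independent drops."""
    rng = random.Random(seed)
    counts = [0] * (rows + 1)
    for _ in range(balls):
        counts[drop_ball(rows, rng)] += 1
    return counts


counts = run_machine(balls=10000, rows=10)
for slot, c in enumerate(counts):
    print(f"{slot:2d} {'#' * (c // 50)}")
```

The printed histogram piles up in the middle slots and thins out toward the edges, which is the bell-shaped pattern the machine was built to demonstrate.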
72
Wikipedia: http://www.wikipedia.org/wiki/pinball. By “passive” we mean that the bumpers are not electrical
and don’t add energy to the bouncing ball.
73
Wikipedia: http://www.wikipedia.org/wiki/Bean Machine. Be sure to play with the dynamic graphics!
Let’s use this pinball model to make a simple conduction model, replacing the balls with
free charges and the pins with the lattice of atoms through which the charges move. There
is just one catch – in the passive pinball model illustrated above, the balls fall between
pins only due to the force of gravity and the pins themselves effectively stop their downward
motion on each collision so they have to build up speed again, until the next collision stops
it – again.
In an actual lattice of atoms, the atoms are at a finite temperature and the extremely
light electrons are in thermal equilibrium (more or less) with the lattice. In a nutshell,
as we explore in detail below, this means that the average thermal kinetic energy of the
conduction electrons is much, much larger than the energy they might gain from the field
between collisions in the passive pinball model!
To put it another way, suppose it takes an electron a time τE to “fall” (say) some average
distance between atoms/collisions and a time τtherm for the electron to travel that same
distance at its average thermal speed. If the average thermal speed
is much larger than the average speed built up between passive collisions, then:
τE ≫ τtherm . (5.31)
and a conduction charge will undergo many, many thermal collisions in the time it would
have taken to experience one passive collision with the lattice!
These thermal collisions are not biased in any particular direction – they are equally
likely to make the charge go up, down, right, left, forward, backward in the lattice, so they
do not directly contribute to the flow of current! However, during the shorter time τtherm ,
the component of the velocity of an electron in the direction of the force due to the electric
field is slightly increased so that – on average – the electron “drifts” in the direction of this
force as it otherwise bounces around randomly and rapidly in all directions.
This “rapid thermal collision with weak electrical force biasing electron drift” picture, illustrated
in figure 5.4, is called the Drude model. With no voltage/field, the lattice of atoms and
charges behaves like a horizontal “active” pinball table filled with pinballs (charges) and
bumpers (atoms) that are firing/vibrating at an extremely rapid rate so that charges con-
stantly bounce between the atoms in an unbiased random walk, leading to zero average
displacement. The charges themselves are (recall) strongly repulsive and there is an
average of roughly one charge per atom.
The application of a voltage across the conductor is equivalent to tipping the active
pinball table up through a small angle relative to gravity. In such a tipped, active table grav-
ity (acting only during the brief time between collisions) will still make the balls gradually
drift lower in the direction of the component of the gravitational field parallel to the table by
slightly biasing the direction of their otherwise random motion.
In a conductor as well there is a small net acceleration in the direction determined by
the electrostatic force on the charges during this short time between bounces. Quantitatively
(as we shall see below), we expect each charge to “bounce” roughly thousands
of times between the atoms in the time that it would take it to cover the distance from
one collision to the next due only to the applied field in the passive pinball model, but this
small, asymmetric force is enough to bias the random motion of the charges so that
they slowly drift in the direction of net force.
[Figure 5.4 labels: no-field trajectory, charge q, E field, trajectory with field.]
Figure 5.4: The Drude model: With no field, the charges q bounce very rapidly between
atomic “bumpers” that maintain a roughly thermal distribution of charge speeds. On av-
erage, these collisions form a “random walk” with no direction and zero average displace-
ment. With a field, in the very short time τ between collisions these random free trajectories
are very slightly curved in the direction of the field and the random walk is now biased, with
a net displacement that slowly accrues in the direction of the field.
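The biased random walk in this picture can be imitated with a very crude one-dimensional simulation. Everything in the sketch below is an assumption made for illustration: the walker restarts after each collision with a thermal velocity of fixed magnitude and random sign, the time between collisions is a fixed τ rather than a distribution, and the numbers are dimensionless toy values chosen so that aτ is small compared to the thermal speed.

```python
import random


def mean_drift_velocity(a, tau, v_thermal, steps, seed=1):
    """Average velocity of a 1D walker with random thermal restarts.

    After every collision the walker restarts with velocity +v_thermal or
    -v_thermal (equal odds) and then accelerates at `a` for a time `tau`.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        v0 = v_thermal if rng.random() < 0.5 else -v_thermal
        x += v0 * tau + 0.5 * a * tau * tau   # constant-acceleration kinematics
    return x / (steps * tau)


# Dimensionless toy numbers (assumed), with a * tau << v_thermal.
a, tau, v_th = 0.1, 1.0, 1.0
v_sim = mean_drift_velocity(a, tau, v_th, steps=200000)
v_model = 0.5 * a * tau
print(v_sim, v_model)
```

The thermal steps average away, and what is left over is a small net velocity close to aτ/2, far smaller than the speed of any individual step.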
This is why vd is called the “drift velocity” – it is a velocity that is quite distinct from the
actual speed of the particles as they bounce around vigorously between the bumpers! To
make a proper conduction model out of this, we also need to (metaphorically) constantly
take “pinballs” (charges) off of the lower end of the table and elevate them back up to
the top of the table with a conveyor belt of some sort or another so that charge doesn’t
accumulate at the bottom and generate a backwards-directed field of its own that stops the
process. This constant lifting of charges back to “start over again” is an excellent mental
model of a battery! We finally have a microscopic picture of sorts of a simple electrical
circuit consisting of a battery and a resistance!
Non-physics majors can probably skip the details in most of the next subtopic (depend-
ing on the wishes of your instructor) although it will still be helpful even for non-majors to
skim through the section to get a feel for what is going on and how this depends on that,
etc. Physics majors almost certainly should work through it in detail even if your instructor
doesn’t plan on testing you on any of those details – you will encounter the Drude model
again in future courses where you are likely to be held responsible for doing the math on
a quiz or exam, and it will greatly simplify matters for you then if you make some effort to
understand the model in some detail now without the pressure of testing.
Note that all students are responsible for the key result – getting Ohm’s Law (in both
forms) out of the general argument, most of which is intended to justify the linear relation-
ship between vd and E that is key to the relationship.
Let’s now generate the full algebraic description of the Drude model, using the insight that
conduction in a resistor can be quantitatively modelled “like the motion of pinballs in a very
slightly tipped pinball table with a lattice of highly active bumpers”.
We start by estimating the mean speed of the conduction charges from thermodynam-
ics. If the lattice is at temperature T , and the free charges are in thermal equilibrium with
the lattice (a reasonable assumption), then from the equipartition theorem we expect the
average kinetic energy of the free charges in a three dimensional space when the “table”
is not tipped by an applied electric field to be:
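$$ K = \frac{3}{2} k_b T = \frac{1}{2} m\left(\langle v_x \rangle^2 + \langle v_y \rangle^2 + \langle v_z \rangle^2\right) = \frac{1}{2} m \langle v \rangle^2_{\rm thermal} \tag{5.32} $$
We solve algebraically for $\langle v \rangle_{\rm thermal}$ to get:
$$ \langle v \rangle_{\rm thermal} = \sqrt{\frac{3 k_b T}{m}} \tag{5.33} $$
We can now evaluate/estimate this easily enough at (say) 300 kelvin for electrons
from $k_b = 1.38 \times 10^{-23}$ J/K and $m_e = 9.1 \times 10^{-31}$ kg as:
$$ \langle v \rangle_{\rm thermal} = 1.17 \times 10^5 \approx 10^5 \text{ m/sec} \sim \sqrt{T} \tag{5.34} $$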
to get the order of magnitude for the thermal speed expected in metallic conductors “around”
room temperature. This scales like the square root of the absolute temperature! Even if the
model is inexact in some detail or another, we can (eventually) test the predictions of the
model against this scaling with temperature!
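The thermal speed estimate and its square-root scaling can be checked in a few lines. This is a minimal sketch using the rounded constants quoted above; the function name is just a label chosen for the example.

```python
import math

K_B = 1.38e-23   # Boltzmann constant in J/K (rounded value used in the text)
M_E = 9.1e-31    # electron mass in kg (rounded value used in the text)


def thermal_speed(temperature, mass=M_E):
    """Classical thermal speed sqrt(3 k_b T / m) from equipartition."""
    return math.sqrt(3.0 * K_B * temperature / mass)


v_300 = thermal_speed(300.0)
v_1200 = thermal_speed(1200.0)
print(f"{v_300:.3g} m/s at 300 K, {v_1200:.3g} m/s at 1200 K")
```

Quadrupling the absolute temperature doubles the speed, as the square-root scaling requires.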
Note that this is very, very fast (an order of magnitude greater than escape speed from
the Earth!), and at that turns out to be not fast enough in purely classical quantitative
models because the electrons are not a classical gas of non-interacting particles, they are
quantum mechanical strongly interacting fermions whose effective “speed” is determined
by something called the Fermi energy that is only weakly dependent on the temperature.
A quantum mechanical treatment leads to average speeds roughly an order of magnitude
larger:
$$ \langle v \rangle = \langle v \rangle_{\rm QM} \sim 10^6 \text{ m/sec} \tag{5.35} $$
Note that this is around a hundred times escape speed, and roughly a third of a percent of the
speed of light!
Next, we define the mean free path d as the average distance a free charge travels in
some unbiased random direction between atomic “bumpers” at this average speed. We’ll
assume (for our purpose of numerical estimation) that $d = \langle v \rangle \tau_{\rm therm} \approx 10^{-10}$ meters or
one angstrom, the typical order of the distance between atoms in a metal. Then we can
estimate the average time between “bumper events” when the charges interact violently
with the atoms in the lattice as:
$$ \tau_{\rm therm} = \frac{d}{\langle v \rangle} \sim 10^{-16} \text{ seconds} \tag{5.36} $$
Before we go any further, we need to compare this time to the time τE it ought to take
for a charge to start at rest and to move a distance d due to the electric field only in the
passive pinball model, where the only thing giving the ball speed is the electric field itself
and the collisions completely cancel the vertical speed, on average, at each collision. To
get a quantitative estimate, we have to pick an electrical field strength. Let’s assume a
(fairly strong) field, one we might expect to find in the filament of an incandescent light bulb
with a 100 volt potential difference across a 1 cm filament:
$$ E = \frac{\Delta V}{\ell} = \frac{100 \text{ volts}}{0.01 \text{ meters}} \approx 10^4 \text{ volts/meter}. \tag{5.37} $$
This large number is fairly representative of the field strength in significant resistive
loads, and is orders of magnitude larger than the field strength one would expect in a
halfway decent conductor such as household copper wiring.
To estimate τE , we use ordinary kinematics:
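$$ d \approx \frac{1}{2} a \tau_E^2 \quad\Rightarrow\quad \tau_E = \sqrt{\frac{2d}{a}}. \tag{5.38} $$
The acceleration of an electron in this field is:
$$ a = qE/m = 1.6 \times 10^{-19} \times 10^4 / 9.1 \times 10^{-31} = 1.76 \times 10^{15} \text{ m/sec}^2 \tag{5.39} $$
so that:
$$ \tau_E = \sqrt{\frac{2d}{a}} \approx 3 \times 10^{-13} \text{ seconds} \tag{5.40} $$
which is orders of magnitude greater than τtherm as expected even for quite strong fields.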
It also lets us estimate what would have been vd in a passive pinball model, using the fact
that the average velocity of a particle starting at rest with a constant acceleration is half the
acceleration times the time (because the velocity is linear in the time):
$$ v_d = \langle v \rangle_{pp} = \frac{d}{\tau_E} = \frac{1}{2} a \tau_E \sim 10^3 \text{ m/sec} \tag{5.41} $$
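The chain of order-of-magnitude estimates above is easy to reproduce. The sketch below uses only the rounded numbers quoted in this section (q, m, the one angstrom mean free path, the 10^6 m/sec speed and the 10^4 volts/meter field); the variable names are arbitrary, and the last line simply applies the same “half the acceleration times the time” rule to the much shorter thermal collision time for comparison.

```python
import math

Q_E = 1.6e-19     # electron charge magnitude in C (rounded, as in the text)
M_E = 9.1e-31     # electron mass in kg (rounded, as in the text)
D = 1.0e-10       # assumed mean free path in m (one angstrom)
V_QM = 1.0e6      # quantum-corrected speed estimate in m/s
E_FIELD = 1.0e4   # field in the light-bulb filament example, V/m

tau_therm = D / V_QM               # time between thermal collisions
a = Q_E * E_FIELD / M_E            # acceleration due to the field
tau_e = math.sqrt(2.0 * D / a)     # time to cover d from rest, field only
v_pp = 0.5 * a * tau_e             # passive-pinball average speed
v_short = 0.5 * a * tau_therm      # same rule with the thermal collision time

print(f"tau_therm = {tau_therm:.1e} s, tau_E = {tau_e:.1e} s")
print(f"a = {a:.2e} m/s^2, v_pp = {v_pp:.1e} m/s, v_short = {v_short:.1e} m/s")
```

The ratio tau_e / tau_therm comes out in the thousands, which is the quantitative content of the statement that a charge bounces thousands of times in the time the field alone would need to move it one lattice spacing.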
During the short time τtherm between thermal collisions, we expect the force exerted by
any reasonable electric field inside the material to be “small” compared to the force exerted
by the thermally vibrating atomic “bumpers” so that biased accumulation of momentum in
the direction of the field during the time τ is approximately differential. Also, the strong
interaction between lattice and charge maintains near thermal equilibrium with the much
more massive lattice; any kinetic energy gained from the field during the time τtherm is (on
average) lost again (transferred to the lattice) in each lattice collision so that any given
charge doesn’t systematically accumulate kinetic energy as it moves in the direction of the
field but rather heats the entire material while remaining in thermal equilibrium with it. This
heating is called Joule heating and we will discuss it further later.
We are finally ready to build the Drude model. From Newton’s second law it is easy to
find the acceleration of a charge q during the time between collisions:
$$ \vec{F} = q\vec{E} = m\vec{a} \quad\Rightarrow\quad \vec{a} = \frac{q\vec{E}}{m} \tag{5.42} $$
This acceleration only applies for the average time τtherm before the charge is redirected in
a random direction with the original unchanged thermal distribution of speeds. During this
time its average velocity is:
$$ \langle \vec{v} \rangle = \frac{1}{2}\vec{a}\,\tau_{\rm therm} = \frac{1}{2}\frac{q\vec{E}}{m}\tau_{\rm therm} = \vec{v}_d \tag{5.43} $$
where the term “drift velocity” is now formally justified as it is the differential bias of a much
more rapid and violent random process of the charged particles bouncing around between
the atoms.
You will note that this is exactly the same as the expression we obtained in the pinball
model except that instead of using τE from $d = \frac{1}{2}\frac{qE}{m}\tau_E^2$ (where this force is all that makes
the charge move the distance d starting each time from rest), we instead use the average
time τtherm determined by the strong interaction of the charged particles with the surrounding
material while remaining in thermal equilibrium with it! This seems like a small change, but
it is a very important one, as the drift velocity one estimates is a thousand times smaller in
the second (more correct) case!
In this “active pinball” Drude model, then, we expect from equation 5.43 that:
$$ \boxed{\vec{J} = nq\,\vec{v}_d = \frac{n q^2 \tau_{\rm therm}}{m}\vec{E} = \sigma_c \vec{E}} \tag{5.44} $$
which scales linearly with the electric field strength. In this expression, we know or can
estimate all of the parameters, and can easily combine the estimates into σc , a quantity
we will call the conductivity of the material. Note that τtherm is independent of $\vec{E}$ in this
model! This is a crucial point, as we shall see next.
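Suppose we use the time $\tau_E = \sqrt{\frac{2md}{qE}}$ from the naive passive pinball model above in
exactly the same argument. We then obtain an average speed (not really a “drift velocity”
any more) of:
$$ \langle v \rangle_{pp} = \frac{1}{2} a \tau_E = \frac{1}{2}\frac{qE}{m}\tau_E = \frac{1}{2}\frac{qE}{m}\sqrt{\frac{2md}{qE}} = \sqrt{\frac{dqE}{2m}} \tag{5.45} $$
or
$$ J = nq \langle v \rangle_{pp} = nq\sqrt{\frac{dqE}{2m}} \tag{5.46} $$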
(all vectors in the direction of the applied field). The current density would then scale with
the square root of the field, which does not empirically agree with Ohm’s Law! This is a
critical failure of the passive pinball model, one that cannot be explained away by mere
“slop” in our estimation process!
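The difference in scaling between the two models is easy to see numerically. The sketch below evaluates the Drude current density of equation 5.44 and the passive pinball current density (nq times the average speed of equation 5.45) at two field strengths. The carrier density is an assumed copper-like value, the other parameters are the rounded estimates used above, and the function names are made up for the example.

```python
import math


def j_drude(e_field, n, q, m, tau_therm):
    """Drude current density, J = (n q^2 tau_therm / m) E: linear in E."""
    return n * q * q * tau_therm / m * e_field


def j_passive(e_field, n, q, m, d):
    """Passive-pinball current density, J = n q sqrt(d q E / (2 m))."""
    return n * q * math.sqrt(d * q * e_field / (2.0 * m))


n = 8.5e28                      # carriers per cubic meter (assumed)
q, m = 1.6e-19, 9.1e-31         # electron charge and mass (rounded)
tau, d = 1.0e-16, 1.0e-10       # collision time and mean free path estimates

ratio_drude = j_drude(2.0e4, n, q, m, tau) / j_drude(1.0e4, n, q, m, tau)
ratio_passive = j_passive(2.0e4, n, q, m, d) / j_passive(1.0e4, n, q, m, d)
print(ratio_drude, ratio_passive)
```

Doubling the field doubles the Drude current density but increases the passive pinball result only by a factor of the square root of two, so only the former is compatible with Ohm’s Law.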
If we compare the Drude model result equation 5.44 to the equilibrium condition qE =
bvd from our elementary discussion assuming linear drag, we see that the “linear drag
coefficient” b introduced phenomenologically is given by:
$$ \boxed{b = \frac{m}{\tau_{\rm therm}}} \tag{5.47} $$
or:
$$ \vec{F}_d = \frac{m\vec{v}_d}{\tau_{\rm therm}} = \frac{\Delta\vec{p}}{\Delta t} \tag{5.48} $$
This is conceptually perhaps the easiest way to see what’s going on. $m\vec{v}_d$ is the average
momentum of each charge carrier in the conductor. In each interval τtherm between “active
bumper” collisions, the carrier starts from (on average) rest and gains this much momentum
from the field, and then is brought suddenly back to rest by the lattice. The “drag force”
thus equals the average momentum change per unit time of the free charges as they
move through the lattice of atoms, and depends on two easily understood parameters.
If you skipped over the last subtopical section, take a moment to look back at the boxed
equations 5.44 and 5.47. These two results are the most important things to take from the
omitted section if you are just skimming it without working through its arguments in any
detail, as they contain the testable components of the Drude model. Equation 5.44 shows
that it gives a current density proportional to $\vec{E}$ and hence to the potential difference across
a piece of conductor of some given length. It is also proportional to the time between ther-
mal collisions τtherm between the charge carriers and the atomic/molecular lattice, which in
turn scales inversely with the square root of the absolute temperature. The implicit scaling
of these relationships can easily be compared to observation even if some of the details of
the development of the model were just estimated and perhaps were even a bit sketchy.
With these results in hand we can easily establish the connection between Ohm’s Law
(a well-known empirical result) and the Drude model. We just showed above that the
current density $\vec{J}$ is proportional to the applied electric field $\vec{E}$. In so doing, we wrapped up
all of the complexity – all the unknown stuff about a conductor, including b, n, q, m, τtherm –
into a single parameter called the conductivity:
$$ \sigma_c = \frac{n q^2 \tau_{\rm therm}}{m} = \frac{n q^2}{b} = \frac{q \rho_{\rm free}}{b} = \frac{1}{\rho_r} \tag{5.49} $$
In this equation we note the terrible collision of symbols that is (sadly) just the way it is
when discussing the conductivity σc and its reciprocal, the resistivity ρr of the material (also
defined in this equation). As you no doubt have observed, physicists reuse the symbol
for volume charge density ρ for resistivity, and worse, reuse the symbol for surface charge
density σ for conductivity! Who invented this stuff! The equation for ρr even contains
ρfree = nq, the density of free (conduction) charge, just to maximally confuse you! To do
my best to help you out, I’m going to use ρr with the little ‘r’ subscript for resistivity, and will
similarly label σc with a ‘c’ for conductivity, but you’ll still need to be careful at first!
Fortunately, with a little practice you will rapidly learn to identify which symbol goes
where from its units/dimensions and from context, so eventually you won’t be any
more confused than you are by sentences like “The two hippopotami, each wearing a tu-tu
that was too, too much, went over to the bar to order two beers.” We can even hear this
sentence and effortlessly track two (number), tu-tu (ballerina dress), too (comparator), to
(short for ‘toward’), and to (infinitive form of the verb ‘order’) without much thinking about
it. So shall it eventually be with you and symbol overloading in physics.
The resistivity and/or conductivity74 is a characteristic of the material of the conduc-
74
Wikipedia: http://www.wikipedia.org/wiki/Electrical resistivity and conductivity. As usual, follow this
tor in question and as the Drude model (and experience) suggests, depends on many
things, such as absolute temperature and details in the structure of the conductor – almost
certainly more than are included in the model or our crude estimates so far!
[Figure 5.5 labels: L, A, J, E, ρ.]
Figure 5.5: A simple resistor with resistivity ρ, length L, and cross sectional area A.
For now, let us consider an archetypical “resistor”: a uniform conductor with resistivity
ρr , length L, and cross-sectional area A (where the ends are at right angles to the sides),
as pictured in figure 5.5.
We can rearrange the current density equation 5.44 as:
$$ \vec{E} = \rho_r \vec{J} \tag{5.50} $$
The electric field and current density inside of this volume are both uniform (in steady state,
all of the charges must move through the volume at the same speed or charge would build
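up somewhere in the volume). The electrical current is the flux of the current density through either
end, so:
$$ \int \vec{E} \cdot \hat{n}\,dA = EA = \rho_r \int \vec{J} \cdot \hat{n}\,dA = \rho_r I \tag{5.51} $$
where I is the current through the resistor. Because the field is uniform, the potential
difference across the length L of the resistor has magnitude EL, so that:
$$ V_R = EL = I \rho_r \frac{L}{A} = IR $$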
VR is thus the amount the electric potential decreases going from one side of the resis-
tor R to the other in the direction of the field/current. We will often write the potential without
the R subscript to simplify the algebra a tiny bit when there is no ambiguity introduced by
so doing, and will similarly usually omit the sign and just remember that the potential drops
going across a resistor in the direction of the current.
We’ve introduced a new quantity R, called the resistance of the conducting material
in this particular geometry:
$$ R = \rho_r \frac{L}{A} \tag{5.54} $$
wikipedia link to learn more about resistivity and conductivity than this short treatment allows, as well as
to access tables of resistivities and temperature coefficients of resistivity.
V (or VR ) = IR (5.55)
where as noted we have left off the sign. This equation is known as Ohm’s Law and we
will use it extensively in the weeks to come. Note that we could equally well have called
equation 5.44 Ohm’s Law, as the two basically assert exactly the same thing, one in terms
of current density and field, the other in terms of current and potential, and you may well
see it referred to by this name in more advanced electrodynamics textbooks.
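As a quick worked example of R = ρr L/A and V = IR, here is a minimal sketch. The wire dimensions, the 2 ampere current and the copper-like resistivity of 1.7 × 10^−8 ohm-meters (a typical handbook figure) are assumptions made for illustration, not values from this text.

```python
import math


def resistance(rho_r, length, area):
    """R = rho_r * L / A for a uniform conductor with square-cut ends."""
    return rho_r * length / area


def voltage_drop(current, r):
    """Ohm's Law, V = I R (magnitude only, sign omitted as in the text)."""
    return current * r


# Assumed example: 10 m of 1 mm radius wire, copper-like resistivity.
rho_cu = 1.7e-8                    # ohm-meters (assumed handbook-style value)
area = math.pi * (1.0e-3) ** 2     # cross-sectional area in m^2
r_wire = resistance(rho_cu, 10.0, area)
v_wire = voltage_drop(2.0, r_wire)
print(f"R = {r_wire:.3f} ohms, V = {v_wire:.3f} volts at 2 A")
```

For these assumed numbers the wire has a resistance of a few hundredths of an ohm, which is why the potential drop along ordinary hookup wire is usually neglected compared to the drop across a deliberate resistive load.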
The SI units of the resistance are known as ohms (volts per ampere, obviously) and
given the symbol Ω in most literature. Since a volt is a joule per coulomb, and an ampere
is a coulomb per second,
$$ 1\ \Omega = \frac{1 \text{ Volt}}{1 \text{ Ampere}} = \frac{1 \text{ Joule} \cdot \text{Second}}{1 \text{ Coulomb}^2} = \frac{1 \text{ Second}}{1 \text{ Farad}} $$
Note well that we used the fact that the SI units of capacitance, farads, are coulombs
squared per joule, so the SI units of R times C are seconds, a pure time. This will be
important to us by the end of this chapter.
Just from the simple relation R = ρr L/A we can tell many things about the ways resis-
tances will add in various configurations. If we put two identical resistances one right after
another in a circuit, that’s the same as one resistance twice as long, so we expect resis-
tances in series to add, increasing the total resistance. If we put two identical resistances
in parallel, that’s the same as one resistance with twice the area, which will decrease
the resistance by a factor of two. We therefore expect that parallel resistance will obey a
reciprocal addition rule. We will derive these two results more carefully below.
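The scaling argument above can be checked with a few lines of Python. This is only a sketch: the function name `resistance` and the copper-like numbers are assumptions chosen for illustration, not values taken from this chapter.

```python
def resistance(rho, length, area):
    """Resistance R = rho * L / A of a uniform conductor (SI units)."""
    return rho * length / area

# Assumed illustrative values: a copper-like wire, 1 m long, 1 mm^2 in cross-section.
rho = 1.7e-8   # ohm-metres (approximate)
L = 1.0        # metres
A = 1.0e-6     # square metres

R = resistance(rho, L, A)
R_double_length = resistance(rho, 2 * L, A)   # two identical pieces end to end
R_double_area = resistance(rho, L, 2 * A)     # two identical pieces side by side

print(f"R                = {R:.3e} ohm")
print(f"twice the length = {R_double_length / R:.2f} x R")
print(f"twice the area   = {R_double_area / R:.2f} x R")
```

Doubling the length doubles R (the series case) and doubling the area halves it (the parallel case), which is all the heuristic argument claims.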
Before going on, it is worthwhile to point out the analogy between current flowing in
a wire with finite resistance and water flowing in a pipe packed with something e.g. sand
that similarly resists the flow of water. The flow of water through a sand-filled pipe is
proportional to the pressure difference across the pipe, so pressure difference is analogous
to voltage difference. The current of water is analogous to the current of charge. The
resistance of the wire is analogous to the resistance of the sand-filled pipe. A pipe twice
as long will let half the water through at the same pressure difference. A pipe twice as wide
will let twice the water through at the same pressure difference. There is even a “current
density” for the water in motion that is the analogue of the current density of the charge.
Even pipes that are not filled with sand have an “Ohm’s Law” of the form ∆P = IR where
R is the “resistance” of the pipe and I is the volumetric current in the pipe, as we discussed
in the chapter on fluids in the first semester textbook.
This is really a rather compelling analogy, and since students are sometimes more
comfortable visualizing the flow of water in pipes than they are imagining electrons flowing
in wires, it is offered up to help you build up your conceptual understanding of the latter
using your prior knowledge and experience of the former, where a day doesn’t pass where
you don’t “switch on and off” the flow of water by means of increasing or decreasing the
area of a pipe using a tap and where the flow of water out against the resistance of all of
the plumbing isn’t increased or decreased by the water pressure entering your house from
the main.
In this analogy, a capacitor can also be visualized as a wide section of pipe containing
a piston on a spring. The piston blocks water flow, but if one applies a pressure difference
then water flows into the pipe section, compressing the spring, until the back-force of the
spring balances the force on the piston due to the pressure difference. At that point this
“capacitor” has stored some water on one side and has had an equivalent amount pushed
off the other side, just like a regular capacitor. Note well that this suggests correctly that
capacitors will dynamically behave like springs in an electrical circuit, storing potential en-
ergy and charge and releasing it back to the circuit, causing current and charge to oscillate.
Later we’ll discover a quantity and associated electrical device that behaves just like mass
in such an analogous arrangement, and our analogical reasoning will be complete!
at least for T close to room temperature (at very low temperatures quantum phenomena
come into play and at very high temperatures our simple model breaks down in other ways).
This is not what is observed. In fact, the resistivities of most common resistive metals
increase approximately linearly with temperature, at least in the range of temperatures
near room temperature, although they deviate significantly from this at low temperatures.
Some metals aren’t even linear or square root in temperature - they are best fit by power
laws with exponents not equal to 1 or 1/2. Worse, insulators tend to have their (initially
large) resistivity decrease with increasing temperature – going the exact opposite to the
way we expect from anything like the classical Drude model.
This is why our classical discussion of resistance has been more to give you a plausible
picture of conduction and resistance and explain some features of the result without ever
claiming to be a good model or a correct model. To correctly understand resistance in
materials, one simply has to use quantum statistical mechanics and electronic band theory
from the beginning, making it a remarkably difficult subject to study or make quantitative
predictions about. The classical predictions get some features right for some materials,
but get some wrong (like temperature dependences) for nearly all materials.
Fortunately, none of this detail matters too much to people who want a practical de-
scription of temperature dependence that will work well enough for most materials near
room temperature. In this case one can take the correct thermal dependence – whatever
it might be – and develop a linearized Taylor series expansion and express the result with
tabulated coefficients ρ0 , α, and T0 (a reference temperature, e.g. 20 °C = 293 K, corresponding
to ρ0 and α):
$$ \rho_r(T) = \rho_0 \{1 + \alpha (T - T_0)\} \qquad (5.59) $$
In this expression, ρ0 is the resistivity at temperature T0 and $\alpha = \frac{1}{\rho_0} \frac{\partial \rho}{\partial T}$ evaluated at T = T0 .
α is called the temperature coefficient of resistivity. This equation allows resistivity to be
accurately computed across a moderate, relevant, range of temperatures by means of
three mutually tabulated quantities75 .
However, this linearized expression, even, is too complicated for most of our purposes
here76 . In this introductory classical textbook we will generally assume that α ≈ 0
(or that we are at T = T0 ) so that ρ = ρ0 for any given material and concentrate instead
below on the simple scaling of resistance with length and area of the resistor, Pouillet’s
law. Obviously, one cannot do this if one is designing circuits that heat up significantly
as they operate or that have to function correctly across a wide range of temperatures,
and this whole approach fails for things like semiconductors or superconductors that can
be understood only with a correct treatment in quantum theory or very different functional
equations.
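For readers who do want to use the linearized expression, here is a minimal Python sketch. The function name `resistivity` is hypothetical, and the copper numbers are assumed, approximate handbook-style values rather than data from this text; check a table before relying on them.

```python
def resistivity(T, rho0, alpha, T0=20.0):
    """Linearized resistivity rho0 * (1 + alpha * (T - T0)).

    T and T0 must be in the same units (degrees C here); alpha is per degree.
    """
    return rho0 * (1.0 + alpha * (T - T0))

# Assumed approximate values for copper near room temperature.
rho0_cu = 1.68e-8   # ohm-metres at T0 = 20 C
alpha_cu = 3.9e-3   # per degree C

for T in (0.0, 20.0, 50.0, 100.0):
    rho = resistivity(T, rho0_cu, alpha_cu)
    print(f"T = {T:6.1f} C   rho = {rho:.3e} ohm m   ({rho / rho0_cu:.3f} x rho0)")
```

With numbers of this size the resistivity changes by roughly 30% between room temperature and 100 °C, which gives a feel for when setting α ≈ 0 is and is not a reasonable approximation.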
Figure 5.6: Symbols for batteries, capacitors, resistances, wires, and ground.
Before proceeding any further, we need to add a symbol to our collection of symbols for
circuit elements. We already have a symbol for capacitance, for a voltage source or bat-
tery and for a “wire”, but now that conducting wires have this new property of resistance,
we need to be a bit more specific. From now on, wires will be assumed to have zero re-
sistance in all circuit diagrams. This specifically means, since VR = IR, that the voltage
drop across any ideal wire is zero independent of the current carried by that wire. Obvi-
ously, this is not physical, but if the resistance of the wire is important, it will (and should)
75
Wikipedia: http://www.wikipedia.org/wiki/Electrical resistivity and conductivity. This is a good article for
you to look over to get a hint of the quantum theory as well as a useful table for many materials of these
parameters in the linearized expression for ρr (T ).
76
Unless, of course, you are a physics major or are interested in electrical engineering, in which case you
would do well to at the very least earmark this discussion for future reference in more advanced courses.
be indicated as an explicit “resistor” in series with the wire in question that represents the
resistance of that particular segment of wire. Resistance itself has the new symbol indicated
above, typically labelled with its resistance value in ohms or a suitably indexed R.
Batteries and capacitances are unchanged (although both may have internal, non-ideal
resistance that will similarly be represented by in-line series or parallel resistance symbols
when appropriate). Finally, the ground symbol, indicating a specific potential of zero for all
wires connected directly to it, is recapitulated.
We are now ready to draw collections of individual resistors connected in series or
in parallel, and to derive the effective total resistance of these arrangements. These are
pictured in figure 5.7.
Figure 5.7: Three resistors R1 , R2 , R3 arranged in series (left, (a)) and parallel (right, (b)),
along with the equivalent/total resistances of each one portrayed below. In both cases the
total resistance is “equivalent” when applying a voltage Vab across the a and b contacts
produces the same total current Itot in the top and bottom figure.
5.3.1: Series
Suppose we apply a fixed voltage Vab across the contacts in the upper (a) diagram. This
produces some current Itot in the single (serial) line of resistors. Since charge is con-
served and there is nowhere for it to go but through the resistors, this same current passes
through each resistor in turn. We can thus use Ohm’s Law to determine the voltage drop
across each resistor in terms of this total current:
V1 = Itot R1 (5.60)
V2 = Itot R2 (5.61)
V3 = Itot R3 (5.62)
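The total voltage across the chain is the sum of these drops, Vab = V1 + V2 + V3 =
Itot (R1 + R2 + R3 ) = Itot Rtot , so that:
Rtot = R1 + R2 + R3 (5.65)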
There was nothing “special” about having only three resistors. We could have had four,
five, or N resistors in series and we’d simply have more terms in a general equation:
$$ V_{ab} = \sum_{i=1}^{N} I_{tot} R_i = I_{tot} \sum_{i=1}^{N} R_i = I_{tot} R_{tot} \qquad (5.66) $$
so that in general the rule for the addition of N resistors in series is:
$$ R_{tot} = R_1 + R_2 + \ldots + R_N = \sum_{i=1}^{N} R_i \qquad (5.67) $$
5.3.2: Parallel
In the case of resistances in parallel, we have the same voltage Vab applied across all of the
resistors in parallel. If we look at the upper (b) figure, we can use Ohm’s Law to evaluate
the current through each resistor, given a common voltage Vab across them:
$$ I_1 = \frac{V_{ab}}{R_1} \qquad (5.68) $$
$$ I_2 = \frac{V_{ab}}{R_2} \qquad (5.69) $$
$$ I_3 = \frac{V_{ab}}{R_3} \qquad (5.70) $$
Now, consider the total current Itot flowing into the arrangement from point a. Charge
is conserved, so that all of the charge that flows into the first junction connecting the three
independent conducting pathways through the resistors must flow out of it and into the
three resistors. From this we conclude that:
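$$ I_{tot} = I_1 + I_2 + I_3 = \frac{V_{ab}}{R_1} + \frac{V_{ab}}{R_2} + \frac{V_{ab}}{R_3} = V_{ab} \left( \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3} \right) \qquad (5.71) $$
The equivalent total resistance is the one that carries this same total current at the same
voltage:
$$ I_{tot} = \frac{V_{ab}}{R_{tot}} \qquad (5.72) $$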
and when we equate these two forms and cancel the common Vab we get:
$$ \frac{1}{R_{tot}} = \frac{1}{R_1} + \frac{1}{R_2} + \frac{1}{R_3} \qquad (5.73) $$
There is nothing special about three resistors, and once again we can easily generalize
this argument to N resistors as:
$$ \frac{1}{R_{tot}} = \frac{1}{R_1} + \frac{1}{R_2} + \ldots + \frac{1}{R_N} = \sum_{i=1}^{N} \frac{1}{R_i} \qquad (5.74) $$
We conclude that the total resistance of several resistors in series is the simple sum of
the individual resistances, while the reciprocal of the total resistance of several resistors
in parallel is the sum of the reciprocals of the individual resistances. This is the exact
opposite of the rules for summing capacitances in series and parallel.
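The two addition rules translate directly into a pair of small helper functions. The sketch below is illustrative only: the names `series` and `parallel` and the resistor values are assumptions, not anything defined in this text.

```python
def series(*resistances):
    """Total resistance of resistors in series: the simple sum."""
    return sum(resistances)

def parallel(*resistances):
    """Total resistance of resistors in parallel: reciprocal of the summed reciprocals."""
    return 1.0 / sum(1.0 / r for r in resistances)

# Hypothetical values in ohms, chosen only for illustration.
R1, R2, R3 = 2.0, 3.0, 6.0

print("series:  ", series(R1, R2, R3))      # 2 + 3 + 6 = 11 ohms
print("parallel:", parallel(R1, R2, R3))    # close to 1 ohm, since 1/2 + 1/3 + 1/6 = 1

# The helpers nest, so mixed networks can be reduced step by step:
print("R1 in series with (R2 parallel R3):", series(R1, parallel(R2, R3)))
```

Because the functions nest, a network made only of series and parallel pieces can be reduced from the inside out, which is the strategy used by hand in the homework problems.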
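Figure 5.8: (a) A single “generic” circuit loop; (b) A single “generic” circuit junction.
(In the figure the loop elements are labelled V1 through V5 , with V2 = IR2 , V3 = Q/C3 and
V5 = IR5 , beginning from the point marked “start”; the junction currents are labelled I1
through I4 .)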
In the previous sections we used two rules implicitly that we should make explicit so that
we can use them in the more complicated circuits we will study over the next few weeks.
In studying series capacitors and series resistors, we used the idea that we could add
the changes in voltage across objects in a common wire carrying a steady state current
(including no current at all) to find the voltage changes between any two points in the wire.
This is an idea related to energy conservation. In studying parallel capacitors and
parallel resistors, we used the idea that the total charge moving around in these circuits
must be conserved to track its distribution over time whether or not it is actually moving.
These two rules (which we will derive and discuss below) are known as Kirchhoff’s
Rules77 .
77
Wikipedia: http://www.wikipedia.org/wiki/Kirchhoff’s Circuit Laws.
Consider the generic circuit loop in figure 5.8 (a) above. The particular devices in this loop
are not too important – I drew a fairly arbitrary mix of the three devices we are aware of so
far, but later we will learn about still more devices we might want to put into a circuit to do
some startlingly useful things.
Let us imagine that we watch a charge +q moving around this circuit loop in the direction
of the current beginning at the (arbitrary) point “start”. As it goes across each potential
V1 , V2 , ... the energy of the charge goes up, goes down, goes up, goes down. By the time
it gets back to the start position, its potential energy has changed by:
$$ \Delta U = qV_1 + qV_2 + qV_3 + qV_4 + qV_5 = q \sum_i V_i \qquad (5.75) $$
If ∆U ≠ 0, then the charge gets back to its starting point with a different energy than the
one it started with! Its kinetic energy will have changed!
However this is almost impossible. Electrons in particular, as fermions, are nearly
completely incompressible in a wire. This means that the current in any line segment is the
same at all points in the segment. Changes in the electric field that produces the current
at all points in the conductor propagate nearly instantaneously throughout the entire loop,
because the speed of light is very large compared to the size of the loop. As potentials
across the elements in the circuit vary, the current adjusts almost instantaneously. Con-
sequently within a very tiny margin associated with this propagation time, the net energy
gain or loss of a charge in a pass around the circuit loop must be zero!
This means that:
$$ \sum_i^{\text{loop}} V_i = 0 \qquad (5.76) $$
is a simple statement of energy conservation for the charges as they progress around the
loop. This equation is known as Kirchhoff’s Loop Rule, and we will use it repeatedly to write
down equations that lead to equations of motion for dynamical circuit loops or conditions
that must be satisfied for loops that carry steady state currents.
Consider the generic circuit junction in figure 5.8 (b) above. Again it doesn’t matter much
what devices are on any of the legs. Charge is conserved – it is neither created nor
destroyed. The junction itself cannot act as a reservoir for charge – it has negligible capac-
itance because it is part of a continuous volume of (presumed perfect) conductor that can
conduct any charge surplus of the (incompressible) charge away as rapidly as it develops.
This means that all the charge going into the junction has to go out of the junction along
the various wires that join together at the junction. This rule can be written, and thought of,
in two different ways:
I1,in − I2,out − I3,out − I4,out = 0 (5.77)
with the convention that current going into the junction is positive and current coming out
of the junction is negative. Alternatively, you can sort out the currents coming in and the
and call it Kirchhoff’s Junction Rule (using the ± convention). Remember, the junction rule
is just the symbolic expression of charge conservation, just as the loop rule is the symbolic
expression of energy conservation.
Just for grins, let’s put both rules down side by side, the way you should probably
remember them:
$$ \sum_i^{\text{loop}} V_i = 0 \quad \text{Loop Rule} \qquad (5.80) $$
$$ \sum_i^{\text{junction}} I_i = 0 \quad \text{Junction Rule} \qquad (5.81) $$
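Figure 5.9: A non-ideal battery in a circuit with a resistive load. (In the figure the battery
consists of the internal voltage V0 in series with the internal resistance r; the terminal
voltage Vterminal appears across the load R.)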
Previously, we indicated that any real battery (or electrical power supply, not necessarily
a battery) is incapable of doing an infinite amount of work or delivering an infinite amount of
power. If you put a load of any sort on a real battery, you can increase the load (the power
draw) up to a point, but if you try to draw more power than the battery can deliver, the net
power delivered to the load will actually decrease.
This is actually a key principle of electrical design, where one often wishes to deliver
the maximum power to some load – a speaker, a radio antenna, even a light bulb – that
the power supply will support.
If one short-circuits a battery – connects a very low/zero resistance across its termi-
nals – then the battery will usually deliver its maximum power, and its maximum possible
current. A very simple, but quite accurate, model for this limiting is indicated in figure
5.9 above. In it, a hypothetical chemical battery is represented as the two circuit elements
inside the dotted box. One is the actual chemical potential generated by the chemical re-
action. This is called the internal voltage of the battery78 . As we shall see, this is also the
voltage between the terminals of the battery when there is no load, if the chemical process
has not exhausted the reactants (if you like, the “fuel” of the battery). In addition to this,
the battery is considered to have an internal resistance r that limits the current the battery
can deliver even when completely short circuited.
We are now (with both Kirchhoff’s rules and/or series resistances in hand) well capable
of understanding how all of this works. Kirchhoff’s rule for the circuit loop is:
V0 − Ir − IR = V0 − I(r + R) = 0 (5.82)
or:
$$ I = \frac{V_0}{r + R} \qquad (5.83) $$
The terminal voltage is defined to be Vt = V0 − Ir, the voltage between the physical
terminals of the battery when it is delivering any given current I. If R = ∞, I = 0 and
Vt = V0 as indicated above. If R = 0 (the battery is “short circuited” when a zero resistance
is connected across the terminals) we find that:
$$ I_{max} = \frac{V_0}{r} \qquad (5.84) $$
In practical terms, the internal voltage is usually known, fixed by the chemistry of the bat-
tery, and one can measure the internal resistance indirectly by short-circuiting the battery
while measuring the delivered current. As batteries are discharged (or as rechargeable batteries
age) this internal resistance increases until their terminal voltage effectively drops to
zero if a load of any sort is connected across the terminals.
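A short numerical sketch of this battery model may help. The function name `battery_circuit` and the cell parameters (1.5 V, 0.5 Ω) are hypothetical, chosen only to show how the terminal voltage sags as the load resistance drops.

```python
def battery_circuit(V0, r, R):
    """Current and terminal voltage for internal voltage V0, internal
    resistance r and load resistance R (SI units)."""
    I = V0 / (r + R)     # loop rule: V0 - I r - I R = 0
    Vt = V0 - I * r      # terminal voltage under load
    return I, Vt

V0, r = 1.5, 0.5   # hypothetical cell: 1.5 V with 0.5 ohm internal resistance

for R in (0.0, 0.5, 5.0, 500.0):
    I, Vt = battery_circuit(V0, r, R)
    print(f"R = {R:7.1f} ohm   I = {I:6.3f} A   Vt = {Vt:5.3f} V")
```

At R = 0 the current is V0/r and the terminal voltage is zero; for R much larger than r the terminal voltage approaches V0, as stated above.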
Note well that we can easily compute the power delivered to the internal resistance (the
battery itself, generally heating up the battery with its internal Joule heating) versus its load
resistance R:
$$ P_r = I^2 r = V_0^2 \frac{r}{(r + R)^2} \qquad (5.85) $$
and
$$ P_R = I^2 R = V_0^2 \frac{R}{(r + R)^2} \qquad (5.86) $$
These two add up to the total power provided to the circuit by the internal voltage/energy
source, as they must.
It is an instructive exercise to demonstrate that the power delivered to the load is a
maximum when r = R, when the load resistance matches the internal resistance of the
power supply. This is called impedance matching – impedance is a sort of generalized
78
This was historically called the “electromotive force”, or “EMF” of the battery, and it is still often represented
as E in physics textbooks and called the EMF. I find it difficult to call or label something that is clearly a voltage
a force, even by obscure inheritance. This is doubly so for chemistry, where the actual motivation is caused
by the discrete quantum energy changes between the reactants and the products and where it is a lot of work
to even define a good quantum analog of “force” at all. I therefore rebel in my own small way and just call a
voltage a voltage and differentiate only with modifiers.
resistance that we will study in more detail in the chapter on AC circuits, but in the case of
DC circuits it is equal to ordinary resistance. Impedance matching is an essential part of the
engineering of things like earphones or speakers, where one limits the power deliverable
to the load by any given amplifier.
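The claim that the load power peaks at R = r can be checked numerically before it is proved with calculus. The following is a rough sketch under assumed values (a 9 V supply with r = 2 Ω); the helper names are hypothetical.

```python
def load_power(V0, r, R):
    """Power delivered to the load: P_R = V0^2 R / (r + R)^2."""
    return V0**2 * R / (r + R)**2

def internal_power(V0, r, R):
    """Power dissipated inside the supply: P_r = V0^2 r / (r + R)^2."""
    return V0**2 * r / (r + R)**2

V0, r = 9.0, 2.0   # hypothetical supply

# Scan load resistances from 0.01 r to 10 r in steps of 0.01 r.
loads = [0.01 * k * r for k in range(1, 1001)]
best_R = max(loads, key=lambda R: load_power(V0, r, R))

print(f"load power is largest near R = {best_R:.2f} ohm (r = {r:.2f} ohm)")
print(f"P_R at R = r: {load_power(V0, r, r):.4f} W; V0^2/(4r) = {V0**2 / (4 * r):.4f} W")
```

The scan picks out R = r, where the load receives V0²/(4r). Note that at this point exactly as much power is burned in the internal resistance as is delivered to the load.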
Although we will have many opportunities to use Kirchhoff’s Rules in the chapters to come,
it is worthwhile to apply them to an archetypical problem where it is necessary to use both rules
to determine the currents in a multiloop circuit with resistors and batteries. The problem
doesn’t have to be particularly difficult, but it does need to illustrate all of the steps required
to solve problems of this type, as well as some of the caveats – places where things one
might try don’t advance you towards the solution.
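Figure 5.10: Use Kirchhoff’s Rules to find the three unknown currents: I1 , I2 , I3 . (The circuit
elements labelled in the figure are two 3 Ω resistors, one 8 Ω resistor, a 9 V battery and a
3 V battery.)
Figure 5.11: Note the loops and current directions identified on the figure. (The same circuit
as figure 5.10, with loops 1, 2 and 3 marked.)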
Recall that the potential decreases when we go across a resistor in the direction of the
current. We do not write the equation for the bottom junction because it is just -1 times the
top junction equation and hence not independent.
We immediately notice that there is a wee problem – we have four equations and only
three unknowns! This means that our equations cannot all be independent. If you examine
the first three equations, a moment of reflection should convince you that the third equation
(for loop 3) is the equation for loop 2 minus the equation for loop 1. This is characteristic
of multiloop problems – the sum or difference of interior loops always adds up to exterior
loops as the inner/shared voltages cancel.
This is very important to remember when we solve the simultaneous equations – adding
loop equations to eliminate variables does not make progress towards the solution! It just
gives you another loop equation. In order to make progress, you must use the junction
equation(s) and a subset of the loop equations. Let’s dump the equation for loop 3 and
keep only the three we need to solve the problem. With a bit of rearrangement, we get:
There are many ways to proceed to find a solution to this linear system. One can
line up the I’s, form a matrix equation, and invert the matrix using more or less standard
determinants and linear algebra. One can line up the I’s and do Gauss elimination (being
careful to use the junction rule before the loop rules) followed by back substitution. Or in
the case of systems as simple as this one, one can just use substitution to eliminate one of
the currents using the junction equation, then eliminate one of the two remaining currents
(followed by back substitution), a sort of sloppy Gauss elimination. Being a sloppy kind
of guy (and not wanting to teach a course in linear algebra on top of everything else) I’m
going to illustrate the solution of this problem with this latter approach, but if you are down
with using Cramer’s Rule (the fancy name for the first approach) so am I.
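or
$$ 57 I_1 = 75 \qquad (5.100) $$
or
$$ I_1 = \frac{75}{57} = 1.316 \qquad (5.101) $$
(in amperes). We substitute this back into:
$$ 11 \cdot \frac{75}{57} + 8 I_2 = 9 \qquad (5.102) $$
so
$$ I_2 = \left( 9 - 11 \cdot \frac{75}{57} \right) / 8 = -0.684 \qquad (5.103) $$
Finally
$$ I_3 = I_1 + I_2 = 1.316 - 0.684 = 0.632 \qquad (5.104) $$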
Note well that I2 comes out negative – this simply means that we guessed its direction
incorrectly in our original decoration of the figure. The second battery is actually being
charged as the first one discharges. This is (as you can see from the numbers) about as
nasty a problem of this sort as you are likely to see. Usually problems like this on a quiz or
exam will have voltages and resistances that are chosen to give rational answers that one
can work out without needing a calculator.
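For anyone who would rather let a computer do the arithmetic, here is a small sketch using Cramer’s Rule with exact fractions. The first equation, 11 I1 + 8 I2 = 9, is the one quoted above. The second, 8 I1 + 11 I2 = 3, is an assumption: it is my reading of the other interior loop from the figure labels (3 V battery, 3 Ω and 8 Ω resistors, with I3 = I1 + I2 substituted), and it does reproduce 57 I1 = 75. The function name `solve_2x2` is hypothetical.

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11 x + a12 y = b1 and a21 x + a22 y = b2 by Cramer's Rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two equations are not independent")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y

# Loop 1 with I3 = I1 + I2 substituted:            11 I1 +  8 I2 = 9
# Loop 2 (assumed from the figure, see lead-in):    8 I1 + 11 I2 = 3
I1, I2 = solve_2x2(11, 8, 9, 8, 11, 3)
I3 = I1 + I2   # junction rule

for name, value in (("I1", I1), ("I2", I2), ("I3", I3)):
    print(f"{name} = {value} A = {float(value):+.3f} A")

# Check both (assumed) loop equations in their original, unsubstituted form.
print("loop 1 residual:", 9 - 3 * I1 - 8 * I3)
print("loop 2 residual:", 3 - 3 * I2 - 8 * I3)
```

Working in exact fractions avoids rounding questions, and the zero residuals confirm that the currents satisfy both loop equations as well as the junction rule.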
5.5: RC Circuits
So far everything we have done with charges and currents has been static. True, we
have studied flowing currents but those currents have been constant in time, as have
all potential differences. We now illustrate the use of Kirchhoff’s Loop Rule to obtain an
equation of motion for the charging or discharging of a capacitor through a resistance. We
begin with a discharging capacitor, as the slightly easier problem.
Figure 5.12: The capacitor C is initially charged to Q0 . At t = 0 the switch is closed and it
discharges through the resistor, building up a current I(t).
The capacitor in figure 5.12 is initially charged to Q0 . At t = 0, the switch is closed and
charge begins to flow off of the capacitor and is driven through the resistor, so that at time
t there is a charge Q(t) left on the capacitor and a current I(t) in the circuit. Our goal
is to basically understand everything about this problem. We want to know I(t), Q(t),
VC (t), VR (t), the power PC (t) delivered by the capacitor, the power PR (t) consumed by the
resistor, and a full understanding of energy as a function of time in the circuit.
To find all of this, we begin by writing Kirchhoff’s Loop Rule for the loop above (going
clockwise around the circuit in the direction of the current), at some time t after the switch
is closed:
$$ \frac{Q}{C} - IR = 0 \qquad (5.105) $$
The current and charge are not independent. The current is, in fact, the rate at which the
charge on the capacitor decreases:
$$ I = -\frac{dQ}{dt} \qquad (5.106) $$
If we substitute this relation into Kirchhoff’s loop rule, divide by R, and rearrange, we
get the following equation of motion for Q:
$$ \frac{dQ}{dt} + \frac{Q}{RC} = 0 \qquad (5.107) $$
This is a first order, linear, homogeneous, ordinary differential equation, in fact the equation
for exponential decay. It can easily be solved by direct integration. First rearrange the
equation:
$$ \frac{dQ}{dt} = -\frac{Q}{RC} \qquad (5.108) $$
Multiply through by dt, divide through by Q, to get:
$$ \frac{dQ}{Q} = -\frac{dt}{RC} \qquad (5.109) $$
Integrate both sides (indefinite integrals, collecting the constant of integration A on the right):
$$ \ln(Q) = -\frac{t}{RC} + A \qquad (5.110) $$
From this we can easily find the other quantities mentioned above:
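$$ I(t) = -\frac{dQ}{dt} = \frac{Q_0}{RC} e^{-t/RC} \qquad (5.113) $$
$$ V_C(t) = \frac{Q}{C} = \frac{Q_0}{C} e^{-t/RC} = V_0 e^{-t/RC} \qquad (5.114) $$
$$ V_R(t) = -I(t) R = -\frac{Q_0}{C} e^{-t/RC} \qquad (5.115) $$
$$ P_C(t) = V_C(t) I(t) = \frac{Q_0}{C} e^{-t/RC} \, \frac{Q_0}{RC} e^{-t/RC} = \frac{Q_0^2}{RC^2} e^{-2t/RC} \qquad (5.116) $$
$$ P_R(t) = V_R(t) I(t) = -\frac{Q_0}{C} e^{-t/RC} \, \frac{Q_0}{RC} e^{-t/RC} = -\frac{Q_0^2}{RC^2} e^{-2t/RC} \qquad (5.117) $$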
Note well that the power delivered to (+) the circuit by the capacitor equals the power used
by (-) the resistor!
The final little piece of magic we can look for is energy balance. Suppose we wait a
very long (“infinite”) time – we expect the charge on the capacitor to go to zero in that time.
How much energy appears in the resistor during that entire period?
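$$ \begin{aligned}
U_R &= \left| \int_0^\infty P_R(t)\, dt \right| \\
    &= \frac{Q_0^2}{RC^2} \int_0^\infty e^{-2t/RC}\, dt \\
    &= -\frac{Q_0^2}{2C} \int_0^\infty e^{-2t/RC} \left( \frac{-2\, dt}{RC} \right) \\
    &= -\frac{Q_0^2}{2C} \, e^{-2t/RC} \Big|_0^\infty \\
    &= \frac{Q_0^2}{2C}
\end{aligned} \qquad (5.119) $$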
which just happens to be the total energy initially on the capacitor:
$$ U_C = \frac{1}{2} \frac{Q_0^2}{C} \qquad (5.121) $$
The argument of an exponential (or any transcendental function) has to be dimension-
less, so the units of RC must be a time, the so-called exponential decay time for the circuit:
τ = RC (5.122)
This is an important quantity to keep in mind when working with RC circuits, as it provides
an instant estimate for how long it will take for the charge on the capacitor to decay.
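The energy balance for the discharging capacitor can also be verified numerically. The sketch below uses hypothetical component values (1 kΩ, 1 μF, 10 μC) and a plain trapezoid-rule integral out to 20 time constants as a stand-in for the “infinite” time; the function name `discharge` is an assumption.

```python
import math

def discharge(t, Q0, R, C):
    """Charge, current and dissipated power (magnitude) at time t."""
    tau = R * C
    Q = Q0 * math.exp(-t / tau)
    I = Q / tau          # I = -dQ/dt
    P = I * I * R        # magnitude of the power burned in the resistor
    return Q, I, P

# Hypothetical values: 1 kilo-ohm, 1 microfarad, 10 microcoulombs.
R, C, Q0 = 1.0e3, 1.0e-6, 1.0e-5
tau = R * C

# Trapezoid-rule integral of P(t) from t = 0 to t = 20 tau.
n = 20000
dt = 20 * tau / n
powers = [discharge(k * dt, Q0, R, C)[2] for k in range(n + 1)]
U_R = dt * (sum(powers) - 0.5 * (powers[0] + powers[-1]))

U_C = Q0**2 / (2 * C)    # energy initially stored on the capacitor

print(f"tau            = {tau:.3e} s")
print(f"Q(tau) / Q0    = {discharge(tau, Q0, R, C)[0] / Q0:.4f}   (1/e = {math.exp(-1):.4f})")
print(f"energy in R    = {U_R:.6e} J")
print(f"initial energy = {U_C:.6e} J")
```

The two energies agree to within the accuracy of the numerical integral, and after one time constant the charge has fallen to 1/e of its initial value.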
Figure 5.13: An initially uncharged capacitor being charged through a resistor by a battery
with a fixed voltage V0 .
In figure 5.13 we have added a battery and changed the initial condition to Q(0) = 0,
an initially uncharged capacitor. The solution to the problem proceeds almost identically to
the discharging case. From Kirchhoff’s loop rule:
$$ V_0 - \frac{Q}{C} - IR = 0 \qquad (5.123) $$
The current is now the rate at which the charge on the capacitor increases:
$$ I = +\frac{dQ}{dt} \qquad (5.124) $$
Now, pay attention for a second, as it took me years of solving this inefficiently before I
finally figured out how to do the algebra efficiently, and I’m going to share a little trick with
you that will help you get the right answer for this equation (which occurs over and over
again in physics, both last semester and this): Before multiplying out and trying to integrate
factor the coefficient of Q out of the entire left hand side!:
$$ \frac{dQ}{dt} = -\frac{1}{RC} (Q - C V_0) \qquad (5.127) $$
and set the constant of integration B = e^A (the exponential of an unknown constant is still
an unknown constant79 ) from the initial conditions, so that Q(0) = 0. Our final answer is:
$$ Q(t) = C V_0 \left( 1 - e^{-t/RC} \right) \qquad (5.132) $$
It is left as an exercise to evaluate the same list of quantities that we did for the dis-
charging capacitor: I(t), VC (t), VR (t), PC (t), PR (t). To this we add PV (t), the total power
provided to the circuit by the voltage, and suggest that you demonstrate that as t → ∞
the total energy provided to the circuit by the voltage equals the total energy stored in the
capacitor in the end plus the total energy burned in the resistor. Note well that because
our solution was based on Kirchhoff’s loop rule, which is the constraint that work-energy
be satisfied, it should come as no surprise that in the end energy conservation is precisely
embodied in the full integrated solution we obtain.
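If you want a numerical check on your answer to this exercise, the following sketch may help. It assumes I(t) = (V0/R) e^{−t/RC}, which follows from differentiating Q(t) above, and uses hypothetical component values; the function names are illustrative, and integrating out to 20 time constants stands in for t → ∞.

```python
import math

def charging(t, V0, R, C):
    """Charge on the capacitor and loop current at time t while charging."""
    tau = R * C
    Q = C * V0 * (1.0 - math.exp(-t / tau))
    I = (V0 / R) * math.exp(-t / tau)    # I = +dQ/dt
    return Q, I

def trapezoid(values, dt):
    """Trapezoid-rule integral of equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))

# Hypothetical values: 5 V battery, 2 kilo-ohm resistor, 1 microfarad capacitor.
V0, R, C = 5.0, 2.0e3, 1.0e-6
tau = R * C
n = 20000
t_end = 20 * tau
dt = t_end / n

currents = [charging(k * dt, V0, R, C)[1] for k in range(n + 1)]
U_battery = trapezoid([V0 * I for I in currents], dt)       # energy supplied
U_resistor = trapezoid([I * I * R for I in currents], dt)   # energy burned in R
Q_end, _ = charging(t_end, V0, R, C)
U_capacitor = Q_end**2 / (2 * C)                            # energy stored on C

print(f"supplied by battery : {U_battery:.6e} J")
print(f"burned in resistor  : {U_resistor:.6e} J")
print(f"stored on capacitor : {U_capacitor:.6e} J")
print(f"resistor + capacitor: {U_resistor + U_capacitor:.6e} J")
```

In this sketch half of the energy supplied by the battery ends up on the capacitor and half is burned in the resistor, whatever values of R and C are chosen.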
Yet to me, it always does. There is something amazing, almost magical, in the way that
energy conservation works out in the equations of electromagnetism, given the complexity,
the structure, the detail we see in the many different problems we work throughout the
semester and beyond (as electromagnetism is a major foundation of our understanding of
everything, in both classical and quantum physics). But it does.
We live in an enormously conservative Universe, where there are, quite rigorously, no
free lunches, where mass-energy never whimsically appears or disappears, where one
can, with sufficient care, trace out and balance every conserved quantity in any problem
no matter how many bodies are involved or how complex the dynamics of the system.
This concludes our examination of RC circuits and our return to the world of dynamical
equations of motion with nontrivial solutions, in this case exponential solutions (although
we have done our best to keep our hand in with the occasional “discovered” oscillator or
constant acceleration problem on the homework so far). RC circuits are quite important
and occur in nature as well as in most electronic devices, where they are often used for
timing purposes or where RC exponential charging or discharging behavior is an artifact of
the circuit design that “softens” the edges of sudden square-wave-like transitions in voltage
as they propagate into a circuit leg with nonzero resistance and capacitance.
The most important place that they occur in nature is probably inside the brain. The
nervous system is decently modelled by neurons as tiny bioelectrical batteries that charge
79
At your convenience, meditate upon the units implicit in this constant and figure out how they make it
through the process above, where certain things have to be dimensionless and others do not...
up capacitance across a membrane with variable resistance, a resistance that goes from
very high to very low “suddenly” as the membrane depolarizes and channels open that
permit the transport of e.g. sodium ions. As such there is a “rise time” required to charge
up a neuron to where it can fire, followed by a sudden exponential drop in charge across
the membrane when it does fire to create an electrical pulse capable of triggering the next
neuron(s) down the network. From nothing but this we can deduce a number of important
properties of biological neural networks: They have a maximum firing rate (consider the
charging/discharging curves, where one has to exceed some threshold in order to be able
to trigger downstream neurons upon depolarization). They consume energy, as all of the
teensy biological batteries that charge them up deliver power to the circuit – the human
brain, for example, consumes around 1/4 of the metabolic energy used by the entire hu-
man body, some 25 watts (out of 100 watts total). Neurotoxins such as tetrodotoxin80
which block the sodium channel effectively freeze the otherwise variable resistance of the
capacitative membrane, locking each neuron in the “charged” state and preventing the trig-
gered discharge that is required for normal operation. Various nervous system disorders
are related to “short circuiting” this network (by e.g. altering the resistance of the myelin
sheaths that protect the axons of the neurons as they transport the current pulse downstream
to the next neural synapse). Other disorders or neurotoxins are associated with the
neurotransmitter-mediated transport across the synaptic gaps themselves.
Basically, one cannot even begin to understand the biology of the nervous system of
any organism without at least a conceptual understanding of batteries, resistances, and
capacitances, and a sound conceptual understanding is always based on having really
gone through the whole thing and worked it all out, in detail, at least one time in your life.
So even if you don’t plan to become a physicist and work on all of this (very cool) stuff for
the rest of your life, pay attention and work hard on it now, because if you do you will reap
the rewards in your work in other disciplines, where you will discover it lurking, time and
again, to confound your understanding if you never worked hard enough to master it now.
This concludes our treatment of electrostatics with our first electrodynamic model. It is
time to move on from the electrostatic field to the next major piece of the electromagnetic
puzzle: The magnetic field.
80
Wikipedia: http://www.wikipedia.org/wiki/tetrodotoxin. Found in pufferfish and blue-ringed octopuses, for the
marine biology crowd.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
[Figure: a network of equal resistors, each of resistance R, with a voltage V applied across it.]
Find the current through each resistor when a voltage V is placed across the resistance
network as shown. Note that all of the resistances R are equal. You’ll basically need to
use the series and parallel rules for adding resistances several times, as well as
Ohm’s Law and Kirchhoff’s junction rule. Hint: You may find it useful to imagine V = 18 volts
and R = 1 ohm. This makes the numbers easy, although it isn’t that difficult to do this just
with algebra.
Problem 3.
5Ω 1Ω
Problem 4.
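[Figure: circuit with a switch S (closed at t = 0), a resistor R, and a capacitor C carrying initial charge Q0.]
a) Write Kirchhoff’s Loop Rule for the circuit at an arbitrary time t after the switch is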
closed. Convert this into a (first order) equation of motion.
b) Integrate the equation of motion to find first QC (t), and then I(t), VC (t) and VR (t).
c) Using your results to b), find the power delivered to the circuit as a function of time
and show that it equals the sum of the power being burned in the resistor plus the
power that is charging the capacitor (verifying energy conservation for this circuit).
Problem 5.
[Figure: circuit with a switch S (closed at t = 0), a resistor R, a capacitor C, and a voltage V0.]
a) Write Kirchhoff’s Loop Rule for the circuit at an arbitrary time t after the switch is
closed. Convert this into a (first order) equation of motion.
b) Integrate the equation of motion to find first QC (t), and then I(t), VC (t) and VR (t).
c) Using your results to b), find the power delivered to the circuit as a function of time
and show that it equals the sum of the power being burned in the resistor plus the
power that is charging the capacitor (verifying energy conservation for this circuit).
Problem 6.
[Figure: two capacitors C1 and C2 connected through a resistor R and a switch S that is closed at t = 0.]
b) Using Kirchhoff’s laws for this arrangement, find the ordinary differential equation
(ODE) for (say) Q1 (t) and thereby the time constant for the equilibration process.
Note that you do NOT have to solve the ODE, just formulate it with dt and some
arrangement of R, C1 , and C2 on the other side.
c) All Students: GUESS what the solution to the ODE looks like, based on your an-
swers to a) and b). To do the latter, try visualizing what Q1 (t) and Q2 (t) will formally
look like – it is just a matter of setting the various constants so that the asymptotic
(final) and initial conditions are correctly represented and the approach to those con-
ditions has the right time dependence.
d) Advanced Students Only: Solve the ODE (it is integrable, although a bit messy) for
Q1 (t) and Q2 (t). It’s probably best to solve for just one of the two, and then use
conservation of charge to find the other, right?
Advanced Problem 7.
Once you get the square lattice, think about other infinite lattices – for example infinite tri-
angular lattices in 2d, or infinite cubic lattices in 3d. Can you just “write down” the total
resistance of these two lattices? Are they the same or different? Why?
III: Magnetostatics
Week 6: Moving Charges and
Magnetic Force
$$\vec{F} = q(\vec{v} \times \vec{B}) \tag{6.1}$$
which we use to define the magnetic field $\vec{B}$ much as we defined the electric field in
terms of the force observed and described by Coulomb’s Law.
For the moment we will ignore just how $\vec{B}$ got there, as we live in a locally uniform
magnetic field due to the Earth all the time and can discover magnetic materials in
nature so natural sources of magnetism are ubiquitous.
$$\vec{F} = q(\vec{v} \times \vec{B}) \implies P = \frac{dW}{dt} = \vec{F} \cdot \vec{v} = q(\vec{v} \times \vec{B}) \cdot \vec{v} = 0$$
is an identity of the cross product, so magnetic forces do no work on non-spinning
charged particles.
– A cyclotron.
– A velocity selector (region of crossed fields).
– Thomson’s apparatus for measuring e/m.
– A mass spectrometer
– The Hall effect (region of crossed fields in a conductor).
$$\vec{m} = N I A \hat{n} \tag{6.5}$$
where N is the number of turns, I is the current, A is the area, and n̂ is the right-
handed normal to the plane of the loop.
$$\vec{\tau} = \vec{m} \times \vec{B} \tag{6.6}$$
$$U = -\vec{m} \cdot \vec{B} \tag{6.7}$$
$$\vec{F} = -\vec{\nabla} U = \vec{\nabla}(\vec{m} \cdot \vec{B}) \tag{6.8}$$
Magnetic dipoles align with the field due to the torque, and then follow the field back
to where it is stronger, just as do electric dipoles. Students have experienced this
with toy magnets and refrigerator magnets from when they were very small – this is
why bar magnets attract one another.
You should be able to compute the magnetic moment of simple current loops, al-
though we’ll get more practice at this in the next chapter/week.
In our discussions of the electrostatic force, we were able to start with a fundamental
experimental result – Coulomb’s Law – and proceed to systematically deduce nearly all of
electrostatics including the more fundamental expression of Coulomb’s Law: Gauss’s Law
for the Electric Field. Coulomb’s Law alone told us both how to create an electric field and
what the force was in terms of the field.
Life is not quite so simple for the magnetostatic field (where the “static” aspect refers
to the field itself, not to the charges moving in or acting as sources of the field). In this
and the next chapter we will learn that moving charges in a magnetic field experience a
force according to a basic experimental rule (given a field) and moving charges in turn
act as sources for a magnetic field (as one can experimentally verify by measuring forces).
However, the original experiments, conducted by Ampere, that demonstrated both together
involved currents and not moving elementary charges.
We, on the other hand, are interested in developing a “microscopic” description of fields
that works for elementary point charges like electrons and quarks and that can be suitably
coarse-grain averaged into continuous distributions of charge and current (using the meth-
ods explored in the first part of the course). This suggests that we start with either force
acting on or field produced by moving point charges and work our way up to Ampere’s
experimental results with current balances, instead of trying to work our way backwards.
For better or worse we will therefore begin with the force exerted by a magnetic field
that we can think of as being defined by this force law, without (yet) worrying about where
the field comes from. In the next chapter (next week), we will explore in great detail the
sources of that field. Do not hesitate, however, to skip forward and backward between the
two chapters as you study, as knowing at least the summary of the next chapter will help
you with this one, just as you will certainly need to not instantly forget this chapter to move
on and learn the next one. Together they ultimately produce a single view of the magnetic
force between two moving charges and how it becomes the magnetic force between two
currents.
With that said, let us proceed directly to the basic relation that experimentally describes
the force exerted by a magnetic field on a charged particle. Note well that this force law
can be more or less directly observed in a Cloud Chamber 81 placed in a magnetic field.
Observations of many tracks (plus various current-based experiments) lead one to
conclude that the force acting on a charged particle with charge q travelling at velocity
$\vec{v}$ in a uniform magnetic field $\vec{B}$ is:
$$\vec{F} = q(\vec{v} \times \vec{B}) \tag{6.9}$$
Ooo! That pesky cross product rears its ugly82 head! Sorry about that, but if you don’t
feel completely comfortable with a cross product yet, it is time to start really working on
it. See the associated mathematical physics documentation linked to this course and start
reviewing the good old right hand rule and the two or three ways available to compute
them.
This law is (as you can see) quite different from the electrostatic rule, and the force
depends on both the magnitude and direction of the velocity of the charge in the magnetic
field, and doesn’t point in the direction of the magnetic field at all! In fact, it points in
the direction perpendicular to the plane determined by the magnetic field and the velocity
81
Wikipedia: http://www.wikipedia.org/wiki/Cloud Chamber. Cloud chambers are actually quite easy to build,
and I have had the building of an operational cloud chamber used for the extra credit/honors project my
students often undertake. They are very cool – literally, as they are often cooled with e.g. dry ice or liquid
nitrogen – and they directly reveal to the eye the tracks of otherwise invisible charged microscopic/elementary
particles from the environment, from radioactive sources, from cosmic rays.
Just something to bear in mind if you are using this text in one of my classes with this third-of-a-letter-grade
option!
82
To introductory level students, at least. Actually, the cross product is amazingly beautiful, an essential part
of a geometric algebra that generalizes the idea of complex variables to higher “grade” (number of complex
dimensions). But to a student, “ugly” in this context is code for more complicated than the ordinary arithmetical
multiplicative product or the scalar inner product between two vectors, and yet essential to learn in order to do
well in the course!
vectors. Cross products are “twisty” beasts, always pointing off at right angles compared
to any of the directions one might expect.
This twistiness, however, doesn’t represent insoluble complexity, and you shouldn’t
throw your hands up in disgust or tremble in fear. As we will see, the motion produced
by the magnetic force acting on a point charge is often quite simple and easy to under-
stand and compute. To see this, we will begin at the beginning and solve for the motion
in the simplest case, motion when the velocity is perpendicular to the (uniform) magnetic
field.
One critical consequence of this form for the magnetic force law is that magnetic forces
do no work on classical moving charged particles! We can easily see this by looking
at the power delivered to a single charged particle by a magnetic field:
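$$\vec{F} = q(\vec{v} \times \vec{B}) \implies P = \frac{dW}{dt} = \vec{F} \cdot \vec{v} = q(\vec{v} \times \vec{B}) \cdot \vec{v} = 0$$
because (recall):
$$(\vec{A} \times \vec{B}) \cdot \vec{A} = 0$$
(for any vectors $\vec{A}$ and $\vec{B}$) is an identity of the cross product. From the superposition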
principle, this must hold even in the coarse-grained limit where electric currents are made
up of many moving charged particles.
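If you would like to see the zero-power result numerically rather than just algebraically, here is a minimal sketch in Python. The helper names (`cross`, `dot`, `magnetic_force`, `magnetic_power`) and the numbers are purely illustrative assumptions, not anything defined elsewhere in this text; it simply evaluates $\vec{F} = q(\vec{v} \times \vec{B})$ and then $P = \vec{F} \cdot \vec{v}$ for one arbitrary choice of vectors.

```python
def cross(a, b):
    """Cross product a x b of two 3-vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Scalar (dot) product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def magnetic_force(q, v, B):
    """Magnetic force F = q (v x B) on a charge q moving with velocity v."""
    return tuple(q * c for c in cross(v, B))


def magnetic_power(q, v, B):
    """Power P = F . v delivered to the charge by the magnetic force."""
    return dot(magnetic_force(q, v, B), v)


if __name__ == "__main__":
    q = 2.0                  # illustrative charge (arbitrary units)
    v = (1.0, 2.0, 3.0)      # illustrative velocity (arbitrary units)
    B = (4.0, 5.0, 6.0)      # illustrative magnetic field (arbitrary units)
    print("F =", magnetic_force(q, v, B))
    print("P =", magnetic_power(q, v, B))
```

Try other vectors: the force changes, but the power stays zero (up to floating point roundoff for non-integer inputs), because the force is always perpendicular to the velocity.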
Note well that when we say never, we mean never, but that the statement is qualified
by that “classical moving charged particles” bit. We will show later that if work is done on
ordinary classical charged particles or currents in an electrodynamics problem, it is being
done by the electric field, not the magnetic field. It may look like the magnetic field is
doing work (and amazingly, that’s how it works out algebraically) but for any arrangement
of classical moving charges with no intrinsic magnetic moments the work is really done by
electric fields instead.
Very shortly I will prove this statement for electric currents made up of the coarse-
grained motion of ordinary charged particles through a conductor in a magnetic field. How-
ever, to avoid misleading you with a statement – however true it is in this classical physics
course – that will get you into trouble when you hit quantum mechanics, I also need to
qualify this statement with a “semi-classical” correction and tell you when magnetic fields
can do (non-classical) work.
Elementary charged particles such as quarks and electrons (and very small composite
particles like protons and neutrons or even atoms and molecules) often have an intrinsic
quantum mechanical magnetic moment that does not arise from the motion of classical
charged particles constrained by e.g. electrostatic or nuclear forces that hold matter to-
gether to remain within the medium the way electric currents are constrained to remain
inside a conductor. Inhomogeneous magnetic fields can indeed do work on these non-
classical point-like “intrinsic” magnetic dipoles! There is direct evidence of this – slow neu-
trons passed through an inhomogeneous magnetic field split into two beams in the neutron
version of the Stern-Gerlach Experiment 83 . Since the neutron is electrically neutral and
83
Wikipedia: http://www.wikipedia.org/wiki/Stern-Gerlach experiment. This is a famous experiment that first
demonstrated the existence of a magnetic moment due to intrinsic quantum spin for first electrons, then protons
and neutrons.
no electric fields are present, it is difficult to ascribe the work associated with this splitting
to an electric field. However, the magnetic field associated with the intrinsic moment of
electrons is also the source of work that pulls a refrigerator magnet towards a refrigerator,
so you don’t have to go far afield to see it proven!
Now let’s move on to look at a variety of examples of using the Lorentz force law above
(or just the magnetic part of it) to describe the motion of classical charged particles. As we
will see in these examples, the magnetic field will never do any work on the particles, but
sometimes electric fields will.
Figure 6.1: A charged particle with velocity perpendicular to a uniform magnetic field moves
in a circle.
In figure 6.1 above, we see a charged particle +q moving with initial velocity $\vec{v}_0$ perpendicular
to a uniform magnetic field $\vec{B}_0$. The little crosses in this figure should be thought of
as the “feather” ends of vector arrows and stand for a vector that points into the page – a
circle with a dot will stand for the “tip” of the arrow and a vector pointing out of the page
should we ever need it.
The force $\vec{F}$ acting on this charge is:
$$\vec{F} = q(\vec{v}_0 \times \vec{B}_0) \tag{6.10}$$
• Does no work. This in turn means that the speed of the particle is unchanged by the
magnetic field.
• Acts to bend the particle’s trajectory into a constant speed circle, with the magnetic
field providing the necessary centripetal force.
That is:
$$F_r = q v_0 B_0 = \frac{m v_0^2}{r} \tag{6.12}$$
We can, of course, solve this equation for any single unknown given the rest of the
variables, but its most common use is to derive the so-called cyclotron frequency for the
circulating particle:
$$\Omega_{\text{cyclotron}} = \frac{v_0}{r} = \frac{q B_0}{m} \tag{6.13}$$
Note well that this angular velocity/frequency does not depend on the speed of the particle!
It is fixed by the charge of the particle, its mass, and the strength of the magnetic field only,
which means that identical particles take the same amount of time to complete a circuit of
their motion independent of their energy or velocity. This is the basis of the design of the
cyclotron, one of the original particle accelerators (still) used to probe the structure of the
atomic nucleus.
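To make the speed independence concrete, the following short Python sketch evaluates equations (6.12) and (6.13) for a proton at several (non-relativistic) speeds. The function names, the rounded values of the proton charge and mass, and the 1 tesla field are illustrative assumptions only.

```python
import math

PROTON_CHARGE = 1.602e-19   # coulombs (rounded)
PROTON_MASS = 1.673e-27     # kilograms (rounded)


def cyclotron_angular_frequency(q, m, B):
    """Equation (6.13): Omega = q B / m, independent of the speed."""
    return q * B / m


def orbit_radius(q, m, B, v):
    """Equation (6.12) solved for r: r = m v / (q B)."""
    return m * v / (q * B)


def orbit_period(q, m, B, v):
    """Time to go once around the circle: circumference / speed."""
    return 2 * math.pi * orbit_radius(q, m, B, v) / v


if __name__ == "__main__":
    B = 1.0  # tesla, illustrative field strength
    omega = cyclotron_angular_frequency(PROTON_CHARGE, PROTON_MASS, B)
    print(f"Omega = {omega:.4e} rad/s")
    for v in (1.0e5, 1.0e6, 1.0e7):  # m/s, all well below the speed of light
        r = orbit_radius(PROTON_CHARGE, PROTON_MASS, B, v)
        T = orbit_period(PROTON_CHARGE, PROTON_MASS, B, v)
        print(f"v = {v:.1e} m/s   r = {r:.4e} m   T = {T:.4e} s")
```

The radius grows in proportion to the speed while the period column does not change, which is exactly the property the cyclotron exploits.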
Figure 6.2: The schematic layout of a cyclotron, showing the proton source, the “Dees”, the
alternating voltage V, the B field (up), and the proton beam. The electric field/potential
difference between the “Dees” of the cyclotron oscillates at the cyclotron frequency
Ωcyclotron of the particles moving in the field, so that it always pushes in the direction
that speeds them up.
In figure 6.2 you can see the general design of a cyclotron. A suitable charged ion,
e.g. a hydrogen nucleus (proton) is produced by e.g. an electrical arc in a source in the
very center of the cyclotron with a low velocity. A powerful magnetic field bends the initial
trajectory into a circular arc in the plane perpendicular to the field.
In between the upper and lower halves of the cyclotron are two copper chambers
shaped like the letter “D”, with a narrow slit in the plane perpendicular to the field cut
along the straight segment in the middle. An alternating electric potential is applied between
these two “Dees” with the same angular frequency as the cyclotron frequency of the particle
being accelerated in the magnetic field in question. When the particle arrives at the gap
between the upper and lower Dee in the figure above, the electric field in the gap happens to
point down (and hence speeds the particle up). By the time the particle gets to the gap between
the lower and the upper Dee on the right, the field has switched direction, so it speeds the
particle up still more. Every time the particle arrives at the gap, it finds the field aligned
with its motion, ready to give it yet another push.
This works because it takes all of the particles the same amount of time to make it
around a half-circle regardless of how fast they are going. So one can have a stream of
particles all falling across the gap at once at different radii from the source (with short gaps
between these “pulses” that are in phase and being accelerated together). As the particle
is moving faster and faster, the radius of the circle of its motion increases until it reaches
an electrostatic deflector plate at the outside edge of the magnetic field that angles it into
a beam pipe where it travels through a vacuum to hit an eventual beam target.
Early cyclotrons played an important role in the development of nuclear physics, per-
mitting the creation and discovery of the first transuranic elements past plutonium (one of
which is named Lawrencium, after the inventor of the cyclotron, another of which is named
Berkelium after the University where Lawrence worked).
Cyclotrons, alas, no longer work when the particles are accelerated enough to be mov-
ing at relativistic velocities. At some point the time dilation of the cyclotron period in the
frame of the moving particle is enough to keep the particle from being accelerated by a Dee
voltage at the cyclotron frequency that worked for a slowly moving particle. One can “fix”
this problem by sweeping the frequency to match and accelerating only pulses of charge (in
a synchrotron) but as one reaches higher and higher energies other problems emerge.
The principal limiting factor is ultimately the fact that accelerated charges radiate, and
particles moving in circles are accelerated all of the time by the centripetal magnetic force.
This causes a kind of “resistance” wherein the work done speeding the particle up in a
cycle is balanced by radiative losses in the cycle. Only the use of very large circles can
minimize the latter, which is why the extreme relativistic accelerators of modern times, such
as the Large Hadron Collider (LHC) are enormous circles, the latter being 27 kilometers in
circumference.
In a nuclear collision, a lot of “stuff” is produced – nucleons knocked out of nuclei, electrons,
positrons, gamma rays, alpha particles, and more exotic particles that help us understand
the nuclear field itself. To be able to categorize and classify all of this “stuff”, it helps to be
able to “see” the trajectory of a particle produced in the collision, and determine things like
the ratio of its charge to its mass. A cloud chamber (and more exotic bubble chambers that
work on a similar principle) is a device that makes a charged particle’s trajectory visible so
that it can be photographed. It works by creating a “supersaturated” gas of e.g. alcohol,
water vapor, or other substances. The charged particle in question zips through the vapor
and causes it to bounce together in its wake, precipitating the vapor out as a condensation
trail, much like the jet contrails one can sometimes see overhead on a clear day. In a cloud
chamber the trajectories typically only last a few seconds before re-evaporating, but that is
long enough to be easily seen and/or photographed for later analysis.
Figure 6.3: The first photograph of a positron ever taken in a cloud chamber. Note the
curvature carefully. Which way is the particle travelling while slowing down? What direction
does the magnetic field in the chamber point?
By putting the chamber in a magnetic field and right next to a nuclear target, the positive
particles curve one way and the negative particles curve the other. The radius of curvature
is related to the charge and mass by:
$$r = \frac{m v_0}{q B_0} \tag{6.14}$$
and can be determined directly from a photographed trajectory.
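As a rough illustration of how equation (6.14) gets used in practice, the sketch below turns a measured radius of curvature into a momentum $m v_0 = q B_0 r$, and back again. The function names and the sample numbers (an electron-mass particle in a 0.01 tesla field) are hypothetical, chosen only to make the arithmetic easy to follow.

```python
ELEMENTARY_CHARGE = 1.602e-19   # coulombs (rounded)
ELECTRON_MASS = 9.109e-31       # kilograms (rounded)


def track_radius(m, v, q, B):
    """Equation (6.14): r = m v / (|q| B) for velocity perpendicular to B."""
    return m * v / (abs(q) * B)


def momentum_from_radius(r, q, B):
    """Equation (6.14) inverted: m v = |q| B r, from a measured radius r."""
    return abs(q) * B * r


if __name__ == "__main__":
    B = 0.01      # tesla, illustrative chamber field
    v = 1.0e7     # m/s, illustrative (non-relativistic) speed
    r = track_radius(ELECTRON_MASS, v, ELEMENTARY_CHARGE, B)
    p = momentum_from_radius(r, ELEMENTARY_CHARGE, B)
    print(f"radius of curvature r = {r:.4e} m")
    print(f"momentum recovered from r: {p:.4e} kg m/s")
    print(f"momentum m*v put in:       {ELECTRON_MASS * v:.4e} kg m/s")
```

Note that the radius alone only gives the product of mass and speed (for an assumed charge); separating the two requires additional information, such as how quickly the curvature changes as the particle slows down.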
At the same time, the particle slows down because the same process that causes
supersaturated gas molecules to precipitate out along its trajectory exerts a “drag” force on
the particle. By looking at the rate the particle’s trajectory curvature changes (and various
other things), one can estimate its momentum, the charge of the particle, and its mass.
Using this and many other specialized detectors, an enormous “zoo” of particles has been
discovered and categorized and transformed into a quantitative model for the nuclear force
that has at least some predictive power, although it is not yet a complete or perfect theory.
A simple cloud chamber is not too difficult to build – it requires a bowl, dry ice, alcohol,
cotton, black paint, a light source, and a few other things, but they are all fairly readily
obtainable. It is therefore a good candidate for an extra credit project, if your program has
one.
Another extremely useful application of magnetic fields acting on individual charged parti-
cles is the region of crossed fields. A region of crossed electric and magnetic fields, when
equipped with suitable collimating slits, can act as a velocity selector.
Figure 6.4: A region of crossed fields (E field down, B field into the page, with collimating
slits at the entrance and exit) functions as a velocity selector; only particles with
just the right velocity pass through undeflected.
A charged particle with charge q enters the device on the left by passing through collimating
slits that ensure that its velocity is in the x-direction only. Inside the device a pair
of parallel plates creates a uniform electric field $\vec{E}$ down, while a magnetic coil creates a
uniform magnetic field $\vec{B}$ into the page as drawn.
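From the right hand rule, the magnetic force on the (positively) charged particle is
$$F_B = qvB \tag{6.15}$$
directed up, opposite to the electric force $F_E = qE$, which is directed down. The two forces
balance when
$$F_B = qvB = qE = F_E \tag{6.17}$$
that is, for a particle moving with speed
$$v = \frac{E}{B} \tag{6.18}$$
in the x-direction. In this case the particle travels through undeflected and exits through
the collimating slit on the right.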
Particles that are travelling too fast, however, have a magnetic force that exceeds the
electric force and are deflected up. They strike the barrier at the far end and fail to pass
through the slit. Similarly particles that are travelling too slowly have an electric force
that exceeds the magnetic force. They are deflected down and fail to make it through the
second slit.
Note well that this is a velocity selector and passes all particles with the right veloc-
ity regardless of their mass or their nonzero charge! The particle can have any charge,
positive or negative (except zero), or any mass – as long as it has the right velocity it will
still make it through undeflected. This makes it very useful for preparing particle beams
for certain kinds of experiments. It is also very closely related to the Hall Effect described
later.
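The selection logic can be sketched in a few lines of Python. Everything named here (`selected_speed`, `net_vertical_force`, and the field values) is an illustrative assumption; the geometry is the one in figure 6.4, with the velocity along +x, $\vec{E}$ down, $\vec{B}$ into the page, and "up" taken as the positive direction. The example uses a positive charge; for a negative charge both forces reverse, so the balance speed is the same.

```python
ELEMENTARY_CHARGE = 1.602e-19   # coulombs (rounded)


def selected_speed(E, B):
    """Equation (6.18): the speed at which the two forces cancel."""
    return E / B


def net_vertical_force(q, v, E, B):
    """Net upward force: magnetic (q v B, up) minus electric (q E, down)."""
    return q * (v * B - E)


if __name__ == "__main__":
    E = 1.0e5   # volts/meter, illustrative
    B = 0.5     # tesla, illustrative
    v_pass = selected_speed(E, B)
    print(f"selected speed v = E/B = {v_pass:.3e} m/s")
    for v in (0.5 * v_pass, v_pass, 1.5 * v_pass):
        F = net_vertical_force(ELEMENTARY_CHARGE, v, E, B)
        if F > 0:
            outcome = "deflected up (too fast)"
        elif F < 0:
            outcome = "deflected down (too slow)"
        else:
            outcome = "passes undeflected"
        print(f"v = {v:.3e} m/s: net force {F:+.3e} N, {outcome}")
```

Notice that the mass never appears: only the speed decides whether a (charged) particle gets through.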
Figure 6.5: The “plum pudding model” that prevailed in 1897 on the left (negative “plums” in a
positive “pudding”), along with a more accurate representation of the current atomic model – a
massive nucleus surrounded by a quantum “pudding” (the electron cloud).
The year is 1897. People know that matter is made up of atoms, that atoms are made up
of positive and negative charge, but the human species still does not know if the positive
and negative charges are themselves particles, and if so, what the charges and masses
of those particles are. There are a variety of models for atoms, most of them “static”
models that have negative and positive charge glued together in some way that keeps the
negative and positive charge from having to orbit one another the way the electrostatic
force suggests that they should, as James Clerk Maxwell has shown that classical atoms
made up of orbiting charged particles would radiate all of their energy away in a very, very
short time and collapse. One of the favorite models is in fact called the “plum pudding
model” portrayed in figure 6.5 (no kidding!) where negative charge is scattered like raisins
in a gooey pudding of positive charge.
The so-called “cathode ray tube” (or Crookes tube) has been around for twenty or
thirty years, and a mere two years earlier a gentleman named Röntgen discovered that
cathode rays hitting the glass of the screen at high enough energies produce x-rays, capable
of penetrating the human hand and forming images of the bones within (see figure 6.6),
for which he received the first Nobel Prize in physics in 1901.
The question is: Just what are cathode rays? Are they particles? Do they have arbitrary
amounts of charge and mass? Are they a fixed fraction of the mass of e.g. a hydrogen
atom? Is the mass of a hydrogen atom split evenly between cathode (negatively charged)
material and anode (positively charged) material? J. J. Thomson set out to try to answer
these questions by using a specially modified Crookes tube to deflect cathode rays in flight
once they were produced at a heated electrical filament and accelerated by an applied
potential difference so that they formed a beam.
Initially the deflection was accomplished only by the application of an electric field in
between special plates built right into the tube, which was sufficient, as we shall see, to
measure the ratio of e/m for cathode ray particles, or electrons (as they turned out to be),
and thereby show that they were a tiny fraction of the total mass of a hydrogen atom, so that
nearly all of the mass was associated with the positive charge only. Later Thomson added
a uniform magnetic field to his apparatus by means of a pair of “Helmholtz coils”. As we
have seen above, magnetic fields can also deflect moving charged particles, and indeed if
a region of crossed fields is created, the $\vec{E}$ and $\vec{B}$ fields together can be used to measure
Figure 6.6: The first “medical x-ray” ever taken, of the bones in Anna Berthe Röntgen’s
hand. She was the wife of Wilhelm Röntgen, the discoverer of x-rays.
the actual velocity of the particles, which permits their kinetic energy and/or mass to be
estimated and the consistency of all of the (many, not too accurate yet) measurements to
be checked.
Figure 6.7: Joseph John Thomson’s apparatus for measuring the ratio of the charge on the
electron to its mass (improved by the addition of a magnetic velocity selector). Labels in the
figure: B, E, e−, v0, θ, ∆y1, ∆y2, ∆V, L, D. This was
a critical experiment in determining the structure of the atom, for which Thomson received
the Nobel Prize (only the sixth such prize awarded in physics).
produced a constant upward acceleration and hence deflection of the electrons
(where we can completely ignore gravity in the experiment as the electrical acceleration
was vastly greater) as they traversed the plate length L. On the far side they emerged from
the field, travelled in a straight line for an x distance of D, and then struck the glass of the
screen, where they made a glowing spot.
By measuring the total upward deflection of the spot, that is, the distance between the center
of the screen (where the electrons struck when the E-field was off) and the point where they
struck when the E-field was turned on to some known value, Thomson could reason backwards to
the ratio of e/m as follows.
First, we analyze the constant acceleration motion of the electron while it is between
the plates:
$$F_x = 0 \tag{6.19}$$
$$F_y = eE = m a_y \tag{6.20}$$
from which we find (using 2D kinematics from the first semester – the problem is identical
to analyzing trajectories with a constant gravitational acceleration):
$$x(t) = v_0 t \tag{6.21}$$
$$v_x(t) = v_0 \tag{6.22}$$
$$y(t) = \frac{1}{2} a_y t^2 = \frac{eE}{2m} t^2 \tag{6.23}$$
$$v_y(t) = a_y t = \frac{eE}{m} t \tag{6.24}$$
We can easily find the time the electron is between the plates:
$$t_1 = \frac{L}{v_0} \tag{6.25}$$
from x(t). Substituting this into the last two equations, we find that as it emerges from
between the plates:
$$\Delta y_1 = \frac{eE}{2m} t_1^2 = \frac{eEL^2}{2 m v_0^2} \tag{6.26}$$
and
$$v_y = \frac{eE}{m} t_1 = \frac{eEL}{m v_0} \tag{6.27}$$
From our knowledge of vx and vy when the particle emerges, we can find:
$$\tan(\theta) = \frac{v_y}{v_x} = \frac{eEL}{m v_0^2} \tag{6.28}$$
$$\Delta y_2 = D \tan(\theta) = \frac{eELD}{m v_0^2} \tag{6.29}$$
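Adding the two deflections gives the total, $y_{tot} = \Delta y_1 + \Delta y_2 = \frac{eEL}{m v_0^2}\left(\frac{L}{2} + D\right)$.
Now we can relate the measured total y deflection to the known values of L, D, E, and our
estimated $v_0$:
$$\frac{e}{m} = \frac{y_{tot} v_0^2}{E L \left(\frac{L}{2} + D\right)} \tag{6.31}$$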
We know everything on the right (where we measure ytot ), so we have measured e/m!
Of course Thomson didn’t really know v0 – he had to estimate it from a mix of thermo-
dynamics and electrostatics in his first experiment. We, however, can see how the addition
of a crossed magnetic field permits him to precisely determine v0 . With the E field turned
on, simply turn up the magnetic field B until the particle’s deflection is once again zero.
At that point, the apparatus is functioning as a velocity selector, and we know from the
argument above that:
$$v_0 = \frac{E}{B} \tag{6.32}$$
This can be substituted into the expression above to obtain a much more accurate estimate
for e/m, one that doesn’t rely on a prior knowledge of the thermal distribution of electron
energies before they are accelerated by the first potential difference:
$$\frac{e}{m} = \frac{y_{tot} \left(\frac{E}{B}\right)^2}{E L \left(\frac{L}{2} + D\right)} = \frac{y_{tot} E}{B^2 L \left(\frac{L}{2} + D\right)} \tag{6.33}$$
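Here is a small self-consistency check of equation (6.33), written as a hedged Python sketch. The apparatus numbers (plate length, drift distance, field strengths) are made up for illustration and are not Thomson’s actual values; the function names are likewise hypothetical. The code first predicts the spot deflection for an assumed e/m using the kinematics above, then recovers e/m from that deflection using only E, B, L and D.

```python
import math


def total_deflection(e_over_m, E, v0, L, D):
    """y_tot = Delta y_1 + Delta y_2 = (e/m) E L (L/2 + D) / v0^2."""
    return e_over_m * E * L * (L / 2 + D) / v0**2


def e_over_m_from_deflection(y_tot, E, B, L, D):
    """Equation (6.33), valid when B has been tuned so that v0 = E/B."""
    return y_tot * E / (B**2 * L * (L / 2 + D))


if __name__ == "__main__":
    E = 5.0e3      # volts/meter, illustrative plate field
    B = 2.5e-4     # tesla, illustrative field that nulls the deflection
    L = 0.05       # meters, illustrative plate length
    D = 0.30       # meters, illustrative drift distance to the screen
    assumed = 1.759e11          # coulombs/kilogram, accepted e/m (rounded)
    v0 = E / B                  # equation (6.32)
    y_tot = total_deflection(assumed, E, v0, L, D)
    recovered = e_over_m_from_deflection(y_tot, E, B, L, D)
    print(f"v0 = {v0:.3e} m/s, y_tot = {y_tot * 100:.2f} cm")
    print(f"recovered e/m = {recovered:.4e} C/kg")
    print("consistent:", math.isclose(recovered, assumed, rel_tol=1e-9))
```

The round trip works for any choice of the apparatus numbers, which is just the statement that (6.33) follows from (6.31) and (6.32).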
by superheated hydrogen gas. However, the alternative was now a return to the classical
orbital model with electrons orbiting protons the same way a planet orbits the sun, in ellip-
tical orbits wherein the electron is constantly accelerating. Maxwell’s equations had long
since proven that such an atom would instantly collapse, radiating away electromagnetic
energy in all frequencies as it did so, not in some subset of discrete frequencies. Thom-
son’s experiment, simple as it is to us today in terms of our modern models and knowledge
of electromagnetism, truly deserved the Nobel Prize because it paved the way in a criti-
cal way for the invention of quantum mechanics and our current understanding of atomic
structure.
Figure 6.8: The Mass Spectrometer uses a region with a uniform magnetic field (B0) to create a
spectrum of particles that collide with a film or other detector matrix in places that indicate
the radius of the circle they are bent in by the field (r1 and r2 for species q1, m1 and q2, m2
accelerated from the “goop” through a potential V0). This, in turn, is related to the ratio of
q/m for the particle, and by assuming a charge that is a low integral multiple of e one can
determine the mass.
q, m combination registers as a distinct signal a distance 2r from its entrance point (where
r is the radius of curvature of the species’ particular trajectory).
The molecular weight of the components of the sample is thus registered two ways.
Typically a “marker” species of known weight and concentration is introduced that permits
the distances from the entrance point to be calibrated and checked against a known mass,
and each particular component is likely to be present in singly ionized form (with charge
e.g. +e), doubly ionized form (with charge +2e), etc. This appears as “similar” patterns
of bands on the film or detector which permits one to tell which pattern corresponds to a
particular charge, for example +e. From this combination it is straightforward to deduce
the charge and infer the mass of the various chemical components visible in the detector
fingerprint.
We can easily understand the physics behind the mass spectrometer. A charged ion of
charge q and mass m produced in the goop boiler is accelerated to a kinetic energy:
$$\frac{1}{2} m v^2 = q V_0 \tag{6.34}$$
in the beam entering the magnetic field. It therefore has a velocity84 :
$$v = \sqrt{\frac{2 q V_0}{m}} \tag{6.35}$$
and experiences a centripetal magnetic force (that causes it to move in a circle of radius r)
of:
$$F_r = q v B_0 = m\frac{v^2}{r} \tag{6.36}$$
so as usual:
$$\frac{v}{r} = \frac{q}{m} B_0 \tag{6.37}$$
If we solve for the radius r of its half-orbit to the film/detector, we get:
$$r = \frac{v}{B_0}\frac{m}{q} \tag{6.38}$$
Substituting for v:
$$r = \frac{m}{B_0 q}\sqrt{\frac{2 q V_0}{m}} = \sqrt{\frac{2 m V_0}{q B_0^2}} = \sqrt{\frac{m}{q}}\,\frac{\sqrt{2 V_0}}{B_0} \tag{6.39}$$
Alternatively, since one measures r and wishes to find m (given a good guess for q):
$$m = q\,\frac{r^2 B_0^2}{2 V_0} \tag{6.40}$$
As one can see, the mass-to-charge ratio determines r, creating similar “bands” of molec-
ular signal for different ionizations of the same collection of constituent masses. Once the
charge on any given band is guessed/determined (where the lowest charge, in positive
multiples of e, will have the largest radius spectral pattern for each set of m’s) one can
transform a knowledge of r and q directly into m.
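To make the band structure concrete, the following Python sketch evaluates (6.39) and (6.40) for one species in two ionization states. The accelerating voltage, field strength, and the 28 u mass are hypothetical values chosen only for illustration, and the function names are my own.

```python
import math

E_CHARGE = 1.602e-19   # C, elementary charge
AMU = 1.661e-27        # kg, atomic mass unit

def radius(m, q, V0, B0):
    """Orbit radius from eq. (6.39): r = sqrt(2 m V0 / q) / B0."""
    return math.sqrt(2.0 * m * V0 / q) / B0

def mass_from_radius(r, q, V0, B0):
    """Mass from a measured radius, eq. (6.40): m = q r^2 B0^2 / (2 V0)."""
    return q * r**2 * B0**2 / (2.0 * V0)

V0, B0 = 1000.0, 0.10   # hypothetical: 1 kV accelerating potential, 0.1 T field
m = 28.0 * AMU          # hypothetical species of mass 28 u

for k in (1, 2):        # singly and doubly ionized
    q = k * E_CHARGE
    r = radius(m, q, V0, B0)
    m_back = mass_from_radius(r, q, V0, B0)
    print(f"q = +{k}e: signal at 2r = {200 * r:.1f} cm, "
          f"recovered mass = {m_back / AMU:.2f} u")
```

The doubly ionized band lands closer to the entrance point by a factor of the square root of two, which is the "largest radius for the lowest charge" rule stated above.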
Most of this process can be automated and computerized, and mass spectrometers
based on this general principle are at this point commonplace in the laboratory.
84 As before in the case of the Thomson apparatus, in reality the “boiler” would produce a Maxwell-Boltzmann range of entering velocities, but we can insert a velocity selector stage to narrow the distribution to “precisely” the desired/expected v.
272 Week 6: Moving Charges and Magnetic Force
The final object of our study of the magnetic force on single charged particles is the Hall
effect, the tendency of a current carrying wire or conducting strip in a magnetic field to
build up a voltage across itself. This voltage arises from spontaneous charge separation in
the conductor, which creates a “region of crossed fields” where the electric force precisely
balances the magnetic force (and simultaneously creates a potential difference).
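[Figure 6.9 diagram: conducting strip with positive charge along its upper edge and negative charge along its lower edge; drift velocity vd and electric field E marked.]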
Figure 6.9: In the Hall Effect, a magnetic field causes the mobile charge to accumulate on
the upper or lower edge of a conducting, current-carrying strip in a magnetic field. This in
turn creates a potential difference across the strip that can easily be measured.
The Hall Effect is a phenomenon that spontaneously occurs when a conductor carrying
a current is placed in a magnetic field that is perpendicular to the current. The effect is
easiest to observe in a ribbon shaped conductor that is relatively wide; one such is pictured
above with width w (top to bottom) and cross-sectional area A.
The Hall Effect can be used to make two very important classical measurements. First,
as we will easily see, we can finally determine the sign of the charge carriers in any given
material, as positive charge carriers (the particles that are physically moving to create the
current) will actually polarize the strip the opposite way than negative ones. Second, it
enables us to directly measure n, the density of charge carriers in our basic model of
conduction.
Here’s how it works. The strip is placed into a magnetic field perpendicular to the strip
as shown and a current is run through it. In figure 6.9 above, we assume positive
charge carriers as usual so that the current is in the same direction as the drift velocity of
the carriers, from left to right.
At first these moving charges experience a magnetic force that (right hand rule!) diverts
them into a curved trajectory to the left as indicated by the dashed arrow on one of the
charges. However, charges near the top have nowhere to go and build up in a layer on the
upper surface of the strip. This charge layer creates an electric field that begins to oppose
the motion of still more charge until after a bit, the strip has equal and opposite amounts of
positive (upper) and negative (lower) charge on the top and bottom edges, the latter in the
form of “holes” left behind where the positive charge carriers migrated away.
The charges now move in a spontaneous region of crossed fields – the carriers in the
middle move in zero net force with the electric force down equal to the magnetic force
up. This, in turn, creates an electrical potential difference V across the strip that can be
measured with a voltmeter, at the same time that the current through the strip I is measured
with an ammeter.
We know that for each charge, when this situation is established:
$$q v_d B = q E \tag{6.41}$$
or
$$v_d = \frac{E}{B} \tag{6.42}$$
We also know that:
$$I = n q v_d A = n q A \frac{E}{B} \tag{6.43}$$
Finally, we know that
$$V = E w \tag{6.44}$$
or
$$E = \frac{V}{w} \tag{6.45}$$
so that
$$I = n q A \frac{V}{B w} \tag{6.46}$$
We can then solve for n, the desired density of charge carriers:
$$n = \frac{I B w}{q A V} \tag{6.47}$$
One can measure I and V directly. B one can compute (although the Hall effect is actually
often used to measure B, as one can obviously turn this equation around and solve for B
with a strip made from a material with known n). w and A can be directly measured with a
ruler.
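A short Python sketch of (6.47) and its inversion for B follows. The strip dimensions, current, field, and voltage reading are hypothetical numbers picked so that n comes out at a typical metallic order of magnitude. They are not measurements, and the function names are assumptions of mine.

```python
E_CHARGE = 1.602e-19   # C, magnitude of the carrier charge

def carrier_density(I, B, w, A, V, q=E_CHARGE):
    """Eq. (6.47): n = I B w / (q A V), everything in SI units."""
    return I * B * w / (q * A * V)

def field_from_hall_voltage(V, I, n, w, A, q=E_CHARGE):
    """The same relation solved for B, as used in a Hall probe."""
    return n * q * A * V / (I * w)

# Hypothetical ribbon: 1 cm wide, 0.1 mm thick.
w = 1.0e-2            # m, width (top to bottom in figure 6.9)
thickness = 1.0e-4    # m
A = w * thickness     # m^2, cross-sectional area

I = 1.0               # A, measured with the ammeter
B = 0.5               # T, applied field
V = 3.7e-7            # V, hypothetical Hall voltage reading

n = carrier_density(I, B, w, A, V)
print(f"n = {n:.2e} carriers per cubic metre")
print(f"B recovered from V: {field_from_hall_voltage(V, I, n, w, A):.3f} T")
```

The tiny voltage (a fraction of a microvolt) for a good conductor is the practical reason Hall probes are usually made from materials with a much smaller n.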
Best of all, we can finally see that the charge carriers in most metals are electrons, that
is, they are negative. Suppose that the carriers in the picture above were electrons and
negative. Then with a current travelling to the right, they would actually be moving to the
left. The magnetic field would then still divert them up, creating a negative strip of charge
on the upper edge of the strip and a positive one on the lower. The electric field – for the
same left-to-right current – would run from the bottom to the top when the desired region
of crossed fields established itself. This would make the top of the strip at a lower potential
than the bottom, the opposite of what one gets with a positive charge carrier.
Franklin’s Mistake is thus finally laid bare. Alas, the mobile charge in most conductors is
made up of negatively charged electrons, the “cathode ray” particles discovered by Thom-
son. This is not always the case, of course. Ionic fluid solutions (like salt water) can have
currents in which both charge carriers are present. Also, in semiconductors the carriers
can easily be quantum mechanical “holes” in the electron density that have an effective
positive sign.
As we can see, the magnetic force on discrete particles is a very useful thing! This by
no means exhausts the utility of magnetic fields for bending streams of charged particles
around to make them do our bidding.
In the last example, though, we went from a picture of single charges to one where
we were working with the coarse-grained continuum limit of a charged current once again.
Perhaps it is time to think about the magnetic force on current carrying wires!
If we contemplate our (by now) standard model for current in a uniform wire, where the
current I is given by:
$$I = n q v_d A = \int \vec{J}\cdot\hat{n}\,dA \tag{6.48}$$
(where, recall, n is the density of charge carriers, q is the charge per carrier, vd is the “drift
velocity” – the average velocity of the carriers in the wire, A is the wire’s cross-sectional
area) then we can add up the magnetic forces on all of the charges in a short (differential)
length of wire dℓ:
$$d\vec{F} = n q (A\,d\ell)\,\vec{v}_d \times \vec{B} \tag{6.49}$$
We now do a clever thing. We’ll collect the $n q v_d A$ magnitudes together and make I, and
take the direction of $\vec{v}_d$ and attach it to dℓ, making it a vector pointing in the direction of the
current in the wire. The result is:
$$d\vec{F} = I\,(d\vec{\ell} \times \vec{B}) \tag{6.50}$$
for a small (differential) segment of wire carrying a current I in a magnetic field $\vec{B}$. Magnetic fields exert forces on current carrying wires!
To evaluate the total force on any given current carrying wire is not, of course, likely
to be easy unless the wire has a very nice geometry, such as being a straight line in a
uniform field or a circular loop of current in a uniform field. However, we can prove a very
interesting result for arbitrary current loops that lets us understand how magnetic forces
work on them to at least a decent approximation, especially when those loops are “small”
relative to everything else that is going on. Let’s proceed.
In figure 6.10 you can see pictured a rectangular current loop with N turns, each carrying
a current I. When studying electrical currents and magnetic fields, using loops with many
turns is a cheap and easy way to get a larger current than one’s power source can ordinarily
support, as this is effectively a current of N I on each leg of the circuit. If you’ve ever looked
inside an electrical motor, or transformer, or generator, or electronic device, you’ll almost
certainly see loops of reddish (epoxy or enamel insulated) copper wire wrapped into loops
with many turns for just this reason.
[Figure 6.10 diagram: inset of the a × b loop (N turns, each carrying I) with its pivot axis; edge view showing the normal n, the field B, the angles θ and φ, current out of the page on the upper b side and into the page on the lower b side, and the forces F and −F.]
Figure 6.10: The force and torque on an a × b rectangular loop of N turns, each carrying
current I, in a uniform magnetic field $\vec{B}$ are $\vec{F} = 0$ and $\vec{\tau} = \vec{m} \times \vec{B}$ respectively.
The dimensions of this particular loop are a and b, although in the next section we’ll see
that these particular dimensions, and indeed the shape of the plane loop, are not terribly
important. I put the loop in the “inset” to the upper left so you can visualize what it might
look like lying on a table. Note that we’ll imagine that the loop has an “axle” on which it can
freely pivot. This too isn’t strictly necessary (we can pick other pivots that will work just as
well or better), but guessing that your recollection of torque is still a bit shaky, it won’t hurt
to draw in a simple one that is easy to understand.
In the main part of the figure I’ve drawn an “edge view” of the loop as it sits in a uniform
magnetic field $\vec{B}$ pointing to the right. The “uniform” bit is very important – we obviously
would get a very different result for the force if (for example) the field on the upper b side
were larger than the field on the lower b side!
Evaluating the force on each of the four sides of the rectangle is trivial. The upper and
lower b sides are perpendicular to the $\vec{B}$ field, have length b, have N turns each carrying
I, and hence the magnitude of the force is:
$$F_b = N I b B \tag{6.51}$$
We can find the direction easily using the right hand rule. It is up on the upper side (with
current pointing out of the page) and down on the lower side.
The force on the a sides is hardly more difficult. Let’s consider the one closest to us in
the figure, with the current slanting down and to the right. The directed current makes an
angle of φ with the magnetic field, so the force on it is:
$$F_a = |N I \vec{a} \times \vec{B}| = N I a B \sin(\phi) \tag{6.52}$$
with a direction (right hand rule again) of out of the page. The hidden a side on the other
side (where the current slants up and to the left) has the same magnitude force and the
opposite direction.
The sum of these forces is thus clearly
$$\vec{F}_{\rm tot} = F_b\,\hat{y} - F_b\,\hat{y} + F_a\,\hat{z} - F_a\,\hat{z} = 0 \tag{6.53}$$
where I’ve fairly arbitrarily popped a coordinate system onto the picture with x to the right,
y up, and z out of the page.
Does this ($\vec{F} = 0$) mean that nothing interesting happens to the loop in the field? Not at
all! The two $F_a$ forces are indeed uninteresting, as they act along the same line (along the
axle, in fact) and exert neither force nor torque on the system. The two $F_b$ forces, however,
do not act along the same line. They exert a torque on the loop!
How large a torque? Recalling that $\vec{\tau} = \vec{r} \times \vec{F}$ where $\vec{r}$ is a vector from the pivot to the
force, the torque from the upper b side using the pivot shown (so that r = a/2) is:
$$\tau_b = \frac{a}{2} F_b \sin(\theta) \tag{6.54}$$
and points in to the page. The torque from the lower b side is identical in magnitude and
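has the same direction (into the page). The total torque thus has magnitude:
$$\tau = 2\,\frac{a}{2} F_b \sin(\theta) = N I (ab) B \sin(\theta)$$
With $A = ab$ the area of the loop and the magnetic moment $\vec{m} = N I A \hat{n}$ directed along the right-handed normal $\hat{n}$ to the loop, this is the magnitude of:
$$\vec{\tau} = \vec{m} \times \vec{B} \tag{6.57}$$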
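[Figure 6.11 diagram: plane loop of N turns (current I each) with its right-handed normal out of the page, cut into strips of width ∆x and height y with end pieces ∆y; forces F_up and F_down on the ∆x segments, −F_x and F_x on the ∆y segments, and the pivot axis.]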
Figure 6.11: Arbitrary plane loop of current can be broken into small pieces that are aligned
with or perpendicular to torque axis.
In figure 6.11 we see a golf-putting-green shaped loop of current carrying wires in a plane.
As before, there are N turns carrying a current I, and I’ve drawn an arbitrary rotation
axis/pivot that is perpendicular to the $\vec{B}$ field that the loop will be in and located at the end
of the (each) loop rectangle for convenience.
As you can see, one can take the curve and break it up into perpendicular segments
that approximate the curve arbitrarily closely as the ∆x and ∆y segments are made
smaller and smaller. If one considers just one such opposing pair of segments each (the
shaded/textured areas in the figure), the forces Fx between the ∆y parts of the curve are
equal and opposite and along a common line parallel to the axis of torque. They contribute
no force and no torque in a uniform field so we don’t even bother to sum over them, we just
ignore them.
The forces between the ∆x parts of the curve (the direction that would have been
into or out of the page in the rectangular figure above) are also equal and opposite, but
they are typically offset so that they do not act along a common line but rather one with
a perpendicular displacement of y sin(θ), where θ is the angle between the $\vec{B}$ field and a
right handed normal to the figure. y (for this small segment of current) thus acts like the a
coordinate in the rectangular figure above, ∆x acts like a very short piece of the b segment.
This pair of forces does contribute a net torque (magnitude) for just this little strip of the
total wire of:
$$\Delta\tau = y\,\Delta x\,N I B \sin(\theta) \tag{6.58}$$
Summing over all of the strips of width ∆x, the total torque on this plane loop is thus:
$$\tau = N I \left(\lim_{\Delta x \to 0} \sum y(x)\,\Delta x\right) B \sin(\theta) = N I \left(\int y(x)\,dx\right) B \sin(\theta) = N I A B \sin(\theta) \tag{6.59}$$
or (including the vector direction from the right-hand-rule applied both to the torque and
to the normal $\hat{n}$ of the loop):
$$\vec{\tau} = \vec{m} \times \vec{B} \tag{6.60}$$
with
$$\vec{m} = N I A \hat{n} \tag{6.61}$$
We see that our rule for the rectangular loop above is thus general and applies to any
plane loop of current, no matter what the shape.
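The shape-independence claimed here is easy to test numerically. The sketch below, in plain Python, chops a lumpy closed loop into short straight segments, sums the force $N I\,(d\vec{\ell} \times \vec{B})$ on each one and its torque about the origin, and compares the total with $\vec{m} \times \vec{B}$ using the enclosed area. The loop shape, field, and all function names are assumptions made for illustration.

```python
import math

def cross(a, b):
    """Cartesian cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def loop_force_and_torque(points, N, I, B):
    """Sum dF = N I (dl x B) and r x dF over the straight segments of a closed loop."""
    F = [0.0, 0.0, 0.0]
    tau = [0.0, 0.0, 0.0]
    for k in range(len(points)):
        p, q = points[k], points[(k + 1) % len(points)]
        dl = tuple(q[i] - p[i] for i in range(3))
        mid = tuple(0.5 * (q[i] + p[i]) for i in range(3))   # where dF acts
        dF = tuple(N * I * c for c in cross(dl, B))
        dtau = cross(mid, dF)
        for i in range(3):
            F[i] += dF[i]
            tau[i] += dtau[i]
    return tuple(F), tuple(tau)

def shoelace_area(points):
    """Area of a polygon lying in the xy-plane (counterclockwise is positive)."""
    s = 0.0
    for k in range(len(points)):
        x1, y1, _ = points[k]
        x2, y2, _ = points[(k + 1) % len(points)]
        s += x1 * y2 - x2 * y1
    return 0.5 * s

# Hypothetical three-lobed loop in the xy-plane, traversed counterclockwise.
pts = []
for k in range(400):
    phi = 2.0 * math.pi * k / 400
    rad = 0.10 * (1.0 + 0.3 * math.cos(3.0 * phi))   # metres
    pts.append((rad * math.cos(phi), rad * math.sin(phi), 0.0))

N, I = 50, 2.0                                        # turns, amperes
theta = math.radians(35.0)                            # angle between n-hat (z) and B
B = (0.2 * math.sin(theta), 0.0, 0.2 * math.cos(theta))   # uniform 0.2 T field

F, tau = loop_force_and_torque(pts, N, I, B)
m = (0.0, 0.0, N * I * shoelace_area(pts))            # eq. (6.61), n-hat along z
print("net force     :", F)
print("summed torque :", tau)
print("m x B         :", cross(m, B))
```

In a uniform field the net force vanishes to rounding error and the summed torque agrees with $\vec{m} \times \vec{B}$, whatever closed plane shape is substituted for `pts`.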
As before with electric dipoles, we must do work rotating a magnetic moment from one
angle to another in a magnetic field, working against the torque. The work we do to rotate
the dipole equals the potential energy stored in the system (the magnetic dipole and field
combined). We can compute this potential energy by following the derivation we used for
electric dipoles, using as before a zero of the potential energy when the dipole is at right-
angles to the magnetic field. That is (given τ = −mB sin(θ), with sign opposite to the sign
of θ):
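$$\begin{aligned}
U &= -\int \tau\,d\theta \\
  &= -\int_{\pi/2}^{\theta} \left(-mB\sin(\theta)\right) d\theta \\
  &= -mB\cos(\theta)
\end{aligned}$$
or
$$U = -\vec{m}\cdot\vec{B} \tag{6.62}$$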
Note that as before, U (θ) is minimum (negative) when the magnetic dipole is aligned with
the field, maximum (positive) when antialigned.
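The integral above can also be checked numerically. This small Python sketch accumulates the work integral with a midpoint rule, starting from the zero of potential energy at θ = π/2, and compares it with −mB cos(θ). The values of m and B and the function name are hypothetical.

```python
import math

def potential_energy(m, B, theta, steps=10000):
    """Midpoint-rule value of U = -integral of tau from pi/2 to theta, tau = -m B sin."""
    start = math.pi / 2.0
    h = (theta - start) / steps
    total = 0.0
    for k in range(steps):
        angle = start + (k + 0.5) * h
        total += m * B * math.sin(angle) * h   # -tau d(theta)
    return total

m, B = 3.0, 0.2   # hypothetical moment (A m^2) and field (T)
for degrees in (0, 60, 90, 120, 180):
    theta = math.radians(degrees)
    print(f"theta = {degrees:3d} deg: numeric U = {potential_energy(m, B, theta):+.6f} J, "
          f"-mB cos(theta) = {-m * B * math.cos(theta):+.6f} J")
```

The minimum at θ = 0 (aligned) and maximum at θ = 180 degrees (antialigned) show up directly in the printed values.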
From this, we can also find the force acting on a magnetic dipole in a non-uniform
magnetic field:
$$F_x = -\frac{dU}{dx} \tag{6.63}$$
(with similar expressions for the other force components, where this derivative should really
be a partial derivative for those of you who have taken multivariate calculus).
We can now construct a table of the analogies between electric and magnetic dipole
moments and their associated fields, forces, and torques. It is quite strong:
As noted above, magnetic fields do no work on classical charged particles, no matter how
many of them there are or how they are moving. This is a direct consequence of:
$$P = \frac{dW}{dt} = \vec{F}_m\cdot\vec{v} = q(\vec{v} \times \vec{B})\cdot\vec{v} = 0 \tag{6.64}$$
Table 3: Similarity of results for the electric and magnetic dipoles in (or later, as the source
of) their respective fields.
From this point on, then, that’s just what we’ll do!
Not all current carrying wires or current densities will have magnetic dipole moments that
are easy to compute. In fact, most current densities will have moments that are too difficult
to compute with anything less than a computer! Imagine a spool of wire tangled up like
fishing line with a current running through it – this is only one of the infinity of arbitrary
shapes to consider, most of which cannot even be expressed as a simple function of three
dimensional coordinates! Still, our plane figure result above appears to be very useful
because when we as humans design a magnetic apparatus (say, a motor) we can certainly
choose to wrap our coils in a plane (at least approximately). Also, we can see how to at
least formulate the problem for arbitrary currents. We are therefore done (for this level of
instruction) with current loops per se until your next course in electrodynamics (if there is
one) where you will learn how to compute moments from integrals over current densities
expressed nastily in multivariate calculus.
However, there is one more generic distribution of moving charge that has an easily
computable magnetic moment that we very much need to consider before quitting. A
surprisingly common occurrence in physics is to have a “particle” (that is microscopically
more or less a ball with a mass and a charge) that is rotating about some axis. A proton,
for example, can be modelled as a ball of some radius $r_p \approx 10^{-15}$ meters, containing a
mass $m_p$ and a charge e. The proton also has a spin – an intrinsic angular momentum – of
$L_z = \hbar/2$ where $\hbar$ is Planck’s constant over 2π (a number that need not concern us in this
course – it is very small in macroscopic terms but is large as far as the proton’s physical
dimensions are concerned).
85 Wikipedia: http://www.wikipedia.org/wiki/Stern-Gerlach experiment.
We can then build a classical model for a proton, where we imagine the proton to be
a uniform ball of charge with radius R, with total (uniformly distributed) charge Q = e, with
(uniformly distributed) mass M, spinning about some axis through its center at an angular
velocity $\vec{\Omega}$ so that:
$$\vec{L} = I\vec{\Omega} = \frac{2}{5} M R^2 \vec{\Omega} \tag{6.65}$$
Hopefully it is clear that the proton will also have a magnetic moment parallel to $\vec{L}$. What
is not so obvious is that this magnetic moment will be directly proportional to the angular
momentum in a way that is independent of the shape of the proton (or even that it is a
proton), so that
$$\vec{m} = \frac{Q}{2M}\vec{L} = \mu\vec{L} \tag{6.66}$$
for any symmetric spinning particle with identically distributed charge and mass, where I
have defined the ratio:
$$\mu = \frac{Q}{2M} \tag{6.67}$$
as the classical equivalent of the “Bohr magneton” in the quantum physics of the electron86.
Let us understand this, starting with a simpler example than a ball.
Example 6.5.1: The Magnetic Moment of a Rotating Ring of Mass and Charge
Suppose we have a ring of charge Q, mass M , and radius R spinning at angular speed Ω
about its axis of symmetry as drawn in figure 6.12.
The “current” in such a ring can easily be evaluated. The total charge in the ring goes
around exactly one time in one period of its revolution. Thus:
$$I = \frac{Q}{T} = \frac{Q\Omega}{2\pi} \tag{6.68}$$
The magnetic moment of the ring in the (right handed) z-direction is thus just:
$$m_z = IA = \frac{Q\Omega}{2\pi}\,\pi R^2 = \frac{Q\Omega}{2} R^2 \tag{6.69}$$
If we multiply the expression on the right by $\frac{M}{M}$ (one!) and rearrange the terms, we get:
$$m_z = \frac{Q}{2M}\,M R^2 \Omega = \mu L_z \tag{6.70}$$
using $L_z = M R^2 \Omega$ for a ring of mass M rotating symmetrically about the z-axis.
That was almost too easy!
86 The Bohr magneton of the electron is $\mu_B = \frac{e\hbar}{2m_e}$, which we recognize as our µ, but with the units of $\hbar$ appended to make the remaining parameter a dimensionless quantum number.
[Figure 6.12 diagram: rotating ring labelled M, Q.]
Figure 6.12: A rotating ring of charge with mass M, radius R, and charge Q has a magnetic
moment of $\vec{m} = \frac{Q}{2M}\,(M R^2 \vec{\Omega}) = \mu\vec{L}$.
Next consider a rotating disk of total charge Q, total mass M, radius R. The charge of a
differential ring of charge of radius r and thickness dr is just
$$dq = \sigma\,2\pi r\,dr \tag{6.71}$$
where $\sigma = Q/\pi R^2$ is the surface charge density of the uniformly distributed charge. The
current in just this differentially thick ring is:
$$dI = \frac{\Omega}{2\pi}\,dq \tag{6.72}$$
just as it was for the ring example above. The area inside the differential ring is $A = \pi r^2$,
so its differential magnetic moment is:
$$dm_z = dI\,A = \frac{Q}{\pi R^2}\,2\pi r\,dr\,\frac{\Omega}{2\pi}\,\pi r^2 = \frac{Q}{R^2}\,\Omega r^3\,dr. \tag{6.73}$$
Integrating (to cover the disk) from 0 to R we find:
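$$m_z = \int_0^R \frac{Q}{R^2}\,\Omega r^3\,dr = \frac{Q}{4R^2} R^4 \Omega = \frac{Q}{4} R^2 \Omega \tag{6.74}$$
Once again we multiply by $\frac{M}{M} = 1$, do some rearrangement, and presto, change-o:
$$m_z = \frac{Q}{4M}\,M R^2 \Omega = \frac{Q}{2M}\left(\frac{1}{2} M R^2 \Omega\right) = \mu L_z \tag{6.75}$$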
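The disk result can be confirmed by brute force: slice the disk into thin rings, add up each ring's magnetic moment and angular momentum, and compare the totals. The Python sketch below does this for hypothetical values of Q, M, R, and Ω; the function name and the number of rings are arbitrary choices of mine.

```python
import math

def disk_moment_and_angular_momentum(Q, M, R, Omega, rings=2000):
    """Sum dm_z = dI * pi r^2 and dL_z = dM * r^2 * Omega over thin rings."""
    sigma_q = Q / (math.pi * R**2)   # surface charge density
    sigma_m = M / (math.pi * R**2)   # surface mass density
    dr = R / rings
    m_z = 0.0
    L_z = 0.0
    for k in range(rings):
        r = (k + 0.5) * dr                        # mid-radius of this ring
        dq = sigma_q * 2.0 * math.pi * r * dr     # eq. (6.71)
        dM = sigma_m * 2.0 * math.pi * r * dr
        dI = dq * Omega / (2.0 * math.pi)         # eq. (6.72)
        m_z += dI * math.pi * r**2
        L_z += dM * r**2 * Omega
    return m_z, L_z

Q, M, R, Omega = 1.0e-6, 0.10, 0.05, 100.0   # hypothetical: C, kg, m, rad/s
m_z, L_z = disk_moment_and_angular_momentum(Q, M, R, Omega)

print(f"numeric m_z = {m_z:.6e},  Q R^2 Omega / 4 = {Q * R**2 * Omega / 4:.6e}")
print(f"numeric L_z = {L_z:.6e},  M R^2 Omega / 2 = {M * R**2 * Omega / 2:.6e}")
print(f"m_z / L_z   = {m_z / L_z:.6e},  Q / 2M = {Q / (2 * M):.6e}")
```

Because every ring separately satisfies dm_z/dL_z = Q/2M when charge and mass share the same density profile, the ratio of the sums comes out the same regardless of how finely the disk is sliced.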
Part of your homework for this week will be to re-prove these two cases and several
others to show that:
$$\vec{m} = \frac{Q}{2M}\vec{L} \tag{6.76}$$
is quite general, subject to the condition that Q and M are proportionally distributed, so
that:
$$\frac{\rho_Q}{\rho_M} = \frac{Q}{M} \tag{6.77}$$
at all points inside the object, and (if you do the advanced problems) that the object is
rotating around a principal axis (an axis with enough symmetry that $\vec{\Omega} \parallel \vec{L}$). This is simple
enough if you require that both the mass and the charge have densities that are identical
functions of coordinates and write $dm_z$ correctly in terms of those densities.
So fine, fine, fine. Given this result we can now see that our classical model proton
should have a magnetic moment that is related to its angular momentum by the simple
relation:
$$\vec{m}_p = \mu_p \vec{L}_p \tag{6.78}$$
where $\mu_p = \frac{e}{2m_p}$, the magnetic moment of a classical electron should be the same with
$\mu_e = \frac{e}{2m_e}$ and so on, and it isn’t that difficult to directly integrate over a solid sphere of
charge/mass as we did in these examples to prove it! Indeed, this result works adequately
in the case of quantum magnetic moments of elementary particles as well, as long as we
remember to use the intrinsic spin of the particles in question.
Why do we care? It is because we can use this result in a clever way by taking ad-
vantage of the motion that results when we place a proton in a strong magnetic field. The
motion, as we shall see, is a precession of the magnetic moment of the proton in a cone
around the applied magnetic field that has a precession frequency Ωp = µB independent
of the relative angle between the angular momentum or spin of the proton and the magnetic
field.
While precessing in this way, we can easily trick the magnetic dipole moments of the
charged protons to absorb or emit electromagnetic radiation of the same angular frequency
as Ωp . By detecting the signal produced by the protons in various clever ways (beyond the
scope of this course to detail, but within your capabilities of understanding if you master
the next section) we can measure the density of bare protons in almost any substance
and create a three dimensional map of that density at a remarkably fine resolution.
Protons, of course, are the nuclei of hydrogen atoms and water is dihydrogen oxide,
with two protons just waiting to be mapped. And what are we? Well, mostly water! The
precession of magnetic moments of protons around strong applied fields is the basis of
magnetic resonance imaging (MRI), one of the most important technologies in use in hos-
pitals around the world today. With MRI one can safely map out soft tissue densities of the
human body in a lovely complement to x-rays (that map out dense tissues but that go right
through soft tissue without much differentiation). My wife is a physician, and she orders
MRIs on patients on at least a weekly basis, if not a daily one.
Spin resonance is also a very important experimental probe for physicists, as this trick
works for more than “just protons”. Whether you are a potential physics major or engineer-
ing student or a premedical student, you really must master the next section, then, as it
is actually directly important to your future planned career. To encourage this mastery, I
typically tell my students that a problem on magnetic resonance and precession will be on
at least one quiz, hour exam, or on the final. This is usually enough incentive to motivate
them to take the time to plow through the complexities of torque as the time rate of change
of the vector angular momentum.
I present this result two distinct ways below – the first suitable for any student, the
second perhaps better for students that have mastered the concept of the cross product in
cartesian coordinates. I strongly suggest that all students at least try to master both, but
at the very least get to where you fully understand the first one.
[Figure 6.13 diagram: spinning proton with its rotation sense marked, angular momentum L and magnetic moment m = (e/2m_p) L tilted at angle θ from the field B0 along the z-axis, and the torque τ.]
Figure 6.13: A classical model for a rotating proton with a (blue) magnetic moment $\vec{m} = \mu_p \vec{L}$
aligned with (red) its angular momentum along its rotation axis. This proton precesses
around an applied magnetic field $\vec{B}_0$ with a precession frequency $\Omega_p = \mu_p B_0$ independent
of the particular angle θ between $\vec{m}$ and $\vec{B}_0$. Note that classically, we expect $\mu_p = \frac{e}{2m_p}$
from the previous section.
In figure 6.13 above, you can see a cartoon classical proton in a strong external magnetic
field $\vec{B}_0$. The proton (we imagine) is spinning like a little planet – very little indeed
given that its radius is order of $10^{-15}$ meters – and hence has an angular momentum $\vec{L}$
pointing in the direction up and to the right along its axis of rotation. Because its charge
is positive, it has a magnetic moment that is parallel to its angular momentum, and in the
previous section we argued strongly (leaving actual proof to the student) that as long as its
charge and its mass are identically distributed, its magnetic moment can generally enough
be written:
$$\vec{m} = \frac{e}{2m_p}\vec{L} = \mu_p \vec{L} \tag{6.79}$$
where $e = 1.6 \times 10^{-19}$ Coulombs is its charge and $m_p = 1.67 \times 10^{-27}$ kilograms is its mass
(in SI units).
As we have derived above, the magnetic field exerts a torque $\vec{\tau}$ on the magnetic dipole
of the proton. This torque is out of the page at the particular instant drawn above and is
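[Figure 6.14 diagram: x, y, z axes with B0 along z; angular momentum L at angle θ from z with components L⊥ and Lz; torque τ tangent to the circle traced by L⊥.]
Figure 6.14: The torque causes (red) $L_\perp$ to precess around the (blue) $\vec{B}_0$ field in the
direction shown, perpendicular to $L_\perp$, causing $L_\perp$ to swing around in a circle. (Red) $L_z$ is
constant as it is parallel to $\vec{B}_0$! θ is the angle between $\vec{L}$ and $\vec{B}$.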
and that the direction of the torque is tangent to the circle with radius L⊥ shown in both
figures 6.14 and 6.15. Recalling our discussion of cross products in mechanics, we can
easily see that the torque only comes from, and only changes, the L⊥ component.
Since the torque is always perpendicular to $\vec{L}_\perp$ it changes its direction but not its magnitude.
This is a familiar situation in physics – obviously $\vec{L}_\perp$ turns in a circle of radius $L_\perp$
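[Figure 6.15 diagram: view down the z-axis (out of the page) with the x-axis marked; L⊥ at angle φ, the torque τ, and the change ∆L as L⊥ turns through ∆φ.]
Figure 6.15: The torque causes (red) $L_\perp$ to precess around the $\vec{B}$-field parallel to the z-
axis (out of the page). $L_\perp$ swings in a (red dashed) circle, and in a short time ∆t it moves
through an angle ∆φ and hence changes the angular momentum by (red) $\Delta\vec{L}$ as shown.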
In a short time ∆t, the angular momentum changes a small amount with magnitude
∆L = ∆L⊥ that is the length of the arc on the L⊥ circle subtended by the angle ∆φ
through which it turns in that time, as shown in figure 6.15. In the figure it is obvious that:
$$\Delta L = L_\perp \Delta\phi = L \sin\theta\,\Delta\phi \tag{6.84}$$
or (dividing both sides by ∆t) the magnitude of the rate of change must be:
$$\frac{\Delta L}{\Delta t} = L \sin\theta\,\frac{\Delta\phi}{\Delta t} \tag{6.85}$$
This must also equal the magnitude of the torque in terms of the field, in the limit that we
let ∆t → dt, so (combining the two pieces):
$$\tau = \frac{dL}{dt} = \mu_p L \sin\theta\,B_0 = L \sin\theta\,\frac{d\phi}{dt} = L \sin\theta\,\Omega_p \tag{6.86}$$
where we define:
$$\Omega_p = \frac{d\phi}{dt} \tag{6.87}$$
to be the angular speed of precession87, commonly referred to as the Larmor frequency88
in the context of spin resonance.
87 This is a case where one can equally well call it an angular velocity, as the angular momentum sweeps out
a real, physical angle per unit time, or an angular frequency since one measures the angular frequency of the
radio waves emitted by the precessing angular momentum in e.g. nuclear magnetic resonance experiments
or MRI. Most of the literature will call this $\omega_p$ or $\omega_{\rm larmor}$, in other words, but we are sticking with calling angular
speeds involving actual angles Ω and true angular frequencies involving no actual angles ω, to avoid confusing
the student in problems where both occur.
88 Wikipedia: http://www.wikipedia.org/wiki/Larmor precession. This article introduces and defines the gyromagnetic
ratio and g-factor discussed in the next section, but is otherwise a bit quantum and complex for this
course.
Solving for the angular (Larmor) speed of precession (cancelling L sin θ), we find that:
$$\Omega_p = \mu_p B_0 \tag{6.88}$$
independent of the angle θ between $\vec{m}$ and $\vec{B}_0$! This latter fact, that the frequency of
precession is independent of the angle, is a key aspect of magnetic resonance as it allows
us to match an external rotating magnetic field to this frequency to make some magic
happen – the “resonance” part.
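To get a feel for the numbers, this Python sketch evaluates (6.88) using the classical $\mu_p = e/2m_p$ and the values of e and $m_p$ quoted above. The field strengths are hypothetical, and the function name is mine. Keep in mind that this is only the classical toy-model estimate: a real proton is not a uniform spinning ball, so measured precession frequencies differ from these.

```python
import math

E_CHARGE = 1.6e-19     # C, proton charge as quoted in the text
M_PROTON = 1.67e-27    # kg, proton mass as quoted in the text

def classical_larmor(B0, q=E_CHARGE, m=M_PROTON):
    """Return (Omega_p in rad/s, frequency in Hz) from eq. (6.88) with mu = q/(2m)."""
    mu = q / (2.0 * m)
    omega_p = mu * B0
    return omega_p, omega_p / (2.0 * math.pi)

for B0 in (0.5, 1.5, 3.0):   # tesla, hypothetical magnet strengths
    omega_p, f = classical_larmor(B0)
    print(f"B0 = {B0:.1f} T: Omega_p = {omega_p:.3e} rad/s, f = {f / 1e6:.2f} MHz")
```

The angle θ never appears in the function, which is the point: every proton in the same field precesses at the same rate, whatever its tilt, and that rate falls in the radio-frequency range for fields of a tesla or so.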
Note well that this derivation, while correct enough for the moment, doesn’t directly
result in equations of motion for the individual components of the angular momentum –
it is at least somewhat heuristic and relies on the pictures and visualization as much as
the algebra. It is easy enough, however, to write down the three coupled equations of
motion for $L_x$, $L_y$, $L_z$ using the cartesian form for the cross product. One of these is trivial
as there is no torque in the direction of $\vec{B} = B_0\hat{z}$. The other two first order coupled
differential equations become second order equations for the oscillatory motion of $L_x$
and $L_y$ separately. The solutions, however, are not independent, as the phase of one is
determined by the phase of the other. Indeed, the solution describes $\vec{L}_\perp$ tracing out an
explicit circle at the precession frequency.
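Without giving away the analytic solution, one can at least watch the circle appear numerically. The Python sketch below integrates $d\vec{L}/dt = \vec{\tau} = \mu\,\vec{L} \times \vec{B}$ with a standard fourth-order Runge-Kutta step. The units, the values of µ and B0, the step counts, and the function names are all arbitrary assumptions chosen for illustration.

```python
import math

def cross(a, b):
    """Cartesian cross product of two 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def precess(L0, mu, B, t_end, steps):
    """Integrate dL/dt = mu (L x B) from t = 0 to t_end with RK4; return final L."""
    def rate(L):
        return tuple(mu * c for c in cross(L, B))
    L = tuple(L0)
    h = t_end / steps
    for _ in range(steps):
        k1 = rate(L)
        k2 = rate(tuple(L[i] + 0.5 * h * k1[i] for i in range(3)))
        k3 = rate(tuple(L[i] + 0.5 * h * k2[i] for i in range(3)))
        k4 = rate(tuple(L[i] + h * k3[i] for i in range(3)))
        L = tuple(L[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6.0
                  for i in range(3))
    return L

mu, B0 = 2.0, 3.0                       # hypothetical units, so Omega_p = 6 rad/s
Omega_p = mu * B0
period = 2.0 * math.pi / Omega_p
theta = math.radians(40.0)              # tilt of L away from the field
L0 = (math.sin(theta), 0.0, math.cos(theta))   # unit angular momentum, in the xz-plane
B = (0.0, 0.0, B0)                      # field along z

quarter = precess(L0, mu, B, period / 4.0, 1000)
full = precess(L0, mu, B, period, 4000)
print("start          :", L0)
print("after T/4      :", quarter)      # L_perp has swung a quarter turn (toward -y for mu > 0)
print("after T        :", full)         # back where it started
```

The z-component stays fixed, the perpendicular part keeps its length, and the vector returns to its starting point after one period 2π/Ωp, whatever tilt angle is chosen.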
Instead of covering this solution in the text, this is left as an “advanced” (optional) home-
work problem for the interested student or a required problem for physics/engineering/math
majors in the homework for this chapter. The math for this, note well, is very similar to the
math used to derive the wave equation for the electric and magnetic field components from
Maxwell’s equations in a few chapters, so it isn’t completely crazy to give this a try now,
even if you don’t “have” to, in order to make it easier on yourself then!
One of the primary reasons for many students to take a course in electricity and magnetism
is to learn enough about how magnetic fields and moments work that they can understand
Magnetic Resonance Imaging (MRI). MRI is one of the most important non-invasive
diagnostic tools available to physicians practicing modern medicine. It is also not terribly
easy to understand, even for physics majors, because a completely proper treatment requires
a lot of both quantum mechanics and spin relaxation. This is especially true given that there
are multiple somewhat distinct methods (that provide some degree of choice in contrast and
resolution) that all come under the
general heading of MRI and are all options on the hardware that can accomplish different
purposes.
However, at this point you should know enough to understand a sort of a “toy model”
of just how at least one or two of the MRI methods work, including the one that is arguably
the most important (conceptually) to understand. This section is devoted to presenting just
such a toy model. It deliberately omits most of the discussion of the quantum mechanics
involved, while necessarily introducing certain very general terms and describing in a qual-
itative manner the key processes related to those terms. This section should very definitely
be viewed as “optional” for most students but may serve as an introductory reference for
students who are interested or who are confused by other descriptions.
288 Week 6: Moving Charges and Magnetic Force
Note well that this presentation is my own conception of the process, and while I do
have some research experience with the related quantum theory of photon resonance and
photon echoes, I am far from being an expert on MRI and nuclear spin echoes in particular in
the context of MRI or otherwise. Those who are more expert than I who read this and find
errors are encouraged to contact me to correct them, as long as the correction preserves
the general semi-classical, functional presentation I am attempting that is as appropriate
for introductory physics non-major students interested in the life sciences or medicine as it
might be for future physics majors.
Although it can detect and map a number of nuclei, MRI is used medically to map pri-
marily the density of hydrogen nuclei – protons – in the human body. This is because
hydrogen is by far the most abundant element in addition to being one of the most re-
sponsive (in terms of having a large γ, defined below). Many of these protons are bound
up in water molecules and are screened to some extent from electromagnetic fields and
radiation by the surrounding molecular electron clouds. One can then treat them as “iso-
lated” protons and add a phenomenological correction that describes the effect of small
variations in their local magnetic fields.
From our discussion above, we classically expect the magnetic dipole moment of an
isolated proton to be given by an expression such as:
$$\vec{m}_p = \frac{e}{2 m_p} \vec{S} \tag{6.89}$$
where $\vec{S}$ is the spin (intrinsic, quantum mechanical) angular momentum of the proton.
There are several problems with this equation. The proton is not, in fact, a homogeneous
ball of spinning mass and charge. It is a composite particle made up of three quarks bound
together with the strong nuclear force, and they, not the proton per se, have spin angular
momentum (a purely quantum mechanical kind of angular momentum), not the classical
“orbital” angular momentum described by $\vec{L}$. Classically, the angular momentum $\vec{L}$ we
have studied can have any value, but in quantum theory the spin angular momentum $\vec{S}$ is
quantized so that it can only take on certain values, generally integer or half-integer multiples
of $\hbar = 1.05 \times 10^{-34}$ joule-seconds. We will model such a spin as a classical fixed angular
momentum that can point in any direction but not change its magnitude.
It is well beyond the scope of this course to go into any more detail about the quantum
theory of angular momentum used to effectively set that magnitude, but at the same time
we do want to connect the spin angular momentum of a proton, whatever it might be, to
its magnetic dipole moment. This is accomplished by using a semi-empirical parameter γ
such that:
$$\vec{m}_p = \gamma \vec{S} \tag{6.90}$$
In this equation, γ is the so-called gyromagnetic ratio for the proton relating its spin $\vec{S}$
(whatever value it might have) to its magnetic moment. It is commonly written as:
$$\gamma = \frac{g e}{2m} \tag{6.91}$$
where the dimensionless g is called (creatively enough) the “g-factor” and simply adjusts
the classically expected gyromagnetic ratio (where g = 1) to the correct tabulated (mea-
sured and/or calculated) g-factor for the particle with spin under consideration. A concise
discussion of this and accepted values for g/γ are available on Wikipedia89 90 .
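As a rough numerical check of equation 6.91 for the proton, the following Python sketch evaluates γ = ge/2m with m set to the proton mass. None of the constants come from this text: the elementary charge, the proton mass, and a proton g-factor of about 5.586 are approximate tabulated values assumed here, and the function name is purely illustrative.

```python
# Approximate physical constants (assumed tabulated values, not taken from the text).
E_CHARGE = 1.602176634e-19      # elementary charge, coulombs
M_PROTON = 1.67262192369e-27    # proton mass, kilograms
G_PROTON = 5.5856946893         # proton g-factor, dimensionless


def gyromagnetic_ratio(g, charge, mass):
    """Return gamma = g * e / (2 m) in rad/(s*T), as in equation 6.91."""
    return g * charge / (2.0 * mass)


gamma_classical = gyromagnetic_ratio(1.0, E_CHARGE, M_PROTON)   # classical guess, g = 1
gamma_proton = gyromagnetic_ratio(G_PROTON, E_CHARGE, M_PROTON)

print(f"classical (g = 1): {gamma_classical:.4e} rad/(s T)")
print(f"proton:            {gamma_proton:.4e} rad/(s T)")
```

With these assumed inputs the proton value comes out near 2.68 × 10^8 rad/(s T), about 5.6 times the classical g = 1 estimate.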
We’ll begin by borrowing a key concept from statistical mechanics without introducing
the Boltzmann constant or defining more precisely what we mean by high and low tempera-
tures. Qualitatively, then, at “high” temperatures, all of the possible energy states available
to a collection of many particles with spin in an external magnetic field are equally popu-
lated. As one lowers the temperature of the system, it becomes more and more likely to
find spins in energy states with lower energies instead of higher energies, out of the finite
range of energies the particles can have. At “low” temperatures, most of the spins will
therefore be in those states with the lowest available energies.
Suppose, then, we have a large number of protons in a coarse-grained chunk of matter
that we will call a “spin packet”, so named because all of the spins in that chunk experi-
ence “the same” external magnetic field from some external arrangement of coils carrying
currents, to some resolution. In the absence of any strong external magnetic field (when
those coils are turned off, that is) and neglecting any internal spin-spin interactions, all of
the spins have no magnetic potential energy – so all temperatures become “high” tem-
peratures relative to the range of available energies – and the spin angular momentum
(and magnetic moment) of any given proton in the chunk is equally likely to point in all
possible directions. The total spin angular momentum and magnetic moment of the spin
packet should therefore be very nearly zero, independent of temperature.
Following from this, if we put our spin packet of protons into a strong external magnetic
field in (say) the z-direction:
$$\vec{B} = B_0 \hat{z}, \tag{6.92}$$
(where B0 is the magnitude of the field within that packet, but might be slightly different for
neighboring spin packets) the potential energy of each proton suddenly does depend on
its direction relative to the field:
$$U = -\vec{m}_p \cdot \vec{B} = -m_z B_0 \tag{6.93}$$
where mz is the z-component of its magnetic moment, mz = γSz . The magnetic potential
energy of any given proton is minimized when its spin is aligned with the external field,
and is maximized when it is antialigned with the external field. In general this makes it
more probable that spins will be found at least partially aligned with the field than
antialigned with it. The lower the temperature of the spin packet (or
the stronger the external field!), the greater the expected degree of alignment in thermal
equilibrium.
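To put a number on equation 6.93, here is a small Python sketch that evaluates U = −γ Sz B0 for a proton whose Sz is +ħ/2 or −ħ/2 (the spin-1/2 values mentioned in the footnote on the g-factor). The numerical values of ħ and γ, the 1.5 tesla field, and the function name are all assumptions made for illustration.

```python
HBAR = 1.054571817e-34     # reduced Planck constant, J*s (assumed value)
GAMMA_PROTON = 2.675e8     # proton gyromagnetic ratio, rad/(s*T) (assumed value)


def potential_energy(s_z, b0, gamma=GAMMA_PROTON):
    """U = -m_z * B0 with m_z = gamma * S_z (equation 6.93)."""
    return -gamma * s_z * b0


B0 = 1.5  # tesla; an illustrative field strength
u_aligned = potential_energy(+HBAR / 2, B0)   # spin z-component along the field
u_anti = potential_energy(-HBAR / 2, B0)      # spin z-component against the field
gap = u_anti - u_aligned                      # equals gamma * hbar * B0

print(f"aligned:     {u_aligned:.3e} J")
print(f"antialigned: {u_anti:.3e} J")
print(f"gap:         {gap:.3e} J")
```

The gap works out to a few times 10^-26 joules, which is the "finite range of energies" that the qualitative temperature argument above refers to.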
If one starts from no field and then “suddenly” turns on a field, the spin packet is initially
not in thermal equilibrium. Nor will the external magnetic field itself put it into thermal
89 Wikipedia: http://www.wikipedia.org/wiki/G-factor (physics). This is provided primarily for physics majors,
who will one day be expected to understand most of the omitted details in this discussion. Note well that this
article uses $\pm\hbar/2$ as its spin angular momentum, which is technically the magnitude of $S_z$, not its total angular
momentum, which would usually be given as $\sqrt{s(s+1)\hbar^2} = (\sqrt{3}/2)\hbar$ for a spin-$\frac{1}{2}$ particle. This is irrelevant to
my purpose here, though, which is to give you a decent conceptual idea of what happens to generate a spin
echo, given the appropriate resonance frequency.
90 Wikipedia: http://www.wikipedia.org/wiki/Proton magnetic moment. This is specific to the NMR/MRI/spin
echo theory for a model proton that we are discussing.
equilibrium – the conservative magnetic interaction itself will just cause individual spins to
precess around the magnetic field in the z-direction with a constant potential energy and
won’t alter the angle between the spin and the external field at all!
There are two ways the spin can alter its orientation relative to this simple precession
around the strong external field. The first is that each spin interacts with the bulk material
it is a part of, referred to as the “lattice” of atoms and molecules, and can absorb energy
from or lose energy to this lattice. This is called the “spin-lattice” interaction and leads to
spin-lattice relaxation towards thermal equilibrium with that lattice. In time, this interaction
causes the spins in the packet to align more with the external field than against it, devel-
oping a macroscopic collective magnetic moment as it does so in the direction of the field.
The relaxation process is best described by our “saturating exponential” solution like that
describing the charging of a capacitor in an RC circuit, with a time constant called T1 , the
spin-lattice decay time.
Typical values for T1 for protons in living tissue at fields of 1.5 tesla and normal body
temperatures are observed to fall between 1 and 2 seconds and get longer at higher field
strengths. One achieves 99% of the possible peak magnetization of a sample in 4.6 × T1
seconds after turning on the field with the spins initially in a random, unmagnetized state
(a useful number to remember for exponential saturation processes).
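The 4.6 × T1 rule of thumb can be checked with a few lines of Python. This sketch assumes the saturating-exponential form M(t)/M_max = 1 − exp(−t/T1) described above; the value T1 = 1.5 s is just an illustrative pick from the quoted range, and the function names are hypothetical.

```python
import math


def magnetization_fraction(t, t1):
    """Fraction of the peak magnetization reached a time t after the field
    is switched on, assuming M(t)/M_max = 1 - exp(-t / T1)."""
    return 1.0 - math.exp(-t / t1)


def time_to_fraction(fraction, t1):
    """Invert the saturating exponential: time needed to reach `fraction`."""
    return -t1 * math.log(1.0 - fraction)


T1 = 1.5  # seconds; an illustrative value inside the 1-2 s range quoted above

for multiple in (1.0, 2.0, 4.6):
    frac = magnetization_fraction(multiple * T1, T1)
    print(f"t = {multiple:.1f} T1 -> {100 * frac:.1f}% of peak magnetization")

print(f"time to 99%: {time_to_fraction(0.99, T1) / T1:.3f} T1")
```

The last line shows where the number comes from: reaching 99% takes ln(100) ≈ 4.6 time constants, whatever the value of T1.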
The second way a spin can alter its orientation is via local spin-spin interactions – one
spin reacting to the magnetic field produced by a second, nearby spin as both interact
systematically with the strong external field and more or less randomly with the lattice.
These interactions are conservative and can only exchange energy, but they do cause the
components of the spins perpendicular to the direction of the strong field to spread out and
experience a transverse exponential “spin-spin relaxation” with a characteristic exponential
decay time referred to as T2 .
A third way spins in a bulk sample can experience a kind of “reversible” transverse
relaxation is because the strong external field experienced by all of the spin packets that
make up the sample is not exactly the same or equal to the “average” magnetic field in the
sample! Some spin packets will experience slightly stronger fields than the average and
will systematically pull ahead of the precession of a spin packet in the average field. Others
will be in slightly weaker fields than average and will systematically get left behind. This,
too, leads to an energy-conserving exponential decay of the average component of the
magnetization (if any) perpendicular to the external field called inhomogeneous relaxation
T2,inhomogeneous .
Even this isn’t everything. Spins can interact with the more or less “fixed” static magnetic
field produced by their local chemical environment (the orbiting electrons, for example)
to produce an effect similar to and mixed in with spin-spin T2 . The T2 decay times can
depend on the field strength itself, and hence on T1 . When we (eventually) hit a system
with a pulse of radiation at some fixed frequency, that frequency is “broadened” by purely
Fourier effects, twisting the simple rotations described below sideways. We will lump all
of the transverse decay times into one “average” time and call it T2∗ , where a significant
fraction of T2∗ will be due to T2,inh , and leave it at that.
Now let’s trace out the mechanics of spin echoes and magnetic resonance by considering
a chunk of protons (made up of many spin packets with slightly different resonant
frequencies) that have been in a strong $\vec{B}$-field for many times T1 and hence have more or
less completely relaxed to the low energy state of maximum equilibrium alignment with the
strong external $\vec{B}$ field.
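[Figure 6.16 (image not reproduced): axes x, y, z; strong field B0 ẑ along the z-axis; weak rotating field B; proton moment.]
Figure 6.16: A single proton precessing around a strong $\vec{B}$-field aligned with the z-axis will
also precess around a weak $\vec{B}$-field rotating around the z-axis at the resonant frequency
in the “rotating frame”.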
Suppose we look at a single proton in this very strong field oriented in the z-direction,
where the field is represented as the thick blue arrow in figure 6.16. Its spin/magnetic
moment is classically portrayed as a solid red arrow initially “mostly” parallel to the $\vec{B}$-field
but with a small x̂ component. In the absence of any other fields:
$$\vec{\tau}_{\rm tot} = \frac{d\vec{S}}{dt} = \vec{m} \times \vec{B} = \gamma \vec{S} \times (B_0 \hat{z}) \tag{6.94}$$
We solved this above with a few minor tweaks: $\vec{m}$ precesses rapidly around the $\vec{B}$-field
aligned with the z-axis with angular Larmor frequency Ωp = γB0 , as we derived above
for the classical angular momenta of spinning charged balls. Fortunately, this classical
behavior is almost perfectly preserved in the case of quantum spins, so our classical picture
is good enough for us to conceptually understand what is going on!
Note that so far, we are assuming that this proton is in a locally uniform field that is
precisely equal to the average field for the entire sample, and are momentarily neglecting
both spin-lattice relaxation (T1 ) and the spin-spin interaction between neighboring protons
and other T2 stuff because they are slow relative to the period of precession of the spin in
an e.g. 1.5 T field, $T_p = 2\pi/\Omega_p$. We’ll add all of this stuff back later.
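To see just how fast this precession is, the Python sketch below evaluates Ωp = γB0 and Tp = 2π/Ωp for a proton in a 1.5 T field. The gyromagnetic ratio is an assumed tabulated value (about 2.675 × 10^8 rad/(s T)) and the function names are illustrative only.

```python
import math

GAMMA_PROTON = 2.675e8  # proton gyromagnetic ratio, rad/(s*T) (assumed value)


def larmor_angular_frequency(b0, gamma=GAMMA_PROTON):
    """Omega_p = gamma * B0, in rad/s."""
    return gamma * b0


def precession_period(b0, gamma=GAMMA_PROTON):
    """T_p = 2 pi / Omega_p, in seconds."""
    return 2.0 * math.pi / larmor_angular_frequency(b0, gamma)


B0 = 1.5  # tesla
omega_p = larmor_angular_frequency(B0)
freq_mhz = omega_p / (2.0 * math.pi) / 1.0e6

print(f"Omega_p = {omega_p:.3e} rad/s")
print(f"f_p     = {freq_mhz:.1f} MHz")
print(f"T_p     = {precession_period(B0) * 1e9:.1f} ns")
```

The period comes out in the neighborhood of 16 nanoseconds, so a proton precesses on the order of a hundred million times during a single T1 of a second or two. That is why the relaxation processes can be neglected over a few precession periods.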
Now, at t = 0 we mentally turn on a much weaker magnetic field:
$$\vec{B}_\perp(t) = B_\perp \left(\cos(\Omega_p t)\hat{x} - \sin(\Omega_p t)\hat{y}\right) \tag{6.95}$$
while leaving the strong field in the z direction on. Note that this field is perpendicular to
B0 ẑ, initially points (at t = 0) in the x̂ direction, and rotates around the z-axis at the
same (average) Larmor frequency Ωp and in the same direction that the magnetic
dipole moment of our typical proton precesses around the strong field!
Technically, then, we should try to solve the differential equation involving all three field
components, two of them time dependent:
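$$\vec{\tau}_{\rm tot} = \frac{d\vec{S}}{dt} = \vec{m} \times \vec{B} = \gamma \vec{S} \times \left(B_\perp \cos(\gamma B_0 t)\hat{x} - B_\perp \sin(\gamma B_0 t)\hat{y} + B_0 \hat{z}\right) \tag{6.96}$$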
to obtain $\vec{S}(t)$ and hence $\vec{m}(t)$, but in general this set of coupled differential equations with
time-dependent coefficients is a mess and trying to solve it analytically will make you very sad!
Fortunately, however, a mathematical “miracle” happens in the special case when
B⊥ ≪ B0 . When this is true, it turns out we can describe the motion of the magnetic
moment fairly accurately by transforming our entire description to the so-called rotating
frame91 – a frame that is rotating around the z-axis at the Larmor frequency Ω = γB0 and
in the same direction that both $\vec{B}_\perp$ and $\vec{m}$ are precessing. In this (primed) frame:
$$\vec{B}'_\perp(t) = B_\perp \hat{x}' \tag{6.97}$$
is stationary. If B⊥ = 0, the magnetic dipole moment of our ideal proton $\vec{m}'$ (primed as it
is expressed in the rotating frame) is also stationary in this frame.
In this special case, we can more or less ignore B0 altogether in the rotating frame
and solve for the precession of $\vec{m}'$ around $\vec{B}_\perp$ as if it were the only field present! This
precession will be “slow” compared to the rate the moment is whipping around the z axis,
which is why the motion can be separated in this way. In the end, we can find the solution
in the rotating frame and then rotate that solution back into the lab frame via Ωp to get a
very respectable description of the state in the lab frame at any particular time!
Understanding (and quantitatively describing) the motion is now easy! In the rotating
frame $\vec{m}'$ (comparatively) slowly sweeps out a cone around the x′-axis at frequency γB⊥
while at the same time, in the lab frame both $\vec{m}$ and $\vec{B}_\perp$ continue whirling around the z-axis
at the resonant frequency Ωp = γB0 !
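The rotating-frame picture can be tested numerically. The Python sketch below integrates the full lab-frame equation 6.96 with a fourth-order Runge-Kutta stepper and compares the result with the rotating-frame prediction (slow precession about x′ at γB⊥, rotated back into the lab frame at γB0). It is only an illustration under stated assumptions: the units are scaled so that γ = B0 = 1, the ratio B⊥/B0 = 0.05 is arbitrary, the spin is started exactly along z rather than with a small x̂ component, and all function names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def spin_rate(t, s, gamma, b0, b_perp):
    """Right-hand side of equation 6.96: dS/dt = gamma * S x B(t)."""
    w = gamma * b0
    field = (b_perp * math.cos(w * t), -b_perp * math.sin(w * t), b0)
    return tuple(gamma * c for c in cross(s, field))


def integrate_lab_frame(s0, t_end, n_steps, gamma, b0, b_perp):
    """Fourth-order Runge-Kutta integration of the lab-frame equation."""
    dt = t_end / n_steps
    s, t = tuple(s0), 0.0
    args = (gamma, b0, b_perp)
    for _ in range(n_steps):
        k1 = spin_rate(t, s, *args)
        k2 = spin_rate(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(s, k1)), *args)
        k3 = spin_rate(t + dt / 2, tuple(x + dt / 2 * k for x, k in zip(s, k2)), *args)
        k4 = spin_rate(t + dt, tuple(x + dt * k for x, k in zip(s, k3)), *args)
        s = tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                  for x, a, b, c, d in zip(s, k1, k2, k3, k4))
        t += dt
    return s


def rotating_frame_prediction(t, gamma, b0, b_perp):
    """Spin that starts along z, precesses about x' at gamma * B_perp in the
    rotating frame, and is then rotated back into the lab frame."""
    slow, fast = gamma * b_perp * t, gamma * b0 * t
    return (math.sin(slow) * math.sin(fast),
            math.sin(slow) * math.cos(fast),
            math.cos(slow))


GAMMA, B0, B_PERP = 1.0, 1.0, 0.05   # scaled, dimensionless illustration
t_quarter = 0.25 * 2.0 * math.pi / (GAMMA * B_PERP)  # quarter of a slow period
numeric = integrate_lab_frame((0.0, 0.0, 1.0), t_quarter, 20000, GAMMA, B0, B_PERP)
predicted = rotating_frame_prediction(t_quarter, GAMMA, B0, B_PERP)

print("numeric  :", tuple(round(x, 6) for x in numeric))
print("predicted:", tuple(round(x, 6) for x in predicted))
```

After a quarter of the slow precession period the z-component of the spin has dropped to essentially zero in both calculations, which is the tipping into the transverse plane that the rotating-frame argument describes.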
By varying the time you leave the perpendicular field $\vec{B}_\perp$ turned on, you can cause this
ideal proton’s magnetic moment $\vec{m}'$ to sweep out a cone through the controlled azimuthal
angle π/2 (a so-called “π/2 pulse”) or π (a “π pulse”) around the rotating x′ axis! We can
even compute just how long the rotating field should be turned on to accomplish either one
because we know that one period of slow precession in the rotating frame should be:
$$T_{\rm rot} = \frac{2\pi}{\gamma B_\perp} \tag{6.98}$$
so that a rotating field turned on for a quarter of a period, Trot /4 would be a π/2 pulse while
a half a period Trot /2 would be a π pulse!
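Equation 6.98 turns directly into pulse durations. The Python sketch below computes them for a proton; the rotating-field amplitude B⊥ = 10 microtesla is a hypothetical value chosen only to make the arithmetic concrete, the gyromagnetic ratio is an assumed tabulated value, and the function name is illustrative.

```python
import math

GAMMA_PROTON = 2.675e8  # proton gyromagnetic ratio, rad/(s*T) (assumed value)


def pulse_duration(angle, b_perp, gamma=GAMMA_PROTON):
    """Time the rotating field must stay on to tip the moment through
    `angle` radians: angle / (gamma * B_perp) = (angle / 2 pi) * T_rot."""
    return angle / (gamma * b_perp)


B_PERP = 1.0e-5  # tesla; hypothetical rotating-field amplitude

t_rot = pulse_duration(2.0 * math.pi, B_PERP)        # one full slow period
t_half_pi = pulse_duration(math.pi / 2.0, B_PERP)    # pi/2 pulse = T_rot / 4
t_pi = pulse_duration(math.pi, B_PERP)               # pi pulse   = T_rot / 2

print(f"T_rot      = {t_rot * 1e3:.3f} ms")
print(f"pi/2 pulse = {t_half_pi * 1e3:.3f} ms")
print(f"pi pulse   = {t_pi * 1e3:.3f} ms")
```

Doubling B⊥ halves every one of these times, so the pulse length and the pulse amplitude can be traded against each other.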
91 The easiest way to think of the rotating frame is to imagine a flat merry-go-round with an x′-y′ axis painted
onto its floor and the center pole marked as the z = z′ axis. When this frame rotates around the z/z′ axis at
the Larmor frequency in the right direction, a spin precessing around a strong $\vec{B}$-field lined up with z′ appears
– to riders on the merry-go-round – stationary, with constant Sx′ , Sy′ , Sz′ components! The trick, then, is
to arrange it so that $\vec{B}_\perp$ is also stationary in the rotating frame and weak enough that the spins that were
stationary in the frame now slowly precess around $\vec{B}_\perp$ in that frame while the entire frame rotates.
Makes you kind of dizzy, doesn’t it...
Figure 6.16 illustrates the effect of a π-pulse, taking the (initial) solid red $\vec{m}'$ vector from
mostly aligned with z to the (final) dashed red $\vec{m}'$ vector that is mostly anti-aligned with z.
Note well that this corresponds to the proton absorbing energy from the electromagnetic
radiation associated with the rotating $\vec{B}_\perp$-field! For times that are short relative to the
spin-lattice relaxation time T1 , the proton will remain in this new orientation, still precessing
around the z-axis at the Larmor frequency Ωp but in a higher energy state than it began in
at time t = 0!
Now consider a collection of protons in a spin packet in thermal equilibrium with the
strong field. By regulating the amount of time you leave on the rotating transverse field,
$\vec{B}_\perp$, you can take the entire spin packet from the initial low energy state where its spins
are mostly parallel to B0 ẑ into almost any desired state of spin polarization in the rotating
frame. The spins remain there after the transverse field is turned off until spin-spin and
spin-lattice relaxation destroy the transverse component of the collective magnetization
and, as they do so, gradually rebuild the collective moment in the z direction!
We are now finally ready to understand spin echoes and how they enable Magnetic
Resonance Imaging (MRI). Here is the sequence of events.
[Figure panel (rotating frame; axes x, y, z).]
First, a collection of protons is placed in a very strong B-field oriented in (say) the
z-direction. The protons quickly (in, say, more than 4.6 × T1 to achieve over 99% magnetization)
“relax” so that they are predominantly in the lowest potential energy state with their spins
mostly aligned with this field consistent with thermal equilibrium at the temperature of the
sample.
[Figure panel (rotating frame; axes x, y, z): π/2 pulse, B π/2.]
Next, $\vec{B}_\perp(t)$ (in the x′-direction in the rotating frame) is turned on in a π/2 pulse, rotating
the collective $\sum_i \vec{m}_i$ of many protons all at once so they end up pointing in the y′-direction
in the rotating frame. For a moment electromagnetic energy is strongly radiated out of the
sample by the large collective rotating magnetic dipole, but that energy is mixed in with the
π/2 pulse and difficult to detect.
[Figure panel (rotating frame; axes x, y, z): dephasing around z.]
Third, individual spin packets with collective moment $\vec{m}_i$ have very slightly different
precession frequencies and systematically “dephase” due to T2,inh and get ahead of or behind
the “ideal spin” that is precessing at precisely the Larmor frequency Ωp (and hence is
stationary in the rotating frame). The total moment $\sum_i \vec{m}_i$ summed over the spin packets
in a given coarse grained chunk of the material being scanned rapidly averages to zero.
[Figure panel (rotating frame; axes x, y, z): π pulse at time τ.]
At a time TR (where T2 < TR < T1 ) after the π/2 pulse, the protons are hit with a π pulse of
$\vec{B}_\perp(t)$ that rotates them around the x′-axis so that their y′ component (only) inverts in the
rotating frame. The spin packets remain in the same slightly inhomogeneous field, however, so
their collective moments continue to get ahead or behind in the same direction in the rotating
frame, unwinding the total angle they accumulated since the π/2 pulse!
[Figure panel (rotating frame; axes x, y, z): moments come back together at time 2τ.]
At a time 2TR after the original π/2 pulse, the magnetic moments of all of the spin packets in
the coarse grained chunk all come back together into a single collective magnetic moment, in a
quiet environment with no competing radiation! This large collective rotating dipole (somewhat
reduced by the spins that relaxed back into the mostly aligned state during 2TR ) strongly
radiates electromagnetic energy, in a so-called spin echo pulse.
[Figure 6.17 (image not reproduced): time axis +t with marks at TR and 2TR; initial signal (hidden) and final signal.]
Figure 6.17: The key steps in nuclear spin echoes. The two red pulses show the intensity
of an external rotating magnetic field applying π/2 and π pulses. The two blue pulses show
the intensity of the radiated signal from the rotating spins. The black dashed curve shows
the gradual loss of intensity in the second “echo” pulse due to T1 (and the irreversible part
of T2 ) decay back into a state aligned with the external field during the entire process.
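The dephasing and refocusing steps above can be mimicked with a toy Python calculation in the rotating frame. Each spin packet is represented by a unit complex number m = m_x′ + i m_y′ that rotates at its own small frequency offset, and the π pulse about x′ flips the sign of m_y′, which is complex conjugation. Everything specific here is an assumption made for illustration: the Gaussian spread of offsets, the number of packets, the dimensionless time units, the instantaneous pulses, and the complete neglect of T1 and irreversible T2 decay.

```python
import cmath
import random


def net_transverse_moment(offsets, t, t_pi):
    """Magnitude of the average transverse moment in the rotating frame.

    All packets start along +y' just after the pi/2 pulse (at t = 0) and
    rotate at their own frequency offset. An idealized, instantaneous pi
    pulse about x' at time t_pi inverts the y' component of every packet.
    """
    total = 0j
    for d in offsets:
        if t <= t_pi:
            m = 1j * cmath.exp(1j * d * t)
        else:
            m_before = 1j * cmath.exp(1j * d * t_pi)
            # Conjugation inverts m_y'; the packet then keeps rotating the same way.
            m = m_before.conjugate() * cmath.exp(1j * d * (t - t_pi))
        total += m
    return abs(total) / len(offsets)


rng = random.Random(0)                                  # fixed seed for repeatability
offsets = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # rad per unit time
T_R = 5.0                                               # time of the pi pulse

for t in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(f"t = {t:4.1f}  net moment = {net_transverse_moment(offsets, t, T_R):.3f}")
```

In this idealized model the net moment starts at 1, falls to nearly zero by the time of the π pulse, and returns to 1 at t = 2TR. In a real sample the echo is smaller than the initial signal, as the dashed curve in figure 6.17 indicates.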
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
A particle with mass m and charge q has a velocity $\vec{v}$ perpendicular to a uniform magnetic
field $\vec{B}$ (with magnitude $B = |\vec{B}|$). Find: a) the radius R of its orbit; b) the period of the
orbit; c) the momentum of the particle; d) the kinetic energy of the particle. All answers but
the first should be in terms of q, m, B and R – no v should appear in b-d.
Problem 3.
A rigid circular loop of wire with mass m, N turns and radius R carries a current I in each
turn and is sitting on a rough table. There is a horizontal magnetic field B that is parallel
to the surface of the table in some direction (call it x). What is the minimum value of B
sufficient to lift one edge of the loop off of the table? On your figure, clearly indicate which
edge lifts relative to the directions you select for I and B.
Problem 4.
[Figure: loop of N turns carrying current I in a field B; its magnetic moment m makes an angle θ with the field.]
A circular loop of wire with radius R, N turns, and total mass M carries a current I.
It is pivoted about a line that passes through the loop as shown, then placed in a uniform
magnetic field $\vec{B} = B_0 \hat{z}$ so that its magnetic moment makes an initial angle of θ ≪ π with
the z-axis at time t = 0, and is then released.
Describe its small-angle motion quantitatively. Note well that this arrangement has no
angular momentum to speak of and will not precess!
Problem 5.
Problem 6.
[Figure: rod with charge Q and mass M rotating at angular frequency ω.]
A nonconducting rod of total mass M and length L has a charge Q uniformly distributed
along it. It is pivoted around one end and is rotating in the x − y plane around the z-axis at
angular frequency ω.
a) Consider a small bit of charge dq a distance r from the pivot and compute its average
magnetic moment in the z-direction, dmz .
b) Integrate this result and find the total magnetic (dipole) moment of the rotating rod
$m_z$.
c) Show that the result can be expressed as $m_z = \frac{Q}{2M} L_z$ where Lz is the angular
momentum of the rod about the pivot (that is to say, in the z-direction).
Problem 7.
[Figure: disk of radius R, mass M, and charge Q rotating about the z-axis at angular velocity ω.]
A disk of radius R and thickness t, with uniform charge density ρq and uniform mass
density ρm is rotating at angular velocity $\vec{\omega} = \omega \hat{z}$.
Consider a tiny differential chunk of the disk’s volume dV = dA t located at r, θ in
cylindrical polar coordinates. Note that this chunk is orbiting the z-axis at angular frequency
ω in a circular path.
a) Find the magnetic moment dmz of this chunk in terms of ρq , ω, dV and its coordi-
nates.
b) Find the angular momentum dLz of this chunk in terms of ρm , ω, dV and its coordi-
nates.
c) Doing the two (simple) integrals, express them in terms of the total charge and total
mass of the disk, respectively, and show that the magnetic moment of the disk is given by
$\vec{m} = \mu_B \vec{L}$, where $\mu_B = \frac{Q}{2M}$.
d) What do you expect the magnetic field of this disk to look like on the z axis for z ≫ R?
(Answer in terms of $\vec{m}$ is fine.)
Advanced Problem 8.
Using the insight gained from the previous two problems, consider any of the symmetric
distributions of charge and mass, where the mass distribution is the same as the charge
distribution and where both are “balanced” rotationally. Find a relationship between dI (the
moment of inertia of a small chunk of mass dm at a radius r) and dmz (the magnetic moment
of the same small chunk of charge dq at the radius r) to show that, for all distributions
with sufficient (balanced) symmetry that $L_z = I\Omega$, $m_z = \frac{Q}{2M} L_z$. This result therefore holds
for spheres, cylinders, disks, rods (in a plane), spherical or cylindrical shells, etc.
300 Week 7: Sources of the Magnetic Field
Advanced Problem 9.
A semi-infinite thin solenoid aligned with (say) the negative z-axis so that the “+” end is at
the origin creates a magnetic field that looks like that of a point magnetic charge qm at the
origin:
$$\vec{B} = k_m \frac{q_m \hat{r}}{r^2}$$
at points “near” the end and outside of the solenoid itself. Note that $k_m = \mu_0/4\pi = 10^{-7}$
tesla-meter/ampere is the magnetic field constant, analogous to ke for the electric field, and that µ0 is
called the magnetic permeability, none of which matters more than algebraically for this
problem but which is important next week!
Suppose you take a small bar magnet and place it at $\vec{r} = r\hat{r}$ so its magnetic moment
$\vec{m}$ is aligned with r̂. Find the force acting on it (if any).
What would you expect its motion to be if you placed it at the same point so that its
moment was not initially aligned with the magnetic field?
$$\frac{d\vec{L}}{dt} = \mu_p \vec{L} \times \vec{B} \tag{6.99}$$
• Magnetic and electric fields are clearly connected in many ways (some of them still
to be learned). A perfectly reasonable question is: “Are magnetic fields created by
a magnetic charge the same way electric fields are created by electric charge?” We
will refer to such an isolated “north” or “south” pole as a magnetic monopole.
It is empirically the case that no isolated magnetic charges have been experimentally
observed, in spite of an electromagnetic theory that “begs” for them, a quantum
theory that can explain charge quantization if a single magnetic monopole exists in
the Universe, and in spite of an intense experimental search for them. It is probably
safe to say that magnetic monopoles are at the very least rare in all the places we’ve
looked for them!
where the magnetic field constant km = 10−7 tesla-meter/ampere exactly. Note that
this constant is exact because SI units define the coulomb of charge as an ampere-
second, not the other way around – Ampere “got there first”.
• $$d\vec{B} = k_m \frac{I\, d\vec{\ell} \times \hat{r}}{r^2} = \frac{\mu_0}{4\pi} \frac{I\, d\vec{\ell} \times \hat{r}}{r^2} \tag{7.4}$$
is known as the Biot-Savart Law (Bee-oh Sa-var Law) for the magnetostatic field
produced by stationary currents in a wire. In this expression, $d\vec{\ell}$ is a differential
length of the wire with a direction pointing in the direction of the current I in the wire.
It must be integrated over the wire(s), presumed to carry a constant (or very slowly
varying) current I.
• You should be able to use the Biot-Savart law to find the field of a straight wire seg-
ment, a current carrying loop on its axis of symmetry, and on the axis of symmetry
of a rotating ring or disk of charge. From either of the latter two (far from the disk
or ring) you should be able to guess the general magnetic field of a magnetic dipole
in terms of its dipole moment in analogy with the field of an electric dipole. Note
that the results of doing this are not included in this summary because you aren’t to
memorize them, you are to learn how to find them!
• The actual source for magnetic fields (in the absence of monopoles) is moving charge,
either in the form of a more or less continuous current or from moving discrete point
charges. The correct description of the field produced by a moving point charge requires
the theory of relativity and hence is beyond the scope of this course, but one
can obtain an approximate result for the field produced by a slowly moving point
charge (relative to the speed of light) from the Biot-Savart Law by treating the current
in a differential chunk of the wire as a “moving point charge”, that is $I\, d\vec{\ell} \Leftrightarrow q\vec{v}$ (a
coarse-grained equivalence we have used before). The result is (see footnote 92):
$$\vec{B} = k_m \frac{q\vec{v} \times \hat{r}}{r^2} = \frac{\mu_0}{4\pi} \frac{q\vec{v} \times \hat{r}}{r^2} \tag{7.7}$$
• Ampere’s Law – for magnetostatic/steady-state currents – is
$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 I_{\text{thru } C} = \mu_0 \int_{S/C} \vec{J} \cdot \hat{n}\, dA \tag{7.8}$$
92 Note Well! This result is not general and will not work for charges moving at any appreciable fraction of
the speed of light or at points in space that are “distant” from the source charge! The correct magnetic field
produced by a point charge at the origin moving in the z direction at speed v at a point P = (r, θ, φ) in spherical
polar coordinates is:
$$\vec{B} = k_m \left( \frac{1 - \frac{v^2}{c^2}}{\left(1 - \frac{v^2}{c^2} \sin^2\theta\right)^{3/2}} \right) \frac{q\vec{v} \times \hat{r}}{r^2} \tag{7.6}$$
which obviously reduces to the form above when v ≪ c. You are not responsible for knowing this form (or
the related form for the electric field of a moving charge, when the finite speed of propagation of both fields
are taken into account) but far too many textbooks give the non-relativistic Biot-Savart result for the “field of a
moving point charge” without the v ≪ c qualification.
• ...there is a conceptual error in this “broken” version of Ampere’s Law. The current I
through an open surface S bounded by a closed curve C is not invariant as we evaluate
the current through different surfaces bounded by the same closed curve C!
As a challenge for physics or math majors (or others who just like a challenge!): From
this one observation, plus your knowledge that charge is conserved (so that the net
flow of charge out of any closed volume must equal the rate at which the charge
inside that volume decreases in time):
$$\frac{dQ}{dt} = -\oint_S \vec{J} \cdot \hat{n}\, dA \tag{7.9}$$
you should be able to deduce the necessity for an additional term known as Maxwell’s
Displacement Current which makes the total current invariant as we select surfaces
bounded by the closed curve C to compute the current through.
If you can do this on your own without looking ahead in the textbook, well, you’re just
a bit late for a Nobel prize, but this is the general idea for how you will eventually go
about winning one: find an inconsistency in a physical theory and resolve it. Unify
two fields! Explain something mysterious in a way that agrees with observation! You
too can have your name on something one day...
• On a more mundane level, all students taking this course should learn to use Am-
pere’s Law to find the magnetic field of any steady state: cylindrically symmetric
current distribution, a (long) solenoid, a toroidal solenoid, and a plane sheet of
current (and trivial integrable or summable variations of these four special geome-
tries). These are all taught and reinforced in lecture, in the examples below, and in
the homework (and in-class problems if your particular class is using active Team
Based Learning).
• Useful True Fact: We do not usually deduce a scalar magnetic potential analogous to
the electric potential, because magnetic fields do no work on isolated point charges,
so our entire method for deriving a potential fails! In a future, more advanced, elec-
trodynamics class physics majors will learn about a vector potential that leads to the
magnetic field by virtue of a different form of multivariate vector differentiation (taking
the “curl” of the vector potential) rather than taking the gradient of the potential as
is the case for finding the electric field from the electric potential. This is why we
will stop with “direct” evaluation of the magnetic field via the Biot-Savart law or via
Ampere’s Law for the cases where we can exploit symmetry to make the evaluation
easy (enough) to teach the laws without undue mathematical hardship and omit any
further discussion of magnetic potentials in this textbook.
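As an illustration of how the Biot-Savart law in this summary is used in practice, the Python sketch below adds up the contributions dB = k_m I dℓ × r̂ / r² from many short segments of a circular current loop and evaluates the field at a point on the loop’s axis. The loop radius, current, field point, segment count, and function name are all hypothetical choices. The closed-form expression used for comparison is the standard on-axis result for a loop, quoted here only as a check on the numerical sum, not as something to memorize.

```python
import math

K_M = 1.0e-7  # magnetic field constant k_m = mu_0 / (4 pi), tesla-meter/ampere


def loop_field_on_axis(current, radius, z, n_segments=1000):
    """Sum dB = k_m I dl x r_hat / r^2 over short segments of a circular
    loop lying in the x-y plane; return (Bx, By, Bz) at the point (0, 0, z)."""
    bx = by = bz = 0.0
    dphi = 2.0 * math.pi / n_segments
    for k in range(n_segments):
        phi = (k + 0.5) * dphi
        # Segment position and direction (current circulates counterclockwise).
        sx, sy = radius * math.cos(phi), radius * math.sin(phi)
        dlx = -radius * math.sin(phi) * dphi
        dly = radius * math.cos(phi) * dphi
        # Vector from the segment to the field point, and its length.
        rx, ry, rz = -sx, -sy, z
        r = math.sqrt(rx * rx + ry * ry + rz * rz)
        scale = K_M * current / r**3          # r_hat / r^2 = r_vec / r^3
        bx += scale * (dly * rz)              # (dl x r)_x with dl_z = 0
        by += scale * (-dlx * rz)             # (dl x r)_y with dl_z = 0
        bz += scale * (dlx * ry - dly * rx)   # (dl x r)_z
    return bx, by, bz


I_LOOP, R_LOOP, Z_POINT = 1.0, 0.10, 0.05     # amperes, meters, meters (hypothetical)
bx, by, bz = loop_field_on_axis(I_LOOP, R_LOOP, Z_POINT)
closed_form = 2.0 * math.pi * K_M * I_LOOP * R_LOOP**2 / (R_LOOP**2 + Z_POINT**2) ** 1.5

print(f"numerical Bz   = {bz:.6e} T")
print(f"closed form Bz = {closed_form:.6e} T")
print(f"transverse components: Bx = {bx:.1e} T, By = {by:.1e} T")
```

The transverse components cancel by symmetry, leaving only the component along the axis, which is the same symmetry argument one uses when doing the integral by hand.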
At this point we know a rather lot about the magnetic field. We know that moving charges
experience a magnetic force when they move through a magnetic field, and we further
know that that force is “odd” compared at least to the Coulomb electrostatic force which
304 Week 7: Sources of the Magnetic Field
(like gravity) acted on the “right line” connecting two charges. As you can see (looking over
the chapter summary above), there are other surprises associated with the magnetic field
waiting to be learned, some of which will eventually lead us toward the theory of special
relativity and more! For now, though, it is time to search for the sources of the magnetic
field.
It is perfectly reasonable to begin our search by saying to ourselves: “Gee, we just
spent all of this time learning about electrostatic fields coupled to monopolar electrical
charges that behave like $k_e q_e/r^2$. I know about the gravitational field too, which behaves
like $Gm/r^2$. Is it just barely possible that there is a quantity that behaves like a gravitational
mass or an electrical monopolar charge that is similarly a source of the magnetic field so
that:
\[ \vec{B}(\vec{r}) = \frac{k_m q_m \hat{r}}{r^2} \quad (???) \tag{7.10} \]
for a magnetic point charge located at the origin?”
If there is, then we would expect the field of a collection of “magnetic monopolar”
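charges $q_m$ (see footnote 93) to be given by:
\[ \vec{B}(\vec{r}) = \sum_i \frac{k_m q_{mi} (\vec{r} - \vec{r}_i)}{|\vec{r} - \vec{r}_i|^3} \tag{7.11} \]
\[ \vec{F}_{12} = \frac{k_m q_{m1} q_{m2} (\vec{r}_1 - \vec{r}_2)}{|\vec{r}_1 - \vec{r}_2|^3} \tag{7.12} \]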
and so on and so forth (magnetic potential energy, magnetic potential, etc), where I’ve
introduced a magnetic force constant km equivalent to ke to set an SI scale for the units of
magnetic charge and force.
If monopoles such as these existed, clearly I could derive a Gauss’s Law for Magnetism:
\[ \oint_S \vec{B} \cdot \hat{n}\, dA = 4\pi k_m Q_{m,\text{in } S} = \mu_0 \int_{V/S} \rho_m\, dV \tag{7.13} \]
proceeding exactly as I did before for an isolated electrical charge! This would make the
static electrical and magnetic fields, at least, identical to one another, and even would
suggest that there would be a force on a magnetic charge moving in an electrical field
that has that pesky velocity dependent cross product in it, to maintain the symmetry even
further.
As we’ll see later, we even know what the magnetic field constant km would have to be.
It is:
\[ k_m = 1.00000\ldots \times 10^{-7}\ \text{tesla-meter/ampere} \tag{7.14} \]
exactly (exactly because it defines the coulomb, not the other way around) as was deter-
mined and defined by Ampere in his experiments on magnetism!
Footnote 93: I meditated for quite a time on what symbol to use for magnetic charge in this book. There are no particularly
good choices. The one I initially leaned towards is g, which is sort of like a q but backwards, but this conflicts
with the gravitational field. I finally went with qm , even though this will require me to sometimes refer to
electrical charge as qe when I’m discussing the two kinds of charge together. This is tedious, however, in the
long run, so be warned: q by itself will generally refer to electrical charge; I will always add the subscript m
when discussing magnetic monopoles.
At the time of Maxwell and for twenty or so years afterwards, the lack of any evidence
for monopoles was simply an accepted fact. Pierre Curie, on the other hand, pointed out in
1894 that Maxwell’s equations could be made consistent with magnetic monopoles,
so they might exist, but again lacking evidence that they did, nobody took the possibility
seriously. Finally, in 1931 Paul Dirac published a theoretical demonstration that if at least
one magnetic monopole existed in the accessible Universe surrounding a point electric
charge, it would explain charge quantization, something that at the time was a complete
mystery.
Well, dangle bait like that in front of a bunch of physicists and they’ll be haring off
to the laboratory to search for magnetic monopoles, visions of Nobel Prizes and trips to
Stockholm to meet the king dancing through their minds. For at least 90 years at this
point, intense effort has been expended searching experimentally for magnetic monopoles
using a variety of ingenious methods. This search aggressively continues to this day since
magnetic monopoles turn out to be more or less required for most of the current “grand
unified theories” (GUTs) or “theories of everything” (TOEs) to be consistent. In particular,
they would have needed to be present in the early Universe during the time electromagnetic
charges “condensed” out of the unified-field “soup” that is hypothesized to have existed at
the earliest moments of the Big Bang. It is still a more or less guaranteed Nobel Prize to
the researcher who first finds them in an experimentally reproducible way94 .
Alas, no isolated magnetic monopoles have been experimentally observed, in spite of
an electromagnetic theory (and GUTs, and TOEs) that “beg” for them. Physicists would
love for at least one magnetic monopole to exist in the Universe because if it did, quantum
theory could explain charge quantization and much more! However, given the lack of
concrete evidence for their existence at this point, it is probably safe to say that magnetic
monopoles are at the very least rare. We express this lack of discoverable monopoles (so
far) in Gauss’s Law for Magnetostatics (GLM) as:
\[ \oint_S \vec{B} \cdot \hat{n}\, dA = 4\pi k_m Q_{m,\text{in } S} = \mu_0 \int_{V/S} \rho_m\, dV = 0 \tag{7.15} \]
and this is just the way that you should learn it for this course.
Believe it or not, this is yet another one of Maxwell’s equations, and we need to learn
this equation just as well as we learn its electrostatic equivalent, Gauss’s Law for Electro-
statics (GLE). It actually tells us some very useful things about the magnetostatic field. In
vector differential form (something you will learn later, if you continue on in physics) it is a
key differential equation that you will need to be able to solve field problems. In this class,
its implications can be summarized as:
• Magnetic field lines cannot begin or end at a point (recall that they could only begin
or end at a point for electric field lines if the point contained an electric charge). Nor
can they cross. This leaves only one alternative:
• Magnetic field lines must form closed loops.
• As we’ll shortly see, those closed loops must be caused by something. That some-
thing is moving charge passing through the loops, at least for the next two chapters.
To repeat: Gauss’s Law with no monopoles is an empirical rule, and lack of evidence
isn’t positive evidence of lack! We don’t know if there are, or are not, magnetic monopoles
somewhere in the Universe; we only know that we haven’t seen any so far when we’ve
looked for them quite hard. At any moment, though, a reproducible experiment that ob-
served them would require us to modify GLM (as well as Faraday’s Law from the next
chapter) to include monopolar terms and we’d all have to work a bit harder to learn elec-
trodynamics. This would be well worth the price, however, as it would enable us to under-
stand why charge is quantized, it would make various TOEs more consistent, and besides,
they are so rare that they would not practically complicate problem solving in introductory
physics classes, however much physics graduate students might be tortured...
GLM above contains an integral that amounts to the flux of the magnetic field through a
surface. We will encounter this kind of flux again in Faraday’s Law covered in the next
chapter, so we will at this point explicitly define the magnetic flux through a surface S and
the SI units associated with it.
The definition of magnetic flux through a surface S is (as should already be clear):
\[ \phi_m = \int_S \vec{B} \cdot \hat{n}\, dA \tag{7.16} \]
(where in context we might well omit the m subscript). Its SI units are called Webers, where
1 Weber is one Volt-Second or one Joule/Ampere or 1 Tesla-Meter$^2$, as you prefer.
There. That’s done. From a practical point of view, we will almost never use Webers as
units per se, as we will work directly with the equations in which magnetic flux occurs to
get quantities that are a lot more directly useful to us, for the most part. You should still
know what they are.
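To make the definition of flux concrete, here is a minimal Python sketch that evaluates the flux integral for a flat surface, once in closed form for a uniform field and once by brute-force summation over small patches of area. The function names (`flux_uniform`, `flux_rectangle`) and the numbers in the example are hypothetical, chosen only for illustration; the surface is assumed to be a rectangle in the xy-plane with its normal along the z-axis.

```python
import math


def flux_uniform(b_magnitude, area, angle_rad):
    """Flux (in webers) of a uniform field through a flat surface.

    angle_rad is the angle between the field and the surface normal n-hat,
    so the flux is B * A * cos(angle).
    """
    return b_magnitude * area * math.cos(angle_rad)


def flux_rectangle(bz, x_range, y_range, n=200):
    """Midpoint-rule estimate of the flux through a rectangle in the xy-plane.

    bz(x, y) returns the z-component of the field (tesla) at the point (x, y);
    the surface normal is assumed to be z-hat, so B . n-hat = Bz.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    dx = (x1 - x0) / n
    dy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            total += bz(x, y) * dx * dy  # Bz times the patch area dA
    return total


if __name__ == "__main__":
    # Hypothetical example: a 0.5 T field perpendicular to a 0.2 m x 0.1 m rectangle.
    print(flux_uniform(0.5, 0.2 * 0.1, 0.0))
    print(flux_rectangle(lambda x, y: 0.5, (0.0, 0.2), (0.0, 0.1)))
```

The two calls in the example should agree to within rounding, since a uniform field makes the sum trivial; the numerical version only earns its keep when the field varies across the surface.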
In the previous section we tried to generalize Gauss’s Law for Electrostatics into a Gauss’s
Law for Magnetostatics, where static magnetic fields could be created by magnetic “charges”
(magnetic monopoles) much the same way that static electric fields are created by elec-
tric charges. Using our imagination, we readily succeeded, but alas when we went out
into the world to search for magnetic charges we didn’t find any. Yet magnetic fields exist
in abundance; otherwise how could pictures and newspaper articles ever be stuck to our
refrigerator doors?
When we go to search for sources of these magnetic fields, we find that they all have
something in common: The static magnetic fields we can readily generate and observe in
the lab are created by moving electrical charges, usually in the form of a static (or slowly
varying) electrical current in a wire (see footnote 95).
As it turns out, the magnetic field produced by a single charged particle has proven to
be very interesting – too interesting, in fact, for us to give a really complete treatment of
it in an introductory course. We will therefore defer a full non-relativistic discussion of the
origins of the magnetic field of a moving point charge at least until we have completed our
discussion of the Maxwell Displacement Current in a few weeks.
Doing the field of currents per se first actually recapitulates the historical order of things.
In 1819 Ørsted discovered that the currents in a wire connecting two poles of a battery
(invented by Alessandro Volta in 1800 – hence the SI unit volt) would cause the deflection
of a magnetized compass needle. One year later André-Marie Ampere discovered that two
current carrying wires exerted a force on one another and published his findings, within a
week of when Jean-Baptiste Biot and Félix Savart studied the damped oscillations of a
compass needle deflected by a wire to conclude what amounts to the same result for the
causal “magnetic field” produced by a wire. At this time the nature of the charged “fluid”
flowing through matter was almost entirely unknown!
This situation lasted almost eighty years! The magnetic fields produced by individual
charged particles were beyond the experimental reach of eighteenth and nineteenth
century physicists until the very end of the nineteenth century. Between 1838 and 1874,
various models were proposed for atoms, some of which did involve pointlike charged
particles, and “cathode rays” produced by hot negatively charged wires were hypothesized to
be one of the particles, but it wasn’t until 1897 that the charge-to-mass ratio of the electron
was first measured by J. J. Thomson and the electron as a pointlike charged particle was
officially “discovered”. The value of e (and hence $m_e$) was not measured until 1909 by
Robert A. Millikan, although George Johnstone Stoney deduced the value of e (correctly)
from experiments in chemistry without assigning it to any particular particle other than
atoms themselves way back in 1874. Finally, it wasn’t until 1911 that the atomic nucleus
was discovered by Ernest Rutherford!
In other words, it wasn’t until the early 20th century that a reasonably accurate model
for the atom as a building block for molecules and ordinary matter was developed!
Yet most of the electromagnetism we learn in this book was discovered and more or less
“finished” as far as its fundamental equations are concerned (through Poynting’s Theorem,
the complete “work-energy-momentum” equation of the electromagnetic field) by 1884, so
it is clear that we don’t really need to understand the magnetic field of a point particle to
understand how to make a magnetic field at least in the non-relativistic limit of more or less
continuous currents in e.g. wires. Relativity theory itself was only developed in the late
nineteenth/early twentieth century, largely to help make electromagnetism consistent!
As the short history lesson above suggests, the original magnetic fields studied were
Footnote 95: Reality is slightly more complicated than that, however. For one thing, as we discussed in the last chapter,
point-like electrons (and several other elementary particles) and many nuclei have a magnetic dipole moment
due to their quantum “spin” even though this spin can not be correctly modeled as a moving/rotating electrical
charge with finite extent. For another, as we shall see in the next chapter, changing electric fields make
magnetic fields even in the absence of moving electrical charges. Still, we will be able to understand both
phenomena in terms of electrical currents at least at first.
• They were “natural” fields generated by magnetic objects: Bar magnets, compass
needles, magnetite mineral chunks, the Earth itself. These natural magnets and the
forces they generate have been known almost from prehistoric times – it is probable
that primitive compasses were used in China and in India as long ago as 1000 BCE,
and a Greek legend dates it as much as 1000 years before that. The ancient Olmec
culture in the Americas likely had discovered magnetism by 1000 BCE as well, as
some of the artifacts from their culture are strongly magnetized. These fields were
not really studied, but they were used to engineer compasses and curios.
• Beyond this, the original controlled, artificial magnetic fields that were systematically
studied after the Enlightenment (that is, with the scientific method and producing pub-
lished results) were all generated by current carrying wires in the laboratory, under
human-controlled conditions.
The results of the early experiments by Biot and Savart (who used their apparatus
to literally map out the field strength and direction near the wire) can be summarized as
follows:
• The magnetic field produced by a small (differentially short) segment of wire carrying
an electrical current is proportional to the current (and by implication, proportional to
the length of the segment).
• The magnetic field of the segment diminishes like the inverse square of the distance from the
segment to the point of observation, just as was the case for the electrostatic field.
• The direction of the magnetic field produced by that same current-carrying segment
is that it forms circular loops around the direction of the current, in the direction the
fingers of your right hand curl around your thumb if you line your thumb up along with
the direction of the current in the segment.
• For any given value of the current, the field strength produced by a differential chunk
of the wire is proportional to sin θ where θ is the angle between the direction of $\vec{r}$ (the
vector from the chunk to the point of observation) and the direction of I in the chunk.
Figure 7.1: The geometry represented by the observations of Biot and Savart.
The geometry of $d\vec{\ell}$ and the resulting magnetic field is illustrated in figure 7.1, where for
simplicity we center the coordinate frame on $d\vec{\ell}$ in such a way that the z-axis of a spherical
polar coordinate frame is aligned with the direction of the current. We can transform the
list of observations of Biot and Savart into the following formula. If we let $d\ell$ be the length
of the segment and give it the direction of the current I in the wire the segment is a part of,
then:
\[ d\vec{B} = \frac{k_m I (d\vec{\ell} \times \hat{r})}{r^2} \tag{7.17} \]
should be the magnetic field of just that small segment!
This result is not particularly general. First, the wire in question is straight; real wires
can easily be bent into curves. We can solve this by using the superposition principle, and
this is the main reason we insisted on finding the field of a differentially short segment
of wire, so we can integrate over an arbitrary wire and thereby add up the field (in more
general coordinates). Also note that this expression makes no sense in isolation – because
of charge conservation, we can never observe the magnetic field of an actual microscopic
segment in isolation because charge cannot be created at one end and destroyed at the
other – the current has to get to the ends of the segment through more wire!
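Figure 7.2: (figure omitted; its labels are the current I, the segment $d\vec{\ell}$ located at $\vec{r}_0$, the point of observation $\vec{r}$, the relative vector $\vec{r} - \vec{r}_0$, and the field $d\vec{B}$.)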
We will usually write this more properly in more general coordinates instead of coordinates
that were centered (as these ones were) on the chunk $d\vec{\ell}$, and for a possibly curved
conductor instead of just a straight one. The result (and its associated figure 7.2) becomes:
\[ d\vec{B} = \frac{k_m I (d\vec{\ell} \times (\vec{r} - \vec{r}_0))}{|\vec{r} - \vec{r}_0|^3} = \frac{\mu_0}{4\pi} \frac{I (d\vec{\ell} \times (\vec{r} - \vec{r}_0))}{|\vec{r} - \vec{r}_0|^3} \tag{7.18} \]
where we have introduced $\mu_0 = 4\pi k_m$, a new quantity known as the permeability of free
space, which will be the magnetic equivalent of $\epsilon_0$, the permittivity of free space. As before,
the electric and magnetic constants are easy to remember, while the permittivity and
permeability are easy to remember numbers multiplied or divided by 4π which are not
particularly easy to remember.
Wow! That looks a lot more complicated than our integral expressions for the elec-
trostatic field! And so it is... we will have to work much harder to evaluate the magnetic
field directly from a current (distribution), and the need to do twisty integrals of directed dif-
ferential vectors as they are worked along a curve and formed into a cross product with a
relative vector to the point of observation (in some system of coordinates) will severely limit
the problems we can solve analytically by just doing sufficiently straightforward integrals.
This is good news and bad news. From the student’s point of view, it means that things
at the intro level are relatively easy. If you learn to do all of the examples presented in
the textbook, the homework, and any in-class examples or problems, well, that is close
to all of the examples anyone (including, very likely, your instructor) can do without being
an integration god (see footnote 96). On the other hand, it presages bad things for the more advanced
student, where sooner or later some of the more difficult problems must be faced (at which
point you will need to have made some progress on the road to calculus-deity or have
learned to integrate complicated functions numerically on a computer).
For this course, we will cheerfully take the easy road and work through a nice set of
relatively simple examples that make up, as promised, most of or close to all of what one
can do, period, before things get very complicated indeed.
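For the curious, here is roughly what “integrating numerically on a computer” can look like for the Biot-Savart Law in its general-coordinate form, equation (7.18). This is a minimal sketch, not a production field solver: it assumes the wire has been chopped into short straight pieces, evaluates each piece at its midpoint, and adds up the contributions by superposition. The names `biot_savart`, `straight_path`, and `circle_path` are hypothetical helpers invented for this illustration.

```python
import math

K_M = 1.0e-7  # k_m = mu_0 / (4 pi) in tesla-meter/ampere, the value quoted in the text


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def biot_savart(path, current, r):
    """Approximate B at the point r by summing k_m I dl x (r - r0) / |r - r0|^3.

    path is a list of (x, y, z) points along the wire, ordered in the
    direction of the current; each consecutive pair is one short piece dl,
    and r0 is taken to be the midpoint of that piece.
    """
    b = [0.0, 0.0, 0.0]
    for p, q in zip(path[:-1], path[1:]):
        dl = tuple(q[i] - p[i] for i in range(3))
        r0 = tuple(0.5 * (p[i] + q[i]) for i in range(3))
        s = tuple(r[i] - r0[i] for i in range(3))  # relative vector r - r0
        dist = math.sqrt(sum(c * c for c in s))
        c = cross(dl, s)
        for i in range(3):
            b[i] += K_M * current * c[i] / dist ** 3
    return tuple(b)


def straight_path(x1, x2, n):
    """n short pieces of a straight wire lying on the x-axis from x1 to x2."""
    return [(x1 + (x2 - x1) * k / n, 0.0, 0.0) for k in range(n + 1)]


def circle_path(radius, n):
    """n short chords approximating a circular loop in the xy-plane."""
    return [(radius * math.cos(2.0 * math.pi * k / n),
             radius * math.sin(2.0 * math.pi * k / n),
             0.0) for k in range(n + 1)]


if __name__ == "__main__":
    # Hypothetical example: 1 A in a 2 m straight wire, observed 10 cm away.
    print(biot_savart(straight_path(-1.0, 1.0, 2000), 1.0, (0.0, 0.1, 0.0)))
```

The design choice here is to let the computer do the “twisty” bookkeeping of the cross product for every little piece, which is exactly the part that makes the analytic integrals painful. The price is that the answer is only as good as the chopping: more pieces give a better approximation, and the point of observation must not sit on the wire itself.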
Figure 7.3: The geometry and coordinates that make it simplest to evaluate the magnetic
field of a straight segment of wire carrying a current I.
In figure 7.3 we see the geometry of a single, straight, segment of wire relative to an
arbitrary point in space. We wish to use the Biot-Savart Law to find the field of this wire at
the point on the y-axis indicated. Note well that this is a general point as the y-axis itself
Footnote 96: These do exist, by the way, some of them wandering in the halls of physics departments. Be warned.
is located at a general point on the x-axis, and no matter where we locate it we can do an
integral between x1 and x2 .
We therefore begin by considering the geometry of the cross product. If we let the
fingers of our right hand line up with dx in the direction of I and rotate through the small
angle to line up with the vector $\vec{r}$ between dx and the point of observation, our thumb picks
the perpendicular direction out of the page as indicated. We can thus easily write the
magnitude of the field produced by this small differential chunk of the
current as:
\[ dB = \frac{k_m I \sin(\phi)\, dx}{r^2} \tag{7.19} \]
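and hence (formally integrating both sides) we get:
\[ B = \int_{x_1}^{x_2} \frac{k_m I \sin(\phi)\, dx}{r^2} \tag{7.20} \]
To evaluate this integral we change variables from x to the angle θ shown in the figure, using:
\[ x = y \tan(\theta), \qquad dx = \frac{y\, d\theta}{\cos^2(\theta)} \tag{7.21} \]
and
\[ \sin(\phi) = \cos(\theta) \tag{7.22} \]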
(think about it for a minute, it will make sense). The integral becomes:
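\[ B = \int_{\theta_1}^{\theta_2} \frac{k_m I y \cos(\theta)\, d\theta}{\cos^2(\theta)\, r^2} = \int_{\theta_1}^{\theta_2} \frac{k_m I y \cos(\theta)\, d\theta}{y^2} \tag{7.23} \]
(where the second form uses $r \cos(\theta) = y$), or:
\[ B = \frac{k_m I}{y} \int_{\theta_1}^{\theta_2} \cos(\theta)\, d\theta = \frac{k_m I}{y} \left( \sin(\theta_2) - \sin(\theta_1) \right) \tag{7.24} \]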
(out of the page, as noted). This is almost exactly identical to our expression for the electric
field of a long straight line of charge as evaluated in week 2, so it should be easy enough
to remember or rederive.
As was the case then as well, we can find the magnetic field produced by an infinite
straight line of current by taking the limits θ1 → −π/2 and θ2 → π/2, where the sines
become −1 and 1 respectively. Note that this result will actually be relevant and useful any
time we seek the field close enough to a wire carrying current that the angles to the end
points approach π/2. The field of such an “infinite” wire is just:
\[ B_\infty = \frac{2 k_m I}{y} \tag{7.25} \]
Again note the analogy with electric field, with ke → km and λ → I, but note well, the
geometry of the field is entirely different! The magnetic field, for a finite or infinite wire
carrying current, flows in circular loops around the wire! In our picture, the field goes out
of the page above the wire, into the page below the wire, and in general if we let the
thumb of our right hand line up with the direction of the current in the wire, the field
circulates around the wire in the same sense as our fingers.
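To get a feel for how good the “infinite wire” approximation is, the finite-segment result (7.24) and the infinite-wire result (7.25) can be compared numerically. The sketch below is only an illustration with made-up numbers; the function names are hypothetical, and the sines are computed from the geometry $x = y \tan(\theta)$, so that $\sin(\theta) = x/\sqrt{x^2 + y^2}$.

```python
import math

K_M = 1.0e-7  # k_m in tesla-meter/ampere, as quoted in the text


def b_straight_segment(current, y, x1, x2):
    """Field magnitude from equation (7.24).

    y is the perpendicular distance from the wire to the point of observation;
    x1 and x2 are the positions of the two ends of the wire measured along the
    wire from the foot of that perpendicular (x1 < x2).
    """
    sin1 = x1 / math.hypot(x1, y)
    sin2 = x2 / math.hypot(x2, y)
    return K_M * current * (sin2 - sin1) / y


def b_infinite_wire(current, y):
    """Field magnitude from equation (7.25), the limit of a very long wire."""
    return 2.0 * K_M * current / y


if __name__ == "__main__":
    # Hypothetical numbers: 10 A in a 2 m wire, observed 1 cm from its middle.
    finite = b_straight_segment(10.0, 0.01, -1.0, 1.0)
    infinite = b_infinite_wire(10.0, 0.01)
    print(finite, infinite, finite / infinite)
```

For a point this close to the middle of the wire the two results differ by a very small fraction, which is the quantitative content of the remark that the infinite-wire formula is useful whenever the angles to the end points approach π/2.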
Figure 7.4: The geometry and coordinates used to compute the magnetic field of a circular
loop of wire carrying a current I on its axis of symmetry.
In the figure above, we have the geometry of a circular current loop in the xy-plane.
Finding the magnetic field of this loop at an arbitrary point in (say) spherical polar coordi-
nates is not impossible, but neither is it easy – it is a chore best left for your next course
(if any) in electromagnetism. In this course, however, we can easily find the field at an
arbitrary point on the z-axis, because there we can use the cylindrical symmetry of the
arrangement to our advantage.
We begin by writing the Biot-Savart Law for the small chunk of current in the segment
of the wire labelled $d\vec{l}$:
\[ d\vec{B} = \frac{k_m I\, d\vec{l} \times \hat{r}}{r^2} \tag{7.26} \]
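As you can see, the direction of this infinitesimal field element is in the plane formed by $\vec{r}$
and the z-axis, perpendicular to $\vec{r}$ in the right-handed direction. The magnitude of this field
element is:
\[ dB = k_m \frac{I\, dl}{r^2} \tag{7.27} \]
(because $d\vec{l}$ is perpendicular to $\hat{r}$). We can resolve it into components along the z-axis and
perpendicular to it using the angle φ:
\[ dB_z = k_m \frac{I\, dl}{r^2} \sin(\phi) \tag{7.28} \]
\[ dB_\perp = k_m \frac{I\, dl}{r^2} \cos(\phi) \tag{7.29} \]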
We can evaluate the two trig functions using the right triangle with sides of a, z, and r
(which has the same angle φ in its apex) – sin(φ) = a/r and cos(φ) = z/r:
\[ dB_z = k_m \frac{I\, dl\, a}{r^3} \tag{7.30} \]
\[ dB_\perp = k_m \frac{I\, dl\, z}{r^3} \tag{7.31} \]
At this point we could be lazy and invoke symmetry. The problem has azimuthal sym-
metry – if we walk around the ring and look at it from arbitrary angles, the problem does
not change with our perspective, so we know that the total magnetic field cannot have a
component that changes as we walk, that is, one in the x or y direction. The field can point
in the z direction only on the z-axis. This allows us to evaluate Bz only to get the total field.
However, one day you might need to show that the ⊥ field vanishes the hard way by
actually integrating it. Fortunately, this really isn’t that hard. If you look carefully at the
picture, you can see that:
\[ dB_z = k_m \frac{I\, dl\, a}{r^3} \tag{7.32} \]
\[ dB_x = k_m \frac{I\, dl\, z}{r^3} \cos(\theta) \tag{7.33} \]
\[ dB_y = k_m \frac{I\, dl\, z}{r^3} \sin(\theta) \tag{7.34} \]
The only thing remaining is a variable we can integrate over. Hopefully it is obvious that
integrating over x and y is a really bad choice, while integrating over θ is a good one. We
note that dl = adθ, substitute this in, and we are ready to go:
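\[ B_z = k_m \int_0^{2\pi} \frac{I a^2\, d\theta}{r^3} = k_m \frac{I 2\pi a^2}{r^3} \tag{7.35} \]
\[ B_x = k_m \int_0^{2\pi} \frac{I a z}{r^3} \cos(\theta)\, d\theta = 0 \tag{7.36} \]
\[ B_y = k_m \int_0^{2\pi} \frac{I a z}{r^3} \sin(\theta)\, d\theta = 0 \tag{7.37} \]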
We conclude that:
\[ \vec{B} = \frac{k_m I 2\pi a^2}{(a^2 + z^2)^{3/2}} \hat{z} \tag{7.38} \]
It is instructive to write this in terms of the magnetic moment of the loop, $\vec{m} = I \pi a^2 \hat{z}$:
\[ \vec{B} = \frac{2 k_m \vec{m}}{(a^2 + z^2)^{3/2}} \tag{7.39} \]
which is exactly the same form as that of the electric field on the axis of an electric dipole,
$\vec{E} = 2 k_e \vec{p}/(a^2 + z^2)^{3/2}$, that we derived several weeks ago, with the substitution of $k_m$ for
$k_e$ and $\vec{m}$ for $\vec{p}$. This (hopefully) continues to motivate the idea that electric and magnetic
fields have certain characteristic shapes – those of monopoles, dipoles, quadrupoles, and
so on – and that if we ever learn to evaluate their multipolar moments for arbitrary charge-
current distributions, we will be able to easily reconstruct at least a good approximation to
the total electromagnetic field of those distributions.
In that spirit, we can easily find the form of the field when z ≫ a:
\[ \vec{B} = \frac{2 k_m \vec{m}}{z^3} \tag{7.40} \]
where we used the binomial expansion, sort of – we only had to keep the leading term
after factoring out the z so it was pretty easy.
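How far away is “far enough” for the $z \gg a$ form to be useful? A quick numerical comparison of the exact on-axis result (7.38) with the leading-term form (7.40) answers that. The sketch below uses hypothetical function names and made-up values for the current and the loop radius; it simply evaluates both expressions and prints their ratio.

```python
import math

K_M = 1.0e-7  # k_m in tesla-meter/ampere, as quoted in the text


def b_loop_axis(current, a, z):
    """On-axis field magnitude of a circular loop of radius a, equation (7.38)."""
    return K_M * current * 2.0 * math.pi * a ** 2 / (a ** 2 + z ** 2) ** 1.5


def b_dipole_axis(current, a, z):
    """Leading-term form for z >> a, equation (7.40), with m = I pi a^2.

    Not meaningful at z = 0, where it divides by zero.
    """
    m = current * math.pi * a ** 2
    return 2.0 * K_M * m / abs(z) ** 3


if __name__ == "__main__":
    current, a = 2.0, 0.05  # hypothetical: 2 A in a loop of radius 5 cm
    for multiple in (1, 3, 10, 100):
        z = multiple * a
        ratio = b_loop_axis(current, a, z) / b_dipole_axis(current, a, z)
        print(multiple, ratio)
```

The ratio is just $(1 + a^2/z^2)^{-3/2}$, so it depends only on $z/a$ and creeps up toward 1 as the point of observation recedes from the loop.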
Evaluating the magnetic field using the Biot-Savart Law becomes increasingly difficult
from here on. At the very least, it becomes an exercise in increasingly difficult calculus,
even though the physical concept is the same and you can always write down an integral
that – if you could do it – would lead you to the answer. There is one more worth at least
laying out to help get you set up for your homework.
Figure 7.5: The geometry and coordinates used to compute the magnetic field on the axis
of symmetry of a circular ring of charge Q revolving at angular velocity Ω.
In the figure above, a circular ring of charge with charge Q, radius a, and angular veloc-
ity Ω (which really points in the vector z-direction, recall – I’m just indicating the direction of
rotation in the figure above) is in the xy-plane concentric with the z-axis. Our job is to find
the field on the z-axis once again.
If this figure reminds you of the one in the last section, it should – they are the same.
In fact, the solution is going to be the same, except that we have to figure out the current
in the case where we have a revolving ring of charge instead of an actual current in a wire.
To do so, we note that all of the charge in the ring moves past an arbitrary point on the
circle of its motion – say, where it crosses the x-axis – in one period of its revolution. The
total charge per unit time passing that point is thus:
\[ I = \frac{Q}{T} = \frac{Q\Omega}{2\pi} \tag{7.41} \]
The field is thus obtained by doing the exact same integrals as before:
\[ \vec{B} = \frac{k_m I 2\pi a^2}{(a^2 + z^2)^{3/2}} \hat{z} = \frac{k_m Q \Omega a^2}{(a^2 + z^2)^{3/2}} \hat{z} \tag{7.42} \]
The main reason to do this here (since it no doubt seems trivial) is that it is a key step
along the way to finding the magnetic field of a rotating disk of charge Q and radius R on
your homework. On your homework problem you will want to draw the disk of charge and
select out a thin ring of that charge of radius r and thickness dr. The field of this rotating
ring will depend on r (which is the same as a in this example, the radius of the ring, not the
distance from the ring to the point z). With a bit of care, you can integrate Bz over r to find
the total field on the z axis. Then you can investigate the z ≫ R limit and (with luck and
the use of expressions you derived for $\vec{m}$ of a rotating disk in the last chapter) show that it
is still $2 k_m \vec{m}/z^3$ in this much more complicated case.
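If you would like a way to check your homework answer without being handed it, the ring result (7.42) can be summed numerically over many thin rings. The sketch below is a numerical cross-check only, under an assumption the text does not state explicitly: the charge Q is taken to be spread uniformly over the disk, so a ring of radius r and thickness dr carries charge $dQ = 2 Q r\, dr / R^2$. All function names are hypothetical.

```python
import math

K_M = 1.0e-7  # k_m in tesla-meter/ampere, as quoted in the text


def ring_current(charge, omega):
    """Equivalent current of a revolving ring of charge, equation (7.41)."""
    return charge * omega / (2.0 * math.pi)


def b_ring_axis(charge, omega, a, z):
    """On-axis field of a ring of charge of radius a, equation (7.42)."""
    current = ring_current(charge, omega)
    return K_M * current * 2.0 * math.pi * a ** 2 / (a ** 2 + z ** 2) ** 1.5


def b_disk_axis(charge, omega, radius, z, n=2000):
    """Sum the ring fields over n thin rings (ASSUMES uniform surface charge)."""
    dr = radius / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * dr                          # midpoint radius of this ring
        dq = charge * 2.0 * r * dr / radius ** 2    # charge on the thin ring
        total += b_ring_axis(dq, omega, r, z)
    return total


def disk_moment(charge, omega, radius, n=2000):
    """Magnetic moment of the disk as a sum of (ring current) x (ring area)."""
    dr = radius / n
    m = 0.0
    for k in range(n):
        r = (k + 0.5) * dr
        dq = charge * 2.0 * r * dr / radius ** 2
        m += ring_current(dq, omega) * math.pi * r ** 2
    return m


if __name__ == "__main__":
    q, w, big_r = 1.0e-6, 100.0, 0.1  # hypothetical charge, angular speed, radius
    z = 50.0 * big_r
    print(b_disk_axis(q, w, big_r, z), 2.0 * K_M * disk_moment(q, w, big_r) / z ** 3)
```

The two printed numbers should be close when z is much larger than R, which is the far-field behavior the homework asks you to demonstrate analytically.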
We aren’t quite “done” with Biot-Savart. There are a few problems I could reasonably
give you and expect you to be able to at least formulate them as integrals and – with a bit
of skill – integrate to find the total field. Some of them are on your homework, but you can
imagine others – the field on the axis of a rotating rod of charge. The field on the axis of a
solenoid. The field of a rotating sphere, or spherical shell of charge. Some of these may
seem daunting, but in all cases with a bit of work you could get them.
However, learning to do these increasingly difficult integrals won’t teach you the physics
any better. To get a better grip on the physics, we have to leave the Biot-Savart Law behind,
or better yet, convert it into a more general equation the same way that we converted the
field of a point charge into Gauss’s Law for Electrostatics. In the next section we will do
just that – we will turn the Biot-Savart Law into our next Maxwell equation, Ampere’s Law.
But first, let’s have a discussion on the magnetic field produced by “slowly” moving point
charges. As promised, this section will be very short, because we will come back to it later
in the textbook armed with Ampere’s Law, but there are some really interesting aspects
of magnetism that one first confronts when thinking about the magnetic field of a moving
point charge, and if nothing else, they’ll give you something to think about in the weeks
ahead.
The Biot-Savart Law looks as though it is begging to be turned into a very simple rule for
finding the magnetic field produced by a point charge. When we consider the geometry of
a short segment of wire with length dℓ and cross-sectional area A, we learned that:
\[ I = n q v_d A \;\Rightarrow\; I\, d\ell = n q v_d (A\, d\ell) = n q (A\, d\ell)\, v_d \]
In words, the current times the small length dℓ equals the free conduction charge in that
small chunk times its drift velocity.
It seems pretty reasonable to think that the field of the point-like segment, then, equals
the field produced by the point-like charge ∆Q of the segment times the actual (classical)
speed the charge is moving with down the wire in the direction of the current! In that case
we would expect:
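\[ d\vec{B} = \frac{k_m I (d\vec{\ell} \times \hat{r})}{r^2} \quad \Leftrightarrow \quad \vec{B} = \frac{k_m q (\vec{v} \times \hat{r})}{r^2} \tag{7.43} \]
should be the “Biot-Savart Law” magnetic field of a point charge q moving at a velocity $\vec{v}$! In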
fact, this expression is approximately correct, but only if the charge q is moving at a speed
v ≪ c, slowly relative to the speed of light, and only if r is “small enough” that we can treat
the emission of the field and its observation to be effectively “instantaneous”.
Let’s try to understand this, as it is one of the things that strongly motivates the later
development of the theory of special relativity. Let’s start with the geometry of the field
Figure 7.6: The geometry of the magnetic field lines going in circles around the (dotted)
line of motion of the charge in the right-handed sense. Note that the direction of $\vec{v} \times \vec{r}$
is into the paper on the right, out of the paper on the left, for $\vec{v}$ and any of the $\vec{r}$ vectors
shown.
lines we expect for the point charge, as shown in figure 7.6. In spherical polar coordinates
centered on the charge with the z-axis aligned with its direction of motion, the cross product
simplifies to give us a field magnitude
\[ B = k_m \frac{q v}{r^2} \sin\theta \qquad (v \ll c \text{ only}) \tag{7.44} \]
The direction of the field is obtained as usual from the RHR – if you let your right-handed
thumb point in the direction of motion of the charge, your fingers will curl around your thumb
in the same sense that the field lines are directed around the velocity vector of the charge
in an axially symmetric way.
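The slow-charge expression (7.43) is easy to evaluate as a vector, which makes the geometry above concrete. The following is a minimal sketch with hypothetical names and made-up numbers; it is only meaningful under the same conditions as the formula itself (v ≪ c and r small enough that the field can be treated as instantaneous), and it does not check those conditions for you.

```python
import math

K_M = 1.0e-7  # k_m in tesla-meter/ampere, as quoted in the text


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def b_point_charge(q, v, r):
    """B = k_m q (v x r-hat) / r^2 for a slowly moving point charge.

    q is the charge in coulombs, v its velocity (m/s) as a 3-tuple, and r the
    vector from the charge to the point of observation (m) as a 3-tuple.
    Valid only for |v| << c; r must not be the zero vector.
    """
    dist = math.sqrt(sum(c * c for c in r))
    r_hat = tuple(c / dist for c in r)
    v_cross_r_hat = cross(v, r_hat)
    return tuple(K_M * q * c / dist ** 2 for c in v_cross_r_hat)


if __name__ == "__main__":
    # Hypothetical example: a charge of 1.6e-19 C moving along +z at 1e6 m/s,
    # observed 1e-10 m away along +x (perpendicular to the motion).
    print(b_point_charge(1.6e-19, (0.0, 0.0, 1.0e6), (1.0e-10, 0.0, 0.0)))
    # Directly ahead of the charge (r parallel to v) the field vanishes.
    print(b_point_charge(1.6e-19, (0.0, 0.0, 1.0e6), (0.0, 0.0, 1.0e-10)))
```

With the velocity along the z-axis and the point of observation along the x-axis, the field points along the y-axis, in agreement with the right hand rule, and its magnitude matches (7.44) with θ = π/2.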
This rule is simple enough, and is almost the rule you will learn in much more advanced
courses in electrodynamics than this one. The only thing we are leaving out (that is, of
course, very important) is that neither the electric nor the magnetic field appears instanta-
neously in all space. When one of their sources is “turned on” by a suitable rearrangement
or motion of charges, the fields propagate outward from the charge at the speed of light,
establishing their value at a point of observation at a lagged time after the charge or
current appears at any given point of field emission.
This is true even for the apparently “static” electric field we worked on in the first chapter!
For this field evaluated “near” a more or less stationary (slowly moving) electric charge this
didn’t matter much, but we still called the field the electrostatic field to emphasize the
requirement. For the magnetic field, things are still more surprising! Let’s consider a very
simple example. Suppose you have a point charge at rest at the origin. As we can see
from the expression above, it should have no magnetic field at all, and should produce an
electrostatic field at the point $\vec{r}$ (as usual) of:
\[ \vec{E}_{\text{rest}} = \frac{k_e q}{r^2} \hat{r} \]
Now imagine that we change reference frames to a new inertial reference frame moving
at a velocity $\vec{v} = -v_0 \hat{z}$ relative to the first frame. In this frame, q is moving up with velocity $v_0 \hat{z}$
at t = 0, the time it is located at the common origin of both frames! At that instant and in
this moving frame there are both electric and magnetic fields at the point $\vec{r} = \vec{r}\,'$!
This results in a complete mess! For example – if we have a neutral current-carrying
wire with a segment at $\vec{r}$, it will experience no electric force or magnetic force in the first
frame. In the second, it will still experience no electric force, but (depending on the current
direction relative to $\vec{v}$) it may well experience a magnetic force! This
isn’t the appearance or disappearance of a pseudoforce in an accelerating frame; this is
a serious blow to either Newton’s Laws, or to Galilean relativity, as the actual force in the
two frames changes so that there are different accelerations depending on which frame
you are in! This makes no sense, of course. The trajectories observed must be the same
in both frames, within the kinematics of the transformation between frames!
It will take a bit of work, in some future E&M course, to see that the fundamental
problem is with the Galilean frame transformation itself and that the theory that consistently
permits it all to work out is the theory of special relativity. In fact, both $\vec{E}'$ and $\vec{B}'$ in the
moving frame differ from what they are in the original stationary charge-centered frame –
and so do space and time – but in just the right way that ultimately, observers in the two
frames don’t see fundamentally different realities, but just two different ways of observing
just one reality!
Another problem that we can understand right now follows from combining this empirical
law for building the field with the Lorentz force law for the magnetic force on a moving
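charge. Put them together and we find that for two charges $q_1$ and $q_2$ travelling at velocities
$\vec{v}_1$ and $\vec{v}_2$, with $\vec{r}_{12}$ the vector from $q_1$ to $q_2$, the force on $q_1$ due to $q_2$ is:
$$\vec{F}_{12} = -q_1 \vec{v}_1 \times \left( k_m q_2 \frac{\vec{v}_2 \times \hat{r}_{12}}{r_{12}^2} \right) = -\frac{k_m q_1 q_2}{r_{12}^2} \left( \vec{v}_1 \times (\vec{v}_2 \times \hat{r}_{12}) \right) \tag{7.45}$$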
Similarly, the force on q2 due to q1 is:
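$$\vec{F}_{21} = q_2 \vec{v}_2 \times \left( k_m q_1 \frac{\vec{v}_1 \times \hat{r}_{12}}{r_{12}^2} \right) = \frac{k_m q_1 q_2}{r_{12}^2} \left( \vec{v}_2 \times (\vec{v}_1 \times \hat{r}_{12}) \right) \tag{7.46}$$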
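[Figure 7.7 labels: $q_2$ with velocity $\vec{v}_2$, where $B = 0$ and $F_{21} = 0$; $q_1$ with velocity $\vec{v}_1$, where $\vec{B}$ points in and the force is $\vec{F}_{12}$; separation vector $\vec{r}_{12}$.]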
Figure 7.7: Newton’s Third Law fails for this arrangement because the field (and hence
force) at q2 due to q1 is zero while the field (and hence force) at q1 due to q2 is not zero!
The magnetic field at the position of $q_2$ in this figure is zero, because $\vec{v}_1$ is parallel to $\vec{r}_{12}$
(so the cross product is zero). However, $\vec{v}_2$ is perpendicular to $\vec{r}_{12}$ and the cross product
is not zero; the magnetic field at $q_1$ is nonzero and into the page, perpendicular to $\vec{v}_1$.
Consequently the force on $q_1$ is nonzero and points to the left while the force on $q_2$ is zero!
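To make the imbalance concrete, here is a minimal sketch that evaluates equations 7.45 and 7.46 for an arrangement like the one in figure 7.7. The coordinate choice (page in the x-y plane, $q_1$ at the origin moving along +y, $q_2$ directly above it moving along +x), the charge and speed values, and the helper names are all assumptions made for illustration.

```python
import math

K_M = 1.0e-7  # k_m = mu_0 / (4 pi), in T m / A

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def scale(s, a):
    """Multiply the 3-vector a by the scalar s."""
    return tuple(s * c for c in a)

def magnetic_force_pair(q1, v1, q2, v2, r12):
    """Return (F12, F21): the low-speed magnetic force on q1 due to q2
    (equation 7.45) and on q2 due to q1 (equation 7.46).
    r12 is the vector from q1 to q2."""
    r = math.sqrt(sum(c * c for c in r12))
    r_hat = scale(1.0 / r, r12)
    prefactor = K_M * q1 * q2 / r**2
    F12 = scale(-prefactor, cross(v1, cross(v2, r_hat)))
    F21 = scale(prefactor, cross(v2, cross(v1, r_hat)))
    return F12, F21

# Hypothetical values: two 1 microcoulomb charges, 1 cm apart, each at 1e5 m/s.
# v1 is parallel to r12 (along +y); v2 is perpendicular to it (along +x).
F12, F21 = magnetic_force_pair(1e-6, (0.0, 1e5, 0.0),
                               1e-6, (1e5, 0.0, 0.0),
                               (0.0, 0.01, 0.0))
```

In this setup `F12` points along −x (to the left) while `F21` vanishes identically, so the two magnetic forces do not sum to zero.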
If you think for a moment, you will recall that we used Newton’s Third Law to derive a
very important physical principle: The Law of Conservation of Momentum! In the picture
above, the total momentum of the interacting pair of particles is changing in time. If we
worked harder and computed the way the total energy of the particle pair is changing as a
function of time we would find that it is changing too. The pair of particles is literally “lifting
itself up by its own bootstraps”!
This is, of course, offensive to all right minded individuals, who quite correctly view the
failure of energy or momentum conservation to be the failure of all of physics, a global
inconsistency that would cause us to observe all sorts of “magic” that we do not, in fact,
observe in the world.
Historically, whenever momentum or energy have appeared to be lost in a collision,
we have (when we looked carefully) found them again, often in an unexpected form. This
case is no exception; in fact it was probably the original case of the rule. Physics is quite
safe, because momentum is conserved, even though the momentum of a collection of
massive particles interacting only electrically and magnetically is not! Where do you think
the missing momentum and energy might be found?
It is difficult to know the best way to show you the path from the Biot-Savart Law to Am-
pere’s Law or vice-versa. In a sense, this is conceptually one of the most difficult things
to see, because the usual connection is established using multivariate calculus that is be-
yond the scope of this course, and it is Ampere’s Law that is viewed as being the more
fundamental!
The Biot-Savart Law can be used to derive a limited “magnetostatic” version of Ampere’s
Law using relatively simple arguments that we will give below, but the only method I
know of for deriving the Biot-Savart Law from Ampere’s Law using more or less elementary
calculus at the level of this course requires that we start from Ampere’s Law corrected by
the “Maxwell Displacement Current”, something we will not cover for two more chapters.
We will therefore defer until then a very interesting demonstration (due to Robert Buschauer)
for obtaining the Biot-Savart Law from Ampere’s Law using only the displacement current
of a point charge moving at speed v ≪ c along the z-axis – a demonstration that also
reinforces the idea that the magnetic field produced by a point charge in a frame where
that charge is moving must be an electric field only in a frame where it is not – something
that forces us to conclude that electromagnetism with the electric and magnetic fields represented
by simple vectors is not invariant under the Galilean transformation between
these two reference frames!
The point of this is that the Biot-Savart Law is itself not strictly correct outside of the
context of magnetostatics where Ampere’s Law – with Maxwell’s eventual addition – is one
of the two equations that lead us to electrodynamics – the fully unified dynamic electromag-
netic field. As we’ve seen, it doesn’t seem to work correctly for “isolated” current-carrying
wire segments or their equivalent moving point charges.
In fact, my favorite way of presenting this whole chapter is as a sort of detective story,
where I lay down hints along the way and give you a chance to win a prize97 . The goal
is to see the flaw in Ampere’s Law as we soon write it, to see how it must fail to be
mathematically consistent for certain geometries of currents, and – naturally – to correctly
derive the fix for it: the Maxwell Displacement Current that (with Faraday’s Law) unified
Electricity and Magnetism.
Of course, any student who wishes can skip ahead a few chapters and “cheat”, but that
would be no more satisfying than reading the last chapter of a mystery novel first.
So come on. You’re pretty smart. You’re taking a no-kidding physics course. Think you
can slam-dunk like James Clerk Maxwell? Then bring it. Figure out why Ampere’s law isn’t
consistent and make it right, without peeking! If you can do that, you can do anything, and
the knowledge that you can do anything is more valuable than you can imagine when later
you hit some really difficult problems in life if not physics.
97 A prize of no value whatsoever and of the greatest value you can imagine. In my own class, I up the ante
of the “no value whatsoever” part and give any student that manages it a piece of candy or a small prize picked
out of a treasure chest of cheap prizes. This, of course, makes the total value of the prize greater than the
greatest value you can imagine, if you can imagine that. Of course – ontologically – you can’t, hmmm...
Thus we will start our discussion by thinking once again about a single, infinitely long,
straight line of current I. We recall from our Biot-Savart Law based example problem done
in the text above that:
$$B = \frac{2 k_m I}{r} = \frac{\mu_0 I}{2\pi r} \tag{7.47}$$
where the direction of the magnetic field is around the current I in the right-handed sense.
The geometry of this is drawn in figure 7.8.
Figure 7.8: An infinitely long straight wire carries current I and has a magnetic field that
goes around the wire in circular loops of constant magnitude in the right-handed sense.
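Equation 7.47 can be checked numerically with a short sketch. It assumes the usual Biot-Savart integrand for a straight wire along the z-axis, $dB = k_m I\, r\, dz/(r^2 + z^2)^{3/2}$, and stands in for "infinitely long" with a wire whose half-length is much larger than r. The function name and the numbers are illustrative assumptions.

```python
import math

K_M = 1.0e-7  # k_m = mu_0 / (4 pi), in T m / A

def wire_field_biot_savart(I, r, half_length, n=200000):
    """Midpoint-rule sum of the Biot-Savart contributions of a straight
    wire running from -half_length to +half_length along z, evaluated
    a perpendicular distance r from the wire."""
    dz = 2.0 * half_length / n
    total = 0.0
    for k in range(n):
        z = -half_length + (k + 0.5) * dz
        total += r * dz / (r * r + z * z) ** 1.5
    return K_M * I * total

# Hypothetical case: 10 A wire, field point 1 cm away, wire 20 m long.
B_numeric = wire_field_biot_savart(10.0, 0.01, 10.0)
B_formula = 2.0 * K_M * 10.0 / 0.01  # equation 7.47
```

The two values agree closely once the wire is long compared to r, which is the sense in which the "infinite" wire is a good idealization.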
Note well that the field drops off like 1/r. The circumference of the circle just happens
to increase like r. In week 3 we saw that if we multiplied a field that went down like $1/r^2$
by an area that went up like $r^2$, we got a quantity that only depended on the charge (and
Gauss’s Law for Electrostatics). Here we can clearly do the same thing and write:
$$B \times 2\pi r = \frac{\mu_0 I}{2\pi r} \times 2\pi r = \mu_0 I \tag{7.48}$$
So far, this is only suggestive. However, consider the geometry of figure 7.9, where the
current I is drawn directly into the page so we can concentrate on the plane in which the
magnetic field lies.
The B-field is of course constant in magnitude on the circle of radius r as before, and
so we can multiply it by the length of the circular arc at the same radius right up to the
notch. However, our path now steps out by ∆r along the radius (and perpendicular to
the field). Along the curved path ∆s, the field is somewhat weaker, but the path itself is
somewhat longer.
Figure 7.9: A circular path of radius r around a long straight wire with a “notch” of angular
width ∆θ and radius r + ∆r.
and we see that deforming the circle with the notch did not alter the value of the sum we
got from multiplying the field times the length of the curved path C (circle with notch) along
the magnetic field, while ignoring the part of C perpendicular to the magnetic field. Again,
this should be reminding you of what we did for Gauss’s Law only it is a bit simpler.
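The bookkeeping for the notched path of figure 7.9 can be sketched in a few lines. This assumes the straight-wire field $B(r) = \mu_0 I/2\pi r$ of equation 7.47 and adds up field times arc length over the two curved pieces only, since the radial steps are perpendicular to the field. The function names and sample values are illustrative assumptions.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def B_wire(I, r):
    """Field magnitude a distance r from a long straight wire (equation 7.47)."""
    return MU_0 * I / (2.0 * math.pi * r)

def notched_sum(I, r, dr, dtheta):
    """Field times arc length, summed over a circle of radius r that steps
    out to radius r + dr across a notch of angular width dtheta."""
    arc_main = r * (2.0 * math.pi - dtheta)   # the circle, minus the notch
    arc_notch = (r + dr) * dtheta             # the longer arc, in a weaker field
    return B_wire(I, r) * arc_main + B_wire(I, r + dr) * arc_notch

# Hypothetical numbers: 5 A, radius 10 cm, notch 3 cm deep and 0.4 rad wide.
total = notched_sum(5.0, 0.10, 0.03, 0.4)
```

Whatever notch depth and width are chosen, `total` comes out equal to $\mu_0 I$, because the weaker field on the outer arc is exactly compensated by its greater length.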
Well, we can add more notches; in fact, we can deform the curve C in, we can deform
it out, we can deform it so that it goes along the wire and no longer lies in a plane, and
as long as we break up C into teensy segments that either lie perpendicular to the field or
follow a curved arc tangent to the field, we only get a contribution from the piece of length
ds tangent to the field, and that contribution is always of the form:
$$B\,ds = \frac{\mu_0 I}{2\pi r}\, r\,d\theta = \frac{\mu_0 I}{2\pi}\, d\theta \tag{7.50}$$
If we clean up the geometry of this, picking a path element along C with a vector length $d\vec{\ell}$
and selecting only the component parallel to $\vec{B}$ at that point with the dot product, we get:
$$\oint_C \vec{B} \cdot d\vec{\ell} = \oint_C \frac{\mu_0 I}{2\pi r}\,\hat{\theta} \cdot r\,d\theta\,\hat{\theta} = \frac{\mu_0 I}{2\pi} \int_0^{2\pi} d\theta = \mu_0 I \tag{7.51}$$
which is true for any curved path C that goes around the infinitely long straight wire precisely
one time, so that the integral of dθ is eventually 2π (with the r in the length element
along $\vec{B}$ always cancelling the 1/r dependence of $\vec{B}$).
Note two things. One is that current carrying wires that do not pass through the closed
loop C do not contribute to the loop integral – the integral of dθ around such a loop always
adds up to zero because it doesn’t go, and stay, around. If there were many wires and not
just one, we could use superposition and show that this equation would still be true as long
as we only add up the total current I that passes through C on the right hand side.
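Both claims (any loop that goes around the wire once gives $\mu_0 I$, and a loop that does not go around it gives zero) can be tested numerically. The sketch below assumes a wire along the z-axis carrying current out of the page, breaks a closed planar path into short chords, and sums $\vec{B} \cdot \Delta\vec{\ell}$ using the straight-wire field. The off-center ellipses, the function names, and the step count are illustrative assumptions.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def circulation(I, path, n=20000):
    """Approximate the loop integral of B . dl around a closed curve in the
    x-y plane, for a long straight wire on the z-axis carrying current I
    along +z. `path` maps t in [0, 2 pi] to a point (x, y), counterclockwise."""
    total = 0.0
    x0, y0 = path(0.0)
    for k in range(1, n + 1):
        x1, y1 = path(2.0 * math.pi * k / n)
        xm, ym = 0.5 * (x0 + x1), 0.5 * (y0 + y1)  # midpoint of this chord
        rho2 = xm * xm + ym * ym
        Bx = -MU_0 * I * ym / (2.0 * math.pi * rho2)
        By = MU_0 * I * xm / (2.0 * math.pi * rho2)
        total += Bx * (x1 - x0) + By * (y1 - y0)
        x0, y0 = x1, y1
    return total

def ellipse(cx, cy, a, b):
    """Counterclockwise ellipse centered at (cx, cy) with semi-axes a and b."""
    return lambda t: (cx + a * math.cos(t), cy + b * math.sin(t))

around = circulation(3.0, ellipse(0.3, 0.1, 1.0, 0.5))   # encloses the wire
outside = circulation(3.0, ellipse(3.0, 0.0, 1.0, 0.5))  # does not enclose it
```

Here `around` matches $\mu_0 I$ to within the discretization error even though the loop is neither circular nor centered on the wire, while `outside` is zero to the same accuracy.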
The other is something that I can’t precisely show, but which you can kind of see is
true. It turns out that this equation works even if the wire(s) aren’t infinitely long or straight!
In fact it works for any steady-state (static) current passing through the closed loop C.
We can imagine trying to prove this by (for example) leaving C as a circle and starting to
deform the path followed by the current I and noting how the result depends on the angle
subtended by each point on the circle, but in the end visualizing it would be too difficult
because of the cross product. For that reason, this is one of the few times in this book
where I’ll ask you to just trust me because I can’t quite show you how the result doesn’t
change as we, for example, bend the infinitely long straight wire around into an arbitrary
loop itself, while maintaining the fact that it passes “through”98 .
This leads us to write the previous equation in the following carefully selected form:
$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 I_{\text{through } C} = \mu_0 \int_{S/C} \vec{J} \cdot \hat{n}\, dA \tag{7.52}$$
which we will call (the incorrect form of) Ampere’s Law. Ampere’s Law is our third Maxwell
equation, and is the equation that Maxwell in fact “fixed” to get his name on the entire set.
Some fix!
Note that I wrote the current “through C” in a mathematically correct way as the flux of
the current density through an open surface S bounded by the closed curve C! 99 . This is
the way I’d like you to practice writing Ampere’s Law, although in application below finding
the field of certain highly symmetric constant (or slowly varying) currents we’ll often (but
not always) be able to just add up the total current through any given loop “by inspection”.
Another Maxwell Equation! We’re now up to three, with one to go. Gosh, seems as
though it should be good for something, doesn’t it?
Indeed it is. Even in its slightly broken form above, we can use it to find the magnetic
field in problems with just enough of the right kind of symmetry. There are only a handful
of problems that fit the bill, but they are all useful and important. In the end, though, the
purpose of Ampere’s Law (fixed) is that it is a law of nature (where the Biot-Savart Law per
se is not). In fact, we can derive the non-relativistic Biot-Savart Law from Ampere’s Law,
using the Maxwell Displacement Current as I’ll show in a chapter or two once we know what
the latter is. Just bear in mind that this introductory treatment (but not Maxwell’s Equations
per se) is always going to be incomplete without special relativity and differential vector
calculus covered in a more advanced course on electrodynamics.
For now, though, and in this course, we’ll content ourselves with the handful of problems
where Ampere’s Law can be used to evaluate the magnetic field. We’re basically going to
do all of them. On your homework, I’ll ask you to do them again (without looking, before
you are done) and will throw you a few simple enough variants of the problems.
98 This is your first hint. Exactly what does it mean for a current to pass “through” an arbitrary closed loop?
It’s easy to answer this when the line is straight and the loop is a nice plane circle, but Ampere’s Law holds
for curves C that are topologically equivalent to a kilometer of fishing line with the ends tied together (to form
a closed curve) and then balled up into the biggest, worst fishing tangle you ever saw! What does it mean for
current to go “through” that? And yet it can, and if you stuff the entire snarl into a pipe carrying water, you can
completely imagine that some of the water does, in fact, go through the loop as it flows along.
99 Hint: There are many – in fact, an infinite number of – surfaces S that are bounded by any particular closed
curve C. Is the value of this integral independent of which surface you choose? If it is, is that a problem?
There are basically four problem geometries where Ampere’s Law can be used to find the
field. As was the case with Gauss’s Law for Electricity, each of them has an associated
symmetry that permits the path integral on the left to be evaluated “once and for all”, so
that solving the problem amounts to finding the total current through the Amperian Loop
(topical equivalent of the Gaussian Surface) in question.
Those categories are:
a) Infinitely long straight wire, or cylinder, or cylindrical shell, or anything else where the
current has cylindrical symmetry. These examples will be like the argument we used
to justify Ampere’s Law above, only backwards.
b) Infinitely long (ideal) solenoid.
c) Toroidal solenoid (which also has cylindrical symmetry, but in a different way).
d) Infinite plane sheet of current (which may or may not be a “thick” sheet).
The simplest example of this is the infinitely long straight thin wire, so we’ll do that just as
a warm-up.
Take an infinitely long, straight wire carrying a current I. We know from symmetry (not
just because we used the fact to sort-of-derive Ampere’s Law in the first place) that the
magnetic field is constant in magnitude on and tangent to a circle of radius r because the
problem doesn’t change as we walk around the wire. We therefore choose the Amperian
Path C to be a circle of radius r and do so once and for all for this kind (symmetry) of
problem. Then:
$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 I_{\text{thru } C}$$
$$\oint_C B_t\, d\ell = \mu_0 I$$
$$B_t\, 2\pi r = \mu_0 I$$
$$B_t = \frac{\mu_0 I}{2\pi r} \tag{7.53}$$
Big surprise. We find that the field tangent to the circle at all points Bt is exactly what
we know it to be as we more or less invert our “derivation” of Ampere’s Law. The one
important lesson to take from this is that the left hand side and concluding algebra for this
little mini-derivation will never change! For every cylindrical problem we will always use
a circular Amperian Path and the left hand integral will (because B is constant on and
tangent to the circle) always evaluate to Bt 2πr.
The right hand side, on the other hand, we may have to work for. Specifically, we will
often have to work to find the actual current through the particular C we have drawn.
Still, this seems a hell of a lot easier than setting up and evaluating the Biot-Savart
integral we did earlier this week. Maybe there is something useful in here, after all!
This is more apparent in the next example.
Figure 7.10: An infinitely long thick wire of circular radius R carries a current I into the
page as drawn. We would like to find the magnetic field in all space.
Suppose we have an infinitely long straight wire that has some finite radius R and is
carrying a current I that is uniformly distributed across the wire cross-section as shown in
figure 7.10. We would like to compute the magnetic field everywhere in space, both inside
and outside of the wire.
Our first step is to transform I into a current density $\vec{J}$ into the page:
$$\vec{J} = \frac{I}{\pi R^2}\,\hat{z} \tag{7.54}$$
where the z-axis is into the page.
Next, we have to think just a bit about the field we expect to get. This step is essential
– most people who get this problem wrong get it wrong because they omit it, they haven’t
thought about the problem enough. The current has cylindrical symmetry, so the field will
too. We expect the field lines to run in circles of constant magnitude around the center of
symmetry in the middle of the wire, in the clockwise direction as drawn from the right hand
rule. But we do not expect them to have the same form inside the wire and outside of the
wire. We therefore have to draw two Amperian Paths, one (C) of radius r in region I (r < R),
the other (C′) of radius r′ in region II (r′ > R). We have to apply Ampere’s Law twice, once
in each region.
Let’s do region I (r < R):
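$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 \int_{S/C} \vec{J} \cdot \hat{n}\, dA$$
$$\oint_C B_t\, d\ell = \mu_0 \int_{S/C} J\, dA$$
$$B_t\, 2\pi r = \mu_0 J \pi r^2$$
$$B_t = \frac{\mu_0 J \pi r^2}{2\pi r} = \frac{\mu_0 I r}{2\pi R^2} \tag{7.55}$$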
where we have selected a right-handed normal $\hat{n}$ into the page so that the dot product of
$\vec{J}$ and $\hat{n}$ is just the magnitude J. The right hand side, as you can see, computes the total
current that flows through the curve C (inside the radius r)! The left hand side is identical
to what it was for the thin wire (and what it will be for all other cylindrical problems).
Then, region II (r ′ > R):
$$\oint_{C'} \vec{B} \cdot d\vec{\ell} = \mu_0 \int_{S/C'} \vec{J} \cdot \hat{n}\, dA$$
$$B_t\, 2\pi r' = \mu_0 I$$
$$B_t = \frac{\mu_0 I}{2\pi r'} \tag{7.56}$$
which is the same as for a long straight thin wire. The field outside of any cylindrical current
will be the same as the field of a current of the same strength all concentrated in a thin
wire at the origin. This should all be very reminiscent of Gauss’s Law and fields outside of
cylinders or spheres.
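The two regional results, equations 7.55 and 7.56, combine into one piecewise function of r. The following is a minimal sketch under the same assumption of a uniform current density; the function name and the sample current and radius are illustrative.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def B_thick_wire(I, R, r):
    """Tangential field magnitude a distance r from the axis of a long
    straight wire of radius R carrying a uniformly distributed current I."""
    if r < R:
        # Region I: only the fraction r^2 / R^2 of the current is enclosed.
        return MU_0 * I * r / (2.0 * math.pi * R * R)
    # Region II: all of the current is enclosed.
    return MU_0 * I / (2.0 * math.pi * r)

# Hypothetical wire: 20 A in a wire of radius 2 mm.
I_wire, R_wire = 20.0, 0.002
B_half = B_thick_wire(I_wire, R_wire, 0.5 * R_wire)     # halfway to the surface
B_surface = B_thick_wire(I_wire, R_wire, R_wire)        # maximum, at r = R
B_double = B_thick_wire(I_wire, R_wire, 2.0 * R_wire)   # one radius outside
```

The field halfway to the surface and the field at twice the radius are both half of the surface value, which is the linear rise and 1/r fall-off described above, and the two branches agree at r = R.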
We crudely plot the field as a function of r in figure 7.11. Remember, the field circulates
around the current (density) in a clockwise direction as determined by the right hand rule.
We could, of course, do more complicated problems now that have this symmetry as
long as we can figure out how to do the integrals (or otherwise figure out the amount of
current that passes through C) on the right hand side of Ampere’s Law. The left hand
side is always the same. Variations include: Finding the field in a thick cylindrical shell
carrying a current I; a coaxial cable; a thick wire with a cylindrical hole, a thick wire with a
current density that is not uniform. The latter is particularly relevant for alternating currents
– when an alternating current is sent through a thick wire the current is not uniformly
distributed, it tends to concentrate near the surface and die off in the middle. This has
implications for computing the resistance and actually affects the design of high voltage
power transmission lines and wave guides.
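For a current density that is not uniform, the only new work is the integral on the right hand side. The sketch below does that integral numerically over rings of area $2\pi r'\,dr'$ and then uses the unchanged left hand side $B_t 2\pi r$. The linear profile $J(r) = J_0 r/R$ is a made-up example chosen because its closed form, $B_t = \mu_0 J_0 r^2/3R$ inside the wire, is easy to check; it is not meant as a model of any real alternating current distribution. All names and values are assumptions.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def enclosed_current(J, r, n=10000):
    """Current through a disk of radius r, for a current density J(r')
    that depends only on distance from the axis (midpoint rule)."""
    dr = r / n
    total = 0.0
    for k in range(n):
        rk = (k + 0.5) * dr
        total += J(rk) * 2.0 * math.pi * rk * dr  # ring of area 2 pi r' dr'
    return total

def B_from_J(J, R, r):
    """Tangential field at radius r for a wire of radius R with density J(r')."""
    r_inside = min(r, R)  # no current flows beyond the wire surface
    return MU_0 * enclosed_current(J, r_inside) / (2.0 * math.pi * r)

# Hypothetical wire: radius 1 cm, density rising linearly to J0 at the surface.
J0, R = 1.0e6, 0.01
J_linear = lambda r: J0 * r / R
B_inside = B_from_J(J_linear, R, 0.5 * R)
```

Passing a constant function for `J` reproduces the uniform-wire results of equations 7.55 and 7.56.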
The solenoid pictured above in figure 7.12 is a classic problem in magnetism – it is (as we
will see) the moral equivalent of a capacitor for the storing of magnetic energy. A solenoid
is also our ideal model for “permanent magnets” as well as electromagnets of all flavors.
[Plot of B versus r: the field peaks at $\mu_0 I / (2\pi R)$ at r = R.]
Figure 7.11: B(r) for a long thick wire of radius R carrying a current I. Note that the field
increases linearly inside of the wire and reaches a maximum value on the surface of the
wire. Outside it drops off like 1/r. Although the field is continuous, its derivative (slope) is
not; it jumps at r = R.
Figure 7.12: A cross-sectional view of an infinitely long solenoid with n turns per unit length,
cross-sectional area A, carrying current I in each turn. The field both inside and outside of
the solenoid is parallel to the axis of the solenoid (from symmetry), leading to the Amperian
Path shown.
chunk. We only get a contribution from the side of length b inside the solenoid. That is:
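$$\oint_C \vec{B} \cdot d\vec{\ell} = B_z b + 0\,(\text{left}) + 0\,(\text{top}) + 0\,(\text{right}) = \mu_0 I_{\text{thru } C}$$
$$B_z b = \mu_0 n b I$$
$$B_z = \mu_0 n I = \frac{\mu_0 N I}{L} \tag{7.57}$$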
where we computed the total current through C by multiplying the number of turns per unit
length by the length of C through which the turns passed times their current.
Note well that this tells us that the field is zero outside of an ideal solenoid – all magnetic
field lines are confined to live inside the solenoid tube and none can escape to the outside.
It also tells us that the field inside is uniform – there is no dependence of the answer on
any spatial coordinates, so it doesn’t vary with coordinates beyond being non-zero on the
inside and zero on the outside.
The final form is given as you might use it for a solenoid with a finite number of turns N
and of finite length L, where (recall) L needs to be much larger than the radius or diameter
of the solenoid and where we are finding the field not too near the ends. Usually we will
idealize even finite size solenoids as having the field of an infinite solenoid inside, and will
neglect end effects. That is, we will assume that the field is uniform but drops to zero
“instantly” at the solenoid ends. Of course this isn’t physical, but the field does drop off
very rapidly at the ends, so it is a good approximation once again, as was neglecting fringe
fields for capacitors (the moral equivalent in the electrostatic case).
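How good the infinite-solenoid idealization is can be estimated with a short sketch. It treats a finite solenoid as N separate circular loops and adds up their on-axis fields, assuming the standard on-axis result for a single loop of radius R, $B = \mu_0 I R^2 / 2(R^2 + z^2)^{3/2}$, which is brought in here as an assumption rather than derived. The function names and dimensions are illustrative.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def loop_on_axis(I, R, z):
    """On-axis field of one circular loop of radius R, a distance z from
    its center (standard result, assumed here)."""
    return MU_0 * I * R * R / (2.0 * (R * R + z * z) ** 1.5)

def solenoid_on_axis(N, L, R, I, z):
    """On-axis field at position z (measured from the middle) of N evenly
    spaced loops filling a solenoid of length L."""
    spacing = L / N
    total = 0.0
    for k in range(N):
        z_loop = -0.5 * L + (k + 0.5) * spacing
        total += loop_on_axis(I, R, z - z_loop)
    return total

# Hypothetical solenoid: 1000 turns, 1 m long, 2 cm radius, 2 A per turn.
N, L, R, I = 1000, 1.0, 0.02, 2.0
B_ideal = MU_0 * (N / L) * I                       # equation 7.57
B_center = solenoid_on_axis(N, L, R, I, 0.0)       # middle of the solenoid
B_end = solenoid_on_axis(N, L, R, I, 0.5 * L)      # right at one end
```

For a solenoid this long and thin, `B_center` falls short of `B_ideal` by only a small fraction of a percent, while `B_end` is close to half of it, so the field really does fall off over a short distance near the ends.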
That was certainly very easy compared to any sort of Biot-Savart Law integration. The
latter can be done with some work, but it isn’t easy and requires more calculus than you
are likely to have so far; maybe some day in a future class you’ll do it.
Simple, easy or not, the solenoid is an enormously useful and important example, so
Figure 7.13: A cross-sectional view of a toroidal solenoid with N turns, and a rectangular
cross-section with inner radius a, outer radius b, and height h, carrying current I in each
turn. The field inside the solenoid is concentric to the vertical axis of the torus (from
symmetry and the right hand rule), leading to the Amperian Path shown.
In figure 7.13 above a toroidal100 solenoid is drawn. The particular one we will look at
has a rectangular cross-section although (as we will see) this doesn’t really matter as far
as finding the field in all of space is concerned – any uniform cross-sectional shape (such
as a circle or ellipse or outline of Homer Simpson) would do. We choose a rectangle with
nice coordinates mostly to make it easy to compute the self-inductance of this solenoid
next week, not because it matters this week and this way we can just reuse the figure as
well as the Ampere’s Law result.
The wires in the figure (drawn on the left) have to be visualized wrapping the whole
torus (fairly tightly). If one lays one’s right hand thumb mentally along the direction of
the current in each leg of a loop around the torus, you can easily convince yourself that
each wire produces a field nearby that is generally cylindrically “around” the torus in the
direction given by laying your thumb in the direction of the inside wires, the ones closest
to the z-axis of symmetry. In this case the B-field is counterclockwise, then, viewed from
our perspective above, and our Amperian Path (along which the field should be constant
in magnitude and tangent to the path or anything you like and perpendicular) is a circle of
radius r.
100
Wikipedia: http://www.wikipedia.org/wiki/Torus. A torus is a “doughnut shape”, usually with a circular cross
section.
We locate the circle inside the solenoid at first. Ampere’s Law then gives:
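$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 I_{\text{thru } C} = \mu_0 \int_{S/C} \vec{J} \cdot \hat{n}\, dA$$
$$B_t\, 2\pi r = \mu_0 N I$$
$$B_t = \frac{\mu_0 N I}{2\pi r} \tag{7.58}$$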
where we discover that the current “through C” is just the current in a single wire times the
number of wires but only when the curve C lies inside the torus! For circles C outside of
the torus the current through any surface bounded by C is zero, as every wire goes (at
best) into the surface one time and right back out one time.
Our conclusion is that the toroidal solenoid confines the magnetic field to live inside
the torus, and the geometry of the field causes it to drop off like 1/r! How useful! How
interesting! Solenoids in general seem to like to trap magnetic field lines and keep them
from escaping. If we bend them around in curves, they keep the field inside (and cause
it to vary by getting weaker on the outside edges of the curves). If we wrap them back
into themselves (making a torus or a topological knot of some sort) then the magnetic field
cannot get out into the room and remains confined to the inside of the coil.
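Equation 7.58, together with the zero field outside, can be written as one small function. This is a sketch for a field point in the mid-plane of an ideal, tightly wound torus with inner radius a and outer radius b; the function name and the sample winding are illustrative assumptions.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def B_toroid(N, I, a, b, r):
    """Tangential field a distance r from the symmetry axis of an ideal
    toroidal solenoid with N turns, inner radius a and outer radius b.
    The field point is assumed to lie within the height of the windings."""
    if a < r < b:
        # An Amperian circle inside the torus is threaded once by every turn.
        return MU_0 * N * I / (2.0 * math.pi * r)
    # Outside, each wire crosses any surface bounded by C once in and once out.
    return 0.0

# Hypothetical toroid: 500 turns, 1.5 A, inner radius 5 cm, outer radius 8 cm.
B_inner = B_toroid(500, 1.5, 0.05, 0.08, 0.060)
B_outer = B_toroid(500, 1.5, 0.05, 0.08, 0.075)
B_hole = B_toroid(500, 1.5, 0.05, 0.08, 0.020)
```

The ratio `B_inner / B_outer` equals 0.075/0.060, showing the 1/r fall-off across the windings, and `B_hole` is zero.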
This property will turn out to be very useful next week when we consider making induc-
tors out of solenoids, as a toroidal solenoid will have the helpful property of having very
little mutual inductance with nearby current loops, where finite length regular solenoids pro-
duce a pesky “fringe field” at their ends that can induce unwanted voltages in conductors
or loops close to those ends.
If you look inside a computer or other electronic device, you will usually see a few
toroidal inductors soldered into the motherboard, and that is exactly why they are shaped
the way they are shaped – it is very “bad” for computer motherboards to pick up inductive
signals from processes that have nothing to do with their function, especially if the voltages
involved approach the threshold that can trigger flips and flops in its enormously complex
bit processing structure.
In figure 7.14 we see our final example, an infinite conducting sheet of negligible thickness
(exaggerated in the picture) carrying a uniform current per unit transverse length into the
paper. We then follow a familiar ritual. Every point is in the middle of an infinite sheet, so
our picture is located in the middle. If we flip the picture over (maintaining the direction
of the current into the paper) the field has to be the same, so we know that the field has
to have the same magnitude equal distances above and below the plane. We know that
the picture has mirror symmetry around any vertical line. We know that there is as much
current to the right of that line (which produces a field with an upward directed component
above the sheet) as there is to the left of the line (which produces a field with a symmetric
downward directed component), so our right hand tells us that the only possible direction
for the field is to the right parallel to the sheet above it, and to the left parallel to the
sheet below it. A sensible Amperian Path is then a rectangle symmetric about the sheet
Figure 7.14: A side view of an infinite sheet of conductor carrying a current (per unit length)
λ into the page. The field due to the sheet is symmetric above and below the sheet as drawn,
and must point parallel to the sheet because every point is in the middle of the infinite
plane (as usual). Any up-down asymmetry would violate mirror symmetry about that “middle”
because the problem would not change but the solution would. This leads us to the
Amperian Path shown, which should remind you of that of the infinite solenoid, with sides
perpendicular to the field.
with sides perpendicular to the field and ends parallel to it, traversed in the right handed
direction as shown.
It is now simple to apply Ampere’s Law, as we get no contribution from the sides of C
and equal positive contributions from the upper and lower legs of C:
$$\oint_C \vec{B} \cdot d\vec{\ell} = \mu_0 I_{\text{thru } C}$$
$$2 B_{\parallel} b = \mu_0 \lambda b$$
$$B_{\parallel} = \frac{\mu_0 \lambda}{2} \tag{7.59}$$
where $B_{\parallel}$ is the magnitude of the component of $\vec{B}$ parallel to the sheet a distance y/2
above or below it. Of course we note that this field doesn’t depend on y so the field above
and below the sheet is uniform to the right and left respectively.
There is a bit of insight to be gained from thinking about two sheets, one carrying cur-
rent in, one carrying current out, separated by a distance d. In this case the superposition
principle suggests that the field above the two sheets and below the two sheets will be
zero, as the contributions from the two sheets cancel. In between, though, they add to a
total magnitude of:
$$B_{\parallel} = \mu_0 \lambda \tag{7.60}$$
If we imagine that λ is made up of a lot of very closely spaced single wires each
carrying some current I, then you can see that:
$$\lambda = n I \tag{7.61}$$
or, the number of wires per unit length times the current per wire equals the amount of
current per unit length. The field in between is thus:
$$B_{\parallel} = \mu_0 \lambda = \mu_0 n I \tag{7.62}$$
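The superposition argument for the two sheets can be sketched directly from equation 7.59. The code below assumes horizontal sheets, takes the x-direction to be "to the right" in the picture, and uses a positive λ for current into the page. The function names, the sheet separation, and the values of n and I are illustrative assumptions.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space, in T m / A

def sheet_Bx(lam, y_sheet, y):
    """Rightward field component at height y due to an infinite sheet at
    height y_sheet carrying current per unit length lam (positive = into
    the page): to the right above the sheet, to the left below it."""
    if y > y_sheet:
        return 0.5 * MU_0 * lam
    if y < y_sheet:
        return -0.5 * MU_0 * lam
    return 0.0

def two_sheets_Bx(lam, d, y):
    """Upper sheet at +d/2 carrying lam into the page, lower sheet at -d/2
    carrying the same current per unit length out of the page."""
    return sheet_Bx(lam, 0.5 * d, y) + sheet_Bx(-lam, -0.5 * d, y)

# Hypothetical numbers: 1000 wires per meter at 2 A each, sheets 5 cm apart.
n, I, d = 1000.0, 2.0, 0.05
lam = n * I                               # equation 7.61
B_between = two_sheets_Bx(lam, d, 0.0)    # midway between the sheets
B_above = two_sheets_Bx(lam, d, 0.10)     # above both sheets
B_below = two_sheets_Bx(lam, d, -0.10)    # below both sheets
```

The contributions cancel above and below the pair, and between the sheets they add to a magnitude of $\mu_0 \lambda = \mu_0 n I$, the same value found for the ideal solenoid in equation 7.57.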
Yes, this week is long enough, and has enough content, that it is worth a bit of a wrap-up
at the end. We have covered one and a half Maxwell equations, after all!
At this point you should be aware that unless and until somebody positively discovers
magnetic monopoles in an experimentally reproducible setting so that everybody agrees
that they are real (and ideally, learns enough about them to incorporate them into our
general picture of physics) Gauss’s Law for magnetism will tell us that magnetic field lines
produce no net flux through a closed surface S and consequently must form closed loops
in space.
The Biot-Savart Law for currents tells us how to compute the magnetostatic field pro-
duced by a steady-state current distribution, if we can manage the complexity of dealing
with vectors, cross-products, and multivariate integral calculus simultaneously.
The “Heaviside” form for the magnetic field of a point charged particle q travelling at
some velocity $\vec{v}$ (with v ≪ c), consistent with the Biot-Savart Law, led us to some serious puzzles,
enough to make us doubt the consistency of classical physics itself. For one thing, we
were able to show that the interaction forces between two charged particles interacting
with this field violated Newton’s Third Law and hence (apparently) the Law of Conservation
of Momentum for the pair! For another, Biot and Savart only obtained their experimental
law by studying steady state currents, and a charged particle exists only at a single point
in space and isn’t smeared out into a “continuous” current; we effectively assumed that the
magnetic field propagates instantaneously from the moving charge in the form we wrote
down, and as it will turn out, this is incorrect.
This was apparent when we thought about the appearance and disappearance of magnetic
fields (and hence magnetic forces) when we do nothing but change the inertial reference
frame in which we view charge-current distributions. If the electric field and force
(and possibly more) don’t change at the same time, changing reference frames could
produce distinct physical realities with different forces, accelerations, and hence trajectories!
Finally, we obtained with some hand-waving from the Biot-Savart Law a new equation
we called Ampere’s Law after its discoverer. Unfortunately it inherits a flaw from our sort-
of-derivation – it is essentially a static result, good only for steady-state currents (like the
Biot-Savart Law itself). We did find Ampere’s Law to be remarkably useful for finding the
static (i.e. at most “slowly” varying in time) magnetic field produced by suitably symmetric
static current distributions, but we are, or should be, a bit worried about consistency be-
cause (hint hint) the “current through the closed curve C” that it explicitly references seems
as though it can mean nothing but the flux of the current density through some open sur-
face S bounded by that closed curve, but there are an infinite number of these surfaces
and we (should) have the uncomfortable feeling that the current we obtain depends on the
surface chosen where it really shouldn’t.
An invariant form of the current – one that one could prove does not depend on the
surface chosen – would be much better, especially if it still gives us the usual static result
where it should, but what physical principles or insight might lead us to such an invariant
form?
Ah, puzzles in abundance! Things are finally getting interesting! This is a good thing,
as reality is undeniably rather complex and if the electric and magnetic force were too
simple they could not sustain the complexity we see every time we, well, see. This seems
like a good time to wrap up electrostatics and magnetostatics and move on to electric
and magnetic field dynamics.
We’ll begin by trying to understand a puzzle that we haven’t really faced until now.
Magnetic forces are by definition always exerted at right angles to the direction of motion
of a charged particle or moving current. This means that magnetic forces do no work
on isolated classical charged particles (with no intrinsic magnetic moment), because work
requires a force component in the direction of motion. Next week we will study what at
first glance then seems like a paradox – cases where magnetic fields clearly appear to
do work – and then resolve the paradox by concluding instead that magnetic fields under
some circumstances create electric fields, and electric fields have no difficulty at all doing
work on charged particles!
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the
problem(s) it was key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
[Figure: labels d, a, I1, I2]
Problem 3.
Using Ampere’s Law, find the magnetic field in all space produced by:
b) Two cylindrical conducting shells carrying opposite currents (each equal to I in mag-
c) A solenoid with N turns and length L carrying current I in each turn (inside only, far
from the ends).
e) An infinite plane sheet of current into the paper (above and below the sheet).
This more or less exhausts the list of possible problem types where one can find the mag-
netic field using Ampere’s Law. Most were examples in lecture, so this forces you to reca-
pitulate on your own what you saw presented there.
Problem 4.
Problem 5.
Based on the analogy between electric and magnetic dipoles, deduce the probable form of
the magnetic field of a spherical ball of charge Q, mass M , and radius R that is rotating at
angular velocity Ω on a) its axis of rotation; b) at a point in the plane that passes through the
ball perpendicular to the axis of rotation; in both cases far from the ball of charge, that is,
for z ≫ R and x ≫ R for a ball spinning around the z axis. Note that it is quite a bit of work
to actually derive this result (though it can be done). This is part of the point of multipolar
expansions – once one knows the form of the field for any given multipolar moment, one
merely has to compute that moment for a given charge-current density to discover the (far)
field “for free”.
Problem 6.
[Figure: labels Jin, R/2, R]
A cylindrical conductor of radius R aligned with the z direction has a cylindrical hole of
radius R/2 centered at x = R/2 also aligned with the z direction. The conductor carries
a current density $\vec{J} = J\hat{z}$ (and obviously $\vec{J} = 0$ in the hole). Find the magnetic field at all
points inside the hole.
Problem 7.
a) Find the $\vec{B}$-field on the z axis of a circular current loop of radius a and N turns
carrying a current I in the x − y plane (centered on the origin).
b) Set up the integral to be done to find the $\vec{B}$-field on the z axis of a disk in the x − y
plane of uniform charge density σ and radius a that is rotating with angular frequency
Ω around the z axis. (A) Do this integral (requires integration by parts a couple of
times).
Problem 8.
[Figure: B field]
Show that a uniform magnetic field that has no fringing field violates Ampere’s law.
Use a rectangular closed curve C that lies partly inside, and partly outside, the region of
confined field. Then explain why this does not apply to the uniform field inside a solenoid,
which goes “sharply” to zero as one crosses the current in the solenoid loops inside to
outside.
Problem 9.
[Figure: labels L, y, I]
A square loop of wire lies in the x − y plane centered on the z axis and carries a current
I. It has side length L. Find the magnetic field at an arbitrary point on the z axis, and show
that in the limit z ≫ L it gives an expected result in terms of the magnetic moment mz of
the loop. Note that this problem is “simple” – just a repeated use of the field of a straight
segment of wire – but visualizing the geometry in terms of the givens is not simple and is
the object of the exercise. So draw a very good, very large picture! Or several! Visualize!
[Figure: labels I, R/2, z, R, y]
(A) A pair of Helmholtz coils is made up of two loops of wire with N turns and radius
R carrying a current I per turn. They both are concentric with the z axis with centers at
z = ±R/2. Show that at z = 0: $\frac{dB_z}{dz} = 0$ and $\frac{d^2 B_z}{dz^2} = 0$. This means that the magnetic field
Find the magnetic field on the axis of a uniform disk of charge with radius R, mass M ,
and charge Q. This should be fairly easy to set up at this point and everybody ought to
be able to do it. The resulting integral over r, however, will require integration by parts to
solve, in particular
$$\int \frac{r^3\,dr}{(r^2 + z^2)^{3/2}}$$
If you make u = r^2 and v = (r^2 + z^2) this is pretty easy, but it is still a bit difficult for
non-majors. A good challenge problem, though, for non-majors who want to improve their
math and physics skills!
Express the answer in terms of the magnetic moment of the disk (computed previously)
and show that its limiting form as z ≫ R is that of a dipole. Note also that this is the spinning
disk that you demonstrated would precess in the previous chapter when placed in a strong
field! At this point you should understand spinning disks of charge (as dipoles) pretty well!
IV: Electrodynamics
Week 8: Faraday’s Law and
Induction
• Suppose a conducting bar moves through a field at right angles to the field lines and
the alignment of the bar. Magnetic forces quickly push charges to the two ends until
an electric field is created whose force balances the magnetic force. The integral of this field is
called a motional potential difference.
• Suppose now that a rectangular wire loop is pushed into (or pulled out of) a uniform
field that terminates at an edge (perhaps generated by a solenoid with a slot in it).
We note that the field now pushes charges around the loop in agreement with the
motional potential difference and that the net magnetic force on the current carrying
wire resists the push into (or pull out of) the field.
• We consider a conducting rod on rails as it slides through such a field. We can see
that the induced/motional potential difference is equal to the time rate of change of
the product of the field and the area the field occupies within the rectangle.
• Time for our final Maxwell equation. If the magnetic field flux through an open surface
S bounded by a closed curve C varies in time it induces an electric field dynamically
around the closed curve according to Faraday’s Law :
$$\oint_C \vec{E} \cdot d\vec{\ell} = -\frac{d}{dt} \int_{S/C} \vec{B} \cdot \hat{n}\,dA$$ (8.1)
The integral on the left is the induced voltage around the curve C.
• In this equation the minus sign is called Lenz’s Law and tells us that the induced
voltage decreases around the loop in the direction such that a flow of positive charge
in that direction (the induced current if the loop is a conducting pathway) will oppose
the change in the varying flux. If the flux is decreasing it will generate a magnetic
moment that points in the direction that will increase it. If it is increasing it will gener-
ate a magnetic moment that points in the direction that will decrease it. This causes
the opposition to motion noted in the motional voltage problems above.
• The flux through a conducting loop is directly proportional to the current through the
loop itself or to the current through nearby sources of magnetic field that produce the
flux. The constant of proportionality in either case depends solely on the geometry
of the loop and source(s). That is, given a bunch of loops:
$$\phi_i = \sum_{j \ne i} M_{ij} I_j + L_i I_i$$ (8.2)
where the Mij are called the mutual inductances between the ith and jth loops and
Li is the self inductance of the ith loop.
• From this we can compute the self-induced (loop) voltages for simple current-carrying
loops, in particular solenoids. To compute the self-inductance of a solenoid we begin
with the result for the magnetic field inside an ideal solenoid from Ampere’s Law:
$$B = \frac{\mu_0 N I}{L}$$ (8.3)
(parallel to the solenoid axis). The current I creates a flux per turn that is equal to:
$$\phi_t = BA = \frac{\mu_0 N A I}{L}$$ (8.4)
where A is the cross-sectional area of the solenoid. The total flux is thus:
$$\phi = NBA = \frac{\mu_0 N^2 A I}{L} = L_s I$$ (8.5)
where Ls is the self-inductance of the solenoid. Clearly:
$$L_s = \frac{\mu_0 N^2 A}{L}$$ (8.6)
which depends only on the geometry of the solenoid just as the capacitance of an
arrangement of conductors depended only on their geometry.
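The chain of results (8.3) through (8.6) can be checked numerically with a short sketch like the one below. The function names and the sample solenoid (500 turns, 10 cm² cross-section, 25 cm long, 2 A) are hypothetical, chosen only to illustrate the formulas, and the solenoid is assumed ideal (long compared to its radius, no end effects).

```python
import math

# Permeability of free space in T*m/A (classical value, assumed for this sketch).
MU_0 = 4.0e-7 * math.pi


def solenoid_total_flux(n_turns, area, length, current):
    """Total flux through all N turns of an ideal solenoid, following (8.3)-(8.5)."""
    b_field = MU_0 * n_turns * current / length  # (8.3)
    flux_per_turn = b_field * area               # (8.4)
    return n_turns * flux_per_turn               # (8.5)


def solenoid_self_inductance(n_turns, area, length):
    """Self-inductance L_s = mu_0 N^2 A / L from (8.6); geometry only."""
    return MU_0 * n_turns**2 * area / length


# Hypothetical example values, not taken from the text.
N, A, length, I = 500, 1.0e-3, 0.25, 2.0
L_s = solenoid_self_inductance(N, A, length)
print(f"L_s = {L_s:.4e} H")
print(f"flux = {solenoid_total_flux(N, A, length, I):.4e} Wb, L_s * I = {L_s * I:.4e} Wb")
```

Note that the current appears only in the flux, never in L_s itself, which is the numerical version of the statement that the self-inductance depends only on geometry.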
Last week we saw that our study of the sources of the magnetic field, together with the
Lorentz force law, is starting to raise red flags concerning the consistency of electromagnetic
theory. One victim of the developments so far is Newton’s Third Law 101, directly
violated by magnetic forces between moving charged particles! We noted that there is
some sort of hidden problem with Ampere’s Law – that you may or may not have figured
out on your own from the hints – but it seems as though it might have something to do with
dynamics and using a current that is invariant when we make an arbitrary choice of “the
surface S bounded by a closed curve C”.
In addition, the magnetic fields that appear when we change reference frames in which
we view even isolated charges did not appear to lead to Newton’s Second Law being
invariant under Galilean frame changes as well! Up to now, we have absolutely relied on
Newton’s Second Law and its frame-invariance as the basis for all of our dynamics, so
this is very disconcerting! Worse, when we studied the Lorentz force law, we learned that
magnetic forces, by their very nature and defining equation, can do no work on isolated
charged classical (spinless) particles or, by extension, electrical currents!
This week we will begin by looking at a very simple scenario in which it will certainly look
algebraically like magnetic fields do work and confront the puzzle: How is this possible?
[Figure 8.1 labels: Fm, Fe, q, v, L, B (in)]
Figure 8.1: A conducting rod of length L moving through a uniform magnetic field into
the page. The field polarizes the free charge in the rod until a region of crossed fields is
produced.
To see the nature of the difficulty, consider a conducting rod of length L moving through
a uniform magnetic field at right angles to the field as shown in figure 8.1. The rod is, of
course, made up of many, many microscopic point charges, and as the rod moves to the
right at velocity $\vec{v}$ in the magnetic field, all of those charges experience a magnetic force
(according to the Lorentz Force Law that we learned two weeks ago).
Because it is a conductor, it has an “inexhaustible” supply of free charge that can move
within the conductor under the influence of this force while the equal and opposite charge
of the presumed neutral conductor is pushed the other way. We will assume that the free
charges have magnitude q (which might be positive or negative) – none of what we work
out will depend on its sign or whether only one or both charge flavors are free to move.
101 One can “sort of” rescue it by insisting that it only holds for force pairs directed along the line connecting
two particles and not forces with nasty right-angled cross-products in abundance inside, but that both begs the
question – sure, it holds except where it doesn’t – and prevents us from sensibly setting out to rescue it and
thereby the Law of Conservation of Momentum by inventing a field theory where fields can carry energy and
momentum.
The magnetic force on any given charge in the rod is, of course:
$$\vec{F}_m = q(\vec{v} \times \vec{B})$$ (8.12)
which is up for any e.g. positive conduction charge that appears to be moving to the right
at velocity $\vec{v}$. We therefore expect the magnetic field to push free charge up until it reaches
the end of the rod, where a surface potential holds it in (the vacuum beyond is basically
an insulator, if you like). Every charge that migrates to the upper end leaves behind a
“hole” (ion of the opposite charge in the lattice) and following the exact same reasoning we
used in our study of the Hall Effect, we conclude that these negatively charged “holes” will
migrate (via backfilling) until they are located at the lower end, at which point there is no
charge available to backfill them.
The charge in the rod therefore polarizes, creating a net negative charge at one end
and a net positive charge at the other end that create an electric field in between pointing
from the top end to the bottom one. Charge will move until the remaining free charge in the
rod in between the ends experiences no net force when the electric and magnetic forces
balance – figure 8.1 shows an intermediate state where they don’t yet balance and charge
is still moving around. Ultimately, the rod spontaneously forms a region of crossed fields,
exactly the same way it spontaneously formed in the case of the Hall Effect, only now there
is no current; the forces that balance are brought about solely by the motion of the rod
through the stationary, uniform magnetic field!
With this insight, we can easily deduce the condition for force balance for the charges
in the rod proper:
$$\vec{F}_m + \vec{F}_e = 0$$ (8.13)
or (since they are in opposite directions and the motion is at right angles to the magnetic
field)
qvB = qE (8.14)
or the magnitude of the electric field that is generated in the polarized rod is given by
E = vB. This field, in turn, creates an electric potential difference between the ends of the
rod:
∆V = L · E = (vL) · B (8.15)
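To get a feel for the sizes involved in (8.14) and (8.15), here is a minimal sketch. The function names and the sample values (a 20 cm rod moving at 3 m/s through a 0.5 T field) are hypothetical, and the motion is assumed to be at right angles to both the field and the rod, as in the figure.

```python
def motional_field(speed, b_field):
    """Electric field magnitude in the polarized rod at force balance, E = vB (8.14)."""
    return speed * b_field


def motional_potential_difference(speed, b_field, rod_length):
    """Potential difference between the ends of the rod, Delta V = L E = v L B (8.15)."""
    return motional_field(speed, b_field) * rod_length


# Hypothetical example values, not taken from the text.
v, B, L = 3.0, 0.5, 0.2  # m/s, tesla, meters
print(f"E = {motional_field(v, B):.3f} V/m")
print(f"Delta V = {motional_potential_difference(v, B, L):.3f} V")
```

The charge q never enters: it cancels out of the force balance, so the result does not depend on the sign or size of the mobile charges.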
If we were to somehow construct a conducting pathway between the ends of the rod,
we would expect current to flow, and naively at least we would expect it to be driven by
the magnetic force on the positive conduction charges that push them up in the rod and
hence around in a loop if we build that conducting pathway, even though we know that the
magnetic force cannot be doing any work! This, of course, is the paradox – if the magnetic
field isn’t doing any work, but work is being done, what is?
To make the problem we are confronting absolutely clear, please note that in the figure
above, we are examining what happens in a frame of reference in which the rod moves
through a static, uniform magnetic field. Let’s imagine that we have changed reference
frames – mentally jumped onto the rod so that we and the rod are now at rest and the
uniform magnetic field is sweeping past us the opposite way. This is portrayed in figure
8.2.
[Figure 8.2 labels: L, E pol, E ind, v, B (in)]
Figure 8.2: A stationary (white) conducting rod of length L sits inside a magnet producing
a uniform magnetic field (shaded grey) moving in the opposite direction at speed v. We
must observe the same charge polarization and electrostatic field in the rod observed in
figure 8.1 as this is a trivial inertial frame change! But in this frame there is no magnetic
force acting on the stationary charges in the rod! We are left with little choice but to imagine
that an equal-but-opposite electric field has appeared inside the moving magnetic field.
In this case we have no reason to think that there should be a magnetic force on
charges in the rod at all! They are all at rest in this frame of reference, and the magnetic
field they sit in isn’t varying at the location of the rod, it is constant in magnitude
and direction! Yet things like the observed distribution of charge (at least!) in the frame
where the rod is stationary have to agree with the distribution of charge in the frame in
which the rod moves. Physical reality itself cannot change along with our point of view; the
charges are where they are (at the ends of the rods) no matter the frame we look at the
rod in!
Even in this stationary frame, then, the charge in the rod has apparently polarized in
such a way that the total electric field inside the rod is zero (as charges there are no longer
moving and there is certainly no magnetic force acting on them). We know that the electric
field generated by the polarization charges has not changed – it is still Epol = vB pointing
down in the stationary rod.
We can explain this – I’m tempted to say only explain this but that is likely an overreach
of our logic – by asserting that an induced electric field $\vec{E}_{\text{ind}}$ of exactly the same magnitude,
pointing up, has appeared in space outside of the rod but inside of the moving magnet,
in such a way that $\vec{E}_{\text{ind}}$ (up) and the electrostatic field $\vec{E}_{\text{pol}}$ (magnitude vB, down)
produced by the polarization of the rod cancel inside the rod – as they must!
If there is no possible way for a magnetic force to be exerted on the stationary charges
in the rest frame of the rod, the only remaining force that (as far as we know) acts on
charge per se at all is an electric force. A consistent explanation, however odd it might
seem at first, is that the motion of the rod through the magnetic field, when viewed in the
frame of the stationary rod, has generated an external electric field from the bottom of the
rod towards the top! This field has acted exactly like an external field always does, and
created surface charge densities at the ends that polarize the rod until the internal field
but pointing up, not down, when seen in the rest frame of the rod. This, believe it or not,
is our first glimpse of a natural law that is one of the fundamental cornerstones of human
civilization in disguise – without it our lives would be far, far poorer.
By once again using our imagination to change our point of view to a different inertial
reference frame and using the expected invariance of the laws of physics when we perform
such a change in frame, we have discovered induction – the creation of electric fields from
magnetic fields subjected to a change in the frame of reference. We have a ways to go
before we completely understand this and can write the result down as our fourth and final
Maxwell equation, Faraday’s Law, but we can already see that it must be so as the result
beautifully resolves the paradox of “what does the work” on moving charges in a magnetic
field (which can do no work, yet work as we shall see in a moment is clearly done).
In the next section we will reconsider this rod when we do indeed provide it with an
idealized conducting pathway that allows current to flow. In the process, we will get a step
closer to a suitable general formulation of the underlying physical principle.
[Figure 8.3 labels: Fm, q, R, L, v, B (in), x, y]
Figure 8.3: A conducting rod of mass M and length L moving through a uniform magnetic
field into the page and sliding on frictionless conducting rails that are connected by a
resistor R outside of the magnetic field. Current can flow around the loop thus formed.
In figure 8.3 we have added a pair of frictionless conducting rails connected by a wire
outside of the magnetic field. The total resistance of the loop thus formed (including the
rod) is R. We have added an x coordinate to indicate the instantaneous position of
the rod, which is still moving to the right at speed v.
In the previous section we decided that while in the lab it looked as though there was a
magnetic force acting up on any given free charge q in the rod (which is now free to move
all the way around the loop as part of a “continuous” current I formed in the usual
coarse-grained limits we have now seen several times), in the frame of the rod itself there was
an external electric field of magnitude E = vB, generated as the magnetic field moved across
the rod in the opposite direction. In this frame, at least, this is what actually pushes the
charges along, doing work as needed. Of course this electric field now has to exist in the
entire conducting pathway as it has to push the charges along against the actual resistance
R, and we know that to properly ensure that the work-energy theorem is satisfied, we
should think not of the field, but of the potential difference produced by the field. The
magnitude of the potential difference induced across the rod in its own frame is clearly:
$$|\Delta V| = EL = BLv$$
The next thing to consider is the sign of this potential difference and the direction of the
induced current in the loop. This is very confusing; I will do my best to explain it, but you
will have to work through it carefully to understand it.
We can see that both the magnetic force and the induced electric force in the rod, at
least, are going to push an electrical current counterclockwise in the arrangement above
and we will see shortly that any other direction would badly violate energy conservation.
We drew the resistance R outside of the magnetic field for convenience in a frame where
its leg and the rails and the magnetic field itself are stationary and the rod is moving, but
we could have put the resistance anywhere in the loop – in the stationary rails inside of the
magnetic field, or in the moving rod itself inside of the magnetic field – and of course we
can also still “hop” mentally into the frame where the rod is stationary and it is the rails, the
resistor, and the magnetic field that are all moving!
Consider this last case and imagine the resistance R to be inside the rod, in the frame
where the rod is stationary. Counterclockwise current is then flowing from the bottom of
the resistive rod to the top, so (from Ohm’s Law) the bottom of the rod must be at a higher
potential than the top. This makes sense from the point of view of the induced electric field
in the frame of the rod, which is great!
Next, let’s consider the resistance to be in the top rail, in any frame. The current is still
counterclockwise, so the right hand side of the resistance is at higher potential than the
left! When the resistance is outside of the field in the vertical left-hand leg, the potential
at the top is greater than that at the bottom! When it is in the bottom leg, the potential is
higher to its left than its right!
We see that no matter where we put the resistor in the loop, the potential has to de-
crease when we traverse the circuit loop counterclockwise across the resistor! That means
that the “battery” produced by the induced voltage has to increase as we traverse the
entire circuit loop going the other way (clockwise), no matter where we imagine
that “battery” to be.
Clearly the induced voltage doesn’t appear only “in the rod”, either in the frame where
it is moving or the frame where it is stationary or any frame at all – it appears in the entire
loop all at once because the fields that produce it appear and disappear in different places
depending on the frame we choose, but observers in all frames have to agree about the
current in the wire, in both magnitude and direction, just as in our initial rod-only example
they had to agree on the rod polarization.
This is basically another appeal to physical invariance plus what we’ve learned about
resistances in series – in both frames we know that the magnetic force or induced elec-
tric force respectively must somehow push the charges around the loop counterclockwise
when v is to the right. All we have to do to ensure this is make up a sign convention for the
induced voltage so that this is true, and ensure that energy conservation is satisfied (that
is, ensure that Kirchhoff’s Loop Rule is satisfied in both frames).
Let’s take the direction of the magnetic field itself through the loop as the direction of
a unit vector normal to the loop, n̂. We will then use the Right Hand Rule as usual to
determine the positive direction around the loop by letting our thumb point in this direction
and noting which way our fingers curl around the loop (clockwise in this example). If we
traverse the loop clockwise, the potential across the resistor will increase by IR, no matter
where in the loop we place it. This means that the induced electric field has to circulate
around the loop counterclockwise (in agreement with what we concluded considering the
frame where the rod is stationary).
We can therefore write Kirchhoff’s Loop Rule for the loop, going clockwise, and putting
in an explicit integral for the potential produced by the induced electric field:
$$\Delta V_{\text{loop } C} = -\oint_C \vec{E}_{\text{ind}} \cdot d\vec{\ell} + IR = 0$$ (8.18)
$\vec{E}_{\text{ind}}$ goes around counterclockwise, so that the voltage integral around the loop is positive,
and we already evaluated it in the frame where the rod is stationary as having magnitude
BLv, so:
$$\Delta V_{\text{loop } C} = BLv + IR = 0 \quad \Rightarrow \quad I = -\frac{BLv}{R}$$ (8.19)
The minus sign in the current is relative to the positive direction around the loop – it tells
us that the current is counterclockwise! This is the only possible sign that can correctly
cause energy to be conserved as a charge is pushed around the loop without gaining or
losing net energy in a circuit; the charge has to gain energy from the induced field and lose
energy into Joule heating of the resistance, and the physically induced electric field has to
be parallel to the current to be able to drive charge through the resistor (which could be
anywhere in the circuit)!
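A small numerical sketch of (8.19) may help fix the sign convention. The function names and the sample values (B = 0.5 T, L = 0.2 m, v = 3 m/s, R = 1.5 Ω) are hypothetical; the only assumptions are the ones made above, namely that clockwise is the positive direction around the loop and that the whole resistance of the loop is R.

```python
def induced_current(b_field, rod_length, speed, resistance):
    """Signed loop current from (8.19), I = -BLv/R.

    Positive means clockwise (the positive direction chosen in the text), so a
    negative value means the current flows counterclockwise.
    """
    return -(b_field * rod_length * speed) / resistance


def joule_power(current, resistance):
    """Rate at which energy is dissipated as heat in the resistance, P = I^2 R."""
    return current**2 * resistance


# Hypothetical example values, not taken from the text.
B, L, v, R = 0.5, 0.2, 3.0, 1.5
I = induced_current(B, L, v, R)
direction = "counterclockwise" if I < 0 else "clockwise"
print(f"I = {I:.3f} A ({direction})")
print(f"Joule heating = {joule_power(I, R):.3f} W, induced voltage x |I| = {B * L * v * abs(I):.3f} W")
```

The last line compares the heat produced per second with the induced voltage times the magnitude of the current; the two agree, which is the energy bookkeeping the paragraph above describes.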
This last bit is the final thing we have to clear up. Where, exactly, is this $\vec{E}_{\text{ind}}$ electric
field induced? What is it (in detail) inside of the conductor? At the moment, it appears to
follow the resistor around as we change the location of the resistor and/or change frames.
And in fact, this is exactly where it is, at least in this simple example.
The electric field inside the conducting loop depends on the resistivity and current den-
sity associated with the entire conductive pathway, since we know that Ohm’s Law can be
written as:
$$\vec{E} = \vec{J}\rho$$ (8.20)
at all points inside the current carrying conducting pathway. Where ρ is zero, there is no
field at all. Where ρ is not zero, there must be a field pushing the charges through the
resistive conductor there. The rate at which that field does work equals the rate at which
heat appears in the resistor, and the only thing that can be creating that electric field is
something associated with the changing magnetic field that does not depend in detail on
the particular frame we watch the experiment in!
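To make the role of (8.20) concrete, here is a minimal sketch of a loop built from segments of different resistivity. Everything specific in it is a hypothetical illustration: the function names, the uniform cross-sectional area, and the sample loop (0.7 m of ideal conductor in series with 0.1 m of resistive material) are assumptions, not values from the text.

```python
def segment_fields(current, area, segments):
    """Field magnitude E = rho * J in each segment of a series loop.

    `segments` is a list of (resistivity, length) pairs; the cross-sectional
    area is assumed to be the same everywhere, so J = I / A in every segment.
    """
    current_density = current / area
    return [rho * current_density for rho, _length in segments]


def loop_voltage(current, area, segments):
    """Sum of E * length over all segments, i.e. the total drop I * R."""
    current_density = current / area
    return sum(rho * current_density * length for rho, length in segments)


# Hypothetical loop: ideal wire (rho = 0) plus one resistive stretch.
loop = [(0.0, 0.7), (1.5e-5, 0.1)]  # (ohm*m, m)
I, A = 0.2, 1.0e-6                  # amperes, square meters
print("fields (V/m):", segment_fields(I, A, loop))
print(f"total voltage drop = {loop_voltage(I, A, loop):.3f} V")
```

The ideal segment carries current with no field at all, while the resistive segment carries the entire voltage drop, wherever in the loop it happens to sit.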
The best that can be said, then, is that the field appears in the entire loop, not “across
the rod” or “across the resistor” (which isn’t even moving) or “along the rails” (which might
actually be a part of the net resistance, as might be the rod). This also means that the
induced electric field forms a closed loop. It is neither an electrostatic field nor what we
have called so far a conservative field!
This does not violate Gauss’s Law for Electrostatics – we can add any electric field
loops we like to the electrostatic field loops it describes and they will not contribute to the
net electric flux through any closed surface S – but it does make one of our rules for visual-
izing electric field lines obsolete. Electrostatic fields begin and end on electric charges, but
induced electrodynamic fields apparently can form closed loops, not beginning or ending
on any charge!
This does have a significant impact on how we write the electric potential associated
with the electric field. Recall that we defined a conservative force as one where:
$$\oint_C \vec{F} \cdot d\vec{\ell} = 0$$ (8.21)
for all closed loops C one can draw in space. The electrostatic field was conservative – if
we let $\vec{F} = q\vec{E}$ and factored and cancelled q, we got:
$$\oint_C \vec{E} \cdot d\vec{\ell} = 0$$ (8.22)
The induced electrodynamic field that appears in the loop, however, is not conservative!
It has a nonzero integral around the loop:
$$\Delta V_{\text{ind}} = \oint_C \vec{E}_{\text{ind}} \cdot d\vec{\ell} = BLv \ne 0$$ (8.23)
We recall that the whole point of a conservative field and its associated potential was that
$\vec{E} = -\vec{\nabla}V$ (encapsulating Newton’s Second Law) in cases where the work done going
around a closed loop didn’t depend on the path taken. This new result more or less means
that the work done does depend on the path taken, but in a very special way. It also
does indeed mean that $\vec{E}$ is no longer going to be equal to the negative gradient of the
electrostatic potential! We are going to get an additional piece that depends in some way
on the magnetic field and the loop itself!
My goodness, things are getting complicated! Perhaps it is time to make just two more
observations and then finish off this particular problem before coming back to the equation
that it seems to imply. The first observation is that (given constant B and L in the picture
above):
$$|\Delta V_{\text{ind}}| = \oint_C \vec{E}_{\text{ind}} \cdot d\vec{\ell} = BLv = \frac{d(BLx)}{dt}$$ (8.24)
(because $v = \frac{dx}{dt}$) and, noting that A = Lx is the area inside of the loop, we can write this
as
$$|\Delta V_{\text{ind}}| = \oint_C \vec{E}_{\text{ind}} \cdot d\vec{\ell} = \frac{d(BA)}{dt}$$ (8.25)
which is just begging to be turned into the flux of the magnetic field through the loop C:
$$|\Delta V_{\text{ind}}| = \oint_C \vec{E}_{\text{ind}} \cdot d\vec{\ell} = \frac{d\phi_m}{dt}$$ (8.26)
where:
$$\phi_m = \int_{S/C} \vec{B} \cdot \hat{n}\,dA$$ (8.27)
is the magnetic flux through the surface S bounded by the closed loop C.
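The step from BLv to the time derivative of the flux can be checked numerically. The sketch below differentiates the flux through the loop with a simple central difference and compares the result to BLv. The function names, the step size, and the sample motion x(t) = 0.1 + 3t (in meters and seconds) are hypothetical, and B and L are assumed constant as in the text.

```python
def flux(b_field, rod_length, x):
    """Flux through the rectangular loop, phi_m = B * A with A = L * x."""
    return b_field * rod_length * x


def emf_magnitude(b_field, rod_length, x_of_t, t, dt=1.0e-6):
    """|d(phi_m)/dt| estimated with a central difference of width 2*dt."""
    phi_after = flux(b_field, rod_length, x_of_t(t + dt))
    phi_before = flux(b_field, rod_length, x_of_t(t - dt))
    return abs((phi_after - phi_before) / (2.0 * dt))


# Hypothetical example: rod moving at a constant 3 m/s, starting at x = 0.1 m.
B, L, v = 0.5, 0.2, 3.0


def x_of_t(t):
    return 0.1 + v * t


print(f"numerical |d(phi)/dt| = {emf_magnitude(B, L, x_of_t, 1.0):.6f} V")
print(f"B L v                 = {B * L * v:.6f} V")
```

Because the flux is written as a function of time rather than of v, the same sketch would work unchanged for a rod that is speeding up or slowing down.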
The second is that if energy isn’t ultimately conserved, life is going to be bad for physics
students because magic102 and perpetual motion machines both become possible, and yet
we never seem to actually observe either one in nature. Nature is stable, not unstable the
way it would be if induced forces increased the very motion that induced those forces (to
make them increase even faster, with no source for the energy associated with the ever-
increasing force).
We’ve already seen that the potential around the loop has to increase when we go
counterclockwise in order to balance the rate that energy is removed from the loop by the
total resistance in Kirchhoff’s rule combined with Ohm’s Law. Eventually we’re going to need
to formalize this as a rule for the sign of the change in potential we get going around the
loop in any given direction. In order for us to be able to tell somebody far away about this
rule, we ought to make sure that it is based on the use of our right hands to determine loop
directions relative to something that uniquely orients the problem, such as the direction of
the magnetic field through the loop.
In the next section we will, as promised, take all of these observations and combine them
into a new physical law, and a very beautiful one it will turn out to be! But yeah, let’s finish
off this problem first. Of course you may be asking what problem, since I haven’t stated one
yet. How’s this: Let’s find everything about the rod sliding on rails in this system, assuming
only that it starts at time t = 0 moving at initial velocity v0 to the right. That is: I(t), v(t),
x(t) and so on, we’ll find it all! Time to use Newton’s Laws once again!
We begin with:
$$\Delta V_{\rm ind} - IR = 0$$
$$\oint_C \vec{E}_{\rm ind} \cdot d\vec{\ell} - IR = 0$$
$$BLv_x - IR = 0 \tag{8.28}$$
(where we have used the results of the first section to evaluate the total induced voltage in
the loop and where we’ve added the x subscript to v to make it clear that we are dealing
102
You might, if you are a science fiction and fantasy reader (and writer) like myself, think that it would be great
fun to live in a Universe where either one was possible. Think again. Life is unstable, chaotic, and whimsical
enough as it is with the negative feedback associated with the laws of thermodynamics; with unbounded
positive feedback loops possible at all, it seems rather likely that the Universe would simply explode instantly,
much the same way that positive feedback in an amplifier leads to an ear-shattering screech and (if the gain
is turned up enough) blown fuses. We wouldn’t want to live in a Universe with a blown fuse now, would we?
with $\tau = mR/B^2L^2$ the exponential time constant of the rod’s velocity as it slows down. A
plot of $v_x(t)$ is shown in figure 8.4.
Given $v_x(t)$, we can easily find:
$$I(t) = \frac{BLv_x(t)}{R} = \frac{BLv_{0x}}{R} e^{-t/\tau} = I_0 e^{-t/\tau} \tag{8.33}$$
where $I_0 = BLv_{0x}/R$ is the initial current at $t = 0$. Similarly, we can directly integrate:
$$v_x(t) = \frac{dx}{dt} = v_{0x} e^{-t/\tau}$$
$$dx = v_{0x} e^{-t/\tau}\, dt$$
$$x(t) = \int_0^{x(t)} dx = -v_{0x}\tau \int_0^t e^{-t/\tau} \left(-\frac{dt}{\tau}\right)$$
$$x(t) = v_{0x}\tau \left(1 - e^{-t/\tau}\right) \tag{8.34}$$
This lets us see at a glance that the rod will (eventually, after “infinite” time) come to rest
having moved down the rails a maximum distance $v_{0x}\tau = v_{0x} mR/B^2L^2$.
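As a sanity check on the closed-form results above, here is a minimal numerical sketch that steps Newton's Second Law for the rod, $m\,dv_x/dt = -B^2L^2v_x/R$, forward in time and compares the result with the exponential solutions for $v_x(t)$ and $x(t)$. The function name `simulate_rod` and all of the numerical values are illustrative assumptions, not part of the original problem.

```python
import math

def simulate_rod(m, R, B, L, v0, t_end, steps=100_000):
    """Step m dv/dt = -(B L)^2 v / R forward with the midpoint (RK2) rule."""
    dt = t_end / steps
    k = (B * L) ** 2 / (m * R)          # decay rate, equal to 1/tau
    v, x = v0, 0.0
    for _ in range(steps):
        v_mid = v - 0.5 * dt * k * v    # velocity at the middle of the step
        x += dt * v_mid                 # advance position with the midpoint velocity
        v -= dt * k * v_mid             # advance velocity with the midpoint force
    return v, x

# Illustrative SI values (assumed, not taken from the text).
m, R, B, L, v0 = 0.10, 2.0, 0.50, 0.20, 3.0
tau = m * R / (B * L) ** 2              # time constant of the decay
t = 2.0 * tau

v_num, x_num = simulate_rod(m, R, B, L, v0, t)
v_exact = v0 * math.exp(-t / tau)
x_exact = v0 * tau * (1.0 - math.exp(-t / tau))

print(f"tau = {tau:.3f} s")
print(f"v: numeric {v_num:.6f}, analytic {v_exact:.6f}")
print(f"x: numeric {x_num:.6f}, analytic {x_exact:.6f}")
```

The numerical and analytic columns should agree to many digits; making `t` larger shows `x` creeping up on the limiting distance $v_{0x}\tau$.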
[Plot: $v(t)$ versus $t$]
Figure 8.4: A plot of the exponential decay of the velocity of the rod as its initial kinetic
energy is “burned” in heating the resistor with the induction-driven current that also slows
it down. The units of the plot are $v_{0x}$ (for $v$) and $\tau$ (for $t$) to make it “universal”.
The magnetically induced electrical voltage produces a current that produces a force in
the magnetic field that slows the rod down. If energy is indeed conserved, we would expect
that the rate at which the kinetic energy of the rod decreases should exactly match the rate
at which Joule heating from the current occurs in the resistor. That way the negative work
done by the induction force is precisely balanced by the positive appearance of heat energy
in the resistor throughout; energy isn’t being created, it is just being changed from one form
to another.
This is easy enough to test algebraically. The rate at which heat appears in the
resistor is (substituting in several results from above):
$$P_R(t) = I^2(t)R = \frac{B^2L^2v_x(t)^2}{R} = \frac{B^2L^2v_{0x}^2}{R} \exp\left(-\frac{2B^2L^2t}{mR}\right) \tag{8.35}$$
while the rate at which the induced force does work on the rod is:
$$P_F(t) = F_x(t)v_x(t) = (-BLI(t))\,v_x(t) = -\frac{B^2L^2v_{0x}^2}{R} \exp\left(-\frac{2B^2L^2t}{mR}\right) \tag{8.36}$$
which is exactly the same but which has, of course, the opposite sign because $F$ is slowing
the rod down! If we add the two, we see that:
$$P_R(t) + P_F(t) = 0$$
at all times $t$ and energy is indeed conserved! The kinetic energy removed from the rod by
the induced force appears in the resistor as heat, precisely. Our “non-conservative” loop
integral of the field is, in fact, conservative after all!
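The same bookkeeping can be checked numerically. The sketch below integrates the Joule heating power $I^2R$ over a finite time with the trapezoid rule and compares it with the kinetic energy the rod has lost over the same interval. The helper name `joule_heat` and the numbers are assumptions chosen only for illustration.

```python
import math

def joule_heat(m, R, B, L, v0, t_end, steps=20_000):
    """Trapezoid-rule integral of P_R(t) = (B L v(t))^2 / R from 0 to t_end."""
    tau = m * R / (B * L) ** 2
    dt = t_end / steps

    def power(t):
        v = v0 * math.exp(-t / tau)      # analytic velocity of the rod
        return (B * L * v) ** 2 / R      # I^2 R with I = B L v / R

    total = 0.5 * (power(0.0) + power(t_end))
    for i in range(1, steps):
        total += power(i * dt)
    return total * dt

# Illustrative SI values (assumed, not taken from the text).
m, R, B, L, v0 = 0.10, 2.0, 0.50, 0.20, 3.0
tau = m * R / (B * L) ** 2
t_end = 3.0 * tau

heat = joule_heat(m, R, B, L, v0, t_end)
v_end = v0 * math.exp(-t_end / tau)
ke_lost = 0.5 * m * (v0 ** 2 - v_end ** 2)

print(f"heat in resistor    = {heat:.6f} J")
print(f"kinetic energy lost = {ke_lost:.6f} J")
```

As `t_end` grows, both numbers approach the full initial kinetic energy $\frac{1}{2}mv_{0x}^2$.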
At this point we know pretty much everything about this loop and it is all consistent
with the fundamental physical laws we have learned so far. If nothing else, the physics
of the rod sliding in the magnetic field works as if an electric field is induced around the
conducting loop which does indeed do work on the system that transforms its initial kinetic
energy into heat energy in the resistor as it slows down the sliding rod.
But wait! Isn’t the force involved a magnetic force? What about magnetism?
The astute student will have noted (from the hints I have liberally supplied) something
puzzling about this problem and solution. When we looked at the rod moving through the
field alone (no rails) the rod spontaneously developed an internal region of crossed fields
with an electric field that balanced the force exerted by the magnetic field on the moving
charges. However, when we connected the ends of the rod with the rails, a current was
established that runs in what appears to be the direction of the magnetic force and the
potential decreases in the loop when it is traversed in the opposite direction of the static
field set up in the rod moving alone!
There is a strong temptation to look at this and say “Wait a minute. It isn’t an electric
field pushing the charge around the loop, it is the magnetic field! What kind of swindle are
you trying to pull, here?” This, in turn, might make you doubt that Faraday’s Law (which
we’re working steadily towards) is correct at all – maybe it is just the magnetic field that is
doing the work of pushing those charges around the loop!
Be careful! If you look back at the chapter on Magnetic Force, you will see that magnetic
forces can never, ever, ever do work on spinless point charges! To remind you:
$$P = \frac{dW}{dt} = \vec{F} \cdot \vec{v} = q(\vec{v} \times \vec{B}) \cdot \vec{v} = 0$$
is an identity of the cross product. The magnetic field itself is incapable of doing work of
this sort as it can only exert forces at right angles to the direction of motion of a charged
particle. We really have little choice but to believe that the electric field introduced in:
$$|V_{\rm ind}| = \oint \vec{E} \cdot d\vec{\ell}$$
above in Kirchhoff’s Loop Rule for the rod on rails is “real”, at least as real as the electric
field we invented to describe the action-at-a-distance Coulomb force so many weeks ago.
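If you would like to convince yourself of the cross product identity without doing the algebra, a quick numerical experiment will do. The sketch below draws random velocity and field vectors and evaluates $q(\vec{v} \times \vec{B}) \cdot \vec{v}$ for each pair. The helper names (`cross`, `dot`, `magnetic_power`) are hypothetical, and the charge is set to 1 in arbitrary units so that the result is not trivially tiny.

```python
import random

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def magnetic_power(q, v, B):
    """Power F . v delivered by the magnetic force F = q v x B."""
    F = tuple(q * c for c in cross(v, B))
    return dot(F, v)

random.seed(1)                           # reproducible random vectors
worst = 0.0
for _ in range(1000):
    v = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    B = tuple(random.uniform(-1.0, 1.0) for _ in range(3))
    worst = max(worst, abs(magnetic_power(1.0, v, B)))

print(f"largest |F . v| found: {worst:.3e}")
```

Whatever nonzero value is printed is floating point roundoff only; the exact result is zero for every pair of vectors.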
It is, however, a worthwhile experience to run down this “work” thing, so let’s look
carefully at the problem and see just exactly where the work that appears as heat in the
resistor does, in actual fact, come from, and in the process start to clear up just what the
fields are and where they are. In figure 8.5 the actual motion of a (positive) point charge
carrier is depicted as it moves in a rod that is pulled by a hand so that it moves at a constant
speed v0 to the right on the rails from the initial (dashed) position on the left (where the
charge q starts to move up from the rail at the bottom) to the final (dashed) position on the
right (where it reaches the top rail). I’ve invented and drawn some θ angles into the figure
to make the important vector decompositions easier to see and algebraically evaluate.
As the charge q moves across the rod, it also moves down the rails with the rod so that
its actual trajectory is the diagonal dashed line of length L, not the vertical line of length
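[Figure: rod on rails in the x-y plane with B (in), showing the charge’s velocity $v$ along the diagonal $L$ at angle $\theta$, the vertical length $\ell$, the rod speed $v_0$, the forces $F_m$, $F_{\rm rod}$ and $F_{\rm hand}$, the current $I$ and the resistor $R$]
Figure 8.5: The actual motion of the charge q is along a diagonal of length L. A “hand”
pulls the rod, which transfers some (electrostatic) force $F_{\rm rod}$ to the charge to overcome the
component of the magnetic force pulling it back.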
$\ell = L\cos\theta$. Its actual velocity $\vec{v}$ is also in this direction, not straight up! The magnetic
field is still into the page (I drew fewer “×”s but it is still there) so the vector magnetic force
$\vec{F}_m$ acting on it is similarly not straight to the left (as it would be if the charge were really
moving straight up) but at right angles to $\vec{v}$, diagonally up and to the left as drawn. We see
that the speed of the rod to the right $v_0 = v\sin\theta$ is just the horizontal component of the
actual velocity of the charge!
Note well that I’ve drawn the “hand” that pulls on the rod hard enough to maintain it at
a constant speed $v_0$ (doing work all of the while). When the test charge turns the corner to
start moving up the rod, the rod has to exert a force on the charge – $\vec{F}_{\rm rod}$ – that accelerates
it in the $x$ direction to this speed or it would come out of the rod as it is left behind! While
moving at the constant $x$-directed speed $v_0$, the force $\vec{F}_{\rm rod}$ has to precisely cancel the $x$
component of the magnetic force in the opposite direction, or the charge would come out
of the rod and be left behind.
Of course, the charge does not come out of the rod, for the same reason that all of
the conduction charges in the rod – however free they are to move inside the rod – do not
come out. Electrostatic forces exerted at the surface of the rod prevent charge from com-
ing out until pushed by a force great enough to cause dielectric breakdown, and internal
electrostatic forces otherwise cancel the field inside the rod in all directions except that of
current flow (where $\vec{E} = \rho_r \vec{J}$ for a rod with a non-zero resistivity)! It is even electrostatic
forces which, with an assist from quantum mechanics and the Pauli exclusion principle103 ,
are ultimately responsible for the “normal” force between the hand and the rod that exerts
the external force on the rod itself!
Now that we have this concept firmly in hand – errr – mind, we can do some geometry
and some algebra. The magnitude of the magnetic force is $F_m = qvB$, but now $v$ is the
magnitude of $\vec{v}$ along the diagonal, not $v_0$ (its $x$ component). As argued above, the total
force in the horizontal direction on the charge must vanish:
$$F_x = F_{\rm rod} + F_{m,x} = 0$$
so it will remain inside the rod as it travels at the constant speed in the $x$ direction $v_0 =
v\sin\theta$. The horizontal component of $\vec{F}_m$ is (from basic trig, using the angle $\theta$ defined in
figure 8.5):
$$F_{m,x} = -F_m\cos\theta = -qvB\cos\theta \tag{8.39}$$
so the force exerted on the charge by the rod that cancels this must be:
$$F_{\rm rod} = -F_{m,x} = qvB\cos\theta$$
This force does work
$$W = F_{\rm rod} L\sin\theta$$
on the charge as it moves diagonally along the line L, starting at the bottom rail and ending
at the top. You can think of this as either the component of the rod force in the L direction
$F_{\rm rod}\sin\theta$ times $L$, or the component of the L displacement in the direction of the force
$L\sin\theta$ times $F_{\rm rod}$; the net work done is the same either way.
We rearrange this as follows:
$$W = (qvB\cos\theta)\,L\sin\theta = q\,(v\sin\theta)\,B\,(L\cos\theta) = q v_0 B \ell$$
and see that the work done on the charge q (ultimately) by the hand, transmitted through
the electrostatic field binding the rod and charges together as it moves between the rails, is
exactly what one gets if one (mistakenly) claims that it is the magnetic force acting on q to
drive it vertically that does the work which in turn is exactly what you get if you consider the
work to be done by an induced electric field/potential around the entire conducting loop!
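The equality claimed here is easy to test with numbers. The sketch below computes the work done by the rod force along the diagonal path, using the geometry stated above ($F_{\rm rod} = qvB\cos\theta$ cancelling the horizontal magnetic force, $v_0 = v\sin\theta$ and $\ell = L\cos\theta$), and compares it with the “naive” magnetic result $qv_0B\ell$ for a few angles. The function names and numerical values are assumptions for illustration only.

```python
import math

def work_by_rod(q, v, B, L_diag, theta):
    """Work done by F_rod on the charge along the diagonal of length L_diag."""
    F_rod = q * v * B * math.cos(theta)        # cancels the x-component of F_m
    return F_rod * L_diag * math.sin(theta)    # force component along path times path length

def work_naive(q, v, B, L_diag, theta):
    """What one gets by (mistakenly) letting q v0 B act over the vertical length."""
    v0 = v * math.sin(theta)                   # speed of the rod
    ell = L_diag * math.cos(theta)             # vertical distance between the rails
    return q * v0 * B * ell

# Illustrative values (assumed): charge in C, speed in m/s, field in T, length in m.
q, v, B, L_diag = 1.6e-19, 2.0, 0.5, 0.3
pairs = [(work_by_rod(q, v, B, L_diag, th), work_naive(q, v, B, L_diag, th))
         for th in (0.2, 0.7, 1.2)]

for w_rod, w_naive in pairs:
    print(f"rod: {w_rod:.6e} J   naive: {w_naive:.6e} J")
```

The two columns agree for every angle, which is just the rearrangement of sines and cosines done by the computer instead of by hand.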
Note well that the Newton’s Third Law reaction force tells us that the force on the charge
is transferred back to the rod through the same electrostatic binding force, so that the entire
rod itself is pulled back by the magnetic force with exactly the overall force $F_{\rm hand}$, which
is what really does all of the work. This is in perfect agreement with the result argued
for somewhat differently above. The one last thing we might need to consider is why this
induced field seems to follow nonzero resistivity around, but this is no different from the
way that there is almost no field inside a good conductor – we treat wires in electric circuits
as being equipotential even if they carry (moderate) currents – but recognize that charge
piling up at the ends of a poor conductor results in an electric field inside that drives the
charge on through against its resistance.
In the next section we will clearly state the conclusions of the first two sections of this
chapter in the form of a single equation: Faraday’s Law.
In the last section, we saw that for the rod sliding down the rails (at least) we could describe
the voltage induced around the closed loop formed by the rails as the time rate of change of
the magnetic flux through the loop. We left open the question of how to specify the direction
of the induced E-field, although clearly we have to have just the right sign (direction) in
order for energy to be conserved as it was for the rod and resistor together.
If we point our right hand’s thumb in the direction of the magnetic field through the
loop in the previous section and let its fingers curl around the loop, this is in some sense
“the” natural direction to specify as the “positive” direction for $d\vec{\ell}$ in the loop (clockwise as
drawn in figures 8.3 and 8.5). In this case an increasing loop area and increasing flux
produced a negatively directed electric field (counterclockwise as drawn in figure 8.5) and
the induced current went in this direction (which is also the direction of the electric field
we “imagined” appearing in the rod in a frame where the rod is stationary and it is the
magnetic field that swept across the loop). This in turn made the force on the rod negative
(in the opposite direction of its velocity $\vec{v}$) as it had to be, it turned out, for energy to be
correctly conserved.
This suggests that we could have written the voltage that appears in the loop completely
consistently with respect to magnitude and direction using this “right hand rule” as:
$$V_{\text{induced in } C} = \oint_C \vec{E} \cdot d\vec{\ell} = -\frac{d}{dt} \int_{S/C} \vec{B} \cdot \hat{n}\, dA \tag{8.43}$$
This equation is known as Faraday’s Law and is our first truly dynamical field equation for
the electromagnetic field. It tells us that changing magnetic flux through an arbitrary loop
creates an electric field around the loop.
The minus sign on the right hand side tells us the direction of this field – if we let the
fingers of our right hand curl around the loop as our thumb points in the (predominant)
direction of $\vec{B}$ through the loop, then if the flux through the loop is increasing the $\vec{E}$-field
circulates the loop C in the negative direction, opposite to the direction our right-handed
fingers curl around the $\vec{B}$ field. Similarly, if the flux through the loop is decreasing the
$\vec{E}$-field circulates around C in the positive direction, that is to say, in the direction our
right-handed fingers curl around the magnetic field through the loop.
Note that (to be sure!) magnetic fields can easily go “through” any given surface
bounded by a loop first in one direction, then the other, but because this integral is linear
in $\vec{E}$ and $\vec{B}$, we can actually pick either of the two directions possible to be right-handed
positive, use superposition, and get the correct answer relative to this choice. The fact
that we pick the obvious direction in cases where it is obvious doesn’t mean the equation
can’t handle cases where it isn’t obvious or even isn’t known initially at all, forcing you to
arbitrarily choose a direction to be “positive” for both $\hat{n}$ in the flux expression and $d\vec{\ell}$ around
the loop!
The information encoded in this humble minus sign (which leads to energy conserva-
tion) is so important that it has a name of its own – it is called Lenz’s Law. Lenz’s Law can
be stated a different way in words as well:
The electric field induced in a loop by changing magnetic flux goes around the
loop in the direction such that any current generated by the field will create a
magnetic field of its own that opposes the change in the magnetic flux.
This is a very interesting result, and is worth studying for a moment all by itself before
returning to the many applications of Faraday’s Law.
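The sign convention can be made concrete with a small sketch. Given the flux through a loop as a function of time (measured relative to a chosen right-handed normal), the code below estimates $-d\phi_m/dt$ with a central difference and reports whether the induced field circulates in the positive or the negative right-handed sense. The function names, the finite-difference step and the rod-on-rails numbers are all assumptions made for illustration.

```python
def induced_emf(flux, t, h=1e-6):
    """Faraday's Law: emf = -d(phi)/dt, estimated by a central difference."""
    return -(flux(t + h) - flux(t - h)) / (2.0 * h)

def circulation_sense(emf, tol=1e-12):
    """Sense of the induced E-field relative to the right-handed positive direction."""
    if emf > tol:
        return "positive"
    if emf < -tol:
        return "negative"
    return "none"

# Rod on rails with illustrative (assumed) values: the flux grows as the loop grows.
B, L, x0, v = 0.50, 0.20, 0.10, 3.0

def rod_flux(t):
    return B * L * (x0 + v * t)      # phi = B * (area of the loop) for uniform B

emf = induced_emf(rod_flux, t=1.0)
print(f"emf = {emf:.6f} V, circulation: {circulation_sense(emf)}")
```

For the growing loop the result is $-BLv$ and the sense is “negative”, that is, opposite to the way the right-hand fingers curl when the thumb points along the field, in agreement with the discussion above.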
First, though, we have to consider a serious question. Our discussion up to now has
involved current carrying loops of conducting material. To put it another way, C (the closed
and call equation 8.44 Faraday’s Law (with Lenz’s Law embedded as the minus sign in the
term on the right). This is our fourth Maxwell equation, valid for arbitrary closed curves
C in space whether or not there is a conductor, or nonzero charge, at any point on those
curves.
The existence of the induced electric field in free space even where there are no
charges or conductors is key to our later development of the dynamic electromagnetic
field – it suggests that the induced E-field can propagate through empty space as long as
there is a changing magnetic field present somewhere to produce it, even with no charges
or conductors locally handy for the field to act on.
Faraday’s Law is truly a sublime result. As we will see, this Maxwell Equation is directly
responsible for our ability to generate and transmit electrical energy to run our homes, our
businesses, our industries, our entertainments, our lives. If it were not for Faraday, I would
at best be laboriously typing this textbook on a mechanical typewriter by candlelight and
you would not be able to read it until a publisher (at great expense) typeset the entire book
and printed it with a steam or water driven press to sell for a small fortune, making its
contents available only to the fortunate and the wealthy.
Instead you are very likely reading a purely electronic version of the textbook that you
got for free, or perhaps paid a pittance for as a gesture of courtesy to the author105, all
104 Fans of “causality” might wish to assert that there must be some sort of field that is nonzero at the point
where the induced $\vec{E}$-field appears. They will be pleased to learn that there is indeed a vector potential that
is generally not zero at the points where the $\vec{E}$ field appears, one of many reasons physicists prefer potentials
to fields.
105 Yes, that’s me, and if you aren’t a Duke student you should very much consider the virtue of such courtesy
and how it enables high quality, cheap textbooks to be created and improved for your delight and edification...
thanks to electricity generated via Faraday’s Law and transmitted as electromagnetic wave
energy and processed in countless ways inside your computer that also rely completely
on Faraday’s Law. Each and every one of these carefully engineered occurrences is an
“experimental test” of Maxwell’s Equations in general and Faraday in particular, so you can
have a great deal of confidence that Maxwell’s equations (and the associated Lorentz force
law) are at the very least a very good approximation to some true underlying principle or
law of nature.
In the next section, we will discuss Lenz’s Law and give several examples of using it
either algebraically or conceptually to determine the direction of the induced electric field
around a loop, as promised.
Lenz’s Law, as we have just seen, tells us in a general, mathematically consistent way,
what the direction is of the induced E-field around a loop through which magnetic flux is
changing in time regardless of the mechanism of that change in flux and whether or not
there are charges or a conductor handy to produce or contain currents. However, if you
think about the equation for the magnetic flux through some surface S bounded by a closed
curve C:
$$\phi_m(t) = \int_{S/C} \vec{B} \cdot \hat{n}\, dA \tag{8.45}$$
you will soon realize that the flux φm can vary in time for any or all of four reasons:
Yes, one can imagine a loop that is changing its size and its orientation inside a mag-
netic field that is changing its magnitude and its orientation, all four changes in time con-
tributing to the overall change in magnetic flux through a surface S bounded by the loop!
This multiplicity of ways the magnetic flux depends on geometry and field strength can
make it difficult to figure out the direction of the induced field. In this section, we will en-
deavor to provide examples of each of these separately to help you see how it all goes.
With a bit of meditation, you should then be able to figure out how to synthesize this knowl-
edge and work out the direction when multiple things are changing at once.
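For the simplest case of a plane loop in a uniform field, the flux is $\phi_m = BA\cos\theta$ and the separate contributions to its rate of change can be written down with the product rule. The sketch below does this and checks the sum against a direct numerical derivative. Note the assumption built in: for a uniform field, rotating the loop and rotating the field both enter only through the single angle $\theta$ between $\vec{B}$ and $\hat{n}$, so the four reasons collapse to three terms here. All names and rates are hypothetical.

```python
import math

def flux(B, A, theta):
    """Flux of a uniform field B through a plane loop of area A, normal at angle theta."""
    return B * A * math.cos(theta)

def flux_rate_terms(B, A, theta, dB_dt, dA_dt, dtheta_dt):
    """Product-rule pieces of d(phi)/dt for phi = B A cos(theta)."""
    return {
        "field magnitude": dB_dt * A * math.cos(theta),
        "loop area": B * dA_dt * math.cos(theta),
        "relative angle": -B * A * math.sin(theta) * dtheta_dt,
    }

# Illustrative (assumed) instantaneous values and rates of change.
B0, A0, th0 = 0.50, 0.04, 0.60
dB_dt, dA_dt, dth_dt = 0.10, -0.01, 0.30

terms = flux_rate_terms(B0, A0, th0, dB_dt, dA_dt, dth_dt)
total = sum(terms.values())

def flux_at(t):
    """Flux a short time t later, with every quantity changing at its given rate."""
    return flux(B0 + dB_dt * t, A0 + dA_dt * t, th0 + dth_dt * t)

h = 1e-6
numeric = (flux_at(h) - flux_at(-h)) / (2.0 * h)

for name, value in terms.items():
    print(f"{name:16s} {value:+.6f} Wb/s")
print(f"sum {total:+.6f}, numeric {numeric:+.6f}, emf {-total:+.6f} V")
```

The sign of the total, not of any one term, is what sets the direction of the induced field when several things change at once.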
We’ve already seen an example of this in our single meaningful example thus far – the
rod on rails. If a plane loop C in a fixed magnetic field is increasing in size, then the
induced field points opposite to the right handed direction determined from the magnetic
[Figure: two panels, (a) and (b), each showing a loop with resistor R in a field B (in) and the direction of I, E]
Figure 8.6: Illustration of $\vec{E}$-field direction for loops that change size. In (a) the loop is
getting larger (tending to increase the magnetic flux) so the induced magnetic moment
from a counterclockwise $\vec{E}$ field and current opposes the existing field through the loop.
In (b) the loop is getting smaller (tending to decrease the flux) so the induced magnetic
moment from the clockwise $\vec{E}$ field and current supports the existing field through the loop.
field through the loops. If it is decreasing, it points around the loop C in the same right
handed sense.
In terms of the verbal statement (illustrated in figure 8.6), if a conductor of resistance
R were placed along a path C increasing in area (in (a)), the current in the loop thus
formed would have a magnetic moment that opposes the increasing flux through the loop.
Incidentally, the magnetic force acting on this current would point in towards the center of
the loop which is the direction that makes the loop try to shrink, not grow, opposing again
the increase in flux.
If the conducting loop were decreasing in area (in (b)), the induced current would be
in the direction that creates a magnetic moment for the loop in the same direction as the
magnetic field through the loop, again opposing the (now decreasing) change in flux. This
direction for the current also creates a general outward directed force on all parts of the
loop, which would make the loop grow to oppose the decrease in flux, if e.g. the rails were
not rigidly affixed.
In figure 8.7 we illustrate what happens when the magnitude of the $\vec{B}$-field changes. In (a),
$B$ is increasing in magnitude through a fixed loop while maintaining a fixed direction. Again
if we imagine a conducting pathway around C the (counterclockwise as shown with $\vec{B}$ into
the page) current induced in it would create a magnetic moment from the loop that is in the
opposite direction to $\vec{B}$, opposing the change in flux. The forces acting on this current in
each wire of the loop would point inward, trying to shrink the loop as an alternative way of
reducing the flux.
In (b), $B$ and the magnetic flux are decreasing in magnitude and the opposite happens
– the induced $\vec{E}$-field and associated current circulate in the
(clockwise) direction such that the induced magnetic moment supports the decreasing $\vec{B}$
[Figure: two panels, (a) B(increasing) and (b) B(decreasing), each showing a loop with resistor R and the direction of I, E]
Figure 8.7: Illustration of $\vec{E}$-field direction when the magnitude of $B$ through the loop
changes. In (a) $B$ is getting larger (tending to increase the magnetic flux) so the induced
magnetic moment from a counterclockwise $\vec{E}$ field and current opposes the existing field
through the loop. In (b) $B$ is getting smaller (tending to decrease the flux) so the induced
magnetic moment from the clockwise $\vec{E}$ field and current supports the existing field through
the loop.
field (opposing the change in flux). The magnetic forces on the loop wires would point
outward, trying to expand the loop as an alternative way of increasing the flux.
8.4.3: Lenz’s Law for changing the $\vec{B}$ and/or $\hat{n}$ direction
[Figure: two panels, each showing a loop with normal n at angle θ to B and the direction of I, E]
Figure 8.8: Illustration of $\vec{E}$-field direction when the direction of $\vec{B}$ or the direction of the
normal to the loop $\hat{n}$ changes. In (a) $\cos(\theta)$ is getting larger (tending to increase the
magnetic flux) so the induced magnetic moment from a counterclockwise $\vec{E}$ field and current
opposes the existing field through the loop. In (b) $\cos(\theta)$ is getting smaller (tending to
decrease the flux) so the induced magnetic moment from the clockwise $\vec{E}$ field and current
supports the existing field through the loop.
Now we imagine the shape of the loop C doesn’t change, the (uniform) magnetic field
is constant in magnitude, but the loop’s orientation in the magnetic field is changing or the
direction of the magnetic field is changing (or both). Note that both changes have the same
effect: they alter the angle between the field and the normal to the plane of the loop, and
hence the flux through the loop. This is actually a very common situation – it describes an
Note that it is entirely possible for all four of these contributions to the total flux to be
changing at once. The loop and field could both be rotating, the loop could be shrinking
or growing, and the field could be turning on or turning off all at the same time! Problems
where all of this is going on at once are a bit excessive, perhaps, largely because it is such
a pain to specify all of the possibly competing parameters, but in principle you know what
you need to know to determine the $\vec{E}$-field/current direction from Lenz’s Law. It will always
point in the direction such that a magnetic moment associated with a current in the induced
$\vec{E}$-field direction (whether or not one actually exists) would oppose the change in magnetic
flux through the loop.
This is precisely the right direction for energy conservation to always hold for the sys-
tem. We can breathe a sigh of relief!
There is one more observation we can make that can help you solve direction problems
involving Lenz’s Law – a whole new way of stating it that should make physical sense to
you. When C was changing, the forces acting on the loop all opposed the change – if it
was expanding the induced forces tried to keep it from expanding. If it was contracting, the
induced forces tried to keep it from contracting. When B’s magnitude was increasing, the
forces tried to make the loop smaller, opposing the increase in flux by decreasing its area.
When it was decreasing, the forces tried to make the loop larger for the same reason! In the
final case, where the angle θ between the field and the normal was decreasing (increasing
the flux), the induced forces on the loop itself would be trying to make it decrease its area
and the induced magnetic torque on the loop would be acting in the direction to make θ
larger, opposing the external torque that is causing θ to shrink. If the angle θ is increasing,
the forces will try to increase the area of the loop and the induced magnetic torque will
again oppose the torque increasing the angle θ, trying to make it smaller.
In summary, every aspect of physics resulting from the change in flux opposes the
change. The magnetically induced current creates a field that augments or opposes a
decreasing or increasing magnetic flux through the loop, respectively. The magnetic force
on the induced current in the loop will act to increase or decrease the area as the flux
through it decreases or increases, respectively (opposing the change). The induced magnetic
torque on the loop will tend to rotate it so its normal is more parallel or at right angles
to the field through it as the flux through the loop decreases or increases, respectively
(opposing the change). If the loop is in a non-uniform field, the total force acting on the
loop will point in the direction where the field is weaker or stronger as the flux increases or
decreases, respectively, opposing the change. We can rewrite our verbal statement of
Lenz’s Law as:
If the flux through a current loop is increasing, the loop will simultaneously try to shrink
in size, rotate until its normal is at right angles to the field, move overall in the direction the field
strength decreases if any, and generate its own magnetic moment in the opposite direction
to the increasing field. If it is decreasing, it will do the exact opposite: increase its size,
move to stronger field, torque its normal until it is parallel to the field, and augment the
decreasing field with its own induced magnetic moment. This means that we will always
need to be doing external work against these Lenz’s Law reactions to bring about the
changes, causing work and energy to remain in perfect balance.
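[Figure: a long straight wire carrying current I beside a rectangular loop with sides a and b and resistor R, with B (in) through the loop and a shaded strip of width dr]
Figure 8.9: A long straight wire sits next to a rectangular loop of wire and carries a current
I up as shown. The current in the long straight wire can be increased or decreased.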
In figure 8.9 above, a long straight wire is carrying a current I. It sits a distance d away
from a rectangular loop with side lengths of a and b (all wires in the plane of the page) as
shown. I can be increased or decreased at will.
Here’s the physics of this picture. The current I creates a magnetic field through the
loop. We can easily compute that field using Ampere’s Law (so we don’t have to remember
things like the magnetic field of long straight wires). On the other hand, we’ve worked
enough with the magnetic field of long straight wires that perhaps you do remember that it
is $B = \mu_0 I/2\pi r$, into the page on the right for a current up as drawn – I’ve helped you out a bit with
lots of “dressing” on this figure that on a quiz or exam you’d have to provide for yourself.
If I is varied, the field it generates varies as well. This changes the magnetic flux
through the rectangular loop. Mr. Faraday then tells us that there must be a voltage induced
in the loop that will create a current!
You can actually completely calculate the induced voltage in the rectangular loop using
Faraday’s Law (and will, in a homework problem) and from the voltage compute the current
in the loop, and from the current the force on the loop. But here our goal is more humble.
We simply want to figure out the direction of the induced current, and the direction of the
induced force, using Lenz’s Law.
Suppose the current I is increasing. Then we expect the magnetic field into the page
– and the magnetic flux through the loop – to be increasing as well, and we can tell the
following (highly anthropomorphized) story:
The increasing flux makes the loop sad, because it is a very conservative loop. It
hates change, and is happy with things just the way that they are. It says to itself “Gosh,
I’d really rather the magnetic flux through me not change, what can I do?” It then has
the brilliant idea: Create an electric field to drive a current around itself so that its own
magnetic moment opposes the change in flux! Perhaps it won’t keep the flux from changing
altogether, but it will ensure that at least the flux will change more slowly than it would
without the induced current.
But which way is that? Well, a clockwise current would make the moment of the loop
point into the page, which would make the field through the loop even stronger, so that
won’t work. Instead the reactionary little loop makes the current counterclockwise. Now
its own magnetic field opposes the field due to the wire, and slows the rate of change of
magnetic flux through itself. Eventually, of course, the field might reach a new constant
value as the current in the long straight wire stops changing and the loop becomes happy
again with constant flux through it and no current at all.
The induced current in the counterclockwise direction has an additional bonus for the
loop. It makes the net force on the loop point away from the wire (as you can verify when
you solve the problem completely). If the loop is free to move, moving away from the wire
moves it from a strong field near the wire to a weaker field farther away from the wire! This,
too, helps to keep the flux through the loop from increasing, and is a part of the responses
predicted by Lenz’s Law. If the loop is even slightly tipped relative to the field, then there
will be a nonzero torque on it as well, trying to twist it towards right angles relative to the
field, reducing the flux through it in that way as well! Again, every reaction of the loop to
the increasing flux will physically tend to oppose the increase in flux!
When you do this problem for homework, you will have to compute the net magnetic
flux through the loop (in order to differentiate it to find the induced voltage). I’ve helped you
out here by shading a strip of length a and width dr, a distance r from the main wire. It
should be pretty easy to compute the flux dφm through this strip, and then to sum up the
total flux using integration between suitable limits. Give it a try.
In figure 8.10 you can see a wire loop (rectangular, although this makes no real difference)
being pulled from the field. A typical short answer question might show this picture, or a
similar picture, of a loop of any shape you like being pushed into or pulled out of a magnetic
field and ask you the following questions:
364 Week 8: Faraday’s Law and Induction
Figure 8.10: A rectangular loop of wire is pulled out of a region of uniform magnetic field
as shown.
• What is the direction of the induced $\vec{E}$-field/current in the wire as it is being pulled
out (or pushed in)?
• What is the direction of the magnetic force acting on the loop while this is going on
(in either direction)?
• A trick question might show you the loop completely inside the uniform field (so it isn’t
actually coming out!) and ask the same questions.
• When the loop is being pulled out, the flux through the loop is decreasing. The sad
little loop doesn’t want the flux to go away, so it generates a clockwise current whose
magnetic field sustains the disappearing flux.
• The net force on this current resists the motion of the loop out of the field. Check it
yourself!
• If the loop were entirely in the field, the flux wouldn’t be changing as it moved and
there would be no current and no net force.
This example is almost identical to a rod on rails problem, is it not? For a specified
geometry and mass m of wire loop and speed v, you might well be able to compute the
current, the force, the acceleration, the trajectory.
In figure 8.11 above, the switch is closed at time t = 0 with the rod (of mass M and
length L) sitting at rest on a pair of frictionless conducting rails that are on the other end
connected by a resistor R and battery with potential difference V0 . A uniform magnetic field
of magnitude B points into the page as shown.
We would like to find a number of things in this problem:
Figure 8.11: A conducting rod sits on conducting, frictionless rails and a switch is closed at
t = 0 to send current through the loop thus formed. A magnetic field (into the page) exerts
a force on the rod.
a) The voltage in the loop as a function of v, the velocity of the rod (at some instant in
time t).
d) The terminal velocity of the rod, after the switch has been closed for a long time.
This list lays out a very nice solution strategy. Using Faraday’s Law
$$V_{ind} = -\frac{d\phi_m}{dt} = -\frac{d(BLx)}{dt} = -BLv \tag{8.46}$$
(where the minus sign is Lenz’s Law and must be interpreted accordingly). Note that the
induced voltage is zero until the rod is moving, then decreases in the direction that will
cause currents that experience forces that oppose the motion.
Using Kirchhoff’s rule for the loop:
$$V_0 - BLv - IR = 0 \tag{8.47}$$
in the loop) will be zero. Using either the force or the current equation above we can easily
see that:
$$v_{terminal} = \frac{V_0}{BL} \tag{8.50}$$
Alternatively, using the force equation we can write Newton’s second Law and turn it
into an equation of motion:
$$F = \frac{BLV_0 - B^2L^2 v}{R} = Ma = M\frac{dv}{dt} \tag{8.51}$$
which we can rearrange into a first order, linear, inhomogeneous, ordinary differential equation:
$$\frac{dv}{dt} + \frac{B^2L^2}{MR}\,v = \frac{BLV_0}{MR} \tag{8.52}$$
As usual, this equation is simple enough to directly integrate:
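$$\begin{aligned}
\frac{dv}{dt} &= \frac{BLV_0 - B^2L^2 v}{MR} \\
&= -\frac{B^2L^2}{MR}\left(v - \frac{V_0}{BL}\right) \\
\frac{dv}{v - \frac{V_0}{BL}} &= -\frac{B^2L^2}{MR}\,dt \\
\int \frac{dv}{v - \frac{V_0}{BL}} &= -\int \frac{B^2L^2}{MR}\,dt \\
\ln\left(v - \frac{V_0}{BL}\right) &= -\frac{B^2L^2}{MR}\,t + C \\
v - \frac{V_0}{BL} &= e^{-\frac{B^2L^2}{MR}t}\,e^{C} \\
v(t) &= \frac{V_0}{BL}\left(1 - e^{-\frac{B^2L^2}{MR}t}\right)
\end{aligned} \tag{8.53}$$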
where we’ve used our initial condition, v(0) = 0, to set the constant of integration. Note
well that this curve represents an exponential approach to the terminal velocity:
$$v(t) = v_{term}\left(1 - e^{-t/\tau}\right) \tag{8.54}$$
where $\tau = MR/(B^2L^2)$ is the same as it was for our original rod on rails problem.
With this in hand we can easily integrate over time again to get x(t), differentiate it to
get a(t), substitute it to get I(t) or F (t). We can compute the power being delivered to
the circuit by the voltage and show that it equals the rate at which energy is burned in the
resistor plus the rate that work is being done on the rod. We can answer anything asked
about the rod – the motion is now completely known subject to the usual idealizations in
the problem (no friction or drag and so forth).
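As a sanity check on equations (8.50) through (8.54), the short sketch below integrates Newton's second law (8.51) numerically and compares the result with the closed-form solution (8.53). The numerical values of B, L, R, M and V0 are illustrative assumptions chosen only to make the arithmetic easy; they do not come from the text, and forward Euler is used simply because it is the easiest integrator to read.

```python
import math

# Illustrative values only (assumptions, not taken from the text).
B = 0.5     # magnetic field, tesla
L = 0.2     # length of the rod, metres
R = 2.0     # resistance, ohms
M = 0.05    # mass of the rod, kilograms
V0 = 1.5    # battery voltage, volts

v_term = V0 / (B * L)          # terminal velocity, eq. (8.50)
tau = M * R / (B * L) ** 2     # time constant defined below eq. (8.54)


def v_exact(t):
    """Closed-form velocity, eq. (8.53)."""
    return v_term * (1.0 - math.exp(-t / tau))


def v_euler(t_end, steps=20_000):
    """Integrate M dv/dt = (B L V0 - B^2 L^2 v) / R with forward Euler."""
    dt = t_end / steps
    v = 0.0                                          # rod starts at rest
    for _ in range(steps):
        force = (B * L * V0 - (B * L) ** 2 * v) / R  # eq. (8.51)
        v += (force / M) * dt
    return v


t_check = 3.0 * tau
print("terminal velocity (m/s):", v_term)
print("time constant (s):      ", tau)
print("v at 3 tau, closed form:", v_exact(t_check))
print("v at 3 tau, Euler:      ", v_euler(t_check))
```

The two velocities should agree closely, and after three time constants the rod should be at roughly 95% of its terminal velocity, as expected for an exponential approach.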
8.6: Inductance
We have seen that changing the current in one wire causes the magnetic field associated
with that current to change in time. That, in turn, will usually cause the magnetic flux
through other nearby conducting loops to change in time. This, according to Faraday’s
Law, will induce a voltage around those loops and, assuming they have some resistance,
cause current to flow in the direction predicted by Lenz’s Law.
For loops of fixed size and orientation, the field produced by them at any given point
in space is directly proportional to the current they carry (from the Biot-Savart Law, in which
the current in the wire appears in the numerator as a constant that can be pulled out of the
integral over the geometry of the wire). The magnetic flux both through the loop itself and
through all other loops that its field passes through is thus also proportional to the current.
Figure 8.12: A set of current loops indexed by i = 1, 2, 3..., fixed in space and carrying
currents Ii . The B-field produced by (say) current I1 swirls around the current and passes
through both loop 1 and the other loops in the figure, creating both self inductance and
mutual inductance.
This general state of affairs is pictured in figure 8.12. In this figure, loop 1 (we suppose)
carries a current I1 . At the instant shown, this current produces a magnetic field that swirls up
through loop 1 in field line loops that go around the current in the right-handed direction.
These field lines pass both through any surface S1 we might draw that is bounded by the
curve C1 and through the surfaces $S_{i \neq 1}$ bounded by the other curves Ci . These fields
create magnetic flux that is proportional to I1 in all of the loops.
We can write this in an algebraic form. The flux through the ith loop caused by the
current in the jth loop is:
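$$\begin{aligned}
\phi_{ij} &= \int_{S_i/C_i} \vec{B}_j \cdot \hat{n}_i \, dA_i \\
&= \frac{\mu_0 I_j}{4\pi} \int_{S_i/C_i} \left( \int_{C_j} \frac{d\vec{l}_j \times (\vec{r}_i - \vec{r}_j)}{|\vec{r}_i - \vec{r}_j|^3} \right) \cdot \hat{n}_i \, dA_i \\
&= \left\{ \frac{\mu_0}{4\pi} \int_{S_i/C_i} \left( \int_{C_j} \frac{d\vec{l}_j \times (\vec{r}_i - \vec{r}_j)}{|\vec{r}_i - \vec{r}_j|^3} \right) \cdot \hat{n}_i \, dA_i \right\} I_j \\
&= M_{ij} I_j
\end{aligned} \tag{8.55}$$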
where I’ve taken some pains to label the coordinates with the object: $\hat{n}_i$ normal to the
surface $S_i$ bounded by the curve $C_i$, where $dA_i$ is the area element of this surface and $\vec{r}_i$
the vector coordinate of a point on its surface; coordinates $d\vec{l}_j$ and $\vec{r}_j$ on the curve $C_j$.
There are a few very interesting things to observe about this pair of integrals. One is
that the integral over the surface Si cannot depend on the particular surface chosen out of
the infinite number of surfaces Si bounded by any particular curve Ci . Understanding how
integrals like this can be invariant as one selects different surfaces will be a key aspect of
our addition of the Maxwell Displacement Current in two more weeks, so consider this a
hint.
Ultimately, it can therefore only depend on Ci itself, so both integrals can be repre-
sented as integrals around the closed loops Ci and Cj using theorems from multivariate
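calculus that you probably do not yet know106. The result is (eventually):
$$\begin{aligned}
M_{ij} &= \frac{\mu_0}{4\pi} \int_{S_i/C_i} \left( \int_{C_j} \frac{d\vec{l}_j \times (\vec{r}_i - \vec{r}_j)}{|\vec{r}_i - \vec{r}_j|^3} \right) \cdot \hat{n}_i \, dA_i \\
&= \frac{\mu_0}{4\pi} \oint_{C_i} \oint_{C_j} \frac{d\vec{l}_i \cdot d\vec{l}_j}{|\vec{r}_i - \vec{r}_j|}
\end{aligned} \tag{8.56}$$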
which is obviously symmetric under interchange of i and j:
where we define the self-inductance of the ith loop to be the symbol Li . Note that I had to
add primes to the “j” coordinates in the previous expression to differentiate between the
integral over the current loop and the integral over the area.
106 Wikipedia: http://www.wikipedia.org/wiki/derivation of self inductance. It uses Stokes’ Theorem and the
definition of the magnetic field in terms of the vector potential, both things that are beyond the scope of this
course, but it actually isn’t terribly difficult. I link the Wikipedia page so that interested students (or students in
a more advanced course trying to connect back to simpler concepts by reading this book) can take a look.
If we then differentiate this with respect to time and use Faraday’s Law, we get the following
expression for the induced voltage in the ith loop:
$$V_i = -\left( L_i \frac{dI_i}{dt} + \sum_{j \neq i} M_{ij} \frac{dI_j}{dt} \right) \tag{8.60}$$
Finally, in many, if not most, cases of interest, we can neglect mutual inductance be-
cause the magnetic field dies off rapidly with distance. For that reason we will often speak
of the self-inductance only of specific circuit elements, especially “inductors”, the magnetic
equivalent of capacitors in a circuit, labelled with a plain L with or without an index. The
key equation for a single self-inductance will be:
$$V_L = -L\frac{dI}{dt} \tag{8.61}$$
where VL is the voltage drop or rise across the inductor and I is the current through the
inductor. This expression finally gives us a good way of specifying the SI units for inductance.
One Henry is a Volt-Second/Ampere, or a Volt-Second²/Coulomb, or (since a Volt
is a Joule/Coulomb) a Joule/Ampere².
Henries can, of course, also be expressed in terms of Webers – you do remember
what Webers are, don’t you? Since inductance is flux per unit current, it should be fairly
obvious that 1 Henry is 1 Weber/Ampere, but nobody cares much about Webers, while
everybody cares about Henries.
In figure 8.13 we return to the long straight wire and adjacent rectangular loop of wire (all
in a common plane) that we examined above in the limited context of Lenz’s Law and the
direction of the induced current. This time, we want to answer all of the questions we might
ask, such as:
• Given this field, what is the magnetic flux through the rectangular loop φ2 ?
Figure 8.13: A long straight wire carrying a time-varying current I1 (t) near a rectangular
current loop induces a voltage V2 in that loop, which in turn creates a current I2 in that loop
and a force $\vec{F}_2$ on the wire loop.
• Given this flux and a current I1 (t) that is increasing, what is the voltage V2 induced
in the rectangular loop?
• Given this voltage, what is the current I2 (t) in the rectangular loop (magnitude and
“direction”, that is clockwise or counterclockwise in the arrangement shown)?
• Finally, given this current, what is the net force on the loop, and is it attractive (back
towards the long straight wire) or repulsive?
That’s a lot of questions, but I laid it out in this way so you can see the very simple
flow of reason. In a quiz or exam problem I’d be much more likely to just give the picture
(without any “dressing”) and say $I_1(t) = \frac{I_0}{T}\,t$, what is $\vec{F}_2(t)$? So practice thinking about how
this chain works so that each answer is a trivial step away from the previous one, but put
together the answer isn’t “simple” at all!
At this point you should really all be able to answer each and every step on your own,
so I’ll provide the most cursory review of each step and let you fill in the details (completely,
of course!) for homework.
• To find the flux through the obvious plane surface S bounded by the rectangle, we
have to start by finding the flux in the differentially thin strip shaded in the figure. The
magnetic field is known and approximately constant in the strip in the limit that it is
(This doesn’t really help us find the force, but it is certainly something you should be
able to do.)
and since I1 is increasing, we expect the voltage to decrease (and drive a current)
counterclockwise from Lenz’s Law (see above).
$$V_2 - I_2 R = 0 \tag{8.67}$$
or
$$I_2(t) = \frac{\mu_0 a}{2\pi R} \ln\left(\frac{d + b}{d}\right) \frac{dI_1}{dt} \tag{8.68}$$
(counterclockwise for $\frac{dI_1}{dt} > 0$).
• Finally, the force on each wire is – naaaah, I’m too lazy to help you out any more.
Besides, I think you already found it in a previous homework assignment. The force
on the side wires is a bit tricky, mind you, but not that tricky and the final answer
is now very simple to obtain. What direction does the net force have to point even
before you work it out?
As noted, this is pretty much your first homework problem, given down below. While it
is OK to skim this part of the chapter before starting it, once you start it do not look back at
this example; try very hard to work through the reasoning on your own. This means, of course,
that if you are reading these words right before you start the homework, maybe you’d better
skim through this example again before you start...
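The chain of reasoning in this example (field, flux through a strip, total flux, induced voltage, current) can be checked numerically. The sketch below sums the flux strip by strip and compares it with the mutual inductance implied by equation (8.68). It assumes the standard long-straight-wire field B(r) = µ0 I1/(2πr) from Ampere's Law, and it assumes that a is the side of the loop parallel to the wire, b is its width, and d is the distance from the wire to the near side, as suggested by the logarithm in (8.68). All numerical values are made up for illustration.

```python
import math

MU0 = 4.0e-7 * math.pi   # permeability of free space, T m / A


def flux_numeric(I1, a, b, d, n=10_000):
    """Midpoint-rule sum of d(phi) = B(r) * a * dr for r from d to d + b."""
    dr = b / n
    total = 0.0
    for k in range(n):
        r = d + (k + 0.5) * dr
        B = MU0 * I1 / (2.0 * math.pi * r)   # field of a long straight wire
        total += B * a * dr
    return total


def mutual_inductance(a, b, d):
    """Closed form implied by eq. (8.68): M = mu0 a / (2 pi) * ln((d + b) / d)."""
    return MU0 * a / (2.0 * math.pi) * math.log((d + b) / d)


# Assumed geometry and circuit values, for illustration only.
a, b, d = 0.10, 0.05, 0.02      # metres
R = 0.5                         # ohms
I0, T = 2.0, 1.0e-3             # I1(t) = (I0 / T) t, so dI1/dt = I0 / T

M21 = mutual_inductance(a, b, d)
print("flux per ampere, numeric: ", flux_numeric(1.0, a, b, d))
print("mutual inductance, closed:", M21)
print("|I2| for the linear ramp: ", M21 * (I0 / T) / R, "A")
```

Because I1 grows linearly in this assumed ramp, dI1/dt is constant and so is the induced current I2. The net force on the loop still grows with time because it is proportional to the product I1 I2.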
There are a few other examples of “simple” geometries where one can compute the
mutual inductance, and you will do at least one other one on your homework. The place
where mutual inductance is a critical feature, the whole point instead of an annoyance, is
in the design and construction of transformers and inductively coupled rectifiers and the
like. There are some places where one can make very clever use of mutual induction to
accomplish some astounding things, such as in a Tesla Coil107.
8.7: Self-Induction
Now we get to one of the most important parts of this chapter: computing the self-inductance
of various simple current loops. We will have even fewer cases of geometries (and ideal-
izations!) where we can even think of doing the integrals in a course at this level, and I will
pretty much present all of them here. Interested students can, and should, visit wikipedia
here: Wikipedia: http://www.wikipedia.org/wiki/Inductance both to read more about induc-
tance itself and to see its lovely table of the self-inductance of a number of circuit shapes
with less idealization. Nevertheless, our idealized answers herein will be more than suf-
ficient to help us fully understand both the essential concepts and the general algebra
required to do a better job.
Our general solution strategy here will be:
a) Find the magnetic field produced by the current I in the loop in question. Usually
we will use Ampere’s Law for this simply because integrating the Biot-Savart Law for
arbitrary points in space is usually too difficult.
b) Write an expression for the flux produced by that field through the loop(s) that pro-
duce(s) it. This may be a simple product of field times area (for constant field per-
pendicular to the surface bounded by the loop) or an integral not unlike the one we
did for rectangular loops near a long straight wire.
c) In cases where there are many “turns” (loops of wire) contributing to the overall flux,
multiply by N , the number of turns.
Let’s start with the simplest and most important example, the moral equivalent of the
parallel plate capacitor for magnetic fields. The Self-Inductance of the (ideal) Solenoid:
In figure 8.14 I’ve drawn an “ideal” circular cross-section solenoid, one with N (tightly
wound) turns, a radius R, and a length ℓ ≫ R. Obviously I’ve had to exaggerate some of
these features in the drawing – the radius of the wire itself is really very small compared
to the other length dimensions, there is very little space between turns, and it should be
longer compared to its illustrative radius.
Following the rubric given above, we first find the field inside of the solenoid using
Ampere’s Law (see week 7 if you cannot remember the correct Amperian path to use as
107
Wikipedia: http://www.wikipedia.org/wiki/Tesla Coil. A Tesla Coil is basically a big resonant transformer
that makes Big Sparks. In fact, it pretty much makes lightning. As such, it is a great favorite for students to
make for an extra-credit project, because taming the lightning is what physics is all about, isn’t it...?
Figure 8.14: An ordinary (ideal) solenoid with N turns each carrying a current I(t) is drawn
above. The total flux through the solenoid is N times the flux through a single turn.
The solenoid has N turns, each with this flux. Yes, they all count, as each of them
contributes a piece ∆Vturn to the total potential difference as the current changes, so the
total will be N times that of just one turn:
$$\phi_{total} = \frac{\mu_0 N^2 \pi R^2}{\ell}\, I \tag{8.71}$$
Dividing the total flux by the current gives the self-inductance:
$$L = \frac{\mu_0 N^2 \pi R^2}{\ell} \tag{8.72}$$
Note that we generally make L positive by convention and figure out any signs using Lenz’s
Law and a bit of common sense, so inductors don’t come with a polarity or sign.
Nothing to it! Now suppose that I(t) = I0 sin(ωt) (a reasonable assumption for harmonic
alternating voltages such as those we will shortly study). We can easily find:
$$\Delta V_L = -L\frac{dI}{dt} = -I_0 (\omega L) \cos(\omega t) \tag{8.73}$$
where the field of the induced voltage opposes the increasing current during that part of its
harmonic oscillation and reinforces the decreasing current during that part of its oscillation.
As we indicate on the figure, if I, directed into the page at the top of the coils and out at
the bottom, is increasing, then the induced E-field points out of the page at the top and
in at the bottom and the induced potential decreases right-to-left, opposing the increasing
left-to-right current.
This may be tricky for you to see! The direction of the potential difference ultimately
depends on which way the coil was wound – if the helix spirals from left to right (in at the
top) as drawn then the net current transport is left to right and the induced voltage from
an increasing current decreases from right to left. If it is wound right to left (in at the top)
so that the net current transport is right to left as well, then the induced voltage for an
increasing current will be left to right. It all makes perfect sense in terms of Lenz’s Law
either way – the voltage decreases in the direction that opposes the flow of the increasing
current either way, and reverses to support it if and when the current decreases instead.
Before we move on, it is indeed worth pointing out that ωL in the expression for ∆VL
above has units of resistance (since I0 ωL has units of volts). Next week we will name ωL
inductive reactance as it will be a very important quantity in AC circuits.
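To get a feel for the sizes involved, here is a small numerical sketch of equation (8.72) and of the voltage obtained by differentiating I(t) = I0 sin(ωt). The solenoid dimensions, the current amplitude and the 60 Hz frequency are assumptions picked for illustration; the finite-difference derivative is only there to confirm the sign and magnitude of −L dI/dt.

```python
import math

MU0 = 4.0e-7 * math.pi   # permeability of free space, T m / A


def solenoid_inductance(N, R, length):
    """Ideal solenoid self-inductance, eq. (8.72)."""
    return MU0 * N**2 * math.pi * R**2 / length


# Assumed values: 1000 turns, 1 cm radius, 20 cm long, 0.5 A peak at 60 Hz.
L = solenoid_inductance(N=1000, R=0.01, length=0.20)
I0 = 0.5
omega = 2.0 * math.pi * 60.0


def current(t):
    return I0 * math.sin(omega * t)


def v_inductor(t):
    """-L dI/dt evaluated analytically for the sinusoidal current."""
    return -I0 * omega * L * math.cos(omega * t)


# Central finite difference as an independent check of -L dI/dt.
t, h = 1.0e-3, 1.0e-7
dI_dt = (current(t + h) - current(t - h)) / (2.0 * h)

print("L (henries):          ", L)
print("omega * L (ohms):     ", omega * L)
print("-L dI/dt, analytic:   ", v_inductor(t))
print("-L dI/dt, finite diff:", -L * dI_dt)
```

With these assumed numbers L comes out at about 2 mH and ωL at a bit under one ohm, which gives a sense of why ωL behaves like a resistance in AC circuits.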
In figure 8.15 we see the same toroidal solenoid that we saw in week 7, where we evaluated
the magnetic field inside using Ampere’s Law. We will follow exactly the same rubric as
before, except that this time I won’t actually do the steps for you; they are part of this week’s
homework. Remember:
a) Evaluate the field (magnitude) B(r) using Ampere’s Law. Only refer back to week 7
if you must, as by now you should be able to do this on your own without looking!
b) Evaluate the flux through a single turn of the toroidal solenoid. This will involve setting
up an integral that is almost exactly the same as the integral in the example of finding
the mutual inductance of a long straight wire and a rectangular loop above. Again,
try not to have to go back and look, as the picture should remind you of what you
need to do, and the integral itself is pretty trivial.
Figure 8.15: A tightly-wrapped toroidal solenoid with N turns produces a magnetic field
inside that varies with r, but is approximately constant everywhere in a narrow strip of
height h and width dr. The field is, of course, in the direction determined by the right hand
rule, meaning that it points in to the page through the shaded strip we need to use to find
the flux.
c) Multiply the flux for a single turn by N , the number of turns in the solenoid (as once
again each turn contributes to the overall potential difference) to find the total flux.
d) Divide the flux by the current to find the self-inductance of the solenoid.
e) Think a minute. Suppose the current I(t) in the direction shown in the figure is
increasing. What is the direction of the induced electric field around a loop? Suppose
it is decreasing, ditto? Either way, of course, the induced voltage across the two wires
leading to/from the solenoid will oppose the change in the current!
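If you want a number to check your homework answer against, the steps above can be carried out numerically without doing the integral by hand. The sketch below assumes the week 7 result B(r) = µ0 N I/(2πr) inside the toroid, and it assumes that a and b in figure 8.15 are the inner and outer radii and h the height of the rectangular cross-section. The specific numbers are invented for illustration.

```python
import math

MU0 = 4.0e-7 * math.pi   # permeability of free space, T m / A


def toroid_field(r, N, I):
    """Step a): field inside the toroid from Ampere's Law (week 7 result)."""
    return MU0 * N * I / (2.0 * math.pi * r)


def flux_single_turn(N, I, a, b, h, n=20_000):
    """Step b): midpoint-rule sum of B(r) * h * dr over the strip from a to b."""
    dr = (b - a) / n
    total = 0.0
    for k in range(n):
        r = a + (k + 0.5) * dr
        total += toroid_field(r, N, I) * h * dr
    return total


def toroid_inductance(N, a, b, h):
    """Steps c) and d): multiply by N turns, then divide by the current."""
    I = 1.0                                   # any nonzero current will do
    total_flux = N * flux_single_turn(N, I, a, b, h)
    return total_flux / I


# Assumed geometry: 500 turns, inner radius 5 cm, outer radius 8 cm, height 2 cm.
L = toroid_inductance(N=500, a=0.05, b=0.08, h=0.02)
print("numerical self-inductance (henries):", L)
```

The result should not depend on the current used in the intermediate step, which is a quick way to confirm that the flux really is proportional to I.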
Figure 8.16: Coaxial cables have a self-inductance measured per unit length. At high
frequencies the inductance only depends on the outer radius of the inner conductor a and
the inner radius of the outer conductor b. A strip of area dA = ℓdr is shown that may be of
use in computing L/ℓ, the self-inductance per unit length.
This sets up another homework problem, as I’m feeling even lazier than before and
you need to do the work in order to learn how! In figure 8.16 a current I(t) flows e.g. up
the (long, straight cylindrical shell) inner conductor and back on the (long, straight
cylindrical shell) outer conductor. From Ampere’s Law you can easily find the magnetic
field where it is confined in between the inner and outer conducting shells.
With the magnetic field in hand, it should be easy to find the flux through the dark
shaded strip shown (with the parameter ℓ in it, so this will yield the flux per unit length once
the ℓ is divided out) and integrate from a to b, an integral that should by now be boringly
familiar to you. Divide by the current and ℓ to find the self-inductance per unit length of the
cable.
That isn’t quite all of the cases where one can compute the self-inductance of some-
thing without needing to do absurdly difficult integrals or deal with even more heavily ap-
proximated fields – for example, you might think about what the self-inductance per unit
length is for a thick cylindrical wire of radius R and resistivity ρr – but it is pretty close.
8.8: LR Circuits
From here on out, with rare exceptions we will work with inductors as (self-inductive) circuit
elements just like capacitors and resistors. We will use “The Solenoid” (idealized) as our
archetypal inductor, and we will often pretend that inductors are made with superconducting
wire (as a further idealization) so that they have no resistance to worry about. Real induc-
tors, of course, are made with many turns of relatively thin wire and can have substantial
(non-negligible) resistance as well as self-inductance. However, their “resistive” properties
can always be considered to be a resistor in series with a pure zero resistance inductor,
so nothing is lost by the idealization as long as we remember to include their resistance in
our circuits.
Figure 8.17: The archetypal direct current LR circuit. We generally assume the switch
closes at time t = 0 with the current in the circuit I(0) = 0.
Let us, then, figure out a simple DC LR circuit, given in figure 8.17: an inductor in
series with a resistance R, which could be the natural resistance of the inductor itself, or
an external resistor, or the combined resistance of an external resistor and the resistance
of the inductor. Note well that we have generated a symbol for an inductor in an electrical
circuit, the squiggly thing that looks like a coil/solenoid with many turns of wire. We don’t
care much about how many turns it has, or how long it is, or what its cross-sectional area
is, or whether or not it contains a magnetic material (discussed later). All we care about
is the combined effect of all of this, the (self) inductance L (and possibly its contribution to
the total resistance R of any branch of a circuit it is in).
Obviously no current flows while the switch is open. We imagine closing the switch
at time t = 0. The battery will drive current through the wire. The resistor will oppose
this current (Ohm’s Law), and the inductor will also oppose this current as long as it is
increasing (Faraday’s Law). At some finite time t later, we expect to find some non-zero
current in the circuit, one that is changing in time, and will use this assumption in analyzing
the circuit algebraically.
First, however, let’s see what we can figure out using nothing but verbal reason and
dimensional analysis instead of algebra and calculus. We begin, as we see, at I(0) = 0.
After a very long time, we rather expect that the current will arrive at some constant value,
at which point the back-voltage generated by the inductor will be zero. The voltage gain
from the battery will all drop across the resistor, suggesting that the current will be I∞ =
V0 /R. Before even beginning the problem, we therefore expect a current I(t) that starts at
zero and approaches V0 /R, and we might guess that it will approach this current exponentially.
All that is left is guessing the exponential time constant.
Well, we have two parameters to play with: R and L. Ohms are Volts/Ampere. Henries
are Volt-Seconds/Ampere. We want a time constant in seconds, so it looks like:
$$\tau = \frac{L}{R} \tag{8.74}$$
will have units of seconds and is the simplest way of getting such a time out of the three
quantities that could appear in the answer, V0 , L and R. If our life depended on just writing
down an expression for I(t) that is at least approximately correct, we would then guess:
$$I(t) = \frac{V_0}{R}\left(1 - e^{-t/\tau}\right) = \frac{V_0}{R}\left(1 - e^{-\frac{R}{L}t}\right) \tag{8.75}$$
before starting the problem!
Although perhaps it will be a bit anticlimactic, let’s solve it the more difficult but formally
correct way. We start, as usual, with Kirchhoff’s Loop Rule, some arbitrary time after the
switch is closed:
$$V_0 - IR - L\frac{dI}{dt} = 0 \tag{8.76}$$
We rearrange this to put it in the standard form of a first order, linear, inhomogeneous
ordinary differential equation:
$$\frac{dI}{dt} + \frac{R}{L}\,I = \frac{V_0}{L} \tag{8.77}$$
At this point I shouldn’t have to help you. We’ve now solved this equation several times
over two semesters108 – it is directly integrable after some rearrangement and is clearly
an important equation to be able to effortlessly solve if you want to understand Nature, not
only in the context of physics but in biology and chemistry and medicine as well. If you
remember how, stop reading here, get out a piece of paper, and do so, verifying that you
get the solution I already deduced above without using algebra or calculus. Work neatly,
as this is a straight up homework problem so your efforts won’t be wasted.
108 Approach to terminal velocity with a linear drag force, approach to a terminal velocity for a rod on rails with
a battery or gravity, charging a capacitor in a DC RC circuit, for example.
But what the heck, you’re learning, you’ve forgotten, so I’ll solve it here again. But pay
attention this time – really learn to recognize this kind of equation and solve it when you
see it! Practice it a bit, then wait a day and try working through this section again, this time
solving the FOLIODE above without looking.
So here we go:
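$$\begin{aligned}
\frac{dI}{dt} + \frac{R}{L}\,I &= \frac{V_0}{L} \\
\frac{dI}{dt} &= \frac{V_0}{L} - \frac{R}{L}\,I \\
\frac{dI}{dt} &= -\frac{R}{L}\left(I - \frac{V_0}{R}\right) \\
\frac{dI}{I - \frac{V_0}{R}} &= -\frac{R}{L}\,dt \\
\int \frac{dI}{I - \frac{V_0}{R}} &= -\int \frac{R}{L}\,dt \\
\ln\left(I - \frac{V_0}{R}\right) &= -\frac{R}{L}\,t + C \\
\exp\left(\ln\left(I - \frac{V_0}{R}\right)\right) &= \exp\left(-\frac{R}{L}\,t + C\right) \\
I - \frac{V_0}{R} &= e^{-\left(\frac{R}{L}\right)t}\,e^{C} \\
I &= \frac{V_0}{R} + A e^{-\left(\frac{R}{L}\right)t} \\
I(t) &= \frac{V_0}{R}\left(1 - e^{-\left(\frac{R}{L}\right)t}\right)
\end{aligned} \tag{8.78}$$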
where we’ve used the fact that the natural log and exponential are inverse functions of one
another and where we set the (exponential of) the constant of integration from the indefinite
integrals A to −V0 /R in order that I(0) = 0 (the initial condition, recall).
8.8.1: Power
Let’s track the flow of energy in this circuit. Remember, the power delivered to/used by any
given circuit element is P = V I where V is the voltage gain/drop across the element and
I is the current through it (which we now know).
The power provided by the battery (positive):
$$P_V = V_0 I(t) = \frac{V_0^2}{R}\left(1 - e^{-\left(\frac{R}{L}\right)t}\right) \tag{8.79}$$
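The power burned in the resistor (negative – remember, this is energy that is all turned
into (joule) heat(ing)):
$$\begin{aligned}
P_R = V_R I(t) &= (-I(t)R)\,I(t) = -I(t)^2 R \\
&= -\frac{V_0^2}{R}\left(1 - e^{-\left(\frac{R}{L}\right)t}\right)^2 \\
&= -\frac{V_0^2}{R}\left(1 - 2e^{-\left(\frac{R}{L}\right)t} + e^{-2\left(\frac{R}{L}\right)t}\right)
\end{aligned} \tag{8.80}$$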
which is a bit more complicated, but still not terrible. Note that I stuck a minus sign in
front because this is power being removed from the system by the voltage drop across the
resistor. With this sign choice, we are guaranteed to have energy conserved, as we will
see below.
The power delivered to the inductor (negative, but where does this energy go? See the
next topic...):
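$$\begin{aligned}
P_L = V_L I(t) &= \left(-L\frac{dI}{dt}\right) I(t) \\
&= -L\,\frac{V_0}{R}\frac{R}{L}\,e^{-\left(\frac{R}{L}\right)t}\;\frac{V_0}{R}\left(1 - e^{-\left(\frac{R}{L}\right)t}\right) \\
&= -\frac{V_0^2}{R}\left(e^{-\left(\frac{R}{L}\right)t} - e^{-2\left(\frac{R}{L}\right)t}\right)
\end{aligned} \tag{8.81}$$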
Note that we used the fact that
$$V_L(t) = -L\frac{dI}{dt} = -V_0\,e^{-\left(\frac{R}{L}\right)t} \tag{8.82}$$
is the voltage drop across the inductor just as:
$$V_R(t) = -IR = -V_0\left(1 - e^{-\left(\frac{R}{L}\right)t}\right) \tag{8.83}$$
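The three power expressions (8.79), (8.80) and (8.81) can be added up numerically to see the energy bookkeeping at work: with the sign conventions used here, the battery power should be exactly cancelled by the (negative) resistor and inductor terms at every instant. The component values in this sketch are assumptions for illustration only.

```python
import math

# Assumed illustrative values: volts, ohms, henries.
V0, R, L = 12.0, 4.0, 0.2


def powers(t):
    """Return (P_V, P_R, P_L) at time t using the sign conventions of the text."""
    decay = math.exp(-(R / L) * t)
    current = (V0 / R) * (1.0 - decay)     # eq. (8.78)
    p_battery = V0 * current               # eq. (8.79), positive
    p_resistor = -current**2 * R           # eq. (8.80), negative
    p_inductor = (-V0 * decay) * current   # V_L from eq. (8.82) times I, eq. (8.81)
    return p_battery, p_resistor, p_inductor


for t in (0.0, 0.01, 0.05, 0.2):
    pv, pr, pl = powers(t)
    print(f"t = {t:4.2f} s  P_V = {pv:8.4f}  P_R = {pr:9.4f}  "
          f"P_L = {pl:8.4f}  sum = {pv + pr + pl:.1e}")
```

At early times most of the battery power goes into the inductor term, while at late times nearly all of it is burned in the resistor and the inductor term decays to zero.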
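Let’s imagine that the power delivered to the inductor is somehow being stored in the
inductor in the magnetic field. With the sign convention used above, PL is negative because
it is power removed from the rest of the circuit, so the rate at which energy accumulates in
the inductor is −PL. Then:
$$\frac{dU_L}{dt} = -P_L = LI\frac{dI}{dt} \tag{8.85}$$
or (multiplying by dt):
$$\begin{aligned}
dU_L &= LI\,dI \\
\int_0^{U_{tot}} dU_L &= \int_0^{I_0} LI\,dI \\
U_{tot} &= \frac{1}{2} L I_0^2
\end{aligned} \tag{8.86}$$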
This is the moral equivalent of the $U = \frac{1}{2}CV^2$ that we similarly derived for a capacitor, but
this is a dynamic quantity as it depends on the current flowing in the inductor.
Let us imagine that our inductor is an ideal solenoid with N turns, length ℓ, and
cross-sectional area A, one where the magnetic field inside the solenoid is constant and equal
in magnitude to:
$$B = \frac{\mu_0 N I_0}{\ell} \tag{8.87}$$
and that vanishes at the ends of the solenoid (neglecting fringing fields). We showed above
that the self-inductance of this ideal solenoid is:
$$L = \frac{\mu_0 N^2 A}{\ell} \tag{8.88}$$
which strangely matches our similar equation (deduced from very similar considerations)
for the energy density in the electric field:
$$\eta_e = \frac{dU_e}{dV} = \frac{1}{2}\epsilon_0 E^2 \tag{8.91}$$
There is something really sort of spooky about this – it is redolent110 of as-yet undiscov-
ered relationships between the electric and magnetic fields. Soon, my child, soon we will
understand this and a great burst of illumination will occur. Literally.
As was the case for capacitors, it isn’t enough to just make the ansatz. We need to
verify that it works for at least one other geometry of inductor, ideally one with a varying
field and inductance we can compute. Our only real choice here is the toroidal solenoid.
Suppose you have the very toroidal solenoid we studied above, carrying a current I. We can
use Ampere’s Law to find the magnetic field strength B(r) inside the solenoid, of course.
We can then use it to find:
$$\frac{dU}{dV} = \frac{B(r)^2}{2\mu_0} \tag{8.92}$$
If we multiply this out:
$$dU = \frac{B(r)^2}{2\mu_0}\,dV \tag{8.93}$$
and integrate both sides, we should get Um , the total energy stored in the magnetic field
(according to our ansatz).
Show that this is exactly equal to:
$$U = \frac{1}{2} L I^2 \tag{8.94}$$
using the L you found above.
Note that I’m not actually doing this for you, but I will help you one teensy bit. The
volume element dV you should use is the one of thickness dr at radius r with height h, or
$$dV = 2\pi r h\,dr \tag{8.95}$$
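Before (or after) doing the integral by hand, the claim can be tested numerically. The sketch below integrates the energy density B(r)²/(2µ0) over the toroid using the volume element (8.95), computes L separately from the total flux, and compares the two sides of (8.94). It assumes the week 7 field B(r) = µ0 N I/(2πr), and it assumes a and b are the inner and outer radii and h the height of the cross-section. The numbers are invented for illustration.

```python
import math

MU0 = 4.0e-7 * math.pi   # permeability of free space, T m / A


def toroid_field(r, N, I):
    """Field inside the toroid from Ampere's Law (assumed week 7 result)."""
    return MU0 * N * I / (2.0 * math.pi * r)


def stored_energy(N, I, a, b, h, n=20_000):
    """Midpoint-rule integral of B^2 / (2 mu0) over dV = 2 pi r h dr, eq. (8.93)."""
    dr = (b - a) / n
    total = 0.0
    for k in range(n):
        r = a + (k + 0.5) * dr
        density = toroid_field(r, N, I) ** 2 / (2.0 * MU0)   # eq. (8.92)
        total += density * 2.0 * math.pi * r * h * dr        # eq. (8.95)
    return total


def inductance(N, a, b, h, n=20_000):
    """Total flux (N turns times the flux through one turn) per unit current."""
    dr = (b - a) / n
    flux_one_turn = sum(toroid_field(a + (k + 0.5) * dr, N, 1.0) * h * dr
                        for k in range(n))
    return N * flux_one_turn


# Assumed values: 500 turns, radii 5 cm and 8 cm, height 2 cm, current 3 A.
N, a, b, h, I = 500, 0.05, 0.08, 0.02, 3.0
U_field = stored_energy(N, I, a, b, h)
U_circuit = 0.5 * inductance(N, a, b, h) * I**2
print("energy from field integral (J):", U_field)
print("energy from (1/2) L I^2 (J):   ", U_circuit)
```

The two numbers should agree to within rounding, which is numerical evidence for the energy density ansatz in a geometry where the field is not uniform.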
110 Politespeak for “it stinks”...
We have seen up above that a current loop resists being pulled from or pushed into a
magnetic field because the field induces currents that exert forces that act against any
change in flux. Just as this is true for actual e.g. loops of wire, it is also true for bulk
conductors! Any conducting material such as a sheet of copper will resist being pushed
into or pulled out of a magnetic field, because the changing field causes currents to loop
through the entire conductor as if it were many, many parallel wires. We call these currents
“eddy currents”.
Figure 8.18: A sheet of copper being pulled rapidly out of a field has induced eddy currents.
The forces from these currents, according to Lenz’s Law, resist the motion, causing a
magnetic “drag force” similar to that observed in the rod on rails problem. The kinetic
energy of the object is transformed into heat by these currents (resistive Joule heating).
Eddy currents are remarkably important, as they are a source of energy loss whenever
we attempt to e.g. alter a magnetic field in the vicinity of any conductor. Eddy currents
produce Joule heating of the conducting material very readily – one can actually cook food
on stoves that use a rapidly varying magnetic field to directly heat metal pots placed in the
field111 . Transformers (covered later) rely on rapidly varying, ferromagnetically enhanced
magnetic fields to step up or step down voltage, and unless care is taken to prevent eddy
currents in the design of the magnetic cores, much of the energy being transmitted through
the transformer will be lost to heating the cores. Eddy currents cancel electromagnetic
radiation at the surfaces of conductors, both heating the conductors slightly and causing
the electromagnetic field to reflect from the surface rather than be transmitted. It seems
worthwhile to spend a moment trying to understand them.
In figure 8.18 above, a sheet of copper being pulled rapidly out of a strong magnetic
field is illustrated. It is moving at some speed v to the right. As it is pulled out, the magnetic
flux through the entire sheet is reduced. This creates an induced field in the conductor and
its associated induced voltage that (because it is a good conductor) can and does drive
a large current in the copper. This current is not isolated or confined in the conductor –
the conducting sheet is like an entire field of parallel resistance pathways and the current
spreads out to use them.
Note well, however, that like the rod on rails problem (which this greatly resembles!)
the net force on the induced current is in a direction that opposes v (whichever direction
the sheet is moving, in or out of the field). The current flow in the field produces this force,
while the current flowing in the opposite direction through the part of the sheet that is out
111 Wikipedia: http://www.wikipedia.org/wiki/Induction_Cooking. This is actually a lovely article, and will introduce
you via a link to the notion of skin depth, as induction stovetops only tend to work on ferromagnetic pans
(such as cast iron) because they have a high magnetic permeability (discussed shortly), a small skin depth,
and hence concentrate the induced current in a thin layer of the iron with a much higher electrical resistance
than is obtained with an otherwise identical copper or aluminum pot.
of the field does not. One expects that the velocity of this sheet, like the velocity of the rod,
will be exponentially damped, or, if the sheet is being pulled, will reach a terminal velocity.
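To make the "exponentially damped, or terminal velocity" claim concrete, here is a minimal sketch. It assumes the eddy-current drag can be lumped into a single force -kv proportional to the speed, as in the rod on rails problem. The drag constant k, the mass, and the pulling force are illustrative assumptions, not values from the text.

```python
import math

def sheet_velocity(t, v0, mass, drag_k, pull_force=0.0):
    """Solution of m dv/dt = F_pull - k v with v(0) = v0 (assumed lumped model)."""
    tau = mass / drag_k                # damping time constant
    v_terminal = pull_force / drag_k   # speed at which the drag balances the pull
    return v_terminal + (v0 - v_terminal) * math.exp(-t / tau)

# Assumed numbers: 0.5 kg sheet and k = 2.0 N*s/m, so tau = 0.25 s.
for t in (0.0, 0.25, 0.5, 1.0):
    coasting = sheet_velocity(t, v0=3.0, mass=0.5, drag_k=2.0)
    pulled = sheet_velocity(t, v0=0.0, mass=0.5, drag_k=2.0, pull_force=4.0)
    print(f"t = {t:4.2f} s  coasting v = {coasting:5.3f} m/s  pulled v = {pulled:5.3f} m/s")
```

With no pull the speed decays toward zero; with a steady pull it rises toward F/k instead.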
The current itself is like a “whirlpool” or eddy of charge swirling around in the material,
hence the name eddy current. There are several simple demonstrations of eddy currents
– swinging a sheet of copper down between the poles of a powerful magnet with or without
slits that break up the conductive pathways and reduce the effect, swinging a magnet above
a conducting sheet, or (my favorite) dropping a powerful magnet down through a copper
pipe and a PVC pipe at the same time.
Magnetic brakes can use this same principle to stop a car, although (as a homework
problem will demonstrate) one can avoid wasting the energy by turning the wheel rotors
into “generators” that can store the energy in a battery as they remove it.
We will return to the notion of eddy currents when we treat transformers because the
iron cores of transformers are usually laminated – made of thin sheets or wires of iron
coated with and separated by an insulating resin – precisely to prevent eddy currents from
the rapidly changing magnetic fields they help support from heating the iron and hence
wasting the energy in the time varying magnetic field.
We have postponed discussing the magnetic properties of materials until here because
we had to wait until we understood the basic idea of Faraday’s and Lenz’s Laws. As we
will see, the diamagnetic property of some materials that corresponds to the dielectric
properties we’ve already studied comes about as a result of Faraday’s Law.
However, another good reason to wait until now is that magnetic properties of mate-
rials are much more complicated than electrical properties were. Back in electrostatics,
dielectric polarization was about it. Well, not really – a very few materials exhibit e.g. ferro-
electric properties, and further study also reveals that dielectric polarization and electrical
conductivity are two aspects of a single complex quantity and not really independent – but
close enough. If you put nearly any material in a static or slowly varying electrical field, the
field inside that material will be reduced.
If you put that same material in a static or slowly varying magnetic field, you might find:
• The magnetic field is altered by the addition of another vector magnetic field produced
by the material itself, a field that persists even if there is no external field. We call this
ferromagnetism.
These are all bulk descriptions, and fail to capture the wide variety of magnetic structure
one can discover on the microscopic scale of the material. They also are all properties
that depend on the temperature of the material. In fact, a single material can, at different
temperatures, be ferromagnetic, paramagnetic, and diamagnetic!
Thus far, we have been pretty successful in understanding things classically, but certain
aspects of the magnetic properties of matter rely heavily on quantum mechanics, in partic-
ular the fact that electrons have spin (and hence an intrinsic magnetic dipole moment) and
orbit the atomic nucleus in non-radiating, non-resistive orbits. We will have to draw at least
on these “cartoon” ideas as we seek to grasp the general concepts and ideas underlying
magnetic behavior of materials.
Diamagnetism
This is a course on classical physics, but magnetism in particular is very difficult to under-
stand on purely classical grounds. For example, we’ve seen above how conductors will at
least transiently reduce magnetic fields that attempt to penetrate them, as eddy currents
are induced around their perimeter. We can imagine that a superconductor with zero re-
sistance would reduce those fields to zero (and indeed that is the oversimplified case, with
some limitations) but superconductivity is a purely quantum phenomenon.
We don’t have to go to the extreme case of superconductivity to require a bit of quantum
theory in our explanation, however. Basically all three of the primary ways ordinary matter
modifies magnetic fields are at least partially quantum mechanical in their explanation.
Atoms can be thought of as more or less spherically symmetric balls of electrons sur-
rounding heavy pointlike nuclei. The electrons are in “orbits” around these nuclei, but the
orbits are not classical orbits like the Moon orbiting the Earth, they are non-radiating, zero
resistance flows of electronic current around the nucleus.
If a magnetic field is increased in the vicinity of an atom, Faraday’s Law suggests that
all electronic currents around an axis parallel to the magnetic field through the nucleus
will be increased or decreased as needed in order to reduce that field. This alteration in
the currents can be accompanied by an increase or decrease in the average radius of the
orbits in question, and by small changes in the energy of those orbits.
If the currents were classical currents moving against some form of resistance, the
decrease in magnetic field strength due to the induced current would be small, transient
and difficult to detect. However, quantum atomic orbitals have no resistance. As long
as the external magnetic field isn’t varied too rapidly by too great an amount, so that the
atom has time to “smoothly” adjust its orbitals, the induced current variation doesn’t involve
dissipation and the field reduction dynamically tracks the applied field and is “permanent”.
To see what happens inside a block of dense matter, we need to consider how all of
these reactive currents combine. In figure 8.19 an external magnetic field into the page is
applied to a (highly magnified) block of material. This field induces non-dissipating atomic
currents in the atoms that create magnetic dipoles pointing out of the page, opposing the applied field.
Inside the bulk of the material, the current circulating around one atom approximately
cancels the current circulating around the atoms next to it, where they are in contact. If
one does a coarse grained average of the current, it is nearly zero in any small volume of
the material containing many atoms.
This is not true on the surface. The currents of the atoms on the surface have no
Figure 8.19: Wherever “atomic” magnetic current loops adjoin one another, the average
current is zero. On the surface, however, there are no neighboring atoms, and the current
loops there are not cancelled. They add (on average) into a continuous surface current not
unlike that of a solenoid, so that the field everywhere in the interior is reduced.
neighboring atoms with currents running the opposite way on the outside, so there the
currents all combine, on average, to produce a net current running around the perimeter
of the object. This current is almost identical to that of a solenoid, and, like a solenoid,
there is a uniform field inside the material that directly opposes the applied external field
and hence reduces it inside of the material112 .
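The solenoid analogy can be made roughly quantitative. As a sketch, assume the atomic loops add up to a uniform surface current of K amperes per metre of length along the field direction (K is a symbol introduced here for illustration, not one used in the text). The ideal-solenoid result then gives an induced field of magnitude µ0 K inside a long sample, which subtracts from the applied field in a diamagnet.

```python
import math

MU0 = 4e-7 * math.pi  # permeability of free space, T*m/A

def interior_field(b_applied, surface_current_per_length, opposes=True):
    """Net field inside a long block wrapped by a uniform surface current.

    The surface current acts like an ideal solenoid with n*I = K, so it
    contributes mu0*K. It opposes the applied field when `opposes` is True
    (the diamagnetic case) and adds to it otherwise.
    """
    b_induced = MU0 * surface_current_per_length
    return b_applied - b_induced if opposes else b_applied + b_induced

# Assumed values: a 1.0 T applied field and K = 10 A/m of surface current.
print(interior_field(1.0, 10.0))
```

The reduction is tiny for these assumed numbers, which is consistent with diamagnetism being a weak effect in ordinary matter.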
We will call this reactive response diamagnetism, the exact analog of the dielectric
response of most insulators and conductors. Nearly all materials have a diamagnetic re-
sponse to applied magnetic fields (especially at higher temperatures), but many materials
have this response overridden by one or both of the following kinds of bulk magnetization,
which have very different explanations.
8.11.1: Superconductors
Certain materials, when cooled to extremely low absolute temperatures, become super-
conductors. Superconductivity is a more or less purely quantum mechanical phenomenon
and hence is beyond the scope of this book – basically a fraction of the electronic charge
starts to behave collectively like a macroscopic quantum “orbital” that can transport elec-
tronic charge without resistance.
Superconductors can be thought of as being “diamagnetic” – indeed perfectly diamag-
netic (as well as being perfectly dielectric) as they tolerate no magnetic or electric field
inside at all, but it isn’t exactly the same mechanism as merely opposing an applied field
via induction; a superconductor actively ejects any existing magnetic field as it is cooled
112
This follows from Ampere’s Law applied to e.g. paths parallel to the applied field on the inside of the
material that contain a piece of the surface current, similar to the “infinite plane sheet of current” we considered
earlier.
across the transition temperature where superconductivity appears, even if that field is not
changing. One visible sign of this ejection is that superconductors placed above a permanent
magnet float, suspended by its perfectly opposed magnetic field. This is called the
Meissner Effect (Wikipedia: http://www.wikipedia.org/wiki/Meissner_Effect).
Superconductors, of course, are potentially very useful – a long term search continues
for finding specially engineered materials that are superconducting at e.g. room tempera-
ture. A room temperature superconductor would have enormous positive implications for
our civilization – levitating trains that require no energy to levitate, loss-free transmission
of electrical energy over long distances, and much more – but so far they have eluded our
search. As of the time of this writing, the highest temperature superconductors thus far
found have critical temperatures in the range of 100-150 K, over 100 K
short of even the freezing point of water.
Still, enormous progress has been made in recent decades. We can certainly at least
hope that high(er) temperature superconductors eventually have a significant impact on
our lives.
Paramagnetism
Some molecules have permanent electric dipole moments. Many atoms or molecules have
permanent magnetic dipole moments. This is a purely quantum mechanical phenomenon.
Charged electrons and protons have spin and hence are permanent magnetic dipoles. As
atoms and nuclei are “built” out of many protons, neutrons, and electrons these spins are
paired when possible in such a way that no net moment results, but all across the periodic
table are elements with unpaired electrons or protons, and at least potential net spin and
magnetic moment. This angular momentum combines with orbital angular momentum to
produce many atoms with magnetic dipole moments113 .
We know that magnetic dipoles have a potential energy in an applied magnetic field
that is a minimum when the dipoles are aligned with the field. Although (as we have
seen) magnetic dipoles associated with angular momentum on the scale of elementary
particles or atoms experience a torque due to an applied magnetic field that causes their
angular momentum to precess around the magnetic field, they also experience many small
“random” torques due to thermal (heat) fluctuations in their environment. These torques
caused by e.g. collisions between atoms or vibrations in a lattice constantly more or less
randomly reorient the magnetic moments at high temperatures so that the system has no
net average magnetic dipole moment. A lattice of “spins” at high temperature is pictured in
figure 8.20.
At low temperatures there is less (free) energy to share among all of the spins – recall
that the equipartition theorem (for example) relates the total kinetic plus potential energy
in all of the degrees of freedom of an atom to its temperature. It is therefore a lot more
likely to find the atoms in states that have “less” magnetic potential energy in the field than
those that have more, and atoms have the least magnetic potential energy when they are
113 Wikipedia: http://www.wikipedia.org/wiki/Magnetic_moment#Magnetic_moment_of_an_atom. In fact there
is a dizzying array of ways these moments can arise, too many to exhaustively and correctly cover here.
Figure 8.20: A lattice of “spins” at high temperature (a) and low temperature (b) is por-
trayed as a two dimensional cartoon. The direction of the arrows can be thought of as the
directions of the angular momentum and hence magnetic moment of each atom, in a side
view that reveals their rough degree of alignment with the field. At high temperature the
spins are more or less randomly aligned with the field, but at low temperature there is less
free energy and the spins are much more likely to be in a lower energy state, partially or
completely aligned with the external field.
in alignment with the field! Consequently, at low enough temperatures we are likely to find
the “permanent” magnetic moments of the atoms or molecules (if any) aligned with the
applied external field!
This alignment causes a response to the field that is the exact opposite of the diamagnetic
one. Since the permanent magnetic moments are lined up with the field, and can be much
larger than the induced magnetic moments that oppose it and are being created at the same
time, the “current loops” still cancel in the interior and add up on the surface,
but this time to enhance or augment the applied field. The total magnetic field inside the
material is larger than the original external magnetic field. This is portrayed in figure 8.21.
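The competition between thermal disorder and alignment can be illustrated with a minimal two-state sketch. It assumes each moment m can only point along or against the field (an "up/down" toy model, introduced here as an assumption) and that the two states are weighted by the Boltzmann factor exp(-U/kT) with U = -mB or +mB. The average alignment is then tanh(mB/kT). Using one Bohr magneton as the moment and a 1 T field is an illustrative choice.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
MU_B = 9.2740100783e-24   # Bohr magneton, J/T (a typical atomic-scale moment)

def mean_alignment(moment, b_field, temperature):
    """Average alignment of a two-state moment with the field: tanh(m B / k T)."""
    return math.tanh(moment * b_field / (K_B * temperature))

# Assumed 1 T applied field; temperatures in kelvin.
for T in (300.0, 10.0, 1.0, 0.1):
    print(f"T = {T:6.1f} K  alignment = {mean_alignment(MU_B, 1.0, T):.4f}")
```

At room temperature almost none of the moments are aligned on average, while well below 1 K nearly all of them are.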
Figure 8.21: Just as was the case for a diamagnet, the internal currents of aligned magnetic
moments cancel (on average) in the bulk of the material, but the surface currents add. The
surface currents behave like the wires of a solenoid or sheet of current wrapped around
the object to increase the total field inside.
One can barely appreciate paramagnetism classically. Spinning electrons and orbits with
both angular momentum and a magnetic moment are classically accessible, even though
their properties (such as quantization of the angular momentum) are partly determined by
quantum theory. Not so for the next two kinds of magnetic behavior of materials. They are
purely quantum mechanical; one has the opposite sign altogether to anything you would
expect classically.
Let us suppose that the permanent magnetic moments on two neighboring atoms can
themselves interact. This alone isn’t inconceivable – one creates a (weak) magnetic field
at the location of the other, although the actual direction of that field is determined by
the relative orientation of the source dipole and the target location and hence not easy to
imagine. We will further suppose that the interaction is bilinear in the magnetic moments
themselves, and since energy is a scalar, we’ll make the bilinear product the scalar product
for simplicity.
That is, let us suppose that the potential energy of interaction between two neighboring
atoms (labelled with i and j respectively) has the general form:
$$U_{ij} = -J_{ij}\, \vec{m}_i \cdot \vec{m}_j \qquad (8.96)$$
where Jij is the energy coupling between the two moments. Note well that this form is by
no means unique or necessarily correct – it is more or less a hypothesis that we’d need to
test against observed materials.
If Jij > 0, the two moments will have minimum energy when they are aligned (ferro-
magnetism). If Jij < 0, the two moments will have minimum energy when they point in
opposite directions (antiferromagnetism). As before, when the temperature goes down,
the energy removed has to come from somewhere, so low temperatures will favor a “para-
magnetic” alignment or antialignment of the moments. The interesting thing is that this
alignment will occur even in the absence of an external field!
The energetics of this are illustrated in figure 8.22. This is yet another cartoon rep-
resentation in two spatial dimensions, this time of “spins” in one dimension (each spin is
associated with a magnetic dipole moment more or less as usual by a relation such as:
$$\vec{m}_e = \frac{e}{2 m_e}\, \vec{s} \qquad (8.97)$$
in a suitable system of quantized angular momentum units). In this kind of toy model, we
only let the spins point in one of two directions: up or down, to study only their tendency
Figure 8.22: A cluster of five magnetic moments (spins) is illustrated with the central spin
in two possible configurations. When the central spin is antiparallel to the four surrounding
spins, it has potential energy U_a = +4Jm^2 in a suitable system of units. When it lines up
parallel to the four surrounding spins, its energy is U_p = −4Jm^2.
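The two energies quoted for figure 8.22 follow directly from summing the pair energy (8.96) over the four neighbors. A minimal sketch, assuming up/down spins represented as +1 or -1, a single coupling J for all pairs, and units in which J = m = 1 (all illustrative choices):

```python
def cluster_energy(center, neighbors, coupling_j, moment=1.0):
    """Sum of U_ij = -J m_i m_j between a central spin and its neighbors.

    Spins are +1 (up) or -1 (down); `moment` is the magnitude m.
    """
    return sum(-coupling_j * (center * moment) * (s * moment) for s in neighbors)

J = 1.0                       # assumed coupling, J > 0 (ferromagnetic)
neighbors = [+1, +1, +1, +1]  # four surrounding spins, all up

print(cluster_energy(+1, neighbors, J))  # central spin parallel: -4 J m^2
print(cluster_energy(-1, neighbors, J))  # central spin antiparallel: +4 J m^2
```

Flipping the sign of J swaps the two results, which is the antiferromagnetic case where antialignment has the lower energy.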
cools. At a critical temperature, the size of one of the clumps spans the lattice and the
system develops a macroscopic magnetization characterized by a permanent magnetic
dipole moment. Not all of the spins point in the same direction (until one reaches absolute
zero in temperature, at any rate, which is impossible in any macroscopic sample) but the
majority do, with a fraction that increases to unity as one approaches zero temperature.
[Figure 8.23 labels: surface current, B inside, S]
Figure 8.23: In a ferromagnet, the magnetic dipoles spontaneously align when cooled
below a critical temperature. The resulting surface current transforms them into small
“solenoids” with a non-dissipative surface current surrounding their interior volume and
trapping magnetic flux that emerges from their north pole and flows to their south pole.
One last time we resort to our magnetization picture, this time (in figure 8.23) to illustrate the
permanent macroscopic magnetization of a bar magnet in the absence of an external field.
The critical temperature for the paramagnetic-ferromagnetic transition is called the Curie
Temperature after Pierre Curie (the husband of the perhaps better-known Marie Curie),
who showed that ferromagnetism was lost at this temperature. The critical temperature for
the related antiferromagnetic transition is called the Néel Temperature for similar reasons.
Physicists find the classic ferromagnetic phase transition to be very interesting because
it is an excellent example of the (sudden) emergence of long range order in a system that
is disordered at high temperatures. The magnetic susceptibility of the system, the heat
capacity of the system, and other thermodynamic descriptors of the system all do unusual
things at the critical temperature of the phase transition, often exhibiting divergent or non-
continuous behavior. Considerable effort has been expended on deriving a theory that
accurately describes things like the particular value of the critical temperature and certain
exponents that describe the divergences that occur there. These theories haven’t been
without some successes, but only a very few simple models have been solved exactly,
notably the 2 dimensional Ising model mentioned and portrayed in cartoon form above.
However, we can now use powerful computers to simulate the behavior of “ideal” magnetic
systems and compute their critical parameters with systematically improvable accuracy.
These computations in turn can be used to check the theoretical predictions (since
we lack “perfect” exemplars of the theoretical models in messy old nature).
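As a flavor of what such a simulation looks like, here is a minimal Metropolis Monte Carlo sketch of the two dimensional Ising model with nearest-neighbor coupling J and periodic boundaries. The lattice size, temperatures, sweep count, random seed, and units (J = k = m = 1) are all illustrative assumptions, and the Metropolis rule is one common sampling choice, not necessarily what a research code would use.

```python
import math
import random

def metropolis_sweep(spins, size, temperature, rng, coupling_j=1.0):
    """Attempt size*size single-spin flips using the Metropolis rule."""
    for _ in range(size * size):
        i, j = rng.randrange(size), rng.randrange(size)
        neighbor_sum = (spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                        + spins[i][(j + 1) % size] + spins[i][(j - 1) % size])
        # Energy change if spin (i, j) flips, from U_ij = -J s_i s_j.
        delta_u = 2.0 * coupling_j * spins[i][j] * neighbor_sum
        if delta_u <= 0 or rng.random() < math.exp(-delta_u / temperature):
            spins[i][j] = -spins[i][j]

def mean_magnetization(size, temperature, sweeps=200, seed=1):
    """Start fully aligned, run `sweeps` sweeps, and return |average spin|."""
    rng = random.Random(seed)
    spins = [[1] * size for _ in range(size)]
    for _ in range(sweeps):
        metropolis_sweep(spins, size, temperature, rng)
    return abs(sum(map(sum, spins))) / (size * size)

# Temperatures in units of J/k; the exact 2D transition is near 2.27.
for T in (1.0, 2.0, 5.0):
    print(f"T = {T:3.1f}  |magnetization| per spin = {mean_magnetization(16, T):.2f}")
```

Well below the transition the lattice stays almost fully magnetized, while well above it the magnetization per spin collapses toward zero. Pinning down the critical behavior itself takes much larger lattices and far longer runs than this toy.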
Magnetism, Concluded
With this we’ll wrap up our treatment of straight-up magnetic phenomena. As you can see,
it is considerably more complicated than electrostatics even before the dynamical behavior
associated with Faraday’s Law is introduced.
Magnetic forces are right-hand twisty. They appear to violate Newton’s Third Law, which
should make you very worried about the consistency of physics and the laws of Conser-
vation of Momentum and Angular Momentum. They appear or disappear, seeming to turn
somehow into the electric force as we change inertial reference frames (transforming into
a frame where a charge is at rest, for example).
The sources of magnetic fields are no less right-handed twisty. Fields circulate around
moving electric charges, and although we might expect to find free magnetic charges, so
far nobody has managed to salt the tail of one115 .
Finally (and best of all), it looks like changing magnetic fields are somehow able to cre-
ate electric fields! Magnetic induction is wonderfully complicated, with right hands twisting
this way and that trying to simultaneously track the directions of currents, magnetic fields,
electric fields produced by the magnetic fields, new currents created by the electric fields,
and forces between all of these currents and the magnetic fields they sit in! And did I
mention Lenz’s Law, that makes all of the induced responses work backwards?
Furthermore, if we look at Maxwell’s equations (so far) we have now seen the full set –
two Gauss Laws, Ampere’s Law, and Faraday’s Law – and there is no sign yet of Maxwell.
We do notice that the equations are getting more symmetric. Magnetic fields actually
behave almost like electric fields and vice versa and it looks strangely like one can turn
into the other if we merely look at it differently (changing reference frames, for example).
However, they aren’t quite right, somehow – Ampere’s and Faraday’s Law look like they
ought to be more consistent, but we can’t quite see how.
In a week, we’re going to look at Maxwell’s Equations again and make a startling dis-
covery – the one due to Maxwell – that makes the set of equations perfectly symmetric
except for the lack of magnetic charges, a problem that experimentalists might resolve
tomorrow by finding one. Maxwell’s addition will throw considerable light 116 on several
puzzles in physics, and in the process give us plenty of stuff to study and learn for the rest
of the semester.
115
Sorry, this is an ancient metaphor, associated with the idea that you can catch a bird by putting salt on its
tail. It is used by bored parents to torment their four year old children who want to catch the pretty birdies. As
in: “Oh, you want to catch that sparrow? All you have to do is put salt on its tail!” The child, of course, spends
days in the field with a box of salt, trying to get close to birds. Birds, not being that stupid, fly away anytime the
child and salt come near. Finally a great truth dawns on the child – you can’t salt the tail of a bird you haven’t
already caught...
116
Heh, heh. This is a pun, actually. If you don’t get it now, in Yodaspeak: “You will. You will.”
But first, let’s look at a completely different topic. Let’s look at harmonically alternating
voltages applied to electrical circuits containing inductors (L), resistors (R), and
capacitors (C) as well as generators or other voltage sources that produce harmonically
oscillating voltages. Along the way we will see how all of the things we have learned so
far form pretty much the basis for modern civilization, given that modern civilization would
regress overnight to a form not seen for over a century if our modern electrical power grid
were to fail. You are finally knowledgeable enough to be able to understand the power
grid – how electricity is generated, how it is transmitted long distances without significant
losses, how it is used when it gets there in all kinds of work saving and life saving devices.
You can also understand how electrical circuits can be combined to make information pro-
cessing devices – radios, televisions, computers, cell phones, music players, networks –
as well as a vast array of devices useful in medicine, business, industry, or the home.
Electricity helps make our cars and boats and planes and trains work, it cools our food
to keep it fresh and cooks our food to make it safe and savory to eat, it cleans our dishes
afterwards, it entertains us in all of the well-lit time we have to spare in the evenings in our
electrically heated or cooled houses, a time when our ancestors only a hundred and fifty
years ago either had to work or sleep for the lack of cheap light, huddling to keep warm in
houses heated (if at all) with costly wood or coal. Electricity saws the wood that builds our
houses, it weaves and sews the cloth we wear on our backs. Electricity enables us to grow
far more food than we could without it, transport that food for vast distances, and store it
safely until it is needed – cities would die almost overnight without it.
Nothing in human civilization is more important than maintaining and increasing the
flow of inexpensive electrical energy. With it, the poorest of our poor are wealthier than the
wealthiest of the kings, emperors, and nobles of yesteryear. Without it, billions of humans
would starve, our urban civilization would collapse, wars would erupt over access to food
and other resources that electricity makes cheap and plentiful.
Yet – to get up, just a bit, on a political soapbox – our elected leadership and the
population that elects them seem somehow to be blind to all of this. Nothing in human
civilization is more important than ensuring an inexhaustible source of electrical energy
to enable that civilization to continue, and yet we do almost nothing with our collective
resources to construct an electrical grid that does not rely on scarce and exhaustible fuels,
fuels that there are far better uses for than burning them.
There is plenty of non-scarce energy available on Earth to run a high level of civilization
not just for the few, but for every person on the planet. The Sun, the wind and the water
can provide us with power for as long as the Sun shines (some five billion more years),
the wind blows (as long as the Sun shines), the water flows (ditto). If we must burn fuels,
thermonuclear fuels such as deuterium are so abundant that they, too, are virtually inex-
haustible – even if the Earth runs out in a billion years or so, there is all of the rest of the
solar system to mine. Burning oil and coal, however, is simply inexcusable, except as a
short-term stopgap to keep civilization from collapsing while we change over to renewable
or inexhaustible resources.
But to make this changeover, we require political will. We have to invest in the changeover,
we have to mandate the changeover as a matter of social will. Until we have converted to
renewable energy, human civilization will hang by an ever eroding thread over an abyss of
misery. On the other hand, once we have converted, energy scarcity will never again be an
important social or economic issue and indeed, the world economies can actually stabilize
by using the more or less fixed value of energy as a standard of monetary value. Nearly
all scarcities in human affairs – water, food, living space, clothing, commodities – can be
relieved cheaply given only enough, cheap enough, electricity.
It is my hope that my students over the years, reading these words, will be inspired
to take action and bring about the next great age of man, the unlimited energy age. But
for you to have much hope of being effective, you have to understand electricity in a bit
more detail than most people do. Hopefully the next chapter will help you accomplish that
understanding.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
A long straight wire carries a current I(t) = I_0 sin(ωt). A rectangular loop of wire with
resistance R and dimensions a × b is a distance d away as shown.
[Figure labels: wire current I, loop sides a and b, separation d]
Find (as functions of time, in order):
Problem 3.
[Figure labels: voltage V, resistance R, switch S, rod length L, field B (in)]
A rod of length L and mass m initially sits at rest on two frictionless conducting rails
that sit in a plane perpendicular to a magnetic field as shown. At time t = 0 a switch S is
closed connecting a voltage V that goes through a resistance R and the rod. Assume that
the rod is initially at x(0) = 0.
Find (in order):
• Using heuristic reasoning (that is, not solving the equation of motion), what do you
expect the terminal velocity of the rod to be after the switch has been closed for a
long time? (Don’t forget to give the direction and explain your reasoning!)
• Write Kirchhoff’s Loop Rule for the circuit, including both the voltage of the battery and
the induced voltage as the rod moves with a (presumed) speed v.
• Write Newton’s Second Law for the rod and solve the equation of motion for v(t).
Does it approach the terminal velocity the way you would expect?
Problem 4.
Problem 5.
Problem 6.
Complete the toroidal solenoid example begun for you above (see figure 8.15). Find the
self-inductance L of a toroidal solenoid of N turns that has inner radius a, outer radius b,
and height h.
Problem 7.
Complete the coaxial-cable example begun for you above (see figure 8.16). That is, find
the high-frequency self-inductance per unit length of a coaxial cable with inner conductor
radius a, outer conductor radius b.
Problem 8.
[Figure labels: source V0, switch S (t = 0), resistors R and R, inductor L, currents I(0), I1(0), I2(0)]
In the circuit above, switch S has been closed for a very long time. At time t = 0 the
switch is opened. Find:
a) The currents I(0), I1 (0), and I2 (0) at t = 0 at the instant before the switch is opened.
b) Using Kirchhoff’s voltage rule, find (derive) and solve the differential equation for
I2 (t). Draw a qualitative plot of this function.
c) Write an expression for the energy stored on the inductor as a function of time, using
your answer to b). Draw a qualitative plot of this function.
Problem 9.
Estimate the braking power of the system as follows. Assume that each magnet pro-
duces a total peak flux φ = BA through a single turn of the loop. Also assume that the
flux of each magnet ramps up linearly from zero to φ and then back down to zero in the
time Tl required for the magnet to swing past a loop (so the flux is a “sawtooth” pattern as
a function of time). Determine Tl as a function of Ω and M . Then:
a) Estimate the induced voltage and current during the ramp up and ramp down phases.
Plot them as a function of time over several periods Tl , assuming constant Ω.
b) Compute the (effectively average) power as a function of Ω and plot it as well for
several Tl .
In the previous homework problem, you should have gotten an average power (slowing
the car down!) as a function of the angular velocity Ω of the wheel of the general
form:
$$P = \frac{dK}{dt} = -C\Omega^2$$
for a constant C that depends on M, N, etc. (the constant is not given here as you are
supposed to have derived this in the previous problem).
This is the rate at which the kinetic energy of the car is being reduced while e.g. recharging
the electric or hybrid car’s battery. As the kinetic energy is reduced, the car will slow down.
Note well that:
$$\Omega = \frac{v}{r_t}$$
where v is the speed of the car and rt is the radius of the tire. The kinetic energy of the car
is thus:
$$K = \frac{1}{2} m_c v^2 = \frac{1}{2} m_c \Omega^2 r_t^2$$
So here’s the challenge: Convert the expression for the power above into a differential
equation for K, the kinetic energy of the car. Solve for the kinetic energy K(t) as a function
of time, starting from an arbitrary initial value K0. Use this result (or a re-expression of the
ODE as a differential equation for the rate of change of the velocity/speed of the car) to
find v(t) (assuming it is initially in the positive direction with initial value $v_0 = \sqrt{2K_0/m_c}$).
Note well that according to this model, the car would never quite stop if only magnetic
braking were used (assuming that the model above is accurate even for very small speeds
– unlikely). Cars with magnetic brakes must always transition to friction brakes when the
speed becomes small, because they exert less braking force and remove less energy as
the car slows down!
Week 9: Alternating Current Circuits
• AC Generator: If one spins a coil with N turns and cross-sectional area A at angular
velocity ω in a uniform magnetic field B oriented so that it passes straight through the
coil at one point in its rotation, one generates an alternating voltage according to:
$$V(t) = NBA\omega \sin(\omega t) = V_0 \sin(\omega t)$$
• The most common models for household electrical distribution are represented in
the following table (note well that ω = 2πf where f is the frequency of the source
in Hertz).
Table 4: Common alternating voltages and frequencies in use around the world. There is
a dazzling array of plug types in use around the world as well.
208 volts is the potential difference between any two phases of a three-phase
“Wye” main supply in the US where the pole voltages are 120 volts relative to ground
($120\sqrt{3} \approx 208$), and 240 volts is similarly the difference between two 120 volt
lines that are completely out of phase. Do not use this table as an authoritative guide to
electrical main supplies around the world; there are many such authoritative guides and
tables available on the internet [117].
[117] Wikipedia: http://www.wikipedia.org/wiki/Mains_electricity. See also the many links in this article.
• The reason for using such low frequencies is that AC does not flow uniformly through
a conductor – it is confined to within roughly one exponential decay length of the outer
surface of a conductor, a length called the skin depth. At 60 Hz this length is roughly
8.5 mm in copper; copper conductors “an inch in diameter” or more have relatively little
current transmitted along their axis, whereas at 10 kHz (an arguably safer frequency) it is
0.66 mm in copper. A skin depth comparable to or smaller than the radius of a wire
increases the resistance of the wire by effectively decreasing its cross-sectional area.
50 or 60 Hz are thus compromises between the need to use AC to transmit energy long
distances and the need to minimize the resistance of the transmission wires along the way.
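The skin depths quoted in this bullet can be checked with a few lines of Python. This is only a sketch under stated assumptions: it uses the standard good-conductor estimate δ = sqrt(2ρ/(ωμ)), which is not derived in this text, a copper resistivity of 1.68e-8 Ohm-meters, and μ ≈ μ0. The function name `skin_depth` is made up for illustration.

```python
import math

MU_0 = 4.0e-7 * math.pi     # permeability of free space (T*m/A)
RHO_COPPER = 1.68e-8        # assumed resistivity of copper (Ohm*m)

def skin_depth(frequency_hz, resistivity=RHO_COPPER, mu=MU_0):
    """Skin depth delta = sqrt(2*rho/(omega*mu)) for a good conductor."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 * resistivity / (omega * mu))

# Compare mains frequencies with a much higher frequency.
for f in (50.0, 60.0, 1.0e4):
    print(f"f = {f:8.0f} Hz   skin depth = {1.0e3 * skin_depth(f):.2f} mm")
```

With these assumed material constants the estimate comes out at roughly 8.4 mm at 60 Hz and roughly 0.65 mm at 10 kHz, in line with the rule-of-thumb figures above.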
• It is no exaggeration to state that this is the fundamental basis for modern civilization.
Power distributed over long distances using step-up and step-down transformers has
created the highest global standard of living in human history. Some 2/3 of the world’s
population uses nearly ubiquitous electricity to light, heat and cool their homes, to
refrigerate and cook their food, to fuel devices that provide increasingly universal ac-
cess to information in many of its sensory forms – musical, textual, visual, to provide
transportation, to fuel industry and commerce and agriculture. If the electrical grid for
any reason ceased to function we would regress to a medieval existence in a matter
of weeks (as I have personally experienced as both hurricanes and ice storms have
caused weeklong power outages in North Carolina on more than one occasion).
• There are two critical aspects of so-called alternating current (AC) that we will study
in this course. The first is transformers and the electrical grid that delivers power
to points distant from the generators with minimal loss. The second is the basis for
signal processing electronics: the LRC band-pass circuit (or tank circuit) that can be
used with rectifiers to build a simple amplitude-modulation (AM) radio. This circuit
and its variants are ubiquitous in non-digital (and most digital) information processing
devices.
• The Transformer: The transformer is basically a pair of flux-coupled coils, one (the
primary) with Np turns connected to the source of alternating voltage, the other (the
secondary) with Ns turns connected to the load that actually consumes the energy
delivered from the source. All of the flux that passes through any turn in the primary
or secondary coils passes (with as little loss as it is possible to arrange) through all
of the turns in both coils. The flux is usually coupled by wrapping the coils around
e.g. a torus of soft iron that traps flux, laminated to prevent eddy currents (called the
transformer core).
• If we let φm be the flux trapped in the core that passes through a single turn, then:
$$V_s = N_s \frac{d\phi_m}{dt} \tag{9.5}$$
$$V_p = N_p \frac{d\phi_m}{dt} \tag{9.6}$$
$$\frac{V_s}{V_p} = \frac{N_s}{N_p} \tag{9.7}$$
Note that we omit Lenz’s law in this expression because we can wrap either coil either
way around the core so that the voltages on primary or secondary side can be “in
phase” or “exactly out of phase” as we wish.
• A transformer can thus step voltage up to higher levels or step it down to lower ones,
depending on whether Np < Ns or vice versa.
• Here’s the trick of the power grid. The resistance of a wire is (recall) $R = \rho L/A$ (where
A is the effective cross section at a given frequency). A copper wire just under a
quarter inch thick has a resistance of roughly 1 Ohm/mile (rule of thumb). A wire about
three quarters of an inch thick has a resistance of roughly 0.1 Ohms/mile. Wires this thick are
heavy and expensive and have to carry a lot of energy. Now, suppose we have a
power station a mere ten miles from your home. The total resistance of all the wires
between that power station and your home is easily of order an ohm. Now imagine
that you turn on a single 100 Watt bulb (drawing roughly 1 A in current). The power
station must provide 101 Watts for your bulb to burn – 100 Watts used by the bulb
and $I^2 R \approx 1$ Watt used in the supply line.
However, you then turn on the rest of your lights, your refrigerator kicks on, your AC
starts up. Your house is now drawing more like 100 Amperes (delivered in parallel
to the many appliances) and is using order of 10000 Watts. So is the supply line!
Half of the energy being delivered to your home is wasted as heat along the way. A
second consequence is that the voltage at your house is reduced to a fraction of the
nominal voltage as you turn on more appliances and more of the voltage drop occurs
across the supply resistance!
The solution is to transmit at high voltage and low current and use at low voltage
and high current. If we step up the voltage to (say) 10,000 Volts (real long distance
transmission is at much higher voltages than this) then in order to deliver the same
power at the far end, instead of delivering 100 Amps at 100 volts one can deliver 1
Amp at 10,000 Volts! The resistive heating of the supply line is back to 1 Watt out of
10,000 delivered. Here the square in $I^2 R$ becomes your friend – delivering 10 kW at
100,000 V requires only 0.1 A and uses only 0.01 W heating the wire.
This is good for transmission, but bad for utilization. 100,000 volts can arc an appre-
ciable distance through even dry air; that’s why the insulators on high voltage trans-
mission towers are so long! We’d hate to get electrocuted every time we changed a
light bulb as power arced out of the socket through our bodies on the way to ground.
With an entire power plant delivering the energy, even the (mere) 16,000 volt lines
that run down the streets can literally make your body explode if you should stray
within a few cm of a supply line. Remember the crispy-fried squirrel story!
• Consequently, there is always a step-down transformer at the very end of the line,
that drops the voltage in our houses to the much safer but still dangerous 120 volts
(relative to ground). We use currents on the order of 1-20 Amps within the house,
which is low enough that the resistive heating of the order of 30-50 meter long house-
hold supply lines remains low. Even “low” can waste a lot of heat! 12 gauge copper
wire has a resistance of a bit less than 0.25 Ohms in 50 meters, wasting around
100 watts heating the wire all along its length when one draws 20 Amps of current
(and reducing the line voltage available to the ∼2000 watt appliance at the end that
is drawing all of that power by roughly 5%). Personally, I prefer to do primary runs
in household wiring with the even thicker 10 gauge wire (and not to use the thinner
14 gauge wire at all) to minimize heat loss in the household wiring. As you can see,
though, you can easily waste anywhere from 1% to 5% of your energy bill simply
heating the space inside your walls!
• Non-driven LC circuit: In figure 9.1, the capacitor C on the left is initially charged
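[Figure 9.1: undriven LC circuit with switch S, capacitor C carrying charge +Q, and inductor L.]
$$\frac{Q}{C} - L\frac{dI}{dt} = 0 \tag{9.8}$$
where
$$I = -\frac{dQ}{dt} \tag{9.9}$$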
If we substitute this relation in for the I’s and divide by L, we get the following second
order, linear, homogeneous ordinary differential equation:
$$\frac{d^2 Q}{dt^2} + \frac{Q}{LC} = 0 \tag{9.10}$$
We recognize this as the differential equation for a harmonic oscillator! To solve it,
we “guess” [118]:
$$Q(t) = Q_0 e^{\alpha t} \tag{9.11}$$
which, substituted into the ODE, yields the characteristic equation:
$$\alpha^2 + \frac{1}{LC} = 0 \tag{9.12}$$
We solve for:
$$\alpha = \pm i\sqrt{\frac{1}{LC}} = \pm i\omega_0 \tag{9.13}$$
and get:
$$Q(t) = Q_{0+} e^{+i\omega_0 t} + Q_{0-} e^{-i\omega_0 t} \tag{9.14}$$
[118] Not really.
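As a quick numerical sanity check of the LC result, the following Python sketch evaluates the real solution Q(t) = Q0 cos(ω0 t) (the special case Q0+ = Q0− = Q0/2), confirms by finite differences that it satisfies the oscillator equation, and checks that the total stored energy stays constant. The component values and all function names are hypothetical, chosen only for illustration.

```python
import math

def omega_0(L, C):
    """Natural angular frequency of the LC circuit, omega_0 = 1/sqrt(LC)."""
    return 1.0 / math.sqrt(L * C)

def charge(t, Q0, L, C):
    """Real solution Q(t) = Q0 cos(omega_0 t)."""
    return Q0 * math.cos(omega_0(L, C) * t)

def current(t, Q0, L, C):
    """I = -dQ/dt for the solution above."""
    w = omega_0(L, C)
    return Q0 * w * math.sin(w * t)

def ode_residual(t, Q0, L, C, dt=1.0e-6):
    """d2Q/dt2 + Q/(LC), with the second derivative from central differences."""
    q_minus = charge(t - dt, Q0, L, C)
    q_mid = charge(t, Q0, L, C)
    q_plus = charge(t + dt, Q0, L, C)
    return (q_plus - 2.0 * q_mid + q_minus) / dt**2 + q_mid / (L * C)

def total_energy(t, Q0, L, C):
    """Energy in the capacitor plus energy in the inductor."""
    return charge(t, Q0, L, C)**2 / (2.0 * C) + 0.5 * L * current(t, Q0, L, C)**2

# Hypothetical values: 10 mH, 1 microfarad, 1 microcoulomb.
L_EX, C_EX, Q0_EX = 10.0e-3, 1.0e-6, 1.0e-6

print("omega_0 =", omega_0(L_EX, C_EX), "rad/s")
for t in (0.0, 1.0e-4, 2.5e-4):
    print(t, ode_residual(t, Q0_EX, L_EX, C_EX), total_energy(t, Q0_EX, L_EX, C_EX))
```

The residual is not exactly zero because of the finite-difference step, but it should be tiny compared to the natural scale Q0/(LC) of the individual terms.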
• Non-driven LRC circuit: In figure 9.2, the capacitor C on the left is initially charged
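[Figure: circuit with switch S, capacitor C carrying charge +Q, inductor L, and resistor R.]
Figure 9.2: Undriven LRC circuit
$$\frac{Q}{C} - L\frac{dI}{dt} - IR = 0 \tag{9.16}$$
where
$$I = -\frac{dQ}{dt} \tag{9.17}$$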
If we substitute this relation in for the I’s and divide by L, we get the following second
order, linear, homogeneous ordinary differential equation:
$$\frac{d^2 Q}{dt^2} + \frac{R}{L}\frac{dQ}{dt} + \frac{Q}{LC} = 0 \tag{9.18}$$
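We recognize this as the differential equation for a damped harmonic oscillator. To
solve it, we “guess” [119]:
$$Q(t) = Q_0 e^{\alpha t} \tag{9.19}$$
which, substituted into the ODE, yields the characteristic equation:
$$\alpha^2 + \frac{R}{L}\alpha + \frac{1}{LC} = 0 \tag{9.20}$$
We solve for:
$$\begin{aligned}
\alpha &= -\frac{R}{2L} \pm \frac{\sqrt{\left(\frac{R}{L}\right)^2 - \frac{4}{LC}}}{2} \\
&= -\frac{R}{2L} \pm i\omega_0\sqrt{1 - \frac{R^2 C}{4L}} \\
&= -\frac{R}{2L} \pm i\omega_0\sqrt{1 - \frac{\tau_C}{4\tau_L}} \\
&= -\frac{R}{2L} \pm i\omega'
\end{aligned} \tag{9.21}$$
where $\tau_L = L/R$, $\tau_C = RC$, $\omega' = \omega_0\sqrt{1 - \frac{\tau_C}{4\tau_L}}$, and our final solution looks like:
$$Q(t) = Q_0 e^{-\frac{Rt}{2L}} \cos(\omega' t) \tag{9.22}$$
(after we choose the real part of the complex exponential and use the initial conditions).
[119] Not really.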
From this we can easily find the current through and voltage across all of the elements
of the circuit. Finally, given the current and voltages it is easy to show that energy is
conserved, that the initial energy stored in the capacitor exactly balances the energy
consumed in the resistor as t → ∞.
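The damped solution can be checked numerically in the same spirit. The sketch below computes ω′ from τL and τC, compares it with the roots of the characteristic equation (9.20) found directly from the quadratic formula, and verifies by finite differences that (9.22) satisfies (9.18). It assumes the underdamped case (τC < 4τL); the component values and function names are hypothetical.

```python
import cmath
import math

def damped_frequency(R, L, C):
    """omega' = omega_0 sqrt(1 - tau_C/(4 tau_L)); underdamped case only."""
    omega_0 = 1.0 / math.sqrt(L * C)
    tau_L = L / R
    tau_C = R * C
    return omega_0 * math.sqrt(1.0 - tau_C / (4.0 * tau_L))

def characteristic_roots(R, L, C):
    """Roots of alpha^2 + (R/L) alpha + 1/(LC) = 0 via the quadratic formula."""
    b = R / L
    c = 1.0 / (L * C)
    root = cmath.sqrt(b * b - 4.0 * c)
    return (-b + root) / 2.0, (-b - root) / 2.0

def charge(t, Q0, R, L, C):
    """Q(t) = Q0 exp(-R t / 2L) cos(omega' t)."""
    return Q0 * math.exp(-R * t / (2.0 * L)) * math.cos(damped_frequency(R, L, C) * t)

def ode_residual(t, Q0, R, L, C, dt=1.0e-6):
    """Left-hand side of the damped oscillator ODE using central differences."""
    q_minus = charge(t - dt, Q0, R, L, C)
    q_mid = charge(t, Q0, R, L, C)
    q_plus = charge(t + dt, Q0, R, L, C)
    dq = (q_plus - q_minus) / (2.0 * dt)
    d2q = (q_plus - 2.0 * q_mid + q_minus) / dt**2
    return d2q + (R / L) * dq + q_mid / (L * C)

# Hypothetical values: 20 Ohms, 10 mH, 1 microfarad, 1 microcoulomb.
R_EX, L_EX, C_EX, Q0_EX = 20.0, 10.0e-3, 1.0e-6, 1.0e-6

alpha_plus, alpha_minus = characteristic_roots(R_EX, L_EX, C_EX)
print("roots:", alpha_plus, alpha_minus)
print("-R/2L =", -R_EX / (2.0 * L_EX), "  omega' =", damped_frequency(R_EX, L_EX, C_EX))
print("ODE residual at t = 0.2 ms:", ode_residual(2.0e-4, Q0_EX, R_EX, L_EX, C_EX))
```

For overdamped circuits the square root in `damped_frequency` would become imaginary and this sketch would raise an error, which is a reminder that (9.22) describes only the oscillating case.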
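[Figure: alternating voltage source V(t) driving a current I(t) through a resistance R.]
$$V_0 \sin(\omega t) - IR = 0 \tag{9.23}$$
or
$$I_R(t) = \frac{V_0}{R}\sin(\omega t) \tag{9.24}$$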
and we see that the current is in phase with the voltage drop across a resistor.
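[Figure: alternating voltage source V(t) driving a current I(t) onto a capacitance C.]
$$V_0 \sin(\omega t) - \frac{Q}{C} = 0 \tag{9.25}$$
We can solve for Q(t):
$$Q(t) = CV_0 \sin(\omega t) \tag{9.26}$$
$$\begin{aligned}
I_C(t) = \frac{dQ(t)}{dt} &= (\omega C)V_0 \cos(\omega t) \\
&= (\omega C)V_0 \sin(\omega t + \pi/2) = I_0 \sin(\omega t + \pi/2)
\end{aligned} \tag{9.27}$$
where
$$I_0 = (\omega C)V_0 = \frac{V_0}{\chi_C} \tag{9.28}$$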
We see that the current is π/2 ahead in phase of the voltage drop across the capacitor.
We will actually usually use this the other way around and note that the voltage
drop across the capacitor is π/2 behind the current through it. We call the quantity
$\chi_C = \frac{1}{\omega C}$ (which clearly has the units of Ohms) the capacitative reactance, the
“resistance” of a capacitor to alternating voltages.
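[Figure: alternating voltage source V(t) driving a current I(t) through an inductance L.]
$$V_0 \sin(\omega t) - L\frac{dI}{dt} = 0 \tag{9.29}$$
We can solve for dI(t):
$$dI = \frac{V_0}{L}\sin(\omega t)\,dt \tag{9.30}$$
We integrate both sides to get:
\begin{align}
I_L(t) &= \frac{V_0}{L}\int \sin(\omega t)\,dt \nonumber \\
&= \frac{V_0}{\omega L}\int \sin(\omega t)\,\omega\,dt \nonumber \\
&= -\frac{V_0}{\omega L}\cos(\omega t) \tag{9.31} \\
&= \frac{V_0}{\omega L}\sin(\omega t - \pi/2) \tag{9.32} \\
&= I_0 \sin(\omega t - \pi/2) \tag{9.33}
\end{align}
where
$$I_0 = \frac{V_0}{\omega L} = \frac{V_0}{\chi_L} \tag{9.35}$$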
We see that the current is π/2 behind in phase of the voltage drop across the inductor.
We will actually usually use this the other way around and note that the voltage drop
across the inductor is π/2 ahead of the current through it. We call the quantity χL =
ωL (which clearly has the units of Ohms) the inductive reactance, the “resistance” of
an inductor to alternating voltages.
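To make the two reactances and their phase relations concrete, here is a small Python sketch. It evaluates χC and χL at a given frequency and checks numerically that the capacitor current (9.27) equals C dV/dt while the inductor current (9.33) satisfies L dI/dt = V. The source amplitude, frequency, component values, and function names are all hypothetical choices made for illustration.

```python
import math

def capacitive_reactance(omega, C):
    """chi_C = 1/(omega C), in Ohms."""
    return 1.0 / (omega * C)

def inductive_reactance(omega, L):
    """chi_L = omega L, in Ohms."""
    return omega * L

def source_voltage(t, V0, omega):
    return V0 * math.sin(omega * t)

def capacitor_current(t, V0, omega, C):
    """Current leads the voltage by pi/2."""
    return (V0 / capacitive_reactance(omega, C)) * math.sin(omega * t + math.pi / 2)

def inductor_current(t, V0, omega, L):
    """Current lags the voltage by pi/2."""
    return (V0 / inductive_reactance(omega, L)) * math.sin(omega * t - math.pi / 2)

def derivative(f, t, dt=1.0e-7):
    """Central-difference estimate of df/dt."""
    return (f(t + dt) - f(t - dt)) / (2.0 * dt)

# Hypothetical values: 170 V peak at 60 Hz, 1 microfarad, 10 mH.
V0_EX, OMEGA_EX = 170.0, 2.0 * math.pi * 60.0
C_EX, L_EX = 1.0e-6, 10.0e-3

print("chi_C =", capacitive_reactance(OMEGA_EX, C_EX), "Ohms")
print("chi_L =", inductive_reactance(OMEGA_EX, L_EX), "Ohms")

t = 1.3e-3
dV_dt = derivative(lambda s: source_voltage(s, V0_EX, OMEGA_EX), t)
dI_dt = derivative(lambda s: inductor_current(s, V0_EX, OMEGA_EX, L_EX), t)
print("C dV/dt =", C_EX * dV_dt, "  I_C =", capacitor_current(t, V0_EX, OMEGA_EX, C_EX))
print("L dI/dt =", L_EX * dI_dt, "  V   =", source_voltage(t, V0_EX, OMEGA_EX))
```

Note how differently the two elements respond at this low frequency: the capacitor presents a few thousand Ohms while the inductor presents only a few Ohms, and the roles reverse as the frequency increases.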
• The Series LRC Circuit:
[Figure: series LRC circuit driven by an alternating voltage source, with current I(t).]
We apply Kirchhoff’s voltage/loop rule to this circuit and get:
$$V_0 \sin(\omega t) - L\frac{dI}{dt} - RI - \frac{Q}{C} = 0 \tag{9.36}$$
or
$$V_L + V_R + V_C = V_0 \sin(\omega t) \tag{9.37}$$
or
$$\frac{d^2 Q}{dt^2} + \frac{R}{L}\frac{dQ}{dt} + \frac{1}{LC}Q = \frac{V_0}{L}\sin(\omega t) \tag{9.38}$$
There are a number of ways to solve this second order, linear, inhomogeneous ordinary
differential equation. We will first show a simple one that relies on a “guess”,
then we will show how if we use complex exponentials we really don’t have to guess.
Our goal will be to solve for all voltage drops, the current in the circuit, the power
delivered to each circuit element and the entire circuit as a whole – pretty much
everything.
The first thing to note is that if we find at least one “particular” solution Qp(t) to the
inhomogeneous ODE, we can construct a new solution by adding any solution to the
homogeneous ODE (the undriven LRC circuit solved above) and still get a solution.
That is, a general solution can be written:
$$Q(t) = Q_h(t) + Q_p(t) \tag{9.39}$$
where Qh(t) is a solution to the homogeneous ODE.
Note that the solution to the homogeneous ODE decays in time exponentially. It is a
transient contribution to the overall solution and after many lifetimes τL = L/R it will
generally be negligible.
The remaining particular part is therefore called the steady state part of the solution,
and it persists indefinitely, as long as the driving voltage remains turned on. We
expect the time dependence of the steady state solution to be harmonic (like the
applied voltage) and to have the same frequency as the applied voltage. However,
there is no particular reason to expect the charge Q to be in phase with the applied
voltage.
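We will find it slightly more convenient to work at first with the current I than the
charge Q – we can always find Q(t) (or VC) by integration and VL by differentiation –
although when we go to a complex formulation it won’t matter. If we make the guess:
$$I(t) = I_0 \sin(\omega t - \phi) \tag{9.40}$$
then solving the problem is easy [120]. We begin by noting the voltage drops across all
three circuit elements in terms of I(t):
$$V_R = I_0 R \sin(\omega t - \phi) \tag{9.41}$$
$$V_L = I_0 \chi_L \sin(\omega t - \phi + \pi/2) \tag{9.42}$$
$$V_C = I_0 \chi_C \sin(\omega t - \phi - \pi/2) \tag{9.43}$$
or
$$I_0 R \sin(\omega t - \phi) + I_0 \chi_L \sin(\omega t - \phi + \pi/2) + I_0 \chi_C \sin(\omega t - \phi - \pi/2) = V_0 \sin(\omega t) \tag{9.44}$$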
Our goal, then, is to find values of I0 and φ for which this equation is true. This is
quite simple. Suppose I use a phasor diagram to add the trig functions graphically:
[Figure: phasor diagram with phasors I0R, I0χL, I0χC and their resultant V0, showing the angles φ and ωt.]
The y-components of the phasors on the diagram that are proportional to I0 must
add up to produce V0 sin(ωt), and this must be true if we add up the phasors as
shown, taking advantage of our knowledge of the phase of the voltage drop across
the various elements relative to the current through those elements.
If we let V0 = I0 Z where Z is called the impedance of the circuit, we can cancel the
I0 and get the following triangle for the impedance:
[Figure: impedance triangle with legs R and χL − χC, hypotenuse Z, and angle φ.]
From this triangle we can easily see that:
$$Z = \sqrt{R^2 + (\chi_L - \chi_C)^2} \tag{9.45}$$
so that
$$I_0 = \frac{V_0}{Z} \tag{9.46}$$
and
$$\phi = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) \tag{9.47}$$
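The phasor result is easy to test numerically: pick a driving frequency, compute Z, I0, and φ from the impedance triangle, and confirm that the three voltage drops sum to V0 sin(ωt) at every instant. The Python sketch below does exactly that. The component values, driving frequency, and function names are hypothetical and chosen only for illustration.

```python
import math

def series_lrc(V0, omega, R, L, C):
    """Return (I0, phi, Z) for the driven series LRC circuit."""
    chi_L = omega * L            # inductive reactance
    chi_C = 1.0 / (omega * C)    # capacitive reactance
    Z = math.hypot(R, chi_L - chi_C)
    phi = math.atan2(chi_L - chi_C, R)   # same as atan((chi_L - chi_C)/R) for R > 0
    return V0 / Z, phi, Z

def voltage_drops(t, V0, omega, R, L, C):
    """Return (V_R, V_L, V_C) at time t for the current I0 sin(omega t - phi)."""
    I0, phi, _ = series_lrc(V0, omega, R, L, C)
    chi_L = omega * L
    chi_C = 1.0 / (omega * C)
    theta = omega * t - phi
    return (I0 * R * math.sin(theta),
            I0 * chi_L * math.sin(theta + math.pi / 2),
            I0 * chi_C * math.sin(theta - math.pi / 2))

# Hypothetical values: 10 V peak, 20 Ohms, 10 mH, 1 microfarad,
# driven somewhat below the resonant frequency of 1.0e4 rad/s.
V0_EX, R_EX, L_EX, C_EX = 10.0, 20.0, 10.0e-3, 1.0e-6
OMEGA_EX = 8.0e3

I0, phi, Z = series_lrc(V0_EX, OMEGA_EX, R_EX, L_EX, C_EX)
print(f"Z = {Z:.3f} Ohms, I0 = {I0:.4f} A, phi = {phi:.4f} rad")
for k in range(5):
    t = k * 1.0e-4
    total = sum(voltage_drops(t, V0_EX, OMEGA_EX, R_EX, L_EX, C_EX))
    print(f"t = {t:.1e} s   sum of drops = {total:+.6f}   V0 sin(wt) = {V0_EX * math.sin(OMEGA_EX * t):+.6f}")
```

Below resonance the capacitor dominates, so φ comes out negative and the current leads the applied voltage; above resonance the sign of φ flips.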
The parallel LRC circuit is actually much simpler than the series as far as understanding
the solution is concerned. This is because the same voltage drop V0 sin(ωt)
occurs across all three components, and so we can just write down the currents:
$$I_R = \frac{V_0}{R}\sin(\omega t) \tag{9.48}$$
$$I_L = \frac{V_0}{\chi_L}\sin(\omega t - \pi/2) \tag{9.49}$$
$$I_C = \frac{V_0}{\chi_C}\sin(\omega t + \pi/2) \tag{9.50}$$
Note well that we use the rules we derived where the current through the inductor is
π/2 behind the voltage (which is therefore π/2 ahead of the current) and vice versa
for the capacitor. To find the total current provided by the voltage, we simply add
these three currents according to Kirchhoff’s junction rule. Of course, we are adding
three trig functions with different relative phases, so we once again must accomplish
this with suitable phasors:
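$$\begin{aligned}
I_{tot} &= \frac{V_0}{R}\sin(\omega t) + \frac{V_0}{\chi_L}\sin(\omega t - \pi/2) + \frac{V_0}{\chi_C}\sin(\omega t + \pi/2) \\
&= \frac{V_0}{Z}\sin(\omega t - \phi) \\
&= I_0 \sin(\omega t - \phi)
\end{aligned} \tag{9.51}$$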
In this expression, a bit of contemplation should convince you that the impedance Z
for this circuit is given by the entirely reasonable:
$$\frac{1}{Z} = \sqrt{\frac{1}{R^2} + \left(\frac{1}{\chi_C} - \frac{1}{\chi_L}\right)^2} \tag{9.52}$$
which we recognize as the phasor equivalent of the familiar rule for reciprocal addition
of resistances in parallel, and:
$$\phi = \tan^{-1}\left(\frac{\frac{1}{\chi_C} - \frac{1}{\chi_L}}{\frac{1}{R}}\right) = \tan^{-1}\left(\frac{RC(\omega^2 - \omega_0^2)}{\omega}\right) \tag{9.53}$$
for the phase.
Resonance for this circuit is a bit unusual – it is the frequency $\omega = \omega_0 = \frac{1}{\sqrt{LC}}$ as
before, but now Z is largest (so 1/Z is smallest) at resonance and the current increases away from
resonance. The power delivered to the resistance no longer depends on L or C and
only depends on the frequency as:
$$P_R = \frac{V_0^2 \sin^2(\omega t)}{R} \tag{9.54}$$
so that the average power delivered to the circuit is:
$$\langle P \rangle = \langle P_R \rangle = \frac{V_0^2}{2R} \tag{9.55}$$
independent of frequency altogether. Away from resonance, one simply generates a
large (but irrelevant) current in either L (for low frequencies) or C (for high frequen-
cies) that is out of phase with the voltage and hence dissipates zero average power
per cycle.
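Two claims about the parallel circuit lend themselves to a direct numerical check: that the peak of the summed current is V0/Z with Z given by (9.52), and that the average power delivered is V0²/(2R) no matter what the driving frequency is. The Python sketch below samples one full period to test both. The component values and function names are hypothetical.

```python
import math

def parallel_lrc_impedance(omega, R, L, C):
    """Z from 1/Z = sqrt(1/R^2 + (1/chi_C - 1/chi_L)^2)."""
    chi_L = omega * L
    chi_C = 1.0 / (omega * C)
    return 1.0 / math.hypot(1.0 / R, 1.0 / chi_C - 1.0 / chi_L)

def total_current(t, V0, omega, R, L, C):
    """Sum of the resistor, inductor, and capacitor currents at time t."""
    chi_L = omega * L
    chi_C = 1.0 / (omega * C)
    wt = omega * t
    return (V0 / R * math.sin(wt)
            + V0 / chi_L * math.sin(wt - math.pi / 2)
            + V0 / chi_C * math.sin(wt + math.pi / 2))

def peak_current(V0, omega, R, L, C, samples=20000):
    """Largest |I_tot| found by sampling one period uniformly."""
    period = 2.0 * math.pi / omega
    return max(abs(total_current(k * period / samples, V0, omega, R, L, C))
               for k in range(samples))

def average_power(V0, omega, R, L, C, samples=2000):
    """Average of V(t) * I_tot(t) over one period."""
    period = 2.0 * math.pi / omega
    total = 0.0
    for k in range(samples):
        t = k * period / samples
        total += V0 * math.sin(omega * t) * total_current(t, V0, omega, R, L, C)
    return total / samples

# Hypothetical values: 10 V peak, 20 Ohms, 10 mH, 1 microfarad (omega_0 = 1.0e4 rad/s).
V0_EX, R_EX, L_EX, C_EX = 10.0, 20.0, 10.0e-3, 1.0e-6

for omega in (2.0e3, 1.0e4, 5.0e4):
    Z = parallel_lrc_impedance(omega, R_EX, L_EX, C_EX)
    print(f"omega = {omega:8.0f}  Z = {Z:7.3f}  V0/Z = {V0_EX / Z:.4f}  "
          f"peak I = {peak_current(V0_EX, omega, R_EX, L_EX, C_EX):.4f}  "
          f"<P> = {average_power(V0_EX, omega, R_EX, L_EX, C_EX):.4f}")
```

Only the in-phase resistor current contributes to the average; the inductor and capacitor currents change the peak current drawn from the source but average to zero power over a cycle.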
As we have seen in the previous chapter, if one spins a coil with N turns and cross-
sectional area A at angular velocity ω in a uniform magnetic field B oriented so that it
passes straight through the coil at one point in its rotation, one generates an alternating
voltage according to:
$$V(t) = NBA\omega \sin(\omega t)$$
This is, in fact, the functional form of the voltage that comes out of wall receptacles in
your house, no matter what the voltage or frequency used by your particular country of
residence. It is also the general functional form of electrical signals generated by many
other means in (for example) radio transmitters.
In this chapter, then, we will learn to treat “arbitrary” harmonic alternating voltage
sources as having the form:
V (t) = V0 sin(ωt) (9.58)
where of course we can introduce an arbitrary phase (corresponding to the choice of when
we start our clock). In this expression, remember that:
2π
ω = 2πf = (9.59)
T
where f is the frequency of the harmonic oscillation in units of Hertz (cycles per second)
and T is the corresponding period.
We will also look at slightly more general voltage sources that are nearly harmonic, in
particular amplitude modulated harmonic sources such as:
$$V(t) = A(t)\sin(\omega t)$$
where A(t) is a slowly varying function of time (making only small changes over many
periods T of the harmonic part). More advanced students should note well that we will not
properly treat this problem by means of e.g. a Fourier Transform, as knowledge of Fourier
Transforms (however useful!) is not a requirement for this course. We will barely explore
some of the benefits of treating voltages or currents given in a complex form:
$$V(t) = V_0 e^{i\omega t}$$
where V0 may be a general complex number, V0 = |V0 |eiδ but again, advanced students
should keep in mind the fact that this often makes things much easier once one has paid
the price of learning how to use algebra over the field of complex numbers plus a few things
such as Cauchy’s theorem and Fourier Transforms. Some ideas, such as the importance
of having enough bandwidth to encode an amplitude modulated (or otherwise encoded)
signal on top of a given carrier frequency while nevertheless remaining well resolved from
nearby carriers carrying information on other channels are very difficult to prove without
using this more advanced math, so students will have to content themselves with a few of
this book’s rare it-is-so-because-I-say-so assertions without proper derivation or justification.
One very important thing all students should learn from this chapter is just how alternating
voltages and high-voltage transmission lines, together, are nothing less than the
basis for modern civilization – a country’s productive capacity and the comfort of its citizens
are directly linked to its ability to generate electrical energy and distribute it widely in a
cost-effective way.
Nothing convinces one more of this than the not-terribly-infrequent instances of power
outages when hurricanes, ice storms, earthquakes, or solar storms interrupt the power
grid for days or even weeks of time. During the downtime one immediately loses all refrig-
eration (so stored food spoils), heating and cooling (so one has to survive at the ambient
temperature as best one can), the ability to turn lights on and off with the touch of a finger
(so one can stay up later and get up earlier than the sun), the ability to drive safely (no
traffic lights), the ability to bank or shop indoors in shopping malls (no air conditioning,
lights, electronic cash registers, check card readers), the ability to listen to music, com-
pute, browse the internet (once local battery stores are exhausted). Over a single week life
devolves to what it was like over a century ago before the advent of universally accessible,
inexpensive electricity.
Life over a century ago, without electricity, sucked!
The most common models for household electrical distribution are represented in the
following table (note well that ω = 2πf where f is the frequency of the source in Hertz).
Table 5: Common alternating voltages and frequencies in use around the world. There is
a dazzling array of plug types in use around the world as well.
208 volts is the potential difference between any two phases of a three-phase “Wye” main
supply in the US where the pole voltages are 120 volts relative to ground
($120\sqrt{3} \approx 208$), and 240 volts is similarly the difference between two 120 volt
lines that are completely out of phase. Do not use this table as an authoritative guide to
electrical main supplies around the world; there are many such authoritative guides and
tables available on the internet [121].
[121] Wikipedia: http://www.wikipedia.org/wiki/Mains_electricity. See also the many links in this article.
ricanes and ice storms have caused weeklong power outages in North Carolina on more
than one occasion).
Let us understand the transformer and the role that it plays in the transmission of power.
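[Figure: iron-core transformer with primary coil (Np turns, voltage Vp sin ωt, current Ip) and secondary coil (Ns turns, voltage Vs sin ωt, current Is) connected to a load R; the flux is confined to the core.]
Figure 9.9: A transformer transforms voltage V1 into a new voltage V2, for time-varying
(usually sinusoidal) voltages only.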
The transformer is basically a pair of flux-coupled coils, one (the primary) with Np
turns connected to the source of alternating voltage, the other (the secondary) with Ns
turns connected to the load that actually consumes the energy delivered from the source.
All of the flux that passes through any turn in the primary or secondary coils passes (with
as little loss as it is possible to arrange) through all of the turns in both coils. The flux
is usually coupled by wrapping the coils around e.g. a torus of soft iron that traps flux,
laminated to prevent eddy currents (called the transformer core).
If we let φm be the flux trapped in the core that passes through a single turn, then:
$$V_s = N_s \frac{d\phi_m}{dt} \tag{9.64}$$
$$V_p = N_p \frac{d\phi_m}{dt} \tag{9.65}$$
or (taking the ratios of these two equations, in order)
$$\frac{V_s}{V_p} = \frac{N_s}{N_p} \tag{9.66}$$
Note that we omit Lenz’s law in this expression because we can wrap either coil either way
around the core so that the voltages on primary or secondary side can be “in phase” or
“exactly out of phase” as we wish.
A transformer can thus step voltage up to higher levels or step it down to lower ones,
depending on whether Np < Ns or vice versa. This seems as though it would be obviously
useful for many, many things, and of course it is. Sometimes we need a high voltage and a
low current in a wire; other times we need a low voltage and a high current. Note well that
we can’t magically get a higher voltage and more current out of a transformer as this would
violate energy conservation. In fact, if we compute the power delivered by the primary
voltage to the transformer and equate it to the power consumed by the secondary circuit,
then as long as the transformer itself doesn’t get hot (removing energy from the circuit of
its own accord):
Pp = Vp Ip = Vs Is = Ps (9.67)
or, if we use the fact that Vs = Vp Ns /Np and divide a couple of times, we find that:
$$I_s = \frac{N_p}{N_s} I_p \tag{9.68}$$
When the voltage goes up (Ns > Np ) the current goes down, and vice-versa.
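A few lines of Python make the bookkeeping for an ideal (lossless) transformer explicit. This is only a sketch of the two ratio rules above; the function name `ideal_transformer` and the numerical values are hypothetical, and real transformers fall short of this because of resistive and eddy current losses.

```python
def ideal_transformer(Vp, Ip, Np, Ns):
    """Return (Vs, Is) for a lossless transformer with Np primary and Ns secondary turns."""
    Vs = Vp * Ns / Np    # voltage scales with the turns ratio
    Is = Ip * Np / Ns    # current scales with the inverse turns ratio
    return Vs, Is

# Hypothetical step-up example: 120 V and 10 A on a 100-turn primary, 1000-turn secondary.
Vp_EX, Ip_EX, Np_EX, Ns_EX = 120.0, 10.0, 100, 1000

Vs_EX, Is_EX = ideal_transformer(Vp_EX, Ip_EX, Np_EX, Ns_EX)
print("secondary voltage:", Vs_EX, "V")
print("secondary current:", Is_EX, "A")
print("power in:", Vp_EX * Ip_EX, "W   power out:", Vs_EX * Is_EX, "W")
```

Swapping Np and Ns turns the same device into a step-down transformer; in either direction the product of voltage and current is unchanged.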
Of course this does assume that the transformer itself and all of its wiring doesn’t have
any resistance and get hot, and the iron core of the transformer must also not get hot.
However, the iron core is itself a conductor. When the magnetic flux through it is constantly
changing it induces a voltage in it that causes a current to flow. That current, flowing in the
resistance of the iron, generates heat! This kind of inductive heating is said to be caused
by eddy currents, currents induced in any conductor by rapidly changing magnetic flux
through the conductor.
It is also clearly undesirable, as the heat that appears in the iron core is lost and hence
reduces the available power (voltage and current alike) on the secondary compared to what
comes in through the primary. To minimize eddy currents, the iron core is usually made of
laminated strips of iron separated by insulating resin or out of insulated wires of iron. The
small cross-sectional area of the individual conductors thus minimizes flux, voltage and
current, and thereby losses to heating through eddy currents.
Now, high voltage is dangerous. Dielectric breakdown can easily occur if the voltage is
high enough – power can simply leap through the air in an electrical arc and fry whatever
it passes through on its way to ground. Nevertheless, we find it very useful to use high
voltage to transmit electrical power long distances by using the fact that current goes down
as the voltage goes up for any given power being delivered.
When electricity was first introduced into society on a grand scale (largely by Thomas Edi-
son, to use in his recently invented light bulbs) Edison wished to power the world with
direct current (DC) lines from his generating stations directly into your home, at a very
low (and thereby safe) voltage. Edison had a number of patents on various aspects of
DC power generation, storage, and metering, and had a vested interest in all of this tech-
nology. However, Edison was no mathematician, and did not understand electricity or
Maxwell’s equations (indeed, at the time Maxwell’s equations were only about 20 years
old and there weren’t a lot of people who weren’t mathematicians or physicists who did
understand them).
There is just one problem. At low voltages, delivering power across miles of wire to
households can easily be shown to waste almost all of that energy heating the wires that
carry it, and leave almost none for the households at the end! Edison’s solution required
there to be a DC generating plant within a mile, at most, of every household that received
its energy, and required massive amounts of copper even then for its transmission lines.
At the same time, George Westinghouse had hired a young man named William
Stanley Jr. [122] to work on implementing an alternating current distribution system using AC
transformers, which had just been invented. Stanley (working with a few others) in 1886
built a working AC distribution system that distributed the electricity over long distances
with low currents at very high voltages in Great Barrington (basically, the neighborhood
in which he lived). This system required much, much thinner wires and could distribute
the energy over long distances but also required that the voltage be stepped down to a
relatively “safe” voltage inside the homes and businesses where it was going to be used.
Westinghouse almost immediately began to sell and build AC distribution systems based
on Stanley’s design for US cities that were eager to reap the benefits of electricity for their
citizens.
Westinghouse also acquired at roughly the same time the patent(s) of a young man
named Nikola Tesla [123] on polyphase generators and motors that would run on AC voltage,
basically going “all in” on AC distribution. This led to the so-called War of the Currents [124]
between the two companies – Edison’s General Electric and Westinghouse’s
Westinghouse (both of which survive as supergiant corporations to this day).
Edison lost. We absolutely need to learn, and understand, why as it is of paramount
importance today, some 130 years later, as we struggle to convert to renewable resource
electrical generation, conversion of e.g. sunlight or the power of the wind in suitable lo-
cations into electrical current and its transmission across thousands to as many as ten
thousand miles from those locations to where it will be consumed (say, from the Sahara
desert to Finland, or India to Siberia).
So here’s the trick of the power grid, Stanley’s solution. The resistance of a wire is
(recall) $R = \rho L/A$ (where A is the effective cross section at a given frequency). A copper wire
just under a quarter inch thick has a resistance of roughly 1 Ohm/mile (rule of thumb). A
wire about three quarters of an inch thick has a resistance of roughly 0.1 Ohms/mile. Wires this thick are
heavy and expensive and have to carry a lot of energy. Now, suppose we have a power
station a mere ten miles from your home. The total resistance of all the wires between that
power station and your home is easily of order an ohm. Now imagine that you turn on a
single 100 Watt bulb (drawing roughly 1 A in current). The power station must provide 101
Watts for your bulb to burn – 100 Watts used by the bulb and $I^2 R \approx 1$ Watt used in the
supply line.
[122] Wikipedia: http://www.wikipedia.org/wiki/William_Stanley_Jr. William Stanley, incidentally, is also the
inventor of the “Stanley stainless steel thermos”. Interestingly, General Electric eventually bought out a
controlling interest in Stanley’s own electrical research and manufacturing company.
[123] Wikipedia: http://www.wikipedia.org/wiki/Nikola_Tesla. Tesla was the original “mad scientist” – he is the
original inventor of the radio (and was cheated of the patent), he invented his own ruinously inefficient and
short range electrical distribution system based on the Tesla coil, invented the X-ray tube and photographed
the bones of his own hand before Roentgen (but failed to publish or patent and lost the technical descriptions in
a fatal fire that destroyed much of his work prematurely), he purportedly invented a “death ray”, but destroyed
it after a single apocryphal demonstration of its effects. He had a photographic memory and reportedly
experienced direct insight into problems he was working on, bypassing all normal routes to invention or design. He
is basically an enormously interesting person; I strongly recommend reading at least the Wikipedia article on
him.
[124] Wikipedia: http://www.wikipedia.org/wiki/War_of_Currents. Again, a worthwhile read.
However, you then turn on the rest of your lights, your refrigerator kicks on, your AC
starts up. Your house is now drawing more like 100 Amperes (delivered in parallel to
the many appliances) and is using order of 10000 Watts. So is the supply line! Half
of the energy being delivered to your home is wasted as heat along the way. A second
consequence is that the voltage at your house is reduced to a fraction of the nominal
voltage as you turn on more appliances and more of the voltage drop occurs across the
supply resistance!
The solution is to transmit at high voltage and low current and use at low voltage and
high current. If we step up the voltage to (say) 10,000 Volts (real long distance transmission
is at much higher voltages than this) then in order to deliver the same power at the far
end, instead of delivering 100 Amps at 100 volts one can deliver 1 Amp at 10,000 Volts!
The resistive heating of the supply line is back to 1 Watt out of 10,000 delivered. Here the
square in $I^2R$ becomes your friend – delivering 10 kW at 100,000 V requires only 0.1 A
and uses only 0.01 W heating the wire.
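The arithmetic above is easy to check for yourself. The following is a minimal sketch, assuming the same illustrative numbers used in the text (a 10 kW load and 1 Ohm of total supply-line resistance); the function name `line_loss_watts` is made up for this example.

```python
def line_loss_watts(power_delivered_w, voltage_v, line_resistance_ohm):
    """Return the I^2 R heating of the supply line for a given delivered power."""
    current_a = power_delivered_w / voltage_v  # I = P / V at the delivery voltage
    return current_a ** 2 * line_resistance_ohm


if __name__ == "__main__":
    load_w = 10_000.0  # roughly one busy household, as in the text
    line_ohm = 1.0     # assumed total supply-line resistance
    for volts in (100.0, 10_000.0, 100_000.0):
        loss_w = line_loss_watts(load_w, volts, line_ohm)
        print(f"{volts:>9.0f} V: {load_w / volts:7.2f} A, {loss_w:10.2f} W lost in the line")
```

Each factor of ten in transmission voltage cuts the current by ten and the line loss by one hundred, which is the whole point of the step-up transformer.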
This is good for transmission, but bad for utilization. 100,000 volts can arc an appre-
ciable distance through even dry air; that’s why the insulators on high voltage transmission
towers are so long! We’d hate to get electrocuted every time we changed a light bulb as
power arced out of the socket through our bodies on the way to ground. With an entire
power plant delivering the energy, even the (mere) 16,000 volt lines that run down the
streets can literally make your body explode if you should stray within a few cm of a supply
line.
In one of the few instances in my memory of a power outage at Duke, a squirrel was
crispy-fried when it got inside the barbed wire fences at a major step-down transformer
serving part of the campus. It strayed too near to the main power buses, which arced over
(through the squirrel) blowing the transformer and shutting down power to the campus for
a time. Imagine how exciting life would be if every time you went to plug an electric light
into your 16,000 volt household wiring or flicked a switch on a humid day, you risked being
electrocuted by what amounts to a manmade lightning bolt!
“Exciting” isn’t quite the right word for it. Consequently, there is always a step-down
transformer at the very end of the line, that drops the voltage in our houses to the much
safer but still dangerous 120 volts (relative to ground). Why such a high voltage? Wouldn’t
(say) 12 volts be even safer? Sure, but we still have to transmit the energy around inside
the house as well, and there is a reason car jumper cables are made of much thicker wires
than (say) a lamp cord!
Even inside the house we use devices that use anywhere from a watt or two up to
almost 2000 watts for devices plugged into an ordinary receptacle. At 120 volts, these
devices draw currents as high as 15 to 20 Amps (per circuit) within the house before circuit
breakers or fuses interrupt the circuit. This is a low enough current that the resistive heating
of the order of 10-50 meter long household supply lines remains “low”, specifically low
enough that the wires don’t melt their insulation and/or set your house on fire if there is a
short circuit momentarily drawing a much higher load!
Even this “low” resistance can waste a lot of heat! 14 gauge copper wire has a resistance
of around 0.25 Ohms per 100 feet of wire, wasting around 56 watts heating the wire
all along its length when one draws the maximum National Electrical Code (NEC) permitted
15 amps of current. It also reduces the line voltage available to the appliance(s) at
the end that are drawing all of that current by 3-4%. There is a reason people refer to
it as “110” volt household wiring when the voltage produced at the outdoor transformer is
carefully regulated at 120 volts: the actual voltage at your appliance could be as much as
10 volts lower, or around 110 volts, when a heavy load is turned on at the end of a long (50 meter)
wiring run of 14 gauge wire!
Personally, while the NEC does permit one to use 14 gauge wire for wiring of normal
“short” 15 amp household circuits, I prefer to do any primary runs in household wiring with
the thicker 12 gauge wire (and not to use the thinner 14 gauge wire at all if I can help it)
to minimize heat loss in the household wiring. The thicker wire is more expensive, but, as
you can see, with thinner wiring you can easily waste anywhere from 1% to 5% of your
energy bill simply heating the space inside your walls when you run appliances (and then
paying again to air-condition that heat out of your house), month after month over decades!
It’s also safer! Even though the wire doesn’t produce a lot of heat per foot, if it is running
next to other wires in an insulated space, that heat can build up and significantly raise the
temperature! Thicker supply lines run cooler at any given load.
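To put rough numbers on the 14 gauge versus 12 gauge comparison, here is a small sketch. It takes the 0.25 Ohms per 100 feet figure for 14 gauge from the text and, as an assumption of this example, scales it to 12 gauge using the standard American Wire Gauge diameter rule (resistance is inversely proportional to cross-sectional area). The names `awg_area_ratio` and `run_loss` are illustrative only.

```python
def awg_area_ratio(thick_gauge, thin_gauge):
    """Cross-section ratio A_thick / A_thin from the AWG diameter rule.

    AWG diameters follow d_n = 0.005 in * 92 ** ((36 - n) / 39), so areas
    scale as 92 ** (2 * (thin - thick) / 39).
    """
    return 92.0 ** (2.0 * (thin_gauge - thick_gauge) / 39.0)


def run_loss(resistance_ohm, current_a, supply_v=120.0):
    """Return (watts wasted in the wire, percent voltage drop) for one run."""
    watts = current_a ** 2 * resistance_ohm
    drop_percent = 100.0 * current_a * resistance_ohm / supply_v
    return watts, drop_percent


R14_OHM = 0.25                              # 100 ft of 14 gauge, from the text
R12_OHM = R14_OHM / awg_area_ratio(12, 14)  # same length of 12 gauge (R ~ 1/A)

if __name__ == "__main__":
    for name, r_ohm in (("14 gauge", R14_OHM), ("12 gauge", R12_OHM)):
        watts, drop = run_loss(r_ohm, 15.0)
        print(f"{name}: {r_ohm:.3f} ohm, {watts:.1f} W wasted, {drop:.1f}% voltage drop")
```

The thicker wire has a bit less than two thirds of the resistance, so at any given current it wastes a bit less than two thirds of the power.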
All of this – and the National Electrical Code itself – will make sense when you work out
the algebra for yourself and apply the physics we have learned in this text so far, although
there is a lot more to it than we have time to cover here – one can take entire courses
in electronics. One of the homework problems has you do this very thing – explore the
electrical distribution system quantitatively for an example mini-system. Be sure that you
work through it, with the help of your instructor as necessary.
So much for the generation and efficient transmission of power, which we can see relies
very much on AC currents and generators. Next we move on to the use of alternating
voltages of much higher frequency, frequencies that we can associate with radio waves
and information processing. The electrical circuits that allow us to generate, transmit,
receive, encode and decode information in alternating flows of current are very nearly
as important to modern society as the direct delivery of electrical power in the first place.
They are also useful in the laboratory, and are key components of much medical apparatus,
information technology apparatus, entertainment apparatus – they are ubiquitous, in other
words. We begin by seeing how simple arrangements of capacitances and inductances can
oscillate in a way that is mathematically identical to the way a mass on a spring oscillates.
To make this section as simple as possible, we begin by noting that in the context of Kirchhoff’s
rules and electrical circuits, a capacitor plays precisely the same role as a spring
does in mechanics – it stores electrical charge and energy with a restoring “force” proportional
to the charge. A resistance behaves exactly like a linear drag force does on the
mechanical movement of the stored charge. An inductance behaves exactly like a mass
does in a spring-driven harmonic oscillator, as a reservoir for the “kinetic” energy associated
with flowing charge and the “momentum” that causes that charge to tend to continue
flowing unless acted on by opposing forces. Finally, a harmonically alternating voltage
behaves exactly like a harmonically alternating driving force in the damped, driven harmonic
oscillator.
One can also build a circuit made entirely out of water-filled pipes that precisely mimics
an electrical circuit. A section of the pipe containing a spring loaded piston that can store
water on one side against the pressure difference maintained by the spring is a “capaci-
tor”. A sand-filled pipe that resists the flow of water is a “resistor”. The water itself, which is
massive and hence continues to flow in the (frictionless) pipe until slowed down by resis-
tances or pressure differences is an “inductor”. Finally, a pump that creates a harmonically
oscillating pressure difference in the water, e.g. a harmonically driven piston in a pipe, is
just like an “alternating voltage”.
Keep this in mind as we develop the following. Even though of course the algebra will
be specific to the particular circuits being studied, the results will be analogous to identical
results that arise from solving identical equations in other contexts you have already explored
in mechanics. This conceptual repetition can help you learn the material more easily,
and help you remember it for longer without additional reinforcement, provided (of course)
that you properly studied harmonic oscillators the first time you encountered them.
[Figure 9.10: circuit diagram with switch S, capacitor C carrying charge +Q, and inductor L]
In figure 9.10, the capacitor C on the left is initially charged up to charge Q0 . At time
t = 0 the switch is closed and current begins to flow. If we apply Kirchhoff’s voltage/loop
rule to the circuit, we get:
$$\frac{Q}{C} - L\frac{dI}{dt} = 0 \qquad (9.69)$$
where
$$I = -\frac{dQ}{dt} \qquad (9.70)$$
If we substitute this relation in for the I’s and divide by L, we get the following second
order, linear, homogeneous ordinary differential equation:
$$\frac{d^2Q}{dt^2} + \frac{Q}{LC} = 0 \qquad (9.71)$$
We recognize this as the differential equation for a harmonic oscillator! To solve it, we
“guess”125 :
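$$Q(t) = Q_0 e^{\alpha t} \qquad (9.72)$$
and substitute this into the ODE to get the characteristic equation:
$$\alpha^2 + \frac{1}{LC} = 0 \qquad (9.73)$$
We solve for:
$$\alpha = \pm i\sqrt{\frac{1}{LC}} = \pm i\omega_0 \qquad (9.74)$$
and get:
$$Q(t) = Q_{0+} e^{+i\omega_0 t} + Q_{0-} e^{-i\omega_0 t} \qquad (9.75)$$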
or (taking the real part and using the initial conditions):
$$Q(t) = Q_0 \cos(\omega_0 t) \qquad (9.76)$$
Note well that this overall solution methodology is identical to that used for the simple
harmonic oscillator, with spring constant $k_{\rm eff} = 1/C$ and mass $m = L$.
One can, of course, analyze energy in this circuit. At any instant of time, the energy
stored in the capacitor is:
$$U_C(t) = \frac{Q(t)^2}{2C} \qquad (9.77)$$
Over time, the energy oscillates between the capacitor and the inductor, whose stored energy is:
$$U_L(t) = \frac{1}{2}LI(t)^2 \qquad (9.78)$$
Show that the sum of these two energies is a constant, and that the constant equals the
initial energy in the capacitor! This is precisely analogous to what happens to the con-
served total energy as it oscillates between potential energy in a spring and kinetic energy
of motion of the mass in a harmonic oscillator.
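Before doing the algebra, it can be reassuring to see the result numerically. The sketch below assumes the standard initial conditions (charge Q0 on the capacitor, zero current at t = 0), so that Q(t) = Q0 cos(ω0 t), and evaluates equations 9.77 and 9.78 at several times. The component values and the name `lc_energies` are arbitrary choices for illustration.

```python
import math


def lc_energies(t, q0, inductance, capacitance):
    """Return (U_C, U_L) at time t for Q(t) = Q0 cos(w0 t)."""
    w0 = 1.0 / math.sqrt(inductance * capacitance)
    charge = q0 * math.cos(w0 * t)
    current = q0 * w0 * math.sin(w0 * t)  # I = -dQ/dt
    u_c = charge ** 2 / (2.0 * capacitance)
    u_l = 0.5 * inductance * current ** 2
    return u_c, u_l


if __name__ == "__main__":
    Q0, L, C = 1.0e-6, 1.0e-3, 1.0e-6  # assumed illustrative values (SI units)
    period = 2.0 * math.pi * math.sqrt(L * C)
    for k in range(5):
        u_c, u_l = lc_energies(k * period / 8.0, Q0, L, C)
        print(f"t = {k}/8 T0: U_C = {u_c:.3e} J, U_L = {u_l:.3e} J, sum = {u_c + u_l:.3e} J")
```

The individual energies trade off against one another while their sum stays fixed at Q0²/2C, just as kinetic and potential energy do for the mass on the spring.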
In figure 9.11, the capacitor C on the left is initially charged up to charge Q0 . At time t = 0
the switch is closed and current begins to flow. If we apply Kirchhoff’s voltage/loop rule to
the circuit, we get:
$$\frac{Q}{C} - L\frac{dI}{dt} - IR = 0 \qquad (9.79)$$
where
$$I = -\frac{dQ}{dt} \qquad (9.80)$$
If we substitute this relation in for the I’s and divide by L, we get the following second
order, linear, homogeneous ordinary differential equation:
$$\frac{d^2Q}{dt^2} + \frac{R}{L}\frac{dQ}{dt} + \frac{Q}{LC} = 0 \qquad (9.81)$$
125
Not really.
Figure 9.11: Undriven LRC circuit (switch S, capacitor C carrying charge +Q, inductor L, and resistor R)
We recognize this as the differential equation for a damped harmonic oscillator. To solve
it, we “guess”126 :
$$Q(t) = \hat{Q} e^{\alpha t} \qquad (9.82)$$
where $\hat{Q}$ is some unknown (possibly complex) constant, and substitute this into the ODE
to get the characteristic equation:
$$\alpha^2 + \frac{R}{L}\alpha + \frac{1}{LC} = 0 \qquad (9.83)$$
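We solve for:
$$\begin{aligned}
\alpha &= \frac{-\frac{R}{L} \pm \sqrt{\left(\frac{R}{L}\right)^2 - \frac{4}{LC}}}{2} \\
&= -\frac{R}{2L} \pm i\omega_0\sqrt{1 - \frac{R^2C}{4L}} \\
&= -\frac{R}{2L} \pm i\omega_0\sqrt{1 - \frac{\tau_C}{4\tau_L}} \\
&= -\frac{R}{2L} \pm i\omega'
\end{aligned} \qquad (9.84)$$
where $\tau_L = L/R$, $\tau_C = RC$, and $\omega' = \omega_0\sqrt{1 - \frac{\tau_C}{4\tau_L}}$. As before (in both the previous section and
in mechanics) we have two complex exponential solutions so we might as well let the
unknown constant (for each) be complex as well:
$$\hat{Q}_\pm = |\hat{Q}_\pm| e^{i\phi_\pm}$$
Then e.g.
$$Q_+(t) = |\hat{Q}_+| e^{i\phi_+} e^{-\frac{Rt}{2L}} e^{i\omega' t} = Q_+ e^{-\frac{Rt}{2L}} e^{i(\omega' t + \phi_+)} \qquad (9.85)$$
where $Q_+$ is the real amplitude of $\hat{Q}_+$ (and a very similar equation for the $-i\omega'$ solution).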
Of course, we don’t know what complex charge Q or imaginary charge Q could possibly
be, so we must take the real part of this as our final solution:
$$Q(t) = \Re e\left\{ Q_+ e^{-\frac{Rt}{2L}} e^{i(\omega' t + \phi_+)} \right\} = Q_+ e^{-\frac{Rt}{2L}} \cos(\omega' t + \phi_+) \qquad (9.86)$$
where $Q_+$ and $\phi_+$ are both real constants of integration that must be set from the initial
conditions.
Figure 9.12: A weakly damped series LRC-circuit oscillates within an exponential damping
envelope. R/2L = 0.15 and T0 ≈ T′ = 1 in the time scale of this figure. (Plot of Q versus t for t from 0 to 10, with the exponential envelope marked.)
For the specific “standard” initial conditions given above, Q+ = Q0 , φ+ = 0 and we get:
$$Q(t) = Q_0 e^{-\frac{Rt}{2L}} \cos(\omega' t) \qquad (9.87)$$
This solution is plotted in figure 9.12 for the underdamped case (see below), where
$$\frac{RT_0}{2L} = 0.15 \approx \frac{1}{7} < 2\pi$$
so that the exponential damping time of the amplitude is roughly seven times the period
T0 = 1 on the scale displayed. Note that this is small enough that little deviation is observed
between the damped and undamped solution over ten cycles of oscillation – T ′ ≈ T0 .
Note that if we used Q̂− and −iω ′ to develop a real solution, its real part would have
exactly the same form and hence would be identical to this once its constants of integra-
tion were set from the same initial conditions, so we don’t even need to write it down, let
alone keep it (or the general complex solution) around.
From this completely general solution for Q(t), we can easily find the current through
and voltage across all of the elements of the circuit. Given the current and these voltages
it is easy to show that energy is conserved (it could hardly not be, given that we started
with KLR) and that as we chase energy down we will find that the initial energy stored in
the capacitor exactly balances the energy consumed in the resistor as t → ∞. This is left
as an exercise – the more of this that you work out on your own (rederiving things at least
once as part of the process) the more you will learn.
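A numerical spot check of this energy bookkeeping is sketched below. It assumes the solution of equation 9.87 with the figure 9.12 parameters (R/2L = 0.15 and T0 = 1 in arbitrary units), forms I = −dQ/dt analytically, and compares the energy burned in the resistor (a simple trapezoid-rule integral of I²R) with the drop in stored energy. Note that the form in equation 9.87 carries a small initial current RQ0/2L, so the sketch counts the inductor energy at t = 0 as well; all function names are illustrative.

```python
import math


def lrc_charge_and_current(t, q0, R, L, C):
    """Q(t) = Q0 exp(-Rt/2L) cos(w' t) and I(t) = -dQ/dt (underdamped case only)."""
    gamma = R / (2.0 * L)
    w0 = 1.0 / math.sqrt(L * C)
    w_shift = math.sqrt(w0 ** 2 - gamma ** 2)  # w' = w0 sqrt(1 - R^2 C / 4L)
    envelope = q0 * math.exp(-gamma * t)
    charge = envelope * math.cos(w_shift * t)
    current = envelope * (gamma * math.cos(w_shift * t) + w_shift * math.sin(w_shift * t))
    return charge, current


def stored_energy(charge, current, L, C):
    """Field energy in the capacitor plus field energy in the inductor."""
    return charge ** 2 / (2.0 * C) + 0.5 * L * current ** 2


def dissipated_energy(t_end, steps, q0, R, L, C):
    """Trapezoid-rule estimate of the integral of I^2 R from 0 to t_end."""
    dt = t_end / steps
    total = 0.0
    for k in range(steps + 1):
        _, current = lrc_charge_and_current(k * dt, q0, R, L, C)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * current ** 2 * R * dt
    return total


if __name__ == "__main__":
    Q0, R, L = 1.0, 0.3, 1.0
    C = 1.0 / (4.0 * math.pi ** 2)  # chosen so that T0 = 2 pi sqrt(LC) = 1
    e_start = stored_energy(*lrc_charge_and_current(0.0, Q0, R, L, C), L, C)
    e_end = stored_energy(*lrc_charge_and_current(10.0, Q0, R, L, C), L, C)
    burned = dissipated_energy(10.0, 20000, Q0, R, L, C)
    print(f"stored energy lost: {e_start - e_end:.5f}, burned in R: {burned:.5f}")
```

Extending the end time makes the stored energy go to zero, so that all of the initial energy ends up as heat in the resistor.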
126
Not really.
Clearly the analogy with ordinary simple harmonic oscillators with linear damping is
beyond strong – it is algebraically exact. We therefore must expect that series LRC cir-
cuits will also exhibit underdamped oscillation, critically damped exponential decay, and
overdamped, even slower exponential decay (just as the mass on the spring did), and that
one can drive the circuit in resonance exactly the same way as well.
Let’s look briefly at these limits:
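Underdamped Oscillation – $\frac{R^2C}{4L} < 1$: As always, when the system is underdamped
it will oscillate within a decaying amplitude envelope as illustrated above.
It is worth looking at this condition a bit more closely:
$$\frac{R^2C}{4L} < 1 \;\Rightarrow\; \frac{R^2}{4L^2} \times LC < 1 \;\Leftrightarrow\; \frac{R}{2L} < \omega_0 = \frac{2\pi}{T_0} \quad\text{or}\quad \frac{RT_0}{2L} < 2\pi$$
We will consider weak damping to be the specific limit where:
$$\frac{R}{2L} \ll \frac{2\pi}{T_0} \quad\text{or}\quad \frac{RT_0}{2L} \ll 2\pi \qquad (9.88)$$
In this limit, $\frac{R^2C}{4L} \ll 1$ and:
$$\omega' \approx \omega_0, \qquad T' \approx T_0 \qquad (9.89)$$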
(we can basically ignore the frequency shift to lowest order). We will now express the
other two (less important) conditions in terms of ω0 and the
Critically Damped Oscillation – $\frac{RT_0}{2L} = 2\pi$: As usual, when the LRC circuit is
critically damped, the charge on the capacitor exponentially approaches zero at the
maximum rate (but because it is an exponential, it never quite gets there).
Overdamped Oscillation – $\frac{RT_0}{2L} > 2\pi$: Also as usual, if the LRC circuit is overdamped,
it approaches zero like a mix of exponential decays, but the slowest of them (which
dominates at long times) is slower than the critically damped approach.
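The three cases can be sorted mechanically from R, L, and C. A minimal sketch, using the dimensionless ratio R²C/4L from above and (equivalently) RT0/2L compared to 2π; the function names and the tolerance used to recognize "exactly critical" are assumptions of this example:

```python
import math


def damping_ratio(R, L, C):
    """Return R^2 C / 4L, which is less than, equal to, or greater than 1."""
    return R * R * C / (4.0 * L)


def rt0_over_2l(R, L, C):
    """Return R T0 / 2L with T0 = 2 pi sqrt(LC); compare this to 2 pi."""
    return R * 2.0 * math.pi * math.sqrt(L * C) / (2.0 * L)


def damping_regime(R, L, C, rel_tol=1e-9):
    """Classify a passive series LRC circuit by its damping."""
    ratio = damping_ratio(R, L, C)
    if math.isclose(ratio, 1.0, rel_tol=rel_tol):
        return "critically damped"
    return "underdamped" if ratio < 1.0 else "overdamped"


if __name__ == "__main__":
    for R in (0.3, 2.0, 10.0):  # with L = C = 1, critical damping is R = 2
        print(R, damping_regime(R, 1.0, 1.0), rt0_over_2l(R, 1.0, 1.0) / (2.0 * math.pi))
```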
In a short while we will tackle the daunting task of damped driven, active AC circuits, including
the series LRC circuit driven by an inline AC voltage such as V0 cos(ωt). There we
will find that we can best understand the power delivered to the circuit and used to “drive a
resistive load” – transmit average power from the voltage source to the resistor, basically,
where it turns up as heat – in terms of the so-called quality factor or Q-factor127 we studied
in the context of damped undriven simple harmonic oscillators in the first (mechanics)
half of this introductory series. Let’s work this out for the passive series LRC circuit we just
worked out above, as even in this simple case, Q is a useful parameter for describing the
damping of a circuit.
127
Note well that I’m trying to use a different font for Q than I do for the capacitor charge Q because in this
context they otherwise would both use exactly the same standard symbol, which could be quite confusing in
the algebra!
In mechanics, the Q-factor was defined to be 2π times the reciprocal of the fractional
energy loss in a single period of oscillation:
$$Q = 2\pi\frac{E(0)}{E(0) - E(T')} = 2\pi\frac{E}{\Delta E} \qquad (9.90)$$
where $T' = 2\pi/\omega'$ is the shifted period of the damped oscillator. We will evaluate this in
terms of the known parameters of our circuit in exactly the same way we did for a linearly
damped mass on a spring, only now the energy stored in the LRC circuit is the sum of the
field energies in the capacitor and the inductor – resistors don’t store energy – instead of
in the total potential plus kinetic energy of an oscillating mass.
Evaluating Q from arbitrary initial conditions would be tedious because we’d have to
sum the stored energy in both the capacitor and the inductor and the current in the series
LRC circuit is a bit messy128 . If, however, we start the LRC oscillator with zero current
and charge Q0 on the capacitor at t = 0 (the initial conditions we used in the previous
section) the algebra is easy. In this case (which, don’t worry, gives the exactly correct general
result averaged over many cycles due to the wonderful properties of the exponential
function) the amplitude decays as $e^{-Rt/2L}$, so the stored energy decays as $e^{-Rt/L}$ and:
$$Q = 2\pi\frac{E(0)}{E(0) - E(T')} = \frac{2\pi}{1 - e^{-RT'/L}}$$
The damping is weak when:
$$\frac{RT'}{L} = \frac{2\pi R}{\omega' L} \ll 1$$
in which case it is certainly true that
$$\frac{RT_0}{2L} \ll 2\pi$$
because $T_0/2 \le T'$ and $2\pi > 1$.
128
We would have to use the product rule to take the time derivative of Q(t) to form I(t), and then we have to
square this two-term result to make $\frac{1}{2}LI^2(t)$, which would then have three time-dependent terms with mixed
trig function cross-terms and two distinct exponentials in them – Yuk!
In the weak damping case it is clear that we can use a
Taylor series expansion of the exponential in Q129 :
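$$e^{-RT'/L} = 1 - \frac{RT'}{L} + \frac{1}{2}\left(\frac{RT'}{L}\right)^2 - \ldots \approx 1 - \frac{2\pi R}{\omega' L}$$
Substituting in this last form, cancelling the 1’s and keeping only the leading order
surviving term, we get:
$$Q \approx \frac{2\pi}{1 - \left(1 - \frac{2\pi R}{\omega' L}\right)} = \frac{2\pi}{\frac{2\pi R}{\omega' L}}$$
or (rearranging):
$$Q = \frac{L\omega'}{R} \qquad (9.95)$$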
Let’s apply the “weak damping” condition we assumed just above to simplify this even
further. To justify keeping only a single term in our expansion of $e^{-RT'/L}$ we assumed that
$RT'/L \ll 1$. In this case it is certain that $RT_0/2L < RT'/L \ll 1 < 2\pi$ is also true, which
means in turn that:
$$\omega' = \omega_0\left(1 - \frac{R^2C}{4L}\right)^{1/2} \approx \omega_0\left(1 - \frac{R^2C}{8L} + \ldots\right) \approx \omega_0$$
so that to lowest order:
$$Q \approx \frac{L\omega_0}{R} = \sqrt{\frac{L}{R^2C}} \qquad (9.96)$$
where the latter form makes it easy to see how Q depends on the three parameters R, C,
and L.
Our last chore, then, is to determine where and why our “simple” derivation above might
break down. Let’s see what our equations and definitions tell us when we examine the case
of critical damping.
The condition for critical damping is:
$$\frac{R}{2L} = \omega_0 \;\Leftrightarrow\; \frac{RT_0}{2L} = 2\pi$$
as indicated above. If we simply plug this into the “convenient and customary” formula
given above we get:
$$Q = \frac{2\pi L}{RT_0} = \frac{\pi}{2\pi} = \frac{1}{2} \qquad (9.97)$$
129
Remember this step – it means that our actual numerical computations (really estimations, here) of Q are
probably not terribly precise for “low” Q where the damping is not weak. We will work out an estimate for the
lowest useful value of Q for a passive (undriven) oscillator below when the damping is not particularly weak.
130
Wink, wink, nod, nod. Y’know what I mean? Y’know what I mean?
But this is impossible! The energy loss of the oscillator in a single period cannot exceed
the initial total energy in the oscillator, especially for a passive LRC circuit with no source
of energy capable, even in principle, of adding energy to the circuit. Our approximate
form clearly breaks down long before the critical damping boundary.
Let’s compare this to the actual definition of Q at this boundary. For a critically damped
system, ω′ = 0 (or T′ → ∞) even if ω0 and T0 are finite and well-behaved:
$$Q_{\min} = 2\pi\frac{E(0)}{E(0) - E(\infty)} = 2\pi \quad (!)$$
That is, this value of Q tells us that the system loses all of its energy in one “period”
of the (not actually oscillating) oscillator! This tells us that:
$$Q_{\min} = 2\pi \qquad (9.98)$$
is a hard limit on Q. Any quality factor less than 2π will be meaningless for a passive
oscillator, especially if it is only estimated with the C&C form using ω0 .
This doesn’t quite tell us when we might be able to use Q as a reasonable estimate of
the weak damping of the oscillator. If we pick R, L, C such that:
$$\frac{RT'}{L} = 1$$
then we can evaluate Q exactly using the formula above without the expansion:
$$Q = 2\pi\frac{1}{1 - \frac{1}{e}} = 2\pi\frac{e}{e - 1} = 9.94 \approx 10$$

n    Q_n      % relative error
0    ∞        (undefined)
1    6.28     37
2    12.57    26
3    9.42     5.2
4    10.1     1.1
5    9.92     0.2
6    9.94     0.0

Table 6: Table indicating convergence of the expansion of Q for RT′/L = 1
In table 6, we can see that in first order (n = 1 – the first surviving term in the expansion
of the exponential, basically, the one that corresponds exactly to our approximate result
Q = 2πL/RT′ above) the relative error is around 37%. Instead of Q ≈ 10, it yields
Q = 2π = 6.28. This is really surprisingly good in one sense (as the argument of the
exponential is hardly “much much less than one”), but it is non-physical as Q = 2π = Qmin,
as we just saw, corresponds to critically damped non-oscillation.
Using the same code, I can easily generate a different result: The relative error for
various values of RT ′ /L coming down from 1 in powers of 10. This isn’t worth making into
a table, but when RT ′ /L = 0.1, Q = 66.0 (correct/exact to the given precision) the estimate
for it is 2π/0.1 = 62.8, a relative error of around 5%. By the time it is 0.01, Q = 631, the
estimate is 2π/0.01 = 628, and the relative error is down to around 0.5%.
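For readers who want to reproduce these numbers, here is one way to do it. This is a sketch, not the code used to build table 6; it assumes only the exact form Q = 2π/(1 − e^{−x}) with x = RT′/L and the Taylor series of the exponential truncated at order n. The function names are made up for the example.

```python
import math


def q_exact(x):
    """Exact Q = 2 pi / (1 - exp(-x)) with x = R T' / L."""
    return 2.0 * math.pi / (1.0 - math.exp(-x))


def q_truncated(x, order):
    """Q with 1 - exp(-x) replaced by its Taylor series through x**order."""
    if order < 1:
        raise ValueError("order 0 leaves nothing in the denominator")
    denom = sum((-1) ** (k + 1) * x ** k / math.factorial(k)
                for k in range(1, order + 1))
    return 2.0 * math.pi / denom


def relative_error_percent(x, order):
    """Percent difference between the truncated and exact values of Q."""
    exact = q_exact(x)
    return 100.0 * abs(q_truncated(x, order) - exact) / exact


if __name__ == "__main__":
    for n in range(1, 7):  # rows n = 1..6 of table 6
        print(n, round(q_truncated(1.0, n), 2), round(relative_error_percent(1.0, n), 1))
    for x in (1.0, 0.1, 0.01):  # first-order estimate as the damping weakens
        print(x, round(q_exact(x), 1), round(relative_error_percent(x, 1), 2))
```

To first order the relative error is roughly x/2, which is why it falls by a factor of ten each time RT′/L does.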
What can we learn from this (and further computations, not shown)? Primarily that if
we use the first-order approximated form:
$$Q = 2\pi\frac{1}{1 - e^{-RT'/L}} \approx \frac{L\omega'}{R}$$
to calculate Q, it’s going to give us errors on the order of 10% or more if Q itself is smaller
than around 10π. To get the errors down to 1%, Q itself will have to be perhaps 100π = 314
or more.
To conclude, one should not use any first order approximate form for any Q less than
10, as the first order approximate result (2π) is already on the edge of being nonphysical
when the true Q = 10. For any RT′/L > 1, the resulting first-order Q will be less than 2π
and hence completely nonphysical, regardless of the relative numerical error, and using
the form with T0 instead of the correctly shifted T′ will only make it worse. The simpler,
easier to compute form $Q \approx \frac{L\omega_0}{R} = \sqrt{\frac{L}{R^2C}}$ (sort of an approximation of an approximation)
will produce relative errors that are a bit larger, but by the time Q is big enough to work
decently for the first order ω′ or T′ forms (say, 10% relative error at Q ∼ 30), it is probably
ok to use the first order ω0 or T0 forms instead to get errors of the same general order a lot
faster.
Having carefully developed this excellent advice, we’ll shortly ignore it – sort of – when
we consider the average power for an active AC circuit – the driven series LRC circuit. In
that context there is a source of power doing work on the system, and Q will retain a bit of
utility all the way down to perhaps 2 or 3, but with a slightly different physical interpretation.
To go on, we need to introduce a classical harmonic oscillating voltage like that produced
by an AC generator and use it to make an active AC circuit, one driven by an alternating
voltage. Our first step is to determine what the relationship is between voltage (provided
by the generator) across each circuit element, one at a time, and the current through
that circuit element as a function of time. We begin with the resistor, as the easiest to
understand and as a model for the other two.
[Figure 9.13: an alternating voltage V(t) across a resistance R, driving current I(t)]
Consider the circuit diagram in figure 9.13, portraying a harmonic alternating voltage 131
placed across a resistance R. Applying Kirchhoff’s voltage/loop rule and Ohm’s Law to the
circuit loop, we get the following equation of motion for the circuit:
$$V_0\cos(\omega t) - IR = 0 \qquad (9.101)$$
or
$$I_R(t) = \frac{V_0}{R}\cos(\omega t) \qquad (9.102)$$
and we see that the current is in phase with the voltage drop across a resistor.
131
We could obviously use V0 sin(ωt) or V0 cos(ωt + δ) for an arbitrary δ – we choose this particular form to
make it the real part of the complex form $V_0 e^{i\omega t}$ and thereby enable the treatment of the complex solution
below.
[Figure 9.14: an alternating voltage V(t) across a capacitance C, driving current I(t)]
Proceeding the exact same way, we use Kirchhoff’s voltage rule and the definition of
capacitance to get an equation of motion:
$$V_0\cos(\omega t) - \frac{Q}{C} = 0 \qquad (9.103)$$
We wish to find $I_C(t)$, the current through the capacitor. To get it, first we solve for Q(t):
$$Q(t) = CV_0\cos(\omega t) \qquad (9.104)$$
and then we differentiate (and use the trigonometric identity − sin(θ) = cos(θ + π/2) to
express the result in terms of the original harmonic function and a phase):
$$I_C(t) = \frac{dQ(t)}{dt} = -(\omega C)V_0\sin(\omega t) = (\omega C)V_0\cos(\omega t + \pi/2) \qquad (9.105)$$
We observe in this result a quantity that behaves like the “resistance” of the capacitor
in an AC circuit, regulating the magnitude of the current as the frequency changes much
the way a resistor would as it increases or decreases. We will give this quantity its own
name – the capacitive reactance of the capacitor at the angular frequency ω – and define
it to be:
$$\chi_C = \frac{1}{\omega C} \qquad (9.106)$$
Note well that the units of χC are ohms.
Using the capacitive reactance, the peak current in the circuit takes on a more familiar
form:
$$I_0 = (\omega C)V_0 = \frac{V_0}{\chi_C} \qquad (9.107)$$
so that
$$I_C(t) = I_0\cos(\omega t + \pi/2) \qquad (9.108)$$
We see that the current is π/2 ahead in phase of the voltage drop across the capacitor.
We will actually usually use this in series AC circuits with capacitors the other way around
and note that the voltage drop across the capacitor is π/2 behind the current through it.
[Figure 9.15: an alternating voltage V(t) across an inductance L, driving current I(t)]
We repeat this process one more time for an inductance L. The methodology is basically
the same: We use Kirchhoff’s voltage rule and the definition of inductance to get:
$$V_0\cos(\omega t) - L\frac{dI}{dt} = 0 \qquad (9.109)$$
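The steps from here mirror the capacitor case, with an integration in place of the differentiation. A sketch of them, assuming that the constant of integration (a steady DC current superimposed on the oscillation) is zero:

```latex
\begin{align*}
\frac{dI}{dt} &= \frac{V_0}{L}\cos(\omega t) \\
I_L(t) &= \frac{V_0}{\omega L}\sin(\omega t)
        = \frac{V_0}{\omega L}\cos(\omega t - \pi/2) \\
I_0 &= \frac{V_0}{\omega L}
\end{align*}
```

Here the identity sin(θ) = cos(θ − π/2) plays the role that − sin(θ) = cos(θ + π/2) did for the capacitor. The quantity ωL regulates the peak current the way a resistance would, and it is this quantity that is called the inductive reactance of the inductor at the angular frequency ω: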
$$\chi_L = \omega L \qquad (9.113)$$
We see that the current is π/2 behind in phase of the voltage drop across the inductor.
As before, in series circuits we will actually use this the other way around, considering the
voltage drop across an inductor in a series circuit as being π/2 ahead of the (otherwise
specified) current through it. Let’s see how this works.
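As a quick numerical illustration of the two reactances and their opposite phase shifts, consider the sketch below. The component values (10 μF, 10 mH, a 170 V peak source at 60 Hz) and the function names are assumptions chosen only for the example.

```python
import math


def capacitive_reactance(omega, capacitance):
    """chi_C = 1 / (omega C), in ohms."""
    return 1.0 / (omega * capacitance)


def inductive_reactance(omega, inductance):
    """chi_L = omega L, in ohms."""
    return omega * inductance


def capacitor_current(t, v0, omega, capacitance):
    """I_C(t) = (V0 / chi_C) cos(wt + pi/2): the current leads the voltage."""
    return v0 / capacitive_reactance(omega, capacitance) * math.cos(omega * t + math.pi / 2)


def inductor_current(t, v0, omega, inductance):
    """I_L(t) = (V0 / chi_L) cos(wt - pi/2): the current lags the voltage."""
    return v0 / inductive_reactance(omega, inductance) * math.cos(omega * t - math.pi / 2)


if __name__ == "__main__":
    C, L = 1.0e-5, 1.0e-2  # assumed illustrative values: 10 uF and 10 mH
    for freq_hz in (60.0, 60.0e3):
        omega = 2.0 * math.pi * freq_hz
        print(f"{freq_hz:>8.0f} Hz: chi_C = {capacitive_reactance(omega, C):8.3f} ohm, "
              f"chi_L = {inductive_reactance(omega, L):8.3f} ohm")
```

Raising the frequency makes the capacitor pass current more easily and the inductor less easily, which is what will eventually let a series LRC circuit select a single frequency.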
Figure 9.16: A LRC (tank) circuit. (Source V0 cos(ωt) driving L, R, and C in series, with current I(t).)
In figure 9.16 above we see a series LRC circuit, also known for obscure reasons as a
“tank circuit”132 . To analyze it, we apply Kirchhoff’s voltage/loop rule to the circuit and get
the equation of motion:
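$$V_0\cos(\omega t) - L\frac{dI}{dt} - RI - \frac{Q}{C} = 0 \qquad (9.116)$$
or
$$V_L + V_R + V_C = V_0\cos(\omega t) \qquad (9.117)$$
or
$$\frac{d^2Q}{dt^2} + \frac{R}{L}\frac{dQ}{dt} + \frac{1}{LC}Q = \frac{V_0}{L}\cos(\omega t) \qquad (9.118)$$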
There are a number of ways to solve this second order, linear, inhomogeneous ordinary
differential equation. We will first show a simple one that relies on a “guess”, then we will
show how if we use complex exponentials we really don’t have to guess.
In fact, if we use complex exponentials for everything associated with electrical circuits
and harmonic oscillation we don’t really need to guess, but we do need to know more math
than a student of introductory physics might initially know. Wise students will view this as
an open invitation to learn more math to make the physics easier.
Our goal will be to solve for all voltage drops across circuit elements, the current in the
circuit, the power delivered to each circuit element and the entire circuit as a whole – pretty
much everything – as functions of time.
The first thing to note is that if we find at least one “particular” solution Qp(t) to the inhomogeneous
ODE, we can construct a new solution by adding any solution to the homogeneous
ODE (the undriven LRC circuit solved above) and still get a solution. That is, a
general solution can be written as:
$$Q(t) = Q_p(t) + Q_h(t) \qquad (9.119)$$
132
Apparently its mathematical description resembles in some way the way fluids pulsed through a tank can
resonate according to the tank dimensions. I include the term – once – just in case a student hears the term
elsewhere and wonders what the hell it refers to.
Note that the solution Qh (t) to the homogeneous ODE (equation 9.86) decays in time
exponentially. It is a transient contribution to the overall solution and after many lifetimes
τL = L/R it will generally be negligible. Nevertheless, it plays an important role in the
overall solution – its presence allows us to solve arbitrary initial value problems by using
the usual variables (amplitude and phase) in the undriven solution!
Now that we know why it is there and what it is good for, we’ll ignore it – we are far
more interested in what happens after the initial transient that depends in detail on initial
conditions has died away.
The (remaining) particular solution Qp (t) is therefore also called the steady state part
of the solution, and it persists indefinitely, as long as the driving voltage remains turned
on. Based on what we derived above for the individual circuit elements being driven by
a harmonic voltage, we expect that the time dependence of the steady state solution for
a circuit with all of them together in series will also be harmonic at the frequency of the
applied voltage. However, there is no particular reason to expect the charge Q(t) to be
in phase with the applied voltage as two out of the three circuit elements are not, in quite
different ways.
We will find it slightly more convenient in our first treatment to work with the current
I rather than the charge Q – This gives us VR (the voltage across R) directly when we
are done, and we can always find Q(t) (or VC ) by integration and VL by differentiation. In
the next section, when we go through the complex solution it won’t matter and we’ll work
directly with Q(t).
We therefore make the informed guess:
$$I(t) = I_0\cos(\omega t - \delta) \qquad (9.120)$$
If we substitute this one guess into Kirchhoff’s Loop Rule, then solving the problem is
easy. We begin by noting the voltage drops across all three circuit elements in terms of
I(t) (where we use the rules for the individual elements we derived above backwards as
we are given the current and seek the voltage):
$$V_R = I_0R\cos(\omega t - \delta) \qquad (9.121)$$
$$V_L = I_0\chi_L\cos(\omega t - \delta + \pi/2) \qquad (9.122)$$
$$V_C = I_0\chi_C\cos(\omega t - \delta - \pi/2) \qquad (9.123)$$
These voltage drops must add up to the applied voltage (equation 9.117):
$$I_0R\cos(\omega t - \delta) + I_0\chi_L\cos(\omega t - \delta + \pi/2) + I_0\chi_C\cos(\omega t - \delta - \pi/2) = V_0\cos(\omega t) \qquad (9.124)$$
Our goal, then, is to find values of I0 and δ (labeled φ in the figures) for which this equation is true. To accomplish
this very easily, we’ll use a phasor diagram to add the trig functions as if they are the x-components
of phasors of the appropriate lengths, graphically: The x-components of the
phasors on the diagram that are proportional to I0 must add up to produce V0 cos(ωt),
and this must be true if we add up the phasors with precisely the geometry shown in
figure 9.17, taking advantage of our knowledge of the phase of the voltage drop across the
various elements relative to the current through those elements.
[Figure 9.17: phasor diagram showing the phasors I0R, I0χL, I0χC and V0, the phase angle φ, the rotation angle ωt, and V0 cos(ωt) as the x-component of V0]
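If we let V0 = I0 Z where Z is called the impedance of the circuit, we can cancel the I0
and get the following triangle for the impedance:
[Figure: impedance triangle with hypotenuse Z, legs R and χL − χC, and angle φ]
From this triangle we can easily see that:
$$Z = \sqrt{R^2 + (\chi_L - \chi_C)^2} \qquad (9.125)$$
so that
$$I_0 = \frac{V_0}{Z} \qquad (9.126)$$
and
$$\delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) \qquad (9.127)$$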
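Equations 9.125 through 9.127 are all one needs to compute the steady state response of the circuit. The sketch below evaluates them and then checks, at one arbitrary instant, that the three voltage drops of equations 9.121 to 9.123 really do add up to V0 cos(ωt). The function names and component values are illustrative assumptions.

```python
import math


def series_lrc_response(v0, omega, R, L, C):
    """Return (Z, I0, delta) for a series LRC circuit driven by V0 cos(wt)."""
    chi_l = omega * L
    chi_c = 1.0 / (omega * C)
    impedance = math.hypot(R, chi_l - chi_c)  # sqrt(R^2 + (chi_L - chi_C)^2)
    delta = math.atan2(chi_l - chi_c, R)      # equals atan((chi_L - chi_C)/R) for R > 0
    return impedance, v0 / impedance, delta


def element_voltages(t, i0, delta, omega, R, L, C):
    """Return (V_R, V_L, V_C) at time t for I(t) = I0 cos(wt - delta)."""
    theta = omega * t - delta
    v_r = i0 * R * math.cos(theta)
    v_l = i0 * omega * L * math.cos(theta + math.pi / 2)
    v_c = i0 / (omega * C) * math.cos(theta - math.pi / 2)
    return v_r, v_l, v_c


if __name__ == "__main__":
    V0, R, L, C = 10.0, 50.0, 0.1, 1.0e-6  # assumed illustrative values (SI units)
    omega = 2000.0
    Z, I0, delta = series_lrc_response(V0, omega, R, L, C)
    t = 1.234e-3
    total = sum(element_voltages(t, I0, delta, omega, R, L, C))
    print(f"Z = {Z:.1f} ohm, I0 = {I0:.4f} A, delta = {delta:.3f} rad")
    print(f"sum of drops = {total:.6f} V, applied = {V0 * math.cos(omega * t):.6f} V")
```

At the resonant frequency ω = 1/√(LC) the two reactances cancel, Z reduces to R, and δ vanishes, so the current is as large as it can be and in phase with the applied voltage.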
The discussion above is more a description of the solution of the differential equation for
the series LRC circuit than it is a proper algebraic development of the solution. It requires
multiple “guesses”, and while those guesses are well-motivated and ultimately lead to suc-
cess, one might reasonably wonder whether or not one can solve the differential equation
more generally, or with fewer guesses and assumptions.
The answer is yes, certainly – but... The “but” is that one has to use tools from
functional analysis such as the Fourier Transform, integration by parts, and the Dirac
delta function – things most physics majors or math majors will see sooner or later, but not
necessarily in time for their introductory physics course. Unfortunately, the proper solution
is often never returned to in future classes in physics (or if it is, it is in too general and
difficult a form).
This optional chapter is included for students who want to “look behind the curtain” now,
rather than waiting until later. I’m trying to make it as simple as possible, which basically
means that I’m going to gloss over the math that a student may not have had yet (principally
Fourier transforms and Dirac Delta functions) while at the same time fully exploiting things
like complex exponentials that I have covered in some detail as a shortcut around a lot of
the even more annoying manipulations required to use sin(ωt) and cos(ωt) as harmonic
functions in the context of the second order linear differential equations one encounters in
physics.
We’ll start by assuming that the differential equation we are solving above is the real
part of a more general complex second order differential equation of exactly the same
form. This lets us skip what amounts to pages of math associated with Fourier transforming
a differential equation per se! And, because we are only considering a harmonic AC
driving voltage, it works just as well!
Specifically, we note that:
$$\left(\frac{d^2}{dt^2} + \frac{R}{L}\frac{d}{dt} + \frac{1}{LC}\right) Q_{ss} = \frac{V_0}{L}\cos(\omega t) \tag{9.128}$$
is identical to:
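$$\Re e\left\{\left(\frac{d^2}{dt^2} + \frac{R}{L}\frac{d}{dt} + \frac{1}{LC}\right) \hat{Q}_{ss} = \frac{V_0}{L} e^{i\omega t}\right\} \tag{9.129}$$
With this motivation, we will proceed to solve the second, complex version of the
differential equation:
$$\left(\frac{d^2}{dt^2} + \frac{R}{L}\frac{d}{dt} + \frac{1}{LC}\right) \hat{Q}_{ss} = \frac{V_0}{L} e^{i\omega t} \tag{9.130}$$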
and find a complex solution Q̂ss (t). When we’re done, we’ll just take the real part of the
complex solution and it is guaranteed to solve the first, real version of the differential
equation we got from Kirchhoff’s Loop Rule, because all of the constants that appear in it –
V0 , L, R, C – are real.
The time dependence of Q̂ss on the left hand side of equation 9.130 is now evident
without any fancy mathematics. We know that it must be the same as the right hand side,
because the time derivative(s) of exponential functions of time are proportional to (the
same) exponential functions of time – we can at most multiply this by a complex constant.
We can thus write the solution in a completely general form:
$$\hat{Q}_{ss}(t) = Q e^{i\beta} e^{i\omega t} = Q e^{i(\omega t + \beta)}$$
where Q and the phase angle β of the complex constant are real, but unknown [133].
[133] Note that ω in Q̂ss (t) is the given frequency of the driving force and is not to be
confused with ω′ or ω0 for the undriven passive oscillator! Also note that by custom we will
restrict β to live in the range [0, 2π) to eliminate a pointless infinity of answers modulo 2π.
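We are now in a position to determine the (real) magnitude Q and the steady-state phase
β. The term in the [ ] brackets is Q! The term in the {} brackets has the form z/|z|, so its
complex magnitude is 1. This term clearly is e^{iβ}.
We can identify its two (real and imaginary) parts:
$$\cos(\beta) = \frac{\omega_0^2 - \omega^2}{\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}} \qquad \sin(\beta) = \frac{-\frac{\omega R}{L}}{\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}} \tag{9.137}$$
where:
$$Q = \frac{V_0}{L\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}} \quad \text{and} \quad \beta = \tan^{-1}\left(\frac{-R\omega}{L(\omega_0^2 - \omega^2)}\right)$$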
This one time, I left the transient/homogeneous part in so you can see the two constants
of integration Q0 and φ – you can set these from the initial conditions, if any, if you need
the solution right from t = 0.
Once enough time has passed for the transient to have decayed away, however, we are
left only with the steady state part, which has no free parameters. Q and β are completely
determined from the values of V0 , L, R, C, and ω, all given.
The one thing we probably should do before moving on to power is to show how this
solution (which I deliberately obtained in a different form, one that hides the explicit
location/role of χL and χC and has no explicit appearance of Z) corresponds to the one we got
from phasors. This is easy to do if we remember a True Fact™ about the complex unit i.
Let’s form I(t) in two steps:
$$\frac{d\hat{Q}_{ss}}{dt} = i\omega Q e^{i(\omega t + \beta)} = \omega Q e^{i(\omega t + \beta + \pi/2)} \tag{9.139}$$
because:
$$i = e^{i\pi/2} \quad (!)$$
so:
$$I(t) = \Re e\, \frac{d\hat{Q}_{ss}}{dt} = \omega Q \cos(\omega t + \beta + \pi/2) \tag{9.140}$$
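Now we are in a position to rearrange things and connect them up. First of all, let’s look
at the denominator of Q, as we expect that must be related to
$Z = \sqrt{(\omega L - 1/(\omega C))^2 + R^2}$ (the impedance):
$$\begin{aligned}
\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}
&= \sqrt{\left(\frac{1}{LC} - \omega^2\right)^2 + \frac{R^2}{L^2}\omega^2} \\
&= \sqrt{\frac{\omega^2}{L^2}\left[\left(\frac{1}{\omega C} - \omega L\right)^2 + R^2\right]} \\
&= \frac{\omega}{L}\sqrt{\left(\frac{1}{\omega C} - \omega L\right)^2 + R^2} = \frac{\omega}{L} Z
\end{aligned} \tag{9.141}$$
This makes:
$$\omega Q = \omega \frac{V_0}{L\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}} = \omega \frac{V_0}{L} \times \frac{L}{\omega Z} = \frac{V_0}{Z} = I_0$$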
Obviously, the two solutions will be identical if −δ = β + π/2. To prove this, we have to use
a trig identity from the unit circle to relate β to δ:
$$\tan\left(\beta + \frac{\pi}{2}\right) = -\cot(\beta) = \tan(-\delta) = -\tan(\delta) \quad (?) \tag{9.143}$$
That is, −cot(β) should equal −tan(δ) if the two expressions for the current are in fact
identical. Is it? Let’s see. Using cos(β) and sin(β) from above and cancelling the common
denominator:
$$\begin{aligned}
-\cot(\beta) = -\frac{\cos(\beta)}{\sin(\beta)} &= \frac{L(\omega_0^2 - \omega^2)}{R\omega} \\
&= -\frac{L \times \frac{\omega}{L}\left(\omega L - \frac{1}{\omega C}\right)}{R\omega} \\
&= -\frac{\chi_L - \chi_C}{R} = -\tan\delta
\end{aligned} \tag{9.144}$$
and our proof is complete.
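Let’s state our final result clearly. If $\hat{Q}_{ss}(t) = Q e^{i(\omega t + \beta)}$ and
$$Q = \frac{V_0}{L\sqrt{(\omega_0^2 - \omega^2)^2 + \frac{R^2}{L^2}\omega^2}} \quad \text{and} \quad \beta = \tan^{-1}\left(\frac{-R\omega}{L(\omega_0^2 - \omega^2)}\right)$$
then
$$I(t) = \Re e\, \frac{d\hat{Q}_{ss}}{dt} = \omega Q \cos(\omega t + \beta + \pi/2) = \frac{V_0}{Z}\cos(\omega t - \delta)$$
where:
$$Z = \sqrt{(\chi_L - \chi_C)^2 + R^2} \qquad \delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right)$$
exactly as we determined it from the phasor argument.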
From this we can conclude several things. First, the phasor argument works – it gives
us the rigorously correct answer from little more than an inspired guess and the geometry
of triangles. It yields the current rather than the charge on the capacitor per se, it is true, but
the current is more useful than Q in our discussion of power in the series LRC circuit, next.
Second, the complex solution is potentially much more general – it can be used to analyze
what happens in the LRC circuit when the input is a general waveform voltage V (t) that is
not a single-frequency harmonic function, once one masters the Fourier transform.
For these reasons, physicists usually concentrate on the former, at least early on – they
need all of the power of the general complex solution and more when they get to quantum
mechanics or advanced electrodynamics – while electrical engineers tend to teach the
latter from the beginning since it can form the basis of a plug-and-play analysis of much
more general circuit structures than the ones we look at here, with much more general
inputs and desired outputs.
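If you would rather see the equivalence of the two solutions than take the algebra on faith, here is a short numerical sketch. It is an illustration only: the function names and test values are assumptions of mine. It solves the complex equation 9.130 for the constant Q e^{iβ} by plain division, forms the current as the real part of dQ̂ss/dt, and compares that with the phasor current (V0/Z) cos(ωt − δ).

```python
import cmath
import math

def complex_steady_state(V0, R, L, C, omega):
    """Return (Q, beta) such that Q*exp(i*(omega*t + beta)) solves eq. 9.130."""
    omega0_sq = 1.0 / (L * C)
    # Substituting exp(i*omega*t) turns the derivatives into factors of i*omega.
    amplitude = (V0 / L) / ((omega0_sq - omega**2) + 1j * omega * R / L)
    # cmath.phase returns an angle in (-pi, pi]; this equals the [0, 2*pi)
    # convention of the text modulo 2*pi.
    return abs(amplitude), cmath.phase(amplitude)

def current_from_complex(t, V0, R, L, C, omega):
    """I(t) as the real part of d/dt of the complex charge."""
    Q, beta = complex_steady_state(V0, R, L, C, omega)
    q_hat = Q * cmath.exp(1j * (omega * t + beta))
    return (1j * omega * q_hat).real

def current_from_phasors(t, V0, R, L, C, omega):
    """I(t) = (V0/Z) cos(omega*t - delta) from the impedance triangle."""
    x = omega * L - 1.0 / (omega * C)     # chi_L - chi_C
    Z = math.hypot(R, x)
    delta = math.atan2(x, R)
    return (V0 / Z) * math.cos(omega * t - delta)

# Hypothetical circuit values, for illustration only.
params = dict(V0=1.0, R=3.0, L=3.0, C=0.25, omega=2.0)
for t in (0.0, 0.5, 1.0):
    print(t, current_from_complex(t, **params), current_from_phasors(t, **params))
```

The two columns of currents should agree to rounding error at every time, which is the content of −δ = β + π/2 and ωQ = V0/Z.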
We will now return to the mainline thread of the chapter, with all of our algebra muscles
sore from a good workout. You won’t be tested on any of this “advanced” section, even if
you are a physics major, but I think you’ll find its coverage very useful when you eventually
get to this topic in the more advanced courses that await you, or if you one day need to
analyze or design or understand a more complex circuit required in some experiment!
OK, we’ve just spent a lot of time and energy analyzing the series LRC circuit, but we
haven’t really thought about why it is worth the effort. In order to see why the circuit is
useful, we have to examine the flow and delivery of power in the circuit. After all, physics
(and engineering!) is all about “doing work” – making a desired thing happen by arranging
things ‘just so’.
In words, the series LRC circuit is fundamentally useful in electronics because
it functions as a resonant band pass filter to an applied/input harmonic voltage.
Basically, it only allows a large current to flow (and hence deliver power from some input
voltage to the circuit load ) when the frequency of the applied/input voltage is nearly the
same as the resonant, undamped frequency for the circuit:
$$\omega_0 = \sqrt{\frac{1}{LC}} \tag{9.145}$$
obtained in the first section of this chapter for the undriven, undamped LC circuit. For
frequencies far from this resonant frequency – in either direction, above it or below it – the
current in the entire circuit and hence power input by the voltage and delivered to the load
rapidly goes to zero. Let us understand this.
Let’s consider the power delivered to or by each circuit element. The instantaneous
power delivered by the input voltage to the circuit is just:
$$P(t) = V(t) I(t) = V_0 I_0 \cos(\omega t)\cos(\omega t - \delta)$$
If we use the trig identity [135] cos(ωt − δ) = cos(ωt) cos(δ) + sin(ωt) sin(δ) and I0 = V0 /Z we
get:
$$P(t) = \frac{V_0^2}{Z}\left(\cos^2\omega t \,\cos\delta + \cos\omega t \,\sin\omega t \,\sin\delta\right) \tag{9.147}$$
We don’t usually care about the instantaneous power delivered to the circuit (although
there are very definitely exceptions, such as when the peak power is much larger than the
average power, which can stress e.g. transformers that might be providing the power). If
we time average this to obtain the average power from the voltage we get:
$$\langle P_V \rangle = P_{avg} = \frac{1}{2}\frac{V_0^2}{Z}\cos\delta = \frac{1}{2}\frac{V_0^2 R}{Z^2} = \frac{1}{2} I_0^2 R \tag{9.148}$$
where we used the fact that the time average of the square of any harmonic function of
time is 1/2, the fact that cos δ = R/Z (from the impedance triangle above), and the fact that
I0 = V0 /Z.
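The time averages used here can be checked by brute force. The sketch below is illustrative only (the helper name and the numerical values are assumptions): it averages the instantaneous power V(t) I(t) over one period with a simple midpoint sum and compares it with (1/2) V0 I0 cos δ. It also averages cos² and the cross term sin·cos, which should come out as 1/2 and 0.

```python
import math

def period_average(f, omega, steps=2000):
    """Average f(t) over one period T = 2*pi/omega using the midpoint rule."""
    T = 2.0 * math.pi / omega
    dt = T / steps
    return sum(f((k + 0.5) * dt) for k in range(steps)) * dt / T

# Hypothetical values: Z = 5 with a 3-4-5 impedance triangle, so cos(delta) = 0.6.
V0, Z, delta, omega = 1.0, 5.0, math.atan2(4.0, 3.0), 2.0
I0 = V0 / Z

p_avg = period_average(
    lambda t: V0 * math.cos(omega * t) * I0 * math.cos(omega * t - delta), omega)
square = period_average(lambda t: math.cos(omega * t) ** 2, omega)
cross = period_average(lambda t: math.sin(omega * t) * math.cos(omega * t), omega)

print(p_avg, 0.5 * V0 * I0 * math.cos(delta))   # numerical vs. eq. 9.148
print(square, cross)                            # averages of cos^2 and sin*cos
```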
Note as well that the time average of sin(ωt) cos(ωt) is zero so that the second term
does not contribute to the average power (it does contribute to the peak power delivered
by the input voltage). This can be shown by direct integration over a single period. Note
well that the term itself is zero if δ = 0 (or π). The cos δ term is called the power factor of
the circuit, and will be discussed further below.
[135] Which can be trivially proven with complex exponentials and the Euler equation in two
lines: cos(A − B) = cos A cos B + sin A sin B and sin(A − B) = sin A cos B − cos A sin B
Now consider each of the other circuit elements separately:
Again we don’t much care about the peak values, but the averages are important. Ob-
viously, the time average of sin2 (ωt − δ) is still 1/2 and the time average of sin(ωt − δ ±
π/2) sin(ωt − δ) is also still zero! By inspection, then:
$$\langle P_R \rangle = \frac{1}{2} I_0^2 R = \langle P_V \rangle = P_{avg} \tag{9.152}$$
$$\langle P_L \rangle = 0 \tag{9.153}$$
$$\langle P_C \rangle = 0 \tag{9.154}$$
Thus the average power provided to the circuit by the applied harmonic input voltage is
delivered to the resistive load R only!
This is quite interesting! We see that the energy flowing in and out of the capacitor and
inductor may be very large (if e.g. I0² χL or I0² χC is large), but these reactive circuit
elements dissipate/use no energy on average. In every cycle the net energy they absorb from
the circuit equals the net energy they return to the circuit!
Note well that this does not mean that they aren’t important in assessing the power!
They contribute to – sometimes even dominate – the peak current provided to the sys-
tem. This is an important, possibly crucial, design characteristic and is the reason that we
bothered to separate out cos δ and give it a name.
Now let us consider Pavg of the circuit, understanding that it is delivered to the resistor
(or circuit element such as an amplifier that behaves like a resistance in the circuit and
does something useful – the resistive load ) and not to the inductance or capacitance.
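$$P_{avg} = \frac{1}{2}\frac{V_0^2}{Z^2} R = \frac{1}{2} I_0^2 R = \frac{V_{rms}^2}{Z^2} R = \frac{V_{rms}^2}{Z}\cos\delta = I_{rms}^2 R \tag{9.155}$$
where we have introduced the root-mean square voltage and current:
$$V_{rms} = \frac{1}{\sqrt{2}} V_0 \qquad I_{rms} = \frac{V_{rms}}{Z} = \frac{1}{\sqrt{2}} I_0 \tag{9.156}$$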
as a form of the voltage that lets us drop the pesky factor of 1/2 that frequently arises from
averages in harmonic circuits (and leaves us with quantities that look more like their direct
current counterparts, easier to remember).
As we just saw, the various expressions for the average power delivered to the load
resistance in (at least) a series LRC circuit all contain the factor cos δ:
$$P_{avg} = \frac{1}{2}\frac{V_0^2}{Z}\cos\delta = \frac{1}{2} V_0 I_0 \cos\delta = V_{rms} I_{rms} \cos\delta \tag{9.157}$$
[Figure 9.19: three impedance triangles, each with legs R and χL − χC, hypotenuse Z and
angle δ, for the cases χL ≫ χC, R; R ≫ χL − χC; and χC ≫ χL, R.]
Consider the three possible triangles for Z drawn in figure 9.19. In the first one, we
imagine that χL ≫ χC and χL ≫ R – we are basically far from resonance on the high
frequency side so that the circuit (at this frequency) is dominated by induction. That is in
every cycle most of the energy provided to the circuit is stored in the inductor (and then
given back!) and very little is used in the resistive load or the capacitor. Note that:
$$\delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) \to\ \sim \tan^{-1}\infty = \frac{\pi}{2} \tag{9.159}$$
The power factor is close to zero, so independent of the individual magnitudes of V0 or I0
little power is delivered to the circuit – the voltage and current are pretty much out of phase,
with the voltage leading the current as is the case for inductances alone.
In the second figure, |χL − χC| ≪ R (whatever their absolute magnitudes). In this case
the energy provided by the voltage is mostly delivered as Joule heating to a resistor or
useful work done on other resistive loads; only a little goes from the applied voltage into
the capacitor or inductor (that bounce whatever energy they have back and forth between
them, not taking or giving much back to the applied voltage or resistor). In this case:
$$\delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) \to\ \sim \tan^{-1} 0 = 0 \tag{9.160}$$
and the power factor is close to one. We are in, or close to, resonance, and the current
and voltage are nearly in phase.
Finally, the third figure is “like” the first, only now it is χC ≫ χL and χC ≫ R. This is
likely to be true in the low frequency limit as ω → 0, far from resonance. Most of the power
now goes into and out of the capacitor, with little being dissipated by the resistor or going
into the inductor. The phase angle is now:
$$\delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) \to\ \sim \tan^{-1}(-\infty) = -\frac{\pi}{2} \tag{9.161}$$
The power factor is again close to zero so that little steady state power is delivered to the
circuit – the voltage and current are out of phase, with the voltage lagging the current as is
the case for capacitances alone.
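To put numbers on the three triangles, here is a small sketch that evaluates δ and the power factor cos δ well above, at, and well below resonance. The circuit values (R = 0.1, L = C = 1, so ω0 = 1) and the function name are assumptions chosen only for illustration.

```python
import math

def phase_and_power_factor(R, L, C, omega):
    """Return (delta, cos(delta)) for a series LRC circuit at angular frequency omega."""
    x = omega * L - 1.0 / (omega * C)    # chi_L - chi_C
    delta = math.atan2(x, R)             # in (-pi/2, pi/2) for R > 0
    return delta, math.cos(delta)

R, L, C = 0.1, 1.0, 1.0                  # hypothetical values; omega0 = 1
for omega in (10.0, 1.0, 0.1):           # inductive, resonant, capacitive regimes
    delta, pf = phase_and_power_factor(R, L, C, omega)
    print(f"omega = {omega:5.1f}   delta = {delta:+.4f} rad   cos(delta) = {pf:.4f}")
```

Far above resonance δ approaches +π/2, far below it approaches −π/2, and in both cases the power factor is close to zero; only near ω0 is it close to one.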
From these figures, we can deduce the importance of the power factor in circuit design.
In all three cases, the useful work in the circuit is likely to be associated with the power
delivered to the resistive load. In DC circuits, we expect that power to scale like I²R = V²/R.
In AC circuits, V and I can be sufficiently out of phase that the product of their magnitudes,
V²/Z, can be far enough away from the true average power associated with useful work,
(V²/Z) × R/Z = (V²/Z) × cos δ, that one has to provide a significantly higher voltage to reach
some target power delivery than one would if it were applied directly to the load resistance.
Why then, one might ask, do we not just leave out capacitors and inductors and apply
our voltage directly to the resistive load? Well, we do this when we can, but some of
the things that do useful work – notably electric motors and transformers – are based on
magnetic coils with many turns. They have, in other words, a large inductance L as well
as a kind of “resistance” associated with the work that they do, and yet they have almost
no capacitance. To get a desired amount of power out of the motor, one has to use a much
higher voltage and current than one might expect to need. You still provide no more total
energy, but in motors the power spent overcoming the actual resistance of the coils and the
motor is wasted work – heat – and scales like the peak magnitude of the actual current
squared! In this case adding a suitable capacitor in such a way as to correct the phase angle
from close to ±π/2 to close to 0 (and hence the power factor from close to 0 to close to 1)
will significantly reduce the wasted energy losses in the circuit, making (e.g.) the motor a
lot more efficient!
In a minute, we’ll return to the specific utility of the power factor in the context of high
pass/low pass filter design, but first, let’s look at the actual power curve itself and its quality
factor Q in more detail.
Let us consider the variation of the average power delivered to our series LRC circuit with
frequency only, for fixed circuit elements. The first thing to note is that
$$P_{avg} = \frac{V_{rms}^2}{Z^2} R \quad \Rightarrow \quad \frac{dP_{avg}}{d\omega} = 0 = -2\frac{V_{rms}^2}{Z^3} R \frac{dZ}{d\omega} \tag{9.162}$$
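(from the chain rule). The first part of this is zero only when Z → ∞ (which happens for the
minima when ω → 0 or ω → ∞, when χC or χL go to ∞, respectively), so we will find the
maximum when we set
$$\begin{aligned}
\frac{dZ}{d\omega} &= \frac{d}{d\omega}\left[\left(\omega L - \frac{1}{\omega C}\right)^2 + R^2\right]^{1/2} \\
&= \frac{\omega L - \frac{1}{\omega C}}{\left[\left(\omega L - \frac{1}{\omega C}\right)^2 + R^2\right]^{1/2}} \times \left(L + \frac{1}{\omega^2 C}\right) \\
&= \frac{1}{Z} \times \omega L\left(1 - \frac{1}{\omega^2 LC}\right) \times \left(L + \frac{1}{\omega^2 C}\right) \\
&= \frac{L^2}{Z} \times \omega\left(1 - \frac{\omega_0^2}{\omega^2}\right) \times \left(1 + \frac{\omega_0^2}{\omega^2}\right) \\
&= 0
\end{aligned} \tag{9.163}$$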
L²/Z we can ignore: it will go to zero only when Z → ∞ but we already know there
is a minimum there. We can ignore the (1 + ω0²/ω²) term as well: it is strictly positive
(never less than 1) and so cannot vanish. We can ignore the ω term: it goes to zero
as ω → 0, obviously, but again we already know Z → ∞ there so we expected to get a
minimum, and this just makes it clearer. The one critical piece of new information is that we
will have a maximum (in between the two known minima!) when:
$$1 - \frac{\omega_0^2}{\omega^2} = 0 \quad \Rightarrow \quad \omega^2 = \omega_0^2 \quad \Rightarrow \quad \omega_{max} = \omega_0 = +\sqrt{\frac{1}{LC}} \tag{9.164}$$
This is a very important result! The maximum average power is delivered to the series
LRC when ω = ω0 , the resonant frequency of the oscillator! Indeed, this is also exactly
where:
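$$\delta = \tan^{-1}\left(\frac{\chi_L - \chi_C}{R}\right) = 0 \quad \Leftrightarrow \quad \cos\delta = 1 \quad \Leftrightarrow \quad Z = R \tag{9.166}$$
(which is also obviously the case from the phasor diagram when χL = χC ) and:
$$P_{avg,max} = \frac{V_{rms}^2 R}{R^2} = \frac{V_{rms}^2}{R} \tag{9.167}$$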
just as we would expect for a DC circuit. If we use peak voltage V0 instead of rms voltage
Vrms , of course, we have to put back the factor of 1/2:
$$P_{avg,max} = \frac{1}{2}\frac{V_0^2}{R} \tag{9.168}$$
Next, let’s factor the power into a slightly more convenient form for plotting, one that
we could have gotten directly from the solution form we obtained in the section (that you
might well have skipped) on the complex solution to the equation of motion. This form also
makes the expected minima at ω = 0, ∞ readily apparent.
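$$\begin{aligned}
P_{av} &= \frac{V_{rms}^2 R}{\left(\omega L - \frac{1}{\omega C}\right)^2 + R^2} \\
&= \frac{V_{rms}^2 R}{\frac{L^2}{\omega^2}\left(\omega^2 - \frac{1}{LC}\right)^2 + R^2} \\
&= \frac{V_{rms}^2 R\, \omega^2}{L^2(\omega^2 - \omega_0^2)^2 + R^2\omega^2}
\end{aligned} \tag{9.169}$$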
This function is plotted, for L = C = V0 = 1.0 and several values of R in figure 9.20.
When ω → 0, Pav → 0 like ω 2 . When ω → ∞, Pav → 0 like 1/ω 2 . In between, it clearly
peaks precisely at ω = ω0 , resonance, with a peak power as given above. We are thus
almost ready to draw a generic shape for the resonance curve, the power delivered to the
circuit as a function of frequency. It will turn out that, as was the case for the damped,
driven simple harmonic oscillator of mechanics, the semi-quantitative curve shape is al-
most entirely determined by the quality factor Q.
[Figure 9.20 plot: Power versus ω (from 0 to 2), with curves labeled Q = 3, Q = 10 and Q = 20.]
Figure 9.20: A typical series of resonance curves for Q = 3, 10, 20, plotted on a scale such
that ω0 = 1: L = C = 1.0, and R = 0.3333, 0.1, 0.05.
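Curves of this kind are easy to regenerate from equation 9.169. What follows is a minimal sketch using the parameter values quoted in the caption and V0 = 1; the function name, the frequency grid and the printing are my own assumptions, and I make no claim that it reproduces the vertical scale of the printed figure. It tabulates the average power on a grid of frequencies and reports where each curve peaks.

```python
import math

def average_power(omega, V_rms, R, L, C):
    """Average power delivered to a series LRC circuit, eq. 9.169."""
    omega0_sq = 1.0 / (L * C)
    return (V_rms**2 * R * omega**2) / (
        L**2 * (omega**2 - omega0_sq)**2 + R**2 * omega**2)

L, C = 1.0, 1.0
V_rms = 1.0 / math.sqrt(2.0)                       # corresponds to V0 = 1
omegas = [k / 1000.0 for k in range(1, 2001)]      # 0.001 ... 2.000

for R in (1.0 / 3.0, 0.1, 0.05):
    powers = [average_power(w, V_rms, R, L, C) for w in omegas]
    peak = max(powers)
    omega_peak = omegas[powers.index(peak)]
    print(f"R = {R:.4f}  peak at omega = {omega_peak:.3f}  "
          f"P_peak = {peak:.4f}  V_rms^2/R = {V_rms**2 / R:.4f}")
```

Each curve should peak at ω = ω0 = 1 with height V_rms²/R, in agreement with equation 9.167.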
The Q-factor of a passive (undriven) series LRC circuit was shown to be:
$$Q = 2\pi\frac{E}{\Delta E_{cycle}} = \frac{L\omega_0}{R} \tag{9.170}$$
or “two pi times the reciprocal of the fractional energy loss per cycle”. Weak damping is
large Q – it means that you lose only a small amount of the total energy in the circuit in a
single period of the oscillation.
It turns out that we can also write:
$$Q = \frac{\omega_0}{\Delta\omega} = \frac{L\omega_0}{R} = \sqrt{\frac{L}{R^2 C}} \tag{9.171}$$
where ∆ω is the full width of the resonance curve at half-maximum.
The Q-factor is a measure of the sharpness of the resonance. A circuit with a low
Q-factor still delivers significant power to the circuit at frequencies far from resonance
(although asymptotically the power still vanishes at zero and infinity as given above). A
circuit with a high Q-factor, on the other hand, has a sharply peaked resonance curve that
goes to zero quickly when ω moves away from ω = ω0 in either direction. For large Q
significant power is delivered to the “resistive load” of a circuit only for input frequencies
very close to the resonance frequency.
In the homework you will be asked to derive the relation:
$$Q = \frac{\omega_0 L}{R} = \frac{1}{R}\sqrt{\frac{L}{C}} = \frac{\omega_0}{\Delta\omega} \tag{9.172}$$
The best way to go about this is to determine Pavg,max , set Pavg (ω) = (1/2) Pavg,max , and solve
for the values of ω near ω0 for which this is true, following the hints given in the problem.
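Before (or after) doing the algebra, you can sanity check the relation numerically. This sketch is a check and not the derivation asked for in the homework; the helper names, the bisection brackets and the sample values are assumptions. It locates the two half-power frequencies of equation 9.169 by bisection and compares ω0/∆ω with ω0 L/R.

```python
import math

def average_power(omega, V_rms, R, L, C):
    """Average power delivered to a series LRC circuit, eq. 9.169."""
    omega0_sq = 1.0 / (L * C)
    return (V_rms**2 * R * omega**2) / (
        L**2 * (omega**2 - omega0_sq)**2 + R**2 * omega**2)

def bisect_root(f, lo, hi, iterations=100):
    """Find a sign change of f between lo and hi by repeated halving."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def q_from_width(R, L, C, V_rms=1.0):
    """Estimate Q as omega0 divided by the full width at half maximum."""
    omega0 = 1.0 / math.sqrt(L * C)
    half_max = 0.5 * average_power(omega0, V_rms, R, L, C)
    excess = lambda w: average_power(w, V_rms, R, L, C) - half_max
    # Brackets assume the half-power points lie in (1e-6, 100) * omega0.
    omega_low = bisect_root(excess, 1e-6 * omega0, omega0)
    omega_high = bisect_root(excess, omega0, 100.0 * omega0)
    return omega0 / (omega_high - omega_low)

for R in (1.0 / 3.0, 0.1, 0.05):        # the values used in figure 9.20
    print(R, q_from_width(R, 1.0, 1.0), 1.0 * 1.0 / R)   # numerical Q vs. omega0*L/R
```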
In figure 9.20, you can see how decreasing R (with V0 , L, and C all fixed at one) causes
the resonance to sharpen up – become much narrower at half-max – at the same time it
increases the maximum power delivered at peak dramatically (recall that independent of
the values of L or C per se, the peak power at resonance is always (1/2) V0²/R and strictly
increases as R decreases).
In a later section we’ll see at least one or two places one can use a series LRC circuit
to do useful things, but first we have to study an even more useful circuit, the parallel LRC
circuit.
[Figure 9.21 schematic: a voltage Vo sin(ωt) supplying the current I(t) to L, C and R
connected in parallel.]
Figure 9.21: A parallel LRC circuit, with a voltage that has an “internal resistance” that
limits its ability to deliver current. This circuit is ideal for the construction of a simple AM
crystal radio.
The parallel LRC circuit drawn in figure 9.21 above is actually much simpler than the
series as far as understanding the solution is concerned. In this figure we have added the
internal resistance r of the power supply or antenna, as in the latter case especially the
fact that the voltage cannot supply an infinite amount of power is essential to understanding
how this circuit can be used to build a crystal radio. Note that we didn’t bother doing this
in the case of the series LRC circuit because the resistance R in that case was the total
resistance from all sources in the single circuit loop.
Initially, for simplicity, we will analyze this system assuming that r = 0. This will obvi-
ously be somewhat nonphysical, but it makes it easy to solve exactly because the same
voltage drop V0 sin(ωt) occurs across all three components, and so we can just write
down the currents through each component using the elementary single-component rules
above:
$$I_R = \frac{V_0}{R}\sin(\omega t) \tag{9.173}$$
$$I_L = \frac{V_0}{\chi_L}\sin(\omega t - \pi/2) \tag{9.174}$$
$$I_C = \frac{V_0}{\chi_C}\sin(\omega t + \pi/2) \tag{9.175}$$
Note well that we use the rules we derived where the current through the inductor is
π/2 behind the voltage (which is therefore π/2 ahead of the current) and vice versa for
the capacitor. To find the total current provided by the voltage, we simply add these three
currents according to Kirchhoff’s junction rule. Of course, we are adding three trig functions
with different relative phases, so we once again must accomplish this with suitable phasors:
[Figure: phasor diagram for the parallel LRC circuit, showing the current phasors Vo/R,
Vo/χL and Vo/χC, the resultant Vo/Z, and the angles φ and ωt; the resultant gives I(t).]
$$\begin{aligned}
I_{tot} &= \frac{V_0}{R}\sin(\omega t) + \frac{V_0}{\chi_L}\sin(\omega t - \pi/2) + \frac{V_0}{\chi_C}\sin(\omega t + \pi/2) \\
&= \frac{V_0}{Z}\sin(\omega t + \phi) \\
&= I_0 \sin(\omega t + \phi)
\end{aligned} \tag{9.176}$$
As before, we can factor out the common V0 and look at the resulting triangle addition
of the inverse resistance and reactances to obtain a sum rule for the inverse impedance
1/Z:
[Figure 9.23 diagram: triangle built from 1/R, 1/χC and 1/χL, with resultant 1/Z and angle φ.]
Figure 9.23: The impedance diagram for the parallel LRC circuit.
From figure 9.23 the pythagorean theorem immediately yields an expression for the
inverse impedance:
$$\frac{1}{Z} = \sqrt{\frac{1}{R^2} + \left(\frac{1}{\chi_C} - \frac{1}{\chi_L}\right)^2}$$
which we recognize as the phasor equivalent of the familiar rule for reciprocal addition of
resistances in parallel.
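We similarly can easily evaluate the phase φ:
$$\begin{aligned}
\phi &= \tan^{-1}\left(\frac{\frac{1}{\chi_C} - \frac{1}{\chi_L}}{\frac{1}{R}}\right) \\
&= \tan^{-1}\left(\frac{RC(\omega^2 - \omega_0^2)}{\omega}\right)
\end{aligned} \tag{9.178}$$
where we have factored out a C and 1/ω from the first expression and used ω0² = 1/LC,
the resonance frequency of the circuit.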
Resonance for this circuit is the opposite of the series LRC circuit we first looked at. It
still occurs at the frequency $\omega = \omega_0 = 1/\sqrt{LC}$ as before, but now Z is largest at
resonance. To understand how this can be useful, let us think about current flow in the
circuit both at and away from resonance.
At resonance, the impedance (resistance to current flow) of the L and C together is:
$$\frac{1}{Z_{LC}} = \sqrt{\left(\frac{1}{\chi_C} - \frac{1}{\chi_L}\right)^2} = 0 \tag{9.179}$$
or
$$Z_{LC} = \infty \tag{9.180}$$
No current flows into the L and C in combination – they behave like an open circuit at the
resonant frequency. All the current that flows from the voltage at this frequency therefore
flows through the resistance (or “load”).
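The parallel results for r = 0 can be checked the same way as the series ones. In the sketch below the function names and sample values are assumptions for illustration. It computes 1/Z and φ from the triangle of inverse resistance and reactances, and verifies that the three branch currents of equations 9.173 to 9.175 add up to I0 sin(ωt + φ).

```python
import math

def parallel_lrc(V0, R, L, C, omega):
    """Return (Z, I0, phi) for an ideal (r = 0) parallel LRC circuit."""
    inv_chi_L = 1.0 / (omega * L)
    inv_chi_C = omega * C
    inv_Z = math.hypot(1.0 / R, inv_chi_C - inv_chi_L)   # triangle of figure 9.23
    phi = math.atan2(inv_chi_C - inv_chi_L, 1.0 / R)     # eq. 9.178
    return 1.0 / inv_Z, V0 * inv_Z, phi

def total_current(t, V0, R, L, C, omega):
    """Sum of the three branch currents, eqs. 9.173-9.175."""
    return (V0 / R * math.sin(omega * t)
            + V0 / (omega * L) * math.sin(omega * t - math.pi / 2)
            + V0 * omega * C * math.sin(omega * t + math.pi / 2))

# Hypothetical values with omega0 = 1.
V0, R, L, C = 1.0, 2.0, 1.0, 1.0
for omega in (0.5, 1.0, 2.0):
    Z, I0, phi = parallel_lrc(V0, R, L, C, omega)
    print(f"omega = {omega:.1f}  Z = {Z:.4f}  I0 = {I0:.4f}  phi = {phi:+.4f}")
```

At ω = ω0 the impedance equals R and φ = 0: the L and C branches cancel and all of the current goes through the resistor.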
Far from resonance on either side, either χC or χL will be very small – in particular
much less than R. The current produced by the voltage will thus find either the capacitor
(for high frequencies) or the inductor (for low frequencies) to be a much easier path to
ground, provided only that the load resistance R is bigger.
If the voltage were an ideal voltage with r = 0, capable of delivering any amount of
current, this wouldn’t matter. As the impedance of the parallel LC combination drops, it
would simply provide more current and maintain its voltage, while continuing to deliver as
much current to the resistor as before. However, many voltage sources – in particular
a radio antenna – have a significant impedance/resistance of their own, and if they are
provided with an easy path to ground this shorts out the antenna by pulling enough current
from it so that its pole voltage drops to zero (or at any rate a very small number), reducing
the current through the resistor to zero at the same time.
This suffices to show that there should be a maximum power delivered to the resistance
when one is at resonance and the current has no alternative pathway to ground through
the LC combination, but it does not suffice to show what the characteristics of the power
curve are. To solve this problem exactly, one has to write Kirchhoff’s laws for the entire
circuit, reduce them to an algebraic form, and then solve that form. This is rather painful
to do working with trig functions, somewhat easier with complex exponentials, and beyond
the scope of this course.
However, we can at least comment on certain aspects of the solution and show a
curve or two (for the benefit of any would-be crystal radio builders). First, although it
is far from obvious, the power delivered to the load resistor (headphones) will be maximum
if its resistance more or less matches the resistance of the antenna. This is called
“impedance matching” (impedance because in general one has to account for more than
just resistance). One can in fact prove a result known as the Maximum Power Theorem [136]
or Jacobi’s Law that states that in general when a power source has a complex internal
impedance ZS and the load has a complex impedance ZL , maximum power is transferred
when
$$Z_L = Z_S^* \tag{9.181}$$
or the impedance of the load has the same amplitude but the opposite phase of the source.
This theorem works for purely resistive loads – in fact in its simplest application it simply
describes the energy distribution between two resistors RS and RL in series! Hence one
needs to design a radio (when possible) to match the impedance of the antenna one hopes
to use with it; if one doesn’t one either burns too much of the received energy in the
antenna itself (when the impedance of the load is too small) or one eliminates one’s ability
to discriminate the signal.
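The purely resistive case mentioned above (two resistors RS and RL in series) makes a nice one-screen check of the theorem. This is only a sketch: the function name, the source values and the grid of load resistances are assumptions. It computes the power dissipated in RL for a range of loads and finds the one that receives the most.

```python
def load_power(V, R_source, R_load):
    """Power dissipated in R_load when V drives R_source and R_load in series."""
    current = V / (R_source + R_load)
    return current**2 * R_load

V, R_source = 1.0, 10.0                        # hypothetical source
loads = [float(k) for k in range(1, 41)]       # trial load resistances 1 ... 40
powers = [load_power(V, R_source, R_load) for R_load in loads]

best_power = max(powers)
best_load = loads[powers.index(best_power)]
print(best_load, best_power, V**2 / (4.0 * R_source))
```

The maximum lands on RL = RS, where the load receives V²/(4 RS); a smaller load burns most of the energy in the source resistance, and a larger one draws too little current.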
We can do a somewhat sloppy job of estimating the power delivered to the load resistor
with the following argument. Suppose Z is the impedance of the parallel circuit above and
r is the resistance of the source. Then we expect the total impedance of the circuit to be
Z ′ = r + Z (where if we don’t use complex numbers we will have to separate out and add
separately the resistive component of Z to r). The total current drawn from the source
is thus approximately I0 = V0 /Z ′ . We can then find the “corrected” source voltage across
the resistance R as a (phase shifted) VR = V0 − I0 r, and the power delivered to it is thus
approximately:
$$P_R = \frac{V_R^2}{R} \tag{9.182}$$
[Figure 9.24 plot: Power in R versus ω (from 0 to 2), with curves for R = 2, R = 10 and R = 40.]
Figure 9.24: Parallel resonance power delivery in a greatly simplified resistive model.
We plot this very approximate function, computed in just this way, for a range of values
of ω around the resonant frequency $\omega_0 = 1/\sqrt{LC} = 1.0$ as before in figure 9.24. The
voltage and resistance have been mutually adjusted to make the picture pleasing, with
r = 10 Ω. Note that we do indeed see peak power delivery to the load when R ≈ 10 Ω as
expected, at least for the three values for R shown. Note how the Q value of the circuit
visibly changes with R for the fixed L as well.
[136] Wikipedia: http://www.wikipedia.org/wiki/Maximum Power Theorem.
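For readers who want to experiment, here is a sketch of the same kind of calculation done with complex impedances. Note that this is not the simplified resistive estimate used to draw figure 9.24, so the curves need not match it in detail; it treats the source resistance r in series with the parallel R, L, C block as an ordinary voltage divider. The function name and the values (r = 10, L = C = 1, V0 = 1, R = 2, 10, 40) are assumptions, and the power returned is the time average |V_R|²/(2R) for a peak voltage amplitude |V_R|.

```python
def load_power_parallel(omega, V0, r, R, L, C):
    """Average power in R for a source (V0, internal resistance r) driving R, L, C in parallel."""
    # Admittance of the parallel block: 1/R + 1/(i*omega*L) + i*omega*C.
    Y = 1.0 / R + 1.0 / (1j * omega * L) + 1j * omega * C
    Z_par = 1.0 / Y
    V_R = V0 * Z_par / (r + Z_par)       # complex amplitude of the voltage across R
    return abs(V_R)**2 / (2.0 * R)

V0, r, L, C = 1.0, 10.0, 1.0, 1.0        # hypothetical values; omega0 = 1
for R in (2.0, 10.0, 40.0):
    row = [load_power_parallel(w, V0, r, R, L, C) for w in (0.5, 1.0, 1.5)]
    print(f"R = {R:5.1f}  P(0.5) = {row[0]:.5f}  P(1.0) = {row[1]:.5f}  P(1.5) = {row[2]:.5f}")
```

In this model the power peaks at ω = ω0 for each R, and among the three loads the peak is largest for R = r, consistent with the impedance matching argument.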
In the next section we will see how to make practical use of the parallel LRC circuit
(and a rectifier) in the design of a crystal radio, an inexpensive device capable of receiving,
discriminating, and decoding an AM-encoded signal.
The simplest way to transmit things like voice and music via electromagnetic (radio) waves
is to use Amplitude Modulation (AM) to encode the signal onto a carrier wave. Here’s
how it works. First one builds an oscillator at the fixed frequency of the carrier (which is
generally a much higher frequency than any frequency in the signal). Without going into
any details, the LC circuits studied above (combined with an amplifier) can be used to
drive themselves to a stable, single frequency output (especially when stabilized with and
tuned to a “natural” electrical oscillator such as a piezoelectric crystal). For our purposes
this frequency doesn’t have to be too precise – a bit of slow drift in phase or frequency
is OK, for example – but we’ll pretend that it is a single, pure harmonic wave at a carrier
frequency ωc .
Next, we need to collect the signal being encoded in electronic form. This is easily done
with e.g. a microphone, which creates a voltage proportional to the air pressure variations
that it experiences when we speak into it or play music into it. This sort of signal is called
an analog signal (as opposed to a digital signal) that can take any value and that varies
over time.
Third, we combine the two. We use the varying voltage from the microphone as the
relatively slowly varying amplitude of the carrier. The three signals (unmodulated carrier,
modulating signal, encoded/modulated carrier) are shown in figure 9.25. The final AM encoded
voltage is used as input to an amplifier that drives the voltage supplied to the transmission
antenna, typically a tall radio tower being driven at a power of tens to hundreds of kilowatts.
The resulting radio signal – electromagnetic radiation of the sort we will study in the next
chapter – propagates for long distances at the speed of light and falls upon the receiving
antenna of your AM radio.
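The encoding step itself is a one-liner: multiply the carrier by the (biased) signal. The sketch below illustrates this under stated assumptions. The function name, the sinusoidal "audio" signal, the bias of 2 and the carrier frequency of 20 are all invented for the example, and a real carrier would be far higher in frequency relative to the signal.

```python
import math

def am_voltage(t, carrier_omega, signal, bias, carrier_amplitude=1.0):
    """Amplitude-modulated carrier: (bias + signal(t)) * Vc * cos(omega_c * t)."""
    envelope = bias + signal(t)          # slowly varying amplitude
    return envelope * carrier_amplitude * math.cos(carrier_omega * t)

signal = lambda t: math.sin(t)           # hypothetical slow signal with |signal| <= 1
bias = 2.0                               # DC bias keeps the envelope positive
carrier_omega = 20.0                     # hypothetical carrier angular frequency

samples = [(0.01 * k, am_voltage(0.01 * k, carrier_omega, signal, bias))
           for k in range(1001)]         # t from 0 to 10
print(samples[0], max(abs(v) for _, v in samples))
```

The modulated voltage always stays inside the envelope ±(bias + signal(t)), which is what a rectifier plus averaging can later recover at the receiving end.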
There it creates an alternating voltage with the same shape as the voltage applied to
the transmitting tower. However, this voltage is now very weak – the intensity of the radio
wave diminishes with roughly the square of the distance from the radio tower – and is
mixed in with many other equally strong or even stronger signals from other radio sources
(other radio stations, the sun, electrical motors, many things create radio waves) at various
frequencies.
To tune in just the carrier (plus enough bandwidth to allow its amplitude modulation
to make it through the receiver circuit) we build a circuit that effectively shorts out all of
the signals but the desired carrier at ω0 by providing them with an easy path to ground
through either an inductor (for lower frequencies) or a capacitor (for higher frequencies).
[Figure 9.25 plots, each versus t from 0 to 10: (a) carrier; (b) signal; (c) AM carrier.]
Figure 9.25: (a) The unencoded carrier with an arbitrary normalization voltage Vc = 1 volt
and angular frequency ω0 . (b) The signal to be encoded. A DC bias has been added to
the AM signal so that the voltage is always positive. This DC bias can be removed at the
far end with a simple high-pass filter; (c) The AM encoded carrier used as (for example)
the power supply to the antenna of a radio station. Note that for real AM signals the
carrier frequency is much higher compared to the highest frequencies in the signal, which
improves the averaging that takes place in the decoding rectifier.
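The three panels of figure 9.25 can be imitated numerically. The following is only a minimal sketch: the function name `am_signal`, the carrier and signal frequencies, the bias, and the modulation depth are all illustrative assumptions (arbitrary units), chosen so that the biased signal stays positive as described in the caption.

```python
import math

def am_signal(t, carrier_freq=20.0, signal_freq=1.0, bias=2.0, depth=1.0):
    """Return the AM encoded voltage at time t (arbitrary units, assumed values)."""
    # Slowly varying signal plus a DC bias, so the amplitude stays positive.
    envelope = bias + depth * math.sin(2.0 * math.pi * signal_freq * t)
    # Unmodulated carrier with normalization voltage Vc = 1.
    carrier = math.cos(2.0 * math.pi * carrier_freq * t)
    # The signal is used as the slowly varying amplitude of the carrier.
    return envelope * carrier

# Sample one period of the modulating signal at 1000 points.
samples = [am_signal(n / 1000.0) for n in range(1000)]
peak = max(abs(v) for v in samples)
print(f"largest |V| in one signal period: {peak:.3f}")
```

The largest excursion of the encoded voltage is set by the biased signal (bias plus depth), not by the carrier, which is the sense in which the carrier "carries" the signal in its amplitude.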
The simplest circuit that accomplishes this is our parallel LRC circuit above.
However, we have to add two features in order to make it a tunable AM radio. First is a
way to tune it! We do the best possible job of filtering out unwanted frequencies
when ω0² = 1/LC and when R = r, so that our receiver resistance/impedance
matches the internal resistance of the voltage source. We therefore have to be able to
adjust L, C, or both in order to tune in our AM encoded carrier.
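As a rough illustration of what tuning involves, the resonance condition ω0² = 1/LC can be solved for the capacitance needed at a given carrier frequency. This is a sketch under stated assumptions: the coil inductance of 240 microhenries and the three carrier frequencies are hypothetical values picked only to span the AM band, and the function names are invented for the example.

```python
import math

def resonant_frequency_hz(inductance, capacitance):
    """Resonant frequency f0 = omega0 / (2 pi), with omega0^2 = 1/(L C)."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

def capacitance_for(f0_hz, inductance):
    """Capacitance that puts the resonance at f0 for a fixed inductance."""
    omega0 = 2.0 * math.pi * f0_hz
    return 1.0 / (omega0**2 * inductance)

L_COIL = 240e-6  # henries; an assumed, illustrative coil value

for f_khz in (540.0, 1000.0, 1600.0):
    c_needed = capacitance_for(f_khz * 1e3, L_COIL)
    print(f"{f_khz:6.0f} kHz needs C = {c_needed * 1e12:6.1f} pF")
```

Because C scales like 1/f0², covering a factor of about three in frequency requires a capacitor that can be varied by roughly a factor of nine, which is one reason some designs vary L as well.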
It is beyond our scope in this work to discuss all the various aspects of this decision.
The antenna, diode (crystal), headphones or amplifier input all have some impedance –
characteristics of resistance, inductance and capacitance – and have to be corrected for.
Also, we need to be able to tune the Q of the circuit so that the receiver bandwidth is
adequate to pick up all of the encoded signal while still being narrow enough to reject
nearby AM encoded stations. Many simple crystal radio designs that use wire wrapped
around e.g. a simple tube of some sort allow one to vary L across a range (which adjusts
ω0 and Q simultaneously) – this is especially wise if one’s headphones and/or antenna
have enough capacitance already to make it difficult to add a tuning capacitor “in range”
to permit tuning. Others use fixed L (and hence fixed Q) and a variable capacitor to tune.
Still others may do both – allow one to vary L (possibly to one of a small set of discrete
values) and then use a continuously tunable C to find the signal.
In an idealized circuit for the simplest of crystal radios in figure 9.26, I arbitrarily show
a variable C (that’s the arrow symbol) and also introduce the symbol for an antenna and
ground. The resistance r is a mix of the physical resistance of the antenna wire and its
“radiation resistance” and is the quantity that needs to be impedance matched (more or
less) by the load R for maximum power delivery at resonance. Recall that providing an
easy (low impedance) path to ground through either L or C for a given frequency will
effectively short out the antenna so that all its power at that frequency will be dissipated in
the antenna, not in R. Only when LC has infinite collective impedance at resonance will
the power delivery be balanced in r and (matched) R.
[Figure 9.26, circuit diagram with labeled elements r, diode (crystal), L, C, and R.]
Figure 9.26: A very simple, idealized crystal radio circuit using a variable capacitor instead
of variable inductance (or variable both). Note also the presence of a diode decoder – a
one-way gate for current (which flows only in the direction of the “arrow”).
This simple parallel circuit alone would suffice to tune in the AM carrier, but if we
listened to the headphones without the diode decoder visible in the circuit, we’d hear – nothing!
That’s because the carrier is at a very high frequency (typically over 500 kHz) that is
well above the range of human hearing. We have to remove the carrier, leaving the signal.
Diodes act as a one-way gate, allowing current to flow only in the
direction of the “arrow” in the diode. This process is called “rectification” (literally
right-sidification), and a single diode is a half-wave rectifier, cutting off the negative parts
of the current and passing only the positive “right side up” voltage/current variation.
Placing a small capacitor in the line containing the headphones (usually not necessary, as the
diode and the headphones together have some capacitance) removes the DC bias and
“smears” out the top-half carrier waves to fill in a good approximation to the original signal.
The original diodes were crystals of e.g. lead galena in a mount with an adjustable
wire whisker in contact with the crystal – hence “crystal radio”. The wire whisker created a
semiconducting interface with the crystal that in turn only passed current in one direction
(with a very high back resistance that effectively prevented it in the other). However, lots of
other conductor interfaces will provide the same effect, including a graphite pencil (basis
of so-called “foxhole radios” used by GIs in World War II, usually built out of surplus junk
scavenged on a battlefield).
Of course using a single diode in a circuit wastes half of the power picked up by the
antenna! It is much better to use four diodes arranged as a full-wave rectifier. Look
over the following circuit in figure 9.27 (intended to replace the entire diode/headphone
arrangement in the circuit above) and understand how, as the voltage oscillates positive to
negative, the current through the headphones passes in just one direction.
Figure 9.27: A full-wave rectifier made out of four diodes. The “headphones” are the
resistance in the center of the diamond of diodes. Verify that the current always passes
through this resistor from left to right, regardless of whether the voltage difference top to
bottom is positive or negative.
This arrangement basically flips the negative half-waves and fills them into the “holes”
between the positive ones, recovering the full energy. Again, when smeared out a bit over
an RC time constant set by the capacitance of the headphones, this accurately reconstructs
the decoded AM signal, without any bias, with a bit of high frequency “ripple” that the
human ear cannot hear. A schematic of the flipped (but not smeared) signal is shown
below in figure 9.28. Compare it to the original signal and you can see that as long as
the headphones are massive enough to be unable to respond to the very high frequency
ripple anyway, you’ll be able to hear the music, voices, or whatever that was encoded on
the carrier to a high degree of accuracy.
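The decoding step can be checked numerically with a crude model. In this sketch the "smearing" is approximated by a plain average over one carrier period rather than by a real RC circuit, and the signal shape, bias, and carrier frequency are assumed, illustrative values. Since the average of |cos| over a cycle is 2/π, the smoothed, full-wave rectified voltage should track the original biased signal up to that constant factor.

```python
import math

def envelope(t):
    """Biased modulating signal (assumed shape, period 1, always positive)."""
    return 2.0 + math.sin(2.0 * math.pi * t)

def rectified(t, carrier_freq=200.0):
    """Full-wave rectified AM carrier: negative half-waves flipped positive."""
    return abs(envelope(t) * math.cos(2.0 * math.pi * carrier_freq * t))

def smoothed(t, carrier_freq=200.0, n=400):
    """Average the rectified voltage over one carrier period centered on t.

    This midpoint-rule average is a stand-in for the RC smoothing in the text.
    """
    period = 1.0 / carrier_freq
    start = t - 0.5 * period
    total = sum(rectified(start + (i + 0.5) * period / n, carrier_freq)
                for i in range(n))
    return total / n

for t in (0.0, 0.25, 0.5, 0.75):
    recovered = smoothed(t) * math.pi / 2.0   # undo the 2/pi average of |cos|
    print(f"t = {t:4.2f}  original = {envelope(t):.3f}  recovered = {recovered:.3f}")
```

The agreement improves as the carrier frequency is raised relative to the signal frequency, which is the point made in the caption of figure 9.25 about real AM signals.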
This section should provide you with more than enough information to understand and
even build a crystal radio of your own. Note well: this general process of encoding and
decoding information on to/off of carrier signals is one of the fundamental bases of modern
civilization. High pass, low pass, and band pass/reject circuits are ubiquitous. Even if you
yourself never actually build an electronic circuit, knowing a bit about how they work and
in particular knowing what things such as “impedance matching” are and why they matter
can really improve your understanding and ability to work with electronic devices in many
laboratory environments.
In this chapter we have already remarked on the content of the next one. We have
learned all of Maxwell’s Equations already, but one of them is broken; in particular, it
doesn’t take into account the fact that charge is conserved and that there is a certain
ambiguity in the particular open surface S one can choose that is bounded by any given
(specified) closed curve C. We need to fix this, adding the Maxwell Displacement Current
to Ampere’s (broken) Law.
[Figure 9.28, one panel plotted against time t: rectified AM carrier.]
Figure 9.28: The AM encoded signal after it has been received by a tuned, band-pass
filter and full-wave rectified. Note that the average output voltage will very closely track the
original signal.
When we do, we will discover an amazing thing: time varying electromagnetic fields
satisfy the wave equation and hence propagate like a wave. Under some circumstances
those waves form radio waves, like the AM encoded carrier wave we have just studied. In
others, however, those waves are what we know as light!
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
[Figure: LRC circuit with resistor R, capacitor C carrying initial charge +Q0, inductor L, and switch S1 (close at t=0).]
At time t = 0 the capacitor in the LRC circuit above has a charge Q0 and the current
in the wire is I0 = 0 (there is no current in the wire). Derive Q(t), and draw a qualitatively
correct picture of Q(t) in the case that the oscillation is only weakly damped. Show all your
work.
Problem 3.
[Figure: circuit with resistor R and capacitor C, with voltage taps VR across R and VC across C.]
a) The current I(t) through the resistor and capacitor, assuming no current is diverted
into the branches on the right. Clearly identify the relative phase shift δ between the
applied voltage and the current.
b) The voltage VR (t) across the resistor. Factor your answer out so that it is in terms of
the dimensionless ωRC.
This circuit is called a high-pass filter, one that delivers the maximum current in the
circuit only when ωRC ≫ 1 (so that the capacitor behaves like a “short” with very low
reactance).
When the frequency is low, the capacitor acts like a gap, with very high reactance, and
does not permit current to flow. At this point the applied voltage drop across the capacitor is
maximal, and this pair of tap points is sometimes used to help clean up a DC power supply
by “shorting out” high frequency pulses while maintaining a steady DC voltage across the
fully charged capacitor. In this configuration, the capacitor can also serve as a reservoir of
charge and can maintain the voltage even if the load imposes a transient peak in demand
that is higher than the supply voltage source could otherwise handle.
Problem 4.
[Figure: circuit with resistor R and inductor L, with voltage taps VR across R and VL across L.]
Repeat the previous problem for the LR circuit above, evaluating I(t), δ, VR(t), VL(t) in
terms of the dimensionless ωL/R. This circuit is used as a low pass filter, with peak current
through and voltage across R at low frequencies, while high frequencies are blocked by
the inductor.
When might one wish to use the VL versus the VR voltage taps, respectively? Think
about this: Not all loads are resistive...
Problem 5.
[Figure: series LRC circuit connected across a variable AC voltage source.]
A series LRC circuit connected across a variable AC voltage source V = V0 cos(ωt) is
drawn above. Find:
a) The current I(t) in the primary supply wire (as shown in the figure above) with all
terms, e.g. the phase δ, and the impedance Z defined (the latter in terms of the
individual reactances).
b) The average power dissipated by the circuit. Remember, P(t) = V(t)I(t) (where V(t)
is the voltage across each circuit element and I(t) is the common current through it).
Two of the circuit elements have zero average power, but you must prove this.
Hint: Assume a current of the form
I(t) = I0 cos(ωt − δ)
and then add the voltage drop across each series element. Your answer for Z should
remind you of series addition of resistors, using reactances instead (two of which are π/2
out of phase with the resistor).
Problem 6.
We wish to evaluate the Q-factor for this resonant circuit, as this is an important design
parameter for band-pass filters such as those used in radios.
If you did part b) of the previous problem correctly, you should have found that:
$$P_{\rm av}(\omega) = I_{\rm rms}^2 R = \frac{V_{\rm rms}^2 R \omega^2}{L^2(\omega^2 - \omega_0^2)^2 + \omega^2 R^2}$$
is the average power delivered to the circuit by the voltage source and is also the average power
“burned” by the resistor, since the inductor and capacitor do not dissipate energy and there
is no net work done per cycle upon them. In this expression ω0 = √(1/LC) as you should
fully understand at this point.
Show that for a sharply peaked resonance (one with large Q):
$$\Delta\omega \approx \frac{R}{L}$$
so that
$$Q = \frac{\omega_0}{\Delta\omega} \approx \frac{\omega_0 L}{R}$$
where ∆ω is the full width at half maximum of the power curve you derive in the first part.
To do this, set the expression above equal to the computed half-maximum power, and
solve for the two quadratic roots for ω, assuming that both of them are very close to (but
not equal to) ω0 (this is the sharply peaked part). You may find the following factorization
useful:
$$\omega^2 - \omega_0^2 = (\omega - \omega_0)(\omega + \omega_0)$$
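Before (or after) doing the algebra, it can be reassuring to check the claimed width numerically. The sketch below evaluates the average power expression above and locates the two half-maximum frequencies by bisection; it is a numerical sanity check, not a substitute for the derivation the problem asks for. The component values (R = 10 ohms, L = 1 mH, C = 1 nF, giving Q = 100) and the function names are assumptions made for the example.

```python
import math

def average_power(omega, v_rms, R, L, C):
    """Average power in the driven series LRC circuit, from the expression above."""
    omega0_sq = 1.0 / (L * C)
    denominator = L**2 * (omega**2 - omega0_sq)**2 + omega**2 * R**2
    return v_rms**2 * R * omega**2 / denominator

def full_width_half_max(v_rms, R, L, C):
    """Find the two half-maximum frequencies by bisection; return their separation."""
    omega0 = 1.0 / math.sqrt(L * C)
    half = 0.5 * average_power(omega0, v_rms, R, L, C)

    def crossing(outside, inside):
        # 'outside' lies below half maximum; 'inside' (the peak) lies above it.
        for _ in range(200):
            mid = 0.5 * (outside + inside)
            if average_power(mid, v_rms, R, L, C) < half:
                outside = mid
            else:
                inside = mid
        return 0.5 * (outside + inside)

    lower = crossing(0.5 * omega0, omega0)
    upper = crossing(2.0 * omega0, omega0)
    return upper - lower

R, L, C = 10.0, 1.0e-3, 1.0e-9   # assumed values, chosen so that Q = 100
width = full_width_half_max(1.0, R, L, C)
omega0 = 1.0 / math.sqrt(L * C)
print(f"FWHM = {width:.1f} rad/s, R/L = {R / L:.1f} rad/s, Q = {omega0 / width:.1f}")
```

Repeating the check with a larger R (smaller Q) is a useful way to see how the individual half-power frequencies shift relative to ω0 even though their separation stays close to R/L.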
Problem 7.
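[Figure: two transmission circuits. First: source V0 stepped up to V1, transmission line resistance Rt, stepped back down to V0 across Rload. Second: source V0, transmission line resistance Rt, and Rload.]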
In this problem you must analyze the major problem of power transmission that took
place some hundred years ago. Above you can see two alternatives for transmitting power
long distances. The first circuit generates AC power at a relatively low voltage V0 (which
is easy). Step the power up to a very high voltage V1 ≫ V0 and transmit it at high voltage
across a long transmission wire of fixed resistance Rt . Step it back down to voltage V0 and
then place the load Rload across it.
The second circuit generates a DC voltage V0 . Transmit it down identical transmission
lines and place it across an identical load.
Your job is to compute the way the power is divided up between Pload (which is fixed –
the power we need to light a light bulb, for example) and Pt , the power wasted heating up
the transmission lines. The better solution has Pt ≪ Pload. Find a relationship between the
ratios V0/V1 and Pt/Pload that proves that the AC high voltage solution wins (and by how
much it wins, given “reasonable” estimates for Rt/Rload).
Problem 8.
[Figure: L, C, and R connected across an AC voltage source V0 cos(ωt), with I(t) the current in the primary supply wire.]
a) The current I(t) in the primary supply wire (as shown in the figure above) with all
terms, e.g. the phase δ, and the impedance Z defined (the latter in terms of the
individual reactances).
b) The average power dissipated by the circuit. Note that (if you are clever and remember
what each element does in the circuit) you don’t really have to solve a) to get
this answer, although you can certainly get the same answer from a knowledge of
V(t) and I(t) and some integration.
Hint: To find the answer you must add the currents being drawn by each element
separately! Your answer for Z should remind you of parallel addition of resistors, using
reactances instead (two of which are π/2 out of phase with the resistor, of course).
Advanced Problem 9.
[Figure: the driven LRC circuit.]
This problem is in two parts. First, for your own enduring benefit I want you to derive the
full solution to the driven LRC circuit problem. In particular, start with Kirchhoff’s rule for the
loop and either assume a complex $V(t) = V_0 e^{i\omega t}$ and $I(t) = I_0 e^{i\omega t}$ (where by convention
$V_0$ is real, $I_0 = |I_0|e^{-i\delta}$, and where one gets physical answers at the end by taking the
real part of the complex answers), or assume $V(t) = V_0\cos(\omega t)$ and $I(t) = I_0\cos(\omega t - \delta)$.
Find an algebraic expression for the sum of the voltages. Solve this expression
using either phasors (which will work in both cases, one in the complex plane and one in a
“real” x-y plane) or, in the complex case, directly using algebra, no pictures really required.
Factor out the solution to obtain $|I_0|$ and δ, Z (the impedance), and the voltages across
each element as a function of time.
Week 10: Maxwell’s Equations and Light
I have also a paper afloat, with an electromagnetic theory of light, which, till I
am convinced to the contrary, I hold to be great guns.
• Ampere’s Law has a bit of a problem. The current through C is not consistently
defined so that it gives the same value for all surfaces S that are bounded by the
closed curve C (through which we evaluate the flux of the current density to find the
current “through C”). This means that two people can evaluate the integral to find the
current through C and get different answers without either of them making a mistake.
One can prove anything from a theory with an inconsistency, so this is a bad thing.
• James Clerk Maxwell noted this problem, and sat down to invent the mathematical
tools and concepts to resolve it. We will proceed far more elegantly than he was able
to, using the gift of hindsight. Either way, we will all arrive at the following consis-
tent form for Ampere’s Law, one to which we have added Maxwell’s Displacement
Current:
$$\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0\left(\int_{S/C}\vec{J}\cdot\hat{n}\,dA + \epsilon_0\frac{d}{dt}\int_{S/C}\vec{E}\cdot\hat{n}\,dA\right)$$
Both of these latter two integrals must be evaluated with the same surface S, but
given this they sum together to give the same invariant current for all the surfaces S
that are bounded by the closed curve C.
• In this new, correct version of Ampere’s Law, you can see Maxwell’s contribution: the
Maxwell Displacement Current produced by a time varying electric field :
$$I_{MDC} = \epsilon_0\frac{d}{dt}\int_{S/C}\vec{E}\cdot\hat{n}\,dA$$
• It is worth writing down the complete set of trading cards, suitable for engraving:
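$$\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S}\rho_e\,dV \tag{10.1}$$
$$\oint_S \vec{B}\cdot\hat{n}\,dA = \mu_0\int_{V/S}\rho_m\,dV = 0 \tag{10.2}$$
$$\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0\left(\int_{S/C}\vec{J}\cdot\hat{n}\,dA + \epsilon_0\frac{d}{dt}\int_{S/C}\vec{E}\cdot\hat{n}\,dA\right) \tag{10.3}$$
$$\oint_C \vec{E}\cdot d\vec{\ell} = -\frac{d}{dt}\int_{S/C}\vec{B}\cdot\hat{n}\,dA \tag{10.4}$$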
• Physicists usually rearrange them to make the equations connecting fields to sources
stand out from the equations that have no source terms (because we have yet to see
a magnetic monopole):
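$$\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S}\rho_e\,dV \tag{10.5}$$
$$\oint_C \vec{B}\cdot d\vec{\ell} - \mu_0\epsilon_0\frac{d}{dt}\int_{S/C}\vec{E}\cdot\hat{n}\,dA = \mu_0\int_{S/C}\vec{J}\cdot\hat{n}\,dA \tag{10.6}$$
$$\oint_S \vec{B}\cdot\hat{n}\,dA = 0 \tag{10.7}$$
$$\oint_C \vec{E}\cdot d\vec{\ell} + \frac{d}{dt}\int_{S/C}\vec{B}\cdot\hat{n}\,dA = 0 \tag{10.8}$$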
This way, the symmetry is compelling! Two inhomogeneous equations have source
terms connected to electric charge, two homogeneous equations have the same form
but lack the source terms, at least until monopoles are discovered.
• If one applies these equations to a source-free volume of space where electric and
magnetic fields are varying, one can show that they lead to the following wave equa-
tions for the electromagnetic field propagating in (say) the z-direction:
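$$\frac{\partial^2\vec{E}}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2\vec{E}}{\partial t^2} = 0 \tag{10.9}$$
$$\frac{\partial^2\vec{B}}{\partial z^2} - \frac{1}{c^2}\frac{\partial^2\vec{B}}{\partial t^2} = 0 \tag{10.10}$$
The $\frac{\partial^2}{\partial z^2}$ symbol in this expression, let me remind you, just means to take the second
derivative of the functions $\vec{E}(\vec{x}, t)$ and $\vec{B}(\vec{x}, t)$ with respect to the z-coordinate only,
pretending that the other coordinates are constants. In this equation,
$$c = \sqrt{\frac{k_e}{k_m}} = \frac{1}{\sqrt{\epsilon_0\mu_0}} \approx 3\times 10^8 \text{ meters per second} \tag{10.11}$$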
is the speed of light in a vacuum, which we can see is completely determined from
Maxwell’s equations.
Since Maxwell’s equations are laws of nature and expected to hold in all inertial
reference frames, it is entirely reasonable to expect the speed of light to be constant
in all reference frames! This postulate, together with some very simple assumptions
about coordinate transformations, suffices to derive the theory of relativity!
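The claim that c is fixed by the constants in Maxwell’s equations is easy to check with a few lines of arithmetic. This is a minimal sketch; the rounded SI values used for ε0 and µ0 are supplied here as assumptions, and the variable names are invented for the example.

```python
import math

EPSILON_0 = 8.854e-12        # permittivity of free space, F/m (rounded)
MU_0 = 4.0e-7 * math.pi      # permeability of free space, T m/A (rounded)

# Force constants of electrostatics and magnetostatics.
K_E = 1.0 / (4.0 * math.pi * EPSILON_0)
K_M = MU_0 / (4.0 * math.pi)

c_from_field_constants = 1.0 / math.sqrt(EPSILON_0 * MU_0)
c_from_force_constants = math.sqrt(K_E / K_M)

print(f"1/sqrt(eps0 mu0) = {c_from_field_constants:.4e} m/s")
print(f"sqrt(ke/km)      = {c_from_force_constants:.4e} m/s")
```

Both routes give the same number because ke/km is algebraically identical to 1/(ε0 µ0); the factors of 4π cancel.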
• We will study the details of at least certain simple solutions to these wave equations
over the next few weeks. For the moment, the most important solution for you to
learn is:
known as a harmonic plane wave travelling in the z-direction. Note that Ex and
By are in phase and do not have independent amplitudes – their amplitudes are
connected by Maxwell’s equations (Faraday or Ampere’s law) and Ex = cBy . There
is an identical pair of solutions with a different polarization:
that also propagate in the z-direction, as determined from the derivation of the wave
equations above.
In these equations, note well that:
$$k = \frac{2\pi}{\lambda} \tag{10.16}$$
is the wave number of the wave, where λ is the wavelength of the harmonic wave,
while:
$$\omega = \frac{2\pi}{T} \tag{10.17}$$
is the angular frequency of the wave. The wavelength is thus the “spatial period” of
the wave, where T is the “temporal period” of the wave that harmonically oscillates in
space and time. This wave propagates in the positive z-direction as can be seen by
considering $kz - \omega t = k(z - \frac{\omega}{k}t) = k(z - ct)$. Note well that this uses the result that:
$$c = \frac{\lambda}{T} = \frac{\omega}{k} \tag{10.18}$$
for a harmonic wave.
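One way to convince yourself that a harmonic function of kz − ωt with ω = ck solves the wave equation is to test it with finite differences. The sketch below assumes a sinusoidal form, Ex = E0 sin(kz − ωt), with unit amplitude and a 500 nm wavelength; these specifics, the rounded value of c, and the function names are illustrative assumptions.

```python
import math

C_LIGHT = 3.0e8  # m/s (rounded)

def e_field(z, t, wavelength, amplitude=1.0):
    """Assumed harmonic plane wave E_x = E_0 sin(kz - wt), with w = c k."""
    k = 2.0 * math.pi / wavelength
    omega = C_LIGHT * k
    return amplitude * math.sin(k * z - omega * t)

def wave_equation_terms(z, t, wavelength):
    """Centered-difference estimates of d2E/dz2 and (1/c^2) d2E/dt2."""
    h = wavelength / 1000.0      # small step in z
    tau = h / C_LIGHT            # matching small step in t
    e0 = e_field(z, t, wavelength)
    d2_dz2 = (e_field(z + h, t, wavelength) - 2.0 * e0
              + e_field(z - h, t, wavelength)) / h**2
    d2_dt2 = (e_field(z, t + tau, wavelength) - 2.0 * e0
              + e_field(z, t - tau, wavelength)) / tau**2
    return d2_dz2, d2_dt2 / C_LIGHT**2

space_term, time_term = wave_equation_terms(z=1.2e-7, t=3.0e-16, wavelength=500e-9)
print(f"d2E/dz2         = {space_term:.6e}")
print(f"(1/c^2) d2E/dt2 = {time_term:.6e}")
```

The two terms agree to within rounding, so their difference (the left-hand side of the wave equation) vanishes. The same check passes for any smooth function of z − ct, not just a sine.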
• The flow of energy in an electromagnetic wave (and field in general) can be deter-
mined from the Poynting vector :
$$\vec{S} = \frac{1}{\mu_0}(\vec{E}\times\vec{B}) \tag{10.19}$$
The magnitude of the Poynting vector is called the intensity of the electromagnetic
wave – the energy per unit area per unit time or power per unit area being transported
by the wave in the direction of its motion:
$$I = \frac{dP}{dA} = \frac{d}{dA}\frac{dU}{dt} = |\vec{S}| \tag{10.20}$$
where U is the energy in the wave. To describe the transport of power (energy per
unit time, in watts) across some given surface A more precisely, one evaluates the
flux of the Poynting vector through the surface:
$$P_A = \int_A \vec{S}\cdot\hat{n}\,dA \tag{10.21}$$
As you can see one just cannot get away from flux integrals as a way of representing
the “flow” of energy, current, fluid, or $\vec{E}$ or $\vec{B}$ field through a surface! As such, it is a
very important idea to conceptually master.
• The Poynting vector can be understood and almost derived by adding up the total
energy in the electric and magnetic fields in a volume of space being transported
perpendicular to a surface A. In a time ∆t, all of the energy in a volume ∆V = A c∆t
goes through the surface at the end. This is:
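$$\Delta U = \left(\frac{1}{2}\epsilon_0 E_x^2 + \frac{1}{2\mu_0}B_y^2\right)A\,c\,\Delta t \tag{10.22}$$
If we use |Ex| = c|By| (see above) for a wave travelling in the z-direction and do a bit
of algebra, we can see that:
$$\frac{\Delta U}{A\,\Delta t} = \frac{1}{\mu_0}|\vec{E}_x||\vec{B}_y| \tag{10.23}$$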
which is just the Poynting vector magnitude in the z-direction for these two field com-
ponents.
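The "bit of algebra" connecting (10.22) to (10.23) can be spot-checked numerically. In this sketch the field amplitude of 100 V/m, the rounded constants, and the function names are assumptions chosen for illustration.

```python
import math

EPSILON_0 = 8.854e-12        # F/m (rounded)
MU_0 = 4.0e-7 * math.pi      # T m/A (rounded)
C_LIGHT = 1.0 / math.sqrt(EPSILON_0 * MU_0)

def energy_flux(e_x):
    """Energy per unit area per unit time: (energy density) x c, as in (10.22)."""
    b_y = e_x / C_LIGHT                      # |Ex| = c |By|
    density = 0.5 * EPSILON_0 * e_x**2 + b_y**2 / (2.0 * MU_0)
    return density * C_LIGHT

def poynting_magnitude(e_x):
    """|S| = |Ex| |By| / mu0 for the same field pair, as in (10.23)."""
    b_y = e_x / C_LIGHT
    return e_x * b_y / MU_0

e_x = 100.0  # V/m, an assumed illustrative amplitude
print(f"energy flux   = {energy_flux(e_x):.6f} W/m^2")
print(f"Poynting |S|  = {poynting_magnitude(e_x):.6f} W/m^2")
```

The agreement relies on the electric and magnetic energy densities being equal in the wave, which is what |Ex| = c|By| guarantees.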
• The electromagnetic field also carries momentum, solving the dilemma of the “miss-
ing momentum” left over from our consideration of the magnetic force and the failure
of Newton’s third law. The field momentum is rather difficult to derive in a simple way,
but it can somewhat be understood by assuming that the field electrically polarizes
atoms that it sweeps over in such a way that it exerts a magnetic force along the
direction of motion of the electromagnetic wave. We’ll explore this with a problem
later. The momentum density of the electromagnetic field is:
$$|p_f| = \frac{U}{c} \tag{10.24}$$
and we can consider the net momentum transported per unit area per unit time by
the electromagnetic field perpendicular to a surface A to be:
$$P_r = \frac{I_{\text{thru } A}}{c} \tag{10.25}$$
This quantity is called the radiation pressure and it is partially responsible for the solar
wind, created as sunlight pushes gas molecules away from the sun. Light “sails” have
also been proposed as a propulsion for getting around inside the solar system without
rocket fuel. We will explore both of these ideas with homework problems.
To use radiation pressure properly, one has to compute the force it exerts on a sur-
face. This force will depend on certain things, such as whether or not the radiation
is perfectly absorbed or perfectly reflected and (eventually) the relative velocity of
source and target (as the incident and reflected waves can be doppler shifted, affect-
ing the momentum transfer). In the simplest cases (perfect absorption or reflection)
the force is best computed by using an expression such as:
$$F_S = \frac{1}{c}\int_A \vec{S}\cdot\hat{n}\,dA \tag{10.26}$$
that is, the flux of the Poynting vector yields the power transferred to a (perfectly
absorbing) surface, and 1/c of the power is the effective force exerted along the line
of the original Poynting vector. If the radiation is reflected, one has to construct a similar
quantity evaluated (with the same power) with respect to the direction of the angle of
reflection, and vector sum the forces. In the simplest case of normal absorption or
reflection:
$$F_S = \frac{SA}{c} \tag{10.27}$$
or
$$F_S = \frac{2SA}{c} \tag{10.28}$$
respectively.
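To get a feel for the size of these forces, here is a small sketch applying (10.27) and (10.28) to a light sail at normal incidence. The sunlight intensity of about 1361 W/m² near Earth and the 100 m² sail area are assumed, illustrative inputs, and `radiation_force` is a hypothetical helper name.

```python
SPEED_OF_LIGHT = 2.998e8   # m/s

def radiation_force(intensity, area, reflecting=False):
    """Force at normal incidence: S A / c if absorbed, 2 S A / c if reflected."""
    factor = 2.0 if reflecting else 1.0
    return factor * intensity * area / SPEED_OF_LIGHT

SOLAR_INTENSITY = 1361.0   # W/m^2, approximate value near Earth (assumed input)
SAIL_AREA = 100.0          # m^2, a hypothetical sail

absorbed = radiation_force(SOLAR_INTENSITY, SAIL_AREA)
reflected = radiation_force(SOLAR_INTENSITY, SAIL_AREA, reflecting=True)
print(f"absorbing sail:  {absorbed:.3e} N")
print(f"reflecting sail: {reflected:.3e} N")
```

The forces come out well under a millinewton, which is why practical light sails need very large areas and very small masses.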
The power cross section is the amount of power per unit solid angle (dΩ) radiated
away from the accelerating charge. The actual power then drops off like 1/r 2 in this
direction.
A direct consequence of this result is the death of classical physics. Classically, we
expect an electron to orbit a proton in a hydrogen atom, much the way the moon
orbits the earth. After all, the forces of attraction between them have a more or less
identical form! But if an actual hydrogen atom were bound in this way, the electron
(like the moon) would be more or less perpetually accelerating. It would therefore
be more or less perpetually radiating away energy and dropping into a lower orbit to
provide it. If one considers how long it would take for an electron in a circular orbit
around a proton with an initial radius around 10^−10 meters (one Angstrom, roughly
the size of almost any atom) to spiral in to the proton, it is a very, very short time (as
the further in it gets the more strongly it must accelerate and the faster it radiates to
a still lower energy orbit with a still smaller radius). In a tiny fraction of a second, the
classical “atom” would collapse!
The fact that this manifestly does not occur, when it must occur if both Newton and
Maxwell are correct, is one of several factors that led to the invention of quantum
mechanics and modern physics (including relativity theory). This, then, is the next
course in physics that students beginning a serious study of physics should under-
take, as soon as they complete this one and solidify their understanding of classical
electricity and magnetism and light. Things are getting interesting!
• When one considers a point charge oscillating around its oppositely charged mate
(a dynamical version of our Lorentz model for an atom that helped us understand
dielectric polarization earlier) one can either convert this expression into or derive
directly from the Poynting vector the following expression for the power cross-section:
$$\frac{dP}{d\Omega} = \frac{c^2}{32\pi^2}\sqrt{\frac{\mu_0}{\epsilon_0}}\,k^4|p_z|^2\sin^2(\theta) \tag{10.30}$$
The k⁴ = (2π/λ)⁴ is very important, as it is why the sky is blue! Remember it for later
– shorter wavelength/higher frequency light waves have a much larger power
cross-section, all things being equal, than longer ones, because the fields are related to the
time derivatives of the dipole moments which increase with the frequency. Again, the
actual power radiated away in any direction drops off like 1/r².
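The strength of the k⁴ dependence is easy to underestimate. The sketch below compares the factor for two wavelengths, holding the dipole moment and angle fixed; the choices of 450 nm for "blue" and 650 nm for "red" are representative assumptions, not values from the text.

```python
import math

def k_to_the_fourth(wavelength):
    """The k^4 = (2 pi / lambda)^4 factor in the power cross-section."""
    return (2.0 * math.pi / wavelength) ** 4

BLUE = 450e-9   # meters, assumed representative blue wavelength
RED = 650e-9    # meters, assumed representative red wavelength

ratio = k_to_the_fourth(BLUE) / k_to_the_fourth(RED)
print(f"blue/red power cross-section ratio: {ratio:.2f}")
```

A wavelength ratio of less than 1.5 therefore produces more than a factor of four in radiated power, all other things being equal.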
• Finally, one can (as usual) consider the collective radiation from many charged parti-
cles oscillating against a neutral background in, for example, an antenna. An antenna
is basically a wire that has a current in it such that it forms a macroscopic dipole mo-
ment (in say the z-direction) that oscillates at some frequency ω. This antenna will
then radiate away energy in the form of electromagnetic radiation! The power cross
section is basically the same as that just given (but for a much larger dipole moment
pz), so that the intensity of the radiation field of a z-oriented dipole antenna located
at the origin of a spherical polar coordinate system is usually given by:
$$I(\theta) = \frac{P_0}{r^2}\sin^2(\theta) \tag{10.31}$$
(and is azimuthally symmetric about the z-axis). P0 has the units of power, and
intensity has units of power per unit area, so this works. It is often given as:
$$P_0 = I_{\rm rms}^2 R_{\rm rad} \tag{10.32}$$
where Irms is the root-mean-square current in the antenna and Rrad is the radiation
resistance of the antenna, which can heuristically be thought of as resulting from
the reaction force exerted on the radiating charges due to their own radiated field!
Deriving these results is beyond the scope of this course, but it is nevertheless useful
to understand and use the terminology when we consider radios (as we saw last
week). Note well that the radiation is most strongly emitted perpendicular to the
dipole moment, and that no energy at all is radiated along the dipole moment.
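The dipole antenna pattern (10.31) can be explored numerically. The sketch below integrates the intensity over a sphere of radius r to get the total radiated power, using a simple midpoint rule in θ and the azimuthal symmetry for the φ integral. The radii, the value P0 = 1, and the function names are illustrative assumptions.

```python
import math

def intensity(theta, r, p0):
    """Dipole antenna intensity I(theta) = (P0 / r^2) sin^2(theta)."""
    return p0 * math.sin(theta) ** 2 / r**2

def total_power(r, p0, n=2000):
    """Integrate the intensity over a sphere of radius r (midpoint rule in theta)."""
    d_theta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * d_theta
        # Ring of area 2 pi r^2 sin(theta) d_theta, by azimuthal symmetry.
        ring_area = 2.0 * math.pi * r**2 * math.sin(theta) * d_theta
        total += intensity(theta, r, p0) * ring_area
    return total

for radius in (10.0, 1000.0):
    print(f"r = {radius:7.1f} m: total power = {total_power(radius, 1.0):.5f} x P0")
```

The total is the same at both radii (it equals 8π/3 times P0), which is just energy conservation: the 1/r² in the intensity cancels the r² growth of the sphere's area. The intensity itself vanishes along the z-axis and peaks at θ = π/2.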
As discussed at the end of week 8, Maxwell’s Equations – so far – don’t seem quite right.
Let’s write them out as we have them at this point:
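$$\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S}\rho_e\,dV \tag{10.33}$$
$$\oint_S \vec{B}\cdot\hat{n}\,dA = \mu_0\int_{V/S}\rho_m\,dV = 0 \tag{10.34}$$
$$\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0\int_{S/C}\vec{J}\cdot\hat{n}\,dA \tag{10.35}$$
$$\oint_C \vec{E}\cdot d\vec{\ell} = -\frac{d}{dt}\int_{S/C}\vec{B}\cdot\hat{n}\,dA \tag{10.36}$$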
The asymmetry will be a bit more apparent if I put all of the terms involving charges as
sources of the fields on the right and all of the terms involving the fields themselves on the
left:
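$$\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S}\rho_e\,dV \tag{10.37}$$
$$\oint_C \vec{B}\cdot d\vec{\ell} \qquad\qquad\qquad\qquad = \mu_0\int_{S/C}\vec{J}_e\cdot\hat{n}\,dA \tag{10.38}$$
$$\oint_S \vec{B}\cdot\hat{n}\,dA = 0 \tag{10.39}$$
$$\oint_C \vec{E}\cdot d\vec{\ell} + \frac{d}{dt}\int_{S/C}\vec{B}\cdot\hat{n}\,dA = 0 \tag{10.40}$$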
I put a tiny e subscript on the J~ and reordered them with a big hole in Ampere’s Law to
emphasize the point. The top two equations are connected to electrical charge – either
stationary or moving – to produce the fields. The bottom two are zero on the right, where
the zero just means “there ain’t no stinkin’ magnetic monopoles been seen (yet)” but we
can imagine that if there were, Gauss’s Law for Magnetism would get a source term on
the right that looked just like that for Gauss’s Law for Electricity, and Faraday’s Law would
get a term on the right involving the current density of moving magnetic charge, just like
Ampere’s Law.
But what about poor Ampere’s Law, in that case? Faraday’s Law mixes electric and
magnetic fields, so that time varying magnetic fields make electric fields.
Shouldn’t Ampere’s Law have a term such that time varying electric fields make mag-
netic fields? I left the gap just in case...
This is as good a thing as any to motivate a closer look at Ampere’s Law. Maxwell’s
Equations are starting to look rather beautiful^137 but that big hole is ugly, as are (really)
the big ugly zeros where magnetic monopoles should live. Natural philosophers have from
time immemorial considered “beauty” – a certain appealing symmetry, as it were – to be
an essential component of probable truth. Sometimes this belief is followed to a fault, of
course, especially when the beautiful idea in question is our idea and we ignore the fact
that it doesn’t, actually, agree with nature particularly well when we look^138. Ultimately
nature itself must be the arbiter of truth in natural law, but still, at the very least, things that
are almost beautifully symmetric demand a closer examination to see if we are missing
something because symmetry is – empirically – often observed in nature! Experimentalists
today continue the search for magnetic monopoles; we ourselves will follow in Maxwell’s
footsteps and search for the “problem” – that will turn out to be a missing term as we might
guess from symmetry – in Ampere’s Law.
137 Seriously. If there is such a thing in this Universe as beautiful mathematics, Maxwell’s Equations are It.
This course won’t cover the half of just how gorgeous they really are...
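[Figure 10.1, diagram: a wire carrying current I charges a capacitor with plates +Q and −Q; a loop C around the wire with field Bφ; surfaces S1 and S2 bounded by C, with no current through S2.]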
Figure 10.1: A simple circuit and pair of surfaces that illustrate how Ampere’s Law is (so
far) wrong, with two completely different currents for the two surfaces S1 and S2 .
We have learned enough at this point to be able to see that Ampere’s Law is obviously
wrong – or at least, not mathematically consistent! – from a few simple, specific, examples
that illustrate the problem. In figure 10.1 I’ve drawn a side view of a humble parallel plate
capacitor. At this particular instant, a current I(t) is flowing along the wire on the left,
charging up the capacitor so that a charge +Q(t) is increasing on the left plate.
To this innocuous looking problem we’ll apply Ampere’s Law – specifically to the nice
circular loop C drawn around the supply wire. This loop is quite far away from the capacitor,
and the electric field the capacitor is making is more or less confined to live between its
plates, and the current I quite obviously goes through the surface S1 stretched across C
(and hence goes “through C”), so we should be quite justified in deducing the usual:
$$\oint_C \vec{B}\cdot d\vec{\ell} = B_\phi 2\pi r = \mu_0\int_{S_1/C}\vec{J}\cdot\hat{n}\,dA = \mu_0 I \tag{10.41}$$
(where recall that S/C should be read as “the open surface S bounded by the closed curve
C”) so that
$$B_\phi = \frac{\mu_0 I}{2\pi r} \tag{10.42}$$
around the circle in the right handed sense. No problem, the field of an infinitely long
straight wire carrying current, the simplest possible situation. How could this be wrong?
138 A serious problem with pre-Enlightenment philosophy...
But wait. When I wrote the right-hand side of Ampere’s Law, I happened to choose the
“easy” surface S1 that stretches straight across the curve C (and an easy curve C that lies
in a plane). However, there is nothing in the mathematics of Ampere’s Law that requires
me to use that particular surface.
Indeed, I could choose to use (say) surface S2 /C instead! S2 is just as “bounded by
the closed curve C” as S1 is. They are topologically equivalent – S1 is like the film of soap
stretched across a bubble blowing loop, and S2 is like the bubble as it has been blown out
but is still attached to the loop. The only problem with this is that the current:
\[
I = \int_{S_2/C} \vec{J}\cdot\hat{n}\,dA = 0\ (!) \tag{10.43}
\]
because the surface S2 goes in between the plates of the capacitor, where no charge flows!
This is a disaster! Ampere’s Law seems to give us two possible answers. In fact,
since there are an infinite number of surfaces S I could draw bounded by C that intercept
different parts of the capacitor and wire supplying it, there are an infinite number of possible
answers! But the two answers Bφ = µ0 I/2πr ≠ 0 and Bφ = 0 are more than enough for
us to see that we have a serious problem to deal with. The current on the right hand side
of Ampere’s Law (correctly evaluated as the flux of the current density through a surface
bounded by the curve C) is not invariant when we vary the surface S in perfectly reasonable
ways.
Now, in this particular example, based on the specific curve and geometry illustrated
in figure 10.1, one could argue that using S2 is silly – that we should “obviously” use the
surface S1 that lies in the same plane as C (or otherwise choose a “special” surface) so
that we’ll get the right answer. You should be deeply suspicious of this argument, of course
– it sounds rather like choosing the surface on the basis of the fact that it gives the right
answer instead of finding the right answer from the equation no matter how we choose the
surface.
In fact, it is easy (and educational) to construct a simple counterexample to this assertion,
one where there is no possible way to decide a priori which of two surfaces S/C to use.
Both of the two “obvious” surfaces that stretch across C in the way closest to the way the
plane surface S1 stretches across the circular curve C in the example above turn out to
be identical – simple rotations of one another, in fact – and (for what it’s worth) empirically
neither of them will give the right answer for the broken version of Ampere’s Law!
Consider figure 10.2, the “potato chip” case139 . In this case we imagine the curve C
to be a circle that is bent over in just the right way so that there are two surfaces that are
bounded by it that are identical – so much so that S2 is a simple rotation of S1 that has
exactly the same boundary curve C, in exactly the same orientation!
If you put them together just right, the two surfaces thus joined make the closed surface
S = S1 + S2 , and we can clearly push a current through (say) S2 that does not exit the
139
If you live in a country where “Pringles” potato chips – the ones sold in a simple cylindrical package where
they are neatly and perfectly stacked – are available, the “saddle” shapes in question are nearly identical to
the shape of a Pringles potato chip. If you take two such chips and rotate them just right and put them rim to
rim, you can almost perfectly enclose a volume with a closed surface made up of potato chips. Sadly, I doubt
that Pringles will send me any money for so shamelessly plugging their topologically useful product...
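[Figure 10.2 diagram: the bent curve C shown with surface S1, with surface S2, and with both joined around a charge density ρ fed by a current I.]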
Figure 10.2: The “potato chip” case. Surfaces S1 and S2 are bounded by the same curve
C and together form a closed surface that completely encloses a charge density ρ!
volume through S1 as illustrated to accumulate a nonzero total charge density ρ (and hence
total charge Q) in the volume bounded by S. Obviously, there is now no possible way to
decide which of these two surfaces to use as “the” correct surface to use so that the broken
version of Ampere’s Law will yield the correct answer140 .
You can try to argue even further, that we must be sure to always use both “nice” curves
C (ones in a plane, for example) and “nice” surfaces S (ones in that same plane) but a) that
isn’t very satisfactory, mathematically; and b) empirically, it doesn’t work! Moving point
charges make magnetic fields, and those fields pretty much always are going to violate the
broken version of Ampere’s Law on almost all curves C and surfaces S simply because the
“current” through any given S is always going to be zero except for the tiny instant that the
point charge crosses it, but there is going to be a magnetic field around C at least some
of the time as the charge moves along a trajectory that would eventually pass through it,
maybe, if it didn’t stop first or curve away.
Basically, these explicit examples demonstrate that so far, Ampere’s Law is really
just Ampere’s Sort of OK Rule That Works, Sometimes, For A Subset Of All Possible
Cases, If We Cheat. This simply won’t do. We want a natural law to always work – it
has to be “unbreakable”, especially by as simple a thing as bending C into a 3D twisted
loop (like a crumpled coat hanger) or choosing the “wrong” C-bounded surface S for some
perfectly reasonable plane loops C. Wrong by what standard? How can we decide that
any of the variations are “wrong” without knowing the answer some other way, if the law
itself isn’t invariant across all possible choices?
Mathematicians and physicists get very anal about this sort of thing. If they don’t, the
bugaboo of all human efforts to reason, inconsistency, creeps into our set of beliefs, and
mathematicians all well know that you can prove anything from a contradiction (and hence
know nothing on the basis of your proofs)141 .
140
As noted, neither of them.
141
In fact, by insisting that Maxwell’s Equations as natural laws ought to be invariant under changes of inertial
Our job, it appears, is to try to make the current in Ampere’s Law invariant:
\[
\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0 I_{\text{inv through } S/C} \tag{10.44}
\]
so that it gives us the exact same current for any surface S/C we might happen to choose
to solve a problem. Obviously, we also want it to give the known/observable right answer
for (say) the simple capacitor example illustrated above. If it works for a few cases like this
where we know the answer from experiment and can compute it more than one way, we
can even hope that the answer thus obtained from our new, improved version of Ampere’s
Law will always agree with experiment and is indeed a natural law! After all, that’s the only
real game in town!
[Figure 10.3 diagram: labels I in, I out, J, ρ, C, S1, S2, and the normals n and n′ (marked n = n′ where they coincide).]
Figure 10.3: A very general current density flows through space. Some current flows in
from the left and exits on the right, but some builds up in the charge density ρ in the
volume between the two surfaces S1 and S2 . The point is that the difference between the
flux (current) in through S1 and out through S2 must be equal to the rate that charge builds
up in between, because charge is conserved.
The picture that will best help us find the invariant current is drawn in figure 10.3. We
are going to take this picture and think about it in the light of another physical law that we
really believe in, the Law of Charge Conservation. You might recall that we used figure
5.2, more or less equivalent to 10.3, in an alternate form of our derivation of the integral
form of the law of charge conservation way back in chapter 5. As you should now be able
to see from the examples above, Ampere’s Law fails to consistently account for charge
conservation in any case other than steady state (time independent or very slowly varying)
current flow. When we analyze what happens for the two surfaces above and include
Gauss’s Law for Electrostatics consistently, the correct invariant current will more or less
fall out of our analysis at our feet, ready to be plugged into Ampere’s Law to make it correct.
reference frame, Einstein threw out more or less all of classical non-relativistic physics – and was backed up by
numerous experiments that showed that he was right to do so! Kind of scary, that...
In figure 10.3, I’ve chosen two simple surfaces S1 and S2 bounded by C. In fact, they
are both parts of a sphere, and together they make a closed spherical surface, one that
encloses the volume V inside. As was the case in the advanced discussion of charge
conservation, the current density J~ flows in through surface S1 , but not all of it flows out
through S2 . Some of it is building up in a charge distribution ρ inside the sphere. So the
total current I flowing in to the sphere is larger than the total current flowing out. None of
this – the choice of a sphere, the particular curve C or surfaces S1 or S2 – is important; we
just choose them to make the result easy to see.
Since charge is conserved (empirical law!), the rate at which charge builds up inside the
closed surface S = S1 + S2 will equal the difference between the fluxes of the current density:
\[
\int_{S_1/C} \vec{J}\cdot\hat{n}\,dA - \int_{S_2/C} \vec{J}\cdot\hat{n}\,dA = \frac{d}{dt}\int_{V/S} \rho\,dV \tag{10.45}
\]
which is exactly what we arrived at as equation 5.28, a version of the law of charge conser-
vation, before. In this equation, the normals n̂ in the two integrals on the left are directed
from the left to the right, in the direction of the current’s apparent flow. You should feel free
to go back and review the relevant part of chapter 5 and reread the discussion there if this
is not clear.
The integral on the right looks strangely familiar! In fact, it is part of Gauss’s Law for
Electricity! Using Gauss’s Law (multiplied on both sides by ǫ0 ) we can substitute:
\[
\int_{V/S} \rho\,dV = \epsilon_0 \oint_S \vec{E}\cdot\hat{n}'\,dA \tag{10.46}
\]
(where note well that n̂′ on the right is the outward directed normal in GLE).
We substitute this back into the first equation to get:
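\[
\begin{aligned}
\int_{S_1/C} \vec{J}\cdot\hat{n}\,dA - \int_{S_2/C} \vec{J}\cdot\hat{n}\,dA
 &= \frac{d}{dt}\int_{V/S} \rho\,dV = \epsilon_0 \frac{d}{dt}\oint_S \vec{E}\cdot\hat{n}'\,dA \\
 &= \epsilon_0 \frac{d}{dt}\left\{ \int_{S_1} \vec{E}\cdot\hat{n}'\,dA + \int_{S_2} \vec{E}\cdot\hat{n}'\,dA \right\}
\end{aligned}
\tag{10.47}
\]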
where I have broken the flux integral on the right hand side into two pieces, one over S1
and one over S2 . This is perfectly all right since the entire closed surface S = S1 + S2 .
Next, since n̂′ in the two field flux integrals is the outward directed normal of GLE, it
goes from left to right on S2 , but on S1 it goes from right to left! I want to make n̂ exactly
the same (left to right) in the field flux integrals as it is in the current density flux integrals
on the left hand side of the equation, so I change the sign of the S1 integral (and thereby
can change n̂′ to n̂ in both integrals):
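\[
\int_{S_1/C} \vec{J}\cdot\hat{n}\,dA - \int_{S_2/C} \vec{J}\cdot\hat{n}\,dA
 = -\epsilon_0 \frac{d}{dt}\int_{S_1/C} \vec{E}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S_2/C} \vec{E}\cdot\hat{n}\,dA \tag{10.48}
\]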
Finally, we move all of the S1 integrals to the left, and all of the S2 integrals to the right:
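\[
\int_{S_1/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S_1/C} \vec{E}\cdot\hat{n}\,dA
 = \int_{S_2/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S_2/C} \vec{E}\cdot\hat{n}\,dA \tag{10.49}
\]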
The left side only depends on S1 /C. The right depends only on S2 /C. We used no
special properties of these curves or surfaces beyond the fact that any two non-coincident
open surfaces bounded by the same closed curve C enclose a volume. The two sides are
thus invariant under any possible change in the curves C or surfaces S. We thus define
the invariant current to be:
\[
I_{\text{invariant, “through } C\text{”}} = \int_{S/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA \tag{10.50}
\]
where the result now holds for any surface S bounded by any given closed curve C!
Let us now guess – not prove – that this invariant current is the correct one to use
in Ampere’s Law. If it isn’t, Ampere’s Law is in deep trouble, as any other form will be
inconsistent with the Law of Charge Conservation and/or will produce a “current” we’ll
have to “cheat” to evaluate by knowing the answer and selecting particular choices for C
and S so that it somehow works out.
Fortunately, we can immediately check to see if this invariant current works in at least
one problem where we do know the correct answer – the specific capacitor example above
where Ampere’s Law as it was before got it wrong.
That is, we suppose Ampere’s Law is really:
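\[
\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0 I_{\text{invariant, “through } C\text{”}}
 = \mu_0 \left\{ \int_{S/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA \right\} \tag{10.51}
\]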
which no longer depends on our choice of S/C. Note well the location of the brackets: the
µ0 is outside of them, and everything inside of them has the units of current.
If we apply this version of Ampere’s Law to our pathological counterexample, the ca-
pacitor problem above, when we compute the invariant current through S1 we still get the
actual current I in the wire (because the $\vec{E}$-field due to the capacitor is confined to live in
between the plates of the capacitor and doesn’t pass through S1 at all in our usual ideal-
ization). This leads us to the expected answer for the field of a long straight wire, which we
also evaluated directly from the Biot-Savart Law in week/chapter 7 and which agrees with
Ampere’s original current balance experiments – empirically, this is bound to be right.
If we apply it to the surface S2 , then as before no physical current gets through, but the
magnitude of the field in between the plates of the capacitor is (recall):
\[
E = \frac{\sigma}{\epsilon_0} = \frac{1}{\epsilon_0}\frac{Q}{A} \tag{10.52}
\]
where A is the area of the capacitor, where the field goes from left to right across S2 . This
field is nonzero only between the plates of the capacitor, so integrating over S2 only gets a
contribution from the area A where the field is non-zero, uniform in magnitude, and parallel
to n̂ as we have drawn S2 . Therefore the flux integral is:
\[
\int_{S_2/C} \vec{E}\cdot\hat{n}\,dA = EA = \frac{Q}{\epsilon_0} \tag{10.53}
\]
exactly as one expects, and hence:
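\[
I_{\text{inv}}(\text{through } S_2/C) = \epsilon_0 \frac{d}{dt}\int_{S_2/C} \vec{E}\cdot\hat{n}\,dA
 = \epsilon_0 \frac{d(EA)}{dt} = \frac{dQ}{dt} = I_{\text{inv}}(\text{through } S_1/C) \tag{10.54}
\]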
because I in the wire is, in fact, the rate at which the capacitor is charging! We get the
same I for both surfaces, and for both surfaces this leads to the correct field around
C!
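The bookkeeping in this check is easy to repeat with numbers. The following is a minimal Python sketch, not part of the original derivation: the current and plate area are hypothetical values chosen only for illustration, and `invariant_current` is a helper name introduced here. It adds the conduction current and the displacement term for each of the two surfaces in the idealized capacitor.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m (rounded)

def invariant_current(conduction_current, dflux_dt):
    """Conduction current plus eps0 * d(electric flux)/dt through one surface."""
    return conduction_current + EPS0 * dflux_dt

# Hypothetical numbers, chosen only for illustration.
I_wire = 2.0e-3      # charging current in the wire, in amperes
plate_area = 1.0e-2  # plate area A, in m^2

# S1 cuts the wire: all conduction current, no changing E flux (idealized).
I_S1 = invariant_current(I_wire, 0.0)

# S2 passes between the plates: no conduction current, but E = Q/(eps0 A)
# grows at the rate dE/dt = (dQ/dt)/(eps0 A) = I_wire/(eps0 A).
dE_dt = I_wire / (EPS0 * plate_area)
I_S2 = invariant_current(0.0, dE_dt * plate_area)

print(I_S1, I_S2)  # the two values agree up to floating point rounding
```

The plate area drops out, as it should: only dQ/dt matters, which is the content of equation 10.54.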
Needless to say (I wouldn’t have presented all of this hard work for nothing, after all) this
version of Ampere’s Law gives the right answers for all classical charge-current densities
where “right” means only “in agreement with experiment within experimental error and the
fact that we are ignoring quantum mechanics” as it should. From now on, if someone asks
for “Ampere’s Law”, while you can still use the broken version in magnetostatic, steady
state current density problems, you should remember this version:
\[
\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0 \left\{ \int_{S/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA \right\} \tag{10.55}
\]
with its invariant current as the one and only correct version of Ampere’s Law and remind
yourself that you are neglecting the term involving the derivative of field flux because it is
zero in problems of this sort.
The extra term we have added to the physical current was originally added by James
Clerk Maxwell, and the implications of this term are so profound, so overwhelming, that
the entire set of equations (and the term itself) were named in his honor. It is called the
Maxwell Displacement Current:
\[
I_{\text{MDC}} = \epsilon_0 \frac{d}{dt}\int_{S_2/C} \vec{E}\cdot\hat{n}\,dA \tag{10.56}
\]
As we’ve seen, for many “static” problems where there is no time-varying electric field
we can use the old form without error, but it won’t work when charge is building up and the
electric field “through C” is varying! In fact, there is one very important place where the
old form fails. It fails to describe the magnetic field inside the parallel plate capacitor. Let’s
work that out as an example.
In figure 10.4 we see a parallel plate capacitor with cylindrical symmetry being charged by
a (momentarily) steady current I. As charge flows onto the capacitor, the field (assumed
as usual to be strictly confined to be between the two plates, ignoring the fringe) increases
uniformly. This increasing field creates an increasing flux through cylindrically symmetric
Amperian loops of radius r in between the plates, generating a magnetic field there. Our
job is to evaluate this field, both between the plates and in free space outside of the plates
(but in the plane that separates them).
This description is a perfect recipe for our algebraic work, yet another example of how a
verbal understanding of the physics plus knowledge of the laws and ability to do relatively
simple math suffices to enable one to solve problems that at first glance are quite difficult.
We imagine that at some time t the capacitor has a total charge Q(t) on it such that
I = dQ/dt.
[Figure 10.4 diagram: circular capacitor plates with Amperian loops of radius r, including one with r > R.]
Figure 10.4: A capacitor made up of two circular disks is being charged by a current I.
The increasing electric field between the two plates becomes a Maxwell Displacement
Current that creates a magnetic field identical to the one that would exist inside a uniform
conductor of the same radius (assuming the conductor had a magnetic permeability and
electric permittivity identical to the vacuum value, not really a very good assumption).
$\vec{J} = 0$ (no actual current flows through the insulating vacuum between the plates) and the
only thing that varies with time in the flux is the charge Q, so this becomes:
\[
B_\phi\, 2\pi r = \mu_0 \frac{\frac{dQ}{dt}\, r^2}{R^2} = \frac{\mu_0 I r^2}{R^2} \tag{10.60}
\]
(in the right handed direction around the current onto the disk as shown).
If we choose the larger Amperian path C at r > R, the only thing that changes is that
the flux is no longer a function of r, as the field is nonzero only in between the plates and
equals φC = Q/ε0 there. The field (after the same basic algebra) becomes:
\[
B_\phi = \frac{\mu_0 I}{2\pi r} \qquad r > R \tag{10.62}
\]
Note two things. First, the two algebraic forms for Bφ are equal at r = R, the boundary
between the two regions. Second, on the inside the field is the same as the field one
would expect in a wire of radius R carrying a uniform current I (and vanishes at r = 0 as
might be expected), while on the outside the field is that of an infinitely long straight wire.
These two observations are strong algebraic evidence that our displacement current has
indeed “solved” the problem of finding an invariant current that gives us sensible answers
regardless of the path C or surface S chosen that is bounded by it.
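Both observations can be checked numerically. Here is a minimal Python sketch, assuming the inside field is the one obtained by dividing equation 10.60 by 2πr; the plate radius, the current, and the function name `b_phi` are hypothetical choices made for illustration.

```python
import math

MU0 = 4.0e-7 * math.pi  # vacuum permeability in T*m/A

def b_phi(r, plate_radius, current):
    """Magnitude of B_phi at distance r from the axis, between or beside the plates."""
    if r <= 0.0:
        return 0.0
    if r < plate_radius:
        # From B_phi * 2 pi r = mu0 * I * r^2 / R^2 (equation 10.60).
        return MU0 * current * r / (2.0 * math.pi * plate_radius**2)
    # Outside the plates the field is that of a long straight wire (10.62).
    return MU0 * current / (2.0 * math.pi * r)

R = 0.05  # hypothetical plate radius in m
I = 1.0   # hypothetical charging current in A

inside_edge = b_phi(R * (1.0 - 1.0e-9), R, I)  # just inside r = R
outside_edge = b_phi(R, R, I)                  # at r = R, outer branch
print(b_phi(0.0, R, I), inside_edge, outside_edge)
```

The two branches meet at r = R and the field vanishes on the axis, matching the two observations above.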
In chapter 7, we noted that at least in the non-relativistic limit where v ≪ c, the $\vec{B}$-field
of a uniformly moving point charge could be sort-of derived from the Biot-Savart Law, and
that the Biot-Savart Law could sort-of be turned into Ampere’s Law (broken version). We
also saw in chapter 8 that there were some very interesting consequences from looking at
what happens to the magnetic field and force when we hop from a frame where a charge
is moving in a uniform $\vec{B}$-field and no $\vec{E}$-field is present (so it only experiences a magnetic
force) to a frame where the charge is not moving, experiences no magnetic force, but still
experiences a force that must, somehow, be due to an electric field that appeared out
of nowhere. That field, as it turned out, was directly related to the changing flux of the
magnetic field!
This leads us to the question: Can we play the same sort of game now, only back-
wards? Can we (for point charges moving in the non-relativistic v ≪ c limit) look only at
the changes in electric field flux – since there will be no physical charge current through
almost all possible surfaces S/C in Ampere’s Law, after all – the surfaces where the point
charge is on S are a set of measure zero in the set of all surfaces in mathematese, and
require careful treatments of various infinities surely unsuitable for an introductory course
– and obtain e.g. the magnetic field of a point charge from Ampere’s Law with the MDC
only, and hence work backwards to the Biot-Savart Law etc?
The answer, amazingly enough, is yes. The following is due to Robert Buschauer142 .
We start with a point charge q sitting at rest at the origin. As we well know, its static fields
are then:
\[
\vec{E} = \frac{k_e q}{r^2}\,\hat{r} \qquad \vec{B} = 0 \tag{10.63}
\]
Now we imagine changing frames into a frame that is moving (say) in the −ẑ direction
at speed v ≪ c. In this frame, the charge is moving at velocity $\vec{v} = v\hat{z}$ in the positive z
142
The Physics Teacher 51, 542 (2013)
direction, and we’ll consider it to be at the origin at the time t = 0 in both frames. Now
consider the electric flux of the static electric field through the spherical “cap” of a sphere
of radius r whose boundary is a circle parallel to the x-y plane at a height z above the
origin. This is simple to evaluate:
\[
\int_{\text{disk}} \vec{E}\cdot\hat{n}\,dA = \frac{q}{4\pi\epsilon_0 r^2} \int_0^{\theta}\!\!\int_0^{2\pi} r^2 \sin\theta\, d\theta\, d\phi \tag{10.64}
\]
where
\[
\theta = \cos^{-1}\left(\frac{z}{r}\right) \tag{10.65}
\]
is the angle that defines the curve C that bounds the cap on the sphere of radius r. This is
easy to evaluate. The r 2 cancels, we get 2π from the dφ integral, and the remaining theta
integral yields
\[
\phi_e = \frac{q}{2\epsilon_0}(1 - \cos\theta) = \frac{q}{2\epsilon_0}\left(1 - \frac{z}{r}\right) \tag{10.66}
\]
A quick check: If θ → π, we get the flux through the entire sphere, which is (correctly,
according to GLE) q/ǫ0 .
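The same check can be done by brute force. The sketch below is an illustration rather than part of the derivation: the 1 nC charge and the function names are assumptions. It sums the flux over the cap with a simple midpoint rule and compares it with the closed form in equation 10.66.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m (rounded)

def cap_flux_numeric(q, theta_max, steps=20000):
    """Midpoint-rule flux of a point charge's field through a spherical cap."""
    dtheta = theta_max / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        # E r^2 = q / (4 pi eps0), so r cancels; the phi integral gives 2 pi.
        total += q / (4.0 * math.pi * EPS0) * math.sin(theta) * 2.0 * math.pi * dtheta
    return total

def cap_flux_closed(q, theta_max):
    """Closed form q/(2 eps0) * (1 - cos(theta)), as in equation 10.66."""
    return q / (2.0 * EPS0) * (1.0 - math.cos(theta_max))

q = 1.0e-9  # hypothetical 1 nC point charge
print(cap_flux_numeric(q, 1.0), cap_flux_closed(q, 1.0))
print(cap_flux_closed(q, math.pi), q / EPS0)  # whole sphere vs Gauss's Law
```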
Now let’s write down Ampere’s Law with the MDC, evaluated only for the electric flux
through this cap:
\[
\oint \vec{B}\cdot d\vec{\ell} = B_\phi\, 2\pi r \sin\theta = \mu_0 \epsilon_0 \frac{d}{dt}\left[ \frac{q}{2\epsilon_0}(1 - \cos\theta) \right] \tag{10.67}
\]
We get no contribution from the 1, ε0 cancels, and (to bring in a v) we can use the chain
rule:
\[
\frac{d\cos\theta}{dt} = \frac{d\cos\theta}{dz}\frac{dz}{dt} = \frac{d\cos\theta}{dz}\, v \tag{10.68}
\]
All that remains before putting it back together is to evaluate d cos θ/dz. First, let’s let
a = r sin θ. Since we are not letting the actual curve C change in time, this is a constant!
In terms of this:
\[
\cos\theta = \frac{z}{r} = \frac{z}{(a^2 + z^2)^{1/2}} \tag{10.69}
\]
The derivative with respect to z (holding a constant) is straightforward to evaluate:
\[
\frac{d}{dz}\,\frac{z}{(a^2 + z^2)^{1/2}} = \frac{1}{r} - \frac{z^2}{r^3} = \frac{(a^2 + z^2) - z^2}{r^3} = \frac{a^2}{r^3} \tag{10.70}
\]
Now we can plug this in and reassemble the parts. Ampere’s Law becomes:
\[
\oint \vec{B}\cdot d\vec{\ell} = B_\phi\, 2\pi a = \frac{\mu_0 q v a^2}{2 r^3} \tag{10.71}
\]
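As a sanity check on the algebra, the time derivative of the flux can also be taken numerically. The Python sketch below is an illustration under stated assumptions: the charge, speed, and geometry are hypothetical numbers, the function names are introduced here, and I assume the sign convention that the loop's height above the moving charge shrinks at rate v, so that the flux through the cap grows.

```python
import math

MU0 = 4.0e-7 * math.pi  # vacuum permeability in T*m/A
EPS0 = 8.854e-12        # vacuum permittivity in F/m (rounded)

def cap_flux(q, a, z):
    """Electric flux through the cap bounded by a loop of radius a at height z."""
    r = math.hypot(a, z)
    return q / (2.0 * EPS0) * (1.0 - z / r)

def loop_integral_numeric(q, v, a, z, dt=1.0e-7):
    """mu0 * eps0 * d(flux)/dt by central difference.

    Assumed convention: the height of the loop above the charge decreases
    at rate v as the charge moves toward it.
    """
    flux_later = cap_flux(q, a, z - v * dt)
    flux_earlier = cap_flux(q, a, z + v * dt)
    return MU0 * EPS0 * (flux_later - flux_earlier) / (2.0 * dt)

def loop_integral_closed(q, v, a, z):
    """Right-hand side of equation 10.71."""
    r = math.hypot(a, z)
    return MU0 * q * v * a**2 / (2.0 * r**3)

# Hypothetical values: 1 nC charge at 1 km/s, loop radius 10 cm, height 20 cm.
q, v, a, z = 1.0e-9, 1.0e3, 0.1, 0.2
print(loop_integral_numeric(q, v, a, z), loop_integral_closed(q, v, a, z))
```

Dividing either number by 2πa gives Bφ = µ0 q v a/(4π r³), which is the magnitude a Biot-Savart expression for a point charge gives when sin θ = a/r.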
\[
\Delta t = \frac{15 \times 10^{7}}{3 \times 10^{5}} = 500\ \text{seconds} \tag{10.74}
\]
or a bit over eight minutes before you would – very briefly – see it before being vaporized
(along with the rest of the Earth) by the burst of incredibly intense light propagating out
from the blast. Our equations above left this retardation factor out, but a more precise
143
Wikipedia: http://www.wikipedia.org/wiki/Electromagnetic tensor#Relationship with the classical fields.
Called the “second rank field strength tensor in four dimensions” in relativity theory, with symbol F µν , if you
care to look it up.
OK, so let’s rewrite the complete set of Maxwell’s Equations, but this time with Maxwell’s
teensy weensy little contribution and see if we can figure out why it is so all-fired important
that physicists speak in hushed tones when they mention Maxwell’s name, much as they
do for Newton and Einstein and a handful of others:
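\[
\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S} \rho_e\,dV \tag{10.75}
\]
\[
\oint_S \vec{B}\cdot\hat{n}\,dA = \mu_0 \int_{V/S} \rho_m\,dV = 0 \tag{10.76}
\]
\[
\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0 \left( \int_{S/C} \vec{J}\cdot\hat{n}\,dA + \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA \right) \tag{10.77}
\]
\[
\oint_C \vec{E}\cdot d\vec{\ell} = -\frac{d}{dt}\int_{S/C} \vec{B}\cdot\hat{n}\,dA \tag{10.78}
\]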
The symmetry will now be apparent if I put all of the terms involving charges as
sources of the fields on the right and all of the terms involving the fields themselves on the
left:
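\[
\oint_S \vec{E}\cdot\hat{n}\,dA = \frac{1}{\epsilon_0}\int_{V/S} \rho_e\,dV \tag{10.79}
\]
\[
\oint_C \vec{B}\cdot d\vec{\ell} - \mu_0 \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA = \mu_0 \int_{S/C} \vec{J}_e\cdot\hat{n}\,dA \tag{10.80}
\]
\[
\oint_S \vec{B}\cdot\hat{n}\,dA = 0 \tag{10.81}
\]
\[
\oint_C \vec{E}\cdot d\vec{\ell} + \frac{d}{dt}\int_{S/C} \vec{B}\cdot\hat{n}\,dA = 0 \tag{10.82}
\]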
The only asymmetry now arises from the empirical non-observation of magnetic monopoles,
and even you, humble beginning physics student that you are, can already see exactly
what we would have to do to “fix” Maxwell’s Equations if tomorrow somebody performed a
reproducible experiment that discovered them.
But this symmetry isn’t (yet) why Maxwell is cool. No, there is something much more
profound buried in these equations now. Faraday’s Law already showed us that changing
magnetic fields make electric fields. Maxwell showed us that at the same time, changing
electric fields make magnetic fields! Why is this significant? Because a changing electric
field can make a changing magnetic field that makes a changing electric field that makes
a changing magnetic field that makes – wait a minute! Is it possible that we could have an
electromagnetic wave?
It is!
To see this is a bit tricky. It is tricky because we are taking an intro course where we
have to avoid “real” differential multivariate calculus and the dread $\vec{\nabla}$ differential operator.
We have learned only the integral equation forms, which means basically that we have to
convert them into derivatives in order to end up with a wave (differential) equation for the
electric and magnetic field. Let’s get to it.
[Figure 10.5 diagram: axes x, y, z showing Ex(z) and Ex(z + ∆z) on a loop of sides ∆x and ∆z, and By(z) and By(z + ∆z) on a loop of sides ∆y and ∆z.]
Figure 10.5: Two particular components of the electric and magnetic field, in a coordinate
frame “far” from any sources and varying in space and time. The graph is a snapshot at
a particular time t, but we can imagine that Ex (z, t) and By (z, t) generally and ignore any
other variation with x or y for the moment.
Let us start, then, with no source terms in Maxwell’s equations, or rather, in a region
of space far from any sources. That doesn’t mean that the fields there are zero, only
that we don’t have to worry about how the fields were originally produced – we know that
they were somehow created by electric charges and currents but we don’t care about the
details. Maxwell’s equations are then somewhat simpler:
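\[
\oint_S \vec{E}\cdot\hat{n}\,dA = 0 \tag{10.83}
\]
\[
\oint_S \vec{B}\cdot\hat{n}\,dA = 0 \tag{10.84}
\]
\[
\oint_C \vec{B}\cdot d\vec{\ell} = \mu_0 \epsilon_0 \frac{d}{dt}\int_{S/C} \vec{E}\cdot\hat{n}\,dA \tag{10.85}
\]
\[
\oint_C \vec{E}\cdot d\vec{\ell} = -\frac{d}{dt}\int_{S/C} \vec{B}\cdot\hat{n}\,dA \tag{10.86}
\]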
as now there are no magnetic or electric monopoles present, only the fields.
Let us graph the fields on an arbitrary coordinate system and apply Ampere’s Law and
Faraday’s Law (only) to our graph. $\vec{E}$ and $\vec{B}$ have many components each, of course, and
can be varying with respect to both position and time, so we need to simplify a bit to make
sense of things. We will then imagine that either our distant source created only x-directed
electric fields and y-directed magnetic fields or that, equivalently, we are only considering
Ex and By components in particular of a more complicated field. Since the fields satisfy the
superposition principle, any results we get for this pair of components can be generalized
to any actual directions we like.
The graph is shown in figure 10.5, along with two dashed curves (bounding the shaded
surfaces) to which we will apply Ampere’s and Faraday’s Laws. We will assume that
Ex (z, t) is a function of z and t only – it may vary with respect to x or y as well, but for
the moment we’ll ignore any such variation144 . Similarly we will assume By (z, t) only. Our
graph is a snapshot at some particular time t, so we don’t bother writing t in on the figure
(but it is really there). I’m sorry if it is a bit confusing to constantly ignore variation with re-
spect to this or that variable – if/when you take multivariate calculus you’ll learn once and
for all how to deal with this sort of thing and encode it into the notion of the partial deriva-
tive but for the moment we’re working our way towards a result that should be expressed
in partial derivatives without actually using them or their (honestly, much simpler) notation.
Now let us apply Faraday’s Law to the small differential loop in the x-z plane. This loop
has an area ∆A = ∆x∆z, and we need to define a right handed normal to the loop in the y-direction
(parallel to $\vec{B}$). That means that we need to go around the loop counterclockwise
as drawn in the page. Then:
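\[
\begin{aligned}
\oint \vec{E}\cdot d\vec{\ell} &= -\frac{d}{dt}\int_{\Delta A} \vec{B}\cdot\hat{n}\,dA \\
0\cdot\Delta z + E_x(z+\Delta z)\Delta x - 0\cdot\Delta z - E_x(z)\Delta x &= -\frac{d}{dt}\left(B_y \Delta A\right) \\
\left(E_x(z+\Delta z) - E_x(z)\right)\Delta x &= -\frac{dB_y}{dt}\,\Delta x\,\Delta z \\
\frac{E_x(z+\Delta z) - E_x(z)}{\Delta z} &= -\frac{dB_y}{dt}
\end{aligned}
\tag{10.87}
\]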
where we do the loop piecewise and get no contribution when we go in the z direction
(because $\vec{E}$ is in the x-direction perpendicular to z). If we take the limit ∆z → 0 of the left
hand side this is just the definition of the derivative and we get145 :
\[
\frac{dE_x}{dz} = -\frac{dB_y}{dt}
\]
Let’s do exactly the same thing for Ampere’s Law, this time using the more lightly
shaded surface and curve in the y-z plane with area ∆A = ∆y∆z. Again we must go
144
It isn’t too difficult to imagine how such a field could be produced by (say) a distant oscillating electric
dipole in the −z direction, actually.
145
Technically, this should be expressed as partial derivatives: ∂Ex/∂z = −∂By/∂t, but since we cleverly arranged
it so that Ex is a function of only one spatial coordinate and z and t are independent, it doesn’t matter in this
case.
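\[
\vec{\nabla}\times\vec{E} = -\frac{\partial\vec{B}}{\partial t}
\]
\[
\vec{\nabla}\times\vec{B} = \mu_0 \epsilon_0 \frac{\partial\vec{E}}{\partial t},
\]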
the grown-up way of writing the source free Faraday’s and Ampere’s Laws in terms of the curl, a component
pair at a time. You can actually get all six terms in these two equations from our one original result by mentally
rotating the arbitrary right-handed coordinate system into all six independent orientations. Or you can use
Stokes’ Theorem, which we basically just derived. Since advanced students derived the partial differential
form for Gauss’s Law in the second week, we have now derived the partial differential form for the whole set
of Maxwell’s Equations, at least once the source terms are put back in...
This is all very well, but so far it is still not spectacular. To make it spectacular, we (say)
differentiate the first of these equations with respect to z:
\[
\frac{d^2 E_x}{dz^2} = -\frac{d}{dt}\left( -\mu_0 \epsilon_0 \frac{dE_x}{dt} \right) = \mu_0 \epsilon_0 \frac{d^2 E_x}{dt^2} \tag{10.93}
\]
or
\[
\frac{d^2 E_x}{dz^2} - \mu_0 \epsilon_0 \frac{d^2 E_x}{dt^2} = 0 \tag{10.94}
\]
We stare at this for a moment, our brains dulled by too much algebra. Then, through
the fog, a light begins to shine through, dim at first, then ever brighter until it rivals the sun!
Holy Smoke, Batman, haven’t we seen that equation, or one sort of like it, before?
We have! In the first part of the course we went to considerable (although much less)
pains to derive the one-dimensional wave equation for a string:
\[
\frac{d^2 y(x,t)}{dx^2} - \frac{1}{v^2}\frac{d^2 y(x,t)}{dt^2} = 0 \tag{10.95}
\]
for a y-displaced string, where the wave propagated at speed v in the ±x direction! Well, it
seems that Maxwell’s Equations tell us that the x-component of the electric field in a region
of space far from any sources satisfies a wave equation too! I wonder (you ask yourself)
what the speed of this wave is?
Well, comparing the two equations, we see that:
\[
v^2 = \frac{1}{\mu_0 \epsilon_0} = \frac{4\pi}{\mu_0\, 4\pi\epsilon_0} = \frac{4\pi}{\mu_0}\,\frac{1}{4\pi\epsilon_0} = \frac{k_e}{k_m} \tag{10.96}
\]
and if we do only a tiny bit of arithmetic with the only two constants I really required you to
memorize/learn for this part of the class we get:
\[
v^2 = \frac{9 \times 10^{9}}{10^{-7}} = 9 \times 10^{16}\ \frac{\text{meters}^2}{\text{second}^2} \tag{10.97}
\]
or:
\[
v = c = 3 \times 10^{8}\ \frac{\text{meters}}{\text{second}}. \tag{10.98}
\]
This particular speed was first estimated during the very first days of systematic scientific
exploration based on observations of variations in the period of one of Jupiter’s moons.
It was known within a few percent by the mid-1800s, and experiments were being done
that were rapidly adding significant digits to the quantity (it is currently one of the most
accurately known physical constants). This quantity is the speed of light.
The electric field wave propagates at the speed of light!
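The arithmetic is short enough to do in your head, but here it is as a few lines of Python for anyone who wants to play with it. This is only a sketch using the rounded values ke = 9 × 10^9 and km = 10^−7 quoted above; the variable names are introduced here for illustration.

```python
import math

K_E = 9.0e9   # electric constant k_e in N*m^2/C^2 (rounded)
K_M = 1.0e-7  # magnetic constant k_m in T*m/A

# Recover eps0 and mu0 from k_e = 1/(4 pi eps0) and k_m = mu0/(4 pi).
EPS0 = 1.0 / (4.0 * math.pi * K_E)
MU0 = 4.0 * math.pi * K_M

v_from_k = math.sqrt(K_E / K_M)                # equation 10.96, last form
v_from_mu_eps = 1.0 / math.sqrt(MU0 * EPS0)    # equation 10.96, first form
print(v_from_k, v_from_mu_eps)  # both close to 3e8 m/s
```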
And that, boys and girls, is why Maxwell got his name on the whole set of Maxwell’s
Equations for his one measly term. He proposed (correctly) that light is an electromagnetic
wave and in so doing, transformed the still partially disparate electric and magnetic
fields into a single unified field theory and revolutionized our understanding of, well, every-
thing. You. Me. Stuff. What isn’t made up of electric charges and doesn’t interact via the
electromagnetic interaction148 ?
Well, we haven’t quite shown all of that yet. But now you can see how it goes well
enough to complete most of what we still need to do even without my help. If we take the
second of the two equations (Ampere’s Law) and differentiate both sides with respect to z
and substitute in the first (Faraday’s Law) for the right hand side we get:
$$\frac{d^2 B_y}{dz^2} - \mu_0\epsilon_0\frac{d^2 B_y}{dt^2} = \frac{d^2 B_y}{dz^2} - \frac{1}{c^2}\frac{d^2 B_y}{dt^2} = 0 \qquad (10.99)$$
for example (you should verify this, obviously, by doing it). So yes, By (z, t) is also a wave
that propagates at the speed of light c. The two components were presented together
because they are coupled by Ampere’s and Faraday’s Laws. The variation of Ex in space
and time produces the variation of By in space and time, so that either one propagates like
a wave, but the waves are not independent. Similarly, Ey and Bx are coupled as they vary
along the z axis in time, and obviously they satisfy the same wave equation and propagate
at the same speed as well.
The rest of the course is basically devoted to understanding light as an electromagnetic
wave. Although we will restrict ourselves to “one dimensional” wave forms, we will talk a
bit about how light varies with distance as it spreads out in three dimensions from a central
source. We will think at least a bit about sources, relying heavily on the oscillating electric
dipole as a model source. As a source, the dipole has one ideal feature: It is a harmonic
source. Consequently, although light in general does not have to be harmonic, we will find
it very convenient to focus on understanding it as a harmonic wave149.
Before we study light as a harmonic wave, let’s very quickly recapitulate things we know –
or should know – about waves based on our study of waves on a string and sound waves
in the first part of the course. Recall that we showed that a very general solution to the
wave equation for waves on a string was:
$$y(x,t) = f(x \pm vt)$$
where f (u) is an arbitrary one-dimensional function. Basically any functional form that
propagates to the right or left along the x-axis was a solution to the wave equation.
Since the electric and magnetic fields both satisfy one-dimensional wave equations for
propagation along the z-axis, we can expect this to be true for them as well. Any electric
148
The correct answer: not much...
149
Even when we treat light as a non-harmonic wave, we usually begin by transforming e.g. the initial condi-
tions or boundary conditions into the harmonic/frequency/wavenumber domain, solve the problem for harmonic
waves, and then use the Fourier transform to transform back and obtain the general non-harmonic result. Of
course this once again requires more math to pursue. Physics majors, do you get the idea that you will need
more math, sooner or later? Math majors, do you see why you need to take more physics? Everybody else,
aren’t you glad you don’t need to in order to pretty much understand light waves perfectly well?
field that we can create that has some shape at time t = 0 can be made to propagate
in the ±z direction by pairing it with the appropriate magnetic field. However, most of
those arbitrary shapes are going to be very difficult to arrange, and arranging them to
occur with their correctly paired partner field even more difficult. We will thus ignore this
general solution and concentrate on a much more specific one, one tied to a particular
easy-to-imagine source.
Suppose the source of the wave we observe is indeed an oscillating electric dipole
located at the (distant) origin and aligned with the x-axis. Then we know that at any given
instant in time, if the dipole points up in the +x direction, its field curls around and points
down in the −x direction as it passes through the z-axis. At least, this was our static result.
Now, however, we see that this result can’t quite be correct. If the electric field propagates
at speed c and the dipole is oscillating, the field itself has to oscillate too, and furthermore
the “up” regions have to move away from the source at c, as do the “down” regions. In
other words, we’d expect the field to have the form of a harmonic wave:
$$E_x(z,t) = E_{0x}\sin(kz \pm \omega t)$$
where ω is the frequency of the oscillating dipole source that is producing the wave150.
We are fortunate in that this is actually a function of the form f (z ± vt)! To see this, let’s
factor the argument:
$$E_x(z,t) = E_{0x}\sin\left(k\left(z \pm \frac{\omega}{k}t\right)\right) = E_{0x}\sin\left(k(z \pm ct)\right) \qquad (10.102)$$
which has the desired form if c = ω/k. Indeed, if you substitute this harmonic wave into
the wave equation, you get:
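$$\begin{aligned}
\frac{d^2}{dz^2}E_{0x}\sin(kz \pm \omega t) = -k^2 E_{0x}\sin(kz \pm \omega t) &= \frac{1}{c^2}\frac{d^2}{dt^2}E_{0x}\sin(kz \pm \omega t) \\
&= -\frac{1}{c^2}\omega^2 E_{0x}\sin(kz \pm \omega t)
\end{aligned} \qquad (10.103)$$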
or (dividing out)
$$c^2 = \frac{\omega^2}{k^2} \qquad (10.104)$$
and c = ω/k as promised.
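If you would rather let a computer do the substitution, here is a minimal Python sketch (an illustration, not part of the original text) that estimates both second derivatives of E0x sin(kz − ωt) with central finite differences and evaluates the left-hand side of the wave equation. The 500 nm wavelength, the sample point, and the step sizes are arbitrary choices made for this example.

```python
import math

C = 3.0e8  # wave speed in m/s (rounded value from the text)

def wave(z, t, k, omega, e0=1.0):
    """Right-moving harmonic wave E0 sin(kz - omega t)."""
    return e0 * math.sin(k * z - omega * t)

def second_derivative(f, x, h):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

def wave_equation_residual(z, t, k, omega):
    """d2E/dz2 - (1/c^2) d2E/dt2, divided by k^2 so the result is dimensionless."""
    hz = 1e-3 / k        # spatial step, small compared with the wavelength
    ht = 1e-3 / omega    # time step, small compared with the period
    d2z = second_derivative(lambda zz: wave(zz, t, k, omega), z, hz)
    d2t = second_derivative(lambda tt: wave(z, tt, k, omega), t, ht)
    return (d2z - d2t / C**2) / k**2

k = 2.0 * math.pi / 500e-9   # hypothetical 500 nm wavelength
good = wave_equation_residual(1.0e-7, 2.0e-16, k, omega=C * k)        # omega/k = c
bad = wave_equation_residual(1.0e-7, 2.0e-16, k, omega=2.0 * C * k)   # omega/k = 2c

print(f"residual with omega = c k:  {good:.2e}")
print(f"residual with omega = 2c k: {bad:.2e}")
```

The residual vanishes (to within finite-difference error) only when ω/k = c, which is the content of equation 10.104.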
Again recalling our work with harmonic waves, we expect that in these equations:
$$k = \frac{2\pi}{\lambda} \qquad (10.105)$$
is the wave number of the wave, the “spatial angular frequency” in terms of the wavelength
of the wave λ, just as:
$$\omega = \frac{2\pi}{T} \qquad (10.106)$$
150
Note well that we could have equally well used E0x cos(kz ± ωt + φ) for some arbitrary phase angle φ, or
better yet $E_{0x} e^{ikz} e^{\pm i\omega t}$ where $E_{0x} = |E_{0x}|e^{i\phi}$ is an arbitrary complex amplitude. We choose to use sin(kz ± ωt)
for no other reason than to have something specific to work with, but these all satisfy the wave equation and
are equally valid possibilities. The phase angle φ in particular corresponds to determining simply the shape of
the wave when we start the “clock” of our harmonic wave in our particular reference frame.
is the temporal angular frequency of the wave in terms of its period T . Thus:
$$c = \frac{\omega}{k} = \frac{2\pi}{T}\,\frac{\lambda}{2\pi} = \frac{\lambda}{T} = f\lambda \qquad (10.107)$$
are all useful ways of relating the frequency, wavelength, angular frequency, wave number,
period, and speed of the wave. Yes, you can remember just one of these and figure out
the rest, but on an exam speed counts and I recommend learning all of these forms so
that they are second nature and you don’t have to think about them.
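As a small illustration of how these relations chain together, here is a Python sketch that starts from a vacuum wavelength and produces the other four quantities. The function name and the 600 nm example wavelength are assumptions made purely for this illustration.

```python
import math

C = 3.0e8  # speed of light in m/s (rounded value from the text)

def wave_parameters(wavelength):
    """Frequency, period, angular frequency and wave number for a vacuum wavelength in meters."""
    f = C / wavelength              # c = f lambda
    T = 1.0 / f                     # period
    omega = 2.0 * math.pi / T       # equation 10.106
    k = 2.0 * math.pi / wavelength  # equation 10.105
    return {"f": f, "T": T, "omega": omega, "k": k}

params = wave_parameters(600e-9)  # hypothetical 600 nm (orange) light
for name, value in params.items():
    print(f"{name} = {value:.4e}")
```

Whichever quantity you start from, the ratios ω/k and λ/T must both come back out as c.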
We expect that:
By (z, t) = B0y sin(kz ± ωt + φ) (10.108)
where we cannot yet assume that Ex and By have the same phase, although we do insist
(since they are parts of the same wave) that they have the same frequency. Now let’s work
some magic. We’ll restrict our interest for the moment to a wave propagating to the right:
$$E_x(z,t) = E_{0x}\sin(kz - \omega t)$$
$$B_y(z,t) = B_{0y}\sin(kz - \omega t + \phi)$$
We substitute these two forms into (your choice of) Ampere’s or Faraday’s Law in differen-
tial form. Let’s choose Faraday as being marginally simpler:
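$$\frac{d}{dz}E_{0x}\sin(kz - \omega t) = -\frac{d}{dt}B_{0y}\sin(kz - \omega t + \phi) \qquad (10.111)$$
$$kE_{0x}\cos(kz - \omega t) = \omega B_{0y}\cos(kz - \omega t + \phi) \qquad (10.112)$$
$$E_{0x}\cos(kz - \omega t) = \frac{\omega}{k}B_{0y}\cos(kz - \omega t + \phi) \qquad (10.113)$$
$$E_{0x}\cos(kz - \omega t) = cB_{0y}\cos(kz - \omega t + \phi) \qquad (10.114)$$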
In order for this to be true at every z and t, φ = 0 – the electric and magnetic fields do have to have the
same phase (and frequency and wavelength) and we have now proven this, and:
$$E_{0x} = cB_{0y}$$
The electric and magnetic fields are not independent! The magnitude, phase, and frequency
of one are determined completely by the other.
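The same conclusion can be checked numerically. The sketch below is an illustration under assumed values (a 1 m wavelength, a 100 V/m amplitude, and an arbitrary sample point): it evaluates dEx/dz + dBy/dt by central differences for a right-moving wave, first with B0y = E0x/c and φ = 0, then with a wrong phase and a wrong amplitude.

```python
import math

C = 3.0e8                  # m/s
E0 = 100.0                 # V/m, hypothetical amplitude
K = 2.0 * math.pi / 1.0    # wave number for a hypothetical 1 m wavelength
OMEGA = C * K

def e_x(z, t):
    return E0 * math.sin(K * z - OMEGA * t)

def b_y(z, t, b0, phi):
    return b0 * math.sin(K * z - OMEGA * t + phi)

def faraday_residual(z, t, b0, phi):
    """(dEx/dz + dBy/dt) / (k E0): zero when Faraday's Law is satisfied."""
    hz = 1e-5 / K
    ht = 1e-5 / OMEGA
    dE_dz = (e_x(z + hz, t) - e_x(z - hz, t)) / (2.0 * hz)
    dB_dt = (b_y(z, t + ht, b0, phi) - b_y(z, t - ht, b0, phi)) / (2.0 * ht)
    return (dE_dz + dB_dt) / (K * E0)

z, t = 0.1, 1.0e-9   # arbitrary sample point
matched = faraday_residual(z, t, E0 / C, 0.0)
wrong_phase = faraday_residual(z, t, E0 / C, math.pi / 2.0)
wrong_amplitude = faraday_residual(z, t, 2.0 * E0 / C, 0.0)

print(matched, wrong_phase, wrong_amplitude)
```

Only the matched case gives a residual consistent with zero; changing either the phase or the amplitude ratio breaks Faraday's Law.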
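This is a wave propagating to the right, as noted. Let’s try the exact same approach for
the other independent pair of components, Ey and Bx:
$$E_y(z,t) = E_{0y}\sin(kz - \omega t)$$
$$B_x(z,t) = B_{0x}\sin(kz - \omega t + \phi)$$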
Note that we have assumed nothing other than Ey is coupled to Bx (because that’s what
Ampere/Faraday tell us). Again we substitute – using the form of Faraday’s Law we derived
for Ey – and get:
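$$\frac{d}{dz}E_{0y}\sin(kz - \omega t) = \frac{d}{dt}B_{0x}\sin(kz - \omega t + \phi) \qquad (10.118)$$
$$kE_{0y}\cos(kz - \omega t) = -\omega B_{0x}\cos(kz - \omega t + \phi) \qquad (10.119)$$
$$E_{0y}\cos(kz - \omega t) = -\frac{\omega}{k}B_{0x}\cos(kz - \omega t + \phi) \qquad (10.120)$$
$$E_{0y}\cos(kz - \omega t) = -cB_{0x}\cos(kz - \omega t + \phi) \qquad (10.121)$$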
This time we see that the two fields must again be in phase (φ = 0) and that:
$$E_{0y} = -cB_{0x}$$
For a wave propagating to the right, both of the independent components of $\vec{E}$ are
related to the coupled components of $\vec{B}$ such that:
$$|\vec{E}| = c|\vec{B}| \qquad (10.123)$$
and so that the E-field crossed into the B-field points in the direction of the wave’s propagation.
That is, if we let the fingers of our right hand line up with $\vec{E}$ and curl them into $\vec{B}$,
our thumb points in the direction of propagation. This also works for waves propagating in
the −z direction, e.g. E0x sin(kz + ωt) (try it!).
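The right-hand rule is easy to test with an explicit cross product. In this minimal sketch (the helper names and unit amplitudes are assumptions for illustration), the amplitude pairs follow from Faraday's Law: By = Ex/c and Bx = −Ey/c for a right-moving wave, and By = −Ex/c for the left-moving wave E0x sin(kz + ωt).

```python
C = 3.0e8  # m/s

def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def propagation_direction(e_field, b_field):
    """Unit vector along E x B."""
    s = cross(e_field, b_field)
    norm = sum(component * component for component in s) ** 0.5
    return tuple(component / norm for component in s)

# Right-moving wave, Ex paired with By = +Ex/c
pair_1 = propagation_direction((1.0, 0.0, 0.0), (0.0, 1.0 / C, 0.0))
# Right-moving wave, Ey paired with Bx = -Ey/c
pair_2 = propagation_direction((0.0, 1.0, 0.0), (-1.0 / C, 0.0, 0.0))
# Left-moving wave, Ex paired with By = -Ex/c
pair_3 = propagation_direction((1.0, 0.0, 0.0), (0.0, -1.0 / C, 0.0))

print(pair_1, pair_2, pair_3)
```

Both right-moving pairs give a unit vector along +z, and the left-moving pair gives one along −z.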
OK, so now we have the harmonic electric and magnetic field, and both are in phase and
have amplitudes related by c. We know that there is some energy in these fields described
by the energy density of the electric and magnetic fields respectively:
$$\eta_e = \frac{1}{2}\epsilon_0 E^2 \qquad (10.124)$$
$$\eta_m = \frac{1}{2\mu_0}B^2 \qquad (10.125)$$
Now, however, that energy isn’t sitting still. It is moving, being carried by the wave from one
point to another. We can easily see that energy must be carried by the wave by imagining
a source that is turned on (the dipole moment is pulled out and released to oscillate, if
you like) at time t = 0. Some distance away from the source at first there is no field – our
“Lorentz model” atom was spherically symmetric and produced no field – and then the field
reaches it some time after the dipole is excited and starts to oscillate. No energy in that
region of space before, yes energy after, therefore energy is carried by the field from the
source to the region of space. Simple!
Naturally, we’d like to be able to compute how much energy is being carried along by
the field. To find out, we resort to what should now be a very familiar argument. In a time
∆t, all of the energy in a box of length c∆t will be carried through the cross-sectional area
A of its end. The amount of energy is:
$$\Delta U = \frac{1}{2}\left(\epsilon_0 E^2 + \frac{1}{\mu_0}B^2\right)c\,\Delta t\,A \qquad (10.126)$$
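The energy per unit area per unit time (that is, the power per unit area) that is carried through A is a quantity we define to
be the intensity of the light wave:
$$I = \frac{P}{A} = \frac{\Delta U}{A\,\Delta t} = \frac{1}{2}\left(\epsilon_0 E^2 + \frac{1}{\mu_0}B^2\right)c \qquad (10.127)$$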
Let’s do a bit of algebra. For the moment, let’s once again concentrate on our familiar
harmonic pair Ex (z, t) and By (z, t). Then $E_x^2 = E_x (cB_y)$ and $B_y^2 = B_y (E_x/c)$, so if we
Again, I’m hoping that I don’t have to do much more than this – sketch out one more
example of how the flow of a vector field through a surface is conserved and correctly
accounted for by the flux integral.
The intensity is thus the magnitude of the Poynting vector:
$$I = |\vec{S}| \qquad (10.132)$$
and is still a very useful quantity in its own right.
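A quick numerical check ties equation 10.127 to this statement. The sketch below assumes the standard definition of the Poynting vector magnitude for perpendicular fields, |S| = EB/µ0, together with a hypothetical field strength and more precise constants than the text quotes. It confirms that when E = cB the energy-density form of the intensity and |S| agree, and that the electric and magnetic energy densities are equal.

```python
import math

EPS0 = 8.8541878128e-12            # F/m (assumed value)
MU0 = 4.0e-7 * math.pi             # T m / A (assumed value)
C = 1.0 / math.sqrt(MU0 * EPS0)    # speed of light from equation 10.96

def intensity_from_energy_density(e, b):
    """Equation 10.127: total energy density times c."""
    return 0.5 * (EPS0 * e**2 + b**2 / MU0) * C

def poynting_magnitude(e, b):
    """|S| = E B / mu0 for perpendicular E and B (standard definition, assumed here)."""
    return e * b / MU0

e_field = 500.0          # V/m, hypothetical instantaneous field strength
b_field = e_field / C    # the paired magnetic field, from |E| = c|B|

eta_e = 0.5 * EPS0 * e_field**2    # equation 10.124
eta_m = 0.5 * b_field**2 / MU0     # equation 10.125

print(intensity_from_energy_density(e_field, b_field))
print(poynting_magnitude(e_field, b_field))
```

Both printed values are the same instantaneous intensity in watts per square meter.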
The Poynting vector is actually pretty much magical. For example, it doesn’t just work
with dynamic electromagnetic waves – it works for static fields as well. In fact, for your
homework you will prove that the flux of the Poynting vector into a resistor, an inductor,
and a capacitor precisely equals V I in each case: $I^2R$, LI dI/dt, and QI/C respectively. This seems
to suggest that the power that appears as heat in a resistor is actually electromagnetic
energy that flows in through the sides of the resistor, quite contrary to at least my naive
expectations. But it gets the answers we obtained other ways precisely correct – it is
difficult to argue with the conclusion.
The electromagnetic field doesn’t just carry energy – it carries momentum151. If you
recall our arguments way back when we discussed the failure of Newton’s Third Law, we
knew even then that it must be so – the missing momentum has to go someplace or
momentum violation would be ubiquitous in electromagnetism – but now we have to run it
down.
This is actually rather tricky. It isn’t easy to derive the momentum carried by the elec-
tromagnetic field, because it has no mass. The easiest way to see what it must be is to
examine the net force exerted on a point charge in an electromagnetic field. We’ll do this
(and define the associated radiation pressure) in the next section.
151
And often angular momentum as well, but that is beyond the scope of this course.
There are two arguments that make it comparatively simple to see that an electromagnetic
wave must exert a force on charged matter that it strikes at a surface. Let’s take the
simplest one first – an electromagnetic wave incident on the surface of a perfect conductor
at right angles.
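[Figure 10.6: an electromagnetic wave incident at right angles on a conducting surface. Labels in the figure: E, B (out), I (surface), F.]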
Although it is beyond the scope of this course to treat waves incident on conductors,
it is a True Fact(tm) that while conductors screen their bulk interior from electromagnetic
fields (including electromagnetic radiation) they do not do this instantly at the surface.
Just as static fields build up a static surface charge density a few atoms thick that cancels the field in the
interior, time varying fields penetrate a small distance into a
conductor (called the skin depth) before being cancelled by a time-varying charge-current
distribution confined to the surface. The skin depth depends on the frequency of the wave
and the conductivity of the material (getting smaller as either one gets larger) but is usually
at least a few atoms thick (and can be centimeters thick at very low frequencies such as
that of household current).
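To put rough numbers on this, here is a minimal Python sketch. It uses the standard good-conductor estimate δ = sqrt(2/(µ0 σ ω)), which is not derived in this text, and an assumed room-temperature conductivity for copper, so treat the output as an order-of-magnitude illustration only.

```python
import math

MU0 = 4.0e-7 * math.pi     # T m / A
SIGMA_COPPER = 5.96e7      # S/m, assumed room-temperature conductivity of copper

def skin_depth(frequency_hz, conductivity):
    """Good-conductor skin depth sqrt(2 / (mu0 * sigma * omega)) in meters."""
    omega = 2.0 * math.pi * frequency_hz
    return math.sqrt(2.0 / (MU0 * conductivity * omega))

depth_60hz = skin_depth(60.0, SIGMA_COPPER)     # household current
depth_1ghz = skin_depth(1.0e9, SIGMA_COPPER)    # microwave frequencies

print(f"60 Hz: {depth_60hz * 1e3:.2f} mm")
print(f"1 GHz: {depth_1ghz * 1e6:.2f} micrometers")
```

The depth shrinks as either the frequency or the conductivity grows, as described above.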
In figure 10.6 an electromagnetic wave is incident at right angles on a conducting sur-
face. The wave penetrates a short (grey-shaded) distance into the conductor before being
attenuated, and within this distance the electric field pushes a surface current in the direction
of the field as one expects from the relation $\vec{J} = \sigma\vec{E}$ (a form of Ohm’s Law, recall,
from our discussion of conduction and resistance). The magnetic field also penetrates
a short distance into the surface and exerts a force on this surface current. As you can
see from the figure, this force is expected to be in the direction of the wave and will be
spread out on the entire conducting surface.
This simple picture demonstrates that just as the electromagnetic wave carries energy
(per unit time), it carries linear momentum (per unit time) and exerts a force on any conduct-
ing surface it collides with. From our previous discussion of dielectrics, which also develop
a (bound) surface charge density that reduces the electric field, we expect a dielectric sur-
face to also have a (much weaker) surface current parallel to the electric field and to still
experience a force when impacted by an electromagnetic wave in direct proportion to the
$$\vec{g} = \frac{1}{c^2}\vec{S} \qquad (10.133)$$
$$\Delta p = \frac{1}{c^2}|\vec{S}|\,A\,c\,\Delta t \qquad (10.134)$$
If we divide both sides by A and ∆t, we get the force per unit area exerted on the
surface:
$$P_r = \frac{1}{A}\frac{\Delta p}{\Delta t} = \frac{|\vec{S}|}{c} \qquad (10.135)$$
This is called the radiation pressure exerted on the surface by the electromagnetic
field, assuming normal incidence and complete absorption of the wave. One then finds the
total force the usual way:
$$\vec{F} = A\frac{\vec{S}}{c} \qquad (10.136)$$
$$\vec{F} = A\frac{\vec{S}}{c}\cos(\theta) \qquad (10.137)$$
but the force is still exerted in the direction of the incident wave.
If the wave is incident on a tipped surface that reflects the wave, it exerts (again, ap-
proximately) twice the force from the radiation pressure alone, but only along a line per-
pendicular to the surface, much like the homework problem involving beads bouncing on
the pan of a balance in the Mechanics text. In this case we expect:
$$\vec{F} = 2A\frac{|\vec{S}|}{c}\cos(\theta)\,\hat{n} \qquad (10.138)$$
where n̂ is a normal unit vector pointing into the surface in question. The momentum
density of the incident wave parallel to the surface is unchanged while the momentum
density perpendicular to the surface reverses. As noted above, this is an idealization as
the reflected wave will always have slightly less energy density than the incident one if
the surface itself is moving in the general direction of the incident wave and thereby gains
energy from the wave, and will have slightly greater energy density if the surface is moving
towards the source and hence loses energy to the wave152 .
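To get a feel for the sizes involved, here is a minimal Python sketch of equations 10.135, 10.137 and 10.138, using magnitudes only. It treats the intensity as the (time-averaged) magnitude of the Poynting vector and uses an assumed value of about 1361 W/m² for sunlight near the Earth; neither number comes from this text, and the function names are hypothetical.

```python
import math

C = 3.0e8  # m/s

def radiation_pressure(intensity):
    """Equation 10.135: pressure on a perfectly absorbing surface at normal incidence."""
    return intensity / C

def force_absorbing(intensity, area, theta=0.0):
    """Magnitude of equation 10.137: absorbing surface tipped by angle theta."""
    return area * intensity / C * math.cos(theta)

def force_reflecting(intensity, area, theta=0.0):
    """Magnitude of equation 10.138: perfectly reflecting surface, force along the normal."""
    return 2.0 * area * intensity / C * math.cos(theta)

SOLAR_INTENSITY = 1361.0  # W/m^2, assumed approximate intensity of sunlight near Earth

pressure = radiation_pressure(SOLAR_INTENSITY)
sail_force = force_reflecting(SOLAR_INTENSITY, area=100.0)  # hypothetical 100 m^2 mirror

print(f"pressure on an absorber: {pressure:.2e} Pa")
print(f"force on a 100 m^2 mirror: {sail_force:.2e} N")
```

The pressure is a few micropascals, which is why you do not feel sunlight pushing on you, although it is enough to matter for a large, light solar sail.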
One question that we failed to answer in our derivation of the wave equation for electro-
magnetic radiation above is this:
Where does electromagnetic radiation come from?
At this point in the course, you should be able to see that it has to come from electric
charges, because so far the only sources for either electric or magnetic fields are electric
charges, either stationary or moving.
“Radiation”, in the form of a solution to the coupled wave equation for electric and
magnetic fields we derived above, requires two more things: The electric and magnetic
fields have to be changing in space and time and they have to be perpendicular to the
direction the radiation propagates. This strongly suggests that the charges that produce
electromagnetic radiation have to be moving.
However, our derivation of the magnetic field produced by a uniformly moving point
charge, plus a bit of reasoning, clearly shows that while a charge can produce both an
electric and a magnetic field in any inertial reference frame in which it is moving, if we
choose the specific inertial reference frame in which it is not moving, it produces only an
electrostatic field. Without a magnetic field at all (in this frame) there can be no directed
152
Or, you can think of the reflected wave being Doppler shifted by the motion of the reflector – in one case
the wavelength would be a bit longer in the reflected wave than in the incident one, so the energy
is going to be more “diluted” in space; in the other the reflected wavelength is shorter, so there are more
wavefronts per unit volume.
radiation of energy (that is, the Poynting vector associated with the fields produced by the
charge is zero if the magnetic field is zero). If there is no energy radiated in one inertial
frame, there can be no energy radiated in any other inertial frame, so we conclude that
uniformly moving charges do not radiate. This is essentially the statement that both the
kinetic energy and momentum of an isolated, uniformly moving charge must be conserved.
Well, what about accelerating charges, charges whose magnetic field cannot be made
to generally vanish in any inertial reference frame? As we will see below, the source of
electromagnetic radiation is accelerating charge! Before we attempt to derive an explicit
form for the radiation emitted by a uniformly accelerated charge, however, we have to add
one key concept to our earlier treatment of the electrostatic field. In the first three chap-
ters, we only considered the fields produced by more or less stationary charge (density)
distributions. In our treatment of magnetostatics, we similarly only considered the fields
produced by uniformly moving charge/current densities. In both of these treatments, we
could determine the electric and magnetic fields “over all space”.
When we added Faraday’s Law and Ampere’s Law with the Maxwell Displacement
Current, however, things changed. In particular, we proved above that electromagnetic
radiation must propagate in a vacuum at the speed of light, c, determined from the electric
and magnetic constants.
We will now extend this just a tiny bit. It turns out that:
Changes in the electric and magnetic fields (and potentials) produced by a source propagate away from that source at the speed of light c.
Proving this is beyond the scope of this course – it really requires the vector differential
formulation of Maxwell’s equations, the so-called “vector potential”, and gauge field theory,
but for the time being we can simply accept it as an empirically true natural law in its own
right. In other words, we can only tell that the electrostatic and magnetostatic fields and
potentials propagate away from sources at speed c because if we move them in certain
ways, we can observe the changes produced by the motion only after a lag time:
$$\Delta t = \frac{|\Delta\vec{x}|}{c} \qquad (10.139)$$
where $|\Delta\vec{x}|$ is the distance between the source point (at the time $t_e$ of emission) and the
observation point (at the time $t_o = t_e + \Delta t$). Note that this doesn’t affect any of our static
field results above for (not really) “stationary” charges or continuous static currents as
long as the charges or currents have been there long enough that their static fields have
had time to fill space within at least the distance between the sources and our measuring
apparatus.
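To see why this lag never troubled us in the lab, here is a minimal Python sketch of equation 10.139. The two distances (a one-meter benchtop and an approximate Earth-Sun distance) are assumed values chosen for illustration.

```python
C = 3.0e8  # m/s

def lag_time(distance):
    """Equation 10.139: delay between a change at the source and its arrival at the observer."""
    return distance / C

lab_lag = lag_time(1.0)        # hypothetical 1 m benchtop apparatus
sun_lag = lag_time(1.496e11)   # assumed mean Earth-Sun distance in meters

print(f"across a 1 m lab bench: {lab_lag:.2e} s")
print(f"from the Sun to the Earth: {sun_lag:.0f} s")
```

A few nanoseconds across a lab bench is far shorter than any time scale in our static experiments, while the same rule gives a delay of several minutes over astronomical distances.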
This general idea should be fairly familiar to us already from our studies of sound waves.
Because sound travels so much more slowly than light, it is common enough to e.g. see
a flash of lightning and have to wait before the clap of thunder it produces arrives at our
ears. Light is a bit trickier, because it turns out nothing carries information from one point
of space to another faster than the speed of light, so we can’t be located at one point in
space and somehow “see” that a charged particle moves far away any faster than the field
at our point of observation changes as a result of the motion!
Armed with this one small addition to our existing knowledge of electrostatics, we can
fairly simply establish the connection between an accelerated charge and the appearance
of electromagnetic radiation.
The following derivation of so-called “Larmor” radiation from an accelerated point charge is
due to J. J. Thomson, dating back to the first decade of the 20th century just about the time
Einstein was deriving special relativity. It is non-relativistic, and requires that our charges
never move at a speed comparable to the speed of light.
Figure 10.7: The field lines produced by a charge initially at rest at the origin that undergoes
a brief acceleration to a final speed ∆v = a∆τ ≪ c, at the time ∆t ≫ ∆τ that the change in
the electrostatic field reaches the sphere of radius r = c∆t. The red field lines are centered
on the moving charge, the blue field lines are centered on the charge before it began to
move, and the purple field lines represent the disjunction of the field across the spherical
shell of thickness c∆τ .
Consider a point charge q that has been located at rest at the origin of a frame S “for
a long time” (long enough that the only field visible in the nearby surrounding space is the
usual electrostatic field of a point charge centered on the origin). At the time t = 0, we will
give this charge a uniform “push” for a very short time ∆τ so that it experiences a constant
acceleration in the z direction. At the end of this time, the charge has the uniform velocity:
∆v = a∆τ (10.140)
This motion – plus our new rule above that changes to the electrostatic field can prop-
agate no faster than the speed of light – creates a kind of a disjunction in the electrostatic
field. If we are sitting at a point of observation P = (r, θ, φ) in spherical polar coordinates
(where φ is irrelevant because our z directed motion is azimuthally symmetric under rotation
around the z-axis) then the field there will remain unchanged for a time
$$\Delta t = \frac{r}{c} \qquad (10.142)$$
We will express this the other way around. For r ≥ c∆t, the purely radial electrostatic field
in the frame S will just be:
$$E_r = \frac{k_e q}{r^2} \qquad (10.143)$$
After the time ∆τ the charge is moving uniformly with speed ∆v along the z-axis. As
we know, there exists a frame S ′ co-moving with the charge (so it is at rest and at the
origin of S ′ ) and within an expanding sphere of radius r − c∆τ = c(∆t − ∆τ ), there is an
electrostatic field centered on the point charge. If we transform this field back into S, it
will look just like a radial electrostatic field whose center points back to the position of the
charge at the time ∆t.
This situation is illustrated in figure 10.7, where the (blue) field lines represent the
field at the boundary of the sphere of radius r = c∆t and beyond, and the (red) field lines all
point back to the position of the charge at time ∆t as they hit the sphere of radius r − c∆τ .
We are implicitly assuming that ∆τ ≪ ∆t, and are hence simply ignoring the tiny
displacement of the charge $\frac{1}{2}\Delta v\Delta\tau$ during the acceleration itself relative to ∆v∆t. Also,
because ∆v ≪ c, ∆v∆t ≪ c∆t. If we were to draw figure 10.7 properly to scale in this
limit, ∆v∆t would be indistinguishable from the origin, and the two dashed spheres in the
figure would form a spherical shell of thickness c∆τ centered on the origin of S!
Figure 10.8: Close-up of the geometry of the $\vec{E}$-field across the spherical shell from r = c(t − τ)
to r = ct (here t and τ stand for ∆t and ∆τ). Note well that t ≫ τ and ∆v = aτ ≪ c – the thickness of the shell and displacement
of the charge are greatly magnified to make it easy to see the relevant triangles. Labels in the figure:
Er, Eθ, c∆τ, ∆v∆t, and ∆v∆t sin θ (similar triangles).
Figure 10.8 represents these two origin-centered spheres on a scale that greatly exag-
gerates the size of ∆v∆t relative to r = c∆t, but that also preserves the geometry essential
to determining the components of $\vec{E}$ across the transitional spherical shell. Recalling the
rule that electrostatic field lines can begin or end only upon a charge, the (purple) $\vec{E}$ field
lines connect the lines emitted from the charge at the common angle θ (relative to the z-
axis) on either side of the spherical shell of thickness cτ . It is this purple connecting field
that is responsible for radiation, and we need to estimate its spherical polar coordinates.
As we proceed, we will effectively eliminate both ∆τ and ∆t from the result in favor of the
acceleration a and r.
First, finding the radial field strength inside the shell is simple enough. We use the
binomial expansion in the limit (established above) that ∆t ≫ ∆τ to approximate the radial
field strength Er at the inner sphere of radius c(∆t − ∆τ ):
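$$\begin{aligned}
E_r &= \frac{k_e q}{\left(c(\Delta t - \Delta\tau)\right)^2} \\
&= \frac{k_e q}{(c\Delta t)^2\left(1 - \frac{\Delta\tau}{\Delta t}\right)^2} \\
&\approx \frac{k_e q}{(c\Delta t)^2}\left(1 + 2\frac{\Delta\tau}{\Delta t} + \ldots\right) \\
&\approx \frac{k_e q}{(c\Delta t)^2} = \frac{k_e q}{r^2}
\end{aligned} \qquad (10.144)$$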
which is the radial field strength at the outer sphere of radius c∆t. This establishes that
Er is effectively continuous across the spherical shell where the field direction “suddenly”
shifts due to the acceleration of the charge and in any event produces no radiation of
energy.
On the other hand, Eθ is the component perpendicular to $\vec{r}$ that cannot be shifted away
by any inertial reference frame change and it is this component that we expect to result
in electromagnetic radiation in the form of a nonzero radial Poynting vector. We can find
this from the fact that in figure 10.8 the (purple) $\vec{E}$-field component triangle and the (green)
triangle describing the relative displacement of charge location perpendicular to r and the
distance light travels during τ are similar:
$$\frac{E_\theta}{E_r} = \frac{\Delta v\,\Delta t\,\sin\theta}{c\,\Delta\tau}$$
If we multiply by c/c = 1, turn c∆t on top into r, and identify a = ∆v/∆τ, this becomes:
$$E_\theta = \frac{k_e q a \sin\theta}{c^2 r} = \frac{q a \sin\theta}{4\pi\epsilon_0 c^2 r} \qquad (10.146)$$
where I’ve converted to a form in terms of $\epsilon_0$ for later convenience. Note well! This
electrodynamic component of the $\vec{E}$-field is inversely proportional to r! It diminishes more
slowly than the radial electrostatic field itself with r.
Next we form the magnitude of the instantaneous (not time-averaged!) Poynting vector
associated with this component of $\vec{E}$ perpendicular to $\vec{r}$:
$$|\vec{S}| = \epsilon_0 c E_\theta^2 = \frac{q^2 a^2 \sin^2\theta}{16\pi^2\epsilon_0 c^3 r^2} = \frac{P_0}{r^2}\sin^2\theta \qquad (10.147)$$
where:
$$P_0 = \frac{q^2 a^2}{16\pi^2\epsilon_0 c^3} \qquad (10.148)$$
has units of power (watts). The magnitude of the Poynting vector is (recall) the intensity –
power per unit area – of the electromagnetic radiation propagating away from the charge
as it arrives a distance r = ct away from its position where the acceleration occurred.
We can compute the total rate at which the charge radiated energy while it was ac-
celerating by integrating the flux of the Poynting vector through the sphere of radius r in
spherical polar coordinates:
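$$P_{\rm tot} = \int_0^{2\pi}\!\!\int_0^{\pi} \frac{P_0}{r^2}\sin^2\theta\; r^2\sin\theta\, d\theta\, d\phi = 2\pi P_0\int_{-1}^{1}(1 - x^2)\,dx = \frac{8\pi P_0}{3} \qquad (10.149)$$
$$P_{\rm tot} = -\frac{dW}{dt} = \frac{q^2 a^2}{6\pi\epsilon_0 c^3} \qquad (10.150)$$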
(where here and immediately below, W stands for the energy in the radiated field, not
necessarily “work” per se).
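The angular integral in equation 10.149 can be checked numerically. The sketch below is an illustration: the midpoint-rule integrator, the electron charge, and the acceleration of 10^20 m/s² are assumptions chosen for this example. It integrates the Poynting flux of equation 10.147 over a sphere and compares the result with the closed form in equation 10.150.

```python
import math

EPS0 = 8.8541878128e-12   # F/m (assumed value)
C = 2.99792458e8          # m/s
Q_E = 1.602176634e-19     # C, elementary charge (assumed value)

def p0(q, a):
    """Equation 10.148."""
    return q**2 * a**2 / (16.0 * math.pi**2 * EPS0 * C**3)

def total_power_numeric(q, a, n=2000):
    """Integrate (P0/r^2) sin^2(theta) over a sphere of radius r (r^2 cancels)."""
    dtheta = math.pi / n
    theta_integral = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta            # midpoint rule in theta
        theta_integral += math.sin(theta)**3 * dtheta
    return 2.0 * math.pi * p0(q, a) * theta_integral   # phi integral gives 2 pi

def larmor_power(q, a):
    """Equation 10.150."""
    return q**2 * a**2 / (6.0 * math.pi * EPS0 * C**3)

accel = 1.0e20   # m/s^2, hypothetical acceleration of an electron
numeric = total_power_numeric(Q_E, accel)
closed_form = larmor_power(Q_E, accel)

print(f"numeric:     {numeric:.6e} W")
print(f"closed form: {closed_form:.6e} W")
```

Because the power scales as a², doubling the acceleration quadruples the radiated power.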
This result is called Larmor’s Formula and represents the rate at which one has to
do extra work while accelerating the charge q. Some of the work goes into increasing
the kinetic energy of the (massive) charged particle, but some of it is lost in the form of
radiation from the particle as it radiates!
From Newton’s third law, then, the emitted radiation field at the location of the charge
must therefore itself act back on the charge as something called radiation resistance.
Radiation resistance looks suspiciously like a “self-force” and is worthy of far more discus-
sion than we can reasonably include here153 .
Apparently, if one takes an electric charge and accelerates it, it radiates away electro-
magnetic energy. Charge moving at a constant velocity (which is a frame transformation
away from being charge at rest) does not radiate energy. It may produce an electric and
magnetic field, but that field is guaranteed not to carry any energy away. Only when it
accelerates does the charge radiate (and of course, there is no inertial frame that can get
rid of that acceleration, so the radiation occurs in all frames).
Well, when do charges accelerate? One place is inside a linear accelerator used to
study particle physics. Another is when a charged particle strikes a medium and slows
to rest as it interacts with it (producing so-called “braking radiation”154 ). They accelerate
if they are oscillating harmonically (see the next section). They accelerate if they move
around in circles or follow any sort of curved path even at constant speed – circular motion
is just harmonic oscillation in two dimensions simultaneously, and that pesky centripetal
153
Wikipedia: http://www.wikipedia.org/wiki/Abraham-Lorentz force. The actual force is called the Abraham-Lorentz
force, and it appears as a third order differential equation, $\vec{F}_{\rm rad} = \frac{q^2}{6\pi\epsilon_0 c^3}\frac{d\vec{a}}{dt}$. Equations of this sort
have various problems with causality and have runaway solutions that must be excluded as non-physical, but
in any event (as it turns out) the Universe doesn’t explode with free energy due to radiation reaction...
154
Wikipedia: http://www.wikipedia.org/wiki/Bremsstrahlung. Which sounds much cooler in German.
acceleration qualifies as one that would radiate energy continuously and not just in part of
its cycle or when speeding up or braking linearly.
The Larmor formula, historically, has been one of the most profound results in all of
physics, as it established a fundamental inconsistency between Newtonian classical me-
chanics and Maxwell’s Equations. What it adds up to is this: There is no obvious way to
make a model for an atom made up of an electrostatically bound state of electron(s)
and proton(s) that does not involve orbiting or oscillating charge! No non-obvious
way either, at least not classically, especially not one that agrees with the observation
that atoms do radiate electromagnetic energy, but only at certain fairly sharp energies and
frequencies!
In fact, if you build a simple model for a hydrogen atom consisting of a proton being
orbited by a light electron in an orbit that is initially roughly the right size, you find that it
collapses, with the electron spiralling into the proton while it radiates away energy, in less than a
nanosecond. A strictly Newtonian classical Universe based on Maxwell’s equations would
last just about that long!
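You can estimate this collapse time yourself. The following Python sketch is a rough, non-relativistic illustration and not the author's calculation: it assumes a circular orbit that shrinks slowly, with total energy E = −ke e²/(2r), centripetal acceleration a = ke e²/(m r²), and the Larmor formula for the power radiated. The constants, the starting radius (the Bohr radius), and the stopping radius of 10⁻¹⁵ m are all assumed values.

```python
import math

K_E = 8.9875517923e9      # N m^2 / C^2
Q_E = 1.602176634e-19     # C
M_E = 9.1093837015e-31    # kg, electron mass
C = 2.99792458e8          # m/s
A0 = 5.29177210903e-11    # m, Bohr radius used as the starting orbit size

def radial_speed(r):
    """dr/dt for a slowly shrinking circular orbit that loses energy by Larmor radiation."""
    accel = K_E * Q_E**2 / (M_E * r**2)                       # centripetal acceleration
    power = 2.0 * K_E * Q_E**2 * accel**2 / (3.0 * C**3)      # Larmor formula, k_e form
    dE_dr = K_E * Q_E**2 / (2.0 * r**2)                       # from E = -k_e e^2 / (2 r)
    return -power / dE_dr

def collapse_time(r_start, r_end, steps=20000):
    """Integrate dt = dr / |dr/dt| from r_end up to r_start with the midpoint rule."""
    dr = (r_start - r_end) / steps
    total = 0.0
    for i in range(steps):
        r = r_end + (i + 0.5) * dr
        total += dr / abs(radial_speed(r))
    return total

t_collapse = collapse_time(A0, 1.0e-15)   # stop at a hypothetical nuclear-scale radius
print(f"classical collapse time: {t_collapse:.2e} s")
```

The model breaks down near the end of the spiral, where the electron would become relativistic, but almost all of the time is spent at large radius, so the estimate is insensitive to where you stop.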
Of course, the visible Universe has been around for approximately fourteen billion years
according to the most recent theories and observations155 . We are forced to conclude:
In either case nearly everything we’ve learned over the last two semesters is wrong, or, to
be more charitable, approximate and/or incomplete.
Too bad, ladies and gentlemen. Maxwell’s Equations appear to be more or less correct,
although they do need to be slightly reexpressed in non-classical mechanics and ultimately
augmented with other fields. Classical Mechanics is not. At this point you should visualize
Isaac Newton (who was reportedly a dangerous man to cross in an argument and who
had been elevated to a state of near worship for his manifold contributions to mathematics
and science in general) spiralling down to the earth like Icarus, wings melted, never to rise
again as high as he was before Maxwell and Faraday made their momentous discoveries
and their successors used them and additional evidence (such as the discovery of the
electron and atomic nucleus) to arrive at this stark conclusion.
Over the last 140 years or so Newtonian mechanics (or its classical near-equivalents,
Lagrangian and/or Hamiltonian mechanics) has been replaced by quantum mechanics, a
wave-like theory of matter that is, to say the least, a lot more complicated than the relatively
simple $\vec{F} = m\vec{a}$ that has governed nearly everything we have learned so far. Indeed,
this is almost a good point to bridge over to a study of so-called “modern physics” – the
non-Newtonian mechanics and non-Galilean relativity of the late 19th and 20th century –
but there are a few more things we need to study first concerning light as a classical
electromagnetic wave.
155
Wikipedia: http://www.wikipedia.org/wiki/Age of the universe. Note that there is some uncertainty in the
most current estimate of 13.772 billion years old, and that even the error estimates on this number themselves
are based on assumptions that could be overturned by, for example, a better understanding of so-called “dark
matter” and “dark energy” or physics beyond the Standard Model.
498 Week 10: Maxwell’s Equations and Light
Before we embark on a study of light per se, it is worth noting that nearly all of the elec-
tromagnetic radiation we directly observe comes from something that can be remarkably
successfully modeled by oscillating dipoles acting as sources of electromagnetic radiation,
predominantly electric dipoles at that. Our Lorentz model “linear response” polarization of
an atom (or molecule) turns out to be very useful (all the way into graduate classical me-
chanics) for understanding many of the “generic” properties of radiating atoms and matter
interacting with a time dependent electromagnetic field.
Let’s return to our undamped Lorentz model atom from our discussion of dielectrics. If
we polarize the atom in the z-direction so that its dipole moment is initially p0 = qz0 for
some (relatively mobile/polarizable) charge q, and release it, we’d expect that charge to
oscillate harmonically, z(t) = z0 cos ωt. In turn this means that the charge would accelerate
harmonically:
$$a(t) = -\omega^2 z(t) = -z_0 \omega^2 \cos\omega t \qquad (10.151)$$
so that:
$$q a = -q z_0 \omega^2 \cos\omega t = -p_0 \omega^2 \cos\omega t \qquad (10.152)$$
However, we just observed that accelerated charges radiate! This means that instead
of oscillating harmonically without damping, the atom would radiate the total energy of the
oscillator away in the form of dipole radiation
$$I_{\rm avg}(r, \theta) = \frac{p_0^2 \omega^4}{32\pi^2 \epsilon_0 c^3 r^2} \sin^2\theta \qquad (10.154)$$
As before, we can integrate over the entire solid angle to get the Larmor formula for the
power emitted from the dipole:
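$$\frac{dW_{\rm tot}}{dt}(t) = -\frac{q^2 a^2}{6\pi \epsilon_0 c^3} = -\frac{q^2 z_0^2 \omega^4}{6\pi \epsilon_0 c^3} \cos^2\omega t \qquad (10.155)$$
$$P_{\rm avg} = \frac{dW_{\rm avg}}{dt} = -\frac{p_0^2 \omega^4}{12\pi \epsilon_0 c^3} \qquad (10.156)$$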
This radiation acts much like a damping force, removing the energy of the oscillator not
through “frictional” losses into some medium but through direct radiation of energy into the
electromagnetic field. If one models this loss as a linear damping term, one obtains for-
mulas for the dispersion and scattering of electromagnetic radiation that are the dynamical
equivalent of the dielectric polarization response we studied before.
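As a quick numerical sanity check (a sketch only, not part of the derivation), one can integrate the average intensity (10.154) over a sphere and compare the result with the magnitude of the average power in (10.156). The dipole moment and angular frequency below are hypothetical values chosen just to have numbers to plug in, and the function names are my own.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
C = 2.99792458e8         # speed of light in vacuum, m/s

def intensity(p0, omega, r, theta):
    """Time-averaged dipole intensity of eq. (10.154), in W/m^2."""
    return (p0**2 * omega**4 * math.sin(theta)**2
            / (32 * math.pi**2 * EPS0 * C**3 * r**2))

def larmor_average_power(p0, omega):
    """Magnitude of the average radiated power of eq. (10.156), in W."""
    return p0**2 * omega**4 / (12 * math.pi * EPS0 * C**3)

def integrated_power(p0, omega, r, n=2000):
    """Midpoint-rule integral of the intensity over a sphere of radius r."""
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * dtheta
        # Area of the ring between theta and theta + dtheta.
        ring_area = 2 * math.pi * r**2 * math.sin(theta) * dtheta
        total += intensity(p0, omega, r, theta) * ring_area
    return total

p0 = 1.6e-29    # hypothetical: one electron charge displaced by 1e-10 m (C m)
omega = 3.0e15  # hypothetical angular frequency (rad/s)

numeric = integrated_power(p0, omega, r=1.0)
analytic = larmor_average_power(p0, omega)
print(numeric, analytic, numeric / analytic)
```

The ratio should come out very close to one, and it should not depend on the radius of the sphere chosen, since the 1/r^2 in the intensity cancels the r^2 in the area.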
You can see from the above that the total power radiated from an oscillating dipole scales
with the fourth power of the angular frequency of the oscillator. This is an enormously
interesting result! As we will see in the next chapter on light, electromagnetic radiation
polarizes atoms or molecules in the direction of the $\vec{E}$-field of the incident radiation. The
atom or molecule acts like a driven oscillator, with a dipole moment oscillating at the same
frequency as the incident radiation, and reradiates energy it absorbs from the radiation
field, scattering the radiation in all directions distributed like sin2 θ relative to its axis of
polarization.
All things being equal, then, an atom or molecule oscillating in the violet end of the
spectrum, with a frequency close to twice the frequency of red light, will reradiate almost
16 times as much power as one oscillating in the red part of the spectrum.
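The arithmetic behind that factor of 16 is just the fourth power of the frequency ratio. A minimal sketch follows; the function name is my own, and the 400 nm and 700 nm endpoints in the second example are round numbers assumed for illustration rather than values taken from this section.

```python
def scattered_power_ratio(f_high, f_low):
    """Ratio of reradiated dipole power at two frequencies (power ~ omega^4)."""
    return (f_high / f_low) ** 4

# Frequency doubling, e.g. 4e14 Hz (red end) to 8e14 Hz (violet end).
print(scattered_power_ratio(8e14, 4e14))

# Assumed wavelengths of 400 nm and 700 nm: since f = c / lambda,
# the frequency ratio is 700/400.
print(scattered_power_ratio(700.0 / 400.0, 1.0))
```

The first case gives exactly 16; the second gives a bit more than 9, which is why "almost 16" should be read as an upper estimate across the visible band.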
White sunlight is made up of all the frequencies of visible light, but red-orange light
tends to go through the atmosphere unscattered, whereas blue-violet light tends to be
scattered sideways. This makes direct, originally white, sunlight appear “oranger” by the
time it has penetrated the atmosphere, while the sky we see when we look in any
direction but at the sun is blue because the scattered light contains more blue-violet
wavelengths than it does red-orange ones.
This is also, of course, why sunsets and sunrises appear red. The sunlight comes in
at an oblique angle and traverses a much longer distance in the atmosphere than sunlight
from straight overhead does. The blue-violet component is scattered out to the sides,
leaving behind the far less attenuated red-orange light. On a particularly clear day, you
can almost see a pastel “rainbow” of colors at sunset, where the colors vary from dark red
up through violet as the light you see from the sky increases the path length it must follow
from the sun, through the atmosphere, to your eye.
There are several things you can observe in the real world that reflect this. Emergency
lighting, stop lights, brake lights (and more) tend to be red because red light can penetrate
anything from a clear atmosphere to a foggy, dusty atmosphere much farther than any
other visible color. On the other hand, the little lights they use on airport runways to direct
planes on the ground tend to be blue, because blue lights are more or less invisible unless
you are right on top of them! Landing planes cannot see them or be distracted by them –
only planes almost on top of them can see them at all.
The next time you are driving on a foggy/rainy/dusty night, look at the stop
lights along your route. When they are red, you will be able to fairly clearly see them,
with relatively little of a scattered light “halo” surrounding them. When they turn green,
on the other hand, the green light will be surrounded by a substantial blue-green halo of
scattered light! Similarly, if you look at red LEDs on (say) electronics in your home, you
will see the LEDs as point-like sources in the dark, with little halo. Blue LEDs on the same
appliance will be surrounded by a substantial blue halo from the ω 4 increase in the intensity
of scattered radiation at all angles BUT the primary line between the source and your eye!
We will not pursue the theory of dipole radiation any further at this time, but we do want
to apply our result to the radiation pattern we expect in a physically important case:
Figure 10.9: A simple “dipole antenna”. An alternating voltage V0 sin ωt is applied to two simple
wires, creating an electric dipole whose moment varies harmonically in time. The field is
observed at a distance r and an angle θ from the antenna axis.
We can use the results above for a single charge oscillating as part of a stationary
dipole, together with the superposition principle, to get the Poynting vector and Larmor
formula for an arbitrary z-directed oscillating dipole moment, one made up of the collective
oscillation of many charges in the usual coarse-grained limit.
With this concept in hand, consider figure 10.9, a dipole antenna156 . An alternating
voltage at angular frequency ω is applied to two conducting rods separated in the middle
by a small gap. The top rod is first charged up to be positive, then negative, then positive,
as the voltage applied to the two rods oscillates, with the bottom rod charging up the
exact opposite way. This makes the two rods at least approximately into a harmonically
oscillating dipole moment centered between the two rods:
$$\vec{p}(t) = p_0 \hat{z} \cos\omega t \qquad (10.157)$$
In order for the previous section (that really only worked for “pointlike” dipoles, ones
where pz = qz0 with z0 ≪ λ = c/f for the frequency of electromagnetic radiation being
produced) to apply, we will assume that the length of the dipole antenna is much smaller than
a wavelength; otherwise we would have to work much harder (as you may well do in some
future course). With this done, we can simply apply the results obtained above, substituting
the collective dipole moment of the antenna for the dipole moment of the oscillating point
charge above:
$$\vec{S}(r, \theta) = \frac{p_0^2 \omega^4 \cos^2\omega t}{16\pi^2 \epsilon_0 c^3 r^2} \sin^2\theta \, \hat{r} \qquad (10.158)$$
and average over time as before to get the average intensity radiated in the r̂ direction at
the angle θ relative to the z-axis:
$$I_{\rm dipole}(r, \theta) = \frac{p_0^2 \omega^4}{32\pi^2 \epsilon_0 c^3 r^2} \sin^2\theta = \frac{3 P_{\rm tot}}{8\pi} \frac{\sin^2\theta}{r^2} \qquad (10.159)$$
156
Wikipedia: http://www.wikipedia.org/wiki/Dipole antenna. If you follow this link, learn at least one thing
from the article: Classical dipole antennae are a lot more complicated than what I present here, which is
something that is at best valid – as noted – only in the limit that the dipole is much smaller than a wavelength
and oscillates more or less like a “perfect dipole”.
where
$$P_{\rm tot} = -\frac{dW}{dt} = \frac{p_0^2 \omega^4}{12\pi \epsilon_0 c^3} \qquad (10.160)$$
is the total average power radiated from the antenna.
Note well that in order for the radiation field to take on this nice form, we also need to
be far enough away that the assumption for r at the point of observation that we built into
our derivation of the Larmor formula continues to hold, that is, that we are in the “far zone”
where r ≫ z0 with z0 the maximum displacement of charge from the origin in our antenna.
If this condition is satisfied, we expect the radiation to at least have an angular distribution of
sin2 θ relative to the axis of the dipole itself and drop off like 1/r 2 with distance. Otherwise,
if we are too near the dipole, or if the dipole is too large relative to the wavelength, then –
you guessed it! – we have to work a lot harder and our simple geometric form and estimate
above (while usually not completely crazy) won’t necessarily be particularly quantitatively
accurate.
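To get a feel for the far-zone result (10.159), here is a minimal sketch that evaluates the dipole intensity at a few angles and compares it with what an isotropic source of the same total power would give. The 100 W power and 500 m distance are hypothetical numbers, and the comparison with an isotropic radiator is my own addition for illustration; it is only meaningful under the short-dipole, far-zone assumptions stated above.

```python
import math

def dipole_intensity(p_tot, r, theta):
    """Eq. (10.159): I = 3 P_tot sin^2(theta) / (8 pi r^2), in W/m^2."""
    return 3.0 * p_tot * math.sin(theta)**2 / (8.0 * math.pi * r**2)

def isotropic_intensity(p_tot, r):
    """Intensity if the same power were spread evenly over a sphere."""
    return p_tot / (4.0 * math.pi * r**2)

p_tot = 100.0  # hypothetical total average radiated power, W
r = 500.0      # hypothetical distance from the antenna, m

for degrees in (0, 30, 60, 90):
    theta = math.radians(degrees)
    i_dip = dipole_intensity(p_tot, r, theta)
    ratio = i_dip / isotropic_intensity(p_tot, r)
    print(f"theta = {degrees:2d} deg  I = {i_dip:.3e} W/m^2  I/I_iso = {ratio:.3f}")
```

In the plane perpendicular to the antenna (θ = 90 degrees) the dipole delivers 3/2 the isotropic intensity, while along its own axis it delivers none at all.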
This is a great place to stop talking about sources (pending some possible future course
in Electrodynamics where physics majors will fill in many of the missing details). Physics
major or not, understanding that:
• electromagnetic radiation in the forms that are most critical to our biology and tech-
nology comes from oscillating electric dipoles;
are the two critical facts you need to kickstart our explanation of that particular band of
electromagnetic radiation we refer to as light as well as the rest of the spectrum.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
As always, we need to rederive the principal results of the week on our own for
homework (has it occurred to you yet that this is one of the things we are doing?). So let’s start
by using Maxwell’s equations to show for a z-directed plane wave (where $\vec{E}$ and $\vec{B}$ are
independent of x and y) that:
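$$\frac{\partial E_x}{\partial z} = -\frac{\partial B_y}{\partial t} \qquad (10.161)$$
$$\frac{\partial B_y}{\partial z} = -\mu_0 \epsilon_0 \frac{\partial E_x}{\partial t} \qquad (10.162)$$
and
$$\frac{\partial E_y}{\partial z} = \frac{\partial B_x}{\partial t} \qquad (10.163)$$
$$\frac{\partial B_x}{\partial z} = \mu_0 \epsilon_0 \frac{\partial E_y}{\partial t} \qquad (10.164)$$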
and from this show that (Ex , By ) and (Ey , Bx ) both satisfy the wave equation for a z-
directed wave.
Problem 3.
$$\frac{\partial^2 f}{\partial z^2} - \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2} = 0 \qquad (10.165)$$
Show (by drawing appropriate pictures that convince you that it is true so that you
understand it) that these are left and right propagating waves respectively.
Finally show that F0 cos(kz ± ωt) is a function that has this form, so that harmonic
travelling waves manifestly satisfy the wave equation!
Problem 4.
(Figure: the sun, with the light sail a distance R from its center.)
Some science fiction stories, notably ones by Larry Niven, portray space travel around
the solar system occurring with no expenditure of reaction fuel using a light sail. A light
sail is an enormous, extremely thin, perfectly reflecting mirror arranged like a parachute
so that it can ”lift” a payload/space capsule attached to the sail by shroud lines. Radiation
pressure from sunlight exerts a force on the sail sufficient to lift the mass directly out from
the sun, and by altering the angle of the sail one can ”tack” in arbitrary directions.
This problem analyzes the plausibility of this proposal. Start by computing the force
exerted by sunlight on a perfectly reflecting sail at normal incidence a distance R away
from the center of the sun. Note well that a reflecting sail will exert twice the force that an
absorptive sail would (why?). Next, make a reasonable assumption for the density of the
sail material and compute the maximum thickness of a sheet of it that is capable of lifting
its own weight against the gravitational pull of the sun. Using this information, you decide if
the idea of sailing directly away from the sun (with or without a payload) is plausible. Does
your answer depend on how far away from the sun you are?
Of course, this simple no-orbit radial model is naive. In reality, the starting and ending
point of any journey are orbits around the sun; a payload won’t fall into the sun even if it
has no light sail at all as long as it is in a solar orbit, and one has to do a lot of work on a
mass to take it out of a solar orbit if it starts in one.
In general, to go from one orbit to another, it suffices to add energy (and angular
momentum in the proper measure) to the orbiting object (or take them away, of course)
in the correct direction using an angled light sail. Making any assumptions that you like,
make an argument for or against light sails as a means of moving a significant payload
mass between earth orbit and a lunar orbit, or between earth orbit and an orbit around/near
Mars without the expenditure of fuel.
In a nutshell, what is the maximum plausible transverse acceleration one can expect to
achieve using a light sail of reasonable thickness angled at θ with respect to the sun, for a
payload of (say) 1 metric ton (1000 kg)? How large a light sail do you need to achieve
that result?
The power output of the sun is $3.8 \times 10^{26}$ watts, and its mass is $2.0 \times 10^{30}$ kilograms. If
you need it, the mean radius of earth’s orbit is $R = 1.5 \times 10^{11}$ meters.
Problem 5.
Consider a resistor capped with perfectly conducting ends. The resistor is a cylinder of
radius a and length L and is filled with a material of resistivity ρ. A voltage V is hooked up
across the resistor so that current flows.
d) Find the magnetic field as a function of distance from the cylinder axis inside the
resistive material (assume that its permeability is µ0 ).
f) Evaluate the flux of the Poynting vector through that surface. Simplify it so that it is
given in terms of I and R. Surprise! The Poynting vector precisely predicts Joule
heating!
Problem 6.
Let’s work out an interesting fact about the solar wind. Consider a spherical grain of dust of
radius R with a “reasonable” mass density of 1000 kg per cubic meter (the density of water).
Given the mass of the sun (see problem above), your knowledge of G (the gravitational
constant) and the insight that the radiation pressure from sunlight is approximately exerted
on the transverse cross-sectional area of the sphere $\pi R^2$, determine the radius $R_c$ for which
the force exerted by light pressure away from the sun exactly balances the gravitational
force towards the sun.
Will particles larger than this (smaller than this) fall into or be pushed away from the
sun? Note well that this differential force is exerted no matter how far away from the sun
one travels, so particles pushed away are accelerated all the way! This explains why
small particles (gas molecules, dust particles) are accelerated away from stars, forming a
constant “wind” of microparticle radiation.
Problem 7.
Suppose you have a long solenoid (of length L, with n = N/L turns per unit length and
radius R) carrying a time varying current $I(t) = I_0 (1 - e^{-t/\tau})$.
b) Find the induced electrical field at an arbitrary point inside the solenoid (say, at a
distance r from its axis).
c) Find the magnitude and direction of the Poynting vector on an imagined surface of
constant radius just inside the windings at radius R.
d) Compute the flux of the Poynting vector into the volume of the solenoid.
e) Compute the total magnetic energy of the solenoid, and show that the flux of the
Poynting vector equals the rate at which this energy changes.
Problem 8.
A vertical cell phone radio tower acts as a dipole antenna. Suppose such a tower is located
1 km away from your cell phone. It radiates a power of 1 kilowatt. What is the approximate
intensity of this radiation when it reaches your phone? Now consider your phone. Its dipole
antenna radiates roughly one watt when it operates. What is the radiation intensity of your
cell phone back at the tower?
Problem 9.
Optics
Week 11: Light
n is called the index of refraction of the medium. You need to know the following
approximate indices of refraction to work problems: Air: na ≈ 1. Water: nw ≈ 4/3.
Glass: ng ≈ 3/2. Any others needed will be given in the problem in context.
• The index of refraction is not constant – it varies with the frequency of the light: n(ω),
a phenomenon known as dispersion.
• In the visible range, for most common transparent materials (e.g. normal glass, water,
plastic) n(red) < n(violet), that is, the index of refraction increases with frequency
across the visible spectrum. One can, however, engineer glasses where the oppo-
site is true. Dispersion curves in general have distinct ranges where the index of
refraction increases or decreases with frequency across the entire range of electro-
magnetic radiation frequencies.
θi = θℓ (11.2)
• Snell’s Law:
n1 sin(θ1 ) = n2 sin(θ2 ) (11.3)
• Fermat’s Principle:
Light takes the path that minimizes the time of flight between any two points. Both
the law of reflection and Snell’s law can be derived from Fermat’s principle.
• Polarization:
We describe the orientation and phase of the two components of the electric field
component for a given fixed harmonic frequency as the polarization of the harmonic
wave.
• Unpolarized Light:
Unpolarized light is light for which the polarization vector is constantly shifting its
direction around. On average, unpolarized light has its energy/intensity equally dis-
tributed between the two independent directions of polarization.
• Linear Polarization:
Linear polarization occurs whenever the electric field vector oscillates consistently in
a single vector direction in the plane perpendicular to propagation.
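$$\vec{E}(z, t) = \frac{\sqrt{2}}{2} E_0 \hat{x} \sin(kz - \omega t \pm \pi/2) + \frac{\sqrt{2}}{2} E_0 \hat{y} \sin(kz - \omega t)$$
$$\vec{E}(z, t) = \frac{\sqrt{2}}{2} E_0 \left( \pm\hat{x} \cos(kz - \omega t) + \hat{y} \sin(kz - \omega t) \right) \qquad (11.5)$$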
There are two independent helicities of circularly polarized light: right (clockwise/+)
and left (anticlockwise/−) when facing in the direction of propagation.
In this expression, E0x and E0y may or may not be equal, and the phases δx and δy
may or may not be zero or equal.
$$E_{\rm transmitted} = \vec{E} \cdot \hat{t} = E_{\rm incident} \cos(\theta) \qquad (11.8)$$
where θ is the angle between the direction of linear polarization of the incident light
and a unit vector along the transmission axis. This implies that the transmitted inten-
sity is given by:
$$I_{\rm transmitted} = I_{\rm incident} \cos^2(\theta) \qquad (11.9)$$
This result is known as Malus’s law.
• Polarization by Scattering:
Rays scattered more or less at right angles to an atom, molecule, or speck of dust
are linearly polarized perpendicular to the plane of scattering.
• Polarization by Reflection:
Light that is reflected at a non-normal angle from a dielectric surface is (partially or
completely) polarized parallel to the surface, which is also perpendicular to the
plane of reflection. Light transmitted into the new medium is partially polarized the
opposite way (by subtraction).
The reflected light is completely polarized when the light is incident at the Brewster
angle, where the reflected and refracted rays are perpendicular to each other, given
by:
$$\tan(\theta_b) = \frac{n_2}{n_1} \qquad (11.10)$$
• Polaroid Sunglasses:
Reflected glare from any smooth surface and scattered glare at midday are both likely
to be at least partially polarized parallel to the ground. Both are thus blocked by a
pair of polaroid sunglasses with a vertical transmission axis.
for an approaching (-) or receding (+) source describes the general moving source
doppler shift in the frequency/color detected by the receiver.
• Cerenkov Radiation:
The ”light boom” given off by a charged particle moving faster than the speed of light
in a medium is called Cerenkov radiation.
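Two of the summary relations above, Malus's law (11.9) and the Brewster angle (11.10), are easy to evaluate numerically. The sketch below uses the approximate indices quoted in this summary (air 1, water 4/3, glass 3/2); the function names are my own, and the last check uses Snell's law (11.3) to confirm that at the Brewster angle the reflected and refracted rays come out perpendicular.

```python
import math

def malus_transmitted(i_incident, theta_deg):
    """Malus's law, eq. (11.9): I_t = I_i cos^2(theta)."""
    return i_incident * math.cos(math.radians(theta_deg)) ** 2

def brewster_angle_deg(n1, n2):
    """Brewster angle, eq. (11.10): tan(theta_b) = n2 / n1, in degrees."""
    return math.degrees(math.atan2(n2, n1))

def refraction_angle_deg(n1, n2, theta1_deg):
    """Snell's law, eq. (11.3), solved for the refracted angle in degrees."""
    return math.degrees(math.asin(n1 * math.sin(math.radians(theta1_deg)) / n2))

# Linearly polarized light through a polarizer rotated by 60 degrees.
print(malus_transmitted(1.0, 60.0))

# Brewster angles going from air into water and into glass.
theta_b_water = brewster_angle_deg(1.0, 4.0 / 3.0)
theta_b_glass = brewster_angle_deg(1.0, 3.0 / 2.0)
print(theta_b_water, theta_b_glass)

# Reflected ray (at theta_b) plus refracted ray should add up to 90 degrees.
theta_2 = refraction_angle_deg(1.0, 4.0 / 3.0, theta_b_water)
print(theta_b_water + theta_2)
```

For air to water the Brewster angle comes out near 53 degrees, and for air to glass near 56 degrees, with the 60 degree polarizer passing one quarter of the incident intensity.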
We just learned that the speed of light in a vacuum, derived from Maxwell’s Equations, is
$c = 1/\sqrt{\epsilon_0 \mu_0} = 3 \times 10^8$ meters/second. However, we have also learned that the permittivity
and permeability of bulk polarizable matter are not equal to their vacuum equivalents. The
conclusion is inescapable. The speed of light is not c in a medium.
We expect it to be $v = 1/\sqrt{\epsilon\mu}$ where e.g. $\epsilon = \epsilon_r \epsilon_0$ (scaled by the dielectric and diamagnetic
constants of the material). It turns out for many reasons that the polarization of
the medium always slows down the wave – in free space it just sweeps along, but in the
medium it has to move all of that bulk charge too, which has mass and cannot respond as
quickly. For most transparent materials, µ ≈ µ0 so:
$$v \approx \frac{1}{\sqrt{\epsilon_r \epsilon_0 \mu_0}} = \frac{c}{\sqrt{\epsilon_r}} \qquad (11.14)$$
To keep life simple, we take all of the contributing properties of the material and roll
them into a single relation:
$$v_{\rm medium} = \frac{c}{n} \qquad (11.15)$$
$n$ is called the index of refraction of the medium and is roughly equal to $\sqrt{\epsilon_r}$ (which is
dimensionless, recall).
However, there is a problem with this. ǫr is defined in the static limit of ω = 0. Visible
light has a frequency range of (roughly!) $4 \times 10^{14}$ Hz to $8 \times 10^{14}$ Hz (see tables below), and
the charges in a dielectric material simply don’t have time to reach their peak polarization
before the wave points the other way!
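A short numerical sketch makes both points concrete: the speed of light in a medium from (11.15), and how badly the static estimate of n from the square root of ǫr can miss at optical frequencies. The indices are the approximate ones used in this chapter; the static relative permittivity of water (about 80 near room temperature) is an assumed outside value, not one taken from this text.

```python
import math

C = 3.0e8  # speed of light in vacuum, m/s (rounded value used in the text)

# Approximate optical indices of refraction used in this chapter.
INDICES = {"air": 1.0, "water": 4.0 / 3.0, "glass": 3.0 / 2.0}

# Assumed value, not from this text: static relative permittivity of water.
EPS_R_STATIC_WATER = 80.0

def speed_in_medium(n):
    """Eq. (11.15): v = c / n, in m/s."""
    return C / n

for name, n in INDICES.items():
    print(f"{name:6s} n = {n:.3f}  v = {speed_in_medium(n):.3e} m/s")

# What the static dielectric constant alone would predict for water.
n_static_water = math.sqrt(EPS_R_STATIC_WATER)
print(f"sqrt(eps_r) for water = {n_static_water:.2f} "
      f"versus optical n = {INDICES['water']:.2f}")
```

The static estimate for water is several times larger than the optical index, which is the "problem" described above in numbers.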
Indeed, it turns out that the index of refraction is a function of frequency – n(ω) – a
phenomenon known as dispersion. This means (as we shall see) that different frequen-
cies are bent by different amounts via Snell’s law at an interface between two dispersive
media, splitting white light up into a spectrum of colors, with the highest frequency (short-
est wavelength) light usually getting bent the most although this is very much dependent
on the particular medium in question.
This is why water droplets break up light into a rainbow! Note well that this means that
– as far as we can tell examining the world around us or looking back into the remote past
as we look up at the stars – water droplets have always broken up light into rainbows when
backlit by a local source of light, just as they do if you spray water in a fine mist away from
the sun in your back yard.
This has profound religious and philosophical consequences. At one time there was
a rather extensive argument concerning the “frangibility of light” where Biblical literalists
argued that this process could not have occurred before the Flood in Genesis, as it clearly
states therein that the rainbow was first created at a specific postdiluvian time as a sign
that God wouldn’t try to drown the world ever again.
It is worth noting that if light wasn’t “frangible” before this (mythical) Flood, there would
have been no light as the processes that produce it are the same as the processes that
break it up in interaction with matter into colors in rainbows and everywhere else. Nor
would there have been any normal matter – as we have just learned in considerable detail,
the electromagnetic forces that hold atoms and molecules together are the forces that are
responsible for polarizability, which in turn is responsible for dispersion.
In much of the text below, we will idealize the index of refraction and assume that it is
“constant” and “simple” for certain well-known materials. Basically this amounts to taking
its average value in the middle of the visible spectrum as its value, and then picking a
convenient nearby rational number as “the value of the index of refraction” for that medium.
Be aware that this is a pure and simple simplification for the sake of rendering the arithmetic
finger-and-toe easy while still preserving the entire conceptual idea and algebraic structure.
We will also do just enough stuff with dispersion and more realistic n(ω) for us to see how
this goes as well.
The sources of light can classically be viewed as charges bound into an electrically
neutral atom in some sort of equilibrium. There were various classical models of stable
neutral atoms that were tried out in the late 19th and early 20th century but they all failed.
However, we can borrow one of the ideas – the idea that stable systems that are per-
turbed from equilibrium tend to harmonically oscillate, and (as we just saw in the chapter
on Maxwell’s Equations) if that system is an electric dipole, that oscillator will radiate away
electromagnetic energy in the form of harmonic travelling waves – dipole radiation! Al-
though the description of atoms that explains the full span of experimental observations
ended up being quantum mechanical, the experimental observations themselves were
indeed consistent with light being predominantly a harmonic travelling wave with a more or
Note that this list is not really complete, nor is it precise, as in many cases the spectral
ranges of two independent means of generating radiation overlap, or there are multiple ways
of generating the same “kind” of radiation, or particular bands with different uses are
located within a more broadly named category. It is, however, important to know which of the
principal named bands of waves have longer wavelength (smaller frequency) than which
others, and to know at least approximate boundaries for the most important ranges.
In addition you are required to know the range of wavelengths and frequencies of visible
light. You don’t need to know these specifically indexed by color, but it is interesting to look
over a table of colors and their associated frequencies and wavelengths to get a bit of a
feel for it as well. I assume that most of you know the venerable mnemonic device ”ROY G
BIV” – standing for the colors in the visible part of the spectrum in the order of increasing
frequency/decreasing wavelength: Red Orange Yellow Green Blue Indigo Violet. Note that
many books and tables now omit Indigo as a separate color; this practice is continued in
table 8 in this book.
Both tables assume waves propagating in a vacuum (so the frequencies can easily be
determined by using f = c/λ where λ is the wavelength). Note well that again this table
exaggerates the precision of the boundaries between colors. It is not the case that a wave
with wavelength 621 nm is clearly red, but one with wavelength 619 nm is clearly orange.
Different books specify slightly different “ends” of the range – 370-760 nm, for example.
Personally, I think you will be just fine if you can remember the approximate ranges:
Then just remember that “red” light can be seen for wavelengths a bit longer than 700
nm, and “violet” light can be seen for wavelengths a bit smaller than 400 nm. Good enough.
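The conversion between vacuum wavelength and frequency, f = c/λ, is simple enough to tabulate for yourself. A minimal sketch, using the rounded value of c from this chapter and the approximate 400 nm and 700 nm endpoints mentioned above (the intermediate wavelengths are arbitrary sample points, not color boundaries):

```python
C = 3.0e8  # speed of light in vacuum, m/s (rounded)

def frequency_hz(wavelength_nm):
    """Frequency in Hz of light with the given vacuum wavelength in nm."""
    return C / (wavelength_nm * 1e-9)

for wavelength_nm in (400, 500, 600, 700):
    print(f"{wavelength_nm} nm -> {frequency_hz(wavelength_nm):.3e} Hz")
```

The endpoints come out at about 7.5 × 10^14 Hz and 4.3 × 10^14 Hz, consistent with the rough visible range of 4 × 10^14 Hz to 8 × 10^14 Hz quoted earlier.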
(Figure labels: incident and reflected rays at angles θi and θl; phase shift of π; cancelled.)
Figure 11.2: When light is incident on a perfectly reflecting surface, it creates little an-
tennas/sources that radiate the opposite field in the direction of the incident field. These
antennas cause the light to be reflected at the same angle and with the opposite phase
from the surface.
A perfect conductor in electrostatic equilibrium, we recall, cancels the electric field in-
side by arranging charges on its surface to effect the cancellation. Similarly, it creates
surface currents that oppose and cancel magnetic fields. In the dynamical case this is
still true for good conductors and optical frequencies. An incoming light wave strikes the
conductor, and its electric field polarizes the surface atoms so that they become little an-
tennae that oscillate along with the electric and magnetic field of the light. However, the
fields produced flip over (the way a dipole field does) and hence propagate in the leading
direction with the opposite phase, cancelling the forward directed field quite rapidly at the
surface (often within a few layers of atoms).
Since the conductor is good, very little energy is lost to eddy current heating during
this cancellation. The oscillating surface currents must reradiate their energy, and the only
direction they can do so that conserves energy and momentum is to reflect the incident
energy. However, the reflected wave (in order to achieve the cancellation at the surface)
must have the opposite phase from the incoming wave. The situation is very much like the
reflection of a wave pulse on a string from a fixed point on the wall – the reflected wave flips
so it is upside down for precisely the same reasons (energy and momentum conservation).
In an elastic collision with the conductor, the component of the momentum of the light
along the surface is unchanged, but the perpendicular component inverts (becomes minus
itself). The only way this can be true is for the light to bounce off of the surface, with its
phase inverted, at an angle of reflection θℓ (measured relative to the normal at the surface
at that point) equal to the angle of incidence θi as drawn above.
So that’s it:
θi = θℓ (11.16)
is the Law of Reflection. The polarization properties of the reflected light will be discussed
later below.
Note well that for this to be strictly true requires that the surface in question be extremely
smooth – “shiny” as it were. Otherwise neighboring rays would be reflected at different
angles because of small differences in the direction of a normal at different points on a rough
surface. Many (even most) surfaces of real materials are indeed rough on a microscopic
scale (compared to the wavelengths of the incoming light) and hence are diffusely
illuminated by light instead of perfectly reflecting it according to this rule. Many materials also
differentially absorb light and only “reflect” particular wavelengths and hence colors.
We will assume that the law of reflection holds, more or less perfectly, for shiny smooth
good conducting (e.g. metal) surfaces, such as a polished piece of silver or aluminum.
This in turn will help us understand how mirrors work to form images of objects next week.
Light is incident on a surface that separates two transparent media with different indices
of refraction n1 and n2 (where we assume for the moment that n1 < n2 although that isn’t
necessary in the end). This is illustrated in figure 11.3 above.
It should be fairly obvious that the frequency of light in the two media cannot change.
If the same number of wavefronts per second do not pass each point in either medium,
wavefronts must be building up in between. This in turn means that energy (associated
with the wavefronts) must be building up. This simply does not happen.
It is perhaps less obvious that the wavefronts themselves – the places where the
waves reach their maximum amplitudes – should be the same just inside and just outside
the media interface. For it to be otherwise would require a very strange charge distribution
on the surface itself, one that one cannot easily imagine arising.
Since the wave must change speed across the media interface, and since the speed
(Figure labels: λ1, λ2, θ1, θ2, D.)
Figure 11.3: When light is incident on a transparent dielectric surface, it is partially trans-
mitted and partially reflected. Since its speed changes, however, the light must change
direction at the surface as shown.
Since the geometry is exactly the same going from n2 to n1 , we conclude that it doesn’t
matter which medium has the greater or the lesser index of refraction.
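Here is a minimal sketch of Snell's law (11.3) that illustrates this reversibility numerically: refract a ray from air into water, then send it back out at the refracted angle and recover the original angle. The function name and the 30 degree example are my own; returning `None` when no refracted ray exists (total internal reflection) is an added guard, not something derived in this section.

```python
import math

def refraction_angle_deg(n1, n2, theta1_deg):
    """Solve n1 sin(theta1) = n2 sin(theta2) for theta2, in degrees.

    Returns None if there is no refracted ray (total internal reflection).
    """
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    if abs(s) > 1.0:
        return None
    return math.degrees(math.asin(s))

n_air, n_water = 1.0, 4.0 / 3.0

# Air -> water at 30 degrees from the normal.
theta_in_water = refraction_angle_deg(n_air, n_water, 30.0)
print(theta_in_water)

# Water -> air along the same ray, reversed.
theta_back_in_air = refraction_angle_deg(n_water, n_air, theta_in_water)
print(theta_back_in_air)

# A steep ray inside the water has no refracted partner in air.
print(refraction_angle_deg(n_water, n_air, 60.0))
```

The ray bends toward the normal (to about 22 degrees) going into the water and back to 30 degrees coming out, whichever direction it travels.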
In figure 11.4, we note that any curved path such as S1 is longer than the path S0 (some-
thing that can be proven using the calculus of variations, which we will not introduce here).
S0
S1
Figure 11.4: For constant speed, the straight line path between A and B takes the least
time.
The time required to traverse S1 is t1 = S1 /v while t0 = S0 /v. The minimal time path
is therefore clearly the minimal distance path, the straight line. Fermat’s principle thus
correctly describes this case.
Fermat noted that a straight line is the path along which it takes the least time to travel
between two points A and B at constant speed in ordinary space. Any other path is longer
in distance than the straight line path, and hence takes longer to traverse at the same
speed. This is illustrated in figure 11.4 – the curved path is longer, so it takes more time to
traverse it if you have to move at exactly the speed of light (or the same speed along both
trajectories).
Thus when we say that light travels at a constant speed (the speed of light) in a straight
line between A and B, it is also true that the path that it follows is the one that takes the
least time.
Now consider the Law of Reflection above. It is equally easy to see that any reflective
path between A and B that doesn’t have θi = θl is longer, and hence takes more time. We
will examine and prove this below using calculus.
What happens when the speed is not constant? In that case, one has to solve an
optimization problem, a problem in economy. It seems that one might be able to obtain
some benefit from going further where the speed is greater and thereby reduce the amount
of distance one has to travel at the slower speed, and actually go between A and B in less
time than the straight line trajectory.
Fermat, observing that light must speed up or slow down as it passes between dis-
tinct physical media, hypothesized that the trajectory followed by light between point A in
medium 1 and point B in medium 2 would not be a straight line; it would instead be the path
that takes the minimum time. This, as we shall see, is another way to get Snell’s law, but
this time in a ray description of the light that is altogether independent of the wavelength or
wave properties of the light.
Although Fermat was not the first person to propose a variational/minimum principle for
optics (that honor belongs to Ibn al-Haytham in 1021, over 600 years earlier) he was the
first to do so post Descartes, with an analytic geometry capable of fully exploiting the idea.
Although Fermat’s principle puts the cart a bit in front of the horse by making it the cause
of the trajectory followed by light instead of a feature of the trajectory followed by light (that
can be derived from other principles) variational principles based on his original statement
proved to be essential to a formulation of classical mechanics that would translate, with
minimal changes, into a formulation of quantum mechanics. It is therefore worth looking at
in a bit of detail, especially for physics majors or minors.
Figure 11.5: The path with θi = θl is the one with the minimal time when the entire trajectory
is otherwise in a single medium with a constant speed.
In figure 11.5 we illustrate and prepare to prove the law of reflection from Fermat’s
requirement that the time required to go between points A and B on a path that reflects off of
the mirror is a minimum. From the result above we can ignore all trajectories that are not
straight except where they strike the reflecting surface. The total distance between the two
points A and B is therefore the sum of the two hypotenuses:
$$H = H_1 + H_2 = \left(y_1^2 + x^2\right)^{1/2} + \left(y_2^2 + (D - x)^2\right)^{1/2} \tag{11.22}$$
We need to find a condition that produces the minimum of this function. We therefore
differentiate with respect to x, set the result to zero, and solve for (say) x or θi. The
quantities y1, y2 and D are all constant, so (using the chain rule, note well):
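$$\frac{dH}{dx} = \frac{1}{2}\,\frac{2x}{\left(y_1^2 + x^2\right)^{1/2}} - \frac{1}{2}\,\frac{2(D - x)}{\left(y_2^2 + (D - x)^2\right)^{1/2}} = 0 \tag{11.23}$$
or
$$\sin(\theta_i) = \frac{x}{\sqrt{y_1^2 + x^2}} = \frac{x}{H_1} = \frac{D - x}{H_2} = \frac{(D - x)}{\sqrt{y_2^2 + (D - x)^2}} = \sin(\theta_l) \tag{11.24}$$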
If the speed of light is a constant, this condition minimizes the distance and hence the
time t = H/v. Thus θi = θl , and we see that the Law of Reflection can be derived from
Fermat’s principle. What about Snell’s Law?
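Before moving on to refraction, the reflection result can be checked numerically. The sketch below minimizes the path length H(x) of equation (11.22) with a simple golden-section search and compares the two angles. The geometry values y1, y2 and D and the helper name minimize_1d are illustrative assumptions, not anything taken from the text.

```python
import math

def path_length(x, y1, y2, D):
    """Total length H(x) = H1 + H2 of the reflected path, as in eq. (11.22)."""
    return math.hypot(y1, x) + math.hypot(y2, D - x)

def minimize_1d(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while abs(b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

# Hypothetical geometry in arbitrary length units, chosen only for illustration.
y1, y2, D = 1.0, 2.0, 3.0
x_min = minimize_1d(lambda x: path_length(x, y1, y2, D), 0.0, D)

theta_i = math.atan2(x_min, y1)       # angle of incidence, measured from the normal
theta_l = math.atan2(D - x_min, y2)   # angle of reflection, measured from the normal
print(x_min, math.degrees(theta_i), math.degrees(theta_l))
```

For this geometry the condition x/y1 = (D − x)/y2 gives x = 1 exactly, so both angles should come out very close to 45 degrees.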
Figure 11.6: The path with n1 sin(θ1) = n2 sin(θ2) is the one with the minimal time when the
trajectory goes between media n1 and n2 where light has distinct speeds. As suggested,
one minimizes the time by choosing a trajectory that trades off more distance in the faster
medium against less distance in the slower one.
To derive Snell’s Law, we need a figure like the one drawn in figure 11.6. As was the
case for reflection, we only need consider straight line trajectories in a given medium, but
we allow x to (again) be a variable that we adjust to find the trajectory with the minimum
time.
The major difference this time is that the speeds in the two media are different. When
we write down the times required for the trajectories in media 1 and 2, we have to include
the indices of refraction for those media, that is:
$$t_1 = \frac{\sqrt{y_1^2 + x^2}}{v_1} = \frac{n_1 \sqrt{y_1^2 + x^2}}{c} \tag{11.25}$$
and
$$t_2 = \frac{\sqrt{y_2^2 + (D - x)^2}}{v_2} = \frac{n_2 \sqrt{y_2^2 + (D - x)^2}}{c} \tag{11.26}$$
as the times it takes for the light to travel in a straight line 1) from A to x and 2) from x to
B.
The total time is thus:
$$t = t_1 + t_2 = \frac{n_1 \sqrt{y_1^2 + x^2}}{c} + \frac{n_2 \sqrt{y_2^2 + (D - x)^2}}{c} \tag{11.27}$$
Differentiating and setting the result equal to zero recapitulates the same algebra as
used above to derive the law of reflection, except that there is an extra factor of n1 and
n2 on each side. The details are thus left as a (simple) exercise that you should attempt;
the result is n1 sin(θ1) = n2 sin(θ2), and we see that Snell’s law can be derived from
Fermat’s principle as well!
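The minimization of the total time (11.27) can also be sketched numerically. The code below finds the crossing point x where dt/dx = 0 by bisection and then checks that n1 sin(θ1) and n2 sin(θ2) agree. The indices (air over water) and the geometry are illustrative assumptions, and fermat_crossing is a hypothetical helper name.

```python
import math

def dt_dx(x, y1, y2, D, n1, n2):
    """c * dt/dx for the two-medium path of eq. (11.27)."""
    return n1 * x / math.hypot(y1, x) - n2 * (D - x) / math.hypot(y2, D - x)

def fermat_crossing(y1, y2, D, n1, n2, tol=1e-12):
    """Bisection for the x where dt/dx = 0, i.e. the minimum-time crossing point."""
    lo, hi = 0.0, D   # dt/dx is negative at x = 0 and positive at x = D
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dt_dx(mid, y1, y2, D, n1, n2) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical values: air above (n1 = 1.0), water below (n2 = 1.33).
n1, n2 = 1.0, 1.33
y1, y2, D = 1.0, 1.0, 2.0
x = fermat_crossing(y1, y2, D, n1, n2)

sin1 = x / math.hypot(y1, x)            # sin(theta_1), angle from the normal in medium 1
sin2 = (D - x) / math.hypot(y2, D - x)  # sin(theta_2), angle from the normal in medium 2
print(x, n1 * sin1, n2 * sin2)
```

Note that the crossing point lands at x > D/2: the ray spends more of its horizontal distance in the faster medium, exactly the trade-off described in the caption of figure 11.6.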
Variational principles prove to be of great use in more advanced physics, as nature
appears to be intrinsically “economical” and to choose extremal paths, usually ones that
minimize a quantity called the action. Newton’s laws themselves can be derived in a
generalized form from a suitable variational principle for the action, and this
proves to be a useful way to derive and understand parts of quantum theory as well!
Figure 11.7: Light travelling from a denser medium to a lighter one is totally internally
reflected if θi ≥ θc = sin⁻¹(n1/n2), corresponding to an angle of refraction of π/2, where the
refracted ray fails to escape the medium.
If a ray is travelling from a denser medium to a lighter one, one quickly observes a
curious thing. Since the ray is bent away from the normal, there exist angles for which
Snell’s law has no solution!
In fact, it is easy to identify an angle of incidence such that the angle of refraction is θr =
π/2. If we assume that n2 > n1 and we are going from medium n2 (the heavier/denser) to
medium n1 (the lighter/less dense):
$$n_2 \sin(\theta_c) = n_1 \sin(\pi/2) = n_1$$
or
$$\theta_c = \sin^{-1}\left(\frac{n_1}{n_2}\right) \tag{11.30}$$
If we increase θ2 beyond θc, we make the left hand side of Snell’s law bigger than n1, but we
cannot find any angle θr for which sin(θr) > 1! We conclude that at all angles θc and
greater the ray fails to escape the medium!
Since it is not absorbed by the interface, and is not transmitted into medium n1 , the only
place the energy in this ray can go is into the reflected ray. The ray is thus totally internally
reflected.
Total internal reflection is extremely useful in our modern society. It is the basis of fiber
optics where (laser) light signals are “trapped” inside a “light pipe” that transmits the light
down the fiber and around sufficiently gentle bends without allowing the light to escape
through the sides of the optical fibers that have an index of refraction greater than that of
the surrounding air or other media.
It is also pretty! Diamonds and the diamond-like compound silicon carbide (Moissanite) have
extremely large indices of refraction, roughly nd = 2.4. This makes the critical angle:
$$\theta_{cd} = \sin^{-1}\left(\frac{1}{2.4}\right) = 24.6^\circ \tag{11.31}$$
Light incident on the facet of a diamond at any angle greater than this (rather small) angle
is trapped by the diamond. Diamonds are cut so that light entering through any given facet
is reflected many times without escaping, so that dispersion splits the light up into many
colors until it escapes either through the sides or at corners or edges. This gives diamond
(or Moissanite) its “bright and sparkly” appearance. Cut crystal prisms and lesser clear
gemstones have much the same properties on a lesser scale, trapping light and splitting it
up into a rainbow of colors to brighten an otherwise drab existence.
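The critical angle formula (11.30) and the diamond estimate (11.31) are easy to reproduce. The sketch below is a minimal illustration; the function names are hypothetical, and the surrounding medium is assumed to be air with n = 1.

```python
import math

def critical_angle(n_dense, n_light):
    """Critical angle in radians for light going from n_dense into n_light < n_dense."""
    if n_light >= n_dense:
        raise ValueError("total internal reflection requires n_dense > n_light")
    return math.asin(n_light / n_dense)

def refraction_angle(theta_in, n_in, n_out):
    """Angle of refraction from Snell's law, or None when the ray is totally reflected."""
    s = n_in * math.sin(theta_in) / n_out
    if s > 1.0:
        return None   # Snell's law has no solution: total internal reflection
    return math.asin(s)

theta_cd = critical_angle(2.4, 1.0)          # diamond to air, n_d = 2.4 as in the text
print(round(math.degrees(theta_cd), 1))      # 24.6

# A ray inside the diamond at 30 degrees from the normal cannot escape,
# while one at 10 degrees is refracted out.
print(refraction_angle(math.radians(30.0), 2.4, 1.0))
print(refraction_angle(math.radians(10.0), 2.4, 1.0))
```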
11.4.3: Dispersion
Figure 11.8: An approximate dispersion curve n(λ) for “ordinary” glass. However, distinct
glass mixtures can have very different dispersion curves, including ones where n increases
with increasing wavelength λ (decreases with frequency).
Most transparent materials have a dispersion in the visible range that decreases (increases)
the index of refraction with wavelength (frequency). A typical dispersion curve for the kind
of glass one might find in a drinking glass or prism is shown across the range of visible
wavelengths in figure 11.8. Note well that violet light (400 nm) has an index of refraction
that is a percent or two higher than the index of refraction of red light (700 nm).
This is sufficient to cause white light incident at some nonzero angle to split up into
its distinct component wavelengths in beams that gradually spatially separate as the light
travels. The band of colors produced by any given source of incident light, sorted out by
wavelength from longest to shortest, is called the spectrum of the incident light. White
light is a mixture of all visible colors, and its spectrum is the familiar “rainbow” of colors,
Red Orange Yellow Green Blue Indigo Violet, or “ROY G BIV” (a common mnemonic for
the order). Note well that the frequency order is opposite – from smallest to largest.
One familiar way to get a good spatially separated band of colors is to use two refractive
surfaces, each of which helps to further bend the resolved colors – a prism157 .
Figure 11.9: A prism causes violet light (largest n) to be bent more than red light
(smallest n) at each interface, splitting up the originally white incident light into a full
spectrum.
In figure 11.9 the way a prism acts on an incident white beam of light is crudely repre-
sented. Red light, with the smallest n, is bent the least (at each of the two surfaces). Violet
light, with the largest n, is bent the most.
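To get a feel for the size of the effect, one can apply Snell's law at both faces of a prism for two slightly different indices. The sketch below assumes a 60 degree apex angle, a 50 degree angle of incidence, and indices of 1.51 (red) and 1.53 (violet) read roughly off the range of figure 11.8; all of these numbers are illustrative assumptions, and prism_deviation is a hypothetical helper.

```python
import math

def prism_deviation(theta_in, apex, n):
    """Total deviation (radians) of a ray through a prism in air, using Snell's law twice."""
    r1 = math.asin(math.sin(theta_in) / n)   # refraction angle inside, first face
    r2 = apex - r1                           # angle of incidence on the second face
    s = n * math.sin(r2)
    if s > 1.0:
        raise ValueError("ray is totally internally reflected at the second face")
    theta_out = math.asin(s)                 # exit angle, second face
    return theta_in + theta_out - apex

apex = math.radians(60.0)        # assumed apex angle
theta_in = math.radians(50.0)    # assumed angle of incidence
dev_red = prism_deviation(theta_in, apex, 1.51)      # assumed n for red light
dev_violet = prism_deviation(theta_in, apex, 1.53)   # assumed n for violet light
print(math.degrees(dev_red), math.degrees(dev_violet))
```

With these assumed values the violet ray is deviated by a bit under two degrees more than the red ray, which is enough to separate the colors visibly after the beam has travelled some distance.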
Similarly, water droplets or ice crystals that are all roughly the same size can
individually preferentially divert different colors of light into different angles, creating a ring
spectrum around a white source seen through e.g. a falling rain. When the white source
is sunlight shining through raindrops in the early morning or late evening (so it can come
in underneath the raincloud cover) one sees only half of the ring, a rainbow 158. When the
white source is sunlight shining through ice crystals in light clouds in the atmosphere, one
can get “sunbows”, or more rarely, “sun dogs” formed from refracting/reflecting off of planar
ice crystals.
157
This is yet another of the discoveries/accomplishments of Isaac Newton. He was the first person to deduce
many of the properties of light from the observation that a prism would take an incoming circular ray of white
light and transform it into an exiting ellipse of light with the colors of the rainbow.
158
Or, more rarely, a double rainbow! All the way across the sky!
I’ve never seen a triple rainbow, but they too are possible, and I’m guessing an easy way to go viral if you
ever capture one in a sappy video...
11.5: Polarization
Figure 11.10: The two possible directions for the electric and magnetic vector field
components to point corresponding to an electromagnetic wave propagating in the +ẑ direction.
As we saw in the last chapter, the electric and magnetic field vectors can point in two
independent directions perpendicular to the direction of propagation (the Poynting vector
direction). These two directions/orientations are portrayed in figure 11.10. However, we
don’t need to define both the electric and magnetic field components, because if we know
(for example) Ex(z, t) then By(z, t) = |Ex(z, t)/c| is determined (and ditto for Ey(z, t) and
Bx(z, t)). As a general rule, then, we will describe the polarization of an electromagnetic
wave with some given frequency and wavelength by fully specifying two independent vector
(harmonic travelling wave) components of the electric field only that are perpendicular to
the direction of propagation.
There are several ways to describe the polarization, and several physical processes
produce polarized light, or take unpolarized light and filter it into some specific polarization
state. To understand this, we will start by building an understanding of what is meant by
“unpolarized” light (which isn’t really unpolarized as in having no polarization state, it is just
mixed-up light with a random, fluctuating polarization state).
Unpolarized light is light for which the polarization vector is constantly shifting its direction
around. For a few tens to thousands of wavelengths the electric field vector points in
some direction. Then it suddenly shifts into a new direction, as its source gets randomly
Linear polarization occurs whenever the electric field vector oscillates consistently in a
single vector direction in the plane perpendicular to propagation. The following are all
examples of linearly polarized light propagating in the z-direction with frequency ω:
Light linearly polarized in the x-direction:
$$\vec{E}(z,t) = E_{0x}\,\hat{x}\sin(kz - \omega t) \tag{11.32}$$
ŷ × −x̂ = ẑ (11.38)
There is no reason that the magnitudes of the electric polarization components in the two
independent directions have to be the same or to be in phase. We start by considering the
case where they have the same magnitude but are π/2 out of phase:
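$$\vec{E}(z,t) = \frac{\sqrt{2}}{2}E_0\,\hat{x}\sin(kz - \omega t \pm \pi/2) + \frac{\sqrt{2}}{2}E_0\,\hat{y}\sin(kz - \omega t)$$
$$\vec{E}(z,t) = \frac{\sqrt{2}}{2}E_0\left(\pm\hat{x}\cos(kz - \omega t) + \hat{y}\sin(kz - \omega t)\right) \tag{11.40}$$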
These two components describe a vector of constant length that sweeps around in a circle,
either counterclockwise (-) or clockwise (+). We call this circularly polarized light. Note that
the two components must have equal amplitudes and must be π/2 out of phase to be
circularly polarized. There are two independent helicities of circularly polarized light: right
(clockwise/+) and left (anticlockwise/-), when facing in the direction of propagation.
Note that we might expect circularly polarized light to be produced by a rotating electric
dipole!
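The claim that equation (11.40) describes a vector of constant length is easy to check numerically. The sketch below samples the field at eight phases kz − ωt around one cycle; the amplitude E0 = 1 and the function name circular_field are illustrative assumptions.

```python
import math

def circular_field(phase, E0=1.0, sign=+1):
    """(Ex, Ey) of eq. (11.40) at phase = k z - omega t; sign (+1 or -1) picks the helicity."""
    amp = math.sqrt(2.0) / 2.0 * E0
    return (sign * amp * math.cos(phase), amp * math.sin(phase))

magnitudes = []
for i in range(8):
    phase = 2.0 * math.pi * i / 8
    Ex, Ey = circular_field(phase)
    magnitudes.append(math.hypot(Ex, Ey))

# Every entry should equal sqrt(2)/2 * E0: the tip of the vector moves on a circle.
print(magnitudes)
```

Changing either amplitude, or changing the phase difference away from π/2, makes the magnitudes vary around the cycle, which is the elliptical case.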
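If the amplitudes of the two waves are (potentially) different and the two waves are
(potentially) out of phase, the most general polarization state is that of elliptical polarization:
$$\vec{E}(z,t) = E_{0x}\,\hat{x}\sin(kz - \omega t + \delta_x) + E_{0y}\,\hat{y}\sin(kz - \omega t + \delta_y)$$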
In this expression, E0x and E0y may or may not be equal, and the phases δx and δy may
or may not be zero or equal. The amplitudes E0x and E0y set the x and y limits of a
rectangular box. The electric field vector rotates within that box, tracing an ellipse tipped
at an angle determined by the relative phase difference δ = δx − δy (where if δ = 0 or δ = π
one has linear polarization).
To see a lovely animation of the electric field vector for various flavors of polarization,
visit:
http://www.nsm.buffalo.edu/~jochena/research/opticalactivity.html
A polaroid filter is made by putting oriented conducting threads into a transparent medium
in such a way that currents along those threads, driven by the polarization component of
light parallel to the threads, heat the threads. This absorbs and attenuates only that
component of the incident polarized or unpolarized light and passes the component
perpendicular to the threads (the transmission axis of the filter).
The rules for transmission are simple. If the incident light is unpolarized, on average
half its energy is polarized in either polarization direction. Therefore (assuming that the
where θ is the angle between the direction of linear polarization of the incident light and a
unit vector along the transmission axis.
To find the transmitted intensity, we need only remember the relation between the electric
field strength and the intensity that follows from the intensity being the time-average
magnitude of the Poynting vector:
$$I = \frac{1}{2\mu_0}\left|\vec{E}\times\vec{B}\right| = \frac{1}{2\mu_0 c}E^2 \tag{11.44}$$
The intensity is directly proportional to the electric field amplitude, squared, so that:
$$I_t = I_0 \cos^2(\theta)$$
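A short numerical sketch of Malus's law, It = I0 cos²(θ) (the form quoted in Problem 4 of this week's homework), together with the factor of one half for unpolarized light stated above. It assumes ideal filters with no absorption along the transmission axis, and the function names are hypothetical.

```python
import math

def through_polarizer(I0, theta):
    """Malus's law: linearly polarized light of intensity I0 at angle theta (radians)
    to the transmission axis of an ideal filter."""
    return I0 * math.cos(theta) ** 2

def unpolarized_through_polarizer(I0):
    """Unpolarized light through one ideal filter: on average half the intensity passes."""
    return 0.5 * I0

I0 = 1.0                                              # arbitrary incident intensity
I1 = unpolarized_through_polarizer(I0)                # after the first filter
I2 = through_polarizer(I1, math.radians(45.0))        # second filter rotated by 45 degrees
I_crossed = through_polarizer(I1, math.radians(90.0)) # second filter crossed at 90 degrees
print(I1, I2, I_crossed)
```

In this idealized picture a quarter of the original intensity survives two filters at 45 degrees, and essentially nothing survives crossed filters.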
Figure 11.11: The scattering of initially unpolarized light by a molecule or dust particle.
Note that the polarization is perpendicular to the plane of scattering for each of the possible
outgoing directions.
When unpolarized light passes across an atom or molecule, it polarizes it in the
instantaneous direction of the electric field vector (which, recall, has a definite direction at
any time but which jumps around to a new direction every 10-1000 optical periods). The
oscillating molecule acts like a dipole antenna and reradiates the incident electromagnetic
wave. However, the reradiated electric field must be parallel to the dipole moment of the
molecule, and there is no radiation along the dipole (with a clear maximum at right angles
to the dipole). As a consequence we can easily see that the rule for polarization of rays
scattered more or less at right angles is that they must be polarized perpendicular to the
plane of scattering!
Figure 11.12: The scattering of initially unpolarized light by reflection off of a plane surface
between two dielectric media (n1 and n2) at the Brewster angle that produces complete
polarization of the reflected ray. Note that the polarization of all reflected rays incident on
the surface at an angle is parallel to the ground even at angles other than the Brewster angle.
When light strikes a surface between two regions with differing indices of refraction,
it is partially transmitted and partially reflected (with the amount of each determined by
the angle of incidence and the two indices of refraction). The reflection is caused by the
polarization of surface molecules in such a way that the light scattered by them adds up
coherently into the reflected wave; similarly those polarized molecules create a forward
propagating wave into the medium (although at a different angle according to Snell’s law).
As before, the polarized surface molecules (dipoles) cannot radiate along their own axis so
that light that is reflected parallel to one of the polarization directions cannot contain that
polarization.
This state of affairs occurs when the reflected ray is perpendicular to the refracted ray,
pictured above. In this case:
n1 sin(θ) = n2 sin(φ) (11.46)
so that:
sin(φ) = sin(π/2 − θ) = cos(θ) (11.48)
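Substituting the last relation into Snell's law gives n1 sin(θ) = n2 cos(θ), that is tan(θ) = n2/n1 at the Brewster angle. The sketch below evaluates this for an assumed air-to-glass interface (n1 = 1.0, n2 = 1.5, both illustrative values) and checks that the reflected and refracted rays are indeed perpendicular; brewster_angle is a hypothetical helper name.

```python
import math

def brewster_angle(n1, n2):
    """Angle of incidence (radians) with tan(theta) = n2/n1, for light going from n1 into n2."""
    return math.atan2(n2, n1)

n1, n2 = 1.0, 1.5                                   # assumed: air to glass
theta_b = brewster_angle(n1, n2)
phi = math.asin(n1 * math.sin(theta_b) / n2)        # refraction angle from Snell's law

print(round(math.degrees(theta_b), 1))              # 56.3
print(math.degrees(theta_b + phi))                  # close to 90: rays are perpendicular
```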
As we have just seen, reflected glare from any smooth surface is likely to be at least
partially polarized parallel to the ground. It is thus blocked by a pair of polaroid sunglasses
with a vertical transmission axis. Similarly, (scattered) light from the blue sky viewed near
the horizon at midday is predominantly polarized parallel to the ground and is also blocked
by a vertical transmission axis, which can make e.g. driving safer and less stressful on the
eye.
Since light is a wave, the frequencies picked up by a frequency sensitive receiver (e.g. the
human eye) depend both on the original frequency (color) emitted by the source and on the
Doppler shift produced by the motion of the source and/or the receiver. A complete treatment
of the Doppler shift requires relativity and is beyond the scope of this course, but an elementary
treatment suffices to understand the Doppler shift at velocities that are small compared
to the speed of light 159 .
The idea underlying the Doppler shift is very simple. If the source is moving towards
the receiver, its motion foreshortens the normal wavelength, increasing the frequency ob-
served by the stationary receiver. If the receiver is moving towards the source, its motion
reduces the time between the wavefronts it receives, increasing the frequency it observes.
If both motions are occurring, both shifts occur as a product. We show the picture and
quick derivation of each possibility below.
159
At higher speeds, lengths contract and times dilate, so this simple argument has to be made a bit more
complicated. In this case the correct argument leads to the formula for the relativistic Doppler shift for moving
source and/or receiver, but at low speeds the forms for the shifts are approximately (to lowest nontrivial order
in v/c) the same.
[Figure: a source moving at speed vs towards a stationary receiver; labelled distances vs T and λ′.]
The source emits light waves that travel a distance λ = cT in a single period T. However,
in the time T between wavefronts, the source moving at speed vs towards the receiver
travels into the wave it has emitted a distance vs T, reducing the distance at the time of the
next front to λ′ = λ − vs T. This in turn reduces the time T′ between wavefronts that cross
the receiver (e.g. an eye or camera) and hence we can solve for the frequency shift thus:
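$$\begin{aligned}
\lambda' &= \lambda - v_s T \\
cT' &= cT - v_s T \\
T' &= T\left(1 - \frac{v_s}{c}\right) \\
\frac{1}{T'} &= \frac{1}{T}\,\frac{1}{1 - \frac{v_s}{c}} \\
f' &= \frac{f}{1 - \frac{v_s}{c}}
\end{aligned} \tag{11.50}$$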
For a source moving away from the receiver the algebra and picture are the same, but the
wavelength λ′ = λ + vs T is increased, so that:
$$f' = \frac{f}{1 \mp \frac{v_s}{c}} \tag{11.51}$$
for an approaching (-) or receding (+) source describes the general moving source Doppler
shift in the frequency/color detected by the receiver.
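As a quick numerical illustration of equation (11.51), the sketch below evaluates the shifted frequency for a source approaching and receding at about a thousandth of the speed of light. The emitted frequency and the speed are hypothetical values chosen for illustration, and the formula is the nonrelativistic one, so it should only be trusted for vs much smaller than c.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def doppler_moving_source(f, v_s, approaching=True):
    """Eq. (11.51): f' = f / (1 - v_s/c) for approach, f / (1 + v_s/c) for recession."""
    beta = v_s / C
    return f / (1.0 - beta) if approaching else f / (1.0 + beta)

f_emit = 5.0e14   # hypothetical emitted frequency in Hz (visible light)
v_s = 3.0e5       # hypothetical source speed in m/s, roughly 0.001 c

f_toward = doppler_moving_source(f_emit, v_s, approaching=True)
f_away = doppler_moving_source(f_emit, v_s, approaching=False)

# Fractional shifts: about +0.001 (towards violet) and about -0.001 (towards red).
print((f_toward - f_emit) / f_emit, (f_away - f_emit) / f_emit)
```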
Note well that visible light sources moving away from the receiver are shifted towards
the red end of the spectrum, while sources moving towards the receiver are shifted towards
the violet end of the spectrum. Since spectral lines produced by atoms have sharp and
well-defined frequencies, this permits us to ascertain that the visible Universe is expanding
(as all distant stars and galaxies are red-shifted). Since the velocity with which distant stars
are receding from the Earth increases with distance, the red shift becomes a meter stick
permitting us to measure the size of the visible Cosmos. This is a small but significant part
of the physical evidence for the Big Bang cosmological model that so far seems best to
fit the data, and that suggests that the Big Bang occurred approximately 13.5 billion years
ago (give or take a billion years) so that the visible Cosmos is a sphere roughly 27 billion
light years across, containing roughly a trillion galaxies containing order of a trillion stars
apiece. This is around Avogadro’s number of stars.
With no boundaries visible in any direction, there is no particular reason for us to think
that we are in the exact center of the cosmos, save in the sense that every point is in the
middle of an infinite line. Sometimes small pieces of physics (such as the Doppler shift of
light) can have enormous consequences.
[Figure: a receiver moving at speed vr towards a stationary source; labelled distances cT′ and vr T′.]
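$$\begin{aligned}
cT &= (c + v_r)T' \\
T &= \left(1 + \frac{v_r}{c}\right)T' \\
\frac{1}{T'} &= \frac{1}{T}\left(1 + \frac{v_r}{c}\right) \\
f' &= f\left(1 + \frac{v_r}{c}\right)
\end{aligned} \tag{11.52}$$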
As before, if the receiver is moving away, it decreases f ′ instead of increasing it, so that
the general rule is:
$$f' = f\left(1 \pm \frac{v_r}{c}\right) \tag{11.53}$$
for a receiver moving towards (+) or away from (-) the source.
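The moving-receiver and moving-source formulas look different, but at low speeds they agree to lowest order in v/c. The sketch below compares the two for approach at the same speed; the frequency and speed are hypothetical illustration values, and both formulas are the nonrelativistic ones from the text.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def doppler_moving_receiver(f, v_r, approaching=True):
    """Eq. (11.53): f' = f (1 + v_r/c) for approach, f (1 - v_r/c) for recession."""
    beta = v_r / C
    return f * (1.0 + beta) if approaching else f * (1.0 - beta)

def doppler_moving_source(f, v_s, approaching=True):
    """Eq. (11.51), repeated here so that the comparison is self-contained."""
    beta = v_s / C
    return f / (1.0 - beta) if approaching else f / (1.0 + beta)

f_emit = 5.0e14   # hypothetical emitted frequency, Hz
v = 3.0e4         # hypothetical speed, m/s (about 0.0001 c)
beta = v / C

f_receiver = doppler_moving_receiver(f_emit, v)
f_source = doppler_moving_source(f_emit, v)

# The two results differ only at second order: (f_source - f_receiver)/f is about beta**2.
relative_difference = (f_source - f_receiver) / f_emit
print(relative_difference, beta ** 2)
```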
It is interesting to note that if a source is moving at the speed of light (where these
expressions are no longer valid, alas, although they still capture part of the shift) the fre-
quency f ′ goes to infinity. This divergence occurs in the relativistic expression as well,
and is the moral equivalent of a sonic boom only with light.
Although particles cannot go faster than light in a vacuum, this is actually a physical
possibility inside a medium. Consider an electron travelling at 0.99c and entering a piece
of glass where the speed of light is only approximately 0.67c. The “light boom” given off
by the superluminal particle in the glass is clearly visible (experimentally) and is called
Cerenkov radiation. Cerenkov radiation is the basis of some of the high-energy particle
detectors used in many of the big accelerator laboratories in high energy nuclear physics.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
Derive Snell’s Law. You may use any method you like (there are several), but the way it
was done in class is probably the easiest.
Problem 3.
for light sources or receivers moving in a vacuum, where the upper signs in both cases
refer to approach and the lower signs to recession. Note well that this is how the radar guns
police use to trap speeders work, how the “Doppler radar” that weather forecasters use to
measure the wind speed of storms and detect the occurrence of tornados works, and is
a technology used in a variety of medical imaging techniques including e.g. ultrasound.
Problem 4.
Derive Malus’ Law It = I0 cos²(θ) where I0 is the intensity of polarized light incident on
a polarizing filter at an angle θ relative to the transmission axis of the filter. I’d suggest
going back to the Poynting vector and expressing the intensity I0 in terms of E0 , the E-field
amplitude of the incident polarized wave.
Problem 5.
Derive Brewster’s Formula (the expression for the angle of incidence for which reflected
light is completely polarized parallel to the surface).
Problem 6.
• Polarization by scattering
• Polarization by absorption
• Polarization by reflection
These are a mnemonic device for the formulas and help you understand why the transmis-
sion axis of polarizing sunglasses is vertical (to block reflected glare and scattered skylight,
both predominantly polarized parallel to the ground).
Problem 7.
Derive the expression for the critical angle leading to total internal reflection for rays moving
from a dense medium (high n) to a lighter one (with lower n).
Problem 8.
Suppose a layer of oil no = 5/4 is floating on water nw = 4/3, that in turn is on a piece of
glass ng = 3/2. Show that the critical angle for the glass is not changed by the combined
system of layers of water and oil; that rays incident on the glass-water interface at or above
the critical angle for glass-air alone do not escape the final layer of oil.
Problem 9.
Show that in spite of the occurrence of total internal reflection, one can in principle still
see all of the bottom of a shallow lake stretched out before your feet. That is, although some
rays of light from a fish on the bottom are trapped and do not escape, there are others that
will reach your eye no matter where your eye is located. (Other factors – ripples, reflections
off of the surface, murkiness in the water – may limit your vision, but it isn’t that any part
of the bottom is theoretically invisible because light from there cannot escape to reach your
eye, it is that the light that does reach it may be very faint and difficult to resolve from other
things going on.)
Note that the “answer” to this question is likely to be a diagram or figure that illustrates
the answer, not algebra per se, although one can always support the answer further with
algebra.
Problem 10.
[Figure: light in air (n = 1) incident on a prism of index n with apex angle α; δ is the angle of deviation.]
In the figure above, a beam of light is incident from air onto a prism with an apex angle
α. Its angle of incidence is adjusted until it refracts symmetrically across the prism, with
the ray crossing the vertical bisector of the prism at right angles. Prove that the angle of
deviation, δ, is related to α and n by:
Prove that the angle of deviation, δ, is a minimum when the light ray crosses the
vertical bisector at right angles so that the figure has full reflection symmetry if one reverses
the direction of the ray.
Week 12: Lenses and Mirrors
• The distance from a mirror (or lens) to an object one is viewing in (or through) it is s,
the object distance. Object distances are positive if the object is on the side of the
mirror (or lens) that the light is coming from. Object distances are obviously ‘always’
positive, unless the object is a virtual object formed out of the image of a previous
mirror or lens, which can be either positive or negative.
• The distance from a lens or mirror to the image one is viewing is s′ , the image
distance. Image distances are positive if the image is on the side of the mirror (or
lens) that the light is going to.
• The focal length f of a mirror (or lens) is the distance to the point where incident parallel rays are
focused to (for positive focal lengths) or appear to be defocused from (for negative
focal lengths). f is typically measured in meters (SI) or centimeters (for convenience).
However, the strength of lenses is usually given in diopters, where:
$$d = \frac{1}{f} \tag{12.1}$$
with f in meters. Thus a one diopter (1.00d) lens has a focal length of 1 meter. A
10.00d lens has a focal length of 0.1 meter. A diverging lens with a focal length of
one centimeter (f = −0.01 meter) is -100.00d.
$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f} \tag{12.2}$$
• The transverse magnification of a simple mirror (or lens) is defined by the ratio of the
image height y ′ to the object height y:
$$m = \frac{y'}{y} = -\frac{s'}{s} \tag{12.3}$$
• A real image is one where the rays of light that appear to the eye to diverge from a
point on the image actually pass through that point. A virtual image is one where
the rays of light that appear to the eye to diverge from a point on the image do not
actually pass through the image.
• In addition to being real or virtual, an image can be erect (oriented the same way as
the object) or inverted (oriented the opposite way from the object).
• For a thin lens, the focal length is given by the lensmaker’s formula:
$$\frac{1}{f} = (n_2 - n_1)\left(\frac{1}{r_1} - \frac{1}{r_2}\right) \tag{12.5}$$
In this expression, n1 is the index of the surrounding medium (typically air, n1 = 1)
and n2 is the index of refraction of the lens itself. r1 (r2 ) is the radius of curvature of
the first (second) surface struck by the ray, with the sign convention that it is positive
(negative) on the side of the lens refracted light is going to (coming from).
The advantage of using diopters as a measure of lens strength is inherent in this
expression, as you can see that the combined strength of the two lensing surfaces
(in diopters) is equal to the sum of the strength of each surface, in diopters. This
extends to any pair of lenses placed close together – the effective strength of two
lenses closely placed (relative to their focal lengths) in front of one another is the
sum of their strength in diopters.
• The simple magnifier is a converging (f > 0) lens placed immediately in front of the
eye. An object placed at its focal point therefore forms a virtual image at infinity that is
automatically brought into focus by the relaxed normal (or vision corrected) eye. The
magnification of the object occurs because one can bring the object closer to the
eye than xnp and still see it clearly, where it subtends a greater angle on the retina
(angular magnification). Its magnification is given by:
$$M = \frac{x_{np}}{f} \tag{12.6}$$
It is very important to understand the simple magnifier, as it forms the eyepiece of
both the microscope and the telescope.
• A telescope is used to view a distant object by making the angle its image subtends
on the retina larger. Two lenses are situated at ends of a tube such that their focal
points are coincident. The first lens (with a long focal length) forms a real image of
the distant object more or less at its focal point. The second lens (with a short focal
length) is used to view this real image as a simple magnifier. This produces a virtual
image at infinity that subtends a greater angle than the original object did, viewable
with the relaxed normal eye.
The overall angular magnification of a telescope is given by:
$$M = -\frac{f_o}{f_e} \tag{12.7}$$
The eyepiece lens can be converging (regular) or diverging (Galilean). In both cases
this formula for the magnification works (provided that one uses a negative fe for the
diverging lens and places the focal point fo at the focal point on the far side of the
diverging lens). A regular telescope inverts the image, which is inconvenient and
undesirable. A Galilean telescope does not invert the image.
• A compound microscope is used to view a very small, but nearby object. It accomplishes
this in two stages. Two short focal length lenses are situated at the ends of a
much longer tube. The tube length ℓ of the microscope is by definition the distance
between the focal point of the first, or objective lens (which must be converging) and
the second, or eyepiece lens. The object is placed just outside of the focal length of
the objective lens in such a way that it forms a magnified, real image of the object
more or less at the end of the tube length. The eyepiece lens is used as a simple
magnifier to view this real image, and can be converging or diverging as was the case
for the telescope. It produces a virtual image at infinity that subtends a greater angle
than the real image formed by the objective lens alone would if viewed at the near
point of the relaxed normal eye.
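The magnification of the objective is:
$$M_o = -\frac{\ell}{f_o} \tag{12.8}$$
and, with the eyepiece acting as a simple magnifier of magnification xnp /fe , the total
magnification is:
$$M_{tot} = -\frac{\ell}{f_o}\,\frac{x_{np}}{f_e} \tag{12.10}$$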
where as before, this formula for the magnification works provided that one uses a
negative fe for the diverging lens and places the real image formed by the objective on
the far side of the diverging lens. A regular microscope inverts the image, which is
inconvenient and undesirable. A “Galilean” microscope does not invert the image.
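As a quick numerical check of the magnification formulas in this summary, here is a minimal Python sketch. The function names and the sample focal lengths are hypothetical, chosen only for illustration. Lengths are in centimeters, and xnp defaults to the typical adult value of 25 cm.

```python
def telescope_magnification(f_o, f_e):
    """Angular magnification of a telescope, M = -f_o / f_e (equation 12.7).

    Use a negative f_e for a diverging (Galilean) eyepiece.
    """
    return -f_o / f_e


def microscope_magnification(tube_length, f_o, f_e, x_np=25.0):
    """Total magnification of a compound microscope (equation 12.10)."""
    return -(tube_length / f_o) * (x_np / f_e)


# Hypothetical instruments, all lengths in cm.
print(telescope_magnification(100.0, 2.0))        # regular telescope: negative, so inverted
print(telescope_magnification(100.0, -2.0))       # Galilean telescope: positive, so erect
print(microscope_magnification(16.0, 0.5, 2.5))   # regular microscope: negative, so inverted
```

The sign of the result carries the orientation: a negative magnification means the final image is inverted.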
Figure 12.1: How the eye sees an object (here, a lamp). Light diverging from points on the
surface of the object is focused onto the retina of the eye, where it forms an image of the
object that the retina converts into neural impulses and your brain converts into perception.
Objects in the real world that are illuminated by diffuse light absorb the light at every
point on their surface and then reradiate (selected colors/frequencies) from each point in
all directions. This is why you can see something that is illuminated from all angles – every
point on its surface emits light reradiated from the illuminating source in all directions so
no matter where you look at it from, some of the light reaches your eye.
To completely understand how your eye can see the object, we have to get halfway
through this week’s work. On the other hand, we can’t understand enough about how
mirrors and lenses work to understand the eye without understanding the eye well enough
to understand how lenses and mirrors work.
Hmmm, a bit of a dilemma. We have to bootstrap just a bit and draw a few pictures
now that you won’t completely understand until later, to help you understand what you need to
understand what you need to understand later. Or something like that.
So meditate on the picture above, which shows light diffusely scattered from a
couple of points on a common object. The light goes in all directions from all of the points
on the surface of the object. Some of these rays reach your eye. There the lens of your
eye does its thing, and forms a nice sharp image of the object cast upon the retina of the
eye. Vision occurs.
[Figure 12.2: an object at distance s from a plane mirror and its image at distance s′.]
Now consider looking at an object in a plane mirror. Lamps are too hard to draw, so we
consider an arrow, which we will use as a “generic object” in our diagrams.
Rays radiated from the object radiate out in all directions as shown in the figure above.
When they strike the mirror they are reflected with the angle of incidence equal to the angle
of reflection. As we look at the mirror, we see the rays that originated on a single point on
the object as if they were diverging from a single point in space. That point is the image of
the point on the object. Since every (visible) point on the object corresponds to an apparent
point of divergence in space from the image, we can see the image exactly as if we were
looking at an object.
In the case of a plane mirror (above) the image is always behind the mirror. The light
rays you see do not actually pass through the image, they simply appear to diverge from
it. We call such an image a virtual image.
We need to define several quantities that will be essential in our analysis of how lenses
and mirrors work. The distance from a mirror (or lens) to an object one is viewing in (or
through) it is s, the object distance. Object distances are positive if the object is on the
side of the mirror (or lens) that the light is coming from. Object distances are obviously
‘always’ positive, unless the object is a virtual object formed out of the image of a previous
mirror or lens, which can be either positive or negative.
The distance from a lens or mirror to the image one is viewing is s′ , the image distance.
Image distances are positive if the image is on the side of the mirror (or lens) that the light
is going to.
Multiple mirrors can be used to create images of images, or images of images of images
(used as “virtual objects” for the second mirror). Most of us have experienced the “infinite
tunnel” of images that results from standing directly in between two plane mirrors.
Figure 12.3: Two mirrors create an image of an image. Only a few of the many rays are
drawn – copy the picture and fill in more yourself.
Plane mirrors simply create a perfect image of everything that is in the real space reflected
in the mirror. Things get more interesting if the mirrors are curved. Curved mirrors can
create images that are systematically larger or smaller than the object, and can create a
new kind of image from the one seen in figure (12.2).
In figure (12.4) we see a concave spherical mirror, which we will also call a converging
mirror or a positive mirror[160]. The horizontal line running through the center of the mirror
is very important and is called the axis of the mirror, which is rotationally symmetric about
this axis. Even imaging an arrow is too complicated for our purpose (which is to figure out
how spherical mirrors can make images at all) so we look for the image of a single point
P, which we locate for convenience on the axis of the mirror.
The image P’ occurs where two reflected rays cross. The two rays in question are the
one that strikes a distance l up the mirror (with angle of incidence equal to the angle of
reflection) and a ray that goes along the axis and is reflected directly back the way it came.
This is a new kind of image – the rays don’t just appear to come from a point in space (a
point that is really in the dark of your closet or medicine cabinet, back behind the mirror)
as they do with a virtual image, they really reach the eye after passing through a point in
space. You could reach out and put your finger through the point in space they appear
to be coming from. We call this kind of image a real image, and we need to be able to
determine whether an image is real (the kind of image that can be projected on a retina,
piece of film, wall, projector screen) or virtual (which cannot be projected at all, since no
light actually passes through the image), so be sure you understand the distinction and
can categorize images you determine from e.g. ray diagrams.
[160] For those who have concave/convex dyslexia, remember that concave is like a cave, and
curves inward, while convex is nothing at all like a vex. What is a vex, anyway?
We begin by making an essential approximation. We will later talk about aberrations of
lenses and mirrors – things that prevent rays from a single point on the object from being
focused to a single point in the image. One
of the most important ones will be spherical aberration – spheres have this annoying habit
of not focussing parallel rays from an object point far from the axis or rays that are near the
axis but that are not approximately parallel to the axis down to a single point in the image.
We can’t have that, so we insist that the rays we will deal with be paraxial – close to the
axis and close to parallel. The former means that we strike the mirror close enough to its
center for us to be able to pretend that the deflection occurs in a (slightly) curved plane;
the latter means that small angle approximations will all work quite well.
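[Figure 12.4: a concave spherical mirror forming an image P′ of an axial point P, with the
lengths l, s, s′ and r and the angles α, β and γ marked.]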
Three important lengths are drawn onto the figure: s, s′ , and r, as well as the distance l
itself. Note well also the four angles: α, β, γ and the angle of incidence/reflection θ. Since
the angles are all small and l is close to a straight line:
$$\alpha \approx \frac{l}{s} \tag{12.11}$$
$$\beta = \frac{l}{r} \tag{12.12}$$
$$\gamma \approx \frac{l}{s'} \tag{12.13}$$
(where the result for β, note well, is exact because l really is the length of a circular arc
that is subtended by the angle β).
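We now play games with the triangles in the picture. We use the following rule several
times: Consider the triangle with α, θ and the angle δ (filled in to figure (12.5)). Its interior
angles sum to π, so α + θ + δ = π, while δ and β lie along a straight line, so δ + β = π.
Hence α + θ = β. The same rule applied to the triangle containing β, θ and γ gives
β + θ = γ.
Figure 12.5: α + θ = β.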
If we eliminate θ, we get:
α + γ = 2β (12.16)
Finally, if we substitute in all of the small angle approximations and cancel l, we get:
$$\frac{1}{s} + \frac{1}{s'} = \frac{2}{r} \tag{12.17}$$
As we move the object back farther and farther from the mirror (let s → ∞) we note that
the image distance approaches r/2. Rays coming from an infinitely distant object arrive at
the mirror parallel and converge at s′ = r/2. We define the point where a lens or mirror
focuses parallel, paraxial rays to be the focal point of the lens or mirror. Thus:
$$f = \frac{r}{2} \tag{12.18}$$
and
$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f} \tag{12.19}$$
This is a very important result! It is the equation we will use to analyze all images formed
by curved mirrors and thin lenses (after we derive the same formula for the latter) so be
sure that you have learned it and understand it.
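The limiting behavior described above is easy to check numerically. The following is a minimal Python sketch, not part of the original derivation. The function name and the radius of 20 cm are hypothetical choices. It solves the mirror equation for s′ and shows s′ approaching f = r/2 as the object recedes.

```python
import math


def image_distance(s, f):
    """Solve 1/s + 1/s' = 1/f for s' (equation 12.19).

    An object at infinity images at the focal point; an object at the
    focal point images at infinity.
    """
    if math.isinf(s):
        return f
    if s == f:
        return math.inf
    return f * s / (s - f)


r = 20.0      # hypothetical radius of curvature, cm
f = r / 2.0   # equation 12.18
for s in (30.0, 100.0, 1.0e6, math.inf):
    print(s, image_distance(s, f))
```

The object at s = 30 cm with f = 10 cm gives s′ = 15 cm, and the image distance settles toward 10 cm as s grows.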
The focal length f of a mirror (or, soon, thin lens) is the distance to the point where incident
parallel rays are focused (for positive focal lengths) or from which they appear to diverge
(for negative focal lengths). f is typically measured in meters (SI) or centimeters (for
convenience). You may have observed that the stronger (more curved!) a mirror is, the
smaller its focal length is. We might like to invent a quantity that expresses this strength
more intuitively, so that a larger value of the quantity corresponds to a more strongly acting
mirror (or, shortly, lens).
The simplest way to accomplish this goal is to express the strength of a mirror by the
inverse of its focal length, a quantity called its power (symbol P ), given in new (SI) units
called diopters (D). That is:
$$P = \frac{1}{f} \tag{12.20}$$
with f in meters. Thus a one diopter (1.00 D) lens or mirror has a focal length of 1 meter.
10.00 D corresponds to a focal length of 0.1 meter. A diverging lens or mirror with a
focal length of −1 centimeter is −100.00 D.
Note that now the relation between power and the effect of the mirror is much more
intuitive: P = 0 describes a flat mirror that doesn’t magnify or shrink the image at all. This
is much better than describing such a mirror with “f = ∞”. There are other advantages to
power expressed in diopters – so much so that we’ll spend an entire section on it later –
but for now let’s just note that it is possible to use the same inverse length units to write the
thin lens/mirror equation above. We’ll define the inverses of object and image distances to
be the two symbols v = 1/s, v′ = 1/s′ , so that[161]:
v + v′ = P (12.21)
expresses a direct (instead of reciprocal) algebraic statement of the mirror equation.
However, this is not necessarily easier to use for the purposes of computation, as one
still (ultimately) has to do the same algebra to e.g. actually compute s′ from a knowledge
of s and f . We will postpone further discussion of diopters until we reach multiple-lens
systems, as this is where they really shine!
[161] Note to experts: Obviously, v and v′ are intended to sound like “vergences” in the
Cartesian description of the lens/mirror equations in the paraxial approximation, but to avoid
confusing introductory students I am not using the sign convention wherein V = −v is
generally negative. If you are not an expert, ignore this footnote!
At this point we have derived a simple equation relating s, s′ and f . The only rule
we have used so far in deriving that equation (which you can easily see holds for plane
mirrors as well) is the law of reflection. We have deduced as a theorem of this the rule that
parallel paraxial rays are diverted by a converging mirror to an image at the focal distance
from the mirror. We now need to take these two rules (and a third that is a restatement
of the second) and use them to construct ray diagrams that permit us to visualize how
a converging or diverging mirror forms an image out of rays diverging from an object.
Constructing such diagrams, and answering a more or less standard set of questions, will
constitute most of the problems associated with this chapter.
To construct our ray diagrams, we need to begin by idealizing spherical mirrors in a way
that “hides” things like the fact that many rays we might wish to image with are not paraxial.
Later in this chapter we’ll deal with many of the aberrations that are features of real lenses
and mirrors as deviations from ideal behavior in the focussing elements themselves or
the light that goes through them, but these will be “corrections” that should not cloud our
perception of how things basically work.
First, when drawing rays in a ray diagram, one always assumes that all deflection by
the lens or mirror occurs in a single plane. This is an idealization, to be sure – the reason
mirrors and lenses focus light is because they are curved, not planar. But paraxial rays by
definition strike close enough to the center that the deviation from planar can be ignored,
and we idealize this to the entire plane.
Given this, the following three rays have rules that can be used to locate images and
compute magnification for any mirror (and eventually, lens):
a) The Parallel Ray: A ray from the object that is parallel to the axis of the mirror is
reflected by the mirror through the focal point.
b) The Focal Ray: A ray from the object that strikes the mirror either through the focal
point or along a line that comes from the focal point is reflected parallel to the axis
of the mirror.
c) The Central Ray: A ray from the object that strikes the mirror in the center is reflected
by the mirror with angle of incidence equal to the angle of reflection which means
that the reflected ray is symmetric across the axis from the incident one.
Now consider the following ray diagrams for various positions of our archetypical arrow
object for converging (+) and diverging (-) ideal mirrors.
[Figure: ray diagram for a converging mirror, showing rays 1 and 2 from the object, the
image, the focal point f, and the distances s and s′.]
Figure 12.7: Transverse magnification can be determined from the two right triangles
formed with the central ray as a hypotenuse.
$$\tan(\alpha) = -\frac{y}{s} = \frac{y'}{s'} \tag{12.23}$$
[Figures: further ray diagrams for a converging mirror, drawn with rays 1, 2 and 3 and
marking the object, the image, f, s and s′.]
We only need to present one diagram for diverging/convex mirrors, as they all have the
same general diagram independent of the relative size of s and f . Note that the first and
second rules are “backwards” compared to converging mirrors. A ray parallel to the axis is
deflected so it appears to be coming from the far side focal point. A ray headed to the far
side focal point is deflected back parallel to the axis. The central ray is drawn as before.
[Figure: ray diagram for a diverging mirror, showing rays 1, 2 and 3, the object, the image,
f, s and s′.]
We apply as always the mirror/thin lens formula, here with f = −10 cm and s = 20 cm:
1/s′ = −1/10 − 1/20 = −3/20 so s′ = −6.67 cm. The magnification is
m = −s′/s = −(−6.67)/20 = 0.33. The image is erect, virtual, and
smaller than the object. All of these general properties will apply (with different numbers)
to any diverging mirror.
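The bookkeeping in this example (find s′, find m, then read off the character of the image from the signs) can be sketched in a few lines of Python. This is only an illustration under the sign conventions used above. The function name and the returned labels are hypothetical, and the second call reuses the same formula for an assumed converging element with f = 10 cm and s = 30 cm.

```python
def analyze_image(s, f):
    """Return (s', m, kind, orientation, size) for an object at distance s.

    Uses 1/s + 1/s' = 1/f and m = -s'/s. Raises ZeroDivisionError when
    s == f, where the image is at infinity.
    """
    s_prime = f * s / (s - f)
    m = -s_prime / s
    kind = "real" if s_prime > 0 else "virtual"
    orientation = "erect" if m > 0 else "inverted"
    if abs(m) > 1:
        size = "larger"
    elif abs(m) < 1:
        size = "smaller"
    else:
        size = "same size"
    return s_prime, m, kind, orientation, size


print(analyze_image(20.0, -10.0))  # the diverging mirror worked above
print(analyze_image(30.0, 10.0))   # an assumed converging case for comparison
```

For the diverging mirror this reproduces s′ ≈ −6.67 cm and m ≈ 0.33, with the image virtual, erect, and smaller.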
If you master drawing these generic diagrams (and can manage the very simple algebra
associated with evaluating e.g. s′ and m given s and f ), you can with patience analyze any
combination of mirrors (and, later, lenses) you are presented with.
12.4: Lenses
A spherical lensing surface between two media with different indices of refraction
is drawn in figure (12.11).
Figure 12.11: Diagram that shows how a spherical lens creates an image via refraction.
As was the case for the mirror, the three angles α, β, and γ in the small angle approxi-
mation can be written as:
$$\alpha \approx \frac{\ell}{s} \tag{12.25}$$
$$\beta = \frac{\ell}{r} \tag{12.26}$$
$$\gamma \approx \frac{\ell}{s'} \tag{12.27}$$
We also have Snell’s law, n1 sin(θ1 ) = n2 sin(θ2 ), which for the (small) angles θ1 and θ2
becomes n1 θ1 ≈ n2 θ2 ,
so
$$\theta_2 = \frac{n_1}{n_2}\,\theta_1. \tag{12.29}$$
Using triangle rules like the ones above, we also get:
θ1 = α + β (12.30)
and
β = θ2 + γ (12.31)
Eliminating θ2 , this becomes:
$$\beta = \frac{n_1}{n_2}\,\theta_1 + \gamma \tag{12.32}$$
If we multiply both sides by n2 and substitute θ1 from the first equation, this becomes:
n2 β = n1 α + n1 β + n2 γ (12.33)
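or
n1 α + n2 γ = (n2 − n1 )β (12.34)
Substituting the small angle expressions for α, β and γ and cancelling ℓ then gives
n1 /s + n2 /s′ = (n2 − n1 )/r for a single refracting surface.
Now consider a thin lens in air whose first surface has a
positive radius of curvature r1 . The second surface has a negative radius of curvature r2 .
The index of refraction of the lens is n.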
Suppose we have an object on the left hand side of this lens at distance s. From the
formula above, we have:
$$\frac{1}{s} + \frac{n}{s'} = (n - 1)\,\frac{1}{r_1} \tag{12.37}$$
The image of the first lensing surface is a virtual object for the second lensing surface.
Because it is virtual (located to the right of the second surface, on the side light is going
to) and because we are going from the material with index of refraction n into air, the
formula for the second lensing surface is:
$$-\frac{n}{s'} + \frac{1}{s''} = (1 - n)\,\frac{1}{r_2} \tag{12.38}$$
If we add these two formulae, the n/s′ terms cancel and we get:
$$\frac{1}{s} + \frac{1}{s''} = (n - 1)\left(\frac{1}{r_1} - \frac{1}{r_2}\right) = \frac{1}{f} \tag{12.39}$$
This is the thin lens formula where s′′ is the final location of the image of the entire lens.
Note that this is identical to the formula for the mirror. The focal length is given by the
lensmaker’s formula:
$$\frac{1}{f} = (n - 1)\left(\frac{1}{r_1} - \frac{1}{r_2}\right) \tag{12.40}$$
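To get a feel for the lensmaker’s formula, here is a minimal Python sketch for a thin lens in air. The function name, the index n = 1.5, and the radii are hypothetical values picked for illustration. A flat surface is represented by an infinite radius of curvature.

```python
import math


def lensmaker_focal_length(n, r1, r2):
    """Focal length of a thin lens in air, 1/f = (n - 1)(1/r1 - 1/r2).

    r1 is positive and r2 negative for a biconvex lens, following the
    sign convention in the text. Use math.inf for a flat surface.
    """
    inv_f = (n - 1.0) * (1.0 / r1 - 1.0 / r2)
    if inv_f == 0.0:
        return math.inf  # no net focusing power
    return 1.0 / inv_f


# Hypothetical lenses, radii in cm.
print(lensmaker_focal_length(1.5, 20.0, -20.0))      # symmetric biconvex
print(lensmaker_focal_length(1.5, 10.0, math.inf))   # plano-convex
print(lensmaker_focal_length(1.5, -20.0, 20.0))      # symmetric biconcave (diverging)
```

Both converging examples come out near f = 20 cm, and swapping the signs of the radii flips the lens to a diverging one with f near −20 cm.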
Figure 12.13: A converging lens with focal length of 10 cm and an object at s = 30 cm.
With the thin lens formula in hand, we can easily adapt exactly the same rules for
drawing ray diagrams for locating images. Let’s draw a simple ray diagram for a converging
and a diverging lens that are similar to the ray diagrams above for mirrors. We do the usual
algebra and arithmetic: 1/s′ = 1/10 − 1/30 = 2/30 so s′ = 15.0 cm, m = −1/2. The final
image is inverted, real, and smaller than the object.
As before, if one puts an object inside the focal length it will make a magnified, erect,
virtual image. If one exchanges the position of object and image in the example above, one
will obtain an inverted, real image that is larger than the object.
A diverging lens, on the other hand, has only one generic diagram to be learned. It
is basically the same as for the mirror, except that rays are transmitted through the thin
lens (with all bending occurring at the thin plane representing the center plane of the lens)
instead of reflected from it. In the situation represented in figure (12.14), the image is
virtual, erect, and smaller than the original object. Show (from the numbers and thin lens
formula) that s′ = −6.67 cm and that m = 1/3.
Figure 12.14: A diverging lens with focal length of −10 cm and an object at s = 20 cm.
We already encountered our first compound lens system made up of two lens surfaces
in our derivation of the thin lens equation above. A similar idea can be used to analyze
systems made up of two (or more) lenses, using the image of the first lens encountered by
light as it passes through the system as the “object” of the second lens, and the image of
the second as object of the third, etc. In a moment, we’ll analyze a few such systems (and
get some practice at using the thin lens and/or mirror equations while we are at it) but first,
let’s learn more about an important concept in optics – that we mentioned briefly in our
discussion of mirrors – that massively simplifies the algebra (and arithmetic!) of multiple
lenses and is the one commonly used in the everyday optics of the glasses used to correct
defects in vision: the diopter.
12.5.1: Diopters
You will have noticed, I’m sure, that the thin lens equation and the mirror equation are both
reciprocal sum equations. As a consequence, if you try to solve them algebraically, you
will always find yourself doing things like (to find s′ given s and f ):
$$\frac{1}{s} + \frac{1}{s'} = \frac{1}{f} \implies \frac{1}{s'} = \frac{1}{f} - \frac{1}{s} = \frac{s - f}{fs}$$
If you are given numbers, say s = 30 cm, f = 10 cm, then:
$$s' = \frac{fs}{s - f} = \frac{300}{20} = 15\ \text{cm} \tag{12.41}$$
That’s not actually terribly difficult, but consider the following. Recall the short discussion
in the section on mirrors where we introduced the notion of the power of a mirror (or
lens) in units of inverse length, so that a flat mirror or lens has power P = 0 instead of
f = ±∞. In the SI system, diopters are equivalent to:
$$1\ \text{diopter (D)} = \frac{1}{1\ \text{m}} = 1\ \text{m}^{-1}$$
although one usually doesn’t express other inverse length quantities (such as wave number)
in diopters – the use of the unit is customary only in geometric optics applications.
Let’s recall the expression of the thin lens/mirror equation(s) in terms of the power of the
lens or mirror (the inverse of its focal length) given in diopters. We previously defined
v = 1/s and v′ = 1/s′ with our usual sign convention so v is positive when the object is
on the side light is coming from and v′ or P are positive when the image distance or focal
length is on the side light is going to. Then the thin lens equation takes the following simple
form:
v + v′ = P (12.42)
For the example above (s = 30 cm = 0.30 m, f = 10 cm = 0.10 m) we have P = 1/0.10 = 10 D
and v = 1/0.30 = 3.33 D, so
v′ = 10 − 3.33 = 6.67 D
and s′ = 1/v′ = 1/6.67 = 0.15 m = 15 cm. It looks like the algebra is simpler, but the actual
arithmetic itself is pretty much a tossup – we can avoid putting things over common
denominators and taking an inverse, but we have to take the inverse of all of the given
quantities instead. So why do we bother with diopters? Why are they the way optometrists and
ophthalmologists prescribe and describe lenses?
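The two routes to s′ compared above can be put side by side in a short Python sketch. The function names are hypothetical, and distances are entered in centimeters and converted so that the inverse quantities come out in diopters. The point is only that both routes give the same image distance.

```python
def diopters(length_cm):
    """Inverse of a length given in cm, expressed in diopters (1/m)."""
    return 100.0 / length_cm


def image_distance_reciprocal(s_cm, f_cm):
    """Route 1: s' = f s / (s - f), as in equation 12.41."""
    return f_cm * s_cm / (s_cm - f_cm)


def image_distance_via_power(s_cm, f_cm):
    """Route 2: v' = P - v in diopters, then invert (equation 12.42).

    Raises ZeroDivisionError when s == f (image at infinity, v' = 0).
    """
    v_prime = diopters(f_cm) - diopters(s_cm)
    return 100.0 / v_prime


print(diopters(10.0), diopters(30.0))          # P and v for the example
print(image_distance_reciprocal(30.0, 10.0))   # route 1
print(image_distance_via_power(30.0, 10.0))    # route 2, same answer up to rounding
```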
It is because of the way we can combine two lenses with different focal lengths to
create the equivalent of a single lens, at least if the two lenses are physically very close
together relative to their focal lengths. Suppose we have two lenses that are physically
very close together as shown in figure 12.15:
[Figure 12.15: two thin lenses with focal lengths f1 and f2 placed very close together.]
Now suppose an object is placed a distance s = s1 to the left of the first lens with focal
length f1 , so that in terms of inverse distances v1 + v1′ = P1 .
The second lens again uses a virtual object made of the image from the first lens (with d
ignored), v2 = −v1′ . Then:
v2 + v2′ = −v1′ + v2′ = P2
and adding the equation for the first lens gives
v1 + v2′ = v + v′ = P1 + P2 = P (12.50)
The algebra (and result) are now easy. The two “touching” lenses together can be replaced
by a single lens with the same total power! This makes it very easy to assemble
a set of lenses that have any desired target power/collective focal length!
[162] For the record, the thin lens equation here should really be written as V + P = V′ , using
the object vergence V = −nv with n the index of refraction of the medium between the object
and the lens (in this case n = 1 for air or vacuum). This has the opposite sign, note well, of
the one I am using for v in this textbook, while the signs of image vergence V′ = nv′ and
power P are unchanged. However, this Cartesian description of optics is usually taught in
more advanced courses in optics that treat e.g. thick lenses and more general optical
elements in arbitrary media in a matrix representation, and swapping a single sign and adding
in a discussion of why it is necessary to multiply by the index of refraction is going to cause
nothing but confusion in an introductory course.
Note that we get the same answer either way we derive it! Either way leads to:
$$P = \frac{1}{f} = \frac{1}{f_1} + \frac{1}{f_2} = P_1 + P_2 \implies f = \frac{f_1 f_2}{f_1 + f_2}$$
if there is ever any real need to find the focal length of the composite lens system. In
most cases there is not! Overall, it is simply a lot more convenient to work with a slightly
more general version of v, v ′ and power when working with complex systems with multiple,
possibly thick, lenses separated by distances too large to neglect.
This is – within an overall sign that isn’t important in this introductory discussion –
the way to “add lenses” using the Cartesian representation of optical elements, a topic
usually treated in higher-level courses in optics. It also neglects what happens when the
two lenses (or lensing surfaces) aren’t very close to each other (so d in the figure above
isn’t “negligibly small”) – so-called thick lenses or lenses separated by large distances
d require (a lot!) more work to handle, and are the motivation for using the Cartesian
matrix representation if you are actually working for an optics company designing multilens
systems. But it is plenty for our purposes in this introductory treatment.
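For thin lenses in contact, the rule “powers add” is simple enough to check in a few lines. The following Python sketch uses hypothetical function names and sample focal lengths, and it assumes the separation d really is negligible, as in the discussion above.

```python
def combined_power(*powers_in_diopters):
    """Total power of thin lenses in contact: P = P1 + P2 + ..."""
    return sum(powers_in_diopters)


def combined_focal_length(f1, f2):
    """Equivalent focal length f = f1 f2 / (f1 + f2) for two lenses in contact.

    Raises ZeroDivisionError when f1 == -f2, where the pair has zero power.
    """
    return f1 * f2 / (f1 + f2)


# Hypothetical pair: f1 = 0.10 m (10 D) and f2 = 0.20 m (5 D).
f1, f2 = 0.10, 0.20
P = combined_power(1.0 / f1, 1.0 / f2)
print(P)                               # total power in diopters
print(1.0 / P)                         # focal length from the total power
print(combined_focal_length(f1, f2))   # same focal length from the product formula
```

Adding a diverging lens is just adding a negative power, for example combined_power(10.0, -2.5).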
The eye is roughly spherical and approximately one inch in diameter. Figure (12.16)
shows its essential anatomy. Here is a brief review of the components of the eye.
• Cornea: The cornea of the eye is the rounded, transparent structure at the front
of the eye. It is strongly curved, and is responsible for most of the bending of light
required to focus images onto the...
• Retina: The retina is the “film” of the eye. It consists of tight bundles of photosensitive
nerves called rods (sensitive to light intensity) and cones (sensitive to intensity in
specific colors). In the center of the retina is the...
• Macula: The macula is the most sensitive part of the retina and is where one “sees”
the object of one’s attention. It is more or less in front of the...
• Optic Nerve: which pipes all of the information transduced from the light image cast
on the retina to the brain. The retina (especially the macula) is very sensitive to light
and easily damaged. To control the amount of light entering the eye, the...
• Iris: The iris is a ring of pigmented tissue that can open or contract to let more or
less light into the...
• Pupil: The pupil is the aperture for light into the eye. When it is dark, the iris opens
and lets all the light possible into the retina (which is very sensitive and capable of
seeing with remarkably little light). When it is very bright, the iris closes down to a pin-
point. This actually increases visual acuity – see the pinhole camera – independent
of the action of the...
• Lens: The lens of the eye is normally in a state of tension maintained by suspensory
ligaments called zonules that keep it flattened out, with a maximally long focal length.
A ring of ciliary muscles surrounding the lens can be contracted, which removes a
part of this tension, predictably bulging the lens and thereby reducing its focal length.
This process is called accommodation.
It is important to understand that accommodation can only reduce the focal length of
the lens, not increase it, as well as the fact that the cornea is responsible for most of the
focal length of the combined system – the actual lens is more of a “correction” to the overall
focal length already achieved by the cornea alone. We now need to understand the three
common conditions that describe the eye.
Figure 12.17: The focal length of the relaxed (combined) lensing action of the eye for a
normal eye, a farsighted eye (hyperopia), and a nearsighted eye (myopia).
The focal point of the relaxed lens of an eye with normal vision is on the retina, so
distant objects (at “infinity” compared to the size of the eye) are automatically in focus (as
a real image cast) on the retina. Given a distance from the cornea to the retina of
roughly 2.5 cm, this means that the strength of the lens of a normal eye is approximately
1/0.025 = 40.00 D. When viewing less distant objects, accommodation shortens the focal
length to bring them into focus on the retina.
The focal point of a relaxed farsighted eye is behind the retina (focal length too long,
strength less than 40.00 D) and is corrected with a converging lens to make up the
difference. If one expresses strength in diopters, one can simply add the strength of a
converging lens in diopters to the strength of the eye to get the “right strength” to make the
combination focus distant objects on the retina with the eye’s lens relaxed. Note that a
hyperopic person can see in focus all the way out to infinity, but they have to use
accommodation to shorten their lens’s “too long” relaxed focal length to see even distant
objects, which can lead to eye fatigue and headaches.
The focal point of a relaxed nearsighted eye is in front of the retina (focal length too short,
strength greater than 40.00 D) and is corrected with a diverging lens to take away some of
its strength. A myopic individual simply cannot see distant objects in focus without a
corrective lens because accommodation cannot increase the focal length of the eye’s lens;
it can only further decrease it.
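Since strengths in diopters simply add, the corrective lens is just the difference between the target strength and the strength of the relaxed eye. Here is a minimal Python sketch of that arithmetic. It assumes the roughly 40 D target quoted above and treats the corrective lens as if it were in contact with the eye. The function name and the sample eye strengths are hypothetical.

```python
NORMAL_EYE_POWER = 40.0  # diopters, from the ~2.5 cm cornea-to-retina distance


def corrective_power(relaxed_eye_power, target=NORMAL_EYE_POWER):
    """Power (in diopters) to add so that eye + lens reaches the target.

    Positive result: converging lens (farsighted eye).
    Negative result: diverging lens (nearsighted eye).
    """
    return target - relaxed_eye_power


# Hypothetical relaxed eye strengths.
print(corrective_power(38.5))   # farsighted: needs a converging lens
print(corrective_power(42.0))   # nearsighted: needs a diverging lens
print(corrective_power(40.0))   # normal: no correction
```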
Accommodation can shorten the focal length only so far, which limits how close an
object can be and still be focused on the retina. The nearest point one can bring an object
to the eye and still bring it into focus on the retina is called the near point of the eye and is
also the distance of most distinct vision, represented by xnp . In most adults, this distance is
around 25 cm (less for small children, longer for the elderly).
A nearsighted person’s lens already has too short a focal length to be able to focus
distant objects on the retina, and accommodation only shortens the focal length still further.
A nearsighted person cannot see anything clearly at distances greater than some point,
called the far point for that person’s eyes. A nearsighted person is one for whom the far
point xfp is less than infinity.
A common aberration of human eyes is a condition called astigmatism. Astigmatism
is what happens when the eye’s lens is not cylindrically symmetric. That is, the focal length
of the lens in the horizontal plane is not the same as the focal length in the vertical plane.
One can then bring things into focus in one dimension with accommodation, but only at the
expense of blurring them in the other. The solution is to wear lenses that are astigmatic in
the opposite direction to add up to neutral (or to the person’s otherwise necessary correction).
As a person’s eyes age, their ability to focus changes. People with once normal vision
can become nearsighted or farsighted. After the age of roughly 50 a new condition often
emerges – that of presbyopia. The collagen of the lens hardens over time. Its flexibility
decreases, making it more difficult for the eye to accommodate and increasing the near
point. This kind of “farsightedness” can occur even for nearsighted individuals. The solution
is to correct with “reading glasses” – positive lenses that permit a presbyopic individual to
read at normal distances. They can be combined into “bifocals” – reading glasses for short
distances plus diverging lenses to correct myopia at long distances – for people with the
latter condition.
The “size” of an object to the human eye is determined by three distinct things. Humans
have binocular vision, and use parallax – the apparent displacement of an object seen from
two slightly different positions – to get a sense of an object’s distance. This is reinforced by
the physiological sense of accommodation, which gives one a sense of relative nearness.
Finally, given the distance, it is determined by the angle the image subtends on the retina.
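[Figure 12.18: an object of height y viewed at the near point xnp , where it subtends the
angle α (top), and viewed through a simple magnifier of focal length f , where it subtends
the angle β (bottom).]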
To see a small thing as clearly as possible, we naturally bring it to the closest point
we can, so its details subtend the largest possible angle when our eyes are maximally
accommodating. In figure (12.18) the top picture shows an object of height y viewed at the
near point. When the image is focused on the retina by the maximally accommodating eye,
it subtends an angle of α, where:
$$\alpha \approx \tan(\alpha) = \frac{y}{x_{np}} \tag{12.51}$$
in the small angle approximation. This is entirely justified because we only “see” detail
with the macula, which in turn only occupies around 0.2 radians in the center of the visual
field. Even if we are examining a larger object, we do so by redirecting the eye to look at it
in patches that cover it in small angle chunks.
To use a simple magnifier we place a converging (f > 0) lens immediately in front of
the eye. The object is placed at its focal point. It therefore forms a virtual image at −∞
that is automatically brought into focus by the relaxed normal (or vision corrected) eye. It
now subtends an angle β on the retina given by:
$$\beta \approx \tan(\beta) = \frac{y}{f} \tag{12.52}$$
The magnification is therefore the ratio of the new angle (with the magnifier) to the angle
without it, when the object is seen at the near point. The magnification of the object occurs
because one can bring the object closer to the eye than xnp and still see it clearly (more
clearly, even, than before given that one does not have to accommodate). Its magnification
is given by:
$$M = \frac{\beta}{\alpha} = \frac{x_{np}}{f} \tag{12.53}$$
It is very important to understand the simple magnifier, as it forms the eyepiece of both
the microscope and the telescope, our next two optical instruments.
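The small-angle argument above can be checked numerically. The following is a minimal sketch, assuming an illustrative near point of 25 cm and a lens of focal length 5 cm (neither number comes from the text); the function name `simple_magnifier` is hypothetical.

```python
def simple_magnifier(x_np_cm: float, f_cm: float) -> float:
    """Angular magnification M = x_np / f, eq. (12.53), object at the focal point."""
    if f_cm <= 0:
        raise ValueError("a simple magnifier needs a converging lens (f > 0)")
    return x_np_cm / f_cm


# Assumed illustrative numbers (not from the text).
x_np = 25.0   # near point in cm
f = 5.0       # focal length of the magnifier in cm
y = 0.1       # object height in cm

alpha = y / x_np   # angle subtended at the near point, eq. (12.51)
beta = y / f       # angle subtended through the magnifier, eq. (12.52)

print("alpha =", alpha, "rad")
print("beta  =", beta, "rad")
print("M = beta/alpha =", beta / alpha)
print("M = x_np/f     =", simple_magnifier(x_np, f))
```

The two ways of computing M agree because the object height y cancels in the ratio of the two angles.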
12.7.2: Telescope
A telescope is an optical instrument used to bring distant objects closer so that you
can see them magnified and much more clearly. In figure (12.19) you can see what a ray
diagram looks like for light from a very distant object entering the naked human eye. The
rays from the originating point, after travelling a long distance, necessarily enter the eye
more or less parallel and are focused by the relaxed normal lens onto the single point on
the retina determined by the central ray entering at angle α.
Figure 12.20: The first lens creates a real image of the distant object.
Figure 12.21: The second lens acts as a simple magnifier to allow this (tiny, inverted) real
image to be viewed at infinity from a point of view much closer than the near point of the
eye.
Figure 12.22: This ultimately creates an angular magnification M = −β/α = −fo /fe .
To magnify our view of this object, we begin by inserting a lens with a long focal length
fo into the optical path. This takes light from the (infinitely) distant object and creates an
inverted real image of it at the focal point as shown in the first panel, figure (12.20)
above. We draw many parallel rays and show them as if they were deflected by the ideal
lens at its plane of refraction. This shows how we can use rays from the image the same
way we would use rays from the original object when this image becomes a virtual object
for the second lens, and pick any ray that is convenient for our purposes of analyzing the
magnification.
This image (virtual object) is “infinitely” smaller than the original object but it has the
advantage of being right there in space in front of the eye, not infinitely distant. We can
therefore examine it quite closely. To do so, we use a second lens as a simple magnifier,
placing it so that the virtual object is at its focal point. This is shown in the second panel,
figure (12.21).
Since the virtual object is at the focal point fe , rays diverging from the virtual object
exit the second lens parallel to the central ray, shown entering at angle β. This bundle
of parallel rays corresponds to a virtual image at (negative) infinity but deflected so that
their angle relative to the central axis is much steeper. We can easily compute the angular
magnification of this telescope by noting that:
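$$\alpha \approx \tan(\alpha) = -\frac{y}{f_o} \tag{12.54}$$
and
$$\beta \approx \tan(\beta) = \frac{y}{f_e} \tag{12.55}$$
so that
$$M = \frac{\beta}{\alpha} = -\frac{f_o}{f_e} \tag{12.56}$$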
In the final panel, figure (12.22), we show what this final image at infinity, coming in
at angle β, looks like when viewed by a human eye. Since the image is infinitely
distant (the rays enter the eye parallel) it can be comfortably viewed with the relaxed
normal lens, which will focus the bundle down to a single point on the retina determined
by the central ray at angle β. Obviously the total angle subtended on the retina is much
larger – the object being viewed appears much larger to the eye and senses. The major
disadvantage of this telescope is that it inverts the image – everything viewed is upside
down and backwards. This makes it a bit tricky to find objects as they move the opposite
way one thinks that they should when viewing them through the telescope.
Figure 12.23: A “Galilean” telescope uses a diverging lens for the eyepiece. This does not
affect the formula for the magnification, but it ensures that the eye sees the distant objects
erect instead of inverted.
This kind of telescope is called a Galilean telescope and is much more convenient to
look through than a regular telescope. As you can see from figure (12.23), the angular
magnification of a Galilean telescope is still:
$$M = \frac{\beta}{\alpha} = -\frac{f_o}{f_e} \tag{12.57}$$
(where now fe < 0 is negative) but parallel rays from the distant object enter the eye after
passing through the telescope in the same angular sense that they enter it when viewed
without the telescope. As before, note that we used a ray that would have passed through
the center of the second lens (and the eye, if the eye were drawn into the figure) in order
to determine the angle at which all of the parallel rays leave the eyepiece lens before entering the
(normal) eye and being focused on the retina.
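As a quick check of the sign convention, here is a minimal sketch that evaluates the angular magnification M = −fo/fe for a converging and a diverging eyepiece. The focal lengths (100 cm and ±2 cm) and the image height are assumed illustrative values, and the function names are hypothetical.

```python
def telescope_angles(y, f_o, f_e):
    """Small-angle alpha and beta for an intermediate image of height y."""
    alpha = -y / f_o   # eq. (12.54)
    beta = y / f_e     # eq. (12.55)
    return alpha, beta


def telescope_magnification(f_o, f_e):
    """Angular magnification M = -f_o / f_e; f_e < 0 for a Galilean eyepiece."""
    return -f_o / f_e


# Assumed illustrative values, all in cm.
f_o = 100.0
y = 0.5
for label, f_e in (("converging eyepiece", 2.0), ("diverging (Galilean) eyepiece", -2.0)):
    alpha, beta = telescope_angles(y, f_o, f_e)
    m = telescope_magnification(f_o, f_e)
    orientation = "inverted" if m < 0 else "erect"
    print(f"{label}: M = {m:+.1f} ({orientation}), beta/alpha = {beta / alpha:+.1f}")
```

A negative M corresponds to the inverted image of the ordinary telescope, while the negative fe of the Galilean eyepiece flips the sign and gives an erect image with the same magnitude of magnification.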
Telescopes (in the hands of Galileo and others) were an instrument that ushered in
the Enlightenment in the seventeenth century, putting an end to several thousand years
of human history where mythology and inexact observations prevented the systematic de-
velopment of a consistent theory of physics. Let’s look at another instrument that had a
revolutionary impact on human society, the microscope.
12.7.3: Microscope
A compound microscope is used to view a very small, but nearby object. It accomplishes
this in two stages. Two short focal length lenses are situated at the ends of a much longer
tube. The tube length l of the microscope is by definition the distance between the focal
point of the first, or objective lens (which must be converging) and the second, or eyepiece
lens.
The objective stage of the magnification occurs as the object is placed on a movable
platform just outside of the focal length of the objective lens of the microscope. The
Figure 12.24: The first magnification stage of a compound microscope brings a small
object just outside of the focal point of the objective lens into focus as a real, magnified
image at the end of the tube length l. By comparing the two dashed similar triangles, one
can see that the first stage magnification is −l/fo.
platform is raised or lowered (altering s, the object distance) until the objective lens forms a
magnified, real image of the object at the end of the tube length as shown in figure (12.24).
The magnification of the objective stage is:
$$M_o = -\frac{\ell}{f_o} = -\frac{f_o + \ell}{s} \tag{12.58}$$
where the first relation is the one actually used, but the second one (based on the obser-
vation that s′ = fo + l) can be used to find the correct object distance s that will accomplish
this.
This real, magnified image can be viewed with the naked eye, but of course the naked
eye can view it no closer than xnp . The second stage of a compound microscope consists
of an eyepiece lens used as a simple magnifier to view this real image in precisely the
same way we used it for the telescope, and can be converging or diverging as was the
case for the telescope. It produces a virtual image at infinity that subtends a greater angle
than the real image formed by the objective lens alone would if viewed at the near point of
the relaxed normal eye.
The magnification of the eyepiece used as a simple magnifier is therefore:
$$M_e = \frac{x_{np}}{f_e} \tag{12.59}$$
which yields an overall magnification for the two stages working together of:
$$M_{tot} = -\frac{\ell}{f_o} \frac{x_{np}}{f_e} \tag{12.60}$$
As we noted and can see in figure (12.26), one can use a diverging lens for the
eyepiece by placing the real image formed by the objective on the far side of the diverging
Figure 12.25: The second magnification stage of a compound microscope brings the highly
magnified image from the objective stage close to the eye by functioning as a simple
magnifier. By bringing the virtual image in from xnp to fe it magnifies it by an additional
factor of xnp/fe.
lens to form a “Galilean” microscope. As before (for the telescope) this microscope does
not invert the image (inversion is inconvenient and undesirable) but otherwise the same
formula works for the magnification provided that one uses a negative fe for the diverging
lens. It has the further advantage of having a slightly shorter overall length.
Typical numbers for a compound microscope might be fo = fe = 1 cm, l = 10
cm, for a total magnification of 250 (inverting or non-inverting). 250x microscopes are
more than adequate to observe e.g. blood cells, bacteria, the cellular structure of plant and
animal tissue, amoeba, paramecium, and a host of microorganisms and cellular structures.
For example, amoeba can range in size from 10-1000 µm (where the latter, note well, is
roughly a millimeter and barely visible to the naked eye). A 250 power microscope can
make an amoeba appear to the eye as large as a 25 cm object, clearly revealing its nucleus
and vacuoles. Even small amoeba or bacteria will appear several millimeters in size at this
magnification.
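The typical numbers quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes a near point of xnp = 25 cm (the value implied by a total magnification of 250, but not stated explicitly in this passage); the function names are hypothetical.

```python
def microscope_magnification(tube_length, f_o, f_e, x_np=25.0):
    """Total magnification M_tot = -(l / f_o) * (x_np / f_e), eq. (12.60).

    All lengths in cm. x_np = 25 cm is an assumed near point.
    """
    m_objective = -tube_length / f_o   # eq. (12.58), first form
    m_eyepiece = x_np / f_e            # eq. (12.59)
    return m_objective * m_eyepiece


def object_distance(tube_length, f_o):
    """Object distance s from -l/f_o = -(f_o + l)/s, eq. (12.58)."""
    return f_o * (f_o + tube_length) / tube_length


f_o = f_e = 1.0   # cm
l = 10.0          # cm

m_tot = microscope_magnification(l, f_o, f_e)
s = object_distance(l, f_o)
print("total magnification:", m_tot)
print("object distance s (cm):", s)

# Apparent size of a large amoeba, 1000 micrometers = 0.1 cm.
print("apparent size of a 0.1 cm amoeba (cm):", abs(m_tot) * 0.1)
```

The object distance comes out only slightly larger than fo, consistent with placing the object just outside the focal length of the objective.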
Just as the telescope caused a revolution in our vision of cosmology and the structure
of the Universe at large distances and over long times, the microscope caused a revolution
in our vision of the world of biology. Disease, which had long been thought of as being
caused by demons or by a curse inflicted on sinners by God, was seen to be caused by
living organisms too small to be seen by the naked eye. Where before the only possible
cure for most diseases was believed to be divine intervention, miracles brought about
by repentance and prayer, the microscope enabled the discovery of antiseptic medicine
– that heat, soap and water, alcohol, and eventually antibiotics kill off disease-causing
microorganisms to prevent or cure disease quite independent of “magic” such as miracles
or prayer. The two together brought about the Enlightenment, a time of intense discovery
and invention that ultimately ushered in the rational modern world of today.
Figure 12.26: A “Galilean” microscope uses a diverging lens for the eyepiece. This does
not affect the formula for the magnification, but it ensures that the eye sees the tiny objects
erect instead of inverted. As always, we use a “central” ray for the second lens that is
deflected at the plane of the first lens as if it passes through both lenses to find the location
and size of the final image.
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
for a spherical concave mirror as seen in class. Remember, this involves drawing a picture
of an object that is a point on the axis of the mirror and the rays that locate its point-image,
then doing some work with triangles and the small angle approximation.
Problem 3.
Produce ray diagrams for both lenses and mirrors for all permutations of the following
data: f = 10 cm. f = −10 cm. s = 10, 20, 40, 60 cm. In all cases locate the image (give s′ ),
find the magnification m, and indicate whether the image is erect or inverted and real or virtual.
Problem 4.
$$m_l = \frac{\Delta s'}{\Delta s} = \frac{s'^2}{s^2} \tag{12.61}$$
I’d “suggest” that you think about your friend, the binomial expansion, when solving this
problem. Is the image “inverted”?
Problem 5.
The human eye is the primary optical instrument. Draw a normal eye, a nearsighted
eye, and a farsighted eye, showing the location of the relaxed-eye focal length in all three
cases. Draw them a second time with the appropriate corrective lenses, showing with
simple rays how they work to fix the problem(s).
Problem 6.
A fish’s eye has a focal length of 1 cm in water (which is just the distance from the lens
to the fish’s retina, of course). Is its focal length in air longer or shorter? Don’t just answer
with a guess – you need to make a complete argument based on the lens-maker’s formula
or Snell’s law directly, supported by pictures. Is the fish nearsighted or farsighted in air?
Conversely, if you open your eyes underwater (and have normal vision in air) are you
nearsighted or farsighted?
Problem 7.
Draw ray diagrams and derive the magnification for: The standard telescope and the
Galilean telescope (one with an eyepiece lens with a negative focal length). Show that the
latter permits one to view the final image at infinity erect instead of inverted.
Problem 8.
Draw ray diagrams and derive the magnification for: The standard microscope (with
tube length ℓ) and the “Galilean” microscope (one with an eyepiece lens with a negative
focal length). Show that the latter permits one to view the final image at infinity erect
instead of inverted.
Problem 9.
a) Draw a ray diagram for the simple magnifier, deriving its (angular) magnification in
the standard picture.
b) Solve for where one has to locate the object to form a virtual, erect image at the near
point of the eye xnp as viewed through the magnifier.
c) What is the overall (angular) magnification of the image now (with the image located
at xnp )?
Problem 10.
From the previous problem, you saw that if one places the object viewed with a simple
magnifier at a position that isn’t exactly at the focal point of the lens, one can achieve a slightly
greater angular magnification (at the expense of having to use accommodation in order to
view the final image at the near point of the eye instead of at infinity). Both the microscope
and telescope above use the eyepiece lens as a simple magnifier to view a real image.
Based on your result, by roughly what fraction do you think you can increase their
effective magnification if you locate the final image at the near point of the eye? Note
that you can solve this problem by redoing the diagrams and computation of the overall
magnification, or by using your result from the previous problem to estimate the fractional
increase in magnification in terms of xnp and fe . Both will help you understand everything
better.
Week 13: Interference and
Diffraction
• Note well that waves do not travel in straight lines when they pass around obstacles
or through holes in obstacles that are of the same general order
of size as the wavelength or less! Waves are perfectly happy travelling around
corners (as anyone who has ever watched water waves in a lake or the ocean will
attest).
• The coherence time τcoh of a typical hot source (such as a light bulb) is anywhere
from a few tens to hundreds of periods.
• Two Slit/Point Source Interference: If one has two coherent, monochromatic sources
that are within one another’s coherence length (typically very narrow slits that are
illuminated by a single source of plane waves) then the intensity received by a distant
(compared to slit spacing and wavelength) screen is given by:
$$I(\theta) = 4 I_0 \cos^2(\delta/2)$$
where
$$\delta = k d \sin(\theta)$$
is the phase difference between the light waves from the two slits. In this expression,
I0 is the central maximum light intensity from either of the two slits/sources alone.
163
Wikipedia: http://www.wikipedia.org/wiki/Coherence (physics). This is a lovely review article on coherence
times and lengths that goes far beyond the remarks below.
• One can easily find the angles θ where maxima and minima in this interference pat-
tern occur.
Heuristically: The maxima occur where the path difference between the two slits,
d sin(θ), equals an integer number of wavelengths (so the light from the two slits/sources
arrives at the screen in phase). The minima occur where the path difference contains
a half integral number of wavelengths, so the light arrives at the screen exactly out
of phase.
By Inspection or Calculus: By inspection, the maxima in the expression for I(θ)
above occur when cos(δ/2) = ±1 and the minima occur when cos(δ/2) = 0. Alter-
natively, one can differentiate it with respect to δ and set the derivative equal to zero
and solve for δ for max’s or min’s that way.
Either path leads one to:
$$d \sin(\theta) = m\lambda \qquad \text{(maxima)}$$
$$d \sin(\theta) = \left(m + \frac{1}{2}\right)\lambda \qquad \text{(minima)}$$
with m = 0, ±1, ±2, ±3....
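The conditions for maxima and minima can be checked against the intensity directly. The sketch below assumes the two-slit intensity I(θ) = 4 I0 cos²(δ/2) with δ = kd sin(θ), and uses assumed illustrative values for the slit spacing (0.10 mm) and wavelength (500 nm); the function names are hypothetical.

```python
import math


def phase_difference(theta, d, wavelength):
    """delta = k d sin(theta), with k = 2 pi / wavelength."""
    return 2.0 * math.pi / wavelength * d * math.sin(theta)


def two_slit_intensity(theta, d, wavelength, i0=1.0):
    """Two-slit intensity 4 I0 cos^2(delta / 2) for very narrow slits."""
    return 4.0 * i0 * math.cos(phase_difference(theta, d, wavelength) / 2.0) ** 2


def maximum_angle(m, d, wavelength):
    """Angle of the order-m maximum: d sin(theta) = m lambda."""
    return math.asin(m * wavelength / d)


def minimum_angle(m, d, wavelength):
    """Angle of the order-m minimum: d sin(theta) = (m + 1/2) lambda."""
    return math.asin((m + 0.5) * wavelength / d)


# Assumed illustrative values (SI units).
d = 0.10e-3
wavelength = 500e-9

for m in range(3):
    t_max = maximum_angle(m, d, wavelength)
    t_min = minimum_angle(m, d, wavelength)
    print(f"m = {m}: theta_max = {t_max:.5f} rad, I = {two_slit_intensity(t_max, d, wavelength):.3f} I0;"
          f" theta_min = {t_min:.5f} rad, I = {two_slit_intensity(t_min, d, wavelength):.3e} I0")
```

At each maximum the intensity is four times that of a single slit, and at each minimum it vanishes to within floating point rounding.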
• N-slit Interference: When there are multiple slits, the waves from all of them arrive
in phase at the screen when:
$$\delta = k d \sin(\theta) = \frac{2\pi}{\lambda} d \sin(\theta) = m(2\pi)$$
or
$$d \sin(\theta) = m\lambda$$
for m = 0, 1, 2.... At these principal intensity maxima the field amplitude is N times
the amplitude of a single slit, so that the intensity is:
$$I = N^2 I_0$$
• If we use phasors to search for heuristic minima and secondary maxima, we find
that we get (zero) minima when the phasors form a closed N-gon. This occurs when:
$$\delta = n \frac{2\pi}{N}$$
for (note well!) $n = \cancel{0}, 1, 2, \ldots, N-1, \cancel{N}, N+1, N+2, \ldots, 2N-1, \cancel{2N}, 2N+1, \ldots$ The
crossed out numbers represent places where δ is an integer multiple of 2π, but those
are where the principal maxima occur, not another minimum! Secondary maxima
will occur approximately half way in between these minima, when:
$$\delta = \left(n + \frac{1}{2}\right) \frac{2\pi}{N}$$
for (note well!) $n = \cancel{0}, 1, 2, \ldots, N-1, \cancel{N}, N+1, N+2, \ldots, 2N-1, \cancel{2N}, 2N+1, \ldots$ Finding
the exact angles for the maxima, however, requires solving a transcendental formula
as there is a small trade off between unwinding the phasors a bit and the resultant
length.
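The phasor picture lends itself to a direct numerical check: add N unit phasors, each rotated by δ relative to its neighbor, and square the length of the resultant. The following is a minimal sketch with N = 4 chosen purely for illustration; the function name is hypothetical.

```python
import cmath
import math


def n_slit_intensity(delta, n_slits, i0=1.0):
    """Intensity from N equal phasors with successive phase difference delta."""
    resultant = sum(cmath.exp(1j * k * delta) for k in range(n_slits))
    return i0 * abs(resultant) ** 2


N = 4  # assumed illustrative number of slits

# Principal maxima: delta an integer multiple of 2 pi, intensity N^2 I0.
print("delta = 0      :", n_slit_intensity(0.0, N))
print("delta = 2 pi   :", n_slit_intensity(2.0 * math.pi, N))

# Minima: delta = n (2 pi / N) for n = 1 .. N-1, where the phasors close into an N-gon.
for n in range(1, N):
    print(f"delta = {n}(2 pi/N):", n_slit_intensity(n * 2.0 * math.pi / N, N))

# Approximate secondary maxima, half way between adjacent minima.
for n in range(1, N - 1):
    delta = (n + 0.5) * 2.0 * math.pi / N
    print(f"delta = ({n} + 1/2)(2 pi/N):", n_slit_intensity(delta, N))
```

The secondary maxima come out much dimmer than the principal maxima, which is the behavior the heuristic argument predicts.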
If this criterion is satisfied, there is a resolvable dip in intensity in between the two
separate maxima. If the two maxima are any closer, there is just one broad central
maximum and one cannot tell that the images of the two source points or wavelengths
are distinct (that is, one cannot tell that there are two source points there at all from
the image).
• The Diffraction Grating: If one illuminates N slits with the distance between adjacent
slits d (such that all N slits are within the coherence length of the light) then
different wavelengths in the light source have principal maxima at different angles for
any given order. This can be used to perform experimental spectroscopy and invert
the observation as a measurement of the wavelengths of the light in the source. From
the discussion of N-slit interference, we know that the principal maxima are brightened
by a factor of N² relative to the light from a single slit and that these maxima
occur at the angle(s) where:
$$d \sin(\theta) = m\lambda$$
for m = 0, 1, 2....
• The resolving power R of a diffraction grating depends on the order of the maximum.
In the small angle approximation,
$$R = mN = \frac{\lambda}{\Delta\lambda_{min}}$$
or
$$\Delta\lambda_{min} = \frac{\lambda}{mN}$$
so that resolution improves (closer wavelengths can be resolved) with both the number
of slits and the order of the maxima being resolved.
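To get a feel for the numbers, the sketch below uses R = mN = λ/Δλmin to estimate how many slits are needed to separate two closely spaced lines. The example wavelengths are an assumption chosen for illustration: the sodium doublet, at approximately 589.0 nm and 589.6 nm. The function names are hypothetical.

```python
import math


def required_resolving_power(wavelength, delta_wavelength):
    """R = lambda / delta_lambda needed to just resolve two nearby lines."""
    return wavelength / delta_wavelength


def min_slits(wavelength, delta_wavelength, order):
    """Smallest whole number of slits N with m N >= lambda / delta_lambda."""
    return math.ceil(wavelength / (delta_wavelength * order))


# Assumed example: sodium doublet, wavelengths in nm (approximate values).
lam_1, lam_2 = 589.0, 589.6
lam_mean = 0.5 * (lam_1 + lam_2)
d_lam = lam_2 - lam_1

print("required R:", required_resolving_power(lam_mean, d_lam))
for m in (1, 2, 3):
    print(f"order m = {m}: at least {min_slits(lam_mean, d_lam, m)} slits")
```

Going to a higher order reduces the number of illuminated slits required in proportion, which is the trade described above.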
$$I(\theta) = I_0 \left(\frac{\sin(\phi/2)}{\phi/2}\right)^2$$
where the phase angle φ = ka sin(θ) and where θ is, as usual, the angle from the
center of the slit to the point on the screen. The phase angle φ can be thought of as
the phase difference between light from the first Huygens radiator on one side of the
slit and light from the last Huygens radiator on the other side of the slit, the difference
accumulated across the width of the slit.
• A simple heuristic (described in the text) can be used to show that minima occur in
this “diffraction pattern” (the intensity function given above) when:
a sin(θ) = mλ
• In between the minima given at these exact angles are secondary maxima of strictly
descending intensity at the approximate angles:
$$a \sin(\theta) = \left(m + \frac{1}{2}\right)\lambda$$
As was the case for N-slit interference secondary maxima, however, finding the exact
angles of the secondary maxima requires the solution of a transcendental equation and
not a formula as simple as this.
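The difference between the approximate and exact secondary maxima can be seen numerically. This is a minimal sketch that assumes the single slit intensity I = I0 (sin(φ/2)/(φ/2))² with φ = ka sin(θ), and locates each maximum with a simple ternary search between adjacent minima (a numerical stand-in for solving the transcendental equation, not a method taken from the text). The function names are hypothetical.

```python
import math


def single_slit_intensity(phi, i0=1.0):
    """Single slit intensity I0 (sin(phi/2) / (phi/2))^2."""
    if phi == 0.0:
        return i0
    half = phi / 2.0
    return i0 * (math.sin(half) / half) ** 2


def argmax_unimodal(f, lo, hi, iterations=100):
    """Ternary search for the maximum of a function with a single peak on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


# Minima sit at phi = 2 pi m, i.e. a sin(theta) = m lambda.
for m in (1, 2, 3):
    phi_exact = argmax_unimodal(single_slit_intensity, 2.0 * math.pi * m, 2.0 * math.pi * (m + 1))
    print(f"m = {m}: approximate a sin(theta)/lambda = {m + 0.5},"
          f" numerical = {phi_exact / (2.0 * math.pi):.4f},"
          f" I/I0 = {single_slit_intensity(phi_exact):.4f}")
```

The first secondary maximum lands near a sin(θ) ≈ 1.43λ rather than exactly 1.5λ, so the simple formula is close but slightly too large.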
• Combined Interference and Diffraction: If one takes (e.g.) two slits, each of width
a, separated by a distance d > a and illuminated by light with wavelength λ, the
intensity on a distant screen is given by:
$$I(\theta) = 4 I_0 \cos^2(\delta/2) \left(\frac{\sin(\phi/2)}{\phi/2}\right)^2$$
The resulting intensity is the usual two slit interference pattern, modulated by the
so-called “diffraction envelope” of each slit independently.
of course, magnify objects almost without bound as far as geometric optics is con-
cerned, but at some point diffraction makes further magnification pointless because
neighboring source points in the field of view are no longer resolvable according to
the Rayleigh criterion at any greater magnification.
The angle of the first minimum (dark ring around the central maximum) produced
by a given wavelength of light is determined by the formula:
$$D \sin(\theta_{min}) = 1.22\lambda$$
where D is the diameter of the circular aperture of the optical instrument. It is beyond
the scope of this course to derive this, but it is “reasonable” as an approximation of
the single slit result above. In almost all cases, we are only interested in using this when
the angles involved are very small, in which case we can write:
$$\theta_{min} = 1.22 \frac{\lambda}{D}$$
• The Rayleigh criterion for wave-optic resolution with an optical instrument is then
simply that the angle between the two source points as they enter the first lens of the
microscope or telescope must exceed the angle to the first minimum of either one,
or:
$$\alpha_{incidence} > \theta_{min} = 1.22 \frac{\lambda}{D}$$
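A short calculation shows how the Rayleigh criterion is applied in practice. The sketch below assumes an illustrative telescope with a 10 cm aperture observing in 550 nm light, and asks whether two points on the Moon (taken to be roughly 3.84 × 10⁸ m away) can be told apart; these numbers and the function names are assumptions, not values from the text.

```python
def rayleigh_angle(wavelength, diameter):
    """Small-angle first minimum of a circular aperture: theta_min = 1.22 lambda / D."""
    return 1.22 * wavelength / diameter


def resolvable(separation, distance, wavelength, diameter):
    """True if two source points subtend more than theta_min at the aperture."""
    alpha_incidence = separation / distance   # small-angle approximation
    return alpha_incidence > rayleigh_angle(wavelength, diameter)


# Assumed illustrative values (SI units).
wavelength = 550e-9      # green light
diameter = 0.10          # 10 cm aperture
moon_distance = 3.84e8   # approximate Earth-Moon distance

theta_min = rayleigh_angle(wavelength, diameter)
print("theta_min (rad):", theta_min)
print("smallest resolvable separation on the Moon (m):", theta_min * moon_distance)
for separation in (2.0e3, 4.0e3):
    print(f"{separation:.0f} m apart -> resolvable: "
          f"{resolvable(separation, moon_distance, wavelength, diameter)}")
```

Doubling the aperture diameter halves θmin, which is why larger telescopes resolve finer detail regardless of the magnification used.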
• Thin Film Interference: Light that strikes a thin transparent partially reflective film
on top of a second reflective medium can interfere with itself provided that the film is
thin enough that the total path difference between light reflected from the first versus
the second surface is inside the coherence length of the light. Thin film interference
is what makes soap bubbles and a drop of oil on water on dark pavement swirl with
odd pastel colors.
• To understand this, note that when light reflects from an interface between a medium
with a lower index of refraction (source) and a medium with a higher index of refraction
(destination) the reflected wave inverts (shifts its phase by π or a half-wavelength).
When light reflects from an interface between a medium with a higher
index (source) moving towards a lower index (destination) the reflected wave does
not invert its phase. Note that we learned precisely these rules for wave pulses reflected
from the interface between light string and heavier string or vice versa in the
first part of this course.
• Second, the transmitted light that is partially reflected and partially transmitted at
the first surface of the thin film has to travel to the second surface through the film
(typically a distance given as d, not to be confused with the distance between two
slits above) and then back to the first surface again, where the wave that is partially
transmitted here recombines with the original reflected wave. The light that went
into the film thus travels an (approximate) additional distance of 2d, and we can use
the heuristic rule above to determine whether or not we get constructive interference
• Let n1 < n2 < n3 or n1 > n2 > n3 , where by convention we will use 123 to indicate
the order of the media in the direction of the incoming light. Then there are either
two phase shifts of π (first case) or no phase shifts of π (second case) at the two
reflecting surfaces of the middle layer, and the phase difference is due only to the
path difference in the film medium with index of refraction n2 . The heuristic rule
is then:
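$$2d = m\lambda' = m \frac{\lambda}{n_2} \qquad \text{(maxima)}$$
$$2d = \left(m + \frac{1}{2}\right)\lambda' = \left(m + \frac{1}{2}\right) \frac{\lambda}{n_2} \qquad \text{(minima)}$$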
with m = 0, ±1, ±2, ±3... as usual. Note well the use of λ′ = λ/n2: the path
difference in the medium must contain an integer number of wavelengths for the
reflected light that emerges back into n1 to be in phase.
• A special result occurs when d ≪ λ. In this case there is “no” path difference, and
the waves emerge in phase for all wavelengths. The surface becomes “shiny”. You
can observe this when a drop of oil spreads out on water on dark pavement – at first
there are many colors and then the surface takes on a silvery grey sheen.
• Let n1 < n2 > n3 or n1 > n2 < n3 . Then there is only one phase shift of π at the first
surface (first case) or one phase shift of π at the second surface (second case), and
the total phase difference is that from the path difference plus an additional phase
of π. This is equivalent to half a wavelength difference. The heuristic rule then
reverses:
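$$2d = \left(m + \frac{1}{2}\right)\lambda' = \left(m + \frac{1}{2}\right) \frac{\lambda}{n_2} \qquad \text{(maxima)}$$
$$2d = m\lambda' = m \frac{\lambda}{n_2} \qquad \text{(minima)}$$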
• A second special result occurs when d ≪ λ. In this case there is “no” path difference,
and the waves emerge exactly out of phase by π for all wavelengths. The surface
becomes perfectly non-reflective, hence transparent. You can observe this when a
soap bubble has persisted long enough for most of its water to evaporate – as it
becomes thinner than the wavelengths of visible light, it becomes almost perfectly
transparent and invisible. This is also used to make nonreflective coatings for glass
and lenses to maximize their light transmission.
Several weeks ago we learned about harmonic waves, solutions to the wave equation of
the general form (in one dimension):
$$\vec{E}(x, t) = E_0 \hat{e} \sin(kx - \omega t) \tag{13.1}$$
where ê is a unit vector in the direction of the wave’s polarization. Waves spreading out
spherically symmetrically in three dimensions from a source with radius a have a similar
form:
$$\vec{E}(r, t) = E_0 \frac{a}{r} \hat{e} \sin(kr - \omega t) \tag{13.2}$$
(where $|\vec{E}(a, t)| = E_0$ is the field strength at the surface of the source for this component
of the polarization). Recall also that we only need to write the electric field strength be-
cause the associated magnetic field has an amplitude of B0 = E0 /c, is in phase, and is
perpendicular to the electric field so that the Poynting vector:
$$\vec{S} = \frac{1}{\mu_0} \vec{E} \times \vec{B} \tag{13.3}$$
points in the direction of propagation. Finally, don’t forget that the (time averaged) intensity
of the wave is:
$$I_0 = \langle |\vec{S}| \rangle_{av} = \frac{1}{2\mu_0} E_0 B_0 = \frac{1}{2\mu_0 c} E_0^2 \tag{13.4}$$
We also learned about Huygens’ principle, which states that each point on a wavefront
of a propagating harmonic wave acts like a spherical source for the future propagation
of the wave (this will prove to be a key idea in understanding interference and diffraction
of waves that pass through slits); the superposition principle, which says that to find the
total field strength at a point in space produced by waves from several sources we simply
add the field strengths from all the sources up; and one of the ideas underlying Snell’s law,
that the wavelength of a wave of a given fixed frequency depends on the index of refraction
of the medium through which it propagates according to:
$$\lambda' = \frac{\lambda}{n} \tag{13.5}$$
where λ is the wavelength in free space; the wavelength of a wave is shorter in a medium
with an index of refraction greater than 1 so that the wave slows down. All of these things
that we have already learned will be important in our development of interference and
diffraction.
In addition to these old concepts, we will require one or two new ones. One is the idea
of a hot source. A hot source is something like the hot filament of a light bulb, the hot flame
of a candle, the hot gasses on the surface of the sun, all so hot that they glow and give off
light. Even the gasses in a relatively cool fluorescent tube are “hot” in the sense we wish
to establish, as the atoms that are giving off the light are very weakly correlated with one
another.
Although we’ve seen that Maxwell’s equations in free space become the electromagnetic
wave equation (so that light is plausibly an electromagnetic wave) we haven’t spent much
time considering how light arises in the first place, how charges can end up emitting elec-
tromagnetic waves. The bulk of our understanding came from thinking about a Lorentz
model atom – an electric dipole moment that harmonically oscillates, producing an electric
field that propagates and oscillates, inducing its companion magnetic field as it goes to
produce a wave.
That’s pretty much how it (classically) goes, so this isn’t a bad thing. We also get elec-
tromagnetic radiation (usually at radio frequencies) if we make a magnetic dipole moment
oscillate in time, for example by putting an alternating current into an antenna consisting of
N circular turns of wire, but radiation from atoms is predominantly electric dipole radiation.
The only “catch” is that the radiation is a quantum process and hence only comes out of
the atoms in particular frequencies and “all at once” instead of continuously and at varying
frequencies as we might expect classically.
There are two general kinds of sources we need to be concerned with when dealing
with electromagnetic waves and superposition leading to interference and diffraction: Co-
herent and Incoherent. These are both relative terms – no causal, periodic source of
electromagnetic waves is perfectly coherent or perfectly incoherent (it would have to be
periodic over an infinite amount of time to manage this, which seems infinitely unlikely in a
“messy” Universe), and ultimately source coherence is thus described by a real number
that can vary over some range.
A source is said to be coherent if:
b) The waves emitted by these sources are ideally harmonic, that is, their phase temporally
accumulates as ωt for the fixed frequency ω and with a constant additional
phase, if any.
In this wave I have illustrated two common sources of incoherence. One is a frequency that
isn’t really constant in time but e.g. slowly varies in such a way that it has some constant
average value, e.g.
$$\omega_{avg} = \lim_{T \to \infty} \frac{1}{T} \int_0^T \omega(t)\, dt \tag{13.8}$$
that is, it might be approximately constant over a time that is long compared to a period of
the wave, perhaps several thousands or millions of those periods, but on shorter times it
might vary within some range. This variation might be caused by e.g. thermal fluctuations
in the source, by thermal Doppler shifting of a sharp natural frequency in a gas, or by still
other things (including humans, who amplitude or frequency modulate a carrier wave to
encode information).
In nature, not even quantum sources have infinitely sharp frequencies, so even “monochro-
matic” light is only approximately monochromatic or monochromatic within some band-
width or range165, and the variation over longer time scales may be sufficient to cause
temporal interference (beats) instead of the spatial interference we will examine in this
chapter when waves that follow different paths from a common source are recombined.
The other source of incoherence is the phase angle φ(t). We recall that when we
solved the wave equation we could add an arbitrary phase constant to the argument of the
harmonic wave and we’d still have a harmonic wave. Basically, that constant simply indi-
cated when we “started our clock”, and we could more or less choose to use a sine wave
or cosine wave with no phase at all by starting our clock appropriately when examining or
describing the wave.
The problem is that for many sources, especially hot sources, this clock gets reset
whenever the oscillators that are producing the wave are physically disturbed or re-energized
(the oscillation necessarily damps out over time as the energy in the oscillator is radiated
into the electromagnetic field). There is no reason to expect that the phase of the oscillator
producing the light will be constant over time indefinitely. Indeed, we rather expect the
opposite!
The simplest model for “hot source” incoherence is that of phase interruption. We
imagine a sample of some element that is hot enough so that when an atom collides with
a neighbor it excites some particular oscillator state with a fixed frequency and a phase
determined by the time of the collision. It then radiates monochromatic light with a phase
and polarization direction determined by the time and angle of that collision. Eventually,
however, the atom collides again, and although the same oscillator state is re-excited and
164
And hence, of course, not perfectly harmonic or monochromatic! Students who have taken more advanced
math can understand this in terms of the Fourier transform of the wave above, which will not be a Dirac delta
function of any single frequency but rather will involve a band of frequencies around a peak at ωavg . This
in turn takes us back to the discussion of amplitude modulated waves from the AC Circuits chapter above
compared to frequency modulated waves that can also be used to carry encoded information. Deep waters
underlie these simple concepts.
165
We speak of “line broadening” and the “natural width” of spectral lines to acknowledge or quantify this.
576 Week 13: Interference and Diffraction
light of the same frequency emerges, it has a (discretely) different phase and direction of
polarization!
In this (most common) case, the hot “monochromatic”166 source is temporally phase
coherent only for the mean time between collisions, which in turn depends on things like
the density of the material and its temperature. Although our mental picture of “collisions”
is simplest to envision for a fluid like a liquid or gas, related (e.g. phonon based) events
also phase interrupt the wavetrains emitted by hot solids, and again there is a characteristic
average time between such phase interruption events.
The effect of these phase interruptions is such that when adding the electric fields of
two completely incoherent sources, no interference or spatial diffraction is observed to
occur – the intensities of the different sources simply add because the fields themselves
add for a few cycles, then cancel for a few cycles, then add, then cancel, in such a way that
the average energy transmitted smooths out and just adds. Temporal incoherence over
long time scales destroys spatial interference patterns and replaces them with mere
average intensity addition167 ! This is very important – it is the reason we don’t see
interference patterns all the time, e.g. why windowpanes and drinking glasses don’t exhibit
thin film interference like that discussed below! Whenever we add two harmonic waves to
get a harmonic wave as a result, we are implicitly assuming coherence.
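The claim that incoherent intensities "just add" can be checked with a small numerical experiment. The sketch below is a toy model under my own assumptions: two unit-amplitude waves of the same frequency, where the relative phase of the second is either held at zero (coherent) or re-drawn at random after every five periods (a crude stand-in for phase interruption). The function name, the interval length and the sample counts are all hypothetical choices.

```python
import math
import random

def mean_intensity(n_intervals, phase_for_interval, periods=5, samples=20):
    """Time-average of (E1 + E2)^2 for two unit-amplitude harmonic waves.

    The relative phase of E2 is held fixed within each interval of
    `periods` full oscillations and chosen by phase_for_interval(j).
    """
    total, count = 0.0, 0
    for j in range(n_intervals):
        phi = phase_for_interval(j)
        for i in range(periods * samples):
            wt = 2 * math.pi * i / samples
            e = math.sin(wt) + math.sin(wt + phi)
            total += e * e
            count += 1
    return total / count

rng = random.Random(0)                 # fixed seed so the run is repeatable
single = 0.5                           # time-average of sin^2 for one source
coherent = mean_intensity(200, lambda j: 0.0)
incoherent = mean_intensity(2000, lambda j: rng.uniform(0.0, 2 * math.pi))

print(coherent / single)               # 4: fields add, then get squared
print(incoherent / single)             # close to 2: intensities simply add
```

The coherent pair gives four times the single-source intensity, while the phase-interrupted pair averages out to very nearly twice the single-source intensity, with no trace of the cross term that produces interference.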
Hot sources are thus coherent, but only over a comparatively short time. We use the
heuristic arguments above to define the time over which a hot source (or any source) will
remain coherent – the coherence time: τcoh . For most hot sources in the visible band of
frequencies, the coherence time is on the order of a few tens to hundreds of optical periods.
A reasonable round number might be:
\[
\tau_{\rm coh} \approx 10^{-12}\ \text{seconds} \tag{13.9}
\]
(given frequencies in the range of $10^{14}$ to $10^{15}$ cycles per second).
Light, of course, doesn’t travel very far in such a short time. We can define the coher-
ence length of light as the distance light travels in the coherence time:
\[
L_{\rm coh} = c\,\tau_{\rm coh} \approx 10^{-4}\ \text{meters} \tag{13.10}
\]
In all of the text below, we will therefore assume that all of the relevant length scales
(such as the maximum path difference in interference problems) are smaller than 0.1 millimeter,
or 100 microns. For slit separations or film thicknesses much larger than this,
interference will generally be washed out by the random phase shifts associated with hot
sources.
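As a quick arithmetic check on these order-of-magnitude estimates, the following Python sketch computes the coherence length and the number of optical periods in one coherence time. The function names and the sample frequency of 5 × 10^14 Hz are assumptions chosen for illustration.

```python
C_LIGHT = 2.998e8  # speed of light in m/s

def coherence_length(tau_coh):
    """Distance light travels during one coherence time, in meters."""
    return C_LIGHT * tau_coh

def periods_in_coherence_time(tau_coh, frequency):
    """Number of optical periods that fit into one coherence time."""
    return tau_coh * frequency

tau = 1.0e-12                                    # seconds, the round number used above
L_coh = coherence_length(tau)
n_periods = periods_in_coherence_time(tau, 5.0e14)

print(L_coh)      # a few times 1e-4 m, i.e. a fraction of a millimeter
print(n_periods)  # a few hundred periods for mid-visible light
```

The result is a few tenths of a millimeter, the same order of magnitude as the round number quoted for the coherence length.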
Coherent sources in the range of frequencies that we might generally call “radio waves”
of all sorts are common as dirt in our society. Every device that transmits energy and
information over a carrier frequency to a remote receiver relies on the coherence of the
transmitted wave to permit information to be encoded on top of that wave.
Coherent sources in the optical regime are correspondingly rare and for all practical
purposes there is just one source of coherent optical radiation – the laser. The laser is
166
In quotes because the Fourier transform of a harmonic wave with random phase interruption is no longer
sharp or monochromatic.
167
All of this is proven in more advanced mathematical treatments.
nearly unique as a source of monochromatic coherent light. Lasers typically have coher-
ence lengths measured in meters. Lasers are so coherent that light from two different
lasers produces a stable interference pattern. Laser light can be split and sent along two
very different path lengths and still interfere. This is the basis of laser holography 168 , the
ring laser gyroscope169 and laser interferometry170 .
All other sources of visible light generally rely on atoms to produce the actual light, most
often atoms that are hot, hot enough to glow as they thermally bounce off of each other
at high speed, exciting various electric “oscillators” in their quantum structure. The sun
is a very hot source (surface temperature around 5778 K). Incandescent bulbs produce
light from a hot tungsten filament that is joule heated to some 3600 K. Fluorescent bulbs
operate much cooler – the optimum bulb temperature is around 313 K (40 °C or 104 °F) –
but are still “hot” in the sense of thermally random and chaotic.
Finally, one of the most recent developments in electrical lighting is the increasing
prevalence of light emitting diodes (LEDs) as commercially important sources of light.
LEDs actually operate at room temperatures and are so efficient that their temperature
generally doesn’t greatly exceed the ambient temperature – nearly all of the energy deliv-
ered to them emerges as light. LEDs are usually more or less monochromatic, emitting
light at particular wavelengths determined by the quantum properties of the semiconduc-
tors that make up the diode. In this they are almost identical to solid state diode-based
lasers, except in the one important regard – they are still “hot” incoherent sources.
Pay careful attention to coherence as you work through interference and diffraction
below. Remember, even hot (monochromatic) sources will usually produce interference
when the light being summed is within the mutual coherence time/length of the light source
in question, and even white light from hot sources – as a mixture of many frequencies that
are all coherent over similar Lcoh – can be locally sufficiently coherent to support e.g. thin
film interference in all of the colors/frequencies independently.
The unifying idea of this entire chapter is then: Monochromatic coherent light from some
source follows two (or more) different paths to reach a detector (e.g. – an eye, a screen
observed by an eye, a piece of film, a photoelectric detector). Along the way it accumulates
phase differences between the waves due to the different path lengths that they follow (and
possibly other things such as reflection that introduce phase shifts discretely along the
way). The electric (and magnetic) fields then recombine, and the intensity of the resulting
electromagnetic field is registered by the detector.
Provided that the maximum path differences involved are less than the coherence
length Lcoh of the light, we will then have to repeatedly evaluate below sums such as
168
Wikipedia: http://www.wikipedia.org/wiki/Holography. This is actually a fascinating topic and a great thing
for someone seeking an extra credit project to try out. It does, however, require a laser, film and a darkroom,
and a very, very solid/motionless lab bench to use as a base, and probably won’t work the first time you try it.
169
Wikipedia: http://www.wikipedia.org/wiki/ring_laser_gyroscope.
170
Wikipedia: http://www.wikipedia.org/wiki/interferometry.
which two narrow slits have been cut. Each slit is so narrow that it acts like a “point”
Huygens radiator. Light from one slit (the upper) travels a long distance and falls on a
distant screen. Light from the lower slit travels this distance plus the additional distance
d sin(θ) to arrive at the same point.
Figure 13.1: Two narrow slits act as Huygens radiators when incident plane wavefronts fall
upon them. Light from the two slits is coherent and in phase as it leaves the slits, but
arrives at P with a phase difference that depends on the path difference.
As long as the distance D between the two slits and the screen is much larger than d, the
distance between the slits themselves, the angle θ between the horizontal line shown
and both paths to the point of observation P is the same (although this is not visibly the
case in the figure, where D is not sufficiently large compared to d). The condition d ≪ D is
called the Fraunhofer condition and must be contrasted with the Fresnel condition, which
evaluates interference patterns “close to” the slits where the simplifying Fraunhofer condi-
tion does not hold. Fresnel patterns can “easily” be evaluated as well, but the evaluation
requires methodology that is beyond the scope of this course.
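One way to see how good the Fraunhofer approximation is: compare the exact path difference between the two slits with the approximate d sin(θ). The Python sketch below assumes a particular geometry of my own choosing (slits at heights +d/2 and −d/2, the screen a distance D away, and the observation point at height y with θ measured from the midpoint between the slits); the function names and the numbers are illustrative only.

```python
import math

def exact_path_difference(D, d, y):
    """r_lower - r_upper for slits at heights -d/2 and +d/2, screen point at height y."""
    r_upper = math.hypot(D, y - d / 2)
    r_lower = math.hypot(D, y + d / 2)
    return r_lower - r_upper

def fraunhofer_path_difference(D, d, y):
    """The far-field approximation d*sin(theta), theta measured from the midpoint."""
    sin_theta = y / math.hypot(D, y)
    return d * sin_theta

def relative_error(D, d, y):
    exact = exact_path_difference(D, d, y)
    return abs(exact - fraunhofer_path_difference(D, d, y)) / exact

# Screen 1 m from slits that are 0.1 mm apart: Fraunhofer condition holds.
far_error = relative_error(D=1.0, d=1.0e-4, y=0.05)
# Screen only two slit separations away: it does not.
near_error = relative_error(D=2.0e-4, d=1.0e-4, y=1.0e-4)

print(far_error, near_error)
```

With the screen a meter away the two expressions agree to far better than a part in a million, while with the screen only two slit spacings away they differ at the percent level, which is where the Fresnel treatment would be needed.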
Light from the top slit travels a distance r to arrive at point P . Light from the bottom slit
travels a distance r + ∆r = r + d sin(θ) to arrive at the point P . r ≥ D and d sin(θ) ≤ d,
so r ≫ ∆r. We can therefore find the total electric field at P by adding the electric fields
produced by each slit. Let us call the amplitude of the electric field produced by a single
source in the center of the screen E0 . Then the total field at point P is:
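\begin{align}
E_{\rm tot}(P) &= E_0 \frac{D}{r} \sin(kr - \omega t) + E_0 \frac{D}{r + \Delta r} \sin(kr + k\Delta r - \omega t) \nonumber \\
&= E_0 \frac{D}{r} \sin(kr - \omega t) + E_0 \frac{D}{r} \left(1 + \frac{\Delta r}{r}\right)^{-1} \sin(kr + k\Delta r - \omega t) \nonumber \\
&= E_0 \frac{D}{r} \sin(kr - \omega t) + E_0 \frac{D}{r} \left(1 - \frac{\Delta r}{r} + \ldots\right) \sin(kr + k\Delta r - \omega t) \nonumber \\
&= E_0 \frac{D}{r} \sin(kr - \omega t) + E_0 \frac{D}{r} \sin(kr + k\Delta r - \omega t) + O\left(\frac{\Delta r}{r}\right) \nonumber \\
&\approx E_0 \sin(kr - \omega t) + E_0 \sin(kr - \omega t + \delta) \tag{13.12}
\end{align}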
so E0 D/r ≈ E0 for both sources. Obviously this will not hold for large θ (angles pointing
out at the edges of a large screen stretching to infinity on the horizon), nor will it hold
if the screen is close to the two slits (where Fresnel interference or diffraction must be
considered, which is a lot more work and beyond the scope of this course although answers
there are certainly computable). In the last equation we also introduce the phase shift
produced by the path difference:
\[
\delta = k\Delta r = kd\sin(\theta) = \frac{2\pi d}{\lambda}\sin(\theta) \tag{13.14}
\]
To add these two waves, we could use a trigonometric identity for sin A + sin B. Un-
fortunately, nobody can ever remember the trig identities for things like this (supposedly
memorized back in high school), including me. For those of us who find it impossible to
remember arbitrary things we memorized out of any context where they would be useful
to us for more than busy work, it behooves us to learn how to derive the answer in simple
ways from things we can remember and that make sense in context. We therefore eschew
the use of a trig identity and derive the result from a geometric picture, a phasor diagram
just as we did before for e.g. LRC circuits.
Figure 13.2: Phasor diagram for the addition of the electric field components of two slits.
In figure (13.2) we see the requisite phasor geometry. The light from the first slit has
a field amplitude of the y-component of a “vector” (phasor) of length E0 at angle kr − ωt
with respect to the x-axis. The light from the second slit is the y-component of a phasor of
length E0 at angle kr − ωt + δ. The field amplitude of the sum is the y-component of the
phasor that is the vector sum of these two phasors, added by putting the tail of the second
at the head of the first. Since the triangle representing this sum is isosceles it is easy to see
that the two acute angles must both be δ/2 (see footnote 172). The total amplitude is thus the sum of the
adjacent side lengths of the two right triangles formed by dropping a normal as shown:
\[
E_{\rm tot} = E_0\cos(\delta/2) + E_0\cos(\delta/2) = 2E_0\cos(\delta/2)
\]
We don’t actually care about the field strength, of course – we care about the intensity.
The time-averaged intensity of light from a single slit at the point P is:
\[
I_0 = \frac{1}{2\mu_0 c} |E_0|^2 \tag{13.17}
\]
(from the Poynting vector, as we have seen many times at this point). The total intensity
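from the pair of slits is therefore:
\[
I_{\rm tot} = \frac{1}{2\mu_0 c} \left|2E_0\cos(\delta/2)\right|^2 = 4I_0\cos^2(\delta/2)
\]
The minima occur at the angles where $\cos(\delta/2) = 0$, that is, where
$\delta/2 = \pm(2m+1)\pi/2$ for $m = 0, 1, 2, \ldots$,
or
\[
\delta = \frac{2\pi d}{\lambda}\sin(\theta) = \pm(2m + 1)\pi \tag{13.20}
\]
or the actual angles θ where:
\[
d\sin(\theta) = \pm\frac{2m + 1}{2}\lambda \tag{13.21}
\]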
The intensity is zero at the minima.
The maxima occur at the angles where $\cos^2(\delta/2) = 1$,
or
\[
\delta/2 = \frac{2\pi d}{2\lambda}\sin(\theta) = \pm m\pi \tag{13.23}
\]
or the actual angles θ where:
\[
d\sin(\theta) = \pm m\lambda \tag{13.24}
\]
The intensity is 4I0 at the maxima.
172
The argument goes as follows: “δ plus the obtuse angle at the vertex of the triangle form a straight line
and hence add up to π. The angles in the triangle also add up to π. Therefore the two
acute angles have to add up to δ. The triangle is isosceles, so they must be equal, hence they are each δ/2.”
This is why geometry is better than algebra or trig – proving this algebraically is nearly impossible without the
use of complex variables and with trig identities it is difficult and requires knowing the relevant identity.
The minima and maxima occur at precisely the angles that agree with our heuristic
rule from above. We heuristically expect a constructive interference maximum when the
path difference d sin(θ) contains an integer number of wavelengths, and this is exactly
what we get. We heuristically expect a minimum if light from the lower slit travels half
a wavelength farther than light from the upper one, or three half wavelengths farther, or
five half wavelengths farther, and that’s exactly what we get. It’s always nice when our
intuitive, heuristic expectations are confirmed by the actual algebra of the solution. It gives
us confidence that the latter is correct.
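If you would like the computer to confirm the same thing, the following Python sketch adds the two phasors numerically and compares the result with the geometric answer 2E0 cos(δ/2) for the amplitude and 4 cos²(δ/2) for the intensity in units of I0. Representing a phasor as a complex number is purely a bookkeeping convenience of mine here (the length of the complex sum is the length of the vector sum), and the slit spacing and wavelength are made-up sample values.

```python
import cmath
import math

def two_slit_amplitude(delta, E0=1.0):
    """Length of the sum of two phasors of length E0 separated by angle delta."""
    return abs(E0 + E0 * cmath.exp(1j * delta))

def two_slit_intensity_ratio(d, lam, theta):
    """I_tot / I_0 for two narrow slits, using delta = (2 pi d / lam) sin(theta)."""
    delta = 2 * math.pi * d * math.sin(theta) / lam
    return 4 * math.cos(delta / 2) ** 2

# The phasor sum agrees with the geometric result 2 E0 |cos(delta/2)|.
for delta in (0.0, 0.7, math.pi / 2, 2.0, 5.0):
    assert abs(two_slit_amplitude(delta) - 2 * abs(math.cos(delta / 2))) < 1e-12

# Hypothetical numbers: slits 5 microns apart, 500 nm light.
d, lam = 5.0e-6, 5.0e-7
theta_max = math.asin(1 * lam / d)      # path difference of one wavelength
theta_min = math.asin(0.5 * lam / d)    # path difference of half a wavelength

print(two_slit_intensity_ratio(d, lam, theta_max))  # essentially 4
print(two_slit_intensity_ratio(d, lam, theta_min))  # essentially 0
```

A path difference of a whole wavelength gives the full 4I0, and a path difference of half a wavelength gives darkness, just as the heuristic rule says.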
Figure 13.3: Three narrow slits, equally spaced a distance d > λ apart, are illuminated
by monochromatic light that is coherent over distances long with respect to both d and λ
to produce an interference pattern on a distant screen. Note well that the path difference
between any adjacent pair of slits is d sin(θ).
In the case of three narrow slits, each separated by the same distance d (illustrated in
figure 13.3), we can follow a more or less identical procedure to find the overall amplitude
from a phasor diagram and square it to find the intensity on the screen in terms of the
intensity produced by a single slit. We can also begin the process of identifying general
rules for finding the angle and amplitude (at least approximately) of important features of
the interference pattern produced, rules that will work for four, five, or indefinitely many slits.
As before we will assume that Fraunhofer conditions hold: the screen is “far” (compared
to d and λ) from the slits, and either we will confine our attentions only to angles that are
near the center of the screen or we will consider the screen to “wrap around” the slits in
the shape of a cylinder so that it is all an equal distance from the central slit173 .
173
Not that we couldn’t explicitly include the effect of r’s gross variation with angle, especially if we pro-
grammed a computer to do the tedious arithmetic for us, but this course isn’t about doing hard arithmetic –
seriously, stop laughing – it is about ideas and the idea of interference can be perfectly well understood and
quantitatively analyzed with these simplifications, idealizations, and approximations.
Figure 13.4: Phasor diagram for general solution for three slits. Note that the amplitude of
the sum of the three phasors is Etot = E0 + 2E0 cos(δ).
where δ = kd sin(θ) is the phase angle produced by the path difference between any two
adjacent slits174 . Examining figure (13.4) we see that the general result is:
\[
E_{\rm tot} = E_0 + 2E_0\cos(\delta)
\]
and we rather expect that the interference pattern intensity will be:
\[
I_{\rm tot} = \frac{1}{2\mu_0 c} |E_{\rm tot}|^2 = I_0\left(1 + 4\cos(\delta) + 4\cos^2(\delta)\right) \tag{13.27}
\]
which equals 9I0 when δ = 0, 2π, 4π... and equals I0 when δ = π, 3π, 5π.... It seems as
though it will equal zero for certain values of the phase angle as well, but how can we
determine which ones?
To answer this last question and find a more general way of determining the pattern of
maxima and minima for 3 slits (and later for more) we turn back to the phasor diagram.
Consider the four diagrams drawn in figure (13.5):
Clearly, we get a principal maximum whenever the three phasors line up (for simplicity
the figures are shown at a time that kr −ωt = 0) for a total field amplitude of 3E0 . This obvi-
ously occurs when δ = 0, but it can also correspond to δ = 2π, 4π, 6π... – rotating any field
phasor through 2π puts it back where it started. We conclude that this arrangement leads
174
Note well that the angles in the corners of the symmetric trapezoid can be seen to equal δ by reasoning
out loud: “δ plus π/2 plus α add up to π because they make a straight line. Inside the bottom triangle, α plus
π/2 plus the unknown angle in the corner at the origin add up to π because it is a triangle. Therefore the
bottom angle must be δ.”
And you thought high school jommetry wasn’t good for anything...
Figure 13.5: Phasor diagrams illustrating principal maxima, minima (at δ = 2π/3 and δ = 4π/3),
and secondary maxima (at δ = π) in the interference pattern. Note that we get minima when the
three phasors close to form a three-sided polygon or 3-gon (a.k.a. an equilateral triangle in
this case). In between the minima we get maxima, but the secondary maxima are much weaker than
the principal maxima that occur when light from all three slits arrives in phase because
d sin(θ) = mλ.
to a maximum in intensity with Ip = 9I0, called a principal maximum of the interference
pattern, when the condition:
\[
\delta_{\rm principal\ max} = \frac{2\pi}{\lambda} d\sin(\theta) = 0, \pm 2\pi, \pm 4\pi, \ldots = \pm 2\pi m \qquad m = 0, 1, 2, \ldots \tag{13.28}
\]
is satisfied.
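If we divide by 2π and multiply by λ, we see that this corresponds to:
\[
d\sin(\theta) = \pm m\lambda
\]
just as before for two slits separated by d, so that the angles for principal maxima are:
\[
\theta_m = \pm\sin^{-1}\left(\frac{m\lambda}{d}\right)
\]
Minima occur when the three phasors, added head to tail, close to form a closed
figure, that is, an equilateral triangle175 . The two triangles in the figure above thus represent
phase angles that lead to minima.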
We observe that we close these triangles when:
\[
\delta_{\rm min} = \frac{2\pi}{3} \quad \text{or} \quad \frac{4\pi}{3} \tag{13.31}
\]
or these angles with any integer multiple of 2π added (or subtracted). If we multiply this out
and turn it into a rule, it becomes:
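\begin{align}
\delta_{\rm min} = kd\sin(\theta) &= \frac{2\pi}{3}, \frac{4\pi}{3}, \frac{8\pi}{3}, \frac{10\pi}{3}, \frac{14\pi}{3}, \ldots \nonumber \\
\frac{2\pi}{\lambda} d\sin(\theta) &= \frac{2\pi}{3}, \frac{4\pi}{3}, \frac{8\pi}{3}, \frac{10\pi}{3}, \frac{14\pi}{3}, \ldots \nonumber \\
d\sin(\theta) &= \frac{m\lambda}{3} \qquad m = \otimes, 1, 2, \otimes, 4, 5, \otimes, 7, 8, \ldots \tag{13.32}
\end{align}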
Note that this is almost the integer multiples of 2π/3 (where 3, recall, is the number
of slits – hmmm, one wonders if this rule generalizes...). However, we have to skip the
multiples of 2π/3 that are also multiples of 2π because we already know that the multiples
of 2π are principal maxima. I remind you of this by putting ⊗’d out holes in the m-sequence
in the final result. We’ll continue this practice in the next section.
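The three-slit results so far are easy to spot-check numerically. The Python sketch below adds three unit phasors at angles 0, δ and 2δ (complex numbers are again just my bookkeeping device for the vector sum) and compares the squared length with 1 + 4cos(δ) + 4cos²(δ). The function names are hypothetical.

```python
import cmath
import math

def three_slit_intensity_ratio(delta):
    """I_tot / I_0 from the squared length of the sum of three unit phasors."""
    total = 1 + cmath.exp(1j * delta) + cmath.exp(2j * delta)
    return abs(total) ** 2

def three_slit_formula(delta):
    """The closed form 1 + 4 cos(delta) + 4 cos^2(delta)."""
    return 1 + 4 * math.cos(delta) + 4 * math.cos(delta) ** 2

# Phase angles 2*pi*m/3, skipping the multiples of 2*pi (m = 0, 3, 6, ...).
minima = [2 * math.pi * m / 3 for m in range(1, 9) if m % 3 != 0]

print(three_slit_intensity_ratio(0.0))                    # principal maximum: 9
print(three_slit_intensity_ratio(math.pi))                # secondary maximum: 1
print(max(three_slit_intensity_ratio(x) for x in minima)) # essentially 0
```

The phasor sum and the formula agree, and the intensity really does vanish at every δ = 2πm/3 that is not itself a multiple of 2π.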
Finally, consider the last phasor diagram, which corresponds to a secondary maximum.
If we set:
δsecondary max = π, 3π, 5π... (13.33)
then this phasor diagram results. Although at the moment there isn’t any compelling reason
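to see why (there will be shortly) let’s write this as:
\[
\delta_{\rm secondary\ max} = \frac{2\pi m}{2} = \pi m \qquad m = \otimes, 1, \otimes, 3, \otimes, 5, \ldots
\]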
Now let’s look at one more particular case (just to be sure) and then generalize the above
results. We will not worry (in this course) about actually finding the explicit total electric
field and squaring it and factoring it to find the intensity for more than two slits, even though
I derived the intensity for three just to show you one way that it can be done. Instead
we will (continue to) focus on just finding the angles of the principal maxima, the minima,
and approximately finding the angles of the secondary maxima. In all cases we will be
graphically “evaluating”:
\[
E_{\rm tot} = \sum_{n=0}^{N-1} E_0 \sin(kr - \omega t + n\delta)
\]
where δ = kd sin(θ) is the phase angle produced by the path difference between any two
adjacent slits in a set of N slits.
Figure 13.6: Phasor diagrams showing principal maxima (amplitude 5E0), minima (at
δ = 2π/5, 4π/5, 6π/5 and 8π/5), and secondary maxima (near δ = π/2, π and 3π/2) for five
slits. The amplitudes of the secondary maxima aren’t exactly E0 (or equal) and the angles
aren’t exactly at δ = 2πm/(N − 1) (for N = 5) but this is close enough for an excellent
semi-quantitative graph of the intensities (and our heuristic understanding).
To see exactly how the results generalize, let’s draw the phasors for one more set of
slits, this one with N = 5, in figure 13.6. That should be plenty for us to infer a rule and
understand how diffraction gratings (our next subject) and single slit diffraction (the one
after that) work.
Note the following features, described in terms of the general rules that they represent:
a) Principal maxima have field amplitude of N E0 (for N = 5) when the field phasors “all
line up”. They do so whenever the phase angle δ is an integer multiple of 2π. Clearly
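this result (which held for N = 2 and 3 as well) is general. Thus for all N we find:
\[
\delta_{\rm principal\ max} = kd\sin(\theta) = \pm 2\pi m \qquad m = 0, 1, 2, \ldots
\]
or:
\[
d\sin(\theta) = \pm m\lambda
\]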
Principal maxima occur when the light from all of the slits arrives at the point of
observation in phase, which in turn happens when the path travelled by light from any
two adjacent slits differs by an integer number of wavelengths. This makes perfect
sense.
Note well that the series doesn’t continue indefinitely – the largest m that contributes
is one where:
\[
\theta_m^{\rm principal\ max} = \sin^{-1}\left(\frac{m\lambda}{d}\right) \tag{13.38}
\]
exists, so mλ/d has to be less than or equal to 1. This condition constrains all of the
other series (below) as well, just as it did for 2 or 3 slits.
b) Minima occur when the N -gon formed by the amplitudes closes (forming pentagons
or five pointed stars in the N = 5 case). The angles δ where these minima occur
clearly form the series:
\[
\delta_{\rm min} = \frac{2\pi m}{5} \qquad m = \otimes, 1, 2, 3, 4, \otimes, 6, 7, 8, 9, \otimes, \ldots \tag{13.39}
\]
where I’ve ⊗’d out the values m = 0, 5, 10, .... We have to skip those in the series
because e.g. 10π/5 = 2π, and we already know that δ = 2π is a principal maximum.
Clearly this generalizes to:
\[
\delta_{\rm min} = \frac{2\pi m}{N} \quad \text{for} \quad
m = \otimes, 1, 2, \ldots, N-1, \otimes, N+1, N+2, \ldots, 2N-1, \otimes, 2N+1, \ldots \tag{13.40}
\]
c) In between any pair of adjacent, isolated minima, a smooth function must have a
maximum. We therefore expect that in between each adjacent pair of minima enumerated
above, there must be a maximum. The principal maxima have already been
enumerated, but there also exists a whole list of secondary maxima. These occur
as the “chain” of E-field vectors twists around in between closed N -gons, and occur
close to (but not exactly at) where the (N − 1)-gon closes, leaving a single “dangling”
E0 at the end. If one evaluates the maxima more carefully (using calculus) one finds
that they aren’t exactly at the (N − 1)-gon angles, and don’t have the exact length E0 ,
but they are all close to these angles and lengths and we’ll consider this to be “good
enough” to help us draw a semi-quantitatively correct graph of the intensity.
This was illustrated in the 5-slit example above as:
\[
\delta_{\rm secondary\ max} = \frac{2\pi m}{4} = \frac{\pi m}{2} \qquad m = \otimes, 1, 2, 3, \otimes, 5, 6, 7, \otimes, \ldots \tag{13.41}
\]
where we note that we again have to skip the values of m that would lead to a δ that
is an integer multiple of 2π, and generalizes to:
\[
\delta_{\rm secondary\ max} = \frac{2\pi m}{N-1} \qquad m = \otimes, 1, 2, \ldots, N-2, \otimes, N, N+1, \ldots \tag{13.42}
\]
and so on.
These rules are more than sufficient to allow us to draw a qualitatively correct graph
both of the intensity produced by 5 slits and a “generic” graph for “N” slits (where of course
we have to pick some large but finite number to illustrate).
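The three rules can be turned into a few lines of Python and tested against a brute-force phasor sum. This is only a sketch under my own naming: `n_slit_intensity_ratio` adds N unit phasors (as complex numbers, a convenience and not part of the graphical method), while `minima_phases` and `secondary_max_estimates` just list the phase angles from the rules for minima and (approximate) secondary maxima.

```python
import cmath
import math

def n_slit_intensity_ratio(N, delta):
    """I_tot / I_0: squared length of the sum of N unit phasors spaced by delta."""
    total = sum(cmath.exp(1j * n * delta) for n in range(N))
    return abs(total) ** 2

def minima_phases(N, m_max):
    """delta = 2 pi m / N, skipping the m that are multiples of N."""
    return [2 * math.pi * m / N for m in range(1, m_max + 1) if m % N != 0]

def secondary_max_estimates(N, m_max):
    """Approximate secondary maxima: delta = 2 pi m / (N - 1), skipping multiples of N - 1."""
    return [2 * math.pi * m / (N - 1) for m in range(1, m_max + 1) if m % (N - 1) != 0]

N = 5
print(n_slit_intensity_ratio(N, 0.0))                               # N^2 = 25
print([round(n_slit_intensity_ratio(N, x), 12) for x in minima_phases(N, 4)])
print([round(n_slit_intensity_ratio(N, x), 12) for x in secondary_max_estimates(N, 3)])
```

For N = 5 the principal maximum is 25I0, the minima come out as zero to rounding error, and at the (N − 1)-gon angles the intensity is I0, the single "dangling" E0 squared. The true secondary maxima sit close to those angles and are a bit taller, which is all we need for a semi-quantitative graph.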
You might wonder why we are spending so much time looking at interference through
multiple slits, when we hardly ever run into problems involving interference through just two
slits while shopping at the mall. There are two simple reasons. The first is that interference
from many closely spaced slits is the basis for the diffraction grating, which in turn is the
basis for modern spectrographs. Spectrographs are optical instruments used to identify
e.g. atoms and molecules from their “signature” optical spectra, and are the basis for
much of what we know of the Universe. For example, we know that the physical laws
governing very distant stars (and hence being observed today in their distant
past due to the speed of light delay) are pretty much identical to the laws we observe today!
This may sound silly, but this is an enormously important result. If things like the
gravitational constant G, the electric permittivity ǫ0 , the magnetic permeability µ0 , the speed
of light c – constants of nature, as it were – weren’t constant over time frames of billions
of years, it would radically alter our perceptions and understanding of the Universe we find
ourselves apparently living in. Instead we find that no matter how far away or how far back
in time we look, the spectra of atoms in stars are pretty much the same, something that
actually tests many of the constants of nature all at once. The physics governing those
stars there, then, seems the same as the physics we learn and use today.
Of course spectrographs are also useful throughout science and technology in a strictly
mundane way. We have many occasions to wish to identify a material, and if we heat al-
most anything until it glows and then examine its light with a spectrograph, we can instantly
identify at least all of the elements in the sample and their relative abundance, if not the
molecules made up of those elements. Chemistry, engineering, and a variety of physical
sciences use this capability every day, using machines that have more or less automated
the process. It does seem wise for us to learn at least in general how this works, and what
limits the resolution and accuracy of the process.
The second place understanding the interference of “many” slits will aid us is in boot-
strapping our understanding of diffraction itself. There a mix of Huygens principle and our
knowledge of N -slit interference will let us quickly come to understand how a single “wide”
slit can produce an intensity pattern, cast on a distant screen, that is the result of part of
the light passing through the slit interfering with the rest, a wave interfering with itself. In
the next two sections we will therefore apply the concepts we have learned for 2, 3, ..., N
slits, beginning with N -slit interference for large N straight up, the diffraction grating.
Consider now a diffraction grating – basically an opaque material with many transparent
narrow slits inscribed through the opacity, each separated from its neighbor by a distance
d. We will imagine this grating to be normally illuminated by polychromatic light (with
many frequencies/wavelengths) in such a way that N of the slits produce outgoing waves that
recombine coherently at the screen, where in application the screen is indeed wrapped
around in a cylinder at a distance that is large compared to d > λ (for any λ in the visible
band).
As we saw in the previous section, the angles at which the primary maxima occur are
determined only by the distance d such that:
\[
\theta_m^{\rm max} = \sin^{-1}\left(\frac{m\lambda}{d}\right) \tag{13.43}
\]
independent of N – indeed, they are at the same angles for 2 slits as they are for 2000.
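As a small illustration, the Python sketch below lists every order m for which a maximum exists (mλ/d ≤ 1) along with its angle. The function name and the sample grating (slit spacing of 2 microns, 589 nm light) are assumptions made up for the example. Note that N does not appear anywhere in it.

```python
import math

def principal_maxima_angles(d, lam):
    """Return (m, angle in degrees) for every order m >= 0 with m*lam/d <= 1."""
    m_max = int(d // lam)
    angles = []
    for m in range(m_max + 1):
        s = min(1.0, m * lam / d)      # guard against rounding just above 1
        angles.append((m, math.degrees(math.asin(s))))
    return angles

# Hypothetical grating: d = 2 microns, illuminated with 589 nm light.
d, lam = 2.0e-6, 589.0e-9
orders = principal_maxima_angles(d, lam)
for m, angle in orders:
    print(m, round(angle, 2))
```

For these numbers only orders 0 through 3 exist, and the higher orders land at large angles where the small angle approximation would be a poor one.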
What changes as we increase the number of slits is the location of the minima and
the secondary maxima in between. Consider the two minima that “bracket” each primary
maximum. Again borrowing results from the previous section, we can see that they should
occur at:
\[
\theta_m^{\rm min} = \sin^{-1}\left(\frac{n\lambda}{Nd}\right) \tag{13.44}
\]
for the particular values:
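\begin{align}
n_1 &= N \pm 1 \nonumber \\
n_2 &= 2N \pm 1 \nonumber \\
&\ldots \nonumber \\
n_m &= mN \pm 1 \nonumber \\
&\ldots \tag{13.45}
\end{align}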
where the index nm can (as you can see) take on two values for each m, one for the
minimum immediately before, the other for the minimum immediately after the mth principal
maximum:
\[
n_m = mN - 1,\ mN + 1 \qquad m = 1, 2, 3, \ldots \tag{13.46}
\]
We now no longer need nm . We can directly write these angles in terms of m alone as
(factoring):
\[
\theta_m^{\rm min} = \sin^{-1}\left(\frac{m\lambda}{d} \pm \frac{\lambda}{Nd}\right) \tag{13.47}
\]
for each pair of values that bracket the mth maximum.
We now make the small angle approximation for both the maxima and the minima. This
may well not be justified – many diffraction gratings will produce even the first principal
maximum at a relatively large angle – but it suffices for us to understand what they do and
the idea of “resolving power”, and we can always take the actual inverse sines if needed
for a particular actual grating. With this approximation, we get:
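\[
\theta_m^{\rm max} \approx \frac{m\lambda}{d} \tag{13.48}
\]
and:
\[
\theta_m^{\rm min} \approx \frac{m\lambda}{d} \pm \frac{\lambda}{Nd} = \theta_m^{\rm max} \pm \frac{\lambda}{Nd} \tag{13.49}
\]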
This is just what we need to understand what a diffraction grating does: it makes an
absolutely perfect spectrometer, allowing us to cleanly resolve the spectral lines emitted
by hot glowing atoms and molecules and thereby both identify them and make many infer-
ences concerning their structure!
To see how this works, imagine that there are two “spectral lines” λ1 and λ2 being
emitted by a given atom (such as the two emitted by the Sodium atom, with D1 at λ1 =
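589.592 nm and D2 at λ2 = 588.995 nm, see homework). The first principal max for λ1
occurs at the (presumed small) angle:
\[
\theta_1(\lambda_1) = \frac{\lambda_1}{d} \tag{13.50}
\]
while that for λ2 occurs at:
\[
\theta_1(\lambda_2) = \frac{\lambda_2}{d} \tag{13.51}
\]
These two lines are separated in angle by:
\[
\Delta\theta_{12} = |\theta_1(\lambda_1) - \theta_1(\lambda_2)| = \frac{\lambda_1 - \lambda_2}{d} \tag{13.52}
\]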
The lines projected on the screen, however, are not infinitely sharp (even if the sodium
wavelengths themselves are)! The widths of the first principal maxima at λ1 or λ2 are:
\[
\Delta\theta \approx \frac{2\lambda_1}{Nd} \approx \frac{2\lambda_2}{Nd} \tag{13.53}
\]
If the two maxima are too close together, their lines will overlap and we won’t be able to
tell that there are two lines there at all! On the other hand, if they are far enough apart,
the lines won’t overlap at all (except out in the irrelevant morass of secondary maxima and
higher order minima) and we’ll be able to easily see two lines. We need a criterion for the
minimal resolution of two spectral lines (or anything else) cast as an “image” onto a screen,
or a piece of film, or the retina. Enter Rayleigh’s Criterion for Resolution.
Lord Rayleigh was yet another eponymous physicist who studied the wave properties of
“rays” and things such as the resolving power of spectral gratings or optical instruments.
We have encountered him before in the context of “Rayleigh scattering”, the original blue-
sky theory. He established a very simple criterion for when two spectral lines from a diffrac-
tion grating or diffraction maxima from e.g. circular apertures are marginally resolved. It is
this:
Two lines are said to be marginally resolved if the principal maximum for one
line is outside of the first minimum of the other.
That’s it! Nothing to it. It is really slightly more general than this, however. We will
also use it below to determine whether two point-like images, when focussed on a screen
through a circular aperture, are marginally resolved, where instead of “lines” we simply talk
about the diffraction maxima of the dots, but the idea is exactly the same. For us to be
able to determine that there are two instead of one, they cannot overlap, and “not overlapping” is
defined to mean that the maximum of each is further away than the first minimum of the other.
With that criterion in hand, we can talk about and derive the resolving power of a grating
and see how we can determine whether or not any given grating will be able to resolve any
given pair of closely spaced lines.
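In order for our grating to resolve two lines the angular separation of their maxima has
to be larger than the angle of the first minimum of each maximum. In the m-th order (and
for small angles) the two maxima are separated by m∆λ/d, while the first minimum of each
lies λ/(Nd) from its peak (half of the full width given above). That is, the smallest
resolvable wavelength difference is:
$$\Delta\lambda_{\min} = \frac{\lambda}{mN} \tag{13.57}$$
for the order considered and if the two lines are separated by more than this spread, they
will be resolved.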
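As a rough numerical illustration of this criterion, here is a minimal Python sketch. The function names are mine, and the sodium doublet wavelengths (589.0 nm and 589.6 nm) are the usual approximate textbook values, assumed here rather than taken from the text above.

```python
import math

def min_resolvable_separation(wavelength, order, n_slits):
    """Smallest resolvable wavelength difference, lambda / (m N)."""
    return wavelength / (order * n_slits)

def slits_needed(lambda_1, lambda_2, order=1):
    """Smallest number of illuminated slits N that resolves two lines in order m."""
    mean = 0.5 * (lambda_1 + lambda_2)
    return math.ceil(mean / (order * abs(lambda_1 - lambda_2)))

# Sodium doublet in nm (approximate values, assumed for this example).
NA_1, NA_2 = 589.0, 589.6

n_first = slits_needed(NA_1, NA_2, order=1)
n_second = slits_needed(NA_1, NA_2, order=2)
print(n_first, n_second)
```

Roughly a thousand illuminated slits are needed in first order, and about half as many in second order, since the resolving power scales with the product mN.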
There are other places in our daily lives where “diffraction gratings” can be observed.
CD or DVD ROMs, for example, consist of many “tracks” carved into a shiny reflective
platter and pitted by means of a laser to encode information. The reflective grooves behave
just like multiple slits and split white light up into a veritable rainbow of colors when the
reflective grooved surface is viewed at various angles. There is no real color to the shiny
disk; all of the color arises from multiple slit interferences.
This same process works backwards, as well. A radio telescope is made out of a
regular array of antennae spread out in a two dimensional lattice. If we imagine all of the
antennae radiating coherently at the same frequency and wavelength, we expect the waves
they emit to only constructively interfere and hence radiate most of their energy along
certain directions. If we reverse this, however, by adjusting the phase of the signals picked
up by the antennae and combining them into one phase delayed superposition signal, we
can arrange it so that they only coherently receive from certain directions in the sky. In
fact, by appropriately sweeping the phase delays, we can sweep the telescope across the
sky and make a highly directional map of all of the radio signals emitted by the sun, by
stars, even by remote galaxies. We even expect resolution to improve as we increase the
number of antennae, in a way that should now be intuitively familiar.
Now, let us think about multiple slits and Huygens’ Principle. Huygens’ Principle states
that all of the points on a wavefront behave like coherent radiators, which sounds a lot like
what multiple slits that sample just some of those radiators do. The difference is that with a
wavefront, the number of coherent radiators has to go to infinity at the same time that the
distance between radiators has to go to zero at the same time the amplitude emitted by
each radiator (which we’ve been treating as a given constant for the many slit problems)
has to also go to zero, but in such a way that the total energy emerging from a piece of the
wavefront is conserved!
Handling all of this correctly lets us understand diffraction, the interference of a wave
that e.g. passes through a single slit with itself. Understanding diffraction is absolutely es-
sential to the understanding of the diffraction/wave based limitations of optical instruments
such as microscopes and telescopes. We begin by completely analyzing and solving for
the diffraction intensity produced by light passing through a single slit of width a > λ, in the
usual Fraunhofer approximation.
13.6: Diffraction
We have seen how coherent, monochromatic light passed through multiple slits, when it
recombines after traversing different path lengths, interferes – sometimes creating a wave
with an amplitude greater than that produced by a single slit, sometimes cancelling
altogether – and that this creates a modulation of the intensity observed on a distant screen,
basically transforming it into a pattern of light and dark bars (or something more complex
if we have sources more complicated than “slits”).
We have also seen that Huygens’ Principle tells us that every point on a wavefront of
an advancing wave behaves like a “source” for the future time evolution of the wavefront.
This suggests that we don’t need multiple slits in order to see a wave interfere – all we need
is one slit, but one that is wide enough that it contains “many” Huygens radiators in the
wavefronts that are incident upon it!
Calling this interference would be very confusing – one slit? two? ten? – so we
introduce a new term to describe “interference” of a wave with itself, or the interference
patterns produced by very large numbers of slits/sources, so many that they form a near
continuum. We call this kind of phenomenon diffraction, and speak of the diffraction of a
wave through a single slit, or the diffraction of a wave around an obstacle, or the diffraction
patterns produced on a screen or piece of film by light that passes through one or more
slits that are wide enough that the light that goes through them can interfere with itself.
Figure 13.7: The geometry of single slit diffraction. Waves of some wavelength λ pass
through a slit of width a, where a is typically somewhat larger than λ (to get an “interesting”
diffraction pattern) and fall upon a screen under Fraunhofer conditions, where the screen
is distant compared to a and λ and roughly equidistant from the center of the slit. The path
difference between radiators a/2 apart is (a/2) sin θ.
The geometry of diffraction is straightforward and is represented in figure 13.7. Note its
similarity to N slits – all of the N little round circles in the slit a represent Huygens radiators
on the wavefront there.
As before, we’ll assume that we have Fraunhofer conditions, so that the screen is far
(compared to a and λ) from the slits, and we’ll either ignore any radial variation in the field
strength with distance or imagine that the screen bends in a half cylinder around the center
of the slit. Note that we don’t have to do this – we could work all of this out (and in later
courses physics majors very likely will) but doing so doesn’t help you understand the basic
idea of diffraction itself so we won’t bother176 .
Locating maxima and minima – especially maxima – will prove more difficult for a single
slit (of width a) than it did for two or more very thin slits! Before we tackle actually solving
for the intensity in a formally justifiable way, let’s point out a couple of heuristic features
that will – for the most part – suffice to help us understand at least the gross features of
the diffraction pattern that results.
The first of these is the central maximum. At θ = 0, all the radiators in the slit are
basically equidistant from P and hence all of the coherent wavelets they spawn arrive in
phase in the middle. We use this middle point of complete constructive interference of all
of the Huygens radiators to define the peak amplitude and (time average) intensity of the
light in the diffraction pattern, E0 and $I_0 = \frac{1}{2\mu_0 c}E_0^2$ respectively.
176
We’ll also (as we’ve been doing) more or less ignore the vertical dimension of the slit (the one
perpendicular to the paper) even though that is itself a “slit” and hardly seems to be as negligible
as we’ve been making it out to be...
The second are the locations of the diffraction minima – angles at which the total am-
plitude and intensity are zero. We can find these using the following not-too-difficult mini-
argument.
Consider the two waves emerging from the two Huygens radiators portrayed above in figure
13.7 and proceeding to the point P . As shown, the wave from the lower radiator arrives having
travelled a longer path, with a path difference of ∆r = (a/2) sin(θ).
We now apply the simple heuristic concept that served us well when we were trying to
understand the two-slit minimum. If this path difference contains exactly λ/2 (one half of a
wavelength) then the waves from these two particular radiators will cancel at P .
Now consider the second radiator down from the top. It also has a path difference of
(a/2) sin(θ) compared to the radiator second down from the middle and these two cancel. The
third down from the top cancels the third down from the middle. In fact, every Huygens
radiator in the top half of the slit cancels the corresponding radiator a/2 beneath it in the
lower half of the slit. The field amplitude and intensity at P are zero (which is as low as
one can get), making
$$\frac{a}{2}\sin(\theta) = \frac{\lambda}{2}, \quad \text{or} \quad a\sin(\theta) = \lambda \tag{13.58}$$
the condition for the first minimum. If we instead divide the slit into fourths (figure 13.8),
radiators a/4 apart cancel in pairs when (a/4) sin(θ) = λ/2, so there is also a minimum at:
$$a\sin(\theta) = 2\lambda \tag{13.59}$$
If we consider dividing the strip up into sixths, the condition (a/6) sin(θ) = λ/2 and the
exact same argument shows that a sin(θ) = 3λ is a minimum. If we divide it into eighths we
get a sin(θ) = 4λ. Clearly we can continue indefinitely; the general rule for a minimum is:
$$a\sin(\theta) = m\lambda \qquad m = \otimes, 1, 2, 3, \ldots$$
where I’ve used ⊗ again to indicate that m = 0 is the principal maximum at the center, not
a minimum and so must be skipped.
Figure 13.8: The slit, with the Huygens radiators divided into four equal segments. Light
from the two pairs indicated cancels at P when the path difference (a/4) sin(θ) contains a half
of a wavelength, for all of the pairs that make up the slit.
Finally, we know that diffraction will be symmetric, so that we have minima at all of
the negative angles a sin(θ) = −mλ but as before we’ll manage this by hand to keep the
equation simple.
Alas, no such simple argument can be made in order to find the angles of the diffraction
maxima (except for the central principal maximum, already considered). We know there
must be maxima in between each of the minima above but we expect from our discussion
of N -slit interference that they won’t occur at any “simple” values of the phase angle φ any
more than they did at simple values of δ. We therefore abandon heuristics at this point
and proceed to solve for the exact diffraction intensity as a function of phase angle φ (and
hence θ, via the usual kind of inverse sines).
In figure 13.9 you can see a single slit with N radiators neatly drawn out. I chose N = 7
because it is enough to “cover” the slit without being so many that you can’t see what is
going on. In the end, of course, we will let N → ∞ so that we really cover the slit with a
continuum of radiators177 so no particular choice for N much matters.
We have to be able to “scale” the field result itself. After all, the light we shine on
the slit could be very intense or it could be weak. The slit could be large (letting a lot
of light through) or it could be very small (not letting a lot of light through). We need a
single parameter that indicates how strong the E-field is on the screen, or equivalently,
how intense. We choose to set E0 to the value of the E-field that makes it through the slit
to the screen in the center of the principal maximum at θ = 0. With this interpretation, it is
exactly like what we did for the interference of N “narrow” slits above. Indeed, at the end of
this topic we can go back and a posteriori formally justify our narrow slit results, and define
precisely just what “narrow” means!
177
... or, if this were a course in optics being given to majors or folks with mad math skills, we’d just write an
integral for the field at an arbitrary P and not bother with all of this dividing up and summing...
Figure 13.9: If we split the slit up into N radiators, the field amplitude at the maximum in
the center of the screen from each radiator is E0 /N , where E0 is the maximum amplitude
from the entire slit there. When we consider the waves emerging at an angle θ directed
towards point P , the wave from each radiator travels an additional distance of
∆r = (a/N) sin(θ) compared to the wave from the radiator immediately above it. Both of these
relations scale with N , and hence will be useful when we try to let N → ∞ and fill in the
entire slit with radiators.
If we split the slit up into N radiators, each with the same path length to the center
of the screen (in the Fraunhofer limit, recall), then from symmetry and superposition run
backwards each radiator must produce an individual E-field on the screen with strength
E0 /N . That way, no matter what N is, the superposition of the fields at the center will
remain equal to E0 , the measured/known/observed/assumed E-field there. As N gets
large, this field amplitude (per radiator) will get very small (but nonzero) but the larger
number of radiators will precisely compensate.
Next, let’s think about path differences and phase differences. Recall that a sin(θ)
is the total path difference to the point P between the wave from the (radiator at the) very
top of the slit and the wave from the (radiator at the) very bottom of the slit. In the figure
above, the top and bottom radiators aren’t, of course, precisely “at” the top and bottom of
the slit, but as we increase the number of radiators they will get closer and closer, and
any error we make in assuming that they are there already for a finite N will go away.
We therefore can split a sin(θ) up into N pieces, and make the path difference between
adjacent radiators (a/N) sin(θ). A very astute student might observe that for the 7 radiators
above, it really should be (a/6) sin(θ) (or rather, that our general rule should be
(a/(N − 1)) sin(θ) because the top radiator is at “zero”) but in the limit N → ∞ we will make
an error of order 1/N using the first relation178 so we’ll just ignore it and use the first
(easier) relation.
178
As you can easily see by doing the binomial expansion of a/(N − 1) = (a/N)(1 − 1/N)^(−1), right...?
Let’s turn this path difference between waves from adjacent radiators into a phase
difference between adjacent radiators (by multiplying it by k, as always). Recall that we
defined φ = ka sin(θ), so the phase difference between adjacent radiators is just ∆φ = φ/N .
This phase difference accumulates as we count down the radiators from the top – the first
radiator down has a phase difference of φ/N , the second has a phase difference of 2φ/N , the
third 3φ/N and so on.
The wave we have to sum – using our ever-so-useful phasors, of course – is then (for
N = 7):
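$$\begin{aligned}
E_{\rm tot} = {} & \frac{E_0}{N}\sin(kr - \omega t) + \frac{E_0}{N}\sin(kr - \omega t + \phi/N) \\
& + \frac{E_0}{N}\sin(kr - \omega t + 2\phi/N) + \frac{E_0}{N}\sin(kr - \omega t + 3\phi/N) \\
& + \frac{E_0}{N}\sin(kr - \omega t + 4\phi/N) + \frac{E_0}{N}\sin(kr - \omega t + 5\phi/N) \\
& + \frac{E_0}{N}\sin(kr - \omega t + 6\phi/N)
\end{aligned} \tag{13.61}$$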
This is looking really tedious, and we’re only at N = 7. However, if we draw the phasor
diagram for this sum, it isn’t so bad:
Figure 13.10: The phasor diagram for N = 7 Huygens radiators distributed across a. The
amplitude of each radiator is E0 /N , and the phase ∆φ = φ/N accumulates.
The diagram in figure 13.10 (which we might have drawn for a 7-slit interference pat-
tern!) shows us that as long as ∆φ is small, the phasors gently arc up into what looks
almost like a smooth curve even for only N = 7. In a seven slit problem however, as we
increase θ then δ between two slits gets bigger and soon isn’t small at all – we expect to
get things like seven-pointed stars and so on that don’t at all look like a smooth curve.
In this case of a single slit, however, as we make φ large, we can make ∆φ as small as
we like by increasing N ! In fact, we can make it infinitesimally small, accumulating dφ as
we go around a smooth curve. We won’t actually do the following sums algebraically (so
don’t be intimidated by the notation) but we can in fact write the total field at the point P at
$$E_{\rm tot} = \lim_{N\to\infty} \sum_{i=0}^{N} \frac{E_0}{N}\sin(kr - \omega t + i\phi/N) \tag{13.62}$$
Figure 13.11: The phasor diagram for N → ∞ Huygens radiators distributed across a. The
“phasor snake” bends smoothly around into a circular arc of length E0 , where we need to
determine the length of the secant that cuts across, Etot = 2r sin(φ/2).
Almost all of our work has been done for us in this diagram! Let’s go over its features
and results so that you understand them as we derive our final result. Note that the length
of the arc is E0 (we are just “bending it around”, but all the superposition of all of the
amplitudes of the infinitesimal phasor chunks still has to add up to E0 ). The total phase
difference between (a tangent to) the beginning of the arc and (a tangent to) the end of
the arc is just φ, as illustrated with the lower φ angle. This same angle φ is the angle
subtended by the circular arc as illustrated at the top – you can “see” this by noting that the two
r radii are perpendicular to the arc at both ends, so as we swing out the second r the angle
accumulated by the tangent at the bottom has to match the angle accumulated between
the radii. From this we see that the arc length E0 can be related to r by:
E0 = rφ (13.63)
179
Note that we are still ignoring that extra O(1/N) term on the end as there are N + 1 terms in the sum.
180
Ideally a complex exponential integral. Who actually likes to integrate sines and cosines and remember
all of those silly sign changes? ∫ e^u du = e^u , all we ever really need to know...
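If we drop a perpendicular bisector (dashed line) from the center of the circular arc to
the total field phasor Etot , we make two simple right triangles with vertex angle φ/2. The
opposite side of each of them has length r sin(φ/2) so that:
$$E_{\rm tot} = 2r\sin(\phi/2)$$
Substituting r = E0 /φ from equation 13.63, this becomes:
$$E_{\rm tot} = E_0\,\frac{\sin(\phi/2)}{\phi/2}$$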
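Finally, we go through the usual ritual to convert the field amplitudes to intensities:
$$I_0 = \frac{1}{2\mu_0 c}E_0^2 \tag{13.66}$$
so that:
$$I_{\rm tot} = \frac{1}{2\mu_0 c}E_{\rm tot}^2 = \frac{1}{2\mu_0 c}E_0^2\left(\frac{\sin(\phi/2)}{\phi/2}\right)^2 \tag{13.67}$$
or
$$I_{\rm tot}(\theta) = I_0\left(\frac{\sin(\phi/2)}{\phi/2}\right)^2. \tag{13.68}$$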
This is what we have been trying to get – an exact formula for the intensity of the diffraction
pattern as a function of θ (yes, it is actually given as a function of φ but recall that φ =
ka sin(θ) so we also know it as a function of θ, at the expense of a little extra (and tedious,
admittedly) arithmetic). But arithmetic isn’t tedious to humans any more as long as an
equation can be programmed into a computer, and this one is easy to code.
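To make “easy to code” concrete, here is a minimal Python sketch of the single slit intensity, together with a brute-force sum of N equal phasors to check that the closed form really is the large-N limit. The function names and the choice of expressing the slit width as the ratio a/λ are mine, not part of the text.

```python
import cmath
import math

def single_slit_intensity(sin_theta, a_over_lambda, i0=1.0):
    """I = I0 (sin(phi/2) / (phi/2))^2, with phi/2 = pi (a/lambda) sin(theta)."""
    half_phi = math.pi * a_over_lambda * sin_theta
    if abs(half_phi) < 1e-12:
        return i0  # the limit at the central maximum
    return i0 * (math.sin(half_phi) / half_phi) ** 2

def phasor_sum_intensity(sin_theta, a_over_lambda, n=10000, i0=1.0):
    """Add n phasors of length 1/n, each rotated by phi/n from the previous one."""
    phi = 2.0 * math.pi * a_over_lambda * sin_theta
    total = sum(cmath.exp(1j * phi * i / n) for i in range(n)) / n
    return i0 * abs(total) ** 2

exact = single_slit_intensity(0.3, 4)
summed = phasor_sum_intensity(0.3, 4)
print(exact, summed)
```

The two numbers agree to many digits once N is large, which is the phasor “snake” of figure 13.11 bending into a smooth arc.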
At a glance, this equation has all of the right features. At θ = 0 (and hence φ = 0) we
get an intensity of I0 (footnote 181). At all the other places where sin(φ/2) = 0, we get a
minimum. This occurs when:
$$\frac{\phi}{2} = \frac{\pi a}{\lambda}\sin(\theta) = \pi, 2\pi, 3\pi \ldots \tag{13.69}$$
or when:
$$a\sin(\theta) = m\lambda \qquad m = \otimes, 1, 2, 3, \ldots \tag{13.70}$$
The secondary maxima are the angles where the slope of the intensity vanishes:
$$\frac{dI_{\rm tot}}{d\phi} = 0 \tag{13.71}$$
and which aren’t the minima (which will also occur, recall, at the zeros in the slope of
the intensity). Physics majors and advanced students will enjoy this exercise in calculus,
which leads one to the relatively simple result that maxima occur when the transcendental
equation182
$$\frac{\phi}{2} = \tan\left(\frac{\phi}{2}\right) \tag{13.72}$$
is satisfied. If one plots φ/2 and tan(φ/2) simultaneously on a single set of axes, the
intersections of the two lines are the relevant zeros. As one can see (once one does this)
the maxima occur at angles close to (and just before) the condition(s):
$$\frac{\phi}{2} = \frac{3\pi}{2}, \frac{5\pi}{2}, \frac{7\pi}{2}, \ldots$$
181
We avoid the problem of “division by zero” calculus-fashion by taking the limit
$\lim_{\phi \to 0} \sin(\phi/2)/(\phi/2) = 1$.
Figure 13.12: Phasor diagrams representing the principal maximum, successive minima, and
secondary maxima for single slit diffraction.
In figure 13.12 the principal maximum (of length E0 ) is illustrated for angle φ = 0. The
next two phasors show the (exact) conditions for minima, where E0 is wrapped first one
time around (φ = 2π) or twice around (φ = 4π). Note that the diameter of the circle has to
get smaller as one wraps more than once! The secondary maxima are now easy enough
to understand. We don’t get one at φ = π because we are still between the principal
maximum and the first minimum; there is no maximum here. Near φ = 3π (dashed circle
and arrow) we can gain a tiny bit of length by rolling the circle back to a slightly larger
diameter, ditto near φ = 5π, although both of these figures are probably a bit exaggerated.
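The transcendental condition for the secondary maxima is easy to solve numerically. The following is a minimal sketch (my own helper names, plain bisection rather than any particular library routine) that finds the roots of x = tan(x) with x = φ/2. It works with sin(x) − x cos(x), which has the same roots but no singularities.

```python
import math

def secondary_maximum(m, tol=1e-12):
    """Root of x = tan(x) between m*pi and (m + 1/2)*pi, where x = phi/2."""
    def f(x):
        return math.sin(x) - x * math.cos(x)  # zero exactly where tan(x) = x

    lo, hi = m * math.pi, (m + 0.5) * math.pi
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root is in the upper half
        else:
            hi = mid               # root is in the lower half
    return 0.5 * (lo + hi)

def relative_intensity(x):
    """I / I0 at phi/2 = x."""
    return (math.sin(x) / x) ** 2

x1 = secondary_maximum(1)
x2 = secondary_maximum(2)
print(x1 / math.pi, relative_intensity(x1))
print(x2 / math.pi, relative_intensity(x2))
```

The first two roots come out near φ/2 ≈ 1.43π and 2.46π, just before 3π/2 and 5π/2, with intensities of roughly 4.7% and 1.6% of I0. That is why the secondary maxima are drawn so much smaller than the central peak.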
It is now time to put it all together with a few examples.
To draw the semiquantitatively correct I(θ) for a single slit, we must capture its features –
both those we can compute or discover exactly as well as those that we can only guess at
short of plotting the exact result. We’ll find it a lot easier to plot not I(θ) but I(sin(θ)), so
much so that I’m going to focus on this in the example. Note well that all we have to do to
convert to or plot in terms of θ is take the inverse sines of the points we obtain.
182
Wikipedia: http://www.wikipedia.org/wiki/Transcendental Equation.
We have seen above that we can exactly locate the principal maximum and the minima.
We cannot exactly locate the secondary maxima, but we can guess their approximate
location as roughly halfway between the minima in our drawing. Similarly, we can’t exactly
determine the intensity of the secondary maxima, but we do know that they have to get
smaller as we increase their order, quite rapidly.
To facilitate drawing a graph with these features, we therefore begin by locating the
minima:
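$$\begin{aligned}
a\sin(\theta_m) &= m\lambda \\
4\lambda\sin(\theta_m) &= m\lambda \\
\sin(\theta_m) &= \frac{m}{4} \\
\theta_m &= \sin^{-1}\left(\frac{m}{4}\right)
\end{aligned} \tag{13.74}$$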
Let’s arrange these for the values of m for which the inverse sine exists in a table. All
angles are in radians. Don’t forget to skip m = 0, the principal maximum!

m    sin(θm)    θm
1    1/4        sin⁻¹(1/4) = 0.25268
2    2/4        sin⁻¹(1/2) = 0.52360
3    3/4        sin⁻¹(3/4) = 0.84806
4    4/4        sin⁻¹(1)   = 1.57079
We see that it is a lot easier to draw the plot in terms of the regular sin(θm ) than it is
in terms of θm . Of course, the latter is a lot more useful. Oh well, such is life. You should
be able to do whichever one a problem requests on the homework or a quiz or exam. One
reason I often accept results plotted in terms of sin(θm ) is that one doesn’t usually need a
calculator to do a decent job.
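If you do have a computer handy, the table of minima can be generated directly. This is a small sketch under the example’s assumption a = 4λ; the function name is hypothetical.

```python
import math

def diffraction_minima(a_over_lambda):
    """List (m, sin(theta_m), theta_m) for every m >= 1 with sin(theta_m) <= 1."""
    rows = []
    m = 1
    while m / a_over_lambda <= 1.0:
        s = m / a_over_lambda
        rows.append((m, s, math.asin(s)))
        m += 1
    return rows

rows = diffraction_minima(4)
for m, s, theta in rows:
    print(f"{m}  {s:.2f}  {theta:.5f}")
```

Changing the argument to some other ratio a/λ shows how quickly the number of visible minima drops as the slit narrows toward a single wavelength.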
We are now ready to consider two slits of finite width. The result is very simple. We get
interference maxima and minima at exactly the same angles we got them for very narrow
slits. However, the field strength at those angles is modulated by the diffraction of the field
through the individual slits. As a result, the field we observe at an angle θ is the product
of the field expressions for interference and diffraction:
$$E_{\rm tot}(\theta) = 2E_0\cos(\delta/2)\,\frac{\sin(\phi/2)}{\phi/2} \tag{13.75}$$
Following the usual procedure (using the time average Poynting vector and the relation
between field amplitude and intensity) the intensity is:
$$I_{\rm tot}(\theta) = 4I_0\cos^2(\delta/2)\left(\frac{\sin(\phi/2)}{\phi/2}\right)^2$$
Nothing to it. Note well that as always, δ = kd sin(θ) and φ = ka sin(θ), so this is an indirect
function of θ linked by inverse sines.
Figure 13.13: Typical graphs of the diffraction intensity from a single slit of width a = 4λ.
Note the distortion of the horizontal scale by the inverse sine in the lower graph – the top
graph is much easier to draw and requires no calculator.
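The combined pattern can be sketched in a few lines of Python. The intensity used below is the square of the field in equation 13.75, scaled so that a single slit alone gives I0 at the center. The function name and the values d = 8λ and a = 4λ are the ones I am assuming for illustration (they match the numbers used in this example).

```python
import math

def two_slit_intensity(sin_theta, d_over_lambda, a_over_lambda, i0=1.0):
    """4 I0 cos^2(delta/2) (sin(phi/2)/(phi/2))^2 for two slits of finite width."""
    half_delta = math.pi * d_over_lambda * sin_theta  # delta/2, slit separation d
    half_phi = math.pi * a_over_lambda * sin_theta    # phi/2, slit width a
    if abs(half_phi) < 1e-12:
        envelope = 1.0
    else:
        envelope = (math.sin(half_phi) / half_phi) ** 2
    return 4.0 * i0 * math.cos(half_delta) ** 2 * envelope

center = two_slit_intensity(0.0, 8, 4)
first_side_peak = two_slit_intensity(1 / 8, 8, 4)
print(center, first_side_peak)
```

The central peak is 4 I0, while the first interference maximum to the side is already reduced to 16/π² ≈ 1.62 I0 by the diffraction envelope.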
We proceed exactly the same way we did for the previous example, except now we add
two more tables: The angles of the interference maxima and the interference minima. We
find these (as usual) from:
$$\sin(\theta_m) = \frac{m\lambda}{d} = \frac{m}{8} \tag{13.77}$$
for maxima and
$$\sin(\theta_m) = \frac{(2m + 1)\lambda}{2d} = \frac{2m + 1}{16} \tag{13.78}$$
for minima. The result is displayed in table 10. Using these numbers we can easily enough
construct a combined interference/diffraction pattern, displayed in figure 13.14. For
simplicity I only present the graph for sin(θ) – you can easily visualize or fill in a graph as
a function of θ using the previous example as a guide to the distortion (or a piece of paper
with an accurate graph scale on it). Note well the “squashed” interference maxima that occur
where there are diffraction minima. This illustrates a simple rule – when one of the two
functions in the product above in Itot is zero, zero wins!
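The “zero wins” rule can be checked mechanically. This sketch lists the interference maxima and flags the ones that land exactly on a diffraction minimum. It assumes that d/λ and a/λ are both integers (8 and 4 here), and the function name is my own.

```python
from fractions import Fraction

def interference_maxima(d_over_lambda, a_over_lambda):
    """Return (m, sin(theta_m), suppressed) for every maximum with 0 <= sin <= 1."""
    rows = []
    for m in range(0, d_over_lambda + 1):
        s = Fraction(m, d_over_lambda)
        # A diffraction minimum occurs where (a/lambda) sin(theta) is a nonzero integer.
        n = s * a_over_lambda
        suppressed = m != 0 and n.denominator == 1
        rows.append((m, s, suppressed))
    return rows

rows = interference_maxima(8, 4)
missing = [m for m, s, suppressed in rows if suppressed]
print(missing)
```

With d = 2a every even-order interference maximum sits on a diffraction minimum and is squashed, which is exactly the pattern of missing peaks you should draw.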
Problems like this are graded on the basis of whether or not they contain the essential
features illustrated herein. The various min’s and max’s should be correctly tabulated and
located approximately correctly on the graph. The diffraction envelope should be qualita-
tively as shown, and the interference pattern should be drawn “under” it. If max’s and min’s
occur at the same angle, the minimum wins. The maximum central intensity should be 4I0 ,
where I0 is the central intensity produced by a single slit.
Nothing to it!
Finally we are ready to understand how the use of waves with a finite (non-zero) wavelength
affects things like vision and optical instrumentation. To start with, I have to give you a “true
fact” concerning diffraction through a circular aperture of diameter D – something that can
be derived but that I won’t derive just now in this work for you. It’s not that the derivation
is incredibly difficult or exotic – it proceeds more or less along the lines we’ve just used for
single slit diffraction – it just is easiest to obtain using integration (which we avoided) and
complex variables instead of phasors per se (which we have also mostly avoided).
In a nutshell, to obtain the result one has to do an integration in a sensible coordinate
system (e.g. cylindrical coordinates) that sums up the differential electric field radiated
from every point on the “disk” of Huygens radiators in the circular aperture, including their
phase difference due to the path difference to an arbitrary point on the screen a distance
Z away from the center of the aperture. To some people183 this sounds like a really good
time, but I’m guessing that for most students using this text it sounds like a still better time
to not actually do it and hence you’re inclined to forgive me for presenting something you
actually have to just memorize/learn.
183
Mostly physics or math majors or other mathochists, granted...
Diffraction Minima
m    sin(θm)    θm
1    1/4        sin⁻¹(1/4) = 0.25268
2    2/4        sin⁻¹(1/2) = 0.52360
3    3/4        sin⁻¹(3/4) = 0.84806
4    4/4        sin⁻¹(1)   = 1.57079

Interference Maxima
m    sin(θm)    θm
0    0.0        sin⁻¹(0.0) = 0.00000
1    1/8        sin⁻¹(1/8) = 0.12532
2    2/8        sin⁻¹(1/4) = 0.25268
3    3/8        sin⁻¹(3/8) = 0.38439
4    4/8        sin⁻¹(1/2) = 0.52360
5    5/8        sin⁻¹(5/8) = 0.67513
6    6/8        sin⁻¹(3/4) = 0.84806
7    7/8        sin⁻¹(7/8) = 1.06544
8    8/8        sin⁻¹(1)   = 1.57079

Interference Minima
m    sin(θm)    θm
0    1/16       sin⁻¹(1/16)  = 0.06254
1    3/16       sin⁻¹(3/16)  = 0.18862
2    5/16       sin⁻¹(5/16)  = 0.31782
3    7/16       sin⁻¹(7/16)  = 0.45282
4    9/16       sin⁻¹(9/16)  = 0.59741
5    11/16      sin⁻¹(11/16) = 0.75804
6    13/16      sin⁻¹(13/16) = 0.94843
7    15/16      sin⁻¹(15/16) = 1.21538

Table 10: Diffraction minima, interference maxima, and interference minima for two slits
of width a = 4λ separated by d = 8λ.
That true fact is this. The diffraction pattern produced on the screen by a circular
aperture is itself a cylindrically symmetric “circle” of light, surrounded by alternating, ever
fainter, rings of darkness (where destructive interference causes the total wave to cancel)
and light (where partially constructive interference causes the total wave to peak, although
never at the intensity seen in the central maximum). In fact, the generic shape of the
diffraction pattern is much the same as that for a slit, only it is cylindrically symmetric
instead of itself being a slit shaped bar with alternating bars of light and dark on the side.
In this diffraction pattern the first minimum (the dark ring surrounding the bright(est) central
maximum) occurs at the angle given by:
$$D\sin(\theta_{\min}) = 1.22\lambda$$
Note that this is almost like the rule for the slit, a sin(θmin ) = λ, except that we no longer
get a pretty integer on the right and on the left we have the diameter of the aperture, not
its short-direction width. It certainly makes dimensional sense.
Now consider viewing very distant, point-like objects through a circular aperture. I
prefer to think of viewing stars, for example, as they are very distant indeed and appear to
the eye as mere points of light in the sky, through the aperture of your pupil, or the lens of
a camera, or the lens of a telescope – it doesn’t really matter what the aperture is as long
as it is circular and symmetric.
The occurrence of a lens in the aperture doesn’t affect the diffraction – every ray gets
bent by the lens to be focussed on the screen according to the angles in the diffraction
pattern, so the point-like object is focussed down not to a point, but to a circular dot. The
size of the dot is basically determined by the angle of the first diffraction minimum, with
smaller wavelengths being better resolved. Indeed, everything we learned in geometric
optics, where source points on the object were mapped directly to image points by the
lens, is what true physical optics predicts in the limit of infinitely short wavelengths (or
more practically, wavelengths that are “infinitely” short compared to the aperture or length
scales of the imaging apparatus)184 .
We can then ask: Suppose we are photographing a section of sky with our telescope
and see a large, slightly asymmetric blob of “white” on our photograph corresponding to a
light source in the sky. Is that blob the image of one object, or two? That is, is the source
made up of the light from two objects (e.g. stars) or is it a slightly asymmetric single object
(e.g. a lenticular galaxy)? Time to return to Rayleigh’s Criterion for Resolution!
We can easily compute the capability of our telescope to resolve two objects that have
a very small angle in between them using this criterion. Basically, if the peak produced by
one object (the center of the illuminated area on the film or charge-coupled device (CCD)185 )
is separated from the other by at least the angle of the first diffraction minimum of the
other, we can consider the two objects marginally resolved. This criterion depends on
wavelength, and we intuitively expect our resolution to be better with e.g. blue or violet
light than with red light186 .
The critical angle – which is certain to be a very small angle for any macroscopic
aperture and optical frequency light – defining the diffraction resolution limit of an optical
instrument is thus:
$$\theta_c \approx \sin(\theta_c) = \frac{1.22\lambda}{D} \tag{13.80}$$
Two stars with an angular separation greater than this critical angle will be clearly resolved
on the film (assuming that the image is otherwise focussed on the film or CCD).
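To get a feel for the numbers, here is a minimal sketch of the critical angle for a couple of apertures. The wavelength (550 nm, mid-visible), the 5 mm pupil and the 10 cm telescope objective are illustrative values I am assuming, not values from the text, and the small angle approximation is used throughout.

```python
def rayleigh_angle(wavelength, diameter):
    """Critical angle in radians, 1.22 lambda / D (small angle approximation)."""
    return 1.22 * wavelength / diameter

def is_resolved(angular_separation, wavelength, diameter):
    """True if two point sources are separated by more than the critical angle."""
    return angular_separation > rayleigh_angle(wavelength, diameter)

GREEN = 550e-9                                # metres, assumed wavelength
eye = rayleigh_angle(GREEN, 5e-3)             # assumed 5 mm pupil
small_scope = rayleigh_angle(GREEN, 0.10)     # assumed 10 cm objective
print(eye, small_scope)
print(is_resolved(1.0e-5, GREEN, 5e-3), is_resolved(1.0e-5, GREEN, 0.10))
```

Two stars separated by 10 microradians blur together for the naked eye but are resolved by the 10 cm telescope, because the critical angle shrinks in proportion to 1/D.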
184
This is actually a very important result, one worth reinforcing for possible math or physics majors. Geo-
metric optics is the small wavelength limit of physical (wave) optics. Similarly, classical mechanics is the small
wavelength limit of quantum (wave) mechanics! This answers one of the most important of questions from
the Enlightenment – how light can behave like a particle (geometric) and wave (physical) at the same time,
and extends it with the surprising result that microscopic objects like electrons and protons behave exactly the
same way, with the same kind of schizophrenia producing particle-like behavior in one context or measurement
apparatus, wave-like behavior in another.
185
Wikipedia: http://www.wikipedia.org/wiki/Charge Coupled Device. A CCD is basically the “electronic film”
used in digital cameras, consisting of a fine-mesh grid of photosensitive electrical units
186
This same intuition has driven the invention of e.g. “Blu-ray” disc formats that hold more information.
Blue light has roughly half the wavelength of red light, so one can store roughly 4x as much information
at the diffraction limit of resolution of blue light on disks compared to red. DVDs based on hard ultraviolet
(λ ∼ 100 − 200 nm) would hold a factor of 4 to 16 more data, and I’m quite certain that the minute I finish
buying lots of blue-based movies UV DVD will be trotted out to replace it all yet again, this time on tiny DVDs...
The same is true for two tiny features inside a bacterium or almost any two source ob-
jects imaged through a circular aperture. The central rays from object to image must be
separated by more than 1.22λ/D or the two images will blur into one.
Imaging nearly anything gets dicey when the objects themselves are of the order of a
wavelength in size or smaller. If you have ever seen water waves striking a pier support that
is much smaller than a wavelength you know that they swirl right around it and recombine
on the far side. A short distance away from the pier there is little sign in the shape of the
wavefronts that there was a pier there at all. In order to reflect a wave or obstruct a wave,
an object needs to be (ideally much) bigger than the wavelength of the wave.
Practically speaking, it is very difficult to create viewable images of objects much
smaller than a half a micron using visible light. Bacteria are thus visible through a visible
light microscope, but structures in or on the bacteria are not. Only the largest of viruses
are visible with visible light.
To see objects smaller than the wavelength of visible light, one needs a wave with a
smaller wavelength. Electron microscopes use electron “waves” to see objects as small as
5 nm – small enough to see most viruses in considerable (beautiful) detail187.
We can see that physicians and physicists alike need to have a fairly clear idea of
the role that waves play in the formation of the magnified images that permit us to see
the very small or the very far away. It is quite easy to build microscopes and telescopes
for which diffraction, wave interference and things like chromatic distortion are the limiting
factors that prevent us from being able to see further, smaller, better. Even if you will never
actively design a microscope or telescope, understanding their limitations will make you a
better consumer of the information that they can provide.
Observing interference from slits thick or thin, at optical frequencies, is a bit of a rarity
in everyday life. We just don’t trip over visible light travelling through multiple pathways
within the coherence length of the light to reach a common goal every day, given that the
coherence length of light from hot/chaotic sources is the order of a few microns (tens to
perhaps a hundred wavelengths). Exceptions do include – for a few people – diffraction
limited viewing through visible light telescopes and microscopes, discussed above, or peo-
ple who use spectrographs based on diffraction gratings. Well, I suppose I should include
the rainbow of colors one can see on the bottom of CDs or DVDs, which are basically
reflection-based diffraction gratings as light bounces off of the many tiny tracks scored in
the reflective surfaces – now that is an everyday experience but it hasn’t always been so.
Thin film interference, however, is something that we might well observe every day, or
nearly so. Every time we blow a soap bubble, or see a slick of oil or gasoline on water,
swirling around with many colors, we are observing thin film interference. Whenever we
look at the lens of a camera and see a lack of reflections or those same “metallic” colors,
187
Wikipedia: http://www.wikipedia.org/wiki/Virus. This article has some lovely transmission electron micro-
graphs of viruses, revealing detail that would be completely invisible to the eye even with the aid of a powerful
visible light microscope.
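[Figure 13.15 diagram: a thin film of thickness d and index n2 lies between medium n1 (above) and
medium n3 (below), with n1 < n2 < n3. Both reflections are marked with a phase shift of π (two
phase shifts of π).]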
Figure 13.15: One of the two basic diagrams for thin film interference. The total phase
difference in the superposed reflected waves in the case n1 < n2 < n3 or n3 < n2 < n1 is
just δ = k′ (2d), as the phase shifts produced by reflecting off of the two surfaces are either
both zero or both (as they are in this case) π, in which case they cancel.
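[Figure 13.16 diagram: a thin film of thickness d and index n2 lies between medium n1 (above) and
medium n3 (below), with n1 < n2 > n3. Only the reflection at the upper surface is marked with a
phase shift of π (one phase shift of π).]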
Figure 13.16: The second of the two basic diagrams for thin film interference. The total
phase difference in the superposed reflected waves in the case n1 < n2 > n3 or n1 > n2 < n3
is δ = k′ (2d) + π, as there is a phase shift of π produced by reflecting off of the surface
of a material with a higher index of refraction at only one of the two surfaces.
we are seeing thin film interference. Thin film interference gives color and life to ornaments
and has various other technological or social applications, even if those who observe it
don’t realize what it is.
We’d like to understand it and learn to recognize it and see one or two of its applications.
Fortunately, it is (at this point) quite simple. Here’s the idea.
In figures 13.15 and 13.16 a thin film of transparent material sits in between two other
transparent materials. Each material has its own index of refraction, and we will for the
moment use the convention that n1 is the index of refraction of the material the light is
coming from, n2 is the index of the thin film itself, and n3 is the index of the material the
light is going to.
Incident light (often white light, a mixture of all the visible colors/wavelengths) arrives
approximately “normally” onto (coming in perpendicular to) the surface between n1 and n2.
Some fraction of this light reflects off of the interface; the rest is transmitted into n2. The
light that makes it into n2 is then incident normally on the interface between n2 and n3.
Again, some fraction is reflected and some is transmitted. Finally, the light that is reflected
back up arrives at the interface between n1 and n2 a second time, this time coming from
below, and a fraction of it is transmitted back into medium n1, where the electromagnetic
wave combines with the original reflected wave.
The interference we observe thus comes from adding two waves: the wave reflected from the
interface between n1 and n2, with field amplitude E12, and the wave reflected from the
interface between n2 and n3, with field amplitude E23,
where (as we will see below) there is a chance of a phase shift occurring in both reflected
waves compared to the phase of the incoming wave. Note also that it is almost certain
that E12 ≠ E23, that is, the two reflected waves will very likely have somewhat different
amplitudes as they recombine.
Presuming that these two waves have at least approximately equal field amplitudes
and a consistent phase difference brought about at least partly by path difference (the
wave that traverses the film twice travels a distance 2d farther than the wave that reflects
off of the first surface), this superposition will partially cancel or partially add the waves
for different wavelengths. Some wavelengths will be brightened, others diminished. The
reflected white light will therefore take on those characteristic mauves and greens and
poisonous shiny blues that are familiar to us all.
Of course, there are a few details we have to consider, and they are important; they are
why we need two figures (and two phase shifts) to demonstrate two of the four possible
patterns of sort order of the indices of refraction. In a nutshell, two things contribute to
the overall phase shift between the recombined waves – the phase shift due to the path
difference in the medium n2 and a phase shift caused by reflecting off of a medium with
a higher index of refraction! Let’s begin by working out the former, as that is easiest, and
then we’ll talk extensively about the latter, as the phase shifts due to reflection off of the
surfaces themselves will require us to go back to our intro physics 1 course and recall e.g.
the reflection of waves on strings off of interfaces between a light string (where the speed
of the wave is large) and a heavy string (where the speed of the wave is less).
This one, as promised, is easy. The wave that traverses the thin film (twice!) goes an
additional distance ∆r = 2d compared to the wave that reflects off of the upper surface.
We are thus tempted (after “reflection”188 on what we have learned so far) to associate
with this path difference an additional phase δpath = k(2d).
188
Har, har...
As it turns out, this heuristic guess is almost correct! But as the saying goes, “almost”
only counts in horseshoes and hand grenades189. The problem is that the path difference
accumulates while the wave is in the thin film! To get the phase difference right, then, we
have to use the wavelength (and hence wave number) in the thin film medium n2 , not the
one we used in the originating medium n1 , or worse, the one that the light would have in a
vacuum!
You should recall that:
$$\lambda_2 = \frac{\lambda}{n_2} \tag{13.82}$$
where λ is the wavelength of the light in a vacuum. This leads to a wavenumber of:
$$k_2 = \frac{2\pi n_2}{\lambda} \tag{13.83}$$
and a phase shift of:
$$\delta_{\rm path} = k_2 (2d) \tag{13.84}$$
Basically, the wave that traverses the thin film accumulates phase at the spatial rate of k2 ,
not k, k1 , or k3.
Using k instead of k2 is a very common mistake made by students of physics!
Don’t let it be you!
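To see how much this mistake can matter, here is a minimal sketch that computes the path phase shift both ways. The function name and the sample film (300 nm thick, n2 = 1.33, lit by light of 600 nm vacuum wavelength) are assumptions chosen purely for illustration.

```python
import math

def path_phase(d, wavelength_vacuum, n_film):
    """Phase accumulated over the round trip 2d inside the film: k2 * (2d)."""
    k2 = 2 * math.pi * n_film / wavelength_vacuum
    return k2 * (2 * d)

# Assumed sample values: film thickness, vacuum wavelength (meters), film index.
d, lam, n2 = 300e-9, 600e-9, 1.33

correct = path_phase(d, lam, n2)    # uses k2, the wavenumber in the film
wrong = path_phase(d, lam, 1.0)     # the common mistake: uses the vacuum k

print(f"using k2: delta_path = {correct / math.pi:.2f} pi")
print(f"using k : delta_path = {wrong / math.pi:.2f} pi")
```

With these numbers the vacuum wavenumber gives a phase of exactly 2π (which would look like a maximum), while the correct wavenumber gives 2.66π, which is neither a maximum nor a minimum.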
Next, let’s examine the phase shifts due to the actual reflections themselves.
As you should remember from the treatment of waves in the first half of this course (see
my book online190 if all of this eludes you), a wave pulse on a string that partially reflects
off of the junction with a heavier string (slower speed) flips over, whereas a wave pulse on
a heavier string that partially reflects off of the junction with a lighter one does not. The
transmitted wave pulse in both cases does not flip.
Exactly the same thing happens for harmonic wave trains or wave pulses in the case of
light. If a harmonic light wave reflects off of a denser medium (which usually has a higher
index of refraction and a slower velocity of light) the reflected wave inverts. Inversion is
basically multiplication by a minus sign, or equivalently (for harmonic waves) shifting the
phase of the reflected wave by π or the heuristic equivalent half-wavelength. If a harmonic
light wave reflects off of a lighter medium (lower index of refraction) the reflected wave does
not flip; it retains its original phase.
There are thus four permutations of sort order for the indices of refraction n1 , n2 , n3 .
They are:
I strongly recommend that when you solve a problem involving thin film interference,
you circle the reflections that have a phase shift δij = π and write a little “π” next to each
one, as I did in figures 13.15 and 13.16 above. Then you are less likely to forget to include
189
...and possibly even other things that begin with ‘h’, such as hydrogen bombs. Being “almost” hit by a
hydrogen bomb can ruin your whole day...
190
http://www.phy.duke.edu/ rgb/Class/intro physics 1.php Introductory Physics 1
Ordering          δ12    δ23    Relative shift
n1 < n2 < n3      π      π      0
n1 > n2 > n3      0      0      0
n1 < n2 > n3      π      0      π
n1 > n2 < n3      0      π      π
Table 11: Relative phase shift introduced between the wave reflected off of the n1 → n2
interface and the transmitted wave reflected off of the n2 → n3 interface. Note that in the
first two cases (smoothly increasing or decreasing n) there is no net phase shift with n2
“in the middle”. In the second two cases, the index of refraction of the thin film medium is
either higher than that of its neighbors or lower, but not in the middle.
it in your overall computation and understanding of the total relative phase shift. Leaving
out one or more of these phase shifts (and getting the max’s and min’s backwards as a
result) is another common error. Don’t do it!
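The bookkeeping recommended here is simple enough to automate. The sketch below applies the rule stated above (a shift of π on reflection off of a higher index, none otherwise) to each of the four sort orders. The function names and the sample indices (air 1.00, water 1.33, glass 1.50) are assumptions made for illustration only.

```python
import math

def reflection_shift(n_from, n_to):
    """Phase shift on reflection: pi off a higher-index medium, otherwise 0."""
    return math.pi if n_to > n_from else 0.0

def thin_film_shifts(n1, n2, n3):
    """Return (delta_12, delta_23, relative shift) for a film n2 between n1 and n3."""
    delta_12 = reflection_shift(n1, n2)   # reflection at the n1 -> n2 surface
    delta_23 = reflection_shift(n2, n3)   # reflection at the n2 -> n3 surface
    return delta_12, delta_23, abs(delta_12 - delta_23)

# Assumed sample indices: air 1.00, water 1.33, glass 1.50.
orderings = {
    "n1 < n2 < n3": (1.00, 1.33, 1.50),
    "n1 > n2 > n3": (1.50, 1.33, 1.00),
    "n1 < n2 > n3": (1.00, 1.33, 1.00),
    "n1 > n2 < n3": (1.50, 1.00, 1.50),
}
for name, (n1, n2, n3) in orderings.items():
    d12, d23, rel = thin_film_shifts(n1, n2, n3)
    print(f"{name}: delta_12 = {d12 / math.pi:.0f} pi, "
          f"delta_23 = {d23 / math.pi:.0f} pi, relative = {rel / math.pi:.0f} pi")
```

Only the relative shift matters for the max and min rules, which is why the four orderings collapse into just two cases.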
Now we are ready to put all of this together and determine the heuristic conditions
for maxima and minima. We’ll do this twice, once for each of the two “opposite” rules one
gets for max’s and min’s.
Consider the case where δ12 = δ23 = 0 or π. In both of these cases there is no relative
phase shift due to the reflections. Either both waves flip (and hence accumulate phase
difference only due to the path difference) or neither wave flips (ditto). Either way, the total
relative phase shift δ is just due to the path difference:
$$\delta = k_2 (2d) = \frac{2\pi n_2}{\lambda} (2d) = \frac{4\pi n_2 d}{\lambda} \tag{13.85}$$
We can now use our simple heuristic rules for max’s and min’s: If the path difference
is an integer number of wavelengths λ2 in the thin film, then we expect the two waves to
recombine in phase and while the resultant amplitude may not be twice either of the two
waves, it will certainly be larger than either one alone. Similarly, if it is an odd-half integer
number of wavelengths in the film, we expect the waves to be exactly out of phase and to
maximally cancel. We’ll summarize this as:
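$$2d = m\lambda_2 = m\,\frac{\lambda}{n_2} \qquad m = 0, 1, 2, \ldots \quad \text{maxima} \tag{13.86}$$
$$2d = \frac{2m + 1}{2}\,\lambda_2 = \frac{(2m + 1)}{2}\,\frac{\lambda}{n_2} \qquad m = 0, 1, 2, \ldots \quad \text{minima} \tag{13.87}$$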
Of course, this is only heuristic. The “correct” way to arrive at the same place is to set
δ to 0, 2π, 4π... for constructive interference and to π, 3π, 5π... for destructive interference.
It is left as a fairly simple (and hopefully by now, familiar) exercise for the student to show
that if you do this, you arrive precisely at our heuristic rules.
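For readers who want to check their work, the algebra can be sketched in two lines. This is only an outline of the exercise for the case of no relative reflection shift; it uses nothing beyond the expression for δ in equation (13.85).

```latex
% No relative reflection shift: delta comes from the path difference alone.
\begin{align*}
\delta = \frac{4\pi n_2 d}{\lambda} = 2\pi m
  \quad &\Longrightarrow \quad 2d = m\,\frac{\lambda}{n_2}
  \qquad \text{(maxima)} \\
\delta = \frac{4\pi n_2 d}{\lambda} = (2m + 1)\pi
  \quad &\Longrightarrow \quad 2d = \frac{2m + 1}{2}\,\frac{\lambda}{n_2}
  \qquad \text{(minima)}
\end{align*}
```

In both lines m = 0, 1, 2..., and the results match equations (13.86) and (13.87) term by term.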
Consider the cases where either δ12 or δ23 is π and the other is 0. In both of these cases
there is a relative phase shift due to the reflections. One of the two waves flips (and hence
“suddenly” accumulates an additional phase of π) and the other does not. No matter which
wave flips, the total relative phase shift δ must add or subtract this relative phase to the one
from the path difference:
$$\delta = k_2 (2d) \pm \pi = \frac{2\pi n_2}{\lambda} (2d) \pm \pi = \frac{4\pi n_2 d}{\lambda} \pm \pi \tag{13.88}$$
Note that the sign we get differs depending on which one flipped. However, we don’t
really care which sign we get. This is because sin(θ + π) = sin(θ − π) = − sin(θ), so
we can simply move a π with either sign to whatever side of the equals sign that seems
convenient to us. In order to get the best correspondence with our heuristic rules, we
should probably use the minus sign no matter which one flipped (which I just proved that
we can do):
$$\delta = k_2 (2d) - \pi = \frac{2\pi n_2}{\lambda} (2d) - \pi = \frac{4\pi n_2 d}{\lambda} - \pi \tag{13.89}$$
That will let us move it over onto the same side as the other π’s with a plus sign later.
The heuristic rules for max’s and min’s are now exactly the opposite of the ones above:
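$$2d = \frac{2m + 1}{2}\,\lambda_2 = \frac{(2m + 1)}{2}\,\frac{\lambda}{n_2} \qquad m = 0, 1, 2, \ldots \quad \text{maxima} \tag{13.90}$$
$$2d = m\lambda_2 = m\,\frac{\lambda}{n_2} \qquad m = 0, 1, 2, \ldots \quad \text{minima} \tag{13.91}$$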
This is because the extra phase shift of π or minus sign in the wave corresponds to exactly
half of a wavelength path difference in the medium, just enough to make the two rules swap
places. In words, if the path difference contains an odd-half integer number of wavelengths
in the medium, the phase shift of π at the surface contributes the equivalent of another
half wavelength and the waves will recombine constructively in phase. Similarly, if the
path difference in the medium contains an integer number of wavelengths, the extra phase
shift puts them back exactly out of phase for (maximally) destructive interference and a
minimum.
Again, the “correct” way to arrive at this heuristic is to set δ to 0, 2π, 4π... for constructive
interference and to π, 3π, 5π... for destructive interference. The extra factor of π is there,
ready to be moved to the other side with whatever sign that pleases you. Again, a diligent
student should verify that this leads straight to the heuristic rules.
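The two sets of heuristic rules can be turned around to ask which visible colors a given film brightens or dims. The sketch below does this for both cases; the function name, the 400 to 700 nm visible band, and the sample film (320 nm thick, n2 = 1.33, one reflection flipped) are assumptions chosen for illustration.

```python
def extrema_wavelengths(d, n_film, one_flip, band=(400e-9, 700e-9), m_max=50):
    """Return (bright, dark) lists of vacuum wavelengths inside `band`.

    one_flip=False: no relative reflection shift, 2d = m*lambda/n2 is a maximum.
    one_flip=True : relative shift of pi, so the two rules swap places.
    """
    optical_path = 2.0 * n_film * d
    # 2d = m * lambda / n2, skipping m = 0 (infinite wavelength).
    integer_rule = [optical_path / m for m in range(1, m_max)]
    # 2d = (m + 1/2) * lambda / n2.
    half_rule = [optical_path / (m + 0.5) for m in range(m_max)]

    def in_band(wavelengths):
        return sorted(w for w in wavelengths if band[0] <= w <= band[1])

    if one_flip:
        return in_band(half_rule), in_band(integer_rule)
    return in_band(integer_rule), in_band(half_rule)

# Assumed sample film: 320 nm thick, n2 = 1.33, in air on both sides (one flip).
bright, dark = extrema_wavelengths(320e-9, 1.33, one_flip=True)
print("brightened:", [f"{w * 1e9:.1f} nm" for w in bright])
print("diminished:", [f"{w * 1e9:.1f} nm" for w in dark])
```

Calling the same function with one_flip=False swaps the two lists, which is just the statement that the extra π exchanges the rules for maxima and minima.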
The occurrence of discrete phase shifts of π upon reflection from none, one, or both sur-
faces has one easily observable consequence. A very thin film, one that is much thinner
than a wavelength (d ≪ λ) will have no phase shift from path difference, as the film isn’t
thick enough. The only shifts that matter, then, are those that arise from the inversions
reflecting off of a higher-n interface. There are as before only two combinations that matter
– no relative reflection shift or a relative reflection shift of ±π.
In the former case (two shifts or no shifts, hence no relative shift), light reflected from the
upper and lower surfaces emerges in phase for all wavelengths! The surface becomes shiny
white, even mirror-like.
In the latter case (one shift in either order), light comes off of the surfaces almost exactly
out of phase for all wavelengths, and destructive interference results. Light is not reflected
from the surface; it becomes extremely transparent.
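A quick numerical check of these two limits is sketched below. It assumes the simple two-wave picture used throughout this section, with the reflected intensity proportional to E12² + E23² + 2 E12 E23 cos δ, and for simplicity takes the two amplitudes to be equal; the function name and the sample film (10 nm thick, n2 = 1.33) are likewise illustrative assumptions.

```python
import math

def relative_intensity(d, wavelength, n_film, one_flip, e1=1.0, e2=1.0):
    """Two-wave reflected intensity, scaled so fully in-phase waves give 1."""
    delta = 4 * math.pi * n_film * d / wavelength
    if one_flip:
        delta += math.pi          # relative reflection shift of pi
    intensity = e1**2 + e2**2 + 2 * e1 * e2 * math.cos(delta)
    return intensity / (e1 + e2) ** 2

# Assumed sample film: 10 nm thick (d << lambda), n2 = 1.33.
d, n2 = 10e-9, 1.33
for lam in (400e-9, 550e-9, 700e-9):
    same = relative_intensity(d, lam, n2, one_flip=False)
    flip = relative_intensity(d, lam, n2, one_flip=True)
    print(f"{lam * 1e9:.0f} nm: no relative shift {same:.3f}, one shift {flip:.3f}")
```

Across the whole visible band the first column stays close to 1 and the second close to 0, which is the "all wavelengths at once" behavior described above.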
Whether or not you know it, you have probably observed concrete examples of both of
these limits. For example, a drop of oil or gasoline that falls onto a rain puddle over black
pavement instantly spreads out and forms a thin film. We have all seen the initial rainbow
swirl of strange “metallic” colors, followed by the surface becoming shiny and grey. What
one is seeing is the oil forming a layer on top of water with the order of indices of refraction
nair < noil < nwater .
A second “experiment” – one that is greatly enjoyed by physics students the world over,
including very young ones – is to blow soap bubbles191 . All of us are familiar with the
swirl of colors seen in the reflections from these spherical balls of thin soap film, and at
this point you should understand that colors are the results of the enhancement of some
wavelengths of light in the visible band and diminishment of others, constantly varying as
the soap swirls around in the film (and the film thickness changes minutely) and as the
angle of incidence and reflection of the light is varied by perspective.
If you blow a nice, big bubble that just hangs there for a time on a still day, supported by
the slight buoyancy of the warm air of the breath with which you blew it, you will probably
observe the following, although how successful you are may depend on the particular mix
of soap you are using (some soap mixtures ‘pop’ more quickly than others).
As you watch, the color swirl will settle down and become colored not-quite rainbow
like rings concentric around the vertical axis, and concentrated in the bottom half of the
bubble. You may see several sets of rings at some point. What is happening is that the
bubble soap is sinking under the influence of gravity and “bulging” the film at the bottom
and thinning it out on top. At the same time, of course, the film is evaporating – getting
thinner as the water molecules in the film thermally bounce free.
On the top, a curious thing happens. The film stops exhibiting color at all – it becomes
completely transparent! In fact, as the water evaporates, the entire bubble may become
almost completely invisible, revealed only by a hint of distortion at the outside edge of the
sphere and an almost invisible tracing of lines where the soap is ever so slightly thicker
and holding the bubble together.
This transparency is caused, as noted above, by light reflecting off of the first surface
with a phase shift of π (functionally, a half of a wavelength) and reflecting off of the second
surface with no phase shift. Once the film is much thinner than a wavelength, light in all
wavelengths thus recombines destructively, largely cancelling the reflected wave. Light
that isn’t reflected is transmitted; hence the soap bubble becomes transparent.
This trick is used to advantage to make advanced optical coatings for e.g. binoculars,
191
That’s right, this is an assignment! Go down to the store and get a bottle of bubble soap in any size that
suits you. Blow bubbles, the bigger the better, ideally on a still, quiet, warm day where you get good ‘hang
time’...
telescopes, microscopes, and other optical instruments. By covering the outer surface of
the primary lens with a thin (< 100 nm) coating with a higher index of refraction than the
glass, destructive interference in all visible wavelengths is assured, resulting in a lens that
maximizes light transmission. High quality coated optics deliver 90+% of the light that is
incident on them to the eye of the observer, which makes a big difference when compared
to expected reflection/transmission intensities for the glass-air interface alone192.
192
In my online book Classical Electrodynamics II I derive the transmission coefficient
$$T = \frac{4 n_1 n_2}{(n_1 + n_2)^2}$$
for normal reflection. This is the fraction of intensity that is transmitted at an interface between two otherwise
perfectly transparent media with differing indices of refraction. We omit discussing transmission and reflection
coefficients in this book because they are too difficult to derive or handwave, arising from solving the boundary
value problem on the surface between the two media.
However, for air (na ≈ 1) and glass (ng ≈ 3/2) the expected transmitted fraction of the intensity from each
air-glass surface (in either direction) is thus T = 0.96. For four surfaces (two lenses), this means that only 85%
of the light makes it through to the eye, less if there are additional reflecting surfaces or lenses in the optical
path, less still from filters or absorption by the glass (which is small but not zero). Coating can increase the
transmitted fraction to 0.98-0.99 (per surface) and thus transmit an easy 10% more light.
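The arithmetic in this footnote is easy to reproduce. The sketch below evaluates the transmission coefficient quoted above for an air-glass surface and compounds it over four surfaces; the function name is an assumption, and the coated per-surface values are simply the 0.98 and 0.99 quoted in the footnote.

```python
def transmission(n1, n2):
    """Transmitted intensity fraction at one interface: 4 n1 n2 / (n1 + n2)^2."""
    return 4 * n1 * n2 / (n1 + n2) ** 2

surfaces = 4                                  # two lenses, two surfaces each
t_uncoated = transmission(1.0, 1.5)           # air (n ~ 1) to glass (n ~ 3/2)
print(f"per surface: {t_uncoated:.2f}")
print(f"uncoated, {surfaces} surfaces: {t_uncoated ** surfaces:.3f}")

# Per-surface transmitted fractions for coated optics, as quoted in the footnote.
for t_coated in (0.98, 0.99):
    print(f"coated at {t_coated:.2f}, {surfaces} surfaces: {t_coated ** surfaces:.3f}")
```

Note that the formula is symmetric in n1 and n2, which is why the same fraction applies in either direction through each surface.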
Problem 1.
Physics Concepts
Make this week’s physics concepts summary as you work all of the problems in this
week’s assignment. Be sure to cross-reference each concept in the summary to the prob-
lem(s) they were key to. Do the work carefully enough that you can (after it has been
handed in and graded) punch it and add it to a three ring binder for review and study come
finals!
Problem 2.
Derive the intensity as a function of θ for the two-slit problem (where the slits are assumed
to be a ≪ λ in width). For d = 4λ, find the angles where the intensity is maximum and
minimum. Sketch the interference pattern from θ ∈ [−π/2, π/2].
Problem 3.
Redo problem 2, but this time assume that the slits have a finite width of a = 3λ and that
d = 6λ. Determine all of the interference and diffraction minima and maxima (the latter can
be approximate for diffraction) and sketch a qualitatively correct picture of the interference
pattern underneath the diffraction envelope.
Problem 4.
There are four permutations of results for thin film interference based on the relative sizes
of n1 , n2 and n3 where n2 is the index of refraction of the thin film itself and the others are
the index of refraction of the first (originating medium) and third layers. Derive the condition
(relation between t the thickness of the film and λ0 the wavelength of the incident light in a
vacuum) for interference maxima and minima for all four orders. Be sure to circle on your
figures the reflections at surfaces that are accompanied by a discrete phase shift of π.
Problem 5.
Draw the phasor diagrams from which one can determine the angles at which primary and
secondary maxima and minima occur for five small (a ≪ λ) slits separated by a distance d.
From these diagrams write the conditions on δ = kd sin θ such that maxima and minima occur. Find the
actual angles θ for d = 4λ, graph the intensity, and compare it to the answer to problem
2 above.
Problem 6.
Joe Braggart claims to have really, really good vision. “Why,” he says. “My vision is so
good I can make out the Galilean moons of Jupiter with my naked eyes on a really clear
night. If I’d been around at the time of Galileo we wouldn’t have had to invent the telescope
in order to confirm the Copernican theory.”
Callisto is the moon with the largest orbit and has a maximum distance from Jupiter
of just under 2 × 10^6 kilometers. At its closest point to the earth, it is around 600 × 10^6
kilometers away. Assuming that he is using visible light, is there a chance that he’s telling
the truth? Note well: This is a problem on resolution, not lenses or the sensitivity of the
retina, so determine whether or not Jupiter and its moon are resolved by the human
eye at this distance.
Problem 7.
Derive the intensity as a function of θ for the single slit problem. For a = 3λ, find the angles
where the intensity is a minimum. Sketch the diffraction pattern from θ ∈ [−π/2, π/2]. If
you prefer, you can solve for the sines of the angles and sketch the diffraction pattern from
sin θ ∈ [−1, 1] instead.
Advanced Problem 8.
From your algebraic answer to the previous problem, obtain an expression for the angles
where diffraction maxima occur. You might find the following useful:
$$\frac{d f^2}{dx} = 2f\,\frac{df}{dx}$$
which has zeros both where f = 0 (the minima, except for the one at θ = 0) and where
df/dx = 0 independently. Also recall from the footnote in the text above that:
$$\lim_{x \to 0} \frac{\sin(x)}{x} = 1$$
and hence is not “undefined”.
Advanced Problem 9.
Derive the expression R = mN = λ/∆λ for resolution for a diffraction grating with N slits
of separation d. This proceeds as follows: First use a phasor diagram to determine the
angle(s) where the principal maxima occur. Then use it to find the angles where the first
minimum following such a maximum occurs for any given order m. This tells you the
angular half-width of the maximum for a given λ. Use Rayleigh’s criterion for resolution to
determine the minimum ∆λ that can be resolved (consider λ′ = λ + ∆λ), and verify the
expression above.