7-dom-querying
>> Evgenii Ray: So what is the DOM API? The DOM API is a set of methods that we can utilize to manipulate the DOM. And usually, we don't use it directly, because most of the libraries handle the DOM mutations for us, for instance, React, Angular, or Vue. But there are cases where we actually need to use the DOM API.
[00:00:21]
And one of them is when you're building some low-level library that works with virtualization or some DOM management. It's also used when you need to create some generic components that can be ported to any library or framework. It can be a video player or a chart engine. And also, for instance, when you are building your portfolio website and you want to minimize the footprint of your application by not importing too many libraries.
[00:00:51]
So this way you can just use the DOM API and your app will be lightweight. So this section will give us a quick refresher on how to use the DOM API. First, let's overview the objects that are available to us and how we can use the DOM API.
[00:01:11]
So the first global object that we are all aware of is the window. The window is the globally accessible object, so you don't need to import anything. The next one is the document. The document represents the HTML document, basically the page that we're currently rendering. And the document can be accessed from the window or directly by referencing it as a global object.
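For instance, a quick check in the browser console (just a sketch) shows they point at the same thing:

```js
// `window` is the global object; `document` can be reached through it or referenced directly.
console.log(window.document === document); // true: both refer to the same Document instance
console.log(globalThis === window);        // true in the browser's main thread
```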
[00:01:40]
Then we have some quick access links that we can use on the document: body and head give us the body tag and the head tag respectively. Now let's go over the class hierarchy. All the elements on the HTML page are represented by HTMLElement. But, apparently, HTMLElement doesn't have any of the DOM API assigned to it.
[00:02:06]
The DOM API is actually assigned to the Element prototype. So why is that? Because the DOM API can actually work with more than just HTML: it can also be SVG elements or XML-like documents. That's why the DOM API lives on the Element prototype instead of HTMLElement. And we also have the Node.
[00:02:29]
The DOM is a tree, and somehow we need to access the tree properties. So that's why we have a special class that's designed to represent a tree leaf, and this is the Node. And we have an exception, which is the TextNode, which basically extends the Node. The TextNode is used whenever we place any text, because the browser still needs some node to represent text inside an HTMLElement.
[00:02:54]
So the browser needs to somehow represent the text that we place in our markup. Then we have the HTMLDocument, and the HTMLDocument is an outlier here because it has the DOM API assigned to its prototype, but it extends the Node. I don't know the reason why, but this is the exception in the whole hierarchy that we need to live with.
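As a rough sketch of this hierarchy, you can run something like the following in the browser console; the comments reflect what current browsers report:

```js
const div = document.createElement('div');

// A div is an HTMLElement, an Element, and a Node all at once.
console.log(div instanceof HTMLElement, div instanceof Element, div instanceof Node); // true true true

// HTMLElement -> Element -> Node in the prototype chain.
console.log(Object.getPrototypeOf(HTMLElement.prototype) === Element.prototype); // true
console.log(Object.getPrototypeOf(Element.prototype) === Node.prototype);        // true

// The querying/attribute API lives on Element.prototype, not on HTMLElement.prototype.
console.log(Element.prototype.hasOwnProperty('querySelector'));     // true
console.log(HTMLElement.prototype.hasOwnProperty('querySelector')); // false

// Tree-related members live on Node.prototype.
console.log(Node.prototype.hasOwnProperty('appendChild')); // true

// The document is also a Node, and it has its own query methods on Document.prototype.
console.log(document instanceof Node);                            // true
console.log(Document.prototype.hasOwnProperty('getElementById')); // true
```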
[00:03:17]
And the next one is the window. Basically, the window has an instance of HTMLDocument, and it doesn't have any interesting properties on its prototype; it represents the browser viewport. Query methods: let's now understand how the browser queries elements on a page, how the DOM API queries the elements.
[00:03:37]
So let's start with the most simple method, which is getElementById. When the browser reads the page, it constructs a hash map of the elements. It reads the ID, and then it basically builds a cache where the ID is mapped to some HTML element reference. So this means that when we call getElementById, the time complexity and read cost is just O(1).
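For example (a minimal sketch; the id "app" here is just a made-up example):

```js
// Assuming the page contains something like <div id="app">...</div>.
// The browser keeps an internal id -> element map, so this lookup is effectively O(1).
const app = document.getElementById('app');
console.log(app); // the element, or null if no element with that id exists
```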
[00:04:05]
Because this is just basically accessing the object by its key. The situation is slightly different when it comes to getElementsByClassName. For getElementsByClassName we need to actually traverse the whole tree, and the way the browser traverses the tree is using the DFS approach. So it basically goes through the elements and finds all the elements on the page with the given class.
[00:04:36]
And the time complexity for this is linear, so we need to go through all elements in the worst case, but you see there is an asterisk here. This is because the browser is quite smart, and it utilizes a hash map when you try to query things. So the first query will take some time, but if you execute the same query next time, it will be almost immediate because the browser utilizes a caching mechanism.
[00:05:04]
And one thing about this method is that it also returns an HTMLCollection. An HTMLCollection is actually a legacy collection which has one interesting characteristic: it's live. So this means that if we modify the DOM tree and, for instance, remove one element from the DOM, this will be reflected in our HTMLCollection.
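A small sketch of that live behavior (the class name "item" is a made-up example):

```js
// Assuming the page has several elements with class "item".
// getElementsByClassName returns a *live* HTMLCollection.
const items = document.getElementsByClassName('item');
console.log(items.length); // e.g. 3

// Removing a matching element from the DOM is reflected in the collection immediately.
items[0]?.remove();
console.log(items.length); // e.g. 2 — the collection "sees" the mutation
```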
[00:05:30]
And that's why the read cost of the HTMLCollection is big O(N): every time we need to read a single element, the browser will parse the tree again to verify its position. So this is a disadvantage of this approach. But on the other hand, the memory cost is low because the browser doesn't need to clone any objects, it just takes the already existing references.
[00:06:00]
So you can utilize the HTMLCollection when you're low on memory and you need to use as little memory as possible. Okay, getElementsByTagName works the same way. There is no difference at all, it just allows us to query things by the tag name. The querySelector works slightly differently. It utilizes a CSS selector, so the time complexity depends on the selector that we use.
[00:06:34]
So we always spend some time on compiling the selector. This means that the more complex a selector you have, the more time it takes for the browser to compile it. The read access and the memory cost are big O(1) because we're just returning a single element. And the querying mechanism, when we use class names or tags, is the same as the DFS.
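For example (both selectors here are made-up examples; a simple one is cheap to compile, a long one costs more):

```js
// A simple selector: cheap to compile, matched with a DFS over the tree.
const firstCard = document.querySelector('.card');

// A more complex selector: the browser spends more time compiling and matching it.
const nested = document.querySelector('main section.card > ul li:nth-child(2) a[href^="https"]');

// Both return a single element (or null), so the read and memory cost stays O(1).
console.log(firstCard, nested);
```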
[00:07:04]
If we use an ID, then the browser is going to convert this statement to getElementById instead. So one interesting thing, and this is about querySelectorAll, is that it returns a NodeList. The NodeList is another type of collection, but it's not live, it's just basically a copy of the elements.
[00:07:25]
So what the browser does is it just copies the HTML objects. And this means that we pay additional memory costs when we do the querying. So it's not advisable to query too many things, because you may end up having a memory spike. But the read access is O(1) because you're just accessing the object directly from the array instead of traversing the tree again to maintain a live collection.
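A small sketch of that static behavior (again, the class name "item" is a made-up example):

```js
// querySelectorAll returns a *static* NodeList: a snapshot taken at query time.
const snapshot = document.querySelectorAll('.item');
console.log(snapshot.length); // e.g. 3

// Mutating the DOM afterwards does NOT change the NodeList.
snapshot[0]?.remove();
console.log(snapshot.length); // still 3 — the removed element stays in the snapshot

// Reads are O(1): it's just indexing into the stored list, no re-walk of the tree.
console.log(snapshot[1]);
```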
[00:07:51]
And the querying mechanism is the same, DFS plus hash map. Quick summary: getElementById provides the best performance because the browser builds the cache. But you need to make sure that you utilize the ID space correctly. It's not a great pattern to rely on IDs too much, because the ID namespace is shared across your multiple components, so just make sure that you don't overuse IDs in your app.
[00:08:23]
getElementsByClassName or getElementsByTagName provide a low memory overhead because they return a live collection with just references to existing objects. But the read cost is high, so you need to make sure that you use this collection right. Because if you're going to loop over this collection, yes, it will turn into quadratic time, as the sketch below shows.
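One way to avoid that (a sketch, assuming a page with many elements of a made-up class "item") is to snapshot the live collection into a plain array once and loop over the array:

```js
const live = document.getElementsByClassName('item');

// Risky pattern: every `live.length` / `live[i]` read may make the browser re-validate
// the live collection against the tree, which can degrade toward quadratic behavior.
// for (let i = 0; i < live.length; i++) { ... }

// Safer pattern: take a one-time snapshot into a plain array, then iterate that.
const snapshot = Array.from(live);
for (const el of snapshot) {
  el.classList.add('visited'); // plain array reads are cheap and predictable
}
```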
[00:08:50]
And we don't want to do that in our app on a large set of elements. The querySelector has slightly worse performance than getElementById, but the browser itself heavily optimizes CSS selectors. So in different browsers you may get different results, but on average it's actually very close to getElementById.
[00:09:13]
But the elements that are returned from this query do not represent a live collection. So if you remove an element from the DOM tree, you may end up having a stale element in your collection. Yeah, the same applies to querySelectorAll, but because it copies all the elements, you don't want to query too many elements with this method.
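A sketch of that stale-element situation (the class "btn" is a made-up example):

```js
// A static NodeList keeps whatever matched at query time.
const buttons = document.querySelectorAll('.btn');

buttons[0]?.remove(); // detach the first match from the DOM

// The NodeList still holds it, so we can accidentally keep working with a detached node.
const first = buttons[0];
console.log(first?.isConnected); // false — it's no longer in the DOM tree
```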
[00:09:39]
>> Speaker 2: Could you clarify the difference between the time complexity of
querying versus the read cost?
>> Evgenii Ray: Yeah, okay. So if we go back to getElementsByTagName, for instance. The time complexity it takes to go through the elements, to find all the elements that we are looking for, is big O(N), because we need to go through all elements in the worst case.
[00:10:06]
But the read cost is also big O(N) because we maintain the live collection. And
when we maintain a live collection, this means the browser needs to verify that the
element exists in the DOM Tree. So every time you're trying to access this live
collection, the browser queries the elements again to verify that the element is
still present.
[00:10:30]
That's why it can result in that time complexity: in the worst case, we'll need to parse all elements again.
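To illustrate the distinction (the tag name here is arbitrary):

```js
const divs = document.getElementsByTagName('div'); // query: O(N) traversal (amortized by caching)

// Each read of a live collection may force the browser to re-verify it against the tree:
const count = divs.length; // read: up to O(N) again in the worst case
const first = divs[0];     // read: same story
console.log(count, first);
```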
>> Speaker 3: Is there a reason querySelectorAll returns a non-live element list compared to this one, for example, which returns the live one?
>> Evgenii Ray: So the live collection is actually a legacy one, it's not recommended for normal use cases.
[00:10:59]
And I guess querySelector returns a non-live collection because we needed to have some alternative. The live collection has more drawbacks and, if not used wisely, can lead to more issues. But in our application development, we don't really need to manage live collections, and for most developers, querySelector is the better way to go because it doesn't have the expensive read access.
[00:11:27]
>> Speaker 2: Why would querySelectorAll impact memory while returning references
to Nodes which already exist in the DOM?
>> Evgenii Ray: Not exactly, we actually return copies. Basically, the browser creates new objects with the same values, and those are the copies; and when you change the properties of these objects, it will impact the original object.
[00:11:56]
But we have two separate objects basically, with the same properties, and they're connected to each other through a proxy mechanism. So it's slightly different: when we use the live collection, we can actually reuse the real reference to the object instead.
>> Speaker 2: When you're saying that the cache would make things faster next time,
are you always sure about that because class names can be dynamic?
[00:12:28]
>> Evgenii Ray: So you can't control that, because it's up to the browser engine. The same query methods can behave differently in Safari or Chrome depending on how the developers optimized them. For instance, the Safari browser uses WebAssembly compilation for the native selectors, while Chrome doesn't use that, because they find that the performance of the existing querySelector is fine on average.
[00:12:56]
So you can't guarantee the same results, and it's up to the engine. The logic of caching for querySelector is quite complex, and yeah, no guarantee [LAUGH].