DOM Traversal for Fun and Profit

During my time writing funny words in an IDE to make the computer do what I want, I dabbled in a little web scraping for cash.

I kept forgetting how to target certain parts of the page that I wanted to scrape and organise within my program.

So below, I'm putting together a few notes to share with my future self and you :)

Let's start with a little boilerplate HTML that we can work with.

<div class="grandparent" id="grandparent-id">
<!-- top level grandparent -->
    <div class="parent"> <!-- first parent -->
        <div class="child" id="child-one"></div> <!-- child 1 -->
        <div class="child"></div> <!-- child 2 -->
    </div>
    <div class="parent"> <!-- second parent -->
        <div class="child"></div> <!-- child 3 -->
        <div class="child" id="child-four"></div> <!-- child 4 -->
    </div>
</div>

Get Element by ID

An ID should be unique within a page, so the method name is singular: getElementById returns one element, or null if no element has that ID.

const grandparent = document.getElementById("grandparent-id")
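
When scraping, a missing ID is easy to overlook because getElementById quietly returns null. Here's a minimal sketch of a guard around it. The helper name requireById is made up for these notes, and it takes the document as a parameter so it can also be tried outside the browser.

```javascript
// Hypothetical helper: like getElementById, but fails loudly instead of returning null.
function requireById(doc, id) {
  const element = doc.getElementById(id)
  if (element === null) {
    throw new Error(`No element with id "${id}"`)
  }
  return element
}

// In the browser: const grandparent = requireById(document, "grandparent-id")
```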

Get Elements by Class Name

Calling getElementsByClassName (plural) returns an HTMLCollection of every element with that class (both the parents in the HTML above). An HTMLCollection is array-like but not a real Array, so calling Array methods such as map or filter on it throws a TypeError.

[Image: htmlcollectionproto.png]

We can get around this by copying the returned collection into a real array with Array.from. After that, the usual array methods work on it.

const parents = Array.from(document.getElementsByClassName("parent"))
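
To see the difference without opening a browser, here's a small sketch. The helper name classNamesOf is made up for these notes, and the plain object at the bottom is only a stand-in for a real HTMLCollection (it has numeric keys and a length, which is all Array.from needs).

```javascript
// Hypothetical helper: turn any array-like of elements into an array of class names.
function classNamesOf(collection) {
  // Array.from copies the array-like into a real Array; the optional second
  // argument maps each item as it is copied.
  return Array.from(collection, (el) => el.className)
}

// Stand-in for document.getElementsByClassName("parent"), for illustration only.
const fakeCollection = {
  0: { className: "parent" },
  1: { className: "parent" },
  length: 2,
}

console.log(typeof fakeCollection.map) // "undefined": no Array methods on the array-like
console.log(classNamesOf(fakeCollection)) // an array containing "parent" twice
```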

Query Selector

This gives us a single element (the first match in document order) by targeting the DOM with a CSS selector. If nothing matches, it returns null.

const byId = document.querySelector("#grandparent-id") // id
const byClass = document.querySelector(".grandparent") // class

Query Selector All

Similar to Get Elements by Class Name, this gives all the elements that match our query. However, it returns a static NodeList, which has forEach built in. For other Array methods such as map or filter, wrap it in Array.from first.

[Image: nodelistproto.png]

const matchesById = document.querySelectorAll("#grandparent-id") // id
const matchesByClass = document.querySelectorAll(".grandparent") // class
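
A NodeList only gets us as far as forEach, so for scraping I usually convert it straight away. Here's a rough sketch: the helper name idsMatching is hypothetical, and it takes the root to search as a parameter (document, or any element).

```javascript
// Hypothetical helper: ids of every element under `root` that matches `selector`.
function idsMatching(root, selector) {
  const matches = root.querySelectorAll(selector) // a NodeList in the browser
  return Array.from(matches) // real Array, so map and filter are available
    .map((el) => el.id)
    .filter((id) => id !== "") // elements without an id attribute report ""
}

// In the browser, against the HTML above:
// idsMatching(document, ".child") gives ["child-one", "child-four"]
```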

Selecting Child Element

First, we want to target the top grandparent node. From there we can grab all of the children underneath.

querySelector hands us a single Element (and querySelectorAll would give us a NodeList), but the children property on that element returns an HTMLCollection!! Annoying.

So we'll need to create an Array from the returned children.

const grandparent = document.querySelector(".grandparent")
const parents = Array.from(grandparent.children)
const parentOne = parents[0] // etc

We can also drill down into the parent's children (another HTMLCollection):

const children = parentOne.children
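
Drilling down one level at a time gets tedious, so here's a sketch that keeps going until it runs out of children. The helper name descendantsOf is made up for these notes. It only relies on the children property, wrapped in Array.from as above.

```javascript
// Hypothetical helper: every element below `root`, depth first, in document order.
function descendantsOf(root) {
  const found = []
  for (const child of Array.from(root.children)) {
    // Add the child itself, then everything underneath it.
    found.push(child, ...descendantsOf(child))
  }
  return found
}

// In the browser, against the HTML above:
// descendantsOf(document.querySelector(".grandparent")).length is 6
// (two parents plus four children; comments are not elements)
```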

Selecting Parent Element

Going the other way, once we have a child we can step up one level to its parent with the parentElement property.

const childFour = document.querySelector("#child-four")
const parent = childFour.parentElement
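
The property that steps up one level is parentElement, and it is null once you run out of elements. Here's a small sketch that follows it all the way up; the helper name ancestorsOf is made up for these notes.

```javascript
// Hypothetical helper: every ancestor of `element`, nearest first.
function ancestorsOf(element) {
  const chain = []
  let current = element.parentElement
  while (current) {
    chain.push(current)
    current = current.parentElement // null at the top, which ends the loop
  }
  return chain
}

// In the browser, against the HTML above:
// ancestorsOf(document.querySelector("#child-four")) starts with the second
// parent, then the grandparent, then whatever elements wrap the snippet.
```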

Selecting Closest Grandparent Element

closest works much like querySelector, but instead of going down the DOM it moves upwards.

It takes a CSS selector string and returns the nearest match, checking the element itself first and then each ancestor in turn. If nothing matches, it returns null.

const childFour = document.querySelector("#child-four")
const grandparent = childFour.closest(".grandparent")
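
To make the upward walk concrete, here's a rough re-implementation using matches and parentElement. It's only an illustration of the idea (the name closestByWalk is made up); in real code, use the built-in closest.

```javascript
// Illustrative sketch only: roughly what element.closest(selector) does.
function closestByWalk(element, selector) {
  // Start at the element itself, then climb one parent at a time.
  for (let current = element; current; current = current.parentElement) {
    if (current.matches(selector)) return current
  }
  return null // ran out of ancestors without a match
}

// In the browser: closestByWalk(childFour, ".grandparent")
```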

Skipping DOWN half the DOM

We can call querySelector on an element we've already captured, not just on document. The search is then scoped to that element's descendants, so we can go straight to the child level and skip the parents.

const grandparent = document.querySelector(".grandparent")
const childOne = grandparent.querySelector(".child")

Selecting Siblings Previous + Next

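nextElementSibling gets the next element along from where you currently are, and previousElementSibling gets the one before. Instead of going up and down, it's like we're going sideways through the DOM. Both return null when there is no sibling in that direction.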

const childOne = document.querySelector("#child-one")
const childTwo = childOne.nextElementSibling

const childFour = document.querySelector("#child-four")
const childThree = childFour.previousElementSibling
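
Going sideways one step at a time can be looped too. Here's a sketch that collects every sibling of an element, relying on both properties being null at the ends; the helper name elementSiblingsOf is made up for these notes.

```javascript
// Hypothetical helper: all element siblings of `element`, in document order.
function elementSiblingsOf(element) {
  const before = []
  for (let s = element.previousElementSibling; s; s = s.previousElementSibling) {
    before.unshift(s) // walking backwards, so add to the front to keep order
  }
  const after = []
  for (let s = element.nextElementSibling; s; s = s.nextElementSibling) {
    after.push(s)
  }
  return [...before, ...after]
}

// In the browser, against the HTML above:
// elementSiblingsOf(document.querySelector("#child-one")) holds just child 2
```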