Path and Positional Encoding In a Binary Tree

How to uniquely identify a node of a binary tree? One method is to record the path to reach it from the root. The advantage of this mehtod is the path can be easily generated when traversing down to that node. Additionally, the recorded path also gives the information about how to reach that node again. It’s quite straightforward to come up with an efficient implementation: We use a string of bits to encode the path, where 0’s represents the left branch and 1’s represents the right branch. For example, the node in the diagram below is marked as ‘10’, or 2 if we use an integer to save it.

Interestingly, counting from zero, this node is also the 2nd one in its row. This is not a coincidence, and you can verify it yourself to see this rule applies to every node in the tree. Formally speaking, “the rule” is the integer representation of the path generated by the method described above is also the index (starts from zero) of the node in its row if the tree is complete.

This result is not too surprising if you interpret the index of the node as the number of nodes at its left. Let n be the layer of the node counting from the bottom, then at each layer, going right would add 2^n-1 nodes to the left, and going left adds none.

This can be a quite useful result, as once the location of a node is specified by its row and index, we can immediately figure out how to reach it by simple bit manipulation. If we want to reach the node at the nth row from the top, the c++ code to do it would be:

Node* reach(Node* current, uint32_t path, uint32_t n){
    if(n == 0) return current;
    uint32_t mask = 1 << (n - 1);
    int dir = path & mask;
    return reach(dir == 0 ? current->left : current->right, path, n - 1);
}

The path in the snippet above can just be the index of that node in its row, or it can be generate with something like this:

uint32_t find(TreeNode* current, TreeNode* target, uint32_t path) {
    if(current == target) return path;
    if(current == nullptr) return 0;
    path = path << 1;
    uint32_t res = findLast(current->left, target, path);
    res += findLast(current->right, target, path | 1);
    return res;
}

I’m delighted to see how simple and efficient solutions to many problems can be made using this result. It’s also quite universal as the index is counted in a complete binary tree, so for any tree with vacancies the index is still valid after node insertion. But the most fascinating part for me is the intrinsic relation between binary trees and binary numbers it revealed, which somehow demonstrates a mathematical beauty.