Digging Through Internet History

By hernil

February 16, 2024

note: This post was originally written on an internal company Slack. Some tweaks have been made.

From time to time I wonder about weird things that require me to dig around a bit to get to the bottom of something. Today’s shower thought (actually from quite a while back, but I never bothered to track down an answer): Why is the host part of a URL (or URI) read from most to least specific? Especially when the path part right after it is the opposite, being read from least to most specific!

Intuitively (or at least from a file system way of thinking), the path way of doing it, defining a root and drilling down to a leaf, seems more reasonable. Being consistent would result in .no.yvn.devblog/posts/digging-through-internet-history/ ¹

Reading this out makes perfect sense, as we simply start at the root and drill down to no, then yvn, and all the way down to whatever client is hosting devblog (which could be several more steps down), and then continue drilling down a specific path on that host. This “least to most specific” way of writing is also used in a few programming languages to define packages or namespaces, in addition to all file systems (that I know of at least). So why is this not how we navigate the Internet all day every day?

Looking at RFC3986 for URIs ² we see that they’re actually not that fussed about which way one chooses to look up a host (they even mention the yellow pages!) but they do admit that

The most common name registry mechanism is the Domain Name System (DNS).

By that logic one deduces that we could very well refer to the host from least to most specific, as we did above, in a theoretical hernil Domain Name System (hDNS for short). RFC1738 for URLs is more specific though ³, and in this case DNS is the way to go. This brings us back to the year 1987 and RFC1034, which together with RFC1035 defines DNS as we mostly know it today. Some RFC authors are better storytellers than I, at least, gave them credit for, so I’ll let the RFC do the talking:

Host name to address mappings were maintained by the Network Information Center (NIC) in a single file (HOSTS.TXT) which was FTPed by all hosts [RFC-952, RFC-953]. The total network bandwidth consumed in distributing a new version by this scheme is proportional to the square of the number of hosts in the network, and even when multiple levels of FTP are used, the outgoing FTP load on the NIC host is considerable. Explosive growth in the number of hosts didn’t bode well for the future.

So yeah, I guess whatever problems I have with the DNS syntax, I should at least be grateful that we’re not updating a single hosts file for the entire Internet. Anyways, a few minutes of reading later, our journey seemingly ends on the following paragraph ⁴:

The domain name of a node is the list of the labels on the path from the node to the root of the tree. By convention, the labels that compose a domain name are printed or read left to right, from the most specific (lowest, farthest from the root) to the least specific (highest, closest to the root).
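To make the two orderings concrete, here is a minimal Python sketch that takes a URL and rewrites its host from the root down, the way the path already reads. The function name root_first is made up for illustration, and the input URL is an assumption reconstructed from the reversed example earlier in the post. The leading dot stands for the root, as the first footnote explains.

```python
from urllib.parse import urlsplit


def root_first(url: str) -> str:
    """Rewrite a URL so the host reads from least to most specific.

    Hypothetical helper for illustration only; nothing on the real
    Internet resolves names written this way.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # A fully qualified name may end in "." (the root); drop it before
    # splitting so we do not get an empty label.
    labels = [label for label in host.rstrip(".").split(".") if label]
    # Start at the root (".") and drill down to the most specific label.
    reversed_host = "." + ".".join(reversed(labels))
    return reversed_host + parts.path


# Assumed URL, inferred from the reversed form used earlier in the post.
print(root_first("https://devblog.yvn.no/posts/digging-through-internet-history/"))
```

The scheme, port, query and fragment are dropped on purpose to keep the sketch short; only the host and path matter for the comparison.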

The answer to why we order URLs this way simply turns out to be “by convention”. Probably from a time when literally all significant clients on the Internet fit in a single file that used to be passed around when needed. Like a lot of tech history, it’s fun to see how some decisions follow us to this day, and while some of them might not be perfect in hindsight - more often than not it’s quite incredible how a few key engineers laid groundwork so solid that it could sustain the unimaginable growth that the Internet has seen since its beginning. I don’t know what I was expecting, maybe some sort of convincing argument for why this order is the only one that makes sense, or maybe that the whole thing was a big misunderstanding - but I guess “by convention” is as good a reason as any sometimes. Anyways, sometimes digging another step or three can be interesting - so do more of that!

¹ Fun fact, the prepended . is not a typo. A fully qualified domain name should have a trailing . which isn’t really trailing as the root domain is null. Turns out we’re being utter savages just writing partially qualified domain names all day long in our browser!
² https://datatracker.ietf.org/doc/html/rfc3986#section-3.2.2
³ https://datatracker.ietf.org/doc/html/rfc1738#section-3.1
⁴ https://datatracker.ietf.org/doc/html/rfc1034#section-3.1
