Tuesday, November 2, 2010

Java: URI vs URL

I was just writing some HTTP-related code using java.net.URL when I noticed that Apache HttpClient 4.0's API seems to want java.net.URI instances. "Why's that, I wonder?" The answer, it seems, is that Java's java.net.URL class is broken: its equals() method is blocking! It goes out on the network and does a DNS lookup of the hostname. This is very unfortunate, since in every other way that class is what I want.

From the Javadoc for java.net.URL#equals:

Two hosts are considered equivalent if both host names can be resolved
into the same IP addresses; else if either host name can't be
resolved, the host names must be equal without regard to case; or both
host names equal to null.

Since hosts comparison requires name resolution, this operation is a
blocking operation. (Emphasis mine)
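By contrast, java.net.URI#equals compares the parsed components as strings (scheme and host case-insensitively) and never touches the network, so URI instances are safe to use as keys in hash-based collections. A minimal sketch; the class name, method name, and example.com addresses are made up for illustration:

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

class UriEqualityDemo {
    // Counts distinct addresses using URI.equals/hashCode, which do no DNS lookups.
    public static int countDistinct(String... addresses) {
        Set<URI> seen = new HashSet<>();
        for (String address : addresses) {
            seen.add(URI.create(address)); // throws IllegalArgumentException on bad syntax
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // The first two differ only in scheme/host case; the third differs in path case.
        System.out.println(countDistinct(
                "http://example.com/a",
                "HTTP://EXAMPLE.com/a",
                "http://example.com/A"));
    }
}
```

The first two addresses collapse into one entry because scheme and host are compared without regard to case, while the path comparison stays case-sensitive. Note that this is purely textual: two different host names that happen to resolve to the same IP address are not considered equal.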

Good times. So, to avoid arbitrary thread "hanging" at some point down the road, I guess I'll use java.net.URI. Too bad these are all valid URIs, but nonsense in an HTTP context: "mailto:me@foo.com", "abc:123", "quux"
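One way to live with that is to check up front that a URI is actually usable for HTTP. Here is a rough sketch of such a check; the class and method names are hypothetical, and the rule it applies (http or https scheme plus a non-null host) is my own assumption about what "makes sense in an HTTP context", not anything HttpClient prescribes:

```java
import java.net.URI;
import java.net.URISyntaxException;

class HttpUriCheck {
    // Returns true if the string parses as an http(s) URI that names a host.
    public static boolean isHttpUri(String candidate) {
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme(); // null for relative URIs such as "quux"
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // not even a syntactically valid URI
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://test.com/bar", "mailto:me@foo.com", "abc:123", "quux"}) {
            System.out.println(s + " -> " + isHttpUri(s));
        }
    }
}
```

Of the four strings in main, only "http://test.com/bar" passes the check; the three "nonsense" examples above are rejected.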

This raises the question: what precisely is the difference between a URI and a URL? There's tons written on this (Google it), but I'll add my semi-informed $0.02 as well:

URI: an identifier (name) for a resource. Doesn't necessarily say anything about how to locate the identified resource, but sometimes does. e.g. "/foo", "http://test.com/bar", "x:y:z/a/b/c"

URL: a URI that MUST include how to locate the resource. i.e. it starts with "http", "https", "ftp", etc. e.g. "http://www.google.com", "https://bank.com", "http://abc.com/foo/bar/baz.html"

So, URI is very general, and URLs are a specialization of URIs. There is another subset of URI called URN that adds even more complexity, so I'm going to mostly ignore that here. I'll just paraphrase from the SO link below and say that URNs are supposed to be a unique name (over time and space) for a resource, and they say nothing about locating said resource.
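For what it's worth, java.net.URI draws its own lines a bit differently: it distinguishes absolute URIs (those with a scheme) from relative ones, and hierarchical URIs from opaque ones (absolute URIs whose scheme-specific part does not begin with a slash). A small sketch that sorts some of the example strings from this post into those buckets; the class name, method name, and label strings are mine:

```java
import java.net.URI;

class UriKinds {
    // Labels a URI string using java.net.URI's own absolute/opaque distinctions.
    public static String describe(String s) {
        URI uri = URI.create(s);
        if (!uri.isAbsolute()) {
            return "relative"; // no scheme, e.g. "/foo" or "quux"
        }
        return uri.isOpaque() ? "absolute, opaque" : "absolute, hierarchical";
    }

    public static void main(String[] args) {
        String[] examples = {"/foo", "http://test.com/bar", "x:y:z/a/b/c", "mailto:me@foo.com", "quux"};
        for (String s : examples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```

By this classification, "http://test.com/bar" is absolute and hierarchical, "mailto:me@foo.com" and "x:y:z/a/b/c" are absolute and opaque, and "/foo" and "quux" are relative.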

References

1 comment:

  1. A URL is a URI that is hierarchical and absolute. That's all there is to it. URIs can also be opaque instead of hierarchical, such as in "mailto:me@foo.com" and "bitcoin:1HB5XMLmzFVj8ALj6mfBsbifRoD4miY36v". (All opaque URIs are absolute.) Hierarchical URIs can also be relative instead of absolute, such as in "/foo" and "../bar" and "baz". URNs are simply URIs that are opaque, use the "urn" scheme, and follow a constrained syntax in the scheme-specific part.
