The Danger of URLComponents

Bug

I fixed an interesting bug recently: for URLs in the format of "http://apple.com/ios;v=14"[1], the web pages loaded in the browser were blank.

When I pasted such a URL into the browser directly, it loaded correctly. But when I tapped it in the app, the page failed to load in the browser, and I saw a 404 error.

Code

What's the output of this code?

import Foundation

let url = URL(string: "http://apple.com/ios;v=14")!
// Rebuild the same URL through URLComponents without changing anything.
let urlComponents = URLComponents(url: url, resolvingAgainstBaseURL: false)!
if url == urlComponents.url! {
    print("good")
} else {
    print("broken")
}

It prints "broken".

Analysis

Why is it broken?

In urlComponents.url, the ; character is percent-encoded: "http://apple.com/ios%3Bv=14".

If ; were a normal (unreserved) character, this would usually be fine. As RFC 3986 puts it:

URIs that differ only by whether an unreserved character is percent-encoded or appears literally are equivalent by definition, but URI processors, in practice, may not always recognize this equivalence.

However, ; is a reserved character, and within a path it has a special meaning:

URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same.

Note that path parameters and query parameters are different things: ;v=14 belongs to the path segment ios, while query parameters come after a ?.
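To make the path-versus-query distinction concrete, here is a minimal sketch of how a server might read a segment like "ios;v=14". The helper splitPathParameters is hypothetical (not a Foundation API) and assumes the common "name;key=value" convention the RFC describes; RFC 3986 itself does not mandate any particular parameter syntax.

```swift
import Foundation

// Hypothetical helper (not a Foundation API): split one path segment such as
// "ios;v=14" into its name and its ";"-delimited "key=value" parameters.
// This follows a common convention; RFC 3986 does not mandate it.
func splitPathParameters(_ segment: String) -> (name: String, parameters: [String: String]) {
    // With omittingEmptySubsequences: false, split always yields at least one element.
    let parts = segment.split(separator: ";", omittingEmptySubsequences: false)
    let name = String(parts[0])
    var parameters: [String: String] = [:]
    for part in parts.dropFirst() where !part.isEmpty {
        // Split "key=value" at the first "="; a bare "key" gets an empty value.
        let pair = part.split(separator: "=", maxSplits: 1, omittingEmptySubsequences: false)
        parameters[String(pair[0])] = pair.count > 1 ? String(pair[1]) : ""
    }
    return (name: name, parameters: parameters)
}

let url = URL(string: "http://apple.com/ios;v=14")!
// There is no "?", so the URL has no query at all: "v=14" belongs to the path.
print(url.query ?? "no query")
print(splitPathParameters("ios;v=14"))  // name "ios", parameters ["v": "14"]
```

The same helper reads the RFC's "name;v=1.1" example as version 1.1 of "name"; a site using the "name,1.1" style would need a different parser.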

This matters because:

URIs that differ only by whether a reserved character is percent-encoded or appears literally are normally considered not equivalent (denoting the same resource) unless it can be determined that the reserved characters in question have no reserved purpose.

Hence, "http://apple.com/ios%3Bv=14" is not the same as "http://apple.com/ios;v=14".
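The equivalence rules can be sketched as RFC 3986's percent-encoding normalization: an escape may be decoded only when it stands for an unreserved character. This is a simplified illustration under that assumption (it skips case normalization of scheme, host and hex digits, and dot-segment removal); the function names are hypothetical.

```swift
import Foundation

// Sketch of percent-encoding normalization: decode "%XX" only when XX is an
// *unreserved* character. Escapes of reserved characters such as ";" (%3B)
// are left alone, because decoding them could change what the URI means.
func normalizeUnreservedEscapes(_ uri: String) -> String {
    let unreserved = Set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~".utf8)
    let bytes = Array(uri.utf8)
    var output: [UInt8] = []
    var i = 0
    while i < bytes.count {
        if bytes[i] == UInt8(ascii: "%"), i + 2 < bytes.count,
           let value = UInt8(String(decoding: bytes[(i + 1)...(i + 2)], as: UTF8.self), radix: 16),
           unreserved.contains(value) {
            output.append(value)  // e.g. "%7E" becomes "~"
            i += 3
        } else {
            output.append(bytes[i])  // everything else, including "%3B", is copied as-is
            i += 1
        }
    }
    return String(decoding: output, as: UTF8.self)
}

// Hypothetical check: two URI strings are treated as equivalent if they match after normalization.
func equivalentAfterNormalization(_ a: String, _ b: String) -> Bool {
    normalizeUnreservedEscapes(a) == normalizeUnreservedEscapes(b)
}

print(equivalentAfterNormalization("http://apple.com/%7Euser", "http://apple.com/~user"))        // true
print(equivalentAfterNormalization("http://apple.com/ios%3Bv=14", "http://apple.com/ios;v=14"))  // false
```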

This is the gist of the bug. The app uses URLComponents to process URLs before opening them in the browser. If a URL has path parameters, the processed URL no longer points to the same resource as the original URL, and most likely it points to nothing, hence the 404.
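If an app must keep a URLComponents step, one defensive option is to detect when a pure round trip alters the URL and fall back to the original. The sketch below is hypothetical, not the app's actual code, and it only makes sense when the components are not deliberately edited; if they are, you would have to compare the parts you did not touch instead.

```swift
import Foundation

// Hypothetical guard, not the app's actual code: round-trip a URL through
// URLComponents, but if the rebuilt URL no longer matches the original
// string (for example because ";" came back as "%3B"), keep the original.
func roundTrippedKeepingOriginalOnChange(_ url: URL) -> URL {
    guard let components = URLComponents(url: url, resolvingAgainstBaseURL: false),
          let rebuilt = components.url else {
        return url
    }
    if rebuilt.absoluteString != url.absoluteString {
        // Worth logging: the components round trip altered the URL.
        print("URLComponents changed \(url.absoluteString) to \(rebuilt.absoluteString)")
        return url
    }
    return rebuilt
}

let link = URL(string: "http://apple.com/ios;v=14")!
print(roundTrippedKeepingOriginalOnChange(link).absoluteString)
```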

Interestingly, Apple's URL APIs used to support path parameters, through a property called parameterString, but only on macOS.

Now parameterString is deprecated on NSURL, doesn't exist on URL, and isn't supported by URLComponents. I suspect that's exactly why URLComponents percent-encodes ;. In a path, ; only has a special purpose as a path-parameter delimiter; since URLComponents doesn't even have the concept of path parameters, ; loses its reserved purpose from its point of view. It is still a reserved character, though, and:

Reserved characters that have no reserved purpose in a particular context may also be percent-encoded but are not semantically different from those that are not.

Hey Apple, you are wrong: some websites still use path parameters!

BTW, the documentation of URLComponents says "This structure parses and constructs URLs according to RFC 3986." I don't think that's accurate.

Ironically,

  • the documentation of URLComponents.percentEncodedPath says "Although ‘;’ is a legal path character, it is recommended that it be percent-encoded for best compatibility with URL."
  • and the documentation of NSURLComponents.percentEncodedPath (why are these separate docs?) says "Although an unencoded semicolon is a valid character in a percent-encoded path, for compatibility with the NSURL class, you should always percent-encode it."
  • yet NSURL and URL don't actually require ; to be percent-encoded in a path. In other words, a plain ; in the path still results in a valid URL.
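A quick way to check the last point: both spellings parse as URLs, and URL's == treats them as different values. This short sketch only uses URL(string:) and ==, and assumes nothing beyond Foundation.

```swift
import Foundation

// Both spellings parse as valid URLs: Foundation does not force ";" to be escaped.
let plain = URL(string: "http://apple.com/ios;v=14")
let escaped = URL(string: "http://apple.com/ios%3Bv=14")
print(plain != nil, escaped != nil)  // true true

// URL equality is based on the URL's string form, so these are two different
// URLs, which is why the comparison in the Code section prints "broken".
print(plain == escaped)              // false
```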

Conclusion

To recap:

  • ; is not required to be percent-encoded in a path.
  • A percent-encoded ; is different from a plain ; in a path.

So URLComponents should not percent-encode ; in a path. Maybe we should blame URLPathAllowedCharacterSet.
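To see what URLPathAllowedCharacterSet (CharacterSet.urlPathAllowed in Swift) does with ;, the sketch below prints its answer rather than assuming it, because the result may vary across Foundation versions. It then contrasts that set with one built directly from RFC 3986's path grammar. The name rfc3986PathAllowed is hypothetical.

```swift
import Foundation

let path = "/ios;v=14"

// Foundation's predefined path set. Print instead of assuming, since its
// treatment of ";" is exactly what is in question.
print(CharacterSet.urlPathAllowed.contains(";"))
print(path.addingPercentEncoding(withAllowedCharacters: .urlPathAllowed) ?? "nil")

// A set built from RFC 3986's path grammar: pchar is unreserved / sub-delims /
// ":" / "@", plus "/" between segments. ";" is a sub-delim, so it stays literal.
let rfc3986PathAllowed = CharacterSet(charactersIn:
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~" +  // unreserved
    "!$&'()*+,;=" +                                                          // sub-delims
    ":@/")                                                                   // ":", "@", separator
print(path.addingPercentEncoding(withAllowedCharacters: rfc3986PathAllowed) ?? "nil")  // /ios;v=14
```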

What should we do?

I went back to composing the URL manually with URL and String, and I will think twice before using URLComponents in the future. I also filed a radar.
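Here is one way composing a URL by hand with URL and String might look. It is not the author's code. The helper composeURL is hypothetical and assumes the host is already a valid ASCII host name and that each path segment is raw, unencoded text. Each segment is percent-encoded with an RFC 3986 segment set that keeps ; and = literal.

```swift
import Foundation

// Hypothetical helper: compose "scheme://host/segment/..." from plain strings.
// Assumes `host` is already a valid ASCII host name and each segment is raw,
// unencoded text, so a literal "%" or " " gets percent-encoded but ";" does not.
func composeURL(scheme: String, host: String, segments: [String]) -> URL? {
    // Characters allowed literally in one segment (RFC 3986 pchar without
    // pct-encoded): unreserved, sub-delims (including ";" and "="), ":" and "@".
    let segmentAllowed = CharacterSet(charactersIn:
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~!$&'()*+,;=:@")
    var path = ""
    for segment in segments {
        guard let encoded = segment.addingPercentEncoding(withAllowedCharacters: segmentAllowed) else {
            return nil
        }
        path += "/" + encoded
    }
    return URL(string: "\(scheme)://\(host)\(path)")
}

print(composeURL(scheme: "http", host: "apple.com", segments: ["ios;v=14"])?.absoluteString ?? "nil")
```

Because "/" is not in the segment set, a slash inside a segment is encoded as %2F instead of silently creating a new segment.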


  1. Just an example; it doesn't point to a real web page.