The data URL scheme is defined by RFC 2397, which unfortunately is vague regarding many details of the syntax.
This document describes a more precise parsing algorithm for data URLs.
See also Bug 19494 on the W3C Bugzilla and the related material linked from there.
This algorithm returns either a failure or two byte strings: a MIME type with parameters (as it would appear in a Content-Type HTTP header) and the decoded data.
To obtain a resource from a URL with the "data" scheme, run these steps:
Let input be the URL's scheme data. If the URL's query is non-null, append "?" and the query to input.
The comma can come either from the scheme data or the query.
What if the comma is inside a MIME quoted string in a parameter value?
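The input assembly and comma split above can be sketched in Python. This is a minimal sketch, not the normative algorithm: the function name `split_data_url_input` is hypothetical, the URL's scheme data and query are assumed to be already-parsed Python strings, and `None` stands for failure.

```python
from typing import Optional, Tuple


def split_data_url_input(scheme_data: str, query: Optional[str]) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: rebuild input, then split it at the first comma.

    Returns (mime_type_part, body_part), or None for failure.
    """
    # A non-null query is part of the payload, so re-attach it with "?".
    text = scheme_data if query is None else scheme_data + "?" + query
    # The first comma separates the MIME type from the data; it may come
    # from either the scheme data or the query. Quoted strings are NOT
    # treated specially in this sketch.
    mime_type, comma, body = text.partition(",")
    if not comma:
        return None  # no comma at all: failure
    return mime_type, body
```

With scheme data `text/plain` and query `a=b,hi`, the MIME type part becomes `text/plain?a=b` and the body `hi`. Because the split ignores quoting, a comma inside a quoted parameter value ends the MIME type early; whether that is acceptable is exactly the open question above.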
Match how strictly? Case-sensitively or not? Allow whitespace? Percent-encoding?
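Assuming these questions concern recognizing the `;base64` marker at the end of the MIME type part, one possible policy, for illustration only, is: match a literal `;base64` at the very end, ASCII case-insensitively, with no whitespace allowed and no percent-decoding. The helper name `strip_base64_suffix` and every policy choice in it are assumptions, not decisions made by this document.

```python
from typing import Tuple

_SUFFIX = ";base64"


def strip_base64_suffix(mime_type: str) -> Tuple[str, bool]:
    """Hypothetical policy: exact ";base64" suffix, ASCII case-insensitive,
    no surrounding whitespace, no percent-decoding.

    Returns (mime_type_without_marker, is_base64).
    """
    tail = mime_type[-len(_SUFFIX):]
    # isascii() guards against non-ASCII characters whose lower() is ASCII.
    if len(mime_type) >= len(_SUFFIX) and tail.isascii() and tail.lower() == _SUFFIX:
        return mime_type[: -len(_SUFFIX)], True
    return mime_type, False
```

Under this policy `text/plain;BASE64` is treated as base64, while `text/plain;base64 ` (trailing space) and `text/plain;base%364` are not.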
Return failure on "invalid" base64? What counts as invalid? Also accept the URL and Filename Safe Alphabet? Mixed alphabets in the same body? Which non-alphabet bytes should be ignored? Missing, insufficient, or excess padding?
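The base64 questions could be answered by a policy such as the following sketch (hypothetical name `lenient_base64_decode`): ignore ASCII whitespace, accept only the standard alphabet (so the URL and Filename Safe Alphabet, and therefore mixed alphabets, are rejected), tolerate missing padding, reject excess padding, and treat anything else as failure. It relies on Python's `base64.b64decode(..., validate=True)`, which raises `binascii.Error` for bytes outside the standard alphabet.

```python
import base64
import binascii
from typing import Optional

# ASCII whitespace: TAB, LF, FF, CR, SPACE.
_ASCII_WHITESPACE = b"\t\n\x0c\r "


def lenient_base64_decode(data: bytes) -> Optional[bytes]:
    """Decode base64 under one hypothetical policy; None means failure."""
    # Policy choice 1: ignore ASCII whitespace, and nothing else.
    stripped = bytes(b for b in data if b not in _ASCII_WHITESPACE)
    # Policy choice 2: a length of 1 mod 4 can never encode whole bytes
    # (this also rejects some cases of excess padding).
    if len(stripped) % 4 == 1:
        return None
    # Policy choice 3: tolerate missing padding by adding it back.
    stripped += b"=" * (-len(stripped) % 4)
    try:
        # validate=True rejects bytes outside the standard alphabet,
        # including "-" and "_" from the URL-safe alphabet.
        return base64.b64decode(stripped, validate=True)
    except binascii.Error:
        return None
```

Other policies, such as also accepting `-` and `_`, are equally plausible; the sketch only shows that each question above needs an explicit answer.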
TODO: The algorithm is missing this part of RFC 2397:
If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII. As a shorthand, "text/plain" can be omitted but the charset parameter supplied.
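The missing defaults could be applied once the `;base64` marker has been handled, roughly as in this sketch. The name `apply_default_mime_type` is hypothetical, and it covers only the two cases in the RFC 2397 text quoted above: an empty MIME type becomes `text/plain;charset=US-ASCII`, and one that starts with `;` (parameters only) gets `text/plain` prepended. It adds no default charset when other parameters are present without one, a case the quoted text does not address.

```python
def apply_default_mime_type(mime_type: str) -> str:
    """Hypothetical helper applying the RFC 2397 defaults quoted above."""
    if mime_type == "":
        # <mediatype> omitted entirely.
        return "text/plain;charset=US-ASCII"
    if mime_type.startswith(";"):
        # Shorthand: "text/plain" omitted, parameters (e.g. charset) supplied.
        return "text/plain" + mime_type
    return mime_type
```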
This definition does not impose any length limit on data: URLs.
When doing URL parsing followed by this algorithm, implementations are allowed to skip some intermediate steps in order to process large URLs efficiently, as long as the "black box" behavior is the same.