URL Encoding Explained: What %20 Means and How Percent-Encoding Works
What Is URL Encoding?
URL encoding, formally called percent-encoding, is the mechanism that allows arbitrary data to be safely represented inside a Uniform Resource Locator (URL). Every byte that cannot appear literally in a URL is replaced with a percent sign (%) followed by exactly two hexadecimal digits representing that byte's value, so a character that needs several bytes becomes several such triplets.
The standard is defined in RFC 3986 (Uniform Resource Identifier: Generic Syntax), which superseded the older RFC 2396. RFC 3986 distinguishes three groups: unreserved characters, which may always appear without modification; reserved characters, which may appear unencoded only when they act as delimiters; and everything else, which must be percent-encoded before it can appear in a URL.
URLs were designed to be transmitted over protocols and software that might misinterpret certain bytes — early email gateways, for example, could corrupt bytes above 0x7F. Even today, a space in a URL would be ambiguous to a parser trying to split an HTTP request line. Percent-encoding solves this by reducing all potentially problematic values to safe ASCII sequences.
Why Special Characters Need Encoding
A URL is a structured string, and many of its characters carry specific syntactic meaning:
- ? separates the path from the query string.
- & separates individual query parameters.
- = separates parameter names from their values.
- # begins a fragment identifier.
- / separates path segments.
- : separates the scheme from the authority (https://).
If any of these characters appear as data — for instance, a search query that literally contains an ampersand — a URL parser will misread them as structure. Consider the URL:
https://example.com/search?q=salt&pepper&lang=en
A URL parser reads this as three parameters: q=salt, pepper (no value), and lang=en. The intended query was salt&pepper. By encoding the ampersand, the intent becomes unambiguous:
https://example.com/search?q=salt%26pepper&lang=en
Spaces present a similar problem. A space in an HTTP request line would cause the server to split the URL at the wrong position. Percent-encoding converts a space to %20, keeping the URL intact.
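The ampersand example above can be checked with Python's standard urllib.parse module. This is a minimal sketch; the helper name query_pairs is an assumption made for illustration and is not part of any library.

```python
from urllib.parse import urlsplit, parse_qsl

def query_pairs(url: str) -> list[tuple[str, str]]:
    """Return the decoded (name, value) pairs from a URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters that have no value, such as "pepper"
    return parse_qsl(query, keep_blank_values=True)

raw = query_pairs("https://example.com/search?q=salt&pepper&lang=en")
encoded = query_pairs("https://example.com/search?q=salt%26pepper&lang=en")

print(raw)      # [('q', 'salt'), ('pepper', ''), ('lang', 'en')]
print(encoded)  # [('q', 'salt&pepper'), ('lang', 'en')]
```

Only the encoded form hands the parser the value the user actually typed.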
How Percent-Encoding Works
The algorithm is straightforward:
- Take the character you want to encode.
- Determine its byte representation in UTF-8.
- Express each byte as two uppercase hexadecimal digits.
- Prefix each pair with a % sign.
Let's walk through the space character as a concrete example. A space is Unicode code point U+0020. In UTF-8, which uses a single byte for every code point below U+0080, a space is byte value 0x20, which equals 32 in decimal. The two-digit hexadecimal representation is 20, so the percent-encoded form is %20.
A slightly more complex example: the forward slash / is U+002F, UTF-8 byte 0x2F, encoded as %2F. And the at-sign @ is U+0040, UTF-8 byte 0x40, encoded as %40.
Key rule: Percent-encoding operates on bytes, not on Unicode code points directly. For ASCII characters (U+0000 through U+007F) the two representations are identical. For characters above U+007F, you must first encode to UTF-8 bytes, then percent-encode each byte separately.
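The steps above can be sketched in a few lines of Python. This is an illustrative implementation, not a replacement for a library encoder; the names UNRESERVED and percent_encode are assumptions, and the sketch encodes every character outside the RFC 3986 unreserved set.

```python
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every character outside the RFC 3986 unreserved set."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Encode the character to UTF-8, then emit one %XX triplet per byte
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)

print(percent_encode(" "))     # %20
print(percent_encode("/"))     # %2F
print(percent_encode("@"))     # %40
print(percent_encode("café"))  # caf%C3%A9
```

The last call shows the key rule in action: é is a single code point but two UTF-8 bytes, so it yields two triplets.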
Common URL-Encoded Characters
The table below lists the most frequently encountered characters that require encoding, along with the standard percent-encoded form and the reason each character is reserved or unsafe.
| Character | Encoded | Why Reserved or Unsafe |
|---|---|---|
| Space | %20 | Splits HTTP request line; not permitted in URLs |
| ! | %21 | Sub-delimiter; encode when used as data |
| # | %23 | Begins fragment identifier |
| $ | %24 | Sub-delimiter; encode when used as data |
| % | %25 | Percent sign itself; must be encoded to avoid confusion with encoding sequences |
| & | %26 | Separates query parameters |
| ' | %27 | Sub-delimiter; may confuse HTML attribute parsers |
| ( | %28 | Sub-delimiter; encode when used as data |
| ) | %29 | Sub-delimiter; encode when used as data |
| * | %2A | Sub-delimiter; wildcard in some systems |
| + | %2B | Represents space in form-encoded data; ambiguous in query strings |
| , | %2C | Sub-delimiter; encode when used as data |
| / | %2F | Path segment separator |
| : | %3A | Separates scheme from authority; port delimiter |
| ; | %3B | Sub-delimiter; path parameter separator in some frameworks |
| = | %3D | Separates query parameter names from values |
| ? | %3F | Begins query string |
| @ | %40 | Separates userinfo from host in authority |
| [ | %5B | IPv6 address delimiter; not allowed in paths |
| ] | %5D | IPv6 address delimiter; not allowed in paths |
Reserved vs Unreserved Characters
RFC 3986 divides URL characters into two broad categories:
Unreserved Characters
These characters are safe to appear in a URL without encoding and carry no special syntactic meaning. They are:
- Uppercase letters: A–Z
- Lowercase letters: a–z
- Digits: 0–9
- Four punctuation marks: - _ . ~
You should never percent-encode these characters. Encoding an unreserved character (e.g., writing %41 instead of A) is technically valid and produces an equivalent URL, but it wastes space and can confuse string-comparison logic.
Reserved Characters
Reserved characters have defined syntactic roles in URL structure. They are split into two groups:
- General delimiters: : / ? # [ ] @
- Sub-delimiters: ! $ & ' ( ) * + , ; =
Reserved characters may appear unencoded in a URL when they are serving their structural role. When they appear as data — inside a path segment or query value — they must be percent-encoded to prevent misinterpretation.
Every other character — including all non-ASCII bytes — must always be percent-encoded.
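As a rough illustration of these categories, the sketch below sorts single characters into the RFC 3986 groups. The function name classify and the returned labels are assumptions chosen for this example.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

def classify(ch: str) -> str:
    """Name the RFC 3986 category a single character falls into."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in GEN_DELIMS:
        return "gen-delim"
    if ch in SUB_DELIMS:
        return "sub-delim"
    if ch == "%":
        return "percent"  # only valid as the start of a %XX triplet
    return "must be percent-encoded"

for ch in ["a", "~", "/", "&", " ", "é"]:
    print(repr(ch), "->", classify(ch))
```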
Query String Encoding
The query string is arguably the most encoding-sensitive part of a URL. A typical query string looks like this:
https://example.com/search?q=hello+world&category=food+%26+drink&page=2
Several encoding rules apply specifically here:
- Parameter names and values must encode &, =, +, #, and % when those characters are part of the data.
- In the application/x-www-form-urlencoded scheme (HTML form GET submissions), spaces may be encoded as + rather than %20. This is a legacy rule specific to form data.
- When constructing query strings programmatically, always encode values individually before joining them with &. Never encode the full assembled query string in one pass, because that also encodes the & and = delimiters that are supposed to stay structural.
In JavaScript, a correct assembly pattern looks like:
const params = new URLSearchParams({
  q: 'salt & pepper',
  category: 'food & drink',
  page: '2'
});
params.toString();
// → 'q=salt+%26+pepper&category=food+%26+drink&page=2'
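The rule about encoding values individually can be demonstrated with Python's urllib.parse.quote(). The following is a small sketch; the parameter names and values are made up for the example.

```python
from urllib.parse import quote

params = {"q": "salt & pepper", "page": "2"}

# Right: encode each name and value on its own, then join with the delimiters
correct = "&".join(
    f"{quote(name, safe='')}={quote(value, safe='')}"
    for name, value in params.items()
)

# Wrong: assemble first, then encode the whole string in one pass
assembled = "&".join(f"{name}={value}" for name, value in params.items())
broken = quote(assembled, safe="")

print(correct)  # q=salt%20%26%20pepper&page=2
print(broken)   # q%3Dsalt%20%26%20pepper%26page%3D2
```

In the broken result the structural = and & have been encoded along with the data, so a parser would see one long parameter name and no values.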
URL Encoding in JavaScript
JavaScript provides two built-in global functions for URL encoding, and the distinction between them is important.
encodeURI()
encodeURI() is designed to encode a complete URL. It leaves unencoded the characters that can form legal URL structure: the unreserved characters and almost all reserved delimiters. The exception is the square brackets [ and ], which it encodes as %5B and %5D. In practice, the characters it leaves alone are:
A–Z a–z 0–9 - _ . ! ~ * ' ( ) ; : @ & = + $ , / ? #
Use encodeURI() when you have a complete URL that might contain non-ASCII characters or spaces but whose structural components (https://, /path, ?q=) are already correct.
encodeURI('https://example.com/my page?q=hello world');
// → 'https://example.com/my%20page?q=hello%20world'
encodeURIComponent()
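encodeURIComponent() is designed to encode a single component of a URL, most commonly a query parameter value or a path segment. It encodes everything except letters, digits, and a short list of punctuation marks (RFC 3986 classes ! * ' ( ) as reserved sub-delimiters, but this function still leaves them unencoded):
A–Z a–z 0–9 - _ . ! ~ * ' ( )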
Critically, it does encode &, =, +, /, ?, and # — all the characters that would break URL structure if they appeared in a value without encoding.
const value = 'salt & pepper / spice';
encodeURIComponent(value);
// → 'salt%20%26%20pepper%20%2F%20spice'
// Building a safe query string:
const url = `https://example.com/search?q=${encodeURIComponent('salt & pepper')}`;
// → 'https://example.com/search?q=salt%20%26%20pepper'
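For readers who work in both languages, the difference between the two functions can be approximated in Python by changing the safe argument of urllib.parse.quote(). The helper names encode_uri_component and encode_uri are hypothetical, and the sketch assumes well-formed input strings; it is a rough analogue rather than an exact port.

```python
from urllib.parse import quote

# Punctuation that JavaScript leaves unencoded beyond what quote() always keeps
# (quote() never encodes letters, digits, "-", "_", "." or "~")
COMPONENT_SAFE = "!*'()"
URI_SAFE = COMPONENT_SAFE + ";/?:@&=+$,#"

def encode_uri_component(value: str) -> str:
    """Hypothetical helper approximating JavaScript's encodeURIComponent()."""
    return quote(value, safe=COMPONENT_SAFE)

def encode_uri(url: str) -> str:
    """Hypothetical helper approximating JavaScript's encodeURI()."""
    return quote(url, safe=URI_SAFE)

print(encode_uri_component("salt & pepper / spice"))
# salt%20%26%20pepper%20%2F%20spice
print(encode_uri("https://example.com/my page?q=hello world"))
# https://example.com/my%20page?q=hello%20world
```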
Decoding in JavaScript
The corresponding decode functions are decodeURI() and decodeURIComponent(). Use decodeURIComponent() when decoding individual component values received from query strings or path segments.
decodeURIComponent('salt%20%26%20pepper');
// → 'salt & pepper'
The modern URL and URLSearchParams APIs handle encoding automatically and are preferred over manual concatenation for constructing URLs:
const url = new URL('https://example.com/search');
url.searchParams.set('q', 'salt & pepper');
url.searchParams.set('lang', 'en');
console.log(url.toString());
// → 'https://example.com/search?q=salt+%26+pepper&lang=en'
URL Encoding in Python
Python's urllib.parse module provides the standard tools for percent-encoding.
urllib.parse.quote()
quote() encodes a string for use in a URL path. By default, it leaves the slash / unencoded (treating it as a path separator). Use safe='' to encode slashes as well.
from urllib.parse import quote
quote('salt & pepper')
# → 'salt%20%26%20pepper'
quote('/path/to my file', safe='/')
# → '/path/to%20my%20file'
quote('/path/to my file', safe='')
# → '%2Fpath%2Fto%20my%20file'
urllib.parse.quote_plus()
quote_plus() follows the HTML form-encoding convention: spaces become + instead of %20, and + itself becomes %2B. Use this when building form-encoded query strings.
from urllib.parse import quote_plus
quote_plus('salt & pepper')
# → 'salt+%26+pepper'
urllib.parse.urlencode()
For building complete query strings from a dictionary, urlencode() is the most convenient option:
from urllib.parse import urlencode
params = {'q': 'salt & pepper', 'lang': 'en', 'page': 2}
urlencode(params)
# → 'q=salt+%26+pepper&lang=en&page=2'
Decoding in Python
from urllib.parse import unquote, unquote_plus
unquote('salt%20%26%20pepper')
# → 'salt & pepper'
unquote_plus('salt+%26+pepper')
# → 'salt & pepper'
URL Encoding in HTML Forms
When an HTML form is submitted, the browser encodes the field values before sending them. The encoding scheme depends on the form's enctype attribute and the HTTP method used.
GET Forms
With method="GET" (the default), field values are appended to the URL as a query string using the application/x-www-form-urlencoded encoding: spaces become +, and reserved characters are percent-encoded.
<form method="GET" action="/search">
  <input name="q" value="salt & pepper" />
  <!-- Submits: /search?q=salt+%26+pepper -->
</form>
POST Forms
With method="POST" and enctype="application/x-www-form-urlencoded" (the default POST encoding), values are encoded the same way but placed in the request body rather than the URL. With enctype="multipart/form-data" — used for file uploads — no percent-encoding is applied; each field is its own MIME part.
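To make the GET and POST cases concrete, here is a small Python sketch that builds the same form-encoded string for both and then parses it back the way a server-side form parser would. The field names and the /search path mirror the example form above; nothing is sent over the network.

```python
from urllib.parse import urlencode, parse_qs

fields = {"q": "salt & pepper", "lang": "en"}

# The same encoded string serves as a GET query string or a POST body
encoded = urlencode(fields)
get_url = "/search?" + encoded
# Bytes that would be sent with Content-Type: application/x-www-form-urlencoded
post_body = encoded.encode("ascii")

print(get_url)    # /search?q=salt+%26+pepper&lang=en
print(post_body)  # b'q=salt+%26+pepper&lang=en'

# A form parser reverses the encoding, turning "+" back into a space
print(parse_qs(post_body.decode("ascii")))
# {'q': ['salt & pepper'], 'lang': ['en']}
```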
Common Mistakes with URL Encoding
Double-Encoding
The most pervasive mistake is encoding a string that is already encoded. If a value contains %20 and you run encodeURIComponent() on it again, the percent sign becomes %25, turning %20 into %2520. The server then decodes it to %20 (a literal percent-two-zero) rather than to a space.
// Wrong — double encoding:
const alreadyEncoded = 'hello%20world';
encodeURIComponent(alreadyEncoded);
// → 'hello%2520world' (broken)
// Correct — only encode raw values:
encodeURIComponent('hello world');
// → 'hello%20world' (correct)
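The same failure can be reproduced in Python, which also makes it easy to see what a server that decodes once would receive. This is a minimal sketch using urllib.parse.

```python
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw, safe="")    # 'hello%20world'
twice = quote(once, safe="")  # 'hello%2520world'

# A server that decodes once sees the literal text "%20", not a space
print(unquote(twice))           # hello%20world
print(unquote(twice) == raw)    # False
print(unquote(unquote(twice)))  # hello world
```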
Using + Instead of %20 Outside Form Data
A + in a URL is only decoded as a space when the content is processed as application/x-www-form-urlencoded. In a URL path segment, + is a literal plus sign. If you use quote_plus() or URLSearchParams output in a path segment, the server will receive a +, not a space.
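A short Python comparison shows how the two conventions diverge. The sample file name is an assumption picked so that it contains both a space and a literal plus sign.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

name = "my file+notes"

print(quote(name, safe=""))  # my%20file%2Bnotes  (safe in paths and queries)
print(quote_plus(name))      # my+file%2Bnotes    (form-encoded data only)

# Path-style decoding leaves "+" untouched; form-style decoding turns it into a space
print(unquote("my+file"))       # my+file
print(unquote_plus("my+file"))  # my file
```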
Encoding the Entire URL Instead of Components
Calling encodeURIComponent() on a full URL will encode the ://, all slashes, and the ? and & delimiters, rendering the URL useless. Always encode individual component values — never the assembled URL as a whole.
Forgetting to Encode the Percent Sign Itself
If your data contains a literal % character (e.g., a mathematical expression like 50%), that percent sign must be encoded as %25. If it is left unencoded, the URL parser may try to interpret the characters following it as a percent-encoding sequence.
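The sketch below illustrates both the correct handling and the failure mode in Python. The string "100%effort" is a contrived example chosen because the two characters after the percent sign happen to be valid hexadecimal digits.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

print(quote("50% off", safe=""))  # 50%25%20off
print(unquote("50%25%20off"))     # 50% off

# Left unencoded, "%" followed by two hex digits is taken as an encoded byte:
# here "%ef" becomes the single byte 0xEF instead of the text "%ef"
print(unquote_to_bytes("100%effort"))  # b'100\xeffort'
```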
URL Decoding
Decoding reverses the process: every occurrence of %XX in a string is replaced by the byte represented by the two hexadecimal digits XX. Once all bytes are collected, they are decoded as a UTF-8 string to produce the original text.
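As an illustration of that two-stage process, here is a minimal decoder sketch in Python. The name percent_decode is an assumption; in real code urllib.parse.unquote() is the usual choice. Unlike unquote(), this sketch raises UnicodeDecodeError when the decoded bytes are not valid UTF-8.

```python
import re

_TRIPLET = re.compile(rb"%([0-9A-Fa-f]{2})")

def percent_decode(text: str) -> str:
    """Replace each %XX with its byte, then decode all bytes as UTF-8."""
    raw = text.encode("utf-8")
    # Swap every %XX triplet for the single byte it represents
    decoded = _TRIPLET.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("utf-8")

print(percent_decode("caf%C3%A9"))            # café
print(percent_decode("salt%20%26%20pepper"))  # salt & pepper
```

Working on bytes first matters: %C3 and %A9 only become é once both bytes have been collected and decoded together.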
Browsers auto-decode URLs for display purposes. When you paste https://example.com/caf%C3%A9 into Chrome's address bar, the browser shows https://example.com/café. Under the hood, the actual HTTP request uses the percent-encoded form. This display convenience is why copying a URL from the address bar sometimes produces a different string than what the server actually received.
On the server side, frameworks like Express (Node.js), Django (Python), and Rails (Ruby) automatically decode path parameters and query string values before they reach your route handler. You should almost never need to manually decode incoming values in application code — but you do need to be aware of whether your framework decodes once or twice, particularly for path segments that contain encoded slashes (%2F).
Unicode Characters and Internationalized URLs
URLs were originally restricted to ASCII, which left no room for languages other than English. The solution for domain names is Internationalized Domain Names in Applications (IDNA), which converts Unicode domain names (e.g., bücher.de) into ASCII-compatible encoding (xn--bcher-kva.de). This transformation is handled transparently by the browser.
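The domain conversion can be tried with Python's built-in idna codec. Treat this as a sketch: the codec implements the older IDNA 2003 rules, while browsers follow newer specifications, so results may differ for some edge-case names.

```python
# Convert a Unicode host name to its ASCII-compatible form and back
ascii_host = "bücher.de".encode("idna")

print(ascii_host)                 # b'xn--bcher-kva.de'
print(ascii_host.decode("idna"))  # bücher.de
```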
For path segments and query strings, the approach is simpler: encode to UTF-8 bytes, then percent-encode each byte. For example, the Japanese word for "search" (検索) encodes as follows:
- 検 → U+691C → UTF-8: E6 A4 9C → %E6%A4%9C
- 索 → U+7D22 → UTF-8: E7 B4 A2 → %E7%B4%A2
The full encoded path becomes %E6%A4%9C%E7%B4%A2. When you type a Japanese search term into a browser's address bar and look at the raw URL, you will see exactly this pattern.
Emoji follow the same rule. The thumbs-up emoji 👍 is U+1F44D, encoded as UTF-8 bytes F0 9F 91 8D, giving %F0%9F%91%8D in a URL.
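These byte sequences can be confirmed in Python by printing the UTF-8 bytes next to the output of urllib.parse.quote(). The sketch assumes Python 3.8 or later for the separator argument of bytes.hex().

```python
from urllib.parse import quote

for text in ["検索", "👍"]:
    utf8 = text.encode("utf-8")
    # Show the raw UTF-8 bytes next to the percent-encoded result
    print(text, utf8.hex(" ").upper(), quote(text, safe=""))

# 検索 E6 A4 9C E7 B4 A2 %E6%A4%9C%E7%B4%A2
# 👍 F0 9F 91 8D %F0%9F%91%8D
```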
Modern browsers display decoded Unicode characters in the address bar for readability. This Unicode form corresponds to an Internationalized Resource Identifier (IRI), defined in RFC 3987. The HTTP request itself always carries the percent-encoded form.
Frequently Asked Questions
%20 is the percent-encoded representation of a space character. Because spaces are not allowed in URLs, they must be encoded. The space character has ASCII/Unicode code point 32, which equals 0x20 in hexadecimal — hence %20. You will see %20 in URL path segments, such as /my%20folder/file.txt, and in query strings when not using form-encoding.
encodeURI() is designed to encode a complete URL. It leaves characters that are valid URL structural components — such as :, /, ?, #, &, and = — unencoded, because those characters are needed as structure.
encodeURIComponent() is designed to encode a single component of a URL, such as a query parameter value. It encodes nearly everything except letters, digits, and - _ . ! ~ * ' ( ). If you need to safely embed a value inside a query string, always use encodeURIComponent().
In the application/x-www-form-urlencoded encoding scheme — used when HTML forms are submitted via GET — spaces are encoded as + rather than %20. This is a legacy convention from HTML forms that predates the modern URL standard.
In modern URL path segments and when using encodeURIComponent() in JavaScript, spaces become %20. Mixing these two conventions is a common source of bugs: if you receive a query string where spaces were encoded as +, use decodeURIComponent(value.replace(/\+/g, ' ')) or your framework's built-in form-decoding function.
Percent-encoding is case-insensitive: %2f is treated as equivalent to %2F by compliant URL parsers. However, RFC 3986 recommends using uppercase hex digits (A–F) for consistency, and most encoding libraries produce uppercase by default.
Note that the path and query string themselves may be case-sensitive depending on the server — /Search and /search can be different resources. But the encoding notation (%2F vs %2f) is always treated as identical.
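When URLs are compared as strings, differing hex case can make equivalent URLs look different. One possible approach, sketched here in Python, is to uppercase only the %XX triplets; the function name normalize_hex_case is an assumption for this example.

```python
import re
from urllib.parse import unquote

def normalize_hex_case(url: str) -> str:
    """Uppercase the hex digits of every %XX triplet, leaving the rest untouched."""
    return re.sub(r"%[0-9A-Fa-f]{2}", lambda m: m.group(0).upper(), url)

print(unquote("a%2fb") == unquote("a%2Fb"))       # True
print(normalize_hex_case("/Search/a%2fb%c3%a9"))  # /Search/a%2Fb%C3%A9
```

The path itself (/Search) is left alone, since its case may be significant to the server.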
Non-ASCII characters are first encoded as UTF-8 bytes, and then each byte is individually percent-encoded. For example, the letter é (U+00E9) is represented in UTF-8 as two bytes: 0xC3 and 0xA9, so it becomes %C3%A9 in a URL.
An emoji like 😀 (U+1F600) requires four UTF-8 bytes — 0xF0, 0x9F, 0x98, 0x80 — and encodes to %F0%9F%98%80. Modern browsers display these in decoded form in the address bar for readability, but the underlying HTTP request always uses the percent-encoded byte sequences.