HTML Entities: Complete Reference Guide & Encoding Explained [2026]

Every time you write <p>Hello &amp; welcome</p>, you are using an HTML entity. Entities are the mechanism that lets you include characters with special meaning in HTML, like angle brackets, ampersands, and quotation marks, as displayable text rather than as markup instructions. Get them wrong and your HTML either breaks or renders incorrectly. Get them right and your pages are valid and accessible, and you close off the most common route for HTML injection attacks.

This guide covers what HTML entities are, which ones you absolutely must know, how named and numeric entities differ, where typography entities fit in, and the common mistakes developers make when encoding HTML.

1. What Are HTML Entities?

An HTML entity is a string of text that begins with an ampersand (&) and ends with a semicolon (;). It represents a single character in HTML output. When the browser's HTML parser encounters an entity, it replaces the entity with the character it represents before the page is rendered.

For example, the entity &lt; is rendered as the less-than sign <. The entity &copy; is rendered as the copyright symbol ©. The entity &nbsp; is rendered as a non-breaking space.

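Entities come in two forms: named entities such as &copy;, and numeric entities such as &#169; (decimal) or &#xA9; (hexadecimal), which spell out the character's Unicode code point.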

2. Why HTML Entities Exist

HTML uses certain characters as structural markers. The less-than sign < opens a tag. The greater-than sign > closes a tag. The ampersand & begins an entity reference. The double quote " delimits attribute values. Written raw in the wrong place, these characters can be read by the parser as markup instead of text.

Consider the problem without entities: if you want to display the text if (x<y && z>0) in an HTML page, you cannot safely write the raw characters. The parser reads <y as the start of a tag, so everything up to the next > is swallowed into a bogus <y> element and the page shows "if (x0)". Likewise, an & followed by letters that spell an entity name, such as &copy, is read as an entity reference and turned into a different character. Entities solve this by providing an unambiguous escape mechanism.
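The escape step itself is small. Below is a minimal Python sketch; `escape_text` is an illustrative name, not a library function. It replaces the structural characters for use in text content and double-quoted attribute values. Note that & must be replaced first; otherwise the & inside the freshly inserted entities would be escaped a second time.

```python
import html

def escape_text(text: str) -> str:
    """Illustrative escaper for HTML text and double-quoted attribute values."""
    # Replace & first: escaping it later would also hit the & in "&lt;", "&gt;", ...
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace('"', "&quot;")
    return text

expr = "x < 10 && y > 0"
print(escape_text(expr))  # x &lt; 10 &amp;&amp; y &gt; 0

# html.escape gives the same result here; it additionally escapes ' as &#x27;.
print(escape_text(expr) == html.escape(expr))  # True
```

In real code, prefer the standard library's `html.escape` (or your template engine's auto-escaping) over a hand-rolled version.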

A secondary reason entities exist is historical: older character encodings such as ASCII and ISO-8859-1 cover only a small subset of Unicode. Entities let authors include characters like © or é in ASCII documents, or € in ISO-8859-1 documents. Modern UTF-8 encoding handles these directly, but the entity mechanism remains valid and useful.

3. The Core HTML Entities You Must Know

Entity name | Numeric code | Character | Use case
--- | --- | --- | ---
&amp; | &#38; | & | Ampersand in text content and attribute values
&lt; | &#60; | < | Less-than sign; displaying code snippets
&gt; | &#62; | > | Greater-than sign; displaying code snippets
&quot; | &#34; | " | Double quote inside double-quoted HTML attributes
&apos; | &#39; | ' | Single quote inside single-quoted HTML attributes
&nbsp; | &#160; | (non-breaking space) | Prevent a line break between two words; force visible spacing
&copy; | &#169; | © | Copyright symbol in footers and legal text
&reg; | &#174; | ® | Registered trademark symbol
&trade; | &#8482; | ™ | Trademark symbol (unregistered)
&euro; | &#8364; | € | Euro currency sign

Of these, &amp;, &lt;, &gt;, and &quot; are the four you will use most often, because they escape the characters HTML treats as markup; &apos; plays the same role inside single-quoted attributes. The rest are convenience entities for characters that are hard to type on most keyboards or, like the non-breaking space, invisible in source code.
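As a quick cross-check of the table above, Python's standard `html.entities.html5` mapping (keyed by entity name including the trailing semicolon) returns the character for each name, and `ord()` gives the decimal code from the second column. A short sketch:

```python
from html.entities import html5

# Entity names from the core table; html5 keys include the trailing semicolon.
core_names = ["amp", "lt", "gt", "quot", "apos", "nbsp", "copy", "reg", "trade", "euro"]

# Map each name to its decimal code point.
codes = {name: ord(html5[name + ";"]) for name in core_names}

for name, code in codes.items():
    # Show the named form, the decimal and hex numeric forms, and the character itself.
    print(f"&{name};  &#{code};  &#x{code:X};  {chr(code)!r}")
```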

Format and Encode HTML Instantly

The SnapUtils Code Formatter handles HTML encoding and decoding, formats messy HTML, and highlights syntax errors. Paste your code and get clean, properly encoded output in one click.


4. Named vs Numeric Entities

Both named and numeric entities produce identical results in the browser. The choice between them is about readability and compatibility.

Named Entities

Named entities like &copy; and &nbsp; are defined in the HTML specification. They are readable at a glance: you do not need to look up what &copy; renders to. However, the full set of named entities is large (over 2,000 defined in HTML5), and older or XML-based parsers recognize only a subset. &amp;, &lt;, &gt;, and &quot; work everywhere; &apos; is defined in XML and HTML5 but was not part of HTML 4, so very old HTML parsers may not recognize it.
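Python's standard library ships both lists, which makes the size difference easy to see: `html.entities.name2codepoint` holds the HTML 4 names, and `html.entities.html5` holds the HTML5 ones. In the HTML5 mapping, legacy names appear both with and without the semicolon, so its count includes some duplicates.

```python
from html.entities import html5, name2codepoint

# HTML 4 named entities; keys have no trailing semicolon.
print(len(name2codepoint))
# HTML5 named character references; legacy names appear with and without ";".
print(len(html5))

# &check; (U+2713) exists only in the HTML5 list.
print("check" in name2codepoint)  # False
print(html5["check;"])            # ✓
```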

Decimal Numeric Entities

Decimal numeric entities use the format &#[number]; where the number is the character's Unicode code point in base 10. &#169; is the copyright symbol (Unicode code point 169). They work for any Unicode character and are understood by essentially every HTML and XML parser, which makes them safer than named entities for obscure characters.

Hexadecimal Numeric Entities

Hexadecimal numeric entities use the format &#x[hex];. &#xA9; is the copyright symbol (0xA9 = 169 in decimal). When working with Unicode documentation (which uses hex code points like U+00A9), hex entities translate directly without conversion.

<!-- These three are identical: -->
&copy;       <!-- named entity -->
&#169;       <!-- decimal numeric entity -->
&#xA9;       <!-- hex numeric entity -->
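The equivalence is easy to verify in Python: the standard `html.unescape` decodes all three forms to the same character, and generating the numeric forms only needs `ord()`. A small sketch; the helper name `numeric_entities` is hypothetical:

```python
import html

def numeric_entities(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal numeric entities for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"

print(numeric_entities("©"))  # ('&#169;', '&#xA9;')

# Named, decimal, and hex forms all decode to the same character.
forms = ["&copy;", "&#169;", "&#xA9;"]
print({html.unescape(f) for f in forms})  # {'©'}
```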

5. Character Encoding and UTF-8

Character encoding determines how text bytes are mapped to characters. Before UTF-8 became universal, web pages used encodings like ISO-8859-1 (Latin-1) that could only represent 256 characters. Entities were the only way to include characters outside that set.

Modern web development uses UTF-8 for everything, declared via:

<meta charset="UTF-8">

UTF-8 can encode every Unicode character (over 140,000 characters across all languages and symbol sets). This means you can include characters like ©, €, —, and even emoji directly in your HTML source without entities — the browser will render them correctly as long as the page is declared UTF-8 and saved in UTF-8 encoding.

However, even with UTF-8, you still must encode the four structural characters: <, >, &, and " in the appropriate contexts. UTF-8 does not change the HTML parsing rules — it only expands which characters can appear as raw bytes.
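Python's codecs make both points concrete. The sketch below encodes the same string for ISO-8859-1 with the `xmlcharrefreplace` error handler, which falls back to numeric entities for characters the charset lacks (essentially the old role of entities), and then for UTF-8, which needs no fallback. The sample text is arbitrary.

```python
import html

text = "© 2026 Fish & Chips <€5>"

# ISO-8859-1 has © but no €; xmlcharrefreplace swaps unencodable
# characters for decimal numeric entities.
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'\xa9 2026 Fish & Chips <&#8364;5>'

# UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes.decode("utf-8") == text)  # True

# Neither encoding touches < or &; HTML escaping is a separate step.
print(html.escape(text))  # © 2026 Fish &amp; Chips &lt;€5&gt;
```

The raw < and & pass through both encodings untouched, which is why choosing an encoding never replaces HTML escaping.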

6. Common HTML Entity Mistakes

Using Raw Ampersands in URLs

URLs in HTML attributes frequently contain query parameters separated by ampersands. The URL https://example.com/search?q=html&page=2 contains a raw & inside an href attribute. That was invalid in HTML 4 and is a fatal error in XHTML. The version that is valid everywhere encodes the ampersand:

<!-- Avoid: raw ampersand in attribute -->
<a href="https://example.com/search?q=html&page=2">Page 2</a>

<!-- Correct: encoded ampersand -->
<a href="https://example.com/search?q=html&amp;page=2">Page 2</a>

Browsers handle the raw version without trouble, and under HTML5 parsing rules this particular URL is even valid, because &page does not spell an entity name. Encoding every & as &amp; is still the safer habit: it is valid in every HTML version and in XHTML, and it rules out edge cases where the text after an & does match an entity name. The browser decodes &amp; back to & before following the link, so the URL itself is unchanged.
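When a template builds such a link, one reasonable pattern is to assemble the URL first and then HTML-escape it for the attribute. A minimal Python sketch using the standard `urllib.parse.urlencode` and `html.escape`; the base URL and parameters are the example ones from this section:

```python
import html
from urllib.parse import urlencode

# Build the query string first; urlencode joins parameters with a raw "&".
params = {"q": "html", "page": 2}
url = "https://example.com/search?" + urlencode(params)
print(url)  # https://example.com/search?q=html&page=2

# Then escape for the attribute context: & becomes &amp; (quotes would be escaped too).
link = f'<a href="{html.escape(url, quote=True)}">Page 2</a>'
print(link)  # <a href="https://example.com/search?q=html&amp;page=2">Page 2</a>
```

Keeping the two steps separate (URL encoding for the query string, HTML escaping for the attribute) avoids mixing up `%26` and `&amp;`.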

Forgetting the Trailing Semicolon

The semicolon at the end of an entity is required. &copy without a semicolon is not a valid entity. Browsers still decode a small set of legacy names such as &copy without it (the HTML specification treats this as a parse error), but most names, such as &hellip, are left as literal text when the semicolon is missing. In XML and XHTML parsers, a missing semicolon is a fatal error. Always include the semicolon.
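Python's `html.unescape` follows the HTML5 decoding rules for text content, so it is a convenient way to check the semicolon behaviour without opening a browser:

```python
import html

# Legacy name: decoded even without the semicolon.
print(html.unescape("&copy 2026"))        # © 2026
# Newer name without the semicolon: left as literal text.
print(html.unescape("Wait&hellip and"))   # Wait&hellip and
# Same name with the semicolon: decoded.
print(html.unescape("Wait&hellip; and"))  # Wait… and
```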

Double-Encoding

A common mistake in templating systems is encoding text that is already encoded. Running &amp; through an HTML encoder produces &amp;amp;, which renders as the literal text "&amp;" rather than "&". This typically happens when user-provided content is encoded on storage and again on output. Encode once, on output only.
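The symptom is easy to reproduce with the standard `html.escape`: running it twice turns every &amp; into &amp;amp;, and a single decode no longer gets you back to the original text.

```python
import html

raw = "Cats & Dogs"
once = html.escape(raw)
twice = html.escape(once)  # the bug: escaping already-escaped text

print(once)   # Cats &amp; Dogs
print(twice)  # Cats &amp;amp; Dogs  (a browser would display "Cats &amp; Dogs")

# Decoding once recovers the original only from the singly-escaped version.
print(html.unescape(once) == raw)   # True
print(html.unescape(twice) == raw)  # False
```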

Using &nbsp; for Layout

Non-breaking spaces are frequently abused for creating visual spacing in layouts: &nbsp;&nbsp;&nbsp; repeated to indent text, or &nbsp; to force space between inline elements. This is a layout problem that belongs in CSS, not HTML. Use margin, padding, or gap for spacing. Reserve &nbsp; for its semantic purpose: preventing a line break between two words that should stay together (like "10&nbsp;kg" or "January&nbsp;19").

7. When to Encode and When Not To

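A practical rule covers the vast majority of cases: in a UTF-8 document, encode the four structural characters (<, >, &, and " inside attribute values) and write every other character directly.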

The exception is user-generated content. Any text that originates from user input, such as form fields, URL parameters, or database values, cannot be checked by hand, so it must always be HTML-encoded before it is inserted into a page. Failing to encode user input is the root cause of Cross-Site Scripting (XSS) vulnerabilities, in which an attacker submits markup such as a <script> tag through a form field. Proper HTML encoding converts the raw < to &lt; (along with &, >, and quotes), so the browser displays the input as text instead of interpreting it as executable markup. This protects HTML text and quoted attribute values; input placed inside inline JavaScript, CSS, or URLs needs escaping or validation specific to that context.
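A minimal sketch of encoding on output, in Python. `render_comment` is a hypothetical template helper, and the approach only covers HTML text and quoted attribute values, not JavaScript or URL contexts:

```python
import html

def render_comment(user_text: str) -> str:
    """Hypothetical template helper: escape untrusted text once, at output time."""
    return f'<p class="comment">{html.escape(user_text)}</p>'

attack = '<script>alert("xss")</script>'
print(render_comment(attack))
# <p class="comment">&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The browser renders the escaped comment as visible text; no script element is created.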

8. Typography Entities

Professional typography uses characters that keyboards do not have dedicated keys for. HTML entities provide access to all of them:

Entity | Character name | Use case
--- | --- | ---
&mdash; | Em dash | Sentence breaks, parenthetical asides — like this
&ndash; | En dash | Ranges: pages 12–45, dates 2020–2026
&lsquo; | Left single quotation mark | Opening curly single quote in prose
&rsquo; | Right single quotation mark | Closing curly single quote; apostrophe in prose
&ldquo; | Left double quotation mark | Opening curly double quote in prose
&rdquo; | Right double quotation mark | Closing curly double quote in prose
&hellip; | Horizontal ellipsis | Omission in quoted text; trailing thought
&bull; | Bullet | Custom bullet points outside of list elements

In modern UTF-8 documents you can paste these characters directly into your HTML source rather than using entities. Both approaches produce identical output. Entities have the advantage of being unambiguous in source code review — &mdash; is easier to spot than a raw em dash character that looks similar to a hyphen at small font sizes.
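To confirm that these dashes and quotes really are distinct characters rather than lookalikes, compare their code points. A short Python sketch using `html.entities.name2codepoint` (the HTML 4 table, which contains all eight typography entities listed above):

```python
from html.entities import name2codepoint

# Typography entities from the table; all are in the HTML 4 set.
typography = ["mdash", "ndash", "lsquo", "rsquo", "ldquo", "rdquo", "hellip", "bull"]

for name in typography:
    code = name2codepoint[name]
    print(f"&{name};  U+{code:04X}  {chr(code)}")

# Em dash, en dash, and the keyboard hyphen-minus are three distinct code points.
print(name2codepoint["mdash"], name2codepoint["ndash"], ord("-"))  # 8212 8211 45
```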

9. Frequently Asked Questions

What is &amp; in HTML?

&amp; is the HTML entity that represents a literal ampersand character (&). Because the ampersand begins every entity reference in HTML, you should encode it as &amp; whenever you want to display an actual ampersand in your page content. For example, to display "Cats & Dogs" in HTML text, write Cats &amp; Dogs. HTML5 does permit a raw & followed by a space, as in Cats & Dogs, but the encoded form is valid in every HTML version and in XHTML, and it avoids misreadings when the ampersand is followed by letters.

When do I need to use HTML entities?

You must use HTML entities for the four structural characters that HTML parses specially: < (use &lt;), > (use &gt;), & (use &amp;), and " inside attribute values (use &quot;). You also need entities for any character your document's character encoding cannot represent, though UTF-8 covers all Unicode characters and eliminates this need in modern pages. Any other use of entities — like encoding é as &eacute; in a UTF-8 document — is valid but unnecessary.

What is the HTML entity for a space?

&nbsp; is the HTML entity for a non-breaking space. It renders visually the same as a regular space but prevents the browser from wrapping a line at that position. Unlike regular spaces, multiple consecutive &nbsp; characters are not collapsed — they all render as visible gaps. Use &nbsp; to keep words together that should not be separated across lines, such as "10&nbsp;px" or "Fig.&nbsp;3". Do not use &nbsp; for layout spacing — use CSS margin and padding instead.

How do I decode HTML entities?

In JavaScript, the simplest approach is to create a temporary textarea element: set its innerHTML to the encoded string, then read back its value property. The browser's HTML parser handles the decoding automatically. In Python 3, use html.unescape() from the standard library. In PHP, use html_entity_decode(). For one-off decoding in the browser, the SnapUtils Code Formatter decodes HTML entities as part of its formatting pipeline.
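For the Python route mentioned in this answer, `html.unescape` handles named, decimal, and hex entities in the same string, and it reverses `html.escape`:

```python
import html

encoded = "Fish &amp; Chips &lt;3 &#169; &#x20AC;5&nbsp;only"
decoded = html.unescape(encoded)
print(decoded)  # Fish & Chips <3 © €5 only  (the gap before "only" is U+00A0)

# Round trip: unescape(escape(s)) returns the original string.
sample = 'a < b && "c"'
print(html.unescape(html.escape(sample)) == sample)  # True
```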

Format, Encode, and Validate HTML

The SnapUtils Code Formatter prettifies HTML, encodes and decodes entities, and highlights syntax errors. Free to use, no account required.

