1. What Is Markdown?

Markdown is a lightweight markup language created by John Gruber and Aaron Swartz in 2004. The central philosophy is deceptively simple: a document written in Markdown should be readable as plain text without any rendering. The punctuation-based syntax is designed to look like natural formatting — asterisks around a word look like emphasis, a hash before a line looks like a heading.

Gruber described the goal as creating "a text-to-HTML conversion tool for web writers." The tool he shipped was a Perl script; the format it defined became one of the most widely adopted writing syntaxes in software development. Today Markdown is the default format for GitHub READMEs, documentation sites, static site generators, note-taking apps, and countless CMSes.

The original Markdown spec, however, left many edge cases undefined. That ambiguity spawned dozens of incompatible dialects: GitHub Flavored Markdown (GFM), MultiMarkdown, Pandoc's Markdown, and others. The CommonMark project (2014) attempted to produce a rigorous, unambiguous specification, and today most major parsers target CommonMark compliance with dialect extensions layered on top.

Understanding the format's origin matters because it explains its constraints. Markdown was built for readable plain text first, HTML second. When you need rich semantic HTML — ARIA roles, custom classes, complex tables — raw HTML is often the better tool. When you need fast, readable, portable prose, Markdown wins.

2. The Core Markdown Syntax Reference

The table below covers the core elements of the original Gruber spec and the CommonMark standard (fenced code blocks come from CommonMark; the rest date back to Gruber's original). The Markdown column shows the literal syntax; the HTML output column shows what a compliant parser produces.

| Element | Markdown Syntax | HTML Output |
|---|---|---|
| Heading 1 | # Heading | <h1>Heading</h1> |
| Heading 2 | ## Heading | <h2>Heading</h2> |
| Heading 3–6 | ### … ###### | <h3>–<h6> |
| Bold | **bold** or __bold__ | <strong>bold</strong> |
| Italic | *italic* or _italic_ | <em>italic</em> |
| Bold + Italic | ***both*** | <strong><em>both</em></strong> |
| Link | [text](https://url) | <a href="https://url">text</a> |
| Link with title | [text](url "title") | <a href="url" title="title">text</a> |
| Image | ![alt](image.png) | <img src="image.png" alt="alt"> |
| Unordered list | - item or * item | <ul><li>item</li></ul> |
| Ordered list | 1. item | <ol><li>item</li></ol> |
| Blockquote | > quoted text | <blockquote>…</blockquote> |
| Inline code | `code` | <code>code</code> |
| Fenced code block | ```lang … ``` | <pre><code class="language-lang">…</code></pre> |
| Indented code block | 4 spaces or 1 tab | <pre><code>…</code></pre> |
| Horizontal rule | --- or *** | <hr> |
| Line break | Two trailing spaces | <br> |
| Paragraph | Blank line between text | <p>…</p> |
| Escaped character | \*literal asterisk\* | *literal asterisk* (asterisks shown as-is, no emphasis) |

One subtlety worth knowing: ATX-style headings (# Heading) and Setext-style headings (text underlined with = or -) produce identical HTML for levels 1 and 2, the only levels Setext supports. ATX style is overwhelmingly preferred because it works for all six levels and is more visually distinct.

3. How Markdown Parsers Work

Converting Markdown to HTML is not a simple find-and-replace operation. A production-grade parser follows a multi-phase pipeline:

Phase 1: Tokenization (Lexical Analysis)

The raw input string is split into a stream of tokens — structural markers that the parser can reason about. A blank line becomes a BLANK_LINE token; a line beginning with # becomes an ATX_HEADING candidate; a fenced code delimiter (```) opens a CODE_FENCE context.
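
As a rough illustration of this phase, the sketch below classifies lines into the token types named above. It is a toy, not a CommonMark-conformant lexer: the function name tokenizeLines and the CODE_LINE and TEXT token types are assumptions made for the example, and real parsers handle many more cases (indentation, tilde fences, longer fences).

```javascript
// Toy line classifier; illustrative only, not a conformant lexer.
// tokenizeLines, CODE_LINE and TEXT are hypothetical names for this sketch.
const FENCE = "`".repeat(3); // three backticks, built at runtime

function tokenizeLines(source) {
  const tokens = [];
  let inFence = false; // true between an opening and a closing fence
  for (const line of source.split("\n")) {
    if (line.startsWith(FENCE)) {
      tokens.push({ type: "CODE_FENCE", info: line.slice(FENCE.length).trim() });
      inFence = !inFence;
    } else if (inFence) {
      // Inside a fence nothing is interpreted, so "# x" stays literal code.
      tokens.push({ type: "CODE_LINE", text: line });
    } else if (line.trim() === "") {
      tokens.push({ type: "BLANK_LINE" });
    } else {
      const heading = /^(#{1,6})\s+(.*)$/.exec(line);
      if (heading) {
        tokens.push({ type: "ATX_HEADING", level: heading[1].length, text: heading[2] });
      } else {
        tokens.push({ type: "TEXT", text: line });
      }
    }
  }
  return tokens;
}

const sample = "# Title\n\n" + FENCE + "js\n# not a heading\n" + FENCE;
console.log(tokenizeLines(sample).map((t) => t.type).join(" "));
// prints "ATX_HEADING BLANK_LINE CODE_FENCE CODE_LINE CODE_FENCE"
```

Note how the line starting with # inside the fence is not treated as a heading: the tokenizer has to carry context from one line to the next, which is why a plain find-and-replace falls short.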

Phase 2: Block-Level Parsing

The token stream is walked to identify block-level elements: paragraphs, headings, lists, blockquotes, code blocks, and HTML blocks. Most parsers build a document tree (an AST, or Abstract Syntax Tree) at this stage. Each node in the tree represents a block container and may contain children. Settling the block structure first, before any inline content is examined, is what allows nested blockquotes and lists to work correctly.

Phase 3: Inline Parsing

Leaf nodes (paragraphs, headings, and similar) are processed again for inline spans: emphasis, strong, links, inline code, and images. Inline parsing is the hardest part of a Markdown parser because the rules for emphasis resolution involve complex precedence and nesting logic (the CommonMark spec needs 17 numbered rules and well over a hundred examples for emphasis alone).
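
To give a feel for the precedence problem, here is a deliberately naive sketch in which code spans are resolved before emphasis, so asterisks inside backticks are left alone. The function names are hypothetical, and the regex approach is an assumption chosen for brevity: it does not implement CommonMark's delimiter algorithm and will mis-handle nested or unbalanced markers.

```javascript
// Naive inline renderer; a sketch of precedence only, not a real inline parser.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function renderInline(text) {
  // split() with a capturing group alternates: plain text, code span, plain text, ...
  return text
    .split(/`([^`]+)`/)
    .map((part, index) => {
      if (index % 2 === 1) {
        // Code span contents are escaped but never scanned for emphasis.
        return `<code>${escapeHtml(part)}</code>`;
      }
      return escapeHtml(part)
        .replace(/\*\*(.+?)\*\*/g, "<strong>$1</strong>") // strong before em
        .replace(/\*(.+?)\*/g, "<em>$1</em>");
    })
    .join("");
}

console.log(renderInline("Use `a*b*c` or **bold** and *em*"));
// prints "Use <code>a*b*c</code> or <strong>bold</strong> and <em>em</em>"
```

Swapping the order of the two emphasis replacements, or scanning for emphasis before code spans, changes the output, which is the kind of ordering decision the spec has to pin down.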

Phase 4: HTML Rendering

The populated AST is walked a final time to emit HTML. Because rendering is decoupled from parsing, the same AST can be rendered to HTML, LaTeX, plain text, or any other format — which is exactly how Pandoc works.
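
The decoupling can be sketched with a small hand-built tree and two independent renderers. The node shapes below (type, children, value, level) are assumptions for illustration and do not match the AST of Pandoc or any particular parser.

```javascript
// A hypothetical, hand-built AST; real parsers use their own node formats.
const ast = {
  type: "document",
  children: [
    { type: "heading", level: 1, children: [{ type: "text", value: "Hello" }] },
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Some " },
        { type: "strong", children: [{ type: "text", value: "bold" }] },
        { type: "text", value: " text & more." },
      ],
    },
  ],
};

// Escape the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Renderer 1: walk the tree and emit HTML.
function renderHtml(node) {
  const inner = (node.children || []).map(renderHtml).join("");
  switch (node.type) {
    case "document": return inner;
    case "heading": return `<h${node.level}>${inner}</h${node.level}>\n`;
    case "paragraph": return `<p>${inner}</p>\n`;
    case "strong": return `<strong>${inner}</strong>`;
    case "text": return escapeHtml(node.value);
    default: throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Renderer 2: walk the same tree and emit plain text, one block per line.
function renderPlainText(node) {
  if (node.type === "text") return node.value;
  const inner = (node.children || []).map(renderPlainText).join("");
  const isBlock = node.type === "heading" || node.type === "paragraph";
  return isBlock ? inner + "\n" : inner;
}

console.log(renderHtml(ast));
console.log(renderPlainText(ast));
```

Neither renderer knows anything about Markdown syntax; both depend only on the tree, so adding a third output format means adding a third walker rather than touching the parser.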

Major parser implementations you should know:

  • CommonMark: The reference spec, with reference implementations in C and JavaScript (cmark, commonmark.js). Strict, unambiguous, and widely adopted.
  • marked: Fast JavaScript parser, widely used in Node.js applications and browser contexts.
  • markdown-it: Another popular JavaScript parser with a rich plugin ecosystem and good CommonMark compliance.
  • GitHub Flavored Markdown (GFM): CommonMark with GitHub extensions: tables, task lists, strikethrough, and autolinks. The de facto standard for open-source documentation.
  • Pandoc: The universal document converter, written in Haskell. Pandoc's Markdown is an extended and slightly revised version of Gruber's original syntax, adding footnotes, definition lists, metadata blocks, and more.
  • Python-Markdown: The canonical Python implementation, foundation for MkDocs and many documentation pipelines.

Note: When choosing a parser for a production system, check how it fares against the CommonMark spec test suite. A non-conformant parser will produce different output for edge cases in emphasis, link resolution, and code block handling, which can cause subtle bugs when users copy Markdown between systems.

4. Extended Markdown: Tables, Task Lists, Footnotes, Strikethrough

The original Gruber spec deliberately kept the syntax small. Most extensions were pioneered by third-party parsers and later codified in GFM and Pandoc. Here are the most commonly used extensions:

Tables (GFM)

GFM tables use pipe characters to delineate columns, with a separator row of dashes defining the header:

    | Name  | Type   | Required |
    |-------|--------|----------|
    | src   | string | yes      |
    | alt   | string | yes      |
    | width | number | no       |

Alignment is controlled by colons in the separator row: :--- (left), :---: (center), ---: (right).
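
The colon rule is mechanical enough to sketch in a few lines. The helper below is hypothetical (parseAlignments is not part of any library) and assumes a well-formed separator row.

```javascript
// Hypothetical helper: map a GFM separator row to one alignment per column.
function parseAlignments(separatorRow) {
  return separatorRow
    .trim()
    .replace(/^\|/, "") // drop the optional leading pipe
    .replace(/\|$/, "") // drop the optional trailing pipe
    .split("|")
    .map((cell) => {
      const marker = cell.trim();
      if (!/^:?-+:?$/.test(marker)) {
        throw new Error(`Not a separator cell: "${marker}"`);
      }
      const left = marker.startsWith(":");
      const right = marker.endsWith(":");
      if (left && right) return "center";
      if (right) return "right";
      if (left) return "left";
      return null; // no colon: alignment is left to the renderer's default
    });
}

console.log(parseAlignments("| :--- | :---: | ---: | --- |"));
// prints [ 'left', 'center', 'right', null ]
```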

Task Lists (GFM)

Checked and unchecked checkboxes in a list render as interactive or static checkbox inputs:

    - [x] Install dependencies
    - [x] Write tests
    - [ ] Ship to production

Parsers render this as <input type="checkbox"> elements inside list items. Whether the checkboxes are interactive depends on the parser and context — GitHub renders them non-interactive in READMEs but interactive in issues.
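
A minimal sketch of that transformation, assuming a static (non-interactive) rendering: the helper names are hypothetical, and the exact attributes and markup differ from parser to parser.

```javascript
// Sketch of task-list rendering; not GitHub's or any parser's actual output format.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Turns one task-list line into an <li>, or returns null if it is not a task item.
function renderTaskItem(line) {
  const match = /^[-*+] \[([ xX])\] (.+)$/.exec(line);
  if (!match) return null;
  const checked = match[1].toLowerCase() === "x";
  // "disabled" makes the checkbox static; an interactive renderer would omit it.
  const attrs = checked ? "checked disabled" : "disabled";
  return `<li><input type="checkbox" ${attrs}> ${escapeHtml(match[2])}</li>`;
}

const items = ["- [x] Install dependencies", "- [x] Write tests", "- [ ] Ship to production"];
console.log(items.map(renderTaskItem).join("\n"));
```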

Strikethrough (GFM)

Double tildes produce <del> elements:

~~deprecated function~~

Footnotes (Pandoc / MultiMarkdown)

Footnote references and definitions are separated from the main text:

    This is a statement.[^1]

    [^1]: This is the footnote text.

The parser outputs a numbered superscript link in the body and a <section> containing footnote definitions at the document end. CommonMark does not include footnotes natively, but most parser ecosystems offer them as plugins.

Definition Lists (MultiMarkdown / Pandoc)

    Term
    : Definition text that can span multiple lines.

Autolinks (GFM)

GFM automatically detects bare URLs and email addresses in text and wraps them in anchor tags — no explicit bracket syntax required. This is convenient for prose but can cause issues if a URL contains Markdown-significant characters.

5. How to Convert Markdown to HTML with SnapUtils

The SnapUtils Markdown to HTML converter runs entirely in your browser. No file uploads, no server processing, no account required. Here is how to use it:

  1. Open the tool. Navigate to snaputils.tools/markdown-to-html. The editor loads instantly — no splash screen, no onboarding.
  2. Paste or type your Markdown. The left panel is a plain-text editor. Paste an existing document or start typing. The preview updates in real time as you type.
  3. Review the live preview. The right panel renders your Markdown as styled HTML. This shows you exactly how the output will look when embedded in a page.
  4. Copy the HTML. Click the "Copy HTML" button to place the rendered HTML markup on your clipboard. The output is clean, standards-compliant HTML — no inline styles injected by the tool.
  5. Use the HTML source view. Toggle to the source tab to see the raw HTML string. This is useful when you need to inspect the output structure or paste into a CMS that accepts raw HTML.
  6. Download as .html. Use the download button to save the full HTML snippet as a file — useful for batch workflows or handing off to a designer.

The converter supports CommonMark syntax plus the most common GFM extensions: tables, task lists, strikethrough, and autolinks. Fenced code blocks are preserved with their language class (language-python, etc.) so a downstream syntax highlighter like Prism or Highlight.js can pick them up without modification.

Tip: If you are converting large documents, use the download option instead of copying to clipboard. Some browsers impose clipboard size limits that can silently truncate large HTML payloads.

6. Markdown vs HTML: When to Use Which

Markdown and HTML serve different purposes and are best thought of as tools for different contexts — not as competitors. The table below clarifies when each is the right choice.

| Scenario | Markdown | HTML |
|---|---|---|
| Blog posts and documentation | Preferred: readable as plain text, easy to version-control | Overkill for prose |
| README files in code repos | Industry standard | Not supported by most renderers |
| Email templates | Must convert to HTML first: email clients don't render Markdown | Required for email |
| Complex layouts (sidebars, grids) | Not suitable: no concept of layout | Required |
| Custom ARIA attributes | Must inline raw HTML inside Markdown | Native |
| CMS content input | Good fit for text-heavy CMSes with Markdown support | Better when the CMS uses a WYSIWYG editor |
| Static site generators | Preferred (Hugo, Jekyll, Eleventy, Astro all use Markdown natively) | Used for layout templates, not content |
| User-generated content | Safer to accept: limits attack surface vs raw HTML | Requires careful sanitization |
| Long-form technical documentation | Excellent: footnotes, cross-references, and code blocks shine | Verbose for long documents |
| Presentation slides | Possible with tools like Marp or Reveal.js | More control over visual layout |

The practical rule: use Markdown when the primary audience is humans writing and reading plain text, and convert to HTML when the primary audience is a browser or a downstream system that consumes structured markup.

7. Sanitizing HTML Output: Security Considerations

Markdown parsers produce HTML. When that HTML is rendered in a browser — especially when the Markdown source came from a user — it becomes a potential XSS (cross-site scripting) vector. This is one of the most misunderstood aspects of using Markdown in web applications.

The Core Risk

Most Markdown parsers allow raw HTML inside the Markdown source (the original spec treats inline HTML as the fallback for anything Markdown's own syntax does not cover). A user can submit:

    <script>document.cookie = 'stolen=' + document.cookie;</script>

or more subtly:

    [Click me](javascript:alert('XSS'))

If you render this output directly into the DOM without sanitization, you have an XSS vulnerability. The attacker can steal cookies, hijack sessions, exfiltrate data, or run arbitrary code in the context of your domain.
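
The link case deserves attention because it contains no raw HTML at all, so disabling HTML in the parser does not cover it unless the parser also validates link destinations. The sketch below shows the idea of a scheme allowlist; isSafeHref and the chosen set of protocols are assumptions for illustration, and a check like this complements a real sanitizer rather than replacing one.

```javascript
// Hypothetical scheme allowlist for link destinations; not a full sanitizer.
const SAFE_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

function isSafeHref(href) {
  try {
    // The base URL is a placeholder so relative links parse; only the protocol is inspected.
    const url = new URL(href, "https://example.invalid/");
    return SAFE_PROTOCOLS.has(url.protocol);
  } catch {
    return false; // unparseable destinations are rejected
  }
}

console.log(isSafeHref("https://example.com/page")); // true
console.log(isSafeHref("/docs/intro"));              // true (relative link)
console.log(isSafeHref("javascript:alert('XSS')"));  // false
console.log(isSafeHref("JaVaScRiPt:alert(1)"));      // false (the URL parser lowercases the scheme)
```

Delegating to the platform's URL parser, instead of matching the string "javascript:" by hand, is the design point here: mixed case, leading whitespace, and similar tricks are normalized before the protocol is compared.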

When You Must Sanitize

  • Any time Markdown is authored by an end user (comment boxes, wiki pages, issue trackers)
  • Any time Markdown is imported from an external source (RSS feeds, API responses, file uploads)
  • Any time the rendered output is displayed to a different user than the one who wrote it

Safe Patterns

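Option 1: Disable raw HTML at the parser level. Many parsers have an option to strip or escape HTML in the Markdown source. In markdown-it, keep html: false (the default). marked has no equivalent switch: its old sanitize option was deprecated and later removed, and the mangle option only obfuscated email addresses, so with marked you either override the renderer's handling of raw HTML to escape it or rely on output sanitization (Option 2). Disabling HTML in the parser is the simplest approach when you control the parser configuration.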

Option 2: Sanitize the HTML output with DOMPurify. DOMPurify is a widely used, actively maintained client-side HTML sanitizer. After parsing Markdown to HTML, pass the output through DOMPurify before inserting it into the DOM:

    import DOMPurify from 'dompurify';
    import { marked } from 'marked';

    // Parse first, sanitize second, insert last: rawHtml must never reach the DOM.
    const rawHtml = marked.parse(userMarkdown);
    const safeHtml = DOMPurify.sanitize(rawHtml);
    document.getElementById('output').innerHTML = safeHtml;

DOMPurify operates on an allowlist of safe tags and attributes — it removes <script>, javascript: URLs, event handler attributes (onclick, onload, etc.), and any other potentially dangerous content while preserving legitimate formatting.

Option 3: Server-side sanitization. For server-rendered content, use a server-side sanitizer. In Node.js, sanitize-html is a robust choice. In Python, bleach was long the standard option, but the project has been deprecated since 2023, so evaluate a maintained alternative such as nh3. In Go, use bluemonday. The advantage is that sanitization happens before the payload is ever sent to the client, so there is no window where unsanitized HTML exists in the browser.

Warning: Never implement an HTML sanitizer yourself using regex or string replacement. The HTML parser rules are complex enough that hand-rolled sanitizers almost always have bypasses. Use a purpose-built library.

Allowlist Configuration

When configuring DOMPurify or a server-side sanitizer for Markdown output, your allowlist typically includes: h1 through h6, p, ul, ol, li, blockquote, pre, code, em, strong, del, a (with href, title, and optionally target), img (with src, alt, width, height), table, thead, tbody, tr, th, td, hr, br. Everything else, including all event attributes, should be stripped.
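
Expressed as a DOMPurify configuration object, that allowlist might look like the sketch below. ALLOWED_TAGS and ALLOWED_ATTR are DOMPurify's documented options; the variable name is an assumption, and the sanitize call is shown only as a comment because the library is not loaded in this snippet.

```javascript
// Allowlist from the paragraph above, as a DOMPurify config (variable name is hypothetical).
const markdownSanitizeConfig = {
  ALLOWED_TAGS: [
    "h1", "h2", "h3", "h4", "h5", "h6",
    "p", "ul", "ol", "li", "blockquote", "pre", "code",
    "em", "strong", "del", "a", "img",
    "table", "thead", "tbody", "tr", "th", "td",
    "hr", "br",
  ],
  // Attributes are allowlisted globally, not per tag.
  ALLOWED_ATTR: ["href", "title", "src", "alt", "width", "height"],
};

// Usage, assuming DOMPurify has been imported elsewhere:
// const safeHtml = DOMPurify.sanitize(rawHtml, markdownSanitizeConfig);

console.log(markdownSanitizeConfig.ALLOWED_TAGS.length, "tags allowed");
```

One consequence worth checking in your own setup: this list has no class attribute, so the language-* classes on fenced code blocks would be stripped. Add "class" to ALLOWED_ATTR if a downstream syntax highlighter depends on them.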

8. Markdown Flavors Compared

When choosing a Markdown flavor for a project, the choice affects which parser you use, which syntax extensions are available, and what the output looks like for edge cases.

| Flavor | Based On | Key Extensions | Best For |
|---|---|---|---|
| CommonMark | Gruber Markdown | None (strict spec only) | Interoperability, spec compliance testing |
| GitHub Flavored (GFM) | CommonMark | Tables, task lists, strikethrough, autolinks, disallowed raw HTML | Open source repos, GitHub content |
| MultiMarkdown | Gruber Markdown | Tables, footnotes, definition lists, metadata, math, cross-references | Academic writing, long-form documents |
| Pandoc's Markdown | Gruber Markdown | Footnotes, citations, definition lists, grid tables, raw TeX/HTML blocks, metadata headers | Academic publishing, PDF/docx/EPUB export |
| kramdown | Gruber Markdown | Attribute lists, fenced code blocks, footnotes, math via MathJax | Jekyll, Ruby static sites |
| R Markdown | Pandoc | Executable R/Python code chunks, LaTeX math, bibliography | Data science reports, reproducible research |

For most web development use cases, GFM is the right choice — it is well-specified, widely supported, and the syntax is familiar to any developer who has used GitHub. For technical documentation pipelines that produce multiple output formats (HTML, PDF, EPUB, docx), Pandoc's Markdown is the most powerful option.

9. Common Markdown Mistakes and How to Fix Them

Even experienced writers make consistent Markdown errors. Here are the most common, with explanations of why they happen and how to fix them.

1. Missing blank lines around block elements

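Many parsers require a blank line before and after headings, lists, code blocks, and blockquotes. CommonMark lets some of these interrupt a paragraph, but original Markdown and parsers such as Python-Markdown do not, so without the blank line the block may be treated as part of the preceding paragraph. Adding the blank lines is the portable choice.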

    # Wrong: no blank line before the list
    This is a paragraph:
    - item one
    - item two

    # Correct
    This is a paragraph:

    - item one
    - item two

2. Indentation inconsistency in nested lists

Nested list items must be indented consistently: typically 2 or 4 spaces depending on the parser, and under CommonMark at least far enough to line up with the parent item's text. Mixing tabs and spaces or using the wrong indent depth causes the parser to break out of the nested list.

    # Wrong: inconsistent indentation
    - Parent
     - Child (1 space: may not parse as nested)

    # Correct
    - Parent
      - Child (2 spaces: unambiguous)

3. Forgetting to escape special characters

Characters that are significant in Markdown — *, _, [, ], `, #, \ — must be escaped with a backslash when used literally.

    # Accidental emphasis
    The function accepts a *args and **kwargs.

    # Correct
    The function accepts a \*args and \*\*kwargs.
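
When the text is generated by a program (for example, inserting a function signature into a template), the escaping can be automated. The helper below is a hypothetical sketch that escapes exactly the characters listed above; it over-escapes in places where a character would have been harmless, which is safe because a backslash before ASCII punctuation is a valid escape.

```javascript
// Hypothetical helper: backslash-escape the Markdown-significant characters listed above.
function escapeMarkdown(text) {
  // "$&" in the replacement stands for the matched character.
  return text.replace(/[\\`*_\[\]#]/g, "\\$&");
}

console.log(escapeMarkdown("The function accepts *args and **kwargs."));
// prints "The function accepts \*args and \*\*kwargs."
```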

4. Spaces in link URLs

URLs with spaces break the link syntax. Encode spaces as %20 or wrap the URL in angle brackets.

    # Breaks
    [link](https://example.com/my document.pdf)

    # Correct
    [link](https://example.com/my%20document.pdf)
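
If links are assembled in code, the built-in encodeURI function handles the space for you. The markdownLink helper is hypothetical and assumes the URL passed in is not already percent-encoded; an existing %20 would be double-encoded to %2520, and characters such as parentheses are left untouched.

```javascript
// Hypothetical helper: build a Markdown link with a percent-encoded destination.
// Assumes `url` is NOT already encoded.
function markdownLink(text, url) {
  return `[${text}](${encodeURI(url)})`;
}

console.log(markdownLink("link", "https://example.com/my document.pdf"));
// prints "[link](https://example.com/my%20document.pdf)"
```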

5. Using Markdown inside HTML blocks

When you open an HTML block in Markdown (e.g., <div>), most parsers stop processing Markdown inside it. Inline Markdown syntax like **bold** will be output as literal text, not as HTML. To use Markdown inside HTML, you typically need a parser-specific extension or must write the HTML markup directly.

6. Line breaks vs paragraph breaks

A single newline in Markdown is usually treated as a space, not a line break. A paragraph break requires a blank line. A forced line break requires two trailing spaces or a backslash (\) at the end of the line (supported by some parsers). This trips up writers coming from word processors.

7. Code fence language mismatch

The language identifier after the opening fence should match what your syntax highlighter expects, which is conventionally lowercase. Parsers generally copy the identifier verbatim into the class name, so ```JavaScript yields class="language-JavaScript", which a highlighter such as Prism, looking for language-javascript, may not recognize.
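
For existing documents, one way to avoid the mismatch is to normalize identifiers before parsing. The sketch below is a hypothetical preprocessing step that assumes plain three-backtick fences at the start of a line; longer fences, tilde fences, and indented fences are not handled.

```javascript
// Hypothetical preprocessing step: lowercase the language on opening fences.
const FENCE = "`".repeat(3); // three backticks, built at runtime

function lowercaseFenceLanguages(markdown) {
  let inFence = false;
  return markdown
    .split("\n")
    .map((line) => {
      if (!line.startsWith(FENCE)) return line;
      inFence = !inFence;
      // Only an opening fence carries a language; closing fences pass through unchanged.
      return inFence ? FENCE + line.slice(FENCE.length).trim().toLowerCase() : line;
    })
    .join("\n");
}

const doc = FENCE + "JavaScript\nlet Answer = 42;\n" + FENCE;
console.log(lowercaseFenceLanguages(doc));
```

The code inside the fence is left alone: only the opening fence line is rewritten, so identifiers such as Answer keep their case.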

8. Unintended ordered list resets

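The numbers you type are mostly ignored. Original Markdown always starts an ordered list at 1, while CommonMark uses the first item's number as the start value and ignores the rest. Numbering appears to reset when unindented content (a paragraph, a code block) interrupts the list: the parser closes the list and begins a new one at the next numbered line. Indent continuation content so it lines up with the item text to keep a single list, and start your lists with 1. unless you specifically need a different start value.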

10. Frequently Asked Questions

Does every Markdown parser produce the same HTML?

No. Original Markdown left many edge cases undefined, which means different parsers make different decisions for ambiguous input. The CommonMark spec was created specifically to address this. If you need consistent output across tools, choose parsers that are CommonMark-compliant and use only syntax covered by the CommonMark spec — avoid dialect-specific extensions unless you are committed to a single parser.

Can I use HTML directly inside Markdown?

Yes, most parsers pass raw HTML through to the output unchanged. This is useful for elements Markdown cannot express: custom attributes, <details> blocks, <video> embeds. However, Markdown inside an HTML block is not processed by many parsers, so you may need to write the inner content in HTML too. GFM's tagfilter extension also escapes a fixed set of tags, including <script>, <iframe>, and <style>, so check your parser's documentation.

Is it safe to render user-supplied Markdown in my app?

Not without sanitization. Many Markdown parsers allow raw HTML by default, which opens XSS vulnerabilities. Always sanitize the HTML output before inserting it into the DOM. The recommended approach is to disable raw HTML in the parser configuration, then use DOMPurify on the output for defense-in-depth. Never rely on a single sanitization layer for user-generated content.

What is the difference between CommonMark and GitHub Flavored Markdown?

GitHub Flavored Markdown (GFM) is specified as a strict superset of CommonMark. It adds tables, task lists, strikethrough (~~text~~), and extended autolink detection, and it filters certain raw HTML tags that CommonMark passes through. CommonMark source generally renders the same way under GFM, except where the text happens to trigger an extension (a bare URL, a pair of tildes, a filtered tag). The reverse does not hold: GFM-specific syntax will not render under a plain CommonMark parser.

How do I add custom CSS classes to Markdown-generated HTML?

Standard Markdown has no syntax for CSS classes. Your options are: (1) use raw HTML inline — <p class="note">…</p>; (2) use kramdown's inline attribute list syntax if your parser supports it — {: .classname}; (3) post-process the HTML with a DOM manipulation library to add classes based on element type or position; (4) apply CSS using structural selectors (e.g., article > p:first-of-type) without touching the Markdown output at all.

What is the best way to convert a large Markdown document to HTML in a build pipeline?

For single files, pandoc input.md -o output.html is the fastest option with the most flexibility. For Node.js build pipelines, markdown-it or unified with remark-parse and rehype plugins give you programmatic control and a rich plugin ecosystem. For static sites, use your framework's built-in Markdown processing (Hugo's Goldmark, Eleventy's markdown-it, Astro's built-in processor) rather than wiring up a custom pipeline.