Build a Production-Ready HTML Beautifier: The "Regex Engine" 🏗️
Formatters in 2024 are bloated.
Common libraries like js-beautify are massive (200KB+).
For most everyday formatting jobs, you don't need a full AST parser. You just need a smart Indentation Engine.
In this detailed guide, we will build the exact Regex-Based Engine used in FastTools production. It handles:
- Beautification (Smart Indenting).
- Minification (Whitespace Stripping).
- Comment Removal (Cleaning Code).
- Verification (Tag Matching).
Step 1: The Tokenizer Strategy 🔪
How do you parse HTML without a parser? You exploit the structure of tags.
<div> always starts with < and ends with >.
The Magic Split:
const nodes = html.split(/>\s*</);
This splits <div><span>Text</span></div> into ["<div", "span>Text</span", "/div>"].
Note: The split consumes the > and < at every boundary, so we must re-add them. Only the first node keeps its leading < and only the last node keeps its trailing >.
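To make the re-adding step concrete, here is a minimal sketch of the split plus bracket restoration. The helper name tokenize is hypothetical (it does not appear in the production component), and the sketch assumes the input starts with a tag and ends with a tag.

```typescript
// Hypothetical helper: split markup at tag boundaries, then restore the
// angle brackets that the split consumed.
function tokenize(html: string): string[] {
  const nodes = html.trim().split(/>\s*</);
  return nodes.map((node, i) => {
    const open = i === 0 ? '' : '<';                 // first chunk still has its own "<"
    const close = i === nodes.length - 1 ? '' : '>'; // last chunk still has its own ">"
    return open + node + close;
  });
}

const tokens = tokenize('<div><span>Text</span></div>');
console.log(tokens); // ["<div>", "<span>Text</span>", "</div>"]
```

Note that the span and its text stay together in one chunk: the split only fires where a > is followed (optionally after whitespace) by a <.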
Step 2: The Indentation Logic (Stack Counter) 📚
We use a simple integer indent to track depth.
- Start Tag (<div): indent++
- End Tag (</div): indent--
- Self-Closing (<img): indent (No change)
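Production Regex for Tags:
- Closing Tag: /^\/\w/ (Starts with /, like /div)
- Opening Tag: /^<?\w[^>]*[^/]$/ (Starts with a word character, doesn't end with /)
Note: as written, the opening-tag pattern needs at least two characters, so a bare single-letter tag such as <p> or <b> is not counted as opening. Making the tail optional, /^<?\w(?:[^>]*[^/])?$/, covers those as well.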
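The counter logic can be sketched as a small classifier plus a depth tracker. The names classify and indentLevels are hypothetical, and the opening-tag pattern here uses an optional tail so single-letter tags also match, which is a small variation on the pattern listed above. Void tags are deliberately ignored in this sketch; Step 3 deals with them.

```typescript
type NodeKind = 'close' | 'open' | 'other';

// Assumption: `node` is one chunk produced by html.split(/>\s*</), so it may
// lack its leading "<" and trailing ">".
function classify(node: string): NodeKind {
  if (/^\/\w/.test(node)) return 'close';               // e.g. "/div"
  if (/^<?\w(?:[^>]*[^/])?$/.test(node)) return 'open'; // e.g. "<div", "p", "a href='#'"
  return 'other';                                       // e.g. "span>Text</span", "br/"
}

// Returns the indent level each chunk would be printed at.
function indentLevels(html: string): number[] {
  let indent = 0;
  return html.trim().split(/>\s*</).map((node) => {
    const kind = classify(node);
    if (kind === 'close') indent = Math.max(0, indent - 1); // dedent before printing
    const level = indent;
    if (kind === 'open') indent++;                          // indent after printing
    return level;
  });
}

console.log(indentLevels('<div><p><b>x</b></p></div>')); // [0, 1, 2, 1, 0]
```

The ordering matters: a closing tag dedents before it is printed so it lines up with its opener, while an opening tag indents only after it is printed.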
Step 3: Handling "Void" Tags (The Edge Cases) ⚠️
Some tags look like opening tags but never close (<input>, <br>, <meta>).
If you indent on <input>, your code will drift right forever.
const VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'];
const isVoid = (tagName) => VOID_TAGS.includes(tagName);
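One detail worth spelling out: a chunk such as img src="a.png" carries attributes, so the tag name has to be extracted before it is looked up in the void list. The sketch below combines Steps 1 to 3 into a single pure function. The name beautify and its indentSize parameter are assumptions for illustration, not the production API.

```typescript
const VOID_TAGS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'param', 'source', 'track', 'wbr',
]);

// Hypothetical pure formatter: split, track depth, skip indenting for void tags.
function beautify(html: string, indentSize = 2): string {
  const tab = ' '.repeat(indentSize);
  const nodes = html.trim().split(/>\s*</);
  const lines: string[] = [];
  let indent = 0;

  nodes.forEach((node, i) => {
    // Closing chunk such as "/div": dedent first, never below zero.
    if (/^\/\w/.test(node)) indent = Math.max(0, indent - 1);

    // Restore the brackets that the split consumed.
    const open = i === 0 ? '' : '<';
    const close = i === nodes.length - 1 ? '' : '>';
    lines.push(tab.repeat(indent) + open + node + close);

    // Opening chunk: capture the tag name, then indent unless it is void.
    const opening = /^<?(\w+)(?:[^>]*[^/])?$/.exec(node);
    if (opening && !VOID_TAGS.has(opening[1].toLowerCase())) indent++;
  });

  return lines.join('\n');
}

console.log(beautify('<div><p>Hi</p><br><img src="a.png"></div>'));
// <div>
//   <p>Hi</p>
//   <br>
//   <img src="a.png">
// </div>
```

Without the void check, the br and img lines would each push everything after them one level further right.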
Step 4: The Minification Layer 📦
Minification is the opposite of beautification. We want to destroy whitespace.
But we can't just replace(/\s/g, '') because that kills spaces inside text (e.g., "Hello World" -> "HelloWorld").
Safe Minify Logic:
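- Collapse every run of whitespace (spaces, tabs, newlines) into a single space: .replace(/\s+/g, ' ')
- Remove the space between adjacent tags: .replace(/>\s+</g, '><')
Caveat: both steps also touch whitespace inside <pre>, <textarea> and inline <script> blocks, where it can be significant.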
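As a quick illustration, the two replacements can be wrapped in a standalone function. The name minify is hypothetical; the regexes are the ones listed above.

```typescript
// Hypothetical standalone version of the minification path.
function minify(html: string): string {
  return html
    .replace(/\s+/g, ' ')    // collapse runs of spaces, tabs and newlines
    .replace(/>\s+</g, '><') // drop the gap between adjacent tags
    .trim();
}

console.log(minify('<p>\n  Hello   World\n</p>\n<p> x </p>'));
// <p> Hello World </p><p> x </p>
```

The space inside "Hello World" survives, while the newline between the two paragraphs is removed. Whitespace directly after an opening tag or before a closing tag is reduced to one space rather than stripped, because neither regex matches a gap that has text on one side.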
Step 5: Advanced Features (Comments & Indent Size) ⚙️
Production tools need options.
- Remove Comments: regex <!--[\s\S]*?-->. The [\s\S] trick matches newlines too!
- Indent Size: Dynamic string repetition ' '.repeat(size).
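Both options are small enough to sketch in isolation. The helper names stripComments and indentUnit are hypothetical. The example also shows why a plain dot is not enough for multi-line comments.

```typescript
// Remove HTML comments. The lazy *? stops at the first "-->", so two separate
// comments are not merged into one giant match.
function stripComments(html: string): string {
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

// Build one indentation unit of the requested width.
function indentUnit(size: number): string {
  return ' '.repeat(size);
}

const sample = '<div><!-- line one\nline two --><p>x</p></div>';

console.log(stripComments(sample));                        // <div><p>x</p></div>
console.log(sample.replace(/<!--.*?-->/g, '') === sample); // true: "." stops at the newline
console.log(JSON.stringify(indentUnit(4)));                // "    "
```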
Step 6: The Full Production Code (Beautifier.tsx) 💻
Here is the complete, robust component used in our production environment.
'use client';

import { useState } from 'react';
import { Copy, Trash2, FileCode } from 'lucide-react';

// Void elements never take a closing tag, so they must not increase the indent (see Step 3).
const VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'param', 'source', 'track', 'wbr'];

export default function ProductionBeautifier() {
  const [input, setInput] = useState('');
  const [output, setOutput] = useState('');
  const [mode, setMode] = useState('beautify'); // 'beautify' | 'minify'
  const [indentSize, setIndentSize] = useState(2);
  const [stripComments, setStripComments] = useState(false);

  // --- CORE LOGIC START ---
  const processHTML = () => {
    if (!input.trim()) return;
    let code = input;

    // 1. Pre-processing (Comments)
    if (stripComments) {
      code = code.replace(/<!--[\s\S]*?-->/g, '');
    }
    // Trim so the first node starts with "<" and the last node ends with ">"
    code = code.trim();

    // 2. Minification Path
    if (mode === 'minify') {
      const minified = code
        .replace(/\s+/g, ' ') // Collapse whitespace
        .replace(/>\s+</g, '><') // Remove tag gaps
        .trim();
      setOutput(minified);
      return;
    }

    // 3. Beautification Path
    let formatted = '';
    let indent = 0;
    const tab = ' '.repeat(indentSize);

    code.split(/>\s*</).forEach(node => {
      // Decrease indent for closing tags (e.g. /div), never below zero
      if (/^\/\w/.test(node)) indent = Math.max(0, indent - 1);

      // Reconstruct tag with padding
      formatted += tab.repeat(indent) + '<' + node + '>\n';

      // Increase indent for opening tags (that are NOT void/self-closing)
      // Regex checks: starts with a word char, doesn't end with /
      // (the tail is optional so single-letter tags like <p> also match)
      const opening = node.match(/^<?(\w+)(?:[^>]*[^/])?$/);
      if (opening && !VOID_TAGS.includes(opening[1].toLowerCase())) {
        indent++;
      }
    });

    // Cleanup: drop the extra "<" before the first node and the extra ">\n" after the last
    setOutput(formatted.substring(1, formatted.length - 2));
  };
  // --- CORE LOGIC END ---

  // UI Helpers
  const copy = () => void navigator.clipboard.writeText(output);
  const clear = () => { setInput(''); setOutput(''); };

  return (
    <div className="max-w-4xl mx-auto p-6 space-y-6 bg-slate-50 border rounded-xl">
      <div className="flex justify-between items-center">
        <h2 className="text-2xl font-bold flex gap-2 items-center text-slate-800">
          <FileCode className="text-blue-600"/> HTML Engine
        </h2>
        {/* Controls */}
        <div className="flex gap-2">
          <select
            className="p-2 rounded border bg-white text-sm"
            value={mode} onChange={e => setMode(e.target.value)}
          >
            <option value="beautify">Beautify Mode</option>
            <option value="minify">Minify Mode</option>
          </select>
          <select
            className="p-2 rounded border bg-white text-sm"
            value={indentSize} onChange={e => setIndentSize(Number(e.target.value))}
            disabled={mode === 'minify'}
          >
            <option value={2}>2 Spaces</option>
            <option value={4}>4 Spaces</option>
          </select>
          <button
            onClick={() => setStripComments(!stripComments)}
            className={`p-2 rounded border text-sm ${stripComments ? 'bg-red-100 text-red-700 border-red-200' : 'bg-white'}`}
          >
            {stripComments ? 'No Comments' : 'Keep Comments'}
          </button>
        </div>
      </div>
      <div className="grid md:grid-cols-2 gap-4">
        {/* Input */}
        <div className="space-y-2">
          <div className="flex justify-between text-xs font-bold text-slate-500 uppercase">
            <span>Input HTML</span>
            <button onClick={clear} className="text-red-500 hover:text-red-700 flex gap-1 items-center">
              <Trash2 size={12}/> Clear
            </button>
          </div>
          <textarea
            className="w-full h-80 p-4 rounded-lg border border-slate-300 font-mono text-xs focus:ring-2 focus:ring-blue-500 outline-none"
            placeholder="Paste messy code..."
            value={input}
            onChange={e => setInput(e.target.value)}
          />
        </div>
        {/* Output */}
        <div className="space-y-2">
          <div className="flex justify-between text-xs font-bold text-slate-500 uppercase">
            <span>Result</span>
            <button onClick={copy} className="text-blue-600 hover:text-blue-800 flex gap-1 items-center">
              <Copy size={12}/> Copy
            </button>
          </div>
          <textarea
            className="w-full h-80 p-4 rounded-lg border border-slate-300 bg-slate-900 text-green-400 font-mono text-xs"
            readOnly
            value={output}
            placeholder="Clean code appears here..."
          />
        </div>
      </div>
      <button
        onClick={processHTML}
        className="w-full py-4 bg-gradient-to-r from-blue-600 to-indigo-600 text-white font-bold rounded-lg hover:shadow-lg transition transform hover:-translate-y-0.5"
      >
        {mode === 'beautify' ? '✨ Beautify HTML' : '📦 Minify HTML'}
      </button>
    </div>
  );
}
Step 7: Performance Considerations 🚀
Because the engine uses only built-in String methods (split, replace), each step is a single linear pass over the input.
It never creates DOM Nodes or an AST, which is why even large inputs (around 1MB of HTML) format quickly in practice.
However, if you paste invalid HTML (like missing brackets), the Regex split might behave unpredictably. This is why we call it a "Beautifier" (visual format only) and not a "Parser" (strict checking).
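If you want a rough sanity check before formatting, a stack-based tag matcher is one way to approach the "Verification (Tag Matching)" feature listed in the introduction. The sketch below is an assumption about how such a check could look, not the production implementation. The name checkTags is hypothetical, and the check is stricter than HTML itself, which allows some end tags (such as </li> or </p>) to be omitted.

```typescript
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
  'link', 'meta', 'param', 'source', 'track', 'wbr',
]);

// Hypothetical checker: returns a list of problems; an empty list means
// every opening tag found a matching closing tag.
function checkTags(html: string): string[] {
  const problems: string[] = [];
  const stack: string[] = [];
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)[^>]*?(\/?)>/g;
  const source = html.replace(/<!--[\s\S]*?-->/g, ''); // ignore tags inside comments

  let m: RegExpExecArray | null;
  while ((m = tagPattern.exec(source)) !== null) {
    const isClosing = m[1] === '/';
    const name = m[2].toLowerCase();
    const isSelfClosing = m[3] === '/';

    if (!isClosing) {
      if (!isSelfClosing && !VOID_ELEMENTS.has(name)) stack.push(name);
      continue;
    }

    const idx = stack.lastIndexOf(name);
    if (idx === -1) {
      problems.push(`Unexpected </${name}>`);
      continue;
    }
    // Anything opened after the matching tag was never closed.
    for (let i = stack.length - 1; i > idx; i--) problems.push(`Unclosed <${stack[i]}>`);
    stack.length = idx;
  }

  for (let i = stack.length - 1; i >= 0; i--) problems.push(`Unclosed <${stack[i]}>`);
  return problems;
}

console.log(checkTags('<div><p>Hi</p><br></div>')); // []
console.log(checkTags('<div><span></div>'));        // ["Unclosed <span>"]
```

Like the beautifier, this works on text patterns only, so a > inside an attribute value or a tag-like string inside a script can still confuse it.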