MinerU-HTML
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
Category: AI Writing
Quality: 82/100
Primary source: GitHub
What is MinerU-HTML?
MinerU-HTML is a main-content extractor for HTML pages, powered by a small language model (SLM). It outputs clean HTML bodies and is aimed at Deep Research agents, RAG applications, and training-data generation.
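To illustrate what "main content extraction with a clean HTML body" means in practice, here is a minimal sketch in Python. It is not MinerU-HTML's API or method: MinerU-HTML uses a small language model, whereas this stand-in only drops subtrees under a fixed tag blacklist. The names `NaiveMainContentExtractor`, `extract_main_html`, and `BOILERPLATE_TAGS` are hypothetical and chosen for this example only.

```python
from html import escape
from html.parser import HTMLParser

# Assumption for this sketch: these tags never hold main content.
# A model-based extractor decides this per element instead of by tag name.
BOILERPLATE_TAGS = {"head", "script", "style", "nav", "header", "footer", "aside", "form"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class NaiveMainContentExtractor(HTMLParser):
    """Re-emits the page without boilerplate subtrees and without attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a boilerplate subtree

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0 and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Data arrives unescaped (convert_charrefs=True), so escape it again.
            self.parts.append(escape(data, quote=False))


def extract_main_html(page: str) -> str:
    """Return the page with blacklisted subtrees removed."""
    parser = NaiveMainContentExtractor()
    parser.feed(page)
    parser.close()
    return "".join(parser.parts).strip()
```

Calling `extract_main_html` on a page with a `<nav>`, an `<article>`, and a `<footer>` keeps only the article markup inside the `<html>` and `<body>` wrappers. A tag blacklist fails on pages that put boilerplate in generic `<div>` elements, which is the gap a learned extractor is meant to close.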
Why consider it
- MinerU-HTML is listed under the AI Writing category and tagged Copywriting, SEO, and Notes.
- The public repository has 265 stars, which gives buyers and builders an extra adoption signal.
- License metadata is available: Apache-2.0.
Source & verification
- Verified on Jun 30, 2026 from public source metadata.
- Primary reference: github.com.
- Repository freshness signal: last commit Mar 27, 2026.
Alternative tools
The API to search, scrape, and interact with the web at scale. 🔥
LlamaIndex is the leading document agent and OCR platform
A cross-platform Markdown AI note-taking software.
Related tools
Conversion from Excel to structured JSON (tables, shapes, charts) for LLM/RAG pipelines, and autonomous Excel reading/writing by AI agents via CLI and MCP integration.
LlamaIndex is the leading document agent and OCR platform
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No.