PDF to Markdown, extract structured text instantly
Upload a PDF file and get clean Markdown output. Headings, paragraphs, and lists detected automatically. Layout noise stripped.
- Runs entirely in your browser
- Smart detection of headings, paragraphs, lists
- Completely free
How it works
Three steps, no signup.
1. Upload a PDF
Drag and drop or click to select your PDF file.
2. Smart parsing
Font sizes are analyzed to infer heading levels, and paragraphs and lists are extracted.
3. Copy or download
Copy the Markdown with one click, or download it as a .md file. Paste it into ChatGPT or Notion.
Features
Built with care for AI-era document delivery.
Smart structure detection
Automatically infers heading levels from font size and weight. Detects lists and paragraph structures.
Layout noise cleanup
Strips headers, footers, page numbers, and other redundant elements. Clean, readable output.
Client-side conversion
Your file never leaves the browser. Works offline. No server round-trip.
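To illustrate the layout noise cleanup feature described above, here is a minimal sketch in Python of one common approach: treat a line as noise when it repeats near the top or bottom of most pages. This is not FlowDoc's actual code (the tool runs as JavaScript in the browser). The function names `normalize` and `strip_layout_noise`, and the `edge` and `min_share` parameters, are hypothetical.

```python
import re
from collections import Counter


def normalize(line: str) -> str:
    """Collapse digits so 'Page 3 of 10' and 'Page 4 of 10' compare equal."""
    return re.sub(r"\d+", "#", line.strip().lower())


def strip_layout_noise(pages, edge=2, min_share=0.6):
    """Drop lines that repeat near the top or bottom of most pages.

    pages: list of pages, each a list of text lines in reading order.
    edge: how many lines at the top and bottom of a page to inspect (assumed).
    min_share: share of pages a line must appear on to count as noise (assumed).
    """
    counts = Counter()
    for lines in pages:
        # Count each candidate once per page, even if it appears twice.
        counts.update({normalize(l) for l in lines[:edge] + lines[-edge:]})

    # Require at least two occurrences so single-page documents are untouched.
    threshold = max(2, min_share * len(pages))
    noise = {key for key, n in counts.items() if n >= threshold}

    cleaned = []
    for lines in pages:
        kept = []
        for i, line in enumerate(lines):
            at_edge = i < edge or i >= len(lines) - edge
            if at_edge and normalize(line) in noise:
                continue  # repeated header, footer, or page number
            kept.append(line)
        cleaned.append(kept)
    return cleaned
```

Only lines at the page edges are removed, so a sentence in the body that happens to match a running header is kept.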
Frequently asked questions
Still curious? Email us at mafk35444@gmail.com
FlowDoc uses Mozilla's open-source PDF.js engine to parse PDF text streams directly in the browser. The system extracts metadata for each text block including font size, font name, and coordinates. Through statistical analysis of font size distribution, it automatically determines the body text baseline and identifies text significantly larger than the baseline as headings (H1-H4). It also detects bullet symbols and numbered prefixes to identify list structures.
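The heuristic described above can be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not FlowDoc's implementation: the `Block` type, the 1.15 size ratio, and the rule "largest heading size becomes H1" are all hypothetical choices, and font weight and coordinates are ignored for brevity.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    font_size: float


# Bullet symbols and numbered prefixes such as "1." or "2)".
BULLET = re.compile(r"^\s*[•\-\*▪◦·]\s+(.*)$")
NUMBERED = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")


def body_baseline(blocks):
    """Return the font size that carries the most characters (the body text)."""
    weights = Counter()
    for b in blocks:
        weights[round(b.font_size, 1)] += len(b.text)
    return weights.most_common(1)[0][0]


def to_markdown(blocks, ratio=1.15, max_level=4):
    """Map blocks to Markdown lines; `ratio` and `max_level` are assumed values."""
    if not blocks:
        return ""
    base = body_baseline(blocks)
    # Distinct sizes clearly above the baseline, largest first: H1, H2, ...
    sizes = sorted(
        {round(b.font_size, 1) for b in blocks if round(b.font_size, 1) >= base * ratio},
        reverse=True,
    )
    level_of = {size: min(i + 1, max_level) for i, size in enumerate(sizes)}

    out = []
    for b in blocks:
        text = b.text.strip()
        size = round(b.font_size, 1)
        if size in level_of:
            out.append("#" * level_of[size] + " " + text)
        elif m := BULLET.match(text):
            out.append("- " + m.group(1))
        elif m := NUMBERED.match(text):
            out.append(f"{m.group(1)}. {m.group(2)}")
        else:
            out.append(text)
    return "\n".join(out)
```

Weighting the baseline by character count rather than by block count keeps a document with many short headings from mistaking the heading size for body text.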
The current version only supports PDFs with selectable text layers (native or electronic PDFs). For scanned documents or image-only PDFs, OCR processing is required first. We have OCR integration on our development roadmap. For now, we recommend using OCR tools like Adobe Acrobat to convert scanned PDFs to searchable PDFs before importing into FlowDoc.
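A quick way to tell whether a PDF needs OCR first is to check how much text can be extracted from each page. The sketch below is a hypothetical Python check, not part of FlowDoc: the `has_text_layer` name and both thresholds are assumptions, and the per-page text is expected to come from whatever PDF text extractor you already use.

```python
def has_text_layer(page_texts, min_chars=20, min_share=0.5):
    """Guess whether a PDF has a selectable text layer.

    page_texts: one extracted text string per page.
    min_chars: minimum characters for a page to count as having text (assumed).
    min_share: share of pages that must have text (assumed).
    """
    if not page_texts:
        return False
    pages_with_text = sum(1 for t in page_texts if len(t.strip()) >= min_chars)
    return pages_with_text / len(page_texts) >= min_share
```

A scanned document usually yields empty or near-empty strings for every page, so it would return `False` and should go through OCR before conversion.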
FlowDoc extracts all visible body text from the PDF; only repeated layout elements such as headers, footers, and page numbers are stripped. However, because PDF is fundamentally a visual presentation format rather than a structured one, heading levels are inferred heuristically from font sizes and may need manual adjustment for non-standard layouts. Detection works best on conventionally formatted documents such as standard business documents, academic papers, and technical reports.
No, your file is never uploaded. The entire PDF parsing process runs locally in your browser. The PDF.js engine operates within the browser's JavaScript sandbox, decompressing and parsing the PDF binary stream, and our structural algorithm then converts the extracted text to Markdown. No data leaves your device, and conversion works even without a network connection.