PDF to Markdown: extract structured text instantly

Upload a PDF file and get clean Markdown output. Headings, paragraphs, and lists detected automatically. Layout noise stripped.

  • Runs entirely in your browser
  • Smart detection of headings, paragraphs, and lists
  • Completely free

How it works

Three steps, no signup.

  1. Upload a PDF

    Drag and drop or click to select your PDF file.

  2. Smart parsing

    Font sizes analyzed to infer heading levels. Paragraphs and lists extracted.

  3. Copy or download

    Copy the Markdown in one click or download it as a .md file, then paste it into ChatGPT or Notion.
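Step 3 depends on turning the parsed structure into Markdown text. Below is a minimal TypeScript sketch of that serialization, assuming the parser emits a flat list of classified blocks. The `Block` type and the `toMarkdown` function are hypothetical names used for illustration, not FlowDoc's actual code.

```typescript
// Hypothetical block model (illustrative names, not FlowDoc's API):
// the parser is assumed to emit a flat list of classified blocks.
type Block =
  | { kind: "heading"; level: number; text: string } // level 1-4
  | { kind: "paragraph"; text: string }
  | { kind: "listItem"; ordered: boolean; text: string };

// Serialize blocks to Markdown: a blank line between blocks, but consecutive
// list items stay on adjacent lines so they form one tight list.
function toMarkdown(blocks: Block[]): string {
  const lines: string[] = [];
  let orderedIndex = 0; // running number within the current ordered list

  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    const prev = i > 0 ? blocks[i - 1] : undefined;
    const continuesList =
      block.kind === "listItem" && prev !== undefined && prev.kind === "listItem";
    if (i > 0 && !continuesList) lines.push(""); // separate blocks with a blank line

    switch (block.kind) {
      case "heading":
        lines.push("#".repeat(block.level) + " " + block.text);
        break;
      case "paragraph":
        lines.push(block.text);
        break;
      case "listItem": {
        // Restart numbering unless the previous block was also an ordered item.
        const continuesOrdered =
          block.ordered && prev !== undefined && prev.kind === "listItem" && prev.ordered;
        orderedIndex = continuesOrdered ? orderedIndex + 1 : 1;
        lines.push(block.ordered ? `${orderedIndex}. ${block.text}` : `- ${block.text}`);
        break;
      }
    }
  }
  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

Keeping consecutive items on adjacent lines produces a tight list; inserting blank lines between them would make it a loose list, which many renderers space out with paragraph margins.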

Features

Built with care for AI-era document delivery.

  • Smart structure detection

    Automatically infers heading levels from font size and weight. Detects lists and paragraph structures.

  • Layout noise cleanup

    Strips headers, footers, page numbers, and other redundant elements. Clean, readable output.

  • Client-side conversion

    Your file never leaves the browser. Works offline. No server round-trip.
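The exact cleanup rules are not described here. One common heuristic, sketched below in TypeScript under stated assumptions, treats a line as a running header or footer when it appears (ignoring digits) as the first or last line on at least half of the pages, and drops lines that consist only of a page number. The page-as-array-of-lines input and the `stripLayoutNoise` name are hypothetical.

```typescript
// Hypothetical input shape: one array of text lines per page, in reading order.
type Page = string[];

// Lines that are only a page number: "12", "- 12 -", "Page 3", "3 / 10", "Page 3 of 9".
const PAGE_NUMBER = /^(page\s+)?-?\s*\d+\s*-?(\s*(\/|of)\s*\d+)?$/i;

// Compare lines case-insensitively with digits masked, so "Page 1 of 3"
// and "Page 2 of 3" map to the same key.
function normalize(line: string): string {
  return line.trim().toLowerCase().replace(/\d+/g, "#");
}

// Assumed heuristic: a first or last line whose normalized text recurs at a page
// edge on at least `minShare` of the pages (and on at least 2 pages) is a
// running header/footer. Page-number-only lines and blank lines are dropped too.
function stripLayoutNoise(pages: Page[], minShare = 0.5): Page[] {
  const edgeCounts = new Map<string, number>();
  pages.forEach((lines) => {
    if (lines.length === 0) return;
    // A Set so a one-line page is not counted twice.
    const edges = new Set([normalize(lines[0]), normalize(lines[lines.length - 1])]);
    edges.forEach((key) => edgeCounts.set(key, (edgeCounts.get(key) ?? 0) + 1));
  });
  const threshold = Math.max(2, Math.ceil(pages.length * minShare));

  return pages.map((lines) =>
    lines.filter((line, i) => {
      const text = line.trim();
      if (text.length === 0 || PAGE_NUMBER.test(text)) return false;
      const atEdge = i === 0 || i === lines.length - 1;
      return !(atEdge && (edgeCounts.get(normalize(line)) ?? 0) >= threshold);
    })
  );
}
```

Masking digits before comparing is what lets numbered footers collapse into one key. The trade-off is that a body line consisting of a lone number, such as a year on its own line, is dropped as well.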

Frequently asked questions

Still curious? Email us at mafk35444@gmail.com

  • FlowDoc uses Mozilla's open-source PDF.js engine to parse PDF text streams directly in the browser. For each text block, it extracts metadata including font size, font name, and coordinates. A statistical analysis of the font-size distribution determines the body-text baseline, and text significantly larger than that baseline is classified as a heading (H1-H4). Bullet symbols and numbered prefixes are detected to identify list structures.

  • The current version supports only PDFs with a selectable text layer (native or electronic PDFs). Scanned documents and image-only PDFs need OCR processing first. OCR integration is on our development roadmap; for now, we recommend using an OCR tool such as Adobe Acrobat to convert scanned PDFs into searchable PDFs before importing them into FlowDoc.

  • FlowDoc extracts all visible text content from the PDF, so no textual information is lost. However, PDF is fundamentally a visual presentation format rather than a structured one, so heading levels are inferred heuristically from font sizes and may need manual adjustment for non-standard layouts. For standard business documents, academic papers, and technical reports, detection accuracy is very high.

  • Absolutely not. Your file is never uploaded: the entire PDF parsing process runs locally in your browser. The PDF.js engine operates within the browser's JavaScript sandbox, decompressing and parsing the PDF binary stream, and our structural algorithm then converts the extracted text to Markdown. No data ever leaves your device, and conversion works even in a completely offline environment.
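The heading and list heuristics described in these answers can be sketched as follows. This is an illustration, not FlowDoc's algorithm. It assumes text items have already been merged into lines with a single font size each (with PDF.js, a size can be approximated from a text item's `transform` matrix), takes the body baseline to be the size that carries the most characters, and uses an arbitrary threshold of 1.15 times the baseline for headings. `Line`, `Block`, `classify`, and `HEADING_RATIO` are hypothetical names.

```typescript
// Hypothetical input: text items already merged into lines, one font size per line.
// (With PDF.js, a size can be approximated from each text item's transform matrix.)
interface Line {
  text: string;
  fontSize: number;
}

type Block =
  | { kind: "heading"; level: number; text: string }
  | { kind: "listItem"; ordered: boolean; text: string }
  | { kind: "paragraph"; text: string };

// Assumed threshold: 15% above the body size counts as "significantly larger".
const HEADING_RATIO = 1.15;
const MAX_HEADING_LEVEL = 4; // H1-H4
const BULLET = /^[•◦▪\-*]\s+/; // bullet symbol followed by whitespace
const NUMBERED = /^\d+[.)]\s+/; // "1. " or "1) "

// Bucket sizes to 0.1 pt so tiny rendering differences do not split a size.
const bucket = (size: number): number => Math.round(size * 10) / 10;

// Body baseline: the font size that carries the most characters (a weighted mode).
function bodyFontSize(lines: Line[]): number {
  const chars = new Map<number, number>();
  let best = 0;
  let bestChars = -1;
  lines.forEach((line) => {
    const size = bucket(line.fontSize);
    const total = (chars.get(size) ?? 0) + line.text.length;
    chars.set(size, total);
    if (total > bestChars) {
      best = size;
      bestChars = total;
    }
  });
  return best;
}

function classify(lines: Line[]): Block[] {
  const minHeading = bodyFontSize(lines) * HEADING_RATIO;
  // Distinct heading sizes, largest first: index 0 -> H1, index 1 -> H2, ...
  const headingSizes = lines
    .map((line) => bucket(line.fontSize))
    .filter((size, i, all) => size >= minHeading && all.indexOf(size) === i)
    .sort((a, b) => b - a);

  return lines.map((line): Block => {
    const text = line.text.trim();
    const size = bucket(line.fontSize);
    if (size >= minHeading) {
      // Sizes beyond the fourth distinct heading size all collapse into H4.
      const level = Math.min(headingSizes.indexOf(size) + 1, MAX_HEADING_LEVEL);
      return { kind: "heading", level, text };
    }
    if (BULLET.test(text)) return { kind: "listItem", ordered: false, text: text.replace(BULLET, "") };
    const numbered = NUMBERED.exec(text);
    if (numbered) return { kind: "listItem", ordered: true, text: text.slice(numbered[0].length) };
    return { kind: "paragraph", text };
  });
}
```

Weighting by character count rather than by line count keeps a few large, short headings from outvoting the body text, and ranking distinct sizes (rather than fixed ratios per level) lets the same code handle documents with different type scales.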