When PDF data needs to be in a spreadsheet
PDFs are good for reading, but awkward when you need the same fields somewhere else. A CSV gives you a table you can sort, filter, review, or import. Common cases include:
- Monthly invoices where you need vendor, date, invoice number, and amount in one table.
- Statements where you want account names, statement dates, balances, or summary totals in a tracking sheet.
- Application forms or certificates where specific fields need to go into a database or CRM.
- Research documents where you collect titles, dates, authors, or reference IDs across many files.
Copying these fields by hand is slow and error-prone. Extracting them in one pass gives you a clean review table before you commit the data anywhere else.
What fields can be extracted?
The tool reads selectable text from each PDF and looks for common patterns:
- Dates: looks for common formats such as 2024-05-18, 05/18/2024, and 18 May 2024. Detected dates are normalized to ISO 8601 format (YYYY-MM-DD).
- Amounts and totals: looks for labels such as Total, Amount Due, and Balance Due. Currency symbols are preserved when present.
- Names and identifiers: invoice numbers, reference codes, vendor names, account numbers, and the first visible lines of text.
- Custom fields via regex: write your own pattern, for example POL-\d{6} for a policy number, or a pattern that captures the project code after "Project ID:".
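To make the date, amount, and custom-regex rules more concrete, here is a minimal Python sketch of how such rules can work. It is not the tool's code: the patterns, the function names (`find_date`, `find_amount`, `apply_custom_rule`), and the assumption that 05/18/2024 is month/day order are all illustrative.

```python
import re
from datetime import datetime

# Illustrative patterns only; not the tool's actual rule set.
DATE_PATTERNS = [
    (re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"), "%Y-%m-%d"),          # 2024-05-18
    (re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"), "%m/%d/%Y"),          # 05/18/2024 (assumes month/day)
    (re.compile(r"\b(\d{1,2} [A-Z][a-z]+ \d{4})\b"), "%d %B %Y"),  # 18 May 2024 (English month names)
]

# A label, an optional colon, then an amount with an optional currency symbol.
AMOUNT_RE = re.compile(
    r"\b(?:Total|Amount Due|Balance Due)\b\s*:?\s*([$€£]?\s?\d[\d,]*(?:\.\d{2})?)",
    re.IGNORECASE,
)


def find_date(text):
    """Try each date pattern in turn; return the first match that parses, as YYYY-MM-DD."""
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                return datetime.strptime(match.group(1), fmt).date().isoformat()
            except ValueError:
                continue  # right shape, but not a real date (e.g. 13/45/2024)
    return None


def find_amount(text):
    """Return the amount after the first matching label, keeping any currency symbol."""
    match = AMOUNT_RE.search(text)
    return match.group(1).strip() if match else None


def apply_custom_rule(text, pattern):
    """Return capture group 1 if the pattern has one, otherwise the whole match."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.group(1) if match.groups() else match.group(0)


sample = "Invoice INV-1042\nDate: 18 May 2024\nPolicy POL-123456\nProject ID: ALPHA-7\nTotal: $1,299.00"
print(find_date(sample))                                  # 2024-05-18
print(find_amount(sample))                                # $1,299.00
print(apply_custom_rule(sample, r"POL-\d{6}"))            # POL-123456
print(apply_custom_rule(sample, r"Project ID:\s*(\S+)"))  # ALPHA-7
```

The `\b` word boundaries keep a label like "Subtotal" from matching the Total rule, and a date that has the right shape but fails to parse is skipped instead of being exported.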
Each PDF becomes one row and each extraction rule becomes one column. The export also adds columns for the original file name, the generated file name, and any warnings.
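The one-row-per-PDF layout can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: `build_row`, `regex_rule`, and `generated_name` are hypothetical names, the date_vendor_invoice-number naming template is inferred from the example CSV output on this page, and the sketch only warns when a PDF has no text (whether the tool also warns about individual missing fields is not stated here).

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical rule type: takes the PDF's text, returns a value or None.
Rule = Callable[[str], Optional[str]]


def regex_rule(pattern: str) -> Rule:
    """Build a rule that returns capture group 1 of the first match."""
    compiled = re.compile(pattern)

    def rule(text: str) -> Optional[str]:
        match = compiled.search(text)
        return match.group(1) if match else None

    return rule


def generated_name(original: str, row: Dict[str, str]) -> str:
    """Join date, vendor and invoice number; keep the original name if all are empty."""
    parts = [row.get(key, "") for key in ("date", "vendor", "invoice_number")]
    parts = [part for part in parts if part]
    return "_".join(parts) + ".pdf" if parts else original


def build_row(file_name: str, text: str, rules: Dict[str, Rule]) -> Dict[str, str]:
    """One PDF -> one row: original name, one column per rule, generated name, warnings."""
    has_text = bool(text.strip())
    row = {"original_file_name": file_name}
    for column, rule in rules.items():
        value = rule(text) if has_text else None
        row[column] = value or ""  # missing values become empty cells
    row["new_file_name"] = generated_name(file_name, row)
    row["warnings"] = "" if has_text else "No text found"
    return row


rules = {
    "date": regex_rule(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "vendor": regex_rule(r"From:\s*(\w+)"),
    "invoice_number": regex_rule(r"\b(INV-\d+)"),
    "amount": regex_rule(r"Total:\s*(\S+)"),
}
invoice_text = "From: AcmeCorp\nInvoice INV-1042\nDate: 2024-05-18\nTotal: $1,299.00"
print(build_row("download.pdf", invoice_text, rules)["new_file_name"])  # 2024-05-18_AcmeCorp_INV-1042.pdf
print(build_row("scan_may.pdf", "", rules)["warnings"])                 # No text found
```

Because Python dictionaries keep insertion order, the columns come out in the order original name, rule columns, generated name, warnings.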
How the export works
- Drop your PDFs: text is read locally from each file in your browser.
- Choose fields: use built-in rules, custom regex, or both.
- Review rows: the preview table shows every PDF as a row with extracted fields as columns.
- Edit cells: fix missing or incorrect values before exporting.
- Download CSV: open the file in Excel, Google Sheets, Numbers, or another spreadsheet tool.
Your original PDFs are not changed. The CSV is generated in your browser.
Example CSV output
For a folder of invoices, the exported CSV might look like this:
| original_file_name | date | vendor | invoice_number | amount | new_file_name | warnings |
|---|---|---|---|---|---|---|
| download.pdf | 2024-05-18 | AcmeCorp | INV-1042 | $1,299.00 | 2024-05-18_AcmeCorp_INV-1042.pdf | |
| invoice-3.pdf | 2024-05-20 | GlobalTech | INV-2087 | $450.00 | 2024-05-20_GlobalTech_INV-2087.pdf | |
| scan_may.pdf | | | | | scan_may.pdf | No text found |
The last row shows what happens when a PDF has no selectable text: the field cells stay empty, the generated file name falls back to the original name, and the warnings column says why. This makes such rows easy to spot during review.
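For a look at what the exported file contains at the character level, here is a minimal Python sketch that writes the example rows from the table with the standard `csv` module. It illustrates the CSV format only and is not the tool's browser code; `to_csv` and `write_csv` are hypothetical names.

```python
import csv
import io

COLUMNS = ["original_file_name", "date", "vendor", "invoice_number",
           "amount", "new_file_name", "warnings"]

# The example rows; keys that are absent become empty cells via restval="".
rows = [
    {"original_file_name": "download.pdf", "date": "2024-05-18", "vendor": "AcmeCorp",
     "invoice_number": "INV-1042", "amount": "$1,299.00",
     "new_file_name": "2024-05-18_AcmeCorp_INV-1042.pdf", "warnings": ""},
    {"original_file_name": "invoice-3.pdf", "date": "2024-05-20", "vendor": "GlobalTech",
     "invoice_number": "INV-2087", "amount": "$450.00",
     "new_file_name": "2024-05-20_GlobalTech_INV-2087.pdf", "warnings": ""},
    {"original_file_name": "scan_may.pdf", "new_file_name": "scan_may.pdf",
     "warnings": "No text found"},
]


def to_csv(rows, columns):
    """Serialize rows to CSV text; fields containing commas or quotes are quoted."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_csv(path, rows, columns):
    """Write the same data to disk as UTF-8; newline="" lets csv control line endings."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(rows)


print(to_csv(rows, COLUMNS))
```

In the output, `$1,299.00` is wrapped in quotes because it contains a comma, an empty warnings cell becomes a trailing comma, and the scan_may.pdf row reads `scan_may.pdf,,,,,scan_may.pdf,No text found`.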
Difference between CSV export and renaming
The PDF renaming tool generates new file names from PDF content and downloads renamed copies. CSV export keeps the files as they are and outputs extracted fields as a spreadsheet instead.
You can use both in the same workflow: export the CSV for your records, then rename the files for your folder.
Privacy
Your PDF files and PDF content are not uploaded to our server. Text extraction and CSV generation happen in your browser. The CSV file is created locally for download.
Limitations
- Scanned PDFs: image-only PDFs without selectable text will not produce extracted fields. The preview shows warnings for these rows. See the scanned PDF page for details.
- Unstructured documents: in PDFs without consistent field labels or positions, text may come out in an unexpected order, so fields can be missed or picked up from the wrong place. Use the preview to check and correct values before exporting.
- Complex tables: multi-column tables, merged cells, or nested layouts inside PDFs may not map cleanly to CSV columns. For documents that consist mostly of tables, a dedicated table-extraction tool may be more reliable.
- Password-protected PDFs: locked files need to be unlocked before the tool can read them.
FAQ
What format is the CSV in?
Standard comma-separated values with UTF-8 encoding. It opens in Excel, Google Sheets, Numbers, and most other spreadsheet tools.
Can I choose which columns to include?
Yes, through the extraction rules you choose: each rule adds one field column. The export also includes the original file name, generated file name, and warnings columns for review.
What if a field is missing in some PDFs?
The preview table shows empty cells for missing fields. You can edit them manually before exporting, or leave them empty in the CSV.
Does this work with non-English documents?
PDF text extraction can work with many languages when the text layer is selectable, but automatic field detection is strongest for common Latin-script patterns.
Is this free to use?
The current browser tool is free to use and does not require signup.
Ready to extract PDF data?
Drop your PDFs, preview the extracted fields in a table, and download a CSV. Your PDF content is not uploaded.