PdfToText is a lightweight solution contained in a single PHP source file; its purpose is to extract text from your PDF files.

Written in pure PHP, the PdfToText class does not depend on tools that are available only as external binary packages. This is a real time-saver on shared servers, where you usually cannot install such tools: the class itself requires no installation and no configuration.

The PdfToText class currently supports the following features:
  • The text contents of a PDF file are available as a whole, but also as individual pages, using either the Text string property or the Pages array property
  • You can capture areas of text, and even lines and columns, based on an XML capture definition
  • Form data can be extracted separately
  • You can use class methods to search for text within your document and retrieve the corresponding page
  • You can extract JPEG images from your PDF input and save them to external files
  • The class has been carefully optimized to reduce both memory usage and execution time
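
As a rough illustration of the Text and Pages properties listed above, here is a minimal sketch. It assumes that the class file is named PdfToText.phpclass, that the constructor accepts a file name, and that a file called sample.pdf exists next to the script; none of this is stated on this page, so check the documentation before relying on it. The findPagesContaining() helper is hypothetical: it is a plain loop over the Pages array, not one of the search methods provided by the class.

```php
<?php
// Hypothetical helper (NOT part of PdfToText): returns the keys of the
// pages whose text contains $needle, compared case-insensitively.
function findPagesContaining(array $pages, string $needle): array
{
    $matches = [];
    foreach ($pages as $pageNumber => $pageText) {
        if (stripos($pageText, $needle) !== false) {
            $matches[] = $pageNumber;
        }
    }
    return $matches;
}

// Assumed setup: the class file and a sample PDF sit next to this script.
$classFile = __DIR__ . '/PdfToText.phpclass';
$pdfFile   = __DIR__ . '/sample.pdf';

if (is_file($classFile) && is_file($pdfFile)) {
    require_once $classFile;

    // Assumption: the constructor takes the name of the PDF file to load.
    $pdf = new PdfToText($pdfFile);

    // The whole document as a single string.
    echo $pdf->Text, "\n";

    // The same contents, page by page.
    foreach ($pdf->Pages as $pageNumber => $pageText) {
        printf("Page %s: %d bytes\n", $pageNumber, strlen($pageText));
    }

    // Which pages mention the word "invoice"?
    print_r(findPagesContaining($pdf->Pages, 'invoice'));
}
```

The helper only needs an array of page texts, so it can also be tried without any PDF at hand, for example with [1 => 'Hello world', 2 => 'Invoice total'].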

And more to come:
  • Handling of CID fonts (which were designed by Adobe before the Unicode standard emerged); currently experimental
  • Handling of additional image formats (CCITT Fax, etc.)

Try the online demo page, then take a look at the documentation. And finally, dare to download the class!
Latest news
Version 1.6.7 has been released
(2017/05/31)
Check the Downloads section!
Hot news!
  • Capture areas of text
  • Capture lines/columns
  • Extract form data
  • Basic page layout rendering
  • RTL language processing
  • CID fonts (experimental)