PdfToText is a lightweight solution contained in a single PHP source file; its purpose is to extract text from your PDF files.
Written in pure PHP, the PdfToText class does not depend on tools that are only available as external binary
packages. This spares you trouble on shared servers, since no installation or configuration is required.
The class currently supports the following features:
- The text contents of a PDF file are available as a whole or as individual pages, using either the Text string property or the Pages array property
- You can capture areas of text, and even report lines/columns, based on a Capture XML definition
- Form data can be extracted separately
- You can use class methods to search for text within your document and retrieve the corresponding page
- You can extract JPEG images from your PDF input and save them to external files
- The class has been carefully optimized to reduce both memory usage and execution time
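
As a rough illustration of the Text and Pages properties listed above, the sketch below looks up which pages contain a given string. It rests on several assumptions: that Pages maps page numbers to page text, that the class file is named PdfToText.phpclass, and that the constructor accepts a file name. The helper findPagesContaining and the file name sample.pdf are hypothetical and not part of the class, and the class's own search methods are deliberately not used here.

```php
<?php
/**
 * Hypothetical helper (not part of PdfToText): returns the page numbers
 * whose text contains $needle, compared case-insensitively.
 * Assumes $pages maps page number => page text.
 */
function findPagesContaining(array $pages, string $needle): array
{
    $matches = [];
    foreach ($pages as $pageNumber => $pageText) {
        // An empty needle never matches, which keeps stripos() well-defined
        if ($needle !== '' && stripos($pageText, $needle) !== false) {
            $matches[] = $pageNumber;
        }
    }
    return $matches;
}

// Usage with the class, guarded so the script still runs when the files are absent.
// Assumptions: the class file name, a constructor taking a file name, and 'sample.pdf'.
if (is_file(__DIR__ . '/PdfToText.phpclass') && is_file(__DIR__ . '/sample.pdf')) {
    require_once __DIR__ . '/PdfToText.phpclass';
    $pdf = new PdfToText(__DIR__ . '/sample.pdf');
    echo $pdf->Text;                                       // text of the whole document
    print_r(findPagesContaining($pdf->Pages, 'invoice'));  // pages mentioning "invoice"
}
```

Keeping the lookup in a plain function that works on an array means it can be tried without a PDF at hand, for example with `findPagesContaining([1 => 'Hello', 2 => 'World'], 'world')`.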
And more to come:
- Handling of CID fonts (which were designed by Adobe before the Unicode standard emerged). Currently experimental.
- Handling of additional image formats (CCITT fax, etc.)
Have a try with the online demo
page, then take a look at the
one. And finally, dare to try a download