Latest versions

PdfToText 1.6.7
(release date : 2017/05/31)
Changes :
  • Added CID fonts
  • Changed the way CID font maps are searched and handled
Download :
PdfToText 1.6.6
(release date : 2017/05/22)
Changes :
  • Completely rebuilt the page layout rendering algorithm
  • Fixed an issue where some character widths were not correctly extracted when the widths list contained line breaks.
  • Fixed an issue where a character map was sometimes instantiated with the wrong parameter.
  • Correctly handle character widths for characters defined by CharProcs (i.e., characters for which the only information we have is how to draw the glyph, with no Unicode equivalent) and for the corresponding character names that may have been passed to the AddAdobeExtraMappings() method.
  • Properly decode sequences of hex digits when there is no current font applicable.
  • Completed the Unicode to Ansi character map
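
The widths-list fix above concerns lists whose entries are split over several lines. A minimal sketch of line-break-tolerant parsing follows; parseWidths() is a hypothetical helper written for illustration and is not taken from the PdfToText source.

```php
<?php
// Hypothetical helper (not a PdfToText method): parse the contents of a
// PDF widths array such as "[ 500 600\n 700 ]" into a list of numbers.
function parseWidths(string $array): array
{
    // Strip the surrounding brackets and any outer whitespace.
    $inner = trim($array, "[] \t\r\n");
    if ($inner === '') {
        return [];
    }
    // Split on any run of whitespace, so that line breaks between
    // entries are treated exactly like spaces.
    return array_map('floatval', preg_split('/\s+/', $inner));
}

print_r(parseWidths("[ 500 600\n 700 ]"));
```

Splitting on `\s+` rather than on a single space is what makes the line break harmless here.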
Download :
PdfToText 1.6.5
(release date : 2017/05/20)
Changes :
  • Extended the Unicode to Ansi mapping table.
  • Added the AddAdobeExtraMappings() method, to complement the standard Adobe character maps when given character names refer to a glyph that has no Unicode equivalent.
  • Added the MarkTextLike() method to mark certain portions of text based on their font name and size.
  • Changed the GetCaptures() method to return by default a collection of stdClass objects instead of PdfToText objects, whose contents take a long time to display with the print_r() function.
    When set to true, the new boolean parameter $full makes the method return PdfToText objects instead.
Download :
PdfToText 1.6.4
(release date : 2017/05/18)
Changes :
  • Template expansion was performed using preg_replace(). When the template data contained strings such as '\00' or '$something', preg_replace() tried to replace them with the corresponding capture in the search pattern, which caused the result to be garbled. The PregStrReplace() method has been added to avoid such issues.
  • Fixed an issue where the file was searched for in the wrong directory.
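
The preg_replace() problem described above can be reproduced in a few lines. This is only a sketch of the pitfall and of one common way around it (a callback that returns the replacement verbatim). The template, the data and the pattern are made up for illustration, and the actual implementation of PregStrReplace() is not shown here.

```php
<?php
// Hypothetical template and data, for illustration only.
$template = 'Price: %value%';
$data     = 'costs $1 or \\0';   // the literal text: costs $1 or \0

// preg_replace() treats "$1" and "\0" in the replacement string as
// references to captures of the search pattern, so the data is garbled.
$garbled = preg_replace('/%(value)%/', $data, $template);
echo $garbled, "\n";   // Price: costs value or %value%

// A callback's return value is inserted as-is, with no reference expansion.
$safe = preg_replace_callback(
    '/%(value)%/',
    function (array $match) use ($data): string {
        return $data;
    },
    $template
);
echo $safe, "\n";      // Price: costs $1 or \0
```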
Download :
PdfToText 1.6.3
(release date : 2017/05/17)
Changes :
  • Changed the $CharacterClasses table, which was causing some constructs, such as T*, not to be recognized as a single instruction.
  • Fixed a decoding bug when a series of hex digits enclosed in '<>' also contained spaces and newlines (which should have been ignored).
  • Allow names in the /Differences array to use the '#xy' notation, where 'xy' are hex digits.
  • Significantly extended the character maps for the four Adobe predefined character sets.
  • Text captures : changed the behavior of the definitions. Captured areas are now accessible by their page number instead of by a sequential index starting from zero. A capture is defined for each page of the document, even if the page is not in the list of applicable pages, so the list contains an empty capture for every page on which nothing was captured.
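
The two decoding changes above (whitespace inside '<>' hex strings and the '#xy' notation in names) follow rules from the PDF syntax. A minimal sketch of both follows; decodeHexString() and decodeName() are hypothetical helpers written for illustration and are not PdfToText methods.

```php
<?php
// Hypothetical helper: decode a PDF hex string such as "<48 65\n6C6C 6F>"
// into raw bytes.
function decodeHexString(string $token): string
{
    $digits = trim($token, '<>');
    // Spaces and newlines between hex digits carry no meaning.
    $digits = preg_replace('/\s+/', '', $digits);
    // PDF rule: if the final digit is missing, it is assumed to be 0.
    if (strlen($digits) % 2 === 1) {
        $digits .= '0';
    }
    return (string) hex2bin($digits);
}

// Hypothetical helper: expand '#xy' escapes in a PDF name,
// where 'xy' are two hex digits (for example "#20" is a space).
function decodeName(string $name): string
{
    return preg_replace_callback(
        '/#([0-9A-Fa-f]{2})/',
        function (array $match): string {
            return chr((int) hexdec($match[1]));
        },
        $name
    );
}

echo decodeHexString("<48 65\n6C6C 6F>"), "\n";   // Hello
echo decodeName('/A#20B'), "\n";                  // /A B
```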
Download :
Archived versions