Below are some useful documents from Adobe related to the PDF file format:
  • PDF Reference version 1.7
    The complete PDF specifications, version 1.7. If you are enthusiastic enough to read the 1300 pages of this document, keep in mind that Adobe also provides a generous set of technical notes addressing specific topics not completely covered by these specifications. Some of these technical notes are more than 200 pages long.
  • Changes from PDF versions 1.3 to 1.4 (technical note #5409)
    Differences between versions 1.3 and 1.4 of the PDF file format. This is especially useful if you are interested in how password-protected files are encrypted (chapter 3.5).
  • Character map (CMAP) resources (technical note #5099)
    Character maps are substitution tables that associate characters referenced in page contents with their Unicode counterparts.
    Although character maps are described in the PDF specifications, this technical note gives many more details about how they should be generated and interpreted.
  • CID Fonts (technical note #5014)
    Character Identifier (CID) fonts are embedded, semi-embedded or internal fonts specific to Adobe.
    They associate a character ID with its glyph representation in different styles (italic, bold, etc.) and have been used for languages such as Chinese, Japanese or Korean.
    Unfortunately, since they were implemented before the Unicode standard emerged, it is impossible to programmatically determine their Unicode counterparts, because the fonts themselves carry no such information.
    Adobe has published several technical notes detailing the glyphs implemented by some CID fonts; some of them give the Unicode counterparts. The list below is not exhaustive:

    • Adobe-Japan1-6 Character Collection for CID-Keyed Fonts (technical note #5078)
      Character collection for the Adobe-Japan1 supplements 0 to 6.
      Includes their Unicode counterparts.
    • Adobe-GB1-5 Character Collection for CID-Keyed Fonts (technical note #5079)
      Character collection for the Adobe-GB1 supplements 0 to 5.
      Supports the GB 2312-80, GB 1988-89, GB/T 12345-90, GB 13000.1-93, and GB 18030-2005 character set standards.
      This document does not specify the Unicode counterparts of characters.
    • Adobe-CNS1-5 Character Collection for CID-Keyed Fonts (technical note #5080)
      Character collection for the Adobe-CNS1 supplements 0 to 5.
      Supports the Big Five and CNS 11643-1992 character set standards, the ETen extensions to Big Five, the Hong Kong Supplementary Character Set (Hong Kong SCS) in its extended form, plus a number of extensions to Big Five that contain more characters used in the Hong Kong locale.
      This document does not specify the Unicode counterparts of characters.
    • Adobe-Korea1-2 Character Collection for CID-Keyed Fonts (technical note #5093)
      Character collection for the Adobe-Korea1 supplements 0 to 2.
      This document does not specify the Unicode counterparts of characters.
    • Adobe-CJKV Character Collections for CID-Keyed Fonts (technical note #5094)
      Provides an overview of the Chinese, Japanese, Korean and Vietnamese (CJKV) character set standards supported by Adobe.
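
To make the idea of a character map as a substitution table more concrete, here is a minimal Python sketch. It assumes a ToUnicode-style map that uses only the "bfchar" and simple "bfrange" operators, with destinations limited to a single UTF-16 code unit in ranges; the sample map and the function names (parse_to_unicode, decode) are invented for this illustration and are not taken from the Adobe documents. A real interpreter has to handle more cases (array destinations, codespace ranges, usecmap, and so on).

```python
import re

# Hypothetical character map fragment, written by hand for this example.
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<0024> <0041>
endbfchar
1 beginbfrange
<0025> <0027> <0042>
endbfrange
"""

HEX = r"<([0-9A-Fa-f]+)>"


def _utf16be_hex_to_str(hex_digits):
    """Decode a hex string such as '0041' as UTF-16BE text."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_to_unicode(cmap_text):
    """Build a {character code: Unicode string} table from the map text."""
    table = {}

    # bfchar entries: one source code mapped to one destination string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            table[int(src, 16)] = _utf16be_hex_to_str(dst)

    # bfrange entries: consecutive source codes mapped to consecutive
    # destinations (assumption: the destination is one BMP code point).
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        triples = re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block)
        for low, high, dst in triples:
            start = int(dst, 16)
            codes = range(int(low, 16), int(high, 16) + 1)
            for offset, code in enumerate(codes):
                table[code] = chr(start + offset)

    return table


def decode(codes, table):
    """Substitute each character code; unknown codes become U+FFFD."""
    return "".join(table.get(code, "\ufffd") for code in codes)


table = parse_to_unicode(SAMPLE_CMAP)
text = decode([0x24, 0x03, 0x25, 0x26, 0x27], table)
```

With the sample map above, the codes 0x24, 0x03, 0x25, 0x26 and 0x27 are substituted one by one, and any code missing from the table is replaced by U+FFFD rather than raising an error.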
And here are a few useful links collected from elsewhere:
  • PDF Encryption
    A useful, although somewhat dated, explanation of how to decrypt a PDF file once you have the password.