Identify corrupt lines in the armoured GPG payload

RJA · January 22, 2024, 3:51pm

I have certain information printed on paper as GPG-encrypted, armoured text. In essence, it is an array of lines, 64 characters each, limited to the following characters:

0123456789+=/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
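As a cheap first pass, the layout described above can be checked mechanically. The following is a minimal sketch, assuming the payload lines have already been separated from the armour header and the trailing checksum line; `check_layout` and its `width` parameter are illustrative names, not part of any GnuPG tooling. It cannot catch confusions between characters that are all valid Base64 (such as `l`, `1` and `I`, or `O` and `0`), only stray characters and wrong line lengths.

```python
import string

# The Base64 alphabet; "=" is only legal as padding at the very end.
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/")

def check_layout(lines, width=64):
    """Return (line_number, reason) pairs for lines that break the expected layout."""
    problems = []
    for number, line in enumerate(lines, start=1):
        is_last = number == len(lines)
        # Trailing "=" is stripped first, so only misplaced "=" counts as stray.
        stray = sorted(set(line.rstrip("=")) - BASE64_CHARS)
        if stray:
            problems.append((number, "unexpected characters: " + "".join(stray)))
        elif "=" in line and not is_last:
            problems.append((number, "padding before the last line"))
        elif len(line) != width and not is_last:
            problems.append((number, f"{len(line)} characters instead of {width}"))
    return problems
```

A line that comes back with the wrong length is worth re-checking first, because a dropped or doubled character shifts every byte decoded after it.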

I have tried various OCR techniques, combined with diff and a comparison between the paper scan and the OCR output, to pick the best candidates.
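One way to narrow down where to look is to compare the OCR candidates against each other: positions where the engines disagree are the most likely misreads. This is only a sketch under the assumption that each candidate is a list of lines in the same order; `disagreements` is a hypothetical helper, and positions where all engines make the same mistake will not show up.

```python
def disagreements(candidates):
    """Compare several OCR outputs line by line.

    candidates: list of OCR results, each a list of lines.
    Returns (line_number, columns) pairs, 1-based, for every line on which
    the candidates are not identical. Only the first min(len) lines and
    columns are compared, so a line that differs purely in length is
    reported with an empty column list.
    """
    report = []
    for number, versions in enumerate(zip(*candidates), start=1):
        if len(set(versions)) > 1:
            width = min(len(v) for v in versions)
            columns = [c + 1 for c in range(width)
                       if len({v[c] for v in versions}) > 1]
            report.append((number, columns))
    return report
```

For example, three candidates that agree everywhere except on the last character of line 2 yield `[(2, [4])]` for four-character lines.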

I was able to restore some of the packets, but not all of them. Example output of gpg --list-packets:

gpg: CRC error; A95601 - E825A9
gpg: [don't know]: invalid packet (ctb=08)
gpg: onepass_sig with unknown version 153
gpg: [don't know]: invalid packet (ctb=6d)
gpg: [don't know]: invalid packet (ctb=12)
gpg: [don't know]: invalid packet (ctb=2f)
gpg: [don't know]: invalid packet (ctb=55)
gpg: [don't know]: invalid packet (ctb=2e)
# off=0 ctb=95 tag=5 hlen=3 plen=1414
:secret key packet:
        version 4, algo 1, created 1234101081, expires 0
        pkey[0]: [3072 bits]
        pkey[1]: [17 bits]
        iter+salt S2K, algo: 7, SHA1 protection, hash: 2, salt: 82610A0F91638117
        protect count: 44040192 (245)
        protect IV:  01 d2 54 d4 e7 c9 00 41 65 ba 82 6b 90 a4 e1 87
        skey[2]: [v4 protected]
        keyid: E01B51ED91E23E1F
# off=1417 ctb=b4 tag=13 hlen=2 plen=39
:user ID packet: "Known User <known@user.edu>"
# off=1458 ctb=89 tag=2 hlen=3 plen=463
:signature packet: algo 1, keyid 1F2E15CB10B2F6B3
        version 4, created 1249702031, md5len 0, sigclass 0x13
        digest algo 8, begin of digest 66 30
        hashed subpkt 27 len 1 (key flags: 03)
        hashed subpkt 11 len 4 (pref-sym-algos: 9 8 7 2)
        hashed subpkt 21 len 5 (pref-hash-algos: 8 9 10 11 2)
        hashed subpkt 22 len 4 (pref-zip-algos: 2 3 1 0)
        hashed subpkt 33 len 21 (issuer fpr v4 F24B00174C92E1616751BF107FB344CF50B71E63)
        hashed subpkt 2 len 4 (sig created 2021-04-01)
        hashed subpkt 9 len 4 (key expires after 1y0d10h15m)
        subpkt 16 len 8 (issuer key ID 1F2E15CB10B2F6B3)
        data: [3072 bits]
# off=1925 ctb=90 tag=4 hlen=2 plen=91
:onepass_sig packet: [unknown version]
# off=2019 ctb=ed tag=45 hlen=2 plen=188 new-ctb
:unknown packet: type 45, length 188
dump: [ stripped ]
  24: [ stripped ]
  48: [ stripped ]
  72: [ stripped ]
  96: [ stripped ]
 120: [ stripped ]
 144: [ stripped ]
 168: [ stripped ]
# off=2213 ctb=a3 tag=8 hlen=1 plen=0 indeterminate
:compressed packet: algo=10
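The first line of the output, `gpg: CRC error; A95601 - E825A9`, appears to be the armour checksum failing. OpenPGP armour ends with a line of the form `=XXXX` carrying a CRC-24 over the decoded payload. That checksum covers the whole payload, so it cannot point at a single line, but it does allow a candidate transcription to be confirmed or rejected without running gpg. A minimal sketch, assuming the printout includes that checksum line; the constants are the ones given in RFC 4880, while `crc24` and `armour_checksum_ok` are illustrative names:

```python
import base64

CRC24_INIT = 0xB704CE
CRC24_POLY = 0x1864CFB

def crc24(data: bytes) -> int:
    """CRC-24 as specified for OpenPGP ASCII armour."""
    crc = CRC24_INIT
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= CRC24_POLY
    return crc & 0xFFFFFF

def armour_checksum_ok(body_lines, checksum_line):
    """Check payload lines against the trailing '=XXXX' checksum line."""
    data = base64.b64decode("".join(body_lines))
    # The four characters after the leading "=" decode to the 3-byte CRC.
    expected = int.from_bytes(base64.b64decode(checksum_line.lstrip("=")), "big")
    return crc24(data) == expected
```

Once only a handful of doubtful characters remain, trying each plausible substitution and re-running this check is much faster than a full manual comparison.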

Scan quality is pretty good, but the sheer volume of data makes manual entry impractical.

I am looking for clues on how to identify the offending lines, even if only approximately.
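One approximate approach is to translate the `off=` values from the listing back into line numbers. Each 64-character Base64 line encodes 48 bytes, so a byte offset maps directly to a line and a column. In the output above, the packets stop looking like a key at `off=1925`, where a one-pass signature packet shows up in what should be key material, so the damage is probably at or before that offset, for example in that packet header or in the length field of the preceding packet. The sketch below assumes that the offsets refer to the decoded binary data and that every payload line holds exactly 64 characters; `locate` is a hypothetical helper.

```python
def locate(offset, chars_per_line=64):
    """Map a byte offset in the decoded data to (line, column), both 1-based.

    Line 1 is the first payload line after the armour header and blank line.
    The column is the first Base64 character that contributes to the byte.
    """
    bytes_per_line = chars_per_line * 3 // 4   # 64 characters -> 48 bytes
    line, byte_in_line = divmod(offset, bytes_per_line)
    column = byte_in_line * 4 // 3
    return line + 1, column + 1

print(locate(1925))  # (41, 7)
```

A packet that parses cleanly is not proof that its lines are correct, since the parser simply skips the declared body length; the mapping only shows where the packet framing breaks down.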

bernhard · January 23, 2024, 9:06am

Hi @RJA,
what about rendering the recognized characters in the same format (font size, font family) as the scanned images, overlaying them on the scan in a graphics application (like GIMP, at half transparency), and then inspecting the result visually? The human eye usually detects problems on a page pretty fast.

Best Regards,
Bernhard

RJA · January 23, 2024, 11:08am

That is a good idea. I have to find the font that was used, though, as I don’t know what was used to create the printout… But we will get there.