For anyone who wants the raw text for themselves (though I leave the parsing/sorting to you), I wrote up a Python process that uses Poppler's pdftotext to extract the text from the PDFs.
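A minimal sketch of what such a process could look like (not the original script). It assumes Poppler's `pdftotext` binary is installed and on your PATH. The function names `pdftotext_command` and `extract_all`, and the directory names in the usage note, are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path, keep_layout=True):
    """Build the argument list for Poppler's pdftotext CLI.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical layout of the page,
        # which tends to help with tabular documents.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_all(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir to a .txt file in out_dir.

    Returns the list of text file paths that were written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = out / (pdf.stem + ".txt")
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(pdftotext_command(pdf, txt), check=True)
        written.append(txt)
    return written
```

Something like `extract_all("pdfs", "text")` would then leave one `.txt` per PDF in `text/`. Parsing and sorting the output is still up to you.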
I do not understand the matryoshka doll setup here. The quick "iframe busting link" seems to indicate the author of the post is also aware of how screwy things are. I understand the author is not responsible for the awfulness that is R-bloggers, but an iframe inside a blog post is bizarre.
Source: I've built & reviewed email graphs from IMAP & POP dumps too