Working with the Clinton State Dept Email Dumps in R, Part 1

anulman · on Feb 21, 2016

Fun fact: the "disconnected mini-graphs" the author mentions toward the end could also be attributed to Clinton being included via cc / bcc.

Source: I've built & reviewed email graphs from IMAP & POP dumps too
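
A rough sketch of that cc / bcc effect, assuming a graph built only from From/To fields. The message records and names below are made-up toy data, not taken from the actual dump, and `edges_from` / `count_components` are hypothetical helpers:

```python
from collections import defaultdict

# Hypothetical toy messages; names and fields are illustrative only.
messages = [
    {"from": "alice", "to": ["clinton"], "cc": []},
    {"from": "bob", "to": ["carol"], "cc": ["clinton"]},
]


def edges_from(msgs, include_cc):
    """Turn each message into sender -> recipient edges, optionally using cc."""
    edges = []
    for m in msgs:
        recipients = list(m["to"]) + (list(m["cc"]) if include_cc else [])
        edges.extend((m["from"], r) for r in recipients)
    return edges


def count_components(edges):
    """Count connected components of the undirected graph given by edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return count


to_only = count_components(edges_from(messages, include_cc=False))
with_cc = count_components(edges_from(messages, include_cc=True))
```

With the cc field ignored, bob and carol form their own mini-graph; once cc edges are included, they join the component that contains clinton.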

danso · on Feb 21, 2016

For anyone who wants the raw text for themselves (though I leave the parsing/sorting to you), I wrote up this Python process that uses Poppler's pdftotext to extract it from the PDFs:

https://github.com/datahoarder/secretary-clinton-email-dump

Though it's pretty cool that there's an API on the WSJ site that you can use to leverage the parsing they've done.
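
For a sense of what the pdftotext step looks like, here is a minimal sketch. It is not the code from the repo above; the directory layout and the function names `pdftotext_command` and `extract_all` are assumptions for illustration. It only relies on the `pdftotext` CLI from Poppler being on your PATH:

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pdftotext_command(pdf_path: Path, out_dir: Path) -> List[str]:
    """Build the pdftotext call for one PDF; -layout keeps the page layout."""
    txt_path = out_dir / (pdf_path.stem + ".txt")
    return ["pdftotext", "-layout", str(pdf_path), str(txt_path)]


def extract_all(pdf_dir: Path, out_dir: Path) -> List[Path]:
    """Run pdftotext on every *.pdf in pdf_dir and return the text paths."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (Poppler) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(pdf_dir.glob("*.pdf")):
        cmd = pdftotext_command(pdf_path, out_dir)
        subprocess.run(cmd, check=True)  # raises if pdftotext fails
        written.append(Path(cmd[-1]))
    return written
```

Calling `extract_all(Path("pdfs"), Path("text"))` would write one `.txt` file per PDF, assuming those (hypothetical) directories match wherever you saved the downloads.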

toomuchtodo · on Feb 21, 2016

This is the WSJ's email parser: https://github.com/wsjdata/clinton-email-cruncher

ck2 · on Feb 21, 2016

Yeah, as the joke goes, there has to be a pony under all that horsesh*t somewhere, so keep digging.

Why, again, is all email public record when telephone calls are not?

I'd sure like to see all the emails from all the senators of this country analyzed.

irixusr · on Feb 21, 2016

I read through them a bit. I'd like to see the frequency of "pls print".

Seems to be her only contribution to every conversation...
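
If you already have the emails as plain text, counting that phrase is a few lines. A minimal sketch, assuming the texts are loaded as Python strings; the regex and the name `count_pls_print` are my own choices, not anything from the post:

```python
import re
from typing import Iterable

# Case-insensitive, tolerant of extra whitespace between the two words.
PLS_PRINT = re.compile(r"\bpls\s+print\b", re.IGNORECASE)


def count_pls_print(texts: Iterable[str]) -> int:
    """Total number of 'pls print' occurrences across all given texts."""
    return sum(len(PLS_PRINT.findall(text)) for text in texts)
```

This deliberately ignores spelled-out variants like "please print"; widen the pattern if you want those counted too.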

Gratsby · on Feb 21, 2016

Only 17 ads on that page. I think I'd have a word with my professor for sending me to a site like that.

chflags · on Feb 21, 2016

You can also search the emails via a search engine, e.g., Google: https://www.google.ie/search?q=site:foia.state.gov+/searchap...

pete00 · on Feb 21, 2016

Are there any surprising names in the list? Most of the high-frequency ones are to be expected.

packetized · on Feb 21, 2016

Might be better to have the direct link, as the iframe is a bit quirky on mobile:

http://rud.is/projects/clinton_emails_01.html

dfc · on Feb 21, 2016

I do not understand the matryoshka doll setup here. The quick link to the "iframe busting link" seems to indicate that the author of the post is also aware of how screwy things are. I understand the author is not responsible for the awfulness that is R-bloggers, but the iframe inside a blog post is bizarre.

dang · on Feb 21, 2016

Thanks, changed from http://www.r-bloggers.com/working-with-the-clinton-state-dep....

It's better to send issues like this to hn@ycombinator.com, since that guarantees we'll see them and it's haphazard otherwise.

grej · on Feb 21, 2016

Agreed, though I do not have authorization to adjust the link. I guess it requires a mod.

broodbucket · on Feb 21, 2016

Yes, it'd be too easy to abuse. Post a piece of important news, get it upvoted to the top, change the link to something malicious.

toomuchtodo · on Feb 21, 2016

And the gist with code from the post: https://gist.github.com/hrbrmstr/696b124d0190bba15817