Hacker News
Working with the Clinton State Dept Email Dumps in R, Part 1 (rud.is)
134 points by grej on Feb 21, 2016 | 14 comments


Fun fact: the "disconnected mini-graphs" the author mentions toward the end could also be attributed to Clinton being included via cc / bcc.

Source: I've built & reviewed email graphs from IMAP & POP dumps too
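A toy sketch of how that can happen, in Python. The names and the message format are made up for illustration and are not the author's data or code: if edges are built only from From/To, anyone who appears solely on the cc line never joins the graph, and the clusters they would have connected look separate.

```python
from collections import defaultdict

def components(messages, include_cc):
    """Return the connected components of the sender/recipient graph as a list of sets."""
    adj = defaultdict(set)
    for msg in messages:
        recipients = list(msg["to"])
        if include_cc:
            recipients += msg.get("cc", [])
        adj[msg["from"]]  # make sure the sender exists as a node
        for r in recipients:
            adj[msg["from"]].add(r)
            adj[r].add(msg["from"])

    seen, result = set(), []
    for start in list(adj):
        if start in seen:
            continue
        # Depth-first walk from an unvisited node collects one component.
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(comp)
    return result

# Hypothetical messages: "hrc" only ever appears on the cc line.
messages = [
    {"from": "alice", "to": ["bob"], "cc": ["hrc"]},
    {"from": "carol", "to": ["dave"], "cc": ["hrc"]},
]

print(len(components(messages, include_cc=False)))  # 2 separate mini-graphs
print(len(components(messages, include_cc=True)))   # 1 graph once cc is counted
```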


For anyone who wants the raw text for themselves (though I leave the parsing/sorting to you), I wrote up this Python process that uses Poppler's pdftotext to extract from the PDFs:

https://github.com/datahoarder/secretary-clinton-email-dump

Though it's pretty cool that there's an API on the WSJ site that you can use to leverage the parsing they've done.
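For a rough idea of what the extraction step looks like, here is a minimal sketch. It is not the code from the repo above; it assumes Poppler's pdftotext is installed and on your PATH, and the function and directory names are hypothetical.

```python
import subprocess
from pathlib import Path

def pdftotext_cmd(pdf_path, txt_path, layout=True):
    """Build the pdftotext command line for one PDF."""
    cmd = ["pdftotext"]
    if layout:
        # -layout asks pdftotext to keep the physical layout of the page.
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd

def extract_all(pdf_dir, out_dir):
    """Convert every *.pdf in pdf_dir to a .txt file in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = out_dir / (pdf.stem + ".txt")
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(pdftotext_cmd(pdf, txt), check=True)
        written.append(txt)
    return written
```

Calling `extract_all("pdfs", "text")` would then leave one text file per PDF; the parsing and sorting is still up to you.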



Yeah, so there has to be a pony under all that horsesh*t somewhere, so keep digging as the joke goes.

Why again is all email public record but telephone calls are not?

I'd sure like to see all the emails from all the senators of this country analyzed.


I read through them a bit. I'd like to see the frequency of "pls print".

Seems to be her only contribution to every conversation...
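Counting that would be straightforward once you have the text. A quick sketch in Python, with made-up sample strings standing in for the real email bodies and a hypothetical helper name:

```python
import re

def phrase_count(texts, phrase):
    """Count case-insensitive occurrences of a phrase across a list of strings."""
    # Allow any run of whitespace (including newlines) between the words.
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return sum(len(pattern.findall(t)) for t in texts)

# Made-up sample bodies, not taken from the actual emails.
sample = ["Pls print.", "pls  print for me", "Please print", "pls\nprint and pls print"]
print(phrase_count(sample, "pls print"))  # 4
```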


Only 17 ads on that page. I think I'd have a word with my professor for sending me to a site like that.


Search the emails via a search engine, e.g., Google: https://www.google.ie/search?q=site:foia.state.gov+/searchap...


Are there any surprising names in the list? Most of the high frequency ones are to be expected.


Might be better to have the direct link, as the iframe is a bit quirky on mobile:

http://rud.is/projects/clinton_emails_01.html


I do not understand the matryoshka doll setup here. The quick link to the "iframe busting link" seems to indicate the author of the post is also aware of how screwy things are. I understand the author is not responsible for the awfulness that is R-bloggers, but the iframe inside a blog post is bizarre.


Thanks, changed from http://www.r-bloggers.com/working-with-the-clinton-state-dep....

It's better to send issues like this to hn@ycombinator.com, since that guarantees we'll see them and it's haphazard otherwise.


Agree. Though I do not have authorization to adjust the link - guess it requires a mod.


Yes, it'd be too easy to abuse. Post a piece of important news, get it upvoted to the top, change the link to something malicious.


And the gist with code from the post: https://gist.github.com/hrbrmstr/696b124d0190bba15817



