The site's maintainer manually subscribes a randomly generated but plausible-looking address to a candidate's mailing list. Any message that address receives is very likely to be from that campaign. We then do some very basic processing to save any externally linked files (mostly images) and attachments for posterity, and then add the message to our database, usually within a few seconds of its being sent.
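The "save any externally linked files" step could look roughly like the sketch below. It is written in Python for illustration rather than the site's Clojure, and it assumes the message's HTML is available as a string and that only `<img src>` values with an `http` or `https` scheme count as external. The function name `external_image_urls` is hypothetical, and this is not the site's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ImageSrcCollector(HTMLParser):
    """Collects src attributes of <img> tags that point to http(s) URLs."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        for name, value in attrs:
            # Keep absolute web URLs only; skip cid: (attached) and data: images.
            if name == "src" and value and urlparse(value).scheme in ("http", "https"):
                if value not in self.urls:
                    self.urls.append(value)


def external_image_urls(html):
    """Return external image URLs in document order, without duplicates."""
    collector = ImageSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


if __name__ == "__main__":
    sample = (
        "<p>Chip in!</p>"
        '<img src="https://example.com/banner.png">'
        '<img src="cid:logo@campaign">'
        '<img src="https://example.com/banner.png">'
    )
    print(external_image_urls(sample))  # prints ['https://example.com/banner.png']
```

Each returned URL would then be downloaded and stored alongside the message, so that the archived copy still renders if the campaign later removes the hosted image.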
This site is written entirely in Clojure, using ClojureScript on the frontend. For more technical details, refer to the developer documentation and source code.
Because of the way we gather emails, there are three major reasons we can't collect every email, and every user should be aware of them:
The biggest limitation is that data for the 2020 election is incomplete because of technical issues, so this is not a complete archive of that election. In particular, the site was down during the (obviously critical) weeks leading up to the election. Coverage of the 2022 midterms is better, and we've fixed many of the issues, so we hope 2024 will have a more complete set of records.
If you're informally looking through campaign emails to get a sense of the candidates, the more technical data quality issues shouldn't really affect you (but see the two major issues discussed above). If you plan to get on candidates' mailing lists yourself, it's worth reading up on how campaigns use your data.
Because of the targeting issues described above, it's hard to be sure that any given email was sent to the campaign's entire email list or even a significant part of it. That said, my best guess is that most emails here were sent to either the entire list or the vast majority of it.
The HTML closely resembles what most people would actually see, so that's your best bet if you want to include a screenshot of the email as a part of your story, video, segment, tweet, etc.
If you're analyzing the textual contents, you can use the text field, but be aware that sometimes the images in the message contain significant amounts of text.
Because we do some processing, you may want to look at dumps of the raw .eml files, which come straight from our email provider (currently Mailgun). As far as I know, every email provider, Mailgun included, adds headers to messages, so if you're analyzing headers, keep in mind that some did not originate with the campaign.
While low-volume scraping is okay, both for your convenience and the site's sake, I recommend using either the API or the (forthcoming) data dumps instead.
Teachers are welcome to use this site for coding students to practice scraping on, but please teach scraping etiquette and either implement some technical measures to throttle overzealous student scripts :) or let me know in advance. Eventually, code and documentation will be available so you can stand up your own instance of this site.
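One simple way to throttle student scripts is to enforce a minimum delay between requests. The Python sketch below is a hypothetical helper, not something the site provides. The 2-second interval in the usage note is an arbitrary assumption rather than a published limit, and the clock and sleep functions are injectable so the behavior can be tested without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive calls to wait()."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraping loop would call `throttle.wait()` before each request, e.g. `throttle = Throttle(2.0)` followed by `for url in urls: throttle.wait(); fetch(url)`, where `fetch` stands in for whatever HTTP client the class uses.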
If there's a reason neither the data dumps nor the API work for you, please open an issue in the bug tracker.
Yes. Here are a few:
If I had known about the Princeton and DC Inbox efforts when I started this project in November 2019, I'm not sure I would have continued. However, neither offers API access, so it seems like this project adds something to the existing projects.
Princeton Corpus of Political Beliefs claims that it has more complete coverage than Archive of Political Emails: "For those sender types that are in scope for both of our datasets, our collection of senders is more complete: we miss only 1 sender out of a sample of 30 that they have; conversely they miss 89.4% of the senders in our corpus."
I feel that the basic concept of this site, making the emails campaigns send public, is unambiguously in the public interest. I'd still like to hear your concerns, but know that I probably won't exclude your campaign just because you ask. Please let me know if Citizen Inbox is causing you technical problems due to implementation bugs or design decisions.
This instance is maintained by Alys S. Brooks, who also developed the app itself. I'm a freelance journalist and software developer based in Madison, Wis.
In the early days, I'd like to at least respond to every question and request. Since I don't get paid for this, I can't make any long-term support guarantees. I'll try to update this page if that changes.