Citizen Inbox answers "What are campaigns emailing their supporters?"

Q&A

How does this work?

The site's maintainer manually subscribes a randomly generated, yet vaguely realistic address to a candidate's mailing list. If that address gets a message, it's very likely to be from that campaign. We then do some very basic processing to save any externally linked files (mostly images) and attachments for posterity, and then add it to our database, usually within a few seconds of the message being sent.

This site is written entirely in Clojure, using ClojureScript on the frontend. For more technical details, refer to the developer documentation and source code.
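
As a rough illustration of the "save any externally linked files" step, here is a minimal Python sketch that finds the external URLs an HTML email points to. It is not the site's actual code (which is Clojure); the names `ExternalResourceFinder` and `find_external_resources` are hypothetical, and a real pipeline would also need to download and store each file.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalResourceFinder(HTMLParser):
    """Collect absolute http(s) URLs from <img src> and <link href> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img":
            candidate = attr_map.get("src")
        elif tag == "link":
            candidate = attr_map.get("href")
        else:
            return
        # Skip missing values, inline data: URIs, and cid: references,
        # since those are not externally hosted files.
        if candidate and urlparse(candidate).scheme in ("http", "https"):
            self.urls.append(candidate)


def find_external_resources(html_body):
    """Return external resource URLs in the order they appear (hypothetical helper)."""
    finder = ExternalResourceFinder()
    finder.feed(html_body)
    return finder.urls
```

Archiving these files matters because campaigns can change or remove hosted images after a message is sent.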

Are there any limitations to the data I should be aware of...

...Regardless of why I'm interested?

Three major limitations are inherent to the way we gather emails, and each one means we can't collect every message a campaign sends. Every user should be aware of them:

  • Demographic targeting.
  • Donation-based targeting. Since campaigns frequently ask for donations at the same time they collect email addresses, it would be trivial for them to send emails only to donors. It's both financially infeasible and ethically dubious for Citizen Inbox to donate to any or all of the candidates it tracks, so we will not receive messages sent only to donors.
  • A/B testing. As the Princeton Corpus of Political Emails points out, campaigns may randomly assign recipients to different email messages in an attempt to see which prompts more donations, more people to open the message, or other metrics, a practice known as A/B testing.

For 2020, the biggest limitation is that the data is incomplete due to technical issues; it is not a complete archive of the election. In particular, the site was down during the (obviously critical) weeks leading up to the election. We have better coverage for the 2022 midterms. We've since fixed many of the issues, so the hope is that 2024 will have a more complete set of records.

...As a voter/resident/student/interested person not from the U.S.?

If you're informally looking through campaign emails to get a sense of the candidates, the more technical data quality issues shouldn't really affect you (but see the major issues discussed above). If you plan to get on candidates' mailing lists yourself, it's worth reading up on how campaigns use your data.

...As a journalist?

Because of the targeting issues described above, it's hard to be sure any email was sent to the campaign's entire email list or even a significant part of it. However, my best guess is that most emails here were sent to either the entire list or the vast majority of it.

The HTML closely resembles what most people would actually see, so that's your best bet if you want to include a screenshot of the email as a part of your story, video, segment, tweet, etc.

...As a data scientist/statistician/researcher?

If you're analyzing the textual contents, you can use the text field, but be aware that sometimes the images in the message contain significant amounts of text.

Because we do some processing, you may want to look at dumps of the raw .eml files, which come straight from our email provider (currently MailGun). As far as I know, literally every email provider adds headers to messages, including MailGun, so keep that in mind if you're analyzing headers—some did not originate with the campaign.
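
If you want to separate provider-added headers from the rest, something like the following Python sketch may help. It uses the standard library's `email` package to parse a raw .eml file. The prefix list in `ASSUMED_PROVIDER_PREFIXES` and the helper name `split_headers` are assumptions for illustration, not a documented list of what our provider adds; check them against real dumps before relying on them.

```python
from email import message_from_bytes
from email.policy import default

# Assumption: header-name prefixes treated as provider-added.
# This list is illustrative only; adjust it after inspecting real dumps.
ASSUMED_PROVIDER_PREFIXES = ("x-mailgun-", "x-envelope-")


def split_headers(raw_eml):
    """Split headers into (provider_added, other) lists of (name, value) pairs."""
    message = message_from_bytes(raw_eml, policy=default)
    provider_added, other = [], []
    for name, value in message.items():
        if name.lower().startswith(ASSUMED_PROVIDER_PREFIXES):
            provider_added.append((name, str(value)))
        else:
            other.append((name, str(value)))
    return provider_added, other
```

To use it, read a dump in binary mode and pass the bytes in, for example `split_headers(open(path, "rb").read())`. Headers that land in the second list are not guaranteed to come from the campaign, since other mail servers along the way can add headers too.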

Am I allowed to scrape this site?

While low-volume scraping is okay, both for your convenience and the site's sake, I recommend using either the API or the (forthcoming) data dumps instead.

Teachers are welcome to use this site for coding students to practice scraping on, but please teach scraping etiquette and either implement some technical measures to throttle overzealous student scripts :) or let me know in advance. Eventually, code and documentation will be available so you can stand up your own instance of this site.
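
For teachers looking for a simple technical measure, here is one possible throttle in Python that enforces a minimum gap between requests. The class name `Throttle` and the idea of a fixed interval are my illustration only; this site does not publish a specific rate limit, so pick a conservative interval (several seconds is an assumption, not a rule).

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval_seconds, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval_seconds
        # clock and sleep are injectable so the class can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._last_request = None

    def wait(self):
        """Block until at least min_interval has passed since the previous call."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now
```

Students would create one shared `Throttle(5.0)` and call `wait()` immediately before every request, so even a runaway loop sends at most one request per interval.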

If there's a reason neither the data dumps nor the API works for you, please open an issue in the bug tracker.

Are there similar projects?

Yes. Here are a few:

  • ProPublica's Message Machine ran during the 2012 election.
  • SendView ran the Presidential Senders project during the 2020 election.
  • Princeton Corpus of Political Emails covers a wide variety of candidates during the 2020 election, including some state candidates. They didn't run a version in 2022, and it's unclear whether they intend to run the same process for 2024.
  • DCInbox looks at email newsletters sent by current members of Congress.
  • Archive of Political Emails is perhaps the broadest of these, as it includes PACs, 501 organizations, and international organizations as well.

If I had known about the Princeton and DCInbox efforts when I started this project in November 2019, I'm not sure I would have continued. However, neither offers API access, so it seems like this project adds something to the existing projects.

The Princeton Corpus of Political Emails claims that it has more complete coverage than the Archive of Political Emails: "For those sender types that are in scope for both of our datasets, our collection of senders is more complete: we miss only 1 sender out of a sample of 30 that they have; conversely they miss 89.4% of the senders in our corpus."

I'm part of a campaign. Who do I contact if I have concerns or objections?

I feel that the basic concept of this site, making the emails campaigns send public, is unambiguously in the public interest. I'd still like to hear your concerns, but just know that I probably won't exclude your campaign just because you ask. Please let me know if Citizen Inbox is causing you technical problems due to implementation bugs or design decisions.

campaign@citizen-inbox.com

Who's behind this?

This instance is maintained by Alys S. Brooks, who also developed the app itself. I'm a freelance journalist and software developer based in Madison, Wis.

Where can I get support?

In the early days, I'd like to at least respond to all questions and requests. Since I don't get paid for this, I cannot make any long-term support guarantees. I'll try to keep this updated if this changes.