A Meta-Data Resilient, Self-Funding “Dark” Internet Mail Idea

I’m still reading the specs on DIME, but already it’s leaving a bad taste in my mouth. It feels like it’s more or less trying to band-aid an already broken anonymous mail system, that really isn’t anonymous at all, and leaves far too much metadata lying around. Even with DIME, it looks like too much information is still exposed to be NSA proof (like sender and recipient domain names), and with all of the new moving parts, it leaves a rather large attack surface. It feels more as if DIME gives you plausible deniability, but not necessarily NSA proof anonymity, especially in light of TAO, and the likelihood at least one end of the conversation will be compromised or compelled by FISA. I could be wrong, but it at least got me thinking about what my idea of an Internet dark mail system would look like.

Let me throw this idea out there for you. We all want to be able to just write an email, then throw it anonymously into some large vortex where it will magically and anonymously end up in the recipient’s hands, right? What’s preventing that from being a reality? Well, a few things.

 

Things Preventing Truly Anonymous Email

First of all, money. That is, any truly anonymous system can’t have “customers” in the traditional sense; you need the money that it takes to maintain an infrastructure for public consumption, rather than “customer only” use, and that’s difficult considering just how much money spammers drain out of infrastructure they abuse. For years, people have been trying to make sender pay systems work, and have failed miserably. The big reason for this is that nobody except the recipient has had any incentive to switch to sender pays. There also hasn’t been any anonymous payment system through which users could pay for such a service. Sender pay systems aren’t required to make the architecture I’m going to describe work, although I do describe a bartering system among mail peers. In practice, if an organization like Tor were to fund such a system, it could be provided as a public service in the same way that existing Tor services are, without sender pay. Nonetheless, this architecture doesn’t need to run on Tor at all, and could be quite viable on the public Internet.

The second issue is lack of decentralization. Email is centrally sent from your ISP’s servers (because you’re a paying customer), decentralized, then re-centralized in the recipient’s inbox at their ISP’s servers (because they’re a paying customer). Having source and destination email addresses (or domains) practically paints a target on your back in terms of metadata collection. Even if the NSA doesn’t know your username, one could easily figure out, with a reasonable level of certainty, where the email is going just based on your traffic patterns alone. Sending a 1MB attachment? It’s not hard to see that in a traffic analysis, even if they don’t have the username from the server… so let’s not kid ourselves that DIME is going to obscure your connections simply because only the domain is readable. Metadata is just as important as message content; being able to hide the identities of both endpoints, even beyond that of the organization, is crucial to deploying a secure messaging system. This is one area where I think DIME appears to falter.

Let’s address the money issue first. You may not realize it, but technically we already have a sender pays system: you just don’t think about it. You’re paying for Internet service; this is why they allow you to use their SMTP servers (and why most other ISPs do not). Whether you know it or not, ISPs typically have caps on how much email you can send per hour, so they’ve already quantified how much your email will cost them in terms of resources, and have worked that into the price of your service. Did you really think it was free? Even if you’re using Google mail, you’re paying for it in a different way, such as advertising and being one of their guinea pigs. Facebook? Same. But even still, people aren’t bright enough to realize they’re already paying for mail… so where’s the incentive to use a sender-pay system?

Incentive-wise, things have changed a lot in the past year.

Today, we finally have an incentive for the sender to want to get on a sender pay system, though no business is selling it to you yet: your anonymity. Between widespread surveillance and sabotage by governments around the world, the abuse made possible by metadata and large-scale data warehousing analytics, and the general insecurity and instability of using the Internet at all, everyone (foreign governments, corporations, journalists, scientists, and many other individuals) has a strong and legitimate incentive not only to encrypt the contents of their email, but also to hide the metadata that connects individuals to each other. Protecting email contents is easy, but preventing anyone from knowing who you’re talking to is not. We also have a number of decentralized, anonymous payment schemes such as Bitcoin and friends, most of which can be automated. Again, we’re assuming here that sender pays is even necessary; if a public project, such as Tor, were to design a system like this, something like mail would not necessarily need to be funded by the senders (or the recipients).

A Simple Concept

So now imagine this simple concept: you tape a nickel (or a penny) to an envelope, and toss it into some giant abyss (a.k.a. mail pit). There’s NOTHING on the envelope. That penny gets bartered out by a number of interconnected mail relays, all of whom take a small piece of it and fly that blank envelope all over the Internet, and even cut it into pieces that can later be put back together. You can even add in redundancy if you want by copying the contents without actually knowing what they are. The contents of that envelope (including the sender, recipient, and subject) are all encrypted and protected by PKI. The sender’s mail client has encrypted the message using the recipient’s public key, based on their dark mail address in a key registry, but the mail servers don’t have a clue who it is for or what’s inside, nor is any mail relay aware of how many fragments the envelope is cut into or what piece of the sequence they’re storing. Each mail relay storing a fragment assigns a unique temporary token to serve as a locker number, which is relayed to the sender. The fragment gets stored in the locker.

The recipient’s public key can even have already been downloaded by the sender, either out-of-band from a key server or at a crypto-party, long before you ever send them an envelope. Cutting the envelope into pieces isn’t required, but it further decentralizes your letter and makes it much harder (if not infeasible) to perform a ciphertext attack against it (for example, if you interleave characters in the ciphertext).
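To make the interleaving idea a bit more concrete, here is a rough sketch of one way a mail client might cut an already-encrypted envelope into interleaved pieces and put it back together. The function names and the simple round-robin scheme are my own assumptions for illustration; the ciphertext is treated as opaque bytes, and the encryption step itself is left out.

```python
def split_interleaved(ciphertext: bytes, n: int) -> list[bytes]:
    """Deal the ciphertext bytes round-robin into n fragments.

    Fragment i receives bytes i, i+n, i+2n, ... so no single fragment
    holds a contiguous run of the ciphertext. (Hypothetical scheme.)
    """
    if n < 1:
        raise ValueError("need at least one fragment")
    return [ciphertext[i::n] for i in range(n)]


def join_interleaved(fragments: list[bytes]) -> bytes:
    """Reverse split_interleaved, given the fragments in sequence order."""
    n = len(fragments)
    total = sum(len(f) for f in fragments)
    out = bytearray(total)
    for i, frag in enumerate(fragments):
        # Each fragment fills every n-th slot, starting at offset i.
        out[i::n] = frag
    return bytes(out)


if __name__ == "__main__":
    envelope = b"opaque ciphertext bytes"
    pieces = split_interleaved(envelope, 4)
    assert join_interleaved(pieces) == envelope
```

Note that reassembly needs the fragments in the right order, which is exactly the kind of sequencing information only the sender (and eventually the recipient) should hold.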

OK, fine. So it’s all encrypted. The mail servers don’t know where the hell it’s going. Great idea, it’s lost now… how do we know where to deliver it?

We don’t.

Stop Delivering Mail!

That’s one of the key problems with current SMTP mail: it gets delivered. That breaks everything – this centralizes all ingress of correspondence for the recipient, and gives a malicious actor a single point of attack. It also gives law enforcement one central place to subpoena, collect, or sometimes even confiscate the mail. STOP DELIVERING YOUR MAIL! Instead let it sit in the ether – in pieces, and with an unknown existence except by the sender – and let the recipient PULL it with a mail client (or mail server, which we’ll discuss later, for caching) to do the job.

So how can you decentralize something like this so that mail DOESN’T get delivered, but can still be received? It’s simple with this system: you keep it fragmented on a number of those different mail relays… the ones you paid that penny to toss your mail around? Some of them will have bartered a certain part of that penny to temporarily store a fragment of that encrypted envelope. Since the infrastructure got paid by the sender (or is funded by a project), they’re happy to hold fragments of your mail for you, up to a max TTL (N days), until your recipient is ready to go and pick it up. They can even hold it redundantly, to prevent outage issues, and you could even use RAID-level redundancy to reconstruct missing parts if you’re really paranoid. The recipient (or their mail server) can pull it down at a later time, and in pieces, without any knowledge of the sender until the message is decrypted by the recipient.
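As a rough illustration of the RAID-style redundancy idea, here is a minimal sketch using a single XOR parity fragment (in the spirit of RAID 5), which lets a client rebuild any one lost fragment. The function names are hypothetical, and I’m assuming the fragments have already been padded to equal length.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: list[bytes]) -> bytes:
    """Build one parity fragment: the XOR of all data fragments."""
    size = len(fragments[0])
    if any(len(f) != size for f in fragments):
        raise ValueError("fragments must be padded to equal length")
    return reduce(xor_bytes, fragments)


def recover_missing(fragments: list[Optional[bytes]], parity: bytes) -> list[bytes]:
    """Rebuild at most one missing fragment (marked None) from the parity."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single parity fragment can only repair one loss")
    if not missing:
        return list(fragments)
    present = [f for f in fragments if f is not None]
    # XORing the parity with every surviving fragment leaves the lost one.
    rebuilt = reduce(xor_bytes, present, parity)
    result = list(fragments)
    result[missing[0]] = rebuilt
    return result


if __name__ == "__main__":
    frags = [b"aaaa", b"bbbb", b"cccc"]
    parity = make_parity(frags)
    assert recover_missing([b"aaaa", None, b"cccc"], parity) == frags
```

The parity fragment would be stored on yet another relay like any other fragment; surviving more than one outage would need a stronger erasure code than this.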

These servers have no idea who the message is for (since it’s all encrypted with the recipient’s public key); it’s also fragmented, so they don’t even know how long the message is, or what servers it’s all getting stored on. All they know is that they’ve taken a fragment of data and stored it for N days, until someone comes along with a token to retrieve it. Want to email a 10MB attachment? No problem, it’ll get spread out across even more servers, so reconstituting it later won’t even use very much of anyone’s infrastructure, except for the recipient’s. If this is a sender-pays architecture, that might even cost a penny more.
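From the relay’s point of view, the whole job boils down to something like the sketch below: take an opaque blob, hand back a random locker token, and forget the blob after N days. The class and method names are made up for illustration, storage is just an in-memory dict, and I’m assuming a fragment stays retrievable until its TTL expires rather than being deleted on first pickup.

```python
import secrets
import time
from typing import Callable, Optional


class FragmentRelay:
    """Toy relay: stores opaque fragments in lockers keyed by random tokens."""

    def __init__(self, max_ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_ttl = max_ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._lockers: dict[str, tuple[float, bytes]] = {}

    def store(self, fragment: bytes) -> str:
        """Store a fragment and return the temporary locker token."""
        token = secrets.token_urlsafe(32)
        self._lockers[token] = (self._clock() + self._max_ttl, fragment)
        return token

    def retrieve(self, token: str) -> Optional[bytes]:
        """Return the fragment for a token, or None if unknown or expired."""
        self._purge_expired()
        entry = self._lockers.get(token)
        return entry[1] if entry is not None else None

    def _purge_expired(self) -> None:
        now = self._clock()
        expired = [t for t, (deadline, _) in self._lockers.items() if deadline <= now]
        for t in expired:
            del self._lockers[t]


if __name__ == "__main__":
    relay = FragmentRelay(max_ttl_seconds=7 * 24 * 3600)
    token = relay.store(b"opaque fragment")
    assert relay.retrieve(token) == b"opaque fragment"
```

Nothing in here mentions a sender, a recipient, a sequence number, or a message length, which is the point.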

Storage Receipts

So you’ve got a number of fragments of this message flying around the Internet, only the sender knows who it’s going to, and only the mail servers know where any of it is. Now what? A storage receipt.

That abyss the envelope got dropped into has no idea who’s going to pick up the email, and so it prints a receipt FOR THE SENDER, even though it doesn’t know who the sender is or even what their email address is; specifically, a retrieval ticket, which would contain the servers, sequencing, interleave algorithm, and temporary auth tokens to retrieve the message (which will STILL be encrypted with the recipient’s public key even after it’s retrieved). This retrieval information gets encrypted back to the SENDER (rather than the recipient), so that the mail servers are none the wiser. So imagine now that you’ve tossed your letter into an abyss, and a receipt pops out of it, and floats down into your hand. Very hitchhikeresque, isn’t it?
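To give a feel for what a retrieval ticket might hold, here is a small sketch of the data structure, serialized to JSON before it would be encrypted. The field names, the JSON encoding, and the example relay hostnames are all assumptions of mine; the only things taken from the description above are the four ingredients (servers, sequencing, interleave algorithm, and temporary tokens).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FragmentLocation:
    relay: str      # which relay holds this fragment (hypothetical hostname)
    token: str      # temporary locker token issued by that relay
    sequence: int   # position of the fragment in the reassembly order


@dataclass(frozen=True)
class RetrievalTicket:
    interleave: str                          # name of the interleave algorithm
    fragments: tuple[FragmentLocation, ...]  # where every piece lives

    def to_bytes(self) -> bytes:
        """Serialize to JSON; the result would then be encrypted."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "RetrievalTicket":
        data = json.loads(raw.decode("utf-8"))
        frags = tuple(FragmentLocation(**f) for f in data["fragments"])
        return cls(interleave=data["interleave"], fragments=frags)


if __name__ == "__main__":
    ticket = RetrievalTicket(
        interleave="round-robin",
        fragments=(
            FragmentLocation("relay-a.example", "tok-1", 0),
            FragmentLocation("relay-b.example", "tok-2", 1),
        ),
    )
    assert RetrievalTicket.from_bytes(ticket.to_bytes()) == ticket
```

The ticket is the one place where the full map of the message exists, so it never travels in the clear.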

Once the sender has the retrieval receipt (which will conveniently pop up in the sender’s mail client, or could even be relayed by the mail client), they have a number of different options to forward it on to the recipient. They can either do this out of band by a method of their own choosing (e.g. over OTR or ZRTP encrypted chat), or they can send it in a less traceable fashion using a new service I’m going to call an anonymous discovery and notification service (ADNS).

Anonymous Discovery and Notification Service

Discovery is always a bit of a challenge; how can you let someone discover you, without that system knowing who it is that’s doing the discovering? Obviously, the recipient has to register centrally somewhere to receive notifications, but they don’t necessarily have to register their identity, nor do they need to be constricted to traditional domain name style naming conventions, since mail is decentralized in this design. Recipients can register whatever they like with the service, and senders don’t need to know who they are to send mail to them.

Borrowing loosely from the idea of one-time pads, picture a unique series of identifiers that are never used again, handed out like ticker tape at the local deli. Let’s say you’re thinking about sending mail to a particular recipient. What if, when your mail client pulled the public key associated with the dark email address, you also pulled on some ticker tape that spat out a handful of unique serial numbers generated for that recipient when you grabbed that public key, either based on their name/email address, a made-up handle, or even a Bitcoin wallet ID (if the recipient wanted to receive mail without identifying themselves)? This would be done using some form of ephemeral key encryption, limiting the likelihood of eavesdropping, but more importantly, those numbers would be completely disassociated from the recipient as soon as they were generated. Of course, if you need more ticker tape, your mail client can always go get it.

The discovery and notification services can also be split up between different systems so that the notification services are never associated (even temporarily) with a recipient’s discovery information.

The recipient’s MUA is then made aware that those numbers have been pulled, and until they expire, their mail client can watch for notifications on them on any public notification service – kind of like tuning into a public station to see if there’s a message on it. When the sender’s MUA posts the notification, it stores it using the same protocol as a message fragment would be pushed to a relay; that is, there’s no way to tell what a message notification looks like, versus a message fragment, or any other stored content. An attacker would have to know the server and the OTP token a recipient was listening on in order to know that a notification had even been posted, and the recipient would still have plausible deniability, as the notification would be encrypted to look just like any other message fragment.
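One way to help a notification look like any other fragment is to pad everything to the same fixed size before it gets encrypted. The sketch below shows only that padding step; the 4096-byte size, the 4-byte length prefix, and the function names are assumptions of mine, and the encryption that would hide the length prefix is left out.

```python
import secrets

FRAGMENT_SIZE = 4096  # hypothetical fixed size shared by fragments and notifications
HEADER_SIZE = 4       # big-endian length prefix, hidden once the blob is encrypted


def pad_to_fragment(payload: bytes, size: int = FRAGMENT_SIZE) -> bytes:
    """Pad a payload with random filler so every stored blob has the same length."""
    if len(payload) > size - HEADER_SIZE:
        raise ValueError("payload too large for one fragment")
    header = len(payload).to_bytes(HEADER_SIZE, "big")
    filler = secrets.token_bytes(size - HEADER_SIZE - len(payload))
    return header + payload + filler


def unpad_fragment(blob: bytes) -> bytes:
    """Strip the length prefix and filler after decryption."""
    length = int.from_bytes(blob[:HEADER_SIZE], "big")
    return blob[HEADER_SIZE:HEADER_SIZE + length]


if __name__ == "__main__":
    blob = pad_to_fragment(b"encrypted retrieval receipt")
    assert len(blob) == FRAGMENT_SIZE
    assert unpad_fragment(blob) == b"encrypted retrieval receipt"
```

With every blob the same size, a relay (or anyone watching it) has nothing but a token and a block of random-looking bytes to go on.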

For security’s sake, the notification service doesn’t know anything about the association between those unique identifiers and the actual recipient once those numbers get picked up. The server keeps no logs of which identifiers produced which serial numbers. When the numbers get generated, the recipient’s mail client picks them up, and then the associations are wiped from the server. They will have a max TTL, and once that max is reached, those same serial numbers can even be reused by some other recipient on the service.
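Here is a bare-bones sketch of the ticker tape behavior on the discovery side: a sender’s client pulls fresh serial numbers for a handle, the recipient’s client later collects them, and at that moment the server forgets the association. The class and method names are hypothetical, and this leaves out the TTL, the ephemeral key encryption, and any proof that the collector really owns the handle.

```python
import secrets


class TickerTape:
    """Toy discovery service that hands out one-time serial numbers."""

    def __init__(self) -> None:
        # handle -> serials issued but not yet picked up by the recipient
        self._pending: dict[str, list[str]] = {}

    def pull(self, handle: str, count: int) -> list[str]:
        """Called by a sender's client: issue fresh serials for a handle."""
        serials = [secrets.token_hex(16) for _ in range(count)]
        self._pending.setdefault(handle, []).extend(serials)
        return serials

    def collect(self, handle: str) -> list[str]:
        """Called by the recipient's client: fetch serials and wipe the link."""
        return self._pending.pop(handle, [])


if __name__ == "__main__":
    tape = TickerTape()
    issued = tape.pull("some-made-up-handle", 3)
    assert tape.collect("some-made-up-handle") == issued
    # After pickup the service no longer knows who the serials belong to.
    assert tape.collect("some-made-up-handle") == []
```

After `collect`, the serials mean something only to the sender who pulled them and the recipient who is listening for them.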

When the sender does send the recipient a message, that retrieval receipt they get when they send the message then gets encrypted with the recipient’s public key, and then anonymously (over Tor perhaps) uploaded to an anonymous notification service, with any one of the one-time numbers it pulled off of the roll for the user. The recipient’s mail client will see that there’s a message waiting for it on a public notification service for that identifier, and will be able to pick up the encrypted retrieval receipt.

Conclusion

The idea here is to be able to send and store a message while completely disassociating the sender’s identity and the recipient’s identity from any one point in the system. Here, the mail relays do not know the recipient and only know the IP (unless Torized) of the sender. The ADNS only knows a unique identifier for the recipient (not their identity), but is woefully unaware of who the sender is, or if/when/what message was sent. None of the mail relays are aware of whether they’re storing a message or a notification, and none of them have more than an interleaved fragment of a PKI encrypted message. Even an endpoint mail server would not be aware of the sender, or even what fragments constituted a message, until the message was picked up by the recipient. Even then, this could be abstracted out even more so as to separate domain knowledge from knowledge of the unique identifier.

In reality, the brunt of the work is done here by the sender’s and recipient’s mail clients, and not by the actual people themselves. Companies and those with needs for reliable message storage could set up a mail server type of architecture to receive these notifications and pull message fragments. This kind of architecture could be set up in such a fashion that even the destination mail server did not know what fragments go with what messages, leaving the MUA to request and reassemble the parts from the server. If designed properly, a system like this could potentially be very easy to use from a UI perspective.

In terms of security, a number of things are done here to abstract user identities out. By creating a system for public consumption that is paid for by a bartering system using an anonymous payment system, you potentially could spread out email across the world, making it very difficult to trace back to any one user, because there isn’t a mailbox here like there is in the traditional model. Additionally, the connection between the sender and recipient is abstracted out here. A number of different systems would have to be compromised in order to even attempt to piece together a transaction, and private keys would still need to be stolen in order for any content to be recovered at all.

There are a lot of possibilities on ways to improve on this, but a system like this could be completely self-funding. All someone needs to do is build it.