I've been working on cryptographic e-mail software for many years now, and i want to set down some of my observations of what i think some of the challenges are. I'm involved in Autocrypt, which is making great strides in sensible key management (see the last section below, which is short not because i think it's easy, but because i think Autocrypt has covered this area quite well), but there are additional nuances to the mechanics and user experience of e-mail encryption that i need to get off my chest.

Feedback welcome!

Table of contents:

Cryptography and E-mail Messages
Cryptographic Mechanism
Key management
- Key changes over time

Cryptography and E-mail Messages

Cryptographic protection (i.e., digital signatures, encryption) of e-mail messages has a complex history. There are several different ways that various parts of an e-mail message can be protected (or not), and those mechanisms can be combined in a huge number of ways.

In contrast to the technical complexity, users of e-mail tend to expect a fairly straightforward experience. They also have little to no expectation of explicit cryptographic protections for their messages, whether for authenticity, for confidentiality, or for integrity.

If we want to change this -- if we want users to be able to rely on cryptographic protections for some e-mail messages in their existing e-mail accounts -- we need to be able to explain those protections without getting in the user's way.

Why expose cryptographic protections to the user at all?

For a new messaging service, the service itself can simply enumerate the set of properties that all messages exchanged through the service must have, design the system to bundle those properties with message deliverability, and then users don't need to see any of the details for any given message. The presence of the message in that messaging service is enough to communicate its security properties to the extent that the users care about those properties.

However, e-mail is a widely deployed, heterogeneous, legacy system, and even the most sophisticated users will always interact with some messages that lack cryptographic protections.

So if we think those protections are meaningful, and we want users to be able to respond to a protected message at all differently from how they respond to an unprotected message (or if they want to know whether the message they're sending will be protected, so they can decide how much to reveal in it), we're faced with the challenge of explaining those protections to users at some level.

Simplicity

The best level to display cryptographic protections for a typical e-mail user is on a per-message basis.

Wider than per-message (e.g., describing protections on a per-correspondent or a per-thread basis) is likely to stumble on mixed statuses, particularly when other users switch to e-mail clients that don't provide the same cryptographic protections, or when people are added to or removed from a thread.

Narrower than per-message (e.g., describing protections on a per-MIME-part basis, or even within a MIME part) is too confusing: most users do not understand the structure of an e-mail message at a technical level, and are unlikely to be able to (or want to) spend any time learning about it. And a message with some cryptographic protection and other tamperable user-facing parts is a tempting vector for attack.

So at most, an e-mail should have one cryptographic state that covers the entire message.

At most, the user probably wants to know:

Is the content of this message known only to me and the sender (and the other people in Cc)? (Confidentiality)
Did this message come from the person I think it came from, as they wrote it? (Integrity and Authenticity)

Any more detail than this is potentially confusing or distracting.

Combinations

Is it possible to combine the two aspects described above into something even simpler? That would be nice, because it would allow us to categorize a message as either "protected" or "not protected". But there are four possible combinations:

unsigned cleartext messages: these are clearly "not protected"
signed encrypted messages: these are clearly "protected" (though see further sections below for more troubling caveats)
signed cleartext messages: these are useful in cases where confidentiality is irrelevant -- posts to a publicly-archived mailing list, for example, or announcement e-mails about a new version of some piece of software. It's hard to see how we can get away with ignoring this category.
unsigned encrypted messages: there are people who send encrypted messages who don't want to sign those messages, for a number of reasons (e.g., concern over the reuse/misuse of their signing key, and wanting to be able to send anonymous messages). Whether or not you think those reasons are valid, some signed messages cannot be validated. For example:
- the signature was made improperly,
- the signature was made with an unknown key,
- the signature was made using an algorithm the message recipient doesn't know how to interpret, or
- the signature was made with a key that the recipient believes is broken/bad.
We have to handle receipt of signed+encrypted messages with any of these signature failures, so we should probably deal with unsigned encrypted messages in the same way.

My conclusion is that we need to be able to represent these states separately to the user (or at least to the MUA, so it can plan sensible actions), even though i would prefer a simpler representation.
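To make the four states concrete, here is a minimal Python sketch. `SignatureCheck`, `MessageState`, and `message_state()` are hypothetical names for illustration, not part of any existing MUA; the sketch assumes signature verification has already happened elsewhere, and it folds every kind of validation failure into the same outcome as having no signature at all.

```python
from dataclasses import dataclass
from enum import Enum


class SignatureCheck(Enum):
    """Hypothetical result of checking a message-wide signature."""
    ABSENT = "absent"  # no signature at all
    VALID = "valid"    # verified with an acceptable key for the sender
    FAILED = "failed"  # malformed, unknown key, unknown algorithm, or distrusted key


@dataclass(frozen=True)
class MessageState:
    signed: bool     # integrity and authenticity
    encrypted: bool  # confidentiality

    def describe(self) -> str:
        # Four distinct states, kept separate rather than collapsed into
        # a single "protected"/"not protected" flag.
        return {
            (False, False): "unsigned cleartext",
            (True, False): "signed cleartext",
            (False, True): "unsigned encrypted",
            (True, True): "signed encrypted",
        }[(self.signed, self.encrypted)]


def message_state(signature: SignatureCheck, encrypted: bool) -> MessageState:
    # A signature that cannot be validated is treated exactly like no
    # signature, so signed+encrypted mail with a failed signature lands in
    # the same state as unsigned encrypted mail.
    return MessageState(signed=signature is SignatureCheck.VALID, encrypted=encrypted)
```

The MUA (or its UI) can then plan actions from `MessageState` without ever needing to know which particular signature failure occurred.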

Note that some other message encryption schemes (such as those based on shared symmetric keying material, where message signatures are not used for authenticity) may not actually need these distinctions, and can therefore get away with the simpler "protected/not protected" message state. I am unaware of any such scheme being used for e-mail today.

Partial protections

Sadly, the current encrypted e-mail mechanisms are likely to make even these two proposed indicators blurry if we try to represent them in detail. To avoid adding to user confusion, we need to draw some bright lines.

For integrity and authenticity, either the entire message is signed and integrity-checked, or it isn't. We must not report messages as being signed when only a part of the message is signed, or when the signature comes from someone not in the From: field. We should probably also not present "broken signature" status any differently than we present unsigned mail. See discussion on the enigmail mailing list about some of these tradeoffs.
For confidentiality, the user likely cares that the entire message was confidential. But there are some circumstances (e.g., when replying to an e-mail, and deciding whether to encrypt or not) when they likely care if any part of the message was confidential (e.g. if an encrypted part is placed next to a cleartext part).
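A minimal sketch of these two bright lines, assuming the MUA has already computed (from its own MIME walk) the address bound to a validated signature covering the whole message, plus a per-leaf flag saying whether each user-visible part was encrypted. Both inputs and both function names are hypothetical.

```python
from email.utils import parseaddr
from typing import List, Optional, Tuple


def integrity_status(whole_message_signer: Optional[str], from_header: str) -> bool:
    """Report "signed" only when a valid signature covers the whole message
    and the signing address matches the From: header.

    whole_message_signer is a hypothetical input: the address bound to a
    validated signature over the entire message, or None if there is no
    such signature (absent, broken, or covering only some parts).
    """
    if whole_message_signer is None:
        return False  # broken or partial signatures look the same as unsigned
    _, from_addr = parseaddr(from_header)
    if not from_addr:
        return False  # no usable From: address to match against
    # Simplification: compare whole addresses case-insensitively.
    return whole_message_signer.casefold() == from_addr.casefold()


def confidentiality_status(leaf_encrypted: List[bool]) -> Tuple[bool, bool]:
    """Return (fully_confidential, any_part_confidential), given one flag
    per user-visible leaf part of the message."""
    fully = bool(leaf_encrypted) and all(leaf_encrypted)
    partially = any(leaf_encrypted)
    return fully, partially
```

The second value of `confidentiality_status` is the one a MUA might consult when deciding whether a reply should be encrypted.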

It's interesting (and frustrating!) to note that these are scoped slightly differently -- that we might care about partial confidentiality but not about partial integrity and authenticity.

Note that while we might care about partial confidentiality, actually representing which parts of a message were confidential represents a significant UI challenge in most MUAs.

To the extent that a MUA decides it wants to display details of a partially-protected message, i recommend that the MUA strip/remove all non-protected parts of the message, and just show the user the (remaining) protected parts. In the event that a message has partial protections like this, the MUA may need to offer the user a choice of seeing the entire partially-protected message, or the stripped-down message that has complete protections.
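The stripping recommendation can be sketched as a recursive prune over a toy MIME tree. The `Part` class and its `protected` flag are assumptions made for illustration; a real MUA would derive that flag from its own parsing and verification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Part:
    content_type: str
    # Hypothetical flag: True when this whole subtree sits inside complete
    # cryptographic protection.
    protected: bool = False
    children: List["Part"] = field(default_factory=list)


def strip_unprotected(part: Part) -> Optional[Part]:
    """Return a copy of the tree containing only protected subtrees, or None
    if nothing protected remains."""
    if part.protected:
        return part  # keep protected subtrees intact
    kept = [s for s in (strip_unprotected(c) for c in part.children) if s is not None]
    if not kept:
        return None  # an unprotected leaf, or a container with nothing protected
    return Part(part.content_type, protected=False, children=kept)
```

The original tree is left untouched, so the MUA can still offer the user the full partially-protected view alongside the stripped one.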

To the extent that we expect to see partially-protected messages in the real world, further UI/UX exploration would be welcome. It would be great to imagine a world where those messages simply don't exist though :)

Cryptographic Mechanism

There are three major categories of cryptographic protection for e-mail in use today: Inline PGP, PGP/MIME, and S/MIME.

Inline PGP

I've argued elsewhere (and it remains true) that Inline PGP signatures are terrible. Inline PGP encryption is also terrible, but in different ways:

it doesn't protect the structure of the message (e.g., the number and size of attachments is visible)
it has no way of protecting confidential message headers (see the Protected Headers section below)
it is very difficult to safely represent to the user what has been encrypted and what has not, particularly if the message body extends beyond the encrypted block.

No MUA should ever emit messages using inline PGP, either for signatures or for encryption. And no MUA should ever display an inline-PGP-signed block as though it was signed. Don't even bother to validate such a signature.

However, some e-mails will arrive using inline PGP encryption, and responsible MUAs probably need to figure out what to show to the user in that case, because the user wants to know what's there. :/
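One small piece of handling inline PGP safely is noticing whether cleartext surrounds the armored block, since that text cannot honestly be shown as encrypted. This sketch only looks for the first `-----BEGIN PGP MESSAGE-----`/`-----END PGP MESSAGE-----` pair; the function names are hypothetical, and it does not decrypt anything.

```python
from typing import Optional, Tuple

BEGIN = "-----BEGIN PGP MESSAGE-----"
END = "-----END PGP MESSAGE-----"


def split_inline_pgp(body: str) -> Optional[Tuple[str, str, str]]:
    """Split a text body into (text before, armored block, text after).

    Only the first armored block is found; returns None when there is no
    complete block.
    """
    start = body.find(BEGIN)
    if start == -1:
        return None
    end = body.find(END, start)
    if end == -1:
        return None
    end += len(END)
    return body[:start], body[start:end], body[end:]


def has_cleartext_around_block(body: str) -> bool:
    """True if non-whitespace text appears outside the first encrypted block.

    A second armored block also lands in the "after" text, so it is
    conservatively reported as surrounding cleartext too.
    """
    parts = split_inline_pgp(body)
    if parts is None:
        return False
    before, _, after = parts
    return bool(before.strip() or after.strip())
```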

PGP/MIME and S/MIME

PGP/MIME and S/MIME are roughly equivalent to one another, with the largest difference being their certificate format. PGP/MIME messages are signed/encrypted with certificates that follow the OpenPGP specification, while S/MIME messages rely on certificates that follow the X.509 specification.

The cryptographic protections of both PGP/MIME and S/MIME work at the MIME layer, providing particular forms of cryptographic protection around a subtree of other MIME parts.

Both standards have very similar existing flaws that must be remedied or worked around in order to have a sensible user experience for encrypted mail.

This document has no preference for one message format over the other, but acknowledges that it's likely that both will continue to exist for quite some time. To the extent possible, a sensible MUA that wants to provide the largest coverage will be able to support both message formats and both certificate formats, hopefully with the same fixes to the underlying problems.

Cryptographic Envelope

Given that the plausible standards (PGP/MIME and S/MIME) both work at the MIME layer, it's worth thinking about the MIME structure of a cryptographically-protected e-mail message. I introduce here two terms related to an e-mail message: the "Cryptographic Envelope" and the "Cryptographic Payload".

Consider the MIME structure of a simple cleartext PGP/MIME signed message:

0A └┬╴multipart/signed
0B  ├─╴text/plain
0C  └─╴application/pgp-signature

Consider also the simplest PGP/MIME encrypted message:

1A └┬╴multipart/encrypted
1B  ├─╴application/pgp-encrypted
1C  └─╴application/octet-stream
1D     ╤ <<decryption>>
1E     └─╴text/plain

Or, an S/MIME encrypted message:

2A └─╴application/pkcs7-mime; smime-type=enveloped-data
2B     ╤ <<decryption>>
2C     └─╴text/plain

Note that the PGP/MIME decryption step (denoted "1D" above) may also include a cryptographic signature that can be verified, as a part of that decryption. This is not the case with S/MIME, where the signing layer is always separated from the encryption layer.

Also note that any of these layers of protection may be nested, like so:

3A └┬╴multipart/encrypted
3B  ├─╴application/pgp-encrypted
3C  └─╴application/octet-stream
3D     ╤ <<decryption>>
3E     └┬╴multipart/signed
3F      ├─╴text/plain
3G      └─╴application/pgp-signature

For an e-mail message that has some set of these layers, we define the "Cryptographic Envelope" as the layers of cryptographic protection that start at the root of the message and extend until the first non-cryptographic MIME part is encountered.

Cryptographic Payload

We can call the first non-cryptographic MIME part we encounter (via depth-first search) the "Cryptographic Payload". In the examples above, the Cryptographic Payload parts are labeled 0B, 1E, 2C, and 3F. Note that the Cryptographic Payload itself could be a multipart MIME object, like 4E below:

4A └┬╴multipart/encrypted
4B  ├─╴application/pgp-encrypted
4C  └─╴application/octet-stream
4D     ╤ <<decryption>>
4E     └┬╴multipart/alternative
4F      ├─╴text/plain
4G      └─╴text/html

In this case, the full subtree rooted at 4E is the "Cryptographic Payload".

The cryptographic properties of the message should be derived from the layers in the Cryptographic Envelope, and nothing else, in particular:

the cryptographic signature associated with the message, and
whether the message is "fully" encrypted or not.
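Finding the Cryptographic Envelope and Payload amounts to walking down from the root for as long as each part is a cryptographic layer. Here is a minimal sketch over a toy tree; the `Part` class, its `decrypted` field (standing in for the result of a successful decryption), and the set of recognized content types are simplifying assumptions, not a full PGP/MIME or S/MIME parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SIGNED = "multipart/signed"
ENCRYPTED = {"multipart/encrypted", "application/pkcs7-mime; smime-type=enveloped-data"}


@dataclass
class Part:
    content_type: str
    children: List["Part"] = field(default_factory=list)
    # Hypothetical: the cleartext subtree produced by successful decryption.
    decrypted: Optional["Part"] = None


def envelope_and_payload(root: Part) -> Tuple[List[str], Part]:
    """Walk down from the root while parts are cryptographic layers.

    Returns the Cryptographic Envelope (outermost layer first) and the
    Cryptographic Payload.
    """
    layers: List[str] = []
    part = root
    while True:
        if part.content_type == SIGNED and part.children:
            layers.append("signed")
            part = part.children[0]  # first child is the signed content
        elif part.content_type in ENCRYPTED and part.decrypted is not None:
            layers.append("encrypted")
            part = part.decrypted
        else:
            return layers, part  # first non-cryptographic part: the payload
```

Applied to example 3 above, this yields layers `["encrypted", "signed"]` and payload 3F; applied to example 5, it yields an empty (null) envelope with 5A as the payload. Only the returned `layers` would feed the message-wide signature and encryption status.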

Note that if some subpart of the message is protected, but the cryptographic protections don't start at the root of the MIME structure, there is no message-wide cryptographic envelope, and therefore there either is no Cryptographic Payload, or (equivalently) the whole message (5A here) is the Cryptographic Payload, but with a null Cryptographic Envelope:

5A └┬╴multipart/mixed
5B  ├┬╴multipart/signed
5C  │├─╴text/plain
5D  │└─╴application/pgp-signature
5E  └─╴text/plain

Note also that if there are any nested encrypted parts, they do not count toward the Cryptographic Envelope, but may mean that the message is "partially encrypted", albeit with a null Cryptographic Envelope:

6A └┬╴multipart/mixed
6B  ├┬╴multipart/encrypted
6C  │├─╴application/pgp-encrypted
6D  │└─╴application/octet-stream
6E  │   ╤ <<decryption>>
6F  │   └─╴text/plain
6G  └─╴text/plain
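Distinguishing "fully" from "partially" encrypted can be sketched as follows, assuming the envelope walk has already reported whether it passed through an encryption layer. The `Part` class and the recognized content types here are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List

ENCRYPTED = {"multipart/encrypted", "application/pkcs7-mime; smime-type=enveloped-data"}


@dataclass
class Part:
    content_type: str
    children: List["Part"] = field(default_factory=list)


def contains_encrypted(part: Part) -> bool:
    """True if this part or any descendant is an encrypted MIME part."""
    return part.content_type in ENCRYPTED or any(contains_encrypted(c) for c in part.children)


def encryption_status(envelope_has_encryption: bool, payload: Part) -> str:
    if envelope_has_encryption:
        return "fully encrypted"
    if contains_encrypted(payload):
        # Null envelope, but an encrypted subpart exists (like 6B).
        return "partially encrypted"
    return "not encrypted"
```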

Layering within the Envelope

The order and number of the layers in the Cryptographic Envelope might make a difference in how the message's cryptographic properties should be considered.

signed+encrypted vs encrypted+signed

One difference is whether the signature is made over the encrypted data, or whether the encryption is done over the signature. Encryption around a signature means that the signature was hidden from an adversary. And a signature around the encryption indicates that the sender may not know the actual contents of what was signed.

The common expectation is that the signature will be inside the encryption. This means that the signer likely had access to the cleartext, and it is likely that the existence of the signature is hidden from an adversary, both of which are sensible properties to want.
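A sketch of checking for the expected ordering, assuming the envelope has been summarized as a list of layer kinds, outermost first. Modeling a PGP/MIME combined sign-and-encrypt step as `["encrypted", "signed"]` is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import List


def signature_inside_encryption(layers: List[str]) -> bool:
    """layers lists the Cryptographic Envelope outermost first,
    e.g. ["encrypted", "signed"]."""
    if "encrypted" not in layers:
        return False
    first_encryption = layers.index("encrypted")
    # Any signature nested within the outermost encryption layer counts.
    return "signed" in layers[first_encryption + 1:]
```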

Multiple layers of signatures or encryption

Some specifications define triple-layering: signatures around encryption around signatures. It's not clear that this is in wide use, or how any particular MUA should present such a message to the user.

In the event that there are multiple layers of protection of a given kind in the Cryptographic Envelope, the message should be marked based on the properties of the inner-most layer of encryption, and the inner-most layer of signing. The main reason for this is simplicity -- it is unclear how to indicate arbitrary (and potentially-interleaved) layers of signatures and encryption.

(FIXME: what should be done if the inner-most layer of signing can't be validated for some reason, but one of the outer layers of signing does validate? ugh MIME is too complex…)
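The inner-most rule can be sketched like this. Each envelope layer is assumed to be a `(kind, signer)` pair, outermost first, where `signer` is the validated signing address or `None` for a failed signature (and always `None` for encryption layers). Note that the sketch simply ignores outer signatures, even valid ones, which is one possible answer to the FIXME above, not a settled one.

```python
from typing import List, Optional, Tuple

# (kind, signer): kind is "signed" or "encrypted"; signer is the validated
# signing address, or None when the signature failed or for encryption.
Layer = Tuple[str, Optional[str]]


def innermost(layers: List[Layer], kind: str) -> Optional[Layer]:
    """Return the inner-most layer of the given kind, or None."""
    for layer in reversed(layers):
        if layer[0] == kind:
            return layer
    return None


def mark_message(layers: List[Layer]) -> Tuple[Optional[str], bool]:
    """Return (signer shown to the user, encrypted?) using only the
    inner-most layer of each kind."""
    signing = innermost(layers, "signed")
    # Outer signatures are ignored here even when they validate.
    signer = signing[1] if signing is not None else None
    return signer, innermost(layers, "encrypted") is not None
```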

Signed messages should indicate the intended recipient

Ideally, all signed messages would indicate their intended recipient as a way of defending against some forms of replay attack. For example, Alice sends a signed message to Bob that says "please perform task X"; Bob reformats and forwards the message to Charlie as though it was directly from Alice. Charlie might now believe that Alice is asking him, rather than Bob, to do task X.

Of course, this concern also includes encrypted messages that are also signed. However, there is no clear standard for how to include this information in either an encrypted message or a signed message.

An e-mail-specific mechanism is to ensure that the To: and Cc: headers are signed appropriately (see "Protected Headers" below).

See also Vincent Breitmoser's proposal of Intended Recipient Fingerprint for OpenPGP as a possible OpenPGP-specific implementation.

However: what should the MUA do if a message is encrypted but no intended recipients are listed? Or what if a signature clearly indicates the intended recipients, but does not include the current reader? Should the MUA render the message differently somehow?
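Whatever the answer to the UI question, the check itself is simple once the To: and Cc: headers are protected. A minimal sketch using Python's standard `email.utils.getaddresses`; the function name and the case-insensitive comparison of whole addresses are assumptions.

```python
from email.utils import getaddresses
from typing import List


def reader_is_intended_recipient(protected_to: List[str], protected_cc: List[str],
                                 reader: str) -> bool:
    """Check whether the reader's address appears in the signed To:/Cc: headers.

    protected_to and protected_cc are the raw header values taken from the
    protected (signed) copy of the headers, not from the outside of the message.
    """
    listed = {addr.casefold() for _, addr in getaddresses(protected_to + protected_cc) if addr}
    return reader.casefold() in listed
```

When this returns `False` (for example, Charlie reading Alice's message in the scenario above), the MUA has a concrete signal to act on, even if how to render it remains an open question.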

Protected Headers

Sadly, e-mail cryptographic protections have traditionally only covered the body of the e-mail, and not the headers. Most users do not (and should not have to) understand the difference. There are two not-quite-standards for protecting the headers:

message wrapping, which puts an entire e-mail message (message/rfc822 MIME part) "inside" the cryptographic protections. This is also discussed in RFC 5751 §3.1. I don't know of any MUAs that implement this.
memory hole, which puts headers on the top-level MIME part directly. This is implemented in Enigmail and K-9 Mail.

These two different mechanisms are roughly equivalent, with slight differences in how they behave for clients who can handle cryptographic mail but have not implemented them. If a MUA is capable of interpreting one form successfully, it probably is also capable of interpreting the other.

Note that in particular, the cryptographic headers for a given message ought to be derived directly from the headers present (in one of the above two ways) in the root element of the Cryptographic Payload MIME subtree itself. If headers are stored anywhere else (e.g. in one of the leaf nodes of a complex Payload), they should not propagate to the outside of the message.
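A sketch of reading protected headers only from the root of the Cryptographic Payload, in the memory hole style, using Python's standard `email` package. It assumes the MUA already has the decrypted payload as text; the choice of headers in `USER_FACING` is illustrative, and message wrapping (a message/rfc822 part) is not handled.

```python
from email import message_from_string
from email.policy import default
from typing import Dict

USER_FACING = ("From", "To", "Cc", "Date", "Subject")


def protected_headers(payload_text: str) -> Dict[str, str]:
    """Read user-facing headers from the root of the Cryptographic Payload only.

    Headers on nested parts are never consulted, so they cannot propagate
    to the message-wide display.
    """
    payload = message_from_string(payload_text, policy=default)
    return {name: str(payload[name]) for name in USER_FACING if payload[name] is not None}
```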

If the headers the user sees are not protected, that lack of protection may need to be clearly explained and visible to the user. This is unfortunate because it is potentially extremely complex for the UI.

The types of cryptographic protections can differ per header. For example, it's relatively straightforward to pack all of the headers inside the Cryptographic Payload. For a signed message, this would mean that all headers are signed. This is the recommended approach when generating an encrypted message. In this case, the "outside" headers simply match the protected headers. And in the case that the outside headers differ, they can simply be replaced with their protected versions when displayed to the user. This defeats the replay attack described above.
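Replacing outside headers with their protected versions, while also noting which displayed headers lack protection, might look like the following. The header dictionaries are assumed to use consistent capitalization of header names, and the function name is hypothetical.

```python
from typing import Dict, Set, Tuple


def headers_for_display(outer: Dict[str, str],
                        protected: Dict[str, str]) -> Tuple[Dict[str, str], Set[str], Set[str]]:
    """Return (headers to show, names whose outside value differed,
    names that exist only on the unprotected outside)."""
    shown = dict(outer)
    shown.update(protected)  # the protected copy always wins
    differing = {n for n in protected if n in outer and outer[n] != protected[n]}
    unprotected = set(outer) - set(protected)
    return shown, differing, unprotected
```

The `unprotected` set is what a MUA would need if it decides to make the lack of protection on some headers visible to the user.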

But for an encrypted message, some of those protected headers will be stripped from the outside of the message, and others will be placed in the outer header in cleartext for the sake of deliverability. In particular, From: and To: and Date: are placed in the clear on the outside of the message.

So, consider a MUA that receives an encrypted, signed message, with all headers present in the Cryptographic Payload (so all headers are signed), but From: and To: and Date: in the clear on the outside. Assume that the external Subject: reads simply "Encrypted Message", but the internal (protected) Subject: is actually "Thursday's Meeting".

When displaying this message, how should the MUA distinguish between the Subject: and the From: and To: and Date: headers? All headers are signed, but only Subject: has been hidden. Should the MUA assume that the user understands that e-mail metadata like this leaks to the MTA? This is unfortunately true today, but not something we want in the long term.

Message-ID and threading headers

Messages that are part of an e-mail thread should ensure that Message-Id: and References: and In-Reply-To: are signed, because those markers provide contextual considerations for the signed content. (e.g., a signed message saying "I like this plan!" means something different depending on which plan is under discussion).

That said, given the state of the e-mail system, it's not clear what a MUA should do if it receives a cryptographically-signed e-mail message where these threading headers are not signed. That is the default today, and we do not want to incur warning fatigue for the user. Furthermore, unlike Date: and Subject: and From: and To: and Cc:, the threading headers are not usually shown directly to the user, but instead affect the location and display of messages.

Perhaps there is room here for some indicator at the thread level, that all messages in a given thread are contextually well-bound? Ugh, more UI complexity.

Protecting Headers during e-mail generation

When generating a cryptographically-protected e-mail (either signed or encrypted or both), the sending MUA should copy all of the headers it knows about into the Cryptographic Payload using one of the two techniques referenced above. For signed-only messages, that is all that needs doing.
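A minimal memory-hole-style sketch using Python's `email.message.EmailMessage` and `MIMEPart`: it copies headers from the message being composed onto the root part of the Cryptographic Payload before that part is signed or encrypted. The `HEADERS_TO_PROTECT` list is an assumption; the text above says the MUA should copy all of the headers it knows about.

```python
from email.message import EmailMessage, MIMEPart

# Assumed list; a real MUA would copy every header it knows about.
HEADERS_TO_PROTECT = ("From", "To", "Cc", "Date", "Subject",
                      "Message-ID", "In-Reply-To", "References")


def copy_headers_into_payload(message: EmailMessage, payload: MIMEPart) -> MIMEPart:
    """Copy user-facing and threading headers onto the root part of the
    Cryptographic Payload, before it is signed and/or encrypted."""
    for name in HEADERS_TO_PROTECT:
        del payload[name]  # avoid duplicate copies if the payload already has one
        for value in message.get_all(name, []):
            payload[name] = value
    return payload
```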

The challenging question is for encrypted messages: what headers on the outside of the message (outside the Cryptographic Envelope) can be stripped (removed completely) or stubbed (replaced with a generic or randomized value)?

Subject: should obviously be stubbed -- for most users, the subject is directly associated with the body of the message (it is not thought of as metadata), and the Subject is not needed for deliverability. Since some MTAs might treat a message without a Subject: poorly, and arbitrary Subject lines are a nuisance, it is recommended to use the exact string below for all external Subjects:

Subject: Encrypted Message

However, stripping or stubbing other headers is more complex.

The Date: header can likely be stripped from the outside of an encrypted message, or can have its temporal resolution made much coarser. However, this doesn't protect much information from the MTAs that touch the message, since they are likely to see the message when it is in transit. It may protect the message from some metadata analysis as it sits on disk, though.
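For a MUA experimenting with a coarser Date:, one option is truncating the timestamp to the day in UTC. This sketch uses the standard `email.utils.parsedate_to_datetime` and `format_datetime`; day-level resolution is an arbitrary choice, the function name is hypothetical, and it assumes the Date: header carries an explicit UTC offset.

```python
from datetime import timezone
from email.utils import format_datetime, parsedate_to_datetime


def coarse_outer_date(date_header: str) -> str:
    """Truncate a Date: value to midnight UTC of the same UTC day."""
    moment = parsedate_to_datetime(date_header).astimezone(timezone.utc)
    day = moment.replace(hour=0, minute=0, second=0, microsecond=0)
    return format_datetime(day, usegmt=True)
```

For example, `coarse_outer_date("Thu, 15 Mar 2018 14:23:05 +0100")` gives `"Thu, 15 Mar 2018 00:00:00 GMT"`; the precise Date: would still travel inside the Cryptographic Payload.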

The To: and Cc: headers could be stripped entirely in some cases, though that may make the e-mail more prone to being flagged as spam. However, some e-mail messages sent to Bcc groups are still deliverable, with a header of

To: undisclosed-recipients:;

Note that the Cryptographic Envelope itself may leak metadata about the recipient (or recipients), so stripping this information from the external header may not be useful unless the Cryptographic Envelope is also stripped of metadata appropriately.

The From: header could also be stripped or stubbed. It's not clear whether such a message would be deliverable, particularly given DKIM and DMARC rules for incoming domains. Note that the MTA will still see the SMTP MAIL FROM: verb before the message body is sent, and will use the address there to route bounces or DSNs. However, once the message is delivered, a stripped From: header is an improvement in the metadata available on-disk. Perhaps this is something that a friendly/cooperative MTA could do for the user?

Even worse is the Message-Id: header and the associated In-Reply-To: and References: headers. Some MUAs (like notmuch) rely heavily on the Message-Id:. A message with a stubbed-out Message-Id would effectively change its Message-Id: when it is decrypted. This may not be a straightforward or safe process for MUAs that are Message-ID-centric. That said, a randomized external Message-ID: header could help to avoid leaking the fact that the same message was sent to multiple people, so long as the message encryption to each person was also made distinct.
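If a MUA does experiment with a randomized external Message-ID:, each recipient's copy could get its own value. This sketch uses `secrets.token_hex` rather than `email.utils.make_msgid`, since the latter embeds a timestamp and process ID; the function name and the choice of domain are assumptions, and the real Message-ID: would still travel inside the Cryptographic Payload.

```python
import secrets
from typing import Dict, List


def outer_message_ids(recipients: List[str], domain: str) -> Dict[str, str]:
    """Give each recipient's copy its own random external Message-ID."""
    return {r: "<{}@{}>".format(secrets.token_hex(16), domain) for r in recipients}
```

This only avoids linking copies if each recipient's encrypted body is also distinct, as noted above.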

Stripped In-Reply-To: and References: headers are also a clear metadata win -- the MTA can no longer tell which messages are associated with each other. However, this means that an incoming message cannot be associated with a relevant thread without decrypting it, something that some MUAs may not be in a position to do.

Recommendation for encrypted message generation in 2018: copy all headers during message generation; stub out only the Subject for now.
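Following that recommendation, the external headers could be derived as below: copy everything from the composed message, drop the Content-* and MIME-Version: headers that describe the cleartext body (the encrypted envelope supplies its own), and replace Subject: with the fixed stub. This is a sketch using Python's `email` package; `outer_headers` is a hypothetical helper.

```python
from email.message import EmailMessage
from typing import List, Tuple

STUB_SUBJECT = "Encrypted Message"


def outer_headers(message: EmailMessage) -> List[Tuple[str, str]]:
    """Headers for the outside of an encrypted message: copy everything,
    stub out only Subject:."""
    out: List[Tuple[str, str]] = []
    for name, value in message.items():
        lower = name.lower()
        if lower.startswith("content-") or lower == "mime-version":
            continue  # these describe the cleartext body, not the envelope
        if lower == "subject":
            continue  # replaced by the stub below
        out.append((name, str(value)))
    out.append(("Subject", STUB_SUBJECT))
    return out
```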

Bold MUAs may choose to experiment with stripping or stubbing other fields beyond Subject:, possibly in response to some sort of signal from the recipient that they believe that stripping or stubbing some headers is acceptable. Where should such a signal live? Perhaps a notation in the recipient's certificate would be useful.

Key management

Key management bedevils every cryptographic scheme, e-mail or otherwise. The simplest solution for users is to automate key management as much as possible, making reasonable decisions for them. The Autocrypt project outlines a sensible approach here, so i'll leave most of this section short and hope that it's covered by Autocrypt. While fully-automated key management is likely to be susceptible either to MITM attacks or trusted third parties (depending on the design), as a community we need to experiment with ways to provide straightforward (possibly gamified?) user experience that enables and encourages people to do key verification in a fun and simple way. This should probably be done without ever mentioning the word "key", if possible. Serious UI/UX work is needed. I'm hoping future versions of Autocrypt will cover that territory.

But however key management is done, the result for the e-mail user experience is that the MUA will have some sense of the "validity" of a key being used for any particular correspondent. If it is expressed at all, it should be done as simply as possible by default. In particular, MUAs should avoid confusing the user with distinct (nearly orthogonal) notions of "trust" and "validity" while reading messages, and should not necessarily associate the validity of a correspondent's key with the validity of a message cryptographically associated with that correspondent's key. Identity is not the same thing as message integrity, and trustworthiness is not the same thing as identity either.

Key changes over time

Key management is hard enough in the moment. With a store-and-forward system like e-mail, evaluating the validity of a signed message a year after it was received is tough. Your concept of the correspondent's correct key may have changed, for example. I think our understanding of what to do in this context is not currently clear.

dkg's blog

E-mail Cryptography