MLS and Federated Messaging

Loki Messenger Group Messaging

Over the last few years we have seen the rapid uptake of encrypted messengers like WhatsApp, Wickr, and Signal, and an increase in the availability of encrypted conversations in Telegram and Facebook Messenger. This adoption has been successful because these companies have been able to deliver a user experience comparable to that of unencrypted messengers. Although many person-to-person chats are now encrypted, chat applications have struggled to replicate this in the context of group chats.

There are a number of ways Loki Messenger can accomplish encrypted group messaging. Broadly speaking, there are two main factors that must be considered when designing the Loki Group Messaging architecture:

  1. Scaling – How intensive is the proposed scheme on the Service Node network and the messaging clients?
  2. Encryption – How secure is the channel, and what features does the proposed scheme have? What impact does the proposed scheme have on scalability?

How we approach these two factors will determine the way users interact with the system, and based on our research there are several options available.

In this article, we use some terms that you may not be familiar with:

  • Synchronous Messaging: Also known as ‘online’ messaging – where the parties must be online and actively interacting in order to send and receive messages.
  • Asynchronous Messaging: Also known as ‘offline’ messaging – where the parties do not have to be online in order to send and receive messages.
  • Client Side Fanout: The client sends a separate copy of each message to every other member of the group chat.
  • Server Side Fanout: The client sends each message once to a server, which then forwards it to all participants of the group chat.

For messaging with only two participants, Loki Messenger has both an Asynchronous mode using Swarm storage on the Service Node network, and a faster Synchronous mode which allows users to send messages directly to each other via Lokinet.  

With that explained, let’s move on to the potential schemes:

Scaling

Full P2P

This scheme is similar to the Tox model, where a central server is never established. In a fully P2P (Peer to Peer) propagation scheme, the members of a group must send messages to all other members of the group. In Tox, this is done by arranging the public keys of the members in the group into a ring, and each message is sent to the two mathematically closest members to the sender’s public key. Those peers should then repeat the process until the message is propagated across the entire group. [1] This protocol could also be extended so that users who were offline for a period of time could request the group chat history from a member in the group.
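To make the ring idea concrete, here is a minimal sketch of how a message could spread when every member forwards it to its two neighbours on a ring of sorted public keys. It is a simplification for illustration, not the actual Tox implementation: "closest" is assumed to mean adjacent in sorted order, and the names ring_neighbours and propagate, as well as the made-up member keys, are hypothetical.

```python
import hashlib


def ring_neighbours(public_keys, sender):
    """Return the two members adjacent to `sender` on the sorted key ring."""
    ring = sorted(public_keys)
    i = ring.index(sender)
    return ring[(i - 1) % len(ring)], ring[(i + 1) % len(ring)]


def propagate(public_keys, sender):
    """Flood a message around the ring; return {member: hops from sender}."""
    hops = {sender: 0}
    frontier = [sender]
    while frontier:
        next_frontier = []
        for peer in frontier:
            for neighbour in ring_neighbours(public_keys, peer):
                if neighbour not in hops:  # forward only to peers that lack it
                    hops[neighbour] = hops[peer] + 1
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return hops


# Hypothetical 8-member group; the "keys" are just hashes of made-up names.
keys = [hashlib.sha256(f"member-{n}".encode()).hexdigest() for n in range(8)]
hops = propagate(keys, keys[0])
print(len(hops), max(hops.values()))  # 8 members reached, farthest is 4 hops away
```

In this model the farthest member is about half the ring away, so delivery time grows with group size, and every hop depends on the intermediate peers being online.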

The issue with fully P2P systems is that they can be inefficient and are often unreliable. For example, if I create a group chat with two friends and both of them are offline, there is no way for me to send a message that they will receive when they come online, since there is no storage server. This is the major disadvantage of Synchronous messaging. Additionally, P2P messengers, including Tox, usually rely on some version of Client Side Fanout. For example, if I am in a large group of 200 people and I want to send a message to the group, the cumulative bandwidth that message consumes on client devices is far higher than it would be with Server Side Fanout.
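The difference in sender-side cost can be put into rough numbers. The sketch below assumes a hypothetical 1,000-byte message and a pure fanout model in which the sender either uploads one copy per other member or a single copy to a server. The function name sender_uploads is ours, and real protocols add headers and retries that are ignored here.

```python
def sender_uploads(group_size, message_bytes, fanout):
    """Bytes the sending client uploads for one group message."""
    if fanout == "client":
        # One copy per other member of the group.
        return (group_size - 1) * message_bytes
    if fanout == "server":
        # One copy to the server, which relays it to everyone else.
        return message_bytes
    raise ValueError(f"unknown fanout mode: {fanout}")


client_cost = sender_uploads(200, 1_000, "client")
server_cost = sender_uploads(200, 1_000, "server")
print(client_cost, server_cost)  # 199000 1000
```

For the 200-person group in the example, the sender uploads 199 copies under Client Side Fanout and one copy under Server Side Fanout.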

Service Node Storage

In this model, any Service Node could be elected as a central server for the group chat. Users could establish a unique code specific to their group chat which would dictate which Service Node was responsible for storing messages. This code would then be given to each new user who wanted to join the chat.
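How the code would dictate the responsible Service Node is not specified here. One possible approach, sketched below purely as an assumption, is to hash the code and use the result to index into a list of nodes that all members agree on. The function name, node identifiers, and group code are hypothetical.

```python
import hashlib


def responsible_node(group_code, service_nodes):
    """Pick one Service Node deterministically from a shared group code."""
    digest = hashlib.sha256(group_code.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(service_nodes)
    # Sort first so every member gets the same answer regardless of list order.
    return sorted(service_nodes)[index]


# Hypothetical node identifiers and group code, for illustration only.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
alice_view = responsible_node("example-group-code", nodes)
bob_view = responsible_node("example-group-code", list(reversed(nodes)))
print(alice_view == bob_view)  # True: both members derive the same node
```

A simple modulo mapping like this reassigns most groups whenever the node list changes, so a real design would need a more stable scheme.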

The Service Node model is highly scalable since each user only needs to send their message to the Service Node once, before it can be requested by the other participants in the group.

However, there are a few issues with this model. Firstly, the redundancy of the message storage is reduced, as the group becomes reliant on a single Service Node remaining online. This could be remedied by delegating group messaging to Service Node Swarms, but that would burden the Service Node network, as each node in the Swarm would have to manage and store a high volume of messages to provide redundancy for the group. It’s possible this could be offset by requiring payment to the Service Node network, or multi-output payment to the Swarm, when the group is created.

User Hosted Server

In this approach, Loki Messenger would allow the introduction of a simple storage server [2] which a user could host inside Lokinet (practically a SNApp). This server would give the operator a .loki address, and other users could use this address to access the operator’s server.

With the addition of Loki Name Service (LNS), this .loki address could become a human-readable name like “Protests.loki,” and higher-powered servers could allow users to host large chat rooms with a single domain and server.[3] Additionally, through LNS, the operator could add subdomains, for example “paris.Protests.loki”. These subdomains allow one host to create multiple rooms within their single server, all with separate group encryption keys so the host cannot eavesdrop on conversations within rooms. Each subdomain costs the price of a single name registration on LNS, and users would be able to negotiate payment with the server operator if they require a new room.

This option scales well compared to the other options, since each server operator provisions their own level of service depending on how many users they serve, and there is no additional stress put on the Service Node network to store copies of every message for each member of the group. However, User Hosted Servers also introduce an element of trust: the user must trust the operator to reliably store their messages. On the other hand, metadata, the major vector of passive attack when using a central server, is not exposed to the operator, because messages reach the server over onion routing (Lokinet).

It also opens some much larger questions about how malleable we can make the messaging protocol, and whether opening the protocol restricts the speed of development. “Reflections: The Ecosystem is Moving”, by one of the Signal founders, discusses this problem in detail.

Encryption

Now that we have covered some of the challenges with scaling, we should also investigate encryption.

Tox

Tox uses a custom implementation, as described below, which encrypts each message for each participant’s keys.[4]

“Groupchats use the NaCl/libsodium cryptography library for all cryptography related operations. All group communication is end-to-end encrypted. Message confidentiality, integrity, and repudiability are guaranteed via authenticated encryption, and perfect forward secrecy is also provided.

One of the most important security improvements from the old groupchat implementation is the removal of a message-relay mechanism that uses a group-wide shared key. Instead, connections are 1-to-1 (a complete graph), meaning an outbound message is sent once per peer, and encrypted/decrypted using a key unique to each peer.”

This scheme provides perfect forward secrecy (PFS) and deniability, but at the expense of poor scaling, so we should consider this a non-option for the reasons described in the scaling section. In terms of scalability, this solution is similar to using the Signal protocol in pairwise operation.

TextSecure V2/Sender Keys (Signal, WhatsApp)

The “sender keys” method is used by Signal and WhatsApp to encrypt messages for a group (discussion can be found on the Signal blog). This scheme provides a higher level of scalability, along with PFS and deniability, but it does not provide post-compromise security (PCS). This means that if sender keys are stolen or leaked, they can be used to impersonate the sender or read all future group messages. To remedy this, a new sender key message can be broadcast so that users can re-establish a secure group, but because these messages must be encrypted and sent to each user individually, this scales poorly. Additionally, users who leave the group force renegotiation of shared keys.
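The lack of PCS follows from how a symmetric hash ratchet works. The sketch below is a simplified illustration, not the Signal or WhatsApp implementation: it assumes each sender advances a chain key with HMAC-SHA256, the constants 0x01 and 0x02 and the helper names are our own choices, and signatures are left out entirely.

```python
import hashlib
import hmac


def ratchet(chain_key):
    """Derive a one-time message key and the next chain key from `chain_key`."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def message_keys(chain_key, count):
    """Return the next `count` message keys produced from `chain_key`."""
    keys = []
    for _ in range(count):
        key, chain_key = ratchet(chain_key)
        keys.append(key)
    return keys


# Hypothetical starting chain key; a real one would be random and secret.
start = hashlib.sha256(b"example sender chain key").digest()
sender_keys = message_keys(start, 5)

# An attacker steals the chain key after the sender's second message.
_, chain_after_1 = ratchet(start)
_, chain_after_2 = ratchet(chain_after_1)
attacker_keys = message_keys(chain_after_2, 3)

# The stolen chain key yields every later message key (messages 3 to 5).
print(attacker_keys == sender_keys[2:])  # True
```

Because HMAC is one-way, the stolen chain key does not reveal the keys for the first two messages, which is the forward secrecy property. Nothing in the ratchet ever mixes in fresh secret material, however, so the attacker can keep following the chain until a new sender key is distributed.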

The “sender keys” method tends to work best in smaller groups of fewer than 50 members. Once the group grows sufficiently, the additional bandwidth overhead of adding new users, removing departing users, and ordering messages degrades performance in production applications like Signal.[5] It is also questionable whether groups larger than 50-100 members actually require full PFS, because as the group size grows, fewer users verify the identity of added members out of band, and messages can be leaked by inserting a new member or compromising existing ones.

The “sender keys” scheme is the most easily accessible in the context of Loki Messenger, since Loki Messenger is based on the Signal application. It is also a well tested and implemented protocol with known security properties.

MLS With Tree-Based Diffie Hellman

Messaging Layer Security (MLS) is proposed as a new interoperable standard for encrypted messaging on the internet, similar to TLS, except that MLS specifically focuses on secure group messaging. MLS aims to provide PFS, deniable authentication, and PCS inside a group environment while still maintaining a high level of scalability.

The idea behind MLS is to use a tree-based Diffie Hellman in a group context. This approach is described in “On Ends-to-Ends Encryption: Asynchronous Group Messaging with Strong Security Guarantees”.[6]

By using the TreeKEM method, MLS provides much better scalability (shown below) when creating groups and dealing with members who leave groups. It also provides PCS, which the “sender keys” scheme does not. The full specification for MLS is still being discussed and iterated on by an IETF working group.[7] There are some reference implementations, but as yet no working end-to-end encrypted application has deployed MLS.

[Figure from https://eprint.iacr.org/2017/666.pdf. “Our solution” in the figure refers to tree-based DH using asynchronous ratchet trees.]
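The scaling advantage of a tree can be illustrated with a rough cost model. The sketch below assumes a balanced binary tree with one leaf per member, and compares the number of nodes on a leaf-to-root path with the number of individually encrypted messages a pairwise rekey needs. The function names are hypothetical, and the real MLS draft involves more than this count captures.

```python
def pairwise_rekey_messages(group_size):
    """Pairwise rekey: one individually encrypted message per other member."""
    return group_size - 1


def tree_path_length(group_size):
    """Nodes refreshed on a leaf-to-root path in a balanced binary tree."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float error.
    return (group_size - 1).bit_length()


for n in (8, 50, 200, 1000):
    print(n, pairwise_rekey_messages(n), tree_path_length(n))
# 8 7 3
# 50 49 6
# 200 199 8
# 1000 999 10
```

Under these assumptions the tree cost grows logarithmically with group size, while the pairwise cost grows linearly.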

MLS is a promising protocol. However, since it is still in the draft stages of development at the IETF, it may be unwise to begin work on an implementation until the protocol stabilises. Additionally, MLS is still working through some open problems, namely that it requires messages to maintain strict ordering, which can be difficult to achieve in large groups with many clients. MLS is also trying to address state corruption attacks, in which a malicious group member could deny service to other members of the group.

Conclusion

We’re still exploring the best way to go about group messaging in Loki Messenger. However, it seems likely that our first group messaging scheme will involve a user hosted server that utilises the existing “sender keys” encryption scheme to attain its security. We will continue to assess MLS as it becomes a more complete specification, with the hope of eventually providing an open-source implementation in Loki Messenger.
If you have any thoughts or questions to add to this discussion, join us on Discord.


References

[1] “The TokTok Project – Protocol.” https://toktok.ltd/spec.html#group. Accessed 6 Mar. 2019.

[2] “GitHub – loki-project/loki-storage-server: Storage server for Loki ….” https://github.com/loki-project/loki-storage-server. Accessed 6 Mar. 2019.

[3] “Loki Name Service · Issue #342 · loki-project/loki · GitHub.” 22 Nov. 2018, https://github.com/loki-project/loki/issues/342. Accessed 27 Mar. 2019.

[4] “The TokTok Project – Protocol.” https://toktok.ltd/spec.html. Accessed 8 Mar. 2019.

[5] “After leaving group user is still able to read new messages · Issue ….” 16 Oct. 2017, https://github.com/signalapp/Signal-Android/issues/7103. Accessed 26 Mar. 2019.

[6] “On Ends-to-Ends Encryption – Cryptology ePrint Archive.” https://eprint.iacr.org/2017/666.pdf. Accessed 12 Mar. 2019.

[7] “Messaging Layer Security (mls) – IETF Datatracker.” 7 Nov. 2018, https://datatracker.ietf.org/wg/mls/about/. Accessed 26 Mar. 2019.