Protocol Design Whitepaper
I AM LOOKING AT THE FOLLOWING ASPECTS OF ZEROTIER VL1 (Virtual Layer 1)
Introduction
ZeroTier is a smart Ethernet switch for planet Earth.
It’s a distributed network hypervisor built atop a cryptographically secure global peer to peer network. It provides advanced network virtualization and management capabilities on par with an enterprise SDN switch, but across both local and wide area networks and connecting almost any kind of app or device.
This manual describes the design and operation of ZeroTier and its associated services, apps, and libraries. Its intended audience includes IT professionals, network administrators, information security experts, and developers.
The first section (2) of this guide explains ZeroTier’s design and operation at a high level and is written for those with at least an intermediate knowledge of topics like TCP/IP and Ethernet networking. It’s not required reading for most users, but understanding how things work in detail helps clarify everything else and helps tremendously with troubleshooting should anything go wrong.
The remaining sections deal more concretely with deployment and administration.
Network Hypervisor Overview
The ZeroTier network hypervisor (currently found in the node/ subfolder of the ZeroTierOne git repository) is a self-contained network virtualization engine that implements an Ethernet virtualization layer similar to VXLAN on top of a global encrypted peer to peer network.
The ZeroTier protocol is original, though aspects of it are similar to VXLAN and IPSec. It has two conceptually separate but closely coupled layers in the OSI model sense: VL1 and VL2. VL1 is the underlying peer to peer transport layer, the “virtual wire,” while VL2 is an emulated Ethernet layer that provides operating systems and apps with a familiar communication medium.
VL1: The ZeroTier Peer to Peer Network {#21vl1thezerotierpeertopeernetworkaname2_1a}
A global data center requires a global wire closet.
In conventional networks L1 (OSI layer 1) refers to the actual CAT5/CAT6 cables or wireless radio channels over which data is carried and the physical transceiver chips that modulate and demodulate it. VL1 is a peer to peer network that does the same thing by using encryption, authentication, and a lot of networking tricks to create virtual wires on a dynamic as-needed basis.
Network Topology and Peer Discovery
VL1 is designed to be zero-configuration. A user can start a new ZeroTier node without having to write configuration files or provide the IP addresses of other nodes. It’s also designed to be fast. Any two devices in the world should be able to locate each other and communicate almost instantly.
To achieve this VL1 is organized like DNS. At the base of the network is a collection of always-present root servers whose role is similar to that of DNS root name servers. Roots run the same software as regular endpoints but reside at fast stable locations on the network and are designated as such by a world definition. World definitions come in two forms: the planet and one or more moons. The protocol includes a secure mechanism allowing world definitions to be updated in-band if root servers’ IP addresses or ZeroTier addresses change.
There is only one planet. Earth’s root servers are operated by ZeroTier, Inc. as a free service. There are currently twelve root servers organized into two six-member clusters distributed across every major continent and multiple network providers. Almost everyone in the world has one within less than 100ms network latency from their location.
A node can “orbit” any number of moons. A moon is just a convenient way to add user-defined root servers to the pool. Users can create moons to reduce dependency on ZeroTier, Inc. infrastructure or to locate root servers closer for better performance. For on-premise SDN use a cluster of root servers can be located inside a building or data center so that ZeroTier can continue to operate normally if Internet connectivity is lost.
Nodes start with no direct links to one another, only upstream to roots (planet and moons). Every peer on VL1 possesses a globally unique 40-bit (10 hex digit) ZeroTier address, but unlike IP addresses these are opaque cryptographic identifiers that encode no routing information. To communicate peers first send packets “up” the tree, and as these packets traverse the network they trigger the opportunistic creation of direct links along the way. The tree is constantly trying to “collapse itself” to optimize itself to the pattern of traffic it is carrying.
Peer to peer connection setup goes like this:
- A wants to send a packet to B, but since it has no direct path it sends it upstream to R (a root).
- If R has a direct link to B, it forwards the packet there. Otherwise it sends the packet upstream until planetary roots are reached. Planetary roots know about all nodes, so eventually the packet will reach B if B is online.
- R also sends a message called rendezvous to A containing hints about how it might reach B. Meanwhile the root that forwards the packet to B sends rendezvous informing B how it might reach A.
- A and B get their rendezvous messages and attempt to send test messages to each other, possibly accomplishing hole punching of any NATs or stateful firewalls that happen to be in the way. If this works a direct link is established and packets no longer need to take the scenic route.
Since roots forward packets, A and B can reach each other instantly. A and B then begin attempting to make a direct peer to peer connection. If this succeeds it results in a faster lower latency link. We call this transport triggered link provisioning since it’s the forwarding of the packet itself that triggers the peer to peer network to attempt direct connection.
VL1 never gives up. If a direct path can’t be established, communication can continue through (slower) relaying. Direct connection attempts continue forever on a periodic basis. VL1 also has other features for establishing direct connectivity including LAN peer discovery, port prediction for traversal of symmetric IPv4 NATs, and explicit port mapping using uPnP and/or NAT-PMP if these are available on the local physical LAN.
Addressing
Every node is uniquely identified on VL1 by a 40-bit (10 hex digit) ZeroTier address. This address is computed from the public portion of a public/private key pair. A node’s address, public key, and private key together form its identity.
On devices running ZeroTier One the node identity is stored in identity.public
and identity.secret
in the service’s home directory.
When ZeroTier starts for the first time it generates a new identity. It then attempts to advertise it upstream to the network. In the very unlikely event that the identity’s 40-bit unique address is taken, it discards it and generates another.
Identities are claimed on a first come first serve basis and currently expire from planetary roots after 60 days of inactivity. If a long-dormant device returns it may re-claim its identity unless its address has been taken in the meantime (again, highly unlikely).
The address derivation algorithm used to compute addresses from public keys imposes a computational cost barrier against the intentional generation of a collision. Currently it would take approximately 10,000 CPU-years to do so (assuming e.g. a 3ghz Intel core). This is expensive but not impossible, but it’s only the first line of defense. After generating a collision an attacker would then have to compromise all upstream nodes, network controllers, and anything else that has recently communicated with the target node and replace their cached identities.
ZeroTier addresses are, once advertised and claimed, a very secure method of unique identification.
When a node attempts to send a message to another node whose identity is not cached, it sends a whois query upstream to a root. Roots provide an authoritative identity cache.
Cryptography
If you don’t know much about cryptography you can safely skip this section. TL;DR: packets are end-to-end encrypted and can’t be read by roots or anyone else, and we use modern 256-bit crypto in ways recommended by the professional cryptographers that created it.
Asymmetric public key encryption is Curve25519/Ed25519, a 256-bit elliptic curve variant.
Every VL1 packet is encrypted end to end using (as of the current version) 256-bit Salsa20 and authenticated using the Poly1305 message authentication (MAC) algorithm. MAC is computed after encryption (encrypt-then-MAC) and the cipher/MAC composition used is identical to the NaCl reference implementation.
As of today we do not implement forward secrecy or other stateful cryptographic features in VL1. We don’t do this for the sake of simplicity, reliability, and code footprint, and because frequently changing state makes features like clustering and fail-over much harder to implement. See our discussion on GitHub.
We may implement forward secrecy in the future. For those who want this level of security today, we recommend using other cryptographic protocols such as SSL or SSH over ZeroTier. These protocols typically implement forward secrecy, but using them over ZeroTier also provides the secondary benefit of defense in depth. Most cryptography is compromised not by a flaw in encryption but through bugs in the implementation. If you’re using two secure transports, the odds of a critical bug being discovered in both at the same time is very low. The CPU overhead of double-encryption is not significant for most work loads.
Trusted Paths for Fast Local SDN
To support the use of ZeroTier as a high performance SDN/NFV protocol over physically secure networks the protocol supports a feature called trusted paths. It is possible to configure all ZeroTier devices on a given network to skip encryption and authentication for traffic over a designated physical path. This can cut CPU use noticeably in high traffic scenarios but at the cost of losing virtually all transport security.
Trusted paths do not prevent communication with devices elsewhere, since traffic over other paths will be encrypted and authenticated normally.
We don’t recommend the use of this feature unless you really need the performance and you know what you’re doing. We also recommend thinking carefully before disabling transport security on a cloud private network. Larger cloud providers such as Amazon and Azure tend to provide good network segregation but many less costly providers offer private networks that are “party lines” and are not much more secure than the open Internet.
THERE’S A LOT MORE TO THIS, BUT THIS IS ENOUGH FOR NOW.