The Open Network (TON) Whitepaper

The Open Network
based on the work of Dr. Nikolai Durov
July 26, 2021

Abstract

The aim of this text is to provide a first description of The Open Network (TON) and related blockchain, peer-to-peer, distributed storage and service hosting technologies. To reduce the size of this document to reasonable proportions, we focus mainly on the unique and defining features of the TON platform that are important for it to achieve its stated goals.

Introduction

The Open Network (TON) is a fast, secure and scalable blockchain and network project, capable of handling millions of transactions per second if necessary, and both user-friendly and service provider-friendly. We aim for it to be able to host all reasonable applications currently proposed and conceived. One might think about TON as a huge distributed supercomputer, or rather a huge “superserver”, intended to host and provide a variety of services.

This text is not intended to be the ultimate reference with respect to all implementation details. Some particulars are likely to change during the development and testing phases.

Contents

1 Brief Description of TON Components
2 TON Blockchain
  2.1 TON Blockchain as a Collection of 2-Blockchains
  2.2 Generalities on Blockchains
  2.3 Blockchain State, Accounts and Hashmaps
  2.4 Messages Between Shardchains
  2.5 Global Shardchain State. “Bag of Cells” Philosophy
  2.6 Creating and Validating New Blocks
  2.7 Splitting and Merging Shardchains
  2.8 Classification of Blockchain Projects
  2.9 Comparison to Other Blockchain Projects
3 TON Networking
  3.1 Abstract Datagram Network Layer
  3.2 TON DHT: Kademlia-like Distributed Hash Table
  3.3 Overlay Networks and Multicasting Messages
4 TON Services and Applications
  4.1 TON Service Implementation Strategies
  4.2 Connecting Users and Service Providers
  4.3 Accessing TON Services
5 TON Payments
  5.1 Payment Channels
  5.2 Payment Channel Network, or “Lightning Network”
Conclusion
A The TON Coin

1 Brief Description of TON Components

The Open Network (TON) is a combination of the following components:
- A flexible multi-blockchain platform (TON Blockchain; cf. Chapter 2), capable of processing millions of transactions per second, with Turing-complete smart contracts, upgradable formal blockchain specifications, multi-cryptocurrency value transfer, support for micropayment channels and off-chain payment networks. TON Blockchain presents some new and unique features, such as the “self-healing” vertical blockchain mechanism (cf. 2.1.17) and Instant Hypercube Routing (cf. 2.4.20), which enable it to be fast, reliable, scalable and self-consistent at the same time.
- A peer-to-peer network (TON P2P Network, or just TON Network; cf. Chapter 3), used for accessing the TON Blockchain, sending transaction candidates, and receiving updates about only those parts of the blockchain a client is interested in (e.g., those related to the client's accounts and smart contracts), but also able to support arbitrary distributed services, blockchain-related or not.
- A distributed file storage technology (TON Storage; cf. 4.1.7), accessible through TON Network, used by the TON Blockchain to store archive copies of blocks and status data (snapshots), but also available for storing arbitrary files for users or other services running on the platform, with torrent-like access technology.
- A network proxy/anonymizer layer (TON Proxy; cf.
  4.1.10 and 3.1.6), similar to the I2P (Invisible Internet Project), used to hide the identity and IP addresses of TON Network nodes if necessary (e.g., nodes committing transactions from accounts with large amounts of cryptocurrency, or high-stake blockchain validator nodes who wish to hide their exact IP address and geographical location as a measure against DDoS attacks).
- A Kademlia-like distributed hash table (TON DHT; cf. 3.2), used as a “torrent tracker” for TON Storage (cf. 3.2.10), as an “input tunnel locator” for TON Proxy (cf. 3.2.14), and as a service locator for TON Services (cf. 3.2.12).
- A platform for arbitrary services (TON Services; cf. Chapter 4), residing in and available through TON Network and TON Proxy, with formalized interfaces (cf. 4.3.14) enabling browser-like or smartphone application interaction. These formal interfaces and persistent service entry points can be published in the TON Blockchain (cf. 4.3.17); actual nodes providing service at any given moment can be looked up through the TON DHT starting from information published in the TON Blockchain (cf. 3.2.12). Services may create smart contracts in the TON Blockchain to offer some guarantees to their clients (cf. 4.1.6).
- TON DNS (cf. 4.3.1), a service for assigning human-readable names to accounts, smart contracts, services and network nodes.
- TON Payments (cf. Chapter 5), a platform for micropayments, micropayment channels and a micropayment channel network. It can be used for fast off-chain value transfers, and for paying for services powered by TON Services.
- TON will allow easy integration with third-party messaging and social networking applications, thus making blockchain technologies and distributed services finally available and accessible to ordinary users (cf. 4.3.23), rather than just to a handful of early cryptocurrency adopters.

While the TON Blockchain is the core of the TON project, and the other components might be considered as playing a supportive role for the blockchain, they turn out to have useful and interesting functionality by themselves. Combined, they allow the platform to host more versatile applications than would be possible using the TON Blockchain alone (cf. 2.9.13 and 4.1).

2 TON Blockchain

We start with a description of The Open Network (TON) Blockchain, the core component of the project. Our approach here is “top-down”: we give a general description of the whole first, and then provide more detail on each component.

For simplicity, we speak here about the TON Blockchain, even though in principle several instances of this blockchain protocol may be running independently (for example, as a result of hard forks). We consider only one of them.

2.1 TON Blockchain as a Collection of 2-Blockchains

The TON Blockchain is actually a collection of blockchains (even a collection of blockchains of blockchains, or 2-blockchains; this point will be clarified later in 2.1.17), because no single blockchain project is capable of achieving our goal of processing millions of transactions per second, as opposed to the now-standard dozens of transactions per second.

2.1.1. List of blockchain types. The blockchains in this collection are:
- The unique master blockchain, or masterchain for short, containing general information about the protocol and the current values of its parameters, the set of validators and their stakes, the set of currently active workchains and their “shards”, and, most importantly, the set of hashes of the most recent blocks of all workchains and shardchains.
- Several (up to $2^{32}$) working blockchains, or workchains for short, which are actually the “workhorses”, containing the value-transfer and smart-contract transactions.
  Different workchains may have different “rules”, meaning different formats of account addresses, different formats of transactions, different virtual machines (VMs) for smart contracts, different basic cryptocurrencies and so on. However, they all must satisfy certain basic interoperability criteria to make interaction between different workchains possible and relatively simple. In this respect, the TON Blockchain is heterogeneous (cf. 2.8.8), similarly to the EOS (cf. 2.9.7) and PolkaDot (cf. 2.9.8) projects.
- Each workchain is in turn subdivided into up to $2^{60}$ shard blockchains, or shardchains for short, having the same rules and block format as the workchain itself, but responsible only for a subset of accounts, depending on several first (most significant) bits of the account address. In other words, a form of sharding is built into the system (cf. 2.8.12). Because all these shardchains share a common block format and rules, the TON Blockchain is homogeneous in this respect (cf. 2.8.8), similarly to what has been discussed in one of the Ethereum scaling proposals [1].
- Each block in a shardchain (and in the masterchain) is actually not just a block, but a small blockchain. Normally, this “block blockchain” or “vertical blockchain” consists of exactly one block, and then we might think this is just the corresponding block of the shardchain (also called the “horizontal blockchain” in this situation). However, if it becomes necessary to fix incorrect shardchain blocks, a new block is committed into the “vertical blockchain”, containing either the replacement for the invalid “horizontal blockchain” block, or a “block difference”, containing only a description of those parts of the previous version of this block that need to be changed. This is a TON-specific mechanism to replace detected invalid blocks without making a true fork of all shardchains involved; it will be explained in more detail in 2.1.17.
  For now, we just remark that each shardchain (and the masterchain) is not a conventional blockchain, but a blockchain of blockchains, or 2D-blockchain, or just a 2-blockchain.

2.1.2. Infinite Sharding Paradigm. Almost all blockchain sharding proposals are “top-down”: one first imagines a single blockchain, and then discusses how to split it into several interacting shardchains to improve performance and achieve scalability.

The TON approach to sharding is “bottom-up”, explained as follows.

Imagine that sharding has been taken to its extreme, so that exactly one account or smart contract remains in each shardchain. Then we have a huge number of “account-chains”, each describing the state and state transitions of only one account, and sending value-bearing messages to each other to transfer value and information.

Of course, it is impractical to have hundreds of millions of blockchains, with updates (i.e., new blocks) usually appearing quite rarely in each of them. In order to implement them more efficiently, we group these “account-chains” into “shardchains”, so that each block of the shardchain is essentially a collection of blocks of account-chains that have been assigned to this shard. Thus the “account-chains” have only a purely virtual or logical existence inside the “shardchains”.

We call this perspective the Infinite Sharding Paradigm. It explains many of the design decisions for the TON Blockchain.

[1] https://github.com/ethereum/wiki/wiki/Sharding-FAQ

2.1.3. Messages. Instant Hypercube Routing. The Infinite Sharding Paradigm instructs us to regard each account (or smart contract) as if it were in its own shardchain by itself. Then the only way one account might affect the state of another is by sending a message to it (this is a special instance of the so-called Actor model, with accounts as Actors; cf. 2.4.2).
Therefore, a system of messages between accounts (and shardchains, because the source and destination accounts are, generally speaking, located in different shardchains) is of paramount importance to a scalable system such as the TON Blockchain. In fact, a novel feature of the TON Blockchain, called Instant Hypercube Routing (cf. 2.4.20), enables it to deliver and process a message created in a block of one shardchain into the very next block of the destination shardchain, regardless of the total number of shardchains in the system.

2.1.4. Quantity of masterchains, workchains and shardchains. A TON Blockchain contains exactly one masterchain. However, the system can potentially accommodate up to $2^{32}$ workchains, each subdivided into up to $2^{60}$ shardchains.

2.1.5. Workchains can be virtual blockchains, not true blockchains. Because a workchain is usually subdivided into shardchains, the existence of the workchain is “virtual”, meaning that it is not a true blockchain in the sense of the general definition provided in 2.2.1 below, but just a collection of shardchains. When only one shardchain corresponds to a workchain, this unique shardchain may be identified with the workchain, which in this case becomes a “true” blockchain, at least for some time, thus gaining a superficial similarity to customary single-blockchain design. However, the Infinite Sharding Paradigm (cf. 2.1.2) tells us that this similarity is indeed superficial: it is just a coincidence that the potentially huge number of “account-chains” can temporarily be grouped into one blockchain.

2.1.6. Identification of workchains. Each workchain is identified by its number or workchain identifier (workchain_id : uint32), which is simply an unsigned 32-bit integer.
Workchains are created by special transactions in the masterchain, defining the (previously unused) workchain identifier and the formal description of the workchain, sufficient at least for the interaction of this workchain with other workchains and for superficial verification of this workchain's blocks.

2.1.7. Creation and activation of new workchains. The creation of a new workchain may be initiated by essentially any member of the community who is ready to pay the (high) masterchain transaction fees required to publish the formal specification of a new workchain. However, in order for the new workchain to become active, a two-thirds consensus of validators is required, because they will need to upgrade their software to process blocks of the new workchain, and signal their readiness to work with the new workchain by special masterchain transactions. The party interested in the activation of the new workchain might provide some incentive for the validators to support the new workchain by means of some rewards distributed by a smart contract.

2.1.8. Identification of shardchains. Each shardchain is identified by a couple $(w, s) = (\mathit{workchain\_id}, \mathit{shard\_prefix})$, where workchain_id : uint32 identifies the corresponding workchain, and shard_prefix : $2^{0\ldots60}$ is a bit string of length at most 60, defining the subset of accounts for which this shardchain is responsible. Namely, all accounts with account_id starting with shard_prefix (i.e., having shard_prefix as most significant bits) will be assigned to this shardchain.

2.1.9. Identification of account-chains. Recall that account-chains have only a virtual existence (cf. 2.1.2). However, they have a natural identifier, namely, (workchain_id, account_id), because any account-chain contains information about the state and updates of exactly one account (either a simple account or a smart contract; the distinction is unimportant here).

2.1.10. Dynamic splitting and merging of shardchains; cf. 2.7.
A less sophisticated system might use static sharding, for example, by using the top eight bits of the account_id to select one of 256 pre-defined shards.

An important feature of the TON Blockchain is that it implements dynamic sharding, meaning that the number of shards is not fixed. Instead, shard (w, s) can be automatically subdivided into shards (w, s.0) and (w, s.1), where s.0 and s.1 denote the prefix s extended by one more bit, if some formal conditions are met (essentially, if the transaction load on the original shard is high enough for a prolonged period of time). Conversely, if the load stays too low for some period of time, the shards (w, s.0) and (w, s.1) can be automatically merged back into shard (w, s).

Initially, only one shard (w, ∅) is created for workchain w. Later, it is subdivided into more shards, if and when this becomes necessary (cf. 2.7.6 and 2.7.8).

2.1.11. Basic workchain or Workchain Zero. While up to $2^{32}$ workchains can be defined with their specific rules and transactions, we initially define only one, with workchain_id = 0. This workchain, called Workchain Zero or the basic workchain, is the one used to work with TON smart contracts and transfer TON coins (cf. Appendix A). Most applications are likely to require only Workchain Zero. Shardchains of the basic workchain will be called basic shardchains.

2.1.12. Block generation intervals. We expect a new block to be generated in each shardchain and the masterchain approximately once every five seconds. This will lead to reasonably small transaction confirmation times. New blocks of all shardchains are generated approximately simultaneously; a new block of the masterchain is generated approximately one second later, because it must contain the hashes of the latest blocks of all shardchains.

2.1.13. Using the masterchain to make workchains and shardchains tightly coupled.
Once the hash of a block of a shardchain is incorporated into a block of the masterchain, that shardchain block and all its ancestors are considered “canonical”, meaning that they can be referenced from the subsequent blocks of all shardchains as something fixed and immutable. In fact, each new shardchain block contains a hash of the most recent masterchain block, and all shardchain blocks referenced from that masterchain block are considered immutable by the new block.

Essentially, this means that a transaction or a message committed in a shardchain block may be safely used in the very next blocks of the other shardchains, without needing to wait for, say, twenty confirmations (i.e., twenty blocks generated after the original block in the same blockchain) before forwarding a message or taking other actions based on a previous transaction, as is common in most proposed “loosely-coupled” systems (cf. 2.8.14), such as EOS. This ability to use transactions and messages in other shardchains a mere five seconds after being committed is one of the reasons we believe our “tightly-coupled” system, the first of its kind, will be able to deliver unprecedented performance (cf. 2.8.12 and 2.8.14).

2.1.14. Masterchain block hash as a global state. According to 2.1.13, the hash of the last masterchain block completely determines the overall state of the system from the perspective of an external observer. One does not need to monitor the state of all shardchains separately.

2.1.15. Generation of new blocks by validators; cf. 2.6. The TON Blockchain uses a Proof-of-Stake (PoS) approach for generating new blocks in the shardchains and the masterchain. This means that there is a set of, say, up to a few hundred validators: special nodes that have deposited stakes (large amounts of TON coins) by a special masterchain transaction to become eligible for new block generation and validation.
Then a smaller subset of validators is assigned to each shard (w, s) in a deterministic pseudorandom way, changing approximately every 1024 blocks. This subset of validators suggests and reaches consensus on what the next shardchain block would be, by collecting suitable proposed transactions from the clients into new valid block candidates. For each block, there is a pseudorandomly chosen order on the validators to determine whose block candidate has the highest priority to be committed at each turn.

Validators and other nodes check the validity of the proposed block candidates; if a validator signs an invalid block candidate, it may be automatically punished by losing part or all of its stake, or by being suspended from the set of validators for some time. After that, the validators should reach consensus on the choice of the next block, essentially by an efficient variant of the BFT (Byzantine Fault Tolerant; cf. 2.8.4) consensus protocol, similar to PBFT or Honey Badger BFT. If consensus is reached, a new block is created, and validators divide between themselves the transaction fees for the transactions included, plus some newly-created (“minted”) coins.

Each validator can be elected to participate in several validator subsets; in this case, it is expected to run all validation and consensus algorithms in parallel.

After all new shardchain blocks are generated or a timeout has passed, a new masterchain block is generated, including the hashes of the latest blocks of all shardchains. This is done by BFT consensus of all validators [2].

More detail on the TON PoS approach and its economic model is provided in section 2.6.

[2] Actually, two-thirds by stake is enough to achieve consensus, but an effort is made to collect as many signatures as possible.

2.1.16. Forks of the masterchain.
A complication that arises from our tightly-coupled approach is that switching to a different fork in the masterchain will almost necessarily require switching to another fork in at least some of the shardchains. On the other hand, as long as there are no forks in the masterchain, no forks in the shardchains are even possible, because no blocks in the alternative forks of the shardchains can become canonical by having their hashes incorporated into a masterchain block.

The general rule is that if masterchain block $B$ is a predecessor of $B'$, $B$ includes hash $\mathrm{Hash}(B_{w,s})$ of $(w,s)$-shardchain block $B_{w,s}$, and $B'$ includes hash $\mathrm{Hash}(B'_{w,s})$, then $B_{w,s}$ must be a predecessor of $B'_{w,s}$; otherwise, the masterchain block $B'$ is invalid.

We expect masterchain forks to be rare, next to non-existent, because in the BFT paradigm adopted by the TON Blockchain they can happen only in the case of incorrect behavior by a majority of validators (cf. 2.6.1 and 2.6.15), which would imply significant stake losses by the offenders. Therefore, no true forks in the shardchains should be expected. Instead, if an invalid shardchain block is detected, it will be corrected by means of the “vertical blockchain” mechanism of the 2-blockchain (cf. 2.1.17), which can achieve this goal without forking the “horizontal blockchain” (i.e., the shardchain). The same mechanism can be used to fix non-fatal mistakes in the masterchain blocks as well.

2.1.17. Correcting invalid shardchain blocks. Normally, only valid shardchain blocks will be committed, because validators assigned to the shardchain must reach a two-thirds Byzantine consensus before a new block can be committed. However, the system must allow for detection of previously committed invalid blocks and their correction.
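The two-thirds consensus mentioned above is measured by stake rather than by head count (cf. 2.1.15). The following minimal Python sketch illustrates only that arithmetic; it is not TON's validator code. The function name `has_two_thirds_by_stake`, the plain dictionary of stakes, and the choice of a strict “more than two-thirds” threshold (as is usual in BFT protocols) are assumptions made for illustration.

```python
def has_two_thirds_by_stake(stakes: dict[str, int], signers: set[str]) -> bool:
    """Return True if `signers` hold strictly more than 2/3 of the subset's stake.

    Illustrative sketch only: the name, the dict-of-stakes input and the
    strict ">" threshold are assumptions, not the TON implementation.
    """
    total = sum(stakes.values())
    # Signatures from nodes outside the assigned validator subset carry no weight.
    signed = sum(stakes[v] for v in signers if v in stakes)
    # Compare with integers to avoid floating-point rounding: signed / total > 2/3.
    return 3 * signed > 2 * total


stakes = {"A": 40, "B": 30, "C": 20, "D": 10}
print(has_two_thirds_by_stake(stakes, {"A", "B"}))       # True  (70% of stake)
print(has_two_thirds_by_stake(stakes, {"B", "C", "D"}))  # False (60% of stake)
```

Validators B, C and D make up three quarters of this subset by count but hold only 60% of the stake, so in this sketch they cannot commit a block on their own, while A and B together (70%) can.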

Of course, once an invalid shardchain block is found (either by a validator, not necessarily assigned to this shardchain, or by a “fisherman”, i.e., any node of the system that made a certain deposit to be able to raise questions about block validity; cf. 2.6.4), the invalidity claim and its proof are committed into the masterchain, and the validators that have signed the invalid block are punished by losing part of their stake and/or being temporarily suspended from the set of validators (the latter measure is important for the case of an attacker stealing the private signing keys of an otherwise benign validator).

However, this is not sufficient, because the overall state of the system (TON Blockchain) turns out to be invalid because of the invalid shardchain block previously committed. This invalid block must be replaced by a newer valid version.

Most systems would achieve this by “rolling back” to the last block before the invalid one in this shardchain and the last blocks unaffected by messages propagated from the invalid block in each of the other shardchains, and creating a new fork from these blocks. This approach has the disadvantage that a large number of otherwise correct and committed transactions are suddenly rolled back, and it is unclear whether they will be included later at all.

The TON Blockchain solves this problem by making each “block” of each shardchain and of the masterchain (“horizontal blockchains”) a small blockchain (“vertical blockchain”) by itself, containing different versions of this “block”, or their “differences”. Normally, the vertical blockchain consists of exactly one block, and the shardchain looks like a classical blockchain. However, once the invalidity of a block is confirmed and committed into a masterchain block, the “vertical blockchain” of the invalid block is allowed to grow by a new block in the vertical direction, replacing or editing the invalid block.
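The vertical-blockchain idea can be modelled as a list of versions kept at each horizontal position. The Python sketch below is a simplified illustration under several assumptions: blocks are opaque byte strings, SHA-256 stands in for the block hash, a correction always stores a full replacement block (never a “block difference”), and the propagation of corrections to dependent blocks is not modelled. All class and method names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def block_hash(data: bytes) -> str:
    # SHA-256 is a stand-in for the real block hash (assumption).
    return hashlib.sha256(data).hexdigest()


@dataclass
class VerticalBlockchain:
    """All versions of one horizontal block; normally exactly one version."""
    versions: list[bytes] = field(default_factory=list)

    def first_hash(self) -> str:
        return block_hash(self.versions[0])

    def latest(self) -> bytes:
        return self.versions[-1]


class Shardchain:
    """A horizontal chain whose positions are vertical blockchains (hypothetical model)."""

    def __init__(self) -> None:
        self.positions: list[VerticalBlockchain] = []
        # Maps the hash of a block's first version to its vertical blockchain.
        self.by_first_hash: dict[str, VerticalBlockchain] = {}

    def append_block(self, block: bytes) -> str:
        vb = VerticalBlockchain([block])
        self.positions.append(vb)
        self.by_first_hash[vb.first_hash()] = vb
        return vb.first_hash()

    def correct_block(self, first_hash: str, replacement: bytes) -> None:
        # Grow the vertical blockchain instead of forking the horizontal one.
        self.by_first_hash[first_hash].versions.append(replacement)

    def latest_hash(self, first_hash: str) -> str:
        return block_hash(self.by_first_hash[first_hash].latest())


chain = Shardchain()
h0 = chain.append_block(b"block 0")
h1 = chain.append_block(b"block 1 (invalid)")
chain.correct_block(h1, b"block 1 (corrected)")
print(len(chain.positions))  # 2: no fork, no rollback of block 0
print(chain.latest_hash(h1) == block_hash(b"block 1 (corrected)"))  # True
```

Correcting block 1 leaves the number of horizontal positions unchanged, and a client that knows only the hash of the first version can still locate the latest one, in the spirit of the map from first-version hashes to latest versions described at the end of 2.1.17.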
The new block is generated by the current validator subset for the shardchain in question.

The rules for a new “vertical” block to be valid are quite strict. In particular, if a virtual “account-chain block” (cf. 2.1.2) contained in the invalid block is valid by itself, it must be left unchanged by the new vertical block.

Once a new “vertical” block is committed on top of the invalid block, its hash is published in a new masterchain block (or rather in a new “vertical” block, lying above the original masterchain block where the hash of the invalid shardchain block was originally published), and the changes are propagated further to any shardchain blocks referring to the previous version of this block (e.g., those having received messages from the incorrect block). This is fixed by committing new “vertical” blocks in vertical blockchains for all blocks previously referring to the “incorrect” block; new vertical blocks will refer to the most recent (corrected) versions instead. Again, strict rules forbid changing account-chains that are not really affected (i.e., that receive the same messages as in the previous version). In this way, fixing an incorrect block generates “ripples” that are ultimately propagated towards the most recent blocks of all affected shardchains; these changes are reflected in new “vertical” masterchain blocks as well.

Once the “history rewriting” ripples reach the most recent blocks, the new shardchain blocks are generated in one version only, being successors of the newest block versions only. This means that they will contain references to the correct (most recent) vertical blocks from the very beginning.

The masterchain state implicitly defines a map transforming the hash of the first block of each “vertical” blockchain into the hash of its latest version. This enables a client to identify and locate any vertical blockchain by the hash of its very first (and usually the only) block.

2.1.18.
TON coins and multi-currency workchains. The TON Blockchain supports up to $2^{32}$ different “cryptocurrencies”, “coins”, or “tokens”, distinguished by a 32-bit currency_id. New cryptocurrencies can be added by special transactions in the masterchain. Each workchain has a basic cryptocurrency, and can have several additional cryptocurrencies.

There is one special cryptocurrency with currency_id = 0, namely, the TON coin (cf. Appendix A). It is the basic cryptocurrency of Workchain Zero. It is also used for transaction fees and validator stakes.

In principle, other workchains may collect transaction fees in other tokens. In this case, some smart contract for automated conversion of these transaction fees into TON coins should be provided.

2.1.19. Messaging and value transfer. Shardchains belonging to the same or different workchains may send messages to each other. While the exact form of the messages allowed depends on the receiving workchain and receiving account (smart contract), there are some common fields making inter-workchain messaging possible. In particular, each message may have some value attached, in the form of a certain amount of TON coins and/or other registered cryptocurrencies, provided they are declared as acceptable cryptocurrencies by the receiving workchain.

The simplest form of such messaging is a value transfer from one (usually not a smart-contract) account to another.

2.1.20. TON Virtual Machine. The TON Virtual Machine, also abbreviated as TON VM or TVM, is the virtual machine used to execute smart-contract code in the masterchain and in the basic workchain. Other workchains may use other virtual machines alongside or instead of the TVM.

Here we list some of its features. They are discussed further in 2.3.12, 2.3.14 and elsewhere.
- TVM represents all data as a collection of (TVM) cells (cf. 2.3.14). Each cell contains up to 128 data bytes and up to 4 references to other cells. As a consequence of the “everything is a bag of cells” philosophy
  (cf. 2.5.14), this enables TVM to work with all data related to the TON Blockchain, including blocks and blockchain global state if necessary.
- TVM can work with values of arbitrary algebraic data types (cf. 2.3.12), represented as trees or directed acyclic graphs of TVM cells. However, it is agnostic towards the existence of algebraic data types; it just works with cells.
- TVM has built-in support for hashmaps (cf. 2.3.7).
- TVM is a stack machine. Its stack keeps either 64-bit integers or cell references.
- 64-bit, 128-bit and 256-bit arithmetic is supported. All n-bit arithmetic operations come in three flavors: for unsigned integers, for signed integers and for integers modulo $2^n$ (no automatic overflow checks in the latter case).
- TVM has unsigned and signed integer conversion from n-bit to m-bit, for all $0 \le m, n \le 256$, with overflow checks.
- All arithmetic operations perform overflow checks by default, greatly simplifying the development of smart contracts.
- TVM has “multiply-then-shift” and “shift-then-divide” arithmetic operations with intermediate values computed in a larger integer type; this simplifies implementing fixed-point arithmetic.
- TVM offers support for bit strings and byte strings.
- Support for 256-bit Elliptic Curve Cryptography (ECC) for some predefined curves, including Curve25519, is present.
- Support for Weil pairings on some elliptic curves, useful for fast implementation of zk-SNARKs, is also present.
- Support for popular hash functions, including sha256, is present.
- TVM can work with Merkle proofs (cf. 5.1.9).
- TVM offers support for “large” or “global” smart contracts. Such smart contracts must be aware of sharding (cf. 2.3.18 and 2.3.16). Usual (local) smart contracts can be sharding-agnostic.
- TVM supports closures.
- A “spineless tagless G-machine” can be easily implemented inside TVM.

Several high-level languages can be designed for TVM, in addition to the “TVM assembly”.
All these languages will have static types and will support algebraic data types. We envision the following possibilities:
- A Java-like imperative language, with each smart contract resembling a separate class.
- A lazy functional language (think of Haskell).
- An eager functional language (think of ML).

2.1.21. Configurable parameters. An important feature of the TON Blockchain is that many of its parameters are configurable. This means that they are part of the masterchain state, and can be changed by certain special proposal/vote/result transactions in the masterchain, without any need for hard forks. Changing such parameters will require collecting, in favor of the proposal, two-thirds of validator votes and more than half of the votes of all other participants who choose to take part in the voting process.

2.2 Generalities on Blockchains

2.2.1. General blockchain definition. In general, any (true) blockchain is a sequence of blocks, each block B containing a reference blk-prev(B) to the previous block (usually by including the hash of the previous block into the header of the current block), and a list of transactions. Each transaction describes some transformation of the global blockchain state; the transactions listed in a block are applied sequentially to compute the new state starting from the old state, which is the resulting state after the evaluation of the previous block.

2.2.2. Relevance for the TON Blockchain. Recall that the TON Blockchain is not a true blockchain, but a collection of 2-blockchains (i.e., of blockchains of blockchains; cf. 2.1.1), so the above is not directly applicable to it. However, we start with these generalities on true blockchains to use them as building blocks for our more sophisticated constructions.

2.2.3. Blockchain instance and blockchain type.
One often uses the word blockchain to denote both a general blockchain type and its specific blockchain instances, defined as sequences of blocks satisfying certain conditions. For example, 2.2.1 refers to blockchain instances.

In this way, a blockchain type is usually a subtype of the type $\mathsf{Block}^*$ of lists (i.e., finite sequences) of blocks, consisting of those sequences of blocks that satisfy certain compatibility and validity conditions:

$$\mathsf{Blockchain} \subset \mathsf{Block}^* \tag{1}$$

A better way to define Blockchain would be to say that Blockchain is a dependent couple type, consisting of couples $(B, v)$, with first component $B : \mathsf{Block}^*$ being of type $\mathsf{Block}^*$ (i.e., a list of blocks), and the second component $v : \mathsf{isValidBc}(B)$ being a proof or a witness of the validity of $B$. In this way,

$$\mathsf{Blockchain} \equiv \Sigma_{(B : \mathsf{Block}^*)}\, \mathsf{isValidBc}(B) \tag{2}$$

We use here the notation for dependent sums of types borrowed from.

2.2.4. Dependent type theory, Coq and TL. Note that we are using (Martin-Löf) dependent type theory here, similar to that used in the Coq³ proof assistant. A simplified version of dependent type theory is also used in TL (Type Language)⁴, which will be used in the formal specification of the TON Blockchain to describe the serialization of all data structures and the layouts of blocks, transactions, and the like.

In fact, dependent type theory gives a useful formalization of what a proof is, and such formal proofs (or their serializations) might become handy when one needs to provide proof of invalidity for some block, for example.

2.2.5. TL, or the Type Language. Since TL (Type Language) will be used in the formal specifications of TON blocks, transactions, and network datagrams, it warrants a brief discussion.

TL is a language suitable for description of dependent algebraic types, which are allowed to have numeric (natural) and type parameters. Each type is described by means of several constructors.
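To illustrate why describing a type by its constructors fixes an unambiguous serialization, here is a toy sketch: every value is written as the 32-bit name of its constructor followed by the serializations of its fields, so a decoder always knows which constructor (and hence which fields) comes next. The example type (a binary tree of 32-bit integers), the constructor names and the little-endian byte order are hypothetical choices for this illustration only; they are not taken from any actual TL or TL-B scheme.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass
from typing import Union

# Hypothetical 32-bit constructor names (not from any real TL scheme).
LEAF_NAME = 0x1A2B3C4D
NODE_NAME = 0x5E6F7081

@dataclass(frozen=True)
class Leaf:
    value: int  # one unsigned 32-bit field

@dataclass(frozen=True)
class Node:
    left: Tree
    right: Tree

Tree = Union[Leaf, Node]

def serialize(t: Tree) -> bytes:
    """Constructor name first, then the recursively serialized fields."""
    if isinstance(t, Leaf):
        return struct.pack("<II", LEAF_NAME, t.value)
    return struct.pack("<I", NODE_NAME) + serialize(t.left) + serialize(t.right)

def deserialize(data: bytes, pos: int = 0) -> tuple[Tree, int]:
    """Inverse of serialize: returns the decoded value and the next read offset."""
    (name,) = struct.unpack_from("<I", data, pos)
    pos += 4
    if name == LEAF_NAME:
        (value,) = struct.unpack_from("<I", data, pos)
        return Leaf(value), pos + 4
    if name == NODE_NAME:
        left, pos = deserialize(data, pos)
        right, pos = deserialize(data, pos)
        return Node(left, right), pos
    raise ValueError(f"unknown constructor name {name:#010x}")
```

For instance, Node(Leaf(1), Node(Leaf(2), Leaf(3))) serializes to 32 bytes (five 4-byte constructor names and three 4-byte values), and deserializing those bytes returns the same tree.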
Each constructor has a (human-readable) identifier and a name, which is a bit string (32-bit integer by default). Apart from that, the definition of a constructor contains a list of fields along with their types.

³ https://coq.inria.fr
⁴ https://core.telegram.org/mtproto/TL

A collection of constructor and type definitions is called a TL-scheme. It is usually kept in one or several files with the suffix .tl.

An important feature of TL-schemes is that they determine an unambiguous way of serializing and deserializing values (or objects) of the algebraic types defined. Namely, when a value needs to be serialized into a stream of bytes, first the name of the constructor used for this value is serialized. Recursively computed serializations of each field follow.

The description of a previous version of TL, suitable for serializing arbitrary objects into sequences of 32-bit integers, is available at https://core.telegram.org/mtproto/TL. A new version of TL, called TL-B, is being developed for the purpose of describing the serialization of objects used by the TON Project. This new version can serialize objects into streams of bytes and even bits (not just 32-bit integers), and offers support for serialization into a tree of TVM cells (cf. 2.3.14). A description of TL-B will be a part of the formal specification of the TON Blockchain.

2.2.6. Blocks and transactions as state transformation operators. Normally, any blockchain (type) Blockchain has an associated global state (type) State, and a transaction (type) Transaction. The semantics of a blockchain are to a large extent determined by the transaction application function:

$$\mathsf{ev\_trans}' : \mathsf{Transaction} \times \mathsf{State} \to \mathsf{State}^? \tag{3}$$

Here $X^?$ denotes Maybe $X$, the result of applying the Maybe monad to type $X$. This is similar to our use of $X^*$ for List $X$. Essentially, a value of type $X^?$ is either a value of type $X$ or a special value ⊥ indicating the absence of an actual value (think about a null pointer).
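To make the role of ⊥ concrete, here is a minimal sketch of the transaction application function (3) under toy assumptions: the state is a hypothetical map from account names to balances, the only transaction kind is a transfer, and Python's None stands in for ⊥. Applying a list of transactions in order and stopping at the first ⊥ anticipates the block evaluation function discussed next.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical toy state: account name -> balance.
State = dict[str, int]

@dataclass(frozen=True)
class Transfer:
    """A toy transaction moving `amount` units from `src` to `dst`."""
    src: str
    dst: str
    amount: int

def ev_trans_prime(tx: Transfer, s: State) -> Optional[State]:
    """Toy version of (3): Transaction x State -> State?."""
    if tx.amount < 0 or s.get(tx.src, 0) < tx.amount:
        return None  # stands in for ⊥: the transaction is invalid from state s
    new_state = dict(s)  # never mutate the old state
    new_state[tx.src] = new_state.get(tx.src, 0) - tx.amount
    new_state[tx.dst] = new_state.get(tx.dst, 0) + tx.amount
    return new_state

def apply_in_order(txs: list[Transfer], s: State) -> Optional[State]:
    """Apply transactions sequentially; the first ⊥ makes the whole result ⊥."""
    state = s
    for tx in txs:
        next_state = ev_trans_prime(tx, state)
        if next_state is None:
            return None
        state = next_state
    return state
```

Withdrawing 7 from an account holding 10 succeeds once; repeating the same transfer yields None, mirroring the "more money than is actually there" case.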
In our case, we use $\mathsf{State}^?$ instead of State as the result type because a transaction may be invalid if invoked from certain original states (think about attempting to withdraw from an account more money than is actually there).

We might prefer a curried version of $\mathsf{ev\_trans}'$:

$$\mathsf{ev\_trans} : \mathsf{Transaction} \to \mathsf{State} \to \mathsf{State}^? \tag{4}$$

Because a block is essentially a list of transactions, the block evaluation function

$$\mathsf{ev\_block} : \mathsf{Block} \to \mathsf{State} \to \mathsf{State}^? \tag{5}$$

can be derived from ev_trans. It takes a block $B : \mathsf{Block}$ and the previous blockchain state $s : \mathsf{State}$ (which might include the hash of the previous block) and computes the next blockchain state $s' = \mathsf{ev\_block}(B)(s) : \mathsf{State}^?$, which is either a true state or a special value ⊥ indicating that the next state cannot be computed (i.e., that the block is invalid if evaluated from the starting state given; for example, the block includes a transaction trying to debit an empty account).

2.2.7. Block sequence numbers. Each block B in the blockchain can be referred to by its sequence number blk-seqno(B), starting from zero for the very first block, and incremented by one whenever passing to the next block. More formally,

$$\mathsf{blk\text{-}seqno}(B) = \mathsf{blk\text{-}seqno}\bigl(\mathsf{blk\text{-}prev}(B)\bigr) + 1 \tag{6}$$

Notice that the sequence number does not identify a block uniquely in the presence of forks.

2.2.8. Block hashes. Another way of referring to a block B is by its hash blk-hash(B), which is actually the hash of the header of block B (however, the header of the block usually contains hashes that depend on all content of block B). Assuming that there are no collisions for the hash function used (or at least that they are very improbable), a block is uniquely identified by its hash.

2.2.9. Hash assumption. During formal analysis of blockchain algorithms, we assume that there are no collisions for the k-bit hash function $\mathsf{Hash} : \mathsf{Bytes}^* \to 2^k$ used:

$$\mathsf{Hash}(s) = \mathsf{Hash}(s') \Rightarrow s = s' \quad \text{for any } s, s' \in \mathsf{Bytes}^* \tag{7}$$

Here $\mathsf{Bytes} = \{0, \dots, 255\} = 2^8$ is the type of bytes, or the set of all byte values, and $\mathsf{Bytes}^*$ is the type or set of arbitrary (finite) lists of bytes; while $2 = \{0, 1\}$ is the bit type, and $2^k$ is the set (or actually the type) of all k-bit sequences (i.e., of k-bit numbers).

Of course, (7) is impossible mathematically, because a map from an infinite set to a finite set cannot be injective. A more rigorous assumption would be

$$\forall s, s' : s \neq s',\quad P\bigl(\mathsf{Hash}(s) = \mathsf{Hash}(s')\bigr) = 2^{-k} \tag{8}$$

However, this is not so convenient for the proofs. If (8) is used at most N times in a proof with $2^{-k} N < \varepsilon$ for some small $\varepsilon$ (say, $\varepsilon = 10^{-18}$), we can reason as if (7) were true, provided we accept a failure probability $\varepsilon$ (i.e., the final conclusions will be true with probability at least $1 - \varepsilon$).

Final remark: in order to make the probability statement of (8) really rigorous, one must introduce a probability distribution on the set $\mathsf{Bytes}^*$ of all byte sequences. A way of doing this is by assuming all byte sequences of the same length $l$ equiprobable, and setting the probability of observing a sequence of length $l$ equal to $p^l - p^{l+1}$ for some $p \to 1^-$. Then (8) should be understood as a limit of conditional probability $P\bigl(\mathsf{Hash}(s) = \mathsf{Hash}(s') \mid s \neq s'\bigr)$ when $p$ tends to one from below.

2.2.10. Hash used for the TON Blockchain. We are using the 256-bit sha256 hash for the TON Blockchain for the time being. If it turns out to be weaker than expected, it can be replaced by another hash function in the future. The choice of the hash function is a configurable parameter of the protocol, so it can be changed without hard forks as explained in 2.1.21.

2.3 Blockchain State, Accounts and Hashmaps

We have noted above that any blockchain defines a certain global state, and each block and each transaction defines a transformation of this global state. Here we describe the global state used by TON blockchains.

2.3.1. Account IDs.
The basic account IDs used by TON blockchains (or at least by its masterchain and Workchain Zero) are 256-bit integers, assumed to be public keys for 256-bit Elliptic Curve Cryptography (ECC) for a specific elliptic curve. In this way,

$$\mathit{account\_id} : \mathsf{Account} = \mathsf{uint}_{256} = 2^{256} \tag{9}$$

Here Account is the account type, while account_id : Account is a specific variable of type Account.

Other workchains can use other account ID formats, 256-bit or otherwise. For example, one can use Bitcoin-style account IDs, equal to sha256 of an ECC public key. However, the bit length l of an account ID must be fixed during the creation of the workchain (in the masterchain), and it must be at least 64, because the first 64 bits of account_id are used for sharding and message routing.

2.3.2. Main component: Hashmaps. The principal component of the TON blockchain state is a hashmap. In some cases we consider (partially defined) maps $h : 2^n \dashrightarrow 2^m$. More generally, we might be interested in hashmaps $h : 2^n \dashrightarrow X$ for a composite type $X$. However, the source (or index) type is almost always $2^n$.

Sometimes, we have a default value $\mathit{empty} : X$, and the hashmap $h : 2^n \to X$ is initialized by its default value $i \mapsto \mathit{empty}$.

2.3.3. Example: TON account balances. An important example is given by TON account balances. It is a hashmap

$$\mathit{balance} : \mathsf{Account} \to \mathsf{uint}_{128} \tag{10}$$

mapping $\mathsf{Account} = 2^{256}$ into a TON coin balance of type $\mathsf{uint}_{128} = 2^{128}$. This hashmap has a default value of zero, meaning that initially (before the first block is processed) the balance of all accounts is zero.

2.3.4. Example: smart-contract persistent storage. Another example is given by smart-contract persistent storage, which can be (very approximately) represented as a hashmap

$$\mathit{storage} : 2^{256} \dashrightarrow 2^{256} \tag{11}$$

This hashmap also has a default value of zero, meaning that uninitialized cells of persistent storage are assumed to be zero.

2.3.5. Example: persistent storage of all smart contracts.
Because we have more than one smart contract, distinguished by account_id, each having its separate persistent storage, we must actually have a hashmap

$$\mathit{Storage} : \mathsf{Account} \dashrightarrow \bigl(2^{256} \dashrightarrow 2^{256}\bigr) \tag{12}$$

mapping account_id of a smart contract into its persistent storage.

2.3.6. Hashmap type. The hashmap is not just an abstract (partially defined) function $2^n \dashrightarrow X$; it has a specific representation. Therefore, we suppose that we have a special hashmap type

$$\mathsf{Hashmap}(n, X) : \mathsf{Type} \tag{13}$$

corresponding to a data structure encoding a (partial) map $2^n \dashrightarrow X$. We can also write

$$\mathsf{Hashmap}(n : \mathsf{nat})(X : \mathsf{Type}) : \mathsf{Type} \tag{14}$$

or

$$\mathsf{Hashmap} : \mathsf{nat} \to \mathsf{Type} \to \mathsf{Type} \tag{15}$$

We can always transform $h : \mathsf{Hashmap}(n, X)$ into a map $\mathsf{hget}(h) : 2^n \to X^?$. Henceforth, we usually write $h[i]$ instead of $\mathsf{hget}(h)(i)$:

$$h[i] :\equiv \mathsf{hget}(h)(i) : X^? \quad \text{for any } i : 2^n,\ h : \mathsf{Hashmap}(n, X) \tag{16}$$

2.3.7. Definition of hashmap type as a Patricia tree. Logically, one might define $\mathsf{Hashmap}(n, X)$ as an (incomplete) binary tree of depth $n$ with edge labels 0 and 1 and with values of type $X$ in the leaves. Another way to describe the same structure would be as a (bitwise) trie for binary strings of length equal to $n$.

In practice, we prefer to use a compact representation of this trie, by compressing each vertex having only one child with its parent. The resulting representation is known as a Patricia tree or a binary radix tree. Each intermediate vertex now has exactly two children, labeled by two non-empty binary strings, beginning with zero for the left child and with one for the right child.

In other words, there are two types of (non-root) nodes in a Patricia tree:
- $\mathsf{Leaf}(x)$, containing value $x$ of type $X$.
- $\mathsf{Node}(l, s_l, r, s_r)$, where $l$ is the (reference to the) left child or subtree, $s_l$ is the bitstring labeling the edge connecting this vertex to its left child (always beginning with 0), $r$ is the right subtree, and $s_r$ is the bitstring labeling the edge to the right child (always beginning with 1).
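A minimal sketch of this compressed representation, assuming keys are Python strings of '0'/'1' characters of equal length $n$ and values are arbitrary objects. Leaf and Node are modelled as tuples; the top-level triple $(n, s_0, t)$, which stores the common prefix of all keys, corresponds to the root node described just below. The helper names (build_edge, build, hget, count_nodes) are illustrative, not part of any TON API, and None stands in for ⊥ in the result of hget.

```python
import os
from typing import Optional

# Node shapes: ("leaf", x) and ("node", l, s_l, r, s_r); edge labels are '0'/'1' strings.

def build_edge(items: dict[str, object]) -> tuple[str, tuple]:
    """Return (edge label, subtree) for a non-empty dict of equal-length bit-string keys."""
    label = os.path.commonprefix(list(items))  # longest prefix shared by all keys
    if len(items) == 1:
        (value,) = items.values()
        return label, ("leaf", value)
    rest = {key[len(label):]: v for key, v in items.items()}
    # After removing the shared prefix the keys split on their first bit,
    # so every intermediate vertex gets exactly two children.
    s_l, left = build_edge({k: v for k, v in rest.items() if k[0] == "0"})
    s_r, right = build_edge({k: v for k, v in rest.items() if k[0] == "1"})
    return label, ("node", left, s_l, right, s_r)

def build(n: int, items: dict[str, object]) -> tuple:
    """Patricia tree for a non-empty partial map 2^n -> X, returned as (n, s_0, t)."""
    assert items and all(len(k) == n for k in items)
    s0, t = build_edge(items)
    return n, s0, t

def hget(h: tuple, key: str) -> Optional[object]:
    """h[key], or None when the key is absent."""
    n, s0, t = h
    if len(key) != n or not key.startswith(s0):
        return None
    key = key[len(s0):]
    while t[0] == "node":
        _, left, s_l, right, s_r = t
        child, label = (left, s_l) if key[0] == "0" else (right, s_r)
        if not key.startswith(label):
            return None
        key, t = key[len(label):], child
    return t[1]

def count_nodes(t: tuple) -> tuple[int, int]:
    """Return (number of leaves, number of intermediate vertices)."""
    if t[0] == "leaf":
        return 1, 0
    l_leaves, l_inner = count_nodes(t[1])
    r_leaves, r_inner = count_nodes(t[3])
    return l_leaves + r_leaves, l_inner + r_inner + 1
```

For the keys 0000, 0011 and 1010 the sketch produces three leaves and two intermediate vertices, in line with the N − 1 count for intermediate vertices.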
A third type of node, to be used only once at the root of the Patricia tree, is also necessary:
- $\mathsf{Root}(n, s_0, t)$, where $n$ is the common length of index bitstrings of $\mathsf{Hashmap}(n, X)$, $s_0$ is the common prefix of all index bitstrings, and $t$ is a reference to a Leaf or a Node.

If we want to allow the Patricia tree to be empty, a fourth type of (root) node would be used:
- $\mathsf{EmptyRoot}(n)$, where $n$ is the common length of all index bitstrings.

We define the height of a Patricia tree by

$$\operatorname{height}\bigl(\mathsf{Leaf}(x)\bigr) = 0 \tag{17}$$

$$\operatorname{height}\bigl(\mathsf{Node}(l, s_l, r, s_r)\bigr) = \operatorname{height}(l) + \operatorname{len}(s_l) = \operatorname{height}(r) + \operatorname{len}(s_r) \tag{18}$$

$$\operatorname{height}\bigl(\mathsf{Root}(n, s_0, t)\bigr) = \operatorname{len}(s_0) + \operatorname{height}(t) = n \tag{19}$$

The last two expressions in each of the last two formulas must be equal. We use Patricia trees of height $n$ to represent values of type $\mathsf{Hashmap}(n, X)$.

If there are $N$ leaves in the tree (i.e., our hashmap contains $N$ values), then there are exactly $N - 1$ intermediate vertices. Inserting a new value always involves splitting an existing edge by inserting a new vertex in the middle and adding a new leaf as the other child of this new vertex. Deleting a value from a hashmap does the opposite: a leaf and its parent are deleted, and the parent's parent and its other child become directly linked.

2.3.8. Merkle–Patricia trees. When working with blockchains, we want to be able to compare Patricia trees (i.e., hash maps) and their subtrees, by reducing them to a single hash value. The classical way of achieving this is given by the Merkle tree. Essentially, we want to describe a way of hashing objects $h$ of type $\mathsf{Hashmap}(n, X)$ with the aid of a hash function Hash defined for binary strings, provided we know how to compute hashes $\mathsf{Hash}(x)$ of objects $x : X$ (e.g., by applying the hash function Hash to a binary serialization of object $x$).

One might define $\mathsf{Hash}(h)$ recursively as follows:

$$\mathsf{Hash}\bigl(\mathsf{Leaf}(x)\bigr) := \mathsf{Hash}(x) \tag{20}$$

$$\mathsf{Hash}\bigl(\mathsf{Node}(l, s_l, r, s_r)\bigr) := \mathsf{Hash}\bigl(\mathsf{Hash}(l).\mathsf{Hash}(r).\mathsf{code}(s_l).\mathsf{code}(s_r)\bigr) \tag{21}$$

$$\mathsf{Hash}\bigl(\mathsf{Root}(n, s_0, t)\bigr) := \mathsf{Hash}\bigl(\mathsf{code}(n).\mathsf{code}(s_0).\mathsf{Hash}(t)\bigr) \tag{22}$$

Here $s.t$ denotes the concatenation of (bit) strings $s$ and $t$, and $\mathsf{code}(s)$ is a prefix code for all bit strings $s$. For example, one might encode 0 by 10, 1 by 11, and the end of the string by 0.⁵

⁵ One can show that this encoding is optimal for approximately half of all edge labels of a Patricia tree with random or consecutive indices. Remaining edge labels are likely to be long (i.e., almost 256 bits long). Therefore, a nearly optimal encoding for edge labels is to use the above code with prefix 0 for short bit strings, and encode 1, then nine bits containing length $l = |s|$ of bitstring $s$, and then the $l$ bits of $s$ for long bitstrings (with $l \ge 10$).

We will see later (cf. 2.3.12 and 2.3.14) that this is a (slightly tweaked) version of recursively defined hashes for values of arbitrary (dependent) algebraic types.

2.3.9. Recomputing Merkle tree hashes. This way of recursively defining $\mathsf{Hash}(h)$, called a Merkle tree hash, has the advantage that, if one explicitly stores $\mathsf{Hash}(h')$ along with each node $h'$ (resulting in a structure called a Merkle tree, or, in our case, a Merkle–Patricia tree), one needs to recompute only at most $n$ hashes when an element is added to, deleted from or changed in the hashmap.

In this way, if one represents the global blockchain state by a suitable Merkle tree hash, it is easy to recompute this state hash after each transaction.

2.3.10. Merkle proofs. Under the assumption (7) of injectivity of the chosen hash function Hash, one can construct a proof that, for a given value $z$ of $\mathsf{Hash}(h)$, $h : \mathsf{Hashmap}(n, X)$, one has $\mathsf{hget}(h)(i) = x$ for some $i : 2^n$ and $x : X$. Such a proof will consist of the path in the Merkle–Patricia tree from the leaf corresponding to $i$ to the root, augmented by the hashes of all siblings of all nodes occurring on this path.
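The following sketch combines the recursive hash (20)–(22) with a Merkle proof of the kind just described. Several simplifications are assumptions of this illustration, not part of the text: bit strings are Python strings of '0'/'1' characters; Hash of a bit string is taken to be sha256 of its ASCII text, re-expanded to 256 bits; leaf values are assumed to be already serialized as bit strings; and code(n) is taken to be the prefix code of n written in binary. For each Node on the path, the proof records which side the path takes, the two edge labels, and the sibling's hash; the verifier also checks that the labels spell out the requested index.

```python
import hashlib
import os

def H(bits: str) -> str:
    """Stand-in for Hash on bit strings (assumption: sha256 of the ASCII '0'/'1' text)."""
    digest = hashlib.sha256(bits.encode("ascii")).digest()
    return "".join(f"{byte:08b}" for byte in digest)

def code(s: str) -> str:
    """Prefix code from the text: 0 -> 10, 1 -> 11, end of string -> 0."""
    return "".join("1" + b for b in s) + "0"

def node_hash(h_l: str, s_l: str, h_r: str, s_r: str) -> str:
    return H(h_l + h_r + code(s_l) + code(s_r))  # equation (21)

def root_hash(n: int, s0: str, h_t: str) -> str:
    return H(code(format(n, "b")) + code(s0) + h_t)  # equation (22)

def build_edge(items: dict[str, str]) -> tuple[str, tuple]:
    """Patricia subtree: ("leaf", x) or ("node", ((s_l, l), (s_r, r)))."""
    label = os.path.commonprefix(list(items))
    if len(items) == 1:
        return label, ("leaf", next(iter(items.values())))
    rest = {k[len(label):]: v for k, v in items.items()}
    children = tuple(build_edge({k: v for k, v in rest.items() if k[0] == b}) for b in "01")
    return label, ("node", children)

def tree_hash(t: tuple) -> str:
    if t[0] == "leaf":
        return H(t[1])  # equation (20), with x already a bit string
    (s_l, l), (s_r, r) = t[1]
    return node_hash(tree_hash(l), s_l, tree_hash(r), s_r)

def hashmap_hash(n: int, items: dict[str, str]) -> str:
    s0, t = build_edge(items)
    return root_hash(n, s0, tree_hash(t))

def prove(n: int, items: dict[str, str], key: str):
    """Full-node side: return (value, proof); proof = (n, s0, steps from the leaf upward)."""
    s0, t = build_edge(items)
    assert len(key) == n and key.startswith(s0), "key not present"
    rest, steps = key[len(s0):], []
    while t[0] == "node":
        side = int(rest[0])
        label, child = t[1][side]
        sib_label, sibling = t[1][1 - side]
        assert rest.startswith(label), "key not present"
        steps.append((side, label, sib_label, tree_hash(sibling)))
        rest, t = rest[len(label):], child
    return t[1], (n, s0, steps[::-1])

def verify(expected_root: str, key: str, value: str, proof) -> bool:
    """Light-node side: recompute the root hash from the value and sibling hashes."""
    n, s0, steps = proof
    labels = [label for _, label, _, _ in reversed(steps)]  # top-down order
    if len(key) != n or s0 + "".join(labels) != key:
        return False  # the proof must describe the path to this very index
    h = H(value)
    for side, label, sib_label, sib_hash in steps:
        if label[:1] != str(side):
            return False  # left labels start with 0, right labels with 1
        if side == 0:
            h = node_hash(h, label, sib_hash, sib_label)
        else:
            h = node_hash(sib_hash, sib_label, h, label)
    return root_hash(n, s0, h) == expected_root
```

A light node that already trusts the root hash can thus accept a value from a full node without holding the tree itself: a tampered value or a proof for a different index fails the check (barring hash collisions, as in (7)).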
In this way, a light node⁶ knowing only the value of $\mathsf{Hash}(h)$ for some hashmap $h$ (e.g., smart-contract persistent storage or global blockchain state) might request from a full node⁷ not only the value $x = h[i] = \mathsf{hget}(h)(i)$, but such a value along with a Merkle proof starting from the already known value $\mathsf{Hash}(h)$. Then, under assumption (7), the light node can check for itself that $x$ is indeed the correct value of $h[i]$.

In some cases, the client may want to obtain the value $y = \mathsf{Hash}(x) = \mathsf{Hash}(h[i])$ instead, for example, if $x$ itself is very large (e.g., a hashmap itself). Then a Merkle proof for $(i, y)$ can be provided instead. If $x$ is a hashmap as well, then a second Merkle proof starting from $y = \mathsf{Hash}(x)$ may be obtained from a full node, to provide a value $x[j] = h[i][j]$ or just its hash.

⁶ A light node is a node that does not keep track of the full state of a shardchain; instead, it keeps minimal information such as the hashes of the several most recent blocks, and relies on information obtained from full nodes when it becomes necessary to inspect some parts of the full state.
⁷ A full node is a node keeping track of the complete up-to-date state of the shardchain in question.

2.3.11. Importance of Merkle proofs for a multi-chain system such as TON. Notice that a node normally cannot be a full node for all shardchains existing in the TON environment. It usually is a full node only for some shardchains, for instance, those containing its own account, a smart contract it is interested in, or those that this node has been assigned to be a validator of. For other shardchains, it must be a light node; otherwise the storage, computing and network bandwidth requirements would be prohibitive.
This means that such a node cannot directly check assertions about the state of other shardchains; it must rely on Merkle proofs obtained from full nodes for those shardchains, which is as safe as checking by itself unless (7) fails (i.e., a hash collision is found).

2.3.12. Peculiarities of TON VM. The TON VM or TVM (TON Virtual Machine), used to run smart contracts in the masterchain and Workchain Zero, is considerably different from customary designs inspired by the EVM (Ethereum Virtual Machine): it works not just with 256-bit integers, but actually with (almost) arbitrary records, structures, or sum-product types, making it more suitable to execute code written in high-level (especially functional) languages. Essentially, TVM uses tagged data types, not unlike those used in implementations of Prolog or Erlang.

One might imagine first that the state of a TVM smart contract is not just a hashmap $2^{256} \to 2^{256}$, or $\mathsf{Hashmap}(256, 2^{256})$, but (as a first step) $\mathsf{Hashmap}(256, X)$, where $X$ is a type with several constructors, enabling it to store, apart from 256-bit integers, other data structures, including other hashmaps $\mathsf{Hashmap}(256, X)$ in particular. This would mean that a cell of TVM (persistent or temporary) storage, or a variable or an element of an array in a TVM smart-contract code, may contain not only an integer, but a whole new hashmap. Of course, this would mean that a cell contains not just 256 bits, but also, say, an 8-bit tag, describing how these 256 bits should be interpreted.

In fact, values do not need to be precisely 256-bit. The value format used by TVM consists of a sequence of raw bytes and references to other structures, mixed in arbitrary order, with some descriptor bytes inserted in suitable locations to be able to distinguish pointers from raw data (e.g., strings or integers); cf. 2.3.14.

This raw value format may be used to implement arbitrary sum-product algebraic types.
In this case, the value would contain a raw byte first, describing the constructor being used (from the perspective of a high-level language), and then other fields or constructor arguments, consisting of raw bytes and references to other structures depending on the constructor chosen (cf. 2.2.5). However, TVM does not know anything about the correspondence between constructors and their arguments; the mixture of bytes and references is explicitly described by certain descriptor bytes.⁸

The Merkle tree hashing is extended to arbitrary such structures: to compute the hash of such a structure, all references are recursively replaced by hashes of objects referred to, and then the hash of the resulting byte string (descriptor bytes included) is computed.

In this way, the Merkle tree hashing for hashmaps, described in 2.3.8, is just a special case of hashing for arbitrary (dependent) algebraic data types, applied to type $\mathsf{Hashmap}(n, X)$ with two constructors.⁹

2.3.13. Persistent storage of TON smart contracts. Persistent storage of a TON smart contract essentially consists of its global variables, preserved between calls to the smart contract. As such, it is just a product, tuple, or record type, consisting of fields of the correct types, corresponding to one global variable each. If there are too many global variables, they cannot fit into one TON cell because of the global restriction on TON cell size. In such a case, they are split into several records and organized into a tree, essentially becoming a "product of products" or "product of products of products" type instead of just a product type.

2.3.14. TVM Cells. Ultimately, the TON VM keeps all data in a collection of (TVM) cells. Each cell contains two descriptor bytes first, indicating how many bytes of raw data are present in this cell (up to 128) and how many references to other cells are present (up to four). Then these raw data bytes and references follow.
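A minimal sketch of a cell with the limits just stated, together with the recursive hashing rule of 2.3.12: every reference is replaced by the hash of the referenced cell, and the hashed byte string includes the descriptor bytes. This section does not fix the bit layout of the two descriptor bytes, so the sketch simply stores the number of references and the number of raw data bytes, one byte each, and places all references after the raw data; both choices are assumptions for illustration, not TVM's actual encoding. sha256 is used as the hash, following 2.2.10.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

MAX_DATA_BYTES = 128  # limits stated in the text
MAX_REFS = 4

@dataclass(frozen=True)
class Cell:
    data: bytes = b""
    refs: tuple[Cell, ...] = ()

    def __post_init__(self) -> None:
        if len(self.data) > MAX_DATA_BYTES:
            raise ValueError("a cell holds at most 128 raw data bytes")
        if len(self.refs) > MAX_REFS:
            raise ValueError("a cell holds at most four references")

    def descriptor(self) -> bytes:
        # Hypothetical layout: first byte = number of references, second = number of data bytes.
        return bytes([len(self.refs), len(self.data)])

    def merkle_hash(self) -> bytes:
        # Replace each reference by the recursively computed hash of the cell it points to,
        # then hash descriptor bytes + raw data + child hashes.
        child_hashes = b"".join(ref.merkle_hash() for ref in self.refs)
        return hashlib.sha256(self.descriptor() + self.data + child_hashes).digest()
```

Changing one byte in any referenced cell changes the hash of every cell above it, up to the root, while structurally identical trees always hash to the same value.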
Each cell is referenced exactly once, so we might have included in each cell a reference to its parent (the only cell referencing this one). However, this reference need not be explicit.

In this way, the persistent data storage cells of a TON smart contract are organized into a tree,¹⁰ with a reference to the root of this tree kept in the smart-contract description. If necessary, a Merkle tree hash of this entire persistent storage is recursively computed, starting from the leaves and then simply replacing all references in a cell with the recursively computed hashes of the referenced cells, and subsequently computing the hash of the byte string thus obtained.

⁸ These two descriptor bytes, present in any TVM cell, describe only the total number of references and the total number of raw bytes; references are kept together either before or after all raw bytes.
⁹ Actually, Leaf and Node are constructors of an auxiliary type, $\mathsf{HashmapAux}(n, X)$. Type $\mathsf{Hashmap}(n, X)$ has constructors Root and EmptyRoot, with Root containing a value of type $\mathsf{HashmapAux}(n, X)$.
¹⁰ Logically; the "bag of cells" representation described in 2.5.5 identifies all duplicate cells, transforming this tree into a directed acyclic graph (dag) when serialized.

2.3.15. Generalized Merkle proofs for values of arbitrary algebraic types. Because the TON VM represents a value of arbitrary algebraic type by means of a tree consisting of (TVM) cells, and each cell has a well-defined (recursively computed) Merkle hash, depending in fact on the whole subtree rooted in this cell, we can provide generalized Merkle proofs for (parts of) values of arbitrary algebraic types, intended to prove that a certain subtree of a tree with a known Merkle hash takes a specific value or a value with a specific hash. This generalizes the approach of 2.3.10, where only Merkle proofs for $x[i] = y$ have been considered.

2.3.16. Support for sharding in TON VM data structures. We have just outlined how the TON VM, without being overly complicated, supports arbitrary (dependent) algebraic data types in high-level smart-contract languages. However, sharding of large (or global) smart contracts requires special support on the level of TON VM. To this end, a special version of the hashmap type has been added to the system, amounting to a "map" $\mathsf{Account} \dashrightarrow X$. This "map" might seem equivalent to $\mathsf{Hashmap}(m, X)$, where $\mathsf{Account} = 2^m$. However, when a shard is split in two, or two shards are merged, such hashmaps are automatically split in two, or merged back, so as to keep only those keys that belong to the corresponding shard.

2.3.17. Payment for persistent storage. A noteworthy feature of the TON Blockchain is the payment exacted from smart contracts for storing their persistent data (i.e., for enlarging the total state of the blockchain). It works as follows:

Each block declares two rates, denominated in the principal currency of the blockchain (usually the TON coin): the price for keeping one cell in the persistent storage, and the price for keeping one raw byte in some cell of the persistent storage.

Statistics on the total numbers of cells and bytes used by each account are stored as part of its state, so by multiplying these numbers by the two rates declared in the block header, we can compute the payment to be deducted from the account balance for keeping its data between the previous block and the current one.
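The fee computation described here can be sketched as follows, assuming integer prices in the smallest currency unit, one charge per block interval using that block's declared rates, and the deferred collection and destruction rule stated in this subsection. The names (BlockRates, Account, storage_fee, collect_storage_fee) are hypothetical, and the free-byte allowance a workchain may declare is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockRates:
    cell_price: int  # price for keeping one cell for one block interval
    byte_price: int  # price for keeping one raw data byte for one block interval

@dataclass
class Account:
    balance: int
    cells: int            # storage statistics kept in the account state
    raw_bytes: int
    last_paid_seqno: int  # block where storage payment was last collected
    destroyed: bool = False

def storage_fee(acc: Account, rates: BlockRates) -> int:
    """Payment for keeping the account's data from the previous block to this one."""
    return acc.cells * rates.cell_price + acc.raw_bytes * rates.byte_price

def collect_storage_fee(acc: Account, seqno: int, rates_by_block: dict[int, BlockRates]) -> None:
    """Charge for every block since the last payment, before any other action on the account."""
    due = sum(
        storage_fee(acc, rates_by_block[b])
        for b in range(acc.last_paid_seqno + 1, seqno + 1)
    )
    acc.balance -= due
    acc.last_paid_seqno = seqno
    if acc.balance < 0:
        acc.destroyed = True  # an insolvent account is destroyed
```

With prices of 2 per cell and 1 per byte, an account holding 3 cells and 10 bytes owes 16 per block, so collecting for blocks 6 to 8 deducts 48.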
However, payment for persistent storage usage is not exacted for every account and smart contract in each block; instead, the sequence number of the block where this payment was last exacted is stored in the account data, and when any action is done with the account (e.g., a value transfer, or a message being received and processed by a smart contract), the storage usage payment for all blocks since the previous such payment is deducted from the account balance before performing any further actions. If the account's balance would become negative after this, the account is destroyed.

A workchain may declare some number of raw data bytes per account to be free (i.e., not participating in the persistent storage payments) in order to make simple accounts, which keep only their balance in one or two cryptocurrencies, exempt from these constant payments.

Notice that, if nobody sends any messages to an account, its persistent storage payments are not collected, and it can exist indefinitely. However, anybody can send, for instance, an empty message to destroy such an account. A small incentive, collected from part of the original balance of the account to be destroyed, can be given to the sender of such a message. We expect, however, that the validators would destroy such insolvent accounts for free, simply to decrease the global blockchain state size and to avoid keeping large amounts of data without compensation.

Payments collected for keeping persistent data are distributed among the validators of the shardchain or the masterchain (proportionally to their stakes in the latter case).

2.3.18. Local and global smart contracts; smart-contract instances. A smart contract normally resides just in one shard, selected according to the smart contract's account_id, similarly to an ordinary account. This is usually sufficient for most applications. However, some high-load smart contracts may want to have an instance in each shardchain of some workchain.
To achieve this, they must propagate their creating transaction into all shardchains, for instance, by committing this transaction into the root shardchain $(w, \emptyset)$¹¹ of the workchain $w$ and paying a large commission.¹²

¹¹ A more expensive alternative is to publish such a global smart contract in the masterchain.
¹² This is a sort of broadcast feature for all shards, and as such, it must be quite expensive.

This action effectively creates instances of the smart contract in each shard, with separate balances. Originally, the balance transferred in the creating transaction is distributed simply by giving the instance in shard $(w, s)$ the $2^{-|s|}$ part of the total balance. When a shard splits into two child shards, balances of all instances of global smart contracts are split in half; when two shards merge, balances are added together.

In some cases, splitting/merging instances of global smart contracts may involve (delayed) execution of special methods of these smart contracts. By default, the balances are split and merged as described above, and some special account-indexed hashmaps are also automatically split and merged (cf. 2.3.16).

2.3.19. Limiting splitting of smart contracts. A global smart contract may limit its splitting depth $d$ upon its creation, in order to make persistent storage expenses more predictable. This means that, if shardchain $(w, s)$ with $|s| \ge d$ splits in two, only one of two new shardchains inherits an instance of the smart contract. This shardchain is chosen deterministically: each global smart contract has some account_id, which is essentially the hash of its creating transaction, and its instances have the same account_id with the first $\le d$ bits replaced by suitable values needed to fall into the correct shard. This account_id selects which shard will inherit the smart-contract instance after splitting.

2.3.20. Account/Smart-contract state.
We can summarize all of the above to conclude that an account or smart-contract state consists of the following:
- A balance in the principal currency of the blockchain
- A balance in other currencies of the blockchain
- Smart-contract code (or its hash)
- Smart-contract persistent data (or its Merkle hash)
- Statistics on the number of persistent storage cells and raw bytes used
- The last time (actually, the masterchain block number) when payment for smart-contract persistent storage was collected
- The public key needed to transfer currency and send messages from this account (optional; by default equal to account_id itself). In some cases, more sophisticated signature checking code may be located here, similar to what is done for Bitcoin transaction outputs; then the account_id will be equal to the hash of this code.

We also need to keep somewhere, either in the account state or in some other account-indexed hashmap, the following data:
- The output message queue of the account (cf. 2.4.17)
- The collection of (hashes of) recently delivered messages (cf. 2.4.23)

Not all of these are really required for every account; for example, smart-contract code is needed only for smart contracts, but not for simple accounts. Furthermore, while any account must have a non-zero balance in the principal currency (e.g., TON coins for the masterchain and shardchains of the basic workchain), it may have balances of zero in other currencies. In order to avoid keeping unused data, a sum-product type (depending on the workchain) is defined (during the workchain's creation), which uses different tag bytes (e.g., TL constructors; cf. 2.2.5) to distinguish between different constructors used. Ultimately, the account state is itself kept as a collection of cells of the TVM persistent storage.

2.4 Messages Between Shardchains

An important component of the TON Blockchain is the messaging system between blockchains.
These blockchains may be shardchains of the same workchain, or of different workchains.

2.4.1. Messages, accounts and transactions: a bird's eye view of the system. Messages are sent from one account to another. Each transaction consists of an account receiving one message, changing its state according to certain rules, and generating several (maybe one or zero) new messages to other accounts. Each message is generated and received (delivered) exactly once.

This means that messages play a fundamental role in the system, comparable to that of accounts (smart contracts). From the perspective of the Infinite Sharding Paradigm (cf. 2.1.2), each account resides in its separate account-chain, and the only way it can affect the state of some other account is by sending a message.

2.4.2. Accounts as processes or actors; Actor model. One might think about accounts (and smart contracts) as processes, or actors, that are able to process incoming messages, change their internal state and generate some outbound messages as a result. This is closely related to the so-called Actor model, used in languages such as Erlang (however, actors in Erlang are usually called processes). Since new actors (i.e., smart contracts) are also allowed to be created by existing actors as a result of processing an inbound message, the correspondence with the Actor model is essentially complete.

2.4.3. Message recipient. Any message has its recipient, characterized by the target workchain identifier $w$ (assumed by default to be the same as that of the originating shardchain), and the recipient account account_id. The exact format (i.e., number of bits) of account_id depends on $w$; however, the shard is always determined by its first (most significant) 64 bits.

2.4.4. Message sender. In most cases, a message has a sender, characterized again by a $(w', \mathit{account\_id}')$ pair. If present, it is located after the message recipient and message value.
Sometimes, the sender is unimportant or it is somebody outside the blockchain (i.e., not a smart contract), in which case this field is absent. Notice that the Actor model does not require the messages to have an implicit sender. Instead, messages may contain a reference to the Actor to which an answer to the request should be sent; usually it coincides with the sender. However, it is useful to have an explicit unforgeable sender field in a message in a cryptocurrency (Byzantine) environment.

2.4.5. Message value. Another important characteristic of a message is its attached value, in one or several cryptocurrencies supported both by the source and by the target workchain. The value of the message is indicated at its very beginning, immediately after the message recipient; it is essentially a list of (currency_id, value) pairs.

Notice that simple value transfers between simple accounts are just empty (no-op) messages with some value attached to them. On the other hand, a slightly more complicated message body might contain a simple text or binary comment (e.g., about the purpose of the payment).

2.4.6. External messages, or messages from nowhere. Some messages arrive into the system from nowhere; that is, they are not generated by an account (smart contract or not) residing in the blockchain. The most typical example arises when a user wants to transfer some funds from an account controlled by her to some other account. In this case, the user sends a message from nowhere to her own account, requesting it to generate a message to the receiving account, carrying the specified value. If this message is correctly signed, her account receives it and generates the required outbound messages.

In fact, one might consider a simple account as a special case of a smart contract with predefined code. This smart contract receives only one type of message.
Such an inbound message must contain a list of outbound messages to be generated as a result of delivering (processing) the inbound message, along with a signature. The smart contract checks the signature and, if it is correct, generates the required messages.

Of course, there is a difference between messages from nowhere and normal messages: messages from nowhere cannot bear value, so they cannot pay for their gas (i.e., their processing) themselves. Instead, they are tentatively executed with a small gas limit before even being suggested for inclusion in a new shardchain block; if the execution fails (the signature is incorrect), the message from nowhere is deemed incorrect and is discarded. If the execution does not fail within the small gas limit, the message may be included in a new shardchain block and processed completely, with the payment for the gas (processing capacity) consumed exacted from the receiver's account. Messages from nowhere can also define some transaction fee, which is deducted from the receiver's account on top of the gas payment for redistribution to the validators.

In this sense, messages from nowhere, or external messages, take the role of transaction candidates used in other blockchain systems (e.g., Bitcoin and Ethereum).

2.4.7. Log messages, or messages to nowhere. Similarly, sometimes a special message can be generated and routed to a specific shardchain not to be delivered to its recipient, but to be logged in order to be easily observable by anybody receiving updates about the shard in question. These logged messages may be output in a user's console, or trigger an execution of some script on an off-chain server. In this sense, they represent the external output of the blockchain supercomputer, just as the messages from nowhere represent its external input.

2.4.8. Interaction with off-chain services and external blockchains.
These external input and output messages can be used for interacting with off-chain services and other (external) blockchains, such as Bitcoin or Ethereum. One might create tokens or cryptocurrencies inside the TON Blockchain pegged to Bitcoins, Ethers or any ERC-20 tokens defined in the Ethereum blockchain, and use messages from nowhere and messages to nowhere, generated and processed by scripts residing on some third-party off-chain servers, to implement the necessary interaction between the TON Blockchain and these external blockchains.

2.4.9. Message body. The message body is simply a sequence of bytes, the meaning of which is determined only by the receiving workchain and/or smart contract. For blockchains using TON VM, this could be the serialization of any TVM cell, generated automatically via the Send() operation. Such a serialization is obtained simply by recursively replacing all references in a TON VM cell with the cells referred to. Ultimately, a string of raw bytes appears, which is usually prefixed by a 4-byte message type or message constructor, used to select the correct method of the receiving smart contract.

Another option would be to use TL-serialized objects (cf. 2.2.5) as message bodies. This might be especially useful for communication between different workchains, one or both of which are not necessarily using the TON VM.

2.4.10. Gas limit and other workchain/VM-specific parameters. Sometimes a message needs to carry information about the gas limit, the gas price, transaction fees and similar values that depend on the receiving workchain and are relevant only for the receiving workchain, but not necessarily for the originating workchain. Such parameters are included in or before the message body, sometimes (depending on the workchain) with special 4-byte prefixes indicating their presence (which can be defined by a TL-scheme; cf. 2.2.5).

2.4.11. Creating messages: smart contracts and transactions.
There are two sources of new messages. Most messages are created during smart-contract execution (via the Send() operation in TON VM), when some smart contract is invoked to process an incoming message. Alternatively, messages may come from the outside as external messages, or messages from nowhere (cf. 2.4.6).[13]

[13] The above needs to be literally true only for the basic workchain and its shardchains; other workchains may provide other ways of creating messages.

2.4.12. Delivering messages. When a message reaches the shardchain containing its destination account,[14] it is delivered to its destination account. What happens next depends on the workchain; from an outside perspective, it is important that such a message can never be forwarded further from this shardchain.

For shardchains of the basic workchain, delivery consists in adding the message value (minus any gas payments) to the balance of the receiving account, and possibly in invoking a message-dependent method of the receiving smart contract afterwards, if the receiving account is a smart contract. In fact, a smart contract has only one entry point for processing all incoming messages, and it must distinguish between different types of messages by looking at their first few bytes (e.g., the first four bytes containing a TL constructor; cf. 2.2.5).

2.4.13. Delivery of a message is a transaction. Because the delivery of a message changes the state of an account or smart contract, it is a special transaction in the receiving shardchain, and is explicitly registered as such. Essentially, all TON Blockchain transactions consist in the delivery of one inbound message to its receiving account (smart contract), neglecting some minor technical details.

2.4.14. Messages between instances of the same smart contract.
Recall that a smart contract may be local (i.e., residing in one shardchain as any ordinary account does) or global (i.e., having instances in all shards, or at least in all shards up to some known depth d; cf. 2.3.18). Instances of a global smart contract may exchange special messages to transfer information and value between each other if required. In this case, the (unforgeable) sender account_id becomes important (cf. 2.4.4).

2.4.15. Messages to any instance of a smart contract; wildcard addresses. Sometimes a message (e.g., a client request) needs to be delivered to any instance of a global smart contract, usually the closest one (if there is one residing in the same shardchain as the sender, it is the obvious candidate). One way of doing this is by using a wildcard recipient address, with the first d bits of the destination account_id allowed to take arbitrary values. In practice, one will usually set these d bits to the same values as in the sender's account_id.

[14] As a degenerate case, this shardchain may coincide with the originating shardchain, for example, if we are working inside a workchain which has not yet been split.

2.4.16. Input queue is absent. All messages received by a blockchain (usually a shardchain; sometimes the masterchain), or, essentially, by an account-chain residing inside some shardchain, are immediately delivered (i.e., processed by the receiving account). Therefore, there is no input queue as such. Instead, if not all messages destined for a specific shardchain can be processed because of limitations on the total size of blocks and gas usage, some messages are simply left to accumulate in the output queues of the originating shardchains.

2.4.17. Output queues. From the perspective of the Infinite Sharding Paradigm (cf. 2.1.2), each account-chain (i.e., each account) has its own output queue, consisting of all messages it has generated, but not yet delivered to their recipients.
Of course, account-chains have only a virtual existence; they are grouped into shardchains, and a shardchain has an output queue, consisting of the union of the output queues of all accounts belonging to the shardchain.

This shardchain output queue imposes only a partial order on its member messages. Namely, a message generated in a preceding block must be delivered before any message generated in a subsequent block, and any messages generated by the same account and having the same destination must be delivered in the order of their generation.

2.4.18. Reliable and fast inter-chain messaging. It is of paramount importance for a scalable multi-blockchain project such as TON to be able to forward and deliver messages between different shardchains (cf. 2.1.3), even if there are millions of them in the system. The messages should be delivered reliably (i.e., messages should not be lost or delivered more than once) and quickly. The TON Blockchain achieves this goal by using a combination of two message routing mechanisms.

2.4.19. Hypercube routing: slow path for messages with assured delivery. The TON Blockchain uses hypercube routing as a slow, but safe and reliable way of delivering messages from one shardchain to another, using several intermediate shardchains for transit if necessary. Direct delivery would require the validators of any given shardchain to keep track of the state of (the output queues of) all other shardchains, which would demand prohibitive amounts of computing power and network bandwidth as the total quantity of shardchains grows, thus limiting the scalability of the system. Therefore, it is not possible to deliver messages directly from any shard to every other.

Instead, each shard is connected only to shards differing in exactly one hexadecimal digit of their (w, s) shard identifiers (cf. 2.1.8). In this way, all shardchains constitute a hypercube graph, and messages travel along the edges of this hypercube.
If a message is sent to a shard different from the current one, one of the hexadecimal digits (chosen deterministically) of the current shard identifier is replaced by the corresponding digit of the target shard, and the resulting identifier is used as the proximate target to forward the message to.[15]

The main advantage of hypercube routing is that the block validity conditions imply that validators creating blocks of a shardchain must collect and process messages from the output queues of neighboring shardchains, on pain of losing their stakes. In this way, any message can be expected to reach its final destination sooner or later; a message cannot be lost in transit or delivered twice.

Notice that hypercube routing introduces some additional delays and expenses, because of the necessity to forward messages through several intermediate shardchains. However, the number of these intermediate shardchains grows very slowly, as the logarithm $\log N$ (more precisely, $\lceil \log_{16} N \rceil - 1$) of the total number of shardchains $N$. For example, if $N \approx 250$, there will be at most one intermediate hop; and for $N \approx 4000$ shardchains, at most two. With four intermediate hops, we can support up to one million shardchains. We think this is a very small price to pay for the essentially unlimited scalability of the system. In fact, it is not necessary to pay even this price:

2.4.20. Instant Hypercube Routing: fast path for messages. A novel feature of the TON Blockchain is that it introduces a fast path for forwarding messages from one shardchain to any other, allowing in most cases to bypass the slow hypercube routing of 2.4.19 altogether and deliver the message into the very next block of the final destination shardchain.

The idea is as follows.
During the slow hypercube routing, the message travels (in the network) along the edges of the hypercube, but it is delayed (for approximately five seconds) at each intermediate vertex to be committed into the corresponding shardchain before continuing its voyage.

To avoid unnecessary delays, one might instead relay the message along with a suitable Merkle proof along the edges of the hypercube, without waiting to commit it into the intermediate shardchains. In fact, the network message should be forwarded from the validators of the task group (cf. 2.6.8) of the original shard to the designated block producer (cf. 2.6.9) of the task group of the destination shard; this might be done directly without going along the edges of the hypercube. When this message with the Merkle proof reaches the validators (more precisely, the collators; cf. 2.6.5) of the destination shardchain, they can commit it into a new block immediately, without waiting for the message to complete its travel along the slow path. Then a confirmation of delivery along with a suitable Merkle proof is sent back along the hypercube edges, and it may be used to stop the travel of the message along the slow path, by committing a special transaction.

Note that this instant delivery mechanism does not replace the slow but failproof mechanism described in 2.4.19. The slow path is still needed because the validators cannot be punished for losing or simply deciding not to commit the fast path messages into new blocks of their blockchains.[16] Therefore, both message forwarding methods are run in parallel, and the slow mechanism is aborted only if a proof of success of the fast mechanism is committed into an intermediate shardchain.[17]

[15] This is not necessarily the final version of the algorithm used to compute the next hop for hypercube routing. In particular, hexadecimal digits may be replaced by r-bit groups, with r a configurable parameter, not necessarily equal to four.
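The next-hop rule of 2.4.19 and the bound $\lceil \log_{16} N \rceil - 1$ on intermediate hops can be checked with a small sketch. It assumes, for simplicity, that all shard identifiers have the same number of hexadecimal digits and that the digit replaced at each hop is the leftmost one that differs; the text only says the digit is "chosen deterministically", and footnote 15 warns that the rule may change. The function names are hypothetical.

```python
from typing import List


def next_hop(current: str, target: str) -> str:
    """Return the next shard on the hypercube path from current to target.

    Shard identifiers are equal-length hex strings (a simplifying assumption).
    The digit replaced is the leftmost one that differs: one possible
    deterministic choice, not necessarily the one TON uses.
    """
    if len(current) != len(target):
        raise ValueError("shard identifiers must have the same length")
    for i, (c, t) in enumerate(zip(current, target)):
        if c != t:
            # Replace exactly one hex digit: move along one hypercube edge.
            return current[:i] + t + current[i + 1:]
    return current  # already at the destination shard


def route(source: str, target: str) -> List[str]:
    """Full slow-path route, including the source and target shards."""
    path = [source]
    while path[-1] != target:
        path.append(next_hop(path[-1], target))
    return path


def max_intermediate_hops(n_shards: int) -> int:
    """Worst-case number of intermediate shardchains, ceil(log16 N) - 1.

    Uses integer arithmetic so that exact powers of 16 are handled exactly.
    """
    digits, capacity = 0, 1
    while capacity < n_shards:  # smallest digits with 16**digits >= N
        capacity *= 16
        digits += 1
    return max(digits - 1, 0)
```

For example, route("0a3", "f1c") passes through "fa3" and "f13", two intermediate shards, which matches max_intermediate_hops(4096) == 2 for three-digit identifiers. The same function gives 1 for N = 250, 2 for N = 4000 and 4 for N = 1,000,000, in line with the figures quoted in 2.4.19.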
2.4.21. Collecting input messages from output queues of neighboring shardchains. When a new block for a shardchain is proposed, some of the output messages of the neighboring (in the sense of the routing hypercube of 2.4.19) shardchains are included in the new block as input messages and immediately delivered (i.e., processed). There are certain rules as to the order in which these neighbors' output messages must be processed. Essentially, an older message (coming from a shardchain block referring to an older masterchain block) must be delivered before any newer message; and for messages coming from the same neighboring shardchain, the partial order of the output queue described in 2.4.17 must be observed.

2.4.22. Deleting messages from output queues. Once an output queue message is observed as having been delivered by a neighboring shardchain, it is explicitly deleted from the output queue by a special transaction.

[16] However, the validators have some incentive to do so as soon as possible, because they will be able to collect all forwarding fees associated with the message that have not yet been consumed along the slow path.

[17] In fact, one might temporarily or permanently disable the instant delivery mechanism altogether, and the system would continue working, albeit more slowly.

2.4.23. Preventing double delivery of messages. To prevent double delivery of messages taken from the output queues of the neighboring shardchains, each shardchain (more precisely, each account-chain inside it) keeps the collection of recently delivered messages (or just their hashes) as part of its state. When a delivered message is observed to be deleted from the output queue by its originating neighboring shardchain (cf. 2.4.22), it is deleted from the collection of recently delivered messages as well.

2.4.24. Forwarding messages intended for other shardchains. Hypercube routing (cf. 2.4.19) means that sometimes outbound messages are delivered not to the shardchain containing the intended recipient, but to a neighboring shardchain lying on the hypercube path to the destination. In this case, delivery consists in moving the inbound message to the outbound queue. This is reflected explicitly in the block as a special forwarding transaction, containing the message itself. Essentially, this looks as if the message had been received by somebody inside the shardchain, and one identical message had been generated as a result.

2.4.25. Payment for forwarding and keeping a message. The forwarding transaction actually spends some gas (depending on the size of the message being forwarded), so a gas payment is deducted from the value of the message being forwarded on behalf of the validators of this shardchain. This forwarding payment is normally considerably smaller than the gas payment exacted when the message is finally delivered to its recipient, even if the message has been forwarded several times because of hypercube routing. Furthermore, as long as a message is kept in the output queue of some shardchain, it is part of the shardchain's global state, so a payment for keeping global data for a long time may also be collected by special transactions.

2.4.26. Messages to and from the masterchain. Messages can be sent directly from any shardchain to the masterchain, and vice versa. However, gas prices for sending messages to and for processing messages in the masterchain are quite high, so this ability will be used only when truly necessary, for example, by the validators to deposit their stakes. In some cases, a minimal deposit (attached value) for messages sent to the masterchain may be defined, which is returned only if the message is deemed valid by the receiving party.

Messages cannot be automatically routed through the masterchain. A message with workchain_id $\neq -1$ ($-1$ being the special workchain_id indicating the masterchain) cannot be delivered to the masterchain. In principle, one can create a message-forwarding smart contract inside the masterchain, but the price of using it would be prohibitive.

2.4.27. Messages between accounts in the same shardchain. In some cases, a message is generated by an account belonging to some shardchain, destined to another account in the same shardchain. For example, this happens in a new workchain which has not yet split into several shardchains because the load is manageable.

Such messages might be accumulated in the output queue of the shardchain and then processed as incoming messages in subsequent blocks (any shard is considered a neighbor of itself for this purpose). However, in most cases it is possible to deliver these messages within the originating block itself.

In order to achieve this, a partial order is imposed on all transactions included in a shardchain block, and the transactions (each consisting in the delivery of a message to some account) are processed respecting this partial order. In particular, a transaction is allowed to process some output message of a preceding transaction with respect to this partial order.

In this case, the message body is not copied twice. Instead, the originating and the processing transactions refer to a shared copy of the message.

2.5 Global Shardchain State. Bag of Cells Philosophy.

Now we are ready to describe the global state of a TON blockchain, or at least of a shardchain of the basic workchain.

We start with a high-level or logical description, which consists in saying that the global state is a value of algebraic type ShardchainState.

2.5.1. Shardchain state as a collection of account-chain states. According to the Infinite Sharding Paradigm (cf. 2.1.2), any shardchain is just a (temporary) collection of virtual account-chains, containing exactly one account each.
This means that, essentially, the global shardchain state must be a hashmap

$$\mathit{ShardchainState} := (\mathit{Account} \dashrightarrow \mathit{AccountState}) \qquad (23)$$

where all account_id appearing as indices of this hashmap must begin with prefix s, if we are discussing the state of shard (w, s) (cf. 2.1.8).

In practice, we might want to split AccountState into several parts (e.g., keep the account output message queue separate to simplify its examination by the neighboring shardchains), and have several hashmaps $(\mathit{Account} \dashrightarrow \mathit{AccountStatePart}_i)$ inside the ShardchainState. We might also add a small number of global or integral parameters to the ShardchainState (e.g., the total balance of all accounts belonging to this shard, or the total number of messages in all output queues).

However, (23) is a good first approximation of what the shardchain global state looks like, at least from a logical (high-level) perspective. The formal description of algebraic types AccountState and ShardchainState can be done with the aid of a TL-scheme (cf. 2.2.5), to be provided elsewhere.

2.5.2. Splitting and merging shardchain states. Notice that the Infinite Sharding Paradigm description of the shardchain state (23) shows how this state should be processed when shards are split or merged. In fact, these state transformations turn out to be very simple operations with hashmaps.

2.5.3. Account-chain state. The (virtual) account-chain state is just the state of one account, described by type AccountState. Usually it has all or some of the fields listed in 2.3.20, depending on the specific constructor used.

2.5.4. Global workchain state. Similarly to (23), we may define the global workchain state by the same formula, but with account_id's allowed to take any values, not just those belonging to one shard.
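To make the claim of 2.5.2 concrete, here is a toy model of (23) in which splitting a shard state is a partition of the hashmap by the bit after the prefix, and merging is a disjoint union; the workchain state of 2.5.4 is then the merge of all shard states. This is an illustration under stated assumptions, not TON's implementation: a Python dict stands in for the hashmap, account_ids are short bit strings, and AccountState is a placeholder dict of fields.

```python
from functools import reduce
from typing import Dict, List, Tuple

# Toy stand-ins (assumptions): account_id is a bit string such as "0110",
# and AccountState is an arbitrary dict of named fields.
AccountState = Dict[str, int]
ShardchainState = Dict[str, AccountState]


def belongs_to_shard(state: ShardchainState, prefix: str) -> bool:
    """The condition attached to (23): every account_id begins with s."""
    return all(account_id.startswith(prefix) for account_id in state)


def split(state: ShardchainState, prefix: str) -> Tuple[ShardchainState, ShardchainState]:
    """Split shard s into s0 and s1 by the bit that follows the prefix.

    Assumes every account_id is longer than the prefix.
    """
    k = len(prefix)
    left = {a: st for a, st in state.items() if a[k] == "0"}
    right = {a: st for a, st in state.items() if a[k] == "1"}
    return left, right


def merge(left: ShardchainState, right: ShardchainState) -> ShardchainState:
    """Merge two shard states; their sets of accounts must be disjoint."""
    if left.keys() & right.keys():
        raise ValueError("shard states overlap")
    return {**left, **right}


def workchain_state(shards: List[ShardchainState]) -> ShardchainState:
    """The state we would obtain if all shards of a workchain merged into one."""
    return reduce(merge, shards, {})
```

Because each account lives in exactly one shard, split followed by merge returns the original hashmap unchanged, which is the sense in which these transformations are "very simple operations with hashmaps".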
Remarks similar to those made in 2.5.1 apply in this case as well: we might want to split this hashmap into several hashmaps, and we might want to add some integral parameters such as the total balance.

Essentially, the global workchain state must be given by the same type ShardchainState as the shardchain state, because it is the shardchain state we would obtain if all existing shardchains of this workchain suddenly merged into one.

2.5.5. Low-level perspective: bag of cells. There is a low-level description of the account-chain or shardchain state as well, complementary to the high-level description given above. This description is quite important, because it turns out to be pretty universal, providing a common basis for representing, storing, serializing and transferring by network almost all data used by the TON Blockchain (blocks, shardchain states, smart-contract storage, Merkle proofs, etc.). At the same time, such a universal low-level
