General scheme
Internal references, absent cells, and complete BoCs
Consider an arbitrary cellc
in a given BoC. A reference of c
is called internal if
the cell corresponding to the reference is also represented in BoC. Otherwise, the reference is called external and the corresponding
cell is called absent from that BoC. In turn, a BoC is called complete if it does not contain any external references.
Although most real-world cases only deal with complete BoCs, in general, the serialization of absent cells in BoC
differs from the serialization of included cells. Therefore, it is very important to be able to identify the type of references.
Assigning indices to the cells from a bag of cells
In the process of BoC serialization, the assignment of indices of its cells plays an important role. Letc1, ..., cn
be the n
distinct cells belonging to a bag of cells B.
The most used options are:
- Order cells by their representation hash. Thus
Hash(ci) < Hash(cj)
wheneveri < j
. - Topological order.
Outline of serialization process
The serialization process of a BoCB
consisting of n
cells can be outlined as follows.
- List the cells from B in a chosen order:
c1, ..., cn
(withc1, ..., c_k
as root cells). - Choose an integer number
s
, such thatn ≤ 2^s
. Represent each cellci
by an integral number of bytes as in standard representation cell algorithm, but using unsigned big-endian s-bit integerj
instead of hashHash(cj)
to represent internal references to cellcj
. More precisely, each individual cellc
is serialized as follows, provideds
is a multiple of eight.- Two descriptor bytes
d1
andd2
are computed by settingd1 = r + 8x + 16h + 32l
and (for absent cells, onlyd1
is present, always equals to7 + 16 + 32l
), where:0 ≤ r ≤ 4
is the number of cell references present in cellc
, ifc
is absent from the bag of cells being serialized and is represented by its hashes only, thenr
is set to7
;0 ≤ b ≤ 1023
is the number of data bits in cellc
;0 ≤ l ≤ 3
is the level of cellc
;x = 1
for exotic cells andx = 0
for ordinary cells;h = 1
if the cell’s hashes are explicitly included into the serialization; otherwise,h = 0
(whenr = 7
,h
must be1
).
- Two bytes
d1
andd2
(ifr < 7
) or one byted1
(ifr = 7
) begin the serialization ofc
. - If
h = 1
, the serialization is continued byl + 1
32-byte higher hashes ofc
: . - After that, data bytes are serialized, by splitting
b
data bits into 8-bit groups and interpreting each group as a big-endian integer in the range0 ... 255
. Ifb
is not divisible by8
, then the data bits are first augmented by one binary1
and up to six binary0
, so as to make the number of data bits divisible by eight. - Finally,
r
cell references to cells are encoded by means ofr
s-bit big-endian integers .
- Two descriptor bytes
- Concatenate the representations of cells
ci
thus obtained in the increasing order ofi
. - Optionally, an index can be constructed that consists of
n + 1
t-bit integer entriesL1, ..., Ln
, whereLi
is the total length (in bytes) of the representations of cellscj
withj ≤ i
, and integert ≥ 0
is chosen so thatLn ≤ 2^t
. If the index is included, any cellci
the serialized bag of cells may be easily accessed by its indexi
without deserializing all other cells, or even without loading the entire serialized bag of cells in memory. - The serialization of the bag of cells now consists of a magic number
indicating the precise format of the serialization, followed by integers
s ≥ 0
,t ≥ 0
,n ≤ 2^s
, an optional index consisting of bytes, andLn
bytes with the cell representations. - An optional
CRC32C
may be appended to the serialization for integrity verification purposes.
A classification of serialization schemes for bags of cells
Each TL-B scheme for a bag of cells must specify the following parameters.- The 4-byte magic number (name of TL-B constructor) prepended to the serialization.
- The number of bits
s
used to represent cell indices. Usuallys
is a multiple of eight. - The number of bits
t
used to represent offsets of cell serializations. Usuallyt
is also a multiple of eight. - A flag indicating whether an index with offsets
L1, ..., Ln
of cell serializations is present. This flag may be combined witht
by settingt = 0
when the index is absent. - A flag indicating whether the
CRC32
of the whole serialization is appended to it for integrity verification purposes. - The total number of cells
n
present in the serialization. - The number of root cells
k ≤ n
present in the serialization. The root cells themselves are . All other cells present in the bag of cells are expected to be reachable by chains of references starting from the root cells. - The number of absent cells
l ≤ n − k
, which represent cells that are actually absent from this bag of cells, but are referred to from it. The absent cells themselves are represented by , and only these cells may (and also must) haver = 7
. Complete bags of cells havel = 0
. - The total length in bytes
Ln
of the serialization of all cells. If the index is present,Ln
might not be stored explicitly since it can be recovered as the last entry of the index.
A TL-B scheme
Only one serialization scheme of BoCs is used in TON Blockchain:n
, roots
is k
, absent
is l
, and tot_cells_size
is Ln
(the total
size of the serialization of all cells in bytes). If an index is present, parameters
s/8
and t/8
are serialized separately as size
and off_bytes
, respectively,
and the flag has_idx
is set. The index itself is contained in index
, present
only if has_idx
is set. The field root_list
contains the (zero-based) indices
of the root nodes of the bag of cells.
There are also two outdated BoC serialization schemes in the same file.