This works in tandem with DataSerializer; if you change one, make sure to check the other.
Encoding of types for serialization.
Encoding of values for serialization.
This works in tandem with ConstantSerializer; if you change one, make sure to check the other.
A serializer which encodes group elements (elliptic curve points in our case) to bytes and decodes points from bytes. Every point is encoded in compressed form, so only the X coordinate and the sign of Y are stored. Thus a secp256k1 point needs 33 bytes. The first byte equals 2 or 3 depending on the sign of the Y coordinate, which for this encoding means its parity (2 if Y is even, 3 if Y is odd). The other 32 bytes contain the X coordinate. A special case is the point at infinity, which is encoded as 33 zero bytes. Thus an elliptic curve point is always encoded with 33 bytes.
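A minimal Scala sketch of this 33-byte layout, assuming secp256k1 points given as affine BigInt coordinates. CompressedPointCodec, Affine and Infinity are hypothetical names used for illustration, not the serializer's actual API. Decoding relies on p ≡ 3 (mod 4) for the secp256k1 field prime, so a square root of the curve equation's right-hand side can be computed as rhs^((p+1)/4) mod p.

```scala
// Hypothetical sketch of the 33-byte compressed point encoding described above.
// Assumes secp256k1 (y^2 = x^3 + 7 mod p); names are illustrative only.
object CompressedPointCodec {
  val P: BigInt = BigInt(2).pow(256) - BigInt(2).pow(32) - 977 // secp256k1 field prime
  val EncodedSize = 33

  sealed trait Point
  case object Infinity extends Point
  final case class Affine(x: BigInt, y: BigInt) extends Point

  /** Prefix byte (2 for even Y, 3 for odd Y) followed by 32-byte big-endian X; infinity is 33 zero bytes. */
  def encode(p: Point): Array[Byte] = p match {
    case Infinity => new Array[Byte](EncodedSize)
    case Affine(x, y) =>
      require(x >= 0 && x < P, "x must be a field element")
      val out = new Array[Byte](EncodedSize)
      out(0) = (if (y.testBit(0)) 3 else 2).toByte
      val xBytes = x.toByteArray.dropWhile(_ == 0) // drop BigInt's leading sign byte(s)
      // left-pad X so that it occupies exactly bytes 1..32
      System.arraycopy(xBytes, 0, out, EncodedSize - xBytes.length, xBytes.length)
      out
  }

  /** Inverse of encode: recovers Y from X and the parity stored in the prefix byte. */
  def decode(bytes: Array[Byte]): Point = {
    require(bytes.length == EncodedSize, s"expected $EncodedSize bytes")
    if (bytes.forall(_ == 0)) Infinity
    else {
      val prefix = bytes(0)
      require(prefix == 2 || prefix == 3, s"invalid prefix byte $prefix")
      val x = BigInt(1, bytes.slice(1, EncodedSize))
      val rhs = (x.pow(3) + 7).mod(P)
      val root = rhs.modPow((P + 1) / 4, P) // a valid square root because p % 4 == 3
      require(root.modPow(2, P) == rhs, "X is not on the curve")
      // pick the root whose parity matches the prefix (odd Y <-> prefix 3)
      val y = if (root.testBit(0) == (prefix == 3)) root else P - root
      Affine(x, y)
    }
  }
}
```

A round trip on the secp256k1 generator point, whose Y coordinate is even, yields prefix byte 2 and decodes back to the same point.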
The set of all possible IR graph nodes can be split into two subsets:
1) operations which may appear in ErgoTree (these are defined by OpCodes below);
2) operations which are not valid in ErgoTree but serve special purposes (these are defined by OpCodesExtra).
We can assume both are Byte-sized codes and store each as a single byte, as long as we can tell them apart from context (where we cannot, we should use a special encoding).
The general extended encoding is as follows:
0-255: range of OpCodes
256-511: range of OpCodesExtra
Thus, any code in the extended code range 0-511 can be saved using putUShort.
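To illustrate the extended code range, here is a minimal Scala sketch. It assumes that an OpCodesExtra byte is moved into the upper half of the range simply by adding 256 to its unsigned value; ExtendedCodes and its methods are hypothetical helpers, not part of the actual serializer.

```scala
// Hypothetical helpers for the extended code range:
// OpCodes occupy 0-255 and OpCodesExtra occupy 256-511 (assumed offset of 256).
object ExtendedCodes {
  val ExtraOffset: Int = 256

  /** Interprets a Byte-sized OpCode as an unsigned extended code in 0-255. */
  def fromOpCode(code: Byte): Short = (code & 0xFF).toShort

  /** Shifts a Byte-sized OpCodesExtra code into 256-511. */
  def fromExtraCode(code: Byte): Short = (ExtraOffset + (code & 0xFF)).toShort

  /** Splits an extended code back into (isExtra, Byte-sized code). */
  def split(ext: Short): (Boolean, Byte) = {
    require(ext >= 0 && ext < 512, s"extended code $ext is outside 0-511")
    if (ext < ExtraOffset) (false, ext.toByte)
    else (true, (ext - ExtraOffset).toByte)
  }
}
```

Because every extended code fits in 9 bits, a single unsigned short is enough to hold either kind of code without ambiguity.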
We use Byte to represent OpCodes and OpCodesExtra.
We use Short to represent any op code from the extended code range.
We use VLQ to serialize the Short values of extended codes.
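A minimal sketch of VLQ-encoding such a Short value, assuming the common unsigned variant: 7 data bits per byte, least significant group first, and the high bit set on every byte except the last (the same layout as Protocol Buffers varints). Vlq.write and Vlq.read are hypothetical stand-ins, not the real writer and reader methods.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical unsigned VLQ codec for extended op codes (assumed LEB128-style layout).
object Vlq {
  /** Appends `value` (0..65535) as VLQ: low 7-bit group first, continuation bit 0x80 on all but the last byte. */
  def write(value: Int, out: ArrayBuffer[Byte]): Unit = {
    require(value >= 0 && value <= 0xFFFF, s"$value is not an unsigned short")
    var v = value
    while (v >= 0x80) {
      out += ((v & 0x7F) | 0x80).toByte
      v = v >>> 7
    }
    out += v.toByte
  }

  /** Reads a VLQ value starting at `pos`; returns (value, position after the last byte read). */
  def read(bytes: Array[Byte], pos: Int): (Int, Int) = {
    var result = 0
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i) & 0xFF
      result |= (b & 0x7F) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    (result, i)
  }
}
```

Under this layout every extended code below 128 takes one byte and the rest of the 0-511 range takes two; for example, 300 becomes the bytes 0xAC 0x02.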
Examples:
1) For the validation rule CheckValidOpCode we use the OpCodes range, so we use the single-byte encoding.
2) For CheckCostFuncOperation we use the 1-511 range and the extended encoding (see docs).
Serialization of types according to specification in TypeSerialization.md.
Rationale for soft-forkable ErgoTree serialization. There are 2 points:
1) We can make the size bit mandatory, i.e. always save the total size of the script body (in which case we don't need the size bit in the header). This allows the deserializer to always skip the right number of bytes when any exception (including ValidationException) is thrown during deserialization, and to produce an UnparsedErgoTree. The soft-fork decision can then be made later. However, it looks like this is not necessary if we proceed as described below.
2) HeaderVersionCheck: during deserialization we can also strictly check the content of the script against the version number in the header. Thus, if the header has vS, the script is allowed to contain only instructions from versions v1 to vS. On a node vN with N > S this must also be enforced, i.e. a vN node rejects a script as invalid if the script has vS in its header and a vS+1 instruction in its body.
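The HeaderVersionCheck rule reduces to a simple predicate over the instructions in the script body. The sketch below assumes each instruction carries the protocol version that introduced it; HeaderVersionCheck and Instr are hypothetical names used only for illustration.

```scala
// Hypothetical model of HeaderVersionCheck: each instruction is tagged with
// the (assumed) protocol version that introduced it.
object HeaderVersionCheck {
  final case class Instr(name: String, sinceVersion: Int)

  private def allowed(headerVersion: Int)(i: Instr): Boolean =
    i.sinceVersion >= 1 && i.sinceVersion <= headerVersion

  /** True iff every instruction belongs to versions v1..vS, where vS is the header version. */
  def passes(headerVersion: Int, body: Seq[Instr]): Boolean =
    body.forall(allowed(headerVersion))

  /** Instructions that make the script invalid, even on a node vN with N > S. */
  def offending(headerVersion: Int, body: Seq[Instr]): Seq[Instr] =
    body.filterNot(allowed(headerVersion))
}
```

The check depends only on the header version vS, not on the node version vN, which is what lets all nodes at or above vS agree on the verdict.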
Keeping this in mind, suppose we have a vN node and a script with vS in its header. During script deserialization:
1) If vN >= vS, the node knows all the instructions and should check that only instructions up to vS are used in the script. It either parses the script successfully or throws MalformedScriptException. If some unknown instruction is encountered during the process (i.e. ValidationException is thrown), this cannot be a soft-fork, because vN >= vS guarantees that all instructions are known; thus the script is malformed.
2) If vN < vS, the vN node should expect unknown instructions:
a) if the script is parsed successfully, only the vN subset of the language is used and the script is accepted for execution;
b) else, if ValidationException is thrown, an UnparsedErgoTree is created, delaying the soft-fork decision until stateful validation. If bodySize is stored, the script body is skipped and deserialization of the whole TX continues; otherwise the body cannot be skipped, and the whole TX is rejected (CannotSkipScriptException);
c) else, if some other exception is thrown, the whole TX is rejected due to that exception.
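The deserialization rules above can be summarized as a decision function. This is a hypothetical model: the parse outcomes and results are plain ADTs standing in for successful parsing, ValidationException, other exceptions, Right(tree) and Left(UnparsedErgoTree), whereas the real code throws and catches exceptions.

```scala
// Hypothetical model of the deserialization rules; types are illustrative
// stand-ins, not the actual sigmastate API.
object SoftForkDeser {
  sealed trait ParseOutcome
  case object Parsed extends ParseOutcome                // body parsed successfully
  case object UnknownInstruction extends ParseOutcome    // ValidationException
  final case class OtherError(msg: String) extends ParseOutcome

  sealed trait Result
  case object TreeParsed extends Result                  // Right(tree)
  case object Unparsed extends Result                    // Left(UnparsedErgoTree), decided later
  final case class Rejected(exception: String) extends Result

  def decide(vN: Int, vS: Int, outcome: ParseOutcome, bodySizeStored: Boolean): Result =
    if (vN >= vS) outcome match {
      case Parsed => TreeParsed
      // vN >= vS means every instruction is known, so any failure is a malformed script
      case _ => Rejected("MalformedScriptException")
    }
    else outcome match {
      case Parsed => TreeParsed // only the vN subset of the language was used
      case UnknownInstruction =>
        if (bodySizeStored) Unparsed // skip the body and keep deserializing the TX
        else Rejected("CannotSkipScriptException") // the whole TX is rejected
      case OtherError(msg) => Rejected(msg) // the whole TX is rejected with this error
    }
}
```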
In the stateful context:
1) If vN >= vS, we can execute the script, but we do an additional check: if vS > vP (the current version of the protocol), the script is rejected as invalid because its version exceeds the current consensus version of the protocol; otherwise the script can be executed.
2) If vN < vS: if we have Right(tree), the script is executed; if we have Left(UnparsedErgoTree()), we check the soft-fork conditions and either execute the script or throw.
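A hypothetical sketch of the stateful rules in the same style. Here softForkAccepted is a placeholder for the soft-fork check, whose details are not described in this section, and Tree/UnparsedTree stand in for the parsed ErgoTree and UnparsedErgoTree.

```scala
// Hypothetical model of stateful validation; names are illustrative only.
object StatefulCheck {
  sealed trait Verdict
  case object Execute extends Verdict
  final case class Reject(reason: String) extends Verdict

  final case class UnparsedTree(bytes: Vector[Byte]) // stand-in for UnparsedErgoTree
  final case class Tree(description: String)         // stand-in for a parsed ErgoTree

  def decide(vN: Int, vS: Int, vP: Int,
             tree: Either[UnparsedTree, Tree],
             softForkAccepted: => Boolean): Verdict =
    if (vN >= vS) {
      // the node knows every instruction, but the protocol may not have reached vS yet
      if (vS > vP) Reject(s"script version v$vS exceeds protocol version v$vP")
      else Execute
    } else tree match {
      case Right(_) => Execute
      case Left(_)  => if (softForkAccepted) Execute else Reject("soft-fork check failed")
    }
}
```

Taking softForkAccepted by name means the (potentially expensive) soft-fork check is evaluated only for unparsed scripts.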
Proposition: CannotSkipScriptException can only happen on fewer than 10% of the nodes, which is safe for consensus.
Proof. It follows from the fact that vN >= vS nodes will reject the script until vP is upgraded to vS, which means the majority has upgraded to at least vS. Thus, before vP is upgraded to vS, the majority rejects the script (either because they cannot parse it, or because vP has not yet been updated); after that, the majority accepts it (old nodes still reject it, but they are fewer than 10%). End of proof.