Enforcing resource invariants in Move

Wormhole
7 min readJan 20, 2023

--

Introduction

Move is the new programming language on the block, promising to enable building safer smart contracts than existing languages. It has a number of unique type system features that make it possible to statically enforce resource invariants, making it easier to reason about the correctness of smart contracts. In this post, we’ll demonstrate some of these features through an application to parsing binary data.

Parsing bytes

Smart contract runtimes typically come with a canonical serialization format which is abstracted away from users, so serialization/deserialization code is rarely written by hand. Nevertheless, certain applications (such as cross-chain messaging) do require a portable wire format, which involves manually written binary parsing code.

Binary parsing code is generally quite straightforward. For example, consider this simple struct with two u8 fields:

struct Foo {
a: u8,
b: u8,
}

A parser for this struct written in Move could look something like this:

public fun parse_foo(bytes: vector<u8>): Foo {
let a = *vector::borrow(&bytes, 0);
let b = *vector::borrow(&bytes, 1);
Foo { a, b }
}

Here, we take the first two bytes of the input vector and construct a Foo struct from them. The code looks sensible, but there are a couple of problems with this approach. First, it will accept vectors that are longer than two bytes. Length checks are easy to forget, and can lead to subtle bugs, in some cases security vulnerabilities. The fix is straightforward, just check that the length is 2 before proceeding:

public fun parse_foo(bytes: vector<u8>): Foo {
assert!(vector::length(&bytes) == 2, 0);
let foo = *vector::borrow(&bytes, 0);
let bar = *vector::borrow(&bytes, 1);
Foo { foo, bar }
}

The other problem is that this approach is that it’s not compositional. Say one of the fields wasn’t a u8, but a u64:

struct Foo {
a: u8,
b: u64,
}

Parsing the u64 field is not that straightforward, because we can only read a single byte off the vector at a time, so ideally we would write a utility function to read a u64 from a vector:

public fun parse_u64(bytes: vector<u8>): u64 {
// read 8 bytes from the vector and construct a u64
...
}

But this function is not readily usable in parse_foo, because it doesn't know it needs to skip the first byte of the input vector. We could write a parse_u64 function that takes an offset:

public fun parse_u64(bytes: &vector<u8>, offset: u64): u64 {
// read 8 bytes from the vector starting at 'offset' and construct a u64
...
}

Then use it in parse_foo:

public fun parse_foo(bytes: vector<u8>): Foo {
assert!(vector::length(&bytes) == 9, 0);
let a = *vector::borrow(&bytes, 0);
let b = parse_u64(&bytes, 1);
Foo { a, b }
}

The code is starting to look complex, even though the Foo type is quite simple. Is there a way to eliminate the bookkeeping code and enforce the length checks by construction? In the next section, we'll see how to leverage Move's type system to do just that.

A safe interface for parsing

First, we introduced a wrapper type for vectors, which we’ll call Cursor:

struct Cursor<T> {
data: vector<T>,
}

The Cursor type is parameterized by the type of the elements in the vector — this will be relevant later. The idea is to consume the vector incrementally, by mutating the underlying vector as we go. Move's standard library provides a way to retrieve the last element of a vector and shrink the vector in the process. We'll use this to implement a poke function for our cursor:

public fun poke<T>(cur: &mut Cursor<T>): T {
vector::pop_back(&mut cur.data)
}

Since the standard library only provides a way to pop the last element, we’ll need to reverse the vector when initializing the cursor:

public fun init<T>(data: vector<T>): Cursor<T> {
vector::reverse(&mut data);
Cursor { data }
}

In Move, structs can only be constructed and deconstructed in the module that defines them. The module in turn may choose to export public functions that perform the construction/deconstruction. Since init above is the only public function that creates a Cursor, we can be sure that the vector is always in the correct order.

How does this interface improve the status quo? First of all, parse_u64 can now be implemented without requiring an offset:

public fun parse_u64(cursor: &mut Cursor<u8>): u64 {
...
}

This is because the cursor is incrementally consumed, guaranteeing that the next element will be the one we want. How about the length checks? We may write parse_foo as follows:

public fun parse_foo(bytes: vector<u8>): Foo {
let cur = cursor::init(bytes);
let a = cursor::poke(&mut cur);
let b = parse_u64(&mut cur);
Foo { a, b }
}

If we try to compile this code, we’ll get an error:

57 │         let cur = cursor::init(bytes);
│ --- The local variable 'cur' still contains a value. The value does not have the 'drop' ability and must be consumed before the function returns
·
60 │ Foo { a, b }
│ ^^^^^^^^^^^^ Invalid return

The compiler is telling us that the Cursor is not being consumed, which is an error because the type has no drop ability. Type abilities are a unique feature of Move that control how values can be used. There are four abilities: drop, copy, store, and key. For the purposes of this discussion, we'll focus on the drop and copy abilities, but see the Move book for details on the other two.

Put simply, the drop ability allows the type to go out of scope without being consumed, i.e. a value of a type with the drop may be dropped without being used. The copy ability allows the value to be used multiple times. A type with both of these abilities behaves like a normal type in most mainstream languages. The power of Move's type system comes from the fact that certain types can have only one, or neither, of these abilities. Indeed, Cursor can not be dropped or copied (this property is sometimes called linearity), which is why parse_foo produces a type error: the cur value would go out of scope without being consumed.

In order to consume the cursor (without adding the drop ability), we need a function that explicitly destroys it. As discussed above, only functions defined in the same module as the Cursor type are allowed to deconstruct it. The following function does just that:

public fun destroy_empty<T>(cur: Cursor<T>) {
let Cursor { data } = cur;
vector::destroy_empty(data);
}

First we deconstruct cur, and pass the data field to the vector::destroy_empty function, which destroys the vector if it's empty and reverts otherwise.

Now we can use destroy_empty to consume the cursor in parse_foo:

public fun parse_foo(bytes: vector<u8>): Foo {
let cur = cursor::init(bytes);
let a = cursor::poke(&mut cur);
let b = parse_u64(&mut cur);
cursor::destroy_empty(cur);
Foo { a, b }
}

With this, the code now compiles. The length check is now enforced by the type system, as forgetting to call cursor::destroy_empty results in a type error. The code is also more readable and more maintainable, because there is no need for keeping track of explicit offsets anymore. To see a real-world application of this parsing library, have a look at the wormhole Aptos contracts.

Parametricity

The overall theme of this post is to show how to leverage type system features to enforce correctness. Removing the ability to drop a value is a good example of this. Another, perhaps more subtle example, is the fact that the Cursor type is parametric in the type of the elements in the vector.

It turns out that parametricity guarantees the correctness of the cursor::destroy_empty function, just by looking at its type: destroy_empty<T>(cur: Cursor<T>). This function does not return the cursor, which means that it must consume it. The cursor contains values of type T, which in general may not have the drop ability, and there's certainly no generic way to destroy values of arbitrary type T in Move. Therefore, just by inspecting the type signature, we can be sure that this function can indeed only be implemented for empty cursors. If we forget to call vector::destroy_empty, the compiler will raise an error:

public fun destroy_empty<T>(cur: Cursor<T>) {
let Cursor { data } = cur;
}
    25 │     public fun destroy_empty<T>(cur: Cursor<T>) {
│ - The type 'vector<T>' can have the ability 'drop' but the type argument 'T' does not have the required ability 'drop'
26 │ let Cursor { data } = cur;
│ ---- ^ Invalid return
│ │
│ The local variable 'data' still contains a value. The value does not have the 'drop' ability and must be consumed before the function returns

If instead Cursor contained a vector<u8>, then the following code would happily compile:

public fun destroy_empty(cur: Cursor) {
let Cursor { data } = cur;
}

because vector<u8> has the drop ability (vectors can be dropped as long as the elements can be dropped). Parametricity is a powerful concept, allowing us to reason about the behavior of functions just by looking at their type. For more examples of such parametricity properties, see Theorems for free! by Philip Wadler. (Note that this only works in languages that do not provide runtime introspection of types. Parametricity does not hold in Typescript, for example, because of the typeof primitive.)

Can Rust do this?

Move is often compared to Rust, and indeed the languages are similar in many ways. The syntax is nearly identical, and both support a notion of ownership and borrowing. Superficially, Move looks like a stripped down version of Rust with a smart contract flavor. There is a deeper difference, however. Rust’s borrow checker was designed for the purpose of ensuring memory safety, and in particular, it allows values to go out of scope without being used. Move’s type system requires explicitly opting in to this behavior via the drop type ability.

Rust’s type system is an affine type system, which means that values must be used at most once. Move’s type system is a linear type system, which means that values must be used exactly once. This is a subtle but important difference. Thanks to type abilities in Move, it is possible to opt in to affine types, or even regular types (with both copy and drop).

Conclusion

In this post, we’ve seen how to use Move’s type system to enforce application-specific invariants. We’ve seen how to use the Cursor type to parse a vector of bytes, guaranteeing that the whole vector is consumed. Move's ability to encode such invariants will hopefully raise the bar for smart contract security.

--

--

Wormhole
Wormhole

Written by Wormhole

Wormhole is the leading interoperability platform powering multichain applications and bridges at scale. Build Multichain: http://wormhole.com

No responses yet