automerge/rust/automerge-wasm/README.md
2022-12-09 23:48:07 +00:00

Automerge WASM Low Level Interface

This package is a low-level interface to the Automerge Rust CRDT. The API is intended to be as "close to the metal" as possible, with only a few ease-of-use accommodations. This library underpins the Automerge JS wrapper and can be used as is or as the basis for another higher-level expression of a CRDT.

All example code can be found in test/readme.ts

Why CRDT?

CRDT stands for Conflict Free Replicated Data Type. It is a data structure that offers eventual consistency where multiple actors can write to the document independently and then these edits can be automatically merged together into a coherent document that, as much as possible, preserves the intent of the different writers. This allows for novel masterless application design where different components need not have a central coordinating server when altering application state.

Terminology

The terms Actor, Object Id, and Heads are used throughout this documentation. Detailed explanations are in the glossary at the end of this readme, but the most basic definitions are as follows.

An Actor is a unique id that distinguishes a single writer to a document. It can be any hex string.

An Object id uniquely identifies a Map, List or Text object within a document. It can be treated as an opaque string and can be used across documents. This id comes as a string in the form of {number}@{actor} - so "10@aabbcc" for example. The string "_root" or "/" can also be used to refer to the document root. These strings are durable and can be used on any descendant or copy of the document that generated them.

Heads refers to a set of hashes that uniquely identifies a point in time in a document's history. Heads are useful for comparing document states or retrieving past states from the document.

Automerge Scalar Types

Automerge has many scalar types. Methods like put() and insert() take an optional data type parameter. Normally the type can be inferred but in some cases, such as telling the difference between int, uint and a counter, it cannot.

These are puts without a data type

  import { create } from "@automerge/automerge-wasm"

  let doc = create()
  doc.put("/", "prop1", 100)  // int
  doc.put("/", "prop2", 3.14) // f64
  doc.put("/", "prop3", "hello world")
  doc.put("/", "prop4", new Date())
  doc.put("/", "prop5", new Uint8Array([1,2,3]))
  doc.put("/", "prop6", true)
  doc.put("/", "prop7", null)

Puts with a data type, with examples of all the supported data types.

While int vs uint vs f64 matters little in javascript, Automerge is a cross platform library where these distinctions matter.

  import { create } from "@automerge/automerge-wasm"

  let doc = create()
  doc.put("/", "prop1", 100, "int")
  doc.put("/", "prop2", 100, "uint")
  doc.put("/", "prop3", 100.5, "f64")
  doc.put("/", "prop4", 100, "counter")
  doc.put("/", "prop5", 1647531707301, "timestamp")
  doc.put("/", "prop6", new Date(), "timestamp")
  doc.put("/", "prop7", "hello world", "str")
  doc.put("/", "prop8", new Uint8Array([1,2,3]), "bytes")
  doc.put("/", "prop9", true, "boolean")
  doc.put("/", "prop10", null, "null")
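The defaults used by the untyped puts above can be pictured with a small stand-alone function. This is only an illustrative sketch: `inferDatatype` is a hypothetical helper, not part of automerge-wasm, and the real inference happens inside the library. It mirrors the defaults noted in the examples and makes it visible why "uint" and "counter" always have to be requested explicitly.

```javascript
// Hypothetical helper (not part of automerge-wasm): guesses the datatype
// name an untyped put() would be expected to use, based on the examples
// in this README.
function inferDatatype(value) {
  if (value === null) return "null"
  if (typeof value === "boolean") return "boolean"
  if (typeof value === "string") return "str"
  if (typeof value === "number") {
    // Whole numbers default to "int", everything else to "f64".
    // "uint" and "counter" can never be inferred from a plain number.
    return Number.isInteger(value) ? "int" : "f64"
  }
  if (value instanceof Date) return "timestamp"
  if (value instanceof Uint8Array) return "bytes"
  return undefined // maps, lists and text are objects, not scalars
}

inferDatatype(100)   // "int"
inferDatatype(3.14)  // "f64"
```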

Automerge Object Types

Automerge WASM supports three object types: maps, lists, and text. Maps are key value stores where the values can be any scalar type or any object type. Lists are numerically indexed sets of data that can hold any scalar or any object type. Text is a specialized list type intended for modifying a text document.

  import { create } from "@automerge/automerge-wasm"

  let doc = create()

  // you can create an object by passing in the initial state - if blank pass in `{}`
  // these functions all return the Object Id of the new object

  let config = doc.putObject("/", "config", { align: "left", archived: false, cycles: [10, 19, 21] })
  let token = doc.putObject("/", "tokens", {})

  // lists can be made with javascript arrays

  let birds = doc.putObject("/", "birds", ["bluejay", "penguin", "puffin"])
  let bots = doc.putObject("/", "bots", [])

  // text is initialized with a string

  let notes = doc.putObject("/", "notes", "Hello world!")

You can access objects by passing the object id as the first parameter for a call.

  import { create } from "@automerge/automerge-wasm"

  let doc = create()

  let config = doc.putObject("/", "config", { align: "left", archived: false, cycles: [10, 19, 21] })

  doc.put(config, "align", "right")

  // Anywhere Object Ids are being used a path can also be used.
  // The following two statements are equivalent:

  // get the id then use it

  // get returns a single simple javascript value or undefined
  // getWithType returns an Array of the datatype plus basic type or null

  let id = doc.getWithType("/", "config")
  if (id && id[0] === 'map') {
    doc.put(id[1], "align", "right")
  }

  // use a path instead

  doc.put("/config", "align", "right")

Using the id directly is always faster (as it prevents the path to id conversion internally) so it is preferred for performance critical code.

Maps

Maps are key/value stores. The root object is always a map. The keys are always strings. The values can be any scalar type or any object.

    let doc = create()
    let mymap = doc.putObject("_root", "mymap", { foo: "bar"})
                              // make a new map with the foo key

    doc.put(mymap, "bytes", new Uint8Array([1,2,3]))
                              // assign a byte array to key `bytes` of the mymap object

    let submap = doc.putObject(mymap, "sub", {})
                              // make a new empty object and assign it to the key `sub` of mymap

    doc.keys(mymap)           // returns ["bytes","foo","sub"]
    doc.materialize("_root")  // returns { mymap: { bytes: new Uint8Array([1,2,3]), foo: "bar", sub: {}}}

Lists

Lists are index addressable sets of values. These values can be any scalar or object type. You can manipulate lists with insert(), put(), insertObject(), putObject(), push(), pushObject(), splice(), and delete().

    let doc = create()
    let items = doc.putObject("_root", "items", [10,"box"])
                                                  // init a new list with two elements
    doc.push(items, true)                         // push `true` to the end of the list
    doc.putObject(items, 0, { hello: "world" })   // overwrite the value 10 with an object with a key and value
    doc.delete(items, 1)                          // delete "box"
    doc.splice(items, 2, 0, ["bag", "brick"])     // splice in "bag" and "brick" at position 2
    doc.insert(items, 0, "bat")                   // insert "bat" to the beginning of the list
    doc.insertObject(items, 1, [1,2])             // insert a list with 2 values at pos 1

    doc.materialize(items)                        // returns [ "bat", [1,2], { hello : "world" }, true, "bag", "brick"]
    doc.length(items)                             // returns 6

Text

Text is a specialized list type intended for modifying a text document. The primary way to interact with a text document is via the splice() method. Spliced strings will be indexable by character (important to note for platforms that index by grapheme cluster).

    let doc = create("aaaaaa")
    let notes = doc.putObject("_root", "notes", "Hello world")
    doc.splice(notes, 6, 5, "everyone")

    doc.text(notes)      // returns "Hello everyone"

Tables

Automerge's Table type is currently not implemented.

Querying Data

When querying maps use the get() method with the object in question and the property to query. This method returns the value, or undefined if the property does not exist. To get the data type as well, use getWithType(), which returns a tuple with the data type and the data. The keys() method will return all the keys on the object. If you are interested in conflicted values from a merge, use getAll() instead, which returns an array of all the values rather than just the winner.

    let doc1 = create("aabbcc")
    doc1.put("_root", "key1", "val1")
    let key2 = doc1.putObject("_root", "key2", [])

    doc1.get("_root", "key1") // returns "val1"
    doc1.getWithType("_root", "key2") // returns ["list", "2@aabbcc"]
    doc1.keys("_root")          // returns ["key1", "key2"]

    let doc2 = doc1.fork("ffaaff")

    // put a value concurrently
    doc1.put("_root","key3","doc1val")
    doc2.put("_root","key3","doc2val")

    doc1.merge(doc2)

    doc1.get("_root","key3")   // returns "doc2val"
    doc1.getAll("_root","key3")  // returns [[ "str", "doc1val"], ["str", "doc2val"]]

Counters

Counters are 64 bit ints that support the increment operation. Frequently different actors will want to increment or decrement a number and have all these changes coalesce into a merged value.

    let doc1 = create("aaaaaa")
    doc1.put("_root", "number", 0)
    doc1.put("_root", "total", 0, "counter")

    let doc2 = doc1.fork("bbbbbb")
    doc2.put("_root", "number", 10)
    doc2.increment("_root", "total", 11)

    doc1.put("_root", "number", 20)
    doc1.increment("_root", "total", 22)

    doc1.merge(doc2)

    doc1.materialize("_root")  // returns { number: 10, total: 33 }
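The difference between `number` and `total` above can be pictured with a toy model. The sketch below is not how Automerge stores counters internally; `ToyCounter` and the op ids are hypothetical and only illustrate why concurrent increments add up, whereas concurrent puts leave a single winner.

```javascript
// Toy model (hypothetical, not Automerge internals): a counter keeps its
// starting value plus every increment it has seen, keyed by a unique op id.
class ToyCounter {
  constructor(start, increments = new Map()) {
    this.start = start
    this.increments = new Map(increments)
  }
  fork() {
    return new ToyCounter(this.start, this.increments)
  }
  increment(opId, amount) {
    this.increments.set(opId, amount)
  }
  merge(other) {
    // Union of increments: each op is counted exactly once,
    // no matter how many times the two copies are merged.
    for (const [opId, amount] of other.increments) {
      this.increments.set(opId, amount)
    }
  }
  get value() {
    let total = this.start
    for (const amount of this.increments.values()) total += amount
    return total
  }
}

const total1 = new ToyCounter(0)
const total2 = total1.fork()
total2.increment("op-from-bbbbbb", 11)
total1.increment("op-from-aaaaaa", 22)
total1.merge(total2)
total1.merge(total2) // merging a second time changes nothing
total1.value         // 33
```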

Transactions

Generally speaking you don't need to think about transactions when using Automerge. Normal edits queue up into an in-progress transaction. You can query the number of ops in the current transaction with pendingOps(). The transaction will commit automatically on certain calls such as save(), saveIncremental(), fork(), merge(), getHeads(), applyChanges(), generateSyncMessage(), and receiveSyncMessage(). When the transaction commits, the heads of the document change. If you want to roll back all the in-progress ops you can call doc.rollback(). If you want to manually commit a transaction in progress you can call doc.commit() with an optional commit message and timestamp.

    let doc = create()

    doc.put("_root", "key", "val1")

    doc.get("_root", "key")        // returns "val1"
    doc.pendingOps()                 // returns 1

    doc.rollback()

    doc.get("_root", "key")        // returns undefined
    doc.pendingOps()                 // returns 0

    doc.put("_root", "key", "val2")

    doc.pendingOps()                 // returns 1

    doc.commit("test commit 1")

    doc.get("_root", "key")        // returns "val2"
    doc.pendingOps()                 // returns 0

Viewing Old Versions of the Document

All query functions can take an optional heads argument, which allows you to query a prior document state. Heads are a set of change hashes that uniquely identify a point in the document history. The getHeads() method can retrieve these at any point.

    let doc = create()

    doc.put("_root", "key", "val1")
    let heads1 = doc.getHeads()

    doc.put("_root", "key", "val2")
    let heads2 = doc.getHeads()

    doc.put("_root", "key", "val3")

    doc.get("_root","key")          // returns "val3"
    doc.get("_root","key",heads2)   // returns "val2"
    doc.get("_root","key",heads1)   // returns "val1"
    doc.get("_root","key",[])       // returns undefined

This works for get(), getAll(), keys(), length(), text(), and materialize().

Queries of old document states are not indexed internally and will be slower than normal access. If you need a fast indexed version of a document at a previous point in time, you can create one with doc.forkAt(heads, actor?).
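Because heads identify a point in history, two documents can be checked for the same state by comparing their heads. The helper below is a minimal sketch: `sameHeads` is a hypothetical name, and it assumes heads arrive as arrays of hash strings whose order is not significant.

```javascript
// Hypothetical helper: compares two heads arrays as sets of hash strings.
function sameHeads(headsA, headsB) {
  const a = new Set(headsA)
  const b = new Set(headsB)
  if (a.size !== b.size) return false
  for (const hash of b) {
    if (!a.has(hash)) return false
  }
  return true
}

sameHeads(["abc", "def"], ["def", "abc"]) // true
sameHeads(["abc"], ["abc", "def"])        // false
```

With real documents this would be called as `sameHeads(doc1.getHeads(), doc2.getHeads())`.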

Forking and Merging

You can fork() a document, which makes an exact copy of it. The copy is assigned a new actor so that changes made to the fork can be merged back in with the original. The forkAt() method takes a set of heads, allowing you to fork off a document from a previous point in its history. These documents allocate new memory in WASM and need to be freed.

The merge() command applies all changes in the argument doc into the calling doc. Therefore if doc a has 1000 changes that doc b lacks and doc b has only 10 changes that doc a lacks, a.merge(b) will be much faster than b.merge(a).

    let doc1 = create()
    doc1.put("_root", "key1", "val1")

    let doc2 = doc1.fork()

    doc1.put("_root", "key2", "val2")
    doc2.put("_root", "key3", "val3")

    doc1.merge(doc2)

    doc1.materialize("_root")       // returns { key1: "val1", key2: "val2", key3: "val3" }
    doc2.materialize("_root")       // returns { key1: "val1", key3: "val3" }

Note that calling a.merge(a) will produce an unrecoverable error from the wasm-bindgen layer which (as of this writing) there is no workaround for.

Saving and Loading

Calling save() converts the document to a compressed Uint8Array() that can be saved to durable storage. This format uses a columnar storage format that compresses away most of the Automerge metadata needed to manage the CRDT state, but does include all of the change history.

If you wish to incrementally update a saved Automerge doc you can call saveIncremental() to get a Uint8Array() of bytes containing all the new changes, which can be appended to the file. Note that the saveIncremental() bytes are not as compressed as the whole document save, as each chunk has metadata information needed to parse it. It may make sense to periodically perform a new save() to get the smallest possible file footprint.

The load() function takes a Uint8Array() of bytes produced in this way and constructs a new document. The loadIncremental() method is available if you wish to consume the result of a saveIncremental() with an already instantiated document.

  import { create, load } from "@automerge/automerge-wasm"

  let doc1 = create()

  doc1.put("_root", "key1", "value1")

  let save1 = doc1.save()

  let doc2 = load(save1)

  doc2.materialize("_root")  // returns { key1: "value1" }

  doc1.put("_root", "key2", "value2")

  let saveIncremental = doc1.saveIncremental()

  let save2 = doc1.save()

  let save3 = new Uint8Array([... save1, ... saveIncremental])

  // save2 has fewer bytes than save3 but contains the same ops

  doc2.loadIncremental(saveIncremental)

  let doc3 = load(save2)

  let doc4 = load(save3)

  doc1.materialize("_root")  // returns { key1: "value1", key2: "value2" }
  doc2.materialize("_root")  // returns { key1: "value1", key2: "value2" }
  doc3.materialize("_root")  // returns { key1: "value1", key2: "value2" }
  doc4.materialize("_root")  // returns { key1: "value1", key2: "value2" }
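The spread-based concatenation used for `save3` above copies every byte through an intermediate array, which may become slow for large documents. A minimal alternative sketch follows; `concatChunks` is a hypothetical helper and not part of the library.

```javascript
// Hypothetical helper: joins a full save and any number of incremental
// chunks into one Uint8Array without going through a plain Array.
function concatChunks(chunks) {
  const totalLength = chunks.reduce((sum, chunk) => sum + chunk.length, 0)
  const out = new Uint8Array(totalLength)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset) // copy this chunk in at the current position
    offset += chunk.length
  }
  return out
}

concatChunks([new Uint8Array([1, 2]), new Uint8Array([3])]) // Uint8Array [1, 2, 3]
```

In the example above it would be used as `load(concatChunks([save1, saveIncremental]))`.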

One interesting feature of automerge binary saves is that they can be concatenated together in any order and can still be loaded into a coherent merged document.

  import * as assert from "assert"
  import * as fs from "fs"
  import { load } from "@automerge/automerge-wasm"

  let file1 = fs.readFileSync("automerge_save_1")
  let file2 = fs.readFileSync("automerge_save_2")

  // merge two separately loaded documents; merge() updates docA in place
  let docA = load(file1)
  docA.merge(load(file2))

  // or load the concatenated bytes directly
  let docB = load(Buffer.concat([ file1, file2 ]))

  assert.deepEqual(docA.materialize("/"), docB.materialize("/"))
  assert.deepEqual(docA.save(), docB.save())

Syncing

When syncing a document, the generateSyncMessage() and receiveSyncMessage() methods produce and consume sync messages. A sync state object must be managed for the duration of the connection. It is created by the function initSyncState() and can be serialized to a Uint8Array() with encodeSyncState() and restored with decodeSyncState() to preserve sync state across connections.

A very simple sync implementation might look like this.

  import { encodeSyncState, decodeSyncState, initSyncState } from "@automerge/automerge-wasm"

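  // sendMessage(), saveSyncToStorage() and loadSyncFromStorage() are
  // placeholders for your application's own transport and storage code

  let states = {}

  function receiveMessageFromPeer(doc, peer_id, message) {
      let syncState = states[peer_id]
      doc.receiveSyncMessage(syncState, message)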
      let reply = doc.generateSyncMessage(syncState)
      if (reply) {
          sendMessage(peer_id, reply)
      }
  }

  function notifyPeerAboutUpdates(doc, peer_id) {
      let syncState = states[peer_id]
      let message = doc.generateSyncMessage(syncState)
      if (message) {
          sendMessage(peer_id, message)
      }
  }

  function onDisconnect(peer_id) {
      let state = states[peer_id]
      if (state) {
        saveSyncToStorage(peer_id, encodeSyncState(state))
      }
      delete states[peer_id]
  }

  function onConnect(peer_id) {
      let state = loadSyncFromStorage(peer_id)
      if (state) {
        states[peer_id] = decodeSyncState(state)
      } else {
        states[peer_id] = initSyncState()
      }
  }
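The handlers above react to one message at a time. For tests or in-process use, the same exchange can be driven as a loop until both sides fall silent. The following is a sketch under stated assumptions: `syncPair` and `ToyDoc` are hypothetical names, `ToyDoc` is a toy stand-in for a document (a set of change ids) so that the example runs without the library, and the loop assumes generateSyncMessage() returns a falsy value when there is nothing to send, as the checks above suggest.

```javascript
// Drives two in-process peers until neither has anything left to send.
// Works with any objects exposing generateSyncMessage(state) and
// receiveSyncMessage(state, message).
function syncPair(docA, stateA, docB, stateB, maxRounds = 100) {
  for (let round = 0; round < maxRounds; round++) {
    const aToB = docA.generateSyncMessage(stateA)
    if (aToB) docB.receiveSyncMessage(stateB, aToB)
    const bToA = docB.generateSyncMessage(stateB)
    if (bToA) docA.receiveSyncMessage(stateA, bToA)
    if (!aToB && !bToA) return round // both sides quiet: sync has settled
  }
  throw new Error("sync did not settle within maxRounds")
}

// Toy stand-in for a document (hypothetical): a set of change ids.
// state.known tracks what the peer is believed to have already.
class ToyDoc {
  constructor(items) {
    this.items = new Set(items)
  }
  generateSyncMessage(state) {
    const unsent = [...this.items].filter((item) => !state.known.has(item))
    if (unsent.length === 0) return null
    for (const item of unsent) state.known.add(item)
    return unsent
  }
  receiveSyncMessage(state, message) {
    for (const item of message) {
      this.items.add(item)
      state.known.add(item)
    }
  }
}

const a = new ToyDoc(["a1", "a2"])
const b = new ToyDoc(["b1"])
syncPair(a, { known: new Set() }, b, { known: new Set() })
const merged = [...a.items].sort() // ["a1", "a2", "b1"]
```

With real documents the two state arguments would come from initSyncState(), one per side of the connection.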

Glossary: Actors

Some basic concepts you will need to know to better understand the api are Actors and Object Ids.

Actors are ids that need to be unique to each process writing to a document. This is normally one actor per device. Or for a web app one actor per tab per browser would be needed. It can be a uuid, or a public key, or a certificate, as your application demands. All that matters is that its bytes are unique. Actors are always expressed in this api as a hex string.

Methods that create new documents will generate random actors automatically - if you wish to supply your own it is always taken as an optional argument. This is true for the following functions.

  import { create, load } from "@automerge/automerge-wasm"

  let doc1 = create()  // random actorid
  let doc2 = create("aabbccdd")
  let doc3 = doc1.fork()  // random actorid
  let doc4 = doc2.fork("ccdd0011")
  let doc5 = load(doc3.save()) // random actorid
  let doc6 = load(doc4.save(), "00aabb11")

  let actor = doc1.getActor()
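If the actor is derived from raw bytes such as a uuid or a public key, those bytes need to be rendered as a hex string first. A minimal sketch, where `bytesToActorId` is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper: renders raw bytes (a uuid, a public key, ...) as a
// lowercase hex string, two digits per byte.
function bytesToActorId(bytes) {
  return Array.from(bytes, (byte) => byte.toString(16).padStart(2, "0")).join("")
}

bytesToActorId(new Uint8Array([0xaa, 0xbb, 0x0c])) // "aabb0c"
```

The result can then be passed wherever an actor is accepted, for example `create(bytesToActorId(keyBytes))`.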

Glossary: Object Ids

Object Ids uniquely identify an object within a document. They are represented as strings in the format {counter}@{actor}. The root object is a special case and can be referred to as _root. The counter is an ever-increasing integer, starting at 1, that is always one higher than the highest counter seen in the document thus far. Object Ids do not change when the object is modified, but they do change if it is overwritten with a new object.

  let doc = create("aabbcc")
  let o1 = doc.putObject("_root", "o1", {})
  let o2 = doc.putObject("_root", "o2", {})
  doc.put(o1, "hello", "world")

  assert.deepEqual(doc.materialize("_root"), { "o1": { hello: "world" }, "o2": {} })
  assert.equal(o1, "1@aabbcc")
  assert.equal(o2, "2@aabbcc")

  let o1v2 = doc.putObject("_root", "o1", {})

  doc.put(o1, "a", "b")    // modifying an overwritten object - does nothing
  doc.put(o1v2, "x", "y")  // modifying the new "o1" object

  assert.deepEqual(doc.materialize("_root"), { "o1": { x: "y" }, "o2": {} })
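For debugging it can help to split an Object Id into its two parts. The sketch below follows the {counter}@{actor} format described above; `parseObjectId` is a hypothetical helper, and application code can keep treating ids as opaque strings.

```javascript
// Hypothetical helper, for inspection only: splits an object id string
// into its counter and actor parts.
function parseObjectId(id) {
  if (id === "_root" || id === "/") return { root: true }
  const at = id.indexOf("@")
  if (at === -1) throw new Error(`not an object id: ${id}`)
  return {
    root: false,
    counter: Number.parseInt(id.slice(0, at), 10),
    actor: id.slice(at + 1),
  }
}

parseObjectId("2@aabbcc") // { root: false, counter: 2, actor: "aabbcc" }
```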

Appendix: Building

The following steps should allow you to build the package:

 $ rustup target add wasm32-unknown-unknown
 $ cargo install wasm-bindgen-cli
 $ cargo install wasm-opt
 $ yarn
 $ yarn release
 $ yarn pack

Appendix: WASM and Memory Allocation

Allocated memory in rust will be freed automatically on platforms that support FinalizationRegistry.

This is currently supported in all major browsers and nodejs.

On unsupported platforms you can free memory explicitly.

  import { create, initSyncState } from "@automerge/automerge-wasm"

  let doc = create()
  let sync = initSyncState()

  doc.free()
  sync.free()
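Code that has to run on both kinds of platform can check for FinalizationRegistry before freeing by hand. This is a sketch only: `freeIfUnsupported` is a hypothetical helper, and its second parameter exists so the decision can be overridden, for example in tests.

```javascript
// Hypothetical helper: frees handles by hand only where the platform
// cannot do it automatically. Returns true if it freed anything.
function freeIfUnsupported(
  handles,
  hasRegistry = typeof FinalizationRegistry !== "undefined"
) {
  if (hasRegistry) return false
  for (const handle of handles) handle.free()
  return true
}
```

With the example above it would be called as `freeIfUnsupported([doc, sync])` once both objects are no longer needed.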