automerge/automerge-wasm/README.md
2022-04-08 18:34:04 -04:00

494 lines
20 KiB
Markdown

## Automerge WASM Low Level Interface
This package is a low level interface to the [automerge rust](https://github.com/automerge/automerge-rs/tree/experiment) CRDT. The api is intended to be as "close to the metal" as possible with only a few ease of use accommodations. This library is used as the underpinnings for the [Automerge JS wrapper](https://github.com/automerge/automerge-rs/tree/experiment/automerge-js) and can be used as is or as a basis for another higher level expression of a CRDT.
All example code can be found in `test/readme.ts`
### Why CRDT?
CRDT stands for Conflict Free Replicated Data Type. It is a data structure that offers eventual consistency where multiple actors can write to the document independently and then these edits can be automatically merged together into a coherent document that, as much as possible, preserves the intent of the different writers. This allows for novel masterless application design where different components need not have a central coordinating server when altering application state.
### Terminology
The term Actor, Object Id and Heads are used through this documentation. Detailed explanations are in the glossary at the end of this readme. But the most basic definition would be...
An Actor is a unique id that distinguishes a single writer to a document. It can be any hex string.
An Object id uniquely identifies a Map, List or Text object within a document. It can be treated as an opaque string and can be used across documents. This id comes as a string in the form of `{number}@{actor}` - so `"10@aabbcc"` for example. The string `"_root"` or `"/"` can also be used to refer to the document root. These strings are durable and can be used on any descendant or copy of the document that generated them.
Heads refers to a set of hashes that uniquely identifies a point in time in a document's history. Heads are useful for comparing documents state or retrieving past states from the document.
### Using the Library and Creating a Document
This is a rust/wasm package and will work in a node or web environment. Node is able to load wasm synchronously but a web environment is not. The default import of the package is a function that returns a promise that resolves once the wasm is loaded.
This creates a document in node. The memory allocated is handled by wasm and isn't managed by the javascript garbage collector and thus needs to be manually freed.
```javascript
import { create } from "automerge-wasm"
let doc = create()
doc.free()
```
While this will work in both node and in a web context
```javascript
import init, { create } from "automerge-wasm"
init().then(_ => {
let doc = create()
doc.free()
})
```
The examples below will assume a node context for brevity.
### Automerge Scalar Types
Automerge has many scalar types. Methods like `put()` and `insert()` take an optional data type parameter. Normally the type can be inferred but in some cases, such as telling the difference between int, uint and a counter, it cannot.
These are puts without a data type
```javascript
import { create } from "automerge-wasm"
let doc = create()
doc.put("/", "prop1", 100) // int
doc.put("/", "prop2", 3.14) // f64
doc.put("/", "prop3", "hello world")
doc.put("/", "prop4", new Date())
doc.put("/", "prop5", new Uint8Array([1,2,3]))
doc.put("/", "prop6", true)
doc.put("/", "prop7", null)
doc.free()
```
Put's with a data type and examples of all the supported data types.
While int vs uint vs f64 matters little in javascript, Automerge is a cross platform library where these distinctions matter.
```javascript
import { create } from "automerge-wasm"
let doc = create()
doc.put("/", "prop1", 100, "int")
doc.put("/", "prop2", 100, "uint")
doc.put("/", "prop3", 100.5, "f64")
doc.put("/", "prop4", 100, "counter")
doc.put("/", "prop5", 1647531707301, "timestamp")
doc.put("/", "prop6", new Date(), "timestamp")
doc.put("/", "prop7", "hello world", "str")
doc.put("/", "prop8", new Uint8Array([1,2,3]), "bytes")
doc.put("/", "prop9", true, "boolean")
doc.put("/", "prop10", null, "null")
doc.free()
```
### Automerge Object Types
Automerge WASM supports 3 object types. Maps, lists, and text. Maps are key value stores where the values can be any scalar type or any object type. Lists are numerically indexed sets of data that can hold any scalar or any object type.
```javascript
import { create } from "automerge-wasm"
let doc = create()
// you can create an object by passing in the inital state - if blank pass in `{}`
// the return value is the Object Id
// these functions all return an object id
let config = doc.putObject("/", "config", { align: "left", archived: false, cycles: [10, 19, 21] })
let token = doc.putObject("/", "tokens", {})
// lists can be made with javascript arrays
let birds = doc.putObject("/", "birds", ["bluejay", "penguin", "puffin"])
let bots = doc.putObject("/", "bots", [])
// text is initialized with a string
let notes = doc.putObject("/", "notes", "Hello world!")
doc.free()
```
You can access objects by passing the object id as the first parameter for a call.
```javascript
import { create } from "automerge-wasm"
let doc = create()
let config = doc.putObject("/", "config", { align: "left", archived: false, cycles: [10, 19, 21] })
doc.put(config, "align", "right")
// Anywhere Object Ids are being used a path can also be used.
// The following two statements are equivalent:
// get the id then use it
let id = doc.get("/", "config")
if (id && id[0] === 'map') {
doc.put(id[1], "align", "right")
}
// use a path instead
doc.put("/config", "align", "right")
doc.free()
```
Using the id directly is always faster (as it prevents the path to id conversion internally) so it is preferred for performance critical code.
### Maps
Maps are key/value stores. The root object is always a map. The keys are always strings. The values can be any scalar type or any object.
```javascript
let doc = create()
let mymap = doc.putObject("_root", "mymap", { foo: "bar"})
// make a new map with the foo key
doc.put(mymap, "bytes", new Uint8Array([1,2,3]))
// assign a byte array to key `bytes` of the mymap object
let submap = doc.putObject(mymap, "sub", {})
// make a new empty object and assign it to the key `sub` of mymap
doc.keys(mymap) // returns ["bytes","foo","sub"]
doc.materialize("_root") // returns { mymap: { bytes: new Uint8Array([1,2,3]), foo: "bar", sub: {}}}
doc.free()
```
### Lists
Lists are index addressable sets of values. These values can be any scalar or object type. You can manipulate lists with `insert()`, `put()`, `insertObject()`, `pubObject()`, `push()`, `pushObject()`, `splice()`, and `delete()`.
```javascript
let doc = create()
let items = doc.putObject("_root", "items", [10,"box"])
// init a new list with two elements
doc.push(items, true) // push `true` to the end of the list
doc.putObject(items, 0, { hello: "world" }) // overwrite the value 10 with an object with a key and value
doc.delete(items, 1) // delete "box"
doc.splice(items, 2, 0, ["bag", "brick"]) // splice in "bag" and "brick" at position 2
doc.insert(items, 0, "bat") // insert "bat" to the beginning of the list
doc.insertObject(items, 1, [1,2]) // insert a list with 2 values at pos 1
doc.materialize(items) // returns [ "bat", [1,2], { hello : "world" }, true, "bag", "brick"]
doc.length(items) // returns 6
doc.free()
```
### Text
Text is a specialized list type intended for modifying a text document. The primary way to interact with a text document is via the `splice()` method. Spliced strings will be indexable by character (important to note for platforms that index by graphmeme cluster). Non text can be inserted into a text document and will be represented with the unicode object replacement character.
```javascript
let doc = create("aaaaaa")
let notes = doc.putObject("_root", "notes", "Hello world")
doc.splice(notes, 6, 5, "everyone")
doc.text(notes) // returns "Hello everyone"
let obj = doc.insertObject(notes, 6, { hi: "there" })
doc.text(notes) // returns "Hello \ufffceveryone"
doc.get(notes, 6) // returns ["map", obj]
doc.get(obj, "hi") // returns ["str", "there"]
doc.free()
```
### Tables
Automerge's Table type is currently not implemented.
### Querying Data
When querying maps use the `get()` method with the object in question and the property to query. This method returns a tuple with the data type and the data. The `keys()` method will return all the keys on the object. If you are interested in conflicted values from a merge use `getAll()` instead which returns an array of values instead of just the winner.
```javascript
let doc1 = create("aabbcc")
doc1.put("_root", "key1", "val1")
let key2 = doc1.putObject("_root", "key2", [])
doc1.get("_root", "key1") // returns ["str", "val1"]
doc1.get("_root", "key2") // returns ["list", "2@aabbcc"]
doc1.keys("_root") // returns ["key1", "key2"]
let doc2 = doc1.fork("ffaaff")
// put a value concurrently
doc1.put("_root","key3","doc1val")
doc2.put("_root","key3","doc2val")
doc1.merge(doc2)
doc1.get("_root","key3") // returns ["str", "doc2val"]
doc1.getAll("_root","key3") // returns [[ "str", "doc1val"], ["str", "doc2val"]]
doc1.free(); doc2.free()
```
### Counters
Counters are 64 bit ints that support the increment operation. Frequently different actors will want to increment or decrement a number and have all these coalesse into a merged value.
```javascript
let doc1 = create("aaaaaa")
doc1.put("_root", "number", 0)
doc1.put("_root", "total", 0, "counter")
let doc2 = doc1.fork("bbbbbb")
doc2.put("_root", "number", 10)
doc2.increment("_root", "total", 11)
doc1.put("_root", "number", 20)
doc1.increment("_root", "total", 22)
doc1.merge(doc2)
doc1.materialize("_root") // returns { number: 10, total: 33 }
doc1.free(); doc2.free()
```
### Transactions
Generally speaking you don't need to think about transactions when using Automerge. Normal edits queue up into an in-progress transaction. You can query the number of ops in the current transaction with `pendingOps()`. The transaction will commit automatically on certains calls such as `save()`, `saveIncremental()`, `fork()`, `merge()`, `getHeads()`, `applyChanges()`, `generateSyncMessage()`, and `receiveSyncMessage()`. When the transaction commits the heads of the document change. If you want to roll back all the in progress ops you can call `doc.rollback()`. If you want to manually commit a transaction in progress you can call `doc.commit()` with an optional commit message and timestamp.
```javascript
let doc = create()
doc.put("_root", "key", "val1")
doc.get("_root", "key") // returns ["str","val1"]
doc.pendingOps() // returns 1
doc.rollback()
doc.get("_root", "key") // returns null
doc.pendingOps() // returns 0
doc.put("_root", "key", "val2")
doc.pendingOps() // returns 1
doc.commit("test commit 1")
doc.get("_root", "key") // returns ["str","val2"]
doc.pendingOps() // returns 0
doc.free()
```
### Viewing Old Versions of the Document
All query functions can take an optional argument of `heads` which allow you to query a prior document state. Heads are a set of change hashes that uniquely identify a point in the document history. The `getHeads()` method can retrieve these at any point.
```javascript
let doc = create()
doc.put("_root", "key", "val1")
let heads1 = doc.getHeads()
doc.put("_root", "key", "val2")
let heads2 = doc.getHeads()
doc.put("_root", "key", "val3")
doc.get("_root","key") // returns ["str","val3"]
doc.get("_root","key",heads2) // returns ["str","val2"]
doc.get("_root","key",heads1) // returns ["str","val1"]
doc.get("_root","key",[]) // returns null
doc.free()
```
This works for `get()`, `getAll()`, `keys()`, `length()`, `text()`, and `materialize()`
Queries of old document states are not indexed internally and will be slower than normal access. If you need a fast indexed version of a document at a previous point in time you can create one with `doc.forkAt(heads, actor?)`
### Forking and Merging
You can `fork()` a document which makes an exact copy of it. This assigns a new actor so changes made to the fork can be merged back in with the original. The `forkAt()` takes a Heads, allowing you to fork off a document from a previous point in its history. These documents allocate new memory in WASM and need to be freed.
The `merge()` command applies all changes in the argument doc into the calling doc. Therefore if doc a has 1000 changes that doc b lacks and doc b has only 10 changes that doc a lacks, `a.merge(b)` will be much faster than `b.merge(a)`.
```javascript
let doc1 = create()
doc1.put("_root", "key1", "val1")
let doc2 = doc1.fork()
doc1.put("_root", "key2", "val2")
doc2.put("_root", "key3", "val3")
doc1.merge(doc2)
doc1.materialize("_root") // returns { key1: "val1", key2: "val2", key3: "val3" }
doc2.materialize("_root") // returns { key1: "val1", key3: "val3" }
doc1.free(); doc2.free()
```
Note that calling `a.merge(a)` will produce an unrecoverable error from the wasm-bindgen layer which (as of this writing) there is no workaround for.
### Saving and Loading
Calling `save()` converts the document to a compressed `Uint8Array()` that can be saved to durable storage. This format uses a columnar storage format that compresses away most of the Automerge metadata needed to manage the CRDT state, but does include all of the change history.
If you wish to incrementally update a saved Automerge doc you can call `saveIncremental()` to get a `Uint8Array()` of bytes that can be appended to the file with all the new changes(). Note that the `saveIncremental()` bytes are not as compressed as the whole document save as each chunk has metadata information needed to parse it. It may make sense to periodically perform a new `save()` to get the smallest possible file footprint.
The `load()` function takes a `Uint8Array()` of bytes produced in this way and constitutes a new document. The `loadIncremental()` method is available if you wish to consume the result of a `saveIncremental()` with an already instanciated document.
```javascript
import { create, load } from "automerge-wasm"
let doc1 = create()
doc1.put("_root", "key1", "value1")
let save1 = doc1.save()
let doc2 = load(save1)
doc2.materialize("_root") // returns { key1: "value1" }
doc1.put("_root", "key2", "value2")
let saveIncremental = doc1.saveIncremental()
let save2 = doc1.save()
let save3 = new Uint8Array([... save1, ... saveIncremental])
// save2 has fewer bytes than save3 but contains the same ops
doc2.loadIncremental(saveIncremental)
let doc3 = load(save2)
let doc4 = load(save3)
doc1.materialize("_root") // returns { key1: "value1", key2: "value2" }
doc2.materialize("_root") // returns { key1: "value1", key2: "value2" }
doc3.materialize("_root") // returns { key1: "value1", key2: "value2" }
doc4.materialize("_root") // returns { key1: "value1", key2: "value2" }
doc1.free(); doc2.free(); doc3.free(); doc4.free()
```
One interesting feature of automerge binary saves is that they can be concatenated together in any order and can still be loaded into a coherent merged document.
```javascript
import { load } from "automerge-wasm"
import * as fs from "fs"
let file1 = fs.readFileSync("automerge_save_1");
let file2 = fs.readFileSync("automerge_save_2");
let docA = load(file1).merge(load(file2))
let docB = load(Buffer.concat([ file1, file2 ]))
assert.deepEqual(docA.materialize("/"), docB.materialize("/"))
assert.equal(docA.save(), docB.save())
```
### Syncing
When syncing a document the `generateSyncMessage()` and `receiveSyncMessage()` methods will produce and consume sync messages. A sync state object will need to be managed for the duration of the connection (created by the function `initSyncState()` and can be serialized to a Uint8Array() to preserve sync state with the `encodeSyncState()` and `decodeSyncState()` functions.
A very simple sync implementation might look like this.
```javascript
import { encodeSyncState, decodeSyncState, initSyncState } from "automerge-wasm"
let states = {}
function receiveMessageFromPeer(doc, peer_id, message) {
let syncState = states[peer_id]
doc.receiveMessage(syncState, message)
let reply = doc.generateSyncMessage(syncState)
if (reply) {
sendMessage(peer_id, reply)
}
}
function notifyPeerAboutUpdates(doc, peer_id) {
let syncState = states[peer_id]
let message = doc.generateSyncMessage(syncState)
if (message) {
sendMessage(peer_id, message)
}
}
function onDisconnect(peer_id) {
let state = states[peer_id]
if (state) {
saveSyncToStorage(peer_id, encodeSyncState(state))
}
delete states[peer_id]
}
function onConnect(peer_id) {
let state = loadSyncFromStorage(peer_id)
if (state) {
states[peer_id] = decodeSyncState(state)
} else {
states[peer_id] = initSyncState()
}
}
```
### Glossary: Actors
Some basic concepts you will need to know to better understand the api are Actors and Object Ids.
Actors are ids that need to be unique to each process writing to a document. This is normally one actor per device. Or for a web app one actor per tab per browser would be needed. It can be a uuid, or a public key, or a certificate, as your application demands. All that matters is that its bytes are unique. Actors are always expressed in this api as a hex string.
Methods that create new documents will generate random actors automatically - if you wish to supply your own it is always taken as an optional argument. This is true for the following functions.
```javascript
import { create, load } from "automerge-wasm"
let doc1 = create() // random actorid
let doc2 = create("aabbccdd")
let doc3 = doc1.fork() // random actorid
let doc4 = doc2.fork("ccdd0011")
let doc5 = load(doc3.save()) // random actorid
let doc6 = load(doc4.save(), "00aabb11")
let actor = doc1.getActor()
doc1.free(); doc2.free(); doc3.free(); doc4.free(); doc5.free(); doc6.free()
```
### Glossary: Object Id's
Object Ids uniquely identify an object within a document. They are represented as strings in the format of `{counter}@{actor}`. The root object is a special case and can be referred to as `_root`. The counter is an ever increasing integer, starting at 1, that is always one higher than the highest counter seen in the document thus far. Object Id's do not change when the object is modified but they do if it is overwritten with a new object.
```javascript
let doc = create("aabbcc")
let o1 = doc.putObject("_root", "o1", {})
let o2 = doc.putObject("_root", "o2", {})
doc.put(o1, "hello", "world")
assert.deepEqual(doc.materialize("_root"), { "o1": { hello: "world" }, "o2": {} })
assert.equal(o1, "1@aabbcc")
assert.equal(o2, "2@aabbcc")
let o1v2 = doc.putObject("_root", "o1", {})
doc.put(o1, "a", "b") // modifying an overwritten object - does nothing
doc.put(o1v2, "x", "y") // modifying the new "o1" object
assert.deepEqual(doc.materialize("_root"), { "o1": { x: "y" }, "o2": {} })
doc.free()
```