With the `OpObserver` moving to the transaction rather than being passed
in to `Transaction::commit`, we needed to add a way to get the observer
back out of the transaction (via `Transaction::observer` and
`AutoCommit::observer`). This observer is then used to handle patch
generation logic. However, there are cases where we might not want an
`OpObserver`, and in those cases we can take faster code paths - so we
need something like an `Option<OpObserver>`. To track the presence or
absence of the observer at the type level, introduce
`automerge::transaction::observation`, which is a type level `Option`.
This allows us to efficiently choose the right code paths whilst
maintaining correct types for `Transaction::observer` and
`AutoCommit::observer`.
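Roughly, the pattern looks like this (a minimal sketch of the type
level `Option` idea; the names and bodies are illustrative, not the
real module):

```rust
// `Observed` is the `Some` case, `UnObserved` the `None` case, and the
// transaction is generic over which one it carries, so the unobserved
// code paths compile down to no-ops.
trait Observation {
    // What `Transaction::observer` hands back: the observer itself
    // when observed, `()` when not.
    type Obs;
}

struct Observed<O>(O);
struct UnObserved;

impl<O> Observation for Observed<O> {
    type Obs = O;
}

impl Observation for UnObserved {
    type Obs = ();
}

struct Transaction<Obs: Observation> {
    observation: Obs,
    // ... the rest of the transaction state
}
```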
For the path in a patch to be accurate it needs to be calculated at the
moment the op is inserted, not at commit. This is because the path may
contain list indexes in parent objects which later inserts and deletes
in the transaction could change. The primary change was adding
`op_observer` to the transaction object and removing it from the commit
options. The beginnings of a wasm level `applyPatch` system are laid
out here.
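A small example of why commit time is too late (a hedged sketch against
the `AutoCommit` API; the `items` list is hypothetical):

```rust
use automerge::{transaction::Transactable, AutoCommit, ObjType, ROOT};

fn main() -> Result<(), automerge::AutomergeError> {
    let mut doc = AutoCommit::new();
    let items = doc.put_object(ROOT, "items", ObjType::List)?;
    doc.insert(&items, 0, "b")?; // the patch for this op has path items[0]
    doc.insert(&items, 0, "a")?; // "b" now lives at items[1]
    // If paths were computed at commit, the first patch would report
    // the wrong index for "b".
    Ok(())
}
```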
The logic in `query::Prop` works by first doing a binary search in the
OpTree for the node where the key we are looking for starts, and then
proceeding from this point forwards skipping over nodes which contain
only invisible ops. This logic was incorrect if the start index
returned by the binary search was in the last child of the optree and
that child contained only invisible ops. In this case the index
returned by the query would be greater than the length of the optree.
Clamp the index returned by the query to the total length of the opset.
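The shape of the fix (a hedged sketch, not the actual query code):

```rust
// Whatever index the binary search plus skip-forward pass produces, it
// must never exceed the number of ops actually in the tree.
fn clamp_query_index(found: usize, opset_len: usize) -> usize {
    found.min(opset_len)
}
```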
Sync messages encode changes as length prefixed byte arrays. We were
calculating the length using the uncompressed bytes of a change but
then writing the (possibly) compressed bytes of the change. This meant
that if a change was large enough to be compressed then it would fail
to decode. Switch to always using the uncompressed bytes in sync
messages.
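The invariant, sketched (hypothetical helper, assuming the `leb128`
crate for the length prefix):

```rust
// The length prefix must describe exactly the bytes that follow it, so
// both must come from the same (uncompressed) representation.
fn encode_change(out: &mut Vec<u8>, change_bytes: &[u8]) {
    leb128::write::unsigned(out, change_bytes.len() as u64).unwrap();
    out.extend_from_slice(change_bytes);
}
```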
The logic for loading compressed document chunks has a check that the
`max_op` of a change is valid. This check was overly strict in that it
required the max op to be strictly larger than the max op of the
previous change - this rejects valid documents which contain changes
with no ops in them, in which case the max op can be equal to the max
op of the previous change. Loosen the logic to allow empty changes.
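The loosened check, sketched (names hypothetical):

```rust
// An empty change adds no ops, so its max_op may legitimately equal
// the previous change's max_op; only a decrease is invalid.
fn max_op_valid(max_op: u64, prev_max_op: u64) -> bool {
    max_op >= prev_max_op // previously: max_op > prev_max_op
}
```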
The logic for reconstructing changes from the compressed document format
records operations which set a key in an object so that it can later
reconstruct delete operations from the successor list of the document
format operations. The logic to do this was only recording set
operations and not `make*` operations. This meant that delete operations
targeting `make*` operations could not be loaded correctly.
Correctly record `make*` operations for later use in constructing delete
operations.
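The idea, sketched with a hypothetical action type:

```rust
enum Action {
    Set,
    MakeMap,
    MakeList,
    MakeText,
    MakeTable,
    Delete,
    Increment,
}

// Every op that creates a value under a key must be recorded, because
// a later delete is expressed purely as a successor of that op. The
// bug was matching only `Set` here.
fn creates_value(action: &Action) -> bool {
    matches!(
        action,
        Action::Set
            | Action::MakeMap
            | Action::MakeList
            | Action::MakeText
            | Action::MakeTable
    )
}
```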
Occasionally one needs to debug problems in a document with a large
number of objects. In this case it is unhelpful to print a graphviz of
the whole opset because there are too many objects. Add a
`Option<Vec<ObjId>>` argument to `OpSet::visualise` to filter the
objects which are visualised.
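Hedged usage sketch (internal API, fragment only):

```rust
// Only the listed objects appear in the generated graphviz.
let dot = opset.visualise(Some(vec![my_obj_id]));
// Passing `None` visualises the whole opset, as before.
let dot_all = opset.visualise(None);
```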
The compressed document format includes, at the end of the document
chunk, the indices of the heads of the document. Older versions of the
javascript implementation do not include these indices so we allow them
to be omitted when decoding.
Whilst we're here, add some `tracing::trace` logs to make it easier to
understand where parsing is failing.
The latest clippy (0.1.65 for me) added a lint which checks for types
that implement `PartialEq` and could implement `Eq`
(`derive_partial_eq_without_eq`). Add a `derive(Eq)` in a bunch of
places to satisfy this lint.
For some use cases the overhead of compressed columns in the document
format is not worth it. Add `Automerge::save_nocompress` to save
without compressing columns.
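Hedged usage sketch:

```rust
fn roundtrip(
    doc: &mut automerge::Automerge,
) -> Result<automerge::Automerge, automerge::AutomergeError> {
    // Same chunk layout as `save`, but column data is left
    // uncompressed; `load` accepts either form.
    let bytes = doc.save_nocompress();
    automerge::Automerge::load(&bytes)
}
```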
Signed-off-by: Alex Good <alex@memoryandthought.me>
This is achieved by liberal use of feature flags. The main additions
are:
* Build the OpSet more efficiently when loading from compressed
  document storage using a `DocObserver`, as implemented in
  `automerge::op_tree::load`
* Reimplement the parsing logic in the various types in
  `automerge::sync`
There are numerous other small changes required to get the types to
line up.
Signed-off-by: Alex Good <alex@memoryandthought.me>
It is useful to be able to serialize an automerge document directly via
`serde`. We can do this without an intermediate type by iterating over
the keys of the document recursively. Add `autoserde::AutoSerde` to
implement this.
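Hedged usage sketch (assuming `AutoSerde` borrows the document and
implements `serde::Serialize`):

```rust
fn to_json(doc: &automerge::Automerge) -> serde_json::Result<String> {
    // `AutoSerde` walks the document recursively, so any serde
    // serializer works; JSON shown here.
    serde_json::to_string(&automerge::AutoSerde::from(doc))
}
```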
Signed-off-by: Alex Good <alex@memoryandthought.me>
Implement parsing the binary format using the new parser library and
the new encoding types. This is superior to the previous parsing
implementation in that invalid data should never cause panics, and it
exposes an interface to construct an OpSet from a saved document much
more efficiently.
Signed-off-by: Alex Good <alex@memoryandthought.me>
The representation of changes in storage-v2 is different to the
existing representation, so add accessor methods for the fields of
`Change` and make all accesses go through them. This allows the change
representation in storage-v2 to be a drop-in replacement.
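The accessor pattern, sketched (a hedged sketch; accessor names
illustrative):

```rust
fn summarize(change: &automerge::Change) -> (u64, usize) {
    // Callers no longer touch fields directly, so a differently laid
    // out `Change` can back the same methods.
    (change.seq(), change.deps().len())
}
```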
Signed-off-by: Alex Good <alex@memoryandthought.me>
The existing implementation of the columnar format elides a lot of error
handling (by converting `Err` to `None`) and doesn't allow writing to a
single chunk of memory when encoding. Implement a new set of encoding and
decoding primitives which handle errors more robustly and allow us to
use a single chunk of memory when reading and writing.
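A hypothetical flavour of the new primitives:

```rust
// Decoders return `Result` instead of silently yielding `None`, and
// encoders append to one caller-supplied buffer rather than allocating
// their own.
trait Encodable {
    fn encode(&self, out: &mut Vec<u8>) -> usize; // bytes written
}

trait Decodable: Sized {
    type Error;
    // Returns the remaining input alongside the decoded value.
    fn decode(input: &[u8]) -> Result<(&[u8], Self), Self::Error>;
}
```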
Signed-off-by: Alex Good <alex@memoryandthought.me>
Op IDs in the OpSet are represented using an index into a set of actor
IDs. This is efficient but requires conversion when reading from and
writing to storage (where the set of actors might be different from
those in the OpSet). Add a trait for converting between different
representations of an OpID.
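The idea, sketched with hypothetical types:

```rust
struct OpId {
    counter: u64,
    actor: usize, // index into some actor table
}

// The same logical op ID can be keyed by two different actor tables;
// implementations of this trait map between them.
trait ConvertOpId {
    fn convert(&self, id: &OpId) -> OpId;
}
```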
Signed-off-by: Alex Good <alex@memoryandthought.me>
We have parsing needs which are slightly more complex than just reading
stuff from a buffer, but not complex enough to justify a dependency on a
parsing library. Implement a simple parser combinator library for use in
parsing the binary storage format.
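A minimal flavour of such a library (hypothetical, not the real
module):

```rust
// A parser consumes a prefix of its input and returns the rest plus a
// value; combinators build new parsers out of existing ones.
type ParseResult<'a, T, E> = Result<(&'a [u8], T), E>;

fn take_1(input: &[u8]) -> ParseResult<'_, u8, ()> {
    match input.split_first() {
        Some((byte, rest)) => Ok((rest, *byte)),
        None => Err(()),
    }
}

// `map` transforms the output of a parser without touching the input
// handling.
fn map<'a, T, U, E>(
    parser: impl Fn(&'a [u8]) -> ParseResult<'a, T, E>,
    f: impl Fn(T) -> U,
) -> impl Fn(&'a [u8]) -> ParseResult<'a, U, E> {
    move |input| parser(input).map(|(rest, value)| (rest, f(value)))
}
```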
Signed-off-by: Alex Good <alex@memoryandthought.me>
The columnar storage format allows for values whose type we do not
know. In order to handle these values in a forward compatible way, add
`ScalarValue::Unknown`.
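The variant, sketched (the shape follows the description; treat it as
illustrative):

```rust
enum ScalarValue {
    // ... the existing variants (Str, Int, Uint, F64, Boolean, ...)
    // An unrecognised column value: keep the raw bytes and the type
    // code so they can be written back out unchanged.
    Unknown { type_code: u8, bytes: Vec<u8> },
}
```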
Signed-off-by: Alex Good <alex@memoryandthought.me>
Expose `automerge::Automerge::get_change_by_hash()` as
`AMgetChangeByHash()`.
Add the `AM_CHANGE_HASH_SIZE` macro constant for
`AMgetChangeByHash()`.
Replace the literal `32` with the `automerge::types::HASH_SIZE` constant.
Expose `automerge::AutoCommit::splice()` as `AMsplice()`.
Add the `automerge::error::AutomergeError::InvalidValueType` variant for
`AMsplice()`.
Add push functionality to `AMspliceText()`.
Fix some documentation content bugs.
Fix some documentation formatting bugs.
The ordering of opids in the successors and predecessors of an op is
relevant when encoding, because inconsistent ordering changes the hash
graph. This means we must maintain the invariant that opids are encoded
in ascending lamport order. We have been maintaining this invariant in
the encoding implementation - however, this is not ideal because it
requires allocating for every op in the change when we commit a
transaction.
Add `types::OpIds` and use it in place of `Vec<OpId>` for `Op::succ` and
`Op::pred`. `OpIds` maintains the invariant that the IDs it contains
must be ordered with respect to some comparator function - which is
always `OpSetMetadata::lamport_cmp`. Remove the sorting of opids in
SuccEncoder::append.
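A hedged sketch of the invariant-by-construction idea:

```rust
use std::cmp::Ordering;

struct OpId(u64, usize); // (counter, actor index) - illustrative

struct OpIds(Vec<OpId>);

impl OpIds {
    // The only way to build an `OpIds` is through a comparator (in
    // practice `OpSetMetadata::lamport_cmp`), so anything holding one
    // is already in encoding order and `SuccEncoder::append` no longer
    // needs to sort.
    fn new<I, F>(ids: I, cmp: F) -> Self
    where
        I: Iterator<Item = OpId>,
        F: Fn(&OpId, &OpId) -> Ordering,
    {
        let mut v: Vec<_> = ids.collect();
        v.sort_by(cmp);
        Self(v)
    }
}
```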
Makes `SuccEncoder` sort successors in Lamport clock order. Such an
ordering is expected by automerge js when loading documents; otherwise
some documents fail to load with an "operation IDs are not in ascending
order" error.
It is easy to get confused when calling `parents` with the id of a
scalar value, expecting it to first find the object containing that
value, but this is not implemented. Finding the parent object of a
scalar id would mean searching every object for the OpId, which may get
too expensive when lots of objects are around. This may be reconsidered
later, but the result is still useful for distinguishing an id that
doesn't exist in the document from one that has no parents.
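Hedged sketch of the behaviour described above (assuming the public
`parents` API and that `automerge::ObjId` names the exported id type):

```rust
fn has_parents(doc: &automerge::Automerge, id: &automerge::ObjId) -> bool {
    // An object id yields its (possibly empty) chain of parents; a
    // scalar's opid is rejected rather than searched for.
    match doc.parents(id) {
        Ok(mut parents) => parents.next().is_some(),
        Err(_) => false, // id is not an object in this document
    }
}
```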