* Fix automerge-c tests on mac
* Generate significantly smaller automerge-c builds
This cuts the size of libautomerge_core.a from 25MB to 1.6MB on macOS
and from 53MB to 2.7MB on Linux.
As a side effect of setting `codegen-units = 1` for all release builds, the
optimized wasm files are also 100kB smaller.
Since b78211ca6, OpIds have been silently truncated to 2^32. This
causes corruption when the op ID overflows.
This change converts the silent error into a panic, and guards against the
panic on the code path found by the fuzzer.
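For illustration, a minimal sketch of the kind of change described; the function below is hypothetical, not automerge's actual code:

```rust
// Hypothetical sketch: a checked conversion in place of a silent `as u32`
// truncation, so an op counter above u32::MAX fails loudly instead of
// corrupting data.
fn op_counter_to_u32(counter: u64) -> u32 {
    u32::try_from(counter).expect("op id counter exceeded 2^32")
}

fn main() {
    assert_eq!(op_counter_to_u32(42), 42);
}
```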
* Fix doubly-reported ops in load of change chunks
Since c3c04128f5, observers have been
called twice when calling Automerge::load() with change chunks.
* Better handle change chunks with missing deps
Before this change, Automerge::load would panic if you passed a change
chunk that was missing a dependency, or multiple change chunks not in
strict dependency order. After this change, these cases return an error
instead.
* The AMvalue union, AMlistItem struct, AMmapItem struct, and AMobjItem struct are gone, replaced by the AMitem struct.
* The AMchangeHashes, AMchanges, AMlistItems, AMmapItems, AMobjItems, AMstrs, and AMsyncHaves iterators are gone, replaced by the AMitems iterator.
* The AMitem struct is opaque; getting and setting values is now achieved exclusively through function calls.
* The AMitemsNext(), AMitemsPrev(), and AMresultItem() functions return a pointer to an AMitem struct, so you ultimately get the same thing whether you're iterating over a sequence or calling AMmapGet() or AMlistGet().
* Calling AMitemResult() on an AMitem struct produces a new AMresult struct referencing its storage, so the AMresult struct for an iterator can subsequently be freed without affecting the AMitem structs that were filtered out of it.
* The storage for a set of AMitem structs can be recombined into a single AMresult struct by passing pointers to their corresponding AMresult structs to AMresultCat().
* For C/C++ programmers, I've added AMstrCmp(), AMstrdup(), AM{idxType,objType,status,valType}ToString() and AM{idxType,objType,status,valType}FromString(). It's also now possible to pass arbitrary parameters through AMstack{Item,Items,Result}() to a callback function.
A few tests were failing which exposed the fact that if skip is `B` (the
out factor of the OpTree) then we set `skip = None`, which causes us
to attempt to return `Skip` in a non-root node. I ported the failing
test from JS to Rust and fixed the problem.
I also fixed the formatting issues.
The previous approach of using the key and insert columns of existing
ops was leading to quite confusing code. There's no real cost to
introducing new columns so I've switched the code to do that instead.
Introduce an `expand` and a `mark_name` column. `expand` is a boolean
column and `mark_name` is an RLE-encoded string column. Neither of these
columns is encoded if it is empty.
Also move the `MarkData::name` property to use strings interned in
`OpSetMetadata::props` rather than storing the string directly. We will
probably have a lot of repeated mark names, and we do a number of equality
checks on them while searching, so interning should speed things up a bit.
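As a rough illustration of the interning idea, here is a sketch with stand-in types; it is not automerge's real `OpSetMetadata` or `MarkData`:

```rust
use std::collections::HashMap;

// Stand-in interner: mark names are stored once and referenced by index.
#[derive(Default)]
struct StringInterner {
    lookup: HashMap<String, usize>,
    strings: Vec<String>,
}

impl StringInterner {
    /// Returns a stable index for `s`, inserting it on first sight.
    fn intern(&mut self, s: &str) -> usize {
        if let Some(&idx) = self.lookup.get(s) {
            return idx;
        }
        let idx = self.strings.len();
        self.strings.push(s.to_string());
        self.lookup.insert(s.to_string(), idx);
        idx
    }
}

struct MarkData {
    name: usize,  // index into the interner instead of an owned string
    expand: bool, // whether the mark expands around inserts at its edges
}

fn main() {
    let mut props = StringInterner::default();
    let a = MarkData { name: props.intern("bold"), expand: true };
    let b = MarkData { name: props.intern("bold"), expand: true };
    // Repeated mark names share one interned entry, so equality checks
    // during search are cheap integer comparisons.
    assert_eq!(a.name, b.name);
    assert!(a.expand && b.expand);
}
```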
Introduce new `MaybeBooleanEncoder` (and associated `MaybeBooleanDecoder`
and `MaybeBooleanRange`) types to represent a boolean column which is
skipped entirely if it contains only `false` values. This allows us
to omit encoding the `expand` column for groups of ops which only ever
set it to `false`, which in turn keeps us backwards compatible when
marks are not used.
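A minimal sketch of the idea, assuming a simplified encoder interface rather than the real column-encoding machinery:

```rust
// Illustrative only: a boolean column encoder that produces no output at all
// when every appended value is `false`, so the column can be omitted and old
// readers remain compatible.
#[derive(Default)]
struct MaybeBooleanEncoder {
    values: Vec<bool>,
    any_true: bool,
}

impl MaybeBooleanEncoder {
    fn append(&mut self, value: bool) {
        self.any_true |= value;
        self.values.push(value);
    }

    /// Returns `None` when the column contained only `false` values,
    /// signalling that the caller should skip writing the column entirely.
    fn finish(self) -> Option<Vec<bool>> {
        if self.any_true {
            Some(self.values)
        } else {
            None
        }
    }
}

fn main() {
    let mut all_false = MaybeBooleanEncoder::default();
    all_false.append(false);
    all_false.append(false);
    assert_eq!(all_false.finish(), None); // column omitted

    let mut mixed = MaybeBooleanEncoder::default();
    mixed.append(false);
    mixed.append(true);
    assert_eq!(mixed.finish(), Some(vec![false, true])); // column written
}
```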
Context: currently we store a mapping from ChangeHash -> Clock, where
`Clock` is the set of (ActorId, (sequence number, max op)) pairs derived
from the given change and its dependencies. This clock is used to
determine which operations are visible at a given set of heads.
Problem: populating this mapping for documents with large histories
containing many actors can be very slow, because for each change we have to
allocate and merge a number of hashmaps.
Solution: instead of creating the clocks on load, build an adjacency-list
representation of the change graph and derive the clock
from this graph when it is needed. Traversing even large graphs is still
almost as fast as looking up the clock in a hashmap.
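A simplified sketch of the approach with illustrative types; the real clock also tracks a max op per actor, which is omitted here:

```rust
use std::collections::{HashMap, HashSet};

// Illustrative adjacency-list change graph; `actor` is an index into an
// actor table and `seq` is that actor's sequence number for the change.
struct ChangeGraph {
    actor: Vec<usize>,
    seq: Vec<u64>,
    deps: Vec<Vec<usize>>, // indices of each change's dependencies
}

impl ChangeGraph {
    /// Derive a clock (max seq per actor) on demand by walking the
    /// ancestors of `heads`, instead of storing a clock per change hash.
    fn clock_at(&self, heads: &[usize]) -> HashMap<usize, u64> {
        let mut clock = HashMap::new();
        let mut seen = HashSet::new();
        let mut stack = heads.to_vec();
        while let Some(idx) = stack.pop() {
            if !seen.insert(idx) {
                continue; // already visited this change
            }
            let max_seq = clock.entry(self.actor[idx]).or_insert(0);
            *max_seq = (*max_seq).max(self.seq[idx]);
            stack.extend(self.deps[idx].iter().copied());
        }
        clock
    }
}

fn main() {
    // Two changes by actor 0, and one by actor 1 depending on the first.
    let graph = ChangeGraph {
        actor: vec![0, 0, 1],
        seq: vec![1, 2, 1],
        deps: vec![vec![], vec![0], vec![0]],
    };
    // The clock at head 2 sees only actor 0's first change.
    let clock = graph.clock_at(&[2]);
    assert_eq!(clock.get(&0), Some(&1));
    assert_eq!(clock.get(&1), Some(&1));
}
```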
Problem: when running the sync protocol for a new document the API
requires that the user create an empty document and then call
`receive_sync_message` on that document. This results in the OpObserver
for the new document being called with every single op in the document
history. For documents with a large history this can be extremely time
consuming, but the OpObserver doesn't need to know about all the hidden
states.
Solution: Modify `Automerge::load_with` and
`Automerge::apply_changes_with` to check if the document is empty before
applying changes. If the document _is_ empty then we don't call the
observer for every change, but instead use
`automerge::observe_current_state` to notify the observer of the new
state once all the changes have been applied.
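A hedged sketch of this control flow, using stand-in types rather than automerge's real `Automerge`/`OpObserver` API:

```rust
use std::collections::BTreeMap;

// `Change`, `Doc`, and `Observer` below are illustrative stand-ins.
struct Change {
    key: String,
    value: i64,
}

trait Observer {
    fn put(&mut self, key: &str, value: i64);
}

#[derive(Default)]
struct Doc {
    state: BTreeMap<String, i64>,
}

impl Doc {
    fn is_empty(&self) -> bool {
        self.state.is_empty()
    }

    fn apply(&mut self, change: &Change) {
        self.state.insert(change.key.clone(), change.value);
    }

    /// Stand-in for `observe_current_state`: report only the final visible
    /// state, one call per key, however many historical ops touched it.
    fn observe_current_state(&self, observer: &mut impl Observer) {
        for (key, value) in &self.state {
            observer.put(key, *value);
        }
    }
}

fn apply_changes_with(doc: &mut Doc, changes: &[Change], observer: &mut impl Observer) {
    if doc.is_empty() {
        // Fresh document: apply silently, then describe the result once.
        for change in changes {
            doc.apply(change);
        }
        doc.observe_current_state(observer);
    } else {
        // Existing document: the observer hears about every applied change.
        for change in changes {
            doc.apply(change);
            observer.put(&change.key, change.value);
        }
    }
}

struct Printer;
impl Observer for Printer {
    fn put(&mut self, key: &str, value: i64) {
        println!("{key} = {value}");
    }
}

fn main() {
    let mut doc = Doc::default();
    let changes = vec![
        Change { key: "x".into(), value: 1 },
        Change { key: "x".into(), value: 2 }, // overwrites the first op
    ];
    let mut obs = Printer;
    // Because `doc` starts out empty, the observer only sees the final `x = 2`.
    apply_changes_with(&mut doc, &changes, &mut obs);
}
```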
Problem: When loading a document whilst passing an `OpObserver` we call
the OpObserver for every change in the loaded document. This slows down
the loading process for two reasons: 1) we have to make a call to the
observer for every op, and 2) we cannot just stream the ops into the OpSet in
topological order but must instead buffer them to pass to the observer.
Solution: Construct the OpSet first, then only traverse the visible ops
in the OpSet, calling the observer. For documents with a deep history
this results in vastly fewer calls to the observer and also allows us to
construct the OpSet much more quickly. It is slightly different
semantically because the observer never gets notified of changes which
are not visible, but that shouldn't matter to most observers.
The fields of `automerge::Automerge` were crate public, which made it
hard to change the structure of `Automerge` with confidence. Make all
fields private and put them behind accessors where necessary to allow
for easy internal changes.
Before this change, i64 decoding did not work for negative numbers (not a
real problem in practice, because it is only used for the timestamp of a
change), and both u64 and i64 decoding accepted overlong LEB encodings.
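To illustrate the two issues, here is a hedged sketch of a signed LEB128 (i64) decoder that sign-extends negative values and rejects overlong encodings; it is not automerge's actual decoder:

```rust
// Illustrative signed LEB128 decoder: handles negative numbers via sign
// extension and rejects encodings that use more bytes than necessary.
fn decode_i64(bytes: &[u8]) -> Result<(i64, usize), &'static str> {
    let mut result: i64 = 0;
    let mut shift: u32 = 0;
    let mut prev: Option<u8> = None;

    for (i, &byte) in bytes.iter().enumerate() {
        if i == 10 {
            return Err("more than 10 bytes in i64 encoding");
        }
        result |= i64::from(byte & 0x7f) << shift;
        shift += 7;

        if byte & 0x80 == 0 {
            // Last group: sign-extend if its sign bit (0x40) is set.
            if shift < 64 && byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            // Overlong check: a final group of 0x00 (or 0x7f for negatives)
            // is redundant if the previous group already carried the sign,
            // i.e. the encoding could have stopped one byte earlier.
            if let Some(prev) = prev {
                if (byte == 0x00 && prev & 0x40 == 0)
                    || (byte == 0x7f && prev & 0x40 != 0)
                {
                    return Err("overlong LEB128 encoding");
                }
            }
            return Ok((result, i + 1));
        }
        prev = Some(byte);
    }
    Err("unexpected end of input")
}

fn main() {
    // -1 encodes minimally as the single byte 0x7f.
    assert_eq!(decode_i64(&[0x7f]), Ok((-1, 1)));
    // [0xff, 0x7f] also decodes to -1 but is overlong, so it is rejected.
    assert!(decode_i64(&[0xff, 0x7f]).is_err());
}
```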
The Rust API has so far grown somewhat organically, driven by the needs of the
JavaScript implementation. This has led to an API which is quite awkward and
unfamiliar to Rust programmers. Additionally, there is no documentation to speak
of. This commit is a first step towards cleaning things up a bit. We touch
a lot of files but the changes are all very mechanical. We introduce a few
traits to abstract over the common operations between `Automerge` and
`AutoCommit`, and add a whole bunch of documentation.
* Add a `ReadDoc` trait to describe methods which read values from a document,
and make `Transactable` extend `ReadDoc` (see the sketch after this list).
* Add a `SyncDoc` trait to describe methods necessary for synchronizing
documents.
* Put the `SyncDoc` implementation for `AutoCommit` behind `AutoCommit::sync` to
ensure that any open transactions are closed before taking part in the sync
protocol
* Split `OpObserver` into two traits: `OpObserver` + `BranchableObserver`.
`BranchableObserver` captures the methods which are only needed for observing
transactions.
* Add a whole bunch of documentation.
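As a rough example of the `ReadDoc` trait as a generic bound (this assumes `ReadDoc` exposes a `keys` method and that `automerge::ROOT` names the root object; treat the details as illustrative):

```rust
use automerge::{transaction::Transactable, AutoCommit, ReadDoc, ROOT};

// Generic over anything that can be read like a document.
fn print_keys<D: ReadDoc>(doc: &D) {
    for key in doc.keys(ROOT) {
        println!("{key}");
    }
}

fn main() -> Result<(), automerge::AutomergeError> {
    let mut doc = AutoCommit::new();
    doc.put(ROOT, "greeting", "hello")?;
    print_keys(&doc);
    Ok(())
}
```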
The main changes Rust users will need to make are:
* Import the `ReadDoc` trait wherever you are using the methods which have been
moved to it. Optionally, change concrete parameters on functions to `ReadDoc`
constraints.
* Likewise import the `SyncDoc` trait wherever you are doing synchronisation
work
* If you are using the `AutoCommit::*_sync_message` methods you will need to add
a call to `AutoCommit::sync()` first. E.g. `doc.generate_sync_message` becomes
`doc.sync().generate_sync_message` (see the example after this list).
* If you have an implementation of `OpObserver` which you are using in an
`AutoCommit` then split it into an implementation of `OpObserver` and
`BranchableObserver`
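A hedged example of the sync migration mentioned above, assuming the `automerge::sync` module exposes `State` and the `SyncDoc` trait as described; exact paths may differ:

```rust
use automerge::sync::{self, SyncDoc};
use automerge::AutoCommit;

fn main() {
    let mut doc = AutoCommit::new();
    let mut peer_state = sync::State::new();

    // Before: doc.generate_sync_message(&mut peer_state)
    // After: go through `sync()`, which closes any open transaction before
    // taking part in the sync protocol.
    let _message = doc.sync().generate_sync_message(&mut peer_state);
}
```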
Problem: In `automerge::query::Index::change_vis` we use `-=` to
subtract the width of an operation which is being hidden from the text
widths which we store on the index of each node in the optree. This
index represents the width of all the visible text operations in this
node and below. This was causing an integer underflow error when
encountering some list operations. More specifically, when a
`ScalarValue::Str` in a list was made invisible by a later operation
which contained a _shorter_ string, the width subtracted from the indexed
text widths could be larger than the width currently stored in the index.
Solution: use `saturating_sub` instead. This is technically papering
over the problem because really the width should never go below zero,
but the text widths are only relevant for text objects where the
existing logic works as advertised because we don't have a `set`
operation for text indices. A more robust solution would be to track the
type of the Index (and consequently of the `OpTree`) at the type level,
but time is limited and problems are infinite.
Also, add a lengthy description of the reason we are using
`saturating_sub` so that when I read it in about a month I don't have
to redo the painful debugging process that got me to this commit.
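For illustration, the clamping behaviour that replaces the underflowing subtraction (the values here are made up):

```rust
fn main() {
    let recorded_width: usize = 3; // text width stored on the optree node
    let hidden_width: usize = 5;   // width of the op being hidden

    // `recorded_width - hidden_width` would panic in debug builds and wrap
    // in release builds; `saturating_sub` clamps at zero instead.
    let new_width = recorded_width.saturating_sub(hidden_width);
    assert_eq!(new_width, 0);
}
```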