When get_changes has to fall back to get_changes_slow it can be very
slow: it has to traverse the entire graph, and it also has to track the
dependencies it has already seen in a hashset, which thrashes memory.
Instead, if we calculate the vector clock for each hash we are given and
take the maximum over those clocks, we can determine which changes are
needed much more easily. In the worst case this still has to walk the
entire graph, but the clocks could be cached for later use. The worst
case is now when an actor makes a single change right at the start of
the history: we have to traverse the entire graph just to find that
change and ensure it is in the closure.
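As a rough sketch of the idea, with placeholder types rather than the real automerge-rs API (`Change`, `Clock`, `clock_for` and `changes_needed` here are all illustrative): compute the clock reachable from each given hash, take the element-wise maximum, and a change is needed exactly when its sequence number exceeds the clock entry for its actor.

```rust
use std::collections::HashMap;

// Placeholder types standing in for the real automerge-rs ones.
type ActorId = String;
type ChangeHash = u64;

struct Change {
    actor: ActorId,
    seq: u64,
    deps: Vec<ChangeHash>,
}

/// A vector clock: the highest sequence number seen per actor.
type Clock = HashMap<ActorId, u64>;

/// Clock of everything reachable from `hash` (this is what could be cached).
fn clock_for(hash: ChangeHash, by_hash: &HashMap<ChangeHash, Change>) -> Clock {
    let mut clock = Clock::new();
    let mut stack = vec![hash];
    while let Some(h) = stack.pop() {
        if let Some(change) = by_hash.get(&h) {
            let entry = clock.entry(change.actor.clone()).or_insert(0);
            if change.seq > *entry {
                *entry = change.seq;
                stack.extend(&change.deps);
            }
        }
    }
    clock
}

/// Changes the caller is missing: take the element-wise maximum of the
/// clocks for the heads it has, then keep every change whose seq is not
/// covered by that clock.
fn changes_needed<'a>(
    have_heads: &[ChangeHash],
    by_hash: &'a HashMap<ChangeHash, Change>,
) -> Vec<&'a Change> {
    let mut max_clock = Clock::new();
    for head in have_heads {
        for (actor, seq) in clock_for(*head, by_hash) {
            let entry = max_clock.entry(actor).or_insert(0);
            *entry = (*entry).max(seq);
        }
    }
    by_hash
        .values()
        .filter(|c| c.seq > max_clock.get(&c.actor).copied().unwrap_or(0))
        .collect()
}
```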
The JavaScript implementation of automerge sorts actor IDs
lexicographically when encoding changes, whereas we were sorting actor
IDs in the order they appear in the change we're encoding. This meant
that the indices we assigned to operations in the encoded change
differed from the ones the JavaScript implementation assigns, resulting
in mismatched head errors because the hashes we created did not match
those of the JavaScript implementation.
This change fixes the issue by sorting actor IDs lexicographically. We
make a pass over the operations in the change before encoding to collect
the actor IDs and sort them. This means we no longer need to pass a
mutable `Vec<ActorId>` to the various encode functions, which cleans
things up a little.
Fixes #240
This resulted in one failing test, which was due to the pending_changes
we report for a patch being incorrectly calculated from the missing
dependencies. I've added a test for this failure and fixed it; the
interop tests now pass.
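A rough illustration of the actor-sorting approach described above (the types and helpers are simplified stand-ins, not the real encoder API): collect every actor referenced by the change and its operations, sort and deduplicate them, and resolve indices against that sorted list while encoding.

```rust
// Illustrative only: `Op` and the index lookup are simplified stand-ins
// for the real encoding types in automerge-rs.
type ActorId = Vec<u8>;

struct Op {
    actor: ActorId,
    // ... other fields elided
}

/// Collect the actors referenced by the change and its ops, deduplicate,
/// and sort them lexicographically so the encoded indices match the
/// JavaScript implementation.
fn sorted_actors(change_actor: &ActorId, ops: &[Op]) -> Vec<ActorId> {
    let mut actors: Vec<ActorId> = std::iter::once(change_actor.clone())
        .chain(ops.iter().map(|op| op.actor.clone()))
        .collect();
    actors.sort();
    actors.dedup();
    actors
}

/// The index written into the encoded change for a given actor.
fn actor_index(actors: &[ActorId], actor: &ActorId) -> usize {
    actors
        .binary_search(actor)
        .expect("actor was collected before encoding")
}
```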
* Use tinyvec for actor id
Most of the time users will not supply their own identifiers and so get
random UUIDs. These fit nicely on the stack for a speedup, and TinyVec
lets us capture this mostly-stack, sometimes-heap behaviour.
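A minimal sketch of the shape this takes, assuming a 16-byte inline buffer (the type here is illustrative, not the exact automerge-rs definition):

```rust
use tinyvec::TinyVec;

// Illustrative actor ID: the common 16-byte UUID case stays inline on the
// stack, while longer user-supplied identifiers spill to a heap-backed Vec.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
struct ActorId(TinyVec<[u8; 16]>);

impl ActorId {
    fn from_bytes(bytes: &[u8]) -> Self {
        let mut inner: TinyVec<[u8; 16]> = TinyVec::default();
        // Heap-allocates only if bytes.len() exceeds the inline capacity.
        inner.extend_from_slice(bytes);
        ActorId(inner)
    }
}

fn main() {
    let uuid_like = ActorId::from_bytes(&[0u8; 16]); // inline, no allocation
    let custom = ActorId::from_bytes(b"my-much-longer-custom-actor-id"); // heap
    assert_eq!(uuid_like.0.len(), 16);
    assert!(custom.0.len() > 16);
}
```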
* Use git version of tinyvec for arbitrary
* Add SortedVec struct to ensure preds are sorted
* Add into_iter for sortedvec
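As a sketch of what such a wrapper might look like (not the exact automerge-rs type, just the general shape): insertion keeps the elements ordered so preds are always in a canonical order, and the `IntoIterator` impl lets callers consume it like a plain Vec.

```rust
/// A Vec wrapper that keeps its elements sorted on insertion, so that
/// preds are always stored in a canonical order (illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SortedVec<T: Ord>(Vec<T>);

impl<T: Ord> SortedVec<T> {
    fn new() -> Self {
        SortedVec(Vec::new())
    }

    /// Insert while preserving sorted order.
    fn push(&mut self, value: T) {
        let idx = self.0.partition_point(|x| x < &value);
        self.0.insert(idx, value);
    }

    fn as_slice(&self) -> &[T] {
        &self.0
    }
}

impl<T: Ord> IntoIterator for SortedVec<T> {
    type Item = T;
    type IntoIter = std::vec::IntoIter<T>;

    fn into_iter(self) -> Self::IntoIter {
        self.0.into_iter()
    }
}
```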
* Specify capacity of bytes
* Pass docops by value
* Use Cow for operation keys
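For background on the technique (plain `std::borrow::Cow` usage, not the actual automerge key type): a Cow-based key borrows when the key is already available as a string slice and only allocates when an owned value is genuinely needed.

```rust
use std::borrow::Cow;

// Illustrative stand-in for an operation key: either a borrowed map key
// or an owned, generated element identifier.
#[derive(Debug, Clone, PartialEq)]
struct Key<'a>(Cow<'a, str>);

fn borrowed_key(prop: &str) -> Key<'_> {
    // No allocation: the key borrows the existing string.
    Key(Cow::Borrowed(prop))
}

fn owned_key(counter: u64, actor: &str) -> Key<'static> {
    // Allocation only where an owned string is actually required.
    Key(Cow::Owned(format!("{}@{}", counter, actor)))
}
```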
* More preallocation
* Use swap_remove instead of remove to avoid an O(n) shift
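For context, this is plain `Vec` behaviour rather than automerge-specific code: `remove` shifts every later element left, while `swap_remove` moves only the last element into the hole, which is O(1) when element order does not matter.

```rust
fn main() {
    let mut ops = vec!["a", "b", "c", "d", "e"];

    // remove(1) shifts "c", "d", "e" one slot to the left: O(n).
    let mut shifted = ops.clone();
    shifted.remove(1);
    assert_eq!(shifted, vec!["a", "c", "d", "e"]);

    // swap_remove(1) moves the last element into index 1: O(1),
    // but the remaining order changes.
    ops.swap_remove(1);
    assert_eq!(ops, vec!["a", "e", "c", "d"]);
}
```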
* Use default compression level for a better balance of performance and compression
* Remove result for incorporate_new_op
* Import actor once
* Allocate for the new value too in LiteralRun
The literal run will finish by pushing the new `value` onto the vec, so
we can allocate space for it from the start.
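Roughly, the change amounts to the following pattern (a simplified stand-in, not the real column-decoding code): since we know the pending `value` will be pushed as soon as the run is read, reserve `count + 1` slots up front so that final push never reallocates.

```rust
// Simplified stand-in: reading a literal run of `count` items and then
// appending the newly seen `value` to the same buffer.
fn read_literal_run<T>(count: usize, mut read_item: impl FnMut() -> T, value: T) -> Vec<T> {
    // Reserve room for the run *and* the trailing push of `value`.
    let mut items = Vec::with_capacity(count + 1);
    for _ in 0..count {
        items.push(read_item());
    }
    items.push(value); // never triggers a reallocation
    items
}
```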
* Set capacity of rangemap
* Preallocate data
* Flatten object type
* Use separate construct functions
* Use separate gen_*_diff functions
* Remove maptype and seqtype from Diffs
* Preallocate ops in new_map_or_table
* More preallocations
It wasn't really "uncompressed", since we have both compressed and
uncompressed changes in the backend; it is just not encoded into the
binary format. The module separation (protocol vs backend) should help
with the distinction.