The Rust API has so far grown somewhat organically driven by the needs of the
javascript implementation. This has led to an API which is quite awkward and
unfamiliar to Rust programmers. Additionally there is no documentation to speak
of. This commit is the first movement towards cleaning things up a bit. We touch
a lot of files but the changes are all very mechanical. We introduce a few
traits to abstract over the common operations between `Automerge` and
`AutoCommit`, and add a whole bunch of documentation.
* Add a `ReadDoc` trait to describe methods which read value from a document.
make `Transactable` extend `ReadDoc`
* Add a `SyncDoc` trait to describe methods necessary for synchronizing
documents.
* Put the `SyncDoc` implementation for `AutoCommit` behind `AutoCommit::sync` to
ensure that any open transactions are closed before taking part in the sync
protocol
* Split `OpObserver` into two traits: `OpObserver` + `BranchableObserver`.
`BranchableObserver` captures the methods which are only needed for observing
transactions.
* Add a whole bunch of documentation.
The main changes Rust users will need to make is:
* Import the `ReadDoc` trait wherever you are using the methods which have been
moved to it. Optionally change concrete paramters on functions to `ReadDoc`
constraints.
* Likewise import the `SyncDoc` trait wherever you are doing synchronisation
work
* If you are using the `AutoCommit::*_sync_message` methods you will need to add
a call to `AutoCommit::sync()` first. E.g. `doc.generate_sync_message` becomes
`doc.sync().generate_sync_message`
* If you have an implementation of `OpObserver` which you are using in an
`AutoCommit` then split it into an implementation of `OpObserver` and
`BranchableObserver`
Problem: In `automerge::query::Index::change_vis` we use `-=` to
subtract the width of an operation which is being hidden from the text
widths which we store on the index of each node in the optree. This
index represents the width of all the visible text operations in this
node and below. This was causing an integer underflow error when
encountering some list operations. More specifically, when a
`ScalarValue::Str` in a list was made invisible by a later operation
which contained a _shorter_ string, the width subtracted from the indexed
text widths could be longer than the current index.
Solution: use `saturating_sub` instead. This is technically papering
over the problem because really the width should never go below zero,
but the text widths are only relevant for text objects where the
existing logic works as advertised because we don't have a `set`
operation for text indices. A more robust solution would be to track the
type of the Index (and consequently of the `OpTree`) at the type level,
but time is limited and problems are infinite.
Also, add a lengthy description of the reason we are using
`saturating_sub` so that when I read it in about a month I don't have
to redo the painful debugging process that got me to this commit.
Before this change numbits_i64() was incorrect for every value of the
form 0 - 2^x. This only manifested in a visible error if x%7 == 6 (so
for -64, -8192, etc.) at which point `lebsize` would return a value one
too large, causing a panic in commit().
Problem: the `OpSet::export_key` method uses `query::ElemIdPos` to
determine the index of sequence elements when exporting a key. This
query returned `None` for invisible elements. The `Parents` iterator
which is used to generate paths to objects in patches in
`automerge-wasm` used `export_key`. The end result is that applying a
remote change which deletes an object in a sequence would panic as it
tries to generate a path for an invisible object.
Solution: modify `query::ElemIdPos` to include invisible objects. This
does mean that the path generated will refer to the previous visible
object in the sequence as it's index, but this is probably fine as for
an invisible object the path shouldn't be used anyway.
While we're here also change the return value of `OpSet::export_key` to
an `Option` and make `query::Index::ops` private as obeisance to the
Lady of the Golden Blade.
The `ExId` structure has some internal details which make lookups for
object IDs which were produced by the document doing the looking up
faster. These internal details are quite specific to the implementation
so we don't want to expose them as a public API. On the other hand, we
need to be able to serialize `ExId`s so that FFI clients can hold on to
them without referencing memory which is owned by the document (ahem,
looking at you Java).
Introduce `ExId::to_bytes` and `TryFrom<&[u8]> ExId` implementing a
canonical serialization which includes a version tag, giveing us
compatibility options if we decide to change the implementation.
Some of the scripts in scripts/ci were not reliable detecting the path
they were operating in. Additionally the deno_tests script was not
correctly picking up the ROOT_MODULE environment variable. Add more
robust path handling and fix the deno_tests script.
In #480 we fixed an issue where `SeekOp` calculated an incorrect
insertion index on optrees where the only visible ops were on internal
nodes. We forgot to port this fix to `SeekOpWithPatch`, which has almost
the same logic just with additional work done in order to notify an
`OpObserver` of changes. Add a test and fix to `SeekOpWithPatch`
The release action we are working conditionally executes based on the
version of `automerge-wasm` in the previous commit. We need to trigger
it even though the version has not changed so we roll back the version
in this commit and the commit immediately following this will bump it
again.
We have been listing all the files to be included in the distributed
package in package.json:files. This is tedious and error prone. We
change to using globs instead, to do this without also including the
test and src files when outputting declarations we add a new typescript
config file for the declaration generation which excludes tests.
The new text features are faster and more ergonomic but not backwards
compatible. In order to make them backwards compatible re-expose the
original functionality and move the new API under a `future` export.
This allows users to interoperably use both implementations.
The wasm codebase assumed that clients want to represent text as a
string of characters. This is faster, but in order to enable backwards
compatibility we add a `TextRepresentation` argument to
`automerge_wasm::Automerge::new` to allow clients to choose between a
`string` or `Array<any>` representation. The `automerge_wasm::Observer`
will consult this setting to determine what kind of diffs to generate.
The tsconfig.json was setup to not include the JS tests. Update the
config to include the tests when checking typescript and fix all the
consequent errors. None of this is semantically meaningful _except_ for
a few incorrect usages of the API which were leading to flaky tests.
Hooray for types!
The `SeekOp` query can produce incorrect results when the optree it is
searching only has visible ops on the internal nodes. Add some tests to
demonstrate the issue as well as a fix.
This is primarily useful when debugging documents which have been
corrupted somehow so you would like to see the ops even if you can't
trust them. Note that this is _not_ currently useful for performance
reasons as the hash graph is still constructed, just not verified.
Automerge CLI depends transitively (via and old version of `clap` and
via `colored_json` on `atty` and `ansi_term`. These crates are both
marked as unmaintained and this generates irritating `cargo deny`
messages. To avoid this, implement colored JSON ourselves using the
`termcolor` crate - colored JSON is pretty mechanical. Also update
criterion and cbindgen dependencies and ignore the criterion tree in
deny.toml as we only ever use it in benchmarks.
All that's left now is a warning about atty in cbindgen, we'll just have
to wait for cbindgen to fix that, it's a build time dependency anyway so
it's not really an issue.
* Don't panic on invalid gzip stream
Before this change automerge-rs would panic if the gzip data in
a raw column was invalid; after this change the error is propagated
to the caller correctly.
* Use AMbyteSpan for byte values
Before this change there was an inconsistency between AMmapPutString
(which took an AMbyteSpan) and AMmapPutBytes (which took a pointer +
length).
Either is fine, but we should do the same in both places. I chose this
path to make it clear that the value passed in was an automerge value,
and to be symmetric with AMvalue.bytes when you do an AMmapGet().
I did not update other APIs (like load) that take a pointer + length, as
that is idiomatic usage for C, and these functions are not operating on
byte values stored in automerge.