automerge

Author	SHA1	Message	Date
Alex Good	9271b20cf5	Correct logic when skip = B and fix formatting A few tests were failing which exposed the fact that if skip is `B` (the out factor of the OpTree) then we set `skip = None` and this causes us to attempt to return `Skip` in a non root node. I ported the failing test from JS to Rust and fixed the problem. I also fixed the formatting issues.	2023-02-14 17:21:59 +00:00
Orion Henry	5e82dbc3c8	rework how skip works to push the logic into node	2023-02-14 17:21:59 +00:00
Conrad Irwin	2cd7427f35	Use our leb128 parser for values This ensures that values in automerge documents are encoded correctly, and that no extra data is smuggled in any LEB fields.	2023-02-09 15:46:22 +00:00
Alex Good	a24d536d16	Move automerge::SequenceTree to automerge_wasm::SequenceTree The `SequenceTree` is only ever used in `automerge_wasm` so move it there.	2023-02-05 11:08:33 +00:00
Alex Good	13a775ed9a	Speed up loading by generating clocks on demand Context: currently we store a mapping from ChangeHash -> Clock, where `Clock` is the set of (ActorId, (Sequence number, max Op)) pairs derived from the given change and its dependencies. This clock is used to determine what operations are visible at a given set of heads. Problem: populating this mapping for documents with large histories containing many actors can be very slow as for each change we have to allocate and merge a bunch of hashmaps. Solution: instead of creating the clocks on load, create an adjacency list based representation of the change graph and then derive the clock from this graph when it is needed. Traversing even large graphs is still almost as fast as looking up the clock in a hashmap.	2023-02-03 16:15:15 +00:00
Alex Good	1e33c9d9e0	Use Automerge::load instead of load_incremental if empty Problem: when running the sync protocol for a new document the API requires that the user create an empty document and then call `receive_sync_message` on that document. This results in the OpObserver for the new document being called with every single op in the document history. For documents with a large history this can be extremely time consuming, but the OpObserver doesn't need to know about all the hidden states. Solution: Modify `Automerge::load_with` and `Automerge::apply_changes_with` to check if the document is empty before applying changes. If the document _is_ empty then we don't call the observer for every change, but instead use `automerge::observe_current_state` to notify the observer of the new state once all the changes have been applied.	2023-02-03 10:01:12 +00:00
Alex Good	c3c04128f5	Only observe the current state on load Problem: When loading a document whilst passing an `OpObserver` we call the OpObserver for every change in the loaded document. This slows down the loading process for two reasons: 1) we have to make a call to the observer for every op 2) we cannot just stream the ops into the OpSet in topological order but must instead buffer them to pass to the observer. Solution: Construct the OpSet first, then only traverse the visible ops in the OpSet, calling the observer. For documents with a deep history this results in vastly fewer calls to the observer and also allows us to construct the OpSet much more quickly. It is slightly different semantically because the observer never gets notified of changes which are not visible, but that shouldn't matter to most observers.	2023-02-03 10:01:12 +00:00
Alex Good	da55dfac7a	refactor: make fields of Automerge private The fields of `automerge::Automerge` were crate public, which made it hard to change the structure of `Automerge` with confidence. Make all fields private and put them behind accessors where necessary to allow for easy internal changes.	2023-02-03 10:01:12 +00:00
Conrad Irwin	a6959e70e8	More robust leb128 parsing (#515) Before this change i64 decoding did not work for negative numbers (not a real problem because it is only used for the timestamp of a change), and both u64 and i64 would allow overlong LEB encodings.	2023-01-31 17:54:54 +00:00
alexjg	08801ab580	automerge-rs: Introduce ReadDoc and SyncDoc traits and add documentation (#511) The Rust API has so far grown somewhat organically driven by the needs of the javascript implementation. This has led to an API which is quite awkward and unfamiliar to Rust programmers. Additionally there is no documentation to speak of. This commit is the first movement towards cleaning things up a bit. We touch a lot of files but the changes are all very mechanical. We introduce a few traits to abstract over the common operations between `Automerge` and `AutoCommit`, and add a whole bunch of documentation. * Add a `ReadDoc` trait to describe methods which read values from a document. Make `Transactable` extend `ReadDoc` * Add a `SyncDoc` trait to describe methods necessary for synchronizing documents. * Put the `SyncDoc` implementation for `AutoCommit` behind `AutoCommit::sync` to ensure that any open transactions are closed before taking part in the sync protocol * Split `OpObserver` into two traits: `OpObserver` + `BranchableObserver`. `BranchableObserver` captures the methods which are only needed for observing transactions. * Add a whole bunch of documentation. The main changes Rust users will need to make are: * Import the `ReadDoc` trait wherever you are using the methods which have been moved to it. Optionally change concrete parameters on functions to `ReadDoc` constraints. * Likewise import the `SyncDoc` trait wherever you are doing synchronisation work * If you are using the `AutoCommit::_sync_message` methods you will need to add a call to `AutoCommit::sync()` first. E.g. `doc.generate_sync_message` becomes `doc.sync().generate_sync_message` If you have an implementation of `OpObserver` which you are using in an `AutoCommit` then split it into an implementation of `OpObserver` and `BranchableObserver`	2023-01-30 19:37:03 +00:00
Conrad Irwin	931ee7e77b	Add Fuzz Testing (#498) * Add fuzz testing for document load * Fix fuzz crashers and add to test suite	2023-01-25 16:03:05 +00:00
alexjg	819767cc33	fix: use saturating_sub when updating cached text width (#505) Problem: In `automerge::query::Index::change_vis` we use `-=` to subtract the width of an operation which is being hidden from the text widths which we store on the index of each node in the optree. This index represents the width of all the visible text operations in this node and below. This was causing an integer underflow error when encountering some list operations. More specifically, when a `ScalarValue::Str` in a list was made invisible by a later operation which contained a _shorter_ string, the width subtracted from the indexed text widths could be longer than the current index. Solution: use `saturating_sub` instead. This is technically papering over the problem because really the width should never go below zero, but the text widths are only relevant for text objects where the existing logic works as advertised because we don't have a `set` operation for text indices. A more robust solution would be to track the type of the Index (and consequently of the `OpTree`) at the type level, but time is limited and problems are infinite. Also, add a lengthy description of the reason we are using `saturating_sub` so that when I read it in about a month I don't have to redo the painful debugging process that got me to this commit.	2023-01-23 19:19:55 +00:00
Andrew Jeffery	1f7b109dcd	Add From<SmolStr> for ScalarValue::Str (#506)	2023-01-23 17:01:41 +00:00
Conrad Irwin	98e755106f	Fix and simplify lebsize calculations (#503) Before this change numbits_i64() was incorrect for every value of the form 0 - 2^x. This only manifested in a visible error if x%7 == 6 (so for -64, -8192, etc.) at which point `lebsize` would return a value one too large, causing a panic in commit().	2023-01-23 11:01:05 +00:00
alexjg	9b44a75f69	fix: don't panic when generating parents for hidden objects (#500) Problem: the `OpSet::export_key` method uses `query::ElemIdPos` to determine the index of sequence elements when exporting a key. This query returned `None` for invisible elements. The `Parents` iterator which is used to generate paths to objects in patches in `automerge-wasm` used `export_key`. The end result is that applying a remote change which deletes an object in a sequence would panic as it tries to generate a path for an invisible object. Solution: modify `query::ElemIdPos` to include invisible objects. This does mean that the path generated will refer to the previous visible object in the sequence as its index, but this is probably fine as for an invisible object the path shouldn't be used anyway. While we're here also change the return value of `OpSet::export_key` to an `Option` and make `query::Index::ops` private as obeisance to the Lady of the Golden Blade.	2023-01-19 21:11:36 +00:00
alexjg	d8baa116e7	automerge-rs: Add `ExId::to_bytes` (#491) The `ExId` structure has some internal details which make lookups for object IDs which were produced by the document doing the looking up faster. These internal details are quite specific to the implementation so we don't want to expose them as a public API. On the other hand, we need to be able to serialize `ExId`s so that FFI clients can hold on to them without referencing memory which is owned by the document (ahem, looking at you Java). Introduce `ExId::to_bytes` and `TryFrom<&[u8]> ExId` implementing a canonical serialization which includes a version tag, giving us compatibility options if we decide to change the implementation.	2023-01-19 17:02:47 +00:00
alexjg	964ae2bd81	Fix SeekOpWithPatch on optrees with only internal optrees (#496) In #480 we fixed an issue where `SeekOp` calculated an incorrect insertion index on optrees where the only visible ops were on internal nodes. We forgot to port this fix to `SeekOpWithPatch`, which has almost the same logic just with additional work done in order to notify an `OpObserver` of changes. Add a test and fix to `SeekOpWithPatch`	2023-01-14 11:27:48 +00:00
Alex Good	5763210b07	wasm: Allow a choice of text representations The wasm codebase assumed that clients want to represent text as a string of characters. This is faster, but in order to enable backwards compatibility we add a `TextRepresentation` argument to `automerge_wasm::Automerge::new` to allow clients to choose between a `string` or `Array<any>` representation. The `automerge_wasm::Observer` will consult this setting to determine what kind of diffs to generate.	2023-01-10 12:52:19 +00:00
Alex Good	18a3f61704	Update rust toolchain to 1.66	2023-01-10 12:51:56 +00:00
Alex Good	4de0756bb4	Correctly handle ops on optree node boundaries The `SeekOp` query can produce incorrect results when the optree it is searching only has visible ops on the internal nodes. Add some tests to demonstrate the issue as well as a fix.	2022-12-20 20:38:29 +00:00
Alex Good	0f90fe4d02	Add a method for loading a document without verifying heads This is primarily useful when debugging documents which have been corrupted somehow so you would like to see the ops even if you can't trust them. Note that this is _not_ currently useful for performance reasons as the hash graph is still constructed, just not verified.	2022-12-19 16:30:14 +00:00
Conrad Irwin	6dad2b7df1	Don't panic on invalid gzip stream (#477) * Don't panic on invalid gzip stream Before this change automerge-rs would panic if the gzip data in a raw column was invalid; after this change the error is propagated to the caller correctly.	2022-12-14 17:34:22 +00:00
Orion Henry	b78211ca65	change opid to (u32,u32) - 10% performance uptick (#473)	2022-12-11 18:56:20 +00:00
Orion Henry	1222fc0df1	rewrite opnode to store usize instead of Op (#471)	2022-12-10 10:36:05 +00:00
Orion Henry	2db9e78f2a	Text v2. JS Api now uses text by default (#462)	2022-12-09 23:48:07 +00:00
Alex Good	0ab6a770d8	wasm: improve error messages The error messages produced by various conversions in `automerge-wasm` were quite uninformative - often consisting of just returning the offending value with no description of the problem. The logic of these error messages was often hard to trace due to the use of `JsValue` to represent both error conditions and valid values - evidenced by most of the public functions of `automerge-wasm` having return types of `Result<JsValue, JsValue>`. Change these return types to mention specific errors, thus enlisting the compiler's help in ensuring that specific error messages are emitted.	2022-12-02 14:42:55 +00:00
Alex Good	de16adbcc5	Explicitly create empty changes Transactions with no ops in them are generally undesirable. They take up space in the change log but do nothing else. They are not useless though: it may occasionally be necessary to create an empty change in order to list all the current heads of the document as dependents of the empty change. The current API makes no distinction between empty changes and non-empty changes. If the user calls `Transaction::commit` a change is created regardless of whether there are ops to commit. To provide a more useful API modify `commit` so that if there is a no-op transaction then no changes are created, but provide explicit methods to create an empty change via `Transaction::empty_change`, `Automerge::empty_change` and `AutoCommit::empty_change`. Also make these APIs available in Javascript and C.	2022-12-02 12:12:54 +00:00
Alex Good	ea5688e418	rust: Make fields of `Transaction` and `TransactionInner` private It's tricky to modify these structs with the fields public as every change requires scanning the codebase for references to make sure you're not breaking any invariants. Make the fields private to ease development.	2022-12-02 12:12:54 +00:00
Alex Good	149f870102	rust: Remove `Default` constraint from `OpObserver`	2022-12-02 12:12:54 +00:00
Orion Henry	aaddb3c9ea	fix error message	2022-11-28 15:43:27 -06:00
Jason Kankiewicz	7c9f927136	Fixed code formatting violations.	2022-11-27 23:52:47 -08:00
Jason Kankiewicz	a324b02005	Added `automerge::AutomergeError::InvalidActorId`. Added `automerge::AutomergeError::InvalidCharacter`. Alphabetized the `automerge::AutomergeError` variants.	2022-11-27 23:52:47 -08:00
Alex Good	484a5bac4f	rust: Add Transactable::base_heads Sometimes it is necessary to query the heads of a document at the time a transaction started without having a mutable reference to the transactable. Add `Transactable::base_heads` to do this.	2022-11-27 16:39:02 +00:00
alexjg	22d60987f6	Don't send duplicate sync messages (#460) The API of Automerge::generate_sync_message requires that the user keep track of in flight messages themselves if they want to avoid sending duplicate messages. To avoid this add a flag to `automerge::sync::State` to track if there are any in flight messages and return `None` from `generate_sync_message` if there are.	2022-11-22 18:29:06 +00:00
Orion Henry	ca25ed0ca0	automerge-wasm: Use a SequenceTree in the OpObserver Generating patches to text objects (a la the edit-trace benchmark) was very slow due to appending to the back of a Vec. Use the SequenceTree (effectively a B-tree) instead so as to speed up sequence patch generation.	2022-11-22 12:13:42 +00:00
Alex Good	b53584bec0	Ritual obeisance before the altar of clippy	2022-11-05 22:48:43 +00:00
Orion Henry	d7d2916acb	tiny change that might remove a bloom filter false positive error	2022-10-21 15:15:30 -05:00
Alex Good	dd3c6d1303	Move rust workspace into ./rust After some discussion with PVH I realise that the repo structure in the last reorg was very rust-centric. In an attempt to put each language on a level footing move the rust code and project files into ./rust	2022-10-16 19:55:51 +01:00
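
Note on the leb128 commits (a6959e70e8, 2cd7427f35): both mention rejecting overlong LEB encodings. The following is a minimal sketch of what such a check can look like for unsigned values only. It is not the code from those commits; `decode_uleb128` and `Leb128Error` are hypothetical names chosen for illustration, and the signed (i64) case is not covered.

```rust
/// Errors for this sketch (hypothetical, not automerge's error types).
#[derive(Debug, PartialEq)]
pub enum Leb128Error {
    /// Input ended while the continuation bit was still set.
    UnexpectedEof,
    /// The encoding used more bytes than the value needs.
    Overlong,
    /// The encoded value does not fit in a u64.
    Overflow,
}

/// Decode an unsigned LEB128 value from the front of `input`.
/// Returns the value and the number of bytes consumed.
pub fn decode_uleb128(input: &[u8]) -> Result<(u64, usize), Leb128Error> {
    let mut result: u64 = 0;
    for (i, &byte) in input.iter().enumerate() {
        let payload = (byte & 0x7f) as u64;
        // A u64 needs at most 10 groups of 7 bits, and the 10th group
        // may only contribute the single remaining bit.
        if i >= 10 || (i == 9 && payload > 1) {
            return Err(Leb128Error::Overflow);
        }
        result |= payload << (7 * i as u32);
        if byte & 0x80 == 0 {
            // A zero final byte after the first byte is pure padding,
            // so the same value has a shorter encoding: reject it.
            if i > 0 && byte == 0 {
                return Err(Leb128Error::Overlong);
            }
            return Ok((result, i + 1));
        }
    }
    Err(Leb128Error::UnexpectedEof)
}
```

Rejecting the padded form means each value has exactly one accepted encoding, which is one way to ensure no extra data can be carried in a LEB field.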

38 commits
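
Note on 819767cc33: the commit replaces `-=` with `saturating_sub` when a hidden operation's width is subtracted from a cached width. The sketch below illustrates only that arithmetic choice. `WidthIndex` and its methods are hypothetical stand-ins, not the `automerge::query::Index` type.

```rust
/// Hypothetical stand-in for a cached count of visible text width.
pub struct WidthIndex {
    visible_width: usize,
}

impl WidthIndex {
    pub fn new(visible_width: usize) -> Self {
        WidthIndex { visible_width }
    }

    /// Record that an operation of the given width became invisible.
    /// With `-=` this would panic in debug builds (and wrap in release
    /// builds) whenever `width` exceeds the cached value; with
    /// `saturating_sub` the cached value clamps at zero instead.
    pub fn hide(&mut self, width: usize) {
        self.visible_width = self.visible_width.saturating_sub(width);
    }

    pub fn visible_width(&self) -> usize {
        self.visible_width
    }
}
```

As the commit message itself says, clamping papers over the inconsistency rather than removing it; `checked_sub` would be the alternative if the caller wanted to detect the case instead of absorbing it.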