automerge/rust/automerge/src/op_set/load.rs
Alex Good c3c04128f5 Only observe the current state on load
Problem: When loading a document while passing an `OpObserver`, we call
the OpObserver for every change in the loaded document. This slows down
loading for two reasons: 1) we have to call the observer for every op,
and 2) we cannot just stream the ops into the OpSet in topological order
but must instead buffer them to pass to the observer.

Solution: Construct the OpSet first, then only traverse the visible ops
in the OpSet, calling the observer. For documents with a deep history
this results in vastly fewer calls to the observer and also allows us to
construct the OpSet much more quickly. It is slightly different
semantically because the observer never gets notified of changes which
are not visible, but that shouldn't matter to most observers.
2023-02-03 10:01:12 +00:00
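
As a rough illustration of the trade-off described in the commit message above, the sketch below contrasts the two strategies on a toy last-writer-wins map. The names `Op`, `RecordingObserver`, `State`, `apply`, `load_observing_every_op`, and `load_then_observe` are hypothetical and are not automerge APIs; the real crate resolves conflicts and visibility quite differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified op: write `value` to `key` at logical time `counter`.
#[derive(Clone, Debug, PartialEq)]
struct Op {
    key: &'static str,
    value: i64,
    counter: u64,
}

/// Hypothetical observer that records every notification it receives.
#[derive(Default)]
struct RecordingObserver {
    calls: Vec<(&'static str, i64)>,
}

impl RecordingObserver {
    fn put(&mut self, key: &'static str, value: i64) {
        self.calls.push((key, value));
    }
}

/// Toy document state: the winning op for each key.
type State = BTreeMap<&'static str, Op>;

/// Keep `op` only if it is newer than the op currently stored for its key.
fn apply(state: &mut State, op: &Op) {
    let is_newer = state
        .get(op.key)
        .map_or(true, |existing| op.counter > existing.counter);
    if is_newer {
        state.insert(op.key, op.clone());
    }
}

/// Strategy from "Problem": notify the observer for every op while loading.
fn load_observing_every_op(ops: &[Op], observer: &mut RecordingObserver) -> State {
    let mut state = State::new();
    for op in ops {
        observer.put(op.key, op.value);
        apply(&mut state, op);
    }
    state
}

/// Strategy from "Solution": build the state first, then notify the observer
/// once per visible (winning) op.
fn load_then_observe(ops: &[Op], observer: &mut RecordingObserver) -> State {
    let mut state = State::new();
    for op in ops {
        apply(&mut state, op);
    }
    for op in state.values() {
        observer.put(op.key, op.value);
    }
    state
}
```

Both functions produce the same final state. For three successive writes to one key plus one write to another, the first makes four observer calls and the second makes two, and the gap grows with the depth of the history.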


use std::collections::HashMap;

use fxhash::FxBuildHasher;

use super::{OpSet, OpTree};
use crate::{
    op_tree::OpTreeInternal,
    storage::load::{DocObserver, LoadedObject},
    types::ObjId,
};

/// An opset builder which creates an optree for each object as it finishes loading, inserting the
/// ops using `OpTreeInternal::insert`. This should be faster than using `OpSet::insert_*` but only
/// works because the ops in the document format are in the same order as in the optrees.
pub(crate) struct OpSetBuilder {
    completed_objects: HashMap<ObjId, OpTree, FxBuildHasher>,
}

impl OpSetBuilder {
    pub(crate) fn new() -> OpSetBuilder {
        Self {
            completed_objects: HashMap::default(),
        }
    }
}

impl DocObserver for OpSetBuilder {
    type Output = OpSet;

    fn object_loaded(&mut self, loaded: LoadedObject) {
        let mut internal = OpTreeInternal::new();
        for (index, op) in loaded.ops.into_iter().enumerate() {
            internal.insert(index, op);
        }
        let tree = OpTree {
            internal,
            objtype: loaded.obj_type,
            parent: loaded.parent,
            last_insert: None,
        };
        self.completed_objects.insert(loaded.id, tree);
    }

    fn finish(self, metadata: super::OpSetMetadata) -> Self::Output {
        let len = self.completed_objects.values().map(|t| t.len()).sum();
        OpSet {
            trees: self.completed_objects,
            length: len,
            m: metadata,
        }
    }
}
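
The builder pattern above can be approximated with standard-library types only. In this minimal sketch, a `Vec` stands in for `OpTreeInternal`, and `ToyObjId`, `ToyOp`, `ToyLoadedObject`, `ToyOpSet`, and `ToyBuilder` are hypothetical stand-ins rather than the crate's real definitions. It is meant to show why inserting at the enumeration index is always an append when the ops arrive in their final order, and how the total length can be summed once at the end.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; none of these are automerge types.
type ToyObjId = u32;
type ToyOp = String;

/// One fully loaded object: its id and its ops, already in final order.
struct ToyLoadedObject {
    id: ToyObjId,
    ops: Vec<ToyOp>,
}

/// The finished structure: one op list per object plus the total op count.
struct ToyOpSet {
    trees: HashMap<ToyObjId, Vec<ToyOp>>,
    length: usize,
}

#[derive(Default)]
struct ToyBuilder {
    completed_objects: HashMap<ToyObjId, Vec<ToyOp>>,
}

impl ToyBuilder {
    /// Called once per object as it finishes loading.
    fn object_loaded(&mut self, loaded: ToyLoadedObject) {
        let mut tree = Vec::with_capacity(loaded.ops.len());
        for (index, op) in loaded.ops.into_iter().enumerate() {
            // `index` always equals `tree.len()` here, so this insert never
            // shifts existing elements: it is an append.
            tree.insert(index, op);
        }
        self.completed_objects.insert(loaded.id, tree);
    }

    /// Consume the builder and compute the total number of ops.
    fn finish(self) -> ToyOpSet {
        let length: usize = self.completed_objects.values().map(Vec::len).sum();
        ToyOpSet {
            trees: self.completed_objects,
            length,
        }
    }
}
```

If the ops arrived in any other order, each one would need a position lookup before insertion, which is the slower path this approach is meant to avoid.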