docs: refresh pipeline notes and transport analysis
# Graph Transport Alternatives

## Purpose

This document compares alternatives to the current `/api/graph` transport format with two goals:

1. reduce the cost of building, transferring, and decoding very large graph payloads
2. move the frontend transport shape closer to the renderer/GPU input shape while preserving all data the current frontend and backend pipeline still need

This analysis is based on the current repo state plus official documentation for browser fetch/streaming and candidate transport formats.

## Executive Summary

The current bottleneck is not the renderer's typed-array path. It is the browser's need to fully materialize a huge JSON object graph before the renderer ever runs.

The best candidates for this repo are:

1. **Custom binary columnar payload**
   - Best fit for the current renderer.
   - Lowest decode overhead.
   - Most direct path from backend memory to frontend typed arrays.
   - Requires custom protocol/versioning work.

2. **Apache Arrow IPC**
   - Best off-the-shelf columnar binary format.
   - Very good fit for typed-array-heavy rendering.
   - Strong option if you want a standard format instead of inventing one.
   - Heavier conceptual/tooling footprint than a custom binary envelope.

3. **Columnar JSON**
   - Easiest migration.
   - Better than today's row-oriented JSON.
   - Still fundamentally JSON, so it does not remove the browser's JSON parse/object-materialization cost.

4. **NDJSON / streamed chunked JSON**
   - Good if progressiveness matters.
   - Better than one giant monolithic JSON document.
   - Still weaker than a binary/columnar format for this renderer.

The strongest overall recommendation is:

- **Long-term**: custom binary columnar payload or Arrow IPC
- **Low-risk interim**: columnar JSON, possibly with chunking/streaming

Not recommended as the primary solution for this repo:

- row-oriented MessagePack
- Protocol Buffers as one giant message

## Verified Current Pipeline

### Backend side

The backend builds a `GraphResponse` and caches it in memory:

- `backend_go/models.go`
- `backend_go/snapshot_service.go`
- `backend_go/graph_snapshot.go`

The response shape is:

```go
type GraphResponse struct {
	Nodes         []Node
	Edges         []Edge
	RouteSegments []RouteSegment
	Meta          *GraphMeta
}
```

and it is currently written as one JSON document with:

```go
json.NewEncoder(w).Encode(v)
```

in `backend_go/http_helpers.go`.

### Frontend side

The frontend currently does:

1. `fetch("/api/graph?...")`
2. `await graphRes.json()`
3. read `graph.nodes`, `graph.edges`, `graph.route_segments`, `graph.meta`
4. build:
   - `Float32Array xs`
   - `Float32Array ys`
   - `Uint32Array vertexIds`
   - `Uint32Array edgeData`
   - `Float32Array routeLineVertices`
5. call `renderer.init(xs, ys, vertexIds, edgeData, routeLineVertices)`

Relevant files:

- `frontend/src/App.tsx`
- `frontend/src/renderer.ts`

This means the current browser path is:

- wire bytes
- JSON text/body handling
- JS arrays of node/edge objects
- typed arrays
- renderer-side typed arrays/maps/GPU buffers

The expensive part happens before step 4.

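Step 4 itself is a cheap pass compared with materializing the row objects. A simplified sketch of that build step (not the actual code in `frontend/src/App.tsx`; the assumption that edge endpoints are backend node IDs remapped to dense renderer indices is illustrative):

```typescript
// Simplified sketch of the current typed-array build step.
type RowNode = { id: number; x: number; y: number };
type RowEdge = { source: number; target: number };

function buildRenderArrays(nodes: RowNode[], edges: RowEdge[]) {
  const xs = new Float32Array(nodes.length);
  const ys = new Float32Array(nodes.length);
  const vertexIds = new Uint32Array(nodes.length);
  // Map backend node IDs to dense renderer indices so edges can refer
  // to slots in xs/ys rather than raw IDs (an assumed detail).
  const indexById = new Map<number, number>();
  nodes.forEach((n, i) => {
    xs[i] = n.x;
    ys[i] = n.y;
    vertexIds[i] = n.id;
    indexById.set(n.id, i);
  });
  const edgeData = new Uint32Array(edges.length * 2);
  edges.forEach((e, i) => {
    edgeData[i * 2] = indexById.get(e.source)!;
    edgeData[i * 2 + 1] = indexById.get(e.target)!;
  });
  return { xs, ys, vertexIds, edgeData };
}
```

Every `RowNode`/`RowEdge` object in the input arrays had to exist before this function could run; that allocation, not this loop, is the bottleneck.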
## Verified Data Access Audit

This section verifies every field currently produced by the backend and whether it is actually needed by the frontend transport.

### Main graph response fields

| Field | Produced in backend | Used by frontend? | Where used | Required on wire for current UX? | Notes |
| --- | --- | --- | --- | --- | --- |
| `nodes[].id` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Used to build `vertexIds`, and to map selected renderer indices back to backend IDs for selection queries. |
| `nodes[].x` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Used to build `xs`. |
| `nodes[].y` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Used to build `ys`. |
| `nodes[].iri` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes, if keeping current hover UX | Used for hover tooltip text. |
| `nodes[].label` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes, if keeping current hover UX | Used for hover tooltip text. |
| `nodes[].termType` | `backend_go/models.go` | No frontend use | none in `frontend/src` | No | Still needed internally by backend snapshot/selection index. |
| `edges[].source` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Used to build `edgeData`. |
| `edges[].target` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Used to build `edgeData`. |
| `edges[].predicate_id` | `backend_go/models.go` | No main-graph frontend use | none in `frontend/src/App.tsx` | No | Still needed internally by backend snapshot and hierarchy layout preparation. |
| `route_segments[].points` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes when route segments are present | Used to build `routeLineVertices`. |
| `route_segments[].edge_index` | `backend_go/models.go` | Not used after parsing | `graphRouteSegmentArray` validation only | No | Could be dropped from frontend transport if route lines are pre-flattened. |
| `route_segments[].kind` | `backend_go/models.go` | Not used after parsing | `graphRouteSegmentArray` validation only | No | Could be dropped from frontend transport if route lines are pre-flattened. |
| `meta.backend` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Displayed in overlay. |
| `meta.nodes` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Displayed in overlay. |
| `meta.edges` | `backend_go/models.go` | Yes | `frontend/src/App.tsx` | Yes | Displayed in overlay. |
| `meta.graph_query_id` | `backend_go/models.go` | Yes | `frontend/src/selection_queries/api.ts` | Yes | Sent back on selection endpoints. |
| `meta.node_limit` | `backend_go/models.go` | Yes | `frontend/src/selection_queries/api.ts` | Yes | Sent back on selection endpoints. |
| `meta.edge_limit` | `backend_go/models.go` | Yes | `frontend/src/selection_queries/api.ts` | Yes | Sent back on selection endpoints. |
| `meta.ttl_path` | `backend_go/models.go` | No | none in `frontend/src` | No | Frontend type declares it, but current UI does not use it. |
| `meta.sparql_endpoint` | `backend_go/models.go` | No | none in `frontend/src` | No | Not used by current UI. |
| `meta.include_bnodes` | `backend_go/models.go` | No | none in `frontend/src` | No | Not used by current UI. |
| `meta.layout_engine` | `backend_go/models.go` | No | none in `frontend/src` | No | Not used by current UI. |
| `meta.layout_root_iri` | `backend_go/models.go` | No | none in `frontend/src` | No | Not used by current UI. |
| `meta.predicates` | `backend_go/models.go` | No frontend use | none in `frontend/src` | No | Still used internally by backend selection/hierarchy logic. |

### Backend-internal fields that do not need to stay in the frontend transport

This is the most important audit result.

The backend currently reuses one struct for:

- the internal cached snapshot
- the HTTP response payload

That is convenient, but it means the frontend receives fields that only the backend needs.

Verified internal-only dependencies:

- `snapshot.Nodes[].TermType` is used in `backend_go/selection_query.go` to build the selection index.
- `snapshot.Meta.Predicates` is used in `backend_go/selection_query.go`.
- `Edge.PredicateID` is used internally for hierarchy layout preparation in `backend_go/hierarchy_layout_bridge.go`.

The frontend does **not** need those fields for current behavior.

### What the frontend actually needs

For the current graph view, the hot path can be reduced to:

- `vertexIds[]`
- `xs[]`
- `ys[]`
- `edgeSources[]`
- `edgeTargets[]`
- `routeLineVertices[]` or an equivalent route geometry
- `label[]` and `iri[]` by node index
- `meta.backend`
- `meta.nodes`
- `meta.edges`
- `meta.graph_query_id`
- `meta.node_limit`
- `meta.edge_limit`

That is much closer to a columnar or binary payload than to the current array-of-objects JSON.

## Why the Current JSON Path Hurts

`Response.json()` is not just a lightweight decode helper. MDN states that `Response.json()` reads the stream to completion and resolves with the result of parsing the body text as JSON into a JavaScript object.

That matters here because the current payload is row-oriented:

- millions of node objects
- millions of edge objects

Even though the renderer later wants typed arrays, the browser must first create those JS objects.

This is exactly the part that can stall or run out of memory before `renderer.init(...)` starts.

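A toy comparison makes the row/column difference concrete (the schema and field names here are illustrative, not the repo's): row-oriented JSON repeats every key per record, and parsing it allocates one JS object per node, while the columnar form parses into a handful of arrays.

```typescript
// Same toy data encoded row-oriented vs column-oriented.
const n = 1000;
const rows = Array.from({ length: n }, (_, i) => ({ id: i, x: i * 0.5, y: i * 0.25 }));
const columns = {
  ids: rows.map(r => r.id),
  xs: rows.map(r => r.x),
  ys: rows.map(r => r.y),
};
const rowJson = JSON.stringify(rows);
const colJson = JSON.stringify(columns);
// The columnar text is smaller (no repeated keys) and JSON.parse of it
// produces 3 arrays instead of 1000 objects.
console.log(rowJson.length, colJson.length);
```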
## Alternatives

### 1. Columnar JSON

#### Idea

Keep JSON, but change the schema from row-oriented objects:

```json
{
  "nodes": [{ "id": 1, "x": 0.1, "y": 0.2, ... }],
  "edges": [{ "source": 1, "target": 2, ... }]
}
```

to column-oriented arrays:

```json
{
  "vertex_ids": [...],
  "xs": [...],
  "ys": [...],
  "edge_sources": [...],
  "edge_targets": [...],
  "node_labels": [...],
  "node_iris": [...],
  "route_line_vertices": [...],
  "meta": { ... }
}
```

#### Pros

- easiest migration from the current API contract
- no schema compiler
- easy to debug with ordinary tooling
- much closer to what the renderer already consumes
- avoids creating per-edge objects in frontend application code

#### Cons

- still goes through JSON parsing
- still materializes JS arrays before typed arrays are built
- huge numeric arrays in JSON are still text, not binary
- string columns are still ordinary JS strings

#### Fit for current pipeline

Good.

No current frontend feature would be lost if the payload includes:

- ids/xs/ys/edge sources/targets
- labels/iris
- route line vertices or equivalent
- the small subset of meta fields currently used

#### Overall assessment

Best low-risk intermediate step.

It is clearly better than today's row-oriented JSON, but it is not the endgame if the goal is to remove the parse bottleneck for 1 GB+ payloads.

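Decoding such a payload into renderer inputs becomes a single pass per column, with no intermediate node/edge objects. A sketch assuming the field names from the example schema above:

```typescript
// Hypothetical decode of the columnar JSON shape into renderer inputs.
interface ColumnarGraph {
  vertex_ids: number[];
  xs: number[];
  ys: number[];
  edge_sources: number[];
  edge_targets: number[];
}

function toTypedArrays(g: ColumnarGraph) {
  // Typed-array constructors accept plain number arrays, so each column
  // converts in one pass.
  const edgeData = new Uint32Array(g.edge_sources.length * 2);
  g.edge_sources.forEach((s, i) => {
    edgeData[i * 2] = s;
    edgeData[i * 2 + 1] = g.edge_targets[i];
  });
  return {
    vertexIds: new Uint32Array(g.vertex_ids),
    xs: new Float32Array(g.xs),
    ys: new Float32Array(g.ys),
    edgeData,
  };
}
```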
### 2. NDJSON / Chunked JSON

#### Idea

Change the backend to stream multiple JSON records instead of one giant JSON object.

Examples:

- one line per chunk of nodes/edges
- one line for metadata
- one line per route segment chunk

NDJSON is explicitly designed for transporting multiple JSON texts in a stream protocol.

#### Pros

- can start processing before the whole payload arrives
- better observability and progress reporting
- easier cancellation/retry semantics
- avoids one monolithic `Response.json()` boundary

#### Cons

- record-per-edge NDJSON would still create far too many JS objects
- to be worth it here, it should be **chunked columnar NDJSON**, not row NDJSON
- the frontend load path must become stream-based
- the renderer still currently expects all arrays at once

#### Fit for current pipeline

Moderate.

It can preserve all current information, but it does not by itself solve the "final representation should look like GPU inputs" goal unless each chunk is already columnar.

#### Best shape if chosen

Not:

- one JSON object per edge
- one JSON object per node

Better:

- one NDJSON record for metadata
- then NDJSON records where each record contains columnar chunks:
  - `vertex_ids_chunk`
  - `xs_chunk`
  - `ys_chunk`
  - `edge_sources_chunk`
  - `edge_targets_chunk`

#### Overall assessment

Viable, but only attractive if progressiveness is a major goal. On its own, it is weaker than columnar binary formats for this renderer.

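A sketch of consuming that chunked-columnar shape (the record fields `kind`, `xs_chunk`, and `ys_chunk` are assumptions, not an agreed contract; a real consumer would read lines from the fetch body stream rather than a string):

```typescript
// Accumulate columns across chunked columnar NDJSON records.
function parseColumnarNdjson(body: string) {
  const xs: number[] = [];
  const ys: number[] = [];
  let meta: unknown = null;
  for (const line of body.split("\n")) {
    if (!line.trim()) continue; // skip blank lines
    const rec = JSON.parse(line); // each line is one complete JSON text
    if (rec.kind === "meta") {
      meta = rec.meta;
    } else if (rec.kind === "node_chunk") {
      xs.push(...rec.xs_chunk);
      ys.push(...rec.ys_chunk);
    }
  }
  return { meta, xs: new Float32Array(xs), ys: new Float32Array(ys) };
}
```

Because each chunk arrives as a small record, the renderer could in principle be fed progressively, but as noted above the current renderer expects all arrays at once.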
### 3. MessagePack

#### Idea

Use a compact binary encoding instead of JSON.

The official JavaScript implementation supports:

- `encode`
- `decode`
- `decodeAsync(stream)`
- `decodeArrayStream(stream)`
- `decodeMultiStream(stream)`

and even custom extension types for faster handling of large `Float32Array` payloads.

#### Pros

- smaller payload than JSON
- binary transport
- async and stream-capable decoding APIs exist
- mature JS library

#### Cons

- if you keep the current row-oriented schema, you still get one huge object graph after decode
- therefore MessagePack alone does not remove the fundamental object-allocation problem
- custom extension types improve typed-array cases, but then you are already halfway to designing a custom binary protocol

#### Fit for current pipeline

Moderate.

It can preserve all current information easily.

But if the schema remains object-heavy, the browser still ends up with millions of JS objects.

#### Overall assessment

Useful if paired with a **columnar** schema. Not compelling as a first move if the schema stays row-oriented.

### 4. Apache Arrow IPC

#### Idea

Use Arrow's columnar binary format and Arrow JS support.

Arrow JS provides:

- `tableFromIPC(...)`
- support for `fetch(...)`
- typed-array-backed vectors
- dictionary-encoded strings
- a columnar memory model explicitly meant for efficient processing and movement of large in-memory data

#### Pros

- strongest off-the-shelf fit for typed-array-oriented rendering
- columnar by design
- binary rather than textual
- supports large numeric columns very naturally
- supports dictionary encoding for repeated strings like labels or IRIs
- much closer to the renderer/GPU input shape than JSON objects

#### Cons

- larger conceptual/tooling jump than columnar JSON
- route segments are nested/variable-length; representing them cleanly needs design
- frontend code becomes Arrow-aware unless the decode is hidden behind an adapter
- backend must serialize Arrow on the Go side or produce Arrow-compatible IPC

#### Fit for current pipeline

Very good.

Current frontend needs can be represented as columns:

- `vertex_ids: uint32`
- `xs: float32`
- `ys: float32`
- `edge_sources: uint32`
- `edge_targets: uint32`
- `labels: utf8` or dictionary-encoded utf8
- `iris: utf8` or dictionary-encoded utf8

Route geometry should probably not stay as nested route-segment objects. It would fit better as:

- a pre-flattened `route_line_vertices` float column/buffer
- or a second Arrow table dedicated to line segments

#### Overall assessment

One of the two best solutions for this repo.

If you want a standard format instead of inventing one, Arrow is the most attractive candidate.

### 5. FlatBuffers

#### Idea

Use a schema-defined binary format designed for direct access without unpacking/parsing.

FlatBuffers explicitly advertises:

- access to serialized data without parsing/unpacking
- memory efficiency and speed
- forwards/backwards compatibility

#### Pros

- very strong memory-efficiency story
- schema evolution support
- no full parse/unpack step in the same way as JSON
- can model both scalars and more complex structures

#### Cons

- requires schema/compiler/generated bindings
- JavaScript integration is more manual than JSON or Arrow
- ergonomics in app code are not as simple as arrays/objects
- strings and nested route structures are supported, but the developer experience is more specialized

#### Fit for current pipeline

Good, technically.

It can preserve all current information and remove the giant object-graph parse step.

However, compared with Arrow or a custom binary envelope, it is a less natural conceptual fit for a renderer whose hot path is already columnar/typed-array-based.

#### Overall assessment

A strong technical option, but probably not the most ergonomic option for this specific frontend.

### 6. Protocol Buffers

#### Idea

Use a schema-defined binary format with generated bindings.

#### Pros

- compact binary encoding
- schema/versioning
- mature ecosystem

#### Cons

- official docs describe protobuf as a good fit for typed structured messages up to a few megabytes
- the same docs warn that large data can require loading entire messages into memory and can cause multiple copies
- large repeated numeric arrays are not protobuf's sweet spot
- still not especially close to the renderer's typed-array model

#### Fit for current pipeline

Poor for this specific payload size and shape.

#### Overall assessment

Not recommended for this main graph transport.

### 7. Custom Binary Typed-Array Envelope

#### Idea

Define a transport specifically around what the renderer and hover/selection pipeline need.

Example structure:

- small fixed header or small JSON header:
  - version
  - counts
  - offsets/lengths
  - meta subset
- then raw binary buffers:
  - `vertex_ids`
  - `xs`
  - `ys`
  - `edge_sources`
  - `edge_targets`
  - `route_line_vertices`
  - string dictionary / offsets for `label` and `iri`

#### Pros

- closest possible fit to the current renderer
- no schema compiler required
- no row-object materialization
- easiest path to zero-copy or near-zero-copy arrays on the frontend
- easiest path to worker transfer via `ArrayBuffer`
- can separate hot render data from cold metadata cleanly

#### Cons

- custom protocol to design, version, validate, and document
- less tooling/interoperability than Arrow
- backend and frontend both need careful binary codecs

#### Fit for current pipeline

Excellent.

You can preserve all current behavior while only sending the data the frontend actually uses.

#### Overall assessment

The best performance-oriented fit if you are comfortable owning a custom format.

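A minimal sketch of the envelope idea, assuming a 12-byte little-endian header (version, node count, id count) followed by two raw sections; a real protocol would add offsets, edge columns, route geometry, and a string dictionary:

```typescript
// Encode: fixed header, then raw column bytes, one ArrayBuffer total.
function encodeEnvelope(xs: Float32Array, vertexIds: Uint32Array): ArrayBuffer {
  const headerBytes = 12;
  const buf = new ArrayBuffer(headerBytes + xs.byteLength + vertexIds.byteLength);
  const view = new DataView(buf);
  view.setUint32(0, 1, true);                 // version
  view.setUint32(4, xs.length, true);         // node count
  view.setUint32(8, vertexIds.length, true);  // id count
  new Uint8Array(buf, headerBytes, xs.byteLength)
    .set(new Uint8Array(xs.buffer, xs.byteOffset, xs.byteLength));
  new Uint8Array(buf, headerBytes + xs.byteLength)
    .set(new Uint8Array(vertexIds.buffer, vertexIds.byteOffset, vertexIds.byteLength));
  return buf;
}

// Decode: typed-array views over the fetched buffer -- no parse, no copy.
function decodeEnvelope(buf: ArrayBuffer) {
  const view = new DataView(buf);
  if (view.getUint32(0, true) !== 1) throw new Error("unsupported envelope version");
  const nodeCount = view.getUint32(4, true);
  const idCount = view.getUint32(8, true);
  const xs = new Float32Array(buf, 12, nodeCount);
  const vertexIds = new Uint32Array(buf, 12 + nodeCount * 4, idCount);
  return { xs, vertexIds };
}
```

Note the 12-byte header keeps both sections 4-byte aligned, which the typed-array view constructors require.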
## Comparison Table

| Option | Closeness to GPU shape | Avoids giant object graph | Supports all current frontend data | Streaming-friendly | Implementation cost | Recommendation |
| --- | --- | --- | --- | --- | --- | --- |
| Current row JSON | Poor | No | Yes | Poor | Already done | Replace |
| Columnar JSON | Medium | No | Yes | Medium | Low | Good interim |
| NDJSON chunked columnar JSON | Medium | Partially | Yes | Good | Medium | Situational |
| MessagePack row-oriented | Poor | No | Yes | Good | Medium | Not enough alone |
| MessagePack columnar | Medium | Partially | Yes | Good | Medium | Viable but secondary |
| Arrow IPC | Very high | Yes or mostly yes | Yes | Good | Medium-high | Strong candidate |
| FlatBuffers | High | Yes | Yes | Medium | High | Good but specialized |
| Protobuf | Low-medium | No practical win here | Yes | Medium | Medium-high | Not recommended |
| Custom binary typed-array envelope | Very high | Yes | Yes | Good | High | Strongest fit |

## Recommended Data Contract Shapes

### Recommended shape for any non-row-oriented solution

The frontend does not need node/edge objects as its primary graph transport.

The main graph payload should be modeled as:

- `vertex_ids`
- `xs`
- `ys`
- `edge_sources`
- `edge_targets`
- `route_line_vertices`
- `node_labels`
- `node_iris`
- `meta`

This can be represented as:

- columnar JSON
- Arrow columns
- FlatBuffers vectors
- custom binary sections

### Fields that can be removed from the frontend transport immediately

Without changing current visible behavior, the main graph transport does not need to include:

- `nodes[].termType`
- `edges[].predicate_id`
- `meta.predicates`
- `meta.ttl_path`
- `meta.sparql_endpoint`
- `meta.include_bnodes`
- `meta.layout_engine`
- `meta.layout_root_iri`
- `route_segments[].edge_index`
- `route_segments[].kind`

Important:

Some of those fields are still needed by the backend's **internal snapshot**, especially for selection queries and hierarchy layout. That argues for splitting:

- the internal snapshot model
- the frontend transport DTO

instead of continuing to reuse one struct for both.

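On the Go side this split would be a separate DTO struct; the same projection can be sketched in TypeScript (the loose `Meta` shape is a stand-in for the real struct, and the field list comes from the audit above):

```typescript
// Project the rich internal meta onto the subset the frontend reads.
type Meta = Record<string, unknown>;

const TRANSPORT_META_FIELDS = [
  "backend",
  "nodes",
  "edges",
  "graph_query_id",
  "node_limit",
  "edge_limit",
] as const;

function toTransportMeta(meta: Meta): Meta {
  const out: Meta = {};
  for (const k of TRANSPORT_META_FIELDS) {
    if (k in meta) out[k] = meta[k]; // copy only whitelisted fields
  }
  return out;
}
```

A whitelist (rather than deleting known-internal fields) means newly added internal fields stay internal by default.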
## Additional Architectural Notes

### A worker is complementary, not a transport format

Web Workers can move parsing/build work off the main thread, and `ArrayBuffer` is transferable. That is useful, but it does not by itself solve the current over-allocation problem if the payload is still a giant row-oriented JSON document.

Workers are most valuable when paired with:

- binary columnar payloads
- streamed columnar chunks
- transfer of `ArrayBuffer`s rather than giant JS object graphs

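Transfer semantics can be sketched with `structuredClone`'s transfer list, which has the same behavior as `worker.postMessage(buf, [buf])`: the buffer's ownership moves and the source becomes detached, so no copy of the bytes is made.

```typescript
// Demonstrate ArrayBuffer transfer: ownership moves, bytes are not copied.
const xs = new Float32Array([1, 2, 3]);
const buf = xs.buffer;
const moved = structuredClone(buf, { transfer: [buf] }) as ArrayBuffer;
const movedXs = new Float32Array(moved);
// After the transfer the source buffer is detached (byteLength 0),
// while the moved buffer still holds the data.
console.log(buf.byteLength, movedXs[0]);
```

This is why a binary columnar payload pairs so well with a worker: the decoded `ArrayBuffer` can be handed to the main thread for `renderer.init(...)` without a second copy.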
### The backend can keep a richer internal snapshot than it sends

This repo already caches snapshots server-side. Selection and triple queries are built from the backend snapshot and the small `graphMeta` values sent back by the client.

That means the frontend transport can be much slimmer than the backend snapshot representation, as long as the backend retains its richer internal data.

This is the cleanest way to avoid losing information while optimizing the frontend transport.

## Final Recommendation

### Best long-term option

Pick one of:

1. **Custom binary typed-array envelope**
2. **Apache Arrow IPC**

Reason:

- both map naturally to the renderer's actual input model
- both avoid the giant row-object parse path
- both can preserve all current frontend-visible information

### Best low-risk migration path

If you want an incremental step before going binary:

1. split the backend internal snapshot from the frontend transport DTO
2. move `/api/graph` to **columnar JSON**
3. keep only the metadata fields the frontend actually uses
4. later replace the same columnar DTO with Arrow or custom binary

That path reduces waste immediately and keeps the eventual binary migration straightforward.

## Sources

Official documentation and primary sources used for the comparison:

- MDN `Response.json()`
  - https://developer.mozilla.org/en-US/docs/Web/API/Response/json
- MDN `TextDecoderStream`
  - https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream
- MDN Web Workers
  - https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
- MDN Transferable Objects
  - https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects
- Apache Arrow JavaScript
  - https://arrow.apache.org/js/current/
  - https://arrow.apache.org/js/main/functions/Arrow.dom.tableFromIPC.html
- NDJSON specification
  - https://github.com/ndjson/ndjson-spec
- MessagePack for JavaScript
  - https://github.com/msgpack/msgpack-javascript
- FlatBuffers overview and JavaScript docs
  - https://flatbuffers.dev/
  - https://flatbuffers.dev/languages/javascript/
- Protocol Buffers overview
  - https://protobuf.dev/overview/
- Streaming JSON parser references
  - https://github.com/juanjoDiaz/streamparser-json
  - https://rictic.github.io/jsonriver/