docs: refresh pipeline notes and transport analysis
CURRENT_HIERARCHY_PIPELINE.md
# Current `subClassOf` / `BFO:entity` Pipeline

This document summarizes how the repository currently builds the hierarchy that ends up in the radial Sugiyama layout, with special attention to the fact that "start from `bfo:entity`" is **not** implemented in the initial `subClassOf` query.

`bfo:entity` (also written `BFO:entity` below) here means:

- `http://purl.obolibrary.org/obo/BFO_0000001`

## TL;DR

- The current code does **not** query "all `rdfs:subClassOf` relationships rooted at `bfo:entity`" directly.
- It first queries the **entire** `rdfs:subClassOf` graph.
- It builds an in-memory graph from those triples.
- Only later, in the Rust hierarchy layout bridge, does it filter that graph down to the descendant closure of the configured root IRI.
- Because of that, the "rooted at `bfo:entity`" behavior is currently coupled to the layout pipeline instead of existing as a reusable graph-extraction stage.

## Where The Request Starts

The frontend loads the hierarchy through the normal graph endpoint:

1. `frontend/src/App.tsx`
2. `GET /api/graph?graph_query_id=hierarchy`
3. `backend_go/server.go` -> `handleGraph`
4. `backend_go/snapshot_service.go` -> `Get`
5. `backend_go/graph_snapshot.go` -> `fetchGraphSnapshot`

Important consequence:

- The hierarchy is treated as a graph snapshot mode, not as a dedicated "query descendants of this root" pipeline.
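
To reproduce the request by hand, the URL can be built with Go's `net/url`. This is a minimal sketch: the helper name `hierarchyGraphURL` and the base URL `http://localhost:8080` are assumptions; only the `/api/graph` path and the `graph_query_id=hierarchy` parameter come from the flow above.

```go
package main

import (
	"fmt"
	"net/url"
)

// hierarchyGraphURL builds the snapshot request used for the hierarchy mode.
// Hypothetical helper: only the /api/graph path and the
// graph_query_id=hierarchy parameter are taken from the notes.
func hierarchyGraphURL(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/graph"
	q := url.Values{}
	q.Set("graph_query_id", "hierarchy")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	// The base URL is an assumption; use whatever host the Go backend listens on.
	s, err := hierarchyGraphURL("http://localhost:8080")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Note that the request carries no root parameter; the root is picked up later from backend configuration.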

## The Actual SPARQL Query Used For `hierarchy`

The `hierarchy` graph query is defined in:

- `backend_go/graph_queries/hierarchy.go`

It effectively does:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?p ?o
WHERE {
  VALUES ?p { rdfs:subClassOf }
  ?s ?p ?o .
  FILTER(!isLiteral(?o))
  # unless INCLUDE_BNODES=true, also: FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?p ?o
LIMIT ...
OFFSET ...
```

Important facts:

- It queries **all** `rdfs:subClassOf` triples.
- There is **no root restriction** here.
- There is **no `bfo:entity` filter** here.
- Blank nodes are excluded unless `INCLUDE_BNODES=true`.
- Objects that are literals are excluded.

## How The In-Memory Graph Is Built

Graph construction is handled by:

- `backend_go/graph_export.go`

The accumulator logic works like this:

- Every returned `?s` and `?o` becomes a node if it has not been seen before.
- There is no separate node query.
- A class only enters the graph if it appears in at least one fetched edge.
- Isolated classes with no fetched `subClassOf` edge never appear.
- If `node_limit` is reached, new nodes stop being added, and edges that depend on them are skipped.

Edge direction at this stage is:

- `Source = subclass (?s)`
- `Target = superclass (?o)`

So the raw in-memory graph is stored as:

- `subclass -> superclass`
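
A minimal model of the accumulator, assuming a hypothetical `accumulator` type. The real logic lives in `backend_go/graph_export.go` and may differ in details, for example in whether a subject admitted just before the limit keeps its node when its edge is dropped.

```go
package main

import "fmt"

// Edge is stored as Source = subclass (?s), Target = superclass (?o).
type Edge struct {
	Source, Target string
}

// accumulator is a hypothetical stand-in for the graph_export.go logic:
// nodes only exist because some fetched edge mentions them.
type accumulator struct {
	nodeLimit int
	nodes     map[string]bool
	order     []string // insertion order of node IDs
	edges     []Edge
}

func newAccumulator(nodeLimit int) *accumulator {
	return &accumulator{nodeLimit: nodeLimit, nodes: map[string]bool{}}
}

// admit reports whether id is already a node or can still be added
// without exceeding nodeLimit; it adds the node in the latter case.
func (a *accumulator) admit(id string) bool {
	if a.nodes[id] {
		return true
	}
	if len(a.order) >= a.nodeLimit {
		return false
	}
	a.nodes[id] = true
	a.order = append(a.order, id)
	return true
}

// addTriple records one (?s rdfs:subClassOf ?o) row. The && short-circuits,
// so ?o is not admitted when ?s was rejected; the edge is kept only if both
// endpoints are nodes.
func (a *accumulator) addTriple(s, o string) {
	if a.admit(s) && a.admit(o) {
		a.edges = append(a.edges, Edge{Source: s, Target: o})
	}
}

func main() {
	acc := newAccumulator(3)
	acc.addTriple("ex:Cat", "ex:Mammal")
	acc.addTriple("ex:Mammal", "ex:Animal")
	acc.addTriple("ex:Dog", "ex:Mammal") // ex:Dog would be a 4th node
	fmt.Println(acc.order, len(acc.edges))
}
```

With `node_limit` set to 3 in `main`, the third triple would introduce a fourth node (`ex:Dog`), so that edge is skipped while the first two are kept. The `ex:` identifiers are placeholders for IRIs.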

## Where `BFO:entity` Is Actually Applied

The root restriction happens only when the backend chooses the Rust hierarchy layout path.

Relevant files:

- `backend_go/config.go`
- `.env`
- `backend_go/graph_snapshot.go`
- `backend_go/hierarchy_layout_bridge.go`
- `radial_sugiyama/src/bridge.rs`

Current behavior:

- `.env` sets `HIERARCHY_LAYOUT_ENGINE=rust`.
- If `graph_query_id == "hierarchy"` and the engine is `rust`, the backend calls the Rust bridge.
- The root IRI comes from `HIERARCHY_LAYOUT_ROOT_IRI`.
- If that env var is not set, the checked-in default is `http://purl.obolibrary.org/obo/BFO_0000001`.

This means the current repository behavior is effectively:

- query all `subClassOf`
- then filter to descendants of `BFO:entity`
- then lay out the filtered graph

## What Go Sends To Rust

Before calling Rust, Go rewrites the edge orientation in:

- `backend_go/hierarchy_layout_bridge.go`

It converts each stored edge from:

- `subclass -> superclass`

into:

- `parentID = superclass`
- `childID = subclass`

So the Rust side receives:

- `superclass -> subclass`

Go also:

- de-duplicates repeated parent/child edges
- sends the configured `root_iri`
- sends all nodes that were present in the fetched hierarchy graph
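
A sketch of the flip and de-duplication, assuming hypothetical type names and JSON field names (only `root_iri` is named in these notes; the actual payload is defined in `backend_go/hierarchy_layout_bridge.go`):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Edge as stored in the snapshot: Source = subclass, Target = superclass.
type Edge struct{ Source, Target string }

// parentChild is the superclass -> subclass orientation the Rust side expects.
// The JSON tags are assumptions.
type parentChild struct {
	ParentID string `json:"parent_id"`
	ChildID  string `json:"child_id"`
}

// layoutRequest is a hypothetical payload; only root_iri is named in the notes.
type layoutRequest struct {
	RootIRI string        `json:"root_iri"`
	Nodes   []string      `json:"nodes"`
	Edges   []parentChild `json:"edges"`
}

// toParentChild flips subclass -> superclass into parent/child pairs and
// drops repeated pairs, keeping the first occurrence's position.
func toParentChild(edges []Edge) []parentChild {
	seen := make(map[parentChild]bool)
	out := make([]parentChild, 0, len(edges))
	for _, e := range edges {
		pc := parentChild{ParentID: e.Target, ChildID: e.Source}
		if seen[pc] {
			continue
		}
		seen[pc] = true
		out = append(out, pc)
	}
	return out
}

func main() {
	edges := []Edge{{"ex:Cat", "ex:Mammal"}, {"ex:Cat", "ex:Mammal"}, {"ex:Mammal", "ex:Animal"}}
	req := layoutRequest{
		RootIRI: "http://purl.obolibrary.org/obo/BFO_0000001",
		Nodes:   []string{"ex:Cat", "ex:Mammal", "ex:Animal"}, // all fetched nodes, unfiltered
		Edges:   toParentChild(edges),
	}
	body, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Using the `parentChild` struct itself as the map key makes de-duplication a plain set lookup.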

## How Rust Filters To Descendants Of The Root

Filtering happens in:

- `radial_sugiyama/src/bridge.rs`

The bridge logic does this:

1. Build an internal graph from the request.
2. Find the node whose label/IRI matches `root_iri`.
3. Build adjacency lists in the `parent -> child` direction.
4. Run a breadth-first (queue-based) traversal starting at the root.
5. Keep only the visited nodes.
6. Keep only edges whose endpoints are both visited.
7. Run the radial Sugiyama layout on that filtered subgraph.

Important consequences:

- Nodes outside the descendant closure of the root are dropped.
- Disconnected components are dropped.
- Ancestors of the root are dropped; an ancestor could only survive if it were also reachable as a descendant, which would require a `subClassOf` cycle through the root.
- If the root is missing, the pipeline fails with an error.
- If the root has no descendants, the pipeline fails with an error.
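
The same traversal, transliterated into Go for illustration. The real implementation is Rust in `radial_sugiyama/src/bridge.rs`; `descendantSubgraph` and `parentChild` are hypothetical names, nodes are matched by ID rather than label, and the error messages are made up.

```go
package main

import (
	"errors"
	"fmt"
)

// parentChild is one superclass -> subclass edge as received from Go.
type parentChild struct{ ParentID, ChildID string }

// descendantSubgraph keeps the root plus everything reachable from it along
// parent -> child edges (breadth-first), and the edges whose two endpoints
// were both visited. Go transliteration of the documented Rust steps.
func descendantSubgraph(nodes []string, edges []parentChild, root string) ([]string, []parentChild, error) {
	present := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		present[n] = true
	}
	if !present[root] {
		return nil, nil, errors.New("root not found: " + root)
	}
	// Adjacency lists in the parent -> child direction.
	children := make(map[string][]string)
	for _, e := range edges {
		children[e.ParentID] = append(children[e.ParentID], e.ChildID)
	}
	// BFS from the root; visited also protects against subClassOf cycles.
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if !visited[c] {
				visited[c] = true
				queue = append(queue, c)
			}
		}
	}
	if len(visited) == 1 {
		return nil, nil, errors.New("root has no descendants: " + root)
	}
	var keptNodes []string
	for _, n := range nodes {
		if visited[n] {
			keptNodes = append(keptNodes, n)
		}
	}
	var keptEdges []parentChild
	for _, e := range edges {
		if visited[e.ParentID] && visited[e.ChildID] {
			keptEdges = append(keptEdges, e)
		}
	}
	return keptNodes, keptEdges, nil
}

func main() {
	root := "http://purl.obolibrary.org/obo/BFO_0000001"
	nodes := []string{root, "ex:A", "ex:B", "ex:Top", "ex:Orphan"}
	edges := []parentChild{
		{"ex:Top", root}, // ex:Top is a parent of the root, so it is dropped
		{root, "ex:A"},
		{"ex:A", "ex:B"},
	}
	kept, keptEdges, err := descendantSubgraph(nodes, edges, root)
	if err != nil {
		panic(err)
	}
	fmt.Println(kept, len(keptEdges))
}
```

In `main`, `ex:Top` (an ancestor of the root) and `ex:Orphan` (disconnected) are both dropped, matching the consequences listed above.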

So the actual "select only those starting from `bfo:entity`" logic is:

- **graph traversal after fetching the full hierarchy**

not:

- **root-constrained SPARQL**

## What Comes Back From Rust

After Rust finishes:

- only the filtered nodes are returned
- only edges between retained nodes are returned
- routed edge segments are returned for drawing

That filtering is applied back onto the original Go snapshot response, so the final `/api/graph?graph_query_id=hierarchy` response only contains the root-descendant subgraph when the Rust path is active.
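
Applying the result back onto the snapshot amounts to an ID-set filter. A sketch with hypothetical, trimmed-down `Node`/`Edge`/`Snapshot` types; routed segments and any other layout data are omitted here.

```go
package main

import "fmt"

// Node, Edge and Snapshot are hypothetical stand-ins for the Go response types.
type Node struct{ ID string }
type Edge struct{ Source, Target string } // still subclass -> superclass

type Snapshot struct {
	Nodes []Node
	Edges []Edge
}

// restrictToLaidOut keeps only the snapshot nodes whose IDs Rust returned,
// and only the edges whose two endpoints both survived.
func restrictToLaidOut(s Snapshot, returnedIDs []string) Snapshot {
	keep := make(map[string]bool, len(returnedIDs))
	for _, id := range returnedIDs {
		keep[id] = true
	}
	var out Snapshot
	for _, n := range s.Nodes {
		if keep[n.ID] {
			out.Nodes = append(out.Nodes, n)
		}
	}
	for _, e := range s.Edges {
		if keep[e.Source] && keep[e.Target] {
			out.Edges = append(out.Edges, e)
		}
	}
	return out
}

func main() {
	snapshot := Snapshot{
		Nodes: []Node{{"root"}, {"ex:A"}, {"ex:Orphan"}},
		Edges: []Edge{{"ex:A", "root"}, {"ex:Orphan", "ex:Elsewhere"}},
	}
	// Suppose Rust returned only the root and ex:A.
	filtered := restrictToLaidOut(snapshot, []string{"root", "ex:A"})
	fmt.Println(len(filtered.Nodes), len(filtered.Edges))
}
```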

## Why This Feels Like A Separate Pipeline

The main reason it feels split is that the current behavior crosses multiple stages:

1. SPARQL query stage fetches the whole `subClassOf` graph.
2. Graph materialization stage builds a generic snapshot graph.
3. Layout bridge stage applies the root restriction.
4. Layout stage computes coordinates.

This means the "hierarchy rooted at `BFO:entity`" concept is currently embedded in layout preparation instead of existing as a first-class reusable data pipeline.

In practice, the root filtering is:

- not reusable by itself through a dedicated backend API
- not expressed in the initial SPARQL query
- not controlled per request
- tied to the hierarchy layout engine choice

## Selection Queries Are A Different Mechanism

The repository also has separate selection-query endpoints:

- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`
- `backend_go/selection_queries/neighbors.go`

Those are used after nodes are already present in a graph snapshot and the user selects node IDs.

They are **not** the mechanism that initially builds the `BFO:entity` hierarchy used by the radial layout.

Their role is more like:

- "given selected node IDs in the current snapshot, query related triples"

not:

- "materialize the hierarchy rooted at `BFO:entity`"

## Current End-To-End Behavior In One Sentence

The current system gets all `rdfs:subClassOf` triples first, constructs a general hierarchy graph, and only then filters it to the descendants of `http://purl.obolibrary.org/obo/BFO_0000001` inside the Rust radial Sugiyama bridge.

## Files To Read When Rewriting

If you want to rewrite this from scratch, these are the main files that define the current behavior:

- `backend_go/server.go`
- `backend_go/snapshot_service.go`
- `backend_go/graph_snapshot.go`
- `backend_go/graph_queries/hierarchy.go`
- `backend_go/graph_export.go`
- `backend_go/hierarchy_layout_bridge.go`
- `backend_go/config.go`
- `.env`
- `radial_sugiyama/src/bridge.rs`
- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`

## Rewrite-Oriented Takeaway

If your goal is a cleaner standalone pipeline for:

- query `rdfs:subClassOf`
- start from `bfo:entity`
- materialize only the rooted descendant hierarchy

then the current codebase is doing the root restriction too late. Right now, that concern lives in the layout bridge rather than in the query/materialization layer.