# Current `subClassOf` / `BFO:entity` Pipeline

This document summarizes how the repository currently builds the hierarchy that ends up in the radial Sugiyama layout, with special attention to the fact that "start from `bfo:entity`" is **not** implemented in the initial `subClassOf` query.

`bfo:entity` here means:

- `http://purl.obolibrary.org/obo/BFO_0000001`

## TL;DR

- The current code does **not** query "all `rdfs:subClassOf` relationships rooted at `bfo:entity`" directly.
- It first queries the **entire** `rdfs:subClassOf` graph.
- It builds an in-memory graph from those triples.
- Only later, in the Rust hierarchy layout bridge, it filters that graph to the descendant closure of the configured root IRI.
- Because of that, the "rooted at `bfo:entity`" behavior is currently coupled to the layout pipeline instead of existing as a reusable graph-extraction stage.

## Where The Request Starts

The frontend loads the hierarchy through the normal graph endpoint:

1. `frontend/src/App.tsx`
2. `GET /api/graph?graph_query_id=hierarchy`
3. `backend_go/server.go` -> `handleGraph`
4. `backend_go/snapshot_service.go` -> `Get`
5. `backend_go/graph_snapshot.go` -> `fetchGraphSnapshot`

Important consequence:

- The hierarchy is treated as a graph snapshot mode, not as a dedicated "query descendants of this root" pipeline.

## The Actual SPARQL Query Used For `hierarchy`

The `hierarchy` graph query is defined in:

- `backend_go/graph_queries/hierarchy.go`

It effectively does:

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?p ?o
WHERE {
  VALUES ?p { rdfs:subClassOf }
  ?s ?p ?o .
  FILTER(!isLiteral(?o))
  # optionally also FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?p ?o
LIMIT ...
OFFSET ...
```

Important facts:

- It queries **all** `rdfs:subClassOf` triples.
- There is **no root restriction** here.
- There is **no `bfo:entity` filter** here.
- Blank nodes are excluded unless `INCLUDE_BNODES=true`.
- Objects that are literals are excluded.

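As a minimal sketch of how such a paged query string could be assembled, assuming a hypothetical helper named `buildHierarchyQuery` (the real code in `backend_go/graph_queries/hierarchy.go` may be structured differently), the blank-node filter can be toggled by a flag that mirrors `INCLUDE_BNODES`. The `limit` and `offset` parameters are illustrative, since the actual page size is not stated here.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHierarchyQuery is a hypothetical helper, not the repository's function.
// It renders the unrooted rdfs:subClassOf query for one result page.
func buildHierarchyQuery(includeBNodes bool, limit, offset int) string {
	var b strings.Builder
	b.WriteString("PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n\n")
	b.WriteString("SELECT ?s ?p ?o\nWHERE {\n")
	b.WriteString("  VALUES ?p { rdfs:subClassOf }\n")
	b.WriteString("  ?s ?p ?o .\n")
	b.WriteString("  FILTER(!isLiteral(?o))\n")
	if !includeBNodes {
		// Mirrors the optional blank-node filter in the query above.
		b.WriteString("  FILTER(!isBlank(?s) && !isBlank(?o))\n")
	}
	b.WriteString("}\nORDER BY ?s ?p ?o\n")
	// ORDER BY keeps LIMIT/OFFSET paging stable across requests.
	fmt.Fprintf(&b, "LIMIT %d\nOFFSET %d\n", limit, offset)
	return b.String()
}
```

Note that nothing in this builder takes a root IRI, which is the point of this section: the query stage has no notion of `bfo:entity`.
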
## How The In-Memory Graph Is Built

Graph construction is handled by:

- `backend_go/graph_export.go`

The accumulator logic works like this:

- Every returned `?s` and `?o` becomes a node if it has not been seen before.
- There is no separate node query.
- A class only enters the graph if it appears in at least one fetched edge.
- Isolated classes with no fetched `subClassOf` edge never appear.
- If `node_limit` is reached, new nodes stop being added, and edges that depend on them are skipped.

Edge direction at this stage is:

- `Source = subclass (?s)`
- `Target = superclass (?o)`

So the raw in-memory graph is stored as:

- `subclass -> superclass`

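The accumulator behavior described above can be sketched roughly as follows. The type and method names (`Accumulator`, `AddTriple`, `Edge`) are hypothetical stand-ins, not the identifiers used in `backend_go/graph_export.go`, and the exact handling at the node limit is an assumption.

```go
package main

// Edge points from subclass (Source) to superclass (Target),
// using indices into Accumulator.Nodes.
type Edge struct{ Source, Target int }

// Accumulator is a hypothetical stand-in for the snapshot builder.
type Accumulator struct {
	NodeLimit int
	ids       map[string]int // IRI -> node index
	Nodes     []string
	Edges     []Edge
}

func NewAccumulator(nodeLimit int) *Accumulator {
	return &Accumulator{NodeLimit: nodeLimit, ids: make(map[string]int)}
}

// node returns the index for iri, creating the node only if the limit allows.
func (a *Accumulator) node(iri string) (int, bool) {
	if id, ok := a.ids[iri]; ok {
		return id, true
	}
	if len(a.Nodes) >= a.NodeLimit {
		return 0, false
	}
	id := len(a.Nodes)
	a.ids[iri] = id
	a.Nodes = append(a.Nodes, iri)
	return id, true
}

// AddTriple records one "?s rdfs:subClassOf ?o" row. It reports false when
// the edge is skipped because an endpoint could not be added. In this sketch
// a new subject may still be kept as a node when its object does not fit;
// the real code may differ.
func (a *Accumulator) AddTriple(s, o string) bool {
	src, ok := a.node(s)
	if !ok {
		return false
	}
	dst, ok := a.node(o)
	if !ok {
		return false
	}
	a.Edges = append(a.Edges, Edge{Source: src, Target: dst})
	return true
}
```

Because nodes are created only as a side effect of `AddTriple`, a class with no fetched `subClassOf` row never enters `Nodes`, which matches the "no separate node query" behavior.
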
## Where `BFO:entity` Is Actually Applied

The root restriction happens only when the backend chooses the Rust hierarchy layout path.

Relevant files:

- `backend_go/config.go`
- `.env`
- `backend_go/graph_snapshot.go`
- `backend_go/hierarchy_layout_bridge.go`
- `radial_sugiyama/src/bridge.rs`

Current behavior:

- `.env` sets `HIERARCHY_LAYOUT_ENGINE=rust`.
- If `graph_query_id == "hierarchy"` and the engine is `rust`, the backend calls the Rust bridge.
- The root IRI comes from `HIERARCHY_LAYOUT_ROOT_IRI`.
- If that env var is not set, the checked-in default is `http://purl.obolibrary.org/obo/BFO_0000001`.

This means the current repository behavior is effectively:

- query all `subClassOf`
- then filter to descendants of `BFO:entity`
- then lay out the filtered graph

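A minimal sketch of that configuration logic, assuming hypothetical function names (`rootIRI`, `useRustLayout`) rather than the ones in `backend_go/config.go`; only the environment variable names and the default IRI come from the description above.

```go
package main

import "os"

// defaultRootIRI is the checked-in default described above (bfo:entity).
const defaultRootIRI = "http://purl.obolibrary.org/obo/BFO_0000001"

// rootIRI resolves the layout root. The lookup function is injected so the
// logic can be exercised without touching the process environment.
func rootIRI(getenv func(string) string) string {
	if v := getenv("HIERARCHY_LAYOUT_ROOT_IRI"); v != "" {
		return v
	}
	return defaultRootIRI
}

// rootIRIFromEnv reads the real environment.
func rootIRIFromEnv() string {
	return rootIRI(os.Getenv)
}

// useRustLayout mirrors the condition under which the Rust bridge is called.
// engine is expected to hold the value of HIERARCHY_LAYOUT_ENGINE.
func useRustLayout(graphQueryID, engine string) bool {
	return graphQueryID == "hierarchy" && engine == "rust"
}
```

The sketch makes the coupling visible: the root IRI is process-wide configuration, and it only takes effect when `useRustLayout` is true.
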
## What Go Sends To Rust

Before calling Rust, Go rewrites the edge orientation in:

- `backend_go/hierarchy_layout_bridge.go`

It converts each stored edge from:

- `subclass -> superclass`

into:

- `parentID = superclass`
- `childID = subclass`

So the Rust side receives:

- `superclass -> subclass`

Go also:

- de-duplicates repeated parent/child edges
- sends the configured `root_iri`
- sends all nodes that were present in the fetched hierarchy graph

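The orientation flip and de-duplication can be sketched as below. `SnapshotEdge`, `LayoutEdge`, and `toLayoutEdges` are hypothetical names chosen for illustration; the actual request types in `backend_go/hierarchy_layout_bridge.go` are not shown in this document.

```go
package main

// SnapshotEdge is the stored orientation: subclass -> superclass.
type SnapshotEdge struct{ Source, Target string }

// LayoutEdge is the orientation sent to the layout: superclass -> subclass.
type LayoutEdge struct{ ParentID, ChildID string }

// toLayoutEdges flips every edge and drops repeated parent/child pairs,
// keeping the first occurrence so the output order is deterministic.
func toLayoutEdges(edges []SnapshotEdge) []LayoutEdge {
	seen := make(map[LayoutEdge]struct{}, len(edges))
	out := make([]LayoutEdge, 0, len(edges))
	for _, e := range edges {
		le := LayoutEdge{ParentID: e.Target, ChildID: e.Source}
		if _, dup := seen[le]; dup {
			continue
		}
		seen[le] = struct{}{}
		out = append(out, le)
	}
	return out
}
```

Using the struct itself as the map key avoids building string keys and works because both fields are comparable.
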
## How Rust Filters To Descendants Of The Root

Filtering happens in:

- `radial_sugiyama/src/bridge.rs`

The bridge logic does this:

1. Build an internal graph from the request.
2. Find the node whose label/IRI matches `root_iri`.
3. Build adjacency lists in the `parent -> child` direction.
4. Run a BFS/queue traversal starting at the root.
5. Keep only the visited nodes.
6. Keep only edges whose endpoints are both visited.
7. Run radial Sugiyama layout on that filtered subgraph.

Important consequences:

- Nodes outside the descendant closure of the root are dropped.
- Components that are not connected to the root are dropped.
- Ancestors of the root are not kept unless they are also reachable as descendants, which would require a cycle in the `subClassOf` graph.
- If the root is missing, the pipeline errors.
- If the root has no descendants, the pipeline errors.

So the actual "select only those starting from `bfo:entity`" logic is:

- **graph traversal after fetching the full hierarchy**

not:

- **root-constrained SPARQL**

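Steps 2 through 6 amount to a descendant-closure computation. The sketch below renders that logic in Go for consistency with the other examples in this document; it is not a translation of `radial_sugiyama/src/bridge.rs`, and all names (`HierarchyEdge`, `descendantClosure`, the error values) are hypothetical. It assumes every edge endpoint also appears in the node list.

```go
package main

import "errors"

// HierarchyEdge uses the parent -> child orientation received by the layout.
type HierarchyEdge struct{ Parent, Child string }

var (
	ErrRootMissing   = errors.New("root IRI not found in graph")
	ErrNoDescendants = errors.New("root has no descendants")
)

// descendantClosure keeps the root, everything reachable from it along
// parent -> child edges, and the edges between retained nodes.
func descendantClosure(nodes []string, edges []HierarchyEdge, root string) ([]string, []HierarchyEdge, error) {
	present := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		present[n] = true
	}
	if !present[root] {
		return nil, nil, ErrRootMissing
	}

	// Adjacency lists in the parent -> child direction.
	children := make(map[string][]string)
	for _, e := range edges {
		children[e.Parent] = append(children[e.Parent], e.Child)
	}

	// Breadth-first traversal from the root.
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if !visited[c] {
				visited[c] = true
				queue = append(queue, c)
			}
		}
	}
	if len(visited) == 1 {
		return nil, nil, ErrNoDescendants
	}

	var keptNodes []string
	for _, n := range nodes {
		if visited[n] {
			keptNodes = append(keptNodes, n)
		}
	}
	var keptEdges []HierarchyEdge
	for _, e := range edges {
		if visited[e.Parent] && visited[e.Child] {
			keptEdges = append(keptEdges, e)
		}
	}
	return keptNodes, keptEdges, nil
}
```

The visited set also guards against cycles, so the traversal terminates even if the fetched `subClassOf` graph is not a strict DAG.
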
## What Comes Back From Rust

After Rust finishes:

- only the filtered nodes are returned
- only edges between retained nodes are returned
- routed edge segments are returned for drawing

That filtering is applied back onto the original Go snapshot response, so the final `/api/graph?graph_query_id=hierarchy` response only contains the root-descendant subgraph when the Rust path is active.

## Why This Feels Like A Separate Pipeline

The main reason it feels split is that the current behavior crosses multiple stages:

1. SPARQL query stage fetches the whole `subClassOf` graph.
2. Graph materialization stage builds a generic snapshot graph.
3. Layout bridge stage applies the root restriction.
4. Layout stage computes coordinates.

This means the "hierarchy rooted at `BFO:entity`" concept is currently embedded in layout preparation instead of existing as a first-class reusable data pipeline.

In practice, the root filtering is:

- not reusable by itself through a dedicated backend API
- not expressed in the initial SPARQL query
- not controlled per request
- tied to the hierarchy layout engine choice

## Selection Queries Are A Different Mechanism

The repository also has separate selection-query endpoints:

- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`
- `backend_go/selection_queries/neighbors.go`

Those are used after nodes are already present in a graph snapshot and the user selects node IDs.

They are **not** the mechanism that initially builds the `BFO:entity` hierarchy used by the radial layout.

Their role is more like:

- "given selected node IDs in the current snapshot, query related triples"

not:

- "materialize the hierarchy rooted at `BFO:entity`"

## Current End-To-End Behavior In One Sentence

The current system gets all `rdfs:subClassOf` triples first, constructs a general hierarchy graph, and only then filters it to the descendants of `http://purl.obolibrary.org/obo/BFO_0000001` inside the Rust radial Sugiyama bridge.

## Files To Read When Rewriting

If you want to rewrite this from scratch, these are the main files that define the current behavior:

- `backend_go/server.go`
- `backend_go/snapshot_service.go`
- `backend_go/graph_snapshot.go`
- `backend_go/graph_queries/hierarchy.go`
- `backend_go/graph_export.go`
- `backend_go/hierarchy_layout_bridge.go`
- `backend_go/config.go`
- `.env`
- `radial_sugiyama/src/bridge.rs`
- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`

## Rewrite-Oriented Takeaway

If your goal is a cleaner standalone pipeline for:

- query `rdfs:subClassOf`
- start from `bfo:entity`
- materialize only the rooted descendant hierarchy

then the current codebase applies the root restriction too late. Right now, that concern lives in the layout bridge rather than in the query/materialization layer.

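For comparison, one possible shape of a root-constrained query is sketched below. This is not code from the repository: `buildRootedHierarchyQuery` is a hypothetical helper, and it assumes the triple store supports SPARQL 1.1 property paths. It keeps only `rdfs:subClassOf` edges whose superclass is the root or one of its descendants, which is the same edge set the BFS filter retains. Paging and blank-node handling are left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// buildRootedHierarchyQuery is a hypothetical helper. It renders a query that
// returns only subClassOf edges inside the descendant closure of rootIRI.
func buildRootedHierarchyQuery(rootIRI string) (string, error) {
	// Reject characters that are not allowed inside a SPARQL IRI reference,
	// so the value cannot break out of the <...> delimiters.
	if rootIRI == "" || strings.ContainsAny(rootIRI, "<>\"{}|^`\\ \n\t") {
		return "", fmt.Errorf("invalid root IRI %q", rootIRI)
	}
	query := "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n\n" +
		"SELECT ?s ?p ?o\n" +
		"WHERE {\n" +
		"  VALUES ?p { rdfs:subClassOf }\n" +
		"  ?s ?p ?o .\n" +
		// Zero-or-more path: ?o is the root itself or any descendant of it.
		"  ?o rdfs:subClassOf* <" + rootIRI + "> .\n" +
		"}\n" +
		"ORDER BY ?s ?p ?o\n"
	return query, nil
}
```

Whether pushing the restriction into the query is faster than post-filtering depends on the store and the ontology size, so this is a design option to evaluate rather than a drop-in replacement.
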