Current subClassOf / BFO:entity Pipeline
This document summarizes how the repository currently builds the hierarchy that ends up in the radial Sugiyama layout, with special attention to the fact that "start from bfo:entity" is not implemented in the initial subClassOf query.
bfo:entity here means:
http://purl.obolibrary.org/obo/BFO_0000001
TL;DR
- The current code does not query "all rdfs:subClassOf relationships rooted at bfo:entity" directly.
- It first queries the entire rdfs:subClassOf graph.
- It builds an in-memory graph from those triples.
- Only later, in the Rust hierarchy layout bridge, it filters that graph to the descendant closure of the configured root IRI.
- Because of that, the "rooted at bfo:entity" behavior is currently coupled to the layout pipeline instead of existing as a reusable graph-extraction stage.
Where The Request Starts
The frontend loads the hierarchy through the normal graph endpoint:
- frontend/src/App.tsx
- GET /api/graph?graph_query_id=hierarchy
- backend_go/server.go -> handleGraph
- backend_go/snapshot_service.go -> Get
- backend_go/graph_snapshot.go -> fetchGraphSnapshot
Important consequence:
- The hierarchy is treated as a graph snapshot mode, not as a dedicated "query descendants of this root" pipeline.
The Actual SPARQL Query Used For hierarchy
The hierarchy graph query is defined in:
backend_go/graph_queries/hierarchy.go
It effectively does:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o
WHERE {
VALUES ?p { rdfs:subClassOf }
?s ?p ?o .
FILTER(!isLiteral(?o))
# optionally also FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?p ?o
LIMIT ...
OFFSET ...
Important facts:
- It queries all rdfs:subClassOf triples.
- There is no root restriction here.
- There is no bfo:entity filter here.
- Blank nodes are excluded unless INCLUDE_BNODES=true.
- Objects that are literals are excluded.
How The In-Memory Graph Is Built
Graph construction is handled by:
backend_go/graph_export.go
The accumulator logic works like this:
- Every returned ?s and ?o becomes a node if it has not been seen before.
- There is no separate node query.
- A class only enters the graph if it appears in at least one fetched edge.
- Isolated classes with no fetched subClassOf edge never appear.
- If node_limit is reached, new nodes stop being added, and edges that depend on them are skipped.
Edge direction at this stage is:
- Source = subclass (?s)
- Target = superclass (?o)
So the raw in-memory graph is stored as:
subclass -> superclass
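The accumulator rules above can be sketched in Go as follows. This is a minimal illustration, not the code in backend_go/graph_export.go: the names Accumulator, NewAccumulator, ensureNode, and AddTriple are hypothetical, and the exact behavior when only one endpoint of an edge fits under node_limit is an assumption.

```go
package main

import "fmt"

// Edge is stored in the raw orientation: subclass -> superclass.
type Edge struct {
	Source string // subclass (?s)
	Target string // superclass (?o)
}

// Accumulator is a hypothetical stand-in for the graph-building logic.
type Accumulator struct {
	nodeLimit int
	seen      map[string]bool
	Nodes     []string
	Edges     []Edge
}

func NewAccumulator(nodeLimit int) *Accumulator {
	return &Accumulator{nodeLimit: nodeLimit, seen: make(map[string]bool)}
}

// ensureNode registers id if it is new and there is room under the limit.
// It reports whether id is part of the graph afterwards.
func (a *Accumulator) ensureNode(id string) bool {
	if a.seen[id] {
		return true
	}
	if len(a.Nodes) >= a.nodeLimit {
		return false // limit reached: new nodes stop being added
	}
	a.seen[id] = true
	a.Nodes = append(a.Nodes, id)
	return true
}

// AddTriple handles one (?s, ?o) row. Nodes exist only because an edge
// mentions them; an edge is skipped when either endpoint could not be added.
// Assumption: a subject that fit before its object was rejected stays in Nodes.
func (a *Accumulator) AddTriple(s, o string) {
	if !a.ensureNode(s) || !a.ensureNode(o) {
		return
	}
	a.Edges = append(a.Edges, Edge{Source: s, Target: o})
}

func main() {
	acc := NewAccumulator(3)
	acc.AddTriple("B", "A")
	acc.AddTriple("C", "A")
	acc.AddTriple("D", "A") // D would be a fourth node, so this edge is skipped
	fmt.Println(acc.Nodes, acc.Edges)
}
```

Because there is no separate node query in this model, a class that never appears in a fetched triple cannot enter Nodes at all.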
Where BFO:entity Is Actually Applied
The root restriction happens only when the backend chooses the Rust hierarchy layout path.
Relevant files:
- backend_go/config.go
- .env
- backend_go/graph_snapshot.go
- backend_go/hierarchy_layout_bridge.go
- radial_sugiyama/src/bridge.rs
Current behavior:
- .env sets HIERARCHY_LAYOUT_ENGINE=rust.
- If graph_query_id == "hierarchy" and the engine is rust, the backend calls the Rust bridge.
- The root IRI comes from HIERARCHY_LAYOUT_ROOT_IRI.
- If that env var is not set, the checked-in default is http://purl.obolibrary.org/obo/BFO_0000001.
This means the current repository behavior is effectively:
- query all subClassOf
- then filter to descendants of BFO:entity
- then lay out the filtered graph
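The configuration decision described above could look roughly like this in Go. It is a sketch under assumptions: envOr and useRustLayout are hypothetical helper names, treating an empty variable as unset is an assumption, and the fallback engine when HIERARCHY_LAYOUT_ENGINE is absent is not stated in this document, so it is left empty here.

```go
package main

import (
	"fmt"
	"os"
)

// defaultRootIRI is the checked-in default root (bfo:entity).
const defaultRootIRI = "http://purl.obolibrary.org/obo/BFO_0000001"

// envOr returns the value of key, or fallback when key is unset or empty.
// Hypothetical helper; the real config loader may differ.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

// useRustLayout mirrors the condition described above: the Rust bridge is
// used only for the hierarchy graph query and only when the engine is rust.
func useRustLayout(graphQueryID, engine string) bool {
	return graphQueryID == "hierarchy" && engine == "rust"
}

func main() {
	// The fallback engine is unknown from this document, so "" is used.
	engine := envOr("HIERARCHY_LAYOUT_ENGINE", "")
	rootIRI := envOr("HIERARCHY_LAYOUT_ROOT_IRI", defaultRootIRI)
	fmt.Println(useRustLayout("hierarchy", engine), rootIRI)
}
```

Note that both inputs are process-level settings, which is why the root cannot be chosen per request in the current design.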
What Go Sends To Rust
Before calling Rust, Go rewrites the edge orientation in:
backend_go/hierarchy_layout_bridge.go
It converts each stored edge from:
subclass -> superclass
into:
- parentID = superclass
- childID = subclass
So the Rust side receives:
superclass -> subclass
Go also:
- de-duplicates repeated parent/child edges
- sends the configured root_iri
- sends all nodes that were present in the fetched hierarchy graph
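The orientation flip and de-duplication can be illustrated with a short Go sketch. The type and function names (Edge, LayoutEdge, toLayoutEdges) are hypothetical and only mirror the parentID/childID naming used above; the actual request structure in backend_go/hierarchy_layout_bridge.go may differ.

```go
package main

import "fmt"

// Edge is the stored orientation: subclass -> superclass.
type Edge struct{ Source, Target string }

// LayoutEdge is the orientation sent to the layout side: superclass -> subclass.
type LayoutEdge struct{ ParentID, ChildID string }

// toLayoutEdges flips each stored edge and drops repeated parent/child pairs,
// keeping the first occurrence so the output order stays stable.
func toLayoutEdges(edges []Edge) []LayoutEdge {
	seen := make(map[LayoutEdge]bool, len(edges))
	out := make([]LayoutEdge, 0, len(edges))
	for _, e := range edges {
		le := LayoutEdge{ParentID: e.Target, ChildID: e.Source}
		if seen[le] {
			continue
		}
		seen[le] = true
		out = append(out, le)
	}
	return out
}

func main() {
	stored := []Edge{{"B", "A"}, {"B", "A"}, {"C", "A"}}
	fmt.Println(toLayoutEdges(stored))
}
```

Using the flipped struct itself as the map key works because a struct of two strings is comparable in Go, so no string concatenation is needed for the de-duplication key.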
How Rust Filters To Descendants Of The Root
Filtering happens in:
radial_sugiyama/src/bridge.rs
The bridge logic does this:
- Build an internal graph from the request.
- Find the node whose label/IRI matches root_iri.
- Build adjacency lists in the parent -> child direction.
- Run a BFS/queue traversal starting at the root.
- Keep only the visited nodes.
- Keep only edges whose endpoints are both visited.
- Run radial Sugiyama layout on that filtered subgraph.
Important consequences:
- Nodes outside the descendant closure of the root are dropped.
- Disconnected components are dropped.
- Ancestors of the root are not kept unless they are also reachable as descendants, which normally they are not.
- If the root is missing, the pipeline errors.
- If the root has no descendants, the pipeline errors.
So the actual "select only those starting from bfo:entity" logic is:
- graph traversal after fetching the full hierarchy
not:
- root-constrained SPARQL
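The traversal described in this section can be sketched as follows. The real filter lives in the Rust bridge (radial_sugiyama/src/bridge.rs); this Go version only illustrates the same steps, and the function name filterToDescendants, the error messages, and the assumption that every edge endpoint is also listed in nodes are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// LayoutEdge points from superclass (parent) to subclass (child).
type LayoutEdge struct{ ParentID, ChildID string }

// filterToDescendants keeps the root plus everything reachable from it by
// following parent -> child edges, and only edges between kept nodes.
func filterToDescendants(root string, nodes []string, edges []LayoutEdge) ([]string, []LayoutEdge, error) {
	present := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		present[n] = true
	}
	if !present[root] {
		return nil, nil, errors.New("root not found in graph")
	}

	// Adjacency lists in the parent -> child direction.
	children := make(map[string][]string)
	for _, e := range edges {
		children[e.ParentID] = append(children[e.ParentID], e.ChildID)
	}

	// Breadth-first traversal from the root.
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if !visited[c] {
				visited[c] = true
				queue = append(queue, c)
			}
		}
	}
	if len(visited) == 1 {
		return nil, nil, errors.New("root has no descendants")
	}

	var keptNodes []string
	for _, n := range nodes {
		if visited[n] {
			keptNodes = append(keptNodes, n)
		}
	}
	var keptEdges []LayoutEdge
	for _, e := range edges {
		if visited[e.ParentID] && visited[e.ChildID] {
			keptEdges = append(keptEdges, e)
		}
	}
	return keptNodes, keptEdges, nil
}

func main() {
	nodes := []string{"Top", "Root", "A", "B", "X", "Y"}
	edges := []LayoutEdge{{"Top", "Root"}, {"Root", "A"}, {"A", "B"}, {"X", "Y"}}
	fmt.Println(filterToDescendants("Root", nodes, edges))
}
```

In the demo input, Top is an ancestor of the root and X/Y form a disconnected component, so both are dropped, matching the consequences listed above.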
What Comes Back From Rust
After Rust finishes:
- only the filtered nodes are returned
- only edges between retained nodes are returned
- routed edge segments are returned for drawing
That filtering is applied back onto the original Go snapshot response, so the final /api/graph?graph_query_id=hierarchy response only contains the root-descendant subgraph when the Rust path is active.
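As a rough illustration of applying the retained set back onto the snapshot, the Go sketch below trims a snapshot to a list of retained node IDs. The Snapshot, Node, and Edge types and the restrictSnapshot function are hypothetical, and routed edge segments are left out entirely; the real response handling may be structured differently.

```go
package main

import "fmt"

type Node struct{ ID, Label string }

// Edge keeps the snapshot orientation: subclass -> superclass.
type Edge struct{ Source, Target string }

type Snapshot struct {
	Nodes []Node
	Edges []Edge
}

// restrictSnapshot returns a copy of snap containing only the retained nodes
// and the edges whose endpoints are both retained.
func restrictSnapshot(snap Snapshot, retained []string) Snapshot {
	keep := make(map[string]bool, len(retained))
	for _, id := range retained {
		keep[id] = true
	}
	var out Snapshot
	for _, n := range snap.Nodes {
		if keep[n.ID] {
			out.Nodes = append(out.Nodes, n)
		}
	}
	for _, e := range snap.Edges {
		if keep[e.Source] && keep[e.Target] {
			out.Edges = append(out.Edges, e)
		}
	}
	return out
}

func main() {
	snap := Snapshot{
		Nodes: []Node{{"Top", "top"}, {"Root", "root"}, {"A", "a"}},
		Edges: []Edge{{"Root", "Top"}, {"A", "Root"}},
	}
	fmt.Println(restrictSnapshot(snap, []string{"Root", "A"}))
}
```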
Why This Feels Like A Separate Pipeline
The main reason it feels split is that the current behavior crosses multiple stages:
- SPARQL query stage fetches the whole subClassOf graph.
- Graph materialization stage builds a generic snapshot graph.
- Layout bridge stage applies the root restriction.
- Layout stage computes coordinates.
This means the "hierarchy rooted at BFO:entity" concept is currently embedded in layout preparation instead of existing as a first-class reusable data pipeline.
In practice, the root filtering is:
- not reusable by itself through a dedicated backend API
- not expressed in the initial SPARQL query
- not controlled per request
- tied to the hierarchy layout engine choice
Selection Queries Are A Different Mechanism
The repository also has separate selection-query endpoints:
- backend_go/selection_queries/subclasses.go
- backend_go/selection_queries/superclasses.go
- backend_go/selection_queries/neighbors.go
Those are used after nodes are already present in a graph snapshot and the user selects node IDs.
They are not the mechanism that initially builds the BFO:entity hierarchy used by the radial layout.
Their role is more like:
- "given selected node IDs in the current snapshot, query related triples"
not:
- "materialize the hierarchy rooted at BFO:entity"
Current End-To-End Behavior In One Sentence
The current system gets all rdfs:subClassOf triples first, constructs a general hierarchy graph, and only then filters it to the descendants of http://purl.obolibrary.org/obo/BFO_0000001 inside the Rust radial Sugiyama bridge.
Files To Read When Rewriting
If you want to rewrite this from zero, these are the main files that define the current behavior:
- backend_go/server.go
- backend_go/snapshot_service.go
- backend_go/graph_snapshot.go
- backend_go/graph_queries/hierarchy.go
- backend_go/graph_export.go
- backend_go/hierarchy_layout_bridge.go
- backend_go/config.go
- .env
- radial_sugiyama/src/bridge.rs
- backend_go/selection_queries/subclasses.go
- backend_go/selection_queries/superclasses.go
Rewrite-Oriented Takeaway
If your goal is a cleaner standalone pipeline for:
- query rdfs:subClassOf
- start from bfo:entity
- materialize only the rooted descendant hierarchy
then the current codebase is doing the root restriction too late. Right now, that concern lives in the layout bridge rather than in the query/materialization layer.
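One possible shape for moving the root restriction into the query layer is a SPARQL 1.1 property path. The Go sketch below builds such a query; it is not something the repository currently does, buildRootedQuery is a hypothetical name, pagination is omitted, and rootIRI is assumed to be a trusted, well-formed IRI from configuration (it is inserted without escaping).

```go
package main

import "fmt"

// buildRootedQuery returns a query for every rdfs:subClassOf edge whose
// superclass is the root itself or a transitive subclass of it. The
// zero-or-more path (rdfs:subClassOf*) lets ?o match the root directly.
// Assumption: rootIRI is trusted configuration, not user input.
func buildRootedQuery(rootIRI string) string {
	return fmt.Sprintf(`PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?o
WHERE {
  ?s rdfs:subClassOf ?o .
  ?o rdfs:subClassOf* <%s> .
  FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?o
`, rootIRI)
}

func main() {
	fmt.Print(buildRootedQuery("http://purl.obolibrary.org/obo/BFO_0000001"))
}
```

Whether this performs acceptably depends on how the triple store evaluates property paths, so it would need to be measured against the current fetch-then-filter approach before adopting it.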