Oxy8 97a30ab769 docs: refresh pipeline notes and transport analysis

2026-04-06 13:36:53 -03:00

Current `subClassOf` / `BFO:entity` Pipeline

This document summarizes how the repository currently builds the hierarchy that ends up in the radial Sugiyama layout, with special attention to the fact that "start from bfo:entity" is not implemented in the initial subClassOf query.

bfo:entity here means:

http://purl.obolibrary.org/obo/BFO_0000001

TL;DR

The current code does not query "all rdfs:subClassOf relationships rooted at bfo:entity" directly.
It first queries the entire rdfs:subClassOf graph.
It builds an in-memory graph from those triples.
Only later, in the Rust hierarchy layout bridge, does it filter that graph to the descendant closure of the configured root IRI.
Because of that, the "rooted at bfo:entity" behavior is currently coupled to the layout pipeline instead of existing as a reusable graph-extraction stage.

Where The Request Starts

The frontend loads the hierarchy through the normal graph endpoint:

frontend/src/App.tsx
GET /api/graph?graph_query_id=hierarchy
backend_go/server.go -> handleGraph
backend_go/snapshot_service.go -> Get
backend_go/graph_snapshot.go -> fetchGraphSnapshot

Important consequence:

The hierarchy is treated as a graph snapshot mode, not as a dedicated "query descendants of this root" pipeline.

The Actual SPARQL Query Used For `hierarchy`

The hierarchy graph query is defined in:

backend_go/graph_queries/hierarchy.go

It effectively does:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?p ?o
WHERE {
  VALUES ?p { rdfs:subClassOf }
  ?s ?p ?o .
  FILTER(!isLiteral(?o))
  # unless INCLUDE_BNODES=true, also: FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?p ?o
LIMIT ...
OFFSET ...

Important facts:

It queries all rdfs:subClassOf triples.
There is no root restriction here.
There is no bfo:entity filter here.
Blank nodes are excluded unless INCLUDE_BNODES=true.
Objects that are literals are excluded.

How The In-Memory Graph Is Built

Graph construction is handled by:

backend_go/graph_export.go

The accumulator logic works like this:

Every returned ?s and ?o becomes a node if it has not been seen before.
There is no separate node query.
A class only enters the graph if it appears in at least one fetched edge.
Isolated classes with no fetched subClassOf edge never appear.
If node_limit is reached, new nodes stop being added, and edges that depend on them are skipped.
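
A minimal sketch of the accumulator rules listed above, assuming nodes are keyed by IRI and indexed in first-seen order. The type and method names (`accumulator`, `nodeID`, `addTriple`) are hypothetical and are not copied from backend_go/graph_export.go.

```go
package main

import "fmt"

// Edge points from subclass (Source) to superclass (Target), as in the text.
type Edge struct{ Source, Target int }

// accumulator is a hypothetical stand-in for the logic in graph_export.go.
type accumulator struct {
	nodeLimit int
	ids       map[string]int // IRI -> node index
	nodes     []string       // node index -> IRI
	edges     []Edge
}

func newAccumulator(nodeLimit int) *accumulator {
	return &accumulator{nodeLimit: nodeLimit, ids: map[string]int{}}
}

// nodeID returns the index for iri, creating the node only if the limit allows.
func (a *accumulator) nodeID(iri string) (int, bool) {
	if id, ok := a.ids[iri]; ok {
		return id, true
	}
	if len(a.nodes) >= a.nodeLimit {
		return 0, false // limit reached: no new nodes
	}
	id := len(a.nodes)
	a.ids[iri] = id
	a.nodes = append(a.nodes, iri)
	return id, true
}

// addTriple records one "?s rdfs:subClassOf ?o" row. The edge is skipped when
// either endpoint cannot be admitted. Assumption: an endpoint admitted just
// before the limit was hit stays in the node list even if its edge is skipped.
func (a *accumulator) addTriple(s, o string) {
	src, ok := a.nodeID(s)
	if !ok {
		return
	}
	dst, ok := a.nodeID(o)
	if !ok {
		return
	}
	a.edges = append(a.edges, Edge{Source: src, Target: dst})
}

func main() {
	acc := newAccumulator(2)
	acc.addTriple("ex:Continuant", "ex:Entity")
	acc.addTriple("ex:Occurrent", "ex:Entity") // skipped: would need a third node
	fmt.Println(len(acc.nodes), len(acc.edges))
}
```

Because nodes only come from fetched rows, a class that never appears as ?s or ?o has no way into this structure.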

Edge direction at this stage is:

Source = subclass (?s)
Target = superclass (?o)

So the raw in-memory graph is stored as:

subclass -> superclass

Where `BFO:entity` Is Actually Applied

The root restriction happens only when the backend chooses the Rust hierarchy layout path.

Relevant files:

backend_go/config.go
.env
backend_go/graph_snapshot.go
backend_go/hierarchy_layout_bridge.go
radial_sugiyama/src/bridge.rs

Current behavior:

.env sets HIERARCHY_LAYOUT_ENGINE=rust.
If graph_query_id == "hierarchy" and the engine is rust, the backend calls the Rust bridge.
The root IRI comes from HIERARCHY_LAYOUT_ROOT_IRI.
If that env var is not set, the checked-in default is http://purl.obolibrary.org/obo/BFO_0000001.

This means the current repository behavior is effectively:

query all subClassOf
then filter to descendants of BFO:entity
then lay out the filtered graph

What Go Sends To Rust

Before calling Rust, Go rewrites the edge orientation in:

backend_go/hierarchy_layout_bridge.go

It converts each stored edge from:

subclass -> superclass

into:

parentID = superclass
childID = subclass

So the Rust side receives:

superclass -> subclass

Go also:

de-duplicates repeated parent/child edges
sends the configured root_iri
sends all nodes that were present in the fetched hierarchy graph
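
The orientation flip and de-duplication can be sketched like this. The struct and function names (`Edge`, `LayoutEdge`, `toLayoutEdges`) are hypothetical; the actual request types in backend_go/hierarchy_layout_bridge.go are not shown in this document.

```go
package main

import "fmt"

// Edge is the stored orientation: Source = subclass, Target = superclass.
type Edge struct{ Source, Target int }

// LayoutEdge is the orientation handed to the layout side:
// ParentID = superclass, ChildID = subclass.
type LayoutEdge struct{ ParentID, ChildID int }

// toLayoutEdges flips each edge and drops repeated parent/child pairs,
// keeping the first occurrence so the output order is stable.
func toLayoutEdges(edges []Edge) []LayoutEdge {
	seen := make(map[LayoutEdge]struct{}, len(edges))
	out := make([]LayoutEdge, 0, len(edges))
	for _, e := range edges {
		le := LayoutEdge{ParentID: e.Target, ChildID: e.Source}
		if _, dup := seen[le]; dup {
			continue
		}
		seen[le] = struct{}{}
		out = append(out, le)
	}
	return out
}

func main() {
	stored := []Edge{{Source: 1, Target: 0}, {Source: 1, Target: 0}, {Source: 2, Target: 0}}
	fmt.Println(toLayoutEdges(stored))
}
```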

How Rust Filters To Descendants Of The Root

Filtering happens in:

radial_sugiyama/src/bridge.rs

The bridge logic does this:

Build an internal graph from the request.
Find the node whose label/IRI matches root_iri.
Build adjacency lists in the parent -> child direction.
Run a BFS/queue traversal starting at the root.
Keep only the visited nodes.
Keep only edges whose endpoints are both visited.
Run radial Sugiyama layout on that filtered subgraph.

Important consequences:

Nodes outside the descendant closure of the root are dropped.
Components that do not contain the root are dropped.
Ancestors of the root are not kept unless they are also reachable as descendants, which would require a cycle in the subClassOf graph.
If the root is missing, the pipeline errors.
If the root has no descendants, the pipeline errors.
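
The filtering steps and their consequences can be illustrated with a short breadth-first traversal. The real implementation is Rust code in radial_sugiyama/src/bridge.rs; the version below is a Go rendition written only for illustration, with hypothetical names (`LayoutEdge`, `filterToDescendants`) and string node IDs assumed for readability.

```go
package main

import (
	"errors"
	"fmt"
)

// LayoutEdge uses the parent -> child orientation described in the text.
type LayoutEdge struct{ Parent, Child string }

// filterToDescendants keeps only the root and the nodes reachable from it by
// following parent -> child edges, plus the edges between kept nodes.
func filterToDescendants(nodes []string, edges []LayoutEdge, root string) ([]string, []LayoutEdge, error) {
	found := false
	for _, n := range nodes {
		if n == root {
			found = true
			break
		}
	}
	if !found {
		return nil, nil, fmt.Errorf("root %q not found", root)
	}

	// Adjacency lists in the parent -> child direction.
	children := make(map[string][]string)
	for _, e := range edges {
		children[e.Parent] = append(children[e.Parent], e.Child)
	}

	// BFS from the root.
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if !visited[c] {
				visited[c] = true
				queue = append(queue, c)
			}
		}
	}
	if len(visited) == 1 {
		return nil, nil, errors.New("root has no descendants")
	}

	var keptNodes []string
	for _, n := range nodes {
		if visited[n] {
			keptNodes = append(keptNodes, n)
		}
	}
	var keptEdges []LayoutEdge
	for _, e := range edges {
		if visited[e.Parent] && visited[e.Child] {
			keptEdges = append(keptEdges, e)
		}
	}
	return keptNodes, keptEdges, nil
}

func main() {
	nodes := []string{"thing", "entity", "continuant", "other"}
	edges := []LayoutEdge{{"thing", "entity"}, {"entity", "continuant"}}
	fmt.Println(filterToDescendants(nodes, edges, "entity"))
}
```

In this sketch the ancestor `thing` and the unrelated node `other` are both dropped, matching the consequences listed above.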

So the actual "select only those starting from bfo:entity" logic is:

graph traversal after fetching the full hierarchy

not:

root-constrained SPARQL

What Comes Back From Rust

After Rust finishes:

only the filtered nodes are returned
only edges between retained nodes are returned
routed edge segments are returned for drawing

That filtering is applied back onto the original Go snapshot response, so the final /api/graph?graph_query_id=hierarchy response contains only the root-descendant subgraph when the Rust path is active.

Why This Feels Like A Separate Pipeline

The main reason it feels split is that the current behavior crosses multiple stages:

SPARQL query stage fetches the whole subClassOf graph.
Graph materialization stage builds a generic snapshot graph.
Layout bridge stage applies the root restriction.
Layout stage computes coordinates.

This means the "hierarchy rooted at BFO:entity" concept is currently embedded in layout preparation instead of existing as a first-class reusable data pipeline.

In practice, the root filtering is:

not reusable by itself through a dedicated backend API
not expressed in the initial SPARQL query
not controlled per request
tied to the hierarchy layout engine choice

Selection Queries Are A Different Mechanism

The repository also has separate selection-query endpoints:

backend_go/selection_queries/subclasses.go
backend_go/selection_queries/superclasses.go
backend_go/selection_queries/neighbors.go

Those are used after nodes are already present in a graph snapshot and the user selects node IDs.

They are not the mechanism that initially builds the BFO:entity hierarchy used by the radial layout.

Their role is more like:

"given selected node IDs in the current snapshot, query related triples"

not:

"materialize the hierarchy rooted at BFO:entity"

Current End-To-End Behavior In One Sentence

The current system gets all rdfs:subClassOf triples first, constructs a general hierarchy graph, and only then filters it to the descendants of http://purl.obolibrary.org/obo/BFO_0000001 inside the Rust radial Sugiyama bridge.

Files To Read When Rewriting

If you want to rewrite this from scratch, these are the main files that define the current behavior:

backend_go/server.go
backend_go/snapshot_service.go
backend_go/graph_snapshot.go
backend_go/graph_queries/hierarchy.go
backend_go/graph_export.go
backend_go/hierarchy_layout_bridge.go
backend_go/config.go
.env
radial_sugiyama/src/bridge.rs
backend_go/selection_queries/subclasses.go
backend_go/selection_queries/superclasses.go

Rewrite-Oriented Takeaway

If your goal is a cleaner standalone pipeline for:

query rdfs:subClassOf
start from bfo:entity
materialize only the rooted descendant hierarchy

then the current codebase is doing the root restriction too late. Right now, that concern lives in the layout bridge rather than in the query/materialization layer.
