# Current `subClassOf` / `BFO:entity` Pipeline
This document summarizes how the repository currently builds the hierarchy that ends up in the radial Sugiyama layout, with special attention to the fact that "start from `bfo:entity`" is **not** implemented in the initial `subClassOf` query.
`bfo:entity` here means:
- `http://purl.obolibrary.org/obo/BFO_0000001`
## TL;DR
- The current code does **not** query "all `rdfs:subClassOf` relationships rooted at `bfo:entity`" directly.
- It first queries the **entire** `rdfs:subClassOf` graph.
- It builds an in-memory graph from those triples.
- Only later, in the Rust hierarchy layout bridge, does it filter that graph to the descendant closure of the configured root IRI.
- Because of that, the "rooted at `bfo:entity`" behavior is currently coupled to the layout pipeline instead of existing as a reusable graph-extraction stage.
## Where The Request Starts
The frontend loads the hierarchy through the normal graph endpoint:
1. `frontend/src/App.tsx`
2. `GET /api/graph?graph_query_id=hierarchy`
3. `backend_go/server.go` -> `handleGraph`
4. `backend_go/snapshot_service.go` -> `Get`
5. `backend_go/graph_snapshot.go` -> `fetchGraphSnapshot`
Important consequence:
- The hierarchy is treated as a graph snapshot mode, not as a dedicated "query descendants of this root" pipeline.
## The Actual SPARQL Query Used For `hierarchy`
The `hierarchy` graph query is defined in:
- `backend_go/graph_queries/hierarchy.go`
It effectively does:
```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s ?p ?o
WHERE {
  VALUES ?p { rdfs:subClassOf }
  ?s ?p ?o .
  FILTER(!isLiteral(?o))
  # added unless INCLUDE_BNODES=true:
  # FILTER(!isBlank(?s) && !isBlank(?o))
}
ORDER BY ?s ?p ?o
LIMIT ...
OFFSET ...
```
Important facts:
- It queries **all** `rdfs:subClassOf` triples.
- There is **no root restriction** here.
- There is **no `bfo:entity` filter** here.
- Blank nodes are excluded unless `INCLUDE_BNODES=true`.
- Objects that are literals are excluded.
## How The In-Memory Graph Is Built
Graph construction is handled by:
- `backend_go/graph_export.go`
The accumulator logic works like this:
- Every returned `?s` and `?o` becomes a node if it has not been seen before.
- There is no separate node query.
- A class only enters the graph if it appears in at least one fetched edge.
- Isolated classes with no fetched `subClassOf` edge never appear.
- If `node_limit` is reached, new nodes stop being added, and edges that depend on them are skipped.
Edge direction at this stage is:
- `Source = subclass (?s)`
- `Target = superclass (?o)`
So the raw in-memory graph is stored as:
- `subclass -> superclass`
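The accumulator rules above can be sketched in Go as follows. The type and method names (`Accumulator`, `AddTriple`) are hypothetical stand-ins, not the identifiers used in `backend_go/graph_export.go`, and the exact handling of a triple whose second endpoint does not fit under `node_limit` is an assumption.

```go
package main

import "fmt"

// Edge is stored as subclass -> superclass, matching the raw orientation.
type Edge struct {
	Source string // subclass (?s)
	Target string // superclass (?o)
}

// Accumulator is a hypothetical stand-in for the snapshot accumulator.
type Accumulator struct {
	nodeLimit int
	nodes     map[string]bool
	Order     []string // nodes in first-seen order
	Edges     []Edge
}

func NewAccumulator(nodeLimit int) *Accumulator {
	return &Accumulator{nodeLimit: nodeLimit, nodes: make(map[string]bool)}
}

// ensureNode registers iri if it is new and there is room.
// It reports whether the node is present in the graph afterwards.
func (a *Accumulator) ensureNode(iri string) bool {
	if a.nodes[iri] {
		return true
	}
	if len(a.nodes) >= a.nodeLimit {
		return false // limit reached: no new nodes
	}
	a.nodes[iri] = true
	a.Order = append(a.Order, iri)
	return true
}

// AddTriple records one fetched "?s rdfs:subClassOf ?o" row. Nodes exist
// only because they appear in an edge; there is no separate node query.
// Assumption: if ?s fits but ?o does not, ?s stays and the edge is skipped.
func (a *Accumulator) AddTriple(s, o string) bool {
	if !a.ensureNode(s) || !a.ensureNode(o) {
		return false
	}
	a.Edges = append(a.Edges, Edge{Source: s, Target: o})
	return true
}

func main() {
	acc := NewAccumulator(3)
	acc.AddTriple("B", "A")
	acc.AddTriple("C", "A")
	acc.AddTriple("D", "C") // skipped: D would be a fourth node
	fmt.Println(acc.Order, len(acc.Edges)) // prints: [B A C] 2
}
```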
## Where `BFO:entity` Is Actually Applied
The root restriction happens only when the backend chooses the Rust hierarchy layout path.
Relevant files:
- `backend_go/config.go`
- `.env`
- `backend_go/graph_snapshot.go`
- `backend_go/hierarchy_layout_bridge.go`
- `radial_sugiyama/src/bridge.rs`
Current behavior:
- `.env` sets `HIERARCHY_LAYOUT_ENGINE=rust`.
- If `graph_query_id == "hierarchy"` and the engine is `rust`, the backend calls the Rust bridge.
- The root IRI comes from `HIERARCHY_LAYOUT_ROOT_IRI`.
- If that env var is not set, the checked-in default is `http://purl.obolibrary.org/obo/BFO_0000001`.
This means the current repository behavior is effectively:
- query all `subClassOf`
- then filter to descendants of `BFO:entity`
- then lay out the filtered graph
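As a rough illustration of the configuration rules above, the following Go sketch resolves the root IRI and decides whether the Rust bridge applies. The helper names (`hierarchyRootIRI`, `useRustBridge`) are hypothetical; only the environment variable names, the `hierarchy` query ID, and the default IRI come from the description above.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// defaultRootIRI is the checked-in default described above (bfo:entity).
const defaultRootIRI = "http://purl.obolibrary.org/obo/BFO_0000001"

// hierarchyRootIRI returns HIERARCHY_LAYOUT_ROOT_IRI, or the default when
// the variable is unset or blank. getenv is injected so the logic can be
// exercised without touching the process environment.
func hierarchyRootIRI(getenv func(string) string) string {
	if v := strings.TrimSpace(getenv("HIERARCHY_LAYOUT_ROOT_IRI")); v != "" {
		return v
	}
	return defaultRootIRI
}

// useRustBridge reports whether the root-filtering layout path is taken.
func useRustBridge(graphQueryID, engine string) bool {
	return graphQueryID == "hierarchy" && engine == "rust"
}

func main() {
	engine := os.Getenv("HIERARCHY_LAYOUT_ENGINE")
	fmt.Println("root:", hierarchyRootIRI(os.Getenv))
	fmt.Println("rust bridge:", useRustBridge("hierarchy", engine))
}
```

Both inputs are process-level settings, which is why the root cannot currently be chosen per request.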
## What Go Sends To Rust
Before calling Rust, Go rewrites the edge orientation in:
- `backend_go/hierarchy_layout_bridge.go`
It converts each stored edge from:
- `subclass -> superclass`
into:
- `parentID = superclass`
- `childID = subclass`
So the Rust side receives:
- `superclass -> subclass`
Go also:
- de-duplicates repeated parent/child edges
- sends the configured `root_iri`
- sends all nodes that were present in the fetched hierarchy graph
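The orientation flip and de-duplication can be sketched as below. The type and function names are hypothetical; only the `parentID`/`childID` mapping and the de-duplication step are taken from the description above.

```go
package main

import "fmt"

// StoredEdge is the raw snapshot orientation: subclass -> superclass.
type StoredEdge struct{ Source, Target string }

// LayoutEdge is the orientation sent to Rust: superclass -> subclass.
type LayoutEdge struct{ ParentID, ChildID string }

// toLayoutEdges flips each edge and drops repeated parent/child pairs,
// preserving the order of first occurrence.
func toLayoutEdges(stored []StoredEdge) []LayoutEdge {
	seen := make(map[LayoutEdge]bool, len(stored))
	out := make([]LayoutEdge, 0, len(stored))
	for _, e := range stored {
		le := LayoutEdge{ParentID: e.Target, ChildID: e.Source}
		if seen[le] {
			continue
		}
		seen[le] = true
		out = append(out, le)
	}
	return out
}

func main() {
	stored := []StoredEdge{
		{Source: "Continuant", Target: "Entity"},
		{Source: "Continuant", Target: "Entity"}, // duplicate row
		{Source: "Occurrent", Target: "Entity"},
	}
	fmt.Println(toLayoutEdges(stored))
	// prints: [{Entity Continuant} {Entity Occurrent}]
}
```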
## How Rust Filters To Descendants Of The Root
Filtering happens in:
- `radial_sugiyama/src/bridge.rs`
The bridge logic does this:
1. Build an internal graph from the request.
2. Find the node whose label/IRI matches `root_iri`.
3. Build adjacency lists in the `parent -> child` direction.
4. Run a BFS/queue traversal starting at the root.
5. Keep only the visited nodes.
6. Keep only edges whose endpoints are both visited.
7. Run radial Sugiyama layout on that filtered subgraph.
Important consequences:
- Nodes outside the descendant closure of the root are dropped.
- Components that are not connected to the root are dropped.
- Ancestors of the root are dropped unless they are also reachable as descendants, which would require a cycle and normally does not happen.
- If the root is missing, the pipeline errors.
- If the root has no descendants, the pipeline errors.
So the actual "select only those starting from `bfo:entity`" logic is:
- **graph traversal after fetching the full hierarchy**
not:
- **root-constrained SPARQL**
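The traversal steps above can be sketched as follows. The real implementation lives in Rust (`radial_sugiyama/src/bridge.rs`); this version is written in Go purely for illustration, all names are hypothetical, and the root is matched by plain string equality on the IRI, which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Edge uses the orientation the bridge receives: parent -> child.
type Edge struct{ Parent, Child string }

var (
	ErrRootMissing   = errors.New("root IRI not found among nodes")
	ErrNoDescendants = errors.New("root has no descendants")
)

// filterToDescendants keeps the root plus everything reachable from it
// along parent -> child edges, and only edges between kept nodes.
func filterToDescendants(nodes []string, edges []Edge, root string) ([]string, []Edge, error) {
	known := make(map[string]bool, len(nodes))
	for _, n := range nodes {
		known[n] = true
	}
	if !known[root] {
		return nil, nil, ErrRootMissing
	}
	// Adjacency lists in the parent -> child direction.
	children := make(map[string][]string)
	for _, e := range edges {
		children[e.Parent] = append(children[e.Parent], e.Child)
	}
	// Breadth-first traversal starting at the root.
	visited := map[string]bool{root: true}
	queue := []string{root}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, c := range children[cur] {
			if known[c] && !visited[c] {
				visited[c] = true
				queue = append(queue, c)
			}
		}
	}
	if len(visited) == 1 {
		return nil, nil, ErrNoDescendants
	}
	var keptNodes []string
	for _, n := range nodes {
		if visited[n] {
			keptNodes = append(keptNodes, n)
		}
	}
	var keptEdges []Edge
	for _, e := range edges {
		if visited[e.Parent] && visited[e.Child] {
			keptEdges = append(keptEdges, e)
		}
	}
	return keptNodes, keptEdges, nil
}

func main() {
	nodes := []string{"Top", "Entity", "Continuant", "Occurrent", "Other", "OtherChild"}
	edges := []Edge{
		{"Top", "Entity"}, // ancestor of the root: dropped
		{"Entity", "Continuant"},
		{"Entity", "Occurrent"},
		{"Other", "OtherChild"}, // unrelated component: dropped
	}
	keptNodes, keptEdges, err := filterToDescendants(nodes, edges, "Entity")
	fmt.Println(keptNodes, len(keptEdges), err)
	// prints: [Entity Continuant Occurrent] 2 <nil>
}
```

The short labels stand in for full IRIs. The `visited` set also makes the traversal terminate if the input contains a cycle.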
## What Comes Back From Rust
After Rust finishes:
- only the filtered nodes are returned
- only edges between retained nodes are returned
- routed edge segments are returned for drawing
That filtering is applied back onto the original Go snapshot response, so the final `/api/graph?graph_query_id=hierarchy` response only contains the root-descendant subgraph when the Rust path is active.
## Why This Feels Like A Separate Pipeline
The main reason it feels split is that the current behavior crosses multiple stages:
1. SPARQL query stage fetches the whole `subClassOf` graph.
2. Graph materialization stage builds a generic snapshot graph.
3. Layout bridge stage applies the root restriction.
4. Layout stage computes coordinates.
This means the "hierarchy rooted at `BFO:entity`" concept is currently embedded in layout preparation instead of existing as a first-class reusable data pipeline.
In practice, the root filtering is:
- not reusable by itself through a dedicated backend API
- not expressed in the initial SPARQL query
- not controlled per request
- tied to the hierarchy layout engine choice
## Selection Queries Are A Different Mechanism
The repository also has separate selection-query endpoints:
- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`
- `backend_go/selection_queries/neighbors.go`
Those are used after nodes are already present in a graph snapshot and the user selects node IDs.
They are **not** the mechanism that initially builds the `BFO:entity` hierarchy used by the radial layout.
Their role is more like:
- "given selected node IDs in the current snapshot, query related triples"
not:
- "materialize the hierarchy rooted at `BFO:entity`"
## Current End-To-End Behavior In One Sentence
The current system gets all `rdfs:subClassOf` triples first, constructs a general hierarchy graph, and only then filters it to the descendants of `http://purl.obolibrary.org/obo/BFO_0000001` inside the Rust radial Sugiyama bridge.
## Files To Read When Rewriting
If you want to rewrite this from scratch, these are the main files that define the current behavior:
- `backend_go/server.go`
- `backend_go/snapshot_service.go`
- `backend_go/graph_snapshot.go`
- `backend_go/graph_queries/hierarchy.go`
- `backend_go/graph_export.go`
- `backend_go/hierarchy_layout_bridge.go`
- `backend_go/config.go`
- `.env`
- `radial_sugiyama/src/bridge.rs`
- `backend_go/selection_queries/subclasses.go`
- `backend_go/selection_queries/superclasses.go`
## Rewrite-Oriented Takeaway
If your goal is a cleaner standalone pipeline for:
- query `rdfs:subClassOf`
- start from `bfo:entity`
- materialize only the rooted descendant hierarchy
then the current codebase is doing the root restriction too late. Right now, that concern lives in the layout bridge rather than in the query/materialization layer.