Import Solver + neighbors via sparql query

2026-03-04 13:49:14 -03:00
parent d4bfa5f064
commit a75b5b93da
15 changed files with 747 additions and 463 deletions
--- a/backend/app/README.md
+++ b/backend/app/README.md
@@ -32,6 +32,11 @@ Callers (frontend or other clients) interact with a single API surface (`/api/*`
  - Used by `/api/nodes`, `/api/edges`, and `rdflib`-mode `/api/stats`.
 - `pipelines/graph_snapshot.py`
  - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL (works for both RDFLib and AnzoGraph).
+- `pipelines/layout_dag_radial.py`
+  - DAG layout helpers used by `pipelines/graph_snapshot.py`:
+    - cycle detection
+    - level-synchronous Kahn layering
+    - radial (ring-per-layer) positioning
 - `pipelines/snapshot_service.py`
  - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run expensive SPARQL twice.
 - `pipelines/subclass_labels.py`
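The three helpers listed for `pipelines/layout_dag_radial.py` could fit together roughly as below. This is a minimal sketch, not the module's actual code. The function names `kahn_layers` and `radial_positions`, the `(parent, child)` edge orientation, and the `ring_spacing` parameter are all hypothetical. In this sketch, cycle detection falls out of Kahn's algorithm: if some nodes never reach in-degree zero, the graph is not a DAG.

```python
import math
from collections import defaultdict


def kahn_layers(num_nodes, edges):
    """Level-synchronous Kahn layering over dense node IDs 0..num_nodes-1.

    `edges` are (parent, child) pairs (hypothetical orientation). Layer 0
    holds every node with no incoming edge; layer k+1 holds the nodes whose
    last remaining parent was removed while processing layer k.
    Raises ValueError if the graph contains a cycle.
    """
    indegree = [0] * num_nodes
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    layers = []
    placed = 0
    while frontier:
        layers.append(frontier)
        placed += len(frontier)
        next_frontier = []
        for node in frontier:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_frontier.append(child)
        frontier = next_frontier

    if placed != num_nodes:
        # Any node that never reached in-degree 0 lies on or below a cycle.
        raise ValueError("graph contains a cycle; radial layout needs a DAG")
    return layers


def radial_positions(layers, ring_spacing=100.0):
    """Spread layer k evenly around a ring of radius (k + 1) * ring_spacing."""
    positions = {}
    for depth, layer in enumerate(layers):
        radius = (depth + 1) * ring_spacing
        for i, node in enumerate(layer):
            angle = 2 * math.pi * i / len(layer)
            positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```

For `rdfs:subClassOf` bindings `(?s, ?o)`, feeding `(o, s)` pairs would put root classes on the innermost ring; whether the real pipeline orients edges that way is an assumption. Starting the first ring at radius `ring_spacing` rather than 0 keeps multiple roots from collapsing onto the origin.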
@@ -64,6 +69,14 @@ RDFLib mode:
 - `TTL_PATH`: path inside the backend container to a `.ttl` file (example: `/data/o3po.ttl`)
 - `MAX_TRIPLES`: optional int; if set, stops parsing after this many triples

+Optional import-combining step (runs before the SPARQL engine starts):
+
+- `COMBINE_OWL_IMPORTS_ON_START`: `true` to load `TTL_PATH` (or `COMBINE_ENTRY_LOCATION`), recursively follow its `owl:imports`, and write the merged result to a single combined TTL file.
+- `COMBINE_ENTRY_LOCATION`: optional override for the entry file/URL to load (defaults to `TTL_PATH`)
+- `COMBINE_OUTPUT_LOCATION`: optional explicit output path (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
+- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
+- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists
+
 AnzoGraph mode:

 - `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
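A standard-library sketch of how the `COMBINE_*` variables could resolve to concrete paths and how the recursive import walk might proceed. The names `resolve_combine_settings`, `needs_rebuild`, `import_closure`, and the `get_imports` hook are hypothetical, and case-insensitive parsing of `true` is an assumption. Presumably the real hook parses each location with an RDF library and returns the objects of its `owl:imports` triples before the merged graph is serialized as Turtle.

```python
import os
from collections import deque


def _is_true(value):
    # Assumption: boolean env vars are compared case-insensitively to "true".
    return str(value).strip().lower() == "true"


def resolve_combine_settings(env):
    """Apply the documented COMBINE_* defaults; None means combining is off."""
    if not _is_true(env.get("COMBINE_OWL_IMPORTS_ON_START", "")):
        return None
    entry = env.get("COMBINE_ENTRY_LOCATION") or env["TTL_PATH"]
    output = env.get("COMBINE_OUTPUT_LOCATION")
    if not output:
        name = env.get("COMBINE_OUTPUT_NAME") or "combined_ontology.ttl"
        output = os.path.join(os.path.dirname(entry), name)
    return {
        "entry": entry,
        "output": output,
        "force": _is_true(env.get("COMBINE_FORCE", "")),
    }


def needs_rebuild(settings):
    """Rebuild when forced or when the combined file does not exist yet."""
    return settings["force"] or not os.path.exists(settings["output"])


def import_closure(entry, get_imports):
    """Breadth-first walk of owl:imports starting from `entry`.

    `get_imports(location)` is a hypothetical hook returning the locations a
    document imports. The `seen` set stops ontologies that import each other
    from being visited forever.
    """
    seen = {entry}
    order = []
    queue = deque([entry])
    while queue:
        location = queue.popleft()
        order.append(location)
        for imported in get_imports(location):
            if imported not in seen:
                seen.add(imported)
                queue.append(imported)
    return order
```

For example, with `TTL_PATH=/data/o3po.ttl` and no overrides, the combined file would land at `/data/combined_ontology.ttl`.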
@@ -129,8 +142,8 @@ Returned in `nodes[]` (dense IDs; suitable for indexing in typed arrays):
 - `id`: integer dense node ID used in edges
 - `termType`: `"uri"` or `"bnode"`
 - `iri`: URI string; blank nodes are normalized to `_:<id>`
-- `label`: currently `null` in `/api/graph` snapshots (pipelines can be used to populate later)
-- `x`/`y`: world-space coordinates for rendering (currently a deterministic spiral layout)
+- `label`: `rdfs:label` when available (best-effort; prefers English)
+- `x`/`y`: world-space coordinates for rendering (currently a radial layered layout derived from `rdfs:subClassOf`)

 ### Edge

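One way to implement the "best-effort; prefers English" label rule, given every `rdfs:label` value for a node as `(text, language_tag)` pairs. The helper name `pick_label` and the fallback order (English, then untagged, then any other language) are assumptions, not confirmed pipeline behavior.

```python
def pick_label(candidates):
    """Choose a display label from (text, lang) pairs.

    Preference (assumed): English ("en" or "en-*"), then untagged literals,
    then any other language. Ties keep the first candidate seen.
    Returns None when there are no candidates.
    """
    def rank(candidate):
        _text, lang = candidate
        lang = (lang or "").lower()
        if lang == "en" or lang.startswith("en-"):
            return 0
        if not lang:
            return 1
        return 2

    if not candidates:
        return None
    # min() returns the first element with the lowest rank.
    return min(candidates, key=rank)[0]
```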
@@ -149,11 +162,10 @@ Returned in `edges[]`:

 ## Snapshot Query (`/api/graph`)

-`/api/graph` uses a SPARQL query that:
+`/api/graph` currently uses a SPARQL query that returns only `rdfs:subClassOf` edges:

-- selects triples `?s ?p ?o`
-- excludes literal objects (`FILTER(!isLiteral(?o))`)
-- excludes `rdfs:label`, `skos:prefLabel`, and `skos:altLabel` predicates
+- selects bindings as `?s ?p ?o` (with `?p` bound to `rdfs:subClassOf`)
+- excludes literal objects (`FILTER(!isLiteral(?o))`) as a defensive check, since `rdfs:subClassOf` objects should be classes rather than literals
 - optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
 - applies `LIMIT edge_limit`

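The query shape described in this hunk can be approximated as below. `build_snapshot_query` is a hypothetical helper, not the pipeline's actual query text: binding `?p` with `VALUES` and excluding blank nodes with `isBlank` are assumptions about how the documented behavior is achieved, and `include_bnodes` stands in for the `INCLUDE_BNODES` setting.

```python
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def build_snapshot_query(edge_limit, include_bnodes=False):
    """Build a SPARQL 1.1 SELECT returning only rdfs:subClassOf edges."""
    filters = ["FILTER(!isLiteral(?o))"]
    if not include_bnodes:
        # Drop edges touching blank nodes unless INCLUDE_BNODES is enabled.
        filters.append("FILTER(!isBlank(?s) && !isBlank(?o))")
    filter_block = "\n  ".join(filters)
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        f"  VALUES ?p {{ <{RDFS_SUBCLASSOF}> }}\n"
        "  ?s ?p ?o .\n"
        f"  {filter_block}\n"
        "}\n"
        f"LIMIT {int(edge_limit)}"
    )
```

Keeping `?p` in the projection, even though it is fixed, lets the result bindings keep the same `?s ?p ?o` shape the rest of the pipeline already expects.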
@@ -161,6 +173,8 @@ The result bindings are mapped to dense node IDs (first-seen order) and returned

 `/api/graph` also returns `meta` with snapshot counts and engine info so the frontend doesn't need to call `/api/stats`.

+If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph` returns HTTP 422 (layout requires a DAG).
+
 ## Pipelines

 ### `pipelines/graph_snapshot.py`
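The dense-ID mapping (first-seen order) and the HTTP 422 cycle rule from the snapshot query section can be sketched as follows. `to_dense_graph`, `has_cycle`, and `snapshot_status` are hypothetical names, and the cycle check here is an iterative three-color depth-first search, which may differ from the detection actually used in `pipelines/layout_dag_radial.py`.

```python
def to_dense_graph(bindings):
    """Map (s, o) term pairs to dense integer IDs in first-seen order."""
    ids = {}
    nodes = []
    edges = []

    def intern(term):
        if term not in ids:
            ids[term] = len(nodes)
            nodes.append(term)
        return ids[term]

    for s, o in bindings:
        edges.append((intern(s), intern(o)))
    return nodes, edges


def has_cycle(num_nodes, edges):
    """Iterative three-color DFS: reaching a GRAY node means a back edge."""
    adjacency = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        adjacency[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * num_nodes
    for start in range(num_nodes):
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(adjacency[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if color[child] == GRAY:
                    return True
                if color[child] == WHITE:
                    color[child] = GRAY
                    stack.append((child, iter(adjacency[child])))
                    break
            else:
                # All children finished: node leaves the current DFS path.
                color[node] = BLACK
                stack.pop()
    return False


def snapshot_status(bindings):
    """HTTP status a /api/graph-style handler might return for these bindings."""
    nodes, edges = to_dense_graph(bindings)
    return 422 if has_cycle(len(nodes), edges) else 200
```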