Import Solver + neighbors via sparql query

Reorganiza backend
Graph access via SPARQL
2026-03-04 13:49:14 -03:00 · 2026-03-02 17:33:45 -03:00 · 2026-03-02 16:27:28 -03:00 · 2026-03-02 14:32:42 -03:00
38 changed files with 201994 additions and 200079 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,9 @@
 .direnv/
 .envrc
 .env
 backend/.env
 frontend/node_modules/
 frontend/dist/
 .npm/
 .vite/
 data/
--- a/backend/Dockerfile
+++ b/backend/Dockerfile
@@ -0,0 +1,16 @@
 FROM python:3.12-slim
 WORKDIR /app
 ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
 COPY requirements.txt /app/requirements.txt
 RUN pip install --no-cache-dir -r /app/requirements.txt
 COPY app /app/app
 EXPOSE 8000
 CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
--- a/backend/app/README.md
+++ b/backend/app/README.md
@@ -0,0 +1,197 @@
 # Backend App (`backend/app`)
 This folder contains the FastAPI backend for `visualizador_instanciados`.
 The backend can execute SPARQL queries in two interchangeable ways:
 1. **`GRAPH_BACKEND=rdflib`**: parse a Turtle file into an in-memory RDFLib `Graph` and run SPARQL queries locally.
 2. **`GRAPH_BACKEND=anzograph`**: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally `LOAD` a TTL on startup).
 Callers (frontend or other clients) interact with a single API surface (`/api/*`) and do not need to know which backend is configured.
 ## Files
 - `main.py`
  - FastAPI app setup, startup/shutdown (`lifespan`), and HTTP endpoints.
 - `settings.py`
  - Env-driven configuration (`pydantic-settings`).
 - `sparql_engine.py`
  - Backend-agnostic SPARQL execution layer:
    - `RdflibEngine`: `Graph.query(...)` + SPARQL JSON serialization.
    - `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
  - `create_sparql_engine(settings)` chooses the engine based on `GRAPH_BACKEND`.
 - `graph_export.py`
  - Shared helpers to:
    - build the snapshot SPARQL query used for edge retrieval
    - map SPARQL JSON bindings to `{nodes, edges}`.
 - `models.py`
  - Pydantic response/request models:
    - `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
 - `rdf_store.py`
  - A local parsed representation (dense IDs + neighbor-ish data) built only in `GRAPH_BACKEND=rdflib`.
  - Used by `/api/nodes`, `/api/edges`, and `rdflib`-mode `/api/stats`.
 - `pipelines/graph_snapshot.py`
  - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL (works for both RDFLib and AnzoGraph).
 - `pipelines/layout_dag_radial.py`
  - DAG layout helpers used by `pipelines/graph_snapshot.py`:
    - cycle detection
    - level-synchronous Kahn layering
    - radial (ring-per-layer) positioning.
 - `pipelines/snapshot_service.py`
  - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run expensive SPARQL twice.
 - `pipelines/subclass_labels.py`
  - Pipeline to extract `rdfs:subClassOf` entities and aligned `rdfs:label` list.
 ## Runtime Flow
 On startup (FastAPI lifespan):
 1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
 2. The engine is stored at `app.state.sparql`.
 3. If `GRAPH_BACKEND=rdflib`, `RDFStore` is also built from the already-loaded RDFLib graph and stored at `app.state.store`.
 On shutdown:
 - `app.state.sparql.shutdown()` is called to close the HTTP client (AnzoGraph mode) or no-op (RDFLib mode).
 ## Environment Variables
 Most configuration is intended to be provided via container environment variables (see repo root `.env` and `docker-compose.yml`).
 Core:
 - `GRAPH_BACKEND`: `rdflib` or `anzograph`
 - `INCLUDE_BNODES`: `true`/`false`
 - `CORS_ORIGINS`: comma-separated list or `*`
 RDFLib mode:
 - `TTL_PATH`: path inside the backend container to a `.ttl` file (example: `/data/o3po.ttl`)
 - `MAX_TRIPLES`: optional int; if set, stops parsing after this many triples
 Optional import-combining step (runs before the SPARQL engine starts):
 - `COMBINE_OWL_IMPORTS_ON_START`: `true` to recursively load `TTL_PATH` (or `COMBINE_ENTRY_LOCATION`) plus `owl:imports` and write a combined TTL file.
 - `COMBINE_ENTRY_LOCATION`: optional override for the entry file/URL to load (defaults to `TTL_PATH`)
 - `COMBINE_OUTPUT_LOCATION`: optional explicit output path (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
 - `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
 - `COMBINE_FORCE`: `true` to rebuild even if the output file already exists
 AnzoGraph mode:
 - `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
 - `SPARQL_ENDPOINT`: optional full endpoint; if set, overrides `${SPARQL_HOST}/sparql`
 - `SPARQL_USER`, `SPARQL_PASS`: Basic auth credentials
 - `SPARQL_DATA_FILE`: file URI as seen by the **AnzoGraph container** (example: `file:///opt/shared-files/o3po.ttl`)
 - `SPARQL_GRAPH_IRI`: optional graph IRI for `LOAD ... INTO GRAPH <...>`
 - `SPARQL_LOAD_ON_START`: `true` to execute `LOAD <SPARQL_DATA_FILE>` during startup
 - `SPARQL_CLEAR_ON_START`: `true` to execute `CLEAR ALL` during startup (dangerous)
 - `SPARQL_TIMEOUT_S`: request timeout for normal SPARQL requests
 - `SPARQL_READY_RETRIES`, `SPARQL_READY_DELAY_S`, `SPARQL_READY_TIMEOUT_S`: readiness gate parameters
 ## AnzoGraph Readiness Gate
 `AnzoGraphEngine` does not assume "container started" means "SPARQL works".
 It waits for a smoke-test POST:
 - Method: `POST ${SPARQL_ENDPOINT}`
 - Headers:
  - `Content-Type: application/x-www-form-urlencoded`
  - `Accept: application/sparql-results+json`
  - `Authorization: Basic ...` (if configured)
 - Body: `query=ASK WHERE { ?s ?p ?o }`
 - Success condition: HTTP 2xx and response parses as JSON
 This matches the behavior described in `docs/anzograph-readiness-julia.md`.
 ## API Endpoints
 - `GET /api/health`
  - Returns `{ "status": "ok" }`.
 - `GET /api/stats`
  - Returns counts for the same snapshot used by `/api/graph` (via the snapshot cache).
 - `POST /api/sparql`
  - Body: `{ "query": "<SPARQL SELECT/ASK>" }`
  - Returns SPARQL JSON results as-is.
  - Notes:
    - This endpoint is intended for **SELECT/ASK returning SPARQL-JSON**.
    - SPARQL UPDATE is not exposed here (AnzoGraph `LOAD`/`CLEAR` are handled internally during startup).
 - `GET /api/graph?node_limit=...&edge_limit=...`
  - Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
  - Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
 - `GET /api/nodes`, `GET /api/edges`
  - Only available in `GRAPH_BACKEND=rdflib` (these use `RDFStore`'s dense ID tables).
 ## Data Contract
 ### Node
 Returned in `nodes[]` (dense IDs; suitable for indexing in typed arrays):
 ```json
 {
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
 }
 ```
 - `id`: integer dense node ID used in edges
 - `termType`: `"uri"` or `"bnode"`
 - `iri`: URI string; blank nodes are normalized to `_:<id>`
 - `label`: `rdfs:label` when available (best-effort; prefers English)
 - `x`/`y`: world-space coordinates for rendering (currently a radial layered layout derived from `rdfs:subClassOf`)
 ### Edge
 Returned in `edges[]`:
 ```json
 {
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
 }
 ```
 - `source`/`target`: dense node IDs (indexes into `nodes[]`)
 - `predicate`: predicate IRI string
 ## Snapshot Query (`/api/graph`)
 `/api/graph` currently uses a SPARQL query that returns only `rdfs:subClassOf` edges:
 - selects bindings as `?s ?p ?o` (with `?p` bound to `rdfs:subClassOf`)
 - excludes literal objects (`FILTER(!isLiteral(?o))`) for safety
 - optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
 - applies `LIMIT edge_limit`
 The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
 `/api/graph` also returns `meta` with snapshot counts and engine info so the frontend doesn't need to call `/api/stats`.
 If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph` returns HTTP 422 (layout requires a DAG).
 ## Pipelines
 ### `pipelines/graph_snapshot.py`
 `fetch_graph_snapshot(...)` is the main "export graph" pipeline used by `/api/graph`.
 ### `pipelines/subclass_labels.py`
 `extract_subclass_entities_and_labels(...)`:
 1. Queries all `rdfs:subClassOf` triples.
 2. Builds a unique set of subjects+objects, then converts it to a deterministic list.
 3. Queries `rdfs:label` for those entities and returns aligned lists:
   - `entities[i]` corresponds to `labels[i]`.
 ## Notes / Tradeoffs
 - `/api/graph` returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
 - RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
 - `rdf_store.py` is currently only needed for `/api/nodes`, `/api/edges`, and rdflib-mode `/api/stats`. If you don't use those endpoints, it can be removed later.
--- a/backend/app/init.py
+++ b/backend/app/init.py
@@ -0,0 +1 @@
--- a/backend/app/graph_export.py
+++ b/backend/app/graph_export.py
@@ -0,0 +1,102 @@
 from __future__ import annotations
 from typing import Any
 def edge_retrieval_query(*, edge_limit: int, include_bnodes: bool) -> str:
    bnode_filter = "" if include_bnodes else "FILTER(!isBlank(?s) && !isBlank(?o))"
    return f"""
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 SELECT ?s ?p ?o
 WHERE {{
  {{
    VALUES ?p {{ rdf:type }}
    ?s ?p ?o .
    ?o rdf:type owl:Class .
  }}
  UNION
  {{
    VALUES ?p {{ rdfs:subClassOf }}
    ?s ?p ?o .
  }}
  FILTER(!isLiteral(?o))
  {bnode_filter}
 }}
 LIMIT {edge_limit}
 """
 def graph_from_sparql_bindings(
    bindings: list[dict[str, Any]],
    *,
    node_limit: int,
    include_bnodes: bool,
 ) -> tuple[list[dict[str, object]], list[dict[str, object]]]:
    """
    Convert SPARQL JSON results bindings into:
      nodes: [{id, termType, iri, label}]
      edges: [{source, target, predicate}]
    IDs are assigned densely (0..N-1) based on first occurrence in bindings.
    """
    node_id_by_key: dict[tuple[str, str], int] = {}
    node_meta: list[tuple[str, str]] = []  # (termType, iri)
    out_edges: list[dict[str, object]] = []
    def term_to_key_and_iri(term: dict[str, Any]) -> tuple[tuple[str, str], tuple[str, str]] | None:
        t = term.get("type")
        v = term.get("value")
        if not t or v is None:
            return None
        if t == "literal":
            return None
        if t == "bnode":
            if not include_bnodes:
                return None
            # SPARQL JSON uses bnode identifiers without the "_:" prefix; we normalize to "_:id".
            return (("bnode", str(v)), ("bnode", f"_:{v}"))
        # Default to "uri".
        return (("uri", str(v)), ("uri", str(v)))
    def get_or_add(term: dict[str, Any]) -> int | None:
        out = term_to_key_and_iri(term)
        if out is None:
            return None
        key, meta = out
        existing = node_id_by_key.get(key)
        if existing is not None:
            return existing
        if len(node_meta) >= node_limit:
            return None
        nid = len(node_meta)
        node_id_by_key[key] = nid
        node_meta.append(meta)
        return nid
    for b in bindings:
        s_term = b.get("s") or {}
        o_term = b.get("o") or {}
        p_term = b.get("p") or {}
        sid = get_or_add(s_term)
        oid = get_or_add(o_term)
        if sid is None or oid is None:
            continue
        pred = p_term.get("value")
        if not pred:
            continue
        out_edges.append({"source": sid, "target": oid, "predicate": str(pred)})
    out_nodes = [
        {"id": i, "termType": term_type, "iri": iri, "label": None}
        for i, (term_type, iri) in enumerate(node_meta)
    ]
    return out_nodes, out_edges
--- a/backend/app/main.py
+++ b/backend/app/main.py
@@ -0,0 +1,172 @@
 from __future__ import annotations
 from contextlib import asynccontextmanager
 import logging
 import asyncio
 from fastapi import FastAPI, HTTPException, Query
 from fastapi.middleware.cors import CORSMiddleware
 from .models import (
    EdgesResponse,
    GraphResponse,
    NeighborsRequest,
    NeighborsResponse,
    NodesResponse,
    SparqlQueryRequest,
    StatsResponse,
 )
 from .pipelines.layout_dag_radial import CycleError
 from .pipelines.owl_imports_combiner import (
    build_combined_graph,
    output_location_to_path,
    resolve_output_location,
    serialize_graph_to_ttl,
 )
 from .pipelines.selection_neighbors import fetch_neighbor_ids_for_selection
 from .pipelines.snapshot_service import GraphSnapshotService
 from .rdf_store import RDFStore
 from .sparql_engine import RdflibEngine, SparqlEngine, create_sparql_engine
 from .settings import Settings
 settings = Settings()
 logger = logging.getLogger(__name__)
@asynccontextmanager
 async def lifespan(app: FastAPI):
    rdflib_preloaded_graph = None
    if settings.combine_owl_imports_on_start:
        entry_location = settings.combine_entry_location or settings.ttl_path
        output_location = resolve_output_location(
            entry_location,
            output_location=settings.combine_output_location,
            output_name=settings.combine_output_name,
        )
        output_path = output_location_to_path(output_location)
        if output_path.exists() and not settings.combine_force:
            logger.info("Skipping combine step (output exists): %s", output_location)
        else:
            rdflib_preloaded_graph = await asyncio.to_thread(build_combined_graph, entry_location)
            logger.info("Finished combining imports; serializing to: %s", output_location)
            await asyncio.to_thread(serialize_graph_to_ttl, rdflib_preloaded_graph, output_location)
        if settings.graph_backend == "rdflib":
            settings.ttl_path = str(output_path)
    sparql: SparqlEngine = create_sparql_engine(settings, rdflib_graph=rdflib_preloaded_graph)
    await sparql.startup()
    app.state.sparql = sparql
    app.state.snapshot_service = GraphSnapshotService(sparql=sparql, settings=settings)
    # Only build node/edge tables when running in rdflib mode.
    if settings.graph_backend == "rdflib":
        assert isinstance(sparql, RdflibEngine)
        if sparql.graph is None:
            raise RuntimeError("rdflib graph failed to load")
        store = RDFStore(
            ttl_path=settings.ttl_path,
            include_bnodes=settings.include_bnodes,
            max_triples=settings.max_triples,
        )
        store.load(sparql.graph)
        app.state.store = store
    yield
    await sparql.shutdown()
 app = FastAPI(title="visualizador_instanciados backend", lifespan=lifespan)
 cors_origins = settings.cors_origin_list()
 app.add_middleware(
    CORSMiddleware,
    allow_origins=cors_origins,
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
 )
@app.get("/api/health")
 def health() -> dict[str, str]:
    return {"status": "ok"}
@app.get("/api/stats", response_model=StatsResponse)
 async def stats() -> StatsResponse:
    # Stats reflect exactly what we send to the frontend (/api/graph), not global graph size.
    svc: GraphSnapshotService = app.state.snapshot_service
    try:
        snap = await svc.get(node_limit=50_000, edge_limit=100_000)
    except CycleError as e:
        raise HTTPException(status_code=422, detail=str(e)) from None
    meta = snap.meta
    return StatsResponse(
        backend=meta.backend if meta else app.state.sparql.name,
        ttl_path=meta.ttl_path if meta and meta.ttl_path else settings.ttl_path,
        sparql_endpoint=meta.sparql_endpoint if meta else None,
        parsed_triples=len(snap.edges),
        nodes=len(snap.nodes),
        edges=len(snap.edges),
    )
@app.post("/api/sparql")
 async def sparql_query(req: SparqlQueryRequest) -> dict:
    sparql: SparqlEngine = app.state.sparql
    data = await sparql.query_json(req.query)
    return data
@app.post("/api/neighbors", response_model=NeighborsResponse)
 async def neighbors(req: NeighborsRequest) -> NeighborsResponse:
    svc: GraphSnapshotService = app.state.snapshot_service
    snap = await svc.get(node_limit=req.node_limit, edge_limit=req.edge_limit)
    sparql: SparqlEngine = app.state.sparql
    neighbor_ids = await fetch_neighbor_ids_for_selection(
        sparql,
        snapshot=snap,
        selected_ids=req.selected_ids,
        include_bnodes=settings.include_bnodes,
    )
    return NeighborsResponse(selected_ids=req.selected_ids, neighbor_ids=neighbor_ids)
@app.get("/api/nodes", response_model=NodesResponse)
 def nodes(
    limit: int = Query(default=10_000, ge=1, le=200_000),
    offset: int = Query(default=0, ge=0),
 ) -> NodesResponse:
    if settings.graph_backend != "rdflib":
        raise HTTPException(status_code=501, detail="GET /api/nodes is only supported in GRAPH_BACKEND=rdflib mode")
    store: RDFStore = app.state.store
    return NodesResponse(total=store.node_count, nodes=store.node_slice(offset=offset, limit=limit))
@app.get("/api/edges", response_model=EdgesResponse)
 def edges(
    limit: int = Query(default=50_000, ge=1, le=500_000),
    offset: int = Query(default=0, ge=0),
 ) -> EdgesResponse:
    if settings.graph_backend != "rdflib":
        raise HTTPException(status_code=501, detail="GET /api/edges is only supported in GRAPH_BACKEND=rdflib mode")
    store: RDFStore = app.state.store
    return EdgesResponse(total=store.edge_count, edges=store.edge_slice(offset=offset, limit=limit))
@app.get("/api/graph", response_model=GraphResponse)
 async def graph(
    node_limit: int = Query(default=50_000, ge=1, le=200_000),
    edge_limit: int = Query(default=100_000, ge=1, le=500_000),
 ) -> GraphResponse:
    svc: GraphSnapshotService = app.state.snapshot_service
    try:
        return await svc.get(node_limit=node_limit, edge_limit=edge_limit)
    except CycleError as e:
        raise HTTPException(status_code=422, detail=str(e)) from None
--- a/backend/app/models.py
+++ b/backend/app/models.py
@@ -0,0 +1,69 @@
 from __future__ import annotations
 from pydantic import BaseModel
 class Node(BaseModel):
    id: int
    termType: str  # "uri" | "bnode"
    iri: str
    label: str | None = None
    # Optional because /api/nodes (RDFStore) doesn't currently provide positions.
    x: float | None = None
    y: float | None = None
 class Edge(BaseModel):
    source: int
    target: int
    predicate: str
 class StatsResponse(BaseModel):
    backend: str
    ttl_path: str
    sparql_endpoint: str | None = None
    parsed_triples: int
    nodes: int
    edges: int
 class NodesResponse(BaseModel):
    total: int
    nodes: list[Node]
 class EdgesResponse(BaseModel):
    total: int
    edges: list[Edge]
 class GraphResponse(BaseModel):
    class Meta(BaseModel):
        backend: str
        ttl_path: str | None = None
        sparql_endpoint: str | None = None
        include_bnodes: bool
        node_limit: int
        edge_limit: int
        nodes: int
        edges: int
    nodes: list[Node]
    edges: list[Edge]
    meta: Meta | None = None
 class SparqlQueryRequest(BaseModel):
    query: str
 class NeighborsRequest(BaseModel):
    selected_ids: list[int]
    node_limit: int = 50_000
    edge_limit: int = 100_000
 class NeighborsResponse(BaseModel):
    selected_ids: list[int]
    neighbor_ids: list[int]
--- a/backend/app/pipelines/init.py
+++ b/backend/app/pipelines/init.py
@@ -0,0 +1 @@
--- a/backend/app/pipelines/graph_snapshot.py
+++ b/backend/app/pipelines/graph_snapshot.py
@@ -0,0 +1,148 @@
 from __future__ import annotations
 from typing import Any
 from ..graph_export import edge_retrieval_query, graph_from_sparql_bindings
 from ..models import GraphResponse
 from ..sparql_engine import SparqlEngine
 from ..settings import Settings
 from .layout_dag_radial import CycleError, level_synchronous_kahn_layers, radial_positions_from_layers
 RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
 def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
    return (((res.get("results") or {}).get("bindings")) or [])
 def _label_score(label_binding: dict[str, Any]) -> int:
    # Prefer English, then no-language, then anything else.
    lang = (label_binding.get("xml:lang") or "").lower()
    if lang == "en":
        return 3
    if lang == "":
        return 2
    return 1
 async def _fetch_rdfs_labels_for_iris(
    sparql: SparqlEngine,
    iris: list[str],
    *,
    batch_size: int = 500,
 ) -> dict[str, str]:
    best: dict[str, tuple[int, str]] = {}
    for i in range(0, len(iris), batch_size):
        batch = iris[i : i + batch_size]
        values = " ".join(f"<{u}>" for u in batch)
        q = f"""
 SELECT ?s ?label
 WHERE {{
  VALUES ?s {{ {values} }}
  ?s <{RDFS_LABEL}> ?label .
 }}
 """
        res = await sparql.query_json(q)
        for b in _bindings(res):
            s = (b.get("s") or {}).get("value")
            label_term = b.get("label") or {}
            if not s or label_term.get("type") != "literal":
                continue
            label_value = label_term.get("value")
            if label_value is None:
                continue
            score = _label_score(label_term)
            prev = best.get(s)
            if prev is None or score > prev[0]:
                best[s] = (score, str(label_value))
    return {iri: lbl for iri, (_, lbl) in best.items()}
 async def fetch_graph_snapshot(
    sparql: SparqlEngine,
    *,
    settings: Settings,
    node_limit: int,
    edge_limit: int,
 ) -> GraphResponse:
    """
    Fetch a graph snapshot (nodes + edges) via SPARQL, independent of whether the
    underlying engine is RDFLib or AnzoGraph.
    """
    edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
    res = await sparql.query_json(edges_q)
    bindings = (((res.get("results") or {}).get("bindings")) or [])
    nodes, edges = graph_from_sparql_bindings(
        bindings,
        node_limit=node_limit,
        include_bnodes=settings.include_bnodes,
    )
    # Add positions so the frontend doesn't need to run a layout.
    #
    # We are exporting only rdfs:subClassOf triples. In the exported edges:
    #   source = subclass, target = superclass
    # For hierarchical layout we invert edges to:
    #   superclass -> subclass
    hier_edges: list[tuple[int, int]] = []
    for e in edges:
        s = e.get("source")
        t = e.get("target")
        try:
            sid = int(s)  # subclass
            tid = int(t)  # superclass
        except Exception:
            continue
        hier_edges.append((tid, sid))
    try:
        layers = level_synchronous_kahn_layers(node_count=len(nodes), edges=hier_edges)
    except CycleError as e:
        # Add a small URI sample to aid debugging.
        sample: list[str] = []
        for nid in e.remaining_node_ids[:20]:
            try:
                sample.append(str(nodes[nid].get("iri")))
            except Exception:
                continue
        raise CycleError(
            processed=e.processed,
            total=e.total,
            remaining_node_ids=e.remaining_node_ids,
            remaining_iri_sample=sample or None,
        ) from None
    # Deterministic order within each ring/layer for stable layouts.
    id_to_iri = [str(n.get("iri", "")) for n in nodes]
    for layer in layers:
        layer.sort(key=lambda nid: id_to_iri[nid])
    xs, ys = radial_positions_from_layers(node_count=len(nodes), layers=layers)
    for i, node in enumerate(nodes):
        node["x"] = float(xs[i])
        node["y"] = float(ys[i])
    # Attach labels for URI nodes (blank nodes remain label-less).
    uri_nodes = [n for n in nodes if n.get("termType") == "uri"]
    if uri_nodes:
        iris = [str(n["iri"]) for n in uri_nodes if isinstance(n.get("iri"), str)]
        label_by_iri = await _fetch_rdfs_labels_for_iris(sparql, iris)
        for n in uri_nodes:
            iri = n.get("iri")
            if isinstance(iri, str) and iri in label_by_iri:
                n["label"] = label_by_iri[iri]
    meta = GraphResponse.Meta(
        backend=sparql.name,
        ttl_path=settings.ttl_path if settings.graph_backend == "rdflib" else None,
        sparql_endpoint=settings.effective_sparql_endpoint() if settings.graph_backend == "anzograph" else None,
        include_bnodes=settings.include_bnodes,
        node_limit=node_limit,
        edge_limit=edge_limit,
        nodes=len(nodes),
        edges=len(edges),
    )
    return GraphResponse(nodes=nodes, edges=edges, meta=meta)
--- a/backend/app/pipelines/layout_dag_radial.py
+++ b/backend/app/pipelines/layout_dag_radial.py
@@ -0,0 +1,141 @@
 from __future__ import annotations
 import math
 from collections import deque
 from typing import Iterable, Sequence
 class CycleError(RuntimeError):
    """
    Raised when the requested layout requires a DAG, but a cycle is detected.
    `remaining_node_ids` are the node ids that still had indegree > 0 after Kahn.
    """
    def __init__(
        self,
        *,
        processed: int,
        total: int,
        remaining_node_ids: list[int],
        remaining_iri_sample: list[str] | None = None,
    ) -> None:
        self.processed = int(processed)
        self.total = int(total)
        self.remaining_node_ids = remaining_node_ids
        self.remaining_iri_sample = remaining_iri_sample
        msg = f"Cycle detected in subClassOf graph (processed {self.processed}/{self.total} nodes)."
        if remaining_iri_sample:
            msg += f" Example nodes: {', '.join(remaining_iri_sample)}"
        super().__init__(msg)
 def level_synchronous_kahn_layers(
    *,
    node_count: int,
    edges: Iterable[tuple[int, int]],
 ) -> list[list[int]]:
    """
    Level-synchronous Kahn's algorithm:
    - process the entire current queue as one batch (one layer)
    - only then enqueue newly-unlocked nodes for the next batch
    `edges` are directed (u -> v).
    """
    n = int(node_count)
    if n <= 0:
        return []
    adj: list[list[int]] = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        if u == v:
            # Self-loops don't help layout and would trivially violate DAG-ness.
            continue
        if not (0 <= u < n and 0 <= v < n):
            continue
        adj[u].append(v)
        indeg[v] += 1
    q: deque[int] = deque(i for i, d in enumerate(indeg) if d == 0)
    layers: list[list[int]] = []
    processed = 0
    while q:
        # Consume the full current queue as a single layer.
        layer = list(q)
        q.clear()
        layers.append(layer)
        for u in layer:
            processed += 1
            for v in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)
    if processed != n:
        remaining = [i for i, d in enumerate(indeg) if d > 0]
        raise CycleError(processed=processed, total=n, remaining_node_ids=remaining)
    return layers
 def radial_positions_from_layers(
    *,
    node_count: int,
    layers: Sequence[Sequence[int]],
    max_r: float = 5000.0,
 ) -> tuple[list[float], list[float]]:
    """
    Assign node positions in concentric rings (one ring per layer).
    - radius increases with layer index
    - nodes within a layer are placed evenly by angle
    - each ring gets a "golden-angle" rotation to reduce spoke artifacts
    """
    n = int(node_count)
    if n <= 0:
        return ([], [])
    xs = [0.0] * n
    ys = [0.0] * n
    if not layers:
        return (xs, ys)
    two_pi = 2.0 * math.pi
    golden = math.pi * (3.0 - math.sqrt(5.0))
    layer_count = len(layers)
    denom = float(layer_count + 1)
    for li, layer in enumerate(layers):
        m = len(layer)
        if m <= 0:
            continue
        # Keep everything within ~[-max_r, max_r] like the previous spiral layout.
        r = ((li + 1) / denom) * max_r
        # Rotate each layer deterministically to avoid radial spokes aligning.
        offset = (li * golden) % two_pi
        if m == 1:
            nid = int(layer[0])
            if 0 <= nid < n:
                xs[nid] = r * math.cos(offset)
                ys[nid] = r * math.sin(offset)
            continue
        step = two_pi / float(m)
        for j, raw_id in enumerate(layer):
            nid = int(raw_id)
            if not (0 <= nid < n):
                continue
            t = offset + step * float(j)
            xs[nid] = r * math.cos(t)
            ys[nid] = r * math.sin(t)
    return (xs, ys)
--- a/backend/app/pipelines/layout_spiral.py
+++ b/backend/app/pipelines/layout_spiral.py
@@ -0,0 +1,30 @@
 from __future__ import annotations
 import math
 def spiral_positions(n: int, *, max_r: float = 5000.0) -> tuple[list[float], list[float]]:
    """
    Deterministic "sunflower" (golden-angle) spiral layout.
    This is intentionally simple and stable across runs:
    - angle increments by the golden angle to avoid radial spokes
    - radius grows with sqrt(i) to keep density roughly uniform over area
    """
    if n <= 0:
        return ([], [])
    xs = [0.0] * n
    ys = [0.0] * n
    golden = math.pi * (3.0 - math.sqrt(5.0))
    denom = float(max(1, n - 1))
    for i in range(n):
        t = i * golden
        r = math.sqrt(i / denom) * max_r
        xs[i] = r * math.cos(t)
        ys[i] = r * math.sin(t)
    return xs, ys
--- a/backend/app/pipelines/owl_imports_combiner.py
+++ b/backend/app/pipelines/owl_imports_combiner.py
@@ -0,0 +1,96 @@
 from __future__ import annotations
 import logging
 import os
 from pathlib import Path
 from urllib.parse import unquote, urlparse
 from rdflib import Graph
 from rdflib.namespace import OWL
 logger = logging.getLogger(__name__)
 def _is_http_url(location: str) -> bool:
    scheme = urlparse(location).scheme.lower()
    return scheme in {"http", "https"}
 def _is_file_uri(location: str) -> bool:
    return urlparse(location).scheme.lower() == "file"
 def _file_uri_to_path(location: str) -> Path:
    u = urlparse(location)
    if u.scheme.lower() != "file":
        raise ValueError(f"Not a file:// URI: {location!r}")
    return Path(unquote(u.path))
 def resolve_output_location(
    entry_location: str,
    *,
    output_location: str | None,
    output_name: str,
 ) -> str:
    if output_location:
        return output_location
    if _is_http_url(entry_location):
        raise ValueError(
            "COMBINE_ENTRY_LOCATION points to an http(s) URL; set COMBINE_OUTPUT_LOCATION to a writable file path."
        )
    entry_path = _file_uri_to_path(entry_location) if _is_file_uri(entry_location) else Path(entry_location)
    return str(entry_path.parent / output_name)
 def _output_destination_to_path(output_location: str) -> Path:
    if _is_file_uri(output_location):
        return _file_uri_to_path(output_location)
    if _is_http_url(output_location):
        raise ValueError("Output location must be a local file path (or file:// URI), not http(s).")
    return Path(output_location)
 def output_location_to_path(output_location: str) -> Path:
    return _output_destination_to_path(output_location)
 def build_combined_graph(entry_location: str) -> Graph:
    """
    Recursively loads an RDF document (file path, file:// URI, or http(s) URL) and its
    owl:imports into a single in-memory graph.
    """
    combined_graph = Graph()
    visited_locations: set[str] = set()
    def resolve_imports(location: str) -> None:
        if location in visited_locations:
            return
        visited_locations.add(location)
        logger.info("Loading ontology: %s", location)
        try:
            combined_graph.parse(location=location)
        except Exception as e:
            logger.warning("Failed to load %s (%s)", location, e)
            return
        imports = [str(o) for _, _, o in combined_graph.triples((None, OWL.imports, None))]
        for imported_location in imports:
            if imported_location not in visited_locations:
                resolve_imports(imported_location)
    resolve_imports(entry_location)
    return combined_graph
 def serialize_graph_to_ttl(graph: Graph, output_location: str) -> None:
    output_path = _output_destination_to_path(output_location)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = output_path.with_suffix(output_path.suffix + ".tmp")
    graph.serialize(destination=str(tmp_path), format="turtle")
    os.replace(str(tmp_path), str(output_path))
--- a/backend/app/pipelines/selection_neighbors.py
+++ b/backend/app/pipelines/selection_neighbors.py
@@ -0,0 +1,137 @@
 from __future__ import annotations
 from typing import Any, Iterable
 from ..models import GraphResponse, Node
 from ..sparql_engine import SparqlEngine
 def _values_term(node: Node) -> str | None:
    iri = node.iri
    if node.termType == "uri":
        return f"<{iri}>"
    if node.termType == "bnode":
        if iri.startswith("_:"):
            return iri
        return f"_:{iri}"
    return None
 def selection_neighbors_query(*, selected_nodes: Iterable[Node], include_bnodes: bool) -> str:
    values_terms: list[str] = []
    for n in selected_nodes:
        t = _values_term(n)
        if t is None:
            continue
        values_terms.append(t)
    if not values_terms:
        # Caller should avoid running this query when selection is empty, but keep this safe.
        return "SELECT ?nbr WHERE { FILTER(false) }"
    bnode_filter = "" if include_bnodes else "FILTER(!isBlank(?nbr))"
    values = " ".join(values_terms)
    # Neighbors are defined as any node directly connected by rdf:type (to owl:Class)
    # or rdfs:subClassOf, in either direction (treating edges as undirected).
    return f"""
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 SELECT DISTINCT ?nbr
 WHERE {{
  VALUES ?sel {{ {values} }}
  {{
    ?sel rdf:type ?o .
    ?o rdf:type owl:Class .
    BIND(?o AS ?nbr)
  }}
  UNION
  {{
    ?s rdf:type ?sel .
    ?sel rdf:type owl:Class .
    BIND(?s AS ?nbr)
  }}
  UNION
  {{
    ?sel rdfs:subClassOf ?o .
    BIND(?o AS ?nbr)
  }}
  UNION
  {{
    ?s rdfs:subClassOf ?sel .
    BIND(?s AS ?nbr)
  }}
  FILTER(!isLiteral(?nbr))
  FILTER(?nbr != ?sel)
  {bnode_filter}
 }}
 """
 def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
    return (((res.get("results") or {}).get("bindings")) or [])
 def _term_key(term: dict[str, Any], *, include_bnodes: bool) -> tuple[str, str] | None:
    t = term.get("type")
    v = term.get("value")
    if not t or v is None:
        return None
    if t == "literal":
        return None
    if t == "bnode":
        if not include_bnodes:
            return None
        return ("bnode", f"_:{v}")
    return ("uri", str(v))
 async def fetch_neighbor_ids_for_selection(
    sparql: SparqlEngine,
    *,
    snapshot: GraphResponse,
    selected_ids: list[int],
    include_bnodes: bool,
 ) -> list[int]:
    id_to_node: dict[int, Node] = {n.id: n for n in snapshot.nodes}
    selected_nodes: list[Node] = []
    selected_id_set: set[int] = set()
    for nid in selected_ids:
        if not isinstance(nid, int):
            continue
        n = id_to_node.get(nid)
        if n is None:
            continue
        if n.termType == "bnode" and not include_bnodes:
            continue
        selected_nodes.append(n)
        selected_id_set.add(nid)
    if not selected_nodes:
        return []
    key_to_id: dict[tuple[str, str], int] = {}
    for n in snapshot.nodes:
        key_to_id[(n.termType, n.iri)] = n.id
    q = selection_neighbors_query(selected_nodes=selected_nodes, include_bnodes=include_bnodes)
    res = await sparql.query_json(q)
    neighbor_ids: set[int] = set()
    for b in _bindings(res):
        nbr_term = b.get("nbr") or {}
        key = _term_key(nbr_term, include_bnodes=include_bnodes)
        if key is None:
            continue
        nid = key_to_id.get(key)
        if nid is None:
            continue
        if nid in selected_id_set:
            continue
        neighbor_ids.add(nid)
    # Stable ordering for consistent frontend behavior.
    return sorted(neighbor_ids)
--- a/backend/app/pipelines/snapshot_service.py
+++ b/backend/app/pipelines/snapshot_service.py
@@ -0,0 +1,63 @@
 from __future__ import annotations
 import asyncio
 from dataclasses import dataclass
 from ..models import GraphResponse
 from ..sparql_engine import SparqlEngine
 from ..settings import Settings
 from .graph_snapshot import fetch_graph_snapshot
@dataclass(frozen=True)
 class SnapshotKey:
    node_limit: int
    edge_limit: int
    include_bnodes: bool
 class GraphSnapshotService:
    """
    Caches graph snapshots so the backend doesn't re-run expensive SPARQL for stats/graph.
    """
    def __init__(self, *, sparql: SparqlEngine, settings: Settings):
        self._sparql = sparql
        self._settings = settings
        self._cache: dict[SnapshotKey, GraphResponse] = {}
        self._locks: dict[SnapshotKey, asyncio.Lock] = {}
        self._global_lock = asyncio.Lock()
    async def get(self, *, node_limit: int, edge_limit: int) -> GraphResponse:
        key = SnapshotKey(
            node_limit=node_limit,
            edge_limit=edge_limit,
            include_bnodes=self._settings.include_bnodes,
        )
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        # Create/get a per-key lock under a global lock to avoid races.
        async with self._global_lock:
            lock = self._locks.get(key)
            if lock is None:
                lock = asyncio.Lock()
                self._locks[key] = lock
        async with lock:
            cached2 = self._cache.get(key)
            if cached2 is not None:
                return cached2
            snapshot = await fetch_graph_snapshot(
                self._sparql,
                settings=self._settings,
                node_limit=node_limit,
                edge_limit=edge_limit,
            )
            self._cache[key] = snapshot
            return snapshot
--- a/backend/app/pipelines/subclass_labels.py
+++ b/backend/app/pipelines/subclass_labels.py
@@ -0,0 +1,153 @@
 from __future__ import annotations
 from typing import Any
 from ..sparql_engine import SparqlEngine
 RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
 RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
 def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
    return (((res.get("results") or {}).get("bindings")) or [])
 def _term_key(term: dict[str, Any]) -> tuple[str, str] | None:
    t = term.get("type")
    v = term.get("value")
    if not t or v is None:
        return None
    if t == "literal":
        return None
    if t == "bnode":
        return ("bnode", str(v))
    return ("uri", str(v))
 def _key_to_entity_string(key: tuple[str, str]) -> str:
    t, v = key
    if t == "bnode":
        return f"_:{v}"
    return v
 def _label_score(binding: dict[str, Any]) -> int:
    """
    Higher is better.
    Prefer English, then no-language, then anything else.
    """
    lang = (binding.get("xml:lang") or "").lower()
    if lang == "en":
        return 3
    if lang == "":
        return 2
    return 1
 async def extract_subclass_entities_and_labels(
    sparql: SparqlEngine,
    *,
    include_bnodes: bool,
    label_batch_size: int = 500,
 ) -> tuple[list[str], list[str | None]]:
    """
    Pipeline:
      1) Query all rdfs:subClassOf triples.
      2) Build a unique set of entity terms from subjects+objects, convert to list.
      3) Fetch rdfs:label for those entities and return an aligned labels list.
    Returns:
      entities: list[str] (IRI or "_:bnodeId")
      labels:   list[str|None], aligned with entities
    """
    subclass_q = f"""
 SELECT ?s ?o
 WHERE {{
  ?s <{RDFS_SUBCLASS_OF}> ?o .
  FILTER(!isLiteral(?o))
  {"FILTER(!isBlank(?s) && !isBlank(?o))" if not include_bnodes else ""}
 }}
 """
    res = await sparql.query_json(subclass_q)
    entity_keys: set[tuple[str, str]] = set()
    for b in _bindings(res):
        sk = _term_key(b.get("s") or {})
        ok = _term_key(b.get("o") or {})
        if sk is not None and (include_bnodes or sk[0] != "bnode"):
            entity_keys.add(sk)
        if ok is not None and (include_bnodes or ok[0] != "bnode"):
            entity_keys.add(ok)
    # Deterministic ordering.
    entity_key_list = sorted(entity_keys, key=lambda k: (k[0], k[1]))
    entities = [_key_to_entity_string(k) for k in entity_key_list]
    # Build label map keyed by term key.
    best_label_by_key: dict[tuple[str, str], tuple[int, str]] = {}
    # URIs can be batch-queried via VALUES.
    uri_values = [v for (t, v) in entity_key_list if t == "uri"]
    for i in range(0, len(uri_values), label_batch_size):
        batch = uri_values[i : i + label_batch_size]
        values = " ".join(f"<{u}>" for u in batch)
        labels_q = f"""
 SELECT ?s ?label
 WHERE {{
  VALUES ?s {{ {values} }}
  ?s <{RDFS_LABEL}> ?label .
 }}
 """
        lres = await sparql.query_json(labels_q)
        for b in _bindings(lres):
            sk = _term_key(b.get("s") or {})
            if sk is None or sk[0] != "uri":
                continue
            label_term = b.get("label") or {}
            if label_term.get("type") != "literal":
                continue
            label_value = label_term.get("value")
            if label_value is None:
                continue
            score = _label_score(label_term)
            prev = best_label_by_key.get(sk)
            if prev is None or score > prev[0]:
                best_label_by_key[sk] = (score, str(label_value))
    # Blank nodes can't reliably be addressed by ID across queries, but if enabled we can still
    # fetch all bnode labels and filter locally.
    if include_bnodes:
        bnode_keys = {k for k in entity_key_list if k[0] == "bnode"}
        if bnode_keys:
            bnode_labels_q = f"""
 SELECT ?s ?label
 WHERE {{
  ?s <{RDFS_LABEL}> ?label .
  FILTER(isBlank(?s))
 }}
 """
            blres = await sparql.query_json(bnode_labels_q)
            for b in _bindings(blres):
                sk = _term_key(b.get("s") or {})
                if sk is None or sk not in bnode_keys:
                    continue
                label_term = b.get("label") or {}
                if label_term.get("type") != "literal":
                    continue
                label_value = label_term.get("value")
                if label_value is None:
                    continue
                score = _label_score(label_term)
                prev = best_label_by_key.get(sk)
                if prev is None or score > prev[0]:
                    best_label_by_key[sk] = (score, str(label_value))
    labels: list[str | None] = []
    for k in entity_key_list:
        item = best_label_by_key.get(k)
        labels.append(item[1] if item else None)
    return entities, labels
--- a/backend/app/rdf_store.py
+++ b/backend/app/rdf_store.py
@@ -0,0 +1,150 @@
 from __future__ import annotations
 from dataclasses import dataclass
 from typing import Any
 from rdflib import BNode, Graph, Literal, URIRef
 from rdflib.namespace import RDFS, SKOS
 LABEL_PREDICATES = {RDFS.label, SKOS.prefLabel, SKOS.altLabel}
@dataclass(frozen=True)
 class EdgeRow:
    source: int
    target: int
    predicate: str
 class RDFStore:
    def __init__(self, *, ttl_path: str, include_bnodes: bool, max_triples: int | None):
        self.ttl_path = ttl_path
        self.include_bnodes = include_bnodes
        self.max_triples = max_triples
        self.graph: Graph | None = None
        self._id_by_term: dict[Any, int] = {}
        self._term_by_id: list[Any] = []
        self._labels_by_id: dict[int, str] = {}
        self._edges: list[EdgeRow] = []
        self._parsed_triples = 0
    def _term_allowed(self, term: Any) -> bool:
        if isinstance(term, Literal):
            return False
        if isinstance(term, BNode) and not self.include_bnodes:
            return False
        return isinstance(term, (URIRef, BNode))
    def _get_id(self, term: Any) -> int | None:
        if not self._term_allowed(term):
            return None
        existing = self._id_by_term.get(term)
        if existing is not None:
            return existing
        nid = len(self._term_by_id)
        self._id_by_term[term] = nid
        self._term_by_id.append(term)
        return nid
    def _term_type(self, term: Any) -> str:
        if isinstance(term, BNode):
            return "bnode"
        return "uri"
    def _term_iri(self, term: Any) -> str:
        if isinstance(term, BNode):
            return f"_:{term}"
        return str(term)
    def load(self, graph: Graph | None = None) -> None:
        g = graph or Graph()
        if graph is None:
            g.parse(self.ttl_path, format="turtle")
        self.graph = g
        self._id_by_term.clear()
        self._term_by_id.clear()
        self._labels_by_id.clear()
        self._edges.clear()
        parsed = 0
        for (s, p, o) in g:
            parsed += 1
            if self.max_triples is not None and parsed > self.max_triples:
                break
            # Capture labels but do not emit them as edges.
            if p in LABEL_PREDICATES and isinstance(o, Literal):
                sid = self._get_id(s)
                if sid is not None and sid not in self._labels_by_id:
                    self._labels_by_id[sid] = str(o)
                continue
            sid = self._get_id(s)
            oid = self._get_id(o)
            if sid is None or oid is None:
                continue
            self._edges.append(EdgeRow(source=sid, target=oid, predicate=str(p)))
        self._parsed_triples = parsed
    @property
    def parsed_triples(self) -> int:
        return self._parsed_triples
    @property
    def node_count(self) -> int:
        return len(self._term_by_id)
    @property
    def edge_count(self) -> int:
        return len(self._edges)
    def node_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
        end = min(self.node_count, offset + limit)
        out: list[dict[str, Any]] = []
        for nid in range(offset, end):
            term = self._term_by_id[nid]
            out.append(
                {
                    "id": nid,
                    "termType": self._term_type(term),
                    "iri": self._term_iri(term),
                    "label": self._labels_by_id.get(nid),
                }
            )
        return out
    def edge_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
        end = min(self.edge_count, offset + limit)
        out: list[dict[str, Any]] = []
        for row in self._edges[offset:end]:
            out.append(
                {
                    "source": row.source,
                    "target": row.target,
                    "predicate": row.predicate,
                }
            )
        return out
    def edges_within_nodes(self, *, max_node_id_exclusive: int, limit: int) -> list[dict[str, Any]]:
        out: list[dict[str, Any]] = []
        for row in self._edges:
            if row.source >= max_node_id_exclusive or row.target >= max_node_id_exclusive:
                continue
            out.append(
                {
                    "source": row.source,
                    "target": row.target,
                    "predicate": row.predicate,
                }
            )
            if len(out) >= limit:
                break
        return out
--- a/backend/app/settings.py
+++ b/backend/app/settings.py
@@ -0,0 +1,58 @@
 from __future__ import annotations
 from typing import Literal
 from pydantic import Field
 from pydantic_settings import BaseSettings, SettingsConfigDict
 class Settings(BaseSettings):
    # Which graph engine executes SPARQL queries.
    # - rdflib: parse TTL locally and query in-memory
    # - anzograph: query a remote AnzoGraph SPARQL endpoint (optionally LOAD on startup)
    graph_backend: Literal["rdflib", "anzograph"] = Field(default="rdflib", alias="GRAPH_BACKEND")
    ttl_path: str = Field(default="/data/o3po.ttl", alias="TTL_PATH")
    include_bnodes: bool = Field(default=False, alias="INCLUDE_BNODES")
    max_triples: int | None = Field(default=None, alias="MAX_TRIPLES")
    # Optional: Combine owl:imports into a single TTL file on backend startup.
    combine_owl_imports_on_start: bool = Field(default=False, alias="COMBINE_OWL_IMPORTS_ON_START")
    combine_entry_location: str | None = Field(default=None, alias="COMBINE_ENTRY_LOCATION")
    combine_output_location: str | None = Field(default=None, alias="COMBINE_OUTPUT_LOCATION")
    combine_output_name: str = Field(default="combined_ontology.ttl", alias="COMBINE_OUTPUT_NAME")
    combine_force: bool = Field(default=False, alias="COMBINE_FORCE")
    # AnzoGraph / SPARQL endpoint configuration
    sparql_host: str = Field(default="http://anzograph:8080", alias="SPARQL_HOST")
    # If not set, the backend uses `${SPARQL_HOST}/sparql`.
    sparql_endpoint: str | None = Field(default=None, alias="SPARQL_ENDPOINT")
    sparql_user: str | None = Field(default=None, alias="SPARQL_USER")
    sparql_pass: str | None = Field(default=None, alias="SPARQL_PASS")
    # File URI as seen by the AnzoGraph container (used with SPARQL `LOAD`).
    # Example: file:///opt/shared-files/o3po.ttl
    sparql_data_file: str | None = Field(default=None, alias="SPARQL_DATA_FILE")
    sparql_graph_iri: str | None = Field(default=None, alias="SPARQL_GRAPH_IRI")
    sparql_load_on_start: bool = Field(default=False, alias="SPARQL_LOAD_ON_START")
    sparql_clear_on_start: bool = Field(default=False, alias="SPARQL_CLEAR_ON_START")
    sparql_timeout_s: float = Field(default=300.0, alias="SPARQL_TIMEOUT_S")
    sparql_ready_retries: int = Field(default=30, alias="SPARQL_READY_RETRIES")
    sparql_ready_delay_s: float = Field(default=4.0, alias="SPARQL_READY_DELAY_S")
    sparql_ready_timeout_s: float = Field(default=10.0, alias="SPARQL_READY_TIMEOUT_S")
    # Comma-separated, or "*" (default).
    cors_origins: str = Field(default="*", alias="CORS_ORIGINS")
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    def cors_origin_list(self) -> list[str]:
        if self.cors_origins.strip() == "*":
            return ["*"]
        return [o.strip() for o in self.cors_origins.split(",") if o.strip()]
    def effective_sparql_endpoint(self) -> str:
        if self.sparql_endpoint and self.sparql_endpoint.strip():
            return self.sparql_endpoint.strip()
        return self.sparql_host.rstrip("/") + "/sparql"
--- a/backend/app/sparql_engine.py
+++ b/backend/app/sparql_engine.py
@@ -0,0 +1,177 @@
 from __future__ import annotations
 import asyncio
 import base64
 import json
 from typing import Any, Protocol
 import httpx
 from rdflib import Graph
 from .settings import Settings
 class SparqlEngine(Protocol):
    name: str
    async def startup(self) -> None: ...
    async def shutdown(self) -> None: ...
    async def query_json(self, query: str) -> dict[str, Any]: ...
 class RdflibEngine:
    name = "rdflib"
    def __init__(self, *, ttl_path: str, graph: Graph | None = None):
        self.ttl_path = ttl_path
        self.graph: Graph | None = graph
    async def startup(self) -> None:
        if self.graph is not None:
            return
        g = Graph()
        g.parse(self.ttl_path, format="turtle")
        self.graph = g
    async def shutdown(self) -> None:
        # Nothing to close for in-memory rdflib graph.
        return None
    async def query_json(self, query: str) -> dict[str, Any]:
        if self.graph is None:
            raise RuntimeError("RdflibEngine not started")
        result = self.graph.query(query)
        payload = result.serialize(format="json")
        if isinstance(payload, bytes):
            payload = payload.decode("utf-8")
        return json.loads(payload)
 class AnzoGraphEngine:
    name = "anzograph"
    def __init__(self, *, settings: Settings):
        self.endpoint = settings.effective_sparql_endpoint()
        self.timeout_s = settings.sparql_timeout_s
        self.ready_retries = settings.sparql_ready_retries
        self.ready_delay_s = settings.sparql_ready_delay_s
        self.ready_timeout_s = settings.sparql_ready_timeout_s
        self.user = settings.sparql_user
        self.password = settings.sparql_pass
        self.data_file = settings.sparql_data_file
        self.graph_iri = settings.sparql_graph_iri
        self.load_on_start = settings.sparql_load_on_start
        self.clear_on_start = settings.sparql_clear_on_start
        self._client: httpx.AsyncClient | None = None
        self._auth_header = self._build_auth_header(self.user, self.password)
    @staticmethod
    def _build_auth_header(user: str | None, password: str | None) -> str | None:
        if not user or not password:
            return None
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        return f"Basic {token}"
    async def startup(self) -> None:
        self._client = httpx.AsyncClient(timeout=self.timeout_s)
        await self._wait_ready()
        if self.clear_on_start:
            await self._update("CLEAR ALL")
            await self._wait_ready()
        if self.load_on_start:
            if not self.data_file:
                raise RuntimeError("SPARQL_LOAD_ON_START=true but SPARQL_DATA_FILE is not set")
            if self.graph_iri:
                await self._update(f"LOAD <{self.data_file}> INTO GRAPH <{self.graph_iri}>")
            else:
                await self._update(f"LOAD <{self.data_file}>")
            # AnzoGraph may still be indexing after LOAD.
            await self._wait_ready()
    async def shutdown(self) -> None:
        if self._client is not None:
            await self._client.aclose()
            self._client = None
    async def query_json(self, query: str) -> dict[str, Any]:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header
        # AnzoGraph expects x-www-form-urlencoded with `query=...`.
        resp = await self._client.post(
            self.endpoint,
            headers=headers,
            data={"query": query},
        )
        resp.raise_for_status()
        return resp.json()
    async def _update(self, update: str) -> None:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")
        headers = {
            "Content-Type": "application/sparql-update",
            "Accept": "application/json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header
        resp = await self._client.post(self.endpoint, headers=headers, content=update)
        resp.raise_for_status()
    async def _wait_ready(self) -> None:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")
        # Match the repo's Julia readiness gate: real SPARQL POST + valid JSON parse.
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header
        last_err: Exception | None = None
        for _ in range(self.ready_retries):
            try:
                resp = await self._client.post(
                    self.endpoint,
                    headers=headers,
                    data={"query": "ASK WHERE { ?s ?p ?o }"},
                    timeout=self.ready_timeout_s,
                )
                resp.raise_for_status()
                # Ensure it's JSON, not HTML/text during boot.
                resp.json()
                return
            except Exception as e:
                last_err = e
                await asyncio.sleep(self.ready_delay_s)
        raise RuntimeError(f"AnzoGraph not ready at {self.endpoint}") from last_err
 def create_sparql_engine(settings: Settings, *, rdflib_graph: Graph | None = None) -> SparqlEngine:
    if settings.graph_backend == "rdflib":
        return RdflibEngine(ttl_path=settings.ttl_path, graph=rdflib_graph)
    if settings.graph_backend == "anzograph":
        return AnzoGraphEngine(settings=settings)
    raise RuntimeError(f"Unsupported GRAPH_BACKEND={settings.graph_backend!r}")
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -0,0 +1,5 @@
 fastapi
 uvicorn[standard]
 rdflib
 pydantic-settings
 httpx
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,9 +1,60 @@
 services:
-  app:
+  backend:
-    build: .
+    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - GRAPH_BACKEND=${GRAPH_BACKEND:-rdflib}
      - TTL_PATH=${TTL_PATH:-/data/o3po.ttl}
      - INCLUDE_BNODES=${INCLUDE_BNODES:-false}
      - MAX_TRIPLES
      - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
      - SPARQL_HOST=${SPARQL_HOST:-http://anzograph:8080}
      - SPARQL_ENDPOINT
      - SPARQL_USER=${SPARQL_USER:-admin}
      - SPARQL_PASS=${SPARQL_PASS:-Passw0rd1}
      - SPARQL_DATA_FILE=${SPARQL_DATA_FILE:-file:///opt/shared-files/o3po.ttl}
      - SPARQL_GRAPH_IRI
      - SPARQL_LOAD_ON_START=${SPARQL_LOAD_ON_START:-false}
      - SPARQL_CLEAR_ON_START=${SPARQL_CLEAR_ON_START:-false}
      - SPARQL_TIMEOUT_S=${SPARQL_TIMEOUT_S:-300}
      - SPARQL_READY_RETRIES=${SPARQL_READY_RETRIES:-30}
      - SPARQL_READY_DELAY_S=${SPARQL_READY_DELAY_S:-4}
      - SPARQL_READY_TIMEOUT_S=${SPARQL_READY_TIMEOUT_S:-10}
      - COMBINE_OWL_IMPORTS_ON_START=${COMBINE_OWL_IMPORTS_ON_START:-false}
      - COMBINE_ENTRY_LOCATION
      - COMBINE_OUTPUT_LOCATION
      - COMBINE_OUTPUT_NAME
      - COMBINE_FORCE=${COMBINE_FORCE:-false}
    volumes:
      - ./backend:/app
      - ./data:/data:Z
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health').read()"]
      interval: 5s
      timeout: 3s
      retries: 60
  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
-    command: sh -c "npx tsx scripts/compute_layout.ts && npm run dev -- --host"
+    environment:
      - VITE_BACKEND_URL=${VITE_BACKEND_URL:-http://backend:8000}
    volumes:
-      - .:/app
+      - ./frontend:/app
-      - /app/node_modules # Prevents local node_modules from overwriting the container's
+      - /app/node_modules
    depends_on:
      - backend
    # Docker Compose v1 doesn't support depends_on:condition. Do an explicit wait here.
    command: sh -c "until wget -qO- http://backend:8000/api/health >/dev/null 2>&1; do echo 'waiting for backend...'; sleep 1; done; npm run dev -- --host --port 5173"
  anzograph:
    image: cambridgesemantics/anzograph:latest
    container_name: anzograph
    ports:
      - "8080:8080"
      - "8443:8443"
    volumes:
      - ./data:/opt/shared-files:Z
--- a/frontend/Dockerfile
+++ b/frontend/Dockerfile
@@ -2,6 +2,8 @@ FROM node:lts-alpine
 WORKDIR /app
 EXPOSE 5173
 # Copy dependency definitions
 COPY package*.json ./
@@ -11,8 +13,5 @@ RUN npm install
 # Copy the rest of the source code
 COPY . .
-# Expose the standard Vite port
+# Start the dev server with --host for external access
-EXPOSE 5173
+CMD ["npm", "run", "dev", "--", "--host", "--port", "5173"]
 # Compute layout, then start the dev server with --host for external access
 CMD ["sh", "-c", "npm run layout && npm run dev -- --host"]
--- a/frontend/index.html
+++ b/frontend/index.html
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
--- a/frontend/package.json
+++ b/frontend/package.json
--- a/frontend/public/edges.csv
+++ b/frontend/public/edges.csv
--- a/frontend/public/node_positions.csv
+++ b/frontend/public/node_positions.csv
--- a/frontend/scripts/compute_layout.ts
+++ b/frontend/scripts/compute_layout.ts
--- a/frontend/scripts/generate_tree.ts
+++ b/frontend/scripts/generate_tree.ts
--- a/frontend/src/App.tsx
+++ b/frontend/src/App.tsx
@@ -1,10 +1,25 @@
 import { useEffect, useRef, useState } from "react";
 import { Renderer } from "./renderer";
 function sleep(ms: number): Promise<void> {
  return new Promise((r) => setTimeout(r, ms));
 }
 type GraphMeta = {
  backend?: string;
  ttl_path?: string | null;
  sparql_endpoint?: string | null;
  include_bnodes?: boolean;
  node_limit?: number;
  edge_limit?: number;
  nodes?: number;
  edges?: number;
 };
 export default function App() {
  const canvasRef = useRef<HTMLCanvasElement>(null);
  const rendererRef = useRef<Renderer | null>(null);
-  const [status, setStatus] = useState("Loading node positions…");
+  const [status, setStatus] = useState("Waiting for backend…");
  const [nodeCount, setNodeCount] = useState(0);
  const [stats, setStats] = useState({
    fps: 0,
@@ -14,11 +29,15 @@ export default function App() {
    ptSize: 0,
  });
  const [error, setError] = useState("");
-  const [hoveredNode, setHoveredNode] = useState<{ x: number; y: number; screenX: number; screenY: number } | null>(null);
+  const [hoveredNode, setHoveredNode] = useState<{ x: number; y: number; screenX: number; screenY: number; label?: string; iri?: string } | null>(null);
  const [selectedNodes, setSelectedNodes] = useState<Set<number>>(new Set());
  const [backendStats, setBackendStats] = useState<{ nodes: number; edges: number; backend?: string } | null>(null);
  const graphMetaRef = useRef<GraphMeta | null>(null);
  const neighborsReqIdRef = useRef(0);
  // Store mouse position in a ref so it can be accessed in render loop without re-renders
  const mousePos = useRef({ x: 0, y: 0 });
  const nodesRef = useRef<any[]>([]);
  useEffect(() => {
    const canvas = canvasRef.current;
@@ -35,55 +54,82 @@ export default function App() {
    let cancelled = false;
    // Fetch CSVs, parse, and init renderer
    (async () => {
      try {
-        setStatus("Fetching data files…");
+        // Wait for backend (docker-compose also gates startup via healthcheck, but this
-        const [nodesResponse, edgesResponse] = await Promise.all([
+        // handles running the frontend standalone).
-          fetch("/node_positions.csv"),
+        const deadline = performance.now() + 180_000;
-          fetch("/edges.csv"),
+        let attempt = 0;
-        ]);
+        while (performance.now() < deadline) {
-        if (!nodesResponse.ok) throw new Error(`Failed to fetch nodes: ${nodesResponse.status}`);
+          attempt++;
-        if (!edgesResponse.ok) throw new Error(`Failed to fetch edges: ${edgesResponse.status}`);
+          setStatus(`Waiting for backend… (attempt ${attempt})`);
          try {
            const res = await fetch("/api/health");
            if (res.ok) break;
          } catch {
            // ignore and retry
          }
          await sleep(1000);
          if (cancelled) return;
        }
-        const [nodesText, edgesText] = await Promise.all([
+        setStatus("Fetching graph…");
-          nodesResponse.text(),
+        const graphRes = await fetch("/api/graph");
-          edgesResponse.text(),
+        if (!graphRes.ok) throw new Error(`Failed to fetch graph: ${graphRes.status}`);
-        ]);
+        const graph = await graphRes.json();
        if (cancelled) return;
-        setStatus("Parsing positions…");
+        const nodes = Array.isArray(graph.nodes) ? graph.nodes : [];
-        const nodeLines = nodesText.split("\n").slice(1).filter(l => l.trim().length > 0);
+        const edges = Array.isArray(graph.edges) ? graph.edges : [];
-        const count = nodeLines.length;
+        const meta = graph.meta || null;
        const count = nodes.length;
        nodesRef.current = nodes;
        graphMetaRef.current = meta && typeof meta === "object" ? (meta as GraphMeta) : null;
        // Build positions from backend-provided node coordinates.
        setStatus("Preparing buffers…");
        const xs = new Float32Array(count);
        const ys = new Float32Array(count);
        for (let i = 0; i < count; i++) {
          const nx = nodes[i]?.x;
          const ny = nodes[i]?.y;
          xs[i] = typeof nx === "number" ? nx : 0;
          ys[i] = typeof ny === "number" ? ny : 0;
        }
        const vertexIds = new Uint32Array(count);
        for (let i = 0; i < count; i++) {
-          const parts = nodeLines[i].split(",");
+          const id = nodes[i]?.id;
-          vertexIds[i] = parseInt(parts[0], 10);
+          vertexIds[i] = typeof id === "number" ? id >>> 0 : i;
          xs[i] = parseFloat(parts[1]);
          ys[i] = parseFloat(parts[2]);
        }
-        setStatus("Parsing edges…");
+        // Build edges as vertex-id pairs.
-        const edgeLines = edgesText.split("\n").slice(1).filter(l => l.trim().length > 0);
+        const edgeData = new Uint32Array(edges.length * 2);
-        const edgeData = new Uint32Array(edgeLines.length * 2);
+        for (let i = 0; i < edges.length; i++) {
-        for (let i = 0; i < edgeLines.length; i++) {
+          const s = edges[i]?.source;
-          const parts = edgeLines[i].split(",");
+          const t = edges[i]?.target;
-          edgeData[i * 2] = parseInt(parts[0], 10);
+          edgeData[i * 2] = typeof s === "number" ? s >>> 0 : 0;
-          edgeData[i * 2 + 1] = parseInt(parts[1], 10);
+          edgeData[i * 2 + 1] = typeof t === "number" ? t >>> 0 : 0;
        }
-        if (cancelled) return;
+        // Use /api/graph meta; don't do a second expensive backend call.
        if (meta && typeof meta.nodes === "number" && typeof meta.edges === "number") {
          setBackendStats({
            nodes: meta.nodes,
            edges: meta.edges,
            backend: typeof meta.backend === "string" ? meta.backend : undefined,
          });
        } else {
          setBackendStats({ nodes: nodes.length, edges: edges.length });
        }
        setStatus("Building spatial index…");
-        await new Promise(r => setTimeout(r, 0));
+        await new Promise((r) => setTimeout(r, 0));
        const buildMs = renderer.init(xs, ys, vertexIds, edgeData);
        setNodeCount(renderer.getNodeCount());
        setStatus("");
-        console.log(`Init complete: ${count.toLocaleString()} nodes, ${edgeLines.length.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
+        console.log(`Init complete: ${count.toLocaleString()} nodes, ${edges.length.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
      } catch (e) {
        if (!cancelled) {
          setError(e instanceof Error ? e.message : String(e));
@@ -167,9 +213,18 @@ export default function App() {
      frameCount++;
      // Find hovered node using quadtree
-      const node = renderer.findNodeAt(mousePos.current.x, mousePos.current.y);
+      const hit = renderer.findNodeIndexAt(mousePos.current.x, mousePos.current.y);
-      if (node) {
+      if (hit) {
-        setHoveredNode({ ...node, screenX: mousePos.current.x, screenY: mousePos.current.y });
+        const origIdx = renderer.sortedIndexToOriginalIndex(hit.index);
        const meta = origIdx === null ? null : nodesRef.current[origIdx];
        setHoveredNode({
          x: hit.x,
          y: hit.y,
          screenX: mousePos.current.x,
          screenY: mousePos.current.y,
          label: meta && typeof meta.label === "string" ? meta.label : undefined,
          iri: meta && typeof meta.iri === "string" ? meta.iri : undefined,
        });
      } else {
        setHoveredNode(null);
      }
@@ -205,9 +260,72 @@ export default function App() {
  // Sync selection state to renderer
  useEffect(() => {
-    if (rendererRef.current) {
+    const renderer = rendererRef.current;
-      rendererRef.current.updateSelection(selectedNodes);
+    if (!renderer) return;
    // Optimistically reflect selection immediately; neighbors will be filled in by backend.
    renderer.updateSelection(selectedNodes, new Set());
    // Invalidate any in-flight neighbor request for the previous selection.
    const reqId = ++neighborsReqIdRef.current;
    // Convert selected sorted indices to backend node IDs (graph-export dense IDs).
    const selectedIds: number[] = [];
    for (const sortedIdx of selectedNodes) {
      const origIdx = renderer.sortedIndexToOriginalIndex(sortedIdx);
      if (origIdx === null) continue;
      const nodeId = nodesRef.current?.[origIdx]?.id;
      if (typeof nodeId === "number") selectedIds.push(nodeId);
    }
    if (selectedIds.length === 0) {
      return;
    }
    // Always send the full current selection list; backend returns the merged neighbor set.
    const ctrl = new AbortController();
    (async () => {
      try {
        const meta = graphMetaRef.current;
        const body = {
          selected_ids: selectedIds,
          node_limit: typeof meta?.node_limit === "number" ? meta.node_limit : undefined,
          edge_limit: typeof meta?.edge_limit === "number" ? meta.edge_limit : undefined,
        };
        const res = await fetch("/api/neighbors", {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify(body),
          signal: ctrl.signal,
        });
        if (!res.ok) throw new Error(`POST /api/neighbors failed: ${res.status}`);
        const data = await res.json();
        if (ctrl.signal.aborted) return;
        if (reqId !== neighborsReqIdRef.current) return;
        const neighborIds: unknown = data?.neighbor_ids;
        const neighborSorted = new Set<number>();
        if (Array.isArray(neighborIds)) {
          for (const id of neighborIds) {
            if (typeof id !== "number") continue;
            const sorted = renderer.vertexIdToSortedIndexOrNull(id);
            if (sorted === null) continue;
            if (!selectedNodes.has(sorted)) neighborSorted.add(sorted);
          }
        }
        renderer.updateSelection(selectedNodes, neighborSorted);
      } catch (e) {
        if (ctrl.signal.aborted) return;
        console.warn(e);
        // Keep the UI usable even if neighbors fail to load.
        renderer.updateSelection(selectedNodes, new Set());
      }
    })();
    return () => ctrl.abort();
  }, [selectedNodes]);
  return (
@@ -279,6 +397,11 @@ export default function App() {
            <div>Zoom: {stats.zoom < 0.01 ? stats.zoom.toExponential(2) : stats.zoom.toFixed(2)} px/unit</div>
            <div>Pt Size: {stats.ptSize.toFixed(1)}px</div>
            <div style={{ color: "#f80" }}>Selected: {selectedNodes.size}</div>
            {backendStats && (
              <div style={{ color: "#8f8" }}>
                Backend{backendStats.backend ? ` (${backendStats.backend})` : ""}: {backendStats.nodes.toLocaleString()} nodes, {backendStats.edges.toLocaleString()} edges
              </div>
            )}
          </div>
          <div
            style={{
@@ -316,7 +439,12 @@ export default function App() {
                boxShadow: "0 2px 8px rgba(0,0,0,0.5)",
              }}
            >
-              ({hoveredNode.x.toFixed(2)}, {hoveredNode.y.toFixed(2)})
+              <div style={{ color: "#0ff" }}>
                {hoveredNode.label || hoveredNode.iri || "(unknown)"}
              </div>
              <div style={{ color: "#688" }}>
                ({hoveredNode.x.toFixed(2)}, {hoveredNode.y.toFixed(2)})
              </div>
            </div>
          )}
        </>
--- a/frontend/src/index.css
+++ b/frontend/src/index.css
--- a/frontend/src/main.tsx
+++ b/frontend/src/main.tsx
--- a/frontend/src/quadtree.ts
+++ b/frontend/src/quadtree.ts
--- a/frontend/src/renderer.ts
+++ b/frontend/src/renderer.ts
@@ -80,9 +80,11 @@ export class Renderer {
  // Data
  private leaves: Leaf[] = [];
  private sorted: Float32Array = new Float32Array(0);
  // order[sortedIdx] = originalIdx (original ordering matches input arrays)
  private sortedToOriginal: Uint32Array = new Uint32Array(0);
  private vertexIdToSortedIndex: Map<number, number> = new Map();
  private nodeCount = 0;
  private edgeCount = 0;
  private neighborMap: Map<number, number[]> = new Map();
  private leafEdgeStarts: Uint32Array = new Uint32Array(0);
  private leafEdgeCounts: Uint32Array = new Uint32Array(0);
  private maxPtSize = 256;
@@ -202,6 +204,7 @@ export class Renderer {
    const { sorted, leaves, order } = buildSpatialIndex(xs, ys);
    this.leaves = leaves;
    this.sorted = sorted;
    this.sortedToOriginal = order;
    // Pre-allocate arrays for render loop (zero-allocation rendering)
    this.visibleLeafIndices = new Uint32Array(leaves.length);
@@ -226,6 +229,13 @@ export class Renderer {
      originalToSorted[order[i]] = i;
    }
    // Build vertex ID → sorted index mapping (used by backend-driven neighbor highlighting)
    const vertexIdToSortedIndex = new Map<number, number>();
    for (let i = 0; i < count; i++) {
      vertexIdToSortedIndex.set(vertexIds[i], originalToSorted[i]);
    }
    this.vertexIdToSortedIndex = vertexIdToSortedIndex;
    // Remap edges from vertex IDs to sorted indices
    const lineIndices = new Uint32Array(edgeCount * 2);
    let validEdges = 0;
@@ -241,18 +251,6 @@ export class Renderer {
    }
    this.edgeCount = validEdges;
    // Build per-node neighbor list from edges for selection queries
    const neighborMap = new Map<number, number[]>();
    for (let i = 0; i < validEdges; i++) {
      const src = lineIndices[i * 2];
      const dst = lineIndices[i * 2 + 1];
      if (!neighborMap.has(src)) neighborMap.set(src, []);
      neighborMap.get(src)!.push(dst);
      if (!neighborMap.has(dst)) neighborMap.set(dst, []);
      neighborMap.get(dst)!.push(src);
    }
    this.neighborMap = neighborMap;
    // Build per-leaf edge index for efficient visible-only edge drawing
    // Find which leaf each sorted index belongs to
    const nodeToLeaf = new Uint32Array(count);
@@ -331,6 +329,28 @@ export class Renderer {
    return this.nodeCount;
  }
  /**
   * Map a sorted buffer index (what findNodeIndexAt returns) back to the original
   * index in the input arrays used to initialize the renderer.
   */
  sortedIndexToOriginalIndex(sortedIndex: number): number | null {
    if (
      sortedIndex < 0 ||
      sortedIndex >= this.sortedToOriginal.length
    ) {
      return null;
    }
    return this.sortedToOriginal[sortedIndex];
  }
  /**
   * Convert a backend node ID (node.id from /api/graph) to a sorted index used by the renderer.
   */
  vertexIdToSortedIndexOrNull(vertexId: number): number | null {
    const idx = this.vertexIdToSortedIndex.get(vertexId);
    return typeof idx === "number" ? idx : null;
  }
  /**
   * Convert screen coordinates (CSS pixels) to world coordinates.
   */
@@ -412,10 +432,10 @@ export class Renderer {
  /**
   * Update the selection buffer with the given set of node indices.
-   * Also computes neighbors of selected nodes.
+   * Neighbor indices are provided by the backend (SPARQL query) and uploaded separately.
-   * Call this whenever React's selection state changes.
+   * Call this whenever selection or backend neighbor results change.
   */
-  updateSelection(selectedIndices: Set<number>): void {
+  updateSelection(selectedIndices: Set<number>, neighborIndices: Set<number> = new Set()): void {
    const gl = this.gl;
    // Upload selected indices
@@ -425,23 +445,11 @@ export class Renderer {
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indices, gl.DYNAMIC_DRAW);
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);
    // Compute neighbors of selected nodes (excluding already selected)
    const neighborSet = new Set<number>();
    for (const nodeIdx of selectedIndices) {
      const nodeNeighbors = this.neighborMap.get(nodeIdx);
      if (!nodeNeighbors) continue;
      for (const n of nodeNeighbors) {
        if (!selectedIndices.has(n)) {
          neighborSet.add(n);
        }
      }
    }
    // Upload neighbor indices
-    const neighborIndices = new Uint32Array(neighborSet);
+    const neighborIndexArray = new Uint32Array(neighborIndices);
-    this.neighborCount = neighborIndices.length;
+    this.neighborCount = neighborIndexArray.length;
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this.neighborIbo);
-    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, neighborIndices, gl.DYNAMIC_DRAW);
+    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, neighborIndexArray, gl.DYNAMIC_DRAW);
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);
  }
--- a/frontend/src/utils/cn.ts
+++ b/frontend/src/utils/cn.ts
--- a/frontend/tsconfig.json
+++ b/frontend/tsconfig.json
--- a/frontend/vite.config.ts
+++ b/frontend/vite.config.ts
@@ -16,4 +16,10 @@ export default defineConfig({
      "@": path.resolve(__dirname, "src"),
    },
  },
  server: {
    proxy: {
      // Backend is reachable as http://backend:8000 inside docker-compose; localhost outside.
      "/api": process.env.VITE_BACKEND_URL || "http://localhost:8000",
    },
  },
 });
--- a/public/edges.csv
+++ b/public/edges.csv
--- a/public/node_positions.csv
+++ b/public/node_positions.csv
Author	SHA1	Message	Date
Oxy8	a75b5b93da	Import Solver + neighbors via sparql query	2026-03-04 13:49:14 -03:00
Oxy8	d4bfa5f064	Reorganiza backend	2026-03-02 17:33:45 -03:00
Oxy8	bba0ae887d	Graph access via SPARQL	2026-03-02 16:27:28 -03:00
Oxy8	bf03d333f9	backend	2026-03-02 14:32:42 -03:00