Reorganize backend

backend/app/README.md (new file, 183 lines)
# Backend App (`backend/app`)

This folder contains the FastAPI backend for `visualizador_instanciados`.

The backend can execute SPARQL queries in two interchangeable ways:

1. **`GRAPH_BACKEND=rdflib`**: parse a Turtle file into an in-memory RDFLib `Graph` and run SPARQL queries locally.
2. **`GRAPH_BACKEND=anzograph`**: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally `LOAD` a TTL on startup).

Callers (frontend or other clients) interact with a single API surface (`/api/*`) and do not need to know which backend is configured.

## Files

- `main.py`
  - FastAPI app setup, startup/shutdown (`lifespan`), and HTTP endpoints.
- `settings.py`
  - Env-driven configuration (`pydantic-settings`).
- `sparql_engine.py`
  - Backend-agnostic SPARQL execution layer:
    - `RdflibEngine`: `Graph.query(...)` + SPARQL JSON serialization.
    - `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
    - `create_sparql_engine(settings)` chooses the engine based on `GRAPH_BACKEND`.
- `graph_export.py`
  - Shared helpers to:
    - build the snapshot SPARQL query used for edge retrieval
    - map SPARQL JSON bindings to `{nodes, edges}`.
- `models.py`
  - Pydantic response/request models:
    - `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
- `rdf_store.py`
  - A local parsed representation (dense IDs + neighbor data) built only in `GRAPH_BACKEND=rdflib`.
  - Used by `/api/nodes`, `/api/edges`, and `rdflib`-mode `/api/stats`.
- `pipelines/graph_snapshot.py`
  - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL (works for both RDFLib and AnzoGraph).
- `pipelines/snapshot_service.py`
  - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run expensive SPARQL twice.
- `pipelines/subclass_labels.py`
  - Pipeline to extract `rdfs:subClassOf` entities and an aligned `rdfs:label` list.

## Runtime Flow

On startup (FastAPI lifespan):

1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
2. The engine is stored at `app.state.sparql`.
3. If `GRAPH_BACKEND=rdflib`, an `RDFStore` is also built from the already-loaded RDFLib graph and stored at `app.state.store`.

On shutdown:

- `app.state.sparql.shutdown()` is called to close the HTTP client (AnzoGraph mode); it is a no-op in RDFLib mode.
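The engine selection in startup step 1 can be sketched as a simple factory. This is a minimal illustration, not the real module: the actual `RdflibEngine`/`AnzoGraphEngine` take configuration (TTL path, endpoint, credentials) and do real work in `startup()`/`shutdown()`.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    graph_backend: str = "rdflib"


class RdflibEngine:
    name = "rdflib"

    async def startup(self) -> None:
        # The real engine parses the TTL file into an in-memory rdflib.Graph here.
        pass

    async def shutdown(self) -> None:
        # Nothing to release for the in-memory graph.
        pass


class AnzoGraphEngine:
    name = "anzograph"

    async def startup(self) -> None:
        # The real engine opens an HTTP client and runs the readiness gate here.
        pass

    async def shutdown(self) -> None:
        # The real engine closes its HTTP client here.
        pass


def create_sparql_engine(settings: Settings):
    # Choose the engine implementation from GRAPH_BACKEND.
    if settings.graph_backend == "rdflib":
        return RdflibEngine()
    if settings.graph_backend == "anzograph":
        return AnzoGraphEngine()
    raise ValueError(f"unknown GRAPH_BACKEND: {settings.graph_backend!r}")
```

Because both engines expose the same interface, endpoint code never branches on the backend type outside this factory.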
## Environment Variables

Most configuration is intended to be provided via container environment variables (see the repo root `.env` and `docker-compose.yml`).

Core:

- `GRAPH_BACKEND`: `rdflib` or `anzograph`
- `INCLUDE_BNODES`: `true`/`false`
- `CORS_ORIGINS`: comma-separated list or `*`

RDFLib mode:

- `TTL_PATH`: path inside the backend container to a `.ttl` file (example: `/data/o3po.ttl`)
- `MAX_TRIPLES`: optional integer; if set, parsing stops after this many triples

AnzoGraph mode:

- `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
- `SPARQL_ENDPOINT`: optional full endpoint; if set, overrides `${SPARQL_HOST}/sparql`
- `SPARQL_USER`, `SPARQL_PASS`: Basic auth credentials
- `SPARQL_DATA_FILE`: file URI as seen by the **AnzoGraph container** (example: `file:///opt/shared-files/o3po.ttl`)
- `SPARQL_GRAPH_IRI`: optional graph IRI for `LOAD ... INTO GRAPH <...>`
- `SPARQL_LOAD_ON_START`: `true` to execute `LOAD <SPARQL_DATA_FILE>` during startup
- `SPARQL_CLEAR_ON_START`: `true` to execute `CLEAR ALL` during startup (dangerous)
- `SPARQL_TIMEOUT_S`: request timeout for normal SPARQL requests
- `SPARQL_READY_RETRIES`, `SPARQL_READY_DELAY_S`, `SPARQL_READY_TIMEOUT_S`: readiness gate parameters
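For illustration, a minimal `.env` for AnzoGraph mode might look like the following. The values are examples assembled from this README (host, file URI); the credentials are placeholders, not defaults shipped with the project.

```shell
# Select the AnzoGraph backend.
GRAPH_BACKEND=anzograph
INCLUDE_BNODES=false
CORS_ORIGINS=*

# AnzoGraph endpoint and credentials (adjust to your deployment).
SPARQL_HOST=http://anzograph:8080
SPARQL_USER=admin
SPARQL_PASS=changeme

# Load the shared TTL into the default graph on startup.
SPARQL_DATA_FILE=file:///opt/shared-files/o3po.ttl
SPARQL_LOAD_ON_START=true
```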
## AnzoGraph Readiness Gate

`AnzoGraphEngine` does not assume "container started" means "SPARQL works".
It waits for a smoke-test POST:

- Method: `POST ${SPARQL_ENDPOINT}`
- Headers:
  - `Content-Type: application/x-www-form-urlencoded`
  - `Accept: application/sparql-results+json`
  - `Authorization: Basic ...` (if configured)
- Body: `query=ASK WHERE { ?s ?p ?o }`
- Success condition: HTTP 2xx and the response parses as JSON

This matches the behavior described in `docs/anzograph-readiness-julia.md`.
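The retry loop around that smoke test can be sketched independently of the HTTP client. Here `post_ask` stands in for whatever performs the POST described above; the function name and signature are illustrative, not the actual `AnzoGraphEngine` API.

```python
import json
import time
from typing import Callable, Tuple


def wait_until_ready(
    post_ask: Callable[[], Tuple[int, str]],
    *,
    retries: int = 30,
    delay_s: float = 4.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the endpoint answers the ASK smoke test."""
    for attempt in range(retries):
        try:
            status, body = post_ask()
            # Success condition: HTTP 2xx and the body parses as JSON.
            if 200 <= status < 300:
                json.loads(body)
                return True
        except Exception:
            pass  # connection refused, malformed JSON, etc. -- retry
        if attempt < retries - 1:
            sleep(delay_s)
    return False
```

Injecting `sleep` keeps the loop testable without real delays; the production defaults correspond to `SPARQL_READY_RETRIES` and `SPARQL_READY_DELAY_S`.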
## API Endpoints

- `GET /api/health`
  - Returns `{ "status": "ok" }`.
- `GET /api/stats`
  - Returns counts for the same snapshot used by `/api/graph` (via the snapshot cache).
- `POST /api/sparql`
  - Body: `{ "query": "<SPARQL SELECT/ASK>" }`
  - Returns SPARQL JSON results as-is.
  - Notes:
    - This endpoint is intended for **SELECT/ASK queries returning SPARQL-JSON**.
    - SPARQL UPDATE is not exposed here (AnzoGraph `LOAD`/`CLEAR` are handled internally during startup).
- `GET /api/graph?node_limit=...&edge_limit=...`
  - Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
  - Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
- `GET /api/nodes`, `GET /api/edges`
  - Only available in `GRAPH_BACKEND=rdflib` (these use `RDFStore`'s dense ID tables).

## Data Contract

### Node

Returned in `nodes[]` (dense IDs; suitable for indexing into typed arrays):

```json
{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
```

- `id`: integer dense node ID used in edges
- `termType`: `"uri"` or `"bnode"`
- `iri`: URI string; blank nodes are normalized to `_:<id>`
- `label`: currently `null` in `/api/graph` snapshots (pipelines can populate it later)
- `x`/`y`: world-space coordinates for rendering (currently a deterministic spiral layout)

### Edge

Returned in `edges[]`:

```json
{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
```

- `source`/`target`: dense node IDs (indexes into `nodes[]`)
- `predicate`: predicate IRI string

## Snapshot Query (`/api/graph`)

`/api/graph` uses a SPARQL query that:

- selects triples `?s ?p ?o`
- excludes literal objects (`FILTER(!isLiteral(?o))`)
- excludes the `rdfs:label`, `skos:prefLabel`, and `skos:altLabel` predicates
- optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
- applies `LIMIT edge_limit`

The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
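The first-seen dense-ID mapping can be sketched as follows. This is a simplified stand-in for `graph_from_sparql_bindings` (the real helper also handles bnode normalization, `node_limit`, and the bnode filter), assuming standard SPARQL-JSON binding dicts with `type`/`value` keys.

```python
from typing import Dict, List, Tuple


def bindings_to_graph(bindings: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Map SPARQL JSON bindings for ?s ?p ?o to dense-ID nodes and edges."""
    ids: Dict[str, int] = {}  # IRI -> dense ID, assigned in first-seen order
    nodes: List[dict] = []
    edges: List[dict] = []

    def intern(term: dict) -> int:
        iri = term["value"]
        if iri not in ids:
            ids[iri] = len(nodes)
            nodes.append({"id": len(nodes), "termType": term["type"], "iri": iri, "label": None})
        return ids[iri]

    for b in bindings:
        edges.append({
            "source": intern(b["s"]),
            "target": intern(b["o"]),
            "predicate": b["p"]["value"],
        })
    return nodes, edges
```

Because IDs are assigned in first-seen order, `edges[].source`/`edges[].target` are valid indexes into `nodes[]`, which is what lets the frontend pack them straight into typed arrays.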
`/api/graph` also returns `meta` with snapshot counts and engine info so the frontend doesn't need to call `/api/stats`.

## Pipelines

### `pipelines/graph_snapshot.py`

`fetch_graph_snapshot(...)` is the main "export graph" pipeline used by `/api/graph`.

### `pipelines/subclass_labels.py`

`extract_subclass_entities_and_labels(...)`:

1. Queries all `rdfs:subClassOf` triples.
2. Builds a unique set of subjects + objects, then converts it to a deterministic list.
3. Queries `rdfs:label` for those entities and returns aligned lists:
   - `entities[i]` corresponds to `labels[i]`.
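Steps 2-3 amount to deduplicating while preserving order, then joining labels back by entity. A minimal sketch (the function name and the label-lookup shape are illustrative, not the pipeline's actual signature):

```python
from typing import Dict, List, Optional, Tuple


def align_labels(
    subclass_pairs: List[Tuple[str, str]],
    labels_by_entity: Dict[str, str],
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (entities, labels) with entities[i] matching labels[i]."""
    entities: List[str] = []
    seen: set = set()
    # Deterministic: first-seen order over the subject then object of each pair.
    for s, o in subclass_pairs:
        for e in (s, o):
            if e not in seen:
                seen.add(e)
                entities.append(e)
    # Entities without an rdfs:label get None so the lists stay aligned.
    labels = [labels_by_entity.get(e) for e in entities]
    return entities, labels
```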
## Notes / Tradeoffs

- `/api/graph` returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
- RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
- `rdf_store.py` is currently only needed for `/api/nodes`, `/api/edges`, and rdflib-mode `/api/stats`. If you don't use those endpoints, it can be removed later.
backend/app/main.py
@@ -5,10 +5,10 @@ from contextlib import asynccontextmanager
 from fastapi import FastAPI, HTTPException, Query
 from fastapi.middleware.cors import CORSMiddleware
 
-from .graph_export import edge_retrieval_query, graph_from_sparql_bindings
 from .models import EdgesResponse, GraphResponse, NodesResponse, SparqlQueryRequest, StatsResponse
+from .pipelines.snapshot_service import GraphSnapshotService
 from .rdf_store import RDFStore
-from .sparql_engine import AnzoGraphEngine, RdflibEngine, SparqlEngine, create_sparql_engine
+from .sparql_engine import RdflibEngine, SparqlEngine, create_sparql_engine
 from .settings import Settings
@@ -20,6 +20,7 @@ async def lifespan(app: FastAPI):
     sparql: SparqlEngine = create_sparql_engine(settings)
     await sparql.startup()
     app.state.sparql = sparql
+    app.state.snapshot_service = GraphSnapshotService(sparql=sparql, settings=settings)
 
     # Only build node/edge tables when running in rdflib mode.
     if settings.graph_backend == "rdflib":
@@ -59,70 +60,17 @@ def health() -> dict[str, str]:
 
 @app.get("/api/stats", response_model=StatsResponse)
 async def stats() -> StatsResponse:
-    sparql: SparqlEngine = app.state.sparql
-    if settings.graph_backend == "rdflib":
-        store: RDFStore = app.state.store
-        return StatsResponse(
-            backend=sparql.name,
-            ttl_path=settings.ttl_path,
-            sparql_endpoint=None,
-            parsed_triples=store.parsed_triples,
-            nodes=store.node_count,
-            edges=store.edge_count,
-        )
-
-    # AnzoGraph: compute basic counts via SPARQL.
-    assert isinstance(sparql, AnzoGraphEngine)
-
-    def _count_from(result: dict, *, var: str = "count") -> int:
-        bindings = (((result.get("results") or {}).get("bindings")) or [])
-        if not bindings:
-            return 0
-        raw = bindings[0].get(var, {}).get("value")
-        try:
-            return int(raw)
-        except Exception:
-            return 0
-
-    bnode_filter = "" if settings.include_bnodes else "FILTER(!isBlank(?n))"
-    nodes_q = f"""
-        SELECT (COUNT(DISTINCT ?n) AS ?count)
-        WHERE {{
-            {{ ?n ?p ?o }} UNION {{ ?s ?p ?n }}
-            FILTER(!isLiteral(?n))
-            {bnode_filter}
-        }}
-    """
-    triples_q = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"
-
-    # Approximate "edges" similarly to our rdflib export: non-literal object, and skip label predicates.
-    edges_bnode_filter = "" if settings.include_bnodes else "FILTER(!isBlank(?s) && !isBlank(?o))"
-    edges_q = f"""
-        SELECT (COUNT(*) AS ?count)
-        WHERE {{
-            ?s ?p ?o .
-            FILTER(!isLiteral(?o))
-            FILTER(?p NOT IN (
-                <http://www.w3.org/2000/01/rdf-schema#label>,
-                <http://www.w3.org/2004/02/skos/core#prefLabel>,
-                <http://www.w3.org/2004/02/skos/core#altLabel>
-            ))
-            {edges_bnode_filter}
-        }}
-    """
-
-    triples_res = await sparql.query_json(triples_q)
-    nodes_res = await sparql.query_json(nodes_q)
-    edges_res = await sparql.query_json(edges_q)
-
-    return StatsResponse(
-        backend=sparql.name,
-        ttl_path=settings.ttl_path,
-        sparql_endpoint=settings.effective_sparql_endpoint(),
-        parsed_triples=_count_from(triples_res),
-        nodes=_count_from(nodes_res),
-        edges=_count_from(edges_res),
-    )
+    # Stats reflect exactly what we send to the frontend (/api/graph), not global graph size.
+    svc: GraphSnapshotService = app.state.snapshot_service
+    snap = await svc.get(node_limit=50_000, edge_limit=100_000)
+    meta = snap.meta
+    return StatsResponse(
+        backend=meta.backend if meta else app.state.sparql.name,
+        ttl_path=meta.ttl_path if meta and meta.ttl_path else settings.ttl_path,
+        sparql_endpoint=meta.sparql_endpoint if meta else None,
+        parsed_triples=len(snap.edges),
+        nodes=len(snap.nodes),
+        edges=len(snap.edges),
+    )
@@ -160,15 +108,5 @@ async def graph(
     node_limit: int = Query(default=50_000, ge=1, le=200_000),
     edge_limit: int = Query(default=100_000, ge=1, le=500_000),
 ) -> GraphResponse:
-    sparql: SparqlEngine = app.state.sparql
-
-    # Use SPARQL for graph export in BOTH modes so callers don't care which backend is in use.
-    edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
-    res = await sparql.query_json(edges_q)
-    bindings = (((res.get("results") or {}).get("bindings")) or [])
-    nodes, edges = graph_from_sparql_bindings(
-        bindings,
-        node_limit=node_limit,
-        include_bnodes=settings.include_bnodes,
-    )
-    return GraphResponse(nodes=nodes, edges=edges)
+    svc: GraphSnapshotService = app.state.snapshot_service
+    return await svc.get(node_limit=node_limit, edge_limit=edge_limit)
backend/app/models.py
@@ -8,6 +8,9 @@ class Node(BaseModel):
     termType: str  # "uri" | "bnode"
     iri: str
     label: str | None = None
+    # Optional because /api/nodes (RDFStore) doesn't currently provide positions.
+    x: float | None = None
+    y: float | None = None
 
 
 class Edge(BaseModel):
@@ -36,8 +39,19 @@ class EdgesResponse(BaseModel):
 
 
 class GraphResponse(BaseModel):
+    class Meta(BaseModel):
+        backend: str
+        ttl_path: str | None = None
+        sparql_endpoint: str | None = None
+        include_bnodes: bool
+        node_limit: int
+        edge_limit: int
+        nodes: int
+        edges: int
+
     nodes: list[Node]
     edges: list[Edge]
+    meta: Meta | None = None
 
 
 class SparqlQueryRequest(BaseModel):
backend/app/pipelines/graph_snapshot.py (new file, 46 lines)

from __future__ import annotations

from ..graph_export import edge_retrieval_query, graph_from_sparql_bindings
from ..models import GraphResponse
from ..sparql_engine import SparqlEngine
from ..settings import Settings
from .layout_spiral import spiral_positions


async def fetch_graph_snapshot(
    sparql: SparqlEngine,
    *,
    settings: Settings,
    node_limit: int,
    edge_limit: int,
) -> GraphResponse:
    """
    Fetch a graph snapshot (nodes + edges) via SPARQL, independent of whether the
    underlying engine is RDFLib or AnzoGraph.
    """
    edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
    res = await sparql.query_json(edges_q)
    bindings = (((res.get("results") or {}).get("bindings")) or [])
    nodes, edges = graph_from_sparql_bindings(
        bindings,
        node_limit=node_limit,
        include_bnodes=settings.include_bnodes,
    )

    # Add positions so the frontend doesn't need to run a layout.
    xs, ys = spiral_positions(len(nodes))
    for i, node in enumerate(nodes):
        node["x"] = float(xs[i])
        node["y"] = float(ys[i])

    meta = GraphResponse.Meta(
        backend=sparql.name,
        ttl_path=settings.ttl_path if settings.graph_backend == "rdflib" else None,
        sparql_endpoint=settings.effective_sparql_endpoint() if settings.graph_backend == "anzograph" else None,
        include_bnodes=settings.include_bnodes,
        node_limit=node_limit,
        edge_limit=edge_limit,
        nodes=len(nodes),
        edges=len(edges),
    )
    return GraphResponse(nodes=nodes, edges=edges, meta=meta)
backend/app/pipelines/layout_spiral.py (new file, 30 lines)

from __future__ import annotations

import math


def spiral_positions(n: int, *, max_r: float = 5000.0) -> tuple[list[float], list[float]]:
    """
    Deterministic "sunflower" (golden-angle) spiral layout.

    This is intentionally simple and stable across runs:
    - angle increments by the golden angle to avoid radial spokes
    - radius grows with sqrt(i) to keep density roughly uniform over area
    """
    if n <= 0:
        return ([], [])

    xs = [0.0] * n
    ys = [0.0] * n

    golden = math.pi * (3.0 - math.sqrt(5.0))
    denom = float(max(1, n - 1))

    for i in range(n):
        t = i * golden
        r = math.sqrt(i / denom) * max_r
        xs[i] = r * math.cos(t)
        ys[i] = r * math.sin(t)

    return xs, ys
backend/app/pipelines/snapshot_service.py (new file, 63 lines)

from __future__ import annotations

import asyncio
from dataclasses import dataclass

from ..models import GraphResponse
from ..sparql_engine import SparqlEngine
from ..settings import Settings
from .graph_snapshot import fetch_graph_snapshot


@dataclass(frozen=True)
class SnapshotKey:
    node_limit: int
    edge_limit: int
    include_bnodes: bool


class GraphSnapshotService:
    """
    Caches graph snapshots so the backend doesn't re-run expensive SPARQL for stats/graph.
    """

    def __init__(self, *, sparql: SparqlEngine, settings: Settings):
        self._sparql = sparql
        self._settings = settings

        self._cache: dict[SnapshotKey, GraphResponse] = {}
        self._locks: dict[SnapshotKey, asyncio.Lock] = {}
        self._global_lock = asyncio.Lock()

    async def get(self, *, node_limit: int, edge_limit: int) -> GraphResponse:
        key = SnapshotKey(
            node_limit=node_limit,
            edge_limit=edge_limit,
            include_bnodes=self._settings.include_bnodes,
        )

        cached = self._cache.get(key)
        if cached is not None:
            return cached

        # Create/get a per-key lock under a global lock to avoid races.
        async with self._global_lock:
            lock = self._locks.get(key)
            if lock is None:
                lock = asyncio.Lock()
                self._locks[key] = lock

        async with lock:
            cached2 = self._cache.get(key)
            if cached2 is not None:
                return cached2

            snapshot = await fetch_graph_snapshot(
                self._sparql,
                settings=self._settings,
                node_limit=node_limit,
                edge_limit=edge_limit,
            )
            self._cache[key] = snapshot
            return snapshot
docker-compose.yml
@@ -20,12 +20,16 @@ services:
       - SPARQL_TIMEOUT_S=${SPARQL_TIMEOUT_S:-300}
       - SPARQL_READY_RETRIES=${SPARQL_READY_RETRIES:-30}
       - SPARQL_READY_DELAY_S=${SPARQL_READY_DELAY_S:-4}
+      - SPARQL_READY_TIMEOUT_S=${SPARQL_READY_TIMEOUT_S:-10}
     volumes:
       - ./backend:/app
       - ./data:/data:ro
     command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
-    depends_on:
-      - anzograph
+    healthcheck:
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health').read()"]
+      interval: 5s
+      timeout: 3s
+      retries: 60
 
   frontend:
     build: ./frontend
@@ -38,6 +42,8 @@ services:
       - /app/node_modules
     depends_on:
       - backend
+    # Docker Compose v1 doesn't support depends_on:condition. Do an explicit wait here.
+    command: sh -c "until wget -qO- http://backend:8000/api/health >/dev/null 2>&1; do echo 'waiting for backend...'; sleep 1; done; npm run dev -- --host --port 5173"
 
   anzograph:
     image: cambridgesemantics/anzograph:latest
frontend App.tsx
@@ -1,10 +1,14 @@
 import { useEffect, useRef, useState } from "react";
 import { Renderer } from "./renderer";
 
+function sleep(ms: number): Promise<void> {
+  return new Promise((r) => setTimeout(r, ms));
+}
+
 export default function App() {
   const canvasRef = useRef<HTMLCanvasElement>(null);
   const rendererRef = useRef<Renderer | null>(null);
-  const [status, setStatus] = useState("Loading node positions…");
+  const [status, setStatus] = useState("Waiting for backend…");
   const [nodeCount, setNodeCount] = useState(0);
   const [stats, setStats] = useState({
     fps: 0,
@@ -16,7 +20,7 @@ export default function App() {
   const [error, setError] = useState("");
   const [hoveredNode, setHoveredNode] = useState<{ x: number; y: number; screenX: number; screenY: number } | null>(null);
   const [selectedNodes, setSelectedNodes] = useState<Set<number>>(new Set());
-  const [backendStats, setBackendStats] = useState<{ nodes: number; edges: number; parsed_triples: number } | null>(null);
+  const [backendStats, setBackendStats] = useState<{ nodes: number; edges: number; backend?: string } | null>(null);
 
   // Store mouse position in a ref so it can be accessed in render loop without re-renders
   const mousePos = useRef({ x: 0, y: 0 });
@@ -36,68 +40,79 @@ export default function App() {
 
     let cancelled = false;
 
-    // Optional: fetch backend stats (proxied via Vite) so you can confirm backend is up.
-    fetch("/api/stats")
-      .then((r) => (r.ok ? r.json() : null))
-      .then((j) => {
-        if (!j || cancelled) return;
-        if (typeof j.nodes === "number" && typeof j.edges === "number" && typeof j.parsed_triples === "number") {
-          setBackendStats({ nodes: j.nodes, edges: j.edges, parsed_triples: j.parsed_triples });
-        }
-      })
-      .catch(() => {
-        // Backend is optional; ignore failures.
-      });
-
-    // Fetch CSVs, parse, and init renderer
     (async () => {
       try {
-        setStatus("Fetching data files…");
-        const [nodesResponse, edgesResponse] = await Promise.all([
-          fetch("/node_positions.csv"),
-          fetch("/edges.csv"),
-        ]);
-        if (!nodesResponse.ok) throw new Error(`Failed to fetch nodes: ${nodesResponse.status}`);
-        if (!edgesResponse.ok) throw new Error(`Failed to fetch edges: ${edgesResponse.status}`);
-
-        const [nodesText, edgesText] = await Promise.all([
-          nodesResponse.text(),
-          edgesResponse.text(),
-        ]);
+        // Wait for backend (docker-compose also gates startup via healthcheck, but this
+        // handles running the frontend standalone).
+        const deadline = performance.now() + 180_000;
+        let attempt = 0;
+        while (performance.now() < deadline) {
+          attempt++;
+          setStatus(`Waiting for backend… (attempt ${attempt})`);
+          try {
+            const res = await fetch("/api/health");
+            if (res.ok) break;
+          } catch {
+            // ignore and retry
+          }
+          await sleep(1000);
+          if (cancelled) return;
+        }
+
+        setStatus("Fetching graph…");
+        const graphRes = await fetch("/api/graph");
+        if (!graphRes.ok) throw new Error(`Failed to fetch graph: ${graphRes.status}`);
+        const graph = await graphRes.json();
         if (cancelled) return;
 
-        setStatus("Parsing positions…");
-        const nodeLines = nodesText.split("\n").slice(1).filter(l => l.trim().length > 0);
-        const count = nodeLines.length;
+        const nodes = Array.isArray(graph.nodes) ? graph.nodes : [];
+        const edges = Array.isArray(graph.edges) ? graph.edges : [];
+        const meta = graph.meta || null;
+        const count = nodes.length;
 
+        // Build positions from backend-provided node coordinates.
+        setStatus("Preparing buffers…");
         const xs = new Float32Array(count);
         const ys = new Float32Array(count);
+        for (let i = 0; i < count; i++) {
+          const nx = nodes[i]?.x;
+          const ny = nodes[i]?.y;
+          xs[i] = typeof nx === "number" ? nx : 0;
+          ys[i] = typeof ny === "number" ? ny : 0;
+        }
         const vertexIds = new Uint32Array(count);
         for (let i = 0; i < count; i++) {
-          const parts = nodeLines[i].split(",");
-          vertexIds[i] = parseInt(parts[0], 10);
-          xs[i] = parseFloat(parts[1]);
-          ys[i] = parseFloat(parts[2]);
+          const id = nodes[i]?.id;
+          vertexIds[i] = typeof id === "number" ? id >>> 0 : i;
         }
 
-        setStatus("Parsing edges…");
-        const edgeLines = edgesText.split("\n").slice(1).filter(l => l.trim().length > 0);
-        const edgeData = new Uint32Array(edgeLines.length * 2);
-        for (let i = 0; i < edgeLines.length; i++) {
-          const parts = edgeLines[i].split(",");
-          edgeData[i * 2] = parseInt(parts[0], 10);
-          edgeData[i * 2 + 1] = parseInt(parts[1], 10);
+        // Build edges as vertex-id pairs.
+        const edgeData = new Uint32Array(edges.length * 2);
+        for (let i = 0; i < edges.length; i++) {
+          const s = edges[i]?.source;
+          const t = edges[i]?.target;
+          edgeData[i * 2] = typeof s === "number" ? s >>> 0 : 0;
+          edgeData[i * 2 + 1] = typeof t === "number" ? t >>> 0 : 0;
         }
 
-        if (cancelled) return;
+        // Use /api/graph meta; don't do a second expensive backend call.
+        if (meta && typeof meta.nodes === "number" && typeof meta.edges === "number") {
+          setBackendStats({
+            nodes: meta.nodes,
+            edges: meta.edges,
+            backend: typeof meta.backend === "string" ? meta.backend : undefined,
+          });
+        } else {
+          setBackendStats({ nodes: nodes.length, edges: edges.length });
+        }
 
         setStatus("Building spatial index…");
-        await new Promise(r => setTimeout(r, 0));
+        await new Promise((r) => setTimeout(r, 0));
 
        const buildMs = renderer.init(xs, ys, vertexIds, edgeData);
         setNodeCount(renderer.getNodeCount());
         setStatus("");
-        console.log(`Init complete: ${count.toLocaleString()} nodes, ${edgeLines.length.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
+        console.log(`Init complete: ${count.toLocaleString()} nodes, ${edges.length.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
       } catch (e) {
         if (!cancelled) {
           setError(e instanceof Error ? e.message : String(e));
@@ -295,7 +310,7 @@ export default function App() {
           <div style={{ color: "#f80" }}>Selected: {selectedNodes.size}</div>
           {backendStats && (
             <div style={{ color: "#8f8" }}>
-              Backend: {backendStats.nodes.toLocaleString()} nodes, {backendStats.edges.toLocaleString()} edges
+              Backend{backendStats.backend ? ` (${backendStats.backend})` : ""}: {backendStats.nodes.toLocaleString()} nodes, {backendStats.edges.toLocaleString()} edges
             </div>
           )}
         </div>