Compare commits

4 commits — graph-awar ... a75b5b93da

| Author | SHA1 | Date |
|---|---|---|
|  | a75b5b93da |  |
|  | d4bfa5f064 |  |
|  | bba0ae887d |  |
|  | bf03d333f9 |  |
.gitignore (vendored, 5 changes)

@@ -1,4 +1,9 @@
.direnv/
.envrc
.env
backend/.env
frontend/node_modules/
frontend/dist/
.npm/
.vite/
data/
README.md (deleted, 67 lines)

@@ -1,67 +0,0 @@
# Large Instanced Ontology Visualizer

An experimental visualizer designed to render and explore massive instanced ontologies (millions of nodes) at interactive performance.

## 🚀 The Core Challenge

Ontologies with millions of instances present a significant rendering challenge for traditional graph visualization tools. This project addresses the problem by:

1. **Selective Rendering:** Only rendering up to a set limit of nodes (e.g., 2 million) at any given time.
2. **Adaptive Sampling:** When zoomed out, it draws a representative spatial sample of the nodes. When zoomed in, the number of nodes within the viewport naturally falls below the rendering limit, allowing 100% detail with no performance degradation.
3. **Spatial Indexing:** A custom quadtree manages millions of points in memory and efficiently determines visibility.
## 🛠 Technical Architecture

### 1. Data Pipeline & AnzoGraph Integration

The project features an automated pipeline to extract and prepare data from an **AnzoGraph** database:

- **SPARQL Extraction:** `scripts/fetch_from_db.ts` connects to AnzoGraph via its SPARQL endpoint. It fetches a seed set of subjects and their related triples, identifying "primary" nodes (objects of `rdf:type`).
- **Graph Classification:** Instances are categorized to distinguish between classes and relationships.
- **Force-Directed Layout:** `scripts/compute_layout.ts` computes 2D positions for the nodes using a **Barnes-Hut**-optimized force-directed simulation, which scales to large graphs.

### 2. Quadtree Spatial Index

To handle millions of nodes without per-frame object allocation:

- **In-place Sorting:** The quadtree (`src/quadtree.ts`) spatially sorts the raw `Float32Array` of positions at build time.
- **Index-Based Access:** Leaves store only index ranges into the sorted array, pointing directly at the data sent to the GPU.
- **Fast Lookups:** Used both for frustum culling and for efficient "find node under cursor" queries.
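The "sort once, draw index ranges" idea can be sketched in plain Python (illustrative only — the real implementation lives in `src/quadtree.ts` and operates on a `Float32Array`; names and thresholds here are assumptions):

```python
import random

def build_leaves(xs, ys, order, lo, hi, depth=0, max_leaf=4, max_depth=8):
    """Recursively partition order[lo:hi] (indices into xs/ys) into quadrants,
    reordering `order` in place so each leaf's points are contiguous.

    Returns a list of (start, end) index ranges into `order`; a renderer can
    upload the data once in this order and draw each leaf as one contiguous range.
    """
    if hi - lo <= max_leaf or depth >= max_depth:
        return [(lo, hi)]
    sel = order[lo:hi]
    cx = sum(xs[i] for i in sel) / len(sel)
    cy = sum(ys[i] for i in sel) / len(sel)
    # 2-bit quadrant key; a stable sort by key makes each child's points contiguous.
    key = lambda i: (xs[i] >= cx) * 2 + (ys[i] >= cy)
    sel.sort(key=key)
    order[lo:hi] = sel
    leaves, start = [], lo
    for quad in range(4):
        count = sum(1 for i in sel if key(i) == quad)
        if count:
            leaves += build_leaves(xs, ys, order, start, start + count,
                                   depth + 1, max_leaf, max_depth)
        start += count
    return leaves

random.seed(42)
pts = 300
xs = [random.random() for _ in range(pts)]
ys = [random.random() for _ in range(pts)]
order = list(range(pts))
leaves = build_leaves(xs, ys, order, 0, pts)
# `order` is now spatially sorted; each (start, end) in `leaves` is one draw range.
```

The production version does the same thing over typed arrays, which is what makes per-frame culling allocation-free.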
### 3. WebGL 2 High-Performance Renderer

The renderer (`src/renderer.ts`) is built for maximum throughput:

- **`WEBGL_multi_draw` Extension:** Batches multiple leaf nodes into single draw calls, minimizing CPU overhead.
- **Zero-Allocation Render Loop:** The frame loop uses pre-allocated typed arrays to prevent GC pauses.
- **Dynamic Level of Detail (LOD):**
  - **Points:** Always visible, with adaptive density based on zoom.
  - **Lines:** Automatically rendered when zoomed in far enough to see individual relationships (< 20k visible nodes).
- **Selection:** Interactively selecting a node highlights its immediate neighbors (incoming/outgoing edges).
## 🚦 Getting Started

### Prerequisites

- Docker and Docker Compose
- Node.js (for local development)

### Deployment

The project includes a `docker-compose.yml` that spins up both the **AnzoGraph** database and the visualizer app.

```bash
# Start the services
docker-compose up -d

# Inside the app container, the following run automatically:
# 1. Fetch data from AnzoGraph (fetch_from_db.ts)
# 2. Compute the 2D layout (compute_layout.ts)
# 3. Start the Vite development server
```

The app will be available at `http://localhost:5173`.
## 🖱 Interactions

- **Drag:** Pan the view.
- **Scroll:** Zoom in/out at the cursor position.
- **Click:** Select a node to see its URI/label and highlight its neighbors.
- **HUD:** Real-time stats on FPS, nodes drawn, and the current sampling ratio.

## TODO

- **Positioning:** Use a better node-positioning algorithm that avoids edge crossings as much as possible while keeping the graph compact.
- **Positioning:** Decide how to handle entities that are both instances and classes.
- **Functionality:** Find every piece of equipment with a specific property, or that participates in a specific process.
- **Functionality:** Find every piece of equipment that is connected to a well.
- **Functionality:** Show every connection within a specified depth.
- **Functionality:** Show every element of a specific class.
backend/Dockerfile (new file, 16 lines)

@@ -0,0 +1,16 @@
FROM python:3.12-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

COPY requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir -r /app/requirements.txt

COPY app /app/app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
backend/app/README.md (new file, 197 lines)

@@ -0,0 +1,197 @@
# Backend App (`backend/app`)

This folder contains the FastAPI backend for `visualizador_instanciados`.

The backend can execute SPARQL queries in two interchangeable ways:

1. **`GRAPH_BACKEND=rdflib`**: parse a Turtle file into an in-memory RDFLib `Graph` and run SPARQL queries locally.
2. **`GRAPH_BACKEND=anzograph`**: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally `LOAD` a TTL file on startup).

Callers (frontend or other clients) interact with a single API surface (`/api/*`) and do not need to know which backend is configured.
## Files

- `main.py`
  - FastAPI app setup, startup/shutdown (`lifespan`), and HTTP endpoints.
- `settings.py`
  - Env-driven configuration (`pydantic-settings`).
- `sparql_engine.py`
  - Backend-agnostic SPARQL execution layer:
    - `RdflibEngine`: `Graph.query(...)` + SPARQL JSON serialization.
    - `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + a readiness gate.
    - `create_sparql_engine(settings)` chooses the engine based on `GRAPH_BACKEND`.
- `graph_export.py`
  - Shared helpers to:
    - build the snapshot SPARQL query used for edge retrieval
    - map SPARQL JSON bindings to `{nodes, edges}`.
- `models.py`
  - Pydantic request/response models:
    - `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
- `rdf_store.py`
  - A local parsed representation (dense IDs + neighbor data) built only when `GRAPH_BACKEND=rdflib`.
  - Used by `/api/nodes`, `/api/edges`, and `rdflib`-mode `/api/stats`.
- `pipelines/graph_snapshot.py`
  - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL (works for both RDFLib and AnzoGraph).
- `pipelines/layout_dag_radial.py`
  - DAG layout helpers used by `pipelines/graph_snapshot.py`:
    - cycle detection
    - level-synchronous Kahn layering
    - radial (ring-per-layer) positioning.
- `pipelines/snapshot_service.py`
  - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run expensive SPARQL twice.
- `pipelines/subclass_labels.py`
  - Pipeline to extract `rdfs:subClassOf` entities and an aligned `rdfs:label` list.
## Runtime Flow

On startup (FastAPI lifespan):

1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
2. The engine is stored at `app.state.sparql`.
3. If `GRAPH_BACKEND=rdflib`, `RDFStore` is also built from the already-loaded RDFLib graph and stored at `app.state.store`.

On shutdown:

- `app.state.sparql.shutdown()` is called; it closes the HTTP client in AnzoGraph mode and is a no-op in RDFLib mode.
## Environment Variables

Most configuration is intended to be provided via container environment variables (see the repo-root `.env` and `docker-compose.yml`).

Core:

- `GRAPH_BACKEND`: `rdflib` or `anzograph`
- `INCLUDE_BNODES`: `true`/`false`
- `CORS_ORIGINS`: comma-separated list or `*`

RDFLib mode:

- `TTL_PATH`: path inside the backend container to a `.ttl` file (example: `/data/o3po.ttl`)
- `MAX_TRIPLES`: optional int; if set, parsing stops after this many triples

Optional import-combining step (runs before the SPARQL engine starts):

- `COMBINE_OWL_IMPORTS_ON_START`: `true` to recursively load `TTL_PATH` (or `COMBINE_ENTRY_LOCATION`) plus its `owl:imports` and write a combined TTL file.
- `COMBINE_ENTRY_LOCATION`: optional override for the entry file/URL to load (defaults to `TTL_PATH`)
- `COMBINE_OUTPUT_LOCATION`: optional explicit output path (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists

AnzoGraph mode:

- `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
- `SPARQL_ENDPOINT`: optional full endpoint; if set, overrides `${SPARQL_HOST}/sparql`
- `SPARQL_USER`, `SPARQL_PASS`: Basic auth credentials
- `SPARQL_DATA_FILE`: file URI as seen by the **AnzoGraph container** (example: `file:///opt/shared-files/o3po.ttl`)
- `SPARQL_GRAPH_IRI`: optional graph IRI for `LOAD ... INTO GRAPH <...>`
- `SPARQL_LOAD_ON_START`: `true` to execute `LOAD <SPARQL_DATA_FILE>` during startup
- `SPARQL_CLEAR_ON_START`: `true` to execute `CLEAR ALL` during startup (dangerous)
- `SPARQL_TIMEOUT_S`: request timeout for normal SPARQL requests
- `SPARQL_READY_RETRIES`, `SPARQL_READY_DELAY_S`, `SPARQL_READY_TIMEOUT_S`: readiness-gate parameters
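Putting a few of those together, a minimal AnzoGraph-mode configuration might look like this (all values illustrative; the variable names are the ones documented above):

```bash
# .env (illustrative values)
GRAPH_BACKEND=anzograph
INCLUDE_BNODES=false
CORS_ORIGINS=http://localhost:5173

SPARQL_HOST=http://anzograph:8080
SPARQL_USER=admin
SPARQL_PASS=change-me
SPARQL_DATA_FILE=file:///opt/shared-files/o3po.ttl
SPARQL_LOAD_ON_START=true
```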
## AnzoGraph Readiness Gate

`AnzoGraphEngine` does not assume that "container started" means "SPARQL works".
It waits for a smoke-test POST to succeed:

- Method: `POST ${SPARQL_ENDPOINT}`
- Headers:
  - `Content-Type: application/x-www-form-urlencoded`
  - `Accept: application/sparql-results+json`
  - `Authorization: Basic ...` (if configured)
- Body: `query=ASK WHERE { ?s ?p ?o }`
- Success condition: HTTP 2xx and the response parses as JSON

This matches the behavior described in `docs/anzograph-readiness-julia.md`.
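The same smoke test can be reproduced outside the backend with only the Python standard library (endpoint and credentials below are placeholders, and this is a sketch, not the actual `AnzoGraphEngine` code):

```python
import base64
import urllib.parse
import urllib.request

def build_ask_request(endpoint, user=None, password=None):
    """Build the readiness smoke-test request: a form-encoded SPARQL ASK."""
    body = urllib.parse.urlencode({"query": "ASK WHERE { ?s ?p ?o }"}).encode()
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/sparql-results+json",
    }
    if user is not None:
        token = base64.b64encode(f"{user}:{password or ''}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")

req = build_ask_request("http://anzograph:8080/sparql", "admin", "secret")
# Sending it (and checking "2xx + parses as JSON") is left to the caller, e.g.:
#   with urllib.request.urlopen(req, timeout=5) as resp:
#       ok = resp.status // 100 == 2 and json.load(resp) is not None
```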
## API Endpoints

- `GET /api/health`
  - Returns `{ "status": "ok" }`.
- `GET /api/stats`
  - Returns counts for the same snapshot used by `/api/graph` (via the snapshot cache).
- `POST /api/sparql`
  - Body: `{ "query": "<SPARQL SELECT/ASK>" }`
  - Returns SPARQL JSON results as-is.
  - Notes:
    - This endpoint is intended for **SELECT/ASK queries returning SPARQL-JSON**.
    - SPARQL UPDATE is not exposed here (AnzoGraph `LOAD`/`CLEAR` are handled internally during startup).
- `GET /api/graph?node_limit=...&edge_limit=...`
  - Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
  - Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
- `GET /api/nodes`, `GET /api/edges`
  - Only available when `GRAPH_BACKEND=rdflib` (these use `RDFStore`'s dense ID tables).
## Data Contract

### Node

Returned in `nodes[]` (dense IDs; suitable for indexing into typed arrays):

```json
{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
```

- `id`: integer dense node ID used in edges
- `termType`: `"uri"` or `"bnode"`
- `iri`: URI string; blank nodes are normalized to `_:<id>`
- `label`: `rdfs:label` when available (best-effort; prefers English)
- `x`/`y`: world-space coordinates for rendering (currently a radial layered layout derived from `rdfs:subClassOf`)

### Edge

Returned in `edges[]`:

```json
{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
```

- `source`/`target`: dense node IDs (indexes into `nodes[]`)
- `predicate`: predicate IRI string
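Because the IDs are dense, a client can pack the snapshot straight into flat typed arrays without any remapping; a minimal sketch (field names come from the contract above, the `snapshot` value is hand-written for illustration):

```python
from array import array

def pack_snapshot(snapshot: dict) -> tuple[array, array]:
    """Pack a {nodes, edges} snapshot into flat arrays, relying on dense IDs.

    positions[2*i] / positions[2*i+1] hold node i's x/y (float32-style);
    endpoints holds (source, target) pairs, ready for an index buffer.
    """
    nodes = snapshot["nodes"]
    positions = array("f", bytes(8 * len(nodes)))  # 2 floats per node, zero-filled
    for n in nodes:
        positions[2 * n["id"]] = n["x"] or 0.0
        positions[2 * n["id"] + 1] = n["y"] or 0.0
    endpoints = array("I")  # flat (source, target) pairs
    for e in snapshot["edges"]:
        endpoints.append(e["source"])
        endpoints.append(e["target"])
    return positions, endpoints

snapshot = {
    "nodes": [
        {"id": 0, "termType": "uri", "iri": "http://example.org/Thing", "label": None, "x": 0.0, "y": 0.0},
        {"id": 1, "termType": "uri", "iri": "http://example.org/Sub", "label": None, "x": 1.5, "y": -2.0},
    ],
    "edges": [
        {"source": 1, "target": 0, "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"},
    ],
}
positions, endpoints = pack_snapshot(snapshot)
```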
## Snapshot Query (`/api/graph`)

`/api/graph` currently uses a SPARQL query that returns hierarchy edges (see `edge_retrieval_query` in `graph_export.py`: `rdfs:subClassOf` triples, plus `rdf:type` triples whose object is an `owl:Class`):

- selects bindings as `?s ?p ?o`
- excludes literal objects (`FILTER(!isLiteral(?o))`) for safety
- optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
- applies `LIMIT edge_limit`

The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.

`/api/graph` also returns `meta` with snapshot counts and engine info so the frontend doesn't need to call `/api/stats`.

If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph` returns HTTP 422 (the layout requires a DAG).
## Pipelines

### `pipelines/graph_snapshot.py`

`fetch_graph_snapshot(...)` is the main "export graph" pipeline used by `/api/graph`.

### `pipelines/subclass_labels.py`

`extract_subclass_entities_and_labels(...)`:

1. Queries all `rdfs:subClassOf` triples.
2. Builds a unique set of subjects and objects, then converts it to a deterministic list.
3. Queries `rdfs:label` for those entities and returns aligned lists:
   - `entities[i]` corresponds to `labels[i]`.
## Notes / Tradeoffs

- `/api/graph` returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
- RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
- `rdf_store.py` is currently only needed for `/api/nodes`, `/api/edges`, and rdflib-mode `/api/stats`. If you don't use those endpoints, it can be removed later.
backend/app/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
backend/app/graph_export.py (new file, 102 lines)

@@ -0,0 +1,102 @@
from __future__ import annotations

from typing import Any


def edge_retrieval_query(*, edge_limit: int, include_bnodes: bool) -> str:
    bnode_filter = "" if include_bnodes else "FILTER(!isBlank(?s) && !isBlank(?o))"

    return f"""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT ?s ?p ?o
    WHERE {{
        {{
            VALUES ?p {{ rdf:type }}
            ?s ?p ?o .
            ?o rdf:type owl:Class .
        }}
        UNION
        {{
            VALUES ?p {{ rdfs:subClassOf }}
            ?s ?p ?o .
        }}
        FILTER(!isLiteral(?o))
        {bnode_filter}
    }}
    LIMIT {edge_limit}
    """


def graph_from_sparql_bindings(
    bindings: list[dict[str, Any]],
    *,
    node_limit: int,
    include_bnodes: bool,
) -> tuple[list[dict[str, object]], list[dict[str, object]]]:
    """
    Convert SPARQL JSON results bindings into:
      nodes: [{id, termType, iri, label}]
      edges: [{source, target, predicate}]

    IDs are assigned densely (0..N-1) based on first occurrence in bindings.
    """

    node_id_by_key: dict[tuple[str, str], int] = {}
    node_meta: list[tuple[str, str]] = []  # (termType, iri)
    out_edges: list[dict[str, object]] = []

    def term_to_key_and_iri(term: dict[str, Any]) -> tuple[tuple[str, str], tuple[str, str]] | None:
        t = term.get("type")
        v = term.get("value")
        if not t or v is None:
            return None
        if t == "literal":
            return None
        if t == "bnode":
            if not include_bnodes:
                return None
            # SPARQL JSON uses bnode identifiers without the "_:" prefix; we normalize to "_:id".
            return (("bnode", str(v)), ("bnode", f"_:{v}"))
        # Default to "uri".
        return (("uri", str(v)), ("uri", str(v)))

    def get_or_add(term: dict[str, Any]) -> int | None:
        out = term_to_key_and_iri(term)
        if out is None:
            return None
        key, meta = out
        existing = node_id_by_key.get(key)
        if existing is not None:
            return existing
        if len(node_meta) >= node_limit:
            return None
        nid = len(node_meta)
        node_id_by_key[key] = nid
        node_meta.append(meta)
        return nid

    for b in bindings:
        s_term = b.get("s") or {}
        o_term = b.get("o") or {}
        p_term = b.get("p") or {}

        sid = get_or_add(s_term)
        oid = get_or_add(o_term)
        if sid is None or oid is None:
            continue

        pred = p_term.get("value")
        if not pred:
            continue

        out_edges.append({"source": sid, "target": oid, "predicate": str(pred)})

    out_nodes = [
        {"id": i, "termType": term_type, "iri": iri, "label": None}
        for i, (term_type, iri) in enumerate(node_meta)
    ]

    return out_nodes, out_edges
backend/app/main.py (new file, 172 lines)

@@ -0,0 +1,172 @@
from __future__ import annotations

from contextlib import asynccontextmanager
import logging
import asyncio

from fastapi import FastAPI, HTTPException, Query
from fastapi.middleware.cors import CORSMiddleware

from .models import (
    EdgesResponse,
    GraphResponse,
    NeighborsRequest,
    NeighborsResponse,
    NodesResponse,
    SparqlQueryRequest,
    StatsResponse,
)
from .pipelines.layout_dag_radial import CycleError
from .pipelines.owl_imports_combiner import (
    build_combined_graph,
    output_location_to_path,
    resolve_output_location,
    serialize_graph_to_ttl,
)
from .pipelines.selection_neighbors import fetch_neighbor_ids_for_selection
from .pipelines.snapshot_service import GraphSnapshotService
from .rdf_store import RDFStore
from .sparql_engine import RdflibEngine, SparqlEngine, create_sparql_engine
from .settings import Settings


settings = Settings()
logger = logging.getLogger(__name__)


@asynccontextmanager
async def lifespan(app: FastAPI):
    rdflib_preloaded_graph = None

    if settings.combine_owl_imports_on_start:
        entry_location = settings.combine_entry_location or settings.ttl_path
        output_location = resolve_output_location(
            entry_location,
            output_location=settings.combine_output_location,
            output_name=settings.combine_output_name,
        )

        output_path = output_location_to_path(output_location)
        if output_path.exists() and not settings.combine_force:
            logger.info("Skipping combine step (output exists): %s", output_location)
        else:
            rdflib_preloaded_graph = await asyncio.to_thread(build_combined_graph, entry_location)
            logger.info("Finished combining imports; serializing to: %s", output_location)
            await asyncio.to_thread(serialize_graph_to_ttl, rdflib_preloaded_graph, output_location)

        if settings.graph_backend == "rdflib":
            settings.ttl_path = str(output_path)

    sparql: SparqlEngine = create_sparql_engine(settings, rdflib_graph=rdflib_preloaded_graph)
    await sparql.startup()
    app.state.sparql = sparql
    app.state.snapshot_service = GraphSnapshotService(sparql=sparql, settings=settings)

    # Only build node/edge tables when running in rdflib mode.
    if settings.graph_backend == "rdflib":
        assert isinstance(sparql, RdflibEngine)
        if sparql.graph is None:
            raise RuntimeError("rdflib graph failed to load")

        store = RDFStore(
            ttl_path=settings.ttl_path,
            include_bnodes=settings.include_bnodes,
            max_triples=settings.max_triples,
        )
        store.load(sparql.graph)
        app.state.store = store

    yield

    await sparql.shutdown()


app = FastAPI(title="visualizador_instanciados backend", lifespan=lifespan)

cors_origins = settings.cors_origin_list()
app.add_middleware(
    CORSMiddleware,
    allow_origins=cors_origins,
    allow_credentials=False,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/api/health")
def health() -> dict[str, str]:
    return {"status": "ok"}


@app.get("/api/stats", response_model=StatsResponse)
async def stats() -> StatsResponse:
    # Stats reflect exactly what we send to the frontend (/api/graph), not global graph size.
    svc: GraphSnapshotService = app.state.snapshot_service
    try:
        snap = await svc.get(node_limit=50_000, edge_limit=100_000)
    except CycleError as e:
        raise HTTPException(status_code=422, detail=str(e)) from None
    meta = snap.meta
    return StatsResponse(
        backend=meta.backend if meta else app.state.sparql.name,
        ttl_path=meta.ttl_path if meta and meta.ttl_path else settings.ttl_path,
        sparql_endpoint=meta.sparql_endpoint if meta else None,
        parsed_triples=len(snap.edges),
        nodes=len(snap.nodes),
        edges=len(snap.edges),
    )


@app.post("/api/sparql")
async def sparql_query(req: SparqlQueryRequest) -> dict:
    sparql: SparqlEngine = app.state.sparql
    data = await sparql.query_json(req.query)
    return data


@app.post("/api/neighbors", response_model=NeighborsResponse)
async def neighbors(req: NeighborsRequest) -> NeighborsResponse:
    svc: GraphSnapshotService = app.state.snapshot_service
    snap = await svc.get(node_limit=req.node_limit, edge_limit=req.edge_limit)
    sparql: SparqlEngine = app.state.sparql
    neighbor_ids = await fetch_neighbor_ids_for_selection(
        sparql,
        snapshot=snap,
        selected_ids=req.selected_ids,
        include_bnodes=settings.include_bnodes,
    )
    return NeighborsResponse(selected_ids=req.selected_ids, neighbor_ids=neighbor_ids)


@app.get("/api/nodes", response_model=NodesResponse)
def nodes(
    limit: int = Query(default=10_000, ge=1, le=200_000),
    offset: int = Query(default=0, ge=0),
) -> NodesResponse:
    if settings.graph_backend != "rdflib":
        raise HTTPException(status_code=501, detail="GET /api/nodes is only supported in GRAPH_BACKEND=rdflib mode")
    store: RDFStore = app.state.store
    return NodesResponse(total=store.node_count, nodes=store.node_slice(offset=offset, limit=limit))


@app.get("/api/edges", response_model=EdgesResponse)
def edges(
    limit: int = Query(default=50_000, ge=1, le=500_000),
    offset: int = Query(default=0, ge=0),
) -> EdgesResponse:
    if settings.graph_backend != "rdflib":
        raise HTTPException(status_code=501, detail="GET /api/edges is only supported in GRAPH_BACKEND=rdflib mode")
    store: RDFStore = app.state.store
    return EdgesResponse(total=store.edge_count, edges=store.edge_slice(offset=offset, limit=limit))


@app.get("/api/graph", response_model=GraphResponse)
async def graph(
    node_limit: int = Query(default=50_000, ge=1, le=200_000),
    edge_limit: int = Query(default=100_000, ge=1, le=500_000),
) -> GraphResponse:
    svc: GraphSnapshotService = app.state.snapshot_service
    try:
        return await svc.get(node_limit=node_limit, edge_limit=edge_limit)
    except CycleError as e:
        raise HTTPException(status_code=422, detail=str(e)) from None
backend/app/models.py (new file, 69 lines)

@@ -0,0 +1,69 @@
from __future__ import annotations

from pydantic import BaseModel


class Node(BaseModel):
    id: int
    termType: str  # "uri" | "bnode"
    iri: str
    label: str | None = None
    # Optional because /api/nodes (RDFStore) doesn't currently provide positions.
    x: float | None = None
    y: float | None = None


class Edge(BaseModel):
    source: int
    target: int
    predicate: str


class StatsResponse(BaseModel):
    backend: str
    ttl_path: str
    sparql_endpoint: str | None = None
    parsed_triples: int
    nodes: int
    edges: int


class NodesResponse(BaseModel):
    total: int
    nodes: list[Node]


class EdgesResponse(BaseModel):
    total: int
    edges: list[Edge]


class GraphResponse(BaseModel):
    class Meta(BaseModel):
        backend: str
        ttl_path: str | None = None
        sparql_endpoint: str | None = None
        include_bnodes: bool
        node_limit: int
        edge_limit: int
        nodes: int
        edges: int

    nodes: list[Node]
    edges: list[Edge]
    meta: Meta | None = None


class SparqlQueryRequest(BaseModel):
    query: str


class NeighborsRequest(BaseModel):
    selected_ids: list[int]
    node_limit: int = 50_000
    edge_limit: int = 100_000


class NeighborsResponse(BaseModel):
    selected_ids: list[int]
    neighbor_ids: list[int]
backend/app/pipelines/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
backend/app/pipelines/graph_snapshot.py (new file, 148 lines)

@@ -0,0 +1,148 @@
from __future__ import annotations
|
||||
|
||||
from typing import Any
|
||||
|
||||
from ..graph_export import edge_retrieval_query, graph_from_sparql_bindings
|
||||
from ..models import GraphResponse
|
||||
from ..sparql_engine import SparqlEngine
|
||||
from ..settings import Settings
|
||||
from .layout_dag_radial import CycleError, level_synchronous_kahn_layers, radial_positions_from_layers
|
||||
|
||||
|
||||
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
|
||||
|
||||
|
||||
def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
|
||||
return (((res.get("results") or {}).get("bindings")) or [])
|
||||
|
||||
|
||||
def _label_score(label_binding: dict[str, Any]) -> int:
|
||||
# Prefer English, then no-language, then anything else.
|
||||
lang = (label_binding.get("xml:lang") or "").lower()
|
||||
if lang == "en":
|
||||
return 3
|
||||
if lang == "":
|
||||
return 2
|
||||
return 1
|
||||
|
||||
|
||||
async def _fetch_rdfs_labels_for_iris(
|
||||
sparql: SparqlEngine,
|
||||
iris: list[str],
|
||||
*,
|
||||
batch_size: int = 500,
|
||||
) -> dict[str, str]:
|
||||
best: dict[str, tuple[int, str]] = {}
|
||||
|
||||
for i in range(0, len(iris), batch_size):
|
||||
batch = iris[i : i + batch_size]
|
||||
values = " ".join(f"<{u}>" for u in batch)
|
||||
q = f"""
|
||||
SELECT ?s ?label
|
||||
WHERE {{
|
||||
VALUES ?s {{ {values} }}
|
||||
?s <{RDFS_LABEL}> ?label .
|
||||
}}
|
||||
"""
|
||||
res = await sparql.query_json(q)
|
||||
for b in _bindings(res):
|
||||
s = (b.get("s") or {}).get("value")
|
||||
label_term = b.get("label") or {}
|
||||
if not s or label_term.get("type") != "literal":
|
||||
continue
|
||||
label_value = label_term.get("value")
|
||||
if label_value is None:
|
||||
continue
|
||||
score = _label_score(label_term)
|
||||
prev = best.get(s)
|
||||
if prev is None or score > prev[0]:
|
||||
best[s] = (score, str(label_value))
|
||||
|
||||
return {iri: lbl for iri, (_, lbl) in best.items()}
|
||||
|
||||
|
||||
async def fetch_graph_snapshot(
|
||||
sparql: SparqlEngine,
|
||||
*,
|
||||
settings: Settings,
|
||||
node_limit: int,
|
||||
edge_limit: int,
|
||||
) -> GraphResponse:
|
||||
"""
|
||||
Fetch a graph snapshot (nodes + edges) via SPARQL, independent of whether the
|
||||
underlying engine is RDFLib or AnzoGraph.
|
||||
"""
|
||||
edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
|
||||
res = await sparql.query_json(edges_q)
|
||||
bindings = (((res.get("results") or {}).get("bindings")) or [])
|
||||
nodes, edges = graph_from_sparql_bindings(
|
||||
bindings,
|
||||
node_limit=node_limit,
|
||||
include_bnodes=settings.include_bnodes,
|
||||
)
|
||||
|
||||
# Add positions so the frontend doesn't need to run a layout.
|
||||
#
|
||||
# We are exporting only rdfs:subClassOf triples. In the exported edges:
|
||||
# source = subclass, target = superclass
|
||||
# For hierarchical layout we invert edges to:
|
||||
# superclass -> subclass
|
||||
hier_edges: list[tuple[int, int]] = []
|
||||
for e in edges:
|
||||
s = e.get("source")
|
||||
t = e.get("target")
|
||||
try:
|
||||
sid = int(s) # subclass
|
||||
            tid = int(t)  # superclass
        except Exception:
            continue
        hier_edges.append((tid, sid))

    try:
        layers = level_synchronous_kahn_layers(node_count=len(nodes), edges=hier_edges)
    except CycleError as e:
        # Add a small URI sample to aid debugging.
        sample: list[str] = []
        for nid in e.remaining_node_ids[:20]:
            try:
                sample.append(str(nodes[nid].get("iri")))
            except Exception:
                continue
        raise CycleError(
            processed=e.processed,
            total=e.total,
            remaining_node_ids=e.remaining_node_ids,
            remaining_iri_sample=sample or None,
        ) from None

    # Deterministic order within each ring/layer for stable layouts.
    id_to_iri = [str(n.get("iri", "")) for n in nodes]
    for layer in layers:
        layer.sort(key=lambda nid: id_to_iri[nid])

    xs, ys = radial_positions_from_layers(node_count=len(nodes), layers=layers)
    for i, node in enumerate(nodes):
        node["x"] = float(xs[i])
        node["y"] = float(ys[i])

    # Attach labels for URI nodes (blank nodes remain label-less).
    uri_nodes = [n for n in nodes if n.get("termType") == "uri"]
    if uri_nodes:
        iris = [str(n["iri"]) for n in uri_nodes if isinstance(n.get("iri"), str)]
        label_by_iri = await _fetch_rdfs_labels_for_iris(sparql, iris)
        for n in uri_nodes:
            iri = n.get("iri")
            if isinstance(iri, str) and iri in label_by_iri:
                n["label"] = label_by_iri[iri]

    meta = GraphResponse.Meta(
        backend=sparql.name,
        ttl_path=settings.ttl_path if settings.graph_backend == "rdflib" else None,
        sparql_endpoint=settings.effective_sparql_endpoint() if settings.graph_backend == "anzograph" else None,
        include_bnodes=settings.include_bnodes,
        node_limit=node_limit,
        edge_limit=edge_limit,
        nodes=len(nodes),
        edges=len(edges),
    )
    return GraphResponse(nodes=nodes, edges=edges, meta=meta)
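The ring placement that the tail above delegates to `radial_positions_from_layers` can be sketched inline. This is a hypothetical mini-run with made-up layers and node ids, and it omits the module's golden-angle offset for brevity:

```python
import math

# Three layers of class ids mapped to concentric rings; radius grows
# with the layer index, scaled so the outermost ring stays inside max_r.
layers = [[0], [1, 2], [3, 4, 5]]
max_r = 5000.0
denom = float(len(layers) + 1)
xs, ys = [0.0] * 6, [0.0] * 6
for li, layer in enumerate(layers):
    r = ((li + 1) / denom) * max_r
    step = 2.0 * math.pi / len(layer)  # even angular spacing within a ring
    for j, nid in enumerate(layer):
        xs[nid] = r * math.cos(step * j)
        ys[nid] = r * math.sin(step * j)

print([round(math.hypot(x, y)) for x, y in zip(xs, ys)])
# [1250, 2500, 2500, 3750, 3750, 3750]
```

Each node's distance from the origin depends only on its layer, which is what makes the class hierarchy read as concentric rings.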
141
backend/app/pipelines/layout_dag_radial.py
Normal file
@@ -0,0 +1,141 @@
from __future__ import annotations

import math
from collections import deque
from typing import Iterable, Sequence


class CycleError(RuntimeError):
    """
    Raised when the requested layout requires a DAG, but a cycle is detected.

    `remaining_node_ids` are the node ids that still had indegree > 0 after Kahn.
    """

    def __init__(
        self,
        *,
        processed: int,
        total: int,
        remaining_node_ids: list[int],
        remaining_iri_sample: list[str] | None = None,
    ) -> None:
        self.processed = int(processed)
        self.total = int(total)
        self.remaining_node_ids = remaining_node_ids
        self.remaining_iri_sample = remaining_iri_sample

        msg = f"Cycle detected in subClassOf graph (processed {self.processed}/{self.total} nodes)."
        if remaining_iri_sample:
            msg += f" Example nodes: {', '.join(remaining_iri_sample)}"
        super().__init__(msg)


def level_synchronous_kahn_layers(
    *,
    node_count: int,
    edges: Iterable[tuple[int, int]],
) -> list[list[int]]:
    """
    Level-synchronous Kahn's algorithm:
    - process the entire current queue as one batch (one layer)
    - only then enqueue newly-unlocked nodes for the next batch

    `edges` are directed (u -> v).
    """
    n = int(node_count)
    if n <= 0:
        return []

    adj: list[list[int]] = [[] for _ in range(n)]
    indeg = [0] * n

    for u, v in edges:
        if u == v:
            # Self-loops don't help layout and would trivially violate DAG-ness.
            continue
        if not (0 <= u < n and 0 <= v < n):
            continue
        adj[u].append(v)
        indeg[v] += 1

    q: deque[int] = deque(i for i, d in enumerate(indeg) if d == 0)
    layers: list[list[int]] = []

    processed = 0
    while q:
        # Consume the full current queue as a single layer.
        layer = list(q)
        q.clear()
        layers.append(layer)

        for u in layer:
            processed += 1
            for v in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)

    if processed != n:
        remaining = [i for i, d in enumerate(indeg) if d > 0]
        raise CycleError(processed=processed, total=n, remaining_node_ids=remaining)

    return layers


def radial_positions_from_layers(
    *,
    node_count: int,
    layers: Sequence[Sequence[int]],
    max_r: float = 5000.0,
) -> tuple[list[float], list[float]]:
    """
    Assign node positions in concentric rings (one ring per layer).

    - radius increases with layer index
    - nodes within a layer are placed evenly by angle
    - each ring gets a "golden-angle" rotation to reduce spoke artifacts
    """
    n = int(node_count)
    if n <= 0:
        return ([], [])

    xs = [0.0] * n
    ys = [0.0] * n
    if not layers:
        return (xs, ys)

    two_pi = 2.0 * math.pi
    golden = math.pi * (3.0 - math.sqrt(5.0))

    layer_count = len(layers)
    denom = float(layer_count + 1)

    for li, layer in enumerate(layers):
        m = len(layer)
        if m <= 0:
            continue

        # Keep everything within ~[-max_r, max_r] like the previous spiral layout.
        r = ((li + 1) / denom) * max_r

        # Rotate each layer deterministically to avoid radial spokes aligning.
        offset = (li * golden) % two_pi

        if m == 1:
            nid = int(layer[0])
            if 0 <= nid < n:
                xs[nid] = r * math.cos(offset)
                ys[nid] = r * math.sin(offset)
            continue

        step = two_pi / float(m)
        for j, raw_id in enumerate(layer):
            nid = int(raw_id)
            if not (0 <= nid < n):
                continue
            t = offset + step * float(j)
            xs[nid] = r * math.cos(t)
            ys[nid] = r * math.sin(t)

    return (xs, ys)
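To see what the layering produces, here is a condensed standalone copy of `level_synchronous_kahn_layers` run on a small, invented class hierarchy (edges point superclass → subclass, matching `hier_edges` in the snapshot code):

```python
from collections import deque

def kahn_layers(node_count: int, edges: list[tuple[int, int]]) -> list[list[int]]:
    # Condensed copy of level_synchronous_kahn_layers: each while-iteration
    # drains the whole queue, so every batch becomes one concentric ring.
    adj = [[] for _ in range(node_count)]
    indeg = [0] * node_count
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    q = deque(i for i, d in enumerate(indeg) if d == 0)
    layers = []
    while q:
        layer = list(q)
        q.clear()
        layers.append(layer)
        for u in layer:
            for v in adj[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    q.append(v)
    return layers

# 0 = root class, 1/2 = mid-level classes, 3/4 = leaves.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
print(kahn_layers(5, edges))  # [[0], [1, 2], [3, 4]]
```

Note that node 3 lands in the last layer even though it is reachable from layer 1: a node is only emitted once *all* of its superclasses have been placed, which is exactly the level-synchronous behavior the docstring describes.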
30
backend/app/pipelines/layout_spiral.py
Normal file
@@ -0,0 +1,30 @@
from __future__ import annotations

import math


def spiral_positions(n: int, *, max_r: float = 5000.0) -> tuple[list[float], list[float]]:
    """
    Deterministic "sunflower" (golden-angle) spiral layout.

    This is intentionally simple and stable across runs:
    - angle increments by the golden angle to avoid radial spokes
    - radius grows with sqrt(i) to keep density roughly uniform over area
    """
    if n <= 0:
        return ([], [])

    xs = [0.0] * n
    ys = [0.0] * n

    golden = math.pi * (3.0 - math.sqrt(5.0))
    denom = float(max(1, n - 1))

    for i in range(n):
        t = i * golden
        r = math.sqrt(i / denom) * max_r
        xs[i] = r * math.cos(t)
        ys[i] = r * math.sin(t)

    return xs, ys
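Two properties the docstring claims — the outermost point sits at `max_r`, and density is roughly uniform over area — can be checked with a standalone copy of the function:

```python
import math

def spiral_positions(n, max_r=5000.0):
    # Standalone copy of the sunflower layout, for experimentation.
    golden = math.pi * (3.0 - math.sqrt(5.0))
    denom = float(max(1, n - 1))
    xs, ys = [0.0] * n, [0.0] * n
    for i in range(n):
        t = i * golden
        r = math.sqrt(i / denom) * max_r
        xs[i], ys[i] = r * math.cos(t), r * math.sin(t)
    return xs, ys

xs, ys = spiral_positions(10_000)
radii = [math.hypot(x, y) for x, y in zip(xs, ys)]
print(round(max(radii), 3))  # 5000.0 — the last point lands exactly on max_r

# Uniform density over area means exactly half the points fall inside
# the circle of radius max_r / sqrt(2) (half the total area).
inner = sum(r <= 5000.0 / math.sqrt(2) for r in radii)
print(inner)  # 5000
```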
96
backend/app/pipelines/owl_imports_combiner.py
Normal file
@@ -0,0 +1,96 @@
from __future__ import annotations

import logging
import os
from pathlib import Path
from urllib.parse import unquote, urlparse

from rdflib import Graph
from rdflib.namespace import OWL


logger = logging.getLogger(__name__)


def _is_http_url(location: str) -> bool:
    scheme = urlparse(location).scheme.lower()
    return scheme in {"http", "https"}


def _is_file_uri(location: str) -> bool:
    return urlparse(location).scheme.lower() == "file"


def _file_uri_to_path(location: str) -> Path:
    u = urlparse(location)
    if u.scheme.lower() != "file":
        raise ValueError(f"Not a file:// URI: {location!r}")
    return Path(unquote(u.path))


def resolve_output_location(
    entry_location: str,
    *,
    output_location: str | None,
    output_name: str,
) -> str:
    if output_location:
        return output_location

    if _is_http_url(entry_location):
        raise ValueError(
            "COMBINE_ENTRY_LOCATION points to an http(s) URL; set COMBINE_OUTPUT_LOCATION to a writable file path."
        )

    entry_path = _file_uri_to_path(entry_location) if _is_file_uri(entry_location) else Path(entry_location)
    return str(entry_path.parent / output_name)


def _output_destination_to_path(output_location: str) -> Path:
    if _is_file_uri(output_location):
        return _file_uri_to_path(output_location)
    if _is_http_url(output_location):
        raise ValueError("Output location must be a local file path (or file:// URI), not http(s).")
    return Path(output_location)


def output_location_to_path(output_location: str) -> Path:
    return _output_destination_to_path(output_location)


def build_combined_graph(entry_location: str) -> Graph:
    """
    Recursively loads an RDF document (file path, file:// URI, or http(s) URL) and its
    owl:imports into a single in-memory graph.
    """
    combined_graph = Graph()
    visited_locations: set[str] = set()

    def resolve_imports(location: str) -> None:
        if location in visited_locations:
            return
        visited_locations.add(location)

        logger.info("Loading ontology: %s", location)
        try:
            combined_graph.parse(location=location)
        except Exception as e:
            logger.warning("Failed to load %s (%s)", location, e)
            return

        imports = [str(o) for _, _, o in combined_graph.triples((None, OWL.imports, None))]
        for imported_location in imports:
            if imported_location not in visited_locations:
                resolve_imports(imported_location)

    resolve_imports(entry_location)
    return combined_graph


def serialize_graph_to_ttl(graph: Graph, output_location: str) -> None:
    output_path = _output_destination_to_path(output_location)
    output_path.parent.mkdir(parents=True, exist_ok=True)

    tmp_path = output_path.with_suffix(output_path.suffix + ".tmp")
    graph.serialize(destination=str(tmp_path), format="turtle")
    os.replace(str(tmp_path), str(output_path))
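The output-location resolution above can be exercised with a trimmed standalone copy (stdlib only; the paths are illustrative and POSIX-style):

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

def resolve_output_location(entry_location, *, output_location, output_name):
    # Trimmed copy of the module's helper: an explicit output wins; an
    # http(s) entry has no natural local directory; otherwise the output
    # goes next to the entry document.
    scheme = urlparse(entry_location).scheme.lower()
    if output_location:
        return output_location
    if scheme in {"http", "https"}:
        raise ValueError("set COMBINE_OUTPUT_LOCATION to a writable file path")
    if scheme == "file":
        entry_path = Path(unquote(urlparse(entry_location).path))
    else:
        entry_path = Path(entry_location)
    return str(entry_path.parent / output_name)

# A file:// entry URI places the output next to the entry document:
print(resolve_output_location(
    "file:///data/ontologies/core.ttl",
    output_location=None,
    output_name="combined_ontology.ttl",
))  # /data/ontologies/combined_ontology.ttl
```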
137
backend/app/pipelines/selection_neighbors.py
Normal file
@@ -0,0 +1,137 @@
from __future__ import annotations

from typing import Any, Iterable

from ..models import GraphResponse, Node
from ..sparql_engine import SparqlEngine


def _values_term(node: Node) -> str | None:
    iri = node.iri
    if node.termType == "uri":
        return f"<{iri}>"
    if node.termType == "bnode":
        if iri.startswith("_:"):
            return iri
        return f"_:{iri}"
    return None


def selection_neighbors_query(*, selected_nodes: Iterable[Node], include_bnodes: bool) -> str:
    values_terms: list[str] = []
    for n in selected_nodes:
        t = _values_term(n)
        if t is None:
            continue
        values_terms.append(t)

    if not values_terms:
        # Caller should avoid running this query when selection is empty, but keep this safe.
        return "SELECT ?nbr WHERE { FILTER(false) }"

    bnode_filter = "" if include_bnodes else "FILTER(!isBlank(?nbr))"
    values = " ".join(values_terms)

    # Neighbors are defined as any node directly connected by rdf:type (to owl:Class)
    # or rdfs:subClassOf, in either direction (treating edges as undirected).
    return f"""
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT DISTINCT ?nbr
    WHERE {{
        VALUES ?sel {{ {values} }}
        {{
            ?sel rdf:type ?o .
            ?o rdf:type owl:Class .
            BIND(?o AS ?nbr)
        }}
        UNION
        {{
            ?s rdf:type ?sel .
            ?sel rdf:type owl:Class .
            BIND(?s AS ?nbr)
        }}
        UNION
        {{
            ?sel rdfs:subClassOf ?o .
            BIND(?o AS ?nbr)
        }}
        UNION
        {{
            ?s rdfs:subClassOf ?sel .
            BIND(?s AS ?nbr)
        }}
        FILTER(!isLiteral(?nbr))
        FILTER(?nbr != ?sel)
        {bnode_filter}
    }}
    """


def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
    return (((res.get("results") or {}).get("bindings")) or [])


def _term_key(term: dict[str, Any], *, include_bnodes: bool) -> tuple[str, str] | None:
    t = term.get("type")
    v = term.get("value")
    if not t or v is None:
        return None
    if t == "literal":
        return None
    if t == "bnode":
        if not include_bnodes:
            return None
        return ("bnode", f"_:{v}")
    return ("uri", str(v))


async def fetch_neighbor_ids_for_selection(
    sparql: SparqlEngine,
    *,
    snapshot: GraphResponse,
    selected_ids: list[int],
    include_bnodes: bool,
) -> list[int]:
    id_to_node: dict[int, Node] = {n.id: n for n in snapshot.nodes}

    selected_nodes: list[Node] = []
    selected_id_set: set[int] = set()
    for nid in selected_ids:
        if not isinstance(nid, int):
            continue
        n = id_to_node.get(nid)
        if n is None:
            continue
        if n.termType == "bnode" and not include_bnodes:
            continue
        selected_nodes.append(n)
        selected_id_set.add(nid)

    if not selected_nodes:
        return []

    key_to_id: dict[tuple[str, str], int] = {}
    for n in snapshot.nodes:
        key_to_id[(n.termType, n.iri)] = n.id

    q = selection_neighbors_query(selected_nodes=selected_nodes, include_bnodes=include_bnodes)
    res = await sparql.query_json(q)

    neighbor_ids: set[int] = set()
    for b in _bindings(res):
        nbr_term = b.get("nbr") or {}
        key = _term_key(nbr_term, include_bnodes=include_bnodes)
        if key is None:
            continue
        nid = key_to_id.get(key)
        if nid is None:
            continue
        if nid in selected_id_set:
            continue
        neighbor_ids.add(nid)

    # Stable ordering for consistent frontend behavior.
    return sorted(neighbor_ids)
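How selected nodes turn into SPARQL `VALUES` terms can be shown with a minimal stand-in for the backend's `Node` model (a hypothetical dataclass, not the real Pydantic model; the IRIs are invented):

```python
from dataclasses import dataclass

@dataclass
class Node:
    # Minimal stand-in for the backend's Node model.
    id: int
    termType: str
    iri: str

def values_term(node):
    # Mirrors _values_term: URIs are wrapped in <...>, blank nodes get a
    # _: prefix (unless they already carry one); anything else is skipped.
    if node.termType == "uri":
        return f"<{node.iri}>"
    if node.termType == "bnode":
        return node.iri if node.iri.startswith("_:") else f"_:{node.iri}"
    return None

sel = [Node(0, "uri", "http://example.org/Cat"), Node(1, "bnode", "_:b0")]
print(" ".join(values_term(n) for n in sel))
# <http://example.org/Cat> _:b0
```

The joined string is exactly what `selection_neighbors_query` splices into its `VALUES ?sel { ... }` clause.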
63
backend/app/pipelines/snapshot_service.py
Normal file
@@ -0,0 +1,63 @@
from __future__ import annotations

import asyncio
from dataclasses import dataclass

from ..models import GraphResponse
from ..sparql_engine import SparqlEngine
from ..settings import Settings
from .graph_snapshot import fetch_graph_snapshot


@dataclass(frozen=True)
class SnapshotKey:
    node_limit: int
    edge_limit: int
    include_bnodes: bool


class GraphSnapshotService:
    """
    Caches graph snapshots so the backend doesn't re-run expensive SPARQL for stats/graph.
    """

    def __init__(self, *, sparql: SparqlEngine, settings: Settings):
        self._sparql = sparql
        self._settings = settings

        self._cache: dict[SnapshotKey, GraphResponse] = {}
        self._locks: dict[SnapshotKey, asyncio.Lock] = {}
        self._global_lock = asyncio.Lock()

    async def get(self, *, node_limit: int, edge_limit: int) -> GraphResponse:
        key = SnapshotKey(
            node_limit=node_limit,
            edge_limit=edge_limit,
            include_bnodes=self._settings.include_bnodes,
        )

        cached = self._cache.get(key)
        if cached is not None:
            return cached

        # Create/get a per-key lock under a global lock to avoid races.
        async with self._global_lock:
            lock = self._locks.get(key)
            if lock is None:
                lock = asyncio.Lock()
                self._locks[key] = lock

        async with lock:
            cached2 = self._cache.get(key)
            if cached2 is not None:
                return cached2

            snapshot = await fetch_graph_snapshot(
                self._sparql,
                settings=self._settings,
                node_limit=node_limit,
                edge_limit=edge_limit,
            )
            self._cache[key] = snapshot
            return snapshot
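The double-checked, per-key-locked caching used by `GraphSnapshotService` can be sketched with a simulated fetch (the class name, key, and sleep are made up for the demo):

```python
import asyncio

class CachedService:
    # Minimal sketch of the per-key lock + double-checked cache pattern:
    # concurrent requests for the same key trigger exactly one fetch.
    def __init__(self):
        self._cache = {}
        self._locks = {}
        self._global_lock = asyncio.Lock()
        self.fetch_calls = 0

    async def _fetch(self, key):
        self.fetch_calls += 1
        await asyncio.sleep(0.01)  # pretend this is an expensive SPARQL query
        return f"snapshot:{key}"

    async def get(self, key):
        if key in self._cache:
            return self._cache[key]
        async with self._global_lock:
            lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            if key in self._cache:  # re-check after acquiring the lock
                return self._cache[key]
            self._cache[key] = await self._fetch(key)
            return self._cache[key]

async def main():
    svc = CachedService()
    results = await asyncio.gather(*(svc.get("k") for _ in range(10)))
    print(svc.fetch_calls, results[0])  # 1 snapshot:k

asyncio.run(main())
```

Without the re-check inside the lock, every coroutine that queued up behind the first one would re-run the fetch after acquiring the lock.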
153
backend/app/pipelines/subclass_labels.py
Normal file
@@ -0,0 +1,153 @@
from __future__ import annotations

from typing import Any

from ..sparql_engine import SparqlEngine

RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def _bindings(res: dict[str, Any]) -> list[dict[str, Any]]:
    return (((res.get("results") or {}).get("bindings")) or [])


def _term_key(term: dict[str, Any]) -> tuple[str, str] | None:
    t = term.get("type")
    v = term.get("value")
    if not t or v is None:
        return None
    if t == "literal":
        return None
    if t == "bnode":
        return ("bnode", str(v))
    return ("uri", str(v))


def _key_to_entity_string(key: tuple[str, str]) -> str:
    t, v = key
    if t == "bnode":
        return f"_:{v}"
    return v


def _label_score(binding: dict[str, Any]) -> int:
    """
    Higher is better.
    Prefer English, then no-language, then anything else.
    """
    lang = (binding.get("xml:lang") or "").lower()
    if lang == "en":
        return 3
    if lang == "":
        return 2
    return 1


async def extract_subclass_entities_and_labels(
    sparql: SparqlEngine,
    *,
    include_bnodes: bool,
    label_batch_size: int = 500,
) -> tuple[list[str], list[str | None]]:
    """
    Pipeline:
    1) Query all rdfs:subClassOf triples.
    2) Build a unique set of entity terms from subjects+objects, convert to list.
    3) Fetch rdfs:label for those entities and return an aligned labels list.

    Returns:
        entities: list[str] (IRI or "_:bnodeId")
        labels: list[str|None], aligned with entities
    """

    subclass_q = f"""
    SELECT ?s ?o
    WHERE {{
        ?s <{RDFS_SUBCLASS_OF}> ?o .
        FILTER(!isLiteral(?o))
        {"FILTER(!isBlank(?s) && !isBlank(?o))" if not include_bnodes else ""}
    }}
    """
    res = await sparql.query_json(subclass_q)

    entity_keys: set[tuple[str, str]] = set()
    for b in _bindings(res):
        sk = _term_key(b.get("s") or {})
        ok = _term_key(b.get("o") or {})
        if sk is not None and (include_bnodes or sk[0] != "bnode"):
            entity_keys.add(sk)
        if ok is not None and (include_bnodes or ok[0] != "bnode"):
            entity_keys.add(ok)

    # Deterministic ordering.
    entity_key_list = sorted(entity_keys, key=lambda k: (k[0], k[1]))
    entities = [_key_to_entity_string(k) for k in entity_key_list]

    # Build label map keyed by term key.
    best_label_by_key: dict[tuple[str, str], tuple[int, str]] = {}

    # URIs can be batch-queried via VALUES.
    uri_values = [v for (t, v) in entity_key_list if t == "uri"]
    for i in range(0, len(uri_values), label_batch_size):
        batch = uri_values[i : i + label_batch_size]
        values = " ".join(f"<{u}>" for u in batch)
        labels_q = f"""
        SELECT ?s ?label
        WHERE {{
            VALUES ?s {{ {values} }}
            ?s <{RDFS_LABEL}> ?label .
        }}
        """
        lres = await sparql.query_json(labels_q)
        for b in _bindings(lres):
            sk = _term_key(b.get("s") or {})
            if sk is None or sk[0] != "uri":
                continue
            label_term = b.get("label") or {}
            if label_term.get("type") != "literal":
                continue
            label_value = label_term.get("value")
            if label_value is None:
                continue

            score = _label_score(label_term)
            prev = best_label_by_key.get(sk)
            if prev is None or score > prev[0]:
                best_label_by_key[sk] = (score, str(label_value))

    # Blank nodes can't reliably be addressed by ID across queries, but if enabled we can still
    # fetch all bnode labels and filter locally.
    if include_bnodes:
        bnode_keys = {k for k in entity_key_list if k[0] == "bnode"}
        if bnode_keys:
            bnode_labels_q = f"""
            SELECT ?s ?label
            WHERE {{
                ?s <{RDFS_LABEL}> ?label .
                FILTER(isBlank(?s))
            }}
            """
            blres = await sparql.query_json(bnode_labels_q)
            for b in _bindings(blres):
                sk = _term_key(b.get("s") or {})
                if sk is None or sk not in bnode_keys:
                    continue
                label_term = b.get("label") or {}
                if label_term.get("type") != "literal":
                    continue
                label_value = label_term.get("value")
                if label_value is None:
                    continue
                score = _label_score(label_term)
                prev = best_label_by_key.get(sk)
                if prev is None or score > prev[0]:
                    best_label_by_key[sk] = (score, str(label_value))

    labels: list[str | None] = []
    for k in entity_key_list:
        item = best_label_by_key.get(k)
        labels.append(item[1] if item else None)

    return entities, labels
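The label-preference logic in `_label_score` picks among competing `rdfs:label` bindings like so (the example bindings are invented, in the SPARQL JSON results shape):

```python
def label_score(binding):
    # Mirrors _label_score: English beats language-less beats everything else.
    lang = (binding.get("xml:lang") or "").lower()
    if lang == "en":
        return 3
    if lang == "":
        return 2
    return 1

bindings = [
    {"type": "literal", "value": "Katze", "xml:lang": "de"},
    {"type": "literal", "value": "cat"},               # no language tag
    {"type": "literal", "value": "Cat", "xml:lang": "en"},
]
best = max(bindings, key=label_score)
print(best["value"])  # Cat
```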
150
backend/app/rdf_store.py
Normal file
@@ -0,0 +1,150 @@
from __future__ import annotations

from dataclasses import dataclass
from typing import Any

from rdflib import BNode, Graph, Literal, URIRef
from rdflib.namespace import RDFS, SKOS


LABEL_PREDICATES = {RDFS.label, SKOS.prefLabel, SKOS.altLabel}


@dataclass(frozen=True)
class EdgeRow:
    source: int
    target: int
    predicate: str


class RDFStore:
    def __init__(self, *, ttl_path: str, include_bnodes: bool, max_triples: int | None):
        self.ttl_path = ttl_path
        self.include_bnodes = include_bnodes
        self.max_triples = max_triples

        self.graph: Graph | None = None

        self._id_by_term: dict[Any, int] = {}
        self._term_by_id: list[Any] = []

        self._labels_by_id: dict[int, str] = {}
        self._edges: list[EdgeRow] = []
        self._parsed_triples = 0

    def _term_allowed(self, term: Any) -> bool:
        if isinstance(term, Literal):
            return False
        if isinstance(term, BNode) and not self.include_bnodes:
            return False
        return isinstance(term, (URIRef, BNode))

    def _get_id(self, term: Any) -> int | None:
        if not self._term_allowed(term):
            return None
        existing = self._id_by_term.get(term)
        if existing is not None:
            return existing
        nid = len(self._term_by_id)
        self._id_by_term[term] = nid
        self._term_by_id.append(term)
        return nid

    def _term_type(self, term: Any) -> str:
        if isinstance(term, BNode):
            return "bnode"
        return "uri"

    def _term_iri(self, term: Any) -> str:
        if isinstance(term, BNode):
            return f"_:{term}"
        return str(term)

    def load(self, graph: Graph | None = None) -> None:
        g = graph or Graph()
        if graph is None:
            g.parse(self.ttl_path, format="turtle")
        self.graph = g

        self._id_by_term.clear()
        self._term_by_id.clear()
        self._labels_by_id.clear()
        self._edges.clear()

        parsed = 0
        for (s, p, o) in g:
            parsed += 1
            if self.max_triples is not None and parsed > self.max_triples:
                break

            # Capture labels but do not emit them as edges.
            if p in LABEL_PREDICATES and isinstance(o, Literal):
                sid = self._get_id(s)
                if sid is not None and sid not in self._labels_by_id:
                    self._labels_by_id[sid] = str(o)
                continue

            sid = self._get_id(s)
            oid = self._get_id(o)
            if sid is None or oid is None:
                continue

            self._edges.append(EdgeRow(source=sid, target=oid, predicate=str(p)))

        self._parsed_triples = parsed

    @property
    def parsed_triples(self) -> int:
        return self._parsed_triples

    @property
    def node_count(self) -> int:
        return len(self._term_by_id)

    @property
    def edge_count(self) -> int:
        return len(self._edges)

    def node_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
        end = min(self.node_count, offset + limit)
        out: list[dict[str, Any]] = []
        for nid in range(offset, end):
            term = self._term_by_id[nid]
            out.append(
                {
                    "id": nid,
                    "termType": self._term_type(term),
                    "iri": self._term_iri(term),
                    "label": self._labels_by_id.get(nid),
                }
            )
        return out

    def edge_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
        end = min(self.edge_count, offset + limit)
        out: list[dict[str, Any]] = []
        for row in self._edges[offset:end]:
            out.append(
                {
                    "source": row.source,
                    "target": row.target,
                    "predicate": row.predicate,
                }
            )
        return out

    def edges_within_nodes(self, *, max_node_id_exclusive: int, limit: int) -> list[dict[str, Any]]:
        out: list[dict[str, Any]] = []
        for row in self._edges:
            if row.source >= max_node_id_exclusive or row.target >= max_node_id_exclusive:
                continue
            out.append(
                {
                    "source": row.source,
                    "target": row.target,
                    "predicate": row.predicate,
                }
            )
            if len(out) >= limit:
                break
        return out
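`RDFStore`'s term interning — mapping each distinct RDF term to a dense integer id in first-seen order, so edges can reference ints instead of repeating strings — reduces to this standalone sketch (the terms are invented):

```python
class Interner:
    # Sketch of RDFStore._get_id: first-seen terms get ids 0, 1, 2, ...;
    # repeated terms return their existing id.
    def __init__(self):
        self._id_by_term = {}
        self._term_by_id = []

    def get_id(self, term):
        existing = self._id_by_term.get(term)
        if existing is not None:
            return existing
        nid = len(self._term_by_id)
        self._id_by_term[term] = nid
        self._term_by_id.append(term)
        return nid

intern = Interner()
ids = [intern.get_id(t) for t in ["ex:Cat", "ex:Animal", "ex:Cat"]]
print(ids)  # [0, 1, 0]
```

The dense ids are also what makes `edges_within_nodes` cheap: "edge is inside the first N nodes" becomes a pair of integer comparisons.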
58
backend/app/settings.py
Normal file
@@ -0,0 +1,58 @@
from __future__ import annotations

from typing import Literal

from pydantic import Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Which graph engine executes SPARQL queries.
    # - rdflib: parse TTL locally and query in-memory
    # - anzograph: query a remote AnzoGraph SPARQL endpoint (optionally LOAD on startup)
    graph_backend: Literal["rdflib", "anzograph"] = Field(default="rdflib", alias="GRAPH_BACKEND")

    ttl_path: str = Field(default="/data/o3po.ttl", alias="TTL_PATH")
    include_bnodes: bool = Field(default=False, alias="INCLUDE_BNODES")
    max_triples: int | None = Field(default=None, alias="MAX_TRIPLES")

    # Optional: Combine owl:imports into a single TTL file on backend startup.
    combine_owl_imports_on_start: bool = Field(default=False, alias="COMBINE_OWL_IMPORTS_ON_START")
    combine_entry_location: str | None = Field(default=None, alias="COMBINE_ENTRY_LOCATION")
    combine_output_location: str | None = Field(default=None, alias="COMBINE_OUTPUT_LOCATION")
    combine_output_name: str = Field(default="combined_ontology.ttl", alias="COMBINE_OUTPUT_NAME")
    combine_force: bool = Field(default=False, alias="COMBINE_FORCE")

    # AnzoGraph / SPARQL endpoint configuration
    sparql_host: str = Field(default="http://anzograph:8080", alias="SPARQL_HOST")
    # If not set, the backend uses `${SPARQL_HOST}/sparql`.
    sparql_endpoint: str | None = Field(default=None, alias="SPARQL_ENDPOINT")
    sparql_user: str | None = Field(default=None, alias="SPARQL_USER")
    sparql_pass: str | None = Field(default=None, alias="SPARQL_PASS")

    # File URI as seen by the AnzoGraph container (used with SPARQL `LOAD`).
    # Example: file:///opt/shared-files/o3po.ttl
    sparql_data_file: str | None = Field(default=None, alias="SPARQL_DATA_FILE")
    sparql_graph_iri: str | None = Field(default=None, alias="SPARQL_GRAPH_IRI")
    sparql_load_on_start: bool = Field(default=False, alias="SPARQL_LOAD_ON_START")
    sparql_clear_on_start: bool = Field(default=False, alias="SPARQL_CLEAR_ON_START")

    sparql_timeout_s: float = Field(default=300.0, alias="SPARQL_TIMEOUT_S")
    sparql_ready_retries: int = Field(default=30, alias="SPARQL_READY_RETRIES")
    sparql_ready_delay_s: float = Field(default=4.0, alias="SPARQL_READY_DELAY_S")
    sparql_ready_timeout_s: float = Field(default=10.0, alias="SPARQL_READY_TIMEOUT_S")

    # Comma-separated, or "*" (default).
    cors_origins: str = Field(default="*", alias="CORS_ORIGINS")

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    def cors_origin_list(self) -> list[str]:
        if self.cors_origins.strip() == "*":
            return ["*"]
        return [o.strip() for o in self.cors_origins.split(",") if o.strip()]

    def effective_sparql_endpoint(self) -> str:
        if self.sparql_endpoint and self.sparql_endpoint.strip():
            return self.sparql_endpoint.strip()
        return self.sparql_host.rstrip("/") + "/sparql"
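The two derived helpers on `Settings` are pure string logic, so they can be tried standalone without pydantic (the hosts and origins below are illustrative):

```python
def effective_sparql_endpoint(sparql_host, sparql_endpoint=None):
    # Mirrors Settings.effective_sparql_endpoint: an explicit SPARQL_ENDPOINT
    # wins; otherwise /sparql is appended to SPARQL_HOST.
    if sparql_endpoint and sparql_endpoint.strip():
        return sparql_endpoint.strip()
    return sparql_host.rstrip("/") + "/sparql"

def cors_origin_list(cors_origins):
    # Mirrors Settings.cors_origin_list: "*" means allow-all, otherwise
    # split on commas and drop empty entries.
    if cors_origins.strip() == "*":
        return ["*"]
    return [o.strip() for o in cors_origins.split(",") if o.strip()]

print(effective_sparql_endpoint("http://anzograph:8080/"))
# http://anzograph:8080/sparql
print(cors_origin_list("http://localhost:5173, http://localhost:4173"))
# ['http://localhost:5173', 'http://localhost:4173']
```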
177
backend/app/sparql_engine.py
Normal file
@@ -0,0 +1,177 @@
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import base64
|
||||
import json
|
||||
from typing import Any, Protocol
|
||||
|
||||
import httpx
|
||||
from rdflib import Graph
|
||||
|
||||
from .settings import Settings
|
||||
|
||||
|
||||
class SparqlEngine(Protocol):
|
||||
name: str
|
||||
|
||||
async def startup(self) -> None: ...
|
||||
|
||||
async def shutdown(self) -> None: ...
|
||||
|
||||
async def query_json(self, query: str) -> dict[str, Any]: ...
|
||||
|
||||
|
||||
class RdflibEngine:
|
||||
name = "rdflib"
|
||||
|
||||
def __init__(self, *, ttl_path: str, graph: Graph | None = None):
|
||||
self.ttl_path = ttl_path
|
||||
self.graph: Graph | None = graph
|
||||
|
||||
async def startup(self) -> None:
|
||||
if self.graph is not None:
|
||||
return
|
||||
g = Graph()
|
||||
g.parse(self.ttl_path, format="turtle")
|
||||
self.graph = g
|
||||
|
||||
async def shutdown(self) -> None:
|
||||
# Nothing to close for in-memory rdflib graph.
|
||||
return None
|
||||
|
||||
async def query_json(self, query: str) -> dict[str, Any]:
|
||||
if self.graph is None:
|
||||
raise RuntimeError("RdflibEngine not started")
|
||||
|
||||
result = self.graph.query(query)
|
||||
payload = result.serialize(format="json")
|
||||
if isinstance(payload, bytes):
|
||||
payload = payload.decode("utf-8")
|
||||
return json.loads(payload)
|
||||
|
||||
|
||||
class AnzoGraphEngine:
|
||||
name = "anzograph"
|
||||
|
||||
def __init__(self, *, settings: Settings):
|
||||
self.endpoint = settings.effective_sparql_endpoint()
|
||||
self.timeout_s = settings.sparql_timeout_s
|
||||
self.ready_retries = settings.sparql_ready_retries
|
||||
self.ready_delay_s = settings.sparql_ready_delay_s
|
||||
self.ready_timeout_s = settings.sparql_ready_timeout_s
|
||||
|
||||
self.user = settings.sparql_user
|
||||
self.password = settings.sparql_pass
|
||||
self.data_file = settings.sparql_data_file
|
||||
self.graph_iri = settings.sparql_graph_iri
|
||||
self.load_on_start = settings.sparql_load_on_start
|
||||
self.clear_on_start = settings.sparql_clear_on_start
|
||||
|
||||
        self._client: httpx.AsyncClient | None = None
        self._auth_header = self._build_auth_header(self.user, self.password)

    @staticmethod
    def _build_auth_header(user: str | None, password: str | None) -> str | None:
        if not user or not password:
            return None
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        return f"Basic {token}"

    async def startup(self) -> None:
        self._client = httpx.AsyncClient(timeout=self.timeout_s)

        await self._wait_ready()

        if self.clear_on_start:
            await self._update("CLEAR ALL")
            await self._wait_ready()

        if self.load_on_start:
            if not self.data_file:
                raise RuntimeError("SPARQL_LOAD_ON_START=true but SPARQL_DATA_FILE is not set")

            if self.graph_iri:
                await self._update(f"LOAD <{self.data_file}> INTO GRAPH <{self.graph_iri}>")
            else:
                await self._update(f"LOAD <{self.data_file}>")

            # AnzoGraph may still be indexing after LOAD.
            await self._wait_ready()

    async def shutdown(self) -> None:
        if self._client is not None:
            await self._client.aclose()
            self._client = None

    async def query_json(self, query: str) -> dict[str, Any]:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")

        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header

        # AnzoGraph expects x-www-form-urlencoded with `query=...`.
        resp = await self._client.post(
            self.endpoint,
            headers=headers,
            data={"query": query},
        )
        resp.raise_for_status()
        return resp.json()

    async def _update(self, update: str) -> None:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")

        headers = {
            "Content-Type": "application/sparql-update",
            "Accept": "application/json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header

        resp = await self._client.post(self.endpoint, headers=headers, content=update)
        resp.raise_for_status()

    async def _wait_ready(self) -> None:
        if self._client is None:
            raise RuntimeError("AnzoGraphEngine not started")

        # Match the repo's Julia readiness gate: real SPARQL POST + valid JSON parse.
        headers = {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        }
        if self._auth_header:
            headers["Authorization"] = self._auth_header

        last_err: Exception | None = None
        for _ in range(self.ready_retries):
            try:
                resp = await self._client.post(
                    self.endpoint,
                    headers=headers,
                    data={"query": "ASK WHERE { ?s ?p ?o }"},
                    timeout=self.ready_timeout_s,
                )
                resp.raise_for_status()
                # Ensure it's JSON, not HTML/text during boot.
                resp.json()
                return
            except Exception as e:
                last_err = e
                await asyncio.sleep(self.ready_delay_s)

        raise RuntimeError(f"AnzoGraph not ready at {self.endpoint}") from last_err


def create_sparql_engine(settings: Settings, *, rdflib_graph: Graph | None = None) -> SparqlEngine:
    if settings.graph_backend == "rdflib":
        return RdflibEngine(ttl_path=settings.ttl_path, graph=rdflib_graph)
    if settings.graph_backend == "anzograph":
        return AnzoGraphEngine(settings=settings)
    raise RuntimeError(f"Unsupported GRAPH_BACKEND={settings.graph_backend!r}")
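For context, `query_json()` returns the W3C SPARQL 1.1 `application/sparql-results+json` payload as-is; callers must unpack the `head.vars` / `results.bindings` structure themselves. A minimal sketch of that unpacking, with an invented example binding (the IRI and label below are not from the repo):

```python
# Shape of the JSON that query_json() returns for a SELECT query
# (SPARQL 1.1 Query Results JSON Format). Sample bindings are made up.
payload = {
    "head": {"vars": ["s", "label"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/n1"},
         "label": {"type": "literal", "value": "Node 1"}},
    ]},
}

# Flatten each binding into a plain dict of variable -> lexical value.
rows = [
    {var: b[var]["value"] for var in payload["head"]["vars"] if var in b}
    for b in payload["results"]["bindings"]
]
print(rows)  # [{'s': 'http://example.org/n1', 'label': 'Node 1'}]
```

Unbound variables are simply absent from a binding, which is why the `if var in b` guard is needed.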
5
backend/requirements.txt
Normal file
@@ -0,0 +1,5 @@
fastapi
uvicorn[standard]
rdflib
pydantic-settings
httpx
@@ -1,17 +1,55 @@
services:
  app:
    build: .
    depends_on:
      - anzograph
  backend:
    build: ./backend
    ports:
      - "8000:8000"
    environment:
      - GRAPH_BACKEND=${GRAPH_BACKEND:-rdflib}
      - TTL_PATH=${TTL_PATH:-/data/o3po.ttl}
      - INCLUDE_BNODES=${INCLUDE_BNODES:-false}
      - MAX_TRIPLES
      - CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
      - SPARQL_HOST=${SPARQL_HOST:-http://anzograph:8080}
      - SPARQL_ENDPOINT
      - SPARQL_USER=${SPARQL_USER:-admin}
      - SPARQL_PASS=${SPARQL_PASS:-Passw0rd1}
      - SPARQL_DATA_FILE=${SPARQL_DATA_FILE:-file:///opt/shared-files/o3po.ttl}
      - SPARQL_GRAPH_IRI
      - SPARQL_LOAD_ON_START=${SPARQL_LOAD_ON_START:-false}
      - SPARQL_CLEAR_ON_START=${SPARQL_CLEAR_ON_START:-false}
      - SPARQL_TIMEOUT_S=${SPARQL_TIMEOUT_S:-300}
      - SPARQL_READY_RETRIES=${SPARQL_READY_RETRIES:-30}
      - SPARQL_READY_DELAY_S=${SPARQL_READY_DELAY_S:-4}
      - SPARQL_READY_TIMEOUT_S=${SPARQL_READY_TIMEOUT_S:-10}
      - COMBINE_OWL_IMPORTS_ON_START=${COMBINE_OWL_IMPORTS_ON_START:-false}
      - COMBINE_ENTRY_LOCATION
      - COMBINE_OUTPUT_LOCATION
      - COMBINE_OUTPUT_NAME
      - COMBINE_FORCE=${COMBINE_FORCE:-false}
    volumes:
      - ./backend:/app
      - ./data:/data:Z
    command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
    healthcheck:
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health').read()"]
      interval: 5s
      timeout: 3s
      retries: 60

  frontend:
    build: ./frontend
    ports:
      - "5173:5173"
    env_file:
      - .env
    command: sh -c "npm run layout && npm run dev -- --host"
    environment:
      - VITE_BACKEND_URL=${VITE_BACKEND_URL:-http://backend:8000}
    volumes:
      - .:/app:Z
      - ./frontend:/app
      - /app/node_modules

    depends_on:
      - backend
    # Docker Compose v1 doesn't support depends_on:condition. Do an explicit wait here.
    command: sh -c "until wget -qO- http://backend:8000/api/health >/dev/null 2>&1; do echo 'waiting for backend...'; sleep 1; done; npm run dev -- --host --port 5173"

  anzograph:
    image: cambridgesemantics/anzograph:latest
    container_name: anzograph
@@ -20,4 +58,3 @@ services:
      - "8443:8443"
    volumes:
      - ./data:/opt/shared-files:Z
@@ -2,6 +2,8 @@ FROM node:lts-alpine

WORKDIR /app

EXPOSE 5173

# Copy dependency definitions
COPY package*.json ./

@@ -11,8 +13,5 @@ RUN npm install
# Copy the rest of the source code
COPY . .

# Expose the standard Vite port
EXPOSE 5173

# Compute layout, then start the dev server with --host for external access
CMD ["sh", "-c", "npm run dev -- --host"]
# Start the dev server with --host for external access
CMD ["npm", "run", "dev", "--", "--host", "--port", "5173"]
@@ -7,7 +7,7 @@
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview",
    "layout": "npx tsx scripts/fetch_from_db.ts && npx tsx scripts/compute_layout.ts"
    "layout": "tsx scripts/compute_layout.ts"
  },
  "dependencies": {
    "@webgpu/types": "^0.1.69",
100000
frontend/public/edges.csv
Normal file
File diff suppressed because it is too large
Load Diff
100001
frontend/public/node_positions.csv
Normal file
File diff suppressed because it is too large
Load Diff
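Although the two generated CSVs are too large to show, their schemas are fixed by the writer in `frontend/scripts/compute_layout.ts`: `node_positions.csv` has a `vertex,x,y` header and `edges.csv` has a `source,target` header, both keyed by integer vertex IDs. A small sketch of consuming that format (the sample rows below are invented):

```python
# Parse the two CSV schemas written by compute_layout.ts.
# The data rows here are made-up examples, not real output.
import csv
import io

node_positions = "vertex,x,y\n0,0,0\n1,120.5,-33.2\n"
edges = "source,target\n1,0\n"

nodes = list(csv.DictReader(io.StringIO(node_positions)))
links = list(csv.DictReader(io.StringIO(edges)))
print(nodes[1]["x"], links[0]["source"])  # 120.5 1
```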
354
frontend/scripts/compute_layout.ts
Normal file
@@ -0,0 +1,354 @@
#!/usr/bin/env npx tsx
/**
 * Tree-Aware Force Layout
 *
 * Generates a random tree (via generate_tree), computes a radial tree layout,
 * then applies gentle force refinement and writes node_positions.csv.
 *
 * Usage: npm run layout
 */

import { writeFileSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";
import { generateTree } from "./generate_tree.js";

const __dirname = dirname(fileURLToPath(import.meta.url));
const PUBLIC_DIR = join(__dirname, "..", "public");

// ══════════════════════════════════════════════════════════
// Configuration
// ══════════════════════════════════════════════════════════

const ENABLE_FORCE_SIM = true; // Set to false to skip force simulation
const ITERATIONS = 100;        // Force iterations (gentle)
const REPULSION_K = 80;        // Repulsion strength (1% of original 8000)
const EDGE_LENGTH = 120;       // Desired edge rest length
const ATTRACTION_K = 0.0002;   // Spring stiffness for edges (1% of original 0.02)
const THETA = 0.7;             // Barnes-Hut accuracy
const INITIAL_MAX_DISP = 15;   // Starting max displacement
const COOLING = 0.998;         // Very slow cooling per iteration
const MIN_DIST = 0.5;
const PRINT_EVERY = 10;        // Print progress every N iterations

// Scale radius so the tree is nicely spread
const RADIUS_PER_DEPTH = EDGE_LENGTH * 1.2;

// ── Special nodes with longer parent-edges ──
// Add vertex IDs here to give them longer edges to their parent.
// These nodes (and all their descendants) will be pushed further out.
const LONG_EDGE_NODES = new Set<number>([
  // e.g. 42, 99, 150
]);
const LONG_EDGE_MULTIPLIER = 3.0; // How many times longer than normal

// ══════════════════════════════════════════════════════════
// Generate tree (in-memory)
// ══════════════════════════════════════════════════════════

const { root, nodeCount: N, childrenOf, parentOf } = generateTree();

const nodeIds: number[] = [];
for (let i = 0; i < N; i++) nodeIds.push(i);

// Dense index mapping (identity since IDs are 0..N-1)
const idToIdx = new Map<number, number>();
for (let i = 0; i < N; i++) idToIdx.set(i, i);

// Edge list as index pairs (child, parent)
const edges: Array<[number, number]> = [];
for (const [child, parent] of parentOf) {
  edges.push([child, parent]);
}

// Per-node neighbor list (for edge traversal)
const neighbors: number[][] = Array.from({ length: N }, () => []);
for (const [a, b] of edges) {
  neighbors[a].push(b);
  neighbors[b].push(a);
}

console.log(`Tree: ${N} nodes, ${edges.length} edges, root=${root}`);

// ══════════════════════════════════════════════════════════
// Step 1: Radial tree layout (generous spacing, no crossings)
// ══════════════════════════════════════════════════════════

const x = new Float64Array(N);
const y = new Float64Array(N);
const depth = new Uint32Array(N);
const nodeRadius = new Float64Array(N); // cumulative radius from root

// Compute subtree sizes
const subtreeSize = new Uint32Array(N).fill(1);
{
  const rootIdx = idToIdx.get(root)!;
  const stack: Array<{ idx: number; phase: "enter" | "exit" }> = [
    { idx: rootIdx, phase: "enter" },
  ];
  while (stack.length > 0) {
    const { idx, phase } = stack.pop()!;
    if (phase === "enter") {
      stack.push({ idx, phase: "exit" });
      const kids = childrenOf.get(nodeIds[idx]);
      if (kids) {
        for (const kid of kids) {
          stack.push({ idx: idToIdx.get(kid)!, phase: "enter" });
        }
      }
    } else {
      const kids = childrenOf.get(nodeIds[idx]);
      if (kids) {
        for (const kid of kids) {
          subtreeSize[idx] += subtreeSize[idToIdx.get(kid)!];
        }
      }
    }
  }
}

// Compute depths & max depth
let maxDepth = 0;
{
  const rootIdx = idToIdx.get(root)!;
  const stack: Array<{ idx: number; d: number }> = [{ idx: rootIdx, d: 0 }];
  while (stack.length > 0) {
    const { idx, d } = stack.pop()!;
    depth[idx] = d;
    if (d > maxDepth) maxDepth = d;
    const kids = childrenOf.get(nodeIds[idx]);
    if (kids) {
      for (const kid of kids) {
        stack.push({ idx: idToIdx.get(kid)!, d: d + 1 });
      }
    }
  }
}

// BFS radial assignment (cumulative radii to support per-edge lengths)
{
  const rootIdx = idToIdx.get(root)!;
  x[rootIdx] = 0;
  y[rootIdx] = 0;
  nodeRadius[rootIdx] = 0;

  interface Entry {
    idx: number;
    d: number;
    aStart: number;
    aEnd: number;
  }

  const queue: Entry[] = [{ idx: rootIdx, d: 0, aStart: 0, aEnd: 2 * Math.PI }];
  let head = 0;

  while (head < queue.length) {
    const { idx, d, aStart, aEnd } = queue[head++];
    const kids = childrenOf.get(nodeIds[idx]);
    if (!kids || kids.length === 0) continue;

    // Sort children by subtree size (largest sectors together for balance)
    const sortedKids = [...kids].sort(
      (a, b) => (subtreeSize[idToIdx.get(b)!]) - (subtreeSize[idToIdx.get(a)!])
    );

    const totalWeight = sortedKids.reduce(
      (s, k) => s + subtreeSize[idToIdx.get(k)!], 0
    );

    let angle = aStart;
    for (const kid of sortedKids) {
      const kidIdx = idToIdx.get(kid)!;
      const w = subtreeSize[kidIdx];
      const sector = (w / totalWeight) * (aEnd - aStart);
      const mid = angle + sector / 2;

      // Cumulative radius: parent's radius + edge step (longer for special nodes)
      const step = LONG_EDGE_NODES.has(kid)
        ? RADIUS_PER_DEPTH * LONG_EDGE_MULTIPLIER
        : RADIUS_PER_DEPTH;
      const r = nodeRadius[idx] + step;
      nodeRadius[kidIdx] = r;

      x[kidIdx] = r * Math.cos(mid);
      y[kidIdx] = r * Math.sin(mid);

      queue.push({ idx: kidIdx, d: d + 1, aStart: angle, aEnd: angle + sector });
      angle += sector;
    }
  }
}

console.log(`Radial layout done (depth=${maxDepth}, radius_step=${RADIUS_PER_DEPTH})`);

// ══════════════════════════════════════════════════════════
// Step 2: Gentle force refinement (preserves non-crossing)
// ══════════════════════════════════════════════════════════

// Barnes-Hut quadtree for repulsion
interface BHNode {
  cx: number; cy: number;
  mass: number;
  size: number;
  children: (BHNode | null)[];
  bodyIdx: number;
}

function buildBHTree(): BHNode {
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (let i = 0; i < N; i++) {
    if (x[i] < minX) minX = x[i];
    if (x[i] > maxX) maxX = x[i];
    if (y[i] < minY) minY = y[i];
    if (y[i] > maxY) maxY = y[i];
  }
  const size = Math.max(maxX - minX, maxY - minY, 1) * 1.01;
  const cx = (minX + maxX) / 2;
  const cy = (minY + maxY) / 2;

  const root: BHNode = {
    cx: 0, cy: 0, mass: 0, size,
    children: [null, null, null, null], bodyIdx: -1,
  };

  for (let i = 0; i < N; i++) {
    insert(root, i, cx, cy, size);
  }
  return root;
}

function insert(node: BHNode, idx: number, ncx: number, ncy: number, ns: number): void {
  if (node.mass === 0) {
    node.bodyIdx = idx;
    node.cx = x[idx]; node.cy = y[idx];
    node.mass = 1;
    return;
  }
  if (node.bodyIdx >= 0) {
    const old = node.bodyIdx;
    node.bodyIdx = -1;
    putInQuadrant(node, old, ncx, ncy, ns);
  }
  putInQuadrant(node, idx, ncx, ncy, ns);
  const tm = node.mass + 1;
  node.cx = (node.cx * node.mass + x[idx]) / tm;
  node.cy = (node.cy * node.mass + y[idx]) / tm;
  node.mass = tm;
}

function putInQuadrant(node: BHNode, idx: number, ncx: number, ncy: number, ns: number): void {
  const hs = ns / 2;
  const qx = x[idx] >= ncx ? 1 : 0;
  const qy = y[idx] >= ncy ? 1 : 0;
  const q = qy * 2 + qx;
  const ccx = ncx + (qx ? hs / 2 : -hs / 2);
  const ccy = ncy + (qy ? hs / 2 : -hs / 2);
  if (!node.children[q]) {
    node.children[q] = {
      cx: 0, cy: 0, mass: 0, size: hs,
      children: [null, null, null, null], bodyIdx: -1,
    };
  }
  insert(node.children[q]!, idx, ccx, ccy, hs);
}

function repulse(node: BHNode, idx: number, fx: Float64Array, fy: Float64Array): void {
  if (node.mass === 0 || node.bodyIdx === idx) return;
  const dx = x[idx] - node.cx;
  const dy = y[idx] - node.cy;
  const d2 = dx * dx + dy * dy;
  const d = Math.sqrt(d2) || MIN_DIST;

  if (node.bodyIdx >= 0 || (node.size / d) < THETA) {
    const f = REPULSION_K * node.mass / (d2 + MIN_DIST);
    fx[idx] += (dx / d) * f;
    fy[idx] += (dy / d) * f;
    return;
  }
  for (const c of node.children) {
    if (c) repulse(c, idx, fx, fy);
  }
}

// ── Force simulation ──
if (ENABLE_FORCE_SIM) {
  console.log(`Applying gentle forces (${ITERATIONS} steps, 1% strength)...`);
  const t0 = performance.now();
  let maxDisp = INITIAL_MAX_DISP;

  for (let iter = 0; iter < ITERATIONS; iter++) {
    const fx = new Float64Array(N);
    const fy = new Float64Array(N);

    // 1. Repulsion
    const tree = buildBHTree();
    for (let i = 0; i < N; i++) {
      repulse(tree, i, fx, fy);
    }

    // 2. Edge attraction (spring toward per-edge rest length)
    for (const [a, b] of edges) {
      const dx = x[b] - x[a];
      const dy = y[b] - y[a];
      const d = Math.sqrt(dx * dx + dy * dy) || MIN_DIST;
      const aId = nodeIds[a], bId = nodeIds[b];
      const isLong = LONG_EDGE_NODES.has(aId) || LONG_EDGE_NODES.has(bId);
      const restLen = isLong ? EDGE_LENGTH * LONG_EDGE_MULTIPLIER : EDGE_LENGTH;
      const displacement = d - restLen;
      const f = ATTRACTION_K * displacement;
      const ux = dx / d, uy = dy / d;
      fx[a] += ux * f;
      fy[a] += uy * f;
      fx[b] -= ux * f;
      fy[b] -= uy * f;
    }

    // 3. Apply forces with displacement cap (cooling reduces it over time)
    for (let i = 0; i < N; i++) {
      const mag = Math.sqrt(fx[i] * fx[i] + fy[i] * fy[i]);
      if (mag > 0) {
        const cap = Math.min(maxDisp, mag) / mag;
        x[i] += fx[i] * cap;
        y[i] += fy[i] * cap;
      }
    }

    // 4. Cool down
    maxDisp *= COOLING;

    if ((iter + 1) % PRINT_EVERY === 0) {
      let totalForce = 0;
      for (let i = 0; i < N; i++) totalForce += Math.sqrt(fx[i] * fx[i] + fy[i] * fy[i]);
      console.log(`  iter ${iter + 1}/${ITERATIONS} max_disp=${maxDisp.toFixed(2)} avg_force=${(totalForce / N).toFixed(2)}`);
    }
  }

  const elapsed = performance.now() - t0;
  console.log(`Force simulation done in ${(elapsed / 1000).toFixed(1)}s`);
} else {
  console.log("Force simulation SKIPPED (ENABLE_FORCE_SIM = false)");
}

// ══════════════════════════════════════════════════════════
// Write output
// ══════════════════════════════════════════════════════════

// Write node positions
const outLines: string[] = ["vertex,x,y"];
for (let i = 0; i < N; i++) {
  outLines.push(`${nodeIds[i]},${x[i]},${y[i]}`);
}

const outPath = join(PUBLIC_DIR, "node_positions.csv");
writeFileSync(outPath, outLines.join("\n") + "\n");
console.log(`Wrote ${N} positions to ${outPath}`);

// Write edges (so the renderer can draw them)
const edgeLines: string[] = ["source,target"];
for (const [child, parent] of parentOf) {
  edgeLines.push(`${child},${parent}`);
}

const edgesPath = join(PUBLIC_DIR, "edges.csv");
writeFileSync(edgesPath, edgeLines.join("\n") + "\n");
console.log(`Wrote ${edges.length} edges to ${edgesPath}`);
61
frontend/scripts/generate_tree.ts
Normal file
@@ -0,0 +1,61 @@
/**
 * Random Tree Generator
 *
 * Generates a random tree with 1–MAX_CHILDREN children per node.
 * Exports a function that returns the tree data in memory.
 */

// ══════════════════════════════════════════════════════════
// Configuration
// ══════════════════════════════════════════════════════════

const TARGET_NODES = 100000; // Approximate number of nodes to generate
const MAX_CHILDREN = 3;      // Each node gets 1..MAX_CHILDREN children

// ══════════════════════════════════════════════════════════
// Tree data types
// ══════════════════════════════════════════════════════════

export interface TreeData {
  root: number;
  nodeCount: number;
  childrenOf: Map<number, number[]>;
  parentOf: Map<number, number>;
}

// ══════════════════════════════════════════════════════════
// Generator
// ══════════════════════════════════════════════════════════

export function generateTree(): TreeData {
  const childrenOf = new Map<number, number[]>();
  const parentOf = new Map<number, number>();

  const root = 0;
  let nextId = 1;
  const queue: number[] = [root];
  let head = 0;

  while (head < queue.length && nextId < TARGET_NODES) {
    const parent = queue[head++];
    const nKids = 1 + Math.floor(Math.random() * MAX_CHILDREN); // 1..MAX_CHILDREN

    const kids: number[] = [];
    for (let c = 0; c < nKids && nextId < TARGET_NODES; c++) {
      const child = nextId++;
      kids.push(child);
      parentOf.set(child, parent);
      queue.push(child);
    }
    childrenOf.set(parent, kids);
  }

  console.log(`Generated tree: ${nextId} nodes, ${parentOf.size} edges, root=${root}`);

  return {
    root,
    nodeCount: nextId,
    childrenOf,
    parentOf,
  };
}
@@ -1,12 +1,26 @@
|
||||
import { useEffect, useRef, useState } from "react";
|
||||
import { Renderer } from "./renderer";
|
||||
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise((r) => setTimeout(r, ms));
|
||||
}
|
||||
|
||||
type GraphMeta = {
|
||||
backend?: string;
|
||||
ttl_path?: string | null;
|
||||
sparql_endpoint?: string | null;
|
||||
include_bnodes?: boolean;
|
||||
node_limit?: number;
|
||||
edge_limit?: number;
|
||||
nodes?: number;
|
||||
edges?: number;
|
||||
};
|
||||
|
||||
export default function App() {
|
||||
const canvasRef = useRef<HTMLCanvasElement>(null);
|
||||
const rendererRef = useRef<Renderer | null>(null);
|
||||
const [status, setStatus] = useState("Loading node positions…");
|
||||
const [status, setStatus] = useState("Waiting for backend…");
|
||||
const [nodeCount, setNodeCount] = useState(0);
|
||||
const uriMapRef = useRef<Map<number, { uri: string; label: string; isPrimary: boolean }>>(new Map());
|
||||
const [stats, setStats] = useState({
|
||||
fps: 0,
|
||||
drawn: 0,
|
||||
@@ -15,11 +29,15 @@ export default function App() {
|
||||
ptSize: 0,
|
||||
});
|
||||
const [error, setError] = useState("");
|
||||
const [hoveredNode, setHoveredNode] = useState<{ x: number; y: number; screenX: number; screenY: number; index?: number } | null>(null);
|
||||
const [hoveredNode, setHoveredNode] = useState<{ x: number; y: number; screenX: number; screenY: number; label?: string; iri?: string } | null>(null);
|
||||
const [selectedNodes, setSelectedNodes] = useState<Set<number>>(new Set());
|
||||
const [backendStats, setBackendStats] = useState<{ nodes: number; edges: number; backend?: string } | null>(null);
|
||||
const graphMetaRef = useRef<GraphMeta | null>(null);
|
||||
const neighborsReqIdRef = useRef(0);
|
||||
|
||||
// Store mouse position in a ref so it can be accessed in render loop without re-renders
|
||||
const mousePos = useRef({ x: 0, y: 0 });
|
||||
const nodesRef = useRef<any[]>([]);
|
||||
|
||||
useEffect(() => {
|
||||
const canvas = canvasRef.current;
|
||||
@@ -36,87 +54,82 @@ export default function App() {
|
||||
|
||||
let cancelled = false;
|
||||
|
||||
// Fetch CSVs, parse, and init renderer
|
||||
(async () => {
|
||||
try {
|
||||
setStatus("Fetching data files…");
|
||||
const [nodesResponse, primaryEdgesResponse, secondaryEdgesResponse, uriMapResponse] = await Promise.all([
|
||||
fetch("/node_positions.csv"),
|
||||
fetch("/primary_edges.csv"),
|
||||
fetch("/secondary_edges.csv"),
|
||||
fetch("/uri_map.csv"),
|
||||
]);
|
||||
if (!nodesResponse.ok) throw new Error(`Failed to fetch nodes: ${nodesResponse.status}`);
|
||||
if (!primaryEdgesResponse.ok) throw new Error(`Failed to fetch primary edges: ${primaryEdgesResponse.status}`);
|
||||
if (!secondaryEdgesResponse.ok) throw new Error(`Failed to fetch secondary edges: ${secondaryEdgesResponse.status}`);
|
||||
// Wait for backend (docker-compose also gates startup via healthcheck, but this
|
||||
// handles running the frontend standalone).
|
||||
const deadline = performance.now() + 180_000;
|
||||
let attempt = 0;
|
||||
while (performance.now() < deadline) {
|
||||
attempt++;
|
||||
setStatus(`Waiting for backend… (attempt ${attempt})`);
|
||||
try {
|
||||
const res = await fetch("/api/health");
|
||||
if (res.ok) break;
|
||||
} catch {
|
||||
// ignore and retry
|
||||
}
|
||||
await sleep(1000);
|
||||
if (cancelled) return;
|
||||
}
|
||||
|
||||
const [nodesText, primaryEdgesText, secondaryEdgesText, uriMapText] = await Promise.all([
|
||||
nodesResponse.text(),
|
||||
primaryEdgesResponse.text(),
|
||||
secondaryEdgesResponse.text(),
|
||||
uriMapResponse.ok ? uriMapResponse.text() : Promise.resolve(""),
|
||||
]);
|
||||
setStatus("Fetching graph…");
|
||||
const graphRes = await fetch("/api/graph");
|
||||
if (!graphRes.ok) throw new Error(`Failed to fetch graph: ${graphRes.status}`);
|
||||
const graph = await graphRes.json();
|
||||
if (cancelled) return;
|
||||
|
||||
setStatus("Parsing positions…");
|
||||
const nodeLines = nodesText.split("\n").slice(1).filter(l => l.trim().length > 0);
|
||||
const count = nodeLines.length;
|
||||
const nodes = Array.isArray(graph.nodes) ? graph.nodes : [];
|
||||
const edges = Array.isArray(graph.edges) ? graph.edges : [];
|
||||
const meta = graph.meta || null;
|
||||
const count = nodes.length;
|
||||
|
||||
nodesRef.current = nodes;
|
||||
graphMetaRef.current = meta && typeof meta === "object" ? (meta as GraphMeta) : null;
|
||||
|
||||
// Build positions from backend-provided node coordinates.
|
||||
setStatus("Preparing buffers…");
|
||||
const xs = new Float32Array(count);
|
||||
const ys = new Float32Array(count);
|
||||
for (let i = 0; i < count; i++) {
|
||||
const nx = nodes[i]?.x;
|
||||
const ny = nodes[i]?.y;
|
||||
xs[i] = typeof nx === "number" ? nx : 0;
|
||||
ys[i] = typeof ny === "number" ? ny : 0;
|
||||
}
|
||||
const vertexIds = new Uint32Array(count);
|
||||
for (let i = 0; i < count; i++) {
|
||||
const parts = nodeLines[i].split(",");
|
||||
vertexIds[i] = parseInt(parts[0], 10);
|
||||
xs[i] = parseFloat(parts[1]);
|
||||
ys[i] = parseFloat(parts[2]);
|
||||
const id = nodes[i]?.id;
|
||||
vertexIds[i] = typeof id === "number" ? id >>> 0 : i;
|
||||
}
|
||||
|
||||
setStatus("Parsing edges…");
|
||||
const pLines = primaryEdgesText.split("\n").slice(1).filter(l => l.trim().length > 0);
|
||||
const sLines = secondaryEdgesText.split("\n").slice(1).filter(l => l.trim().length > 0);
|
||||
|
||||
const totalEdges = pLines.length + sLines.length;
|
||||
const edgeData = new Uint32Array(totalEdges * 2);
|
||||
|
||||
let idx = 0;
|
||||
// Parse primary
|
||||
for (let i = 0; i < pLines.length; i++) {
|
||||
const parts = pLines[i].split(",");
|
||||
edgeData[idx++] = parseInt(parts[0], 10);
|
||||
edgeData[idx++] = parseInt(parts[1], 10);
|
||||
}
|
||||
// Parse secondary
|
||||
for (let i = 0; i < sLines.length; i++) {
|
||||
const parts = sLines[i].split(",");
|
||||
edgeData[idx++] = parseInt(parts[0], 10);
|
||||
edgeData[idx++] = parseInt(parts[1], 10);
|
||||
// Build edges as vertex-id pairs.
|
||||
const edgeData = new Uint32Array(edges.length * 2);
|
||||
for (let i = 0; i < edges.length; i++) {
|
||||
const s = edges[i]?.source;
|
||||
const t = edges[i]?.target;
|
||||
edgeData[i * 2] = typeof s === "number" ? s >>> 0 : 0;
|
||||
edgeData[i * 2 + 1] = typeof t === "number" ? t >>> 0 : 0;
|
||||
}
|
||||
|
||||
// Parse URI map if available
|
||||
if (uriMapText) {
|
||||
const uriLines = uriMapText.split("\n").slice(1).filter(l => l.trim().length > 0);
|
||||
for (const line of uriLines) {
|
||||
const parts = line.split(",");
|
||||
if (parts.length >= 4) {
|
||||
const id = parseInt(parts[0], 10);
|
||||
const uri = parts[1];
|
||||
const label = parts[2];
|
||||
const isPrimary = parts[3].trim() === "1";
|
||||
uriMapRef.current.set(id, { uri, label, isPrimary });
|
||||
}
|
||||
}
|
||||
// Use /api/graph meta; don't do a second expensive backend call.
|
||||
if (meta && typeof meta.nodes === "number" && typeof meta.edges === "number") {
|
||||
setBackendStats({
|
||||
nodes: meta.nodes,
|
||||
edges: meta.edges,
|
||||
backend: typeof meta.backend === "string" ? meta.backend : undefined,
|
||||
});
|
||||
} else {
|
||||
setBackendStats({ nodes: nodes.length, edges: edges.length });
|
||||
}
|
||||
|
||||
if (cancelled) return;
|
||||
|
||||
setStatus("Building spatial index…");
|
||||
await new Promise(r => setTimeout(r, 0));
|
||||
await new Promise((r) => setTimeout(r, 0));
|
||||
|
||||
const buildMs = renderer.init(xs, ys, vertexIds, edgeData);
|
||||
setNodeCount(renderer.getNodeCount());
|
||||
setStatus("");
|
||||
console.log(`Init complete: ${count.toLocaleString()} nodes, ${totalEdges.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
|
||||
console.log(`Init complete: ${count.toLocaleString()} nodes, ${edges.length.toLocaleString()} edges in ${buildMs.toFixed(0)}ms`);
|
||||
} catch (e) {
|
||||
if (!cancelled) {
|
||||
setError(e instanceof Error ? e.message : String(e));
|
||||
@@ -200,9 +213,18 @@ export default function App() {
|
||||
frameCount++;
|
||||
|
||||
// Find hovered node using quadtree
|
||||
const nodeResult = renderer.findNodeIndexAt(mousePos.current.x, mousePos.current.y);
|
||||
if (nodeResult) {
|
||||
setHoveredNode({ x: nodeResult.x, y: nodeResult.y, screenX: mousePos.current.x, screenY: mousePos.current.y, index: nodeResult.index });
|
||||
const hit = renderer.findNodeIndexAt(mousePos.current.x, mousePos.current.y);
|
||||
if (hit) {
|
||||
const origIdx = renderer.sortedIndexToOriginalIndex(hit.index);
|
||||
const meta = origIdx === null ? null : nodesRef.current[origIdx];
|
||||
setHoveredNode({
|
||||
x: hit.x,
|
||||
y: hit.y,
|
||||
screenX: mousePos.current.x,
|
||||
screenY: mousePos.current.y,
|
||||
label: meta && typeof meta.label === "string" ? meta.label : undefined,
|
||||
iri: meta && typeof meta.iri === "string" ? meta.iri : undefined,
|
||||
});
|
||||
} else {
|
||||
setHoveredNode(null);
|
||||
}
|
||||
@@ -238,9 +260,72 @@ export default function App() {
|
||||
|
||||
// Sync selection state to renderer
|
||||
useEffect(() => {
|
||||
if (rendererRef.current) {
|
||||
rendererRef.current.updateSelection(selectedNodes);
|
||||
const renderer = rendererRef.current;
|
||||
if (!renderer) return;

// Optimistically reflect selection immediately; neighbors will be filled in by backend.
renderer.updateSelection(selectedNodes, new Set());

// Invalidate any in-flight neighbor request for the previous selection.
const reqId = ++neighborsReqIdRef.current;

// Convert selected sorted indices to backend node IDs (graph-export dense IDs).
const selectedIds: number[] = [];
for (const sortedIdx of selectedNodes) {
  const origIdx = renderer.sortedIndexToOriginalIndex(sortedIdx);
  if (origIdx === null) continue;
  const nodeId = nodesRef.current?.[origIdx]?.id;
  if (typeof nodeId === "number") selectedIds.push(nodeId);
}

if (selectedIds.length === 0) {
  return;
}

// Always send the full current selection list; backend returns the merged neighbor set.
const ctrl = new AbortController();

(async () => {
  try {
    const meta = graphMetaRef.current;
    const body = {
      selected_ids: selectedIds,
      node_limit: typeof meta?.node_limit === "number" ? meta.node_limit : undefined,
      edge_limit: typeof meta?.edge_limit === "number" ? meta.edge_limit : undefined,
    };

    const res = await fetch("/api/neighbors", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(body),
      signal: ctrl.signal,
    });
    if (!res.ok) throw new Error(`POST /api/neighbors failed: ${res.status}`);
    const data = await res.json();
    if (ctrl.signal.aborted) return;
    if (reqId !== neighborsReqIdRef.current) return;

    const neighborIds: unknown = data?.neighbor_ids;
    const neighborSorted = new Set<number>();
    if (Array.isArray(neighborIds)) {
      for (const id of neighborIds) {
        if (typeof id !== "number") continue;
        const sorted = renderer.vertexIdToSortedIndexOrNull(id);
        if (sorted === null) continue;
        if (!selectedNodes.has(sorted)) neighborSorted.add(sorted);
      }
    }

    renderer.updateSelection(selectedNodes, neighborSorted);
  } catch (e) {
    if (ctrl.signal.aborted) return;
    console.warn(e);
    // Keep the UI usable even if neighbors fail to load.
    renderer.updateSelection(selectedNodes, new Set());
  }
})();

return () => ctrl.abort();
}, [selectedNodes]);
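The effect above guards against out-of-order responses with a monotonically increasing request id (`neighborsReqIdRef`) combined with an `AbortController`. The core of that pattern can be sketched in isolation — the helper and its names are ours, not part of the app:

```typescript
// Hypothetical helper isolating the request-id pattern: each caller takes a
// fresh id, and only the most recent id is allowed to commit its result.
function makeLatestGuard() {
  let latest = 0;
  return {
    // Start a new request; any earlier id becomes stale.
    begin(): number {
      return ++latest;
    },
    // Check whether a given id is still the most recent one.
    isCurrent(id: number): boolean {
      return id === latest;
    },
  };
}

const guard = makeLatestGuard();
const first = guard.begin();
const second = guard.begin(); // supersedes `first`
console.log(guard.isCurrent(first));  // false
console.log(guard.isCurrent(second)); // true
```

The `AbortController` cancels the network transfer, while the id check additionally rejects responses that completed before the abort was issued — both are needed to avoid a stale response overwriting newer state.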
return (
@@ -312,6 +397,11 @@ export default function App() {
        <div>Zoom: {stats.zoom < 0.01 ? stats.zoom.toExponential(2) : stats.zoom.toFixed(2)} px/unit</div>
        <div>Pt Size: {stats.ptSize.toFixed(1)}px</div>
        <div style={{ color: "#f80" }}>Selected: {selectedNodes.size}</div>
        {backendStats && (
          <div style={{ color: "#8f8" }}>
            Backend{backendStats.backend ? ` (${backendStats.backend})` : ""}: {backendStats.nodes.toLocaleString()} nodes, {backendStats.edges.toLocaleString()} edges
          </div>
        )}
      </div>
      <div
        style={{
@@ -349,22 +439,12 @@ export default function App() {
          boxShadow: "0 2px 8px rgba(0,0,0,0.5)",
        }}
      >
        {(() => {
          if (hoveredNode.index !== undefined && rendererRef.current) {
            const vertexId = rendererRef.current.getVertexId(hoveredNode.index);
            const info = vertexId !== undefined ? uriMapRef.current.get(vertexId) : undefined;
            if (info) {
              return (
                <>
                  <div style={{ fontWeight: "bold", marginBottom: 2 }}>{info.label}</div>
                  <div style={{ fontSize: "10px", color: "#8cf", wordBreak: "break-all", maxWidth: 400 }}>{info.uri}</div>
                  {info.isPrimary && <div style={{ color: "#ff0", fontSize: "10px", marginTop: 2 }}>⭐ Primary (rdf:type)</div>}
                </>
              );
            }
          }
          return <>({hoveredNode.x.toFixed(2)}, {hoveredNode.y.toFixed(2)})</>;
        })()}
        <div style={{ color: "#0ff" }}>
          {hoveredNode.label || hoveredNode.iri || "(unknown)"}
        </div>
        <div style={{ color: "#688" }}>
          ({hoveredNode.x.toFixed(2)}, {hoveredNode.y.toFixed(2)})
        </div>
      </div>
    )}
  </>
@@ -80,10 +80,11 @@ export class Renderer {
  // Data
  private leaves: Leaf[] = [];
  private sorted: Float32Array = new Float32Array(0);
  // order[sortedIdx] = originalIdx (original ordering matches input arrays)
  private sortedToOriginal: Uint32Array = new Uint32Array(0);
  private vertexIdToSortedIndex: Map<number, number> = new Map();
  private nodeCount = 0;
  private edgeCount = 0;
  private neighborMap: Map<number, number[]> = new Map();
  private sortedToVertexId: Uint32Array = new Uint32Array(0);
  private leafEdgeStarts: Uint32Array = new Uint32Array(0);
  private leafEdgeCounts: Uint32Array = new Uint32Array(0);
  private maxPtSize = 256;
@@ -203,6 +204,7 @@ export class Renderer {
    const { sorted, leaves, order } = buildSpatialIndex(xs, ys);
    this.leaves = leaves;
    this.sorted = sorted;
    this.sortedToOriginal = order;

    // Pre-allocate arrays for render loop (zero-allocation rendering)
    this.visibleLeafIndices = new Uint32Array(leaves.length);
@@ -214,12 +216,6 @@ export class Renderer {
    gl.bufferData(gl.ARRAY_BUFFER, sorted, gl.STATIC_DRAW);
    gl.bindVertexArray(null);

    // Build sorted index → vertex ID mapping for hover lookups
    this.sortedToVertexId = new Uint32Array(count);
    for (let i = 0; i < count; i++) {
      this.sortedToVertexId[i] = vertexIds[order[i]];
    }

    // Build vertex ID → original input index mapping
    const vertexIdToOriginal = new Map<number, number>();
    for (let i = 0; i < count; i++) {
@@ -233,6 +229,13 @@ export class Renderer {
      originalToSorted[order[i]] = i;
    }

    // Build vertex ID → sorted index mapping (used by backend-driven neighbor highlighting)
    const vertexIdToSortedIndex = new Map<number, number>();
    for (let i = 0; i < count; i++) {
      vertexIdToSortedIndex.set(vertexIds[i], originalToSorted[i]);
    }
    this.vertexIdToSortedIndex = vertexIdToSortedIndex;

    // Remap edges from vertex IDs to sorted indices
    const lineIndices = new Uint32Array(edgeCount * 2);
    let validEdges = 0;
@@ -248,18 +251,6 @@ export class Renderer {
    }
    this.edgeCount = validEdges;

    // Build per-node neighbor list from edges for selection queries
    const neighborMap = new Map<number, number[]>();
    for (let i = 0; i < validEdges; i++) {
      const src = lineIndices[i * 2];
      const dst = lineIndices[i * 2 + 1];
      if (!neighborMap.has(src)) neighborMap.set(src, []);
      neighborMap.get(src)!.push(dst);
      if (!neighborMap.has(dst)) neighborMap.set(dst, []);
      neighborMap.get(dst)!.push(src);
    }
    this.neighborMap = neighborMap;

    // Build per-leaf edge index for efficient visible-only edge drawing
    // Find which leaf each sorted index belongs to
    const nodeToLeaf = new Uint32Array(count);
@@ -339,12 +330,25 @@ export class Renderer {
  }

  /**
   * Get the original vertex ID for a given sorted index.
   * Useful for looking up URI labels from the URI map.
   * Map a sorted buffer index (what findNodeIndexAt returns) back to the original
   * index in the input arrays used to initialize the renderer.
   */
  getVertexId(sortedIndex: number): number | undefined {
    if (sortedIndex < 0 || sortedIndex >= this.sortedToVertexId.length) return undefined;
    return this.sortedToVertexId[sortedIndex];
  sortedIndexToOriginalIndex(sortedIndex: number): number | null {
    if (
      sortedIndex < 0 ||
      sortedIndex >= this.sortedToOriginal.length
    ) {
      return null;
    }
    return this.sortedToOriginal[sortedIndex];
  }

  /**
   * Convert a backend node ID (node.id from /api/graph) to a sorted index used by the renderer.
   */
  vertexIdToSortedIndexOrNull(vertexId: number): number | null {
    const idx = this.vertexIdToSortedIndex.get(vertexId);
    return typeof idx === "number" ? idx : null;
  }

  /**
@@ -428,10 +432,10 @@ export class Renderer {

  /**
   * Update the selection buffer with the given set of node indices.
   * Also computes neighbors of selected nodes.
   * Call this whenever React's selection state changes.
   * Neighbor indices are provided by the backend (SPARQL query) and uploaded separately.
   * Call this whenever selection or backend neighbor results change.
   */
  updateSelection(selectedIndices: Set<number>): void {
  updateSelection(selectedIndices: Set<number>, neighborIndices: Set<number> = new Set()): void {
    const gl = this.gl;

    // Upload selected indices
@@ -441,23 +445,11 @@ export class Renderer {
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, indices, gl.DYNAMIC_DRAW);
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);

    // Compute neighbors of selected nodes (excluding already selected)
    const neighborSet = new Set<number>();
    for (const nodeIdx of selectedIndices) {
      const nodeNeighbors = this.neighborMap.get(nodeIdx);
      if (!nodeNeighbors) continue;
      for (const n of nodeNeighbors) {
        if (!selectedIndices.has(n)) {
          neighborSet.add(n);
        }
      }
    }

    // Upload neighbor indices
    const neighborIndices = new Uint32Array(neighborSet);
    this.neighborCount = neighborIndices.length;
    const neighborIndexArray = new Uint32Array(neighborIndices);
    this.neighborCount = neighborIndexArray.length;
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this.neighborIbo);
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, neighborIndices, gl.DYNAMIC_DRAW);
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, neighborIndexArray, gl.DYNAMIC_DRAW);
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, null);
  }
@@ -16,4 +16,10 @@ export default defineConfig({
      "@": path.resolve(__dirname, "src"),
    },
  },
  server: {
    proxy: {
      // Backend is reachable as http://backend:8000 inside docker-compose; localhost outside.
      "/api": process.env.VITE_BACKEND_URL || "http://localhost:8000",
    },
  },
});
@@ -1 +0,0 @@
vertex,x,y
@@ -1 +0,0 @@
source,target
@@ -1 +0,0 @@
source,target
@@ -1 +0,0 @@
id,uri,label,isPrimary
@@ -1,376 +0,0 @@
#!/usr/bin/env npx tsx
/**
 * Graph Layout
 *
 * Computes a 2D layout for a general graph (not necessarily a tree).
 *
 * - Primary nodes (from primary_edges.csv) are placed first in a radial layout
 * - Remaining nodes are placed near their connected primary neighbors
 * - Barnes-Hut force simulation relaxes the layout
 *
 * Reads: primary_edges.csv, secondary_edges.csv
 * Writes: node_positions.csv
 *
 * Usage: npx tsx scripts/compute_layout.ts
 */

import { writeFileSync, readFileSync, existsSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";

const __dirname = dirname(fileURLToPath(import.meta.url));
const PUBLIC_DIR = join(__dirname, "..", "public");

// ══════════════════════════════════════════════════════════
// Configuration
// ══════════════════════════════════════════════════════════

const ITERATIONS = 200;        // Force iterations
const REPULSION_K = 200;       // Repulsion strength
const EDGE_LENGTH = 80;        // Desired edge rest length
const ATTRACTION_K = 0.005;    // Spring stiffness for edges
const INITIAL_MAX_DISP = 20;   // Starting max displacement
const COOLING = 0.995;         // Cooling per iteration
const MIN_DIST = 0.5;
const PRINT_EVERY = 20;        // Print progress every N iterations
const BH_THETA = 0.8;          // Barnes-Hut opening angle

// Primary node radial placement
const PRIMARY_RADIUS = 300;    // Radius for primary node ring

// ══════════════════════════════════════════════════════════
// Read edge data from CSVs
// ══════════════════════════════════════════════════════════

const primaryPath = join(PUBLIC_DIR, "primary_edges.csv");
const secondaryPath = join(PUBLIC_DIR, "secondary_edges.csv");

if (!existsSync(primaryPath) || !existsSync(secondaryPath)) {
  console.error(`Error: Missing input files!`);
  console.error(` Expected: ${primaryPath}`);
  console.error(` Expected: ${secondaryPath}`);
  console.error(` Run 'npx tsx scripts/fetch_from_db.ts' first.`);
  process.exit(1);
}

function parseEdges(path: string): Array<[number, number]> {
  const content = readFileSync(path, "utf-8");
  const lines = content.trim().split("\n");
  const edges: Array<[number, number]> = [];
  for (let i = 1; i < lines.length; i++) {
    const line = lines[i].trim();
    if (!line) continue;
    const [src, tgt] = line.split(",").map(Number);
    if (!isNaN(src) && !isNaN(tgt)) {
      edges.push([src, tgt]);
    }
  }
  return edges;
}

const primaryEdges = parseEdges(primaryPath);
const secondaryEdges = parseEdges(secondaryPath);
const allEdges = [...primaryEdges, ...secondaryEdges];

// ══════════════════════════════════════════════════════════
// Build adjacency
// ══════════════════════════════════════════════════════════

const allNodes = new Set<number>();
const primaryNodes = new Set<number>();
const neighbors = new Map<number, Set<number>>();

function addNeighbor(a: number, b: number) {
  if (!neighbors.has(a)) neighbors.set(a, new Set());
  neighbors.get(a)!.add(b);
  if (!neighbors.has(b)) neighbors.set(b, new Set());
  neighbors.get(b)!.add(a);
}

for (const [src, dst] of primaryEdges) {
  allNodes.add(src);
  allNodes.add(dst);
  primaryNodes.add(src);
  primaryNodes.add(dst);
  addNeighbor(src, dst);
}

for (const [src, dst] of secondaryEdges) {
  allNodes.add(src);
  allNodes.add(dst);
  addNeighbor(src, dst);
}

const N = allNodes.size;
const nodeIds = Array.from(allNodes).sort((a, b) => a - b);
const idToIdx = new Map<number, number>();
nodeIds.forEach((id, idx) => idToIdx.set(id, idx));

console.log(
  `Read graph: ${N} nodes, ${allEdges.length} edges (P=${primaryEdges.length}, S=${secondaryEdges.length})`
);
console.log(`Primary nodes: ${primaryNodes.size}`);

// ══════════════════════════════════════════════════════════
// Initial placement
// ══════════════════════════════════════════════════════════

const x = new Float64Array(N);
const y = new Float64Array(N);

// Step 1: Place primary nodes in a radial layout
const primaryArr = Array.from(primaryNodes).sort((a, b) => a - b);
const angleStep = (2 * Math.PI) / Math.max(1, primaryArr.length);
const radius = PRIMARY_RADIUS * Math.max(1, Math.sqrt(primaryArr.length / 10));

for (let i = 0; i < primaryArr.length; i++) {
  const idx = idToIdx.get(primaryArr[i])!;
  const angle = i * angleStep;
  x[idx] = radius * Math.cos(angle);
  y[idx] = radius * Math.sin(angle);
}

console.log(`Placed ${primaryArr.length} primary nodes in radial layout (r=${radius.toFixed(0)})`);

// Step 2: Place remaining nodes near their connected neighbors
// BFS from already-placed nodes
const placed = new Set<number>(primaryNodes);
const queue: number[] = [...primaryArr];
let head = 0;

while (head < queue.length) {
  const nodeId = queue[head++];
  const nodeNeighbors = neighbors.get(nodeId);
  if (!nodeNeighbors) continue;

  for (const nbId of nodeNeighbors) {
    if (placed.has(nbId)) continue;
    placed.add(nbId);

    // Place near this neighbor with some jitter
    const parentIdx = idToIdx.get(nodeId)!;
    const childIdx = idToIdx.get(nbId)!;
    const jitterAngle = Math.random() * 2 * Math.PI;
    const jitterDist = EDGE_LENGTH * (0.5 + Math.random() * 0.5);
    x[childIdx] = x[parentIdx] + jitterDist * Math.cos(jitterAngle);
    y[childIdx] = y[parentIdx] + jitterDist * Math.sin(jitterAngle);

    queue.push(nbId);
  }
}

// Handle disconnected nodes (place randomly)
for (const id of nodeIds) {
  if (!placed.has(id)) {
    const idx = idToIdx.get(id)!;
    const angle = Math.random() * 2 * Math.PI;
    const r = radius * (1 + Math.random());
    x[idx] = r * Math.cos(angle);
    y[idx] = r * Math.sin(angle);
    placed.add(id);
  }
}

console.log(`Initial placement complete: ${placed.size} nodes`);

// ══════════════════════════════════════════════════════════
// Force-directed layout with Barnes-Hut
// ══════════════════════════════════════════════════════════

console.log(`Running force simulation (${ITERATIONS} iterations, ${N} nodes, ${allEdges.length} edges)...`);

const t0 = performance.now();
let maxDisp = INITIAL_MAX_DISP;

for (let iter = 0; iter < ITERATIONS; iter++) {
  const bhRoot = buildBHTree(x, y, N);
  const fx = new Float64Array(N);
  const fy = new Float64Array(N);

  // 1. Repulsion via Barnes-Hut
  for (let i = 0; i < N; i++) {
    calcBHForce(bhRoot, x[i], y[i], fx, fy, i, BH_THETA, x, y);
  }

  // 2. Edge attraction (spring force)
  for (const [aId, bId] of allEdges) {
    const a = idToIdx.get(aId)!;
    const b = idToIdx.get(bId)!;
    const dx = x[b] - x[a];
    const dy = y[b] - y[a];
    const d = Math.sqrt(dx * dx + dy * dy) || MIN_DIST;
    const displacement = d - EDGE_LENGTH;
    const f = ATTRACTION_K * displacement;
    const ux = dx / d;
    const uy = dy / d;
    fx[a] += ux * f;
    fy[a] += uy * f;
    fx[b] -= ux * f;
    fy[b] -= uy * f;
  }

  // 3. Apply forces with displacement capping
  let totalForce = 0;
  for (let i = 0; i < N; i++) {
    const mag = Math.sqrt(fx[i] * fx[i] + fy[i] * fy[i]);
    totalForce += mag;
    if (mag > 0) {
      const cap = Math.min(maxDisp, mag) / mag;
      x[i] += fx[i] * cap;
      y[i] += fy[i] * cap;
    }
  }

  maxDisp *= COOLING;

  if ((iter + 1) % PRINT_EVERY === 0 || iter === 0) {
    console.log(
      ` iter ${iter + 1}/${ITERATIONS} max_disp=${maxDisp.toFixed(2)} avg_force=${(totalForce / N).toFixed(2)}`
    );
  }
}

const elapsed = performance.now() - t0;
console.log(`Force simulation done in ${(elapsed / 1000).toFixed(1)}s`);

// ══════════════════════════════════════════════════════════
// Write output
// ══════════════════════════════════════════════════════════

const outLines: string[] = ["vertex,x,y"];
for (let i = 0; i < N; i++) {
  outLines.push(`${nodeIds[i]},${x[i]},${y[i]}`);
}

const outPath = join(PUBLIC_DIR, "node_positions.csv");
writeFileSync(outPath, outLines.join("\n") + "\n");
console.log(`Wrote ${N} positions to ${outPath}`);
console.log(`Layout complete.`);

// ══════════════════════════════════════════════════════════
// Barnes-Hut Helpers
// ══════════════════════════════════════════════════════════

interface BHNode {
  mass: number;
  cx: number;
  cy: number;
  minX: number;
  maxX: number;
  minY: number;
  maxY: number;
  children?: BHNode[];
  pointIdx?: number;
}

function buildBHTree(x: Float64Array, y: Float64Array, n: number): BHNode {
  let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
  for (let i = 0; i < n; i++) {
    if (x[i] < minX) minX = x[i];
    if (x[i] > maxX) maxX = x[i];
    if (y[i] < minY) minY = y[i];
    if (y[i] > maxY) maxY = y[i];
  }
  const cx = (minX + maxX) / 2;
  const cy = (minY + maxY) / 2;
  const halfDim = Math.max(maxX - minX, maxY - minY) / 2 + 0.01;

  const root: BHNode = {
    mass: 0, cx: 0, cy: 0,
    minX: cx - halfDim, maxX: cx + halfDim,
    minY: cy - halfDim, maxY: cy + halfDim,
  };

  for (let i = 0; i < n; i++) {
    insertBH(root, i, x[i], y[i], x, y);
  }
  calcBHMass(root, x, y);
  return root;
}

function insertBH(node: BHNode, idx: number, px: number, py: number, x: Float64Array, y: Float64Array) {
  if (!node.children && node.pointIdx === undefined) {
    node.pointIdx = idx;
    return;
  }

  if (!node.children && node.pointIdx !== undefined) {
    const oldIdx = node.pointIdx;
    node.pointIdx = undefined;
    subdivideBH(node);
    insertBH(node, oldIdx, x[oldIdx], y[oldIdx], x, y);
  }

  if (node.children) {
    const mx = (node.minX + node.maxX) / 2;
    const my = (node.minY + node.maxY) / 2;
    let q = 0;
    if (px > mx) q += 1;
    if (py > my) q += 2;
    insertBH(node.children[q], idx, px, py, x, y);
  }
}

function subdivideBH(node: BHNode) {
  const mx = (node.minX + node.maxX) / 2;
  const my = (node.minY + node.maxY) / 2;
  node.children = [
    { mass: 0, cx: 0, cy: 0, minX: node.minX, maxX: mx, minY: node.minY, maxY: my },
    { mass: 0, cx: 0, cy: 0, minX: mx, maxX: node.maxX, minY: node.minY, maxY: my },
    { mass: 0, cx: 0, cy: 0, minX: node.minX, maxX: mx, minY: my, maxY: node.maxY },
    { mass: 0, cx: 0, cy: 0, minX: mx, maxX: node.maxX, minY: my, maxY: node.maxY },
  ];
}

function calcBHMass(node: BHNode, x: Float64Array, y: Float64Array) {
  if (node.pointIdx !== undefined) {
    node.mass = 1;
    node.cx = x[node.pointIdx];
    node.cy = y[node.pointIdx];
    return;
  }
  if (node.children) {
    let m = 0, sx = 0, sy = 0;
    for (const c of node.children) {
      calcBHMass(c, x, y);
      m += c.mass;
      sx += c.cx * c.mass;
      sy += c.cy * c.mass;
    }
    node.mass = m;
    if (m > 0) {
      node.cx = sx / m;
      node.cy = sy / m;
    } else {
      node.cx = (node.minX + node.maxX) / 2;
      node.cy = (node.minY + node.maxY) / 2;
    }
  }
}

function calcBHForce(
  node: BHNode,
  px: number, py: number,
  fx: Float64Array, fy: Float64Array,
  idx: number, theta: number,
  x: Float64Array, y: Float64Array,
) {
  const dx = px - node.cx;
  const dy = py - node.cy;
  const d2 = dx * dx + dy * dy;
  const dist = Math.sqrt(d2);
  const width = node.maxX - node.minX;

  if (width / dist < theta || !node.children) {
    if (node.mass > 0 && node.pointIdx !== idx) {
      const dEff = Math.max(dist, MIN_DIST);
      const f = (REPULSION_K * node.mass) / (dEff * dEff);
      fx[idx] += (dx / dEff) * f;
      fy[idx] += (dy / dEff) * f;
    }
  } else {
    for (const c of node.children) {
      calcBHForce(c, px, py, fx, fy, idx, theta, x, y);
    }
  }
}
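The deleted layout script above approximates pairwise repulsion with a Barnes-Hut quadtree. For intuition (and as a ground truth when tuning `BH_THETA`, which trades accuracy for speed), here is a minimal O(n²) direct-sum version of the same inverse-square force law — the constants mirror `REPULSION_K = 200` and `MIN_DIST = 0.5` from the script, but the function itself is our sketch, not part of the repo:

```typescript
// Direct O(n²) repulsion: every pair of points pushes apart with
// f = repulsionK / d², clamped below at minDist to avoid blow-ups.
function directRepulsion(
  x: Float64Array,
  y: Float64Array,
  repulsionK = 200,
  minDist = 0.5,
): { fx: Float64Array; fy: Float64Array } {
  const n = x.length;
  const fx = new Float64Array(n);
  const fy = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (i === j) continue;
      const dx = x[i] - x[j];
      const dy = y[i] - y[j];
      const d = Math.max(Math.sqrt(dx * dx + dy * dy), minDist);
      const f = repulsionK / (d * d);
      fx[i] += (dx / d) * f; // unit vector away from j, scaled by f
      fy[i] += (dy / d) * f;
    }
  }
  return { fx, fy };
}

// Two points 10 units apart repel along x with |f| = 200 / 10² = 2.
const { fx } = directRepulsion(new Float64Array([0, 10]), new Float64Array([0, 0]));
// fx[0] = -2, fx[1] = +2
```

Barnes-Hut replaces the inner loop with a quadtree descent: distant clusters are treated as a single point mass at their center of mass whenever `width / dist < theta`, cutting the cost from O(n²) to roughly O(n log n).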
@@ -1,390 +0,0 @@
|
||||
#!/usr/bin/env npx tsx
|
||||
/**
|
||||
* Fetch RDF Data from AnzoGraph DB
|
||||
*
|
||||
* 1. Query the first 1000 distinct subject URIs
|
||||
* 2. Fetch all triples where those URIs appear as subject or object
|
||||
* 3. Identify primary nodes (objects of rdf:type)
|
||||
* 4. Write primary_edges.csv, secondary_edges.csv, and uri_map.csv
|
||||
*
|
||||
* Usage: npx tsx scripts/fetch_from_db.ts [--host http://localhost:8080]
|
||||
*/
|
||||
|
||||
import { writeFileSync } from "fs";
|
||||
import { join, dirname } from "path";
|
||||
import { fileURLToPath } from "url";
|
||||
|
||||
const __dirname = dirname(fileURLToPath(import.meta.url));
|
||||
const PUBLIC_DIR = join(__dirname, "..", "public");
|
||||
|
||||
// ══════════════════════════════════════════════════════════
|
||||
// Configuration
|
||||
// ══════════════════════════════════════════════════════════
|
||||
|
||||
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
|
||||
const BATCH_SIZE = 100; // URIs per VALUES batch query
|
||||
const MAX_RETRIES = 30; // Wait up to ~120s for AnzoGraph to start
|
||||
const RETRY_DELAY_MS = 4000;
|
||||
|
||||
// Path to TTL file inside the AnzoGraph container (mapped via docker-compose volume)
|
||||
const DATA_FILE = process.env.SPARQL_DATA_FILE || "file:///opt/shared-files/vkg-materialized.ttl";
|
||||
|
||||
// Parse --host flag, default to http://localhost:8080
|
||||
function getEndpoint(): string {
|
||||
const hostIdx = process.argv.indexOf("--host");
|
||||
if (hostIdx !== -1 && process.argv[hostIdx + 1]) {
|
||||
return process.argv[hostIdx + 1];
|
||||
}
|
||||
// Inside Docker, use service name; otherwise localhost
|
||||
return process.env.SPARQL_HOST || "http://localhost:8080";
|
||||
}
|
||||
|
||||
const SPARQL_ENDPOINT = `${getEndpoint()}/sparql`;
|
||||
|
||||
// Auth credentials (AnzoGraph defaults)
|
||||
const SPARQL_USER = process.env.SPARQL_USER || "admin";
|
||||
const SPARQL_PASS = process.env.SPARQL_PASS || "Passw0rd1";
|
||||
const AUTH_HEADER = "Basic " + Buffer.from(`${SPARQL_USER}:${SPARQL_PASS}`).toString("base64");
|
||||
|
||||
// ══════════════════════════════════════════════════════════
|
||||
// SPARQL helpers
|
||||
// ══════════════════════════════════════════════════════════
|
||||
|
||||
interface SparqlBinding {
|
||||
[key: string]: { type: string; value: string };
|
||||
}
|
||||
|
||||
function sleep(ms: number): Promise<void> {
|
||||
return new Promise((resolve) => setTimeout(resolve, ms));
|
||||
}
|
||||
|
||||
async function sparqlQuery(query: string, retries = 5): Promise<SparqlBinding[]> {
|
||||
for (let attempt = 1; attempt <= retries; attempt++) {
|
||||
const controller = new AbortController();
|
||||
const timeout = setTimeout(() => controller.abort(), 300_000); // 5 min timeout
|
||||
|
||||
try {
|
||||
const t0 = performance.now();
|
||||
const response = await fetch(SPARQL_ENDPOINT, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/x-www-form-urlencoded",
|
||||
"Accept": "application/sparql-results+json",
|
||||
"Authorization": AUTH_HEADER,
|
||||
},
|
||||
body: "query=" + encodeURIComponent(query),
|
||||
signal: controller.signal,
|
||||
});
|
||||
const t1 = performance.now();
|
||||
console.log(` [sparql] response status=${response.status} in ${((t1 - t0) / 1000).toFixed(1)}s`);
|
||||
|
||||
if (!response.ok) {
|
||||
const text = await response.text();
|
||||
throw new Error(`SPARQL query failed (${response.status}): ${text}`);
|
||||
}
|
||||
|
||||
const text = await response.text();
|
||||
const t2 = performance.now();
|
||||
console.log(` [sparql] body read (${(text.length / 1024).toFixed(0)} KB) in ${((t2 - t1) / 1000).toFixed(1)}s`);
|
||||
|
||||
const json = JSON.parse(text);
|
||||
return json.results.bindings;
|
||||
} catch (err: any) {
|
||||
clearTimeout(timeout);
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
const isTransient = msg.includes("fetch failed") || msg.includes("Timeout") || msg.includes("ABORT") || msg.includes("abort");
|
||||
if (isTransient && attempt < retries) {
|
||||
console.log(` [sparql] transient error (attempt ${attempt}/${retries}): ${msg.substring(0, 100)}`);
|
||||
console.log(` [sparql] retrying in 10s (AnzoGraph may still be indexing after LOAD)...`);
|
||||
await sleep(10_000);
|
||||
continue;
|
||||
}
|
||||
throw err;
|
||||
} finally {
|
||||
clearTimeout(timeout);
|
||||
}
|
||||
}
|
||||
throw new Error("sparqlQuery: should not reach here");
|
||||
}
|
||||
|
||||
async function waitForAnzoGraph(): Promise<void> {
|
||||
console.log(`Waiting for AnzoGraph at ${SPARQL_ENDPOINT}...`);
|
||||
for (let attempt = 1; attempt <= MAX_RETRIES; attempt++) {
|
||||
try {
|
||||
const response = await fetch(SPARQL_ENDPOINT, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/x-www-form-urlencoded",
|
||||
"Accept": "application/sparql-results+json",
|
||||
"Authorization": AUTH_HEADER,
|
||||
},
|
||||
body: "query=" + encodeURIComponent("ASK WHERE { ?s ?p ?o }"),
|
||||
});
|
||||
const text = await response.text();
|
||||
// Verify it's actual JSON (not a plain-text error from a half-started engine)
|
||||
JSON.parse(text);
|
||||
console.log(` AnzoGraph is ready (attempt ${attempt})`);
|
||||
return;
|
||||
} catch (err: any) {
|
||||
const msg = err instanceof Error ? err.message : String(err);
|
||||
console.log(` Attempt ${attempt}/${MAX_RETRIES}: ${msg.substring(0, 100)}`);
|
||||
if (attempt < MAX_RETRIES) {
|
||||
await sleep(RETRY_DELAY_MS);
|
||||
}
|
||||
}
|
||||
}
|
||||
throw new Error(`AnzoGraph not available after ${MAX_RETRIES} attempts`);
|
||||
}
|
||||
|
||||
async function sparqlUpdate(update: string): Promise<string> {
|
||||
const response = await fetch(SPARQL_ENDPOINT, {
|
||||
method: "POST",
|
||||
headers: {
|
||||
"Content-Type": "application/sparql-update",
|
||||
"Accept": "application/json",
|
||||
"Authorization": AUTH_HEADER,
|
||||
},
|
||||
body: update,
|
||||
});
|
||||
const text = await response.text();
|
||||
if (!response.ok) {
|
||||
throw new Error(`SPARQL update failed (${response.status}): ${text}`);
|
||||
}
|
||||
return text;
|
||||
}
|
||||
|
||||
async function loadData(): Promise<void> {
|
||||
console.log(`Loading data from ${DATA_FILE}...`);
|
||||
const t0 = performance.now();
|
||||
const result = await sparqlUpdate(`LOAD <${DATA_FILE}>`);
|
||||
const elapsed = ((performance.now() - t0) / 1000).toFixed(1);
|
||||
console.log(` Load complete in ${elapsed}s: ${result.substring(0, 200)}`);
|
||||
}
|
||||
|
||||
// ══════════════════════════════════════════════════════════
|
||||
// Step 1: Fetch seed URIs
|
||||
// ══════════════════════════════════════════════════════════
|
||||
|
||||
async function fetchSeedURIs(): Promise<string[]> {
|
||||
console.log("Querying first 1000 distinct subject URIs...");
|
||||
const t0 = performance.now();
|
||||
const query = `
|
||||
SELECT DISTINCT ?s
|
||||
WHERE { ?s ?p ?o }
|
||||
LIMIT 1000
|
||||
`;
|
||||
const bindings = await sparqlQuery(query);
|
||||
const elapsed = ((performance.now() - t0) / 1000).toFixed(1);
|
||||
const uris = bindings.map((b) => b.s.value);
|
||||
console.log(` Got ${uris.length} seed URIs in ${elapsed}s`);
|
||||
return uris;
|
||||
}
|
||||
|
||||
// ══════════════════════════════════════════════════════════
|
||||
// Step 2: Fetch all triples involving seed URIs
|
||||
// ══════════════════════════════════════════════════════════
|
||||
|
||||
interface Triple {
|
||||
s: string;
|
||||
p: string;
|
||||
o: string;
|
||||
oType: string; // "uri" or "literal"
|
||||
}
|
||||
|
||||
async function fetchTriples(seedURIs: string[]): Promise<Triple[]> {
|
||||
console.log(`Fetching triples for ${seedURIs.length} seed URIs (batch size: ${BATCH_SIZE})...`);
|
||||
const allTriples: Triple[] = [];
|
||||
|
||||
for (let i = 0; i < seedURIs.length; i += BATCH_SIZE) {
|
||||
const batch = seedURIs.slice(i, i + BATCH_SIZE);
|
||||
const valuesClause = batch.map((u) => `<${u}>`).join(" ");
|
||||
|
||||
const query = `
|
||||
SELECT ?s ?p ?o
|
||||
WHERE {
|
||||
VALUES ?uri { ${valuesClause} }
|
||||
{
|
||||
?uri ?p ?o .
|
||||
BIND(?uri AS ?s)
|
||||
}
|
||||
UNION
|
||||
{
|
||||
?s ?p ?uri .
|
||||
BIND(?uri AS ?o)
|
||||
}
|
||||
}
|
||||
`;
|
||||
const bindings = await sparqlQuery(query);
|
||||
for (const b of bindings) {
|
||||
allTriples.push({
|
||||
s: b.s.value,
|
||||
p: b.p.value,
|
||||
o: b.o.value,
|
||||
oType: b.o.type,
|
||||
});
|
||||
}
|
||||
|
||||
const progress = Math.min(i + BATCH_SIZE, seedURIs.length);
|
||||
process.stdout.write(`\r Fetched triples: batch ${Math.ceil(progress / BATCH_SIZE)}/${Math.ceil(seedURIs.length / BATCH_SIZE)} (${allTriples.length} triples so far)`);
|
||||
}
|
||||
console.log(`\n Total triples: ${allTriples.length}`);
|
||||
return allTriples;
|
||||
}
|
||||
|
||||
// ══════════════════════════════════════════════════════════
// Step 3: Build graph data
// ══════════════════════════════════════════════════════════

interface GraphData {
  nodeURIs: string[];                     // All unique URIs (subjects & objects that are URIs)
  uriToId: Map<string, number>;
  primaryNodeIds: Set<number>;            // Nodes that are objects of rdf:type
  edges: Array<[number, number]>;         // [source, target] as numeric IDs
  primaryEdges: Array<[number, number]>;
  secondaryEdges: Array<[number, number]>;
}

function buildGraphData(triples: Triple[]): GraphData {
  console.log("Building graph data...");

  // Collect all unique URI nodes (skip literal objects)
  const uriSet = new Set<string>();
  for (const t of triples) {
    uriSet.add(t.s);
    if (t.oType === "uri") {
      uriSet.add(t.o);
    }
  }

  // Assign numeric IDs
  const nodeURIs = Array.from(uriSet).sort();
  const uriToId = new Map<string, number>();
  nodeURIs.forEach((uri, idx) => uriToId.set(uri, idx));

  // Identify primary nodes: objects of rdf:type triples
  const primaryNodeIds = new Set<number>();
  for (const t of triples) {
    if (t.p === RDF_TYPE && t.oType === "uri") {
      const objId = uriToId.get(t.o);
      if (objId !== undefined) {
        primaryNodeIds.add(objId);
      }
    }
  }

  // Build edges (only between URI nodes, skip literal objects)
  const edgeSet = new Set<string>();
  const edges: Array<[number, number]> = [];
  for (const t of triples) {
    if (t.oType !== "uri") continue;
    const srcId = uriToId.get(t.s);
    const dstId = uriToId.get(t.o);
    if (srcId === undefined || dstId === undefined) continue;
    if (srcId === dstId) continue; // Skip self-loops

    const key = `${srcId},${dstId}`;
    if (edgeSet.has(key)) continue; // Deduplicate
    edgeSet.add(key);
    edges.push([srcId, dstId]);
  }

  // Classify edges into primary (touches a primary node) and secondary
  const primaryEdges: Array<[number, number]> = [];
  const secondaryEdges: Array<[number, number]> = [];
  for (const [src, dst] of edges) {
    if (primaryNodeIds.has(src) || primaryNodeIds.has(dst)) {
      primaryEdges.push([src, dst]);
    } else {
      secondaryEdges.push([src, dst]);
    }
  }

  console.log(` Nodes: ${nodeURIs.length}`);
  console.log(` Primary nodes (rdf:type objects): ${primaryNodeIds.size}`);
  console.log(` Edges: ${edges.length} (primary: ${primaryEdges.length}, secondary: ${secondaryEdges.length})`);

  return { nodeURIs, uriToId, primaryNodeIds, edges, primaryEdges, secondaryEdges };
}
// ══════════════════════════════════════════════════════════
// Step 4: Write CSV files
// ══════════════════════════════════════════════════════════

function extractLabel(uri: string): string {
  // Extract local name: after # or last /
  const hashIdx = uri.lastIndexOf("#");
  if (hashIdx !== -1) return uri.substring(hashIdx + 1);
  const slashIdx = uri.lastIndexOf("/");
  if (slashIdx !== -1) return uri.substring(slashIdx + 1);
  return uri;
}

function writeCSVs(data: GraphData): void {
  // Write primary_edges.csv
  const pLines = ["source,target"];
  for (const [src, dst] of data.primaryEdges) {
    pLines.push(`${src},${dst}`);
  }
  const pPath = join(PUBLIC_DIR, "primary_edges.csv");
  writeFileSync(pPath, pLines.join("\n") + "\n");
  console.log(`Wrote ${data.primaryEdges.length} primary edges to ${pPath}`);

  // Write secondary_edges.csv
  const sLines = ["source,target"];
  for (const [src, dst] of data.secondaryEdges) {
    sLines.push(`${src},${dst}`);
  }
  const sPath = join(PUBLIC_DIR, "secondary_edges.csv");
  writeFileSync(sPath, sLines.join("\n") + "\n");
  console.log(`Wrote ${data.secondaryEdges.length} secondary edges to ${sPath}`);

  // Write uri_map.csv (id,uri,label,isPrimary)
  const uLines = ["id,uri,label,isPrimary"];
  for (let i = 0; i < data.nodeURIs.length; i++) {
    const uri = data.nodeURIs[i];
    const label = extractLabel(uri);
    const isPrimary = data.primaryNodeIds.has(i) ? "1" : "0";
    // Escape commas in URIs by quoting
    const safeUri = uri.includes(",") ? `"${uri}"` : uri;
    const safeLabel = label.includes(",") ? `"${label}"` : label;
    uLines.push(`${i},${safeUri},${safeLabel},${isPrimary}`);
  }
  const uPath = join(PUBLIC_DIR, "uri_map.csv");
  writeFileSync(uPath, uLines.join("\n") + "\n");
  console.log(`Wrote ${data.nodeURIs.length} URI mappings to ${uPath}`);
}
// ══════════════════════════════════════════════════════════
// Main
// ══════════════════════════════════════════════════════════

async function main() {
  console.log(`SPARQL endpoint: ${SPARQL_ENDPOINT}`);
  const t0 = performance.now();

  await waitForAnzoGraph();
  await loadData();

  // Smoke test: simplest possible query to verify connectivity
  console.log("Smoke test: SELECT ?s ?p ?o LIMIT 3...");
  const smokeT0 = performance.now();
  const smokeResult = await sparqlQuery("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3");
  const smokeElapsed = ((performance.now() - smokeT0) / 1000).toFixed(1);
  console.log(` Smoke test OK: ${smokeResult.length} results in ${smokeElapsed}s`);
  if (smokeResult.length > 0) {
    console.log(` First triple: ${smokeResult[0].s.value} ${smokeResult[0].p.value} ${smokeResult[0].o.value}`);
  }

  const seedURIs = await fetchSeedURIs();
  const triples = await fetchTriples(seedURIs);
  const graphData = buildGraphData(triples);
  writeCSVs(graphData);

  const elapsed = ((performance.now() - t0) / 1000).toFixed(1);
  console.log(`\nDone in ${elapsed}s`);
}

main().catch((err) => {
  console.error("Fatal error:", err);
  process.exit(1);
});
@@ -1,132 +0,0 @@
/**
 * Random Tree Generator
 *
 * Generates a random tree with 1–MAX_CHILDREN children per node.
 * Splits edges into primary (depth ≤ PRIMARY_DEPTH) and secondary.
 *
 * Usage: npx tsx scripts/generate_tree.ts
 */

import { writeFileSync } from "fs";
import { join, dirname } from "path";
import { fileURLToPath } from "url";

const __dirname = dirname(fileURLToPath(import.meta.url));
const PUBLIC_DIR = join(__dirname, "..", "public");

// ══════════════════════════════════════════════════════════
// Configuration
// ══════════════════════════════════════════════════════════

const TARGET_NODES = 10000;  // Approximate number of nodes to generate
const MAX_CHILDREN = 4;      // Each node gets 1..MAX_CHILDREN children
const PRIMARY_DEPTH = 4;     // Nodes at depth ≤ this form the primary skeleton

// ══════════════════════════════════════════════════════════
// Tree data types
// ══════════════════════════════════════════════════════════

export interface TreeData {
  root: number;
  nodeCount: number;
  childrenOf: Map<number, number[]>;
  parentOf: Map<number, number>;
  depthOf: Map<number, number>;
  primaryNodes: Set<number>;               // all nodes at depth ≤ PRIMARY_DEPTH
  primaryEdges: Array<[number, number]>;   // [child, parent] edges within primary
  secondaryEdges: Array<[number, number]>; // remaining edges
}

// ══════════════════════════════════════════════════════════
// Generator
// ══════════════════════════════════════════════════════════

export function generateTree(): TreeData {
  const childrenOf = new Map<number, number[]>();
  const parentOf = new Map<number, number>();
  const depthOf = new Map<number, number>();

  const root = 0;
  depthOf.set(root, 0);
  let nextId = 1;
  const queue: number[] = [root];
  let head = 0;

  while (head < queue.length && nextId < TARGET_NODES) {
    const parent = queue[head++];
    const parentDepth = depthOf.get(parent)!;
    const nKids = 1 + Math.floor(Math.random() * MAX_CHILDREN); // 1..MAX_CHILDREN

    const kids: number[] = [];
    for (let c = 0; c < nKids && nextId < TARGET_NODES; c++) {
      const child = nextId++;
      kids.push(child);
      parentOf.set(child, parent);
      depthOf.set(child, parentDepth + 1);
      queue.push(child);
    }
    childrenOf.set(parent, kids);
  }

  // Classify edges and nodes by depth
  const primaryNodes = new Set<number>();
  const primaryEdges: Array<[number, number]> = [];
  const secondaryEdges: Array<[number, number]> = [];

  // Root is always primary
  primaryNodes.add(root);

  for (const [child, parent] of parentOf) {
    const childDepth = depthOf.get(child)!;
    if (childDepth <= PRIMARY_DEPTH) {
      primaryNodes.add(child);
      primaryNodes.add(parent);
      primaryEdges.push([child, parent]);
    } else {
      secondaryEdges.push([child, parent]);
    }
  }

  console.log(
    `Generated tree: ${nextId} nodes, ` +
    `${primaryEdges.length} primary edges (depth ≤ ${PRIMARY_DEPTH}), ` +
    `${secondaryEdges.length} secondary edges`
  );

  return {
    root,
    nodeCount: nextId,
    childrenOf,
    parentOf,
    depthOf,
    primaryNodes,
    primaryEdges,
    secondaryEdges,
  };
}
// ══════════════════════════════════════════════════════════
// Run if executed directly
// ══════════════════════════════════════════════════════════

if (import.meta.url === `file://${process.argv[1]}`) {
  const data = generateTree();

  // Write primary_edges.csv
  const pLines: string[] = ["source,target"];
  for (const [child, parent] of data.primaryEdges) {
    pLines.push(`${child},${parent}`);
  }
  const pPath = join(PUBLIC_DIR, "primary_edges.csv");
  writeFileSync(pPath, pLines.join("\n") + "\n");
  console.log(`Wrote ${data.primaryEdges.length} edges to ${pPath}`);

  // Write secondary_edges.csv
  const sLines: string[] = ["source,target"];
  for (const [child, parent] of data.secondaryEdges) {
    sLines.push(`${child},${parent}`);
  }
  const sPath = join(PUBLIC_DIR, "secondary_edges.csv");
  writeFileSync(sPath, sLines.join("\n") + "\n");
  console.log(`Wrote ${data.secondaryEdges.length} edges to ${sPath}`);
}