midpoint - go
This commit is contained in:
@@ -2,12 +2,8 @@
|
|||||||
|
|
||||||
This folder contains the FastAPI backend for `visualizador_instanciados`.
|
This folder contains the FastAPI backend for `visualizador_instanciados`.
|
||||||
|
|
||||||
The backend can execute SPARQL queries in two interchangeable ways:
|
The backend executes SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP
|
||||||
|
(optionally `LOAD` a TTL on startup).
|
||||||
1. **`GRAPH_BACKEND=rdflib`**: parse a Turtle file into an in-memory RDFLib `Graph` and run SPARQL queries locally.
|
|
||||||
2. **`GRAPH_BACKEND=anzograph`**: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally `LOAD` a TTL on startup).
|
|
||||||
|
|
||||||
Callers (frontend or other clients) interact with a single API surface (`/api/*`) and do not need to know which backend is configured.
|
|
||||||
|
|
||||||
## Files
|
## Files
|
||||||
|
|
||||||
@@ -16,10 +12,9 @@ Callers (frontend or other clients) interact with a single API surface (`/api/*`
|
|||||||
- `settings.py`
|
- `settings.py`
|
||||||
- Env-driven configuration (`pydantic-settings`).
|
- Env-driven configuration (`pydantic-settings`).
|
||||||
- `sparql_engine.py`
|
- `sparql_engine.py`
|
||||||
- Backend-agnostic SPARQL execution layer:
|
- SPARQL execution layer:
|
||||||
- `RdflibEngine`: `Graph.query(...)` + SPARQL JSON serialization.
|
|
||||||
- `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
|
- `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
|
||||||
- `create_sparql_engine(settings)` chooses the engine based on `GRAPH_BACKEND`.
|
- `create_sparql_engine(settings)` creates the engine.
|
||||||
- `graph_export.py`
|
- `graph_export.py`
|
||||||
- Shared helpers to:
|
- Shared helpers to:
|
||||||
- build the snapshot SPARQL query used for edge retrieval
|
- build the snapshot SPARQL query used for edge retrieval
|
||||||
@@ -27,11 +22,8 @@ Callers (frontend or other clients) interact with a single API surface (`/api/*`
|
|||||||
- `models.py`
|
- `models.py`
|
||||||
- Pydantic response/request models:
|
- Pydantic response/request models:
|
||||||
- `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
|
- `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
|
||||||
- `rdf_store.py`
|
|
||||||
- A local parsed representation (dense IDs + neighbor-ish data) built only in `GRAPH_BACKEND=rdflib`.
|
|
||||||
- Used by `/api/nodes`, `/api/edges`, and `rdflib`-mode `/api/stats`.
|
|
||||||
- `pipelines/graph_snapshot.py`
|
- `pipelines/graph_snapshot.py`
|
||||||
- Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL (works for both RDFLib and AnzoGraph).
|
- Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL.
|
||||||
- `pipelines/layout_dag_radial.py`
|
- `pipelines/layout_dag_radial.py`
|
||||||
- DAG layout helpers used by `pipelines/graph_snapshot.py`:
|
- DAG layout helpers used by `pipelines/graph_snapshot.py`:
|
||||||
- cycle detection
|
- cycle detection
|
||||||
@@ -48,11 +40,10 @@ On startup (FastAPI lifespan):
|
|||||||
|
|
||||||
1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
|
1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
|
||||||
2. The engine is stored at `app.state.sparql`.
|
2. The engine is stored at `app.state.sparql`.
|
||||||
3. If `GRAPH_BACKEND=rdflib`, `RDFStore` is also built from the already-loaded RDFLib graph and stored at `app.state.store`.
|
|
||||||
|
|
||||||
On shutdown:
|
On shutdown:
|
||||||
|
|
||||||
- `app.state.sparql.shutdown()` is called to close the HTTP client (AnzoGraph mode) or no-op (RDFLib mode).
|
- `app.state.sparql.shutdown()` is called to close the HTTP client.
|
||||||
|
|
||||||
## Environment Variables
|
## Environment Variables
|
||||||
|
|
||||||
@@ -60,20 +51,16 @@ Most configuration is intended to be provided via container environment variable
|
|||||||
|
|
||||||
Core:
|
Core:
|
||||||
|
|
||||||
- `GRAPH_BACKEND`: `rdflib` or `anzograph`
|
|
||||||
- `INCLUDE_BNODES`: `true`/`false`
|
- `INCLUDE_BNODES`: `true`/`false`
|
||||||
- `CORS_ORIGINS`: comma-separated list or `*`
|
- `CORS_ORIGINS`: comma-separated list or `*`
|
||||||
|
|
||||||
RDFLib mode:
|
Optional import-combining step (separate container):
|
||||||
|
|
||||||
- `TTL_PATH`: path inside the backend container to a `.ttl` file (example: `/data/o3po.ttl`)
|
The repo's `owl_imports_combiner` Docker service can be used to recursively load a Turtle file (or URL) plus its `owl:imports` into a single combined TTL output.
|
||||||
- `MAX_TRIPLES`: optional int; if set, stops parsing after this many triples
|
|
||||||
|
|
||||||
Optional import-combining step (runs before the SPARQL engine starts):
|
- `COMBINE_OWL_IMPORTS_ON_START`: `true` to run the combiner container on startup (no-op when `false`)
|
||||||
|
- `COMBINE_ENTRY_LOCATION`: entry file/URL to load (falls back to `TTL_PATH` if not set)
|
||||||
- `COMBINE_OWL_IMPORTS_ON_START`: `true` to recursively load `TTL_PATH` (or `COMBINE_ENTRY_LOCATION`) plus `owl:imports` and write a combined TTL file.
|
- `COMBINE_OUTPUT_LOCATION`: output path for the combined TTL (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
|
||||||
- `COMBINE_ENTRY_LOCATION`: optional override for the entry file/URL to load (defaults to `TTL_PATH`)
|
|
||||||
- `COMBINE_OUTPUT_LOCATION`: optional explicit output path (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
|
|
||||||
- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
|
- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
|
||||||
- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists
|
- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists
|
||||||
|
|
||||||
@@ -119,8 +106,6 @@ This matches the behavior described in `docs/anzograph-readiness-julia.md`.
|
|||||||
- `GET /api/graph?node_limit=...&edge_limit=...`
|
- `GET /api/graph?node_limit=...&edge_limit=...`
|
||||||
- Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
|
- Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
|
||||||
- Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
|
- Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
|
||||||
- `GET /api/nodes`, `GET /api/edges`
|
|
||||||
- Only available in `GRAPH_BACKEND=rdflib` (these use `RDFStore`'s dense ID tables).
|
|
||||||
|
|
||||||
## Data Contract
|
## Data Contract
|
||||||
|
|
||||||
@@ -193,5 +178,4 @@ If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph`
|
|||||||
## Notes / Tradeoffs
|
## Notes / Tradeoffs
|
||||||
|
|
||||||
- `/api/graph` returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
|
- `/api/graph` returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
|
||||||
- RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
|
- AnzoGraph SPARQL feature support (inference, extensions, performance) is vendor-specific.
|
||||||
- `rdf_store.py` is currently only needed for `/api/nodes`, `/api/edges`, and rdflib-mode `/api/stats`. If you don't use those endpoints, it can be removed later.
|
|
||||||
|
|||||||
@@ -1,81 +1,34 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from contextlib import asynccontextmanager
|
from contextlib import asynccontextmanager
|
||||||
import logging
|
|
||||||
import asyncio
|
|
||||||
|
|
||||||
from fastapi import FastAPI, HTTPException, Query
|
from fastapi import FastAPI, HTTPException, Query
|
||||||
from fastapi.middleware.cors import CORSMiddleware
|
from fastapi.middleware.cors import CORSMiddleware
|
||||||
|
|
||||||
from .models import (
|
from .models import (
|
||||||
EdgesResponse,
|
|
||||||
GraphResponse,
|
GraphResponse,
|
||||||
NeighborsRequest,
|
NeighborsRequest,
|
||||||
NeighborsResponse,
|
NeighborsResponse,
|
||||||
NodesResponse,
|
|
||||||
SparqlQueryRequest,
|
SparqlQueryRequest,
|
||||||
StatsResponse,
|
StatsResponse,
|
||||||
)
|
)
|
||||||
from .pipelines.layout_dag_radial import CycleError
|
from .pipelines.layout_dag_radial import CycleError
|
||||||
from .pipelines.owl_imports_combiner import (
|
|
||||||
build_combined_graph,
|
|
||||||
output_location_to_path,
|
|
||||||
resolve_output_location,
|
|
||||||
serialize_graph_to_ttl,
|
|
||||||
)
|
|
||||||
from .pipelines.selection_neighbors import fetch_neighbor_ids_for_selection
|
from .pipelines.selection_neighbors import fetch_neighbor_ids_for_selection
|
||||||
from .pipelines.snapshot_service import GraphSnapshotService
|
from .pipelines.snapshot_service import GraphSnapshotService
|
||||||
from .rdf_store import RDFStore
|
from .sparql_engine import SparqlEngine, create_sparql_engine
|
||||||
from .sparql_engine import RdflibEngine, SparqlEngine, create_sparql_engine
|
|
||||||
from .settings import Settings
|
from .settings import Settings
|
||||||
|
|
||||||
|
|
||||||
settings = Settings()
|
settings = Settings()
|
||||||
logger = logging.getLogger(__name__)
|
|
||||||
|
|
||||||
|
|
||||||
@asynccontextmanager
|
@asynccontextmanager
|
||||||
async def lifespan(app: FastAPI):
|
async def lifespan(app: FastAPI):
|
||||||
rdflib_preloaded_graph = None
|
sparql: SparqlEngine = create_sparql_engine(settings)
|
||||||
|
|
||||||
if settings.combine_owl_imports_on_start:
|
|
||||||
entry_location = settings.combine_entry_location or settings.ttl_path
|
|
||||||
output_location = resolve_output_location(
|
|
||||||
entry_location,
|
|
||||||
output_location=settings.combine_output_location,
|
|
||||||
output_name=settings.combine_output_name,
|
|
||||||
)
|
|
||||||
|
|
||||||
output_path = output_location_to_path(output_location)
|
|
||||||
if output_path.exists() and not settings.combine_force:
|
|
||||||
logger.info("Skipping combine step (output exists): %s", output_location)
|
|
||||||
else:
|
|
||||||
rdflib_preloaded_graph = await asyncio.to_thread(build_combined_graph, entry_location)
|
|
||||||
logger.info("Finished combining imports; serializing to: %s", output_location)
|
|
||||||
await asyncio.to_thread(serialize_graph_to_ttl, rdflib_preloaded_graph, output_location)
|
|
||||||
|
|
||||||
if settings.graph_backend == "rdflib":
|
|
||||||
settings.ttl_path = str(output_path)
|
|
||||||
|
|
||||||
sparql: SparqlEngine = create_sparql_engine(settings, rdflib_graph=rdflib_preloaded_graph)
|
|
||||||
await sparql.startup()
|
await sparql.startup()
|
||||||
app.state.sparql = sparql
|
app.state.sparql = sparql
|
||||||
app.state.snapshot_service = GraphSnapshotService(sparql=sparql, settings=settings)
|
app.state.snapshot_service = GraphSnapshotService(sparql=sparql, settings=settings)
|
||||||
|
|
||||||
# Only build node/edge tables when running in rdflib mode.
|
|
||||||
if settings.graph_backend == "rdflib":
|
|
||||||
assert isinstance(sparql, RdflibEngine)
|
|
||||||
if sparql.graph is None:
|
|
||||||
raise RuntimeError("rdflib graph failed to load")
|
|
||||||
|
|
||||||
store = RDFStore(
|
|
||||||
ttl_path=settings.ttl_path,
|
|
||||||
include_bnodes=settings.include_bnodes,
|
|
||||||
max_triples=settings.max_triples,
|
|
||||||
)
|
|
||||||
store.load(sparql.graph)
|
|
||||||
app.state.store = store
|
|
||||||
|
|
||||||
yield
|
yield
|
||||||
|
|
||||||
await sparql.shutdown()
|
await sparql.shutdown()
|
||||||
@@ -109,7 +62,7 @@ async def stats() -> StatsResponse:
|
|||||||
meta = snap.meta
|
meta = snap.meta
|
||||||
return StatsResponse(
|
return StatsResponse(
|
||||||
backend=meta.backend if meta else app.state.sparql.name,
|
backend=meta.backend if meta else app.state.sparql.name,
|
||||||
ttl_path=meta.ttl_path if meta and meta.ttl_path else settings.ttl_path,
|
ttl_path=meta.ttl_path if meta else None,
|
||||||
sparql_endpoint=meta.sparql_endpoint if meta else None,
|
sparql_endpoint=meta.sparql_endpoint if meta else None,
|
||||||
parsed_triples=len(snap.edges),
|
parsed_triples=len(snap.edges),
|
||||||
nodes=len(snap.nodes),
|
nodes=len(snap.nodes),
|
||||||
@@ -138,28 +91,6 @@ async def neighbors(req: NeighborsRequest) -> NeighborsResponse:
|
|||||||
return NeighborsResponse(selected_ids=req.selected_ids, neighbor_ids=neighbor_ids)
|
return NeighborsResponse(selected_ids=req.selected_ids, neighbor_ids=neighbor_ids)
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/nodes", response_model=NodesResponse)
|
|
||||||
def nodes(
|
|
||||||
limit: int = Query(default=10_000, ge=1, le=200_000),
|
|
||||||
offset: int = Query(default=0, ge=0),
|
|
||||||
) -> NodesResponse:
|
|
||||||
if settings.graph_backend != "rdflib":
|
|
||||||
raise HTTPException(status_code=501, detail="GET /api/nodes is only supported in GRAPH_BACKEND=rdflib mode")
|
|
||||||
store: RDFStore = app.state.store
|
|
||||||
return NodesResponse(total=store.node_count, nodes=store.node_slice(offset=offset, limit=limit))
|
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/edges", response_model=EdgesResponse)
|
|
||||||
def edges(
|
|
||||||
limit: int = Query(default=50_000, ge=1, le=500_000),
|
|
||||||
offset: int = Query(default=0, ge=0),
|
|
||||||
) -> EdgesResponse:
|
|
||||||
if settings.graph_backend != "rdflib":
|
|
||||||
raise HTTPException(status_code=501, detail="GET /api/edges is only supported in GRAPH_BACKEND=rdflib mode")
|
|
||||||
store: RDFStore = app.state.store
|
|
||||||
return EdgesResponse(total=store.edge_count, edges=store.edge_slice(offset=offset, limit=limit))
|
|
||||||
|
|
||||||
|
|
||||||
@app.get("/api/graph", response_model=GraphResponse)
|
@app.get("/api/graph", response_model=GraphResponse)
|
||||||
async def graph(
|
async def graph(
|
||||||
node_limit: int = Query(default=50_000, ge=1, le=200_000),
|
node_limit: int = Query(default=50_000, ge=1, le=200_000),
|
||||||
|
|||||||
@@ -8,7 +8,7 @@ class Node(BaseModel):
|
|||||||
termType: str # "uri" | "bnode"
|
termType: str # "uri" | "bnode"
|
||||||
iri: str
|
iri: str
|
||||||
label: str | None = None
|
label: str | None = None
|
||||||
# Optional because /api/nodes (RDFStore) doesn't currently provide positions.
|
# Optional because some endpoints may omit positions.
|
||||||
x: float | None = None
|
x: float | None = None
|
||||||
y: float | None = None
|
y: float | None = None
|
||||||
|
|
||||||
@@ -21,23 +21,13 @@ class Edge(BaseModel):
|
|||||||
|
|
||||||
class StatsResponse(BaseModel):
|
class StatsResponse(BaseModel):
|
||||||
backend: str
|
backend: str
|
||||||
ttl_path: str
|
ttl_path: str | None = None
|
||||||
sparql_endpoint: str | None = None
|
sparql_endpoint: str | None = None
|
||||||
parsed_triples: int
|
parsed_triples: int
|
||||||
nodes: int
|
nodes: int
|
||||||
edges: int
|
edges: int
|
||||||
|
|
||||||
|
|
||||||
class NodesResponse(BaseModel):
|
|
||||||
total: int
|
|
||||||
nodes: list[Node]
|
|
||||||
|
|
||||||
|
|
||||||
class EdgesResponse(BaseModel):
|
|
||||||
total: int
|
|
||||||
edges: list[Edge]
|
|
||||||
|
|
||||||
|
|
||||||
class GraphResponse(BaseModel):
|
class GraphResponse(BaseModel):
|
||||||
class Meta(BaseModel):
|
class Meta(BaseModel):
|
||||||
backend: str
|
backend: str
|
||||||
|
|||||||
@@ -69,8 +69,7 @@ async def fetch_graph_snapshot(
|
|||||||
edge_limit: int,
|
edge_limit: int,
|
||||||
) -> GraphResponse:
|
) -> GraphResponse:
|
||||||
"""
|
"""
|
||||||
Fetch a graph snapshot (nodes + edges) via SPARQL, independent of whether the
|
Fetch a graph snapshot (nodes + edges) via SPARQL.
|
||||||
underlying engine is RDFLib or AnzoGraph.
|
|
||||||
"""
|
"""
|
||||||
edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
|
edges_q = edge_retrieval_query(edge_limit=edge_limit, include_bnodes=settings.include_bnodes)
|
||||||
res = await sparql.query_json(edges_q)
|
res = await sparql.query_json(edges_q)
|
||||||
@@ -137,8 +136,8 @@ async def fetch_graph_snapshot(
|
|||||||
|
|
||||||
meta = GraphResponse.Meta(
|
meta = GraphResponse.Meta(
|
||||||
backend=sparql.name,
|
backend=sparql.name,
|
||||||
ttl_path=settings.ttl_path if settings.graph_backend == "rdflib" else None,
|
ttl_path=None,
|
||||||
sparql_endpoint=settings.effective_sparql_endpoint() if settings.graph_backend == "anzograph" else None,
|
sparql_endpoint=settings.effective_sparql_endpoint(),
|
||||||
include_bnodes=settings.include_bnodes,
|
include_bnodes=settings.include_bnodes,
|
||||||
node_limit=node_limit,
|
node_limit=node_limit,
|
||||||
edge_limit=edge_limit,
|
edge_limit=edge_limit,
|
||||||
|
|||||||
@@ -1,150 +0,0 @@
|
|||||||
from __future__ import annotations
|
|
||||||
|
|
||||||
from dataclasses import dataclass
|
|
||||||
from typing import Any
|
|
||||||
|
|
||||||
from rdflib import BNode, Graph, Literal, URIRef
|
|
||||||
from rdflib.namespace import RDFS, SKOS
|
|
||||||
|
|
||||||
|
|
||||||
LABEL_PREDICATES = {RDFS.label, SKOS.prefLabel, SKOS.altLabel}
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass(frozen=True)
|
|
||||||
class EdgeRow:
|
|
||||||
source: int
|
|
||||||
target: int
|
|
||||||
predicate: str
|
|
||||||
|
|
||||||
|
|
||||||
class RDFStore:
|
|
||||||
def __init__(self, *, ttl_path: str, include_bnodes: bool, max_triples: int | None):
|
|
||||||
self.ttl_path = ttl_path
|
|
||||||
self.include_bnodes = include_bnodes
|
|
||||||
self.max_triples = max_triples
|
|
||||||
|
|
||||||
self.graph: Graph | None = None
|
|
||||||
|
|
||||||
self._id_by_term: dict[Any, int] = {}
|
|
||||||
self._term_by_id: list[Any] = []
|
|
||||||
|
|
||||||
self._labels_by_id: dict[int, str] = {}
|
|
||||||
self._edges: list[EdgeRow] = []
|
|
||||||
self._parsed_triples = 0
|
|
||||||
|
|
||||||
def _term_allowed(self, term: Any) -> bool:
|
|
||||||
if isinstance(term, Literal):
|
|
||||||
return False
|
|
||||||
if isinstance(term, BNode) and not self.include_bnodes:
|
|
||||||
return False
|
|
||||||
return isinstance(term, (URIRef, BNode))
|
|
||||||
|
|
||||||
def _get_id(self, term: Any) -> int | None:
|
|
||||||
if not self._term_allowed(term):
|
|
||||||
return None
|
|
||||||
existing = self._id_by_term.get(term)
|
|
||||||
if existing is not None:
|
|
||||||
return existing
|
|
||||||
nid = len(self._term_by_id)
|
|
||||||
self._id_by_term[term] = nid
|
|
||||||
self._term_by_id.append(term)
|
|
||||||
return nid
|
|
||||||
|
|
||||||
def _term_type(self, term: Any) -> str:
|
|
||||||
if isinstance(term, BNode):
|
|
||||||
return "bnode"
|
|
||||||
return "uri"
|
|
||||||
|
|
||||||
def _term_iri(self, term: Any) -> str:
|
|
||||||
if isinstance(term, BNode):
|
|
||||||
return f"_:{term}"
|
|
||||||
return str(term)
|
|
||||||
|
|
||||||
def load(self, graph: Graph | None = None) -> None:
|
|
||||||
g = graph or Graph()
|
|
||||||
if graph is None:
|
|
||||||
g.parse(self.ttl_path, format="turtle")
|
|
||||||
self.graph = g
|
|
||||||
|
|
||||||
self._id_by_term.clear()
|
|
||||||
self._term_by_id.clear()
|
|
||||||
self._labels_by_id.clear()
|
|
||||||
self._edges.clear()
|
|
||||||
|
|
||||||
parsed = 0
|
|
||||||
for (s, p, o) in g:
|
|
||||||
parsed += 1
|
|
||||||
if self.max_triples is not None and parsed > self.max_triples:
|
|
||||||
break
|
|
||||||
|
|
||||||
# Capture labels but do not emit them as edges.
|
|
||||||
if p in LABEL_PREDICATES and isinstance(o, Literal):
|
|
||||||
sid = self._get_id(s)
|
|
||||||
if sid is not None and sid not in self._labels_by_id:
|
|
||||||
self._labels_by_id[sid] = str(o)
|
|
||||||
continue
|
|
||||||
|
|
||||||
sid = self._get_id(s)
|
|
||||||
oid = self._get_id(o)
|
|
||||||
if sid is None or oid is None:
|
|
||||||
continue
|
|
||||||
|
|
||||||
self._edges.append(EdgeRow(source=sid, target=oid, predicate=str(p)))
|
|
||||||
|
|
||||||
self._parsed_triples = parsed
|
|
||||||
|
|
||||||
@property
|
|
||||||
def parsed_triples(self) -> int:
|
|
||||||
return self._parsed_triples
|
|
||||||
|
|
||||||
@property
|
|
||||||
def node_count(self) -> int:
|
|
||||||
return len(self._term_by_id)
|
|
||||||
|
|
||||||
@property
|
|
||||||
def edge_count(self) -> int:
|
|
||||||
return len(self._edges)
|
|
||||||
|
|
||||||
def node_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
|
|
||||||
end = min(self.node_count, offset + limit)
|
|
||||||
out: list[dict[str, Any]] = []
|
|
||||||
for nid in range(offset, end):
|
|
||||||
term = self._term_by_id[nid]
|
|
||||||
out.append(
|
|
||||||
{
|
|
||||||
"id": nid,
|
|
||||||
"termType": self._term_type(term),
|
|
||||||
"iri": self._term_iri(term),
|
|
||||||
"label": self._labels_by_id.get(nid),
|
|
||||||
}
|
|
||||||
)
|
|
||||||
return out
|
|
||||||
|
|
||||||
def edge_slice(self, *, offset: int, limit: int) -> list[dict[str, Any]]:
|
|
||||||
end = min(self.edge_count, offset + limit)
|
|
||||||
out: list[dict[str, Any]] = []
|
|
||||||
for row in self._edges[offset:end]:
|
|
||||||
out.append(
|
|
||||||
{
|
|
||||||
"source": row.source,
|
|
||||||
"target": row.target,
|
|
||||||
"predicate": row.predicate,
|
|
||||||
}
|
|
||||||
)
|
|
||||||
return out
|
|
||||||
|
|
||||||
def edges_within_nodes(self, *, max_node_id_exclusive: int, limit: int) -> list[dict[str, Any]]:
|
|
||||||
out: list[dict[str, Any]] = []
|
|
||||||
for row in self._edges:
|
|
||||||
if row.source >= max_node_id_exclusive or row.target >= max_node_id_exclusive:
|
|
||||||
continue
|
|
||||||
out.append(
|
|
||||||
{
|
|
||||||
"source": row.source,
|
|
||||||
"target": row.target,
|
|
||||||
"predicate": row.predicate,
|
|
||||||
}
|
|
||||||
)
|
|
||||||
if len(out) >= limit:
|
|
||||||
break
|
|
||||||
return out
|
|
||||||
@@ -1,27 +1,11 @@
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
from typing import Literal
|
|
||||||
|
|
||||||
from pydantic import Field
|
from pydantic import Field
|
||||||
from pydantic_settings import BaseSettings, SettingsConfigDict
|
from pydantic_settings import BaseSettings, SettingsConfigDict
|
||||||
|
|
||||||
|
|
||||||
class Settings(BaseSettings):
|
class Settings(BaseSettings):
|
||||||
# Which graph engine executes SPARQL queries.
|
|
||||||
# - rdflib: parse TTL locally and query in-memory
|
|
||||||
# - anzograph: query a remote AnzoGraph SPARQL endpoint (optionally LOAD on startup)
|
|
||||||
graph_backend: Literal["rdflib", "anzograph"] = Field(default="rdflib", alias="GRAPH_BACKEND")
|
|
||||||
|
|
||||||
ttl_path: str = Field(default="/data/o3po.ttl", alias="TTL_PATH")
|
|
||||||
include_bnodes: bool = Field(default=False, alias="INCLUDE_BNODES")
|
include_bnodes: bool = Field(default=False, alias="INCLUDE_BNODES")
|
||||||
max_triples: int | None = Field(default=None, alias="MAX_TRIPLES")
|
|
||||||
|
|
||||||
# Optional: Combine owl:imports into a single TTL file on backend startup.
|
|
||||||
combine_owl_imports_on_start: bool = Field(default=False, alias="COMBINE_OWL_IMPORTS_ON_START")
|
|
||||||
combine_entry_location: str | None = Field(default=None, alias="COMBINE_ENTRY_LOCATION")
|
|
||||||
combine_output_location: str | None = Field(default=None, alias="COMBINE_OUTPUT_LOCATION")
|
|
||||||
combine_output_name: str = Field(default="combined_ontology.ttl", alias="COMBINE_OUTPUT_NAME")
|
|
||||||
combine_force: bool = Field(default=False, alias="COMBINE_FORCE")
|
|
||||||
|
|
||||||
# AnzoGraph / SPARQL endpoint configuration
|
# AnzoGraph / SPARQL endpoint configuration
|
||||||
sparql_host: str = Field(default="http://anzograph:8080", alias="SPARQL_HOST")
|
sparql_host: str = Field(default="http://anzograph:8080", alias="SPARQL_HOST")
|
||||||
|
|||||||
@@ -2,11 +2,9 @@ from __future__ import annotations
|
|||||||
|
|
||||||
import asyncio
|
import asyncio
|
||||||
import base64
|
import base64
|
||||||
import json
|
|
||||||
from typing import Any, Protocol
|
from typing import Any, Protocol
|
||||||
|
|
||||||
import httpx
|
import httpx
|
||||||
from rdflib import Graph
|
|
||||||
|
|
||||||
from .settings import Settings
|
from .settings import Settings
|
||||||
|
|
||||||
@@ -21,35 +19,6 @@ class SparqlEngine(Protocol):
|
|||||||
async def query_json(self, query: str) -> dict[str, Any]: ...
|
async def query_json(self, query: str) -> dict[str, Any]: ...
|
||||||
|
|
||||||
|
|
||||||
class RdflibEngine:
|
|
||||||
name = "rdflib"
|
|
||||||
|
|
||||||
def __init__(self, *, ttl_path: str, graph: Graph | None = None):
|
|
||||||
self.ttl_path = ttl_path
|
|
||||||
self.graph: Graph | None = graph
|
|
||||||
|
|
||||||
async def startup(self) -> None:
|
|
||||||
if self.graph is not None:
|
|
||||||
return
|
|
||||||
g = Graph()
|
|
||||||
g.parse(self.ttl_path, format="turtle")
|
|
||||||
self.graph = g
|
|
||||||
|
|
||||||
async def shutdown(self) -> None:
|
|
||||||
# Nothing to close for in-memory rdflib graph.
|
|
||||||
return None
|
|
||||||
|
|
||||||
async def query_json(self, query: str) -> dict[str, Any]:
|
|
||||||
if self.graph is None:
|
|
||||||
raise RuntimeError("RdflibEngine not started")
|
|
||||||
|
|
||||||
result = self.graph.query(query)
|
|
||||||
payload = result.serialize(format="json")
|
|
||||||
if isinstance(payload, bytes):
|
|
||||||
payload = payload.decode("utf-8")
|
|
||||||
return json.loads(payload)
|
|
||||||
|
|
||||||
|
|
||||||
class AnzoGraphEngine:
|
class AnzoGraphEngine:
|
||||||
name = "anzograph"
|
name = "anzograph"
|
||||||
|
|
||||||
@@ -169,9 +138,5 @@ class AnzoGraphEngine:
|
|||||||
raise RuntimeError(f"AnzoGraph not ready at {self.endpoint}") from last_err
|
raise RuntimeError(f"AnzoGraph not ready at {self.endpoint}") from last_err
|
||||||
|
|
||||||
|
|
||||||
def create_sparql_engine(settings: Settings, *, rdflib_graph: Graph | None = None) -> SparqlEngine:
|
def create_sparql_engine(settings: Settings) -> SparqlEngine:
|
||||||
if settings.graph_backend == "rdflib":
|
return AnzoGraphEngine(settings=settings)
|
||||||
return RdflibEngine(ttl_path=settings.ttl_path, graph=rdflib_graph)
|
|
||||||
if settings.graph_backend == "anzograph":
|
|
||||||
return AnzoGraphEngine(settings=settings)
|
|
||||||
raise RuntimeError(f"Unsupported GRAPH_BACKEND={settings.graph_backend!r}")
|
|
||||||
|
|||||||
@@ -1,5 +1,4 @@
|
|||||||
fastapi
|
fastapi
|
||||||
uvicorn[standard]
|
uvicorn[standard]
|
||||||
rdflib
|
|
||||||
pydantic-settings
|
pydantic-settings
|
||||||
httpx
|
httpx
|
||||||
|
|||||||
@@ -1,13 +1,22 @@
|
|||||||
services:
|
services:
|
||||||
|
owl_imports_combiner:
|
||||||
|
build: ./python_services/owl_imports_combiner
|
||||||
|
environment:
|
||||||
|
- COMBINE_OWL_IMPORTS_ON_START=${COMBINE_OWL_IMPORTS_ON_START:-false}
|
||||||
|
- COMBINE_ENTRY_LOCATION
|
||||||
|
- COMBINE_OUTPUT_LOCATION
|
||||||
|
- COMBINE_OUTPUT_NAME
|
||||||
|
- COMBINE_FORCE=${COMBINE_FORCE:-false}
|
||||||
|
- TTL_PATH=${TTL_PATH:-/data/o3po.ttl}
|
||||||
|
volumes:
|
||||||
|
- ./data:/data:Z
|
||||||
|
|
||||||
backend:
|
backend:
|
||||||
build: ./backend
|
build: ./backend
|
||||||
ports:
|
ports:
|
||||||
- "8000:8000"
|
- "8000:8000"
|
||||||
environment:
|
environment:
|
||||||
- GRAPH_BACKEND=${GRAPH_BACKEND:-rdflib}
|
|
||||||
- TTL_PATH=${TTL_PATH:-/data/o3po.ttl}
|
|
||||||
- INCLUDE_BNODES=${INCLUDE_BNODES:-false}
|
- INCLUDE_BNODES=${INCLUDE_BNODES:-false}
|
||||||
- MAX_TRIPLES
|
|
||||||
- CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
|
- CORS_ORIGINS=${CORS_ORIGINS:-http://localhost:5173}
|
||||||
- SPARQL_HOST=${SPARQL_HOST:-http://anzograph:8080}
|
- SPARQL_HOST=${SPARQL_HOST:-http://anzograph:8080}
|
||||||
- SPARQL_ENDPOINT
|
- SPARQL_ENDPOINT
|
||||||
@@ -21,14 +30,12 @@ services:
|
|||||||
- SPARQL_READY_RETRIES=${SPARQL_READY_RETRIES:-30}
|
- SPARQL_READY_RETRIES=${SPARQL_READY_RETRIES:-30}
|
||||||
- SPARQL_READY_DELAY_S=${SPARQL_READY_DELAY_S:-4}
|
- SPARQL_READY_DELAY_S=${SPARQL_READY_DELAY_S:-4}
|
||||||
- SPARQL_READY_TIMEOUT_S=${SPARQL_READY_TIMEOUT_S:-10}
|
- SPARQL_READY_TIMEOUT_S=${SPARQL_READY_TIMEOUT_S:-10}
|
||||||
- COMBINE_OWL_IMPORTS_ON_START=${COMBINE_OWL_IMPORTS_ON_START:-false}
|
|
||||||
- COMBINE_ENTRY_LOCATION
|
|
||||||
- COMBINE_OUTPUT_LOCATION
|
|
||||||
- COMBINE_OUTPUT_NAME
|
|
||||||
- COMBINE_FORCE=${COMBINE_FORCE:-false}
|
|
||||||
volumes:
|
volumes:
|
||||||
- ./backend:/app
|
- ./backend:/app
|
||||||
- ./data:/data:Z
|
- ./data:/data:Z
|
||||||
|
depends_on:
|
||||||
|
- owl_imports_combiner
|
||||||
|
- anzograph
|
||||||
command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
|
command: uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
|
||||||
healthcheck:
|
healthcheck:
|
||||||
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health').read()"]
|
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health').read()"]
|
||||||
|
|||||||
14
python_services/owl_imports_combiner/Dockerfile
Normal file
14
python_services/owl_imports_combiner/Dockerfile
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
FROM python:3.12-slim
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
ENV PYTHONDONTWRITEBYTECODE=1 \
|
||||||
|
PYTHONUNBUFFERED=1
|
||||||
|
|
||||||
|
COPY requirements.txt /app/requirements.txt
|
||||||
|
RUN pip install --no-cache-dir -r /app/requirements.txt
|
||||||
|
|
||||||
|
COPY owl_imports_combiner.py /app/owl_imports_combiner.py
|
||||||
|
COPY main.py /app/main.py
|
||||||
|
|
||||||
|
CMD ["python", "/app/main.py"]
|
||||||
54
python_services/owl_imports_combiner/main.py
Normal file
54
python_services/owl_imports_combiner/main.py
Normal file
@@ -0,0 +1,54 @@
|
|||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
|
||||||
|
from owl_imports_combiner import (
|
||||||
|
build_combined_graph,
|
||||||
|
output_location_to_path,
|
||||||
|
resolve_output_location,
|
||||||
|
serialize_graph_to_ttl,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
def _env_bool(name: str, *, default: bool = False) -> bool:
|
||||||
|
val = os.getenv(name)
|
||||||
|
if val is None:
|
||||||
|
return default
|
||||||
|
return val.strip().lower() in {"1", "true", "yes", "y", "on"}
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO").upper())
|
||||||
|
|
||||||
|
if not _env_bool("COMBINE_OWL_IMPORTS_ON_START", default=False):
|
||||||
|
logger.info("Skipping combine step (COMBINE_OWL_IMPORTS_ON_START=false)")
|
||||||
|
return
|
||||||
|
|
||||||
|
entry_location = os.getenv("COMBINE_ENTRY_LOCATION") or os.getenv("TTL_PATH")
|
||||||
|
if not entry_location:
|
||||||
|
raise SystemExit("Set COMBINE_ENTRY_LOCATION (or TTL_PATH) to the ontology file/URL to load.")
|
||||||
|
|
||||||
|
output_name = os.getenv("COMBINE_OUTPUT_NAME", "combined_ontology.ttl")
|
||||||
|
output_location = resolve_output_location(
|
||||||
|
entry_location,
|
||||||
|
output_location=os.getenv("COMBINE_OUTPUT_LOCATION"),
|
||||||
|
output_name=output_name,
|
||||||
|
)
|
||||||
|
|
||||||
|
output_path = output_location_to_path(output_location)
|
||||||
|
force = _env_bool("COMBINE_FORCE", default=False)
|
||||||
|
if output_path.exists() and not force:
|
||||||
|
logger.info("Skipping combine step (output exists): %s", output_location)
|
||||||
|
return
|
||||||
|
|
||||||
|
graph = build_combined_graph(entry_location)
|
||||||
|
logger.info("Finished combining imports; serializing to: %s", output_location)
|
||||||
|
serialize_graph_to_ttl(graph, output_location)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
1
python_services/owl_imports_combiner/requirements.txt
Normal file
1
python_services/owl_imports_combiner/requirements.txt
Normal file
@@ -0,0 +1 @@
|
|||||||
|
rdflib
|
||||||
Reference in New Issue
Block a user