
Backend App (backend/app)

This folder contains the FastAPI backend for visualizador_instanciados.

The backend can execute SPARQL queries in two interchangeable ways:

  1. GRAPH_BACKEND=rdflib: parse a Turtle file into an in-memory RDFLib Graph and run SPARQL queries locally.
  2. GRAPH_BACKEND=anzograph: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally LOAD a TTL on startup).

Callers (frontend or other clients) interact with a single API surface (/api/*) and do not need to know which backend is configured.
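
The "single API surface" works because both engines satisfy the same small interface. A minimal sketch of that contract (method names here are illustrative, not necessarily the exact ones in sparql_engine.py):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparqlEngine(Protocol):
    """Interface both engines satisfy (illustrative sketch)."""

    async def query_json(self, sparql: str) -> dict[str, Any]:
        """Run a SELECT/ASK query and return SPARQL 1.1 JSON results."""
        ...

    async def shutdown(self) -> None:
        """Release resources (HTTP client, etc.); may be a no-op."""
        ...
```

Endpoints only depend on this shape, so swapping GRAPH_BACKEND never changes handler code.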

Files

  • main.py
    • FastAPI app setup, startup/shutdown (lifespan), and HTTP endpoints.
  • settings.py
    • Env-driven configuration (pydantic-settings).
  • sparql_engine.py
    • Backend-agnostic SPARQL execution layer:
      • RdflibEngine: Graph.query(...) + SPARQL JSON serialization.
      • AnzoGraphEngine: HTTP POST to /sparql with Basic auth + readiness gate.
    • create_sparql_engine(settings) chooses the engine based on GRAPH_BACKEND.
  • graph_export.py
    • Shared helpers to:
      • build the snapshot SPARQL query used for edge retrieval
      • map SPARQL JSON bindings to {nodes, edges}.
  • models.py
    • Pydantic response/request models:
      • Node, Edge, GraphResponse, StatsResponse, etc.
  • rdf_store.py
    • A local parsed representation (dense IDs plus adjacency data), built only when GRAPH_BACKEND=rdflib.
    • Used by /api/nodes, /api/edges, and rdflib-mode /api/stats.
  • pipelines/graph_snapshot.py
    • Pipeline used by /api/graph to return a {nodes, edges} snapshot via SPARQL (works for both RDFLib and AnzoGraph).
  • pipelines/layout_dag_radial.py
    • DAG layout helpers used by pipelines/graph_snapshot.py:
      • cycle detection
      • level-synchronous Kahn layering
      • radial (ring-per-layer) positioning.
  • pipelines/snapshot_service.py
    • Snapshot cache layer used by /api/graph and /api/stats so the backend doesn't run expensive SPARQL twice.
  • pipelines/subclass_labels.py
    • Pipeline to extract rdfs:subClassOf entities and aligned rdfs:label list.
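
The dispatch in create_sparql_engine can be pictured like this (a sketch with stand-in classes and a stand-in Settings; the real constructors take more parameters):

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Stand-in for the real pydantic-settings model."""
    graph_backend: str = "rdflib"
    ttl_path: str = "/data/example.ttl"
    sparql_endpoint: str = "http://anzograph:8080/sparql"


class RdflibEngine:
    def __init__(self, ttl_path: str):
        self.ttl_path = ttl_path


class AnzoGraphEngine:
    def __init__(self, endpoint: str):
        self.endpoint = endpoint


def create_sparql_engine(settings: Settings):
    """Choose the engine from GRAPH_BACKEND (sketch; real signatures differ)."""
    backend = settings.graph_backend.lower()
    if backend == "rdflib":
        return RdflibEngine(ttl_path=settings.ttl_path)
    if backend == "anzograph":
        return AnzoGraphEngine(endpoint=settings.sparql_endpoint)
    raise ValueError(f"unsupported GRAPH_BACKEND: {settings.graph_backend!r}")
```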

Runtime Flow

On startup (FastAPI lifespan):

  1. create_sparql_engine(settings) selects and starts a SPARQL engine.
  2. The engine is stored at app.state.sparql.
  3. If GRAPH_BACKEND=rdflib, RDFStore is also built from the already-loaded RDFLib graph and stored at app.state.store.

On shutdown:

  • app.state.sparql.shutdown() is called; it closes the HTTP client in AnzoGraph mode and is a no-op in RDFLib mode.
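
The startup/shutdown flow above maps onto a FastAPI lifespan context manager. A stdlib-only sketch with a stub engine (the real code builds the engine via create_sparql_engine and, in rdflib mode, also an RDFStore):

```python
from contextlib import asynccontextmanager


class StubEngine:
    """Stand-in for RdflibEngine/AnzoGraphEngine (sketch)."""
    def __init__(self):
        self.closed = False

    async def shutdown(self):
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # 1. select and start a SPARQL engine (create_sparql_engine(settings))
    engine = StubEngine()
    # 2. stash it on app.state so endpoints can reach it
    app.state.sparql = engine
    # 3. (rdflib mode only) app.state.store = RDFStore(...) would go here
    try:
        yield
    finally:
        # shutdown: close the HTTP client (AnzoGraph) or no-op (RDFLib)
        await engine.shutdown()
```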

Environment Variables

Most configuration is intended to be provided via container environment variables (see repo root .env and docker-compose.yml).

Core:

  • GRAPH_BACKEND: rdflib or anzograph
  • INCLUDE_BNODES: true/false
  • CORS_ORIGINS: comma-separated list or *

RDFLib mode:

  • TTL_PATH: path inside the backend container to a .ttl file (example: /data/o3po.ttl)
  • MAX_TRIPLES: optional int; if set, stops parsing after this many triples

Optional import-combining step (runs before the SPARQL engine starts):

  • COMBINE_OWL_IMPORTS_ON_START: true to recursively load TTL_PATH (or COMBINE_ENTRY_LOCATION) plus owl:imports and write a combined TTL file.
  • COMBINE_ENTRY_LOCATION: optional override for the entry file/URL to load (defaults to TTL_PATH)
  • COMBINE_OUTPUT_LOCATION: optional explicit output path (defaults to ${dirname(entry)}/${COMBINE_OUTPUT_NAME})
  • COMBINE_OUTPUT_NAME: output filename when COMBINE_OUTPUT_LOCATION is not set (default: combined_ontology.ttl)
  • COMBINE_FORCE: true to rebuild even if the output file already exists

AnzoGraph mode:

  • SPARQL_HOST: base host (example: http://anzograph:8080)
  • SPARQL_ENDPOINT: optional full endpoint; if set, overrides ${SPARQL_HOST}/sparql
  • SPARQL_USER, SPARQL_PASS: Basic auth credentials
  • SPARQL_DATA_FILE: file URI as seen by the AnzoGraph container (example: file:///opt/shared-files/o3po.ttl)
  • SPARQL_GRAPH_IRI: optional graph IRI for LOAD ... INTO GRAPH <...>
  • SPARQL_LOAD_ON_START: true to execute LOAD <SPARQL_DATA_FILE> during startup
  • SPARQL_CLEAR_ON_START: true to execute CLEAR ALL during startup (dangerous)
  • SPARQL_TIMEOUT_S: request timeout for normal SPARQL requests
  • SPARQL_READY_RETRIES, SPARQL_READY_DELAY_S, SPARQL_READY_TIMEOUT_S: readiness gate parameters
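
The real settings.py uses pydantic-settings; the env parsing it performs can be sketched with the stdlib for a few of the core variables (field names and defaults here are assumptions):

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: str = "false") -> bool:
    """Read a true/false-style env var ('true', '1', 'yes' count as True)."""
    return os.environ.get(name, default).strip().lower() in {"true", "1", "yes"}


@dataclass
class Settings:
    """Stdlib stand-in for the pydantic-settings model (sketch)."""
    graph_backend: str = field(
        default_factory=lambda: os.environ.get("GRAPH_BACKEND", "rdflib"))
    include_bnodes: bool = field(
        default_factory=lambda: _env_bool("INCLUDE_BNODES"))
    cors_origins: list = field(
        default_factory=lambda: [
            o.strip() for o in os.environ.get("CORS_ORIGINS", "*").split(",")])
```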

AnzoGraph Readiness Gate

AnzoGraphEngine does not assume that "container started" means "SPARQL works". Instead, it polls the endpoint with a smoke-test POST until it succeeds:

  • Method: POST ${SPARQL_ENDPOINT}
  • Headers:
    • Content-Type: application/x-www-form-urlencoded
    • Accept: application/sparql-results+json
    • Authorization: Basic ... (if configured)
  • Body: query=ASK WHERE { ?s ?p ?o }
  • Success condition: HTTP 2xx and response parses as JSON

This matches the behavior described in docs/anzograph-readiness-julia.md.
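
The retry loop around that smoke test can be sketched as follows (the probe callable stands in for the actual HTTP POST, which checks "2xx and body parses as JSON"; parameter names mirror the SPARQL_READY_* variables but are illustrative):

```python
import time
from typing import Callable

READY_QUERY = "ASK WHERE { ?s ?p ?o }"


def wait_until_ready(probe: Callable[[str], bool],
                     retries: int = 30,
                     delay_s: float = 1.0) -> bool:
    """Poll the smoke-test query until the endpoint answers (sketch).

    `probe` sends the POST described above and returns True on success;
    it is injected here so the loop itself is testable without a server.
    """
    for _ in range(retries):
        if probe(READY_QUERY):
            return True
        time.sleep(delay_s)
    return False
```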

API Endpoints

  • GET /api/health
    • Returns { "status": "ok" }.
  • GET /api/stats
    • Returns counts for the same snapshot used by /api/graph (via the snapshot cache).
  • POST /api/sparql
    • Body: { "query": "<SPARQL SELECT/ASK>" }
    • Returns SPARQL JSON results as-is.
    • Notes:
      • This endpoint is intended for SELECT/ASK returning SPARQL-JSON.
      • SPARQL UPDATE is not exposed here (AnzoGraph LOAD/CLEAR are handled internally during startup).
  • GET /api/graph?node_limit=...&edge_limit=...
    • Returns a graph snapshot as { nodes: [...], edges: [...] }.
    • Implemented as a SPARQL edge query + mapping in pipelines/graph_snapshot.py.
  • GET /api/nodes, GET /api/edges
    • Only available in GRAPH_BACKEND=rdflib (these use RDFStore's dense ID tables).
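
A client-side sketch of calling POST /api/sparql (the request is built but not sent here; the base URL is an assumption):

```python
import json
from urllib.request import Request


def build_sparql_request(base_url: str, query: str) -> Request:
    """Prepare the POST /api/sparql call with a JSON body (sketch)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/api/sparql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen (or any HTTP client) returns the SPARQL JSON results as-is.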

Data Contract

Node

Returned in nodes[] (dense IDs; suitable for indexing in typed arrays):

{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
  • id: integer dense node ID used in edges
  • termType: "uri" or "bnode"
  • iri: URI string; blank nodes are normalized to _:<id>
  • label: rdfs:label when available (best-effort; prefers English)
  • x/y: world-space coordinates for rendering (currently a radial layered layout derived from rdfs:subClassOf)

Edge

Returned in edges[]:

{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
  • source/target: dense node IDs (indexes into nodes[])
  • predicate: predicate IRI string
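
The actual models in models.py are Pydantic; the same shapes as plain dataclasses, for reference (a sketch of the contract, not the real classes):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: int                  # dense ID; index into nodes[]
    termType: str            # "uri" or "bnode"
    iri: str                 # blank nodes normalized to _:<id>
    label: Optional[str]     # rdfs:label when available
    x: float                 # world-space layout coordinates
    y: float


@dataclass
class Edge:
    source: int              # dense node IDs (indexes into nodes[])
    target: int
    predicate: str           # predicate IRI string
```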

Snapshot Query (/api/graph)

/api/graph currently uses a SPARQL query that returns only rdfs:subClassOf edges:

  • selects bindings as ?s ?p ?o (with ?p bound to rdfs:subClassOf)
  • excludes literal objects (FILTER(!isLiteral(?o))) for safety
  • optionally excludes blank nodes (unless INCLUDE_BNODES=true)
  • applies LIMIT edge_limit

The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
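
The first-seen dense-ID mapping can be sketched like this (function name is illustrative; x/y and labels, assigned later by the layout and label steps, are omitted):

```python
def bindings_to_snapshot(bindings: list) -> dict:
    """Map SPARQL-JSON ?s ?p ?o bindings to {nodes, edges} (sketch).

    Dense IDs are assigned in first-seen order across subjects and objects.
    """
    ids: dict[str, int] = {}
    nodes, edges = [], []

    def node_id(term: dict) -> int:
        # blank nodes are normalized to _:<label>; URIs kept as-is
        key = term["value"] if term["type"] == "uri" else f'_:{term["value"]}'
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append({"id": ids[key], "termType": term["type"], "iri": key})
        return ids[key]

    for b in bindings:
        edges.append({
            "source": node_id(b["s"]),
            "target": node_id(b["o"]),
            "predicate": b["p"]["value"],
        })
    return {"nodes": nodes, "edges": edges}
```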

/api/graph also returns meta with snapshot counts and engine info so the frontend doesn't need to call /api/stats.

If a cycle is detected in the returned rdfs:subClassOf snapshot, /api/graph returns HTTP 422 (layout requires a DAG).
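
The cycle check and layering behind this can be sketched roughly as follows (stdlib-only; function names, the edge direction, and the radius-per-layer choice are illustrative assumptions, not the exact code in pipelines/layout_dag_radial.py):

```python
import math
from collections import defaultdict


def kahn_layers(num_nodes: int, edges: list) -> list:
    """Level-synchronous Kahn layering; raises ValueError on a cycle.

    An edge (u, v) is read as "u points at v" (e.g. subclass -> superclass).
    """
    out = defaultdict(list)
    indegree = [0] * num_nodes
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1

    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    layers, seen = [], 0
    while frontier:
        layers.append(frontier)
        seen += len(frontier)
        nxt = []
        for u in frontier:
            for v in out[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    nxt.append(v)
        frontier = nxt
    if seen != num_nodes:
        # some nodes never reached indegree 0 -> cycle; /api/graph maps
        # this condition to HTTP 422
        raise ValueError("cycle detected: layout requires a DAG")
    return layers


def radial_positions(layers: list) -> dict:
    """Place layer k on a ring of radius k, nodes spread evenly (sketch)."""
    pos = {}
    for k, layer in enumerate(layers):
        for i, n in enumerate(layer):
            angle = 2 * math.pi * i / len(layer)
            pos[n] = (k * math.cos(angle), k * math.sin(angle))
    return pos
```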

Pipelines

pipelines/graph_snapshot.py

fetch_graph_snapshot(...) is the main "export graph" pipeline used by /api/graph.

pipelines/subclass_labels.py

extract_subclass_entities_and_labels(...):

  1. Queries all rdfs:subClassOf triples.
  2. Builds a unique set of subjects+objects, then converts it to a deterministic list.
  3. Queries rdfs:label for those entities and returns aligned lists:
    • entities[i] corresponds to labels[i].
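
Steps 2–3 can be sketched as below (sorting is one way to make the list deterministic; the real pipeline may order differently, and the function name is illustrative):

```python
def align_labels(subclass_pairs: list, label_map: dict) -> tuple:
    """Build a deterministic entity list plus an aligned labels list (sketch).

    subclass_pairs: (subject, object) IRI pairs from rdfs:subClassOf triples.
    label_map: IRI -> rdfs:label, as returned by the follow-up label query.
    Entities without a label get None, keeping entities[i] <-> labels[i].
    """
    entities = sorted({iri for pair in subclass_pairs for iri in pair})
    labels = [label_map.get(iri) for iri in entities]
    return entities, labels
```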

Notes / Tradeoffs

  • /api/graph returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
  • RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
  • rdf_store.py is currently only needed for /api/nodes, /api/edges, and rdflib-mode /api/stats. If you don't use those endpoints, it can be removed later.