
Backend App (backend/app)

This folder contains the FastAPI backend for visualizador_instanciados.

The backend can execute SPARQL queries in two interchangeable ways:

  1. GRAPH_BACKEND=rdflib: parse a Turtle file into an in-memory RDFLib Graph and run SPARQL queries locally.
  2. GRAPH_BACKEND=anzograph: run SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP (optionally LOAD a TTL on startup).

Callers (frontend or other clients) interact with a single API surface (/api/*) and do not need to know which backend is configured.
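
The "single API surface" works because both engines satisfy the same small interface. A minimal sketch of that contract (method names here are illustrative, not necessarily the exact ones in sparql_engine.py):

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparqlEngine(Protocol):
    """Interface both engines satisfy (illustrative sketch)."""

    async def query_json(self, sparql: str) -> dict[str, Any]:
        """Run a SELECT/ASK query and return SPARQL 1.1 JSON results."""
        ...

    async def shutdown(self) -> None:
        """Release resources (HTTP client, etc.); may be a no-op."""
        ...
```

Endpoints only depend on this shape, so swapping GRAPH_BACKEND never changes handler code.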

Files

  • main.py
    • FastAPI app setup, startup/shutdown (lifespan), and HTTP endpoints.
  • settings.py
    • Env-driven configuration (pydantic-settings).
  • sparql_engine.py
    • Backend-agnostic SPARQL execution layer:
      • RdflibEngine: Graph.query(...) + SPARQL JSON serialization.
      • AnzoGraphEngine: HTTP POST to /sparql with Basic auth + readiness gate.
    • create_sparql_engine(settings) chooses the engine based on GRAPH_BACKEND.
  • graph_export.py
    • Shared helpers to:
      • build the snapshot SPARQL query used for edge retrieval
      • map SPARQL JSON bindings to {nodes, edges}.
  • models.py
    • Pydantic response/request models:
      • Node, Edge, GraphResponse, StatsResponse, etc.
  • rdf_store.py
    • A local parsed representation (dense IDs plus adjacency data), built only when GRAPH_BACKEND=rdflib.
    • Used by /api/nodes, /api/edges, and rdflib-mode /api/stats.
  • pipelines/graph_snapshot.py
    • Pipeline used by /api/graph to return a {nodes, edges} snapshot via SPARQL (works for both RDFLib and AnzoGraph).
  • pipelines/layout_dag_radial.py
    • DAG layout helpers used by pipelines/graph_snapshot.py:
      • cycle detection
      • level-synchronous Kahn layering
      • radial (ring-per-layer) positioning.
  • pipelines/snapshot_service.py
    • Snapshot cache layer used by /api/graph and /api/stats so the backend doesn't run expensive SPARQL twice.
  • pipelines/subclass_labels.py
    • Pipeline to extract rdfs:subClassOf entities and aligned rdfs:label list.
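
The dispatch in create_sparql_engine can be pictured like this (a sketch with stand-in classes and a stand-in Settings; the real constructors take more parameters):

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Stand-in for the real pydantic-settings model."""
    graph_backend: str = "rdflib"
    ttl_path: str = "/data/example.ttl"
    sparql_endpoint: str = "http://anzograph:8080/sparql"


class RdflibEngine:
    def __init__(self, ttl_path: str):
        self.ttl_path = ttl_path


class AnzoGraphEngine:
    def __init__(self, endpoint: str):
        self.endpoint = endpoint


def create_sparql_engine(settings: Settings):
    """Choose the engine from GRAPH_BACKEND (sketch; real signatures differ)."""
    backend = settings.graph_backend.lower()
    if backend == "rdflib":
        return RdflibEngine(ttl_path=settings.ttl_path)
    if backend == "anzograph":
        return AnzoGraphEngine(endpoint=settings.sparql_endpoint)
    raise ValueError(f"unsupported GRAPH_BACKEND: {settings.graph_backend!r}")
```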

Runtime Flow

On startup (FastAPI lifespan):

  1. create_sparql_engine(settings) selects and starts a SPARQL engine.
  2. The engine is stored at app.state.sparql.
  3. If GRAPH_BACKEND=rdflib, RDFStore is also built from the already-loaded RDFLib graph and stored at app.state.store.

On shutdown:

  • app.state.sparql.shutdown() is called; it closes the HTTP client in AnzoGraph mode and is a no-op in RDFLib mode.
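
The startup/shutdown flow above maps onto a FastAPI lifespan context manager. A stdlib-only sketch with a stub engine (the real code builds the engine via create_sparql_engine and, in rdflib mode, also an RDFStore):

```python
from contextlib import asynccontextmanager


class StubEngine:
    """Stand-in for RdflibEngine/AnzoGraphEngine (sketch)."""
    def __init__(self):
        self.closed = False

    async def shutdown(self):
        self.closed = True


@asynccontextmanager
async def lifespan(app):
    # 1. select and start a SPARQL engine (create_sparql_engine(settings))
    engine = StubEngine()
    # 2. stash it on app.state so endpoints can reach it
    app.state.sparql = engine
    # 3. (rdflib mode only) app.state.store = RDFStore(...) would go here
    try:
        yield
    finally:
        # shutdown: close the HTTP client (AnzoGraph) or no-op (RDFLib)
        await engine.shutdown()
```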

Environment Variables

Most configuration is intended to be provided via container environment variables (see repo root .env and docker-compose.yml).

Core:

  • GRAPH_BACKEND: rdflib or anzograph
  • INCLUDE_BNODES: true/false
  • CORS_ORIGINS: comma-separated list or *

RDFLib mode:

  • TTL_PATH: path inside the backend container to a .ttl file (example: /data/o3po.ttl)
  • MAX_TRIPLES: optional int; if set, stops parsing after this many triples

Optional import-combining step (runs before the SPARQL engine starts):

  • COMBINE_OWL_IMPORTS_ON_START: true to recursively load TTL_PATH (or COMBINE_ENTRY_LOCATION) plus owl:imports and write a combined TTL file.
  • COMBINE_ENTRY_LOCATION: optional override for the entry file/URL to load (defaults to TTL_PATH)
  • COMBINE_OUTPUT_LOCATION: optional explicit output path (defaults to ${dirname(entry)}/${COMBINE_OUTPUT_NAME})
  • COMBINE_OUTPUT_NAME: output filename when COMBINE_OUTPUT_LOCATION is not set (default: combined_ontology.ttl)
  • COMBINE_FORCE: true to rebuild even if the output file already exists

AnzoGraph mode:

  • SPARQL_HOST: base host (example: http://anzograph:8080)
  • SPARQL_ENDPOINT: optional full endpoint; if set, overrides ${SPARQL_HOST}/sparql
  • SPARQL_USER, SPARQL_PASS: Basic auth credentials
  • SPARQL_DATA_FILE: file URI as seen by the AnzoGraph container (example: file:///opt/shared-files/o3po.ttl)
  • SPARQL_GRAPH_IRI: optional graph IRI for LOAD ... INTO GRAPH <...>
  • SPARQL_LOAD_ON_START: true to execute LOAD <SPARQL_DATA_FILE> during startup
  • SPARQL_CLEAR_ON_START: true to execute CLEAR ALL during startup (dangerous)
  • SPARQL_TIMEOUT_S: request timeout for normal SPARQL requests
  • SPARQL_READY_RETRIES, SPARQL_READY_DELAY_S, SPARQL_READY_TIMEOUT_S: readiness gate parameters
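
The real settings.py uses pydantic-settings; the env parsing it performs can be sketched with the stdlib for a few of the core variables (field names and defaults here are assumptions):

```python
import os
from dataclasses import dataclass, field


def _env_bool(name: str, default: str = "false") -> bool:
    """Read a true/false-style env var ('true', '1', 'yes' count as True)."""
    return os.environ.get(name, default).strip().lower() in {"true", "1", "yes"}


@dataclass
class Settings:
    """Stdlib stand-in for the pydantic-settings model (sketch)."""
    graph_backend: str = field(
        default_factory=lambda: os.environ.get("GRAPH_BACKEND", "rdflib"))
    include_bnodes: bool = field(
        default_factory=lambda: _env_bool("INCLUDE_BNODES"))
    cors_origins: list = field(
        default_factory=lambda: [
            o.strip() for o in os.environ.get("CORS_ORIGINS", "*").split(",")])
```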

AnzoGraph Readiness Gate

AnzoGraphEngine does not assume that "container started" means "SPARQL works". Instead, it polls the endpoint with a smoke-test POST until it succeeds:

  • Method: POST ${SPARQL_ENDPOINT}
  • Headers:
    • Content-Type: application/x-www-form-urlencoded
    • Accept: application/sparql-results+json
    • Authorization: Basic ... (if configured)
  • Body: query=ASK WHERE { ?s ?p ?o }
  • Success condition: HTTP 2xx and response parses as JSON

This matches the behavior described in docs/anzograph-readiness-julia.md.
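
The retry loop around that smoke test can be sketched as follows (the probe callable stands in for the actual HTTP POST, which checks "2xx and body parses as JSON"; parameter names mirror the SPARQL_READY_* variables but are illustrative):

```python
import time
from typing import Callable

READY_QUERY = "ASK WHERE { ?s ?p ?o }"


def wait_until_ready(probe: Callable[[str], bool],
                     retries: int = 30,
                     delay_s: float = 1.0) -> bool:
    """Poll the smoke-test query until the endpoint answers (sketch).

    `probe` sends the POST described above and returns True on success;
    it is injected here so the loop itself is testable without a server.
    """
    for _ in range(retries):
        if probe(READY_QUERY):
            return True
        time.sleep(delay_s)
    return False
```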

API Endpoints

  • GET /api/health
    • Returns { "status": "ok" }.
  • GET /api/stats
    • Returns counts for the same snapshot used by /api/graph (via the snapshot cache).
  • POST /api/sparql
    • Body: { "query": "<SPARQL SELECT/ASK>" }
    • Returns SPARQL JSON results as-is.
    • Notes:
      • This endpoint is intended for SELECT/ASK returning SPARQL-JSON.
      • SPARQL UPDATE is not exposed here (AnzoGraph LOAD/CLEAR are handled internally during startup).
  • GET /api/graph?node_limit=...&edge_limit=...
    • Returns a graph snapshot as { nodes: [...], edges: [...] }.
    • Implemented as a SPARQL edge query + mapping in pipelines/graph_snapshot.py.
  • GET /api/nodes, GET /api/edges
    • Only available in GRAPH_BACKEND=rdflib (these use RDFStore's dense ID tables).
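
A client-side sketch of calling POST /api/sparql (the request is built but not sent here; the base URL is an assumption):

```python
import json
from urllib.request import Request


def build_sparql_request(base_url: str, query: str) -> Request:
    """Prepare the POST /api/sparql call with a JSON body (sketch)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        f"{base_url}/api/sparql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with urllib.request.urlopen (or any HTTP client) returns the SPARQL JSON results as-is.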

Data Contract

Node

Returned in nodes[] (dense IDs; suitable for indexing in typed arrays):

{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
  • id: integer dense node ID used in edges
  • termType: "uri" or "bnode"
  • iri: URI string; blank nodes are normalized to _:<id>
  • label: rdfs:label when available (best-effort; prefers English)
  • x/y: world-space coordinates for rendering (currently a radial layered layout derived from rdfs:subClassOf)

Edge

Returned in edges[]:

{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
  • source/target: dense node IDs (indexes into nodes[])
  • predicate: predicate IRI string
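
The actual models in models.py are Pydantic; the same shapes as plain dataclasses, for reference (a sketch of the contract, not the real classes):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    id: int                  # dense ID; index into nodes[]
    termType: str            # "uri" or "bnode"
    iri: str                 # blank nodes normalized to _:<id>
    label: Optional[str]     # rdfs:label when available
    x: float                 # world-space layout coordinates
    y: float


@dataclass
class Edge:
    source: int              # dense node IDs (indexes into nodes[])
    target: int
    predicate: str           # predicate IRI string
```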

Snapshot Query (/api/graph)

/api/graph currently uses a SPARQL query that returns only rdfs:subClassOf edges:

  • selects bindings as ?s ?p ?o (with ?p bound to rdfs:subClassOf)
  • excludes literal objects (FILTER(!isLiteral(?o))) for safety
  • optionally excludes blank nodes (unless INCLUDE_BNODES=true)
  • applies LIMIT edge_limit

The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
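
The first-seen dense-ID mapping can be sketched like this (function name is illustrative; x/y and labels, assigned later by the layout and label steps, are omitted):

```python
def bindings_to_snapshot(bindings: list) -> dict:
    """Map SPARQL-JSON ?s ?p ?o bindings to {nodes, edges} (sketch).

    Dense IDs are assigned in first-seen order across subjects and objects.
    """
    ids: dict[str, int] = {}
    nodes, edges = [], []

    def node_id(term: dict) -> int:
        # blank nodes are normalized to _:<label>; URIs kept as-is
        key = term["value"] if term["type"] == "uri" else f'_:{term["value"]}'
        if key not in ids:
            ids[key] = len(nodes)
            nodes.append({"id": ids[key], "termType": term["type"], "iri": key})
        return ids[key]

    for b in bindings:
        edges.append({
            "source": node_id(b["s"]),
            "target": node_id(b["o"]),
            "predicate": b["p"]["value"],
        })
    return {"nodes": nodes, "edges": edges}
```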

/api/graph also returns meta with snapshot counts and engine info so the frontend doesn't need to call /api/stats.

If a cycle is detected in the returned rdfs:subClassOf snapshot, /api/graph returns HTTP 422 (layout requires a DAG).
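
The cycle check and layering behind this can be sketched roughly as follows (stdlib-only; function names, the edge direction, and the radius-per-layer choice are illustrative assumptions, not the exact code in pipelines/layout_dag_radial.py):

```python
import math
from collections import defaultdict


def kahn_layers(num_nodes: int, edges: list) -> list:
    """Level-synchronous Kahn layering; raises ValueError on a cycle.

    An edge (u, v) is read as "u points at v" (e.g. subclass -> superclass).
    """
    out = defaultdict(list)
    indegree = [0] * num_nodes
    for u, v in edges:
        out[u].append(v)
        indegree[v] += 1

    frontier = [n for n in range(num_nodes) if indegree[n] == 0]
    layers, seen = [], 0
    while frontier:
        layers.append(frontier)
        seen += len(frontier)
        nxt = []
        for u in frontier:
            for v in out[u]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    nxt.append(v)
        frontier = nxt
    if seen != num_nodes:
        # some nodes never reached indegree 0 -> cycle; /api/graph maps
        # this condition to HTTP 422
        raise ValueError("cycle detected: layout requires a DAG")
    return layers


def radial_positions(layers: list) -> dict:
    """Place layer k on a ring of radius k, nodes spread evenly (sketch)."""
    pos = {}
    for k, layer in enumerate(layers):
        for i, n in enumerate(layer):
            angle = 2 * math.pi * i / len(layer)
            pos[n] = (k * math.cos(angle), k * math.sin(angle))
    return pos
```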

Pipelines

pipelines/graph_snapshot.py

fetch_graph_snapshot(...) is the main "export graph" pipeline used by /api/graph.

pipelines/subclass_labels.py

extract_subclass_entities_and_labels(...):

  1. Queries all rdfs:subClassOf triples.
  2. Builds a unique set of subjects+objects, then converts it to a deterministic list.
  3. Queries rdfs:label for those entities and returns aligned lists:
    • entities[i] corresponds to labels[i].
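
Steps 2–3 can be sketched as below (sorting is one way to make the list deterministic; the real pipeline may order differently, and the function name is illustrative):

```python
def align_labels(subclass_pairs: list, label_map: dict) -> tuple:
    """Build a deterministic entity list plus an aligned labels list (sketch).

    subclass_pairs: (subject, object) IRI pairs from rdfs:subClassOf triples.
    label_map: IRI -> rdfs:label, as returned by the follow-up label query.
    Entities without a label get None, keeping entities[i] <-> labels[i].
    """
    entities = sorted({iri for pair in subclass_pairs for iri in pair})
    labels = [label_map.get(iri) for iri in entities]
    return entities, labels
```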

Notes / Tradeoffs

  • /api/graph returns only nodes that appear in the returned edge result set. Nodes not referenced by those edges will not be present.
  • RDFLib and AnzoGraph may differ in supported SPARQL features (vendor extensions, inference, performance), but the API surface is the same.
  • rdf_store.py is currently only needed for /api/nodes, /api/edges, and rdflib-mode /api/stats. If you don't use those endpoints, it can be removed later.