# Backend App (`backend/app`)
This folder contains the FastAPI backend for `visualizador_instanciados`. The backend executes SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP, and can optionally `LOAD` a TTL file on startup.
## Files
- `main.py` - FastAPI app setup, startup/shutdown (`lifespan`), and HTTP endpoints.
- `settings.py` - Env-driven configuration (`pydantic-settings`).
- `sparql_engine.py` - SPARQL execution layer:
  - `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
  - `create_sparql_engine(settings)` creates the engine.
- `graph_export.py` - Shared helpers to:
  - build the snapshot SPARQL query used for edge retrieval
  - map SPARQL JSON bindings to `{nodes, edges}`.
- `models.py` - Pydantic response/request models: `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
- `pipelines/graph_snapshot.py` - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL.
- `pipelines/layout_dag_radial.py` - DAG layout helpers used by `pipelines/graph_snapshot.py`:
  - cycle detection
  - level-synchronous Kahn layering
  - radial (ring-per-layer) positioning.
- `pipelines/snapshot_service.py` - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run expensive SPARQL twice.
- `pipelines/subclass_labels.py` - Pipeline to extract `rdfs:subClassOf` entities and an aligned `rdfs:label` list.
## Runtime Flow

On startup (FastAPI lifespan):

- `create_sparql_engine(settings)` selects and starts a SPARQL engine.
- The engine is stored at `app.state.sparql`.

On shutdown:

- `app.state.sparql.shutdown()` is called to close the HTTP client.
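A minimal sketch of this lifespan wiring. Only `create_sparql_engine(settings)`, `app.state.sparql`, and `shutdown()` come from the text above; the stub engine and the `demo` driver are illustrative assumptions:

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

class StubEngine:
    """Stand-in for the real AnzoGraphEngine (illustrative only)."""
    def __init__(self):
        self.closed = False
    async def shutdown(self):
        self.closed = True  # the real engine closes its HTTP client here

def create_sparql_engine(settings):
    # The real factory selects an engine based on settings; stubbed here.
    return StubEngine()

@asynccontextmanager
async def lifespan(app):
    # Startup: create the engine and store it at app.state.sparql.
    app.state.sparql = create_sparql_engine(settings={})
    yield
    # Shutdown: close the engine.
    await app.state.sparql.shutdown()

async def demo():
    # FastAPI drives lifespan(app) itself; we simulate that lifecycle.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        assert not app.state.sparql.closed  # engine live while app runs
    return app.state.sparql.closed  # True once shutdown has run
```

In the real app this would be passed as `FastAPI(lifespan=lifespan)`.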
## Environment Variables
Most configuration is intended to be provided via container environment variables (see the repo root `.env` and `docker-compose.yml`).
Core:

- `INCLUDE_BNODES`: `true`/`false`
- `CORS_ORIGINS`: comma-separated list or `*`
Optional import-combining step (separate container):

The repo's `owl_imports_combiner` Docker service can be used to recursively load a Turtle file (or URL) plus its `owl:imports` into a single combined TTL output.

- `COMBINE_OWL_IMPORTS_ON_START`: `true` to run the combiner container on startup (no-op when `false`)
- `COMBINE_ENTRY_LOCATION`: entry file/URL to load (falls back to `TTL_PATH` if not set)
- `COMBINE_OUTPUT_LOCATION`: output path for the combined TTL (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists
AnzoGraph mode:

- `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
- `SPARQL_ENDPOINT`: optional full endpoint; if set, overrides `${SPARQL_HOST}/sparql`
- `SPARQL_USER`, `SPARQL_PASS`: Basic auth credentials
- `SPARQL_DATA_FILE`: file URI as seen by the AnzoGraph container (example: `file:///opt/shared-files/o3po.ttl`)
- `SPARQL_GRAPH_IRI`: optional graph IRI for `LOAD ... INTO GRAPH <...>`
- `SPARQL_LOAD_ON_START`: `true` to execute `LOAD <SPARQL_DATA_FILE>` during startup
- `SPARQL_CLEAR_ON_START`: `true` to execute `CLEAR ALL` during startup (dangerous)
- `SPARQL_TIMEOUT_S`: request timeout for normal SPARQL requests
- `SPARQL_READY_RETRIES`, `SPARQL_READY_DELAY_S`, `SPARQL_READY_TIMEOUT_S`: readiness gate parameters
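The endpoint-precedence rule above (`SPARQL_ENDPOINT` overrides `${SPARQL_HOST}/sparql`) can be sketched in plain Python; the real code uses `pydantic-settings`, so the function below and its default host are assumptions for illustration:

```python
import os

def resolve_sparql_endpoint(env=os.environ):
    """Apply the documented precedence: SPARQL_ENDPOINT, if set,
    overrides ${SPARQL_HOST}/sparql."""
    endpoint = env.get("SPARQL_ENDPOINT")
    if endpoint:
        return endpoint
    # Default host taken from the example above; the real default may differ.
    host = env.get("SPARQL_HOST", "http://anzograph:8080").rstrip("/")
    return f"{host}/sparql"
```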
## AnzoGraph Readiness Gate

`AnzoGraphEngine` does not assume "container started" means "SPARQL works". It waits for a smoke-test POST to succeed:

- Method: `POST ${SPARQL_ENDPOINT}`
- Headers:
  - `Content-Type: application/x-www-form-urlencoded`
  - `Accept: application/sparql-results+json`
  - `Authorization: Basic ...` (if configured)
- Body: `query=ASK WHERE { ?s ?p ?o }`
- Success condition: HTTP 2xx and the response parses as JSON

This matches the behavior described in `docs/anzograph-readiness-julia.md`.
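A standalone sketch of this readiness gate. The retry loop mirrors the `SPARQL_READY_RETRIES`/`SPARQL_READY_DELAY_S` parameters; the function names and the use of `httpx` are assumptions, not the engine's actual API:

```python
import time
from typing import Callable, Optional

def wait_until_ready(probe: Callable[[], bool],
                     retries: int = 30,
                     delay_s: float = 2.0) -> bool:
    """Poll `probe` until it returns True or retries are exhausted.
    Exceptions (connection refused, bad JSON, ...) count as "not ready"."""
    for _ in range(retries):
        try:
            if probe():
                return True
        except Exception:
            pass  # endpoint not up yet; retry after the delay
        time.sleep(delay_s)
    return False

def make_anzograph_probe(endpoint: str, auth: Optional[tuple] = None,
                         timeout_s: float = 5.0) -> Callable[[], bool]:
    """Build the smoke-test POST described above (assumes httpx is available)."""
    def probe() -> bool:
        import httpx
        resp = httpx.post(
            endpoint,
            data={"query": "ASK WHERE { ?s ?p ?o }"},  # form-encoded body
            headers={"Accept": "application/sparql-results+json"},
            auth=auth,
            timeout=timeout_s,
        )
        resp.raise_for_status()               # require HTTP 2xx
        return isinstance(resp.json(), dict)  # must parse as JSON
    return probe
```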
## API Endpoints
- `GET /api/health` - Returns `{ "status": "ok" }`.
- `GET /api/stats` - Returns counts for the same snapshot used by `/api/graph` (via the snapshot cache).
- `POST /api/sparql`
  - Body: `{ "query": "<SPARQL SELECT/ASK>" }`
  - Returns SPARQL JSON results as-is.
  - Notes:
    - This endpoint is intended for SELECT/ASK queries returning SPARQL-JSON.
    - SPARQL UPDATE is not exposed here (AnzoGraph `LOAD`/`CLEAR` are handled internally during startup).
- `GET /api/graph?node_limit=...&edge_limit=...`
  - Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
  - Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.
## Data Contract

### Node

Returned in `nodes[]` (dense IDs; suitable for indexing into typed arrays):
```json
{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
```
- `id`: integer dense node ID used in edges
- `termType`: `"uri"` or `"bnode"`
- `iri`: URI string; blank nodes are normalized to `_:<id>`
- `label`: `rdfs:label` when available (best-effort; prefers English)
- `x`/`y`: world-space coordinates for rendering (currently a radial layered layout derived from `rdfs:subClassOf`)
### Edge

Returned in `edges[]`:
```json
{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
```
- `source`/`target`: dense node IDs (indexes into `nodes[]`)
- `predicate`: predicate IRI string
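Because `source`/`target` are indexes into `nodes[]`, a client can resolve edges back to IRIs with plain list indexing. A small sketch of consuming this contract (the `edge_triples` helper and the example payload are illustrative, not part of the API):

```python
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def edge_triples(snapshot):
    """Resolve each edge's dense source/target IDs back to node IRIs,
    yielding (subject_iri, predicate_iri, object_iri) triples."""
    nodes = snapshot["nodes"]
    for edge in snapshot["edges"]:
        yield (nodes[edge["source"]]["iri"],
               edge["predicate"],
               nodes[edge["target"]]["iri"])

# Minimal example payload (fields trimmed to those used here):
example = {
    "nodes": [{"id": 0, "iri": "http://example.org/Dog"},
              {"id": 1, "iri": "http://example.org/Animal"}],
    "edges": [{"source": 0, "target": 1, "predicate": SUBCLASS}],
}
```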
## Snapshot Query (`/api/graph`)
`/api/graph` currently uses a SPARQL query that returns only `rdfs:subClassOf` edges. The query:

- selects bindings as `?s ?p ?o` (with `?p` bound to `rdfs:subClassOf`)
- excludes literal objects (`FILTER(!isLiteral(?o))`) for safety
- optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
- applies `LIMIT edge_limit`
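A query built from those four rules might look like the sketch below; the real builder lives in `graph_export.py` and its exact text may differ:

```python
def snapshot_query(edge_limit: int, include_bnodes: bool = False) -> str:
    """Sketch of the subclass-edge query described above (illustrative)."""
    # The blank-node filter is dropped when INCLUDE_BNODES=true.
    bnode_filter = "" if include_bnodes else "FILTER(!isBlank(?s) && !isBlank(?o))"
    return f"""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?s ?p ?o WHERE {{
          ?s rdfs:subClassOf ?o .
          BIND(rdfs:subClassOf AS ?p)
          FILTER(!isLiteral(?o))
          {bnode_filter}
        }}
        LIMIT {edge_limit}
    """
```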
The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
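That first-seen dense-ID mapping can be sketched as follows; the real mapper in `graph_export.py` also attaches labels and layout coordinates, which are omitted here:

```python
def bindings_to_graph(bindings):
    """Map SPARQL JSON bindings (?s ?p ?o) to {nodes, edges}, assigning
    dense integer node IDs in first-seen order."""
    ids, nodes, edges = {}, [], []

    def dense_id(term):
        # Blank nodes are normalized to "_:<id>" per the data contract.
        iri = f"_:{term['value']}" if term["type"] == "bnode" else term["value"]
        if iri not in ids:
            ids[iri] = len(nodes)
            nodes.append({"id": ids[iri], "termType": term["type"], "iri": iri})
        return ids[iri]

    for b in bindings:
        edges.append({"source": dense_id(b["s"]),
                      "target": dense_id(b["o"]),
                      "predicate": b["p"]["value"]})
    return {"nodes": nodes, "edges": edges}

SUBCLASS_P = {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#subClassOf"}
example_bindings = [
    {"s": {"type": "uri", "value": "http://ex/A"}, "p": SUBCLASS_P,
     "o": {"type": "uri", "value": "http://ex/B"}},
    {"s": {"type": "uri", "value": "http://ex/B"}, "p": SUBCLASS_P,
     "o": {"type": "uri", "value": "http://ex/C"}},
]
graph = bindings_to_graph(example_bindings)
```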
`/api/graph` also returns `meta` with snapshot counts and engine info, so the frontend doesn't need to call `/api/stats`.

If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph` returns HTTP 422 (the layout requires a DAG).
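Since the layout already does Kahn layering, the cycle check falls out of the same algorithm: if topological processing cannot consume every node, the leftovers lie on a cycle. A standalone sketch (the function name and signature are assumptions, not the API of `pipelines/layout_dag_radial.py`):

```python
from collections import defaultdict, deque

def has_cycle(num_nodes, edges):
    """Kahn's algorithm as a cycle detector: repeatedly remove nodes with
    in-degree 0; if some nodes are never removed, the graph has a cycle."""
    out, indeg = defaultdict(list), [0] * num_nodes
    for src, dst in edges:
        out[src].append(dst)
        indeg[dst] += 1
    queue = deque(i for i in range(num_nodes) if indeg[i] == 0)
    seen = 0
    while queue:
        n = queue.popleft()
        seen += 1
        for m in out[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return seen < num_nodes  # True -> cycle -> the endpoint answers 422
```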
## Pipelines

### `pipelines/graph_snapshot.py`

`fetch_graph_snapshot(...)` is the main "export graph" pipeline used by `/api/graph`.

### `pipelines/subclass_labels.py`

`extract_subclass_entities_and_labels(...)`:

- Queries all `rdfs:subClassOf` triples.
- Builds a unique set of subjects+objects, then converts it to a deterministic list.
- Queries `rdfs:label` for those entities and returns aligned lists: `entities[i]` corresponds to `labels[i]`.
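The alignment step can be sketched like this. Sorting is one way to make the entity list deterministic; the real pipeline's ordering strategy, and these helper names, are assumptions:

```python
def unique_entities(subclass_pairs):
    """Deterministic list of all subjects+objects from (sub, super) pairs;
    sorted order is assumed here for determinism."""
    return sorted({entity for pair in subclass_pairs for entity in pair})

def align_labels(entities, label_rows):
    """Given (entity, label) rows, return labels so that labels[i]
    corresponds to entities[i]; unlabeled entities get None."""
    by_entity = dict(label_rows)
    return [by_entity.get(entity) for entity in entities]
```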
## Notes / Tradeoffs

- `/api/graph` returns only nodes that appear in the returned edge result set; nodes not referenced by those edges will not be present.
- AnzoGraph SPARQL feature support (inference, extensions, performance) is vendor-specific.