# Backend App (`backend/app`)

This folder contains the FastAPI backend for `visualizador_instanciados`.

The backend executes SPARQL queries against an AnzoGraph SPARQL endpoint over HTTP, and can optionally `LOAD` a TTL file on startup.

## Files

- `main.py`
  - FastAPI app setup, startup/shutdown (`lifespan`), and HTTP endpoints.
- `settings.py`
  - Env-driven configuration (`pydantic-settings`).
- `sparql_engine.py`
  - SPARQL execution layer:
    - `AnzoGraphEngine`: HTTP POST to `/sparql` with Basic auth + readiness gate.
    - `create_sparql_engine(settings)` creates the engine.
- `graph_export.py`
  - Shared helpers to:
    - build the snapshot SPARQL query used for edge retrieval
    - map SPARQL JSON bindings to `{nodes, edges}`.
- `models.py`
  - Pydantic request/response models:
    - `Node`, `Edge`, `GraphResponse`, `StatsResponse`, etc.
- `pipelines/graph_snapshot.py`
  - Pipeline used by `/api/graph` to return a `{nodes, edges}` snapshot via SPARQL.
- `pipelines/layout_dag_radial.py`
  - DAG layout helpers used by `pipelines/graph_snapshot.py`:
    - cycle detection
    - level-synchronous Kahn layering
    - radial (ring-per-layer) positioning.
- `pipelines/snapshot_service.py`
  - Snapshot cache layer used by `/api/graph` and `/api/stats` so the backend doesn't run the expensive SPARQL query twice.
- `pipelines/subclass_labels.py`
  - Pipeline to extract `rdfs:subClassOf` entities and an aligned `rdfs:label` list.

## Runtime Flow

On startup (FastAPI lifespan):

1. `create_sparql_engine(settings)` selects and starts a SPARQL engine.
2. The engine is stored at `app.state.sparql`.

On shutdown:

- `app.state.sparql.shutdown()` is called to close the HTTP client.

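
This flow can be sketched without FastAPI; `FakeEngine` and `AppState` below are illustrative stand-ins for `AnzoGraphEngine` and FastAPI's `app.state`:

```python
# Stdlib-only sketch of the startup/shutdown contract (no FastAPI dependency).
import asyncio
from contextlib import asynccontextmanager

class FakeEngine:
    """Stands in for the engine returned by create_sparql_engine(settings)."""
    def __init__(self):
        self.closed = False

    async def shutdown(self):
        self.closed = True

class AppState:
    """Stands in for app.state."""
    pass

@asynccontextmanager
async def lifespan(state):
    # Startup: create the engine and store it where handlers can reach it.
    state.sparql = FakeEngine()
    try:
        yield
    finally:
        # Shutdown: always close the HTTP client, even on errors.
        await state.sparql.shutdown()

async def run_app():
    state = AppState()
    async with lifespan(state):
        assert state.sparql.closed is False  # engine is live while the app runs
    return state.sparql.closed

engine_closed_after_shutdown = asyncio.run(run_app())
```

The real `lifespan` is passed to `FastAPI(lifespan=...)`; the try/finally shape is what guarantees the HTTP client is closed on shutdown.
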
## Environment Variables

Most configuration is intended to be provided via container environment variables (see the repo root `.env` and `docker-compose.yml`).

Core:

- `INCLUDE_BNODES`: `true`/`false`
- `CORS_ORIGINS`: comma-separated list or `*`
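
For example, `CORS_ORIGINS` parsing might look like this (a sketch; the actual parsing in `settings.py` may differ):

```python
# Hedged sketch: split a comma-separated CORS_ORIGINS value; '*' means all origins.
def parse_cors_origins(raw: str) -> list:
    raw = raw.strip()
    if raw == "*":
        return ["*"]
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```
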

Optional import-combining step (separate container):

The repo's `owl_imports_combiner` Docker service can be used to recursively load a Turtle file (or URL) plus its `owl:imports` into a single combined TTL output.

- `COMBINE_OWL_IMPORTS_ON_START`: `true` to run the combiner container on startup (no-op when `false`)
- `COMBINE_ENTRY_LOCATION`: entry file/URL to load (falls back to `TTL_PATH` if not set)
- `COMBINE_OUTPUT_LOCATION`: output path for the combined TTL (defaults to `${dirname(entry)}/${COMBINE_OUTPUT_NAME}`)
- `COMBINE_OUTPUT_NAME`: output filename when `COMBINE_OUTPUT_LOCATION` is not set (default: `combined_ontology.ttl`)
- `COMBINE_FORCE`: `true` to rebuild even if the output file already exists

AnzoGraph mode:

- `SPARQL_HOST`: base host (example: `http://anzograph:8080`)
- `SPARQL_ENDPOINT`: optional full endpoint; if set, overrides `${SPARQL_HOST}/sparql`
- `SPARQL_USER`, `SPARQL_PASS`: Basic auth credentials
- `SPARQL_DATA_FILE`: file URI as seen by the **AnzoGraph container** (example: `file:///opt/shared-files/o3po.ttl`)
- `SPARQL_GRAPH_IRI`: optional graph IRI for `LOAD ... INTO GRAPH <...>`
- `SPARQL_LOAD_ON_START`: `true` to execute `LOAD <SPARQL_DATA_FILE>` during startup
- `SPARQL_CLEAR_ON_START`: `true` to execute `CLEAR ALL` during startup (dangerous)
- `SPARQL_TIMEOUT_S`: request timeout for normal SPARQL requests
- `SPARQL_READY_RETRIES`, `SPARQL_READY_DELAY_S`, `SPARQL_READY_TIMEOUT_S`: readiness gate parameters

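Two behaviors above can be sketched as follows; the function names are hypothetical (the real logic lives in `settings.py` / `sparql_engine.py`):

```python
# Sketch of the SPARQL_ENDPOINT override and the startup LOAD statement.
def resolve_endpoint(env):
    """SPARQL_ENDPOINT, when set, overrides ${SPARQL_HOST}/sparql."""
    explicit = env.get("SPARQL_ENDPOINT")
    if explicit:
        return explicit
    host = env.get("SPARQL_HOST", "http://anzograph:8080")
    return host.rstrip("/") + "/sparql"

def build_load_statement(data_file, graph_iri=None):
    """LOAD <SPARQL_DATA_FILE>, optionally INTO GRAPH <SPARQL_GRAPH_IRI>."""
    stmt = f"LOAD <{data_file}>"
    if graph_iri:
        stmt += f" INTO GRAPH <{graph_iri}>"
    return stmt
```
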
## AnzoGraph Readiness Gate

`AnzoGraphEngine` does not assume "container started" means "SPARQL works".
It waits for a smoke-test POST:

- Method: `POST ${SPARQL_ENDPOINT}`
- Headers:
  - `Content-Type: application/x-www-form-urlencoded`
  - `Accept: application/sparql-results+json`
  - `Authorization: Basic ...` (if configured)
- Body: `query=ASK WHERE { ?s ?p ?o }`
- Success condition: HTTP 2xx and response parses as JSON

This matches the behavior described in `docs/anzograph-readiness-julia.md`.

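The retry loop can be sketched as follows; `probe` is injected here for illustration, while the real engine POSTs the body above to the endpoint over HTTP:

```python
# Sketch of the readiness gate: retry the smoke-test ASK until the endpoint
# answers with a 2xx status and a JSON-parseable body.
import json
import time

ASK_BODY = "query=ASK WHERE { ?s ?p ?o }"

def wait_until_ready(probe, retries=5, delay_s=0.0):
    """Return True once probe(body) yields (2xx status, JSON-parseable text)."""
    for _ in range(retries):
        try:
            status, text = probe(ASK_BODY)
            if 200 <= status < 300:
                json.loads(text)  # must parse as SPARQL-JSON
                return True
        except Exception:
            pass  # connection refused / bad JSON: not ready yet
        time.sleep(delay_s)
    return False
```
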
## API Endpoints

- `GET /api/health`
  - Returns `{ "status": "ok" }`.
- `GET /api/stats`
  - Returns counts for the same snapshot used by `/api/graph` (via the snapshot cache).
- `POST /api/sparql`
  - Body: `{ "query": "<SPARQL SELECT/ASK>" }`
  - Returns SPARQL JSON results as-is.
  - Notes:
    - This endpoint is intended for **SELECT/ASK queries returning SPARQL-JSON**.
    - SPARQL UPDATE is not exposed here (AnzoGraph `LOAD`/`CLEAR` are handled internally during startup).
- `GET /api/graph?node_limit=...&edge_limit=...`
  - Returns a graph snapshot as `{ nodes: [...], edges: [...] }`.
  - Implemented as a SPARQL edge query + mapping in `pipelines/graph_snapshot.py`.

## Data Contract

### Node

Returned in `nodes[]` (dense IDs; suitable for indexing into typed arrays):

```json
{
  "id": 0,
  "termType": "uri",
  "iri": "http://example.org/Thing",
  "label": null,
  "x": 0.0,
  "y": 0.0
}
```

- `id`: integer dense node ID used in edges
- `termType`: `"uri"` or `"bnode"`
- `iri`: URI string; blank nodes are normalized to `_:<id>`
- `label`: `rdfs:label` when available (best-effort; prefers English)
- `x`/`y`: world-space coordinates for rendering (currently a radial layered layout derived from `rdfs:subClassOf`)

### Edge

Returned in `edges[]`:

```json
{
  "source": 0,
  "target": 12,
  "predicate": "http://www.w3.org/2000/01/rdf-schema#subClassOf"
}
```

- `source`/`target`: dense node IDs (indexes into `nodes[]`)
- `predicate`: predicate IRI string

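For reference, the contract can be expressed with stdlib dataclasses (the real `models.py` uses Pydantic, so validation behavior differs):

```python
# Stdlib dataclass sketch of the Node/Edge response contract.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    id: int               # dense ID; index into nodes[]
    termType: str         # "uri" or "bnode"
    iri: str              # blank nodes normalized to "_:<id>"
    label: Optional[str] = None
    x: float = 0.0
    y: float = 0.0

@dataclass
class Edge:
    source: int           # dense node ID
    target: int           # dense node ID
    predicate: str        # predicate IRI
```
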
## Snapshot Query (`/api/graph`)

`/api/graph` currently uses a SPARQL query that returns only `rdfs:subClassOf` edges:

- selects bindings as `?s ?p ?o` (with `?p` bound to `rdfs:subClassOf`)
- excludes literal objects (`FILTER(!isLiteral(?o))`) for safety
- optionally excludes blank nodes (unless `INCLUDE_BNODES=true`)
- applies `LIMIT edge_limit`

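A hedged reconstruction of that query's shape (the exact text produced by `graph_export.py` may differ):

```python
# Sketch of a subClassOf-only snapshot query with the filters listed above.
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def build_snapshot_query(edge_limit, include_bnodes=False):
    filters = ["FILTER(!isLiteral(?o))"]                   # no literal objects
    if not include_bnodes:                                 # INCLUDE_BNODES=false
        filters.append("FILTER(!isBlank(?s) && !isBlank(?o))")
    return (
        "SELECT ?s ?p ?o WHERE { "
        f"?s <{RDFS_SUBCLASSOF}> ?o . "
        f"BIND(<{RDFS_SUBCLASSOF}> AS ?p) "
        + " ".join(filters)
        + f" }} LIMIT {edge_limit}"
    )
```
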
The result bindings are mapped to dense node IDs (first-seen order) and returned to the caller.
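
A minimal sketch of that mapping, assuming standard SPARQL-JSON term objects with `type` and `value` fields:

```python
# Map SPARQL-JSON bindings (?s ?p ?o) to dense, first-seen node IDs.
def bindings_to_graph(bindings):
    ids, nodes, edges = {}, [], []

    def node_id(term):
        # Blank nodes are normalized to "_:<id>"; URIs keep their IRI.
        key = term["value"] if term["type"] == "uri" else "_:" + term["value"]
        if key not in ids:
            ids[key] = len(nodes)  # dense ID = first-seen order
            nodes.append({"id": ids[key], "termType": term["type"], "iri": key})
        return ids[key]

    for b in bindings:
        edges.append({
            "source": node_id(b["s"]),
            "target": node_id(b["o"]),
            "predicate": b["p"]["value"],
        })
    return {"nodes": nodes, "edges": edges}
```
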

`/api/graph` also returns `meta` with snapshot counts and engine info, so the frontend doesn't need to call `/api/stats`.

If a cycle is detected in the returned `rdfs:subClassOf` snapshot, `/api/graph` returns HTTP 422 (the layout requires a DAG).

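The layering and cycle check can be sketched like this; the edge direction (child `rdfs:subClassOf` parent) and the function name are assumptions, not the repo's exact code:

```python
# Kahn-style layering over dense node IDs with built-in cycle detection.
from collections import deque

def kahn_layers(n, edges):
    """Return a per-node layer list, or None when the graph has a cycle."""
    indeg = [0] * n
    children = [[] for _ in range(n)]
    for child, parent in edges:          # child rdfs:subClassOf parent
        children[parent].append(child)
        indeg[child] += 1
    layer = [0] * n
    queue = deque(i for i in range(n) if indeg[i] == 0)  # roots start at layer 0
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in children[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return layer if visited == n else None  # None → caller answers HTTP 422
```

Each layer then maps to a ring radius in the radial layout, with nodes spread around the ring.
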
## Pipelines

### `pipelines/graph_snapshot.py`

`fetch_graph_snapshot(...)` is the main "export graph" pipeline used by `/api/graph`.

### `pipelines/subclass_labels.py`

`extract_subclass_entities_and_labels(...)`:

1. Queries all `rdfs:subClassOf` triples.
2. Builds a unique set of subjects+objects, then converts it to a deterministic list.
3. Queries `rdfs:label` for those entities and returns aligned lists:
   - `entities[i]` corresponds to `labels[i]`.

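Steps 2–3 can be sketched like this (`label_map` stands in for the result of the `rdfs:label` query; the helper names are illustrative):

```python
# Deterministic entity list plus labels aligned index-for-index.
def unique_entities(subclass_triples):
    """Unique subjects+objects of subClassOf triples, sorted for determinism."""
    return sorted({term for s, o in subclass_triples for term in (s, o)})

def align_labels(entities, label_map):
    """labels[i] corresponds to entities[i]; missing labels become None."""
    return [label_map.get(e) for e in entities]
```
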
## Notes / Tradeoffs

- `/api/graph` returns only nodes that appear in the returned edge result set; nodes not referenced by those edges will not be present.
- AnzoGraph SPARQL feature support (inference, extensions, performance) is vendor-specific.