# Waiting for AnzoGraph readiness from Julia (how this repo does it)

This repo runs a Julia pipeline (`julia/main.jl`) against an AnzoGraph SPARQL endpoint. The key problem is that **“container started” ≠ “SPARQL endpoint is ready to accept queries”**.

So, before the Julia code does anything that depends on SPARQL (like `LOAD <...>` or large `SELECT`s), it explicitly **waits until AnzoGraph is actually responding to a real SPARQL POST request with valid JSON results**.

This document explains the exact mechanism used here, why it works, and gives copy/paste-ready patterns you can transfer to another project.

---

## 1) Where the waiting happens (pipeline control flow)

In `julia/main.jl`, the entrypoint calls:

```julia
# Step 1: Wait for AnzoGraph
wait_for_anzograph()

# Step 2: Load TTL file
result = sparql_update("LOAD <$SPARQL_DATA_FILE>")
```

So the “await” is not a Julia `Task`/`async` wait; it is a **blocking retry loop** that only returns when it can successfully execute a small SPARQL query.

Reference: `julia/main.jl` defines `wait_for_anzograph()` and calls it from `main()`.

---

## 2) Why this is needed even with Docker Compose `depends_on`

This repo’s `docker-compose.yml` includes an AnzoGraph `healthcheck`:

```yaml
anzograph:
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:8080/sparql || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 30
    start_period: 60s
```

However, `julia-layout` currently depends on `anzograph` with:

```yaml
depends_on:
  anzograph:
    condition: service_started
```

Meaning:

- Compose will ensure the **container process has started**.
- Compose does **not** guarantee the AnzoGraph HTTP/SPARQL endpoint is ready (unless you use `service_healthy`, and even then a “healthy GET” is not always equivalent to “SPARQL POST works with auth + JSON”).

So the Julia code includes its own readiness gate to prevent failures like:

- TCP connection refused (port not open yet)
- HTTP endpoint reachable but not fully initialized
- Non-JSON/HTML error responses while the service is still booting

---

## 3) What “ready” means in this repo

In this repo, “AnzoGraph is ready” means:

1. An HTTP `POST` to `${SPARQL_HOST}/sparql` succeeds, with headers:
   - `Content-Type: application/x-www-form-urlencoded`
   - `Accept: application/sparql-results+json`
   - `Authorization: Basic ...`
2. The body parses as SPARQL JSON results (`application/sparql-results+json`)

It does **not** strictly mean:

- Your dataset is already loaded
- The loaded data is fully indexed (that can matter in some systems after `LOAD`)

This repo uses readiness as a **“SPARQL endpoint is alive and speaking the protocol”** check.

---

## 4) The actual Julia implementation (as in `julia/main.jl`)

### 4.1 Configuration (endpoint + auth)

The Julia script builds endpoint and auth from environment variables:

```julia
const SPARQL_HOST = get(ENV, "SPARQL_HOST", "http://localhost:8080")
const SPARQL_ENDPOINT = "$SPARQL_HOST/sparql"
const SPARQL_USER = get(ENV, "SPARQL_USER", "admin")
const SPARQL_PASS = get(ENV, "SPARQL_PASS", "Passw0rd1")
const AUTH_HEADER = "Basic " * base64encode("$SPARQL_USER:$SPARQL_PASS")
```

In Docker Compose for this repo, the Julia container overrides `SPARQL_HOST` to use the service DNS name:

```yaml
environment:
  - SPARQL_HOST=http://anzograph:8080
```

### 4.2 The smoke query used for readiness

This is the query used in the wait loop:

```julia
const SMOKE_TEST_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"
```

Notes:

- It’s intentionally small (`LIMIT 3`) to keep the readiness check cheap.
- It returns *some* bindings when data exists, but **even an empty dataset can still return a valid empty result set**. The code treats “valid response” as ready.
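
To make “valid response” concrete, the shape check can be sketched as a small predicate. This is an illustration, not code from `julia/main.jl`: the name `is_sparql_json_result` is hypothetical, and it assumes the body has already been parsed (for example with `JSON.parse`) into nested dictionaries. It accepts the two result shapes of the SPARQL 1.1 JSON results format: `results.bindings` for `SELECT` and a top-level `boolean` for `ASK`.

```julia
# Hypothetical helper (not part of this repo): decide whether an already-parsed
# response body looks like a SPARQL 1.1 JSON result document.
function is_sparql_json_result(doc)::Bool
    # Anything that did not parse to a JSON object is not a result document.
    doc isa AbstractDict || return false

    # ASK responses carry a top-level boolean and no bindings.
    if haskey(doc, "boolean")
        return doc["boolean"] isa Bool
    end

    # SELECT responses carry results.bindings; an empty vector is still valid.
    results = get(doc, "results", nothing)
    results isa AbstractDict || return false
    return get(results, "bindings", nothing) isa AbstractVector
end

# Hand-written sample documents standing in for parsed response bodies.
select_empty = Dict(
    "head" => Dict("vars" => ["s", "p", "o"]),
    "results" => Dict("bindings" => Any[]),
)
ask_false = Dict("head" => Dict(), "boolean" => false)

println(is_sparql_json_result(select_empty))    # empty SELECT result is valid
println(is_sparql_json_result(ask_false))       # ASK result is valid, even when false
println(is_sparql_json_result("<html></html>")) # a non-object body is rejected
```

A check like this keeps “the endpoint speaks the protocol” separate from “the dataset has rows”, which matches how this repo treats an empty result set as ready.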

An `ASK` query is also a common readiness probe, because the answer is a single boolean regardless of how much data is present:

```sparql
ASK WHERE { ?s ?p ?o }
```

Note that an `ASK` response carries a top-level `boolean` field instead of `results.bindings`, so the `sparql_query` helper in 4.3 would throw on it as written. The parser has to tolerate that shape; the version in section 6 does.

### 4.3 SPARQL query function (request + minimal retry)

`sparql_query(query; retries=...)` is a generic helper that makes SPARQL POST requests:

```julia
function sparql_query(query::String; retries::Int=5)::SparqlResult
    for attempt in 1:retries
        try
            response = HTTP.post(
                SPARQL_ENDPOINT,
                [
                    "Content-Type" => "application/x-www-form-urlencoded",
                    "Accept" => "application/sparql-results+json",
                    "Authorization" => AUTH_HEADER
                ];
                body = "query=" * HTTP.URIs.escapeuri(query)
            )

            if response.status == 200
                json = JSON.parse(String(response.body))
                return SparqlResult(json["results"]["bindings"])
            elseif response.status >= 500 && attempt < retries
                sleep(10)
                continue
            else
                error("SPARQL query failed with status $(response.status)")
            end
        catch e
            if attempt < retries
                sleep(10)
                continue
            end
            rethrow(e)
        end
    end
    error("SPARQL query failed after $retries attempts")
end
```

Important behaviors to preserve when transferring:

- It uses **POST** (not GET) to the SPARQL endpoint.
- It requires a **200** response and successfully parses SPARQL JSON results.
- It retries on any failure while attempts remain:
  - `>= 500` server errors, via the explicit status branch
  - network / protocol / parsing errors, via the `catch` branch
  - other non-200 statuses as well, because the `error(...)` call sits inside the `try` and is therefore caught too
- With HTTP.jl's default `status_exception = true`, responses with status 300 or higher raise an exception inside `HTTP.post`, so most error statuses reach the `catch` branch rather than the explicit status checks.

### 4.4 The readiness gate: `wait_for_anzograph`

This is the “await until ready” logic:

```julia
function wait_for_anzograph(max_retries::Int=30)::Bool
    println("Waiting for AnzoGraph at $SPARQL_ENDPOINT...")
    for attempt in 1:max_retries
        try
            smoke_result = sparql_query(SMOKE_TEST_QUERY; retries=1)
            println("  AnzoGraph is ready (attempt $attempt, smoke rows=$(length(smoke_result.bindings)))")
            return true
        catch e
            println("  Attempt $attempt/$max_retries: $(typeof(e))")
            sleep(4)
        end
    end
    error("AnzoGraph not available after $max_retries attempts")
end
```

Why it calls `sparql_query(...; retries=1)`:

- It makes each outer “readiness attempt” a **single** request.
- The outer loop controls cadence (`sleep(4)`) and total wait time.
- This avoids “nested retry loops” (inner sleeps + outer sleeps) that can make waits much longer than intended.

Time bound in the current implementation:

- `max_retries = 30`
- `sleep(4)` between attempts
- At most about 120 seconds of sleeping (30 × 4 s), plus the time spent in the requests themselves.

---

## 5) What failures cause it to keep waiting

`wait_for_anzograph()` catches any exception thrown by `sparql_query()` and retries. In practice, that includes:

- **Connection errors** (DNS not ready, connection refused, etc.)
- **Timeouts** (if the HTTP request takes too long and the library throws)
- **Non-200 HTTP statuses**, whether raised by HTTP.jl itself or by the helper's `error(...)`
- **Non-JSON responses**, which make `JSON.parse(...)` throw, and **unexpected JSON**, which makes the `json["results"]["bindings"]` lookup throw

That last point is a big reason a “real SPARQL request + parse” is stronger than just “ping the port”.

---

## 6) Transferable, self-contained version (recommended pattern)

If you want to reuse this in another project, it’s usually easier to:

- avoid globals,
- make endpoint/auth explicit,
- use a **time-based timeout** instead of `max_retries` (more robust),
- add request timeouts so the wait loop can’t hang forever on a single request.

Below is a drop-in module you can copy into your project.

```julia
module AnzoGraphReady

using HTTP
using JSON
using Base64
using Dates

struct SparqlResult
    bindings::Vector{Dict{String, Any}}
end

function basic_auth_header(user::AbstractString, pass::AbstractString)::String
    return "Basic " * base64encode("$(user):$(pass)")
end

function sparql_query(
    endpoint::AbstractString,
    auth_header::AbstractString,
    query::AbstractString;
    retries::Int = 1,
    retry_sleep_s::Real = 2,
    request_timeout_s::Real = 15,
)::SparqlResult
    for attempt in 1:retries
        try
            response = HTTP.post(
                String(endpoint),
                [
                    "Content-Type" => "application/x-www-form-urlencoded",
                    "Accept" => "application/sparql-results+json",
                    "Authorization" => auth_header,
                ];
                body = "query=" * HTTP.URIs.escapeuri(String(query)),
                # HTTP.jl expects `readtimeout` as whole seconds, so round up;
                # passing a Float64 such as 10.0 would fail on every attempt.
                readtimeout = ceil(Int, request_timeout_s),
            )

            if response.status != 200
                error("SPARQL query failed with status $(response.status)")
            end

            parsed = JSON.parse(String(response.body))
            # ASK responses have no "results" key; treat them as zero bindings.
            bindings = get(get(parsed, "results", Dict()), "bindings", Any[])
            return SparqlResult(Vector{Dict{String, Any}}(bindings))
        catch e
            if attempt < retries
                sleep(retry_sleep_s)
                continue
            end
            rethrow(e)
        end
    end
    error("sparql_query: unreachable")
end

"""
Wait until AnzoGraph responds to a real SPARQL POST with parseable JSON.

This is the direct analog of this repo's `wait_for_anzograph()`, but with:
- a time-based timeout (`timeout`)
- a request timeout per attempt (`request_timeout_s`)
- simple exponential backoff
"""
function wait_for_anzograph(
    endpoint::AbstractString,
    auth_header::AbstractString;
    timeout::Period = Minute(3),
    initial_delay_s::Real = 0.5,
    max_delay_s::Real = 5.0,
    request_timeout_s::Real = 10.0,
    query::AbstractString = "ASK WHERE { ?s ?p ?o }",
)::Nothing
    deadline = now() + timeout
    delay_s = initial_delay_s

    while now() < deadline
        try
            # A single attempt: if it succeeds, we declare "ready".
            sparql_query(
                endpoint,
                auth_header,
                query;
                retries = 1,
                request_timeout_s = request_timeout_s,
            )
            return
        catch
            # Any failure means "not ready yet": back off and try again.
            sleep(delay_s)
            delay_s = min(max_delay_s, delay_s * 1.5)
        end
    end

    error("AnzoGraph not available before timeout=$(timeout)")
end

end # module
```

Typical usage (matching this repo’s environment variables):

```julia
using Dates  # provides `Minute`, which the module does not export
using .AnzoGraphReady

sparql_host = get(ENV, "SPARQL_HOST", "http://localhost:8080")
endpoint = "$(sparql_host)/sparql"
user = get(ENV, "SPARQL_USER", "admin")
pass = get(ENV, "SPARQL_PASS", "Passw0rd1")
auth = AnzoGraphReady.basic_auth_header(user, pass)

AnzoGraphReady.wait_for_anzograph(endpoint, auth; timeout=Minute(5))
# Now it is safe to LOAD / query.
```

---

## 7) Optional: waiting for “data is ready” after `LOAD`

Some systems accept `LOAD` but need time before results show up reliably (indexing / transaction visibility). If you run into that in your other project, add a second gate after `LOAD`, for example:

1) load, then
2) poll a query that must be true after load (e.g., “triple count > 0”, or a known IRI exists).

Example “post-load gate”:

```julia
post_load_query = """
SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }
"""

res = AnzoGraphReady.sparql_query(endpoint, auth, post_load_query; retries=1)
# Parse `?n` out of bindings and require it to be > 0; retry until it is.
```

(This repo does not currently enforce “non-empty”; it only enforces “SPARQL is working”.)

---

## 8) Practical checklist when transferring to another project

- Make readiness checks hit the **real SPARQL POST** path you will use in production.
- Require a **valid JSON parse**, not just “port open”.
- Add **per-request timeouts**, so a single hung request cannot hang the whole pipeline.
- Prefer **time-based overall timeout** for predictable behavior in CI.
- Keep the query **cheap** (`ASK` or `LIMIT 1/3`).
- If you use Docker Compose healthchecks, consider also using `depends_on: condition: service_healthy`, but still keep the in-app wait as a safety net (it’s closer to the real contract your code needs).
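
The time-based timeout from this checklist and the post-load gate from section 7 can share one small polling helper. The sketch below is an illustration under stated assumptions, not code from this repo: `wait_until` and `count_from_bindings` are hypothetical names, the check is passed in as a function so the helper stays independent of HTTP, and the simulated counts stand in for real `COUNT(*)` responses.

```julia
# Hypothetical helper: call `check` until it returns `true` or the deadline
# passes. Exceptions thrown by `check` are treated as "not ready yet".
# Returns the number of attempts that were needed.
function wait_until(check::Function; timeout_s::Real = 180,
                    initial_delay_s::Real = 0.5, max_delay_s::Real = 5.0,
                    factor::Real = 1.5)::Int
    deadline = time() + timeout_s
    delay_s = initial_delay_s
    attempt = 0
    while true
        attempt += 1
        ready = try
            check() === true
        catch err
            # Let Ctrl-C through instead of swallowing it as a failed attempt.
            err isa InterruptException && rethrow()
            false
        end
        ready && return attempt
        # Give up if the next sleep would run past the deadline.
        if time() + delay_s > deadline
            error("not ready after $(timeout_s)s ($attempt attempts)")
        end
        sleep(delay_s)
        delay_s = min(max_delay_s, delay_s * factor)  # exponential backoff, capped
    end
end

# Hypothetical helper: read ?n from the bindings of a COUNT(*) query. In the
# SPARQL JSON format each binding maps a variable name to an object whose
# "value" field holds the literal as a string.
function count_from_bindings(bindings)::Int
    isempty(bindings) && return 0
    return parse(Int, bindings[1]["n"]["value"])
end

# Simulated post-load gate: the "store" reports 0 triples twice, then 7.
fake_counts = [0, 0, 7]
calls = Ref(0)
attempts = wait_until(timeout_s = 5, initial_delay_s = 0.01) do
    calls[] += 1
    n = fake_counts[min(calls[], end)]
    bindings = [Dict("n" => Dict("type" => "literal", "value" => string(n)))]
    count_from_bindings(bindings) > 0
end
println("data visible after $attempts attempts")
```

In a real pipeline the `do` block would run the `COUNT(*)` query through `sparql_query` and pass `res.bindings` to `count_from_bindings`; the same helper also works as the readiness gate itself if the block simply performs the smoke query and returns `true`.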