# Waiting for AnzoGraph readiness from Julia (how this repo does it)

This repo runs a Julia pipeline (`julia/main.jl`) against an AnzoGraph SPARQL endpoint. The key problem is that **“container started” ≠ “SPARQL endpoint is ready to accept queries”**.

So, before the Julia code does anything that depends on SPARQL (like `LOAD <...>` or large `SELECT`s), it explicitly **waits until AnzoGraph is actually responding to a real SPARQL POST request with valid JSON results**.

This document explains the exact mechanism used here, why it works, and gives copy/paste-ready patterns you can transfer to another project.

---

## 1) Where the waiting happens (pipeline control flow)

In `julia/main.jl`, the entrypoint calls:

```julia
# Step 1: Wait for AnzoGraph
wait_for_anzograph()

# Step 2: Load TTL file
result = sparql_update("LOAD <$SPARQL_DATA_FILE>")
```

So the “await” is not a Julia `Task`/`async` wait; it is a **blocking retry loop** that returns only once a small SPARQL query succeeds, and throws an error if that never happens within its retry budget.

Reference: `julia/main.jl` defines `wait_for_anzograph()` and calls it from `main()`.

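Stripped of HTTP, the shape of that loop can be sketched as below. `wait_until` and its `probe` argument are hypothetical names, not part of the repo: the probe stands in for “run the smoke query”, and the fixed pause plus attempt limit mirror `wait_for_anzograph()`. Because `sleep` blocks the calling task, the pipeline cannot move on to `LOAD` until the probe succeeds or the loop gives up.

```julia
# Hypothetical helper: a blocking retry loop with a fixed pause and an attempt
# limit, shaped like this repo's wait_for_anzograph(). `probe` is any
# zero-argument function that throws while the service is not ready.
function wait_until(probe::Function; max_attempts::Int = 30, pause_s::Real = 4)
    for attempt in 1:max_attempts
        try
            probe()
            return attempt  # ready: report how many attempts it took
        catch e
            e isa InterruptException && rethrow()  # don't swallow Ctrl-C
            println("  Attempt $attempt/$max_attempts: $(typeof(e))")
            sleep(pause_s)  # blocks the calling task until the pause elapses
        end
    end
    error("not ready after $max_attempts attempts")
end

# Usage with a fake probe that fails twice, then succeeds:
calls = Ref(0)
flaky() = (calls[] += 1; calls[] < 3 && error("connection refused"); :ok)
wait_until(flaky; pause_s = 0)  # returns 3
```
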
---

## 2) Why this is needed even with Docker Compose `depends_on`

This repo’s `docker-compose.yml` includes an AnzoGraph `healthcheck`:

```yaml
anzograph:
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:8080/sparql || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 30
    start_period: 60s
```

However, the `julia-layout` service currently depends on `anzograph` with:

```yaml
depends_on:
  anzograph:
    condition: service_started
```

Meaning:

- Compose only ensures that the AnzoGraph **container process has started**.
- Compose does **not** wait for the AnzoGraph HTTP/SPARQL endpoint to be ready. Switching to `condition: service_healthy` would make it wait for the healthcheck above, but that check is an unauthenticated `curl -f` GET, which is not the same contract as “an authenticated SPARQL POST returns JSON results”.

So the Julia code includes its own readiness gate to prevent failures such as:

- TCP connection refused (port not open yet)
- HTTP endpoint reachable but not fully initialized
- Non-JSON/HTML error responses while the service is still booting

---

## 3) What “ready” means in this repo

In this repo, “AnzoGraph is ready” means:

1. An HTTP `POST` to `${SPARQL_HOST}/sparql` returns status `200`, with the request headers:
   - `Content-Type: application/x-www-form-urlencoded`
   - `Accept: application/sparql-results+json`
   - `Authorization: Basic ...`
2. The response body parses as SPARQL JSON results (`application/sparql-results+json`).

It does **not** necessarily mean that:

- your dataset is already loaded, or
- the loaded data is fully indexed (which can matter in some systems after `LOAD`).

This repo uses readiness as a **“SPARQL endpoint is alive and speaking the protocol”** check.

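To make that definition concrete, the sketch below assembles the request pieces (URL, headers, form-encoded body) using only the standard library. `readiness_request`, `form_escape` and `is_unreserved` are hypothetical helpers; the repo itself delegates the encoding to `HTTP.URIs.escapeuri`, and `form_escape` exists here only to show what ends up on the wire.

```julia
using Base64

# Hypothetical helper: RFC 3986 "unreserved" bytes (A-Z a-z 0-9 - . _ ~).
is_unreserved(b::UInt8) =
    (UInt8('0') <= b <= UInt8('9')) || (UInt8('A') <= b <= UInt8('Z')) ||
    (UInt8('a') <= b <= UInt8('z')) || b in UInt8.(('-', '.', '_', '~'))

# Hypothetical helper: percent-encode for an application/x-www-form-urlencoded
# body. Every other byte becomes %XX; a space becomes %20, which form decoders
# accept just like '+'.
function form_escape(s::AbstractString)::String
    io = IOBuffer()
    for b in codeunits(s)
        is_unreserved(b) ? write(io, b) : print(io, '%', uppercase(string(b, base = 16, pad = 2)))
    end
    return String(take!(io))
end

# Hypothetical helper: the pieces of the readiness request described above.
# Hand them to whatever HTTP client you use (the repo uses HTTP.post).
function readiness_request(host::AbstractString, user::AbstractString,
                           pass::AbstractString, query::AbstractString)
    headers = [
        "Content-Type" => "application/x-www-form-urlencoded",
        "Accept" => "application/sparql-results+json",
        "Authorization" => "Basic " * base64encode("$(user):$(pass)"),
    ]
    return (url = "$(host)/sparql", headers = headers,
            body = "query=" * form_escape(query))
end

req = readiness_request("http://localhost:8080", "admin", "Passw0rd1",
                        "ASK WHERE { ?s ?p ?o }")
req.body  # "query=ASK%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D"
```
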
---

## 4) The actual Julia implementation (as in `julia/main.jl`)

### 4.1 Configuration (endpoint + auth)

The Julia script builds the endpoint URL and the auth header from environment variables:

```julia
using Base64  # provides base64encode

const SPARQL_HOST = get(ENV, "SPARQL_HOST", "http://localhost:8080")
const SPARQL_ENDPOINT = "$SPARQL_HOST/sparql"
const SPARQL_USER = get(ENV, "SPARQL_USER", "admin")
const SPARQL_PASS = get(ENV, "SPARQL_PASS", "Passw0rd1")
const AUTH_HEADER = "Basic " * base64encode("$SPARQL_USER:$SPARQL_PASS")
```

In this repo’s Docker Compose file, the Julia container overrides `SPARQL_HOST` to use the service’s DNS name:

```yaml
environment:
  - SPARQL_HOST=http://anzograph:8080
```

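### 4.2 The smoke query used for readiness

This is the query used in the wait loop:

```julia
const SMOKE_TEST_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 3"
```

Notes:

- It’s intentionally small (`LIMIT 3`) to keep the readiness check cheap.
- It returns *some* bindings when data exists, but **an empty dataset still returns a valid, empty result set** (`"bindings": []`). The code treats any valid response as ready, so the check does not depend on data being present.

An `ASK` query is another common choice; it returns a single boolean (`false` on an empty store), so the response stays tiny:

```sparql
ASK WHERE { ?s ?p ?o }
```

Note that an `ASK` response carries a top-level `"boolean"` field instead of `"results": {"bindings": ...}`. The repo’s `sparql_query` (next section) reads `json["results"]["bindings"]`, so it would throw a `KeyError` on an `ASK` response and the wait loop would never succeed. If you switch to `ASK`, the response parser must accept that shape.
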
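A readiness check that may receive either shape can classify the parsed body first. The sketch below assumes the body has already been parsed into a `Dict` (for example by `JSON.parse`); `result_kind` is a hypothetical helper, and the two shapes it recognizes are those of the W3C SPARQL 1.1 Query Results JSON Format.

```julia
# Hypothetical helper: classify an already-parsed response body by the two
# shapes in the SPARQL 1.1 JSON results format:
#   SELECT -> {"head": {"vars": [...]}, "results": {"bindings": [...]}}
#   ASK    -> {"head": {}, "boolean": true/false}
function result_kind(parsed)::Symbol
    parsed isa AbstractDict || return :invalid
    haskey(parsed, "boolean") && return :ask
    results = get(parsed, "results", nothing)
    (results isa AbstractDict && haskey(results, "bindings")) && return :select
    return :invalid
end

select_resp = Dict("head" => Dict("vars" => ["s"]), "results" => Dict("bindings" => Any[]))
ask_resp = Dict("head" => Dict(), "boolean" => false)
result_kind(select_resp)  # :select (an empty store still gives a valid SELECT result)
result_kind(ask_resp)     # :ask
```

Treating both `:select` and `:ask` as “ready” lets the same wait loop work with either smoke query.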
### 4.3 SPARQL query function (request + minimal retry)

`sparql_query(query; retries=...)` is a generic helper that makes SPARQL POST requests:

```julia
function sparql_query(query::String; retries::Int=5)::SparqlResult
    for attempt in 1:retries
        try
            response = HTTP.post(
                SPARQL_ENDPOINT,
                [
                    "Content-Type" => "application/x-www-form-urlencoded",
                    "Accept" => "application/sparql-results+json",
                    "Authorization" => AUTH_HEADER
                ];
                body = "query=" * HTTP.URIs.escapeuri(query)
            )

            if response.status == 200
                json = JSON.parse(String(response.body))
                return SparqlResult(json["results"]["bindings"])
            elseif response.status >= 500 && attempt < retries
                sleep(10)
                continue
            else
                error("SPARQL query failed with status $(response.status)")
            end
        catch e
            if attempt < retries
                sleep(10)
                continue
            end
            rethrow(e)
        end
    end
    error("SPARQL query failed after $retries attempts")
end
```

Important behaviors to preserve when transferring:

- It uses **POST** (not GET) to the SPARQL endpoint.
- It requires a **200** response and a successful parse of the SPARQL JSON results.
- It retries on:
  - `>= 500` server errors
  - network / protocol / parsing errors (caught exceptions)

With HTTP.jl’s default `status_exception=true`, `HTTP.post` itself throws an `HTTP.StatusError` for any status ≥ 300, before the code can inspect `response.status`. In practice, non-200 responses therefore land in the `catch` branch, which means 4xx responses (for example a 401 from wrong credentials) are retried too, not only 5xx.

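If you would rather fail fast on errors that waiting cannot fix, one possible policy (an assumption, not what the repo does) is to classify the status code before deciding whether to retry. To see the status at all, either call `HTTP.post` with `status_exception=false` or read the `status` field of the caught `HTTP.StatusError`; `classify_status` is a hypothetical helper.

```julia
# Hypothetical helper: one possible retry policy for a readiness probe.
#   :ok    -> 2xx, the endpoint answered
#   :retry -> transient: 5xx while booting/overloaded, or 429 Too Many Requests
#   :fatal -> waiting will not help: 401/403 (bad credentials), 400, ...
function classify_status(status::Integer)::Symbol
    200 <= status <= 299 && return :ok
    (status >= 500 || status == 429) && return :retry
    return :fatal
end

classify_status(503)  # :retry
classify_status(401)  # :fatal
```

In a wait loop you would stop immediately on `:fatal` and keep polling on `:retry`, so a typo in `SPARQL_PASS` shows up as an immediate 401 instead of a long timeout.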
### 4.4 The readiness gate: `wait_for_anzograph`

This is the “await until ready” logic:

```julia
function wait_for_anzograph(max_retries::Int=30)::Bool
    println("Waiting for AnzoGraph at $SPARQL_ENDPOINT...")

    for attempt in 1:max_retries
        try
            smoke_result = sparql_query(SMOKE_TEST_QUERY; retries=1)
            println(" AnzoGraph is ready (attempt $attempt, smoke rows=$(length(smoke_result.bindings)))")
            return true
        catch e
            println(" Attempt $attempt/$max_retries: $(typeof(e))")
            sleep(4)
        end
    end

    error("AnzoGraph not available after $max_retries attempts")
end
```

Why it calls `sparql_query(...; retries=1)`:

- It makes each outer “readiness attempt” a **single** request.
- The outer loop controls the cadence (`sleep(4)`) and the total wait time.
- This avoids nested retry loops (inner sleeps + outer sleeps) that can make waits much longer than intended.

Time bound in the current implementation:

- `max_retries = 30`
- `sleep(4)` after each failed attempt
- About 120 seconds of sleeping in total (30 × 4 s), plus however long the 30 requests themselves take. The request sets no `readtimeout`, so a single hung request can stretch this bound.

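The bound above, and the reason `retries=1` matters, can be written out. Here $N$ is the number of outer attempts, $s$ the pause after each failed attempt, and $t_k$ the duration of request $k$; equality holds when every attempt fails:

$$
T_{\text{wait}} \le \sum_{k=1}^{N} \left( t_k + s \right) = N s + \sum_{k=1}^{N} t_k = 30 \cdot 4\,\mathrm{s} + \sum_{k=1}^{30} t_k = 120\,\mathrm{s} + \sum_{k=1}^{30} t_k
$$

If the outer loop instead called `sparql_query` with its default `retries=5`, a failing outer attempt would first run the inner `sleep(10)` four times (after inner attempts 1 to 4) before rethrowing, so the sleeping alone would become:

$$
T_{\text{sleep}}^{\text{nested}} = N \left( 4 \cdot 10\,\mathrm{s} + s \right) = 30 \cdot (40\,\mathrm{s} + 4\,\mathrm{s}) = 1320\,\mathrm{s} = 22\,\mathrm{min}
$$

That is eleven times the intended two minutes, before counting any request time.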
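---

## 5) What failures cause it to keep waiting

`wait_for_anzograph()` catches any exception thrown by `sparql_query()` and retries. In practice, that includes:

- **Connection errors** (DNS name not resolvable yet, connection refused, etc.)
- **Timeouts**, if the HTTP client has one configured and it fires (the repo’s `HTTP.post` call sets no `readtimeout`)
- **Non-200 HTTP statuses**, raised either as `HTTP.StatusError` by HTTP.jl (its default for status ≥ 300) or by the explicit `error(...)`
- **Non-JSON responses** (e.g. an HTML error page), which make `JSON.parse(...)` throw, and **unexpected JSON** without `results.bindings`, which makes the indexing throw a `KeyError`

That last point is a big reason a “real SPARQL request + parse” is stronger than just “ping the port”. The flip side of catching everything is that a permanent problem, such as a wrong password (HTTP 401), also looks like “not ready yet”: it surfaces only as the generic timeout error after about two minutes.
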
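For contrast, a port-only probe looks like the sketch below (`port_open` is a hypothetical helper built on Julia’s `Sockets` standard library). It detects only the first failure mode from section 2, connection refused: a call such as `port_open("anzograph", 8080)` returns `true` as soon as something accepts TCP connections, even if that server would still answer a SPARQL POST with an HTML error page. It also has no connect timeout of its own.

```julia
using Sockets

# Hypothetical port-only probe, for contrast with the SPARQL smoke query.
# `true` only means something accepted a TCP connection on host:port; the
# server behind it may still be booting and answer queries with errors.
function port_open(host::AbstractString, port::Integer)::Bool
    try
        sock = connect(host, port)  # throws on refusal or DNS failure
        close(sock)
        return true
    catch e
        e isa InterruptException && rethrow()
        return false
    end
end
```
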
---

## 6) Transferable, self-contained version (recommended pattern)

If you want to reuse this in another project, it’s usually easier to:

- avoid globals,
- make the endpoint and auth explicit,
- use a **time-based timeout** instead of `max_retries` (the total wait then no longer depends on how long each failed attempt takes),
- add request timeouts so the wait loop can’t hang forever on a single request.

Below is a drop-in module you can copy into your project.

```julia
module AnzoGraphReady

using HTTP
using JSON
using Base64
using Dates

struct SparqlResult
    bindings::Vector{Dict{String, Any}}
end

function basic_auth_header(user::AbstractString, pass::AbstractString)::String
    return "Basic " * base64encode("$(user):$(pass)")
end

function sparql_query(
    endpoint::AbstractString,
    auth_header::AbstractString,
    query::AbstractString;
    retries::Int = 1,
    retry_sleep_s::Real = 2,
    request_timeout_s::Real = 15,
)::SparqlResult
    retries >= 1 || throw(ArgumentError("retries must be >= 1"))
    for attempt in 1:retries
        try
            response = HTTP.post(
                String(endpoint),
                [
                    "Content-Type" => "application/x-www-form-urlencoded",
                    "Accept" => "application/sparql-results+json",
                    "Authorization" => auth_header,
                ];
                body = "query=" * HTTP.URIs.escapeuri(String(query)),
                # Round up to whole seconds: HTTP.jl's timeout keywords take integer seconds.
                readtimeout = ceil(Int, request_timeout_s),
                connect_timeout = ceil(Int, request_timeout_s),
                # Return non-2xx responses instead of throwing HTTP.StatusError,
                # so the status check below is what reports them.
                status_exception = false,
            )

            if response.status != 200
                error("SPARQL query failed with status $(response.status)")
            end

            parsed = JSON.parse(String(response.body))
            # Accept both SPARQL JSON shapes: SELECT (results.bindings) and ASK (boolean).
            (parsed isa AbstractDict && (haskey(parsed, "results") || haskey(parsed, "boolean"))) ||
                error("Response is not a SPARQL JSON result")
            bindings = get(get(parsed, "results", Dict()), "bindings", Any[])
            return SparqlResult(Vector{Dict{String, Any}}(bindings))
        catch e
            if attempt < retries
                sleep(retry_sleep_s)
                continue
            end
            rethrow(e)
        end
    end
    error("sparql_query: unreachable")
end

"""
Wait until AnzoGraph answers a real SPARQL POST with a SPARQL JSON result.

This is the direct analog of this repo's `wait_for_anzograph()`, but with:
- a time-based timeout (`timeout`)
- a request timeout per attempt (`request_timeout_s`)
- simple exponential backoff
"""
function wait_for_anzograph(
    endpoint::AbstractString,
    auth_header::AbstractString;
    timeout::Period = Minute(3),
    initial_delay_s::Real = 0.5,
    max_delay_s::Real = 5.0,
    request_timeout_s::Real = 10.0,
    query::AbstractString = "ASK WHERE { ?s ?p ?o }",
)::Nothing
    deadline = now() + timeout
    delay_s = initial_delay_s
    last_err = nothing

    while now() < deadline
        try
            # A single attempt: if it succeeds, we declare "ready".
            sparql_query(
                endpoint,
                auth_header,
                query;
                retries = 1,
                request_timeout_s = request_timeout_s,
            )
            return
        catch e
            e isa InterruptException && rethrow()  # don't swallow Ctrl-C
            last_err = e
            sleep(delay_s)
            delay_s = min(max_delay_s, delay_s * 1.5)
        end
    end

    error("AnzoGraph not available before timeout=$(timeout); last error: $(sprint(showerror, last_err))")
end

end # module
```

Typical usage (matching this repo’s environment variables):

```julia
using Dates  # for Minute

include("AnzoGraphReady.jl")  # hypothetical path to the module above
using .AnzoGraphReady

sparql_host = get(ENV, "SPARQL_HOST", "http://localhost:8080")
endpoint = "$(sparql_host)/sparql"
user = get(ENV, "SPARQL_USER", "admin")
pass = get(ENV, "SPARQL_PASS", "Passw0rd1")

auth = AnzoGraphReady.basic_auth_header(user, pass)
AnzoGraphReady.wait_for_anzograph(endpoint, auth; timeout=Minute(5))

# Now it is safe to LOAD / query.
```

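To see how the module’s backoff spaces out attempts, the sketch below reproduces its delay sequence: sleep `delay_s` after a failed attempt, then multiply it by 1.5, capped at `max_delay_s`. `backoff_delays` is a hypothetical helper, and the numbers assume the module’s defaults of 0.5 s and 5 s.

```julia
# Hypothetical helper: the first `n` sleeps of the module's wait loop.
function backoff_delays(initial_s::Real, max_delay_s::Real, n::Integer; factor::Real = 1.5)
    delays = Float64[]
    d = Float64(initial_s)
    for _ in 1:n
        push!(delays, d)               # the loop sleeps this long...
        d = min(max_delay_s, d * factor)  # ...then grows the delay, capped
    end
    return delays
end

delays = backoff_delays(0.5, 5.0, 8)
# [0.5, 0.75, 1.125, 1.6875, 2.53125, 3.796875, 5.0, 5.0]
sum(delays)  # 20.390625 seconds of sleeping across the first 8 failed attempts
```

After the first six failed attempts (about 10 s of sleeping) the loop settles into one attempt every 5 s plus request time until the deadline; raise `max_delay_s` if that polling rate is too aggressive for a slow-starting server.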
---

## 7) Optional: waiting for “data is ready” after `LOAD`

Some systems accept `LOAD` but need time before results show up reliably (indexing / transaction visibility). If you run into that in your other project, add a second gate after `LOAD`, for example:

1) load, then
2) poll a query that must be true after load (e.g., “triple count > 0”, or a known IRI exists).

Example “post-load gate”:

```julia
using Dates

post_load_query = """
SELECT (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
"""

deadline = now() + Minute(2)  # arbitrary upper bound
while true
    res = AnzoGraphReady.sparql_query(endpoint, auth, post_load_query; retries=1)
    # ?n arrives as a literal binding, e.g. {"type": "literal", "value": "42", ...}
    n = isempty(res.bindings) ? 0 : parse(Int, res.bindings[1]["n"]["value"])
    n > 0 && break
    now() < deadline || error("post-load gate: still no triples after timeout")
    sleep(2)
end
```

(This repo does not currently enforce “non-empty”; it only enforces “SPARQL is working”.)

---

## 8) Practical checklist when transferring to another project

- Make readiness checks hit the **real SPARQL POST** path you will use in production.
- Require a **valid JSON parse**, not just “port open”.
- Add **per-request timeouts**, so a single hung request cannot hang the whole pipeline.
- Prefer a **time-based overall timeout** for predictable behavior in CI.
- Keep the query **cheap** (`ASK` or `LIMIT 1/3`), and make sure your response parser accepts that query’s result shape.
- If you use Docker Compose healthchecks, consider also using `depends_on: condition: service_healthy`, but still keep the in-app wait as a safety net (it’s closer to the real contract your code needs).