PETRA Developer Manual¶
How to use PETRA (Petri-Net Trained Architecture) — both the
high-level adapter (TOML config → trained model) and the underlying
framework modules. Treat ROADMAP.md as the why, this document as
the how. Update this manual whenever a new module, scenario, or
extension point is added.
1. Quick start¶
Install from PyPI (Python 3.11+):
The distribution name is petra-nn (the bare petra was
already taken on PyPI); the import name is petri_net_nn
(matches the source layout). The fastest path:
write a TOML config and use the adapter.
from petri_net_nn import load_scenario
ctx = load_scenario("examples/biological_signalling/scenario.toml")
module, losses = ctx.train()
rules = ctx.extract_rules(module)
print(rules["xor"][0].description())
A scenario is a TOML file describing the net, traces, and training
parameters. The adapter resolves it into a ScenarioContext and the
context exposes compile, train, attribute_to_marking,
extract_rules, and anomaly_score. No per-scenario Python boilerplate
is needed.
2. The adapter¶
2.1 Config schema (TOML)¶
The full surface of the adapter, organised by section. Every field is optional unless flagged otherwise.
# ---------------------------------------------------------------------------
# [scenario] — identification metadata
[scenario]
name = "..." # human-readable identifier (defaults to file stem)
description = "..." # optional one-line description
# ---------------------------------------------------------------------------
# [net] — exactly one source per scenario
[net]
source = "inline" # one of: "inline" | "bpmn_file" | "pnml_file" | "sif_file" | "discover"
# ---- "inline" form: declare the net structurally in TOML ----
[[net.places]]
id = "p_x" # required
tokens = 1 # initial-marking tokens (default 0)
label = "..." # human label
[[net.transitions]]
id = "t_x" # required
label = "..."
duration = 1 # firing duration in time-unrolled steps (default 1)
rate = 1.0 # firing-rate prior multiplier (default 1.0)
guard = { place = "p_x", op = ">=", value = 1000.0 } # CPN structural guard
# silent = true # mark as τ for weak bisimulation
[[net.arcs]]
src = "p_x"
dst = "t_x"
weight = 1 # arc multiplicity (default 1, >1 for batching)
output_value = 1.0 # CPN output value (constant only via TOML)
# Inhibitor arcs — place must be empty for the transition to fire
[[net.inhibitor_arcs]]
place = "p_guard"
transition = "t_guarded"
# ---- OR: "bpmn_file" form — process.bpmn relative to the config ----
# path = "process.bpmn"
# ---- OR: "pnml_file" form — PNML 2009 P/T net subset ----
# path = "net.pnml"
# ---- OR: "sif_file" form — Pathway Commons SIF ----
# path = "pathway.sif"
# ---- OR: "discover" form — mine the structure from the [traces] section
# below via the basic Inductive Miner. No structural payload here. ----
# (just `source = "discover"` is enough; the adapter loads traces first
# and feeds them to `discover_inductive`.)
# ---------------------------------------------------------------------------
# [traces] — exactly one source per scenario; section is optional if you
# only want to load the net structurally
[traces]
source = "xes_file" # one of: "inline" | "xes_file" | "csv_file" | "json_file"
# ---- "xes_file" form ----
path = "log.xes" # or .xes.gz (parser handles gzip)
limit_traces = 300 # cap for large public XES logs
promote_event_attrs = ["impact"] # lift event attrs to trace level
event_name_attr = "lifecycle:transition" # use a different attr as event name
# ---- "csv_file" form (process-mining flat table) ----
# path = "log.csv"
# case_column = "case_id"
# activity_column = "activity"
# ---- "json_file" form ----
# path = "log.json"
# ---- "inline" form ----
# [[traces.inline]]
# attributes = { signal = "0.9" }
# events = ["task_a", "task_b"]
# ---------------------------------------------------------------------------
# [training.input_marking] — required when traces are present.
# Each key is a place id; value is { attribute = "name" } (read from the
# trace's attributes dict) or { constant = N }.
[training.input_marking]
p_x = { attribute = "signal" }
p_y = { constant = 0.5 }
# ---------------------------------------------------------------------------
# [training.input_values] — coloured-Petri-net value channel; same form
# as input_marking but feeds the per-token value the compiler reads
# through structural guards (see §3.3).
[training.input_values]
p_x = { attribute = "amount" }
# ---------------------------------------------------------------------------
# [training] — training hyperparameters
[training]
steps = 1500
lr = 0.1
sharpness = 1.0
firing = "sigmoid" # or "ste"
routing = "independent" # or "softmax"
num_steps = 0 # 0 = acyclic single-pass, >0 = time-unrolled
seed = 0 # torch.manual_seed before module construction
# ---------------------------------------------------------------------------
# [interpretability] — toggles for ctx.extract_rules()
[interpretability]
extract_xor_rules = true
extract_and_join_rules = false
Anything not listed above (per-transition torch guards, custom arc output transforms, etc.) lives in the Python API rather than TOML — torch callables can't round-trip through a configuration file.
2.2 ScenarioContext methods¶
| Method | Returns | What it does |
|---|---|---|
compile() |
PetriNetModule |
Build the module with config training params; seeds torch first. |
train(module=None) |
(PetriNetModule, list[float]) |
Compile if needed and run train_on_traces. |
attribute_to_marking(trace) |
dict[str, float] |
Resolve the config's input-marking spec for a single trace. |
attribute_to_values(trace) |
dict[str, float] |
Resolve the config's [training.input_values] spec — the coloured-token value channel the compiler reads through structural guards. |
extract_rules(module) |
dict[str, list] |
Run XOR / AND-join rule extraction per [interpretability] toggles. |
anomaly_score(module, trace) |
dict[str, float] |
Per-transition residuals for one trace. |
Carrying the raw config dict (and config_dir Path) on the context
lets downstream callers read scenario-specific extras (e.g. anomaly
generator settings) without re-parsing the file.
3. Framework reference¶
When the adapter doesn't fit (e.g. you want to manipulate the net structurally before compilation), use the underlying modules directly.
3.1 petri_net.py — the formal object¶
from petri_net_nn import PetriNet
net = PetriNet()
net.add_place("p_a", tokens=1, label="start")
net.add_place("p_b")
net.add_transition("t_x", label="do thing")
net.add_arc("p_a", "t_x")
net.add_arc("t_x", "p_b")
Other methods: preset, postset, is_enabled, fire,
enabled_transitions, validate. Validation returns a list of
structural issues (empty list = well-formed).
3.2 bpmn.py — BPMN 2.0 → PetriNet¶
Supports: tasks, gateways, start/end events, compensation boundary
events (acyclic and throw-event forms), error/timer/signal/escalation/
message boundary events (interrupting and non-interrupting),
intermediate events, lanes, multi-pool <collaboration> with
<messageFlow>.
Does not support: subprocesses, intermediate event definitions (timer/message/signal as intermediate, not boundary), message flows to non-task nodes.
3.3 compiler.py — PetriNet → differentiable nn.Module¶
from petri_net_nn import PetriNetModule
module = PetriNetModule(
net,
sharpness=1.0, # inside-sigmoid multiplier
num_steps=0, # 0 = acyclic; >0 = time-unrolled
firing="sigmoid", # or "ste" (straight-through estimator)
routing="independent", # or "softmax" (over XOR groups)
)
out = module(input_marking={"p_a": torch.tensor([0.9])})
The compiler's structural constraint is enforced by construction:
one nn.Parameter per arc in F, one per transition (threshold). No
weights exist outside the flow relation.
The forward pass implements a continuous relaxation of the discrete firing rule. For each transition:
activation(t) = σ( sharpness · ( Σ_p w(p,t) · a(p) − θ(t) ) )
and for each downstream place:
a(p) = Σ_{t : (t,p) ∈ F} activation(t) · w(t,p)
That makes the whole network differentiable end to end — standard
backpropagation applies. firing="ste" swaps the sigmoid for a
hard step in the forward pass while keeping the sigmoid gradient
flowing backward (the standard straight-through estimator);
routing="softmax" replaces independent sigmoids over an XOR
group with a softmax that sums to 1; inhibitor arcs multiply the
resulting activation by (1 − a(p)) for each inhibitor place;
transition durations buffer the activation for D−1 time-unrolled
steps before it contributes to downstream places.
Coloured-Petri-net layer. Three ways to express value-dependent behaviour, in order of expressiveness:
-
Structural guard —
{place, op, value}declared in TOML or viaadd_transition(..., structural_guard=...). The compiler builds one learnablenn.Parameterthreshold per guarded transition, seeded atvalueand refined by training, and multiplies the transition's firing strength bysoft_guard(t) = σ( sharpness · scale(t) · sign(op) · ( value(place) − θ_guard(t) ) )
with sign(op) = +1 for >/>= and −1 for
</<=, and scale(t) = 1 / max(|θ_init|, 1.0) so the
sigmoid's gradient is O(1) at the boundary regardless of the
value units the modeller used. Equality / inequality (==,
!=) cannot be expressed structurally and must use the
torch-guard form below. This is the case where you want PETRA
to learn the threshold from data — see the
credit_approval_coloured scenario.
-
Torch guard — a Python callable on the transition (kwarg
torch_guard=...) takingdict[place_id, Tensor(batch,)]of input values and returning aTensor(batch,)gate in [0, 1]. For routing logic the structural form can't express: multi-input comparisons, compound predicates, custom learnable sub-networks. Overrides the structural guard when both are declared on the same transition. -
Token-game-only callable guard —
guard=..., a bool-returningGuardFn. Used byfire_coloured/is_enabled_coloured; the compiler ignores it.
Output-arc values follow the same tiered pattern, in compiler
precedence: torch_output_value (callable on bound input
value tensors, honoured by the compiler), then output_value
(constant float honoured by the compiler, or float-returning
callable evaluated only by the token-game), then the default
1.0.
The forward pass carries a parallel per-place value channel
alongside activations: source-place values come from the
input_values=... argument to forward (default 1.0);
non-source places get an activation-weighted average of
contributing transitions' output-arc values. This channel is
what the guard sigmoids read.
3.4 traces.py — training and anomaly scoring¶
train_on_traces(module, traces, *, attribute_to_marking, attribute_to_values=None, steps, lr, transitions=None)— main training loop. Passattribute_to_valuesto feed the per-token value channel that trains structural-guard thresholds.anomaly_score(module, trace, *, attribute_to_marking, attribute_to_values=None)— per-transition residual dict.trace_anomaly_score(module, trace, ...)— scalar trace-level score.auc(positive_scores, negative_scores)— Mann-Whitney U / ROC AUC.SharpnessScheduler(module, *, start, end, num_steps, kind)— anneal sharpness over training.sweep_trace_count(factory, traces, *, attribute_to_marking, sample_sizes, steps, lr)— characterise training-data requirements.expected_cost(module, transition_costs, *, input_marking, batch_size)— sum of (activation × cost) per transition. Used for cost-ranked variant search (point #6).
3.5 xes.py — XES log loader¶
Returns list[XESTrace]. Each trace has attributes: dict[str, str]
and events: list[XESEvent] where each event has name: str
(concept:name).
3.6 anomalies.py — corruption generators and baselines¶
from petri_net_nn import drop_event, insert_event, swap_event_labels, shuffle_events, FrequencyBaseline
Generators take a normal trace and return a corrupted copy
(non-mutating). FrequencyBaseline().fit(traces).score(trace) is the
non-structural baseline used for Phase 7 comparison.
3.7 interpretability.py — distilled rules¶
from petri_net_nn import (
find_xor_groups, # structural XOR detection (single-input or shared-preset)
extract_xor_rule, # binary routing rule (crossover threshold)
extract_routing_rules, # all binary XOR groups in a module
extract_xor_partition, # N-way piecewise routing
extract_routing_partitions, # all groups, any arity
find_and_join_transitions,
extract_and_join_rule, # weighted-vote / quorum rule
extract_and_join_rules,
explain_anomaly, # prose explanation pinned to BPMN labels
# Phase 13 — confidence intervals, counterfactuals, sensitivity,
# cross-variant comparison, prose
bootstrap_xor_rule, # bootstrap CI on an XOR rule
bootstrap_and_join_rule, # bootstrap CI on an AND-join rule
XORRuleCI, AndJoinRuleCI, # CI bundles (point estimate + distribution + CI)
find_counterfactual, # binary-search counterfactual analysis
Counterfactual, # the counterfactual bundle
transition_sensitivity, # per-input gradients at a base point
input_importance, # aggregate input importance over a trace set
SensitivityReport, # sensitivity bundle (marking + value gradients)
compare_variants, # grid-comparison between two trained variants
ComparisonReport, # comparison bundle (agreement + divergent samples)
DisagreementSample, # one grid point where the variants disagree
prose_for_xor_rule, # rule (or CI) → paragraph
prose_for_and_join_rule, # rule (or CI) → paragraph
prose_for_counterfactual, # counterfactual → paragraph
prose_for_sensitivity, # sensitivity report → paragraph
prose_for_comparison_report,# comparison report → paragraph
)
For shared-preset XOR groups (multi-input competing transitions, e.g. 2PC commit/abort), the rule extractors automatically pick the discriminative input — the place whose learned weight gap across the group is largest.
Bootstrap confidence intervals. Both bootstrap_xor_rule and
bootstrap_and_join_rule take a module_factory (zero-arg
callable returning a fresh, untrained module), the training trace
list, and the per-resample training parameters. They resample
n_bootstrap times (default 100) with replacement, train a fresh
module on each resample, extract the rule, and return a CI bundle
with:
- the point-estimate rule from training on the full trace list;
- the bootstrap distribution of the headline parameter (crossover or threshold);
- percentile-based confidence interval at the requested coverage (default 95%);
- a direction-agreement / quorum-agreement rate — how often the bootstrap rule matched the point estimate's structural form.
The bootstrap RNG is seedable for reproducibility (seed=...);
per-resample training uses the default global torch RNG so
initialisation variance contributes to the distribution.
Counterfactual analysis. find_counterfactual(module,
base_marking, *, flip_place, target_transition, ...) binary-
searches the input at one place for the boundary at which the
target transition's firing activation crosses 0.5. It works on
both the marking channel (flip_channel="marking") and the
coloured-token value channel (flip_channel="value", for CPN
scenarios where the structural guard reads per-token values).
Returns a Counterfactual with the original vs counterfactual
input and activation pair. Separate tolerance (activation
tolerance) and interval_tolerance (search-width tolerance, auto-
defaults to 1e-4 of the search range) let it converge cleanly on
both [0, 1] marking ranges and [0, 10000] value ranges.
Sensitivity analysis. transition_sensitivity(module,
base_marking, target_transition, *, base_values=None) returns a
SensitivityReport with per-input gradient magnitudes of the
target's firing activation at the base point. Both channels are
reported separately (marking and value), so a CPN scenario can
see which channel drives the decision. input_importance(
module, traces, *, attribute_to_marking, transitions=None,
attribute_to_values=None) aggregates absolute gradients across
a trace set and all scored transitions, returning per-input
importance scores keyed by "<channel>:<place>". Both functions
use torch autograd directly — gradients are local, so saturated
regions produce smaller magnitudes (the prose helper reports the
base activation alongside the ranking).
Cross-variant comparison. compare_variants(module_a,
module_b, *, input_grid, input_value_grid=None,
correspondence=None, tolerance=0.1, max_disagreement_samples=50)
runs both modules across a Cartesian-product grid of input points,
pairs transitions by label (with explicit correspondence
override), and reports the hard-agreement rate (same firing
decision, both on the same side of 0.5), the soft-agreement rate
(activations within tolerance), per-transition agreement
rates, and up to max_disagreement_samples divergent grid
points as DisagreementSample entries. Unmatched labels are
recorded separately so callers can spot structural mismatches
before reading the agreement numbers. Useful as the
data-supported counterpart to are_bisimilar /
are_weakly_bisimilar — those check behavioural equivalence;
this one shows where in the input space two trained variants
actually agree.
Prose helpers. prose_for_xor_rule,
prose_for_and_join_rule, prose_for_counterfactual,
prose_for_sensitivity, and prose_for_comparison_report
accept either the bare object or the CI variant where applicable.
With a CI the prose includes the bracket numbers and agreement
percentage; with a bare object it doesn't. An optional
input_label (or input_labels mapping, for the sensitivity
helper) substitutes domain terms for raw place ids — useful for
regulator-facing output. The comparison helper takes
top_disagreement to cap the number of worst-offender
transitions it lists explicitly.
3.8 soundness.py — Aalst soundness + deadlock localisation¶
check_soundness(net, *, final_marking=None) returns a
SoundnessReport pinning three classical soundness conditions:
| Condition | Field on report | Failing case |
|---|---|---|
| Option to complete | incomplete_markings |
reachable markings from which the intended final marking can't be reached |
| Proper completion | lingering_token_markings |
sink reached its completion count but tokens still hang around at other places |
| No dead transitions | dead_transitions |
transitions never enabled in any reachable marking |
report.is_sound is the boolean (all three lists empty);
report.summary() is a one-line digest suitable for log lines and
assertion messages. The default final_marking is "one token at
each sink place" — places with no outgoing arcs. Pass an explicit
final_marking when the sink is ambiguous (cyclic nets without a
clear terminus) or when completion has multiple sink tokens.
find_deadlocks(net) returns the non-final markings with no
enabled successors — the specific token configurations from which
the net can't progress. Overlaps with the option-to-complete
failure that check_soundness reports, but isolates root-cause
states (a state that can only reach a deadlock also fails
option-to-complete; the deadlock itself is the actionable root).
Both checks are structural — they analyse the Petri net's behaviour without running training. Use them before training to catch modeller bugs, after a refactoring to confirm the new variant is still sound, or in CI to enforce soundness on every scenario.
3.9 coverability.py — Karp-Miller coverability analysis¶
The soundness checker above assumes the net is bounded — its reachability graph terminates. Coverability answers the prior question: is the net bounded at all, and if not, which places are the ones blowing up?
coverability_graph(net) runs the classical Karp-Miller
construction and returns a CoverabilityReport:
| Field | What it carries |
|---|---|
is_bounded |
True iff no place in any coverability node carries ω |
unbounded_places |
sorted list of place names with ω somewhere in the tree |
place_bounds |
per-place upper bound — integer for bounded places, OMEGA for unbounded ones |
omega_markings |
sorted list of nodes carrying ω; each one witnesses unboundedness |
report.summary() is the one-liner ("bounded" or
"unbounded at N place(s): p1, p2, ...").
is_bounded(net) is the convenience boolean — equivalent to
coverability_graph(net).is_bounded, useful as a precondition
guard before running the other verification checks that assume
finite reachable state space.
The analyser is exact on nets without inhibitor arcs and
conservative on nets that use them (may overreport
unboundedness). This is a mathematical limit, not a tool
limitation: inhibitor-arc Petri nets are Turing-complete
(Minsky 1967), so exact coverability would solve the halting
problem (Turing 1936). PETRA opts for conservative reporting
rather than refusing to answer; see
BUSINESS_ANALYST_GUIDE.md §17 for the long-form treatment,
three industry examples of where unboundedness shows up
(banking reconciliation queues, NHS specialty waiting lists,
manufacturing WIP at bottlenecks), and the three practical
workarounds.
3.10 ctl.py — Computation Tree Logic model checking¶
from petri_net_nn import (
Atom, Not, And, Or, EX, EU, EG,
AX, AG, AF, EF, AU,
place_has_token, place_empty, place_count_eq, place_count_ge,
transition_enabled,
conj, disj, implies,
check_ctl, satisfies, CTLResult,
)
Build a CTL formula by composing the AST classes, then ask the checker whether the net's initial marking satisfies it:
prop = AG(implies(
place_has_token("request"),
AF(place_has_token("response")),
))
result = check_ctl(net, prop)
result.holds_at_initial # bool
result.holds_at # frozenset[Marking] of satisfying states
result.counterexample # a marking that violates the formula, or None
The AST primitives are Atom, Not, And, Or, EX, EU,
EG; the derived constructors AX, AG, AF, EF, AU
expand to combinations of the primitives via the standard CTL
equivalences (AG φ ≡ ¬EF ¬φ, etc.). Atomic propositions are
predicates over the current marking, expressed either via the
helpers (place_has_token, place_empty, place_count_eq,
place_count_ge, transition_enabled) or built directly with
Atom(callable, label) where the callable takes a dict[str, int]
marking and returns a bool.
Implementation note: implicit self-loops are added at every deadlock state before the fixed-point computation — the standard Kripke convention so that AG / AF / EG behave intuitively on terminating workflow nets (without the self-loops, EG would be vacuously false everywhere on a terminating net, and AX would be vacuously true at the sink).
3.11 bisimulation.py — formal equivalence checking¶
from petri_net_nn import (
are_bisimilar,
are_weakly_bisimilar,
bisimulation_equivalence_classes,
weak_bisimulation_equivalence_classes,
reachability_graph,
)
Strong bisimulation on the labelled transition system via partition
refinement. are_bisimilar(net1, net2) returns True iff the two
nets exhibit identical labelled future behaviour from their initial
markings.
Weak bisimulation collapses transitions flagged silent=True
on the net — internal logging hooks, no-op routing gates, internal
handoffs — before comparison. are_weakly_bisimilar(net1, net2)
treats every silent transition as a τ-edge, saturates the
combined LTS so τ-paths become direct edges and visible-action
edges gain τ-prefix / τ-suffix detours, then runs strong
bisimulation on the saturated LTS. The net effect: refactorings
that add or remove internal-only structural artefacts no longer
break the equivalence claim, which is the case the cost-ranked
refactoring story relies on.
3.12 subnets.py — hand-built reference subnets¶
SequentialSubnet, XORSubnet, AndSplitSubnet, AndJoinSubnet,
SagaSubnet — five nn.Module subclasses corresponding to the
canonical workflow-net building blocks. The general
PetriNetModule subsumes them; the hand-built versions stay as
readable references and regression coverage.
3.13 streaming.py — real-time anomaly scoring¶
from petri_net_nn import (
StreamingEvent, StreamingEvaluation, StreamingEvaluator,
)
evaluator = StreamingEvaluator(
trained_module,
attribute_to_marking=ctx.attribute_to_marking,
score_on_every_event=False, # default: only score on close
)
# Push events as they arrive on the stream:
evaluator.on_event(StreamingEvent(
case_id="loan-4827", name="ApplicationSubmitted",
attributes={"amount": "5000"},
))
evaluator.on_event(StreamingEvent(
case_id="loan-4827", name="CreditCheck",
attributes={},
))
# When the case is known to be finished:
result = evaluator.close_case("loan-4827")
# result.trace_score, result.per_transition_residuals,
# result.n_events, result.closed=True.
The default policy is on close — on_event returns None,
state accumulates per case, close_case scores once and frees
the state. Flip score_on_every_event=True to get an
evaluation back from every on_event call (live-dashboard
mode, at higher per-event cost). The pull helper
process_stream(events_iter) yields evaluations as they come,
useful for testing and batch backfill of a stored log.
Scoring delegates to the offline anomaly_score against a
:class:XESTrace assembled from the per-case events and the
merged attribute dict — so streaming and batch share the same
correctness story. Per-case attributes use latest-event-wins
on key collisions, which matches the "current state" semantics
most real event sources have.
Single-threaded by design. A multi-threaded source should serialise calls through a single consumer goroutine / queue.
3.14 discovery.py — Inductive Miner for log → Petri-net discovery¶
from petri_net_nn import discover_inductive, discover_and_train
# Just discovery — log in, sound Petri net out.
net = discover_inductive(xes_traces)
# End-to-end pipeline — discover, soundness-check, compile,
# train, in one call.
net, module, losses = discover_and_train(
xes_traces,
attribute_to_marking=ctx.attribute_to_marking,
steps=1000,
lr=0.1,
)
Basic Inductive Miner (Leemans, Fahland, van der Aalst, 2013). Builds the directly-follows graph of the log, recursively tries four cut shapes in priority order — exclusive choice, sequence (level-based antichain decomposition of the SCC DAG), parallel, loop (minimal body = start∪end activities, remaining activities partitioned into redo components) — and emits a process tree that the translator turns into a Petri net.
Output is sound by construction: every mined net passes
check_soundness. The test suite verifies this on each
canonical cut shape plus a replay-invariant BFS that confirms
every input trace replays on the mined net modulo τ collapse.
Limitations carried forward (basic IM, not IMf / IMi):
- Noisy logs may collapse into the flower-model fallback.
- Some loop topologies (e.g. body-only loops like
(a,b,a,b)) fall through to the flower; the τ-cut handling that would fix them is a Phase 12 follow-up. - Timestamps and event attributes are deliberately ignored
during discovery — only activity sequences. The training
step that follows reads attributes via the standard
train_on_tracespath.
3.15 cli.py — command-line entry points for engine integration¶
pip install petra-nn drops three commands on the user's PATH
(plus a petra umbrella that exposes them as subcommands):
| Command | What it does |
|---|---|
petra-train scenario.toml -o model.pt |
Load a TOML scenario, train the compiled module, save a bundle (.pt + .meta.json sidecar). |
petra-score model.pt --traces log.csv |
Load a saved bundle, score traces from XES/CSV/JSON, emit per-trace anomaly scores as JSON. |
petra-serve model.pt --port 8000 |
Load a saved bundle and run the FastAPI REST app under uvicorn on a port. |
These exist for engine integration — a workflow team that wants to plug PETRA into Camunda / Activiti / Flowable can do the entire round trip (train → serve / score) without writing any Python beyond the scenario TOML.
What this enables. A workflow team installs petra-nn,
exports a representative slice of their engine's audit log to
CSV, runs petra-train once, gets a deployable bundle, and
then chooses one of three integration patterns:
- call
petra-serveand have the engine POST to PETRA's REST endpoint per event (synchronous); - feed the engine's Kafka / RabbitMQ topic into PETRA's
StreamingEvaluator(live, decoupled); - run
petra-scoreon a cron schedule against an exported history CSV (batch).
What this deliberately is NOT. No JVM-side plugins. There
is no Java code in this repository, no Camunda execution
listener, no Activiti command interceptor, no Flowable process
delegate. The boundary between PETRA and an engine is JSON
(for the three patterns above) or ONNX (for in-process
JVM inference using a petra-train + export_onnx bundle
loaded by ONNX Runtime for Java). Engine-side extension code
is the engine team's responsibility — see
docs/INTEGRATION_PATTERNS.md for the per-pattern wiring
recipes covering all three named engines.
Bundle format¶
A trained-model bundle is two files written side-by-side by
petra-train:
model.pt— pickled :class:PetriNetModule. Use :func:torch.load(path, weights_only=False)to reconstruct. Don't load bundles from untrusted sources — pickle deserialises arbitrary Python objects.model.pt.meta.json— JSON sidecar with the scenario name, description,input_marking_spec,input_values_spec, the PETRA version that produced the bundle, and a UTC timestamp. Read bypetra-scoreto map trace attributes onto place markings without needing the originalscenario.tomlat score time.
3.16 rest.py — HTTP wrapper around a trained module¶
from petri_net_nn import load_scenario, build_app
import uvicorn
ctx = load_scenario("my_scenario.toml")
module, _ = ctx.train()
app = build_app(module, title="My PETRA model", version="1.0.0")
uvicorn.run(app, host="0.0.0.0", port=8000)
build_app(module, *, title, version, description=None) returns
a FastAPI application exposing six JSON endpoints:
| Endpoint | Purpose |
|---|---|
GET /healthz |
Liveness probe + module stats (n_places, n_transitions, version). |
GET /schema |
Static structure: places, transitions, labels, initial marking, flags for silent transitions and structural guards. |
POST /forward |
One forward pass — input_marking + optional input_values → per-transition + per-place activations. |
POST /anomaly |
Score a trace — events + input_marking → trace_score + per-transition residuals. |
POST /counterfactual |
Delegates to find_counterfactual; works on both the marking and value channels. |
POST /sensitivity |
Delegates to transition_sensitivity; per-input gradients at one base point. |
The Swagger UI is auto-generated at /docs and /redoc;
/openapi.json carries the OpenAPI 3.x schema. Pydantic gates
input validation — bad place / transition names return 400 with
useful messages rather than 500-ing inside the model.
The [rest] optional install extra brings fastapi and
uvicorn; [dev] brings httpx for the FastAPI TestClient.
Importing petri_net_nn does not require either — the
fastapi / pydantic imports are guarded at module load and
build_app raises an ImportError when called without the
deps. Single-model-per-app by design; auth / CORS / rate
limiting are left to the caller's app layer.
3.17 onnx_export.py — export to the ONNX interchange format¶
from petri_net_nn import export_onnx
schema = export_onnx(
trained_module,
"model.onnx",
input_places=["p_application"],
input_value_places=["p_application"], # CPN value channel, optional
output_transitions=["t_approve", "t_decline"],
)
# `schema` is a JSON-serialisable dict mapping the ONNX-graph
# input / output names back to PETRA's place / transition ids.
# Ship it as a sidecar (e.g. model.onnx.json) so consumers in
# other languages know which positional tensor maps to which
# place.
The exported .onnx file runs unchanged in any ONNX runtime —
onnxruntime in Python / C++ / Java / browser, plus the
accelerator stacks (TensorRT, OpenVINO, Core ML via converters).
Batch dimension is dynamic by default; opset 17 is the modern
baseline.
The dict-of-tensors interface PETRA uses (input_marking={
place_id: tensor}) gets bridged by a small positional-args
wrapper before tracing. The schema dict the function returns
carries the canonical order: position i in the exported
graph's positional inputs maps to schema["input_marking_places"][i]
(or input_value_places[i - n_marking] for the value channel).
The optional [onnx] install extra pulls in onnxruntime for
the parity-test surface; the export itself only needs torch.
Limitation: time-unrolled mode with non-uniform transition
durations uses Python lists for the in-flight queue which don't
survive ONNX tracing.
4. Examples folder¶
examples/
<scenario_name>/
scenario.toml # the config
[data files] # optional BPMN or XES
README.md # what + why + advantage over alternatives
Each scenario has a paired test in tests/scenarios/test_<name>.py
that loads the config via load_scenario and asserts the pipeline
produces the expected behaviour. pytest tests/scenarios/ runs all
scenario tests.
Current scenarios:
| Scenario | Framing claim |
|---|---|
biological_signalling/ |
Non-BPMN substrate covers signalling pathways. |
distributed_consensus/ |
Cross-pool composition covers distributed protocols (2PC). |
manufacturing_cell/ |
Sequential + XOR primitives cover production lines. |
network_protocol/ |
Substrate covers protocol state machines; attack-pattern anomaly detection. |
scientific_workflow/ |
Substrate covers lab protocols; deviation analysis. |
multi_agent_coordination/ |
Three-pool composition covers contract-net coordination. |
batch_packaging/ |
Phase 9 multi-token markings: arc weight 6 batches bottles into crates. |
resource_lock/ |
Phase 9 inhibitor arcs: two clients race for a single shared resource. |
paint_shop/ |
Phase 9 transition durations: a 3-step cure transition delays output. |
priority_dispatch/ |
Phase 9 stochastic firing rates: three handlers with rate priors. |
credit_approval_coloured/ |
Phase 9 coloured tokens plus the CPN-aware compiler: routing on the application amount carried by the token, with the guard threshold learned from trace data. |
incident_management/ |
Phase 10 — trains on the real BPI Challenge 2013 incidents log (7,554 Volvo IT tickets), the actual public dataset. |
mapk_pathway/ |
Phase 10 — loads a Pathway Commons-style SIF of the MAPK1/3 (ERK1/2) signalling cascade and runs a forward pass through the EGF → MAPK → transcription-factor flow. |
cost_ranked_refactoring/ |
Provably-safe refactoring via Phase 2 + expected_cost. |
discover_and_train_pipeline/ |
Phase 12 native log-to-net discovery via the basic Inductive Miner plus the adapter's net.source = "discover" keyword — full discover → verify → compile → train pipeline from a TOML config with no Petri net supplied. |
safe_refactoring/ |
Phase 11 weak bisimulation + Phase 13 cross-variant comparison + bootstrap CIs — two structurally-different loan-approval variants (with vs without a silent audit-log step) proved equivalent under weak bisim, soft-routing compared across the credit-score domain, distilled rules' CIs overlapping. |
compliance_audit/ |
Phase 11 Aalst soundness + deadlock localisation + CTL — regulatory invariants on a loan-approval process verified mechanically; deliberately-broken variant that bypasses the audit step fails the CTL invariant with a counterexample marking. |
regulator_ready_credit_approval/ |
Phase 13 full diagnostic surface — bootstrap CIs, counterfactuals, sensitivity, prose — applied to a coloured-token loan-approval net to produce the model-explanation artefacts a modern decisioning regulator expects (GDPR Article 22, SR 11-7, EU AI Act). |
realtime_monitoring/ |
Phase 14 streaming evaluator running on a simulated incident-management event stream; anomalous cases scored live as events arrive, with the actionable per-transition residual pinned to the skipped step. |
onnx_deployment/ |
Phase 14 ONNX export — train a loan-approval XOR-routing model from inline traces, export to .onnx, parity-check against the torch forward pass under onnxruntime. The deployment bridge to JVM / C++ / browser / mobile / accelerator stacks. |
rest_serving/ |
Phase 14 REST inference API + CLI — train a small loan-approval model, build a FastAPI app via build_app, exercise every endpoint (/healthz, /schema, /forward, /anomaly, /counterfactual, /sensitivity, /openapi.json) through TestClient. |
5. Adding a new scenario¶
mkdir examples/<name>and createscenario.toml.- Decide on net source — inline TOML (small, self-contained) or
bpmn_filepointing at a.bpmnnext to the config. - Provide traces — inline if small,
xes_fileif downloaded from a real-data repo (e.g. BPI Challenge logs). - Set up
[training.input_marking]to map the attribute(s) that drive routing decisions onto the relevant place(s). - Add
README.mdcovering: - What this scenario shows (which framing claims it validates)
- Advantage over alternatives (what existing tools cannot do that the framework can)
- Real-world source (RFC / paper / dataset)
- Files and how to run
- Write
tests/scenarios/test_<name>.pydriving the scenario throughload_scenarioand asserting at minimum: loads, training converges, anomaly detection separates normal vs anomalous traces. - Update this manual's scenario table and ROADMAP.md if the scenario reveals a framework gap.
If you hit a framework shortcoming, fix it in the framework first rather than working around it in the scenario. Phase 2-8 of the ROADMAP exists precisely so the scenarios can be config-only.
6. Extension points¶
Places to plug new behaviour into the framework:
- New BPMN constructs →
bpmn.py(one branch per construct in the main parsing loop). Track gaps in the Phase 4 carry-forward section of ROADMAP.md. - New firing modes →
compiler.py, add a function with the signature(pre_activation) -> activationand a string key in theFiringModeLiteral. - New routing modes →
compiler.py, extend theRoutingModeLiteral and add the dispatch in_forward_acyclic/_forward_unrolled. - New anomaly generators →
anomalies.py. - New rule shapes →
interpretability.py, follow the pattern ofXORRule/AndJoinRule. - New data formats →
adapter.py, add asource = "..."handler in_load_netor_load_traces.
Tests live alongside each module: tests/test_<module>.py for
unit-level coverage, tests/scenarios/test_<scenario>.py for
end-to-end demos.
7. Where this work sits in the literature¶
PETRA draws on and combines four research threads:
- Workflow nets — van der Aalst's foundational work on sound workflow nets as a Petri-net subclass. PETRA depends on workflow- net soundness for the reachability / liveness / boundedness properties that make training meaningful.
- Graph neural networks for process mining — work applying GNNs to process graphs for conformance checking and anomaly detection (Tax et al., Bukhsh et al.). PETRA is more structurally constrained than a general GNN: the architecture IS the verified workflow net, not learned from process data.
- Neuro-symbolic AI — Scallop, DeepProbLog and similar systems that combine neural and symbolic reasoning. PETRA is a specific instance where the symbolic substrate is a compiler-verified workflow net rather than a logic program.
- Spiking neural networks — networks that model discrete spike propagation rather than continuous activations. The token-firing model is analogous to spike propagation, and SNN training methods (STDP, surrogate-gradient methods) are applicable to PETRA's discrete-firing limit and inform the Phase 6 STE work.
PETRA's novel contribution is the specific combination: a verified workflow net used as a fixed neural-network architecture, where the soundness properties propagate into the network's representational constraints and the learned weights stay interpretable at the granularity of named domain elements (BPMN tasks, pathway components, protocol states, …).
8. Running everything¶
python -m pytest # full suite
python -m pytest tests/scenarios/ # only end-to-end scenarios
python -m pytest tests/test_compiler.py # only the compiler
Current test count: 492 passing across the framework and the end-to-end scenarios.