`petri_net_nn.compiler`

Compile a PetriNet into a differentiable nn.Module.

Implements §4 of the architecture spec. The compiled module instantiates the continuous relaxation from §4.2 directly over the net's flow relation:

activation(t) = sigmoid( sharpness * (sum_p w(p,t)*a(p) - theta(t)) )
a(p)          = sum_{t: (t,p) in F} activation(t) * w(t,p)
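As a rough illustration, the two equations above can be evaluated by hand on a tiny net. This is a minimal pure-Python sketch, not the module's code: the toy net (`p1`, `p2`, `t1`, `p3`), the weight and threshold values, and the helper names are all assumptions made for the example, and plain floats stand in for tensors.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy net: places p1 and p2 feed transition t1, which feeds place p3.
flow = [("p1", "t1"), ("p2", "t1"), ("t1", "p3")]
w = {arc: 1.0 for arc in flow}   # one scalar weight per arc in F (illustrative values)
theta = {"t1": 0.5}              # one threshold per transition in T (illustrative value)
sharpness = 1.0

def transition_activation(t, a):
    """activation(t) = sigmoid(sharpness * (sum_p w(p,t)*a(p) - theta(t)))"""
    total = sum(w[(p, tgt)] * a[p] for (p, tgt) in flow if tgt == t)
    return sigmoid(sharpness * (total - theta[t]))

def place_activation(p, act):
    """a(p) = sum over arcs (t, p) in F of activation(t) * w(t,p)"""
    return sum(act[t] * w[(t, tgt)] for (t, tgt) in flow if tgt == p)

a = {"p1": 1.0, "p2": 1.0}                     # source-place activations
act = {"t1": transition_activation("t1", a)}   # sigmoid(2.0 - 0.5)
a["p3"] = place_activation("p3", act)
print(round(a["p3"], 4))  # 0.8176
```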

The structural constraint from §4.3 ("weights outside this structure are zero by construction and cannot be learned away from zero") holds because the module allocates exactly one learnable scalar per arc in F and one threshold per transition in T. There is no global weight matrix with a mask: a weight for an arc outside F is never allocated, so the optimiser has nothing to move away from zero.

Two forward-pass modes:

- num_steps == 0 (default): acyclic mode. The constructor topologically sorts (P ∪ T, F) and forward does a single propagation pass in that order. The §4.2 equations are evaluated exactly once per node. Rejects cyclic nets at construction.
- num_steps > 0: time-unrolled mode. The constructor skips the topological sort so cyclic nets are accepted. Forward initialises place activations from the input marking / M_0 and then performs num_steps synchronous updates (each step: refresh every transition's activation from current place activations, then refresh every non-source place's activation from the new transition activations). Source places, those with empty preset, clamp to their input value at every step, so they behave as a persistent input layer.
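The time-unrolled update described above might be sketched as follows. This is a simplified pure-Python illustration under stated assumptions, not the module's implementation: the cyclic toy net, the zero thresholds, the unit weights, and the choice to start unmarked places at 0.0 are all hypothetical, and only source places are clamped here.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical cyclic net: p_in -> t1 -> p_a, plus the loop p_a -> t2 -> p_a.
places = ["p_in", "p_a"]
transitions = ["t1", "t2"]
flow = [("p_in", "t1"), ("t1", "p_a"), ("p_a", "t2"), ("t2", "p_a")]
w = {arc: 1.0 for arc in flow}          # illustrative arc weights
theta = {t: 0.0 for t in transitions}   # illustrative thresholds

def preset(node):
    """All nodes with an arc into `node`."""
    return [src for (src, tgt) in flow if tgt == node]

sources = [p for p in places if not preset(p)]  # places with empty preset

def unroll(input_marking, num_steps, sharpness=1.0):
    # Assumption: places missing from the input marking start at 0.0.
    a = {p: input_marking.get(p, 0.0) for p in places}
    for _ in range(num_steps):
        # Step 1: refresh every transition from the current place activations.
        act = {
            t: sigmoid(sharpness * (sum(w[(p, t)] * a[p] for p in preset(t)) - theta[t]))
            for t in transitions
        }
        # Step 2: refresh non-source places; clamp source places to their input.
        a = {
            p: input_marking.get(p, 0.0) if p in sources
            else sum(act[t] * w[(t, p)] for t in preset(p))
            for p in places
        }
    return a

result = unroll({"p_in": 1.0}, num_steps=3)
```

Because both updates within a step read only the previous state, the order in which nodes are visited does not matter, which is why no topological sort is needed in this mode.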

Coloured-Petri-net layer

When a transition has a structural guard (declarative {place, op, value} form), the compiler builds a learnable soft guard alongside the standard firing equation: an nn.Parameter threshold initialised at value, and a sigmoid gate that multiplies the transition's firing strength by sigmoid(s * sign * (value(place) - threshold)), where s is the guard's scaled sharpness and sign is +1 for >/>= and −1 for </<=. The threshold trains end-to-end with the rest of the network, so the model can refine the declared boundary from execution traces. Guards declared as opaque callables are invisible to the compiler: the token-game still evaluates them, but they take no part in training.
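The gate formula above can be tried out in isolation. The following is a minimal sketch with plain floats in place of tensors; `soft_guard` and `SIGN` are hypothetical names introduced for the example, and the firing strength of 0.9 is an arbitrary value.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# sign = +1 for > and >=, -1 for < and <=, as in the text above.
SIGN = {">": 1.0, ">=": 1.0, "<": -1.0, "<=": -1.0}

def soft_guard(value: float, threshold: float, op: str, s: float = 1.0) -> float:
    """Soft gate sigmoid(s * sign * (value - threshold))."""
    return sigmoid(s * SIGN[op] * (value - threshold))

firing_strength = 0.9  # illustrative output of the standard firing equation
gated = firing_strength * soft_guard(value=3.0, threshold=2.0, op=">")

at_boundary = soft_guard(2.0, 2.0, ">=")   # exactly 0.5 at value == threshold
above_gt = soft_guard(3.0, 2.0, ">")       # above 0.5: the guard mostly passes
above_lt = soft_guard(3.0, 2.0, "<")       # below 0.5: the guard mostly blocks
```

Note that the soft gate cannot distinguish strict from non-strict comparisons: > and >= produce the same value, and differ only in the discrete token-game.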

To make value-conditioned routing trainable, the forward pass carries a per-place value channel alongside the activation channel. Source-place values come from the optional input_values argument (default 1.0). Non-source places combine the values arriving on their incoming arcs into an activation-weighted average, the natural soft-token analogue of "what value would this place hold right now if a token were here." Output-arc values may be declared on the net (arc_output_values). Constant scalars are honoured by the compiler; callable transforms are evaluated only in the discrete coloured token-game and are treated as the default value 1.0 in the differentiable forward pass.
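One way to read the activation-weighted average is sketched below. It assumes the weights are the activations of the incoming transitions alone and that a place with no incoming activation falls back to the default value 1.0; both choices, and the `place_value` helper itself, are assumptions for illustration rather than the module's exact rule.

```python
def place_value(incoming, default=1.0, eps=1e-8):
    """Combine (activation, value) pairs, one per incoming arc, into one value.

    Each value is weighted by the activation of the transition that delivers it.
    Assumption: if nothing is active, return the default value.
    """
    total = sum(act for act, _ in incoming)
    if total < eps:
        return default
    return sum(act * v for act, v in incoming) / total

# A strongly active transition carrying 100.0 and a weakly active one carrying 200.0:
mixed = place_value([(0.9, 100.0), (0.1, 200.0)])   # close to 110.0
idle = place_value([(0.0, 100.0), (0.0, 200.0)])    # falls back to 1.0
```

The result stays between the smallest and largest incoming value, and moves towards whichever transition fires most strongly.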

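The guard sigmoid scales sharpness by 1 / max(|theta_init|, 1.0), where theta_init is the guard's declared value (the initial threshold), so the initial gradient at the boundary is O(1) regardless of the units the modeller used. The scale is computed once at construction and stays fixed as the threshold trains. The same SharpnessScheduler from Phase 6 sharpens guards alongside firing transitions during training.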
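To see what the scaling buys, the sketch below compares a guard on a large-unit quantity with and without the 1 / max(|theta_init|, 1.0) factor. It uses plain floats, and the threshold of 50 000 and the value of 60 000 are made-up numbers; `gate_and_slope` is a hypothetical helper that returns the gate together with the sigmoid's local slope g * (1 - g), the factor every gradient through the gate is multiplied by.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def guard_scale(theta_init: float) -> float:
    """Per-guard scale from the text: 1 / max(|theta_init|, 1.0)."""
    return 1.0 / max(abs(theta_init), 1.0)

def gate_and_slope(value: float, threshold: float, k: float):
    """Return the gate g = sigmoid(k * (value - threshold)) and its slope g * (1 - g)."""
    g = sigmoid(k * (value - threshold))
    return g, g * (1.0 - g)

theta_init = 50_000.0   # illustrative threshold, e.g. a loan amount
value = 60_000.0        # illustrative token value, 20% above the threshold

# Unscaled: the sigmoid argument is 10 000, so the gate saturates and the slope vanishes.
raw_gate, raw_slope = gate_and_slope(value, theta_init, k=1.0)

# Scaled: the sigmoid argument is about 0.2, so the gate stays in its responsive range.
scaled_gate, scaled_slope = gate_and_slope(value, theta_init, k=guard_scale(theta_init))
```

For thresholds with magnitude below 1.0 the scale is exactly 1.0, so guards on quantities in [0, 1] are left unchanged.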

PetriNetModule

PetriNetModule(net, *, sharpness=1.0, num_steps=0, firing='sigmoid', routing='independent')

Bases: Module

Differentiable neural network whose topology is exactly a PetriNet.

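Parameters

net : A well-formed PetriNet. A net that fails validation raises ValueError at construction time; cycles are rejected only when num_steps == 0.

sharpness : Multiplier inside the sigmoid. §4.2 has no such factor; it is a training aid for AND-join-shaped transitions where a near-step activation is needed (see §5 Subnet 4). The default 1.0 keeps the forward pass faithful to §4.2 verbatim.

num_steps : 0 (default) selects acyclic single-pass mode; any positive integer selects time-unrolled mode and accepts cyclic nets. A negative value raises ValueError.

firing : "sigmoid" (default) or "ste". Selects the firing function used for transition activations; any other value raises ValueError.

routing : "independent" (default) or "softmax". With "softmax", each transition in a group returned by _xor_groups(net) is registered under that group; any other value raises ValueError.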

Source code in petri_net_nn/compiler.py

def __init__(
    self,
    net: PetriNet,
    *,
    sharpness: float = 1.0,
    num_steps: int = 0,
    firing: FiringMode = "sigmoid",
    routing: RoutingMode = "independent",
) -> None:
    super().__init__()
    issues = net.validate()
    if issues:
        raise ValueError(f"net is not well-formed: {issues}")
    if num_steps < 0:
        raise ValueError(f"num_steps must be non-negative, got {num_steps}")
    if firing not in ("sigmoid", "ste"):
        raise ValueError(
            f"firing must be 'sigmoid' or 'ste', got {firing!r}"
        )
    if routing not in ("independent", "softmax"):
        raise ValueError(
            f"routing must be 'independent' or 'softmax', got {routing!r}"
        )

    self.net = net
    self.sharpness = sharpness
    self.num_steps = num_steps
    self.firing = firing
    self.routing = routing
    self._fire_fn = _fire_ste if firing == "ste" else _fire_sigmoid

    self._softmax_groups: dict[str, list[str]] = {}
    if routing == "softmax":
        for _, group in _xor_groups(net):
            for t in group:
                self._softmax_groups[t] = group

    if num_steps == 0:
        self._order: tuple[str, ...] | None = self._toposort(net)
    else:
        self._order = None

    self.arc_weights = nn.ParameterDict()
    self._arc_key: dict[tuple[str, str], str] = {}
    for i, edge in enumerate(sorted(net.flow)):
        key = f"arc_{i}"
        self._arc_key[edge] = key
        self.arc_weights[key] = nn.Parameter(
            torch.normal(mean=1.0, std=0.1, size=())
        )

    self.transition_thresholds = nn.ParameterDict()
    self._threshold_key: dict[str, str] = {}
    for i, t in enumerate(sorted(net.transitions)):
        key = f"theta_{i}"
        self._threshold_key[t] = key
        n_inputs = len(net.preset(t))
        theta_init = max(0.0, (n_inputs - 1) * 0.5)
        self.transition_thresholds[key] = nn.Parameter(torch.tensor(theta_init))

    # Learnable soft-guard thresholds — one per transition with a
    # structural guard. The TOML value seeds the parameter; training
    # refines it. The per-guard sharpness scale (kept fixed at
    # construction) keeps the sigmoid gradient at O(1) at the
    # boundary regardless of the value units the modeller used.
    self.guard_thresholds = nn.ParameterDict()
    self._guard_meta: dict[str, dict] = {}
    for i, t in enumerate(sorted(net.transitions)):
        spec = net.transition_structural_guards.get(t)
        if spec is None:
            continue
        op = spec["op"]
        if op not in (">", ">=", "<", "<="):
            raise ValueError(
                f"transition {t!r}: compiler only supports structural "
                f"guard ops in {{>, >=, <, <=}}; got {op!r}. "
                f"Equality / inequality guards must be expressed "
                f"as opaque callables (token-game only)."
            )
        place = spec["place"]
        if place not in net.places:
            raise ValueError(
                f"transition {t!r}: structural guard references "
                f"unknown place {place!r}"
            )
        init = float(spec["value"])
        key = f"guard_theta_{i}"
        self.guard_thresholds[key] = nn.Parameter(torch.tensor(init))
        self._guard_meta[t] = {
            "place": place,
            "op": op,
            "key": key,
            # Auto-scale the sigmoid steepness so that learning is
            # well-conditioned for guard thresholds in any unit
            # (loan amounts in £, signal strengths in [0,1], etc).
            "scale": 1.0 / max(abs(init), 1.0),
        }

forward

forward(input_marking=None, *, input_values=None, batch_size=None)

Run a forward pass.

In acyclic mode (num_steps == 0), produces a single topological propagation. In time-unrolled mode, returns the activations after self.num_steps synchronous updates.
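The acyclic mode depends on an ordering computed by `_toposort`, whose body is not shown on this page. A minimal sketch of how such an ordering could be produced, and how a cycle could be detected, is given below using Kahn's algorithm. This is an assumption about the general technique, not the module's actual code; the function name, signature, and error message are hypothetical.

```python
from collections import deque

def toposort(nodes, flow):
    """Order nodes so every arc (src, tgt) has src before tgt; raise on cycles."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, tgt in flow:
        successors[src].append(tgt)
        indegree[tgt] += 1

    # Start from nodes with no incoming arcs (sorted for a deterministic order).
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in successors[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)

    # Nodes on a cycle never reach indegree 0, so they are missing from the order.
    if len(order) != len(nodes):
        raise ValueError("net contains a cycle; a cyclic net needs num_steps > 0")
    return tuple(order)

order = toposort(["p1", "p2", "t1", "p3"], [("p1", "t1"), ("p2", "t1"), ("t1", "p3")])
print(order)  # ('p1', 'p2', 't1', 'p3')
```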

input_marking overrides any place's activation. In time-unrolled mode the override is re-applied at every step, which is how you clamp a "persistent input" through the unrolled dynamics — equivalent to the §7.1 "predict next activations from a partial execution" use case.

input_values feeds the per-place value channel that the coloured-Petri-net layer reads. Each entry is a 1D tensor of shape (batch_size,) giving the scalar value carried by the token at that source place — loan amount, signal strength, sensor reading, whatever the modeller chose. Any source place absent from this dict defaults to value 1.0 (the value-carrying-no-information case, equivalent to a plain unannotated token).
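The shape and default rules for input_values can be illustrated with a small helper. This is a sketch assuming PyTorch is installed; `fill_input_values`, the place names, and the shape check are hypothetical and only mirror the behaviour described above, not the module's internals.

```python
import torch

def fill_input_values(source_places, input_values, batch_size):
    """Hypothetical helper: give every source place a (batch_size,) value tensor.

    Source places absent from `input_values` default to 1.0, the
    value-carrying-no-information case described above.
    """
    input_values = input_values or {}
    filled = {}
    for p in source_places:
        if p in input_values:
            v = input_values[p]
            if v.shape != (batch_size,):
                raise ValueError(
                    f"value for {p!r} must have shape ({batch_size},), got {tuple(v.shape)}"
                )
            filled[p] = v
        else:
            filled[p] = torch.ones(batch_size)
    return filled

# Three loan applications in one batch; only the "amount" place carries a value.
values = fill_input_values(
    ["amount", "request"],
    {"amount": torch.tensor([1200.0, 50000.0, 300.0])},
    batch_size=3,
)
```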