7. Graph, Rule, and Molecule Model

A central component of MØD is the graph::Graph/Graph which simplified represents simple graphs with labels on vertices and edges. However, how the labels are interepreted and how graph morphisms are defined depends which LabelSettings/LabelSettings object you pass to various algorithms. Another complication is that in the formal mathematics for describing graph transformation through a rule::Rule/Rule classes, we need need a slightly more complicated description of the graphs in order to make the mathematics work.

We therefore start with the simplified graph model with the labels being opaque types. Then we explore the different label settings which switches the mathematical category of the graphs. Afterwards the rule model is explained where we get into the mathematical details for label-encoding which is necessary to formally make the graph transformation composable. Lastly we see how MØD interprets graphs as molecule (fragments) in order to create depictions that resembles ordinary molecule depictions.

7.1. Simple Graphs with String Labels

In the simplified model, a graph \(G = (V, E, l\colon V\cup E\rightarrow \Omega)\) is undirected, and neither has loop edges nor parallel edges. The function \(l\) assigns a label to each vertex and edge. A morphism between two graphs \(m\colon G\rightarrow H\) is a graph morphism which commutes with the labelling functions. That is,

  • graph morphism: for every edge \((u, v) \in E_G\) we must have an corresponding edge \((m(u), m(v)) \in E_H\) (\(m\) is a graph morphism), and

  • vertex labels: each vertex \(v \in V_G\) is mapped to a vertex with the same label, \(l_G(v) = l_H(m(v))\).

  • edge labels: each edge \(e \in E_G\) is mapped to a vertex with the same label, \(l_G(e) = l_H(m(e))\).

The only requirement of \(\Omega\) is thus that the elements can be compared for equality. In MØD this correspons to using LabelType::String/LabelType.String in the a LabelSettings/LabelSettings object, and in the API of PyMØD, this is the default.

Note

Technically, in this basic model the stereo setting in LabelSettings/LabelSettings should be disabled as well. Though, the stereo model is in an experimental stage and we will therefore not explore it further.

A graph::Graph/Graph is further required to be connected.

7.2. Switching Category — First-Order Terms as Labels

We can imbue the label set \(Omega\) with extra structure by using LabelType::Term/LabelType.Term instead of LabelType::String/LabelType.String. The labels are now interpreted as firt-order terms, and formally the graph/objects are now in a different category.

In the graph/rule objects it is the same text strings being used either verbatim when in string mode or being parsed when in term mode.

The following describes how the strings are parsed in term mode.

  • A constant or function symbol is a word that can be matched by the regex [A-Za-z0-9=#:.+-][A-Za-z0-9=#:.+-_]*. This means that all strings that are considered “molecular” can be reinterpreted as constant symbols.

  • A variable symbol is a word that can be matched by the regex _[A-Za-z0-9=#:.+-][A-Za-z0-9=#:.+-_]*. That is, variable is like a constant/function symbol, but with a _ prepended. An unnamed variable can be specified by the special wildcard symbol *.

    The variable symbols matched by the regex _[HT][0-9][0-9]* are discouraged, as they are reserved for converting terms back into strings.

  • Function terms start with a function symbol followed by a parenthesis with a comma-separated list of terms. They may contain white-space which is ignored.

If parsing of terms fails an exception of type TermParsingError/TermParsingError is thrown. Thus, essentially any function that takes a LabelSettings/LabelSettings object may throw such an exception.

If a graph/rule was created in term mode and then is used in string mode, the first-order terms are serialized back into strings:

  • Constant/function symbols are serialized verbatim.

  • Variable names are generated and will be matchable by the regex _[HT][0-9][0-9]*. Any original variable names are not saved.

  • Function terms will be serialized with a single space character after each comma.

7.2.1. Morphisms

First-order terms are partially ordered, and we thus have additional choice in how to define morphisms. This choice is in the API selected through the LabelRelation/LabelRelation objects which offers the three choices “unification”, “specialisation”, and “isomorphism”. Given a graph morphism \(m\colon G\rightarrow H\) let \(V_G = \{v_1, v_2, \dots, v_n\}\) be the vertices of \(G\) and \(E_G = \{e_1, e_2, \dots, e_p\}\) be the edges of \(G\). Consider the following two terms:

\[\begin{split}t_G &= \texttt{assoc}(l_G(v_1), l_G(v_2), \dots, l_G(v_n), l_G(e_1), l_G(e_2), \dots, l_G(e_p)) \\ t_H &= \texttt{assoc}(l_H(m(v_1)), l_H(m(v_2)), \dots, l_H(m(v_n)), l_H(m(e_1)), l_H(m(e_2)), \dots, l_H(m(e_p)))\end{split}\]

They lay out all the labels of \(G\) in \(t_G\) and then the labels of the corresponding vertices and edges in \(H\) in \(t_H\).

The graph morphism \(m\) is in term mode a valid morphism if

In most cases you should use LabelRelation::Specialisation/LabelRelation.Specialisation though, when for example using graph::Graph::isomorphism()/Graph.isomorphism() to check if two graphs encode the same mathematical object, you should use LabelRelation::Isomorphism/LabelRelation.Isomorphism. The funtion name thus refers to the type of the graph morphism being found, while the label settings describes the additional requirements for the labels.

The third option, LabelRelation::Unification/LabelRelation.Unification, should be used very carefully. For example, when performing graph transofrmation, e.g., through a dg::DG/DG, it may give results that are unexpected. One use-case is when calling graph::Graph::isomorphism()/Graph.isomorphism(), where using LabelRelation::Unification/LabelRelation.Unification means that you will find graph isomorphisms for which the labels on both graphs can be specialized such that the graph objects becomes isomorphic.

Note

For the graph transformation is it necessary to define a distinguished set of morphisms \(\mathcal{M}\) for the term mode. These morphisms are those that are graph monomorphisms, but are label isomorphisms on the label part.

7.3. Rule Model

A rule::Rule/Rule represents a Double Pushout (DPO) graph transformation rule \(p = (L\xleftarrow{l}K\xrightarrow{r}R)\). Specifically, they are in the DPO variant with all morphisms being graph monomorphisms.

7.3.1. Application Constraints

A rule can have additional constriants that must be fulfilled before transformation can proceed. Specifically, a constraint \(c\) for a rule \(L\leftarrow K\rightarrow R\) is evaluated once a match morphism \(m\colon L\rightarrow G\) has been found. In the literature such application constraints are also called “application conditions”.

7.3.1.1. Adjacency Constraint

Parameters:

  • a vertex \(v\in V(L)\),

  • a count \(n\in \mathbb{N}_0\),

  • an operator \(op\in \{<, \leq, =, \geq, >\}\),

  • a set of vertex labels \(Q_V\in 2^\Omega\), and

  • a set of edge labels \(Q_E\in 2^\Omega\).

7.3.1.1.1. String Mode

Given a match morphism \(m\colon L\rightarrow G\) the constraint is satisfied if

\[|\{ e = (m(v), u)\in outEdges(m(v))\ |\ l_G(u)\in Q_V \wedge l_G(e)\in Q_E\}| \ op\ n\]

That is, the vertex \(v\) is mapped to \(G\) and its incident edges are counted. Only those edges with a label in \(Q_E\) and the other endpoint with a label in \(Q_V\) are counted. The resulting count is then compared to the given number \(n\).

7.3.1.1.2. Term Mode

When using LabelType::Term/LabelType.Term the label comparison is changed to be a most general unifier computation:

\[|\{ e = (m(v), u)\in outEdges(m(v))\ |\ \exists q_V\in Q_V \exists q_E\in Q_E : hasMGU(l_G(u) \overset{?}{=} q_V, l_G(e) \overset{?}{=} q_E\}| \ op\ n\]

That is, each edge has an independent most general unifier computation where there must exist vertex and edge labels from the given sets such that they unify with the labels of the candidate edge and its other endpoint.

7.3.1.1.3. Specification in GML

The constraint is specified in GML as described in adjacency, but where omitting the nodeLabels (resp. edgeLabels) list represents setting \(Q_V = \Omega`\) (resp. \(Q_E = \Omega\)), effectively letting the counting be unrestricted with respect to those labels.

7.3.1.1.4. Examples
  • A vertex must have at most 3 neighbours: \(n = 3, Q_V = Q_E = \Omega\) and use \(\leq\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML: constraintAdj [ id 42 op "<=" count 3 ].

  • A vertex must have at least two neighbours with label H: \(n = 2, Q_V = \{\texttt{H}\}, Q_E = \Omega\) and use \(\geq\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML: constraintAdj [ id 42 nodeLabels [ label "H" ] op ">=" count 2 ].

  • A vertex must have exactly 1 neighbour with label either O or S where the connecting edge has label =: \(n = 1, Q_V = \{\texttt{O}, \texttt{S}\}, Q_E = \{\texttt{=}\}\) and use \(=\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML: constraintAdj [ id 42 nodeLabels [ label "O" label "S" ] op "=" count 1 ].

7.3.1.2. Label Unification Constraint

Parameters:

  • a query label \(q\in \Omega\) and

  • a list of constraining labels \(Q\in 2^\Omega\).

7.3.1.2.1. String Mode

While this constraint technically works when using LabelType::String/LabelType.String, it is not really interesting: it is statisfied if \(q\in Q\), and can thus be evaluated statically, independent of any match.

7.3.1.2.2. Term Mode

When using LabelType::Term/LabelType.Term the membership is a most general unifier computation: the constraint is statisfied if there exists a label \(q'\in Q\) that unifies with \(q\).

7.3.1.2.3. Specification in GML

The constraint is specified in GML as described in labelAny,

7.3.1.2.4. Examples

All these examples assume we are in term mode.

  • Assume some vertex is supposed to act as wildcard, but constrained to only match, O or S. We can then give the vertex a label _X, i.e., a variable, and add the constraint \(q = \texttt{_X}, Q = \{\texttt{O}, \texttt{S}\}\). In GML this becomes constrainLabelAny [ label "_X" labels [ label "O" label "S" ] ].

  • Assume we have two such vertices, but we need them to be different, i.e., if one is matched to a O then the other must be S. The two vertices can be given different variables as labels, _X and _Y, and we then add the constraint \(q = \texttt{foo(_X, _Y)}, Q = \{\texttt{foo(O, S)}, \texttt{foo(S, O)}\}`\). In GML this becomes constrainLabelAny [ label "foo(_X, _Y)" labels [ label "foo(O, S)" label "foo(S, O)" ] ]. We here use an arbitrary function symbol, foo, just for the formulation of the constraint.

7.3.2. Resolution of Formal Issues

When relating the model to the formal description of DPO transformation in the literature there are two issues that require elaboration.

7.3.2.1. Label Change

The rules allow both for vertices and edges to change labels, but only the latter is well-defined in traditional DPO transformation. Thus, mathematically it is not immediately possible to define the graphs as above. Consider a vertex changing label in a rule, which label should it have in \(K\)? However, as our graphs are not allowed to have loop edges we can as a trick simply pretend that whenever a vertex would have a label, we attach a loop edge to the vertex, and put the “vertex” label on that edge. Note that all morphisms are required to be at least graph monomorphisms, so a loop edge can not be created inadvertently.

7.3.2.2. Avoiding Parallel Edges

Our graphs are defined to be without parallel edges, which presents a mathematical problem in that certain pushouts are not allowed. In a DPO direct derivation we can avoid parallel edges by simply introducing algorithmic ad-hoc constraint, that if the second pushout would result in parallel edges in the result graph, then the direct derviation is not defined.

However, in the big picture we can avoid this ad-hoc solution and think of the rules as having implicit negative application conditions (NACs):

  • For each pair of vertices \(u, v\in V_K\) which are not connected in \(L\), \((l(u), l(v))\not\in E_L\), but are connected in \(R\), \((r(u), r(v))\in E_R\), there is a NAC on \(L\) preventing edges \((l(u), l(v))\).

  • And symmetrically, for each pair of vertices \(u, v\in V_K\) which are connected in \(L\), \((l(u), l(v))\not\in E_L\), but are not connected in \(R\), \((r(u), r(v))\in E_R\), there is a NAC on \(R\) preventing edges \((l(u), l(v))\).

7.4. Molecule Encoding

There is no strict requirement that graphs/rules encode molecule (fragments), however several optimizations are in place when they do, and depictions are prettified based on “moleculeness”. The following describes the encoding of molecules in the basic graph model, that MØD recognizes.

7.4.1. Edges / Bonds

An edge encodes a chemical bond if and only if its label is listed in the table below.

Label

Interpretation

-

Single bond

:

“Aromatic” bond

=

Double bond

#

Triple bond

7.4.2. Vertices / Atoms

A vertex encodes an atom if and only if its label conforms to the following grammar.

vertexLabel ::=  [ isotope ] atomSymbol [ charge ] [ radical ]
isotope     ::=  unsignedInt
charge      ::=  [ singleDigit ] ('-' | '+')
radical     ::=  '.'
atomSymbol  ::=  an atom symbol with the first letter capitalised

Note

In the basic model there are no valence requirements for a graph being recognised as a molecule.