7. Graph, Rule, and Molecule Model¶
A central component of MØD is the graph::Graph
/Graph
which simplified represents simple graphs with labels on vertices and edges.
However, how the labels are interepreted and how graph morphisms are defined
depends which LabelSettings
/LabelSettings
object you
pass to various algorithms.
Another complication is that in the formal mathematics for describing
graph transformation through a rule::Rule
/Rule
classes,
we need need a slightly more complicated description of the graphs in order
to make the mathematics work.
We therefore start with the simplified graph model with the labels being opaque types. Then we explore the different label settings which switches the mathematical category of the graphs. Afterwards the rule model is explained where we get into the mathematical details for label-encoding which is necessary to formally make the graph transformation composable. Lastly we see how MØD interprets graphs as molecule (fragments) in order to create depictions that resembles ordinary molecule depictions.
7.1. Simple Graphs with String Labels¶
In the simplified model, a graph \(G = (V, E, l\colon V\cup E\rightarrow \Omega)\) is undirected, and neither has loop edges nor parallel edges. The function \(l\) assigns a label to each vertex and edge. A morphism between two graphs \(m\colon G\rightarrow H\) is a graph morphism which commutes with the labelling functions. That is,
graph morphism: for every edge \((u, v) \in E_G\) we must have an corresponding edge \((m(u), m(v)) \in E_H\) (\(m\) is a graph morphism), and
vertex labels: each vertex \(v \in V_G\) is mapped to a vertex with the same label, \(l_G(v) = l_H(m(v))\).
edge labels: each edge \(e \in E_G\) is mapped to a vertex with the same label, \(l_G(e) = l_H(m(e))\).
The only requirement of \(\Omega\) is thus that the elements can be
compared for equality.
In MØD this correspons to using
LabelType::String
/LabelType.String
in the a LabelSettings
/LabelSettings
object,
and in the API of PyMØD, this is the default.
Note
When using LabelType::String
/LabelType.String
the LabelRelation
/LabelRelation
in
LabelSettings
/LabelSettings
doesn’t matter,
Note
Technically, in this basic model the stereo setting in
LabelSettings
/LabelSettings
should be disabled as
well. Though, the stereo model is in an experimental stage and we will
therefore not explore it further.
A graph::Graph
/Graph
is further required to be
connected.
7.2. Switching Category — First-Order Terms as Labels¶
We can imbue the label set \(Omega\) with extra structure by using
LabelType::Term
/LabelType.Term
instead of
LabelType::String
/LabelType.String
.
The labels are now interpreted as firt-order terms,
and formally the graph/objects are now in a different
category.
In the graph/rule objects it is the same text strings being used either verbatim when in string mode or being parsed when in term mode.
The following describes how the strings are parsed in term mode.
A constant or function symbol is a word that can be matched by the regex
[A-Za-z0-9=#:.+-][A-Za-z0-9=#:.+-_]*
. This means that all strings that are considered “molecular” can be reinterpreted as constant symbols.A variable symbol is a word that can be matched by the regex
_[A-Za-z0-9=#:.+-][A-Za-z0-9=#:.+-_]*
. That is, variable is like a constant/function symbol, but with a_
prepended. An unnamed variable can be specified by the special wildcard symbol*
.The variable symbols matched by the regex
_[HT][0-9][0-9]*
are discouraged, as they are reserved for converting terms back into strings.Function terms start with a function symbol followed by a parenthesis with a comma-separated list of terms. They may contain white-space which is ignored.
If parsing of terms fails an exception of type
TermParsingError
/TermParsingError
is thrown. Thus, essentially any function that takes a
LabelSettings
/LabelSettings
object may throw such an exception.
If a graph/rule was created in term mode and then is used in string mode, the first-order terms are serialized back into strings:
Constant/function symbols are serialized verbatim.
Variable names are generated and will be matchable by the regex
_[HT][0-9][0-9]*
. Any original variable names are not saved.Function terms will be serialized with a single space character after each comma.
7.2.1. Morphisms¶
First-order terms are partially ordered, and we thus have additional choice in
how to define morphisms.
This choice is in the API selected through the
LabelRelation
/LabelRelation
objects which offers the three
choices “unification”, “specialisation”, and “isomorphism”.
Given a graph morphism \(m\colon G\rightarrow H\)
let \(V_G = \{v_1, v_2, \dots, v_n\}\) be the vertices of \(G\)
and \(E_G = \{e_1, e_2, \dots, e_p\}\) be the edges of \(G\).
Consider the following two terms:
They lay out all the labels of \(G\) in \(t_G\) and then the labels of the corresponding vertices and edges in \(H\) in \(t_H\).
The graph morphism \(m\) is in term mode a valid morphism if
with
LabelRelation::Unification
/LabelRelation.Unification
there exists a most-general unifier between \(t_G\) and \(t_H\).with
LabelRelation::Specialisation
/LabelRelation.Specialisation
there exists a most-general unifier between \(t_G\) and \(t_H\), and it only binds variables in \(t_G\) but not \(t_H\). That is, \(t_H\) is a specialisation of \(t_G\).with
LabelRelation::Isomorphism
/LabelRelation.Isomorphism
there exists a most-general unifier between \(t_G\) and \(t_H\), and it only binds variables to variables, and that variable mapping is a bijection. That is, the only difference between \(t_G\) and \(t_H\) is the naming of the variables they use.
In most cases you should use
LabelRelation::Specialisation
/LabelRelation.Specialisation
though, when for example using
graph::Graph::isomorphism()
/Graph.isomorphism()
to check if two graphs encode the same mathematical object, you should use
LabelRelation::Isomorphism
/LabelRelation.Isomorphism
.
The funtion name thus refers to the type of the graph morphism being found,
while the label settings describes the additional requirements for the labels.
The third option,
LabelRelation::Unification
/LabelRelation.Unification
,
should be used very carefully.
For example, when performing graph transofrmation, e.g., through a
dg::DG
/DG
, it may give results that are unexpected.
One use-case is when calling
graph::Graph::isomorphism()
/Graph.isomorphism()
,
where using
LabelRelation::Unification
/LabelRelation.Unification
means that you will find graph isomorphisms for which the labels on both graphs
can be specialized such that the graph objects becomes isomorphic.
Note
For the graph transformation is it necessary to define a distinguished set of morphisms \(\mathcal{M}\) for the term mode. These morphisms are those that are graph monomorphisms, but are label isomorphisms on the label part.
7.3. Rule Model¶
A rule::Rule
/Rule
represents a
Double Pushout
(DPO) graph transformation rule \(p = (L\xleftarrow{l}K\xrightarrow{r}R)\).
Specifically, they are in the DPO variant with all morphisms being graph
monomorphisms.
7.3.1. Application Constraints¶
A rule can have additional constriants that must be fulfilled before transformation can proceed. Specifically, a constraint \(c\) for a rule \(L\leftarrow K\rightarrow R\) is evaluated once a match morphism \(m\colon L\rightarrow G\) has been found. In the literature such application constraints are also called “application conditions”.
7.3.1.1. Adjacency Constraint¶
Parameters:
a vertex \(v\in V(L)\),
a count \(n\in \mathbb{N}_0\),
an operator \(op\in \{<, \leq, =, \geq, >\}\),
a set of vertex labels \(Q_V\in 2^\Omega\), and
a set of edge labels \(Q_E\in 2^\Omega\).
7.3.1.1.1. String Mode¶
Given a match morphism \(m\colon L\rightarrow G\) the constraint is satisfied if
That is, the vertex \(v\) is mapped to \(G\) and its incident edges are counted. Only those edges with a label in \(Q_E\) and the other endpoint with a label in \(Q_V\) are counted. The resulting count is then compared to the given number \(n\).
7.3.1.1.2. Term Mode¶
When using LabelType::Term
/LabelType.Term
the label
comparison is changed to be a most general unifier computation:
That is, each edge has an independent most general unifier computation where there must exist vertex and edge labels from the given sets such that they unify with the labels of the candidate edge and its other endpoint.
7.3.1.1.3. Specification in GML¶
The constraint is specified in GML as described in adjacency
,
but where omitting the nodeLabels
(resp. edgeLabels
) list represents setting
\(Q_V = \Omega`\) (resp. \(Q_E = \Omega\)), effectively letting the
counting be unrestricted with respect to those labels.
7.3.1.1.4. Examples¶
A vertex must have at most 3 neighbours: \(n = 3, Q_V = Q_E = \Omega\) and use \(\leq\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML:
constraintAdj [ id 42 op "<=" count 3 ]
.A vertex must have at least two neighbours with label
H
: \(n = 2, Q_V = \{\texttt{H}\}, Q_E = \Omega\) and use \(\geq\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML:constraintAdj [ id 42 nodeLabels [ label "H" ] op ">=" count 2 ]
.A vertex must have exactly 1 neighbour with label either
O
orS
where the connecting edge has label=
: \(n = 1, Q_V = \{\texttt{O}, \texttt{S}\}, Q_E = \{\texttt{=}\}\) and use \(=\) as \(op\). Assuming the vertex has ID 42 in the rule, this becomes the following in GML:constraintAdj [ id 42 nodeLabels [ label "O" label "S" ] op "=" count 1 ]
.
7.3.1.2. Label Unification Constraint¶
Parameters:
a query label \(q\in \Omega\) and
a list of constraining labels \(Q\in 2^\Omega\).
7.3.1.2.1. String Mode¶
While this constraint technically works when using
LabelType::String
/LabelType.String
, it is not really
interesting: it is statisfied if \(q\in Q\), and can thus be evaluated statically,
independent of any match.
7.3.1.2.2. Term Mode¶
When using LabelType::Term
/LabelType.Term
the membership
is a most general unifier computation: the constraint is statisfied if there
exists a label \(q'\in Q\) that unifies with \(q\).
7.3.1.2.3. Specification in GML¶
The constraint is specified in GML as described in labelAny
,
7.3.1.2.4. Examples¶
All these examples assume we are in term mode.
Assume some vertex is supposed to act as wildcard, but constrained to only match,
O
orS
. We can then give the vertex a label_X
, i.e., a variable, and add the constraint \(q = \texttt{_X}, Q = \{\texttt{O}, \texttt{S}\}\). In GML this becomesconstrainLabelAny [ label "_X" labels [ label "O" label "S" ] ]
.Assume we have two such vertices, but we need them to be different, i.e., if one is matched to a
O
then the other must beS
. The two vertices can be given different variables as labels,_X
and_Y
, and we then add the constraint \(q = \texttt{foo(_X, _Y)}, Q = \{\texttt{foo(O, S)}, \texttt{foo(S, O)}\}`\). In GML this becomesconstrainLabelAny [ label "foo(_X, _Y)" labels [ label "foo(O, S)" label "foo(S, O)" ] ]
. We here use an arbitrary function symbol,foo
, just for the formulation of the constraint.
7.3.2. Resolution of Formal Issues¶
When relating the model to the formal description of DPO transformation in the literature there are two issues that require elaboration.
7.3.2.1. Label Change¶
The rules allow both for vertices and edges to change labels, but only the latter is well-defined in traditional DPO transformation. Thus, mathematically it is not immediately possible to define the graphs as above. Consider a vertex changing label in a rule, which label should it have in \(K\)? However, as our graphs are not allowed to have loop edges we can as a trick simply pretend that whenever a vertex would have a label, we attach a loop edge to the vertex, and put the “vertex” label on that edge. Note that all morphisms are required to be at least graph monomorphisms, so a loop edge can not be created inadvertently.
7.3.2.2. Avoiding Parallel Edges¶
Our graphs are defined to be without parallel edges, which presents a mathematical problem in that certain pushouts are not allowed. In a DPO direct derivation we can avoid parallel edges by simply introducing algorithmic ad-hoc constraint, that if the second pushout would result in parallel edges in the result graph, then the direct derviation is not defined.
However, in the big picture we can avoid this ad-hoc solution and think of the rules as having implicit negative application conditions (NACs):
For each pair of vertices \(u, v\in V_K\) which are not connected in \(L\), \((l(u), l(v))\not\in E_L\), but are connected in \(R\), \((r(u), r(v))\in E_R\), there is a NAC on \(L\) preventing edges \((l(u), l(v))\).
And symmetrically, for each pair of vertices \(u, v\in V_K\) which are connected in \(L\), \((l(u), l(v))\not\in E_L\), but are not connected in \(R\), \((r(u), r(v))\in E_R\), there is a NAC on \(R\) preventing edges \((l(u), l(v))\).
7.4. Molecule Encoding¶
There is no strict requirement that graphs/rules encode molecule (fragments), however several optimizations are in place when they do, and depictions are prettified based on “moleculeness”. The following describes the encoding of molecules in the basic graph model, that MØD recognizes.
7.4.1. Edges / Bonds¶
An edge encodes a chemical bond if and only if its label is listed in the table below.
Label |
Interpretation |
---|---|
|
Single bond |
|
“Aromatic” bond |
|
Double bond |
|
Triple bond |
7.4.2. Vertices / Atoms¶
A vertex encodes an atom if and only if its label conforms to the following grammar.
vertexLabel ::= [isotope
]atomSymbol
[charge
] [radical
] isotope ::= unsignedInt charge ::= [ singleDigit ] ('-' | '+') radical ::= '.' atomSymbol ::= an atom symbol with the first letter capitalised
Note
In the basic model there are no valence requirements for a graph being recognised as a molecule.