Chapter 10
ML: Machine Learning and Agents
Distributed and Incremental Learning with Agents and noisy Sensor
Data
Introduction to Machine Learning
Decision Trees
Artificial Neuronal Networks
Learning with Agents
Distributed Learning
Incremental Learning
Further Reading
S. Bosse, Unified Distributed Sensor and Environmental Information Processing with Multi-Agent Systems
epubli, ISBN 9783746752228 (2018)
This chapter outlines the challenges of machine learning in unreliable distributed
environments with noisy input (sensory) data. The concept of learning agents is
introduced and applied to distributed learning, e.g., in distributed sensor networks.
Distributed agent-based learning is combined with incremental learning approaches
to meet the requirements of evolving and dynamic environments.
Learning is closely related to the agent model, which implements a generic
artificial intelligence system that uses sensory data to plan and execute actions,
as shown in Figure 10.1.
Fig. 10.1 General Artificial Intelligence System with adaptivity based on learning closely
related to the agent model
10.1 Introduction to Machine Learning
An advantage of Machine Learning (ML) over exact numerical algorithms is the
ability to handle problems of high computational complexity, e.g., NP-hard problems
(NP: nondeterministic polynomial time), for which the computation time of known
exact algorithms depends exponentially on the data size of the problem [WUE16].
Basically, learning algorithms can be used to improve the behaviour of a system
either at run-time or at design-time (or with a hybrid approach of both) by
optimizing function parameters, mainly to increase estimation accuracy. The task of
a machine learner is to derive a hypothesis (or a set of hypotheses) for a target
concept, e.g., simply the Boolean variable Overload stating whether or not there is
probably an overload condition. Machine learning is based either on example data
(training with labelled data) or on experience and history with reward feedback
retrieved at run-time of a system. The labels are concrete values of the target
concept (i.e., possible values of the output variable). The data consists of
attribute variables, e.g., the readings of different strain gauge and temperature
sensors.
The fields of application range from the optimization and fitting of parameters of
evaluation functions to full feature extraction classifiers. Machine learning is
often used if there is no world model, or only an incomplete one, specifying on
behavioural or functional level how the output of a module (the system or a part of
it) changes in response to a change of the input stimulus.
Machine Learning is closely related to the agent model, which is suitable for the
composition of complex systems, as shown in Figure 10.2. Agents can be considered as
learning instances, supporting distributed learning.
Fig. 10.2 Supervised and unsupervised learning with relation to agent-based learning.
Multi-agent-based learning can be supervised and unsupervised.
[Figure 10.2: diagram relating Complex Systems, Multi-agent Systems, Machine Learning,
Agent Learning, and Multi-agent Learning]
A practically oriented classification of machine learning schemes (based on
[SAM11]), suitable for information extraction in sensor networks and for
information retrieval, is given by the following list:
Supervised Learning
A process that learns a model, basically a function mapping input data (of the
past) to an output data set, using data comprising examples with an already given
mapping, i.e., using labelled data (see Figure 10.2). That means the supervised
learning task has to find a relationship between input attributes (independent
variables) and output attributes (dependent variables). Two typical examples are
classification and regression. Supervised learning requires a training data base
with a high demand of storage resources, but the resulting learned model can be
considerably smaller (e.g., a decision tree).
Unsupervised Learning
This is generally a process that seeks to learn the structure of data (see Figure
10.2) in the absence of an identified output (as given in supervised learning with
labelled data) or a feedback (as given in reinforcement learning). That means
unsupervised learning has to find distributions of instances by grouping instances
without pre-specified dependent attributes. Examples are clustering and
self-organizing maps, which try to group similar unlabelled data sets.
Semi-supervised Learning
This is the combination of supervised and unsupervised learning tech-
niques creating a hybrid learning architecture (see Figure 10.2).
Reinforcement Learning
In some situations, the output of a system (its impact on the environment in which
the system operates) is a sequence of actions, and the sequence of correct actions
is important, rather than one single action. Reinforcement learning seeks to learn
a policy mapping states to actions that optimizes a received reward (the feedback),
finally optimizing the behaviour of a system (the reactivity). It is different from
data classifiers and relates closely to the autonomous agent behaviour model. There
are no trained example situations for correct or incorrect behaviours, only rules
evaluating and weighting actions and their impact on the environment and the system.
Association Learning
The goal of association learning is to find conditional probabilities of
association rules between different items of data sets. An association rule has the
form X → Y, where X and Y are item sets. The association rule belongs to a
conditional probability P(Y|X) giving the probability that if X occurs, then set Y
is also likely to occur. Given user-specified support and confidence thresholds,
the Apriori algorithm developed by [AGR96] can find all association rules between
two sets X and Y. This algorithm can be parallelized.
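The support and confidence of a single rule X → Y can be estimated by simple counting. The following is a minimal sketch, not the algorithm of [AGR96] itself; the function names and the sensor-event transactions are hypothetical and serve only as an illustration of how P(Y|X) is estimated from data.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Estimate P(Y|X) as support(X and Y together) / support(X)."""
    sx = support(transactions, x)
    if sx == 0:
        return 0.0  # X never occurs, so the rule cannot be evaluated
    return support(transactions, set(x) | set(y)) / sx


# Hypothetical sensor-event transactions (illustrative data only).
transactions = [
    {"high_strain", "high_temp", "overload"},
    {"high_strain", "overload"},
    {"high_temp"},
    {"high_strain", "high_temp"},
]

rule_support = support(transactions, {"high_strain", "overload"})
rule_confidence = confidence(transactions, {"high_strain"}, {"overload"})
print(rule_support, round(rule_confidence, 3))  # 0.5 0.667
```

A rule would be accepted only if both values exceed the user-specified thresholds.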
Classification
Classifier systems can be considered as modelling tools. Given a real system
without a known underlying dynamics model, a classifier system can be used to
generate a behaviour that matches the real system. The classifier offers a
rule-based model of the unknown system. There is a very large number of
classification algorithms; commonly used examples are decision trees (e.g., the
C4.5 algorithm), instance-based learners (e.g., nearest-neighbour (NN) methods),
support vector machines (basically linear classifiers), rule-based learners, neural
networks, and Bayesian networks. The main purposes of classification in sensor
networks are knowledge extraction and pattern recognition. For example, in
[BOS13B], C4.5 and k-NN algorithms were used to classify an 18-dimensional
strain-gauge sensor vector of a flat rubber plate to a (F,X) vector providing a
strength classification of an applied load (F) with an estimation of the spatial
position (X).
Decision trees, as one outcome of a machine learning classifier, have low
computational resource requirements and can be implemented directly on microchip
level, as used, for example, for energy management in [BOS11B].
Clustering
Clustering is basically an unsupervised learning task with the goal of
automatically grouping data sets that share similar characteristics into clusters.
The main goal is finding structure in the given set of data. A common clustering
algorithm is the k-means algorithm, which quantizes vectors and computes the
distance between vectors, finally grouping the nearest vectors in cluster segments
(see, e.g., [BEL15]). A self-organizing map is another well-known clustering
algorithm.
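The k-means principle mentioned above (compute distances, group the nearest vectors) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the initial centroids are supplied by the caller, the number of iterations is fixed, and the sample points are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, clusters


# Hypothetical two-dimensional sensor vectors forming two obvious groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
```

The result depends on the initial centroids; practical implementations usually repeat the procedure with several initialisations.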
Regression
Like classification, regression is a supervised learning approach. The goal is
to learn an approximation of a real-valued function mapping an input data
vector (real-valued input variables) on an output vector (the mean of re-
sponse variables).
Machine Learning can be represented by a four-layer model [ROK15]:
L1. Application
Structural Health Monitoring, Fault Detection, Adaptation of control systems,
Prediction, ..
L2. Tasks
Classification, Regression, Clustering, ..
L3. Models
Models representing the learning outcome: Decision Trees, Bayesian Networks,
Artificial Neural Networks, Self-organizing Maps, ..
L4. Algorithms
C4.5, nearest neighbour (k-NN), ..
Machine Learning is basically divided into two phases that can be executed
off-line, on-line, or mixed:
1. Model derivation (the learning of a prediction or clustering model from
known data); and
2. Application of the learned model (applying new unknown data to the learned
model function to predict a class or to get a relation to a cluster).
In general, ML derives a hypothesis h(x) of a model function f(x): x → y using a
learner function M: D → h with h ∈ H and a training data set D. The model function
f(x) is also known as the target concept. Basically, the model function (or its
hypothesis) maps an input data vector x = (a_1, a_2, .., a_n) on an output data
vector y with ‖y‖ ≪ ‖x‖, i.e., the mapping function performs feature extraction by
reducing data. Each x_i is a data instance of an instance set, x_i ∈ X, given by
its attributes (a_1, a_2, .., a_n), which are the independent feature variables,
and the dependent output target attribute y_i. In supervised learning a set of
labelled data instances is used as training data, i.e.,
D = {(x_1, l_n), (x_2, l_m), ..}. In the case of classification, the set of labels
L = {l_1, l_2, ..} contains the possible values of the scalar output variable y,
i.e., y ∈ L. The general problem of the learning task is the diversity of different
hypothesis functions h ∈ H that can be derived from training sets, approximating
the model function more or less accurately. Some hypothesis functions are more
general, others more specific. The learning task has to find an appropriate
hypothesis from the hypothesis space H.
Each learning task incorporates three different steps (and an optional fourth),
using different kinds of input data:
Known Training Data
Task Learning: Find hypothesis h(x) for unknown target function f(x): x → y
Known Test Data
Task Testing: Test hypothesis h(x) and check quality and generalization of the
hypothesis
Unknown Data
Task Application: Apply hypothesis function h(x) to unknown data
Feedback Data¹
Task Feedback: Adapt the current hypothesis h(x) → h'(x) with feedback (reward)
from application providing new training data at run-time.
There are various ML algorithms providing different learning approaches (i.e.,
learner function M) and representations of the learned model (i.e., hypothesis
function h). They differ in:
- Appropriate matching of specific use cases (kind of input data and input data
  distribution)
- Accuracy (prediction quality)
- Speed: computational complexity and real-time capability (with respect to
  learning and application)
- Storage requirements
- Distributivity (with respect to efficiency and communication costs)
- Adaptivity (i.e., support of incremental learning at run-time, if any)
- Noise immunity (impact of input data noise on classification and prediction
  quality)
The deployment of ML on low-resource embedded systems can be a challenge with
respect to computational complexity and storage requirements. The deployment of ML
in distributed sensor networks, which provide inherently distributed input data,
demands distributed ML algorithms operating on localized data and incremental
on-line learning capabilities operating on data streams. In the following sections
a selection of ML algorithms is introduced that are suitable for deployment on
embedded systems and for computational distribution. Probabilistic learning
approaches (e.g., Bayesian Networks) are not discussed here, as in practice there
is little or only incomplete knowledge about the probabilities of heterogeneous
sensor data from various sources, which is required for probabilistic approaches.
10.2 Decision Trees
Decision trees are attractive models for representing a learned target concept, as
they can be read and understood easily. After a model is generated, it can be
communicated to other people. Decision trees are commonly used for classification
tasks only.
¹ Optional
Fig. 10.3 A learned decision tree and its linear look-up table representation
(A_i, B_i: attribute variables, L_i: class symbol (label) i, n_i: reference to a
table row i)
A decision tree (DT) is a directed acyclic graph consisting of nodes, leaves, and
conditional edges. Each node of the DT is associated with an attribute variable,
and the edges are related to attribute values selecting an evaluation path. A leaf
is associated with a value of the target attribute (class symbol or numerical
value). The main advantages of DTs compared with other models are their low storage
requirements and the capability to store DTs in table format, as shown in Figure
10.3. A learned DT model does not carry any original training data.
The general task of DT learning is the selection of suitable attribute variables
based on training data sets, splitting a tree node into sub-trees and leading to
reliable classification paths along different nodes with high classification
probability and diversity. A decision tree is built top-down from a root node by
partitioning the data into subsets containing instances with similar values and
creating sub-trees.
Algorithms - Overview
ID3
ID3 by J. R. Quinlan is an iterative algorithm for constructing a DT from a
training data set D. D is a table with columns (a_1, a_2, .., a_n, y) and a number
of rows. Each table cell has a value v ∈ V(a) (of attribute a) or t ∈ T for the
target attribute:

a_1   a_2   ..   a_n   y
v_1   ..    ..   ..    t_1
v_2   ..    ..   ..    t_2
v_2   ..    ..   ..    t_1
v_1   ..    ..   ..    t_3
v_3   ..    ..   ..    t_1
..    ..    ..   ..    ..

The node splitting attribute is selected based on gain and information entropy
calculation.
Consider a column of the table, e.g., the target attribute column of y. There is a
finite set of unique values V = {v_1, v_2, .., v_m} that y (and any column c) can
hold. Now it is assumed that some values occur multiple times. The information
entropy, i.e., a measure of disorder of the data, of this (or any other) column c
(of attribute a/y) is defined by the entropy of the value distribution entropy_N:

\[
\begin{aligned}
\mathrm{entropy}_N(N) &= -\sum_{i=1}^{n} p_i \log_2 p_i, \quad \text{with } p_i = \frac{N_i}{\sum_{j=1}^{n} N_j} \\
\mathrm{entropy}_N(c) &= -\sum_{v \in V(c)} p_v \log_2 p_v, \quad \text{with } p_v = \frac{|c_v|}{|c|}
\end{aligned}
\tag{10.1}
\]

with p_v as the probability that a specific value v occurs in the column, i.e., the
number of occurrences n(v) = |c_v| of v in the column c relative to the number
n = |c| of rows in c. For example, the column y with 10 rows is occupied by two
different values {A,B,B,A,A,A,B,B,B,B}, e.g., y relates to "Playing Golf" with
A=Yes and B=No. This gives the distribution {4,6} of the set {A,B}. Then the
entropy of y is
entropy_N({4,6}) = -4/10 log2(4/10) - 6/10 log2(6/10) = 0.97.
Assume the attribute a_1 is related to the weather outlook with three possible
values {Sunny, Overcast, Rainy}. The column of a_1 of the data set D contains the
distribution {3,6,1}. Then the entropy of the column of a_1 is
entropy_N({3,6,1}) = 1.3.
The calculation of the entropy of single columns is unhelpful for the selection of
appropriate attributes. Instead, an input attribute column must be related to the
target attribute distribution. Now assume the outcome of the target variable
y ∈ {A,B} (in this example PlayingGolf) depends on the input attribute
a_1 ∈ {Sunny, Rainy, Overcast} (i.e., Weather). Let the column a_1 contain 2 (y=A)
and 1 (y=B) rows with a_1:Weather=Sunny, 4 (y=A) and 0 (y=B) rows with
a_1:Weather=Overcast, and 1 (y=A) and 2 (y=B) rows with a_1:Weather=Rainy, i.e.,
for this example the column a_1 holds the distribution {3,4,3} and the target
column the distribution {7,3}. Then the entropy of the column a_1 related to y is:
P(Sunny) entropy_N({2,1}) + P(Overcast) entropy_N({4,0}) + P(Rainy) entropy_N({1,2})
= 3/10*0.92 + 4/10*0.0 + 3/10*0.92 = 0.55
In general, the information entropy of a column c(a) value distribution related to
the outcome of the target variable is given by:

\[
\mathrm{Entropy}(T,c) = \sum_{v \in V(c)} p_v \, \mathrm{entropy}_N(\{c_v \mid T\}),
\quad \text{with } p_v = \frac{|c_v|}{|c|}
\text{ and } \{c_v \mid T\} = \{\,|c_v \text{ with } y=t_1|,\ |c_v \text{ with } y=t_2|,\ ..\,\},\ t_i \in T
\tag{10.2}
\]

with v ∈ V(a) as the possible unique values of the attribute variable a and T the
values of the target attribute. The distributions {c_v | T} correspond to the rows
of the following table of occurrence counts:

a/y   t_1    t_2    ..   t_o    Distribution
v_1   n_11   n_12   ..   n_1o   {n_11, n_12, .., n_1o}
v_2   n_21   n_22   ..   n_2o   ..
v_3   n_31   n_32   ..   n_3o   ..
..    ..     ..     ..   ..     ..
v_m   n_m1   ..     ..   n_mo   {n_m1, n_m2, .., n_mo}

The ID3 algorithm now starts with an empty tree and the full set of attributes
A = {a_1, .., a_n}. The entropy for each column is calculated (applying Equations
10.1 and 10.2) and finally the information gain for each column with respect to
the target attribute distribution T in this column:

\[
\mathrm{Gain}(T,c) = \mathrm{entropy}_N(T) - \mathrm{Entropy}(T,c)
\tag{10.3}
\]

The column c_i with the highest gain, associated with attribute a_i, is selected
for the first tree node and removed from the set A. For each value of the attribute
occurring in the selected column a new branch of the tree is created (i.e., each
branch contains the rows that have one of the values of the selected attribute). In
the next iteration a new attribute (column) is selected from the remaining
attribute set until there are no more attributes. A zero gain indicates a leaf
(i.e., all rows select the same target attribute value, i.e., T={t}).
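The numbers of the Weather/PlayingGolf example can be checked with a short script. This is a minimal sketch of Equations 10.1 to 10.3 only (no tree construction); the function names and the row representation as dictionaries are assumptions made for illustration.

```python
import math


def entropy_n(counts):
    """Entropy of a value distribution given as occurrence counts (Eq. 10.1)."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def conditional_entropy(rows, attr, target):
    """Entropy of column `attr` related to the target column (Eq. 10.2)."""
    total = len(rows)
    result = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        counts = [subset.count(t) for t in set(subset)]
        result += len(subset) / total * entropy_n(counts)
    return result


def gain(rows, attr, target):
    """Information gain of splitting on `attr` (Eq. 10.3)."""
    column = [r[target] for r in rows]
    counts = [column.count(t) for t in set(column)]
    return entropy_n(counts) - conditional_entropy(rows, attr, target)


# Rows of the second example: Sunny 2xA/1xB, Overcast 4xA, Rainy 1xA/2xB.
rows = ([{"weather": "Sunny", "golf": "A"}] * 2
        + [{"weather": "Sunny", "golf": "B"}]
        + [{"weather": "Overcast", "golf": "A"}] * 4
        + [{"weather": "Rainy", "golf": "A"}]
        + [{"weather": "Rainy", "golf": "B"}] * 2)

print(round(entropy_n([4, 6]), 2))                             # 0.97
print(round(entropy_n([3, 6, 1]), 2))                          # 1.3
print(round(conditional_entropy(rows, "weather", "golf"), 2))  # 0.55
print(round(gain(rows, "weather", "golf"), 2))                 # 0.33
```

An ID3 learner would evaluate `gain` for every remaining attribute column and split on the column with the largest value.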
C4.5
A major limitation of ID3 is that it does not create small and efficient DTs if
there are many different values of feature variables, e.g., with real-valued data
and target variables. Furthermore, ID3 cannot handle undefined attribute values.
Instead of using the information gain given by Equation 10.3 as the split
criterion, the C4.5 algorithm considers the gain ratio, which relates the
information gain of a data column to the outcome of a possible splitting by
creating a tree node (i.e., the split information entropy of the target attribute
distribution) [HSS14].
This is given by:

\[
\begin{aligned}
\mathrm{Gain}(T,c) &= \mathrm{entropy}_N(T) - \mathrm{Entropy}(T,c) \\
\mathrm{SplitInfo}(T) &= -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|} \\
\mathrm{GainRatio}(T,c) &= \frac{\mathrm{Gain}(T,c)}{\mathrm{SplitInfo}(T)}
\end{aligned}
\tag{10.4}
\]

Like in the ID3 algorithm, the data table is sorted at every node creation with
respect to the best splitting attribute based on the gain ratio impurity. At each
node of the tree, C4.5 chooses the attribute of the remaining attribute set A that
most effectively splits the remaining training data set into subsets enriched in
one class or the other. The criterion is the normalized information gain
(difference in entropy) that results from choosing an attribute for splitting the
data. The attribute with the highest normalized information gain is chosen to make
the decision.
In addition, the C4.5 learner performs tree pruning to optimize the application of
the learned model. Major flaws of ID3/C4.5 are over-fitting of the tree and
sub-tree replication. Furthermore, even C4.5 does not handle noisy data accurately.
The computational complexity of the C4.5 algorithm is about O(mn^2) [SU06], with m
as the number of training examples and n as the number of attributes.
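The gain ratio of Equation 10.4 can be sketched for the Weather/PlayingGolf counts used in the ID3 description. The sketch assumes, as in the common formulation of C4.5, that the subsets T_i entering SplitInfo are the partitions of the training rows produced by the candidate split; the function names are hypothetical.

```python
import math


def entropy_n(counts):
    """Entropy of a distribution given as occurrence counts."""
    total = sum(counts)
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def gain_ratio(target_counts, partition_counts):
    """Gain ratio of one candidate split.

    target_counts:    target value counts before the split, e.g. [7, 3]
    partition_counts: one list of target counts per attribute value
    """
    total = sum(target_counts)
    sizes = [sum(p) for p in partition_counts]
    conditional = sum(s / total * entropy_n(p)
                      for s, p in zip(sizes, partition_counts))
    gain = entropy_n(target_counts) - conditional
    # Assumption: SplitInfo is the entropy of the partition sizes |T_i|.
    split_info = entropy_n(sizes)
    return gain / split_info if split_info > 0 else 0.0


# Sunny: 2xA/1xB, Overcast: 4xA/0xB, Rainy: 1xA/2xB (targets overall 7xA/3xB).
ratio = gain_ratio([7, 3], [[2, 1], [4, 0], [1, 2]])
print(round(ratio, 2))  # 0.21
```

Dividing by the split information penalises attributes that fragment the data into many small partitions, which the plain information gain tends to favour.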
INN
The ε-interval nearest-neighbour algorithm (INN) uses a modified and simplified
ID3/C4.5 algorithm that handles significantly noisy data, combined with the NN
approach used in the evaluation of the DT to optimize classification. It is well
suited for noisy sensor data, geometrically correlated sensor data, and training
data sets that are unknown in advance or incomplete, as in incremental learning.
Instead of single discrete data values, data value intervals are used to take
uncertainty into account. In the following the algorithm is introduced as a major
algorithm suitable for the deployment in sensor networks.
Traditional Decision Tree Learners (DTL) (e.g., using the ID3 algorithm) select
data set attributes (feature variables) for decision-making based only on
information-theoretic entropy calculation to determine the impurity of training set
columns (i.e., the gain of a specific attribute variable). This is well suited for
non-metric symbolic attribute values, like colour names, shapes, and so on. The
distinction probability of two different symbols is usually one. In contrast,
real-valued sensor data is noisy and subject to variations (e.g., drift) due to the
measuring process and the physical world. Two numerical (sensor) values A and B
have a high distinction probability only if the uncertainty intervals [A-ε,A+ε] and
[B-ε,B+ε] do not overlap. That means that not only the entropy of a data set column
is relevant for numerical data; the standard deviation and value spread of a
specific column must be considered, too. To improve attribute selection for optimal
data set splitting, a column ε-interval entropy computation is introduced, based on
the C4.5 algorithm, that extends each value of a column vector (associated with a
specific attribute) with an uncertainty interval [v_i-ε,v_i+ε].
Initially, the values of a real-valued data table are transformed into data range
values:

a_1             a_2             ..   a_n   y
[v_1-ε,v_1+ε]   [v_1-ε,v_1+ε]   ..   ..    t_1
[v_2-ε,v_2+ε]   [v_2-ε,v_2+ε]   ..   ..    t_2
[v_1-ε,v_1+ε]   [v_1-ε,v_1+ε]   ..   ..    t_1
[v_3-ε,v_3+ε]   [v_3-ε,v_3+ε]   ..   ..    t_3
..              ..              ..   ..    t_1
..              ..              ..   ..    ..

Applying the 2ε-interval approach to the original ID3 algorithm from Equation 10.2
gives:

\[
\mathrm{Entropy}_\varepsilon(T,c,\varepsilon) = \sum_{v \in V(c)} p_\varepsilon(v,c,\varepsilon)\, \mathrm{entropy}_\varepsilon(\{c_v \mid T\},\varepsilon)
\tag{10.5}
\]

In contrast to the original ID3/C4.5 algorithms, which calculate the
target-dependent data entropy for all columns for optimal attribute selection,
there is a simplified ε-interval data-centric algorithm that calculates the entropy
for a data set column only, without considering the target attribute relationship,
based on Equation 10.6. This simplification reduces the computational complexity
and enables the deployment of DT learning in embedded systems under real-time
conditions and with incomplete training sets as they occur in incremental
(stream-based) learning.

\[
\begin{aligned}
\mathrm{entropy}_\varepsilon(c,\varepsilon) &= -\sum_{v \in V(c)} p_\varepsilon(v,c,\varepsilon) \log_2 p_\varepsilon(v,c,\varepsilon) \\
p_\varepsilon(v',c,\varepsilon) &= \frac{1}{|c|} \sum_{v \in V(c)}
  \begin{cases} 1 & : \mathrm{overlap}([v-\varepsilon,v+\varepsilon],[v'-\varepsilon,v'+\varepsilon]) \\ 0 & : \text{otherwise} \end{cases} \\
\mathrm{overlap}(v_1,v_2) &=
  \begin{cases} \text{true} & : (\lceil v_1 \rceil \ge \lfloor v_2 \rfloor \wedge \lceil v_1 \rceil \le \lceil v_2 \rceil) \vee (\lceil v_2 \rceil \ge \lfloor v_1 \rfloor \wedge \lceil v_2 \rceil \le \lceil v_1 \rceil) \\ \text{false} & : \text{otherwise} \end{cases}
\end{aligned}
\tag{10.6}
\]

Here ⌊v⌋ and ⌈v⌉ denote the lower and upper bound of an interval v.
The occurrence probability is calculated by counting overlapping 2ε intervals of
data column values. Based on this calculation, the best attribute (feature
variable), i.e., the column with the highest information entropy, is selected for
creating a new tree node. The column can still contain non-distinguishable values
with overlapping 2ε intervals. All overlapping 2ε values are grouped in partitions
that cannot be classified (separated) by the currently selected attribute variable.
Only partitions, ideally containing only one data set value, are used for a
classification selection. All data sets in one partition create a sub-tree of the
current decision tree node. If there is only one partition available (containing
more than one class target value), the data set attribute selection is based on the
column with the highest standard deviation, but the 2ε separation cannot be
guaranteed in this case, lowering the prediction accuracy. The basic principle of
the learning algorithm, which is an adaptation of a common discrete C4.5 Decision
Tree Learner, is shown in Algorithm 10.1. It creates a graph based on node
attribute selection using intervals, e.g., x ∈ [500..540], instead of the commonly
used and simplified relational value selection, e.g., x < 540, which is an
inadmissible extrapolation beyond the training set boundaries and prevents
recognizing totally non-matching data. The choice of an appropriate ε value
requires some statistical knowledge from a prior analysis of the data set.
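The ε-interval column entropy of Equation 10.6 can be sketched as follows. This is one possible reading of the equation, not the reference implementation: for each distinct column value the distinct values with an overlapping 2ε interval are counted and normalised by the number of rows, and the overlap test is written in the standard closed-interval form. The column data and names are hypothetical.

```python
import math


def overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def eps_entropy(column, eps):
    """Column entropy based on overlapping 2*eps intervals.

    Assumption: overlaps are counted over the distinct values V(c) and
    normalised by the number of rows |c|.
    """
    values = sorted(set(column))
    n = len(column)
    total = 0.0
    for v1 in values:
        hits = sum(1 for v in values
                   if overlap((v - eps, v + eps), (v1 - eps, v1 + eps)))
        p = hits / n
        total -= p * math.log2(p)
    return total


# Hypothetical sensor columns: a1 is well spread, a2 varies only within noise.
columns = {"a1": [500, 502, 540, 580], "a2": [10, 11, 12, 13]}
eps = 5
entropies = {name: eps_entropy(col, eps) for name, col in columns.items()}
best = max(entropies, key=entropies.get)
print(entropies, best)  # {'a1': 2.0, 'a2': 0.0} a1
```

A column whose values all lie within each other's uncertainty interval yields an entropy of zero and is therefore never preferred as a split attribute.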
Alg. 10.1 INN algorithm, simplified version [BOS16B]
type value = number | number range
The learned model is a decision tree with nodes and leaves
type model = Result(name: string) |
             Feature(name: string, featvals: model array) |
             FeatureValue(val: value, child: model)
function createTree(datasets, target, features)
  1. Select all columns in the data set array with the target key
  2. If there is only one column, return a result leaf node with the target
  3. Determine the best features by applying entropy and value deviation
     computation
  4. Select the best feature by maximal entropy
  5. Create partitions from all possible column values for this feature
  6. If there is only one partition holding all values, go to step 10
  7. For each partition create a child feature value node
  8. For each child node apply the createTree function with the remaining
     reduced data set by filtering all data rows containing at least one value
     of the partition in the respective feature column of the data set, and by
     using a reduced remaining feature set w/o the current feature
  9. Return a feature node with previously created feature value child nodes.
     - Finished -
  10. Select the best feature by maximal value deviation
  11. Merge overlapping or equal column values
  12. For each possible value create a feature value node
  13. For each child node apply the createTree function with the remaining
      reduced data set by filtering all data rows containing at least one value
      of the partition in the respective feature column of the data set, and by
      using a reduced remaining feature set w/o the current feature
  14. Return a feature node with previously created feature value child nodes.
      - Finished -
end
function classify(model, dataset)
  1. Iterate the model tree until a result leaf is found.
  2. Evaluate a feature node by finding the best matching feature value node
     for the current feature attribute by finding the feature value with
     minimal distance to the current sample value from the data set.
end
The classification function classify applies input data to the model by iterating
along paths of the learned decision tree. Each variable along a path is evaluated
by finding the next edge in the tree. If a tree node has more than two edges and/or
the edges have overlapping value intervals, the classifier selects the best
matching feature variable edge by finding the nearest neighbour of the current
feature variable value in the set of feature value intervals.
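The nearest-neighbour edge selection of the classify function can be sketched as follows. The class layout mirrors the model type of Algorithm 10.1, but the distance measure (zero inside an interval, otherwise the distance to the nearer bound), the example tree, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Result:
    name: str                      # class label stored in a leaf


@dataclass
class FeatureValue:
    val: Tuple[float, float]       # value interval (lower, upper)
    child: "Model"


@dataclass
class Feature:
    name: str                      # attribute evaluated at this node
    featvals: List[FeatureValue]


Model = Union[Result, Feature]


def distance(x, interval):
    """Assumed distance of a sample value to an interval."""
    lo, hi = interval
    if lo <= x <= hi:
        return 0.0
    return min(abs(x - lo), abs(x - hi))


def classify(model, sample):
    """Walk the tree, always following the edge with the nearest interval."""
    while isinstance(model, Feature):
        x = sample[model.name]
        best = min(model.featvals, key=lambda fv: distance(x, fv.val))
        model = best.child
    return model.name


# Hypothetical learned tree over two sensor variables s1 and s2.
tree = Feature("s1", [
    FeatureValue((495, 505), Result("no load")),
    FeatureValue((535, 545), Feature("s2", [
        FeatureValue((95, 105), Result("load left")),
        FeatureValue((195, 205), Result("load right")),
    ])),
])

print(classify(tree, {"s1": 538, "s2": 190}))  # load right
print(classify(tree, {"s1": 510, "s2": 0}))    # no load
```

A practical classifier could additionally reject samples whose distance to every interval exceeds a limit, so that totally non-matching data is recognized instead of being forced into a class.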
The definition of a common ε parameter used in the INN algorithm, applying 2ε
intervals to variable data, is only useful if all variables have data values in an
equal interval, i.e., x_i ∈ [v_0, v_1]. This is the case for sensors of the same
class in distributed sensor networks. If the sensors are different and the sensor
values have significantly different dynamic ranges, an individual ε_i value has to
be defined for each x_i, and Equation 10.6 has to be changed to use the individual
ε_i values.
10.3 Artificial Neuronal Networks
Artificial neuronal networks (ANN) can be used for classification, regression, and
clustering, i.e., they support supervised and unsupervised learning methodologies.
In contrast to a DT, which represents a strictly hierarchical processing model, an
ANN represents a parallel processing model.
An ANN is composed of a set of simple processing units (nodes), called neurons,
that represent an activation function f(x): x → y, mapping one or more input
variables x = (x_1, x_2, ..) on an output variable, commonly taking values from a
binary set.
The taxonomy of ANNs basically distinguishes the following classes and
architectures [SIB12]:
- Feed forward NN
  - Single-layer perceptron.
    A single-layer perceptron consists of a single layer of output nodes. Input
    variables are directly mapped on output variables using weights. The sum of
    the weighted input variables is calculated for each node, s = Σ w_i x_i. If
    the sum value s is above a threshold s_thr, the neuron "fires" and exposes
    the sum value (or any other fixed value) at the output; otherwise it exposes
    another, deactivated value (e.g., 0 or -1). Neurons with this kind of
    activation function are also called McCulloch-Pitts neurons or threshold
    neurons.
  - Multi-layer perceptron.
    A multi-layer perceptron consists of multiple layers of computational nodes
    interconnected in a feed forward architecture, shown in Figure 10.4. A
    learning algorithm has to adjust the weights of the network to optimize
    correct prediction, which is more complex with multi-layer perceptrons than
    with single-layer perceptrons. One common technique is to define an error
    function comparing the values of the output variables with expected values
    and back propagating the errors to the input layers to adjust the weights.
    The weights of each single connection are adjusted by a small amount to
    reduce the error function outcome.
- Recurrent NN
  In contrast to feed forward networks with uni-directional data flow, there
  are connections propagating output variables to previous layers, causing a
  bi-directional data flow in RNNs. FNNs propagate data linearly from input to
  output nodes, whereas the backward propagation of output variables in RNNs
  introduces non-linear functional behaviour.
  - Single recurrent NN
  - Hopfield NN
- Stochastic NN
  - Boltzmann Machines
- Modular NN
  Modular NNs are composed of multiple independent NNs, each performing a
  sub-task and operating on a sub-set of input data, connected by an
  intermediate and moderating layer that isolates the independent NNs from each
  other.
  Examples of modular NN are:
  - Committee of machines (CoM)
  - Associative NN (ASNN)
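The threshold (McCulloch-Pitts) neuron described for the single-layer perceptron can be sketched in a few lines. The weights and the threshold below are hypothetical values chosen so that the neuron realises a logical AND; the neuron exposes a fixed value of 1 when it fires, one of the options mentioned in the text.

```python
def threshold_neuron(inputs, weights, s_thr, inactive=0):
    """Fire (return 1) if the weighted input sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > s_thr else inactive


# Hypothetical weights and threshold realising a logical AND of two inputs.
and_outputs = [threshold_neuron((a, b), (1.0, 1.0), s_thr=1.5)
               for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]
```

Lowering the threshold to 0.5 turns the same neuron into a logical OR, which illustrates that learning in a single-layer perceptron amounts to finding suitable weights and thresholds.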
In multi-layer ANNs (see Figure 10.4), the input layer connects to the outside
world and the output layer stores the prediction results. All neurons between
the input and output layers are hidden neurons. The transfer function t
n
(i): i
o of a neuron n is commonly a logarithmic function mapping the input vari-
ables of the neuron (the incoming edges from other neurons or input
variables) on the output variable (outgoing edge to other neurons or output
variable).
The learning algorithm of an ANN performs training of the perceptrons in
two ways:
1. By adjusting the weights of the interconnections;
2. By reconfiguring the interconnection network (adding or removing
edges of the network graph).
The commonly used back propagation algorithm is summarized in Algorithm
10.2. It is well suited for ANN training due to its ability to generalize
well on a wide variety of problems. Back propagation of errors through a
feed-forward neural network requires the derivative of the activation
function.
Fig. 10.4 Principle architecture of a feed forward ANN consisting of three layers



Alg. 10.2 Back propagation algorithm used for ANN training
1. Initialize network
2. Set up next input vector from the training data set
3. Propagate input vector
4. Calculate the error signal e = |y - y_0|
5. Propagate error signal to network
6. Adjust weights to reduce error
7. Repeat steps 2-5 to reduce error
   until the error is below a threshold e < e_thr
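The steps of Algorithm 10.2 can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the single hidden layer, the sigmoid activation, the squared-error gradient, and all names and parameter values (train, hidden, rate, e_thr, max_epochs, seed) are assumptions chosen for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, hidden=3, rate=0.5, e_thr=0.05, max_epochs=5000, seed=1):
    """Train a small feed-forward network; returns weights and error history."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Step 1: initialize network (last weight of each row acts as bias)
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]
    history = []
    for _ in range(max_epochs):
        worst = 0.0
        for x, y0 in samples:                       # Step 2: next input vector
            xb = list(x) + [1.0]
            # Step 3: propagate the input vector forward
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            y = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
            e = abs(y - y0)                         # Step 4: error signal
            worst = max(worst, e)
            # Step 5: propagate the error back; y*(1-y) is the sigmoid derivative
            d_o = (y - y0) * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Step 6: adjust weights by a small amount against the gradient
            w_o = [w - rate * d_o * v for w, v in zip(w_o, hb)]
            for j in range(hidden):
                w_h[j] = [w - rate * d_h[j] * v for w, v in zip(w_h[j], xb)]
        history.append(worst)
        if worst < e_thr:                           # Step 7: stop below threshold
            break
    return w_h, w_o, history
```

The history list records the largest error signal per pass over the training data, so the stop criterion e < e_thr of step 7 can be inspected after training.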
10.4 Learning with Agents
The agent model possesses some kind of autonomy and the capability to
interact with an environment via perception and actions. Up to here, only
classification and regression tasks were considered. But adaptation of models
at run-time is another major machine learning task. The agent architecture
relies on the adaptation of the behaviour based on perception and reasoning.
This adaptation can be performed by reinforcement learning.
According to [RAN07], learning and agent models can be combined by
merging two operational cycles: The Agent Based Modelling (ABM) and the
Machine Learning (ML) cycle, illustrated in Figure 10.5, creating an
incremental learning approach. From another point of view, the agents control
the learning process. Such agent-based learning systems can be implemented,
e.g., with ANNs, offering multiple advantages and features:
- Fault Tolerance, i.e., missing or failing agents can be compensated by a
  self-organizing Multi-agent System;
- No base data assumption, i.e., composing and training ANNs does not
  require specific knowledge about input data distributions and statistics;
- "Organic Learning", i.e., ANNs and agents are not limited to prior given
  expert knowledge and can extend their knowledge at run-time;
- Incremental Learning, i.e., ANNs do not require initial extensive
  training;
- Learning of complex non-linear functions, i.e., ANNs can detect in
  principle all non-linear relationships between input and output variables.
Both cycles are similar. The integrated cycle inserts the ML cycle in the
update block of the ABM cycle, i.e., the internal agent state is updated
based on the outcome of the learning cycle.
Fig. 10.5 The composition of learning agents by merging the operational Agent Based
Modelling and Machine Learning cycles (based on [RAN07])
Besides using learning to adapt agents and multi-agent systems to
environmental changes, agents can be used to perform adaptive learning for
classification, regression, and clustering tasks in a cooperative manner and by
using a divide-and-conquer approach, discussed in the next sections. For
example, neurons of ANNs can be mapped on agents, providing a dynamically
reconfigurable network. Connections between neuron agents can be realized
by simple directed communication.
10.5 Distributed Learning
Large scale sensor networks with hundreds and thousands of low-resource
sensor nodes require data processing concepts far beyond the traditional
centralized approach with peer-to-peer and request-reply interaction. Decen-
tralized mobile Multi-Agent systems can be used to implement smart and
optimized sensor data processing and machine learning in such distributed
sensor networks.
Commonly, sensing applications operate stream-based, i.e., the sensor
information is collected periodically by one or multiple dedicated nodes from
all sensor nodes, requiring high-bandwidth communication and resulting in
high power consumption. Frequently, most of the sampled sensor data do not
contribute new information about the sensing system; in a multi-sensor
system only a few sensors will change their data beyond a noise margin. For
example, if there is no change in the load of a mechanical structure,
there is no significant change in the sensor data set. Or a change of the load
situation results in a sensor data change in a spatially limited region, not
affecting other regions.
The machine learning algorithms presented in the previous sections are
implemented as centralized instances. Furthermore, learning with training
data is performed off-line with the entire training set and must be finalized
before the learned model can be applied for classification of unknown data.
In distributed sensor networks the input data used for the learning task is
inherently distributed and geometrically correlated. Instead of collecting the
entire sensor data by a central instance, a more useful and efficient approach
is to distribute the learning algorithm by creating multiple distributed learning
instances following this approach:
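$$
\begin{aligned}
M(D) &: h(S) \rightarrow l \\
h &: S \rightarrow l \\
D &: \{(S_1, l_1), (S_2, l_2), \ldots\} \\
S &: \begin{pmatrix} x_{1,1} & \cdots & x_{1,m} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,m} \end{pmatrix}
\end{aligned}
\;\xrightarrow{\text{Distribution}}\;
\begin{aligned}
m_{i,j}(d_{i,j}) &: h_{i,j}(s_{i,j}) \rightarrow l_{i,j} \\
h_{i,j} &: s_{i,j} \rightarrow l_{i,j} \\
K &: (l_{1,1}, l_{1,2}, \ldots, l_{n,m}) \rightarrow l \\
d_{i,j} &: \{(s_{i,j,1}, l_1), (s_{i,j,2}, l_2), \ldots\} \\
s_{i,j} &: \begin{pmatrix} x_{i-u,j-v} & \cdots & x_{i-u,j+v} \\ \vdots & \ddots & \vdots \\ x_{i+u,j-v} & \cdots & x_{i+u,j+v} \end{pmatrix}
\end{aligned}
\tag{10.7}
$$
with M as a global machine learner operating on global data D, m a local
machine learner operating on spatially bound local data d. Each distributed
machine learner m derives a hypothesis model h (predictor) operating on local
data d. In contrast to a centralized predictor, there are multiple predictions
that must be collected by a global instance K finally computing a global
prediction based on the local ones, e.g., by using majority election of votes
given by the localized predictors.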
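The decomposition of Equation 10.7 can be sketched in a few lines of Python. The sketch only shows how a local data window s_{i,j} could be cut out of the global sensor matrix S and passed to local hypotheses h_{i,j}. The function names, the clipping of windows at the network boundary, and the representation of hypotheses as plain callables are assumptions made for this illustration.

```python
def local_window(S, i, j, u=1, v=1):
    """Cut the local sensor matrix s_ij (rows i-u..i+u, columns j-v..j+v) out of S.

    Assumption: windows are clipped at the boundary of the sensor network.
    """
    n, m = len(S), len(S[0])
    return [
        [S[r][c] for c in range(max(0, j - v), min(m, j + v + 1))]
        for r in range(max(0, i - u), min(n, i + u + 1))
    ]

def local_predictions(S, hypotheses, u=1, v=1):
    """Apply each local hypothesis h_ij to its own window and collect labels l_ij.

    hypotheses maps a node position (i, j) to a callable taking the local
    window and returning a label (a hypothetical stand-in for a trained model).
    """
    return {
        (i, j): h(local_window(S, i, j, u, v))
        for (i, j), h in hypotheses.items()
    }
```

The dictionary returned by local_predictions corresponds to the set of local labels that the global instance K has to combine into one global prediction.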

10.5.1 Event-based Sensor Processing
Distribution of learning in sensor networks requires an event-based,
robust, and decentralized sensor data processing as the prerequisite for the
ML task [BOS16B]. That means:
1. An event-based sensor distribution behaviour is used to deliver sensor
   information from source sensor nodes to computation nodes based on local
   decision and sensor change prediction.
2. Adaptive path finding (routing) supports agent migration in unreliable
   networks with missing links or nodes by using a hybrid approach of
   random and attractive walk behaviour.
3. A self-organizing agent-based learning system with exploration,
   distribution, replication, and interval voting behaviours based on feature
   marking is used to identify a region of interest (ROI, a collection of
   stimulated sensors) and to distinguish sensor failures (noise) from
   correlated sensor activity within this ROI.
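The event-based processing described above can be sketched as follows. The sketch assumes a simple threshold test against a noise margin for event detection and treats a stimulated sensor without any stimulated neighbour as a suspected failure. Both rules and all function names are illustrative assumptions, not the feature marking scheme of [BOS16B].

```python
def changed_sensors(previous, current, noise_margin):
    """Return grid positions whose value changed beyond the noise margin."""
    return {
        (r, c)
        for r, row in enumerate(current)
        for c, value in enumerate(row)
        if abs(value - previous[r][c]) > noise_margin
    }

def correlated_activity(active):
    """Keep only stimulated sensors with at least one stimulated neighbour.

    Isolated activity is regarded as noise or sensor failure (assumption).
    """
    def has_active_neighbour(pos):
        r, c = pos
        return any((r + dr, c + dc) in active
                   for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    return {pos for pos in active if has_active_neighbour(pos)}
```

Only the positions returned by correlated_activity would form a candidate ROI; all other nodes stay silent and cause no communication.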
Fig. 10.6 The logical view of a sensor network with a two-dimensional mesh-grid topol-
ogy (left) and examples of the population with different mobile and immobile
agents (right): node, learner, explorer, and voting agents.
In many sensing applications like Structural Monitoring predicting a load sit-
uation or health conditions, sensor nodes are commonly arranged in some
kind of two-dimensional grid network (as shown in Figure 10.6). The sensor
nodes provide spatially resolved and distributed sensing information of the
surrounding technical structure, for example, a metal plate or a composite
material with attached strain gauge sensors. Usually a single sensor cannot
provide any meaningful information of the mechanical structure.
The sensor network can contain missing or broken links between neighbour
nodes. Immobile node agents are present on each node (i.e., node
service agents) performing sensor processing, agent control, and event
detection. Node agents on pure sensor nodes (yellow nodes in the inner square)
create learner agents performing regional learning and classification. Each
sensor node has a set of sensors attached to the node, e.g., two orthogonally
placed strain gauge sensors measuring the strain of a mechanical structure.
Spatially bound regions in the network, Regions of Interest (ROI), are used
to compute an event-based prediction and classification of the load case
situation using supervised machine learning. Mobile agents are used to collect
(perceive) and deliver sensor data, but only within the ROI, as shown in Figure
10.6 (explorer agents delivering neighbourhood sensor data to learner
agents).
10.5.2 Distributed Learning Algorithm DINN
According to Equation 10.7 the learning and prediction task is divided into
multiple independent learning instances operating on local data. Pattern
recognition (sensor events) identifies the aforementioned ROIs that activate
learning instances (either for training or prediction). In the prediction phase a
set of prediction results from individual learner instances is collected and a
final prediction result is obtained by majority voting, summarized in Algorithm
10.3.
Alg. 10.3 Basic DINN algorithm: A multi process view communicating signal events
global Process World:
  Create node processes {Node_1, .., Node_N}
Process Node_i:
  Create processes: {sensor_i, learner_i}
Process sensor_i:
  If there is a sensor change of this S_i Then Create explorer agent
Process explorer_i:
  Explore the neighbourhood around the origin S_i,
  collect sensor signals {S_i-n, .., S_i, .., S_i+n}.
  If there is a significant change of sensors in the neighbourhood
  Then SignalEvent(SENSORS, D_i = {S_i-n, .., S_i, .., S_i+n})
Process learner_i:
  Loop:
    WaitForEvent(SENSORS, D_i)
    If in mode Collecting Then Add new training data DS = DS + D_i
    Else If in mode Learning Then
      M = INN.createtree(DS)
    Else If in mode Predicting Then
      Result_i = INN.classify(M, D_i)
      SignalEvent(VOTE, Result_i)
global Process election:
  Loop:
    WaitForEvents(VOTE, Result_i)
    Make a voting decision: Majority wins!
    SignalEvent(PREDICTION, Result_most)

10.5.3 Distributed Learning with MAS
Figure 10.7 gives an overview of the composition of a complete sensor pro-
cessing and distributed learning system with different agent classes. The MAS
consists of the following agents:
Node Agent
Explorer Agent
Sensing Agent
Distributor Agent
Notification Agents
Learning Agent
Voting Agent
Election Agent
Some classes are super classes composed of sub-classes (e.g., the learner
and the explorer class). A sensor node is managed by the non-mobile node
agent, which creates and manages a sampling and sensing agent, responsible
for local sensor processing, and a learner agent, which is initially inactive.
The world class is only used in the simulation environment and has the
purpose to create and initialize the sensor network world and to control the
simulation using Monte Carlo techniques. The notify (todo) agents are injected
in the network to notify nodes and learner agents about the network mode,
i.e., whether it is in training mode and which training class (load situation)
is currently applied, or whether it is in classification mode. The notify
agents will replicate and diffuse in the network (divide and conquer
behaviour).
The event-based regional learning leads to a set of local prediction results,
which can differ significantly, i.e., the classification set can contain wrong
predictions. To filter out and suppress these wrong predictions, a global
majority vote election is applied. All nodes that performed a regional
classification send their result into the network, where all votes are
collected and an election is performed.
This election result is finally used for the load case prediction. The variance
of different votes can be an indicator for the trust of the election giving the
right prediction.
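A minimal sketch of such an election is given below. It returns the winning vote together with the fraction of votes supporting it. Using this fraction as a trust indicator is an assumption of the sketch and only a simple stand-in for the vote variance mentioned above; the function name elect is hypothetical.

```python
from collections import Counter

def elect(votes):
    """Majority election over local prediction results.

    Returns (winner, support), where support is the fraction of all votes
    given to the winner. Ties are resolved in favour of the vote that was
    received first (behaviour of Counter.most_common).
    """
    counts = Counter(votes)
    winner, n = counts.most_common(1)[0]
    return winner, n / len(votes)
```

For example, three votes for load case A and one vote for load case B yield the winner A with a support of 0.75; a support close to 1 indicates a consistent election.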
Fig. 10.7 Overview of different agent classes and sub-classes used for the sensor pro-
cessing and learning in the network and their relationships (grey solid arrow:
agent instantiation at run-time, light arrow: sub-class relationship). The world
agent is artificial and is only used in a simulation and handles the physical
world and the network.
[Figure 10.7 content: agent classes with their attributes, activities, and goals]
Learner. Attributes: radius, group, enoughinput, sensors, epsilon. Activities: wait, explore, learn, classify. Goals: Explore ROI neighbourhood, Learn ROI, Predict ROI, Vote.
Explorer. Attributes: dir, hop, maxhop, group, enoughinput, sensors. Activities: move, percept, explorefork, goback, deliver. Goals: Collect sensor data in ROI.
Voter. Attributes: dir, hops, target, vote. Activities: move, vote. Goals: Deliver vote to election nodes.
Node. Attributes: sensorA, sensorB, threshold. Activities: sense, sleep. Goals: Sensing, Sensor Preproc, Event Predict, Learner Manag.
World. Goals: Simulation Control, Create Node Agents, Database Access, Notification, ML Control.
Notify. Attributes: mode (Learn|Analyse), hop, maxhop, dir. Activities: move, notify, replicate. Goals: Information Distribution.
Multiplicities: N * N, 0..1, 1..4, 1..4, 0..n
Fig. 10.8 Simulation Results. The top figure shows the temporal agent population for a
long-time run with a large set of single training and classification runs, with a
zoom shown in the middle two figures. The bottom figure shows global
classification results obtained by majority voting of all event-activated
regional learner agents.
10.5.4 Distributed Learning: Case Study
To evaluate the distributed learning approach, an extensive MAS simulation
was performed. The simulation assumes a spatially two-dimensional sensor
network (see Figure 10.6 for details) with nodes arranged in a mesh grid con-
necting each node with up to four neighbour nodes. Each sensor node is
attached to a strain gauge sensor used to measure strain of an artificial plate.
The artificial sensor values were derived by inverse numerical computation
and transferred to the MAS simulation. Some simulation results are shown in
Figure 10.8. The agent population plots show the efficient data processing of
the event-based sensor processing and learning activities performed by the
agents.
Each learning/classification run causes communication costs of only about
0.5-1 MB in the entire network (using code compression). The agent
population reaches up to 400 agents (peak value, executed in the simulation
by one physical JAM node), and a logical JAM node is populated with up to 10
agents.
10.6 Incremental Learning
The Machine Learning approaches presented in the previous sections operate
in two sequential phases: (1) Learning (deriving the prediction model
with known training data) and (2) Application (performing the classification
using unknown data). Jiang [JIA13] showed that it is possible to perform
incremental learning at run-time using trees, which is very attractive for
agent and SoS approaches. A learned model (carried by the learner agent) is
used to map data vectors (of an input variable set x_1, x_2, ..; the feature
vector) on class values (of an output variable y). The model can be updated at
run-time by adding new training data or by updating the learned model by back
propagation and reinforcement learning. The classification tree consists of
nodes testing a specific feature variable, i.e., a particular sensor value,
creating a path to the leaves of the tree containing the classification result,
e.g., a mechanical load situation. Besides the distribution of the entire
learning problem, event-based activation of learning instances can improve the
system efficiency significantly, as shown in the previous section, and can be
considered as part of the distributed learning algorithm (a pre-condition).
Commonly, the locally sampled sensor values are used for an event prediction,
waking up the learner agent, which collects neighbourhood data by using a
divide-and-conquer system with explorer child agents.
Combining the previously introduced distributed learning approach with
incremental learning algorithms enables a self-adaptive learning system with
a feedback loop, suitable for sensor processing, e.g., by integrated sensor net-
works in structural monitoring or by wide-area sensor networks.
Fig. 10.9 The principle concept: Global knowledge based on majority decision is back-
propagated to local learning instances to update the learned model.
The run-time behaviour flow of such a decentralized learning system is
shown in Figure 10.9.
Basically there are two different classes of incremental learning algorithms:
(1) A learned model is updated with new training data sets by adding the new
sets to a stored database of old training sets; (2) A learned model is updated
with new training data sets but without a database of old sets.
Since agents should perform learning and classification in a distributed
manner, it is necessary to apply class (2) to minimize storage and
communication complexity. With class (1), an agent would have to store the
training data and the learned model. It is therefore preferable to store only
the learned model, which can be updated at run-time without saving the entire
history data.
10.6.1 Incremental Learning Algorithm I²NN
The new algorithm for the incremental updating of a learned model (decision
tree) with new training set(s) is shown in Figure 10.10 (defined in detail in
Algorithm 10.4). The initial model can be empty. The current decision tree can
be weakly structured for a new training set (new target), i.e., it can contain
variables unsuitable for the separation of the new data from the old data,
which can result in a classification of the new target with insignificant
variables. Therefore, if a new node is added to the tree, the last node is
expanded with an additional strong (most significant) variable of the new data
set (it is still a heuristic for future updates), i.e., creating an
overdetermined tree.
The decision tree (DT) to be constructed consists of Feature and Value
nodes, and Result leaves. First the current DT (model) is analysed. All feature
variables x and their value bounds found in the tree in Feature(x,vals)
nodes are collected in the featuresM set (a list of (x, lower bound, upper
bound) tuples). New feature variables added to the tree should not be
contained in this set. Now each new training set, consisting of data variables
x and a target result variable y, is applied to the DT. If the DT is empty, an
initial Feature node is created using the most significant data variable. As
mentioned before, another Feature node is added for future DT updates. If the
DT is not empty, the tree is iterated from the root node with the current
training data until a feature variable separation is found, i.e., a new
Value(x,..) node can be added with a non-overlapping 2ε-interval around the
current value of the variable x. If the current 2ε-interval of a value of a
feature variable overlaps an existing Value interval, this interval is expanded
with the new variable interval and the DT is entered one level deeper. If the
last Result leaf is found and its value is not equal to the current target
variable value, the update has failed.
This simple learning algorithm has a computational complexity of O(N) with
respect to the number of training sets N to be added, and for each data set
O(log n) with respect to the number of nodes n in the current tree, assuming a
balanced tree. The incremental learning is significantly simpler than the
entropy-based feature selection of the IDT algorithm.
A physical stimulus results in sensor activity at multiple positions in the sen-
sor network that is analysed by an event recognition algorithm (see previous
section for an explanation). If a sensor node detects a local sensor event the
local learner is activated. It performs either a learning of a new training set or
applies the learned model with the current data set consisting of ROI data. In
the case of a prediction, it will make a vote. After the election of all votes, the
result is back propagated to the network and all learners can update their
model with the current data set as a new training set.
Fig. 10.10 The processing flow of the new I²NN Algorithm
Alg. 10.4 Incremental Interval Decision Tree Learner Algorithm (I²NN) (x: feature
variable, y: output target variable, \underline{x}: lower bound of variable x,
\overline{x}: upper bound, v(x): value of x)
type node = Result(t) |
            Feature(x, vals: node[]) |
            Value(v, child: node)
type of dataset = (x_1, x_2, x_3, .., y)[]

function learnIncr(model, datasets, features, target, options) {
  y = target
  ε = options[ε]
  // Analyze the current model tree
  featuresM = {(x_i, \underline{x_i}, \overline{x_i}) | Feature(x_i) ∈ model}
  features' = {f ∈ features | f ∉ featuresM};

  // Create a root node
  function init(model, set) {
    f_1 = significantFeature(set, features)
    features' := {f | f ∈ features ∧ f ≠ f_1}
    f_2 = significantFeature(set, features')
    features' := {f | f ∈ features' ∧ f ≠ f_2}
    featuresM := featuresM ∪ {(f_1, v(f_1)-ε, v(f_1)+ε), (f_2, v(f_2)-ε, v(f_2)+ε)}
    model =
      Feature(f_1, [Value([v(f_1)-ε, v(f_1)+ε],
        Feature(f_2, [Value([v(f_2)-ε, v(f_2)+ε],
          Result(v(y)))]))])
  }

  // Iterate and update tree
  function update(node, set, feature) {
    when node is Result:
      if t(node) ≠ y(set) then Failure!
    when node is Feature:
      x = x(Feature)
      if set[x] not in featuresM(x) then
        // New target can be classified
        f_1 := significantFeature(set, features')
        featuresM := featuresM ∪ {(f_1, v(f_1)-ε, v(f_1)+ε)}
        // Extend interval
        \underline{featuresM(x)} := min(\underline{featuresM(x)}, set[x]-ε)
        \overline{featuresM(x)} := max(\overline{featuresM(x)}, set[x]+ε)
        leaf =
          Value([set[x]-ε, set[x]+ε],
            Feature(f_1, [Value([v(f_1)-ε, v(f_1)+ε],
              Result(v(y)))]))
        add leaf to vals(Feature)
      else
        // Go deeper in the tree, find an
        // overlapping value and extend the interval
        with val ∈ vals(Feature) | v(val) overlap set[x] do
          // Extend interval
          \underline{v(val)} := min(\underline{v(val)}, set[x]-ε)
          \overline{v(val)} := max(\overline{v(val)}, set[x]+ε)
          update(val, set, x(node))
    when node is Value:
      update(child(node), set)
  }

  // Apply all new training sets
  ∀ set ∈ datasets do:
    if model = ∅ then init(model, set)
    else update(model, set)

  return model
}
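The core idea of Algorithm 10.4 can be illustrated with a simplified Python sketch. It is not the reference implementation: the global featuresM bookkeeping is replaced by tracking the features not yet used on the current tree path, the significance measure (largest absolute value) is a purely hypothetical stand-in for significantFeature, and all names are chosen for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    label: str

@dataclass
class Value:
    lo: float
    hi: float
    child: object            # Feature or Result

@dataclass
class Feature:
    name: str
    vals: list = field(default_factory=list)

def significant_feature(sample, candidates):
    # Hypothetical stand-in: pick the candidate with the largest magnitude.
    return max(candidates, key=lambda f: abs(sample[f]))

def _chain(sample, label, remaining, eps, levels):
    """Build Feature -> Value -> ... -> Result with up to `levels` features."""
    if levels == 0 or not remaining:
        return Result(label)
    f = significant_feature(sample, remaining)
    rest = [g for g in remaining if g != f]
    v = sample[f]
    child = _chain(sample, label, rest, eps, levels - 1)
    return Feature(f, [Value(v - eps, v + eps, child)])

def _update(node, sample, label, remaining, eps):
    if isinstance(node, Result):
        if node.label != label:
            raise ValueError("update failed: conflicting target value")
        return
    rest = [g for g in remaining if g != node.name]
    lo, hi = sample[node.name] - eps, sample[node.name] + eps
    for val in node.vals:
        if val.lo <= hi and lo <= val.hi:        # overlapping 2*eps interval
            val.lo, val.hi = min(val.lo, lo), max(val.hi, hi)
            _update(val.child, sample, label, rest, eps)
            return
    # No overlap: the new target is separable here; add a branch that is
    # extended by one more feature for future updates.
    node.vals.append(Value(lo, hi, _chain(sample, label, rest, eps, 1)))

def learn_incr(model, sample, label, features, eps):
    """Update the model with one training set; the sample itself is not stored."""
    if model is None:
        return _chain(sample, label, list(features), eps, 2)
    _update(model, sample, label, list(features), eps)
    return model

def classify(node, sample):
    """Follow matching Value intervals down to a Result; None if nothing matches."""
    if node is None:
        return None
    while isinstance(node, Feature):
        v = sample[node.name]
        match = next((val for val in node.vals if val.lo <= v <= val.hi), None)
        if match is None:
            return None
        node = match.child
    return node.label
```

Each call of learn_incr consumes a single training set and keeps only the tree, which corresponds to class (2) of incremental learning algorithms, i.e., updating without a database of old sets.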
10.6.2 Distributed Incremental Learning Algorithm DI²NN
Like in the DINN algorithm, the I²NN learning and prediction algorithm is
distributed by applying local data to multiple learning instances and finally
applying majority voting to predictions of multiple instances. Learning
instances are activated by significant sensor changes (event-based learning
and predicting).
10.6.3 Distributed Incremental Learning MAS
The Multi-agent System consists basically of the same agent classes used in
the distributed non-incremental learning approach from the previous section.
In [BOS17C], a real sensor network consisting of seismic stations was
transformed to a two-dimensional mesh-grid, placing station nodes based on
a spatial neighbourhood relation to other stations, shown in Figure 10.11. The
sensor network is populated with different mobile and immobile agents.
Non-mobile node agents are present on each node. Sensor nodes create learner
agents performing regional learning and classification. Each sensor node has
a set of sensors attached to the node, e.g., vibration/acceleration sensors.
Agents interact with each other by exchanging tuples via the tuple space and
by sending signals. All agents were implemented in AgentJS, and allocate
about 1k-10k Bytes for the entire process including code and data. Some
agents, e.g., the explorer, are partitioned in a main class and smaller
sub-classes used only for the creation of child agents.
Node Agent
The immobile node agent performs local sensor acquisition and
pre-processing:
- Noise filtering;
- Validation of sensor integrity;
- Sensor fusion;
- Down sampling of sensor data and storing data in tuple
  space;
- ROI monitoring by sending out explorer agents (optional)
  performing a correlated cluster recognition;
- Energy Management;
- Activation of learner agent.
Explorer Agent
The explorer agent is used to collect sensor data in an ROI by performing a
divide-and-conquer approach with child agent forking. This approach
provides robustness against communication failures and weak node
connectivity.
- On the starting node, initially a set of explorer agents is sent
  out in all possible directions.
- Each explorer agent migrates to the neighbour node, collects
  and processes local sensor data, and sends out further child
  explorer agents to all neighbourhood nodes except the previous
  node.
- All child explorer agents collect sensor data, divide themselves
  until the boundary of the ROI is reached, and return to
  the parent agent and deliver the collected sensor data. The
  approach is redundant, and hence multiple explorer agents
  can visit one node. The first explorer on a new node stores a
  marking in the tuple space, notifying other explorers to return
  immediately.
- After all explorer agents have returned, the collected sensor matrix
  is delivered in the tuple space.
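The exploration behaviour can be approximated by a small simulation sketch. Parallel explorer agents moving in lockstep are modelled by a breadth-first queue, the tuple space markings by a shared set, and the ROI boundary by a maximal hop count. These modelling choices, the shared result dictionary (instead of agents returning to their parents), and all names are assumptions of this sketch.

```python
from collections import deque

DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # four mesh-grid neighbours

def explore(grid, origin, max_hop, broken_links=frozenset()):
    """Collect sensor values within max_hop hops around origin.

    broken_links holds frozenset({node_a, node_b}) pairs of unusable links.
    """
    rows, cols = len(grid), len(grid[0])
    marked = {origin}                              # markings left on visited nodes
    collected = {origin: grid[origin[0]][origin[1]]}
    agents = deque([(origin, None, 0)])            # (node, previous node, hops)
    while agents:
        node, previous, hop = agents.popleft()
        if hop == max_hop:
            continue                               # ROI boundary: go back
        for dr, dc in DIRECTIONS:
            nxt = (node[0] + dr, node[1] + dc)
            if nxt == previous:
                continue                           # never fork back to the sender
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if frozenset((node, nxt)) in broken_links:
                continue                           # missing or broken link
            if nxt in marked:
                continue                           # marking found: return at once
            marked.add(nxt)
            collected[nxt] = grid[nxt[0]][nxt[1]]
            agents.append((nxt, node, hop + 1))
    return collected
```

Because explorers fork in all directions, a node behind a broken link can still be reached on a detour as long as the hop limit permits it, which reflects the robustness argument made above.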
Learner Agent
The learner agent has the goal to learn a local classification model with data
from an ROI. It can operate in two modes: (1) Learning and (2) Classification
(Application).
- The learner sleeps after start-up until it is woken up by the
  node agent. The synchronization takes place via the tuple
  space by consuming a TODO tuple.
- Learning mode: The TODO tuple contains the target variable
  value. The learner sends out explorer agents to collect the
  sensor data (three sensors HHE, HHN, HHZ) in the ROI.
- If it uses the IDT algorithm, the learner stores the data set in
  its own database. The current training database is used to
  learn the model.
- If it uses the I²NN algorithm, the learner only updates the
  current model and discards the current training data.
- Application mode: The learner sends out the explorer agents
  to collect sensor data in the ROI. It uses this sample data for
  prediction. If the classification was successful, it will send out
  voter agents.
Distributor Agent
Besides the local learning approach, sensor data can be collected by central
instances, e.g., performing model-based seismic data analysis. A distributor
agent is used to deliver a set of sensor data from an ROI. The distributor
agent is activated only if there was a significant sensor change in the ROI.
The distribution process can be performed with different approaches:
- Peer-to-peer, i.e., the destination node is known or a path to
  the destination must be explored.
- Broadcasting, i.e., using divide-and-conquer with agent
  replication.
- Data sink driven, i.e., the distributor agent follows a path of
  markings (stored in the tuple space) to deliver the data along
  this path.
Voter Agent
The voter agent distributes a particular vote to election agents. There can
be multiple election agents in the network, hence a row-column network
distribution approach is used.
- The initial node agent sends out multiple voter agents in all
  possible directions (four directions in a mesh-grid network).
- If a voter agent reaches a boundary of the network, it will
  replicate and distribute the vote in perpendicular and opposite
  directions until a node with an election agent is found.
Election Agent
An election agent is some kind of central instance of the MAS.
- Collecting of votes delivered by voter agents within a time
  interval (after that votes are discarded)
- Back propagation of winner votes to the learner agents by
  sending out notification agents.
Notification Agent
The notification agents are sent out by some central instance to notify all
nodes that a new training set is available with a specific target variable
value, e.g., the parameters of an earthquake event (identifier, location,
magnitude, ..). One central instance can be the election agent that evaluates
votes in application mode. The winner vote fraction is carried by notification
agents to update learner agents.
- A divide-and-conquer approach with agent replication is used
  to broadcast the notification to all nodes in the network.
Disaster Management Agent
Although not considered in this work and currently not existing in the MAS,
disaster management agents are high level central instances in the network
that monitor the election results and the activity in the sensor network
and plan the co-ordination of disaster management accordingly. See [FIE07] for
a consideration of MAS-based disaster management.
10.6.4 Distributed Incremental Learning: Case Study
In [BOS17C] the MAS simulation was performed with data from a seismic
network to evaluate the incremental learning approach. The existing seismic
network used for monitoring earthquake events was mapped on a
two-dimensional mesh network, shown in Figure 10.11. Each node of the network
is considered as a sensor or computational node representing an agent
processing platform, which can be populated with mobile and immobile agents.
Station data from different earthquake events were used to compute the
sensor stimulus.
The seismic input data is high-dimensional. Therefore, a major challenge is
data reduction. The original test data contains temporally resolved seismic
data of at least three sensors (horizontal East, horizontal North, and vertical
acceleration sensors) with a time resolution of about 10 ms, resulting in a
very high-dimensional data vector.
Usually a seismic sensor samples only noise below a threshold level, mainly
resulting from urban vibrations and sensor noise itself. For machine learning,
only specific vibration activity inside a temporal Region of Interest (ROI) is
relevant.
Fig. 10.11 (Top, Left) The Southern California Seismic Sensor Network CI
[Google Maps]. (Top, Right) Sensor network with stations mapped on a logical
two-dimensional mesh-grid topology with spatial neighbourhood placing, and
example population with different mobile and immobile agents. (Bottom) The
station map.
The high-dimensional seismic data is reduced in three steps: (I) the data is
down sampled using absolute peak value detection; (II) a potential temporal
ROI is searched; and (III) the ROI data is down sampled again with a final
magnitude normalization and a 55-value string coding.
The process is shown in Figure 10.12. The compact 55-value string coding
assigns normalized magnitude values to the character range 0, a-z, and A-Z,
plus !, with 0 indicating silence and ! indicating overflow. If there are
multiple relevant nearby vibration events separated by "silence", a *
character is inserted in the string pattern as a separator to indicate the
temporal space between the single patterns.
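The three reduction steps can be sketched as follows. This is only an illustration under stated assumptions: the function names, the noise threshold, the full-scale value, the gap length that splits two patterns, and the linear mapping of magnitudes to the 52 letters are all hypothetical and are not taken from [BOS17C]. Only the alphabet (0, a-z, A-Z, !, *) and the default down sampling ratios follow the description above and Figure 10.12.

```python
import string

# 52 magnitude levels a-z, A-Z; "0" = silence, "!" = overflow, "*" = separator.
LEVELS = string.ascii_lowercase + string.ascii_uppercase


def peak_downsample(samples, factor):
    """(I)/(III): keep the absolute peak value of each block of `factor` samples."""
    return [max(abs(v) for v in samples[i:i + factor])
            for i in range(0, len(samples), factor)]


def find_rois(values, threshold, max_gap):
    """(II): index ranges with activity above `threshold`.

    Quiet runs of up to `max_gap` values stay inside one ROI (assumed rule).
    """
    rois, start, end, quiet = [], None, 0, 0
    for i, v in enumerate(values):
        if v > threshold:
            if start is None:
                start = i
            end, quiet = i + 1, 0
        elif start is not None:
            quiet += 1
            if quiet > max_gap:
                rois.append((start, end))
                start = None
    if start is not None:
        rois.append((start, end))
    return rois


def encode(values, threshold, full_scale):
    """Map magnitudes to characters (assumed linear scaling)."""
    chars = []
    for v in values:
        if v <= threshold:
            chars.append("0")          # silence
        elif v >= full_scale:
            chars.append("!")          # overflow
        else:
            idx = int((v - threshold) / (full_scale - threshold) * len(LEVELS))
            chars.append(LEVELS[min(idx, len(LEVELS) - 1)])
    return "".join(chars)


def reduce_signal(samples, threshold, full_scale, f1=16, f2=64, max_gap=4):
    """Full pipeline: down sample, clip ROIs, down sample again, encode."""
    coarse = peak_downsample(samples, f1)
    return "*".join(
        encode(peak_downsample(coarse[s:e], f2), threshold, full_scale)
        for s, e in find_rois(coarse, threshold, max_gap))
```

A station without any activity above the threshold yields an empty string in this sketch, so only nodes with a non-empty pattern would contribute a stimulus.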
RCT VES MLAC SPG ISA TIN CWC JRC2 CGO CLC MPM GRA SLA FUR SHO
PHL SMM TFT BAK WER MAG ARV TEH WBS LRL CCC GSC DSC TUQ LDF
SMB LCP FIG MPP MPI LJR LDR EDW2 EDW SBB2 LMR2 RRX NBS HEC NEE
SDP NJQ SYP SBC WGR OSI PDE BTP ALP LEV VCS LKL ADO VTV DAN
USB STC SIO SES MOP SMV QUG LFP RIN CHF TA2 LUG BBR JVA PDM
TOV AGO WSS NOT HLL DEC PAS CBC MIK KIK MWC BFS SBPX MCT IRM
LGU SPF DJJ GSA CRP CAC RUS RIO PDU FON CLT CFS HLN SVD BLA
SCZ2 SMS PDR LCG USC WTT LGB WLT OLI CHN MLS RVR RSB RSS BEL
LAF LTP DLA LBW1 BRE FUL SRN CRN PER BBS MSJ SLR DEV PLC MGE
RPV MIS FMP STS LLS SAN OGC STG PLS DGR AGA THX CTC BC3 BLY
SNCC SBI CIA KML SDD BCC GOR CAP PLM DNR BOR SAL NSS2 RXH SSW
SCI2 SDG SDR DPP OLP EML BAR JCS DVT ERR SWS WES DRE BTC GLA
Fig. 10.12 Data reduction of high-dimensional, temporally sparse data: (I) Down
sampling (1:16) with absolute peak value detection, (II) ROI analysis and ROI
clipping, (III) Down sampling (1:64) and scaling/normalization with 55-value
string coding (0,a-z,A-Z,!,*)
The vibration (acceleration) is measured in two perpendicular horizontal
directions and one vertical direction. This gives significant information for
earthquake recognition and localization.
The data reduction is performed by a node agent present on each seismic
measuring station platform. Only the compact string patterns are used as an
input for the distributed learning approach. Based on this data, the learning
system should give a prediction of an earthquake event and a correlation with
past events. To deploy regional learning for a spatial ROI, seismic stations
should be arranged in a virtual network topology with connectivity reflecting
spatial neighbourhood, e.g., by arranging all station nodes in a
two-dimensional network. The virtual links between nodes are used by mobile
agents as exploration and distribution paths. They do not necessarily reflect
the physical connectivity of the station nodes.
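One simple way to obtain such a neighbourhood-preserving placement is sketched below. The placement rule (sort by latitude into rows, then by longitude within each row), the function name, and the station names and coordinates are assumptions made for illustration; the source does not state how the mapping in Figure 10.11 was computed.

```python
def map_to_mesh(stations, columns):
    """Place stations on a logical mesh grid (hypothetical placement rule).

    stations -- dict: station name -> (latitude, longitude)
    columns  -- number of mesh columns
    Returns a dict: station name -> (column, row), row 0 being the northernmost.
    """
    # Sort from north to south, then cut the sequence into rows.
    north_to_south = sorted(stations, key=lambda name: -stations[name][0])
    placement = {}
    for row, first in enumerate(range(0, len(north_to_south), columns)):
        band = north_to_south[first:first + columns]
        # Within one row, order the stations from west to east.
        band.sort(key=lambda name: stations[name][1])
        for column, name in enumerate(band):
            placement[name] = (column, row)
    return placement


# Made-up coordinates, for illustration only.
example = {"N1": (35.0, -118.0), "N2": (35.1, -117.0),
           "N3": (34.0, -118.5), "N4": (33.9, -116.0)}
grid = map_to_mesh(example, columns=2)
```

Stations that are close in space tend to end up in neighbouring mesh cells, which is all the virtual topology needs; the resulting links remain independent of the physical connectivity.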
The evaluation of the distributed incremental learner is performed by
applying the data of different earthquake events to the distributed system in a
random sequence. The learned prediction model should classify earthquake
events and should recognize similar events for disaster management. The
distributed incremental learning algorithm DI²NN is compared with results from
non-incremental learning (using the distributed DINN algorithm).
[Fig. 10.12 panels: (I) Magnitude [arb. units] over Time [s]; (II) Magnitude [arb. units] over Time [s]; (III) Code [sca. units] over Time [seg. units]]
Fig. 10.13 Comparison of distributed non-incremental (IDT=DINN) with incremental
(I2DT=DI²NN) learning and additional comparison of weighted (w) and not
weighted votes using the same training data [BOS17C]
Figure 10.13 shows results obtained from simulation, which indicate a high
prediction quality of the incremental learner compared with the
non-incremental learner.
All predictions are based on global majority election and Monte Carlo
simulation. Multiple learning runs were performed to train the network using a
random sequence of different earthquake events with noisy data (complete
set). During the classification (application) phase, a random sequence of noisy
seismic data was applied, too. All earthquake events can be recognized with a
high confidence and prediction accuracy (using the complete set). Some tests
were made with incomplete training sets (last three rows in Figure 10.13) to
find similar events.
The incremental DI²NN learner algorithm was about 200% faster than the
non-incremental DINN algorithm with the same training data sets (accumulated
computation time observed in the entire network with all participating
learners). The accuracy of the incremental learner is comparable to that of
the DINN learner. The confidence of the election result can be slightly
improved if weighted votes are processed, i.e., the local accumulated sensor
data is used as a weight, so that nodes with a high stimulus dominate the
election. However, the DI²NN does not profit from vote weighting.
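The difference between plain and weighted majority election can be made concrete with a small sketch. It assumes that each vote is a pair of a predicted event label and the local accumulated stimulus of the voting node, and that the confidence of the election is the winner's share of the total tally. The function name, the vote representation, and the example values are illustrative assumptions, not the election agent of the MAS.

```python
from collections import defaultdict


def elect(votes, weighted=False):
    """Majority election over (label, stimulus) votes (hypothetical helper).

    Returns (winner, fraction), where fraction is the winner's share of the
    total tally; (None, 0.0) if there is nothing to count.
    """
    tally = defaultdict(float)
    for label, stimulus in votes:
        # Weighted mode: a node with a high stimulus counts more.
        tally[label] += stimulus if weighted else 1.0
    total = sum(tally.values())
    if total == 0:
        return None, 0.0
    winner = max(tally, key=tally.get)
    return winner, tally[winner] / total


# Made-up votes: two strongly stimulated nodes vote for event "A",
# three weakly stimulated nodes vote for event "B".
votes = [("A", 0.9), ("A", 0.8), ("B", 0.1), ("B", 0.1), ("B", 0.2)]
plain = elect(votes)                    # every vote counts once
by_stimulus = elect(votes, weighted=True)
```

In this constructed example the plain election is decided by the number of voters, whereas the weighted election is decided by the nodes with the strongest stimulus.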
The transition from learning to prediction is seamless and is based on the
node/learner experience (learned events). Furthermore, after an event is
elected by the majority decision, this result can be back propagated to the
learner, which adds the new data set as a new training set and performs
incremental learning to further improve the prediction accuracy. A typical
learning and ROI exploration run in the entire network requires about 3-5 MB
total communication cost if code compression is enabled, which is a
reasonably low overhead (with a peak value of about 500-1000 mobile explorer
agents operating in the network). Vote distribution produces only a low
additional communication overhead (less than 1 MB in the entire network). The
usage of I²NN lowers the entire communication costs by about 30% compared with
the INN approach (due to the data base).
10.7 Further Reading
1. P. Attewell and D. B. Monaghan, Data mining for the social sciences: an
introduction, University of California Press, 2015, ISBN 9780520280977
2. T. Mueller, A. G. Kusne, and R. Ramprasad, Machine learning in materials
science: Recent progress and emerging applications, Reviews in
Computational Chemistry, Volume 29, First Edition, 2016
3. L. Rokach and O. Maimon, Data Mining with Decision Trees - Theory and
Applications, World Scientific Publishing, 2015, ISBN 9789814590075
4. J. Bell, Machine Learning - Hands-On for Developers and Technical
Professionals, John Wiley & Sons, Ltd, 2015, ISBN 9781118889060
5. T. Dietterich, C. Bishop, M. J. Heckermann, and M. Kearns, Introduction
to Machine Learning, Second Edition, MIT Press Cambridge, 2009, ISBN
9780262012430
6. T. M. Mitchell, Machine Learning, McGraw Hill, 1997, ISBN 0070428077
7. C. R. Farrar and K. Worden, Structural Health Monitoring: A Machine
Learning Perspective, Wiley-Interscience, 2013, ISBN 9781119994336