To be able to mimic biological behaviour, ART neural networks emphasize unsupervised learning and self-organization. Unsupervised learning means that the network learns the significant patterns on the basis of the inputs only. There is no feedback and no external teacher that tells the network to which category a certain input belongs. Other types of learning are reinforcement learning and supervised learning. In reinforcement learning the net receives only limited feedback, like "on this input you performed well" or "on this input you have made an error". In supervised mode a net receives the correct response for each input. Unsupervised learning is the substrate on which the other types of learning are based. Learning in biological systems always starts as unsupervised learning, since for the newborn hardly any pre-existing categories exist. A system that can learn in unsupervised mode can always be adjusted to learn in the other modes, like reinforcement mode or supervised mode. However, a system specifically designed to learn in supervised mode can never perform in unsupervised mode. Needless to say, in unsupervised mode we cannot have separate training and performance phases, because this would imply the presence of a homunculus that knows when to switch phases. Self-organization means that the system must be able to build stable recognition categories in real time.
These design constraints have led to a series of real-time ART neural network models for unsupervised category learning and pattern recognition. Model families include ART1, which can stably learn to categorize binary inputs presented in an arbitrary order (Carpenter & Grossberg, 1987b); ART2, which can stably learn to categorize either analog or binary data (Carpenter & Grossberg, 1987); and ART3, which can carry out parallel search of distributed recognition codes in a multilevel network hierarchy (Carpenter & Grossberg, 1990). The Fuzzy ART model (Carpenter et al., 1991) is based on fuzzy logic computations and incorporates the ART1 model, since the computations of fuzzy set theory reduce to binary computations when the fuzzy variables become binary valued.
Besides the networks described above, which are based on unsupervised learning, supervised network architectures such as ARTMAP have been developed that incorporate one or more of these unsupervised ART modules (Carpenter et al., 1991a). Figure 1 shows a block diagram of such a system. In supervised mode, mappings are learned between input vectors a and b. Familiar examples of supervised neural networks are feedforward networks with backpropagation of errors (BP networks, Weenink, 1992). Supervision, however, is their only similarity with ARTMAP networks. ARTMAP networks are self-stabilizing, while in BP networks new information gradually washes away old information. As a consequence, a BP network has separate training and performance phases, while ARTMAP systems perform and learn at the same time. Moreover, ARTMAP networks are designed to work in real time, while BP networks are typically designed to work off-line, at least during their training phase. Another difference is that ARTMAP systems can learn in both a fast and a slow match configuration, whereas BP networks can only learn in a slow mismatch configuration. This means that an ARTMAP system learns, i.e., adapts its weights, only when the input matches an established category, while BP networks learn when the input does not match an established category. In BP networks there is always the danger of the system getting trapped in a local minimum, while this cannot happen in ART systems. However, in systems based on ART modules, learning may depend on the order in which the input patterns are presented.
Category ART, which we introduce here, is a specialized fast algorithmic variant of the ARTMAP class of neural network architectures. It performs incremental supervised learning of recognition categories in response to input vectors presented in arbitrary order. Under supervised learning conditions, Category ART's internal control mechanisms create stable recognition categories by maximizing predictive generalization while minimizing predictive error, just as the ARTMAP architectures do.
Category ART differs from the ARTMAP architecture of figure 1 in several ways: only one ART module is present and the map field has disappeared. Instead, a simpler algorithm replaces the dynamics of both components. The dynamics of the network, however, are still based on Adaptive Resonance Theory.
Originally, all learning equations in ART systems were written in the language of real-time systems, i.e., as differential equations. In our implementation, as in most of the algorithmic variants discussed above, steady-state approximations are used that capture the essence of these dynamic equations. Hence we need neither integration methods nor differential equations in the formulation of the dynamics of Category ART.
All ART systems incorporate some basic features, notably pattern matching between bottom-up input and top-down learned prototype vectors. This matching leads either to a resonant state that focuses attention and triggers stable prototype learning, or to a self-regulating parallel memory search. This search ends in one of two ways. First, if an established category is selected, its prototype may be refined to incorporate new information from the input pattern. When an input matches an established category in this way, we speak of resonance. The resonant state persists long enough for learning to occur; hence the term adaptive resonance theory. Second, if the search ends by selecting a previously untrained node, a new category is learned. The criterion for an acceptable match is defined by a dimensionless parameter called vigilance. Vigilance weighs how close an input must be to the top-down prototype for resonance to occur. Because the vigilance parameter can vary across learning trials, a single ART system is able to encode widely differing degrees of generalization. Low vigilance leads to broader generalization and more abstract prototypes than high vigilance. In the limit of very high vigilance, prototype learning reduces to exemplar learning.
With the help of the diagrams in figure 3, we will now follow a typical ART search cycle in detail. Not shown in this figure is the preprocessing field F0, whose main purpose is the normalization of the input pattern.
(a) After preprocessing by field F0, an input pattern I generates a pattern of activity X at field F1. The 2/3 rule is satisfied here because input I also activates the gain control at the F1 level. The activation of the gain control is nonspecific: it does not depend on the particular pattern, only on its overall input activity. Pattern X both inhibits A and generates an output signal S from field F1. Inhibition of A is necessary because otherwise a reset of field F2 would occur. The signal S is multiplied by the bottom-up connection weights, which results in a signal T that is input to the F2 level. The signal T produces an output Y from level F2. Here, too, the 2/3 rule is obeyed, because the input signal I also nonspecifically activates the gain control for the F2 level. The signal Y could, for example, result from the activation of the node(s) whose connection weights best matched the signal S.
(b) The pattern Y now generates a top-down signal pattern U which, after multiplication by the top-down connection weights, results in the prototype pattern V. This prototype pattern V is compared with the input pattern I at F1. The result of this comparison is a new pattern of activity X* at F1. If V mismatches I at F1, the total activity of X* will have dropped significantly. As a result of this reduction in total activity, A receives less inhibition.
(c) If the vigilance criterion at A now fails to be met, A releases a nonspecific signal to F2 that inhibits the most active F2 nodes. As a result, the signal Y is reset, together with the feedback signal U and its prototype V.
(d) Pattern X is reinstated at F1 and a different STM pattern Y* becomes active at F2. If the top-down prototype due to Y* also mismatches I at F1, the search for an appropriate code continues until either a prototype is found that satisfies the matching criterion at A, or a new category is established at a previously uncommitted node.
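The four steps (a)-(d) can be condensed into a small search loop. The sketch below is a hypothetical illustration for binary patterns only, not the implementation used here. It assumes that the F1 activity during the top-down comparison is X* = I AND V, that the vigilance test at A is |X*| / |I| >= rho, and that F2 nodes are ranked by an assumed bottom-up score |I AND w| / (alpha + |w|). The names art_search, prototypes and alpha are illustrative.

```python
import numpy as np

def art_search(I, prototypes, rho, alpha=0.001):
    """Hypothetical sketch of the ART search cycle (a)-(d) for binary patterns.

    I          -- binary input pattern (must contain at least one 1)
    prototypes -- binary top-down prototypes, one per committed F2 node
    rho        -- vigilance in [0, 1]
    alpha      -- small constant in the assumed bottom-up score
    Returns the index of the resonating node, or None when every committed
    node has been reset and a new category must be established.
    """
    I = np.asarray(I, dtype=bool)
    protos = [np.asarray(w, dtype=bool) for w in prototypes]
    reset = set()                          # F2 nodes inhibited by A so far
    while len(reset) < len(protos):
        # (a) bottom-up signal T: the most active F2 node that is not reset wins
        scores = {j: (I & w).sum() / (alpha + w.sum())
                  for j, w in enumerate(protos) if j not in reset}
        J = max(scores, key=scores.get)
        # (b) prototype V is compared with I at F1: X* = I AND V
        x_star = I & protos[J]
        # (c) enough F1 activity is left: vigilance criterion met, resonance
        if x_star.sum() / I.sum() >= rho:
            return J
        # (c)-(d) otherwise A resets node J and the search continues
        reset.add(J)
    return None

protos = [[1, 1, 0, 0], [1, 0, 0, 0]]
print(art_search([1, 0, 1, 0], protos, rho=0.9))   # None: both nodes are reset
print(art_search([1, 0, 1, 0], protos, rho=0.5))   # 1
```

Raising rho makes resets more frequent, so more inputs end up at new nodes; in the limit rho = 1 every distinct pattern gets its own node.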
In what follows we describe how the ideas of this section can be implemented as an algorithm for Category ART. However, before we can explain the supervised Category ART algorithm, we first have to explain how a basic ART module works. As an example we take the Fuzzy ART module for unsupervised classification, which will later be incorporated in the Category ART model.
Three parameters determine the dynamics of a Fuzzy ART network: a choice parameter $\alpha > 0$, a learning rate parameter $\beta \in [0, 1]$ and a vigilance parameter $\rho \in [0, 1]$. The influence of these parameters on the network dynamics will be explained in the following paragraphs.
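Before it reaches F1, an input vector $a = (a_1, \ldots, a_M)$ with components in $[0, 1]$ is complement coded, i.e., it is transformed into the 2M-dimensional vector
$$I = (a, a^c) = (a_1, \ldots, a_M, 1 - a_1, \ldots, 1 - a_M).$$
With the norm defined as $|p| = \sum_i |p_i|$, we then get for the norm of I
$$|I| = \sum_{i=1}^{M} a_i + \sum_{i=1}^{M} (1 - a_i) = M,$$
so that all inputs are normalized to the same norm.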
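Complement coding is easy to check numerically. The sketch below assumes inputs already scaled to [0, 1] and uses the city-block norm |p| = sum of |p_i|; complement_code and norm are hypothetical helper names. Whatever the input, the norm of the coded vector equals M, the dimension of a.

```python
import numpy as np

def complement_code(a):
    """Map a in [0, 1]^M to I = (a, 1 - a) in [0, 1]^(2M)."""
    a = np.asarray(a, dtype=float)
    return np.concatenate([a, 1.0 - a])

def norm(p):
    """City-block norm |p| = sum_i |p_i|."""
    return float(np.abs(p).sum())

for a in ([0.2, 0.7, 1.0], [0.0, 0.0, 0.0], [0.5, 0.9, 0.1]):
    I = complement_code(a)
    print(len(I), round(norm(I), 10))   # 6 3.0 for every a: |I| = M = 3
```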
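When a pattern I is presented at field F1, a choice function $T_j$ is defined for each F2 node j according to the formula
$$T_j(I) = \frac{|I \wedge w_j|}{\alpha + |w_j|},$$
where $\alpha$ is the choice parameter, $w_j$ is the weight vector (template) of node j, and $\wedge$ is the fuzzy AND operator, defined as
$$(p \wedge q)_i = \min(p_i, q_i).$$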
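A direct transcription of the choice function and the fuzzy AND follows, as a sketch with hypothetical names fuzzy_and and choice. It assumes nonnegative weights, so that |w_j| is simply the sum of the components. The last lines check that the fuzzy AND coincides with the Boolean AND on binary vectors.

```python
import numpy as np

def fuzzy_and(p, q):
    """Fuzzy AND: (p ^ q)_i = min(p_i, q_i)."""
    return np.minimum(p, q)

def choice(I, w, alpha):
    """Choice function T_j = |I ^ w_j| / (alpha + |w_j|) for nonnegative vectors."""
    return fuzzy_and(I, w).sum() / (alpha + np.sum(w))

I = np.array([0.3, 0.6, 0.7, 0.4])
print(choice(I, np.ones(4), alpha=0.001))   # all-ones template: |I| / (alpha + 4)

# On binary vectors the fuzzy AND equals the Boolean AND.
p = np.array([1, 0, 1, 1])
q = np.array([1, 1, 0, 1])
assert np.array_equal(fuzzy_and(p, q), p & q)
```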
The fuzzy AND operator reduces to the Boolean AND operator in the case of binary vectors. The system is said to make a category choice when at most one F2 node can become active at a given time. The F2 node with maximum $T_j$ is chosen to represent the pattern I, and when the Jth category node is chosen, the output vector y of field F2 is set to $y_J = 1$ and $y_j = 0$ for $j \neq J$. In a choice system, the F1 activity vector x obeys the equation
$$x = \begin{cases} I & \text{if F2 is inactive,} \\ I \wedge w_J & \text{if the Jth F2 node is chosen.} \end{cases}$$
If the chosen category J meets the vigilance criterion, that is, if
$$\frac{|I \wedge w_J|}{|I|} \geq \rho,$$
then learning can occur. Mismatch reset occurs when the vigilance criterion is not met; a new node is then chosen, and this search process continues until the chosen node satisfies the vigilance criterion. The search order among the nodes in the F2 layer depends on the choice parameter $\alpha$. For small $\alpha$ we have $T_j \approx |I \wedge w_j| / |w_j|$, so the search is dominated by the nodes with the largest ratio $|I \wedge w_j| / |w_j|$ rather than by the size of $|I \wedge w_j|$ alone. For larger values of $\alpha$, $T_j \approx |I \wedge w_j| / \alpha$, and the nodes for which $|I \wedge w_j|$ is large dominate the search. We can now establish the following order of preference among the F2 nodes that may be chosen when an input pattern I is presented at the F1 layer (Huang et al., 1995):
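(a) If there is a subset node, it will be chosen over an uncommitted node. A subset node has a template $w_j$ whose components satisfy $w_{ji} \leq I_i$ for all i, i.e., $w_j$ is a fuzzy subset of I. This means that for a subset node $|I \wedge w_j| = |w_j|$, and hence
$$T_j = \frac{|w_j|}{\alpha + |w_j|}.$$
(b) Because the choice parameter $\alpha > 0$, this $T_j$ increases with $|w_j|$, so among all the subset nodes the node with the largest template $w_j$, i.e., the largest $|w_j|$, will be chosen first.
(c) An uncommitted node, whose weights all equal 1 so that its choice value is $|I| / (\alpha + 2M) = M / (\alpha + 2M)$, will be chosen whenever there are no subset nodes and all committed nodes j satisfy
$$\frac{|I \wedge w_j|}{\alpha + |w_j|} < \frac{M}{\alpha + 2M}.$$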
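The hierarchy (a)-(c) and the influence of α on the search order can be checked on a small example. All numbers in the sketch below are illustrative and chosen by hand, and the uncommitted node is assumed to have all weights equal to 1, as in standard Fuzzy ART. The first part checks that a larger subset node beats a smaller one and that both beat the uncommitted node; the second part shows the order of two nodes flipping as α grows.

```python
import numpy as np

ALPHA = 0.001

def choice(I, w, alpha=ALPHA):
    """Choice value T_j = |I ^ w_j| / (alpha + |w_j|)."""
    return np.minimum(I, w).sum() / (alpha + np.sum(w))

I = np.array([0.3, 0.6, 0.7, 0.4])            # complement coding of a = (0.3, 0.6)
uncommitted = np.ones_like(I)                  # assumption: uncommitted weights are 1
big_subset = np.array([0.2, 0.5, 0.6, 0.3])    # w_ji <= I_i for every i
small_subset = np.array([0.1, 0.4, 0.5, 0.2])  # also a subset node, smaller |w_j|

for w in (big_subset, small_subset):
    assert np.all(w <= I)
    # For a subset node |I ^ w_j| = |w_j|, so T_j = |w_j| / (alpha + |w_j|).
    assert np.isclose(choice(I, w), w.sum() / (ALPHA + w.sum()))

M = I.size // 2
T_unc = choice(I, uncommitted)                 # equals M / (alpha + 2M)
assert np.isclose(T_unc, M / (ALPHA + 2 * M))

# (a) and (b): larger subset node first, then the smaller one, then uncommitted.
print(choice(I, big_subset) > choice(I, small_subset) > T_unc)   # True

# Effect of alpha: node p has ratio |I ^ w| / |w| = 1 but a small |I ^ w|,
# node q has a larger |I ^ w| but a smaller ratio (illustrative vectors).
I2 = np.array([0.5, 0.5, 0.5, 0.5])
p = np.array([0.5, 0.0, 0.0, 0.0])
q = np.array([0.9, 0.9, 0.9, 0.9])
print(choice(I2, p, 0.001) > choice(I2, q, 0.001))   # True: small alpha favours the ratio
print(choice(I2, p, 10.0) < choice(I2, q, 10.0))     # True: large alpha favours |I ^ w|
```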
In our implementation of the Fuzzy ART algorithm we have changed this biologically oriented blind search. Mainly for reasons of efficiency, we maintain separate lists of committed and uncommitted nodes to speed up the search process.
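One way to replace the blind search by a search over an explicit list of committed nodes is sketched below. This is an assumption about how such a list could be used, not necessarily how our implementation does it. Committed templates are visited in order of decreasing T_j, and the loop stops as soon as T_j drops below the choice value of an uncommitted node (weights all 1), because in the blind search the uncommitted node would win from that point on; this keeps the outcome consistent with rule (c). The helper names categorize and f1_activity are hypothetical.

```python
import numpy as np

def f1_activity(I, w_J=None):
    """x = I while F2 is inactive, x = I ^ w_J once node J is chosen."""
    return I if w_J is None else np.minimum(I, w_J)

def categorize(I, committed, rho, alpha=0.001):
    """Search over an explicit list of committed templates (hypothetical helper).

    Returns the index of the resonating committed node, or len(committed)
    to signal that the next uncommitted node is recruited.
    """
    I = np.asarray(I, dtype=float)
    T = [np.minimum(I, w).sum() / (alpha + np.sum(w)) for w in committed]
    T_unc = I.sum() / (alpha + I.size)        # uncommitted node: |I| / (alpha + 2M)
    for j in sorted(range(len(committed)), key=lambda j: -T[j]):
        if T[j] < T_unc:
            break                             # the uncommitted node would win here
        if f1_activity(I, committed[j]).sum() / I.sum() >= rho:   # vigilance test
            return j
        # mismatch reset: node j is skipped and the search continues
    return len(committed)

I = [0.3, 0.6, 0.7, 0.4]                      # complement coding of a = (0.3, 0.6)
near = [0.2, 0.5, 0.6, 0.3]                   # match |I ^ w| / |I| = 0.8
print(categorize(I, [near], rho=0.7))         # 0: resonance with the committed node
print(categorize(I, [near], rho=0.9))         # 1: mismatch reset, new node recruited
```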
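When the chosen node J satisfies the vigilance criterion, its weight vector is updated according to
$$w_J^{(new)} = \beta \, (I \wedge w_J^{(old)}) + (1 - \beta) \, w_J^{(old)}.$$
When $\beta = 1$ we speak of fast learning. For efficient coding of noisy inputs, we choose the fast learning option when J is an uncommitted node, and then take $\beta < 1$ after the node has been committed. Because the weights of an uncommitted node all equal 1, so that $I \wedge w_J^{(old)} = I$, we then get $w_J^{(new)} = I$ the first time category J becomes active. After commitment, each update moves the weight vector toward $I \wedge w_J^{(old)}$, so that it becomes more aligned with the most recently coded input pattern.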
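The learning rule with the fast-commit, slow-recode option can be sketched as follows, assuming the weights of an uncommitted node are initialized to 1. The function names and the value beta_slow = 0.5 are illustrative only.

```python
import numpy as np

def update_weights(I, w_J, beta):
    """w_J(new) = beta * (I ^ w_J(old)) + (1 - beta) * w_J(old)."""
    I, w_J = np.asarray(I, dtype=float), np.asarray(w_J, dtype=float)
    return beta * np.minimum(I, w_J) + (1.0 - beta) * w_J

def learn(I, w_J, committed, beta_slow=0.5):
    """Fast commit (beta = 1) for a new node, slow recode (beta < 1) afterwards."""
    return update_weights(I, w_J, 1.0 if not committed else beta_slow)

I1 = np.array([0.3, 0.6, 0.7, 0.4])           # complement coding of (0.3, 0.6)
w = learn(I1, np.ones(4), committed=False)    # fast commit: w becomes I1
I2 = np.array([0.5, 0.4, 0.5, 0.6])           # complement coding of (0.5, 0.4)
w = learn(I2, w, committed=True)              # slow recode: halfway to I2 ^ w
print(w)                                      # approximately [0.3 0.5 0.6 0.4]
```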
We note that in both the ARTMAP and the Fuzzy ARTMAP implementations of Carpenter et al. (1991a, 1992), the map field vigilance parameter is ineffective because the output of the ARTb network, $y^b$, is always normalized to one.
In our Category ART algorithm, the second ART system, ARTb, whose only function is to form a category representation, and the map field are replaced by an ordered collection of category labels and an array of pointers. There is a pointer to a category label for each node of the F2 layer of the ARTa module.
The Category ART learning algorithm in pseudo code goes as follows:
for all (pattern p, categoryLabel c) in the training set
    learn (p, c)
endfor

learn (pattern p, categoryLabel c)
    if c not in categoryLabelList
        add c to categoryLabelList
    endif
    J = categorize p by ARTa network
    if J is a committed node and categoryLabelList[nodePointer[J]] != c
        temporarily increase vigilance          (match tracking)
        J = categorize p by ARTa network
        reset vigilance
    endif
    update_weights (w_J)
    nodePointer[J] = index of c in categoryLabelList
end learn
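The pseudocode can be turned into a compact runnable sketch. The version below is written for illustration and rests on several assumptions that the text does not fix: inputs in [0, 1] are complement coded, uncommitted weights equal 1, the Fuzzy ART search walks the committed-node list with the uncommitted-node cut-off of rule (c), and "temporarily increase vigilance" is implemented as in ARTMAP match tracking, i.e., vigilance is raised just above the current match value (by a hypothetical eps) and the search is repeated until the chosen node carries label c or a new node is committed. With matchtrack=False a node with the wrong label is simply relabeled. The predict method, which picks the committed node with the largest T_j, is likewise an assumption.

```python
import numpy as np

class CategoryART:
    """Sketch of Category ART: one Fuzzy ART module (ARTa) plus an ordered
    list of category labels and one label pointer per committed F2 node."""

    def __init__(self, rho=0.0, alpha=0.001, beta=1.0, matchtrack=True, eps=1e-6):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.matchtrack, self.eps = matchtrack, eps
        self.w = []              # templates of the committed F2 nodes of ARTa
        self.labels = []         # ordered collection of category labels
        self.node_pointer = []   # node_pointer[j]: index into self.labels

    @staticmethod
    def _code(a):
        a = np.asarray(a, dtype=float)
        return np.concatenate([a, 1.0 - a])          # complement coding

    def _choices(self, I):
        return [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]

    def _match(self, I, j):
        return np.minimum(I, self.w[j]).sum() / I.sum()

    def _categorize(self, I, rho):
        """Fuzzy ART search; returns len(self.w) for the next uncommitted node."""
        T = self._choices(I)
        T_unc = I.sum() / (self.alpha + I.size)     # uncommitted weights are all 1
        for j in sorted(range(len(self.w)), key=lambda j: -T[j]):
            if T[j] < T_unc:
                break                               # uncommitted node would win
            if self._match(I, j) >= rho:
                return j
        return len(self.w)

    def learn(self, a, c):
        I = self._code(a)
        if c not in self.labels:
            self.labels.append(c)
        k = self.labels.index(c)
        rho = self.rho                              # local copy: reset for next pattern
        J = self._categorize(I, rho)
        # Match tracking: while a committed node predicts the wrong label, raise
        # vigilance just above its match value and search again. The match of
        # the chosen node strictly increases, so the loop terminates.
        while self.matchtrack and J < len(self.w) and self.node_pointer[J] != k:
            rho = self._match(I, J) + self.eps
            J = self._categorize(I, rho)
        if J == len(self.w):                        # commit a new node: w_J = I
            self.w.append(I.copy())
            self.node_pointer.append(k)
        else:                                       # update the resonating node
            w = self.w[J]
            self.w[J] = self.beta * np.minimum(I, w) + (1.0 - self.beta) * w
            self.node_pointer[J] = k

    def predict(self, a):
        T = self._choices(self._code(a))
        return self.labels[self.node_pointer[int(np.argmax(T))]]

# Usage: an XOR-like problem with two labels.
data = [((0.1, 0.1), "A"), ((0.9, 0.9), "A"), ((0.1, 0.9), "B"), ((0.9, 0.1), "B")]
net = CategoryART(rho=0.0)
for epoch in range(3):
    for a, c in data:
        net.learn(a, c)
print([net.predict(a) for a, _ in data])   # ['A', 'A', 'B', 'B']
print(len(net.w))                          # 4 committed nodes
```

Repeating the match-tracking step, rather than re-categorizing once as in the pseudocode, is a deliberate choice of this sketch: with match tracking on, a committed node is then never relabeled.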
Because of the combination of match tracking and fast learning, a single ARTMAP system can learn a prediction for a rare event that differs from the prediction for the cloud of similar, frequent events in which the rare event is embedded. This also means that noise is eventually learned, since the system cannot know beforehand what constitutes the signal and what constitutes the noise.
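Each of the two spirals of the benchmark makes three complete turns in the plane and consists of 97 points, as shown in figure 4 above. The coordinates of the points of the two spirals are
$$(x_i, y_i) = \pm\,(r_i \sin\varphi_i,\; r_i \cos\varphi_i), \qquad i = 0, 1, \ldots, 96,$$
where
$$r_i = 6.5\,\frac{104 - i}{104}$$
and
$$\varphi_i = \frac{i\pi}{16}.$$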
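To regenerate the data, the sketch below assumes the widely used two-spirals parameterization (angle iπ/16 and radius 6.5(104 - i)/104 for i = 0, ..., 96, the second spiral being the point reflection of the first through the origin). The rescaling to [0, 1], needed before complement coding in Fuzzy ART, is a hypothetical choice of preprocessing; two_spirals is an illustrative name.

```python
import numpy as np

def two_spirals(n=97):
    """Two interleaved spirals with n points each (assumed parameterization).

    For n = 97 the angle runs from 0 to 6*pi, i.e. three complete turns.
    """
    i = np.arange(n)
    phi = i * np.pi / 16.0
    r = 6.5 * (104 - i) / 104.0
    s1 = np.column_stack([r * np.sin(phi), r * np.cos(phi)])
    X = np.vstack([s1, -s1])                 # spiral 2 = point reflection of spiral 1
    y = np.array([0] * n + [1] * n)          # category label: first or second spiral
    return X, y

X, y = two_spirals()
A = (X + 6.5) / 13.0                         # hypothetical rescaling into [0, 1]
print(X.shape)                               # (194, 2)
```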
A trivial solution with Category ART is obtained by setting the vigilance parameter $\rho = 1$. In this case the network learns all patterns in one epoch with 100% correct classification. However, the network then uses 194 category nodes, one node for each training pattern. This amounts to 970 parameters for the classification: the 194 × 4 connection weights of the F2 nodes in the Fuzzy ART module plus 194 category index pointers to either the first or the second spiral. In Fuzzy Category ART, four parameters in principle determine learning. The first three belong to the Fuzzy ART module: the choice parameter $\alpha$, the vigilance parameter $\rho$ and the learning parameter $\beta$. The fourth parameter, matchtrack, determines whether match tracking is on or off. The vigilance parameter and the matchtrack parameter have the largest influence on the number of categories and therefore on the number of weights. When match tracking is on, the network raises its vigilance level whenever a mismatch at the category index level occurs. The most effective strategy to reduce the number of categories is therefore to start with match tracking on and a very low vigilance, i.e., $\rho = 0$. We performed two series of runs with the vigilance parameter increasing from 0 to 1.0 in steps of 0.02, the learning parameter fixed at 1 and the choice parameter fixed at 0.001. The first series had match tracking on, the second had match tracking off. The results are displayed in figure 5. For all parameter combinations, the training of Category ART completed within 15 training epochs. With match tracking on, the classification was always 100% correct. The left plot in figure 5 shows the number of committed nodes as a function of the vigilance level. For vigilance levels below approximately 0.45, the number of committed nodes stays at the very low value of 36.
This plot shows a gradual increase in the number of committed nodes as the vigilance level increases to 0.96; for still higher vigilance levels, this number increases steeply. The maximum, 194, is reached at a vigilance level of 1.0. When match tracking is off, the percentage correct drops to 50% as the vigilance level is reduced, as the right plot in figure 5 shows. Because match tracking is off, the number of committed nodes drops much more steeply, ultimately to only two committed nodes when the vigilance level drops below 0.4.
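The parameter count of the trivial $\rho = 1$ solution and the size of the vigilance sweep follow from simple arithmetic. The sketch assumes that each two-dimensional point is complement coded into four components, which accounts for the four weights per F2 node.

```python
# Trivial rho = 1 solution: one F2 node per training pattern.
n_patterns = 2 * 97                  # two spirals of 97 points each
weights_per_node = 2 * 2             # 2-D point, complement coded to 4 components
n_params = n_patterns * weights_per_node + n_patterns   # weights + label pointers
print(n_params)                      # 970

# Vigilance values of the parameter sweep: 0, 0.02, ..., 1.0
vigilances = [round(0.02 * k, 2) for k in range(51)]
print(len(vigilances), vigilances[-1])   # 51 1.0
```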