Attrasoft Boltzmann machine (ABM)/chap. 5

5. Character Recognition

5.1 Character Generation
5.2 19x19 Character Recognition (1,023 Classes)
5.3 32 by 32 Character Recognition
5.4 19x19 Character Recognition (4,095 classes)
5.5 50x50 and 100x100 Image Recognition

5. Character Recognition

There are six character recognition problems. Five of them are packaged with ABM:

5 by 7 character recognition
8 by 8 character recognition
19 by 19 character recognition, 1023 classes
19 by 19 character recognition, 4095 classes
32 by 32 character recognition

Figure 8. Example Menu.

The sixth problem, 16 by 16 character recognition, has to be linked to ABM by the user (the data files are ocr16x16.trn and ocr16x16.rec). The parameters are:

Project        5x7     8x8     19x19 (1,023)    32x32

# Neurons      46      74      1,387            1,152
# Classes      11      10      1,023            127
# Training     46      50      1,023            127

Below, we discuss in detail the 19x19 project with 1,023 and 4,095 classes and the 32x32 project with 127 classes. There is a problem here, not for ABM but for the users: nobody can learn and memorize 1,000 characters in a few seconds. We avoid the problem by generating the characters mathematically, so that by looking at a character you can calculate its class instead of memorizing it.

5.1 Character Generation

The idea for generating these characters comes from a digital watch: all digits can be generated from the following frame:

11111111
1              1
1              1
1              1
1              1
1              1
1              1
1              1
1              1
11111111
1              1
1              1
1              1
1              1
1              1
1              1
1              1
1              1
11111111

The above pattern is an '8'. The frame generates 2^7 - 1 = 127 patterns (every combination of its seven line segments except the empty one). Some of them are not distinguishable; for example, you can find six versions of '1' in the frame (left 1, right 1, upper-left 1, lower-left 1, upper-right 1, and lower-right 1). The seven line segments are called A, B, C, D, E, F, G:

  A

B   D

  C

E   F

  G

Each character can be represented by a seven-bit string: gfedcba. For example,

0000001 means that segment A is present, the rest are not there;
0000011 means segments A and B are present;
. . . ;
1111111 means all segments are there, which is an '8'.

These characters are ordered by binary numbers from 0000001 to 1111111, giving 127 classes.
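The encoding above can be sketched in a few lines of Python. This is only an illustration, not ABM code: the function names `decode`, `class_number`, and `render` are hypothetical, and the frame geometry (8 columns by 19 rows, with A, C, and G as the top, middle, and bottom horizontal lines, B and E on the left, D and F on the right) is an assumption read off the frame drawn above.

```python
# Segment letters in bit order: the rightmost bit of "gfedcba" is segment A.
SEGMENTS = "ABCDEFG"

# Assumed frame geometry (not taken from ABM): 8 columns by 19 rows.
WIDTH, HEIGHT = 8, 19


def decode(bits):
    """Return the segments present in a seven-bit string written as gfedcba."""
    if len(bits) != 7 or set(bits) - {"0", "1"}:
        raise ValueError("expected seven characters, each 0 or 1")
    # Reverse the string so that position 0 corresponds to segment A.
    return [name for name, bit in zip(SEGMENTS, reversed(bits)) if bit == "1"]


def class_number(bits):
    """Read the seven-bit string as a binary number (1 to 127)."""
    return int(bits, 2)


def render(bits):
    """Draw the character as a list of text rows, one string per row."""
    present = set(decode(bits))
    grid = [[" "] * WIDTH for _ in range(HEIGHT)]
    # Horizontal segments occupy one full row each.
    for name, row in {"A": 0, "C": 9, "G": 18}.items():
        if name in present:
            grid[row] = ["1"] * WIDTH
    # Vertical segments occupy 8 rows in the left or right column.
    verticals = {"B": (0, 1), "D": (WIDTH - 1, 1),
                 "E": (0, 10), "F": (WIDTH - 1, 10)}
    for name, (col, top) in verticals.items():
        if name in present:
            for row in range(top, top + 8):
                grid[row][col] = "1"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    print(decode("0000011"), class_number("0000011"))
    print("\n".join(render("1111111")))
```

Calling `render("1111111")` draws the '8' frame, and `render("0000010")` draws one of the six indistinguishable '1' shapes (segment B only).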

To generate more characters, a larger frame has to be used. We will choose the following frame:

11111111   11111111
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
11111111   11111111
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
1                1                 1
11111111   11111111

There are 12 line segments, giving 2^12 - 1 = 4,095 patterns.
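The pattern counts for both frames follow from the same rule: each segment is either present or absent, and the empty pattern is excluded. A small check, with the hypothetical helper name `count_patterns`, enumerates the combinations directly and compares them with 2^n - 1:

```python
from itertools import product


def count_patterns(num_segments):
    """Count on/off combinations of the segments, skipping the all-off case."""
    return sum(
        1
        for choice in product((0, 1), repeat=num_segments)
        if any(choice)
    )


if __name__ == "__main__":
    for n in (7, 12):
        # The brute-force count agrees with the closed form 2^n - 1.
        assert count_patterns(n) == 2 ** n - 1
        print(n, "segments:", count_patterns(n), "patterns")
```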

5.2 19x19 Character Recognition (1,023 Classes)

The 1,023 characters are generated by the following basic lattice:

There are 12 line segments in the lattice. Depending on which of these line segments are present, we can generate 4,095 characters. The line segments are labeled A, B, C, ..., L as follows:

A H

B D I

C J

E F K

G L

Each character can be represented by a 12-bit string: lkjihgfedcba. For example,

00000 00000 01 means segment A is present; the rest are not there;
00000 00000 11 means segments A and B are present;
. . . ;
11111 11111 11 means all segments are there.

These characters are ordered by binary numbers from 00000 00000 01 to 11111 11111 11, giving 4,095 classes.

In the following, the line segments K and L will not be used, which leaves 10 segments and 2^10 - 1 = 1,023 classes. Each line segment is basically a straight line of 8 bits. In the above frame, segments A, C, G, . . . look alike. To distinguish the line segments, we add an extra bit to each line. The definitions of these line segments are:

A: 11111111
1

---------------------------------------------

C: 1
11111111

--------------------------------------------

G: 1
11111111

--------------------------------------------

H: 11111111
1

--------------------------------------------

J: 1
11111111

--------------------------------------------

B:      11         D: 11         E: 1
           1             1               1
           1             1               1
           1             1               1
           1             1               1
           1             1               1
           1             1               1
           1             1               11

....

The definitions of the remaining line segments can be found in the training file. The training and testing files are generated by clicking "Example/19x19 character". This command takes a few seconds.

The class assignment for each character depends on the value of the binary string. For example,

00000 00000 01 is class 1;
00000 00000 11 is class 3;

... ...

The formula to calculate class is:

class = A + B*2 + C*4 + D*8 + E*16 + F*32 + G*64 + H*128 + I*256 + J*512 + K*1024 + L*2048
K = 0
L = 0

The above formula means:

If line segment A is in the character, add 1, else add 0;
If line segment B is in the character, add 2, else add 0;
... ...
If line segment 'I' is in the character, add 256, else add 0.
... ...
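The class formula can be written out as a short Python sketch. The names `WEIGHTS` and `segment_class` are hypothetical and are not part of ABM; the weights are taken directly from the formula (A = 1, B = 2, ..., L = 2048), and K and L are rejected by default because they are fixed to 0 in this project.

```python
# Weight of each segment, taken from the class formula: A=1, B=2, ..., L=2048.
WEIGHTS = {letter: 1 << index for index, letter in enumerate("ABCDEFGHIJKL")}


def segment_class(segments, use_k_and_l=False):
    """Return the class number of a character given its segments, e.g. "AB"."""
    present = set(segments.upper())
    unknown = present - set(WEIGHTS)
    if unknown:
        raise ValueError("unknown segments: " + ", ".join(sorted(unknown)))
    if not use_k_and_l and present & {"K", "L"}:
        # In the 1,023-class project K = 0 and L = 0.
        raise ValueError("segments K and L are not used in this project")
    # Add the weight of every segment that is present; absent ones add 0.
    return sum(WEIGHTS[name] for name in present)


if __name__ == "__main__":
    print(segment_class("A"))           # segment A alone
    print(segment_class("AB"))          # segments A and B
    print(segment_class("ABCDEFGHIJ"))  # all ten segments in use
```

With K and L excluded, the largest possible class is 1 + 2 + ... + 512 = 1,023, which is where the number of classes in this project comes from.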

Example. Several artificial characters are given below:

Class 1: 11111111
1

---------------------------------------------

Class 4: 1
11111111

--------------------------------------------

Class 128: 11111111
1

--------------------------------------------

Class 3:

11111111
11
1
1
1
1
1
1
1

Two segments, A and B, form class 3.

--------------------------------------------

Class 7:

11111111
11
1
1
1
1
1
1
1 1
11111111

The training file contains all 1,023 classes. The neural net is organized accordingly: there are 1,026 class-neurons and 19 * 19 = 361 pattern-neurons, giving a total of 1,026 + 361 = 1,387 neurons. We use 1,026 class-neurons instead of 1,023 because the output-vector is printed 19 bits per line, and 1,387 neurons fill exactly 73 lines (19 * 73 = 1,387).
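The neuron count can be reproduced with a little arithmetic. The sketch below assumes the rule is "round the total up to the next full output line"; that rule is consistent with the numbers quoted in this chapter, but it is an inference rather than a documented ABM rule, and the function name `layout` is hypothetical.

```python
def layout(side, num_classes):
    """Return (class_neurons, total_neurons, output_lines) for a side x side grid.

    Assumption: the class-neurons are padded so that the whole output-vector
    fills complete lines of `side` bits each.
    """
    pattern_neurons = side * side
    minimum_total = pattern_neurons + num_classes
    # Ceiling division: number of full lines needed to hold every neuron.
    lines = -(-minimum_total // side)
    total = lines * side
    class_neurons = total - pattern_neurons
    return class_neurons, total, lines


if __name__ == "__main__":
    print(layout(19, 1023))  # 19x19 project with 1,023 classes
    print(layout(32, 127))   # 32x32 project with 127 classes
```

For the 19x19 project this gives 1,026 class-neurons, 1,387 neurons in total, and 73 lines; for the 32x32 project it gives 128, 1,152, and 36, matching the parameter table at the start of the chapter.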

The training time is about 5 seconds (400 MHz Pentium). ABM will randomly generate a few characters with some noise and ask the net to recognize them. The recognition time is 0.5 to 1 second per character (400 MHz Pentium). The recognition rate is 100%. You can regenerate other testing files by clicking "Examples/19x19 Character recognition" again.
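This manual does not say how ABM adds noise to the test characters. One common way to build a noisy test pattern, shown here purely as an assumption, is to flip a few randomly chosen bits of a clean 19 * 19 = 361-bit pattern. The names `add_noise` and `hamming_distance` are hypothetical.

```python
import random


def add_noise(pattern, flips, seed=None):
    """Return a copy of a 0/1 pattern with `flips` distinct bits inverted."""
    rng = random.Random(seed)
    noisy = list(pattern)
    # Choose distinct positions so that exactly `flips` bits change.
    for index in rng.sample(range(len(noisy)), flips):
        noisy[index] = 1 - noisy[index]
    return noisy


def hamming_distance(a, b):
    """Count the positions where two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    clean = [0] * 361                # an empty 19x19 pattern, for illustration
    clean[0:8] = [1] * 8             # one horizontal line of 8 bits
    noisy = add_noise(clean, flips=5, seed=1)
    print(hamming_distance(clean, noisy))
```

Here the noisy pattern differs from the clean one in exactly five positions, so the number of flips directly controls how hard the recognition test is.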

Click "Run/Classification" or "Run/Distribution" to run. Three possibilities exist for each testing pattern:

Correct classification: this is usually the case when the Boltzmann Machine assigns one predominant probability to one of the output possibilities.

Incorrect classification: this is usually the case when the Boltzmann Machine assigns a predominant probability to more than one configuration, or when the relative probability is "small". It is hard to define "small" here because it depends on the problem, but for a given problem an experienced user will know what is "small".

No classification: this is usually the case when the Boltzmann Machine can find little correlation in the test pattern, based on the current training. In such a case, "No classification" is printed in the output data file.

Note that a neural network can learn 1,000 characters in 5 seconds, faster than the human brain. You can argue that these characters are special, but we chose them because they are easy for humans. It makes no difference to ABM which set of characters is used. Try it and convince yourself.

5.3 32 by 32 Character Recognition

There are 127 classes:

class = A + B*2 + C*4 + D*8 + E*16 + F*32 + G*64.

The neural net is organized accordingly: there are 128 class-neurons and 32 * 32 = 1,024 pattern-neurons, giving a total of 1,024 + 128 = 1,152 neurons. We use 128 class-neurons instead of 127 because the output-vector is printed 32 bits per line, and 1,152 neurons fill exactly 36 lines (32 * 36 = 1,152).

The training-data-file has the 127 classes. The training time is about 3 seconds (400 MHz Pentium). ABM will randomly generate a few characters and ask the net to recognize them. The recognition time is less than 1 second per character. The recognition rate is 100%. You can regenerate other testing files by clicking "Examples/32x32 Character recognition" again.

5.4 19x19 Character Recognition (4,095 classes)

This is the same problem as the earlier 19x19 problem, except that all 12 line segments (including K and L) are used, so the number of classes increases to 4,095.

Note that a neural network can learn 4,000 characters in 20 seconds, faster than the human brain. It can recognize 1 of the 4,000 characters in 0.5 seconds. You can argue that these characters are special, but we chose them because they are easy for humans. It makes no difference to ABM which set of characters is used.

5.5 50x50 and 100x100 Image Recognition

Attrasoft provides several software packages for jpg/gif image recognition. They are:

Internet ImageFinder 3.3,
ImageFinder for Windows 95/98 3.4, and
ImageClassifier 3.1.

In this section, images are stored in text files. Attrasoft ImageClassifier 3.1 will convert any jpg/gif image into a 100x100 image in text format, which can be used by ABM.

To run the 50x50 example, click "Example/50 by 50 Binary Image Recognition". The data files are img50x50.trn and img50x50.rec. Images of a face, a rabbit, a boat, . . . are used to train the neural net. Users can add their own data to the end of the data files. Make sure you check the data file format before running. To check the format, click "Data/Test".

To run the 100x100 example, click "Example/100 by 100 Binary Image Recognition". The data files are img100x100.trn and img100x100.rec. Users can add their own data to the end of the data files. Make sure you check the data file format before running. To check the format, click "Data/Test".