Attrasoft PolyNet 
for Windows 95/98
Version 4.0 (7/1999)
2. Total Operation in 2-4 Clicks 
2.1 Total Operation in 2-4 Clicks 
2.2 Example: Wisconsin Breast Cancer Database 
2.2.1 Database Source 
2.2.2 Training and Recognition Files 
2.2.3 Operation 
2.3 The '5 by 7' Character Recognition Problem 
2.3.1 Step 1: Training and Recognition Files 
2.3.2 Step 2: Link the Two Files
2.3.3 Step 3: Set up PolyNet
2.3.4 Step 4: Running PolyNet

  2.   Total Operation in 2-4 Clicks
 

The detailed user's guide is given in chapters 3 and 4. This chapter gives the minimum information you need to get started, using two examples: cancer diagnosis and character recognition.

2.1   Total Operation in 2-4 Clicks
 

Prepare all your training data in one file and all your recognition data in another file; then, in 2-4 clicks, you will have your answer almost instantaneously. The procedure is:

Step 1. Put the training data into the training file and the recognition data into the recognition file, so that you have two files.

Step 2. Link the two files to the PolyNet by clicking "Data/Link".

Step 3. Once the files are in the PolyNet, set up the symmetry and the output format if necessary.

Step 4. Click the 'D' button (Decimal/Distribution command) for a distribution, or the 'C' button (Decimal/Classification command) for a classification.

2.2   Example: Wisconsin Breast Cancer Database

The question is: does a patient have cancer or not? To answer it, you need:

  • Training file: a cancer database; 
  • Recognition file: the laboratory test results without the conclusion (whether or not the patient has cancer).
2.2.1   Database Source

1. Sources: This breast cancer database was obtained from Dr. William H. Wolberg, University of Wisconsin Hospital, Madison. Please see: O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp. 1 & 18.

2. Database Information

  • Number of Instances: 699 
  • The 16 instances with missing attribute values are removed from the database, leaving 683 instances.
  • Number of Attributes: 10 plus the class attribute 
  • Attributes 2 through 10 will be used to represent instances. 
  • Each instance has one of 2 possible classes: benign or malignant.
  • Class distribution (over the original 699 instances): 
Benign: 458 (65.5%)
Malignant: 241 (34.5%)
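
The counts above can be checked with a few lines of arithmetic. This is only a sketch that re-derives the figures quoted in this section; the variable names are our own.

```python
# Figures quoted in the Database Information list.
total_instances = 699
missing = 16
benign, malignant = 458, 241

# Instances left after removing those with missing attribute values.
remaining = total_instances - missing

# Class percentages, taken over the original 699 instances.
benign_pct = round(100 * benign / total_instances, 1)
malignant_pct = round(100 * malignant / total_instances, 1)

print(remaining, benign_pct, malignant_pct)
```

The two class counts add up to 699, which is why the percentages are taken over the original database rather than over the 683 remaining instances.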
3. Attribute Information: 

#     Attribute                      Domain
________________________________________

1.    Sample code number             id number
________________________________________

2.    Clump Thickness                0 - 9
3.    Uniformity of Cell Size        0 - 9
4.    Uniformity of Cell Shape       0 - 9
5.    Marginal Adhesion              0 - 9
6.    Single Epithelial Cell Size    0 - 9
7.    Bare Nuclei                    0 - 9
8.    Bland Chromatin                0 - 9
9.    Normal Nucleoli                0 - 9
10.   Mitoses                        0 - 9
________________________________________

11.   Class                          1 for benign, 3 for malignant

Note that the original data is in the range 1 - 10; the values have been shifted to the range 0 - 9.
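
The shift from 1 - 10 to 0 - 9 amounts to subtracting 1 from every value. A minimal sketch follows; the function name `shift_row` and the sample row are illustrative assumptions, not part of PolyNet.

```python
def shift_row(original_row):
    """Subtract 1 from every value, mapping the range 1..10 onto 0..9."""
    return [value - 1 for value in original_row]


# Illustrative attribute values in the original 1 - 10 range (assumed input).
original = [5, 1, 1, 1, 2, 1, 3, 1, 1]
shifted = shift_row(original)

# Print the row in the space-separated form used by the data files.
print(" ".join(str(v) for v in shifted))
```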
 
 

4. Database

Below are the first 10 instances:

4 0 0 0 1 0 2 0 0 1
4 3 3 4 6 9 2 1 0 1
2 0 0 0 1 1 2 0 0 1
5 7 7 0 2 3 2 6 0 1
3 0 0 2 1 0 2 0 0 1
7 9 9 7 6 9 8 6 0 3
0 0 0 0 1 9 2 0 0 1
1 0 1 0 1 0 2 0 0 1
1 0 0 0 1 0 0 0 4 1
3 1 0 0 1 0 1 0 0 1

Below are the last 20 instances, which will be used to test the PolyNet's accuracy:

Question Answer

1 0 0 0 1 0 0 0 0 1
9 9 9 9 4 9 9 9 6 3 
4 9 9 9 3 9 4 5 2 3 
4 0 0 0 1 0 2 1 0 1 
0 0 0 0 1 0 0 0 0 1 
0 0 0 0 1 0 0 0 0 1 
0 0 0 0 1 0 0 0 0 1 
0 0 0 0 1 0 0 0 0 1 
2 0 0 0 1 0 1 2 0 1 
3 0 0 0 1 0 0 0 0 1 
0 0 0 0 1 0 0 0 7 1 
0 0 0 2 1 0 0 0 0 1 
4 9 9 4 3 4 3 3 0 3 
2 0 0 0 1 0 0 0 0 1 
2 0 0 0 1 0 1 0 1 1 
2 0 0 0 2 1 0 0 0 1 
1 0 0 0 1 0 0 0 0 1 
4 9 9 2 6 2 7 9 1 3 
3 7 5 3 2 3 9 5 0 3 
3 7 7 4 3 4 9 3 0 3
 

2.2.2   Training and Recognition Files

Cancer is usually evaluated and diagnosed from the test variables above; the recorded values together form a data set. Assume that doctors and laboratory technicians have already accumulated such test data into a Cancer Database. The question is: based on this data, do the following two patients have cancer (1 for benign, 3 for malignant)?

10,7,7,6,4,10,4,1,2,?

6,1,1,1,2,1,3,1,1,?

The 683 rows of data in the Wisconsin Breast Cancer database will be divided into two parts: 

  • The first 663 rows will comprise the Cancer-Database training-file; and 
  • The last 20 rows will be used to check the accuracy of the PolyNet.
The last 20 rows are further divided into 2 groups:
  • The first group consists of columns 2 - 10 (the nine attributes). It forms the recognition-file.
  • The last column provides the correct diagnosis for each patient. These results (1 for benign, 3 for malignant) are replaced by 'x' in the recognition-file.
The PolyNet is expected to produce the output-file. In the output-file, 'x' will be replaced by 1 (for benign) or 3 (for malignant), which should match the correct answer located in the last column.
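
The split described above can be sketched in a few lines. This is an illustration only: `split_database` is a hypothetical helper of our own, and the three sample rows are taken from the listings in section 2.2.1.

```python
def split_database(rows, n_test=20):
    """Split rows into training rows, recognition rows, and held-back answers.

    Each row is a list of strings whose last entry is the class.
    """
    training = rows[:-n_test]
    test = rows[-n_test:]
    # In the recognition rows the class is replaced by 'x'.
    recognition = [row[:-1] + ["x"] for row in test]
    answers = [row[-1] for row in test]
    return training, recognition, answers


rows = [line.split() for line in (
    "4 0 0 0 1 0 2 0 0 1",
    "7 9 9 7 6 9 8 6 0 3",
    "3 7 7 4 3 4 9 3 0 3",
)]
# With the full 683-row database the default n_test=20 would be used.
training, recognition, answers = split_database(rows, n_test=1)
print(" ".join(recognition[0]), "| answer:", answers[0])
```

The held-back answers play the role of the "Answer" column: they are compared with the values the PolyNet writes in place of 'x'.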
 

2.2.3   Operation

All data files must be in text format. A training file has three sections: remark, number of columns, and data; the recognition file shown below omits the number of columns.

Step 1. Training file and Recognition file

The first file, in this case the Cancer-Database file, is used to train the PolyNet. Let the name of the file be "cancer1a.txt"; it will look like this:
============== Beginning =============
*
Wisconsin Breast Cancer Database
Training File
· benign = 1 or 
· malignant = 3.
Command: Decimal/Classification
*

10

4 0 0 0 1 0 2 0 0 1
4 3 3 4 6 9 2 1 0 1
2 0 0 0 1 1 2 0 0 1
5 7 7 0 2 3 2 6 0 1
3 0 0 2 1 0 2 0 0 1
...
 

============= End =============
 


 
 

Figure 2. The Cancer-Database Training File. 
 
 

This Cancer-Database file has three sections:

  • Remark section: * . . . *: this section will be ignored by the PolyNet
  • Number of variables/attributes: 10
  • Data 
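
A file with these three sections can be produced by a short script. The following is a sketch under the assumption that the layout is exactly as shown in Figure 2 (a remark between two '*' lines, the number of columns, then space-separated rows); `format_training_file` is a hypothetical helper, not a PolyNet command.

```python
def format_training_file(remark_lines, rows):
    """Return the text of a training file: remark, column count, data."""
    n_columns = len(rows[0])
    # Remark section, delimited by '*' lines, then the number of columns.
    lines = ["*"] + list(remark_lines) + ["*", "", str(n_columns), ""]
    # Data section: one space-separated row per instance.
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


text = format_training_file(
    ["Wisconsin Breast Cancer Database", "Training File"],
    [[4, 0, 0, 0, 1, 0, 2, 0, 0, 1],
     [4, 3, 3, 4, 6, 9, 2, 1, 0, 1]],
)
print(text)
```

The returned text could then be saved under a name such as cancer1a.txt with any text editor or with Python's built-in `open`.
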
The second file is the recognition-file, which contains the questions you have: do the following 20 patients have cancer?

============== Beginning =============
*
Wisconsin Breast Cancer Database
Recognition File
· benign = 1 or 
· malignant = 3.
Command: Decimal/Classification
*

1 0 0 0 1 0 0 0 0 x 
9 9 9 9 4 9 9 9 6 x 
4 9 9 9 3 9 4 5 2 x 
. . . 

============= End =============

This recognition file has 2 sections:

  • Remark section: * . . . *: this section will be ignored by the PolyNet
  • Data 

Figure 3. The Recognition file. 

Step 2. Link the data to the PolyNet (1 or 2 clicks)

To link the cancer data, simply click "Example/Cancer Database". The data will be linked, and the training file, cancer1a.txt, and the recognition file, cancer1b.txt, will be opened at this point.

In general, save the training and recognition files in a directory. Click "File/Open" to open one of the two files. This will lead the PolyNet to the directory. Click "Data/Link" (see Figure 4) and a dialog box will ask you for the following (see Figure 5):
 

1. Training File Name; 
2. Recognition File Name;
3. Output File Name.



 
 

Figure 4. Click command: "Data/Link".
 
 

Figure 5. Link the data files to the PolyNet
 
 

To link the data files, in the first 2 textboxes (Figure 5), enter

  • cancer1a.txt
  • cancer1b.txt
The linking is now complete; it is just that easy.
 
 

Step 3. Once the files are in the PolyNet, set up symmetry and output format for PolyNet if necessary.
 

Nothing needs to be set up for this example.


Step 4. Click 'D' button (Decimal/Distribution command) for a distribution or 'C' button for a classification (Decimal/Classification command).
 
 


 
 

Figure 6. "Decimal" Menu.

Figure 6 shows the "Decimal" menu. Click "Decimal/Classification" to get the answer. The following output file is opened instantaneously:

============== Beginning =============

100010000x
1000100001 3523 1

999949996x
9999499963 81 1

499939452x
4999394523 71 1

...

============= End =============

In the first output:

100010000x
1000100001 3523 1
The first line is the input; the second line is the output, with 'x' replaced by '1'. Here 3523 is a confidence number, similar to the confidence number in an Internet search engine. All 20 predictions are correct, a 100% success rate.
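
The success rate can also be checked by a script instead of by eye. The sketch below assumes the output file consists of pairs of lines exactly as listed above: the input line, then the completed line followed by the confidence number and one further number whose meaning is not described in this chapter. `parse_output` is a hypothetical helper of our own.

```python
def parse_output(text):
    """Return (input line, predicted class, confidence) for each pair of lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    results = []
    for question, answer_line in zip(lines[0::2], lines[1::2]):
        # Third field: meaning not stated in this chapter, so it is ignored.
        completed, confidence, _unknown = answer_line.split()
        x_pos = question.index("x")
        results.append((question, completed[x_pos], int(confidence)))
    return results


sample = """
100010000x
1000100001 3523 1

999949996x
9999499963 81 1

499939452x
4999394523 71 1
"""
results = parse_output(sample)
predicted = [label for _, label, _ in results]
# Answers for the first three test rows, from the "Question Answer" listing.
expected = ["1", "3", "3"]
accuracy = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
print(predicted, accuracy)
```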
 

2.3   The '5 by 7' Character Recognition Problem

In this section, we will introduce the '5 by 7' character recognition problem.
 

2.3.1   Step 1: Training and Recognition Files

Characters 0, 1, 2, 3, ... look like this:
 

01100
10010
10010
10010
10010
10010
01100
 

00100
01100
00100
00100
00100
00100
01110

01110
00001
00001
01110
10000
10000
01110
...


Their classifications are represented by class neurons. For example, we can use 11 neurons to represent the classes. Only one of the 11 neurons is '1' and the other ten are '0'. In particular, 

  • class "0" is 10000 00000 0;
  • class "1" is 01000 00000 0;
  • ... 
  • class "9" is 00000 00001 0;
  • class "other than digits" is 00000 00000 1.
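
The class codes listed above follow a simple rule: the neuron whose position equals the digit is set to '1', and the eleventh neuron is used for everything else. A minimal sketch, with `class_code` as a hypothetical helper name:

```python
N_CLASS_NEURONS = 11


def class_code(character):
    """Return the 11-neuron class code, grouped 5-5-1 as in the list above."""
    is_digit = len(character) == 1 and character in "0123456789"
    index = int(character) if is_digit else 10  # neuron 11: "other than digits"
    bits = ["0"] * N_CLASS_NEURONS
    bits[index] = "1"
    flat = "".join(bits)
    return " ".join([flat[0:5], flat[5:10], flat[10:]])


for ch in ("0", "1", "9", "A"):
    print(ch, class_code(ch))
```
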
An example of a training pattern (character '1') is: 

01000 00000 0

00100
01100
00100
00100
00100
00100
01110


which indicates the pattern (character '1')

00100
01100
00100
00100
00100
00100
01110
is in class "1": 01000 00000 0.

The training file is a list of images like:

01000 00000 0

00100
01100
00100
00100
00100
00100
01110


Each such entry is a (class, pattern) pair. To test the neural network, only patterns such as

00100
01100
00100
00100
00100
00100
01110


are provided to the neural net; the classification, 01000 00000 0, is replaced by unknown bits:

xxxxx xxxxx x

The neural net has to determine that for the above pattern, xxxxx xxxxx x is 01000 00000 0. The recognition file is a list of images like:

xxxxx xxxxx x

00100
01100
00100
00100
00100
00100
01110
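
Training and recognition entries differ only in the class line, so both can be generated from the same pattern. The sketch below assumes the layout shown above (class line, blank line, seven rows of five bits); the helper names are our own and are not PolyNet commands.

```python
def training_entry(class_bits, pattern_rows):
    """Class line, a blank line, then the 5x7 pattern rows."""
    return "\n".join([class_bits, ""] + list(pattern_rows))


def recognition_entry(class_bits, pattern_rows):
    """Same layout, with every class bit replaced by the unknown bit 'x'."""
    masked = "".join("x" if ch in "01" else ch for ch in class_bits)
    return "\n".join([masked, ""] + list(pattern_rows))


ONE = ["00100", "01100", "00100", "00100", "00100", "00100", "01110"]
print(training_entry("01000 00000 0", ONE))
print()
print(recognition_entry("01000 00000 0", ONE))
```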


The unknown bits xxx...xxx can be placed anywhere inside a test pattern. To see the complete training and recognition files, click: 

Example/5x7 Character.

This action generates 46 training characters and 10 recognition characters. The two files are also opened by this click.

2.3.2   Step 2: Link the Two Files

You can link the two files by using 'Data/Link' and entering the file names. The default names are:

Training file:             chara.txt
Recognition file:       charb.txt
Output file:              example1c.txt

When you click "Example/5x7 characters", the training and recognition files are generated and linked to the PolyNet.

2.3.3   Step 3: Set up PolyNet

Before you can train the network, you have to set up the network. The setup deals with:

  • Setting up the output file format;
  • Setting up the symmetry.
To set up the output file format, use the command "Data/Breaker". For the 5x7-character recognition, this command specifies that the output should be printed on eight lines. Line 1 is the classification, and the next seven lines print 5x7 characters.
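
The effect of this eight-line layout can be illustrated outside PolyNet. The sketch below only mimics the layout described above (one classification line of 11 symbols, then seven lines of five symbols); how "Data/Breaker" is actually configured is not shown here, and `break_output` is a hypothetical name.

```python
def break_output(flat, class_len=11, width=5, height=7):
    """Split one flat string into a classification line plus pattern rows."""
    expected = class_len + width * height
    if len(flat) != expected:
        raise ValueError(f"expected {expected} symbols, got {len(flat)}")
    lines = [flat[:class_len]]
    body = flat[class_len:]
    lines += [body[i * width:(i + 1) * width] for i in range(height)]
    return lines


# Character '1' with its class code, written as a single string.
flat = "01000000000" + "00100" + "01100" + "00100" * 4 + "01110"
broken = break_output(flat)
print("\n".join(broken))
```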

To set up the symmetry, use the command "Data/Symmetry". The character recognition problem has 2-dimensional x- and y-translation symmetries. Symmetry is very important for training the network.

When you click "Example/5x7 characters", the output format and the symmetry are set up for you by PolyNet. Click "Data/Symmetry" to see the symmetry settings. We will explain this later.
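
To make the idea of x- and y-translation concrete, the sketch below shifts a 5x7 pattern by a number of columns and rows. It is an illustration only: filling the vacated cells with '0' is our assumption, and how PolyNet itself treats the edges is not described in this chapter.

```python
def translate(rows, dx=0, dy=0, fill="0"):
    """Shift a pattern dx columns to the right and dy rows down."""
    height, width = len(rows), len(rows[0])
    shifted = []
    for y in range(height):
        src_y = y - dy
        line = []
        for x in range(width):
            src_x = x - dx
            inside = 0 <= src_y < height and 0 <= src_x < width
            # Cells shifted in from outside the grid get the fill symbol.
            line.append(rows[src_y][src_x] if inside else fill)
        shifted.append("".join(line))
    return shifted


ONE = ["00100", "01100", "00100", "00100", "00100", "00100", "01110"]
print("\n".join(translate(ONE, dx=1)))
```

A character shifted in this way is still the same character, which is what a translation symmetry expresses.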
 

2.3.4   Step 4: Running PolyNet

Now that the network is initialized and the data files are linked, it is time to run the PolyNet. There are two commands for classification:

  • Decimal/Classification
  • Decimal/Distribution
The first command simulates the Hopfield Model and will provide you with a classification. The second command simulates the Boltzmann Machine and will provide you with a distribution. 
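
For readers unfamiliar with the Hopfield Model, the following is a textbook-style sketch of a very small Hopfield network. It is not PolyNet's implementation, which is not described in this chapter; the function names and the 8-neuron pattern are assumptions chosen for illustration.

```python
def train_hopfield(patterns):
    """Hebbian weights for patterns given as lists of +1/-1 values."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, max_sweeps=10):
    """Update one neuron at a time until no neuron changes."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


stored = [1, -1, 1, -1, 1, 1, -1, -1]
weights = train_hopfield([stored])
noisy = list(stored)
noisy[0] = -noisy[0]  # corrupt one neuron
print(recall(weights, noisy) == stored)
```

The network settles on the stored pattern closest to its input, which is the sense in which such a model yields a single classification rather than a distribution.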

Click: Decimal/Classification. The results from this command are obtained immediately. The output file, which contains the results, will be opened at this point. It looks like this:
 

xxxxxxxxxxx
00100
07090
09020
08010
02070
00600
00000

80000000000
00100
07090
09020
08010
02070
00600
00000
341 1

xxxxxxxxxxx
00000
00600
04060
09080
08040
02060
00500

80000000000
00000
00600
04060
09080
08040
02060
00500
372 1
...


PolyNet first prints the recognition pattern, then prints the classification. In this case, the first pattern is a 5x7 character '0' and its classification is:

10000000000.

The classification is 100% correct in this example.
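
A classification line can be turned back into a class label by locating its marked neuron. The sketch below assumes that the position of the largest digit names the class; the scaling of the digits in the output file (for example the 8 in the listing above) is not explained in this chapter, and `decode_class` is a hypothetical helper.

```python
def decode_class(class_line):
    """Return '0'..'9' or 'other' from an 11-symbol classification line."""
    digits = [int(ch) for ch in class_line.replace(" ", "")]
    # Assumption: the largest digit marks the winning class neuron.
    index = digits.index(max(digits))
    return str(index) if index < 10 else "other"


for line in ("80000000000", "01000 00000 0", "00000 00000 1"):
    print(line, "->", decode_class(line))
```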
 
 
 
 

Copyright © 1998 Attrasoft, Inc. All rights reserved.