2. Total Operation in 2 Clicks
Order Attrasoft DecisionMaker 2.5.
From your Problem-Database, you will generate the first of two files:
2.2 Total Operation in 2 Clicks
Figure 1. The DecisionMaker
2.3 Example: Wisconsin Breast Cancer Database
More information about the source of the Cancer Database can be found in Chapter 4.
Cancer is usually evaluated and diagnosed using the following test variables, which together are called a data set:
#    Attribute                      Domain
--------------------------------------------------------
1.   Sample code number             id number
--------------------------------------------------------
2.   Clump Thickness                1 - 10
3.   Uniformity of Cell Size        1 - 10
4.   Uniformity of Cell Shape       1 - 10
5.   Marginal Adhesion              1 - 10
6.   Single Epithelial Cell Size    1 - 10
7.   Bare Nuclei                    1 - 10
8.   Bland Chromatin                1 - 10
9.   Normal Nucleoli                1 - 10
10.  Mitoses                        1 - 10
--------------------------------------------------------
11.  Class                          2 for benign, 4 for malignant
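The attribute domains above can be checked mechanically before a record is used. The following is a minimal Python sketch, not part of the DecisionMaker itself; the function name `validate_record` is hypothetical, and it assumes records are comma-separated with 11 fields, with "?" allowed as an unknown class (as in the two patient rows shown later in this section).

```python
def validate_record(line):
    """Return a list of problems found in one comma-separated record."""
    fields = line.strip().split(",")
    if len(fields) != 11:
        return [f"expected 11 fields, found {len(fields)}"]
    problems = []
    # Fields 2-10 are the nine test variables, each in the range 1 - 10.
    for position, value in enumerate(fields[1:10], start=2):
        if not value.isdigit() or not 1 <= int(value) <= 10:
            problems.append(f"attribute {position} out of range: {value!r}")
    # Field 11 is the class: 2 (benign), 4 (malignant), or "?" if unknown.
    if fields[10] not in ("2", "4", "?"):
        problems.append(f"unknown class: {fields[10]!r}")
    return problems


print(validate_record("1000025,5,1,1,1,2,1,3,1,1,2"))    # no problems
print(validate_record("1000025,5,1,1,1,2,1,3,1,11,2"))   # Mitoses = 11 is flagged
```

An empty list means the record is consistent with the table; anything else names the offending field.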
Let us use the cancer test data that doctors and laboratory technicians have already accumulated in a Cancer Database, as shown below:
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1018099,1,1,1,1,2,10,3,1,1,2
1018561,2,1,2,1,2,1,3,1,1,2
1033078,2,1,1,1,2,1,1,1,5,2
1033078,4,2,1,1,2,1,2,1,1,2
1035283,1,1,1,1,1,1,3,1,1,2
1036172,2,1,1,1,2,1,2,1,1,2
1041801,5,3,3,3,2,3,4,4,1,4
1043999,1,1,1,1,2,3,3,1,1,2
1044572,8,7,5,10,7,9,5,5,4,4
1047630,7,4,6,4,6,1,4,3,1,4
1048672,4,1,1,1,2,1,2,1,1,2
1049815,4,1,1,1,2,1,3,1,1,2
...
The question is: from this data, do the following two patients have cancer (2 for benign, 4 for malignant)?
1050670,10,7,7,6,4,10,4,1,2,?
1050718,6,1,1,1,2,1,3,1,1,?
More information on the Wisconsin Breast Cancer database:
ID      | Question File              | Answer
---------------------------------------------
1368882 |  2  1  1  1  2  1  1  1  1 | 2
1369821 | 10 10 10 10  5 10 10 10  7 | 4
1371026 |  5 10 10 10  4 10  5  6  3 | 4
1371920 |  5  1  1  1  2  1  3  2  1 | 2
466906  |  1  1  1  1  2  1  1  1  1 | 2
466906  |  1  1  1  1  2  1  1  1  1 | 2
534555  |  1  1  1  1  2  1  1  1  1 | 2
536708  |  1  1  1  1  2  1  1  1  1 | 2
566346  |  3  1  1  1  2  1  2  3  1 | 2
603148  |  4  1  1  1  2  1  1  1  1 | 2
654546  |  1  1  1  1  2  1  1  1  8 | 2
654546  |  1  1  1  3  2  1  1  1  1 | 2
695091  |  5 10 10  5  4  5  4  4  1 | 4
714039  |  3  1  1  1  2  1  1  1  1 | 2
763235  |  3  1  1  1  2  1  2  1  2 | 2
776715  |  3  1  1  1  3  2  1  1  1 | 2
841769  |  2  1  1  1  2  1  1  1  1 | 2
888820  |  5 10 10  3  7  3  8 10  2 | 4
897471  |  4  8  6  4  3  4 10  6  1 | 4
897471  |  4  8  8  5  4  5 10  4  1 | 4
These 20 rows are further divided into 2 groups:
All data files must be in text format, and each has three sections: remark, number of columns, and data.
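The three-section layout can be illustrated with a small reader. This is a sketch under assumptions drawn from the sample files in this chapter, not the DecisionMaker's own parser: the remark is taken to be enclosed between two lines containing only "*", followed by one line with the number of columns, followed by space-separated integer rows. The function name `parse_data_file` is hypothetical.

```python
def parse_data_file(text):
    """Split a three-section data file into remark, column count, and rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "*":
        raise ValueError("file must start with a '*' remark delimiter")
    closing = lines.index("*", 1)  # raises ValueError if the remark is not closed
    remark = lines[1:closing]
    n_columns = int(lines[closing + 1])
    rows = [[int(v) for v in ln.split()] for ln in lines[closing + 2:]]
    # Every data row must have exactly the declared number of columns.
    for row in rows:
        if len(row) != n_columns:
            raise ValueError(f"expected {n_columns} values, found {len(row)}")
    return remark, n_columns, rows


sample = """*
Wisconsin Breast Cancer Database
Cancer-database File
*
10
5 1 1 1 2 1 3 1 1 2
5 4 4 5 7 10 3 2 1 2
"""
remark, n_columns, rows = parse_data_file(sample)
print(remark)
print(n_columns, rows)
```

Checking the row width against the declared column count catches the most likely editing mistake, a row with a missing or extra value.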
Step 1. Problem-Database file and Question file
The first file is the Problem-Database file (in this case, the Cancer-Database
file), which is used to train the DecisionMaker. Name the file
"cancer1a.txt"; it will look like this:
=================== Beginning =====================
*
Wisconsin Breast Cancer Database
Cancer-database File
*
10
5 1 1 1 2 1 3 1 1 2
5 4 4 5 7 10 3 2 1 2
3 1 1 1 2 2 3 1 1 2
6 8 8 1 3 4 3 7 1 2
4 1 1 3 2 1 3 1 1 2
...
=================== End ==========================
This Cancer-Database file has three sections:
Figure 2. The Cancer-Database file.
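Comparing Figure 2 with the raw Cancer Database rows suggests that the file is produced by dropping the sample code number and separating the remaining ten values with spaces. The sketch below shows one way to do that conversion in Python; the function name `build_database_file` is hypothetical, and the layout it writes is inferred from Figure 2 rather than taken from a format specification.

```python
def build_database_file(records, remark_lines):
    """Format comma-separated records as a three-section data file."""
    rows = []
    for record in records:
        fields = record.strip().split(",")
        rows.append(fields[1:])  # drop the sample code number (ID)
    n_columns = len(rows[0])
    if any(len(row) != n_columns for row in rows):
        raise ValueError("all records must have the same number of fields")
    # Remark section between '*' lines, then column count, then data rows.
    parts = ["*", *remark_lines, "*", str(n_columns)]
    parts += [" ".join(row) for row in rows]
    return "\n".join(parts) + "\n"


records = [
    "1000025,5,1,1,1,2,1,3,1,1,2",
    "1002945,5,4,4,5,7,10,3,2,1,2",
]
text = build_database_file(
    records, ["Wisconsin Breast Cancer Database", "Cancer-database File"]
)
print(text)
```

The returned string can then be saved as "cancer1a.txt" with an ordinary file write.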
The second file is the Question file, which contains the questions you have: do the following 20 patients have cancer?
=================== Beginning =====================
*
Wisconsin Breast Cancer Database
Question File
· benign = 2 or
· malignant = 4.
Command: Average/+ Integer
Precision: 10.
*
9
2 1 1 1 2 1 1 1 1
10 10 10 10 5 10 10 10 7
5 10 10 10 4 10 5 6 3
5 1 1 1 2 1 3 2 1
1 1 1 1 2 1 1 1 1
1 1 1 1 2 1 1 1 1
1 1 1 1 2 1 1 1 1
1 1 1 1 2 1 1 1 1
3 1 1 1 2 1 2 3 1
4 1 1 1 2 1 1 1 1
1 1 1 1 2 1 1 1 8
1 1 1 3 2 1 1 1 1
5 10 10 5 4 5 4 4 1
3 1 1 1 2 1 1 1 1
3 1 1 1 2 1 2 1 2
3 1 1 1 3 2 1 1 1
2 1 1 1 2 1 1 1 1
5 10 10 3 7 3 8 10 2
4 8 6 4 3 4 10 6 1
4 8 8 5 4 5 10 4 1
=================== End ==========================
This Question file has three sections:
Figure 3. The Question file.
Step 2. Link the data to the DecisionMaker
Save the above two files in the same directory where the DecisionMaker is located. Click "Data/Link" (see Figure 4). This is the first of 2 clicks, and a dialog box will ask you for the following (see Figure 5):
1. Problem-Database File Name;
2. Question File Name;
3. Answer File Name;
4. Precision level of the prediction.
Figure 4. Click command: "Data/Link".
Figure 5. Link the data files to the DecisionMaker
To link the data files, in the first 2 textboxes (Figure 5), enter
Step 3. Click command: "Average/+ Integer"
Figure 6. "Average" Menu.
Figure 6 shows the "Average" menu. Click "Average/+ Integer" (second of 2 clicks) to get the answer. The running time is one second, and the following Answer file is opened automatically:
=================== Beginning =====================
*
Wisconsin Breast Cancer Database
Question File
· benign = 2 or
· malignant = 4.
Command: Average/+ Integer
Precision: 10.
*
2 1 1 1 2 1 1 1 1
2
10 10 10 10 5 10 10 10 7
4
5 10 10 10 4 10 5 6 3
4
5 1 1 1 2 1 3 2 1
2
1 1 1 1 2 1 1 1 1
2
1 1 1 1 2 1 1 1 1
2
1 1 1 1 2 1 1 1 1
2
1 1 1 1 2 1 1 1 1
2
3 1 1 1 2 1 2 3 1
2
4 1 1 1 2 1 1 1 1
2
1 1 1 1 2 1 1 1 8
2
1 1 1 3 2 1 1 1 1
2
5 10 10 5 4 5 4 4 1
4
3 1 1 1 2 1 1 1 1
2
3 1 1 1 2 1 2 1 2
2
3 1 1 1 3 2 1 1 1
2
2 1 1 1 2 1 1 1 1
2
5 10 10 3 7 3 8 10 2
Can not make a Prediction.
(1) Add more data;
(2) If you do not have more data; then click 'Data/link' and
reduce the precision level;
(3) You may also consider
to reduce the number of variables in your model.
See User's Guide
Can not make a Prediction.
(1) Add more data;
(2) If you do not have more data; then click 'Data/link' and
reduce the precision level;
(3) You may also consider
to reduce the number of variables in your model.
See User's Guide
Can not make a Prediction.
(1) Add more data;
(2) If you do not have more data; then click 'Data/link' and
reduce the precision level;
(3) You may also consider
to reduce the number of variables in your model.
See User's Guide
0.11
=================== End ==========================
Step 4. Results
In the Answer file, the remark section (* . . . *) is copied from the Question file. The DecisionMaker made predictions for 17 of the 20 cases, and all 17 predictions are correct. The DecisionMaker cannot handle the last three cases based on the training received from the 663 instances; more training is required for these 3 cases.
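The 17-of-20 count can be verified by comparing the Answer file with the known classes listed for the 20 question rows earlier in this section. The following Python sketch does that comparison; the two lists are transcribed from this section, `None` marks the three cases where no prediction was made, and the function name `score` is hypothetical.

```python
# Known classes for the 20 question rows (2 = benign, 4 = malignant).
known = [2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 4, 4, 4]
# Predictions from the Answer file; None where no prediction was made.
predicted = [2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, None, None, None]


def score(known, predicted):
    """Return (number of predictions made, number of correct predictions)."""
    answered = [(k, p) for k, p in zip(known, predicted) if p is not None]
    correct = sum(1 for k, p in answered if k == p)
    return len(answered), correct


answered, correct = score(known, predicted)
print(f"predictions made: {answered} of {len(known)}")  # 17 of 20
print(f"correct: {correct} of {answered}")              # 17 of 17
```

Note that the three unanswered cases are all malignant (class 4) in the known data, so the 100% figure applies only to the cases for which a prediction was made.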
The DecisionMaker is capable of achieving 100% accuracy; Chapter 4
explains how to achieve it.