2. Total Operation in 2 Clicks

2.1 Overview

2.2 Total Operation in 2 Clicks

2.3 Example: Wisconsin Breast Cancer Database

2.3.1 Operation

2.   Total Operation in 2 Clicks

2.1   Overview
Assume that:

When applied to a particular problem, such as cancer, "Problem-Database" will be replaced by "Cancer-Database". Here is your problem: you are overwhelmed by the data and have lost your sense of direction in it. What are you going to do?
 
 


Order Attrasoft DecisionMaker 2.5.


 






From your Problem-Database, you will generate the first of two files:

Now the DecisionMaker becomes one of your staff, and is ready to serve you. You will prepare the second of two files: After that, in 2 clicks, you will have your answer instantaneously. The answer (3) will be in a file:

2.2   Total Operation in 2 Clicks

  1. From your "Problem-Database", formulate the data into a file (first of 2 files), the Problem-Database file. Put your questions into another file (second of 2 files), the Question file.
  2. Link these two data files to the DecisionMaker (See Figure 1) by clicking "Data/Link" (first of 2 clicks).
  3. Once those 2 files are in the DecisionMaker, click "Integer/+ Predict" (second of 2 clicks).
  4. The DecisionMaker presents all possible predictions, each with a confidence number that indicates how valuable the DecisionMaker considers that prediction. This number is similar to the score an Internet search engine assigns to a result: the higher the number, the more confidence the DecisionMaker has in that prediction.

 
 

Figure 1. The DecisionMaker


2.3   Example: Wisconsin Breast Cancer Database

More information about the source of the Cancer Database can be found in chapter 4.

Cancer is usually evaluated and diagnosed using the following test variables, which together are called a data set:

Attribute                          Domain
--------------------------------------------------------
 1. Sample code number             id number
--------------------------------------------------------
 2. Clump Thickness                1 - 10
 3. Uniformity of Cell Size        1 - 10
 4. Uniformity of Cell Shape       1 - 10
 5. Marginal Adhesion              1 - 10
 6. Single Epithelial Cell Size    1 - 10
 7. Bare Nuclei                    1 - 10
 8. Bland Chromatin                1 - 10
 9. Normal Nucleoli                1 - 10
10. Mitoses                        1 - 10
--------------------------------------------------------
11. Class                          2 for benign, 4 for malignant

Therefore, let us use the cancer test data that doctors and laboratory technicians have already accumulated in a Cancer Database, as shown below:

1000025,5,1,1,1,2,1,3,1,1,2

1002945,5,4,4,5,7,10,3,2,1,2

1015425,3,1,1,1,2,2,3,1,1,2

1016277,6,8,8,1,3,4,3,7,1,2

1017023,4,1,1,3,2,1,3,1,1,2

1017122,8,10,10,8,7,10,9,7,1,4

1018099,1,1,1,1,2,10,3,1,1,2

1018561,2,1,2,1,2,1,3,1,1,2

1033078,2,1,1,1,2,1,1,1,5,2

1033078,4,2,1,1,2,1,2,1,1,2

1035283,1,1,1,1,1,1,3,1,1,2

1036172,2,1,1,1,2,1,2,1,1,2

1041801,5,3,3,3,2,3,4,4,1,4

1043999,1,1,1,1,2,3,3,1,1,2

1044572,8,7,5,10,7,9,5,5,4,4

1047630,7,4,6,4,6,1,4,3,1,4

1048672,4,1,1,1,2,1,2,1,1,2

1049815,4,1,1,1,2,1,3,1,1,2

...

The question is: from this data, do the following two patients have cancer (2 for benign, 4 for malignant)?

1050670,10,7,7,6,4,10,4,1,2,?

1050718,6,1,1,1,2,1,3,1,1,?
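The record layout described above (a sample code number, nine test values from 1 to 10, and a class of 2 or 4, with "?" standing for the unknown class) can be checked mechanically before any file is prepared. The following Python sketch is only an illustration; the function name parse_record is hypothetical and is not part of the DecisionMaker.

```python
# Hypothetical helper, not part of the DecisionMaker: it checks one raw
# record against the attribute domains listed in the table above.
def parse_record(line):
    """Split 'id,v1,...,v9,class' into (sample_id, values, label)."""
    fields = line.strip().split(",")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, got {len(fields)}")
    sample_id = fields[0]
    values = [int(v) for v in fields[1:10]]  # the nine test variables
    if any(not 1 <= v <= 10 for v in values):
        raise ValueError(f"test value outside 1 - 10: {values}")
    # '?' marks a patient whose class is the question to be answered.
    label = None if fields[10] == "?" else int(fields[10])
    if label not in (None, 2, 4):
        raise ValueError(f"class must be 2 or 4, got {label}")
    return sample_id, values, label


known = parse_record("1000025,5,1,1,1,2,1,3,1,1,2")
unknown = parse_record("1050670,10,7,7,6,4,10,4,1,2,?")
```

Here known carries the class 2 (benign), while unknown carries None because its class is what the DecisionMaker is asked to predict.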

More information on the Wisconsin Breast Cancer database:

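The 683 rows of data will be divided into two parts: the first 663 rows, which are used for training, and the last 20 rows, which are used as questions. Below are the last 20 rows: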

ID         Question File               Answer
--------------------------------------------------------
1368882    2 1 1 1 2 1 1 1 1           2
1369821    10 10 10 10 5 10 10 10 7    4
1371026    5 10 10 10 4 10 5 6 3       4
1371920    5 1 1 1 2 1 3 2 1           2
466906     1 1 1 1 2 1 1 1 1           2
466906     1 1 1 1 2 1 1 1 1           2
534555     1 1 1 1 2 1 1 1 1           2
536708     1 1 1 1 2 1 1 1 1           2
566346     3 1 1 1 2 1 2 3 1           2
603148     4 1 1 1 2 1 1 1 1           2
654546     1 1 1 1 2 1 1 1 8           2
654546     1 1 1 3 2 1 1 1 1           2
695091     5 10 10 5 4 5 4 4 1         4
714039     3 1 1 1 2 1 1 1 1           2
763235     3 1 1 1 2 1 2 1 2           2
776715     3 1 1 1 3 2 1 1 1           2
841769     2 1 1 1 2 1 1 1 1           2
888820     5 10 10 3 7 3 8 10 2        4
897471     4 8 6 4 3 4 10 6 1          4
897471     4 8 8 5 4 5 10 4 1          4

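These 20 rows are further divided into 2 groups: the nine test values of each row, which go into the Question file, and the class in the last column, which is the answer.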

The DecisionMaker is expected to produce the Answer file, which reflects the correct answer located in the last column.
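The division described above can be sketched in a few lines of Python. This is an illustration only: split_rows is a hypothetical name, and the sketch assumes each row has already had its sample code number removed, so that it holds the nine test values followed by the class.

```python
# Hypothetical helper: split rows of [v1, ..., v9, class] into a training
# part and a held-out part, then separate the held-out questions from
# their known answers.
def split_rows(rows, n_questions=20):
    if not 0 < n_questions < len(rows):
        raise ValueError("n_questions must be between 1 and len(rows) - 1")
    training = rows[:-n_questions]            # kept whole, class included
    held_out = rows[-n_questions:]
    questions = [row[:-1] for row in held_out]  # nine test values only
    answers = [row[-1] for row in held_out]     # last column: 2 or 4
    return training, questions, answers


# Five rows taken from the listings in this section, for illustration.
rows = [
    [5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
    [5, 4, 4, 5, 7, 10, 3, 2, 1, 2],
    [3, 1, 1, 1, 2, 2, 3, 1, 1, 2],
    [4, 8, 6, 4, 3, 4, 10, 6, 1, 4],
    [4, 8, 8, 5, 4, 5, 10, 4, 1, 4],
]
training, questions, answers = split_rows(rows, n_questions=2)
```

With all 683 rows and the default n_questions=20, the same call would leave 663 rows for training.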


2.3.1   Operation

All data files must be in text format, and each has three sections: remark, number of columns, and data.

Step 1. Problem-Database file and Question file

The first file is the Problem-Database file (in this case, the Cancer-Database file), which is used to train the DecisionMaker. Name the file "cancer1a.txt"; it will look like this:
 
 

=================== Beginning =====================

*

Wisconsin Breast Cancer Database

Cancer-database File

*

10

5 1 1 1 2 1 3 1 1 2

5 4 4 5 7 10 3 2 1 2

3 1 1 1 2 2 3 1 1 2

6 8 8 1 3 4 3 7 1 2

4 1 1 3 2 1 3 1 1 2

...

=================== End ==========================

This Cancer-Database file has three sections:


 
 

Figure 2. The Cancer-Database file.
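A file with this three-section layout can also be generated by a short script instead of being typed by hand. The Python sketch below is one possible way to do it; format_data_file is a hypothetical name, and the sketch assumes that the "Beginning" and "End" lines in the listing are display markers rather than file content and that blank lines between rows are not required. Neither assumption is confirmed by this manual.

```python
# Hypothetical helper: build the text of a three-section data file
# (remark between '*' lines, number of columns, then the data rows).
def format_data_file(remark_lines, rows):
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same number of columns")
    lines = ["*", *remark_lines, "*", str(n_cols)]
    lines += [" ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"


text = format_data_file(
    ["Wisconsin Breast Cancer Database", "Cancer-database File"],
    [
        [5, 1, 1, 1, 2, 1, 3, 1, 1, 2],
        [5, 4, 4, 5, 7, 10, 3, 2, 1, 2],
    ],
)

# To save the result under the file name used in this example:
# with open("cancer1a.txt", "w") as f:
#     f.write(text)
```

The column count is taken from the first row, so the same function produces a 10-column Cancer-Database file or a 9-column Question file depending on the rows it is given.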
 
 

The second file is the Question file, which contains the questions you have: do the following 20 patients have cancer?

=================== Beginning =====================

*

Wisconsin Breast Cancer Database

Question File

· benign = 2 or

· malignant = 4.

Command: Average/+ Integer

Precision: 10.

*

9

2 1 1 1 2 1 1 1 1

10 10 10 10 5 10 10 10 7

5 10 10 10 4 10 5 6 3

5 1 1 1 2 1 3 2 1

1 1 1 1 2 1 1 1 1

1 1 1 1 2 1 1 1 1

1 1 1 1 2 1 1 1 1

1 1 1 1 2 1 1 1 1

3 1 1 1 2 1 2 3 1

4 1 1 1 2 1 1 1 1

1 1 1 1 2 1 1 1 8

1 1 1 3 2 1 1 1 1

5 10 10 5 4 5 4 4 1

3 1 1 1 2 1 1 1 1

3 1 1 1 2 1 2 1 2

3 1 1 1 3 2 1 1 1

2 1 1 1 2 1 1 1 1

5 10 10 3 7 3 8 10 2

4 8 6 4 3 4 10 6 1

4 8 8 5 4 5 10 4 1

=================== End ==========================

This Question file has three sections:


 
 

Figure 3. The Question file.
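Before linking a file, it can be useful to confirm that its three sections are consistent, for example that every data row has as many values as the declared number of columns. The Python sketch below reads the layout shown above; parse_data_file is a hypothetical name, and the sketch assumes that blank lines can be ignored and that no remark line consists of a single "*".

```python
# Hypothetical helper: read a three-section data file and check that each
# data row matches the declared number of columns.
def parse_data_file(text):
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "*":
        raise ValueError("file must start with a '*' remark delimiter")
    try:
        end = lines.index("*", 1)  # closing delimiter of the remark
    except ValueError:
        raise ValueError("remark section is not closed by '*'") from None
    if end + 1 >= len(lines):
        raise ValueError("number of columns is missing")
    remark = lines[1:end]
    n_cols = int(lines[end + 1])
    rows = [[int(v) for v in ln.split()] for ln in lines[end + 2:]]
    for row in rows:
        if len(row) != n_cols:
            raise ValueError(f"expected {n_cols} values, got {len(row)}: {row}")
    return remark, n_cols, rows


sample = """*
Wisconsin Breast Cancer Database
Question File
*
9

2 1 1 1 2 1 1 1 1

10 10 10 10 5 10 10 10 7
"""
remark, n_cols, rows = parse_data_file(sample)
```

For the Question file the declared number of columns is 9, one fewer than in the Cancer-Database file, because the class column is what the DecisionMaker has to supply.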
 
 
 
 

Step 2. Link the data to the DecisionMaker

Save the above two files in the same directory where the DecisionMaker is located. Click "Data/Link" (see Figure 4). This is the first of 2 clicks, and a dialog box will ask you for the following (see Figure 5):

1. Problem-Database File Name;

2. Question File Name;

3. Answer File Name;

4. Precision level of the prediction.


 
 

Figure 4. Click command: "Data/Link".
 
 

Figure 5. Link the data files to the DecisionMaker
 
 
 
 

To link the data files, in the first 2 textboxes (Figure 5), enter

The linking is now complete; it is just that easy.
 
 

Step 3. Click command: "Average/+ Integer"
 
 


 
 

Figure 6. "Average" Menu.

Figure 6 shows the "Average" menu. Click "Average/+ Integer" (second of 2 clicks) to get the answer. The running time is one second, and the following Answer file is opened automatically:

=================== Beginning =====================
 
 

*

Wisconsin Breast Cancer Database

Question File

· benign = 2 or

· malignant = 4.

Command: Average/+ Integer

Precision: 10.

*

2 1 1 1 2 1 1 1 1

2

10 10 10 10 5 10 10 10 7

4

5 10 10 10 4 10 5 6 3

4

5 1 1 1 2 1 3 2 1

2

1 1 1 1 2 1 1 1 1

2

1 1 1 1 2 1 1 1 1

2

1 1 1 1 2 1 1 1 1

2

1 1 1 1 2 1 1 1 1

2

3 1 1 1 2 1 2 3 1

2

4 1 1 1 2 1 1 1 1

2

1 1 1 1 2 1 1 1 8

2

1 1 1 3 2 1 1 1 1

2

5 10 10 5 4 5 4 4 1

4

3 1 1 1 2 1 1 1 1

2

3 1 1 1 2 1 2 1 2

2

3 1 1 1 3 2 1 1 1

2

2 1 1 1 2 1 1 1 1

2

5 10 10 3 7 3 8 10 2

Can not make a Prediction.

To get a prediction, you can:

(1) Add more data;

(2) If you do not have more data; then click 'Data/link' and

reduce the precision level;

(3) You may also consider

to reduce the number of variables in your model.

See User's Guide

4 8 6 4 3 4 10 6 1

Can not make a Prediction.

To get a prediction, you can:

(1) Add more data;

(2) If you do not have more data; then click 'Data/link' and

reduce the precision level;

(3) You may also consider

to reduce the number of variables in your model.

See User's Guide

4 8 8 5 4 5 10 4 1

Can not make a Prediction.

To get a prediction, you can:

(1) Add more data;

(2) If you do not have more data; then click 'Data/link' and

reduce the precision level;

(3) You may also consider

to reduce the number of variables in your model.

See User's Guide

Precision of each number:

0.11

=================== End ==========================
 
 

Step 4. Results

In the Answer file, the remark section, * . . . *, is borrowed from the Question file. Out of the 20 questions, the DecisionMaker made 17 predictions, all of which are correct. The DecisionMaker cannot handle the last three cases based on the training received from the 663 instances; more training is required for these 3 instances.
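The counts in this step can be reproduced by comparing the predictions in the Answer file with the known classes of the last 20 rows. The Python sketch below does this; score is a hypothetical name, and None stands for a case in which no prediction was made.

```python
# Hypothetical helper: count answered, correct, and unanswered questions.
def score(predictions, expected):
    answered = [(p, e) for p, e in zip(predictions, expected) if p is not None]
    correct = sum(p == e for p, e in answered)
    unanswered = len(predictions) - len(answered)
    return len(answered), correct, unanswered


# Predictions from the Answer file above, in order.
predictions = [2, 4, 4, 2, 2, 2, 2, 2, 2, 2,
               2, 2, 4, 2, 2, 2, 2, None, None, None]
# Known classes: the last column of the last 20 rows of the database.
expected = [2, 4, 4, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 4, 2, 2, 2, 2, 4, 4, 4]

answered, correct, unanswered = score(predictions, expected)
print(f"{correct} of {answered} predictions correct, {unanswered} unanswered")
```

Note that the 100% figure applies to the 17 questions that were answered; the remaining 3 of the 20 questions received no prediction.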

The DecisionMaker is capable of achieving 100% accuracy. Please read chapter 4 on how to achieve 100% accuracy.