In this chapter, we will define the data format used by this software. The data files must be plain text files.
The procedure is:
Step 1. Prepare the Problem-Database file and the Question file;
Step 2. Link the two files to the DecisionMaker;
Step 3. Click a command.
All 4 files used in this chapter are included in your software package.
Let a 3-variable function be defined by:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
We generate the data as follows: assume that (x1, x2, x3) is initially in state (1 0 0); the function then generates (1 1 0). Next, we use (x1, x2, x3) = (1 1 0) as a question, and the function generates the answer (2 1 1), and so on.
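The generation procedure can be sketched in Python to make the chaining explicit. This is an illustration only, not part of the DecisionMaker; `step` and `generate_rows` are hypothetical names. Each row is the current state (x1, x2, x3) followed by the next state (y1, y2, y3), and the next state is fed back in as the following question.

```python
def step(x1, x2, x3):
    """Apply the three-variable mapping once and return (y1, y2, y3)."""
    return ((x1 + x2 + x3) % 7, x1, x2 % 5)


def generate_rows(start, count):
    """Return `count` rows of 6 integers: a question followed by its answer.

    Each answer is fed back in as the next question.
    """
    rows = []
    state = tuple(start)
    for _ in range(count):
        nxt = step(*state)
        rows.append(state + nxt)
        state = nxt
    return rows


if __name__ == "__main__":
    for row in generate_rows((1, 0, 0), 5):
        print(" ".join(str(v) for v in row))
```

Asked for 68 rows starting from (1 0 0), the sketch reproduces rows 1 to 68 of the listing below. For row 69 it produces 4 6 3 6 4 1, whereas the listing ends with 4 6 3 0 0 0.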
The parameters of this problem are:
1 0 0 1 1 0
1 1 0 2 1 1
2 1 1 4 2 1
4 2 1 0 4 2
0 4 2 6 0 4
6 0 4 3 6 0
3 6 0 2 3 1
2 3 1 6 2 3
6 2 3 4 6 2
4 6 2 5 4 1
5 4 1 3 5 4
3 5 4 5 3 0
5 3 0 1 5 3
1 5 3 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 6 4 1
6 4 1 4 6 4
4 6 4 0 4 1
0 4 1 5 0 4
5 0 4 2 5 0
2 5 0 0 2 0
0 2 0 2 0 2
2 0 2 4 2 0
4 2 0 6 4 2
6 4 2 5 6 4
5 6 4 1 5 1
1 5 1 0 1 0
0 1 0 1 0 1
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 6 4 1
6 4 1 4 6 4
4 6 4 0 4 1
0 4 1 5 0 4
5 0 4 2 5 0
2 5 0 0 2 0
0 2 0 2 0 2
2 0 2 4 2 0
4 2 0 6 4 2
6 4 2 5 6 4
5 6 4 1 5 1
1 5 1 0 1 0
0 1 0 1 0 1
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 6 4 1
6 4 1 4 6 4
4 6 4 0 4 1
0 4 1 5 0 4
5 0 4 2 5 0
2 5 0 0 2 0
0 2 0 2 0 2
2 0 2 4 2 0
4 2 0 6 4 2
6 4 2 5 6 4
5 6 4 1 5 1
1 5 1 0 1 0
0 1 0 1 0 1
---------------------------------------------------
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 0 0 0
The 69 rows of data are divided into two parts:
Question Answer
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 0 0 0
The first group,
Question
1 0 1
2 1 0
3 2 1
6 3 2
4 6 3
forms the "Question file" for the DecisionMaker. The DecisionMaker is expected to take this "Question file" and produce the following Answer file:
Question Answer
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 0 0 0
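The split can be sketched as follows, assuming each row lists the input variables first and the output variables last; `split_for_testing` is a hypothetical helper, not a DecisionMaker command. The held-out rows are the last `n_test` rows, matching the five rows after the dashed line in the listing.

```python
def split_for_testing(rows, n_inputs, n_test):
    """Split rows into training rows, Question-file rows and expected answers.

    The last `n_test` rows are held out. Their first `n_inputs` columns become
    the questions; the remaining columns are the answers we hope to get back.
    """
    cut = len(rows) - n_test
    train = rows[:cut]
    held_out = rows[cut:]
    questions = [row[:n_inputs] for row in held_out]
    answers = [row[n_inputs:] for row in held_out]
    return train, questions, answers


if __name__ == "__main__":
    tail = [
        (1, 0, 1, 2, 1, 0),
        (2, 1, 0, 3, 2, 1),
        (3, 2, 1, 6, 3, 2),
    ]
    train, questions, answers = split_for_testing(tail, n_inputs=3, n_test=2)
    print(train)      # [(1, 0, 1, 2, 1, 0)]
    print(questions)  # [(2, 1, 0), (3, 2, 1)]
    print(answers)    # [(3, 2, 1), (6, 3, 2)]
```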
Step 1. Problem-Database file and Question file
The first file is the Problem-Database file, which is used to educate the DecisionMaker.
Let the file name be "mathex1a.txt"; it looks like this:
=================== Beginning =====================
*
Let a system be defined by:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
We will generate the data as follows: assume initially that (x1, x2, x3) is in state (1 0 0), then it will generate (1 1 0). Now we will use (x1, x2, x3) = (1 1 0) as a question, then it will generate an answer (2 1 1), ...
*
6
1 0 0 1 1 0
1 1 0 2 1 1
2 1 1 4 2 1
4 2 1 0 4 2
0 4 2 6 0 4
... ...
4 2 0 6 4 2
6 4 2 5 6 4
5 6 4 1 5 1
1 5 1 0 1 0
0 1 0 1 0 1
=================== End ==========================
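This Problem-Database file has three sections:
1. The remark, enclosed between the two "*" lines;
2. The number of variables in each row (6: three inputs followed by three outputs);
3. The data, one record per row.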
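If you generate data with a program, a writer for this layout might look like the sketch below. It assumes, as the examples suggest, that the line after the remark holds the number of variables per row and that values are separated by single spaces; `format_data_file` is a hypothetical name. The same layout serves for the Question file, whose rows hold only the input variables.

```python
def format_data_file(remark, rows):
    """Return the text of a data file in the three-section layout.

    Section 1: the remark, enclosed by lines containing a single "*".
    Section 2: the number of variables (columns) in each row.
    Section 3: the records, one row per line, values separated by spaces.
    """
    if not rows:
        raise ValueError("at least one row is required")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of variables")
    lines = ["*", remark.strip(), "*", str(width)]
    lines.extend(" ".join(str(v) for v in row) for row in rows)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = format_data_file(
        "y1 = ( x1 + x2 + x3 ) Mod 7;",
        [(1, 0, 0, 1, 1, 0), (1, 1, 0, 2, 1, 1)],
    )
    print(text, end="")
```

You could then save the text under a name such as mathex1a.txt in the DecisionMaker's directory using Python's built-in `open`.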
=================== Beginning =====================
*
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
*
3
1 0 1
2 1 0
3 2 1
6 3 2
4 6 3
=================== End ==========================
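This Question file has the same three sections: the remark between the two "*" lines, the number of variables in each row (3, the input variables only), and the questions, one per row.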
Save the above two files in the same directory where the DecisionMaker is located. Click "Data/Link" (see Figure 4; first of 2 clicks) and the dialog box asks for the following:
1. Problem-Database File Name;
2. Question File Name;
3. Answer File Name;
4. Precision level of the prediction;
To link the data files (Figure 5), enter
Step 3. Click command: "Integer/+ Predict"
Figure 7. The "Integer" commands
Figure 7 shows the "Integer" menu. Click "Integer/+ Predict" (second of 2 clicks) and the following Answer file is opened automatically:
=================== Beginning =====================
*
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
*
1 0 1
Possibility Confidence*Probability
1 1 0 2048
2 1 0 24576
------------------------------------------------------
2 1 0
2 1 0
Possibility Confidence*Probability
1 1 0 1024
2 1 1 2048
3 2 1 32768
1 0 1 3072
------------------------------------------------------
3 2 1
3 2 1
Possibility Confidence*Probability
1 1 0 32
2 1 1 64
4 2 1 1024
2 1 0 384
3 2 1 1024
6 3 2 32768
2 0 2 384
1 0 1 96
------------------------------------------------------
6 3 2
6 3 2
Possibility Confidence*Probability
4 2 1 4
0 4 2 128
6 2 3 32
1 5 3 128
3 2 1 4
6 3 2 128
4 6 3 32768
4 2 0 24
6 4 2 96
------------------------------------------------------
4 6 3
4 6 3
Possibility Confidence*Probability
0 4 2 8
6 0 4 4
2 3 1 32
6 2 3 2
5 4 1 2048
2 1 0 64
6 3 2 8
6 4 1 24576
5 0 4 3
0 2 0 12
6 4 2 6
0 1 0 12
0 0 0 8192
------------------------------------------------------
5 3 1
Precision of each number:
0.33 0.33 0.22
=================== End ==========================
Step 4. Results
This Answer file has three sections:
1 0 1
Possibility Confidence*Probability
1 1 0 2048
2 1 0 24576
------------------------------------------------------
2 1 0
The first line, 1 0 1, is the question.
The possibilities are:
1 1 0 2048
2 1 0 24576
The odds are 24576 to 2048 (12 to 1) in favor of pattern 2 1 0, which in this case is the correct answer.
The line "---- ... ----" indicates the end of the distribution.
The data set consists of all the parameters that define the specifics of a problem. Together, the values of these parameters represent one instance.
Example: The data set of the cancer database is:
Attribute                          Domain
--------------------------------------------------------
1. Sample code number              id number
__________________________________________
2. Clump Thickness                 1 - 10
3. Uniformity of Cell Size         1 - 10
4. Uniformity of Cell Shape        1 - 10
5. Marginal Adhesion               1 - 10
6. Single Epithelial Cell Size     1 - 10
7. Bare Nuclei                     1 - 10
8. Bland Chromatin                 1 - 10
9. Normal Nucleoli                 1 - 10
10. Mitoses                        1 - 10
___________________________________________
11. Class                          2 for benign, 4 for malignant
You also have to determine the precision level of the variables. The default value is 10.
As a general rule, the more data you have, the better the outcome will be. The complexity of the prediction grows exponentially with these parameters:
Also, do not exceed the limits of the Attrasoft neural network, which are:
The number of variables * precision level < 65,000, and
The number of variables < 10,000.
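Before linking a large data set, you could check these two limits with a few lines of Python. This is a sketch only; `within_limits` is a hypothetical helper, and it assumes the limits are strict inequalities exactly as written. For the cancer database, assuming the sample id is not used as a variable, the ten remaining attributes at the default precision of 10 give 10 * 10 = 100.

```python
def within_limits(n_variables, precision=10):
    """Check the two stated limits of the Attrasoft neural network.

    n_variables * precision must be below 65,000, and n_variables must be
    below 10,000. The default precision of 10 matches the stated default.
    """
    return n_variables * precision < 65_000 and n_variables < 10_000


if __name__ == "__main__":
    print(within_limits(10))    # True: 10 * 10 = 100
    print(within_limits(7000))  # False: 7000 * 10 = 70,000
```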
3.2.2. Arranging Variables/Attributes
Once you have selected the data set, arrange the variables in the following order: input variables first, output variables last.
Example: Consider the following system:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
The correct order is:
x1, x2, x3, y1, y2, y3
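If your raw records store variables by name in some other order, a small helper can put them into the required order before you write the data file. A sketch with a hypothetical name (`arrange_columns`), using the example system above:

```python
def arrange_columns(record, inputs, outputs):
    """Return the values of `record` with input variables first, outputs last.

    `record` maps variable names to values; `inputs` and `outputs` give the
    desired order within each group.
    """
    return [record[name] for name in inputs] + [record[name] for name in outputs]


if __name__ == "__main__":
    record = {"y1": 2, "x3": 1, "y2": 1, "x1": 1, "y3": 0, "x2": 0}
    print(arrange_columns(record, ["x1", "x2", "x3"], ["y1", "y2", "y3"]))
    # [1, 0, 1, 2, 1, 0]
```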
There are two files required to run the DecisionMaker: the Problem-Database file and the Question file.
Example: The following is a complete Problem-Database file:
=================== Beginning =====================
*
Let a system be defined by:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
We will generate the data as follows: assume initially that (x1, x2, x3) is in state (1 0 0), then it will generate (1 1 0). Now we will use (x1, x2, x3) = (1 1 0) as a question, then it will generate an answer (2 1 1), ...
*
6
1 0 0 1 1 0
1 1 0 2 1 1
2 1 1 4 2 1
4 2 1 0 4 2
0 4 2 6 0 4
... ...
1 0 1 2 1 0
2 1 0 3 2 1
3 2 1 6 3 2
6 3 2 4 6 3
4 6 3 0 0 0
=================== End ==========================
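The Problem-Database file has three sections:
Section 1 contains the remark, enclosed between the two "*" lines.
Section 2 contains the number of variables in each row (6 in this example).
Section 3 contains the Problem Database: one record per row.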
The Question file follows exactly the same format. The number of variables in the Question file must be smaller than the number of variables in the Problem-Database file.
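A sketch of a reader for this format, with the Question-file rule added as a check. It assumes integer data as in the examples (real-valued files would need `float` instead of `int`); `parse_data_file` and `check_question_file` are hypothetical names, not part of the DecisionMaker.

```python
def parse_data_file(text):
    """Parse a Problem-Database or Question file into (remark, width, rows).

    Assumed layout: a remark between two "*" lines, then the number of
    variables per row, then one record per line of integers.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "*":
        raise ValueError('the file must start with a "*" line')
    try:
        end = lines.index("*", 1)  # the "*" that closes the remark
    except ValueError:
        raise ValueError('the remark must end with a "*" line') from None
    if end + 1 >= len(lines):
        raise ValueError("missing the number of variables")
    remark = "\n".join(lines[1:end])
    width = int(lines[end + 1])
    rows = []
    for line in lines[end + 2:]:
        values = [int(v) for v in line.split()]
        if len(values) != width:
            raise ValueError(f"expected {width} values, got {len(values)}: {line!r}")
        rows.append(values)
    return remark, width, rows


def check_question_file(database_text, question_text):
    """Return the number of output variables, or raise ValueError.

    The Question file must have fewer variables than the Problem-Database file.
    """
    _, db_width, _ = parse_data_file(database_text)
    _, q_width, _ = parse_data_file(question_text)
    if q_width >= db_width:
        raise ValueError("the Question file must have fewer variables")
    return db_width - q_width


if __name__ == "__main__":
    database = "*\ny1 = ( x1 + x2 + x3 ) Mod 7;\n*\n6\n1 0 0 1 1 0\n1 1 0 2 1 1\n"
    questions = "*\ny1 = ( x1 + x2 + x3 ) Mod 7;\n*\n3\n1 0 1\n2 1 0\n"
    print(check_question_file(database, questions))  # 3
```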
* * *
For Mathematicians Only: The black box defines the following functions:
y1 = f1(x1, x2, ..., xn)
y2 = f2(x1, x2, ..., xn)
...
ym = fm(x1, x2, ..., xn)
Given a question (x1, x2, ..., xn), the answer y1, y2, ..., ym comes from the neural computation.
* * *
3.3 Step 2. Link Data Files To DecisionMaker
Three files are used:
Problem-Database file: example2a.txt
Question file: example2b.txt
Answer file: example2c.txt
These default files are automatically linked to the software.
When you use the DecisionMaker, you have to link your data files to the DecisionMaker by clicking "Data|Link" (or its button on the toolbar) and then typing in the data file names.
Assume your files are:
cancer1a.txt
cancer1b.txt
These two files must be in the same directory. Here is how to link:
Method 1:
Integer = numbers like 0, 2, 300;
Average = Weighted average.
The weighted average of
5 30%
6 50%
7 20%
is 5*0.30 + 6*0.50 + 7*0.20 = 5.9.
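The same calculation in Python, as a minimal sketch (`weighted_average` is a hypothetical helper; the percentages are used directly as weights and are divided by their sum, so they need not add up to exactly 100%):

```python
def weighted_average(values, weights):
    """Return sum(v * w) / sum(w); the weights need not add up to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


if __name__ == "__main__":
    print(round(weighted_average([5, 6, 7], [0.30, 0.50, 0.20]), 10))  # 5.9
```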
Under menu "Real", there are six commands (See Figure 8):
Figure 8. The "Real" commands
These are the primary commands. They present all possible predictions, each with a confidence number showing how valuable the DecisionMaker thinks that prediction is (the higher the number, the more confidence the DecisionMaker has in that prediction).
Now we will explain these commands:
In the above commands:
i.e. the result is likely to be 5.6, but could be in the range [5.4, 5.9], then :
Please write numbers such as .234 with a leading zero, as 0.234.
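If your data already contains numbers written like .234, a regular expression can add the leading zero. A minimal sketch (`add_leading_zeros` is a hypothetical name), meant for the data rows only, since it would also rewrite any "." followed by a digit in the remark:

```python
import re

# A "." that starts a number: not preceded by a digit or another ".",
# and followed by a digit.
_LEADING_DOT = re.compile(r"(?<![\d.])\.(?=\d)")


def add_leading_zeros(text):
    """Rewrite numbers such as .234 or -.5 as 0.234 or -0.5."""
    return _LEADING_DOT.sub("0.", text)


if __name__ == "__main__":
    print(add_leading_zeros("1 .234 -.5 2.75"))  # 1 0.234 -0.5 2.75
```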
Sometimes all the numbers in your database are integers. A pure integer presentation looks better than a real-number presentation, so integer commands are provided for your convenience. Under menu "Integer", there are six commands:
The default Answer file name is example2c.txt. You can change the file name by clicking: "Data|Link", or its button on the toolbar, and type in your new data file name.
A Confidence & Probability number appears to the right of each answer prediction. The number indicates how valuable the DecisionMaker thinks that prediction is. The higher that number, the more confidence the DecisionMaker has in that prediction.
Here is an answer file generated by clicking "Integer/+ Predict":
=================== Beginning =====================
*
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5;
*
1 0 1
Possibility Confidence*Probability
1 1 0 2048
2 1 0 24576
------------------------------------------------------
2 1 0
2 1 0
Possibility Confidence*Probability
1 1 0 1024
2 1 1 2048
3 2 1 32768
1 0 1 3072
------------------------------------------------------
3 2 1
3 2 1
Possibility Confidence*Probability
1 1 0 32
2 1 1 64
4 2 1 1024
2 1 0 384
3 2 1 1024
6 3 2 32768
2 0 2 384
1 0 1 96
------------------------------------------------------
6 3 2
6 3 2
Possibility Confidence*Probability
4 2 1 4
0 4 2 128
6 2 3 32
1 5 3 128
3 2 1 4
6 3 2 128
4 6 3 32768
4 2 0 24
6 4 2 96
------------------------------------------------------
4 6 3
4 6 3
Possibility Confidence*Probability
0 4 2 8
6 0 4 4
2 3 1 32
6 2 3 2
5 4 1 2048
2 1 0 64
6 3 2 8
6 4 1 24576
5 0 4 3
0 2 0 12
6 4 2 6
0 1 0 12
0 0 0 8192
------------------------------------------------------
5 3 1
Precision of each number:
0.33 0.33 0.22
=================== End ==========================
This answer file has three sections:
1 0 1
Possibility Confidence*Probability
1 1 0 2048
2 1 0 24576
------------------------------------------------------
2 1 0
The first line, 1 0 1, is the question.
The answers are:
Possibility Confidence*Probability
1 1 0 2048
2 1 0 24576
------------------------------------------------------
Their probabilities are:
2048/(2048 + 24576) = 1/13 and
24576/(2048 + 24576) = 12/13,
respectively. The odds are 12 to 1 in favor of pattern 2 1 0, which in this case is the correct answer. The line "---- ... ----" indicates the end of the distribution.
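The same normalization can be done with exact fractions in Python. A minimal sketch; `probabilities` is a hypothetical helper. It shows that the numbers 2048 and 24576 correspond to probabilities of 1/13 and 12/13, so the ratio between them is 12.

```python
from fractions import Fraction


def probabilities(weights):
    """Normalize Confidence*Probability numbers into exact probabilities."""
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


if __name__ == "__main__":
    p = probabilities([2048, 24576])  # possibilities 1 1 0 and 2 1 0
    print(p)            # [Fraction(1, 13), Fraction(12, 13)]
    print(p[1] / p[0])  # 12
```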
Answer: (y1, y2, ..., ym).
Let the possible answers be:
(y1, y2, ..., ym)1,
(y1, y2, ..., ym)2,
(y1, y2, ..., ym)3,
. . . ,
(y1, y2, ..., ym)l
and a set of relative probabilities be:
p1, p2, p3, ..., pl.
The Answer file looks like this:
*
The remark section from the Question file
*
(y1, y2, ..., ym)1, p1
(y1, y2, ..., ym)2, p2
(y1, y2, ..., ym)3, p3
. . . ,
(y1, y2, ..., ym)l , pl
---------------------------------
(a1, a2, ... , am )
Error:
(e1, e2, ... , em)
where
We have discussed 12 commands; each one gives a distribution. Sometimes you may not be interested in the distribution, but only in a simple answer. The "Average" commands give you one simple answer: the weighted average. There are 12 average commands.
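The Answer files in this chapter end each block with a single line after the dashes (for example 2 1 0 for question 1 0 1, and 5 3 1 for question 4 6 3). Those lines are consistent with a weighted average of each output column, rounded to the nearest integer, using the Confidence*Probability numbers as weights. The sketch below reproduces them under that assumption; it is not Attrasoft's implementation, and `average_answer` is a hypothetical name. Python's `round` sends exact halves to the nearest even integer, which may differ from the DecisionMaker's own rounding rule.

```python
def average_answer(distribution):
    """Weighted average of each output column, rounded to the nearest integer.

    `distribution` is a list of (possibility, weight) pairs taken from an
    Answer file, where `possibility` is a tuple of integers.
    """
    total = sum(weight for _, weight in distribution)
    width = len(distribution[0][0])
    return tuple(
        round(sum(p[i] * w for p, w in distribution) / total)
        for i in range(width)
    )


if __name__ == "__main__":
    # The block for question 4 6 3 in the example Answer file.
    block_463 = [
        ((0, 4, 2), 8), ((6, 0, 4), 4), ((2, 3, 1), 32), ((6, 2, 3), 2),
        ((5, 4, 1), 2048), ((2, 1, 0), 64), ((6, 3, 2), 8),
        ((6, 4, 1), 24576), ((5, 0, 4), 3), ((0, 2, 0), 12),
        ((6, 4, 2), 6), ((0, 1, 0), 12), ((0, 0, 0), 8192),
    ]
    print(average_answer(block_463))  # (5, 3, 1)
```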
If you want one simple answer, which is the one with the highest probability, a customized version has to be ordered from Attrasoft ($1000).
3.4.5 Confidence of a Prediction
A Confidence & Probability number will be to the right of each answer prediction. Among all the numbers in a prediction, the highest number is called the Confidence Number. The higher this number is, the more reliable the result is. For each prediction, one Confidence Number is produced. This number is similar to the number in an Internet Search Engine: the higher that number, the more confidence the DecisionMaker has in that prediction. Unfortunately, no quantitative description can be made for the Confidence Number. You can get a feel for this Confidence Number from experience.
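Following the definition above, the Confidence Number of a prediction is the largest Confidence*Probability number in its block. A short sketch (`confidence_number` is a hypothetical name), applied to two blocks of the example Answer file: question 2 1 0 has a Confidence Number of 32768, and question 4 6 3 has 24576.

```python
def confidence_number(distribution):
    """Return the largest Confidence*Probability number in one prediction."""
    return max(weight for _, weight in distribution)


if __name__ == "__main__":
    block_210 = [((1, 1, 0), 1024), ((2, 1, 1), 2048),
                 ((3, 2, 1), 32768), ((1, 0, 1), 3072)]
    block_463 = [((5, 4, 1), 2048), ((6, 4, 1), 24576), ((0, 0, 0), 8192)]
    print(confidence_number(block_210))  # 32768
    print(confidence_number(block_463))  # 24576
```

The second block lists only its three largest entries; the smaller ones do not change the maximum.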
The parameters of the prediction are:
All files used in this section are in your software package. Let a mapping be defined by:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5; 60%
y3 = ( x2 + 1 ) Mod 5; 40%
y4 = ( x4 + 1 ) Mod 5; 60%
y4 = ( x4 + 2 ) Mod 5; 40%
y5 = x4 Mod 3.
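The mapping can be sketched in Python to show why each question has four possible answers: y3 and y4 each have two alternatives, and the other outputs are fixed. This is an illustration under the stated 60%/40% rules, assuming y3 and y4 are chosen independently; it is not the program that produced the 9999 rows, and `possible_answers`, `sample_answer` and `generate_rows` are hypothetical names. Note that x5 does not appear in any equation.

```python
import random


def possible_answers(x):
    """Return the four possible answers (y1..y5) with their probabilities."""
    x1, x2, x3, x4, _x5 = x  # x5 does not appear in the equations
    y1, y2, y5 = (x1 + x2 + x3) % 7, x1, x4 % 3
    y3_options = [(x2 % 5, 0.6), ((x2 + 1) % 5, 0.4)]
    y4_options = [((x4 + 1) % 5, 0.6), ((x4 + 2) % 5, 0.4)]
    return [((y1, y2, y3, y4, y5), p3 * p4)
            for y3, p3 in y3_options
            for y4, p4 in y4_options]


def sample_answer(x, rng=random):
    """Draw one answer according to the 60% / 40% rules."""
    x1, x2, x3, x4, _x5 = x
    y3 = x2 % 5 if rng.random() < 0.6 else (x2 + 1) % 5
    y4 = (x4 + 1) % 5 if rng.random() < 0.6 else (x4 + 2) % 5
    return ((x1 + x2 + x3) % 7, x1, y3, y4, x4 % 3)


def generate_rows(start, count, seed=None):
    """Chain questions and sampled answers into rows of 10 integers."""
    rng = random.Random(seed)
    rows, state = [], tuple(start)
    for _ in range(count):
        answer = sample_answer(state, rng)
        rows.append(state + answer)
        state = answer
    return rows


if __name__ == "__main__":
    for answer, p in possible_answers((6, 6, 1, 3, 1)):
        print(answer, round(p, 2))
```

A seeded run produces a valid sequence, but because the choices are random it will generally differ from the rows listed below.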
Note that by design, each question can have 4 different answers. All four answers are correct, and each answer appears with a certain probability.
The parameters of this problem are:
9999 rows can be generated in this way:
1 0 0 0 0 1 1 0 1 0
1 1 0 1 0 2 1 2 3 1
2 1 2 3 1 5 2 1 0 0
5 2 1 0 0 1 5 3 2 0
1 5 3 2 0 2 1 0 4 2
2 1 0 4 2 3 2 2 1 1
3 2 2 1 1 0 3 3 2 1
The 9999 rows of data are divided into two parts:
Question Answer
6 6 1 3 1 6 6 1 4 0
6 6 1 4 0 6 6 1 0 1
6 6 1 0 1 6 6 1 1 0
6 6 1 1 0 6 6 1 3 1
6 6 1 3 1 6 6 1 4 0
They are further divided into 2 groups, as indicated above. The first group,
Question
6 6 1 3 1
6 6 1 4 0
6 6 1 0 1
6 6 1 1 0
6 6 1 3 1
forms the Question file for the DecisionMaker. The DecisionMaker is expected to take the Question file and produce the following Answer file:
Question Answer
6 6 1 3 1 6 6 1 4 0
6 6 1 4 0 6 6 1 0 1
6 6 1 0 1 6 6 1 1 0
6 6 1 1 0 6 6 1 3 1
6 6 1 3 1 6 6 1 4 0
Step 1. Problem-Database file and Question file
The first file is the Problem-Database file, which is used to educate the DecisionMaker. Let the file be "mathex2a.txt"; it looks like this:
=================== Beginning =====================
*
Let a system be defined by:
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5; 60%
y3 = ( x2 + 1 ) Mod 5; 40%
y4 = ( x4 + 1 ) Mod 5; 60%
y4 = ( x4 + 2 ) Mod 5; 40%
y5 = x4 Mod 3.
We will generate the data as follows: assume initially that (x1, x2, x3, x4, x5) is in state (1 0 0 0 0), then it will generate (1 1 0 1 0). Now we will use (x1, x2, x3, x4, x5) = (1 1 0 1 0) as a question, then it will generate an answer (2 1 2 3 1), ...
*
10
1 0 0 0 0 1 1 0 1 0
1 1 0 1 0 2 1 2 3 1
2 1 2 3 1 5 2 1 0 0
5 2 1 0 0 1 5 3 2 0
1 5 3 2 0 2 1 0 4 2
2 1 0 4 2 3 2 2 1 1
3 2 2 1 1 0 3 3 2 1
... ...
=================== End ==========================
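This Problem-Database file has the same three sections as before: the remark, the number of variables in each row (10), and the data, one record per row.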
=================== Beginning =====================
*
...
*
5
6 6 1 3 1
6 6 1 4 0
6 6 1 0 1
6 6 1 1 0
6 6 1 3 1
=================== End ==========================
This Question file has the same three sections: the remark, the number of variables in each row (5), and the questions, one per row.
Save the above two files in the same directory where the DecisionMaker is located. Click "Data/Link" (see Figure 4, first of 2 clicks) and the dialog box asks for the following:
1. Problem-Database File Name;
2. Question File Name;
3. Answer File Name;
4. Precision level of the prediction;
To link the data files, in the first 2 textboxes, enter
Step 3. Click command: "Integer/0 Predict"
Click "Integer/0 Predict" (second of 2 clicks) and the following Answer file is opened automatically:
=================== Beginning =====================
*
y1 = ( x1 + x2 + x3 ) Mod 7;
y2 = x1;
y3 = x2 Mod 5; 60%
y3 = ( x2 + 1 ) Mod 5; 40%
y4 = ( x4 + 1 ) Mod 5; 60%
y4 = ( x4 + 2 ) Mod 5; 40%
y5 = x4 Mod 3.
*
6 6 1 3 1
Possibility Confidence*Probability
6 6 1 4 0 65536
6 6 1 0 0 65536
6 6 2 4 0 163840
6 6 2 0 0 32768
------------------------------------------------------
6 6 2 3 0
6 6 1 4 0
Possibility Confidence*Probability
6 6 1 0 1 229376
6 6 2 1 1 163840
6 6 1 1 1 32768
6 6 2 0 1 65536
------------------------------------------------------
6 6 1 0 1
6 6 1 0 1
Possibility Confidence*Probability
6 6 1 1 0 294912
6 6 2 1 0 98304
6 6 1 2 0 65536
6 6 2 2 0 32768
------------------------------------------------------
6 6 1 1 0
6 6 1 1 0
Possibility Confidence*Probability
6 6 1 2 1 262144
6 6 2 3 1 131072
6 6 1 3 1 98304
6 6 2 2 1 65536
------------------------------------------------------
6 6 1 2 1
6 6 1 3 1
Possibility Confidence*Probability
6 6 1 4 0 65536
6 6 1 0 0 65536
6 6 2 4 0 163840
6 6 2 0 0 32768
------------------------------------------------------
6 6 2 3 0
Precision of each number:
0.33 0.33 0.22 0.22 0.11
=================== End ==========================
Step 4. Results
This Answer file has three sections:
6 6 1 3 1
Possibility Confidence*Probability
6 6 1 4 0 65536
6 6 1 0 0 65536
6 6 2 4 0 163840
6 6 2 0 0 32768
------------------------------------------------------
6 6 2 3 0
The first line, 6 6 1 3 1, is the question.
The possibilities are:
6 6 1 4 0 65536
6 6 1 0 0 65536
6 6 2 4 0 163840
6 6 2 0 0 32768
Customized software can be ordered from Attrasoft upon your request for the following reasons:
Example: Assume X1 and X2 contribute to a classification in a similar fashion; then:
Correct conversion:
Attribute Value
X1 good(1), ok(2), bad(3)
X2 good(1), ok(2), bad(3)
Wrong conversion:
Attribute Value
X1 good(1), ok(2), bad(3)
X2 good(3), ok(2), bad(1)
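One way to guard against the wrong conversion is to define a single value scale and apply it to every attribute that influences the class in the same direction. A minimal sketch with hypothetical names (`QUALITY_SCALE`, `encode_record`):

```python
# One shared scale for attributes that influence the class in the same way.
QUALITY_SCALE = {"good": 1, "ok": 2, "bad": 3}


def encode_record(record, attributes, scale=QUALITY_SCALE):
    """Convert the named categorical attributes of `record` to integers."""
    return [scale[record[name]] for name in attributes]


if __name__ == "__main__":
    print(encode_record({"X1": "good", "X2": "bad"}, ["X1", "X2"]))  # [1, 3]
```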
http://attrasoft.com
and click the email button.
We encourage all comments from our prospective customers. These comments are delivered directly to our new product development department. Your comments directly affect our future products; we aim to please you.
To make a comment, go to:
http://attrasoft.com
and click the email button.