4.1 Trend
4.2 Precision Level
4.3 Trend, Precision and Confidence
4.4 Roles of the Four Parameters
4.5 Directory Input
4.6 Batch Input
4.6.1 Batch Run
4.6.2 Batch File Format
4.6.3 Batch Run Code
4.7 Data Selection
4.8 Using Excel as Your Text Editor
4.9 Data Preprocessing Using Excel
4.9.1 Moving-Average
4.10 Example of Preprocessing
4.10.1 Using Original Data
4.10.2 Using the 5-Month Moving Average
4.11 Example of Poor Data
4.12 The Software Limits
4.13 Customized Software

4. Advanced Topics

4.1 Trend

What is an N-trend? Suppose you have 5 rows of data:

3
4
7
2
1

then a 2-trend is the combination of all 2 consecutive numbers together:

(3 4), (4 7), (7 2), (2 1).

A 3-trend is:

(3 4 7), (4 7 2), (7 2 1).

A 4-trend is:

(3 4 7 2), (4 7 2 1).

N-trend means the PredictorPro looks at N rows of data at a time. This is one of the parameters used by the PredictorPro. The N-trend is a user-selected variable. If you do not make a choice, the default value is 5.
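The definition above can be sketched in a few lines of Python. This is only an illustration of what an N-trend is, not the PredictorPro's internal code; the function name n_trends is chosen here for the example.

```python
def n_trends(rows, n):
    """Return every run of n consecutive rows as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A list of length L has L - n + 1 windows of n consecutive rows.
    return [tuple(rows[i:i + n]) for i in range(len(rows) - n + 1)]


rows = [3, 4, 7, 2, 1]
print(n_trends(rows, 2))  # [(3, 4), (4, 7), (7, 2), (2, 1)]
print(n_trends(rows, 3))  # [(3, 4, 7), (4, 7, 2), (7, 2, 1)]
print(n_trends(rows, 4))  # [(3, 4, 7, 2), (4, 7, 2, 1)]
```

Note that a longer trend leaves fewer windows for the same number of rows, which is one way to see why a longer trend needs more data.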

To change the N-trend (See Figure 4):

Click "Data/Link";
Go to "Enter trend", and enter your number.

Under similar prediction confidence numbers, the longer the Trend, the more accurate the prediction will be.

You have to balance the two sides of this parameter. On one hand, a 10-trend generally provides a more accurate prediction than a 5-trend. On the other hand, a 10-trend prediction requires a much larger volume of data. One way to increase N is to use preprocessed data. Using moving averages can increase N significantly.

4.2 Precision Level

The Precision-level determines the error of each variable. The higher the Precision is, the lower the errors are. This is one of the parameters used by the PredictorPro. The Precision level is a user-selected variable. If you do not make a choice, the default value is 10.

This parameter will directly change the errors in the output file. To change the Precision level (See Figure 4):

Click "Data/Link";
Go to "Enter precision level", and enter your number.

You have to balance the two sides of this parameter. On one hand, a 10-level prediction certainly provides more precision than a 5-level prediction. On the other hand, a 10-level prediction requires a much larger volume of data. One way to increase the precision level is to use preprocessed data. Using the moving average can increase the precision level significantly.

4.3 Trend, Precision and Confidence

In general, the longer the trend, the more accurate the prediction will be. The higher the precision, the lower the error(s) will be.

Between these two factors, the Trend is more important than the Precision. If possible, always go for the maximum Trend.

Here is how to set the Trend: as you increase the Trend from low to high, you will initially see the confidence (rating) increase. If you keep increasing the Trend, for most problems the confidence begins to drop at a certain point. You should establish an acceptable confidence level, and that will tell you when to stop.

Example "Example/Intel 2"

Click "Example/Intel 2" to generate the example, which predicts the 5-month moving average of Intel for October 1998. Set Trend = 10 and click "Real/+ Exponential". The PredictorPro produces no results, so Trend = 10 is too high. Set the Trend to 9, and click "Real/+ Exponential" again to get:
Possibility Confidence*Probability
77.8113 16384
---------------------------------------------------
Weighted Average
77.8113
Highest Probability
77.8113 16384
Error of each number
0.859271
Therefore, the maximum Trend is 9.

In general, the higher the Trend value is, the lower the Precision level.

4.4 Roles of the Four Parameters

The four parameters have the following impact on the prediction:

Data volume: one year's data will produce more accurate results than one month's data;
Number of variables in the data set: more variables require more data;
Trend: a longer trend requires more data than a shorter trend;
Precision level: a higher precision prediction requires more data than a lower one.

If you run the PredictorPro, and get too much information, you can do one or more of the following:

Reduce the data volume;
Increase the number of variables in the data set;
Increase the Trend;
Increase the Precision level.

If you run the PredictorPro, and get no results, you can do one or more of the following:

Data preprocessing;
Increase the data volume;
Reduce the number of variables in the data set;
Reduce the Trend;
Reduce the Precision level.

Data preprocessing can improve the results significantly. Real data contains noise fluctuations. This noise can be removed by a simple procedure such as taking moving averages. If the noise is left in the data, the PredictorPro will assume the fluctuations are meant to be learned and will treat them as well-defined patterns in the data. As a result, it will shorten the Trend and reduce the Precision. As we will see, preprocessing (such as the moving average) can be done in two steps in Excel, which takes only a few seconds.

4.5 Directory Input

The directory input lets you make predictions with all files in a single directory. There are two restrictions:

All the input files in the directory must be "*.txt" type and must follow the Predictor formats. All other files in the directory will be ignored.
The maximum number of files is 2,000.
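Before linking a large directory, you may want to check these two restrictions yourself. The following Python sketch is a hypothetical helper, not part of the Predictor: it lists the ".txt" files in a directory and reports an error if there are more than 2,000. It does not check whether each file follows the Predictor formats, and treating the extension as case-insensitive is an assumption.

```python
from pathlib import Path

MAX_FILES = 2000  # limit stated in the restrictions above


def list_input_files(directory):
    """Return the .txt files in 'directory', sorted by name.

    Files with other extensions are skipped. File contents are NOT
    validated against the Predictor formats.
    """
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )
    if len(files) > MAX_FILES:
        raise ValueError(
            f"{len(files)} .txt files found; the limit is {MAX_FILES}"
        )
    return files
```

For example, calling list_input_files on the directory of the Dow example would return the paths of its ".txt" files, provided the current directory is the home directory.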

The operation of the Predictor is "link and run":

To link:

Open any file in the input directory; this will lead the Predictor to the desired directory.
Click the link-button (4th button) to open the input dialog box (see Figure 4); check the Directory option, and click OK.

To run:

Click a command under Real, Integer, Avg-Max, 3-Average or 10-Average menus.

Note:

After linking, if you have many files, please wait for a few seconds to allow the Predictor to check the format of each file in the directory. The Predictor will tell you how many files are linked to the Predictor.

The output file will list:

Input file; and
Prediction results.

There are two examples for directory input:

Attrasoft Dow 5;
Georgia Lottery Cash 3.

Attrasoft Dow 5 is located in the "dj30\" directory and Georgia Cash 3 is located in the "cahs3\" directory. Make sure you are in the home directory for these two examples.

Example: Attrasoft Dow 5

To link:
     Open "dj30\aa.txt"; this will lead the PredictorPro to the directory "dj30\".
     Click the link-button (4th button) to open the input dialog box;
     Check the Directory option, and click OK.
To run:
     Click the command "10 Average\+ Real Exponential".
The results are:
********************************************
Input file: axp.txt
54.5826 512
56.5172 136
58.1804 97
59.8207 5870
61.5539 11192
63.6219 14193
65.5341 24433
67.2738 22385
68.9473 22385
71.2637 14193
********************************************
Input file: ba.txt

41.7274 2464
41.4983 1552
41.3772 2224
41.3755 2928
41.1061 17024
40.8174 3584
40.5307 3584
40.246 3584
39.9633 3584
39.6826 3584
Some files have no predictions.

4.6 Batch Input

Batch input makes predictions from a batch file. The batch file name is fixed as "batch.txt" and cannot be changed, but the directory that holds it can be: the Predictor uses the "batch.txt" in the current directory. A batch run always uses this file. Fixing the file name keeps batch operation simple.

The operation of the Predictor is "link and run". There are two topics here:

Batch run; and
Batch file format.

4.6.1 Batch Run

The operation of the Predictor is "link and run":

Link:

Click the "Batch/Open" command to open the batch file;
Cut and paste your saved batch file into this file, after clearing the old contents;
Click the save-button (3rd button).

Run: click the "Batch/Run" command.

4.6.2 Batch File Format

A typical batch file looks like this:

Example 1, a batch file for "File input":

*
*
example2b.txt
10
5
10
1
example2a.txt

Example 2, a batch file for "Directory input":

*
*
example2b.txt
10
5
10
2
a.txt
example2a.txt

The batch file format is:

*
comments
*
Output File Name
Precision
Trend
Batch Run Code
Number of Files
file1.txt
file2.txt
...
fileN.txt

The batch file has two sections:

Comment section; and
Data section.

The comment section starts with a "*" and ends with another "*". The purpose of this section is to document your data:

Give a description of the data;
Specify the data set used;
Specify the beginning and the ending dates of the data;
. . .

In the data section, the first 3 variables are:

Output File Name
Precision
Trend

These 3 variables define the 3 corresponding variables in the input dialog box in Figure 4.

The 4th variable is the Batch Run Code, which defines the command to be used. The first command, "Real/+Linear", has code = 1; the next command has code = 2, and so on. The codes are explained in Section 4.6.3.

After the batch run code, the 5th variable, N, is the number of files, which is followed by N files:

N
file1.txt
file2.txt
...
fileN.txt

N will tell the Predictor how many files will be used. For the file input, N = 1; and for the directory input, N > 1 and N <= 2000.
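The layout described above can also be assembled by a short script. The Python sketch below is a hypothetical helper (the name build_batch_text and its arguments are chosen for this illustration); it simply emits the lines in the order given by the batch file format. It assumes that no comment line consists of a single "*", since that character marks the end of the comment section.

```python
def build_batch_text(output_file, precision, trend, run_code,
                     input_files, comments=()):
    """Assemble batch-file text: comment section, then data section."""
    if not 1 <= len(input_files) <= 2000:
        raise ValueError("the number of input files must be 1 to 2000")
    lines = [
        "*", *comments, "*",        # comment section
        output_file,                # Output File Name
        str(precision),             # Precision
        str(trend),                 # Trend
        str(run_code),              # Batch Run Code
        str(len(input_files)),      # Number of Files (N)
        *input_files,               # N file names
    ]
    return "\n".join(lines) + "\n"


# Reproduces Example 1 above: file input, one input file.
print(build_batch_text("example2b.txt", 10, 5, 10, ["example2a.txt"]))
```

The resulting text can be pasted into the batch file after "Batch/Open", as described in Section 4.6.1.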

One quick way to generate a batch file is to use the "Batch/Save" command. This command records the last action, including:

Output File Name
Precision
Trend
Command used
Input files

When creating a batch file using the "Batch/Save" command, it is important to remember a few things:

The only batch file used by the Predictor is "batch.txt" in the current directory. So if you click "Batch/Save" again, the newly saved batch file will overwrite the old one. In general, this does not matter because a batch file can be generated quickly.

Even if you have 1,000 files in a directory and duplicating the last action takes a long time, you can still get a batch file quickly. Use "File Input" and click a command; this will create a batch command code on a single run. Now change the "File Input" to the "Directory Input", and click "Batch/Save"; you will get your long batch file for the 1,000 files.

If you do not want a file to be overwritten:

Click "Batch/Save" to create a batch file; and
Click "File/Save As" to save the file under a different name.

The Predictor will use the "batch.txt" file in the current directory. So if you change the directory, you will use a different batch file.

A few points are worth remembering. When using the batch input, a file can be specified in two ways:

Relative path;
Absolute path.

For example, let a file be "c:\abc\def\ghi.txt"; then

The relative path is "ghi.txt"; and
The absolute path is "c:\abc\def\ghi.txt".

If the relative path, "ghi.txt", is used to pick up this file, the Predictor must be in the directory "c:\abc\def\". Otherwise, the Predictor will not be able to find the file, because it looks for the file in the current directory. If the absolute path, "c:\abc\def\ghi.txt", is used, then it does not matter where the Predictor is.
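The difference between the two kinds of path can be illustrated with Python's standard ntpath module, which applies Windows path rules on any platform. This is only a sketch of how a relative path depends on the current directory; the function name resolve and the directory "c:\other" are made up for the example, and nothing here is taken from the Predictor itself.

```python
import ntpath  # Windows path rules, usable on any platform


def resolve(path, current_dir):
    """Return the full path that 'path' refers to when looked up from current_dir."""
    if ntpath.isabs(path):
        # An absolute path does not depend on the current directory.
        return ntpath.normpath(path)
    # A relative path is looked up starting from the current directory.
    return ntpath.normpath(ntpath.join(current_dir, path))


print(resolve("ghi.txt", r"c:\abc\def"))            # c:\abc\def\ghi.txt
print(resolve("ghi.txt", r"c:\other"))              # c:\other\ghi.txt
print(resolve(r"c:\abc\def\ghi.txt", r"c:\other"))  # c:\abc\def\ghi.txt
```

The second call shows the failure case: from a different current directory, the relative path no longer points at the intended file.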

For file input, the "Batch/Save" command will use the relative path. For directory input, the "Batch/Save" command will use the absolute path the first time and the relative path the rest of the time. In this way, all paths are absolute paths. However, if you want to use relative paths, click "Batch/Save" twice consecutively.

4.6.3 Batch Run Code

For simple prediction, the code is 1, 2, 3, ..., 27, starting from the first command:

Real

+ Linear                   1
- Linear                   2
0 Linear                   3
+ Exponential              4
- Exponential              5
0 Exponential              6
+ Linear Enumerative       7
- Linear Enumerative       8
0 Linear Enumerative       9

Integer

+ Linear                   10
- Linear                   11
0 Linear                   12
+ Linear Enumerative       13
- Linear Enumerative       14
0 Linear Enumerative       15

AvgMax

+ Real Linear              16
- Real Linear              17
0 Real Linear              18
+ Real Exponential         19
- Real Exponential         20
0 Real Exponential         21
+ Integer Linear           22
- Integer Linear           23
0 Integer Linear           24
+ Integer Enumerative      25
- Integer Enumerative      26
0 Integer Enumerative      27

The command code runs from 1 to 27. For multiple-step predictions, the two right digits will still be the same. For example, the right 2 digits of "3-Average/+ Real Linear" are 01. This is true for all other commands. The two left digits indicate the number of steps. For example, for the "3-Average" menu, the left two digits are 03, or 3; for the "10-Average" menu, the left two digits are 10. The command codes are:

3-Average

+ Real Linear              301
- Real Linear              302
0 Real Linear              303
+ Real Exponential         304
- Real Exponential         305
0 Real Exponential         306
+ Integer Linear           310
- Integer Linear           311
0 Integer Linear           312

10-Average

+ Real Linear              1001
- Real Linear              1002
0 Real Linear              1003
+ Real Exponential         1004
- Real Exponential         1005
0 Real Exponential         1006
+ Integer Linear           1010
- Integer Linear           1011
0 Integer Linear           1012
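The numbering rule for the multiple-step codes can be written as a small calculation: the code is the number of steps times 100, plus the two-digit code of the underlying command. The Python sketch below only illustrates this arithmetic (the function names are chosen for the example); note that the tables above list multiple-step codes only for the Real Linear, Real Exponential and Integer Linear commands.

```python
def multi_step_code(steps, base_code):
    """Combine a step count (3 or 10 in the tables above) with a base command code."""
    if not 1 <= base_code <= 99:
        raise ValueError("base code must fit in two digits")
    return steps * 100 + base_code


def split_code(code):
    """Split a batch run code into (steps, base_code).

    For simple predictions (codes 1 to 27) the step count comes out as 0.
    """
    return divmod(code, 100)


print(multi_step_code(3, 1))    # 301
print(multi_step_code(10, 4))   # 1004
print(split_code(1012))         # (10, 12)
print(split_code(7))            # (0, 7)
```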

There is no need to remember these codes. If you need a particular code, use the file input and click the command you need. This tells the Predictor which command you want. Then go to the proper directory, choose either directory or file input, and generate the batch code by using "Batch/Save".

There are two examples for batch files:

Attrasoft Dow 5;
Georgia Lottery Cash 3.

For the "Dow 5" example:

Make sure you are in the home directory;
Click "Example\Attrasoft DJ 30, Batch";
Click "Batch\Run".

For the Georgia Cash 3 example:

Make sure you are in the home directory;
Click "Example\Cash 3 Dir, Batch";
Click "Batch\Run".

4.7 Data Selection

Data preparation is the most important factor in a prediction or forecast. Data preparation means choosing two things:

the data set; and
the number of rows.

Your data preparation directly influences the results. In this section, we will show you how to prepare data.

The better you choose variables, the better the potential for the prediction. This is the single most productive area to work on. You can increase the information content of your data in several ways:

Adding variables: the first approach is to add variables to the data set. Assuming that a new variable captures some aspect of the problem not already included in the other variables you've chosen to be in the data set, the information content is increased. On one hand, obviously, you should continue to do so until you think all aspects of the problem are covered. Otherwise, the data set will not be complete. On the other hand, each additional variable increases the complexity of the problem exponentially. To compensate for this, more data is required. If your data is limited, then your ability to make a prediction is limited.
Data Preprocessing: This approach transforms the existing data into another form. Typically, this is just a mathematical manipulation of the data. A moving average, for example, is a mathematical manipulation of the underlying price variable. This approach may allow us to extract information more effectively or more efficiently. For stock prediction, we always recommend using the moving averages of the underlying stock price.
Removing Unnecessary Variables: This approach removes data that does not increase the information content. Overloading the PredictorPro with useless or redundant information will increase the complexity of the computation exponentially and will require more data. If you do not have enough "rows" for the additional variables, the quality of the prediction can be reduced. Averaging Intel and Microsoft stock prices, for example, is a mathematical manipulation, which covers both factors with only one number.
Increase Data Volume: This approach increases the data volume. Two years of data, for example, is better than one year of data, if the conditions in these two years are basically the same. In reality, there is a limit to the data volume. For the Attrasoft PredictorPro, always go for the maximum amount of available data. The cost of increasing data volume in computation time is almost 0. The data volume is not a factor in the Attrasoft PredictorPro, for the software is designed for a large volume.

Using similar data: It is quite possible that sometimes you have no data. In this case, you might consider using similar data. We will present you with an example in which, despite not having Intel data, we still want to predict Intel stock by substituting Intel with Microsoft, because Microsoft stock is similar to Intel stock. As you will see, the PredictorPro will still be able to make a direct hit.

Once again, data set selection is your job. This is the single most productive area to work in, regardless of the technology employed for your data-mining effort. Your largest contribution to your prediction is right here.

4.8 Using Excel as Your Text Editor

Almost any word processor can be used as the text file editor (Microsoft Word, WordPerfect, Windows Notepad, ...), as long as you save the files in text format. We recommend using Microsoft Excel as a text editor. First of all,

Excel opens data in text files directly; and
Excel can save files into text format directly.

Secondly, Excel gives you the power to manipulate the data easily. The functions in Excel you might use repeatedly are:

Opening text data in any format (numbers separated by a space, tab, comma, ...);
Moving average;
Sorting;
Selecting a column or several columns from a data file;
. . .

4.9 Data Preprocessing Using Excel

Data preprocessing transforms the existing data into another form. Typically, this is just a mathematical manipulation of the data. A moving average, for example, is a mathematical manipulation of the underlying price variable. This approach may allow us to extract information more effectively or more efficiently.

4.9.1 Moving-Average

When dealing with data with noise fluctuations, like stock market prediction, always use the moving averages of the underlying variables.

If there are noise fluctuations in the data, this noise can be removed by a simple procedure such as a moving average. If the noise is left in the data, the PredictorPro will assume the fluctuations are meant to be learned and will treat them as well-defined patterns in the data. As a result, it will shorten the Trend and reduce the Precision of a prediction.

It is an easy job to convert the original data to the moving average data. There are only two steps to convert the data:

Compute the first moving average;
Use "Edit/Fill/Down" to fill in the rest of the moving averages.

Now we will show these two steps in detail. Assume we want to calculate the 5-month moving average of the SP 500. Figure 10 shows the data.

Figure 10. SP 500 data in Excel.

The data is as follows:

        A          B          C

3       DATE       CLOSE
4       Nov-91     375.22
5       Dec-91     417.09
6       Jan-92     408.78
7       Feb-92     412.7
8       Mar-92     403.69
9       Apr-92     414.95
10      May-92     415.35
. . .

Step 1: at the C8-cell, enter "=SUM(B4:B8)/5".
Here "SUM(B4:B8)" means B4 + B5 + B6 + B7 + B8 in Excel language. Please see Figure 11.

Figure 11. First moving average of the SP 500 in C8-cell.

Hit enter and the first moving average is calculated. In this case, the first moving-average is 403.496 (See Figure 12).

Figure 12. First moving average of the SP 500 is in C8-cell.

Step 2: Highlight the cells C8, C9, C10, ... (see Figure 12), then click "Edit/Fill/Down".

Now the moving average computation is completed. The result looks like Figure 13.

Figure 13. The rest of the moving averages of the SP 500 in C-column.
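For readers who prefer a script to a spreadsheet, the same two steps can be sketched in Python. This is an alternative illustration, not part of the PredictorPro or of Excel; the function name moving_average is chosen for the example. It uses the seven closing values listed above and reproduces the first moving average, 403.496.

```python
def moving_average(values, window=5):
    """Trailing moving average; the first result covers values[0:window]."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # Result i averages the 'window' values ending at position i,
    # which mirrors =SUM(B4:B8)/5 filled down the column.
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


close = [375.22, 417.09, 408.78, 412.7, 403.69, 414.95, 415.35]
averages = moving_average(close, 5)
print([round(a, 3) for a in averages])  # [403.496, 411.442, 411.094]
```

As in the spreadsheet, the first four rows have no moving average, so seven rows of data yield three averaged rows.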

4.10 Example of Preprocessing

In this section, we will compare the results of using the original data and the results of using the moving average data (preprocessed data).

Two examples will be used. Both ship with the PredictorPro and can be generated by clicking "Example/Intel 1" and "Example/Intel 2".

4.10.1 Using Original Data

1. Click "Example/Intel 1" and the original data will appear:

5.125
6.125
7.484
7.938
6.906
...
71.438
74.125
84.438
71.188
85.75

The target (for Intel on Oct-98) is 89.188.

2. Click "Real/- Exponential" and the following results will appear:

Possibility              Confidence*Probability
73.138                  128
95.1938                256
---------------------------------------------------
Weighted Average
87.8418
Highest Probability
95.1938 256
Error of each number
2.20558

Using the weighted average, the answer is:

87.8418 ± 2.20558 = 87.8 ± 2.2.

This prediction does hit 89.188. However, the confidence number, 256, is extremely low, indicating that the PredictorPro does not have much confidence in the data at all. If we use 500 as the minimum acceptable confidence level, the above prediction would not be used.
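The summary lines of the output can be checked by hand. The Python sketch below assumes that the "Weighted Average" is the sum of each possibility times its Confidence*Probability number, divided by the sum of those numbers. This assumption is not stated in the output itself, but it reproduces the listed value to within rounding. The function name summarize and the default threshold of 500 (taken from the paragraph above) are choices made for this illustration.

```python
def summarize(outcomes, min_confidence=500):
    """outcomes: non-empty list of (possibility, confidence*probability) pairs."""
    total = sum(w for _, w in outcomes)
    # Assumed formula: possibilities weighted by their confidence numbers.
    weighted_avg = sum(p * w for p, w in outcomes) / total
    # "Highest Probability" is the possibility with the largest number.
    best_value, best_weight = max(outcomes, key=lambda pw: pw[1])
    accepted = best_weight >= min_confidence
    return weighted_avg, best_value, best_weight, accepted


outcomes = [(73.138, 128), (95.1938, 256)]
avg, best, weight, accepted = summarize(outcomes)
print(round(avg, 2), best, weight, accepted)  # 87.84 95.1938 256 False
```

The last value is False because the highest confidence number, 256, is below the chosen minimum of 500.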

4.10.2 Using the 5-Month Moving Average

1. Click "Example/Intel 2" and the 5-month moving average data will appear:

6.7156
7.0282
7.0532
6.9814
6.8876
...
80.2004
78.8254
77.7754
76.4004
77.3878

The target (for the 5-month moving average of Intel on Oct-98) is 80.9378.

2. Click "Real/- Exponential" and the following results will appear:

Possibility     Confidence*Probability
81.2484       6400
77.8113       10344
76.0928       4235
84.6855       273
88.1226       1
86.404         33
79.5299       168
74.3742       8320
82.9669       261
---------------------------------------------------
Weighted Average
77.476
Highest Probability
77.8113 10344
Error of each number
0.859271

The highest confidence number is 10,344 in this example, compared with 256 in the last example, which indicates that the PredictorPro is far more confident in this case. The prediction, 77.476 ± 0.859271, is close to 80.9378.

4.11 Example of Poor Data

Again, we want to predict the 5-month moving average of Intel for Oct-98, which is 80.9378.

Assume the only data we have for Intel is:

DATE       HIGH         LOW         CLOSE       VOLUME     5m Avg
Jan-98        82.875       70.5           81              339559700
Feb-98       95.625       82.188       89.688       333269000
Mar-98      91.375        74             78.063       425380400
Apr-98       85.063       71.313       80.813       386529700
May-98      85.438       71.25          71.438      271927700       80.2004
Jun-98        77.625       65.656       74.125       360081300       78.8254
Jul-98         88.25         72.25          84.438      396391200       77.7754
Aug-98       92.625       70.938       71.188       440267900       76.4004
Sep-98       89              69.75          85.75        369103300       77.3878
========================================
Oct-98       90.813       75.813        89.188       393720800       80.9378

This is not enough data for the prediction. To get around this problem, we note that:

(1) Microsoft and Intel are similar; and
(2) We do have Microsoft data, as listed below:

Monthly prices (Nov 1991 to Nov 1998)
DATE       HIGH       LOW       CLOSE   VOLUME       5m Avg
Nov-91     8.479       7.563       8.104       221809200
Dec-91      9.333       8              9.271      202192800
Jan-92       11.104     9.125       10.021    367232400
Feb-92      10.75       9.417       10.292     348507600
...
May-98     91             81.875     84.813     273845600       84.7564
Jun-98       108.563    83.125     108.375   308288100       91.5126
Jul-98        119.625    105.375   109.938   307324500       96.5502
Aug-98      113.75      95.75       95.938     337509900       97.8378
Sep-98      114.625    94.5         110.063    298081100       101.8254
=========================================
Oct-98       110.125    87.75       105.875    410762300       106.0378

So here is the solution. We will use the 5-month moving average of the Microsoft data to predict Intel. Here is the data:

9.5126
9.7294
9.8918
9.6376
9.398
...
84.7564
91.5126
96.5502
97.8378
101.8254
80.2004
78.8254
77.7754
76.4004
77.3878

The first part of the data is the Microsoft data; the last 5 numbers are the Intel data. Click "Real/+ Exponential", and we have:

Possibility      Confidence*Probability
79.0219        26752
76.242           2048
---------------------------------------------------
Weighted Average
78.8242
Highest Probability
79.0219 26752
Error of each number
1.38994

The most likely outcome is 79.0219 ± 1.39, that is, about 77.63 to 80.41. The target, 80.9378, lies about 0.5 above this range, so the prediction is close to the target but not an exact hit.

Click "Real/- Exponential", and we have similar results:

Weighted Average
79.6091
Highest Probability
79.0219 34576
Error of each number
1.38994

4.12 The Software Limits

The standard version of the PredictorPro has 100,000 external neurons and a certain number of internal neurons. In your prediction, if you set

M precision-levels;
K variables in the data set; and
N Trend;

then the PredictorPro uses N*M*K neurons. Therefore, the limit is:

N*M*K <= 100,000 neurons

This number can go as high as 1,000,000 in a customized version.
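The limit can be checked before a run with a one-line calculation. The Python sketch below is only an illustration of the inequality above; the function names and the sample settings are chosen for the example.

```python
STANDARD_LIMIT = 100_000  # standard version, as stated above


def neurons_needed(trend, precision_levels, variables):
    """N * M * K: Trend times precision levels times variables."""
    return trend * precision_levels * variables


def fits(trend, precision_levels, variables, limit=STANDARD_LIMIT):
    """True if the settings stay within the neuron limit."""
    return neurons_needed(trend, precision_levels, variables) <= limit


print(neurons_needed(5, 10, 1))   # 50
print(fits(10, 100, 100))         # True  (exactly 100,000)
print(fits(10, 100, 101))         # False (101,000)
```

For a customized version, pass the larger limit explicitly, for example fits(10, 100, 101, limit=1_000_000).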

4.13 Customized Software

Customized software can be ordered from Attrasoft upon your request for the following reasons:

Larger External Networks: The current version supports up to 100,000 external neurons. If you want a larger one, it has to be customized. We produce neural networks with up to 1,000,000 neurons.
Larger Internal Networks: The current version supports up to 100,000 internal neurons. If you want a larger one, it has to be customized.
Special Data Interface: If you need to interface the PredictorPro with your application data, customized software can be ordered. Examples: missing data, string data, ...

To get updated information:

http://attrasoft.com

Online Order/Online Fax order:
Attrasoft PredictorPro for Windows
Version 2.8 ($494.99 + $5 US Shipping and Handling)

Mail Order: $499.99 (S&H included)

PredictorPro
Attrasoft, Attn.: Gina
P. O. Box 13051
Savannah, GA. 31406, USA

Send questions or comments to:

webmaster@attrasoft.com

or contact us at:
Attrasoft, P. O. Box 13051, Savannah, GA. 31406, USA