4. Advanced Topics
4.1 Trend
4.2 Precision Level
4.3 Trend, Precision and Confidence
4.4 Roles of the Four Parameters
4.5 Directory Input
4.6 Batch Input
4.6.1 Batch Run
4.6.2 Batch File Format
4.6.3 Batch Run Code
4.7 Data selection
4.8 Using Excel as Your Text Editor
4.9 Data Preprocessing Using Excel
4.9.1 Moving-Average
4.10 Example of Preprocessing
4.10.1 Using Original Data
4.10.2 Using the 5-Month Moving Average
4.11 Example of Poor Data
4.12 The Software Limits
4.13 Customized Software
4. Advanced Topics
4.1 Trend
What is an N-trend? Suppose you have 5 rows of data:
3
4
7
2
1
then a 2-trend is the set of all groups of 2 consecutive numbers:
(3 4), (4 7), (7 2), (2 1).
A 3-trend is:
(3 4 7), (4 7 2), (7 2 1).
A 4-trend is:
(3 4 7 2), (4 7 2 1).
An N-trend means the PredictorPro looks at N rows of data at a time.
The N-trend is one of the user-selected parameters of the PredictorPro.
If you do not make a choice, the default value is 5.
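The grouping above is a sliding window over the rows. The short Python sketch below reproduces the 2-, 3- and 4-trends of the example; it only illustrates the definition and is not the PredictorPro's internal code (the function name `n_trends` is hypothetical).

```python
from typing import List, Sequence, Tuple


def n_trends(rows: Sequence[float], n: int) -> List[Tuple[float, ...]]:
    """Return every group of n consecutive rows (an N-trend), in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # len(rows) values give len(rows) - n + 1 groups; none if n > len(rows).
    return [tuple(rows[i:i + n]) for i in range(len(rows) - n + 1)]


data = [3, 4, 7, 2, 1]
for n in (2, 3, 4):
    print(f"{n}-trend:", n_trends(data, n))
# 2-trend: [(3, 4), (4, 7), (7, 2), (2, 1)]
# 3-trend: [(3, 4, 7), (4, 7, 2), (7, 2, 1)]
# 4-trend: [(3, 4, 7, 2), (4, 7, 2, 1)]
```

Note that 5 rows give only 5 - N + 1 groups, which is one way to see why a longer Trend needs more data.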
To change the N-trend (see Figure 4):
- Click "Data/Link";
- Go to "Enter trend" and enter your number.
For comparable prediction confidence numbers, the longer the Trend,
the more accurate the prediction will be.
You have to balance the two sides of this parameter. On one hand, a
10-trend provides a more accurate prediction than a 5-trend. On the
other hand, a 10-trend prediction requires a much larger volume of data.
One way to increase N is to use preprocessed data: using moving averages
can increase N significantly.
4.2 Precision Level
The Precision level determines the error of each variable: the higher
the Precision, the lower the errors. The Precision level is another
user-selected parameter of the PredictorPro. If you do not make a choice,
the default value is 10.
This parameter directly changes the errors in the output file. To
change the Precision level (see Figure 4):
- Click "Data/Link";
- Go to "Enter precision level" and enter your number.
You have to balance the two sides of this parameter. On one hand, a
10-level prediction provides more precision than a 5-level prediction.
On the other hand, a 10-level prediction requires a much larger volume
of data. One way to increase the Precision level is to use preprocessed
data: using moving averages can increase the Precision level significantly.
4.3 Trend, Precision and Confidence
In general, the longer the Trend, the more accurate the prediction;
the higher the Precision, the lower the error(s). Of these two factors,
the Trend is more important than the Precision. If possible, always go
for the maximum Trend.
Here is how to set the Trend: as you increase the Trend from low to
high, the confidence (rating) initially increases. For most problems,
if you keep increasing the Trend, the confidence begins to drop at a
certain point. Establish an acceptable confidence level in advance;
it tells you when to stop.
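The manual procedure above can be sketched as a loop. In this minimal Python sketch, `confidence_for` is a hypothetical stand-in for running one prediction (for example "Real/+ Exponential") at a given Trend and reading off the confidence number, returning `None` when the PredictorPro produces no results. The PredictorPro itself is operated through its menus; no programmatic interface is implied.

```python
from typing import Callable, Optional


def choose_trend(
    confidence_for: Callable[[int], Optional[float]],
    min_confidence: float,
    start: int = 2,
    stop: int = 50,
) -> Optional[int]:
    """Step the Trend upward and return the last value whose confidence
    met min_confidence before it dropped below it (or gave no result)."""
    best = None
    for trend in range(start, stop + 1):
        conf = confidence_for(trend)  # None stands for "no results"
        if conf is not None and conf >= min_confidence:
            best = trend              # still acceptable: keep going
        elif best is not None:
            break                     # confidence fell after peaking: stop
    return best
```

With a toy stand-in that, like the Intel 2 example below, gives no result at Trend 10, `choose_trend(lambda t: None if t >= 10 else 1000.0 * t, 500)` returns 9. The numbers in that lambda are made up for illustration only.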
Example "Example/Intel 2"
Click "Example/Intel 2" to generate the example, which predicts
the 5-month moving average of Intel for October 1998. Set Trend = 10 and
click "Real/+ Exponential". The PredictorPro produces no results.
Oops, Trend = 10 is too high. Set the Trend to 9 and click "Real/+ Exponential"
again to get:
Possibility    Confidence*Probability
77.8113        16384
---------------------------------------------------
Weighted Average
77.8113
Highest Probability
77.8113 16384
Error of each number
0.859271
Therefore, the maximum Trend is 9.
In general, the higher the Trend, the lower the Precision level that
can be used.
4.4 Roles of the Four Parameters
The four parameters have the following impact on the prediction:
- Data volume: one year's data will produce more accurate results than one month's data;
- Number of variables in the data set: more variables require more data;
- Trend: a longer Trend requires more data than a shorter one;
- Precision level: a higher-precision prediction requires more data than a lower one.
If you run the PredictorPro and get too much information,
you can do one or more of the following:
- Reduce the data volume;
- Increase the number of variables in the data set;
- Increase the Trend;
- Increase the Precision level.
If you run the PredictorPro and get no results, you can do one
or more of the following:
- Preprocess the data;
- Increase the data volume;
- Reduce the number of variables in the data set;
- Reduce the Trend;
- Reduce the Precision level.
Data preprocessing can improve the results significantly.
Data usually contain noise fluctuations, which can be removed by a simple
procedure such as taking moving averages. If the noise is left in the
data, the PredictorPro will assume it is meant to be learned and will
treat it as well-defined patterns in the data. As a result, the Trend
will be shorter and the Precision lower. As we will see, preprocessing
(such as the moving average) takes only two steps in Excel and a few
seconds to accomplish.
4.5 Directory Input
The directory input lets you make predictions with all files in a single
directory. There are two restrictions:
- All input files in the directory must be "*.txt" files and must follow the Predictor formats; all other files in the directory are ignored;
- The maximum number of files is 2,000.
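The two restrictions can be mirrored in a small pre-check before linking a directory. This Python sketch is an illustration under stated assumptions, not the Predictor's own logic: it matches the extension case-insensitively (as Windows does), returns names in sorted order, and does not check the Predictor data format, which is not described here. `linkable_files` is a hypothetical name.

```python
import os
from typing import List

MAX_FILES = 2000  # directory-input limit stated above


def linkable_files(directory: str) -> List[str]:
    """Names of the *.txt files in `directory`; all other files are skipped."""
    names = sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(".txt")          # assumption: case-insensitive match
        and os.path.isfile(os.path.join(directory, name))
    )
    if len(names) > MAX_FILES:
        raise ValueError(f"{len(names)} .txt files found; the limit is {MAX_FILES}")
    return names
```

Run from the home directory, `linkable_files("dj30")` would list the candidate files for the Dow 5 example described below.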
The operation of the Predictor is "link and run":
To link:
- Open any file in the input directory; this leads the Predictor to the desired directory;
- Click the link-button (4th button) to open the input dialog box (see Figure 4), check the Directory option, and click OK.
To run:
- Click a command under the Real, Integer, Avg-Max, 3-Average or 10-Average menus.
Note:
After linking, if you have many files, please wait a few seconds while
the Predictor checks the format of each file in the directory. The
Predictor will then tell you how many files are linked.
The output file will list:
Input file; and
Prediction results.
There are two examples for directory input:
- Attrasoft Dow 5;
- Georgia Lottery Cash 3.
Attrasoft Dow 5 is located in the "dj30\" directory and Georgia Cash 3 is
located in the "cahs3\" directory. Make sure you are in the home directory
for these two examples.
Example. Attrasoft Dow 5:
To link:
- Open "dj30\aa.txt"; this leads the PredictorPro to the "dj30\" directory;
- Click the link-button (4th button) to open the input dialog box;
- Check the Directory option, and click OK.
To run:
- Click the command "10-Average/+ Real Exponential".
The results are:
********************************************
Input file: axp.txt
54.5826 512
56.5172 136
58.1804 97
59.8207 5870
61.5539 11192
63.6219 14193
65.5341 24433
67.2738 22385
68.9473 22385
71.2637 14193
********************************************
Input file: ba.txt
41.7274 2464
41.4983 1552
41.3772 2224
41.3755 2928
41.1061 17024
40.8174 3584
40.5307 3584
40.246 3584
39.9633 3584
39.6826 3584
Some files have no predictions.
4.6 Batch Input
The batch input makes predictions using a batch file. The batch file is
always "batch.txt" in the current directory; this file name cannot be
changed, but the directory can. Every batch run uses this file, and
fixing its name keeps batch operation simple.
The operation of the Predictor is "link and run". There are two topics
here:
Batch run; and
Batch file format.
4.6.1 Batch Run
The operation of the Predictor is "link and run":
Link:
- Click the "Batch/Open" command to open the batch file;
- Clear the old contents, then paste your saved batch file into this file;
- Click the save-button (3rd button).
Run: click the "Batch/Run" command.
4.6.2 Batch File Format
A typical batch file looks like this:
Example 1, a batch file for "File input":
*
*
example2b.txt
10
5
10
1
example2a.txt
Example 2, a batch file for "Directory input":
*
*
example2b.txt
10
5
10
2
a.txt
example2a.txt
The batch file format is:
*
comments
*
Output File Name
Precision
Trend
Batch Run Code
Number of Files
file1.txt
file2.txt
...
fileN.txt
The batch file has two sections:
Comment section; and
Data section.
The comment section starts with a "*" and ends with another "*".
The purpose of this section is to document your data:
- Give a description of the data;
- Specify the data set used;
- Specify the beginning and ending dates of the data;
- . . .
In the data section, the first 3 variables are:
- Output File Name;
- Precision;
- Trend.
These 3 variables set the 3 corresponding fields in the input
dialog box in Figure 4.
The 4th variable is the Batch Run Code, which defines the command to be
used. The first command, "Real/+ Linear", has code = 1; the next command
has code = 2; and so on. We will explain the code in the next section.
After the batch run code, the 5th variable, N, is the number of files,
which is followed by N files:
N
file1.txt
file2.txt
...
fileN.txt
N tells the Predictor how many files will be used. For file input,
N = 1; for directory input, N > 1 and N <= 2,000.
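Because the layout is fixed, batch files are easy to generate or check outside the Predictor. The Python sketch below reads and writes the format described above. It assumes one value per line, a comment section delimited by lines containing only "*", and that blank lines can be ignored; these details are our reading of the examples, not a documented specification. `BatchJob`, `parse_batch` and `format_batch` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BatchJob:
    output_file: str
    precision: int
    trend: int
    run_code: int
    input_files: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)


def parse_batch(text: str) -> BatchJob:
    """Parse batch-file text: '*'-delimited comments, then one value per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "*":
        raise ValueError("a batch file must start with '*'")
    end = lines.index("*", 1)              # closing '*' of the comment section
    comments, data = lines[1:end], lines[end + 1:]
    n = int(data[4])                       # 5th data value: number of files
    files = data[5:5 + n]
    if len(files) != n:
        raise ValueError(f"expected {n} file names, found {len(files)}")
    return BatchJob(data[0], int(data[1]), int(data[2]), int(data[3]), files, comments)


def format_batch(job: BatchJob) -> str:
    """Write a BatchJob back out in the same one-value-per-line layout."""
    out = ["*", *job.comments, "*", job.output_file,
           str(job.precision), str(job.trend), str(job.run_code),
           str(len(job.input_files)), *job.input_files]
    return "\n".join(out) + "\n"
```

Applied to Example 2, `parse_batch` gives Precision 10, Trend 5, run code 10 and the files a.txt and example2a.txt. Writing `format_batch(job)` to "batch.txt" in the current directory would produce a file in the same layout.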
One quick way to generate a batch file is to use the "Batch/Save" command.
This command records the last action, including:
Output File Name
Precision
Trend
Command used
Input files
When creating a batch file with the "Batch/Save" command, it is
important to remember a few things:
The only batch file used by the Predictor is "batch.txt" in the current
directory, so if you click "Batch/Save" again, the newly saved batch file
will overwrite the old one. In general, this does not matter because a
batch file can be generated quickly.
Even if you have 1,000 files in a directory and repeating the last action
would take a long time, you can still get a batch file quickly. Use "File
Input" and click a command; this records the command code from a run on a
single file. Then change "File Input" to "Directory Input" and click
"Batch/Save"; you will get the long batch file for all 1,000 files.
If you do not want the batch file to be overwritten:
- Click "Batch/Save" to create a batch file; and
- Click "File/Save As" to save it under a different name.
The Predictor always uses the "batch.txt" file in the current directory,
so if you change the directory, a different batch file will be used.
Another point worth remembering: when using the batch input, a file
can be specified in two ways:
- Relative path;
- Absolute path.
For example, let a file be "c:\abc\def\ghi.txt"; then
The relative path is "ghi.txt"; and
The absolute path is "c:\abc\def\ghi.txt".
If the relative path, "ghi.txt", is used to pick up this file, the Predictor
must be in the directory "c:\abc\def\". Otherwise, the Predictor will
not be able to find the file, because it looks for the file starting
from the current directory. If the absolute path, "c:\abc\def\ghi.txt",
is used, it does not matter where the Predictor is.
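The lookup rule is the ordinary Windows one. The Python sketch below uses the standard `ntpath` module, which applies Windows path rules on any operating system, to show how each form of the example path is resolved; `resolve` is a hypothetical helper, not part of the Predictor.

```python
import ntpath  # Windows path rules, available on every platform


def resolve(path: str, current_dir: str) -> str:
    """Absolute paths are used as-is; relative ones are looked up from current_dir."""
    if ntpath.isabs(path):
        return path
    return ntpath.join(current_dir, path)


print(resolve("ghi.txt", "c:\\abc\\def"))             # c:\abc\def\ghi.txt
print(resolve("ghi.txt", "d:\\other"))                # d:\other\ghi.txt (wrong file)
print(resolve("c:\\abc\\def\\ghi.txt", "d:\\other"))  # c:\abc\def\ghi.txt
```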
For file input, the "Batch/Save" command uses relative paths.
For directory input, the "Batch/Save" command uses absolute paths the
first time it is clicked and relative paths if it is clicked again
immediately afterwards. By default, therefore, all paths are absolute;
if you want relative paths, click "Batch/Save" twice consecutively.
4.6.3 Batch Run Code
For simple prediction, the code is 1, 2, 3, ..., 27, starting from the
first command:
Real
  + Linear                    1
  - Linear                    2
  0 Linear                    3
  + Exponential               4
  - Exponential               5
  0 Exponential               6
  + Linear Enumerative        7
  - Linear Enumerative        8
  0 Linear Enumerative        9
Integer
  + Linear                   10
  - Linear                   11
  0 Linear                   12
  + Linear Enumerative       13
  - Linear Enumerative       14
  0 Linear Enumerative       15
AvgMax
  + Real Linear              16
  - Real Linear              17
  0 Real Linear              18
  + Real Exponential         19
  - Real Exponential         20
  0 Real Exponential         21
  + Integer Linear           22
  - Integer Linear           23
  0 Integer Linear           24
  + Integer Enumerative      25
  - Integer Enumerative      26
  0 Integer Enumerative      27
The command codes for simple prediction run from 1 to 27. For multiple-step
predictions, the right two digits are the same as the code of the
corresponding command in the Real or Integer menu. For example, the right
two digits of "3-Average/+ Real Linear" are 01, the code of "Real/+ Linear".
This is true for all other commands. The left digits indicate the number
of steps: for the "3-Average" menu, the left two digits are 03, or 3; for
the "10-Average" menu, they are 10. The command codes are:
3-Average
  + Real Linear             301
  - Real Linear             302
  0 Real Linear             303
  + Real Exponential        304
  - Real Exponential        305
  0 Real Exponential        306
  + Integer Linear          310
  - Integer Linear          311
  0 Integer Linear          312
10-Average
  + Real Linear            1001
  - Real Linear            1002
  0 Real Linear            1003
  + Real Exponential       1004
  - Real Exponential       1005
  0 Real Exponential       1006
  + Integer Linear         1010
  - Integer Linear         1011
  0 Integer Linear         1012
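The numbering rule can be captured in a small lookup table. This Python sketch rebuilds codes 1 to 27 from the menu order listed above and derives each multiple-step code as number of steps × 100 plus the single-step code; that formula is our reading of the listed codes (it reproduces every entry in both tables), not a documented rule. All names are hypothetical, and only the combinations that appear in the 3-Average and 10-Average menus are meaningful.

```python
from typing import Dict, List, Tuple

# Single-step commands in menu order; codes 1-27 are assigned consecutively.
MENUS: Dict[str, List[str]] = {
    "Real": ["+ Linear", "- Linear", "0 Linear",
             "+ Exponential", "- Exponential", "0 Exponential",
             "+ Linear Enumerative", "- Linear Enumerative", "0 Linear Enumerative"],
    "Integer": ["+ Linear", "- Linear", "0 Linear",
                "+ Linear Enumerative", "- Linear Enumerative", "0 Linear Enumerative"],
    "AvgMax": ["+ Real Linear", "- Real Linear", "0 Real Linear",
               "+ Real Exponential", "- Real Exponential", "0 Real Exponential",
               "+ Integer Linear", "- Integer Linear", "0 Integer Linear",
               "+ Integer Enumerative", "- Integer Enumerative", "0 Integer Enumerative"],
}

SIMPLE_CODES: Dict[Tuple[str, str], int] = {}
for menu, commands in MENUS.items():   # dicts keep insertion order (Python 3.7+)
    for command in commands:
        SIMPLE_CODES[(menu, command)] = len(SIMPLE_CODES) + 1


def multi_step_code(steps: int, command: str) -> int:
    """Code for e.g. '3-Average/+ Real Linear': steps * 100 plus the code of
    the matching single-step command ('Real/+ Linear' is 1, giving 301)."""
    sign, kind, method = command.split()          # e.g. "+", "Real", "Linear"
    return steps * 100 + SIMPLE_CODES[(kind, f"{sign} {method}")]


def split_code(code: int) -> Tuple[int, int]:
    """Return (number of steps, single-step code); codes 1-27 are one step."""
    steps, base = divmod(code, 100)
    return (steps or 1, base)


print(multi_step_code(3, "+ Real Linear"), multi_step_code(10, "0 Integer Linear"))  # 301 1012
```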
There is no need to remember these codes. If you need a particular code,
use the file input and click the command you need; this lets the Predictor
know which command you want. Then go to the proper directory, choose
either directory or file input, and generate the batch code with
"Batch/Save".
There are two examples for batch files:
- Attrasoft Dow 5;
- Georgia Lottery Cash 3.
For the "Dow 5" example:
- Make sure you are in the home directory;
- Click "Example/Attrasoft DJ 30, Batch";
- Click "Batch/Run".
For the Georgia Cash 3 example:
- Make sure you are in the home directory;
- Click "Example/Cash 3 Dir, Batch";
- Click "Batch/Run".
4.7 Data selection
Data preparation is the most important factor in a prediction or
forecast. Data preparation means two things:
- the data set; and
- the number of rows.
Your data preparation directly influences the results.
This section shows you how to prepare data.
The better you choose variables, the better the potential for
the prediction. This is the single most productive area to work on. You
can increase the information content of your data in several ways:
Adding variables: the first approach is to add variables to the data
set. If a new variable captures some aspect of the problem not already
covered by the other variables in the data set, the information content
increases. On one hand, you should keep adding variables until you think
all aspects of the problem are covered; otherwise, the data set will not
be complete. On the other hand, each additional variable increases the
complexity of the problem exponentially, and more data is required to
compensate. If your data is limited, your ability to make a prediction
is limited.
Data Preprocessing: This approach transforms the existing data into
another form. Typically, this is just a mathematical manipulation of the
data. A moving average, for example, is a mathematical manipulation of
the underlying price variable. This approach may allow us to extract
information more effectively or more efficiently. For stock prediction,
we recommend always using the moving averages of the underlying stock
price.
Removing Unnecessary Variables: This approach removes data that does not
increase the information content. Overloading the PredictorPro with
useless or redundant information increases the complexity of the
computation exponentially and requires more data. If you do not have
enough "rows" for the additional variables, the quality of the prediction
can suffer. Averaging the Intel and Microsoft stock prices, for example,
is a mathematical manipulation that covers both factors with only one
number.
Increase Data Volume: This approach increases the data volume. Two years
of data, for example, is better than one year of data if the conditions
in the two years are basically the same. In reality, there is a limit to
the data volume. For the Attrasoft PredictorPro, always go for the
maximum amount of available data: the cost of more data in computation
time is almost zero. Data volume is not a limiting factor in the Attrasoft
PredictorPro, because the software is designed for large volumes of data.
Using similar data: Sometimes you may have no data, or too little. In
this case, consider using similar data. Section 4.11 presents an example
in which, lacking enough Intel data, we still predict Intel stock by
substituting Microsoft data, because Microsoft stock is similar to Intel
stock. As you will see, the PredictorPro is still able to make a direct hit.
Once again, data set selection is your job. This is the single
most productive area to work in, regardless of the technology employed
for your data-mining effort. Your largest contribution to your prediction
is right here.
4.8 Using Excel as Your Text Editor
Almost any word processor can be used as the text file editor (Microsoft
Word, WordPerfect, Windows Notepad, ...), as long as you save the files
in text format. We recommend using Microsoft Excel as the text editor.
First of all,
- Excel opens data in text files directly; and
- Excel can save files in text format directly.
Secondly, Excel gives you the power to manipulate the data easily.
The Excel functions you might use repeatedly are:
- Opening text data in any format (numbers separated by a space, tab, comma, ...);
- Moving average;
- Sorting;
- Selecting a column or several columns from a data file;
- . . .
4.9 Data Preprocessing Using Excel
Data preprocessing transforms the existing data into another form. Typically,
this is just a mathematical manipulation of the data. A moving average,
for example, is a mathematical manipulation of the underlying price variable.
This approach may allow us to extract information more effectively or more
efficiently.
4.9.1 Moving-Average
When dealing with data that contain noise fluctuations, as in stock market
prediction, always use the moving averages of the underlying variables.
Noise fluctuations can be removed from the data by a simple procedure
such as the moving average. However, if the noise is left in the data,
the PredictorPro will assume it is meant to be learned and will treat it
as well-defined patterns in the data. As a result, it will shorten the
Trend and reduce the Precision of a prediction.
It is easy to convert the original data to moving-average data. There
are only two steps:
- Compute the first moving average;
- Use "Edit/Fill/Down" to fill in the rest of the moving averages.
Now we will show these two steps in detail. Assume we want to calculate
the 5-month moving average of the SP 500. Figure 10 shows the data.
Figure 10. SP 500 data in Excel.
The data is as follows:
      A         B         C
3     DATE      CLOSE
4     Nov-91    375.22
5     Dec-91    417.09
6     Jan-92    408.78
7     Feb-92    412.7
8     Mar-92    403.69
9     Apr-92    414.95
10    May-92    415.35
. . .
Step 1: In cell C8, enter "=SUM(B4:B8)/5".
Here "SUM(B4:B8)" means B4 + B5 + B6 + B7 + B8 in Excel. Please
see Figure 11.
Figure 11. First moving average of the SP 500 in C8-cell.
Press Enter and the first moving average is calculated. In this case,
the first moving average is 403.496 (see Figure 12).
Figure 12. First moving average of the SP 500 is in C8-cell.
Step 2: Highlight cells C8, C9, C10, ... down to the last row of data
(see Figure 12), then click "Edit/Fill/Down".
Now the moving average computation is completed. The result looks like
Figure 13.
Figure 13. The rest of the moving averages of the SP 500 in C-column.
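The two Excel steps compute a trailing 5-month average that starts in row 8. As a cross-check, the Python sketch below does the same computation on the SP 500 closes listed above; `moving_average` is a hypothetical helper, and each result belongs on the row of the last of its five months (the first on row 8, Mar-92).

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing moving average: result k averages values[k : k + window],
    like =SUM(B4:B8)/5 in C8 filled down column C."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


sp500_close = [375.22, 417.09, 408.78, 412.7, 403.69, 414.95, 415.35]  # Nov-91 .. May-92
print([round(x, 3) for x in moving_average(sp500_close)])  # [403.496, 411.442, 411.094]
```

The first value matches the 403.496 computed in cell C8.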
4.10 Example of Preprocessing
In this section, we compare the results of using the original data
with the results of using the moving-average data (preprocessed data).
Two examples are used. Both are included with the PredictorPro and
can be generated by clicking "Example/Intel 1" and "Example/Intel 2".
4.10.1 Using Original Data
1. Click "Example/Intel 1" and the original data will appear:
5.125
6.125
7.484
7.938
6.906
...
71.438
74.125
84.438
71.188
85.75
The target (for Intel on Oct-98) is 89.188.
2. Click "Real/-- Exponential" and the following results will
appear:
Possibility    Confidence*Probability
73.138         128
95.1938        256
---------------------------------------------------
Weighted Average
87.8418
Highest Probability
95.1938 256
Error of each number
2.20558
Using the weighted average, the answer is:
87.8418 ± 2.20558 = 87.8 ± 2.2.
This prediction does hit 89.188; however, the confidence number,
256, is extremely low, indicating that the PredictorPro does not have
much confidence in the data at all. If we use 500 as the minimum
acceptable confidence level, the above prediction would not be used.
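The Weighted Average in these listings is consistent with the mean of the possibilities weighted by their Confidence*Probability numbers: (73.138 × 128 + 95.1938 × 256) / 384 ≈ 87.842. This is our reading of the published numbers, not a documented formula; the same calculation also reproduces the 77.476 of the moving-average run in the next section. The Python sketch below applies it together with the 500 minimum confidence used above; `summarize` and `acceptable` are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple[float, float]  # (Possibility, Confidence*Probability)


def summarize(rows: List[Row]) -> Tuple[float, Row]:
    """Return the weighted average of the possibilities and the highest-probability row."""
    total = sum(weight for _, weight in rows)
    weighted_average = sum(value * weight for value, weight in rows) / total
    highest = max(rows, key=lambda row: row[1])
    return weighted_average, highest


def acceptable(rows: List[Row], min_confidence: float = 500) -> bool:
    """True if the best confidence number reaches the chosen minimum."""
    return max(weight for _, weight in rows) >= min_confidence


# Output of "Real/-- Exponential" on the original Intel data (Section 4.10.1).
original = [(73.138, 128), (95.1938, 256)]
avg, best = summarize(original)
print(f"{avg:.3f}", best, acceptable(original))  # 87.842 (95.1938, 256) False
```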
4.10.2 Using the 5-Month Moving Average
1. Click "Example/Intel 2" and the 5-month moving average data will
appear:
6.7156
7.0282
7.0532
6.9814
6.8876
...
80.2004
78.8254
77.7754
76.4004
77.3878
The target (for the 5-month moving average of Intel on Oct-98) is
80.9378.
2. Click "Real/-- Exponential" and the following results will
appear:
Possibility Confidence*Probability
81.2484 6400
77.8113 10344
76.0928 4235
84.6855 273
88.1226 1
86.404 33
79.5299 168
74.3742 8320
82.9669 261
---------------------------------------------------
Weighted Average
77.476
Highest Probability
77.8113 10344
Error of each number
0.859271
The confidence number in this example is 10,344 instead of the 256
of the last example, which indicates that the PredictorPro is far more
confident in this case. The prediction, 77.476 ± 0.859271, is close to
80.9378.
4.11 Example of Poor Data
Again we want to predict the 5-month moving average of Intel for
Oct-98, which is 80.9378.
Assume the only data we have for Intel is:
DATE     HIGH      LOW       CLOSE     VOLUME       5m Avg
Jan-98   82.875    70.5      81        339559700
Feb-98   95.625    82.188    89.688    333269000
Mar-98   91.375    74        78.063    425380400
Apr-98   85.063    71.313    80.813    386529700
May-98   85.438    71.25     71.438    271927700    80.2004
Jun-98   77.625    65.656    74.125    360081300    78.8254
Jul-98   88.25     72.25     84.438    396391200    77.7754
Aug-98   92.625    70.938    71.188    440267900    76.4004
Sep-98   89        69.75     85.75     369103300    77.3878
========================================
Oct-98   90.813    75.813    89.188    393720800    80.9378
This is not enough data for the prediction. To get around this problem,
note that:
(1) Microsoft and Intel are similar; and
(2) We do have Microsoft data, as listed below:
Monthly prices (Nov 1991 to Nov 1998)
DATE     HIGH      LOW       CLOSE     VOLUME       5m Avg
Nov-91   8.479     7.563     8.104     221809200
Dec-91   9.333     8         9.271     202192800
Jan-92   11.104    9.125     10.021    367232400
Feb-92   10.75     9.417     10.292    348507600
...
May-98   91        81.875    84.813    273845600    84.7564
Jun-98   108.563   83.125    108.375   308288100    91.5126
Jul-98   119.625   105.375   109.938   307324500    96.5502
Aug-98   113.75    95.75     95.938    337509900    97.8378
Sep-98   114.625   94.5      110.063   298081100    101.8254
=========================================
Oct-98   110.125   87.75     105.875   410762300    106.0378
So here is the solution. We will use the 5-month moving average of the
Microsoft data to predict Intel. Here is the data:
9.5126
9.7294
9.8918
9.6376
9.398
...
84.7564
91.5126
96.5502
97.8378
101.8254
80.2004
78.8254
77.7754
76.4004
77.3878
The first part of the data is the Microsoft data, the last 5 numbers
are the Intel data. Click "Real/+ Exponential", and we have
Possibility Confidence*Probability
79.0219 26752
76.242 2048
---------------------------------------------------
Weighted Average
78.8242
Highest Probability
79.0219 26752
Error of each number
1.38994
The most likely outcome is 79.0219 ± 1.39. This is a direct
hit on the target of 80.9378.
Click "Real/-- Exponential", and we get similar results:
Weighted Average
79.6091
Highest Probability
79.0219 34576
Error of each number
1.38994
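The combined input used in this example, Microsoft's 5-month moving averages followed by the last five Intel averages, can be assembled programmatically. In this Python sketch the Intel averages are recomputed from the monthly closes in the Intel table, which reproduces its 5m Avg column. Because the Microsoft series is only partly shown, `msft_ma` holds just its last five values here and must be replaced by the full series in practice. The function names are hypothetical.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing average over each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def build_input(msft_ma: Sequence[float], intel_ma: Sequence[float]) -> str:
    """One number per line: the Microsoft series first, the Intel values last."""
    return "".join(f"{round(x, 4)}\n" for x in [*msft_ma, *intel_ma])


# Intel monthly closes, Jan-98 .. Sep-98, from the Intel table.
intel_close = [81, 89.688, 78.063, 80.813, 71.438, 74.125, 84.438, 71.188, 85.75]
intel_ma = moving_average(intel_close)      # May-98 .. Sep-98
print([round(x, 4) for x in intel_ma])      # [80.2004, 78.8254, 77.7754, 76.4004, 77.3878]

# Only the last five Microsoft averages (May-98 .. Sep-98) are shown above;
# in practice this list holds the full series, starting from 9.5126.
msft_ma = [84.7564, 91.5126, 96.5502, 97.8378, 101.8254]
print(build_input(msft_ma, intel_ma), end="")
```

Saving the string returned by `build_input` as a text file gives the one-number-per-line layout shown above.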
4.12 The Software Limits
The standard version of the PredictorPro has 100,000 neurons
and a certain number of internal neurons. In your prediction, if you set
- M precision levels;
- K variables in the data set; and
- N Trend;
then the PredictorPro uses N*M*K neurons. Therefore, the
limit is:
N*M*K <= 100,000 neurons
This number can go as high as 1,000,000 in a customized version.
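The limit is simple arithmetic to check before a run. This minimal Python sketch uses hypothetical helper names and only encodes the N*M*K bound stated above; a run can still fail for lack of data well below this bound (Trend = 10 already produced no results in Section 4.3).

```python
STANDARD_LIMIT = 100_000     # neurons in the standard version
CUSTOM_LIMIT = 1_000_000     # upper end quoted for a customized version


def neurons_needed(trend: int, precision: int, variables: int) -> int:
    """N * M * K for Trend N, Precision level M and K variables."""
    return trend * precision * variables


def max_trend(precision: int, variables: int, limit: int = STANDARD_LIMIT) -> int:
    """Largest Trend N with N * M * K <= limit."""
    return limit // (precision * variables)


print(neurons_needed(5, 10, 1))   # defaults (Trend 5, Precision 10), one variable: 50
print(max_trend(10, 5))           # Precision 10 with 5 variables: 2000
```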
4.13 Customized Software
Customized software can be ordered from Attrasoft upon request
for the following reasons:
- Larger External Networks: The current version supports up to 100,000 external neurons. If you want a larger one, it has to be customized. We produce neural networks of up to 1,000,000 neurons.
- Larger Internal Networks: The current version supports up to 100,000 internal neurons. If you want a larger one, it has to be customized.
- Special Data Interface: If you need to interface the PredictorPro with your application data (for example, missing data, string data, ...), customized software can be ordered.
To get updated information:
http://attrasoft.com
Online Order/Online Fax order:
Attrasoft PredictorPro for Windows
Version 2.8 ($494.99 + $5 US Shipping and Handling)
Mail Order: $499.99 (S&H included)
PredictorPro
Attrasoft, Attn.: Gina
P. O. Box 13051
Savannah, GA. 31406, USA
Send questions or comments to:
webmaster@attrasoft.com
or contact us at:
Attrasoft, P. O. Box 13051, Savannah, GA. 31406, USA
Copyright © 2001 Attrasoft, Inc. All rights reserved.