5. Predicting the Stock Market

5.1 Basic Concepts

5.2 Data Set

5.3 Data Collection From the Internet

5.4 Data Preprocessing

5.4.1 Sorting
5.4.2 Moving Average
5.4.3 Predictor Input File
5.5 Data Set Selection

5.6 Commands

5.7 A Simple Prediction: Intel 2

5.8 Prediction with Preprocessing: Intel 4

5.9 Prediction Without Much Data

5.10 The Software Parameters
 


Predicting the Stock Market

For the sake of discussion, we will predict Intel stock one month ahead. You can substitute any other stock or market index for Intel. To predict Intel one month ahead, monthly data is required. You can change the prediction to one day, one week, or one year ahead by substituting daily, weekly, or yearly data, respectively.

The Attrasoft Predictor predicts events based on past history.

Before you play real games with real dollars, you should run simulations with past data, or with future data (pretending to play, so that your learning experience carries no economic cost). Do not jump into predicting the stock market without any experience. A reasonable simulation is to set up a model and make 100 to 1,000 predictions. If the results are not satisfactory, change your data set.


Basic Concepts

A data set is a collection of variables used to make a prediction. Examples of the variables are various stocks and indexes, bond rates, interest rates, economic indexes, and any other indexes you like; please see chapter 1 for details. Choosing a data set is your job. This is the most important factor in stock prediction.

The Predictor predicts the upper and lower bounds. In general, you should use both the upper-bound and the lower-bound commands and pick a number between the two bounds.

The output of the Predictor is a distribution, similar to the output of an Internet search engine: you get an outcome and a ranking number for it.


Data Set

A data set is a collection of variables used to make a prediction. Three examples will be given below. These examples must be understood as illustrations of how to use the Predictor. They are not intended to show you how to choose the data set. It is your job to choose a data set. This is where your expertise comes in.

Example 1 To predict the Intel stock price 1 month ahead, a data set is:

Example 2 To predict the Intel stock price 1 month ahead, a data set is:

Example 3 To predict the Intel stock price 1 month ahead, a data set is:

Data Collection From the Internet

It is important to avoid typing data by hand during data collection. Many sites on the Internet offer free data. In the following, we will show you how to download data from the "Microsoft Investor" section of the Microsoft Network.

Microsoft Excel is required for the following. Assume we are trying to collect data about Intel Corp. (intc) for the last 7 years. Here is the procedure:

First Time User:

  1. Go to

     http://investor.msn.com/quotes/quotes.asp?beenthere=1&Page=Research&Item=QuoteDetail

     click "Enter ..." and enter "intc";

  2. On the page, click "Charts";

  3. Click "Start Download", and when prompted to install Microsoft software, click "yes";

  4. You should now see a test chart; click "I see the chart";

  5. Click "Period/10 Years" or another selection you want;

  6. To download, click "File/Export Data". At this point, Excel will open with the Intel data in it. Save the data and the job is done.


After the First Time:

  1. Go to

     http://investor.msn.com/charts/charting.asp?Symbol=intc

     and on the page enter "intc";

  2. Click "Period/10 Years" or another selection you want;

  3. To download, click "File/Export Data". At this point, Excel will open with the Intel data in it. Save the data and the job is done.


Now, you have obtained the Intel stock data for the last 7 years, and it looks like this:

Intel Corporation (INTC)

Monthly prices (Dec 1991 to Nov 1998)

DATE HIGH LOW CLOSE VOLUME

Oct-98 90.813 75.813 89.188 393720800

Sep-98 89 69.75 85.75 369103300

Aug-98 92.625 70.938 71.188 440267900

Jul-98 88.25 72.25 84.438 396391200

Jun-98 77.625 65.656 74.125 360081300

May-98 85.438 71.25 71.438 271927700

Apr-98 85.063 71.313 80.813 386529700

Mar-98 91.375 74 78.063 425380400

Feb-98 95.625 82.188 89.688 333269000

Jan-98 82.875 70.5 81 339559700

Dec-97 81.75 67.375 70.25 353535700

Nov-97 81.75 72.016 77.625 318099500

Oct-97 95.625 69.25 77 496699800

Sep-97 99.25 92 92.313 258597900

Aug-97 102 89.875 92.125 352819700

Jul-97 92.125 69.313 91.813 473075200

Jun-97 77.313 70.375 70.906 525439200

May-97 84.875 70 75.75 596148200

Apr-97 76.75 64.125 76.563 618080200

Mar-97 75.125 62.875 69.563 545409600

Feb-97 82.5 68.5 70.938 539579800

Jan-97 82 63.813 81.125 557706400

Dec-96 70.75 61.688 65.469 451772200

Nov-96 63.625 54 63.438 418029200

Oct-96 57.125 47.125 54.938 445910000

Sep-96 49.75 39.25 47.719 322253000

Aug-96 41.625 37.438 39.906 309881200

Jul-96 37.945 32.063 37.563 491226800

Jun-96 38.875 35.125 36.719 278154800

May-96 37.813 33.25 37.75 318156400

Apr-96 35.188 28.188 33.875 413220000

Mar-96 30.125 25.75 28.438 346666000

Feb-96 30.813 27.43 29.406 283989000

Jan-96 29.688 24.906 27.617 598129600

Dec-95 32 27.594 28.375 310609400

Nov-95 36.57 29.938 30.438 408954800

Oct-95 35.688 28.875 34.938 436342200

Sep-95 33.313 28.625 30.063 361194200

Aug-95 33.625 28.563 30.688 493254400

Jul-95 39.188 31.25 32.5 558346000

Jun-95 33.188 27.375 31.656 497274000

May-95 29.75 25.719 28.063 680593200

Apr-95 26.031 21.125 25.594 515628400

Mar-95 22.5 19.594 21.219 488044400

Feb-95 20.188 17.281 19.938 388339600

Jan-95 17.938 15.75 17.344 498236400

Dec-94 16.438 14.375 15.969 531699200

Nov-94 16.816 14.938 15.781 415400000

Oct-94 15.75 14.344 15.531 419888800

Sep-94 17.063 15 15.375 325405600

Aug-94 16.875 14.156 16.438 388840800

Jul-94 15.75 14.031 14.813 321685200

Jun-94 16.25 14.313 14.625 295517600

May-94 15.688 14.125 15.625 318720800

Apr-94 17.75 14 15.25 464237200

Mar-94 18.375 16.25 16.875 415659200

Feb-94 17.313 15.125 17.188 347250000

Jan-94 17.313 15.188 16.313 493524400

Dec-93 16.188 14.063 15.5 372952800

Nov-93 16.125 14.063 15.375 390534800

Oct-93 18.469 15.375 15.813 505204400

Sep-93 18.563 15.063 17.688 320584800

Aug-93 16.813 13.063 16.063 373164000

Jul-93 14.25 12.313 13.063 308065600

Jun-93 15.125 13.219 13.75 357856000

May-93 14.188 11.906 13.859 422287200

Apr-93 15.125 10.688 11.891 947040800

Mar-93 15.156 13.906 14.375 506624000

Feb-93 15 13.031 14.563 601308800

Jan-93 14.75 10.781 13.344 745068800

Dec-92 11.438 8.875 10.875 480952800

Nov-92 9.031 8.313 8.938 261600800

Oct-92 8.641 7.594 8.438 389177600

Sep-92 8.375 7.219 8.219 301148000

Aug-92 7.656 6.781 7.25 197321600

Jul-92 7.531 6.844 7.469 291545600

Jun-92 7.219 5.969 7.125 346980000

May-92 6.844 5.813 6.25 300004000

Apr-92 7.406 6.125 6.688 443011200

Mar-92 8.438 6.438 6.906 424595200

Feb-92 8.594 7.375 7.938 340300800

Jan-92 7.875 5.906 7.484 589994400

Dec-91 6.156 5.031 6.125 260759200

Now we want:


Data Preprocessing

Sorting

Now, without typing a single number, you have obtained the Intel data in an Excel spreadsheet. Before you can use the data, you have to preprocess it in two or three steps, described under the headings Sorting, Moving Average, and Predictor Input File.

The data file downloaded from Microsoft Investor puts the latest data in the first row and the earliest data in the last row. The Predictor cannot use data in this order; it expects the earliest data first and the latest data last. To reverse the order, sort the rows by date in ascending order in Microsoft Excel. The data is then in the correct order for the Predictor.
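If you prefer to reorder the rows outside Excel, the same step can be sketched in a few lines of Python. This is only an illustration, not part of the Predictor: the function name `sort_earliest_first` is hypothetical, the rows are assumed to be whitespace-separated in the DATE HIGH LOW CLOSE VOLUME layout shown above, and the month abbreviations are assumed to be English.

```python
from datetime import datetime

# A few rows in the downloaded order (latest first): DATE HIGH LOW CLOSE VOLUME
downloaded = """\
Oct-98 90.813 75.813 89.188 393720800
Sep-98 89 69.75 85.75 369103300
Aug-98 92.625 70.938 71.188 440267900
"""

def sort_earliest_first(text):
    """Parse whitespace-separated rows and return them sorted by date, earliest first."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between rows
        date, high, low, close, volume = line.split()
        # "%b-%y" reads dates such as "Oct-98" (English month names assumed)
        rows.append((datetime.strptime(date, "%b-%y"),
                     float(high), float(low), float(close), int(volume)))
    rows.sort(key=lambda row: row[0])
    return rows

rows = sort_earliest_first(downloaded)
for date, high, low, close, volume in rows:
    print(date.strftime("%b-%y"), high, low, close, volume)
```

Sorting by the parsed date, rather than simply reversing the lines, also works if the downloaded rows are not in strict order.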
 

Moving Average

To save space, please refer to the last chapter.
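As a quick reference, a trailing moving average can be sketched as follows. This assumes a simple (unweighted) average of the monthly closing prices; the function name `moving_average` is hypothetical. Under that assumption, the five closes from Jun-98 to Oct-98 in the table above give 80.9378, the 5-month moving average quoted for Oct-98 later in this chapter.

```python
def moving_average(values, window):
    """Trailing simple moving average: one output for each full window of inputs."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Intel monthly closing prices, earliest first (Jun-98 to Oct-98)
closes = [74.125, 84.438, 71.188, 85.75, 89.188]

print(round(moving_average(closes, 5)[-1], 4))  # 80.9378
```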

Predictor Input File

The above process produces the Intel data. In general, you might have several variables and you will have to repeat the above process several times, for example:
 

After the data collection, combine all your data into one Excel file and save it in text format. Make sure it has three sections (see chapter 3):
 

Data Set Selection

Data preparation for the prediction is the most important factor in the prediction or forecast. Data preparation means:

Your data preparation directly influences the results. Data set selection is your job. This is the single most productive area to work in. Your largest contribution to your prediction is right here.

The factors that will increase your chance of success are:

To save space, please refer to the last chapter for details.


Commands

Stocks like Intel or Microsoft grow exponentially; therefore, you should use the following commands:
 

Here, "+" and "--" mean the upper bound and the lower bound, respectively. You should use both commands. Assume a prediction is

x = 5.6 + 0.3 - 0.2 ,

i.e. the result is likely to be 5.6, but could be in the range [5.4, 5.9]; then

 
'0' prediction ==> 5.6

'+' prediction ==> 5.9

'--' prediction ==> 5.4.


The '+' and '--' predictions yield results more often than the '0' prediction; therefore, consider these two commands first. The '0' prediction, on the other hand, is more accurate than the other two when it does produce a prediction.
 


A Simple Prediction: Intel 2

Chapter 2 presents a complete example. This example can be generated by clicking "Example/Intel 2".


Prediction with Preprocessing: Intel 4

The data set of this example is:

Clicking "Example/Intel 4 ..." can generate this example.

The goal is to predict the 5-month moving average of Intel on Oct-98, which is 80.9378.

Click "Real/-- Exponential" to get:
 
 

Possibility Confidence*Probability

76.0928 1086.51 107.244 92

79.5299 1101.96 103.787 58

---------------------------------------------------

Weighted Average

77.4218 1092.48 105.907

Highest Probability

76.0928 1086.51 107.244 92

Error of each number

0.859271 3.86403 0.86441
 
 

The confidence is too low. To change the Trend from 5 to 4, click "Data/link" and enter 4 under Trend. Click "Real/-- Exponential" again to get the distribution:

Possibility Confidence*Probability

86.404 1078.78 102.058 262

77.8113 1071.05 105.515 4000

76.0928 1086.51 100.329 364

81.2484 1063.32 105.515 330

82.9669 1063.32 105.515 398

82.9669 1071.05 105.515 670

79.5299 1094.23 110.702 160

76.0928 1086.51 107.244 228

81.2484 1101.96 108.973 160

79.5299 1101.96 103.787 126

77.8113 1086.51 107.244 364

76.0928 1094.23 110.702 330

---------------------------------------------------

Weighted Average

78.9408 1075.29 105.665

Highest Probability

77.8113 1071.05 105.515 4000

Error of each number

0.859271 3.86403 0.86441
 

Prediction confidence varies from month to month. The confidence for another month might be much higher. The prediction is 78.9408 ± 0.859271, which is close to the actual value of 80.9378 (about 2.5% below it).
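The "Weighted Average" line in these listings appears to be the mean of the listed possibilities, weighted by the number in the last column. This is an inference from the printed numbers, not a documented formula of the Predictor, and the function name `weighted_average` is hypothetical. A minimal sketch, using the first column of the two-row Trend 5 distribution above:

```python
def weighted_average(rows):
    """rows is a list of (value, weight) pairs; returns sum(value*weight) / sum(weight)."""
    total_weight = sum(weight for _, weight in rows)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    return sum(value * weight for value, weight in rows) / total_weight

# First column and last column of the two-row distribution (Trend 5)
first_run = [(76.0928, 92), (79.5299, 58)]

print(round(weighted_average(first_run), 4))  # 77.4218
```

The result agrees with the 77.4218 printed under "Weighted Average" for that run.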


Prediction Without Much Data
 

There are times when you want to predict a new stock, "A", for which little data is available. You can substitute data from a similar stock, "B", for the missing data of stock "A".

You have to be careful here: if you use a 5-trend prediction, you must have at least 4 rows of "A" stock data; if you use a 6-trend prediction, you must have at least 5 rows.

To predict Intel, the data set of this example is:

Clicking "Example/Intel 5 ..." will generate this example.

The goal is to predict the 5-month moving average of Intel on Oct-98, which is 80.9378.

Click "Real/-- Exponential" to get:
 

Possibility Confidence*Probability

79.0219 34576

76.242 7936

81.8018 20184

84.5816 380

87.3615 256

62.3426 64

---------------------------------------------------

Weighted Average

79.6091

Highest Probability

79.0219 34576

Error of each number

1.38994
 

Prediction confidence varies from month to month. The confidence for another month might be much higher. The prediction is 79.6091 ± 1.38994, and this range contains the actual value, 80.9378.
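Whether a prediction "hits" the actual value can be checked by testing if the actual value lies within the prediction plus or minus its error. The sketch below applies that test to the two results in this chapter; the function name `within_error` is hypothetical, and treating the printed "Error of each number" as a symmetric error bar is an assumption.

```python
def within_error(prediction, error, actual):
    """True if the actual value lies in [prediction - error, prediction + error]."""
    return abs(actual - prediction) <= error

actual = 80.9378  # 5-month moving average of Intel on Oct-98

print(within_error(78.9408, 0.859271, actual))  # Intel 4: False
print(within_error(79.6091, 1.38994, actual))   # Intel 5: True
```

The Intel 4 prediction misses the actual value by about 2.0, which is more than its stated error, while the Intel 5 prediction misses by about 1.33, which is within its stated error.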


The Software Parameters

The standard version has 10,000 neurons, plus a certain number of internal neurons. If you need a larger version of the software, please contact http://attrasoft.com.

In your prediction, if you set
 

then the Predictor uses N*M*K neurons. Therefore, the limit is:
 
N*M*K <= 10,000 neurons (standard version).
This number can go as high as 1,000,000 in the customized version.
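Before setting up a prediction, you can check a planned configuration against this limit. The sketch below is only an illustration: the function names and the sample values of N, M, and K are hypothetical, and the meaning of each parameter is whatever the Predictor settings define.

```python
STANDARD_LIMIT = 10_000       # neuron limit of the standard version
CUSTOM_LIMIT = 1_000_000      # upper limit quoted for the customized version

def neurons_needed(n, m, k):
    """Number of neurons the Predictor uses for the settings N, M, K."""
    return n * m * k

def fits(n, m, k, limit=STANDARD_LIMIT):
    """True if N*M*K does not exceed the given neuron limit."""
    return neurons_needed(n, m, k) <= limit

# Hypothetical settings, chosen only to show both sides of the limit
print(neurons_needed(20, 20, 25), fits(20, 20, 25))   # 10000 True
print(neurons_needed(20, 20, 26), fits(20, 20, 26))   # 10400 False
print(fits(20, 20, 26, limit=CUSTOM_LIMIT))           # True
```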