2. Information
1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centers
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000's
4. Database
The Housing Database file has 506 instances. The amount of data required for 100% accuracy can be estimated, using 10 classes, as:
10 * 14 * 100 = 14,000 instances.
About 14,000 instances are required for "perfect" predictions, so there is not enough data here to support them. As we will see, in this case DM 2.5 has an overall accuracy rate of 70 - 80%.
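The estimate is plain arithmetic, and a minimal sketch makes the gap between required and available data concrete. The variable names are illustrative only, and the factor of 100 is taken from the text as given rather than derived here.

```python
# Rule-of-thumb estimate from the text: classes * attributes * 100.
# Names are illustrative; the factor of 100 is taken as given.
num_classes = 10
num_attributes = 14
factor = 100

required = num_classes * num_attributes * factor
available = 506  # instances in the full Housing Database file

print(f"required:  {required}")
print(f"available: {available} ({available / required:.1%} of the estimate)")
```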
The last 10 rows are used for the question file, which leaves 496 instances in the Housing Database file. Below are the first 10 instances:
CRIM     ZN     INDUS  CHAS  NOX     RM      AGE     DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV
0.00632  18.00  2.310  0     0.5380  6.5750  65.20   4.0900  1    296.0  15.30    396.90  4.98   24.00
0.02731  0.00   7.070  0     0.4690  6.4210  78.90   4.9671  2    242.0  17.80    396.90  9.14   21.60
0.02729  0.00   7.070  0     0.4690  7.1850  61.10   4.9671  2    242.0  17.80    392.83  4.03   34.70
0.03237  0.00   2.180  0     0.4580  6.9980  45.80   6.0622  3    222.0  18.70    394.63  2.94   33.40
0.06905  0.00   2.180  0     0.4580  7.1470  54.20   6.0622  3    222.0  18.70    396.90  5.33   36.20
0.02985  0.00   2.180  0     0.4580  6.4300  58.70   6.0622  3    222.0  18.70    394.12  5.21   28.70
0.08829  12.50  7.870  0     0.5240  6.0120  66.60   5.5605  5    311.0  15.20    395.60  12.43  22.90
0.14455  12.50  7.870  0     0.5240  6.1720  96.10   5.9505  5    311.0  15.20    396.90  19.15  27.10
0.21124  12.50  7.870  0     0.5240  5.6310  100.00  6.0821  5    311.0  15.20    386.63  29.93  16.50
0.17004  12.50  7.870  0     0.5240  6.0040  85.90   6.5921  5    311.0  15.20    386.71  17.10  18.90
Below are the 10 instances of the question file. The first 13 values of each row form the question, and the last value (MEDV) is the answer:
CRIM     ZN    INDUS   CHAS  NOX     RM      AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT  MEDV (Answer)
0.28960  0.00  9.690   0     0.5850  5.3900  72.90  2.7986  6    391.0  19.20    396.90  21.14  19.70
0.26838  0.00  9.690   0     0.5850  5.7940  70.60  2.8927  6    391.0  19.20    396.90  14.10  18.30
0.23912  0.00  9.690   0     0.5850  6.0190  65.30  2.4091  6    391.0  19.20    396.90  12.92  21.20
0.17783  0.00  9.690   0     0.5850  5.5690  73.50  2.3999  6    391.0  19.20    395.77  15.10  17.50
0.22438  0.00  9.690   0     0.5850  6.0270  79.70  2.4982  6    391.0  19.20    396.90  14.33  16.80
0.06263  0.00  11.930  0     0.5730  6.5930  69.10  2.4786  1    273.0  21.00    391.99  9.67   22.40
0.04527  0.00  11.930  0     0.5730  6.1200  76.70  2.2875  1    273.0  21.00    396.90  9.08   20.60
0.06076  0.00  11.930  0     0.5730  6.9760  91.00  2.1675  1    273.0  21.00    396.90  5.64   23.90
0.10959  0.00  11.930  0     0.5730  6.7940  89.30  2.3889  1    273.0  21.00    393.45  6.48   22.00
0.04741  0.00  11.930  0     0.5730  6.0300  80.80  2.5050  1    273.0  21.00    396.90  7.88   11.90
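The split into a database file and a question file can be sketched in a few lines of Python. This is only an illustration and not how DM 2.5 reads its input: the function names `parse_records` and `split_questions` are hypothetical, and the sketch assumes the data is whitespace-separated with 14 values per record, possibly wrapped across lines.

```python
from typing import List, Tuple

COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def parse_records(text: str, width: int = len(COLUMNS)) -> List[List[float]]:
    """Group whitespace-separated numbers into records of `width` values.

    Line breaks are ignored, so records wrapped over several lines are fine.
    """
    values = [float(token) for token in text.split()]
    if len(values) % width != 0:
        raise ValueError("token count is not a multiple of the record width")
    return [values[i:i + width] for i in range(0, len(values), width)]


def split_questions(
    records: List[List[float]], n_questions: int = 10
) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Hold out the last n rows; return (database, questions, answers)."""
    database = records[:-n_questions]
    held_out = records[-n_questions:]
    questions = [row[:-1] for row in held_out]  # the 13 input attributes
    answers = [row[-1] for row in held_out]     # MEDV
    return database, questions, answers


# The first two instances listed above, wrapped across lines.
sample = """
0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900
1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671
2 242.0 17.80 396.90 9.14 21.60
"""
records = parse_records(sample)
database, questions, answers = split_questions(records, n_questions=1)
print(len(database), len(questions[0]), answers)
```

With the full 506-row file and the default `n_questions=10`, the same call would leave 496 rows in `database`.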
5. Results
Click: "Real/+ Predict":
=================== Beginning =====================
0.2896 0 9.69 0 0.585 5.39 72.9 2.7986 6 391 19.2 396.9 21.14
Possibility Confidence*Probability
22.325 58300
17.375 4000
------------------------------------------------------
22.0072
0.26838 0 9.69 0 0.585 5.794 70.6 2.8927 6 391 19.2 396.9 14.1
Possibility Confidence*Probability
22.325 82000
17.375 5300
------------------------------------------------------
22.0245
0.23912 0 9.69 0 0.585 6.019 65.3 2.4091 6 391 19.2 396.9 12.92
Possibility Confidence*Probability
22.325 162600
37.175 2600
17.375 8000
------------------------------------------------------
22.3193
0.17783 0 9.69 0 0.585 5.569 73.5 2.3999 6 391 19.2 395.77 15.1
Possibility Confidence*Probability
22.325 82000
17.375 5300
------------------------------------------------------
22.0245
0.22438 0 9.69 0 0.585 6.027 79.7 2.4982 6 391 19.2 396.9 14.33
Possibility Confidence*Probability
22.325 147200
37.175 1000
27.275 1300
17.375 10600
------------------------------------------------------
22.1302
0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21 391.99 9.67
Possibility Confidence*Probability
22.325 4000
27.275 5300
------------------------------------------------------
25.146
0.04527 0 11.93 0 0.573 6.12 76.7 2.2875 1 273 21 396.9 9.08
Possibility Confidence*Probability
Can not make a Prediction.
------------------------------------------------------
0.06076 0 11.93 0 0.573 6.976 91 2.1675 1 273 21 396.9 5.64
Possibility Confidence*Probability
Can not make a Prediction.
------------------------------------------------------
0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 393.45 6.48
Possibility Confidence*Probability
22.325 3900
37.175 2600
32.225 4000
27.275 2000
------------------------------------------------------
29.3738
0.04741 0 11.93 0 0.573 6.03 80.8 2.505 1 273 21 396.9 7.88
Possibility Confidence*Probability
22.325 7900
37.175 2600
------------------------------------------------------
26.0021
=================== End ==========================
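The number printed under each dashed line matches the average of the listed possibilities weighted by their Confidence*Probability scores. That relationship is inferred from the output above, not from DM 2.5 documentation, so the following is a sketch under that assumption; `weighted_prediction` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple


def weighted_prediction(candidates: List[Tuple[float, float]]) -> Optional[float]:
    """Average the candidate values, weighted by their scores.

    candidates: (possibility, confidence*probability) pairs.
    Returns None when there are no candidates, which is assumed here to
    correspond to the "Can not make a Prediction." cases in the output.
    """
    total = sum(weight for _, weight in candidates)
    if total == 0:
        return None
    return sum(value * weight for value, weight in candidates) / total


first = [(22.325, 58300), (17.375, 4000)]
third = [(22.325, 162600), (37.175, 2600), (17.375, 8000)]

print(round(weighted_prediction(first), 4))  # 22.0072
print(round(weighted_prediction(third), 4))  # 22.3193
print(weighted_prediction([]))               # None
```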
6. Analysis
496 instances are certainly not enough for 14 attributes, and more data is definitely required. Nevertheless, we can already get some close estimates:
# Value ($1,000) Predicted ($1,000)
1 19.7
22.325 58300
17.375 4000
----------------------
22.0072
2 18.3
22.325 82000
17.375 5300
----------------------
22.0245
3 21.2
22.325 162600
37.175 2600
17.375 8000
----------------------
22.3193
.......
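The same comparison can be carried through all ten questions by pairing each answer with the prediction reported in the results and measuring the absolute difference. The sketch below uses only values from the question file and the output above, skips the two questions for which no prediction was made, and reads the answer to question 9 as 22.00. The mean absolute error is one possible summary chosen for illustration, not necessarily the accuracy measure used by DM 2.5.

```python
# Answers from the question file (MEDV, in $1,000's).
actual = [19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22.0, 11.9]

# Predictions from the results; None marks "Can not make a Prediction."
predicted = [22.0072, 22.0245, 22.3193, 22.0245, 22.1302,
             25.146, None, None, 29.3738, 26.0021]

# Keep only the questions that received a prediction.
pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
abs_errors = [abs(p - a) for a, p in pairs]
mean_abs_error = sum(abs_errors) / len(abs_errors)

for number, (a, p) in enumerate(zip(actual, predicted), start=1):
    shown = "no prediction" if p is None else f"{p:8.4f}  error {abs(p - a):.4f}"
    print(f"{number:2d}  {a:5.1f}  {shown}")
print(f"answered: {len(pairs)} of {len(actual)}")
print(f"mean absolute error: {mean_abs_error:.3f}")
```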