Boston Housing Database


1. Sources:

Concerns housing values in suburbs of Boston. This data set was taken from the StatLib library, which is maintained at Carnegie Mellon. See: Harrison, D. and Rubinfeld, D.L. 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol.5, 81-102, 1978.

2. Information

3. Attribute Information:

1. CRIM per capita crime rate by town
2. ZN proportion of residential land zoned for lots over 25,000 sq.ft.
3. INDUS proportion of non-retail business acres per town
4. CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
5. NOX nitric oxides concentration (parts per 10 million)
6. RM average number of rooms per dwelling
7. AGE proportion of owner-occupied units built prior to 1940
8. DIS weighted distances to five Boston employment centers
9. RAD index of accessibility to radial highways
10. TAX full-value property-tax rate per $10,000
11. PTRATIO pupil-teacher ratio by town
12. B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
13. LSTAT % lower status of the population
14. MEDV Median value of owner-occupied homes in $1000's
 

4. Database

The Housing Database file has 506 instances. The amount of data required for 100% accuracy can be estimated by using 10 classes:

10 * 14 * 100 = 14,000 instances.

About 14,000 instances would be required for "perfect" predictions, so the available data is not enough to support them. As we will see, in this case DM 2.5 has an overall accuracy rate of 70-80%.
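The estimate above is a simple product, and a minimal Python sketch of the arithmetic follows. The source does not say what each factor stands for, so the parameter names (10 classes, 14 attributes, and an unexplained multiplier of 100) are assumptions made for illustration only.

```python
def required_instances(n_classes: int, n_attributes: int, multiplier: int) -> int:
    """Rule-of-thumb product from the text: classes * attributes * multiplier.

    The meaning of each factor is an assumption; the source only gives
    the numbers 10, 14 and 100.
    """
    return n_classes * n_attributes * multiplier


needed = required_instances(n_classes=10, n_attributes=14, multiplier=100)
available = 506  # instances in the Housing Database file

print(needed)              # estimated number of instances required
print(available / needed)  # fraction of that estimate actually available
```

By this estimate, the file holds under 4% of the data that "perfect" predictions would need.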

The last 10 rows are held out for the question file, which leaves 496 instances in the Housing Database file. Below are the first 10 instances:

Housing database file:

0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00
0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60
0.02729 0.00 7.070 0 0.4690 7.1850 61.10 4.9671 2 242.0 17.80 392.83 4.03 34.70
0.03237 0.00 2.180 0 0.4580 6.9980 45.80 6.0622 3 222.0 18.70 394.63 2.94 33.40
0.06905 0.00 2.180 0 0.4580 7.1470 54.20 6.0622 3 222.0 18.70 396.90 5.33 36.20
0.02985 0.00 2.180 0 0.4580 6.4300 58.70 6.0622 3 222.0 18.70 394.12 5.21 28.70
0.08829 12.50 7.870 0 0.5240 6.0120 66.60 5.5605 5 311.0 15.20 395.60 12.43 22.90
0.14455 12.50 7.870 0 0.5240 6.1720 96.10 5.9505 5 311.0 15.20 396.90 19.15 27.10
0.21124 12.50 7.870 0 0.5240 5.6310 100.00 6.0821 5 311.0 15.20 386.63 29.93 16.50
0.17004 12.50 7.870 0 0.5240 6.0040 85.90 6.5921 5 311.0 15.20 386.71 17.10 18.90
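The hold-out described above (the last 10 rows set aside as questions, the remaining rows kept as the database) can be sketched in Python as follows. This is only an illustration of the split, not how DM 2.5 reads its input; the function name `split_housing_rows` and the file name in the comment are hypothetical.

```python
def split_housing_rows(lines, n_questions=10):
    """Parse whitespace-separated rows and hold out the last n_questions rows.

    Returns (database_rows, questions, answers). Each question is a
    held-out row without its last column; the answer is that last
    column (MEDV).
    """
    rows = [[float(v) for v in line.split()] for line in lines if line.strip()]
    database, held_out = rows[:-n_questions], rows[-n_questions:]
    questions = [row[:-1] for row in held_out]
    answers = [row[-1] for row in held_out]
    return database, questions, answers


# Small demonstration with the first two rows listed above, holding out one row.
# With the full file one would pass its lines instead, e.g.
# open("housing.data").readlines() (hypothetical file name).
sample = [
    "0.00632 18.00 2.310 0 0.5380 6.5750 65.20 4.0900 1 296.0 15.30 396.90 4.98 24.00",
    "0.02731 0.00 7.070 0 0.4690 6.4210 78.90 4.9671 2 242.0 17.80 396.90 9.14 21.60",
]
database, questions, answers = split_housing_rows(sample, n_questions=1)
print(len(database), len(questions[0]), answers)
```

Applied to all 506 rows with the default `n_questions=10`, this would yield 496 database rows and 10 questions of 13 attributes each.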

The last 10 rows, used for the question file, are shown below. The answer is the known MEDV value:

Question                                                                      Answer

0.28960 0.00 9.690 0 0.5850 5.3900 72.90 2.7986 6 391.0 19.20 396.90 21.14    19.70
0.26838 0.00 9.690 0 0.5850 5.7940 70.60 2.8927 6 391.0 19.20 396.90 14.10    18.30
0.23912 0.00 9.690 0 0.5850 6.0190 65.30 2.4091 6 391.0 19.20 396.90 12.92    21.20
0.17783 0.00 9.690 0 0.5850 5.5690 73.50 2.3999 6 391.0 19.20 395.77 15.10    17.50
0.22438 0.00 9.690 0 0.5850 6.0270 79.70 2.4982 6 391.0 19.20 396.90 14.33    16.80
0.06263 0.00 11.930 0 0.5730 6.5930 69.10 2.4786 1 273.0 21.00 391.99 9.67    22.40
0.04527 0.00 11.930 0 0.5730 6.1200 76.70 2.2875 1 273.0 21.00 396.90 9.08    20.60
0.06076 0.00 11.930 0 0.5730 6.9760 91.00 2.1675 1 273.0 21.00 396.90 5.64    23.90
0.10959 0.00 11.930 0 0.5730 6.7940 89.30 2.3889 1 273.0 21.00 393.45 6.48    22.00
0.04741 0.00 11.930 0 0.5730 6.0300 80.80 2.5050 1 273.0 21.00 396.90 7.88    11.90


5. Results

Click: "Real/+ Predict":

=================== Beginning =====================
 
 

0.2896 0 9.69 0 0.585 5.39 72.9 2.7986 6 391 19.2 396.9 21.14

Possibility Confidence*Probability

22.325 58300

17.375 4000

------------------------------------------------------

22.0072
 

0.26838 0 9.69 0 0.585 5.794 70.6 2.8927 6 391 19.2 396.9 14.1

Possibility Confidence*Probability

22.325 82000

17.375 5300

------------------------------------------------------

22.0245
 

0.23912 0 9.69 0 0.585 6.019 65.3 2.4091 6 391 19.2 396.9 12.92

Possibility Confidence*Probability

22.325 162600

37.175 2600

17.375 8000

------------------------------------------------------

22.3193
 

0.17783 0 9.69 0 0.585 5.569 73.5 2.3999 6 391 19.2 395.77 15.1

Possibility Confidence*Probability

22.325 82000

17.375 5300

------------------------------------------------------

22.0245
 

0.22438 0 9.69 0 0.585 6.027 79.7 2.4982 6 391 19.2 396.9 14.33

Possibility Confidence*Probability

22.325 147200

37.175 1000

27.275 1300

17.375 10600

------------------------------------------------------

22.1302
 

0.06263 0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21 391.99 9.67

Possibility Confidence*Probability

22.325 4000

27.275 5300

------------------------------------------------------

25.146
 

0.04527 0 11.93 0 0.573 6.12 76.7 2.2875 1 273 21 396.9 9.08

Possibility Confidence*Probability

Can not make a Prediction.

------------------------------------------------------
 
 

0.06076 0 11.93 0 0.573 6.976 91 2.1675 1 273 21 396.9 5.64

Possibility Confidence*Probability

Can not make a Prediction.

------------------------------------------------------
 
 

0.10959 0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21 393.45 6.48

Possibility Confidence*Probability

22.325 3900

37.175 2600

32.225 4000

27.275 2000

------------------------------------------------------

29.3738
 
 

0.04741 0 11.93 0 0.573 6.03 80.8 2.505 1 273 21 396.9 7.88

Possibility Confidence*Probability

22.325 7900

37.175 2600

------------------------------------------------------

26.0021

=================== End ==========================
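The number printed under each dashed line is consistent with a weighted mean of the listed possibilities, using the Confidence*Probability column as weights. This is inferred from the figures in the output, not stated by the tool, so the sketch below is an assumption about how the values combine rather than a description of DM 2.5 itself. The function name and the choice to return `None` when no possibilities are listed (the "Can not make a Prediction" case) are likewise illustrative.

```python
def combine_possibilities(candidates):
    """Weighted mean of candidate values.

    candidates: list of (possibility, weight) pairs, where weight is the
    Confidence*Probability figure from the output.
    Returns None when there are no candidates or the weights sum to zero.
    """
    total_weight = sum(weight for _, weight in candidates)
    if total_weight == 0:
        return None
    return sum(value * weight for value, weight in candidates) / total_weight


# First question in the output above.
first = combine_possibilities([(22.325, 58300), (17.375, 4000)])
print(round(first, 4))  # 22.0072

# Ninth question, which lists four possibilities.
ninth = combine_possibilities(
    [(22.325, 3900), (37.175, 2600), (32.225, 4000), (27.275, 2000)]
)
print(round(ninth, 4))  # 29.3738
```

Because one possibility usually carries most of the weight, the combined value tends to sit close to that dominant class value.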

6. Analysis

496 instances are certainly not enough for 14 attributes, and more data is definitely required. Nevertheless, we can already get some close estimates:

#    Value ($1,000)    Predicted ($1,000)    Confidence*Probability

1    19.7              22.325                58300
                       17.375                4000
                       ------------------------------
                       22.0072

2    18.3              22.325                82000
                       17.375                5300
                       ------------------------------
                       22.0245

3    21.2              22.325                162600
                       37.175                2600
                       17.375                8000
                       ------------------------------
                       22.3193
.......
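To put a number on how close these estimates are, the absolute error of each combined prediction can be computed. The short Python sketch below uses only the three rows shown above; the choice of absolute error as the measure is an assumption, since the source does not define how it scores accuracy.

```python
# (actual value, combined prediction) for the three rows listed above,
# both in units of $1,000.
rows = [(19.7, 22.0072), (18.3, 22.0245), (21.2, 22.3193)]


def absolute_errors(pairs):
    """Absolute difference between prediction and actual value for each pair."""
    return [abs(predicted - actual) for actual, predicted in pairs]


errors = absolute_errors(rows)
mean_abs_error = sum(errors) / len(errors)

for (actual, predicted), err in zip(rows, errors):
    print(f"actual={actual:.1f} predicted={predicted:.4f} error={err:.4f}")
print(f"mean absolute error: {mean_abs_error:.4f}")
```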