Thyroid Database

1. Sources

Thyroid disease records supplied by the Garavan Institute and J. Ross Quinlan, New South Wales Institute, Sydney, Australia, 1987.

2. Information

3. Attributes

Each record looks like:

The attributes are given in order. Unknown attribute values are indicated by question marks. Only 27 of the 29 attributes are used: attribute 28 (TBG) is removed because most of the records have no value for it, and attribute 29 (referral source) is removed because the referral source is irrelevant. The attributes are:

Attribute Name               Possible Values
--------------               ---------------
age:                         continuous.
sex:                         M (0), F (1).
on thyroxine:                f, t.
query on thyroxine:          f, t.
on antithyroid medication:   f, t.
sick:                        f, t.
pregnant:                    f, t.
thyroid surgery:             f, t.
I131 treatment:              f, t.
query hypothyroid:           f, t.
query hyperthyroid:          f, t.
lithium:                     f, t.
goitre:                      f, t.
tumor:                       f, t.
hypopituitary:               f, t.
psych:                       f, t.
TSH measured:                f, t.
TSH:                         continuous.
T3 measured:                 f, t.
T3:                          continuous.
TT4 measured:                f, t.
TT4:                         continuous.
T4U measured:                f, t.
T4U:                         continuous.
FTI measured:                f, t.
FTI:                         continuous.
TBG measured:                f, t.
TBG:                         continuous.  (removed: too many missing values)
referral source:             WEST, STMW, SVHC, SVI, SVHD, other.  (removed: referral source is irrelevant)

In the above attributes:

f = 0
t = 1
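
The encoding above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name encode_field and the two lookup tables are hypothetical and are not part of the DecisionMaker.

```python
# Hypothetical helper names; only the codes (f=0, t=1, M=0, F=1) come from the text.
BINARY_CODES = {"f": 0, "t": 1}
SEX_CODES = {"M": 0, "F": 1}

def encode_field(name, raw):
    """Return the numeric code for one raw attribute value, or None if it is missing."""
    if raw == "?":
        return None  # missing values are converted in a separate step
    if name == "sex":
        return SEX_CODES[raw]
    if raw in BINARY_CODES:
        return BINARY_CODES[raw]
    return float(raw)  # continuous attributes such as age or TSH

print(encode_field("sex", "F"))           # 1
print(encode_field("on thyroxine", "f"))  # 0
print(encode_field("TSH", "3.3"))         # 3.3
```

Returning None for a question mark keeps the missing-value decision out of the encoder, so the replacement values can be chosen separately.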

About 20% of the records contain missing information. Because this is a large portion of the data, the records with missing information are kept rather than discarded, and their missing values are replaced as shown below. This may not be the best way to deal with the missing data.

The Attrasoft neural net can, by design, handle missing information easily; however, the DecisionMaker interface cannot. A customized version of the DecisionMaker that removes this limitation can be easily ordered.

Missing Value Conversion:

Missing Value                Replacement Value
-------------                -----------------
sex: ? ==>                   1 (Female)
TSH: ? ==>                   4.74 (Average*)
T3: ? ==>                    1.41 (Average*)
TT4: ? ==>                   103 (Average*)
T4U: ? ==>                   0.89 (Average*)
FTI: ? ==>                   103.7 (Average*)

*Average = (sum of all values of the attribute, with each ? counted as 0) / 9172.
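
As a rough sketch of the averaging rule in the footnote, the following Python computes the replacement value for one column by counting each missing entry as 0 and dividing by the total number of records. The function names and the toy column are made up for illustration; the averages listed above are not recomputed here.

```python
def average_with_missing_as_zero(values):
    """Sum the known values, count each '?' as 0, and divide by the record count."""
    total = sum(0.0 if v == "?" else float(v) for v in values)
    return total / len(values)

def fill_missing(values, fill):
    """Replace each '?' with the given fill value and convert the rest to floats."""
    return [fill if v == "?" else float(v) for v in values]

column = ["2.0", "?", "4.0", "?"]            # toy data, not taken from the database
avg = average_with_missing_as_zero(column)   # (2 + 0 + 4 + 0) / 4
print(avg)                                   # 1.5
print(fill_missing(column, avg))             # [2.0, 1.5, 4.0, 1.5]
```

Note that this rule gives a lower value than the mean of the known entries alone (3.0 in the toy column), because the missing entries still count in the denominator.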

The diagnosis consists of a string of letters indicating the diagnosed conditions. The conditions are divided into groups, where each group corresponds to a class of comments.

Letter          Diagnosis
------          ---------

-               no condition requiring comment (class 0).

hyperthyroid conditions (class 10):

A               hyperthyroid
B               T3 toxic
C               toxic goitre
D               secondary toxic

hypothyroid conditions (class 20):

E               hypothyroid
F               primary hypothyroid
G               compensated hypothyroid
H               secondary hypothyroid

binding protein (class 30):

I               increased binding protein
J               decreased binding protein

general health (class 40):

K               concurrent non-thyroidal illness

replacement therapy (class 50):

L               consistent with replacement therapy
M               underreplaced
N               overreplaced

antithyroid treatment (class 60):

O               antithyroid drugs
P               I131 treatment
Q               surgery

miscellaneous (class 70):

R               discordant assay results
S               elevated TBG
T               elevated thyroid hormones

The number of possible classes is 8 * 8 = 64. However, only the following appear in the database:

A through T

Therefore, the number of different classes is less than 30.
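
The letter-to-class grouping can be written as a small lookup table. The sketch below is illustrative only: the names CLASS_GROUPS, LETTER_TO_CLASS and diagnosis_classes are hypothetical, and how a multi-letter diagnosis is packed into the pair of numbers used in the data files is not specified here, so the function simply returns the distinct class codes.

```python
# Class codes per letter group, as listed in the diagnosis table.
CLASS_GROUPS = {
    "-": 0,      # no condition requiring comment
    "ABCD": 10,  # hyperthyroid conditions
    "EFGH": 20,  # hypothyroid conditions
    "IJ": 30,    # binding protein
    "K": 40,     # general health
    "LMN": 50,   # replacement therapy
    "OPQ": 60,   # antithyroid treatment
    "RST": 70,   # miscellaneous
}
LETTER_TO_CLASS = {
    letter: code for letters, code in CLASS_GROUPS.items() for letter in letters
}

def diagnosis_classes(diagnosis):
    """Return the sorted, distinct class codes for a diagnosis string."""
    return sorted({LETTER_TO_CLASS[ch] for ch in diagnosis if ch in LETTER_TO_CLASS})

print(diagnosis_classes("-"))   # [0]
print(diagnosis_classes("K"))   # [40]
print(diagnosis_classes("GK"))  # [20, 40]
```

The string "GK" is a made-up input used to show a diagnosis spanning two groups.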

4. Database

The Thyroid Database file has 9172 instances. There is no information on the distribution of the instances over the classes. The estimated amount of data required for 100% accuracy is:

100 * 31 variables * ~10 classes = 31,000 instances.

As we will see, 9172 instances yield an accuracy rate of about 85%. The first 9152 rows are used for the database file, and the last 20 rows are used for the question file. Below are the last 20 instances:

Question                                                                         Answer

80 0 0 0 0 0 0 0 0 0 0 0 0 0   0
0 1 3.3 0 1.41 1 111 1 0.92 1 121 0 0                                     0

64 0 0 0 0 0 0 0 0 0 0 0 0 0   0
0 1 0.81 0 1.41 1 31 1 0.55 1 56 0 40                                  40

16 0 0 0 0 0 0 0 0 0 0 0 0 0   0
0 1 2.6 0 1.41 1 122 1 0.86 1 142 0 0                                   0

54 0 0 0 0 0 0 0 0 0 0 0 0 0      0
0 1 1.1 0 1.41 1 105 1 0.82 1 128 0 0                                    0

78 0 0 0 0 0 0 0 0 0 0 0 0 0    0
0 1 0.97 0 1.41 1 97 1 0.73 1 133 0 0                                   0

60 0 0 0 1 0 0 0 0 0 0 0 0 0    0
0 1 0.18 0 1.41 1 28 1 0.87 1 32 0 40                                 40

64 0 0 0 0 0 0 0 0 1 0 0 0 0      0
0 0 4.74 0 1.41 1 44 1 0.53 1 83 0 30                                 30

72 1 0 0 0 0 0 0 0 0 0 0 0 0   0
0 0 4.74 0 1.41 1 125 1 1.05 1 119 0 0                                  0

72 1 0 0 0 0 0 0 0 0 1 0 0 0   0
0 0 4.74 0 1.41 1 93 1 1 1 93 0 0                                           0

46 1 0 0 0 0 0 0 0 0 0 1 0 0    0
1 0 4.74 0 1.41 1 70 1 0.75 1 93 0 0                                       0

36 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 84 1 1.26 1 67 0 30                                    30

69 1 1 0 0 0 0 0 0 0 0 0 0 1 0
0 0 4.74 0 1.41 1 94 1 0.94 1 100 0 0                                      0

40 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 67 1 0.79 1 85 0 0                                        0

33 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 76 1 0.66 1 115 0 0                                       0

70 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 88 1 0.74 1 119 0 0                                        0

56 0 0 0 0 0 0 0 0 0 1 0 0 0 0
0 0 4.74 0 1.41 1 64 1 0.83 1 77 0 0                                           0

22 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 91 1 0.92 1 99 0 0                                           0

69 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 113 1 1.27 1 89 0 30                                       30

47 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 4.74 0 1.41 1 75 1 0.85 1 88 0 0                                           0

31 0 0 0 0 0 0 0 0 1 0 0 0 0 0
0 0 4.74 0 1.41 1 66 1 1.02 1 65 0 0                                          0

Thyroid database file.
Question file.
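
The split into a database file and a question file amounts to holding out the last rows. A minimal Python sketch follows; the function name split_rows is hypothetical and the rows are placeholders rather than real records.

```python
def split_rows(rows, n_questions=20):
    """Return (database_rows, question_rows), holding out the last n_questions rows."""
    if not 0 < n_questions < len(rows):
        raise ValueError("n_questions must be between 1 and len(rows) - 1")
    return rows[:-n_questions], rows[-n_questions:]

rows = list(range(9172))  # placeholder for the 9172 encoded records
database, questions = split_rows(rows)
print(len(database), len(questions))  # 9152 20
```

Holding out the last rows, rather than a random sample, matches the split described above; whether those rows are representative of the class distribution is not known.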

5. Results

Click "Integer -- Predict":

================= Beginning ==================

80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3.3 0 1.41 1 111 1 0.92 1 121 0

Possibility Confidence*Probability

1 1 3.95016e+08

71 71 1.66152e+07

11 41 855600

31 31 1.24416e+07

51 51 6.084e+06

21 21 2.0746e+07

11 11 3.5328e+06

11 31 21200

21 31 135200


6 6

64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.81 0 1.41 1 31 1 0.55 1 56 0

Possibility Confidence*Probability

1 1 7.35247e+08

71 71 1.31328e+07

11 41 1.9232e+06

31 31 2.932e+07

51 51 9.9608e+06

21 21 3.05056e+07

41 41 2.8624e+06

11 11 4.8516e+06

61 61 245600

11 31 102400

51 31 20400

21 41 52800

21 31 390800

61 31 5600


5 5

16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2.6 0 1.41 1 122 1 0.86 1 142 0

Possibility Confidence*Probability

1 1 1.72036e+08

71 71 1.85476e+07

11 41 1.5828e+06

31 31 7.4548e+06

51 51 6.7488e+06

21 21 1.02788e+07

11 11 4.654e+06

11 31 12800


11 11

54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.1 0 1.41 1 105 1 0.82 1 128 0

Possibility Confidence*Probability

1 1 2.13235e+08

71 71 1.42528e+07

11 41 2.0068e+06

31 31 5.8468e+06

51 51 7.1632e+06

21 21 1.82596e+07

11 11 5.0204e+06

11 31 12000


8 9

78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.97 0 1.41 1 97 1 0.73 1 133 0

Possibility Confidence*Probability

1 1 2.23586e+08

71 71 1.45484e+07

11 41 2.1468e+06

31 31 4.2264e+06

51 51 7.5508e+06

21 21 1.7038e+07

11 11 5.5356e+06

11 31 10800


8 8

60 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0.18 0 1.41 1 28 1 0.87 1 32 0

Possibility Confidence*Probability

1 1 2.8104e+06

71 71 60000

31 31 47200

51 51 92400

21 21 104800

41 41 74800

11 11 284800

61 61 460000

61 31 27200


13 13

64 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 44 1 0.53 1 83 0

Possibility Confidence*Probability

1 1 1.0798e+07

71 71 184000

11 41 54400

31 31 352400

51 51 498400

21 21 1.0292e+06

11 11 190000

21 31 24400


6 7

72 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 125 1 1.05 1 119 0

Possibility Confidence*Probability

1 1 7.85284e+07

71 71 711200

11 41 312800

31 31 6.1728e+06

51 51 2.6532e+06

21 21 916000

11 11 1.3708e+06

11 31 41200

21 31 64800


5 6

72 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 93 1 1 1 93 0

Possibility Confidence*Probability

1 1 1.93872e+07

71 71 53600

31 31 827600

51 51 208800

21 21 523200

11 11 1.0888e+06

11 31 10400

21 31 50000


4 4

46 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 4.74 0 1.41 1 70 1 0.75 1 93 0

Possibility Confidence*Probability

1 1 282000


1 1

36 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 84 1 1.26 1 67 0

Possibility Confidence*Probability

1 1 1.30932e+07

71 71 202000

31 31 1.51344e+07

51 51 503200

21 21 1.4104e+06

11 11 317600

61 61 35600

11 31 178400

51 31 35600

21 31 364800


18 18

69 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4.74 0 1.41 1 94 1 0.94 1 100 0

Possibility Confidence*Probability

1 1 448400

51 51 366400

11 31 76800


22 24

40 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 67 1 0.79 1 85 0

Possibility Confidence*Probability

1 1 4.77179e+08

71 71 3.6668e+06

11 41 1.7084e+06

31 31 1.34876e+07

51 51 7.938e+06

21 21 1.10684e+07

11 11 1.49352e+07

61 61 204000

11 31 151200

21 31 310800

61 31 9600


4 4

33 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 76 1 0.66 1 115 0

Possibility Confidence*Probability

1 1 1.75566e+08

71 71 4.3388e+06

11 41 1.8904e+06

31 31 5.5048e+06

51 51 6.8416e+06

21 21 3.214e+06

11 11 1.5306e+07

61 61 34800

11 31 41200

21 31 55200


6 6

70 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 88 1 0.74 1 119 0

Possibility Confidence*Probability

1 1 1.89014e+08

71 71 4.2856e+06

11 41 1.9224e+06

31 31 5.7228e+06

51 51 6.9268e+06

21 21 3.1516e+06

11 11 1.78432e+07

61 61 35600

11 31 43600

21 31 56800


6 6

56 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 64 1 0.83 1 77 0

Possibility Confidence*Probability

1 1 1.78136e+07

71 71 192400

11 41 60800

31 31 458000

51 51 166800

21 21 900000

11 11 1.7048e+06

21 31 27600


4 4

22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 91 1 0.92 1 99 0

Possibility Confidence*Probability

1 1 1.93528e+08

71 71 2.4572e+06

11 41 529200

31 31 1.43176e+07

51 51 3.7456e+06

21 21 3.9428e+06

11 11 2.0672e+06

11 31 27200

21 31 134800


5 5

69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 113 1 1.27 1 89 0

Possibility Confidence*Probability

1 1 6.4128e+06

71 71 59200

31 31 2.20012e+07

51 51 225600

21 21 653600

11 11 207600

11 31 60000

21 31 185200


24 24

47 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 75 1 0.85 1 88 0

Possibility Confidence*Probability

1 1 4.7567e+08

71 71 3.2736e+06

11 41 1.3604e+06

31 31 1.36276e+07

51 51 7.4676e+06

21 21 1.06468e+07

11 11 1.28232e+07

61 61 171200

11 31 154400

21 31 314000

61 31 10000


4 4

31 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 66 1 1.02 1 65 0

Possibility Confidence*Probability

1 1 5.4412e+06

71 71 121600

31 31 386800

51 51 248000

21 21 758000

11 11 53200

21 31 29200


8 8

Precision of each number:

0.714 0.714

=================== End ==========================
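
Picking the top answer from one block of this output can be sketched as follows. The parsing rules are assumptions inferred from the listing: a candidate line is taken to be two integer codes followed by a score, and every other line (the question row, the header, and the two-number line that ends each block) is skipped. The function name best_candidate is hypothetical.

```python
def best_candidate(block_lines):
    """Return (code1, code2, score) for the highest-scoring candidate line, or None."""
    best = None
    for line in block_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # not a "code code score" line
        try:
            candidate = (int(parts[0]), int(parts[1]), float(parts[2]))
        except ValueError:
            continue  # three tokens, but not numeric
        if best is None or candidate[2] > best[2]:
            best = candidate
    return best

# Lines copied from the block for the 11th question above.
block = [
    "Possibility Confidence*Probability",
    "1 1 1.30932e+07",
    "71 71 202000",
    "31 31 1.51344e+07",
    "51 51 503200",
    "18 18",
]
print(best_candidate(block))  # (31, 31, 15134400.0)
```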

6. Analysis

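Taking the answer with the highest Confidence*Probability value for each question, the results are given below. An asterisk marks an incorrect answer. The DecisionMaker output codes appear to be the class codes plus 1, so 1 1 corresponds to class 0 0 and 31 31 to class 30 30. Cases are numbered in the order of the question file.

Case#     Correct     DecisionMaker

1         0 0         1 1
2         40 40       1 1*
3         0 0         1 1
4         0 0         1 1
5         0 0         1 1
6         40 40       1 1*
7         30 30       1 1*
8         0 0         1 1
9         0 0         1 1
10        0 0         1 1
11        30 30       31 31
12        0 0         1 1
13        0 0         1 1
14        0 0         1 1
15        0 0         1 1
16        0 0         1 1
17        0 0         1 1
18        30 30       31 31
19        0 0         1 1
20        0 0         1 1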

17 answers are right and 3 are wrong, an accuracy of 85%. There is no class distribution information with which to make further analysis. It appears that there are enough instances for class (0 0); as a result, the classification for this class is 100% correct.
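
The tally can be reproduced with a short Python check. The correct answers are taken from the question list in section 4 and the predictions are the top-scoring candidates in section 5. One assumption is made and should be read as such: the DecisionMaker output code is treated as the class code plus 1 (1 for class 0, 31 for class 30), which is how the results are matched in this analysis.

```python
from collections import Counter

# Correct class codes for the 20 questions, in question-file order.
correct = [0, 40, 0, 0, 0, 40, 30, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 30, 0, 0]

# Top-scoring DecisionMaker codes: 1 everywhere except questions 11 and 18.
predicted = [1] * 20
predicted[10] = 31
predicted[17] = 31

# Assumption: output code = class code + 1.
hits = [p - 1 == c for p, c in zip(predicted, correct)]
print(sum(hits), len(hits), sum(hits) / len(hits))  # 17 20 0.85

# Per-class breakdown as (right, total).
totals = Counter(correct)
rights = Counter(c for c, ok in zip(correct, hits) if ok)
for cls in sorted(totals):
    print(cls, rights[cls], totals[cls])
```

With only 20 questions, and 15 of them in class 0, the per-class figures for classes 30 and 40 rest on two or three cases each and should be read with caution.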