2. Information
Each record looks like:
Attribute Name                Possible Values
--------------                ---------------
age:                          continuous.
sex:                          M (0), F (1).
on thyroxine:                 f, t.
query on thyroxine:           f, t.
on antithyroid medication:    f, t.
sick:                         f, t.
pregnant:                     f, t.
thyroid surgery:              f, t.
I131 treatment:               f, t.
query hypothyroid:            f, t.
query hyperthyroid:           f, t.
lithium:                      f, t.
goitre:                       f, t.
tumor:                        f, t.
hypopituitary:                f, t.
psych:                        f, t.
TSH measured:                 f, t.
TSH:                          continuous.
T3 measured:                  f, t.
T3:                           continuous.
TT4 measured:                 f, t.
TT4:                          continuous.
T4U measured:                 f, t.
T4U:                          continuous.
FTI measured:                 f, t.
FTI:                          continuous.
TBG measured:                 f, t.
TBG:                          continuous. (Removed: too many missing values.)
referral source:              WEST, STMW, SVHC, SVI, SVHD, other. (Removed: referral is irrelevant.)
In the above attributes:
f = 0
t = 1
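As a minimal sketch of the encoding above, the following Python snippet converts symbolic values to numbers. The name `encode_value` and the sample record are hypothetical; only the f/t and M/F codes come from the attribute table.

```python
# Codes taken from the attribute table: f = 0, t = 1, M = 0, F = 1.
CODES = {"f": 0, "t": 1, "M": 0, "F": 1}

def encode_value(raw):
    """Map a symbolic value to its numeric code; parse anything else as a number."""
    if raw in CODES:
        return CODES[raw]
    return float(raw)

# Hypothetical record fragment: age, sex, on thyroxine, query on thyroxine.
record = ["64", "F", "f", "t"]
encoded = [encode_value(v) for v in record]
print(encoded)  # [64.0, 1, 0, 1]
```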
About 20% of the records contain missing information. Because this is a large portion of the data, the records with missing information are kept and their missing values are filled in as shown below. This may not be a good way to deal with the missing data.
The Attrasoft neural net can, by design, handle missing information easily; however, the interface for the DecisionMaker cannot. A customized version of the DecisionMaker, which alleviates this problem, can be easily ordered.
Missing Value Conversion:
Attribute Name   Conversion
--------------   ---------------
sex:             ? ==> 1 (Female)
TSH:             ? ==> 4.74 (Average*)
T3:              ? ==> 1.41 (Average*)
TT4:             ? ==> 103 (Average*)
T4U:             ? ==> 0.89 (Average*)
FTI:             ? ==> 103.7 (Average*)
*Average = (sum of all values of the attribute, with ? counted as 0) / 9172.
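The conversion above can be sketched in Python as follows. This is only an illustration of the stated rule (a missing value counts as 0 in the sum, and the divisor is the full record count); the function names and the four-value sample column are hypothetical and are not taken from the thyroid data.

```python
def column_average(values, total_records):
    """Average as defined in the text: '?' counts as 0, divide by all records."""
    total = sum(0.0 if v == "?" else float(v) for v in values)
    return total / total_records

def fill_missing(values, replacement):
    """Replace every '?' with the given replacement value."""
    return [replacement if v == "?" else float(v) for v in values]

# Hypothetical TSH column with two missing entries (not real data).
tsh = ["1.0", "?", "3.0", "?"]
avg = column_average(tsh, len(tsh))   # (1.0 + 0 + 3.0 + 0) / 4 = 1.0
filled = fill_missing(tsh, avg)
print(avg, filled)  # 1.0 [1.0, 1.0, 3.0, 1.0]
```

Because ? is counted as 0 while the divisor stays at the full record count, this average is lower than the mean of the measured values alone whenever the measurements are positive and some are missing.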
The diagnosis consists of a string of letters indicating diagnosed conditions:
Letter   Diagnosis
------   ---------
-        no condition requiring comment (class 0)
hyperthyroid conditions (class 10):
A        hyperthyroid
B        T3 toxic
C        toxic goitre
D        secondary toxic
hypothyroid conditions (class 20):
E        hypothyroid
F        primary hypothyroid
G        compensated hypothyroid
H        secondary hypothyroid
binding protein (class 30):
I        increased binding protein
J        decreased binding protein
general health (class 40):
K        concurrent non-thyroidal illness
replacement therapy (class 50):
L        consistent with replacement therapy
M        underreplaced
N        overreplaced
antithyroid treatment (class 60):
O        antithyroid drugs
P        I131 treatment
Q        surgery
miscellaneous (class 70):
R        discordant assay results
S        elevated TBG
T        elevated thyroid hormones
The number of possible classes is 8 * 8 = 64. However, only the following appear in the database:
A through T
AK HK GI GK CI
OI MK KJ TK DR
LJ MJ
Therefore the number of different classes is less than 30.
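To make the class count concrete, here is a small Python sketch that maps each diagnosis string to its class numbers and counts the distinct results. The letter groups come from the diagnosis table; the names `GROUPS` and `diagnosis_classes` are hypothetical, and two things are assumptions: that the order of letters within a pair does not matter, and that the "-" diagnosis is counted as its own class (0).

```python
# Letter -> class number, following the diagnosis table.
GROUPS = {
    "ABCD": 10,  # hyperthyroid conditions
    "EFGH": 20,  # hypothyroid conditions
    "IJ": 30,    # binding protein
    "K": 40,     # general health
    "LMN": 50,   # replacement therapy
    "OPQ": 60,   # antithyroid treatment
    "RST": 70,   # miscellaneous
}
LETTER_CLASS = {letter: cls for letters, cls in GROUPS.items() for letter in letters}

def diagnosis_classes(diagnosis):
    """Return the sorted tuple of class numbers for a diagnosis string."""
    if diagnosis == "-":
        return (0,)
    return tuple(sorted({LETTER_CLASS[ch] for ch in diagnosis}))

# Diagnosis strings listed in the text, plus "-" for class 0.
observed = list("ABCDEFGHIJKLMNOPQRST") + [
    "AK", "HK", "GI", "GK", "CI", "OI", "MK", "KJ", "TK", "DR", "LJ", "MJ", "-",
]
distinct = {diagnosis_classes(d) for d in observed}
print(diagnosis_classes("AK"), len(distinct))  # (10, 40) 18
```

Under these assumptions the observed strings collapse to 18 distinct class combinations, which is consistent with the bound of fewer than 30.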
4. Database:
The Thyroid Database file has 9172 instances. There is no information on the distribution of the instances over the classes. The estimated amount of data required for 100% accuracy is:
100 * 31 variables * ~10 classes = 31,000 instances.
As we will see, the 9172 instances yield an accuracy rate of about 85%. The first 9152 rows are used for the database file, and the last 20 rows are used for the question file. Below are the last 20 instances:
Question                                                              Answer
--------                                                              ------
80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3.3 0 1.41 1 111 1 0.92 1 121 0    0
64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.81 0 1.41 1 31 1 0.55 1 56 0     40
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2.6 0 1.41 1 122 1 0.86 1 142 0    0
54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.1 0 1.41 1 105 1 0.82 1 128 0    0
78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.97 0 1.41 1 97 1 0.73 1 133 0    0
60 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0.18 0 1.41 1 28 1 0.87 1 32 0     40
64 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 44 1 0.53 1 83 0     30
72 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 125 1 1.05 1 119 0   0
72 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 93 1 1 1 93 0        0
46 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 4.74 0 1.41 1 70 1 0.75 1 93 0     0
36 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 84 1 1.26 1 67 0     30
69 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4.74 0 1.41 1 94 1 0.94 1 100 0    0
40 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 67 1 0.79 1 85 0     0
33 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 76 1 0.66 1 115 0    0
70 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 88 1 0.74 1 119 0    0
56 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 64 1 0.83 1 77 0     0
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 91 1 0.92 1 99 0     0
69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 113 1 1.27 1 89 0    30
47 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 75 1 0.85 1 88 0     0
31 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 66 1 1.02 1 65 0     0
Thyroid database file.
Question file.
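The split described above (the first 9152 rows as the database file, the last 20 rows as the question file) and the data estimate can be sketched in Python as follows. The function name `split_rows` is hypothetical, and the list of integers only stands in for the real records.

```python
def split_rows(rows, n_questions=20):
    """Return (database_rows, question_rows); the last n_questions rows become questions."""
    return rows[:-n_questions], rows[-n_questions:]

# Stand-in for the 9172 records; each real row would hold 27 values plus a class.
rows = list(range(9172))
database, questions = split_rows(rows)
print(len(database), len(questions))  # 9152 20

# The data estimate quoted in the text: 100 * 31 variables * ~10 classes.
estimate = 100 * 31 * 10
print(estimate)  # 31000
```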
5. Results:
Click "Integer-- Predict":
================= Beginning ==================
80 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3.3 0 1.41 1 111 1 0.92 1 121 0
Possibility Confidence*Probability
1 1 3.95016e+08
71 71 1.66152e+07
11 41 855600
31 31 1.24416e+07
51 51 6.084e+06
21 21 2.0746e+07
11 11 3.5328e+06
11 31 21200
21 31 135200
------------------------------------------------------
6 6
64 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.81 0 1.41 1 31 1 0.55 1 56 0
Possibility Confidence*Probability
1 1 7.35247e+08
71 71 1.31328e+07
11 41 1.9232e+06
31 31 2.932e+07
51 51 9.9608e+06
21 21 3.05056e+07
41 41 2.8624e+06
11 11 4.8516e+06
61 61 245600
11 31 102400
51 31 20400
21 41 52800
21 31 390800
61 31 5600
------------------------------------------------------
5 5
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2.6 0 1.41 1 122 1 0.86 1 142 0
Possibility Confidence*Probability
1 1 1.72036e+08
71 71 1.85476e+07
11 41 1.5828e+06
31 31 7.4548e+06
51 51 6.7488e+06
21 21 1.02788e+07
11 11 4.654e+06
11 31 12800
------------------------------------------------------
11 11
54 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.1 0 1.41 1 105 1 0.82 1 128 0
Possibility Confidence*Probability
1 1 2.13235e+08
71 71 1.42528e+07
11 41 2.0068e+06
31 31 5.8468e+06
51 51 7.1632e+06
21 21 1.82596e+07
11 11 5.0204e+06
11 31 12000
------------------------------------------------------
8 9
78 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.97 0 1.41 1 97 1 0.73 1 133 0
Possibility Confidence*Probability
1 1 2.23586e+08
71 71 1.45484e+07
11 41 2.1468e+06
31 31 4.2264e+06
51 51 7.5508e+06
21 21 1.7038e+07
11 11 5.5356e+06
11 31 10800
------------------------------------------------------
8 8
60 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0.18 0 1.41 1 28 1 0.87 1 32 0
Possibility Confidence*Probability
1 1 2.8104e+06
71 71 60000
31 31 47200
51 51 92400
21 21 104800
41 41 74800
11 11 284800
61 61 460000
61 31 27200
------------------------------------------------------
13 13
64 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 44 1 0.53 1 83 0
Possibility Confidence*Probability
1 1 1.0798e+07
71 71 184000
11 41 54400
31 31 352400
51 51 498400
21 21 1.0292e+06
11 11 190000
21 31 24400
------------------------------------------------------
6 7
72 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 125 1 1.05 1 119 0
Possibility Confidence*Probability
1 1 7.85284e+07
71 71 711200
11 41 312800
31 31 6.1728e+06
51 51 2.6532e+06
21 21 916000
11 11 1.3708e+06
11 31 41200
21 31 64800
------------------------------------------------------
5 6
72 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 93 1 1 1 93 0
Possibility Confidence*Probability
1 1 1.93872e+07
71 71 53600
31 31 827600
51 51 208800
21 21 523200
11 11 1.0888e+06
11 31 10400
21 31 50000
------------------------------------------------------
4 4
46 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 4.74 0 1.41 1 70 1 0.75 1 93 0
Possibility Confidence*Probability
1 1 282000
------------------------------------------------------
1 1
36 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 84 1 1.26 1 67 0
Possibility Confidence*Probability
1 1 1.30932e+07
71 71 202000
31 31 1.51344e+07
51 51 503200
21 21 1.4104e+06
11 11 317600
61 61 35600
11 31 178400
51 31 35600
21 31 364800
------------------------------------------------------
18 18
69 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4.74 0 1.41 1 94 1 0.94 1 100 0
Possibility Confidence*Probability
1 1 448400
51 51 366400
11 31 76800
------------------------------------------------------
22 24
40 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 67 1 0.79 1 85 0
Possibility Confidence*Probability
1 1 4.77179e+08
71 71 3.6668e+06
11 41 1.7084e+06
31 31 1.34876e+07
51 51 7.938e+06
21 21 1.10684e+07
11 11 1.49352e+07
61 61 204000
11 31 151200
21 31 310800
61 31 9600
------------------------------------------------------
4 4
33 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 76 1 0.66 1 115 0
Possibility Confidence*Probability
1 1 1.75566e+08
71 71 4.3388e+06
11 41 1.8904e+06
31 31 5.5048e+06
51 51 6.8416e+06
21 21 3.214e+06
11 11 1.5306e+07
61 61 34800
11 31 41200
21 31 55200
------------------------------------------------------
6 6
70 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 88 1 0.74 1 119 0
Possibility Confidence*Probability
1 1 1.89014e+08
71 71 4.2856e+06
11 41 1.9224e+06
31 31 5.7228e+06
51 51 6.9268e+06
21 21 3.1516e+06
11 11 1.78432e+07
61 61 35600
11 31 43600
21 31 56800
------------------------------------------------------
6 6
56 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 4.74 0 1.41 1 64 1 0.83 1 77 0
Possibility Confidence*Probability
1 1 1.78136e+07
71 71 192400
11 41 60800
31 31 458000
51 51 166800
21 21 900000
11 11 1.7048e+06
21 31 27600
------------------------------------------------------
4 4
22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 91 1 0.92 1 99 0
Possibility Confidence*Probability
1 1 1.93528e+08
71 71 2.4572e+06
11 41 529200
31 31 1.43176e+07
51 51 3.7456e+06
21 21 3.9428e+06
11 11 2.0672e+06
11 31 27200
21 31 134800
------------------------------------------------------
5 5
69 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 113 1 1.27 1 89 0
Possibility Confidence*Probability
1 1 6.4128e+06
71 71 59200
31 31 2.20012e+07
51 51 225600
21 21 653600
11 11 207600
11 31 60000
21 31 185200
------------------------------------------------------
24 24
47 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4.74 0 1.41 1 75 1 0.85 1 88 0
Possibility Confidence*Probability
1 1 4.7567e+08
71 71 3.2736e+06
11 41 1.3604e+06
31 31 1.36276e+07
51 51 7.4676e+06
21 21 1.06468e+07
11 11 1.28232e+07
61 61 171200
11 31 154400
21 31 314000
61 31 10000
------------------------------------------------------
4 4
31 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 4.74 0 1.41 1 66 1 1.02 1 65 0
Possibility Confidence*Probability
1 1 5.4412e+06
71 71 121600
31 31 386800
51 51 248000
21 21 758000
11 11 53200
21 31 29200
------------------------------------------------------
8 8
Precision of each number:
0.714 0.714
=================== End ==========================
6. Analysis:
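Taking the answer with the highest probability, the results are given below. The DecisionMaker column appears to show each class number plus 1, so "1 1" corresponds to class 0 and "31 31" to class 30; * marks a wrong answer.
Case#   Correct   DecisionMaker
-----   -------   -------------
1       0 0       1 1
2       40 40     1 1*
3       0 0       1 1
4       0 0       1 1
5       0 0       1 1
6       40 40     1 1*
7       30 30     1 1*
8       0 0       1 1
9       0 0       1 1
10      0 0       1 1
11      30 30     31 31
12      0 0       1 1
13      0 0       1 1
14      0 0       1 1
15      0 0       1 1
16      0 0       1 1
17      0 0       1 1
18      30 30     31 31
19      0 0       1 1
20      0 0       1 1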
17 answers are right and 3 answers are wrong. There is no class distribution information with which to make further analysis. It appears that there are enough instances for class (0 0); therefore, the classification for this class is 100% correct.
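The selection rule used in this analysis (take the possibility with the highest Confidence*Probability value, then compare it with the correct class) can be sketched in Python as below. The function names are hypothetical. The two candidate lists are abridged from the output for the second and eleventh questions, the `correct` list is read from the question listing, and the "minus 1" decoding is an assumption inferred from the output, not documented DecisionMaker behavior.

```python
def best_candidate(candidates):
    """Pick the possibility whose Confidence*Probability score is highest."""
    return max(candidates, key=lambda c: c[1])[0]

def decode(possibility):
    """Assumption: the output codes are class numbers plus 1."""
    return tuple(code - 1 for code in possibility)

# Candidate lists abridged from the output above (a few entries each).
case_2 = [((1, 1), 7.35247e+08), ((21, 21), 3.05056e+07), ((31, 31), 2.932e+07)]
case_11 = [((1, 1), 1.30932e+07), ((31, 31), 1.51344e+07), ((21, 21), 1.4104e+06)]
print(decode(best_candidate(case_2)))   # (0, 0)
print(decode(best_candidate(case_11)))  # (30, 30)

# Correct answers for the 20 questions, from the question listing.
correct = [0, 40, 0, 0, 0, 40, 30, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 30, 0, 0]
# Top-scoring answers from the output, decoded as above.
predicted = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 30, 0, 0, 0, 0, 0, 0, 30, 0, 0]
right = sum(c == p for c, p in zip(correct, predicted))
print(right, len(correct) - right, right / len(correct))  # 17 3 0.85
```

The 17 out of 20 tally matches the accuracy of about 85% quoted in the Database section.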