Methods
Data
About 500 randomly selected health records were manually analysed from April 2010 through May 2012 as part of the routine use of the GTT to monitor patient safety in a 450-bed acute care hospital.
All narrative texts in these records were extracted into a corpus of XML files.
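The poster does not describe the layout of the XML corpus, so the following Python sketch is purely illustrative: it assumes hypothetical record and note tags and simply collects the free text per record for later trigger matching.

import glob
import xml.etree.ElementTree as ET

def load_corpus(path_pattern="corpus/*.xml"):
    """Read a directory of XML files and gather narrative text per record.
    Tag names and file layout are assumptions, not the study's actual schema."""
    corpus = {}
    for path in glob.glob(path_pattern):
        root = ET.parse(path).getroot()
        # assumed layout: <record id="..."><note>free text ...</note>...</record>
        record_id = root.get("id", path)
        corpus[record_id] = " ".join(n.text or "" for n in root.iter("note"))
    return corpus

if __name__ == "__main__":
    corpus = load_corpus()
    print(f"Loaded {len(corpus)} records")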
Software
We used SAS® Text Miner and SAS® Enterprise Content Categorization to build the algorithms.
We built module-based algorithms from clinically specific word lists combined with Boolean operators; the sketch below illustrates the approach.
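The SAS® categorization rules themselves are not reproduced in the poster. The Python sketch below only illustrates the idea of a trigger "module" built from a clinically specific word list and Boolean operators; the C08 terms and the example note are invented for illustration.

import re
from dataclasses import dataclass, field

@dataclass
class TriggerModule:
    code: str
    include_terms: list                       # clinically specific word list
    exclude_terms: list = field(default_factory=list)

    def matches(self, text: str) -> bool:
        """Boolean logic: (any inclusion term) AND NOT (any exclusion term)."""
        t = text.lower()
        hit = any(re.search(rf"\b{re.escape(w)}\w*", t) for w in self.include_terms)
        excluded = any(re.search(rf"\b{re.escape(w)}\w*", t) for w in self.exclude_terms)
        return hit and not excluded

# Hypothetical C08 (pressure ulcer) module with made-up terms
c08 = TriggerModule(
    code="C08",
    include_terms=["pressure ulcer", "decubitus", "bedsore"],
    exclude_terms=["no sign of pressure ulcer"],
)

note = "Day 3: stage II pressure ulcer on left heel, dressing applied."
print(c08.code, "positive" if c08.matches(note) else "negative")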
Speed
The algorithms typically read 500 records in about 15 seconds.
Results
Bedsores (pressure ulcers)
The poster shows our results for bedsores (trigger C08).
General findings
The positive predictive values vary and can be low for some triggers, but the negative predictive values are high (see below). Thus, if an algorithm scores a record as negative, human reviewers will usually not find anything either.
All triggers
Pos PV = Positive predictive value
Neg PV = Negative predictive value
C01: Pos PV = 70% [95% CI: 56% to 80%]; Neg PV = 99% [95% CI: 97% to 100%]
C02: Pos PV = 45% [95% CI: 36% to 55%]; Neg PV = 95% [95% CI: 93% to 97%]
C05: Pos PV = [None found]; Neg PV = 100% [95% CI: 99% to 100%]
C07: Pos PV = 35% [95% CI: 17% to 59%]; Neg PV = 100% [95% CI: 98% to 100%]
C08: Pos PV = 56% [95% CI: 42% to 69%]; Neg PV = 97% [95% CI: 95% to 98%]
C09: Pos PV = 76% [95% CI: 67% to 83%]; Neg PV = 96% [95% CI: 94% to 98%]
C11: Pos PV = 26% [95% CI: 18% to 37%]; Neg PV = 97% [95% CI: 95% to 98%]
C14 + C15: [Not estimated]
M10: Pos PV = 52% [95% CI: 41% to 62%]; Neg PV = 100% [95% CI: 99% to 100%]
S01: Pos PV = 60% [95% CI: 36% to 80%]; Neg PV = 99% [95% CI: 98% to 100%]
S11: Pos PV = 24% [95% CI: 15% to 36%]; Neg PV = 97% [95% CI: 96% to 99%]
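As background for the values above, PPV = TP / (TP + FP) and NPV = TN / (TN + FN). The poster does not state which confidence-interval method was used; the sketch below uses hypothetical counts and a Wilson score interval purely to show how such figures can be computed.

from math import sqrt

def wilson_ci(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion k/n."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

tp, fp, fn, tn = 30, 24, 2, 444          # assumed counts for illustration only
ppv, npv = tp / (tp + fp), tn / (tn + fn)
lo, hi = wilson_ci(tp, tp + fp)
print(f"PPV = {ppv:.0%} [95% CI: {lo:.0%} to {hi:.0%}]")
lo, hi = wilson_ci(tn, tn + fn)
print(f"NPV = {npv:.0%} [95% CI: {lo:.0%} to {hi:.0%}]")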
Tests
We have tested the performance of the algorithms using about 250 new health records from other sections/departments in the same hospital.
The algorithms perform well compared with the findings of human reviewers, in particular for negative findings: the negative predictive values are consistently high.
The tests also revealed that disagreements among the GTT reviewers (two to three per record) are common, and that the computer algorithms perform well when their results are compared with the consensus eventually reached.
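The poster states that the reviewers eventually reached a consensus; the sketch below models that consensus as a simple majority vote, which is an assumption for illustration, as are the record IDs and verdicts.

from collections import Counter

def consensus(reviews):
    """Majority vote over reviewer judgements; ties count as positive."""
    counts = Counter(reviews)
    return counts[True] >= counts[False]

# record id -> (algorithm verdict, reviewer verdicts) -- made-up example data
records = {
    "r001": (True,  [True, True, False]),
    "r002": (False, [False, False]),
    "r003": (True,  [False, False, True]),
}

agree = sum(alg == consensus(reviews) for alg, reviews in records.values())
print(f"Agreement with reviewer consensus: {agree}/{len(records)} records")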
Challenges
The narrative texts. Notes written by physicians and nurses in health records are often informal, telegram-style texts with many acronyms, context-dependent abbreviations, and spelling errors.
Time. Developing good algorithms requires repeated cycles of modifying word lists, running the algorithms, and manually checking the findings.
Brought in or acquired? It is difficult, for humans as well as for computer algorithms, to distinguish between conditions present on admission and problems acquired during the hospital stay.