Abstract
Background: The Refined Semantic Network (RSN) for the UMLS was previously introduced to complement the UMLS Semantic Network (SN). The RSN partitions the UMLS Metathesaurus (META) into disjoint groups of concepts. Each such group is semantically uniform. However, the RSN was initially an order of magnitude larger than the SN, which is undesirable since to be useful, a semantic network should be compact. Most semantic types in the RSN represent combinations of semantic types in the UMLS SN. Such a “combination semantic type” is called Intersection Semantic Type (IST). Many ISTs are assigned to very few concepts. Moreover, when reviewing those concepts, many semantic type assignment inconsistencies were found. After correcting those inconsistencies many ISTs, among them some that contradicted UMLS rules, disappeared, which made the RSN smaller.
Objective: The authors performed a longitudinal study with the goal of reducing the size of the RSN to become compact. This goal was achieved by correcting inconsistencies and errors in the IST assignments in the UMLS, which additionally helped identify and correct ambiguities, inconsistencies, and errors in source terminologies widely used in the realm of public health.
Methods: In this paper, we discuss the process and steps employed in this longitudinal study and the intermediate results for different stages. The sculpting process includes removing redundant semantic type assignments, expanding semantic type assignments, and removing illegitimate ISTs by auditing ISTs of small extents. However, the emphasis of this paper is not on the auditing methodologies employed during the process, since they were introduced in earlier publications, but on the strategy of employing them in order to transform the RSN into a compact network. For this paper we also performed a comprehensive audit of 168 “small ISTs” in the 2013AA version of the UMLS to finalize the longitudinal study.
Results: Over the years it was found that the editors of the UMLS introduced some new inconsistencies that resulted in the reintroduction of unwarranted ISTs that had already been eliminated as a result of their previous corrections. Because of that, the transformation of the RSN into a compact network covering all necessary categories for the UMLS was slowed down. The corrections suggested by an audit of the 2013AA version of the UMLS achieve a compact RSN of equal magnitude as the UMLS SN. The number of ISTs has been reduced to 336. We also demonstrate how auditing the semantic type assignments of UMLS concepts can expose other modeling errors in the UMLS source terminologies, e.g., SNOMED CT, LOINC, and RxNORM that are important for health informatics. Such errors would otherwise stay hidden.
Conclusions: It is hoped that the UMLS curators will implement all required corrections and use the RSN along with the SN when maintaining and extending the UMLS. When used correctly, the RSN will support the prevention of the accidental introduction of inconsistent semantic type assignments into the UMLS. Furthermore, this way the RSN will support the exposure of other hidden errors and inconsistencies in health informatics terminologies, which are sources of the UMLS. Notably, the development of the RSN materializes the deeper, more refined Semantic Network for the UMLS that its designers envisioned originally but had not implemented.