Migratek - Rapid Data Migration Services

Data Cleansing Services

A general discussion of Data Cleansing can be found on the page About PACS Data Migration.

When you need it

You need Data Cleansing when the quality of your historical data is not sufficient for your future needs. This may be because new workflow tools increase the benefits of well-organized data, because several systems are being combined in the new archive and patient folders need to be merged, or because old practices or equipment compromised data quality. Just as often, it is because moving to a new system is deemed an appropriate time to clear out accumulated detritus.

Discarding obvious junk is relatively easy and fits within the scope of normal data migration. Source archives may contain studies without valid images, test images created by maintenance personnel and inadvertently stored in the archive, derived images that were discarded but never purged, images of phantoms or lab animals, imported demonstration images with names of dead Presidents, and a host of other exceptions that for the most part should never have been in the archive in the first place. These cases typically have demographic attributes that are obviously defective and easily spotted.

Archived data may also contain violations of the DICOM standard. For example, the DICOM Attribute (0018,0015) Body Part Examined is a CS datatype that allows only the characters {A-Z, 0-9, space, _} and may be at most 16 bytes in length, but some archived data may have a whole line of free text in this Attribute. Such fields generally cannot be truncated to valid form without a determination that the lost data is not clinically significant. A limited amount of such exception handling is normally included in the scope of standard migration projects.

Patient-level cleanup may be required when multiple facilities are combined into the new system, as when institutional mergers have resulted in the adoption of a single patient index.
Such projects usually start with a “Gold Standard” master patient list from a hospital information system that contains the patient name, patient ID, birthdate, gender, and other IDs by which the patient is known. For each study, the cleanup process finds the best match in the “Gold Standard” list and imports the master demographic attributes into the image data.

Study-level matching is the most reliable form of data cleansing, and results in image sets that are linked to their respective reports. This linkage is routinely provided by Modality Worklist support today, but most PACS archives now being replaced contain many Studies acquired before Modality Worklist was implemented. The streamlined user interfaces offered by some PACS today benefit greatly from linkage of historical reports to their respective image sets, a factor that argues for a study-level matching project. Study-level matching also provides more reliable demographic identification because, for example, "John Smith, DOB 8/8/80, who had a CT exam on 9/6/2001" is obviously a more solid match than "John Smith, DOB 8/8/80".

How we do it

Migratek performs all matching-based cleanup activities before any of the images are moved. We obtain a Study inventory from the source archive and an exam list furnished by hospital HIS/RIS personnel. A typical Study listing would include the columns:
Patient ID

A typical exam listing might include the columns:

MRN

There is much variation in the names and the meaning of the HIS/RIS data, which we discuss and clarify with the customer for each job. The meaning of DICOM study attributes must also be understood in the context of how they are used in each local setting. Working with the customer, we determine:
We then run the matching rules against the customer’s data and assess the result. We review the count of matches yielded by each rule and visually inspect the matches made. Rules are adjusted and the process is repeated; typically three to five iterations are required. The end result of the matching process is a Stream Processor Instruction File (SPIF), which defines the set of operations to be performed on each DICOM information object when it is migrated.

How much is enough?

Like many endeavors, PACS data cleansing is subject to a Pareto 80-20 rule: most of the benefit can be obtained with modest effort. The exceptions encountered in historical clinical data include many that can be detected by a simple algorithm, a number that require carefully crafted rules, some that yield only to individual investigation that looks at the images, and a few that can be resolved only by a physician. Given that a majority of historical data is never accessed again, discretion is required in deciding how much effort to expend at the time of migration. For most cases requiring individual investigation, and for all cases requiring a physician’s help, it is better to send the exception cases to the destination PACS, perhaps with a flag such as a prefix on the patient ID, and to defer the individual investigation to the time and context in which that exam is sought for a clinical purpose.

Our approach is designed to elicit and make clear the point of diminishing returns. Doing so has always led us to an easy consensus with the customer on the appropriate level of data cleansing effort.
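
As an illustration of the rule-based matching and exception flagging described above, here is a minimal Python sketch. It is not Migratek's actual tooling and does not represent the SPIF format. The field names, the two rule labels, the `UNMATCHED_` prefix, and the policy of accepting a rule only when it yields exactly one candidate are all assumptions made for the example. Rules are tried from strictest to loosest, the per-rule match counts are tallied for review, and studies that no rule can resolve keep their data but receive a prefixed patient ID so they can be investigated later.

```python
from collections import Counter

# Hypothetical Study inventory rows obtained from the source archive.
studies = [
    {"patient_id": "12345", "name": "SMITH^JOHN", "dob": "19800808", "study_date": "20010906"},
    {"patient_id": "A-777", "name": "SMITH^JOHN", "dob": "19800808", "study_date": "20020115"},
    {"patient_id": "99999", "name": "TEST^PHANTOM", "dob": "", "study_date": "20030301"},
]

# Hypothetical exam list rows furnished by HIS/RIS personnel.
exams = [
    {"mrn": "12345", "name": "SMITH^JOHN", "dob": "19800808", "exam_date": "20010906", "accession": "ACC001"},
    {"mrn": "12345", "name": "SMITH^JOHN", "dob": "19800808", "exam_date": "20020115", "accession": "ACC002"},
]

# Each rule: (label, key extracted from a study, key extracted from an exam).
# Ordered from strictest to loosest; both rules are illustrative assumptions.
rules = [
    ("id+date",
     lambda s: (s["patient_id"], s["study_date"]),
     lambda e: (e["mrn"], e["exam_date"])),
    ("name+dob+date",
     lambda s: (s["name"], s["dob"], s["study_date"]),
     lambda e: (e["name"], e["dob"], e["exam_date"])),
]


def match_studies(studies, exams, rules, flag_prefix="UNMATCHED_"):
    """Return (results, counts): one result row per study, plus per-rule tallies."""
    counts = Counter()
    results = []
    for study in studies:
        matched = None
        for label, study_key, exam_key in rules:
            key = study_key(study)
            candidates = [e for e in exams if exam_key(e) == key]
            # Accept a rule only when it is unambiguous for this study.
            if len(candidates) == 1:
                matched = (label, candidates[0])
                break
        if matched:
            label, exam = matched
            counts[label] += 1
            # Import the master identifiers from the matched exam.
            results.append({**study, "patient_id": exam["mrn"],
                            "accession": exam["accession"], "rule": label})
        else:
            counts["unmatched"] += 1
            # Flag the exception with a prefix instead of guessing.
            results.append({**study, "patient_id": flag_prefix + study["patient_id"],
                            "accession": None, "rule": None})
    return results, counts


results, counts = match_studies(studies, exams, rules)
for label, n in counts.items():
    print(label, n)
```

Reviewing the tallies after each run shows which rules are doing the work and how large the residue of unmatched studies is, which is one way to see where further rule refinement stops paying off.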

LAITEK Inc. • Homewood, Illinois USA • (708) 960-4164 Fax: (708) 957-4468