As hospitals and insurance companies know, a little prevention is worth a lot of cure. Getting patients to stay healthy by following their prescribed regimens (including medication) is key to keeping them out of the hospital. This is especially true in the case of diabetes, which affects nearly one in 10 adults (nine percent) according to the World Health Organization (WHO). By identifying at-risk patients early and ensuring they stick to a prediabetes regimen, healthcare organizations can delay the onset of diabetes by as much as 10 years.
The availability of electronic medical records and the thorough e-documentation of patient-doctor interactions now make it possible for healthcare companies to analyse more data to improve outcomes. Smart organizations are applying mathematics, statistics, data mining, simulation, and predictive and financial modelling to make sense of huge amounts of data.
Predictive analytics that can effectively identify patients at high risk of developing diabetes or other chronic diseases can be a boon to the medical industry, leading to better treatments and more timely interventions. It can bring down costs as well, by reducing hospital readmissions and enabling better management of healthcare resources. Beyond hospitals, health insurance companies and pharmaceutical companies also have a vested interest in healthier patients. Treating diabetes, for example, can cost $200,000 over the lifetime of a patient, far more than the cost of preventive medicine.
When calculating the risk associated with patients, a host of factors come into play: family history, demographics, socio-economic profile, lifestyle and others. For instance, factors such as age, blood pressure, blood sugar, cholesterol levels and family history of chronic disease are studied with the help of statistical models that can estimate the progression of disease and even foresee outcomes. It's often difficult, however, to find all of this data in one location, so data scientists have to collate information from a variety of sources, including sales data, credit history and tax history. Comorbid conditions can also provide key insights into both positive and false-positive identification of disease. A patient who has a cardiac issue, for example, is more likely to have high cholesterol and hypertension as well; understanding this can have a strong bearing on treatment decisions.
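To make this concrete, here is a minimal sketch of how such a risk model might score a patient. It uses a logistic model with entirely hypothetical, hand-chosen coefficients; a real model would be fitted on large patient datasets and would use many more factors than these.

```python
import math

# Hypothetical coefficients for illustration only -- real models are
# fitted on large patient datasets, not chosen by hand.
COEFFS = {
    "age": 0.04,              # per year
    "bmi": 0.09,              # per kg/m^2
    "fasting_glucose": 0.03,  # per mg/dL
    "family_history": 0.7,    # 1 if a parent or sibling has diabetes
}
INTERCEPT = -9.5

def diabetes_risk(patient):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum of coef * factor)))."""
    z = INTERCEPT + sum(COEFFS[k] * patient[k] for k in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))

patient = {"age": 55, "bmi": 31, "fasting_glucose": 110, "family_history": 1}
print(f"Estimated risk: {diabetes_risk(patient):.1%}")
```

The output is a probability between 0 and 1, which a healthcare organization could threshold to flag patients for early intervention.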
The real challenge is the sheer size of the data involved. An insurance company, for example, may have tens of millions of patients, billions of claims and 50,000-plus medical codes to process and analyse. These kinds of "big data" analyses often prove problematic with conventional analytics techniques, taking days or weeks to complete. It's not unusual for insurers to have a dynamic customer base as employees move in and out of their system, and a 25-day turnaround for analysis is ineffective when insurers need to re-analyse their patient pool every 30 days.
The problem with conventional analytics is threefold. First, the I/O time it takes to move data into an in-memory appliance is significant; a handful of terabytes can take an hour or more to move. Second, the data can no longer fit into memory in one piece, which forces data scientists to sample it, leading to a cruder model. Finally, the conventional approach creates several copies of the data, and when the data is changing fast, the analysis may run on outdated copies.
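A back-of-envelope calculation shows where the "hour or more" of I/O comes from. The 5 TB dataset size and 1 GB/s effective transfer rate below are assumptions for illustration; real pipes and datasets vary widely.

```python
# Back-of-envelope estimate of data-movement time (illustrative numbers).
terabytes = 5                   # "a handful of terabytes" -- an assumption
bytes_total = terabytes * 10**12
throughput_bps = 1.0 * 10**9    # ~1 GB/s effective pipe -- an assumption

seconds = bytes_total / throughput_bps
print(f"{terabytes} TB at 1 GB/s takes ~{seconds / 3600:.1f} hours")
```

Even with a fast network and storage layer, simply relocating the data dominates before any analysis begins.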
To counter these shortcomings, data analytics solutions have emerged that eliminate the need to move the data out of the database for analysis. Instead, these in-database analytics tools move the analytics into the database. They also leverage massively parallel map-reduce processing to evaluate complex, CPU-intensive mathematical functions where the data lives.
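The map-reduce pattern behind this can be sketched in a few lines. In the sketch below, each list stands in for a database partition: every partition computes a small partial aggregate locally (the map step), and the partials are combined with an associative function (the reduce step), so no raw rows ever leave their node. The glucose readings are made-up sample values.

```python
from functools import reduce

def map_partial(partition):
    """Per-partition sums needed for a mean and variance (runs where the data lives)."""
    n = len(partition)
    s = sum(partition)
    ss = sum(x * x for x in partition)
    return (n, s, ss)

def reduce_partials(a, b):
    """Combine two partial aggregates; associative, so combine order doesn't matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

# Each inner list stands in for a data partition on a separate node.
partitions = [[98, 110, 125], [140, 105], [99, 160, 130]]  # e.g. glucose readings
n, s, ss = reduce(reduce_partials, map(map_partial, partitions))
mean = s / n
variance = ss / n - mean ** 2
print(f"mean={mean:.1f}, variance={variance:.1f}")
```

Because only tiny partial results cross the network, the expensive arithmetic parallelizes across however many nodes hold the data.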
By using in-database analytics, insurance companies can now analyse 100 billion rows of data in 27 hours instead of 22 days. More importantly, they can identify at-risk patients sooner and keep them healthier longer.
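The speedup implied by those figures is easy to work out from the article's own numbers:

```python
# Speedup implied by the article: 22 days down to 27 hours for 100 billion rows.
before_hours = 22 * 24
after_hours = 27
speedup = before_hours / after_hours
rows = 100 * 10**9
print(f"~{speedup:.0f}x faster; ~{rows / after_hours / 10**9:.1f} billion rows/hour")
```

That turns a monthly re-analysis from impossible into routine.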
Guest Author
Partha Sen is co-founder and CEO of Fuzzy Logix, a Big Data analytics firm