A while back I set out to create a tool that would enable my doctors to quickly understand what had become a very complicated medical history. What I discovered is that this data is good for a variety of purposes, and one of them is creating teaching aids for health professionals.
Long story short: I have been tracking all of my health markers (my lab results) in a tool I built called MetaMedic (https://metamedic.jellobrain.com). In addition to learning about my personal health by tracking different biological markers over time, I am also able to track the way those markers inter-relate.
This is where things get interesting in a truly META and universal kind of way.
Briefly: the process
The first step is getting the data out of the database. To make this an easily repeatable process, I store the query that I perfected as a view, which then enables me to just export the view as a CSV in perpetuity.
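That export step might look something like the sketch below in Python; the connection string, view name, and output filename are illustrative placeholders, not MetaMedic's actual schema.
import pandas as pd
from sqlalchemy import create_engine
# hypothetical connection details and view name, for illustration only
engine = create_engine('mysql+pymysql://user:password@localhost/metamedic')
# pull the whole view into a DataFrame and dump it to a CSV file
long_results = pd.read_sql('SELECT * FROM lab_results_view', engine)
long_results.to_csv('/home/eleven/Python/cbc_cmp_results_long.csv', index=False)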
The database stores both the original test value (each test has its own number range) and a 'normalized value', which treats every healthy range as if it ran from 0 (the low end of the healthy range for the test) to 1 (the high end of the healthy range for the test), and plots the specific value of the test within those boundaries. This makes the tests easier to compare to one another.
The value that we take out of the database is that normalized value.
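Concretely, that normalization is just min-max scaling against the healthy range; the function below is my sketch of it, not MetaMedic's actual code.
# map a test value onto the 0-1 scale defined by its healthy range
def normalize(value, range_low, range_high):
    # 0.0 = low end of the healthy range, 1.0 = high end
    return (value - range_low) / (range_high - range_low)
# e.g. a glucose reading of 92 mg/dL against a healthy range of 70-99:
normalize(92, 70, 99)  # ~0.76, comfortably inside the range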
Once that CSV gets uploaded, there is some data clean-up to do.
The first thing we need to do is 'pivot' the data, which comes out of the database as a long list (637 rows) with 3 columns (date, test, value) and ends up as a shorter list (19 rows) with 41 columns: one for the date, and one for each data marker or lab test.
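In pandas, that pivot might look like this (the long-format filename is a placeholder for whatever the view export is called):
import pandas as pd
# long format straight from the database export: one row per (date, test, value)
long_df = pd.read_csv('/home/eleven/Python/cbc_cmp_results_long.csv')
# wide format: one row per date, one column per lab test
wide_df = long_df.pivot(index='date', columns='test', values='value')
# keep date as an ordinary column, matching the 41-column layout described above
wide_df = wide_df.reset_index()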
In order to get the correlations to load correctly, I delete the date column, because it is not necessary information for this data set. I also delete columns that do not have enough data in them to be meaningful.
So we are left with 31 columns, one for each data marker.
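I did that clean-up by hand, but the same two steps in pandas might look like this (the minimum-count threshold is an illustrative choice, not a rule):
# the date column is not needed for correlations
df = wide_df.drop(columns=['date'])
# keep only the test columns with enough observations to be meaningful
# (requiring at least 10 non-empty values is an illustrative threshold)
df = df.dropna(axis=1, thresh=10)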
Once the data is pivoted in this way, we can start to see where there are gaps in information.
If there is enough data to get a good average, what I will sometimes do is take the average of the column and then place that average into any empty rows in that column. It skews the data and the correlations slightly, but it can get you out of a bind if you just need to plug a few holes.
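In pandas, that mean imputation is a one-liner:
# fill each remaining gap with that column's own average
df = df.fillna(df.mean())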
Once the data has been cleaned up in this way, it is ready to rumble.
Correlations
The data not only tracks my individual markers over time (the initial intent behind this tool's creation); it also tracks the ways these markers change together over time, producing what is called a correlation matrix, which shows how the different markers relate to one another.
This brings us from data that relates to me individually, to data that is similar for everyone. Let me give you a couple of examples generated by different data visualization platforms.
Using R to create a PDF:
To achieve these results using R, this is what I did:
# import the libraries
library(ggplot2)
library(corrplot)
# read the cleaned-up CSV I made from a MySQL view I created
data <- read.csv('/home/eleven/Python/cbc_cmp_results_pivot_aug.csv')
# correlate the data with itself and store the result as a matrix
datacorr <- cor(data)
# call into being the PDF file (pdf() writes PDF, so the file needs a .pdf extension)
pdf(file = "/home/eleven/Python/cbc_cmp_results_pivot_aug.pdf")
# run corrplot against the correlation matrix and format it a bit
corrplot(datacorr, method="circle", shade.col=NA, tl.col="black", tl.srt=90, tl.cex=.7)
# close the device, which actually writes the pretty picture to disk
dev.off()
Using Python and related libraries to create a JPG:
To achieve these results in Python with the same data set, this is what I did:
# import the libraries I need
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams
# read the data from the results file I created
data = pd.read_csv('/home/eleven/Python/cbc_cmp_results_pivot_aug.csv', header = 0, sep = ',')
# read_csv already returns a DataFrame; this line just makes that explicit
df = pd.DataFrame(data=data)
# create the correlation matrix
corrmat = df.corr()
# make the pretty picture BIG
rcParams['figure.figsize'] = 16, 19
# now draw the heat map from the correlation matrix
# (assigning to ax rather than sns, so the seaborn module name is not overwritten)
ax = sns.heatmap(corrmat, cmap="RdYlBu", linewidths=0.1, annot=True, annot_kws={"fontsize":5})
# and now plot and save it to a file
plt.savefig("/home/eleven/Python/medicalTests/cbc_cmp_Plotting_Correlation_HeatMap.jpg")
The (data)Point
The point is that here we are not seeing personal information about me, but information about how different body markers relate to one another. Does one get bigger as the others get bigger, and to what degree? Do some get smaller as others get bigger, and to what degree? This is the kind of information that CORRELATION maps and plots can give you. (Both cor() in R and df.corr() in pandas default to the Pearson correlation, which runs from -1, moving in perfect opposition, through 0, unrelated, to +1, moving in perfect lockstep.)
My hope is that eventually the medical field is able to demonstrate how these correlations shift across different medical populations. Are they different for people with a history of cancer, for example? Or diabetes? Knowing these kinds of differences could give us genuinely interesting information about the ways the body changes while it is fighting certain disease processes and healing from them.
In any case: here's to science and progress.