A Window into Data Transparency

[Figure: full figure]

Here is my own experiment in data transparency in data visualization. The figure comes from a paper of mine on a disorder called familial dysautonomia (FD in the figure). The paper compared physiological variables from 25 children with the disorder to 25 age-, gender-, and race-matched controls (CN), based on in-home EKG and respiration recordings. The figure is rather dense, so I’ll break it down. Recordings were made during the day (left) and at night (right) for the two groups of subjects (upper and lower blocks).

[Figure: detail of one block]

The multicolored blocks represent the heart rates of every child in the study over the entire time each was recorded (abscissa). For the day studies, each child had two recordings of two hours each, shown as two color bands within each block. There was only one night study for each child, so in the night blocks each band is twice as tall, which keeps each subject’s two daytime bands aligned with his/her nighttime band. In all four blocks, the youngest subjects are at the top of the block and the oldest at the bottom. Missing or artifact data epochs are shown in white. The colors indicate each child’s heart rate coded as a percentile of his/her matched control (so the control children are, in effect, normalized to themselves).
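The row layout described above can be sketched in Python with NumPy. This is a hypothetical reconstruction, not the code used for the paper: it assumes each recording is already a 1-D array of values on a common time grid, with `np.nan` marking missing or artifact epochs, and the names `build_block` and `tall_factor` are made up for illustration. Giving the single night recording two rows (versus one row per daytime recording) is what keeps every subject’s night band aligned with his/her two day bands.

```python
import numpy as np

def build_block(recordings, ages, tall_factor=1):
    """Stack per-subject recordings into the rows of one heat-map block.

    recordings:  dict, subject id -> list of 1-D arrays (one per recording);
                 np.nan marks missing or artifact epochs (drawn in white).
    ages:        dict, subject id -> age; the youngest subject goes on top.
    tall_factor: hypothetical parameter, the number of image rows each
                 recording occupies (1 for each of the two daytime
                 recordings, 2 for the single night recording).
    """
    order = sorted(recordings, key=lambda s: ages[s])
    width = max(len(rec) for s in order for rec in recordings[s])
    rows = []
    for s in order:
        for rec in recordings[s]:
            # Pad shorter recordings with NaN so the gap also renders as blank.
            row = np.full(width, np.nan)
            row[:len(rec)] = rec
            rows.extend([row] * tall_factor)
    return np.vstack(rows), order

# Toy example: two subjects, two daytime recordings and one night recording each.
ages = {"s1": 10, "s2": 5}
day = {"s1": [np.array([1.0, 2.0]), np.array([3.0])],
       "s2": [np.array([4.0, 5.0]), np.array([6.0, 7.0])]}
night = {"s1": [np.array([8.0, 9.0])], "s2": [np.array([10.0])]}

day_block, day_order = build_block(day, ages, tall_factor=1)
night_block, night_order = build_block(night, ages, tall_factor=2)
print(day_order, day_block.shape, night_block.shape)  # prints ['s2', 's1'] (4, 2) (4, 2)
```

To draw a block, one option is matplotlib’s `imshow` with a colormap whose `set_bad` color is white, so NaN cells show up as blank epochs.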

[Figure: breakout detail]

Leaving aside the clinical interpretation of the data for this particular pathology, my goal with this figure was to be as forthcoming as possible in showing all the data, warts and all. This required some compromises. Normalizing to control values was necessary because absolute heart rates differed too much across subjects to be represented on a single color scale. As a result, absolute heart rate is not shown, and rates above or below the control distribution saturate at the ends of the color map. Also, the full temporal resolution of heart rate changes can’t be represented at this graphic resolution, so information about high-frequency heart rate variability is lost in this figure.
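The two compromises above, percentile coding against a matched control (which saturates outside the control’s range) and the loss of fast variability at figure resolution, can be illustrated with a short NumPy sketch. It is not the paper’s actual processing: `to_control_percentile` and `downsample` are hypothetical names, the percentile is assumed here to be the share of the control’s samples at or below each value, and simple bin-averaging stands in for whatever step reduced the time resolution.

```python
import numpy as np

def to_control_percentile(hr, control_hr):
    """Map each heart-rate value to a percentile (0-100) of the matched
    control's heart-rate distribution. Anything below the control minimum
    becomes 0 and anything above the control maximum becomes 100, which is
    why extreme rates saturate at the ends of the color map."""
    ref = np.sort(control_hr[~np.isnan(control_hr)])
    pct = 100.0 * np.searchsorted(ref, hr, side="right") / len(ref)
    # Keep missing/artifact epochs missing instead of assigning a percentile.
    return np.where(np.isnan(hr), np.nan, pct)

def downsample(x, n_pixels):
    """Average consecutive samples into n_pixels bins, ignoring NaNs.
    Beat-to-beat (high-frequency) variability inside a bin is lost."""
    edges = np.linspace(0, len(x), n_pixels + 1).astype(int)
    out = np.full(n_pixels, np.nan)
    for i in range(n_pixels):
        seg = x[edges[i]:edges[i + 1]]
        seg = seg[~np.isnan(seg)]
        if seg.size:  # an all-missing bin stays NaN (white)
            out[i] = seg.mean()
    return out

control = np.array([60.0, 70.0, 80.0, 90.0])
patient = np.array([50.0, 75.0, 95.0, np.nan])
pct = to_control_percentile(patient, control)
ds = downsample(np.array([1.0, 3.0, 5.0, np.nan, 7.0, 9.0]), 3)
print(pct)
print(ds)
```

Whether to downsample before or after the percentile conversion is a design choice the post does not specify; averaging raw rates and then converting is not, in general, the same as averaging percentiles.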

I haven’t seen this kind of presentation of data in many physiology papers, though similar figures are common in omics research. My goal was to show as much of the heart rate data in as much detail as possible, including covariates that were not specifically addressed in the results (like age). Aside from the general incentive for transparency, which is meant to let readers assess my conclusions, I hoped that this format would let readers test their own hypotheses against a large dataset. Of course, this openness also exposes the data to a level of scrutiny that summary statistics would avoid. Readers can see exactly how much data had to be thrown out, for better or worse.

I’m curious to know what people think of this approach. Is the goal of transparency reasonable, or is this data-dense figure more of a distraction?


~ by nucamb on February 1, 2013.

5 Responses to “A Window into Data Transparency”

  1. I really think whether the figure is reasonable, or too data-dense and a distraction, will depend on the reader. Experienced scientists used to looking at heat maps may readily follow the figure. So I guess what you should ask yourself when deciding whether to include it is whether the target audience is proficient in reading such heat maps. If not, I would suggest placing it in the supplemental data, because the figure is extremely data-dense.

    • Yes. I do think those are the right questions to ask yourself about any figure. My more general question was whether trying to include representations of more of the data is worthwhile from a data transparency standpoint or a scientific communication standpoint. If I remember correctly, Pediatric Pulmonology doesn’t host supplements, and it seems like some journals are backing away from that model (though for the profits they are making they could host all our data indefinitely). Do you think we physiologists should do what they do in bioinformatics, and upload all our raw data to integrated databases for everyone to use?

  2. That is a beautiful figure. You should check out this book: The Visual Display of Quantitative Information by Tufte. It has some nice ideas about how to include more data in a figure without adding confusion. We have a few of his books (can’t say I’ve read them all).

    • Thanks. I’ve looked at Tufte’s work a bit. He seems to be the great guru of data visualization, but what little I’ve read hasn’t been so compelling that I had to buy his books. I also have to admit that the fact that he promotes his work through paid seminars puts me off a bit. I realize that’s a bit unfair of me, since we all have to make a living, and perhaps that’s the best way for him to reach his particular audience (MBAs?).

  3. […] baseball perhaps, but in my other blog, I go into some detail about a figure I made for one of my papers. In the figure, I’m trying […]
