Category Archives: Open Data

2013-01: TMI

Just a quick post for the start of the year.  At the end of 2012, I participated in the National Geographic Genographic Project and signed up for 23andMe.  Both of these programs test your DNA, either to trace your long-term genealogy (National Geographic) or to take part in health-related research that checks whether your DNA contains markers for known illnesses and promotes future research (23andMe).  In conversations with my wife and family (both her side and mine), I found many different perspectives on this topic.  I suspect that there are likely some spiritual issues underlying the different viewpoints.

To me, it’s interesting to know where I came from.  I have a unique mix of backgrounds in my blood (North Africa – a land ravaged by many invaders, Dutch – historically tradesmen and travelers, Native American – rumored & ravaged by Europeans), so it will be quite interesting to see what National Geographic tells me about my ancestry and my path.  At the very least, it may tell me about any Native American blood that I have.

The 23andMe program is interesting in different ways.  About 11 years ago, I had my gall bladder removed.  I was in my late 20s at the time, and had severe stomach cramping issues for the better part of my 20s.  Every diagnosis was tied back to my diet (not good as a student or startup entrepreneur) and the degree of stress in my life (also not good as a design student or startup entrepreneur).  All the doctors suggested that I just mellow out and deal with the stress part.  While that was true, the doctors overlooked the fact (which I didn’t know myself) that both of my parents had their gall bladders removed in their mid-30s.  They never asked me the question, because the probability of gall bladder issues in a 20-something was extremely low.  Interestingly enough, you are more likely to have gall bladder issues if your parents had them, and I was told (by a doctor at a later time) that gall bladder issues in 20-somethings are on the rise (tied to stress, drinking, and diet).  More recently I had a severe case of viral meningitis (the good kind of meningitis – if there is such a thing).  The case was so bad that I was hospitalized three times, and was run through the longest battery of tests imaginable at MGH (a world-class leader in these types of illnesses).  They could never diagnose what it was with any certainty.  Knowing this, I’ve also wanted to know “what else”.  Could I be prone to any of the known diseases?  Are there things that I could do today to help me age peacefully?

In the various conversations that I’ve had since taking the tests (results are still being processed), it dawned on me that not everyone is as open as I am, but also that I am sharing a ton of information.  Generally speaking, some people felt that knowing so much about their own DNA could give them information that they would not know how to process (so what if you find out that you’re likely to get Alzheimer’s?).  Others felt that having this information would not lead to anything concrete in their diet and habits (so what if I’m likely to get a heart attack?  I’ll still eat my burger rare.)  I respect both of those opinions and also recognize that sharing this information with organizations such as the ones I chose creates a potential data privacy risk (could we have a future where employer screening starts with a check of your DNA? – ok that’s paranoid; what if someone breaks into their data warehouse?).

In any case, I feel it’s important to know.  Just so that, well… I KNOW.  It will be interesting to see what my (long-term) ancestry is, and what my disease risk factors are.  I’ll share more as the results come online.

What’s interesting about “Big Data”

Recently there’s been a lot of writing about “Big Data”.  Many organizations and consultancies have begun to speak, write, and offer viewpoints on big data.  To me, big data is nothing more than the marketing of analytics that has been going on in many organizations for years, if not decades.  For example, banking, health care, pharma, insurance, and retail have all leveraged some form of big data analytics to support their business decision making for several decades.  Most of us were pretty unaware of this, but have increasingly become aware as personal information data breaches have revealed how much information these organizations actually collect.

So why the focus now?

Hard to tell really, but I suspect that there are a few factors at play:

  1. Open Data: One of the components of big data is the ability to assemble data from different (disparate) sources together and create new levels of insight.  This would not be possible without open data (i.e. “free as in beer” data).  This often comes from public agencies in the Western hemisphere, but increasingly we’ve seen some companies make some of their data available openly.
  2. Computational power and storage costs:  With Moore’s law in effect, we are at a point where the rate of improvement in computational power, combined with the rate of cost reduction, means that we now have more cheap computational power than ever before, and thus more opportunities to experiment with this power, or run analyses that were impossible before.  At the very least, you can see examples such as what Microsoft has done with Excel and Access as representative of this trend (without commenting on cause & effect).
  3. Broader focus on analytics:  In the last 10 years, we’ve seen a greater focus on analytics across the board (and this may be linked to the above too).

So what’s interesting about Big Data?

To me, the buzzwords are not interesting.  What I find to be the most interesting piece of big data is the open culture that is created and the ability to “mash” otherwise disparate data.  I believe this is creating a lot of junk analyses out there (i.e. falsely implying causality because of coincidence, as opposed to truly linked causality).  However, I do believe that these junk analyses will help augment the dialog about the use of data and analytics.  They will also help those of us who can demonstrate a deeper and more thorough appreciation of analytics and data, grounded in practical and pragmatic meaning, to provide distinctive answers and help drive new idea generation.
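To make the “junk analysis” point a bit more concrete, here is a minimal sketch in Python.  The two series are made-up numbers (they are not real data from any source), and the names `series_a`, `series_b`, `pearson`, and `changes` are just illustrative.  Both series happen to grow over time, so mashing them together gives a strong correlation in the raw levels, even though their period-to-period changes have little to do with each other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(series):
    """Period-to-period differences, which strip out a shared upward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical, hand-made series: each is a straight upward trend plus its
# own unrelated wiggle. Nothing links them except that both grow over time.
series_a = [1, 1, 5, 5, 9, 9, 13, 13]
series_b = [1, 6, 9, 14, 21, 26, 29, 34]

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(f"correlation of raw levels:  {r_levels:.2f}")
print(f"correlation of the changes: {r_changes:.2f}")
```

The raw levels correlate strongly simply because both series trend upward, while the correlation of the changes is much weaker.  Looking at changes is only one simple sanity check, and it is not a test for causality, but it illustrates how easily coincidence can pass for insight when unrelated data sets are combined.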