The tech world is consistent: we love our buzzwords. Often we forget the actual meaning and use behind the shiny new acronym or term. Right now one of the hottest topics across Silicon Valley and Silicon Beach is Big Data, and we are hiring data scientists as fast as we can. But what are we actually trying to achieve? True, we spent almost a decade aggregating data: web logs, activity logs, and we might actually have a fair amount of content across our SQL databases and archives. Heck, that’s what Amazon S3 is for, right? To store our stuff.
The algorithm will do it
True, there is a chance at low-hanging fruit here. But it depends on what your data looks like. If your first priority is to take some of your structured data, let’s say your web logs or activity logs, then you have two of the important pieces already: data and structure. You can let statistics loose against that, and chances are you’ll learn something. However, if your questions go deeper, you might need more than an algorithm. While blind data-crunching against vast datasets can help you glean something you couldn’t see before, chances are that your questions are more complex than that.
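To make "letting statistics loose" on structured logs a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the sample records, the field names (`path`, `status`, `ms`) and the `summarize` helper are hypothetical, not taken from any real system.

```python
from collections import Counter
from statistics import mean

# Hypothetical, already-structured web log records (field names are assumptions).
LOGS = [
    {"path": "/home", "status": 200, "ms": 120},
    {"path": "/pricing", "status": 200, "ms": 340},
    {"path": "/home", "status": 500, "ms": 900},
    {"path": "/home", "status": 200, "ms": 110},
    {"path": "/signup", "status": 200, "ms": 260},
]


def summarize(logs):
    """Return hit counts per path, the share of 5xx errors, and mean latency per path."""
    hits = Counter(record["path"] for record in logs)
    error_rate = sum(record["status"] >= 500 for record in logs) / len(logs)
    latency = {
        path: mean(r["ms"] for r in logs if r["path"] == path)
        for path in hits
    }
    return hits, error_rate, latency


hits, error_rate, latency = summarize(LOGS)
print(hits.most_common(1), error_rate, latency)
```

Simple counts and averages like these are the low-hanging fruit: they only work because the logs already carry structure, and they say nothing about why a page is slow or what a visitor was trying to do.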
Sprinkle in some knowledge
Here it gets a little tricky to stay with CEO speak, but bear with me. There are really two pieces to the information tucked away in your stores. There’s the raw data: all of the columns that look, to the normal person, like the green code crawling down the screen in the Matrix. And, very importantly, there’s meta-data. Meta-data is data describing the data; this is where the structure lives. And this is where it gets exciting. This is also where most organizations have taken shortcuts and cut corners. I often find organizations that have done a great job making sure their data is tucked away in some reliable fashion, but getting it back out was never high on the priority list. Thus, the data describing the data and giving it structure is sparse at best. This becomes a problem when you try to pull information out to answer an actual business question. Meta-data management starts with good database architecture, but ranges all the way into knowledge bases. A knowledge base contains nuggets of domain expertise and knowledge about your business: its information, its market, and so on. Bringing knowledge into the data mix opens entirely new doors.
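For the more hands-on reader, the difference between raw data, meta-data and a knowledge base can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cryptic column names, the labels and units, the knowledge-base entry and the `annotate` helper are all hypothetical.

```python
# Raw data: cryptic column names, as they might sit in an archive (hypothetical).
RAW_ROW = {"c1": "2013-04-02", "c2": 1999, "c3": "US-CA"}

# Meta-data: data describing the data (labels and units are assumptions).
METADATA = {
    "c1": {"label": "order_date", "unit": "ISO date"},
    "c2": {"label": "order_total", "unit": "cents"},
    "c3": {"label": "region", "unit": "ISO 3166-2 code"},
}

# Knowledge base: a nugget of domain knowledge about the business (hypothetical).
KNOWLEDGE = {"US-CA": "West Coast sales territory"}


def annotate(row, metadata, knowledge):
    """Attach a readable label, a unit, and any known business note to each raw value."""
    annotated = {}
    for column, value in row.items():
        # Columns without meta-data keep their cryptic name and an unknown unit.
        meta = metadata.get(column, {"label": column, "unit": "unknown"})
        entry = {"value": value, "unit": meta["unit"]}
        if value in knowledge:
            entry["note"] = knowledge[value]
        annotated[meta["label"]] = entry
    return annotated


result = annotate(RAW_ROW, METADATA, KNOWLEDGE)
print(result["region"])
```

Without the `METADATA` and `KNOWLEDGE` dictionaries, the row is just three opaque values; with them, a business question such as "how much did the West Coast territory order?" at least has something to hold on to.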
The ignition point, deriving meaning and information -> Magic!
This is where some magic lives, and it really wasn’t feasible before. Combining machine learning (the bottom-up approach of letting the algorithms loose) with the semantic tech that has been cooking in academia (specifically bioinformatics) for the last two decades is starting to allow some serious magic. From recommendation engines to contextual ad matching, this tech can really rocket-propel your organization.