TIME TO FACE BIG DATA’S BIG CHALLENGES - bcs.org




ENHANCE YOUR IT STRATEGY TWENTY:13 127

Jeff Morris, Vice President of Product Marketing for open source business intelligence specialist Actuate, explains why most discussions on organising big data centre on big data architectures – and not on big data applications.

Wherever you turn, the ‘answer’ to all the CIO’s problems from the IT industry at the moment seems to be ‘big data’. All well and good, but this technology-focused view overlooks the real-world work of choosing the right combination of approaches to maximise your big data opportunities.

Big data’s big hype has overshadowed the real challenges in mastering big data for business advantage. Organisations have dived into detailed questions, but there simply isn’t a ‘one size fits all’ answer when it comes to big data. That’s because its exploitation depends on the specific requirements and goals of your organisation. Getting big data right is about deciding on the analysis you want to perform, then applying the proper techniques to support that analysis – that is, recognising the different types of big data sources and identifying the most appropriate technologies in each case.

It’s really only at that point that the CIO is in a position to analyse, visualise and operationalise that promising big data store and create business value.

Social network profiles and social influencers

Start by exploiting the rich resource represented by the world of social media, namely user profiles from social networking sites, search engines or interest-specific sites. These can be mined for individual profiles and target group demographics. Another potential, and increasingly influential, data source is the contributions of commentators, analysts and subject experts to articles, user forums, blogs and Twitter comments, as well as user feedback from Facebook, catalogue and review sites, complemented by user-review-based sites like Yelp and so on.

In terms of techniques, this centres on application programming interface (API) integration, e.g. identifying customer affinity based on sentiments gleaned from Facebook ‘likes’, positive tweets and Yelp reviews. To be properly processed, these call for an understanding of multiple APIs and data integration tools. You will then have a truly relevant database of individuals with similar interests that can be matched to local business interests by combining them with ratings, geographic locations and reviews. Consider what is now possible with this integrated big data database; you can market to its members, traverse these connections to see who leads back to your company and so on.

Note that heuristics for assessing the content of a tweet or the positivity of a Yelp review are also required. This type of advanced social media mining has to involve natural language processing (NLP) and/or text-based search to assess the evaluative nature of comments and derive usable insights.
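To make the idea of a sentiment heuristic concrete, here is a minimal sketch in Python. It assumes the ‘likes’, tweets and reviews have already been pulled from their respective APIs into simple tuples; the word lists, the source labels and the function names are hypothetical and stand in for a real NLP or text-search component.

```python
from collections import defaultdict

# Hypothetical word lists; a real deployment would rely on a proper NLP model.
POSITIVE = {"love", "great", "excellent", "friendly", "recommend"}
NEGATIVE = {"hate", "awful", "rude", "slow", "avoid"}


def sentiment_score(text):
    """Crude heuristic: positive word count minus negative word count."""
    words = [w.strip(".,!?;:'\"").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def customer_affinity(signals):
    """Sum per-customer scores across sources.

    `signals` holds (customer_id, source, text) tuples. A Facebook 'like'
    carries no text and is assumed to count as +1.
    """
    affinity = defaultdict(int)
    for customer_id, source, text in signals:
        if source == "facebook_like":
            affinity[customer_id] += 1
        else:
            affinity[customer_id] += sentiment_score(text)
    return dict(affinity)


# Illustrative records, as if already fetched from three different APIs.
signals = [
    ("alice", "facebook_like", ""),
    ("alice", "tweet", "Love this place, great coffee!"),
    ("bob", "yelp_review", "Rude staff and slow service. Avoid."),
]
print(customer_affinity(signals))
```

Even this toy version shows why the integration work matters: each source needs its own rule before the scores can be combined per customer.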

The point is there’s some real work needed here, way beyond some hand-waving about Hadoop being the ready answer.

Activity-generated data

Your next big source of big data is activity-generated data from computer and mobile logs and, increasingly, data generated by processors in vehicles, video games and other home electronics. Experts say that soon we’ll need to add in links from all manner of everyday appliances, as the so-called ‘Internet of Things’ kicks in. In this context, parsing technologies may help make sense of these semi-structured text files and documents. Log parsers are a popular first candidate for Hadoop deployments here. You will also generate lots of files over which it is relatively easy to write MapReduce functions.
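As a rough illustration of why log files suit MapReduce-style processing, the sketch below parses semi-structured log lines and counts events per device and severity. It runs in plain Python rather than on Hadoop, and the log layout, device names and function names are assumptions made purely for the example.

```python
import re
from collections import Counter
from functools import reduce

# Assumed log layout: "<timestamp> <device id> <LEVEL> <free-text message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) (?P<device>\S+) (?P<level>[A-Z]+) (?P<msg>.*)$")


def map_line(line):
    """Map step: emit a count of 1 keyed by (device, level); skip bad lines."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return Counter()
    return Counter({(match.group("device"), match.group("level")): 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


def count_events(lines):
    """Run the map step over every line, then fold the results together."""
    return reduce(reduce_counts, map(map_line, lines), Counter())


sample = [
    "2013-03-01T12:03:00Z phone-17 ERROR battery sensor timeout",
    "2013-03-01T12:03:05Z phone-17 INFO sync complete",
    "2013-03-01T12:04:10Z car-02 ERROR tyre pressure low",
    "this line is malformed",
    "2013-03-01T12:05:00Z phone-17 ERROR battery sensor timeout",
]
for key, count in sorted(count_events(sample).items()):
    print(key, count)
```

Because each line is mapped independently and the partial counts merge in any order, the same two functions could in principle be distributed across a cluster.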

SaaS and cloud applications

Meanwhile, cloud-based data from SaaS apps, such as salesforce.com and its brethren, will help, but will call for specialised distributed data integration technology, in-memory caching and API integration in order to bring it online.

Public data, private gain?

There is also a wealth of publicly available data – think Microsoft DataMarket, Wikipedia and so on. All of these potentially valuable resources require the same types of text-based search, distributed data integration and parsing technologies we’ve discussed, but they add a new dimension of complexity: network bandwidth, and the bottleneck that arises when moving large amounts of data across firewalls.

Why your yesterday will improve your tomorrow

Finally, there are all those electronic filing cabinets that are bursting at the seams with original, print-format-only documents. Let’s make them contribute to their upkeep and fold them into your big data endeavours. Parsing and transforming this legacy content for your big data analysis purposes can be done using specialist document management tools.

Hadoop MapReduce

So, now where will you put all the big data? This is usually where most big data discussions begin – around storage and analysis architectures.

Storage architectures such as next-generation Hadoop and MapReduce-style tools for handling and parallel parsing of data from logs, web posts etc. promise to create truly new forms of useful data. Columnar or NoSQL databases or even a Hadoop cluster, where Pig is used to gather and integrate the data prior to its retention within HBase, could also be your friend here.

Columnar/NoSQL data sources

Another alternative is to use a NoSQL data store like VoltDB or Cassandra. What’s nice about such tools is their ability to absorb new transactions quickly and process queries in real time so as to fill gaps in big data environments. If those products do not appeal, then data can be collected in traditional big warehouses, e.g. Netezza, Teradata’s Aster or Oracle’s Exadata appliances.

Having dealt with the data integration and data mining aspects of all this, what about the analysis? Once the data is prepared, correlative and predictive analytic exercises can take place.

Big data style analysis is a multi-step process that includes setting analytic goals, then performing and refining the analytic formulas on sample data. There are a host of proven products that can help at this juncture, like Pervasive Software, KXEN, Quiterian, FuzzyLogix and Revolution.
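The ‘set a goal, then refine a formula on sample data’ loop can be sketched in a few lines of Python. This is only an illustration and not how any of the products named above work: the goal (predicting spend from visit counts), the sample figures and the function names are all invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Correlative step: Pearson correlation between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Predictive step: least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x


# Invented sample: store visits per month against spend (in arbitrary units).
visits = [1, 2, 3, 4, 5]
spend = [2, 4, 5, 4, 5]

r = pearson(visits, spend)
slope, intercept = fit_line(visits, spend)
print(f"correlation={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
print(f"predicted spend at 6 visits: {slope * 6 + intercept:.2f}")
```

In practice the formula would be re-fitted and checked against fresh samples before being run over the full data store.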

Here’s the rub. What comes out is typically a list of relevant targets. Real meaning can be found once this list is combined, enhanced and integrated within traditional BI packages that have access to organisational data sources, such as your CRM or financial package.
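Combining the target list with organisational data is, at its simplest, a join on a shared customer key. The Python sketch below assumes both sides are already available as lists of dictionaries; the field names (`customer_id`, `score`, `region`, `lifetime_value`) and the ranking rule are hypothetical.

```python
def enrich_targets(targets, crm_records):
    """Join analytic targets to CRM records and rank by expected value.

    Targets with no matching CRM record are dropped, on the assumption
    that they cannot be acted on without organisational context.
    """
    crm_by_id = {record["customer_id"]: record for record in crm_records}
    enriched = []
    for target in targets:
        record = crm_by_id.get(target["customer_id"])
        if record is None:
            continue
        enriched.append({
            **target,
            "region": record["region"],
            "lifetime_value": record["lifetime_value"],
        })
    # Hypothetical ranking rule: analytic score weighted by lifetime value.
    return sorted(
        enriched,
        key=lambda row: row["score"] * row["lifetime_value"],
        reverse=True,
    )


# Invented example data.
targets = [
    {"customer_id": "c1", "score": 0.9},
    {"customer_id": "c2", "score": 0.4},
    {"customer_id": "c9", "score": 0.8},  # not present in the CRM
]
crm_records = [
    {"customer_id": "c1", "region": "North", "lifetime_value": 1200},
    {"customer_id": "c2", "region": "South", "lifetime_value": 5000},
]
for row in enrich_targets(targets, crm_records):
    print(row)
```

Notice that the ranking changes once CRM data is added: the target with the lower analytic score comes out on top because of its higher lifetime value.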

Do all this heavy lifting and the resulting dataset will represent a previously hidden, solid-gold business opportunity. Truly valuable big data.

Making big data ‘meaningful’

That big data ‘meaning’ we’re talking about is often best derived from visualisations, such as time series charts or dynamic cross-tabs. Familiar BI products that can both integrate disparate sources and define highly visual content, such as dashboards and reports, are ideal. Even better, a platform that both scales and is able to transform content from print to web to mobile interfaces is the real deal here.

Making insights deliver

See how much actual work is conveniently brushed under the carpet in all too many big data presentations? The reality is that big data’s real value will only be realised when it becomes part of the normal BI fabric of the organisation. Insights are only valuable if they are shared and ultimately acted upon. Getting past the big data big hype and working to operationalise big data, making it an everyday reality in actual business, is an effort worth making.

Just imagine life when big data becomes a reality. Businesses will be able to predict their customers’ desires and offer products or services they really need. Your car, phone, household appliances and office work station will be able to talk you through your day and make intelligent recommendations. All that data will finally start paying its rent and making your business better and more profitable as a result.

Forward-thinking CIOs and business leaders should welcome the big data opportunity to move the discussion away from tools to real business benefits. That’s a result that can only be achieved via a strategy that takes your business through the organising, visualising and operationalising stages we’ve discussed, to get you to where you really need to be with big data.

CLOUD COMPUTING & DATA MANAGEMENT

Morris.indd 128 01/03/2013 12:03