Making sense of Big Data
Data science and analysis are key skills in demand
By Kang Ye-won
Big Data has been one of the top buzzwords this year. Yet, except for a handful of global platform providers such as Google, Facebook, Apple and Amazon, most companies haven’t quite figured out how to utilize the massive amounts of data streaming in on a daily basis.
Almost 73 percent of companies have increased the amount of data they collect, but 53 percent of them said they use only half of it, according to SAS Institute, a business-to-business (B2B) software provider.
So what is at the core of the Big Data fanfare?
What matters is the technology to deal with Big Data. In the smart era, almost anything in our daily lives can be recorded as data: what we chat about and text on our cellphones, where we eat lunch, what transportation we take on our commutes and what we do while on the move.
If smart technologies such as smartphones play the role of our senses, perceiving information from the slew of data, then Big Data technology is like our brain, making sense of that information.
Insiders say Big Data has the characteristics of three V’s: volume, velocity and variety.
The skills in demand are the abilities to access and extract key information from the data deluge, analyze and update real-time streams of data, and visualize the outcomes for better understanding and application.
There is a global shortage of so-called “data scientists,” and therein lie future opportunities, experts say.
Defining Big Data
Although there’s no specific threshold defining Big Data, the size of data has grown from hundreds of gigabytes, about a laptop’s storage capacity, to terabytes — thousands of gigabytes — and on to petabytes and even zettabytes.
To put the numbers in perspective, the log data of Facebook’s 845 million users amounts to more than 25 terabytes every day. Twitter generates more than 12 terabytes of tweet data each day, according to Lee Ji-eun, a manager at IBM Korea.
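The scale of those units can be made concrete with a quick back-of-the-envelope calculation. The sketch below uses the figures reported above and the binary convention in which each unit is 1,024 of the previous one:

```python
# Scale of Big Data units, using the article's reported figures.
GB = 1024 ** 3      # gigabyte: a few hundred fit on a laptop
TB = 1024 * GB      # terabyte  = 1,024 gigabytes
PB = 1024 * TB      # petabyte  = 1,024 terabytes

facebook_daily = 25 * TB   # Facebook log data per day (article's figure)
twitter_daily = 12 * TB    # Twitter tweet data per day (article's figure)

# At 25 terabytes a day, Facebook's logs alone pass a full petabyte
# in roughly 41 days.
days_to_petabyte = PB / facebook_daily
print(round(days_to_petabyte, 1))  # 41.0
```

In other words, a single large platform can accumulate in weeks what would have counted as an enormous corporate archive only a few years earlier.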
Also, much of today’s data is not structured as in Excel files but unstructured, in forms ranging from 140-character tweets to mobile photos to YouTube videos.
Despite the buzz, Big Data is not a new concept, at least in finance, said Jim Nelms, the chief information security officer (CISO) at the World Bank, during a Finance IT World conference held by IDG in Seoul.
Financial institutions such as banks and large retailers have churned out volumes of data on transactions and customer profiles.
“The biggest paradigm shift is that it’s fast and accessible to many more people,” Nelms said.
“The question is how we capture the capabilities we have and harness them for profits,” he said.
Although no one has quite figured out how to fully monetize Big Data, the possibilities are infinite.
For instance, the potential value of data in the U.S. healthcare sector was estimated at $300 billion a year, two-thirds of which would come from reducing national healthcare expenditure by about 8 percent, according to a May 2011 report by the McKinsey Global Institute.
In retail, net margins were expected to jump by up to 60 percent, and manufacturers could cut nearly half of their product development and assembly costs.
In technology, the potential value shoots up exponentially. Personal location data alone was projected to lead to more than $100 billion in revenue for service providers and up to $700 billion for end users.
The biggest utilizers of Big Data have been the B2B software providers such as IBM, Oracle and EMC.
IBM’s Watson, a supercomputer that won Jeopardy! last year, is a good example of Big Data technology at work. It stored 200 million pages of text in its memory and ran advanced search and analysis programs, enabling it to go from data digging to answering within three seconds and ultimately beat two former champions.
These large solution providers have been aggressively buying smaller firms specialized in Big Data. This April, IBM acquired Vivisimo, a Pittsburgh-based firm that helps organizations access and analyze Big Data.
Another B2B solution provider is MicroStrategy, a Washington D.C.-based company that focuses on mobile intelligence.
“As our competitors have been acquired by bigger players, (MicroStrategy) is one of a few left in the field,” said Chung Kyung-whu, a senior sales engineer at MicroStrategy in Korea.
Following the social media trend, MicroStrategy recently invested in analyzing Facebook data collected from mobile and tablet users. For clients including clothing brand Guess and professional sports teams such as FC Barcelona and the Washington Capitals, the firm runs apps that collect Facebook fans’ personal information: gender, age, marital status and geographic location, as well as income and education level, political stance, and likes and dislikes.
But the company hasn’t been able to connect the dots on how to monetize the data and analysis — a challenge shared by other data analytics firms.
“It’s been about a year since we started to dig into social media data, and in eight out of 10 cases, no insights were found. ... most companies are at a stage of market sensing and trawling tweets at most,” Chung said.
Experts say the real business model lies in the “intelligent platform” such as an ecosystem built by Google. Most recently, the search giant, which has focused on providing consumer-based free services, released BigQuery, a cloud-based data analytics tool for corporate customers, which can scan terabytes of data in seconds.
Google’s platform has gradually expanded the reach of its data across industries, from media and healthcare to gaming and local traffic, according to Moon Byoung-soon, a researcher at the LG Economic Research Institute (LGERI), in a memo. And in order to succeed, Moon said, Google had to master both software and hardware.
“Hardware is a key component in terms of distributing services, and the user information collected from them is more precious than anything else,” he said.
Similar to Google, other platform giants Facebook, Amazon and Apple have all accumulated stacks of user information locked into their own services, which have a mounting potential for Big Data business.
Just last Friday, Facebook’s much-anticipated IPO was priced at $38 a share, giving the company a valuation of $104 billion, the largest ever for an American company at its stock market debut. Despite some investors’ worries over the firm’s not-yet-clear monetization strategy, the raging interest in its shares indicates the power of its 845 million users’ personal data.
Local status quo
In a recent study by the Korea Communications Commission, more than half of Korea’s 50 million cellphone owners were found to use smartphones. Yet despite the high penetration of broadband across the nation, Korea accounts for only about 9 percent of global data traffic, Moon said in a report. Local portal site providers such as NHN and telecom companies including SK, KT and LG own torrents of data in the petabyte range, but it is mostly limited to domestic use.
The closest local technology to Big Data is SK Telecom’s T Map, a real-time GPS app for mobile and tablet devices, said Lim Byung-hwa, a senior researcher at the Korea Economic Research Institute.
One of the reasons for Korea’s lag in Big Data is the government’s strict regulations on data collection.
Korea passed the Personal Information Protection Act, which took effect last September and requires any “data handler” to provide technical security at each step where personal information is handled. The law also mandates that data handlers obtain individuals’ consent when their information is collected and disclose how the data will be used. Any employers or companies that don’t comply with the rules face up to 10 years in prison or a 100 million won fine.
Google recently stirred a privacy controversy by secretly installing cookies on iPhone users’ Web browsers and tracking their search histories. Although this is a matter of ethical debate in the U.S., it is illegal in Korea, and local companies have very limited access to personal information for use in targeted marketing, Moon said.
“The government’s regulation needs to loosen up, too, but at the same time, companies need to invest more in protecting online privacy,” he said.
Koreans are also highly resistant to sharing their information online due to past privacy breaches.
“If people don’t trust the service providers and refuse to share their information, the business won’t develop further,” he said.
Another challenge in Big Data is a shortage of workers with the needed skill set.
So-called data science is projected to become the next wave of opportunity. The talent needed is not limited to those with abilities in computer science, engineering and math; managers and analysts who can capture insights from large data sets will also be in demand.
The U.S. alone faces a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million experts who can handle Big Data and make decisions based on its findings, according to the McKinsey report.
Data security is another area of need.
“There’s a high deficit in security jobs,” said Nelms. As technologies evolve, those in positions such as chief information officer (CIO) or CISO are challenged to take charge of securing sensitive data in a more finely controlled fashion.
Technologists in the field are expected to have not only technical training in areas such as security science or encryption, but also expertise in the type of data they deal with, whether in finance, health care or manufacturing.