Explainer: Big Data
Big Data is a term used to describe large and complex data sets that can yield useful insights when analyzed and visualized in a meaningful way. Conventional database tools lack the capabilities needed to manage large volumes of unstructured data.
Big Data is typically defined as a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Relational database management systems and desktop statistics and visualization packages often have difficulty handling Big Data. The work instead requires “massively parallel” software running on tens, hundreds, or even thousands of servers. Big Data must be processed with advanced analytic tools and algorithms to reveal meaningful information.
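The idea of spreading work across many servers can be illustrated on a single machine. The following is a minimal sketch, not a description of any particular Big Data product: it assumes the data can be split into independent slices, uses threads as stand-ins for servers, and counts words as a placeholder task. The function names and the sample log lines are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Split the records into n_workers roughly equal slices."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_words(records):
    """Map step: a worker counts words in its own slice only."""
    counts = Counter()
    for line in records:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, n_workers=4):
    """Run the map step on every slice, then merge the partial results."""
    slices = partition(records, n_workers)
    # Threads stand in for separate servers in this illustration.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, slices))
    total = Counter()
    for partial in partials:  # reduce step: combine partial counts
        total.update(partial)
    return total


# Hypothetical sample input, made up for illustration.
logs = [
    "sensor reading received",
    "sensor offline",
    "reading received",
    "sensor reading received",
]
word_counts = parallel_word_count(logs, n_workers=2)
```

Because each slice is processed independently and the partial counts are simply added together, the result does not depend on how many workers are used. That property is what allows the same pattern to be scaled out to many machines.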
Over the past five years, new Internet and biometric technologies have emerged that are able to combine silos of data from different information sources into a single unified location where data can be analyzed.
Big Data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big Data “size” ranges from a few dozen terabytes to many petabytes of data. Big Data is often described as a set of techniques and technologies that require new forms of integration to uncover significant hidden value in data sets that are diverse, complex, and of massive scale.
Processes involved with Big Data include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in Big Data may lead to more confident decision making. Better decisions can mean greater operational efficiency, cost reduction, and reduced risk for enterprises and government.
Data sets have grown in size because they are increasingly being gathered by inexpensive and numerous information-sensing mobile devices, remote sensing devices, software logs, cameras, microphones, radio-frequency identification (RFID) readers, wireless sensor networks, and biometric devices and databases. The world’s technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s. The challenge for large enterprises is determining who should own Big Data initiatives that run across entire organizations.
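To get a feel for what doubling every 40 months implies, the growth can be worked out directly. The sketch below assumes smooth exponential growth at exactly that rate, which is a simplification of the "roughly doubled" figure above; the function name is hypothetical.

```python
def growth_factor(months, doubling_period_months=40):
    """Return how many times larger capacity becomes after `months`,
    assuming it doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Ten years is 120 months, or three doubling periods: 2 * 2 * 2.
ten_year_factor = growth_factor(120)
```

Under this assumption, capacity grows about eightfold per decade, and roughly a thousandfold over a little more than 33 years (ten doublings).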
Analysis of data sets can find new correlations, allowing users to spot business trends, prevent diseases, combat crime and terrorism, and support other data-intensive applications. Scientists, business executives, media, advertisers, and governments alike regularly encounter difficulties with large data sets in areas including Internet search, finance, business informatics, national security, and policing. Scientists encounter technological limitations when studying meteorology, genomics, connectomics, complex physics simulations, and biological and environmental research.
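Finding a correlation between two measurements can be shown with a small worked example. This is a minimal sketch using the Pearson correlation coefficient, which is one common measure among several; the function name and the temperature and sales figures are hypothetical and made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: daily temperature and units sold.
temperatures = [10, 15, 20, 25, 30]
units_sold = [20, 30, 40, 50, 60]
r = pearson(temperatures, units_sold)
```

A value of `r` close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or none. A correlation alone does not establish that one quantity causes the other.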
Big Data systems are meant to help these actors find correlations more easily as an aid to problem solving. Big Data is often characterized in terms of volume, variety, velocity, variability, veracity, and complexity. Big Data systems typically rely on cloud-based servers within advanced data centers, instead of centralized mainframe processors.