Carey Drake's MIST blog: Big Data Research

Every day, we create 2.5 quintillion bytes of data–so much that 90% of the data in the world today has been created in the last two years alone. - IBM

Where Big Data Comes From

Massive datasets that have outgrown the capabilities of relational databases can be considered Big Data. Growing technology use per person, the increasing number of connected devices, and the analytics gathered from them create a need for massive storage and for the ability to manipulate that data in a useful way.

According to IBM, Big Data is characterized by Volume, Velocity, and Variety. Volume is the sheer amount of data; Velocity is the speed at which we want the data (e.g. streaming, always available, instantaneous, point of sale); and Variety is the range of formats, such as video, audio, images, apps, and text.

NoSQL positions itself as a way to manage Big Data. NoSQL databases do not rely on the relational model and do not require a fixed table schema with relationships, and they tend to be superior in scalability and performance on very large workloads. Companies like Google use them over traditional databases because a relational structure such as MySQL's could not keep up with the Volume or Velocity of their data.
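To make the "no fixed table schema" point concrete, here is a minimal sketch in plain Python. It is not the API of any real NoSQL product: `DocumentStore` and its `put`, `get`, and `find` methods are hypothetical names invented for this illustration, and everything lives in memory on one machine. It only shows that records with different fields can sit side by side without a table definition.

```python
class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, doc):
        # No schema check: any set of fields is accepted.
        self._docs[doc_id] = dict(doc)

    def get(self, doc_id):
        # Returns None when the id is unknown.
        return self._docs.get(doc_id)

    def find(self, **criteria):
        # Return the sorted ids of documents whose fields match every
        # criterion; documents that lack a field simply do not match.
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(k) == v for k, v in criteria.items())
        )


store = DocumentStore()
# Three records with three different shapes, stored in the same place.
store.put("u1", {"name": "Ana", "city": "Austin"})
store.put("u2", {"name": "Ben", "city": "Austin", "devices": ["phone", "tablet"]})
store.put("v1", {"title": "Demo clip", "format": "video", "seconds": 42})

print(store.find(city="Austin"))  # ['u1', 'u2']
```

In a relational table, adding the `devices` or `seconds` field would normally mean changing the table definition first. Here each record just carries whatever fields it has, which is the trade-off the paragraph above describes.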

Hadoop, an Apache software project, helps with Big Data management. It allows for the distributed processing of large data sets and is designed to scale from a single computer to clusters of servers, with built-in software that detects and handles hardware failures for maximum uptime. Hive is a data warehouse layer built on top of Hadoop that provides data summarization, ad hoc queries, and analysis of large data sets. Hadoop's processing model follows Google's MapReduce, and Hive lets users express that work in a SQL-like query language instead of writing MapReduce programs by hand.
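The MapReduce idea behind this kind of distributed processing can be sketched in a few lines of plain Python. This is only an illustration of the pattern on a single machine, not Hadoop's actual API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are made up for this example, and the two "chunks" stand in for blocks of input that a real cluster would hand to separate nodes.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group values by key, the step that sits between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values collected for one key into a single result.
    return key, sum(values)


def word_count(chunks):
    # Each chunk stands in for a block of input handled by a separate node.
    mapped = []
    for chunk in chunks:
        for line in chunk:
            mapped.extend(map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


chunks = [
    ["big data is big", "data grows fast"],
    ["hadoop processes big data"],
]
counts = word_count(chunks)
print(counts["big"], counts["data"])  # 3 3
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is what lets the approach scale from one computer to a cluster.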

Big Data is not going away, and solutions like NoSQL and Hadoop that help manage it will become increasingly important.

Wednesday, March 28, 2012
