In this era of almost total internet connectivity, data flows in from all directions, often in a haphazard fashion. This data is valuable if the required information can be extracted from it properly, but because it arrives in such large quantities, it is very difficult to handle with existing means. This is where Big Data comes in.
Big Data is a term widely used to describe data sets so large that they can't be handled with the common data-processing tools prevalent in the market. This data may be generated by social networks, banking and financial services, e-commerce services, web-related services, internet search indices, scientific and document searches, medical records, web logs, etc. Every app that you use generates a large amount of data, which big companies mine to find trends or to extract useful information.
Big Data is commonly associated with three defining properties, or dimensions, known as the three Vs:
Volume: This is the factor Big Data is best known for. Since we're dealing with a lot of data, the volume is very large, almost incomprehensible. Facebook stores roughly 250 billion images and has more users than the population of China, and that number keeps increasing. Given this, we can imagine how much data is generated by every single user, and much of it is very valuable to companies. Let's say Facebook decides to add a new option for posting status updates. Before doing so, Facebook would have to go through a large volume of data to find out whether users would like the new option, and based on that understanding it would decide whether to roll it out. And this is just one example: m-commerce companies use information about purchases to decide which time of the year would be good for offering additional discounts that lead to a lot of sales. The amount of data stored is staggeringly large, and it grows with every passing second.
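To get a feel for how large that volume really is, here is a quick back-of-envelope estimate in Python. The average image size used below is an assumed figure purely for illustration, not a published number.

```python
# Back-of-envelope estimate of the storage needed for 250 billion images.
# The average size per image (2 MB) is an assumed value for illustration only.
NUM_IMAGES = 250_000_000_000        # ~250 billion images
AVG_IMAGE_SIZE_MB = 2               # assumed average size per image, in megabytes

total_mb = NUM_IMAGES * AVG_IMAGE_SIZE_MB
total_pb = total_mb / (1024 ** 3)   # MB -> PB (1 PB = 1024^3 MB)

print(f"Roughly {total_pb:,.0f} PB of raw image data")  # ~466 PB under these assumptions
```

Under these assumptions, the images alone amount to hundreds of petabytes, which is exactly the scale ordinary tools struggle with.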
Velocity: 250 billion images is a huge amount of data, isn't it? Now consider how quickly that figure grows: Facebook users upload more than 900 million photos every day. This brings us to velocity, the rate at which new data is generated. To keep up with such a rapid inflow, we certainly need more responsive means of handling data.
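Put differently, 900 million uploads a day translate into a striking per-second rate. A small sketch of that arithmetic:

```python
# How fast does data arrive? 900 million photo uploads per day works out to:
PHOTOS_PER_DAY = 900_000_000
SECONDS_PER_DAY = 24 * 60 * 60      # 86,400 seconds

photos_per_second = PHOTOS_PER_DAY / SECONDS_PER_DAY
print(f"~{photos_per_second:,.0f} photos uploaded per second")  # roughly 10,400 per second
```

That is more than ten thousand new photos arriving every single second, before counting any other kind of data.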
Variety: Data comes in all shapes and sizes, from photographs to sensor readings to tweets to encrypted data packets, and every kind differs from the next. With so much variation in the types of data, we can't store all of it in the traditional rows-and-columns representation.
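As a minimal sketch of why such varied data resists a fixed schema, consider a few made-up records of different shapes; the field names below are invented purely for illustration.

```python
# Three records of completely different shapes, as they might arrive in practice.
records = [
    {"type": "photo",  "user": "alice", "size_kb": 2048, "tags": ["beach", "sunset"]},
    {"type": "tweet",  "user": "bob",   "text": "Big Data is everywhere!", "retweets": 12},
    {"type": "sensor", "device_id": "th-42", "temperature_c": 21.7, "humidity_pct": 55},
]

# A single relational table would need a column for every field across all record
# types, leaving most cells empty -- one reason schema-flexible stores are often
# preferred for this kind of data.
all_fields = {field for record in records for field in record}
print(sorted(all_fields))
```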
Additionally, we could consider value a fourth factor defining Big Data. Since a lot of resources are spent handling the data, it is important to decide which data is valuable and which isn't. The data is haphazard and needs special software to be handled properly.
In the next post, we will dig deeper into the world of Big Data and the tools we use to handle these gigantic piles of data. Stay tuned until then, and do let us know your thoughts on this post in the comments below.