In this post, we will look at what “Big Data” is and what the term actually means.
Are you ready to pick up some “Big Data” insights?
What is Big Data?
Some people say that when a dataset is large (or its volume is huge), the data is termed “Big Data”. Others say that when we are not able to store the data on a laptop, it is termed “Big Data”. There are many such definitions of “Big Data”.
FORMAL DEFINITION GIVEN BY IBM:
Any data characterized by the 3 V’s is termed “Big Data”.
These are:
1) Volume
2) Variety
3) Velocity
1) Volume: The volume (amount) of data should be large enough, on the order of terabytes or petabytes, that a single system (machine) is incapable of handling it.
Example:
Facebook users upload more than 900 million photos a day, which is a huge volume of data that a traditional single system is incapable of handling.
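To get a feel for what that volume means in storage terms, here is a rough back-of-the-envelope sketch. The 900 million figure comes from the example above; the average photo size of 2 MB is purely an assumption for illustration, not a number from Facebook.

```python
def daily_volume_terabytes(photos_per_day: int, avg_photo_megabytes: float) -> float:
    """Return the raw daily upload volume in terabytes (decimal units: 1 TB = 1,000,000 MB)."""
    return photos_per_day * avg_photo_megabytes / 1_000_000


PHOTOS_PER_DAY = 900_000_000   # figure quoted in the example above
ASSUMED_AVG_PHOTO_MB = 2.0     # ASSUMPTION: illustrative average photo size

tb_per_day = daily_volume_terabytes(PHOTOS_PER_DAY, ASSUMED_AVG_PHOTO_MB)
print(f"{tb_per_day:,.0f} TB per day")  # prints "1,800 TB per day" under the assumed size
```

Under that assumption the uploads alone come to roughly 1.8 petabytes a day, which is well beyond what one machine can store or process.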
2) Variety: Data can be any of the following three types:
a) Structured: for example, RDBMS databases such as Oracle, MySQL, etc.
b) Semi-structured: for example, CSV, XML, JSON, etc.
c) Unstructured: for example, audio, video, images, log files, etc.
Unlike a traditional database (RDBMS), where we get data only in a structured form, here the data can be of any of the types shown above.
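The three types above can be sketched side by side in Python using only the standard library. The table name, field names, and sample values here are made up for illustration.

```python
import csv
import io
import json
import sqlite3

# Structured: a fixed schema enforced by a relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'Asha', 29)")
structured_row = conn.execute("SELECT id, name, age FROM users").fetchone()
conn.close()

# Semi-structured: the data carries its own field names, and fields may vary per record.
json_record = json.loads('{"id": 1, "name": "Asha", "tags": ["photo", "travel"]}')
csv_rows = list(csv.DictReader(io.StringIO("id,name\n1,Asha\n")))

# Unstructured: raw bytes with no fields at all (here, the 8-byte PNG file signature).
raw_image_bytes = b"\x89PNG\r\n\x1a\n"

print(structured_row)        # a tuple matching the table schema
print(json_record["tags"])   # nested data that a flat table row would not hold directly
print(csv_rows[0])           # CSV values come back as strings
print(len(raw_image_bytes))  # just a byte count; meaning must be extracted separately
```

A big data system has to be able to ingest all three kinds, whereas a traditional RDBMS is designed mainly for the first.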
3) Velocity: The speed/intensity at which data is coming in is termed “velocity”. In simple words, it is the speed at which data is ingested, processed, and retrieved (the response).
Example:
Remember our Facebook example? The 250+ billion images Facebook had stored as of last year may seem like a lot. But if you want your mind blown, consider this: Facebook users upload more than 900 million photos a day. So that 250+ billion number from last year will seem like a drop in the bucket in a few months. Velocity is the measure of how fast the data is coming in. Facebook has to handle a tsunami of photographs on a daily basis. It has to ingest it all, process it, file it, and somehow, later, be able to retrieve it.
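The numbers in this example can be turned into a rate with a little arithmetic. This is a rough sketch that uses only the two figures quoted above and assumes uploads are spread evenly across the day.

```python
PHOTOS_PER_DAY = 900_000_000        # daily uploads quoted above
EXISTING_PHOTOS = 250_000_000_000   # stored images quoted above
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate, assuming uploads are spread evenly across the day.
photos_per_second = PHOTOS_PER_DAY / SECONDS_PER_DAY

# Days needed to upload as many photos again as are already stored.
days_to_match_existing = EXISTING_PHOTOS / PHOTOS_PER_DAY

print(f"{photos_per_second:,.0f} photos per second")    # prints "10,417 photos per second"
print(f"{days_to_match_existing:.0f} days to add another 250 billion")  # prints "278 days ..."
```

So the system has to ingest more than ten thousand photos every second on average, and the whole existing collection is matched again in roughly nine months.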
According to IBM’s formal definition, a dataset with the above 3 characteristics is termed “Big Data”.
There might be 4 V’s, 5 V’s, and so on.
The 4th V is the most relevant of these, so let’s talk about it:
The first 3 V’s are the same as discussed above, and the 4th V is “Veracity”.
Veracity: The quality of the data being analyzed is termed “veracity”.
Low veracity: We can find meaningless, poor-quality data, for example:
a) a lot of Null/None values
b) age values that are negative or zero.
Such low-veracity data doesn’t contribute any meaningful insight. High-veracity data, on the other hand, does contribute meaningful insights.
Example:
a) A high-veracity dataset would be data from a medical experiment or trial, which gives us meaningful insight into the experiment.
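The low-veracity symptoms listed above (Null/None values, ages that are zero or negative) can be detected with a simple check. This is a minimal sketch; the function name, the "age" field, and the sample records are hypothetical.

```python
def veracity_issues(records: list[dict]) -> dict[str, int]:
    """Count records whose age is missing (None or absent) or not positive."""
    issues = {"missing_age": 0, "non_positive_age": 0}
    for record in records:
        age = record.get("age")  # returns None when the field is absent
        if age is None:
            issues["missing_age"] += 1
        elif age <= 0:
            issues["non_positive_age"] += 1
    return issues


# Hypothetical sample: one clean record and four low-veracity ones.
sample = [
    {"name": "A", "age": 34},
    {"name": "B", "age": None},
    {"name": "C", "age": -5},
    {"name": "D", "age": 0},
    {"name": "E"},
]

report = veracity_issues(sample)
print(report)  # prints {'missing_age': 2, 'non_positive_age': 2}
```

The higher these counts are relative to the size of the dataset, the lower its veracity and the less it can be trusted for analysis.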
Reference: Big Data