The challenges in Big Data are implementation hurdles that require immediate attention; if left unaddressed, they can lead to technology failure and other undesirable outcomes.
Privacy and Security
Privacy and security are among the most important challenges in Big Data, since the data involved is often highly sensitive and carries legal significance.
When a person's personal information (for example, a profile in the database of a social networking website) is combined with large external data sets, new facts about that person can be inferred. Such inferred facts may infringe on the person's privacy, and the person may not want the data owner to know them.
Furthermore, information about people is sometimes collected and used to add value to an organization's business by generating insights into their lives that they themselves are unaware of.
Human Resources and Manpower
Since Big Data is a young and still-emerging technology, it needs to attract organizations and young professionals with diverse new skills. These skills should not be limited to technical ones but should also extend to research, analytical, interpretive, and creative abilities. Universities need to introduce Big Data curricula to produce skilled employees in this field.
Technical Challenges
With the advent of new technologies such as cloud computing and Big Data, it is important that when a failure occurs, the damage remains within acceptable limits rather than forcing the whole task to be restarted from the beginning.
1. Fault Tolerance
Fault-tolerant computing is extremely hard and involves complicated algorithms. It is simply not possible to construct 100% reliable fault-tolerant machines or software, so the main task is to reduce the probability of failure to an acceptable level.
Two methods that help increase fault tolerance in Big Data are as follows (a sketch combining both appears after this list):
• First, divide the whole computation into tasks and assign these tasks to different nodes for execution.
• Second, assign one node the job of observing whether the other nodes are working properly; if something goes wrong, only that particular task is restarted.
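As a minimal illustration, the sketch below simulates both methods on a single machine with Python's concurrent.futures: the computation is split into independent chunks (method one), and a supervising loop re-submits any chunk whose task fails (method two). In a real Big Data cluster the pool would be a distributed framework, and process_chunk is a hypothetical stand-in for real work.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def process_chunk(chunk):
    # Hypothetical stand-in for real work on one slice of the data.
    return sum(chunk)

def run_with_supervision(chunks, max_retries=3):
    results = {}
    with ProcessPoolExecutor() as pool:
        # Method one: split the computation into independent tasks.
        pending = {pool.submit(process_chunk, c): (i, c, 0)
                   for i, c in enumerate(chunks)}
        # Method two: supervise the tasks and restart only the failed ones.
        while pending:
            for future in as_completed(list(pending)):
                idx, chunk, tries = pending.pop(future)
                try:
                    results[idx] = future.result()
                except Exception:
                    if tries + 1 >= max_retries:
                        raise  # give up after repeated failures
                    # Restart only the failed task, not the whole job.
                    pending[pool.submit(process_chunk, chunk)] = (idx, chunk, tries + 1)
    return [results[i] for i in sorted(results)]

if __name__ == "__main__":
    data = list(range(100))
    chunks = [data[i:i + 10] for i in range(0, len(data), 10)]
    print(sum(run_with_supervision(chunks)))  # 4950
```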
Sometimes, however, the whole computation cannot be divided into independent tasks. Some tasks are recursive in nature, where the output of a previous computation is the input to the next; restarting the whole computation then becomes an expensive process. This can be avoided by applying checkpoints, which save the state of the system at certain intervals of time. In case of failure, the computation can restart from the last maintained checkpoint rather than from the beginning, as in the sketch below.
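A minimal sketch of such checkpointing follows. It assumes a single sequential computation whose state can be pickled; the file name, checkpoint interval, and step function are illustrative choices, not prescribed by any particular framework.

```python
import os
import pickle

CHECKPOINT = "state.ckpt"  # assumed file name for the saved state

def step(state):
    # Hypothetical stand-in for one recursive computation step,
    # where the previous output is the next input.
    return state + 1

def run(total_steps, interval=100):
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            i, state = pickle.load(f)
    else:
        i, state = 0, 0
    while i < total_steps:
        state = step(state)
        i += 1
        if i % interval == 0:
            # Write to a temp file, then rename atomically, so a crash
            # mid-write cannot corrupt the last good checkpoint.
            with open(CHECKPOINT + ".tmp", "wb") as f:
                pickle.dump((i, state), f)
            os.replace(CHECKPOINT + ".tmp", CHECKPOINT)
    return state
```

If the process crashes and is rerun, the loop picks up from the last saved (i, state) pair instead of recomputing everything from step zero.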
2. Scalability
The scalability issue of Big Data has led toward cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters. This requires a high level of resource sharing, which is expensive and brings with it various challenges, such as how to run and execute jobs so that the goal of each workload is met cost-effectively. It also requires dealing efficiently with system failures, which occur more frequently when operating on large clusters.
Combined, these factors raise the question of how to express programs, even complex machine learning tasks, in a form that such clusters can schedule and execute. There has also been a huge shift in the technologies being used.
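One widely used answer, shown below as a single-machine sketch, is to express the program as small, independent map tasks whose partial results are merged by order-insensitive reduce tasks; a cluster framework can then place, rerun, and scale such tasks freely. The word-count workload here is an illustrative assumption, not taken from the text.

```python
from collections import Counter
from functools import reduce

def map_task(document):
    # Each map task counts words in one document independently,
    # so any node can run it on any slice of the input.
    return Counter(document.split())

def reduce_task(a, b):
    # Reduce tasks merge partial counts; the merge order does not matter,
    # which is what lets a scheduler combine results as they arrive.
    return a + b

documents = ["big data needs scale", "data quality and scale"]
partials = [map_task(d) for d in documents]        # distributable work
totals = reduce(reduce_task, partials, Counter())  # mergeable results
print(totals.most_common(3))
```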
Hard disk drives (HDDs) are being replaced by solid-state drives (SSDs) and phase-change memory, which do not exhibit the large performance gap between sequential and random data transfer that HDDs do. Which kinds of storage devices to use is therefore another big open question for data storage.
3. Quality of Data
Collecting and storing huge amounts of data comes at a cost. In business, more data used for decision making or predictive analysis will often lead to better results, so business leaders will always want more and more data storage, whereas IT leaders will weigh all the technical aspects before storing everything. Big Data fundamentally focuses on storing quality data, rather than very large volumes of irrelevant data, so that better results and conclusions can be drawn. This leads to further questions: how to ensure which data is relevant, how much data is enough for decision making, and whether the stored data is accurate enough to draw conclusions from. The sketch below illustrates such pre-storage screening.
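A minimal sketch, assuming dictionary-like records: keep only records that are relevant (carry the fields the analysis needs), complete (no missing values), and plausible (pass a basic range check) instead of storing everything. The field names and threshold are illustrative assumptions.

```python
# Fields the downstream analysis is assumed to need.
RELEVANT_FIELDS = {"user_id", "timestamp", "amount"}

def is_quality_record(record):
    # Relevance: the record must carry the required fields.
    if not RELEVANT_FIELDS <= record.keys():
        return False
    # Completeness: no missing values in the required fields.
    if any(record[f] is None for f in RELEVANT_FIELDS):
        return False
    # Accuracy (plausibility): a basic range check on the measurement.
    return 0 <= record["amount"] < 1_000_000

records = [
    {"user_id": 1, "timestamp": "2015-01-01", "amount": 25.0},
    {"user_id": 2, "timestamp": None, "amount": 10.0},        # incomplete
    {"user_id": 3, "timestamp": "2015-01-02", "amount": -5},  # implausible
]
clean = [r for r in records if is_quality_record(r)]
print(len(clean))  # 1 record survives the screening
```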