Controversy about Big Data

Big Data is a new and emerging hot topic that has generated a great deal of controversy:

  • There is no need to distinguish Big Data analytics from data analytics, as data will keep growing and will never be small again.
  • Big Data may be hype to sell Hadoop-based computing systems. Hadoop is not always the best tool. It seems that data management system vendors try to sell systems based on Hadoop, yet MapReduce may not always be the best programming platform, for example for medium-size companies.
  • In real-time analytics, data may be changing. In that case, what matters is not the size of the data but its recency.
  • Claims to accuracy are misleading. As Taleb explains in his new book ‘Antifragile’, when the number of variables grows, the number of fake correlations also grows. For example, Leinweber showed that the S&P 500 stock index was correlated with butter production in Bangladesh, along with other strange correlations.
  • Bigger data are not always better data. It depends on whether the data is noisy and whether it is representative of what we are looking for. For example, Twitter users are sometimes assumed to be representative of the global population, when this is not always the case.
  • Ethical concerns about accessibility. The main issue is whether it is ethical to analyze people without their knowledge.
  • Limited access to Big Data creates new digital divides. There may be a digital divide between people or organizations that are able to analyze Big Data and those that are not. Organizations with access to Big Data will also be able to extract knowledge that others without access cannot. We may create a division between Big Data rich and Big Data poor organizations.
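
The point about fake correlations growing with the number of variables can be illustrated with a small simulation. The sketch below is not taken from the paper: it generates independent random variables, so any strong correlation between two of them is spurious by construction, and counts how many pairs exceed a correlation threshold. The function names `pearson` and `count_spurious`, the sample size of 30, and the threshold of 0.4 are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables, num_samples=30, threshold=0.4, seed=0):
    """Count pairs of independent variables with |correlation| > threshold.

    The variables are drawn independently, so every pair counted here
    is a fake correlation (hypothetical setup, for illustration only).
    """
    rng = random.Random(seed)
    data = [
        [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
        for _ in range(num_variables)
    ]
    count = 0
    for i in range(num_variables):
        for j in range(i + 1, num_variables):
            if abs(pearson(data[i], data[j])) > threshold:
                count += 1
    return count


if __name__ == "__main__":
    # Print: number of variables, number of pairs, spurious pairs found.
    for k in (5, 20, 80):
        pairs = k * (k - 1) // 2
        print(k, pairs, count_spurious(k))
```

The number of pairs grows quadratically with the number of variables, so even when only a small fraction of pairs pass the threshold by chance, the absolute number of fake correlations tends to rise quickly as more variables are added.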

References

Wei Fan, Albert Bifet. Mining Big Data: Current Status, and Forecast to the Future. SIGKDD Explorations 14(2): 1-5 (2012).
