Programme Title : Big Data Analytics (Batch-1) Module-6
Executive Education Open Programme
Programme Directors : Profs. Dinesh Kumar, Shankar Venkatagiri & Pulak Ghosh
Programme Dates : 2 – 5 March 2017
Programme Venue : M-11, IIMB Campus
Programme Overview :
A triad of terms captures the essence of “big data”: volume, velocity and variety. The volume and pace at which data is created can challenge existing computing infrastructure. For example, every flight of a Boeing 777 can generate up to 1 terabyte (~1000 gigabytes) of data. Making sense of this data is imperative for decision making and troubleshooting. The theory of bounded rationality proposed by Nobel Laureate Herbert Simon is ever more significant today, given the increased complexity of business problems: the human mind is constrained in its capacity to evaluate alternatives when it has limited time to reach conclusions.
Organisations large and small are forced to grapple with problems of big data, which challenge the existing tenets of data science and computing technologies. Techniques in predictive analytics rely heavily on statistical foundations such as independent and identically distributed (IID) random variables and the central limit theorem (CLT). When dealing with big data, the assumptions behind these foundations become questionable. Even straightforward tasks, such as interpreting descriptive statistics, raise issues of their own, and we begin to question the utility of summary measures and diagrams.
Algorithms that work well on “small” datasets crumble when the size of the data extends into the gigabytes. Time-series techniques must be revamped to handle streaming data in continuous time. Social media messages have data formats that are ill-suited to representation in traditional databases. While these may appear to be difficult problems, there has been tremendous progress in big data analytics. For example, columnar databases have significantly boosted query speeds. File systems can seamlessly distribute datasets across multiple hard drives and facilitate analytics on them in real time. Finally, the free and open-source nature of several big data platforms promotes rapid adoption.