Why Big Data is bigger than ever
Big Data has never been more central to our lives. In this article, we consider the advanced analytics technology that extracts value from this resource, and the results it has achieved in some challenging areas, such as research into COVID-19. Where is analytics heading next, and what kind of solutions will enable this?
Big Data experts generally agree that the amount of generated data is set to grow exponentially in the future. A recent report from independent analyst IDC forecasts that the global datasphere will reach 175 zettabytes by 2025. How is this happening? There’s a continual rise in Internet users living their lives online, from business communications to shopping and social networking. IDC estimates that 75% of the world’s population will be interacting with online data every day within five years. And it is not just people: billions of connected devices and embedded systems are now helping to foster the new science of IoT data analytics.
Data analytics has come a long way in a short time. Our understanding of what can be achieved with data has moved forward, as has the sophistication of the tools we use to exploit it. As a result, value is being driven in innovative and exciting new ways. Whole new avenues of data science are opening up, from IoT analytics and augmented big data analytics to DataOps.
Just consider what online retailers can now do to track the habits of customers. Analytics lets them follow the customer journey from initial interest right up to the buying decision, with every step quantifiable and measurable in some way. The data on an individual customer becomes in turn part of a larger dataset made up of the preferences of thousands of consumers, ready for analytics professionals to use the latest software platforms to extract the kind of insights that are only now possible. A more targeted and relevant customer experience is just one of the outcomes.
The value of modern analytics lies in revealing vital information that was there in the data but previously inaccessible or imperceptible, and in so doing shaking up and disrupting the dynamics of an otherwise settled market. Analyst firm Gartner cites the example of banks and their targeting of wealth management services. Traditional wisdom has always had it that older customers are likely to be the most interested in these products. But with augmented analytics, banks found that younger clients, aged between 20 and 35, are counterintuitively more likely to adopt wealth management services. Deep analytics at a stroke removed a layer of bias and erroneous thinking to uncover new services for clients and new opportunities for the banks.
An even more recent instance of the power of analytics can be seen in the work of scientists and researchers around the world to find a cure for COVID-19. This vital work has been helped by the NVIDIA scientific computing platform, which has accelerated progress in a number of ways, from data analytics to simulation and visualisation to AI to edge processing.
Using NVIDIA GPUs, for example, Oxford Nanopore Technologies sequenced the genome of the virus in just seven hours. Using GPU-accelerated software, the US National Institutes of Health and the University of Texas were able to generate a 3D structure of the virus protein using cryogenic electron microscopy. GPU-driven AI has been used to accurately classify COVID-19 infection rates based on lung scans, expediting treatment plans. And in the field of drug discovery, Oak Ridge National Laboratory deployed an InfiniBand-connected, GPU-accelerated supercomputer to screen a billion potential drug combinations in just 12 hours.
Benchmarks and boundaries are continually being shattered in the quest for even faster and more powerful analytics. One of the most important benchmarks in data analytics is called TPCx-BB. It features queries that combine SQL with machine learning on structured data, with natural language processing and unstructured data, reflecting the diversity found in modern data analytics workflows. The record for TPCx-BB performance has just been beaten almost 20-fold using the RAPIDS suite of open-source data science software libraries powered by 16 NVIDIA DGX A100 systems. The benchmark was run in just 14.5 minutes, versus the previous best result of 4.7 hours on a CPU-powered compute cluster.
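For readers who want to check the "almost 20-fold" figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the two run times quoted above; the variable and function names are made up for the example and are not part of TPCx-BB or RAPIDS.

```python
# Run times as quoted in the article (not independently measured here).
previous_best_hours = 4.7    # CPU-powered compute cluster
new_result_minutes = 14.5    # RAPIDS on 16 NVIDIA DGX A100 systems


def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new run is than the baseline."""
    return baseline_minutes / new_minutes


# Convert the older result to minutes so both figures share a unit.
baseline_minutes = previous_best_hours * 60
factor = speedup(baseline_minutes, new_result_minutes)

print(f"Baseline: {baseline_minutes:.0f} minutes")
print(f"Speedup:  {factor:.1f}x")
```

With the quoted figures, 4.7 hours is 282 minutes, and 282 divided by 14.5 comes to roughly 19.4, which is consistent with "almost 20-fold".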
Accelerated visualisation solutions, involving terabytes of data, are also finding uses in other areas of science. NASA, for instance, has used the technology to visualise a simulation of the landing of the first manned mission to Mars, interactively and in real time, in what was the world’s largest volumetric visualisation. The best is almost certainly yet to come.
With digital transformation, data is now the beating heart of every organisation. But only the right technology will allow these organisations to determine which data is most important, unlock the key insights in that data, and decide what actions to take to exploit it.
By Kevin Deierling, VP Marketing NVIDIA networking business unit