SageMaker machine learning updates shared at AWS re:Invent
Amazon Web Services, Inc. (AWS) has announced new capabilities for its machine learning service, Amazon SageMaker. The announcements, made at the virtual AWS re:Invent conference, add faster data preparation, a purpose-built repository for prepared data, workflow automation, and greater transparency into training data to help mitigate bias and explain predictions.
Amazon SageMaker is designed to reduce the work involved at each stage of the machine learning process, making it easier and faster for developers and data scientists to build, train, and deploy machine learning models.
“Hundreds of thousands of everyday developers and data scientists have used our industry-leading machine learning service, Amazon SageMaker, to remove barriers to building, training, and deploying custom machine learning models. One of the best parts about having such a widely adopted service like SageMaker is that we get lots of customer suggestions which fuel our next set of deliverables,” said Swami Sivasubramanian, Vice President, Amazon Machine Learning, Amazon Web Services, Inc.
“Today, we are announcing a set of tools for Amazon SageMaker that makes it much easier for developers to build end-to-end machine learning pipelines to prepare, build, train, explain, inspect, monitor, debug, and run custom machine learning models with greater visibility, explainability, and automation at scale.”
Companies already using Amazon SageMaker to accelerate their machine learning deployments include 3M, AstraZeneca, Bayer, Capital One, Cerner, Fidelity Investments, GE Healthcare, JPMorgan Chase, Lenovo, T-Mobile, Thomson Reuters, and Vanguard.
New machine learning capabilities
The announcements made at AWS re:Invent included:
• Data Wrangler
Data Wrangler simplifies data preparation and feature engineering. With Amazon SageMaker Data Wrangler, customers can select the data they want from their various data stores and import it with a single click. Data Wrangler includes over 300 built-in data transformers that help customers normalize, transform, and combine features without writing any code, while the service manages the underlying processing infrastructure.
• Feature Store
Feature Store provides a new repository that makes it easy to store, update, retrieve, and share machine learning features. Teams can organize and update large batches of features for training and serve smaller sets of the same features for inference. As a result, models see one consistent view of their features in both training and inference, which makes it significantly easier to build models that produce accurate predictions.
• Pipelines
AWS describes Pipelines as the first purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning. Developers can define each step of an end-to-end machine learning workflow, including data-load steps, transformations from Amazon SageMaker Data Wrangler, features stored in Amazon SageMaker Feature Store, training configuration and algorithm setup, debugging steps, and optimization steps. Workflows can be shared and reused between teams, either to recreate a model or as a starting point for making improvements.
• Clarify
Amazon SageMaker Clarify detects statistical bias across the entire machine learning workflow, from training data to model predictions, and provides explanations for the predictions that models make. This helps developers build greater fairness and transparency into their machine learning models.
• Deep Profiling for Amazon SageMaker Debugger
Deep Profiling for Amazon SageMaker Debugger helps developers train their models faster by automatically monitoring system resource utilization and alerting them to training bottlenecks.
• Distributed Training on Amazon SageMaker
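The new distributed training capability in Amazon SageMaker spreads training across multiple GPUs and instances, splitting either the training data (data parallelism) or the model itself (model parallelism). According to AWS, this can train large, complex deep learning models up to two times faster than current approaches.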
• Edge Manager
Edge Manager allows developers to optimize, secure, monitor, and maintain machine learning models deployed on fleets of edge devices. It extends monitoring capabilities that were previously available only in the cloud: Edge Manager samples data from edge devices and sends it to Amazon SageMaker Model Monitor for analysis, so developers can continuously improve model quality.
• JumpStart
JumpStart provides developers with an easy-to-use, searchable interface to find solutions, algorithms, and sample notebooks. Developers new to machine learning can select from complete end-to-end machine learning solutions (e.g., fraud detection, customer churn prediction, or forecasting) and deploy them directly in their Amazon SageMaker Studio environments.
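The Feature Store item describes one repository that serves both large training batches and small inference lookups. As a minimal sketch of that idea, assuming a single-process, in-memory setting, the hypothetical `ToyFeatureGroup` class below writes every record both to an append-only history (the training view) and to a latest-value map keyed by record ID (the inference view). It is not the SageMaker Feature Store API: the class, its methods, and the sample features are illustrative only, and it treats the most recent write as current rather than resolving records by event timestamp.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyFeatureGroup:
    """Hypothetical in-memory feature group (NOT the SageMaker Feature Store API).

    Every write lands in both views, so training and inference read the same values.
    """
    record_id_name: str
    # Append-only history of every write: the "training" view.
    _history: List[Dict[str, Any]] = field(default_factory=list)
    # Most recent record per ID: the "inference" view (last write wins).
    _latest: Dict[Any, Dict[str, Any]] = field(default_factory=dict)

    def ingest(self, record: Dict[str, Any]) -> None:
        self._history.append(dict(record))
        self._latest[record[self.record_id_name]] = dict(record)

    def latest(self, record_id: Any) -> Dict[str, Any]:
        # Small, single-entity lookup, as an inference request would need.
        return dict(self._latest[record_id])

    def history(self) -> List[Dict[str, Any]]:
        # Large batch of all historical records, as a training job would need.
        return [dict(r) for r in self._history]


fg = ToyFeatureGroup(record_id_name="customer_id")
fg.ingest({"customer_id": 7, "avg_basket": 31.5, "visits_30d": 4})
fg.ingest({"customer_id": 7, "avg_basket": 35.0, "visits_30d": 5})
fg.ingest({"customer_id": 9, "avg_basket": 12.0, "visits_30d": 1})

print(len(fg.history()))  # 3
print(fg.latest(7))       # {'customer_id': 7, 'avg_basket': 35.0, 'visits_30d': 5}
```

Because both views are fed by the same `ingest` call, the features a model is trained on and the features it is served at inference time cannot drift apart, which is the consistency the Feature Store item emphasizes.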
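The Pipelines item describes a workflow as a series of dependent steps. A minimal sketch, assuming each step simply names the steps it depends on, uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive an execution order in which every step runs after its inputs. The step names are hypothetical stand-ins for the stages listed above, and this is not the SageMaker Pipelines SDK.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
# Names loosely mirror the stages listed for Pipelines; they are not SDK objects.
WORKFLOW = {
    "load_data": set(),
    "wrangle": {"load_data"},         # Data Wrangler-style transformations
    "store_features": {"wrangle"},    # write features to a feature store
    "train": {"store_features"},      # training configuration and algorithm setup
    "debug": {"train"},               # inspect the training run
    "optimize": {"train"},            # e.g. hyperparameter tuning
}


def execution_order(workflow: dict[str, set[str]]) -> list[str]:
    """Return an order in which every step comes after all of its dependencies.

    Raises graphlib.CycleError if the workflow contains a cycle.
    """
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for step in execution_order(WORKFLOW):
        print("running", step)
```

Since `debug` and `optimize` both depend only on `train`, either may come first; an orchestrator is free to run such independent steps in parallel, and sharing the `WORKFLOW` definition is what lets another team recreate or extend the same pipeline.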
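To make "statistical bias" in the Clarify item more concrete: the SageMaker Clarify documentation defines pre-training bias metrics such as class imbalance, CI = (n_a - n_d) / (n_a + n_d), and difference in proportions of labels, DPL = q_a - q_d, where a is an advantaged facet, d a disadvantaged facet, n the record count per facet, and q the share of positive labels per facet. The sketch below computes both in plain Python on a hypothetical toy dataset; it illustrates the formulas only and does not call the Clarify API.

```python
from typing import Sequence


def class_imbalance(facets: Sequence[str], a: str, d: str) -> float:
    """CI = (n_a - n_d) / (n_a + n_d); ranges from -1 to 1, with 0 meaning balanced."""
    n_a = sum(1 for f in facets if f == a)
    n_d = sum(1 for f in facets if f == d)
    return (n_a - n_d) / (n_a + n_d)


def diff_in_label_proportions(
    facets: Sequence[str], labels: Sequence[int], a: str, d: str
) -> float:
    """DPL = q_a - q_d, where q_x is the share of positive (1) labels in facet x."""
    def positive_rate(x: str) -> float:
        ys = [y for f, y in zip(facets, labels) if f == x]
        return sum(ys) / len(ys)

    return positive_rate(a) - positive_rate(d)


# Hypothetical toy data: one facet value per record and a binary label (1 = approved).
facets = ["a", "a", "a", "a", "a", "a", "d", "d", "d", "d"]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

ci = class_imbalance(facets, "a", "d")
dpl = diff_in_label_proportions(facets, labels, "a", "d")
print(f"CI  = {ci:.2f}")   # CI  = 0.20
print(f"DPL = {dpl:.2f}")  # DPL = 0.42
```

Here facet a has more records (CI = 0.20) and a much higher positive-label rate (4/6 versus 1/4, so DPL is about 0.42), the kind of gap in training data that a bias report is meant to surface before a model learns it.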