This post is co-written by Goktug Cinar, Michael Binder, and Adrian Horvath from Bosch Center for Artificial Intelligence (BCAI).
Revenue forecasting is a challenging yet critical task for strategic business decisions and financial planning in most organizations. Often, revenue forecasting is performed manually by financial analysts and is both time consuming and subjective. Such manual efforts are especially challenging for large-scale, multinational business organizations that require revenue forecasts across a wide range of product groups and geographical areas at multiple levels of granularity. This requires not only accuracy but also hierarchical coherence of the forecasts.
Bosch is a multinational corporation with entities operating in multiple sectors, including automotive, industrial solutions, and consumer goods. Given the impact of accurate and coherent revenue forecasting on healthy business operations, the Bosch Center for Artificial Intelligence (BCAI) has been heavily investing in the use of machine learning (ML) to improve the efficiency and accuracy of financial planning processes. The goal is to alleviate the manual processes by providing reasonable baseline revenue forecasts via ML, with only occasional adjustments needed from the financial analysts using their industry and domain knowledge.
To achieve this goal, BCAI has developed an internal forecasting framework capable of providing large-scale hierarchical forecasts via customized ensembles of a wide range of base models. A meta-learner selects the best-performing models based on features extracted from each time series. The forecasts from the selected models are then averaged to obtain the aggregated forecast. The architectural design is modularized and extensible through the implementation of a REST-style interface, which allows continuous performance improvement via the inclusion of additional models.
BCAI partnered with the Amazon ML Solutions Lab (MLSL) to incorporate the latest advances in deep neural network (DNN)-based models for revenue forecasting. Recent advances in neural forecasters have demonstrated state-of-the-art performance for many practical forecasting problems. Compared to traditional forecasting models, many neural forecasters can incorporate additional covariates or metadata of the time series. We include CNN-QR and DeepAR+, two off-the-shelf models in Amazon Forecast, as well as a custom Transformer model trained using Amazon SageMaker. The three models cover a representative set of the encoder backbones often used in neural forecasters: convolutional neural network (CNN), sequential recurrent neural network (RNN), and transformer-based encoders.
One of the key challenges faced by the BCAI-MLSL partnership was to provide robust and reasonable forecasts under the impact of COVID-19, an unprecedented global event causing great volatility in global corporate financial results. Because neural forecasters are trained on historical data, forecasts generated based on out-of-distribution data from the more volatile periods could be inaccurate and unreliable. Therefore, we proposed the addition of a masked attention mechanism in the Transformer architecture to address this issue.
The neural forecasters can be bundled as a single ensemble model, or incorporated individually into Bosch's model universe, and accessed easily via REST API endpoints. We propose an approach to ensemble the neural forecasters through backtest results, which provides competitive and robust performance over time. Additionally, we investigated and evaluated multiple classical hierarchical reconciliation techniques to ensure that forecasts aggregate coherently across product groups, geographies, and business organizations.
In this post, we demonstrate the following:
- How to apply Forecast and SageMaker custom model training for hierarchical, large-scale time series forecasting problems
- How to ensemble custom models with off-the-shelf models from Forecast
- How to reduce the impact of disruptive events such as COVID-19 on forecasting problems
- How to build an end-to-end forecasting workflow on AWS
We addressed two challenges: building hierarchical, large-scale revenue forecasts, and handling the impact of the COVID-19 pandemic on long-term forecasting.
Hierarchical, large-scale revenue forecasting
Financial analysts are tasked with forecasting key financial figures, including revenue, operational costs, and R&D expenditures. These metrics provide business planning insights at different levels of aggregation and enable data-driven decision-making. Any automated forecasting solution needs to provide forecasts at any arbitrary level of business-line aggregation. At Bosch, the aggregations can be viewed as grouped time series, a more general form of hierarchical structure. The following figure shows a simplified example with a two-level structure, which mimics the hierarchical revenue forecasting structure at Bosch. The total revenue is split into multiple levels of aggregation based on product and region.
The total number of time series that need to be forecasted at Bosch is at the scale of millions. Notice that the top-level time series can be split by either products or regions, creating multiple paths to the bottom-level time series. The revenue needs to be forecasted at every node in the hierarchy with a forecasting horizon of 12 months into the future. Monthly historical data is available.
The hierarchical structure can be represented in the following form using the notation of a summing matrix S (Hyndman and Athanasopoulos):

y_t = S b_t

In this equation, y_t is the vector containing all the time series in the hierarchy at time t, and S is the summing matrix encoding the aggregation structure. Here, b_t represents the bottom-level time series at time t.
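As a concrete illustration (a made-up two-level example, not the actual Bosch hierarchy), a total series aggregating two bottom-level series A and B can be written as:

```latex
\begin{bmatrix}
  y_{\text{total},t} \\ y_{A,t} \\ y_{B,t}
\end{bmatrix}
=
\underbrace{\begin{bmatrix}
  1 & 1 \\ 1 & 0 \\ 0 & 1
\end{bmatrix}}_{S}
\begin{bmatrix}
  b_{A,t} \\ b_{B,t}
\end{bmatrix}
```

The first row of S sums the bottom series into the total, and the identity rows below it keep the bottom series themselves; deeper hierarchies simply stack more aggregation rows on top.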
Impacts of the COVID-19 pandemic
The COVID-19 pandemic brought significant challenges for forecasting due to its disruptive and unprecedented effects on almost all aspects of work and social life. For long-term revenue forecasting, the disruption also brought unexpected downstream impacts. To illustrate this problem, the following figure shows a sample time series where the product revenue experienced a significant drop at the start of the pandemic and gradually recovered afterwards. A typical neural forecasting model takes revenue data including the out-of-distribution (OOD) COVID period as the historical context input, as well as the ground truth for model training. As a result, the forecasts produced are no longer reliable.
In this section, we discuss our various modeling approaches.
Forecast is a fully managed AI/ML service from AWS that provides preconfigured, state-of-the-art time series forecasting models. It combines these offerings with its internal capabilities for automated hyperparameter optimization, ensemble modeling (for the models provided by Forecast), and probabilistic forecast generation. This allows you to easily ingest custom datasets, preprocess data, train forecasting models, and generate robust forecasts. The service's modular design further enables us to easily query and combine predictions from additional custom models developed in parallel.
We incorporate two neural forecasters from Forecast: CNN-QR and DeepAR+. Both are supervised deep learning methods that train a global model for the entire time series dataset. Both CNN-QR and DeepAR+ models can take in static metadata information about each time series, which in our case is the corresponding product, region, and business organization. They also automatically add temporal features such as month of the year as part of the input to the model.
Transformer with attention masks for COVID
The Transformer architecture (Vaswani et al.), originally designed for natural language processing (NLP), recently emerged as a popular architectural choice for time series forecasting. Here, we used the Transformer architecture described in Zhou et al. without probabilistic log sparse attention. The model uses a typical architecture design combining an encoder and a decoder. For revenue forecasting, we configure the decoder to directly output the forecast for the 12-month horizon instead of generating the forecast month by month in an autoregressive manner. Based on the frequency of the time series, additional time-related features such as month of the year are added as input variables. Additional categorical variables describing the metadata (product, region, business organization) are fed into the network via a trainable embedding layer.
The following diagram illustrates the Transformer architecture and the attention masking mechanism. Attention masking is applied throughout all the encoder and decoder layers, as highlighted in orange, to prevent OOD data from affecting the forecasts.
We mitigate the impact of OOD context windows by adding attention masks. The model is trained to apply very little attention to the COVID period that contains outliers via masking, and performs forecasting with masked information. The attention mask is applied throughout every layer of the decoder and encoder architecture. The masked window can be specified either manually or via an outlier detection algorithm. Additionally, when using a time window containing outliers as the training labels, the losses are not back-propagated. This attention masking-based method can be applied to handle disruptions and OOD cases caused by other rare events and improve the robustness of the forecasts.
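The masking idea can be sketched with plain numpy. The function below, its shapes, and the toy data are illustrative, not the actual model code: it shows how setting attention logits to negative infinity at OOD positions drives their attention weights to zero after the softmax.

```python
import numpy as np

def masked_attention(q, k, v, ood_mask):
    """Scaled dot-product attention that ignores out-of-distribution
    (OOD) time steps, e.g. the COVID months. `ood_mask` is a boolean
    vector over key positions; True marks a masked (OOD) step."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)     # (T_q, T_k) attention logits
    scores[:, ood_mask] = -np.inf     # masked steps get zero weight after softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy example: 4 monthly steps, the 3rd is an OOD outlier
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
v = rng.normal(size=(4, 8))
out, w = masked_attention(q, k, v, np.array([False, False, True, False]))
# w[:, 2] is all zeros: no attention is paid to the masked step
```

In the real model, the same mask is applied in every encoder and decoder layer, and losses on masked target months are not back-propagated.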
Model ensembling often outperforms single models for forecasting: it improves model generalizability and is better at handling time series data with varying characteristics in periodicity and intermittency. We incorporate a series of model ensemble techniques to improve model performance and robustness of forecasts. One common form of deep learning model ensembling is to aggregate results from model runs with different random weight initializations, or from different training epochs. We utilize this strategy to obtain forecasts for the Transformer model.
To further build an ensemble on top of different model architectures, such as Transformer, CNN-QR, and DeepAR+, we use a pan-model ensemble strategy that selects the top-k best performing models for each time series based on the backtest results and obtains their average. Because backtest results can be exported directly from trained Forecast models, this strategy enables us to take advantage of turnkey services like Forecast together with improvements gained from custom models such as Transformer. Such an end-to-end model ensemble approach doesn't require training a meta-learner or calculating time series features for model selection.
The framework can flexibly incorporate a wide range of techniques as postprocessing steps for hierarchical forecast reconciliation, including bottom-up (BU), top-down reconciliation with forecast proportions (TDFP), ordinary least squares (OLS), and weighted least squares (WLS). All the experimental results in this post are reported using top-down reconciliation with forecast proportions.
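For a one-level hierarchy, the BU and TDFP reconciliation steps can be sketched as follows. This is a simplification: in a multi-level hierarchy, TDFP computes the proportions recursively down the tree, and the numbers here are made up for illustration.

```python
import numpy as np

# Two bottom series plus their total: S stacks the aggregation row on an identity.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

# Incoherent base forecasts [total, bottom_A, bottom_B]: 70 + 40 != 100
base = np.array([100.0, 70.0, 40.0])

# Bottom-up (BU): keep the bottom forecasts and re-aggregate upward.
bu = S @ base[1:]

# Top-down with forecast proportions (TDFP): keep the top forecast and
# split it by the proportions implied by the bottom-level base forecasts.
props = base[1:] / base[1:].sum()
tdfp = S @ (base[0] * props)
# bu keeps the bottom values and raises the total to 110;
# tdfp keeps the total at 100 and rescales the bottom split
```

Either way, the reconciled forecasts satisfy y = S b exactly, which is the coherence property required across product groups, geographies, and business organizations.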
We developed an automated end-to-end workflow on AWS to generate revenue forecasts utilizing services including Forecast, SageMaker, Amazon Simple Storage Service (Amazon S3), AWS Lambda, AWS Step Functions, and the AWS Cloud Development Kit (AWS CDK). The deployed solution provides individual time series forecasts through a REST API using Amazon API Gateway, returning the results in a predefined JSON format.
The following diagram illustrates the end-to-end forecasting workflow.
Key design considerations for the architecture are versatility, performance, and user-friendliness. The system should be sufficiently versatile to incorporate a diverse set of algorithms during development and deployment, with minimal required changes, and should be easily extensible when adding new algorithms in the future. The system should also add minimal overhead and support parallelized training for both Forecast and SageMaker to reduce training time and obtain the latest forecast faster. Finally, the system should be simple to use for experimentation purposes.
The end-to-end workflow sequentially runs through the following modules:
- A preprocessing module for data reformatting and transformation
- A model training module incorporating both the Forecast model and the custom model on SageMaker (both run in parallel)
- A postprocessing module supporting model ensembling, hierarchical reconciliation, metrics, and report generation
Step Functions organizes and orchestrates the workflow from end to end as a state machine. The state machine run is configured with a JSON file containing all the necessary information, including the location of the historical revenue CSV files in Amazon S3, the forecast start time, and model hyperparameter settings to run the end-to-end workflow. Asynchronous calls are created to parallelize model training in the state machine using Lambda functions. All the historical data, config files, forecast results, and intermediate results such as backtesting results are stored in Amazon S3. The REST API is built on top of Amazon S3 to provide a queryable interface for querying forecasting results. The system can be extended to incorporate new forecast models and supporting functions such as generating forecast visualization reports.
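A hypothetical shape for such a state machine input config, expressed as a Python dict for readability. Every field name and value here is illustrative, not the actual Bosch or AWS schema:

```python
# Illustrative config for one end-to-end state machine run
config = {
    "input_data_s3_uri": "s3://example-bucket/revenue/history.csv",
    "forecast_start": "2021-06-01",
    "forecast_horizon_months": 12,
    "models": {
        "forecast": ["CNN-QR", "DeepAR+"],          # trained via Amazon Forecast
        "sagemaker": {                              # custom model on SageMaker
            "transformer": {
                "context_length": 18,
                "mask_window": ["2020-04", "2020-05"],
            }
        },
    },
    "ensemble_top_k": 2,
    "reconciliation": "TDFP",
}
```

A single file like this lets the preprocessing, training, and postprocessing modules share one source of truth for a run.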
In this section, we detail the experiment setup. Key components include the dataset, evaluation metrics, backtest windows, and model setup and training.
To protect the financial privacy of Bosch while using a meaningful dataset, we used a synthetic dataset that has similar statistical characteristics to a real-world revenue dataset from one business unit at Bosch. The dataset contains 1,216 time series in total, with revenue recorded at a monthly frequency covering January 2016 to April 2022. The dataset is delivered with 877 time series at the most granular level (bottom time series), with a corresponding grouped time series structure represented as a summing matrix S. Each time series is associated with three static categorical attributes, which correspond to product category, region, and organizational unit in the real dataset (anonymized in the synthetic data).
We use median-Mean Arctangent Absolute Percentage Error (median-MAAPE) and weighted-MAAPE to evaluate model performance and perform comparative analysis; these are the standard metrics used at Bosch. MAAPE addresses the shortcomings of the Mean Absolute Percentage Error (MAPE) metric commonly used in business contexts. Median-MAAPE gives an overview of model performance by computing the median of the MAAPEs calculated individually for each time series. Weighted-MAAPE reports a weighted combination of the individual MAAPEs. The weights are the proportion of revenue for each time series compared to the aggregated revenue of the entire dataset. Weighted-MAAPE better reflects the downstream business impacts of forecasting accuracy. Both metrics are reported on the entire dataset of 1,216 time series.
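The two metrics can be sketched as follows; the function names and toy numbers are illustrative, not Bosch's implementation. MAAPE replaces MAPE's unbounded ratio with arctan, so each error contributes at most pi/2:

```python
import numpy as np

def maape(actual, forecast):
    """Mean Arctangent Absolute Percentage Error for one series."""
    return np.mean(np.arctan(np.abs((actual - forecast) / actual)))

def median_and_weighted_maape(actuals, forecasts):
    """actuals, forecasts: (n_series, horizon) arrays.
    Weights are each series' share of total revenue."""
    per_series = np.array([maape(a, f) for a, f in zip(actuals, forecasts)])
    weights = actuals.sum(axis=1) / actuals.sum()
    return np.median(per_series), np.sum(weights * per_series)

# Toy data: one large-revenue series and one small one
actuals = np.array([[100.0, 120.0], [10.0, 8.0]])
forecasts = np.array([[110.0, 110.0], [12.0, 9.0]])
med, wtd = median_and_weighted_maape(actuals, forecasts)
# wtd is dominated by the large-revenue series, as intended
```

Note that plain MAAPE is undefined when an actual value is zero; revenue series at this aggregation level are assumed to be strictly positive.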
Backtest windows
We use rolling 12-month backtest windows to compare model performance. The following figure illustrates the backtest windows used in the experiments and highlights the corresponding data used for training and hyperparameter optimization (HPO). For backtest windows after COVID-19 starts, the result is affected by OOD inputs from April to May 2020, based on what we observed from the revenue time series.
Model setup and training
For Transformer training, we used quantile loss and scaled each time series using its historical mean value before feeding it into the Transformer and computing the training loss. The final forecasts are rescaled back to calculate the accuracy metrics, using the MeanScaler implemented in GluonTS. We use a context window with monthly revenue data from the past 18 months, selected via HPO in the backtest window from July 2018 to June 2019. Additional metadata about each time series in the form of static categorical variables is fed into the model via an embedding layer before being passed to the transformer layers. We train the Transformer with five different random weight initializations and average the forecast results from the last three epochs of each run, averaging 15 models in total. The five model training runs can be parallelized to reduce training time. For the masked Transformer, we indicate the months from April to May 2020 as outliers.
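The scaling step can be sketched as follows; this is a simplified stand-in for GluonTS's MeanScaler (which also handles batching and padding), with made-up numbers:

```python
import numpy as np

def mean_scale(context):
    """Scale each series by the mean absolute value of its context window.
    context: (n_series, context_length)"""
    scale = np.mean(np.abs(context), axis=-1, keepdims=True)
    return context / scale, scale

series = np.array([[10.0, 20.0, 30.0]])
scaled, scale = mean_scale(series)      # scale is 20.0 for this series

# The model is trained and forecasts in the scaled space...
forecast_scaled = np.array([[1.5, 1.6]])
# ...and forecasts are rescaled back before computing accuracy metrics
forecast = forecast_scaled * scale      # [[30., 32.]]
```

Scaling each series by its own mean keeps series with very different revenue magnitudes on a comparable footing during training.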
For all Forecast model training, we enabled automatic HPO, which selects the model and training parameters based on a user-specified backtest period, set to the last 12 months in the data window used for training and HPO.
We train masked and unmasked Transformers using the same set of hyperparameters and compare their performance for backtest windows immediately after the COVID-19 shock. In the masked Transformer, the two masked months are April and May 2020. The following table shows the results from a series of backtest periods with 12-month forecasting windows starting from June 2020. We can observe that the masked Transformer consistently outperforms the unmasked version.
We also evaluated the model ensemble strategy based on backtest results. Specifically, we compare the two cases when only the top performing model is selected vs. when the top two performing models are selected, with model averaging performed by computing the mean value of the forecasts. We compare the performance of the base models and the ensemble models in the following figures. Notice that none of the neural forecasters consistently outperforms the others across the rolling backtest windows.
The following table shows that, on average, ensemble modeling of the top two models gives the best performance. CNN-QR provides the second-best result.
This post demonstrated how to build an end-to-end ML solution for large-scale forecasting problems combining Forecast and a custom model trained on SageMaker. Depending on your business needs and ML knowledge, you can use a fully managed service such as Forecast to offload the building, training, and deployment of a forecasting model; build your own custom model with specific tuning mechanisms on SageMaker; or perform model ensembling by combining the two services.
If you would like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.
Hyndman RJ, Athanasopoulos G. Forecasting: Principles and Practice. OTexts; 2018.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Advances in Neural Information Processing Systems. 2017;30.
Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of AAAI. 2021.
About the Authors
Goktug Cinar is a lead ML scientist and the technical lead of the ML and stats-based forecasting at Robert Bosch LLC and Bosch Center for Artificial Intelligence. He leads the research on forecasting models, hierarchical consolidation, and model combination techniques, as well as the software development team that scales these models and serves them as part of the internal end-to-end financial forecasting software.
Michael Binder is a product owner at Bosch Global Services, where he coordinates the development, deployment, and implementation of the company-wide predictive analytics application for the large-scale automated data-driven forecasting of financial key figures.
Adrian Horvath is a Software Developer at Bosch Center for Artificial Intelligence, where he develops and maintains systems to create predictions based on various forecasting models.
Panpan Xu is a Senior Applied Scientist and Manager with the Amazon ML Solutions Lab at AWS. She works on research and development of machine learning algorithms for high-impact customer applications in a variety of industrial verticals to accelerate their AI and cloud adoption. Her research interests include model interpretability, causal analysis, human-in-the-loop AI, and interactive data visualization.
Jasleen Grewal is an Applied Scientist at Amazon Web Services, where she works with AWS customers to solve real-world problems using machine learning, with a special focus on precision medicine and genomics. She has a strong background in bioinformatics, oncology, and clinical genomics. She is passionate about using AI/ML and cloud services to improve patient care.
Selvan Senthivel is a Senior ML Engineer with the Amazon ML Solutions Lab at AWS, focusing on helping customers with machine learning and deep learning problems and end-to-end ML solutions. He was a founding engineering lead of Amazon Comprehend Medical and contributed to the design and architecture of multiple AWS AI services.
Shane Rai is a Sr. ML Strategist with the Amazon ML Solutions Lab at AWS. He works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs using AWS's breadth of cloud-based AI/ML services.
Lin Lee Cheong is an Applied Science Manager with the Amazon ML Solutions Lab team at AWS. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.