Grouped Prophet

The Grouped Prophet model is a multi-series orchestration framework for building multiple individual models of related, but isolated series data. For example, a project that required the forecasting of airline passengers at major airports around the world would historically require individual orchestration of data acquisition, hyperparameter definitions, model training, metric validation, serialization, and registration of thousands of individual models.

This API consolidates the many thousands of models that would otherwise need to be implemented, trained individually, and managed throughout their frequent retraining and forecasting lifecycles to a single high-level API that simplifies these common use cases that rely on the Prophet forecasting library.

Table of Contents

Grouped Prophet API
Utilities
Class Signature

Grouped Prophet API 

The following sections provide a basic overview of using the GroupedProphet API, from fitting of the grouped models, predicting forecasted data, saving, loading, and customization of the underlying Prophet instances.

To see working end-to-end examples, you can go to Tutorials and Examples. The examples will allow you to explore the data structures required for training, how to extract forecasts for each group, and demonstrations of the saving and loading of trained models.

Model fitting 

In order to fit a GroupedProphet model instance, the fit method is used. Calling this method will process the input DataFrame to create a grouped execution collection, fit a Prophet model on each individual series, and persist the trained state of each group’s model to the object instance.

The arguments for the fit method are:

df: A ‘normalized’ DataFrame that contains an endogenous regressor column (the ‘y’ column), a date (or datetime) column that defines the ordering, periodicity, and frequency of each series (if this column is a string, the frequency will be inferred), and one or more grouping columns that define the discrete series to be modeled. For further information on the structure of this DataFrame, see the quickstart guide.
group_key_columns: The names of the columns within df that, when combined (in order supplied) define distinct series. See the quickstart guide for further information.
kwargs: [Optional] Arguments that are used for overrides to the Prophet pystan optimizer. Details of what parameters are available and how they might affect the optimization of the model can be found by running help(pystan.StanModel.optimizing) from a Python REPL.

Example:

grouped_prophet_model = GroupedProphet().fit(df, ["country", "region"])
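As a rough illustration of the ‘normalized’ layout described above, the sketch below builds a small DataFrame with two grouping columns, a 'ds' date column, and a 'y' value column. The data values and the helper name build_example_frame are made up for illustration. The fit call is left as a comment because it assumes that diviner and Prophet are installed.

```python
import pandas as pd


def build_example_frame() -> pd.DataFrame:
    """Build a tiny 'normalized' frame: one row per (group, date) observation."""
    dates = pd.date_range("2021-01-01", periods=5, freq="D")
    groups = [("US", "NY"), ("FR", "Paris")]  # illustrative group values
    rows = []
    for country, region in groups:
        for offset, ds in enumerate(dates):
            rows.append(
                {"country": country, "region": region, "ds": ds, "y": 100.0 + offset}
            )
    return pd.DataFrame(rows)


df = build_example_frame()

# Each distinct (country, region) pair is one series to be modeled.
series_count = df.groupby(["country", "region"]).ngroups

# With diviner installed, the frame would then be passed to fit, e.g.:
# grouped_prophet_model = GroupedProphet().fit(df, ["country", "region"])
```

Counting the groups before fitting is a cheap way to confirm that the chosen key columns produce the number of series you expect.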

Forecast 

The forecast method is the primary means of generating future forecast predictions. For each group that was trained during Model fitting, the specified number of time periods is predicted, starting from the last event date (or datetime) in that group’s series.

Usage of this method requires providing two arguments:

horizon

The number of events to forecast (supplied as a positive integer)

frequency

The periodicity between each forecast event. Note that this value does not have to match the periodicity of the training data (e.g., training data can be in days and predictions can be in months, minutes, hours, or years).

The allowed frequency abbreviations are the pandas offset aliases, listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
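To make the relationship between horizon and frequency concrete, the sketch below generates the future timestamps for a single series with pandas.date_range. It approximates the idea only and is not Diviner's or Prophet's code; the function name future_dates is hypothetical.

```python
import pandas as pd


def future_dates(last_training_date, horizon: int, frequency: str) -> pd.DatetimeIndex:
    """Return `horizon` timestamps that follow `last_training_date` at `frequency`."""
    last = pd.Timestamp(last_training_date)
    # Generate one extra period so the training end date itself can be dropped.
    candidates = pd.date_range(start=last, periods=horizon + 1, freq=frequency)
    return candidates[candidates > last][:horizon]


# Daily training data can be forecast at a daily or a weekly frequency.
daily = future_dates("2021-01-01", horizon=3, frequency="D")
weekly = future_dates("2021-01-01", horizon=2, frequency="W")
```

Because each group's last training date may differ, the same horizon and frequency can yield different timestamps for different groups.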

Note

The generation of error estimates (yhat_lower and yhat_upper) in the output of a forecast is controlled through the Prophet argument uncertainty_samples, supplied during class instantiation, before Model fitting is called. Setting this value to 0 will eliminate error estimates and will dramatically increase the speed of training, prediction, and cross validation.

The return data structure for this method is a ‘stacked’ pandas DataFrame consisting of the grouping keys defined (in the order in which they were generated), the grouping columns, the deconstructed elements of the prediction values (e.g. the ‘weekly’, ‘yearly’, and ‘daily’ seasonality terms and the ‘trend’), the date (datetime) values, and the prediction itself (labeled yhat).

Predict 

The predict method is a ‘manual’ means of generating predictions based on discrete date (or datetime) values for each group specified. This method accepts a DataFrame as input, with columns that define the discrete dates to generate predictions for and the grouping key columns that match those supplied when the model was fit. For example, a model trained with the grouping key columns of ‘city’ and ‘country’ that included New York City, US and Toronto, Canada as series would generate predictions for both of these cities if the following df argument were supplied:

import pandas as pd

predict_config = pd.DataFrame(
    {
        "country": ["us", "us", "ca", "ca"],
        "city": ["nyc", "nyc", "toronto", "toronto"],
        "ds": ["2022-01-01", "2022-01-08", "2022-01-01", "2022-01-08"],
    }
)

grouped_prophet_model.predict(predict_config)

The structure of this submitted DataFrame for the above use case is:

Predict df Structure
country	city	ds
us	nyc	2022-01-01
us	nyc	2022-01-08
ca	toronto	2022-01-01
ca	toronto	2022-01-08

Usage of this method with the above specified df would generate 4 individual predictions; one for each row.
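For more than a handful of groups, writing these records by hand becomes tedious. One possible approach, sketched here with only the standard library, is to take the product of the group values and the dates of interest. The variable names are illustrative, and the conversion to a DataFrame is left as a comment.

```python
from itertools import product

# Group values in the same order as the grouping key columns (country, city).
groups = [("us", "nyc"), ("ca", "toronto")]
dates = ["2022-01-01", "2022-01-08"]

# One record per (group, date) combination.
predict_rows = [
    {"country": country, "city": city, "ds": ds}
    for (country, city), ds in product(groups, dates)
]

# With pandas available: predict_config = pd.DataFrame(predict_rows)
```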

Note

The Forecast method is more appropriate for most use cases, as its predictions begin immediately after the end of each series’ training data.

Predict Groups 

The predict_groups method generates forecast data for a subset of groups that a diviner.GroupedProphet model was trained upon.

Example:

from diviner import GroupedProphet

model = GroupedProphet().fit(df, ["country", "region"])

subset_forecasts = model.predict_groups(groups=[("US", "NY"), ("FR", "Paris"), ("UA", "Kyiv")],
                                        horizon=90,
                                        frequency="D",
                                        on_error="warn"
                                        )

The arguments for the predict_groups method are:

groups

A collection of one or more groups for which to generate a forecast. The collection of groups must be submitted as a List[Tuple[str]] to identify the order-specific group values to retrieve the correct model. For instance, if the model was trained with the specified group_key_columns of ["country", "city"], a valid groups entry would be: [("US", "LosAngeles"), ("CA", "Toronto")]. Changing the order within the tuples will not resolve (e.g. [("NewYork", "US")] would not find the appropriate model).

Note

Groups that are submitted for prediction that are not present in the trained model will, by default, cause an Exception to be raised. This behavior can be changed to a warning or ignore status with the argument on_error.

horizon

The number of events to forecast (supplied as a positive integer)

frequency

The periodicity between each forecast event. Note that this value does not have to match the periodicity of the training data (e.g., training data can be in days and predictions can be in months, minutes, hours, or years).

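The allowed frequency abbreviations are the pandas offset aliases, listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases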

predict_col

[Optional] The name to use for the generated column containing forecasted data. Default: "yhat"

on_error

[Optional] [Default -> "raise"] Dictates the behavior for handling group keys that have been submitted in the groups argument that do not match with a group identified and registered during training (fit). The modes are:

"raise"
A DivinerException is raised if any supplied groups do not match to the fitted groups.
"warn"
A warning is emitted (printed) and logged for any groups that do not match to those that the model was fit with.
"ignore"
Invalid groups will silently fail prediction.

Note

A DivinerException will still be raised, even in "ignore" mode, if none of the groups provided to this method match a group that the model was fit on.
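The order-sensitive group lookup and the three on_error modes can be sketched in plain Python. This is not Diviner's implementation: the helper select_fitted_groups is hypothetical, a dictionary of strings stands in for the per-group models, and KeyError stands in for DivinerException.

```python
import warnings
from typing import Dict, List, Tuple

GroupKey = Tuple[str, ...]


def select_fitted_groups(
    fitted: Dict[GroupKey, object],
    groups: List[GroupKey],
    on_error: str = "raise",
) -> Dict[GroupKey, object]:
    """Hypothetical helper mirroring the documented on_error modes."""
    if on_error not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unsupported on_error mode: {on_error!r}")
    missing = [group for group in groups if group not in fitted]
    if missing and on_error == "raise":
        raise KeyError(f"Groups not present in the fitted model: {missing}")
    if missing and on_error == "warn":
        warnings.warn(f"Skipping groups that were not fit: {missing}")
    matched = {group: fitted[group] for group in groups if group in fitted}
    if not matched:
        # Documented behavior: even "ignore" fails when nothing matches.
        raise KeyError("None of the requested groups were fit.")
    return matched


# Stand-in for the per-group models; keys follow the order ["country", "city"].
fitted_models = {("US", "LosAngeles"): "model-a", ("CA", "Toronto"): "model-b"}

# ("NewYork", "US") is skipped silently: it is unknown and in the wrong order.
found = select_fitted_groups(
    fitted_models, [("US", "LosAngeles"), ("NewYork", "US")], on_error="ignore"
)
```

Tuples compare element by element, which is why a reversed tuple such as ("Toronto", "CA") does not match the fitted key ("CA", "Toronto").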

Save 

Supports saving a GroupedProphet model that has been fit. The serialization of the model instance does not rely on pickle or cloudpickle, but rather on a straightforward JSON serialization.

save_location = "/path/to/store/model"
grouped_prophet_model.save(save_location)

Load 

Loading a saved GroupedProphet model is done through the use of a class method. The load method is called as below:

load_location = "/path/to/stored/model"
grouped_prophet_model = GroupedProphet.load(load_location)

Note

The PyStan backend optimizer instance used to fit the model is not saved. Reusing it would require compiling PyStan on the same machine configuration that was used to fit the model, and storing it would add dependencies that are not involved in cross validation, parameter extraction, forecasting, or predicting. If you need access to the PyStan backend, retrain the model and access the underlying solver prior to serializing to disk.

Overriding Prophet settings 

There are no required attributes to define in order to create a GroupedProphet instance. If none are supplied, model fitting uses the default values of the underlying Prophet library. However, arguments can be overridden; they are pass-through values to the individual Prophet instances that are created for each group. Since these are **kwargs entries, their names must match the corresponding argument names in Prophet.

To see a full listing of available arguments for the version of Prophet that you are using, the simplest approach (and the one recommended in the library documentation) is to run a help() command in a Python REPL:

from prophet import Prophet
help(Prophet)

An example of overriding many of the arguments within the underlying Prophet model for the GroupedProphet API is shown below.

grouped_prophet_model = GroupedProphet(
    growth='linear',
    changepoints=None,
    n_changepoints=90,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,
    seasonality_mode='additive',
    seasonality_prior_scale=10.0,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    mcmc_samples=0,
    interval_width=0.8,
    uncertainty_samples=1000,
    stan_backend=None
)

Utilities 

Parameter Extraction 

The method extract_model_params is a utility that extracts the tuning parameters from each individual model within the model’s container and returns them as a single DataFrame. Columns are the parameters from the models, while each row holds the parameter values of an individual group’s Prophet model. Having a single consolidated extraction data structure eases the historical registration of model performance and enables a simpler approach to the design of frequent retraining through passive retraining systems (allowing for an easier means of acquiring prior hyperparameter values for frequently retrained forecasting models).

An example extract from a 2-group model (cast to a dictionary from the Pandas DataFrame output) is shown below:

{'changepoint_prior_scale': {0: 0.05, 1: 0.05},
 'changepoint_range': {0: 0.8, 1: 0.8},
 'component_modes': {0: {'additive': ['yearly',
                                      'weekly',
                                      'additive_terms',
                                      'extra_regressors_additive',
                                      'holidays'],
                         'multiplicative': ['multiplicative_terms',
                                            'extra_regressors_multiplicative']},
                     1: {'additive': ['yearly',
                                      'weekly',
                                      'additive_terms',
                                      'extra_regressors_additive',
                                      'holidays'],
                         'multiplicative': ['multiplicative_terms',
                                            'extra_regressors_multiplicative']}},
 'country_holidays': {0: None, 1: None},
 'daily_seasonality': {0: 'auto', 1: 'auto'},
 'extra_regressors': {0: OrderedDict(), 1: OrderedDict()},
 'fit_kwargs': {0: {}, 1: {}},
 'grouping_key_columns': {0: ('key2', 'key1', 'key0'),
                          1: ('key2', 'key1', 'key0')},
 'growth': {0: 'linear', 1: 'linear'},
 'holidays': {0: None, 1: None},
 'holidays_prior_scale': {0: 10.0, 1: 10.0},
 'interval_width': {0: 0.8, 1: 0.8},
 'key0': {0: 'T', 1: 'M'},
 'key1': {0: 'A', 1: 'B'},
 'key2': {0: 'C', 1: 'L'},
 'logistic_floor': {0: False, 1: False},
 'mcmc_samples': {0: 0, 1: 0},
 'n_changepoints': {0: 90, 1: 90},
 'seasonality_mode': {0: 'additive', 1: 'additive'},
 'seasonality_prior_scale': {0: 10.0, 1: 10.0},
 'specified_changepoints': {0: False, 1: False},
 'stan_backend': {0: <prophet.models.PyStanBackend object at 0x7f900056d2e0>,
                  1: <prophet.models.PyStanBackend object at 0x7f9000523eb0>},
 'start': {0: Timestamp('2018-01-02 00:02:00'),
           1: Timestamp('2018-01-02 00:02:00')},
 't_scale': {0: Timedelta('1459 days 00:00:00'),
             1: Timedelta('1459 days 00:00:00')},
 'train_holiday_names': {0: None, 1: None},
 'uncertainty_samples': {0: 1000, 1: 1000},
 'weekly_seasonality': {0: 'auto', 1: 'auto'},
 'y_scale': {0: 1099.9530489951537, 1: 764.727400507604},
 'yearly_seasonality': {0: 'auto', 1: 'auto'}}

Cross Validation and Scoring 

The primary method of evaluating model performance across all groups is by using the method cross_validate_and_score. Using this method from a GroupedProphet instance that has been fit will perform backtesting of each group’s model using the training data set supplied when the fit method was called.

The return type of this method is a single consolidated Pandas DataFrame that contains metrics as columns with each row representing a distinct grouping key. For example, below is a sample of 3 groups’ cross validation metrics.

{'coverage': {0: 0.21839080459770113,
          1: 0.057471264367816084,
          2: 0.5114942528735632},
 'grouping_key_columns': {0: ('key2', 'key1', 'key0'),
                          1: ('key2', 'key1', 'key0'),
                          2: ('key2', 'key1', 'key0')},
 'key0': {0: 'T', 1: 'M', 2: 'K'},
 'key1': {0: 'A', 1: 'B', 2: 'S'},
 'key2': {0: 'C', 1: 'L', 2: 'Q'},
 'mae': {0: 14.230668998203283, 1: 34.62100210053155, 2: 46.17014668092673},
 'mape': {0: 0.015166533573997266,
          1: 0.05578282899646585,
          2: 0.047658812366283436},
 'mdape': {0: 0.013636314354422746,
           1: 0.05644041426067295,
           2: 0.039153745874603914},
 'mse': {0: 285.42142900120183, 1: 1459.7746527190932, 2: 3523.9281809854906},
 'rmse': {0: 15.197908800171147, 1: 35.520537302480314, 2: 55.06313841955681},
 'smape': {0: 0.015327226830099487,
           1: 0.05774645767583018,
           2: 0.0494437278595581}}
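Working from the dictionary form shown above (the {column: {row_index: value}} shape), one way to find the groups that need attention is to rank them by a metric. The sketch below uses only the standard library and a trimmed copy of the sample; the function name rank_groups is hypothetical.

```python
# A trimmed copy of the sample above: {column: {row_index: value}}.
scores = {
    "key0": {0: "T", 1: "M", 2: "K"},
    "key1": {0: "A", 1: "B", 2: "S"},
    "key2": {0: "C", 1: "L", 2: "Q"},
    "smape": {
        0: 0.015327226830099487,
        1: 0.05774645767583018,
        2: 0.0494437278595581,
    },
}
key_columns = ("key2", "key1", "key0")  # order taken from 'grouping_key_columns'


def rank_groups(metric: str):
    """Return (group_key, metric_value) pairs, largest (worst) score first."""
    ranked = sorted(scores[metric].items(), key=lambda item: item[1], reverse=True)
    return [
        (tuple(scores[column][row] for column in key_columns), value)
        for row, value in ranked
    ]


worst_first = rank_groups("smape")
```

When working with the returned DataFrame directly, sorting on the metric column with sort_values gives the same ordering.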

Method arguments:

horizon: A pandas.Timedelta string consisting of two parts: an integer and a periodicity. For example, if the training data is daily, consists of 5 years of data, and the end-use for the project is to predict 14 days of future values every week, a plausible horizon value might be "21 days" or "28 days". See pandas documentation for information on the allowable syntax and format for pandas.Timedelta values.
metrics: A list of metrics that will be calculated following the back-testing cross validation. By default, all of the following will be tested:

“mae” (mean absolute error)
“mape” (mean absolute percentage error)
“mdape” (median absolute percentage error)
“mse” (mean squared error)
“rmse” (root mean squared error)
“smape” (symmetric mean absolute percentage error)

To restrict the metrics computed and returned, a subset of these tests can be supplied to the metrics argument.

period: The frequency at which each windowed collection of back testing cross validation will be conducted. If the argument cutoffs is left as None, this argument will determine the spacing between training and validation sets as the cross validation algorithm steps through each series. Smaller values will increase cross validation execution time.
initial: The size of the initial training period to use for cross validation windows. The default derived value, if not specified, is horizon * 3 with cutoff values for each window set at horizon / 2.
parallel: Mode of operation for calculating cross validation windows. None for serial execution, 'processes' for multiprocessing pool execution, and 'threads' for thread pool execution.
cutoffs: Optional control mode that allows for defining specific datetime values in pandas.Timestamp format to determine where to conduct train and test split boundaries for validation of each window.
kwargs: Individual optional overrides to prophet.diagnostics.cross_validation() and prophet.diagnostics.performance_metrics() functions. See the prophet docs for more information.
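To show how horizon, period, and initial interact, the sketch below computes approximate cutoff dates for a single series, using the defaults described above (initial of horizon * 3 and period of horizon / 2). It is a simplification written with the standard library: the function name sketch_cutoffs is hypothetical, and Prophet's own logic additionally handles gaps in the data, which is not reproduced here.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def sketch_cutoffs(
    series_start: datetime,
    series_end: datetime,
    horizon: timedelta,
    period: Optional[timedelta] = None,
    initial: Optional[timedelta] = None,
) -> List[datetime]:
    """Approximate the cutoff dates for backtesting windows on one series."""
    period = horizon / 2 if period is None else period  # documented default
    initial = horizon * 3 if initial is None else initial  # documented default
    cutoffs = []
    cutoff = series_end - horizon  # the last window must leave a full horizon
    while cutoff >= series_start + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# One year of data with a 28 day horizon: windows step back every 14 days.
cutoffs = sketch_cutoffs(
    datetime(2021, 1, 1), datetime(2021, 12, 31), horizon=timedelta(days=28)
)
```

Halving period roughly doubles the number of windows, which is why smaller values increase cross validation execution time.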

Cross Validation 

The diviner.GroupedProphet.cross_validate() method is a wrapper around the Prophet function prophet.diagnostics.cross_validation(). It is intended to be used as a debugging tool for the ‘automated’ metric calculation method (see Cross Validation and Scoring). The arguments for this method are:

horizon: A timedelta formatted string in the Pandas.Timedelta format that defines the amount of time to utilize for generating a validation dataset that is used for calculating loss metrics per each cross validation window iteration. Example horizons: ("30 days", "24 hours", "16 weeks"). See the pandas Timedelta docs for more information on supported formats and syntax.
period: The periodicity of how often a windowed validation will be constructed. Smaller values here will take longer as more ‘slices’ of the data will be made to calculate error metrics. The format is the same as that of the horizon (i.e. "60 days").
initial: The minimum size of data that will be used to build the cross validation window. Values that are excessively small may cause issues with the effectiveness of the estimated overall prediction error and lead to long cross validation runtimes. This argument is in the same format as horizon and period, a pandas.Timedelta format string.
parallel: Selection on how to execute the cross validation windows. Supported modes: (None, 'processes', or 'threads'). Due to the reuse of the originating dataset for window slice selection, a shared memory instance mode 'threads' is recommended over using 'processes' mode.
cutoffs: Optional arguments for specified pandas.Timestamp values to define where boundaries should be within the group series values. If this is specified, the period and initial arguments are not used.

Note

For information on how cross validation works within the Prophet library, see https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation

The return type of this method is a dictionary of {<group_key>: <pandas DataFrame>}, the DataFrame containing the cross validation window scores across time horizon splits.

Performance Metrics 

The calculate_performance_metrics method is a debugging tool that wraps the function performance_metrics from Prophet. Usage of this method will generate the defined metric scores for each cross validation window, returning a dictionary of {<group_key>: <DataFrame of metrics for each window>}

Method arguments:

cv_results: The output of cross_validate.
metrics: Optional subset list of metrics. See the signature for cross_validate_and_score() for supported metrics.
rolling_window: Defines the fractional amount of data to use in each rolling window to calculate the performance metrics. Must be in the range [0, 1].
monthly: Boolean value that, if set to True, will collate the windows to ensure that horizons are computed as a factor of months of the year from the cutoff date. This is only useful if the data has a yearly seasonality component to it that relates to day of month.
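For reference, the six error metrics named in the Cross Validation and Scoring section can be sketched with their common textbook definitions. This is only an illustration on one flat set of predictions: the function name score_window is hypothetical, and the rolling-window averaging controlled by rolling_window is not reproduced.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def score_window(y: Sequence[float], yhat: Sequence[float]) -> Dict[str, float]:
    """Compute the six listed error metrics for one set of predictions."""
    errors = [actual - predicted for actual, predicted in zip(y, yhat)]
    abs_pct = [abs(error) / abs(actual) for error, actual in zip(errors, y)]
    mse = mean(error ** 2 for error in errors)
    return {
        "mae": mean(abs(error) for error in errors),
        "mape": mean(abs_pct),
        "mdape": median(abs_pct),
        "mse": mse,
        "rmse": math.sqrt(mse),
        "smape": mean(
            2 * abs(error) / (abs(actual) + abs(predicted))
            for error, actual, predicted in zip(errors, y, yhat)
        ),
    }


window_scores = score_window(y=[100.0, 200.0], yhat=[110.0, 180.0])
```

The percentage-based metrics divide by the actual values, so this sketch fails with a ZeroDivisionError if a series contains zeros.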

Class Signature 

class diviner.GroupedProphet(**kwargs)

A class for executing multiple Prophet models from a single submitted DataFrame. The structure of the input DataFrame to the fit method must have defined grouping columns that are used to build the per-group processing dataframes and models for each group. The names of these columns, passed in as part of the fit method are required to be present in the DataFrame schema. Any parameters that are needed to override Prophet parameters can be submitted as kwargs to the fit and predict methods. These settings will be overridden globally for all grouped models within the submitted DataFrame.

For the Prophet initialization constructor, showing which arguments are available to be passed in as kwargs in this class constructor, see: https://github.com/facebook/prophet/blob/main/python/prophet/forecaster.py

calculate_performance_metrics(cv_results, metrics=None, rolling_window=0.1, monthly=False)

Model debugging utility function for evaluating performance metrics from the grouped cross validation extract. This will output a metric table for each time boundary from cross validation, for each model. note: This output will be very large and is intended to be used as a debugging tool only.

Parameters

cv_results – The output of cross_validate
metrics – (Optional) overrides (subtractive) for metrics to generate for this function’s output. note: see supported metrics in Prophet documentation: (https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation) note: any model in the collection that was fit with the argument uncertainty_samples set to 0 will have the metric 'coverage' removed from evaluation because `yhat_error` values are not calculated with that configuration.
rolling_window – Defines how much data to use in each rolling window as a range of [0, 1] for computing the performance metrics.
monthly – If set to true, will collate the windows to ensure that horizons are computed as number of months from the cutoff date. Only useful for date data that has yearly seasonality associated with calendar day of month.

Returns

Dictionary of {'group_key': <performance metrics per window pandas DataFrame>}

cross_validate(horizon, period=None, initial=None, parallel=None, cutoffs=None)

Utility method for generating the cross validation dataset for each grouping key. This is a wrapper around prophet.diagnostics.cross_validation and uses the same signature arguments as that function. It applies each globally to all groups. note: the output of this will be a Pandas DataFrame for each grouping key per cutoff boundary in the datetime series. The output of this function will be many times larger than the original input data utilized for training of the model.

Parameters

horizon – pandas Timedelta formatted string (i.e. '14 days' or '18 hours') to define the amount of time to utilize for a validation set to be created.
period – the periodicity of how often a windowed validation will occur. Default is 0.5 * horizon value.
initial – The minimum amount of training data to include in the first cross validation window.
parallel – mode of computing cross validation statistics. One of: (None, 'processes', or 'threads')
cutoffs – List of pandas Timestamp values that specify cutoff overrides to be used in conducting cross validation.

Returns

Dictionary of {'group_key': <cross validation Pandas DataFrame>}

cross_validate_and_score(horizon, period=None, initial=None, parallel=None, cutoffs=None, metrics=None, **kwargs)

Metric scoring method that will run backtesting cross validation scoring for each time series specified within the model after a fit has been performed.

Note: If the configuration overrides for the model during fit set uncertainty_samples=0, the metric coverage will be removed from metrics calculation, saving a great deal of runtime overhead since the prediction errors (yhat_upper, yhat_lower) will not be calculated.

Note: overrides to functionality of both cross_validation and performance_metrics within Prophet’s diagnostics module are handled here as kwargs. These arguments in this method’s signature are directly passed, per model, to prophet’s cross_validation function.

Parameters

horizon – String pandas Timedelta format that defines the length of forecasting values to generate in order to acquire error metrics. examples: '30 days', '1 year'
metrics – Specific subset list of metrics to calculate and return. note: see supported metrics in Prophet documentation: https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation note: The coverage metric will be removed if error estimates are not configured to be calculated as part of the Prophet fit method because uncertainty_samples=0 was set within the GroupedProphet fit method.
period – the periodicity of how often a windowed validation will occur. Default is 0.5 * horizon value.
initial – The minimum amount of training data to include in the first cross validation window.
parallel – mode of computing cross validation statistics. Supported modes: (None, 'processes', or 'threads')
cutoffs – List of pandas Timestamp values that specify cutoff overrides to be used in conducting cross validation.
kwargs – cross validation overrides to Prophet’s prophet.diagnostics.cross_validation and prophet.diagnostics.performance_metrics functions

Returns

A consolidated Pandas DataFrame containing the specified metrics to test as columns with each row representing a group.

extract_model_params()

Utility method for extracting all model parameters from each model within the processed groups.

Returns: A consolidated pandas DataFrame containing the model parameters as columns with each row entry representing a group.

fit(df, group_key_columns, y_col='y', datetime_col='ds', **kwargs)

Main fit method for executing a Prophet fit on the submitted DataFrame, grouped by the group_key_columns submitted. When initiated, the input DataFrame df will be split into an iterable collection that represents a core series to be fit against. This fit method is a per-group wrapper around Prophet’s fit implementation. See: https://facebook.github.io/prophet/docs/quick_start.html for information on the basic API, as well as links to the source code that will demonstrate all of the options available for overriding default functionality. For a full description of all parameters that are available to the optimizer, run the following in a shell:

Retrieving pystan parameters

import pystan

help(pystan.StanModel.optimizing)

Parameters

df –
Normalized pandas DataFrame containing group_key_columns, a 'ds' column, and a target 'y' column. An example normalized data set to be used in this method:

region	zone	ds	y
northeast	1	'2021-10-01'	1234.5
northeast	2	'2021-10-01'	3255.6
northeast	1	'2021-10-02'	1255.9
group_key_columns – The columns in the df argument that define, in aggregate, a unique time series entry. For example, with the DataFrame referenced in the df param, group_key_columns could be: ('region', 'zone'). Specifying an incomplete grouping collection (e.g., ('region')), while valid through this API, can cause serious problems with any forecast that is built with this API. Ensure that all relevant keys are defined in the input df and declared in this param so that the appropriate per-univariate series data is used to train each model.
y_col – The name of the column within the DataFrame input to any method within this class that contains the endogenous regressor term (the raw data that will be used to train and use as a basis for forecasting).
datetime_col – The name of the column within the DataFrame input that defines the datetime or date values associated with each row of the endogenous regressor (y_col) data.
kwargs – overrides for underlying Prophet .fit() **kwargs (i.e., optimizer backend library configuration overrides). For further information, see: https://facebook.github.io/prophet/docs/diagnostics.html#hyperparameter-tuning

Returns

object instance (self) of GroupedProphet

forecast(horizon: int, frequency: str)

Forecasting method that will automatically generate forecasting values where the 'ds' datetime value from the fit DataFrame left off. For example: If the last datetime value in the training data is '2021-01-01 00:01:00', with a specified frequency of '1 day', the beginning of the forecast value will be '2021-01-02 00:01:00' and will continue at a 1 day frequency for horizon number of entries. This implementation wraps the Prophet library’s prophet.forecaster.Prophet.make_future_dataframe method.

Note: This will generate a forecast for each group that was present in the fit input DataFrame df argument. Time horizon values are dependent on the per-group 'ds' values for each group, which may result in different datetime values if the source fit DataFrame did not have consistent datetime values within the 'ds' column for each group.

Note: For full listing of supported periodicity strings for the frequency parameter, see: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

Parameters

horizon – The number of row events to forecast
frequency – The frequency (periodicity) of Pandas date_range format (i.e., 'D', 'M', 'Y')

Returns

A consolidated (unioned) single DataFrame of forecasts for all groups

classmethod load(path: str)

Load the model from the specified path, deserializing it from its JSON string representation and returning a GroupedProphet instance.

Parameters: path – File system path of a saved GroupedProphet model.
Returns: An instance of GroupedProphet with fit attributes applied.

predict(df, predict_col: str = 'yhat')

Main prediction method for generating forecast values based on the group keys and dates for each that are passed in to this method. The structure of the DataFrame submitted to this method is the same normalized format that fit takes as a DataFrame argument. i.e.:

region	zone	ds
northeast	1	‘2021-10-01’
northeast	2	‘2021-10-01’
northeast	1	‘2021-10-02’

Parameters

df – Normalized DataFrame consisting of grouping key entries and the dates to forecast for each group.
predict_col – The name of the column in the output DataFrame that contains the forecasted series data.

Returns

A consolidated (unioned) single DataFrame of all groups’ forecasts

predict_groups(groups: List[Tuple[str]], horizon: int, frequency: str, predict_col: str = 'yhat', on_error: str = 'raise')

This is a prediction method that allows for generating a subset of forecasts based on the collection of keys.

Parameters

groups –
List[Tuple[str]] the collection of group(s) for which to generate forecast predictions. The group definitions must be the values within the group_key_columns that were used during the fit of the model in order to return valid forecasts.

Note

The positional ordering of the values is important and must match the order of group_key_columns supplied to fit in order to provide correct prediction forecasts.
horizon – The number of row events to forecast
frequency – The frequency (periodicity) of Pandas date_range format (i.e., 'D', 'M', 'Y')
predict_col – The name of the column in the output DataFrame that contains the forecasted series data. Default: "yhat"
on_error –
Alert level setting for handling mismatched group keys. Default: "raise". The valid modes are:
- "ignore" - no logging or exception raising will occur if a submitted group key in the groups argument is not present in the model object.
  
  Note
  
  This is a silent failure mode and will not present any indication of a failure to generate forecast predictions.
- "warn" - any keys that are not present in the fit model will be recorded as logged warnings.
- "raise" - any keys that are not present in the fit model will cause a DivinerException to be raised.

Returns

A consolidated (unioned) single DataFrame of forecasts for all groups specified in the groups argument.

save(path: str)

Serialize the model as a JSON string and write it to the provided path. note: The model must be fit in order to save it.

Parameters: path – Location on the file system to store the model.
Returns: None
