Grouped Prophet
The Grouped Prophet model is a multi-series orchestration framework for building multiple individual models of related but isolated series data. For example, a project that requires forecasting airline passengers at major airports around the world would historically require individual orchestration of data acquisition, hyperparameter definitions, model training, metric validation, serialization, and registration of thousands of individual models.
This API consolidates the many thousands of models that would otherwise need to be implemented, trained, and managed individually throughout their frequent retraining and forecasting lifecycles into a single high-level API that simplifies these common use cases built on the Prophet forecasting library.
Grouped Prophet API
The following sections provide a basic overview of the GroupedProphet API: fitting the grouped models, forecasting and predicting, saving, loading, and customizing the underlying Prophet instances.
To see working end-to-end examples, you can go to Tutorials and Examples. The examples will allow you to explore the data structures required for training, how to extract forecasts for each group, and demonstrations of the saving and loading of trained models.
Model fitting
To fit a GroupedProphet model instance, call the fit method. This method processes the input DataFrame to create a grouped execution collection, fits a Prophet model on each individual series, and persists the trained state of each group's model to the object instance.
The arguments for the fit method are:
- df
A 'normalized' DataFrame that contains an endogenous regressor column (the 'y' column), a date or datetime column that defines the ordering, periodicity, and frequency of each series (if this column contains strings, the frequency will be inferred), and the grouping column(s) that define the discrete series to be modeled. For further information on the structure of this DataFrame, see the quickstart guide.
- group_key_columns
The names of the columns within df that, when combined (in the order supplied), define distinct series. See the quickstart guide for further information.
- kwargs
[Optional] Overrides for the pystan optimizer used by Prophet. To see which parameters are available and how they might affect the optimization of the model, run help(pystan.StanModel.optimizing) from a Python REPL.
Example:
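```python
from diviner import GroupedProphet

# df is a normalized DataFrame with "country", "region", "ds", and "y" columns
grouped_prophet_model = GroupedProphet().fit(df, ["country", "region"])
```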
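Conceptually, fit partitions the long-format df into one series per distinct combination of group_key_columns values, with each key built in the order the columns are supplied. The sketch below mimics that partitioning with the standard library on a list of row dicts (using the sample rows from the fit reference below). split_into_groups is a hypothetical helper, not part of Diviner, and its duplicate-date check is an added assumption showing how an incomplete set of grouping columns can be detected.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_into_groups(
    rows: List[dict],
    group_key_columns: Sequence[str],
    datetime_col: str = "ds",
    y_col: str = "y",
) -> Dict[Tuple, List[Tuple[str, float]]]:
    """Hypothetical helper: partition long-format rows into per-group series.

    Each group key is a tuple built in the order of group_key_columns, which is
    why key order matters whenever a group is referenced later.
    """
    groups: Dict[Tuple, List[Tuple[str, float]]] = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_key_columns)
        groups[key].append((row[datetime_col], row[y_col]))

    for key, series in groups.items():
        series.sort()  # ISO date strings sort chronologically
        dates = [ds for ds, _ in series]
        # A repeated date inside one group usually means a grouping column is missing.
        if len(dates) != len(set(dates)):
            raise ValueError(f"Duplicate {datetime_col} values in group {key}")
    return dict(groups)


rows = [
    {"region": "northeast", "zone": 1, "ds": "2021-10-01", "y": 1234.5},
    {"region": "northeast", "zone": 2, "ds": "2021-10-01", "y": 3255.6},
    {"region": "northeast", "zone": 1, "ds": "2021-10-02", "y": 1255.9},
]

series_by_group = split_into_groups(rows, ["region", "zone"])
print(sorted(series_by_group))
```

Calling split_into_groups(rows, ["region"]) raises a ValueError, because dropping "zone" merges two distinct series that share the date 2021-10-01.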
Forecast
The forecast method is the primary means of generating future forecast predictions. For each group trained during Model fitting, horizon time periods are predicted, starting after the last date (or datetime) in that group's series.
Usage of this method requires providing two arguments:
- horizon
The number of events to forecast (supplied as a positive integer).
- frequency
The periodicity between each forecast event. This value does not have to match the periodicity of the training data (e.g., training data can be in days and predictions can be in months, minutes, hours, or years).
The allowed frequency abbreviations are listed in the pandas time series documentation (offset aliases).
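Because each group's forecast starts from that group's own last timestamp, groups whose training data end on different dates receive different forecast dates. The sketch below illustrates this for a fixed daily step using only datetime; future_dates and the last_seen values are hypothetical, and the helper handles a fixed timedelta only, whereas the real frequency argument accepts any pandas offset alias.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def future_dates(last_date: date, horizon: int, step: timedelta) -> List[date]:
    """Hypothetical helper: the `horizon` dates that follow `last_date` at a fixed step."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive integer")
    return [last_date + step * (i + 1) for i in range(horizon)]


# Last training date per group (hypothetical values).
last_seen: Dict[Tuple[str, str], date] = {
    ("US", "NY"): date(2022, 1, 31),
    ("FR", "Paris"): date(2022, 1, 29),
}

# Each group continues from its own last date, so the two index ranges differ.
forecast_index = {
    group: future_dates(last, horizon=3, step=timedelta(days=1))
    for group, last in last_seen.items()
}
for group, dates in forecast_index.items():
    print(group, [d.isoformat() for d in dates])
```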
Note
The generation of error estimates (yhat_lower and yhat_upper) in forecast output is controlled through the Prophet argument uncertainty_samples, which is set during class instantiation, before Model fitting is called. Setting this value to 0 eliminates the error estimates and dramatically increases the speed of prediction and cross validation.
This method returns a 'stacked' pandas DataFrame consisting of the grouping keys (in the order in which they were generated), the grouping columns, the components of the prediction (deconstructed; e.g., the 'weekly', 'yearly', and 'daily' seasonality terms and the 'trend'), the date (datetime) values, and the prediction itself (labeled yhat).
Predict
A 'manual' method of generating predictions for discrete date (or datetime) values for each specified group. This method accepts as input a DataFrame whose columns define the discrete dates to generate predictions for, along with the grouping key columns that match those supplied when the model was fit.
For example, a model trained with the grouping key columns 'country' and 'city' that included New York City, US and Toronto, Canada as series would generate predictions for both of these cities if the following df argument were supplied:
```python
import pandas as pd

# One row per (group, date) combination to predict
predict_config = pd.DataFrame(
    {
        "country": ["us", "us", "ca", "ca"],
        "city": ["nyc", "nyc", "toronto", "toronto"],
        "ds": ["2022-01-01", "2022-01-08", "2022-01-01", "2022-01-08"],
    }
)
grouped_prophet_model.predict(predict_config)
```
The structure of this submitted DataFrame for the above use case is:

| country | city | ds |
|---|---|---|
| us | nyc | 2022-01-01 |
| us | nyc | 2022-01-08 |
| ca | toronto | 2022-01-01 |
| ca | toronto | 2022-01-08 |
Passing this df generates 4 individual predictions, one for each row.
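With more than a handful of groups, writing the predict df by hand is error-prone. One approach, sketched below under the assumption that every group needs the same set of dates, builds the rows as the Cartesian product of group keys and dates with itertools.product. The resulting dict of lists has the same shape as predict_config above and could be passed to pd.DataFrame.

```python
from itertools import product

group_key_columns = ["country", "city"]
groups = [("us", "nyc"), ("ca", "toronto")]
dates = ["2022-01-01", "2022-01-08"]

# One output row per (group, date) pair; column order follows group_key_columns.
columns = {col: [] for col in group_key_columns}
columns["ds"] = []
for group, ds in product(groups, dates):
    for col, value in zip(group_key_columns, group):
        columns[col].append(value)
    columns["ds"].append(ds)

print(columns)
```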
Note
The Forecast method is more appropriate for most use cases because it begins immediately after each group's training data ends.
Predict Groups
The predict_groups method generates forecast data for a subset of the groups that a diviner.GroupedProphet model was trained on.
Example:
```python
from diviner import GroupedProphet

model = GroupedProphet().fit(df, ["country", "region"])
subset_forecasts = model.predict_groups(
    groups=[("US", "NY"), ("FR", "Paris"), ("UA", "Kyiv")],
    horizon=90,
    frequency="D",
    on_error="warn",
)
```
The arguments for the predict_groups method are:
- groups
A collection of one or more groups for which to generate a forecast. The collection must be submitted as a List[Tuple[str]] so that the order-specific group values identify the correct model. For instance, if the model was trained with group_key_columns of ["country", "city"], a valid groups entry would be [("US", "LosAngeles"), ("CA", "Toronto")]. Changing the order within the tuples will not resolve (e.g., [("NewYork", "US")] would not find the appropriate model).
Note
By default, submitting groups for prediction that are not present in the trained model raises an Exception. This behavior can be changed to a warning or to silently ignoring the group with the on_error argument.
- horizon
The number of events to forecast (supplied as a positive integer).
- frequency
The periodicity between each forecast event. This value does not have to match the periodicity of the training data (e.g., training data can be in days and predictions can be in months, minutes, hours, or years).
The allowed frequency abbreviations are listed in the pandas time series documentation (offset aliases).
- predict_col
[Optional] The name to use for the generated column containing forecasted data. Default: "yhat"
- on_error
[Optional] [Default: "raise"] Dictates the behavior for handling group keys submitted in the groups argument that do not match a group identified and registered during training (fit). The modes are:
"raise"
A DivinerException is raised if any supplied groups do not match the fitted groups.
"warn"
A warning is emitted (printed) and logged for any groups that do not match those that the model was fit with.
"ignore"
Invalid groups are silently skipped during prediction.
Note
A DivinerException will still be raised in "ignore" mode if none of the groups provided to this method match a fitted group.
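The three on_error modes can be pictured as a filter applied to the requested groups before any model is called. The sketch below is a hypothetical illustration, not Diviner's implementation: resolve_groups and GroupNotFoundError (standing in for DivinerException) are invented names, and fitted models are represented by a plain dict keyed by group tuples.

```python
import logging
import warnings
from typing import Dict, Iterable, List, Tuple

logger = logging.getLogger(__name__)


class GroupNotFoundError(Exception):
    """Hypothetical stand-in for DivinerException."""


def resolve_groups(
    requested: Iterable[Tuple[str, ...]],
    fitted: Dict[Tuple[str, ...], object],
    on_error: str = "raise",
) -> List[Tuple[str, ...]]:
    """Return the requested groups that have a fitted model, handling misses per on_error."""
    if on_error not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unsupported on_error mode: {on_error!r}")
    requested = [tuple(group) for group in requested]
    # Lookup is by exact tuple, so ("NY", "US") does not match ("US", "NY").
    valid = [group for group in requested if group in fitted]
    missing = [group for group in requested if group not in fitted]

    if missing and on_error == "raise":
        raise GroupNotFoundError(f"Groups not present in the fitted model: {missing}")
    if missing and on_error == "warn":
        message = f"Skipping groups not present in the fitted model: {missing}"
        warnings.warn(message)
        logger.warning(message)
    # Even in "ignore" mode, having nothing valid to predict is an error.
    if not valid:
        raise GroupNotFoundError("None of the requested groups match the fitted groups")
    return valid
```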
Save
Supports saving a GroupedProphet model that has been fit. The serialization of the model instance does not rely on pickle or cloudpickle; it uses straightforward JSON serialization.
```python
save_location = "/path/to/store/model"
grouped_prophet_model.save(save_location)
```
Load
Loading a saved GroupedProphet model is done through a class method. The load method is called as below:
```python
from diviner import GroupedProphet

load_location = "/path/to/stored/model"
grouped_prophet_model = GroupedProphet.load(load_location)
```
Note
The PyStan backend optimizer instance used to fit the model is not saved. Reusing it would require compiling PyStan on the same machine configuration that was used to fit the model, it is not needed for cross validation, parameter extraction, forecasting, or predicting, and storing it would add dependencies. If you need access to the PyStan backend, retrain the model and access the underlying solver before serializing to disk.
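JSON object keys must be strings, so any JSON-based format for a grouped model has to encode tuple group keys in some way. The sketch below shows one possible encoding with the json module. The layout (a list of {"group": [...], "model": ...} entries) and the function names are assumptions for illustration, not the file format that Diviner writes.

```python
import json
from typing import Dict, Tuple


def dumps_grouped(models: Dict[Tuple, dict]) -> str:
    """Serialize per-group model payloads; tuple keys become JSON arrays."""
    entries = [{"group": list(key), "model": payload} for key, payload in models.items()]
    return json.dumps(entries)


def loads_grouped(text: str) -> Dict[Tuple, dict]:
    """Rebuild the mapping, converting JSON arrays back into tuple keys."""
    return {tuple(entry["group"]): entry["model"] for entry in json.loads(text)}


models = {("US", "NY"): {"n_changepoints": 90}, ("FR", "Paris"): {"n_changepoints": 25}}
restored = loads_grouped(dumps_grouped(models))
print(restored == models)
```

Passing the tuple-keyed dict straight to json.dumps would raise a TypeError, which is why the keys are converted to lists first.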
Overriding Prophet settings
No arguments are required to create a GroupedProphet instance. As with the underlying Prophet library, omitting them fits each model with Prophet's default values.
However, arguments can be supplied that are passed through to the individual Prophet instances created for each group. Since these are **kwargs entries, their names are the names of the respective arguments in Prophet.
To see a full listing of available arguments for the version of Prophet that you are using, the simplest approach (and the one recommended in the library documentation) is to run the help() command in a Python REPL:
```python
from prophet import Prophet

help(Prophet)
```
An example of overriding many of the arguments within the underlying Prophet model for the GroupedProphet API is shown below.
```python
from diviner import GroupedProphet

# Each keyword argument is passed through to every per-group Prophet instance
grouped_prophet_model = GroupedProphet(
    growth="linear",
    changepoints=None,
    n_changepoints=90,
    changepoint_range=0.8,
    yearly_seasonality="auto",
    weekly_seasonality="auto",
    daily_seasonality="auto",
    holidays=None,
    seasonality_mode="additive",
    seasonality_prior_scale=10.0,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    mcmc_samples=0,
    interval_width=0.8,
    uncertainty_samples=1000,
    stan_backend=None,
)
```
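The pass-through behavior amounts to constructing every per-group model from the same keyword arguments. The sketch below shows that pattern with a hypothetical StubProphet class standing in for prophet.Prophet so that it runs without Prophet installed; build_group_models is likewise an invented name, not Diviner's code.

```python
from typing import Any, Dict, Iterable, Tuple


class StubProphet:
    """Hypothetical stand-in for prophet.Prophet that only records its settings."""

    def __init__(self, **kwargs: Any) -> None:
        self.settings = dict(kwargs)


def build_group_models(
    groups: Iterable[Tuple[str, ...]], **prophet_kwargs: Any
) -> Dict[Tuple[str, ...], StubProphet]:
    # Every group gets its own model instance built from the same overrides.
    return {group: StubProphet(**prophet_kwargs) for group in groups}


models = build_group_models(
    [("US", "NY"), ("FR", "Paris")],
    n_changepoints=90,
    uncertainty_samples=0,
)
print({group: model.settings for group, model in models.items()})
```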
Utilities
Parameter Extraction
The extract_model_params method is a utility that extracts the tuning parameters from each individual model within the model's container and returns them as a single DataFrame. Columns are the models' parameters, and each row holds one group's Prophet model parameter values.
Having a single consolidated extraction data structure eases the historical registration of model performance and simplifies the design of frequent, passive retraining systems (providing an easier means of acquiring prior hyperparameter values for frequently retrained forecasting models).
An example extract from a 2-group model (cast to a dictionary from the pandas DataFrame output) is shown below:
```
{'changepoint_prior_scale': {0: 0.05, 1: 0.05},
 'changepoint_range': {0: 0.8, 1: 0.8},
 'component_modes': {0: {'additive': ['yearly',
                                      'weekly',
                                      'additive_terms',
                                      'extra_regressors_additive',
                                      'holidays'],
                         'multiplicative': ['multiplicative_terms',
                                            'extra_regressors_multiplicative']},
                     1: {'additive': ['yearly',
                                      'weekly',
                                      'additive_terms',
                                      'extra_regressors_additive',
                                      'holidays'],
                         'multiplicative': ['multiplicative_terms',
                                            'extra_regressors_multiplicative']}},
 'country_holidays': {0: None, 1: None},
 'daily_seasonality': {0: 'auto', 1: 'auto'},
 'extra_regressors': {0: OrderedDict(), 1: OrderedDict()},
 'fit_kwargs': {0: {}, 1: {}},
 'grouping_key_columns': {0: ('key2', 'key1', 'key0'),
                          1: ('key2', 'key1', 'key0')},
 'growth': {0: 'linear', 1: 'linear'},
 'holidays': {0: None, 1: None},
 'holidays_prior_scale': {0: 10.0, 1: 10.0},
 'interval_width': {0: 0.8, 1: 0.8},
 'key0': {0: 'T', 1: 'M'},
 'key1': {0: 'A', 1: 'B'},
 'key2': {0: 'C', 1: 'L'},
 'logistic_floor': {0: False, 1: False},
 'mcmc_samples': {0: 0, 1: 0},
 'n_changepoints': {0: 90, 1: 90},
 'seasonality_mode': {0: 'additive', 1: 'additive'},
 'seasonality_prior_scale': {0: 10.0, 1: 10.0},
 'specified_changepoints': {0: False, 1: False},
 'stan_backend': {0: <prophet.models.PyStanBackend object at 0x7f900056d2e0>,
                  1: <prophet.models.PyStanBackend object at 0x7f9000523eb0>},
 'start': {0: Timestamp('2018-01-02 00:02:00'),
           1: Timestamp('2018-01-02 00:02:00')},
 't_scale': {0: Timedelta('1459 days 00:00:00'),
             1: Timedelta('1459 days 00:00:00')},
 'train_holiday_names': {0: None, 1: None},
 'uncertainty_samples': {0: 1000, 1: 1000},
 'weekly_seasonality': {0: 'auto', 1: 'auto'},
 'y_scale': {0: 1099.9530489951537, 1: 764.727400507604},
 'yearly_seasonality': {0: 'auto', 1: 'auto'}}
```
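The dictionary above is in the default pandas to_dict() orientation, {column: {row_index: value}}. When the extract needs to be handled one group at a time (for example, to log each group's parameters separately), it can be regrouped into per-row records; in pandas this is DataFrame.to_dict(orient="records"). The standard-library sketch below performs the same pivot on a trimmed-down copy of the extract; columns_to_records is a hypothetical helper.

```python
from typing import Any, Dict, List


def columns_to_records(columnar: Dict[str, Dict[int, Any]]) -> List[Dict[str, Any]]:
    """Pivot {column: {row_index: value}} into one dict per row, ordered by row index."""
    row_indices = sorted({idx for values in columnar.values() for idx in values})
    return [
        {column: values[idx] for column, values in columnar.items() if idx in values}
        for idx in row_indices
    ]


extract = {
    "key0": {0: "T", 1: "M"},
    "key1": {0: "A", 1: "B"},
    "key2": {0: "C", 1: "L"},
    "n_changepoints": {0: 90, 1: 90},
    "y_scale": {0: 1099.9530489951537, 1: 764.727400507604},
}

records = columns_to_records(extract)
for record in records:
    # Rebuild the group key in grouping_key_columns order ('key2', 'key1', 'key0').
    group = (record["key2"], record["key1"], record["key0"])
    print(group, record["n_changepoints"], record["y_scale"])
```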
Cross Validation and Scoring
The primary method of evaluating model performance across all groups is cross_validate_and_score. Calling this method on a fitted GroupedProphet instance performs backtesting of each group's model using the training data set supplied when the fit method was called.
This method returns a single consolidated pandas DataFrame that contains metrics as columns, with each row representing a distinct grouping key.
For example, below is a sample of 3 groups' cross validation metrics (cast to a dictionary from the DataFrame output):
```
{'coverage': {0: 0.21839080459770113,
              1: 0.057471264367816084,
              2: 0.5114942528735632},
 'grouping_key_columns': {0: ('key2', 'key1', 'key0'),
                          1: ('key2', 'key1', 'key0'),
                          2: ('key2', 'key1', 'key0')},
 'key0': {0: 'T', 1: 'M', 2: 'K'},
 'key1': {0: 'A', 1: 'B', 2: 'S'},
 'key2': {0: 'C', 1: 'L', 2: 'Q'},
 'mae': {0: 14.230668998203283, 1: 34.62100210053155, 2: 46.17014668092673},
 'mape': {0: 0.015166533573997266,
          1: 0.05578282899646585,
          2: 0.047658812366283436},
 'mdape': {0: 0.013636314354422746,
           1: 0.05644041426067295,
           2: 0.039153745874603914},
 'mse': {0: 285.42142900120183, 1: 1459.7746527190932, 2: 3523.9281809854906},
 'rmse': {0: 15.197908800171147, 1: 35.520537302480314, 2: 55.06313841955681},
 'smape': {0: 0.015327226830099487,
           1: 0.05774645767583018,
           2: 0.0494437278595581}}
```
Method arguments:
- horizon
A pandas.Timedelta string consisting of two parts: an integer and a periodicity. For example, if the training data is daily and covers 5 years, and the project needs to predict 14 days of future values every week, a plausible horizon value might be "21 days" or "28 days". See the pandas documentation for the allowable syntax and format of pandas.Timedelta values.
- metrics
A list of metrics to calculate after the back-testing cross validation. By default, all of the following are computed:
"mae" (mean absolute error)
"mape" (mean absolute percentage error)
"mdape" (median absolute percentage error)
"mse" (mean squared error)
"rmse" (root mean squared error)
"smape" (symmetric mean absolute percentage error)
"coverage" (fraction of actual values that fall within the yhat_lower to yhat_upper interval)
To restrict the metrics computed and returned, supply a subset of these to the metrics argument.
- period
The spacing between successive back-testing cross validation windows. If the cutoffs argument is left as None, this argument determines the spacing between training and validation sets as the cross validation algorithm steps through each series. Smaller values increase cross validation execution time.
- initial
The size of the initial training period to use for cross validation windows. If not specified, it defaults to horizon * 3, with cutoffs spaced horizon / 2 apart (the default period).
- parallel
Mode of operation for calculating cross validation windows: None for serial execution, 'processes' for multiprocessing pool execution, and 'threads' for thread pool execution.
- cutoffs
Optional control mode that allows defining specific datetime values in pandas.Timestamp format to determine where the train and test split boundaries of each validation window are placed.
- kwargs
Individual optional overrides to the prophet.diagnostics.cross_validation() and prophet.diagnostics.performance_metrics() functions. See the Prophet documentation for more information.
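For reference, the metrics listed above can be written out directly. The sketch below assumes the definitions used in Prophet's diagnostics module (percentage errors relative to the actual value, smape as the mean of 2|y - yhat| / (|y| + |yhat|), and coverage as the share of actuals inside [yhat_lower, yhat_upper]). It scores a single set of points, whereas Prophet aggregates these metrics over rolling windows by horizon, and point_metrics is a hypothetical name rather than the code that cross_validate_and_score runs.

```python
import math
import statistics
from typing import Dict, Sequence


def point_metrics(
    y: Sequence[float],
    yhat: Sequence[float],
    yhat_lower: Sequence[float],
    yhat_upper: Sequence[float],
) -> Dict[str, float]:
    """Compute the default cross validation metrics over one set of points."""
    errors = [a - p for a, p in zip(y, yhat)]
    abs_errors = [abs(e) for e in errors]
    pct_errors = [abs(e / a) for e, a in zip(errors, y)]  # undefined if an actual is 0
    mse = statistics.mean(e * e for e in errors)
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": statistics.mean(abs_errors),
        "mape": statistics.mean(pct_errors),
        "mdape": statistics.median(pct_errors),
        "smape": statistics.mean(
            2 * abs(a - p) / (abs(a) + abs(p)) for a, p in zip(y, yhat)
        ),
        "coverage": statistics.mean(
            1.0 if lo <= a <= hi else 0.0
            for a, lo, hi in zip(y, yhat_lower, yhat_upper)
        ),
    }


scores = point_metrics(
    y=[100.0, 200.0, 300.0, 400.0],
    yhat=[110.0, 190.0, 330.0, 400.0],
    yhat_lower=[95.0, 170.0, 310.0, 380.0],
    yhat_upper=[125.0, 210.0, 350.0, 420.0],
)
print(scores)
```

Here the third actual (300.0) falls below its interval [310.0, 350.0], so coverage is 3 of 4 points.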
Cross Validation
The diviner.GroupedProphet.cross_validate() method is a wrapper around the Prophet function prophet.diagnostics.cross_validation(). It is intended to be used as a debugging tool for the 'automated' metric calculation method (see Cross Validation and Scoring). The arguments for this method are:
- horizon
A timedelta string in the pandas.Timedelta format that defines the amount of time used to generate the validation dataset for calculating loss metrics in each cross validation window. Example horizons: "30 days", "24 hours", "16 weeks". See the pandas Timedelta documentation for more information on supported formats and syntax.
- period
How often a windowed validation is constructed. Smaller values take longer, as more 'slices' of the data are made to calculate error metrics. The format is the same as that of horizon (e.g., "60 days").
- initial
The minimum amount of data used to build the first cross validation window. Excessively small values may reduce the reliability of the estimated overall prediction error and lead to long cross validation runtimes. This argument uses the same pandas.Timedelta string format as horizon and period.
- parallel
How to execute the cross validation windows. Supported modes: None, 'processes', or 'threads'. Because the originating dataset is reused for window slice selection, the shared-memory 'threads' mode is recommended over 'processes' mode.
- cutoffs
Optional list of pandas.Timestamp values that define where the boundaries should be within each group's series. If this is specified, the period and initial arguments are not used.
Note
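For information on how cross validation works within the Prophet library, see the Prophet diagnostics documentation (https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation).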
This method returns a dictionary of {<group_key>: <pandas DataFrame>}, each DataFrame containing the cross validation results (predicted and actual values) for every cutoff window across the time horizon splits.
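To make the relationship between horizon, period, and initial concrete, the sketch below reproduces, in simplified form, how Prophet places cutoffs when cutoffs is not supplied: the last cutoff sits one horizon before the end of the series, and earlier ones step backwards by period for as long as at least initial of history precedes them. It assumes a gap-free series (Prophet additionally adjusts cutoffs that have no data in the following horizon), fills in the documented defaults of period = horizon / 2 and initial = 3 * horizon, and generate_cutoffs is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def generate_cutoffs(
    start: datetime,
    end: datetime,
    horizon: timedelta,
    period: Optional[timedelta] = None,
    initial: Optional[timedelta] = None,
) -> List[datetime]:
    """Simplified cutoff placement for a gap-free series spanning [start, end]."""
    period = period if period is not None else horizon / 2
    initial = initial if initial is not None else horizon * 3
    cutoff = end - horizon
    if cutoff < start + initial:
        raise ValueError("Not enough data for even one cross validation window")
    cutoffs = []
    # Walk backwards from the last possible cutoff, keeping at least `initial` of history.
    while cutoff >= start + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


cutoffs = generate_cutoffs(datetime(2020, 1, 1), datetime(2020, 12, 31), timedelta(days=30))
print(len(cutoffs), cutoffs[0].date(), cutoffs[-1].date())
```

With a 30-day horizon on one year of daily data, the defaults give a 15-day period and a 90-day initial window, so the earliest cutoff must fall on or after 2020-03-31.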
Performance Metrics
The calculate_performance_metrics method is a debugging tool that wraps Prophet's performance_metrics function. This method generates the defined metric scores for each cross validation window, returning a dictionary of {<group_key>: <DataFrame of metrics for each window>}.
Method arguments:
- cv_results
The output of cross_validate.
- metrics
Optional subset list of metrics. See the signature for cross_validate_and_score() for supported metrics.
- rolling_window
The fraction of data to use in each rolling window when calculating the performance metrics. Must be in the range [0, 1].
- monthly
Boolean value that, if set to True, collates the windows so that horizons are computed as a number of months from the cutoff date. This is only useful if the data has a yearly seasonality component related to the day of the month.
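The rolling_window value controls smoothing: after the cross validation results are sorted by horizon, each reported metric averages a trailing window of roughly rolling_window × (number of points). The sketch below shows that idea for MAE only. It is a simplification (Prophet's own performance_metrics groups points that share a horizon before windowing), and rolling_mae_by_horizon is a hypothetical name.

```python
from typing import List, Tuple


def rolling_mae_by_horizon(
    points: List[Tuple[float, float]], rolling_window: float = 0.1
) -> List[Tuple[float, float]]:
    """Map (horizon, abs_error) pairs to (horizon, trailing mean abs error).

    The window holds max(1, int(rolling_window * n)) points after sorting by horizon.
    """
    if not 0 <= rolling_window <= 1:
        raise ValueError("rolling_window must be in [0, 1]")
    ordered = sorted(points)
    w = max(1, int(rolling_window * len(ordered)))
    results = []
    for end in range(w, len(ordered) + 1):
        window = ordered[end - w:end]
        mean_error = sum(err for _, err in window) / w
        # Report the window's mean at the largest horizon it contains.
        results.append((window[-1][0], mean_error))
    return results


# Horizons of 1..10 days whose absolute error grows with the horizon.
pts = [(float(h), float(h)) for h in range(1, 11)]
print(rolling_mae_by_horizon(pts, rolling_window=0.2))
```

A larger rolling_window produces smoother but fewer distinct values; rolling_window=0 falls back to a window of a single point.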
Class Signature
- class diviner.GroupedProphet(**kwargs)
A class for executing multiple Prophet models from a single submitted DataFrame. The input DataFrame passed to the fit method must have defined grouping columns, which are used to build the per-group processing DataFrames and the models for each group. The names of these columns, passed in as part of the fit method, are required to be present in the DataFrame schema. Prophet constructor parameters can be overridden by passing them as kwargs to this class's constructor, and optimizer settings can be overridden through the kwargs of the fit method. These settings are applied globally to all grouped models within the submitted DataFrame. For the Prophet initialization constructor, showing which arguments are available to be passed in as kwargs in this class constructor, see: https://github.com/facebook/prophet/blob/main/python/prophet/forecaster.py
- calculate_performance_metrics(cv_results, metrics=None, rolling_window=0.1, monthly=False)
Model debugging utility function for evaluating performance metrics from the grouped cross validation extract. This outputs a metric table for each time boundary from cross validation, for each model. Note: this output will be very large and is intended to be used as a debugging tool only.
- Parameters
cv_results – The output of cross_validate.
metrics – (Optional) overrides (subtractive) for the metrics to generate for this function's output. Note: see the supported metrics in the Prophet documentation (https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation). Note: any model in the collection that was fit with the argument uncertainty_samples set to 0 will have the metric 'coverage' removed from evaluation, because the yhat_lower and yhat_upper values are not calculated with that configuration.
rolling_window – Defines how much data to use in each rolling window, as a fraction in the range [0, 1], for computing the performance metrics.
monthly – If set to True, collates the windows so that horizons are computed as the number of months from the cutoff date. Only useful for date data that has yearly seasonality associated with the calendar day of month.
- Returns
Dictionary of {'group_key': <performance metrics per window pandas DataFrame>}
- cross_validate(horizon, period=None, initial=None, parallel=None, cutoffs=None)
Utility method for generating the cross validation dataset for each grouping key. This is a wrapper around prophet.diagnostics.cross_validation and uses the same signature arguments as that function, applying each of them globally to all groups. Note: the output contains a pandas DataFrame for each grouping key, with rows for every cutoff boundary in the datetime series, so it will be many times larger than the original input data used to train the model.
- Parameters
horizon – pandas Timedelta formatted string (e.g., '14 days' or '18 hours') that defines the amount of time used to create the validation set.
period – How often a windowed validation occurs. Default is 0.5 * horizon.
initial – The minimum amount of training data to include in the first cross validation window.
parallel – Mode of computing cross validation statistics. One of: None, 'processes', or 'threads'.
cutoffs – List of pandas Timestamp values that specify cutoff overrides to be used in conducting cross validation.
- Returns
Dictionary of {'group_key': <cross validation pandas DataFrame>}
- cross_validate_and_score(horizon, period=None, initial=None, parallel=None, cutoffs=None, metrics=None, **kwargs)
Metric scoring method that runs backtesting cross validation scoring for each time series specified within the model after a fit has been performed. Note: if the model was configured with uncertainty_samples=0, the metric coverage is removed from the metrics calculation, saving a great deal of runtime overhead because the prediction error bounds (yhat_upper, yhat_lower) are not calculated. Note: overrides to the functionality of both cross_validation and performance_metrics within Prophet's diagnostics module are handled here as kwargs. The arguments in this method's signature are passed directly, per model, to Prophet's cross_validation function.
- Parameters
horizon – String in pandas Timedelta format that defines the length of the forecast generated to acquire error metrics. Examples: '30 days', '365 days'
metrics – Specific subset list of metrics to calculate and return. Note: see the supported metrics in the Prophet documentation: https://facebook.github.io/prophet/docs/diagnostics.html#cross-validation Note: the coverage metric is removed if error estimates are not calculated, which is the case when uncertainty_samples=0 is set on the GroupedProphet instance.
period – How often a windowed validation occurs. Default is 0.5 * horizon.
initial – The minimum amount of training data to include in the first cross validation window.
parallel – Mode of computing cross validation statistics. Supported modes: None, 'processes', or 'threads'.
cutoffs – List of pandas Timestamp values that specify cutoff overrides to be used in conducting cross validation.
kwargs – Cross validation overrides to Prophet's prophet.diagnostics.cross_validation and prophet.diagnostics.performance_metrics functions.
- Returns
A consolidated pandas DataFrame containing the specified metrics as columns, with each row representing a group.
- extract_model_params()
Utility method for extracting all model parameters from each model within the processed groups.
- Returns
A consolidated pandas DataFrame containing the model parameters as columns, with each row entry representing a group.
- fit(df, group_key_columns, y_col='y', datetime_col='ds', **kwargs)
Main fit method for executing a Prophet fit on the submitted DataFrame, grouped by the submitted group_key_columns. When initiated, the input DataFrame df is split into an iterable collection in which each element represents a core series to be fit against. This fit method is a per-group wrapper around Prophet's fit implementation. See https://facebook.github.io/prophet/docs/quick_start.html for information on the basic API, as well as links to the source code that demonstrate all of the options available for overriding default functionality. For a full description of all parameters that are available to the optimizer, run the following in a Python REPL: import pystan; help(pystan.StanModel.optimizing)
- Parameters
df – Normalized pandas DataFrame containing the group_key_columns, a 'ds' column, and a target 'y' column. An example normalized data set to be used in this method:

| region | zone | ds | y |
|---|---|---|---|
| northeast | 1 | '2021-10-01' | 1234.5 |
| northeast | 2 | '2021-10-01' | 3255.6 |
| northeast | 1 | '2021-10-02' | 1255.9 |

group_key_columns – The columns in the df argument that define, in aggregate, a unique time series entry. For example, with the DataFrame referenced in the df param, group_key_columns could be ('region', 'zone'). Specifying an incomplete grouping collection (e.g., ('region',)), while valid through this API, can cause serious problems with any forecast built with this API. Ensure that all relevant keys are defined in the input df and declared in this param so that the appropriate univariate series data is used to train each model.
y_col – The name of the column within the DataFrame input to any method within this class that contains the endogenous regressor term (the raw data that will be used to train and use as a basis for forecasting).
datetime_col – The name of the column within the DataFrame input that defines the datetime or date values associated with each row of the endogenous regressor (y_col) data.
kwargs – Overrides for the underlying Prophet .fit() **kwargs (i.e., optimizer backend library configuration overrides). For further information, see https://facebook.github.io/prophet/docs/diagnostics.html#hyperparameter-tuning.
- Returns
object instance (self) of GroupedProphet
- forecast(horizon: int, frequency: str)
Forecasting method that automatically generates forecast values starting where the 'ds' datetime values of the fit DataFrame left off. For example, if the last datetime value in the training data is '2021-01-01 00:01:00' and the specified frequency is 'D' (one day), the forecast begins at '2021-01-02 00:01:00' and continues at a 1-day frequency for horizon entries. This implementation wraps the Prophet library's prophet.forecaster.Prophet.make_future_dataframe method. Note: this generates a forecast for each group that was present in the fit input DataFrame df argument. Time horizon values depend on the per-group 'ds' values, which may result in different datetime values across groups if the source fit DataFrame did not have consistent datetime values within the 'ds' column for each group. Note: for a full listing of supported periodicity strings for the frequency parameter, see https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases
- Parameters
horizon – The number of row events to forecast.
frequency – The frequency (periodicity) in pandas date_range format (e.g., 'D', 'M', 'Y').
- Returns
A consolidated (unioned) single DataFrame of forecasts for all groups
- classmethod load(path: str)
Load the model from the specified path, deserializing it from its JSON string representation and returning a GroupedProphet instance.
- Parameters
path – File system path of a saved GroupedProphet model.
- Returns
An instance of GroupedProphet with fit attributes applied.
- predict(df, predict_col: str = 'yhat')
Main prediction method for generating forecast values based on the group keys and the dates for each that are passed in to this method. The DataFrame submitted to this method uses the same normalized format that fit takes as a DataFrame argument, without the 'y' column, e.g.:

| region | zone | ds |
|---|---|---|
| northeast | 1 | '2021-10-01' |
| northeast | 2 | '2021-10-01' |
| northeast | 1 | '2021-10-02' |

- Parameters
df – Normalized DataFrame consisting of grouping key entries and the dates to forecast for each group.
predict_col – The name of the column in the output DataFrame that contains the forecasted series data.
- Returns
A consolidated (unioned) single DataFrame of all groups' forecasts
- predict_groups(groups: List[Tuple[str]], horizon: int, frequency: str, predict_col: str = 'yhat', on_error: str = 'raise')
A prediction method that generates a subset of forecasts based on the collection of keys.
- Parameters
groups – List[Tuple[str]] the collection of group(s) for which to generate forecast predictions. The group definitions must be the values within the group_key_columns that were used during the fit of the model in order to return valid forecasts.
Note
The positional ordering of the values is important and must match the order of the group_key_columns passed to fit to provide correct forecasts.
horizon – The number of row events to forecast.
frequency – The frequency (periodicity) in pandas date_range format (e.g., 'D', 'M', 'Y').
predict_col – The name of the column in the output DataFrame that contains the forecasted series data. Default: "yhat"
on_error – Alert level setting for handling mismatched group keys. Default: "raise". The valid modes are:
"ignore" - no logging or exception raising will occur if a submitted group key in the groups argument is not present in the model object.
Note
This is a silent failure mode and will not present any indication of a failure to generate forecast predictions.
"warn" - any keys that are not present in the fit model will be recorded as logged warnings.
"raise" - any keys that are not present in the fit model will cause a DivinerException to be raised.
- Returns
A consolidated (unioned) single DataFrame of forecasts for all groups specified in the groups argument.