California Energy Demand Analysis - Part 2

Long-Term Load Forecasting With Facebook Prophet

This is the second of a series of notebooks that dive into the problem of predicting electricity demand through several time series analysis methods. The first notebook, which you can find here, focused on describing and exploring the dataset that will be leveraged to build our forecasting models. As a brief summary, the dataset consists of the hourly energy load for the PG&E electricity system, which serves approximately the two northern thirds of the State of California.

Forecasting can be performed for different time horizons, from just a few hours to years. In this notebook, we will focus on long-term forecasting, up to one year ahead. For this task, we will primarily use Facebook's Prophet library.

All the modeling and analysis presented here were developed using Jupyter Lab. The original Jupyter notebook, with all the necessary code, can be found on my GitHub. I have cleaned the original notebook by removing most of the code cells and keeping the text and plots.


Data Loading

The first step in our process is loading the data. We will use the California EMS Hourly Load data from the years 2014 to 2017.

The historical data is stored in several Excel files which can be found and downloaded from the CAISO website.

Data Preview

Before getting started with the modeling process it is always good to visualize our time series. From the plot below (and in line with our previous EDA) we can tell that:


Modeling

Now that we have loaded in our data we can start building some forecasting models.

For the remainder of this Notebook, we will use Facebook's Prophet library. According to the authors:

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well

I decided to use Prophet because (1) long-term forecasting of energy demand fits fairly well within this definition, and (2) Prophet's API is relatively straightforward and easy to use.

The main steps I will follow in the modeling process are:

  1. Transform the response variable, that is, the time series, to reduce differences in variance (heteroskedasticity)
  2. Set a baseline model
  3. Perform model choosing and hyperparameter selection
  4. Train the final model in our train + validation set and evaluate its performance in our test set

Transforming the Response Variable

As seen in our Exploratory Analysis and in the visualization above, electricity demand seems to have variance roughly proportional to its mean. In other to reduce this changes in variance throughout the year, we will log-transform our time series. Note that other transformations i.e. Box-Cox transformations, are possible as well. The log transform is commonplace in energy forecasting, however.

As can be seen in the plot of the transformed series, the fluctuations around the mean are of a more similar magnitude across different times of the year.

One final step in the data transformation process is to massage our pandas dataframes into a format that works with Prophet models. This requires creating a dataframe with columns nameds ds and y that store the timestamps and observations, respectively.

Training and Validation Scheme

Instead of using cross-validation (which is not straight-forward for time series) we will train our models using 2014 and 2015 data, and perform hyperparameter tuning and model selection using 2016 data as a validation set. Once a final model is chosen, we will report our final model's performance on the first 9 months of 2017.

Metrics: MAPE

Several metrics may be used to evaluate forecast performance. In this case we will use a popular scale-indifferent metric for forecasting evaluation: the Mean Absolute Percentage Error (MAPE), defined as:

$ MAPE = mean(|P_t|)$

with

$P_t = 100 * (y_t - \hat{y_t}) / y_t$

We will report the Mean Absolute Error for our final model as well, to have a better idea of the expected error of our forecast in the original units of the series (MegaWatts in this case).

Baseline Model

We will use a baseline model consisting of predicting a given time period demand with the equivalent from the previous year, that is, shifting our Train set by 8760 hours.

The performance of this simple method is surprisingly accurate, with a MAPE of 7.03% on the validation set.

Simple Prophet Model

We now proceed to use Prophet to fit a model that, hopefully, will perform better than our simple baseline.

The typical process to define, train and evaluate a Prophet model is:

  1. Define the model using the Prophet class setting the relevant parameters for trend, seasonality, holidays and external regressors if needed
  2. Fit the model using the train set
  3. Create the future dataframe, which contains the train and validation (or testing) timestamps
  4. Forecast our time series for the future dataframe
  5. Visualize and evaluate the forecast

Let's visualize our simple forecast:

Prophet also allows us to plot the components of the model: seasonalities and trend. As can be seen below, this simple model captures some key factors of the Hourly Energy demand:

This model achieves a MAPE of 9.16% on the validation set, which is worse than the performance of our naive baseline model!

Improving our Prophet Model

We will try to improve our previous model by using the following features, inspired by our Exploratory Data Analysis:

A Note on External Regressors:

As seen in our Exploratory Data Analysis notebook, weather variables are heavily correlated with energy demand. This Kaggle dataset, which was compiled using Weather API, contains hourly temperature data for several cities, including San Francisco and Los Angeles, which could be used as external regressors in our models. However, since we are forecasting for time horizons of several months it would not be realistic to know the hourly temperature in advance. Therefore, we will not include temperature for any of our long-term forecasting models.

Holidays and Conditional Seasonality

Let's add our conditional daily seasonality and a series of holidays to our Prophet model. The relevant code can be found below.

Note that the holidays are passed into our model using a pandas dataframe with the relevant holiday dates, as well as the number od days before and after the holiday that should be considered affected. For example, we could incluce Christmas Day and Christmas Eve. In this particular case, we have considered the day before and the day after for the following holidays:

The conditional daily seasonality is defined using extra set of dummy variables that indicate, for each time step, if the observation corresponds to Summer or not and if the observation corresponds to a workday (weekday) or not.

Now, let us visualize our forecast and its components. It is clear that the daily seasonality in summer is roughly unimodal while the daily seasonality in non-summer months is roughly bi-modal. Weekends seem to have lower demand during the mornings as compared to weekdays for both summer and non-summer months. Later we will evaluate if capturing these different patterns yields a better forecast performance.

After evaluating this model's performance on the validation set (2016 data), we achieve a MAPE of 6.76 %, which shows a significant improvement from our simple Prophet model from earlier and also beats the baseline model's performance.

Hyperparameter Tuning and Final Model

Even though we have beaten the baseline, perhaps we can achieve further improvements by tuning the hyper-parameters of the model. We will focus on "tweaking" two hyperparameters:

After fitting a total of 9 different models, corresponding to all the possible combinations of values for changepoint_prior_scale and seasonality_prior_scale, it appears that the best performance on our validation set,based on MAPE, is achieved for:

This model yields a MAPE of 4.23% on the validation set.


Diagnosing our Final Model

We now refit our model on the training set using the best combination of hyper-parameters according to our validation procedure. After that, we take a deeper look into the components of the model and diagnose the fit that we are achieving on the validation set using concepts from classical time series analysis.

Let's visualize our forecast along with the actual validation set observations. Additionally, we can plot the trendline and its changepoints that our model has "learned". From the chart below we can tell that our model does not appropriately forecast peaks in energy demand, especially for the summer months.

We can visualize one more time the different seasonalities defined in our model, starting with the four different daily seasonalities, plotted below for comparison. These are very similar to the ones from the not-tuned model. Likewise, we see that the weekly seasonality is similar to the pattern that already appeared in our simple model, with a dip in demand for the weekends and Mondays.

In order to better understand how our model compares to the actual energy demand, let's look at the forecast for three particular months in 2016: March, July and October.

As can be seen, the demand pattern changes significantly across different times of the year. Our model is able follow these patterns to a certain extent but misses certain peaks in demand, such as the ones in late July, or underestimates the dip in demand during mid-day hours for March and October.

Model Diagnostics

In a similar vein to some classical time series analysis, we will look at the charateristics of the errors of our model on the validation set. Note that this is not technically speaking a residual analysis since residuals are the errors on the training set. However, this will help us understand if our model is failing to capture certain information and to what extent.

Some of the aspects we notice are:


Final Score on the Test Set and Prediction Intervals

Now that we have chosen and understood our forecasting model we will fit it using all the data from 2014 through 2016. Then, we will report its MAPE on our Test Set, the first 9 months of 2017, as a final estimate of the model's performance.

We plot our final forecast on the Test Set, both for the transformed variable (log) and the original units (MW). In the first plot, we leverage Prophet's prediction interval (a 50% prediction interval) to get an idea of where future observations are likely to fall in. Note that these uncertainty intervals are constructed from the uncertainty in the trend. We know that energy demand has very flat trend, which is not likely to change significantly. Thus, it would be more realistic to assess uncertainty using the uncertainty in the seasonality. While Prophet allows that, it requires fairly intense computation so I have kept it out of this notebook for simplicity.

Our final model achieves a MAPE of 5.99% and a Mean Absolute Error (MAE) of 778.59 MW on the Test set.


Take-aways and Next Steps

In this notebook we have defined, trained and fine-tuned a forecasting model for PG&E's hourly electricity demand using Facebook's open source library Prophet.

The modeling process has comprised setting a baseline, transforming the response variable (using the $log$ transform) and performing model selection and hyperparameter tuning with a held-out set.

Our final model is an additive model with the following components:

It is worth noting that our model does not include weather information such as temperature or any auto-regressive features, which is reflected in the high degree of correlation between the forecasting errors. The decision not include these pieces of information was deliberate, as a model forecasting a somewhere between 9 months and a year ahead will not have accurate hourly temperature nor observed demand data at the time of producing the forecast.

While our model only makes use of what we may call "calendar information" it still yields a relatively good performance for most times of the year, achieving a Mean Absolute Percentage Error (MAPE) of 5.99% and a Mean Absolute Error (MAE) of 778.59 MW on our test set, comprised of the first 9 months of the year 2017.

However, our Prophet model is still unable to appropriately predict the significant peaks in demand during the Summer months I believe one way to improve the performance of this model could be to include information from previous years as an external regressor. Another way to improve our model's performance could be to further sophisticate the daily demand patterns by making them conditional on each month or week of the year.

Additionally, this model could be a starting point for models intended to forecast demand at shorter horizons, which could realistically include lagged observations and weather forecasts as extra features.

The next steps in our analysis of PG&E's electricity load will be to develop models for short-term forecasting (one or a few days ahead):