California Energy Demand Analysis - Part 3

Short-Term Load Forecasting With Generalized Additive Models

This is the third of a series of notebooks that dive into the problem of predicting hourly electricity demand through several time series analysis methods.

In this work, we will focus on forecasting hourly energy demand one day ahead (although we will also present a model that performs fairly well for forecasts several days ahead, under certain assumptions). For this task, we will be using Generalized Additive Models (or GAMs), implemented in the PyGAM library. These models have been used for energy forecasting before, for example in this paper by Fan and Hyndman.

All the modeling and analysis presented here was developed using Jupyter Lab. The original Jupyter notebook, with all the necessary code, can be found on my GitHub. I have cleaned the original notebook by removing most of the code cells and keeping the text and plots.

A Very Brief Intro to Generalized Additive Models

Generalized Additive Models (or GAMs) are a generalization of Generalized Linear Models in which the expected value of the response variable is modeled as a sum of smooth functions of the covariates (in some cases a link function is used; in our case, however, the link function is the identity, so we will not have to worry about it).

While the term "smooth function" can be very broad, these functions are normally modeled as smoothing splines. In many implementations, such as PyGAM's, each function is represented by a finite sum of smooth basis functions, like B-splines, which simplifies the optimization problem to be solved.

As a very high-level overview of what fitting smoothing splines looks like, we can say that they aim to balance a fit objective (representing the data well enough) and a smoothness objective (not being too "wiggly"). As you may have noticed, this sounds very similar to a bias-variance tradeoff. In fact, by controlling the smoothness of the terms in our GAMs we will try to find a "sweet spot" between the two (thus, one can think of the smoothing parameter of a GAM more or less as a regularization parameter).
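To make this concrete, here is a small synthetic sketch (not part of the original analysis) of how the smoothing parameter lam of a PyGAM spline term moves a fit along this tradeoff:

```python
import numpy as np
from pygam import LinearGAM, s

# Synthetic noisy signal
rng = np.random.default_rng(42)
X = np.linspace(0, 10, 500).reshape(-1, 1)
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=500)

# The same spline term under increasing smoothing penalties:
# small lam -> wiggly fit (low bias, high variance),
# large lam -> nearly straight line (high bias, low variance)
for lam in [1e-3, 1, 1e4]:
    gam = LinearGAM(s(0, lam=lam)).fit(X, y)
    print(f"lam={lam}: effective degrees of freedom = {gam.statistics_['edof']:.1f}")
```

The effective degrees of freedom shrink as lam grows, which is exactly the regularization behavior described above.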

The relaxation in the assumptions about the additive terms makes GAMs more flexible than linear regression while their additive nature makes them more interpretable than many non-parametric methods (e.g. ensemble methods), since it allows us to visualize the contribution of each term to the response variable. Additionally, many modern implementations of GAMs allow for a very fast fitting process.

More information on GAMs can be found on their Wikipedia page or even on PyGAM's documentation. For a deeper dive on this topic, refer to this seminal paper by Hastie and Tibshirani or to a comprehensive book on GAMs by Wood. Due to the limitations of this work, I will not be able to dive further into the details of GAMs, but I highly encourage anyone interested to read more about this technique.


Data Loading

As in previous work, the first step in our process is to fetch the data. We will use CAISO's hourly EMS load data for the PG&E area, from the years 2014 to 2017.

The historical data is stored in several Excel files which can be found and downloaded from the CAISO website.

All of our models will be fitted on the log-transformed hourly load. Taking the log of the original dependent variable helps stabilize the variance of the response, which is a useful property to have.
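As an illustrative sketch (the file and column names here are hypothetical; the real CAISO Excel files may be laid out differently), loading the data and building the log-transformed target plus the calendar covariates could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical file/column names -- adjust to the actual CAISO Excel layout
df = pd.read_excel('HistoricalEMSHourlyLoad-2014.xlsx',
                   parse_dates=['Dates'], index_col='Dates')
y = np.log(df['PGE'])  # log-transform to stabilize the variance

# Calendar covariates derived from the datetime index
X = pd.DataFrame({
    'hour': y.index.hour,            # 0-23
    'dayofweek': y.index.dayofweek,  # 0 = Monday
    'dayofyear': y.index.dayofyear,  # 1-366
}, index=y.index).to_numpy()
```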


Model Selection and Fitting

Once our Training, Validation and Test sets are ready, and having already performed an Exploratory Data Analysis of our data, we can start fitting Generalized Additive Models to the time series.

The process followed in the rest of this notebook consists of defining increasingly complex models: starting with a simple model that uses only "calendar" variables (time of year, day of week, hour of day), we work our way through several refinements to arrive at a GAM that uses lagged observations and hourly temperature.

The models will be fitted using the Training set and then evaluated on the Validation set. The performance of the models will be measured using the Mean Absolute Percentage Error or MAPE.
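For reference, over $n$ forecasts $\hat{Y}_t$ of the observed values $Y_t$, the MAPE is defined as:

$ MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| $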

Finally, the best performing model (according to the Validation set MAPE) will be re-trained on all the data that is not part of the Test set. We will report this model's performance on the Test set as a final indicator of its real-world performance.

A Simple Linear GAM with Calendar Features

To start, let's model the log-transformed hourly electricity demand using only three additive spline terms: one for the hour of day, one for the day of the week and one for the time of year.

Our model can be roughly represented as:

$ Y_t = f_{hour}(x_{hour}) + f_{weekly}(x_{weekly}) + f_{yearly}(x_{yearly}) + \beta_0 $

where:

- $Y_t$ is the log-transformed hourly demand,
- $f_{hour}$, $f_{weekly}$ and $f_{yearly}$ are spline terms on the hour of day, day of the week and time of year, respectively, and
- $\beta_0$ is the intercept.

PyGAM includes useful functionality to plot the contribution of each term to the response, separately, through the partial_dependence method. Below we can see the partial dependence of each term in this very simple model, which, in line with our intuitions and previous analyses, displays the expected daily, weekly and yearly seasonality patterns.
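A minimal sketch of fitting this model and producing the partial dependence plots (column indices follow the layout built above; the spline counts are illustrative, not necessarily the values used in the notebook):

```python
import matplotlib.pyplot as plt
from pygam import LinearGAM, s

# Columns of X: 0 = hour of day, 1 = day of week, 2 = day of year
gam = LinearGAM(
    s(0, basis='cp')    # f_hour: cyclic spline on the hour of day
    + s(1, basis='cp')  # f_weekly: cyclic spline on the day of week
    + s(2, basis='cp')  # f_yearly: cyclic spline on the day of year
).fit(X, y)

# One partial-dependence plot per spline term
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for i, ax in enumerate(axes):
    XX = gam.generate_X_grid(term=i)
    ax.plot(XX[:, gam.terms[i].feature], gam.partial_dependence(term=i, X=XX))
    ax.set_title(repr(gam.terms[i]))
plt.show()
```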

This simple GAM with three independent "calendar" terms achieves a Train set MAPE of 4.61% and a Validation set MAPE of 4.94%. For reference, this is already better than the first model developed with Facebook's Prophet in my previous analysis.

Time of Year - Hour of Day Interaction

Now, as discovered in the Exploratory Data Analysis performed prior to this work, the daily seasonality pattern changes during the year, transitioning from a bi-modal pattern in Winter to a unimodal one in Summer. Using GAMs, it is possible to model this behavior by interacting the Hour of Day and Day of Year variables through a tensor product term, which is readily available in PyGAM. The details of this operation fall beyond the scope of this work, but what we are essentially doing is modeling each possible combination of hour of day and time of year differently.

Thus, we formulate a more sophisticated model, based on the same three calendar variables, but substituting the spline terms for daily and yearly seasonalities with a tensor product term. The weekly seasonality remains modeled as an additional spline term.

Our model can be represented as:

$ Y_t = te(x_{hour}, x_{yearly}) + f_{weekly}(x_{weekly}) + \beta_0 $

where:

- $te$ is a tensor product term interacting the hour of day with the time of year,
- $f_{weekly}$ is a spline term on the day of the week, and
- $\beta_0$ is the intercept.
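A sketch of this model in PyGAM, using its te tensor product terms (spline counts again illustrative):

```python
from pygam import LinearGAM, s, te

gam = LinearGAM(
    # One smooth hour-of-day x day-of-year surface, cyclic in both directions
    te(0, 2, n_splines=[24, 30], basis=['cp', 'cp'])
    + s(1, basis='cp')  # weekly seasonality stays a separate spline term
).fit(X, y)
```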

We can visualize the tensor product term contribution (partial dependence) as a 3D surface, with a particular contribution for each combination of hour of the day and day of the year. Below we can appreciate precisely how the daily seasonality and the average demand level change throughout the year. For a better comparison, the chart on the right shows sections of this 3D surface at three different times of the year, which model the daily seasonality pattern at that point of the year. As can be seen, demand during the middle of the year is higher and has a roughly unimodal pattern, whereas demand decreases during the non-Summer months and shows a bi-modal daily pattern.

This GAM achieves a MAPE of 3.98% on the Train set and a MAPE of 4.45% on the Validation set. We see that including the interaction term has led to a more drastic improvement in the Train set performance. In fact, the difference between the Train and Validation scores could point to some overfitting; later on, we will try to address this issue by tuning the smoothing of the model.

At this point, it would be good to get a better idea of how well our current forecast fits the data, so let's zoom in and visualize the actual and forecasted log-transformed demand for the first 11 days of April and July 2016. From the plots below, we can conclude that our model captures the seasonality patterns relatively well, but misses certain "peaks and valleys" in hourly demand.

Yearly-Weekly-Daily Interaction

In addition to these misses, we notice slight "bumps" in the forecast from the last hour of one day to the first hour of the next, especially from Sundays to Mondays. These and other errors in our forecast may be due to the fact that the daily seasonality changes across the days of the week. We can try adding an interaction term between the already existing yearly-daily term and the day of the week. PyGAM allows us to do this by interacting the tensor term with dummy variables representing each day of the week. Thus, we model a daily-yearly demand surface separately for each day of the week.

This model can be represented by the following formula, in which we define one tensor product term for each day of the week:

$ Y_t = x_{M} te_{M}(x_{hour}, x_{yearly}) + x_{T} te_{T}(x_{hour}, x_{yearly}) + x_{W} te_{W}(x_{hour}, x_{yearly}) + x_{Th} te_{Th}(x_{hour}, x_{yearly}) + x_{F} te_{F}(x_{hour}, x_{yearly}) + x_{S} te_{S}(x_{hour}, x_{yearly}) + x_{Su} te_{Su}(x_{hour}, x_{yearly}) + \beta_0 $

where:

- $x_{M}, \dots, x_{Su}$ are dummy variables indicating the day of the week, and
- $te_{M}, \dots, te_{Su}$ are the corresponding tensor product terms interacting the hour of day with the time of year.
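One way to express this in PyGAM, assuming a hypothetical layout where columns 3-9 of X hold one dummy variable per day of the week, is to attach each dummy to its own tensor product term through the by= argument (which multiplies a term by a given column):

```python
from functools import reduce
from pygam import LinearGAM, te

# One hour x day-of-year surface per weekday, switched on by its dummy column
weekday_terms = [te(0, 2, n_splines=[24, 30], basis=['cp', 'cp'], by=col)
                 for col in range(3, 10)]
gam = LinearGAM(reduce(lambda a, b: a + b, weekday_terms)).fit(X, y)
```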

Unfortunately, PyGAM's current capabilities do not allow for visualization of the partial dependence of the seven different tensor product terms included in our new model.

With this new model, in which interactions between all calendar variables have been captured, we achieve a Train set MAPE of 3.73% and a Validation set MAPE of 4.33%. Additionally, as can be seen in the charts below, we have gotten rid of the abrupt changes in forecast demand between days.

Holiday Effects

Finally, we can include extra terms in our model that aim to account for the effect that holidays may have on energy demand. As done in previous work, we do so by extracting several important holidays from the US Federal Holiday Calendar and representing them as dummy variables in the model. Additionally, we include an indicator for the days immediately before and after a holiday, which are known to have an effect on electricity demand as well.
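A sketch of how these dummies can be built with pandas' built-in holiday calendar (variable names are illustrative):

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()
holidays = cal.holidays(start=y.index.min(), end=y.index.max())

days = y.index.normalize()
is_holiday = days.isin(holidays).astype(int)
# Day immediately before or after a holiday (but not a holiday itself)
is_next_to_holiday = ((days.isin(holidays - pd.Timedelta('1D'))
                       | days.isin(holidays + pd.Timedelta('1D')))
                      & ~days.isin(holidays)).astype(int)
```

These two columns can then be appended to X and entered in the model as factor terms (PyGAM's f) or simple linear terms (l).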

Below we can see the partial dependence of the holiday and "next-to-holiday" dummy variables. As expected, both lead to a decrease in energy demand, although the effect is significantly larger in the case of holidays.

Including holidays as extra terms in the model leads to a MAPE of 3.70% on the Train set and a MAPE of 4.28% on the Validation set. These improvements in performance are small but enough to illustrate that holidays do have a relevant effect on electricity demand.

Hyperparameter Tuning: Models with Calendar Variables

Our GAM exclusively based on calendar variables (hour of day, day of week, time of year and holidays) achieves a MAPE of 4.28% on the Validation set without any hyperparameter tuning. We can adjust the smoothing parameter of each term in order to achieve a better performance. After trying several combinations of the smoothing parameters for the daily and yearly seasonalities, the best combination (that is, the one that minimizes the Validation set MAPE) yields a MAPE of 3.94% on the Validation set.
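The tuning itself can be done with a simple validation-set grid search. The sketch below uses the single-surface model from before and an illustrative grid of candidate values (for simplicity it scores on the log scale, and assumes X_train/y_train and X_val/y_val have already been split):

```python
from itertools import product

import numpy as np
from pygam import LinearGAM, s, te

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

best_score, best_lams = np.inf, None
for lam_hour, lam_year in product([0.01, 0.1, 1, 10, 100], repeat=2):
    gam = LinearGAM(
        te(0, 2, lam=[lam_hour, lam_year],
           n_splines=[24, 30], basis=['cp', 'cp'])
        + s(1, basis='cp')
    ).fit(X_train, y_train)
    score = mape(y_val, gam.predict(X_val))
    if score < best_score:
        best_score, best_lams = score, (lam_hour, lam_year)

print(f"Best lams: {best_lams} -> Validation MAPE: {best_score:.2f}%")
```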

Comparing Our Forecasts (so far)

We have made a lot of progress so far by refining the way our model uses only four simple variables. Let us visualize all of our models' forecasts for four different 11-day chunks of the year 2016:

It can be seen that our very first model was not able to capture the changes in the daily seasonality pattern at different times of the year. Adding the daily-yearly interaction helps solve this problem. Further, including an interaction with the day of the week prevents the model from overshooting the dip in consumption during weekend nights. Finally, if we look at special days such as July 4th or December 26th (in 2016 Christmas Day fell on a Sunday, so the following Monday was a holiday), we see how adding holiday terms allows for a better prediction of the decrease in demand in these cases.

Visualizing the Entire Forecast with Prediction Intervals

Finally, below we can visualize the forecast of our best model so far for the entire validation set, the year 2016, along with prediction intervals that provide a range of values within which the actual demand is likely to fall with a certain probability.
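PyGAM exposes these intervals directly through the prediction_intervals method of a fitted LinearGAM. A sketch, assuming y_val is the 2016 series and gam is the fitted model:

```python
import matplotlib.pyplot as plt

# 95% prediction intervals from the fitted LinearGAM
y_pred = gam.predict(X_val)
intervals = gam.prediction_intervals(X_val, width=0.95)

plt.figure(figsize=(15, 4))
plt.plot(y_val.index, y_val, label='observed')
plt.plot(y_val.index, y_pred, label='forecast')
plt.fill_between(y_val.index, intervals[:, 0], intervals[:, 1],
                 alpha=0.3, label='95% prediction interval')
plt.legend()
plt.show()
```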

While our model achieves good performance (3.94% MAPE), it is not able to capture a significant part of the demand variability, as can be seen by the high demand peaks during Summer. From here on, we will add other pieces to our model, including lagged observations and external regressors, such as temperature, to capture this variability.

Lagged Observations

Unlike our previous Long-Term Forecasting model, developed using Prophet, the models developed in this analysis will include past observations as features, in order to capture the auto-regressive nature of the time series. Obviously, only current and past observations are available at the point in time when the forecast is made.
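Building these features is a one-liner per lag with pandas' shift. A sketch (the lags shown anticipate the ones selected below):

```python
import pandas as pd

# Lagged copies of the (log) demand; rows with missing lags are dropped.
# For a one-day-ahead forecast, lags of 24h or more are always observed.
lag_features = pd.DataFrame(index=y.index)
for lag in [24, 48, 72, 168]:  # 1, 2, 3 days and 1 week back
    lag_features[f'lag_{lag}'] = y.shift(lag)
lag_features = lag_features.dropna()
```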

Hyperparameter Tuning: Models with Lagged Observations

We will tune our model to find the ideal number of lagged observations to include as extra terms, in addition to the smoothing parameters to use for the lagged terms as well as the calendar terms (note that the smoothing parameters found before are not necessarily the best once we have included lagged observations as extra terms).

After trying a few combinations, we find that the best-performing model, according to the Validation set MAPE, keeps the calendar and holiday terms from before and adds spline terms on the demand observed 24, 48, 72 and 168 hours earlier.

We can represent this model with the following formula:

$ Y_t = \sum_{d \in \{M, \dots, Su\}} x_{d} \, te_{d}(x_{hour}, x_{yearly}) + \beta_{holiday} \, x_{holiday} + \beta_{next\,to\,holiday} \, x_{next\,to\,holiday} + f_{lag1}(Y_{t-24}) + f_{lag2}(Y_{t-48}) + f_{lag3}(Y_{t-72}) + f_{weekly\,lag}(Y_{t-168}) + \beta_0 $

where:

- $x_{d}$ and $te_{d}$ are the day-of-week dummy variables and tensor product terms defined in the previous model,
- $\beta_{holiday}$ and $\beta_{next\,to\,holiday}$ are the coefficients of the holiday indicators,
- $f_{lag1}$, $f_{lag2}$ and $f_{lag3}$ are spline terms on the demand observed 24, 48 and 72 hours before, and
- $f_{weekly\,lag}$ is a spline term on the demand observed one week (168 hours) before.

This model achieves a MAPE of 2.57% on the Validation set, a very large improvement over our model that used exclusively time and date information.

Understanding the Lagged Observations Partial Dependence

Just like we did with other terms, we can visualize the partial dependence of the model with respect to our lagged variables, as seen below. The partial dependence of the first daily lag stands out as being an almost perfectly linear relationship.

Comparing Our Forecasts (again)

As mentioned earlier, including past observations of energy demand in our model leads to a model with significantly better performance than our previous one, which shows the power of auto-regressive terms. We can interpret this as follows: energy demand variability does not depend exclusively on the date and the time, obviously, and observations of past demand help us indirectly capture a lot of this information to improve our predictions. Of course, we can only benefit from this performance increase in short-term forecasts (at most 24 hours ahead) in which we have observed the real lagged observations.

Let us visualize the forecasts of this model and the model exclusively based on calendar variables on the same four chunks of our validation set as before to see how they differ. Clearly, the model with lagged observations matches the observations more closely than the previous model in most cases, although its reliance on past observations seems to lead to very wrong predictions in some cases (e.g. April 7th).

Temperature

It is well-known that weather, and in particular temperature, is very much related to energy demand. Therefore, the last addition to our GAM is the hourly average temperature between San Francisco and Los Angeles.

Temperature data was obtained from this Kaggle dataset, which in turn was compiled using a weather API.
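A sketch of building the average temperature series, assuming the file and column names used in that Kaggle dataset:

```python
import pandas as pd

# temperature.csv from the Kaggle "Historical Hourly Weather Data" dataset
temps = pd.read_csv('temperature.csv', parse_dates=['datetime'],
                    index_col='datetime')
# Average the two cities and align the series with our demand index
ca_temp = temps[['San Francisco', 'Los Angeles']].mean(axis=1).reindex(y.index)
```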

GAM with a Temperature Term

Temperature is included in the model via an extra spline term.

If we keep the previous model as is, with the same hyperparameters, and add a spline term for temperature with a smoothing parameter of 1000, we achieve a MAPE of 2.56% on the Validation set, which is only a marginal improvement over our latest model.
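In PyGAM terms, this amounts to appending one more spline term to the model specification. A sketch, where both the column index and existing_terms are placeholders for the layout and terms defined above:

```python
from pygam import LinearGAM, s

TEMP_COL = 14  # hypothetical index of the temperature column in X
# existing_terms stands for the calendar, holiday and lag terms defined above
gam = LinearGAM(existing_terms + s(TEMP_COL, lam=1000)).fit(X_train, y_train)
```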

We can re-tune some of our parameters to squeeze the best performance we can get out of this model. After tuning the smoothing parameters for the temperature, lag and daily terms, we arrive at a model that achieves a MAPE of 2.53% on the Validation set.

This is still only a modest improvement over our previous model; it seems we have extracted about as much as we can from this feature set.

Temperature Partial Dependence

As we have been doing with other terms, it is interesting to visualize the contribution of temperature to our model via its partial dependence. As in many other analyses, including my previous EDA, energy demand shows an asymmetric 'U'-shaped relationship with temperature.

A Warning on Temperature Data

While we have easily included real temperature data as a feature in our model, in a real scenario we would only have forecasted temperatures. Even though weather forecasts have improved greatly and can be quite reliable for short horizons (e.g. 1 day), some variability is to be expected, which could lead to worse performance of our model.


Final Model Performance on the Test Set

Finally, now that we have a decent model based on calendar variables, lagged observations and temperature, which we have tuned via the smoothing parameters of each term, we can train the model on the Training and Validation sets together, and then report its performance on the Test set.
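A sketch of this final step (final_terms stands for the tuned model specification; the MAE is computed back on the original MW scale by undoing the log transform):

```python
import numpy as np
from pygam import LinearGAM

# Re-train on everything outside the Test set (Training + Validation)
X_trainval = np.vstack([X_train, X_val])
y_trainval = np.concatenate([y_train, y_val])
final_gam = LinearGAM(final_terms).fit(X_trainval, y_trainval)

# Evaluate on the Test set, undoing the log transform for MW units
pred_mw = np.exp(final_gam.predict(X_test))
actual_mw = np.exp(y_test)
test_mape = 100 * np.mean(np.abs((actual_mw - pred_mw) / actual_mw))
test_mae = np.mean(np.abs(actual_mw - pred_mw))
print(f"Test MAPE: {test_mape:.2f}%  |  Test MAE: {test_mae:.2f} MW")
```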

The final score of our model on the Test set is a MAPE of 3.10% and a Mean Absolute Error of 388.61 MW.

This performance is significantly better than the one achieved by the Prophet model in our previous analysis.

Below we can visualize the observed and forecasted electricity demand for the training + validation set and the test set, now in its actual units (MW). We see that, roughly, our forecast follows actual demand relatively closely, although the prediction intervals seem too narrow.


Take-aways and Future Work

In this notebook we have defined, trained and fine-tuned a forecasting model for PG&E's hourly electricity demand using Generalized Additive Models implemented in PyGAM.

The modeling process has followed similar steps to our previous forecasting model with Prophet: define a simple baseline model, add components incrementally, tune hyperparameters against the Validation set, and evaluate the final model only once on the Test set.

Our final model is a Generalized Additive Model containing tensor product terms, spline terms and dummy variables (or factor terms). More specifically, our model includes:

- seven tensor product terms interacting the hour of day with the time of year, one for each day of the week;
- dummy variables for holidays and for the days immediately before and after a holiday;
- spline terms on the demand observed 24, 48, 72 and 168 hours before; and
- a spline term on the hourly temperature.

Our model achieves a Mean Absolute Percentage Error (MAPE) of 3.10% and a Mean Absolute Error (MAE) of 388.61 MW on the Test set, which comprises the first 9 months of 2017. This is a significant improvement with respect to the model built with Prophet, although I suspect a big part of that improvement comes from the inclusion of auto-regressive terms. That said, we have observed that a GAM based exclusively on calendar variables can achieve better validation performance than an equivalent model in Prophet. While we have not looked at the Test performance of that model, we can say that GAMs, as implemented in PyGAM, seem to provide a better fit than Prophet, even when based exclusively on temporal information.