Builds

Building the components of the Visual Analytics application

  • Explore
  • Technical Analysis
  • Forecast

Forecast

  • A Forecasting Workflow
    • A Visual Analytics Application
    • Benefits of modeltime
  • Time series forecasting with Modeltime
    • Loading the packages
    • Loading and preprocessing the data
    • Models creation and training
      • ARIMA
      • Prophet
      • ElasticNet
      • Random Forest
      • XGBoost
      • SVM RBF
      • Boosted ARIMA
      • Boosted Prophet
    • Evaluation
  • Dashboard design
  • Reference

This report details our exploration of using modeltime for stock forecasting, which will be implemented as a sub-module of our Shiny Visual Analytics application.

A Forecasting Workflow

Let’s first take a look at the standard process of producing forecasts for time series data and the common R packages which are used for each step of the workflow.

  • Data preparation (tidy): Before any time series forecasting can be done, the core requirement is obtaining historical data, which must then be prepared into the correct format. Luckily for us, since the focus of our application is stock analysis and forecasting, a stock’s historical data can easily be retrieved with packages such as tidyquant or quantmod. No pre-processing is needed as the data will already arrive in a cleaned format.
  • Plot the data (visualize): Once the data are collected and processed, the next essential step in understanding them is visualization. By visually examining the data, we can spot common patterns and trends, which in turn help us specify an appropriate model. This step of the workflow will be handled by the Explorer module of our Shiny application through the use of the timetk package.
  • Define a model (specify): There are many different time series models that can be used for forecasting. Choosing an appropriate model for the data is essential for producing appropriate forecasts, and it is generally a good idea to try out and compare a few different models before settling on one. Traditional time series forecasting models such as ARIMA, the exponential smoothing state space model (ETS) and the time series linear model (TSLM) are available through the forecast package, which has since been superseded by the fable package. Machine learning models can be deployed using the tidymodels framework with its machine-learning-focused packages and toolkit.
  • Train the model (estimate): A model must be fitted to the time series before it can produce any forecasts. The process usually involves one or more parameters which must be estimated from the known historical data. These parameters often differ between models and packages, requiring the forecaster to understand the syntax of each model to perform the modeling.
  • Check model performance (evaluate): The performance of the model can only be properly evaluated after the data for the forecast period have become available. A number of methods have been developed to help in assessing the accuracy of forecasts.
  • Produce forecasts (forecast): Once a model has been evaluated and selected as the best, with its parameters estimated, it should be refitted to the entire dataset and used to forecast forward.
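To make the Evaluate step concrete, the common error metrics used later in this report (MAE, RMSE, MAPE) can be computed directly. Below is a minimal base-R sketch with made-up hold-out values; in practice modeltime wraps these computations in modeltime_accuracy().

```r
# Hypothetical hold-out values, for illustration only.
actual    <- c(2.0, -1.5, 0.8, 1.2)
predicted <- c(1.6, -1.0, 1.1, 0.7)

mae  <- mean(abs(actual - predicted))       # mean absolute error
rmse <- sqrt(mean((actual - predicted)^2))  # root mean squared error
mape <- mean(abs((actual - predicted) / actual)) * 100  # mean absolute percentage error

round(c(mae = mae, rmse = rmse, mape = mape), 2)
```

Note that MAPE is unstable when actual values are near zero, which is exactly the situation with differenced stock prices; this is one reason the accuracy table later in this report shows very large MAPE values.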

A Visual Analytics Application

The Visualize, Specify, Estimate and Evaluate steps form an iterative process which requires the forecaster to perform repeated cycles of calculated trial and error to achieve a good result. The Shiny Visual Analytics Application (VAA) will use graphs to explore the data, analyze the validity of the fitted models and present the forecasting results. By providing an interface to tune and visualize models, the application will let forecasters easily experiment with different algorithms without having to write code or scripts.

Benefits of modeltime

  • Integrates closely with the tidyverse collection of packages; in particular, modeltime lets users tap into the machine learning ecosystem of tidymodels through the use of parsnip models.
  • Makes it easy to create, combine and evaluate a plethora of forecasting models.
  • Plots created by modeltime are either ggplot2 objects (non-interactive) or plotly objects (interactive), making it easy to apply additional configuration such as themes and legends.

Time series forecasting with Modeltime

Loading the packages

The packages below are required. Additional tidymodels engines are also needed depending on the machine learning models to be used.

  • tidyverse
  • lubridate
  • timetk
  • modeltime
  • tidymodels

packages <- c('tidyverse', 'lubridate', 'timetk', 'modeltime', 'tidymodels', 'tidyquant', 'glmnet', 'randomForest', 'earth')

for (p in packages){
  if (!require(p,character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

Loading and preprocessing the data

One of the objectives of the application is to make it stock-agnostic, meaning it can be used with any stock rather than one specific symbol. For the purpose of this report, we will be using Apple’s stock (AAPL).

stock <- tq_get("AAPL", get = "stock.prices", from = "2021-01-01")

The target for this experiment is to use historical data from the beginning of 2021 to forecast the stock price over the next 2 weeks. We will use timetk’s diff_vec() function to perform differencing on the data to remove the trend and make the series stationary.
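Conceptually, differencing replaces each observation with its change from the previous one, which strips out a trend. A quick base-R illustration (diff_vec() is timetk’s pipe-friendly analogue, which pads the first position with NA, hence the replace_na() call below):

```r
# A short series with a clear upward trend.
x <- c(10, 12, 15, 19, 24)

# Differencing once: each value minus its predecessor.
d <- diff(x)
d  # 2 3 4 5 -- the trend is gone, only period-to-period changes remain
```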

stock_tbl <- stock %>%
  select(date, close) %>%
  filter(date >= "2021-01-01") %>%
  set_names(c("date", "value")) %>%
  mutate(value = diff_vec(value)) %>%
  mutate(value = replace_na(value, 0))
  
stock_tbl
## # A tibble: 89 x 2
##    date        value
##    <date>      <dbl>
##  1 2021-01-04  0    
##  2 2021-01-05  1.60 
##  3 2021-01-06 -4.41 
##  4 2021-01-07  4.32 
##  5 2021-01-08  1.13 
##  6 2021-01-11 -3.07 
##  7 2021-01-12 -0.180
##  8 2021-01-13  2.09 
##  9 2021-01-14 -1.98 
## 10 2021-01-15 -1.77 
## # ... with 79 more rows

Plotting the stock’s historical data.

stock_tbl %>%
  plot_time_series(date, value, .interactive = TRUE)
[Interactive plotly chart: Time Series Plot]

Split the data, using the last 3 weeks as validation data.

splits <- stock_tbl %>%
  time_series_split(assess = "3 weeks", cumulative = TRUE)

splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = TRUE)
[Interactive plotly chart: Time Series Cross Validation Plan (training vs. testing, Slice1)]

The machine learning models from parsnip can’t process the Date column as is; hence, some feature engineering is required to convert the data into the correct format. This can be done using step_timeseries_signature() from timetk, which returns a recipe object. With it, we can apply additional recipe steps such as removing unused columns and creating dummy variables for categorical features. Note that for parsnip models the role of the Date column needs to be changed to “ID”, while modeltime can use the Date column as a predictor, so we’ll create 2 recipes to be used depending on the model.

recipe_spec <- recipe(value ~ date, training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts"), contains("half"),
          contains(".iso")) %>%
  step_normalize(date_index.num) %>%
  step_fourier(date, period = 12, K = 1) %>%
  step_dummy(all_nominal())
  
recipe_spec_parsnip <- recipe_spec %>%
  update_role(date, new_role = "ID")

bake(recipe_spec %>% prep(), new_data = NULL)
## # A tibble: 75 x 36
##    date        value date_index.num date_year date_quarter date_month date_day
##    <date>      <dbl>          <dbl>     <int>        <int>      <int>    <int>
##  1 2021-01-04  0              -1.68      2021            1          1        4
##  2 2021-01-05  1.60           -1.65      2021            1          1        5
##  3 2021-01-06 -4.41           -1.62      2021            1          1        6
##  4 2021-01-07  4.32           -1.58      2021            1          1        7
##  5 2021-01-08  1.13           -1.55      2021            1          1        8
##  6 2021-01-11 -3.07           -1.46      2021            1          1       11
##  7 2021-01-12 -0.180          -1.43      2021            1          1       12
##  8 2021-01-13  2.09           -1.40      2021            1          1       13
##  9 2021-01-14 -1.98           -1.36      2021            1          1       14
## 10 2021-01-15 -1.77           -1.33      2021            1          1       15
## # ... with 65 more rows, and 29 more variables: date_wday <int>,
## #   date_mday <int>, date_qday <int>, date_yday <int>, date_mweek <int>,
## #   date_week <int>, date_week2 <int>, date_week3 <int>, date_week4 <int>,
## #   date_mday7 <int>, date_sin12_K1 <dbl>, date_cos12_K1 <dbl>,
## #   date_month.lbl_01 <dbl>, date_month.lbl_02 <dbl>, date_month.lbl_03 <dbl>,
## #   date_month.lbl_04 <dbl>, date_month.lbl_05 <dbl>, date_month.lbl_06 <dbl>,
## #   date_month.lbl_07 <dbl>, date_month.lbl_08 <dbl>, date_month.lbl_09 <dbl>,
## #   date_month.lbl_10 <dbl>, date_month.lbl_11 <dbl>, date_wday.lbl_1 <dbl>,
## #   date_wday.lbl_2 <dbl>, date_wday.lbl_3 <dbl>, date_wday.lbl_4 <dbl>,
## #   date_wday.lbl_5 <dbl>, date_wday.lbl_6 <dbl>

Models creation and training

ARIMA

Let’s create and fit our first model, ARIMA, using modeltime. Below are the parameters which we will expose on the final Shiny application for users to configure:

  • seasonal_period: The periodic nature of the seasonality. Uses “auto” by default.
  • non_seasonal_ar: The order of the non-seasonal auto-regressive (AR) terms.
  • non_seasonal_differences: The order of integration for non-seasonal differencing.
  • non_seasonal_ma: The order of the non-seasonal moving average (MA) terms.
  • seasonal_ar: The order of the seasonal auto-regressive (SAR) terms.
  • seasonal_differences: The order of integration for seasonal differencing.
  • seasonal_ma: The order of the seasonal moving average (SMA) terms.

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, training(splits))

For the purpose of this report, we’ll immediately plot the forecast for each model using the code chunk below (the code is hidden for subsequent models). However, the usual process is to combine all the fitted models into a Modeltime Table and perform the forecasting together. Once all the models have been set up, we will do the combination in a later step.

model_table_temp <- modeltime_table(
  model_fit_arima) %>%
  update_model_description(1, "ARIMA")

calibration_table_temp <- model_table_temp %>%
  modeltime_calibrate(testing(splits))

calibration_table_temp %>%
  modeltime_forecast(actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)
[Interactive plotly chart: Forecast Plot, ACTUAL vs. ARIMA]

Prophet

Similar to the ARIMA model, we use modeltime to create and train a Prophet model. The parameters to be exposed on the Shiny app are:

  • growth: String ‘linear’ or ‘logistic’ to specify a linear or logistic trend.
  • changepoint_num: Number of potential changepoints to include for modeling trend.
  • changepoint_range: Adjusts how close to the end of the series the last changepoint can be located.
  • season: ‘additive’ (default) or ‘multiplicative’.

workflow_fit_prophet <- workflow() %>%
  add_model(
    prophet_reg() %>% set_engine("prophet")
  ) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. Prophet]

ElasticNet

Next, we can start creating our machine learning models, starting with a linear regression model. Setting the mixture argument allows us to configure the model as Ridge, Lasso or ElasticNet; here, we’ll use ElasticNet for our experiment. The penalty argument sets the amount of regularization.

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. ElasticNet]

Random Forest

A random forest model can be set up similarly to the linear regression model. The arguments for this model are:

  • mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
  • trees: The number of trees contained in the ensemble.
  • min_n: The minimum number of data points in a node that are required for the node to be split further.

model_spec_rf <- rand_forest(trees = 500, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. Random Forest]

XGBoost

The XGBoost model has arguments similar to the Random Forest model, with the addition of:

  • tree_depth: The maximum depth of the tree (i.e. the number of splits).
  • learn_rate: The rate at which the boosting algorithm adapts from iteration to iteration.

workflow_fit_xgboost <- workflow() %>%
  add_model(
    boost_tree() %>% set_engine("xgboost")
  ) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. XGBoost]
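The chunk above uses boost_tree() with its defaults. As a hedged sketch, the parameters listed above (plus trees and min_n, shared with the Random Forest model) could be exposed like this; the values are purely illustrative, not tuned choices:

```r
library(parsnip)

# Illustrative only: an XGBoost spec with the tunable arguments set
# explicitly. The values here are placeholders, not tuned choices.
model_spec_xgboost <- boost_tree(
  trees      = 500,   # number of boosting iterations
  tree_depth = 6,     # maximum depth of each tree
  learn_rate = 0.02,  # shrinkage applied at each iteration
  min_n      = 10     # minimum observations per terminal node
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")
```

On the Shiny app, these four arguments would map directly to the Dynamic UI inputs described in the Dashboard design section.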

SVM RBF

The last machine learning model to be tested is a radial basis function support vector machine (SVM RBF) model, with the following parameters to be exposed on the Shiny app:

  • cost: The cost of predicting a sample within or on the wrong side of the margin.
  • rbf_sigma: The precision parameter for the radial basis function.
  • margin: The epsilon in the SVM insensitive loss function (regression only).

workflow_fit_svm <- workflow() %>%
  add_model(
    svm_rbf() %>% 
    set_engine("kernlab") %>%
    set_mode("regression")
  ) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. SVM]

Boosted ARIMA

modeltime supports a hybrid model, Boosted ARIMA, so let’s also add it to our experiment for stock forecasting.

workflow_fit_arima_boosted <- workflow() %>%
  add_model(
    arima_boost(min_n = 2, learn_rate = 0.015) %>%
    set_engine(engine = "auto_arima_xgboost")
  ) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. ARIMA Boosted]

Boosted Prophet

A hybrid model of Prophet and XGBoost is also supported by modeltime.

model_spec_prophet_boost <- prophet_boost(seasonality_weekly = FALSE,
                                          seasonality_daily =  FALSE,
                                          seasonality_yearly = FALSE) %>%
  set_engine("prophet_xgboost") 

workflow_fit_prophet_boost <- workflow() %>%
  add_model(model_spec_prophet_boost) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
[Interactive plotly chart: Forecast Plot, ACTUAL vs. Prophet Boosted]

Evaluation

Once all the models have been fitted, they are added to a Modeltime Table using modeltime_table(), which provides an easy way to visualize and evaluate the performance of the models, as well as to forecast with all models at once.

model_table <- modeltime_table(
  model_fit_arima,
  workflow_fit_prophet,
  workflow_fit_glmnet,
  workflow_fit_rf,
  workflow_fit_xgboost,
  workflow_fit_svm,
  workflow_fit_arima_boosted,
  workflow_fit_prophet_boost) %>%
  update_model_description(1, "ARIMA") %>%
  update_model_description(2, "Prophet") %>%
  update_model_description(3, "ElasticNet") %>%
  update_model_description(4, "Random Forest") %>%
  update_model_description(5, "XGBoost") %>%
  update_model_description(6, "SVM") %>%
  update_model_description(7, "ARIMA Boosted") %>%
  update_model_description(8, "Prophet Boosted")

model_table
## # Modeltime Table
## # A tibble: 8 x 3
##   .model_id .model     .model_desc    
##       <int> <list>     <chr>          
## 1         1 <fit[+]>   ARIMA          
## 2         2 <workflow> Prophet        
## 3         3 <workflow> ElasticNet     
## 4         4 <workflow> Random Forest  
## 5         5 <workflow> XGBoost        
## 6         6 <workflow> SVM            
## 7         7 <workflow> ARIMA Boosted  
## 8         8 <workflow> Prophet Boosted

Before the forecasts can be evaluated, we need to call the modeltime_calibrate() function to compute predictions and residuals from out-of-sample data.

calibration_table <- model_table %>%
  modeltime_calibrate(testing(splits))

calibration_table
## # Modeltime Table
## # A tibble: 8 x 5
##   .model_id .model     .model_desc     .type .calibration_data
##       <int> <list>     <chr>           <chr> <list>           
## 1         1 <fit[+]>   ARIMA           Test  <tibble [14 x 4]>
## 2         2 <workflow> Prophet         Test  <tibble [14 x 4]>
## 3         3 <workflow> ElasticNet      Test  <tibble [14 x 4]>
## 4         4 <workflow> Random Forest   Test  <tibble [14 x 4]>
## 5         5 <workflow> XGBoost         Test  <tibble [14 x 4]>
## 6         6 <workflow> SVM             Test  <tibble [14 x 4]>
## 7         7 <workflow> ARIMA Boosted   Test  <tibble [14 x 4]>
## 8         8 <workflow> Prophet Boosted Test  <tibble [14 x 4]>

Once the models have been calibrated, we can plot the forecasted values for all models and evaluate their accuracy/error using the modeltime_accuracy() and table_modeltime_accuracy() functions.

calibration_table %>%
  modeltime_forecast(actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)
[Interactive plotly chart: Forecast Plot, ACTUAL vs. all 8 models]
calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table

.model_id  .model_desc      .type  mae   mape    mase  smape   rmse  rsq
1          ARIMA            Test   1.42  124.85  0.57  178.86  1.86  0.07
2          Prophet          Test   1.94  237.08  0.78  160.50  2.42  0.08
3          ElasticNet       Test   1.94  222.76  0.78  155.71  2.54  0.09
4          Random Forest    Test   1.50  118.14  0.60  160.91  2.02  0.03
5          XGBoost          Test   1.74  239.24  0.70  147.38  2.18  0.01
6          SVM              Test   1.41  90.84   0.56  157.91  1.97  0.01
7          ARIMA Boosted    Test   1.53  169.20  0.61  144.11  2.02  0.23
8          Prophet Boosted  Test   1.81  203.25  0.73  148.76  2.48  0.05

The final step is to refit the models to the full dataset using modeltime_refit and forecast them forward.

calibration_table %>%
  modeltime_refit(stock_tbl) %>%
  modeltime_forecast(h = "2 weeks", actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)
[Interactive plotly chart: Forecast Plot of the refitted models, forecasting 2 weeks forward]

Dashboard design

Notable features:

  • Stock selection: the application will allow the user to select the stock they want to forecast from a dropdown list.
  • Horizon selection: users will be able to define the forecasting horizon. It is still to be decided whether the field will offer fixed values or accept free text.
  • Differencing: a checkbox to configure whether the stock data should go through differencing before being trained and forecasted.
  • Model selection and Dynamic UI: depending on which model is selected in the Model field, the Dynamic UI section will display the configurable parameters of the selected model, enabling users to fine-tune their models to get the desired result. Once the parameters have been set, clicking the retrain button will re-fit all the models and update the line charts on the right.
  • Train & Forecast: the chart for training and the chart for forecasting will sit on 2 different sub-tabs. This preserves screen real estate while allowing users to quickly swap back and forth between training and forecasting.
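As a rough sketch of how the Model selection and Dynamic UI idea could be wired up in Shiny (all input IDs, labels and choices here are hypothetical; renderUI() and uiOutput() are standard Shiny functions):

```r
library(shiny)

# Hypothetical sketch: the parameter panel re-renders whenever the
# selected model changes. Inputs and defaults are illustrative only.
ui <- fluidPage(
  selectInput("model", "Model", choices = c("ARIMA", "Prophet", "XGBoost")),
  uiOutput("model_params")
)

server <- function(input, output, session) {
  output$model_params <- renderUI({
    switch(input$model,
      "ARIMA"   = numericInput("non_seasonal_ar", "Non-seasonal AR order", 1),
      "Prophet" = selectInput("growth", "Trend", c("linear", "logistic")),
      "XGBoost" = sliderInput("learn_rate", "Learning rate",
                              min = 0.001, max = 0.3, value = 0.02)
    )
  })
}

# shinyApp(ui, server)  # launch in an interactive session
```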

Reference

  • Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia. OTexts.com/fpp3. Accessed on April 7th 2021.

Explore

  • 1. Overview
  • 2. Literature Review
  • 3. Exploring and Visualizing the stock data
    • Check the required R packages and load them
    • Load the data
    • Scatterplot of 8 stocks’ prices
    • Scatterplot of 8 stocks’ transaction volumes
    • Scatterplot of the stock price - Weekly Trend
    • Scatterplot of the stock price - Monthly Trend
    • Scatterplot of the stock price
    • Scatterplot of the stock volume
  • 4. Stationize the data
    • Visualize the JP Morgan price
    • Assess its ACF and PACF
    • Use the Differencing technique
  • 5. Observations & Suggestions
  • 6. Storyboard for the design of the sub-module.
  • 7. References

1. Overview

This post serves as a sub-module of the Visual Analytics project. I aim to leverage time series analysis techniques and interactivity approaches in R to present possible visualizations of US market stocks.

Throughout this exercise, I mainly use the “tidyverse”, “tidyquant”, “timetk”, “TSstudio” and “forecast” packages in R to explore the patterns of stock prices and transaction volumes. It consists of single time series analysis, multiple time series analysis and autocorrelation plots. The overall project will incorporate machine learning and ARIMA model forecasting, so this sub-module serves as the preliminary data exploration to interpret the data’s behavior and patterns, and as the pre-processing step for further analysis.

2. Literature Review

According to APTECH, time series data is data that is collected at different points in time.

A few examples are shown below (reference: APTECH).

Generally speaking, the Time Series data has six basic patterns:

  • Trend: the overall direction of the change
  • Variability: the average degree of the change throughout a particular time span
  • Rate of Change: the percentage of the change from one point to another
  • Cycles: the patterns that repeat at regular intervals (such as daily or weekly etc.)
  • Exceptions: the values fall outside the norms
  • Co-variation: when two time series relate to one another, changes in one are reflected as changes in the other

Thus, visualizing time series data provides a preliminary tool for detecting if data:

  • Is mean-reverting or has explosive behavior;
  • Has a time trend;
  • Exhibits seasonality;
  • Demonstrates structural breaks.

In this project, we target US market stocks, as stock prices and transaction volumes are time series data. In addition, we will attempt to forecast the stock prices (in another sub-module).

In this sub-module, I will start off with visualizing some stocks’ prices and transaction volumes as Exploratory Data Analysis.

Then, I will pick one stock and use the autocorrelation function (ACF) and partial autocorrelation function (PACF) to assess it before stationizing it. Why do we need to stationize the data? In general, stock prices and volumes are not stationary, so in order to do forecasting afterwards, making them stationary is a must-have pre-processing step.

3. Exploring and Visualizing the stock data

Check the required R packages and load them

packages = c('timetk', 'modeltime', 'tidymodels', 'lubridate', 'tidyverse', 'tidyquant', 'TSstudio', 'forecast')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Load the data

We selected 8 stocks from 4 different sectors.

  • Industrials sector: AAL (American Airlines), SAVE (Spirit Airlines)
  • Financials sector: BAC (Bank of America), JPM (JP Morgan)
  • Healthcare sector: JNJ (Johnson & Johnson), PFE (Pfizer)
  • Information Technology sector: MSFT (Microsoft), AAPL (Apple Inc.)

In this paper, we focus on the period from 1 April 2015 to 31 March 2021.

The tq_get() function is used to retrieve the stock prices and transaction volumes.

stocks = c('AAL', 'SAVE', 'BAC', 'JPM', 'JNJ', 'PFE', 'MSFT', 'AAPL')

startdate <- "2015-04-01"
enddate <- "2021-03-31"

data <- data.frame()

for(s in stocks){
  newstock <- tq_get(s, get = "stock.prices", from = startdate, to = enddate)
  data <- rbind(data, newstock)
}

Scatterplot of 8 stocks’ prices

Let’s display them in 2 columns.

data %>%
  group_by(symbol) %>%
  plot_time_series(date, adjusted,
                   .color_var = year(date),  
                   .facet_ncol = 2,
                   .interactive = F,
                   .y_intercept = 0,
                   .title = "Stocks Price",
                   # .x_lab = "Date",
                   # .y_lab = "Price (US$)",
                   .color_lab = "Year",
                   .plotly_slider = FALSE) 

Scatterplot of 8 stocks’ transaction volumes

Let’s display them in 2 columns.

data %>%
  group_by(symbol) %>%
  summarise_by_time(
    date, .by = "month",
    volume = sum(volume)
  ) %>%
  plot_time_series(date, volume, 
                   .facet_vars   = contains("symbol"),
                   .title = "Transaction Volume",
                   .facet_ncol = 2, .interactive = F, .y_intercept = 0)

Scatterplot of the stock price - Weekly Trend

We can aggregate the data on a weekly basis. Here, I choose American Airlines as an example.

data %>%
  filter(symbol == "AAL") %>%
  summarise_by_time(
    date, .by = "week",
    meanadjusted = mean(adjusted)
    ) %>%
  plot_time_series(date, meanadjusted, .interactive = F, .y_intercept = 0)

Scatterplot of the stock price - Monthly Trend

We can aggregate the data on a monthly basis. Here, I choose Spirit Airlines as an example.

data %>%
  filter(symbol == "SAVE") %>%
  summarise_by_time(
    date, .by = "month",
    meanadjusted = mean(adjusted)
    ) %>%
  plot_time_series(date, meanadjusted, .interactive = F, .y_intercept = 0)

Scatterplot of the stock price

Here, I choose Johnson & Johnson as an example.

data %>%
  filter(symbol == "JNJ") %>%
  select("date", "adjusted") %>%
  ggplot(aes(x = date, y = adjusted))+
    geom_line(color="red")+
    geom_point(color = "green1", size = 0.1)+
    labs(x="Date", y="Price (USD)")+
    ggtitle("Price of Johnson & Johnson") +
    theme_minimal()

Scatterplot of the stock volume

Here, I illustrate with Apple Inc. stock.

data %>%
  filter(symbol == "AAPL") %>%
  select("date", "volume")%>%
  ggplot(aes(x = date, y = volume))+
    geom_line(color="cyan2")+
    geom_point(color = "firebrick", size = 0.1)+
    labs(x="Date", y=" Transaction Volume ")+
    ggtitle("Transaction volume of Apple Inc.") +
    theme_minimal()

4. Stationize the data

Let’s pick one stock, JP Morgan, from the Financials sector and illustrate how we can make the data stationary.

Visualize the JP Morgan price

jpmorgan <- data %>%
  filter(symbol == "JPM")

jpmorgan %>%
  plot_time_series(date, adjusted, .color_var = year(date), .interactive = F)

Assess its ACF and PACF

In R this is done with the appropriately named acf() and pacf() functions.

  • Plot the ACF (Autocorrelation Function)

The ACF shows the correlation of a time series with lags of itself. That is, how much the time series is correlated with itself at one lag, at two lags, at three lags and so on.

acf(jpmorgan$adjusted)

  • Plot the PACF (Partial Autocorrelation Function)

The PACF is a little more complicated. The autocorrelation at lag one can have lingering effects on the autocorrelation at lag two and onward. The partial autocorrelation is the amount of correlation between a time series and lags of itself that is not explained by a previous lag. So, the partial autocorrelation at lag two is the correlation between the time series and its second lag that is not explained by the first lag.

pacf(jpmorgan$adjusted)

Use the Differencing technique

Differencing is a time series method that subtracts each data point in the series from its successor. It is commonly used to make a time series stationary. If the time series appears to be seasonal, a better approach is to difference against the corresponding season’s data points to remove the seasonal effect.

But how do we know how much differencing is needed? The nsdiffs() and ndiffs() functions from the forecast package can help find out how much seasonal and regular differencing, respectively, is needed to make the series stationary. (Note: for most time series patterns, 1 or 2 differences are enough to produce a stationary series.)

Seasonal differencing:

# nsdiffs(jpmorgan$adjusted)  # number of seasonal differences needed
#> Error in nsdiffs(jpmorgan$adjusted) : Non seasonal data

Regular differencing:

ndiffs(jpmorgan$adjusted)  # number of differences needed
## [1] 1

Make it stationary:

stationaryTS <- diff(jpmorgan$adjusted, differences = 1)
plot(stationaryTS, type = "l", main = "Differenced and Stationary")  # appears to be stationary

5. Observations & Suggestions

  • Different sectors’ performances differ significantly over the same period.

The airlines’ stock prices dropped dramatically due to Covid-19 in 2020, whereas the stock prices of the Healthcare and Information Technology sectors skyrocketed, because Covid-19 boosted the usage of healthcare and IT products.

From the 1st observation, I suggest that we show a few different stock sectors for users to select from when we build the final Shiny App.

  • In the same sector, by and large the stocks behave similarly in the long run.

In other words, stocks in the same sector show the same trends and fluctuations. For instance, if we compare Apple Inc.’s price scatterplot with Microsoft’s over the past 5 years, their performances are almost identical; the American Airlines and Spirit Airlines price scatterplots also look nearly the same. Stocks in the same sector are probably affected by the sector’s outlook and investors’ sentiment toward it: if investors favor a sector, most stocks in that sector benefit.

From the 2nd observation, I suggest that we may also need to allow users to select several stocks from the same sector in the final Shiny App.

  • In early 2020, all the stocks’ transaction volumes peaked at historic records.

This can also be explained by the Covid-19 factor: investors rushed to sell off the stocks negatively affected by Covid-19 and buy the stocks positively affected by it.

From the 3rd observation, we should display each stock’s transaction volume data as well as its price data in the final Shiny App, as both can reveal meaningful insights.

  • The differencing technique can be used as a pre-processing step to make the time series data stationary for further modeling.

From the 4th observation, we can present the stationized plots (ACF and PACF charts) along with brief explanations.

  • Other thoughts

On top of the ideas above, as a key component of the EDA, in my view we can make the start date and end date calendar-view selections so that users can choose any time period they wish.

6. Storyboard for the design of the sub-module.

After examining and exploring the stock data, I propose the design of the EDA layout as below.

  1. We can have two tabs: one is to compare multiple stocks and the other one is to deep dive into one stock analysis.

  2. In the comparison page of multiple stocks, both the price and transaction volume will be displayed.

  3. In the one stock analysis page, the price, transaction volume, ACF & PACF and final stationary charts will be plotted.

The storyboard for the design is attached as below.

7. References

  • Wikipedia on time series: https://en.wikipedia.org/wiki/Time_series
  • Timetk package documentation: https://business-science.github.io/timetk/articles/TK04_Plotting_Time_Series.html
  • Business Science: https://www.business-science.io/code-tools/2020/08/19/five-minute-time-series-rolling-calculations.html
  • Tidyquant quant integrations vignette: https://cran.csiro.au/web/packages/tidyquant/vignettes/TQ02-quant-integrations-in-tidyquant.html
  • Tidyquant package: https://cran.r-project.org/web/packages/tidyquant/index.html
  • Time Series Analysis with R, R-Statistics: http://r-statistics.co/Time-Series-Analysis-With-R.html
  • Time Series with R, Dominodatalab: https://blog.dominodatalab.com/time-series-with-r/
  • Stock sector breakdown: https://www.investopedia.com/terms/s/sector-breakdown.asp
  • Stock market sectors: https://www.fool.com/investing/stock-market/market-sectors

Technical Analysis

  • 1. Background
  • 2. Objective/Layout of this assignment
  • 3. Literature Review - Why TidyQuant (Tidy Quantitative Financial Analysis)
    • 3.1 TidyQuant - Charting
    • 3.2 TidyQuant - Portfolio Analysis
  • 4. Loading and Prepping Data : Single Stock and Multiple Stock analysis
  • 5. Testing and Prototyping - Single Stock Analysis (Using APPLE stock)
    • 5.1 Candlestick Chart
    • 5.2 Stock Price/Trading Volume/Returns
  • 6. Charting for Multiple Securities
    • 6.1 Multiple securities - Closing prices
    • 6.2 Multiple securities - Moving averages for multiple stocks at once
    • 6.3 Multiple securities - BBands for multiple stocks
    • 6.4 Multiple securities - Annual Returns
    • 6.5 Portfolio Analysis using the tq_portfolio function
    • 6.6 Miscellaneous Plotting Experimentations
    • 6.6.1 GGPLOT - Exploration of Log Scale vs Continuous Scale on Y for stock prices
  • 7.0 Reflection - Conclusion/Key Learnings/Benefits of Interactivity in the Shiny App
    • 7.1 Interactive User Experience and Self Configuration of views
    • 7.2 Finer details - Interacting with Chart Points
  • 8.0 Possible Shiny App Storyboard for Stock Analysis
  • 9.0 References

Author : Evelyn Phang

Focus of this report : Visualisations for Stock Performance (leveraging the TidyQuant package).

Distill Blog : https://admiring-babbage-48f9ce.netlify.app/posts/2021-04-05-assignment/

1. Background

Many investors analyze stocks based on their fundamentals such as revenue, valuation, or industry trends, but fundamental factors aren’t always reflected in the market price.

Technical analysis using charts helps to identify trading signals and price patterns, and serves as a window into market psychology for identifying opportunities to profit.

Project Group Scope : provide various tools for a beginner investor to analyse stocks’ past performance and forecast, in 3 parts:
1. Forecasting (leveraging the ModelTime package)
2. Visualisations for Stock Performance (leveraging the TidyQuant package)
3. Extensive Time Series analysis (leveraging TimeTK)

2. Objective/Layout of this assignment

This assignment explores the features of TidyQuant to determine what works best for the user, and to identify any gaps that an eventual Shiny App implementation can address to improve the user’s experience.

After some research, the following 3 charts provide the most information for a beginner trader studying a single stock:

  1. Price Movement - Candlestick Charts/Bar Charts
    • These charts reflect the impact of investor sentiment on security prices and are used by technical analysts to determine when to enter and exit trades.
  2. Stock Price and Trading Volume - Line Charts
    • These charts show what trading volumes have been in the past and what they are currently, which is useful context before making a decision.
  3. Moving averages (15-day, 50-day and 200-day) of a particular stock’s price movement
    • Liquid/volatile stocks may benefit from a shorter (e.g. 15-day) moving average analysis, whereas illiquid stocks may be examined with a 50-day moving average.

After some research, four charts are typically useful for a trader analysing a group of stocks or a portfolio: closing prices, moving averages, Bollinger Bands, and annual returns.

3. Literature Review - Why TidyQuant (Tidy Quantitative Financial Analysis)

# Loads tidyquant, lubridate, xts, quantmod, TTR, and PerformanceAnalytics

packages <- c('tidyverse','tidyquant','lubridate','xts','quantmod','TTR','PerformanceAnalytics')

for (p in packages){
  if (!require(p,character.only=T)){
    install.packages(p)
  }
  library(p, character.only=T)
}

TidyQuant addresses a few gaps in the existing quantitative analysis functions (QAF).

  • GAP-1

    • Existing quantitative analysis functions (QAF) - such as xts, zoo, quantmod, TTR, and PerformanceAnalytics - work with time-series objects. These objects do not work well with data frames or the tidyverse workflow.
  • SOLUTION- 1

    • TidyQuant addresses this by providing wrappers that enable seamless usage within the tidyverse workflow.
  • GAP-2: Existing QAF functions for Stock Analysis do not provide any easy to use functions to load stock information and stock indices.

  • SOLUTION-2 : TidyQuant addresses this by implementing the following functions. tq_index() (for a stock index) and tq_exchange() (for a stock exchange) return the stock symbols and various attributes for every stock in an index or exchange; eighteen indexes and three exchanges are available. For quantitative data, tq_get() provides a one-stop shop to get data from various web sources.

  • GAP-3 : The PerformanceAnalytics package consolidates functions to compute many of the most widely used performance metrics, but they cannot easily be applied at scale within the tidyverse.

  • SOLUTION -3 : Tidyquant integrates this functionality so it can be used at scale using the split, apply, combine framework within the tidyverse. Two primary functions integrate the performance analysis functionality:

    • tq_performance implements the performance analysis functions in a tidy way, enabling scaling analysis using the split, apply, combine framework
    • tq_portfolio provides a useful tool set for aggregating a group of individual asset returns into one or many portfolios.

3.1 TidyQuant - Charting

The tidyquant package leverages ggplot2 and has three primary geometry (geom) categories and one coordinate manipulation (coord) category that I will explore:

  • Chart Types: Two chart type visualizations are available using geom_barchart and geom_candlestick.

  • Moving Averages: Seven moving average visualizations are available using geom_ma.

  • Bollinger Bands: Bollinger bands can be visualized using geom_bbands. The BBand moving average can be one of the seven available in Moving Averages.

  • Zooming in on Date Ranges:

    • Two coord functions are available (coord_x_date and coord_x_datetime), which prevent data loss when zooming in on specific regions of a chart. This is important when using the moving average and Bollinger band geoms.

3.2 TidyQuant - Portfolio Analysis

The PerformanceAnalytics package in TidyQuant consolidates functions to compute the most widely used stock performance metrics. Tidyquant integrates this functionality so it can be used at scale via the split, apply, combine framework within the tidyverse.

  • Two primary functions integrate the performance analysis functionality:

    • tq_performance - implements the performance analysis functions in a tidy way, enabling scaling analysis using the split, apply, combine framework.

    • tq_portfolio - provides a useful tool set for aggregating a group of individual asset returns into one or many portfolios.

4. Loading and Prepping Data : Single Stock and Multiple Stock analysis

Tidyquant provides the function tq_get for directly loading data. For the purpose of this assignment I use this function to get the data for APPLE, AMAZON, FACEBOOK, GOOGLE and NETFLIX. To prepare for a multi-stock analysis evaluation, I also build a combined data set.

AAPL <- tq_get("AAPL", get = "stock.prices", from = "2011-01-01", to = "2021-03-31")
AMZN <- tq_get("AMZN", get = "stock.prices", from = "2011-01-01", to = "2021-03-31")
GOOG <- tq_get("GOOG",get = "stock.prices", from = "2011-01-01", to = "2021-03-31")
NFLX <- tq_get("NFLX",get = "stock.prices", from = "2011-01-01", to = "2021-03-31")
FB <- tq_get("FB",get = "stock.prices", from = "2011-01-01", to = "2021-03-31")
FAANG<- rbind(FB,AAPL,AMZN,NFLX,GOOG)

Set the end date for the review to the last day in March 2021:

end <- as_date("2021-03-31")

5. Testing and Prototyping - Single Stock Analysis (Using APPLE stock)

5.1 Candlestick Chart

5.1.1 Candlestick Chart- Complete Period

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "AAPL Candlestick Chart", y = "Closing Price", x = "") +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

KEY LEARNING : A candlestick chart over a long period, e.g. more than 5 years, is not really meaningful. The user will need to be able to zoom into preferred periods, e.g. 6 weeks or less.

5.1.2 Candlestick Chart- Zooming into a 6 week window.

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "AAPL Candlestick Chart", 
         subtitle = "Zoomed in using coord_x_date- 6 weeks",
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(6), end),
                 ylim = c(110, 140)) + 
    theme_tq() +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

5.1.3 Candlestick Chart- Zooming into 1 week

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "AAPL Candlestick Chart", 
         subtitle = "Zoomed in using coord_x_date - 1 week",
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(1), end),
                 ylim = c(115, 130)) + 
    theme_tq() +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

5.1.4 Candlestick Chart- Using different colours for the candlesticks.

The colors can be modified using colour_up and colour_down, which control the line colors, and fill_up and fill_down, which control the rectangle fills. Using red and green makes it very clear to the user which is the ‘danger’ candlestick.

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close),
                        colour_up = "darkgreen", colour_down = "darkred", 
                        fill_up  = "darkgreen", fill_down  = "darkred") +
    labs(title = "AAPL Candlestick Chart- 6 weeks ", 
         subtitle = "Zoomed in, Experimenting with Formatting: dark red and dark green gives the best visual",
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(6), end),
                 ylim = c(110, 140)) + 
    theme_tq()

KEY LEARNING : Red and green is a good colour palette to immediately show the days where the closing price was lower than the opening price (red) - it makes the candlestick chart quickly readable.

5.1.5 Price Bar Chart- alternative to candlestick

Price Bar Chart - Each bar typically shows open, high, low, and closing (OHLC) prices. Bar charts are very similar to candlestick charts. The two chart types show the same information but in different ways. A bar chart is composed of a vertical line with small horizontal lines on the left and right that show the open and close. Candlesticks also have a vertical line showing the high and low of the period (called a shadow or wick), but the difference between the open and close is represented by a thicker portion called a real body. The body is shaded in or colored red if the close is below the open. The body is white or green if the close is above the open. While the information is the same, the visual look of the two chart types is different.

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_barchart(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "AAPL Bar Chart -Stock Prices",
         subtitle = "The bar chart does not give more meaningful info than the line chart for large time scales",
         y = "Closing Price in USD", x = "") +
    theme_tq()

KEY LEARNING : The bar chart does not give more meaningful info than the line chart for large time scales.

5.1.6 Price BarChart- for 4 weeks - zoomed in

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_barchart(aes(open = open, high = high, low = low, close = close),
                  colour_up = "darkgreen", colour_down = "darkred", size = 0.8) +
    labs(title = "AAPL Bar Chart of Stock Prices", 
         subtitle = "Zoomed in using coord_x_date- colour_up = darkgreen, colour_down = darkred",
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(4), end),
                 ylim = c(110, 130)) + 
     theme_tq()

Key Learning : Candlestick charts are far easier to interpret and read than price bar charts.

5.2 Stock Price/TradingVolume/Returns

Exploring the various ways to visualize price/trading volume/returns/moving averages.

5.2.1 Price Line Chart- for the full period

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_line() +
    labs(title = "AAPL Line Chart- Prices", y = "Closing Price in USD", x = "") +
    theme_tq()

5.2.2 Stock Volume Chart

We can use the geom_segment() function to chart daily volume, which uses xy points for the beginning and end of the line. Using the aesthetic color argument, we color based on the value of volume to make these data stick out.

AAPL %>%
    ggplot(aes(x = date, y = volume)) +
    geom_segment(aes(xend = date, yend = 0, color = volume)) + 
    geom_smooth(method = "loess", se = FALSE) +
    labs(title = "APPLE Volume Chart - Bar Chart - Whole Period", 
         subtitle = "Charting Daily Volume", 
         y = "Volume", x = "") +
    theme_tq() +
    theme(legend.position = "none") 

5.2.3 Volume Chart - using geom_segment - zoomed into the past 24 weeks

And, we can zoom in on a specific region. Using scale_color_gradient we can quickly visualize the high and low points, and using geom_smooth we can see the trend.

start <- end - weeks(24)

AAPL %>%
    filter(date >= start - days(20)) %>%
    ggplot(aes(x = date, y = volume)) +
    geom_segment(aes(xend = date, yend = 0, color = volume)) +
    geom_smooth(method = "loess", se = FALSE) +
    labs(title = "APPLE Volume Chart - Bar Chart - Past 24 Weeks", 
         subtitle = "Charting Daily Volume, Zoomed in to the past 24 weeks", 
         y = "Volume", x = "") + 
    coord_x_date(xlim = c(start, end)) +
    scale_color_gradient(low = "red", high = "darkblue") +
    theme_tq() + 
    theme(legend.position = "none")

5.2.4 Single Stock Returns Chart - Line Chart/Time series

Showing the monthly return for a single stock:

tq_get(c("AAPL"), get="stock.prices") %>%
  tq_transmute(select=adjusted,
               mutate_fun=periodReturn,
               period="monthly",
               col_rename = "monthly_return") %>%
  ggplot(aes(date, monthly_return)) +
  labs(title = "APPLE STOCK-Monthly Return - Line Chart") + 
  geom_line()

5.2.5 Moving Averages

Visualizing trend averages is important for time-series analysis of stocks. For example, a simple moving average (SMA) calculates the average of a selected range of prices, usually closing prices, over the number of periods in that range. It is a technical indicator that can aid in determining whether an asset price will continue a bull or bear trend or reverse it. A simple moving average can be enhanced as an exponential moving average (EMA), which weights recent price action more heavily.

Tidyquant includes geoms to enable “rapid prototyping” to quickly visualize signals using moving averages and Bollinger bands.

Within Tidyquant the following moving averages are available:

1) Simple moving averages (SMA)
2) Exponential moving averages (EMA)

… and more advanced averages:

3) Weighted moving averages (WMA)
4) Double exponential moving averages (DEMA)
5) Zero-lag exponential moving averages (ZLEMA)
6) Volume-weighted moving averages (VWMA) (also known as VWAP)
7) Elastic, volume-weighted moving averages (EVWMA) (also known as MVWAP)

For exploration, we will visualize the SMA and EMA.
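Before using geom_ma, it helps to see the arithmetic the two averages encapsulate. The following is an illustrative base-R sketch (not tidyquant’s or TTR’s own implementation):

```r
# Base-R sketch of the two averages (illustrative only)

# SMA: unweighted mean of the last n closing prices
sma <- function(x, n) stats::filter(x, rep(1 / n, n), sides = 1)

# EMA: recent prices weighted more heavily via smoothing factor k = 2 / (n + 1)
ema <- function(x, n) {
  k   <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in 2:length(x)) out[i] <- k * x[i] + (1 - k) * out[i - 1]
  out
}

close <- c(10, 11, 12, 13, 14, 15)
as.numeric(sma(close, 3))  # NA NA 11 12 13 14
```

The first n - 1 values of the SMA are NA, which is why charts filter in extra out-of-bounds data before the plotted window (see the key learning in section 6.2).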

5.2.5.1 Charting the 14-day, 50-day and 200-day Simple Moving Average

Charting the 14-day, 50-day and 200-day simple moving averages using the SMA function as an example. We apply the moving average geoms after the candlestick geom to overlay the moving averages on top of the candlesticks. We add three moving average calls - 14-day (green), 50-day (blue) and 200-day (red) - using color to distinguish them.

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_ma(ma_fun = SMA, n = 14, color= 'green', size = 1.25) +
    geom_ma(ma_fun = SMA, n = 50, color= 'blue', size = 1.25) +
    geom_ma(ma_fun = SMA, n = 200, color = "red", size = 1.25) + 
    labs(title = "AAPL Candlestick Chart", 
         subtitle = "14-Day SMA (green), 50-Day SMA (blue) and 200-Day SMA (red)", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(24), end),
                 ylim = c(80, 160)) + 
    theme_tq()

5.2.5.2 Charting Exponential Moving Average (EMA)

AAPL %>%
    ggplot(aes(x = date, y = close)) +
    geom_barchart(aes(open = open, high = high, low = low, close = close)) +
    geom_ma(ma_fun = EMA, n = 50, wilder = TRUE, linetype = 5, size = 1.25) +
    geom_ma(ma_fun = EMA, n = 200, wilder = TRUE, color = "red", size = 1.25) + 
    labs(title = "AAPL Bar Chart", 
         subtitle = "50 (Blue) and 200-Day (Red) EMA ", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(24), end),
                 ylim = c(80, 160)) +
    theme_tq()

5.2.5.3 Charting for Reviewing Stock Volatility - Bollinger Bands

Why Bollinger Bands? Bollinger Bands are used to visualize volatility by plotting a range around a moving average, typically two standard deviations up and down. Because they use a moving average, the geom_bbands function works almost identically to geom_ma; the same seven moving averages are compatible. The main differences are the addition of the standard deviation argument sd, which is 2 by default, and the high, low and close aesthetics, which are required to calculate the bands.
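As a sanity check on what the geom draws, here is an illustrative base-R sketch of the band arithmetic - a rolling mean plus/minus sd rolling standard deviations. This is not geom_bbands’ own code, just the underlying calculation:

```r
# Base-R sketch of Bollinger Band arithmetic (illustrative only)
bbands <- function(close, n = 20, sd_mult = 2) {
  roll <- function(f) sapply(seq_along(close), function(i)
    if (i < n) NA else f(close[(i - n + 1):i]))
  ma <- roll(mean)                       # the moving average (here an SMA)
  s  <- roll(sd)                         # rolling standard deviation
  data.frame(ma = ma, upper = ma + sd_mult * s, lower = ma - sd_mult * s)
}

set.seed(1)
close <- 100 + cumsum(rnorm(60))         # simulated stand-in for closing prices
bands <- bbands(close)                   # first 19 rows are NA, as with n = 20
```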

5.2.5.4 Applying BBands using an SMA

AAPL %>%
    ggplot(aes(x = date, y = close, open = open,
               high = high, low = low, close = close)) +
    geom_candlestick() +
    geom_bbands(ma_fun = SMA, sd = 2, n = 20) +
    labs(title = "AAPL Candlestick Chart", 
         subtitle = "BBands with SMA Applied", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(24), end),
                 ylim = c(80, 160)) + 
    theme_tq()

Modifying the appearance of Bollinger Bands

AAPL %>%
    ggplot(aes(x = date, y = close, open = open,
               high = high, low = low, close = close)) +
    geom_candlestick() +
    geom_bbands(ma_fun = SMA, sd = 2, n = 20, 
                linetype = 4, size = 1, alpha = 0.2, 
                fill        = palette_light()[[1]], 
                color_bands = palette_light()[[1]], 
                color_ma    = palette_light()[[2]]) +
    labs(title = "AAPL Candlestick Chart", 
         subtitle = "BBands with SMA Applied, Experimenting with Formatting", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(end - weeks(24), end),
                 ylim = c(80, 160)) + 
    theme_tq()

6. Charting for Multiple Securities

The main function here is facet_wrap, used to visualize multiple stocks at the same time. By adding a group aesthetic in the main ggplot() function and combining it with a facet_wrap() function at the end of the ggplot workflow, all five “FAANG” stocks can be viewed simultaneously.

6.1 Multiple securities - Closing prices

6.1.1 Multiple securities - closing prices - in a Timeseries Line Chart

tq_get(c("GOOG","AMZN","FB","AAPL","NFLX"),get="stock.prices") %>%
  ggplot(aes(date, close, color=symbol)) +
  geom_line()

6.1.2 Multiple securities - closing prices - in Facet

start <- end - weeks(6)
FAANG %>%
    filter(date >= start - days(2 * 15)) %>%
    ggplot(aes(x = date, y = close, group = symbol)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    labs(title = "FAANG Candlestick Chart", 
         subtitle = "Experimenting with Multiple Stocks",
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, ncol = 2, scales = "free_y") + 
    theme_tq()

6.2 Multiple securities - Moving averages for multiple stocks at once

Experimenting with plotting multiple moving averages.

start <- end - weeks(6)

FAANG %>%
    filter(date >= start - days(6 * 50)) %>%
    ggplot(aes(x = date, y = close, volume = volume, group = symbol)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_ma(ma_fun = SMA, n = 14, color= 'green', size = 0.5) +
    geom_ma(ma_fun = SMA, n = 50, color= 'blue', size = 0.5) +
    geom_ma(ma_fun = SMA, n = 200, color = "red", size = 0.5) +
    labs(title = "Multiple Securities : FAANG ", 
         subtitle = "14, 50 and 200-Day SMA, Experimenting with Multiple Stocks", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, ncol = 2, scales = "free_y") + 
    theme_tq()

KEY LEARNING : It doesn’t make sense to plot multiple moving averages in one combined visual, as the lines are too close together to distinguish.

start <- end - weeks(6)

FAANG %>%
    filter(date >= start - days(2 * 15)) %>%
    ggplot(aes(x = date, y = close, volume = volume, group = symbol)) +
    geom_candlestick(aes(open = open, high = high, low = low, close = close)) +
    geom_ma(ma_fun = SMA, n = 15, color= 'blue', size = 0.5) +
    labs(title = "Multiple Securities : FAANG ", 
         subtitle = "15-Day SMA, Experimenting with Multiple Stocks", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, ncol = 2, scales = "free_y") + 
    theme_tq()

KEY LEARNING : FAANG data is filtered by date from double the number of moving-average days (2 * n) before the start date. This yields a nice y-axis scale and still allows us to create a moving average line using geom_ma. Too much out-of-bounds data distorts the scale of the y-axis; too little and we cannot get a moving average. The optimal method is to include “just enough” out-of-bounds data to get the chart we want.

6.3 Multiple securities - BBands for multiple stocks

This is to try out the geom_bbands and facet_wrap functions.

start <- end - weeks(24)
FAANG %>%
    filter(date >= start - days(2 * 20)) %>%
    ggplot(aes(x = date, y = close, 
               open = open, high = high, low = low, close = close, 
               group = symbol)) +
    geom_barchart() +
    geom_bbands(ma_fun = SMA, sd = 2, n = 20, linetype = 5) +
    labs(title = "Multiple Securities : FAANG ", 
         subtitle = "BBands with SMA Applied, Experimenting with Multiple Stocks", 
         y = "Closing Price", x = "") + 
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, ncol = 2, scales = "free_y") + 
    theme_tq()

6.4 Multiple securities - Annual Returns

6.4.1 Multiple Stocks in a one line chart

tq_get(c("GOOGL","AMZN","FB","AAPL","NFLX"), get="stock.prices") %>%
  group_by(symbol) %>%
  tq_transmute(select=adjusted,
               mutate_fun=periodReturn,
               period="monthly",
               col_rename = "monthly_return") %>%
  ggplot(aes(date, monthly_return, color=symbol)) +
  labs(title = "Multiple Securities : FAANG - in one line chart") + 
  geom_line()

KEY LEARNING : From the chart we can see that NETFLIX had much bigger swings in return than the other stocks. For a deeper-dive comparison, a side-by-side facet layout will be required.

6.4.2 Multiple Stocks in a Facet Layout

This is to try out the tq_transmute function.

FAANG_annual_returns <- FAANG %>%
    group_by(symbol) %>%
    tq_transmute(select     = adjusted, 
                 mutate_fun = periodReturn, 
                 period     = "yearly", 
                 type       = "arithmetic")
FAANG_annual_returns %>%
    ggplot(aes(x = date, y = yearly.returns, fill = symbol)) +
    geom_col() +
    geom_hline(yintercept = 0, color = palette_light()[[1]]) +
    scale_y_continuous(labels = scales::percent) +
    labs(title = "FAANG: Annual Returns",
         subtitle = "Get annual returns quickly with tq_transmute!",
         y = "Annual Returns", x = "") + 
    facet_wrap(~ symbol, ncol = 2, scales = "free_y") +
    theme_tq() + 
    scale_fill_tq()

6.5 Portfolio Analysis using the tq_portfolio function

6.5.1 Portfolio Returns

stock_returns_monthly <- c("AAPL", "GOOG", "NFLX","FB","AMZN") %>%
    tq_get(get  = "stock.prices",
           from = "2010-01-01",
           to   = "2015-12-31") %>%
    group_by(symbol) %>%
    tq_transmute(select     = adjusted, 
                 mutate_fun = periodReturn, 
                 period     = "monthly", 
                 col_rename = "Ra")
wts <- c(0.2, 0.2, 0.2,0.2,0.2)
portfolio_returns_monthly <- stock_returns_monthly %>%
    tq_portfolio(assets_col  = symbol, 
                 returns_col = Ra, 
                 weights     = wts, 
                 col_rename  = "Ra")
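Under the hood, for fixed weights each monthly portfolio return is simply the weighted sum of that month’s asset returns. A minimal base-R sketch of one month, using hypothetical return numbers:

```r
# Base-R sketch: one month of the aggregation tq_portfolio() performs
# (the return values below are hypothetical, for illustration only)
asset_returns <- c(AAPL = 0.02, GOOG = 0.01, NFLX = -0.03, FB = 0.015, AMZN = 0.005)
wts           <- c(0.2, 0.2, 0.2, 0.2, 0.2)   # equal-weighted, as in the code above

portfolio_return <- sum(wts * asset_returns)  # weighted sum of asset returns
portfolio_return  # 0.004
```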


portfolio_returns_monthly %>%
    ggplot(aes(x = date, y = Ra)) +
    geom_bar(stat = "identity", fill = palette_light()[[1]]) +
    labs(title = "Portfolio Returns",
         subtitle = "20% AAPL, 20% GOOG,20% NFLX, 20% FB, 20% AMZN",
         caption = "Shows an above-zero trend meaning positive returns",
         x = "", y = "Monthly Returns") +
    geom_smooth(method = "lm") +
    theme_tq() +
    scale_color_tq() +
    scale_y_continuous(labels = scales::percent)

6.5.2 Portfolio Growth

wts <- c(0.2,0.2,0.2,0.2,0.2)
portfolio_growth_monthly <- stock_returns_monthly %>%
    tq_portfolio(assets_col   = symbol, 
                 returns_col  = Ra, 
                 weights      = wts, 
                 col_rename   = "investment.growth",
                 wealth.index = TRUE) %>%
    mutate(investment.growth = investment.growth * 10000)
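The wealth.index = TRUE option compounds the monthly returns, and the mutate() step scales the result to a $10,000 initial investment. In base-R terms, with hypothetical monthly returns:

```r
# Base-R sketch of the wealth-index calculation behind portfolio growth
monthly_returns   <- c(0.01, -0.02, 0.03, 0.005)      # hypothetical months
investment_growth <- cumprod(1 + monthly_returns) * 10000

round(investment_growth[1])  # 10100 after the first +1% month
```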

portfolio_growth_monthly %>%
    ggplot(aes(x = date, y = investment.growth)) +
    geom_line(size = 1.5, color = palette_light()[[1]]) +
    labs(title = "Portfolio Growth",
         subtitle = "20% AAPL, 20% GOOG,20% NFLX, 20% FB, 20% AMZN",
         caption = "Visualize performance!",
         x = "", y = "Portfolio Value") +
    geom_smooth(method = "loess") +
    theme_tq() +
    scale_color_tq() +
    scale_y_continuous(labels = scales::dollar)

6.5.3 Visualizing Multiple Portfolios

To visualize multiple portfolios, we just need to configure the weights table - here, three portfolios across the five chosen stocks.

Portfolio 1: 20% AAPL, 25% GOOG, 25% NFLX, 20% FB, 10% AMZN
Portfolio 2: 10% AAPL, 40% GOOG, 30% NFLX, 10% FB, 10% AMZN
Portfolio 3: 15% AAPL, 25% GOOG, 40% NFLX, 10% FB, 10% AMZN

weights <- c(
    0.2, 0.25, 0.25,0.2,0.1,
    0.10, 0.40, 0.3,0.1,0.1,
    0.15, 0.25, 0.40,0.1,0.1
)

stocks <- c("AAPL", "GOOG", "NFLX","FB","AMZN")

weights_table <-  tibble(stocks) %>%
    tq_repeat_df(n =3) %>%
    bind_cols(tibble(weights)) %>%
    group_by(portfolio)

stock_returns_monthly_multi <- stock_returns_monthly %>%
    tq_repeat_df(n = 3)

portfolio_growth_monthly_multi <- stock_returns_monthly_multi %>%
    tq_portfolio(assets_col   = symbol, 
                 returns_col  = Ra, 
                 weights      = weights_table, 
                 col_rename   = "investment.growth",
                 wealth.index = TRUE) %>%
    mutate(investment.growth = investment.growth * 10000)
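The grouped weights table that tq_repeat_df builds is long-format: one row per (portfolio, stock) pair, with each portfolio’s weights summing to 1. A base-R sketch of its shape:

```r
# Base-R sketch of the long-format weights table (one row per portfolio-stock pair)
stocks  <- c("AAPL", "GOOG", "NFLX", "FB", "AMZN")
weights <- c(0.20, 0.25, 0.25, 0.20, 0.10,
             0.10, 0.40, 0.30, 0.10, 0.10,
             0.15, 0.25, 0.40, 0.10, 0.10)

weights_table <- data.frame(
  portfolio = rep(1:3, each = length(stocks)),
  stocks    = rep(stocks, times = 3),
  weights   = weights)

# each portfolio's weights sum to 1
tapply(weights_table$weights, weights_table$portfolio, sum)
```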



portfolio_growth_monthly_multi %>%
    ggplot(aes(x = date, y = investment.growth, color = factor(portfolio))) +
    geom_line(size = 1) +
    labs(title = "Portfolio Growth",
         subtitle = "Comparing Multiple Portfolios",
         caption = "The best-performing portfolio mix can be read off the legend",
         x = "", y = "Portfolio Value",
         color = "Portfolio") +
    geom_smooth(method = "loess") +
    theme_tq() +
    scale_color_tq() +
    scale_y_continuous(labels = scales::dollar)

6.6 Miscellaneous Plotting Experimentations

6.6.1 GGPLOT - Exploration of Log Scale vs Continuous Scale on Y for stock prices

AMZN %>%
    ggplot(aes(x = date, y = adjusted)) +
    geom_line(color = palette_light()[[1]]) + 
    scale_y_continuous() +
    labs(title = "AMZN Line Chart", 
         subtitle = "Continuous Scale", 
         y = "Closing Price", x = "") + 
    theme_tq()

AMZN %>%
    ggplot(aes(x = date, y = adjusted)) +
    geom_line(color = palette_light()[[1]]) + 
    scale_y_log10() +
    labs(title = "AMZN Line Chart", 
         subtitle = "Log Scale", 
         y = "Closing Price", x = "") + 
    theme_tq()

6.6.2 GGPLOT - Plotting Regression trendlines with geom_smooth

AMZN %>%
    ggplot(aes(x = date, y = adjusted)) +
    geom_line(color = palette_light()[[1]]) + 
    scale_y_log10() +
    geom_smooth(method = "lm") +
    labs(title = "AMZN Line Chart", 
         subtitle = "Log Scale, Applying Linear Trendline", 
         y = "Adjusted Closing Price", x = "") + 
    theme_tq()

6.6.3 Tidyquant Themes

6.6.3.1 Testing with Dark Theme and thin lines

n_mavg <- 50 # Number of periods (days) for moving average
FAANG %>%
    filter(date >= start - days(2 * n_mavg)) %>%
    ggplot(aes(x = date, y = close, color = symbol)) +
    geom_line(size = 0.2) +
    geom_ma(n = 15, color = "darkblue", size = 0.2) + 
    geom_ma(n = n_mavg, color = "red", size = 0.2) +
    labs(title = "Dark Theme and thin lines- size 0.2",
         x = "", y = "Closing Price") +
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, scales = "free_y") +
    theme_tq_dark() +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

n_mavg <- 50 # Number of periods (days) for moving average
FAANG %>%
    filter(date >= start - days(2 * n_mavg)) %>%
    ggplot(aes(x = date, y = close, color = symbol)) +
    geom_line(size = 1.5) +
    geom_ma(n = 15, color = "darkblue", size = 1.5) + 
    geom_ma(n = n_mavg, color = "red", size = 1.5) +
    labs(title = "Light Theme with thicker lines of size -1.5",
         x = "", y = "Closing Price") +
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, scales = "free_y") +
    theme_tq() +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

n_mavg <- 50 # Number of periods (days) for moving average
FAANG %>%
    filter(date >= start - days(2 * n_mavg)) %>%
    ggplot(aes(x = date, y = close, color = symbol)) +
    geom_line(size = 1) +
    geom_ma(n = 15, color = "darkblue", size = 1) + 
    geom_ma(n = n_mavg, color = "red", size = 1) +
    labs(title = "Green Theme with normal size line",
         x = "", y = "Closing Price") +
    coord_x_date(xlim = c(start, end)) +
    facet_wrap(~ symbol, scales = "free_y") +
    theme_tq_green() +
    scale_color_tq(theme = "dark") +
    scale_y_continuous(labels = scales::dollar)

Key Learning : The other colour schemes will invariably collide with one of the coloured lines, e.g. the green theme in the last example above. The white theme with a normal line size of 1 is good for single charts; for faceted charts, a white background with a fine line width of about 0.8 still presents the information with better clarity, and should be used as the overall approach.

7.0 Reflection - Conclusion/Key Learnings/Benefits of Interactivity in the Shiny App

A interactive implementation of the above would be able to bring much better user experience. Based on experimenting with various formats, graph and chart options i think :

a) A candlestick chart is clearer for the user to read than a bar chart.

b) Facets make it possible to read the details per stock (when comparing a portfolio of stocks) and should be offered as an additional option alongside the typical single chart.

c) The default 'white' theme is much clearer than the fancier colour schemes such as grey/green.

d) A log scale on the y-axis (e.g. for stock prices), although it allows a wider range to be captured, can be misleading and is not recommended for stock prices - especially when interpreting trends.
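To see why, the same faceted closing-price chart can be redrawn with `scale_y_log10()`; a minimal sketch assuming the `FAANG` tibble and tidyquant themes used in the earlier sections:

```r
library(tidyquant)
library(tidyverse)

# For comparison only: a log10 y-axis makes equal vertical distances
# represent equal percentage changes, which can visually flatten
# recent price moves and distort how trends are read.
FAANG %>%
    ggplot(aes(x = date, y = close, color = symbol)) +
    geom_line(size = 1) +
    facet_wrap(~ symbol, scales = "free_y") +
    scale_y_log10(labels = scales::dollar) +
    theme_tq() +
    labs(title = "Log-scale y-axis (for comparison only)",
         x = "", y = "Closing Price (log scale)")
```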

7.1 Interactive User Experience and Self Configuration of views

  1. The user will be able to select the date range for "zooming" into the chart, rather than having it hard-coded as above. For example, a candlestick chart is not meaningful over multiple years.

  2. The user will be able to use a drop-down menu to select the type of chart to view, for either a single stock or a group of stocks.

  3. The example shows the group analysis for 5 chosen stocks; in an interactive implementation the user will be able to decide which stocks to view and compare as a group.

  4. With an interactive app, the portfolio mix can also be configured for various scenarios.
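The self-configured views above could be wired up with standard Shiny inputs. In this sketch the input names, stock choices and the `render_price_chart()` helper are illustrative assumptions, not final application code:

```r
library(shiny)

# Illustrative UI: date-range zoom, chart-type selection and
# multi-stock group comparison, matching points 1-3 above.
ui <- fluidPage(
    dateRangeInput("dates", "Date range",
                   start = "2020-01-01", end = "2021-03-01"),
    selectInput("chart_type", "Chart type",
                choices = c("Line", "Candlestick", "Bar")),
    selectInput("symbols", "Stocks to compare",
                choices = c("FB", "AMZN", "AAPL", "NFLX", "GOOG"),
                selected = "AAPL", multiple = TRUE),
    plotOutput("price_plot")
)

server <- function(input, output, session) {
    output$price_plot <- renderPlot({
        # render_price_chart() is a hypothetical helper that would
        # rebuild the tidyquant/ggplot chart from the user's choices.
        render_price_chart(input$symbols, input$dates, input$chart_type)
    })
}

# shinyApp(ui, server)
```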

7.2 Finer Details - Interacting with Chart Points

With an interactive visualisation, the user can see more detail about the chart by hovering over it. Currently, all values have to be read off the grid: the user must estimate the x and y values from the grid lines instead of seeing the exact values.
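One common way to get such hover tooltips with minimal changes is to wrap an existing ggplot object with `plotly::ggplotly()`. A sketch, assuming `p` holds one of the charts built above (a hypothetical variable, since the earlier chunks print the plots directly):

```r
library(plotly)

# p is assumed to be a ggplot object, e.g. the faceted
# closing-price chart built earlier with ggplot()/geom_line().
# ggplotly() converts it to an interactive htmlwidget where
# hovering over a point shows its exact date and price.
ggplotly(p, tooltip = c("x", "y", "colour"))
```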

8.0 Possible Shiny App Story Board for Stock Analysis


9.0 References

1) Tidyquant Reference Manual: https://cloud.r-project.org/web/packages/tidyquant/tidyquant.pdf (accessed 5th March 2021)

2) Introduction to tidyquant: https://cloud.r-project.org/web/packages/tidyquant/vignettes/TQ00-introduction-to-tidyquant.html

3) R Graphics Cookbook - Winston Chang

4) Performance Analysis with tidyquant: https://cloud.r-project.org/web/packages/tidyquant/vignettes/TQ05-performance-analysis-with-tidyquant.html

5) Various examples: https://edav.info/tidyquant.html


BUI ANH HOANG - Evelyn Phang - Huang Ling
