Replicating Fama-French Factors

Felix Dimmerling

2025-07-03

Background of Empirical Asset Pricing

Portfolio Formation So Far

  • Capital Asset Pricing Model (CAPM) → has limits
    • Pricing rests solely on the relation between systematic risk and asset returns
    • "All firms are alike" (only the beta with respect to the market excess return matters)
    • No resolution, because firm characteristics are ignored entirely
      → not reliable for detailed analyses
    • Return patterns that the model cannot explain

Firm Characteristics for Better Prediction

                           CAPM      FF3                    FF5
Factors                    1         3                      5
Granularity (portfolios)   1         6 (independent sort)   ~18 (dependent sort)
Theoretical elegance       High      Solid                  Redundancy debate (HML vs. RMW/CMA)
Explanatory power          Weak      Good (size & value)    Very good (profitability & investment)
Complexity                 Low       Medium                 High
Empirical robustness       Low       Confirmed long-term    Not yet stable across all periods
Practical relevance        Limited   High                   Very high

Factors in Detail

                          Factor   CAPM   FF3   FF5
Market excess return               x      x     x
Size                      SMB             x     x
Value                     HML             x     x
Profitability             RMW                   x
Investment                CMA                   x
  • Firm characteristics are used systematically
  • Return differences are, for now, explained only as correlations, not causally
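
For reference, the three models can be written as time-series regressions in their usual textbook form. The loadings $\alpha_i, \beta_i, s_i, h_i, r_i, c_i$ are notation introduced here for illustration and do not appear on the slides; $r_{i,t}$ is the return of asset $i$, $r_{f,t}$ the risk-free rate and $r_{m,t}$ the market return.

```latex
\begin{align*}
\text{CAPM:}\quad r_{i,t} - r_{f,t} &= \alpha_i + \beta_i\,(r_{m,t} - r_{f,t}) + \varepsilon_{i,t} \\
\text{FF3:}\quad r_{i,t} - r_{f,t} &= \alpha_i + \beta_i\,(r_{m,t} - r_{f,t})
  + s_i\,\mathrm{SMB}_t + h_i\,\mathrm{HML}_t + \varepsilon_{i,t} \\
\text{FF5:}\quad r_{i,t} - r_{f,t} &= \alpha_i + \beta_i\,(r_{m,t} - r_{f,t})
  + s_i\,\mathrm{SMB}_t + h_i\,\mathrm{HML}_t
  + r_i\,\mathrm{RMW}_t + c_i\,\mathrm{CMA}_t + \varepsilon_{i,t}
\end{align*}
```

Each model nests the previous one: setting $r_i = c_i = 0$ in FF5 gives FF3, and additionally setting $s_i = h_i = 0$ gives the CAPM.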

Portfolio Formation in Fama-French

  • Firms are sorted into portfolio groups
  • FF3 is less granular than FF5
  • Expected returns are raised with a long/short strategy
    • Long positions in the better-performing portfolios are financed by short-selling the short portfolios
  • Return patterns are explained systematically by firm characteristics
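
The long/short idea can be sketched with a toy example. The returns below are made-up numbers for a single month, and the coding (`portfolio_size` 1 = small, 2 = big; `portfolio_bm` 1 = low, 3 = high) simply mirrors the convention used in the replication code of this deck. It uses base R only, so it runs without the data files.

```r
# Hypothetical value-weighted excess returns of the six 2x3 portfolios
# for one month (made-up numbers, for illustration only).
toy <- data.frame(
  portfolio_size = c(1, 1, 1, 2, 2, 2),  # 1 = small, 2 = big
  portfolio_bm   = c(1, 2, 3, 1, 2, 3),  # 1 = low, 3 = high book-to-market
  ret            = c(0.010, 0.014, 0.020, 0.006, 0.008, 0.012)
)

# SMB: long the three small portfolios, short the three big ones.
smb <- mean(toy$ret[toy$portfolio_size == 1]) -
  mean(toy$ret[toy$portfolio_size == 2])

# HML: long the two high book-to-market portfolios, short the two low ones.
hml <- mean(toy$ret[toy$portfolio_bm == 3]) -
  mean(toy$ret[toy$portfolio_bm == 1])

round(c(smb = smb, hml = hml), 4)
```

Because each factor is a difference of two portfolio averages, it is a zero-investment return: the short leg funds the long leg.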

Packages & Data Import

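# tidyverse attaches dplyr, stringr and (since tidyverse 2.0.0) lubridate,
# which provide select(), str_c(), month(), year(), ymd() and %m+% used below
library(tidyverse)
library(nanoparquet)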

crsp_monthly <- read_parquet("data/crsp_monthly.parquet") |>
  select(permno, gvkey, date, ret_excess, mktcap, mktcap_lag, exchange)

compustat <- read_parquet("data/compustat.parquet") |>
  select(gvkey, datadate, be, op, inv)

factors_ff3_monthly <- read_parquet("data/factors_ff3_monthly.parquet") |>
  select(date, smb, hml)

factors_ff5_monthly <- read_parquet("data/factors_ff5_monthly.parquet") |>
  select(date, smb, hml, rmw, cma)
[1] "crsp_monthly"
# A data frame: 3,378,303 × 7
   permno gvkey  date       ret_excess mktcap mktcap_lag exchange
    <dbl> <chr>  <date>          <dbl>  <dbl>      <dbl> <chr>   
 1  10000 013007 1986-02-01    -0.262   12.0       16.1  NASDAQ  
 2  10000 013007 1986-03-01     0.359   16.3       12.0  NASDAQ  
 3  10000 013007 1986-04-01    -0.104   15.2       16.3  NASDAQ  
 4  10000 013007 1986-05-01    -0.228   11.8       15.2  NASDAQ  
 5  10000 013007 1986-06-01    -0.0102  11.7       11.8  NASDAQ  
 6  10000 013007 1986-07-01    -0.0860  10.8       11.7  NASDAQ  
 7  10000 013007 1986-08-01    -0.620    4.15      10.8  NASDAQ  
 8  10000 013007 1986-09-01    -0.0616   3.91       4.15 NASDAQ  
 9  10000 013007 1986-10-01    -0.247    3.00       3.91 NASDAQ  
10  10000 013007 1986-11-01     0.0561   3.18       3.00 NASDAQ  
# ℹ 3,378,293 more rows
[1] "compustat"
# A data frame: 495,358 × 5
   gvkey  datadate       be      op     inv
   <chr>  <date>      <dbl>   <dbl>   <dbl>
 1 001000 1961-12-31 NA     NA      NA     
 2 001000 1962-12-31  0.552  2.88   NA     
 3 001000 1963-12-31  0.561  0.0463 NA     
 4 001000 1964-12-31  0.627  0.150  NA     
 5 001000 1965-12-31  0.491 -0.452   0.631 
 6 001000 1966-12-31  0.834  0.516   0.0519
 7 001000 1967-12-31  0.744  0.140   0.0107
 8 001000 1968-12-31  2.57   0.370   1.41  
 9 001000 1969-12-31 10.2    0.435   3.85  
10 001000 1970-12-31 10.5    0.430   0.165 
# ℹ 495,348 more rows
[1] "factors_ff3_monthly"
# A data frame: 768 × 3
   date           smb     hml
   <date>       <dbl>   <dbl>
 1 1960-01-01  0.0209  0.0278
 2 1960-02-01  0.0051 -0.0193
 3 1960-03-01 -0.0049 -0.0294
 4 1960-04-01  0.0032 -0.0228
 5 1960-05-01  0.0121 -0.037 
 6 1960-06-01 -0.0021 -0.0034
 7 1960-07-01 -0.0051  0.0198
 8 1960-08-01  0.0087 -0.0018
 9 1960-09-01 -0.0111  0.0162
10 1960-10-01 -0.0408  0.0283
# ℹ 758 more rows
[1] "factors_ff5_monthly"
# A data frame: 726 × 5
   date           smb     hml     rmw     cma
   <date>       <dbl>   <dbl>   <dbl>   <dbl>
 1 1963-07-01 -0.0041 -0.0097  0.0068 -0.0118
 2 1963-08-01 -0.008   0.018   0.0036 -0.0035
 3 1963-09-01 -0.0052  0.0013 -0.0071  0.0029
 4 1963-10-01 -0.0139 -0.001   0.028  -0.0201
 5 1963-11-01 -0.0088  0.0175 -0.0051  0.0224
 6 1963-12-01 -0.021  -0.0002  0.0003 -0.0007
 7 1964-01-01  0.0013  0.0148  0.0017  0.0147
 8 1964-02-01  0.0028  0.0281 -0.0005  0.0091
 9 1964-03-01  0.0123  0.034  -0.0221  0.0322
10 1964-04-01 -0.0152 -0.0067 -0.0127 -0.0108
# ℹ 716 more rows

Fama-French Date Adjustments

# Size: June market cap of year t, stamped with July 1 of year t
size <- crsp_monthly |>
  filter(month(date) == 6) |>
  mutate(sorting_date = date %m+% months(1)) |>
  select(permno, exchange, sorting_date, size = mktcap)

# Market equity: December market cap of year t-1, used from July 1 of year t
market_equity <- crsp_monthly |>
  filter(month(date) == 12) |>
  mutate(sorting_date = ymd(str_c(year(date) + 1, "0701"))) |>
  select(permno, gvkey, sorting_date, me = mktcap)

# Book-to-market: book equity of the fiscal year ending in t-1
# divided by December t-1 market equity
book_to_market <- compustat |>
  mutate(sorting_date = ymd(str_c(year(datadate) + 1, "0701"))) |>
  select(gvkey, sorting_date, be) |>
  inner_join(market_equity, join_by(gvkey, sorting_date)) |>
  mutate(bm = be / me) |>
  select(permno, sorting_date, me, bm)

# Combine the sorting variables; keep one complete row per stock and sorting date
sorting_variables <- size |>
  inner_join(
    book_to_market, join_by(permno, sorting_date)
    ) |>
  drop_na() |>
  distinct(permno, sorting_date, .keep_all = TRUE)
[1] "size"
# A data frame: 281,110 × 4
   permno exchange sorting_date  size
    <dbl> <chr>    <date>       <dbl>
 1  10000 NASDAQ   1986-07-01   11.7 
 2  10001 NASDAQ   1986-07-01    6.03
 3  10001 NASDAQ   1987-07-01    5.82
 4  10001 NASDAQ   1988-07-01    6.2 
 5  10001 NASDAQ   1989-07-01    7.01
 6  10001 NASDAQ   1990-07-01   10.1 
 7  10001 NASDAQ   1991-07-01   11.3 
 8  10001 NASDAQ   1992-07-01   12.6 
 9  10001 NASDAQ   1993-07-01   18.0 
10  10001 NASDAQ   1994-07-01   18.9 
# ℹ 281,100 more rows
[1] "market_equity"
# A data frame: 281,726 × 4
   permno gvkey  sorting_date    me
    <dbl> <chr>  <date>       <dbl>
 1  10000 013007 1987-07-01    1.98
 2  10001 012994 1987-07-01    6.94
 3  10001 012994 1988-07-01    5.83
 4  10001 012994 1989-07-01    6.36
 5  10001 012994 1990-07-01   10.3 
 6  10001 012994 1991-07-01   10.0 
 7  10001 012994 1992-07-01   15.6 
 8  10001 012994 1993-07-01   15.1 
 9  10001 012994 1994-07-01   20.0 
10  10001 012994 1995-07-01   17.8 
# ℹ 281,716 more rows
[1] "book_to_market"
# A data frame: 256,864 × 4
   permno sorting_date    me    bm
    <dbl> <date>       <dbl> <dbl>
 1  25881 1971-07-01   26.6  0.397
 2  25881 1972-07-01   15.3  0.549
 3  25881 1973-07-01   13.6  0.537
 4  25881 1974-07-01    4.65 1.89 
 5  25881 1975-07-01    4.71 1.76 
 6  25881 1976-07-01    9.24 1.19 
 7  25881 1977-07-01   12.7  1.22 
 8  25881 1978-07-01   20.6  0.858
 9  10015 1984-07-01   26.8  0.292
10  10015 1985-07-01   13.6  0.685
# ℹ 256,854 more rows
[1] "sorting_variables"
# A data frame: 233,216 × 6
   permno exchange sorting_date  size    me    bm
    <dbl> <chr>    <date>       <dbl> <dbl> <dbl>
 1  10001 NASDAQ   1987-07-01    5.82  6.94 1.01 
 2  10001 NASDAQ   1988-07-01    6.2   5.83 1.21 
 3  10001 NASDAQ   1989-07-01    7.01  6.36 1.15 
 4  10001 NASDAQ   1990-07-01   10.1  10.3  0.818
 5  10001 NASDAQ   1991-07-01   11.3  10.0  0.943
 6  10001 NASDAQ   1992-07-01   12.6  15.6  0.668
 7  10001 NASDAQ   1993-07-01   18.0  15.1  0.710
 8  10001 NASDAQ   1994-07-01   18.9  20.0  0.573
 9  10001 NASDAQ   1995-07-01   18.6  17.8  0.705
10  10001 NASDAQ   1996-07-01   18.6  21.4  0.642
# ℹ 233,206 more rows
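
The timing convention behind these date adjustments can be checked on a few hand-picked dates. The sketch below uses base R only, so it runs without lubridate or the data files. The helper names `july_first()` and `to_sorting_date()` are hypothetical and are not part of the deck's code; the second helper mirrors the `case_when()` mapping that the deck later applies to monthly return dates.

```r
# Stamp a date in calendar year y with July 1 of year y + offset.
july_first <- function(date, offset = 0) {
  as.Date(paste0(as.integer(format(date, "%Y")) + offset, "-07-01"))
}

# The three inputs of the July 1986 sort and the year offset each one needs.
timeline <- data.frame(
  input    = c("book equity (fiscal year end)", "market equity (December)", "size (June)"),
  observed = as.Date(c("1985-12-31", "1985-12-01", "1986-06-01")),
  offset   = c(1, 1, 0)
)
timeline$sorting_date <- july_first(timeline$observed, timeline$offset)
timeline

# Map a monthly return date to the July 1 on which its portfolio was formed:
# July to December use the same year's sort, January to June the previous one.
to_sorting_date <- function(date) {
  month <- as.integer(format(date, "%m"))
  july_first(date, ifelse(month <= 6, -1, 0))
}

return_dates <- as.Date(c("1986-06-01", "1986-07-01", "1987-06-01", "1987-07-01"))
data.frame(date = return_dates, sorting_date = to_sorting_date(return_dates))
```

All three inputs land on the same sorting date, so accounting data are at least six months old when the portfolios are formed, and each sort is then held for the twelve months from July to the following June.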

Sorting for FF3

assign_portfolio <- function(data, 
                             sorting_variable, 
                             percentiles) {
  breakpoints <- data |>
    filter(exchange == "NYSE") |>
    pull({{ sorting_variable }}) |>
    quantile(
      probs = percentiles,
      na.rm = TRUE,
      names = FALSE
    )

  assigned_portfolios <- data |>
    mutate(portfolio = findInterval(
      pick(everything()) |>
        pull({{ sorting_variable }}),
      breakpoints,
      all.inside = TRUE
    )) |>
    pull(portfolio)
  
  return(assigned_portfolios)
}
portfolios <- sorting_variables |>
  group_by(sorting_date) |>
  mutate(
    portfolio_size = assign_portfolio(
      data = pick(everything()),
      sorting_variable = size,
      percentiles = c(0, 0.5, 1)
    ),
    portfolio_bm = assign_portfolio(
      data = pick(everything()),
      sorting_variable = bm,
      percentiles = c(0, 0.3, 0.7, 1)
    )
  ) |>
  ungroup() |> 
  select(permno, sorting_date, 
         portfolio_size, portfolio_bm)
portfolios <- crsp_monthly |>
  mutate(sorting_date = case_when(
    month(date) <= 6 ~ ymd(str_c(year(date) - 1, "0701")),
    month(date) >= 7 ~ ymd(str_c(year(date), "0701"))
  )) |>
  inner_join(portfolios, join_by(permno, sorting_date))
# A data frame: 2,666,586 × 6
    permno date       sorting_date mktcap portfolio_size portfolio_bm
     <dbl> <date>     <date>        <dbl>          <int>        <int>
  1  10001 1987-07-01 1987-07-01     5.95              1            3
  2  10001 1987-08-01 1987-07-01     6.44              1            3
  3  10001 1987-09-01 1987-07-01     6.2               1            3
  4  10001 1987-10-01 1987-07-01     6.32              1            3
  5  10001 1987-11-01 1987-07-01     6.14              1            3
  6  10001 1987-12-01 1987-07-01     5.83              1            3
  7  10001 1988-01-01 1987-07-01     6.2               1            3
  8  10001 1988-02-01 1987-07-01     6.70              1            3
  9  10001 1988-03-01 1987-07-01     6.08              1            3
 10  10001 1988-04-01 1987-07-01     6.26              1            3
 11  10001 1988-05-01 1987-07-01     6.39              1            3
 12  10001 1988-06-01 1987-07-01     6.2               1            3
 13  10001 1988-07-01 1988-07-01     6.39              1            3
 14  10001 1988-08-01 1988-07-01     6.57              1            3
 15  10001 1988-09-01 1988-07-01     6.36              1            3
 16  10001 1988-10-01 1988-07-01     6.61              1            3
 17  10001 1988-11-01 1988-07-01     6.61              1            3
 18  10001 1988-12-01 1988-07-01     6.36              1            3
 19  10001 1989-01-01 1988-07-01     6.49              1            3
 20  10001 1989-02-01 1988-07-01     6.74              1            3
 21  10001 1989-03-01 1988-07-01     6.74              1            3
 22  10001 1989-04-01 1988-07-01     7.24              1            3
 23  10001 1989-05-01 1988-07-01     6.99              1            3
 24  10001 1989-06-01 1988-07-01     7.01              1            3
 25  10001 1989-07-01 1989-07-01     7.26              1            3
 26  10001 1989-08-01 1989-07-01     9.26              1            3
 27  10001 1989-09-01 1989-07-01     9.04              1            3
 28  10001 1989-10-01 1989-07-01     9.68              1            3
 29  10001 1989-11-01 1989-07-01    10.1               1            3
 30  10001 1989-12-01 1989-07-01    10.3               1            3
 31  10001 1990-01-01 1989-07-01    10.2               1            3
 32  10001 1990-02-01 1989-07-01    10.1               1            3
 33  10001 1990-03-01 1989-07-01    10.1               1            3
 34  10001 1990-04-01 1989-07-01    10.1               1            3
 35  10001 1990-05-01 1989-07-01    10.0               1            3
 36  10001 1990-06-01 1989-07-01    10.1               1            3
 37  10001 1990-07-01 1990-07-01    10.3               1            2
 38  10001 1990-08-01 1990-07-01     9.79              1            2
 39  10001 1990-09-01 1990-07-01    10.2               1            2
 40  10001 1990-10-01 1990-07-01    10.0               1            2
 41  10001 1990-11-01 1990-07-01    10.0               1            2
 42  10001 1990-12-01 1990-07-01    10.0               1            2
 43  10001 1991-01-01 1990-07-01    10.1               1            2
 44  10001 1991-02-01 1990-07-01    10.3               1            2
 45  10001 1991-03-01 1990-07-01    10.0               1            2
 46  10001 1991-04-01 1990-07-01    10.4               1            2
 47  10001 1991-05-01 1990-07-01    10.4               1            2
 48  10001 1991-06-01 1990-07-01    11.3               1            2
 49  10001 1991-07-01 1991-07-01    10.9               1            2
 50  10001 1991-08-01 1991-07-01    12.3               1            2
 51  10001 1991-09-01 1991-07-01    12.3               1            2
 52  10001 1991-10-01 1991-07-01    13.9               1            2
 53  10001 1991-11-01 1991-07-01    15.8               1            2
 54  10001 1991-12-01 1991-07-01    15.6               1            2
 55  10001 1992-01-01 1991-07-01    14.8               1            2
 56  10001 1992-02-01 1991-07-01    11.8               1            2
 57  10001 1992-03-01 1991-07-01    12.6               1            2
 58  10001 1992-04-01 1991-07-01    12.8               1            2
 59  10001 1992-05-01 1991-07-01    12.9               1            2
 60  10001 1992-06-01 1991-07-01    12.6               1            2
 61  10001 1992-07-01 1992-07-01    13.4               1            2
 62  10001 1992-08-01 1992-07-01    14.0               1            2
 63  10001 1992-09-01 1992-07-01    16.1               1            2
 64  10001 1992-10-01 1992-07-01    15.7               1            2
 65  10001 1992-11-01 1992-07-01    15.5               1            2
 66  10001 1992-12-01 1992-07-01    15.1               1            2
 67  10001 1993-01-01 1992-07-01    15.1               1            2
 68  10001 1993-02-01 1992-07-01    15.4               1            2
 69  10001 1993-03-01 1992-07-01    15.3               1            2
 70  10001 1993-04-01 1992-07-01    16.4               1            2
 71  10001 1993-05-01 1992-07-01    16.3               1            2
 72  10001 1993-06-01 1992-07-01    18.0               1            2
 73  10001 1993-07-01 1993-07-01    17.8               1            2
 74  10001 1993-08-01 1993-07-01    17.3               1            2
 75  10001 1993-09-01 1993-07-01    18.3               1            2
 76  10001 1993-10-01 1993-07-01    18.8               1            2
 77  10001 1993-11-01 1993-07-01    18.5               1            2
 78  10001 1993-12-01 1993-07-01    20.0               1            2
 79  10001 1994-01-01 1993-07-01    19.1               1            2
 80  10001 1994-02-01 1993-07-01    19.1               1            2
 81  10001 1994-03-01 1993-07-01    18.8               1            2
 82  10001 1994-04-01 1993-07-01    16.1               1            2
 83  10001 1994-05-01 1993-07-01    17.2               1            2
 84  10001 1994-06-01 1993-07-01    18.9               1            2
 85  10001 1994-07-01 1994-07-01    20.3               1            2
 86  10001 1994-08-01 1994-07-01    19.7               1            2
 87  10001 1994-09-01 1994-07-01    20.5               1            2
 88  10001 1994-10-01 1994-07-01    19.3               1            2
 89  10001 1994-11-01 1994-07-01    18.5               1            2
 90  10001 1994-12-01 1994-07-01    17.8               1            2
 91  10001 1995-01-01 1994-07-01    17.2               1            2
 92  10001 1995-02-01 1994-07-01    16.8               1            2
 93  10001 1995-03-01 1994-07-01    16.8               1            2
 94  10001 1995-04-01 1994-07-01    16.8               1            2
 95  10001 1995-05-01 1994-07-01    17.7               1            2
 96  10001 1995-06-01 1994-07-01    18.6               1            2
 97  10001 1995-07-01 1995-07-01    18.6               1            2
 98  10001 1995-08-01 1995-07-01    18.0               1            2
 99  10001 1995-09-01 1995-07-01    18.8               1            2
100  10001 1995-10-01 1995-07-01    18.2               1            2
101  10001 1995-11-01 1995-07-01    20.0               1            2
102  10001 1995-12-01 1995-07-01    21.4               1            2
103  10001 1996-01-01 1995-07-01    20.8               1            2
104  10001 1996-02-01 1995-07-01    21.1               1            2
105  10001 1996-03-01 1995-07-01    21.9               1            2
106  10001 1996-04-01 1995-07-01    20.3               1            2
107  10001 1996-05-01 1995-07-01    19.9               1            2
108  10001 1996-06-01 1995-07-01    18.6               1            2
109  10001 1996-07-01 1996-07-01    19.0               1            2
110  10001 1996-08-01 1996-07-01    19.7               1            2
111  10001 1996-09-01 1996-07-01    20.5               1            2
112  10001 1996-10-01 1996-07-01    19.9               1            2
113  10001 1996-11-01 1996-07-01    20.5               1            2
114  10001 1996-12-01 1996-07-01    19.2               1            2
115  10001 1997-01-01 1996-07-01    20.3               1            2
116  10001 1997-02-01 1996-07-01    20.3               1            2
117  10001 1997-03-01 1996-07-01    20.3               1            2
118  10001 1997-04-01 1996-07-01    20.3               1            2
119  10001 1997-05-01 1996-07-01    20.3               1            2
120  10001 1997-06-01 1996-07-01    19.4               1            2
121  10001 1997-07-01 1997-07-01    20.3               1            3
122  10001 1997-08-01 1997-07-01    20.6               1            3
123  10001 1997-09-01 1997-07-01    21.1               1            3
124  10001 1997-10-01 1997-07-01    21.1               1            3
125  10001 1997-11-01 1997-07-01    20.8               1            3
126  10001 1997-12-01 1997-07-01    21.6               1            3
127  10001 1998-01-01 1997-07-01    21.6               1            3
128  10001 1998-02-01 1997-07-01    21.4               1            3
129  10001 1998-03-01 1997-07-01    21.0               1            3
130  10001 1998-04-01 1997-07-01    21.2               1            3
131  10001 1998-05-01 1997-07-01    20.9               1            3
132  10001 1998-06-01 1997-07-01    20.7               1            3
133  10001 1998-07-01 1998-07-01    21.0               1            3
134  10001 1998-08-01 1998-07-01    21.0               1            3
135  10001 1998-09-01 1998-07-01    22.3               1            3
136  10001 1998-10-01 1998-07-01    22.3               1            3
137  10001 1998-11-01 1998-07-01    22.7               1            3
138  10001 1998-12-01 1998-07-01    23.3               1            3
139  10001 1999-01-01 1998-07-01    23.3               1            3
140  10001 1999-02-01 1998-07-01    21.2               1            3
141  10001 1999-03-01 1998-07-01    21.2               1            3
142  10001 1999-04-01 1998-07-01    21.5               1            3
143  10001 1999-05-01 1998-07-01    21.0               1            3
144  10001 1999-06-01 1998-07-01    21.1               1            3
145  10001 1999-07-01 1999-07-01    21.6               1            2
146  10001 1999-08-01 1999-07-01    21.4               1            2
147  10001 1999-09-01 1999-07-01    19.6               1            2
148  10001 1999-10-01 1999-07-01    21.3               1            2
149  10001 1999-11-01 1999-07-01    21.2               1            2
150  10001 1999-12-01 1999-07-01    20.8               1            2
151  10001 2000-01-01 1999-07-01    19.9               1            2
152  10001 2000-02-01 1999-07-01    20.2               1            2
153  10001 2000-03-01 1999-07-01    19.7               1            2
154  10001 2000-04-01 1999-07-01    19.9               1            2
155  10001 2000-05-01 1999-07-01    19.5               1            2
156  10001 2000-06-01 1999-07-01    19.8               1            2
157  10001 2000-07-01 2000-07-01    19.5               1            2
158  10001 2000-08-01 2000-07-01    20.4               1            2
159  10001 2000-09-01 2000-07-01    21.8               1            2
160  10001 2000-10-01 2000-07-01    22.5               1            2
161  10001 2000-11-01 2000-07-01    23.9               1            2
162  10001 2000-12-01 2000-07-01    24.4               1            2
163  10001 2001-01-01 2000-07-01    24.7               1            2
164  10001 2001-02-01 2000-07-01    24.4               1            2
165  10001 2001-03-01 2000-07-01    25.1               1            2
166  10001 2001-04-01 2000-07-01    24.5               1            2
167  10001 2001-05-01 2000-07-01    26.8               1            2
168  10001 2001-06-01 2000-07-01    29.7               1            2
169  10001 2001-07-01 2001-07-01    30.4               1            2
170  10001 2001-08-01 2001-07-01    31.0               1            2
171  10001 2001-09-01 2001-07-01    30.1               1            2
172  10001 2001-10-01 2001-07-01    29.1               1            2
173  10001 2001-11-01 2001-07-01    29.3               1            2
174  10001 2001-12-01 2001-07-01    29.4               1            2
175  10001 2002-01-01 2001-07-01    29.0               1            2
176  10001 2002-02-01 2001-07-01    27.5               1            2
177  10001 2002-03-01 2001-07-01    26.7               1            2
178  10001 2002-04-01 2001-07-01    25.6               1            2
179  10001 2002-05-01 2001-07-01    26.0               1            2
180  10001 2002-06-01 2001-07-01    25.0               1            2
181  10001 2002-07-01 2002-07-01    22.5               1            2
182  10001 2002-08-01 2002-07-01    22.9               1            2
183  10001 2002-09-01 2002-07-01    22.5               1            2
184  10001 2002-10-01 2002-07-01    22.2               1            2
185  10001 2002-11-01 2002-07-01    21.7               1            2
186  10001 2002-12-01 2002-07-01    19.0               1            2
187  10001 2003-01-01 2002-07-01    21.9               1            2
188  10001 2003-02-01 2002-07-01    22.6               1            2
189  10001 2003-03-01 2002-07-01    19.8               1            2
190  10001 2003-04-01 2002-07-01    13.4               1            2
191  10001 2003-05-01 2002-07-01    21.9               1            2
192  10001 2003-06-01 2002-07-01    15.6               1            2
193  10001 2003-07-01 2003-07-01    15.9               1            3
194  10001 2003-08-01 2003-07-01    17.3               1            3
195  10001 2003-09-01 2003-07-01    17.9               1            3
196  10001 2003-10-01 2003-07-01    15.6               1            3
197  10001 2003-11-01 2003-07-01    15.5               1            3
198  10001 2003-12-01 2003-07-01    15.4               1            3
199  10001 2004-01-01 2003-07-01    15.6               1            3
200  10001 2004-02-01 2003-07-01    16.8               1            3
# ℹ 2,666,386 more rows
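
What `assign_portfolio()` does inside each sorting date can be seen on a handful of made-up firms. This is a base R sketch of the same two steps (NYSE-only breakpoints, then `findInterval()` with `all.inside = TRUE`); the firm data are invented for illustration.

```r
# Made-up firms: market cap and listing exchange (illustrative only).
firms <- data.frame(
  permno   = 1:8,
  exchange = c("NYSE", "NYSE", "NYSE", "NYSE", "NASDAQ", "NASDAQ", "NASDAQ", "AMEX"),
  size     = c(100, 200, 300, 400, 5, 10, 150, 1000)
)

# Breakpoints come from NYSE firms only, so the many small NASDAQ stocks
# do not pull the median down.
breakpoints <- quantile(
  firms$size[firms$exchange == "NYSE"],
  probs = c(0, 0.5, 1), names = FALSE
)

# all.inside = TRUE pushes values outside the NYSE range into the
# first or last bucket instead of returning 0 or length(breakpoints).
firms$portfolio_size <- findInterval(firms$size, breakpoints, all.inside = TRUE)
firms
```

Here the NYSE median is 250, so the two tiny NASDAQ firms fall into portfolio 1 although they lie below the smallest NYSE firm, and the AMEX firm with size 1000 falls into portfolio 2 although it lies above the largest.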

Fama-French Three-Factor Model

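# Value-weighted excess return of each size/book-to-market portfolio per month
factors_replicated <- portfolios |>
  group_by(portfolio_size, portfolio_bm, date) |>
  summarize(
    ret = weighted.mean(ret_excess, mktcap_lag), .groups = "drop"
  ) |>
  group_by(date) |>
  summarize(
    # SMB: average of the small portfolios minus average of the big portfolios
    smb_replicated = mean(ret[portfolio_size == 1]) -
      mean(ret[portfolio_size == 2]),
    # HML: average of the high-bm portfolios minus average of the low-bm portfolios
    hml_replicated = mean(ret[portfolio_bm == 3]) -
      mean(ret[portfolio_bm == 1])
  )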
# A tibble: 750 × 3
   date       smb_replicated hml_replicated
   <date>              <dbl>          <dbl>
 1 1961-07-01      -0.0179         -0.00284
 2 1961-08-01       0.000446       -0.0220 
 3 1961-09-01      -0.00352        -0.0215 
 4 1961-10-01      -0.0124          0.00571
 5 1961-11-01      -0.000233        0.00495
 6 1961-12-01      -0.0211          0.0427 
 7 1962-01-01       0.00893         0.0423 
 8 1962-02-01      -0.00768         0.0165 
 9 1962-03-01       0.0120         -0.0200 
10 1962-04-01      -0.00880         0.00426
# ℹ 740 more rows
test <- factors_ff3_monthly |>
  inner_join(factors_replicated, join_by(date)) |>
  mutate(
    across(c(smb_replicated, hml_replicated), ~round(., 4))
  )

Small-minus-Big

model_smb <- lm(smb ~ smb_replicated, data = test)
summary(model_smb)

Call:
lm(formula = smb ~ smb_replicated, data = test)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0205101 -0.0014497  0.0000327  0.0014360  0.0147423 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.0000994  0.0001283  -0.775    0.439    
smb_replicated  0.9880963  0.0042202 234.137   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003508 on 748 degrees of freedom
Multiple R-squared:  0.9865,    Adjusted R-squared:  0.9865 
F-statistic: 5.482e+04 on 1 and 748 DF,  p-value: < 2.2e-16

High-minus-Low

model_hml <- lm(hml ~ hml_replicated, data = test)
summary(model_hml)

Call:
lm(formula = hml ~ hml_replicated, data = test)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.023527 -0.002818 -0.000200  0.002187  0.033922 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.0003464  0.0002155   1.607    0.108    
hml_replicated 0.9620722  0.0071073 135.364   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.005876 on 748 degrees of freedom
Multiple R-squared:  0.9608,    Adjusted R-squared:  0.9607 
F-statistic: 1.832e+04 on 1 and 748 DF,  p-value: < 2.2e-16
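
How to read these regressions: if the replication were perfect, the intercept would be 0, the slope 1 and R-squared 1. The sketch below reproduces that check on simulated series, which are stand-ins and not the actual factor data; the noise level is an arbitrary assumption.

```r
set.seed(1)

# Simulated stand-ins for an original factor and its noisy replication.
original   <- rnorm(120, mean = 0.002, sd = 0.03)
replicated <- original + rnorm(120, sd = 0.003)

fit <- lm(original ~ replicated)

# Collect the three numbers that summarise replication quality.
replication_check <- c(
  intercept = unname(coef(fit)[1]),
  slope     = unname(coef(fit)[2]),
  r_squared = summary(fit)$r.squared
)
round(replication_check, 4)
```

An intercept that is statistically indistinguishable from zero together with a slope and R-squared close to one indicates that the replicated series tracks the original without a systematic level shift.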

Fama-French Five-Factor Model

other_sorting_variables <- compustat |>
  mutate(sorting_date = ymd(str_c(year(datadate) + 1, "0701"))) |>
  select(gvkey, sorting_date, be, op, inv) |>
  inner_join(market_equity, 
             join_by(gvkey, sorting_date)) |>
  mutate(bm = be / me) |>
  select(permno, sorting_date, me, be, bm, op, inv)

sorting_variables <- size |>
  inner_join(
    other_sorting_variables, 
    join_by(permno, sorting_date)
    ) |>
  drop_na() |>
  distinct(permno, sorting_date, .keep_all = TRUE)
portfolios <- sorting_variables |>
  group_by(sorting_date) |>
  mutate(
    portfolio_size = assign_portfolio(
      data = pick(everything()),
      sorting_variable = size,
      percentiles = c(0, 0.5, 1)
    )) |> 
  group_by(sorting_date, portfolio_size) |> 
  mutate(
    across(c(bm, op, inv), ~assign_portfolio(
      data = pick(everything()), 
      sorting_variable = ., 
      percentiles = c(0, 0.3, 0.7, 1)),
      .names = "portfolio_{.col}"
    )
  ) |>
  ungroup() |> 
  select(permno, sorting_date, 
         portfolio_size, portfolio_bm,
         portfolio_op, portfolio_inv)

portfolios <- crsp_monthly |>
  mutate(sorting_date = case_when(
    month(date) <= 6 ~ ymd(str_c(year(date) - 1, "0701")),
    month(date) >= 7 ~ ymd(str_c(year(date), "0701"))
  )) |>
  inner_join(portfolios, join_by(permno, sorting_date))
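
The FF3 code in this deck sorts on size and book-to-market independently, whereas the FF5 code above sorts `bm`, `op` and `inv` within each size group (a dependent sort). The difference can be sketched on made-up data with base R. The helper `bucket()` is hypothetical and, unlike `assign_portfolio()`, takes its breakpoints from all firms rather than from NYSE firms only.

```r
# Assign buckets using the 30th and 70th percentile of x itself.
bucket <- function(x, probs = c(0, 0.3, 0.7, 1)) {
  findInterval(x, quantile(x, probs, names = FALSE), all.inside = TRUE)
}

# Made-up firms in which big firms deliberately have higher book-to-market.
toy <- data.frame(
  portfolio_size = rep(c(1, 2), each = 5),
  bm = c(0.2, 0.4, 0.6, 0.8, 1.0,   # small firms
         1.1, 1.3, 1.5, 1.7, 1.9)   # big firms
)

# Independent sort: one set of bm breakpoints for all firms.
toy$bm_independent <- bucket(toy$bm)

# Dependent sort: bm breakpoints are computed separately within each size group.
toy$bm_dependent <- ave(toy$bm, toy$portfolio_size, FUN = bucket)

table(size = toy$portfolio_size, bm = toy$bm_independent)
table(size = toy$portfolio_size, bm = toy$bm_dependent)
```

In this toy data the independent sort leaves the small/high-bm and big/low-bm cells empty, while the dependent sort populates every cell of the 2x3 grid. That is the practical reason to sort within size groups when the characteristics are correlated with size.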

HML - Value

portfolios_value <- portfolios |>
  group_by(portfolio_size, portfolio_bm, date) |>
  summarize(
    ret = weighted.mean(ret_excess, mktcap_lag), 
    .groups = "drop"
  )

factors_value <- portfolios_value |>
  group_by(date) |>
  summarize(
    hml_replicated = mean(ret[portfolio_bm == 3]) -
      mean(ret[portfolio_bm == 1])
  )

RMW - Profitability

portfolios_profitability <- portfolios |>
  group_by(portfolio_size, portfolio_op, date) |>
  summarize(
    ret = weighted.mean(ret_excess, mktcap_lag), 
    .groups = "drop"
  ) 

factors_profitability <- portfolios_profitability |>
  group_by(date) |>
  summarize(
    rmw_replicated = mean(ret[portfolio_op == 3]) -
      mean(ret[portfolio_op == 1])
  )

CMA - Investment

portfolios_investment <- portfolios |>
  group_by(portfolio_size, portfolio_inv, date) |>
  summarize(
    ret = weighted.mean(ret_excess, mktcap_lag), 
    .groups = "drop"
  )

factors_investment <- portfolios_investment |>
  group_by(date) |>
  summarize(
    cma_replicated = mean(ret[portfolio_inv == 1]) -
      mean(ret[portfolio_inv == 3])
  )

SMB - Size (computed last because of the dependent sort)

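# Stack the portfolios from the value, profitability and investment sorts
factors_size <- bind_rows(
  portfolios_value,
  portfolios_profitability,
  portfolios_investment
) |> 
  group_by(date) |>
  summarize(
    # SMB: average of all small portfolios minus average of all big portfolios
    smb_replicated = mean(ret[portfolio_size == 1]) -
      mean(ret[portfolio_size == 2])
  )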
factors_replicated <- factors_size |>
  full_join(factors_value, join_by(date)) |>
  full_join(factors_profitability, join_by(date)) |>
  full_join(factors_investment, join_by(date))
# A tibble: 738 × 5
   date       smb_replicated hml_replicated rmw_replicated cma_replicated
   <date>              <dbl>          <dbl>          <dbl>          <dbl>
 1 1962-07-01       -0.00370       -0.0243        0.0190         -0.0291 
 2 1962-08-01        0.00396       -0.00638       0.00798         0.00649
 3 1962-09-01       -0.0129         0.00444       0.00161         0.00203
 4 1962-10-01       -0.0263        -0.00250       0.0129          0.00765
 5 1962-11-01        0.0213         0.00944      -0.0100         -0.00209
 6 1962-12-01       -0.0217        -0.00821       0.00383        -0.00767
 7 1963-01-01        0.0292         0.0135       -0.000551        0.0176 
 8 1963-02-01        0.00434        0.0212       -0.0165          0.0124 
 9 1963-03-01       -0.00960        0.00966       0.00637        -0.00681
10 1963-04-01       -0.00891        0.00400       0.0105          0.00845
# ℹ 728 more rows
test <- factors_ff5_monthly |>
  inner_join(factors_replicated, join_by(date)) |>
  mutate(
    across(c(smb_replicated, hml_replicated, 
             rmw_replicated, cma_replicated), ~round(., 4))
  )

Small-minus-Big

model_smb <- lm(smb ~ smb_replicated, data = test)
summary(model_smb)

Call:
lm(formula = smb ~ smb_replicated, data = test)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0190136 -0.0018549  0.0002052  0.0019887  0.0137217 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)    -0.0001828  0.0001328  -1.377    0.169    
smb_replicated  0.9639056  0.0042402 227.324   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.003567 on 724 degrees of freedom
Multiple R-squared:  0.9862,    Adjusted R-squared:  0.9862 
F-statistic: 5.168e+04 on 1 and 724 DF,  p-value: < 2.2e-16

High-minus-Low

model_hml <- lm(hml ~ hml_replicated, data = test)
summary(model_hml)

Call:
lm(formula = hml ~ hml_replicated, data = test)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.044359 -0.004058 -0.000349  0.004025  0.036610 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.0005772  0.0002944    1.96   0.0503 .  
hml_replicated 0.9890888  0.0100505   98.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.007907 on 724 degrees of freedom
Multiple R-squared:  0.9304,    Adjusted R-squared:  0.9303 
F-statistic:  9685 on 1 and 724 DF,  p-value: < 2.2e-16

Robust-minus-Weak

model_rmw <- lm(rmw ~ rmw_replicated, data = test)
summary(model_rmw)

Call:
lm(formula = rmw ~ rmw_replicated, data = test)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0199638 -0.0031132  0.0000533  0.0032781  0.0187142 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    2.265e-05  2.014e-04   0.112     0.91    
rmw_replicated 9.502e-01  8.791e-03 108.083   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.005381 on 724 degrees of freedom
Multiple R-squared:  0.9416,    Adjusted R-squared:  0.9416 
F-statistic: 1.168e+04 on 1 and 724 DF,  p-value: < 2.2e-16

Conservative-minus-Aggressive

model_cma <- lm(cma ~ cma_replicated, data = test)
summary(model_cma)

Call:
lm(formula = cma ~ cma_replicated, data = test)

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0151966 -0.0026756 -0.0001165  0.0024566  0.0214458 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)    0.0006444  0.0001676   3.845 0.000131 ***
cma_replicated 0.9627165  0.0079268 121.451  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.004492 on 724 degrees of freedom
Multiple R-squared:  0.9532,    Adjusted R-squared:  0.9531 
F-statistic: 1.475e+04 on 1 and 724 DF,  p-value: < 2.2e-16

Excluding Young Firms

  • Exclude a firm's first two years in Compustat
  • After an IPO or a similar event, accounting figures are often unstable (disproportionately volatile)
  • Profitability (RMW) and investment (CMA) are often missing from the data
    • This leads to distorted factor calculations
  • Result
    • Exclusion leads to a more robust factor construction
    • Quality improves (especially for the additional FF5 factors)
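
One possible way to implement this filter is sketched below in base R on made-up rows. It assumes that "first two years" means a firm's first two annual Compustat records; the column `year_in_compustat` is a hypothetical helper and not part of the deck's data.

```r
# Made-up Compustat-style rows: one row per firm (gvkey) and fiscal year end.
compustat_toy <- data.frame(
  gvkey    = c("A", "A", "A", "A", "B", "B", "B"),
  datadate = as.Date(c("1990-12-31", "1991-12-31", "1992-12-31", "1993-12-31",
                       "2000-12-31", "2001-12-31", "2002-12-31"))
)

# Rank each firm's fiscal years by date; rank 1 is its first Compustat record.
compustat_toy$year_in_compustat <- ave(
  as.numeric(compustat_toy$datadate), compustat_toy$gvkey, FUN = rank
)

# Keep only observations from the third Compustat record onwards.
compustat_filtered <- compustat_toy[compustat_toy$year_in_compustat > 2, ]
compustat_filtered
```

Applied to the real `compustat` table before the sorting variables are built, a filter of this kind would drop firm A's 1990 and 1991 records and firm B's 2000 and 2001 records in the toy data.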

Download via the tidyfinance Package

library(tidyfinance)

factors_ff5_monthly <- download_data("factors_ff_5_2x3_monthly") |>
  select(date, smb, hml, rmw, cma)