Let's do some Purrr...
In this post I'm going to show some cool features of Purrr, an R package for functional programming. I have always been fascinated by functional programming; I first heard about it while I was learning Scala. This approach makes our code not only more succinct but also more expressive. There are other ways to achieve the same results, such as loops or base R functions like sapply() and lapply(), but let's not go in that direction. In my experience, Purrr handles this kind of task more cleanly and consistently than the alternatives. So, without further ado, let's dive straight in and do some purrr.
Not that kind of purrr…
For this exercise, I'm using the mtcars dataset. Let's load some libraries, such as tidyverse and tidytext, to begin with.
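The setup chunk itself isn't shown in the post. A minimal version might look like the sketch below. It assumes the packages are already installed and includes broom, whose glance() function is used near the end. tidytext is left out because none of the code in this post calls it.

```r
# Assumed setup: the post's original setup chunk is not shown
library(tidyverse)  # attaches dplyr, tidyr, purrr and ggplot2, among others
library(broom)      # provides glance(), used later in the post

# mtcars ships with base R (datasets package), so no loading step is needed
head(mtcars, 3)
```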
Suppose we want to run a regression model on three subsets of the data, grouped by a certain feature (cyl in our case). Done the usual way, we need to fit a regression on each subset separately.
Regression for dataset with cyl == 4
cyl4 <- mtcars %>%
  filter(cyl == 4)
cyl4
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 5 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 7 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## 8 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## 9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## 10 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## 11 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
summary(lm(mpg~wt, data=cyl4))
##
## Call:
## lm(formula = mpg ~ wt, data = cyl4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1513 -1.9795 -0.6272 1.9299 5.2523
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.571 4.347 9.104 7.77e-06 ***
## wt -5.647 1.850 -3.052 0.0137 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.332 on 9 degrees of freedom
## Multiple R-squared: 0.5086, Adjusted R-squared: 0.454
## F-statistic: 9.316 on 1 and 9 DF, p-value: 0.01374
We would have to repeat this for every value of cyl. You might also notice that we end up with a separate result for each group. Imagine doing this kind of analysis on a bigger dataset and having to keep track of each result separately: it would quickly become a nightmare. Some may argue that loops can do the job, but they come with their own overhead, such as creating a container for the results and keeping every result matched to its group by index. With Purrr, on the other hand, we can do this easily. I have also chosen to use a tibble. A tibble is a modern take on the traditional data frame that prints more compactly and behaves more predictably. Out of its many great features, the one I like most is that it can store 'list columns'. Let me show this.
nested <- mtcars %>%
  nest(-cyl)
nested
## cyl
## 1 6
## 2 4
## 3 8
## data
## 1 21.000, 21.000, 21.400, 18.100, 19.200, 17.800, 19.700, 160.000, 160.000, 258.000, 225.000, 167.600, 167.600, 145.000, 110.000, 110.000, 110.000, 105.000, 123.000, 123.000, 175.000, 3.900, 3.900, 3.080, 2.760, 3.920, 3.920, 3.620, 2.620, 2.875, 3.215, 3.460, 3.440, 3.440, 2.770, 16.460, 17.020, 19.440, 20.220, 18.300, 18.900, 15.500, 0.000, 0.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 0.000, 0.000, 0.000, 0.000, 1.000, 4.000, 4.000, 3.000, 3.000, 4.000, 4.000, 5.000, 4.000, 4.000, 1.000, 1.000, 4.000, 4.000, 6.000
## 2 22.800, 24.400, 22.800, 32.400, 30.400, 33.900, 21.500, 27.300, 26.000, 30.400, 21.400, 108.000, 146.700, 140.800, 78.700, 75.700, 71.100, 120.100, 79.000, 120.300, 95.100, 121.000, 93.000, 62.000, 95.000, 66.000, 52.000, 65.000, 97.000, 66.000, 91.000, 113.000, 109.000, 3.850, 3.690, 3.920, 4.080, 4.930, 4.220, 3.700, 4.080, 4.430, 3.770, 4.110, 2.320, 3.190, 3.150, 2.200, 1.615, 1.835, 2.465, 1.935, 2.140, 1.513, 2.780, 18.610, 20.000, 22.900, 19.470, 18.520, 19.900, 20.010, 18.900, 16.700, 16.900, 18.600, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 0.000, 0.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 4.000, 4.000, 4.000, 4.000, 4.000, 4.000, 3.000, 4.000, 5.000, 5.000, 4.000, 1.000, 2.000, 2.000, 1.000, 2.000, 1.000, 1.000, 1.000, 2.000, 2.000, 2.000
## 3 18.700, 14.300, 16.400, 17.300, 15.200, 10.400, 10.400, 14.700, 15.500, 15.200, 13.300, 19.200, 15.800, 15.000, 360.000, 360.000, 275.800, 275.800, 275.800, 472.000, 460.000, 440.000, 318.000, 304.000, 350.000, 400.000, 351.000, 301.000, 175.000, 245.000, 180.000, 180.000, 180.000, 205.000, 215.000, 230.000, 150.000, 150.000, 245.000, 175.000, 264.000, 335.000, 3.150, 3.210, 3.070, 3.070, 3.070, 2.930, 3.000, 3.230, 2.760, 3.150, 3.730, 3.080, 4.220, 3.540, 3.440, 3.570, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 3.520, 3.435, 3.840, 3.845, 3.170, 3.570, 17.020, 15.840, 17.400, 17.600, 18.000, 17.980, 17.820, 17.420, 16.870, 17.300, 15.410, 17.050, 14.500, 14.600, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000, 1.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 5.000, 5.000, 2.000, 4.000, 3.000, 3.000, 3.000, 4.000, 4.000, 4.000, 2.000, 2.000, 4.000, 2.000, 4.000, 8.000
Here, if you look at the column data, you'll see that it's a list column: each entry is a separate data frame. It's like an entire Excel spreadsheet stored in a tiny cell. This is made possible by the nest() function from the tidyr package. Let's see what's inside one of them.
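The nest(-cyl) form reflects the tidyr version used when this post was written. Recent tidyr releases (1.0 and later) expect the new list column to be named explicitly and warn about the unnamed form. Below is a sketch of the newer syntax, assuming tidyr >= 1.0. It also checks how many cars land in each nested data frame.

```r
library(dplyr)
library(tidyr)
library(purrr)

# tidyr >= 1.0 syntax: name the list column (`data`) and choose what goes in it;
# `-cyl` means "every column except cyl"
nested_new <- mtcars %>%
  nest(data = -cyl)

# One row per cylinder group; count the rows inside each nested data frame
rows_per_group <- setNames(map_int(nested_new$data, nrow), nested_new$cyl)
rows_per_group
```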
nested$data[[1]]
## mpg disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 160.0 110 3.90 2.875 17.02 0 1 4 4
## Hornet 4 Drive 21.4 258.0 110 3.08 3.215 19.44 1 0 3 1
## Valiant 18.1 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 280 19.2 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 167.6 123 3.92 3.440 18.90 1 0 4 4
## Ferrari Dino 19.7 145.0 175 3.62 2.770 15.50 0 1 5 6
Voilà! As mentioned, each entry of that list is a separate data frame.
Now we can run a regression on each entry of that list and store each model in yet another list column. For this we use the map() function, which iterates over each entry of the list and runs the regression on it.
model_nested <- nested %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .)))
model_nested
## cyl
## 1 6
## 2 4
## 3 8
## data
## 1 21.000, 21.000, 21.400, 18.100, 19.200, 17.800, 19.700, 160.000, 160.000, 258.000, 225.000, 167.600, 167.600, 145.000, 110.000, 110.000, 110.000, 105.000, 123.000, 123.000, 175.000, 3.900, 3.900, 3.080, 2.760, 3.920, 3.920, 3.620, 2.620, 2.875, 3.215, 3.460, 3.440, 3.440, 2.770, 16.460, 17.020, 19.440, 20.220, 18.300, 18.900, 15.500, 0.000, 0.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 0.000, 0.000, 0.000, 0.000, 1.000, 4.000, 4.000, 3.000, 3.000, 4.000, 4.000, 5.000, 4.000, 4.000, 1.000, 1.000, 4.000, 4.000, 6.000
## 2 22.800, 24.400, 22.800, 32.400, 30.400, 33.900, 21.500, 27.300, 26.000, 30.400, 21.400, 108.000, 146.700, 140.800, 78.700, 75.700, 71.100, 120.100, 79.000, 120.300, 95.100, 121.000, 93.000, 62.000, 95.000, 66.000, 52.000, 65.000, 97.000, 66.000, 91.000, 113.000, 109.000, 3.850, 3.690, 3.920, 4.080, 4.930, 4.220, 3.700, 4.080, 4.430, 3.770, 4.110, 2.320, 3.190, 3.150, 2.200, 1.615, 1.835, 2.465, 1.935, 2.140, 1.513, 2.780, 18.610, 20.000, 22.900, 19.470, 18.520, 19.900, 20.010, 18.900, 16.700, 16.900, 18.600, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 0.000, 0.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 4.000, 4.000, 4.000, 4.000, 4.000, 4.000, 3.000, 4.000, 5.000, 5.000, 4.000, 1.000, 2.000, 2.000, 1.000, 2.000, 1.000, 1.000, 1.000, 2.000, 2.000, 2.000
## 3 18.700, 14.300, 16.400, 17.300, 15.200, 10.400, 10.400, 14.700, 15.500, 15.200, 13.300, 19.200, 15.800, 15.000, 360.000, 360.000, 275.800, 275.800, 275.800, 472.000, 460.000, 440.000, 318.000, 304.000, 350.000, 400.000, 351.000, 301.000, 175.000, 245.000, 180.000, 180.000, 180.000, 205.000, 215.000, 230.000, 150.000, 150.000, 245.000, 175.000, 264.000, 335.000, 3.150, 3.210, 3.070, 3.070, 3.070, 2.930, 3.000, 3.230, 2.760, 3.150, 3.730, 3.080, 4.220, 3.540, 3.440, 3.570, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 3.520, 3.435, 3.840, 3.845, 3.170, 3.570, 17.020, 15.840, 17.400, 17.600, 18.000, 17.980, 17.820, 17.420, 16.870, 17.300, 15.410, 17.050, 14.500, 14.600, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000, 1.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 5.000, 5.000, 2.000, 4.000, 3.000, 3.000, 3.000, 4.000, 4.000, 4.000, 2.000, 2.000, 4.000, 2.000, 4.000, 8.000
## model
## 1 28.40884, -2.780106, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, -52.23469, -2.426656, 2.111436, -0.3526643, 0.679099, -0.720901, -1.10683, 2, 21.12497, 20.41604, 19.4708, 18.78968, 18.84528, 18.84528, 20.70795, 0, 1, -2.645751, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, -8.247185, 0.8728647, -0.2683341, -0.5490191, -0.526106, -0.526106, 0.2414814, 1.377964, 1.121188, 1, 2, 1e-07, 2, 5, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 21, 21, 21.4, 18.1, 19.2, 17.8, 19.7, 2.62, 2.875, 3.215, 3.46, 3.44, 3.44, 2.77
## 2 39.5712, -5.647025, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, -88.43328, 10.17096, 0.6947654, 6.230721, 1.728126, 6.169273, -3.535624, -0.00293297, -0.4259551, 1.291776, -2.288073, 2, 26.4701, 21.55719, 21.78307, 27.14774, 30.45125, 29.2089, 25.65128, 28.6442, 27.48656, 31.02725, 23.87247, 0, 1, -3.316625, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, -7.5809, -1.801119, 0.4754451, -0.05200489, -0.376803, -0.2546567, 0.0951259, -0.1991357, -0.08531752, -0.4334345, 0.2700172, 1.301511, 1.497654, 1, 2, 1e-07, 2, 9, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26, 30.4, 21.4, 2.32, 3.19, 3.15, 2.2, 1.615, 1.835, 2.465, 1.935, 2.14, 1.513, 2.78
## 3 23.86803, -2.192438, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, -56.49903, -6.003055, 0.8157971, 1.220314, -0.8068206, -3.464586, -3.211015, 0.9738578, -0.8857193, -1.30959, -2.619382, 3.287904, -1.095775, -1.312854, 2, 16.32604, 16.04103, 14.94481, 15.69024, 15.58061, 12.35773, 11.97625, 12.14945, 16.15065, 16.337, 15.44907, 15.43811, 16.918, 16.04103, 0, 1, -3.741657, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, -14.96369, 2.738073, -0.06892519, 0.05524975, 0.03698873, -0.4998852, -0.5634336, -0.5345812, 0.131946, 0.1629898, 0.0150755, 0.0132494, 0.2597732, 0.113685, 1.267261, 1.113685, 1, 2, 1e-07, 2, 12, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 18.7, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 15.5, 15.2, 13.3, 19.2, 15.8, 15, 3.44, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 3.52, 3.435, 3.84, 3.845, 3.17, 3.57
Here, mutate() is the usual dplyr verb for creating a column, in this case one called model. The meat of the operation is the map() function.
Purrr comes with different flavours of the map function: map(), map_int(), map_dbl(), map_chr(), and others. As you may have guessed, each typed variant returns a particular kind of vector: map_int() returns integers, map_dbl() returns doubles and map_chr() returns characters, while map() always returns a list.
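To make the difference between the flavours concrete, here is a small sketch on a toy named list (x is a made-up example, not from the post). The typed variants also check their output and fail loudly when a result doesn't fit the requested type.

```r
library(purrr)

x <- list(a = 1:3, b = 4:6)  # hypothetical toy input

map(x, sum)                              # always a list: list(a = 6L, b = 15L)
map_int(x, sum)                          # integer vector: c(a = 6L, b = 15L)
map_dbl(x, mean)                         # double vector: c(a = 2, b = 5)
map_chr(x, ~ paste(.x, collapse = "-"))  # character vector: c(a = "1-2-3", b = "4-5-6")

# 1.5 cannot be stored losslessly as an integer, so map_int() signals an error
res <- tryCatch(map_int(list(1.5), ~ .x), error = function(e) "error")
res
```

Names of the input list carry over to the output in every variant, which is what keeps results matched to their groups.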
So in the code above, we can be certain that the result will be a list. You may be wondering what the tilde (~) is. It defines an anonymous function, i.e. a function defined on the fly. So lm(mpg ~ wt) denotes a linear regression of mpg on wt, and the dot (.) stands for the current data frame in that context. What map() has done, then, is run three regression models and store the respective results in the model column. Let's look at the first entry.
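As an aside, the `~ ... .` formula is only one way to write the anonymous function. The sketch below uses the 6-cylinder cars as an arbitrary example input. It shows three spellings that should give the same fit: the formula with `.`, the formula with `.x` (the name purrr's documentation favours) and an ordinary function().

```r
library(purrr)

# Hypothetical example input: a named list holding one data frame
dfs <- list(six = subset(mtcars, cyl == 6))

fit_dot   <- map(dfs, ~ lm(mpg ~ wt, data = .))              # formula shorthand with `.`
fit_dotx  <- map(dfs, ~ lm(mpg ~ wt, data = .x))             # formula shorthand with `.x`
fit_named <- map(dfs, function(df) lm(mpg ~ wt, data = df))  # explicit function

# The stored calls differ, but the fitted coefficients are the same
coef(fit_named$six)
```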
model_nested$model[[1]]
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Coefficients:
## (Intercept) wt
## 28.41 -2.78
As we can see, it's one of the models. We can get more information about it by running the summary() function.
summary(model_nested$model[[1]])
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Mazda RX4 Mazda RX4 Wag Hornet 4 Drive Valiant Merc 280
## -0.1250 0.5840 1.9292 -0.6897 0.3547
## Merc 280C Ferrari Dino
## -1.0453 -1.0080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.409 4.184 6.789 0.00105 **
## wt -2.780 1.335 -2.083 0.09176 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.165 on 5 degrees of freedom
## Multiple R-squared: 0.4645, Adjusted R-squared: 0.3574
## F-statistic: 4.337 on 1 and 5 DF, p-value: 0.09176
But why run summary() separately when we know how to Purrr? :)
model_nested_summarised <- model_nested %>%
  mutate(model_summary = map(model, ~ summary(.)))
model_nested_summarised
## cyl
## 1 6
## 2 4
## 3 8
## data
## 1 21.000, 21.000, 21.400, 18.100, 19.200, 17.800, 19.700, 160.000, 160.000, 258.000, 225.000, 167.600, 167.600, 145.000, 110.000, 110.000, 110.000, 105.000, 123.000, 123.000, 175.000, 3.900, 3.900, 3.080, 2.760, 3.920, 3.920, 3.620, 2.620, 2.875, 3.215, 3.460, 3.440, 3.440, 2.770, 16.460, 17.020, 19.440, 20.220, 18.300, 18.900, 15.500, 0.000, 0.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 0.000, 0.000, 0.000, 0.000, 1.000, 4.000, 4.000, 3.000, 3.000, 4.000, 4.000, 5.000, 4.000, 4.000, 1.000, 1.000, 4.000, 4.000, 6.000
## 2 22.800, 24.400, 22.800, 32.400, 30.400, 33.900, 21.500, 27.300, 26.000, 30.400, 21.400, 108.000, 146.700, 140.800, 78.700, 75.700, 71.100, 120.100, 79.000, 120.300, 95.100, 121.000, 93.000, 62.000, 95.000, 66.000, 52.000, 65.000, 97.000, 66.000, 91.000, 113.000, 109.000, 3.850, 3.690, 3.920, 4.080, 4.930, 4.220, 3.700, 4.080, 4.430, 3.770, 4.110, 2.320, 3.190, 3.150, 2.200, 1.615, 1.835, 2.465, 1.935, 2.140, 1.513, 2.780, 18.610, 20.000, 22.900, 19.470, 18.520, 19.900, 20.010, 18.900, 16.700, 16.900, 18.600, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 0.000, 0.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 4.000, 4.000, 4.000, 4.000, 4.000, 4.000, 3.000, 4.000, 5.000, 5.000, 4.000, 1.000, 2.000, 2.000, 1.000, 2.000, 1.000, 1.000, 1.000, 2.000, 2.000, 2.000
## 3 18.700, 14.300, 16.400, 17.300, 15.200, 10.400, 10.400, 14.700, 15.500, 15.200, 13.300, 19.200, 15.800, 15.000, 360.000, 360.000, 275.800, 275.800, 275.800, 472.000, 460.000, 440.000, 318.000, 304.000, 350.000, 400.000, 351.000, 301.000, 175.000, 245.000, 180.000, 180.000, 180.000, 205.000, 215.000, 230.000, 150.000, 150.000, 245.000, 175.000, 264.000, 335.000, 3.150, 3.210, 3.070, 3.070, 3.070, 2.930, 3.000, 3.230, 2.760, 3.150, 3.730, 3.080, 4.220, 3.540, 3.440, 3.570, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 3.520, 3.435, 3.840, 3.845, 3.170, 3.570, 17.020, 15.840, 17.400, 17.600, 18.000, 17.980, 17.820, 17.420, 16.870, 17.300, 15.410, 17.050, 14.500, 14.600, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000, 1.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 5.000, 5.000, 2.000, 4.000, 3.000, 3.000, 3.000, 4.000, 4.000, 4.000, 2.000, 2.000, 4.000, 2.000, 4.000, 8.000
## model
## 1 28.40884, -2.780106, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, -52.23469, -2.426656, 2.111436, -0.3526643, 0.679099, -0.720901, -1.10683, 2, 21.12497, 20.41604, 19.4708, 18.78968, 18.84528, 18.84528, 20.70795, 0, 1, -2.645751, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, -8.247185, 0.8728647, -0.2683341, -0.5490191, -0.526106, -0.526106, 0.2414814, 1.377964, 1.121188, 1, 2, 1e-07, 2, 5, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 21, 21, 21.4, 18.1, 19.2, 17.8, 19.7, 2.62, 2.875, 3.215, 3.46, 3.44, 3.44, 2.77
## 2 39.5712, -5.647025, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, -88.43328, 10.17096, 0.6947654, 6.230721, 1.728126, 6.169273, -3.535624, -0.00293297, -0.4259551, 1.291776, -2.288073, 2, 26.4701, 21.55719, 21.78307, 27.14774, 30.45125, 29.2089, 25.65128, 28.6442, 27.48656, 31.02725, 23.87247, 0, 1, -3.316625, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, -7.5809, -1.801119, 0.4754451, -0.05200489, -0.376803, -0.2546567, 0.0951259, -0.1991357, -0.08531752, -0.4334345, 0.2700172, 1.301511, 1.497654, 1, 2, 1e-07, 2, 9, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26, 30.4, 21.4, 2.32, 3.19, 3.15, 2.2, 1.615, 1.835, 2.465, 1.935, 2.14, 1.513, 2.78
## 3 23.86803, -2.192438, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, -56.49903, -6.003055, 0.8157971, 1.220314, -0.8068206, -3.464586, -3.211015, 0.9738578, -0.8857193, -1.30959, -2.619382, 3.287904, -1.095775, -1.312854, 2, 16.32604, 16.04103, 14.94481, 15.69024, 15.58061, 12.35773, 11.97625, 12.14945, 16.15065, 16.337, 15.44907, 15.43811, 16.918, 16.04103, 0, 1, -3.741657, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, -14.96369, 2.738073, -0.06892519, 0.05524975, 0.03698873, -0.4998852, -0.5634336, -0.5345812, 0.131946, 0.1629898, 0.0150755, 0.0132494, 0.2597732, 0.113685, 1.267261, 1.113685, 1, 2, 1e-07, 2, 12, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 18.7, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 15.5, 15.2, 13.3, 19.2, 15.8, 15, 3.44, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 3.52, 3.435, 3.84, 3.845, 3.17, 3.57
## model_summary
## 1 lm(formula = mpg ~ wt, data = .), mpg ~ wt, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, 28.40884, -2.780106, 4.184369, 1.334917, 6.789278, -2.082605, 0.001054844, 0.09175766, FALSE, FALSE, 1.165202, 2, 5, 2, 0.4645102, 0.3574122, 4.337245, 1, 5, 12.89607, -4.091314, -4.091314, 1.312521
## 2 lm(formula = mpg ~ wt, data = .), mpg ~ wt, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, 39.5712, -5.647025, 4.346582, 1.850119, 9.10398, -3.052251, 7.771511e-06, 0.01374278, FALSE, FALSE, 3.332283, 2, 9, 2, 0.5086326, 0.4540362, 9.316233, 1, 9, 1.701422, -0.7045954, -0.7045954, 0.3082587
## 3 lm(formula = mpg ~ wt, data = .), mpg ~ wt, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, 23.86803, -2.192438, 3.005462, 0.7392393, 7.941551, -2.965803, 4.052705e-06, 0.01179281, FALSE, FALSE, 2.024091, 2, 12, 2, 0.4229655, 0.3748793, 8.795985, 1, 12, 2.204765, -0.5334389, -0.5334389, 0.1333859
Again, model_summary is a list column, this time storing the summary of each model. Its first entry should be the same as the result above.
model_nested_summarised$model_summary[[1]]
##
## Call:
## lm(formula = mpg ~ wt, data = .)
##
## Residuals:
## Mazda RX4 Mazda RX4 Wag Hornet 4 Drive Valiant Merc 280
## -0.1250 0.5840 1.9292 -0.6897 0.3547
## Merc 280C Ferrari Dino
## -1.0453 -1.0080
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.409 4.184 6.789 0.00105 **
## wt -2.780 1.335 -2.083 0.09176 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.165 on 5 degrees of freedom
## Multiple R-squared: 0.4645, Adjusted R-squared: 0.3574
## F-statistic: 4.337 on 1 and 5 DF, p-value: 0.09176
We can extract the necessary information from those list columns separately.
model_nested_summarised$model_summary[[1]][["r.squared"]]
## [1] 0.4645102
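r.squared is just one of the components stored in a summary.lm object. The sketch below refits the 6-cylinder model directly (the same data as the first nested entry) to list the other components you could pull out the same way.

```r
# Refit the 6-cylinder model directly, outside the nested data frame
s <- summary(lm(mpg ~ wt, data = subset(mtcars, cyl == 6)))

names(s)              # components stored in a summary.lm object
s[["adj.r.squared"]]  # any component can be extracted by name
coef(s)               # coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
```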
But let's go one step further: use Purrr to get the r.squared value of every model and store them in a separate column.
model_nested_summarised %>%
  mutate(rsqr = map_dbl(model_summary, ~ .[["r.squared"]]))
## cyl
## 1 6
## 2 4
## 3 8
## data
## 1 21.000, 21.000, 21.400, 18.100, 19.200, 17.800, 19.700, 160.000, 160.000, 258.000, 225.000, 167.600, 167.600, 145.000, 110.000, 110.000, 110.000, 105.000, 123.000, 123.000, 175.000, 3.900, 3.900, 3.080, 2.760, 3.920, 3.920, 3.620, 2.620, 2.875, 3.215, 3.460, 3.440, 3.440, 2.770, 16.460, 17.020, 19.440, 20.220, 18.300, 18.900, 15.500, 0.000, 0.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 0.000, 0.000, 0.000, 0.000, 1.000, 4.000, 4.000, 3.000, 3.000, 4.000, 4.000, 5.000, 4.000, 4.000, 1.000, 1.000, 4.000, 4.000, 6.000
## 2 22.800, 24.400, 22.800, 32.400, 30.400, 33.900, 21.500, 27.300, 26.000, 30.400, 21.400, 108.000, 146.700, 140.800, 78.700, 75.700, 71.100, 120.100, 79.000, 120.300, 95.100, 121.000, 93.000, 62.000, 95.000, 66.000, 52.000, 65.000, 97.000, 66.000, 91.000, 113.000, 109.000, 3.850, 3.690, 3.920, 4.080, 4.930, 4.220, 3.700, 4.080, 4.430, 3.770, 4.110, 2.320, 3.190, 3.150, 2.200, 1.615, 1.835, 2.465, 1.935, 2.140, 1.513, 2.780, 18.610, 20.000, 22.900, 19.470, 18.520, 19.900, 20.010, 18.900, 16.700, 16.900, 18.600, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 0.000, 0.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 4.000, 4.000, 4.000, 4.000, 4.000, 4.000, 3.000, 4.000, 5.000, 5.000, 4.000, 1.000, 2.000, 2.000, 1.000, 2.000, 1.000, 1.000, 1.000, 2.000, 2.000, 2.000
## 3 18.700, 14.300, 16.400, 17.300, 15.200, 10.400, 10.400, 14.700, 15.500, 15.200, 13.300, 19.200, 15.800, 15.000, 360.000, 360.000, 275.800, 275.800, 275.800, 472.000, 460.000, 440.000, 318.000, 304.000, 350.000, 400.000, 351.000, 301.000, 175.000, 245.000, 180.000, 180.000, 180.000, 205.000, 215.000, 230.000, 150.000, 150.000, 245.000, 175.000, 264.000, 335.000, 3.150, 3.210, 3.070, 3.070, 3.070, 2.930, 3.000, 3.230, 2.760, 3.150, 3.730, 3.080, 4.220, 3.540, 3.440, 3.570, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 3.520, 3.435, 3.840, 3.845, 3.170, 3.570, 17.020, 15.840, 17.400, 17.600, 18.000, 17.980, 17.820, 17.420, 16.870, 17.300, 15.410, 17.050, 14.500, 14.600, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000, 1.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 5.000, 5.000, 2.000, 4.000, 3.000, 3.000, 3.000, 4.000, 4.000, 4.000, 2.000, 2.000, 4.000, 2.000, 4.000, 8.000
## model
## 1 28.40884, -2.780106, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, -52.23469, -2.426656, 2.111436, -0.3526643, 0.679099, -0.720901, -1.10683, 2, 21.12497, 20.41604, 19.4708, 18.78968, 18.84528, 18.84528, 20.70795, 0, 1, -2.645751, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, -8.247185, 0.8728647, -0.2683341, -0.5490191, -0.526106, -0.526106, 0.2414814, 1.377964, 1.121188, 1, 2, 1e-07, 2, 5, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 21, 21, 21.4, 18.1, 19.2, 17.8, 19.7, 2.62, 2.875, 3.215, 3.46, 3.44, 3.44, 2.77
## 2 39.5712, -5.647025, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, -88.43328, 10.17096, 0.6947654, 6.230721, 1.728126, 6.169273, -3.535624, -0.00293297, -0.4259551, 1.291776, -2.288073, 2, 26.4701, 21.55719, 21.78307, 27.14774, 30.45125, 29.2089, 25.65128, 28.6442, 27.48656, 31.02725, 23.87247, 0, 1, -3.316625, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, -7.5809, -1.801119, 0.4754451, -0.05200489, -0.376803, -0.2546567, 0.0951259, -0.1991357, -0.08531752, -0.4334345, 0.2700172, 1.301511, 1.497654, 1, 2, 1e-07, 2, 9, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26, 30.4, 21.4, 2.32, 3.19, 3.15, 2.2, 1.615, 1.835, 2.465, 1.935, 2.14, 1.513, 2.78
## 3 23.86803, -2.192438, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, -56.49903, -6.003055, 0.8157971, 1.220314, -0.8068206, -3.464586, -3.211015, 0.9738578, -0.8857193, -1.30959, -2.619382, 3.287904, -1.095775, -1.312854, 2, 16.32604, 16.04103, 14.94481, 15.69024, 15.58061, 12.35773, 11.97625, 12.14945, 16.15065, 16.337, 15.44907, 15.43811, 16.918, 16.04103, 0, 1, -3.741657, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, -14.96369, 2.738073, -0.06892519, 0.05524975, 0.03698873, -0.4998852, -0.5634336, -0.5345812, 0.131946, 0.1629898, 0.0150755, 0.0132494, 0.2597732, 0.113685, 1.267261, 1.113685, 1, 2, 1e-07, 2, 12, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 18.7, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 15.5, 15.2, 13.3, 19.2, 15.8, 15, 3.44, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 3.52, 3.435, 3.84, 3.845, 3.17, 3.57
## model_summary
## 1 lm(formula = mpg ~ wt, data = .), mpg ~ wt, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, 28.40884, -2.780106, 4.184369, 1.334917, 6.789278, -2.082605, 0.001054844, 0.09175766, FALSE, FALSE, 1.165202, 2, 5, 2, 0.4645102, 0.3574122, 4.337245, 1, 5, 12.89607, -4.091314, -4.091314, 1.312521
## 2 lm(formula = mpg ~ wt, data = .), mpg ~ wt, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, 39.5712, -5.647025, 4.346582, 1.850119, 9.10398, -3.052251, 7.771511e-06, 0.01374278, FALSE, FALSE, 3.332283, 2, 9, 2, 0.5086326, 0.4540362, 9.316233, 1, 9, 1.701422, -0.7045954, -0.7045954, 0.3082587
## 3 lm(formula = mpg ~ wt, data = .), mpg ~ wt, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, 23.86803, -2.192438, 3.005462, 0.7392393, 7.941551, -2.965803, 4.052705e-06, 0.01179281, FALSE, FALSE, 2.024091, 2, 12, 2, 0.4229655, 0.3748793, 8.795985, 1, 12, 2.204765, -0.5334389, -0.5334389, 0.1333859
## rsqr
## 1 0.4645102
## 2 0.5086326
## 3 0.4229655
Here we used map_dbl() because we want the result as a double vector, not a list. This way we get all the requested values in one data frame. Neat.
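If you only need the r.squared values, a shorter route skips nesting altogether, in the spirit of the pipeline shown in purrr's README. The sketch below uses base R's split() to turn mtcars into a named list of data frames. It then fits and summarises each model. Finally, it passes a string to map_dbl(), which extracts the element with that name from each summary.

```r
library(purrr)
library(magrittr)  # for %>%

rsq <- mtcars %>%
  split(mtcars$cyl) %>%               # base R: named list of 3 data frames ("4", "6", "8")
  map(~ lm(mpg ~ wt, data = .x)) %>%  # one model per group
  map(summary) %>%                    # a function can also be passed by name
  map_dbl("r.squared")                # a string extracts that element by name

rsq
```

The trade-off is that the result is a bare named vector: the data and models are not kept alongside it, which is what the nested data frame approach gives you.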
Now let's try all of this using a package called broom, which returns model results in a tidy format. This time I will compact everything into one single block.
all_in_one <- mtcars %>%
  nest(-cyl) %>%
  mutate(model = map(data, ~ lm(mpg ~ wt, data = .)),
         tidied = map(model, glance)) %>%
  unnest(tidied)
all_in_one
## cyl
## 1 6
## 2 4
## 3 8
## data
## 1 21.000, 21.000, 21.400, 18.100, 19.200, 17.800, 19.700, 160.000, 160.000, 258.000, 225.000, 167.600, 167.600, 145.000, 110.000, 110.000, 110.000, 105.000, 123.000, 123.000, 175.000, 3.900, 3.900, 3.080, 2.760, 3.920, 3.920, 3.620, 2.620, 2.875, 3.215, 3.460, 3.440, 3.440, 2.770, 16.460, 17.020, 19.440, 20.220, 18.300, 18.900, 15.500, 0.000, 0.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 0.000, 0.000, 0.000, 0.000, 1.000, 4.000, 4.000, 3.000, 3.000, 4.000, 4.000, 5.000, 4.000, 4.000, 1.000, 1.000, 4.000, 4.000, 6.000
## 2 22.800, 24.400, 22.800, 32.400, 30.400, 33.900, 21.500, 27.300, 26.000, 30.400, 21.400, 108.000, 146.700, 140.800, 78.700, 75.700, 71.100, 120.100, 79.000, 120.300, 95.100, 121.000, 93.000, 62.000, 95.000, 66.000, 52.000, 65.000, 97.000, 66.000, 91.000, 113.000, 109.000, 3.850, 3.690, 3.920, 4.080, 4.930, 4.220, 3.700, 4.080, 4.430, 3.770, 4.110, 2.320, 3.190, 3.150, 2.200, 1.615, 1.835, 2.465, 1.935, 2.140, 1.513, 2.780, 18.610, 20.000, 22.900, 19.470, 18.520, 19.900, 20.010, 18.900, 16.700, 16.900, 18.600, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 0.000, 0.000, 1.000, 1.000, 1.000, 0.000, 1.000, 1.000, 1.000, 1.000, 4.000, 4.000, 4.000, 4.000, 4.000, 4.000, 3.000, 4.000, 5.000, 5.000, 4.000, 1.000, 2.000, 2.000, 1.000, 2.000, 1.000, 1.000, 1.000, 2.000, 2.000, 2.000
## 3 18.700, 14.300, 16.400, 17.300, 15.200, 10.400, 10.400, 14.700, 15.500, 15.200, 13.300, 19.200, 15.800, 15.000, 360.000, 360.000, 275.800, 275.800, 275.800, 472.000, 460.000, 440.000, 318.000, 304.000, 350.000, 400.000, 351.000, 301.000, 175.000, 245.000, 180.000, 180.000, 180.000, 205.000, 215.000, 230.000, 150.000, 150.000, 245.000, 175.000, 264.000, 335.000, 3.150, 3.210, 3.070, 3.070, 3.070, 2.930, 3.000, 3.230, 2.760, 3.150, 3.730, 3.080, 4.220, 3.540, 3.440, 3.570, 4.070, 3.730, 3.780, 5.250, 5.424, 5.345, 3.520, 3.435, 3.840, 3.845, 3.170, 3.570, 17.020, 15.840, 17.400, 17.600, 18.000, 17.980, 17.820, 17.420, 16.870, 17.300, 15.410, 17.050, 14.500, 14.600, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 1.000, 1.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 3.000, 5.000, 5.000, 2.000, 4.000, 3.000, 3.000, 3.000, 4.000, 4.000, 4.000, 2.000, 2.000, 4.000, 2.000, 4.000, 8.000
## model
## 1 28.40884, -2.780106, -0.124967, 0.5839601, 1.929196, -0.689678, 0.3547199, -1.04528, -1.007951, -52.23469, -2.426656, 2.111436, -0.3526643, 0.679099, -0.720901, -1.10683, 2, 21.12497, 20.41604, 19.4708, 18.78968, 18.84528, 18.84528, 20.70795, 0, 1, -2.645751, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, 0.3779645, -8.247185, 0.8728647, -0.2683341, -0.5490191, -0.526106, -0.526106, 0.2414814, 1.377964, 1.121188, 1, 2, 1e-07, 2, 5, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 21, 21, 21.4, 18.1, 19.2, 17.8, 19.7, 2.62, 2.875, 3.215, 3.46, 3.44, 3.44, 2.77
## 2 39.5712, -5.647025, -3.670097, 2.842815, 1.016934, 5.25226, -0.05125022, 4.691095, -4.151279, -1.344202, -1.486562, -0.6272468, -2.472466, -88.43328, 10.17096, 0.6947654, 6.230721, 1.728126, 6.169273, -3.535624, -0.00293297, -0.4259551, 1.291776, -2.288073, 2, 26.4701, 21.55719, 21.78307, 27.14774, 30.45125, 29.2089, 25.65128, 28.6442, 27.48656, 31.02725, 23.87247, 0, 1, -3.316625, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, 0.3015113, -7.5809, -1.801119, 0.4754451, -0.05200489, -0.376803, -0.2546567, 0.0951259, -0.1991357, -0.08531752, -0.4334345, 0.2700172, 1.301511, 1.497654, 1, 2, 1e-07, 2, 9, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 22.8, 24.4, 22.8, 32.4, 30.4, 33.9, 21.5, 27.3, 26, 30.4, 21.4, 2.32, 3.19, 3.15, 2.2, 1.615, 1.835, 2.465, 1.935, 2.14, 1.513, 2.78
## 3 23.86803, -2.192438, 2.373957, -1.741026, 1.455193, 1.609764, -0.3806137, -1.95773, -1.576246, 2.550552, -0.6506476, -1.137005, -2.149067, 3.761895, -1.118001, -1.041026, -56.49903, -6.003055, 0.8157971, 1.220314, -0.8068206, -3.464586, -3.211015, 0.9738578, -0.8857193, -1.30959, -2.619382, 3.287904, -1.095775, -1.312854, 2, 16.32604, 16.04103, 14.94481, 15.69024, 15.58061, 12.35773, 11.97625, 12.14945, 16.15065, 16.337, 15.44907, 15.43811, 16.918, 16.04103, 0, 1, -3.741657, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, 0.2672612, -14.96369, 2.738073, -0.06892519, 0.05524975, 0.03698873, -0.4998852, -0.5634336, -0.5345812, 0.131946, 0.1629898, 0.0150755, 0.0132494, 0.2597732, 0.113685, 1.267261, 1.113685, 1, 2, 1e-07, 2, 12, lm(formula = mpg ~ wt, data = .), mpg ~ wt, 18.7, 14.3, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 15.5, 15.2, 13.3, 19.2, 15.8, 15, 3.44, 3.57, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 3.52, 3.435, 3.84, 3.845, 3.17, 3.57
## r.squared adj.r.squared sigma statistic p.value df logLik
## 1 0.4645102 0.3574122 1.165202 4.337245 0.09175766 2 -9.82518
## 2 0.5086326 0.4540362 3.332283 9.316233 0.01374278 2 -27.74487
## 3 0.4229655 0.3748793 2.024091 8.795985 0.01179281 2 -28.65778
## AIC BIC deviance df.residual
## 1 25.65036 25.48809 6.788481 5
## 2 61.48974 62.68342 99.936983 9
## 3 63.31555 65.23272 49.163336 12
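We use broom's glance() function to return a one-row summary of statistics for each model, and unnest() spreads those rows back into the main data frame, giving one row per cyl group. We can now feed this output straight into ggplot2 to draw some nice figures.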
all_in_one %>%
  ggplot(aes(factor(cyl), r.squared, fill = cyl)) +
  geom_bar(stat = "identity", show.legend = FALSE)
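glance() gives one row per model. Its sibling in broom, tidy(), gives one row per coefficient instead, which is handy for comparing slopes across groups. Here is a sketch, assuming tidyr >= 1.0 for the nest(data = ...) and unnest(cols = ...) syntax.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

# Assumes tidyr >= 1.0 syntax for nest() and unnest()
coefs <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(model  = map(data, ~ lm(mpg ~ wt, data = .x)),
         tidied = map(model, tidy)) %>%  # one row per coefficient
  unnest(cols = tidied) %>%
  select(cyl, term, estimate, std.error, p.value)

coefs  # 3 groups x 2 terms = 6 rows
```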
Conclusion
This is not all there is to Purrr; I have merely scratched the surface. If you want to know more, please visit the fantastic demonstration by Charlotte Wickham. Another fantastic blog post on the topic is by Simon Jackson.