R Programming/Looking At Data: summary(plants) output does not match description #526

Open
yesezra opened this issue Aug 2, 2023 · 2 comments

yesezra commented Aug 2, 2023

Hello! I'm currently going through the R Programming>Looking at Data lesson. I'm in R 4.3.1/RStudio 2023.06.1+524 on macOS 13.5.

After being instructed to try summary(plants), I get the following output:

 Scientific_Name      Duration         Active_Growth_Period Foliage_Color          pH_Min     
 Length:5166        Length:5166        Length:5166          Length:5166        Min.   :3.000  
 Class :character   Class :character   Class :character     Class :character   1st Qu.:4.500  
 Mode  :character   Mode  :character   Mode  :character     Mode  :character   Median :5.000  
                                                                               Mean   :4.997  
                                                                               3rd Qu.:5.500  
                                                                               Max.   :7.000  
                                                                               NA's   :4327   
     pH_Max         Precip_Min      Precip_Max     Shade_Tolerance      Temp_Min_F    
 Min.   : 5.100   Min.   : 4.00   Min.   : 16.00   Length:5166        Min.   :-79.00  
 1st Qu.: 7.000   1st Qu.:16.75   1st Qu.: 55.00   Class :character   1st Qu.:-38.00  
 Median : 7.300   Median :28.00   Median : 60.00   Mode  :character   Median :-33.00  
 Mean   : 7.344   Mean   :25.57   Mean   : 58.73                      Mean   :-22.53  
 3rd Qu.: 7.800   3rd Qu.:32.00   3rd Qu.: 60.00                      3rd Qu.:-18.00  
 Max.   :10.000   Max.   :60.00   Max.   :200.00                      Max.   : 52.00  
 NA's   :4327     NA's   :4338    NA's   :4338                        NA's   :4328  

However, the output is described by the lesson as follows:

Duration (also a factor variable) tells us that our dataset contains 3031 Perennial plants, 682 Annual plants, etc.

This does not match the output, which shows Duration as a character variable, not a factor. The same mismatch occurs with Active_Growth_Period, which the lesson describes as:

| You can see that R truncated the summary for Active_Growth_Period by including a catch-all
| category called 'Other'. Since it is a categorical/factor variable, we can see how many times
| each value actually occurs in the data with table(plants$Active_Growth_Period).

Perhaps something changed in the dataset or default output of summary, but this is confusing and I'm not sure how to get output that matches the description. Many thanks for maintaining this valuable project!
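
One likely explanation, offered as an assumption rather than something confirmed against the lesson source: since R 4.0.0, data.frame() and read.csv() default to stringsAsFactors = FALSE, so text columns stay character vectors unless converted. The sketch below uses a made-up five-row Duration column (not the real plants data) to show how summary() differs between the two cases.

```r
# Hypothetical stand-in for the Duration column; the real data has 5166 rows.
duration <- c("Perennial", "Annual", "Perennial", "Biennial", "Perennial")

# Default since R 4.0.0: strings stay character vectors.
df_chr <- data.frame(Duration = duration, stringsAsFactors = FALSE)

# Default before R 4.0.0, which the lesson text appears to assume.
df_fct <- data.frame(Duration = duration, stringsAsFactors = TRUE)

summary(df_chr)  # reports Length / Class / Mode, like the output above
summary(df_fct)  # reports a count per level, like the lesson describes
```

Setting stringsAsFactors explicitly in both calls keeps the comparison the same regardless of which R version runs it.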

yesezra commented Aug 2, 2023

In case this is useful for any other beginners finding this issue, I worked around it by coercing the appropriate columns from character vectors into factors:

plants$Active_Growth_Period <- as.factor(plants$Active_Growth_Period)
plants$Duration <- as.factor(plants$Duration)
plants$Foliage_Color <- as.factor(plants$Foliage_Color)
plants$Shade_Tolerance <- as.factor(plants$Shade_Tolerance)
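
The same conversion can also be written as a single step over a vector of column names. This is only a sketch: plants_demo is a made-up stand-in for the lesson's plants data frame, and the column list simply mirrors the four assignments above (restricted to the columns the toy data contains).

```r
# Toy stand-in for the lesson's `plants` data frame (values are made up).
plants_demo <- data.frame(
  Scientific_Name      = c("Abies alba", "Acer rubrum", "Zea mays"),
  Duration             = c("Perennial", "Perennial", "Annual"),
  Active_Growth_Period = c("Spring and Summer", "Spring", "Summer"),
  pH_Min               = c(NA, 4.7, 5.5),
  stringsAsFactors     = FALSE
)

# Columns to treat as categorical; Scientific_Name is left as character,
# matching the workaround above.
factor_cols <- c("Duration", "Active_Growth_Period")

# lapply() returns a list of factors that replaces the selected columns.
plants_demo[factor_cols] <- lapply(plants_demo[factor_cols], as.factor)

summary(plants_demo)
```

With the real data, replacing plants_demo with plants and listing all four column names in factor_cols should give the same result as the individual assignments.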

@gdickens

Just adding my voice here: faced the same issue.

The Scientific_Name variable is a character, not a factor.
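
For anyone who wants to check this on their own machine, a quick sketch: sapply(df, class) lists the class of every column, and table() counts values in a character column without any conversion to factor. The data frame below is a made-up stand-in, not the lesson's plants data.

```r
# Toy stand-in for the lesson's `plants` data frame (values are made up).
plants_demo <- data.frame(
  Scientific_Name      = c("Abies alba", "Acer rubrum", "Zea mays"),
  Active_Growth_Period = c("Spring and Summer", "Spring", "Spring and Summer"),
  stringsAsFactors     = FALSE
)

# Class of each column; both are "character" here.
col_classes <- sapply(plants_demo, class)
print(col_classes)

# table() works directly on a character vector.
growth_counts <- table(plants_demo$Active_Growth_Period)
print(growth_counts)
```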
