
Data Dictionary

for: Coursera/JHU, Getting and Cleaning Data, Week 4, Course Project
by: Steven Balzer
date: 2021-02-13

The Variables

Data set overview

Feature Result
Number of observations 180
Number of variables 81

Codebook summary table

Variable Class # unique values Missing Description
[activity] character 6 0.00 % Labels describing the type of activity taking place during the measurement. The original data set contained numeric values ranging from 1 through 6. Using ‘activity_labels.txt’ these were replaced with the corresponding descriptive activity name: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, and LAYING.
[subject] integer 30 0.00 % An identifier of the subject who carried out the experiment. Subject identifiers range from 1 through 30.
[tBodyAcc_mean_X] numeric 180 0.00 % This and all remaining variables are a subset of the measurement variables in the original data set. Each stored value is the mean of that measurement for one combination of activity and subject. All are derived from values that were normalized and bounded within the range [-1, 1].
[tBodyAcc_mean_Y] numeric 180 0.00 %
[tBodyAcc_mean_Z] numeric 180 0.00 %
[tBodyAcc_std_X] numeric 180 0.00 %
[tBodyAcc_std_Y] numeric 180 0.00 %
[tBodyAcc_std_Z] numeric 180 0.00 %
[tGravityAcc_mean_X] numeric 180 0.00 %
[tGravityAcc_mean_Y] numeric 180 0.00 %
[tGravityAcc_mean_Z] numeric 180 0.00 %
[tGravityAcc_std_X] numeric 180 0.00 %
[tGravityAcc_std_Y] numeric 180 0.00 %
[tGravityAcc_std_Z] numeric 180 0.00 %
[tBodyAccJerk_mean_X] numeric 180 0.00 %
[tBodyAccJerk_mean_Y] numeric 180 0.00 %
[tBodyAccJerk_mean_Z] numeric 180 0.00 %
[tBodyAccJerk_std_X] numeric 180 0.00 %
[tBodyAccJerk_std_Y] numeric 180 0.00 %
[tBodyAccJerk_std_Z] numeric 180 0.00 %
[tBodyGyro_mean_X] numeric 180 0.00 %
[tBodyGyro_mean_Y] numeric 180 0.00 %
[tBodyGyro_mean_Z] numeric 180 0.00 %
[tBodyGyro_std_X] numeric 180 0.00 %
[tBodyGyro_std_Y] numeric 180 0.00 %
[tBodyGyro_std_Z] numeric 180 0.00 %
[tBodyGyroJerk_mean_X] numeric 180 0.00 %
[tBodyGyroJerk_mean_Y] numeric 180 0.00 %
[tBodyGyroJerk_mean_Z] numeric 180 0.00 %
[tBodyGyroJerk_std_X] numeric 180 0.00 %
[tBodyGyroJerk_std_Y] numeric 180 0.00 %
[tBodyGyroJerk_std_Z] numeric 180 0.00 %
[tBodyAccMag_mean] numeric 180 0.00 %
[tBodyAccMag_std] numeric 180 0.00 %
[tGravityAccMag_mean] numeric 180 0.00 %
[tGravityAccMag_std] numeric 180 0.00 %
[tBodyAccJerkMag_mean] numeric 180 0.00 %
[tBodyAccJerkMag_std] numeric 180 0.00 %
[tBodyGyroMag_mean] numeric 180 0.00 %
[tBodyGyroMag_std] numeric 180 0.00 %
[tBodyGyroJerkMag_mean] numeric 180 0.00 %
[tBodyGyroJerkMag_std] numeric 180 0.00 %
[fBodyAcc_mean_X] numeric 180 0.00 %
[fBodyAcc_mean_Y] numeric 180 0.00 %
[fBodyAcc_mean_Z] numeric 180 0.00 %
[fBodyAcc_std_X] numeric 180 0.00 %
[fBodyAcc_std_Y] numeric 180 0.00 %
[fBodyAcc_std_Z] numeric 180 0.00 %
[fBodyAcc_meanFreq_X] numeric 180 0.00 %
[fBodyAcc_meanFreq_Y] numeric 180 0.00 %
[fBodyAcc_meanFreq_Z] numeric 180 0.00 %
[fBodyAccJerk_mean_X] numeric 180 0.00 %
[fBodyAccJerk_mean_Y] numeric 180 0.00 %
[fBodyAccJerk_mean_Z] numeric 180 0.00 %
[fBodyAccJerk_std_X] numeric 180 0.00 %
[fBodyAccJerk_std_Y] numeric 180 0.00 %
[fBodyAccJerk_std_Z] numeric 180 0.00 %
[fBodyAccJerk_meanFreq_X] numeric 180 0.00 %
[fBodyAccJerk_meanFreq_Y] numeric 180 0.00 %
[fBodyAccJerk_meanFreq_Z] numeric 180 0.00 %
[fBodyGyro_mean_X] numeric 180 0.00 %
[fBodyGyro_mean_Y] numeric 180 0.00 %
[fBodyGyro_mean_Z] numeric 180 0.00 %
[fBodyGyro_std_X] numeric 180 0.00 %
[fBodyGyro_std_Y] numeric 180 0.00 %
[fBodyGyro_std_Z] numeric 180 0.00 %
[fBodyGyro_meanFreq_X] numeric 180 0.00 %
[fBodyGyro_meanFreq_Y] numeric 180 0.00 %
[fBodyGyro_meanFreq_Z] numeric 180 0.00 %
[fBodyAccMag_mean] numeric 180 0.00 %
[fBodyAccMag_std] numeric 180 0.00 %
[fBodyAccMag_meanFreq] numeric 180 0.00 %
[fBodyBodyAccJerkMag_mean] numeric 180 0.00 %
[fBodyBodyAccJerkMag_std] numeric 180 0.00 %
[fBodyBodyAccJerkMag_meanFreq] numeric 180 0.00 %
[fBodyBodyGyroMag_mean] numeric 180 0.00 %
[fBodyBodyGyroMag_std] numeric 180 0.00 %
[fBodyBodyGyroMag_meanFreq] numeric 180 0.00 %
[fBodyBodyGyroJerkMag_mean] numeric 180 0.00 %
[fBodyBodyGyroJerkMag_std] numeric 180 0.00 %
[fBodyBodyGyroJerkMag_meanFreq] numeric 180 0.00 %
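The measurement names in the table follow a regular pattern. Each has a domain prefix ('t' for time-domain and 'f' for frequency-domain signals, as in the original data set), then the signal name, then the statistic (mean, std or meanFreq) and, for triaxial signals, an axis (X, Y or Z). The Python sketch below splits a name into those parts. The regular expression and the `parse_variable` helper are illustrative assumptions and are not part of the project's R code.

```python
import re

# Assumed layout of the measurement names in the table above:
# <t|f><signal>_<meanFreq|mean|std>[_<X|Y|Z>]
NAME_PATTERN = re.compile(
    r"^(?P<domain>[tf])(?P<signal>[A-Za-z]+?)"
    r"_(?P<stat>meanFreq|mean|std)(?:_(?P<axis>[XYZ]))?$"
)
DOMAINS = {"t": "time", "f": "frequency"}


def parse_variable(name):
    """Split a measurement name into domain, signal, statistic and axis."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a measurement variable: {name!r}")
    parts = match.groupdict()  # 'axis' is None for magnitude signals
    parts["domain"] = DOMAINS[parts["domain"]]
    return parts


print(parse_variable("tBodyAcc_mean_X"))
print(parse_variable("fBodyBodyGyroJerkMag_meanFreq"))
```

The identifier columns ‘activity’ and ‘subject’ do not match the pattern, so `parse_variable` raises a `ValueError` for them.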

The Data

Abstract: Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.

A full description is available from UCI’s Center for Machine Learning and Intelligent Systems (link), which is also where the source data set (link) was obtained.

The original project and related data are provided by:

Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. Human Activity Recognition on Smartphones using a Multiclass Hardware-Friendly Support Vector Machine. International Workshop of Ambient Assisted Living (IWAAL 2012). Vitoria-Gasteiz, Spain. Dec 2012

Transformations

Note: See ‘run_analysis.R’ for the detailed step-by-step transformations and analysis for this project.

1.0) Merge the training and the test data sets into one.

1.1) The 3 training data files (‘X_train.txt’, ‘y_train.txt’, ‘subject_train.txt’), each containing 7,352 observations, are combined horizontally (column-wise), so the result still has 7,352 observations.

1.2) A partition variable with the value ‘training’ is added to every observation from 1.1, to distinguish training-subject data from test-subject data.

1.3) The 3 test data files (‘X_test.txt’, ‘y_test.txt’, ‘subject_test.txt’), each containing 2,947 observations, are combined horizontally (column-wise), so the result still has 2,947 observations.

1.4) A partition variable with the value ‘test’ is added to every observation from 1.3, to distinguish training-subject data from test-subject data.

1.5) The data sets from 1.2 and 1.4 are combined vertically (row-wise), giving a total of 10,299 observations (7,352 training and 2,947 test). This data set includes 3 descriptive variables (‘subject’, ‘partition’, ‘activity’) and all 561 measurement variables.
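Steps 1.1 to 1.5 amount to a column bind within each partition, followed by a row bind across the two partitions. The Python sketch below illustrates this with tiny in-memory lists in place of the text files. The helper names `bind_partition` and `merge_partitions` are hypothetical. The project itself performs these steps in R (see ‘run_analysis.R’).

```python
def bind_partition(x_rows, activities, subjects, partition):
    """Steps 1.1/1.3 and 1.2/1.4: join X, y and subject data row by row
    (a column bind) and tag every row with its partition."""
    if not (len(x_rows) == len(activities) == len(subjects)):
        raise ValueError("the three files must have the same number of rows")
    return [
        {"subject": subject, "partition": partition,
         "activity": activity, "measurements": list(x)}
        for x, activity, subject in zip(x_rows, activities, subjects)
    ]


def merge_partitions(*partitions):
    """Step 1.5: stack the partitioned data sets on top of each other (a row bind)."""
    merged = []
    for rows in partitions:
        merged.extend(rows)
    return merged


# Toy stand-ins for X_*.txt (2 of the 561 measurements), y_*.txt and subject_*.txt.
train = bind_partition([[0.28, -0.02], [0.27, -0.01]], [5, 5], [1, 1], "training")
test = bind_partition([[0.25, 0.01]], [1], [2], "test")
merged = merge_partitions(train, test)
print(len(merged))               # 3 rows: 2 training + 1 test
print(merged[-1]["partition"])   # test
```

With the real files, the column bind keeps 7,352 and 2,947 rows respectively, and the row bind yields 7,352 + 2,947 = 10,299 rows.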

2.0) Extract only the measurements on the mean and standard deviation for each measurement.

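2.1) Using ‘features.txt’ as a reference, the variables whose names contain either ‘-mean’ or ‘-std’ are selected. Because ‘-mean’ also matches ‘-meanFreq’, this reduces the number of measurement variables from 561 to 79 (33 mean, 33 standard deviation and 13 mean-frequency variables).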

2.2) Because the feature names are already at hand at this stage, the selected variables are also given more meaningful names here. Some of the work described in 4.0 is therefore done in this step.
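The selection in 2.1 can be sketched as a pattern match on the feature names. In this hypothetical Python helper, `FEATURE_LINES` is a small illustrative sample in the ‘<index> <name>’ layout of ‘features.txt’. Note that ‘-mean’ also matches ‘-meanFreq()’. A name such as ‘angle(tBodyAccMean,gravity)’ contains no ‘-mean’, so it is not selected.

```python
import re

# Illustrative sample in the "<index> <name>" layout of features.txt.
FEATURE_LINES = [
    "1 tBodyAcc-mean()-X",
    "4 tBodyAcc-std()-X",
    "7 tBodyAcc-mad()-X",
    "294 fBodyAcc-meanFreq()-X",
    "555 angle(tBodyAccMean,gravity)",
]

# '-mean' or '-std' anywhere in the name; '-meanFreq()' matches too.
MEAN_OR_STD = re.compile(r"-(mean|std)")


def select_features(lines):
    """Return (column index, feature name) for every mean/std feature."""
    selected = []
    for line in lines:
        index, name = line.split(maxsplit=1)
        if MEAN_OR_STD.search(name):
            selected.append((int(index), name))
    return selected


for index, name in select_features(FEATURE_LINES):
    print(index, name)
```

The returned indices can then be used to keep the matching columns of the merged data set.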

3.0) Use descriptive activity names to name the activities in the data set.

3.1) Using ‘activity_labels.txt’, the numeric ‘activity’ variable is replaced with a more meaningful descriptive label.
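The relabelling in 3.1 is a lookup from activity code to label. This Python sketch assumes that ‘activity_labels.txt’ holds one ‘<code> <label>’ pair per line, numbered 1 to 6 in the order listed for [activity] above. `load_activity_labels` and `label_activities` are hypothetical helper names.

```python
# Assumed contents of activity_labels.txt: one "<code> <label>" pair per line.
ACTIVITY_LABELS_TXT = """\
1 WALKING
2 WALKING_UPSTAIRS
3 WALKING_DOWNSTAIRS
4 SITTING
5 STANDING
6 LAYING
"""


def load_activity_labels(text):
    """Parse the label file into a {code: label} dictionary."""
    labels = {}
    for line in text.splitlines():
        if line.strip():
            code, label = line.split()
            labels[int(code)] = label
    return labels


def label_activities(codes, labels):
    """Replace each numeric activity code with its descriptive label."""
    return [labels[code] for code in codes]


labels = load_activity_labels(ACTIVITY_LABELS_TXT)
print(label_activities([5, 5, 1, 6], labels))
# ['STANDING', 'STANDING', 'WALKING', 'LAYING']
```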

4.0) Appropriately label/tidy the data set with descriptive variable names.

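4.1) The variable names listed in ‘features.txt’ contain parentheses and hyphens, which are not valid in R variable names. When the names were assigned to the data frame, each of these characters was replaced with a period (full stop); for example, ‘tBodyAcc-mean()-X’ became ‘tBodyAcc.mean...X’. Each run of periods is then replaced with a single underscore, and trailing ones are dropped, to make the variable names tidy and easier to read (e.g. ‘tBodyAcc_mean_X’).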
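The clean-up in 4.1 can be expressed as two regular-expression substitutions. In this Python sketch, `r_style_name` is only a rough stand-in for R's `make.names()`: it reproduces the replacement of invalid characters with periods and ignores the function's other rules. `tidy_name` collapses each run of periods into one underscore and trims underscores from the ends. This produces names of the form listed in the table above.

```python
import re


def r_style_name(name):
    """Rough stand-in for R's make.names(): every character that is not a
    letter, digit, '.' or '_' becomes '.' (make.names' other rules are ignored)."""
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


def tidy_name(r_name):
    """Collapse each run of periods into one underscore, then trim
    underscores from both ends of the name."""
    return re.sub(r"\.+", "_", r_name).strip("_")


for raw in ["tBodyAcc-mean()-X", "fBodyBodyGyroJerkMag-meanFreq()"]:
    print(raw, "->", r_style_name(raw), "->", tidy_name(r_style_name(raw)))
# tBodyAcc-mean()-X -> tBodyAcc.mean...X -> tBodyAcc_mean_X
# fBodyBodyGyroJerkMag-meanFreq() -> fBodyBodyGyroJerkMag.meanFreq.. -> fBodyBodyGyroJerkMag_meanFreq
```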

5.0) Create an independent tidy data set with the average of each variable for each activity and each subject.

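5.1) Reshape the data set from 4.1 from wide to long format, so that each individual measurement value becomes a single observation (one row per activity, subject, measurement variable and value).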

5.2) With the measurement values in 5.1, calculate the mean for each combination of activity, subject, and measurement variable.

5.3) Reshape back to a tidy wide data set in which each observation holds the mean values from 5.2 for all 79 measurements of one activity-subject combination.
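Steps 5.1 to 5.3 follow a reshape, summarize, reshape pattern. The Python sketch below shows the three stages on plain dictionaries with toy data. It is not the author's R implementation, and the helper names `to_long`, `average_by_group` and `to_wide` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def to_long(rows, measures):
    """5.1: one (activity, subject, variable, value) record per measurement."""
    return [
        (row["activity"], row["subject"], var, row[var])
        for row in rows
        for var in measures
    ]


def average_by_group(records):
    """5.2: mean value for each (activity, subject, variable) combination."""
    groups = defaultdict(list)
    for activity, subject, var, value in records:
        groups[(activity, subject, var)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def to_wide(averages, measures):
    """5.3: one row per (activity, subject), one column per measurement."""
    cells = defaultdict(dict)
    for (activity, subject, var), value in averages.items():
        cells[(activity, subject)][var] = value
    return [
        {"activity": activity, "subject": subject,
         **{var: values[var] for var in measures}}
        for (activity, subject), values in sorted(cells.items())
    ]


# Toy input: two measurement columns instead of 79.
measures = ["tBodyAcc_mean_X", "tBodyAcc_std_X"]
rows = [
    {"activity": "WALKING", "subject": 1, "tBodyAcc_mean_X": 0.25, "tBodyAcc_std_X": -0.5},
    {"activity": "WALKING", "subject": 1, "tBodyAcc_mean_X": 0.75, "tBodyAcc_std_X": -0.25},
    {"activity": "LAYING", "subject": 1, "tBodyAcc_mean_X": -0.5, "tBodyAcc_std_X": -0.75},
]
tidy = to_wide(average_by_group(to_long(rows, measures)), measures)
for row in tidy:
    print(row)
```

Applied to the full data set, the result has one row per activity-subject combination, 6 × 30 = 180 rows, and 2 + 79 = 81 columns. This matches the data set overview at the top of this codebook.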