Adam VanIwaarden edited this page Oct 19, 2015 · 4 revisions

2015 SGP Analyses

For the 2015 Utah analyses, we are following a workflow that includes the following four steps:

  1. Create annual SGP configurations for End-of-Grade Test (EOGT) and End-of-Course Test (EOCT) analyses.
  2. Update the SGPstateData object in the SGP package. For 2015 this includes a) updating the norm group preferences, b) adding meta-data used in the calculation of student growth percentiles and projections, including the SAGE test knots and boundaries for the cubic basis splines and the test specific proficiency cutscores, and c) adding new SAGE related meta-data for the production of individual student reports (ISRs).
  3. Conduct EOGT and EOCT SGP Analyses.
  4. Export data, and produce summaries and visualizations from the Utah_SGP object (including individual student reports (ISRs)).

1. Create annual SGP configurations.

Unlike grade-level analyses, EOCT analyses are specialized enough that the analyses to be performed must be specified explicitly in a configuration script. The configurations associated with the 2015 annual EOCT SGP analyses are located in the Utah repo folder SGP_CONFIG. They are split into two R scripts: MATHEMATICS.R and SCIENCE.R.

Unlike previous years, where EOGT analyses were run separately and did not require configuration scripts, the 2015 analyses also specify configurations for the EOGT analyses. This allows us to run all of the analyses in a single call to the updateSGP function, and to combine the results with a single call to combineSGP. The EOGT configurations are located in the Utah repo folder SGP_CONFIG/EOGT.

Each configuration specifies a set of parameters that defines the norm group of students to be examined. Every potential norm group is defined by, at a minimum, the progressions of content area, academic year and grade level. Other parameters may also be defined. Every Utah configuration contains the first three elements listed below; the EOCT configurations also contain the fourth and fifth:

  • sgp.content.areas: A progression of values that specifies the content areas to be looked at and their order
  • sgp.panel.years: The progression of the years associated with the content area progression (sgp.content.areas) provided in the configuration, potentially allowing for skipped years, etc.
  • sgp.grade.sequences: The grade progression associated with the content area and year progressions provided in the configuration. 'EOCT' stands for 'End Of Course Test'. The use of the generic 'EOCT' allows for secondary students to be compared based on the pattern of course taking rather than being dependent upon grade-level/class-designation.
  • sgp.projection.grade.sequences: This element is used to identify the configurations that will be used to produce straight and/or lagged student growth projections. It can, somewhat counterintuitively, be left out or set to NULL, in which case projections will be produced and the package functions will populate the grade sequences to use based on the values provided in the sgp.grade.sequences element. Alternatively, when set to "NO_PROJECTIONS", no projections will be produced. For EOCT analyses, only configurations that correspond to the canonical course progressions can produce straight or lagged student growth projections. The canonical progressions are codified in SGPstateData[["UT"]][["SGP_Configuration"]][["content_area.projection.sequence"]].
  • sgp.norm.group.preference: Because a student can potentially be included in more than one analysis/configuration, this argument provides a ranking specifying which SGP is preferable and will ultimately be the SGP matched with the student in the combineSGP step. Lower numbers correspond with higher preference.

Note that sgp.content.areas, sgp.panel.years, and sgp.grade.sequences elements are all character strings, and their values correspond to levels found in the CONTENT_AREA, YEAR, and GRADE variables in the Utah_SGP@Data slot respectively. Only these three elements are needed for the EOGT analyses because they automatically fall into canonical projection sequences and there will not be any duplicates produced given the data cleaning process.
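
The NULL versus "NO_PROJECTIONS" convention for sgp.projection.grade.sequences can be checked directly on a configuration list. The following is a minimal sketch in base R using a toy list; the element names CANONICAL and NON_CANONICAL are hypothetical and only the projection element matters here.

```r
# Toy configuration list (hypothetical names, not taken from the Utah scripts)
toy.config <- list(
  CANONICAL = list(
    sgp.norm.group.preference = 1),          # element left out => projections produced
  NON_CANONICAL = list(
    sgp.projection.grade.sequences = "NO_PROJECTIONS",
    sgp.norm.group.preference = 3))

# TRUE where the projection element is absent (NULL)
produces.projections <- sapply(toy.config,
  function(x) is.null(x[["sgp.projection.grade.sequences"]]))

# Keep only the configurations that would produce projections
projection.config <- toy.config[produces.projections]
print(names(projection.config))
```

Testing for NULL rather than for the string "NO_PROJECTIONS" also covers configurations in which the element has simply been commented out.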

As an example of the EOCT configurations, here is an excerpt from the 2015 Physics configuration script:

PHYSICS_2015.config <- list(
	# VIA CHEMISTRY
	PHYSICS.2015 = list(
		sgp.content.areas=c(rep('SCIENCE', 2), 'EARTH_SCIENCE', 'BIOLOGY', 'CHEMISTRY', 'PHYSICS'),
		sgp.panel.years=as.character(2010:2015),
		sgp.grade.sequences=list(c(7:8, 'EOCT', 'EOCT', 'EOCT', 'EOCT')),
		# sgp.projection.grade.sequences=NULL, # Canonical progression
		sgp.norm.group.preference=1),

	PHYSICS.2015 = list(
		sgp.content.areas=c(rep('SCIENCE', 3), 'BIOLOGY', 'CHEMISTRY', 'PHYSICS'),
		sgp.panel.years=as.character(2010:2015),
		sgp.grade.sequences=list(c(6:8, 'EOCT', 'EOCT', 'EOCT')),
		sgp.projection.grade.sequences="NO_PROJECTIONS",
		sgp.norm.group.preference=1),

	# VIA BIOLOGY
	PHYSICS.2015 = list( # NOT ENOUGH STUDENTS - 2015 -  ONLY 'BIOLOGY', 'PHYSICS'
		sgp.content.areas=c(rep('SCIENCE', 3), 'EARTH_SCIENCE', 'BIOLOGY', 'PHYSICS'),
		sgp.panel.years=as.character(2010:2015),
		sgp.grade.sequences=list(c(6:8, 'EOCT', 'EOCT', 'EOCT')),
		sgp.projection.grade.sequences="NO_PROJECTIONS",
		sgp.norm.group.preference=3),

	PHYSICS.2015 = list( # NOT ENOUGH STUDENTS - 2015 -  ONLY 'BIOLOGY', 'PHYSICS'
		sgp.content.areas=c(rep('SCIENCE', 4), 'BIOLOGY', 'PHYSICS'),
		sgp.panel.years=as.character(2010:2015),
		sgp.grade.sequences=list(c(5:8, 'EOCT', 'EOCT')),
		sgp.projection.grade.sequences="NO_PROJECTIONS",
		sgp.norm.group.preference=3)
) # closes the excerpted list
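
Because the content area, year and grade progressions are matched element by element, they should all have the same length within a configuration. A quick sanity check along these lines might look like the sketch below; check_config is a hypothetical helper written for illustration and is not part of the SGP package.

```r
# Hypothetical helper (not an SGP package function): returns TRUE for each
# configuration whose three core progressions have matching lengths.
check_config <- function(config) {
  vapply(config, function(cfg) {
    n.areas  <- length(cfg[["sgp.content.areas"]])
    n.years  <- length(cfg[["sgp.panel.years"]])
    n.grades <- lengths(cfg[["sgp.grade.sequences"]])
    n.areas == n.years && all(n.grades == n.areas)
  }, logical(1))
}

# Toy list: the first entry mirrors a Physics configuration, the second is
# deliberately malformed (six content areas but only five years).
toy.config <- list(
  PHYSICS.2015 = list(
    sgp.content.areas = c(rep("SCIENCE", 2), "EARTH_SCIENCE", "BIOLOGY", "CHEMISTRY", "PHYSICS"),
    sgp.panel.years = as.character(2010:2015),
    sgp.grade.sequences = list(c(7:8, "EOCT", "EOCT", "EOCT", "EOCT")),
    sgp.norm.group.preference = 1),
  BROKEN.2015 = list(
    sgp.content.areas = c(rep("SCIENCE", 4), "BIOLOGY", "PHYSICS"),
    sgp.panel.years = as.character(2011:2015),
    sgp.grade.sequences = list(c(5:8, "EOCT", "EOCT"))))

config.ok <- check_config(toy.config)
print(config.ok)
```

Running a check like this before calling updateSGP can catch an off-by-one year or grade sequence before any analyses are started.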

2. Update the SGPstateData

Norm group preferences

Configurations are R scripts that are sourced as part of the larger SGP analysis discussed later. In addition, the SGPstateData needs to be updated with the norm group preferences embedded within the configurations. To do this, an Rdata object needs to be constructed and then embedded within SGPstateData (either manually or as part of the package build itself). To create the Rdata object with the norm group preferences, source the R script configToSGPNormGroup.R in the SGP_CONFIG folder as follows:

source("configToSGPNormGroup.R")

This creates and saves the Rdata object UT_SGP_Norm_Group_Preference.Rdata containing the norm group preferences (the object is simply a data.frame/data.table recording how the configurations are ranked in terms of preference). It can either be embedded into SGPstateData manually or, preferably, submitted to the SGP Package maintainers for inclusion in the package so that it is contained in SGPstateData when the package is loaded.
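
To illustrate what the preference ranking is for, the toy example below resolves a student who has SGPs from two configurations by keeping the result with the lowest preference number. This is only a base R sketch of the idea, not how combineSGP is implemented, and the column names and values are made up for illustration.

```r
# Toy results: student "A" received SGPs from two different configurations.
# Column names and values are hypothetical.
sgp.results <- data.frame(
  ID = c("A", "A", "B"),
  SGP = c(62, 55, 40),
  PREFERENCE = c(1, 3, 3),
  stringsAsFactors = FALSE)

# Sort by student and preference, then keep the first row per student
# (lower preference number = more preferred).
ordered.results <- sgp.results[order(sgp.results$ID, sgp.results$PREFERENCE), ]
preferred <- ordered.results[!duplicated(ordered.results$ID), ]
print(preferred)
```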

Calculation of knots and boundaries

During the data preparation step, we have a temporary data object that contains the entire 2014 and 2015 SAGE data, ALL_SAGE, which is suitable for calculating the knots and boundaries used to produce cubic basis spline functions incorporated in the production of the SGPs.

###  Create Knots and Boundaries for SAGE Tests

ALL_SAGE[, SCALE_SCORE := as.numeric(SCALE_SCORE)]
sage.kbs <- createKnotsBoundaries(ALL_SAGE[YEAR %in% c(2014, 2015)])

###  Example of mathematics related output to be inserted into SGPstateData:
str(sage.kbs[["MATHEMATICS"]])

The output from the code above, sage.kbs, is a list class object with the knots and boundary values for each grade-level by subject. That information is then manually added into the SGPstateData code-base. Because these values are the SAGE specific values, they are indexed using the subject and ".2014" to identify when the assessment transition occurred. So, for example, there are now two entries for ELA: "ELA.2014" for the SAGE test and "ELA" for the original CRT tests.
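
For readers unfamiliar with how knots and boundaries feed into a cubic B-spline basis, here is a small sketch using simulated scores and the splines package that ships with R. The knot percentiles (20th, 40th, 60th, 80th) and the 10% boundary extension are assumptions made for this illustration; consult the createKnotsBoundaries documentation for the values the SGP package actually uses.

```r
library(splines)  # distributed with base R

set.seed(42)
# Simulated scale scores for one grade and subject (not SAGE data)
scale.scores <- round(rnorm(500, mean = 400, sd = 50))

# Assumed choices for illustration: interior knots at four percentiles,
# boundaries extending 10% beyond the observed score range.
knots <- quantile(scale.scores, probs = c(0.2, 0.4, 0.6, 0.8), names = FALSE)
boundaries <- extendrange(scale.scores, f = 0.1)

# Cubic B-spline basis built from those knots and boundaries
basis <- bs(scale.scores, knots = knots, Boundary.knots = boundaries)
print(dim(basis))  # 500 rows, 7 columns (4 interior knots + degree 3)
```

Fixing the knots and boundaries in SGPstateData, rather than recomputing them for each analysis, keeps the basis the same from one run to the next.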

SAGE Cutscores

SAGE assessment cutscores were also added to the SGPstateData. The cutscore data was provided to the Center for Assessment staff in 2014 and added at that time. This year is the first year in which those values will be used. They are required for the calculation of Student Growth Projections and ISR production. Because the previous assessment program cutscores are not needed for the projection calculations, and the ISRs will only depict the two years of SAGE testing (i.e. no assessment transition depicted), the CRT cutscores have been commented out of the SGPstateData code-base.

Additional meta data

Finally, meta data for the ISR production was added. Mainly this entailed updating the Assessment_Program_Information and Student_Report_Information sections. The entire 2015 Utah entry can be viewed here.

3. Conduct SGP Analyses.

Unlike the 2014 analyses, we use the updateSGP function to A) do the final preparation and addition of the new long data to the existing SGP (S4) data object (prepareSGP step) and B) produce SGPs for both the grade-level and EOCT subjects (analyzeSGP step). Also, unlike in any previous year's analyses, we will produce SGPs for students who were not continuously enrolled for the full academic year (non-FAY students). The non-FAY analyses use coefficient matrices produced using only FAY students, i.e. the students who count for accountability purposes. Therefore, three analyses are run with the data submitted this year by USOE, in strict order: 1) the 2014 non-FAY students, 2) the 2015 FAY students and 3) the 2015 non-FAY students.

2014 Non-FAY Analyses

# Load the 2014 SGP object with new prior data added in the 2015 data preparation step.
load("Data/Utah_SGP.Rdata") 

# Load the 2015 data object
load("Data/Base_Files/Utah_Data_LONG_2015.Rdata") 

#  Source the 2014 config files and create list to supply to sgp.config argument.
#  Note:  the 2014 EOGT config file was written in 2015 for this purpose.

source('SGP_CONFIG/EOGT/UT_EOGT_2014.R')
source('SGP_CONFIG/EOCT/2014/MATHEMATICS.R')
source('SGP_CONFIG/EOCT/2014/SCIENCE.R')

UT.2014.config <- c(
	ELA_2014.config,
	SCIENCE_2014.config,
	MATHEMATICS_2014.config,

	EARTH_SCIENCE_2014.config, 
	BIOLOGY_2014.config, 
	CHEMISTRY_2014.config, 
	PHYSICS_2014.config,

	SEC_MATH_I_2014.config,
	SEC_MATH_II_2014.config,
	SEC_MATH_III_2014.config)

##  Don't enforce max order for 2014.  Set to 5 for 2015
##  This is necessary because the 2014 ELA SGPs contain up to 6 priors.
SGPstateData[["UT"]][["SGP_Configuration"]][["max.order.for.percentile"]] <- NULL

#  Run analyses - add 2014 data through prepareSGP step and 
#  calculate percentiles (no projections for 2014) through analyzeSGP step

Utah_SGP <- updateSGP(
	what_sgp_object=Utah_SGP,
	with_sgp_data_LONG=Utah_Data_LONG_2015[YEAR=='2014'],# Supply ONLY the 2014 data.
	steps=c("prepareSGP", "analyzeSGP"),
	sgp.config=UT.2014.config,
	sgp.percentiles = TRUE,
	sgp.projections = FALSE,
	sgp.projections.lagged = FALSE,
	sgp.percentiles.baseline = FALSE,
	sgp.projections.baseline = FALSE,
	sgp.projections.lagged.baseline = FALSE,
	simulate.sgps=FALSE,
	save.intermediate.results=FALSE,
	goodness.of.fit.print=FALSE,
	overwrite.existing.data=FALSE,
	sgp.use.my.coefficient.matrices=TRUE, 
	outputSGP.output.type="LONG_FINAL_YEAR_Data")

Note that the student growth projections were not produced for 2014. This is due to the transition from the CRT to the SAGE assessment program. Projections are not available without at least one additional year of SAGE data to form single-year progression predictions.

2015 FAY (Accountability Eligible) Student Analyses

#  Source the 2015 config files and create list to supply to sgp.config argument.

source("SGP_CONFIG/EOGT/UT_EOGT_2015.R")
source("SGP_CONFIG/EOCT/2015/SCIENCE.R")
source("SGP_CONFIG/EOCT/2015/MATHEMATICS.R")

UT.config <- c(
	ELA_2015.config, 
	SCIENCE_2015.config, 
	MATHEMATICS_2015.config, 
	
	EARTH_SCIENCE_2015.config, 
	BIOLOGY_2015.config, 
	CHEMISTRY_2015.config, 
	PHYSICS_2015.config,

	SEC_MATH_I_2015.config,
	SEC_MATH_II_2015.config,
	SEC_MATH_III_2015.config)

###  Run analyses - add 2015 data through prepareSGP step and 
###  calculate percentiles and projections through analyzeSGP step

##  Reset max.order.for.percentile to 5 again
SGPstateData[["UT"]][["SGP_Configuration"]][["max.order.for.percentile"]] <- 5

##  Reset return.prior.scale.score.standardized to TRUE.  Gets set to FALSE in above call to updateSGP
SGPstateData[["UT"]][["SGP_Configuration"]][["return.prior.scale.score.standardized"]] <- TRUE

Utah_SGP <- updateSGP(
	what_sgp_object=Utah_SGP,
	with_sgp_data_LONG=Utah_Data_LONG_2015[YEAR=='2015'],# Supply ONLY the 2015 data for analysis.
	steps=c("prepareSGP", "analyzeSGP"),
	sgp.config=UT.config,
	sgp.percentiles = TRUE,
	sgp.projections = TRUE,
	sgp.projections.lagged = TRUE,
	sgp.percentiles.baseline = FALSE,
	sgp.projections.baseline = FALSE,
	sgp.projections.lagged.baseline = FALSE,
	simulate.sgps=FALSE,
	save.intermediate.results=FALSE,
	overwrite.existing.data=FALSE,
	parallel.config=list(
		BACKEND="FOREACH", TYPE="doParallel", WORKERS=12)
)

2015 Non-FAY Analyses

The final analyses were conducted roughly a month after the initial analyses of the 2015 FAY student data. The process is very similar to the 2014 non-FAY analyses. Here we use the same configuration scripts as in the 2015 FAY analyses, and the coefficient matrices from those analyses are used to produce the additional SGPs.

## Load the non-FAY cleaned data
load("Data/Base_Files/Utah_Data_LONG_2015_nonFAY.Rdata")

Utah_SGP <- updateSGP(
	what_sgp_object=Utah_SGP,
	with_sgp_data_LONG=Utah_Data_LONG_2015_nonFAY,
	steps=c("prepareSGP", "analyzeSGP"),
	sgp.config=UT.config,
	sgp.percentiles.baseline = FALSE,
	sgp.projections.baseline = FALSE,
	sgp.projections.lagged.baseline = FALSE,
	simulate.sgps=FALSE,
	save.intermediate.results=FALSE,
	goodness.of.fit.print=FALSE,
	overwrite.existing.data=FALSE,
	sgp.use.my.coefficient.matrices=TRUE,
	outputSGP.output.type="LONG_FINAL_YEAR_Data")

4. Export data, and produce summaries and visualizations from the Utah_SGP object.

We will now combine the results from the three analyses into the longitudinal data in the SGP object (@Data) using the combineSGP function. In this step we also produce the "target scale scores". These values may be used by USOE in the future for accountability purposes. Once the results have been merged in, we can produce numerous summary tables and visualizations. The student-level data are also output as pipe-delimited text files in both longitudinal and wide formats.

# MERGE LONG DATA WITH SGP RESULTS
Utah_SGP <- combineSGP(
	Utah_SGP,
	sgp.percentiles.baseline = FALSE,
	sgp.projections.baseline = FALSE,
	sgp.projections.lagged.baseline = FALSE,
	sgp.target.scale.scores = TRUE,
	sgp.config=UT.config[sapply(UT.config, function(x) is.null(x[["sgp.projection.grade.sequences"]]))],
	parallel.config=list(
		BACKEND="FOREACH", TYPE="doParallel",
			WORKERS=list(SGP_SCALE_SCORE_TARGETS=12)))

# CALCULATE SUMMARY STATISTICS TO BE STORED IN DATA OBJECT

Utah_SGP <- summarizeSGP(
	Utah_SGP,
	parallel.config=list(
		BACKEND="FOREACH", TYPE="doParallel", 
		WORKERS=list(SUMMARY=12)))

save(Utah_SGP, file="Data/Utah_SGP.Rdata")

# EXPORT DATA FROM SGP OBJECT @Data SLOT AS PIPE-DELIMITED FILE IN VARIOUS FORMATS

outputSGP(Utah_SGP,
	output.type = c("LONG_Data", "LONG_FINAL_YEAR_Data"))

ISR Production

The following code cleans up the student (first and last) and school names provided in the data, and produces a demonstration set of student reports. These ISRs use real Utah students' data that are randomly selected, but have their personal information (ID, name, school and district info, etc.) anonymized.

###  Ensure that all names are capitalized consistently (camel case)

###  Convert NAME variables to factor so that we can just work with the factor levels (much quicker!)
Utah_SGP@Data[, LAST_NAME := factor(LAST_NAME)]
Utah_SGP@Data[, FIRST_NAME := factor(FIRST_NAME)]
Utah_SGP@Data[, SCHOOL_NAME := factor(SCHOOL_NAME)]

levels(Utah_SGP@Data$LAST_NAME) <- sapply(levels(Utah_SGP@Data$LAST_NAME), capwords)
levels(Utah_SGP@Data$FIRST_NAME) <- sapply(levels(Utah_SGP@Data$FIRST_NAME), capwords)
tmp.sch.levels <- sapply(levels(Utah_SGP@Data$SCHOOL_NAME), capwords, special.words=c("HS", "MS", "CS", "CBA", "UT", "SUCCESS", "DSU", "SUU", "AMES", "BSTA", "CBTU", "GTI", "NUAMES", "SPED", "UCAS", "YIC"), USE.NAMES=FALSE)

###  Fix some of the Schools.  There might be others (?)
tmp.sch.levels <- gsub("Mckinley", "McKinley", tmp.sch.levels)
tmp.sch.levels <- gsub("Mcmillan", "McMillan", tmp.sch.levels)
tmp.sch.levels <- gsub("Mcpolin", "McPolin", tmp.sch.levels)
tmp.sch.levels <- gsub("Inc ,", "Inc.,", tmp.sch.levels)
tmp.sch.levels <- gsub("Eschool@provo", "eSchool@Provo", tmp.sch.levels)
tmp.sch.levels <- gsub("SUCCESS School", "Success School", tmp.sch.levels)

levels(Utah_SGP@Data$SCHOOL_NAME) <- tmp.sch.levels

###  Reset the class of the NAME variables to character 
Utah_SGP@Data[, LAST_NAME := as.character(LAST_NAME)]
Utah_SGP@Data[, FIRST_NAME := as.character(FIRST_NAME)]
Utah_SGP@Data[, SCHOOL_NAME := as.character(SCHOOL_NAME)]

###  Set SGPstateData to NOT include the front page in the School Catalogs produced
###  This is now included in the UT entry of SGPstateData.  Set to NULL to include it in the catalog if wanted.
# SGPstateData[["UT"]][["Student_Report_Information"]][["Include_Front_Page_in_School_Catalog"]] <- FALSE

###  Produce the "Demo" set of student reports, as well as state level bubblePlots and growthAchievementPlots

visualizeSGP(
	Utah_SGP,
	sgPlot.front.page = "Misc/USOE_Cover.pdf", # Serves as introduction to the report
	sgPlot.header.footer.color="#0B6E8D", # Will need to change to match USOE_Cover.pdf
	sgPlot.demo.report = TRUE,
	sgPlot.year.span = 2,
	sgPlot.plot.test.transition=FALSE,
	gaPlot.content_areas=c("ELA", "MATHEMATICS", "SCIENCE"))
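
The name clean-up above works on factor levels rather than on every row, so each distinct name is recapitalized only once. The following base R sketch shows the same idea on a toy vector; simple_cap is a hypothetical stand-in for capwords and does not handle special words.

```r
# Hypothetical stand-in for capwords: title-cases each space-separated word.
simple_cap <- function(x) {
  words <- strsplit(tolower(x), " ", fixed = TRUE)[[1]]
  paste0(toupper(substring(words, 1, 1)), substring(words, 2), collapse = " ")
}

# Toy data: five values but only three distinct names
last.names <- factor(c("SMITH", "DE LA CRUZ", "SMITH", "SMITH", "JONES"))

# Recode the three levels instead of all five values
levels(last.names) <- sapply(levels(last.names), simple_cap, USE.NAMES = FALSE)

# Convert back to character once the levels have been cleaned
last.names <- as.character(last.names)
print(last.names)
```

With several years of statewide student records the number of distinct names is far smaller than the number of rows, which is where the time saving comes from.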