echo4922/Data-Visualization-Heatmap-with-DGE

Data-Visualization-Heatmap-with-DGE

A heatmap is a type of data visualization in which the magnitude of values is represented by colors; it requires at least two objects to display (Source: Techtarget.com). Because the differential gene expression (DGE) analysis was performed with both the standard approach and the compositional approach, a heatmap is a natural way to compare and contrast the log fold changes of the differentially expressed genes, with each log fold change represented by a color. A total of four R programming libraries were needed for the DGE analysis.

To perform the DGE analysis, 13 genes related to the cGAS-STING signaling pathway were first handpicked from the DESeq2 results. The cGAS-STING pathway has emerged as a potential mediator of inflammation in the context of infection, cell stress, and tissue damage (Source: original paper by Decout et al.). In addition, the pathway may play a role in inflammation in age-related diseases (Source: original paper by Zheng et al.). The DGE analysis in ALDEx2 required the raw counts of the 13 genes. Once ALDEx2 yielded differentially expressed genes with log fold changes, the log fold changes of the 13 genes were extracted from both DESeq2 and ALDEx2 and combined into a new CSV file. In this CSV file, which is used to create the heatmap, the first column lists the names of the 13 genes, the second column lists the base 2 log fold changes from DESeq2, and the third column lists the base 2 log fold changes from ALDEx2.
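The original analysis was done in R; the sketch below uses Python purely for illustration. It shows one way to assemble the three-column CSV described above from two mappings of gene name to base 2 log fold change. The function name `write_lfc_table`, the column headers, and the gene names and values are hypothetical placeholders, not the author's files or results.

```python
import csv
import io

def write_lfc_table(deseq2_lfc, aldex2_lfc, handle):
    """Write a three-column CSV: gene, DESeq2 log2FC, ALDEx2 log2FC.

    deseq2_lfc and aldex2_lfc map gene name -> base 2 log fold change.
    Only genes present in both mappings are written, in DESeq2 order.
    """
    writer = csv.writer(handle)
    # Hypothetical header names; the original CSV's headers are not known.
    writer.writerow(["gene", "DESeq2_log2FC", "ALDEx2_log2FC"])
    for gene, lfc in deseq2_lfc.items():
        if gene in aldex2_lfc:
            writer.writerow([gene, lfc, aldex2_lfc[gene]])

# Dummy values for illustration only; these are not results from this analysis.
deseq2 = {"geneA": 0.25, "geneB": -1.1}
aldex2 = {"geneA": 0.18, "geneB": -0.9}
buf = io.StringIO()
write_lfc_table(deseq2, aldex2, buf)
print(buf.getvalue())
```

Writing only genes present in both tables keeps the rows aligned, so each heatmap row compares the same gene across the two tools.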

The figure below is the heatmap:

Heatmap

The heatmap is based on 13 genes related to the cGAS-STING pathway. Because the DGE analysis was performed with both the standard approach and the compositional data analysis (CoDA) approach, the heatmap also compares the two approaches. The results from the standard “counts” approach (DESeq2) are shown on the left side, and the results from the CoDA approach (ALDEx2) are shown on the right side. The value in each box is the base 2 log fold change of the gene, rounded to two decimal places. The color scale next to the heatmap maps log fold change values to colors: values near zero are red, values around -0.5 are orange, values around -1.5 are lime, and values approaching -2.5 are blue.
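As a rough illustration of how the legend reads, the hypothetical helpers below assign each value the legend color whose anchor (0, -0.5, -1.5, -2.5) is nearest and format the cell label to two decimal places. This is a simplifying assumption: the actual heatmap presumably uses a continuous color gradient, and the four color names are only those described above.

```python
# Anchor points taken from the legend description: log2FC value -> color name.
ANCHORS = [(0.0, "red"), (-0.5, "orange"), (-1.5, "lime"), (-2.5, "blue")]

def nearest_color(lfc):
    """Return the legend color whose anchor is closest to lfc (rounded to 2 dp).

    Ties are resolved in favor of the anchor listed first in ANCHORS.
    """
    value = round(lfc, 2)
    return min(ANCHORS, key=lambda anchor: abs(anchor[0] - value))[1]

def cell_label(lfc):
    """Format a heatmap cell label with two decimal places."""
    return f"{lfc:.2f}"

for v in (0.12, -0.6, -1.4, -3.0):
    print(cell_label(v), nearest_color(v))
```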

The heatmap comparing the DESeq2 and ALDEx2 log fold changes of the 13 genes related to the cGAS-STING pathway shows a marginal increase in expression of the aubergine gene (aub). Because its log fold changes are positive, aub is, by the definition of log fold change, more highly expressed in the treatment condition than in the control condition. Aubergine belongs to the Piwi subfamily of RNA-binding proteins and facilitates the repression of transposable elements during key developmental time points and oogenesis in Drosophila melanogaster. The Diptericin-B (DptB) gene shows the most negative log fold change, that is, the largest decrease in expression; by the same definition, DptB is more highly expressed in the control condition. It is regulated at the transcriptional level by the immune deficiency (IMD) pathway. As a key inflammatory marker (REF), the Diptericin-B gene has been examined extensively in previous studies.
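A simplified sketch of the sign convention used above, assuming log fold change is defined as log2(treatment / control). DESeq2 and ALDEx2 estimate this quantity from fitted models rather than from a plain ratio of means, so the ratio here is only illustrative; the helper names and gene values are hypothetical.

```python
import math

def log2_fold_change(treatment_mean, control_mean):
    """Base 2 log fold change of treatment relative to control."""
    return math.log2(treatment_mean / control_mean)

def direction(lfc):
    """Interpret the sign of a log2 fold change."""
    if lfc > 0:
        return "higher in treatment"
    if lfc < 0:
        return "higher in control"
    return "no change"

def most_decreased(lfcs):
    """Return the gene with the most negative log2 fold change."""
    return min(lfcs, key=lfcs.get)

# Dummy values for illustration only; not results from this analysis.
example = {"geneA": 0.2, "geneB": -2.0, "geneC": -0.4}
print(log2_fold_change(8, 4), direction(example["geneA"]), most_decreased(example))
```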

The overall direction of expression change (increase or decrease) is generally consistent between the standard “counts” approach (DESeq2) and the CoDA approach (ALDEx2), although the log fold change values differ because the two tools model the data differently (DESeq2 models raw counts with a negative binomial distribution, whereas ALDEx2 works with centered log-ratio transformed values). Note that to obtain the same directions, the order of the conditions has to be specified differently in DESeq2 and ALDEx2. For DESeq2, a separate metadata file is needed, in which the columns of the count matrix are specified with the control condition first and the treatment condition second. In ALDEx2, a condition vector is created in R with the treatment condition first and the control condition second. If the control condition is instead specified first in ALDEx2, as in DESeq2, the ALDEx2 log fold changes yield completely different profiles from those of DESeq2. As a quality-control check, the ALDEx2 log fold changes of randomly selected genes were compared with the corresponding DESeq2 log fold changes. The results were consistent: the treatment condition has to be specified first in ALDEx2, and the control condition first in DESeq2, to obtain similar directions in the two tools.
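The condition-order issue can be illustrated with the identity log2(T/C) = -log2(C/T): reversing which condition serves as the reference flips the sign of every log fold change. The hypothetical helper `sign_agreement` below sketches the kind of directionality check described above by computing the fraction of shared genes whose log fold changes have the same sign in two tables. It does not reproduce either tool's internals, and it does not explain why ALDEx2 needed the treatment condition first; that is the author's empirical observation.

```python
import math

def log2_ratio(numerator_mean, denominator_mean):
    """Base 2 log of a ratio of two (hypothetical) group means."""
    return math.log2(numerator_mean / denominator_mean)

def sign(x):
    """Return 1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)

def sign_agreement(lfc_a, lfc_b):
    """Fraction of genes present in both dicts whose log2FC signs match."""
    shared = [gene for gene in lfc_a if gene in lfc_b]
    if not shared:
        return float("nan")
    matches = sum(1 for gene in shared if sign(lfc_a[gene]) == sign(lfc_b[gene]))
    return matches / len(shared)

# Swapping the reference condition flips the sign: log2(6/3) = 1, log2(3/6) = -1.
print(log2_ratio(6, 3), log2_ratio(3, 6))

# Dummy values for illustration only; not results from this analysis.
table_a = {"geneA": 0.4, "geneB": -1.2, "geneC": -0.3}
table_b = {"geneA": 0.1, "geneB": -0.8, "geneC": 0.2}
print(sign_agreement(table_a, table_b))
flipped_b = {gene: -value for gene, value in table_b.items()}
print(sign_agreement(table_a, flipped_b))
```

A high agreement that drops sharply when one table is sign-flipped is the pattern one would expect if the only systematic difference between the tools were the choice of reference condition.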

References:
Heat map definition (TechTarget): https://www.techtarget.com/searchbusinessanalytics/definition/heat-map
Paper by Decout et al.: https://doi.org/10.1038/s41577-021-00524-z
Paper by Zheng et al.: https://doi.org/10.14336/ad.2023.0117
