This was the final project for the Data Science: R (DAT-5301) course at Hult International Business School. It explored three datasets covering password strengths, ranks, categories, and data breaches. The purpose of the project was to analyze why passwords remain a business cybersecurity concern and to share the conclusions drawn from the data.
- Problem recognition:
- What is the Business problem that you can analyze from this dataset? Why is it relevant?
- Review of Previous findings:
- What does your research guide you toward? Are there key insights from your research about the business problem? This is the area where, as a team, you look into business articles (WSJ / Economist / Financial Times) to highlight the business problem that you are trying to explore.
- What is the testable hypothesis / thought process that you established based on your initial research? Your analysis can be predictive or inferential. If your analysis is predictive, there would not be a hypothesis; instead, it would report model performance.
- Variable Selection: Introduce your Data using key attributes. What is the data about?
- Data collection: What are the data sources that you collected?
- Data Analysis: Summarization and Visualization (5-7 charts / analyses)
- What are the key trends and patterns that you find in the data? Each trend/chart should have 3-4 lines about why that trend/chart is important. How does it add value to your Data Analysis project?
- Are there Outliers in your data? What charts/visualization did you use to identify them? How did you handle your Outliers?
- What updates/modifications did you make to your initial hypothesis / thought process after Summarization and Visualization?
- Modelling: (OLS and/or Logistic) to identify relations/connections in the data
- Results presentation:
- Validate your Hypothesis / thought process. What are your inferences / model performance?
- Preparing your R markdown for presentation
- What are your 3 specific insights from the data analysis? Connect your data analysis from Stage 1 and Modelling from Stage 3 to support your findings. You are also expected to use domain knowledge (i.e. research from external sources). Make sure to cite your sources.
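
The outlier check and the two model types named in the outline (OLS and logistic regression) can be sketched in R roughly as follows. This is a minimal illustration, not the project's actual analysis: the `passwords` data frame, its column names (`length`, `strength`, `breached`), its values, and the `flag_outliers` helper are all hypothetical and stand in for the real datasets.

```r
# Hypothetical toy data: column names and values are illustrative only,
# they are NOT taken from the project's datasets.
passwords <- data.frame(
  length   = c(6, 6, 7, 8, 8, 9, 10, 12, 14, 30),
  strength = c(2, 3, 3, 4, 5, 5, 6, 7, 8, 9),
  breached = c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0)
)

# Flag values outside the 1.5 * IQR fences (hypothetical helper).
flag_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}
passwords$length_outlier <- flag_outliers(passwords$length)

# OLS: is password length associated with strength?
ols_fit <- lm(strength ~ length, data = passwords)

# Logistic regression: is strength associated with the odds of
# appearing in a breach (binary outcome)?
logit_fit <- glm(breached ~ strength, data = passwords, family = binomial)

# Inspect coefficients, standard errors, and fit statistics.
summary(ols_fit)
summary(logit_fit)
```

A boxplot of the same column (`boxplot(passwords$length)`) gives a visual counterpart to the flag, and whether a flagged row is dropped, capped, or kept is a judgment call that should be stated alongside the model results.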