Study Materials for Data Analytics & Visualization Course offered for Semester 6, AI & DS Students

LifnaJos/Data-Analytics-Visualization-ADC601

Faculty Incharge : Lifna C S

| No | Rubrics | Marks | Document / Schedule |
|----|---------|-------|---------------------|
| 1 | End Semester Exam | 60 Marks | |
| 2 | Internal Assessment | 20 Marks | Mid-Term Paper, Mid-Term Paper Solution |
| 3 | Continuous Assessment | 20 Marks | |
| | a. MCQ-1 | 5 Marks | 2nd week of Feb 2024 |
| | b. MCQ-2 | 5 Marks | 4th week of Mar 2024 |
| | c. Mini-Project | 10 Marks | 5th week of Mar 2024 |
| | d. Content Beyond Syllabus Presentation | 10 Marks | 4th & 5th week of March 2024 |
| | Total Marks | 100 Marks | |

  • Data Analytics Lifecycle overview: Key Roles for a Successful Analytics Project, Background and Overview of the Data Analytics Lifecycle.
  • Phase 1: Discovery: Learning the Business Domain, Resources, Framing the Problem, Identifying Key Stakeholders, Interviewing the Analytics Sponsor, Developing Initial Hypotheses, Identifying Potential Data Sources.
  • Phase 2: Data Preparation: Preparing the Analytic Sandbox, Performing ETLT, Learning About the Data, Data Conditioning, Survey and Visualize, Common Tools for the Data Preparation Phase.
  • Phase 3: Model Planning: Data Exploration and Variable Selection, Model Selection, Common Tools for the Model Planning Phase.
  • Phase 4: Model Building: Common Tools for the Model Building Phase
  • Phase 5: Communicate Results
  • Phase 6: Operationalize

Sample Case Studies

  1. Imagine you are a data analyst working for a retail company that wants to optimize its marketing strategy for a new product launch. Utilizing the data analytics lifecycle, outline the key steps you would take from discovery to operationalization to ensure the success of this project. Be sure to discuss the specific activities and tools you would use in each phase, as well as how you would engage with stakeholders throughout the process to ensure alignment with business goals and objectives. Additionally, identify any potential challenges you might encounter at each stage and how you would mitigate them to keep the project on track.
  2. ABC Retail, a leading player in the fashion industry, aimed to enhance its sales forecasting accuracy to better manage inventory and meet customer demands efficiently. To achieve this, they embarked on a data analytics project. This case study illustrates how ABC Retail applied the data analytics lifecycle to achieve their objectives.
  3. Retail Store Optimization: How can a retail chain leverage the data analytics lifecycle to optimize its store layout, inventory management, and pricing strategies to maximize sales and enhance customer satisfaction?
  4. Healthcare Resource Allocation: In what ways can a healthcare provider utilize the data analytics lifecycle to analyze patient demographics, medical histories, and treatment outcomes to efficiently allocate resources and improve patient care while minimizing costs?
  5. Fraud Detection in Financial Transactions: How can a financial institution employ the data analytics lifecycle to detect and prevent fraudulent activities in real-time transactions, considering factors such as transaction patterns, account behavior, and historical fraud incidents?
  6. Energy Consumption Optimization: How might a utility company apply the data analytics lifecycle to analyze energy consumption patterns, identify areas of inefficiency, and develop targeted strategies to reduce energy waste and carbon footprint while maintaining service reliability?
  7. Predictive Maintenance in Manufacturing: In what ways can a manufacturing company utilize the data analytics lifecycle to predict equipment failures, optimize maintenance schedules, and minimize downtime, thereby improving operational efficiency and reducing maintenance costs?
  • Introduction to Regression
  • Types of Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, Interaction Regression, Weighted Least Squares Regression, Ridge Regression, LOESS Regression, Bootstrapping Regression.
  • Qualitative predictor variables, Model Evaluation Measures, Model selection procedures, Leverage, Influence measures, Diagnostics.
  • Logistic Regression: Logistic Response function and logit, Predicted values from Logistic Regression, Interpreting the coefficients and odds ratios
  • Generalized Linear model
  • Logistic Regression vs GLM
  • Linear Regression vs Logistic Regression
  • Assessing the models.
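
As an illustration of the Simple Linear Regression and Model Evaluation Measures topics above, here is a minimal sketch of an ordinary least squares fit in plain Python. The function names and the toy data are assumptions made for this example and are not taken from the course notebooks; a real analysis would typically use a statistics library instead.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = S_xy / S_xx, the usual closed-form least squares estimate.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_linear_regression(x, y)
r2 = r_squared(x, y, b0, b1)
print(f"intercept={b0:.2f}, slope={b1:.2f}, R^2={r2:.2f}")
```

Because the toy data has no noise, the fit recovers the line exactly and R² equals 1; with real data the residuals would be non-zero, and the diagnostics listed above (leverage, influence measures) become relevant.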

Colab Notebooks on Regression

  • Definition of time series, Time series forecasting, Time series components, Decomposition: additive and multiplicative, Exponential smoothing, Holt-Winters method.
  • Time Series Analysis: Box-Jenkins Methodology, ARIMA Model, Autocorrelation Function (ACF, PACF), Autoregressive Models, Moving Average Models, ARMA and ARIMA Models, Building and Evaluating an ARIMA Model.
  • Acquiring and Visualizing Data, Simultaneous acquisition and visualization, Applications of Data Visualization, Key factors of Data Visualization, Exploring the Visual Data Spectrum: Charting Primitives (Data Points, Line Charts, Bar Charts, Pie Charts, Area Charts), Exploring Advanced Visualizations (Candlestick Charts, Bubble Charts, Surface Charts, Map Charts), Narrative visualization and digital storytelling, Infographics and interactive dashboards.
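
To make the exponential smoothing topic from the time series list above concrete, the following is a minimal sketch of simple (single) exponential smoothing in plain Python. The function name, the `alpha` value, and the `demand` series are assumptions chosen for illustration. Initialising the level with the first observation is one common convention, not the only one, and the Holt-Winters method extends this idea with trend and seasonal terms that are not shown here.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    level_t = alpha * y_t + (1 - alpha) * level_{t-1}
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    if not series:
        raise ValueError("series must contain at least one observation")

    # Assumption: start the level at the first observation.
    levels = [series[0]]
    for observation in series[1:]:
        levels.append(alpha * observation + (1 - alpha) * levels[-1])
    return levels


# Hypothetical demand figures used only for this example.
demand = [10.0, 12.0, 14.0, 16.0]
levels = simple_exponential_smoothing(demand, alpha=0.5)

# With simple exponential smoothing, the one-step-ahead forecast
# is the most recent smoothed level.
forecast = levels[-1]
print(levels, forecast)
```

A larger `alpha` makes the level react faster to recent observations, while a smaller `alpha` gives a smoother, slower-moving series.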

Module - 5 : Introduction to D3.js

  • Getting set up with D3, Making selections, Changing a selection's attributes, Loading and filtering external data.
  • Building a graphic that uses all of the population distribution data, Data formats you can use with D3.
  • Creating a server to upload your data, D3's functions for loading data, Dealing with asynchronous requests.
  • Loading and formatting Large Data Sets
  • Online Tutorials:
  • Google Colab Notebooks were prepared by each student based on the above tutorials.

Module - 6 : Data analytics and Visualization with Python

  • Essential Data Libraries for data analytics: Pandas, NumPy, SciPy.
  • Plotting and visualization with Python: Introduction to Matplotlib, Basic Plotting with Matplotlib, Create Histogram, Bar Chart, Pie Chart, Box Plot, Violin Plot using Matplotlib, Matrix charts and heat maps.
  • Introduction to the Seaborn library, Multiple Plots, Regression plot, replot. Discover and visualize the data to gain insights, Feature scaling and Transformation pipelines.
  • Google Colab prepared by D11AD Students as per the Syllabus and as a part of Content Beyond Syllabus
| No | Package Name | Contributor | Document |
|----|--------------|-------------|----------|
| 1 | Pandas | Shreeprasad Navare | Colab Notebook |
| 2 | SciPy | Suhanee Kandalkar | Colab Notebook |
| 3 | NumPy | Alok Kale | Colab Notebook |
| 4 | Matplotlib | Mrunal Shinde | Colab Notebook |
| 5 | Seaborn | Kapil Bodas | Colab Notebook |
| 6 | ggplot | Chaitali Gaikwad | Colab Notebook |
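
The "Feature scaling and Transformation pipelines" topic in this module can be sketched without any third-party package, which keeps the arithmetic visible. The example below is an illustration under stated assumptions: the function names, the `run_pipeline` helper, and the `ages` data are hypothetical and are not taken from the Colab notebooks listed above, which cover the actual libraries.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot min-max scale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]


def run_pipeline(values, steps):
    """Apply each transformation in order, feeding the output to the next step."""
    for step in steps:
        values = step(values)
    return values


# Hypothetical feature column used only for this example.
ages = [20.0, 30.0, 40.0]

scaled = min_max_scale(ages)
z_scores = run_pipeline(ages, [min_max_scale, standardize])
print(scaled)
print(z_scores)
```

Chaining the steps through a single helper mirrors the idea of a transformation pipeline: each step has the same list-in, list-out shape, so steps can be reordered or swapped without changing the calling code.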

Online Resources

  1. Stat 501 : Regression Methods by Penn State University, Online Course Notes
  2. Regression Analysis
  3. Applied Linear Regression - Sanford Weisberg
  4. Regression Analysis by Example
  5. Regression Analysis by Example - Textbook Datasets
  6. Applied Regression Analysis with SAS - McMaster University, Canada
  7. Applied Regression Analysis using R - University of Chicago

Text Books & References :

  1. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, EMC Education Services, Wiley Publications.
  2. Data Analytics using Python, Bharati Motwani, Wiley Publications.
  3. Forecasting: Methods and Applications, Spyros G. Makridakis, Steven C. Wheelwright, Rob J. Hyndman, 3rd edition, Wiley Publications.
  4. Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications, 1st edition, Gary Miner, Thomas Hill.
  5. Ritchie S. King, 'Visual Storytelling with D3', Pearson.
  6. Data Mining: Concepts and Techniques, 3rd edition, Jiawei Han, Micheline Kamber and Jian Pei.
  7. Python for Data Analysis, 3rd edition, Wes McKinney, O'Reilly Media, Inc.
  8. Ben Fry, 'Visualizing Data: Exploring and Explaining Data with the Processing Environment', O'Reilly, 2008.
  9. Robert Nisbet, John Elder, and Gary D. Miner, Handbook of Statistical Analysis and Data Mining Applications, Academic Press, 2009.
  10. Andy Kirk, 'Visualising Data: A Handbook for Data Driven Design', 2nd edition, Sage.

Acknowledgements

  • This material was prepared as part of the course Data Analytics and Visualization, offered by the University of Mumbai to Third Year students of the Artificial Intelligence & Data Science Engineering branch.
