MGS 8040: Data Mining
Syllabus for Fall 2014
Instructor: Dr. Satish Nargundkar
Office: 827 College of Business
Office Hours: By appointment
Phone: (678) 644 6838
E-Mail: snargundkar@gmail.com
Website: www.nargund.com/gsu
CRN: 83254, Buckhead Center, Room 406
Thursday 4:30 – 7:00 PM
Prerequisites: MBA 7025 or equivalent or permission of instructor. You must already have knowledge of basic statistics, including Regression Analysis, to succeed in this course.
Text:
- Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, 3rd Edition, by Gordon Linoff and Michael Berry, Wiley. ISBN-10: 0470650931; ISBN-13: 978-0470650936.
The following optional books/sites may also be helpful.
- Making Sense of Data II by Glenn Myatt & Wayne Johnson, John Wiley & Sons, 2009.
- Multivariate Data Analysis by Hair, Anderson, Tatham, & Black, Prentice Hall.
- http://statsoft.com/textbook/stathome.html.
- The Little SAS Book by Delwiche and Slaughter.
Course Catalog Description
This course covers various analytical techniques to extract managerial information from large data warehouses. A number of well-defined data mining tasks such as classification, estimation, prediction, affinity grouping and clustering, and data visualization are discussed. Design and implementation issues for corporate data warehousing are also addressed.
Detailed Course Description
Data mining supports decision making by detecting patterns, devising rules, identifying new decision alternatives and making predictions. This course is organized around a number of well-defined data mining tasks: description, classification, estimation, prediction, and affinity grouping and clustering. Students will learn to use techniques such as Rule Induction (classification trees), Logistic Regression, Discriminant Analysis, and Neural Networks. Data visualization techniques will be used whenever possible to reveal patterns and relationships. Students will use commercially available software tools to mine large databases. Team-based projects will be conducted.
The course is organized into 3 broad areas as follows:
- Context: Decision Support for Strategic/Tactical Decision-making. Data/Information Organization. Data Warehouse Design.
- Exploratory Analysis: Segmentation Techniques
- Forecasting/Segmentation: Modeling Techniques, Transforming analysis into actions
Learning Outcomes/Course Objectives
Upon completion of the course, students will be able to:
- Explain in their own words a general framework for decision support within organizations.
- Discuss the sources of data, problems with data, and how to overcome them (Data Cleaning).
- Understand business requirements, organization structure, and how data mining projects may fit into a client’s organization to meet its decision support needs.
- Explain the data mining methodology and use it to analyze a dataset.
- Use visual techniques to describe data.
- Explain the assumptions of various techniques such as Cluster Analysis, Multiple Regression, Discriminant Analysis, Logistic Regression, and Artificial Neural Networks.
- Build multiple regression, discriminant analysis, and logistic regression models for forecasting.
- Validate models using the Kolmogorov-Smirnov (K-S) test.
- Compare and contrast Neural Networks with statistical techniques.
- Interpret classification trees.
- Use interaction detection methods such as CART and CHAID for classification.
- Segment data using Cluster Analysis and interpret the output.
- Identify underlying factors using Factor Analysis and interpret the results.
- Discuss issues of implementation of the results of various techniques.
- Develop methods to monitor the ongoing performance of implemented models.
Methods of Instruction:
The course will combine lectures and discussion, plus guest lectures from industry experts. The team-based project will be emphasized, and case studies will be discussed.
Grading:
Component | Weight | Course Average | Grade | Course Average | Grade
Assignments | 20% | 94-96, 97+ | A, A+ | 77-79 | C+
Tests (2) | 50% | 90-93 | A- | 73-76 | C
Team Project | 20% | 87-89 | B+ | 70-72 | C-
Final Exam | 10% | 83-86 | B | 60-69 | D
 | | 80-82 | B- | Less than 60 | F
Late work will get partial credit only, with 10% less for each day of delay.
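As a rough illustration of how the weights, grade bands, and late-work rule above combine, here is a minimal Python sketch. It is not an official grade calculator. Two points are assumptions, because the syllabus does not specify them: the course average is rounded half-up to a whole number before the letter grade is looked up, and each late day removes 10% of the originally earned score. The function names and the sample scores are hypothetical.

```python
import math

# Component weights copied from the grading table.
WEIGHTS = {"assignments": 0.20, "tests": 0.50, "team_project": 0.20, "final_exam": 0.10}

# (minimum course average, letter grade), highest band first, from the grading table.
CUTOFFS = [(97, "A+"), (94, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
           (77, "C+"), (73, "C"), (70, "C-"), (60, "D")]


def late_score(score, days_late):
    """Assumption: each late day removes 10% of the earned score, never going below 0."""
    return max(0.0, score * (1 - 0.10 * days_late))


def course_average(scores):
    """Weighted average of the four component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average):
    """Assumption: round half-up to a whole number, then look up the band."""
    rounded = math.floor(average + 0.5)
    for minimum, grade in CUTOFFS:
        if rounded >= minimum:
            return grade
    return "F"


# Hypothetical component scores for one student.
scores = {"assignments": 90, "tests": 84, "team_project": 90, "final_exam": 80}
average = course_average(scores)
print(average, letter_grade(average))
```

With these sample scores the weighted average falls in the 83-86 band. Whether fractional averages are actually rounded, and how the late penalty is applied, is the instructor's decision.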
Software: Students are encouraged to do project work in SAS or R in order to develop a marketable skill. You may choose other software (SPSS is available at GSU) if you wish. SAS will be discussed in class.
General Policies:
- Students are expected to attend each class (who knows, you may actually enjoy the class!), arrive on time and participate in class discussions.
- Turn off cell phones, pagers, stereos, TVs, etc. when in class. Treat the instructor and each other with courtesy.
Course Assessment:
Your constructive assessment of this course plays an indispensable role in shaping education at Georgia State. Upon completing the course, please take the time to fill out the online course evaluation.
MGS 8040 Data Mining Tentative Schedule – Fall 2014
Date | Topic | Readings | Assignments
Overview / Understanding Data
Week 1: 8/28 | Introduction – DM Overview | Notes
Week 2: 9/4 | Regression Review; Understanding Credit Data – Equifax / Experian / Trans Union | Notes – Simple Regression; Notes – Multiple Regression; Exercise | Review Regression Analysis Notes
Week 3: 9/11 | The Initial Client Meeting | Notes – Initial Client Meeting; Hair Chapter 2; Sample Design Exercise; Solution to Exercise; Data Cleaning | 1. Application – Dep. Var, Outcome, Sample time frame
Week 4: 9/18 | Introduction to SAS; SAS Training at UCLA | Notes – Basic SAS Analysis; The Little SAS Book by Delwiche & Slaughter; Data1 subset in Excel | 2. SAS assignment; Folder Instructions
Week 5: 9/25 | Guest Lecture: State of the art of Analytics and Big Data. Bill Franks, Chief Analytics Officer, Teradata Corp.
Week 6: 10/2 | Data Cleaning; Dummy Variable Definition; Class Handout | Data Warehouse introduction; Books by Edward Tufte; Gallery of Data Visualization; WHO visualization | 3. Crosstabs, Dummy decisions
Week 7: 10/9 | Test 1
Modeling/Validation/Forecasting
Week 8: 10/16 | Discriminant Analysis; Validation – KS Test; SAS Programs for Reg/Scoring | Hair, Chapter 4; www.statsoft.com | 4. Discrim, KS; SAS Programs for Regression/Scoring
Week 9: 10/23 | Guest Lecture: Logistic Regression and Classification Trees. Gregg Weldon, Chief Analytics Officer, Analytics IQ Inc. Intro to Logistic Regression, Logistic Regression, Classification Trees
Week 10: 10/30 | Effectiveness of models – A review of methods; Neural Networks; Excel file (demo of logic) | Research Paper on Model Effectiveness; www.statsoft.com
Segmentation
Week 11: 11/6 | Segmentation; Cluster Analysis; SPSS Output (Cluster); Memory Based Reasoning | Hair, Cluster Analysis; www.statsoft.com; Factor Analysis; Clustering Paper | Project Progress Report (informal, oral)
Week 12: 11/13 | Test 2
Week 13: 11/20 | Project Presentations
11/27 | Thanksgiving Break
Week 14: 12/4 | Monitoring Reports Review | Project Reports Due [Guidelines]; Sample Final Project
Week 15: 12/11 | Final Exam – Comprehensive – 4:15 – 6:45 PM.
Appendix A
MSA Program Goals and Objectives
Goals:
Students completing the MS in Analytics will:
G1 Understand organizational problems in general and associated analytical problems in particular.
G2 Be proficient in the management of data needed for decision-making.
G3 Be proficient with the methodological skills needed for data-driven decision-making.
G4 Understand the implementation issues that accompany analytical problem solving.
G5 Be able to demonstrate the positive impact of analytics on organizations.
Objectives/Learning Outcomes (LO): After finishing the program students are expected to have mastered the knowledge and skills to carry out the following analytical tasks:
LO1 Frame Business Problems (G1) MSA students will properly frame a business problem.
LO2 Frame Analytical Problems (G1) MSA students will demonstrate the ability to properly solve analytical problems.
LO3 Data Management (G2) MSA students will effectively acquire, clean, and manage both structured and unstructured data.
LO4 Methodology (G3) MSA students will identify and apply the appropriate methodology for the business and analytical problem(s) identified.
LO5 Modeling (G3, G4) MSA students will build and deploy analytical models across organizations that fit the underlying organizational needs and the analytical problem(s) identified.
LO6 Programming (G4) MSA students will solve analytical problems by utilizing computer programming, both by employing available tools where possible and by developing customized solutions where necessary.
LO7 Life Cycle Management (G3, G4) MSA students will develop adaptable models that allow for continued organizational improvement of productivity and quality.
LO8 Organizational Impact (G5) MSA students will effectively communicate the positive, strategic impact of a model on the firm to which it is being applied.