University of Illinois at Chicago
College of Business Administration
Department of Information & Decision Sciences
IDS 472 Statistics for Information Systems and Data Mining
Semester  Spring, 2002 (Term #012)

HOMEPAGE
Time M W 2:00 - 3:15
Place DH 118, except in L270 EPASW (CBA PC Lab) on Mondays, 14-Jan / 22-April
Textbooks  Required:  Berry & Linoff, Data Mining Techniques (DMT)
Recommended: Berry & Linoff, Mastering Data Mining (MDM)

Staff

Instructor  Stanley L. Sclove, Professor, Information and Decision Sciences
Office 2418 UH
Telephone Office (312) 996-2681      Dept (312) 996-2676
E-mail slsclove@uic.edu
Web page URL  http://www.uic.edu/~slsclove
Office hours  14-Jan / 29-Apr, M 3:30-5:30, in the CBA PC Lab, L270 EPASW (if not there, in Room 2418 UH)
TA  Senlin Wu, PhD Student in MIS
E-mail swu6@uic.edu
Office hours  M W 12:15 - 1:45, in the CBA PC Lab (Room L270 EPASW)

Organization and Administration

Syllabus
Attendance
Course Calendar
Bibliography (FYI only: no readings are assigned except from the textbooks)
Errata to DMT
Suggestions for IDS Majors


Tasks in the Course

Three homework assignments, two exams, and a final project.
 
Homework A   Nearest Neighbor Classification   Due M, 4-Feb-2002
Homework B   Classification   Due M, 11-Feb-2002
Exam #1      Review Qs for Exam #1; Solutions to same   W, 20-Feb-2002
Homework C   Classification Trees   Due W, 3-April-2002
Exam #2      Review Qs for Exam #2; Solutions to same   W, 17-April-2002
Project      Team Project for the Term; Peer Evaluation Form   Due by 12:30 pm on W, 1-May-2002

Regarding the exams: some exam questions may come from the lectures and may not be covered in the Review Qs. So come to class, take notes, and study your notes between classes.



Labs
Lab #1a   Mon., 14-Jan.:   Distances
Lab #1b   Mon., 28-Jan.:   Nearest Neighbors
Lab #2    Mon.,  4-Feb.:   Discriminant Analysis
Lab #3    Mon., 11-Feb.:   Discriminant Analysis - Quadratic
Lab #4    Mon., 18-Feb.:   Logistic Regression
Lab #5    Mon., 25-Feb.:   k-Means Clustering
Lab #6    Mon.,  4-Mar.:   k-Means Clustering with Preprocessing
Lab #7    Mon., 11-Mar.:   Hierarchical Clustering
Lab #8    Mon., 25-Mar.:   Classification Trees
Lab #9    Mon.,  1-Apr.:   Path Analysis
Lab #10   Mon.,  8-Apr.:   Stratified Sampling (Instructions; Spreadsheet)
Lab #11   Mon., 15-Apr.:   Correspondence Analysis


DATA
Buy, Gender, Age, Income
Alpha radial tires data
Assets/Sales/Profits workbook
Credit Data
Iris Data
Heart Data
Correspondence Analysis of Smoking Data
Background Notes
Review of Business Statistics I-II; Solutions
Notes on Vectors and Matrices


Notes on some Chapters and Other Topics

Memory-Based Reasoning
Chapter 9    Mon., 7-Jan./Wed., 23-Jan.
                   Notes on Ch. 9
                    Video: ABC "Nightline" show on MovieLens -- Helping You Find the Right Movies
Classification
                   Classification   W, 30-Jan-02
                   Discriminant Analysis   W, 6-Feb-02
                   Logistic Regression   W, 13-Feb-02
                   Credit Scoring   W, 13-Feb-02
Chapter 1      Why Data Mining?
                      An Introductory Note
Chapter 4      What Can Data Mining Do?
                      Notes for Ch. 4
Chapter 5      Data Mining Methodology
            Outlier Detection:  An Example
Chapter 7      Overview of Data Mining Techniques
                      Another type of technique:  Path Analysis
Exam #1     Wed., 20-Feb-2002
Review Questions for Exam #1; Solutions to same
Exam 1 (posted 21-Feb); Solutions to same (posted 21-Feb)
Chapter 10   Automatic Cluster Detection     M, 25-Feb / W, 6-March
Notes on Ch. 10
Notes on "The Clustering of America"
           to go with NOVA's "We Know Where You Live"
                      video on geodemographic databases and direct-mail marketing
Example: "You Are What You Eat," by David J. Lipke, American Demographics, 10/1/2000: Marriott cooks up changes in campus dining with an inventive segmentation program.
Video: Against All Odds: Inside Statistics, Program 10: "Multidimensional Data"
Homepage for Sclove's cluster analysis package.
Chapter 12    Decision Trees    W, 13-March / W, 27-March
Notes on Ch. 12
Notes on CART
Tree for iris data
Classification regions from tree for iris data
Expected loss calculation from tree for iris data
Paper on employee retention
Chapter 13      Artificial Neural Networks   W, 3-April / M, 8-April
Outline of lecture
Notes

Exam #2    Wed., 17-April-2002    Review Qs for Exam #2; Solutions to Review Qs for Exam #2
           Posted 18-April: Exam 2A; solutions to same. Exam 2B; solutions to same.

Chapter 14     Genetic Algorithms    W, 10-April / W, 24-April
                        Reading:  pp. 335-348; you may omit pp. 349-359.
                        Notes 
Project          Due by 12:30 p.m., Wed., 1-May-2002      Writing Tips

Links
Stock returns research links
Minitab Web Resources Site
Prof. Sclove's homepage


"What goes around, comes around." 1.6 billion movie tickets were sold in the U.S. and Canada in 1999. The last year that many tickets were sold was 1959. (TIME, 29-Nov-1999)
Copyright © 2002 Stanley Louis Sclove
latest update  18 April 2002