Data Mining: MIS 382N.9

Professor Maytal Saar-Tsechansky

 

Tuesday and Thursday, 3:30pm-5:00pm (UTC 1.146)

Course Overview

Data Mining is an applied course introducing popular data mining methods for extracting intelligence from business data.

Organizations today are applying data mining methods to identify and appeal to higher-value customers, customize their product offerings, and minimize losses due to erroneous decision-making and fraud. The role of data-driven intelligence is becoming increasingly critical, and a rigorous understanding of the methods and their application can provide managers and IS professionals with important tools to improve decision-making. Topics and related methods discussed in the class include personalization, customer relationship management, intelligent marketing, risk management, web mining, and operations. We will discuss the inner workings of the methods to the level necessary to develop an understanding of when and how to use each technique. Students will also acquire hands-on experience working in teams and using state-of-the-art software to develop data mining solutions to business problems.

Reading Materials and Resources

Textbook (available at the bookstore)
Data Mining Techniques, Second Edition, by Michael Berry and Gordon Linoff. Wiley, 2004. ISBN: 0-471-47064-3

Some reading materials are available in PostScript or Adobe Acrobat (PDF) format. Ghostscript is one PostScript viewer. To read .pdf files you need Adobe's Acrobat Reader.

Articles featuring data mining:

Additional readings are available for download from the course syllabus table (see below) or will be otherwise distributed in class.

WEKA Software and Documentation

      WEKA is an open-source machine learning software package that we will use in class

      The WEKA web site (includes software download & documentation)

 

Office Hours

By appointment, CBA, Room 5.238

 

Teaching Assistant: Michelle Hsuan-Wei Chen (MICHELLEHWCHEN@MAIL.UTEXAS.EDU)

 

Course Requirements and Grading

 

Style

This is a lecture-style course; however, student participation is important. Students are required to read the assigned material before class, to attend all sessions, and to discuss any absence from class with the instructor.

 

Assignments and Projects

You will hand in weekly individual write-ups. Answers should be well thought out and concise.

Assignments must be submitted by the due date.

 

Late assignments

Turn in your assignment early if there is any uncertainty about your ability to turn it in on the due date. Assignments up to one week late will have their grade reduced by 50%. After one week, late assignments will receive no credit.

 

Projects

There will be a team project (maximum two students per team). Students will address business problems with data mining techniques. Students will hand in a brief report (accounts for 80% of project grade) and prepare a short class presentation of their work (20% of project grade). A class discussion will follow the presentations. There will be no final exam.

 

Grade breakdown:

1. Involvement: 10%
2. Assignments: 30%
3. Case study: 20%
4. Team project: 40%

Tentative Course Schedule

 

Date / Topic / Readings / Assignments due

January 17
Topic: Introduction to the course. Introduction to data mining.
Readings: Chapters 1 & 2

January 19
Topic: Introduction (Contd.)
Readings: Chapters 1 & 2

January 24
Topic: Fundamental concepts and definitions

January 26
Topic: Classification: Recursive partitioning & Decision Trees
Readings: Ch. 2 pp. 39-42 (revisit); Ch. 6 pp. 165-194, 209
Assignments due: Question set #1

January 31
Topic: Classification: Recursive partitioning & Decision Trees (Contd.)
Assignments due: Question set #2

February 2
Topic: Classification: Recursive partitioning & Decision Trees (Contd.)

February 7
Topic: Model Evaluation

February 9
Topic: Association Rules and Sequential Patterns. Personalization: K-Nearest Neighbor Classification Algorithm
Readings: pp. 287-315; Chapter 8: pp. 257-271
Assignments due: Question set: Model Evaluation

February 14
Topic: Lab session: WEKA
Hands-on #1: Installing and running WEKA

February 16
Topic: Personalization: Collaborative filtering

February 21
Topic: Clustering Analysis
Readings: Chapter 11: pp. 349-365
Hands-on exercise with WEKA #2
Assignments due: Question set: collaborative filtering

February 23
Topic: Clustering Analysis. WEKA lab session

   
February 28
Topic: WEKA lab session. Bayesian learning with applications to spam filtering
Readings: Chapter 8: pp. 257-271. Download supplementary reading from Blackboard
Assignments due: Team project proposal

March 2
Guest Speaker: Dr. David Moriarty

David Moriarty is the Director of Data Mining at Apple Computer, where he leads a group of scientists developing analytic solutions to large-scale business problems. Specifically, Dr. Moriarty leverages data patterns to optimize strategic decisions in various business areas, including fraud detection, product quality, logistics, and sales. Dr. Moriarty received an M.S. and a Ph.D. in computer science from the University of Texas at Austin, specializing in artificial intelligence and machine learning. He regularly serves on journal and conference review committees and is a founding member of the Merchant Risk Council. Before Apple Computer, David designed intelligent algorithms at the Naval Research Laboratory, Daimler-Chrysler Research Center, USC Information Sciences Institute, and Intelligent Technologies Corporation.

Assignments due: Question set: clustering

Week of March 6th
Topic: Team work on class project (in class)

Week of March 13th
Spring Break

March 21
Topic: Bayesian learning with applications to spam filtering
Readings: Chapter 8: pp. 257-271

March 23
Guest Speaker: Dr. Pramod Singh of HP

Dr. Singh manages the Global Analytics Solutions group in the Information Technology organization at Hewlett-Packard. He leads a team of data miners and solution architects and is responsible for the development and deployment of analytics solutions for HP. Dr. Singh has been with HP for over 5 years. Prior to joining HP, Dr. Singh worked for Wal-Mart’s Information Systems Division in several areas, using data mining to support assortment planning, customer segmentation, and market basket analysis.

Dr. Singh received Ph.D. and M.S. degrees in Mathematics from The University of Arkansas and an MBA from The University of Jammu. Dr. Singh is the author of research papers and patents and has presented his work at various conferences.

March 28
Topic: Team presentation of Harrah's Case. Class discussion
Assignments due: By Sunday, 3/26/2006, submit executive reports of Harrah's case. Prepare questions for other teams.

March 30
Topic: Team presentation of Harrah's Case (Contd.). Genetic Algorithms

April 4
Topic: Genetic Algorithms (Contd.)
Time permitting: lab session in class for team work on term project
Assignments due: Question set: Spam filtering

April 6
Guest Speaker: Dr. Ahmet Kuyumcu, Zilliant

Dr. H. Ahmet Kuyumcu is an independent pricing consultant and has over 10 years of practical experience in delivering data-driven, technology-based pricing solutions across a variety of Fortune 500 firms. He is currently engaged with a major gaming company to combine their pricing, revenue management, and promotion management activities. Dr. Kuyumcu was the chief pricing scientist at Zilliant, Inc. and led the science aspects of the product development efforts. Prior to Zilliant, he pioneered innovative price optimization algorithms for the media, travel-transportation, and multi-family housing industries at Manugistics, Inc. Dr. Kuyumcu frequently speaks at conferences on the practice of pricing and revenue management and has published several articles in academic journals. He also teaches a graduate-level class in pricing and revenue management at the University of Texas at Austin. Dr. Kuyumcu is a board member of the Revenue Management and Pricing Section of INFORMS. He has M.S. and Ph.D. degrees in Operations Research from Texas A&M University.

Talk: Pricing and Revenue Management and its Hotels and Gaming Resorts

Pricing and Revenue Management (PRM) uses historical data and mathematical models to predict customers’ behavior at a micro-market level and to optimize product availability and/or price to maximize revenue and profit. PRM was first applied to the airline industry shortly after the U.S. Congress passed the Airline Deregulation Act of 1978. Since deregulation, PRM has generated billions of additional dollars for many industries including, but not limited to, airlines, hotels, car rental firms, cruise lines, media companies, utility firms, apartments, railroads, wholesalers, and manufacturers.

Although PRM concepts are similar across different industries, their application varies significantly. This talk gives a brief overview of pricing and revenue management problems and provides a real-world application for the hotel and gaming industries.

Readings: Kuyumcu, H. A. (2002) Gaming Twist in Hotel Revenue Management. Journal of Revenue and Pricing Management, 1, 161-168

April 11
Topic: Artificial Neural Networks
Assignments due: Question set: Genetic Algorithms

April 13
Topic: Neural Networks

April 18
Topic: Ensemble models. Related Technologies

April 20
Guest Speaker: Dr. Gerald Fahner, Fair Isaac

Dr. Gerald Fahner is Analytic Science Director at Fair Isaac Corporation’s Core Analytic R&D group, where he develops innovative analytics for business prediction and decision problems. Before joining Fair Isaac, he served as a researcher in machine learning and robotics. Gerald received a Physics diploma from the University of Karlsruhe and earned his Computer Science doctorate from the University of Bonn, Germany.

TBA

April 25
Topic: Preparation for final project (class consultation)

April 27
Topic: Preparation for final project (class consultation)

May 2
Topic: Team projects: presentations and discussion
Assignments due: Final projects

May 4
Topic: Team projects: presentations and discussion