|
Maytal Saar-Tsechansky
Assistant Professor
of Information, Risk, and
Operations Management
Red McCombs School of Business
The University of Texas at Austin
CBA 5.254

Tel: (512) 471-1512
maytal@mail.utexas.edu |
|
_____________________________________

|
NEW!
I am guest-editing the
Special Issue on Utility Based Data Mining with Gary Weiss
and Bianca Zadrozny in
Data Mining and Knowledge Discovery.
Call For Papers |
|
I co-chaired the First and Second ACM SIGKDD Workshops on Utility-Based
Data Mining in 2005 and 2006 with Gary Weiss and Bianca Zadrozny:
-
The First ACM SIGKDD
Workshop on Utility-Based Data Mining,
Chicago, Illinois, 2005
-
The Second ACM SIGKDD
Workshop on Utility-Based Data Mining, August 2006,
Philadelphia, Pennsylvania.
|
|
NSF Grant: STTR Program
(with Daniele Micci-Barreca): "Active Learning System for Audit
Selection" |
| |
| |
Research Papers |
|
| Journal Publications |
-
Saar-Tsechansky
Maytal and Provost Foster. “Handling Missing Values When Applying
Classification Models”. Journal of Machine Learning Research,
8(Jul):1623--1657, 2007.
Abstract:
Much work has studied the effect
of different treatments of missing values on model induction, but little work
has analyzed treatments for the common case of missing values at prediction
time. This paper first compares several different methods---predictive value
imputation, the distribution-based imputation used by C4.5, and using reduced
models---for applying classification trees to instances with missing values (and
also shows evidence that the results generalize to bagged trees and to logistic
regression). The results show that for the two most popular treatments, each is
preferable under different conditions. Strikingly the reduced-models approach,
seldom mentioned or used, consistently outperforms the other two methods,
sometimes by a large margin. The lack of attention to reduced modeling may be
due in part to its (perceived) expense in terms of computation or storage.
Therefore, we then introduce and evaluate alternative, hybrid approaches that
allow users to balance between more accurate but computationally expensive
reduced modeling and the other, less accurate but less computationally expensive
treatments. The results show that the hybrid methods can scale gracefully to the
amount of investment in computation/storage, and that they outperform imputation
even for small investments. |
-
Paul
Tetlock, Maytal Saar-Tsechansky and Sofus Macskassy. “More Than Words:
Quantifying Language to Measure Firms' Fundamentals”. Journal of Finance,
Forthcoming.
Abstract:
We examine whether a simple quantitative measure of language can be used to
predict individual firms’ accounting earnings and stock returns. Our three main
findings are: (1) the fraction of negative words in firm-specific news stories
forecasts low firm earnings; (2) firms’ stock prices briefly underreact to the
information embedded in negative words; and (3) the earnings and return
predictability from negative words is largest for the stories that focus on
fundamentals. Together these findings suggest that linguistic media content
captures otherwise hard-to-quantify aspects of firms’ fundamentals, which
investors quickly incorporate into stock prices. |
|
-
Saar-Tsechansky
Maytal and Provost Foster. “Decision-centric Active Learning of Binary-Outcome
Models”, Information Systems Research, Vol. 18, No. 1, pp. 1–19, 2007.
Abstract:
It can be
expensive to acquire the data required for businesses to employ data-driven
predictive modeling, for example to model consumer preferences to optimize
targeting. Prior research has introduced “active learning” policies for
identifying data that are particularly useful for model induction, with the goal
of decreasing the statistical error for a given acquisition cost (error-centric
approaches). However, predictive models are used as part of a decision-making
process, and costly improvements in model accuracy do not always result in
better decisions. This paper introduces a new approach for active data
acquisition that targets decision-making specifically. The new decision-centric
approach departs from traditional active learning by placing emphasis on
acquisitions that are more likely to affect decision-making. We describe two
different types of decision-centric techniques. Next, using direct-marketing
data, we compare various data-acquisition techniques. We demonstrate that
strategies for reducing statistical error can be wasteful in a decision-making
context, and show that one decision-centric technique in particular can improve
targeting decisions significantly. We also show that this method is robust in
the face of decreasing quality of utility estimations, eventually converging to
uniform random sampling, and that it can be extended to situations where
different data acquisitions have different costs. The results suggest that
businesses should consider modifying their strategies for acquiring information
through normal business transactions. For example, a firm such as Amazon.com
that models consumer preferences for customized marketing may accelerate
learning by proactively offering recommendations—not merely to induce immediate
sales, but for improving recommendations in the future. |
|
-
Saar-Tsechansky
Maytal and Provost Foster. “Active Sampling for Class Probability Estimation and
Ranking.” Machine Learning, 54:2, 153-178, 2004.
Abstract:
In many cost-sensitive environments class
probability estimates are used by decision makers to evaluate the expected
utility from a set of alternatives. Supervised learning can be used to build
class probability estimates; however, it often is very costly to obtain training
data with class labels. Active learning acquires data incrementally, at each
phase identifying especially useful additional data for labeling, and can be
used to economize on examples needed for learning. We outline the critical
features of an active learner and present a sampling-based active learning
method for estimating class probabilities and class-based rankings.
BOOTSTRAP-LV identifies particularly informative new data for learning based on
the variance in probability estimates, and uses weighted sampling to account for
a potential example’s informative value for the rest of the input space. We show
empirically that the method reduces the number of data items that must be
obtained and labeled, across a wide variety of domains. We investigate the
contribution of the components of the algorithm and show that each provides
valuable information to help identify informative examples. We also compare
BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method
designed to maximize classification accuracy. The results show that BOOTSTRAP-LV
uses fewer examples to exhibit a certain estimation accuracy and provide
insights to the behavior of the algorithms. Finally, we experiment with another
new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and
BOOTSTRAP-LV and show that it is significantly more competitive with
BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more
general implications for improving existing active sampling algorithms for
classification.
|
-
Saar-Tsechansky
Maytal, Pliskin Nava, Rabinowitz Gadi, and Porath Avi, "Mining Relational
Patterns from Multiple Relational Tables," Decision Support Systems, Vol.
27, No. 1-2, 177-195, 1999. An earlier version appeared in HICSS 2001.
|
| Peer-Reviewed Meetings |
|
·
Foster Provost,
Prem Melville, and Maytal Saar-Tsechansky. Data acquisition and cost-effective
predictive modeling: targeting offers for electronic commerce. Invited paper to
appear in the Proceedings of The Ninth International Conference on Electronic
Commerce, Minneapolis, 2007.
·
Maytal Saar-Tsechansky, Duy Vu, Mikhail Bilenko, and Prem Melville. “Intelligent
Information Acquisition for Improved Clustering”, Workshop on Information
Technologies and Systems (WITS), 2007.
·
David
Pardoe, Peter Stone, Maytal Saar-Tsechansky, and Kerem Tomak, “Adaptive
Mechanism Design: A Metalearning Approach”. In the Proceedings of The Eighth
International Conference on Electronic Commerce, 2006.
·
Prem Melville,
Stewart M. Yang, Maytal Saar-Tsechansky, and Raymond J. Mooney. “Active Learning
for Probability Estimation using Jensen-Shannon Divergence”, The Proceedings
of The 16th European Conference on Machine Learning (ECML), Porto, Portugal,
2005. 10% acceptance rate.
·
Melville, P.,
Saar-Tsechansky, M., Provost, F. and Mooney, R.J. An Expected Utility Approach
to Active Feature-value Acquisition. The Proceedings of the Fifth
International Conference on Data Mining (ICDM-2005). 13% acceptance rate.
·
David Pardoe,
Peter Stone, Maytal Saar-Tsechansky and Kerem Tomak. Adaptive Auctions:
Learning to Adjust to Bidders. Workshop on Information Technologies and
Systems (WITS), 2005. 27% acceptance rate.
·
Melville, P.,
Saar-Tsechansky, M., Provost, F. and Mooney, R.J. Economical Active
Feature-value Acquisition through Expected Utility Estimation.
Proceedings of the KDD-05 Workshop on Utility-Based Data Mining, Chicago,
IL, August 2005.
·
Maytal
Saar-Tsechansky and Hsuan Wei-Chen. Variance-Based Active Learning for
Classifier Induction. Workshop on Information Technologies and Systems (WITS),
2005. 27% acceptance rate.
·
Prem Melville,
Maytal Saar-Tsechansky, Foster Provost, and Raymond J. Mooney. “Active Feature
Acquisition for Classifier Induction.” The Proceedings of The Fourth
International Conference on Data Mining (ICDM-2004). Brighton, UK. November
2004. 14% acceptance rate.
·
Saar-Tsechansky
Maytal and Provost Foster. “Active Learning for Class Probability Estimation and
Ranking.” The Seventeenth International Joint Conference on Artificial
Intelligence (IJCAI-01), Seattle, Washington, August 2001. 24% acceptance
rate. (An extended version was published in Machine Learning.)
·
Saar-Tsechansky
Maytal, Pliskin Nava, Rabinowitz Gadi, and Tsechansky Mark. "Patterns
Extraction for Monitoring Medical Practices," Proceedings of the 34th Hawaii
International Conference on Systems Sciences (HICSS), Maui, Hawaii. IEEE
Computer Society Press, 2001. Best Paper Award Winner of the
Information Technology in Health Care Track.
Working Papers
·
“Active
Information Acquisition for Model Induction” Maytal Saar-Tsechansky,
Prem Melville and Foster Provost.
Abstract:
Most induction algorithms for building predictive models take as input training
data in the form of feature vectors. Acquiring the values of features may be
costly, and simply acquiring all values may be wasteful, or prohibitively
expensive. Active feature-value acquisition (AFA) selects features
incrementally in an attempt to improve the predictive model most
cost-effectively. This paper presents a framework for AFA based on estimating
information value. While straightforward in principle, estimations and
approximations must be made to apply the framework in practice. We present an
acquisition policy, Sampled Expected Utility (SEU), that employs particular
estimations to enable effective ranking of potential acquisitions in settings
where relatively little information is available about the underlying domain.
We then present experimental results showing that, as compared to the policy of
using representative sampling for feature acquisition, SEU reduces the cost of
producing a model of a desired accuracy and exhibits consistent performance
across domains. We also extend the framework to a more general modeling setting
in which feature values as well as class labels are missing and are costly to
acquire.
·
“Identifying Customer-Centric, Cross-Category Product Groups: A Product
Segmentation Approach and its Relationship to Customer Segmentation Approaches”,
Andrea Godfrey, Leigh McAlister, and Maytal Saar-Tsechansky.
Abstract: As part of their customer management strategy, retailers with
large, multi-category offerings need to present their products in ways that help
target customers search and choose from those offerings. The authors propose a
product segmentation approach that gives retailers a methodology for directly
identifying customer-centric, cross-category, product segments from large
numbers of products in multiple categories such that products within a segment
are purchased by the same type of customers. In addition, the research examines
the relationship between the proposed product segmentation approach and a
parallel customer segmentation approach. The close relationship between the
approaches suggests that the segments of products and customers inferred from
each approach will be equivalent. However, the authors show that this is not
the case because of the aggregation constraint imposed on customers in the
product segmentation approach and on products in the customer segmentation
approach. Further, the authors show that the product segmentation approach
provides better recommendations of products for a customer to purchase, while
the customer segmentation approach provides better recommendations of customers
for a product to target.
Other Publications
-
Gary Weiss, Maytal Saar-Tsechansky, Bianca
Zadrozny: Report on UBDM-05:
Workshop on Utility-Based Data Mining. SIGKDD Explorations 7(2):
145-147,
2005.
-
Bianca Zadrozny, Gary M. Weiss and Maytal Saar-Tsechansky.
UBDM 2006: Utility-Based
Data Mining 2006 Workshop Report. SIGKDD
Explorations, 8(2), ACM Press, December 2006.
·
Saar-Tsechansky
Maytal, Pliskin Nava, Rabinowitz Gadi, Porath Avi, and Tsechansky Mark.
"Monitoring Quality of Care with Relational Patterns," Topics in Health
Information Management, Vol. 22, No. 1, 2001.