Research Journal Publications
Most induction algorithms for building predictive models take as input training data in the form of feature vectors. Acquiring the values of features may be costly, and simply acquiring all values may be wasteful, or prohibitively expensive. Active feature-value acquisition (AFA) selects features incrementally in an attempt to improve the predictive model most cost-effectively. This paper presents a framework for AFA based on estimating information value. While straightforward in principle, estimations and approximations must be made to apply the framework in practice. We present an acquisition policy, Sampled Expected Utility (SEU), that employs particular estimations to enable effective ranking of potential acquisitions in settings where relatively little information is available about the underlying domain. We then present experimental results showing that, as compared to the policy of using representative sampling for feature acquisition, SEU reduces the cost of producing a model of a desired accuracy and exhibits consistent performance across domains. We also extend the framework to a more general modeling setting in which feature values as well as class labels are missing and are costly to acquire.
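The SEU policy described above can be sketched as a small acquisition loop. This is a minimal illustration, not the paper's algorithm: the names `value_probs` (estimated distribution over a feature's possible values) and `utility_if` (model utility if the candidate took a given value) are hypothetical stand-ins for the estimations the abstract mentions.

```python
import random

def expected_utility(candidate, value_probs, utility_if):
    """Expected utility of acquiring one (instance, feature) value:
    average the model's utility over the feature's possible values,
    weighted by their estimated probabilities."""
    return sum(p * utility_if(candidate, v)
               for v, p in value_probs(candidate).items())

def sampled_expected_utility(candidates, value_probs, utility_if,
                             sample_size, rng=random):
    """SEU-style policy sketch: score only a random sample of the
    candidate acquisitions (scoring is expensive) and buy the best."""
    pool = rng.sample(candidates, min(sample_size, len(candidates)))
    return max(pool, key=lambda c: expected_utility(c, value_probs, utility_if))
```

Sampling the candidate pool before scoring is what keeps the policy tractable when the number of missing feature values is large.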
Much work has studied the effect of different treatments of missing values on model induction, but little work has analyzed treatments for the common case of missing values at prediction time. This paper first compares several different methods---predictive value imputation, the distribution-based imputation used by C4.5, and using reduced models---for applying classification trees to instances with missing values (and also shows evidence that the results generalize to bagged trees and to logistic regression). The results show that for the two most popular treatments, each is preferable under different conditions. Strikingly, the reduced-models approach, seldom mentioned or used, consistently outperforms the other two methods, sometimes by a large margin. The lack of attention to reduced modeling may be due in part to its (perceived) expense in terms of computation or storage. Therefore, we then introduce and evaluate alternative, hybrid approaches that allow users to balance between more accurate but computationally expensive reduced modeling and the other, less accurate but less computationally expensive treatments. The results show that the hybrid methods can scale gracefully with the amount of investment in computation/storage, and that they outperform imputation even for small investments.
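The reduced-models idea above can be made concrete with a toy sketch: one classifier is trained per anticipated pattern of *observed* features, and an incomplete instance is routed to the model that matches its pattern, so no imputation is ever needed at prediction time. The `NearestCentroid` stand-in classifier is illustrative only; the paper studies classification trees and logistic regression.

```python
import numpy as np

class NearestCentroid:
    """Tiny stand-in classifier, just to make the routing concrete."""
    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = sorted(set(y.tolist()))
        self.centroids_ = {c: X[y == c].mean(axis=0) for c in self.classes_}
        return self
    def predict_one(self, x):
        return min(self.classes_, key=lambda c: np.linalg.norm(x - self.centroids_[c]))

def train_reduced_models(X, y, observed_patterns):
    """Reduced modeling: fit one model per pattern of observed feature
    indices, using only those columns of the training data."""
    return {p: NearestCentroid().fit(X[:, list(p)], y) for p in observed_patterns}

def predict_with_missing(models, x, observed):
    """Route the incomplete instance to the model trained on exactly
    the features it actually has."""
    p = tuple(sorted(observed))
    return models[p].predict_one(np.array([x[i] for i in p], dtype=float))
```

The storage cost the abstract mentions is visible here: the number of models grows with the number of missingness patterns one prepares for, which is what the hybrid approaches trade off.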
• Paul Tetlock, Maytal Saar-Tsechansky, and Sofus Macskassy.
“More Than Words: Quantifying Language to Measure Firms' Fundamentals”. Journal of Finance, 63, 1437-1467, 2008. (PDF)
We examine whether a simple quantitative measure of language can be used to predict individual firms’ accounting earnings and stock returns. Our three main findings are: (1) the fraction of negative words in firm-specific news stories forecasts low firm earnings; (2) firms’ stock prices briefly underreact to the information embedded in negative words; and (3) the earnings and return predictability from negative words is largest for the stories that focus on fundamentals. Together these findings suggest that linguistic media content captures otherwise hard-to-quantify aspects of firms’ fundamentals, which investors quickly incorporate into stock prices.
• Maytal Saar-Tsechansky and Foster Provost.
“Decision-centric Active Learning of Binary-Outcome Models”, Information Systems Research, Vol. 18, No. 1, pp. 1–19, 2007. (PDF)
It can be expensive to acquire the data required for businesses to employ data-driven predictive modeling, for example to model consumer preferences to optimize targeting. Prior research has introduced “active learning” policies for identifying data that are particularly useful for model induction, with the goal of decreasing the statistical error for a given acquisition cost (error-centric approaches). However, predictive models are used as part of a decision-making process, and costly improvements in model accuracy do not always result in better decisions. This paper introduces a new approach for active data acquisition that targets decision-making specifically. The new decision-centric approach departs from traditional active learning by placing emphasis on acquisitions that are more likely to affect decision-making. We describe two different types of decision-centric techniques. Next, using direct-marketing data, we compare various data-acquisition techniques. We demonstrate that strategies for reducing statistical error can be wasteful in a decision-making context, and show that one decision-centric technique in particular can improve targeting decisions significantly. We also show that this method is robust in the face of decreasing quality of utility estimations, eventually converging to uniform random sampling, and that it can be extended to situations where different data acquisitions have different costs. The results suggest that businesses should consider modifying their strategies for acquiring information through normal business transactions. For example, a firm such as Amazon.com that models consumer preferences for customized marketing may accelerate learning by proactively offering recommendations—not merely to induce immediate sales, but for improving recommendations in the future.
• Maytal Saar-Tsechansky and Foster Provost.
“Active Sampling for Class Probability Estimation and Ranking.” Machine Learning, 54:2, 153-178, 2004. PDF (Publisher's version)
In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to attain a given estimation accuracy, and provide insight into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm, drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
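The two ingredients named in the abstract, variance-based scoring and weighted sampling, can be sketched briefly. This is an illustrative reduction, not the paper's full algorithm: `models` stands in for classifiers trained on different bootstrap samples of the labeled data, each returning a positive-class probability.

```python
import random
import numpy as np

def bootstrap_lv_scores(unlabeled, models):
    """BOOTSTRAP-LV-style scoring: the variance of an example's estimated
    class probability across models trained on different bootstrap
    samples; high variance marks a label worth acquiring."""
    probs = np.array([[m(x) for m in models] for x in unlabeled])
    return probs.var(axis=1)

def weighted_label_sample(unlabeled, scores, k, rng=random):
    """Weighted sampling: pick examples with probability proportional to
    their score, so selection reflects value across the input space
    rather than always taking the single top-scoring point."""
    return rng.choices(unlabeled, weights=list(scores), k=k)
```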
Maytal Saar-Tsechansky, Nava Pliskin, Gadi Rabinowitz, and Avi Porath,
In this paper, we present the concept of relational patterns and our approach to extract them from multiple relational tables. Relational patterns are analogous to frequent itemsets extracted by the Apriori algorithm in the case of a single table. However, for multiple relational tables relational patterns capture co-occurrences of attributes as well as the relationships between these attributes, which are essential to avoid information loss. We describe our experiences from a test-bed implementation of our approach on a real hospital discharge abstract database. This process raised challenges pertaining to an analyst's ability to explore subtle patterns of significant importance. Finally, we evaluate the usefulness of relational patterns in the context of the discharge abstract data and discuss their application to support decisions in
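For reference, the single-table baseline that relational patterns generalize is the Apriori algorithm, which grows frequent itemsets level by level and prunes any candidate that has an infrequent subset. A minimal sketch:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise Apriori over a single table of transactions:
    keep itemsets whose support meets the threshold, and build
    (k+1)-item candidates only from frequent k-itemsets."""
    transactions = [frozenset(t) for t in transactions]
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    frequent = []
    while level:
        frequent.extend(level)
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = {c for c in candidates
                 # anti-monotonicity: every k-subset must itself be frequent
                 if all(frozenset(sub) in level for sub in combinations(c, len(c) - 1))
                 and support(c) >= min_support}
    return frequent
```

Relational patterns extend this idea across joined tables, so that the relationships between attributes, not just their co-occurrence, are preserved.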
Gary M. Weiss, Maytal Saar-Tsechansky, and Bianca Zadrozny.
Editorial: Special Issue On Utility-Based Data Mining. Data Mining and Knowledge Discovery, 17(2): 129-135, 2008.
Bianca Zadrozny, Gary M. Weiss, and Maytal Saar-Tsechansky.
Maytal Saar-Tsechansky, Nava Pliskin, Gadi Rabinowitz, Avi Porath, and Mark Tsechansky.
Andrea Godfrey, Leigh McAlister, and Maytal Saar-Tsechansky.
• Foster Provost, Prem Melville, and Maytal Saar-Tsechansky.
Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce. Invited paper. In the Proceedings of The Ninth International Conference on Electronic Commerce, Minneapolis, 2007.
• Duy Vu, Prem Melville, Mikhail Bilenko, and Maytal Saar-Tsechansky.
“Intelligent Information Acquisition for Improved Clustering”, Workshop on Information Technologies and Systems (WITS), 2007.
• David Pardoe, Peter Stone, Maytal Saar-Tsechansky, and Kerem Tomak.
“Adaptive Mechanism Design: A Metalearning Approach”. In the Proceedings of The Eighth International Conference on Electronic Commerce, 2006.
• Prem Melville, Stewart M. Yang, Maytal Saar-Tsechansky, and Raymond J. Mooney.
“Active Learning for Probability Estimation using Jensen-Shannon Divergence”, The Proceedings of The 16th European Conference on Machine Learning (ECML), Porto, Portugal, 2005. 10% acceptance rate.
• Melville, P., Saar-Tsechansky, M., Provost, F. and Mooney, R.J.
An Expected Utility Approach to Active Feature-value Acquisition. The Proceedings of the Fifth International Conference on Data Mining (ICDM-2005). 13% acceptance rate.
• David Pardoe, Peter Stone, Maytal Saar-Tsechansky and Kerem Tomak.
Adaptive Auctions: Learning to Adjust to Bidders. Workshop on Information Technologies and Systems (WITS), 2005. 27% acceptance rate.
• Melville, P., Saar-Tsechansky, M., Provost, F. and Mooney, R.J.
Economical Active Feature-value Acquisition through Expected Utility Estimation. Proceedings of the KDD-05 Workshop on Utility-Based Data Mining, Chicago, IL, August 2005.
“Identifying Customer-Centric, Cross-Category Product Groups: A Product Segmentation Approach and its Relationship to Customer Segmentation Approaches”
As part of their customer management strategy, retailers with large, multi-category offerings need to present their products in ways that help target customers search and choose from those offerings. We propose a product segmentation approach that gives retailers a methodology for directly identifying customer-centric, cross-category product segments from large numbers of products in multiple categories, such that products within a segment are purchased by the same type of customers. In addition, we examine the relationship between the proposed product segmentation approach and a parallel customer segmentation approach. The close relationship between the approaches suggests that the segments of products and customers inferred from each approach will be equivalent. However, we show that this is not the case because of the aggregation constraint imposed on customers in the product segmentation approach and on products in the customer segmentation approach. In addition, we show that the product segmentation approach provides better recommendations of products for a customer to purchase, while the customer segmentation approach provides better recommendations of customers for a product to target.
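The core idea of customer-centric product segmentation can be illustrated with a toy clustering sketch: describe each product by the vector of customers who bought it, then cluster those vectors so products bought by the same kind of customers share a segment. This uses a generic k-means with deterministic farthest-point initialization; the paper's actual estimation approach may differ.

```python
import numpy as np

def product_segments(purchases, k, iters=20):
    """Cluster products by *who buys them*: rows of `purchases` are
    customers, columns are products; returns a segment label per product."""
    X = purchases.T.astype(float)          # rows = products, columns = customers
    # Deterministic farthest-point initialization of k centers.
    centers = [X[0]]
    while len(centers) < k:
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels
```

The asymmetry the abstract reports arises because clustering rows of the purchase matrix (customers) and clustering its columns (products) impose different aggregation constraints, so the two segmentations need not coincide.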
While auction design has traditionally been an analytic process that relies on theory-driven assumptions such as bidders' rationality, bidders often exhibit unknown and variable behaviors. In this paper we present a data-driven adaptive auction mechanism that capitalizes on key properties of electronic auction markets, in particular the large transaction volume, access to information, and the capacity to dynamically alter the mechanism's design in order to acquire information about the benefits from different designs, so as to adapt the auction mechanism online in response to actual bidders' behaviors. Our auction mechanism does not require an explicit representation of bidder behavior to infer design profitability, a key limitation of prior approaches when they address complex auction settings. Our adaptive mechanism can also incorporate prior general knowledge of bidder behavior to enhance the search for effective designs. The data-driven adaptation and the capacity to employ prior knowledge render our mechanisms particularly useful when there is uncertainty regarding bidders' behaviors or when bidders' behaviors change over time. Extensive empirical evaluations demonstrate that the adaptive mechanism outperforms any single fixed mechanism design under a variety of settings, including when bidders' strategies evolved in response to the seller's adaptation; our mechanism's performance is also more robust than that of alternatives when prior general information about bidders' behaviors differs from the encountered behaviors.
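One simple way to realize the data-driven adaptation described above is to treat candidate auction designs as arms of a bandit: run the design with the best observed average revenue most of the time, but occasionally explore alternatives. This epsilon-greedy sketch is an illustration under that assumption, not the paper's mechanism; `run_auction` stands in for running one auction round under a design and observing revenue.

```python
import random

def adaptive_auction(designs, run_auction, rounds, epsilon=0.1, rng=random):
    """Bandit-style online adaptation over candidate auction designs:
    no explicit model of bidder behavior, only observed revenue."""
    totals = {d: 0.0 for d in designs}
    counts = {d: 0 for d in designs}
    def avg(d):
        # Untried designs score infinity, so each gets tried at least once.
        return totals[d] / counts[d] if counts[d] else float("inf")
    for _ in range(rounds):
        d = rng.choice(designs) if rng.random() < epsilon else max(designs, key=avg)
        totals[d] += run_auction(d)
        counts[d] += 1
    return max((d for d in designs if counts[d]), key=avg)
```

Because selection depends only on observed revenue, the same loop keeps adapting if bidders change their strategies over time, which is the robustness property the abstract emphasizes.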
Meghana Deodhar, Joydeep Ghosh, and Maytal Saar-Tsechansky.
For effective prediction of consumers' ratings in large-scale recommender systems, it is essential to have many customers rate a large number of products. However, consumers often do not provide their preferences without proper incentives. Given a budget to reward consumers for their feedback, it would be beneficial to have a policy that suggests which customers' ratings, and of which products, would be most cost-effective to acquire so as to improve the prediction of consumers' ratings the most. In this paper we address this general challenge of pool-based active learning of regression models, particularly those induced from the heterogeneous and dyadic data commonly encountered in business. The policies we propose (a) leverage a collection of localized predictive models to estimate the benefit from each prospective acquisition, and (b) generalize to models with different loss functions, including a variety of classification and regression models. One of the policies effectively exploits the tradeoff between model complexity and the availability of data, increasing the local modeling complexity as more data is acquired. We empirically evaluate the proposed policies on real-world data sets, including the MovieLens recommender system data and the ERIM customer purchase choice data. The empirical evaluations demonstrate that the active learning policies we propose yield better models and effectively improve prediction accuracy over existing alternatives.
In this paper we consider the problem of acquiring missing feature values that improve clustering quality in a cost-effective manner. We propose a framework for iteratively selecting feature values that result in highest improvements in clustering quality per unit cost. Our framework can be adapted to different clustering algorithms, and we illustrate it in the context of two popular methods, K-Means and hierarchical agglomerative clustering. Experimental results on several datasets demonstrate clustering accuracy improvements provided by the proposed framework over random acquisition. Additional experiments demonstrate the performance of the framework for different cost structures, and explore several alternative formulations of the acquisition strategy.
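The iterative selection loop described above can be sketched as a greedy, budget-limited acquisition policy. The names `gain_estimate` (estimated clustering-quality improvement from acquiring one missing value) and `cost` are illustrative stand-ins for the framework's components.

```python
def acquire_for_clustering(missing, cost, gain_estimate, budget):
    """Greedy sketch: repeatedly acquire the missing feature value with
    the best estimated clustering-quality gain per unit cost, stopping
    when the next-best acquisition no longer fits in the budget."""
    remaining = list(missing)
    acquired, spent = [], 0.0
    while remaining:
        best = max(remaining, key=lambda m: gain_estimate(m) / cost(m))
        if spent + cost(best) > budget:
            break
        acquired.append(best)
        spent += cost(best)
        remaining.remove(best)
    return acquired
```

In the full framework, `gain_estimate` would be recomputed after each acquisition and re-clustering; the sketch fixes it up front only for brevity.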