Journal of the American Statistical Association, Volume 104, Issue 488 – December 2009

Learn From Thy Neighbor: Parallel-Chain and Regional Adaptive MCMC
Radu V. Craiu, Jeffrey Rosenthal, Chao Yang. Journal of the American Statistical Association. December 1, 2009, 104(488): 1454-1466. doi:10.1198/jasa.2009.tm08393.

Starting with the seminal paper of Haario, Saksman, and Tamminen (2001), a substantial amount of work has been done to validate adaptive Markov chain Monte Carlo algorithms. In this paper we focus on two practical aspects of adaptive Metropolis samplers. First, we draw attention to the deficient performance of standard adaptation when the target distribution is multimodal. We propose a parallel-chain adaptation strategy that incorporates multiple Markov chains run in parallel. Second, we note that the current adaptive MCMC paradigm implicitly assumes that the adaptation is uniformly efficient in all regions of the state space. However, in many practical instances, different “optimal” kernels are needed in different regions of the state space. We propose here a regional adaptation algorithm in which we account for possible errors made in defining the adaptation regions. This corresponds to the more realistic case in which one does not know exactly the optimal regions for adaptation. The methods focus on the random walk Metropolis sampling algorithm, but their scope is much wider. We provide theoretical justification for the two adaptive approaches using the existing theory built for adaptive Markov chain Monte Carlo. We illustrate the performance of the methods using simulations and analyze a mixture model for real data using an algorithm that combines the two approaches.
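
As a rough illustration of the parallel-chain idea, the sketch below runs several random-walk Metropolis chains that share one proposal covariance adapted from their pooled past samples. The bimodal target, chain count, scaling constant, and adaptation schedule are assumptions made for the demo, not the authors' implementation, and a real algorithm would also enforce diminishing adaptation as discussed in the paper.

```python
# Minimal sketch: parallel random-walk Metropolis chains adapting a shared
# proposal covariance from their pooled history (illustrative settings only).
import numpy as np

def log_target(x):
    # Illustrative bimodal target: equal mixture of two Gaussians in 2D.
    d1 = -0.5 * np.sum((x - 3.0) ** 2)
    d2 = -0.5 * np.sum((x + 3.0) ** 2)
    return np.logaddexp(d1, d2)

def parallel_adaptive_rwm(n_iter=5000, n_chains=4, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    # Overdispersed starting points so the pooled history can "see" both modes.
    states = rng.normal(scale=5.0, size=(n_chains, dim))
    history = [states.copy()]
    cov = np.eye(dim)                      # initial proposal covariance
    scale = (2.38 ** 2) / dim              # standard adaptive-Metropolis scaling
    for t in range(n_iter):
        for c in range(n_chains):
            prop = rng.multivariate_normal(states[c], scale * cov)
            if np.log(rng.uniform()) < log_target(prop) - log_target(states[c]):
                states[c] = prop
        history.append(states.copy())
        if t >= 100 and t % 50 == 0:       # occasionally re-estimate from pooled samples
            pooled = np.concatenate(history, axis=0)
            cov = np.cov(pooled, rowvar=False) + 1e-6 * np.eye(dim)
    return np.concatenate(history, axis=0)

samples = parallel_adaptive_rwm()
print(samples.mean(axis=0))                # should sit roughly between the two modes
```

Pooling the histories lets a chain that has found only one mode borrow information from chains exploring the other mode, which is the intuition behind learning from one's neighbors.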


Empirical Likelihood in Missing Data Problems
Jing Qin, Biao Zhang, Denis H. Y. Leung. Journal of the American Statistical Association. December 1, 2009, 104(488): 1492-1503. doi:10.1198/jasa.2009.tm08163.

Missing data is a ubiquitous problem in medical and social sciences. It is well known that inferences based only on the complete data may not only lose efficiency, but may also lead to biased results if the data is not missing completely at random (MCAR). The inverse-probability weighting method proposed by Horvitz and Thompson (1952) is a popular alternative when the data is not MCAR. The Horvitz–Thompson method, however, is sensitive to the inverse weights and may suffer from loss of efficiency. In this paper, we propose a unified empirical likelihood approach to missing data problems and explore the use of empirical likelihood to effectively combine unbiased estimating equations when the number of estimating equations is greater than the number of unknown parameters. One important feature of this approach is the separation of the complete data unbiased estimating equations from the incomplete data unbiased estimating equations. The proposed method can achieve semiparametric efficiency if the probability of missingness is correctly specified. Simulation results show that the proposed method has better finite sample performance than its competitors. Supplemental materials for this paper, including proofs of the main theoretical results and the R code used for the NHANES example, are available online on the journal website.
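
For readers unfamiliar with the Horvitz–Thompson approach mentioned above, here is a minimal simulated example of inverse-probability weighting for a mean with outcomes missing at random given a covariate. The logistic observation model and data-generating mechanism are assumptions made purely for the demonstration; this is not the paper's empirical likelihood estimator.

```python
# Minimal sketch: inverse-probability weighting (Horvitz-Thompson) for a mean
# when Y is missing at random given X; simulated data, true E[Y] = 2.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
y = 2.0 + x + rng.normal(size=n)           # full-data outcome
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))   # observation probabilities (known here)
r = rng.uniform(size=n) < p_obs            # missingness indicator (True = observed)

naive = y[r].mean()                        # complete-case mean, biased upward here
ht = np.sum(r * y / p_obs) / n             # inverse-probability-weighted mean
print(naive, ht)                           # ht should be close to 2
```

The weighted mean corrects the complete-case bias, but it can be unstable when some observation probabilities are small; that sensitivity to the inverse weights is the efficiency loss the abstract refers to and one motivation for combining additional estimating equations via empirical likelihood.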


Nonparametric Bayes Conditional Distribution Modeling With Variable Selection
Yeonseung Chung, David B. Dunson. Journal of the American Statistical Association. December 1, 2009, 104(488): 1646-1660. doi:10.1198/jasa.2009.tm08302.

This article considers a methodology for flexibly characterizing the relationship between a response and multiple predictors. Goals are (1) to estimate the conditional response distribution addressing the distributional changes across the predictor space, and (2) to identify important predictors for the response distribution change both within local regions and globally. We first introduce the probit stick-breaking process (PSBP) as a prior for an uncountable collection of predictor-dependent random distributions and propose a PSBP mixture (PSBPM) of normal regressions for modeling the conditional distributions. A global variable selection structure is incorporated to discard unimportant predictors, while allowing estimation of posterior inclusion probabilities. Local variable selection is conducted relying on the conditional distribution estimates at different predictor points. An efficient stochastic search sampling algorithm is proposed for posterior computation. The methods are illustrated through simulation and applied to an epidemiologic study.
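
A small sketch of how predictor-dependent mixture weights arise from a probit stick-breaking construction, w_h(x) = Phi(alpha_h(x)) * prod_{l<h} (1 - Phi(alpha_l(x))). The linear stick-breaking functions and the truncation level are illustrative assumptions, not the full PSBPM of normal regressions described in the paper.

```python
# Minimal sketch: truncated probit stick-breaking weights that vary with a predictor.
import numpy as np
from scipy.stats import norm

def psbp_weights(x, mu, beta):
    # x: scalar predictor; mu, beta: length-H arrays (H = truncation level).
    v = norm.cdf(mu + x * beta)                               # Phi(alpha_h(x))
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1]))) # stick-breaking
    w[-1] = 1.0 - w[:-1].sum()                                # leftover mass to last atom
    return w

H = 5
mu = np.linspace(-1, 1, H)
beta = np.array([0.5, -0.5, 1.0, 0.0, -1.0])
print(psbp_weights(-2.0, mu, beta))        # weights shift as the predictor moves
print(psbp_weights(+2.0, mu, beta))
```

Because the weights change smoothly with x, mixing normal regression components with these weights lets the whole conditional distribution, not just its mean, vary across the predictor space.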


What Are the Limits of Posterior Distributions Arising From Nonidentified Models, and Why Should We Care?
Paul Gustafson. Journal of the American Statistical Association. December 1, 2009, 104(488): 1682-1695. doi:10.1198/jasa.2009.tm08603.

In health research and other fields, the observational data available to researchers often fall short of the data that ideally would be available, due to the inherent limitations of study design and data acquisition. Were they available, these ideal data might be readily analyzed via straightforward statistical models with such desirable properties as parameter identifiability. Conversely, realistic models for the available data that incorporate uncertainty about the link between ideal and available data may be nonidentified. While there is no conceptual difficulty in implementing Bayesian analysis with nonidentified models and proper prior distributions, it is important to know to what extent data can be informative about parameters of interest. Determining the large-sample limit of the posterior distribution is one way to characterize the informativeness of data. In some nonidentified models, it is relatively straightforward to determine the limit via a particular reparameterization of the model; however, in other nonidentified models there is no such obvious approach. Thus we have developed an algorithm for determining the limiting posterior distribution for at least some such more difficult models. The work is motivated by two specific nonidentified models that arise quite naturally, and the algorithm is applied to reveal how informative the data are for these models. This article has supplementary material online.
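
A toy illustration of a limiting posterior under nonidentifiability (not one of the paper's two motivating models): the data inform only the sum of two parameters, so as the sample size grows the marginal posterior of one parameter converges to its conditional prior given that sum, rather than to a point mass.

```python
# Minimal sketch: data inform only phi = t1 + t2, so the posterior sd of t1
# does not shrink to zero; it converges to the conditional-prior sd (~0.71 here).
import numpy as np

rng = np.random.default_rng(2)
phi_true = 1.0
t1 = np.linspace(-4, 4, 401)
t2 = np.linspace(-4, 4, 401)
T1, T2 = np.meshgrid(t1, t2, indexing="ij")
dt = t1[1] - t1[0]
for n in (10, 1000, 100000):
    ybar = phi_true + rng.normal(scale=1.0 / np.sqrt(n))      # sufficient statistic
    # Independent N(0,1) priors; the likelihood depends on (t1, t2) only via t1 + t2.
    log_post = -0.5 * (T1**2 + T2**2) - 0.5 * n * (ybar - (T1 + T2))**2
    post = np.exp(log_post - log_post.max())
    marg = post.sum(axis=1)                                   # marginal posterior of t1
    marg /= marg.sum() * dt
    mean = np.sum(t1 * marg) * dt
    sd = np.sqrt(np.sum((t1 - mean)**2 * marg) * dt)
    print(n, round(mean, 3), round(sd, 3))                    # sd -> sqrt(0.5), not 0
```

In this simple case the limit is transparent after reparameterizing to the identified quantity phi and the nonidentified remainder; the article's algorithm targets models where no such obvious reparameterization exists.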


About Leonardo de Oliveira Martins

I am currently a postdoc working in David Posada's lab at the University of Vigo, Spain. I did my Ph.D. at the University of Tokyo, and have both an M.Sc. in Biotechnology and a B.Sc. in Molecular Sciences completed at the University of Sao Paulo, Brazil.
