Nonparametric criteria for sparse contingency tables

Pavel Samusenko

Doctoral dissertation

Dissertations are not being sold



In the dissertation, the problem of nonparametric testing for sparse contingency tables is addressed. Statistical inference problems caused by sparsity of contingency tables are widely discussed in the literature. Traditionally, the expected (under null the hypothesis) frequency is required to exceed 5 in almost all cells of the contingency table. If this condition is violated, the χ2 approximations of goodness-of-fit statistics may be inaccurate and the table is said to be sparse . Several techniques have been proposed to tackle the problem: exact tests, alternative approximations, parametric and nonparametric bootstrap, Bayes approach and other methods. However they all are not applicable or have some limitations in nonparametric statistical inference of very sparse contingency tables.

In the dissertation, it is shown that, for sparse categorical data, the likelihood ratio statistic and Pearson’s χ2 statistic may become noninformative: they do not anymore measure the goodness-of-fit of null hypotheses to data. Thus, they can be inconsistent even in cases where a simple consistent test does exist. An improvement of the classical criteria for sparse contingency tables is proposed. The improvement is achieved by grouping and smoothing of sparse categorical data by making use of a new sparse asymptotics model relying on (extended) empirical Bayes approach. Under general conditions, the consistency of the proposed criteria based on grouping is proved. Finite-sample behavior of the criteria is investigated via Monte Carlo simulations.

Read electronic version of the book:


Book details

Data sheet

Imprint No:
145×205 mm
116 p.
16 other books in the same category:

Follow us on Facebook