Biometrical Letters

ISSN:1896-3811

Volume (56) Number 2 pp. 253-261

Ewa Skotarczak 1, Anita Dobek 1, Krzysztof Moliński 1

1 Department of Mathematical and Statistical Methods, Poznań University of Life Sciences, 60-637, Poznań, Poland

Comparison of some correlation measures for continuous and categorical data

Summary

The literature offers a wide range of correlation and association coefficients for different data structures. By convention, some coefficients are used for continuous data and others for categorical or ordinal observations. The aim of this paper is to assess the performance of several approaches to estimating correlation for different types of observations. Both simulated and real data were analysed. For continuous variables, Pearson’s r² and the maximal information coefficient (MIC) were determined, whereas for categorized data three approaches were compared: Cramér’s V, Joe’s estimator, and the regression-based estimator. Two methods of discretization were applied to the continuous data. The following conclusions were drawn: the regression-based approach yielded the best results for data with the highest assumed r² coefficient, whereas Joe’s estimator approximated the true correlation better when the assumed r² was small; the MIC estimator detected the maximal level of dependency for data with a quadratic relation. Moreover, discretizing data with a non-linear dependency can cause loss of dependency information. The calculations were supported by the R packages arules and minerva.
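
As a rough illustration of two of the measures named in the summary, the following Python sketch computes Pearson's r² on continuous data and Cramér's V on the same data after discretization. It is not the authors' code, which relied on the R packages arules and minerva. The sample size, noise level, number of bins, the helper names, and the choice of equal-width and equal-frequency binning are all assumptions made for illustration, since the summary does not state which two discretization methods were used. Joe's estimator, the regression-based estimator, and MIC are not reproduced here.

```python
import math
import random
from collections import Counter


def pearson_r2(x, y):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


def equal_width_bins(values, k):
    """Assign each value to one of k equal-width intervals (labels 0..k-1)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into bin k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (nearly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


def cramers_v(a, b):
    """Cramér's V computed from the contingency table of two label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    row, col = Counter(a), Counter(b)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row), len(col)) - 1)))


# Hypothetical simulated data: a noisy quadratic relation (assumed settings).
random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2000)]
y_quadratic = [v ** 2 + random.gauss(0, 0.05) for v in x]

r2 = pearson_r2(x, y_quadratic)
v_width = cramers_v(equal_width_bins(x, 5), equal_width_bins(y_quadratic, 5))
v_freq = cramers_v(equal_frequency_bins(x, 5), equal_frequency_bins(y_quadratic, 5))

print(f"Pearson r^2:                {r2:.3f}")
print(f"Cramér's V (equal width):   {v_width:.3f}")
print(f"Cramér's V (equal freq.):   {v_freq:.3f}")
```

Because the simulated relation is symmetric about zero, Pearson's r² stays close to zero even though the variables are strongly dependent, while the value of Cramér's V depends on how the bins are drawn. Comparing the two binning schemes on the same data gives a feel for how much the choice of discretization can matter.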

Keywords: correlation, mutual information, contingency table

DOI: 10.2478/bile-2019-0015

For citation:

MLA Skotarczak, Ewa, et al. "Comparison of some correlation measures for continuous and categorical data." Biometrical Letters 56.2 (2019): 253-261. DOI: 10.2478/bile-2019-0015
APA Skotarczak, E., Dobek, A., & Moliński, K. (2019). Comparison of some correlation measures for continuous and categorical data. Biometrical Letters, 56(2), 253-261. DOI: 10.2478/bile-2019-0015
ISO 690 SKOTARCZAK, Ewa, DOBEK, Anita, MOLIŃSKI, Krzysztof. Comparison of some correlation measures for continuous and categorical data. Biometrical Letters, 2019, 56.2: 253-261. DOI: 10.2478/bile-2019-0015