Group of authors

Applied Modeling Techniques and Data Analysis 2



      This book is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians who have been working at the forefront of data analysis. The chapters included in this collective volume represent a cross-section of current concerns and research interests in the above-mentioned scientific areas. The volume is divided into two parts with a total of 17 chapters, in a form that provides the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications.

      Part 2 covers the area of applied stochastic and statistical models and methods and comprises eight chapters: Chapter 10, “The Double Flexible Dirichlet: A Structured Mixture Model for Compositional Data”, by Roberto Ascari, Sonia Migliorati and Andrea Ongaro; Chapter 11, “Quantization of Transformed Lévy Measures”, by Mark Anthony Caruana; Chapter 12, “A Flexible Mixture Regression Model for Bounded Multivariate Responses”, by Agnese M. Di Brisco and Sonia Migliorati; Chapter 13, “On Asymptotic Structure of the Critical Galton-Watson Branching Processes with Infinite Variance and Allowing Immigration”, by Azam A. Imomov and Erkin E. Tukhtaev; Chapter 14, “Properties of the Extreme Points of the Joint Eigenvalue Probability Density Function of the Wishart Matrix”, by Asaph Keikara Muhumuza, Karl Lundengård, Sergei Silvestrov, John Magero Mango and Godwin Kakuba; Chapter 15, “Forecast Uncertainty of the Weighted TAR Predictor”, by Francesco Giordano and Marcella Niglio; Chapter 16, “Revisiting Transitions Between Superstatistics”, by Petr Jizba and Martin Prokš; Chapter 17, “Research on Retrial Queue with Two-Way Communication in a Diffusion Environment”, by Viacheslav Vavilov.

      Yannis DIMOTIKALIS

      Crete, Greece

      Alex KARAGRIGORIOU

      Samos, Greece

      Christina PARPOULA

      Athens, Greece

      Christos H. SKIADAS

      Athens, Greece

      December 2020

PART 1 Financial and Demographic Modeling Techniques

      1

      Data Mining Application Issues in the Taxpayer Selection Process

      This chapter provides a data analysis framework designed to build an effective learning scheme aimed at improving the Italian Revenue Agency’s ability to identify non-compliant taxpayers, with special regard to self-employed individuals allowed to keep simplified registers. Our procedure involves building two C4.5 decision trees, both trained and validated on a sample of 8,000 audited taxpayers, but predicting two different class values, based on two different predictive attribute sets. That is, the first model is built to identify the most likely non-compliant taxpayers, while the second identifies those who are less likely to pay the additional tax bill due. This twofold selection target is needed to maximize overall audit effectiveness. Once both models are in place, taxpayers will be selected in such a way that businesses are audited only if both models judge them worthy. This methodology will soon be validated on real cases: a sample of taxpayers will be selected according to the classification criteria developed in this chapter and will subsequently be audited.
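      The twofold selection rule described above can be illustrated with a minimal Python sketch. It is not the chapter's implementation: the two classifiers are hypothetical stand-in rules rather than trained C4.5 trees, and the feature names (`declared_to_expected_income`, `unpaid_past_notices`) and thresholds are invented for illustration. The sketch also assumes that "worthy by both models" means flagged as likely non-compliant by the first model and not flagged as unlikely to pay by the second.

```python
from typing import Callable, Dict, List

# Hypothetical record type: one dict of features per taxpayer.
Taxpayer = Dict[str, float]
Classifier = Callable[[Taxpayer], bool]


def select_for_audit(
    taxpayers: List[Taxpayer],
    is_likely_non_compliant: Classifier,
    is_unlikely_to_pay: Classifier,
) -> List[Taxpayer]:
    """Keep taxpayers that both models judge worth auditing.

    Assumed reading of the rule: model 1 flags the taxpayer as likely
    non-compliant AND model 2 does not flag them as unlikely to pay.
    """
    return [
        t for t in taxpayers
        if is_likely_non_compliant(t) and not is_unlikely_to_pay(t)
    ]


# Stand-in rules (NOT C4.5 trees); features and thresholds are made up.
def toy_non_compliance_model(t: Taxpayer) -> bool:
    return t["declared_to_expected_income"] < 0.6


def toy_unlikely_to_pay_model(t: Taxpayer) -> bool:
    return t["unpaid_past_notices"] >= 2


sample = [
    {"id": 1, "declared_to_expected_income": 0.4, "unpaid_past_notices": 0},
    {"id": 2, "declared_to_expected_income": 0.4, "unpaid_past_notices": 3},
    {"id": 3, "declared_to_expected_income": 0.9, "unpaid_past_notices": 0},
]

selected = select_for_audit(
    sample, toy_non_compliance_model, toy_unlikely_to_pay_model
)
selected_ids = [int(t["id"]) for t in selected]
print(selected_ids)
```

      In this toy sample only the first taxpayer passes both filters: the second is flagged as unlikely to pay and the third is not flagged as non-compliant. In practice, each stand-in rule would be replaced by the prediction function of a trained decision tree.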

      This chapter contributes to the empirical literature on the development of classification models applied to the tax evasion field, presenting a case study that focuses on a dataset of 8,000 audited taxpayers for the fiscal year 2012, each described by a set of features concerning, among others, their tax returns, their properties and their tax notice.1

      In this context, all the