Group of authors

Applied Modeling Techniques and Data Analysis 2



we can do, then, is use the two models “together”. For instance, we could use the first model to rank the taxpayers eligible for selection and the second one to discard those likely to be subject to coercive procedures.
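The rank-then-discard combination described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Taxpayer` record, `interest_score` (standing in for the first model's output), `coercive_prob` (standing in for the second model's output) and the 0.5 cut-off are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Taxpayer:
    taxpayer_id: str
    interest_score: float  # hypothetical first-model score (higher = more worth auditing)
    coercive_prob: float   # hypothetical second-model probability of coercive procedures


def ensemble_select(taxpayers, n_audits, max_coercive_prob=0.5):
    """Rank candidates with the first model, then drop those the second model flags."""
    # Step 1: sort eligible taxpayers by the first model's score, best first.
    ranked = sorted(taxpayers, key=lambda t: t.interest_score, reverse=True)
    # Step 2: discard taxpayers likely to be subject to coercive procedures.
    kept = [t for t in ranked if t.coercive_prob < max_coercive_prob]
    # Step 3: audit only as many taxpayers as capacity allows.
    return kept[:n_audits]


pool = [
    Taxpayer("A", 0.9, 0.8),
    Taxpayer("B", 0.8, 0.1),
    Taxpayer("C", 0.7, 0.3),
    Taxpayer("D", 0.2, 0.0),
]
print([t.taxpayer_id for t in ensemble_select(pool, n_audits=2)])  # ['B', 'C']
```

Taxpayer A has the highest first-model score but is dropped by the second model, so the two audit slots go to B and C.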

Graph depicts the coercive procedure rates.

      This is just one example, and it is not the only way to combine the two models. Policymakers have room to exploit the two models in different ways, depending on the tradeoff they want to strike between the two goals of the audit process: profitability and tax collectability. For instance, a selection process could be targeted only at taxpayers who are both interesting and free of payment issues.
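The targeted variant mentioned above (interesting taxpayers without payment issues) amounts to a joint filter, and the cut-off on the second model's output acts as the tradeoff knob between profitability and collectability. A minimal sketch under hypothetical assumptions: `is_interesting` stands in for the first model's label, `coercive_prob` for the second model's score, and the thresholds are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Taxpayer:
    taxpayer_id: str
    is_interesting: bool  # hypothetical label from the first model
    coercive_prob: float  # hypothetical score from the second model


def targeted_selection(taxpayers, max_coercive_prob):
    """Keep only taxpayers that are interesting AND unlikely to need coercive procedures."""
    return [
        t.taxpayer_id
        for t in taxpayers
        if t.is_interesting and t.coercive_prob < max_coercive_prob
    ]


pool = [
    Taxpayer("A", True, 0.8),
    Taxpayer("B", True, 0.1),
    Taxpayer("C", False, 0.3),
    Taxpayer("D", True, 0.4),
]
# A stricter threshold favours collectability; a looser one keeps more audits.
for threshold in (0.2, 0.5, 1.0):
    print(threshold, targeted_selection(pool, threshold))
```

With these toy values the selection grows from `['B']` at 0.2 to `['B', 'D']` at 0.5 and `['A', 'B', 'D']` at 1.0; taxpayer C is never selected because the first model does not flag it as interesting.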

      Does the tradeoff sketched above actually work?

      In our case, with the ensemble model we would claim, on average, €26,219 from each selected taxpayer and would hopefully collect, on average, €17,542 from each of them; only 25% of these taxpayers are predicted to be subject to coercive procedures.
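Figures such as the average claim, the average collected amount and the share of taxpayers under coercive procedures are summaries over the selected group. A hedged sketch, assuming per-taxpayer claimed and collected amounts and coercive flags (predicted or observed) are available as plain lists; the function name and the toy numbers are hypothetical and do not reproduce the chapter's data.

```python
def selection_metrics(claims, collected, coercive_flags):
    """Summarise a selected group of taxpayers.

    claims         -- tax claimed from each selected taxpayer (EUR)
    collected      -- tax collected from each of them (EUR)
    coercive_flags -- True if the taxpayer goes through coercive procedures
    """
    n = len(claims)
    if n == 0 or not (len(collected) == len(coercive_flags) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    return {
        "total_claim": sum(claims),
        "average_claim": sum(claims) / n,
        "average_collected": sum(collected) / n,
        "coercive_rate": sum(coercive_flags) / n,  # True counts as 1
    }


# Illustrative toy numbers only, not the chapter's data.
metrics = selection_metrics(
    claims=[30_000, 20_000, 10_000, 40_000],
    collected=[30_000, 5_000, 10_000, 25_000],
    coercive_flags=[False, True, False, False],
)
print(metrics)
```

Computing the same dictionary for each candidate strategy makes it straightforward to compare them on the collectable claim rather than on the raw claim alone.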

Graphs depict the total tax claim, the average tax claim and the coercive procedure rate.

      In a hypothetical selection process, the winning strategy would then be to use the ensemble model, since it maximizes the collectable tax claim.

      Ideally, we would depict the two models’ behavior as a function of the unknown parameters θ′ and θ″, respectively; that is, we would calculate the expected tax revenue for every value of θ′ and θ″. Unfortunately, this cannot be done, because the same parameter value is compatible with very different amounts of collected tax. To understand why, suppose that, for both models, only one of the selected taxpayers turns out to be subject to coercive procedures. If this taxpayer’s debt is high, the amount of money that is difficult to collect is high; if the debt is low, the uncollected tax is also low.
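The argument above can be made concrete by computing, for a fixed group of selected taxpayers, the range of tax that could be collected when a given number of them go through coercive procedures. A minimal sketch, assuming (hypothetically, as a simplification) that a taxpayer under coercive procedures pays nothing and every other taxpayer pays the full claim; `collected_bounds` and the toy claims are illustrative and not taken from the chapter.

```python
def collected_bounds(claims, n_coercive):
    """Minimum and maximum tax collected when exactly n_coercive of the
    selected taxpayers go through coercive procedures.

    Simplifying assumption (hypothetical): a taxpayer under coercive
    procedures pays nothing, every other taxpayer pays the full claim.
    """
    if not 0 <= n_coercive <= len(claims):
        raise ValueError("n_coercive must be between 0 and len(claims)")
    ordered = sorted(claims)
    total = sum(ordered)
    # Worst case: the n_coercive largest debts are the uncollected ones.
    minimum = total - sum(ordered[len(ordered) - n_coercive:])
    # Best case: the n_coercive smallest debts are the uncollected ones.
    maximum = total - sum(ordered[:n_coercive])
    return minimum, maximum


claims = [30_000, 20_000, 10_000, 40_000]
for k in range(len(claims) + 1):
    # Share of coercive cases, then (minimum, maximum) collected tax.
    print(k / len(claims), collected_bounds(claims, k))
```

Even with the number of coercive cases fixed, the collected amount can land anywhere between the two bounds, which is why a single curve per model cannot be drawn; plotting the bounds against the coercive share gives a band of the kind described below.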

Graph depicts the models’ maximum and minimum collected tax.

      The first model’s maximum and minimum values are represented by the red and orange lines, while the ensemble model’s are represented by the blue and purple ones. Any point between the red and orange lines is a possible outcome for the first model, and any point between the blue and purple lines is a possible outcome for the ensemble model. For instance, points A and B represent the actual outcomes of our two models (the first and the ensemble model, respectively), given our training and test sets.

      The fact that each model’s possible outcomes fill an area, rather than lying on a single curve, means that the models’ behavior is determined not only by θ′ and θ″, but also by which taxpayers go through a coercive procedure. If we could shrink the areas between the red and orange lines and between the blue and purple ones, we would be in a better position.

      How could we do this? If we go back to points A and B in Figure 1.14 and draw a dashed vertical line through each of them, we can see that point A is nearer to the minimum line of its model (its dashed segment to the minimum line is shorter than the one to the maximum line), while point B is nearer to the maximum line of its model (its dashed segment to the maximum line is shorter than the one to the minimum line).
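Comparing the lengths of the two dashed segments amounts to asking where each observed outcome sits between its model's minimum and maximum lines. A small sketch of that comparison; `relative_position` and the example values are hypothetical and are not the figures behind points A and B.

```python
def relative_position(actual, minimum, maximum):
    """Where an observed outcome sits inside its model's band.

    Returns 0.0 on the minimum line and 1.0 on the maximum line;
    values below 0.5 mean the point is nearer the minimum line.
    """
    if maximum <= minimum:
        raise ValueError("band must have positive width (maximum > minimum)")
    if not minimum <= actual <= maximum:
        raise ValueError("actual outcome must lie inside the band")
    return (actual - minimum) / (maximum - minimum)


# Hypothetical outcomes inside a band running from 60,000 to 90,000 EUR.
print(relative_position(65_000, 60_000, 90_000))  # nearer the minimum line
print(relative_position(85_000, 60_000, 90_000))  # nearer the maximum line
```

Normalising by the band's width lets outcomes from the two models be compared on the same 0 to 1 scale even though their bands have different heights.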