Recommender systems learn from historical data that is often non-uniformly distributed across items, so they may end up suggesting popular items more often than niche ones. This can reduce user interest and degrade several qualities of the recommended lists (e.g., novelty, coverage, diversity), affecting the future success of the platform. In this paper, we formalize two novel metrics that quantify how equally a recommender system treats items along the popularity tail. The first encourages an equal probability of being recommended across items, while the second encourages equal true positive rates across items. We then characterize the recommendations of representative algorithms with respect to the proposed metrics and show that both an item's probability of being recommended and its true positive rate are directly proportional to its popularity. To mitigate the influence of popularity, we propose an in-processing approach that minimizes the correlation between user-item relevance and item popularity, leading to a more equal treatment of items along the popularity tail. Extensive experiments show that, with small losses in accuracy, our popularity-debiasing approach leads to important gains in beyond-accuracy recommendation quality.
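
To make the debiasing idea concrete, the following is a minimal sketch of a correlation penalty between predicted user-item relevance and item popularity. It is an illustration under stated assumptions, not the exact formulation used in this paper: the function names `popularity_correlation` and `regularized_loss`, the use of the absolute Pearson correlation, and the additive `weight` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def popularity_correlation(scores, popularity):
    """Pearson correlation between predicted relevance and item popularity.

    scores:     (n_users, n_items) array of predicted relevance values
                (hypothetical model output).
    popularity: (n_items,) array of per-item interaction counts.
    """
    scores = np.asarray(scores, dtype=float)
    popularity = np.asarray(popularity, dtype=float)
    # Pair every user-item score with the popularity of its item.
    pop = np.broadcast_to(popularity, scores.shape).ravel()
    s = scores.ravel()
    s_centered = s - s.mean()
    p_centered = pop - pop.mean()
    denom = np.sqrt((s_centered ** 2).sum() * (p_centered ** 2).sum())
    # A constant input has undefined correlation; treat it as 0 here.
    if denom == 0:
        return 0.0
    return float((s_centered * p_centered).sum() / denom)


def regularized_loss(accuracy_loss, scores, popularity, weight=0.5):
    """Accuracy loss plus a penalty on |correlation| (assumed additive form)."""
    return accuracy_loss + weight * abs(popularity_correlation(scores, popularity))


# Toy data: scores that simply copy item popularity are maximally correlated.
popularity = np.array([1.0, 2.0, 3.0, 4.0])
biased_scores = np.tile(popularity, (3, 1))
flat_scores = np.ones((3, 4))
```

In this sketch, a model whose scores track popularity exactly incurs the full penalty, while a model whose scores carry no popularity signal incurs none. The `weight` parameter is one hypothetical way to trade accuracy against equal treatment of items.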