Optimal Labor Income Taxation

CHAPTER 7
Optimal Labor Income Taxation

Thomas Piketty* and Emmanuel Saez†,‡
* Paris School of Economics, Paris, France
† Department of Economics, University of California, 530 Evans Hall #3880, Berkeley, CA 94720, USA
‡ National Bureau of Economic Research, USA

Handbook of Public Economics, Volume 5. © 2013 Elsevier B.V. All rights reserved. ISSN 1573-4420, http://dx.doi.org/10.1016/B978-0-444-53759-1.00007-8

Contents
1. Introduction
2. Background on Actual Tax Systems and Optimal Tax Theory
  2.1. Actual Tax Systems
  2.2. History of the Field of Optimal Income Taxation
3. Conceptual Background
  3.1. Utilitarian Social Welfare Objective
  3.2. Fallacy of the Second Welfare Theorem
  3.3. Labor Supply Concepts
4. Optimal Linear Taxation
  4.1. Basic Model
  4.2. Accounting for Actual Tax Rates
  4.3. Tax Avoidance
  4.4. Income Shifting
5. Optimal Nonlinear Taxation
  5.1. Optimal Top Tax Rate
    5.1.1. Standard Model
    5.1.2. Rent-Seeking Effects
    5.1.3. International Migration
    5.1.4. Empirical Evidence on Top Incomes and Top Tax Rates
  5.2. Optimal Nonlinear Schedule
    5.2.1. Continuous Model of Mirrlees
    5.2.2. Discrete Models
  5.3. Optimal Profile of Transfers
    5.3.1. Intensive Margin Responses
    5.3.2. Extensive Margin Responses
    5.3.3. Policy Practice
6. Extensions
  6.1. Tagging
  6.2. Supplementary Commodity Taxation
  6.3. In-Kind Transfers
  6.4. Family Taxation
  6.5. Relative Income Concerns
  6.6. Other Extensions
7. Limits of the Welfarist Approach and Alternatives
  7.1. Issues with the Welfarist Approach
  7.2. Alternatives
Appendix
  A.1 Formal Derivation of the Optimal Nonlinear Tax Rate
  A.2 Optimal Bottom Tax Rate in the Mirrlees Model
Acknowledgments
References

1. INTRODUCTION

This handbook chapter considers optimal labor income taxation, that is, the fair and efficient distribution of the tax burden across individuals with different earnings. A large academic literature has developed models of optimal tax theory to cast light on this issue. Models in optimal tax theory typically posit that the tax system should maximize a social welfare function subject to a government budget constraint, taking into account how individuals respond to taxes and transfers. Social welfare is larger when resources are more equally distributed, but redistributive taxes and transfers can negatively affect incentives to work and earn income in the first place. This creates the classical trade-off between equity and efficiency which is at the core of the optimal labor income tax problem.

In this chapter, we present recent developments in the theory of optimal labor income taxation. We emphasize connections between theory and empirical work that were previously largely absent from the optimal income tax literature. Therefore, throughout the chapter, we focus less on formal modeling and rigorous derivations than was done in previous surveys on this topic (Atkinson & Stiglitz, 1980; Kaplow, 2008; Mirrlees (1976, 1986, chap. 24); Stiglitz, 1987, chap. 15; Tuomala, 1990) and we try to systematically connect the theory to both real policy debates and empirical work on behavioral responses to taxation.1 This chapter limits itself to the analysis of optimal labor income taxation and related means-tested transfers.2

First, we provide historical and international background on labor income taxation and transfers. In our view, knowing actual tax systems and understanding their history and the key policy debates driving their evolution is critical to guide theoretical modeling and successfully capture the first order aspects of the optimal tax problem.
We also briefly review the history of the field of optimal labor income taxation to place our chapter in its academic context.

1 Boadway (2012) also provides a recent, longer, and broader survey that aims at connecting theory to practice.
2 The analysis of optimal capital income taxation naturally involves dynamic considerations and is covered in the chapter by Kopczuk in this volume.

Second, we review the theoretical underpinnings of the standard optimal income tax approach, such as the social welfare function, the fallacy of the second welfare theorem, and hence the necessity of tackling the equity-efficiency trade off. We also present the key parameters capturing labor supply responses as they determine the efficiency costs of taxation and hence play a crucial role in optimal tax formulas.

Third, we present the simple model of optimal linear taxation. Considering linear labor income taxation simplifies considerably the exposition but still captures the key equity-efficiency trade-off. The derivation and the formula for the optimal linear tax rate are also closely related to the more complex nonlinear case, showing the tight connection between the two problems. The linear tax model also allows us to consider extensions such as tax avoidance and income shifting, random earnings, and median voter tax equilibria in a simpler way.

Fourth, we consider optimal nonlinear income taxation with particular emphasis on the optimal top tax rate and the optimal profile of means-tested transfers at the bottom. We consider several extensions including extensive labor supply responses, international migration, or rent-seeking models where pay differs from productivity.
Fifth, we consider additional deeper extensions of the standard model including tagging (i.e., conditioning taxes and transfers on characteristics correlated with ability to earn), the use of differential commodity taxation to supplement the income tax, the use of in-kind transfers (instead of cash transfers), the treatment of couples and children in tax and transfer systems, or models with relative income concerns. Many of those extensions cannot be satisfactorily treated within the standard utilitarian social welfare approach. Hence, in a number of cases, we present the issues only heuristically and leave formal full-fledged modeling to future research.

Sixth and finally, we come back to the limitations of the standard utilitarian approach that have appeared throughout the chapter. We briefly review the most promising alternatives. While many recent contributions use general Pareto weights to avoid the strong assumptions of the standard utilitarian approach, the Pareto weight approach is too general to deliver practical policy prescriptions in most cases. Hence, it is important to make progress both on normative theories of justice stating how social welfare weights should be set and on positive analysis of how individual views and beliefs about redistribution are formed.

Methodologically, a central goal of optimal tax analysis should be to cast light on actual tax policy issues and help design better tax systems. Theory and technical derivations are very valuable to rigorously model the problem at hand. A key aim of this chapter is to show how to make such theoretical findings applicable. As argued in Diamond and Saez (2011), theoretical results in optimal tax analysis are most useful for policy recommendations when three conditions are met. (1) Results should be based on economic mechanisms that are empirically relevant and first order to the problem at hand.
(2) Results should be reasonably robust to modeling assumptions and in particular to the presence of heterogeneity in individual preferences. (3) The tax policy prescription needs to be implementable, that is, the tax policy needs to be relatively easy to explain and discuss publicly, and not too complex to administer relative to actual practice.3

Those conditions lead us to adopt two methodological choices. First, we use the “sufficient statistics” approach whereby optimal tax formulas are derived and expressed in terms of estimable statistics including social marginal welfare weights capturing society’s value for redistribution and labor supply elasticities capturing the efficiency costs of taxation (see Chetty, 2009a for a recent survey of the “sufficient statistics” approach in public economics). This approach allows us to understand the key economic mechanisms behind the formulas, helping meet condition (1). The “sufficient statistics” formulas are also often robust to changes in the primitives of the model, which satisfies condition (2). Second, we tend to focus on simple tax structures (e.g., a linear income tax) without systematically trying to derive the most general tax system possible. This helps meet condition (3) as the tax structures we obtain will by definition be within the realm of existing tax structures.4 This is in contrast to the “mechanism design” approach that derives the most general optimum tax compatible with the informational structure. This “mechanism design” approach tends to generate tax structures that are highly complex and results that are sensitive to the exact primitives of the model. The mechanism design approach has received renewed interest in the new dynamic public finance literature that focuses primarily on dynamic aspects of taxation.5

The chapter is organized as follows.
Section 2 provides historical and international background on labor income taxation and means-tested transfers, and a short review of the field of optimal labor income taxation. Section 3 presents the key concepts: the standard utilitarian social welfare approach, the fallacy of the second welfare theorem, and the key labor supply concepts. Section 4 discusses the optimal linear income tax problem. Section 5 presents the optimal nonlinear income taxation problem with particular emphasis on the optimal top tax rate and the optimal profile of means-tested transfers. Section 6 considers a number of extensions. Section 7 discusses limits of the standard utilitarian approach.

3 Naturally, the set of possible tax systems evolves over time with technological progress. If more complex tax innovations become feasible and can realistically generate large welfare gains, they are certainly worth considering.
4 The simple tax structure approach also helps with conditions (1) and (2) as the economic trade-offs are simpler and more transparent, and the formulas for simple tax structures tend to easily generalize to heterogeneous populations.
5 See Golosov, Tsyvinski, and Werning (2006) and Kocherlakota (2010) for recent surveys of the new dynamic public finance literature. Piketty and Saez (2012a,b) analyze the problem of optimal taxation of capital and inheritances in a dynamic model but using a sufficient statistics approach and focusing on simple tax structures.

2. BACKGROUND ON ACTUAL TAX SYSTEMS AND OPTIMAL TAX THEORY

2.1. Actual Tax Systems

Taxes. Most advanced economies in the OECD raise between 35% and 50% of national income (GNP net of capital depreciation) in taxes. As a first approximation, the share
of total tax burden falling on capital income roughly corresponds to the share of capital income in national income (i.e., about 25%).6 The remaining 75% of taxes falls on labor income (OECD 2011a),7 which is the part we are concerned with in this chapter.

Historically, the overall tax to national income ratio has increased substantially during the first part of the 20th century in OECD countries from about 10% on average around 1900 to around 40% by 1970 (see e.g., Flora, 1983 for long time series up to 1975 for a number of Western European countries and OECD, Revenue Statistics, OECD, 2011a for statistics since 1965). Since the late 1970s, the tax burden in OECD countries has been roughly stable. The share of taxes falling on capital income has declined slightly in Europe and has been approximately stable in the United States.8 Similar to the historical evolution, tax revenue to national income ratios increase with GDP per capita when looking at the current cross-section of countries. Tax to national income ratios are smaller in less developed and developing countries and higher on average among the most advanced economies.

To a first approximation, the tax burden is distributed proportionally to income. Indeed, the historical rise in the tax burden has been made possible by the ability of the government to monitor income flows in the modern economy and hence impose payroll taxes, profits taxes, income taxes, and value-added-taxes, based on the corresponding income and consumption flows. Before the 20th century, the government was largely limited to property and presumptive taxes, and taxes on a few specific goods for which transactions were observable. Such archaic taxes severely limited the tax capacity of the government and tax to national income ratios were low (see Ardant, 1971 and Webber & Wildavsky, 1986 for a detailed history of taxation).
The transition from archaic to broad-based taxes involves complex political and administrative processes and may occur at various speeds in different countries.9

6 This is defining taxes on capital as the sum of property and wealth taxes, inheritance and gift taxes, taxes of corporate and business profits, individual income taxes on individual capital income, and the share of consumption taxes falling on capital income. Naturally, there are important variations over time and across countries in the relative importance of these various capital tax instruments. See e.g., Piketty and Saez (2012a).
7 Including payroll taxes, individual income tax on labor income, and the share of consumption taxes falling on labor income.
8 Again, there are important variations in capital taxes which fall beyond the scope of this chapter. In particular, corporate tax rates have declined significantly in Europe since the early 1990s (due to tax competition), but tax revenues have dropped only slightly, due to a global rise in the capital share, the causes of which are still debated. See e.g., Eurostat (2012).
9 See e.g., Piketty and Qian (2009) for a contrast between China (where the income tax is about to become a mass tax, like in developed countries) and India (where the income tax is still very much an elite tax raising limited revenue). Cagé and Gadenne (2012) provide a comprehensive empirical analysis of the extent to which low- and middle-income countries were able to replace declining trade tax revenues by modern broad based taxes since the 1970s. See Kleven, Kreiner and Saez (2009b) for a theoretical model of the fiscal modernization process.

[Figure 1: Top Individual Income Marginal Tax Rates 1900-2011; series for the U.S., U.K., France, and Germany; vertical axis from 0% to 100%, horizontal axis from 1900 to 2010.]
Figure 1 Top Marginal income tax rates in the US, UK, France, Germany. This figure, taken from Piketty et al. (2011), depicts the top marginal individual income tax rate in the US, UK, France, Germany since 1900. The tax rate includes only the top statutory individual income tax rate applying to ordinary income with no tax preference. State income taxes are not included in the case of the United States. For France, we include both the progressive individual income tax and the flat rate tax “Contribution Sociale Généralisée.”

In general, actual tax systems achieve some tax progressivity, i.e., tax rates rising with income, through the individual income tax. Most individual income tax systems have brackets with increasing marginal tax rates. In contrast, payroll taxes or consumption taxes tend to have flat rates. Most OECD countries had very progressive individual income taxes in the post-World War II decades with a large number of tax brackets and high top tax rates (see e.g., OECD, 1986). Figure 1 depicts the top marginal income tax rate in the United States, the United Kingdom, France, and Germany since 1900. When progressive income taxes were instituted (around 1900–1920 in most developed countries), top rates were very small, typically less than 10%. They rose very sharply in the 1920–1940s, particularly in the US and in the UK. Since the late 1970s, top tax rates on upper income earners have declined significantly in many OECD countries, again particularly in English speaking countries. For example, the US top marginal federal individual tax rate stood at an astonishingly high 91% in the 1950–1960s but is only 35% today (Figure 1).
Progressivity at the very top is often counterbalanced by the fact that a substantial fraction of capital income receives preferential tax treatment under most income tax rules.10 As we shall see, optimal nonlinear labor income tax theory derives a simple formula for the optimal tax rate at the top of the earnings distribution. We will not deal however with the dynamic redistributive impact of tax progressivity through capital and wealth taxation, which might well have been larger historically than its static impact, as suggested by the recent literature on the long run evolution of top income shares.11

10 For example, Landais, Piketty, and Saez (2011) show that tax rates decline at the very top of the French income distribution because of such preferential tax treatment and of various tax loopholes and fiscal optimization strategies. In the United States as well, income tax rates decline at the very top due to the preferential treatment of realized capital gains which constitute a large fraction of top incomes (US Treasury, 2012). See Piketty and Saez (2007) for an analysis of progressivity of the federal tax system since 1960. Note that preferential treatment for capital income did not exist when modern income taxes were created in 1900–1920. Preferential treatment was developed mostly in the postwar period in order to favor savings and reconstruction, and then extended since the 1980–1990s in the context of financial globalization and tax competition. For a detailed history in the case of France, see Piketty (2001).
11 See Atkinson, Piketty and Saez (2011) for a recent survey. One of the main findings of this literature is that the historical decline in top income shares that occurred in most countries during the first half of the twentieth century has little to do with a Kuznets-type process. It was largely due to the fall of top capital incomes, which apparently never fully recovered from the 1914–1945 shocks, possibly because of the rise of progressive income and estate taxes and their dynamic impact on savings, capital accumulation and wealth concentration.

Table 1 Public Spending in OECD Countries (2000–2010, Percent of GDP)

                                   US      Germany  France   UK       Total OECD
                                   (1)     (2)      (3)      (4)      (5)
Total public spending              35.4%   44.1%    51.0%    42.1%    38.7%
Social public spending             22.4%   30.6%    34.3%    26.2%    25.1%
  Education                        4.7%    4.4%     5.2%     4.8%     4.9%
  Health                           7.7%    7.8%     7.1%     6.1%     5.6%
  Pensions                         6.0%    10.1%    12.2%    4.8%     6.5%
  Income support to working age    2.7%    3.9%     4.8%     4.9%     4.4%
  Other social public spending     1.3%    4.4%     5.1%     5.7%     3.7%
Other public spending              13.0%   13.5%    16.7%    15.9%    13.6%

Notes and sources: OECD Economic Outlook 2012, Annex Tables 25–31; Adema et al., 2011, Table 1.2; Education at a Glance, OECD 2011, Table B4.1. Total public spending includes all government outlays (except net debt interest payments). Other social public spending includes social services to the elderly and the disabled, family services, housing and other social policy areas (see Adema et al., 2011, p.21). We report 2000–2010 averages so as to smooth business cycle variations. Note that tax to GDP ratios are a little bit lower than spending to GDP ratios for two reasons: (a) governments typically run budget deficits (which can be large, around 5–8 GDP points during recessions), (b) governments get revenue from non-tax sources (such as user fees, profits from government owned firms, etc.).

Transfers. The secular rise in taxes has been used primarily to fund growing public goods and social transfers in four broad areas: education, health care, retirement and disability, and income security (see Table 1). Indeed, aside from those four areas, government spending (as a fraction of GDP) has not grown substantially since 1900. All advanced economies provide free public education at the primary and secondary level, and heavily subsidized (and often almost free) higher education.12 All advanced economies except the United States provide universal public health care (the United States provides public health care to the old and the poor through the Medicare and Medicaid programs respectively, which taken together happen to be more expensive than most universal health care systems), as well as public retirement and disability benefits. Income security programs include unemployment benefits, as well as an array of means-tested transfers (both cash and in-kind). They are a relatively small fraction of total transfers (typically less than 5% of GDP, out of a total around 25–35% of GDP for social spending as a whole; see Table 1).

12 Family benefits can also be considered as part of education spending. Note that the boundaries between the various social spending categories reported on Table 1 are not entirely homogenous across OECD countries (e.g., family benefits are split between “Income support to the working age” and “Other social public spending”). Also differences in tax treatment of transfers further complicate cross country comparisons. Here we simply care about the broad orders of magnitude. For a detailed cross-country analysis, see Adema, Fron, and Ladaique (2011).

Education, family benefits, and health care government spending are approximately a demogrant, that is, a transfer of equal value for all individuals in expectation over a lifetime.13 In contrast, retirement benefits are approximately proportional to lifetime labor income in most countries.14 Finally, income security programs are targeted to lower income individuals. This is therefore the most redistributive component of the transfer system.

Income security programs often take the form of in-kind benefits such as subsidized housing, subsidized food purchases (e.g., food stamps and free lunches at school in the United States), or subsidized health care (e.g., Medicaid in the United States). They are also often targeted to special groups such as the unemployed (unemployment insurance), the elderly or disabled with no resources (for example Supplemental Security Income in the United States). Means-tested cash transfer programs for “able bodied” individuals are only a small fraction of total transfers. To a large extent, the rise of the modern welfare state is the rise of universal access to “basic goods” (education, health, retirement and social insurance), and not the rise of cash transfers (see e.g., Lindert, 2004).15

In recent years, traditional means-tested cash welfare programs have been partly replaced by in-work benefits. The shift has been particularly large in the United States and the United Kingdom. Traditional means-tested programs are L-shaped with income. They provide the largest benefits to those with no income and those benefits are then phased-out at high rates for those with low earnings. Such a structure concentrates benefits among those who need them most. At the same time and as we shall see, these phase-outs discourage work as they create large implicit taxes for low earners. In contrast, in-work benefits are inversely U-shaped, first rising and then declining with earnings. Benefits are nil for those with no earnings and concentrated among low earners before being phased-out. Such a structure encourages work but fails to provide support to those with no earnings, arguably those most in need of support.
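The contrast between the two benefit profiles can be illustrated with a short numerical sketch. The Python snippet below is only an illustration under assumed parameters: the guarantee level, the phase-in and phase-out rates, and the function names are hypothetical and are not taken from any actual program or from this chapter. It computes a traditional (L-shaped) benefit, an in-work (inverse U-shaped) benefit, and the implicit marginal tax rate created by the withdrawal of each benefit.

```python
def traditional_benefit(z, guarantee=10_000.0, phase_out=0.7):
    """L-shaped transfer (hypothetical numbers): largest at zero earnings,
    then withdrawn at rate `phase_out` per unit of earnings z."""
    return max(guarantee - phase_out * z, 0.0)


def in_work_benefit(z, phase_in=0.4, max_credit=4_000.0,
                    plateau_end=15_000.0, phase_out=0.2):
    """Inverse U-shaped transfer (hypothetical numbers): zero at zero
    earnings, rising with earnings, flat on a plateau, then phased out."""
    rising = min(phase_in * z, max_credit)
    falling = phase_out * max(z - plateau_end, 0.0)
    return max(rising - falling, 0.0)


def implicit_marginal_tax(benefit, z, dz=1.0):
    """Implicit marginal tax rate from benefit withdrawal alone:
    minus the change in the benefit per extra unit of earnings."""
    return -(benefit(z + dz) - benefit(z)) / dz


if __name__ == "__main__":
    for z in (0, 5_000, 12_000, 20_000, 40_000):
        print(
            z,
            round(traditional_benefit(z), 2),
            round(in_work_benefit(z), 2),
            round(implicit_marginal_tax(traditional_benefit, z), 3),
            round(implicit_marginal_tax(in_work_benefit, z), 3),
        )
```

Under these assumed parameters, the traditional program imposes a 70% implicit tax on the first units of earnings, while the in-work benefit acts as a 40% subsidy at low earnings but pays nothing to someone with no earnings.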
13 Naturally, higher income individuals are often better able to navigate the public education and health care systems and hence tend to get a better value out of those benefits than lower income individuals. However, the value of those benefits certainly grows less than proportionally to income.
14 In most countries, benefits are proportional to payroll tax contributions. Some countries, such as the United Kingdom, provide a minimum pension that is closer to a demogrant.
15 It should be noted that the motivation behind the historical rise of these public services has to do not only with redistributive objectives, but also with the perceived failure of competitive markets in these areas (e.g., regarding the provision of health insurance or education). We discuss issues of individual and market failures in Section 6 below.

Overall, all transfers taken together are fairly close to a demogrant, i.e., are about constant with income. Hence, the optimal linear tax model with a demogrant is a reasonable first order approximation of actual tax systems and is useful to understand how the level of taxes and transfers should be set. At a finer level, there is variation in the profile of transfers. Such a profile can be analyzed using the more complex nonlinear optimal tax models.

Budget Set. The budget set relating pre-tax and pre-transfers earnings to post-tax post-transfer disposable income summarizes the net impact of the tax and transfer system. The slope of the budget set captures the marginal incentive to work. Figure 2 depicts the budget set for a single parent with two children in France and the United States. The figure includes all payroll taxes and the income tax, on the tax side. It includes means-tested transfer programs (TANF and Food Stamps in the United States, and the minimum income, RSA, for France) and tax credits (the Earned Income Tax Credit and the Child Tax Credit in the United States, in-work benefit Prime pour l’Emploi and cash family benefits in France). France offers more generous support to single parents with no earnings but the French tax and transfer system imposes higher implicit taxes on work.16 As mentioned above, optimal nonlinear income tax theory precisely tries to assess what is the most desirable profile for taxes and transfers.

[Figure 2: disposable income ($0 to $50,000, vertical axis) against gross earnings with employer payroll taxes ($0 to $50,000, horizontal axis); series for the US, France, and the 45 degree line.]
Figure 2 Tax/transfer system in the US and France, 2010, single parent with two children. The figure depicts the budget set for a single parent with two children in France and the United States (exchange rate 1 Euro = $1.3). The figure includes payroll taxes and income taxes on the tax side. It includes means-tested transfer programs (TANF and Food stamps in the United States, and the minimum income, RSA, for France) and tax credits (the Earned Income Tax Credit and the Child Tax Credit in the United States, in-work benefit Prime pour l’Emploi and cash family benefits in France). Note that this graph ignores important elements. First, the health insurance Medicaid program in the United States is means tested and adds a significant layer of implicit taxation on low income work. France offers universal health insurance which does not create any additional implicit tax on work. Second, the graph ignores in-kind benefits for children such as subsidized child care and free pre-school kindergarten in France that have significant value for working single parents. Such programs barely exist in the United States. Third, the graph ignores temporary unemployment insurance benefits which depend on previous earnings for those who have become recently unemployed and which are significantly more generous in France both in level and duration.

Policy Debate.
At the center of the political debate on labor income taxation and transfers is the equity-efficiency trade off. The key argument in favor of redistribution through progressive taxation and generous transfers is that social justice requires the most successful to contribute to the economic well being of the less fortunate. The reasons why society values such redistribution from high to low incomes are many. As we shall see, the standard utilitarian approach posits that marginal utility of consumption decreases with income so that a more equal distribution generates higher social welfare. Another and perhaps more realistic reason is that differences in earnings arise not only from differences in work behavior (over which individuals have control) but also from differences in innate ability or family background or sheer luck (over which individuals have little control). The key argument against redistribution through taxes and transfers is efficiency. Taxing the rich to fund means-tested programs for the poor reduces the incentives to work both among the rich and among transfer recipients. In the standard optimal tax theory, such responses to taxes and transfers are costly solely because of their effect on government finances.

16 Note that this graph ignores important elements. First, the health insurance Medicaid program in the United States is means-tested and adds a significant layer of implicit taxation on low income work. France offers universal health insurance which does not create any additional implicit tax on work. Second, the graph ignores in-kind benefits for children such as subsidized child care and free pre-school kindergarten in France that have significant value for working single parents. Such programs barely exist in the United States. Third, the graph ignores housing benefits, which are substantial in France. Fourth, the graph ignores temporary unemployment insurance benefits which depend on previous earnings for those who have become recently unemployed and which are significantly more generous in France both in level and duration. Finally, this graph ignores consumption taxes, implying that the cutoff income level below which transfers exceed taxes is significantly overestimated. This cutoff also greatly varies with the family structure (e.g., able bodied single individuals with no dependents receive zero cash transfers in the US but significant transfers in France).

Do Economists Matter? The academic literature in economics does play a role, although often an indirect one, in shaping the debate on tax and transfer policy. In the 1900–1910s, when modern progressive income taxes were created, economists appear to have played a role, albeit a modest one. Utilitarian economists like Jevons, Edgeworth, and Marshall had long argued that the principles of marginal utility and equal sacrifice push in favor of progressive tax rates (see e.g., Edgeworth, 1897), but such theoretical results had little impact on the public debate. Applied economists like Seligman wrote widely translated and read books and reports (see e.g., Seligman, 1911) arguing that progressive income taxation was not only fair but also economically efficient and administratively manageable.17 Such arguments expressed in terms of practical economic and administrative rationality helped to convince reluctant mainstream economists in many countries that progressive income taxation was worth considering.18 In the 1920–1940s, the rise of top tax rates seems to have been the product of public debate and political conflict, in the context of chaotic political, financial, and social situations, rather than the outcome of academic arguments.
It is worth noting, however, that a number of US economists of the time, e.g., Irving Fisher, then president of the American Economic Association, repeatedly argued that concentration of income and wealth was becoming as dangerously excessive in America as it had been for a long time in Europe, and called for steep tax progressivity (see e.g., Fisher, 1919). It is equally difficult to know whether economists had a major impact on the great reversal in top tax rates that occurred in the 1970–1980s during the Thatcher and Reagan conservative revolutions in Anglo-Saxon countries. The influential literature showing that top tax rate cuts can generate large responses of reported taxable income came after top tax rate cuts (e.g., Feldstein, 1995).

Today, most governments also draw on the work of commissions, panels, or reviews to justify tax and transfer reforms. Such reviews often play a big role in the public debate. They are sometimes commissioned by the government itself (e.g., the President’s Advisory Panel on Federal Tax Reform in the United States, US Treasury, 2005), by independent policy research institutes (e.g., the Mirrlees review on Reforming the Tax System for the 21st Century in the United Kingdom, Mirrlees (2010, 2011)), or proposed by independent academics (e.g., Landais et al., 2011 for France). Such reviews always involve tax scholars who draw on the academic economic literature to shape their recommendations.19 The press also consults tax scholars to judge the merits of reforms proposed by politicians, and tax scholars naturally use findings from the academic literature when voicing their views.

2.2. History of the Field of Optimal Income Taxation

We offer here only a brief overview covering solely optimal income taxation.20 The modern analysis of optimal income taxation started with Mirrlees (1971) who rigorously posed and solved the problem.
17 See e.g., Mehrotra (2005) for a longer discussion of the role of Seligman on US tax policy at the beginning of the 20th century.
18 This is particularly true in countries like France where mainstream laissez-faire economists had little sympathy for Anglo-Saxon utilitarian arguments, and were originally very hostile to tax progressivity, which they associated with radical utopia and with the French Revolution. See e.g., Delalande (2011a,b, pp. 166-170).
19 Boadway (2012), Chapter 1 provides a longer discussion of the role played by such reviews.
20 For a survey of historical fiscal doctrine in general see Musgrave (1985, chap. 1). For a more complete overview of modern optimal tax theory, see Boadway (2012), Chapter 2.

He considered the maximization of a social welfare function based on individual utilities subject to a government budget constraint and incentive constraints arising from individuals’ labor supply responses to the tax system.21 Formally, in the Mirrlees model, people differ solely through their skill (i.e., their wage rate). The government wants to redistribute from high skill to low skill individuals but can only observe earnings (and not skills). Hence, taxes and transfers are based on earnings, leading to a non-degenerate equity-efficiency trade off. Mirrlees (1971) had an enormous theoretical influence in the development of contract and information theory, but little influence in actual policy making as the general lessons for optimal tax policy were few. The most striking and discussed result was the famous zero marginal tax rate at the top. This zero-top result was established by Sadka (1976) and Seade (1977). In addition, if the minimum earnings level is positive with no bunching of individuals at the bottom, the marginal tax rate is also zero at the bottom (Seade, 1977).
A third result obtained by Mirrlees (1971) and Seade (1982) was that the optimal marginal tax rate is never negative if the government values redistribution from high to low earners.

Stiglitz (1982) developed the discrete version of the Mirrlees (1971) model with just two skills. In this discrete case, the marginal tax rate on the top skill is zero, making the zero-top result loom even larger than in the continuous model of Mirrlees (1971). That likely contributed to the saliency of the zero-top result. The discrete model is useful to understand the problem of optimal taxation as an information problem generating an incentive compatibility constraint for the government. Namely, the tax system must be set up so that the high skill type does not want to work less and mimic the low skill type. This discrete model is also widely used in contract theory and industrial organization. However, this discrete model has limited use for actual tax policy recommendations because it is much harder to obtain formulas expressed in terms of sufficient statistics or put realistic numbers in the discrete two skill model than in the continuous model.22

Atkinson and Stiglitz (1976) derived the very important and influential result that under separability and homogeneity assumptions on preferences, differentiated commodity taxation is not useful when earnings can be taxed nonlinearly. This famous result was influential both for shaping the field of optimal tax theory and in tax policy debates. Theoretically, it contributed greatly to shift the theoretical focus toward optimal nonlinear taxation and away from the earlier Diamond and Mirrlees (1971) model of differentiated commodity taxation (itself based on the original Ramsey (1927) contribution). Practically, it gave a strong rationale for eliminating preferential taxation of necessities on redistributive grounds, and using instead a uniform value-added-tax combined with income-based transfers and progressive income taxation. Even more importantly, the Atkinson and Stiglitz (1976) result has been used to argue against the taxation of capital income and in favor of taxing solely earnings or consumption.

21 Vickrey (1945) had proposed an earlier formalization of the problem but without solving explicitly for optimal tax formulas.
22 The Stiglitz (1987, chap. 15) handbook chapter on optimal taxation provides a comprehensive optimal tax survey using the Stiglitz (1982) discrete model. In this chapter, we will not use the Stiglitz (1982) discrete model and present instead an alternative discrete model, first developed by Piketty (1997), which generates optimal tax formulas very close to those of the continuous model, and much easier to calibrate meaningfully.

The optimal linear tax problem is technically simpler and it was known since at least Ramsey (1927) that the optimum tax rate can be expressed in terms of elasticities. Sheshinski (1972) is the first modern treatment of the optimal linear income tax problem. It was recognized early that labor supply elasticities play a key role in the optimal linear income tax rate. However, because of the disconnect between the nonlinear income tax analysis and the linear tax analysis, no systematic attempt was made to express nonlinear tax formulas in terms of estimable “sufficient statistics” until relatively recently. Atkinson (1995), Diamond (1998), Piketty (1997), and Saez (2001) showed that the optimal nonlinear tax formulas can also be expressed relatively simply in terms of elasticities.23 This made it possible to connect optimal income tax theory to the large empirical literature estimating behavioral responses to taxation.

Diamond (1980) considered an optimal tax model with participation labor supply responses, the so-called extensive margin (instead of the intensive margin of Mirrlees, 1971). He showed that the optimal marginal tax rate can actually be negative in that case.
As we shall see, this model with extensive margins has received renewed attention in the last decade. Saez (2002a) developed simple elasticity-based formulas showing that a negative marginal tax rate (i.e., a subsidy for work) is optimal at the bottom in such an extensive labor supply model.

With hindsight, it may seem obvious that the quest for theoretical results in optimal income tax theory with broad applicability was doomed to yield only limited results. We know that the efficiency costs of taxation depend on the size of behavioral responses to taxes and hence that optimal tax systems are going to be heavily dependent on the size of those empirical parameters.

In this handbook chapter, in addition to emphasizing connections between theory and practical recommendations, we also want to flag clearly areas where we feel that the theory fails to provide useful practical policy guidance. Those failures arise both because of limitations of empirical work and limitations of the theoretical framework. We discuss limitations of the standard utilitarian framework in Section 7. Another theoretical limitation arises because of behavioral considerations, i.e., the fact that individuals do not behave according to the standard utility maximization model, due to psychological effects and cognitive limitations. Such behavioral effects naturally affect the analysis and have generated an active literature both theoretical and empirical that we do not cover here (see e.g., Congdon, Mullainathan, & Schwartzstein, 2012 and the chapter by Chetty and Finkelstein in this volume for applications of behavioral economics to public economics).

23 In the field of nonlinear pricing in industrial organization, the use of elasticity-based formulas came earlier (see e.g., Wilson, 1993).

3. CONCEPTUAL BACKGROUND

3.1. Utilitarian Social Welfare Objective

The dominant approach in normative public economics is to base social welfare on individual utilities. The simplest objective is to maximize the sum of individual utilities, the so-called utilitarian (or Benthamite) objective.24

Fixed Earnings. To illustrate the key ideas, consider a simple economy with a population normalized to one and an exogenous pre-tax earnings distribution with cumulative distribution function \(H(z)\), i.e., \(H(z)\) is the fraction of the population with pre-tax earnings below \(z\). Let us assume that all individuals have the same utility function \(u(c)\) increasing and concave in disposable income \(c\) (since there is only one period, disposable income is equal to consumption). Disposable income is pre-tax earnings minus taxes on earnings so that \(c = z - T(z)\). The government chooses the tax function \(T(z)\) to maximize the utilitarian social welfare function:

\[ \mathrm{SWF} = \int_0^\infty u(z - T(z))\,dH(z) \quad \text{subject to} \quad \int_0^\infty T(z)\,dH(z) \ge E \quad (p), \]

where \(E\) is an exogenous revenue requirement for the government and \(p\) is the Lagrange multiplier of the government budget constraint. As incomes \(z\) are fixed, this is a point-wise maximization problem and the first order condition in \(T(z)\) is simply:

\[ u'(z - T(z)) = p \;\Rightarrow\; c = z - T(z) = \text{constant across } z. \]

Hence, utilitarianism with fixed earnings and concave utility implies full redistribution of incomes. The government confiscates 100% of earnings, funds its revenue requirement, and redistributes the remaining tax revenue equally across individuals. This result was first established by Edgeworth (1897). The intuition for this strong result is straightforward. With concave utilities, marginal utility \(u'(c)\) is decreasing with \(c\). Hence, if \(c_1 < c_2\) then \(u'(c_1) > u'(c_2)\) and it is desirable to transfer resources from the person consuming \(c_2\) to the person consuming \(c_1\).
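The full-redistribution result can be checked numerically. The following is a minimal sketch, assuming a hypothetical four-person earnings sample, log utility, and a per-capita revenue requirement of 10; the function names are illustrative and not part of the original exposition. It compares the equal-consumption allocation with a uniform lump-sum tax raising the same revenue.

```python
import numpy as np

def utilitarian_welfare(c, u=np.log):
    """Average utility over a discrete sample standing in for a population of measure one."""
    return float(np.mean(u(np.asarray(c, dtype=float))))

def full_redistribution(z, E):
    """Equalize consumption: c = mean(z) - E for everyone, so T(z) = z - c."""
    z = np.asarray(z, dtype=float)
    c = np.full_like(z, z.mean() - E)
    return c, z - c

# Hypothetical fixed pre-tax earnings and per-capita revenue requirement (assumptions).
z = np.array([20.0, 40.0, 60.0, 280.0])
E = 10.0

c_equal, T_equal = full_redistribution(z, E)
c_head_tax = z - E  # alternative policy: same revenue, no redistribution

print("average tax collected:", T_equal.mean())
print("welfare, equalized consumption:", utilitarian_welfare(c_equal))
print("welfare, uniform lump-sum tax:", utilitarian_welfare(c_head_tax))
```

Both policies satisfy the budget constraint with equality, so the welfare gap reflects only the concavity of the assumed utility function.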
Generalized social welfare functions of the form \(\int G(u(c))\,dH(z)\) where \(G(\cdot)\) is increasing and concave are also often considered. The limiting case where \(G(\cdot)\) is infinitely concave is the Rawlsian (or maxi-min) criterion where the government’s objective is to maximize the utility of the most disadvantaged person, i.e., maximize the minimum utility. In this simple context with fixed incomes, all those objectives also lead to 100% redistribution as in the standard utilitarian case.

Finally, with heterogeneous utility functions \(u^i(c)\) across individuals, the utilitarian optimum is such that marginal utility \(u^{i\prime}(c)\) is constant over the population. Comparing the levels of marginal utility of consumption conditional on disposable income \(z - T(z)\) across people with different preferences raises difficult issues of interpersonal utility comparisons.

24 Utilitarianism as a social justice criterion was developed by the English philosopher Bentham in the late 18th century (Bentham, 1791).

There might be legitimate reasons, such as required health expenses due to medical conditions, that make marginal utility of consumption higher for some people than for others even conditional on after-tax income \(z - T(z)\). Another legitimate reason would be the number of dependent children. Absent such need-based legitimate reasons, it seems neither feasible nor reasonable for society to discriminate in favor of those with high marginal utility of consumption (e.g., those who really enjoy consumption) against those with low marginal utility of consumption (e.g., those less able to enjoy consumption). This is not feasible because marginal utility of consumption cannot be observed and compared across individuals. Even if marginal utility were observable, it is unlikely that such discrimination would be acceptable to society (see our discussion in Section 6).
Therefore, it seems fair for the government to consider social welfare functions such that social marginal utility of consumption is the same across individuals conditional on disposable income. In the fixed earnings case, this means that the government can actually ignore individual utilities and use a “universal” social utility function \(u(c)\) to evaluate social welfare. The concavity of \(u(c)\) then reflects society’s value for redistribution rather than directly individual marginal utility of consumption.25 We will come back to this important point later on.

Endogenous Earnings. Naturally, the result of complete redistribution with concave utility depends strongly on the assumption of fixed earnings. In the real world, complete redistribution would certainly greatly diminish incentives to work and lead to a decrease in pre-tax earnings. Indeed, the goal of optimal income tax theory has been precisely to extend the basic model to the case with endogenous earnings (Vickrey, 1945 and Mirrlees, 1971). Taxation then generates efficiency costs as it reduces earnings, and the optimal tax problem becomes a non-trivial equity-efficiency trade off. Hence, with utilitarianism, behavioral responses are the sole factor preventing complete redistribution. In reality, society might also oppose complete redistribution on fairness grounds even setting aside the issue of behavioral responses. We come back to this limitation of utilitarianism in Section 6.

Let us therefore now assume that earnings are determined by labor supply and that individuals derive disutility from work. Individual \(i\) has utility \(u^i(c, z)\) increasing in \(c\) but decreasing with earnings \(z\). In that world, 100% taxation would lead everybody to completely stop working, and hence is not desirable.
Let us consider general social welfare functions of the type:

\[ \mathrm{SWF} = \int \omega^i\, G(u^i(c, z))\,d\nu(i), \]

where \(\omega^i \ge 0\) are Pareto weights independent of individual choices \((c, z)\), \(G(\cdot)\) is an increasing transformation of utilities, and \(d\nu(i)\) is the distribution of individuals.

25 Naturally, the two concepts are not independent. If individuals have very concave utilities, they will naturally support more redistribution under the “veil of ignorance,” and the government choice for \(u(c)\) will reflect those views.

The combination of arbitrary Pareto weights \(\omega^i\) and a social welfare function \(G(\cdot)\) allows us to be fully general for the moment. We denote by

\[ g^i = \frac{\omega^i\, G'(u^i)\, u_c^i}{p} \]

the social marginal welfare weight on individual \(i\), with \(p\) the multiplier of the government budget constraint. Intuitively, \(g^i\) measures the dollar value (in terms of public funds) of increasing consumption of individual \(i\) by $1. With fixed earnings, any discrepancy in the \(g^i\)’s across individuals calls for redistribution as it increases social welfare to transfer resources from those with lower \(g^i\)’s toward those with higher \(g^i\)’s. Hence, absent efficiency concerns, the government should equalize all the \(g^i\)’s.26 With endogenous earnings, the \(g^i\)’s will no longer be equalized at the optimum. As we shall see, social preferences for redistribution enter optimal tax formulas solely through the \(g^i\) weights.

Under the utilitarian objective, \(g^i = u_c^i / p\) is directly proportional to the marginal utility of consumption. Under the Rawlsian criterion, all the \(g^i\) are zero, except for the most disadvantaged.

In the simpler case with no income effects on labor supply, i.e., where utility functions take the quasi-linear form \(u^i(c, z) = v^i(c - h^i(z))\) with \(v^i(\cdot)\) increasing and concave and \(h^i(z)\) increasing and convex, the labor supply decision does not depend on non-labor income (see Section 3.3) and the average of \(g^i\) across all individuals is equal to one. This can be seen as follows.
The government is indifferent between one more dollar of tax revenue and redistributing $1 to everybody (as giving one extra dollar lump-sum does not generate any behavioral response). The value of giving $1 extra to person \(i\), in terms of public funds, is \(g^i\) so that the value of redistributing $1 to everybody is \(\int g^i\,d\nu(i)\), which must therefore equal one.

3.2. Fallacy of the Second Welfare Theorem

The second welfare theorem seems to provide a strikingly simple theoretical solution to the equity-efficiency trade off. Under standard perfect market assumptions, the second welfare theorem states that any Pareto efficient outcome can be reached through a suitable set of lump-sum taxes that depend on exogenous characteristics of each individual (e.g., intrinsic abilities or other endowments or random shocks), and the subsequent free functioning of markets with no additional government interference. The logic is very simple. If some individuals have better earnings ability than others and the government wants to equalize disposable income, it is most efficient to impose a tax (or a transfer) based on earnings ability and then let people keep 100% of their actual earnings at the margin.27

26 As we saw, under utilitarianism and concave and uniform utility functions across individuals, this implies complete equalization of post-tax incomes.
27 In the model above, the government would impose taxes \(T_i\) based on the intrinsic characteristics of individual \(i\) but independent of the behavior of individual \(i\) so as to equalize all the \(g^i\)’s across individuals (in the equilibrium where each individual chooses labor supply optimally given \(T_i\)).

In standard models, it is assumed that the government cannot observe earnings abilities but only realized earnings. Hence, the government has to base taxes and transfers on actual earnings only, which distort earnings and create efficiency costs.
This generates an equity-efficiency trade off. This informational structure puts optimal tax analysis on sound theoretical grounds and connects it to mechanism design. While this is a theoretically appealing reason for the failure of the second welfare theorem, in our view, there must be a much deeper reason for governments to systematically use actual earnings rather than proxies for ability in real tax systems.

Indeed, standard welfare theory implies that taxes and transfers should depend on any characteristic correlated with earnings ability in the optimal tax system. If the characteristic is immutable, then average social marginal utilities across groups with different characteristics should be perfectly equalized. Even if the characteristic is manipulable, it should still be used in the optimal system (see Section 6.1). In reality, actual income tax or transfer systems depend on very few other characteristics than income. Those characteristics, essentially family situation or disability status, seem limited to factors clearly related to need.28

The traditional way to resolve this puzzle has been to argue that there are additional horizontal equity concerns that prevent the government from using non-income characteristics for tax purposes (see e.g., Atkinson and Stiglitz (1980) pp. 354–5). Recently, Mankiw and Weinzierl (2010) argue that this represents a major failure of the standard social welfare approach. This shows that informational concerns and observability are not the overwhelming reason for basing taxes and transfers almost exclusively on income. This has two important consequences.

First, finding the most general mechanism compatible with the informational set of the government (as advocated, for example, in the New Dynamic Public Finance literature; see Kocherlakota, 2010 for a survey) might not be very useful for understanding actual tax problems.
Such an approach can provide valuable theoretical insights and results but is likely to generate optimal tax systems that are so fundamentally different from actual tax systems that they are not implementable in practice. It seems more fruitful practically to assume instead exogenously that the government can only use a limited set of tax tools, precisely those that are used in practice, and consider the optimum within the set of real tax systems actually used. In most of this chapter, we therefore pursue this “simple tax structure” approach.29

28 When incomes were not observable, archaic tax systems did rely on quasi-exogenous characteristics such as nobility titles, or land taxes based on rarely updated cadasters (Ardant, 1971). Ironically, when incomes became observable, such quasi-first best taxes were replaced by second-best income-based taxes.
29 As mentioned above, the set of tools available changes over time. For example, individual incomes become observable only in modern economies.

Second, it would certainly be useful to make progress on understanding what concepts of justice or fairness could lead the government to use only a specific subset of taxes and deliberately ignore other tools, such as taxes based on non-income characteristics correlated with ability, that would be useful to maximize standard utilitarian social welfare functions. We will come back to those important issues in Section 6.1 where we study tagging and in Section 7 where we consider alternatives to utilitarianism.

3.3. Labor Supply Concepts

In this chapter, we always consider a population of measure one of individuals. In most sections, individuals have heterogeneous preferences over consumption and earnings. Individual \(i\) utility is denoted by \(u^i(c, z)\) and is increasing in consumption \(c\) and decreasing in earnings \(z\) as earnings require labor supply.
Following Mirrlees (1971), in most models, heterogeneity in preferences is due solely to differences in wage rates \(w^i\) where utility functions take the form \(u(c, z/w^i)\) where \(l = z/w^i\) is labor supply needed to earn \(z\). Our formulation \(u^i(c, z)\) is more general and can capture both heterogeneity in ability as well as heterogeneity in preferences. As mentioned earlier, we believe that heterogeneity is an important element of the real world and optimal tax results should be reasonably robust to it.

To derive labor supply concepts, we consider a linear tax system with a tax rate \(\tau\) combined with a lump sum demogrant \(R\) so that the budget constraint of each individual is \(c = (1 - \tau)z + R\).

Intensive Margin. Let us focus first on the intensive labor supply margin, that is on the choice of how much to earn conditional on working. Individual \(i\) chooses \(z\) to maximize \(u^i((1 - \tau)z + R, z)\) which leads to the first order condition

\[ (1 - \tau)\frac{\partial u^i}{\partial c} + \frac{\partial u^i}{\partial z} = 0, \]

which defines implicitly the individual uncompensated (also called Marshallian) earnings supply function \(z_u^i(1 - \tau, R)\).

The effect of \(1 - \tau\) on \(z_u^i\) defines the uncompensated elasticity \(e_u^i = \frac{1-\tau}{z_u^i}\frac{\partial z_u^i}{\partial(1-\tau)}\) of earnings with respect to the net-of-tax rate \(1 - \tau\). The effect of \(R\) on \(z_u^i\) defines the income effect \(\eta^i = (1 - \tau)\frac{\partial z_u^i}{\partial R}\). If leisure is a normal good, an assumption we make from now on, then \(\eta^i \le 0\) as receiving extra non-labor income induces the individual to consume both more goods and more leisure.

Finally, one can also define the compensated (also called Hicksian) earnings supply function \(z_c^i(1 - \tau, u)\) as the earnings level that minimizes the cost necessary to reach utility \(u\).30 The effect of \(1 - \tau\) on \(z_c^i\) keeping \(u\) constant defines the compensated elasticity \(e_c^i = \frac{1-\tau}{z_c^i}\frac{\partial z_c^i}{\partial(1-\tau)}\) of earnings with respect to the net-of-tax rate \(1 - \tau\). The compensated elasticity is always positive.

30 Formally \(z_c^i(1 - \tau, u)\) solves the problem \(\min_{c, z}\; c - (1 - \tau)z\) subject to \(u^i(c, z) \ge u\).

The Slutsky equation relates those parameters: \(e_c^i = e_u^i - \eta^i\). To summarize we have:

\[ e_u^i = \frac{1-\tau}{z_u^i}\frac{\partial z_u^i}{\partial(1-\tau)} \gtrless 0, \quad \eta^i = (1-\tau)\frac{\partial z_u^i}{\partial R} \le 0, \quad e_c^i = \frac{1-\tau}{z_c^i}\frac{\partial z_c^i}{\partial(1-\tau)} > 0, \quad \text{and} \quad e_c^i = e_u^i - \eta^i. \tag{1} \]

In the long-run process of development over the last century in the richest countries, wage rates have increased by a factor of five. Labor supply measured in hours of work has declined only very slightly (Ramey & Francis, 2009). If preferences for consumption and leisure have not changed, this implies that the uncompensated elasticity is close to zero. This does not mean however that taxes would have no effect on labor supply as a large fraction of taxes are rebated as transfers (see our discussion in Section 2). Therefore, on average, taxes are more similar to a compensated wage rate decrease than an uncompensated wage rate decrease. If income effects are large, government taxes and transfers could still have a large impact on labor supply.

Importantly, although we have defined those labor supply concepts for a linear tax system, they continue to apply in the case of a nonlinear tax system by considering the linearized budget at the utility maximizing point. In that case, we replace \(\tau\) by the marginal tax rate \(T'(z)\) and we replace \(R\) by virtual income defined as the non-labor income that the individual would get if her earnings were zero and she could stay on the virtual linearized budget. Formally \(R = z - T(z) - (1 - T'(z)) \cdot z\).

Hence, the marginal tax rate \(T'(z)\) reduces the marginal benefit of earning an extra dollar and reduces labor supply through substitution effects, conditional on the tax level \(T(z)\). The income tax level \(T(z)\) increases labor supply through income effects. In net, taxes (with \(T'(z) > 0\) and \(T(z) > 0\)) hence have an ambiguous effect on labor supply while transfers (with \(T'(z) > 0\) and \(T(z) < 0\)) have an unambiguously negative effect on labor supply.

Extensive Margin.
In practice, there are fixed costs of work (e.g., searching for a job, finding alternative child care for parents, loss of home production, transportation costs, etc.). This can be captured in the basic model by assuming that choosing \(z > 0\) (as opposed to \(z = 0\)) involves a discrete cost \(d^i\). It is possible to consider a pure extensive margin model by assuming that individual \(i\) can either not work (and earn zero) or work and earn \(z^i\) where \(z^i\) is fixed to individual \(i\) and reflects her earning potential. Assume that utility is linear, i.e., \(u^i = c^i - d^i \cdot l^i\) where \(c^i\) is net-of-tax income, \(d^i\) is the cost of work and \(l^i \in \{0, 1\}\) is a work dummy. In that case, individual \(i\) works if and only if \(z^i - T(z^i) - d^i \ge -T(0)\), i.e., if \(d^i \le z^i - T(z^i) + T(0) = z^i \cdot (1 - \tau_p)\) where \(\tau_p = [T(z^i) - T(0)]/z^i\). \(\tau_p\) is the participation tax rate, defined as the fraction of earnings taxed when the individual goes from not working and earning zero to working and earning \(z^i\). Therefore, the decision to work depends on the net-of-tax participation tax rate \(1 - \tau_p\).

To summarize, there are three key concepts for any tax and transfer system \(T(z)\). First, the transfer benefit with zero earnings \(-T(0)\), sometimes called demogrant or lump-sum grant. Second, the marginal tax rate (or phasing-out rate) \(T'(z)\): The individual keeps \(1 - T'(z)\) for an additional $1 of earnings. \(1 - T'(z)\) is the key concept for the intensive labor supply choice. Third, the participation tax rate \(\tau_p = [T(z) - T(0)]/z\): The individual keeps a fraction \(1 - \tau_p\) of his earnings when going from zero earnings to earnings \(z\). \(1 - \tau_p\) is the key concept for the extensive labor supply choice. Finally, note that \(T(z)\) integrates both the means-tested transfer program and the income tax that funds such transfers and other government spending. In practice transfer programs and taxes are often administered separately. The break even earnings point \(z^*\) is the point at which \(T(z^*) = 0\).
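To make these three concepts and the break even point concrete, here is a minimal sketch for a purely hypothetical piecewise-linear schedule (a demogrant of 10,000 phased out at 50% up to earnings of 20,000, then a 30% marginal rate). The schedule, the numbers, and the function names are assumptions chosen for illustration, not an actual tax system.

```python
def T(z):
    """Hypothetical tax/transfer schedule: T(0) = -10,000, break even at z = 20,000."""
    if z <= 20_000:
        return -10_000 + 0.5 * z      # transfer phased out at a 50% rate
    return 0.3 * (z - 20_000)         # 30% marginal rate above the break even point

def marginal_tax_rate(T, z, h=1.0):
    """Finite-difference approximation of T'(z)."""
    return (T(z + h) - T(z)) / h

def participation_tax_rate(T, z):
    """tau_p = [T(z) - T(0)] / z: fraction of earnings taxed away when starting to work."""
    return (T(z) - T(0)) / z

def works(T, z, d):
    """Pure extensive-margin rule: work iff the fixed cost d is at most z * (1 - tau_p)."""
    return d <= z * (1 - participation_tax_rate(T, z))

z_i = 40_000
print("demogrant -T(0):", -T(0))
print("marginal rate at z_i:", marginal_tax_rate(T, z_i))
print("participation tax rate at z_i:", participation_tax_rate(T, z_i))
print("works with d = 20,000:", works(T, z_i, 20_000))
print("works with d = 25,000:", works(T, z_i, 25_000))
```

In this sketch the participation tax rate at 40,000 exceeds the marginal rate at the same point because it averages over the steep phase-out range, which is why the two rates are kept as separate concepts.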
Above the break even point, \(T(z) > 0\) which encourages labor supply through income effects. Below the break even point, \(T(z) < 0\) which discourages labor supply through income effects.

Tax Reform Welfare Effects and Envelope Theorem. A key element of optimal tax analysis is the evaluation of the welfare effects of small tax reforms. Consider a nonlinear tax \(T(z)\). Individual \(i\) chooses \(z\) to maximize \(u^i(z - T(z), z)\), leading to the first order condition \(u_c^i \cdot (1 - T'(z)) + u_z^i = 0\). Consider now a small reform \(dT(z)\) of the nonlinear tax schedule. The effect on individual utility \(u^i\) is

\[ du^i = u_c^i \cdot [-dT(z)] + u_c^i \cdot [1 - T'(z)]\,dz + u_z^i \cdot dz = u_c^i \cdot [-dT(z)], \]

where \(dz\) is the behavioral response of the individual to the tax reform and the second equality is obtained because of the first order condition \(u_c^i \cdot (1 - T'(z)) + u_z^i = 0\). This is a standard application of the envelope theorem. As \(z\) maximizes utility, any small change \(dz\) has no first order effect on individual utility. As a result, behavioral responses can be ignored and the change in individual welfare is simply given by the mechanical effect of the tax reform on the individual budget multiplied by the marginal utility of consumption.

4. OPTIMAL LINEAR TAXATION

4.1. Basic Model

Linear labor income taxation simplifies considerably the exposition but captures the key equity-efficiency trade off. Sheshinski (1972) offered the first modern treatment of optimal linear income taxation following the nonlinear income tax analysis of Mirrlees (1971). Both the derivation and the optimal formulas are also closely related to the more complex nonlinear case.
It is therefore pedagogically useful to start with the linear case where the government uses a linear tax at rate \(\tau\) to fund a demogrant \(R\) (and additional non-transfer spending \(E\) taken as exogenous).31

31 In terms of informational constraints, the government would be constrained to use linear taxation (instead of the more general nonlinear taxation) if it can only observe the amount of each earnings transaction but cannot observe the identity of individual earners. This could happen for example if the government can only observe the total payroll paid by each employer but cannot observe individual earnings perhaps because there is no identity number system for individuals.

Summing the Marshallian individual earnings functions \(z_u^i(1 - \tau, R)\), we obtain aggregate earnings which depend upon \(1 - \tau\) and \(R\) and can be denoted by \(Z_u(1 - \tau, R)\). The government’s budget constraint is \(R + E = \tau Z_u(1 - \tau, R)\), which defines implicitly \(R\) as a function of \(\tau\) only (as we assume that \(E\) is fixed exogenously). Hence, we can express aggregate earnings as a sole function of \(1 - \tau\): \(Z(1 - \tau) = Z_u(1 - \tau, R(\tau))\). The tax revenue function \(\tau \mapsto \tau Z(1 - \tau)\) has an inverted U-shape. It is equal to zero both when \(\tau = 0\) (no taxation) and when \(\tau = 1\) (complete taxation) as 100% taxation entirely discourages labor supply. This curve is popularly called the Laffer curve although the concept of the revenue curve has been known since at least Dupuit (1844). Let us denote by \(e = \frac{1-\tau}{Z}\frac{dZ}{d(1-\tau)}\) the elasticity of aggregate earnings with respect to the net-of-tax rate. The tax rate \(\tau^*\) maximizing tax revenue is such that \(Z(1 - \tau) - \tau\frac{dZ}{d(1-\tau)} = 0\), i.e., \(\frac{\tau}{1-\tau}\,e = 1\). Hence, we can express \(\tau^*\) as a sole function of \(e\):

\[ \text{Revenue maximizing linear tax rate:} \quad \frac{\tau^*}{1 - \tau^*} = \frac{1}{e} \quad \text{or} \quad \tau^* = \frac{1}{1 + e}. \tag{2} \]

Let us now consider the maximization of a general social welfare function.
The demogrant \(R\) evenly distributed to everybody is equal to \(\tau Z(1 - \tau) - E\) and hence disposable income for individual \(i\) is \(c^i = (1 - \tau)z^i + \tau Z(1 - \tau) - E\) (recall that population size is normalized to one). Therefore, the government chooses \(\tau\) to maximize

\[ \mathrm{SWF} = \int_i \omega^i\, G\big[u^i\big((1 - \tau)z^i + \tau Z(1 - \tau) - E,\; z^i\big)\big]\,d\nu(i). \]

Using the envelope theorem from the choice of \(z^i\) in the utility maximization problem of individual \(i\), the first order condition for the government is simply

\[ 0 = \frac{d\,\mathrm{SWF}}{d\tau} = \int_i \omega^i\, G'(u^i)\, u_c^i \cdot \left[ Z - z^i - \tau\frac{dZ}{d(1-\tau)} \right] d\nu(i). \]

The first term in the square brackets \(Z - z^i\) reflects the mechanical effect of increasing taxes (and the demogrant) absent any behavioral response. This effect is positive when individual income \(z^i\) is less than average income \(Z\). The second term \(-\tau\, dZ/d(1 - \tau)\) reflects the efficiency cost of increasing taxes due to the aggregate behavioral response. This is an efficiency cost because such behavioral responses have no first order positive welfare effect on individuals but have a first order negative effect on tax revenue.

Introducing the aggregate elasticity \(e\) and the “normalized” social marginal welfare weight \(g^i = \omega^i G'(u^i) u_c^i \big/ \int \omega^j G'(u^j) u_c^j\,d\nu(j)\), we can rewrite the first order condition as:

\[ Z \cdot \left[ 1 - \frac{\tau}{1-\tau}\,e \right] = \int_i g^i z^i\,d\nu(i). \]

Hence, we have the following optimal linear income tax formula

\[ \text{Optimal linear tax rate:} \quad \tau = \frac{1 - \bar g}{1 - \bar g + e} \quad \text{with} \quad \bar g = \frac{\int g^i z^i\,d\nu(i)}{Z}. \tag{3} \]

\(\bar g\) is the average “normalized” social marginal welfare weight weighted by pre-tax incomes \(z^i\). \(\bar g\) is also the ratio of the average income weighted by individual social welfare weights \(g^i\) to the actual average income \(Z\). Hence, \(\bar g\) measures where social welfare weights are concentrated on average over the distribution of earnings.
An alternative form for formula (3) often presented in the literature is $\tau = -\mathrm{cov}(g^i, z^i/Z)\,/\,[-\mathrm{cov}(g^i, z^i/Z) + e]$, where $\mathrm{cov}(g^i, z^i/Z)$ is the covariance between social marginal welfare weights $g^i$ and normalized earnings $z^i/Z$. As long as the correlation between $g^i$ and $z^i$ is negative, i.e., those with higher incomes have lower social marginal welfare weights, the optimum $\tau$ is positive.

Five points are worth noting about formula (3). First, the optimal tax rate decreases with the aggregate elasticity $e$. This elasticity is a mix of substitution and income effects as an increase in the tax rate $\tau$ is associated with an increase in the demogrant $R = \tau Z(1-\tau) - E$. Formally, one can show that

$$e = \frac{\bar{e}_u - \bar{\eta}}{1 - \bar{\eta}\,\tau/(1-\tau)},$$

where $\bar{e}_u = \frac{1-\tau}{Z_u}\,\frac{\partial Z_u}{\partial(1-\tau)}$ is the average of the individual uncompensated elasticities $e_u^i$ weighted by income $z^i$ and $\bar{\eta} = (1-\tau)\,\frac{\partial Z_u}{\partial R}$ is the unweighted average of individual income effects $\eta^i$.[32] This allows us to rewrite the optimal tax formula (3) in a slightly more structural form as $\tau = (1-\bar{g})/(1-\bar{g} - \bar{g}\cdot\bar{\eta} + \bar{e}_u)$. When the tax rate maximizes tax revenue, we have $\tau = 1/(1+e)$ and then $e = \bar{e}_u$ is a pure uncompensated elasticity (as the tax rate does not raise any extra revenue at the margin). When the tax rate is zero, $e$ is conceptually close to a compensated elasticity as taxes raised are fully rebated with no efficiency loss.[33]

Second, the optimal tax rate naturally decreases with $\bar{g}$, which measures (inversely) the redistributive tastes of the government. In the extreme case where the government does not value redistribution at all, $g^i \equiv 1$ and hence $\bar{g} = 1$ and $\tau = 0$ is optimal.[34] In the polar opposite case where the government is Rawlsian and maximizes the lump sum demogrant (assuming the worst-off individual has zero earnings), then $\bar{g} = 0$ and $\tau = 1/(1+e)$, which is the revenue maximizing tax rate from Eq. (2). As mentioned above, in that case $e = \bar{e}_u$ is an uncompensated elasticity.
[32] To see this, recall that $Z(1-\tau) = Z_u(1-\tau,\, \tau Z(1-\tau) - E)$ so that $\frac{dZ}{d(1-\tau)}\left[1 - \tau\,\frac{\partial Z_u}{\partial R}\right] = \frac{\partial Z_u}{\partial(1-\tau)} - Z\,\frac{\partial Z_u}{\partial R}$.

[33] It is not exactly a compensated elasticity as $\bar{e}_u$ is income weighted while $\bar{\eta}$ is not.

[34] This assumes that a lump sum tax $E$ is feasible to fund government spending. If lump sum taxes are not feasible, for example because it is impossible to set taxes higher than earnings at the bottom, then the optimal tax in that case is the smallest $\tau$ such that $\tau Z(1-\tau) = E$, i.e., the level of tax required to fund government spending $E$.

Third and related, for a given profile of social welfare weights (or for a given degree of concavity of the utility function in the homogeneous utilitarian case), the higher the pre-tax inequality at a given $\tau$, the lower $\bar{g}$, and hence the higher the optimal tax rate. If there is no inequality, then $\bar{g} = 1$ and $\tau = 0$ with a lump-sum tax $-R = E$ is optimal. If inequality is maximal, i.e., nobody earns anything except for a single person who earns everything and has a social marginal welfare weight of zero, then $\tau = 1/(1+e)$, again equal to the revenue maximizing tax rate.

Fourth, it is important to note that, as is usual in optimal tax theory, formula (3) is an implicit formula for $\tau$ as both $e$ and especially $\bar{g}$ vary with $\tau$. Under a standard utilitarian social welfare criterion with concave utility of consumption, $\bar{g}$ increases with $\tau$ as the need for redistribution (i.e., the variation of the $g^i$ with $z^i$) decreases with the level of taxation $\tau$. This ensures that formula (3) generates a unique equilibrium for $\tau$.

Fifth, formula (3) can also be used to assess tax reform. Starting from the current $\tau$, the current estimated elasticity $e$, and the current welfare weight parameter $\bar{g}$, if $\tau < (1-\bar{g})/(1-\bar{g}+e)$ then increasing $\tau$ increases social welfare (and conversely).
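The fourth and fifth points can be illustrated side by side. The sketch below rests on assumptions that are not in the chapter: earnings follow $z^i = w^i (1-\tau)^e$ with a constant elasticity and no income effects, $E = 0$, and social weights are proportional to $1/c^i$. All function names and the wage vector are hypothetical. Under these assumptions $\bar{g}$ depends on $\tau$, so formula (3) has to be solved as a fixed point (here by bisection), whereas the reform test only needs the current values.

```python
def earnings(wages, tau, e):
    """Assumed earnings rule: z_i = w_i * (1 - tau) ** e (no income effects)."""
    return [w * (1.0 - tau) ** e for w in wages]


def g_bar_utilitarian(wages, tau, e):
    """g-bar when weights are proportional to 1 / c_i, with E = 0 assumed."""
    z = earnings(wages, tau, e)
    n = len(z)
    mean_z = sum(z) / n
    demogrant = tau * mean_z                      # R = tau * Z - E with E = 0
    consumption = [(1.0 - tau) * zi + demogrant for zi in z]
    raw = [1.0 / c for c in consumption]          # requires positive wages
    mean_raw = sum(raw) / n
    return sum(r / mean_raw * zi for r, zi in zip(raw, z)) / n / mean_z


def formula_gap(wages, tau, e):
    """tau minus the right-hand side of formula (3) evaluated at tau."""
    gb = g_bar_utilitarian(wages, tau, e)
    return tau - (1.0 - gb) / (1.0 - gb + e)


def solve_optimal_tau(wages, e, lo=0.0, hi=0.99, tol=1e-10):
    """Bisection on formula_gap, which is negative near 0 and positive near 1
    when wages are unequal and 1 / (1 + e) < hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if formula_gap(wages, mid, e) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reform_raises_welfare(tau, g_bar, e):
    """Local test: raising tau helps iff tau < (1 - g_bar) / (1 - g_bar + e)."""
    return tau < (1.0 - g_bar) / (1.0 - g_bar + e)


if __name__ == "__main__":
    wages = [1.0, 2.0, 3.0, 14.0]                 # hypothetical wage rates
    print(solve_optimal_tau(wages, e=0.25))
    print(reform_raises_welfare(tau=0.35, g_bar=0.61, e=0.25))
```

The bisection needs the whole mapping from $\tau$ to $\bar{g}$, while `reform_raises_welfare` uses a single triple of current values.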
The tax reform approach has the advantage that it does not require knowing how $e$ and $\bar{g}$ change with $\tau$, since it only considers local variations.

Generality of the Formula. The optimal linear tax formula is very general as it applies to many alternative models for the income generating process. All that matters is the aggregate elasticity $e$ and how the government sets normalized marginal welfare weights $g^i$. First, if the population is discrete, the same derivation and formula obviously apply. Second, if labor supply responses are (partly or fully) along the extensive margin, the same formula applies. Third, the same formula also applies in the long run when educational and human capital decisions are potentially affected by the tax rate, as those responses are reflected in the long-run aggregate elasticity $e$ (see e.g., Best & Kleven, 2012).[35]

Random Earnings. If earnings are generated by a partly random process involving luck in addition to ability and effort, as in Varian (1980) and Eaton and Rosen (1980), formula (3) still applies as long as the social welfare objective is defined over individual expected utilities. To see this, suppose that pre-tax income for individual $i$ is a random function of labor supply $l^i$ and an idiosyncratic luck shock $\varepsilon$ (with distribution $dF^i$), with $z^i = l^i + \varepsilon$ for simplicity. Individual $i$ chooses $l^i$ to maximize expected utility

$$EU^i = \int u^i\big((l^i + \varepsilon)\cdot(1-\tau) + R,\; l^i\big)\, dF^i(\varepsilon),$$

so that $l^i$ is a function of $1-\tau$ and $R$. The government budget implies again that $R = \tau Z - E$ so that $Z$ is also a function of $1-\tau$ as in the standard model (recall that $R = \tau Z(1-\tau) - E$ is an implicit function of $\tau$). The government then chooses $\tau$ to maximize

$$SWF = \int_i \omega^i\, G(EU^i)\, d\nu(i).$$

This again leads to formula (3) with $\bar{g}$ the "normalized" average of $g^i = \omega^i G'(EU^i)\, u_c^i$ weighted by incomes $z^i$, where now the average is taken as a double integral over both $dF^i(\varepsilon)$ and $d\nu(i)$.
Therefore, the random earnings model generates both the same equity-efficiency trade-off and the same type of optimal tax formula. This shows the robustness of the optimal linear tax approach. This robustness was not clearly apparent in the literature because of the focus on the nonlinear income tax case where the two models no longer deliver identical formulas.[36]

[35] Naturally, such long-run responses are challenging to estimate empirically as short-term comparisons around a tax reform cannot capture them.

[36] Varian (1980) analyzes the optimal nonlinear tax with random earnings.

Political Economy and Median Voter. The most popular model for policy decisions among economists is the median voter model. As is well known, the median voter theorem applies for unidimensional policies and where individual preferences are single-peaked with respect to this unidimensional policy. In our framework, the unidimensional policy is the tax rate $\tau$ (as the demogrant $R$ is a function of $\tau$). Each individual has single-peaked preferences about the tax rate $\tau$, as $\tau \to u^i\big((1-\tau)z^i(1-\tau) + \tau Z(1-\tau),\; z^i(1-\tau)\big)$ is single-peaked with a peak such that $-z^i + Z - \tau\, dZ/d(1-\tau) = 0$, i.e., $\tau_i = (1 - z^i/Z)/(1 - z^i/Z + e)$. Because this preferred rate decreases with $z^i$, the median voter is the voter with median income $z_m$. Recall that with single-peaked preferences, the median voter preferred tax rate is a Condorcet winner, i.e., wins in majority voting against any other alternative tax rate.[37] Therefore, the median voter equilibrium has:

Median voter optimal tax rate:

$$\tau_m = \frac{1 - z_m/Z}{1 - z_m/Z + e}. \tag{4}$$

The formula implies that when the median $z_m$ is close to the average $Z$, the optimal tax rate is low because a linear tax rate achieves little redistribution (toward the median) and hence a lump-sum tax is more efficient.[38] In contrast, when the median $z_m$ is small relative to the average, the tax rate $\tau_m$ gets close to the revenue maximizing tax rate $\tau^* = 1/(1+e)$ from Eq. (2).
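For readers who want the intermediate algebra, the step from individual $i$'s first order condition to the preferred rate $\tau_i$ uses only the definition of the aggregate elasticity $e$. The following is a worked restatement of that step, not an additional result:

```latex
\begin{align*}
-z^i + Z - \tau \frac{dZ}{d(1-\tau)} = 0
  &\;\Longleftrightarrow\; Z - z^i = \frac{\tau}{1-\tau}\, e\, Z
     \qquad \text{since } \frac{dZ}{d(1-\tau)} = \frac{e\, Z}{1-\tau}, \\
  &\;\Longleftrightarrow\; \frac{\tau}{1-\tau} = \frac{1 - z^i/Z}{e}, \\
  &\;\Longleftrightarrow\; \tau_i = \frac{1 - z^i/Z}{1 - z^i/Z + e}.
\end{align*}
```

Setting $z^i = z_m$ in the last line gives formula (4).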
Formula (4) is a particular case of formula (3) where social welfare weights are concentrated at the median so that $\bar{g} = z_m/Z$. This shows that there is a tight connection between optimal tax theory and political economy. Political economy uses social welfare weights coming out of the political game process rather than derived from marginal utility of consumption as in the standard utilitarian tax theory, but the structure of resulting tax formulas is the same (see Persson & Tabellini, 2002, chap. 24 for a comprehensive survey of political economy applied to public finance). We come back to the determination of social welfare weights in Section 6.

Finally, and as caveats, note that the median voter theory applies only to unidimensional policies so that those results do not carry over to the nonlinear income tax case. The political economy literature has also shown that real world outcomes differ substantially from median voter predictions.

[37] To see this, if the alternative is $\tau < \tau_m$, everybody below and including the median prefers $\tau_m$ to $\tau$ so that $\tau_m$ wins. Conversely, if $\tau > \tau_m$, everybody above and including the median prefers $\tau_m$ to $\tau$ and $\tau_m$ still wins.

[38] Formula (4) shows that if $z_m > Z$, then a negative tax rate is actually optimal. Empirically however, it is always the case that $z_m < Z$.

4.2. Accounting for Actual Tax Rates

As we saw in Section 2, tax to GDP ratios in OECD countries are between 30% and 45% and the more economically meaningful tax to national income ratios between 35% and 50%. Quantitatively, most estimates of aggregate elasticities of taxable income are between 0.1 and 0.4, with 0.25 perhaps being a reasonable estimate (see Saez, Slemrod, & Giertz, 2012 for a recent survey), although there remains considerable uncertainty about these magnitudes.[39]

Table 2 proposes simple illustrative calculations using the optimal linear tax rate formula (3).
It reports combinations of $\tau$ and $\bar{g}$ in various situations corresponding to different elasticities $e$ (across columns) and different social objectives (across rows). We consider three elasticity scenarios. The first one has $e = 0.25$, which is a realistic mid-range estimate (Saez et al., 2012; Chetty, 2012). The second has $e = 0.5$, a high range elasticity scenario. We add a third scenario with $e = 1$, an extreme case well above the current average empirical estimates. Panel A considers the standard case where $\bar{g}$ is pinned down by a given social objective criterion and $\tau$ is then given by the optimal tax formula. The first row is the Rawlsian criterion (or revenue maximizing tax rate) with $\bar{g} = 0$. The second row is a utilitarian criterion with coefficient of relative risk aversion (CRRA) equal to one (social marginal welfare weights are proportional to $u_c = 1/c$ where $c = (1-\tau)z + R$ is disposable income).[40] Chetty (2006) shows that a CRRA equal to one is consistent with empirical labor supply behavior and hence a reasonable benchmark. The third row is the median voter optimum with a median to average earnings ratio of 70% (corresponding approximately to the current US distribution based on individual adult earnings from the Current Population Survey in 2010). Panel B considers the inverse problem of determining the social preference parameter $\bar{g}$ for a given tax rate $\tau$. The first row uses $\tau = 35\%$, corresponding to a low tax country such as the United States. The second row uses $\tau = 50\%$, corresponding to a high tax country such as a typical country from the European Union.

Three points should be noted. First, panel A shows that an empirically realistic elasticity $e = 0.25$ implies a revenue maximizing tax rate of 80%, which is considerably higher than any actual average tax rate, even in the countries with the highest tax to GDP ratios, around 50%. The optimal tax rate under the utilitarian criterion with CRRA coefficient equal to one is 61%.
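The Rawlsian and median voter rows of Panel A, and the inverse problem of Panel B, follow mechanically from formula (3) and can be recomputed with a few lines. This is a minimal sketch with hypothetical function names. The utilitarian row is not recomputed here because its $\bar{g}$ depends on the US earnings distribution, which is not reproduced in the text.

```python
def optimal_tau(g_bar, e):
    """Formula (3): tau = (1 - g_bar) / (1 - g_bar + e)."""
    return (1.0 - g_bar) / (1.0 - g_bar + e)


def revealed_g_bar(tau, e):
    """Formula (3) solved for g_bar: g_bar = 1 - e * tau / (1 - tau)."""
    return 1.0 - e * tau / (1.0 - tau)


ELASTICITIES = (0.25, 0.5, 1.0)

if __name__ == "__main__":
    for e in ELASTICITIES:
        rawlsian = optimal_tau(0.0, e)          # g_bar = 0
        median_voter = optimal_tau(0.70, e)     # g_bar = z_m / Z = 70%
        g_low_tax = revealed_g_bar(0.35, e)     # tau = 35%
        g_high_tax = revealed_g_bar(0.50, e)    # tau = 50%
        print(f"e={e}: Rawlsian {rawlsian:.3f}, median voter {median_voter:.3f}, "
              f"g_bar at 35% {g_low_tax:.3f}, g_bar at 50% {g_high_tax:.3f}")
```

The inverse function makes explicit what Panel B does: it reads an observed tax rate as if it were optimal and backs out the $\bar{g}$ that would rationalize it for a given $e$.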
The optimal tax rate for the median earner is $\tau = 55\%$, which corresponds to average tax rates in high tax countries. Correspondingly, as shown in panel B, with $e = 0.25$, a tax rate of 35%, such as current US tax rates, would be optimal in a situation where $\bar{g} = 87\%$, i.e., with low redistributive tastes. A tax rate of 50% (as in a high tax country) would be optimal with $\bar{g} = 75\%$.

Second, a fairly high elasticity estimate of $e = 0.5$ would still generate a revenue maximizing tax rate of 67%, above current rates in any country. The median voter optimum tax rate of 38% would actually be close to the current US tax rate in that situation. A high tax rate of 50% would be rationalized by $\bar{g} = 0.5$, i.e., fairly strong redistributive tastes. The utilitarian criterion also generates an optimal tax rate close to 50% in that elasticity scenario.

Third, in the unrealistically high elasticity scenario $e = 1$, the revenue maximizing rate is 50%, about the current tax rate in countries with the highest tax to GDP ratios. Hence, only in that case would social preferences for redistribution be approaching the polar Rawlsian case.

[39] Note however that the tax base tends to be smaller than national income as some forms of income (or consumption) are excluded from the tax base. Therefore, with existing tax bases, the tax rate needed to raise, say, 40% of national income will typically be somewhat higher, perhaps around 50%.

[40] $\bar{g}$ is endogenously determined using the actual US earnings distribution and assuming that government required spending $E$ (outside transfers) is 10% of total actual earnings. The distribution is for earnings of individuals aged 25 to 64 from the 2011 Current Population Survey for 2010 earnings.

Table 2 Optimal Linear Tax Rate Formula $\tau = (1-g)/(1-g+e)$

| | $e = 0.25$ (empirically realistic): parameter $g$ (%) (1) | $e = 0.25$: tax rate $\tau$ (%) (2) | $e = 0.5$ (high): parameter $g$ (%) (3) | $e = 0.5$: tax rate $\tau$ (%) (4) | $e = 1$ (extreme): parameter $g$ (%) (5) | $e = 1$: tax rate $\tau$ (%) (6) |
|---|---|---|---|---|---|---|
| A. Optimal linear tax rate $\tau$ | | | | | | |
| Rawlsian revenue maximizing rate | 0 | 80 | 0 | 67 | 0 | 50 |
| Utilitarian (CRRA = 1, $u_c = 1/c$) | 61 | 61 | 54 | 48 | 44 | 36 |
| Median voter optimum ($z_{median}/z_{average} = 70\%$) | 70 | 55 | 70 | 38 | 70 | 23 |
| B. Revealed preferences $g$ for redistribution | | | | | | |
| Low tax country (US): tax rate $\tau = 35\%$ | 87 | 35 | 73 | 35 | 46 | 35 |
| High tax country (EU): tax rate $\tau = 50\%$ | 75 | 50 | 50 | 50 | 0 | 50 |

Notes: This table illustrates the use of the optimal linear tax rate formula $\tau = (1-g)/(1-g+e)$ derived in the main text. It reports combinations of $\tau$ and $g$ in various situations corresponding to different elasticities $e$ (across columns) and different social objectives (across rows). Recall that $g$ is the ratio of average earnings weighted by social marginal welfare weights to unweighted average earnings. Panel A considers the standard case where $g$ is pinned down by a given social objective criterion and $\tau$ is then given by the optimal tax formula. The first row is the Rawlsian criterion (or revenue maximizing tax rate) with $g = 0$. The second row is a utilitarian criterion with coefficient of relative risk aversion (CRRA) equal to one (social marginal welfare weights are proportional to $u_c = 1/c$ where $c = (1-\tau)z + R$ is disposable income). $g$ is endogenously determined using the actual US earnings distribution and assuming that government required spending (outside transfers) is 10% of total earnings. The third row is the median voter optimum with a median to average earnings ratio of 70% (corresponding approximately to the current US situation). Panel B considers the inverse problem of determining the social preference parameter $g$ for a given tax rate $\tau$. The first row uses $\tau = 35\%$, corresponding to a low tax country such as the United States. The second row uses $\tau = 50\%$, corresponding to a high tax country such as a typical country from the European Union.

4.3. Tax Avoidance

As shown by many empirical studies (see Saez et al., 2012 for a recent survey), responses to tax rates can also take the form of tax avoidance. We can define tax avoidance as changes in reported income due to changes in the form of compensation but not in the total level of compensation. Tax avoidance opportunities typically arise when taxpayers can shift part of their taxable income into another form of income or another time period that receives a more favorable tax treatment.[41]

The key distinction between real and tax avoidance responses is that real responses reflect underlying, deep individual preferences for work and consumption, while tax avoidance responses depend critically on the design of the tax system and the avoidance opportunities it offers. While the government cannot change underlying deep individual preferences and hence the size of the real elasticity, it can change the tax system to reduce avoidance opportunities.

A number of papers incorporate avoidance effects for optimal tax design. In this chapter, we adapt the simple modeling of Piketty, Saez, and Stantcheva (2011) to the linear tax case so as to capture the key tradeoffs as simply and transparently as possible.[42]

We can extend the original model as follows to incorporate tax avoidance. Let us denote by $y$ real income and by $x$ sheltered income so that taxable income is $z = y - x$. Taxable income $z$ is taxed at linear tax rate $\tau$, while sheltered income $x$ is taxed at a constant and linear tax rate $t$ lower than $\tau$. Individual $i$'s utility takes the form:

$$u^i(c, y, x) = c - h^i(y) - d^i(x),$$

where $c = y - \tau z - tx + R = (1-\tau)y + (\tau - t)x + R$ is disposable after-tax income. $h^i(y)$ is the utility cost of earning real income $y$, and $d^i(x)$ is the cost of sheltering an amount of income $x$. We assume a quasi-linear utility to simplify the derivations and eliminate cross-elasticity effects in real labor supply and sheltering decisions.
We assume that both

[41] Examples of such avoidance/evasion are (a) reductions in current cash compensation for increased fringe benefits or deferred compensation such as stock-options or future pensions, (b) increased consumption w
