PBS covers Bob Gordon's The Rise and Fall of American Growth.
[Embedded video. These aren't picked up when other sources pick up the blog, so come back to the original if you don't see the video.]
PBS and Paul Solman did a great job, especially relative to the usual standards of economics coverage in the media. OK, not perfect -- they livened it up by tying it to partisan politics a bit more than they should have, though far less than usual.
I don't (yet, maybe) agree with Bob. I still hope that the mastery of information and biology can produce results like the mastery of electromagnetism and fossil fuels did earlier. I still suspect that slow growth is resulting from government-induced sclerosis rather than an absence of good ideas in a smoothly functioning economy. But Bob has us talking about The Crucial Issue: long term growth, and its source in productivity. The 1870-1970 miracle was not about whether the federal funds rate was 0.25% higher or lower. And the issue is not about opinions, like the ones I just offered, but facts and research, which Bob offers.
The issue of future long-term growth is tied to the issue of measurement, something else that Bob has championed over the years. GDP is well designed to measure steel per worker. Information, health and lifespan increases are much more poorly measured. This is already a problem in long-term comparisons. In the video, Bob points to light as the greatest invention. The price of light has fallen by a factor of thousands since the age of candles, to the point where light consumption is a trivial part of GDP. The problem gets worse as all the great stuff becomes free. I suspect that we'll have to try to measure consumer surplus, not just the market value of goods and services.
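To make the measurement point concrete (the numbers here are purely illustrative, not from Bob's book): suppose a household would willingly pay $1,000 a year for artificial light but, at today's prices, actually spends $20 on it. GDP records the $20 of market value, price times quantity; the remaining $980 of consumer surplus, the area under the demand curve above the price, never shows up. As the price falls further toward zero, the measured contribution of light shrinks even as the true welfare gain grows.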
And congratulations to Bob. The economics profession tends to focus on the young rising stars, but he offers inspiration that economists can produce magnum opuses of deep impact at any point in a career.
Disclosure: I haven't read the book yet, but it is on top of the pile. More when I finish. Ed Glaeser has an excellent review.
Update: Tyler Cowen's review, in Foreign Affairs
Friday, January 29, 2016
Friday, January 22, 2016
Tax Oped -- full version
Source: Wall Street Journal
Left and right agree that the U.S. tax code is a mess. The men and women running for president in 2016 are offering reform plans, and proposals to fix the code regularly surface in Congress. But these plans are, and should be, political documents, designed to attract votes. To prevent today’s ugly bargains from becoming tomorrow’s conventional wisdom, we should more frequently discuss the ideal tax structure.
The first goal of taxation is to raise needed government revenue with minimum economic damage. That means lower marginal rates—the additional tax people pay for each extra dollar earned—and a broader base of income subject to tax. It also means a massively simpler tax code.
In my view, simplification is more important than rates. A simple code would allow people and businesses to spend more time and resources on productive activities and less on attorneys and accountants, or on lobbyists seeking special deals and subsidies. And a simple code is much more clearly fair. Americans now suspect that people with clever lawyers are avoiding much taxation, which is corrosive to compliance and driving populist outrage across the political spectrum.
What would a minimally damaging, simple, fair tax code look like? First, the corporate tax should be eliminated. Every dollar of taxes that a corporation seems to pay comes from higher prices to its customers, lower wages to its workers, or lower dividends to its shareholders. Of these groups, wealthy individual shareholders are the least likely to suffer. If taxes eat into profits, investors pay lower prices for less valuable shares, and so earn the same return as before. To the extent that taxes do reduce returns, they also financially hurt nonprofits and your and my pension funds.
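A minimal sketch of that logic, using a stylized perpetuity formula rather than anything in the oped itself: if a firm generates earnings E per share forever and the corporate tax rate is t, shareholders receive E(1-t), and competitive investors who require return r will pay P = E(1-t)/r per share. The buyer's return is E(1-t)/P = r whatever the tax rate; the tax is capitalized into a lower price paid to whoever owned the shares when the tax was imposed, and into higher prices or lower wages to the extent the firm can pass it along.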
With no corporate tax, arguments disappear over investment expensing versus depreciation, repatriation of profits, too much tax-deductible debt, R&D deductions, and the vast array of energy deductions and credits.
Second, the government should tax consumption, not wages, income or wealth. When the government taxes savings, investment income, wealth or inheritance, it reduces the incentive to save, invest and build companies rather than enjoy consumption immediately. Taxes on capital gains discourage people from moving or reallocating capital toward their most productive uses.
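A small worked example of that wedge, with illustrative numbers: a dollar saved for 30 years at a 5% pre-tax return grows to (1.05)^30, about $4.32. If the return is taxed at 30% each year, the dollar compounds at 3.5% and reaches only about $2.81. The penalty on deferring consumption grows with the horizon, which is exactly the distortion the shelters described next try to patch over.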
Recognizing the distortion, the federal government provides a complex web of shelters, including IRAs, Roth IRAs, 403(b), 401(k), health-savings accounts, life-insurance exemptions, and the panoply of trusts that wealthy individuals use to shelter their wealth and escape the estate tax. If investment isn’t taxed, these costly complexities can disappear.
All the various deductions, credits and exclusions should be eliminated—even the holy trinity of tax breaks for mortgage interest, charitable donations and employer-provided health insurance. The extra revenue, over a trillion dollars annually, could finance a large reduction in marginal rates. This step would also simplify the code and make it fairer.
Imagine that Congress proposed to send an annual check to each homeowner. People with high incomes, who buy expensive houses, borrow lots of money or refinance often, would get bigger checks than people with low incomes, who buy smaller houses, save up more for down payments or pay down their mortgages. There would be rioting in the streets. Yet that is exactly what the mortgage-interest deduction accomplishes.
Similarly, suppose Congress proposed to match private charitable donations. But rich people would get a 40% match, middle class people only 10%, and poor people nothing. This is exactly what the charitable deduction accomplishes.
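The arithmetic behind both examples, with the match rates taken from the text and an illustrative $1,000 gift: a deduction of D at marginal tax rate t cuts the tax bill by t × D. A donor in a 40% bracket who gives $1,000 gets $400 back from the Treasury, a 10%-bracket donor gets $100, and someone who owes no income tax, or does not itemize, gets nothing. The same formula, with D equal to mortgage interest paid, produces the implicit annual checks in the mortgage example.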
Zeroing out deductions, credits, and corporate and investment taxes matters—for permanence, for predictability and for simplicity. If the corporate rate is drastically reduced, or if deductions are capped, it may seem that the economic distortions go away. But the thousands of pages of tax code are still in place, the army of lawyers and accountants and lobbyists is still in place, and the next administration will itch to raise the caps, and the rate.
Why is tax reform paralyzed? Because political debate mixes the goal of efficiently raising revenue with so many other objectives. Some want more progressivity or more revenue. Others defend subsidies and transfers for specific activities, groups or businesses. They hold reform hostage.
Wise politicians often bundle dissimilar goals to attract a majority. But when bundling leads to paralysis, progress comes by separating the issues. Thus, we should agree to first reform the structure of the tax code, leaving the rates blank. We will then separately debate rates, and the consequent overall revenue and progressivity.
Consumption-based taxes can be progressive. A simplified income tax, excluding investment income and allowing a full deduction for savings, could tax high-income earners’ consumption at a higher rate. Low-income people can receive transfers and credits. I think smaller government and less progressivity are wiser. But we can agree on an efficient, simple and fair tax, and debate revenues and progressivity separately.
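A stylized illustration of how that works, with hypothetical rates and incomes: define the tax base as income minus net saving, which equals consumption, and apply a graduated schedule to it, say 10% on the first $50,000 of consumption and 40% above that. A household earning $300,000 that saves $100,000 is taxed on $200,000 of consumption, most of it at the top rate; a household earning and consuming $40,000 pays only the bottom rate, and a transfer or credit can offset even that. Progressivity survives, while the return on the $100,000 saved is not taxed until it is eventually consumed.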
We should also agree to separate the tax code from the subsidy code. We agree to debate subsidies for mortgage-interest payments, electric cars and the like—transparent and on-budget—but separately from tax reform.
Negotiating such an agreement will be hard. But the ability to achieve grand bargains is the most important characteristic of great political leaders.
Mr. Cochrane is a senior fellow at Stanford University’s Hoover Institution.
Friday, January 15, 2016
MacDonell on QE
Gerard MacDonell has a lovely noahpinion guest post "So Much for the QE Stimulus" (HT Marginal Revolution). Some good bits follow, with my comments on noteworthy zingers after each excerpt.
The post is unusual, because practitioners tend to regard the Fed and QE as very powerful. But here he expresses nicely the skeptical view of many academics such as myself.
the Fed leadership has now abandoned its original story about how QE affects the economy and has conceded that the tool is weak
It has long been obvious that QE operated mainly through signaling and confidence channels, which wore off on their own without any adjustment in the size or composition of the Fed’s balance sheet....
Obvious to us skeptics, not to the Fed or to the many academic papers written trying to explain the supposed powers of QE.
The story initially told by the Fed leadership starts with the claim that large scale asset purchases (LSAPs) [lower interest rates]... by removing default-free interest rate duration from the capital markets. ...
Translation: buying bonds to drive up bond prices.
That story does not hold much water.
The theoretical foundations supporting QE were invented – or really revived from the 1950s [Preferred habitat theory] – in an effort to justify a program that had been resolved upon for other reasons.
LSAPs did not actually succeed in reducing the stock of government rates duration because they were fully offset by the fiscal deficit and the Treasury’s program of extending the maturity of the federal debt.
Translation: The Treasury sold as much as the Fed bought.
And while the estimated term premium and bond yields did go down during the QE era of late 2008 through late 2014, they had a disconcerting tendency to rise while LSAPs were ongoing.
Translation: When the Fed actually bought securities, yields went up.
Peak QE gullibility seems to have been reached in the late summer of 2012, with Ben Bernanke’s presentation to the Kansas City Fed’s monetary policy conference at Jackson Hole. ...
Evidence that the Fed doesn't believe it any more:
...the Fed has abandoned the flock it once led. If the leadership still believed the official story, it could not promise both to maintain the size of the balance sheet and raise rates at an historically slow pace. That would deliver far too much stimulus, particularly with the economy now near full employment. The obvious way to square this circle is to recognize that the Fed does not believe the story, which is an advance.
... according to the original story, little of this presumed stimulus would unwind without asset sales or a passive shortening of maturities, both of which have largely been excluded for now.
...Readers of this comment may recall those charts circulated by Wall Street showing the fed funds equivalent going deeply and shockingly negative after 2009. In retrospect, those charts are cringe-inducing and best forgotten. It is a mercy that the Fed has participated in the forgetting.
This is consistent with my view. The large balance sheet is a great thing. Narrow banking has arrived. We live the optimal quantity of money. Interest-paying reserves generate zero stimulus, but great liquidity. Alas, the Fed, having touted the world-saving stimulus of QE, without qualifying that effects might be temporary, now is in a tough spot to turn around and say "never mind." All it can do is be silent and wait.
...This raises the question of why the Fed initially promoted a story that so obviously would not stand the test of time. We can imagine three possibilities...
The first possibility relates to the first round of event studies, which measured the immediate effects on the term premium and bond yields of QE-related news....
Announcement effects are a poor measure of fundamental effects that will endure long enough to affect the economy... markets typically act more segmented in the short run than over time,.... But smart and credentialed people argued otherwise and the FOMC may have been comforted by that.
I have puzzled at this as well. Many studies find price impacts of large unannounced trades. But price impact melts away. Why would we treat announcement effects as permanent -- as many Fed speeches did?
The second possibility is that the Fed wanted to raise confidence in the markets and real economy and thus chose to communicate that it was wielding a new and fundamentally powerful tool, even if Fed officials had their own doubts. ...
This is the "signaling" channel.
It is best to lift confidence with tools that have a mechanical force and do not rely purely on confidence effects. But if such tools are not readily available, then it probably does not hurt to try magic tricks and pyrotechnics.
Nice phrases. But...
The problem looking forward is that people may not be so responsive to the symbolism of QE next time around. ... Moreover, the Bank of Japan has got hold of QE, which raises the odds it will be properly discredited, if history guides.
OK, not very nice, but a good snark prize, as much to the B of J as to its many critics. But far more interesting...
The third possibility ..[is] that Bernanke and his colleagues in Fed circles were durably confused by Bernanke’s early and mistaken relation of the Quantity Theory to the efficacy of LSAPs...:
"The general argument that the monetary authorities can increase aggregate demand and prices, even if the nominal interest rate is zero, is as follows:..The monetary authorities can issue as much money as they like. Hence, if the price level were truly independent of money issuance, then the monetary authorities could use the money they create to acquire indefinite quantities of goods and assets. This is manifestly impossible in equilibrium. Therefore, money issuance must ultimately raise the price level, even if nominal interest rates are bounded at zero. .."This is indeed the crucial point. In simple quantity theory thought, MV=PY, so you can raise M even at zero rates, and eventually PY must rise. But that's wrong, alas. V becomes undefined when the interest rate is zero, or money pays interest. As Gerard explains,
... one must wonder if this misapplication of the Quantity Theory to LSAPs created in Bernanke and associates an excessive confidence in the efficacy of the program...
...Bernanke would later argue this point himself, and demonstrate it by paying interest on excess reserves, thereby converting them from money to debt. Bernanke’s money injection actually had ZERO maturity. Or more to the point, it did not even happen.
Stop and savor just a moment. When the government pays interest on reserves, reserves become the same thing as overnight government debt. They are held as a saving vehicle, and have no "stimulus."
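To put the last two comments in symbols (standard textbook quantity-theory notation, not anything from Gerard's post): write MV = PY with velocity V = V(i - i^m), where i is the short-term rate and i^m the rate paid on money. Velocity is pinned down only when holding money has an opportunity cost. When i = 0, or when reserves pay i^m = i, money and overnight government debt are perfect substitutes: V simply falls as M rises, and a larger M puts no pressure on PY. That is the sense in which "money issuance must ultimately raise the price level" fails precisely in the circumstances in which QE was deployed.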
To be fair, I think Bernanke's point might hold if there were a huge QE and a clear promise to leave reserves outstanding when interest rates rise above zero; then possibly future inflation might work its way back to current inflation. But exit principles that clearly state the large reserves will pay interest, precisely so as not to generate future inflation, undo that possibility.
Gerard leaves out, I think, the most telling mistake in the Bernanke quote, "monetary authorities could use the money they create to acquire indefinite quantities of goods..." Monetary policy does not buy goods; it does not drop money from helicopters. Monetary policy only gives one kind of debt in return for another kind; roughly speaking, making change, giving you two 5s and a 10 for each 20. Buying goods is fiscal policy, and fiscal policy can cause inflation.
Bottom line
...The Fed leadership has come a long way from believing that QE had something to do with the power of the printing press to a recognition that the program is a combination of an indirect and transitory rates signal, a confidence game, and a duration take out that probably achieved much less than was advertised. But at least the journey has been made....
I share this view.
To be clear, both my post and Gerard's are not really critical of the Fed. If "pyrotechnics'' helped, good. If QE is not "mechanically" that powerful, great, we all learn from experience. A large interest-paying balance sheet and silence is probably the best thing for the Fed to do right now. This question is most important to academic and historical analysis, to learn what causal mechanisms really did play out, and what will work in the future.
Tuesday, January 5, 2016
Secret Data Encore
My post "Secret data" on replication provoked a lot of comment and emails, more reflection, and some additional links.
This isn't about rules
Many of my correspondents missed my main point -- I am not advocating more and tighter rules by journals! This is not about what you are "allowed to do," how to "get published" and so forth.
In fact, this extra rumination points me even more strongly to the view that rules and censorship by themselves will not work. How to make research transparent, replicable, extendable, and so forth varies by the kind of work and the kind of data, and is subject like everything else to creativity and technical improvement. Most of all, it will not work if nobody cares; if nobody takes the kind of actions in the bullet points of my last post, and it's just an issue about rules at journals. Already, rules are not that well followed (more below).
This isn't just about "replication."
"Replication" is much too narrow a word. Yes, many papers have not documented transparently what they actually did, so that even armed with the data it's hard to produce the same numbers. Other papers are based on secret data, the problem with which I started.
But in the end, most important results are not simply due to outright errors in data or coding. (I hope!)
The important issue is whether small changes in instruments, controls, data sample, measurement error handling, and so forth produce different results, whether results hold out of sample, or whether collecting or recoding data produces the same conclusions. "Robustness" is a better overall descriptor for the problem that many of us suspect pervades empirical economic research.
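To give a concrete picture of what a robustness exercise means in practice, here is a minimal sketch in Python on made-up data; the variable names and specifications are hypothetical and not drawn from any particular paper. The idea is simply to re-run the same regression while varying the control set and the sample, and see how much the coefficient of interest moves.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 500
    x = rng.normal(size=n)    # regressor of interest (made-up data)
    c1 = rng.normal(size=n)   # candidate control
    c2 = rng.normal(size=n)   # another candidate control
    y = 0.5 * x + 0.3 * c1 + rng.normal(size=n)  # hypothetical outcome

    def beta_on_x(y, cols):
        """OLS coefficient on the first regressor in cols, with a constant included."""
        X = np.column_stack([np.ones(len(y))] + list(cols))
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef[1]

    specs = {
        "no controls": [x],
        "add c1": [x, c1],
        "add c1 and c2": [x, c1, c2],
    }
    for name, cols in specs.items():
        full = beta_on_x(y, cols)
        sub = beta_on_x(y[: n // 2], [c[: n // 2] for c in cols])
        print(f"{name:15s} full sample: {full:5.2f}   first half only: {sub:5.2f}")

If the coefficient is stable across specifications and subsamples, the finding is robust in the sense used here; if it flips sign or melts away, that is exactly the kind of follow-on result worth reporting.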
You need replicability in order to evaluate robustness -- if you get a different result than the original authors', it's essential to be able to track down how the original authors got their result. But the real issue is that much larger one.
The excellent replication wiki (many good links) quotes Daniel Hamermesh on this difference between "narrow" and "wide" replication:
Narrow, or pure, replication means first checking the submitted data against the primary sources (when applicable) for consistency and accuracy. Second the tables and charts are replicated using the procedures described in the empirical article. The aim is to confirm the accuracy of published results given the data and analytical procedures that the authors write to have used.
Replication in a wide sense is to consider the empirical finding of the original paper by using either new data from other time periods or regions, or by using new methods, e.g., other specifications. Studies with major extensions, new data or new empirical methods are often called reproductions.
But the more important robustness question is more controversial. The original authors can complain they don't like the replicator's choice of instruments, or procedures. So "replication," which sounds straightforward, quickly turns into controversies.
Michael Clemens writes about the issue in a blog post here, noting
...Again and again, the original authors have protested that the critique of their work got different results by construction, not because anything was objectively incorrect about the original work. (See Berkeley’s Ted Miguel et al. here; Oxford’s Stefan Dercon et al. here and Princeton’s Angus Deaton here among many others. Chris Blattman at Columbia and Berk Özler at the World Bank have weighed in on some of these controversies.)
In a good paper, published as The meaning of failed replications in the Journal of Economic Surveys, he argues for an expanded vocabulary, including "verification," "robustness," "reanalysis" and "extension."
"Failed replication" is a damning criticism. It implies error, malfeasance, deliberately hiding data, and so forth. What most "replication" studies really mean is "robustness," either to method or natural fishing biases, which is a more common problem (in my view). But as Michael points out, you really can't use the emotionally charged language of failed or "discrepant" replication for that situation.
This isn't about people or past work
I did not anticipate, but should have, that the secret data post would be read as criticism of people who do large-data work, proprietary-data work, or work with government agencies that cannot currently be shared. The internet is pretty snarky, so it's worth stating explicitly that is not my intent or my view.
Quite the opposite. I am a huge fan of the pioneering work exploiting new data sets. If these pioneers had not found dramatic results and possibilities with new data, it would not matter whether we can replicate, check or extend those results.
It is only now that the pioneers have shown the way, and we know how important the work can be, that it becomes vital to rethink how we do this kind of work going forward.
The special problems of confidential government data
The government has a lot of great data -- IRS and Census data for microeconomics; SEC, CFTC, Fed, and financial product safety commission data for finance. And there are obvious reasons why so far it has not been easily shared.
Journal policies allow exceptions for such data. So only a fundamental demand from the rest of us for transparency can bring about changes. And that demand has begun to do so.
In addition to the suggestions in the last post, more and more people are going through the vetting to use the data. That leaves open the possibility that a full replication machine could be stored on site, ready for a replicator with proper access to push a button. Commercial data vendors could allow similar "free" replication, controlling directly how replicators use the data.
Technological solutions are on the way too. "Differential privacy" is an example of a technology that allows results to be replicated without compromising the privacy of the data. Leapyear.io is an example of companies selling this kind of technology. We are not alone, as there is a strong commercial demand for this kind of data. (Medical data for example.)
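For readers unfamiliar with the term, here is a minimal sketch of the textbook Laplace mechanism, the simplest differentially private release rule. It is a generic illustration of the idea, not a description of Leapyear's product or of any agency's actual system: a statistic computed on the confidential data is published only after noise scaled to the statistic's sensitivity is added, so no single record can noticeably change what the replicator sees.

    import numpy as np

    rng = np.random.default_rng(42)

    def private_mean(values, lower, upper, epsilon):
        """Release a mean with epsilon-differential privacy via the Laplace mechanism.

        Values are clipped to [lower, upper]; one record can then move the mean by at
        most (upper - lower) / n, and Laplace noise of that scale / epsilon is added.
        """
        v = np.clip(np.asarray(values, dtype=float), lower, upper)
        sensitivity = (upper - lower) / len(v)
        return v.mean() + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    # Hypothetical confidential incomes, made up purely for illustration.
    incomes = rng.lognormal(mean=10.5, sigma=0.6, size=10_000)
    print("privately released mean income:", private_mean(incomes, 0, 500_000, 0.5))

A replicator sees only the noisy release; smaller epsilon means stronger privacy and noisier answers, which is the trade-off these vendors manage.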
Other institutions: Journals, replication journals, websites
There is some debate whether checking "replication" should count as new research, and I argued that if we want replication we need to value it. The larger robustness question certainly is "new" research. Showing that X's result does not hold out of sample, or is sensitive to the precise choice of instruments and controls, and so forth, is genuine, publishable, follow-on research.
I originally opined that replications should be published by the original journal to give the best incentives. That means an AER replication "counts" as an AER publication.
But with the idea that robustness is the wider issue, I am less inclined to this view. This broader robustness or reexamination is genuine new research, and there is a continuum between replication and the normal business of examining the basic idea of a model with new data and also some new methods. Each paper on the permanent income hypothesis is not a "replication" of Friedman! We don't want to only value as "new" research that which uses novel methods -- then we become dry methodologists, not fact-oriented economists. And once a paper goes beyond pointing out simple mistakes, to questioning specification, a question which itself can be rebutted, it's beyond the responsibility of the original journal.
Ivo Welch argues that a third of each journal should be devoted to replication and critique. The Critical Finance Review, which he edits, asks for replication papers. The Journal of Applied Econometrics has a replication section, and now invites replications of papers in many other journals. Where journals fear to tread, other institutions step in. The Replication Network is one interesting new resource.
Faculties
A correspondent suggests an important additional bullet point for the "what can we do" list:
- Encourage your faculty to adopt a replicability policy as part of its standards of conduct, and as part of its standards for internal and outside promotions.
The precise wording of such standards should be fairly loose. The important thing is to send a message. Faculty are expected to make their research transparent and replicable, to provide data and programs, even when journals do not require it. Faculty up for promotion should expect that the committee reviewing them will look to see if they are behaving reasonably. Failure will likely lead to a little chat from your department chair or dean. And the policy should state that replication and robustness work is valued.
Another correspondent wrote that he/she advises junior faculty not to post programs and data, so that they do not become a "target" for replicators. To say we disagree on this is an understatement. A clear voice on this issue is an excellent outcome of crafting a written policy.
From Michael Kiley's excellent comment below:
- Assign replication exercises to your students. Assign robustness checks to your more advanced students. Advanced undergraduate and PhD students are a natural reservoir of replicators. Seeing the nuts and bolts of how good, transparent, replicable work is done will benefit them. Seeing that not everything published is replicable or right might benefit them even more.
Two good surveys of replications (as well as journals)
Maren Duvendack, Richard Palmer-Jones, and Bob Reed have an excellent survey article, "Replications in Economics: A Progress Report"
...a survey of replication policies at all 333 economics journals listed in Web of Science. Further, we analyse a collection of 162 replication studies published in peer-reviewed economics journals.
The latter is especially good, starting at p. 175. You can see here that "replication" goes beyond just can-we-get-the-author's-numbers, and maddeningly often does not even ask that question.
a little less than two-thirds of all published replication studies attempt to exactly reproduce the original findings....A frequent reason for not attempting to exactly reproduce an original study’s findings is that a replicator attempts to confirm an original study’s findings by using a different data set
"Robustness," not "replication."
Original Results?, tells whether the replication study re-reports the original results in a way that facilitates comparison with the original study. A large portion of replication studies do not offer easy comparisons, perhaps because of limited journal space. Sometimes the lack of direct comparison is more than a minor inconvenience, as when a replication study refers to results from an original study without identifying the table or regression number from which the results come.
Replicators need to be replicable and transparent too!
Across all categories of journals and studies, 127 of 162 (78%) replication studies disconfirm a major finding from the original study.
But rather than just the usual alarmist headline, they have a good insight. Replication studies can suffer the same significance bias as original work:
Interpretation of this number is difficult. One cannot assume that the studies treated to replication are a random sample. Also, researchers who confirm the results of original studies may face difficulty in getting their results published since they have nothing ‘new’ to report. On the other hand, journal editors are loath to offend influential researchers or editors at other journals. The Journal of Economic & Social Measurement and Econ Journal Watch have sometimes allowed replicating authors to report on their (prior) difficulties in getting disconfirming results published. Such firsthand accounts detail the reticence of some journal editors to publish disconfirming replication studies (see, e.g., Davis 2007; Jong-A-Pin and de Haan 2008, 57).
Summarizing:
...nearly 80 percent of replication studies have found major flaws in the original research.
Sven Vlaeminck and Lisa-Kristin Hermmann surveyed journals and report that many journals with data policies are not enforcing them.
The results we obtained suggest that data availability and replicable research are not among the top priorities of many of the journals surveyed. For instance, we found 10 journals (i.e. 20.4% of all journals with such policies) where not a single article was equipped with the underlying research data. But even beyond these journals, many editorial offices do not really enforce data availability: There was only a single journal (American Economic Journal: Applied Economics) which has data and code available for every article in the four issues.
Again, this observation reinforces my point that rules will not substitute for people caring about it. (They also discuss technological aspects of replication, and the impermanence and obscurity of zip files posted on journal websites.)
Numerical Analysis
Ken Judd wrote to me,
"Your advocacy of authors giving away their code is not the rule in numerical analysis. I point to the “market test”: the numerical analysis community has done an excellent job in advancing computational methods despite the lack of any requirement to share the code....
Would you require Tom Doan to give out the code for RATS? If not, then why do you advocate journals forcing me to freely distribute my code?...
The issue is not replication, which just means that my code gives the same answer on your computer as it does on mine. The issue is verification, which is the use of tests to verify the accuracy of the answers. That I am willing to provide."
Ken, I think, is reading "rules and censorship" rather than "social norms" into my views. And I think his letter reinforces my preference for the latter over the former. Among other things, rules designed for one purpose (extensive statistical analysis of large data sets) are poorly adapted to other situations (extensive numerical analysis).
Rules can be taken to extremes. Nobody is talking about "requiring" customers of a commercial package to distribute the (proprietary) package source code. We all understand that step is not needed.
For heavy numerical analysis papers, using author-designed software that the author wants to market, the verification suggestion seems a sensible social norm to me. If I'm refereeing a paper with a heavy numerical component, I would be happy to see the extensive verification, and happier still if I could use the program on a few test cases of my own. Seeing the source code would not be necessary or even that useful. Perhaps in extremis, if a verification failed, I would want the right to contact the author and understand why his/her code produces a different result.
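In that spirit, verification of the kind Ken offers might look like the sketch below: run the (closed-source) solver on test cases whose answers are known in closed form and report the errors, without ever opening the code. The solver name and interface here are hypothetical placeholders, not Ken's actual software.

    import numpy as np

    # from authors_package import solve   # hypothetical: the author's black-box routine

    def solve(a, b):
        """Stand-in for the author's routine; here it returns the positive root of
        a*x**2 - b = 0. Swap in the real call when running an actual verification."""
        return np.sqrt(b / a)

    # Test cases with analytically known answers.
    cases = [
        {"a": 1.0, "b": 4.0, "truth": 2.0},
        {"a": 2.0, "b": 8.0, "truth": 2.0},
        {"a": 3.0, "b": 27.0, "truth": 3.0},
    ]

    for case in cases:
        err = abs(solve(case["a"], case["b"]) - case["truth"])
        status = "PASS" if err < 1e-8 else "FAIL"
        print(f"a={case['a']}, b={case['b']}: error {err:.2e} -> {status}")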
Some other examples of "replication" (really robustness) controversies:
Andrew Gelman covers a replication controversy, in which Douglas Campbell and Ju Hyun Pun dissect Enrico Spolaore and Romain Wacziarg's "The Diffusion of Development" in the QJE. There is no charge that the computer programs were wrong, or that one cannot produce the published numbers. The controversy is entirely over specification: the result is sensitive to the choice of specification and controls.
Yakov Amihud and Stoyan Stoyanov, in "Do Staggered Boards Harm Shareholders?", reexamine Alma Cohen and Charles Wang's Journal of Financial Economics paper. They come to the opposite conclusion, but could only reexamine the issue because Cohen and Wang shared their data. Again, the issues, as far as I can tell, are not a charge that programs or data are wrong.
Update: Yakov corrects me:
- We do not come to "the opposite conclusion". We just cannot reject the null that staggered board is harmless to firm value, using Cohen-Wang's experiment.
- Our result is also obtained using the publicly-available ISS database (formerly RiskMetrics).
- Why is the difference between the results? We used CRSP data and did not include a few delisted (penny) stocks that are in Cohen-Wang's sample. Our paper states which stocks were omitted and why. We are re-writing the paper now with more detailed analysis.
I think the point remains clear: replication slides into robustness, which is more important and more contentious.
Asset pricing is especially vulnerable to results that do not hold out of sample, in particular the ability to forecast returns. Campbell Harvey has a number of good papers on this topic. Here, the issue is again not that the numbers are wrong, but that many good in-sample return-forecasting tricks stop working out of sample. To know, you have to have the data.