taxpayer (randomly) taken from the 2011 tax record is a descendant of one
taxpayer (randomly) selected from the 1427 Census is strictly higher if the two
share the same surname. Two facts challenge our working hypothesis. First, people
sharing the same surname may well not belong to the same family. Second, the city
of Florence is not a closed system. For instance, it may well happen that an
immigrant, having the same surname as those living in Florence in 1427, settled in
Florence from outside in the following centuries. Our methodology erroneously
treats the latter as a pseudo-descendant of the former.
We start by noting that our pseudo-links are more reliable than those adopted
in previous studies, as they are generated by surnames observed within the
city of Florence. For example, if the same data were available for all Italian cities,
our strategy would entail the prediction of the ancestors’ socioeconomic status
using the interaction between surnames and cities. This is arguably a more
demanding and more precise approach to creating links across generations than
the one adopted in previous studies (i.e. surnames at the national level). Moreover,
the huge heterogeneity and “localism” of Italian surnames further strengthen the
quality of the pseudo-links.
Nevertheless, we propose three tests aimed at showing the robustness of our
findings to the lineage imputation procedure. The first test is based on the idea
that the more common a surname is, the less informative a shared surname is
likely to be about actual kinship. In the first two columns of Table 8, we
re-estimate equation (3) by weighting observations with the inverse of the
surname's relative frequency in 1427, thus giving more weight to rare surnames.
Our results are confirmed and, if anything, the estimates are revised upward,
consistent with the fact that mismeasurement of the family links should lead to
an attenuation bias.
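As an illustration of the weighting scheme only, the following Python sketch computes inverse relative-frequency weights and a weighted least-squares slope. The function names and the single-regressor setup are assumptions made for this example; the sketch does not reproduce equation (3) or the TS2SLS procedure.

```python
from collections import Counter


def inverse_frequency_weights(surnames_1427):
    """Map each surname to 1 / (its relative frequency in the 1427 list).

    `surnames_1427` is a hypothetical list with one entry per household.
    Rare surnames receive larger weights.
    """
    counts = Counter(surnames_1427)
    total = len(surnames_1427)
    return {name: total / n for name, n in counts.items()}


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x (intercept included)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    var = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return cov / var
```

In this sketch a surname borne by one household in four receives a weight of 4, whereas one borne by three in four receives 4/3, so rare surnames dominate the weighted fit.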
The second test exploits the extent to which a surname is Florence-specific
(specificity is measured as the ratio between the surname's share in Florence and
the corresponding share at the national level): the idea is that the more a surname
is Florence-specific, the less likely it is to be contaminated by in- and
out-migration. In the last two columns of Table 8, we split our key
parameters by interacting them with a dummy variable that equals 1 for more
typical Florentine surnames (those with a value of the ratio above the median) and
0 otherwise. The results are reassuring: the elasticities are larger (and significant)
for more Florence-specific surnames.
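The construction of the specificity dummy can be sketched as follows. This is a minimal illustration under stated assumptions: the two dictionaries of surname shares and the function name are hypothetical, and surnames exactly at the median are coded 0.

```python
from statistics import median


def specificity_dummies(share_florence, share_national):
    """Return 1 for surnames whose Florence/national share ratio is above
    the median ratio, and 0 otherwise.

    Both arguments are hypothetical dicts mapping surname -> population share.
    """
    ratios = {s: share_florence[s] / share_national[s] for s in share_florence}
    cutoff = median(ratios.values())
    return {s: int(r > cutoff) for s, r in ratios.items()}
```

The resulting dummy would then be interacted with the regressors of interest, so that separate elasticities are estimated for more and less Florence-specific surnames.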
The two exercises discussed above indirectly test the robustness of the
pseudo-links. We complement them with a direct test that goes as follows. We
randomly reassigned surnames to taxpayers in 2011 and re-estimated the TS2SLS
intergenerational elasticities. If the positive correlations we detected were not
related to lineage (whose measurement might be affected by error) but emerged
by chance, we should find that our estimates are not statistically different
from those stemming from a random reshuffling of surnames. Figure 4 shows the
distribution of the estimated earnings elasticity for 1 million replications. The two
dashed vertical lines are the 95th and the 99th percentiles, while the red line
indicates our estimate based on the observed surnames. These results provide a
clear graphical representation of the informational content of the surnames and
the goodness of the pseudo-links: the simulated p-value in this exercise is lower
than 1%. Figure 5 shows the corresponding results for wealth, where the result of
the test is even more telling.
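The logic of this reshuffling test can be sketched in Python as follows. This is a simplified illustration, not the estimation used in the paper: a plain least-squares slope stands in for the TS2SLS elasticity, the p-value is taken as one-sided (upper tail), and all names (`child_outcome`, `child_surname`, `ancestor_value`) are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (intercept included)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    var = sum((xi - xbar) ** 2 for xi in x)
    return cov / var


def permutation_p_value(child_outcome, child_surname, ancestor_value,
                        n_reps=1000, seed=0):
    """Compare the observed slope with slopes under reshuffled surnames.

    ancestor_value maps surname -> ancestors' (predicted) status.
    Returns (observed_slope, simulated one-sided p-value).
    """
    rng = random.Random(seed)

    def estimate(surnames):
        x = [ancestor_value[s] for s in surnames]
        return ols_slope(x, child_outcome)

    observed = estimate(child_surname)
    shuffled = list(child_surname)
    exceed = 0
    for _ in range(n_reps):
        rng.shuffle(shuffled)  # randomly reassign surnames to taxpayers
        if estimate(shuffled) >= observed:
            exceed += 1
    # The +1 correction counts the observed estimate as one of the draws.
    return observed, (exceed + 1) / (n_reps + 1)
```

A small simulated p-value indicates that the observed estimate lies in the upper tail of the reshuffled distribution, which is what the dashed percentile lines in the figures convey graphically.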
5.3 Selectivity bias due to families’ survival rate
As noted above, we are able to match only a subsample of the surnames in the
1427 Census with the 2011 tax records. This is clearly a reflection of the
demographic processes that are involved in the analysis of intergenerational
mobility in the very long run: the families’ survival rate depends on migration,
reproduction, fertility and mortality, which, in turn, may differ across people with
different socioeconomic backgrounds.
As far as migration is concerned, some of the families recorded in the 1427
Census might have decided to migrate during the following centuries. Since they
are not necessarily a random sample of the original population, this might bias our
estimates. Borjas (1987) provided a theoretical model that shows that migrants
are mainly drawn from the upper or lower tail of the skill (i.e. income) distribution.
Analogously, a dynasty’s reproduction rate (i.e. fertility/mortality rate) may be
correlated with income and/or wealth. Jones et al. (2010) showed a strong and
robust negative relationship between income and fertility, though they also argued
that, in the agrarian (pre-industrialization) economies, the reverse could have
been possible, as documented, for example, in Clark and Cummins (2009). On the
other hand, it is reasonable to expect that the wealthiest families were those better
equipped to survive across the centuries (and therefore, those that can be matched
to the current tax records).
How do we address these issues? First, we compare the distributions of
earnings and wealth in 1427 between the families who are still present in the tax
records of 2011 and those who are not in order to have a general assessment of the
relevance of the selection issue. Figure 6a shows that the distributions of earnings
are rather similar, although the density for missing families has more
probability mass at lower levels of earnings. As far as wealth is concerned, the two
distributions overlap each other (Figure 6b). Table 9 confirms the visual
inspection: relative to the missing families, the surviving ones had 6% higher