Database Ulisse: Methodological note
September 2018
Table of Contents
Presentation
The sources of database Ulisse
  UN Comtrade
  Other sources
    Eurostat - db Comext
    United Nations - db Monthly Comtrade
    US Census Bureau - db UsaTrade
Estimation of Year-End
Normalization
Outliers
Missing Data
Price Ranges
Quantity at constant prices
Presentation
This document describes the methodological choices made in building the database Ulisse.
The export and import data declared by the countries of the world are a rich source of information. They make it possible to follow the dynamics of different markets at a very high level of detail, to know at what price a given good is sold on a specific market and, above all, whether some operators on that market pay a premium price for higher quality. Finally, they make it possible to assess the competitiveness of different countries, highlighting their strengths and weaknesses. Often, however, this information is not immediately accessible from a simple reading of the data. The declarations of different countries sometimes contain data that do not correspond to the actual flows of foreign trade. The main reasons are as follows:
Estimation procedures: data are not always collected when the goods cross a customs border. This is the case, for example, of the foreign trade flows of European Union countries, whose values are estimated by the national statistical institutes on the basis of the VAT returns filed by companies. Generally, however, these estimation procedures are highly reliable.
Product classification: not all countries use the same product classifications and, therefore, flows relating to similar products can be attributed to different product codes. In recent years, however, classifications have been strongly harmonized, mainly thanks to the work of the World Customs Organization (www.wcoomd.org), which has developed the Harmonized System classification, used by almost 150 countries.
Confidentiality: foreign trade flows by country and product are sometimes suppressed because they are considered confidential. Each country follows its own rules, with a higher incidence of suppressed flows for smaller countries. Among the countries of the European Union, some (e.g. Denmark, Finland, Austria) suppress flows worth more than 5% of their total exports.
Delays: some developing countries, with a less well-established administrative structure, tend to publish their foreign trade statistics with a delay, in some cases of up to a few years. The UN is committed to supporting these countries by financing projects that aim to improve the procedures for collecting, processing and publishing foreign trade data.
In order to extract meaningful and reliable information from the database, it is therefore necessary to use a variety of methodological tools designed to separate the information content of the phenomenon from the statistical noise.
It is equally essential to build analytical procedures that make the best use of the many advantages inherent in international trade data, namely:
Large number of observations: most countries publish monthly foreign trade statistics by product code and by partner country. It follows that, each month, each country publishes data on several million trade flows; this large volume of data makes it easy to identify possible outliers.
Data refer to the population: foreign trade statistics cover almost all trade flows. This avoids the sampling and measurement errors that can arise from the use of samples.
Double statements: each foreign trade flow is declared twice, once by the exporting country and once by the importing country. The double declaration, if properly used, significantly increases the reliability of the measures of the various phenomena.
These features of foreign trade data make it possible to apply data mining methodologies that satisfactorily isolate the information content of the various phenomena from their statistical noise. This document describes the methodologies used in the construction of the Data Warehouse Ulisse. They include:
Outlier detection: methods that identify data affected by significant measurement errors and replace them with values more consistent with the other observations.
Missing data detection: methods that identify trade flows for which some measures are missing and estimate, for the missing data, a value consistent with the available information.
Normalization: methods that compare the different measurements of the same phenomenon (e.g. a flow declared both by the country of origin and by the country of destination), producing a single measurement for each variable.
Conjuncture: methods that produce a preliminary estimate of an annual figure, using the best short-term information available.
Price ranges for differentiable goods: methods that assess whether a certain good may be subject to quality differentiation. For a differentiable good, the procedures developed by StudiaBo assign a specific price (quality) range to each flow, by comparing its price with that of the other flows of the same good.
Quantity at constant prices: foreign trade declarations report both values and quantities. The ratio between the two variables is the so-called average unit value, often identified with the price. These values also incorporate quality changes affecting the product. In order to capture these changes, StudiaBo built the variable Quantity at constant prices (Q), which is designed to include quality changes in the product as well. The prices derived as the ratio between values and quantities at constant prices can therefore be considered more reliable than average unit values, because they tend not to be affected by quality changes in the product.
We believe that the methods used by StudiaBo and described below produce a highly informative dataset.
The sources of database Ulisse
UN Comtrade
The database Ulisse was built from information made available by different sources of economic information, first of all the United Nations Statistics Division, which built and maintains the Comtrade database (http://comtrade.un.org/db/). UN Comtrade brings together the annual foreign trade flows of more than 170 countries, detailed by product code. The United Nations Statistics Division estimates that UN Comtrade covers more than 95% of world trade. For each foreign trade flow (the declaration of a country towards a commercial partner in a given year), UN Comtrade reports the dollar value and the quantity exchanged (expressed in kilograms and/or in a supplementary unit, which differs from product to product). The Harmonised System (HS)¹ is the product classification used in the database Ulisse.
Other sources
The objective of having a database that can also support the analysis of more recent events required the development of a procedure for periodically updating the database Ulisse. This was made possible by drawing on the economic information made available by the following additional sources:
• Eurostat Comext database: monthly foreign trade flows declared by the EU and EFTA countries (http://epp.eurostat.ec.europa.eu/);
• UN Monthly Comtrade database: monthly foreign trade flows declared by the UN countries participating in the UN Monthly Comtrade project (http://comtrade.un.org/monthly/);
• U.S. Census Bureau - db UsaTrade: monthly foreign trade flows of U.S. companies (https://usatrade.census.gov/).
Eurostat - db Comext
Eurostat Comext (http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?sort=1&dir=comext) is the database containing the monthly foreign trade declarations reported by EU and EFTA countries.
For each monthly foreign trade flow, db Comext reports both monetary values and volumes (kilograms and/or another supplementary unit of measure). Data are updated monthly at fixed intervals, with a delay of about six weeks for flows with partner countries outside Europe and about 10 weeks for flows with European countries.
EU member states use a very detailed customs classification, the 8-digit Combined Nomenclature (CN8). CN8 is a further breakdown of the 6-digit Harmonized System customs classification (HS6): each HS6 code is the sum of one or more CN8 subcodes.
¹ The HS classification at the 6-digit level, developed by the World Customs Organization (WCO), is revised every 5 years (last time in 2017).
United Nations - db Monthly Comtrade
Since 2010 the United Nations Statistics Division has been running an experimental project called Monthly Comtrade, aimed at building an online database, accessible at http://comtrade.un.org/monthly/, of the monthly foreign trade flows reported by UN countries.
Data are available in dollars and kilograms for each product of the Harmonized System customs classification at the 6-digit level (HS6).
US Census Bureau - db UsaTrade
The US Census Bureau disseminates the monthly foreign trade data reported by the United States of America through the portal UsaTrade Online (https://usatrade.census.gov/), according to the Harmonized System customs classification at the 10-digit level (HS10). HS10 is a further breakdown of the 6-digit Harmonized System customs classification (HS6): each HS6 code is the sum of one or more HS10 subcodes.
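Since both CN8 (Comext) and HS10 (UsaTrade) codes are refinements of HS6, detailed flows can be brought to the common HS6 level by summing over the first six digits of the code. The following is a minimal Python sketch of that aggregation, not StudiaBo's actual implementation. The function name and the sample codes and values are hypothetical, and the sketch assumes that codes are stored as digit strings whose first six characters are the HS6 code.

```python
from collections import defaultdict


def aggregate_to_hs6(detailed_values):
    """Sum values reported at CN8 or HS10 level into their parent HS6 codes.

    detailed_values maps a detailed product code (string of digits) to a
    trade value. Assumption: the first six digits identify the HS6 code.
    """
    hs6_totals = defaultdict(float)
    for code, value in detailed_values.items():
        hs6_totals[code[:6]] += value
    return dict(hs6_totals)


# Hypothetical CN8 codes and values, for illustration only.
cn8_flows = {"01012100": 10.0, "01012910": 5.0, "01012990": 7.0}
hs6_flows = aggregate_to_hs6(cn8_flows)
```

The same function works unchanged for 10-digit codes, since only the six-digit prefix is used.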
Estimation of Year-End
Foreign trade data are processed by StudiaBo with the objective of also supporting the analysis of more recent events. As already mentioned, however, the United Nations Statistics Division sets no deadlines for the transmission of data, so trade flows for the previous year are often missing. To overcome this drawback, StudiaBo has implemented an estimation procedure that closes the year on the basis of the monthly declarations available. The procedure first isolates the last two years elapsed. Then, from the n months already reported, change rates are calculated for values and quantities for each flow (country of origin - country of destination). These rates make it possible to estimate the foreign trade of the 12-n months still missing and hence to derive a first year-end estimate: the StudiaBo procedure applies to the preceding annual UN Comtrade figures the dynamics expressed by the different monthly databases. Please note that, for non-EU countries that do not take part in Monthly Comtrade, the estimation procedure is considerably more demanding. Given a flow between a country of origin and a country of destination, the procedure handles the following cases:
• if a closed sample can be built for the last two years and a rate of change can be calculated for both exporter and importer, the flow not yet declared is calculated by applying a change rate equal to the average of the two;
• if the information available yields a change rate only for the exporter or only for the importer, the flow not yet declared is calculated using the rate of change available for the most recent year;
• if no rate of change can be calculated for a given flow, the flow is calculated by applying the ratio between the total exports of the product in the last year and its total exports in the penultimate year.
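The three cases above can be sketched in Python as follows. This is an illustrative reading of the procedure rather than StudiaBo's code: the function names are hypothetical, change rates are expressed as ratios (1.10 meaning +10%), and the "closed sample" is assumed to be the same n months observed in both years.

```python
def change_rate(current_months, previous_months):
    """Ratio between the totals of the same n months in two consecutive years.

    Returns None when no closed sample is available (assumption: both lists
    must be non-empty, equally long, and the previous total must be positive).
    """
    if not current_months or len(current_months) != len(previous_months):
        return None
    previous_total = sum(previous_months)
    if previous_total <= 0:
        return None
    return sum(current_months) / previous_total


def estimate_year_end(previous_annual, exporter_rate, importer_rate, product_rate):
    """Apply the appropriate change rate to the preceding annual figure.

    exporter_rate / importer_rate may be None; product_rate is the fallback
    ratio of the product's total exports between the last two years.
    """
    rates = [r for r in (exporter_rate, importer_rate) if r is not None]
    if len(rates) == 2:
        rate = sum(rates) / 2      # both declarations: average of the two
    elif len(rates) == 1:
        rate = rates[0]            # only one declaration available
    else:
        rate = product_rate        # no flow-level rate: product-level ratio
    return previous_annual * rate


# Hypothetical flow: two months reported so far, exporter side only.
rate_exp = change_rate([110.0, 110.0], [100.0, 100.0])
estimate = estimate_year_end(1000.0, rate_exp, None, product_rate=1.05)
```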
Normalization
As mentioned before, foreign trade information is characterized by a double statement: each flow is declared by both the exporting country and the importing country, and the two declarations may differ. The values declared by the importer include the costs of insurance and freight (CIF: Cost, Insurance and Freight) and are therefore generally higher than the values declared by the exporter (FOB: Free On Board). The StudiaBo procedure, based on the Mirror Statistics methodology, detects and handles the following cases:
• the trade flow is declared only once, by the exporter or by the importer (e.g. for reasons of commercial secrecy or because of the small economic importance of the flow). In this case, StudiaBo estimates the missing declaration assuming a difference of 5% between the values at CIF prices and at FOB prices, in line with a broad empirical literature that estimates transport and insurance costs at between 3% and 8% of the total value;
• the trade flow is declared twice, by both importer and exporter, and the two statements are mutually consistent². In this case, the procedure makes no adjustment;
• the trade flow is declared by both importer and exporter, but the two statements are mutually inconsistent. In this case, the StudiaBo procedure recalculates the value of the flow at FOB and CIF prices using the following simple averages:
FOB = (V_{exp} + V_{imp} / 1.05) / 2    (1)
CIF = (V_{exp} \cdot 1.05 + V_{imp}) / 2    (2)
where V_{exp} and V_{imp} represent, respectively, the value declared by the country of origin and the value declared by the country of destination.
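The three cases and equations (1) and (2) can be illustrated with a short Python sketch. It is a simplified reading, not StudiaBo's implementation: the function and constant names are hypothetical, and the consistency check measures the 20% tolerance relative to the larger of the two raw declarations, which is an assumption, since the note does not state the base of the percentage.

```python
CIF_FOB_MARGIN = 1.05          # 5% difference between CIF and FOB values
CONSISTENCY_TOLERANCE = 0.20   # declarations within 20% count as consistent


def is_consistent(v_exp, v_imp):
    """Assumption: the relative gap is measured against the larger declaration."""
    larger = max(v_exp, v_imp)
    if larger == 0:
        return True
    return abs(v_imp - v_exp) / larger <= CONSISTENCY_TOLERANCE


def reconcile_flow(v_exp=None, v_imp=None):
    """Return the pair (FOB value, CIF value) for one trade flow."""
    if v_exp is None and v_imp is None:
        raise ValueError("at least one declaration is required")
    if v_imp is None:                       # declared by the exporter only
        return v_exp, v_exp * CIF_FOB_MARGIN
    if v_exp is None:                       # declared by the importer only
        return v_imp / CIF_FOB_MARGIN, v_imp
    if is_consistent(v_exp, v_imp):         # consistent: no adjustment
        return v_exp, v_imp
    fob = (v_exp + v_imp / CIF_FOB_MARGIN) / 2   # equation (1)
    cif = (v_exp * CIF_FOB_MARGIN + v_imp) / 2   # equation (2)
    return fob, cif


# Hypothetical inconsistent declarations: exporter 100, importer 150.
fob_value, cif_value = reconcile_flow(v_exp=100.0, v_imp=150.0)
```

Note that, for inconsistent declarations, the two averages always satisfy CIF = 1.05 · FOB, so the reconciled pair respects the assumed 5% margin.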
The technique just described extracts higher-quality information from the data available, particularly for the measurement of values. The measurement of quantities, in contrast, requires additional steps, namely the identification of outliers and the estimation of missing data, which are described in the next chapters.
² StudiaBo considers a double statement consistent whenever the two declarations do not differ in value by more than 20%.
Outliers
One possible cause of discrepancy between import and export declarations in quantity is the presence of outliers or measurement errors which, if not treated, can distort the understanding of the economic phenomenon. Quantity declarations, whether expressed in kilograms or in a supplementary unit of measure, are therefore subject to some control filters. First, StudiaBo has decided to discard the quantity information estimated by the United Nations Statistics Division, considering it more appropriate to rely on a well-codified internal methodology. Then, on the basis of the available declarations, the following key ratios are constructed:
• an average unit value per flow, given by the ratio between monetary value and kilograms;
• a conversion factor, given by the ratio between kilograms and supplementary units of measure.
On the basis of the ordered distribution of these key ratios, a range of reliability is defined (bounded by the first and ninth decile of the distribution), within which observations are considered reliable. Observations falling outside this range are recalculated so as to fall within it, which smooths the distribution. This method is used for all traded goods previously identified as differentiable³. If, however, the preliminary analysis of differentiability has concluded that a good is substantially homogeneous, the range of reliability of the key ratios is narrowed around the median.
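A minimal Python sketch of the decile-based filter follows. It assumes that "recalculated to be in the range" means clipping an out-of-range ratio to the nearest bound, which is one possible reading; the note does not specify the recalculation rule or the exact quantile convention. The function name is hypothetical, and the narrower deciles shown for homogeneous goods are purely illustrative.

```python
from statistics import quantiles


def clip_to_decile_range(ratios, lower_decile=1, upper_decile=9):
    """Clip key ratios (e.g. average unit values) to a decile-based range.

    Assumption: values outside [lower decile, upper decile] are replaced by
    the nearest bound. Deciles use the 'inclusive' quantile method.
    """
    cuts = quantiles(ratios, n=10, method="inclusive")  # 9 cut points
    low, high = cuts[lower_decile - 1], cuts[upper_decile - 1]
    return [min(max(r, low), high) for r in ratios]


# Hypothetical unit values with one obvious outlier (1000.0).
unit_values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 1000.0]
smoothed = clip_to_decile_range(unit_values)

# For a homogeneous good the range could be narrowed around the median,
# e.g. between the 4th and 6th decile (illustrative choice only).
narrow = clip_to_decile_range(unit_values, lower_decile=4, upper_decile=6)
```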
³ To distinguish commodity goods from qualitatively differentiable goods, StudiaBo has built a fixed-effects econometric panel data model, which estimates the elasticity of export shares to price changes.
Missing Data
Where quantity information is not available, or has been estimated by the United Nations Statistics Division, StudiaBo rebuilds the data using the following internal interpolation method:
• First, the time series of annual average unit values is built for each product, given by the characteristic ratio between values and kilograms of the product in each year. This series represents the baseline dynamics of the prices of the product under analysis and has a single dimension (time). Note that if little information is available for the last year elapsed, the characteristic ratio is set equal to that of the penultimate year. If, on the contrary, the information available is sufficiently consistent, the average unit value is calculated on the closed sample of observations for the last two-year period.
• If an annual characteristic ratio cannot be assigned to a product, StudiaBo performs a first interpolation using the available years as a reference.
• Given the time series of annual average unit values, a two-dimensional matrix is built, in which the price driver is not only the year but also the exporting country. Note that, for reasons of robustness, the values in this matrix are made explicit only for countries that, in a base year (e.g. 2010), hold a share of at least 1% of world trade in the product. For less relevant exporting countries, the characteristic ratio coincides with that of the annual time series of the product. If, in a given year and for a given exporter, quantity information is not available, StudiaBo's procedure again resorts to interpolation, using the index given by the ratio between the characteristic ratio for exporter and year and the characteristic ratio for year.
• Finally, the Cartesian product of observations is built, which identifies, over the whole time period, all possible combinations of flows between a country of origin and a country of destination. For every combination of exporter and importer, the index given by the ratio between the characteristic ratio for the flow and the characteristic ratio for the exporter is interpolated.
• Once the available data are as complete as possible, the information in kilograms is reconstructed from the average unit values, only for flows that have a counterpart in value.
• After the information in kilograms has been reconstructed, the process is repeated for a second type of characteristic ratio, the conversion factor, which expresses for each product the weight per supplementary unit of measure.
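The exporter-level step and the final reconstruction of kilograms can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the StudiaBo procedure itself: the note does not say which interpolation scheme is used, so linear interpolation between known years (with the nearest known value at the edges) is assumed here, and all function names and figures are hypothetical.

```python
def interpolate_gaps(series):
    """Fill None entries by linear interpolation between known neighbours.

    Assumption: gaps at the edges take the nearest known value.
    """
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("at least one observation is required")
    filled = []
    for i, v in enumerate(series):
        if v is not None:
            filled.append(v)
            continue
        before = [(k, kv) for k, kv in known if k < i]
        after = [(k, kv) for k, kv in known if k > i]
        if before and after:
            (i0, v0), (i1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (i - i0) / (i1 - i0))
        else:
            filled.append(before[-1][1] if before else after[0][1])
    return filled


def fill_exporter_unit_values(product_auv, exporter_auv):
    """Interpolate the index exporter ratio / product ratio, then rescale."""
    index = [None if e is None else e / p
             for e, p in zip(exporter_auv, product_auv)]
    return [i * p for i, p in zip(interpolate_gaps(index), product_auv)]


def rebuild_kilograms(value, unit_value):
    """Kilograms implied by a declared value and an average unit value."""
    return value / unit_value


# Hypothetical data: product-level and exporter-level unit values by year.
product_auv = [2.0, 4.0, 8.0]
exporter_auv = [3.0, None, 12.0]           # quantity missing in year 2
filled_auv = fill_exporter_unit_values(product_auv, exporter_auv)
kilograms_year2 = rebuild_kilograms(600.0, filled_auv[1])
```

Interpolating the index rather than the unit value itself lets the filled value follow the product-level price dynamics, which is the idea described in the third bullet above.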
Price Ranges
In order to analyse the quality characteristics of the international trade flows of a product, StudiaBo's procedure assigns to each flow a variable, Price Range (R), built on the ordered distribution of average unit values (the ratios between values and kilograms). In detail, the values of international trade are deflated by a production price index, built on purpose for about 200 supply chains. This price index summarizes the impact of the following inputs:
• the cost of labour;
• the price of materials.
In particular, the cost of labour is calculated from the hourly labour costs per country provided by the U.S. Bureau of Labor Statistics. The average hourly labour cost is weighted by the export levels of the more than 150 countries of the Ulisse classification and then adjusted by a fixed percentage attributable to productivity changes. The price of materials, in turn, incorporates both the oil price (as a proxy for energy consumption) and the price of the main commodity used in the production process of the supply chain. Deflating average unit values yields a distribution of prices that are comparable with one another, because they are expressed relative to a base year. Once the deciles of the distribution have been identified, they are used as thresholds separating the different price ranges. These price ranges can then be made dynamic: for each year, the specific deciles of the distribution are calculated by applying to the deciles of the base year (e.g. 2010) the change rate embedded in the production price index, according to the equation:
Decile_{n,year} = Decile_{n,2010} \cdot Index_{year} / 100    (3)
where Decile_{n,year} is the decile in position n in the given year, Decile_{n,2010} is its counterpart calculated in the base year (2010) and Index_{year} is the value of the production price index in the given year. The price ranges are therefore identified dynamically, and distinguish the following ranges of price (quality):
• LL - Low Low: identifies the low-quality market segments, comprising all flows with an average unit value below the second decile of the ordered distribution of prices at which the product is marketed worldwide;
• LM - Medium Low: identifies the medium-low quality market segments, comprising all flows between the second and fourth decile of the ordered distribution of prices at which the product is sold worldwide;
• MM - Medium Medium: identifies the medium quality market segments, comprising all flows between the fourth and sixth decile of the ordered distribution of prices at which the product is marketed worldwide;
• HM - High Medium: identifies the medium-high quality market segments, comprising all flows between the sixth and eighth decile of the ordered distribution of prices at which the product is sold worldwide;
• HH - High High: identifies the high-end quality market segments, comprising all flows with an average unit value higher than the eighth decile of the ordered distribution of prices at which the product is marketed worldwide.
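The assignment of price ranges can be sketched in Python as follows. It is an illustrative sketch under assumptions, not StudiaBo's code: function names and figures are hypothetical, deciles are computed with the "inclusive" quantile convention, and a unit value exactly equal to a threshold is placed in the upper range, a boundary rule the note does not specify. Goods flagged as non-differentiable receive the standard range MM.

```python
from bisect import bisect_right
from statistics import quantiles

LABELS = ["LL", "LM", "MM", "HM", "HH"]


def base_year_thresholds(deflated_unit_values):
    """Second, fourth, sixth and eighth decile of the base-year distribution."""
    cuts = quantiles(deflated_unit_values, n=10, method="inclusive")
    return [cuts[1], cuts[3], cuts[5], cuts[7]]


def thresholds_for_year(base_thresholds, index_year):
    """Equation (3): scale base-year deciles by the production price index."""
    return [t * index_year / 100 for t in base_thresholds]


def price_range(unit_value, thresholds, differentiable=True):
    """Return the price range label of a flow given the year's thresholds."""
    if not differentiable:
        return "MM"
    return LABELS[bisect_right(thresholds, unit_value)]


# Hypothetical base-year unit values and a price index of 110 in a later year.
base = base_year_thresholds([1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                             7.0, 8.0, 9.0, 10.0, 11.0])
later = thresholds_for_year(base, index_year=110.0)
range_in_base_year = price_range(5.2, base)
range_in_later_year = price_range(5.2, later)
```

In this example the same nominal unit value falls into a lower range in the later year, because the thresholds have moved up with the production price index.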
This attribution process applies only where the product has previously been identified as differentiable through the estimation of the dedicated econometric model. For non-differentiable goods, for which an exporting country cannot extract a significant premium price from the market, the standard price range MM (Medium-Medium) is applied to all flows of the product when building the variable Price Range (R).
Quantity at constant prices
As mentioned before, the main measures of foreign trade flows are values and physical units (generally expressed in kilograms), whose ratios are the so-called average unit values, often regarded as a proxy for prices. Average unit values are widely used in the procedure for building DW Ulisse. It should be pointed out, however, that changes in average unit values, besides reflecting price changes, may also incorporate changes in product quality. In order to support analyses that take those changes into account, StudiaBo chose to separate the quality effects from prices and include them in a new variable (Q), which measures quantity at constant prices. This yields a price indicator, based on the ratio between values and quantities at constant prices, that is more informative because it is not distorted by quality changes. In detail, the variable Q is constructed for a single foreign trade flow with reference to a base year (e.g. 2010), isolating the quantity on the basis of the price range to which the flow belongs in that year. For the base year the procedure builds both the average unit value for each specific combination [country of origin - country of destination - price range] and the average price for each price range. If a given flow shows a quality increase compared with 2010 large enough to place it in a different price range, the average unit value of 2010 is abandoned in favour of the average price associated with the new price range. This results in an increase of the quantity expressed at constant prices (Q), as explained by the following equation:
Q_{flow,year} = K_{flow,year} \cdot AUV_{2010}    (4)
where Q_{flow,year} is the quantity at constant prices in a given year for a given combination [country of origin - country of destination - price range], K_{flow,year} is the kilograms traded in that year for the same combination, and AUV_{2010} is the average unit value in 2010 of the price range associated with the flow.
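Equation (4) and the switch between price ranges can be illustrated with a short Python sketch. It reflects one reading of the text, not StudiaBo's implementation: the function and parameter names are hypothetical, and the base-year prices per range are made-up figures. When the flow stays in its base-year range, its own 2010 average unit value is used; when it moves to another range, the 2010 average price of the new range is used instead.

```python
def quantity_at_constant_prices(kilograms, current_range, base_range,
                                base_flow_auv, base_range_prices):
    """Quantity at constant prices, Q = K * AUV_2010 (equation 4).

    base_flow_auv: 2010 average unit value of the flow in its base range.
    base_range_prices: 2010 average price for each price range label.
    """
    if current_range == base_range:
        auv_2010 = base_flow_auv
    else:
        # The flow changed quality segment: use the 2010 average price
        # of the new price range instead of the flow's own 2010 value.
        auv_2010 = base_range_prices[current_range]
    return kilograms * auv_2010


# Hypothetical 2010 average prices per price range (value per kilogram).
prices_2010 = {"LL": 1.5, "LM": 2.5, "MM": 4.0, "HM": 6.5, "HH": 10.0}

q_same_range = quantity_at_constant_prices(100.0, "MM", "MM", 4.2, prices_2010)
q_upgraded = quantity_at_constant_prices(100.0, "HM", "MM", 4.2, prices_2010)
```

With the same kilograms traded, the upgraded flow shows a larger Q, which is how the quality increase is carried into the quantity variable rather than into the price.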