

regression. To do this, we apply the Nadaraya-Watson estimator (12.3.9) with a Gaussian kernel to the data, and vary the bandwidth parameter h between 0.1σ_x and 0.5σ_x, where σ_x is the sample standard deviation of {X_t}. By varying h in units of standard deviation, we are implicitly normalizing the explanatory variable X_t by its own standard deviation, as (12.3.10) suggests.

For each value of h, we plot the kernel estimator as a function of x, and these plots are given in Figures 12.5a to 12.5c. Observe that for a bandwidth of 0.1σ_x the kernel estimator is too choppy: the bandwidth is too small to provide sufficient local averaging to recover Sin(X_t). While the kernel estimator does pick up the cyclical nature of the data, it is also picking up random variations due to noise, which may be eliminated by increasing the bandwidth and consequently widening the range of local averaging.

Figure 12.5b shows the kernel estimator for a larger bandwidth of 0.3σ_x, which is much smoother and a closer fit to the true conditional expectation.

As the bandwidth is increased, the local averaging is performed over successively wider ranges, and the variability of the kernel estimator (as a function of x) is reduced. Figure 12.5c plots the kernel estimator with a bandwidth of 0.5σ_x, which is too smooth since some of the genuine variation of the sine function has been eliminated along with the noise. In the limit, the kernel estimator approaches the sample average of {Y_t}, and all the variability of Y_t as a function of X_t is lost.
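The effect of these three bandwidth choices is easy to reproduce. The sketch below is our own illustration, not code from the text: it simulates Y_t = Sin(X_t) + ε_t, applies a Nadaraya-Watson estimator with a Gaussian kernel, and reports the fit for h = 0.1σ_x, 0.3σ_x, and 0.5σ_x; the sample size, noise level, and evaluation grid are arbitrary choices.

```python
import numpy as np

def nadaraya_watson(x_grid, X, Y, h):
    """Nadaraya-Watson estimator with a Gaussian kernel."""
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2)          # kernel weights (constants cancel in the ratio)
    return (K @ Y) / K.sum(axis=1)     # local weighted average of the Y_t's

rng = np.random.default_rng(0)
T = 500
X = np.sort(rng.uniform(0.0, 2.0 * np.pi, T))
Y = np.sin(X) + 0.5 * rng.standard_normal(T)   # Y_t = Sin(X_t) + noise

grid = np.linspace(0.0, 2.0 * np.pi, 200)
for c in (0.1, 0.3, 0.5):                      # bandwidths in units of sigma_x
    h = c * X.std()
    m_hat = nadaraya_watson(grid, X, Y, h)
    err = np.abs(m_hat - np.sin(grid)).mean()
    print(f"h = {c:.1f} sigma_x: mean absolute error = {err:.3f}")
```

Too small a bandwidth tracks the noise; too large a bandwidth flattens the sine wave, exactly as in Figures 12.5a to 12.5c.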

12.3.2 Optimal Bandwidth Selection

It is apparent from the example in Section 12.3.1 that choosing the proper bandwidth is critical in any application of kernel regression. There are several methods for selecting an optimal bandwidth; the most common of these is the method of cross-validation, popular because of its robustness and asymptotic optimality (see Hardle [1990, Chapter 5] for further details). In this approach, the bandwidth is chosen to minimize a weighted-average squared error of the kernel estimator. In particular, for a sample of T observations {(X_t, Y_t)}, let

\hat{m}_{h,j}(X_j) = \frac{1}{T} \sum_{t \neq j} \omega_{t,h}(X_j)\, Y_t    (12.3.12)

which is simply the kernel estimator based on the dataset with observation j deleted, evaluated at the jth observation X_j. Then the cross-validation function CV(h) is defined as

\mathrm{CV}(h) = \frac{1}{T} \sum_{j=1}^{T} \left[ Y_j - \hat{m}_{h,j}(X_j) \right]^2 \delta(X_j)    (12.3.13)

where δ(X_j) is a nonnegative weight function that is required to reduce boundary effects (see Hardle [1990, p. 102] for further discussion). The

[Figure 12.5. Kernel estimator: (a) h = 0.1σ_x; (b) h = 0.3σ_x; (c) h = 0.5σ_x]




function CV(h) is called the cross-validation function because it validates the success of the kernel estimator in fitting Y_t across the T subsamples, each with one observation omitted. The optimal bandwidth is the one that minimizes this function.
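Cross-validation is straightforward to implement. The following sketch is illustrative rather than the authors' procedure: it uses a Gaussian kernel, takes δ(·) to be an indicator that drops the extreme observations (one simple way to reduce boundary effects), and searches over an arbitrary candidate grid of bandwidths.

```python
import numpy as np

def cv_bandwidth(X, Y, candidates, trim=0.05):
    """Choose h minimizing the cross-validation function CV(h)."""
    lo, hi = np.quantile(X, [trim, 1.0 - trim])
    delta = (X >= lo) & (X <= hi)         # weight function delta(X_j)
    best_h, best_cv = None, np.inf
    for h in candidates:
        u = (X[:, None] - X[None, :]) / h
        K = np.exp(-0.5 * u ** 2)
        np.fill_diagonal(K, 0.0)          # delete observation j: leave-one-out
        m_loo = (K @ Y) / K.sum(axis=1)   # kernel fit at X_j without observation j
        cv = np.mean(((Y - m_loo) ** 2)[delta])
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h, best_cv

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 2.0 * np.pi, 400)
Y = np.sin(X) + 0.5 * rng.standard_normal(400)
sigma_x = X.std()
h_star, cv_star = cv_bandwidth(X, Y, [c * sigma_x for c in (0.05, 0.1, 0.2, 0.3, 0.5)])
print(h_star / sigma_x, cv_star)
```

Since the simulated noise has variance 0.25, the minimized CV(h) settles near that noise floor: cross-validation cannot (and should not) fit the noise away.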

12.3.3 Average Derivative Estimators

For many financial applications, we wish to relate Y_t to several variables X_{1t}, ..., X_{kt} nonparametrically. For example, we may wish to model the expected returns of stocks and bonds as a nonlinear function of several factors: the market return, interest rate spreads, dividend yield, etc. (see Lo and MacKinlay [1996]). Such a task is considerably more ambitious than the univariate example of Section 12.3.1. To see why, consider the case of five independent variables and, without loss of generality, let these five variables all take on values in the interval [0, 1]. Even if we divide the domain of each variable into only ten equally spaced pieces, this would yield a total of 10^5 = 100,000 neighborhoods, each of width 0.10; hence we would need at least 100,000 observations to ensure an average of just one data point per neighborhood! This curse of dimensionality can only be solved by placing restrictions on the kinds of nonlinearities that are allowable.
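The back-of-the-envelope count above generalizes in the obvious way; the short sketch below (ours, purely illustrative) tabulates how fast the number of neighborhoods grows with the number of regressors k:

```python
bins = 10                          # ten equally spaced pieces per variable
for k in (1, 2, 5):
    cells = bins ** k              # number of k-dimensional neighborhoods
    print(f"k = {k}: {cells:,} neighborhoods, so at least {cells:,} "
          "observations for one data point per neighborhood on average")
```

With k = 5 the count is already 100,000, which is why unrestricted multivariate kernel regression is rarely feasible with financial sample sizes.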

For example, suppose a linear combination of the X_{it}'s is related to Y_t nonparametrically. This has the advantage of capturing important nonlinearities while providing sufficient structure to permit estimation with reasonable sample sizes. Specifically, consider the following multivariate nonlinear model:

Y_t = m(X_t'\beta) + \epsilon_t, \qquad \mathrm{E}[\epsilon_t \mid X_t] = 0    (12.3.14)

where X_t ≡ [X_{1t} ⋯ X_{kt}]' is now a (k × 1) vector and m(·) is some arbitrary but fixed nonlinear function. The function m(·) may be estimated by the following two-step procedure: (1) estimate β with an average derivative estimator β̂; and (2) estimate m(·) with a kernel regression of Y_t on X_t'β̂.

Stoker (1986) observes that the coefficients β of (12.3.14) may be estimated up to a scale factor by ordinary least squares if either of the following two conditions is true: (1) the X_t's are multivariate normal vectors; or, more generally, (2) E[X_{it} | X_t'β] is linear in X_t'β for i = 1, ..., k.13 If neither of these conditions holds, Stoker (1986) proposes an ingenious estimator, the average derivative estimator, which can estimate β consistently (see also Stoker [1992]).

13This second condition is satisfied by multivariate normal X_t's but is also satisfied for non-normal elliptically symmetric distributions. See Chamberlain (1983), Chung and Goldberger (1984), Deaton and Irish (1984), and Ruud (1983).


Average derivative estimators are based on the fact that the expectation of the derivative of m(·) with respect to the X_t's is proportional to β:

\mathrm{E}\!\left[ \frac{\partial m(X_t'\beta)}{\partial X_t} \right] = \mathrm{E}\left[ m'(X_t'\beta) \right] \beta \;\propto\; \beta.    (12.3.15)

Therefore, an estimator of the average derivative is equivalent to an estimator of β up to a scale factor, and this scale factor is irrelevant for our purposes since it may be subsumed by m(·) and consistently estimated by kernel regression.
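The proportionality in (12.3.15) can be checked by simulation. In the sketch below (our own, with an arbitrary smooth link m(u) = tanh(u) chosen purely for illustration), the sample average of the gradient ∂m(X_t'β)/∂X_t = m'(X_t'β)β is computed directly and compared with β component by component:

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, -2.0, 0.5])

def m_prime(u):
    """Derivative of the illustrative link m(u) = tanh(u)."""
    return 1.0 - np.tanh(u) ** 2

X = rng.standard_normal((100_000, 3))
# gradient of m(X'beta) w.r.t. X is m'(X'beta) * beta, so its average
# is E[m'(X'beta)] * beta -- a scalar multiple of beta
grad = m_prime(X @ beta)[:, None] * beta[None, :]
avg_deriv = grad.mean(axis=0)
print(avg_deriv / beta)   # the same constant in every component
</```

The componentwise ratio avg_deriv / beta is a single positive constant, E[m'(X_t'β)], which is exactly the irrelevant scale factor mentioned above.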

There are several average derivative estimators available: the direct, indirect, and slope estimators. Stoker (1991, Theorem 1) shows that they are all asymptotically equivalent; however, Stoker (1992, Chapter 3) favors the indirect slope estimator (ISE) for two reasons. First, if the relation between Y_t and X_t is truly linear, the indirect slope estimator is still unbiased whereas the others are not. Second, the indirect slope estimator requires less precision from its nonparametric component estimators because of the ISE's ratio form (see below).

Heuristically, the indirect slope estimator β̂_ISE exploits the fact that the unknown parameter vector β is proportional to the covariance between the dependent variable Y and the negative of the derivative of the logarithm of the marginal density of the independent variables X, denoted by l(·). Therefore, by estimating Cov[Y, l(X)], we obtain a consistent estimator of β up to scale. This covariance may be estimated by computing the sample covariance between Y and the sample counterpart to l(·).

More formally, β̂_ISE may be viewed as an instrumental variables (IV) estimator (see Section A.1 of the Appendix) of the regression of Y_t on X_t, with the instrument matrix Ĥ:

\hat{\beta}_{\mathrm{ISE}} = (\hat{\mathbf{H}}'\mathbf{X})^{-1} \hat{\mathbf{H}}'\mathbf{Y},    (12.3.16)

where \mathbf{Y} \equiv [Y_1 \; \cdots \; Y_T]',

\mathbf{X} \equiv \begin{bmatrix} 1 & X_1' \\ \vdots & \vdots \\ 1 & X_T' \end{bmatrix}, \qquad \hat{\mathbf{H}} \equiv \begin{bmatrix} 1 & \hat{I}_b(X_1)\,\hat{l}(X_1)' \\ \vdots & \vdots \\ 1 & \hat{I}_b(X_T)\,\hat{l}(X_T)' \end{bmatrix}    (12.3.17)

l̂(·) is an estimator of the negative of the derivative of the log of the marginal density of X, and Î_b(x) is an indicator function that trims a portion of the sample with estimated marginal densities lower than a fixed constant b:

\hat{I}_b(\mathbf{x}) \equiv \mathbf{1}\!\left[ \hat{f}(\mathbf{x}) > b \right].    (12.3.18)



In most empirical applications, the constant b is set so that between 1% and 5% of the sample is trimmed.

To obtain l̂(·), observe that if f(x) denotes the marginal density of X, then the Gaussian kernel estimator of f(x) is given by14

\hat{f}(\mathbf{x}) = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{h^k}\, K\!\left( \frac{\mathbf{x} - \mathbf{X}_t}{h} \right),    (12.3.19)

where

K(\mathbf{u}) = (2\pi)^{-k/2} \exp\!\left( -\tfrac{1}{2}\, \mathbf{u}'\mathbf{u} \right).    (12.3.20)

Therefore, we have

\frac{\partial \hat{f}(\mathbf{x})}{\partial \mathbf{x}} = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{h^{k+1}}\, K'\!\left( \frac{\mathbf{x} - \mathbf{X}_t}{h} \right),    (12.3.21)

K'(\mathbf{u}) = -(2\pi)^{-k/2} \exp\!\left( -\tfrac{1}{2}\, \mathbf{u}'\mathbf{u} \right) \mathbf{u} = -K(\mathbf{u})\, \mathbf{u},    (12.3.22)

and we can define l̂(x) to be

\hat{l}(\mathbf{x}) = -\, \frac{\partial \hat{f}(\mathbf{x}) / \partial \mathbf{x}}{\hat{f}(\mathbf{x})}.    (12.3.24)

Despite the multivariate nature of l̂(·), observe that there is still only a single bandwidth to adjust in the kernel estimator (12.3.19). As in the univariate case, the bandwidth controls the degree of local averaging, but now over multidimensional neighborhoods. As a practical matter, the numerical properties of this local averaging procedure may be improved by normalizing all the X_{it}'s by their own standard deviations before computing l̂(·), and then multiplying each of the β̂_i's by the standard deviation of the corresponding X_{it} to undo the normalization.
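Putting (12.3.16)-(12.3.19) together, a minimal sketch of the indirect slope estimator might look as follows. This is our own illustration, not the implementation used in the text: the Gaussian kernel, the 5% trimming rule, the fixed bandwidth, and the simulated single-index data are all assumptions made for the example.

```python
import numpy as np

def ise_estimator(X, Y, h, trim_frac=0.05):
    """Sketch of the indirect slope estimator: IV regression of Y on [1 X]
    with instrument rows [1  I_b(X_t) lhat(X_t)']."""
    T, k = X.shape
    diff = (X[:, None, :] - X[None, :, :]) / h        # (x_i - X_t) / h
    K = np.exp(-0.5 * (diff ** 2).sum(axis=2))        # Gaussian kernel values
    const = (2.0 * np.pi) ** (-k / 2) / (T * h ** k)
    f_hat = const * K.sum(axis=1)                     # density estimate at each X_t
    # gradient of f_hat: each kernel term contributes K * (-(x - X_t) / h^2)
    grad_f = const * (K[:, :, None] * (-diff / h)).sum(axis=1)
    l_hat = -grad_f / f_hat[:, None]                  # -d log f / dx
    b = np.quantile(f_hat, trim_frac)                 # trim lowest-density 5%
    I_b = (f_hat > b).astype(float)
    ones = np.ones((T, 1))
    Xmat = np.hstack([ones, X])
    H = np.hstack([ones, I_b[:, None] * l_hat])
    return np.linalg.solve(H.T @ Xmat, H.T @ Y)       # (H'X)^{-1} H'Y

rng = np.random.default_rng(3)
T, k = 1000, 2
beta = np.array([1.0, -1.0])
X = rng.standard_normal((T, k))
Y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(T)
b_hat = ise_estimator(X, Y, h=0.5)
print(b_hat[1:] / b_hat[1])   # slope ratios, identified up to scale
```

Since β is identified only up to scale, the natural check is the ratio of the slope coefficients, which should be close to β_2/β_1 = -1 for these simulated data.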

14Note that the bandwidth h implicit in f̂(x) is, in general, different from the bandwidth of the nonparametric estimator of m(·) in (12.3.14). Cross-validation techniques may be used to select both; however, this may be computationally too demanding, and simple rules-of-thumb may suffice.

12.3.4 Application: Estimating State-Price Densities

One of the most important theoretical advances in the economics of investment under uncertainty is the time-state preference model of Arrow (1964) and Debreu (1959), in which they introduce primitive securities, each paying $1 in one specific state of nature and nothing otherwise. Now known as Arrow-Debreu securities, they are the fundamental building blocks from which we have derived much of our current understanding of economic equilibrium in an uncertain environment.

In practice, since true Arrow-Debreu securities are not yet traded on any organized exchange, Arrow-Debreu prices are not observable.15 However, using nonparametric techniques (specifically, multivariate kernel regression), Ait-Sahalia and Lo (1996) develop estimators for such prices, known as a state-price density (SPD) in the continuous-state case. The SPD contains a wealth of information concerning the pricing (and hedging) of risky assets in an economy. In principle, it can be used to price other assets, even assets that are currently not traded (see Ait-Sahalia and Lo [1995] for examples).16

More importantly, SPDs contain much information about preferences and asset price dynamics. For example, if parametric restrictions are imposed on the data-generating process of asset prices, the SPD estimator may be used to infer the preferences of the representative agent in an equilibrium model of asset prices (see, for example, Bick [1990] and He and Leland [1993]). Alternatively, if specific preferences are imposed, the SPD estimator may be used to infer the data-generating process of asset prices (see, for example, Derman and Kani [1994], Dupire [1994], Jackwerth and Rubinstein [1995], Longstaff [1992, 1994], Rady [1994], Rubinstein [1985], and Shimko [1991, 1993]). Indeed, Rubinstein (1985) has observed that any two of the following implies the third: (1) the representative agent's preferences; (2) asset price dynamics; and (3) the SPD.

Definition of the State-Price Density

To define the SPD formally, consider a standard dynamic exchange economy (see Chapter 8) in which the equilibrium price P_t of a security at date t, with a single liquidating payoff Y(C_T) at date T that is a function of aggregate consumption C_T, is given by:

P_t = \mathrm{E}_t\!\left[ Y(C_T)\, M_{t,T} \right], \qquad M_{t,T} \equiv \ldots    (12.3.25)
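To make (12.3.25) concrete, the sketch below prices a call-like payoff Y(C_T) = max(C_T - 1, 0) by Monte Carlo under an assumed CRRA pricing kernel M_{t,T} = delta^(T-t) (C_T/C_t)^(-gamma) and lognormal consumption growth. Every functional form and parameter here is a hypothetical choice for illustration; nothing below is implied by the nonparametric treatment in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
delta, gamma = 0.99, 2.0          # hypothetical discount factor and risk aversion
horizon = 1.0                     # T - t, in years
C_t = 1.0                         # current aggregate consumption (normalized)

g = rng.normal(0.02, 0.05, 100_000)              # assumed log consumption growth
C_T = C_t * np.exp(g)
M = delta ** horizon * (C_T / C_t) ** (-gamma)   # CRRA pricing kernel M_{t,T}

payoff = np.maximum(C_T - 1.0, 0.0)   # call-like liquidating payoff Y(C_T)
P_t = np.mean(payoff * M)             # Monte Carlo version of (12.3.25)
print(P_t)
```

The design choice worth noting is that the pricing kernel appears inside the expectation: states with low consumption growth carry a high M_{t,T}, so payoffs concentrated in bad states command higher prices than their expected value alone would suggest.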

15This may soon change with the advent of supershares, first proposed by Garman (1976) and Hakansson (1976, 1977) and currently under development by Leland O'Brien Rubinstein Associates, Inc. See Mason, Merton, Perold, and Tufano (1995) for further details.

16Of course, markets must be dynamically complete for such prices to be meaningful; see, for example, Constantinides (1982). This assumption is almost always adopted, either explicitly or implicitly, in parametric derivative-pricing models, and we adopt it as well.


