A Systematic Characterization of Application Sensitivity to Network Performance



[Figure 2.3 plot: average time per message (µsec/message) versus burst size, one curve per fixed delay ∆ (µsec). Annotated values: Osend = 1.8 µsec, Oreceive = 4 µsec, g = 12.8 µsec, ∆ = 10 µsec, round-trip time = 21 µsec.]
Figure 2.3: Calibration of LogGP Parameters
The LogP signature is visible as the isobaric plot of burst size vs. fixed computational delay, $\Delta$. This signature was a calibration made when the desired $g$ was 14 $\mu$s. The send overhead, receive overhead and gap can be read from the signature. Overhead is modeled as the average of the send and receive overhead. Latency is computed as $L = \frac{1}{2}(\text{round trip time}) - 2o$.
Such a calibration can be obtained by running a set of Active Message micro-benchmarks, described in [31]. The basic technique is to measure the time to issue a sequence of messages with a fixed computational delay, $\Delta$, between messages. The clock stops when the last message is issued by the processor, regardless of how many requests or responses are in flight. Plotting the average message cost as a function of the sequence size (burst size) and added delay generates a LogP signature, such as that shown in Figure 2.3. Each curve in the figure shows the average initiation interval seen by the processor as a function of the number of messages in the burst for a fixed $\Delta$. For a short sequence, this shows the send overhead. Long sequences approach the steady-state initiation interval, $g$. For sufficiently large $\Delta$ the bottleneck is the processor, so the steady-state interval is the send overhead plus the receive overhead plus $\Delta$. Finally, subtracting the two overheads from half the round-trip time gives $L$.
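As a rough illustration of the arithmetic just described, the following Python sketch derives $o$ and $L$ from values read off a signature and models the steady-state interval. The function names are hypothetical, and taking the maximum of $g$ and the processor-bound interval is an assumption that joins the two regimes described in the text. The inputs are the annotations from Figure 2.3.

```python
def logp_from_signature(o_send, o_recv, round_trip):
    """Derive (o, L) in microseconds from values read off a LogP signature."""
    # Overhead is modeled as the average of the send and receive overheads.
    o = (o_send + o_recv) / 2.0
    # Latency is half the round-trip time minus the two overheads (2o).
    latency = round_trip / 2.0 - 2.0 * o
    return o, latency


def steady_state_interval(g, o_send, o_recv, delta):
    """Steady-state initiation interval for a fixed computational delay delta.

    Assumption: the interval is bounded below by the gap g, and becomes
    processor-bound (o_send + o_recv + delta) once that sum exceeds g.
    """
    return max(g, o_send + o_recv + delta)


# Annotations from Figure 2.3, all in microseconds.
o, latency = logp_from_signature(o_send=1.8, o_recv=4.0, round_trip=21.0)
print(f"o = {o:.1f} us, L = {latency:.1f} us")
for delta in (0.0, 10.0):
    interval = steady_state_interval(12.8, 1.8, 4.0, delta)
    print(f"delta = {delta:.0f} us -> steady-state interval = {interval:.1f} us")
```

With no added delay the sketch reports the gap itself, and with a delay of 10 µs the processor-bound term dominates.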
Table 2.2 describes the result of this calibration process for three of the four communication characteristics. For each parameter, the table shows the desired and calibrated setting for that parameter. For much of the tunable range for overhead, the calibrated value is within 1% of the desired value. Observe that as $o$ is increased, the effective gap increases because the processor becomes the bottleneck, consistent with the LogGP model. As desired, the value of $L$ is independent of $o$. The calibrated $g$ is somewhat lower than intended, and varying $g$ has little effect on $L$ and no effect on $o$. Increasing $L$ has little effect on $o$. A notable effect of our implementation is that for large values of $L$, the effective $g$ rises. Because the implementation has a fixed number of outstanding messages independent of $L$, due in part to the deadlock avoidance algorithm, when $L$ becomes very large the implementation is unable to form an efficient network pipeline. In effect, the capacity constraint of our system is constant, instead of varying with $L$ and $g$ as the LogGP model would predict.

Desired      Observed        |Desired      Observed        |Desired      Observed
      o      o      g      L |      g      g      o      L |      L      L      o      g
    2.9    2.9    5.8    5.0 |    5.8    5.8    2.9    5.0 |    5.0    5.0    2.9    5.8
    4.9    5.1   10.1    5.0 |    8.0    7.5    2.9    5.1 |    7.5    8.1    2.9    6.3
    7.9    8.1   16.0    4.7 |     10    9.6    2.9    5.5 |     10   10.3    2.9    6.4
   12.9   13.0   26.0    5.0 |     15     14    3.0    5.5 |     15   15.5    2.9    7.0
   22.9   23.1   46.0    4.9 |     30     29    3.0    5.5 |     30   30.4    2.9    9.6
   52.9   52.9  106.0    5.4 |     55     52    2.9    5.5 |     55   55.9    3.0   15.5
   77.9   76.5  151.0    5.3 |     80     76    2.9    5.5 |     80   80.4    2.9   21.6
  102.9  103.0  205.9    6.0 |    105     99    3.0    5.5 |    105  105.5    3.0   27.7

Table 2.2: Calibration Summary
This table demonstrates the calibration of desired LogP parameter values versus measured values. All values are in µs. The table also shows that the LogP parameters can mostly be varied independently of one another. As predicted by the LogP model, when $o > g$, $g$ is no longer observable as a distinct parameter; it degenerates to $o$. Note that in the steady state, the data for $g$ includes the time to send and receive a message. The increase in $g$ at high $L$ is due to our system's fixed capacity of 4 outstanding messages between processor pairs. It differs from the LogP capacity model, which specifies that up to $\lceil L/g \rceil$ messages can be in flight to a given processor at a time.
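To make the contrast between the two capacity constraints concrete, here is a minimal Python sketch. It compares the LogP capacity $\lceil L/g \rceil$ with a fixed window of 4 outstanding messages. The function names and the `credit_return_time` parameter (the time after which an outstanding slot is freed) are hypothetical; the bound used is a generic window flow-control argument, not a model taken from the text.

```python
import math


def logp_capacity(latency, gap):
    """Messages that may be in flight to one processor under LogP: ceil(L/g)."""
    return math.ceil(latency / gap)


def window_limited_gap(gap, credit_return_time, window=4):
    """Effective gap when at most `window` messages may be outstanding.

    Assumption: each slot is freed `credit_return_time` microseconds after
    its message is issued, so at most `window` messages can be issued per
    `credit_return_time`, whatever the nominal gap is.
    """
    return max(gap, credit_return_time / window)


baseline_gap = 5.8  # microseconds, the baseline g in Table 2.2
for latency in (5.0, 30.0, 105.0):
    print(f"L = {latency:5.1f} us -> LogP capacity = "
          f"{logp_capacity(latency, baseline_gap)} messages")

# Illustrative, made-up credit return times in microseconds.
for credit in (10.0, 100.0):
    print(f"credit return = {credit:5.1f} us -> effective gap = "
          f"{window_limited_gap(baseline_gap, credit):.1f} us")
```

Under LogP the permitted number of in-flight messages grows with $L$, whereas a fixed window forces the effective gap up once the slot turnaround time exceeds `window` times the nominal gap.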
To calibrate $G$, we use a similar methodology, but instead send a burst of bulk messages, each with a fixed size. The delay inside the LANai bulk handling loop was set to a specific number of microseconds per 100 bytes of data. From the initiation interval and message size we derive the calibrated bandwidth. We increase the bulk message size until we no longer observe an increase in bandwidth, which happens at a 2 KB message size. Figure 2.4 shows a linear relationship between the added delay in the LANai code and the observed $G$. The linear relationship shows that the apparatus can deliver a range of $G$ quite accurately. The small dip at the lower left shows that as we add a linear delay, a sublinear increase in $G$ occurs. We can conclude from this probe that the baseline firmware is not rate-limited; instead, the system is overhead-limited.
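The bandwidth derivation above can be sketched in a few lines of Python. The function names, the 1% tolerance, and the sample measurements are all hypothetical and serve only to show the arithmetic; they are not data from the calibration.

```python
def bandwidth_mb_per_s(message_bytes, interval_us):
    """Bandwidth from message size and initiation interval.

    Bytes per microsecond equals 10^6 bytes per second, i.e. MB/s.
    """
    return message_bytes / interval_us


def gap_per_byte(message_bytes, interval_us):
    """G in microseconds per byte: the reciprocal of the bandwidth."""
    return interval_us / message_bytes


def saturation_size(samples, tolerance=0.01):
    """Smallest message size after which bandwidth stops increasing.

    `samples` is a list of (message_bytes, interval_us) sorted by size.
    Growth of no more than `tolerance` (relative) counts as no increase.
    """
    for (size, interval), (next_size, next_interval) in zip(samples, samples[1:]):
        current = bandwidth_mb_per_s(size, interval)
        following = bandwidth_mb_per_s(next_size, next_interval)
        if following <= current * (1.0 + tolerance):
            return size
    return samples[-1][0]


# Made-up measurements: (message size in bytes, initiation interval in us).
samples = [(256, 16.0), (512, 24.0), (1024, 40.0), (2048, 72.0), (4096, 144.0)]
for size, interval in samples:
    print(f"{size:5d} bytes: {bandwidth_mb_per_s(size, interval):6.2f} MB/s, "
          f"G = {gap_per_byte(size, interval):.4f} us/byte")
print("bandwidth saturates at", saturation_size(samples), "bytes")
```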
