[Plot: Bulk Gap Calibration for the TCP/IP apparatus; x-axis: Bulk Gap (μs/byte), 0 to 1; y-axis: Delay (μs); series: TCP/IP, Parallel]
Figure 2.7: Calibration of bulk Gap for TCP/IP-GAM apparatus
This figure shows the empirical calibration of bulk Gap for the TCP/IP apparatus. The dependent
variable is the added delay in μs per 100 bytes of packet size. The independent variable is the
Gap expressed in μs per byte (1/bandwidth). The figure shows that the TCP/IP-GAM apparatus
adjusts bulk Gap quite accurately. The calibration of the basic parallel program apparatus is shown
as well, demonstrating that the two systems are nearly equivalent in their Gap adjustment.
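The accuracy claim can be made quantitative by fitting a line to the calibration points and comparing its slope with the ideal one: if Gap is set to G μs/byte, a 100-byte packet should incur 100·G μs of added delay. The sketch below is a minimal least-squares check under that assumption. The helper names (fit_line, calibration_error) are hypothetical, and any data passed to them here is synthetic, not the measurements behind Figure 2.7.

```python
# Hypothetical helpers: compare a bulk-Gap calibration curve against the
# ideal line delay = 100 * G (delay in usec per 100 bytes, G in usec/byte).
# Inputs are illustrative; they are not the thesis measurements.

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def calibration_error(gap_us_per_byte, delay_us, packet_bytes=100):
    """Return (intercept, slope, relative slope error vs. the ideal line).

    Ideal: each 1 usec/byte of Gap adds `packet_bytes` usec of delay
    to a packet of `packet_bytes` bytes, so the ideal slope is packet_bytes.
    """
    intercept, slope = fit_line(gap_us_per_byte, delay_us)
    ideal = float(packet_bytes)
    return intercept, slope, (slope - ideal) / ideal
```

For example, calibration_error([0.0, 0.5, 1.0], [2.0, 47.0, 92.0]) fits an intercept of 2 μs and a slope of 90, a relative slope error of -10%; a perfectly calibrated apparatus would give a slope of 100 and an error near zero.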
Calibration
Because the LCP of the TCP/IP-GAM apparatus is taken from the parallel programming
apparatus, we know that the gap g and a network-limited latency L are identical, so we do not
measure those parameters again. However, unlike in the parallel programming apparatus, the
overhead o is substantially different. As can be seen from Figure 2.6, many software components
are involved in sending a message; the result is a large software overhead.
Since we know L, we can compute o from a simple round-trip time. Measurements show
a mean RTT of 340 μs. A round trip consists of two one-way messages, each paying a send
overhead, the latency, and a receive overhead, so RTT = 2(2o + L) = 4o + 2L. With L at 5 μs
we can deduce o = (340 - 2·5)/4, roughly 82 μs. Unlike in user space, calibrating the delay loop
inside the kernel can be tricky, because many floating-point math routines are not supported in
the kernel. Fortunately, the Solaris Device Driver Interface (DDI) contains a time-calibrated spin
loop, drv_usecwait. It was intended for short waits on slow devices, but it serves quite well as a
calibrated spin loop.
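The arithmetic behind the estimate of o can be written down directly. This is a minimal sketch, assuming the LogP cost model in which a one-way small message costs 2o + L (send overhead, latency, receive overhead) and send and receive overheads are equal, so a ping-pong round trip costs 4o + 2L. The function names are hypothetical.

```python
from statistics import mean


def overhead_from_rtt(rtt_us, latency_us):
    """Per-message overhead o, assuming RTT = 2 * (2*o + L) = 4*o + 2*L."""
    if rtt_us < 2 * latency_us:
        raise ValueError("RTT cannot be smaller than 2*L")
    return (rtt_us - 2 * latency_us) / 4.0


def overhead_from_samples(rtt_samples_us, latency_us):
    """Same estimate, using the mean of several measured round trips."""
    return overhead_from_rtt(mean(rtt_samples_us), latency_us)


print(overhead_from_rtt(340.0, 5.0))  # prints 82.5
```

With the measured mean RTT of 340 μs and L = 5 μs this yields 82.5 μs, matching the "roughly 82 μs" figure in the text.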
We must re-calibrate G, because the increase in o may affect the range over which changes
to the LCP bulk-data handling loop affect G. We use the same methodology as for the parallel
apparatus.
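An observed per-byte cost for a bulk transfer can be measured by timing one large block sent through fixed-size writes, the pattern the Gap experiment uses (a single 10 MB block written 8 KB at a time). The sketch below is only a local stand-in: it uses a socketpair within one process rather than a TCP connection through the TCP/IP-GAM apparatus, and measure_bulk_gap is a hypothetical name. It reports elapsed time divided by bytes, in μs/byte, which folds any per-write overhead into the per-byte figure.

```python
import socket
import threading
import time

TOTAL_BYTES = 10 * 1024 * 1024  # 10 MB block (binary megabytes assumed)
WRITE_SIZE = 8 * 1024           # 8 KB per write call


def _drain(sock, expected, result):
    """Receiver thread: read until `expected` bytes arrive (or EOF)."""
    received = 0
    while received < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    result["received"] = received


def measure_bulk_gap(total=TOTAL_BYTES, write_size=WRITE_SIZE):
    """Send `total` bytes in `write_size` writes over a local socketpair.

    Returns (observed microseconds per byte, bytes received). The local
    socketpair is a stand-in for the real TCP connection under test.
    """
    sender, receiver = socket.socketpair()
    result = {}
    reader = threading.Thread(target=_drain, args=(receiver, total, result))
    reader.start()
    payload = bytes(write_size)  # zero-filled buffer reused for every write
    try:
        start = time.perf_counter()
        sent = 0
        while sent < total:
            n = min(write_size, total - sent)
            sender.sendall(payload[:n])  # one "write call" per chunk
            sent += n
        reader.join()  # stop the clock only once every byte has arrived
        elapsed_us = (time.perf_counter() - start) * 1e6
    finally:
        sender.close()
        receiver.close()
    return elapsed_us / total, result["received"]
```

The returned μs/byte value is machine-dependent. Splitting the 10 MB block into 8 KB writes gives 1280 write calls, so the fixed cost of each call is amortized over 8 KB of data.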
We cannot, however, control the fragment sizes used by the kernel. Our Gap experiment therefore sends
a single large block of data (10 MB), with each write call sending 8 KB at a time. The observed