A Systematic Characterization of Application Sensitivity to Network Performance

may correspond directly to $o$, or, if a round trip is required, may include $L$ as well.
The classic experiment to measure message time is realized in the benchmark [35, 36]. This code performs a ping-pong of messages between two nodes. One node sends a message of size $n$ bytes. After the entire message is received, the second node responds with a message of size $n$ bytes. The total time for the test approximates twice the one-way message time. The time reported is half of the time to send the message and receive the response.
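The structure of the ping-pong test can be sketched as follows. This is a minimal illustration, not the benchmark cited above. It swaps MPI for a local socket pair and a thread so that it runs on one machine, and the function names (`recv_exact`, `echo_peer`, `ping_pong_one_way_seconds`) are hypothetical.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Block until exactly nbytes have arrived, i.e. the entire message."""
    chunks = []
    remaining = nbytes
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the message completed")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_peer(sock, nbytes, rounds):
    """Second node: wait for the whole message, then reply with the same size."""
    for _ in range(rounds):
        recv_exact(sock, nbytes)
        sock.sendall(bytes(nbytes))


def ping_pong_one_way_seconds(nbytes, rounds=100):
    """Return half of the mean round-trip time for nbytes-sized messages."""
    first, second = socket.socketpair()  # assumption: local sockets stand in for MPI
    peer = threading.Thread(target=echo_peer, args=(second, nbytes, rounds))
    peer.start()
    payload = bytes(nbytes)
    start = time.perf_counter()
    for _ in range(rounds):
        first.sendall(payload)
        recv_exact(first, nbytes)
    elapsed = time.perf_counter() - start
    peer.join()
    first.close()
    second.close()
    return elapsed / rounds / 2
```

Calling `ping_pong_one_way_seconds(n)` for a range of sizes gives the kind of time-versus-size curve that the rest of this section fits with a linear model. The absolute numbers from a local socket pair say nothing about MPI-GAM itself.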
Figure 2.5 shows the results of the experiment for the MPI-GAM system for increasing message size $n$. The first thing to note is that there is a sharp inflection point at 4 KB. The slope of the line changes suddenly at 4 KB because of the change in the way MPI-GAM maps MPI messages to the GAM system. The figure clearly shows the tradeoff in increased start-up cost vs. delivered bandwidth. If we break up the line into two regions and compute a least-squares fit for each, we see that the fit is quite good for each of the two regions.
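The two-region fit can be sketched as below, assuming a linear model of the form time = start-up cost + size × per-byte cost in each region. The data points are synthetic placeholders, not the measurements in Figure 2.5. Placing a message of exactly 4096 bytes in the upper region is also an assumption.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_two_regions(sizes, times, breakpoint=4096):
    """Fit one line below the breakpoint and another at or above it."""
    small = [(n, t) for n, t in zip(sizes, times) if n < breakpoint]
    large = [(n, t) for n, t in zip(sizes, times) if n >= breakpoint]
    return least_squares_line(*zip(*small)), least_squares_line(*zip(*large))


# Synthetic example (microseconds): a cheaper start-up but a steeper slope
# below 4 KB, and a costlier start-up with a flatter slope above it.
sizes = [256, 512, 1024, 2048, 4096, 8192, 16384, 32768]
times = [20 + 0.05 * n if n < 4096 else 60 + 0.03 * n for n in sizes]
(small_start, small_slope), (large_start, large_slope) = fit_two_regions(sizes, times)
```

The intercept of each line plays the role of the start-up cost, and the reciprocal of its slope plays the role of the delivered bandwidth for that region.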
LogGP to MPI Model
Given that we know the basic MPI performance and protocol, we are now in a position
to model the effect of inﬂation of the LogGP parameters on MPI performance. We characterize the
change in parameters on the linear model described in the previous section.
Inflation of overhead will impact the system in three ways. The first and most obvious way is on the start-up cost. For messages under 4 KB, the MPID to GAM protocol only uses one message, so we simply add $o$ to the start-up cost. For messages over 4 KB, the protocol uses a round trip, and so in our model we inflate the start-up cost by a multiple of $o$. The third way overhead impacts the system is for long messages. We add $o$ to the cost of each 4 KB fragment, thus reducing the effective bandwidth. Adding latency to the system primarily impacts the start-up term. We model an increase in $L$ as adding a multiple of $L$ to the start-up cost for messages over 4 KB and ignore effects for messages under 4 KB. The gap is perhaps the most difficult to model. Because the MPI-GAM system uses few small messages, we chose to ignore added $g$ entirely. We shall see that this is not all that poor an assumption. Indeed, one of the NPB can ignore added $g$ entirely. The Gap is perhaps easiest to cast into the MPI linear framework. Changes in Gap correspond directly to changes in the bandwidth term, since $1/G$ is the per-byte bandwidth for long messages.
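The mapping from inflated LogGP parameters to the linear MPI model can be sketched as below. This is an illustration under stated assumptions, not the thesis's exact model. The names `LinearModel` and `inflate` are hypothetical. The round-trip multipliers for overhead and latency are left as required arguments rather than hard-coded. Applying added Gap to both message regimes is also an assumption.

```python
from dataclasses import dataclass

FRAGMENT_BYTES = 4096  # protocol switch point and fragment size named in the text


@dataclass
class LinearModel:
    startup_us: float    # start-up cost, microseconds
    us_per_byte: float   # reciprocal of delivered bandwidth

    def time_us(self, nbytes):
        return self.startup_us + nbytes * self.us_per_byte


def inflate(model, *, long_message, round_trip_o_factor, round_trip_L_factor,
            added_o=0.0, added_L=0.0, added_G=0.0):
    """Return a new LinearModel reflecting added overhead, latency, and Gap."""
    startup = model.startup_us
    per_byte = model.us_per_byte + added_G       # assumption: Gap affects both regimes
    if long_message:
        # Round-trip protocol: start-up grows by multiples of o and L
        # (the multipliers are caller-supplied assumptions).
        startup += round_trip_o_factor * added_o + round_trip_L_factor * added_L
        # One extra o per 4 KB fragment lowers the effective bandwidth.
        per_byte += added_o / FRAGMENT_BYTES
    else:
        # Single-message protocol: add o once; added latency is ignored.
        startup += added_o
    return LinearModel(startup, per_byte)


# Hypothetical baseline values and multipliers, for illustration only.
base_long = LinearModel(startup_us=60.0, us_per_byte=0.03)
slow_long = inflate(base_long, long_message=True, added_o=10.0, added_L=5.0,
                    added_G=0.01, round_trip_o_factor=2, round_trip_L_factor=2)
```

Added $g$ does not appear anywhere in the sketch, which mirrors the decision above to ignore it.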
2.3.3 TCP/IP Apparatus
The TCP/IP apparatus operates along the same lines as the parallel program apparatus. We use this apparatus in our sensitivity measurements of the SPECsfs NFS benchmark. A guiding engineering principle used in building the apparatus was to reuse as much of the user-level GAM code and LCP as possible. The alternative approach of adding delays to, and calibrating, Myricom's device drivers and LCP was rejected as too time-consuming to complete in the context of this thesis.
Our approach to building the apparatus was to insert the GAM layer inside the Solaris kernel. We created a kernel module which contained the user-level GAM code, slightly modified to run in the kernel, and then layered the STREAMS TCP/IP on top of it. However, modifications had to be made to the semantics of the GAM layer in order to accommodate placing active messages in the kernel.
The user-level GAM layer required three semantic changes in order to most easily accommodate the STREAMS abstraction while still providing controllable delays. First, the request-reply model was removed from the code. The elimination of request-reply caused the second change, the removal of reliability semantics. Finally, the buffering semantics for medium messages required an extra copy on the receive side.
Construction
Figure 2.6 shows the architecture of the TCP/IP-GAM apparatus. A number of Solaris STREAMS and character kernel drivers are required. Two drivers from Myricom are required to boot and control the LANai card (not shown). The Active Message driver implements most of the GAM functions. In order to more easily handle control operations, the Active Message driver is a simple character driver. STREAMS drivers require special messages in order to send control information; these are clumsy to use. However, as a character driver, the Active Message driver is unable to interface to the STREAMS subsystem directly. Therefore, a pseudo-Ethernet STREAMS driver was constructed to interface to the IP layer. The Ethernet driver was modeled on the Lance Ethernet driver provided in the Solaris source code. The Lance driver uses some fast paths not provided in the normal sample drivers. The interface between the Ethernet and Active Message drivers is a modification of the GAM functions.
The Unix STREAMS model assumes a number of modules which are connected by queues. Figure 2.6 shows the relationship between these modules. The STREAMS framework contains the notion of layering. Each module has a "down" direction, towards the device, called the write side. The inverse is the "up" direction, called the read side, which moves data toward the user process.
In addition to the two directions, there are two types of reads and writes: put procedures and service procedures. The main difference between put and service procedures is in the scheduling of the operation. The put procedure is called directly by the preceding module, while the service procedure is called via the STREAMS scheduler.
[Figure 2.6 diagram: sending and receiving nodes, each with an Application Process issuing Read()/Write() calls through a Stream head into a kernel stack of Socket module, TCP module, IP module, Pseudo Ethernet, and Kernel Active Messages. A Control Process and an Interrupt path are also shown. The numbers 1 to 6 mark the steps described in the caption.]
Figure 2.6: TCP/IP Apparatus Architecture
This figure shows the software architecture of the TCP/IP emulation environment. Two Solaris kernel modules are used. One simulates an Ethernet driver and the second runs Kernel-to-Kernel Active Messages. The figure shows the path needed to send a message. After the application passes the message to the kernel (1), it eventually ends up at the pseudo-Ethernet driver (2), which calls the kernel active message driver (3). After crossing the Myrinet, the receiving LANai interrupts the host (4), which invokes the poll routine of the kernel active message driver. The driver then passes the message through the STREAMS subsystem (5), and eventually the message ends up at the receiving application (6).
Tracing the path of a write system call in Figure 2.6, after the write call at (1), the Socket layer calls the write-put procedure of the TCP module, which calls the write-put procedure of the IP module, which calls the write-put procedure of the pseudo-Ethernet driver. Finally, at (2), the pseudo-Ethernet driver calls am_request with the IP packet as the data for the medium active message. In the normal case, the service procedures are not called. The kernel Active Message module copies the message into the LANai firmware queue at (3). The current implementation thus requires two copies on the send side.
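The write-side call chain, and the difference between put and service procedures, can be illustrated with a toy model. This sketch is not Solaris kernel code and uses no real STREAMS interfaces. The `Module` class, the `defer` flag, and the scheduler loop are hypothetical stand-ins, and the final call into the Active Message layer is not modeled.

```python
from collections import deque


class Module:
    """Toy stand-in for one STREAMS module on the write side."""

    def __init__(self, name, trace, pending, downstream=None, defer=False):
        self.name = name
        self.trace = trace          # shared list recording which procedure ran
        self.pending = pending      # shared queue standing in for the scheduler
        self.downstream = downstream
        self.defer = defer          # if True, queue the message for service()
        self.queue = deque()

    def put(self, message):
        # Put procedure: invoked directly by the preceding module.
        self.trace.append(self.name + ".put")
        if self.defer:
            self.queue.append(message)
            self.pending.append(self)
        else:
            self.forward(message)

    def service(self):
        # Service procedure: invoked later by the scheduler loop.
        self.trace.append(self.name + ".service")
        while self.queue:
            self.forward(self.queue.popleft())

    def forward(self, message):
        if self.downstream is not None:
            self.downstream.put(message)


def write(message, defer_ip=False):
    """Push one message down TCP -> IP -> pseudo-Ethernet and return the trace."""
    trace, pending = [], deque()
    ether = Module("pseudo_ethernet", trace, pending)
    ip = Module("ip", trace, pending, downstream=ether, defer=defer_ip)
    tcp = Module("tcp", trace, pending, downstream=ip)
    tcp.put(message)
    while pending:                  # toy scheduler: run deferred service work
        pending.popleft().service()
    return trace
```

With the default arguments, `write("packet")` records only put procedures, matching the normal case described above. Passing `defer_ip=True` makes the IP stand-in queue the message, so its service procedure runs from the scheduler loop before the pseudo-Ethernet put is reached.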
When the receiving LANai sees a message in the receive queue, it generates an interrupt. The kernel vectors control to the Active Message poll function, am_poll, at (4). am_poll in turn
