A variety of tests were performed to gain insight into where contention for priority queue servicing occurs. In the STREAMS priority system, each band has its own message queue. These band queues are linked together on a master queue linked list in order of priority. When the STREAMS scheduler runs and checks the queue, it first runs the service procedures of the queues associated with priority bands. This scheme essentially starves lower-priority bands of service whenever a higher-priority band is always waiting to be serviced.
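The starvation effect described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `BandedQueue`, the function `simulate`, and the one-message-per-tick service rate are hypothetical names and parameters chosen for illustration, and the model does not reproduce the actual STREAMS scheduler or its data structures.

```python
from collections import deque


class BandedQueue:
    """Toy model: one FIFO per priority band, serviced highest band first."""

    def __init__(self):
        self.bands = {}  # band number -> deque of queued messages

    def put(self, band, msg):
        self.bands.setdefault(band, deque()).append(msg)

    def get(self):
        """Return (band, msg) from the highest non-empty band, or None."""
        for band in sorted(self.bands, reverse=True):
            if self.bands[band]:
                return band, self.bands[band].popleft()
        return None


def simulate(ticks):
    """Each tick, one band 1 and one band 2 message arrive; one is serviced."""
    q = BandedQueue()
    delivered = {1: 0, 2: 0}
    for t in range(ticks):
        q.put(1, f"low-{t}")
        q.put(2, f"high-{t}")
        band, _ = q.get()
        delivered[band] += 1
    backlog = {band: len(msgs) for band, msgs in q.bands.items()}
    return delivered, backlog


delivered, backlog = simulate(100)
print("delivered per band:", delivered)
print("backlog per band:", backlog)
```

In this model, as long as band 2 always has a message waiting, band 1 is never serviced and its backlog grows with every tick.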
The first test performed involves a receiver and
two senders. The receiver is bound to a single Data
Link Service Access Point (DLSAP), which is analogous
to a port in socket programming. Both clients are
addressing their packets to the server's address and
an identical DLSAP. The idea was to send only one
packet of data from each client to the server to keep
things simple. In addition, the clients would be using
separate priority bands. The test indicates a problem
in Stream head processing. Figure 5.3
shows the test's configuration.
Analysis of network traffic using the standard "snoop" binary program shows that the message sent from the second client on band 1 was received by the stack. When snoop is running, a module is pushed onto the protocol stack to parse the traffic, and it sits above the Ethernet driver. The driver therefore passed the message successfully to the snoop module, which would then forward it to the Stream head. This indicates that the problem likely lies in the Stream head's processing of priority bands. Other implementations based on SVR4 have been shown not to exhibit this erroneous behavior [10].
A test similar to that shown in Figure 5.3 was also executed, except that a continuous stream of data was sent from the clients instead of a single packet. In addition, the client stream on priority band 1 was started before the client stream on priority band 2. Figure 5.4 shows the test's configuration.
The test indicates that the Stream head exhibits the correct behavior by blocking band 1 messages once band 2 messages start arriving. The application shows only band 2 messages arriving from the continuous stream; no band 1 messages arrive, even though the client is still sending them. The test was then taken one step further by stopping P3 (the process sending band 2 messages) while letting P1 (the process sending band 1 messages) continue to run. Theoretically, once the Stream head empties the message queue for band 2, it should backenable the queue holding band 1 messages and continue with any messages left on that queue. Experimental results indicate that this is not the system's actual behavior. Instead, zero messages are delivered to the application.
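For reference, the theoretically expected delivery order can be sketched with a toy strict-priority reader. The function `expected_delivery`, the labels, and the tick-based arrival schedule are all hypothetical; the sketch models only the behavior the text says should occur, not the Solaris Stream head or the fault that was observed.

```python
import heapq
from itertools import count


def expected_delivery(arrivals):
    """arrivals: one list of (band, label) pairs per tick.

    One message is read per tick, highest band first and FIFO within a band.
    Returns the labels in the order a strict-priority reader would see them.
    """
    heap = []
    seq = count()  # tie-breaker that preserves arrival order within a band
    out = []
    for tick_msgs in arrivals:
        for band, label in tick_msgs:
            heapq.heappush(heap, (-band, next(seq), label))
        if heap:
            out.append(heapq.heappop(heap)[2])
    while heap:  # the senders have stopped; drain whatever is still queued
        out.append(heapq.heappop(heap)[2])
    return out


# Hypothetical schedule: P1 sends band 1 on ticks 0-5, P3 sends band 2 on ticks 1-3.
arrivals = []
for t in range(6):
    msgs = [(1, f"b1-{t}")]
    if 1 <= t <= 3:
        msgs.append((2, f"b2-{t}"))
    arrivals.append(msgs)

order = expected_delivery(arrivals)
print(order)
```

In the model, band 1 delivery pauses while band 2 traffic is present and resumes, in order, as soon as the band 2 sender stops. The experiment described above did not show this resumption.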
An additional test indicated a memory problem within the STREAMS allocator. The test shown in Figure 5.4 was run for a sufficiently long period of time (100 seconds). A problem arises when band 1 messages become blocked by band 2 messages. The UNIX SVR4.2 Programmer's Guide [31] indicates that each band has its own separate flow-control parameters, and that each queue structure contains a counter holding the number of messages on the queue. Some drivers and modules make the error of testing only the first counter associated with the queue list, so a problem may exist whereby there are no messages on the first queue in the queue list, but there are priority band messages. The Solaris implementation is unique in that it appears to keep two separate counters, which track the number of normal band messages on the queue list and the total number of messages on the queue list. When the high-water mark is reached for any band, the QHIWAT flag should be set so that no more messages are put on that band's queue.
Running the test in Figure 5.4 exhibited different behavior, however. Band 1 messages were blocked by band 2 messages, but the MHME driver was still passing band 1 messages to the Stream head, which in turn continued to put messages on the band 1 queue. If the test is run long enough, the system runs out of pre-allocated kernel buffers for the band 1 messages and begins allocating more memory from the free pool. Once this is used up, it starts swapping out other active processes. Disk I/O was monitored and shown to reach intense peaks; eventually the X Window System itself is swapped out, and the system halts. It does not seem that the high-water mark for band 1 messages was properly set. Hence, a runaway stream steals all system memory.
Running netstat -mv shows a number of STREAMS memory allocation failures for dblks of various sizes, as well.
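The difference between a band that honors its high-water mark and one that does not can be sketched with a toy model. Everything here is an assumption made for illustration: the `Band` class, the `run_sender` helper, the water-mark values, and the abstract "units" of queued data are hypothetical and are not taken from the Solaris kernel or the MHME driver.

```python
class Band:
    """Toy per-band flow-control state: a counter plus high/low-water marks."""

    def __init__(self, hiwat, lowat):
        self.hiwat = hiwat
        self.lowat = lowat
        self.count = 0     # units currently queued on this band
        self.full = False  # set at the high-water mark, cleared at the low-water mark

    def canput(self):
        return not self.full

    def put(self, units):
        self.count += units
        if self.count >= self.hiwat:
            self.full = True

    def get(self, units):
        self.count = max(0, self.count - units)
        if self.count <= self.lowat:
            self.full = False


def run_sender(band, messages, units, respect_flow_control):
    """Offer messages to a band that nobody is draining.

    Returns (units queued, messages held back by flow control).
    """
    held_back = 0
    for _ in range(messages):
        if respect_flow_control and not band.canput():
            held_back += 1
            continue
        band.put(units)
    return band.count, held_back


bounded = run_sender(Band(hiwat=5, lowat=2), 1000, 1, respect_flow_control=True)
runaway = run_sender(Band(hiwat=5, lowat=2), 1000, 1, respect_flow_control=False)
print("flow control honored:", bounded)
print("flow control ignored:", runaway)
```

With flow control honored, the queue stops growing at the high-water mark; with it ignored, the queue grows without bound while the band is blocked, which is the pattern of memory growth described above.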
To validate these claims, an additional test was executed. This time, only one client was run, sending to one server on priority band 2. The test ran for a sufficiently long period (200 seconds), and CPU and disk I/O were closely monitored. CPU utilization averaged around 30-50% (measured with the Solaris sdtperfmeter binary at one-second intervals). No heavy swapping or disk I/O was observed for the length of the test, and running netstat -mv showed zero STREAMS memory allocation failures. This seems to validate the claim that the band 1 messages in the previous test were running away with system memory.
Further tests were performed using multiple receiver
processes bound to separate DLSAPs. These tests are
analogous to simply having two separate socket processes
listening on separate port numbers. Two separate tests
were done using two and three clients. All hosts are
first synchronized to a reasonable clock value using
the rdate binary utility. Next, cron jobs
are scheduled on all clients and servers to execute
at the same time, and for a sufficiently long period
of time. The first test involved two client hosts
sending to two separate server processes on the same
host using different DLSAP values. The aim of the
test was to verify whether the Ethernet driver was properly
multiplexing the messages. The driver has no service
procedure; therefore, it is not a source
of priority contention, but simply a message-marking
mechanism as packets pass through it. In addition,
by using separate DLSAP values, two separate Streams
are constructed and attached to the MHME driver. Therefore,
even though the clients are sending data on separate
bands (1 and 2), there should be no queue blocking,
and the tests indicate that there is not. Both streams
are given roughly 50% of the available bandwidth during
the course of the tests. Figure 5.5
shows the test's configuration. Table 5.4
summarizes the results of the test when both clients
use normal priority messages. Table 5.5
summarizes the results when the clients use different
priority bands.
As can be seen, there is little room for contention here, and the driver multiplexes the separate Streams reasonably well. Any major differences in bandwidth percentage most likely arise from synchronization flaws in cron job scheduling. The receiver host is a uniprocessor, and trial results indicate that whichever receiving server application the cron daemon starts first gets slightly more bandwidth.
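The reasoning behind this result, that messages demultiplexed by DLSAP onto separate Streams cannot block one another regardless of band, can be sketched as a toy model. The function `demux_and_serve`, the DLSAP values 0x10 and 0x20, and the round-robin service order (a stand-in for the operating system scheduling the two receiver processes) are all hypothetical assumptions, not details of the MHME driver.

```python
from collections import deque


def demux_and_serve(frames, dlsaps, reads):
    """Route (dlsap, band, payload) frames to one FIFO per bound DLSAP,
    then serve the receivers round-robin for up to `reads` reads.

    Returns the number of messages each receiver read.
    """
    streams = {sap: deque() for sap in dlsaps}
    for sap, band, payload in frames:
        if sap in streams:  # frames addressed to unbound DLSAPs are dropped
            streams[sap].append((band, payload))

    served = {sap: 0 for sap in dlsaps}
    order = list(dlsaps)
    i = 0
    idle = 0  # consecutive empty streams seen; stop once every stream is empty
    while reads > 0 and idle < len(order):
        sap = order[i % len(order)]
        i += 1
        if streams[sap]:
            streams[sap].popleft()
            served[sap] += 1
            reads -= 1
            idle = 0
        else:
            idle += 1
    return served


# Hypothetical traffic: one client per DLSAP, on bands 1 and 2 respectively.
frames = []
for t in range(100):
    frames.append((0x10, 1, f"a-{t}"))
    frames.append((0x20, 2, f"b-{t}"))

served = demux_and_serve(frames, [0x10, 0x20], reads=100)
total = sum(served.values())
shares = {hex(sap): n / total for sap, n in served.items()}
print(shares)
```

Because each DLSAP has its own queue in the model, the band number plays no part in which receiver is served next, and each receiver ends up with half of the reads.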
This test was taken one step further, using three clients sending to three server processes (all on the same host) with different DLSAP values for each. The results are consistent, and each client gets roughly one third of the available bandwidth. Table 5.6 shows the results.