*******************************************************************
ATM Forum Document Number: ATM_Forum/96-0520
*******************************************************************
Title: Considerations for Frame Level Throughput and Latency
Measurements of ATM Switches
*******************************************************************
Abstract: We discuss measurement methods for throughput and latency
in this contribution. These are enhancements of those in our
February 96 contribution.
*******************************************************************
Source:
Raj Jain, Bhavana Nagendra, and Gojko Babic
The Ohio State University
Department of CIS
Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

The presentation of this contribution at the ATM Forum is sponsored
by NASA.
*******************************************************************
Date: April 1996, Anchorage, Alaska
*******************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TEST)
*******************************************************************
Notice: This contribution has been prepared to assist the ATM Forum.
It is offered to the Forum as a basis for discussion and is not a
binding proposal on the part of any of the contributing
organizations. The statements are subject to change in form and
content after further study. Specifically, the contributors reserve
the right to add to, amend or modify the statements contained herein.
*******************************************************************

Frame-level throughput and latency metrics are discussed in this
contribution.

THROUGHPUT
----------

There are two throughputs that are of interest to a user:

i.  Zero loss (lossless) throughput - It is the maximum rate at which
    none of the frames are dropped.

ii. Peak throughput - It is the maximum throughput without
    considering the losses.
In other words, the maximum throughput can actually occur when the
loss is not zero. A model graph of output count vs load (input count)
is described by Figure 1.

[Figure 1 - Graph of output count vs load (input count). Vertical
axis: OUTPUT COUNT. Horizontal axis: LOAD (INPUT COUNT). The curve
follows the 0% loss line up to X, rises to its maximum Y at input
count Z, and falls beyond Z.]

   X : lossless throughput
   Y : peak throughput
   Z : input count for peak throughput

Point X defines the throughput without loss and point Y defines the
peak throughput. Note that the peak throughput may equal the lossless
throughput in some cases.

Throughput can be expressed in bits/sec, frames/sec or cells/sec.
Cells/sec is not a good unit for frame-level performance since the
cells are not seen by the user. Bits/sec and frames/sec are related
by the following equation:

   Throughput (bits/sec) = Throughput (frames/sec)
                           * Average frame size (bits)

It is preferred to express the throughput in bits/sec, because
expressing it in frames/sec would require specifying the frame size,
which is a variable.

The lossless throughput is the highest load at which the count of the
output frames equals the count of the input frames. Peak throughput
is the maximum throughput that can be reached in spite of the losses.

The tests can be conducted under two conditions - with background
traffic and without background traffic. Higher priority traffic like
VBR can act as background traffic for the experiment. The frames sent
are of fixed length and they have a fixed interframe gap.
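
The two throughput definitions and the bits/sec conversion above can
be sketched in a few lines of Python. This is only an illustration
under stated assumptions: the function names, the fixed test duration
and the sample counts are hypothetical and are not part of this
contribution or of any measurement tool.

```python
def to_bits_per_sec(frames_per_sec, avg_frame_bits):
    """Throughput (bits/sec) = Throughput (frames/sec) * average frame size (bits)."""
    return frames_per_sec * avg_frame_bits


def lossless_and_peak(samples, duration_s):
    """Return (lossless, peak) throughput in frames/sec.

    samples    : list of (input_count, output_count) frame counts, one pair
                 per test run; every run is assumed to last duration_s seconds.
    lossless   : highest load at which output count equals input count.
    peak       : highest output rate observed, regardless of loss.
    """
    if not samples or duration_s <= 0:
        raise ValueError("need at least one run and a positive duration")
    no_loss_loads = [inp for inp, out in samples if out == inp]
    lossless = max(no_loss_loads) / duration_s if no_loss_loads else 0.0
    peak = max(out for _, out in samples) / duration_s
    return lossless, peak


if __name__ == "__main__":
    # Made-up counts from five hypothetical one-second runs at increasing load.
    runs = [(100, 100), (200, 200), (300, 280), (400, 320), (500, 300)]
    lossless, peak = lossless_and_peak(runs, duration_s=1.0)
    # Assumed fixed frame size of 8000 bits, for illustration only.
    print(to_bits_per_sec(lossless, 8000), to_bits_per_sec(peak, 8000))
```

With these made-up counts the lossless point is the 200-frame run and
the peak is reached at an input count of 400, where frames are already
being lost.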
The test traffic can be pictured as follows:

   <------------------>     <------------------>
   |       Frame      | Gap |       Frame      |

Figure 2 - Traffic used for testing
-----------------------------------

Procedure for both lossless and peak throughput:

Data traffic is passed through the switch from the sources, in the
absence or presence of VBR background traffic, and the frames
transmitted by the switch are counted. If the input and output counts
are the same, the load is increased and the test is conducted again.
The highest load at which the count of the output frames equals the
count of the input frames is the lossless throughput. If the load is
increased beyond this point, the throughput keeps increasing until it
reaches its peak value; any further increase in load results in a
decrease in the throughput.

Alternatively, the load can be kept constant and the frame size
varied, so that the effect of frame size on the throughput can be
studied. It should be noted that in the case of ABR, the generators
should follow the traffic management mechanisms of the network.

Throughput for an n-to-1 configuration defined in [96-0519] may be at
most equal (or close) to the capacity of the sink. It is noted that a
well-behaved switch would allow equal load from all sources without
giving preference to any source.

For the n-by-n configuration [96-0519], in cases i and ii, the
throughput may be equal to the sum of the host traffic. For cases iii
and iv, the throughput may be equal to 2 * the sum of the host
traffic.

LATENCY
-------

For a single bit, latency through a switch can be easily defined as
the time between the input and output instants. For a frame, both
input and output are intervals and not instants. Therefore, one has
to carefully define the instants at which the frame latency
measurement begins and ends. Usually latency is measured in one of
the following four ways:

a. FIFO Latency = First-bit in to first-bit out
b. LIFO Latency = Last-bit in to first-bit out
c. LILO Latency = Last-bit in to last-bit out
d. FILO Latency = First-bit in to last-bit out

It turns out these definitions apply only for contiguous frames. With
ATM cells, the frames may not be contiguous since cells of frames
going to other destinations may be intermingled. Also, the frame
duration at the input and output may be different due to different
input and output link rates.

After some thought, we have come up with the following generalized
definition of frame latency:

   Frame Latency = Min{LILO Latency,
                       FILO Latency - Nominal frame output time}

where

   Nominal frame output time = Number of cells in the frame
                               / Output link cell rate

Notice that FILO latency includes the frame output time along with
the switch latency. By subtracting the frame output time, we get the
switch latency.

The rest of this contribution is devoted to mathematically justifying
this new definition of latency.

**** [Interested readers should download our presentation slides from
http://www.cse.wustl.edu/~jain/atmforum.htm
The slides have a better explanation of latency than this text.] ****

DERIVATION OF THE LATENCY FORMULA:
---------------------------------

Let
   t_fi = time of first bit in    [in seconds]
   t_fo = time of first bit out   [in seconds]
   t_li = time of last bit in     [in seconds]
   t_lo = time of last bit out    [in seconds]
   Cin  = capacity of input link  [cells/sec]
   Cout = capacity of output link [cells/sec]
   m    = size of frame           [in cells]

Assumption 1: No links have infinite capacity. We consider only
finite link capacities, implying t_li > t_fi and t_lo > t_fo.

It is always true that t_fo >= t_fi and t_lo >= t_li (the cases
t_fo = t_fi and t_lo = t_li correspond to zero-latency switches). Any
relation between t_fo and t_li is possible, i.e. t_fo > t_li or
t_fo = t_li or t_fo < t_li.

Assumption 2: All cells of a frame are contiguous at the input.
When the source starts transmitting the first cell into the ATM
network, all cells of the frame will be transmitted in a continuous
stream of cells, without any interruption by empty cells or cells
from other frames on that input. Mathematically speaking,
t_li - t_fi = m/Cin.

Assumption 3: At the output, cells of a frame may or may not be
contiguous. In other words, we have two possibilities for
t_lo - t_fo:

a. t_lo - t_fo > m/Cout, when the output cell stream of the given
   frame is intermixed with empty cells or cells from other frames.
b. t_lo - t_fo = m/Cout, when the output cell stream of the given
   frame is contiguous.

The four traditional definitions of latency can be expressed in terms
of these time instants as follows:

a. FIFO Latency = First-bit in to first-bit out = t_fo - t_fi
b. LIFO Latency = Last-bit in to first-bit out  = t_fo - t_li
c. LILO Latency = Last-bit in to last-bit out   = t_lo - t_li
d. FILO Latency = First-bit in to last-bit out  = t_lo - t_fi

We shall now consider each definition and, for each one, provide a
case where the definition does not produce the expected result.

FIFO: The problem with FIFO is that it gives the delay of the first
cell of the frame but not of the whole frame. Consider the following
scenario: the first cell of the frame is delivered to the destination
very fast (zero delay may be possible), and then all other cells
experience very long delays due to internal queueing. FIFO accounts
only for the delay of the first cell (which is small in this case)
and gives a short frame delay (even zero), although the frame has a
very long delay because all cells but the first have long delays.

LIFO: t_fo < t_li is possible (Assumption 1), implying that LIFO
(= t_fo - t_li) may be negative. This is not acceptable.

LILO: Consider the case when the input link rate is higher than the
output link rate but the switch delay is zero.
In this case:

   t_fi = t_fo              (zero delay in ATM network)
   t_lo - t_fo = m/Cout     (Assumption 3b)
   Cin > Cout => t_lo > t_li

Since the switch latency in this scenario is 0, the measured value
t_lo - t_li should come out zero. However, Cin > Cout implies
t_lo = t_fi + m/Cout > t_fi + m/Cin = t_li, or
LILO = t_lo - t_li > 0. Thus, the LILO latency in this case is
non-zero.

FILO: Consider again the case of a zero-latency switch. In this case:

   t_fi = t_fo              (zero delay in ATM network)
   t_lo - t_fo = m/Cout     (Assumption 3b)

t_lo >= t_li > t_fi (Assumption 1) implies t_lo > t_fi, or
FILO latency = t_lo - t_fi > 0. Thus, the FILO latency would be
non-zero in this case.

Proposed Definition
-------------------

The proposed definition is:

   Frame Latency = min {(t_lo - t_li), (t_lo - t_fi - m/Cout)}
                 = min (LILO, FILO - m/Cout)

The definition may be considered in the following three cases:

1. If Cin = Cout then

      t_lo - t_fi - m/Cout = t_lo - t_fi - (t_li - t_fi)
                           = t_lo - t_li

   This implies that both terms in the latency expression are
   identical and the frame latency can be determined by the value of
   either term.

2. If Cin > Cout then

      t_lo - t_li = t_lo - t_fi - m/Cin > t_lo - t_fi - m/Cout

   This implies that the first term in the latency expression is
   larger than the second one and the frame latency is determined by
   the value of the second term.

3. If Cin < Cout then

      t_lo - t_fi - m/Cout > t_lo - t_fi - m/Cin = t_lo - t_li

   In this case, the second term in the latency expression is larger
   than the first and the frame latency is determined by the value of
   the first term.

For each case, all possible scenarios (timing diagrams) that
illustrate the correctness of our definition could be presented.
However, here we present only two characteristic scenarios.
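
The proposed definition can also be checked numerically. The short
Python sketch below is only an illustration: the function name and the
link rates and timestamps in the example are assumptions chosen for
the example, not measured values.

```python
def frame_latency(t_fi, t_li, t_lo, m, c_out):
    """Frame latency = min(LILO, FILO - m/Cout).

    t_fi, t_li, t_lo : first-bit-in, last-bit-in and last-bit-out
                       times [seconds]
    m                : frame size [cells]
    c_out            : output link capacity [cells/sec]
    """
    if m <= 0 or c_out <= 0:
        raise ValueError("m and c_out must be positive")
    lilo = t_lo - t_li                 # last-bit in to last-bit out
    filo = t_lo - t_fi                 # first-bit in to last-bit out
    return min(lilo, filo - m / c_out)


if __name__ == "__main__":
    m = 10  # hypothetical frame of 10 cells

    # Zero-latency switch with Cin = 1000 > Cout = 500 cells/sec:
    # t_li = t_fi + m/Cin, t_fo = t_fi, t_lo = t_fo + m/Cout.
    # LILO alone would report 0.01 s; the second term gives zero.
    print(frame_latency(t_fi=0.0, t_li=0.01, t_lo=0.02, m=m, c_out=500.0))

    # Zero-delay network with Cin = 500 < Cout = 1000 cells/sec and the
    # last bit delivered instantaneously (t_lo = t_li): the first term
    # (LILO) gives zero.
    print(frame_latency(t_fi=0.0, t_li=0.02, t_lo=0.02, m=m, c_out=1000.0))
```

In both examples the smaller of the two terms is the one that removes
the effect of the link-rate mismatch, which is the reason for taking
the minimum.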
Scenario 1
----------

     A                               B
t_fi +
     |   *
     |       *
     |           *
     |               *
     |                   *
     |                       *
     |                           *
     |                               + t_fo
     |                               |
     |                               |
     |                               |
     |                               |
t_li +                               |
     |    *                          |
     |         *                     |
     |              *                |
     |                   *           |
     |                        *      |
     |                             * |
     |                               + t_lo

Here we assume that Cin > Cout, so the definition gives

   frame latency = t_lo - t_fi - m/Cout

Without knowing precisely the relationship between Cin and Cout we
can only state that t_lo - t_fo >= m/Cout. We now analyze two cases.

Case a) If t_lo - t_fo = m/Cout then t_fo = t_lo - m/Cout and

   latency = t_fo - t_fi = t_lo - m/Cout - t_fi

which is in accordance with the definition.

Case b) If t_lo - t_fo > m/Cout then

   latency = t_fo - t_fi + (t_lo - t_fo) - m/Cout
           = t_lo - t_fi - m/Cout

which is in accordance with the definition.

Scenario 2
----------

     A                               B
t_fi +
     |    *
     |         *
     |              *
     |                   *
     |                        *
     |                             *
     |                               + t_fo
     |                               |
     |                               |
     |                               |
     |                               |
t_li +*******************************+ t_lo

If we assume Cin < Cout, the definition gives
frame latency = t_lo - t_li. In this scenario, t_li = t_lo implies
that we have a zero-delay network (because the last bit is delivered
instantaneously), so the frame latency is zero. The definition of
latency produces the identical result.

Note that in this case, without knowing the relationship between Cin
and Cout, we can state only that t_lo - t_fo >= m/Cout. If
t_lo - t_fo > m/Cout then some number of cells not belonging to the
frame under consideration (including empty ones) have been delivered
by the network. If t_lo - t_fo = m/Cout, then cells of the given
frame are not interleaved. But regardless of whether the cells of the
given frame are interleaved or not, the frame latency is zero.

REFERENCES:
----------

[95-1347] Raj Jain, "Performance Benchmarking BOF," AF-ALL/95-1347,
          October 1995.

[95-1662] Raj Jain, Bhavana Nagendra, "Performance Benchmarking of
          ATM Switches," AF-TEST/95-1662, December 1995.

[96-0180] Raj Jain, Bhavana Nagendra, Gojko Babic, "Scope For ATM
          Forum's Performance Benchmarking Work Item,"
          AF-TEST/96-0180, February 1996.

[96-0519] Raj Jain, Bhavana Nagendra, Gojko Babic, "General
          Considerations for Frame-Level Performance Measurement of
          ATM Switches," AF-TEST/96-0519, April 1996.

Note: All our past ATM Forum contributions and presentations are
available on-line at http://www.cse.wustl.edu/~jain/