*******************************************************************

       ATM Forum Document Number: ATM_Forum/96-0520

       *******************************************************************

       Title: Considerations for Frame Level Throughput and Latency Measurements of
       ATM Switches

       *******************************************************************


       Abstract: We discuss measurement methods for throughput and
       latency in this contribution. These are enhancements of those in
       our February 96 contribution.

       *******************************************************************

       Source:

       Raj Jain, Bhavana Nagendra, and Gojko Babic
       The Ohio State University
       Department of CIS
       Columbus, OH 43210-1277
       Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

       The presentation of this contribution at the ATM Forum is sponsored by
       NASA.

       *******************************************************************

       Date:  April 1996, Anchorage, Alaska

       *******************************************************************

       Distribution: ATM Forum Technical Working Group Members (AF-TEST)

       *******************************************************************

       Notice: This contribution has been prepared to assist the ATM
       Forum. It is offered to the Forum as a basis for discussion and
       is not a binding proposal on the part of any of the contributing
       organizations. The statements are subject to change in form and
       content after further study. Specifically, the contributors
       reserve the right to add to, amend or modify the statements
       contained herein.

       *******************************************************************

       Frame-level throughput and latency metrics are discussed in this
       contribution.

       THROUGHPUT
       ----------

       There are two throughputs that are of interest to a user:

       i. Zero loss (lossless) throughput - It is the maximum rate at
       which none of the frames are dropped.

       ii. Peak throughput - It is the maximum throughput without
       considering the losses.  In other words, the maximum throughput
       can actually occur when the loss is not zero.


                                  ^
                                Y |-------------#  #
                                  |            # |       #
                  OUTPUT COUNT  X |--------- #   |
                                  |         # |  |             #
                                  |        #  |  |
                                  |       #   |  |                  #
                                  |      #    |<---- 0% loss
                                  |     #     |  |
                                  |    #      |  |
                                  |   #       |  |
                                  |  #        |  |
                                  | #         |  |
                                  +----------------------------------->
                                              X  Z

                                              LOAD (INPUT COUNT)


                  X : lossless throughput
                  Y : peak throughput
                  Z : input count for peak throughput

        Figure 1 - Graph of output count vs load (input count)
        ------------------------------------------------------


       Figure 1 shows a model graph of output count vs input count.
       Point X marks the lossless throughput and point Y marks the peak
       throughput, which is reached at input count Z.

       Note that the peak throughput may equal the lossless throughput
       in some cases.

       Throughput can be expressed in bits/sec, frames/sec or cells/sec.
       Cells/sec is not a good unit for frame-level performance since
       the cells are not seen by the user.  Bits/sec and frames/sec are
       related by the following equation:

       Throughput (bits/sec) = Throughput (frames/sec) * Average frame
       size (bits)

       It is preferred to express the throughput in bits/sec, because
       expressing it in frames/sec would require specifying the frame
       size, which is a variable.
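
       The conversion above can be sketched in a few lines of Python.
       This is only an illustration; the function names and the sample
       numbers (10,000 frames/sec, 4,096-bit frames) are assumptions
       made for the example and do not come from any measurement.

```python
def frames_to_bits_per_sec(frames_per_sec, avg_frame_size_bits):
    """Throughput (bits/sec) = Throughput (frames/sec) * average frame size (bits)."""
    return frames_per_sec * avg_frame_size_bits


def bits_to_frames_per_sec(bits_per_sec, avg_frame_size_bits):
    """Inverse conversion; only meaningful once a frame size is specified."""
    if avg_frame_size_bits <= 0:
        raise ValueError("average frame size must be positive")
    return bits_per_sec / avg_frame_size_bits


# Hypothetical example values, chosen only for illustration.
rate_bps = frames_to_bits_per_sec(10_000, 4_096)
print(rate_bps)
```

       The inverse function needs the frame size as an argument, which
       is the reason bits/sec is the preferred unit.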

       The lossless throughput is the highest load at which the count of
       the output frames equals the count of the input frames.

       Peak throughput is the maximum throughput that can be reached
       in spite of the losses.

       The tests can be conducted under two conditions - with background
       traffic and without background traffic.

       Higher priority traffic like VBR can act as background traffic
       for the experiment.

       The frames sent are of fixed length and have a fixed interframe
       gap.  The frames can be pictured as follows:

       <------------------>             <------------------>
       |    Frame         |     Gap     |      Frame       |


              Figure 2 - Traffic used for testing
              -----------------------------------


       Procedure for both lossless and peak throughput:

       Data traffic is passed through the switch from the sources, with
       or without VBR background traffic, and the frames transmitted by
       the switch are counted.  If the input and output counts are the
       same, the load is increased and the test is repeated.  The
       lossless throughput is the highest load at which the count of
       the output frames equals the count of the input frames.  If the
       input count is increased beyond this point, the throughput
       continues to increase until it reaches a maximum (the peak
       throughput); any further increase in load results in a decrease
       in the throughput.  Alternatively, the load can be kept constant
       and the frame size varied to study its effect on the throughput.
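
       The load sweep described above might be automated along the
       following lines.  This is a minimal sketch, not a prescribed
       test harness: measure() is a hypothetical hook standing in for
       one test run, and toy_switch() is a made-up switch model used
       only to exercise the loop.

```python
def find_throughputs(measure, loads):
    """Sweep offered loads upward and locate points X, Y and Z of Figure 1.

    measure(load) is a hypothetical hook that runs one test at the given
    load and returns (input_count, output_count) in frames.
    """
    lossless = 0          # X: highest load with no loss before the first loss
    peak = 0              # Y: highest output count seen at any load
    load_at_peak = None   # Z: input count at which Y was observed
    loss_seen = False
    for load in sorted(loads):
        sent, received = measure(load)
        if received < sent:
            loss_seen = True
        elif not loss_seen:
            lossless = received
        if received > peak:
            peak, load_at_peak = received, sent
    return lossless, peak, load_at_peak


def toy_switch(load):
    """Made-up model: lossless up to 80 frames, output peaks at 90 frames
    for an input of 100 frames, then degrades as the load grows."""
    if load <= 80:
        out = load
    elif load <= 100:
        out = 80 + (load - 80) // 2
    else:
        out = max(0, 90 - (load - 100))
    return load, out


x, y, z = find_throughputs(toy_switch, range(10, 151, 10))
print("lossless:", x, "peak:", y, "input count at peak:", z)
```

       The sketch stops updating the lossless value at the first load
       that shows loss, which mirrors the step-by-step increase of the
       load in the procedure.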

       It should be noted that in the case of ABR, the traffic
       generators should follow the traffic management mechanisms of
       the network.

       Throughput for an n-to-1 configuration defined in [96-0519] may
       be at most equal (or close) to the capacity of the sink. Note
       that a well-behaved switch would allow equal load from all
       sources without giving preference to any source.

       For the n-by-n configuration [96-0519], in cases i and ii, the
       throughput may be equal to the sum of the host traffic.  For
       cases iii and iv, the throughput may be equal to 2 * the sum of
       the host traffic.


       LATENCY
       -------

       For a single bit, latency through a switch can be easily defined
       as the time between the input and output instants. For a frame,
       both input and output are intervals and not instants. Therefore,
       one has to carefully define the instants at which the frame
       latency measurement begins and ends.  Usually latency is measured
       in one of the following four ways:
                      a. FIFO Latency = First-bit in to first-bit out
                      b. LIFO Latency = Last-bit in to first-bit out
                      c. LILO Latency = Last-bit in to Last-bit out
                      d. FILO Latency = First-bit in to Last-bit out

       These definitions apply only to contiguous frames.  With ATM
       cells, the frames may not be contiguous since cells of frames
       going to other destinations may be intermingled. Also, the frame
       duration at the input and output may be different due to
       different input and output link rates. We therefore propose the
       following generalized definition of Frame Latency:

       Frame Latency = Min{LILO Latency, FILO Latency - Nominal frame
       output time}
       Where,
       Nominal frame output time = Number of Cells in the Frame/Output
       Link Cell Rate

       Notice that FILO latency includes frame output time along with
       the switch latency. By subtracting the frame output time, we get
       the switch latency.
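
       As an illustration, the definition can be written as a short
       Python function.  This is a sketch under stated assumptions:
       the argument names are ours, exact fractions are used to avoid
       rounding, and the sample timestamps are invented (a switch that
       adds 1/100 sec of delay, with a 2000 cells/sec input link and a
       1000 cells/sec output link).

```python
from fractions import Fraction


def frame_latency(t_fi, t_li, t_lo, m, c_out):
    """Frame Latency = min(LILO, FILO - nominal frame output time).

    t_fi, t_li : times of first and last bit in   [sec]
    t_lo       : time of last bit out             [sec]
    m          : frame size                       [cells, integer]
    c_out      : output link cell rate            [cells/sec, integer]
    """
    lilo = t_lo - t_li
    filo = t_lo - t_fi
    nominal_output_time = Fraction(m, c_out)
    return min(lilo, filo - nominal_output_time)


# Invented example: 100-cell frame, input takes 1/20 sec, output 1/10 sec,
# and the last bit leaves 11/100 sec after the first bit arrived.
latency = frame_latency(
    t_fi=Fraction(0),
    t_li=Fraction(1, 20),
    t_lo=Fraction(11, 100),
    m=100,
    c_out=1000,
)
print(latency)
```

       In this example the LILO term is 6/100 sec and the FILO term
       minus the nominal output time is 1/100 sec, so the function
       reports the 1/100 sec that the switch actually added.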


       The rest of this contribution is devoted to mathematically
       justifying this new definition of latency.

       **** [Interested readers should download our presentation slides
       from http://www.cse.wustl.edu/~jain/atmforum.htm The slides
       have a better explanation of latency than this text.] ****

       DERIVATION OF THE LATENCY FORMULA:
       ---------------------------------
       Let
             t_fi = time of first bit in [in seconds]
             t_fo = time of first bit out [in seconds]
             t_li = time of last bit in [in seconds]
             t_lo = time of last bit out [in seconds]
             Cin = capacity of input link [cell/sec]
             Cout = capacity of output link [cell/sec]
             m = size of frame [in cells]

       Assumption 1: No links have infinite capacity.  We consider only
       finite link capacities, implying t_li > t_fi and t_lo > t_fo.  It
       is always true that t_fo >= t_fi and t_lo >= t_li (the cases
       t_fo = t_fi and t_lo = t_li arise for zero-latency switches). Any
       relation between t_fo and t_li is possible, i.e. t_fo > t_li,
       t_fo = t_li or t_fo < t_li.


       Assumption 2: All cells of a frame are contiguous at the input.
       Once the source starts transmitting the first cell into the ATM
       network, all cells of the frame are transmitted in a continuous
       stream, without any interruption by empty cells or cells from
       other frames on that input. Mathematically speaking, t_li - t_fi
       = m/Cin.


       Assumption 3: At the output, cells of a frame may or may not be
       contiguous.  In other words, we have two possibilities for t_lo -
       t_fo:

       a. t_lo - t_fo > m/Cout, when the output cell stream of the given
       frame is intermixed with empty cells or cells from other frames.

       b. t_lo - t_fo = m/Cout, when the output cell stream of the given
       frame is contiguous.

       The four traditional definitions of latency can be expressed in
       terms of these time instants as follows:

             a. FIFO Latency = first-bit in to first-bit out = t_fo - t_fi
             b. LIFO Latency = Last-bit in to first-bit out = t_fo - t_li
             c. LILO Latency = Last-bit in to Last-bit out = t_lo - t_li
             d. FILO Latency = first-bit in to Last-bit out = t_lo - t_fi

       We now consider each definition in turn and, for each, give one
       case in which it does not produce the expected result.

       FIFO: The problem with FIFO is that it gives the delay of the
       first cell of the frame but not of the whole frame. Consider the
       following scenario: The first cell of the frame is delivered to
       the destination very quickly (zero delay may be possible), and
       all other cells then experience very long delays due to internal
       queueing.  FIFO accounts only for the delay of the first cell
       (which is small in this case) and gives a short frame delay
       (even zero), although the frame as a whole has a very long delay
       because all cells but the first have long delays.

       LIFO: t_fo < t_li is possible (Assumption 1) implying that LIFO
       (=t_fo - t_li) may be negative. This is not acceptable.

       LILO: Consider the case when the input link rate is higher than
       the output rate but the switch delay is zero. In this case:
        t_fi = t_fo (zero delay in ATM network)
        t_lo - t_fo = m/Cout (Assumption 3b)
        t_li - t_fi = m/Cin (Assumption 2)
        Cin > Cout => t_lo - t_li = m/Cout - m/Cin > 0

       Since the switch latency in this scenario is zero, the measured
       value t_lo - t_li should come out zero. However, Cin > Cout
       implies t_lo > t_li, or LILO = t_lo - t_li > 0. Thus, the LILO
       latency in this case is non-zero.

       FILO: Consider again the case of a zero-latency switch. In this
       case:
       t_fi = t_fo (zero delay in ATM network)
       t_lo - t_fo = m/Cout (Assumption 3b)

       t_lo >= t_li > t_fi (Assumption 1) implies t_lo > t_fi or FILO
       latency = t_lo - t_fi > 0.  Thus, the FILO latency would be non-
       zero in this case.
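
       The LIFO, LILO and FILO counterexamples can be checked
       numerically.  The sketch below builds the timestamps of a
       zero-latency switch with Cin > Cout from Assumptions 2 and 3b
       and evaluates the four traditional definitions.  The link rates
       and frame size are arbitrary values chosen for illustration.

```python
from fractions import Fraction


def traditional_latencies(t_fi, t_li, t_fo, t_lo):
    """Return the four traditional latency values for one frame."""
    return {
        "FIFO": t_fo - t_fi,
        "LIFO": t_fo - t_li,
        "LILO": t_lo - t_li,
        "FILO": t_lo - t_fi,
    }


# Arbitrary illustration values: 100-cell frame, Cin = 2000, Cout = 1000.
m, c_in, c_out = 100, 2000, 1000
t_fi = Fraction(0)
t_li = t_fi + Fraction(m, c_in)     # Assumption 2
t_fo = t_fi                         # zero switch delay
t_lo = t_fo + Fraction(m, c_out)    # Assumption 3b (contiguous output)

results = traditional_latencies(t_fi, t_li, t_fo, t_lo)
for name, value in results.items():
    print(name, value)
```

       Although the switch adds no delay, LIFO comes out negative and
       both LILO and FILO come out positive.  FIFO is zero here, which
       is correct for this case; its weakness is the separate scenario
       described above, in which only the first cell is fast.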

       Proposed Definition
       -------------------

       The proposed definition is:

       Frame Latency = min {(t_lo-t_li), (t_lo - t_fi - m/Cout)}
        = min (LILO, FILO-m/Cout)


       The definition may be considered in the following three cases:

       1. If Cin = Cout then m/Cout = m/Cin = t_li - t_fi (Assumption 2),
       so
       t_lo - t_fi - m/Cout = t_lo - t_fi - (t_li - t_fi)
       = t_lo - t_li

       This implies that both terms in the latency expression are
       identical and the frame latency can be determined by the value
       of either term.

       2. If Cin > Cout then
       t_lo - t_li = t_lo - t_fi - m/Cin > t_lo - t_fi - m/Cout
       This implies that the first term in the latency expression is
       larger than the second one and the frame latency is determined by
       the value of the second term.

       3. If Cin < Cout, then
       t_lo - t_fi - m/Cout > t_lo - t_fi - m/Cin = t_lo - t_li
       In this case, the second term in the latency expression is larger
       than the first and the frame latency is determined by the value
       of the first term.
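
       The three cases can be verified with a small numeric check.
       The sketch below derives t_li from Assumption 2 and reports
       which term of the definition is the smaller one.  The frame
       size, the value of t_lo and the link rates are arbitrary
       illustration values.

```python
from fractions import Fraction


def latency_terms(t_fi, t_lo, m, c_in, c_out):
    """Return (LILO, FILO - m/Cout), with t_li derived from Assumption 2."""
    t_li = t_fi + Fraction(m, c_in)
    lilo = t_lo - t_li
    filo_minus_output = t_lo - t_fi - Fraction(m, c_out)
    return lilo, filo_minus_output


def deciding_term(c_in, c_out, m=100, t_fi=Fraction(0), t_lo=Fraction(1)):
    """Name the term that determines min(LILO, FILO - m/Cout)."""
    lilo, filo_minus_output = latency_terms(t_fi, t_lo, m, c_in, c_out)
    if lilo == filo_minus_output:
        return "either"
    return "LILO" if lilo < filo_minus_output else "FILO - m/Cout"


for c_in, c_out in [(1000, 1000), (2000, 1000), (1000, 2000)]:
    print(c_in, c_out, deciding_term(c_in, c_out))
```

       The outcome depends only on the relation between Cin and Cout,
       not on the particular value of t_lo, because t_lo appears in
       both terms.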

       For each case, all possible scenarios (timing diagrams) that
       illustrate the correctness of our definition could be presented.
       However, here we present only two characteristic scenarios.

       Scenario 1
       ----------


          A                               B

       t_fi +
            |  *
            |    *
            |        *
            |            *
            |                *
            |                     *
            |                          *
            |                               +  t_fo
            |                               |
            |                               |
            |                               |
            |                               |
       t_li +                               |
                                            |
              *                             |
                                            |
                  *                         |
                                            |
                      *                     |
                                            |
                          *                 |
                                            |
                               *            |
                                            |
                                    *       |
                                            |
                                            + t_lo

       Here we assume that Cin > Cout, so the definition gives frame
       latency = t_lo - t_fi - m/Cout.

       Without knowing precisely the relationship between Cin and Cout
       we can only state that t_lo - t_fo >= m/Cout.

       We now analyze two cases.

       Case a) If t_lo - t_fo = m/Cout then, t_fo = t_lo - m/Cout and
                      latency = t_fo - t_fi  = t_lo - m/Cout - t_fi
       which is in accordance with the definition.

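       Case b) If t_lo - t_fo > m/Cout then the frame is spread at the
       output by an extra (t_lo - t_fo) - m/Cout beyond its nominal
       output time, and this extra time adds to the delay of the first
       bit:
       latency = (t_fo - t_fi) + ((t_lo - t_fo) - m/Cout) = t_lo - t_fi -
       m/Cout
       which is in accordance with the definition.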
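
       Both cases of Scenario 1 can be checked with invented numbers.
       In the sketch below, scenario1_latency() follows the reasoning
       of cases a and b (delay of the first bit plus any extra
       spreading at the output), while defined_latency() applies the
       proposed definition.  All values and function names are
       assumptions made for this example.

```python
from fractions import Fraction


def scenario1_latency(t_fi, t_fo, t_lo, m, c_out):
    """First-bit delay plus extra output spreading (zero in case a)."""
    extra_spread = (t_lo - t_fo) - Fraction(m, c_out)
    return (t_fo - t_fi) + extra_spread


def defined_latency(t_fi, t_li, t_lo, m, c_out):
    """Proposed definition: min(LILO, FILO - m/Cout)."""
    return min(t_lo - t_li, t_lo - t_fi - Fraction(m, c_out))


# Invented values with Cin > Cout and t_fo < t_li, as in the diagram.
m, c_in, c_out = 100, 2000, 1000
t_fi = Fraction(0)
t_li = t_fi + Fraction(m, c_in)        # Assumption 2: 1/20 sec
t_fo = Fraction(3, 100)

# Case a: contiguous output.  Case b: 2/100 sec of extra spreading.
t_lo_a = t_fo + Fraction(m, c_out)
t_lo_b = t_lo_a + Fraction(2, 100)

for label, t_lo in (("a", t_lo_a), ("b", t_lo_b)):
    direct = scenario1_latency(t_fi, t_fo, t_lo, m, c_out)
    by_definition = defined_latency(t_fi, t_li, t_lo, m, c_out)
    print(label, direct, by_definition)
```

       In both cases the two computations agree: 3/100 sec in case a
       and 1/20 sec in case b.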

       Scenario 2
       ----------


            A                               B

       t_fi +
            |    *
            |        *
            |            *
            |                *
            |                     *
            |                          *
            |                               + t_fo
            |                               |
            |                               |
            |                               |
            |                               |
       t_li +*******************************+ t_lo


       If we assume Cin < Cout, the definition gives frame latency =
       t_lo - t_li.

       In this scenario, t_li = t_lo implies that we have a zero-delay
       network (because the last bit is delivered instantaneously), so
       the frame latency is zero.  The definition of latency produces
       the same result.

       Note that in this case, without knowing the relationship between
       Cin and Cout, we can state only that t_lo - t_fo >= m/Cout.  If
       t_lo - t_fo > m/Cout then some number of cells not belonging to
       the frame under consideration (including empty ones) have been
       delivered by the network.  If t_lo - t_fo = m/Cout, then the
       cells of the given frame are not interleaved.  But regardless of
       whether the cells of the given frame are interleaved or not, the
       frame latency is zero.
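
       Scenario 2 can be checked in the same way.  The sketch below
       uses invented values with Cin < Cout and t_lo = t_li, and
       evaluates the definition for one contiguous and one interleaved
       output.  The helper output_spread_kind() is a hypothetical name
       introduced only to label the two cases of Assumption 3.

```python
from fractions import Fraction


def frame_latency(t_fi, t_li, t_lo, m, c_out):
    """min(LILO, FILO - m/Cout); note that t_fo does not appear."""
    return min(t_lo - t_li, t_lo - t_fi - Fraction(m, c_out))


def output_spread_kind(t_fo, t_lo, m, c_out):
    """Label the output per Assumption 3 (a: interleaved, b: contiguous)."""
    return "contiguous" if t_lo - t_fo == Fraction(m, c_out) else "interleaved"


# Invented values with Cin < Cout and the last bit delivered instantly.
m, c_in, c_out = 100, 1000, 2000
t_fi = Fraction(0)
t_li = t_fi + Fraction(m, c_in)      # Assumption 2: 1/10 sec
t_lo = t_li

outcomes = []
for t_fo in (Fraction(1, 20), Fraction(1, 50)):
    kind = output_spread_kind(t_fo, t_lo, m, c_out)
    outcomes.append((kind, frame_latency(t_fi, t_li, t_lo, m, c_out)))
    print(*outcomes[-1])
```

       Since t_fo does not enter the definition, the computed latency
       is zero whether or not the output cells are interleaved.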

       REFERENCES:
       ----------

       [95-1347] Raj Jain, "Performance Benchmarking BOF,"
       AF-ALL/95-1347, October 1995.

       [95-1662] Raj Jain, Bhavana Nagendra, "Performance Benchmarking
       of ATM Switches," AF-TEST/95-1662, December 1995.

       [96-0180] Raj Jain, Bhavana Nagendra, Gojko Babic, "Scope For ATM
       Forum's Performance Benchmarking Work Item," AF-TEST/96-0180,
       February 1996.

       [96-0519] Raj Jain, Bhavana Nagendra, Gojko Babic, "General
       Considerations for Frame-Level Performance Measurement of ATM
       Switches," AF-TEST/96-0519, April 1996.

       Note: All our past ATM Forum contributions and presentations are
       available on-line at http://www.cse.wustl.edu/~jain/