*******************************************************************
ATM Forum Document Number: ATM_Forum/96-8011
*******************************************************************
Title: Frame-level throughput and latency metrics - proposed text.
*******************************************************************
Abstract: In the April meeting we presented definitions for throughput and latency, which were generally agreed upon. This contribution proposes exact text for inclusion in the performance benchmarking document.
*******************************************************************
Source:
Raj Jain, Gojko Babic, Bhavana Nagendra
The Ohio State University
Department of CIS
Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

The presentation of this contribution at the ATM Forum is sponsored by NASA.
*******************************************************************
Date: June 1996
*******************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TEST)
*******************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend or modify the statements contained herein.
*******************************************************************

At the AF-TEST April 96 meeting in Alaska, we presented definitions for two throughput metrics and one latency metric [1]. There was general agreement about the metrics. It was also suggested that we add one more throughput metric.
Based on this discussion, it is proposed that the following be considered for addition to the performance benchmarking document:

Performance Metrics:

In the following description, System Under Test (SUT) refers to an ATM switch. However, the definitions and measurement procedures are general and may be used for other devices or for a network consisting of multiple switches as well.

1. THROUGHPUT

1.1 Definitions:

There are three frame-level throughput metrics that are of interest to a user.

i. Lossless throughput - It is the maximum rate at which none of the offered frames are dropped by the SUT.

ii. Peak throughput - It is the maximum rate at which the SUT operates, regardless of frames dropped. The maximum rate can actually occur when the loss is not zero.

iii. Full-load throughput - It is the rate at which the SUT operates when the input links are loaded at 100% of their capacity.

A model graph of throughput vs input rate is shown in Figure 1. (Figures are included only in the postscript version of this contribution.) Level X defines the lossless throughput, level Y defines the peak throughput, and level Z defines the full-load throughput.

The lossless throughput is the highest load at which the count of the output frames equals the count of the input frames. Peak throughput is the maximum throughput that can be achieved in spite of the losses. Full-load throughput is the throughput of the system at 100% load on the input links. Note that the peak throughput may equal the lossless throughput in some cases.

Only frames that are received completely without errors are included in the frame-level throughput computation. Partial frames and frames with CRC errors are not included.

1.2 Units:

Throughput should be expressed in bits/sec. This is preferred over specifying it in frames/sec or cells/sec. Frames/sec requires specifying the frame size. The throughput values in frames/sec at various frame sizes cannot be compared without first being converted into bits/sec.
Cells/sec is not a good unit for frame-level performance since the cells aren't seen by the user.

1.3 Statistical Variation:

The tests should be run NRT times for TRT seconds each. Here NRT and TRT are parameters. These and other such parameters and their default values are listed later in Table 2. If Ti is the throughput in the ith run and n = NRT is the number of runs, the mean and standard error of the measurement should be computed as follows:

   Mean throughput = (Sum_over_i Ti)/n
   Standard deviation of throughput = sqrt((Sum_over_i (Ti - Mean throughput)**2)/(n-1))
   Standard error = Standard deviation of throughput/sqrt(n)

Given the mean and standard error, users can compute a 100(1-alpha)-percent confidence interval as follows:

   100(1-alpha)-percent confidence interval = (mean - z*std error, mean + z*std error)

Here, z is the (1-alpha/2) quantile of the unit normal variate. For commonly used confidence levels, the quantile values are as follows:

   Confidence   Quantile
   90%          1.645
   99%          2.576
   99.9%        3.291

1.4 Traffic Pattern:

The input traffic will consist of frames of length FSA bytes each. Before starting the throughput measurements, all required VCs will be set up (for an n-port SUT) in one of the following four configurations:

1. n-to-n straight: All frames input from port i exit to port i+1 modulo n. This represents almost no path interference among the VCs. Total n VCs.

2. n-to-n cross: Input from each port is divided equally to exit on each of the n output ports. Total n**2 VCs.

3. n-to-1: Input from all ports is destined to one output port. Total n VCs.

4. 1-to-n: Input from a port is multicast to all output ports. Total 1 VC.

The frames will be delivered to the layer under test equally spaced at a given input rate. The rate at which the cells reach the SUT may vary depending upon the service used. For example, for ABR traffic, the allowed cell rate may be less than the link rate in some configurations.
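
The four VC configurations listed above can be enumerated mechanically. The following is a minimal sketch, not part of the proposed text: the function name, the pattern strings, the 0-based port numbering, and the `hub` parameter (which port plays the single output in n-to-1 or the single input in 1-to-n) are all assumptions made for illustration.

```python
from typing import List, Tuple

# One VC is (input_port, output_ports); a multicast VC has several outputs.
VC = Tuple[int, Tuple[int, ...]]


def vc_configuration(pattern: str, n: int, hub: int = 0) -> List[VC]:
    """Return the VCs for an n-port SUT under one of the four patterns.

    Ports are numbered 0..n-1 (an assumption). `hub` is a hypothetical
    parameter naming the single output port for "n-to-1" and the single
    input port for "1-to-n".
    """
    if n < 2:
        raise ValueError("need at least two ports")
    ports = range(n)
    if pattern == "n-to-n straight":
        # Port i sends everything to port i+1 modulo n: n VCs.
        return [(i, ((i + 1) % n,)) for i in ports]
    if pattern == "n-to-n cross":
        # Every input port has a VC to every output port: n**2 VCs.
        return [(i, (j,)) for i in ports for j in ports]
    if pattern == "n-to-1":
        # Every input port sends to the same output port: n VCs.
        return [(i, (hub,)) for i in ports]
    if pattern == "1-to-n":
        # One multicast VC from a single input to all output ports.
        return [(hub, tuple(ports))]
    raise ValueError(f"unknown pattern: {pattern}")


if __name__ == "__main__":
    for name in ("n-to-n straight", "n-to-n cross", "n-to-1", "1-to-n"):
        vcs = vc_configuration(name, 4)
        print(f"{name}: {len(vcs)} VC(s)")
```

The VC counts returned for each pattern match the totals stated in the list (n, n**2, n, and 1), which gives a quick consistency check when setting up a test bed.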
At each value of the input rate to the layer under test, the total numbers of frames sent to the SUT and received from the SUT are recorded. The input rate is computed based on the time from when the first bit of the first frame enters the SUT to when the last bit of the last frame enters the SUT. The throughput (output rate) is computed based on the time from when the first bit of the first frame exits the SUT to when the last bit of the last frame exits the SUT.

If the input frame count and the output frame count are the same, the input rate is increased and the test is conducted again.

The lossless throughput is the highest throughput at which the count of the output frames equals the count of the input frames. If the input rate is increased even further, although some frames will be lost, the throughput may increase until it reaches the peak throughput value, after which a further increase in input rate will result in a decrease in the throughput. The input rate is increased further until 100% load is reached and the full-load throughput is recorded.

1.5 Background Traffic:

The tests can be conducted under two conditions: with background traffic and without background traffic. Higher priority traffic like VBR can act as background traffic for the experiment. Further details of measurements with background traffic (multiple service classes simultaneously) are to be specified. Until then, all benchmarking will be done without any background traffic.

2. FRAME LATENCY

2.1 Definition:

The frame latency for a system under test is measured using a "Message-in Message-out (MIMO)" definition. Succinctly, MIMO latency is defined as follows:

   MIMO Latency = Min{First-bit in to last-bit out latency - nominal frame output time,
                      last-bit in to last-bit out latency}

An explanation of MIMO latency and its justification is presented in Appendix A.

To measure MIMO latency, a sequence of equally spaced frames is sent at a particular rate.
After the flow has been established, one of the frames in the flow is marked and the times of the following four events are recorded for the marked frame while the flow continues unperturbed:

1. First bit of the frame enters the SUT
2. Last bit of the frame enters the SUT
3. First bit of the frame exits from the SUT
4. Last bit of the frame exits from the SUT

The time between the first-bit entry and the last-bit exit (events 1 and 4 above) is called first-bit in to last-bit out (FILO) latency. The time between the last-bit entry and the last-bit exit (events 2 and 4 above) is called last-bit in to last-bit out (LILO) latency. Given the frame size and the nominal output link rate, the nominal frame output time is computed as follows:

   Nominal frame output time = Frame size/Nominal output link rate

Substituting the FILO latency, LILO latency, and nominal frame output time in the MIMO latency formula gives the frame-level latency of the SUT.

2.2 Units:

The latency should be specified in microseconds.

2.3 Statistical Variations:

NML samples of the latency are obtained by sending NML marked frames at TTL/(NML+1) intervals for a total test duration of TTL seconds. Here, NML and TTL are parameters. Their default values are specified in Table 2. The mean and standard error computed from these samples (in a manner similar to that explained in Section 1.3 for throughput) are reported as the test results.

2.4 Traffic Pattern:

The input traffic will consist of frames of length FSA bytes. Here, FSA is a parameter. Its default value is specified in Table 2.

Before starting the latency measurements, all required VCs will be set up (for an n-port SUT) in one of the following configurations:

1. n-to-n straight: All frames input from port i exit to port i+1 modulo n. This represents almost no path interference among the VCs.

2. n-to-n cross: Input from each port is divided equally to exit on each of the n output ports.

3. n-to-1: Input from all ports is destined to one output port.

4. 1-to-n: Input from a port is multicast to all output ports. Total 1 VC.

The frames will be delivered to the layer under test equally spaced at a given input rate. For latency measurement, the input rate will be set at the input rate corresponding to the lossless throughput. This avoids the problem of lost marked frames and missing samples.

2.5 Background Traffic:

The tests can be conducted under two conditions: with background traffic and without background traffic. Higher priority traffic like VBR can act as background traffic for the experiment. Further details of measurements with background traffic (multiple service classes simultaneously) are to be specified. Initially, all tests will be conducted without background traffic.

3. REPORTING RESULTS

The throughput and latency results will be reported in a tabular format as follows:

Table 1: Tabular format for reporting performance benchmarking results

+--------+-----------------------------------------------+--------------+
|        |                  Throughput                   |   Latency    |
|Traffic +---------------+---------------+---------------+              |
|pattern |   Lossless    |     Peak      |   Full-load   |              |
|        +------+--------+------+--------+------+--------+------+-------+
|        | Mean | Std Err| Mean | Std Err| Mean | Std Err| Mean |Std Err|
+--------+------+--------+------+--------+------+--------+------+-------+
|n-to-n  |      |        |      |        |      |        |      |       |
|straight|      |        |      |        |      |        |      |       |
+--------+------+--------+------+--------+------+--------+------+-------+
|n-to-n  |      |        |      |        |      |        |      |       |
|cross   |      |        |      |        |      |        |      |       |
+--------+------+--------+------+--------+------+--------+------+-------+
|n-to-1  |      |        |      |        |      |        |      |       |
|        |      |        |      |        |      |        |      |       |
+--------+------+--------+------+--------+------+--------+------+-------+
|1-to-n  |      |        |      |        |      |        |      |       |
|        |      |        |      |        |      |        |      |       |
+--------+------+--------+------+--------+------+--------+------+-------+

4. DEFAULT PARAMETER VALUES

The default values of the parameters used in performance benchmarking are listed in Table 2.
Table 2: List of parameters and their default values

+-----+-----------------------------------------------------+--------+
|Para-|                                                     | Default|
|meter| Meaning                                             | Value  |
+-----+-----------------------------------------------------+--------+
|NRT  | Number of repetitions of throughput experiments     | 30     |
|TRT  | Time of each repetition of throughput experiments   | 60 sec |
|FSA  | Frame size for AAL performance experiments          | 9188 B |
|NML  | Number of marked frames sent in latency experiments | 30     |
|TTL  | Total time of latency experiment                    | 31 sec |
+-----+-----------------------------------------------------+--------+

APPENDIX A: MIMO LATENCY

The message-in message-out (MIMO) latency is a general definition of latency that applies to a switch or a group of switches regardless of whether the frames are contiguous and whether the input link rate is equal to the output link rate.

For a single bit, the latency is generally defined as the time from bit in to bit out. For a multi-bit frame, there are several possible definitions. First, consider the case of contiguous frames, in which all bits of a frame are delivered contiguously without any gap between them. In this case, latency can be defined in one of the following four ways:

1. First bit in to first bit out (FIFO)
2. Last bit in to last bit out (LILO)
3. First bit in to last bit out (FILO)
4. Last bit in to first bit out (LIFO)

If the input link and the output link are of the same speed and the frames are contiguous, the FIFO and LILO latencies are identical. FILO and LIFO latencies can be computed from FIFO (or LILO) given the frame time:

   FILO = FIFO + Nominal frame output time
   LIFO = FIFO - Nominal frame output time

It is clear that FIFO (or LILO) is the preferred metric in this case since it may be independent of the frame time, while FILO and LIFO would be different for each frame size.

Unfortunately, none of the above four metrics apply to an ATM network (or a switch) since the frames are not always delivered contiguously. There may be idle time between cells of a frame.
Also, the input and output links may be of different speeds.

In the following we consider twelve different cases. For each case, we compare four possible metrics (FIFO, LILO, FILO minus nominal frame output time, and MIMO) and show that MIMO is the correct metric in all cases, while the other metrics apply to some cases but give wrong answers in others. The twelve cases and the applicability of the four metrics are shown in Table A.1.

Table A.1: Applicability of various latency definitions

+---+----------------------------------+------+------+-------+------+
|No.| Case                             | FIFO | LILO | FILO- | MIMO |
|   |                                  |      |      | NFOT  |      |
+---+----------------------------------+------+------+-------+------+
| 1a| Input rate=output rate, conti-   |  +   |  +   |   +   |  +   |
|   | guous frame, zero delay switch   |      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 1b| Input rate=output rate, conti-   |  +   |  +   |   +   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 1c| Input rate=output rate, nonconti-| case is not possible        |
|   | guous frame, zero-delay switch   |                             |
+---+----------------------------------+------+------+-------+------+
| 1d| Input rate=output rate, nonconti-|  -   |  +   |   +   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 2a| Input rate>output rate, conti-   |  +   |  -   |   +   |  +   |
|   | guous frame, zero delay switch   |      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 2b| Input rate>output rate, conti-   |  +   |  -   |   +   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 2c| Input rate>output rate, nonconti-| case is not possible        |
|   | guous frame, zero-delay switch   |                             |
+---+----------------------------------+------+------+-------+------+
| 2d| Input rate>output rate, nonconti-|  -   |  -   |   +   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 3a| Input rate<output rate, conti-   |  -   |  +   |   -   |  +   |
|   | guous frame, zero delay switch   |      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 3b| Input rate<output rate, conti-   |  -   |  +   |   -   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 3c| Input rate<output rate, nonconti-|  -   |  +   |   -   |  +   |
|   | guous frame, zero-delay switch   |      |      |       |      |
+---+----------------------------------+------+------+-------+------+
| 3d| Input rate<output rate, nonconti-|  -   |  +   |   -   |  +   |
|   | guous frame, nonzero delay switch|      |      |       |      |
+---+----------------------------------+------+------+-------+------+

+ => The metric gives a valid result
- => The metric gives an invalid result

CASE 1a: Input Rate = Output Rate, Contiguous Frame, Zero-delay Switch

One way to verify the validity of a latency definition is to apply it to a single-input single-output zero-delay switch (basically a very short wire). In this case, the bits appear on the output as soon as they enter on the input. All four metrics give a delay of zero and are therefore valid.

Notice that FILO and LIFO will give a non-zero delay equal to the frame time. Since we are interested only in the switch delay and know that the switch delay in this case is zero, FILO and LIFO are not good switch delay metrics and will not be considered any further.

The nominal frame output time (NFOT) is computed as the frame size divided by the output link rate. It indicates how long it will take to output the frame at the link speed. FILO - NFOT indicates the switch's contribution to the latency and is therefore a candidate for further discussion.

CASE 1b: Input Rate = Output Rate, Contiguous Frame, Nonzero-delay Switch

In this case, the total delay FILO can be divided into two parts, switch latency and frame time:

   FILO = Switch latency + Nominal frame output time
   Switch latency = FILO - NFOT
   LILO = FIFO = FILO - NFOT
   MIMO = Min{FILO-NFOT, LILO} = LILO = FILO - NFOT = FIFO

All four metrics again give identical and meaningful results.

CASE 1c: Input Rate = Output Rate, Non-contiguous Frame, Zero-delay Switch

On a zero-delay switch, the bits will appear on the output as soon as they enter the input. Since the input frame is contiguous, the output frame will also be contiguous, and therefore this case is not possible.

CASE 1d: Input Rate = Output Rate, Non-contiguous Frame, Nonzero-delay Switch

This case is shown in Figure A.2. There are several gaps between the cells of the frame at the output.
By changing these gaps, the FIFO latency can be changed arbitrarily. FILO, LILO, and MIMO are related as follows:

   FILO - NFOT = LILO = Min{FILO-NFOT, LILO} = MIMO

Any one of these three metrics can be used as the switch latency.

CASE 2a: Input Rate > Output Rate, Contiguous Frame, Zero-delay Switch

In this case, the switch consists of a single-input single-output memory buffer. The frame flow is shown in Figure A.2. For this case, FIFO, LILO, FILO, and MIMO are related as follows:

   LILO > FIFO = FILO - NFOT = Min{FILO-NFOT, LILO} = MIMO = 0

In this case, FIFO, FILO-NFOT, and MIMO give the correct (zero) latency. LILO will produce a non-zero result and is incorrect.

CASE 2b: Input Rate > Output Rate, Contiguous Frame, Nonzero-delay Switch

The frame flow is shown in Figure A.2b. Note that the same relationship among the various metrics holds as in case 2a:

   LILO > FIFO = FILO - NFOT = Min{FILO-NFOT, LILO} = MIMO

Thus, LILO gives an incorrect answer, while the other three metrics give the correct answer.

CASE 2c: Input Rate > Output Rate, Non-contiguous Frame, Zero-delay Switch

This case is not possible.

CASE 2d: Input Rate > Output Rate, Non-contiguous Frame, Nonzero-delay Switch

In this case (see Figure A.2d):

   FIFO < FILO - NFOT
   LILO > FILO - NFOT = Min{FILO-NFOT, LILO} = MIMO

Here, FIFO can be made arbitrarily small by delivering the first cell fast but later introducing large gaps. Similarly, LILO can be made arbitrarily large by increasing the input rate (and not changing the switch otherwise). Thus, FILO-NFOT and MIMO are the only two metrics that can be considered valid in this case.

CASE 3a: Input Rate < Output Rate, Contiguous Frame, Zero-delay Switch

This case is shown in Figure A.3a.

   FILO - NFOT = FIFO > 0

Since both FIFO and FILO-NFOT latencies are non-zero, they are both incorrect for this case.

   LILO = Min{FILO-NFOT, LILO} = MIMO = 0

Both LILO and MIMO give the correct result of zero.
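
The three zero-delay contiguous cases discussed so far (1a, 2a, and 3a) can be checked numerically. The following is a minimal sketch under a stated timing model that is an assumption of this example rather than part of the contribution: the frame starts entering at t = 0, and the switch emits it contiguously so that the last bit leaves `switch_delay` seconds after the earliest instant at which a contiguous output could finish. The function name, the link rates, and the use of exact fractions are likewise illustrative choices.

```python
from fractions import Fraction


def latency_metrics(frame_bits, rate_in, rate_out, switch_delay=Fraction(0)):
    """Return FIFO, LILO, FILO-NFOT and MIMO (seconds) for a contiguous frame.

    Assumed model: first bit enters at t = 0; output is contiguous at
    rate_out and cannot finish before the last bit has arrived.
    """
    t_in = Fraction(frame_bits) / rate_in    # time to receive the frame
    nfot = Fraction(frame_bits) / rate_out   # nominal frame output time
    first_in, last_in = Fraction(0), t_in
    # Earliest possible end of a contiguous output, plus the switch delay.
    last_out = max(t_in, nfot) + switch_delay
    first_out = last_out - nfot
    fifo = first_out - first_in
    lilo = last_out - last_in
    filo = last_out - first_in
    return {
        "FIFO": fifo,
        "LILO": lilo,
        "FILO-NFOT": filo - nfot,
        "MIMO": min(filo - nfot, lilo),  # MIMO = Min{FILO-NFOT, LILO}
    }


FRAME_BITS = 9188 * 8  # default FSA frame size from Table 2, in bits

# Hypothetical link rates in bits/sec, chosen only to make the rates differ.
CASES = {
    "1a": (150_000_000, 150_000_000),
    "2a": (150_000_000, 100_000_000),
    "3a": (100_000_000, 150_000_000),
}

if __name__ == "__main__":
    for case, (rate_in, rate_out) in CASES.items():
        metrics = latency_metrics(FRAME_BITS, rate_in, rate_out)
        # Report each metric in microseconds.
        print(case, {k: float(v * 1_000_000) for k, v in metrics.items()})
```

Under this model only MIMO is zero in all three cases; LILO is non-zero in case 2a, and FIFO and FILO-NFOT are non-zero in case 3a. Passing a non-zero `switch_delay` models the corresponding nonzero-delay cases, where MIMO equals the delay that was put in.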
CASE 3b: Input Rate < Output Rate, Contiguous Frame, Nonzero-delay Switch

This case is shown in Figure A.3b. Both FIFO and FILO-NFOT include the difference between the input and output frame times in addition to the switch delay, and so they are incorrect.

   LILO = Min{FILO-NFOT, LILO} = MIMO

Both LILO and MIMO give the correct (non-zero) switch delay.

CASE 3c: Input Rate < Output Rate, Non-contiguous Frame, Zero-delay Switch

This case is shown in Figure A.3c. FIFO can be made arbitrarily large by increasing the output link rate (and not changing the switch otherwise), so FIFO is not a good indicator of switch latency. FILO-NFOT is equal to the FIFO latency and is also incorrect.

LILO is the only metric that can be argued to be the correct measure of latency. LILO is less than FILO-NFOT. Therefore,

   LILO = Min{FILO-NFOT, LILO} = MIMO

MIMO is also equal to LILO and is therefore a correct measure.

CASE 3d: Input Rate < Output Rate, Non-contiguous Frame, Nonzero-delay Switch

FIFO can be made small by sending the first cell fast and then introducing large time gaps in the output. FIFO is, therefore, not a valid switch latency metric in this case. FILO - NFOT, which is greater than FIFO, is similarly incorrect.

LILO is the only metric that can be argued to be correct in this case. Since LILO < FILO-NFOT,

   MIMO = Min{FILO-NFOT, LILO} = LILO

MIMO is also a correct measure.

Once again looking at Table A.1, we find that MIMO is the only metric that applies to all input and output link rates and to both contiguous and non-contiguous frames.

MOTION:

Adopt the text of this contribution for inclusion in the performance benchmarking draft baseline document.

REFERENCES:

[1] R. Jain, G. Babic, and B. Nagendra, "Considerations for Frame-level Throughput and Latency Measurements of ATM switches," ATM_Forum 96-0520, April 1996.

All our papers and ATM Forum contributions are available on-line: http://www.cse.wustl.edu/~jain/