********************************************************************************
ATM Forum Document Number: BTD-TEST-TM-PERF.00.04 (96-0810R7)
********************************************************************************
Title: ATM Forum Performance Testing Specification - Baseline Text
********************************************************************************
Abstract: This baseline document includes all text related to performance testing that has been agreed so far by the ATM Forum Testing Working Group.
********************************************************************************
Source: Raj Jain, Gojko Babic, Arjan Durresi, Justin Dolske.
The Ohio State University, Department of CIS, Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org
The presentation of this contribution at the ATM Forum is sponsored by NASA Lewis Research Center.
********************************************************************************
Date: December 1997
********************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TEST, AF-TM)
********************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend, or modify the statements contained herein.
***********************************************************************
Two postscript versions of this contribution, including all figures and tables, have been uploaded to the ATM Forum ftp server in the incoming directory. One postscript version shows changes from the last version and the other does not. These may be moved from there to the atm documents directory. The postscript versions are also available on our web page via: http://www.cse.wustl.edu/~jain/atmf/bperf04.htm

ATM Forum Performance Testing Specification, Version 1.0, December 1997

(C) 1997 The ATM Forum. All Rights Reserved. No part of this publication may be reproduced in any form or by any means. The information in this publication is believed to be accurate at its publication date. Such information is subject to change without notice, and the ATM Forum is not responsible for any errors. The ATM Forum does not assume any responsibility to update or correct any information in this publication. Notwithstanding anything to the contrary, neither the ATM Forum nor the publisher makes any representation or warranty, expressed or implied, concerning the completeness, accuracy, or applicability of any information contained in this publication. No liability of any kind shall be assumed by the ATM Forum or the publisher as a result of reliance upon any information contained in this publication.
The receipt or any use of this document or its contents does not in any way create, by implication or otherwise:

* Any express or implied license or right to or under any ATM Forum member company's patent, copyright, trademark or trade secret rights which are or may be associated with the ideas, techniques, concepts or expressions contained herein; nor
* Any warranty or representation that any ATM Forum member companies will announce any product(s) and/or service(s) related thereto, or if such announcements are made, that such announced product(s) and/or service(s) embody any or all of the ideas, technologies, or concepts contained herein; nor
* Any form of relationship between any ATM Forum member companies and the recipient or user of this document.

Implementation or use of specific ATM recommendations and/or specifications or recommendations of the ATM Forum or any committee of the ATM Forum will be voluntary, and no company shall agree or be obliged to implement them by virtue of participation in the ATM Forum. The ATM Forum is a non-profit international organization accelerating industry cooperation on ATM technology. The ATM Forum does not, expressly or otherwise, endorse or promote any specific products or services.

Table of Contents

1. INTRODUCTION
   1.1. SCOPE
   1.2. GOALS OF PERFORMANCE TESTING
   1.3. NON-GOALS OF PERFORMANCE TESTING
   1.4. TERMINOLOGY
   1.5. ABBREVIATIONS
2. CLASSES OF APPLICATION
   2.1. PERFORMANCE TESTING ABOVE THE ATM LAYER
   2.2. PERFORMANCE TESTING AT THE ATM LAYER
3. PERFORMANCE METRICS
   3.1. THROUGHPUT
      3.1.1. Definitions
      3.1.2. Units
      3.1.3. Statistical Variations
      3.1.4. Measurement Procedures
      3.1.5. Foreground Traffic
      3.1.6. Background Traffic
      3.1.7. Guidelines For Scaleable Test Configurations
      3.1.8. Reporting Results
   3.2. FRAME LATENCY
      3.2.1. Definition
      3.2.2. Units
      3.2.3. Statistical Variations
      3.2.4. Measurement Procedures
      3.2.5. Foreground Traffic
      3.2.6. Background Traffic
      3.2.7. Guidelines For Scaleable Test Configurations
      3.2.8. Reporting Results
   3.3. THROUGHPUT FAIRNESS
      3.3.1. Definition
      3.3.2. Units
      3.3.3. Measurement Procedures
      3.3.4. Statistical Variations
      3.3.5. Reporting Results
   3.4. FRAME LOSS RATIO
      3.4.1. Definition
      3.4.2. Units
      3.4.3. Measurement Procedures
      3.4.4. Statistical Variations
      3.4.5. Reporting Results
   3.5. MAXIMUM FRAME BURST SIZE (MFBS)
      3.5.1. Definition
      3.5.2. Units
      3.5.3. Statistical Variations
      3.5.4. Measurement Procedure and MFBS Calculation
      3.5.5. Reporting Results
   3.6. CALL ESTABLISHMENT LATENCY
      3.6.1. Definition
      3.6.2. Units
      3.6.3. Configurations
      3.6.4. Statistical Variations
      3.6.5. Guidelines For Using This Metric
   3.7. APPLICATION GOODPUT
      3.7.1. Guidelines For Using This Metric
4. REFERENCES
APPENDIX A: DEFINING FRAME LATENCY ON ATM NETWORKS
   A.1. INTRODUCTION
   A.2. USUAL FRAME LATENCIES AS METRICS FOR ATM SWITCH DELAY
   A.3. MIMO LATENCY DEFINITION
   A.4. CELL AND CONTIGUOUS FRAME LATENCY THROUGH A ZERO-DELAY SWITCH
   A.5. LATENCY OF DISCONTINUOUS FRAMES PASSING THROUGH A ZERO-DELAY SWITCH
   A.6. CALCULATION OF FILO LATENCY FOR A ZERO-DELAY SWITCH
   A.7. EQUIVALENT MIMO LATENCY DEFINITION
   A.8. MEASURING MIMO LATENCY
   A.9. USER PERCEIVED DELAY
APPENDIX B: METHODOLOGY FOR IMPLEMENTING CONNECTION CONFIGURATIONS
   B.1. INTRODUCTION
   B.2. DEFINITIONS AND RULES
   B.3. CONNECTION CONFIGURATION CHARACTERISTICS
      B.3.1. N-to-N Straight (Single Generator)
      B.3.2. N-to-N Straight (r Generators)
      B.3.3. N-to-m Partial Cross (r Generators)
   B.4. ALGORITHM FOR CREATING VCC CHAINS

1. Introduction

Performance testing in ATM deals with the measurement of the level of quality of a system under test (SUT) or an implementation under test (IUT) under well-known conditions. The level of quality can be expressed in the form of metrics such as latency, end-to-end delay, and effective throughput. Performance testing can be carried out at the end-user application level (e.g., FTP, NFS) or at or above the ATM layers (e.g., cell switching, signaling). Performance testing also describes in detail the procedures for testing the IUTs in the form of test suites. These procedures are intended to test the SUT or IUT and do not assume or imply any specific implementation or architecture of these systems. This document highlights the objectives of performance testing and suggests an approach for the development of the test suites.

1.1. Scope

Asynchronous Transfer Mode (ATM), as an enabling technology for the integration of services, is gaining increasing interest and popularity. ATM networks are being progressively deployed, and in most cases a smooth migration to ATM is prescribed. This means that most existing applications can still operate over ATM via service emulation or service interworking, along with the proper adaptation of data formats. At the same time, several new applications are being developed to take full advantage of the capabilities of ATM technology through an Application Programming Interface (API).

While ATM provides an elegant solution to the integration of services and allows for high levels of scalability, the performance of a given application may vary substantially with the IUT or SUT utilized. The variation in performance is due to the complexity of the dynamic interaction between the different layers. For example, an application running over a TCP/IP stack will yield different levels of performance depending on the interaction between the TCP window flow control mechanism and the ATM network congestion control mechanism used. Hence, the following points and recommendations are made.

First, ATM adopters need guidelines on the measurement of the performance of user applications over different systems. Second, some functions above the ATM layer, e.g., adaptation and signaling, constitute applications (i.e., IUTs) and as such should be considered for performance testing. Also, it is essential that these layers be implemented in compliance with the ATM Forum specifications. Third, performance testing can be executed at the ATM layer in relation to the QoS provided by the different service categories. Finally, because of the extensive list of available applications, it is preferable to group applications into generic classes. Each class of applications requires a different testing environment, including metrics, test suites, and traffic test patterns. Note that the same application, e.g., FTP, can yield different performance results depending on the underlying layers used (TCP/IP over ATM versus TCP/IP over a MAC layer over ATM). Thus, performance results should be compared only for the same protocol stack.

Performance testing is related to the user perceived performance of ATM technology. In other words, the goodness of ATM will be measured not only by cell-level performance but also by frame-level performance and the performance perceived at higher layers.
Most of the Quality of Service (QoS) metrics, such as cell transfer delay (CTD), cell delay variation (CDV), cell loss ratio (CLR), and so on, may or may not be reflected directly in the performance perceived by the user. For example, when comparing two switches, if one gives a CLR of 0.1% and a frame loss ratio of 0.1% while the other gives a CLR of 1% but a frame loss ratio of 0.05%, the second switch will be considered superior by many users. The ATM Forum and ITU-T have standardized the definitions of ATM layer QoS metrics [1,2]. This specification does the same for higher layer performance metrics. Without a standard definition, each vendor will use their own definition of common metrics such as throughput and latency, resulting in confusion in the marketplace. Avoiding such confusion will help buyers, eventually leading to better sales and to the success of ATM technology.

The initial work at the ATM Forum will be restricted to the native ATM layer and the adaptation layer. Any work on the performance of the higher layers is deferred for further study.

1.2. Goals of Performance Testing

The goal of this effort is to enhance the marketability of ATM technology and equipment. Any additional criterion that helps in achieving that goal can be added later to this list.

a. The ATM Forum shall define metrics that will help compare various ATM equipment in terms of performance.
b. The metrics shall be such that they are independent of switch or NIC architecture.
   (i) The same metrics shall apply to all architectures.
c. The metrics can be used to help predict the performance of an application or to design a network configuration to meet specific performance objectives.
d. The ATM Forum will develop a precise methodology for measuring these metrics.
   (i) The methodology will include a set of configurations and traffic patterns that will allow vendors as well as users to conduct their own measurements.
e. The testing shall cover all classes of service, including CBR, rt-VBR, nrt-VBR, ABR, and UBR.
f. The metrics and methodology for different service classes may be different.
g. The testing shall cover as many protocol stacks and ATM services as possible.
   (i) As an example, measurements for verifying the performance of services such as IP, Frame Relay, and SMDS over ATM may be included.
h. The testing shall include metrics to measure the performance of network management, connection setup, and normal data transfer.
i. The following objectives are set for ATM performance testing:
   (i) Definition of criteria to be used to distinguish classes of applications.
   (ii) Definition of classes of applications, at or above the ATM Layer, for which performance metrics are to be provided.
   (iii) Identification of the functions at or above the ATM Layer which influence the perceived performance of a given class of applications. Examples of such functions include traffic shaping, quality of service, adaptation, etc. These functions need to be measured in order to assess the performance of the applications within that class.
   (iv) Definition of common performance metrics for the assessment of the performance of all applications within a class. The metrics should reflect the effect of the functions identified in (iii).
   (v) Provision of detailed test cases for the measurement of the defined performance metrics.

1.3. Non-Goals of Performance Testing

a. The ATM Forum is not responsible for conducting any measurements.
b. The ATM Forum will not certify measurements.
c. The ATM Forum will not set thresholds such that equipment performing below those thresholds is called "unsatisfactory."
d. The ATM Forum will not establish any requirement that dictates a cost versus performance ratio.
e. The following areas are excluded from the scope of ATM performance testing:
   (i) Applications whose performance cannot be assessed by common, implementation-independent metrics. In this case the performance is tightly related to the implementation. An example of such applications is network management, whose performance behavior depends on whether it is a centralized or a distributed implementation.
   (ii) Performance metrics which depend on the type of implementation or architecture of the SUT or the IUT.
   (iii) Test configurations and methodologies which assume or imply a specific implementation or architecture of the SUT or the IUT.
   (iv) Evaluation or assessment of results obtained by companies or other bodies.
   (v) Certification of conducted measurements or of bodies conducting the measurements.

1.4. Terminology

The following definitions are used in this document:

* Implementation Under Test (IUT): The part of the system that is to be tested.
* Metric: A variable or a function that can be measured or evaluated and which reflects quantitatively the response or the behavior of an IUT or an SUT.
* System Under Test (SUT): The system in which the IUT resides.
* Test Case: A series of test steps needed to put an IUT into a given state to observe and describe its behavior.
* Test Suite: A complete set of test cases, possibly combined into nested test groups, that is necessary to perform testing for an IUT or a protocol within an IUT.

1.5. Abbreviations

ISO  International Organization for Standardization
IUT  Implementation Under Test
NP   Network Performance
NPC  Network Parameter Control
PDU  Protocol Data Unit
PVC  Permanent Virtual Circuit
QoS  Quality of Service
SUT  System Under Test
SVC  Switched Virtual Circuit
WG   Working Group

2. Classes of Application

Developing a test suite for each existing and new application can prove to be a difficult task. Instead, applications should be grouped into categories or classes. Applications in a given class have similar performance requirements and can be characterized by common performance metrics. This way, the defined performance metrics and test suites will be valid for a range of applications. Classes of application can be defined based on one or a combination of criteria. The following criteria can be used in the definition of the classes:

(i) Time or delay requirements: real-time versus non-real-time applications.
(ii) Distance requirements: LAN versus WAN applications.
(iii) Media type: voice, video, data, or multimedia applications.
(iv) Quality level: for example, desktop video versus broadcast quality video.
(v) ATM service category used: some applications have stringent performance requirements and can only run over a given service category. Others can run over several service categories. An ATM service category relates application aspects to network functionalities.
(vi) Others to be determined.

2.1. Performance Testing Above the ATM Layer

Performance metrics can be measured at the user application layer, and sometimes at the transport layer and the network layer, and can give an accurate assessment of the perceived performance. Since it is difficult to cover all the existing applications and all the possible combinations of applications and underlying protocol stacks, it is desirable to classify the applications into classes.
Performance metrics and performance test suites can be provided for each class of applications. The perceived performance of a user application running over an ATM network depends on many parameters. It can vary substantially by changing an underlying protocol stack, the ATM service category used, the congestion control mechanism used in the ATM network, etc. Furthermore, there is no direct and unique relationship between the ATM Layer Quality of Service (QoS) parameters and the perceived application performance. For example, in an ATM network implementing a packet-level discard congestion mechanism, applications using TCP as the transport protocol may see their effective throughput improve even while the measured cell loss ratio is relatively high. In practice, it is difficult to carry out measurements in all the layers that span the region between the ATM Layer and the user application layer, given the inaccessibility of testing points. More effort needs to be invested to define the performance at these layers. These layers include adaptation, signaling, etc.

2.2. Performance Testing at the ATM Layer

The notion of application at the ATM Layer is related to the service categories provided by the ATM service architecture. The Traffic Management Specification, version 4.0 [2], specifies five service categories: CBR, rt-VBR, nrt-VBR, UBR, and ABR. Each service category defines a relation of the traffic characteristics and the Quality of Service (QoS) requirements to network behavior. Each QoS performance parameter has an associated assessment criterion, as summarized below.

   QoS Performance Parameter            QoS Assessment Criterion
   ---------------------------------    ------------------------
   Cell Error Ratio                     Accuracy
   Severely-Errored Cell Block Ratio    Accuracy
   Cell Misinsertion Ratio              Accuracy
   Cell Loss Ratio                      Dependability
   Cell Transfer Delay                  Speed
   Cell Delay Variation                 Speed

Measurement methods for the QoS parameters are defined in Appendix A of [1] and Appendix B of [2]. However, detailed test cases and procedures, as well as test configurations, are needed for both in-service and out-of-service measurement of QoS parameters. An example of a test configuration for the out-of-service measurement of QoS parameters is given in Appendix A of [3]. Performance testing at the ATM Layer covers the following categories:

(i) In-service and out-of-service measurement of the QoS performance parameters for all five service categories (or application classes in the context of performance testing): CBR, rt-VBR, nrt-VBR, UBR, and ABR. The test configurations assume a non-overloaded SUT.
(ii) Performance of the SUT under overload conditions. In this case, the efficiency of the congestion avoidance and congestion control mechanisms of the SUT is tested.

In order to provide common performance metrics that are applicable to a wide range of SUTs and that can be uniquely interpreted, the following requirements must be satisfied:

(i) Reference load models for the five service categories CBR, rt-VBR, nrt-VBR, UBR, and ABR are required. Reference load models are to be defined by the Traffic Management Working Group.
(ii) Test cases and configurations must not assume or imply any specific implementation or architecture of the SUT.

3. Performance Metrics

In the following description, System Under Test (SUT) refers to an ATM switch. However, the definitions and measurement procedures are general and may be used for other devices or for a network consisting of multiple switches.

3.1. Throughput

3.1.1. Definitions
There are three frame-level throughput metrics that are of interest to a user:

* Loss-less throughput - the maximum rate at which none of the offered frames is dropped by the SUT.
* Peak throughput - the maximum rate at which the SUT operates, regardless of frames dropped. The maximum rate can actually occur when the loss is not zero.
* Full-load throughput - the rate at which the SUT operates when the input links are loaded at 100% of their capacity.

A model graph of throughput vs. input rate is shown in Figure 3.1. Level X defines the loss-less throughput, level Y defines the peak throughput, and level Z defines the full-load throughput.

[Figure 3.1: Peak, loss-less and full-load throughput]

The loss-less throughput is the highest load at which the count of the output frames equals the count of the input frames. The peak throughput is the maximum throughput that can be achieved in spite of losses. The full-load throughput is the throughput of the system at 100% load on the input links. Note that the peak throughput may equal the loss-less throughput in some cases. Only frames that are received completely and without errors are included in the frame-level throughput computation. Partial frames and frames with CRC errors are not included.

3.1.2. Units

Throughput should be expressed in effective bits/sec, counting only bits from frames and excluding the overhead introduced by the ATM technology and transmission systems. This is preferred over specifying it in frames/sec or cells/sec. Frames/sec requires specifying the frame size, and throughput values in frames/sec at various frame sizes cannot be compared without first being converted into bits/sec. Cells/sec is not a good unit for frame-level performance since cells are not visible to the user.

3.1.3. Statistical Variations

There is no need to obtain more than one sample for any of the three frame-level throughput metrics. Consequently, there is no need to calculate means and/or standard deviations of throughputs.

3.1.4. Measurement Procedures

Before starting measurements, a number of VCCs (or VPCs), henceforth referred to as "foreground VCCs", are established through the SUT. Foreground VCCs are used to transfer only the traffic whose performance is measured. That traffic is referred to as the foreground traffic. Characteristics of foreground traffic are specified in 3.1.5. The tests can be conducted under two conditions:

* without background traffic;
* with background traffic.

Procedure without background traffic

The procedure to measure throughput in this case includes a number of test runs. A test run starts with the traffic being sent at a given input rate over the foreground VCCs with early packet discard disabled (if this feature is available in the SUT and can be turned off). The average cell transfer delay is constantly monitored. A test run ends and the foreground traffic is stopped when the average cell transfer delay has not changed significantly (not more than 5%) during a period of at least 5 minutes. During the test run period, the total number of frames sent to the SUT and the total number of frames received from the SUT are recorded. The throughput (output rate) is computed from the duration of the test run and the number of received frames. If the input frame count and the output frame count are the same, then the input rate is increased and the test is conducted again. The loss-less throughput is the highest throughput at which the count of the output frames equals the count of the input frames. The input rate is then increased even further (with early packet discard enabled, if available). Although some frames will be lost, the throughput may increase until it reaches the peak throughput value. After this point, any further increase in the input rate will result in a decrease in the throughput. The input rate is finally increased to 100% of the input link rates and the full-load throughput is recorded. A sketch of this search procedure follows.
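The following Python sketch (ours, not part of the specification) illustrates how the three metrics fall out of the Section 3.1.4 search. The test-harness hook run_test(), the 1% rate step, and the frame size are assumptions made here for concreteness; a real tester would drive traffic generators and analyzers.

    # Illustrative sketch (ours, not part of the specification) of the
    # Section 3.1.4 search for loss-less, peak, and full-load throughput.
    # run_test(input_rate_bps) stands for an assumed test harness that runs
    # one test run and returns (frames_sent, frames_received, seconds).

    FRAME_BITS = 9180 * 8   # assumed frame size; any agreed test frame size works

    def measure_throughputs(run_test, link_rate_bps, step_fraction=0.01):
        """Return (loss-less, peak, full-load) throughput in effective bits/sec."""
        lossless = peak = 0.0
        rate = step_fraction * link_rate_bps
        while rate < link_rate_bps:
            sent, received, seconds = run_test(rate)
            output_bps = received * FRAME_BITS / seconds   # effective rate (3.1.2)
            if received == sent:                           # no frames lost
                lossless = max(lossless, output_bps)
            peak = max(peak, output_bps)                   # maximum despite losses
            rate += step_fraction * link_rate_bps
        # A final run with the input links loaded at 100% of their capacity
        # gives the full-load throughput.
        sent, received, seconds = run_test(link_rate_bps)
        full_load = received * FRAME_BITS / seconds
        return lossless, max(peak, full_load), full_load

In practice the step size trades search time against the resolution of the loss-less and peak values; the 1% step above is purely illustrative.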
Procedure with background traffic

Measurements of throughput with background traffic are under study.

3.1.5. Foreground Traffic

Foreground traffic is specified by the type of foreground VCCs, connection configuration, service class, arrival patterns, frame length, and input rate. Foreground VCCs can be permanent or switched, virtual path or virtual channel connections, established between ports on the same network module of the switch, between ports on different network modules, or between ports on different switching fabrics. A system with n ports can be tested with the following connection configurations:

* n-to-n straight,
* n-to-(n-1) full cross,
* n-to-m partial cross, 1 <= m <= n-1,
* k-to-1, 1 < k <= n-1.

A.7. Equivalent MIMO Latency Definition

When the input link rate is lower than or equal to the output link rate (i.e., CIT >= COT), a zero-delay switch will transmit the last bit of each cell of the frame as soon as it is received. In particular, the last bit of the frame is transmitted as soon as it is received. Thus, NFOT in these cases is equal to the frame input time:

   NFOT = Frame Input Time

and

   MIMO latency = FILO latency - NFOT
                = FILO latency - Frame Input Time
                = LILO latency

The equivalent MIMO latency definition is then: MIMO latency is equal to LILO latency if the input link rate is lower than or equal to the output link rate, and is equal to FILO latency minus NFOT otherwise.

Throughout this discussion, we assume that the link rates are used in the latency computation. If other rates are used, there is the potential for strange results. For example, a carrier may offer a lower-rate contract to a customer on a higher-rate link. If the peak cell rate of the traffic contract is less than the link rate, and this peak cell rate is used for MIMO calculations, then the MIMO value may be negative, depending on the scheduling of cells on the link and the traffic contract. Using the link rate in MIMO calculations avoids this potential problem.
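The equivalent definition reduces MIMO latency to directly observable quantities. As a minimal illustration (ours, not part of the baseline text; the function and parameter names are assumptions), the selection logic is:

    # Illustrative rendering (ours, not part of the baseline text) of the
    # equivalent MIMO latency definition of Appendix A.7. All latencies are
    # in seconds; the rates are link rates in bits/sec, as the text recommends.

    def mimo_latency(filo, lilo, nfot, input_link_rate, output_link_rate):
        """MIMO = LILO when input rate <= output rate, else FILO - NFOT."""
        if input_link_rate <= output_link_rate:
            return lilo
        return filo - nfot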
A.8. Measuring MIMO Latency

To measure the MIMO latency of a frame passing through the System Under Test (SUT), the times of occurrence of the following two events need to be recorded:

* the first bit of the frame enters the SUT,
* the last bit of the frame exits the SUT.

The time between these two events is the FILO latency. NFOT can be obtained from the cell pattern of the test frame on input, as explained in Section A.6. Substituting the FILO latency and NFOT into the MIMO latency formula gives the SUT's delay for the frame.

If the input link rate is lower than or equal to the output link rate, it is easier to calculate MIMO latency. In this case, the times of occurrence of the following two events need to be recorded:

* the last bit of the frame enters the SUT,
* the last bit of the frame exits the SUT.

The time between these two events is the LILO latency, which is equal to the MIMO latency of the frame. Note that the cell arrival pattern does not matter in this case.

Contemporary ATM monitors provide measurement data at the cell level. Considering that the definition of MIMO latency uses bit-level data, we now describe how to calculate MIMO latency using measurements at the cell level. The standard definitions of the two cell-level performance metrics that matter for the MIMO latency calculation are:

* cell transfer delay (CTD), defined as the time between the first bit of the cell entering the switch and the last bit of the cell leaving the switch;
* cell inter-arrival time, defined as the time between the arrival of the last bit of the first cell and the arrival of the last bit of the second cell.

In cases where the input link rate is higher than the output link rate, according to the MIMO latency definition, the FILO latency has to be measured. From Figure A.10, it can be observed that:

   FILO latency = First cell's transfer delay + First-cell-to-last-cell inter-arrival time

Thus, to calculate MIMO latency when the input link rate is higher than the output link rate, it is necessary to measure the transfer delay of the first cell of a frame and the inter-arrival time between the first cell and the last cell of the frame.

In cases where the input link rate is lower than or equal to the output link rate, it is sufficient to measure the LILO latency. From Figure A.11, it can be observed that:

   LILO latency = Last cell's transfer delay - CIT

Thus, to calculate MIMO latency when the input link rate is lower than or equal to the output link rate, it is necessary to measure the transfer delay of the last cell of the frame. A computational sketch of these two rules is given at the end of this appendix.

[Figure A.10: FILO Latency Calculation (Input rate > Output rate)]
[Figure A.11: LILO Latency Calculation (Input rate <= Output rate)]

A.9. User Perceived Delay

It should be pointed out that MIMO latency measures only the SUT's contribution to the delay. It does not include delay caused by components not under the SUT's control. In particular, it does not include the frame input time. However, a user of the system does have to wait while the frame is being sent to the SUT. A user typically assembles the frame and gives it to the network. The user starts waiting as soon as the first bit enters the system and cannot do any meaningful work until the last bit exits the network. Thus, user perceived performance is reflected by the FILO latency.

Figure A.12 illustrates the relationship between user perceived performance and MIMO latency in two scenarios with contiguous frames. In the first scenario, the input link rate is the same as the output link rate. In the second scenario, the output is slower. The switch delay, as given by MIMO latency, is the same in both cases, but the user perceived delay, as given by FILO latency, is different: for the case in Figure A.12b, FILO latency is worse. It can be observed that the user perceived delay depends upon the input/output link speeds. On the other hand, the network delay measured by MIMO latency is independent of link speeds. The difference between these two delays is the frame latency through a zero-delay switch.

[Figure A.12: FILO Latency as User Perceived Delay]
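As referenced in A.8 above, the following sketch (ours, not part of the baseline text) combines the equivalent definition of A.7 with the cell-level formulas of A.8. The timestamp conventions and names are illustrative assumptions; NFOT must still be derived from the input cell pattern as explained in Section A.6.

    # Illustrative sketch (ours, not part of the baseline text) computing MIMO
    # latency from cell-level monitor data, per Appendices A.7 and A.8.

    def mimo_from_cell_measurements(ctd_first, ctd_last,
                                    first_to_last_interarrival, nfot,
                                    input_link_rate, output_link_rate):
        """All times in seconds; link rates in bits/sec (per A.7's recommendation)."""
        # Cell input time: one 53-byte cell clocked in at the input link rate.
        cit = 53 * 8 / input_link_rate
        if input_link_rate <= output_link_rate:
            # MIMO = LILO = last cell's transfer delay - CIT (Figure A.11).
            return ctd_last - cit
        # Otherwise MIMO = FILO - NFOT, with FILO = first cell's CTD plus the
        # first-cell-to-last-cell inter-arrival time (Figure A.10); NFOT is
        # obtained from the input cell pattern (Section A.6).
        filo = ctd_first + first_to_last_interarrival
        return filo - nfot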
Bradner, "Benchmarking Terminology for Network Interconnection Devices", RFC 1242[3] ITU-T Recommendation I.356, "B-ISDN ATM Layer Specification," ITU-Study Group 13, Geneva, 1995 Editorial note: The current text of Appendix B includes the correction of a printing error which has been found in the algorithm pseudo-code part of this document approved at September '97 ATMF meeting. Appendix B: Methodology for Implementing Connection Configurations B.1. Introduction In Sections 3.1.7 and 3.2.7 of the baseline text, a number of configurations have been presented for throughput and latency measurements. In their basic form, these configurations require traffic generators and analyzers, whose number increases as the number of ports on a switch increases. Since the test monitors are rather expensive, it is desirable to define scalable configurations that can be used with a limited number of generators. However, one problem with scalable configurations is that there are many ways to set up the connections and the results could vary with the setup. In this appendix, a standard method for generating these configurations is defined. Thus, anyone can design a connection configuration for switches with any number of ports. Since the methodology presented here applies to any number of traffic generators, it can be used for non-scalable (basic) configurations as well. Performance testing requires two kinds of virtual channel connections (VCCs): foreground VCCs (traffic that is measured) and background VCCs (traffic that simply interferes with the foreground traffic). The methodology for generating configurations of both types of VCCs is covered in this appendix. The VCCs are formed by setting up connections between ports of the switch. The connection order of these ports is referred to here as a VCC Chain. For example, the VCC shown in Figure B.1 consists of one VCC chain passing through ports P1-P2-P3-P4-1. Another possible configuration for this "N-to-N" single generator scalable configuration would be P1-P3-P2-P4-P1. For an N-port switch, there are a total of (N-1)! possible configurations. [Figure B.1 One out of six possible VCC chains that can implement the 4-to-4 straight configuration with a single generator.] If the four-port switch shown in Figure B.1 consists of two modules with two ports each, the measured performance may depend upon the number of times the VCC chain passes from one module to the other and may be different for different configurations. At the end of this appendix, the pseudocode for a computer program is presented that allows generating a standardized port order for all connection configurations. This methodology (pseudocode) generally creates VCC chains that cross the modules as often as possible while still keeping the whole process simple. B.2. Definitions and Rules In order to generate a standard configuration, it is first necessary to have a standard method of numbering the ports of a switch. This method is presented in this section. Consider a switch with several modules. Each module may have a varying number of ports. In order to number these ports, the first step is to generate a schematic of modules placed one below the other. The schematic is drawn such that the modules are arranged in a decreasing order of number of ports. Then the switch ports are numbered sequentially, along the columns, starting from the top left corner of the schematic. This port numbering helps in creating VCC chains that cross modules as often as possible. 
The second thing we need is a standard method of presenting connection configurations. Each VCC chain is represented by a three-dimensional matrix C(i, j, k). Index i represents the interconnection order among the switch ports, where the value 0 indicates the source port and the last value indicates the destination port. Index k represents the generator number, and index j represents the chain number starting at that generator. One row C(*, j, k) of the matrix represents a single VCC chain. For example, if the first VCC chain from generator #2 starts at source port P1, passes through ports P3, P4, P5, P6, P7, P8, and exits at port P2, the matrix C has the following entries:

   C(0,1,2)=P1, C(1,1,2)=P3, C(2,1,2)=P4, C(3,1,2)=P5,
   C(4,1,2)=P6, C(5,1,2)=P7, C(6,1,2)=P8, C(7,1,2)=P2.

Figure B.3 illustrates this VCC chain. The source and destination ports are also represented by the symbols Cin and Cout. For the VCC of Figure B.3, Cin=P1 and Cout=P2.

[Figure B.3: A VCC chain]

NP(k) denotes the total number of intermediate ports of a VCC chain generated by generator k. Notice that the source and destination ports are not counted. In the case of Figure B.3, NP(2) = 6. Note that C(NP+1, j, k) is always the destination port.

For latency measurements, the foreground traffic involves only two ports, one for input and the other for output. To design the VCC chain for this traffic, the operator may simply choose any two ports, referred to as CFin and CFout, respectively. Here, F in the subscript signifies "foreground." In order to avoid interference with the foreground traffic, the background VCC chains may or may not use CFin and CFout. If the background traffic does use these ports, then it should do so only in the direction opposite to that used by the foreground traffic.

Figure B.4 shows a schematic representation of the connection configuration for latency measurement of an 8-port, 2-module switch. The foreground traffic uses ports P2 and P1 as the source and destination ports, respectively, so CFin=P2 and CFout=P1. The background traffic also uses these ports, but in the opposite direction; therefore, for the background traffic, Cin=C(0,1,2)=P1 and Cout=C(7,1,2)=P2. The background traffic can use the six remaining ports in both directions. Incidentally, Figure B.3 shown earlier gives the VCC chain representation of this same configuration. From now on, we show only the VCC chain representations of configurations; it is straightforward to generate the schematic representations from them.

[Figure B.4: A 7-to-7 straight configuration with one generator for the background traffic.]

B.3. Connection Configuration Characteristics

In this section we analyze several of the configurations for throughput and latency measurements and show how scalable versions of them can be obtained using the algorithm given in Section B.4. The algorithm consists of three simple rules:

1. Chains generally go from port i to port i+1, unless that port has already been fully used by other chains.
2. After generating the jth chain, the (j+1)th chain can be generated simply by adding 1 to each port index of the jth chain.
3. If there are multiple generators, each generator uses a contiguous subset of the switch ports as source ports. Each generator needs as many source ports as the number of VCC chains starting from it.
B.3.1. N-to-N Straight (Single Generator)

This configuration is used for throughput as well as latency measurements. The scalable versions can be obtained as follows.

a) Throughput measurements: For these tests, we need only a single chain starting from a single generator, i.e., k=1 and j=1. The chain starts from one port, goes through all other ports, and exits from the starting port. Therefore, NP(1) is equal to N-1. Cin and Cout coincide, and any port Px may be selected as the input/output port. Figure B.5 illustrates this case for the 2-module, 8-port switch. Figure B.5a shows how to number the switch ports. Figure B.5b presents the VCC chain representation of the configuration, using Cin=C(0,1,1)=Cout=C(8,1,1)=P1. The application of the algorithm is simple: the ports C(i,1,1) in the VCC chain are selected in numerically increasing order, and a port is included in the VCC chain if it is not already used up. After reaching the Nth port, the port numbering wraps around to P1.

[Figure B.5a: Port numbering of a switch with 2 modules and 4 ports on each module. The numbers in brackets indicate the port numbers within the module.]

For Cin=Cout=P1, the VCC chain is: P1-P2-P3-P4-P5-P6-P7-P8-P1. If we choose P3 as the source and destination, then the VCC chain is: P3-P4-P5-P6-P7-P8-P1-P2-P3.

[Figure B.5b: The 8-to-8 straight configuration with one generator.]

Note that in both cases, the VCC chain crosses the modules at every hop.

b) Latency measurements: First, consider the case in which the background traffic does use the source/destination ports of the foreground traffic (but in the opposite direction). The background traffic passes through all other ports; therefore, NP(1) is equal to N-2. Cin and Cout for the background traffic coincide with PFout and PFin, respectively. If PFin=P2 and PFout=P1, the foreground chain is P2-P1 and the background chain is P1-P3-P4-P5-P6-P7-P8-P2. This connection configuration was presented earlier in Figures B.3 and B.4.

Now consider the case in which the background traffic does not use the source/destination ports of the foreground. In this case, NP(1) is equal to N-3. Cin and Cout coincide and may be selected as any switch port Px except PFout and PFin. For example, the foreground could use the chain P2-P1 and the background could use P3-P4-P5-P6-P7-P8-P3. Figure B.6 illustrates this case.

[Figure B.6: Implementation of the 6-to-6 straight configuration with one generator.]

B.3.2. N-to-N Straight (r Generators)

This configuration implements the N-to-N straight configuration with r generators.

a) Throughput measurements: Each generator has one VCC chain, so there are r VCC chains in all. Of the N ports, r ports are used as the sources/destinations of these chains. The remaining ports are divided among the generators as evenly as possible. Let p = mod(N-r, r).

* For the first p VCC chains, the number of intermediate ports NP is equal to the quotient of (N-r)/r plus 1.
* For the remaining (r-p) VCC chains, NP is equal to the quotient of (N-r)/r.
* For each VCC chain, the source and destination ports coincide and may be selected from any of the switch ports Px not selected by other VCC chains as source or destination.

A short computational check of these rules follows.
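These chain-length rules simply distribute the N-r intermediate ports as evenly as possible over the r chains. A minimal sketch (ours, not part of the baseline text):

    # Illustrative sketch (ours, not part of the baseline text) of the B.3.2a
    # chain-length rule: with r generators and N ports, r ports serve as
    # sources/destinations and the remaining N-r ports are spread as evenly
    # as possible over the r chains. For the latency case (B.3.2b) the same
    # rule applies with N-r-1 remaining ports.

    def intermediate_port_counts(n_ports, r_generators):
        """Return NP(1..r): the intermediate-port count of each generator's chain."""
        remaining = n_ports - r_generators
        p = remaining % r_generators          # p = mod(N-r, r)
        base = remaining // r_generators      # quotient of (N-r)/r
        # The first p chains get one extra intermediate port.
        return [base + 1] * p + [base] * (r_generators - p)

    # Worked example from the text: N=8 ports, r=3 generators gives
    # p = mod(5, 3) = 2, so NP = (2, 2, 1).
    assert intermediate_port_counts(8, 3) == [2, 2, 1]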
As an example, consider the 8-port switch again. With r=3 generators, p equals mod(8-3, 3) = 2. So the first two VCC chains have NP equal to the quotient of (8-3)/3 plus 1, i.e., 2 intermediate ports, and the last chain has NP equal to the quotient of (8-3)/3, i.e., 1 intermediate port. Figure B.7 illustrates the implementation of the VCC chains for this case. First we select the source and destination ports:

* Port 1 is the source and destination of the first chain, so C(0,1,1)=P1 and C(3,1,1)=P1.
* Port 2 is the source and destination of the second chain, so C(0,1,2)=P2 and C(3,1,2)=P2.
* Port 3 is the source and destination of the third chain, so C(0,1,3)=P3 and C(2,1,3)=P3.

These selections have been made to avoid any overlap. Then we apply the algorithm. Start with the VCC chain having port 1 as its source. The next available port is P4, so C(1,1,1)=P4, and then C(2,1,1)=P5. This VCC chain has two intermediate ports, so it is now complete. Continue with the VCC chain starting at port 2. The next available port is port 6 (ports 4 and 5 are fully occupied by the previous VCC chain), so C(1,1,2)=P6 and then C(2,1,2)=P7. Similarly, C(1,1,3)=P8; this VCC chain has only one intermediate port. The VCC chain implementation is now complete.

[Figure B.7: Implementation of the 8-to-8 straight configuration with 3 generators.]

b) Latency measurements: Consider the case with the background traffic using the foreground ports in the opposite direction. The remaining N-1 ports are evenly divided among the r background VCC chains. Let p = mod(N-r-1, r).

* For the first p VCC chains, NP is equal to the quotient of (N-r-1)/r plus 1.
* For the remaining (r-p) VCC chains, NP is equal to the quotient of (N-r-1)/r.
* For one of the VCC chains, Cin and Cout coincide with PFout and PFin, respectively.
* For the other VCC chains, Cin and Cout coincide and may be selected from any of the switch ports Px not selected by other VCCs.

Figure B.8 illustrates an example for this case. Ports 1 and 2 are used by the foreground traffic as destination and source ports, respectively.

[Figure B.8: Implementation of the 7-to-7 straight configuration with 3 generators for background traffic in latency measurement.]

Ports 1 and 2 will be used as source and destination ports (respectively) by one of the background VCC chains. The other two generators will use ports 3 and 4 as their source and destination ports, respectively. For the first VCC chain NP=2, and for the other two VCC chains NP=1. The chains are: P1-P5-P6-P2, P3-P7-P3, and P4-P8-P4. Note that the first chain goes from P1 to P5, since P2, P3, and P4 have already been assigned to other chains. The configuration for the case when the background traffic does not share ports with the foreground can be generated by the same procedure, treating the switch as having only N-2 ports.

B.3.3. N-to-m Partial Cross (r Generators)

This is a generalization of the N-to-m partial cross with 1 generator presented in the baseline text; the discussion here also applies for r=1. By appropriately setting r, one can also obtain non-scalable (basic) configurations.

a) Throughput measurements: This configuration has m*r VCC chains originating from r generators, where each generator originates m VCC chains, each carrying 1/m of the generator's load. Each intermediate port has exactly m of these streams flowing through it. Again, the ports are divided evenly among the chains. However, since each chain uses only part of a port's capacity, a port can also be used by other chains, even chains from other generators.
Let p = mod(N-r, r).

* For the first p VCC chains, the number of intermediate ports NP is equal to the quotient of (N-r)/r plus 1.
* For the remaining (r-p) VCC chains, NP is equal to the quotient of (N-r)/r.
* For all m VCC chains of a generator, the source and destination ports coincide and may be selected from any of the switch ports Px not selected by other VCC chains.

Figure B.9 illustrates the case of an 8-to-2 partial cross with 2 generators. In this case, p = mod(8-2, 2) = 0, so the VCC chains of both generators have the quotient of (8-2)/2, i.e., 3 intermediate ports.

[Figure B.9: Implementation of the 8-to-2 partial cross configuration with 2 generators for foreground traffic.]

Both VCC chains of the first generator start and end at port 1, so C(0,1,1)=C(0,2,1)=C(4,1,1)=C(4,2,1)=P1. Similarly, for the two VCC chains of the other generator, C(0,1,2)=C(0,2,2)=C(4,1,2)=C(4,2,2)=P2. First we divide the remaining ports among the two generators: the first generator gets P3, P4, and P5; the second generator gets P6, P7, and P8. The first chain of the first generator is simply P1-P3-P4-P5-P1, and the first chain of the second generator is P2-P6-P7-P8-P2. The second chain of the first generator is obtained by shifting the intermediate ports of its first chain; the chain is therefore P1-P4-P5-P6-P1. Note that this chain shares port P6 with the other generator, since each chain uses only half of a port's capacity. The second chain of the second generator is again obtained by shifting: P2-P7-P8-P3-P2. Note that shifting P8 would have produced P1, but P1 is fully used, and the next port, P2, is also fully used, so P3 is used instead.

b) Latency measurements: Again, we consider only the case of the background traffic sharing the foreground ports in the opposite direction. Excluding the foreground ports, the remaining ports are divided as evenly as possible among the r generators. Let p = mod(N-r-1, r).

* For all VCC chains of the first p generators, NP is equal to the quotient of (N-r-1)/r plus 1.
* For all VCC chains of the remaining (r-p) generators, NP is equal to the quotient of (N-r-1)/r.
* For all m VCC chains of exactly one generator, the source and destination coincide with PFout and PFin, respectively.
* For all m VCC chains of each other generator, the source and destination coincide and may be selected from any of the switch ports Px not selected by other generators.

An example of this case is shown in Figure B.10. In this case, N=8 and r=2, so p = mod(8-2-1, 2) = 1, NP(1)=3, and NP(2)=2.

[Figure B.10: Implementation of the 7-to-2 partial cross configuration with 2 generators for background traffic in latency measurements.]

The VCC chains of the first generator use ports 1 and 2 in the directions opposite to the foreground traffic. The VCC chains of the second generator use port 3 as their source and destination. The chains of the first generator are P1-P4-P5-P6-P2 and P1-P5-P6-P7-P2. The chains of the second generator are P3-P7-P8-P3 and P3-P8-P4-P3.

Table B.1 summarizes the values of the number of intermediate ports for the various configurations of Section B.3. These values are used in the pseudocode of Section B.4.

[Table B.1: Parameter values used in the algorithm for creating VCC chains for different configurations.]

B.4. Algorithm for Creating VCC Chains

The algorithm for creating VCC chains for the different connection configurations is based on the definitions given in Section B.2 and the characteristics specified in Section B.3 and summarized in Table B.1.

* NP(k) denotes the number of intermediate ports for the VCC chains of the kth generator. These values are specified in B.3.
* P(f) denotes the fth port of the switch.
* C(i, j, k) denotes the ith intermediate port of the jth VCC chain of the kth generator.
* The function mod*(x, N) is equal to mod(x, N), except when mod(x, N) is equal to zero, in which case the function is equal to N.

f = 1;
for (k = 1 to r, step 1) {
    if (k > 1) {
        f = 0;
        for (q = mod*(1 + sum of NP(d) for d from 1 to (k-1), N) down to 1, step -1) {
            f = f + 1;
            while (P(f) is source or destination) { f = f + 1; }
        } end for q
    } end if (k > 1)
    for (j = 1 to m, step 1) {
        if (r is equal to 1 and j > 1) { f = mod*(f+1, N); }
        if (r > 1 and j > 1) { f = C(2, j-1, k); }
        for (i = 1 to NP(k), step 1) {
            while (P(f) is source or destination or is full) { f = mod*(f+1, N); }
            C(i, j, k) = P(f);
            f = mod*(f+1, N);
        } end for i
    } end for j
} end for k
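For readers who prefer an executable form, the following Python rendering (ours, not part of the approved text) follows the pseudocode above under stated assumptions: ports are numbered 1..N, `sources` holds the pre-selected source/destination ports of each generator (chosen per Section B.3), and `capacity[p]` is how many chains may still pass through port p as an intermediate (1 for full-rate chains, m for the 1/m-rate partial-cross chains).

    # Runnable rendering (ours, not part of the approved text) of the B.4
    # pseudocode for creating VCC chains.

    def mod_star(x, n):
        """mod*(x, N): like x mod N, but yields N instead of 0."""
        return ((x - 1) % n) + 1

    def build_chains(n, r, m, np_counts, sources, capacity):
        """Return C[k][j]: the intermediate ports of chain j of generator k."""
        src = {p for per_gen in sources for p in per_gen}   # all source/dest ports
        used = dict(capacity)                               # remaining port capacity
        C = {k: {} for k in range(1, r + 1)}
        f = 1
        for k in range(1, r + 1):
            if k > 1:
                # Advance f past the ports consumed by earlier generators,
                # skipping source/destination ports (the "for q" loop).
                f = 0
                q = mod_star(1 + sum(np_counts[d] for d in range(1, k)), n)
                while q >= 1:
                    f += 1
                    while f in src:
                        f += 1
                    q -= 1
            for j in range(1, m + 1):
                if r == 1 and j > 1:
                    f = mod_star(f + 1, n)
                if r > 1 and j > 1:
                    f = C[k][j - 1][1]        # C(2, j-1, k): the shift rule
                chain = []
                for _ in range(np_counts[k]):
                    while f in src or used[f] == 0:
                        f = mod_star(f + 1, n)
                    chain.append(f)           # C(i, j, k) = P(f)
                    used[f] -= 1
                    f = mod_star(f + 1, n)
                C[k][j] = chain
        return C

    # Worked example from B.3.2a: 8 ports, 3 generators, one chain each,
    # sources P1, P2, P3, NP = (2, 2, 1). Expected intermediates:
    # P4-P5, P6-P7, and P8, as in Figure B.7.
    chains = build_chains(
        n=8, r=3, m=1,
        np_counts={1: 2, 2: 2, 3: 1},
        sources=[{1}, {2}, {3}],
        capacity={p: 1 for p in range(1, 9)})
    assert chains == {1: {1: [4, 5]}, 2: {1: [6, 7]}, 3: {1: [8]}}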