ATM Forum Document Number: BTD-TEST-TM-PERF.00.05 (96-0810R8)
*****************************************************************
Title: ATM Forum Performance Testing Specification - Baseline Text
*****************************************************************
Abstract: This baseline document includes all text related to performance testing that has been agreed so far by the ATM Forum Testing Working Group.
*****************************************************************
Source: Raj Jain, Gojko Babic, Arjan Durresi
The Ohio State University, Department of CIS, Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org
The presentation of this contribution at the ATM Forum is sponsored by NASA Lewis Research Center.
*****************************************************************
Date: February 1998
*****************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TEST, AF-TM)
*****************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend or modify the statements contained herein.
*****************************************************************
Two postscript versions of this document, including all figures and tables, have been uploaded to the ATM Forum ftp server in the incoming directory. One postscript version shows changes from the last version and the other does not. These may be moved from there to the atm documents directory.
The postscript versions are also available on our web page via: http://www.cse.wustl.edu/~jain/atmf/bperf05.htm

ATM Forum Technical Committee
ATM Forum Performance Testing Specification
Version 1.0, February 1998

(C) 1998 The ATM Forum. All Rights Reserved. No part of this publication may be reproduced in any form or by any means. The information in this publication is believed to be accurate at its publication date. Such information is subject to change without notice and the ATM Forum is not responsible for any errors. The ATM Forum does not assume any responsibility to update or correct any information in this publication. Notwithstanding anything to the contrary, neither The ATM Forum nor the publisher makes any representation or warranty, expressed or implied, concerning the completeness, accuracy, or applicability of any information contained in this publication. No liability of any kind shall be assumed by The ATM Forum or the publisher as a result of reliance upon any information contained in this publication. The receipt or any use of this document or its contents does not in any way create, by implication or otherwise:
· Any express or implied license or right to or under any ATM Forum member company's patent, copyright, trademark or trade secret rights which are or may be associated with the ideas, techniques, concepts or expressions contained herein; nor
· Any warranty or representation that any ATM Forum member companies will announce any product(s) and/or service(s) related thereto, or if such announcements are made, that such announced product(s) and/or service(s) embody any or all of the ideas, technologies, or concepts contained herein; nor
· Any form of relationship between any ATM Forum member companies and the recipient or user of this document.
Implementation or use of specific ATM recommendations and/or specifications or recommendations of the ATM Forum or any committee of the ATM Forum will be voluntary, and no company shall agree or be obliged to implement them by virtue of participation in the ATM Forum. The ATM Forum is a non-profit international organization accelerating industry cooperation on ATM technology. The ATM Forum does not, expressly or otherwise, endorse or promote any specific products or services.

Table of Contents

1. INTRODUCTION
1.1. SCOPE
1.2. GOALS OF PERFORMANCE TESTING
1.3. NON-GOALS OF PERFORMANCE TESTING
1.4. TERMINOLOGY
1.5. ABBREVIATIONS
2. CLASSES OF APPLICATION
2.1. PERFORMANCE TESTING ABOVE THE ATM LAYER
2.2. PERFORMANCE TESTING AT THE ATM LAYER
3. PERFORMANCE METRICS
3.1. THROUGHPUT
3.1.1. DEFINITIONS
3.1.2. UNITS
3.1.3. STATISTICAL VARIATIONS
3.1.4. MEASUREMENT PROCEDURES
3.1.5. FOREGROUND TRAFFIC
3.1.6. BACKGROUND TRAFFIC
3.1.7. GUIDELINES FOR SCALEABLE TEST CONFIGURATIONS
3.1.8. REPORTING RESULTS
3.2. FRAME LATENCY
3.2.1. DEFINITION
3.2.2. UNITS
3.2.3. STATISTICAL VARIATIONS
3.2.4. MEASUREMENT PROCEDURES
3.2.5. FOREGROUND TRAFFIC
3.2.6. BACKGROUND TRAFFIC
3.2.8. REPORTING RESULTS
3.3. THROUGHPUT FAIRNESS
3.3.1. DEFINITION
3.3.2. UNITS
3.3.3. MEASUREMENT PROCEDURES
3.3.4. STATISTICAL VARIATIONS
3.3.5. REPORTING RESULTS
3.4. FRAME LOSS RATIO
3.4.1. DEFINITION
3.4.2. UNITS
3.4.3. MEASUREMENT PROCEDURES
3.4.4. STATISTICAL VARIATIONS
3.4.5. REPORTING RESULTS
3.5. MAXIMUM FRAME BURST SIZE (MFBS)
3.5.1. DEFINITION
3.5.2. UNITS
3.5.3. STATISTICAL VARIATIONS
3.5.4. MEASUREMENT PROCEDURE AND MFBS CALCULATION
3.5.5. REPORTING RESULTS
3.6. CALL ESTABLISHMENT LATENCY
3.6.1. DEFINITION
3.6.2. UNITS
3.6.3. CONFIGURATIONS
3.6.4. STATISTICAL VARIATIONS
3.6.5. GUIDELINES FOR USING THIS METRIC
4.
REFERENCES

APPENDIX A: DEFINING FRAME LATENCY ON ATM NETWORKS
A.1. INTRODUCTION
A.2. USUAL FRAME LATENCIES AS METRICS FOR ATM SWITCH DELAY
A.3. MIMO LATENCY DEFINITION
A.4. CELL AND CONTIGUOUS FRAME LATENCY THROUGH A ZERO-DELAY SWITCH
A.5. LATENCY OF DISCONTINUOUS FRAMES PASSING THROUGH A ZERO-DELAY SWITCH
A.6. CALCULATION OF FILO LATENCY FOR A ZERO-DELAY SWITCH
A.7. EQUIVALENT MIMO LATENCY DEFINITION
A.8. MEASURING MIMO LATENCY
A.9. USER PERCEIVED DELAY

APPENDIX B: METHODOLOGY FOR IMPLEMENTING SCALABLE TEST CONFIGURATIONS
B.1. INTRODUCTION
B.2. IMPLEMENTATION OF EXTERNAL CONNECTIONS
B.3. IMPLEMENTATION OF INTERNAL CONNECTIONS
B.3.1. N-TO-N STRAIGHT (SINGLE GENERATOR)
B.3.2. N-TO-N STRAIGHT (R GENERATORS)
B.3.3. N-TO-M PARTIAL CROSS (R GENERATORS)
B.4. INTERNAL CONNECTION ALGORITHM FOR CREATING VCC CHAINS

1. Introduction

Performance testing in ATM deals with the measurement of the level of quality of a system under test (SUT) or an implementation under test (IUT) under well-known conditions. The level of quality can be expressed in the form of metrics such as latency, end-to-end delay, and effective throughput. Performance testing can be carried out at the end-user application level (e.g., FTP, NFS) or at or above the ATM layer (e.g., cell switching, signaling). Performance testing also describes in detail the procedures for testing the IUTs in the form of test suites. These procedures are intended to test the SUT or IUT and do not assume or imply any specific implementation or architecture of these systems. This document highlights the objectives of performance testing and suggests an approach for the development of the test suites.

1.1. Scope

Asynchronous Transfer Mode, as an enabling technology for the integration of services, is gaining increasing interest and popularity. ATM networks are being progressively deployed, and in most cases a smooth migration to ATM is prescribed.
This means that most of the existing applications can still operate over ATM via service emulation or service interworking, along with the proper adaptation of data formats. At the same time, several new applications are being developed to take full advantage of the capabilities of the ATM technology through an Application Programming Interface (API).

While ATM provides an elegant solution to the integration of services and allows for high levels of scalability, the performance of a given application may vary substantially with the IUT or the SUT utilized. The variation in performance is due to the complexity of the dynamic interaction between the different layers. For example, an application running over a TCP/IP stack will yield different levels of performance depending on the interaction of the TCP window flow control mechanism and the ATM network congestion control mechanism used. Hence, the following points and recommendations are made.

First, ATM adopters need guidelines on the measurement of the performance of user applications over different systems. Second, some functions above the ATM layer, e.g., adaptation and signaling, constitute applications (i.e., IUTs) and as such should be considered for performance testing. Also, it is essential that these layers be implemented in compliance with the ATM Forum specifications. Third, performance testing can be executed at the ATM layer in relation to the QoS provided by the different service categories. Finally, because of the extensive list of available applications, it is preferable to group applications into generic classes. Each class of applications requires a different testing environment, including metrics, test suites and traffic test patterns.

It is noted that the same application, e.g., ftp, can yield different performance results depending on the underlying layers used (TCP/IP over ATM versus TCP/IP over a MAC layer over ATM). Thus performance results should be compared only when the same protocol stack is used.
Performance testing is related to the user perceived performance of ATM technology. In other words, the goodness of ATM will be measured not only by cell-level performance but also by frame-level performance and performance perceived at higher layers. Most of the Quality of Service (QoS) metrics, such as cell transfer delay (CTD), cell delay variation (CDV), cell loss ratio (CLR), and so on, may or may not be reflected directly in the performance perceived by the user. For example, when comparing two switches, if one gives a CLR of 0.1% and a frame loss ratio of 0.1% while the other gives a CLR of 1% but a frame loss ratio of 0.05%, the second switch will be considered superior by many users.

The ATM Forum and ITU-T have standardized the definitions of ATM layer QoS metrics and their measurement [1, 2, 3, 4]. This specification does the same for higher layer performance metrics. Without a standard definition, each vendor will use its own definition of common metrics such as throughput and latency, resulting in confusion in the marketplace. Avoiding such confusion will help buyers, eventually leading to better sales and the success of the ATM technology. The initial work at the ATM Forum will be restricted to the native ATM layer and the adaptation layer. Any work on the performance of the higher layers is deferred for further study.

1.2. Goals of Performance Testing

The goal of this effort is to enhance the marketability of ATM technology and equipment. Any additional criteria that help in achieving that goal can be added later to this list.
a. The ATM Forum shall define metrics that will help compare various ATM equipment in terms of performance.
b. The metrics shall be such that they are independent of switch or NIC architecture.
(i) The same metrics shall apply to all architectures.
c. The metrics can be used to help predict the performance of an application or to design a network configuration to meet specific performance objectives.
d.
The ATM Forum will develop a precise methodology for measuring these metrics.
(i) The methodology will include a set of configurations and traffic patterns that will allow vendors as well as users to conduct their own measurements.
e. The testing shall cover all classes of service including CBR, rt-VBR, nrt-VBR, ABR, and UBR.
f. The metrics and methodology for different service classes may be different.
g. The testing shall cover as many protocol stacks and ATM services as possible.
(i) As an example, measurements for verifying the performance of services such as IP, Frame Relay and SMDS over ATM may be included.
h. The testing shall include metrics to measure the performance of network management, connection setup, and normal data transfer.
i. The following objectives are set for ATM performance testing:
(i) Definition of the criteria to be used to distinguish classes of applications.
(ii) Definition of classes of applications, at or above the ATM Layer, for which performance metrics are to be provided.
(iii) Identification of the functions at or above the ATM Layer which influence the perceived performance of a given class of applications. Examples of such functions include traffic shaping, quality of service, adaptation, etc. These functions need to be measured in order to assess the performance of the applications within that class.
(iv) Definition of common performance metrics for the assessment of the performance of all applications within a class. The metrics should reflect the effect of the functions identified in (iii).
(v) Provision of detailed test cases for the measurement of the defined performance metrics.

1.3. Non-Goals of Performance Testing

a. The ATM Forum is not responsible for conducting any measurements.
b. The ATM Forum will not certify measurements.
c. The ATM Forum will not set thresholds such that equipment performing below those thresholds is called "unsatisfactory."
d.
The ATM Forum will not establish any requirement that dictates a cost versus performance ratio.
e. The following areas are excluded from the scope of ATM performance testing:
(i) Applications whose performance cannot be assessed by common implementation-independent metrics. In this case the performance is tightly related to the implementation. An example of such applications is network management, whose performance behavior depends on whether it is a centralized or a distributed implementation.
(ii) Performance metrics which depend on the type of implementation or architecture of the SUT or the IUT.
(iii) Test configurations and methodologies which assume or imply a specific implementation or architecture of the SUT or the IUT.
(iv) Evaluation or assessment of results obtained by companies or other bodies.
(v) Certification of conducted measurements or of bodies conducting the measurements.

1.4. Terminology

The following definitions are used in this document:
· Implementation Under Test (IUT): The part of the system that is to be tested.
· Metric: A variable or a function that can be measured or evaluated and which reflects quantitatively the response or the behavior of an IUT or an SUT.
· System Under Test (SUT): The system in which the IUT resides.
· Test Case: A series of test steps needed to put an IUT into a given state to observe and describe its behavior.
· Test Suite: A complete set of test cases, possibly combined into nested test groups, that is necessary to perform testing for an IUT or a protocol within an IUT.

1.5. Abbreviations

ISO  International Organization for Standardization
IUT  Implementation Under Test
NP   Network Performance
NPC  Network Parameter Control
PDU  Protocol Data Unit
PVC  Permanent Virtual Circuit
QoS  Quality of Service
SUT  System Under Test
SVC  Switched Virtual Circuit
WG   Working Group

2. Classes of Application

Developing a test suite for each existing and new application can prove to be a difficult task.
Instead, applications should be grouped into categories or classes. Applications in a given class have similar performance requirements and can be characterized by common performance metrics. This way, the defined performance metrics and test suites will be valid for a range of applications. Classes of application can be defined based on one or a combination of criteria. The following criteria can be used in the definition of the classes:
(i) Time or delay requirements: real-time versus non-real-time applications.
(ii) Distance requirements: LAN versus WAN applications.
(iii) Media type: voice, video, data, or multimedia applications.
(iv) Quality level: for example, desktop video versus broadcast quality video.
(v) ATM service category used: some applications have stringent performance requirements and can only run over a given service category. Others can run over several service categories. An ATM service category relates application aspects to network functionalities.
(vi) Others to be determined.

2.1. Performance Testing Above the ATM Layer

Performance metrics can be measured at the user application layer, and sometimes at the transport layer and the network layer, and can give an accurate assessment of the perceived performance. Since it is difficult to cover all the existing applications and all the possible combinations of applications and underlying protocol stacks, it is desirable to group the applications into classes. Performance metrics and performance test suites can then be provided for each class of applications.

The perceived performance of a user application running over an ATM network depends on many parameters. It can vary substantially by changing the underlying protocol stack, the ATM service category it uses, the congestion control mechanism used in the ATM network, etc. Furthermore, there is no direct and unique relationship between the ATM Layer Quality of Service (QoS) parameters and the perceived application performance.
For example, in an ATM network implementing a packet-level discard congestion mechanism, applications using TCP as the transport protocol may see their effective throughput improved while the measured cell loss ratio may be relatively high.

In practice, it is difficult to carry out measurements in all the layers that span the region between the ATM Layer and the user application layer, given the inaccessibility of testing points. More effort needs to be invested to define the performance at these layers. These layers include adaptation, signaling, etc.

2.2. Performance Testing at the ATM Layer

The notion of an application at the ATM Layer is related to the service categories provided by the ATM service architecture. The Traffic Management Specification, Version 4.0 [2] specifies five service categories: CBR, rt-VBR, nrt-VBR, UBR, and ABR. Each service category defines a relation of the traffic characteristics and the Quality of Service (QoS) requirements to network behavior. A QoS assessment criterion is associated with each QoS performance parameter. These are summarized below.

QoS Performance Parameter          QoS Assessment Criterion
Cell Error Ratio                   Accuracy
Severely-Errored Cell Block Ratio  Accuracy
Cell Misinsertion Ratio            Accuracy
Cell Loss Ratio                    Dependability
Cell Transfer Delay                Speed
Cell Delay Variation               Speed

Section 5.6 of ITU-T Recommendation I.356 [1] further defines the Severely-Errored Cell Block Ratio. ITU-T Recommendation O.191 [4] defines measurement methods for both the in-service and out-of-service modes. The in-service mode uses OAM cells, while the out-of-service mode defines the payloads to be used for test cells on connections running out-of-service measurements. The ATM Forum specification [3] also defines out-of-service measurement of several QoS parameters. However, detailed test cases and procedures, as well as test configurations, are needed for both in-service and out-of-service measurement of QoS parameters.
An example of a test configuration for the out-of-service measurement of QoS parameters is given in Appendix A of [3]. Performance testing at the ATM Layer covers the following categories:
(i) In-service and out-of-service measurement of the QoS performance parameters for all five service categories (or application classes in the context of performance testing): CBR, rt-VBR, nrt-VBR, UBR, and ABR. The test configurations assume a non-overloaded SUT.
(ii) Performance of the SUT under overload conditions. In this case, the efficiency of the congestion avoidance and congestion control mechanisms of the SUT is tested.
In order to provide common performance metrics that are applicable to a wide range of SUTs and that can be uniquely interpreted, the following requirements must be satisfied:
(i) Reference load models for the five service categories CBR, rt-VBR, nrt-VBR, UBR, and ABR are required. Reference load models are to be defined by the Traffic Management Working Group.
(ii) Test cases and configurations must not assume or imply any specific implementation or architecture of the SUT.

3. Performance Metrics

In the following description, System Under Test (SUT) refers to an ATM switch. However, the definitions and measurement procedures are general and may be used for other devices, or for a network consisting of multiple switches, as well.

3.1. Throughput

3.1.1. Definitions

There are three frame-level throughput metrics that are of interest to a user:
· Loss-less throughput: the maximum rate at which none of the offered frames is dropped by the SUT.
· Peak throughput: the maximum rate at which the SUT operates regardless of frames dropped. The maximum rate can actually occur when the loss is not zero.
· Full-load throughput: the rate at which the SUT operates when the input links are loaded at 100% of their capacity.
A model graph of throughput vs. input rate is shown in Figure 3.1.
Level X defines the loss-less throughput, level Y defines the peak throughput, and level Z defines the full-load throughput. The loss-less throughput is the highest load at which the count of the output frames equals the count of the input frames. The peak throughput is the maximum throughput that can be achieved in spite of losses. The full-load throughput is the throughput of the system at 100% load on the input links. Note that the peak throughput may equal the loss-less throughput in some cases. Only frames that are received completely and without errors are included in the frame-level throughput computation. Partial frames and frames with CRC errors are not included.

[Figure 3.1: Peak, loss-less and full-load throughput]

3.1.2. Units

Throughput should be expressed in effective bits/sec, counting only bits from frames and excluding the overhead introduced by the ATM technology and transmission systems. This is preferred over specifying it in frames/sec or cells/sec. Frames/sec requires specifying the frame size, and throughput values in frames/sec at various frame sizes cannot be compared without first being converted into bits/sec. Cells/sec is not a good unit for frame-level performance since cells are not seen by the user.

3.1.3. Statistical Variations

There is no need to obtain more than one sample for any of the three frame-level throughput metrics. Consequently, there is no need to calculate means and/or standard deviations of throughputs.

3.1.4. Measurement Procedures

Before starting measurements, a number of VCCs (or VPCs), henceforth referred to as "foreground VCCs", are established through the SUT. Foreground VCCs are used to transfer only the traffic whose performance is measured. That traffic is referred to as the foreground traffic. Characteristics of foreground traffic are specified in 3.1.5.
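The measurement procedure of this section produces one record per test run: the offered input rate, the frame counts, and the measured output rate. As a minimal illustrative sketch (not part of the specification; the function name and data layout are ours), the three throughput levels of Figure 3.1 can be extracted from such records as follows:

```python
def throughput_metrics(runs):
    """Extract loss-less, peak, and full-load throughput (Figure 3.1).

    runs: list of (input_rate, frames_sent, frames_received, output_rate)
    tuples, one per test run, ordered by increasing input rate; the last
    run is assumed to be at 100% of the input link rate.  Rates are in
    effective bits/sec (frame payload bits only, per Section 3.1.2).
    """
    # Loss-less throughput: highest output rate among runs with no frame loss.
    lossless = max((out for inp, sent, recv, out in runs if recv == sent),
                   default=0)
    # Peak throughput: highest output rate achieved, losses permitted.
    peak = max(out for inp, sent, recv, out in runs)
    # Full-load throughput: output rate with the input links at 100% load.
    full_load = runs[-1][3]
    return lossless, peak, full_load
```

For a hypothetical series of runs where loss first appears above 200 Mbps effective load, the function returns the highest loss-free rate as loss-less throughput, the overall maximum output rate as peak throughput, and the last (100%-load) run's output rate as full-load throughput.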
The tests can be conducted under two conditions:
· without background traffic;
· with background traffic.

Procedure without background traffic

The procedure to measure throughput in this case includes a number of test runs. A test run starts with the traffic being sent at a given input rate over the foreground VCCs with early packet discard disabled (if this feature is available in the SUT and can be turned off). The average cell transfer delay is constantly monitored. A test run ends and the foreground traffic is stopped when the average cell transfer delay has not changed significantly (by more than 5%) during a period of at least 5 minutes. During the test run period, the total number of frames sent to the SUT and the total number of frames received from the SUT are recorded. The throughput (output rate) is computed from the duration of the test run and the number of received frames.

If the input frame count and the output frame count are the same, the input rate is increased and the test is conducted again. The loss-less throughput is the highest throughput at which the count of the output frames equals the count of the input frames. The input rate is then increased even further (with early packet discard enabled, if available). Although some frames will be lost, the throughput may increase until it reaches the peak throughput value. After this point, any further increase in the input rate will result in a decrease in the throughput. The input rate is finally increased to 100% of the input link rates and the full-load throughput is recorded.

Before conducting the tests, it is recommended that the port clocks be synchronized or locked together; otherwise, an unstable delay may be observed. In case of instability, one solution is to reduce the maximum load to slightly below 100%. In this case, the load used should be reported.

Procedure with background traffic

Measurements of throughput with background traffic are under study.

3.1.5.
Foreground Traffic

Foreground traffic is specified by the type of foreground VCCs, connection configuration, service class, arrival patterns, frame length and input rate. Foreground VCCs can be permanent or switched, virtual path or virtual channel connections, established between ports on the same network module on the switch, between ports on different network modules, or between ports on different switching fabrics. A system with n ports can be tested for the following connection configurations:
· n-to-n straight,
· n-to-(n-1) full cross,
· n-to-m partial cross, 1<=m<=n-1,
· k-to-1.

[Figure A.10: LILO Latency Calculation (Input Rate > Output Rate)]
[Figure A.11: LILO Latency Calculation (Input Rate < Output Rate)]

Figure A.12 illustrates the relationships between the user perceived performance and MIMO latency in two scenarios with continuous frames. In the first scenario, the input link rate is the same as the output link rate. In the second scenario, the output is slower. The switch delay, as given by MIMO latency, is the same in both cases; but the user perceived delay, as given by FILO latency, is different. For the case in Figure A.12b, FILO latency is worse. It can be observed that the user perceived delay depends upon the input/output link speeds. On the other hand, the network delay measured by MIMO latency is independent of link speeds. The difference between these two delays is the frame latency through a zero-delay switch.

[Figure A.12: FILO Latency as User Perceived Delay]

References:
[1] CCITT Recommendation X.135, "Speed of Service (Delay and Throughput) Performance Values for Public Data Networks when Providing International Packet Switched Service," 1992.
[2] S. Bradner, "Benchmarking Terminology for Network Interconnection Devices," RFC 1242.
[3] ITU-T Recommendation I.356, "B-ISDN ATM Layer Specification," ITU-T Study Group 13, Geneva, 1995.

Appendix B: Methodology for Implementing Scalable Test Configurations

B.1.
Introduction

In Sections 3.1.5 and 3.2.6 of the baseline text, a number of connection configurations have been presented for throughput and latency measurements. In most cases, these configurations require one traffic generator and/or analyzer for each port. Thus, the number of generators and/or analyzers increases as the number of ports increases. Since this equipment is rather expensive, it is desirable to define scalable configurations that can be used with a limited number of generators. Sections 3.1.7 and 3.2.7 present several scalable test configurations. However, one problem with scalable configurations is that there are many ways to set up the connections, and measurement results could vary with the setup. In this appendix, a standard method for generating scalable configurations is defined. Since the methodology presented here applies to any number of traffic generators, it can be used for non-scalable (full-scale) test configurations as well.

Performance testing requires two kinds of virtual channel connections (VCCs): foreground VCCs (traffic that is measured) and background VCCs (traffic that simply interferes with the foreground traffic). The methodology for generating configurations of both types of VCCs is covered in this appendix. The VCCs are formed by setting up connections between ports of the switch. The connections are internal through the switch fabric and external through wires or fibers, depending on the port technology. An external connection between two switch ports is referred to in this appendix as a wire W. The methodology presented here has two phases. During the first phase, the switch ports are connected externally by numbered wires as described in section B.2. The second phase consists of setting up PVCs, i.e. internal connections, between appropriate ports as explained in section B.3. The sequence of concatenated connections (internal and external) is called a VCC Chain.
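A VCC chain can be expanded mechanically into the internal connections it implies. The following sketch is illustrative only (the function name and data layout are ours, not part of this methodology); it assumes each wire is recorded as an (output port, input port) pair, anticipating the example of Figure B.1 described below:

```python
def chain_to_pvcs(chain, wires):
    """Expand a VCC chain into its internal (fabric) connections.

    chain: e.g. ['P1', 'W1', 'W2', 'W3', 'P1'] -- the generator port,
           the wires in traversal order, and the analyzer port.
    wires: maps each wire name to (output_port, input_port), i.e. the
           wire runs from output_port OUT to input_port IN.
    Returns the internal PVC segments as (entry port IN, exit port OUT).
    """
    pvcs = []
    cur_in = chain[0]                  # traffic enters the fabric here
    for w in chain[1:-1]:
        out_p, in_p = wires[w]
        pvcs.append((cur_in, out_p))   # internal hop: cur_in IN -> out_p OUT
        cur_in = in_p                  # the wire delivers traffic to in_p IN
    pvcs.append((cur_in, chain[-1]))   # final internal hop to the analyzer port
    return pvcs
```

For the chain P1-W1-W2-W3-P1, with W1 running from P2 OUT to P3 IN, W2 from P4 OUT to P2 IN, and W3 from P3 OUT to P4 IN, this yields the internal segments P1 IN to P2 OUT, P3 IN to P4 OUT, P2 IN to P3 OUT, and P4 IN to P1 OUT.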
For example, the VCC shown in Figure B.1 is formed by setting up a VCC chain starting from P1 IN, passing through wires W1, W2, W3, which are internally connected, and ending at P1 OUT. P1 IN is connected to the generator and P1 OUT is connected to the analyzer. Each wire connects a pair formed by an output port and an input port: W1 connects P2 OUT to P3 IN, W2 connects P4 OUT to P2 IN, and W3 connects P3 OUT to P4 IN. This VCC chain is indicated by the notation P1-W1-W2-W3-P1. This notation implies a unique configuration of internal connections. In Figure B.1, external connections are shown by thick lines while the internal connections are shown by thin lines. This convention is followed throughout this appendix. Another possible configuration for this "n-to-n single generator scalable configuration" would be P1-W2-W1-W3-P1. For an n-port switch, there are at most (n-1)! possible VCC chains that can implement this configuration.

[Figure B.1: One out of six possible VCC chains that can implement the 4-to-4 straight configuration with a single generator.]

The four-port switch shown in Figure B.1 consists of two modules with two ports each. The measured performance may depend upon the number of times the VCC chain passes from one module to the other and may therefore differ between configurations. At the end of this appendix, the pseudocode for a computer program is presented that generates a standardized port order for all connection configurations. This methodology (pseudocode) generally creates VCC chains that cross the modules as often as possible while still keeping the whole process simple.

B.2. Implementation of External Connections

The methodology for implementing the external connections consists of the following three steps:
1. Numbering the ports
2. Identifying the ports connected to generators and analyzers
3. Numbering the wires
These steps are now explained.

Step 1.
Numbering the Ports: Consider a switch with several modules of different port types. The ports could differ in speed and/or technology, and each module may have a different number of ports. For example, a switch may have two modules of eight and six 155-Mbps single-mode fiber ports, respectively, another module with eight 155-Mbps UTP ports, and a fourth module with six 25-Mbps UTP ports. In order to number these ports, the first step is to group the modules of the same port type, then generate a schematic of the modules placed one below the other. The schematic should be drawn such that the modules inside a group are arranged in decreasing order of number of ports. Then the switch ports are numbered sequentially inside the groups, column-wise, starting from the top left corner of the schematic. Numbering of each group continues the numbering of the previous group. This port numbering helps in creating VCC chains that cross modules as often as possible. The port numbers obtained this way are represented by Pi in this appendix.

Figure B.2 shows an example of port numbering. The modules are divided into three groups. The first group consists of the 155-Mbps single-mode fiber modules, the second group consists of the 155-Mbps UTP module, and the third group consists of the 25-Mbps UTP module. The ports of the first group are numbered sequentially along the columns from P1 through P14 as shown in Figure B.2. The ports of the second group are then numbered sequentially as P15 through P22. The ports of the third group are numbered similarly as P23 through P28.

[Figure B.2: Example of port numbering.]

Step 2. Identifying the ports connected to the generators and/or analyzers: In general, it is possible to design a scalable configuration for any given number of generators and analyzers. These can be connected to any input/output ports. However, the starting/ending ports should be chosen in such a way as to avoid having only one port left over in a group.
This is necessary because that port cannot be connected externally to any other port. This condition does not apply if a loopback is allowed by I.150 [1], respecting the bi-directional nature of VCs/VPs.

Step 3. Numbering the Wires: After the selection of input and output ports, the remaining ports have to be connected in pairs formed by the output of one port and the input of another port. In connecting the port pairs and in numbering the respective wires, the following rules are applied:
1. In each group, start with the first output port available (that has not been externally connected yet). Increase the port number by one until a port is found whose input is available. This input is connected to the output of the output port chosen previously. If a scalable configuration with loopback is desired and is allowed by I.150 [1], the output of a port can be connected to the input of the same port; the rest of the methodology of this appendix applies to this case also. This is continued until all output ports have been connected to other input ports or to analyzers.
2. The external connections formed above are numbered sequentially as W1, W2, ... The only restriction is that the end of wire Wi and the beginning of wire W(i+1) must be different ports. If the next external connection begins with the same port as the end of the previous wire, that connection is skipped for this round and may be included in the next round. In general, several rounds may be required to number all the wires. The restriction also applies to the last wire: the port at the end of the last wire should be different from the port at the beginning of the first wire. If this is not the case, swapping the labels of the last two wires may solve the problem.

The following example illustrates this step. Consider the (n-1)-to-(n-1) straight configuration required for the background traffic in latency measurement.
Suppose the switch has two modules with four ports each, of the same speed and technology, as shown in Figure B.3.

Step 1. There is only one group, because all ports are of the same speed and technology. The ports are numbered as shown in Figure B.3.

Step 2. For the foreground traffic, P2 IN is arbitrarily selected to be connected to the generator and P1 OUT is connected to the analyzer. For the background traffic, P1 IN is connected to the generator and P2 OUT is connected to the analyzer.

Step 3. The first output port available is P3 OUT. It is connected externally to P4 IN. P4 OUT is then connected to P5 IN, and so on. Finally, P8 OUT is connected to P3 IN. Figure B.4 shows these external connections. The next step is to number the wires. The first wire, connecting P3 OUT to P4 IN, is labeled W1. The next wire connects P4 OUT to P5 IN. However, it cannot be labeled W2 because its beginning port is the same as the end port of the previously numbered wire W1, so this wire is skipped in this round. The next wire, connecting P5 OUT to P6 IN, is labeled W2. The wire connecting P6 OUT to P7 IN has to be skipped for the same reason. The wire connecting P7 OUT to P8 IN is labeled W3, and the wire connecting P8 OUT to P3 IN is skipped. This finishes the first round. The unlabeled wires are considered in the second round. The first unlabeled wire, connecting P4 OUT to P5 IN, is labeled W4. The other two remaining wires are labeled W5 and W6, respectively. The only problem with these labels is that the ending port (P3) of the last wire W6 is the same as the beginning port of the first wire W1. To avoid this conflict, the labels of wires W5 and W6 are swapped. The resulting wire numbers are as shown in Figure B.4, which also shows the internal PVCs for a latency measurement test. The construction of these internal connections is explained next.

[Figure B.3 Port numbering of a switch with two modules of four ports each.]
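The wire-numbering rules of Step 3 (label in rounds, skip a connection whose beginning port equals the end port of the previously labeled wire, and swap the last two labels if the last wire wraps onto the first) can be sketched in executable form. The Python function below is only an illustration of the procedure; the function name and the (output port, input port) tuple representation are not part of this specification:

```python
def label_wires(connections):
    """Assign labels W1, W2, ... to external connections (Step 3, rule 2).

    `connections` lists the port pairs in the order they were created,
    each as a tuple (output_port_number, input_port_number).
    Returns a dict mapping label number (1 for W1, and so on) to connection.
    """
    labels = {}
    unlabeled = list(connections)
    last_end = None  # input port of the most recently labeled wire
    while unlabeled:
        progress = False
        skipped = []
        for conn in unlabeled:
            out_port, in_port = conn
            if out_port == last_end:
                # The beginning of W(i+1) must differ from the end of Wi:
                # postpone this connection to the next round.
                skipped.append(conn)
            else:
                labels[len(labels) + 1] = conn
                last_end = in_port
                progress = True
        if skipped and not progress:
            # Guarantee termination for a single leftover connection by
            # labeling it anyway; any remaining conflict needs manual fixing.
            conn = skipped.pop(0)
            labels[len(labels) + 1] = conn
            last_end = conn[1]
        unlabeled = skipped
    # The restriction also applies between the last and the first wire;
    # if it is violated, swap the labels of the last two wires.
    n = len(labels)
    if n >= 2 and labels[n][1] == labels[1][0]:
        labels[n - 1], labels[n] = labels[n], labels[n - 1]
    return labels
```

On the example above, with connections created in the order (P3 OUT, P4 IN), (P4 OUT, P5 IN), ..., (P8 OUT, P3 IN), the function reproduces the labeling W1 = P3-P4, W2 = P5-P6, W3 = P7-P8, W4 = P4-P5, W5 = P8-P3, W6 = P6-P7, including the final swap of W5 and W6.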
[Figure B.4 A 7-to-7 straight configuration with one generator for the background traffic.]

B.3. Implementation of Internal Connections

All VCC chains are represented by a three-dimensional matrix CH(i, j, k). Index i represents the interconnection order among the wires, index k represents the generator number, and index j represents the chain number starting at that generator. The input ports of all VCC chains are represented by the matrix CHin(j, k), where j and k have the same meaning as above. In a similar way, the output ports of the VCC chains are represented by CHout(j, k). CHin(j, k) = Px (CHout(j, k) = Px) means that the input (output) side of port Px is used as the input (output) port by the jth chain of generator k. One row CH(*, j, k) of the matrix represents a single VCC chain. For example, in Figure B.4, the VCC chain from generator #2 starts at P1, passes through wires W1, W2, W3, W4, W5, W6, and exits at P2, so the matrix CH has the following entries: CH(1, 1, 2)=W1, CH(2, 1, 2)=W2, CH(3, 1, 2)=W3, CH(4, 1, 2)=W4, CH(5, 1, 2)=W5, CH(6, 1, 2)=W6, with CHin(1,2)=P1 and CHout(1,2)=P2. The number of intermediate wires in the kth chain is denoted by NW(k). In the case of Figure B.4, NW(2) = 6.

For latency measurements, two types of traffic are used: foreground and background. Therefore, at least two VCC chains are required. In order to avoid interference with the foreground traffic, the background VCC chains may or may not use the input and output ports of the foreground traffic. If the background traffic does use these ports, it should do so only in the directions opposite to those used by the foreground traffic. In our example, Figure B.4, the foreground traffic uses ports P2 IN and P1 OUT as input and output ports, respectively. The background traffic also uses these ports but in the opposite direction, i.e., P1 IN and P2 OUT as input and output ports, respectively.
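As a concrete illustration of this bookkeeping, the Figure B.4 entries above can be transcribed directly. Here Python dictionaries keyed by the index tuples stand in for the matrices; the dictionary representation itself is only illustrative, not part of the specification:

```python
# CH[(i, j, k)]: the ith intermediate wire of the jth VCC chain of
# generator k. The entries below transcribe the background VCC chain
# of Figure B.4 (generator 2): P1-W1-W2-W3-W4-W5-W6-P2.
CH = {(i, 1, 2): "W%d" % i for i in range(1, 7)}

CHin = {(1, 2): "P1"}   # the chain enters the switch at P1 IN
CHout = {(1, 2): "P2"}  # and exits at P2 OUT

# NW(k): number of intermediate wires in the chains of generator k.
NW = {2: sum(1 for (i, j, k) in CH if k == 2)}
```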
The remainder of this section shows how to obtain scalable configurations for the throughput and latency measurements. In all cases, the numbering of ports and wires discussed in Section B.2 is used. The algorithm to implement the internal connections consists of three simple rules:
1. The chains generally go from wire i to wire i+1 unless the wire has already been fully used by other chains.
2. After generating the jth chain, the (j+1)st chain can be generated simply by adding 1 to each wire index of the jth chain.
3. If there are multiple generators, each generator uses a contiguous subset of wires as source wires. Each generator needs as many source wires as the number of VCC chains starting from it.

B.3.1. n-to-n Straight (Single Generator)

This configuration is used for throughput as well as latency measurements. The scalable versions can be obtained as follows:

a) Throughput Measurements: For these tests, only a single chain starting from a single generator is needed, i.e., k=1 and j=1. The chain starts from one port, goes through all other ports, and exits from the starting port. Therefore, NW(1) is equal to n-1. Any ports Px IN and Py OUT can be selected as the input and output port, respectively. Figure B.5 illustrates this case for the 2-module 8-port switch. The VCC chain has CHin(1,1) = CHout(1,1) = P1. The application of the internal connection algorithm is simple: the wires CH(i,1,1) in the VCC chain are selected in numerically increasing order, and a wire is included in the VCC chain only if it is not already used up. After reaching the last wire, the numbering wraps around to the first wire. For CHin(1,1) = CHout(1,1) = P1, the VCC chain is: P1-W1-W2-W3-W4-W5-W6-W7-P1.

[Figure B.5 The 8-to-8 straight configuration with one generator.]

b) Latency Measurements: First, consider the case in which the background traffic uses the same input/output ports as the foreground traffic (but in the opposite direction).
The background traffic passes through all other ports. Therefore, NW(1) is equal to n-2. The input and output ports coincide, respectively, with the output and input ports of the foreground. The foreground and background generators are labeled generator 1 and generator 2, respectively. If CHin(1,1)=P2 and CHout(1,1)=P1, the foreground chain is P2-P1 and the background chain is P1-W1-W2-W3-W4-W5-W6-P2, with CHin(1,2)=P1 and CHout(1,2)=P2. This connection configuration was presented earlier in Figure B.4.

Now, consider the case in which the background traffic does not use the input/output ports of the foreground. Generators 1 and 2 are used for background and foreground traffic, respectively. In this case, NW(1) is equal to n-3. CHin(1,1) and CHout(1,1) coincide and can be selected from any of the switch ports except CHout(1,2) and CHin(1,2). For example, the foreground can use the chain P2-P1 and the background could use P3-W1-W2-W3-W4-W5-P3. Figure B.6 illustrates this case.

[Figure B.6 The 6-to-6 straight configuration with one generator, where the foreground traffic does not share ports with the background traffic.]

B.3.2. n-to-n Straight (r Generators)

This configuration implements the n-to-n straight configuration with r generators.

a) Throughput Measurements: Each generator has one VCC chain, so in all there are r VCC chains. Of the n ports, r ports are used as sources/destinations of these chains. The remaining ports are connected among themselves and their wires are divided among the generators as evenly as possible. Let p = mod(n-r, r).
· For the first p VCC chains, the number of intermediate wires NW is equal to the quotient of (n-r)/r plus 1, i.e., floor((n-r)/r) + 1.
· For the remaining (r-p) VCC chains, NW is equal to the quotient of (n-r)/r, i.e., floor((n-r)/r).
· For all VCC chains, the source/destination ports may be selected from any of the switch ports Px not selected by other VCC chains as a source or destination.
As an example, consider the 8-port switch again.
With r=3 generators, p equals mod(8-3, 3) = 2. So the first two VCC chains have NW = floor((8-3)/3) + 1 = 2 intermediate wires, and the last chain has NW = floor((8-3)/3) = 1. Figure B.7 illustrates the implementation of the VCC chains for this case. First we select the source and destination ports:
Port 1 is the input and output for the first chain, so CHin(1,1) = CHout(1,1) = P1.
Port 2 is the input and output for the second chain, so CHin(1,2) = CHout(1,2) = P2.
Port 3 is the input and output for the third chain, so CHin(1,3) = CHout(1,3) = P3.
These selections have been made to avoid any overlap. After applying the first three steps of the methodology, we obtain the configuration shown in Figure B.7. Then we apply the VCC chain algorithm. Let us start with the VCC chain having port 1 as the source. The first available wire is W1, so CH(1,1,1)=W1, and then CH(2,1,1)=W2. This VCC chain has two intermediate wires and is now complete. We continue with the VCC chain starting at port P2. The next available wire is W3 (because W1 and W2 are fully occupied by the previous VCC chain). So CH(1,1,2)=W3, and then CH(2,1,2)=W4. Similarly, for the third chain, CH(1,1,3)=W5. This VCC chain has only one intermediate wire. The VCC chain implementation is complete.

[Figure B.7 Implementation of the 8-to-8 straight configuration with 3 generators.]

b) Latency Measurements: Consider the case with the background traffic using the foreground ports in the opposite direction. The remaining n-1 ports are connected among themselves and their wires are evenly divided among the r background VCC chains. Let p = mod(n-r-1, r).
· For the first p VCC chains, NW is equal to the quotient of (n-r-1)/r plus 1, i.e., floor((n-r-1)/r) + 1.
· For the remaining (r-p) VCC chains, NW is equal to the quotient of (n-r-1)/r, i.e., floor((n-r-1)/r).
· For one of the VCC chains of the background traffic, the input and output ports coincide with the output and input ports of the foreground traffic, respectively.
· For the other VCC chains, the input and output ports can be selected from any of the switch ports Px not selected by other VCCs.
After applying the first three steps of the methodology, we obtain the configuration shown in Figure B.8. Ports P1 and P2 are used by the foreground traffic as output and input ports, respectively.

[Figure B.8 Implementation of the 7-to-7 straight configuration with 3 generators for background traffic in latency measurement.]

Ports P1 and P2 will be used as input and output ports (respectively) by one of the background VCC chains. The other two generators will use ports P3 and P4 as their input and output ports, respectively. For the first VCC chain, NW(1) = 2, and for the other two VCC chains NW(2) = NW(3) = 1. The chains are: P1-W1-W2-P2, P3-W3-P3, and P4-W4-P4. The configuration for the case when the background traffic does not share the ports with the foreground can be generated by the above procedure by considering the switch as having only n-2 ports.

B.3.3. n-to-m Partial Cross (r Generators)

This is a generalization of the n-to-m partial cross with 1 generator presented earlier in this baseline. The discussion here applies also for r=1. Also, by appropriately setting r, one can obtain non-scalable (basic) configurations.

a) Throughput Measurements: This configuration has m*r VCC chains originating from the r generators, where each generator originates m VCC chains, each carrying 1/m of the generator's load. Each intermediate wire has exactly m of these streams flowing through it. Again, the wires are evenly divided among the chains. However, since each chain uses only a part of the wire's capacity, the wires can also be used by other chains, even from other generators.
Let p = mod(n-r, r).
· For the first p VCC chains, the number of intermediate wires NW is equal to the quotient of (n-r)/r plus 1, i.e., floor((n-r)/r) + 1.
· For the remaining (r-p) VCC chains, NW is equal to the quotient of (n-r)/r, i.e., floor((n-r)/r).
· For all m VCC chains, input and output ports may be selected from any of the switch ports Px not selected by other VCC chains.
After applying the first three steps of the methodology, we obtain the configuration shown in Figure B.9 for the case of the 8-to-2 partial cross with 2 generators. Note that in this case we have exchanged the labels of wires W5 and W6. This is done because the end of the previous wire W6 (port P3) coincided with the beginning of wire W1, so going from W6 to W1 would have required a loopback on P3. In this case, p = mod(8-2, 2) = 0, so the VCC chains of both generators have floor((8-2)/2) = 3 intermediate wires.

[Figure B.9 Implementation of the 8-to-2 partial cross configuration with 2 generators for foreground traffic.]

Both of the VCC chains of the first generator start and end at port P1, so CHin(1,1) = CHout(1,1) = CHin(2,1) = CHout(2,1) = P1. Similarly, for the two VCC chains of the other generator, CHin(1,2) = CHout(1,2) = CHin(2,2) = CHout(2,2) = P2. First we divide the wires among the two generators: the first generator gets W1, W2, and W3; the second generator gets W4, W5, and W6. The first chain of the first generator is simply P1-W1-W2-W3-P1. The first chain of the second generator is P2-W4-W5-W6-P2. The second chain of the first generator is obtained by shifting the intermediate wires of the first chain; the chain is therefore P1-W2-W3-W4-P1. Note that this chain shares wire W4 of the other generator, since each chain uses only half the wire's capacity. The second chain of the second generator is again obtained by shifting: P2-W5-W6-W1-P2.

b) Latency Measurements: Again we consider only the case of background traffic sharing the foreground ports in the opposite direction.
Excluding the foreground ports and the source/destination ports of the generators, the remaining n-1-r ports are connected among themselves and their wires are evenly divided among the r generators. Let p = mod(n-r-1, r).
· For all VCCs of the first p generators, NW is equal to the quotient of (n-r-1)/r plus 1, i.e., floor((n-r-1)/r) + 1.
· For all VCCs of the remaining (r-p) generators, NW is equal to the quotient of (n-r-1)/r, i.e., floor((n-r-1)/r).
· For all m VCCs of exactly one generator, the input and output ports coincide with the output and input ports of the foreground traffic, respectively.
· For all m VCCs of all other generators, the input and output ports can be selected from any of the switch ports Px not selected by other generators.
An example of this case is shown in Figure B.10. In this case, n=8 and r=2, which gives p = mod(8-2-1, 2) = 1. Therefore, NW(1)=3 and NW(2)=2. The VCC chains of the first generator use ports P1 and P2 in the directions opposite to the foreground traffic. The VCC chains of the second generator use port P3 as both source and destination. The chains of the first generator are P1-W1-W2-W3-P2 and P1-W2-W3-W4-P2. The chains of the second generator are P3-W4-W5-P3 and P3-W5-W1-P3.

[Figure B.10 Implementation of the 7-to-2 partial cross configuration with 2 generators for background traffic in latency measurements.]

Table B.1 summarizes the values for the number of intermediate wires in the various configurations of this Section B.3. These values are used in the pseudocode of Section B.4.

[Table B.1 Parameter values used in the algorithm for creating VCC chains for different configurations.]

B.4. Internal Connection Algorithm for Creating VCC Chains

The following algorithm can be used to create VCC chains for the different connection configurations. It is based on the definitions given in Section B.2 and the characteristics specified in Section B.3 and summarized in Table B.1.
· NW(k) denotes the number of intermediate wires for the VCC chains of the kth generator.
These values are specified in Section B.3 (Table B.1).
· TNW denotes the total number of wires.
· W(f) denotes the fth wire.
· CH(i, j, k) denotes the ith intermediate wire of the jth VCC chain of the kth generator.
· The function mod*(x, n) is equal to mod(x, n), except that where mod(x, n) would be equal to zero the function is equal to n.

f = 1;
for (k = 1 to r, step 1) {
    if (k > 1) f = 1 + Sum(NW(d), d = 1 to k-1);
    for (j = 1 to m, step 1) {
        if (j > 1) f = mod*(CH(1,j-1,k) + 1, TNW);
        for (i = 1 to NW(k), step 1) {
            CH(i,j,k) = W(f);
            f = mod*(f + 1, TNW);
        } /* end for i */
    } /* end for j */
} /* end for k */

References:
[1] ITU-T Recommendation I.150, "Integrated Services Digital Network (ISDN) General Structure - B-ISDN Asynchronous Transfer Mode Functional Characteristics," ITU-T, Geneva, 1995.
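For concreteness, the Section B.4 pseudocode admits a direct executable rendering. The Python sketch below follows the definitions of Sections B.2 through B.4; wires are represented by their 1-based indices f rather than by W(f) objects, and the helper nw_values is an illustrative reading of the intermediate-wire counts for the n-to-n straight configurations (the first mod(n-r, r) chains get floor((n-r)/r) + 1 wires and the rest get floor((n-r)/r); with one port pair reserved for foreground traffic, n-r-1 replaces n-r):

```python
def mod_star(x, n):
    """mod*(x, n): like mod(x, n), but returns n where mod(x, n) is 0."""
    return (x - 1) % n + 1


def build_chains(r, m, NW, TNW):
    """Create the VCC chain matrix CH per the Section B.4 pseudocode.

    r   : number of generators
    m   : number of VCC chains per generator
    NW  : dict, NW[k] = number of intermediate wires for generator k
    TNW : total number of wires
    Returns CH as a dict: CH[(i, j, k)] = index f of wire W(f).
    """
    CH = {}
    f = 1
    for k in range(1, r + 1):
        if k > 1:
            # Each generator's first chain starts just past the wires
            # assigned to the previous generators.
            f = 1 + sum(NW[d] for d in range(1, k))
        for j in range(1, m + 1):
            if j > 1:
                # The next chain is the previous one shifted by one wire.
                f = mod_star(CH[(1, j - 1, k)] + 1, TNW)
            for i in range(1, NW[k] + 1):
                CH[(i, j, k)] = f
                f = mod_star(f + 1, TNW)
    return CH


def nw_values(n, r, latency=False):
    """Intermediate-wire counts for the n-to-n straight configurations."""
    avail = n - r - (1 if latency else 0)  # wires to divide among r chains
    q, p = divmod(avail, r)                # first p chains get one extra wire
    return {k: (q + 1 if k <= p else q) for k in range(1, r + 1)}
```

For the 8-to-2 partial cross of Figure B.9 (r=2, m=2, NW(1)=NW(2)=3, TNW=6), build_chains reproduces the four chains W1-W2-W3, W2-W3-W4, W4-W5-W6, and W5-W6-W1 listed in Section B.3.3, and nw_values reproduces the NW values of the Section B.3.2 examples.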