***********************************************************************
ATM Forum Document Number: ATM_Forum/97-0831
***********************************************************************
Title: GFR -- Providing Rate Guarantees with FIFO Buffers to TCP Traffic.
***********************************************************************
Abstract: In this contribution we present analysis and simulation results on controlling TCP rates by buffer allocation. When segment loss is low, TCP throughput depends primarily on the TCP window size and the round trip time of the connection. As a result, it is possible to control TCP rates with FIFO queuing. We present a buffer management policy called Differential Fair Buffer Allocation that provides loose rate guarantees to SACK TCP sources when the sum of the MCRs is significantly lower than the network capacity. We study the performance of differential fair buffer allocation by simulation.
************************************************************************
Source: Rohit Goyal, Raj Jain, Sonia Fahmy, Bobby Vandalore, Shiv Kalyanaraman
Department of CIS, The Ohio State University (and NASA)
395 Dreese Lab, 2015 Neil Ave, Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: {goyal,jain}@cse.wustl.edu

Sastri Kota
Lockheed Martin Telecom/Astrolink
1272 Borregas Avenue, Bldg B/551 O/GB - 70
Sunnyvale, CA 94089
Email: sastri.kota@lmco.com

Pradeep Samudra
Broadband Network Lab, Samsung Electronics Co. Ltd.
Samsung Telecom America, Inc.
1130 E Arapaho, Richardson, TX 75081
Email: psamudra@telecom.sna.samsung.com
*************************************************************************
Date: September 1997, Paris, France
*************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TM)
*************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend or modify the statements contained herein.
*************************************************************************
A postscript version of this contribution, including all figures and tables, has been uploaded to the ATM Forum ftp server in the incoming directory. It may be moved from there to the atm97 directory. The postscript version is also available on our web page as: http://www.cse.wustl.edu/~jain/atmf/a97-0831.htm
*************************************************************************

1 Introduction

Guaranteed Frame Rate (GFR) is expected to be used between ATM edge devices. For example, IP routers separated by an ATM network can have GFR VCs set up between them for data transfer. Figure 1 illustrates such a case where the ATM cloud connects several LANs and routers. ATM end systems may also establish GFR VCs for connections that can benefit from a minimum throughput guarantee.

Figure 1: Use of GFR in ATM connected LANs

In the July 1997 meeting, several proposals were made [3, 4, 8] to provide rate guarantees to TCP sources with FIFO queuing in the network. In those studies, per-VC scheduling was deemed necessary to provide rate guarantees to TCP connections.
However, all these studies were performed at high target network utilization, i.e., most of the network bandwidth was allocated to the GFR VCs. It was mentioned in the July meeting that rate guarantees should be achievable with a FIFO buffer for low rate allocations. Also, since routers would use GFR VCs, each VC would multiplex many TCP connections through it. All previous studies had examined TCP traffic with a single TCP per VC. Per-VC buffer management in such cases reduces to per-TCP buffer management. For VCs with several aggregated TCPs, per-VC control is unaware of the individual TCPs within the VC. Moreover, aggregate TCP traffic characteristics and control requirements may differ from those of single TCP streams. In this contribution, we study two main issues:

o Providing minimum rate guarantees to TCP-like adaptive traffic with a FIFO buffer for low rate allocations.

o Traffic management and control of VCs carrying aggregated TCP flows.

Section 2 discusses the behavior of TCP traffic with controlled windows. This provides insight into controlling TCP rates by controlling TCP windows. Section 3 describes the effect of buffer occupancy and thresholds on TCP throughput. Section 4 presents a simple threshold-based buffer management policy called differential fair buffer allocation that provides TCP throughputs in proportion to buffer thresholds for low rate allocations. This scheme assumes that each GFR VC may carry multiple TCP connections. We then present preliminary simulation results with differential fair buffer allocation. In our simulation and analysis, we use SACK TCP [10] as the TCP model.

2 TCP Behavior with Controlled Windows

TCP uses a window based mechanism for flow control. The amount of data sent by a TCP connection in one round trip is determined by the window size of the connection, which is the minimum of the sender's congestion window (CWND) and the receiver's window (RCVWND). As a result, the TCP rate can be controlled by controlling the window size of the connection. Most TCP sources tend to be greedy, and the network cannot directly enforce a window limit to control the TCP rate. However, TCP sources respond to packet loss by reducing the congestion window by one-half and then increasing it by one segment every round trip. As a result, the average TCP window can be controlled by discarding a packet at specific CWND values.

Figure 2 shows how the source TCP congestion window varies when a single segment is lost at a particular value of the congestion window. The figure is the CWND plot of the simulation of the configuration shown in Figure 3 with a single TCP source (N = 1). The figure shows four different values of the window at which a packet is lost. The round trip latency for the connection is 30 ms. For window based flow control, the throughput (in Mbps) can be calculated from the average congestion window (in bytes) and the round trip time (in seconds) as:

    Throughput (Mbps) = (8 x 10^-6 x Average CWND (bytes)) / (Round Trip Time (secs))

The factor 8 x 10^-6 converts the throughput from bytes per second to Megabits per second. The average TCP CWND during the linear increase phase can be calculated as:

    CWND_avg = [ SUM (i = 1 to T) of (CWND_max/2 + i x Max Segment Size) ] / T

where T is the number of round trip times for the congestion window to increase from CWND_max/2 to CWND_max. Note that this equation assumes that during the linear increase phase, the TCP window increases by one segment every round trip time. A worked example of these two calculations is sketched below.
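The following minimal Python sketch applies the two formulas above. The function names and the example drop point are our own illustrative choices; the 1024-byte segment size and 30 ms round trip time are taken from the simulation described above.

    # Sketch of the throughput and average-CWND calculations above.
    # Names and the example drop point are illustrative only.

    def avg_cwnd_linear_phase(cwnd_max, mss):
        """Average CWND while the window grows linearly from cwnd_max/2 to
        cwnd_max, increasing by one segment per round trip (no delayed ACKs)."""
        t = int(round((cwnd_max / 2) / mss))   # round trips in the linear phase
        return sum(cwnd_max / 2 + i * mss for i in range(1, t + 1)) / t

    def throughput_mbps(avg_cwnd_bytes, rtt_secs):
        """Throughput in Mbps from the average window (bytes) and RTT (seconds)."""
        return 8e-6 * avg_cwnd_bytes / rtt_secs

    if __name__ == "__main__":
        mss = 1024                 # TCP segment size used in the simulations
        cwnd_max = 122880          # illustrative drop point near the first case in Figure 2
        avg_cwnd = avg_cwnd_linear_phase(cwnd_max, mss)
        print("Average CWND = %.0f bytes" % avg_cwnd)          # ~92672 bytes
        print("Throughput   = %.2f Mbps" % throughput_mbps(avg_cwnd, 0.030))  # ~24.7 Mbps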
However, when the TCP delayed acknowledgment option is set, TCP might send an ACK only for every two segments. In that case, the window would increase by one segment every two round trip times. From Figure 2, the average congestion windows in the linear phases of the four experiments are approximately 91232 bytes, 181952 bytes, 363392 bytes and over 600000 bytes. The corresponding throughputs calculated from the above equation are 24.32 Mbps, 48.5 Mbps, 96.9 Mbps, and 125.6 Mbps (126 Mbps is the maximum possible TCP throughput for a 155.52 Mbps link with 1024 byte TCP segments). The respective throughputs obtained from the simulations of the four cases are 23.64 Mbps, 47.53 Mbps, 93.77 Mbps and 125.5 Mbps. The throughput values calculated from the average congestion windows are close to those obtained by simulation. From this, it appears that controlling the TCP window so as to maintain a desired average window size should enable the network to control the average TCP throughput.

Figure 2: Single TCP Congestion Window Control

3 TCP Window Control using Buffer Management

In the previous section, an artificial simulation was presented in which the network controlled the TCP rate by dropping a packet every time the TCP window reached a particular value. In practice, the ATM network knows neither the size of the TCP window, nor the round trip time of the connection. A switch can, however, use per-VC accounting of the TCP packets in its buffer to estimate the bandwidth used by each connection. In a FIFO buffer, the output rate of a connection is determined by the number of packets of the connection in the buffer. Let mu_i and x_i be the output rate and the buffer occupancy, respectively, of VC_i. Let mu and x be the total output rate and the total buffer occupancy of the FIFO buffer. Then, by the FIFO principle, in steady state,

    mu_i = (x_i / x) * mu    or equivalently    (x_i / x) / (mu_i / mu) = 1

If the buffer occupancy of every active VC is maintained at a desired threshold, then the output rate of each VC can also be controlled. In other words, if a VC always holds a fraction x_i/x of the cells in the buffer, its average output rate will be at least the fraction x_i/x of the total output rate.

Adaptive flows like TCP respond to segment loss by reducing their congestion window. A single packet loss is sufficient to reduce the TCP congestion window by one-half. Consider a drop policy that drops a single TCP packet from a connection every time the connection's buffer occupancy crosses a given threshold. The drop threshold for a connection then determines the maximum size to which the congestion window is allowed to grow. Because of TCP's adaptive nature, the buffer occupancy decreases after about one RTT. The drop policy drops a single packet when the TCP's buffer occupancy crosses the threshold, and then allows the buffer occupancy to grow by accepting the remainder of the TCP window. On detecting a loss, TCP remains idle for about one-half RTT, during which the buffer occupancy decreases below the threshold. Then the TCP window increases linearly (and so does the buffer occupancy), and a packet is again dropped when the buffer occupancy crosses the threshold. In this way, TCP windows can be controlled quite accurately to within one round trip time. As a result, the TCP's throughput can also be controlled by controlling the TCP's buffer occupancy. A minimal sketch of this per-connection drop decision is given below.
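The idea can be illustrated with a short Python sketch. The state variables, function names, and cell-counting granularity are our own illustration of the policy just described, not an implementation from this contribution.

    # Sketch of the single-packet-drop-at-threshold policy described above:
    # drop exactly one packet each time a connection's buffer occupancy
    # crosses its threshold, then accept the remainder of the TCP window.

    class VcState:
        def __init__(self, threshold_cells):
            self.threshold = threshold_cells    # per-VC drop threshold r_i (cells)
            self.occupancy = 0                  # cells of this VC currently buffered (x_i)
            self.dropped_this_cycle = False     # one drop per upward threshold crossing

    def on_packet_arrival(vc, packet_cells):
        """Return True to accept the packet, False to drop it."""
        if vc.occupancy < vc.threshold:
            vc.dropped_this_cycle = False       # occupancy is below r_i again; re-arm
            vc.occupancy += packet_cells
            return True
        if not vc.dropped_this_cycle:
            vc.dropped_this_cycle = True        # drop exactly one packet per crossing
            return False
        vc.occupancy += packet_cells            # accept the rest of the TCP window
        return True

    def on_cells_served(vc, cells):
        vc.occupancy = max(0, vc.occupancy - cells)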
Figure 3: N source configuration

We performed simulations of the TCP configuration in Figure 3 with fifteen TCP sources. Each TCP source was on a separate UBR VC. Five different buffer thresholds (r_i) were selected, and each group of three TCPs had the same buffer threshold. Table 1 lists the buffer thresholds for the VCs in the FIFO buffer of the switches. We chose four different sets of thresholds, as shown by the threshold columns. The last row in the table shows the total buffer allocated (r) to all the TCP connections for each simulation. The total buffer size was large (48000 cells) so that there was enough space for the buffer occupancies to grow after the single packet drop. The buffer thresholds were selected in proportion to the SCR of the connection: for each connection, the ratio of the threshold to the total buffer size was proportional to the ratio of the SCR to the PCR. For a 155.52 Mbps link (approximately 353200 cells per second) and a buffer size of 48000 cells, the total target utilizations were 29%, 43%, 57% and 71% respectively.

Table 1: Fifteen TCP buffer thresholds
  TCP number          Threshold per TCP (cells) (r_i)
  1-3                   305    458    611    764
  4-6                   611    917   1223   1528
  7-9                   917   1375   1834   2293
  10-12                1223   1834   2446   3057
  13-15                1528   2293   3057   3822
  Total threshold     13752  20631  27513  34392

Table 2 shows the average throughput obtained per TCP in each threshold category for the four simulations. The TCP throughputs were averaged over each category to reduce the effects of randomness. The last row of the table shows the total throughput obtained in each simulation. The proportion of the buffer usable by each TCP (r_i/r) before the single packet drop should determine the proportion of the throughput achieved by that TCP. Table 3 shows the ratios (mu_i/mu)/(r_i/r) for each simulation. All ratios are close to 1. This indicates that the TCP throughputs are indeed proportional to the buffer allocations. The variations (not shown in the table) from the mean TCP throughputs increased as the total buffer thresholds increased (from left to right across the table). This is because the TCPs suffered higher packet loss due to the reduced room to grow beyond the threshold. Thus, very high buffer utilization produced more variation in the achieved rate (last column of Table 3), whereas in the low utilization cases, the resulting throughputs were in proportion to the buffer usage.

Table 2: Fifteen TCP throughputs
  TCP number          Throughput per TCP (Mbps)
  1-3                    2.78    2.83    2.95    3.06
  4-6                    5.45    5.52    5.75    5.74
  7-9                    8.21    8.22    8.48    8.68
  10-12                 10.95   10.89   10.98    9.69
  13-15                 14.34   13.51   13.51   13.93
  Total throughput     125.21  122.97  125.04  123.35

Table 3: Fifteen TCP buffer:throughput ratio
  TCP number          (mu_i/mu)/(r_i/r)
  1-3                   1.00   1.03   1.02   1.08
  4-6                   0.98   1.01   1.03   1.04
  7-9                   0.98   1.00   1.00   1.02
  10-12                 0.98   0.99   0.98   0.88
  13-15                 1.02   0.98   0.97   1.01

Figure 4: 15 TCP rate control by packet drop

Figure 4 shows the congestion window of one TCP from each TCP category for each of the four simulations. For each simulation, the slopes of the graphs during the linear increase are approximately the same for every TCP, i.e., for a given simulation, the rate of increase of CWND is the same for all TCPs. We know that TCP windows increase by one segment every round trip time. Thus, we can conclude that for a given simulation, TCPs sharing the FIFO buffer experience similar queuing delays regardless of the individual per-connection thresholds at which their packets are dropped. This is because, if all TCPs' buffer occupancies are close to their respective thresholds, then a packet arriving at the buffer is queued behind roughly the same number of cells (about the sum of the thresholds), regardless of the connection to which it belongs. Consequently, each TCP experiences the same average queuing delay. A worked check of one entry of Table 3 is sketched below.
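As a quick numerical check of the proportionality claim, the following Python fragment (ours, not part of the simulation tooling) recomputes the first entry of Table 3 from the first columns of Tables 1 and 2:

    # Worked check of one Table 3 entry: (mu_i/mu) / (r_i/r) for TCPs 1-3
    # in the first simulation (first column of Tables 1 and 2).

    def buffer_throughput_ratio(tput_i, tput_total, thresh_i, thresh_total):
        return (tput_i / tput_total) / (thresh_i / thresh_total)

    ratio = buffer_throughput_ratio(2.78, 125.21, 305, 13752)
    print(round(ratio, 2))   # ~1.00, matching the first entry of Table 3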
However, as the total buffer threshold increases, the round trip time for each TCP increases because of the larger total queue size. The larger threshold also results in a larger congestion window at which a packet is dropped, and a larger congestion window means that TCP can send more segments in one round trip time. But since the round trip time also increases in proportion to the increase in CWND (because all 15 TCPs are bottlenecked at the first switch), the average throughput achieved by a single TCP remains almost the same across the simulations (see Table 2). The following list summarizes the discussion so far:

1. TCP throughput can be controlled by controlling its congestion window, which, in turn, can be controlled by setting buffer thresholds at which packets are dropped. This assumes that when the offered load is low and a queue never builds up, the TCP is allowed to use as much capacity as it can.

2. With a FIFO buffer, the average throughput achieved by a connection is proportional to the fraction of the buffer occupancy that is consumed by the connection's cells.

3. As long as the fraction of buffer occupancy of a TCP can be controlled, its relative throughput is independent of the total number of packets in the buffer, and depends primarily on the fraction of packets of that TCP in the buffer.

4. At very high buffer utilization, packets may be dropped because no buffer is available. This results in larger variations in TCP throughputs. At very high thresholds, the queuing delay also increases significantly, and may cause the TCP sources to time out.

5. At very low buffer thresholds (high loss rates), TCP sources become unstable and tend to time out. Also, very low buffer occupancies result in low network utilization. Since TCP can maintain a flow of one CWND worth of packets each round trip time, a total buffer occupancy of one bandwidth-delay product should provide good utilization.

4 Differential Fair Buffer Allocation

The differential fair buffer allocation scheme uses the ideas from the previous sections to provide soft rate guarantees to SACK-TCP-like adaptive traffic on ATM connections under low network load conditions. Simulation results for heterogeneous TCP and non-TCP environments will be presented in a future contribution. The policy assumes that multiple TCP connections may be multiplexed on a single VC. In this section we present the preliminary design and simulation results of differential fair buffer allocation. A parameter study and sensitivity analysis will be presented in a future contribution.

We assume a model in which TCPs may be merged into a single VC, in which case the cells of different frames within a VC are not interleaved. This allows the network to drop frames without having to identify the source that generated the frame. The switch output port consists of a FIFO buffer for the UBR class with the following attributes:

o K: Buffer size in cells.

o R: Congestion threshold in cells (0 <= R <= K). EPD is performed when the buffer occupancy is greater than R.

o Wi: Weight of VCi (for example, Wi = MCRi/(Total UBR capacity)), with Wi < 1.

o Ri: Threshold for VCi. Here we use Ri = Wi x R.

o X: Number of cells in the buffer.

o Xi: Number of cells of VCi in the buffer.

o Z: Parameter (Z > 1). We selected Z = 1.5 in our simulations.

o u: Uniform(0,1) random number.

Figure 5: Drop behavior of differential fair buffer allocation

A short sketch of how these per-VC parameters can be derived from the MCRs appears below; the drop rule itself is described after the sketch.
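The following Python fragment shows how Wi, Ri and the per-VC EPD limit Z x Ri follow from the definitions above. The function name, the value of R, and the example MCR are our own illustrative choices, not values from this contribution.

    # Sketch: deriving the per-VC DFBA parameters listed above.

    def dfba_vc_params(mcr_cells_per_sec, total_capacity_cps, R, Z=1.5):
        """Return (Wi, Ri, epd_limit) for a VC with the given MCR."""
        Wi = mcr_cells_per_sec / total_capacity_cps   # Wi = MCRi / total UBR capacity
        Ri = Wi * R                                   # per-VC threshold Ri = Wi x R
        return Wi, Ri, Z * Ri                         # per-VC EPD beyond Z x Ri

    # Example: 155.52 Mbps link (~353200 cells/s), R = 30000 cells (illustrative),
    # and a VC whose MCR is 10% of the link rate.
    Wi, Ri, limit = dfba_vc_params(35320, 353200, R=30000)
    print(Wi, Ri, limit)   # 0.1, 3000.0, 4500.0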
When the first cell of a frame arrives at the buffer, if the number of cells (Xi) of VCi in the buffer is less than its threshold (Ri), then the cell and the frame are accepted into the buffer. If Xi is greater than Ri, and if the total buffer occupancy (X) is greater than the buffer threshold (R), or if Xi is greater than Z x Ri, then the cell and the frame are dropped (EPD). Thus Z specifies the maximum per-VC buffer occupancy during congestion periods. Under low or mild load conditions, the per-VC limit Z x Ri should be large enough to buffer a burst of cells without having to perform EPD.

If Xi is greater than Ri and X is less than R, then the cell and frame are dropped probabilistically. The probability of frame drop depends on how far Xi is above Ri, as well as on the weight (Wi) of the connection. As Xi increases beyond Ri, the probability of drop increases. Also, the drop probability is higher for connections with a higher threshold. This is because TCP flows with larger windows (due to higher thresholds) are more robust to packet loss than TCP flows with smaller windows. Moreover, in the case of merged TCPs over a single VC, VCs with a high threshold are likely to carry more active TCP flows than those with a low threshold, so a higher drop probability is more likely to hit several TCP sources and improve the fairness within a VC. The frame is dropped with probability

    P{drop} = Wi x (Xi - Ri) / (Z x Ri - Ri)

The drop probability may be further scaled depending on the desired level of control. In addition, if Xi is greater than Ri, then all tagged frames may also be dropped. Tagging support has not yet been tested for this drop policy. The resulting algorithm works as follows. When the first cell of a frame arrives:

    IF (Xi < Ri) THEN
        ACCEPT CELL AND FRAME
    ELSE IF ((X < R) AND (Xi < Z*Ri) AND (Cell NOT Tagged) AND (u > Wi*(Xi - Ri)/(Ri*(Z - 1)))) THEN
        ACCEPT CELL AND FRAME
    ELSE
        DROP CELL AND FRAME
    ENDIF

Figure 6: N source VC merge configuration

Figure 6 illustrates the 15 TCP configuration in which groups of three TCPs are merged into a single VC. Each local switch merges its 3 TCPs into one VC. The backbone link carries 5 VCs, each with 3 TCPs. The local switches ensure that the cells of frames within a single VC are not interleaved. All the switches implement the differential fair buffer allocation policy described above. The local switches use a separate threshold for each TCP, while the backbone switches use a separate threshold for each VC.

We simulated the 15 merged-TCP configuration with 3 different buffer threshold sets. The parameter Z was set to 1.5; therefore, EPD was performed for a VC when its buffer occupancy exceeded 1.5 x Ri. Table 4 shows the thresholds used for each VC at the first bottleneck switch. Table 5 shows the ratio (mu_i/mu)/(r_i/r) for each VC for the configuration in Figure 6 and the corresponding thresholds.

Table 4: Differential FBA thresholds
  VC number    Threshold (cells)
  1              152    305    611
  2              305    611   1223
  3              458    917   1834
  4              611   1223   2446
  5              764   1528   3057
  Total         2290   4584   9171

Table 5: Differential FBA simulation results
  VC number    Ratio (mu_i/mu)/(r_i/r)
  1              1.04   1.01   1.16
  2              1.05   1.02   1.06
  3              0.97   0.99   1.05
  4              0.93   1.00   1.13
  5              1.03   0.99   0.80

In all cases, the achieved link utilization was almost 100%. The tables show that the TCP throughputs obtained were in proportion to the buffers allocated (most of the ratios in Table 5 are close to 1). The highest variation (not shown in the table) was seen in the last column because of the high threshold values.
In our simulations, the maximum observed queue sizes (in cells) at the first backbone switch (the main bottleneck) were 3185, 5980 and 11992 respectively, while the total allocated buffer thresholds were 2290, 4584 and 9171 cells for the three experiments. For a given buffer size, the allocated thresholds represent the SCRs of the connections. At higher rate allocations, the drop policy cannot provide tight bounds on throughput.

Ideally, differential buffer allocation should allocate all excess capacity in proportion to the buffer allocations. The results in Table 5 suggest that it does, especially for low allocations. In all cases, the total TCP throughput is over 95% of the total link capacity. However, excess bandwidth may not always be allocated linearly in proportion to the buffer allocations. The fairness properties of differential fair buffer allocation are a topic of further study.

5 Summary and Future Work

In this contribution, we have used FIFO buffers, together with differential buffer allocation, in an attempt to control SACK TCP rates. An optimal set of thresholds should be selected that is high enough to provide sufficient network utilization, yet low enough to allow stable operation. The achieved TCP throughputs are in proportion to the fraction of the average buffer occupancy used by the VC. Much work remains to be done to extend differential fair buffer allocation to a variety of configurations. In particular:

o We have only studied the performance of differential fair buffer allocation with SACK TCP. Its performance with heterogeneous TCPs is a topic of further study.

o We have not studied the effect of non-adaptive traffic (like UDP) on the drop policy. It appears that for non-adaptive traffic, the thresholds must be set lower than those for adaptive traffic (for the same MCR), and the dropping should be stricter when the buffer occupancy crosses the threshold.

o More simulation configurations need to be studied, including TCPs with different RTTs and different topologies.

o In this contribution we have not studied the effect of network-based tagging in the context of GFR. In the strict sense, GFR only provides a low CLR guarantee to the CLP=0 cell stream, i.e., the cells that were not tagged by the source and that passed the GCRA conformance test. However, when source-based tagging (the source could be a non-ATM network element such as a router) is not performed, it is not clear whether the CLP=0 stream has any significance over the CLP=1 stream. Moreover, network tagging is an option that must be signaled during connection establishment.

References

[1] ATM Forum, "ATM Traffic Management Specification Version 4.0," April 1996. ftp://ftp.atmforum.com/pub/approved-specs/af-tm-0056.000.ps

[2] Debashis Basak and Surya Pappu, "TCP over GFR Implementation with Different Service Disciplines: A Simulation Study."

[3] Debashis Basak and Surya Pappu, "GFR Implementation Alternatives with Fair Buffer Allocation Schemes," ATM Forum 97-0528.

[4] Olivier Bonaventure, "A simulation study of TCP with the proposed GFR service category," DAGSTUHL Seminar 9725, High Performance Networks for Multimedia Applications, Germany, June 1997.

[5] John B. Kenney, "Satisfying UBR+ Requirements via a New VBR Conformance Definition," ATM Forum 97-0185.

[6] John B. Kenney, "Open Issues on GFR Frame Discard and Frame Tagging," ATM Forum 97-0385.

[7] Juha Heinanen and Kalevi Kilkki, "A fair buffer allocation scheme," unpublished manuscript.

[8] R. Goyal, R. Jain, et al., "Simulation Experiments with Guaranteed Frame Rate for TCP/IP Traffic," ATM Forum 97-0607.
[9] R. Goyal, R. Jain, S. Kalyanaraman, S. Fahmy and Seong-Cheol Kim, "UBR+: Improving Performance of TCP over ATM-UBR Service," Proc. ICC'97, June 1997.

[10] R. Goyal, R. Jain, et al., "Selective Acknowledgements and UBR+ Drop Policies to Improve TCP/UBR Performance over Terrestrial and Satellite Networks," to appear, Proc. IC3N'97, September 1997.

[11] Roch Guerin and Juha Heinanen, "UBR+ Service Category Definition," ATM Forum 96-1598, December 1996.

[12] Roch Guerin and Juha Heinanen, "UBR+ Enhancements," ATM Forum 97-0015, February 1997.

Note: All our papers and ATM Forum contributions are available from http://www.cse.wustl.edu/~jain