***********************************************************************
ATM Forum Document Number: ATM_Forum/96-1269
***********************************************************************
Title: Performance of TCP over UBR+
***********************************************************************
Abstract: ATM switches respond to UBR congestion by dropping cells when their buffers become full. TCP connections using the UBR service experience low throughput and low fairness. For 100% TCP throughput, each switch needs buffers equal to the sum of the window sizes of all the TCP connections. Intelligent drop policies can improve the performance of TCP over UBR with limited buffers. The UBR+ service consists of enhancements to UBR for intelligent drop. We found that Early Packet Discard (EPD) improves throughput but does not improve fairness. Selective packet drop based on per-connection buffer occupancy improves fairness. The Fair Buffer Allocation scheme further improves both throughput and fairness.
***********************************************************************
Source: Rohit Goyal, Raj Jain, Shiv Kalyanaraman, Sonia Fahmy
The Ohio State University, Department of CIS
Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

Seong-Cheol Kim
Samsung Electronics Co. Ltd.
Chung-Ang Newspaper Bldg.
8-2, Karak-Dong, Songpa-Ku
Seoul, Korea 138-160
Email: kimsc@metro.telecom.samsung.co.kr

The presentation of this contribution at the ATM Forum is sponsored by NASA.
***********************************************************************
Date: October 1996
***********************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TM)
***********************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend or modify the statements contained herein.
***********************************************************************
A postscript version of this contribution, including all figures and tables, has been uploaded to the ATM Forum ftp server in the incoming directory. It may be moved from there to the atm96 directory. The postscript version is also available on our web page as:
http://www.cse.wustl.edu/~jain/atmf/atm96-1269.ps (814 kB)
or PKzip compressed
http://www.cse.wustl.edu/~jain/atmf/atm96-1269.zip (114 kB)

1 Introduction

The Unspecified Bit Rate (UBR) service provided by ATM networks has no explicit congestion control mechanisms [8]. However, it is expected that many TCP implementations will use the UBR service category. TCP employs a window-based end-to-end congestion control mechanism to recover from segment loss and avoid congestion collapse. Several studies have analyzed the performance of TCP over the UBR service [1, 4, 11]. TCP sources running over ATM switches with limited buffers experience low throughput and high unfairness [2, 3, 7, 10]. Studies have shown that intelligent drop policies at switches can improve the throughput of transport connections. Early Packet Discard (EPD) [1], proposed by Romanow and Floyd, has been shown to improve TCP throughput but not fairness [7].
A policy for selective cell drop based on per-VC accounting can be used to improve fairness. Enhancements that perform intelligent cell drop at the switches need to be developed for UBR to improve transport layer throughput and fairness. Heinanen and Kilkki [6] have designed a drop policy called Fair Buffer Allocation (FBA) that attempts to improve fairness among connections. The FBA scheme selectively drops complete packets from a connection based on the connection's buffer occupancy. The scheme uses a FIFO buffer at the switch and performs some per-VC accounting to keep track of each VC's buffer occupancy. FBA tries to allocate a fair share of bandwidth to competing sources by managing the amount of buffer space used by each connection.

In this contribution, we analyze several enhancements to the ATM UBR service category. This enhanced service category is called UBR+ because it maintains the simplicity of UBR and performs congestion control without explicit feedback control mechanisms. UBR+ improves throughput and fairness by intelligent cell drop policies. We describe the performance of TCP over UBR and its various enhancements. We first discuss the congestion control mechanisms in the TCP protocol and explain why these mechanisms can result in low throughput during congestion. We then describe the simulation setup used for all our experiments and define our performance metrics. We present the performance of TCP over vanilla UBR and explain why it results in poor performance. We then describe the Early Packet Discard scheme and present simulation results of TCP over UBR with EPD. Next, we present a simple selective drop policy based on per-VC accounting. This is a simpler version of the Fair Buffer Allocation scheme proposed by Heinanen and Kilkki. We present an analysis of the operation of these schemes and the effect of their parameters. We also provide guidelines for choosing the best FBA parameters.

2 TCP congestion control

TCP relies on a window-based protocol for congestion control. TCP connections provide end-to-end flow control to limit the number of packets in the network. The flow control is enforced by two windows. The receiver's window (RCVWND) is enforced by the receiver as a measure of its buffering capacity. The congestion window (CWND) is kept at the sender as a measure of the capacity of the network. The sender sends data one window at a time and cannot send more than the minimum of RCVWND and CWND into the network.

The TCP congestion control scheme consists of the "slow start" and "congestion avoidance" phases. The variable SSTHRESH is maintained at the source to distinguish between the two phases. The source starts transmission in the slow start phase by sending one segment (typically 512 bytes) of data, i.e., CWND = 1 TCP segment. When the source receives an acknowledgment for a new segment, it increments CWND by 1 segment. Since the time between the sending of a segment and the receipt of its ack is an indication of the round trip time (RTT) of the connection, CWND doubles every round trip time during the slow start phase. The slow start phase continues until CWND reaches SSTHRESH (typically set to 64K bytes), and then the congestion avoidance phase begins. During the congestion avoidance phase, the source increases its CWND by 1/CWND every time a segment is acknowledged.
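To make the window arithmetic concrete, the following is a minimal sketch (in Python, with the window counted in segments) of the two growth rules just described. The class name and default values are illustrative and are not taken from any particular TCP implementation; timeout behavior is discussed in the next paragraphs.

    # Minimal sketch of TCP window growth as described above (illustrative only).
    class TcpWindow:
        def __init__(self, rcvwnd=128, ssthresh=128):
            self.cwnd = 1.0           # congestion window, in segments
            self.ssthresh = ssthresh  # slow start threshold, in segments
            self.rcvwnd = rcvwnd      # receiver's advertised window, in segments

        def on_new_ack(self):
            """Called once for each newly acknowledged segment."""
            if self.cwnd < self.ssthresh:
                self.cwnd += 1.0              # slow start: +1 per ack, doubles per RTT
            else:
                self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

        def send_window(self):
            # The source never sends more than min(CWND, RCVWND).
            return min(self.cwnd, self.rcvwnd)

With these rules, CWND roughly doubles every round trip time below SSTHRESH and grows by about one segment per round trip time above it.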
The slow start and the congestion avoidance phases correspond to an exponential increase and a linear increase of the congestion window every round trip time, respectively.

If a TCP connection loses a packet, the destination responds by sending duplicate acks for each out-of-order packet received. The source maintains a retransmission timeout for the last unacknowledged packet. The timeout value is reset each time a new segment is acknowledged. The source detects congestion when the retransmission timeout triggers. At this point, the source sets SSTHRESH to half of CWND. More precisely, SSTHRESH is set to max(2, min(CWND/2, RCVWND)). CWND is set to one. As a result, CWND < SSTHRESH and the source enters the slow start phase. The source then retransmits the lost segment and increases its CWND by one every time a new segment is acknowledged. The source proceeds to retransmit all the segments since the lost segment before transmitting any new segments. This corresponds to a go-back-N retransmission policy. Note that although the congestion window may increase beyond the advertised receiver window (RCVWND), the source window is limited by the minimum of the two. The typical changes in the source window plotted against time are shown in Figure 1.

Figure 1: TCP CWND vs Time

Most TCP implementations use a 500 ms timer granularity for the retransmission timeout. The TCP source estimates the round trip time (RTT) of the connection by measuring the time (number of ticks of the timer) between the sending of a segment and the receipt of the ack for that segment. The retransmission timer is calculated as a function of the estimates of the average and mean deviation of the RTT [12]. Because of coarse-grained TCP timers, when there is loss due to congestion, significant time may be lost waiting for the retransmission timeout to trigger. The source does not send any new segments while duplicate acks are being received. When the retransmission timeout triggers, the connection enters the slow start phase. As a result, the link may remain idle for a long time and experience low utilization. Moreover, the sender attempts to retransmit all the segments since the lost segment. Many of these may be discarded at the destination if the latter had cached the out-of-order segments. Coarse granularity TCP timers and retransmission of segments by the go-back-N policy are the main reasons that TCP sources can experience low throughput and high file transfer delays during congestion.

TCP Reno includes the fast retransmit and fast recovery algorithms that improve TCP performance when a single segment is lost. However, on high bandwidth links, network congestion can result in several dropped segments. In this case, fast retransmit and recovery are not able to recover from the loss and slow start is triggered. In our experiments, typical losses are due to congestion and result in multiple segments being dropped. Therefore, we study TCP without fast retransmit and recovery running over UBR.

3 The Simulation Experiment

3.1 Simulation Model

All simulations presented in this contribution are performed on the N-source configuration shown in Figure 2. The configuration consists of N identical TCP sources that send data whenever allowed by the window. The switches implement the UBR service with the optional drop policies described in this contribution. The following simulation parameters are used [11]:

Figure 2: The N-source TCP configuration

-The configuration consists of N identical TCP sources as shown in Figure 2.
-All sources are infinite TCP sources. The TCP layer always sends a segment as long as it is permitted by the TCP window.
-All link delays are 5 microseconds for LANs and 5 milliseconds for WANs.
-All link bandwidths are 155.52 Mbps.
-Peak Cell Rate is 155.52 Mbps.
-The traffic is unidirectional. Only the sources send data. The destinations send only acknowledgments.
-TCP Fast Retransmit and Recovery are disabled. This isolates the slow start and congestion avoidance behavior of TCP. Moreover, Fast Retransmit and Recovery are unable to handle the multiple packet losses seen in our simulations.
-The TCP segment size is set to 512 bytes. This is the standard value used by current TCP implementations. Larger segment sizes have been reported to produce higher TCP throughputs, but these have not been implemented in real TCP protocol stacks.
-TCP timer granularity is set to 100 ms. This affects the triggering of the retransmission timeout due to packet loss. The value used in most TCP implementations is 500 ms, and some implementations use 100 ms. Several other studies have used smaller TCP timer granularities and have obtained higher throughput numbers. The timer granularity is an important factor in determining the amount of time lost during congestion: a small granularity results in less time being lost waiting for the retransmission timeout to trigger, which gives faster recovery and higher throughput. However, TCP implementations do not use timer granularities of less than 100 ms, and producing results with lower granularity artificially increases the throughput.
-TCP maximum receiver window size is 64K bytes for LANs. This is the default value used in TCP. For WANs, this value is not enough to fill the pipe and reach full throughput. In the WAN simulations we use the TCP window scaling option to scale the window to approximately the bandwidth-delay product of one RTT. The window size used for WANs is 600000 bytes.
-The TCP delayed ack timer is NOT set. Segments are acked as soon as they are received.
-The duration of the simulation runs is 10 seconds for LANs and 20 seconds for WANs.
-All TCP sources start and stop at the same time. There is no processing delay, delay variation or randomization in any component of the simulation. This highlights the effects of TCP synchronization, as discussed later.

3.2 Performance Metrics

The performance of TCP over UBR is measured by efficiency and fairness, which are defined as follows:

Efficiency = (Sum of TCP throughputs) / (Maximum possible TCP throughput)

The TCP throughputs are measured at the destination TCP layers. Throughput is defined as the total number of bytes delivered to the destination application divided by the total simulation time. The results are reported in Mbps. The maximum possible TCP throughput is the throughput attainable by the TCP layer running over UBR on a 155.52 Mbps link. For 512 bytes of data (the TCP maximum segment size), the ATM layer receives 512 bytes of data + 20 bytes of TCP header + 20 bytes of IP header + 8 bytes of LLC header + 8 bytes of AAL5 trailer. These are padded to produce 12 ATM cells, so each TCP segment results in 12 x 53 = 636 bytes at the ATM layer. From this, the maximum possible throughput = 512/636 = 80.5%, or approximately 125.2 Mbps on a 155.52 Mbps link.

Fairness Index = (Sum(xi))^2 / (n x Sum(xi^2))

where xi is the throughput of the ith TCP source and n is the number of TCP sources. The fairness index metric applies well to the n-source symmetrical configuration. For more general configurations with upstream bottlenecks, the max-min fairness criteria [5] can be used.
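As a concrete illustration of these metrics, the sketch below computes the maximum possible TCP throughput and the two metrics for a hypothetical set of per-source throughputs. The values in the throughputs list are made up for the example; the link rate, segment size, and overheads are those listed above.

    # Illustrative computation of the efficiency and fairness metrics defined above.
    LINK_RATE_MBPS = 155.52
    MSS = 512                              # TCP segment size in bytes
    OVERHEAD = 20 + 20 + 8 + 8             # TCP + IP + LLC headers + AAL5 trailer
    CELLS = -(-(MSS + OVERHEAD) // 48)     # AAL5 payload padded to whole cells -> 12
    ATM_BYTES = CELLS * 53                 # 636 bytes on the wire per segment

    max_tcp_throughput = LINK_RATE_MBPS * MSS / ATM_BYTES   # about 125.2 Mbps

    throughputs = [30.0, 28.5, 25.0, 24.0, 18.0]            # per-source Mbps (example values)

    efficiency = sum(throughputs) / max_tcp_throughput
    fairness = sum(throughputs) ** 2 / (len(throughputs) * sum(x * x for x in throughputs))

    print("Efficiency = %.2f, Fairness = %.2f" % (efficiency, fairness))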
4 TCP over UBR

In its simplest form, an ATM switch implements a tail drop policy. When a cell arrives at the FIFO queue, if the queue is full, the cell is dropped; otherwise the cell is accepted. If a cell is dropped, the TCP source loses time waiting for the retransmission timeout. Even though the TCP congestion mechanisms effectively recover from loss, the resulting throughput can be very low. It is also known that simple FIFO buffering with tail drop results in excessive wasted bandwidth. Tail drop of ATM cells results in the receipt of incomplete segments: when part of a segment is dropped at the switch, the incomplete segment is discarded at the destination during reassembly. This wasted bandwidth further reduces the effective TCP throughput.

We simulate 5 and 15 TCP sources with finite-buffered switches. The simulations are performed with three values of switch buffer size for both LAN and WAN links. For the WAN experiments, we choose buffer sizes of approximately k times the bandwidth-delay product of the connection for k = 1, 2, and 3. Thus, we select WAN buffer sizes of 12000, 24000 and 36000 cells. These values are chosen because most feedback control mechanisms can achieve steady state in a fixed number of round trip times, and have similar buffer requirements for zero loss at the switch [9]. It is interesting to assess the performance of vanilla UBR in this situation. For LANs, one round trip time's worth of bandwidth is a very small number of cells (11 cells) and is not practical as a buffer size. For LAN links, the buffer sizes chosen are 1000, 2000, and 3000 cells. These numbers are closer to the buffer sizes of current LAN switches.

Column 4 of Tables 2 and 3 shows the efficiency and fairness values, respectively, for these experiments. Several observations can be made from these results.

-TCP over vanilla UBR results in low fairness in both LAN and WAN configurations. This is due to TCP synchronization effects. TCP connections are synchronized when their sources time out and retransmit at the same time. This occurs because packets from all sources are dropped, forcing them to enter the slow start phase. However, in this case, when the switch buffer is about to overflow, one or two connections get lucky and their entire windows are accepted, while segments from all other connections are dropped. All these connections wait for a timeout and stop sending data into the network. The connections that were not dropped send their next window and keep filling up the buffer. All other connections time out and retransmit at the same time. This results in their segments being dropped again, and the synchronization effect is seen. The sources that escape the synchronization get most of the bandwidth.

-The default TCP maximum window size leads to low efficiency in LANs. The LAN simulations have very low efficiency values (less than 50%), while the WAN simulations have higher efficiency values. For LANs, the TCP receiver window size (65535 bytes) corresponds to more than 1500 cells at the switch for each source. For 5 sources and a buffer size of 1000 cells, the sum of the window sizes is almost 8 times the buffer size. For the WAN simulations, with 5 sources and a buffer size of 12000 cells, the sum of the window sizes is less than 6 times the buffer size. Moreover, the larger RTT in WANs allows more cells to be cleared out before the next window is seen. As a result, the WAN simulations have higher throughputs than the LANs.
For LAN experiments with smaller window sizes (less than the default), higher efficiency values are seen.

-Efficiency typically increases with increasing buffer size. Larger buffer sizes result in more cells being accepted before loss occurs, and therefore higher efficiency. This is a direct result of the dependence of the buffer requirements on the window sizes.

TCP performs best when there is zero loss. In this situation, TCP is able to fill the pipe and fully utilize the link bandwidth. During the exponential rise phase (slow start), TCP sources send out two segments for every segment that is acked. For N TCP sources, in the worst case, a switch can receive a whole window's worth of segments from N-1 sources while it is still clearing out segments from the window of the Nth source. As a result, the switch can have buffer occupancies of up to the sum of all the TCP maximum sender window sizes. This is especially true for connections with very small propagation delays. For large propagation delays, the switch has more time to clear out a segment before it sees the two segments which resulted from the ack.

Table 1 contains the simulation results for TCP running over the UBR service with infinite buffering. The maximum queue length numbers give an indication of the buffer sizes required at the switch to achieve zero loss for TCP. The connections achieve 100% of the possible throughput and perfect fairness.

Table 1: TCP over UBR: Buffer requirements for zero loss
-----------------------------------------------------------------
Number of   Configuration   Efficiency   Fairness   Maximum Queue
Sources                                             (Cells)
-----------------------------------------------------------------
5           LAN             1            1            7591
15          LAN             1            1           22831
5           WAN             1            1           59211
15          WAN             1            1          196203
-----------------------------------------------------------------

For the five-source LAN configuration, the maximum queue length is 7591 cells = 7591/12 segments = 633 segments, or approximately 323883 bytes. This is approximately equal to the sum of the TCP window sizes (65535 x 5 = 327675 bytes). For the five-source WAN configuration, the maximum queue length is 59211 cells = 2526336 bytes. This is slightly less than the sum of the TCP window sizes (600000 x 5 = 3000000 bytes). This is because the switch has 1 RTT to clear out almost 500000 bytes of TCP data (at 155.52 Mbps) before it receives the next window of data. In any case, the increase in buffer requirement is proportional to the number of sources in the simulation. The maximum queue is reached just when the TCP connections reach their maximum window. After that, the window stabilizes and TCP's self-clocking congestion mechanism puts one segment into the network for each segment that leaves the network. For a switch to guarantee zero loss for TCP over UBR, the amount of buffering required is equal to the sum of the TCP maximum window sizes of all the TCP connections.

5 UBR+: Early Packet Discard

The Early Packet Discard (EPD) policy [1] has been suggested to remedy some of the problems with tail drop switches. EPD drops complete packets instead of partial packets. As a result, the link does not carry incomplete packets that would have been discarded during reassembly. A threshold R, less than the buffer size, is set at the switch. When the switch queue length exceeds this threshold, all cells from any new packets are dropped. Packets that had been partly received before the threshold was exceeded are still accepted if there is buffer space. In the worst case, the switch could have received one cell from each of the N connections before its buffer exceeded the threshold. To accept all of these incomplete packets, there should be additional buffer capacity equal to the sum of the packet sizes of all the connections. Typically, the threshold R should therefore be set to the buffer size minus N times the maximum packet size, where N is the expected number of connections active at one time. The EPD algorithm used in our simulations is the one suggested by [3, 10].
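The following is a minimal sketch of the per-cell EPD decision described above, assuming each arriving cell carries a VC identifier and an end-of-packet indication. The class and variable names are ours for illustration and are not taken from [1, 3, 10].

    # Illustrative EPD logic: once the queue crosses the threshold R, cells that
    # start a new packet are discarded; cells of packets already being accepted
    # continue to be admitted while buffer space remains.
    class EpdQueue:
        def __init__(self, buffer_size, threshold):
            self.K = buffer_size       # total buffer, in cells
            self.R = threshold         # EPD threshold, in cells (R < K)
            self.occupancy = 0         # current queue length, in cells
            self.accepting = {}        # vc -> True if the current packet is being accepted

        def on_cell_arrival(self, vc, end_of_packet):
            if vc not in self.accepting:
                # First cell of a new packet: accept the packet only if below the threshold.
                self.accepting[vc] = self.occupancy <= self.R
            accept = self.accepting[vc] and self.occupancy < self.K
            if accept:
                self.occupancy += 1    # enqueue the cell
            if end_of_packet:
                del self.accepting[vc]  # the next cell from this VC starts a new packet
            return accept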
Column 5 of Tables 2 and 3 shows the efficiency and fairness, respectively, of TCP over UBR with EPD. The switch thresholds are selected so as to allow one entire packet from each connection to arrive after the threshold is exceeded. We use thresholds of (buffer size - 200) cells in our simulations; 200 cells are enough to hold one packet each from all 15 TCP connections. This reflects the worst case scenario in which all fifteen connections have received the first cell of their packet and then the buffer occupancy exceeds the threshold.

Tables 2 and 3 show that EPD improves the efficiency of TCP over UBR, but it does not improve fairness. This is because EPD indiscriminately discards complete packets from all connections without taking into account their current rates or buffer utilizations. When the buffer occupancy exceeds the threshold, all new packets are dropped. The slight improvement in fairness in the LAN cases is because EPD can sometimes break TCP synchronization, and in such cases only a few connections are dropped during congestion.

6 UBR+: Selective Drop using per-VC accounting

Per-VC accounting can be used effectively to achieve a greater degree of fairness among TCP connections. A VC that is using an excessive share of the throughput or buffer capacity can be penalized preferentially over another. The scheme presented here is a simpler version of the Fair Buffer Allocation scheme proposed in [6] and described in the next section.

Selective Drop keeps track of the activity of each VC by counting the number of cells from each VC in the buffer. A VC is said to be active if it has at least one cell in the buffer. A fair allocation is calculated as the current buffer occupancy divided by the number of active VCs. Let the buffer occupancy be denoted by X and the number of active VCs by Na. Then,

Fair allocation = X / Na

The ratio of the number of cells of a VC in the buffer to the fair allocation gives a measure of how much the VC is overloading the buffer, i.e., by what ratio it exceeds the fair allocation. Let Yi be the number of cells from VCi in the buffer. Then the load ratio of VCi is defined as

Load ratio of VCi = Yi / (Fair allocation) = Yi * Na / X

If the load ratio of a VC is greater than a parameter Z, then new packets from that VC are dropped in preference to packets of a VC with a load ratio less than Z. Thus, Z is used as a cutoff for the load ratio to indicate that the VC is overloading the switch. Figure 3 shows the buffer management of the Selective Drop scheme. For a given buffer size K (cells), the Selective Drop scheme assigns a static minimum threshold parameter R (cells). If the buffer occupancy X is less than or equal to this minimum threshold R, then no cells are dropped. If the buffer occupancy is greater than R, then the next new incoming packet of VCi is dropped if the load ratio of VCi is greater than Z.
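A minimal sketch of the Selective Drop test for the first cell of a new packet, using the quantities defined above (X, Na, Yi, R, Z), is given below. The function name and the dictionary of per-VC cell counts are illustrative.

    # Minimal sketch of the Selective Drop test described above.
    def selective_drop_new_packet(y_i, per_vc_cells, R, Z):
        """Return True if the next new packet of VC i should be dropped.

        y_i          -- cells of VC i currently in the buffer (Yi)
        per_vc_cells -- dict: vc -> cells in buffer (used to compute X and Na)
        """
        X = sum(per_vc_cells.values())                        # current buffer occupancy
        Na = sum(1 for c in per_vc_cells.values() if c > 0)   # number of active VCs
        if X <= R or Na == 0:
            return False                                      # below threshold: accept
        load_ratio = y_i * Na / X                             # Yi / (X / Na)
        return load_ratio > Z                                 # drop if the VC exceeds the cutoff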
We performed simulations to find the value of Z that optimizes the efficiency and fairness values. We first performed 5-source LAN simulations with 1000-cell buffers. We set R to 0.9 times the buffer size K. This ensured that there was enough buffer space to accept incomplete packets during congestion. We experimented with values of Z = 2, 1, 0.9, 0.5 and 0.2. Z = 0.9 gave good results, and further simulations with Z around 0.9 showed that Z = 0.8 produces the best efficiency and fairness values for this configuration. For the WAN simulations, any Z value between 0.8 and 1 produces good results. Tables 2 and 3 show the simulation results for the optimal performance of each scheme. The following observations can be made from the simulation results:

-Selective Drop using per-VC accounting improves the fairness of TCP over UBR+EPD. This is because cells from overloading connections are dropped in preference to those from underloading ones. As a result, Selective Drop is more effective in breaking TCP synchronization. When the buffer exceeds the threshold, only cells from overloading connections are dropped. This frees up some bandwidth and allows the underloading connections to increase their windows and obtain more throughput.

-Fairness and efficiency increase with increasing buffer size.

-Fairness decreases with an increasing number of sources.

7 UBR+: The Fair Buffer Allocation Scheme

The Fair Buffer Allocation scheme proposed in [6] uses a smoothed form of the cutoff Z and compares it with the load ratio of a VC. To make the cutoff smooth, FBA uses the current load level in the switch: the load ratio of a VC is compared with Z scaled by one plus a term that reflects how congested the switch is. Let K be the buffer capacity of the switch in cells. For a given buffer size K, the FBA scheme assigns a static minimum threshold parameter R (cells). If the buffer occupancy X is less than or equal to this minimum threshold R, then no cells are dropped. When the buffer occupancy is greater than R, then upon the arrival of every new packet, the load ratio of the VC to which the packet belongs is compared to an allowable drop threshold calculated as Z(1 + (K-X)/(X-R)), which simplifies to Z(K-R)/(X-R). In this equation, Z is a linear scaling factor. The next packet from VCi is dropped if

(X > R) AND (Yi * Na / X > Z * (K-R)/(X-R))

Figure 3 shows the switch buffer with the buffer occupancies X, relative to the minimum threshold R and the buffer size K, at which incoming TCP packets may be dropped.

Figure 3: Selective Drop and FBA: Buffer Occupancy for drop

Note that when the current buffer occupancy X exceeds the minimum threshold R, it is not always the case that a new packet is dropped. The load ratio in the above equation determines whether VCi is using more than a fair amount of buffer space. X/Na is used as a measure of a fair allocation for each VC, and Z(K-R)/(X-R) is a drop threshold for the buffer. If the VC's buffer occupancy (Yi) is greater than this dynamic threshold times the fair allocation (X/Na), then the new packet of that VC is dropped.

7.1 Effect of the minimum drop threshold R

The load ratio threshold for dropping a complete packet is Z(K-R)/(X-R). As R increases for a fixed value of the buffer occupancy X, X-R decreases, which means that the drop threshold Z(K-R)/(X-R) increases and each connection is allowed to have more cells in the buffer. Higher values of R provide higher efficiency by allowing higher buffer utilization. Lower values of R should provide better fairness than higher values by dropping packets earlier.

7.2 Effect of the linear scale factor Z

The parameter Z scales the FBA drop threshold by a multiplicative factor. Z has a linear effect on the drop threshold: lower values of Z lower the threshold and vice versa. Higher values of Z should increase the efficiency of the connections. However, if Z is very close to 1, then cells from a connection may not be dropped until the buffer overflows.
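For comparison with the Selective Drop sketch above, here is a minimal sketch of the FBA drop test using the same quantities (X, Na, Yi, K, R, Z). The function name is illustrative, and the simplified form of the threshold derived above is used.

    # Minimal sketch of the FBA drop test described above.
    def fba_drop_new_packet(y_i, X, Na, K, R, Z):
        """Return True if the next new packet of VC i should be dropped under FBA."""
        if X <= R or Na == 0:
            return False                          # below the minimum threshold: accept
        load_ratio = y_i * Na / X                 # Yi / (X / Na)
        drop_threshold = Z * (K - R) / (X - R)    # == Z * (1 + (K - X)/(X - R))
        return load_ratio > drop_threshold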
7.3 Effect of FBA parameters: Simulation results

We performed a full factorial experiment with the following parameter variations for both LANs and WANs. Each experiment was performed for the N-source configuration.

-Number of sources, N = 5 and 15.
-Buffer capacity, K = 1000, 2000 and 3000 cells for LANs and 12000, 24000 and 36000 cells for WANs.
-Minimum drop threshold, R = 0.9K, 0.5K and 0.1K.
-Linear scale factor, Z = 0.2, 0.5 and 0.8.

A set of 54 experiments was conducted to determine the values of R and Z that maximize efficiency and fairness among the TCP sources. We sorted the results with respect to the efficiency and fairness values. The following observations can be made from the simulation results.

-There is a tradeoff between efficiency and fairness. The highest values of fairness (close to 1) have the lowest values of efficiency. The simulation data shows that these results occur for low R and Z values. Higher values of the minimum threshold R combined with low Z values lead to slightly higher efficiency. Efficiency is high for high values of R and Z. Lower efficiency values have either R or Z low, and higher efficiency values have either R or Z high. When R is low (0.1K), the scheme can drop packets as soon as the buffer occupancy exceeds a small fraction of the capacity. When Z is low, a small rise in the load ratio results in a connection's packets being dropped. This improves the fairness of the scheme but decreases the efficiency, especially if R is also low. For the configurations simulated, we found that the best value of R was about 0.9K and the best value of Z about 0.8.

-The fairness of the scheme is sensitive to the parameters. The simulation results showed that small changes in the values of R and Z can result in significant differences in the fairness results. As R and Z increase, efficiency shows an increasing trend; however, there is considerable variation in the fairness numbers. We attribute this to TCP synchronization effects. Sometimes a single TCP source can get lucky, and its packets are accepted while those of all other connections are dropped. When the source finally exceeds its fair share and should be dropped, the buffer is no longer above the threshold because all other sources have stopped sending packets and are waiting for timeout.

-FBA improves both the fairness and the efficiency of TCP over UBR. In general, the average efficiency and fairness values for FBA (for optimal parameter values) are higher than those of the previously discussed options. Tables 2 and 3 show the fairness and efficiency values for FBA switches with R = 0.9K and Z = 0.8.

8 UBR+: Summary

The previous sections have shown successive improvements to the UBR service category in ATM networks. We summarize the results in the form of a comparative analysis of the various UBR+ options. This summary is based on the choice of optimal parameters for the drop policies. For both Selective Drop and Fair Buffer Allocation, the values of R and Z are chosen to be 0.9 times the buffer size and 0.8, respectively.

-TCP achieves the maximum possible throughput when no segments are lost. To achieve zero loss for TCP over UBR, switches need buffers equal to the sum of the receiver windows of all the TCP connections.

-With limited buffer sizes, TCP performs poorly over vanilla UBR switches. TCP throughput is low, and there is unfairness among the connections.
The coarse granularity of the TCP timer is an important reason for low TCP throughput.

Table 2: UBR+: Comparative analysis (Efficiency)
----------------------------------------------------------------
Config-   Number of   Buffer         UBR    EPD    Selective   FBA
uration   Sources     Size (cells)                 Drop
----------------------------------------------------------------
LAN       5           1000           0.21   0.49   0.75        0.88
LAN       5           2000           0.32   0.68   0.85        0.84
LAN       5           3000           0.47   0.72   0.90        0.92
LAN       15          1000           0.22   0.55   0.76        0.91
LAN       15          2000           0.49   0.81   0.82        0.85
LAN       15          3000           0.47   0.91   0.94        0.95
----------------------------------------------------------------
WAN       5           12000          0.86   0.90   0.90        0.95
WAN       5           24000          0.90   0.91   0.92        0.92
WAN       5           36000          0.91   0.81   0.81        0.81
WAN       15          12000          0.96   0.92   0.94        0.95
WAN       15          24000          0.94   0.91   0.94        0.96
WAN       15          36000          0.92   0.96   0.96        0.95
----------------------------------------------------------------

Table 3: UBR+: Comparative analysis (Fairness)
----------------------------------------------------------------
Config-   Number of   Buffer         UBR    EPD    Selective   FBA
uration   Sources     Size (cells)                 Drop
----------------------------------------------------------------
LAN       5           1000           0.68   0.57   0.99        0.98
LAN       5           2000           0.90   0.98   0.96        0.98
LAN       5           3000           0.97   0.84   0.99        0.97
LAN       15          1000           0.31   0.56   0.76        0.97
LAN       15          2000           0.59   0.87   0.98        0.96
LAN       15          3000           0.80   0.78   0.94        0.93
----------------------------------------------------------------
WAN       5           12000          0.75   0.94   0.95        0.94
WAN       5           24000          0.83   0.99   0.99        1
WAN       5           36000          0.86   1      1           1
WAN       15          12000          0.67   0.93   0.91        0.97
WAN       15          24000          0.82   0.92   0.97        0.98
WAN       15          36000          0.77   0.91   0.89        0.97
----------------------------------------------------------------

-UBR with EPD improves the throughput performance of TCP. This is because partial packets are not transmitted by the network and some bandwidth is saved. EPD does not have much effect on fairness because it does not drop segments selectively.

-UBR with selective packet drop using per-VC accounting improves fairness over UBR+EPD. Connections with higher buffer occupancies are more likely to be dropped in this scheme. The efficiency values are similar to those with EPD.

-UBR with the Fair Buffer Allocation scheme can improve both TCP throughput and fairness. There is a tradeoff between efficiency and fairness, and the scheme is sensitive to its parameters. We found R = 0.9K and Z = 0.8 to produce the best results for our configurations.

-TCP synchronization is an important factor that affects TCP throughput and fairness. Vanilla UBR and EPD are ineffective in breaking TCP synchronization because they drop packets from all connections. Selective drop schemes are needed to break the synchronization effects. Some values of the FBA parameters are successful in breaking TCP synchronization, and for those values we see high values of efficiency and fairness. Some other papers on TCP over UBR have broken TCP synchronization by artificially staggering the TCP sources or introducing some randomness into the simulation. This may not reflect TCP sources in the real world, and we have chosen not to introduce any artificial randomness to break synchronization.

References

[1] Allyn Romanow, Sally Floyd, "Dynamics of TCP Traffic over ATM Networks."
[2] Chien Fang, Arthur Lin, "On TCP Performance of UBR with EPD and UBR-EPD with a Fair Buffer Allocation Scheme," ATM Forum 95-1645, December 1995.
[3] Hongqing Li, Kai-Yeung Siu, and Hong-Yi Tzeng, "TCP over ATM with ABR service versus UBR+EPD service," ATM Forum 95-0718, June 1995.
[4] Hongqing Li, Kai-Yeung Siu, Hong-Yi Tzeng, Brian Hang Wai Yang, "Issues in TCP over ATM," ATM Forum 95-0503, April 1995.
[5] J. Jaffe, "Bottleneck Flow Control," IEEE Transactions on Communications, Vol. COM-29, No. 7, pp. 954-962.
[6] Juha Heinanen and Kalevi Kilkki, "A fair buffer allocation scheme," unpublished manuscript.
[7] Raj Jain, R. Goyal, S. Kalyanaraman, S. Fahmy, F. Lu, and S. Srinidhi, "Buffer Requirements for TCP over UBR," ATM Forum 96-0518, April 1996.
[8] Shirish S. Sathaye, "ATM Traffic Management Specification Version 4.0," ATM Forum/95-0013R10, February 1996.
[9] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Fang Lu and Saragur Srinidhi, "Performance of TCP/IP over ABR," to appear in Proceedings of Globecom'96.
[10] Stephen Keung, Kai-Yeung Siu, "Degradation in TCP Performance under Cell Loss," ATM Forum 94-0490, April 1994.
[11] Tim Dwight, "Guidelines for the Simulation of TCP/IP over ATM," ATM Forum 95-0077r1, March 1995.
[12] V. Jacobson, "Congestion Avoidance and Control," Proceedings of the SIGCOMM'88 Symposium, pp. 314-329, August 1988.

Appendix

The detailed simulation results for Selective Drop and Fair Buffer Allocation, and the scatter plots of efficiency versus fairness for Fair Buffer Allocation, are presented here.

Figure 4: FBA LAN: Efficiency vs Fairness
Figure 5: FBA WAN: Efficiency vs Fairness

Table 4: TCP over UBR+: Selective Drop using per-VC accounting
--------------------------------------------------------
Number of   Config-   Buffer    Z      Effic   Fairness
Sources     uration   (Cells)
--------------------------------------------------------
5           LAN       1000      2      0.36    0.78
5           LAN       1000      1      0.13    0.81
5           LAN       1000      0.95   0.72    0.93
5           LAN       1000      0.9    0.65    0.96
5           LAN       1000      0.85   0.68    0.89
5           LAN       1000      0.8    0.75    0.98
5           LAN       1000      0.75   0.63    0.95
5           LAN       1000      0.5    0.57    0.95
5           LAN       1000      0.2    0.50    0.58
5           LAN       2000      1      0.47    0.92
5           LAN       2000      0.9    0.72    0.98
5           LAN       2000      0.8    0.84    0.95
5           LAN       3000      1      0.88    0.99
5           LAN       3000      0.9    0.89    0.98
5           LAN       3000      0.8    0.90    0.98
15          LAN       1000      1      0.38    0.48
15          LAN       1000      0.9    0.73    0.77
15          LAN       1000      0.8    0.75    0.76
15          LAN       2000      1      0.38    0.13
15          LAN       2000      0.9    0.91    0.95
15          LAN       2000      0.8    0.81    0.97
15          LAN       3000      1      0.93    0.94
15          LAN       3000      0.9    0.95    0.95
15          LAN       3000      0.8    0.94    0.94
5           WAN       12000     2      0.86    0.93
5           WAN       12000     1      0.91    0.96
5           WAN       12000     0.9    0.86    0.93
5           WAN       12000     0.8    0.90    0.95
5           WAN       12000     0.5    0.89    0.94
5           WAN       24000     1      0.92    0.97
5           WAN       24000     0.9    0.92    0.97
5           WAN       24000     0.8    0.91    0.98
5           WAN       36000     1      0.85    0.99
5           WAN       36000     0.9    0.80    0.99
5           WAN       36000     0.8    0.80    0.99
15          WAN       12000     1      0.93    0.97
15          WAN       12000     0.9    0.92    0.97
15          WAN       12000     0.8    0.93    0.90
15          WAN       24000     1      0.95    0.89
15          WAN       24000     0.9    0.94    0.92
15          WAN       24000     0.8    0.94    0.96
15          WAN       36000     1      0.94    0.97
15          WAN       36000     0.9    0.96    0.92
15          WAN       36000     0.8    0.96    0.88
--------------------------------------------------------

Table 5: TCP over UBR+: Fair Buffer Allocation LAN
--------------------------------------------------------
Number of   Z      R      Buffer    Effic   Fairness
Sources                   (Cells)
--------------------------------------------------------
5           0.8    0.9    1000      0.66    0.93
5           0.5    0.9    1000      0.80    0.99
5           0.2    0.9    1000      0.71    0.92
5           0.8    0.5    1000      0.21    0.45
5           0.5    0.5    1000      0.05    1.00
5           0.2    0.5    1000      0.33    0.78
5           0.8    0.1    1000      0.06    1.00
5           0.5    0.1    1000      0.04    1.00
5           0.2    0.1    1000      0.01    1.00
5           0.8    0.9    2000      0.84    0.98
5           0.5    0.9    2000      0.83    0.97
5           0.2    0.9    2000      0.89    0.97
5           0.8    0.5    2000      0.47    0.77
5           0.5    0.5    2000      0.58    0.97
5           0.2    0.5    2000      0.93    0.99
5           0.8    0.1    2000      0.20    1.00
5           0.5    0.1    2000      0.10    1.00
5           0.2    0.1    2000      0.04    1.00
5           0.8    0.9    3000      0.91    0.97
5           0.5    0.9    3000      0.88    0.96
5           0.2    0.9    3000      0.88    0.98
5           0.8    0.5    3000      0.92    0.99
5           0.5    0.5    3000      0.94    0.96
5           0.2    0.5    3000      0.94    0.90
5           0.8    0.1    3000      0.87    0.93
5           0.5    0.1    3000      0.20    1.00
5           0.2    0.1    3000      0.39    0.82
15          0.8    0.9    1000      0.60    0.71
15          0.5    0.9    1000      0.68    0.77
15          0.2    0.9    1000      0.68    0.62
15          0.8    0.5    1000      0.28    0.34
15          0.5    0.5    1000      0.21    0.45
15          0.2    0.5    1000      0.40    0.61
15          0.8    0.1    1000      0.04    1.00
15          0.5    0.1    1000      0.06    0.20
15          0.2    0.1    1000      0.01    0.99
15          0.8    0.9    2000      0.85    0.96
15          0.5    0.9    2000      0.92    0.96
15          0.2    0.9    2000      0.87    0.96
15          0.8    0.5    2000      0.74    0.72
15          0.5    0.5    2000      0.73    0.63
15          0.2    0.5    2000      0.80    0.88
15          0.8    0.1    2000      0.11    1.00
15          0.5    0.1    2000      0.14    0.33
15          0.2    0.1    2000      0.20    0.29
15          0.8    0.9    3000      0.95    0.93
15          0.5    0.9    3000      0.94    0.96
15          0.2    0.9    3000      0.92    0.97
15          0.8    0.5    3000      0.43    0.74
15          0.5    0.5    3000      0.80    0.85
15          0.2    0.5    3000      0.85    0.90
15          0.8    0.1    3000      0.18    1.00
15          0.5    0.1    3000      0.11    1.00
15          0.2    0.1    3000      0.04    1.00
--------------------------------------------------------

Table 6: TCP over UBR+: Fair Buffer Allocation WAN
--------------------------------------------------------
Number of   Z      R      Buffer    Effic   Fairness
Sources                   (Cells)
--------------------------------------------------------
5           0.8    0.9    12000     0.95    0.94
5           0.5    0.9    12000     0.95    0.98
5           0.2    0.9    12000     0.91    0.97
5           0.8    0.5    12000     0.84    0.99
5           0.5    0.5    12000     0.92    0.96
5           0.2    0.5    12000     0.89    0.96
5           0.8    0.1    12000     0.88    0.98
5           0.5    0.1    12000     0.88    0.95
5           0.2    0.1    12000     0.78    0.97
5           0.8    0.9    24000     0.92    1.00
5           0.5    0.9    24000     0.93    0.95
5           0.2    0.9    24000     0.93    1.00
5           0.8    0.5    24000     0.93    1.00
5           0.5    0.5    24000     0.93    1.00
5           0.2    0.5    24000     0.86    0.96
5           0.8    0.1    24000     0.93    0.98
5           0.5    0.1    24000     0.93    0.97
5           0.2    0.1    24000     0.85    0.99
5           0.8    0.9    36000     0.81    1.00
5           0.5    0.9    36000     0.81    1.00
5           0.2    0.9    36000     0.81    1.00
5           0.8    0.5    36000     0.81    1.00
5           0.5    0.5    36000     0.86    0.99
5           0.2    0.5    36000     0.93    1.00
5           0.8    0.1    36000     0.93    1.00
5           0.5    0.1    36000     0.89    0.98
5           0.2    0.1    36000     0.87    0.99
15          0.8    0.9    12000     0.95    0.97
15          0.5    0.9    12000     0.94    0.99
15          0.2    0.9    12000     0.96    0.98
15          0.8    0.5    12000     0.95    0.99
15          0.5    0.5    12000     0.96    0.98
15          0.2    0.5    12000     0.96    0.98
15          0.8    0.1    12000     0.94    0.98
15          0.5    0.1    12000     0.91    0.99
15          0.2    0.1    12000     0.86    0.98
15          0.8    0.9    24000     0.96    0.98
15          0.5    0.9    24000     0.96    0.98
15          0.2    0.9    24000     0.96    0.98
15          0.8    0.5    24000     0.94    0.98
15          0.5    0.5    24000     0.94    0.97
15          0.2    0.5    24000     0.95    0.98
15          0.8    0.1    24000     0.93    0.99
15          0.5    0.1    24000     0.94    0.97
15          0.2    0.1    24000     0.96    0.99
15          0.8    0.9    36000     0.95    0.97
15          0.5    0.9    36000     0.96    0.97
15          0.2    0.9    36000     0.96    0.97
15          0.8    0.5    36000     0.96    0.99
15          0.5    0.5    36000     0.95    0.98
15          0.2    0.5    36000     0.96    0.97
15          0.8    0.1    36000     0.94    1.00
15          0.5    0.1    36000     0.94    0.95
15          0.2    0.1    36000     0.96    0.98
--------------------------------------------------------