********************************************************************************
ATM Forum Document Number: ATM_Forum/96-1761
********************************************************************************
Title: Further Results on UBR+: Effect of Fast Retransmit and Recovery.
********************************************************************************
Abstract: In a previous contribution, we studied the performance of TCP over
UBR+. We showed that TCP performance improves by adding EPD and fair buffer
management techniques over UBR. In this contribution we study the effect of
Fast Retransmit and Recovery on these enhancements. In general, Fast
Retransmit and Recovery improves the performance of TCP over UBR. However, in
some cases the performance is degraded. The incremental gain with Fair Buffer
Allocation over simple selective drop using per-VC accounting is small.
********************************************************************************
Source: Rohit Goyal, Raj Jain, Shiv Kalyanaraman, and Sonia Fahmy.
        The Ohio State University (and NASA)
        Department of CIS
        Columbus, OH 43210-1277
        Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

        Seong-Cheol Kim
        Samsung Electronics Co. Ltd.
        Chung-Ang Newspaper Bldg.
        8-2, Karak-Dong, Songpa-Ku
        Seoul, Korea 138-160
        Email: kimsc@metro.telecom.samsung.co.kr
********************************************************************************
Date: December 1996.
********************************************************************************
Distribution: ATM Forum Technical Working Group Members (Traffic Management)
********************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is
offered to the Forum as a basis for discussion and is not a binding proposal
on the part of any of the contributing organizations. The statements are
subject to change in form and content after further study. Specifically, the
contributors reserve the right to add to, amend or modify the statements
contained herein.
********************************************************************************
A postscript version of this contribution, including all figures and tables,
has been uploaded to the ATM Forum ftp server in the incoming directory. It
may be moved from there to the atm96 directory. The postscript version is also
available on our web page as:
http://www.cse.wustl.edu/~jain/atmf/atm96-1761.ps (261 kB) or
http://www.cse.wustl.edu/~jain/atmf/atm96-1761.zip (51 kB)
********************************************************************************

1 Introduction

In our previous contribution [12], we studied the performance of TCP over UBR
[13]. We studied several enhanced versions of UBR with three buffer management
policies: Early Packet Discard, selective drop using per-VC accounting, and
Fair Buffer Allocation. A brief summary of the results is given below.

o TCP achieves maximum possible throughput when no segments are lost. To
achieve zero loss for TCP over UBR, switches need buffers equal to the sum of
the receiver windows of all the TCP connections.

o With limited buffer sizes, TCP performs poorly over vanilla UBR switches.
TCP throughput is low, and there is unfairness among the connections [2, 3, 4,
5, 11, 16]. The coarse granularity of the TCP timer is an important reason for
the low TCP throughput.

o UBR with EPD improves the throughput performance of TCP. This is because
partial packets are not transmitted by the network, so some bandwidth is
saved.
EPD does not have much effect on fairness because it does not drop segments
selectively [11].

o UBR with selective packet drop using per-VC accounting improves fairness
over UBR+EPD. Connections with higher buffer occupancies are more likely to be
dropped in this scheme. The efficiency values are slightly better than those
with EPD.

o UBR with the Fair Buffer Allocation scheme can improve TCP throughput and
fairness [8]. There is a tradeoff between efficiency and fairness, and the
scheme is sensitive to its parameters. We found R = 0.9 and Z = 0.8 to produce
the best results for our configurations.

o TCP synchronization is an important factor that affects TCP throughput and
fairness.

In this contribution, we study the effect of Fast Retransmit and Recovery on
the performance of TCP over UBR.

2 Fast Retransmit and Recovery

The basic TCP congestion control techniques are slow start and congestion
avoidance [18]. TCP Reno introduces an additional congestion control mechanism
called Fast Retransmit and Recovery (FRR). FRR is designed for quick recovery
from isolated packet losses. Without FRR, every time a packet is lost, TCP
waits for the retransmission timeout and then enters slow start. As a result,
much time is lost waiting for the retransmission timeout, which typically has
a granularity of 100-500 milliseconds.

Fast Retransmit and Recovery works as follows. When a packet is lost, the TCP
destination sends duplicate ACKs upon the receipt of subsequent packets (these
packets are out of sequence since there is a missing packet). When the source
receives the third duplicate ACK, it sets SSTHRESH to half of the congestion
window (CWND) and retransmits the missing segment (indicated by the ACK
number). It then reduces its congestion window to half its previous value plus
three (one for each of the three duplicate ACKs it received). For every
additional duplicate ACK it receives, the source increments CWND by one
segment and sends an additional segment if allowed by the maximum window. When
it receives a new ACK (meaning that the destination has received the missing
segment), the source sets CWND to SSTHRESH, i.e., half the value of CWND
before the fast retransmit and recovery began. Since SSTHRESH and CWND are now
equal, the source enters congestion avoidance mode.

Fast Retransmit and Recovery improves TCP performance when a single segment is
lost. However, on high bandwidth links, network congestion typically results
in several dropped segments. In this case, fast retransmit and recovery cannot
recover from the loss and slow start is triggered. Moreover, [7] points out
that in some cases the retransmission of packets cached in the receiver's
reassembly queue results in false retransmits. In this case, the sender goes
into congestion avoidance mode when there is no congestion in the network. As
a result, Fast Retransmit and Recovery is effective only for isolated packet
losses.

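
To make the above description concrete, the following sketch models the
window dynamics of Fast Retransmit and Recovery. It is a simplified
illustration in Python written for this description only (it is not the
simulation code used for the results reported below); the names RenoSender,
cwnd, ssthresh and dup_acks are ours, and windows are counted in segments.

    class RenoSender:
        """Simplified model of TCP Reno Fast Retransmit and Recovery."""

        def __init__(self, cwnd=16, ssthresh=64):
            self.cwnd = cwnd            # congestion window (segments)
            self.ssthresh = ssthresh    # slow start threshold (segments)
            self.dup_acks = 0           # consecutive duplicate ACKs seen
            self.in_recovery = False

        def retransmit_missing_segment(self):
            pass  # placeholder: resend the segment named by the duplicate ACKs

        def on_ack(self, is_duplicate):
            if is_duplicate:
                self.dup_acks += 1
                if self.dup_acks == 3 and not self.in_recovery:
                    # Fast Retransmit: halve SSTHRESH, resend the missing
                    # segment, and set CWND to half the old window plus three
                    # (one for each duplicate ACK already received).
                    self.ssthresh = max(self.cwnd // 2, 2)
                    self.retransmit_missing_segment()
                    self.cwnd = self.ssthresh + 3
                    self.in_recovery = True
                elif self.in_recovery:
                    # Fast Recovery: each further duplicate ACK means another
                    # packet has left the network, so CWND is inflated by one
                    # segment, possibly allowing a new segment to be sent
                    # (the actual sending is not modeled here).
                    self.cwnd += 1
            else:
                if self.in_recovery:
                    # New ACK: the missing segment has been received. Deflate
                    # CWND to SSTHRESH (half the pre-loss window); since
                    # CWND == SSTHRESH, the sender continues in congestion
                    # avoidance.
                    self.cwnd = self.ssthresh
                    self.in_recovery = False
                self.dup_acks = 0

    if __name__ == "__main__":
        s = RenoSender(cwnd=16)
        for _ in range(3):
            s.on_ack(is_duplicate=True)   # third duplicate ACK triggers fast retransmit
        print(s.cwnd, s.ssthresh)         # 11 8
        s.on_ack(is_duplicate=False)      # new ACK ends recovery
        print(s.cwnd)                     # 8
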
3 The Simulation Experiment

3.1 Simulation Model

All simulations presented in this contribution are performed on the N source
configuration shown in Figure 1. The configuration consists of N identical TCP
sources that send data whenever allowed by the window. The switches implement
UBR service with the optional drop policies described in this contribution.

Figure 1: The N-source TCP configuration

The following simulation parameters are used [17]:

o The configuration consists of N identical TCP sources as shown in Figure 1.

o All sources are infinite TCP sources. The TCP layer always sends a segment
as long as it is permitted by the TCP window.

o All link delays are 5 microseconds for LANs and 5 milliseconds for WANs.
Thus, the Round Trip Time due to propagation delay is 30 microseconds or 30
milliseconds respectively.

o All link bandwidths are 155.52 Mbps.

o Peak Cell Rate is 155.52 Mbps.

o The traffic is unidirectional. Only the sources send data. The destinations
send only acknowledgments.

o TCP Fast Retransmit and Recovery is enabled.

o The TCP segment size is set to 512 bytes. This is the standard value used by
current TCP implementations. Larger segment sizes have been reported to
produce higher TCP throughput, but they have not been deployed in real TCP
protocol stacks.

o TCP timer granularity is set to 100 ms. This affects the triggering of the
retransmission timeout after packet loss. The value used in most TCP
implementations is 500 ms, and some implementations use 100 ms. Several other
studies have used smaller TCP timer granularities and have obtained higher
throughput numbers. The timer granularity is an important factor in
determining the amount of time lost during congestion: a small granularity
results in less time being lost waiting for the retransmission timeout to
trigger, and hence in faster recovery and higher throughput. However, TCP
implementations do not use timer granularities of less than 100 ms, and
producing results with a lower granularity artificially increases the
throughput.

o The TCP maximum receiver window size is 64K bytes for LANs. This is the
default value used in TCP. For WANs, this value is not enough to fill up the
pipe and reach full throughput. In the WAN simulations we use the TCP window
scaling option to scale the window to approximately the bandwidth-delay
product of one RTT. The window size used for WANs is 600000 bytes.

o The TCP delayed ACK timer is NOT set. Segments are acked as soon as they are
received.

o The duration of the simulation runs is 10 seconds for LANs and 20 seconds
for WANs.

o All TCP sources start and stop at the same time. There is no processing
delay, delay variation or randomization in any component of the simulation.
This highlights the effects of TCP synchronization as discussed later.

3.2 Performance Metrics

The performance of TCP over UBR is measured by efficiency and fairness, which
are defined as follows:

Efficiency = (Sum of TCP throughputs)/(Maximum possible TCP throughput)

The TCP throughputs are measured at the destination TCP layers. Throughput is
defined as the total number of bytes delivered to the destination application
divided by the total simulation time. The results are reported in Mbps. The
maximum possible TCP throughput is the throughput attainable by the TCP layer
running over UBR on a 155.52 Mbps link. For 512 bytes of data (the TCP maximum
segment size), the ATM layer receives 512 bytes of data + 20 bytes of TCP
header + 20 bytes of IP header + 8 bytes of LLC header + 8 bytes of AAL5
trailer. These are padded to fill 12 ATM cells, so each TCP segment results in
12 x 53 = 636 bytes at the ATM layer. From this, the maximum possible
throughput = 512/636 = 80.5%, or approximately 125.2 Mbps on a 155.52 Mbps
link.

Fairness Index = (Sum(xi))^2 / (n * Sum(xi^2))

where xi = throughput of the ith TCP source, and n is the number of TCP
sources.

The fairness index metric applies well to the N-source symmetrical
configuration. For more general configurations with upstream bottlenecks, the
max-min fairness criteria [6] can be used.
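
For reference, both metrics can be computed directly from the measured
per-connection throughputs. The short Python sketch below is illustrative
only (the function names are ours and not part of any simulation tool); it
also shows the per-segment overhead calculation behind the 125.2 Mbps figure.

    def max_tcp_throughput(link_mbps=155.52, mss=512):
        # 512 bytes of data + 20 (TCP) + 20 (IP) + 8 (LLC) + 8 (AAL5 trailer)
        # = 568 bytes, padded to 12 ATM cells of 53 bytes each = 636 bytes.
        overhead = 20 + 20 + 8 + 8
        cells = -(-(mss + overhead) // 48)     # ceiling division by the 48-byte cell payload
        return link_mbps * mss / (cells * 53)  # about 125.2 Mbps for a 512-byte MSS

    def efficiency(throughputs_mbps, link_mbps=155.52):
        # Efficiency = (sum of TCP throughputs) / (maximum possible TCP throughput)
        return sum(throughputs_mbps) / max_tcp_throughput(link_mbps)

    def fairness_index(throughputs_mbps):
        # Fairness Index = (Sum(xi))^2 / (n * Sum(xi^2))
        n = len(throughputs_mbps)
        return sum(throughputs_mbps) ** 2 / (n * sum(x * x for x in throughputs_mbps))

    if __name__ == "__main__":
        tputs = [25.0] * 5                     # five equal sources, in Mbps
        print(round(efficiency(tputs), 2))     # close to 1.0
        print(round(fairness_index(tputs), 2)) # exactly 1.0 for equal throughputs
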
4 Results

We performed full factorial simulations for LAN and WAN configurations, with 5
and 15 sources and each of the buffer management policies. For LAN
configurations, switch buffer sizes of 1000 cells and 3000 cells were
simulated. For WAN configurations, switch buffer sizes of 12000 cells and
36000 cells were simulated.

Tables 1 and 2 are from our previous contribution [12]. Tables 3 and 4 are the
analogous tables with Fast Retransmit and Recovery. The last row of each table
gives the column averages of the respective efficiency and fairness values.
This gives a rough quantitative measure of how much each feature of UBR+
improves the efficiency and fairness.

Table 1: UBR+ without Fast Retransmit and Recovery (Efficiency)
-----------------------------------------------------------------
Config-   Number of   Buffer Size   UBR    EPD    Selective   FBA
uration   Sources     (cells)                     Drop
-----------------------------------------------------------------
LAN       5           1000          0.21   0.49   0.75        0.88
LAN       5           3000          0.47   0.72   0.90        0.92
LAN       15          1000          0.22   0.55   0.76        0.91
LAN       15          3000          0.47   0.91   0.94        0.95
-----------------------------------------------------------------
WAN       5           12000         0.86   0.90   0.90        0.95
WAN       5           36000         0.91   0.81   0.81        0.81
WAN       15          12000         0.96   0.92   0.94        0.95
WAN       15          36000         0.92   0.96   0.96        0.95
-----------------------------------------------------------------
Column Average                      0.63   0.78   0.87        0.92
-----------------------------------------------------------------

Table 2: UBR+ without Fast Retransmit and Recovery (Fairness)
-----------------------------------------------------------------
Config-   Number of   Buffer Size   UBR    EPD    Selective   FBA
uration   Sources     (cells)                     Drop
-----------------------------------------------------------------
LAN       5           1000          0.68   0.57   0.99        0.98
LAN       5           3000          0.97   0.84   0.99        0.97
LAN       15          1000          0.31   0.56   0.76        0.97
LAN       15          3000          0.80   0.78   0.94        0.93
-----------------------------------------------------------------
WAN       5           12000         0.75   0.94   0.95        0.94
WAN       5           36000         0.86   1.00   1.00        1.00
WAN       15          12000         0.67   0.93   0.91        0.97
WAN       15          36000         0.77   0.91   0.89        0.97
-----------------------------------------------------------------
Column Average                      0.73   0.82   0.93        0.97
-----------------------------------------------------------------

From the average of the efficiency values for vanilla UBR (Tables 1 and 3),
fast retransmit and recovery appears to improve the efficiency of TCP over UBR
(efficiency = 0.63 without fast retransmit and 0.73 with fast retransmit).
However, for the WAN simulations, fast retransmit and recovery hurts the
efficiency. This is because in vanilla UBR, congestion typically results in
multiple packets being dropped. Fast retransmit and recovery cannot recover
from multiple packet losses, and slow start is triggered. The additional
segments sent by fast retransmit and recovery (while duplicate ACKs are being
received) are retransmitted during slow start. On WAN links with large
bandwidth-delay products, the number of retransmitted segments can be
significant. Thus, fast retransmit can add to the congestion and reduce
throughput. Also, the phenomenon of false retransmits described in Section 2
results in wasted throughput, because the source enters congestion avoidance
mode by setting the slow start threshold variable SSTHRESH to one-half the
congestion window (CWND) value.

The fairness values with fast retransmit and recovery are better for vanilla
UBR. This is because fast retransmit and recovery helps mitigate the TCP
synchronization effects.
Sources with fast retransmit and recovery do not wait for the retransmission
timeout to trigger, but retransmit the lost packet as soon as they receive the
duplicate ACKs. Also, for every duplicate ACK received, a new segment is sent
if allowed by the window.

The addition of EPD with fast retransmit and recovery results in a large
improvement in both efficiency and fairness.

Table 3: UBR+ with Fast Retransmit and Recovery (Efficiency)
-----------------------------------------------------------------
Config-   Number of   Buffer Size   UBR    EPD    Selective   FBA
uration   Sources     (cells)                     Drop
-----------------------------------------------------------------
LAN       5           1000          0.53   0.97   0.97        0.97
LAN       5           3000          0.89   0.97   0.97        0.97
LAN       15          1000          0.42   0.97   0.97        0.97
LAN       15          3000          0.92   0.97   0.97        0.97
-----------------------------------------------------------------
WAN       5           12000         0.61   0.79   0.80        0.76
WAN       5           36000         0.66   0.75   0.77        0.78
WAN       15          12000         0.88   0.95   0.79        0.79
WAN       15          36000         0.96   0.96   0.86        0.89
-----------------------------------------------------------------
Column Average                      0.73   0.92   0.89        0.89
-----------------------------------------------------------------

Table 4: UBR+ with Fast Retransmit and Recovery (Fairness)
-----------------------------------------------------------------
Config-   Number of   Buffer Size   UBR    EPD    Selective   FBA
uration   Sources     (cells)                     Drop
-----------------------------------------------------------------
LAN       5           1000          0.77   0.99   0.99        0.97
LAN       5           3000          0.93   0.99   1.00        0.99
LAN       15          1000          0.26   0.96   0.99        0.69
LAN       15          3000          0.87   0.99   0.99        1.00
-----------------------------------------------------------------
WAN       5           12000         0.99   1.00   0.99        1.00
WAN       5           36000         0.97   0.99   0.99        1.00
WAN       15          12000         0.91   0.96   0.99        0.95
WAN       15          36000         0.74   0.91   0.98        0.98
-----------------------------------------------------------------
Column Average                      0.81   0.97   0.99        0.95
-----------------------------------------------------------------

Average efficiency improves from 0.73 without EPD to 0.92 with EPD. Fairness
improves from 0.81 to 0.97. Thus, the combination of EPD and fast retransmit
can provide high throughput and fairness in configurations similar to those
simulated here.

The incremental gain achieved by adding selective drop and fair buffer
allocation is small compared to the gain from EPD. Efficiency decreases
slightly from 0.92 to 0.89, and fairness improves from 0.97 to 0.99 with
selective drop. Fair buffer allocation actually hurts the fairness in one case
(fairness = 0.69 for the 15-source LAN configuration with a 1000-cell buffer).
This is because fair buffer allocation is very sensitive to its parameters,
and slightly different parameters could result in much better performance. In
our simulations, we use the parameters that we found best in our previous
contribution [12] so that the results can be compared consistently.

The combination of early packet discard and fast retransmit and recovery is
effective in breaking TCP synchronization, and thus improves the fairness and
efficiency of the N source symmetrical configurations.

5 Summary

This contribution examines the effect of fast retransmit and recovery on the
performance of TCP over UBR+. The following conclusions can be drawn from our
simulations of N symmetrical TCP sources.

o Fast retransmit recovers from isolated packet losses faster than slow start.
Since it does not depend on a common coarse granularity timer, fast retransmit
and recovery helps in breaking TCP synchronization. As a result, vanilla UBR
performs better in some cases.

o For long links (WAN configuration), fast retransmit actually hurts the
throughput.
This is because fast retransmit cannot recover from multiple segment losses,
and the additional segments it transmits are retransmitted again during slow
start. False retransmits also degrade the throughput performance.

o Early Packet Discard improves the performance of TCP with fast retransmit
and recovery over UBR.

o The incremental gain obtained by adding Fair Buffer Allocation and Selective
Drop is much smaller with fast retransmit and recovery.

The TCP Selective Acknowledgment option (SACK) has been recommended to
overcome the shortcomings of fast retransmit and recovery. SACK is expected to
improve performance over fast retransmit in cases of multiple packet loss. The
effect of SACK on UBR+ is an area of future study.

References

[1] Allyn Romanow and Sally Floyd, "Dynamics of TCP Traffic over ATM
Networks."

[2] Chien Fang and Arthur Lin, "On TCP Performance of UBR with EPD and
UBR-EPD with a Fair Buffer Allocation Scheme," ATM Forum/95-1645, December
1995.

[3] Hongqing Li, Kai-Yeung Siu, and Hong-Yi Tzeng, "TCP over ATM with ABR
service versus UBR+EPD service," ATM Forum/95-0718, June 1995.

[4] Hongqing Li, Kai-Yeung Siu, Hong-Yi Tzeng, and Brian Hang Wai Yang,
"Issues in TCP over ATM," ATM Forum/95-0503, April 1995.

[5] Hongqing Li, Kai-Yeung Siu, and Hong-Yi Tzeng, "TCP Performance over ABR
and UBR services in ATM," Proceedings of IPCCC'96, March 1996.

[6] J. Jaffe, "Bottleneck Flow Control," IEEE Transactions on Communications,
Vol. COM-29, No. 7, pp. 954-962.

[7] Janey C. Hoe, "Improving the Start-Up Behavior of a Congestion Control
Scheme for TCP," Proceedings of ACM SIGCOMM, August 1996.

[8] Juha Heinanen and Kalevi Kilkki, "A fair buffer allocation scheme,"
unpublished manuscript.

[9] Raj Jain, Shiv Kalyanaraman, Rohit Goyal, Sonia Fahmy, and Saragur
Srinidhi, "Buffer Requirements for TCP over ABR," ATM Forum/96-0517.

[10] Raj Jain, Rohit Goyal, Shiv Kalyanaraman, and Sonia Fahmy, "Performance
of TCP over UBR and buffer requirements," ATM Forum/96-0518.

[11] Raj Jain, R. Goyal, S. Kalyanaraman, S. Fahmy, F. Lu, and S. Srinidhi,
"Buffer requirements for TCP over UBR," ATM Forum/96-0518, April 1996.

[12] R. Goyal, Raj Jain, S. Kalyanaraman, S. Fahmy, and Seong-Cheol Kim,
"Performance of TCP over UBR+," ATM Forum/96-1269, October 1996.

[13] Shirish S. Sathaye, "ATM Traffic Management Specification Version 4.0,"
ATM Forum/95-0013R10, February 1996.

[14] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Fang Lu, and
Saragur Srinidhi, "Performance of TCP/IP over ABR," Proceedings of Globecom,
November 1996.

[15] Shivkumar Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Jianping
Jiang, and Seong-Cheol Kim, "Performance of TCP over ABR on ATM backbone and
with various VBR traffic patterns," ATM Forum/96-1294.

[16] Stephen Keung and Kai-Yeung Siu, "Degradation in TCP Performance under
Cell Loss," ATM Forum/94-0490, April 1994.

[17] Tim Dwight, "Guidelines for the Simulation of TCP/IP over ATM," ATM
Forum/95-0077R1, March 1995.

[18] V. Jacobson, "Congestion Avoidance and Control," Proceedings of the
SIGCOMM'88 Symposium, pp. 314-32, August 1988.

All our papers and ATM Forum contributions are available from
http://www.cse.wustl.edu/~jain