********************************************************************************
ATM Forum Document Number: ATM_Forum/97-0423
********************************************************************************
Title: Selective Acknowledgements and UBR+ Drop Policies to Improve TCP/UBR Performance over Terrestrial and Satellite Networks.
********************************************************************************
Abstract: We study the performance of Selective Acknowledgements with TCP over the UBR+ service category. We examine various UBR+ drop policies, TCP mechanisms and network configurations to recommend optimal parameters for TCP over UBR. We discuss various TCP congestion control mechanisms and compare their performance for LAN and WAN networks. We describe the effect of satellite delays on TCP performance over UBR and present simulation results for LAN, WAN and satellite networks. SACK TCP improves the performance of TCP over UBR, especially for large delay networks. Intelligent drop policies at the switches are an important factor for good performance in local area networks.
********************************************************************************
Source:
Rohit Goyal, Raj Jain, Shiv Kalyanaraman, Sonia Fahmy, Bobby Vandalore, Xiangrong Cai
Department of CIS, The Ohio State University (and NASA)
395 Dreese Lab, 2015 Neil Ave, Columbus, OH 43210-1277
Phone: 614-292-3989, Fax: 614-292-2911, Email: {goyal,jain}@cse.wustl.edu

Seong-Cheol Kim
Samsung Electronics Co. Ltd.
Chung-Ang Newspaper Bldg.
8-2, Karak-Dong, Songpa-Ku
Seoul, Korea 138-160
Email: kimsc@metro.telecom.samsung.co.kr

Sastri Kota
Lockheed Martin Telecommunications
1272 Borregas Avenue, Bldg B/551 O/GB - 70
Sunnyvale, CA 94089
Email: sastri.kota@lmco.com
********************************************************************************
Date: April 1997
********************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TM)
********************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is offered to the Forum as a basis for discussion and is not a binding proposal on the part of any of the contributing organizations. The statements are subject to change in form and content after further study. Specifically, the contributors reserve the right to add to, amend or modify the statements contained herein.
********************************************************************************
A postscript version of this contribution including all figures and tables has been uploaded to the ATM Forum ftp server in the incoming directory. It may be moved from there to the atm97 directory. The postscript version is also available on our site as:
ftp://netlab.wustl.edu/pub/jain/atmf/atm97-0423.ps or
ftp://netlab.wustl.edu/pub/jain/atmf/atm97-0423.zip
********************************************************************************

1 Introduction

The Unspecified Bit Rate (UBR) service in ATM networks does not have any congestion control mechanisms [2]. The basic UBR service employs a tail drop policy where cells are dropped when the switch buffer overflows. As a result, TCP connections using the ATM-UBR service with limited switch buffers experience low throughput [3, 4, 5, 9, 13]. In our previous paper [9] we analyzed several enhancements to UBR, and showed that these enhancements can improve the performance of TCP slow start and congestion avoidance algorithms over UBR.
We also analyzed the performance of Reno TCP over UBR and UBR+, and concluded that fast retransmit and recovery hurts the performance of TCP in the presence of congestion losses over wide area networks. This contribution discusses the performance of TCP with selective acknowledgements (SACK TCP) over the UBR+ service category. We compare the performance of SACK TCP with slow start and Reno TCP. Simulation results of the performance of SACK TCP with several UBR+ drop policies over terrestrial and satellite links are presented.

Section 2 describes the TCP congestion control mechanisms including the Selective Acknowledgements (SACK) option for TCP. Section 3 describes our implementation of SACK TCP and Section 4 analyzes the features and retransmission properties of SACK TCP. We also describe a change to TCP's fast retransmit and recovery, proposed in [22] and named "New Reno" in [18]. Section 7 discusses some issues relevant to the performance of TCP over satellite networks. The remainder of the contribution presents simulation results comparing the performance of various TCP congestion avoidance methods.

2 TCP Congestion Control

TCP's congestion control mechanisms are described in detail in [15, 21]. TCP uses a window based flow control policy. The variable RCVWND is used as a measure of the receiver's buffer capacity. When a destination TCP host receives a segment, it sends an acknowledgement (ACK) for the next expected segment. TCP congestion control is built on this window based flow control. The following subsections describe the various TCP congestion control policies.

2.1 Slow Start and Congestion Avoidance

The sender TCP maintains a variable called the congestion window (CWND) to measure the network capacity. The number of unacknowledged packets in the network is limited to CWND or RCVWND, whichever is lower. Initially, CWND is set to one segment and it increases by one segment on the receipt of each new ACK until it reaches a maximum (typically 65536 bytes). As a result, CWND doubles every round trip time; this corresponds to an exponential increase of CWND with time [15]. If a segment is lost, the receiver sends duplicate ACKs on receiving subsequent segments. The sender maintains a retransmission timeout for the last unacknowledged packet. Congestion is indicated by the expiration of the retransmission timeout. When the timer expires, the sender saves half the CWND in a variable called SSTHRESH, and sets CWND to 1 segment. The sender then retransmits segments starting from the lost segment. CWND is increased by one segment on the receipt of each new ACK until it reaches SSTHRESH; this is called the slow start phase. After that, CWND increases by one segment every round trip time, which results in a linear increase of CWND with time. Figure 1 shows the slow start and congestion avoidance phases for a typical TCP connection.

2.2 Fast Retransmit and Recovery

Current TCP implementations use a coarse granularity (typically 500 ms) timer for the retransmission timeout. As a result, during congestion, the TCP connection can lose a significant amount of time waiting for the timeout. In Figure 1, the horizontal CWND line shows the time lost in waiting for a timeout to occur. During this time, the TCP neither sends new packets nor retransmits lost packets. Moreover, once the timeout occurs, CWND is set to 1 segment, and the connection takes several round trips to efficiently utilize the network.
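The window bookkeeping described so far can be summarized in a few lines of code. The following C sketch is only illustrative: the variable names, the fixed 512-byte segment size and the integer arithmetic are assumptions made for this example, not those of any particular TCP implementation.

    /* Illustrative sketch of slow start, congestion avoidance and the
     * timeout reaction of Section 2.1.  All names and constants are
     * assumptions made for this example. */
    #include <stdio.h>

    #define MSS    512UL      /* TCP segment size in bytes            */
    #define RCVWND 65536UL    /* receiver window / maximum CWND value */

    static unsigned long cwnd     = MSS;     /* congestion window (bytes) */
    static unsigned long ssthresh = RCVWND;  /* slow start threshold      */

    /* Called once for every new (non-duplicate) ACK received. */
    static void on_new_ack(void)
    {
        if (cwnd < ssthresh)
            cwnd += MSS;               /* slow start: CWND doubles per RTT      */
        else
            cwnd += MSS * MSS / cwnd;  /* congestion avoidance: ~1 MSS per RTT  */
        if (cwnd > RCVWND)
            cwnd = RCVWND;             /* never exceed the receiver window      */
    }

    /* Called when the retransmission timer expires. */
    static void on_timeout(void)
    {
        ssthresh = cwnd / 2;           /* save half the window                  */
        cwnd     = MSS;                /* restart from one segment (slow start) */
    }

    int main(void)
    {
        for (int i = 0; i < 50; i++)   /* 50 new ACKs, no loss */
            on_new_ack();
        printf("cwnd before timeout: %lu bytes\n", cwnd);
        on_timeout();
        printf("cwnd after timeout:  %lu bytes, ssthresh: %lu bytes\n",
               cwnd, ssthresh);
        return 0;
    }

In slow start each new ACK adds a full segment, so CWND roughly doubles per round trip; in congestion avoidance the per-ACK increment of MSS*MSS/CWND bytes adds up to about one segment per round trip. The integer form of this increment becomes important for the very large windows discussed in Section 7.2.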
TCP Reno implements the fast retransmit and recovery algorithms that enable the connection to quickly recover from isolated segment losses [21]. When a TCP receives an out-of-order segment, it immediately sends a duplicate acknowledgement to the sender.

Figure 1: TCP Slow Start and Congestion Avoidance

When the sender receives three duplicate ACKs, it concludes that the segment indicated by the ACKs has been lost, and immediately retransmits the lost segment. The sender then reduces CWND to half (plus 3 segments) and also saves half the original CWND value in SSTHRESH. Now, for each subsequent duplicate ACK, the sender inflates CWND by one segment and tries to send a new segment. Effectively, the sender waits for half a round trip before sending one segment for each subsequent duplicate ACK it receives. As a result, the sender maintains the network pipe at half of its capacity at the time of fast retransmit. Approximately one round trip after the missing segment is retransmitted, its ACK is received (assuming the retransmitted segment was not lost). At this time, instead of setting CWND to one segment and proceeding to do slow start until CWND reaches SSTHRESH, the TCP sets CWND to SSTHRESH, and then does congestion avoidance. This is called the fast recovery algorithm.

2.3 A Modification to Fast Retransmit and Recovery: TCP New Reno

It is well known that fast retransmit and recovery cannot recover from multiple packet losses. Figure 2 shows a case where three consecutive packets are lost from a window; the sender TCP incurs fast retransmit twice and then times out. At that time, SSTHRESH is set to one-eighth of the original congestion window value (CWND in the figure). As a result, the exponential phase lasts a very short time, and the linear increase begins at a very small window. Thus, the TCP sends at a very low rate and loses much throughput.

The "fast retransmit phase" was introduced in [22], in which the sender remembers the highest sequence number sent (RECOVER) when the fast retransmit is first triggered. After the first unacknowledged packet is retransmitted, the sender follows the usual fast recovery algorithm and inflates the CWND by one segment for each duplicate ACK it receives. When the sender receives an acknowledgement for the retransmitted packet, it checks if the ACK acknowledges all segments sent up to and including RECOVER. If so, the ACK is a new ACK, and the sender exits the fast retransmit-recovery phase, sets its CWND to SSTHRESH and starts a linear increase. If, on the other hand, the ACK is a partial ACK, i.e., it acknowledges the retransmitted segment and only a part of the segments sent before RECOVER, then the sender immediately retransmits the next expected segment as indicated by the ACK. This continues until all segments up to and including RECOVER are acknowledged. This mechanism ensures that the sender will recover from N segment losses in N round trips. As a result, the sender can recover from multiple packet losses without having to time out.

Figure 2: TCP Fast Retransmit and Recovery

In case of small propagation delays and coarse timer granularities, this mechanism can effectively improve TCP throughput over vanilla TCP. Figure 3 shows the congestion window graph of a TCP connection with the fast retransmit phase for three contiguous segment losses. The TCP retransmits one segment every round trip time (shown by the CWND going down to 1 segment) until a new ACK is received.

2.4 Selective Acknowledgements

TCP with Selective Acknowledgements (SACK TCP) has been proposed to efficiently recover from multiple segment losses [20].
SACK TCP acknowledgements contain additional information about the segments that have been received by the destination. When the destination receives out-of-order segments, it sends duplicate ACKs (SACKs) acknowledging the out-of-order segments it has received. From these SACKs, the sending TCP can reconstruct information about the segments not received at the destination. When the sender receives three duplicate ACKs, it retransmits the first lost segment, and inflates its CWND by one segment for each duplicate ACK it receives. This behavior is the same as Reno TCP. However, when the sender, in response to duplicate ACKs, is allowed by the window to send a segment, it uses the SACK information to retransmit lost segments before sending new segments. As a result, the sender can recover from multiple dropped segments in about one round trip. Figure 4 shows the congestion window graph of a SACK TCP recovering from segment losses. During the time when the congestion window is inflating (after fast retransmit has been triggered), the TCP sends the missing packets before any new packets.

3 SACK TCP Implementation

In this section, we describe our implementation of SACK TCP and some properties of SACK. Our implementation is based on the SACK implementation described in [20, 18, 19]. The SACK option is negotiated in the SYN segments during TCP connection establishment. The SACK information is sent with an ACK by the data receiver to the data sender to inform the sender of out-of-sequence segments received. The format of the SACK option has been proposed in [20].

The SACK option is sent whenever out-of-sequence data is received. All duplicate ACKs contain the SACK option. The option contains a list of some of the contiguous blocks of data already received by the receiver. Each data block is identified by the sequence number of the first byte in the block (the left edge of the block), and the sequence number of the byte immediately after the last byte of the block. Because of the limit on the maximum TCP header size, at most three SACK blocks can be specified in one SACK packet. The receiver keeps track of all the out-of-sequence data blocks received. When the receiver generates a SACK, the first SACK block specifies the block of data formed by the most recently received data segment. This ensures that the receiver provides the most up-to-date information to the sender. After the first SACK block, the remaining blocks can be filled in any order.

Figure 3: TCP with the fast retransmit phase

The sender also keeps a table of all the segments sent but not ACKed. When a segment is sent, it is entered into the table. When the sender receives an ACK with the SACK option, it marks all the segments specified in the SACK option blocks as SACKed. The entries for each segment remain in the table until the segment is ACKed. The remaining behavior of the sender is very similar to Reno implementations with the modification suggested in Section 2.3 (see footnote 1). When the sender receives three duplicate ACKs, it retransmits the first unacknowledged packet. During the fast retransmit phase, when the sender is sending one segment for each duplicate ACK received, it first tries to retransmit the holes indicated by the SACK blocks before sending any new segments. When the sender retransmits a segment, it marks the segment as retransmitted in the table. If a retransmitted segment is lost, the sender times out and performs slow start. When a timeout occurs, the sender resets the SACK bits in the table.
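As an illustration of this bookkeeping, the following C sketch shows a minimal sender-side table of the kind described above; the structure layout, array bound and function names are assumptions made for this example and are not taken from any production SACK implementation.

    /* Illustrative per-segment table ("scoreboard") for a SACK sender. */
    #include <stdio.h>

    #define MSS      512UL
    #define MAX_SEGS 128          /* assumed table size for this sketch */

    struct seg_entry {
        unsigned long seq;        /* sequence number of first byte        */
        int sacked;               /* covered by a received SACK block     */
        int rexmitted;            /* already retransmitted by the sender  */
    };

    static struct seg_entry tbl[MAX_SEGS];
    static int nsegs = 0;

    /* A segment is entered into the table when it is first sent. */
    static void on_send(unsigned long seq)
    {
        tbl[nsegs].seq = seq;
        tbl[nsegs].sacked = 0;
        tbl[nsegs].rexmitted = 0;
        nsegs++;
    }

    /* Mark every entry covered by one SACK block [left, right). */
    static void on_sack_block(unsigned long left, unsigned long right)
    {
        for (int i = 0; i < nsegs; i++)
            if (tbl[i].seq >= left && tbl[i].seq + MSS <= right)
                tbl[i].sacked = 1;
    }

    /* During fast retransmit, holes (neither SACKed nor already
     * retransmitted) are resent before any new data; -1 means none left. */
    static int next_hole(void)
    {
        for (int i = 0; i < nsegs; i++)
            if (!tbl[i].sacked && !tbl[i].rexmitted)
                return i;
        return -1;
    }

    /* On a retransmission timeout the SACK bits are cleared. */
    static void on_timeout(void)
    {
        for (int i = 0; i < nsegs; i++)
            tbl[i].sacked = 0;
    }

    int main(void)
    {
        for (int i = 0; i < 8; i++)                 /* 8 segments in flight */
            on_send(1000UL + (unsigned long)i * MSS);
        on_sack_block(1000UL + 2 * MSS, 1000UL + 8 * MSS); /* segments 2..7 SACKed */
        printf("first hole to retransmit: entry %d\n", next_hole());
        on_timeout();                               /* clears the SACK bits */
        return 0;
    }

A real implementation would key the table by sequence ranges and handle wraparound; the sketch only illustrates the marking and hole-selection logic described above.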
During the fast retransmit phase, the sender maintains a variable PIPE that indicates how many bytes are currently in the network pipe. When the third duplicate ACK is received, PIPE is set to the value of CWND and CWND is reduced by half. For every subsequent duplicate ACK received, PIPE is decremented by one segment because the ACK denotes a packet leaving the pipe. The sender sends data (new or retransmitted) only when PIPE is less than CWND. This implementation is equivalent to inflating the CWND by one segment for every duplicate ACK and sending segments if the number of unacknowledged bytes is less than the congestion window value. When a segment is sent, PIPE is incremented by one. When a partial ACK is received, PIPE is decremented by two. The first decrement is because the partial ACK represents a retransmitted segment leaving the pipe. The second decrement is done because the original segment, which was lost and had not been accounted for as having left the pipe, is now known to have left the network.

__________________
1 It is not clear to us whether the modification proposed in [22] is necessary with the SACK option. The modification is under further study.

Figure 4: SACK TCP Recovery from packet loss

4 TCP: Analysis of Recovery Behavior

In this section, we discuss the behavior of SACK TCP. We first analyze the properties of Reno TCP and then lead into the discussion of SACK TCP. Vanilla TCP without fast retransmit and recovery (we refer to TCP with only slow start and congestion avoidance as vanilla TCP) is used as the basis for comparison. Every time congestion occurs, TCP tries to reduce its CWND by half and then enters congestion avoidance. In the case of vanilla TCP, when a segment is lost, a timeout occurs, and the congestion window reduces to one segment. From there, it takes about log2(CWND/(2 x TCP segment size)) RTTs for CWND to reach the target value. This behavior is unaffected by the number of segments lost from a particular window.

4.1 Reno TCP

When a single segment is lost from a window, Reno TCP recovers within approximately one RTT of knowing about the loss, or two RTTs after the lost packet was first sent. The sender receives three duplicate ACKs about one RTT after the dropped packet was sent. It then retransmits the lost packet. For the next round trip, the sender receives duplicate ACKs for the whole window of packets sent after the lost packet. The sender waits for half the window and then transmits a half window worth of new packets. All of this takes about one RTT, after which the sender receives a new ACK acknowledging the retransmitted packet and the entire window sent before the retransmission. CWND is set to half its original value and congestion avoidance is performed. When multiple packets are dropped, Reno TCP typically cannot recover and a timeout results. The fast retransmit phase modification can recover from multiple packet losses by retransmitting a single packet every round trip time.

4.2 SACK TCP

In this subsection we show that SACK TCP can recover from multiple packet losses more efficiently than Reno or vanilla TCP. Suppose that at the instant when the sender learns of the first packet loss (from three duplicate ACKs), the value of the congestion window is CWND. Thus, the sender has CWND bytes of data waiting to be acknowledged. Suppose also that the network drops a block of data which is CWND/n bytes long (this will typically result in several segments being lost).
After one RTT of sending the first dropped segment, the sender receives three duplicate ACKs for this segment. It retransmits the segment, sets PIPE to CWND - 3, and sets CWND to CWND/2. For each duplicate ACK received, PIPE is decremented by 1. Once PIPE falls below CWND, another segment can be sent for each subsequent duplicate ACK received. All the ACKs from the previous window take 1 RTT to return. For half an RTT nothing is sent (since PIPE > CWND). For the next half RTT, if CWND/n bytes were dropped, then only CWND/2 - CWND/n bytes (of retransmitted or new segments) can be sent. Thus, all the dropped segments can be retransmitted in 1 RTT if CWND/2 - CWND/n >= CWND/n, i.e., n >= 4. Therefore, for SACK TCP to be able to retransmit all lost segments in one RTT, the network can drop at most CWND/4 bytes from a window of CWND.

Now, we calculate the maximum amount of data that can be dropped for SACK TCP to be able to retransmit everything in two RTTs. Suppose again that CWND/n bytes are dropped from a window of size CWND. Then, in the first RTT after receiving the 3 duplicate ACKs, the sender can retransmit up to CWND/2 - CWND/n bytes. In the second RTT, the sender can retransmit 2(CWND/2 - CWND/n) bytes. This is because for each retransmitted segment in the first RTT, the sender receives a partial ACK that indicates that the next segment is missing. As a result, PIPE is decremented by 2, and the sender can send 2 more segments (both of which could be retransmitted segments) for each partial ACK it receives. Thus, all the dropped segments can be retransmitted in 2 RTTs if (CWND/2 - CWND/n) + 2(CWND/2 - CWND/n) >= CWND/n, i.e., n >= 8/3. This means that at most 3 x CWND/8 bytes can be dropped from a window of size CWND for SACK TCP to be able to recover in 2 RTTs.

Generalizing the above argument, we have the following result: the number of RTTs needed by SACK TCP to recover from a loss of CWND/n bytes is at most log2(n/(n-2)) for n > 2.

If more than half the CWND is dropped, then there will not be enough duplicate ACKs to bring PIPE below CWND, and no further segments can be transmitted in the first RTT. Only the first dropped segment will be retransmitted on the receipt of the third duplicate ACK. In the second RTT, the ACK for the retransmitted packet will be received. This is a partial ACK and will result in PIPE being decremented by 2, so that 2 packets can be sent. As a result, the number of segments sent doubles every RTT, and SACK will recover no slower than slow start [18, 19]. SACK would still be advantageous because a timeout would still be avoided unless a retransmitted packet were dropped.

5 The ATM-UBR+ Service

The basic UBR service can be enhanced by implementing intelligent drop policies at the switches. A comparative analysis of the effect of various drop policies on the performance of vanilla and Reno TCP over UBR is presented in [9]. Section 5.3 briefly summarizes the results of our earlier work. This section briefly describes the drop policies, and discusses the simulation results of TCP over satellite UBR with intelligent cell drop.

5.1 Early Packet Discard

The Early Packet Discard (EPD) policy [1] maintains a threshold R in the switch buffer. When the buffer occupancy exceeds R, all new incoming packets are dropped. Partially received packets are accepted if possible. [9] shows that EPD improves the efficiency of TCP over UBR but does not improve fairness. The effect of EPD is less pronounced for large delay-bandwidth networks. In satellite networks, EPD has little or no effect on the performance of TCP over UBR.

5.2 Selective Packet Drop and Fair Buffer Allocation

These schemes use per-VC accounting to maintain the current buffer utilization of each UBR VC. A fair allocation is calculated for each VC, and if the VC's buffer occupancy exceeds its fair allocation, its subsequent incoming packet is dropped. Both schemes maintain a threshold R, set as a fraction of the buffer capacity K. When the total buffer occupancy X exceeds the threshold, new packets are dropped depending on VCi's buffer occupancy (Yi). In the Selective Drop scheme, a VC's packet is dropped if

    (X > R) AND (Yi x Na / X > Z)

where Na is the number of VCs with at least one cell in the buffer, and Z is another threshold parameter (0 < Z <= 1) used to scale the effective drop threshold. The Fair Buffer Allocation scheme proposed in [8] is similar to Selective Drop and uses the following test:

    (X > R) AND (Yi x Na / X > Z x (K - R)/(X - R))
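To make the three drop tests concrete, the following C sketch expresses them side by side. Here X is the total buffer occupancy, Yi the occupancy of VC i, Na the number of active VCs, K the buffer capacity, R the threshold and Z the scaling parameter; the function names, example values and use of floating point are assumptions of this illustration, not part of the schemes themselves.

    /* Illustrative drop decisions for the UBR+ policies of Sections 5.1-5.2.
     * Each function returns nonzero if the new incoming packet is dropped. */
    #include <stdio.h>

    /* EPD: once the buffer occupancy exceeds R, every new packet is dropped. */
    static int epd_drop(double X, double R)
    {
        return X > R;
    }

    /* Selective Drop: drop only if the buffer exceeds R AND this VC holds
     * more than its fair share (Yi * Na / X > Z). */
    static int selective_drop(double X, double Yi, double Na,
                              double R, double Z)
    {
        return (X > R) && (Yi * Na / X > Z);
    }

    /* Fair Buffer Allocation: like Selective Drop, but the per-VC threshold
     * tightens as the occupancy X approaches the buffer capacity K. */
    static int fba_drop(double X, double Yi, double Na,
                        double K, double R, double Z)
    {
        return (X > R) && (Yi * Na / X > Z * (K - R) / (X - R));
    }

    int main(void)
    {
        /* Example (assumed values): K = 1000 cells, R = 800 cells, Z = 0.8,
         * 4 active VCs, total occupancy 900 cells, this VC holds 300 cells. */
        double K = 1000, R = 800, Z = 0.8, Na = 4, X = 900, Yi = 300;
        printf("EPD: %d  Selective Drop: %d  FBA: %d\n",
               epd_drop(X, R),
               selective_drop(X, Yi, Na, R, Z),
               fba_drop(X, Yi, Na, K, R, Z));
        return 0;
    }

With these example values EPD and Selective Drop would discard the packet while FBA would still accept it: the (K - R)/(X - R) factor relaxes the per-VC test when the occupancy is only slightly above R, and the two tests converge as X approaches the buffer capacity K.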
5.3 Performance of TCP over UBR+: Summary of Earlier Results

In our earlier work [9, 10] we reported the following results:

o For N TCP connections, the switch requires a buffer size equal to the sum of the receiver windows of the TCP connections.

o With limited buffers, TCP over plain UBR results in poor performance.

o TCP performance over UBR can be improved by intelligent drop policies like Early Packet Discard, Selective Drop and Fair Buffer Allocation.

o TCP fast retransmit and recovery improves TCP performance over LANs, but actually degrades performance over WANs in the presence of congestion losses.

6 Simulation Results with SACK TCP over UBR+

This section presents the simulation results of the various enhancements of TCP and UBR presented in the previous sections.

6.1 The Simulation Model

All simulations use the N source configuration shown in Figure 5. All sources are identical and infinite TCP sources. The TCP layer always sends a segment as long as it is permitted by the TCP window. Moreover, traffic is unidirectional so that only the sources send data. The destinations only send ACKs. The performance of TCP over UBR with bidirectional traffic is a topic of further study. The delayed acknowledgement timer is deactivated, and the receiver sends an ACK as soon as it receives a segment.

Link delays are 5 microseconds for LAN configurations and 5 milliseconds for WAN configurations. This results in round trip propagation delays of 30 microseconds for LANs and 30 milliseconds for WANs respectively. The TCP segment size is set to 512 bytes. This is the common segment size used in most current TCP implementations. Larger segment sizes have been reported to produce higher TCP throughputs. The effect of segment size is a topic of further study. For the LAN configurations, the TCP maximum window size is limited by a receiver window of 64K bytes. This is the default value specified for TCP implementations. For WAN configurations, a window of 64K bytes is not sufficient to achieve 100% utilization. We thus use the window scaling option to specify a maximum window size of 600000 bytes. This window is sufficient to provide full utilization for each TCP source. All link bandwidths are 155.52 Mbps, and the Peak Cell Rate at the ATM layer is 155.52 Mbps. The duration of the simulation is 10 seconds for LANs and 20 seconds for WANs. This allows enough round trips for the simulation to give stable results. The configurations for satellite networks are discussed in Section 7.
Figure 5: The N source TCP configuration

6.2 Performance Metrics

The performance of the simulation is measured at the TCP layer by the Efficiency and Fairness, as defined below.

    Efficiency = (Sum of TCP throughputs) / (Maximum possible TCP throughput)

TCP throughput is measured at the destination TCP layer as the total number of bytes delivered to the application divided by the simulation time. The sum of the throughputs is divided by the maximum possible throughput attainable by TCP. To the 512 bytes of TCP data in each segment, 20 bytes of TCP header, 20 bytes of IP header, 8 bytes of LLC header, and 8 bytes of AAL5 trailer are added. This results in a maximum possible throughput of 80.5% of the ATM layer data rate, i.e., 125.2 Mbps on a 155.52 Mbps link.

    Fairness Index = (Sum(xi))^2 / (n x Sum(xi^2))

where xi is the throughput of the ith TCP source, and n is the number of TCP sources.

6.3 Simulation Results

We performed simulations for the LAN and WAN configurations for three drop policies: tail drop, Early Packet Discard and Selective Drop. For LANs, we used buffer sizes of 1000 and 3000 cells. These are representative of the typical buffer sizes in current switches. For WANs, we chose buffer sizes of approximately one and three times the bandwidth x round trip delay product. Tables 1 and 2 show the efficiency and fairness values of SACK TCP with various UBR+ drop policies.

Table 1: SACK TCP over UBR+ : Efficiency
____________________________________________________________________
Configuration   Number of   Buffer    UBR     EPD     Selective
                Sources     (cells)                   Drop
____________________________________________________________________
LAN             5           1000      0.76    0.85    0.94
LAN             5           3000      0.98    0.97    0.98
LAN             15          1000      0.57    0.78    0.91
LAN             15          3000      0.86    0.94    0.97
____________________________________________________________________
SACK Column Average                   0.79    0.89    0.95
Vanilla TCP Average                   0.34    0.67    0.84
Reno TCP Average                      0.69    0.97    0.97
____________________________________________________________________
WAN             5           12000     0.90    0.88    0.95
WAN             5           36000     0.97    0.99    1.00
WAN             15          12000     0.93    0.80    0.88
WAN             15          36000     0.95    0.95    0.98
____________________________________________________________________
SACK Column Average                   0.94    0.91    0.95
Vanilla TCP Average                   0.91    0.90    0.91
Reno TCP Average                      0.78    0.86    0.81
____________________________________________________________________

Several observations can be made from these tables:

o For most cases, for a given drop policy, SACK TCP provides higher efficiency than the corresponding drop policy with either vanilla or Reno TCP. This confirms the intuition provided by the analysis of SACK that SACK recovers at least as fast as slow start when multiple packets are lost. In fact, for most cases, SACK recovers faster than both the fast retransmit/recovery and slow start algorithms.

o For LANs, the effect of drop policies is very important and can dominate the effect of SACK. For UBR with tail drop, SACK provides a significant improvement over vanilla and Reno TCP. However, as the drop policies get more sophisticated, the effect of the TCP congestion control mechanism is less pronounced. This is because the typical LAN switch buffer sizes are small compared to the default TCP maximum window of 64K bytes, and so buffer management becomes a very important factor. Moreover, the degraded performance of SACK in a few cases can be attributed to timeouts caused by the loss of retransmitted packets. In this case SACK loses several round trips in retransmitting parts of the lost data and then times out. After the timeout, much of the data is transmitted again, and this results in wasted throughput. This result reinforces the need for a good drop policy for TCP over UBR.
o The throughput improvement provided by SACK is more significant for wide area networks. When the propagation delay is large, a timeout results in the loss of a significant amount of time during slow start from a window of one segment. With Reno TCP (with fast retransmit and recovery), performance is further degraded (for multiple packet losses) because the timeout occurs at a much lower window than with vanilla TCP. With SACK TCP, a timeout is often avoided, and recovery is complete within a small number of round trips. Even if a timeout occurs, the recovery is as fast as slow start, although a little time may be lost in the earlier retransmissions.

o The performance of SACK TCP can be improved by intelligent drop policies like EPD and Selective Drop. This is consistent with our earlier results in [9]. Thus, we recommend that intelligent drop policies be used with the UBR service.

o The fairness values for Selective Drop are comparable to the values with the other TCP versions. Thus, SACK TCP does not hurt the fairness of TCP connections with an intelligent drop policy like Selective Drop. The fairness values of tail drop and EPD are sometimes a little lower for SACK TCP. This is again because retransmitted packets are lost and some connections time out. Connections which do not time out do not have to go through slow start, and thus can utilize more of the link capacity. The fairness among a set of hybrid TCP connections is a topic of further study.

Table 2: SACK TCP over UBR+ : Fairness
____________________________________________________________________
Configuration   Number of   Buffer    UBR     EPD     Selective
                Sources     (cells)                   Drop
____________________________________________________________________
LAN             5           1000      0.22    0.88    0.98
LAN             5           3000      0.92    0.97    0.96
LAN             15          1000      0.29    0.63    0.95
LAN             15          3000      0.74    0.88    0.98
____________________________________________________________________
SACK Column Average                   0.54    0.84    0.97
Vanilla TCP Average                   0.69    0.69    0.92
Reno TCP Average                      0.71    0.98    0.99
____________________________________________________________________
WAN             5           12000     0.96    0.98    0.95
WAN             5           36000     1.00    0.94    0.99
WAN             15          12000     0.99    0.99    0.99
WAN             15          36000     0.98    0.98    0.96
____________________________________________________________________
SACK Column Average                   0.98    0.97    0.97
Vanilla TCP Average                   0.76    0.95    0.94
Reno TCP Average                      0.90    0.97    0.99
____________________________________________________________________

7 Effects of Satellite Delays on TCP over UBR+

Since TCP congestion control is inherently limited by the round trip time, long delay paths have significant effects on the performance of TCP over ATM. A large delay-bandwidth link must be utilized efficiently to be cost effective. This section discusses some of the issues that arise in the congestion control of large delay-bandwidth links. Simulation results of TCP over UBR+ with satellite delays are also presented. Related results on TCP performance over satellite links are available in [23].

7.1 Window Scale Factor

The default TCP maximum window size is 65535 bytes. For a 155.52 Mbps ATM satellite link (with a propagation RTT of about 550 ms), a congestion window of about 8.7 Mbytes is needed to fill the whole pipe. As a result, the TCP window scale factor must be used to provide high link utilization. In our simulations, we use a receiver window of 34000 bytes and a window scale factor of 8 to achieve the desired window size.

7.2 Large Congestion Windows and the Congestion Avoidance Phase

During the congestion avoidance phase, CWND is incremented by 1 segment every RTT. Most TCP implementations follow the recommendations in [15], and increment CWND by 1/CWND segments for each ACK received during congestion avoidance. Since CWND is maintained in bytes, this increment translates to an increment of MSS*MSS/CWND bytes on the receipt of each new ACK. All operations are done on integers, and this expression avoids the need for floating point calculations. However, in the case of large delay-bandwidth paths where the window scale factor is used, MSS*MSS may be less than CWND. For example, with MSS = 512 bytes, MSS*MSS = 262144, and when CWND is larger than this value, the expression MSS*MSS/CWND yields zero. As a result, CWND never increases during the congestion avoidance phase.

There are several solutions to this problem. The most intuitive is to use floating point calculations. This increases the processing overhead of the TCP layer and is thus undesirable. A second option is to not increment CWND for each ACK, but to wait for N ACKs such that N*MSS*MSS > CWND, and then increment CWND by N*MSS*MSS/CWND. We call this the ACK counting option.
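The following C sketch contrasts the plain integer increment with the ACK counting option described above. The variable names, the 512-byte MSS and the 8.7 Mbyte window are illustrative assumptions taken from the configuration of Section 7.1, not code from any TCP implementation.

    /* Illustrative comparison of the integer congestion avoidance increment
     * with and without ACK counting, for a window-scaled satellite CWND. */
    #include <stdio.h>

    #define MSS 512UL                            /* bytes */

    static unsigned long cwnd;                   /* congestion window, bytes  */
    static unsigned long ack_count;              /* ACKs since last increment */

    /* Per-ACK increment without ACK counting: MSS*MSS/CWND truncates to
     * zero as soon as CWND exceeds MSS*MSS (262144 bytes for MSS = 512). */
    static void ca_plain(void)
    {
        cwnd += MSS * MSS / cwnd;
    }

    /* ACK counting: wait for N ACKs such that N*MSS*MSS > CWND, then add
     * N*MSS*MSS/CWND bytes, so CWND still grows by about one MSS per RTT. */
    static void ca_ack_counting(void)
    {
        ack_count++;
        if (ack_count * MSS * MSS > cwnd) {
            cwnd += ack_count * MSS * MSS / cwnd;
            ack_count = 0;
        }
    }

    int main(void)
    {
        const unsigned long start = 8704000UL;   /* ~8.7 Mbyte window (Sec. 7.1) */
        const int acks_per_rtt = (int)(start / MSS);

        cwnd = start;
        for (int i = 0; i < acks_per_rtt; i++) ca_plain();
        printf("plain increment : %lu -> %lu after one RTT of ACKs\n", start, cwnd);

        cwnd = start; ack_count = 0;
        for (int i = 0; i < acks_per_rtt; i++) ca_ack_counting();
        printf("ACK counting    : %lu -> %lu after one RTT of ACKs\n", start, cwnd);
        return 0;
    }

With the plain increment the window stays frozen at 8704000 bytes, whereas with ACK counting it still grows by roughly one segment per round trip of ACKs, which is the intended congestion avoidance behavior.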
Another option would be to increase MSS to a larger value so that MSS*MSS would be larger than CWND at all times. The MSS of the connection is limited by the smallest MTU along the path. Most future TCPs are expected to use Path MTU discovery to find the largest possible MSS that can be used. This value of MSS may or may not be sufficient to ensure the correct functioning of congestion avoidance without ACK counting. Moreover, if TCP is running over a connectionless network layer like IP, the MTU may change during the lifetime of a connection and segments may be fragmented. In a cell based network like ATM, TCP could use arbitrary sized segments without worrying about fragmentation. The value of MSS can also have an effect on the TCP throughput, and larger MSS values can produce higher throughput. The effect of MSS on TCP over satellite is a topic of current research.

Table 3: TCP over UBR+ with Satellite Delays: Efficiency
____________________________________________________________________
TCP        Number of   Buffer    UBR     EPD     Selective
           Sources     (cells)                   Drop
____________________________________________________________________
SACK       5           200000    0.86    0.60    0.72
SACK       5           600000    0.99    1.00    1.00
Reno       5           200000    0.84    0.12    0.12
Reno       5           600000    0.30    0.19    0.22
Vanilla    5           200000    0.70    0.73    0.73
Vanilla    5           600000    0.88    0.81    0.82
____________________________________________________________________

Table 4: TCP over UBR+ with Satellite Delays: Fairness
____________________________________________________________________
TCP        Number of   Buffer    UBR     EPD     Selective
           Sources     (cells)                   Drop
____________________________________________________________________
SACK       5           200000    1.00    0.83    0.94
SACK       5           600000    1.00    1.00    1.00
Reno       5           200000    0.96    0.97    0.97
Reno       5           600000    1.00    1.00    1.00
Vanilla    5           200000    1.00    0.87    0.89
Vanilla    5           600000    1.00    1.00    1.00
____________________________________________________________________

8 Simulation Results of TCP over UBR+ in Satellite Networks

The satellite simulation model is very similar to the model described in Section 6.1. The differences are listed below:

o The link between the two switches in Figure 5 is now a satellite link with a propagation delay of 275 ms. The links between the TCP sources and the switches are 1 km long. This results in a round trip propagation delay of over 550 ms.

o The maximum value of the TCP receiver window is now 8704000 bytes. This window size is sufficient to fill the 155.52 Mbps pipe.

o The TCP maximum segment size is 9180 bytes. A larger value is used because most TCP connections over ATM with satellite delays are expected to use larger segment sizes.

o The buffer sizes used in the switch are 200000 cells and 600000 cells.
These buffer sizes reflect buffers of about 1 RTT and 3 RTTs respectively.

o The duration of the simulation is 40 seconds.

Tables 3 and 4 show the efficiency and fairness values for satellite TCP over UBR+ with 5 TCP sources and buffer sizes of 200000 and 600000 cells. Several observations can be made from the tables:

o Selective acknowledgements significantly improve the performance of TCP over UBR+ in satellite networks. The efficiency and fairness values are typically higher for SACK than for Reno and vanilla TCP. This is because SACK often prevents the need for a timeout and can recover quickly from multiple packet losses.

o Fast retransmit and recovery is detrimental to the performance of TCP over large delay-bandwidth links. The efficiency numbers for Reno TCP in Table 3 are much lower than those of either SACK or vanilla TCP. This reinforces the WAN results in Table 1 for Reno TCP. Both tables are also consistent with the analysis in Figure 2, and show that fast retransmit and recovery cannot recover from multiple losses in the same window.

o Intelligent drop policies have little effect on the performance of TCP over UBR in satellite networks. Again, these results are consistent with the WAN results in Tables 1 and 2. The effect of intelligent drop policies is most significant in LANs, and the effect decreases in WANs and satellite networks. This is because LAN buffer sizes (1000 to 3000 cells) are much smaller than the default TCP maximum window size of 65535 bytes. For WANs and satellite networks, the switch buffer sizes and the TCP maximum congestion window sizes are both of the order of the round trip delay-bandwidth product. As a result, efficient buffer management is more important for LANs than for WANs and satellite networks.

9 Summary

This paper describes the performance of SACK TCP over the ATM UBR service category. SACK TCP is seen to improve the performance of TCP over UBR. UBR+ drop policies are also essential to improving the performance of TCP over UBR. As a result, TCP performance over UBR can be improved either by improving TCP using selective acknowledgements, or by introducing intelligent buffer management policies at the switches. Efficient buffer management has a more significant influence on LANs because of the limited buffer sizes in LAN switches compared to the TCP maximum window size. In WANs and satellite networks, the drop policies have a smaller impact because both the switch buffer sizes and the TCP windows are of the order of the bandwidth-delay product of the network.

References

[1] Allyn Romanow, Sally Floyd, "Dynamics of TCP Traffic over ATM Networks," IEEE JSAC, May 1995.

[2] ATM Forum, "ATM Traffic Management Specification Version 4.0," April 1996, ftp://ftp.atmforum.com/pub/approved-specs/af-tm-0056.000.ps

[3] Chien Fang, Arthur Lin, "On TCP Performance of UBR with EPD and UBR-EPD with a Fair Buffer Allocation Scheme," ATM Forum 95-1645, December 1995.

[4] Hongqing Li, Kai-Yeung Siu, and Hong-Yi Tzeng, "TCP over ATM with ABR service versus UBR+EPD service," ATM Forum 95-0718, June 1995.

[5] H. Li, K.Y. Siu, H.T. Tzeng, C. Ikeda and H. Suzuki, "TCP over ABR and UBR Services in ATM," Proc. IPCCC'96, March 1996.

[6] Hongqing Li, Kai-Yeung Siu, Hong-Yi Tzeng, Brian Hang Wai Yang, "Issues in TCP over ATM," ATM Forum 95-0503, April 1995.

[7] J. Jaffe, "Bottleneck Flow Control," IEEE Transactions on Communications, Vol. COM-29, No. 7, pp. 954-962.

[8] Juha Heinanen, and Kalevi Kilkki, "A fair buffer allocation scheme," Unpublished Manuscript.
[9] R. Goyal, R. Jain, S. Kalyanaraman, S. Fahmy and Seong-Cheol Kim, "UBR+: Improving Performance of TCP over ATM-UBR Service," Proc. ICC'97, June 1997. (2)

[10] R. Goyal, R. Jain, S. Kalyanaraman and S. Fahmy, "Further Results on UBR+: Effect of Fast Retransmit and Recovery," ATM Forum 96-1761, December 1996.

[11] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal, Fang Lu and Saragur Srinidhi, "Performance of TCP/IP over ABR," Proc. IEEE Globecom'96, November 1996.

[12] Shivkumar Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy and Seong-Cheol Kim, "Performance of TCP over ABR on ATM backbone and with various VBR background traffic patterns," Proc. ICC'97, June 1997.

[13] Stephen Keung, Kai-Yeung Siu, "Degradation in TCP Performance under Cell Loss," ATM Forum 94-0490, April 1994.

[14] Tim Dwight, "Guidelines for the Simulation of TCP/IP over ATM," ATM Forum 95-0077r1, March 1995.

[15] V. Jacobson, "Congestion Avoidance and Control," Proceedings of the SIGCOMM'88 Symposium, pp. 314-32, August 1988.

[16] V. Jacobson, R. Braden, "TCP Extensions for Long-Delay Paths," Internet RFC 1072, October 1988.

[17] V. Jacobson, R. Braden, D. Borman, "TCP Extensions for High Performance," Internet RFC 1323, May 1992.

[18] Kevin Fall, Sally Floyd, "Simulation-based Comparisons of Tahoe, Reno, and SACK TCP."

[19] Sally Floyd, "Issues of TCP with SACK."

[20] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgement Options," Internet RFC 2018, October 1996.

[21] W. Stevens, "TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms," Internet RFC 2001, January 1997.

[22] Janey C. Hoe, "Start-up Dynamics of TCP's Congestion Control and Avoidance Schemes," MS Thesis, Massachusetts Institute of Technology, June 1995.

[23] Mark Allman, Chris Hayes, Hans Kruse, Shawn Ostermann, "TCP Performance over Satellite Links," Proc. 5th International Conference on Telecommunications Systems, 1997.

_____________________________________
(2) All our papers and ATM Forum contributions are available from http://www.cse.wustl.edu/~jain