************************************************************************
            ATM Forum Document Number: ATM_Forum/97-0617
************************************************************************
Title: Worst Case Buffer Requirements for TCP over ABR
************************************************************************
Abstract: We extend our earlier study of the buffer requirement problem
for TCP over ABR. Here, a worst case scenario is generated in which the
TCP sources send a burst of data at a time when the sources have a high
congestion window and the ACR (allowed cell rate) for ABR is high. We
find that ABR using the ERICA+ switch algorithm keeps the queue lengths
(and hence the buffer requirements) bounded even in this worst case for
a reasonable number of sources. We present analytical arguments for the
expected queue length and simulation results for varying numbers of
sources and parameter values.
************************************************************************
Source:
   Bobby Vandalore, Shiv Kalyanaraman, Raj Jain, Rohit Goyal & Sonia Fahmy
   The Ohio State University (and NASA)
   Department of CIS
   Columbus, OH 43210-1277
   Contact Phone: 614-292-3989, Fax: 614-292-2911, Email: Jain@ACM.Org

   Pradeep Samudra
   Director, System Engineering
   Broadband Network Lab
   Samsung Telecom America, Inc.
   1130 E Arapaho
   Richardson, TX 75081
   Phone: (972) 761-7865, Email: psamudra@telecom.sna.samsung.com
************************************************************************
Date: July 1997
************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TM)
************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It
is offered to the Forum as a basis for discussion and is not a binding
proposal on the part of any of the contributing organizations. The
statements are subject to change in form and content after further
study. Specifically, the contributors reserve the right to add to,
amend or modify the statements contained herein.
************************************************************************
A postscript version of this contribution including all figures and
tables has been uploaded to the ATM Forum ftp server in the incoming
directory. It may be moved from there to the atm97 directory. The
postscript version is also available on our web page as:
http://www.cse.wustl.edu/~jain/atmf/a97-0617.htm
************************************************************************

1. Introduction:
----------------

We extend our earlier studies [1,2,7,8] of the buffer requirements of
TCP over ABR. In those studies, we showed that achieving zero loss over
the ABR service requires switch buffering of only a small multiple of
the round trip feedback delay. The buffering depends heavily upon the
switch scheme used.

One of the issues related to buffer sizing is that it is possible for a
source to acquire a high ACR and keep it. As long as the source sends
at least one packet every 500 ms, the use-it-or-lose-it policy is not
triggered. The source can then use the high ACR to dump a large amount
of data suddenly. The effect is amplified when many sources do the
same. This contribution studies this particular case.

In this study we generate a worst case in which the TCP sources have a
high congestion window and all ACR values are high. Under these
conditions the TCP sources are made to dump a huge burst of data into
the network.
We show that even under these circumstances ABR using a good switch
algorithm like ERICA+ [3] performs well and controls the queues. In our
previous simulations we had used only up to 15 TCP sources. In this
study we present results of simulations with up to 200 TCP sources.

2. Generating Worst Case TCP traffic
------------------------------------

TCP has its own congestion control mechanism. It uses the "slow start"
mechanism, which is explained in detail in [4,5]. The amount of traffic
which TCP can send into the network depends on the TCP congestion
window size and the ACR value. The maximum possible burst which TCP can
send is equal to the maximum congestion window size. TCP takes
approximately log2(maximum congestion window, in segments) round trips
of "slow start" to reach the maximum congestion window. In normal TCP
activity, when this maximum congestion window is reached, the ACRs may
not be high. In order to generate worst case conditions in which TCP
has the maximum congestion window size and the ACR is high, we designed
the following scenario.

The configuration used is N TCP sources connected to N TCP destinations
via two switches and a bottleneck link (LINK1). The following figure
shows the configuration used.

    --------                                           --------
   | Src 1  |\                                        /| Dest 1 |
   |        | \                                      / |        |
    --------   \   --------    LINK1    --------    /   --------
       .        ---|        |          |        |---       .
       .        ---|  Sw1   |----------|  Sw2   |---       .
       .       /   |        |          |        |   \      .
    --------  /     --------            --------     \   --------
   | Src N  |/                                         \| Dest N |
   |        |                                           |        |
    --------                                             --------

      Figure 1: The "N Source - N Destination" Configuration
      ------------------------------------------------------

Initially, to build up the congestion window, each source sends one
segment of data every 't' milliseconds. One thousand such segments are
sent, so that the congestion window reaches its maximum for all the
sources. The sources send their segments in a staggered manner, so that
not all sources send segments simultaneously. This is done so that the
TCP data can be sent without overloading the network. Since the network
is not overloaded, the ACRs are high when the congestion windows reach
their maximum values. At this point all the sources synchronize and
send a burst of data (burst size equal to the maximum congestion
window). Since each source sends one segment every 't' milliseconds and
one thousand segments in all, the synchronization time is just after
1000*t milliseconds. This is the worst case burst which can arise from
the TCP sources.

3. TCP Options and ERICA+ parameters
------------------------------------

We use a TCP maximum segment size (MSS) of 512 or 1024 bytes. The MTU
size used by IP over ATM is generally 9180 bytes, so there is no
fragmentation at the IP layer. Since our simulations are performed
under no loss conditions, the results hold even for TCP with fast
retransmit and recovery or TCP with SACK (Selective Acknowledgement).

The TCP data is encapsulated over ATM as follows. First, a set of
headers and trailers is added to every TCP segment: 20 bytes of TCP
header, 20 bytes of IP header, 8 bytes for the RFC 1577 LLC/SNAP
encapsulation, and 8 bytes of AAL5 information, a total of 56 bytes.
Hence, every MSS of 512 bytes becomes 568 bytes of payload for
transmission over ATM. This payload, with padding, requires 12 ATM
cells of 48 data bytes each.
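
This arithmetic can be summarized in a short sketch (Python; the code
and its names are our own illustration, not part of the simulation
model). It computes the number of ATM cells per TCP segment from the
56 bytes of overhead listed above.

   # Sketch: ATM cells needed to carry one TCP segment, using the
   # overheads above (20 B TCP + 20 B IP + 8 B LLC/SNAP + 8 B AAL5).
   import math

   OVERHEAD_BYTES = 20 + 20 + 8 + 8   # TCP + IP + RFC 1577 LLC/SNAP + AAL5
   CELL_PAYLOAD   = 48                # data bytes per ATM cell

   def cells_per_segment(mss_bytes):
       """AAL5 payload (segment + overhead) padded to a whole number of cells."""
       payload = mss_bytes + OVERHEAD_BYTES
       return math.ceil(payload / CELL_PAYLOAD)

   if __name__ == "__main__":
       for mss in (512, 1024):
           print(mss, "byte MSS ->", cells_per_segment(mss), "cells")
       # 512 byte MSS -> 12 cells, 1024 byte MSS -> 23 cells
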
The ERICA+ algorithm is an extension of ERICA which uses the queueing
delay as an additional metric to calculate the feedback. ERICA+ uses
four parameters: a target queueing delay (T0 = 500 microseconds), two
curve parameters (a = 1.15 and b = 1.05), and a factor which limits the
amount of ABR capacity allocated to drain the queues (QDLF = 0.5). In
our previous study [6] we had enhanced our ERICA+ algorithm to handle
variance in traffic. In our current simulations, this exponential
averaging option is included to handle the variance in the traffic. The
ERICA+ algorithm is designed to handle short spikes in ACR, achieve
fairness and correctly count bursty sources.

4. Analytical prediction
------------------------

In this section we derive the analytical value of the queue lengths.
The queue length is given by two formulae, for underloaded and
overloaded networks respectively:

   Qlen = N * floor(cwnd_max/48) cells   for N <= floor(1000*t/g)   (formula 1)
        = N * 367 * t            cells   for N >  floor(1000*t/g)   (formula 2)

   where
      N        - number of sources
      cwnd_max - maximum congestion window size (bytes)
      t        - time between consecutive segments of a source (milliseconds)
      g        - time between two sources sending segments (microseconds)

Initially, for a small number of sources, the network is underloaded,
so the ACRs are high when the burst occurs. When all the sources send
the burst simultaneously, the whole burst can be sent into the network
and this results in large switch queues. Let 'cells_in_mss' be the
number of cells required to send a segment and 'mss' be the maximum
segment size. Then

   burst size due to one source = floor(cells_in_mss*(cwnd_max/mss)) cells

So if there are N sources, the expected queue length is

   Qlen = N * floor(cells_in_mss*(cwnd_max/mss)) cells

Approximating cells_in_mss = ceiling(mss/48) by mss/48 in the above, we
get

   Qlen = N * floor(cwnd_max/48) cells

Under no loss conditions, the TCP congestion window increases
exponentially in the "slow start" phase until it reaches the maximum
value.

As the number of sources increases, the load on the network increases.
In the simulation every source sends a segment of size 512 bytes every
't' milliseconds. The time between two sources sending their segments
is 'g' microseconds. At time 0, source 1 sends a segment; at time 'g'
microseconds, source 2 sends; at time '2*g' microseconds, source 3
sends, and so on. Also, source 1 sends its second segment at time 't'
milliseconds, and so on (see Figure 2).

   Figure 2: Timing diagram: sources sending segments
   --------------------------------------------------

          <------ 't' milliseconds ------>
   ---------------------------------------------------------------> Time
    ^      ^      ^                 ^      ^      ^
    |      |      |                 |      |      |
   S-1    S-2    S-3               S-1    S-2    S-3
    <------>
    'g' microseconds

   Legend
   ------
   S-i - Source number 'i'.

The network becomes overloaded when the input rate is greater than the
output rate. In the simulation, the network gets overloaded if the
number of sources

   N > (t milliseconds)/(g microseconds) = 1000*t/g,

i.e., if N > floor(1000*t/g). (For t = 1 millisecond and g = 50
microseconds, this threshold is 1000*1/50 = 20 sources.)
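
The two formulae and the overload threshold can be written compactly as
follows (Python; the function name and structure are our own
illustration). The default parameter values match those used in the
examples later in this contribution.

   # Sketch: expected worst-case queue length from formulas 1 and 2 above.
   import math

   def expected_queue_cells(n_sources, cwnd_max=65536, t_ms=1.0, g_us=50.0):
       """Analytical queue length (cells) at the bottleneck switch.

       formula 1 (underloaded, N <= floor(1000*t/g)): N * floor(cwnd_max/48)
       formula 2 (overloaded,  N >  floor(1000*t/g)): N * 367 * t
       """
       overload_threshold = math.floor(1000.0 * t_ms / g_us)
       if n_sources <= overload_threshold:
           return n_sources * (cwnd_max // 48)        # formula 1
       return int(n_sources * 367 * t_ms)             # formula 2

   if __name__ == "__main__":
       # Defaults: t = 1 ms, g = 50 us, cwnd_max = 65536 bytes.
       for n in (2, 10, 20, 30, 100):
           print(n, "sources ->", expected_queue_cells(n), "cells")
       # 2 -> 2730, 10 -> 13650, 20 -> 27300, 30 -> 11010, 100 -> 36700
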
Once the network is overloaded the ACRs are low and each of the N
sources gets approximately (1/N)th of the bandwidth. Since the ACRs are
low, the burst which occurs after 1 second should not give rise to
large queues. There are still switch queues which build up initially,
while the network has not yet realized that it is overloaded.

The TCP congestion window grows exponentially: it doubles once per
round trip as acknowledgements are received. Let 'd' be the length of
the links in km. The acknowledgement arrives after a round trip time of
30*(d/1000) ms. By this time the source would have generated more
segments to send. After each round trip, each source will send twice
the number of segments, so a larger round trip time can give rise to a
larger burst.

The network gets overloaded when the number of segments sent just fills
up the pipe. Let k be the number of round trips it takes to overload
the network. The time taken to transmit one cell is 2.72 microseconds,
so in 't' milliseconds about 1000*t/2.72 ~= 367*t cells can be
transmitted. Each segment gives rise to 'cells_in_mss' cells.
Therefore, the network gets overloaded when

   2^k * cells_in_mss = 367*t   =>   k = log2(367*t/cells_in_mss)

For t = 1 millisecond and mss = 512 bytes, k = 4.934, i.e., k ~= 5
(~= means approximately equal to). Hence, in this case the network
detects the overload after 6 (= k+1) round trips if N > 20. The maximum
queues should therefore occur after 6 round trips for N > 20.

The value of the maximum queue in an overloaded network can be derived
as follows:

   burst due to one source = 2^k * cells_in_mss = 367*t cells
   burst due to N sources  = N * 367 * t cells

For t = 1 millisecond, g = 50 microseconds, mss = 512 bytes and
cwnd_max = 65536 bytes, we get

   Qlen = N * 1365 cells   for N <= 20
        = N * 367  cells   for N >  20
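
As a quick check on the numbers above, the following sketch (Python,
our own illustration) recomputes the pipe capacity per 't' milliseconds
from the 2.72 microsecond cell time and the number of round trips k
needed to fill the pipe.

   # Sketch: round trips of slow start needed to fill the pipe.
   import math

   CELL_TIME_US = 2.72          # time to transmit one ATM cell, microseconds

   def round_trips_to_overload(t_ms=1.0, cells_in_mss=12):
       """Return (pipe capacity per 't' ms in cells, k) with 2^k segments filling the pipe."""
       pipe_cells = int(1000.0 * t_ms / CELL_TIME_US)      # ~= 367 * t cells
       k = math.log2(pipe_cells / cells_in_mss)
       return pipe_cells, k

   if __name__ == "__main__":
       pipe, k = round_trips_to_overload(t_ms=1.0, cells_in_mss=12)
       print("pipe capacity per t =", pipe, "cells")           # 367 cells
       print("k =", round(k, 3), "=> overload detected after",
             math.ceil(k) + 1, "round trips")                   # k ~= 5 => 6 round trips
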
5. Simulation results
---------------------

We present the maximum queues at the bottleneck switch for simulations
in which the number of sources (N) varies from 2 to 200. The results
are shown in Table 1. The same information in graphical form will be
presented at the ATM Forum and will be available at our web site after
the Forum meeting. The length of all the links is 1000 km, giving a
15 ms propagation delay from source to destination. The round trip time
(RTT) is 30 milliseconds. The feedback delay (the delay between the
source and the bottleneck link) is 10 ms. A round trip of 30 ms
corresponds to 11029 cells. The other parameter values are t = 1
millisecond, g = 50 microseconds, mss = 512 bytes and cwnd_max = 65536
bytes.

For N <= 20, the maximum queue length occurred at around 1 second. For
N > 20, the maximum queue occurred at around 300 milliseconds (which
agrees with the analytical prediction). The queue lengths increase with
the number of sources, as expected, and they also correspond to the
burst size sent by the sources. But this is true only when the number
of sources is less than or equal to 20. For N = 30, there is a sharp
decrease in the maximum queue length; it starts increasing again (for
N > 40), but now at a slower rate.

From Table 1, it can be seen that for N <= 20 the simulation agrees
well with the analytical values. For N >= 30, the simulated queue
lengths are at first higher than the analytical values, but as N grows
larger they fall below the expected queue length. This can be explained
as follows. Initially, the traffic generated in each cycle can be sent
without leaving a pending queue for the next cycle. But as the number
of sources increases (N > 80), some cells are left pending in each
cycle and get sent in later cycles. Because of this, the network
detects the overload at smaller queue lengths than the analytical value
predicts. A more rigorous analytical derivation is necessary to take
this into account.

         Table 1: Effect of number of sources
          Actual and expected queue lengths
   ========================================
   | # TCP   | Max Queue | Analytical |
   | Sources | size      | Queue Size |
   |         | (cells)   | (cells)    |
   ========================================
   |    2    |    1575   |     2730   |
   |    3    |    3149   |     4095   |
   |    5    |    6297   |     6825   |
   |   10    |   14131   |    13650   |
   |   20    |   29751   |    27300   |
   |   30    |   20068   |    11010   |
   |   40    |   19619   |    14680   |
   |   50    |   24162   |    18350   |
   |   60    |   28006   |    22020   |
   |   70    |   30109   |    25690   |
   |   80    |   31439   |    29360   |
   |   90    |   34530   |    33030   |
   |  100    |   38088   |    36700   |
   |  110    |   43672   |    40370   |
   |  120    |   44939   |    44040   |
   |  130    |   44708   |    47710   |
   |  140    |   44744   |    51380   |
   |  150    |   46058   |    55050   |
   |  160    |   48880   |    58720   |
   |  170    |   50784   |    62390   |
   |  180    |   49961   |    66060   |
   |  190    |   53366   |    69730   |
   |  200    |   55618   |    73400   |
   ========================================

To study the effect of different parameters, a full factorial
experiment was carried out. For a given number of sources the following
parameters were varied: maximum segment size (mss = 512, 1024 bytes),
time between two sources sending segments (g = 50, 100 microseconds),
time between successive segments of a source (t = 1, 10 milliseconds)
and distance of the links (d = 1000, 2000 km).

   Table 2: Full factorial experiment
   (mss in bytes, g in microseconds, t in milliseconds, d in km)
   =========================================================================
   | #  | mss/g/t/d        | N=3  | N=10  | N=30  | N=40  | N=50  | N=100  |
   =========================================================================
   |  1 | 512/50/1/1000    | 3171 | 14273 | 20068 | 19619 | 24162 |  35687 |
   |  2 | 512/50/1/2000    | 3171 | 14273 | 19906 | 27567 | 30872 |  75083 |
   |  3 | 512/50/10/1000   | 3172 | 14274 | 45994 | 61854 | 77714 | 150453 |
   |  4 | 512/50/10/2000   | 3172 | 14274 | 45994 | 61854 | 77714 | 150458 |
   |  5 | 512/100/1/1000   | 3171 | 14273 | 19283 | 20080 | 24164 |     NA |
   |  6 | 512/100/1/2000   | 3171 | 14273 | 21241 | 32314 | 35961 |     NA |
   |  7 | 512/100/10/1000  | 3172 | 14274 | 45994 | 61854 | 77714 |     NA |
   |  8 | 512/100/10/2000  | 3172 | 14274 | 45994 | 61854 | 77714 |     NA |
   |  9 | 1024/50/1/1000   | 3040 | 13680 | 18650 | 18824 | 23542 |     NA |
   | 10 | 1024/50/1/2000   | 1542 |  5612 | 19131 | 22934 | 29163 |     NA |
   | 11 | 1024/50/10/1000  | 3040 | 13680 | 44080 | 59280 | 74480 |     NA |
   | 12 | 1024/50/10/2000  | 3041 | 13681 | 44081 | 59281 | 74481 |     NA |
   | 13 | 1024/100/1/1000  | 3040 | 13680 | 18591 | 19600 | 24314 |     NA |
   | 14 | 1024/100/1/2000  | 1403 |  5556 | 17471 | 24412 | 30533 |     NA |
   | 15 | 1024/100/10/1000 | 3040 | 13680 | 44080 | 59280 | 74480 |     NA |
   | 16 | 1024/100/10/2000 | 3041 | 13681 | 44081 | 59281 | 74481 |     NA |
   =========================================================================
   (NA - data not available)

Table 2 shows the maximum queue sizes for the 16 experiments, for
N = 3, 10, 30, 40, 50 and 100 sources. From Table 2, the following
observations can be made; they support the analytical results.

- In lines 10 and 14, for N = 3 and 10, the queue sizes are smaller
  than in the other lines, since in these cases congestion is detected
  early and the ACRs are low when the burst is sent. The queue length
  is given by formula 2.

- For N = 30, 40, 50 and 100, when t = 10 milliseconds, the queue
  lengths agree with those given by formula 1 (see the sketch after
  these observations for the overload thresholds).

- If the network is overloaded, then a larger round trip time gives
  larger queue lengths (e.g., compare lines 1 and 2 for N = 30, 40, 50).

- Lines 3 and 4 indicate that under underloaded conditions the queue
  length is given by formula 1.

- As expected, segment sizes of 512 and 1024 bytes do not affect the
  queue sizes.
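
The observation about t = 10 milliseconds can be checked by computing
the overload threshold floor(1000*t/g) for the four (t, g) combinations
of Table 2. The sketch below (Python, our own illustration) does this;
rows with t = 10 milliseconds remain underloaded for all values of N
simulated here, so formula 1 applies to them.

   # Sketch: overload thresholds floor(1000*t/g) for the (t, g)
   # combinations of Table 2, to show which rows fall under formula 1
   # (underloaded) versus formula 2 (overloaded).
   import math

   def overload_threshold(t_ms, g_us):
       """Number of sources above which the network becomes overloaded."""
       return math.floor(1000.0 * t_ms / g_us)

   if __name__ == "__main__":
       for t_ms in (1, 10):
           for g_us in (50, 100):
               print(f"t = {t_ms:2d} ms, g = {g_us:3d} us -> overloaded for N > "
                     f"{overload_threshold(t_ms, g_us)}")
       # t =  1 ms, g =  50 us -> overloaded for N > 20
       # t =  1 ms, g = 100 us -> overloaded for N > 10
       # t = 10 ms, g =  50 us -> overloaded for N > 200
       # t = 10 ms, g = 100 us -> overloaded for N > 100
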
6. Summary
----------

We artificially generated a worst case scenario and studied the buffer
requirement problem for TCP over ABR. The buffer size required depends
heavily on the switch scheme used. Our simulations are based on our
ERICA+ switch scheme. For the worst case scenario generated, both
analytical predictions and simulation results for the queue lengths
(and hence buffer requirements) are given. The simulations agree well
with the analytical predictions. The queue length is affected by the
maximum congestion window size, the round trip time, the network load
condition (overloaded or underloaded) and the number of sources. It is
not affected by the maximum segment size.

7. References:
---------------

[1] R. Jain, S. Kalyanaraman, R. Goyal, S. Fahmy, S. M. Srinidhi,
    "Buffer Requirements for TCP over ABR," AF-TM 96-0517, April 1996.

[2] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal and
    Seong-Cheol Kim, "Buffer Requirements For TCP/IP Over ABR," Proc.
    IEEE ATM'96 Workshop, San Francisco, August 23-24, 1996.

[3] Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Sonia Fahmy, and Ram
    Viswanathan, "The ERICA Switch Algorithm for ABR Traffic Management
    in ATM Networks, Part I: Description," submitted to IEEE
    Transactions on Networking, January 1997. Also presented as AF-TM
    96-1172, August 1996.

[4] Raj Jain, Shiv Kalyanaraman, Rohit Goyal, Sonia Fahmy, Fang Lu,
    Saragur Srinidhi, "TBE and TCP/IP traffic," ATM Forum/96-0177,
    February 1996.

[5] V. Jacobson, "Congestion Avoidance and Control," Proceedings of the
    SIGCOMM'88 Symposium, pp. 314-329, August 1988.

[6] Shivkumar Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal,
    Jianping Jiang, Seong-Cheol Kim, "Performance of TCP over ABR on
    ATM backbone and with various VBR traffic patterns," ATM
    Forum/96-1294, October 1996.

[7] Shiv Kalyanaraman, Raj Jain, Sonia Fahmy, Rohit Goyal and
    Seong-Cheol Kim, "Performance and Buffering Requirements of
    Internet Protocols over ATM ABR and UBR Services," submitted to
    IEEE Communications Magazine, January 1997 issue.

[8] Shiv Kalyanaraman, Raj Jain, Rohit Goyal, Sonia Fahmy and
    Seong-Cheol Kim, "Performance of TCP/IP Using ATM ABR and UBR
    Services over Satellite Networks," IEEE Communication Society
    Workshop on Computer-Aided Modeling, Analysis and Design of
    Communication Links and Networks, McLean, VA, October 20, 1996.

All our papers and ATM Forum contributions are available through
http://www.cse.wustl.edu/~jain/