*****************************************************************************
ATM Forum Document Number: ATM Forum/97-1085R1
*****************************************************************************
Title: A switch algorithm for ABR multipoint-to-point connections
*****************************************************************************
Abstract: In a multipoint-to-point connection, multiple senders send data
to the same destination. The multipoint-to-point connection can be
implemented as a shared tree, where traffic from multiple branches is
merged into the same stream after every merging point. If the same VPI/VCI
values are used by all senders in the multipoint-to-point VC, it is
impossible for the network to determine any sender-specific characteristics
(such as the sender rate and whether it is bottlenecked at this link). Thus
ABR multipoint traffic management becomes complex, since per-VC (or
per-flow) accounting is no longer equivalent to per-source accounting. We
design and simulate a switch scheme that achieves fairness among the
senders in both point-to-point and multipoint-to-point connections.
*****************************************************************************
Source: Sonia Fahmy, Raj Jain, Rohit Goyal, and Bobby Vandalore
The Ohio State University
Department of Computer and Information Science
Columbus, OH 43210-1277
Contact Phone: 614-292-3989, Fax: 614-292-2911,
E-mail: {fahmy,jain}@cse.wustl.edu
This work is sponsored in part by NASA Lewis Research Center.
*****************************************************************************
Date: December 1997
*****************************************************************************
Distribution: ATM Forum Technical Working Group Members (AF-TM)
*****************************************************************************
Notice: This contribution has been prepared to assist the ATM Forum. It is
offered to the Forum as a basis for discussion and is not a binding
proposal on the part of any of the contributing organizations. The
statements are subject to change in form and content after further study.
Specifically, the contributors reserve the right to add to, amend or modify
the statements contained herein.
*****************************************************************************

A postscript version of this contribution including all figures and tables
has been uploaded to the ATM Forum ftp server in the incoming directory. It
may be moved from there to the atm97 directory. The contribution is also
available on our web page through:
http://www.cse.wustl.edu/~jain/atmf/a97-1085.htm

1 Introduction

Multipoint-to-point connections require feedback to be returned to the
appropriate sources at the appropriate times. The bandwidth requirement for
a VC after a merging point is the sum of the bandwidths used by all senders
whose traffic is merged (see figure 1). This is because the aggregate data
rate after a merging point is the sum of all incoming data rates to the
merging point [9].

Figure 1: Multipoint-to-point connections

In [2], we defined different types of fairness for multipoint-to-point VCs
implemented as shared trees. Among these, source-based fairness is the
preferred definition, since it extends the point-to-point fairness
definitions [7]. To compute source-based fair allocations, a single
N-to-one connection is treated as N one-to-one connections (VCs),
regardless of which VC each source belongs to.
A source-based fairness algorithm must give the same allocation to all
sources bottlenecked on the same link. Source-based fairness in VC merging
implementations poses a number of problems, since sources in the same VC
cannot be distinguished. The main considerations for switch algorithms in
this case are to avoid any per-source accounting, and to avoid any attempt
to estimate the effective number, or the rates, of active sources. However,
this may increase the oscillations and lengthen the transient response of
many algorithms, since per-source accounting can improve switch algorithm
performance. In this contribution, we design a switch algorithm that gives
source-based max-min fair allocations for point-to-point and multipoint
VCs.

The remainder of the contribution is organized as follows. First, we
discuss the VC merging technique for avoiding cell interleaving in
multipoint connections. Then, we summarize previous work on
multipoint-to-point algorithms. In sections 4 and 5, we develop the rate
allocation and merging point algorithms for multipoint connections. We
analyze the performance of the algorithm by simulating it in a number of
configurations in section 6.

2 VC Merging

The ATM adaptation layer (AAL) at each sender segments packets into ATM
cells, marking the last cell of each packet. The AAL at the receiver uses
the VPI/VCI fields and the end of packet marker to reassemble the data from
the cells received. AAL5 does not introduce any multiplexing identifier or
sequence number in ATM cells. If cells from different senders are merged
and interleaved on the links of a multipoint connection (implemented as a
shared tree), AAL5 at the receiver cannot reassemble the data. This is
because all traffic within the group uses the same VPI/VCI, and the
identity of the sender is not indicated in each cell. Since the cells of
different packets are interleaved, the packets get corrupted, as
illustrated in figure 2.

Figure 2: The cell interleaving problem

One of the solutions proposed for this problem is the VC merging approach.
This approach buffers cells of other packets at the switch until all cells
of the current packet go through (see SEAM [4] and ARIS [12]). A
packet-based scheduling algorithm is implemented at the merging point, and
separate queues are maintained for each flow. A flow is defined as the
cells of a VC coming on an input link. The end-of-message bit signals to
the switch that a packet from a different port can now be forwarded. See
[13] for an analysis of VC merging. In this contribution, we focus on VC
merging implementations, where cells from different senders are
indistinguishable, since this is the most difficult case.

3 Previous Work

Traffic management rules for multipoint-to-point connections are still in
their early phases of definition [5, 6]. Four different types of fairness
have been defined for multipoint-to-point connections in [2]:

1. Source-based fairness: divides bandwidth fairly among active sources,
   ignoring group memberships.
2. VC/source-based fairness: first gives fair bandwidth allocations at the
   VC level, and then fairly allocates the bandwidth for each VC among the
   active sources in the VC.
3. Flow-based fairness: gives fair allocations for each active flow, where
   a flow constitutes the cells of a VC coming on an input link.
4. VC/flow-based fairness: first divides the available bandwidth fairly
   among the active VCs and then divides the VC bandwidth fairly among the
   active flows in the VC.
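As a reference point for definition 1, the following is a minimal C sketch
that computes source-based max-min allocations offline with the classical
iterative bottleneck procedure [7]: repeatedly find the link with the
smallest fair share, fix the rates of the unconstrained sources crossing
it, and remove its capacity. This is only an illustration, not part of any
switch algorithm; the hard-coded topology mirrors the upstream bottleneck
configuration simulated later in section 6.2.2 (a 50 Mbps LINK1 carrying
S1, S2 and SA, and a 150 Mbps LINK3 carrying S1 through S4), and all names
are ours.

    /* Offline reference computation of source-based max-min rates
     * (iterative bottleneck procedure, following Jaffe [7]). */
    #include <stdio.h>

    #define NSRC 5
    #define NLNK 2

    int main(void)
    {
        const char *name[NSRC] = {"S1", "S2", "S3", "S4", "SA"};
        int uses[NLNK][NSRC] = {
            {1, 1, 0, 0, 1},    /* LINK1 (50 Mbps)  */
            {1, 1, 1, 1, 0}     /* LINK3 (150 Mbps) */
        };
        double cap[NLNK] = {50.0, 150.0};
        double rate[NSRC] = {0};
        int fixed[NSRC] = {0}, done[NLNK] = {0};

        for (;;) {
            /* The link with the smallest fair share among its
             * not-yet-fixed sources is the next bottleneck. */
            int bn = -1;
            double share = 0;
            for (int l = 0; l < NLNK; l++) {
                if (done[l]) continue;
                int n = 0;
                for (int s = 0; s < NSRC; s++)
                    if (uses[l][s] && !fixed[s]) n++;
                if (n == 0) { done[l] = 1; continue; }
                if (bn < 0 || cap[l] / n < share) {
                    bn = l;
                    share = cap[l] / n;
                }
            }
            if (bn < 0) break;    /* every source has been fixed */

            /* Fix the bottlenecked sources and return their bandwidth
             * on every other link they cross. */
            for (int s = 0; s < NSRC; s++) {
                if (!uses[bn][s] || fixed[s]) continue;
                rate[s] = share;
                fixed[s] = 1;
                for (int l = 0; l < NLNK; l++)
                    if (l != bn && uses[l][s]) cap[l] -= share;
            }
            done[bn] = 1;
        }
        for (int s = 0; s < NSRC; s++)
            printf("%s: %.2f Mbps\n", name[s], rate[s]);
        return 0;
    }

Running this reproduces the allocation vector {16.67, 16.67, 58.33, 58.33,
16.67} derived in section 6.2.2.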
Ren and Siu [10] have described an algorithm for source-based
multipoint-to-point congestion control, which assumes that VC merging is
employed. The algorithm operates as follows. When a forward resource
management (FRM) cell originating at a leaf is received at the merging
point, it is forwarded to the root, and the merging point returns a
backward resource management (BRM) cell to the source which had sent the
FRM cell. The explicit rate in the BRM cell is set to the value of a
register called MER (explicit rate), maintained at the merging point for
each VC. The MER register is then reset to the peak cell rate. When a BRM
cell is received at the merging point, the ER value in the BRM is used to
set the MER register, and the BRM cell is discarded.

A better strategy is to maintain a bit at the merging point for each of the
flows being merged [11]. The bit indicates that an FRM has been received
from this flow after a BRM had been sent to it. Therefore, when an FRM is
received at the merging point, it is forwarded to the root and the bit is
set, but the RM cell is not turned around as in the previous algorithm.
When a BRM is received at the merging point, it is duplicated and sent to
the branches that have their bit set, and then the bits are reset. We
implement this algorithm as explained in section 5, and show simulation
results in section 6. In their papers [10] and [11], Ren and Siu only show
simulation results for simple LAN configurations. We discuss more complex
problems that arise in multipoint algorithms, and show simulation results
for our proposed solutions.

4 Rate Allocation Algorithm

The rate allocation algorithm is employed at every switch to give the
appropriate feedback to the senders [3]. The algorithm developed here is
based upon the ERICA+ switch algorithm [8]. There are, however, a number of
differences from ERICA+. We eliminate all the steps that required per-VC
accounting, because in multipoint connections per-VC accounting would
really have to be per-source accounting. We avoid that for simplicity and
scalability reasons, and for compatibility with VC merging switches. We
first give the pseudocode of the algorithm, and then discuss some of the
issues pertaining to rate allocation algorithms for multipoint connections.

4.1 Algorithm Pseudocode

The algorithm uses a measurement interval to measure the quantities
required for computing the rate allocation. At the end of every interval,
the algorithm averages and computes the quantities, which are used to give
the appropriate feedback to the sources in the following interval. The
algorithm measures the input rate and the available capacity, subtracting
the capacity used by higher priority classes such as VBR. It also computes
a function of the queueing delay and uses its value to scale the available
capacity (in order to leave a percentage of the capacity for the queues to
drain). See [8] for more details on the queue control concept and function.
The ratio of the input rate to the target capacity is called the overload
factor. The algorithm also uses the current cell rate of the sources, as
indicated in the FRM cells. In addition, it keeps track of the maximum
explicit rate given to sources during every interval. In the pseudocode
below, there are two options that are not necessary for the algorithm, but
help reduce rate fluctuations in some cases (especially when the
measurement interval value is very small).
The first option (which we label option 1) does not use the most recent
current cell rate (CCR) value from FRM cells, but uses the maximum of the
CCR values seen in FRMs in the current interval. This option is useful when
there are multiple senders in the same VC, as explained in the next
subsection. The second option (option 2) uses exponential averaging for the
maximum ER given in the previous interval.

The algorithm executes for each output port: when an FRM cell is received,
when a BRM cell is received, and at the end of each measurement interval.
Since the calculations of the input rate, target capacity and overload
factor are the same as in the ERICA+ algorithm, we only briefly outline
these here.

FRM cell is received for VC j:
==============================
(current cell rate)j = CCR field from the FRM cell

Or as an option (option 1: maximum CCR option):
IF (first FRM in interval)j = TRUE THEN
    (current cell rate)j = CCR field from the FRM cell
    (first FRM in interval)j = FALSE
ELSE
    (current cell rate)j = maximum (CCR field from the FRM cell,
                                    (current cell rate)j)
END

BRM cell is to be sent out for VC j:
====================================
IF (overload factor > 1 + delta) THEN
    ER = (current cell rate)j/overload factor
ELSE
    ER = maximum ((current cell rate)j/overload factor,
                  maximum ER in previous interval)
END
ER = minimum (target capacity, ER)
maximum ER in current interval = maximum (ER, maximum ER in current interval)
ER in BRM cell = minimum (ER, ER in BRM cell)

End of measurement interval:
============================
target capacity = exponential average of target capacity across intervals
    (subtracting VBR capacity, and leaving capacity for queues to drain by
    using a queue control function as in [8])
input rate = exponential average of input rate across intervals
overload factor = input rate/target capacity
FOR ALL j
    (first FRM in interval)j = TRUE
maximum ER in previous interval = maximum ER in current interval

Or as an option (option 2: averaging the maximum ER in previous interval
option):
maximum ER in previous interval =
    (1 - alpha) x maximum ER in current interval
    + alpha x maximum ER in previous interval

maximum ER in current interval = 0

Notes:

1. The input rate, target capacity, overload factor, maximum ER in current
   interval and maximum ER in previous interval are computed and stored for
   each output port. The "first FRM in interval" and the "current cell
   rate" variables are stored for each VC for each output port.
2. In our simulations, the parameter delta is set to 0.1, and the parameter
   alpha is also set to 0.1. These are the recommended values in most
   cases.
3. The maximum ER averaging option (option 2) slightly reduces rate
   oscillations in some cases. It is not essential if its implementation
   complexity is high.
4. The maximum CCR option (option 1) also reduces rate oscillations,
   especially in cases of extremely small averaging interval values
   (< 200 us for rates of about 10 Mbps per source). It is not necessary in
   most cases. Exponentially averaging the maximum CCR values across
   intervals can further improve the performance. The next subsection
   discusses the usage of CCR in more detail.
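Before that, to make the per-port bookkeeping concrete, here is a minimal C
sketch of the BRM computation and end-of-interval step above. It is an
illustration under our own naming, not a normative implementation: the
averaging factor for the target capacity and input rate is not named in the
pseudocode, so alpha is reused for it here, and per-VC CCR storage is
assumed to be handled by the caller.

    /* A minimal sketch of the per-port feedback computation of
     * section 4.1 (with option 1 CCR handling done by the caller). */
    #include <math.h>

    #define DELTA 0.1
    #define ALPHA 0.1

    struct port_state {
        double target_capacity;  /* after VBR and queue control scaling  */
        double input_rate;
        double overload;         /* input_rate / target_capacity         */
        double max_er_prev;      /* maximum ER given in previous interval */
        double max_er_cur;       /* maximum ER given in current interval  */
    };

    /* BRM cell for VC j is about to be sent out; ccr_j is the CCR noted
     * from FRM cells of VC j.  Returns the ER to stamp into the cell. */
    double brm_er(struct port_state *p, double ccr_j, double er_in_cell)
    {
        double er = ccr_j / p->overload;
        if (p->overload <= 1.0 + DELTA)     /* not overloaded: also offer */
            er = fmax(er, p->max_er_prev);  /* last interval's maximum ER */
        er = fmin(er, p->target_capacity);
        if (er > p->max_er_cur)
            p->max_er_cur = er;
        return fmin(er, er_in_cell);        /* never raise the cell's ER  */
    }

    /* End of measurement interval (with option 2: exponential averaging
     * of the maximum ER given in the previous interval). */
    void end_of_interval(struct port_state *p,
                         double measured_capacity, double measured_input)
    {
        p->target_capacity = (1 - ALPHA) * measured_capacity
                             + ALPHA * p->target_capacity;
        p->input_rate = (1 - ALPHA) * measured_input
                        + ALPHA * p->input_rate;
        p->overload = p->input_rate / p->target_capacity;
        p->max_er_prev = (1 - ALPHA) * p->max_er_cur
                         + ALPHA * p->max_er_prev;
        p->max_er_cur = 0;
        /* the per-VC "first FRM in interval" flags are reset elsewhere */
    }

Note how the pseudocode's two cases collapse into one line each: when the
port is not overloaded (overload factor <= 1 + delta), the ER is raised to
at least the maximum ER given in the previous interval, which is what keeps
an underallocated VC from being locked into a low rate.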
4.2 Design Issues

Rate allocation algorithms for multipoint-to-point (or
multipoint-to-multipoint) connections cannot distinguish cells from
different sources in the same VC. Thus they cannot measure the rate of each
source, distinguish between overloading and underloading sources, or
estimate the effective number of active sources. Such techniques are used
in many point-to-point switch schemes, such as the MIT scheme, the UCSC
scheme, and the ERICA scheme.

Figure 3: Example multipoint-to-point configuration with an upstream
bottleneck

Furthermore, algorithms which use the CCR values noted from BRM cells will
not work for multipoint-to-point connections. This is because it may be
impossible to determine which source the RM cell belongs to. The CCR value
in the BRM cells at the merging point may not capture upstream bottleneck
information for any of the flows whose traffic is being merged, since it
may actually be the CCR of a downstream source whose bottleneck rate is
high. Consider the following situation. Suppose a multipoint-to-point VC
has two sources, one of which has a bottleneck rate of 58 Mbps and the
other a bottleneck rate of 16 Mbps, and the two sources are being merged at
a switch. Figure 3 shows an example where, at Switch2, S1 (and S2) of rate
16 Mbps and S3 of rate 58 Mbps are being merged (we simulate this case in
sections 6.2.2 to 6.2.4). The source which is bottlenecked at 16 Mbps (say
S1) shares its bottleneck link with a point-to-point connection (SA to
dSA). At the merging point, BRM cells corresponding to the higher rate
source (the 58 Mbps source) are sent more frequently to all the sources
being merged in this VC (assuming FRM cells were received from them), with
an ER value that assumes the CCR is 58 Mbps. This can result in
overallocation to the lower rate source(s) being merged, and unfairness to
the point-to-point connection.

Therefore, algorithms that use the CCR value for rate computation must use
the CCR indicated in FRM cells when a BRM cell is received. This is the
most up-to-date value of the CCR anyway, since the CCR in the BRMs may be
stale after traveling all the way to the destination and back. The CCR
value in the FRM cells at the merging point captures upstream bottleneck
information for one of the flows whose traffic is being merged. The FRM
cells of the sources being merged, however, may still be indistinguishable
at the merging point. In the remainder of this section, we argue that this
does not significantly affect the convergence and steady state behavior of
the algorithm.

Assume that there are two streams S_low and S_high being merged, and that
the CCRs of the two streams are 16 Mbps and 58 Mbps respectively. We will
examine the situation when the forward CCR used to compute the ER for a
stream is not the CCR corresponding to that stream.

When computing the ER for S_low, if the CCR of S_high (58 Mbps) is used,
then the ER computed for S_low will be too high. But S_low is bottlenecked
upstream of the merging point (otherwise its bottleneck rate would not be
less than that of S_high, since S_low and S_high merge at the merging point
and never split after that), so the ER given to S_low at the merging point
will be overwritten by the upstream switches.

For the case when the CCR of S_low is used to compute the ER for S_high,
let us first consider the algorithm with the maximum CCR option. The only
situation in which the ER for S_high is calculated based upon the CCR of
S_low is when only FRM cells of S_low have been seen since the beginning of
the current interval. (Note that if no FRM cells have been seen at all, the
CCR value used is the maximum seen in the previous interval, which will be
the CCR of the higher rate source S_high unless S_high is sending at a very
low rate, in which case the scheme should not allocate it high rates.)
Since S_high has a higher rate, it has a higher frequency of FRM cells, so
this situation is highly improbable. Assuming, however, that only S_low RM
cells have been seen since the beginning of the current interval, the CCR
divided by the overload factor will likely be smaller than the optimal rate
for S_high. But this will eventually cause the overload factor to be less
than 1 + delta, which means that the maximum ER allocated in the previous
interval will be used in this case. The maximum ER in the previous interval
captures the value for all VCs going to this output port, so fairness and
high link utilization are ensured.

This argument can be extended to the algorithm without the maximum CCR
option. In this case, instead of the smaller CCR being used only when no
FRM cells from the higher rate source have been seen so far in this
interval, it is the last FRM cell received that determines the CCR used.
But, again, since the higher rate source has a higher FRM rate, it is
statistically unlikely for the smaller CCR to be used. The maximum ER in
the previous interval term ensures that if the small CCR is in fact used,
the source is allocated at least as much as other VCs going to the same
output port.

5 Merging Point Algorithm

This algorithm is based upon the multipoint-to-point algorithm developed by
Ren and Siu in [11]. The algorithm is employed at every merging point where
cells from different senders in the same multipoint-to-point VC are being
merged and follow the same path to the destination. We first give the
pseudocode for the algorithm, and then discuss some properties of the
algorithm.

5.1 Algorithm Pseudocode

A flag (which can be one bit) called Ready is maintained for each of the
flows being merged. The flag indicates that an FRM cell has been received
from this flow after a BRM cell had been sent to it.

Upon the receipt of an FRM cell from branch i:
==============================================
1. Forward the FRM cell to the outgoing link
2. Let Ready_i = TRUE

Upon the receipt of a BRM cell from the root:
=============================================
FOR ALL upstream branches i DO
    IF Ready_i = TRUE THEN
        Send a copy of the BRM to branch i
        Let Ready_i = FALSE
    END
END

When a BRM cell is about to be scheduled:
=========================================
Perform the rate allocation algorithm as described in the previous section
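The merging point behaviour above is small enough to show end to end; the C
sketch below is an illustration with the switch primitives stubbed out (the
three helper functions are placeholders of ours, not real switch APIs).

    /* A minimal sketch of the section 5.1 merge point behaviour for
     * one multipoint-to-point VC; cell queueing and VC merging are
     * elided, and the primitives below are stubs. */
    #include <stdbool.h>

    #define NBRANCH 4   /* upstream branches merged at this point */

    static void forward_to_root(void)      { /* enqueue FRM to root  */ }
    static void send_copy_to_branch(int i) { (void)i; /* copy BRM    */ }
    static void run_rate_allocation(void)  { /* section 4 algorithm  */ }

    static bool ready[NBRANCH];   /* FRM seen since last BRM copy sent */

    /* FRM cell received from upstream branch i. */
    void on_frm(int i)
    {
        forward_to_root();    /* FRM is forwarded, never turned around */
        ready[i] = true;
    }

    /* BRM cell received from the root: stamp the ER feedback, then
     * copy the cell to every branch whose Ready bit is set and reset
     * the bits. */
    void on_brm(void)
    {
        run_rate_allocation();    /* performed as the BRM is scheduled */
        for (int i = 0; i < NBRANCH; i++) {
            if (ready[i]) {
                send_copy_to_branch(i);
                ready[i] = false;
            }
        }
    }

Only one bit of state per upstream branch is needed, which is what keeps
the scheme scalable in the number of merged flows.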
5.2 Design Issues

There are other ways to implement multipoint-to-point ABR flow control
algorithms. Each method offers a tradeoff in complexity, scalability,
overhead, noise and response time. In the above algorithm, a BRM cell is
returned to a sender for every one or more FRM cells it sends. Thus the BRM
to FRM cell ratio at the sender is less than or equal to one. In steady
state, the ratio is likely to approach one, since the FRM rate and BRM rate
will be similar. This is an important property of ABR flow control that
should be maintained for multipoint-to-point connections. The BRM to FRM
ratio in the network is also one in this case. This is contrary to the
algorithm in [10], where FRM cells are turned around at merging points, and
the same FRMs can be turned around at another merging point or the
destination, creating many BRM cells that eventually get discarded in the
network. Also observe that in this scheme, since the merging point does not
need to turn around every FRM cell, the overhead of the algorithm is
significantly reduced. However, the scheme needs to duplicate BRM cells.
With the new advances in multicast ATM switch architectures, this operation
can be quite efficient.

The algorithm we use returns a BRM cell received from the root to the
branches which have sent FRM cells to the merging point since the last BRM
cell was passed. This makes the scheme less sensitive to the number of
levels of merging points, as compared to schemes which turn around FRM
cells (such as the scheme in [10]). This is because schemes turning around
FRMs have to wait for an FRM to be received at every merging point, so
their response time increases with the number of levels in the tree. In
addition, the ER value returned by such schemes may be incorrect if no BRM
cells have been received since the last one was sent, leading to rate
oscillations and possibly large queue lengths.

6 Performance Analysis

This section provides a simulation analysis of the multipoint algorithm
described in the previous two sections. The results are presented in the
form of four graphs for each configuration:

(a) Graph of allowed cell rate (ACR) in Mbps versus time for each source
(b) Graph of ABR queue lengths in cells versus time at the bottleneck port
    of each switch
(c) Graph of link utilization versus time for each of the main links
    (those that connect two switches to each other)
(d) Graph of number of cells received versus time for each destination

6.1 Parameter Settings

Throughout our experiments, the following parameter values are used (they
are also collected in the sketch after this list):

1. Except where otherwise indicated (in sections 6.2.2 to 6.2.4), all links
   have a bandwidth of 155.52 Mbps (149.76 Mbps after SONET overhead is
   accounted for).
2. All multipoint-to-point traffic flows from the leaves to the root of the
   tree. No traffic flows from the root to the leaves, except for RM cells.
   Point-to-point connections are also unidirectional.
3. Except in section 6.2.3, where we experiment with the source parameter
   rate increase factor (RIF), we have set RIF to 1/32 in our simulations.
   We do not, however, expect the performance of the algorithm to be
   significantly influenced by the value of RIF, as seen in section 6.2.3.
4. The source parameter transient buffer exposure (TBE) is set to large
   values to prevent rate decreases due to the triggering of the source
   open-loop congestion control mechanism. This was done to isolate the
   rate reductions due to the switch congestion control scheme from the
   rate reductions due to TBE.
5. The queue control parameters are set as follows (see [8] for an
   explanation of these parameters). The two queue hyperbolic curve
   parameters a and b are set to 1.15 and 1 respectively. The queue drain
   limit factor is set to 0.5 (which means that up to 50% of the link
   capacity can be used to drain queues), and the target queueing delay is
   set to 1.5 ms.
6. A fixed measurement interval is used to measure and average the input
   rate and available capacity, and to note the maximum allocation given
   (and possibly the maximum CCR value in FRM cells). The interval is set
   to 5 ms in all experiments except those in section 6.2.4.
7. Since we do not implement VC merging in our switches, we use only
   one-cell packets. Our next study will implement VC merging and examine
   its effect.
8. All sources are deterministic, i.e., their start/stop times and their
   transmission rates are known.
9. Simulation time is two seconds.
10. The simulations use both the maximum CCR option and the option of
    exponentially averaging the maximum ER in the previous interval, as
    discussed in section 4.
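For reference, the settings above can be collected in one configuration
record; this is a hypothetical summary of ours (the struct and field names
are not from any real simulator API), simply restating the values listed in
this subsection.

    /* Simulation parameters of section 6.1 in one record. */
    struct sim_params {
        double rif;             /* rate increase factor (1 in 6.2.3)   */
        double queue_a;         /* hyperbolic queue curve parameter a  */
        double queue_b;         /* hyperbolic queue curve parameter b  */
        double qdlf;            /* queue drain limit factor            */
        double target_delay_ms; /* target queueing delay               */
        double interval_ms;     /* measurement interval (0.2 in 6.2.4) */
        double delta;           /* section 4 parameter delta           */
        double alpha;           /* section 4 parameter alpha           */
    };

    static const struct sim_params defaults = {
        .rif = 1.0 / 32, .queue_a = 1.15, .queue_b = 1.0, .qdlf = 0.5,
        .target_delay_ms = 1.5, .interval_ms = 5.0,
        .delta = 0.1, .alpha = 0.1
    };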
We have simulated all our configurations without using either option, and
with each option separately, and the differences were insignificant. We do
not show these results here for space considerations. In particular, even
with neither of the two options enabled and with extremely small
measurement intervals (as in the simulations in section 6.2.4), the
algorithm still rapidly converges to the optimal allocations, and the
oscillations (though they do increase) are not significantly worse than in
the results we show in section 6.2.4. From the discussion in section 4.2,
however, we recommend the maximum CCR option for better performance.

6.2 Simulation Results

In this section, we discuss a sample of our simulation results. We mainly
use two configurations, and experiment with different link lengths, initial
cell rates of the sources, rate increase factor values, and lengths of the
measurement interval.

6.2.1 Downstream Bottleneck Configuration

Figure 4: Example multipoint-to-point configuration with a downstream
bottleneck

Figure 4 illustrates a configuration with two VCs: one of the VCs is a
multipoint-to-point VC with three senders and one receiver, and the other
is a point-to-point VC. Sources S1, S2, and S3 are sending to destination
dS1, and source SA is sending to destination dSA. All links are
approximately 150 Mbps (after SONET overhead is accounted for), and their
lengths are as shown in the figure. Clearly, all four sources are sharing a
bottleneck link (LINK3) between Switch3 and Switch4. The aim of this
example is to show the division of the 150 Mbps capacity of this bottleneck
link among the sources. Applying the max-min fairness definition among
sources, the optimal allocations should be:

{S1, S2, S3, SA} = {37.5, 37.5, 37.5, 37.5}

Each of the four sources is allocated 1/4 x 150 = 37.5 Mbps.

Figure 6 illustrates the results of simulating the above configuration. The
sources start with an ICR value of 25 Mbps, which is below their optimal
allocation. Clearly, all sources rise to their optimal rates quickly
(figure 6(a)), and the queues are small (figure 6(b)). The bottleneck link
(LINK3) is fully utilized (figure 6(c)). LINK2 is 50% utilized (since only
2 of the 4 sources utilize it) and LINK1 is only 25% utilized (1 out of 4
sources).

Observe that with source-based fairness, VCs that have a larger number of
concurrently active senders get more bandwidth than VCs with fewer
concurrent senders on the same link. This can be clearly seen from the
slope of the cells received graph (figure 6(d)) for dS1 and dSA. Clearly
dS1 has a slope that is three times as large as that for dSA. After 2
seconds, the ratio of cells received at dSA to dS1 is around 175000 to
520000, which is approximately 1 to 3. Thus the resource allocation is not
max-min fair among the VCs. This means that the bandwidth allocated to a
multipoint-to-point VC with N concurrent senders all bottlenecked on a
certain link would be N times the bandwidth for a point-to-point VC
bottlenecked on that same link, and N/K times that for a K-sender
multipoint-to-point VC bottlenecked on the same link.

Figure 5: Example multipoint-to-point configuration with a downstream
bottleneck (long LINK1)

Figure 7 shows the situation with the same configuration, but when LINK1 is
5000 km and LINK3 is 50 km, as shown in figure 5. There are slightly more
rate fluctuations in this situation, due to the long feedback delay to the
first source S1. The fluctuations are caused by variations in the value of
the overload factor.
Figure 8 illustrates the results of simulating the first configuration
(figure 4) when all sources start at a high ICR value. The ICR for all
sources here is 100 Mbps. This creates an initial overload on LINK3 of
400/150 = 2 2/3. The algorithm recovers from this situation and all sources
converge to the correct value of approximately 37.5 Mbps (figure 8(a)). The
queues at Switch3 start to drain after approximately one round trip (figure
8(b)).

Figure 9 shows the situation when LINK1 is the longest link (figure 5) with
a high ICR value (100 Mbps). The rates converge to about 37.5 Mbps (figure
9(a)), the queues are drained quickly (figure 9(b)), and the bottleneck
link is fully utilized (figure 9(c)).

Figure 10 shows the results for the configuration of figure 4 (long LINK3),
but with different sources starting at different ICR values. Sources S1 and
S3 start at an ICR of 65 Mbps, while sources S2 and SA start at 10 Mbps.
Notice that the sum of the rates for all sources is 150 Mbps, so the
initial load value is close to 1. The rates for sources S1 and S3 are
quickly reduced, while those of sources S2 and SA quickly rise, as seen in
figure 10(a). The queues are also quite small (figure 10(b)).

In each of figures 6 through 10 (and figures 13 through 16 below), panel
(a) shows the transmitted cell rate, (b) the queue length, (c) the link
utilization, and (d) the cells received.

Figure 6: Results for a WAN multipoint-to-point configuration with a
downstream bottleneck (long LINK3, low ICR)
Figure 7: Results for a WAN multipoint-to-point configuration with a
downstream bottleneck (long LINK1, low ICR)
Figure 8: Results for a WAN multipoint-to-point configuration with a
downstream bottleneck (long LINK3, high ICR)
Figure 9: Results for a WAN multipoint-to-point configuration with a
downstream bottleneck (long LINK1, high ICR)
Figure 10: Results for a WAN multipoint-to-point configuration with a
downstream bottleneck (long LINK3, different ICRs)

6.2.2 Upstream Bottleneck with Heterogeneous Links Configuration

Figure 11: Example multipoint-to-point configuration with an upstream
bottleneck

Figure 11 illustrates a configuration with two VCs: one of the VCs is a
multipoint-to-point VC with four senders and one receiver, and the other is
a point-to-point VC. Sources S1, S2, S3 and S4 are sending to destination
dS1, and source SA is sending to destination dSA. All links are
approximately 150 Mbps (after SONET overhead is accounted for), except for
the link between Switch1 and Switch2 (LINK1), which is only 50 Mbps. The
link lengths are as shown in the figure. Clearly, sources S1, S2 and SA are
bottlenecked at LINK1, while sources S3 and S4 are bottlenecked at LINK3.
The aim of this example is to illustrate the allocation of the capacity
left over by sources bottlenecked on LINK1 to the sources bottlenecked on
LINK3, and the fairness of the allocation among the point-to-point and
multipoint sources. The allocation vector according to the source-based
fairness definition is:

{S1, S2, S3, S4, SA} = {16.67, 16.67, 58.33, 58.33, 16.67}

This is because each of sources S1, S2 and SA is allocated one third of the
bandwidth of LINK1.
At LINK3, the 50 x 2/3 = 33.33 Mbps used by sources S1 and S2 is subtracted
from the available bandwidth, and the remaining capacity (116.67 Mbps) is
equally divided between sources S3 and S4.

Figure 12: Example multipoint-to-point configuration with an upstream
bottleneck (long LINK1)

Figures 13 and 14 illustrate the results for the configuration when LINK1
is of length 50 km and LINK3 is of length 5000 km (figure 11), and when
LINK1 is of length 5000 km and LINK3 is of length 50 km (figure 12),
respectively. Sources S1 and S2 start at an ICR of 20 Mbps. Source S3
starts at 30 Mbps and source S4 starts at 80 Mbps. Source SA starts at 10
Mbps. As seen in figures 13(a) and 14(a), sources S1, S2 and SA converge to
about 16.67 Mbps, while sources S3 and S4 converge to about 58.33 Mbps. The
queues are bounded to reasonable values (figures 13(b) and 14(b)), and the
utilizations of the bottleneck links (LINK1 and LINK3) are close to 100%
(figures 13(c) and 14(c)). Destination dSA gets much less throughput than
dS1 (figures 13(d) and 14(d)), since source SA is bottlenecked on a 50 Mbps
link with two other sources. After 2 seconds, the ratio of the throughputs
for destinations dSA to dS1 is approximately 80000 to 700000, which is
0.11. The slopes of the two lines also have the same ratio. This is close
to the optimal value, since 16.67/133.09 = 0.13.

Figure 13: Results for a WAN multipoint-to-point configuration with an
upstream bottleneck (long LINK3)
Figure 14: Results for a WAN multipoint-to-point configuration with an
upstream bottleneck (long LINK1)

6.2.3 Effect of Large Rate Increase Factor Values

Figure 15: Results for a WAN multipoint-to-point configuration with an
upstream bottleneck (long LINK3, large RIF)

The rate increase factor determines the maximum increase when a BRM cell
indicating underload is received. If the RIF is set to a fraction less than
one, the maximum increase at each step is limited to RIF x the peak cell
rate for the VC. Setting RIF to small values is a more conservative
strategy that controls queue growth and oscillations, especially during
transient periods. It may, however, slow down the response of the system
when capacity suddenly becomes available, leading to underutilization. (A
sketch of the source rate increase rule that RIF limits follows this
subsection.)

Figure 15 illustrates the results for the configuration of figure 11 when
the rate increase factor (RIF) is set to its maximum possible value, which
is 1. Part (a) of the figure shows that the rates do not oscillate more
than in the corresponding experiment with a small RIF value (figure 13(a)).
The queues in figure 15(b) are also similar to those in figure 13(b).
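For reference, the following is a minimal C sketch of the source rate
increase step that RIF limits, in the spirit of the ABR source behaviour of
TM 4.0 [3] for a BRM indicating no congestion; the decrease paths and the
remaining source rules are elided, and the function name is ours.

    /* Source rate increase on a BRM with no congestion indication:
     * the increase is limited to RIF x PCR, and the result never
     * exceeds the PCR or the ER carried in the cell. */
    double increase_acr(double acr, double er, double pcr, double rif)
    {
        double r = acr + rif * pcr;
        if (r > pcr) r = pcr;
        if (r > er)  r = er;
        return r;
    }

With RIF = 1, a single BRM can take the source from any rate straight to
the ER in the cell, which is why large RIF values can stress the queues
during transients.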
6.2.4 Effect of Extremely Short Measurement Intervals

As discussed in section 4, extremely short measurement intervals can cause
the algorithm to suffer from oscillations. To examine this effect, we have
simulated the algorithm with a measurement interval of 200 us. Recall that
in the upstream bottleneck configuration (shown in figure 11), the optimal
rates for sources S1, S2 and SA are 16.67 Mbps, and those for sources S3
and S4 are 58.33 Mbps. This implies that, in steady state, RM cells for
sources S1, S2 and SA arrive every:

    32 x 53 x 8
    ----------- = 813.92 us
      16.67 M

For sources S3 and S4, RM cells arrive every:

    32 x 53 x 8
    ----------- = 232.61 us
      58.33 M

Setting the measurement interval to 200 us means that RM cells for S3 and
S4 might not be received every measurement interval, and that RM cells for
S1, S2 and SA might not be received for up to 4 consecutive measurement
intervals. In order to receive at least one FRM cell from the highest rate
source in a certain interval, the interval length should be greater than
Nrm/(maximum ACR), with the ACR expressed in cells per second. This
condition is likely to hold for reasonably long intervals, unless all
sources are sending at very low rates, in which case the overload factor
will be low and their rates will increase if they have data to send.

Figure 16 illustrates the results for the configuration of figure 11.
Clearly, the short averaging interval causes more oscillations, but the
rates of the sources still converge to their max-min fair rates. Also
observe that the number of cells received for both connections is the same
as in figure 13(d). Increasing the value of the parameter alpha (in section
4) can reduce the oscillations.

Figure 16: Results for a WAN multipoint-to-point configuration with an
upstream bottleneck (long LINK3, short interval)

7 Summary

Source-based fairness divides bandwidth fairly among active sources,
ignoring VC memberships. Source-based switch algorithms operating in VC
merging switches need to avoid distinguishing among sources in the same VC.
Key lessons learnt from this study include:

1. Source-level accounting should not be performed in multipoint rate
   allocation algorithms. For example, the rates of individual sources
   cannot be measured, and overloading sources cannot be distinguished from
   underloading ones. If such accounting is performed at the VC level or
   the flow level, an additional mechanism to divide VC or flow bandwidth
   among sources is necessary.
2. Estimating the effective number of active sources in order to divide the
   available capacity among them is very difficult in multipoint
   connections, since it is impossible to distinguish among senders in the
   same multipoint VC with VC merging implementations.
3. The only information a multipoint rate allocation algorithm can use is
   the information supplied in RM cells, in addition to aggregate
   measurements of load, capacity and queueing delays.
4. CCR values from BRM cells should not be used in computing rate
   allocations for sources in multipoint connections, since the CCR value
   can be that of another source that does not go through the switch
   performing the computation. That source may have a much higher
   bottleneck rate, and using its CCR can result in unfairness.
5. CCR values from FRM cells can be used to compute rate allocations for
   sources in multipoint connections, even though the CCR used to compute
   the rate for a source may not actually be the CCR value of that source.
   This does not create problems, due to the properties of the merged flow
   (see section 4 for a more detailed explanation). The maximum CCR value
   seen in an interval can be used instead of the CCR of the source.
   Exponential averaging of the maximum CCR seen or the maximum ER given
   can further improve the performance of the algorithm.
6. Merge point algorithms should avoid changing the BRM to FRM ratio at the
   sender or inside the network, to maintain the rate of feedback that the
   source requires, and to avoid excessive overhead in the network. The
   scalability of the scheme is also affected by these ratios. Excessive
   complexity, noise, and response time can also be avoided by returning
   the BRM cells coming from the root, instead of turning around the RM
   cells at the merge points.

In this contribution, we have developed and simulated an algorithm for
computing source-based fair allocations for multipoint-to-point and
point-to-point connections. The algorithm uses simple aggregate
measurements and maximum CCR values from FRM cells during successive
intervals to perform the rate computation. The algorithm exhibited very
good behavior for the configurations tested. More extensive performance
analysis and convergence proofs are needed to examine the fairness,
complexity, overhead, transient response, delay, and scalability tradeoffs
in multipoint algorithm design. Multipoint-to-point schemes can be extended
to multipoint-to-multipoint connections by combining point-to-multipoint
algorithms (such as those developed in [1]) with the multipoint-to-point
algorithm.

References

[1] Sonia Fahmy, Raj Jain, Rohit Goyal, Bobby Vandalore, Shivkumar
Kalyanaraman, Sastri Kota, and Pradeep Samudra. Feedback consolidation
algorithms for ABR point-to-multipoint connections. ATM Forum/97-0615, July
1997. Extended version to appear in the Proceedings of IEEE INFOCOM 1998.
[2] Sonia Fahmy, Raj Jain, Rohit Goyal, Bobby Vandalore, Sastri Kota, and
Pradeep Samudra. Fairness for ABR multipoint-to-point connections. ATM
Forum/97-0832, September 1997.
[3] The ATM Forum. The ATM Forum traffic management specification version
4.0. ftp://ftp.atmforum.com/pub/approved-specs/af-tm-0056.000.ps, April
1996.
[4] Matthias Grossglauser and K. K. Ramakrishnan. SEAM: Scalable and
efficient ATM multipoint-to-multipoint multicasting. In Proceedings of IEEE
INFOCOM, April 1997.
[5] Juha Heinanen. Comments on the multipoint-to-point base line text. ATM
Forum/97-0707, September 1997.
[6] Juha Heinanen. Multipoint-to-point VCs. ATM Forum/97-0261, April 1997.
[7] Jeffrey M. Jaffe. Bottleneck flow control. IEEE Transactions on
Communications, COM-29(7):954-962, July 1981.
[8] Raj Jain, Shivkumar Kalyanaraman, Rohit Goyal, Sonia Fahmy, and Ram
Viswanathan. ERICA switch algorithm: A complete description. ATM
Forum/96-1172, August 1996.
[9] Mark Jeffrey. Scope, concepts and issues for the new multiway BOF. ATM
Forum/96-0628, June 1996.
[10] Wenge Ren, Kai-Yeung Siu, and Hiroshi Suzuki. Performance evaluation
of multipoint-to-point ABR and UBR. ATM Forum/96-1402, October 1996.
[11] Wenge Ren, Kai-Yeung Siu, Hiroshi Suzuki, and Masayuki Shinohara.
Multipoint-to-multipoint ABR service in ATM. Submitted to IEEE Journal on
Selected Areas in Communications, 1997.
[12] A. Viswanathan, N. Feldman, Rick Boivie, and Rich Woundy. ARIS:
Aggregate route-based IP switching. IETF Internet Draft:
draft-viswanathan-aris-overview-00.txt, March 1997.
[13] Indra Widjaja, Steve Wright, and Amalendu Chatterjee. Interworking of
VP-merge, VC-merge, and non-merge ATM switches in a multipoint-to-point
environment. ATM Forum/97-0768, September 1997.

All our papers and ATM Forum contributions are available through
http://www.cse.wustl.edu/~jain/