Multicast Issues for Gigapop Operators

David Meyer
Gigapop Operators II Workshop, 26 June 1998

Agenda

• Introduction
• First Some Basic Technology
  - Basic Host Model
  - Basic Router Model
  - Data Distribution Concepts
• What Are the Deployment Obstacles
  - What Are the Non-technical Issues
  - What Are the Technical Scaling Issues

Agenda (Cont.)

• Potential Solutions (Cisco Specific)
  - Multi-level RP, Anycast Clusters, MSDP
  - Using Directory Services
• Industry Solutions
  - BGMP and MASC
• Possible Deployment Scenarios
• References

Introduction: Level Set

• This presentation focuses on large-scale multicast routing for Gigapops and their customers
  - Note that the problem is essentially the same as the inter-domain multicast routing problem
• The problems/solutions presented are related to inter-enterprise or Gigapop deployment of IP multicast
• The current set of deployed technology is sufficient for enterprise environments

Introduction: Why Would You Want to Deploy IP Multicast?

• You don’t want the same data traversing your links many times (bandwidth saver)
• You want to join and leave groups dynamically without notifying all data sources (pay-per-view)

Introduction: Why Would You Want to Deploy IP Multicast?

• You want to discover a resource but don’t know who is providing it or, if you did, don’t want to configure it (expanding ring search)
• Reduce startup latency for subscribers

Introduction: Why Would a Gigapop Want to Deploy IP Multicast?

• All of the previous, plus revenue potential for deploying IP multicast
• Initial applications
  - Radio station transmissions
  - Real-time stock quote service
• Future applications
  - Distance learning
  - Entertainment

Basic Host Model

• Strive to make the host model simple
  - When sourcing data, just send the data
    - Map network layer address to link layer address
    - Routers will figure out where receivers are and are not
  - When receiving data, need to perform two actions
    - Tell routers what group you’re interested in (via IGMP)
    - Tell your LAN controller to receive for link-layer mapped address
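
The "map network layer address to link layer address" step can be sketched for Ethernet as follows. This is a minimal illustration, assuming the standard IPv4-over-Ethernet mapping (prefix 01:00:5e plus the low 23 bits of the group address); the function name `multicast_mac` is made up for this example and does not come from the slides.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Return the Ethernet MAC address an IPv4 multicast group maps to."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Only the low 23 bits of the group address survive the mapping.
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    # Format the 48-bit value as six colon-separated hex octets.
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


print(multicast_mac("224.1.1.1"))
```

A receiving host would program its LAN controller with this address after reporting the group via IGMP.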

Basic Host Model

• Hosts can be receivers and not send to the group
• Hosts can send but not be receivers of the group
• Or they can be both

Basic Host Model

• There are some protocol and architectural issues
  - Multiple IP group addresses map into a single link-layer address
    - You need IP-level filtering
  - Hosts join groups, which means they receive traffic from all sources sending to the group
    - Wouldn’t it be better if hosts could say what sources they were willing to receive from?
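
To make the address-aliasing issue concrete, here is a small sketch, again assuming the Ethernet mapping that keeps only the low 23 bits of the group address. Of the 28 variable bits in 224.0.0.0/4, 5 are lost, so 32 groups share each link-layer address. The helper names `aliasing_groups` and `accept` are illustrative only.

```python
import ipaddress


def aliasing_groups(group: str) -> list[str]:
    """List every IPv4 group that shares the Ethernet MAC of `group`."""
    low23 = int(ipaddress.IPv4Address(group)) & 0x7FFFFF
    # Vary the 5 high bits that the link-layer mapping discards.
    return [
        str(ipaddress.IPv4Address(0xE0000000 | (hi << 23) | low23))
        for hi in range(32)
    ]


def accept(packet_dst: str, joined: set[str]) -> bool:
    """IP-level filter: keep only packets for groups the host actually joined."""
    return packet_dst in joined


aliases = aliasing_groups("224.1.1.1")
print(len(aliases), aliases[:3])
```

The LAN controller passes frames for all of these groups up the stack, which is why the host still needs the IP-level check in `accept`.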

Basic Host Model

• There are some protocol and architectural issues (continued)
  - You can apply access control to sources, but you can’t apply it to receivers in a scalable way

Basic Router Model

• Since hosts can send any time to any group, routers must be prepared to receive on all link-layer group addresses
  - And know when to forward or drop packets

Basic Router Model

• What does a router keep track of?
  - Interfaces leading to receivers
  - Sources, when utilizing source distribution trees
  - Prune state, depending on the multicast routing protocol (e.g. Dense Mode)

Data Distribution Concepts

• Routers maintain state to deliver data down a distribution tree
• Source trees
  - Router keeps (S,G) state so packets can flow from the source to all receivers
  - Trades off low delay from source against router state

Data Distribution Concepts

• Shared trees
  - Router keeps (*,G) state so packets flow from the root of the tree to all receivers
  - Trades off higher delay from source against less router state
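
The difference between (S,G) and (*,G) state can be sketched as a toy forwarding table. This is a simplified model, not any router's actual implementation: the class and interface names are hypothetical, and it assumes a more specific (S,G) entry wins over (*,G) and that packets arriving on an unexpected interface are dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    iif: str                                       # expected incoming interface
    oifs: set[str] = field(default_factory=set)    # interfaces leading to receivers


class Mroutes:
    def __init__(self) -> None:
        # Keyed by (source, group); source "*" means a shared-tree entry.
        self.table: dict[tuple[str, str], Route] = {}

    def add(self, source: str, group: str, iif: str, oifs: set[str]) -> None:
        self.table[(source, group)] = Route(iif, set(oifs))

    def forward(self, source: str, group: str, arrived_on: str) -> set[str]:
        """Return the interfaces to copy the packet to (empty set means drop)."""
        # Prefer source-tree state; fall back to the shared tree.
        route = self.table.get((source, group)) or self.table.get(("*", group))
        if route is None or route.iif != arrived_on:
            return set()
        return route.oifs - {arrived_on}


routes = Mroutes()
routes.add("*", "224.1.1.1", "to_rp", {"eth1", "eth2"})
routes.add("10.0.0.5", "224.1.1.1", "eth0", {"eth1"})
print(routes.forward("10.0.0.5", "224.1.1.1", "eth0"))
```

One (*,G) entry serves every source in the group, while each additional source on a source tree costs another (S,G) entry.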

Data Distribution Concepts

• How is the tree built?
  - On demand, in response to data arrival
    - Dense-mode protocols (PIM-DM and DVMRP)
    - MOSPF
  - Explicit control
    - Sparse-mode protocols (PIM-SM and CBT)

Data Distribution Concepts

• Building distribution trees requires knowledge of where members are
  - Flood data to find out where members are not (Dense-mode protocols)
  - Flood group membership information (MOSPF), and build tree as data arrives
  - Send explicit joins and keep join state (Sparse-mode protocols)

Data Distribution Concepts

• Construction of source trees requires knowledge of source locations
  - In dense-mode protocols you learn them when data arrives (at each depth of the tree)
  - Same with MOSPF
  - In sparse-mode protocols you learn them when data arrives on the shared tree (in leaf routers only)
    - Ignore since routing based on direction from RP
    - Pay attention if moving to source tree

Data Distribution Concepts

• To build a shared tree you need to know where the core (RP) is
  - Can be learned dynamically in the routing protocol (Auto-RP, PIMv2)
  - Can be configured in the routers
  - Could use a directory service

Data Distribution Concepts

• Source trees make sense for
  - Broadcast radio transmissions
  - Expanding ring search
  - Generic few-sources-to-many-receiver applications
  - High-rate, low-delay application requirements
  - Per source policy from a service provider’s point of view
  - Per source access control

Data Distribution Concepts

• Shared trees make sense for
  - Many low-rate sources
  - Applications that don’t require low delay
  - Consistent policy and access control across most participants in a group
  - When most of the source trees overlap topologically with the shared tree

Deployment Obstacles: Non-Technical Issues

• How to bill for the service
  - Is the service what runs on top of multicast?
  - Or is it the transport itself?
  - Do you bill based on sender or receiver, or both?
• How to control access
  - Should sources be rate-controlled (unlike unicast routing)?
  - Should receivers be able to receive at a specific rate only?

Deployment Obstacles: Non-Technical Issues

• How to make your peers fan-out instead of you (reduce the replication factor in your own network)
  - Closest exit versus latest entrance: all a wash
• How to keep multicast from opening a lot of security holes
  - Network-wide denial of service attacks
  - Eavesdropping simpler since receivers are unknown

Deployment Obstacles: Technical Issues

• Source tree state will become a problem as IP multicast gains popularity
  - When policy and access control per source will be the rule rather than the exception
• Group state will become a problem as IP multicast gains popularity
  - 10,000 three-member groups across the Internet
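
A back-of-the-envelope sketch of the state problem, under two assumptions that are not stated on the slide: every member of a group also sends to it, and the router in question sits on every tree. With those worst-case assumptions a router holds one (*,G) entry per group on shared trees, or one (S,G) entry per source per group on source trees. The function name is illustrative.

```python
def state_entries(groups: int, sources_per_group: int) -> dict[str, int]:
    """Worst-case entry count for a router that is on every distribution tree."""
    return {
        "shared_tree": groups,                      # one (*,G) per group
        "source_tree": groups * sources_per_group,  # one (S,G) per source
    }


# The slide's example: 10,000 groups with three members each.
print(state_entries(10_000, 3))
```

Under these assumptions the slide's example costs 10,000 entries with shared trees and 30,000 with source trees, and both grow linearly with the number of groups.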

Deployment Obstacles: Technical Issues

• Hopefully we can upper bound the state in routers based on their switching capacity

Deployment Obstacles: Technical Issues

• Gigapop customers are telling us they don’t want to depend on another customer’s (or Gigapop’s) RP
  - Do we connect shared trees together?
  - Do we have a single shared tree across domains?
  - Do we use source trees only for inter-domain groups?

Deployment Obstacles: Technical Issues

• Customers are telling us that the unicast and multicast topologies won’t be congruent across domains
  - Due to physical/topological constraints
  - Due to policy constraints
• We need an inter-domain routing protocol that distinguishes unicast versus multicast policy

How to Control Multicast Routing Table State in the Network?

• Fundamental problem of learning group membership
  - Flood and Prune
    - DVMRP
    - PIM-DM
  - Broadcast Membership
    - MOSPF
    - DWRs
  - Rendezvous Mechanism
    - PIM-SM
    - BGMP

Rendezvous Mechanism

• Why not use sparse-mode PIM?
  - Where to put the root of the shared tree (the RP)
  - Third-party RP problem
• If you did use sparse-mode PIM
  - Group-to-RP mappings would have to be distributed throughout the Internet

Rendezvous Mechanism

• Let’s try using sparse-mode PIM for inter-domain multicast
• Look at four possibilities
  - Multi-level RP
  - Anycast clusters
  - MSDP
  - Use directory services

Connect Shared Trees Together: Multi-Level RP

• Idea is to have a hierarchy of shared trees
  - Level-0 RPs are inside of domains
  - They propagate joins from routers to a Level-1 RP that may be in another domain
  - All Level-0 shared trees get connected together via a Level-1 RP
  - If multiple Level-1 RPs, iterate up to Level-2 RPs

Connect Shared Trees Together: Multi-Level RP

• Problems
  - Requires PIM protocol changes
  - If you don’t locate the Level-0 RP at the border, intermediate PIM routers think there may be two RPs for the group
  - Still has the third-party problem: there is ultimately one node at the root of the hierarchy
  - Data has to flow all the way to the highest-level RP

Connect Shared Trees Together: Anycast Clusters

• Share the burden of being an RP among service providers
  - Each RP in each domain is a border router
• Build RP clusters at interconnect points (or dense-mode clouds)
• Group allocation is per cluster and not per-user or per-domain

Connect Shared Trees Together: Anycast Clusters

• Closest border router in cluster is used as the RP
• Routers in a domain will use the domain’s RP
  - Provided you have an RP for that group range at an interconnect point
  - If not, you use the closest RP at the interconnect point (could be RP in another domain)

Connect Domains Together: MSDP

• If you can’t connect shared trees together easily, then don’t
• Multicast Source Discovery Protocol
  - Different paradigm
  - Rather than getting trees connected, get sources known to all trees
  - Sounds non-scalable, but the trick is in the implementation

Connect Domains Together: MSDP

• An RP in a domain has an MSDP peering session with an RP in another domain
  - Runs over TCP
  - Source-Active (SA) messages are sent to describe active sending sources in a domain
  - Logical topology is built for the sole purpose of distributing SA messages

Connect Domains Together: MSDP

• How it works
  - Source goes active in PIM-SM domain
  - Its packets get PIM registered to domain’s RP
  - RP sends SA message to its MSDP peers
  - Other MSDP peers forward to their peers away from the originating RP
  - If a peer in another domain has receivers for the group the source is sending to, it joins the source (Flood-and-Join model)
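
The flood-and-join steps can be sketched as a toy simulation. This is not the MSDP wire protocol: the peering graph, the RP names, and the function `flood_sa` are hypothetical, and a simple "already seen" check stands in for the rule that peers forward SA messages away from the originating RP.

```python
from collections import deque


def flood_sa(peers: dict[str, list[str]], origin: str,
             receivers: dict[str, set[str]], group: str) -> list[str]:
    """Flood one SA message from `origin`; return the RPs that join the source."""
    seen = {origin}
    joined = []
    queue = deque([origin])
    while queue:
        rp = queue.popleft()
        for peer in peers.get(rp, ()):
            if peer in seen:
                # Stand-in for "forward away from the originating RP".
                continue
            seen.add(peer)
            # An RP with local receivers for the group joins toward the source.
            if group in receivers.get(peer, set()):
                joined.append(peer)
            queue.append(peer)
    return joined


peers = {"rpA": ["rpB"], "rpB": ["rpA", "rpC", "rpD"], "rpC": ["rpB"], "rpD": ["rpB"]}
receivers = {"rpC": {"224.2.0.1"}}
print(flood_sa(peers, "rpA", receivers, "224.2.0.1"))
```

Every peer sees the SA, but only the RP with interested receivers sends a join, so data flows on a source tree and no shared tree spans the domains.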

Connect Domains Together: MSDP

• There is no shared tree across domains
  - Therefore each domain can depend solely on its own RP (no third-party problem)
• SA state is not stored at each MSDP peer
• You could encapsulate data in SA messages for low-rate bursty sources
• You could have SA caching peers to reduce join latency

Use Directory Services

• You can use directory services to:
  - Enable a single shared tree across domains
  - Enable use of source tree only and avoid using a single shared tree across domains

Use Directory Services

• How it works with a single shared tree across domains
  - Put RP in client’s domain
  - Optimal placement of the RP if the domain had a multicast source or receiver active
  - Policy for RP is consistent with policy for domain’s unicast prefixes
  - Use directory to find RP address for a given group

Use Directory Services

• For example
  - Receiver host sends IGMP report for 224.1.1.1
  - First-hop router performs DNS name resolution on 1.1.1.224.pim.mcast.net
  - An A record is returned with the IP address of RP
  - First-hop router sends PIM join message towards RP
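
The name the first-hop router would look up can be derived from the group address by reversing its octets, in the same style as reverse DNS. A minimal sketch follows; the function name `rp_lookup_name` is made up, and the code only builds the query name (it does not resolve it, since the `pim.mcast.net` zone in the example may not exist).

```python
import ipaddress


def rp_lookup_name(group: str, zone: str = "pim.mcast.net") -> str:
    """Build the DNS name whose A record would hold the RP for `group`."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    # Reverse the dotted-quad octets and append the zone.
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


print(rp_lookup_name("224.1.1.1"))
```

Passing a different `zone`, for example `sources.pim.mcast.net`, gives the name used by the source-tree-only variant of the scheme.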

Use Directory Services

• All routers get consistent RP addresses via DNS
• When dynamic DNS is widely deployed it will be easier to change A records
• In the meantime, use loopback addresses on routers and move them around in your domain

Use Directory Services

• When domain group allocation exists, a domain can be authoritative for a DNS zone
  - 1.224.pim.mcast.net
  - 128/17.1.224.pim.mcast.net

Use Directory Services

• Another approach: avoid using shared trees altogether
  - Build PIM-SM source trees across domains
• Put multiple A records in DNS to describe sources for the group

    1.0.2.224.sources.pim.mcast.net  IN CNAME  dmm-home
                                     IN CNAME  dino-home
    dmm-home                         IN A      171.69.58.81
    dino-home                        IN A      171.69.127.178

Standards Solutions

• Ultimate scalability of both routing and group allocation can be achieved using BGMP/MASC
• Use BGP4+ (MBGP) to deal with non-congruency issues

Border Gateway Multicast Protocol (BGMP)

• Use a PIM-like protocol that runs between domains (BGP equivalent for multicast)
• The protocol builds a shared tree of domains for a group
  - So we can use a rendezvous mechanism at the domain level
  - Shared tree is bi-directional
  - Root of shared tree of domains is at root domain

Border Gateway Multicast Protocol (BGMP)

• Runs in routers that border a multicast routing domain
• Runs over TCP like BGP
• Joins and prunes travel at domain hops
• Can build unidirectional source trees
• MIGP tells the borders about group membership

Multicast Address Set Claim (MASC)

• How does one determine the root domain for a given group?
• Group prefixes are temporarily leased to domains
• They are allocated out of a service provider’s allocation, which in turn is obtained from its upstream provider

Multicast Address Set Claim (MASC)

• Claims for group allocation resolve collisions
• Group allocations are advertised across domains
• Lots of machinery for aggregating group allocations

Multicast Address Set Claim (MASC)

• Tradeoff between aggregation and anticipated demand for group addresses
• Group prefix allocations are not assigned to domains; they are leased
  - Applications must be written to know that group addresses may go away
• Work in progress

Using BGP4+ (MBGP) for Non-Congruency Issues

• Multiprotocol extensions to BGP4: RFC 2283
• MBGP allows you to build a unicast RIB and multicast RIB independently with one protocol
• Can use the existing or new BGP peering topology
• MBGP carries unicast prefixes of multicast capable sources
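
The point of keeping two RIBs can be illustrated with a toy lookup: unicast forwarding consults one table, while the multicast reverse-path check toward a source consults the other, so the two topologies need not be congruent. The table contents, peer names, and helper functions here are invented for illustration and do not represent MBGP itself.

```python
import ipaddress


def longest_match(rib: dict[str, str], address: str):
    """Return the next hop of the most specific prefix covering `address`."""
    addr = ipaddress.ip_address(address)
    best = None
    for prefix, next_hop in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None


# Same prefix, different next hops: the topologies are not congruent.
unicast_rib = {"171.69.0.0/16": "peer-A"}
multicast_rib = {"171.69.0.0/16": "peer-B"}


def rpf_neighbor(source: str):
    """Multicast joins toward `source` follow the multicast RIB only."""
    return longest_match(multicast_rib, source)


print(longest_match(unicast_rib, "171.69.58.81"), rpf_neighbor("171.69.58.81"))
```

A source whose prefix is absent from the multicast RIB has no RPF neighbor, which is how a provider can carry unicast for a prefix without carrying its multicast.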

Possible Deployment Scenarios

• Environment:
  - Multiple customers multicast peer at a Gigapop
• Deployment proposal
  - Each customer puts their own administered RP attached to the Gigapop
  - That RP as well as all border routers run MBGP
  - The interconnect runs dense-mode PIM
  - Each customer runs PIM-SM/DM internally

Possible Deployment Scenarios

• What about multiple interconnect points between Gigapop customers?
  - If multiple Gigapop customers connect at different interconnect points, they can multicast peer for any groups as long as their respective RPs are collocated on the same Gigapop (and the interconnect is Dense mode)

Possible Deployment Scenarios

• What if all RPs are not at the same interconnect point?
  - Use MSDP so the sources known to the interconnect with RPs can tell the RPs at other interconnects where to join

Possible Deployment Scenarios

• Use a group range that depends on DNS for rendezvousing or building trees
  - Customers decide which domains will have RPs
  - Customers decide which groups will use source trees and don’t have to administer RPs
  - Customers administer DNS databases