The Network Operation Centre of a RREN: Anella Científica

Maria Isabel Gandía Carriedo
Communications Area, Systems & Networks Department, CESCA

TF-NOC Preparation Meeting
NORDUnet A/S, Kastrup, 3/5/2010

Agenda

▪ About CESCA and Anella Científica
▪ Anella Científica/CESCA NOC:
  • Communication with the users
  • How we manage the network
  • How we manage dedicated circuits
▪ Tools:
  • Communications database
  • Ad-hoc scripts
  • Cacti & its plugins
  • perfSONAR
  • SMARTxAC
  • NAM
  • Other tools
▪ Conclusions

About CESCA and Anella Científica

▪ Public consortium
▪ Created in 1991
▪ Formed by:
  • Generalitat de Catalunya
  • Talència
  • 9 Catalan universities
  • Consejo Superior de Investigaciones Científicas
▪ CATNIX created in 1999
▪ Anella Científica created in 1993

Our Services

About Anella Científica

▪ Anella Científica is the research and education network in Catalonia
▪ Managed by CESCA
▪ Connected to RedIRIS
▪ More than 80 points of access at institutions related to research

Anella Científica: Evolution

[Chart: number of points of access per year, 1993 to 2010 (axis 0 to 90), broken down by aggregated capacity: ≤ 10 Mbps, 10–90 Mbps, 100–990 Mbps, ≥ 1,000 Mbps.]
[Chart: traffic (TB) per year, 2002 to 2010. Values legible in the labels: 2006: 2,920.75; 2007: 4,665.43; 2008: 7,646.55; 2009: 6,712.35; 2010: 2,591.91.]

Anella Científica: Architecture

▪ Some local dark fibre links
▪ L2 Gigabit Ethernet network
▪ Flexible and easily scalable
▪ Different points of access & connections:
  • Ethernet: 10, 34, 100, 1,000 and 10,000 Mb/s
  • ADSL, SHDSL
▪ Core is a full mesh, with redundancy in the links between nodes
▪ Access is a “ring”: dual homing
▪ Redundancy of the provider network and the WDM network
▪ Customizable CIR + EIR
▪ QoS capabilities in the L2 network

…but the model will probably change

Anella Científica: projects

▪ PIC participates in LHC (10 Gbps)
▪ i2CAT participates in FEDERICA, Phosphorus, HDVIPER (10 Gbps)
▪ UPC-CCABA participates in EuQoS, MUPBED,… (1 Gbps)
▪ CESCA, i2CAT & UPC participate in PASITO (10 Gbps)
▪ BSC participates in RES (1 Gbps)
▪ Liceu transmits the course Opera Oberta

Anella Científica: L3

CESCA, as the manager of the Regional Research and Education Network (RREN) in Catalonia and as a Local Internet Registry (LIR), has:
  • Addresses for the connected institutions:
    – IPv4: 84.88.0.0/15
    – IPv6: 2001:40B0::/32
  • An Autonomous System (AS):
    – AS13041
▪ CESCA controls all of L3, some of L2 and some of L1, so our monitoring is mostly L3-based.
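As an illustration of the address space quoted on the L3 slide, here is a minimal Python sketch that checks whether an address falls inside the CESCA ranges. It uses only the standard-library `ipaddress` module. The function name `in_cesca_space` and the sample addresses are illustrative assumptions, not part of the NOC tooling.

```python
import ipaddress

# Prefixes quoted on the slide for the connected institutions.
CESCA_PREFIXES = [
    ipaddress.ip_network("84.88.0.0/15"),
    ipaddress.ip_network("2001:40B0::/32"),
]


def in_cesca_space(address: str) -> bool:
    """Return True if the address belongs to one of the CESCA prefixes."""
    ip = ipaddress.ip_address(address)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version comparisons simply evaluate to False.
    return any(ip in net for net in CESCA_PREFIXES)


if __name__ == "__main__":
    for sample in ("84.89.1.1", "84.90.0.1", "2001:40b0::1"):
        print(sample, in_cesca_space(sample))
```

A /15 covers two consecutive /16 blocks, so 84.88.x.x and 84.89.x.x are inside the range while 84.90.x.x is not.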

Anella Científica: topology

Groups of institutions (A, B, C):

A
  1. Public and private non-profit universities
  2. Official bodies of research
  3. Other non-profit research centres
  4. Hospital research centres
B
  1. Official bodies of R+D management
  2. Relevant digital contents institutions
  3. R+D+i participants
  4. Special interest for R+D institutions
C
  1. Science and technological parks
  2. Other hospital units

[Diagram labels: A, B, C; C. Nord; Telvent; Operator; Internet]

Anella Científica: circuits

▪ Permanent circuits & services:
  • Each point of access has one circuit to each core node for redundancy (using L3 routing)
  • An institution can have more than one VLAN with other points of access that usually belong to the same institution (internal traffic)
  • An institution can have a dedicated virtual router, managed by CESCA, to aggregate some connections

[Diagram labels: A, B, C; C. Nord; Telvent; Operator]

Anella Científica: points of access

[Diagram: a core with two backbone nodes and an access ring with access nodes; link lengths labelled 10~40 km and 10~70 km]

The NOC: Communications Area

▪ Some numbers:
  • 85 points of access
  • 2 core nodes
  • 76 institutions connected to Anella Científica
  • 22 entities connected to CATNIX
  • 4 network engineers & 1 student
  • 20 engineers for the weekend monitoring
▪ Help from the Operations & Security Area for cabling, installations, etc.
▪ We have a technical and an administrative contact for each institution who channel all the requests (IP address assignments, routing, dedicated circuits, incidents), but we can hold preliminary conversations with relevant users to learn their needs.
▪ Some technical contacts have a meeting once a year (CTAC).
▪ We organize a meeting/workshop (TAC) once a year to present new institutions and projects (for instance, this year, Cloud Computing).

Communication: institutions → CESCA

▪ Addresses (RT):
  • [email protected]
    – Routing
    – Network incidents
  • [email protected]
    – Address requests
    – Reverse DNS
  • [email protected]
    – Services (Multicast, ftp-mirror,…)
  • [email protected]
    – Eduroam
  • [email protected]
    – Security incidents
▪ Telephone

Communication: CESCA → institutions

▪ Distribution lists:
  • [email protected]
    – Members of the Commission
  • [email protected]
    – Technical representatives
  • [email protected]
    – Other technical staff
    – Generic addresses
▪ RT queues
▪ Telephone & e-mail
▪ TAC
▪ Aula (New Technologies and Seminars)

If there is an incident...

▪ During our working hours (9.00-18.00 Mo-Th, 9.00-14.30 Fr, 8.00-15.00 Jul/Aug):
  • They call us
  • They send a message to [email protected]
  • We try to be very proactive
▪ Outside our working hours, an external company provides a 24x7 reactive service for the institutions.
▪ The external company can check the state of our routers and switches and, if the problem is external, call our provider.
▪ Second-level support from our technicians during the weekend.

How we manage the network

▪ Inventory of circuits using “our” Communications database
▪ Ad-hoc scripts and alarms
▪ Statistics via SNMP with Cacti
▪ UPC-CCABA has developed a passive monitoring system using real-time analysis: SMARTxAC
▪ Our NOC is subscribed to the DANTE E2ECU (End-to-End Coordination Unit) mailing list for dedicated circuits
▪ perfSONAR node through RedIRIS for LHC
▪ NAM
▪ Other tools

How we manage dedicated circuits

▪ Special circuits & services:
  • If the circuit is between two institutions connected to Anella Científica, we ask both whether they want the connection. We have a special VLAN range for these connections.
  • If the circuit is external, RedIRIS uses a form that the institutions fill in and send to RedIRIS and CESCA, indicating the name of the project, description, responsible entity, kind of connection, etc.
  • For modifications, institutions can ask us directly and we contact RedIRIS
  • RedIRIS and CESCA have agreed on two VLAN ranges for special projects, one range for each type of encapsulation
  • We use Request Tracker to handle all the requests, arrange a VLAN number, etc.
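Arranging a VLAN number from a reserved range can be pictured with a short Python sketch. The range `3000-3099` and the assigned IDs below are hypothetical values chosen for illustration; the slides do not state the ranges agreed with RedIRIS, and the real assignment is handled through Request Tracker, not by this code.

```python
def next_free_vlan(reserved, in_use):
    """Return the lowest VLAN ID in `reserved` that is not in `in_use`."""
    for vlan in reserved:
        if vlan not in in_use:
            return vlan
    raise RuntimeError("no free VLAN left in the reserved range")


# Hypothetical reserved range and current assignments (illustration only).
SPECIAL_PROJECTS = range(3000, 3100)
assigned = {3000, 3001, 3003}

if __name__ == "__main__":
    print(next_free_vlan(SPECIAL_PROJECTS, assigned))
```

Keeping one range per encapsulation type, as agreed with RedIRIS, would simply mean calling the function with a different `reserved` range.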

For our users:

▪ Listen to their needs first
▪ For each new connection, we run some stress tests before it goes into production
▪ They can choose static routing or dynamic routing (BGP)
▪ We ping their interface from the other end of the /30 and from our monitoring machine
▪ We apply anti-spoofing filters… Some insist on using the infrastructure address for VPNs
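The /30 and anti-spoofing points can be sketched in Python with the standard-library `ipaddress` module. The prefixes are documentation ranges (192.0.2.0/24, 198.51.100.0/24), not real assignments, and the function names and the choice of which end of the /30 is "ours" are assumptions made for the example.

```python
import ipaddress


def p2p_endpoints(link: str):
    """Return the two usable host addresses of a /30 point-to-point link."""
    net = ipaddress.ip_network(link)
    if net.prefixlen != 30:
        raise ValueError("expected a /30 network")
    first, second = net.hosts()  # a /30 has exactly two usable hosts
    return first, second


def passes_antispoofing(source: str, assigned_prefixes) -> bool:
    """True if `source` lies inside one of the institution's assigned prefixes."""
    ip = ipaddress.ip_address(source)
    return any(ip in ipaddress.ip_network(p) for p in assigned_prefixes)


if __name__ == "__main__":
    ours, theirs = p2p_endpoints("192.0.2.0/30")
    print("ping target:", theirs)
    # Traffic sourced from the infrastructure (/30) address is outside the
    # institution's assigned prefix, so a filter of this kind would drop it.
    print(passes_antispoofing(str(theirs), ["198.51.100.0/24"]))
```

This shows why using the infrastructure address as a VPN endpoint clashes with the filters: that address is not part of the institution's own assigned range.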

“Our” Communications database

▪ We store all the information about our institutions:
  • Points of access
  • Addresses
  • E-mail addresses and telephone numbers of the technical and executive contacts
  • Assigned IP addresses
  • Infrastructure addresses (point to point)
  • Equipment
  • Bandwidth
  • Technology
  • Comments, special cases for the 24x7 service
▪ It makes our life easier, as we have many “special” cases:
  • More than one point of access per institution
  • More than one institution per point of access
  • Different circuits within and between institutions
  • …
▪ All the information about an institution/circuit/person is linked
▪ Every time we need to contact an institution, we find the related information here
▪ It’s not accessible from external networks
▪ It’s programmed by our engineers
▪ It also stores information about the Neutral Internet Exchange, CATNIX
▪ Pros
  • All the information is in one place
  • We don’t have to maintain separate files for the assignment of VLANs, IP addresses, etc.
  • Easy creation of new instances
  • When the technical/administrative contacts change, the change is applied “almost” automatically
▪ Cons
  • Each change requires programming
  • Sometimes the person who makes the changes is not the original programmer
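To make the idea of linked institution/circuit/person records concrete, here is a minimal sketch using Python's standard-library `sqlite3`. The table and column names, the sample rows and the helper functions are all hypothetical; the slides do not describe the real schema of the CESCA database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE institution (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE access_point (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, bandwidth_mbps INTEGER
);
-- Link table: an institution may have several points of access, and a
-- point of access may serve several institutions.
CREATE TABLE institution_access_point (
    institution_id INTEGER REFERENCES institution(id),
    access_point_id INTEGER REFERENCES access_point(id),
    PRIMARY KEY (institution_id, access_point_id)
);
CREATE TABLE contact (
    id INTEGER PRIMARY KEY,
    institution_id INTEGER REFERENCES institution(id),
    role TEXT CHECK (role IN ('technical', 'administrative')),
    email TEXT
);
"""


def demo_db():
    """Build an in-memory database with made-up sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO institution VALUES (?, ?)",
                   [(1, "Institution A"), (2, "Institution B")])
    db.execute("INSERT INTO access_point VALUES (1, 'PoA-1', 1000)")
    db.executemany("INSERT INTO institution_access_point VALUES (?, ?)",
                   [(1, 1), (2, 1)])
    db.executemany("INSERT INTO contact VALUES (?, ?, ?, ?)",
                   [(1, 1, "technical", "tech@a.example"),
                    (2, 2, "administrative", "admin@b.example")])
    return db


def contacts_for_access_point(db, access_point_name):
    """List (institution, role, email) for everyone behind a point of access."""
    rows = db.execute(
        """SELECT i.name, c.role, c.email
           FROM access_point a
           JOIN institution_access_point l ON l.access_point_id = a.id
           JOIN institution i ON i.id = l.institution_id
           JOIN contact c ON c.institution_id = i.id
           WHERE a.name = ?
           ORDER BY i.name, c.role""",
        (access_point_name,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for row in contacts_for_access_point(demo_db(), "PoA-1"):
        print(row)
```

The link table is what covers the “special” cases listed on the slide: a single query returns every contact affected when one point of access fails.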

“Our” ad-hoc scripts

▪ They send e-mails and messages to our mobile phone when a connection fails.
▪ The institution and the problem appear in the subject
▪ It’s the best way to be “proactive”
▪ Pros
  • They are extremely useful for detecting problems quickly and for learning about them during the weekend
  • Easy to program (shell)
▪ Cons
  • We need to remember to add each institution whenever there is a new connection (separate maintenance)
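The behaviour described for these scripts (probe each connection, put the institution and the problem in the subject) can be sketched as follows. The real scripts are written in shell; this Python version is only an illustration. The `probe` callable, the addresses and the subject format are assumptions, and actually sending the message (for example with `smtplib`) is left to the caller.

```python
from email.message import EmailMessage


def build_alerts(connections, probe, sender, recipient):
    """Probe each connection and return one e-mail per failure.

    connections: mapping of institution name -> address to probe
    probe: callable(address) -> bool, True when the connection answers
    """
    alerts = []
    for institution, address in connections.items():
        if probe(address):
            continue  # connection is fine, nothing to report
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        # Institution and problem go in the subject, as on the slide.
        msg["Subject"] = f"[ALERT] {institution}: no reply from {address}"
        msg.set_content(
            f"The connection of {institution} ({address}) did not answer."
        )
        alerts.append(msg)
    return alerts


if __name__ == "__main__":
    # Fake probe for the demo: only 192.0.2.1 "answers".
    demo = {"Institution A": "192.0.2.1", "Institution B": "192.0.2.5"}
    for alert in build_alerts(demo, lambda a: a == "192.0.2.1",
                              "noc@example.org", "oncall@example.org"):
        print(alert["Subject"])
```

Reading `connections` from the communications database instead of a separate list would address the separate-maintenance drawback noted under Cons.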

Cacti

▪ RRDtool front-end: a high-performance tool that stores and graphs time series of data.
▪ It’s used to monitor:
  • CPU, temperature and memory of the routers
  • Anella Científica: points of access
  • Voice calls
  • Remote and direct access services
  • CATNIX (Internet Exchange)
  • Warnings
  • Automatic monthly statistics
  • BGP prefixes
  • Ping
  • Power consumption
  • RedIRIS & Orange Business Services graphs integrated

[Diagram label: SCP]

Cacti: one for users, one for management

[Diagram labels: PRIVATE; PUBLIC; Contact information; CACTI; SCP]

Plugins

Reportit
▪ Useful to generate monthly reports

Thold
▪ Useful to detect:
  • Down links
  • Congestion
  • High temperature
  • High CPU
  • Excess of BGP prefixes…
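The kind of check that threshold alerting performs on these metrics can be sketched in a few lines of Python. This is not how the Cacti plugin is implemented; the metric names and the limits in `LIMITS` are hypothetical values chosen for the example.

```python
# Hypothetical limits; real thresholds depend on the equipment and links.
LIMITS = {
    "cpu_percent": 80,
    "temperature_c": 60,
    "bgp_prefixes": 1000,
    "utilisation_percent": 90,
}


def breached(sample, limits=LIMITS):
    """Return a list of alarm strings for one sample of a device or link."""
    alarms = []
    if not sample.get("link_up", True):
        alarms.append("link down")
    for metric, limit in limits.items():
        value = sample.get(metric)
        # Metrics missing from the sample are skipped, not treated as alarms.
        if value is not None and value > limit:
            alarms.append(f"{metric} {value} > {limit}")
    return alarms


if __name__ == "__main__":
    print(breached({"link_up": True, "cpu_percent": 95, "temperature_c": 40}))
    print(breached({"link_up": False}))
```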

Superlinks
▪ It allows us to add new tabs
▪ Useful to integrate RedIRIS graphs in the same environment

Boost
▪ It keeps the visited graphs in the cache for 5 minutes
▪ It doesn’t generate all the graphs
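Caching visited graphs for 5 minutes, and rendering only the graphs that someone actually asks for, amounts to a time-to-live cache. The Python sketch below illustrates that idea only; it is not the plugin's code, and the class name `GraphCache` and the `render` callable are assumptions.

```python
import time


class GraphCache:
    """Render a graph on demand and reuse the result for `ttl_seconds`."""

    def __init__(self, render, ttl_seconds=300, clock=time.monotonic):
        self._render = render      # callable(graph_id) -> image
        self._ttl = ttl_seconds    # 300 s = the 5 minutes from the slide
        self._clock = clock        # injectable so the cache can be tested
        self._entries = {}         # graph_id -> (timestamp, image)

    def get(self, graph_id):
        now = self._clock()
        hit = self._entries.get(graph_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # still fresh: no re-rendering
        image = self._render(graph_id)
        self._entries[graph_id] = (now, image)
        return image


if __name__ == "__main__":
    cache = GraphCache(lambda graph_id: f"image for graph {graph_id}")
    print(cache.get(7))
    print(cache.get(7))  # served from the cache
```

Graphs that nobody requests are never rendered, which is the saving the second bullet refers to.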

Link2BDCOPS

▪ It adds an icon next to each graph; clicking on it shows the data of the technical and administrative contacts
▪ Programmed by our engineers
▪ Linked to our database
▪ Only the internal Cacti has access to it

Some weather maps: occupation of the lines

[Weather map figure; label: Provider]

Some weather maps: link and occupation

[Weather map figure]
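The occupation shown on a weather map is typically derived from two readings of an interface octet counter. A minimal Python sketch of that arithmetic follows, assuming SNMP-style counters that wrap at 2^32 or 2^64; the function name and the sample figures are illustrative, not taken from the slides.

```python
def utilisation_percent(octets_before, octets_after, interval_s, link_bps,
                        counter_bits=64):
    """Average link occupation (%) between two readings of an octet counter."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # The modulo handles a single counter wrap between the two readings.
    delta_octets = (octets_after - octets_before) % (2 ** counter_bits)
    return 100.0 * delta_octets * 8 / (interval_s * link_bps)


if __name__ == "__main__":
    # 18.75 GB transferred in 300 s over a 1 Gbps link.
    print(utilisation_percent(0, 18_750_000_000, 300, 1_000_000_000))
```

With a 5-minute polling interval, a 32-bit counter can wrap more than once on a fast link, which is why 64-bit counters are the default assumed here.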

Creating a new code

[Screenshots with numbered steps 1 to 5]

Cacti

▪ Pros
  • It’s very useful to detect problems
  • It’s very useful to “see” the network while it’s working
  • It makes the 24x7 service easier
  • It simplifies the generation of monthly reports
  • Graph templates are useful
▪ Cons
  • Groups of users are hard to manage
  • Creating graph templates requires time and dedication
  • The user interface works better when the amount of data is not large

perfSONAR

▪ We’re beginning to use it.
▪ Initially installed for the LHC project
▪ Uses the installable DVD version from RedIRIS
▪ Coordinated through RedIRIS
▪ Other tools, like NDT, are also installed so that our users can measure the network
  • Our NOC is subscribed to the DANTE E2ECU (End-to-End Coordination Unit) mailing list

perfSONAR

▪ Pros
  • Good for inter-domain monitoring of L2 circuits (LHC)
  • Very powerful if all the tools are used
▪ Cons
  • Installing it wasn’t easy at all…

SMARTxAC

▪ Traffic Monitoring System for Anella Científica (Sistema de Monitorització de AЯTfic per l’Anella Científica).
▪ It’s a passive monitoring and analysis system, tailor-made for Anella Científica by the Advanced Broadband Communications Service of the Technical University of Catalonia (UPC-CCABA).
▪ Usable for other high-speed networks.
▪ Since 2003, SMARTxAC has been used to monitor Anella Científica continuously.
▪ Passive splitters and capture cards on every external link.

SMARTxAC: Topology and splitters

[Diagram labels: Campus Nord; Telvent; Special projects; Operator; Local connections; Catalyst 6500 (Level 2/3); Catalyst 6500 (Level 2); Catalyst 6500 (Level 3); Juniper M320, Level 3 (RedIRIS); Nortel, Level 2 (RedIRIS); Splitters; Capture servers (Endace cards), analysis and monitoring]

SMARTxAC

▪ Pros
  • It captures ALL the headers on the regular traffic links
  • Very useful to detect problems that happened hours ago
  • Traffic is classified
  • It can detect different types of application
▪ Cons
  • The 10 Gbps cards are very expensive
  • New interfaces require more programming and more cards

The NAM, Network Analysis Module

▪ It’s a module of the Catalyst 6500
▪ Similar to a SPAN port + server with Ethereal/Wireshark
▪ It allows us to capture all the traffic during a given period
▪ The results help us to find the origin of attacks or security problems, black holes, etc.
▪ 2 simultaneous captures

Source: http://www.cisco.com

The NAM, Network Analysis Module

▪ Pros
  • Very easy to use (web-based interface)
  • Real-time analysis of what’s happening on the network
  • The capture can be saved in “Ethereal format”
  • It can monitor physical and logical interfaces, like VLANs
  • It monitors ALL the traffic
  • Filters can be applied before the capture
▪ Cons
  • It’s a proprietary solution
  • It can only monitor interfaces of 1 Gbps or less
  • It’s used once a problem has started

Other tools

▪ MGEN, to send large amounts of traffic on the links and check whether we can fill them with UDP traffic
▪ Direct access to some tools that our providers give us:
  • HP OpenView
  • Cacti statistics
  • Management of VLANs
▪ Iperf
▪ Netmate
▪ Pathrate
▪ Nagios
▪ Zabbix
▪ MTR

The most common incidents & requests

▪ Incidents:
  • Power cuts at the institution
  • Radio links & ADSL
  • Last-mile fibre cuts
  • Crazy firewalls…
  • DoS attacks
▪ Other requests:
  • Multicast tests
  • New circuits
  • Routing
  • DNS
  • Redundancy

Conclusions

▪ Our RREN has to face the problems of small entities, big universities and research centres, and very important projects with dedicated lambdas that traverse several domains
▪ RT for incidents
▪ At least a database for data
▪ At least a monitoring tool
▪ At least an analysis tool
▪ New models with dark fibre require new management models for the NOC

No single tool

Thanks for your attention!

Questions? Suggestions?

[email protected]