Comments on Software Systems
HATC Corporation, Beijing
December 6 2005
Geoffrey Fox
CTO Anabas Corporation and Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://www.infomall.org

Design, analysis, and management of a BIG software project I
General Principles
• Quality control in software development
• documentation/archives
• codes
Design of the architecture of a large-scale or complicated system
How to start
• Methodology
• Decomposition
• Subtask and goal
How to choose programming language and development environment
• Trend of programming language (C, C++, and Java)
• Platforms (Windows, Unix, and Linux)
• Is there any de facto programming language(s) for a certain type of applications (e.g. C and C++ used to be popular in real-time systems)

Design, analysis, and management of a BIG software project II
How to design a client-side (stand alone) air traffic control – a real-time client-side monitor system
• Principles
• Reliability
• Performance
• Interface between subsystem and main framework
How to design a large-scale distributed air traffic control system
• Architecture
• Modularity
• Reusability (difficulty for us)
• Design model (two-tier or three-tier)
Algorithm and performance of air traffic flow control
Training of senior system architect

Overall Remarks
Talk based on my experience, which is very different from that of your company
I have developed software in a small company and in a university setting with a mix of students and staff
I have watched other large software activities including Apache and other open source projects
Preferred software model changes faster than software engineering techniques
• C++
• Corba
• Java
• Web Services
Maybe some software engineering

General Principles I
Have a clear management structure with one person in charge of important decisions
• Decisions can and should be debated
Communicate electronically and preserve records in a searchable fashion
• Email is possible if there is a clean master list, but Wikis and Blogs are probably better
• Equip with Search – Google web or desktop search is better than most built-in search capabilities
Obviously use CVS or equivalent for version control
Document all actions in Wiki/Blog/email

General Principles II
Computers are getting faster, which implies we do not have to worry about efficiency as much
Build smaller modules
• As modules decrease in size, the overhead of interacting with them increases
• But smaller modules with simple functionality are much easier to build and test
So avoid pointers even more and prefer to communicate data, not pointers thereto, when communicating between modules
Use databases, not ad-hoc storage mechanisms, wherever the performance cost can be tolerated

General Principles III
Test as much as you can by having others (Q/A) exercise code – especially where you need to evaluate system results (output) to see if they are correct
Use tools like JUnit to provide automated repeatable tests
The harder tests are “where you don’t know the answer”
Then I used to prepare two codes
• One was the “production system” with all the bells and whistles
• The other had few options and just did the main problem
Always test incrementally
• Each module separately
• Full system as it builds up
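
The "two codes" idea can be automated. The sketch below is only an illustration: it uses Python's `unittest` in place of JUnit, and `production_mean` and `reference_mean` are hypothetical stand-ins for the production system and the simple code that "just did the main problem".

```python
import math
import random
import unittest


def reference_mean(values):
    """Simple code: few options, just does the main problem."""
    return sum(values) / len(values)


def production_mean(values, chunk_size=4):
    """'Production' code: same result, computed in chunks (a made-up extra feature)."""
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        count += len(chunk)
    return total / count


class TwoCodesTest(unittest.TestCase):
    def test_known_answer(self):
        # A module tested separately on a case where the answer is known.
        self.assertEqual(production_mean([1, 2, 3, 4, 5, 6]), 3.5)

    def test_against_reference(self):
        # Where the answer is not known in advance, compare the two codes.
        rng = random.Random(0)  # fixed seed keeps the test repeatable
        for _ in range(100):
            data = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 50))]
            self.assertTrue(
                math.isclose(production_mean(data), reference_mean(data),
                             rel_tol=1e-9, abs_tol=1e-9))
```

Saved in a file, the tests can be run repeatedly with `python -m unittest`.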

General Principles IV
Minimize configuration variables that must be changed for each installation
Rather provide a message-based and user-based interface that the system can use to set operating parameters
Make each module as independent as possible; build together
• Module
• Documentation
• User interface (portlets are an example)
• Configuration interface
Store configuration data in a database that is independent of the system
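
One way to keep configuration in a database that is independent of the system is sketched below. It assumes SQLite purely for convenience; the table layout and the module and parameter names (`radar_display`, `refresh_ms`) are hypothetical.

```python
import sqlite3


def open_config_store(path=":memory:"):
    """Create (if needed) a key/value configuration table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config ("
        " module TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL,"
        " PRIMARY KEY (module, key))")
    return conn


def set_param(conn, module, key, value):
    """Insert or update one operating parameter for a module."""
    conn.execute(
        "INSERT OR REPLACE INTO config (module, key, value) VALUES (?, ?, ?)",
        (module, key, str(value)))
    conn.commit()


def get_param(conn, module, key, default=None):
    """Read a parameter; the default means few values must be set per installation."""
    row = conn.execute(
        "SELECT value FROM config WHERE module = ? AND key = ?",
        (module, key)).fetchone()
    return row[0] if row else default


config = open_config_store()
set_param(config, "radar_display", "refresh_ms", 500)
refresh = get_param(config, "radar_display", "refresh_ms")      # stored as text
timeout = get_param(config, "radar_display", "timeout_ms", "2000")  # falls back to default
```

Because values are read through `get_param`, a configuration interface (a portlet, for example) can change them without touching the module's code.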

Web services
Web Services build loosely-coupled, distributed applications (wrapping existing codes and databases) based on the SOA (service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.
[Figure: resources (Databases, Humans, Programs, Computational resources, Devices); service logic (BPEL, Java, .NET); message processing (SOAP and WSDL); SOAP messages of the form <env:Envelope> <env:Header> ... </env:Header> <env:Body> ... </env:Body> </env:Envelope>]
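
The header/body structure of a SOAP message can be illustrated with Python's standard XML library. This is a minimal sketch, not a full SOAP stack: it assumes the SOAP 1.2 envelope namespace, and the payload element names (`getFlightStatus`, `callsign`, `messageId`) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace; the "env" prefix matches the fragment in the figure.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
ET.register_namespace("env", SOAP_ENV)


def build_envelope(header_items, body_element):
    """Wrap header entries and one body element in an env:Envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    for item in header_items:
        header.append(item)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.append(body_element)
    return ET.tostring(envelope, encoding="unicode")


def split_envelope(xml_text):
    """Return (header children, body children) of a SOAP message."""
    envelope = ET.fromstring(xml_text)
    header = envelope.find(f"{{{SOAP_ENV}}}Header")
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    return (list(header) if header is not None else [],
            list(body) if body is not None else [])


# Hypothetical application payload and header entry.
request = ET.Element("getFlightStatus")
ET.SubElement(request, "callsign").text = "CCA981"
message_id = ET.Element("messageId")
message_id.text = "42"

message = build_envelope([message_id], request)
headers, payload = split_envelope(message)
```

The receiver can process `headers` (routing, security, and so on) separately from `payload`, which is handed to the service logic.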

A typical Web Service
In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI, messages, CGI Web invocations, or totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and programs written in net friendly languages like Java and Python
[Figure: Web Services (Payment/Credit Card, Warehouse/Shipping control, Security, Catalog, Portal Service) connected through WSDL interfaces]

Messaging Structure
Web Service Communication is messaging (transport protocol, routing) using SOAP protocol
[Figure: Messaging between two services. On each side a customizable handler chain processes the SOAP Header, the service itself processes the SOAP Body, and other services can be invoked from Header or Body]

Merging the OSI Levels
All messages pass through multiple operating systems and each O/S thinks of message as a header and a body
Important message processing is done at
• Network
• Client (UNIX, Windows, J2ME etc)
• Web Service Header
• Application
EACH is < 1 ms (except for small sensor clients and except for complex security)
But network transmission time is often 100 ms or worse
Thus no performance reason not to mix up places processing is done
[Figure: message layers IP, TCP, SOAP, App]

Linking Modules
From method based to RPC to message based to event-based publish-subscribe Message Oriented Middleware
[Figure: three ways of linking modules]
• Module A and Module B linked by method calls: .001 to 1 millisecond (closely coupled Java/Python …)
• Service A and Service B linked by messages: 0.1 to 1000 millisecond latency (coarse grain service model)
• Service A and Service B linked through a “Message Queue in the Sky”: the publisher posts events and the “listener” subscribes to events
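
The publish-subscribe style of linking can be sketched with a tiny in-process broker. This is an illustration only: real message-oriented middleware runs between machines, and the `Broker` class and topic names here are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process stand-in for message-oriented middleware."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # The publisher never references a listener directly; it only names a topic.
        for callback in list(self._subscribers[topic]):
            callback(event)


broker = Broker()
received = []

# "Listener": subscribes to events on a topic.
broker.subscribe("radar/track", received.append)

# Publisher: posts events without knowing who is listening.
broker.publish("radar/track", {"callsign": "CCA981", "altitude_ft": 32000})
broker.publish("weather/alert", {"level": "low"})  # no subscribers, so nothing happens
```

Adding a second listener requires no change to the publisher, which is the decoupling the message-based models are after.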

OGCE Consortium
[Figure: portal screenshot. Callouts: individual portlet for the Proxy Manager; use tabs or choose different portlets to navigate through interfaces to different services; 2 other portlets; each service has its own portlet]

Portal Architecture
[Figure: hierarchical arrangement with columns Clients, Portal, Portlets, Libraries, Services, Resources. Clients (pure HTML, Java applet ..); portal aggregation and rendering, portal internal services; portlet classes (including Portlet Class: WebForm), local portlets and remote or proxy portlets; libraries SERVOGrid (IU), GridPort etc., (Java) COG Kit; Web/Grid services; resources Computing, Data Stores, Instruments]

General Principles V
Do not spend too long documenting and prefer methods like javadoc that again are naturally associated with code
Do describe actions (as opposed to code functionality) in your Wiki/Blog/email
The quality and speed of different people varies a lot
• Evaluate this and assign responsibilities accordingly
Do not let anybody take decisions into their own hands
Debate goals and processes but once a decision is made all must adhere to it
• Decisions can be changed and should be if needed

General Principles VI
Evaluate timing constraints carefully
Use the simplest, most robust approach that satisfies the time constraints
• That’s why I recommend databases for configuration as this is not a time critical part of system
Note computer does one instruction in 10^-6 milliseconds but a network communication takes 1-100 milliseconds
• Invoking a process has about 1 millisecond overhead
• Method calls 0.001 to 0.01 milliseconds
• Using a database a few milliseconds
• People only notice 30 milliseconds

Consequences of Rule of the Millisecond
Useful to remember critical time scales
• 1) 0.000001 ms – CPU does a calculation
• 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency
• 2b) 0.001 to 0.01 ms – Overhead of a Method Call
• 3) 1 ms – wake up a thread or process
• 4) 10 to 1000 ms – Internet delay: Workflow
So use pointers and the computer memory system when latencies of ≤ 1 millisecond are needed, but use a URI looked up in a context store when longer delays are allowed
Transfer data when read-only and long latency allowed
Always choose the slowest allowed methodology and remember that, when in doubt, Moore’s law favors computer performance and systems always get more complex and harder to maintain.
[Figure label: Classic Programming]
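
The rule "choose the slowest allowed methodology" can be written as a small lookup. This is a sketch under assumptions: the method-call, process wake-up, and Internet figures come from the time scales listed here, while the database and network entries are representative values picked from the ranges quoted in General Principles VI, and the mechanism names are informal.

```python
# Typical cost of each linkage mechanism, in milliseconds (see assumptions above).
LINKAGE_COST_MS = {
    "method call (pointers, shared memory)": 0.01,
    "wake up a thread or process": 1.0,
    "database lookup": 5.0,
    "network message": 100.0,
    "Internet workflow": 1000.0,
}


def choose_linkage(latency_budget_ms):
    """Return the slowest mechanism whose typical cost still fits the budget."""
    allowed = {name: cost for name, cost in LINKAGE_COST_MS.items()
               if cost <= latency_budget_ms}
    if not allowed:
        raise ValueError("no mechanism is fast enough for this budget")
    return max(allowed, key=allowed.get)


tight = choose_linkage(0.5)     # only a method call fits
relaxed = choose_linkage(30)    # a database lookup is the slowest that fits
loose = choose_linkage(5000)    # even an Internet workflow is allowed
```

The point of preferring the slowest fit is that slower mechanisms are usually the more general and more loosely coupled ones.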

Architecture of a large System
Divide system hierarchically into parts
• Interaction between parts will be messages with no conventional pointers
• Can have URI’s that need to be looked up in a database (essentially)
Keep doing this until overhead is prohibitive
• Overhead is “surface”/“volume” for ALL systems (people, software …) and always decreases in relative importance as system gets bigger
Remember computers are going to get faster rather than slower, so err on the side of modularity versus performance
Rare to be worth optimizing performance but rather make a good design that has no bad aspects making performance unnecessarily bad
Specify data structures in XML NOT Java or C++ first
• Design ATCML first specifying data structures needed in Air Traffic Control
• Map to SQL for databases (don’t use XML databases)
• Map to C++ or Java for programming
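
The "XML first, then map to SQL and to a programming language" sequence might look like the sketch below. Everything specific is an assumption: no ATCML schema is given in the talk, so the `<flight>` element and its attributes are invented, SQLite stands in for the database, and Python stands in for C++ or Java.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical "ATCML" fragment: element and attribute names are invented here.
ATCML_SAMPLE = """
<flights>
  <flight callsign="CCA981" origin="PEK" destination="JFK" altitudeFt="32000"/>
  <flight callsign="CES501" origin="PVG" destination="HKG" altitudeFt="28000"/>
</flights>
"""


@dataclass
class Flight:
    """Programming-language mapping of the <flight> element."""
    callsign: str
    origin: str
    destination: str
    altitude_ft: int


def parse_flights(xml_text):
    """Map each <flight> element to a Flight object."""
    root = ET.fromstring(xml_text)
    return [Flight(f.get("callsign"), f.get("origin"), f.get("destination"),
                   int(f.get("altitudeFt")))
            for f in root.findall("flight")]


def store_flights(conn, flights):
    """SQL mapping of the same structure: one column per XML attribute."""
    conn.execute("CREATE TABLE IF NOT EXISTS flight ("
                 " callsign TEXT PRIMARY KEY, origin TEXT, destination TEXT,"
                 " altitude_ft INTEGER)")
    conn.executemany("INSERT OR REPLACE INTO flight VALUES (?, ?, ?, ?)",
                     [(f.callsign, f.origin, f.destination, f.altitude_ft)
                      for f in flights])
    conn.commit()


flights = parse_flights(ATCML_SAMPLE)
db = sqlite3.connect(":memory:")
store_flights(db, flights)
high = db.execute(
    "SELECT callsign FROM flight WHERE altitude_ft > 30000").fetchall()
```

The XML definition stays the single source of truth; the class and the table are both derived from it.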

Philosophy of Web Service Grids
Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines
This leads to the distributed object (DO) model represented by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java
Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
Note CORBA was considered too complicated in both organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

What is a Simple Service?
Take any system – it has multiple functionalities
• We can implement each functionality as an independent distributed service
• Or we can bundle multiple functionalities in a single service
Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always make them Web services by converting the interface to WSDL
Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond”
• Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use message rather than method call
• Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency
Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services
• Makes it hard to integrate sharing common security, user profile, file access .. services

Grids of Grids of Simple Services
• Link via methods, messages, streams
• Services and Grids are linked by messages
• Internally to service, functionalities are linked by methods
• A simple service is the smallest Grid
• We are familiar with method-linked hierarchy: Lines of Code, Methods, Objects, Programs, Packages
[Figure: overlay and compose Grids of Grids. Methods, Services, Component Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases, Sensor, Sensor Nets, Data Resource Grids]

Choice of languages
One needs to evaluate the real-time version, but I would prefer Java to C++ or C
Java has good software development tools and the current generation of programmers is well trained in it
C++ allows higher performance but find out if you need this
Prefer Web Service model if performance allows
• Use message-based interaction not method based where possible
• Use Web services if you require messages and interoperability with the outside world
• JDBC is message based interaction with external database
Aim at supporting both Windows and Linux platforms if possible

Client Side Air Traffic Control
Analyze all performance requirements
Remember life cycle costs are larger than build costs
• Difficult consequences if contract just to build – not to maintain
Use Model View Controller architecture and separate Model and View
• Control is often the interaction between Model and View
• So client is not same as user module; always separate business logic from user interface
Use GIS!
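
The separation of Model and View can be shown in a few lines. This is a minimal sketch with hypothetical class names and a toy "callsign altitude" input format; it is not meant as an air traffic control design.

```python
class TrackModel:
    """Model: business logic and state, with no knowledge of any display."""

    def __init__(self):
        self._altitudes = {}   # callsign -> altitude in feet
        self._observers = []

    def attach(self, observer):
        self._observers.append(observer)

    def update_altitude(self, callsign, altitude_ft):
        if altitude_ft < 0:
            raise ValueError("altitude must be non-negative")
        self._altitudes[callsign] = altitude_ft
        for observer in self._observers:
            observer(dict(self._altitudes))  # views receive a copy of the data


class TextView:
    """View: rendering only."""

    def __init__(self):
        self.lines = []

    def render(self, altitudes):
        self.lines = [f"{cs}: {alt} ft" for cs, alt in sorted(altitudes.items())]


class Controller:
    """Controller: turns user input into model operations."""

    def __init__(self, model):
        self._model = model

    def handle_input(self, text):
        callsign, altitude = text.split()
        self._model.update_altitude(callsign, int(altitude))


model = TrackModel()
view = TextView()
model.attach(view.render)
controller = Controller(model)
controller.handle_input("CCA981 32000")
controller.handle_input("CES501 28000")
```

Because the model only calls observers with data, the text view could be swapped for a map display without touching the business logic.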

Web Services and M-MVC
Web Services are naturally M-MVC – Message based Model View Controller with
• Model is Web Service
• Controller is Messages (NaradaBrokering)
• View is rendering
[Figure: WSRP and JSR168 Portlets. A Web Service (Application or Model) has a Resource Facing Port and a User Facing Port with Input port and Output port (labelled R, F, I, O); a Portal aggregates WS User Facing fragments into Views on desktop, handheld, phone]
[Figure: explicit message-based Publish/Subscribe MVC model. Model: subscribe UI event, publish rendering. View: publish UI event, subscribe rendering. Broker as Controller]

I: Data Mining and GIS Grid
[Figure: components shown are WMS Client (twice), WMS handling client requests, UDDI, WFS1, WFS2, WFS3, NASA WMS, databases with NASA, USGS features, SERVOGrid Faults, Data Mining Grid; links are labelled HTTP and SOAP]

Typical use of Grid Messaging in NASA
[Figure: Datamining Grid, Sensor Grid, Grid Eventing, GIS Grid]

Typical use of Grid Messaging
[Figure: HPSearch manages NaradaBrokering; Sensor Grid; WS-Context stores dynamic data; Filter or Datamining; WFS (GIS data), Web Feature Service; GIS Grid, Geographical Information System; Grid, Database, Archives; links labelled Post before Processing, Post after Processing, Notify, Subscribe]

I: Data Mining Grid
[Figure: components shown are HPSearch Workflow, UDDI, databases with NASA, USGS features, SERVOGrid Faults, WFS3, WFS4, WS-Context, PI Data Mining, Filter (twice), GIS Grid, NaradaBrokering, Pipeline, System Services; link labelled SOAP]

Architecture
Consider requirements of application alongside performance of computers and networks
• Remember performance of hardware will increase as will cost of people
Don’t fix number of tiers but rather build system from entities linked by messages such as services linked by SOAP
• Messaging good even if not SOAP
• SOAP has “container overhead”
Build a data architecture in XML for all information that will be in messages
Use pointers internally to entities
Things in messages use system metadata to look up references
• i.e. database lookup not hardware memory model
As before use the slowest most general method possible
• Avoid unnecessary performance
Build a fault tolerance model into initial architecture

ATC Performance and Algorithm
Find size (in latency, bandwidth) of critical requirements
Use publish-subscribe technology to support link between data sources and programs
• Introduces a delay of a few (1-5) milliseconds but is much easier to build and more fault tolerant
• Prefer asynchronous links as they make the system more modular and more robust
Performance requirements drive architecture
Build hierarchical algorithm to match hierarchical architecture

How to become a Software Architect
Work hard!
Understand modern technologies and their trends so future enhances design choices
Be able to understand system (requirements) in a clear fashion
Be able to decompose systems in a clear methodical fashion
Isolate detail into modules and use two or three level programming model

Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model
• We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services assuming we have solved problem of programming the service
[Figure: Service, Data]

Two-level Programming II
The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams
Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs
Such interpretative environments are the single processor analog of Grid Programming
Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
[Figure: Service1, Service2, Service3, Service4]
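
The two levels can be mimicked on a single processor, in the spirit of the shell-script analogy. In this sketch the three "services" are hypothetical local functions standing in for remote services; only the `compose` step corresponds to the second (composition) level.

```python
from functools import reduce


# Level 1: each "service" is programmed on its own with conventional technology.
def sensor_service(_request):
    return [3.0, 4.5, -1.0, 7.25]


def filter_service(readings):
    return [r for r in readings if r >= 0]


def summary_service(readings):
    return {"count": len(readings), "max": max(readings)}


# Level 2: composition only wires outputs to inputs, like a UNIX pipeline.
def compose(*services):
    return lambda request: reduce(lambda data, svc: svc(data), services, request)


workflow = compose(sensor_service, filter_service, summary_service)
result = workflow(None)
```

The composition level never looks inside a service; it could be rewired (for example, dropping the filter) without changing any service code.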

3 Layer Programming Model
• Level 1 Programming: inside services. Application expressed in Java, Fortran, C++, MPI etc.
• Level 2 Programming: choosing services by virtualization. Application Semantics (Metadata, Ontology), Semantic Grid
• Level 3 Grid Programming: composing multiple services. Service Workflow, Transactions, Mediation
[Figure: Web Service 1, WS 2, …, WS N-1, Web Service N on the WS-* Infrastructure]
Substantial work in UK e-Science program, international semantic web community

Plethora of Standards
Java is very powerful partly due to its many “frameworks” that generalize libraries e.g.
• Java Media Framework
• Java Database Connectivity JDBC
Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for “Grids of Simple Services”
• About 60 WS-* specifications introduced in last 2-3 years
• These are low level with higher level standards such as access database (OGSA-DAI) or “Submit a job” built on top of these
Many battles both between standard bodies and between companies as each tries to set standards they consider best; thus there are multiple standards for many of the key Web Service functionalities
Microsoft is a key player and stands to benefit as Web Services open up enterprise software space to all participants
• e.g. MQSeries (IBM) and Tibco have to change their messaging systems to support new open standards

The WS-* Infrastructure
Core Grid Services build on and/or extend the 60 or so WS-* Infrastructure specifications which define
• Container Model, XML, WSDL …
• Service Internet ((Reliable) Messaging, Addressing) including extensions for high performance transport and representation. This is natural basis for streaming applications
• Service Discovery
• Workflow and Transactions
• Security
• Metadata and State including lifetime
• Notification
• Policy, Agreements
• Management (service interactions)
• Portals and User Interfaces