    JULY 1991

    WRL

    Technical Note TN-20

    How Digital Impedes Portability and Interoperability

    Jeffrey C. Mogul

    Digital Internal Use Only

    digital Western Research Laboratory 250 University Avenue Palo Alto, California 94301 USA

    The Western Research Laboratory (WRL) is a computer systems research group that was founded by Digital Equipment Corporation in 1982. Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products.

    There is a second research laboratory located in Palo Alto, the Systems Research Center (SRC). Other Digital research groups are located in Paris (PRL) and in Cambridge, Massachusetts (CRL).

    Our research is directed towards mainstream high-performance computer systems. Our prototypes are intended to foreshadow the future computing environments used by many Digital customers. The long-term goal of WRL is to aid and accelerate the development of high-performance uni- and multi-processors. The research projects within WRL will address various aspects of high-performance computing.

    We believe that significant advances in computer systems do not come from any single technological advance. Technologies, both hardware and software, do not all advance at the same pace. System design is the art of composing systems which use each level of technology in an appropriate balance. A major advance in overall system performance will require reexamination of all aspects of the system.

    We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with the advent of higher performance systems. Researchers at WRL cooperate closely and move freely among the various levels of system design. This allows us to explore a wide range of tradeoffs to meet system goals.

    We publish the results of our work in a variety of journals, conferences, research reports, and technical notes. This document is a technical note. We use this form for rapid distribution of technical material. Usually this represents research in progress. Research reports are normally accounts of completed research and may include material from earlier technical notes.

    Research reports and technical notes may be ordered from us. You may mail your order to:

        Technical Report Distribution
        DEC Western Research Laboratory, WRL-2
        250 University Avenue
        Palo Alto, California 94301 USA

    Reports and notes may also be ordered by electronic mail. Use one of the following addresses:

    Digital E-net: DECWRL::WRL-TECHREPORTS

    Internet: [email protected]

    UUCP: decwrl!wrl-techreports

    To obtain more details on ordering by electronic mail, send a message to one of these addresses with the word "help" in the Subject line; you will receive detailed instructions.

    How Digital Impedes Portability and Interoperability

    Jeffrey C. Mogul

    July, 1991

    Abstract

    Digital is emerging from its years as a vendor of proprietary systems with institutional attributes that impede the delivery of high-quality portable and interoperable software. In spite of our best intentions, we cannot succeed in today's market without recognizing these barriers. Some of the barriers (such as byte-order) are inherent in our systems and must be circumnavigated; other barriers (such as the "Not Invented Here" syndrome) are part of the culture and must be dismantled. Drawing on my experiences in porting software to Unix systems, and in working on IP/TCP interoperability problems, I describe a number of Digital's subtle organizational barriers and suggest some solutions.

    Digital Internal Use Only

    Copyright 1991 Digital Equipment Corporation

    1. Introduction

    In the 1980s, Digital made a lot of money selling VAX/VMS and DECnet systems. The company built up a way of doing things that succeeded in this proprietary environment.

    In the 1990s, our customers have changed; they aren't willing to wait for us to provide the solutions that we think they want. If we don't deliver, they can probably find something better, sooner, from another vendor. The corporate ways that we evolved in the 1980s aren't going to work in this new world. We need to find new ways of producing software products, and we can't afford to hope that luck will save us.

    In this paper I will describe a few of Digital's structural problems, loosely grouped under the categories "barriers to portability" and "barriers to interoperability." These might also be called "barriers to the creation of systems that support portable and interoperable software," since the problems lie chiefly in our systems, rather than the applications we sell.

    One thing should be understood: I am not blaming anyone for the structural issues I will describe in this paper. As I said, these structures worked well in the 1980s. What people should be blamed for is failing, now, to realize that some of these structures are no longer helpful.

    I would also like to apologize in advance if some of the things I say turn out to be inaccurate. It is extremely hard to find out exactly what is available, or in progress, within Digital; this in itself is something we should be working to fix. Also, there are many people who are already doing great work to solve our portability and interoperability problems; they deserve our support.

    2. Barriers to portability

    In this section, I will look at the problems we face in creating systems that support software portability; that is, systems to which and from which it is easy to port software.

    Customers like systems that support software portability, because it means that they don't have to rely on one vendor. Established vendors used to hate software portability because they feared losing their captive audiences. Today, of course, we know that no sane customer would buy a system to which it is hard to port applications. What is not quite so obvious is that neither would any sane customer buy a new system that they couldn't escape from later on; so, no sane customer would buy a system from which it is hard to port their applications.

    2.1. Avoid unnecessary differentiation

    To a first approximation, this means that our systems have to be the same as everybody else's systems. We must be very careful about positioning proprietary "value added" features in otherwise standard systems. From the customer's point of view, such features must be valuable indeed in order to justify giving up the freedom of easy escape. From our point of view, adding such features may not be the proper use of our limited engineering resources.

    This is not to say that we should not add value to otherwise standard systems. Some features would clearly be worth it to the customer (such as a high-quality backup system). Features that do not affect the external interface, such as SMP, may also be worth adding. PrestoServe is a good example of useful added value that does not harm portability.

    But unnecessary differentiation is an unmitigated mistake. For example, DECwrite was originally based on FrameMaker. Today, DECwrite and FrameMaker have diverged considerably. Maybe DECwrite has features that aren't available on the platforms of other vendors ... but most users probably don't need those features, and they would rather not have to learn a new system.

    In summarizing the results of a recent Product Directions Forum, Patricia Ward of T&N wrote:

        The customers are highly enthusiastic about NAS, viewing it as critical to the kinds of applications they will be building in the near future. However, questions regarding DEC's intentions to utilize industry standard APIs emerges as a paramount concern of these customers. They strongly prefer industry standard APIs over DEC APIs, even if the latter are developed for multiple platforms. They believe that DEC's current positioning of NAS does not clearly enough convey the commitment to comply with existing -- and in particular, emerging -- industry standards in all areas. Where standards are only partially complete, e.g. Motif, the customers still want DEC to follow the developing standards are closely as possible. They also maintain that DEC must not choose to develop proprietary standards even when facing irreconcilable differences with standards bodies.

    2.2. Don't take too much advantage of our advantages

    One common mistake is to write software that takes full advantage of the underlying hardware or operating system. This is a serious trap, because the tendrils of the system and the software get so entwined that portability is choked off. Compiler writers used to make this mistake; they saw the nifty VAX instructions for doing complex operations, and assumed that if the hardware designers went through the trouble of putting them in, the compiler would have to use them. Often, the resulting programs ran slower, and the compilers were more complex ... and entirely unportable to new architectures.

    VMS suffers from the same problem. The entire Alpha project is an attempt to avoid solving this problem. It may even succeed, but we will still be stuck with an operating system that can't be ported. In theory, the reason why Alpha was necessary (instead of, say, using a MIPS instruction set) was that VMS relies on certain protection features of the VAX architecture, not present in MIPS. VAX/VMS might have been the right design for the 1970s and 1980s, but woe unto us if we assume that this kind of entanglement will work in the future.

    2.3. Standards are not the entire answer

    Another mistake is to assume that standards can save us. Standards are supposed to make portability easier. We can jump up and down and scream and wave standards at our customers all we want, but customers don't care about standards; they care about portability. If Sun systems dominate the workstation market, then it matters very little what official standards SunOS meets; what matters is that one can port software between ULTRIX and SunOS. Another way to say this is that de jure standards usually aren't worth the paper they are printed on; de facto standards are the only ones that count. To be successful, we have to be good at guessing where the de facto standards are headed. Sometimes we can set them ourselves, but often we will have to follow our competition at very short notice.

    2.4. Excess stability is a trap

    A mistaken reliance on standards is often coupled with a mistaken reliance on stability. Open systems evolve far more rapidly than our old proprietary systems do. If we can't keep up with the changes in our competition, our systems will end up too different. Yes, of course it is true that we should not jerk the customers around unnecessarily; but it is worse to change an interface two years after our competition does so than it is to change it as soon as the trend is clear.

    For example, one of the most persistent portability problems with ULTRIX is the use of the 4.2BSD syslog interface, when everyone else is using the 4.3BSD version of the interface. Many customers have complained to me that if it were not for this one difference, their programs would port without any source code changes. (One customer even offered to visit ZK3 and change all the ULTRIX applications to use the new syslog interface, if we would simply change over.)
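
To make the difference concrete, here is a minimal sketch of the kind of shim a customer has to carry when the two interfaces coexist. It assumes, as the 4.2BSD manual page describes it, that the older openlog() takes two arguments (an identifier and a status word), while the 4.3BSD version takes a third, facility, argument. The macro OLD_SYSLOG_42BSD and the function open_app_log() are hypothetical names invented for this illustration; neither is part of ULTRIX or of any BSD release.

```c
#include <assert.h>
#include <stddef.h>
#include <syslog.h>

/*
 * Hypothetical wrapper: open the system log for this program.
 * Returns 1 when the facility-based (4.3BSD-style) interface is used,
 * 0 when the older two-argument form is used.
 * OLD_SYSLOG_42BSD is an assumed build-time switch, not a real macro.
 */
int open_app_log(const char *ident)
{
    assert(ident != NULL);
#ifdef OLD_SYSLOG_42BSD
    /* Assumed 4.2BSD form: identifier and status bits only. */
    openlog(ident, LOG_PID);
    return 0;
#else
    /* 4.3BSD (and later POSIX) form: facility is the third argument. */
    openlog(ident, LOG_PID, LOG_DAEMON);
    return 1;
#endif
}
```

With a wrapper like this, the rest of a program can call syslog() as usual, but every ported application still needs the extra file and the extra build switch, which is exactly the cost the customers were complaining about.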

    2.5. Bug reports are good

    Even less excusable than excess stability is our inability to repair bugs quickly. Software quality is one of our main advantages (I was told by a customer that our systems crash far less often than Sun's), but some bugs seem never to get fixed. When a bug can be fixed without disrupting existing applications, there is no excuse when it is not fixed in the first release following the bug report. Yet, several times I have reported a bug and supplied a simple fix, only to be told months later that it might be fixed in a future release. Perhaps we should not be waiting until the end of field-test to fix bugs that have been known about for months.

    Bug reports should be welcomed, not shunned, because they provide us with a chance to improve our software. We should reward customers for reporting bugs, not charge them for the privilege (or make them type the reports onto antique five-part forms). This is especially important for our relationship with Independent Software Vendors (ISVs). When ISVs who discover ULTRIX bugs first have to fight to get anyone in Digital to listen, and then have to wait months for a fix, they aren't likely to choose ULTRIX as a platform for their applications.

    Doug Clark [1], one of Digital's most successful computer designers, has written a paper that shows how well the "bugs are good" philosophy works in hardware engineering. It should be just as successful in software engineering.

    Unfortunately, the ULTRIX product groups seem to be unable to provide timely fixes to those bugs that do get reported. This, I'm told, is due partly to a lack of human resources dedicated to bug-fixing, and partly to an ancient software development environment that makes it unnecessarily difficult to fix bugs. If this is so, the fault lies with our engineering management hierarchy.

    [1] Douglas W. Clark, "Bugs are Good: A Problem-Oriented Approach to the Management of Design Engineering," Research-Technology Management, May-June, 1990. This journal is available via the Digital Library Network.

    2.6. Staying honest about portability

    One of the most important lessons about portability is that it may be extremely hard to port an old program, but it is not hard at all to make a new program portable. That is, if portability is designed in from the start, it's usually almost free. For example, I've ported numerous network programs between big-endian machines (such as Suns) and little-endian machines (ours). In the cases where the original programmer was sensitive to the possibility of a byte-order mismatch, the programs had (almost) all the necessary ntohs (etc.) macros, even though these are no-ops on Suns. Porting these programs was easy. Other programmers were too lazy to think ahead; porting these programs is hard, not simply because the macros are missing, but because it often isn't at all obvious where to put them.
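
As an illustration of what "designed in from the start" can look like, the sketch below converts each 16-bit field at the moment it crosses the wire boundary, so the same source works on either byte order. The struct demo_header layout and the names demo_encode() and demo_decode() are assumptions made up for this example; only htons() and ntohs() are the standard calls.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-field protocol header, held in host byte order. */
struct demo_header {
    uint16_t type;
    uint16_t length;
};

/* Write the header to a 4-byte buffer in network (big-endian) order. */
void demo_encode(const struct demo_header *h, unsigned char out[4])
{
    uint16_t t, l;

    assert(h != NULL && out != NULL);
    t = htons(h->type);      /* no-op on big-endian hosts */
    l = htons(h->length);
    memcpy(out, &t, sizeof t);
    memcpy(out + 2, &l, sizeof l);
}

/* Read the header back from network order into host order. */
void demo_decode(const unsigned char in[4], struct demo_header *h)
{
    uint16_t t, l;

    assert(in != NULL && h != NULL);
    memcpy(&t, in, sizeof t);
    memcpy(&l, in + 2, sizeof l);
    h->type = ntohs(t);
    h->length = ntohs(l);
}
```

Because every conversion sits in one of these two functions, a later port has nothing to hunt for: the rest of the program only ever sees host-order values.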

    It's also important to have something to keep the programmers honest about portability. Even when designed in, portability cannot be guaranteed without a lot of testing. The temptation is too great to make the software work on our own systems, or add new features, rather than looking ahead to the future.

    I understand that, from the beginning, Sun tried to ensure that their kernel was portable by making sure that every release would run on their VAX. This surely added a little time to their release cycle, but it even more surely saved them a lot of time when they introduced their 386-based and SPARC-based systems.

    Most of the code in the ULTRIX kernel is probably portable to, say, big-endian systems, if only because it came that way from Berkeley. I've looked at the kernel DECnet Phase IV code, though, and I would be extremely surprised if that code didn't have some nasty portability problems. We're lucky that the ACE initiative went with little-endian byte order, but if we ever want to move our DECnet code onto Sun's installed base, we may have some trouble with this. Fortunately, the DECnet group has learned its lesson and written portable code for Phase V; it would be sad if other groups had to learn this lesson the hard way.

    2.7. Taking advantage of public-domain software

    One of the great things about Unix is that there is a lot of public-domain software floating around, which with a little effort can be ported to ULTRIX. Not only are these programs an opportunity for us to provide some added value (by shipping pre-ported versions along with our systems), they are also wonderful tests to see if our systems support software portability. We should be finding useful (or weird) public-domain software and porting it to ULTRIX, as part of the development cycle for ULTRIX releases. The people doing the ports would learn things that should influence the design of the system.

    Right now, it is too hard to import public-domain software. Aside from the usual "Not Invented Here" (NIH) syndrome, our organizational structures are not able to cope with the concept of shipping software from public sources. For example, when I ported tcpdump I gave the ULTRIX documentation folks the manual page, which has a prominent copyright notice saying that the documentation must continue to contain the copyright notice. This section doesn't meet ULTRIX documentation standards, so it was removed, and I had to fight to get it put back.

    One of the benefits of public software is that we can rely on the public to improve it. In order to take advantage of these improvements, we cannot let our version of the sources diverge far from the public version. This would seem to stifle any improvement on our behalf, but in fact we could easily play in the game by making our own improvements public. Yet we can't bring ourselves to give software away, even if we got the original version for free (or nearly so). The continuing confusion over which version of sendmail to use is partly due to our inability to participate in the public arena.

    Digital's Cambridge Research Lab (CRL) is working on producing a CD-ROM of public software. This would solve the problem of making such software easily available to customers, but we really should be integrating some of that software into our mainstream product process.

    2.8. How do we make money by giving away software

    Digital used to make its profits on hardware, but in a marketplace where hardware is a commodity, we will have to make our money elsewhere. Many people have told me that they cannot understand how we can make money if we follow my advice to give away source code for public-domain programs.

    There are several answers to this objection. The most important is that we do not sell each of our programs separately; we sell software products, some of which contain hundreds or thousands of individual programs. The quality of a software product depends upon the quality of the individual programs, and on how well they are integrated. We cannot charge separately for each incremental addition to the base ULTRIX system. Instead, by integrating functionality we make customers more likely to choose our software over the competition's.

    Some of this base system functionality is best provided by public-domain programs. The only way to keep the most up-to-date versions of such software integrated into our systems is to participate in the open exchange of improvements. True, if we give away our improvements and if our competitors are also efficient at importing public-domain software, we won't have much of an advantage, but at least we won't be at a disadvantage.

    If we don't participate, either our software will be obsolete or we will have to expend precious resources to maintain it. We have a dismal track record for internal maintenance of originally public-domain software, and there is no reason to expect that we can divert additional resources to that task. If we do participate, our software will be better and our time-to-market could be a lot shorter.

    Remember also that for software that promotes interoperability, one never gains much of an advantage by being different from the competition.

    It is also important to understand that public-domain software, while often critical to the smooth operation of our systems, constitutes a small fraction of our value-added software, and almost none of our layered products. Giving away our changes will not cut into our cash cows.

    2.9. The dangers of poor documentation

    Customers and applications vendors need to know how to use all the interfaces of our systems, yet many of the features we have added to ULTRIX are poorly documented, or simply undocumented. (VMS programmers porting code to or from ULTRIX have the same problem.) If these interfaces aren't good enough to document, they probably shouldn't be there in the first place.

    For example, ISVs sometimes need to know how to obtain the Ethernet hardware address associated with a workstation. The on-line manual says that the SIOCRPHYSADDR ioctl can be used to obtain this value, but I cannot find any documentation on the form of the call. (By reading some header files, one can infer the proper form for this call; in other cases, doing even that is not easy.)

    In general, the organization of ULTRIX documentation reflects Digital's political structure, not the needs of the customer. For example, some of the features of the base ULTRIX system are covered only in the DECnet/ULTRIX documentation. Worse, there is no decent overview (tutorial) documentation on how to put everything together. This may be because the functional divisions between groups responsible for pieces of the system leave nobody to take care of the larger picture.

    It also seems to be the case that there is no up-to-date, comprehensive documentation for how to write kernel code to work with the SMP mechanisms now in ULTRIX; this is important for people trying to port kernel mechanisms into ULTRIX from other BSD-based systems. I believe that such documentation is not available even internally.

    The only solution to the lack of documentation for these interfaces is to insist that writing, and maintaining, the documentation be part of the job of the implementors. Currently, our programmers are often reluctant tech-writers, and some of our document editors don't understand important issues about the systems they are documenting.

    We are going to have to change the reward structures so that implementors produce honest, if perhaps unpolished, documentation. We are also going to have to find writers who truly know our systems. In either case, this means investing in additional training, and probably it means hiring more highly-skilled people. We can't afford to ship incompletely or improperly documented systems; people will not port to a system they cannot understand.

    2.10. Providing tools for portability

    If customers buy a Unix system from several of our major competitors, they will also get tools to help them port their VMS applications to Unix. These include clones of the VMS text editors, command interpreter, and certain library packages. Does Digital have it now? Apparently not. Is this because we are unwilling to make it easy for our customers to switch from high-markup VMS systems to low-markup ULTRIX systems? Too bad: they are switching anyway, but to HP and IBM and Sun.

    Do we have the tools to allow people to port big-endian applications to our little-endian systems? We could have had a nearly seamless big-endian support environment to ship with ULTRIX 4.2, given the availability of the R3000A CPU, but we don't. (Someone is working on it, but don't hold your breath.)
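
Short of a full big-endian environment, one source-level habit takes much of the pain out of such ports: reading and writing external data one byte at a time, so the code never depends on the host's byte order. The sketch below is an illustration under that assumption, not a description of any Digital tool; load_be32() and store_be32() are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit big-endian value from four bytes, on any host. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Encode a 32-bit value as four big-endian bytes, on any host. */
void store_be32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

An application that reads a file written on a big-endian machine through helpers like these gets the same values on a little-endian one, whereas code that casts the buffer to an integer pointer does not.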

    Do we provide customers with guidebooks on how to port software between ULTRIX and various other systems, in both directions? Or does each application vendor have to learn this by trial and error?

    3. Barriers to interoperability

    Digital's success in the 1980s was largely due to our ability to use DECnet to tie together large networks of VAX systems. In the 1990s, DECnet is no longer sufficient; our customers live in a multi-vendor, multi-organizational environment, and interoperability is the key to making it all work.

    Perhaps the most important thing to remember is that although one may intend to create an interoperable system, this does not mean that the system will interoperate. We are not in control of the environment; this means that interoperability is a moving target. Intending to hit it is not enough; we must continually watch where it is going, and check whether our attempts hit the mark.

    3.1. Standards do not mean interoperability

    Interoperability does not come magically with "open" network architectures. DECnet Phase IV was not, in fact, a proprietary protocol architecture, but we failed to spread it into the larger market. This may well be because Phase IV is not suitable for large, multi-organizational networks.

    DECnet Phase V is premised on the openness of the ISO protocols, but after over a decade of work and thousands of pages of ISO standards, there is still virtually no ISO networking in use. Customers know that IP/TCP, with its many flaws, is the only truly interoperable networking technology available today.

    For years, people in the IP community have been searching unsuccessfully for an easy way of testing IP/TCP implementations to ensure that they interoperate. Experience has shown that standards, while necessary, are not sufficient; two competent implementors working from the same standard often produce incompatible implementations. Our ability to use formal methods is not yet, and may never be, sufficient to generate perfect implementations. Test suites are helpful, but not sufficient, because they cannot simulate the whole range of bizarre behavior that a robust implementation must handle.

    3.2. Test early and often

    The only accepted method for ensuring interoperability is to test each implementation against as many others as is possible. The IP community has numerous ways of doing this; for example, once a year Sun runs Connectathon to provide NFS and X vendors a chance to test their implementations against one another. The rule at Connectathon is that the results are kept secret; vendors (including Digital) go there to discover their own problems, not to obtain marketing ammunition against the competition.

    We also need to take better advantage of the Interop conference and trade show. Interop is billed as the only conference where all vendors are required to connect to the show-floor network, and has been an invaluable testbed for interoperability issues. Digital, as a company, has continually resisted allocating sufficient resources to its Interop participation. We have also been reluctant to stress multi-vendor interoperability, instead pushing VMS-to-ULTRIX interoperability as a major theme.

    Interoperability testing should occur early in the product development process, when there is still time to fix problems. This means that we should be able to do a lot of it in-house, rather than waiting for field tests. Doing so requires that some product groups have access to our competitors' most up-to-date products, as well as older systems (our own and the competition's) that may still populate many customer sites. It isn't sufficient that these alien systems sit in the corner, to be hauled out only for testing; if they aren't heavily used by people who depend on them for their daily work, the real bugs won't be discovered. Perhaps the right approach is to make sure that the groups porting applications to other vendors' platforms become an integral part of the network interoperability testing process.

    3.3. Multi-organizational networks are different

    Interoperability is not just about linking systems from multiple vendors. It also means linking together organizations that are under different administrations, and that might be potentially hostile to one another. Today, via the Internet, I can reach systems at universities, at all of our major competitors, and even in countries that a few years ago were considered enemy nations. This is a significantly different and more difficult environment than the single-company DECnet networks we built in the 1980s.

    Multi-organizational networks are on a completely different scale than our old DECnet networks. We are proud of having one of the largest corporate networks in the world, but Digital's network is tiny compared with the Internet. (As of this writing, there are more than 27,000 assigned IP network numbers, each of which may have hundreds or thousands of hosts.) Algorithms and administrative procedures that work on a network with a mere 60,000 nodes under a single administration won't work on a network with millions of hosts under thousands of administrations.

    Digital does not have much ability to test our software in a multi-organizational environment before it gets to external field test. Mostly, this is because security constraints limit the ways in which we can allow direct connections between our internal hosts and the Internet. These constraints, alas, are not unreasonable. Perhaps in a number of years, we will be able to expose parts of our internal network without fear of hacking, but today our technology is not good enough.

    This means that, in addition to improving our security technology, we must find other ways to test our systems in the multi-organizational environment. One way would be to avoid the temptation to put all of our internal networks under one management. Multiple managements would complicate our internal networking, but we would certainly learn a lot.

    It is interesting to note that, several years ago, IBM sought and won the contract to manage the NSF network. IBM was not then, and probably still is not, recognized for its commitment to IP/TCP, but by managing the core of the IP/TCP Internet, they probably learned an awful lot more than we did about such networks.

    3.4. Shortening the product cycle

    Open systems live by de facto standards. Unlike official standards (such as IEEE or ISO

    standards), or proprietary standards (such as DECnet), de facto standards evolve rapidly. In the

    good old days, we had control over how fast DECnet evolved: the standards didn't evolve any

    faster than our ability to ship new implementations. Today, in the IP/TCP market, we do not

    have the luxury of a centralized release control process to ensure that nobody gets ahead of the pack.

    We can only compete in this market by shortening the time it takes us to respond to changes in

    the de facto standards. Leadership does not come from being best in a market where unifor-

    mity is important; it comes from being first (or at least, never last).

    I again quote from Patricia Ward:

    [The] relative importance of [the twenty identified NAS] attributes is highly dependent on the application environments at each customer's site [...] However, [the customers] concur that DEC must excel in the areas of: timeliness of implementation; ease of use; high availability; performance; reliability; and security. Timeliness of implementation is a key issue, with customers expressing concern over time-to-market plans.

    There are many ways to shorten our development cycles. One is to be agile at acquiring

    public domain software, and shipping it as soon as possible. Another is to do quick prototypes of

    new software. Prototypes might not have all the features and performance of a carefully en-

    gineered system, but they allow us to discover problems with our specification, approach, and

    mindset early enough to do something about it. When conceptual problems are discovered only

    in field test, nobody is willing to make the necessary changes.

    3.5. Misreading the market

    To get ahead of the competition, or at least to avoid falling far behind, we need more depth in

    understanding the market. Too often we decide to put our resources into doing something that

    the customers really don't want, and meanwhile fail to do the things that they want. We can't

    afford to let lost sales be our only indicator that the competition is ahead of us.

    For example, take the curious availability of Kerberos support for ULTRIX. (Kerberos is the

    MIT-Athena system for authenticating users in a distributed system.) MIT had already

    developed kerberized versions of numerous user commands, including rlogin and rcp (which

    in their original forms are scandalously insecure). However, ULTRIX still doesn't ship any

    kerberized applications, except for the BIND/Hesiod name server. While Kerberos support in the

    name server might well be necessary for certain customers, I understand that this configuration is

    possible only in an all-ULTRIX environment, and relatively few of our customers are sufficiently

    concerned about the integrity of their name service to use it. Meanwhile, almost all of them

    could use the security of the kerberized commands, but we don't supply these. To me, this

    reflects a misplaced understanding of the market. (Apparently, it also reflects a turf battle within

    Digital ... which should have been resolved in order to satisfy the market. Note that blame for

    our inability to respond to the market may not lie with any particular individual, but rather with a

    corporate organization that diffuses control so that there is no focus for pushing the right

    response.)

    Another example of misreading the market was the decision, a few years ago, to support Sun's

    YP name service but not the IP standard Domain Name System (DNS). Fortunately, the ULTRIX

    group has since not only rectified this error, but has created a mechanism to support the

    coexistence of both YP and DNS (something that Sun has yet to do, apparently). In fact, dealing with

    coexistence of multiple mechanisms is one of the keys to success in the 1990s. The world will

    be a mixture of DECnet, SNA, XNS, IP, OSI, and myriad PC networking technologies, and we

    must allow a customer to use all of them at once.

    Blind spots in our understanding of the market can hurt us in other ways. For example, last

    year I helped review a new version of the Network Troubleshooting Guide. The previous

    version covered only DECnet networks; I was gratified that the new version covered IP/TCP

    networks. Unfortunately, while the authors were only familiar with Digital's own product set,

    troubleshooting most IP/TCP networks demands an understanding of equipment from many

    vendors. We still don't ship IP routers, for example, so any real customer installation will be full of

    non-Digital routers. Also, troubleshooting on an Ethernet often requires the use of a network

    monitor (such as Network General's Sniffer); we don't sell such a product. The Network

    Troubleshooting Guide would be far more useful to customers if it covered non-Digital products,

    but our documentation people simply don't have access to them.

    For a final example, I simply note how long it has taken us to get an IP router onto the market

    (not quite yet, as of this writing). The Internet Portal, while a nice design for a niche product, is

    not adequate for most uses.

    3.6. Exposing ourselves in public

    We desperately need to improve the connection between our development organizations and

    the customers. Mediating all such communication through the field organizations doesn't work;

    sales and marketing people don't usually have the technical sophistication to see the real

    technical shortcomings in our systems. Engineers and engineering managers can, and must, find

    direct ways to communicate with our customers, and with our competitors' customers (or else

    how do we ever get them back?)

    Part of the problem is that, because of our proprietary internal infrastructure, Digital is discon-

    nected from the outside world. The two main examples of this are our electronic mail system

    and our bulletin board systems.

    Outside the company, virtually all interorganizational mail flows via IP/TCP or UUCP

    mechanisms. Both of these use mail headers that roughly correspond to RFC822, and address

    formats that are understood by the majority of Unix users (and by all Unix system

    administrators). Inside Digital, we use message headers that don't really interoperate with

    RFC822, and a variety of address formats that make absolutely no sense to anyone outside of

    Digital. The DECWRL electronic mail gateway goes through amazing contortions to paper over

    these incompatibilities, but confusion inevitably results.
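
    As a rough illustration of the header and address conventions mentioned above, the following sketch parses an RFC 822 style message with Python's standard email package. The message text, names, and hosts are invented for the example, and nothing here describes how the DECWRL gateway actually works.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical message: every name and host below is made up for illustration.
RAW = (
    "From: Jane Doe <jane@host.example.com>\n"
    "To: joe@other.example.org\n"
    "Subject: Router question\n"
    "\n"
    "Body text.\n"
)

# Headers are "Name: value" lines, separated from the body by a blank line.
msg = message_from_string(RAW)

# Split the From header into a display name and a user@host address.
display_name, address = parseaddr(msg["From"])
local_part, _, domain = address.partition("@")

print(display_name, local_part, domain)
```

    Any mailer that follows these conventions can pull out the sender and route a reply from the header alone, which is the property that the internal formats lack.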

    The situation with bulletin boards is even worse. Much of Digital's internal communication is

    conducted via the Notes system. Notes has some very nice features, but connection with the

    outside world is not one of them. As a result, Digital as a whole contemplates its navel, via

    Notes, while ignoring the rest of the universe.

    Meanwhile, everyone else uses the USENET news system. News is quite similar to Notes;

    although there are some conceptual differences, both provide a set of public discussions or-

    ganized by topic area. The important difference is that news easily supports the connection of

    multiple organizations; Notes does not.

    There is a tremendous amount of information in the news system, much of it relevant to

    Digital's business [2]. Although many Digital employees do participate in various newsgroups

    (there are fully-functional news clients for VMS), I am continually dismayed to find people in

    the company who are ignorant of the whole concept, or who aren't interested in finding out what

    is going on in the outside world. If each product manager spent some time reading the

    newsgroups relevant to his or her product plans, we might be in a much better position.

    We can and must change our internal infrastructure to be more like that of the real world. This

    means getting rid of NODE::USERNAME addresses; more and more systems inside the com-

    pany can exchange mail via TCP, and we should make username@host our preferred form of

    mail address. It also means shifting from Notes to news whenever possible; I find it amazing

    that our internal discussions about the IETF (the organization that sets IP standards) are carried

    out in a notesfile, rather than a newsgroup.
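
    To make the two address forms concrete, here is a minimal sketch of rewriting a NODE::USERNAME address as username@host. The function name, the lowercasing, and the example.com domain suffix are all assumptions made for illustration; a real migration would depend on whatever node-to-hostname mapping is actually adopted.

```python
import re

# Assumption: a DECnet-style mail address has the shape NODE::USERNAME.
DECNET_ADDRESS = re.compile(r"^([A-Za-z0-9$_]+)::([A-Za-z0-9$_.-]+)$")


def decnet_to_internet(address: str, domain: str = "example.com") -> str:
    """Rewrite NODE::USERNAME as username@node.<domain> (hypothetical mapping)."""
    match = DECNET_ADDRESS.match(address.strip())
    if match is None:
        raise ValueError(f"not a NODE::USERNAME address: {address!r}")
    node, user = match.groups()
    # Lowercasing is a presentation choice for this sketch, not a requirement.
    return f"{user.lower()}@{node.lower()}.{domain}"


print(decnet_to_internet("MYNODE::SMITH"))
```

    The point of the sketch is only that the username@host form carries the same information in a shape that outside mailers already understand.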

    Our competition does not have this problem. Practically everyone at Sun uses Unix, TCP

    mail, and news; they don't have proprietary in-house systems to isolate them from their

    customers. HP brags about the size of their internal IP network. We shouldn't pretend that OSI will

    save us; it will take too long, and when it arrives the winners will be those who have already

    learned how to play in the open systems world.

    3.7. Technical superiority is a red herring

    One of the hardest lessons for us to learn is that, while technical excellence is necessary, it is

    never sufficient. The world will not beat a path to your door simply because your mousetrap is

    better. A technically superior product may fail because it isn't compatible with the customer's

    existing systems, or it doesn't solve the problems that the customer wants solved, or it is simply

    different from what the customer is used to. Most customers have realized that managing their

    systems is the hardest problem of all; if we try to sell them something new, they may not want it

    if they have to retrain people in order to use it.

    Our efforts to induce the IP community to adopt the ISO IS-IS routing protocol, instead of the

    OSPF protocol proposed by Proteon and others, should be instructive. Digital thought that IS-IS

    was clearly superior, technically. The OSPF camp thought that OSPF was clearly superior. The

    two camps were using different models of the world to make their judgements, and so agreement

    on strictly technical grounds was never possible. (Also, the designers of the two protocols have

    learned from each other's criticism, so neither protocol has many identifiable flaws.)

    The OSPF camp appears to have won. The decision was not only made for technological

    reasons; politics (which Digital misunderstood) had something to do with it. My guess is that

    what gave OSPF the edge was that it was already available in products (nobody has yet shipped

    an IS-IS product). Not only did Digital fail by being late to market with an IS-IS product; we

    also failed by not anticipating OSPF's victory and getting an OSPF product to market. (Our first

    IP router product is not likely to support OSPF for a while after FCS.)

    [2] I recommend reading the newsgroup comp.unix.ultrix in particular.

    3.8. Creating open standards

    We should not let our past failures at creating open standards dissuade us from active par-

    ticipation in the creation of future standards, but we must learn the right way to do it. The first

    step is to realize that the OSI and POSIX standards process might not be good models to follow.

    A standard developed under this kind of bureaucratic process doesn't always achieve significant

    market share, even if it does become a check-off item on large contracts.

    Successful promotion of a de facto standard requires more flexibility. First, one must work

    closely with customers and other vendors. Second, one must be willing to compromise elegance

    for relatively short-term pragmatic features. Finally, one must get implementations into use as

    quickly as possible, on all major-vendor platforms, even if the software has to be given away.

    This is how Sun succeeded with NFS, and how all the successful Internet standards were created.

    (Many Internet standards have languished for lack of widely-available implementations.)

    Asking how we make money by giving implementations away misses the point. A standard is

    no good to us if we are the only vendor to support it. Once the standard is widely accepted, we

    make money by having the best (fastest, most robust, most manageable) implementation. We

    cannot do that, though, until our competitors can interoperate using the standard.

    4. Changing the system

    It isn't hard to find problems with our software. Every large organization makes mistakes, and

    we probably make fewer mistakes than most of our competitors. It is much harder to provide

    constructive suggestions for improving things, especially now that we are far more resource-

    limited than we once were.

    If I were to pick one step to take first, it would be to spend more effort staying in touch with

    the customers and the market. I don't mean participating in the standards process, which (while

    necessary) is after all just a way to define a more useful dividing line between us and our

    customers (or ISVs). I don't mean doing market surveys; they are also necessary, but they tend to

    obscure the specific insights that one gets from individual voices.

    Instead, I would like to see Digital switch more of its internal infrastructure to match the open

    systems world that the customers are living in. This will make it much easier for us to speak the

    same language as our customers, and it will also improve our ability to hear what they are saying

    behind our back.

    Opening up in this way leads to other changes: using more public-domain software, doing

    more prototyping to test concepts of interoperability, doing broader portability and inter-

    operability testing, and finding out about problems earlier than we do now.

    We also need to remove some of the layers that isolate engineers and product developers from

    the customers and ISVs. It is too hard for customers and ISVs to discover technical information

    about our products, and much too hard for them to complain.

    Finally, as an outsider to the product creation process, I believe that much of our trouble lies

    with engineering management, and in particular a lopsided allocation of resources. It is hard to

    believe that our product groups in the open systems arena are still severely understaffed, given

    the obvious importance of these products to the health of the company. One cannot, of course,

    simply add programmers to speed up a late project; we have to start hiring good people now if

    we expect them to be useful in a year or two.

    5. Conclusion

    If we are going to make money, our product development process is going to have to do better

    at supporting portability and interoperability. Our insular organizational structures are the main

    barrier to success. We, meaning the people who actually do the work, must be willing to

    innovate not only in technology but in approach. If this means breaking with the traditional DEC

    way of doing things, people will have to take some risks.

    We need to improve our communication patterns. We need to find out where the open sys-

    tems market is heading, what the customers want, and how well our systems will satisfy them.

    There is a tremendous resistance in Digital to the spread of bad news; employees who complain

    about our products are sometimes even accused of disloyalty. Only when we are willing to face

    the hard truths, as soon as possible, will we get competitive products to the market on time.

    6. Acknowledgements

    I would like to thank Mary Jo Doherty, Henry Petras, Win Treese, and Kathy Wilde for

    commenting on drafts of this document, but of course they are not responsible for my errors.
