Evaluating Web Server Log Analysis Tools

David Strom

[email protected]

SD’98 2/13/98

SD'98 (c) David Strom, Inc. 2

Summary

• Examine different log files

• What you can and can’t learn from your logs

• Pros and cons of various tools

Different types of log files

• Access

• Error

• Referral

• Other

Access logs

• Domain name

• Date, time

• Server command processed and result

• URL requested by the visitor

• Bytes transmitted

Sample access log data

• rm258.fav.usu.edu [31/May/1995:09:03:23 +0600] "GET /NEI.html HTTP/1.0" 302 396

• rm258.fav.usu.edu [31/May/1995:09:03:28 +0600] "GET /xculture/nei/nei.html HTTP/1.0" 200 2114

• rm258.fav.usu.edu [31/May/1995:09:03:30 +0600] "GET /gifs/sedlbutton.gif HTTP/1.0" 200 1336

• 129.71.83.161 [31/May/1995:09:20:32 +0600] "GET /RELs.html HTTP/1.0" 304 0

• Leslie-Francis.tenet.edu [31/May/1995:09:36:06 +0600] "GET / HTTP/1.0" 200 1867

• ls973.ulib.albany.edu [31/May/1995:09:40:52 +0600] "GET /viii1.html HTTP/1.0" 404 244
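
A minimal sketch, in Python, of how lines like the samples above could be split into fields. The pattern and the names ACCESS_RE and parse_access_line are illustrative assumptions, not part of any tool discussed here. The two identity fields of the full common log format are treated as optional because the samples above omit them.

```python
import re

# Assumed layout, based on the sample lines:
#   host [timestamp] "METHOD url protocol" status bytes
ACCESS_RE = re.compile(
    r'^(?P<host>\S+) '
    r'(?:(?P<ident>\S+) (?P<user>\S+) )?'      # optional; absent in the samples
    r'\[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: (?P<protocol>[^"]+))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = ACCESS_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Some servers write "-" when no bytes were sent; treat that as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record
```

Lines that do not match return None, so malformed entries can be counted or skipped rather than stopping the run.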

Errors reported in your logs

• Clients that time out (or leave in frustration!)

• Scripts that don’t produce any output

• Server bugs

• User authentication or configuration problems

Sample error log data

• [Thu May 30 07:25:32 1996] send timed out for bamberg.sedl.org

• [Thu May 30 07:57:41 1996] send timed out for kenya.sedl.org

• [Thu May 30 08:23:11 1996] send timed out for ppp092.kyoto-inet.or.jp

• [Thu May 30 09:15:52 1996] access to /usr/local/www/htdocs/scimath/compass/vol03 failed for 170.211.67.51, reason: File does not exist

• [Thu May 30 09:57:56 1996] send timed out for dd10-048.compuserve.com

• [Thu May 30 10:47:25 1996] read timed out for ncia110b.ncia.net
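
A rough Python sketch of how entries like these could be grouped by kind. The category names and the substring checks are assumptions drawn only from the sample messages above; a real error log will contain other messages, which fall into "other" here.

```python
import re
from collections import Counter

# Assumed layout, based on the samples: [timestamp] free-text message
ERROR_RE = re.compile(r'^\[(?P<when>[^\]]+)\] (?P<message>.*)$')

def classify(message):
    """Map an error message to a coarse category (assumed categories)."""
    if "timed out" in message:
        return "timeout"
    if "File does not exist" in message:
        return "missing file"
    return "other"

def summarize_errors(lines):
    """Count error log entries per category, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line.strip())
        if match:
            counts[classify(match.group("message"))] += 1
    return counts
```

Run over the six sample entries, a tally like this would separate the client timeouts from the one missing-file error, which is the entry most likely to point at a broken link.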

Referral logs

• Who links to your site?

• Who downloads your pages?

Sample referral log data

• http://www.isisnet.com/ -> /change/welcome.html

• http://www.ipl.org/ref/RR/EDU/Research-rr.html -> /welcome.html

• http://www.tenet.edu/snp/main.html -> /policy/networks/toc.html

• http://www.tenet.edu/new/main.html -> /policy/networks/toc.html

• http://guide-p.infoseek.com/NS/Titles?qt=teacher+training -> /resources/SCIMAST/announcement.html

• http://www.tenet.edu/new/main.html -> /policy/networks/toc.html

• http://www.tenet.edu/new/main.html -> /policy/networks/toc.html

• http://www.nwrel.org/national/regional-labs.html -> /welcome.html
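
Referral entries pair a referring URL with a local path, separated by "->". A minimal Python sketch of tallying which sites link in most often follows; the function names are hypothetical, and the code assumes one "referrer -> path" pair per line.

```python
from collections import Counter
from urllib.parse import urlsplit

def split_referral(line):
    """Split 'referrer -> /local/path' into a (referrer, path) pair."""
    referrer, separator, target = line.partition("->")
    if not separator:
        return None
    return referrer.strip(), target.strip()

def referrers_by_host(lines):
    """Count referral entries per referring host name."""
    counts = Counter()
    for line in lines:
        pair = split_referral(line)
        if pair:
            counts[urlsplit(pair[0]).hostname] += 1
    return counts
```

Grouping by host rather than by full URL is a design choice: it answers "who links to your site?" directly, at the cost of hiding which page on that site holds the link.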

Common log format

• Output by most standard servers

• Needed by most third-party log analyzers

• hoohoo.ncsa.uiuc.edu/docs/setup/httpd/Overview.html

Extended/custom log formats

• Log whatever you wish in whatever order you wish

• Useful if you will read them regularly!

• But custom formats may not work with third-party analyzers

• Now available in IIS v4, Netscape (NSCP) v3, and others

What you can learn from your log files

• Hits per day

• Domain origins

• The path people take in and around your web

• Problem areas
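
As a sketch of the first item, hits per day can be tallied from the bracketed timestamp of each access log entry. The Python below assumes timestamps in the form used by the access log samples in this deck (for example 31/May/1995:09:03:23 +0600) and an English month-name locale; hits_per_day is a hypothetical name.

```python
from collections import Counter
from datetime import datetime

# Timestamp layout assumed from the access log samples.
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

def hits_per_day(timestamps):
    """Count entries per calendar day (in the server's logged time zone)."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, STAMP_FORMAT)
        counts[when.date().isoformat()] += 1
    return counts
```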

HITS

• (How Idiots Track Success)

• Nobody uses this word anymore

• Doesn’t really measure individual users, just access

• Caching servers and proxies mess up these statistics
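
To illustrate why raw hits overstate traffic, the Python sketch below compares three counts for the same requests: total hits, requests that are not images, and distinct hosts. The image-extension list and the name summarize_hits are assumptions; even the distinct-host count is only a rough proxy for users, since proxies and caches hide individual visitors.

```python
import posixpath

# Assumed list of extensions to treat as embedded images rather than pages.
IMAGE_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png"}

def summarize_hits(requests):
    """requests: iterable of (host, url) pairs taken from an access log."""
    requests = list(requests)
    page_requests = [
        url for _, url in requests
        if posixpath.splitext(url)[1].lower() not in IMAGE_EXTENSIONS
    ]
    return {
        "hits": len(requests),
        "page_requests": len(page_requests),
        "unique_hosts": len({host for host, _ in requests}),
    }
```

A single visitor who loads one page with several embedded images shows up as many hits but only one host, which is the gap the bullets above describe.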

Domain origins

• Where users are coming from -- sometimes

• Just because they are from ibm.net doesn’t mean they work at IBM!

• Forgotten accounts, friends and family using the account

• Hacked user names

• Proxies don’t help here either
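
A small Python sketch of a domain-origin tally, grouping hosts by the last label of their name and setting aside entries logged only as IP addresses. The function names and the "unresolved" label are assumptions. As the bullets above caution, the result says where a connection came from, not who the person is.

```python
import ipaddress
from collections import Counter

def origin(host):
    """Return the top-level domain of a host name, or 'unresolved' for a bare IP."""
    try:
        ipaddress.ip_address(host)
        return "unresolved"
    except ValueError:
        # Not an IP address, so take the last dot-separated label.
        return host.rsplit(".", 1)[-1].lower()

def origins(hosts):
    """Count hosts per top-level domain."""
    return Counter(origin(host) for host in hosts)
```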

The path people take in and around your web

• Search engines help sometimes

• Which search site was the most popular front door

• Who links to you and why

• Is there a pattern or a random walk?
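
When a referrer is a search results page, the search terms are sometimes visible in its query string. The Python sketch below pulls them out. The parameter name "qt" comes from the Infoseek entry in the sample referral data; "q" is an assumed alternative, and other engines may use different names, so the list would need extending.

```python
from urllib.parse import parse_qs, urlsplit

# "qt" is seen in the sample Infoseek referrer; "q" is an assumption.
QUERY_PARAMS = ("qt", "q")

def search_terms(referrer):
    """Return (search host, terms) if the referrer carries a known query parameter."""
    parts = urlsplit(referrer)
    params = parse_qs(parts.query)
    for name in QUERY_PARAMS:
        if name in params:
            return parts.hostname, params[name][0]
    return None
```

Counting the returned hosts would suggest which search site is the most popular front door, and the terms hint at why visitors arrived.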

Problem areas to deal with

• Broken links (locally)

• Broken outbound links

• Time outs (sunspots?)
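
Local broken links usually surface as 404 responses in the access log. A minimal Python sketch that ranks the missing URLs by how often they were requested follows; broken_links is a hypothetical name, and the input is assumed to be (url, status) pairs already extracted from the log. Broken outbound links do not appear in your own logs and need a separate link checker.

```python
from collections import Counter

def broken_links(records):
    """records: iterable of (url, status) pairs; returns 404 URLs, most requested first."""
    missing = Counter(url for url, status in records if status == 404)
    return missing.most_common()
```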

What you can’t learn from your logs

• Who are these people, anyway?

  – No specific user names

  – Is it a bot or a real human?

• How long did they view a page?

  – Most people don’t spend much time on your web

  – Where did they go visit next?

What technologies are available?

• Built-in analyzer tools

• Sites that capture user info

• Secure sites with registration

• Build your own from Perl

• Third-party tools

Built-in tools

• WebSite, website.ora.com

• IIS with Site Server, www.microsoft.com/iis

• Netscape servers, www.netscape.com

• Easy to use but limited

WebSite Professional v2

• Win NT, 95

• Best web server for learning about logs, best docs

• QuickStats module for instant analysis:

  – single report but nice set of information

  – shows today, last two days requests and unique hosts

  – IP addresses of visitors, average requests/hour

IIS Site Server

• NT Server v4 w/SP3 only

• Lots of preconfigured reports

• Two versions, Express and Full (customized reports)

• backoffice.microsoft.com/products/siteserver/express/

Netscape v3 web servers

• Various NT, Unix versions

• Reports for a few variables but nothing too extensive

• Best to use a third-party tool here

Sites that capture user info

• WebCounter, www.digits.com -- third-party hit counter

• Someone else does the programming and debugging

• But beyond your control

Secure sites with registration

• You know your users

• But many won’t register, or forget their passwords

• Requires scripting, database integration, more maintenance

Build your own from Perl

• Needs some in-house support

• Works best with Unix-based webs

• Examples:

  – refstats, members.aol.com/htmlguru/refstats.html

  – surfreport, bienlogic.com/SurfReport/

Third-party tools

• WebTracker, www.CQMInc.com/webtrack

• WebTrends, www.webtrends.com

• net.Genesis, www.netgen.com

• MarketWave, www.marketwave.com

• IIS Assistant, www.go-iis.com

Third-party tools (cont’d)

• Can make very pretty reports

• Customizable

• Make sure they support your particular log format

• Not that expensive, mostly run on Windows