What We Will Cover
Log Tails
From Last Lab
Quiz Review
6.1: Log File Formats
Objectives
At the end of the lesson the student will be able to:
- Describe How to Configure Logging
- Describe How to Read Different Server Log Files
- Configure Apache to Log Transactions
6.1.1: About Log Files
- Important to get feedback about the activity and performance of a server
- Need to know about any problems developing
- Want to know about what resources are being requested
- All good web servers allow the system administrator to configure logging
- Even busy servers can enable logging and not suffer a performance loss
- Most requests create a single line in a file
- Not computationally intensive
- However, log files can grow very large
- Must make sure that they do not fill all the free space on your hard drive
- Common practice: put log files on a separate drive or partition -- why?
- Another method: rotate (rename and remove) log files periodically
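Rotation by renaming can be sketched in shell as follows. This is only an illustration in a temporary directory: the file name and the retention count of 3 are assumptions, and on a live server the web server must also be told to reopen its log file after the rename, which is not shown here.

```shell
# Sketch: rotate a log file by renaming, keeping at most 3 old copies.
# The directory and file below are temporary stand-ins, not real server paths.
logdir=$(mktemp -d)
printf 'old entry\n' > "$logdir/access_log"

rotate_log() {
    log=$1
    keep=3
    # Remove the oldest copy, then shift the others up by one: .2 -> .3, .1 -> .2
    rm -f "$log.$keep"
    i=$((keep - 1))
    while [ "$i" -ge 1 ]; do
        if [ -f "$log.$i" ]; then
            mv "$log.$i" "$log.$((i + 1))"
        fi
        i=$((i - 1))
    done
    # Rename the current log and start a fresh, empty one
    mv "$log" "$log.1"
    : > "$log"
}

rotate_log "$logdir/access_log"
ls "$logdir"
```

After one rotation the directory holds an empty access_log and the previous contents in access_log.1.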
Log File Formats
- Most servers support at least two common log file formats:
- Common Logfile Format (CLF)
- Extended Logfile Format (ELF)
- Will first look at both of these formats
- Then will examine error logs
- Afterwards, will look at how to configure our servers for logging
6.1.2: The Common Logfile Format
- NCSA and CERN Web servers first used the Common Logfile Format
- Many current Web servers now support this format including Apache and IIS
- Each line in the file represents a unique request
- Each line has seven fields in the following order:
remotehost rfc931 authuser [date] "request" status bytes
- remotehost
- Remote (client) hostname, or IP number if DNS hostname is not available or if DNSLookup is off.
- rfc931
- The remote username. RFC 1413 (which obsoletes RFC 931) defines a protocol used to determine the identity of a client that requests a resource from the server. It is seldom used on Internet servers because it slows the response of the server. A "-" is entered into the log if the server is unable to determine the userid.
- authuser
- The username with which the user has authenticated. When authentication is required to access a page, this is the authenticated username. For normal unrestricted requests, this field is just "-".
[date]
- Date and time of the request. The date and time are usually saved in the format: DD/MON/YYYY:HH:MM:SS TZ. TZ is the timezone. Since there may be spaces in this field, it is enclosed in brackets for easy parsing.
"request"
- The request line exactly as it came from the client. Because the request line contains spaces, this field is enclosed in double quotes, much as the date field is enclosed in brackets.
- status
- The HTTP status code returned to the client.
- bytes
- The content-length of the document transferred.
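As a rough sketch of how the seven fields can be pulled apart with standard UNIX tools, the following splits one Common Logfile Format line on whitespace with awk. The field positions used here assume that the host, username and path contain no spaces; the sample line reuses the hostname and user from the lab exercise later in this lesson.

```shell
# One Common Logfile Format line (hostname and user are sample values).
line='volvo.vortexwidgets.com - moose [27/May/1999:20:00:52 -0500] "GET /wm103/samples/ HTTP/1.0" 401 61'

# Splitting on whitespace gives 10 pieces, because the date and the request
# each contain spaces: $4-$5 hold the date, $6-$8 hold the request.
summary=$(printf '%s\n' "$line" | awk '{ print "host=" $1, "user=" $3, "status=" $9, "bytes=" $10 }')
echo "$summary"
```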
Additional Fields
- Some popular log file formats add two additional fields
- Referer
- The Referer field contains the URL that brought the user to this resource. (discussed in section 6.2)
- User-Agent
- The user-agent field is a string describing the client that made the request (e.g., Mozilla/4.0).
6.1.3: The Extended Logfile Format
- Common Logfile Format only logs certain fields
- Often desirable to log more information or omit certain fields
- Extended Logfile Format is an extendable format
- Allows specifying exactly which fields should be logged and in what order
- Similar format to the Common Logfile Format
- Each line of the file represents a request
- Beginning of file also contains some configuration directives
- Each directive line begins with a #
- Version and Fields are required and should precede all entries in log
- Version directive specifies version of Extended Logfile Format to use
- Fields directive specifies what data to record in the logfile
For Example
#Version: 1.0
#Fields: date time c-ip sc-bytes time-taken cs-version
1999-08-01 02:10:57 192.0.0.2 6304 3 HTTP/1.0
1999-08-01 02:12:41 192.0.0.2 5100 1 HTTP/1.0
1999-08-01 03:37:19 192.0.0.3 5100 2 HTTP/1.0
Notice that the Fields directive specifies six fields in the file
Date and time are ... date and time
c-ip stands for the client IP address
sc-bytes is the number of bytes sent from the server to the client
time-taken field is the number of seconds it took to send the data
cs-version is the version of HTTP used by the client
For more information click here
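Because the Fields directive fixes the column order, a log like the example above is easy to summarize with awk. The sketch below copies the example into a temporary file and totals sc-bytes per client IP; the temporary file and the variable names are illustrative only.

```shell
# Write the example Extended Logfile Format log to a temporary file.
elf=$(mktemp)
cat > "$elf" <<'EOF'
#Version: 1.0
#Fields: date time c-ip sc-bytes time-taken cs-version
1999-08-01 02:10:57 192.0.0.2 6304 3 HTTP/1.0
1999-08-01 02:12:41 192.0.0.2 5100 1 HTTP/1.0
1999-08-01 03:37:19 192.0.0.3 5100 2 HTTP/1.0
EOF

# Skip directive lines (those starting with #), then add up sc-bytes (field 4)
# for each c-ip (field 3).
totals=$(awk '!/^#/ { bytes[$3] += $4 } END { for (ip in bytes) print ip, bytes[ip] }' "$elf" | sort)
echo "$totals"
```

For the three entries shown, this reports 11404 bytes for 192.0.0.2 and 5100 bytes for 192.0.0.3.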
6.1.4: Error Logs
- Access log files save statistical information about a request
- Server can also generate messages when errors occur and log those errors to a file
- Informational messages and debugging information are also often logged to the error log file
- Error log is useful for:
- Finding problems with your server
- Debugging server-side programs (e.g. Perl scripts)
- Debugging new configuration options
- Usually control types of messages logged using the LogLevel directive
LogLevel warn
6.1.5: Configuring Apache Logging
- Apache configuration files for our classroom systems located at
/usr/local/apache2/conf/
Apache log location for our installation:
/usr/local/apache2/logs/
Log location and configuration is specified in the httpd.conf file
To define the location and format of the access logfile use:
CustomLog logs/access_log common
Note that:
- File path is relative to the server root
- "common" refers to Common Logfile Format
Nickname "common" is defined by a LogFormat directive, also in the httpd.conf file
LogFormat "%h %l %u %t \"%r\" %>s %b" common
Each % directive represents a particular field to be logged, such as:
- %h: remote hostname
- %l: remote logname
- %u: remote user
- %t: date and time
- %r: first line of the request
- %>s: status (for internally redirected requests, the final status)
- %b: bytes sent
Using predefined nicknames, can specify agent and referer logfiles
CustomLog logs/referer_log referer
CustomLog logs/agent_log agent
Must always stop and start the server after updating httpd.conf
Further Information
Log Files: how to use
Custom Log Formats: what the % directives mean
cronolog.org: for log rotation
6.1.6: Configuring IIS Logging
- IIS logfile configuration is available from the Web Site tab of the Properties dialog
- Select one of the available formats

- Any of the formats allow you to set general properties
- Note the location of the log file directory
- Since the logs are text files, can view the logs in Notepad

- Can modify the W3C Extended Log File Format using the Extended Properties tab
- Press the Help button for definitions of the properties

Lab Exercise 6.1
Instructions:
Use the next 10 minutes to complete the following.
- Start a text file named exercise6.txt
Will be adding to this file during the lesson -- save it often.
- Prepare the exercise header as described in the HowTo on submitting exercises
- Label this exercise: Lab 6.1
- Answer the following questions.
Exercises and Questions
- Configure your server to log all requests to a file named access_log using the Common Logfile Format. What configuration options are used?
- Configure your server to log the HTTP User-Agent header to a file such as agent_log. What configuration options are used? Access a page on the server; what does this file contain now?
- Try to access a page that does not exist on your server. What is recorded in the access log? What is recorded in the error log?
Consider the following three lines from a log file in Common Logfile Format:
volvo.vortexwidgets.com - moose [27/May/1999:20:00:52 -0500] "GET /wm103/samples/ HTTP/1.0" 401 61
volvo.vortexwidgets.com - - [28/May/1999:18:20:03 -0400] "GET /wm102/ HTTP/1.0" 200 4405
volvo.vortexwidgets.com - - [29/May/1999:10:31:48 -0400] "GET /icons/back.gif HTTP/1.0" 200 216
- Can you tell which resource required authentication? What is the username of the authenticated user? Did they have access to the requested resource?
- What file is returned for the request in the second line? What is the size of the file?
6.2: Referrers
Objectives
At the end of the lesson the student will be able to:
- Describe how people are getting to your site
6.2.1: Seeing How People Get to Your Site
- HTTP request can specify the URL of the page the browser was viewing when the request was made
- Only if the user clicked on a link in that page
- Information is sent in the HTTP Referer header
- Provides way of knowing which web page user is coming from
- Know where they are coming from in terms of an IP address
- Referer header allows us to see what Web page brought them to our site
- Referer header is generated by the browser
6.2.2: Referrer Example
CustomLog logs/referer_log referer
How might people come to your web site using a link?
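Once referer_log has some entries, the referring pages can be ranked with a short pipeline. The sketch below assumes the referer nickname is defined as "%{Referer}i -> %U", the definition commonly found in a stock httpd.conf; check the LogFormat line in your own configuration. The URLs in the sample file are invented for illustration.

```shell
# Hypothetical referer_log contents: referring URL, " -> ", requested path.
reflog=$(mktemp)
cat > "$reflog" <<'EOF'
http://www.example.com/links.html -> /index.html
http://search.example.org/?q=widgets -> /index.html
http://www.example.com/links.html -> /products.html
- -> /index.html
EOF

# Keep the part before " -> ", drop requests that sent no Referer ("-"),
# then count how often each referring URL appears, most frequent first.
top_referers=$(awk -F' -> ' '$1 != "-" { print $1 }' "$reflog" | sort | uniq -c | sort -rn)
echo "$top_referers"
```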
Lab Exercise 6.2
Instructions:
Use the next 10 minutes to complete the following.
- Label this exercise: Lab 6.2
- Do not submit exercises until all of them from today's lesson are finished.
- Complete the exercises and answer the following questions.
Exercises and Questions
- Configure your server to log referrer information to a log file such as referer_log. What options did you use?
- Open a page that has links to other pages on your site and click on some of the links. What shows up in referer_log?
- Try linking from another computer in the classroom. What shows up in referer_log now?
6.3: Being Proactive
Objectives
At the end of the lesson the student will be able to:
- Use log files to help find dead links
- Describe how to spot suspicious activity
- Find HTTP 404 -- Not Found log entries
6.3.1: Finding Problems Using Logs
- Being proactive means fixing small problems before they become large ones
- To be proactive, you must actively maintain your site
- Easiest way to find problems with your site is by analyzing log files
- Can easily see whenever there might be a problem
- Examples of common errors
- Dead links or requests for files that do not exist
- CGI scripts that do not work properly
- Permissions problems
- Dead links make your site look unprofessional
- What are some possible causes of dead links?
- Scripts with errors can fill your logs with error messages
- CGI scripts with errors are logged
- Useful resource when debugging server-side scripts
6.3.2: Finding CGI Script Errors
- Common errors with CGI scripts:
- Missing Content-Type header
- Incorrectly forming HTTP header section of response
- Many times the script runs just fine when tested manually on the server
- When user tries to access the script from a browser they receive an HTTP 500 Internal Server Error
- Server error log could look like this:
[Mon Apr 12 15:06:53 1999] [error] Premature end of script headers:
/export/home/paivam/public_html/test6.cgi
- Premature end of script headers means the header section of response was not formed correctly
- A syntax error in the script might produce the following
[Mon Apr 12 19:24:21 1999] [error] Premature end of script headers:
/export/home/patm/public_html/form.cgi
syntax error at form.cgi line 7, near ") print"
Execution of form.cgi aborted due to compilation errors.
- Premature end of script headers occurs because the script did not run far enough to generate header
- From the log, we can see that line 7 of the script has a problem
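For comparison, a correctly formed CGI response can be sketched as a shell function: a Content-Type header, then a blank line, then the body. The function name and the body text are made up for this example; a real CGI script would print the same three parts directly to standard output.

```shell
# Minimal CGI response sketch: header section, a blank line, then the body.
# Leaving out the Content-Type line or the blank line is a common cause of
# "Premature end of script headers" in the error log.
cgi_response() {
    echo "Content-Type: text/plain"
    echo ""
    echo "Hello from a CGI script"
}

response=$(cgi_response)
echo "$response"
```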
6.3.3: Finding Access Permissions Problems
- Access permissions are another problem you can see in Web server log files
- Users forget to give read permission for files or allow execute permission for scripts
- Password-protected pages log errors if unauthorized users try to access them
- Can also see how many times user repeatedly enters incorrect passwords to access a page
[Sun Apr 18 16:40:40 2001] [error] Permission denied: file permissions deny server access: /export/home/patm/public_html/phonelist.txt
[Mon Apr 12 19:43:45 2001] [crit] Permission denied: /opt/apache/share/htdocs/wm105/class6/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Mon Aug 9 21:55:53 2001] [error] [client 24.218.82.54] access to /sales/ failed, reason: user ericl not allowed access
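Entries like these can be counted with grep. The sketch below copies the three sample entries above into a temporary file and counts file-permission problems and failed authorizations separately; the search strings are taken from these samples, and the wording of messages may differ on other server versions.

```shell
# Temporary error log holding the three sample entries from this section.
errlog=$(mktemp)
cat > "$errlog" <<'EOF'
[Sun Apr 18 16:40:40 2001] [error] Permission denied: file permissions deny server access: /export/home/patm/public_html/phonelist.txt
[Mon Apr 12 19:43:45 2001] [crit] Permission denied: /opt/apache/share/htdocs/wm105/class6/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable
[Mon Aug 9 21:55:53 2001] [error] [client 24.218.82.54] access to /sales/ failed, reason: user ericl not allowed access
EOF

# grep -c prints the number of matching lines instead of the lines themselves.
denied=$(grep -c 'Permission denied' "$errlog")
failed_auth=$(grep -c 'not allowed access' "$errlog")
echo "permission denied: $denied, failed authorization: $failed_auth"
```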
6.3.4: Finding HTTP 404 -- Not Found Errors
- Can use UNIX grep command to find information in files
- grep is a command that finds all lines in a file that contain a certain string
- Can use grep to search log file for all lines containing a 404 error message
grep '" 404' /usr/local/apache2/logs/access_log
Searches for a double quote followed by a space, followed by 404
To append the output to a file named out.txt (>> adds to the end of the file; use > to overwrite it instead):
grep '" 404' /usr/local/apache2/logs/access_log >> out.txt
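To see which missing pages are requested most often, the grep output can be passed through awk, sort and uniq. This is a sketch against a small invented log (the hostnames and paths are hypothetical), assuming Common Logfile Format with no spaces in the requested path.

```shell
# Hypothetical access log in Common Logfile Format.
log=$(mktemp)
cat > "$log" <<'EOF'
host1.example.com - - [10/Mar/2003:10:00:01 -0500] "GET /index.html HTTP/1.0" 200 1043
host2.example.com - - [10/Mar/2003:10:02:17 -0500] "GET /old/page.html HTTP/1.0" 404 209
host1.example.com - - [10/Mar/2003:10:05:44 -0500] "GET /old/page.html HTTP/1.0" 404 209
host3.example.com - - [11/Mar/2003:09:12:30 -0500] "GET /missing.gif HTTP/1.0" 404 209
EOF

# Field 7 (split on whitespace) is the requested path. Count each path that
# returned 404 and list the most frequently requested one first.
missing=$(grep '" 404' "$log" | awk '{ print $7 }' | sort | uniq -c | sort -rn)
echo "$missing"
```

A path that appears many times in this list is a good candidate for a dead link somewhere on the site.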
Lab Exercise 6.3
Instructions:
Use the next 5 minutes to complete the following.
- Label this exercise: Lab 6.3
- Do not submit exercises until all of them from today's lesson are finished
- Answer the following questions.
Exercises and Questions
- Find all requests that produced an HTTP 404 -- Not Found error message on your server.
- What sort of things should you look for in log files if you suspect that someone is attempting to crack your server?
6.4: Statistics
Objectives
At the end of the lesson the student will be able to:
- Determine how many people have been visiting your site
- Use grep to gather statistics
- Use cut and awk to count unique hosts
6.4.1: Log File Analysis
- One statistic people usually want to know is how many people are visiting
- Looking at a log file can give you a lot of information
- Some of the useful information you can extract from your logs:
- Most requested pages
- Top entry pages (the first page users enter your site through)
- Information about search engines: most common search engines, common queries, and so forth
- Top referring sites and URLs
- Error counts
- Many programs are available to analyze log files and produce reports
- Popular free programs include the following
- More complete list of freeware log analyzer software available here
- Some commercial programs are
- Further information: Log Analysis Tools with commentary
- Note that any of these would make a good student project
6.4.2: Using grep to Count Hits
- Can count the number of hits using UNIX tools
- For this exercise, use example access_log
- Can use grep to count hits in log file
- Each line represents a transaction
- Date and time are recorded on each line
- To find all the entries from March 2003
cd /usr/local/apache2/logs
grep "Mar/2003" access_log
Can pipe results to wc (word count) program
grep "Mar/2003" access_log | wc
-> 418 4180 46311
Returns number of lines, words and characters
May be many transactions per page -- why?
Can exclude requests for gif and jpg files with the egrep -v option
grep "Mar/2003" access_log | egrep -v 'gif|jpg' | wc
-> 414 4140 45795
Divide line count by number of days in month
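The division can be done in the shell as well. Shell arithmetic with $(( )) is integer-only, so this sketch uses awk to keep one decimal place; the figure of 414 is the filtered line count shown above, and March has 31 days.

```shell
# Average hits per day for March 2003 (31 days), using the line count
# reported above after gif and jpg requests were filtered out.
hits=414
days=31
per_day=$(awk -v h="$hits" -v d="$days" 'BEGIN { printf "%.1f", h / d }')
echo "$per_day hits per day"
```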
6.4.3: Using cut to Count Unique Hosts
- Can count the number of unique hosts using UNIX tools
- This problem requires two steps:
- Get all the hostnames out of the access log
- Remove duplicate entries
- Use cut command to extract the first field from the access log
cut -d ' ' -f 1 access_log
-d option specifies that all fields are separated by spaces
-f option specifies that we only want to view the first field
Will have duplicates since a host accessed more than a single page
Can use the sort command with the -u option
cut -d ' ' -f 1 access_log | sort -u
To get the total number of unique hosts, use the wc command
cut -d ' ' -f 1 access_log | sort -u | wc
To get a count for a particular month:
grep "Mar/2003" access_log | cut -d ' ' -f 1 | sort -u | wc
-> 166 166 2329
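Two small variations on the pipeline are sketched below against an invented log (the hostnames are hypothetical). The first uses wc -l to print only the line count; the second uses uniq -c to show how many requests each host made.

```shell
# Hypothetical access log in Common Logfile Format.
log=$(mktemp)
cat > "$log" <<'EOF'
host1.example.com - - [10/Mar/2003:10:00:01 -0500] "GET /index.html HTTP/1.0" 200 1043
host2.example.com - - [10/Mar/2003:10:02:17 -0500] "GET /about.html HTTP/1.0" 200 512
host1.example.com - - [10/Mar/2003:10:05:44 -0500] "GET /about.html HTTP/1.0" 200 512
host3.example.com - - [11/Mar/2003:09:12:30 -0500] "GET /index.html HTTP/1.0" 200 1043
EOF

# wc -l prints only the line count, which here is the number of unique hosts.
# tr removes the padding that some versions of wc add around the number.
unique_hosts=$(cut -d ' ' -f 1 "$log" | sort -u | wc -l | tr -d ' ')
echo "unique hosts: $unique_hosts"

# uniq -c needs sorted input; it prefixes each host with its request count.
per_host=$(cut -d ' ' -f 1 "$log" | sort | uniq -c | sort -rn)
echo "$per_host"
```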
Lab Exercise 6.4
Instructions:
Use the next 10 minutes to complete the following.
- Label this exercise: Lab 6.4
- Do not submit exercises until all of them from today's lesson are finished
- Answer the following questions.
Exercises and Questions
- Determine how many hits a site received in a month (e.g. Feb/2003). What is the average number of hits per day?
Note: can use this example access.log or choose your own
- Determine how many unique hosts have visited the site.
Wrap Up
- When class is over, please shut down your computer
=> Logout => Shut Down
Due Next: N/A
- You may complete unfinished exercises at any time before the next class.
- Be sure to submit the file to the instructor before the beginning of the next class to receive credit.
- Instructions on submitting exercises are available from the HowTo's page.
Last Updated: 7/16/2003 4:45:37 PM