What We Will Cover
Continuations
Questions from last class?
14.1: Improving Verification With Patterns
Objectives
At the end of the lesson the student will be able to:
- Use PHP pattern-matching functions
- Code regular expressions to match string patterns
14.1.1: About Regular Expressions
- Many programming problems require matching a pattern in string variables
- Verifying the data received from HTML forms is one such problem
- For example, if you are expecting an email address, your script needs to verify the string meets requirements for email addresses
john.doe@hotmail.com
Regular Expression Standards
- There are two main standards for regular expressions: POSIX and Perl
- PHP originally supported both, but the POSIX (ereg) functions were deprecated in PHP 5.3 and removed in PHP 7
- We will use the Perl-compatible (PCRE) functions and focus on using preg_match()
Commonly Used Pattern-Matching Functions (Perl Compatible)
| Function | Description |
| preg_match() | Searches a string for matches to a regular expression. |
| preg_replace() | Searches a string for matches to a regular expression and replaces them with the specified text. |
| preg_split() | Searches a string for boundaries matched by a regular expression and splits the string into an array of strings along the boundaries. |
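As a quick illustration of the three functions in the table, here is a minimal sketch that applies each one to the same string. The variable names are ours, chosen only for this example.

```php
<?php
$subject = "She sells sea shells";

// preg_match(): 1 if the pattern occurs anywhere in the subject, otherwise 0.
$found = preg_match('/se/', $subject);

// preg_replace(): replaces every match of the pattern with the given text.
$replaced = preg_replace('/sea/', 'river', $subject);

// preg_split(): splits the subject wherever the pattern matches
// (here, on runs of whitespace).
$words = preg_split('/\s+/', $subject);

echo $found, "\n";
echo $replaced, "\n";
print_r($words);
```

The patterns are written in single quotes so that PHP passes the backslash in \s through to the regular expression engine unchanged.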
14.1.2: Using the preg_match() Function
- You use preg_match() to search a string for matches to a regular expression
- If the regular expression pattern matches a part of the string, then it returns the number 1; if nothing matches it returns 0, and if an error occurs it returns false
Basic Syntax
int preg_match(string pattern, string subject)
pattern: regular expression pattern
subject: the string to search for pattern matches
For Example
<?php
$pattern = "/se/";
$subject = "She sells sea shells";
$found = preg_match($pattern, $subject);
if ($found) {
    echo "Matches";
} else {
    echo "No match";
}
?>
- Put the regular expression pattern between forward slashes: / /
- If the pattern "se" is found, then $found is set to the number 1
- Otherwise, $found is set to the number 0
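Because preg_match() can also return false when something goes wrong, a strict comparison lets a script tell "no match" apart from "error". The sketch below assumes a helper name of our own (describeMatch) and provokes an error on purpose by using the u (UTF-8) modifier with a subject that contains an invalid byte.

```php
<?php
// Hypothetical helper: names the three possible preg_match() outcomes.
function describeMatch($pattern, $subject) {
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        return "error";        // the engine could not run the match
    }
    return ($result === 1) ? "match" : "no match";
}

$matchResult = describeMatch('/se/', "She sells sea shells");
$missResult  = describeMatch('/xyz/', "She sells sea shells");
// "\xff" is not valid UTF-8, so the /u modifier makes preg_match() fail.
$errorResult = describeMatch('/se/u', "se\xff");

echo $matchResult, "\n", $missResult, "\n", $errorResult, "\n";
```

A plain if ($found) test treats 0 and false the same way, which is usually acceptable for form verification but hides broken patterns while you are developing.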
14.1.3: Using Regular Expressions with preg_match()
- PHP has a special set of pattern-matching characters (meta characters)
- These characters form a small language with each character having a special meaning
- These characters are part of an industry standard
Commonly Used Pattern-Matching Characters
| Symbol | Description |
| ^ | Matches when the characters that follow start the string. |
| $ | Matches when the preceding characters end the string. |
| * | Matches zero or more occurrences of the preceding character. |
| + | Matches one or more occurrences of the preceding character. |
| ? | Matches zero or one occurrence of the preceding character. |
| . | A wildcard symbol that matches any one character. |
| "|" | Alternation symbol (OR) that matches either the pattern on the left or the right. |
For Example
- We can test our regular expressions with a simple form script
- To match when starting with "She":
^She
- To match when ending with "shells":
shells$
- To match an "se" followed by one or more l's:
sel+
- To match an "se" followed by zero or more a's:
sea*
- To match any character followed by an "e":
.e
- To match "He" or "She":
He|She
- Note that you can ignore case by putting an "i" after the closing slash:
/^she/i
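The example patterns above can be tried in one pass against the sentence used earlier. This is only a sketch for experimenting; the $patterns and $results names are ours.

```php
<?php
$subject = "She sells sea shells";
$patterns = [
    '/^She/',     // starts with "She"
    '/shells$/',  // ends with "shells"
    '/sel+/',     // "se" followed by one or more l's
    '/sea*/',     // "se" followed by zero or more a's
    '/.e/',       // any character followed by "e"
    '/He|She/',   // "He" or "She"
    '/^she/',     // case-sensitive, so this one fails
    '/^she/i',    // same pattern, ignoring case
];

$results = [];
foreach ($patterns as $pattern) {
    $results[$pattern] = preg_match($pattern, $subject);
    printf("%-12s %s\n", $pattern, $results[$pattern] ? "matches" : "no match");
}
```

Only /^she/ fails, because the subject starts with a capital "S" and the pattern has no i modifier.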
"Escaped" Character Literals
- You can match one of the special characters
- However, you must prefix it with the backslash character
/\.\*/
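To match one backslash, your regular expression should include "\\" (inside a quoted PHP string this is typed as '/\\\\/', because PHP also uses the backslash as its own escape character)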
The backslash is also used to specify non-printing characters like:
| Sequence | Meaning |
| \a | Alert |
| \f | Formfeed |
| \n | Newline |
| \r | Carriage return |
| \t | Horizontal tab |
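The following sketch shows escaped literals in practice. It also uses preg_quote(), a standard PHP function not covered above, which adds the backslashes for you when the text to match comes from a variable. The sample strings are invented for the example.

```php
<?php
// Single-quoted PHP strings keep backslashes as typed (except \\ and \').
$literalDotStar = '/\.\*/';   // regex \.\* : a literal "." followed by a literal "*"
$oneBackslash   = '/\\\\/';   // PHP passes /\\/ to the engine: one literal backslash

$dotStarFound   = preg_match($literalDotStar, "glob: .*");
$dotStarMissing = preg_match($literalDotStar, "glob: abc");
$slashFound     = preg_match($oneBackslash, 'C:\temp');

// preg_quote() escapes the meta characters in arbitrary text.
$quoted = preg_quote("3.5*2");

echo $dotStarFound, $dotStarMissing, $slashFound, "\n";
echo $quoted, "\n";
```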
Further Information: Backslash
14.1.4: Grouping Characters
- Regular expressions use parentheses, curly brackets and square brackets to group characters
- Each type of grouping character has different meanings
- You can combine these grouping characters with other special characters to get flexible and specific matching patterns
Using Parentheses to Group Characters
- Use parentheses to group characters in a regular expression
- For example, to match "Dave" or "David" in a string:
/Dav(e|id)/
- To match "Dave" or "David" whether the name starts with a "D" or "d":
/(D|d)av(e|id)/
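Here is a small sketch that tries the grouped pattern against a few names. It also passes the optional third argument of preg_match(), which receives the text matched by the whole pattern and by each parenthesized group. The sample names are made up for the example.

```php
<?php
$pattern = '/(D|d)av(e|id)/';
$names = ["Dave", "david", "Davy", "dove"];

$nameResults = [];
foreach ($names as $name) {
    $nameResults[$name] = preg_match($pattern, $name);
}
print_r($nameResults);

// $groups[0] is the whole match; $groups[1] and $groups[2] hold what
// the first and second pair of parentheses matched.
preg_match($pattern, "my friend david", $groups);
print_r($groups);
```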
Further Information: Subpatterns
Using Curly Brackets to Specify Repetitions
- You use curly brackets to specify a range of repetitions for the preceding character
- You can specify a range of values such as between 3 and 5 z's
/^z{3,5}$/
- You can specify a minimum value such as 3 or more z's
/^z{3,}$/
- You can specify a maximum value such as 3 or fewer z's (write the lower bound 0 explicitly, because PCRE has traditionally treated {,3} as literal text)
/^z{0,3}$/
- You can specify an exact value such as exactly 3 z's
/^z{3}$/
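To see which strings each repetition pattern accepts, the sketch below filters a list of candidates with preg_grep(), a standard PHP function that returns the array entries matching a pattern. The maximum is written as {0,3}; the candidate strings and the accepted() helper are ours.

```php
<?php
$candidates = ["zz", "zzz", "zzzzz", "zzzzzz"];

// Hypothetical helper: the candidates a pattern accepts, reindexed from 0.
function accepted($pattern, $candidates) {
    return array_values(preg_grep($pattern, $candidates));
}

$between3and5 = accepted('/^z{3,5}$/', $candidates);  // 3 to 5 z's
$atLeast3     = accepted('/^z{3,}$/', $candidates);   // 3 or more z's
$atMost3      = accepted('/^z{0,3}$/', $candidates);  // 3 or fewer z's
$exactly3     = accepted('/^z{3}$/', $candidates);    // exactly 3 z's

print_r([$between3and5, $atLeast3, $atMost3, $exactly3]);
```

Without the ^ and $ anchors every pattern here would also accept the longer strings, because three z's can be found somewhere inside them.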
Further Information: Repetition
Using Square Brackets to Specify Character Classes
- You use square brackets to specify a character class
- Classes match only one of the characters found between the square brackets
- For example, to match either sea or sel:
/se[al]/
A more common use is to specify a range of values to match
To specify a range, use a dash (-)
For example, to specify the numbers from 0 to 9: /[0-9]/
To specify a capital letter from "A" to "Z": /[A-Z]/
You can specify multiple ranges within one square bracket
/[0-9a-zA-Z_]/
When the caret symbol (^) appears first inside the brackets, it reverses the meaning
Thus, to match any character not between 0 and 9: /[^0-9]/
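A short sketch of character classes in use follows. The test strings are invented for the example, and the + quantifier and anchors are added so that each pattern checks a whole field rather than a single character.

```php
<?php
// One character from the set: "sea" and "sel" match, "set" does not.
$seaOk = preg_match('/se[al]/', "sea");
$setOk = preg_match('/se[al]/', "set");

// Only digits from start to end.
$digitsOk  = preg_match('/^[0-9]+$/', "2004");
$digitsBad = preg_match('/^[0-9]+$/', "20o4");

// Letters, digits and underscore: a hyphen is not in the class.
$wordOk  = preg_match('/^[0-9a-zA-Z_]+$/', "user_42");
$wordBad = preg_match('/^[0-9a-zA-Z_]+$/', "user-42");

// Negated class: is there any character that is NOT a digit?
$nonDigitInDigits = preg_match('/[^0-9]/', "12345");
$nonDigitInMixed  = preg_match('/[^0-9]/', "123a5");

echo "$seaOk $setOk $digitsOk $digitsBad $wordOk $wordBad ";
echo "$nonDigitInDigits $nonDigitInMixed\n";
```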
Further Information: Square brackets
14.1.5: Building Regular Expressions That Work
- Regular expressions are very powerful -- but can be almost unreadable
- To build complex regular expressions, start with a simple expression
- After a simple start, refine your regular expression incrementally
- Build it one piece at a time and test each addition as you go
Incremental Refinement Example
- This example incrementally builds a regular expression for form verification
- We want to verify that a form field meets requirements for email addresses
- The steps that follow detail a process for building this verification incrementally
- Determine the precise rules for your field
john.doe@hotmail.com
You determine what is valid and invalid input by examining email addresses and reading specifications. Some of the rules you come up with are:
- User names can have almost any printable ASCII character
- An @ symbol separates the user name from the domain name
- Domain names can have letters, digits, and hyphens
- Each part of a domain name is separated by a dot
- Set up your test environment
Next you build a form with an element to verify and the receiving function. You decide to use the FormVerifier class and add a verification function like that shown. Make sure these work before you add regular expressions.
function isEmailAddress($field, $msg) {
    $value = $this->getValue($field);
    $pattern = "/.+/";
    if (preg_match($pattern, $value)) {
        return true;
    } else {
        $this->addError($field, $value, $msg);
        return false;
    }
}
- Code the most specific term possible
You look at the rules and code the most specific line you can easily come up with. Then you test the regular expression to verify it works.
$pattern = "/[_a-z0-9+.-]+@([a-z0-9-]+\.)+com/i";
- Set anchors if you can
Add the ^ and $ anchors where possible. This prevents extra characters before or after the acceptable pattern from being accepted as valid.
$pattern = "/^[_a-z0-9+.-]+@([a-z0-9-]+\.)+com$/i";
- Get more specific if you can, testing each addition carefully
You may decide to restrict the top level domain (TLD) to only those authorized. This turns out to be quite complicated. Almost every two-letter combination is used by some country. In addition to the well-known generic TLDs of com, edu, net, org, mil and gov, there are many new TLDs: biz, info, name, coop, aero and museum. More are being suggested and adopted every year.
We leave the coding of a TLD regular expression as an exercise for the student.
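The effect of the anchors added in the steps above can be checked by running both versions of the pattern against the same inputs. This is a sketch for testing only; the sample inputs are invented, and the patterns are the ".com only" versions from the steps, not a complete email validator.

```php
<?php
$unanchored = '/[_a-z0-9+.-]+@([a-z0-9-]+\.)+com/i';
$anchored   = '/^[_a-z0-9+.-]+@([a-z0-9-]+\.)+com$/i';

$inputs = [
    "john.doe@hotmail.com",          // a plain address
    "<b>john.doe@hotmail.com</b>",   // address wrapped in extra characters
    "john.doe@hotmail",              // no ".com" part
];

$checks = [];
foreach ($inputs as $input) {
    $checks[] = [preg_match($unanchored, $input), preg_match($anchored, $input)];
    printf("%-30s unanchored=%d anchored=%d\n", $input,
        preg_match($unanchored, $input), preg_match($anchored, $input));
}
```

The second input shows why the anchors matter: the unanchored pattern finds a valid-looking address inside the text and reports a match, while the anchored pattern rejects the field as a whole.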
14.1.6: Summary
- Regular expressions enable a script to look for character patterns in a string
- PHP supports many functions useful for use with regular expressions
- The most useful function for verifying user input is preg_match()
- Regular characters are matched in an expression like:
/She sells/i
Special "meta" characters are used to form a small language for matching patterns
You use parentheses to group characters
/(D|d)av(e|id)/
You use curly brackets to specify a range of repetitions for the preceding character
/^z{3,5}$/
You use square brackets to specify a character class:
/[0-9a-z_]/i
Regular expressions are very powerful -- but can be almost unreadable
You must build them carefully by starting with simple expressions that work
Refine and test your regular expression incrementally
Build it one piece at a time and test each addition as you go
Exercise 14.1
- Develop a regular expression for verifying top-level-domain (TLD) names.
- Make sure your regular expression works with the email pattern we have developed so far.
$pattern = "/^[_a-z0-9+.-]+@([a-z0-9-]+\.)+com$/i";
- You may use the test script to test your changes.
14.2: Scripting the Internet
Objectives
At the end of the lesson the student will be able to:
- Send email from PHP scripts
- Work with URLs
- Read and parse web pages
14.2.1: Sending Email
- Sometimes you want to send email:
- New password
- Order confirmation
- Survey results
- PHP provides a function called mail() that sends e-mail via SMTP
Basic Syntax
boolSuccess mail(toAddress, subject, message);
toAddress: destination address of the e-mail
subject: subject line of the email
message: text of the email message
For Example
<?php
$to = "someone@somewhere.com";
$subject = "Today's Wisdom";
$message = "
A Person Who Asks A Question
Is A Fool For Five Minutes.
A Person Who Doesn't
Is A Fool Forever";
echo mail($to, $subject, $message);
?>
Security Considerations
- Do not use a web form for the toAddress
- Also, do not read a form variable for the toAddress
- This would let anyone use your mail server to send anything
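One way to follow this advice is sketched below: the recipient is a fixed value in the script, and only the subject and message come from the form. The constant ORDER_DESK and the two helper functions are hypothetical names for this example. Removing line breaks from the subject is an extra precaution we assume here, so that form input cannot add header lines of its own. The mail() call is left commented out so the sketch runs without a mail server.

```php
<?php
// Fixed recipient: set in the script, never taken from the form.
define('ORDER_DESK', "someone@somewhere.com");

// Hypothetical helper: collapse CR/LF so form text stays on one header line.
function cleanHeaderText($text) {
    return trim(preg_replace('/[\r\n]+/', ' ', $text));
}

// Hypothetical helper: collect the three mail() arguments.
function buildNotification($formSubject, $formMessage) {
    return [
        'to'      => ORDER_DESK,
        'subject' => cleanHeaderText($formSubject),
        'message' => $formMessage,
    ];
}

// Simulated form input containing an attempt to add a Bcc header.
$mailParts = buildNotification("Hello\r\nBcc: victim@example.com", "Order received.");
print_r($mailParts);

// mail($mailParts['to'], $mailParts['subject'], $mailParts['message']);
```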
14.2.2: Verifying Network Information
- Sometimes you need to verify network information
- For example, you want to verify that an email address or URL is valid
- With PHP, you can look up hostnames, IP address and MX records
- An MX record is short for mail exchange record
- MX records are stored at the DNS and are looked up like a hostname
- If no MX record exists, there is nowhere for the email to go
- There can be more than one MX record, so the function getmxrr() fills an array with the MX host names it finds
Commonly Used Functions to Verify Network Information
| Function | Description |
| gethostbyaddr(ipAddress) | Returns the host name of the Internet host specified by the string ipAddress. |
| gethostbyname(hostname) | Returns the IP address of the Internet host specified by the string hostname, or the unmodified hostname if the lookup fails. |
| getmxrr(hostName, mxArray) | Fills mxArray with the MX host names for an email hostName and returns true if any records were found. (Not available on Windows before PHP 5.3) |
| parse_url(url) | Returns from the URL string an associative array with the following indexes (if present): scheme, host, port, user, pass, path, query, and fragment. |
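Of the functions in the table, parse_url() is the only one that needs no network connection, so it is easy to experiment with. The sketch below uses a made-up URL on example.com that contains every part listed in the table.

```php
<?php
$url = "http://user:secret@www.example.com:8080/cis165/lesson13.php?week=14#summary";
$parts = parse_url($url);
print_r($parts);

// Only the parts that are present appear in the array,
// so check with isset() before using one.
$pathOnly = parse_url("/just/a/path");
$hasHost = isset($pathOnly['host']);
```

Note that the port is returned as an integer, while the other parts are strings.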
Example Checking a URL
<?php
$url = "http://www.edparrish.com/cis165/04s/lesson13.php";
$urlArray = parse_url($url);
$host = $urlArray['host'];
$ip = gethostbyname($host);
if ($ip != $host) {
    echo "Host for URL has a valid IP";
} else {
    echo "Host for URL does not have a valid IP";
}
?>
Example Checking an Email for MX Records
<?php
$email = "someone@totallyBogusEmailServerName.com";
$emailArray = explode('@', $email);
$emailHost = $emailArray[1];
$result = getmxrr($emailHost, $mxhosts);
if ($result) {
    echo "MX host exists";
} else {
    echo "MX host not found";
}
?>
14.2.3: Reading Pages from a URL
- You can easily read a page from a URL
$page = file_get_contents($url);
Many of PHP's Filesystem functions work with Internet sources
Some Functions that Read from URLs
| Function | Description |
| file(url) | Returns an array containing the contents read from the string url, with each element of the array corresponding to a line in the file. |
| file_get_contents(url) | Returns a string containing the contents read from the string url. Note: Needs PHP version 4.3 or later and so does not work on classroom computers. |
For Example
<?php
$url = "http://www.edparrish.com/index.html";
$page = file_get_contents($url);
echo $page;
?>
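The difference between file() and file_get_contents() can be seen without a network connection by reading a local file, since both functions accept a file path as well as a URL. In this sketch a temporary file stands in for the web page. Reading a real URL works the same way, provided the allow_url_fopen setting is enabled on the server.

```php
<?php
// Create a small stand-in "page" in the system temp directory.
$path = tempnam(sys_get_temp_dir(), "page");
file_put_contents($path, "<html>\n<body>Hello</body>\n</html>\n");

// file_get_contents(): the whole source as one string, or false on failure.
$page = file_get_contents($path);
if ($page === false) {
    echo "Could not read the page\n";
}

// file(): the same source as an array with one element per line.
// FILE_IGNORE_NEW_LINES drops the newline at the end of each element.
$lines = file($path, FILE_IGNORE_NEW_LINES);

unlink($path); // remove the temporary file

echo strlen($page), " characters in ", count($lines), " lines\n";
```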
14.2.4: Parsing a Web Page
- You can use information from other parts of the web in your own pages
- In general, the steps you follow are:
- Find an original source URL
- Read the information from the URL
- Parse (extract) the data you want to use
- Finding the information might involve some detective work
- We looked at how to read the information in the previous section
- To parse the information, you often use regular expressions
- Function preg_match() allows you to include an extra parameter for matches to the pattern
Syntax
int preg_match(string pattern, string subject, array matches)
pattern: regular expression pattern
subject: the string to search for pattern matches
matches: optional argument that is filled with the results of search
For Example
<?php
$symbol = "AMZN";
$url = "http://www.amex.com/equities/listCmp/"
    ."EqLCDetQuote.jsp?Product_Symbol=$symbol";
$page = file_get_contents($url);
$pattern = "/\\\$[0-9]+\\.[0-9]+/i";
if (preg_match($pattern, $page, $matches)) {
    echo "$symbol last sold at: ";
    echo $matches[0];
} else {
    echo "No quote available";
}
echo "<br>Information retrieved from:<br>"
    ."<a href=\"$url\">$url</a><br>"
    ."on ".(date('l jS F Y g:i a T'));
?>
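Since the quote page may change or disappear, it helps to practice the parsing step on a fixed string first. In the sketch below the HTML fragment is invented, parentheses are added to the price pattern so the dollars and cents land in their own $matches entries, and preg_match_all() (a related standard function not covered above) collects every price instead of only the first.

```php
<?php
// Invented sample markup standing in for a downloaded page.
$page = '<tr><td>AMZN</td><td>$38.45</td><td>$39.10</td></tr>';

// Single quotes keep \$ and \. intact for the regular expression engine.
$pattern = '/\$([0-9]+)\.([0-9]+)/';

if (preg_match($pattern, $page, $matches)) {
    // $matches[0] is the whole match; [1] and [2] are the two groups.
    echo "First price: ", $matches[0], "\n";
    echo "Dollars: ", $matches[1], ", cents: ", $matches[2], "\n";
}

// preg_match_all() returns how many matches it found;
// $all[0] lists the full text of each one.
$count = preg_match_all($pattern, $page, $all);
print_r($all[0]);
```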
14.2.5: Summary
- PHP has numerous functions for using the Internet
- PHP provides a function called mail() that sends e-mail via SMTP
- Function
parse_url() parses a URL and returns its various parts
- You can use PHP functions to verify user-supplied information
gethostbyname(): returns the IP address of a host, if found
getmxrr(): returns the MX records for an email host, if found
- Also, you can read entire pages off the web:
$page = file_get_contents($url);
Once you read the page, you can use regular expressions to extract information
preg_match($pattern, $page, $matches);
The information extracted is returned in the $matches array
echo $matches[0];
Exercise 14.2
- Modify the following script to extract information from a web page of your choosing.
Cannot find file: examples/urlparse2.php
14.3: Lecture Finale
Objectives
At the end of the lesson the student will be able to:
- Discuss the final preparation for the project presentation
- Advise the instructor on how to improve future courses
14.3.1: What We Have Learned
During the course, we have learned how to:
- Query a database using SQL
- Create databases and tables
- Insert into a database and update data already in a database
- Design a database and put it into an optimal form
- Improve the performance of a database with indexing
- Use database functions for grouping, aggregating and other procedures
- Create PHP pages and display database data in a Web page
- Work with PHP variables and process form data
- Save form data into a database
- Use conditional statements to make our applications appear intelligent
- Use loops to repeat code
- Use arrays to group our data
- Use database meta-data to format data in Web pages
- Write functions to group related code
- Use functions and include files to organize Web applications
- Create classes and objects using PHP
- Use classes that make processing forms easier for users
- Pass data from one page to another using:
- Hidden fields
- Hypertext Links
- Cookies
- Sessions
- Apply these techniques to a multi-page authentication system
- Handle database errors
- Implement a shopping cart
- Improve the security of our Web applications
- Including our responsibilities as developers
- Safeguard our database from user input
- Especially from SQL injection
- Use regular expressions to validate data
- Send email from an application
- Read and parse web pages
- With this knowledge you can develop professional-looking database-driven Web sites
- Your project will allow you to demonstrate what you have learned
- Suggestions for improvement?
14.3.2: End of Course Survey
- Please take a few minutes to answer this short survey
- This will help the instructor to improve future courses
- Survey respondent answers are anonymous
- The link to WebCT is here
- You may want to rate the textbook at your favorite book seller, such as:
- Also, you can rate me at Katsu's site: Student Feedback
14.3.3: About the Project Presentation
Before the Presentation
- Submit your project to WebCT before the presentation:
- Bring a written report on paper to give to the instructor before the presentation
During the Presentation
The presentation should have the following:
- A brief introduction describing the purpose of your database application
- A demonstration of your project that includes:
- A multi-form sequence where information is retained across pages
- User authentication
- User-error handling
- A description of your database design
- A list of table names
- A brief description of the data that the tables contain
- A demonstration of any extra-credit features
- Point out the extras so we can all appreciate them
- Feel free to display your written report during the presentation
- Keep the presentation to 15 minutes or less
After the Presentation
- Feel free to leave (or stay) after your presentation
- You can present to the instructor alone after the other presentations are through
Wrap Up
Last Updated: December 08 2004 @15:38:41