Week 9

Updated on 28 Dec 2018

Stateless Entity

HTTP is a stateless protocol. What this means is that PHP has no way of remembering a past request. The only information PHP has available to it when a script is executed is the information it was provided during the HTTP request.

Another way of looking at this is that every time you execute your script, all information is lost and variables are reset! This has its advantages in some situations; however, it certainly becomes an issue in others.

Another issue related to the statelessness is that the HTTP protocol has no way of distinguishing one visitor from another.

Questions:

  • Websites do have a way of ‘remembering’ stuff. What techniques are available for a website to remember?
  • What are some of the advantages and disadvantages of having HTTP as a stateless protocol?

Cookies

Cookies are small packets of data, no larger than 4kB each, that are stored on the user’s computer. A cookie contains only text and is sent back and forth between the user’s browser and the web server. Cookies can solve the problems associated with HTTP being a stateless protocol because data stored in a cookie is available again at the next request.

There are restrictions on cookies though. Any server can attempt to send a client browser a cookie; however, there is never a guarantee that the client browser will accept it. Furthermore, a browser will send cookies back only to the same domains that created them!

Browsers may also limit the number of cookies they will retain per domain. Older browsers kept as few as 20, so you need to be aware of this limitation if your website is going to be storing more than 20 cookies!
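
To make the round trip concrete, here is a minimal sketch of a script that counts visits with a cookie. The cookie name 'visits' and the function name next_visit_count are assumptions made for this example only. Remember that the user can edit or delete the cookie, so the value should not be trusted.

```php
<?php
// Sketch only: the cookie name 'visits' is an assumption made for this example.

// Work out the new visit count from the cookies the browser sent back.
function next_visit_count(array $cookies)
{
    $previous = isset($cookies['visits']) ? (int) $cookies['visits'] : 0;
    return $previous + 1;
}

$count = next_visit_count($_COOKIE);

// setcookie() sends an HTTP header, so it must be called before any output.
// Here the cookie is set to expire one hour from now.
setcookie('visits', (string) $count, time() + 3600);

echo "This is visit number $count.";
```

The value set here only shows up in $_COOKIE on the next request, which is why the script reads the old value first and then sends the new one.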

Questions:

  • What is a common misconception about Cookies?
  • Are cookies the ideal way for solving the Stateless entity problem with HTTP?

Sessions

One of the most useful applications of cookies is the capability to create sessions, which truly allow you to overcome the stateless nature of the HTTP protocol. When working with sessions in PHP, you are given the capability to store variables (including arrays and objects) between script executions.

Sessions are similar to cookies in that they are used to store information; however, session data is stored on the web server whereas cookies are stored on the client machine. A cookie is generally used in conjunction with a session to identify which session belongs to which browser, although some websites use other techniques to associate a user with a session.

Sessions are much more sophisticated in their capabilities than just cookies alone which is why in this unit we will be working almost entirely with Sessions.

Questions:

  • Do you understand what a session is?
  • Has anyone seen session / cookie options in their web browser?
  • Why are sessions more sophisticated?

Using sessions is very easy, but there are a few rules that need to be followed. First you need to start the session by calling session_start(), and this needs to be done before any output is sent to the browser. Consequently you’ll find the session_start() function near the very top of your PHP scripts.

You can use session variables just like regular variables as shown in the example below.

//start the session
session_start();

//Check the credentials from the login form.
if(isset($_POST['username']) && isset($_POST['password']))
  {
  if(check_user($_POST['username'], $_POST['password']))
    {
    $_SESSION['logged_in'] = TRUE;
    $_SESSION['username'] = $_POST['username'];
    }
  else
    ...
  }
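
The same pattern works for any value you want to keep between requests, not just login flags. Below is a minimal sketch of a page counter. The 'views' key and the count_view function are assumptions made for this example.

```php
<?php
// Sketch only: the 'views' key is an assumption made for this example.

// Start the session before any output is sent.
session_start();

// Add one to the stored counter, starting from zero on the first visit.
// The array is passed by reference so the change is kept in the session.
function count_view(array &$session)
{
    if (!isset($session['views'])) {
        $session['views'] = 0;
    }
    $session['views']++;
    return $session['views'];
}

echo 'Pages viewed this session: ' . count_view($_SESSION);
```

Reload the page a few times and the number keeps climbing, even though every request runs the script from scratch.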

Questions:

  • What does the above code do?

If you have a config.php file, you might set some default values for various session variables. This would also be a good opportunity to call session_start so that you don’t have to call the function in every script that uses sessions.

//config.php
session_start();

if(!isset($_SESSION['logged_in']))
  $_SESSION['logged_in'] = FALSE;

if(!isset($_SESSION['username']))
  $_SESSION['username'] = '';

The technique shown above combined with the code from the previous page means you can now create ‘secure’ pages that can only be accessed if the person is logged in. One such example is shown below.

//securePage1.php
include_once 'config.php';

//if not logged in, redirect to the login screen!
if(!$_SESSION['logged_in'])
  {
  header("location: http://site.com/login.php");
  exit;
  }

Another header you can send is Refresh, which reloads or redirects the page after a delay given in seconds. You have probably seen it in use on a number of websites already. header("Refresh: 5; url=http://...

Question:

  • Does the code make sense? Why is isset() used in the config.php file script?

The examples shown in the previous two sections should suffice for most web development that utilizes sessions. However, there may be situations where you want to delete a session variable completely or reset all the session variables.

$_SESSION is a super-global array. To clear all elements, and effectively destroy the variables in an array you can redefine $_SESSION with an empty array as shown below.

$_SESSION = array();

To remove the session itself, you can call the session_destroy() function, and even call cookie functions to destroy the cookie from the user’s computer. There won’t be too many situations where you would need to do this though.
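
Putting those steps together, a complete log out might look something like the sketch below. It is one reasonable ordering rather than the only way to do it, and the 42000 second offset is just an arbitrary time in the past that makes the browser discard the cookie.

```php
<?php
// Sketch only: a full "log out" that clears the data, the cookie and the session.

// The session has to be started before it can be destroyed.
session_start();

// 1. Clear every session variable.
$_SESSION = array();

// 2. Ask the browser to delete the session cookie by sending it again
//    with an expiry time in the past.
if (ini_get('session.use_cookies')) {
    $params = session_get_cookie_params();
    setcookie(session_name(), '', time() - 42000,
        $params['path'], $params['domain'],
        $params['secure'], $params['httponly']);
}

// 3. Remove the session data stored on the server.
session_destroy();
```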

Questions:

  • If sessions are stored on the web-server – whereabouts on the server would you find the session data?
  • How does the php.ini file affect sessions?
  • What general purpose session information can be changed via the ini_set function?

Session Security

Manipulating the session data of another user, or using a website with someone else’s session id, is something that hackers attempt to do. Whatever their reason, it is usually not good. So, let’s look at how a hacker might do this so that we can write code that makes it more difficult (or impossible) for them.

session_name('COMP306-Sample');
session_start();

$_SESSION['user_name'] = 'Brent';

Parameter     Meaning
Name          The name of the session as defined by the session_name function, or if this is not used, the default set in the php.ini file.
Value         This is the session id. We can change this value to another session id to trick the server into thinking that we are that user; but we’d have to know the session id of another user first!
Host & Path   Cookies are only valid for the server that creates them. Sometimes they can be restricted to a specific path.
Expires       Your session can be set to retain the information well after the user leaves the web site. (This is how websites remember who you are when you return to them a day or two later.)

If we somehow came across someone else’s session id, then it is a fairly simple task to change the session id by editing the cookie as shown in the screenshot above. We cannot prevent that from happening, but we can add some user-specific data to the session and check this data on each page visit. For example, when a user successfully logs in, we might add the following…

session_name('COMP306-Sample');
session_start();

$_SESSION['agent'] = md5($_SERVER['HTTP_USER_AGENT']);

And, when the user navigates to a page that checks to see if you’re logged in, we’ll add this as well.

if($_SESSION['agent'] != md5($_SERVER['HTTP_USER_AGENT']))
  {
  //oh oh.  The user agent has changed.
  //we better log them out and redirect.
  ...
  }
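
The two snippets above can be wrapped into a pair of small helpers so that the fingerprint is always built and checked the same way. This is a minimal sketch; the function names agent_fingerprint and agent_matches are assumptions made for this example.

```php
<?php
// Sketch only: the function names are assumptions made for this example.

// Build the fingerprint stored in the session at login.
// A missing User-Agent header is treated as an empty string.
function agent_fingerprint(array $server)
{
    $agent = isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';
    return md5($agent);
}

// Return TRUE only if the session holds a fingerprint and it matches
// the browser making the current request.
function agent_matches(array $session, array $server)
{
    return isset($session['agent'])
        && $session['agent'] === agent_fingerprint($server);
}

// At login:       $_SESSION['agent'] = agent_fingerprint($_SERVER);
// On each page:   if (!agent_matches($_SESSION, $_SERVER)) { /* log out */ }
```

Keep in mind that the User-Agent header is sent by the client and can be forged, so this check makes a stolen session id harder to use rather than impossible.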

HTTP_USER_AGENT holds the User-Agent header of the current request. It typically identifies the browser and operating system that are requesting the page.

Session Fixation

Bad web sites will be written in such a way that session ids are passed around in the URL. From a hacker’s point of view this is good because they can then pass you a link to a web site with the session id that they want you to use (which is one that they’ll piggyback on).

PHP has advanced session handling functions that allow you to re-generate the session id (session_regenerate_id). If you have a bad web site like the one described above, then consider re-generating the session id after the user has logged in.
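
A minimal sketch of that idea is shown below. The variables $old_id and $new_id exist only to show that the id changes, and the credential check itself is left out.

```php
<?php
// Sketch only: $old_id and $new_id are here purely for illustration.
session_start();

$old_id = session_id();

// ... the username and password would be checked here ...

// Give the session a brand new id. Passing TRUE also deletes the data
// stored under the old id, so a link containing the old id is useless.
session_regenerate_id(true);

$new_id = session_id();

$_SESSION['logged_in'] = TRUE;
```

Any id that an attacker planted before the login no longer points at the logged in session.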

Questions:

  • What purpose does the session_name function serve? Whereabouts do we see the impact of this function?
  • session_save_path is a function that allows you to change the location of where the session data is going to be saved. Why would this be important?

The definition of session_set_cookie_params is:

void session_set_cookie_params ( int $lifetime [, string $path [, string $domain [, bool $secure= false [, bool $httponly= false ]]]] )

  • What do you think the lifetime parameter is used for, and why might it be useful?
  • The functions mentioned in the previous 3 questions all have a common pre-requisite. What is that?

Regular Expressions

Regular expressions are on a par with arrays and pointers. This is where you separate good programmers from those that can only pretend. You may believe that I’ve added regular expressions into the unit to punish you, but truth be known we’re only up to page 10, and I needed more material to cover in the lecture. Plus I get to ask tricky questions about regular expressions on the exam!

Below is an example of a regular expression (wouldn’t you like this on your exam).

if(preg_match('/^[\w.-]+@[a-z0-9_.-]+\.[a-z]{2,6}$/',
              stripslashes($_POST['email'])))

Regular expressions are like secret codes. The trick is understanding how to decipher them (and in some cases encode them). Pages 394, 397, 400 & 407 of the course text book have a couple of tables to help with the decoding.

Here’s what a couple of the characters mean:

  • ^ Indicates the beginning of the string
  • $ Indicates the end of the string

This part is the first logical part of the expression:

[\w.-]+

The \w (page 400) is any word character. It covers the letters a to z in upper and lower case plus the numerals 0 to 9 and the underscore.

The next logical section is this:

@

This means that immediately following the first logical section a ‘@’ character must appear. The next logical section is this:

[a-z0-9_.-]+

This is almost identical to the first logical section. In this part of the expression, the string must have 1 or more characters (denoted by the ‘+’ sign, page 397) that reside in the group a-z, 0-9, _ (underscore), . (dot) or - (dash). If this part of the string contains a character that is not in that group, the pattern will not match and the preg_match function will return 0.

Also note that using a-z0-9 is similar to using [\w] with a pattern modifier i (page 407) to denote case insensitive mode.

Questions:

  • Is the decoding of regular expressions starting to make sense?
  • In the last paragraph I said ‘similar’. In what way is it not the same?

if(preg_match('/^[\w.-]+@[a-z0-9_.-]+\.[a-z]{2,6}$/',
              stripslashes($_POST['email'])))

The next section of the regular expression is

\.

Like the ‘@’, this means that immediately following the previous section a ‘.’ (dot) must follow. Notice that this ‘.’ (dot) is escaped so that the regular expression treats the dot as a literal character and not as one of the regular expression metacharacters. The last section of the regular expression is shown below.

[a-z]{2,6}

This means that after the ‘.’ (dot), between 2 and 6 characters in the range of a-z must follow. The $_POST['email'] kind of gives away the purpose of this regular expression, but I’ll say it anyway. This regular expression is used to check patterns that match email addresses.

For example, the following meet the criteria of the regular expression.

if(preg_match('/^[\w.-]+@[a-z0-9_.-]+\.[a-z]{2,6}$/',
              stripslashes($_POST['email'])))
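
One way to see which inputs meet the criteria is to run a few strings through the expression. The sketch below does that. The function name looks_like_email and the sample addresses are assumptions made for this example, and the function takes a plain string instead of reading $_POST so it can be tried from the command line.

```php
<?php
// Sketch only: the function name and sample addresses are made up for this example.

// preg_match() returns 1 when the pattern matches and 0 when it does not.
function looks_like_email($address)
{
    return preg_match('/^[\w.-]+@[a-z0-9_.-]+\.[a-z]{2,6}$/', $address) === 1;
}

$samples = array(
    'jane.doe@example.com',  // all three parts are present
    'jane.doe@example',      // no dot followed by 2 to 6 letters at the end
    'jane.doe@example.c',    // only one letter after the last dot
);

foreach ($samples as $sample) {
    echo $sample . ': ' . (looks_like_email($sample) ? 'match' : 'no match') . "\n";
}
```

Only the first sample matches. The other two fail on the last logical section, [a-z]{2,6}.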

Question:

Would any of the following characters pass the regular expression test?

  • $
  • T
  • . (dot)

What if the email address was written as part of a statement, i.e. my email address is brent@acu.edu.au ?

Pattern Matching

The previous example was very specific in what it was trying to match. It had to be an email address and nothing else. We could tell by the use of ^ and $ to denote that our pattern must start at the beginning of the string and finish at the end of the string.

What if we wanted to match a particular word that might appear anywhere in a piece of text? Just to make it interesting, how about a word that is spelt differently by the Americans and the British, like colour (the American spelling is color)?

So we want a regular expression that will scan text input and determine if it has any colour (or color). The example below demonstrates how to do this.

if(preg_match('/col(o|ou)r/', $text_string)) 

The pipe symbol | is used to denote the logical OR.

Notice in this example we are not using the ^ or $ metacharacters. This means that the pattern can occur anywhere inside the string. For example:

I really want some colour in my day.

This pattern is looking for the letters col anywhere in the text. The parentheses are used to group sub-patterns, and in this case we are looking for an o or an ou. Finally we want our pattern to end with an r to round out the spelling of our word, colour or color.
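
The pattern can be tried out on a few sentences with the sketch below. The function name mentions_colour and the sample sentences (other than the one quoted above) are assumptions made for this example.

```php
<?php
// Sketch only: the function name is made up for this example.

// TRUE if the text contains "color" or "colour" anywhere inside it.
function mentions_colour($text)
{
    return preg_match('/col(o|ou)r/', $text) === 1;
}

$samples = array(
    'I really want some colour in my day.',
    'My favorite color is blue.',
    'Nothing to see here.',
);

foreach ($samples as $sample) {
    echo $sample . ' => ' . (mentions_colour($sample) ? 'match' : 'no match') . "\n";
}
```

Because the pattern is not anchored and has no word boundaries, it also matches longer words that contain the spelling, such as colourful.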

Question:

  • How can we ensure that our pattern picks up our word (color or colour) if the word tested has a capital ‘C’?