Web Development

Web Development Tips & Tricks, the things that you don’t want to have to figure out yourself.





Archive for the ‘Regular Expressions’

Dynamic Content and Static URLs

Saturday, April 14th, 2007

SEO at its best!

Search engines treat each dynamic page (each with its own query string) as a separate page and index it as such. The advantage of a static URL is that you can put keywords in it, which will help your SEO. For instance, if you had an article about global warming, you might want the address to be “http://www.yoursite.com/global-warming.php”. But you don’t want to have to actually create a static page for each of those URLs.

There are two ways of doing this, one better than the other. What I had been doing for a long time was making a static page that includes the dynamic page and passes variables to it. Something like:

<?php
$articleid = 27;
include("dynamic-page.php");
?>

The dynamic page is coded to take that variable and display the right article. This is definitely better than redoing the code on every page, but it can be improved.

.htaccess is the key to this solution. As far as I know, it only works on Apache servers with mod_rewrite enabled (sorry to those of you running IIS on a Windows server; you’ll need a different rewriting tool). .htaccess rules use regular expressions, and so does this example. The basic idea is that you use .htaccess to rewrite requests for static links to a dynamic page, while the address shown in the browser doesn’t change. Here’s an example of how you can do that (put this code in your .htaccess file).

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^(.*)-a-([0-9]+)\.php$ /dynamic-page.php?articleid=$2 [L]

OK, this basically says that any link ending in “-a-” (number) “.php” will be sent to your dynamic page with the query string “articleid=” (number). So, let’s say you went to “global-warming-a-27.php”: the server would serve “/dynamic-page.php?articleid=27”, while the address bar still shows the static URL. Now your page can use that ID to pull the right article from the database or wherever you’re storing it.
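To see what the rule does without touching a server, here’s a rough Python sketch that applies the same kind of pattern to a request path and builds the internal URL. It uses `^(.*)-a-([0-9]+)\.php$`, which requires at least one digit and a literal dot before “php”. It only imitates the pattern match and the `$2` substitution; it is not how Apache’s mod_rewrite works internally, and the `rewrite` function is made up for illustration.

```python
import re

# In a per-directory .htaccess context, Apache matches RewriteRule patterns
# against the path without its leading slash, e.g. "global-warming-a-27.php".
REWRITE_PATTERN = re.compile(r"^(.*)-a-([0-9]+)\.php$")

def rewrite(path):
    """Return the internal target for a static-looking path, or None if the rule doesn't apply."""
    match = REWRITE_PATTERN.match(path)
    if match is None:
        return None
    # $2 in the RewriteRule corresponds to capture group 2 here: the article ID.
    return "/dynamic-page.php?articleid=" + match.group(2)

print(rewrite("global-warming-a-27.php"))  # /dynamic-page.php?articleid=27
print(rewrite("about.php"))                # None
```

The first capture group (the keyword part of the name) is ignored, just like in the rule, so “anything-a-27.php” would load the same article.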

That’s it! It’s as simple as that. This lets you create static URLs that search engines can find and people can bookmark, without creating any extra pages.

-Kerry

Another Regex

Tuesday, April 10th, 2007

Hello All,

You may have noticed that I’ve been writing fewer posts recently. There are two reasons: I’ve been doing a lot of other activities, and the project I’m working on is a long one. I usually write about whatever I’m working on every couple of days. I’ve been building a shopping cart, so I’ll be writing several posts about PayPal integration, setting up various accounts and so on.

Today’s post is a small one, just another regular expression I made. Here it is:

^[-+](?:[0-9]+(?:\.[0-9]{1,2})?|(?:\.[0-9]{1,2}){1})$

This will match amounts that are led by a + or - sign. For example, it will match “+3”, “+599.99”, “-4.45” or “-.5”. It will not match “3” (no sign), “-.” (no digits), “+3.356” (more than two decimal places) etc.
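Here’s a quick way to check those examples, sketched with Python’s `re` module (in PHP you would pass the same pattern, wrapped in delimiters, to `preg_match`). The `is_signed_amount` helper is just for illustration.

```python
import re

# A leading + or -, then either digits with up to two optional decimal places,
# or just "." followed by one or two digits.
SIGNED_AMOUNT = re.compile(r"^[-+](?:[0-9]+(?:\.[0-9]{1,2})?|(?:\.[0-9]{1,2}){1})$")

def is_signed_amount(text):
    return SIGNED_AMOUNT.match(text) is not None

for sample in ["+3", "+599.99", "-4.45", "-.5", "3", "-.", "+3.356"]:
    print(sample, is_signed_amount(sample))
```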

I hope this will be of some benefit to you.

-Kerry

Reading XML – XSL or PHP?

Friday, April 6th, 2007

XML, or Extensible Markup Language, is probably the most versatile language there is.

Since you make your own tags and then write readers that understand them, you can use the data in them for pretty much anything. Today I was making my PHP reader display a bunch of URLs based on the category they were in. The URLs also had special attributes and so on, all of which was stored in an XML file.
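The post doesn’t show the PHP reader or the XML file, so here’s a rough Python equivalent using the standard `xml.etree.ElementTree` module (in PHP 5, the SimpleXML extension plays a similar role). The `<links>`/`<link category="...">` layout, the `title` attribute and the `urls_in_category` function are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout; the real XML structure isn't shown in the post.
SAMPLE_XML = """
<links>
  <link category="search" title="Google">http://www.google.com</link>
  <link category="mail" title="Google Mail UK">http://mail.google.co.uk</link>
  <link category="search" title="Yahoo">http://www.yahoo.com</link>
</links>
"""

def urls_in_category(xml_text, category):
    root = ET.fromstring(xml_text.strip())
    # Keep only <link> elements whose category attribute matches.
    return [link.text.strip() for link in root.iter("link")
            if link.get("category") == category]

print(urls_in_category(SAMPLE_XML, "search"))
```

Because this runs on the server, the visitor only ever sees the resulting HTML, not the XML itself.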

As I was looking at my sitemap, I realized that it had formatting even though it was an XML file, and XML files can’t format themselves. It turns out that it calls an XSL (Extensible Stylesheet Language) file that gives it its formatting. XSL is a language used to turn XML into HTML or XHTML. You could call it glorified HTML or XHTML with built-in functions for reading XML. You can make entire pages, or parts of pages, in XHTML and pull in the data from an XML document.

If XSL is the language made for this, why use the roundabout method offered by PHP? It took me some time to think through the advantages of each. Keep in mind that I don’t know XSL (I started learning it today), so I don’t know its full functionality. I do know, however, that when the browser applies the XSL, visitors can see both the XSL file and the XML file. That’s where the difference lies.

PHP code is server-side, meaning it is completely executed before the page reaches you, so you can’t see it. This includes the call to the XML file. This means you can display data from an XML file without letting the visitor know that you used one. It’s a mild security benefit, which I kind of enjoy for my current purpose with the URLs (though anyone who guesses the XML file’s address can still download it, unless you keep it outside the web root).

Now, I assume that XSL has a lot more built-in functionality for this particular job, seeing as it is a language designed specifically to convert XML into HTML or XHTML, whereas PHP is a general-purpose programming language. PHP actually didn’t have much of its XML support until PHP 5 came out.

One more thing while we’re on the subject of URLs: I needed to grab just the domain name out of a URL. For example, from “http://www.google.com” I want just “google.com”, and from “http://mail.google.co.uk” I want “mail.google.co.uk”. I decided to see if I could put my regex skills to work, and I came up with this: “(?:www\.)?([^.\/]+\.(?:[a-zA-Z]+\.)*(?:[a-zA-Z]{2}\.[a-zA-Z]{2}|[a-zA-Z]{2,4}))”, which handles both of those examples. You simply grab the first (and only) capture group, and it will hold the right result.
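Here’s a minimal Python sketch of that approach, using `re.search` to find the first match and then reading capture group 1. The `domain_of` helper is hypothetical.

```python
import re

DOMAIN = re.compile(
    r"(?:www\.)?([^.\/]+\.(?:[a-zA-Z]+\.)*(?:[a-zA-Z]{2}\.[a-zA-Z]{2}|[a-zA-Z]{2,4}))"
)

def domain_of(url):
    match = DOMAIN.search(url)
    # Group 1 is the only capture group, and it leaves out a leading "www."
    return match.group(1) if match else None

print(domain_of("http://www.google.com"))     # google.com
print(domain_of("http://mail.google.co.uk"))  # mail.google.co.uk
```

The “http:” at the start never matches, because the part before each dot can’t contain a slash and “http:” is followed by “//” rather than a dot.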

-Kerry

Mastering Regular Expressions (2)

Friday, March 30th, 2007

First chapter complete! And so I shall share with thee my knowledge of this divine subject…

The book has a summary of the chapter, so I’ll basically just hit those points. Here are the symbols and what they mean.

  • . or dot means any single character: a letter, digit or symbol (in most tools, anything except a newline).
  • [] encloses a character class, which has its own set of rules. It matches any one of the characters listed. For example, [0-9a-zA-Z] looks for any 1 character that is a digit, a lowercase letter or an uppercase letter. A hyphen inside a character class shows a range, so 0-9 means 0 through 9. The exception is a hyphen at the beginning: [-0-9] means either a hyphen or a digit.
  • \char either escapes a character or gives it a special meaning. For instance, \. means a literal ., while \< and \> can mean the beginning and end of a word, respectively.
  • ? applies to the immediately preceding character or expression, meaning there may be 0 or 1 of it. So you could write colou?r to allow an optional u in the word color. [0-9a-zA-Z]? would mean an optional letter or digit.
  • * applies to the preceding character or expression, meaning there may be 0 or more. So [0-9]* allows any number of digits, including none.
  • + applies to the preceding character or expression, meaning there must be 1 or more. So [0-9]+ means there needs to be at least 1 digit, with no upper limit.
  • ^ (also called a caret) matches at the beginning of the line or string, so ^[0-9] would not match the string “I’m 12”, but it would match the string “12 I am” (it would match the 1, since the character class only matches one character). It also has a special meaning inside a character class: a caret at the start, as in [^0-9], means any character that is not one of those listed. In this case, any character that is not a digit.
  • $ matches at the end of a line. So if we tweak our previous example a little, ^[0-9]$ means the string or line must contain exactly 1 digit and nothing else. Neither “12 I am” nor “I’m 12” would match, and neither would “12”. It would match “1” or “2”. On the other hand, ^[0-9]+$ would match “1”, as well as “12” or “314159265” etc.
  • \<, as I briefly mentioned earlier, means the start of a word. So “\<at” (I’m starting to use double quotes to make it clearer) would match “at” or even “attached”, but it would not match “categories”.
  • \> means the end of a word, so “ate\>” would match “ate” and “fate”, but not “categories”. “\<ate\>” would only match the word “ate”. Note that these last two are not supported by all regex tools, so test them before you rely on them.
  • | means alternation: it matches either the expression on its left or the one on its right, and it is usually put inside parentheses to limit how far it reaches. For example, if you were matching a file extension, like an image’s, your regex might look something like “(jpg|jpeg|gif|bmp|png|tiff)$”. Any of those options would match.
  • () (parentheses) can be used for alternation, or for grouping so that ?, * and + apply to an entire expression. For instance, ([0-9]\.)+ means the whole expression [0-9]\. must appear 1 or more times. Parentheses are also used for captures (coming up next).
  • \1, \2, etc. are back references: \1 refers to the text matched by the first set of parentheses, \2 to the second, and so on. The book’s example was about editing: to find every place a word is repeated, you could use a regex like “\<([a-zA-Z]+) +\1\>”, which matches any word that is immediately doubled. In plain English it says: at the start of a word, match a word of 1 or more letters, then 1 or more spaces, then the very same word, ending at the end of a word. For example, it would match “He ran ran to the store.” It would also match “He ran    ran to the store,” and many other combinations. Back references may not be supported by every regex checker, so make sure you test it.
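To try a few of these in code, here’s a small sketch using Python’s `re` module. One assumption to flag: Python’s flavor doesn’t support `\<` and `\>`, so the doubled-word search uses `\b` (a word boundary) in their place. The helper names are made up.

```python
import re

def whole_string_is_digits(text):
    # ^ and $ anchor the match to the start and end of the string.
    return re.search(r"^[0-9]+$", text) is not None

def find_doubled_word(text):
    # \b stands in for \< and \>; \1 must repeat exactly what group 1 matched.
    match = re.search(r"\b([a-zA-Z]+) +\1\b", text)
    return match.group(1) if match else None

print(whole_string_is_digits("314159265"), whole_string_is_digits("12 I am"))  # True False
print(find_doubled_word("He ran    ran to the store."))  # ran
print(find_doubled_word("He ran to the store."))  # None
```

The trailing `\b` is what stops “the theory” from counting as a doubled “the”.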

I hope that helped. If you have any questions, feel free to contact me or get the book; it explains all this far better than I did (that was basically a 30-page chapter). I’m just giving you the highlights in case you want to get into the nitty-gritty fast.

Dreamweaver Templates Replacement

Tuesday, March 27th, 2007

Dreamweaver (DW) templates are useful in the moment, but they can be quite cumbersome for other developers.

They essentially make it so you only have to change the necessary parts of each page, and when you change the template, every page changes with it. Well, that’s great, but I know from my own experience and from my web friends that it is very hard to hand a template over to someone else.

This means that if you ever stop using the template, or another developer takes over after you’re done and can’t easily grab the template off the site (it can be hard), they’re screwed: Dreamweaver won’t let them change the locked parts of the page. Of course you can do little tricks to each page and eventually regain full control, but that can take some time.

Well, why not just use find and replace? You should! The only catch is that Dreamweaver uses many different tags with different attributes. To name a few:

  • <!-- InstanceBegin template="blabla" etc. -->
  • <!-- InstanceEnd -->
  • <!-- InstanceBeginEditable name="blabla" -->
  • <!-- InstanceEndEditable -->

It can take a while to get through all of these by hand. A friend of mine had this problem today, so I worked out part of a solution, asked for some help on a regex forum, and came up with a great regular expression: “<\!--\s*Instance(Begin|End).*?-->”. When you do the find and replace, you have to run it on closed documents (you can’t use the current document or open documents). The option I used was “Entire Current Local Site”, replacing matches with nothing, which effectively removed it all.
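If you’d rather run the cleanup outside Dreamweaver (say, over a folder of exported pages), here’s a minimal Python sketch using the same pattern. The `re.DOTALL` flag is an addition, not part of the forum answer, so that a comment wrapping onto a second line is still caught; the function name is hypothetical, and reading and writing the files is left out.

```python
import re

# \! is just an escaped "!"; .*? stops at the first "-->" after the tag name.
DW_MARKUP = re.compile(r"<\!--\s*Instance(Begin|End).*?-->", re.DOTALL)

def strip_dw_template_markup(html):
    # Replace every template comment with nothing, leaving the page content.
    return DW_MARKUP.sub("", html)

page = '<p><!-- InstanceBeginEditable name="body" -->Hi<!-- InstanceEndEditable --></p>'
print(strip_dw_template_markup(page))  # <p>Hi</p>
```

Ordinary comments are left alone, because the pattern requires “Instance” right after the opening `<!--`.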

You might complain, “But it’s so useful! I want that functionality.” It’s definitely good functionality, but you should go about it another way: use a scripting language like PHP or ASP. Usually you only really need to change the body of each page. This means everything that comes before that point should go in a file with a name like “header.php”, which you pull in with a simple call like “include('header.php');”. You do the same thing with the footer.

This will keep the same functionality as with the templates. Now, if you want more editable regions, use variables instead. This varies per scripting language, but I’ll give an example with php. If I wanted a unique title tag, I would put the following code in the header.php file:

<title><?php echo $title; ?></title>

Now, when I call the header file, I can do something like this:

<?php
$title = "Phoenix Development – Home";
include("header.php");
?>

That would automatically put your title in the right place. On every page you use the same includes, set whatever variables you need, and write the one section that holds the main content. Your page would end up looking something like the code below:

<?php
$title = "Phoenix Development – Home";
include("header.php");
?>
<div id="text">
<h1>Home</h1>
<p>Hello and welcome to Phoenix Develo…</p>
</div>
<?php include("footer.php"); ?>

This is much better code design, and the web developer who comes after you will be very glad you set it up in this simple style. Also, because you can now update all your pages at once by changing something in header.php or footer.php, you can change your site much faster and more efficiently.

I hope this helps,
Kerry

Mastering Regular Expressions (1)

Monday, March 26th, 2007

Hey everyone,

Sorry for the delay between posts; I was at Mammoth and had a blast skiing. Anyway, I got back a couple of days ago and just picked up my Mastering Regular Expressions book, which arrived while I was away.

I’ve just been reading the preface and it definitely looks like it’s going to be a good book, so I thought I should keep you updated. Probably every couple of posts I’m going to do one in this series. I haven’t learnt anything yet, but I’ll be giving you the highlights.

-Kerry

Regular Expressions

Monday, March 12th, 2007

This is an area that I had/have very little experience with, but is one of the most powerful tools a web developer can use. If you don’t know what it is, I suggest you learn.

So, what is a regular expression? “A regular expression (regex or regexp for short) is a special text string for describing a search pattern.” (Regular-Expressions.info)

You might be thinking, “Okay, but what does that mean?” It basically means you can define a pattern of symbols and letters to find a string, or a character within a string. A string could be defined as any sequence of letters, words, numbers, symbols or a combination of them all. For example, the regular expression “[0-9]” would find the first digit in the phone number string “(800)123-4567”: it would report that it found the digit 8.
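In Python, for instance, `re.search` reports the first place a pattern matches; this small sketch just reproduces the phone-number example.

```python
import re

match = re.search(r"[0-9]", "(800)123-4567")
# group() is the matched text; start() is its position (index 0 is the "(").
print(match.group(), match.start())  # 8 1
```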

A big use for them is validating input on a form. For example, let’s say your Contact page has an e-mail form where visitors are asked to enter their name, e-mail address and a message to you. Well, you don’t want people to give you a bogus e-mail address, do you? A regular expression like “\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b” (applied case-insensitively, since it only lists uppercase letters) will make sure what they type at least has the form of an e-mail address, though it can’t tell you whether the address actually exists. Of course, that is extremely confusing if you don’t know what the symbols do, which is why I’m writing this post: so you have some resources to learn regular expressions, or just to find ready-made ones.
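Here’s a minimal sketch of that check in Python. Two assumptions worth flagging: the pattern is compiled with `re.IGNORECASE`, and it uses `fullmatch` so the entire field has to look like an address rather than merely contain one somewhere. The `looks_like_email` name is made up.

```python
import re

# The character classes only list uppercase letters, so match case-insensitively.
EMAIL = re.compile(r"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", re.IGNORECASE)

def looks_like_email(value):
    # fullmatch requires the whole (trimmed) field to match the pattern.
    return EMAIL.fullmatch(value.strip()) is not None

for value in ["kerry@example.com", "not-an-email", "a@b"]:
    print(value, looks_like_email(value))
```

This only checks the shape of the address; whether anyone actually reads that inbox is a separate question.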

Probably the best site to get regular expressions from is RegExLib.com, where thousands of users submit their own regexes, all available to you. It has a search, so you can search for “email” and get tons of different examples, or “date” (I made one that I will get to in a bit) and so on. It also has a resources section, which includes some free downloadable programs that help you test or create regular expressions. I got one of them, The Regex Coach. Unfortunately, it tries to be helpful by updating automatically as you type, which can severely lag the program. Although this feature can be useful, I’d rather have a hotkey to press to check the regex, so I might be looking for another program soon.

Another useful link is RegExAdvice.com, which has a forum where you can ask for help with your regex. There are also tons of blogs specifically about regex, so if you need help from actual live people, this is the place to go. I haven’t actually used it much, mainly because I hate waiting on other people before I can continue working; I like finding the answer myself.

Tutorials. Well, that’s probably what I should have told you about first. I would have, too, if I had done things in that order, but I only did my tutorial recently, after the rest of my studying. I’ve done some here and there, but they were really confusing and I could only vaguely grasp the concepts. Well, I was searching for a good one so that I could recommend it to other people, and I found one: Regular-Expressions.info. There you will see some of the same examples I gave at the top, and also a “quick” tutorial. I did that quick tutorial tonight, and though it’s a lot of information to take in at once (I don’t recommend trying to go through the whole thing in one go), it’s very good. If you have seen regular expressions before and only half understand them like I did, it goes over a lot of the symbols that are used and makes everything make so much more sense. Apart from the quick tutorial, which is divided into sections, they also have thorough tutorials on each topic (not just a paragraph or two). I’m probably going to go through those one at a time to make sure I really understand everything.

Of course, that’s the free way of learning, and it’s how I’ve learned pretty much all of my web knowledge. You can, however, also buy books. I’ve had some good recommendations for O’Reilly books (like the ones on CSS and HTML/XHTML), so I checked whether there was an O’Reilly regular expressions book. They actually have two, a “Pocket Reference” and one called “Mastering Regular Expressions”. I plan on getting the latter first and, once I do “master” them, getting the pocket reference.

I actually often learn a lot just from pocket references, or cheat sheets, things that just give you the raw data so you can start programming. They’re quick and get to the point, and I learned about regular expressions from RegExLib.com’s Cheatsheet, which gives a quick definition of pretty much all the symbols, though it isn’t necessarily complete.

Oh, I thought I should note that you need to be careful if you are using a language (such as JavaScript) that uses “\” for other purposes in strings (like “\n” for a new line). For instance, “[0-9]” and “[\d]” both specify a single digit, and in JavaScript they are identical. But if you build the regex from a string, as in new RegExp("[\\d]"), you have to write “[\\d]” instead of “[\d]”. A regex literal such as /[\d]/ doesn’t need the extra backslash.
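Python has the same pitfall, since its ordinary string literals also treat the backslash as an escape character. This sketch (a Python illustration, not JavaScript) shows the two usual ways of writing the pattern so that the regex engine actually receives `\d`.

```python
import re

# A raw string (r"...") passes the backslash through to the regex engine untouched.
raw_pattern = r"[\d]"
# In an ordinary string literal, the backslash itself has to be escaped.
escaped_pattern = "[\\d]"

print(raw_pattern == escaped_pattern)  # True
print(len("\n"), len(r"\n"))  # 1 2
print(re.fullmatch(raw_pattern, "7") is not None)  # True
```

Raw strings are the usual choice for regexes in Python because they keep the pattern looking the same as it would in a regex tester.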

Here’s the date regex I mentioned. It makes sure a field contains a date in the date-only form of the PHP datetime format (which includes both the date and time), “yyyy-mm-dd”: “^[0-9]{4}-[0-9]{2}-[0-9]{2}$”. If the value doesn’t have that specific format, my JavaScript validator will alert you. Note that it only checks the shape, so something like “2007-13-45” would also pass.
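A rough Python version of that check follows. The second helper goes beyond the original regex: it is an extra step, using `datetime.strptime`, to reject values that have the right shape but aren’t real dates. Both helper names are hypothetical.

```python
import re
from datetime import datetime

DATE_SHAPE = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")

def has_date_shape(value):
    # Only checks the yyyy-mm-dd layout, not whether the date exists.
    return DATE_SHAPE.match(value) is not None

def is_real_date(value):
    if not has_date_shape(value):
        return False
    try:
        # strptime raises ValueError for month 13, day 45, Feb 29 in non-leap years, etc.
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(has_date_shape("2007-13-45"), is_real_date("2007-13-45"))  # True False
```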

The other one I made, which I reluctantly realized had already been created by someone on RegExLib, validates image file names. Admittedly, mine is slightly better in that it validates the whole file name and not just the extension, but here you go:
“^[0-9A-Za-z_ \-]+(\.[jJ][pP][gG]|\.[jJ][pP][eE][gG]|\.[gG][iI][fF]|\.[pP][nN][gG])$”. The dots are escaped (\.) so that they match a literal period rather than any character. I might change it a bit later so that you can enter a folder name, like “/images/logo.jpg”; right now you have to enter just the name, like “logo.jpg”. It accepts the formats jpg, jpeg, gif and png.
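Here’s that check as a small Python sketch, using the pattern with escaped dots; without the backslashes, a name like “logoxjpg” would also pass, because an unescaped dot matches any character. The helper name is hypothetical.

```python
import re

# Letters, digits, underscore, space or hyphen, then a literal "." and an image extension.
IMAGE_NAME = re.compile(
    r"^[0-9A-Za-z_ \-]+(\.[jJ][pP][gG]|\.[jJ][pP][eE][gG]|\.[gG][iI][fF]|\.[pP][nN][gG])$"
)

def is_image_file_name(name):
    return IMAGE_NAME.match(name) is not None

for name in ["logo.jpg", "Logo.JPEG", "logoxjpg", "/images/logo.jpg"]:
    print(name, is_image_file_name(name))
```

Folder paths are still rejected because “/” isn’t in the character class, which matches the limitation described above.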

One last thing: if you need an online tester, RegExLib.com has one, and the one I use runs on JavaScript’s regex engine (as I said, there can be slight differences depending on which regex engine you’re using), so my results are as close as possible to what my JavaScript validation will do before I implement it. Here’s the link: Regular Expression Tester.

I will probably be doing more updates once I get the O’Reilly book on this subject, or if I come up with more cool regexes, but this is it for now.

-Kerry

P.S. In case anyone is wondering, I am not trying to “promote” these sites or companies. I don’t get any money for it, and I don’t have some agreement where I link to them and they link to me (I wish). It’s purely so that you’ll have the references you need.