Apache mod_rewrite for Oracle UCM Users

Kyle had a useful tip a while back about a little-known feature in Oracle UCM: the WebUrlMapPlugin, which allows friendly URLs for Site Studio pages. It's extensible, which means you can add your own friendly URLs to the map if you wish.

For example, you could make the URL http://example.com/file/foo automatically redirect to the GET_FILE service to download the item with the content ID of "foo." You set this by going to the page "Administration > Filter Administration > Edit Web Map Urls," then assigning the prefix /file to a map like this:

<!--$cgipath-->?IdcService=GET_FILE&dDocName=<!--$suffix-->&allowInterrupt=1&
        RevisionSelectionMethod=LatestReleased&Rendition=web&noSaveAs=1

Looks a bit hairy... but those are all the parameters you need to run the service. You can see why some URL aliases come in handy! Kyle has some more suggestions on his blog post.

This is fine and good for internal UCM products... but it can be somewhat limited. This map looks like it supports IdocScript, but you actually only have very limited token replacement functionality...

If you are using Apache, a better option would be to use mod_rewrite. This is an extremely powerful add-on module to Apache that allows you to "rewrite" URLs to make friendly aliases. It's one of the few Apache modules so popular that it has its own book!

There are some good cheat sheets and beginner's guides available for mod_rewrite... but let's just dive in for a quick tour.

To get started, first we need to enable the module. This is almost always installed with Apache, so installation should be simple. Append the following code at the bottom of your httpd.conf Apache configuration file:

# load the rewrite module, if it's not already loaded
<IfModule !rewrite_module>
       LoadModule rewrite_module /usr/lib64/httpd/modules/mod_rewrite.so
</IfModule>
# turn on rewrite debugging so we can figure out when redirects don't work
# (RewriteLog and RewriteLogLevel exist only in Apache 2.2 and earlier)
RewriteLog "/apache/logs/rewrite.log"
RewriteLogLevel 3
# turn it on
RewriteEngine On
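
If you are on Apache 2.4 or later, the two logging directives are gone: rewrite debugging is configured through the per-module LogLevel instead, and the trace messages land in the regular error log. A rough equivalent of the block above might look like the sketch below. The module path is carried over from the example and is an assumption, since it varies by distribution.

```apache
# load the rewrite module, if it's not already loaded
# (module path is an assumption; it varies by distribution)
<IfModule !rewrite_module>
    LoadModule rewrite_module /usr/lib64/httpd/modules/mod_rewrite.so
</IfModule>
# Apache 2.4+: the rewrite:traceN levels take the place of RewriteLogLevel,
# and the messages are written to the ErrorLog
LogLevel warn rewrite:trace3
# turn it on
RewriteEngine On
```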

Now mod_rewrite should be running... the next step is to create a RewriteRule to match the pattern. The mod_rewrite module uses regular expressions to match patterns in the incoming URLs. If a pattern matches, then the rule applies. The first part of the rule is the pattern to match; the second part is where to send the request. A very simple example would be this:

RewriteRule /file/foo /bar

This rule would redirect all URLs that look like http://example.com/file/foo to http://example.com/bar. Once we add regular expressions, however, things get more interesting... instead of just matching the word "foo", let's say we wanted to redirect ALL of the pages to http://example.com/bar. The rule below would accomplish that:

RewriteRule /file/.* /bar

The "." character says "any character," and the "*" character says "zero or more of the previous character." So, this rule will match everything under http://example.com/file/, and redirect it to http://example.com/bar. Of course, this isn't terribly useful... so let's see if we can make the same file download URL as above.
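
One detail worth knowing before going further: with no flags, a rule like the ones above rewrites the request internally, so Apache serves /bar while the browser's address bar still shows the original URL. If you want a true HTTP redirect, the R flag asks for one. A small sketch of the two forms, reusing the same example paths (use one or the other, not both):

```apache
# no flags: Apache serves /bar internally; the browser still shows /file/...
RewriteRule /file/.* /bar

# R flag: Apache answers with an HTTP 302, so the browser itself requests /bar
# (L means "last": stop processing further rewrite rules for this request)
RewriteRule /file/.* /bar [R=302,L]
```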

First, let's add some security... the global wildcard is usually discouraged, because it really does grab everything, and unless you know what you're doing it can be a security hole. A better option is to allow only the characters that are valid in a content ID, which means lowercase "a" through "z", uppercase "A" through "Z", the numbers "0" through "9", and the underscore and dash characters... so we do this:

RewriteRule ^/file/([a-zA-Z0-9_-]*)$ /bar/$1

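Hopefully this doesn't look too alien... Firstly, the ^ character represents the beginning, and the $ character represents the end of the URL path. Without them, the pattern can match anywhere in the path, so a bare /file/foo would also match /file/foobar. We then put the allowed characters in brackets "[a-zA-Z0-9_-]" and the "*" says "zero or more of the characters in that range." We put this whole thing in parentheses to turn it into an "atom" (most regular expression references call this a capture group).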

Once we match the pattern, we can re-use the "atom" in the subsequent redirect URL. The "$1" token represents the first atom found in the pattern... so our URL above will redirect http://example.com/file/foo to http://example.com/bar/foo. For complex redirects, you can use multiple atoms in the same rewrite rule, which can be extremely handy for organizing quick links to content.
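
As an illustration of multiple atoms, here is a sketch that assumes a made-up URL layout of /file/<folder>/<content ID>. The folder segment and the /bar target are hypothetical, just to show $1 and $2 in action. It also uses "+" (one or more) instead of "*", so an empty segment won't match.

```apache
# hypothetical layout: /file/<folder>/<contentID>
# $1 = first parenthesized group (folder), $2 = second group (content ID)
RewriteRule ^/file/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ /bar/$1/$2
```

With this rule, /file/reports/foo would be rewritten to /bar/reports/foo.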

Finally, we need to change "/bar/$1" to something that will actually download the file... so we will redirect to a content server service instead, like so:

RewriteRule ^/file/([a-zA-Z0-9_-]*)$ \
        http://%{HTTP_HOST}/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&dDocName=$1

Ta da! Now the URL http://example.com/file/foo will redirect to http://example.com/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&dDocName=foo.
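
A caveat, as far as I understand the mod_rewrite documentation: when the target is an absolute URL whose host matches the current server, Apache may strip the host and treat the rule as an internal rewrite rather than sending the browser a redirect. If you want the redirect to be explicit, adding the R flag should do it. A sketch of the same rule with flags:

```apache
# same pattern and target as above, with an explicit temporary redirect
# (R=302) and "last rule" (L) so no further rewrite rules are processed
RewriteRule ^/file/([a-zA-Z0-9_-]*)$ \
        http://%{HTTP_HOST}/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&dDocName=$1 [R=302,L]
```

Because the target carries its own query string, any query string on the incoming URL is dropped. Add QSA to the flag list if you want it appended instead.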

To support URLs of the type http://example.com/file/foo.txt, you'd modify it to look like this:

RewriteRule ^/file/([a-zA-Z0-9_-]*)\.([a-zA-Z]*)$ \
        http://%{HTTP_HOST}/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&dDocName=$1

This rewrite rule has two atoms, but we only need the first atom, so we just ignore the second. However, we could get clever and use a different rewrite rule depending on the file extension... or we can just ignore the extension since the content ID is all that matters.
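
To make the "different rule per extension" idea concrete, here is one possible sketch. It assumes you want .pdf links to fetch the web rendition (the Rendition=web parameter from the WebUrlMapPlugin example earlier) while every other extension gets the default file. That mapping is purely hypothetical, and I'm assuming GET_FILE accepts this combination of parameters, so adjust to taste.

```apache
# hypothetical mapping: /file/foo.pdf -> web-viewable rendition of "foo"
# NC makes the match case-insensitive, so foo.PDF works too
RewriteRule ^/file/([a-zA-Z0-9_-]+)\.pdf$ \
        http://%{HTTP_HOST}/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&Rendition=web&dDocName=$1 [NC,L]

# any other alphabetic extension: ignore it and use the content ID alone
RewriteRule ^/file/([a-zA-Z0-9_-]+)\.[a-zA-Z]+$ \
        http://%{HTTP_HOST}/idc/idcplg?IdcService=GET_FILE&RevisionSelectionMethod=Latest&dDocName=$1 [L]
```

Order matters here: the more specific .pdf rule comes first, and the L flag keeps the catch-all rule from also firing.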

Apache in 11g?

I'm starting to look into 11g and it seems like Apache is not an option there. Is that true?

Re: Apache in 11g?

You can always use Apache as a web proxy... also, WebLogic has a plug-in for Apache to do authentication:

http://download.oracle.com/docs/cd/E14571_01/web.1111/e16435/apache.htm#CDEGCBAC

Built into 11g

Also, for those interested in 11g: if you are using the built-in WLS web server and not some external web server, the features from WebUrlMapPlugin are available in the standard WLS admin interface.

The Web server plugins are not used in that case.

This would also apply to things like ExtranetLook and the cookie login plugin because WLS would handle this now and automatically provides the logout capability.

Just FYI for people.

Creating friendly URLs in web logs

Do you think it would be possible to use this technique to create friendly URLs in the web log?

My example is:

- I have a web site, the web content is managed in UCM
- the web pages are served up through simple Portal pages
- the requests logged in the Portal web server are for content items, i.e., not friendly page names
- the content items have friendly names in UCM

Might it be possible to use mod_rewrite and an RIDC call to content server to get the friendly name and log that in the Portal web log?
