Tidy Up Plugin

Jun 12, 2006 | Written by Administrator

This plugin provides the ability to run HTML Tidy through all your posts, pages, and comments, generating a report on just how dirty your code is. Should you want to, the plugin can also automatically update your database with the cleansed data.

If you are unaware of its existence, HTML Tidy is a wonderful little tool that is embedded into almost everything nowadays. Its purpose is to take potentially malformed HTML code and produce clean XHTML.

Tidy Up does not require any special PHP configuration. As long as your web host allows PHP to run executables, the plugin will work. Currently the plugin contains Tidy executables for:

  • Linux
  • Windows
  • Mac OS X
  • FreeBSD

It is likely that your web host runs one of these.

Installation

Installation is just like any WordPress plugin:
  • Download tidy-up.zip
  • Unzip
  • Upload to /wp-content/plugins on your server
  • Activate the plugin
  • Give tidy.linux, tidy.osx, tidy.exe, or tidy.freebsd execute permissions for the web server (generally means giving 'x' permission to group/other)
  • Use Tidy Up from the Manage/Tidy Up menu

You can find full details of installing a plugin on the plugin installation page.
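
As a rough sketch of the permissions step above, the following shell function adds execute permission to whichever of the bundled Tidy binaries are present. The function name make_tidy_executable and the example path are assumptions for illustration, and the binaries are assumed to sit directly in the plugin directory; only the four file names come from the installation steps.

```shell
# Sketch only: give the bundled Tidy binaries execute permission.
# make_tidy_executable is a hypothetical helper, not part of the plugin.
make_tidy_executable() {
    plugin_dir="$1"
    for name in tidy.linux tidy.osx tidy.exe tidy.freebsd; do
        # Skip binaries that are not present in this directory.
        if [ -f "$plugin_dir/$name" ]; then
            # a+x adds execute permission for user, group, and other.
            chmod a+x "$plugin_dir/$name"
        fi
    done
}

# Example call (the path is an assumption; use your own WordPress location):
# make_tidy_executable /path/to/wordpress/wp-content/plugins/tidy-up
```

Adding only the 'x' bit is usually enough; there should be no need to make the files world-writable.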

Usage

To produce a report of all your posts or comments you need to go to the Tidy Up page of the Manage menu. Here you will be presented with the following interface:

Tidy Up interface

Choose your source (posts/pages or comments) and select your input and output formats. Then press either the 'Report' or 'Clean' button. 'Report' generates a report without making any modifications, while 'Clean' saves all modifications back to the database.

Input Format

When you write a post in WordPress you generally don't need to think about HTML formatting. You type your text, enter a blank line for a new paragraph, and write as you would normally. WordPress, being the clever chappy that it is, is aware of this and will automatically reformat your writing when it comes to viewing your blog, and everything displays correctly.

What's happening behind the scenes is that WordPress is adding HTML paragraph markers around your sections of plain text. While this is great for you (no need to think about HTML), it is a nuisance for an HTML verifier.

To get around this, the plugin allows you to specify the input and output formats. The input format is the current format of your data:

  • Default WordPress - Posts are stored without HTML paragraph formatting
  • Raw XHTML - Posts are stored with HTML paragraph formatting
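
To make the difference between the two formats concrete, here is a much simplified sketch of the kind of conversion described above: plain text separated by blank lines goes in, and each block comes out wrapped in paragraph tags. This is not WordPress's own formatter, which handles many more cases; the function name add_paragraph_tags is made up for illustration.

```shell
# Sketch only: wrap blank-line-separated blocks of text in <p> tags.
# add_paragraph_tags is a hypothetical helper, not WordPress's real formatter.
add_paragraph_tags() {
    # RS = "" puts awk in paragraph mode: each record is one block of text
    # delimited by one or more blank lines.
    awk 'BEGIN { RS = "" } { printf "<p>%s</p>\n\n", $0 }'
}

# Example:
# printf 'First paragraph\n\nSecond paragraph\n' | add_paragraph_tags
```

In these terms, 'Default WordPress' is what goes into the function and 'Raw XHTML' is roughly what comes out.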

Output Format

As an extension to the input format, this allows you to tell the plugin whether you want the cleaned code to be stored with HTML paragraph formatting or without.

Why bother with all this input/output formatting malarky?

Flexibility. You may want your data to be cleaned, but you want it stored in the paragraph-less WordPress format. You may want to convert from HTML paragraphs into WordPress format. It's up to you.

Tidy Up Single Post

As well as bulk-reporting on all posts, you can individually clean a single post. When the plugin is enabled, a new column will appear in the Post management screen. Clicking on this will produce a Tidy report on that post, with the capability of then saving it back to the database.

Report

A report will contain entries for each post that was checked:

Tidy Up report

Clicking on 'tidy' will clean that item (saving to the database). Clicking on 'edit' will open an edit box above the messages where you can update the text directly.

Configuration

Some people may want their data cleansed in different ways. HTML Tidy has many configuration options, and these are provided to you through two files located in the plugin directory:

  • wordpress.config - Tidy options when converting to WordPress format
  • xhtml.config - Tidy options when converting to XHTML format

You are free to modify these according to your own preferences and the HTML Tidy documentation.
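
For a feel of what such a file looks like, the sketch below writes a small options file in HTML Tidy's "name: value" format. The option names are standard HTML Tidy options, but the values, the file name example.config, and the helper write_example_tidy_config are assumptions for illustration; they are not the contents of the files shipped with the plugin.

```shell
# Sketch only: write a hypothetical Tidy options file to the given path.
# output-xhtml    : emit well-formed XHTML
# show-body-only  : output only the body contents, since posts are fragments
# wrap            : 0 disables line wrapping
# char-encoding   : treat input and output as UTF-8
write_example_tidy_config() {
    cat > "$1" <<'EOF'
output-xhtml: yes
show-body-only: yes
wrap: 0
char-encoding: utf8
EOF
}

# Example (file names are assumptions):
# write_example_tidy_config example.config
# tidy -config example.config some-post.html
```

Trying options against a standalone copy of a post first is safer than editing the plugin's own config files and cleaning the database straight away.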

Console Version

Due to the nature of the plugin it's possible that a scan of your HTML will result in a PHP timeout error. If your web server is configured this way then there is no way around it other than using the command line version of the plugin. For this you will need SSH access to your web account:

  1. Change to the /wp-content/plugins/tidy-up directory
  2. Run php tidy_console.php source input [output] >report.html

Where source is:

  • posts
  • comments

And input is:

  • wordpress
  • xhtml

output is optional and will cause the cleaned data to be written back to the database in the format specified ('wordpress' or 'xhtml').

The console version will output an HTML report to the screen. If you want, you can redirect this to a file (>report.html), and can then view it in your web browser at:

  http://yoursite.com/wp-content/plugins/tidy-up/report.html

For example:

 php tidy_console.php posts wordpress xhtml >report.html
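
Because supplying the optional output argument writes to the database, it can be worth checking the arguments before running the console script. The following is a minimal sketch of such a check, based only on the values listed above; check_tidy_console_args is a hypothetical helper and not part of the plugin.

```shell
# Sketch only: validate "source input [output]" before calling the script.
check_tidy_console_args() {
    source_arg="${1:-}"
    input_arg="${2:-}"
    output_arg="${3:-}"
    case "$source_arg" in
        posts|comments) ;;
        *) echo "source must be 'posts' or 'comments'" >&2; return 1 ;;
    esac
    case "$input_arg" in
        wordpress|xhtml) ;;
        *) echo "input must be 'wordpress' or 'xhtml'" >&2; return 1 ;;
    esac
    case "$output_arg" in
        ""|wordpress|xhtml) ;;
        *) echo "output must be 'wordpress' or 'xhtml' if given" >&2; return 1 ;;
    esac
    return 0
}

# Report only; nothing is written back because output is omitted:
# check_tidy_console_args posts wordpress && php tidy_console.php posts wordpress >report.html
```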

Warning

I will accept no responsibility for any damage caused to your data. It is possible that the cleansed HTML code breaks existing formatting, corrupts your posts, or starts a feedback loop leading to the breakdown of the universe. You have been warned.

Comments (page 1 of 4)

  1. Alessandro :

    Aug 14, 2006 6:27 am

    It would be very useful if this plugin also checked the integrity of the part of the post before the <!-- more --> tag, not only the entire post, because that first part is shown on its own on the summary posts page.

  2. John (author) :

    Aug 6, 2006 2:44 am

    Yes, it appears that your host has prevented executables from running. Possibly you could ask them to enable it for you, otherwise you're out of luck I'm afraid!

  3. Gregg :

    Aug 6, 2006 1:28 am

    I receive the following error
    Warning: proc_open() has been disabled for security reasons

    I have CHMOD the files to 777.

    Does this mean my host does not let me execute exe files?

  4. John (author) :

    Aug 3, 2006 1:07 am

    Thanks for that David, I had underscores and dashes all over the place. I've released version 1.1 which fixes those as well as adding support for excerpts, and some AJAX goodness thrown in as well.

  5. TJ Singleton :

    Jul 26, 2006 9:11 am

    Great stuff. Very useful. I used it to clean up the invalid mark-up on a client site. It'd be great if you could run it on the excerpts too!

  6. David :

    Jul 17, 2006 2:33 am

    John,
    A couple of other (pretty unimportant) points. The CSS refers to a folder tidy-up rather than tidy_up (I renamed the folder to avoid 404 errors in my stats), and the link on the plugin page links to this page as http://www.urbangiraffe.com/plugins/tidy_up/ rather than http://www.urbangiraffe.com/plugins/tidy-up/. You may want to check this.
    Cheers. The plugin works great.

  7. John (author) :

    Jul 12, 2006 2:30 am

    aJ: You are correct. Documentation changed

    David: Curious. Looks like WordPress is somehow changing things with the rich text editing. I'll look into it

  8. David :

    Jul 10, 2006 11:31 pm

    Further to what I wrote, turning off rich text editing solves the problem.

  9. David :

    Jul 10, 2006 6:36 am

    This plugin is excellent.
    One problem I have is with the ampersand character (&). If I have a properly escaped ampersand in the XHTML of the post, then Tidy throws an error, but the W3 validator does not (as it is correct). If I ask Tidy to clean the post, it does not appear to change anything, and then does not show an error. However, if I edit the post, the error returns.
    Is anyone else seeing behaviour like this?

  10. aJ :

    Jul 6, 2006 1:07 am

    In the instructions you might want to add them to change the file permissions appropriately. I had to chmod the files before any reports could be generated.

    Thanks for the plugin :)

Home | Software | Terms & Conditions | Sitemap | John Godley © 2008