HTML and CSS Standards

Webpage Analysis Criteria

Below are the criteria by which I judge the quality of a site's HTML/CSS and its accessibility. The analysis starts with a numerical count of various features and then evaluates the meaning of those numbers within the given context. There are few absolutes, but there should be neither validation errors nor <font> tags.

The whole purpose of the HTML/CSS distinction is to separate structure from presentation through judicious use of CSS in combination with well-structured HTML. HTML provides a semantic and logical structure to the file while the CSS provides the appearance you want to achieve. The separation also makes it possible to change the appearance of an element throughout a site from one location, the CSS file(s), rather than having to edit each HTML file or some collection of include files. This process also simplifies meeting accessibility standards.

For my own part, all code will validate as correct HTML and CSS and will address accessibility issues. I use XHTML 1.0 Strict unless otherwise required. This most restrictive implementation path ultimately leads to the most flexible, accessible, and easily maintainable code.

There is a glossary at the end of this document in case any term, especially an acronym, is unfamiliar. Terms included are italicized and underlined with red dots on first appearance.

Contents

  1. HTML
  2. CSS
  3. JavaScript
  4. Flash
  5. Accessibility
  6. Graphs
  7. Glossary

HTML

DOCTYPE:
Is there a DOCTYPE declaration and, more importantly, does the code conform to it? Very often there is a declaration of HTML 4.01 but then the code includes XHTML tag closers on contentless tags (e.g., link, input) or vice-versa. This throws the validator into a tizzy, producing spurious errors and masking real ones. Also, is it a valid declaration so browsers use standards mode instead of quirks mode rendering?

Validation errors:
With standards well defined and agreed upon, everyone should strive to meet them. Adherence improves accessibility and uniformity across platforms and user agents. It also reduces the chance of user agents giving wildly different interpretations to the applied CSS (I've seen some really screwy looking material when code violated standards).
Some recent, sophisticated accessibility techniques don't validate under the present system. Those are the only exceptions to validation that should be accepted.

<font> tags:
Totally unnecessary with CSS, and a heavy-handed and inflexible way of controlling text. CSS has more ways to achieve more effects. See HTML/CSS vs. the Browsers and W3C QA Tips.

<table> tags:
Generally superseded by <div> and CSS for layout purposes. CSS is far more flexible and allows the HTML to be structured in a logical fashion for people who use alternate access methods (e.g., people with disabilities); tables often present material in an illogical order. Any page with more than five tables for layout is using them badly, and even five is excessive. I have also seen a single table used badly. Tabular data should take full advantage of <th>, <colgroup>, and other structural markup, though colgroup, thead, etc. are not well supported in CSS by current browsers. I hope that will change in new releases.
For a demonstration of the flexibility provided by abandoning tables for layout purposes see CSS Zen Garden. In the right-hand column you will find alternate views of the same material—exactly the same material since there is no difference in the HTML; the apparent changes are in the applied CSS. There are over 200 designs the webmaster has thought worthy of posting—more than 1000 were rejected. My favorite is “Mozart”, number 189, and another good one for its simplicity and clarity is “CSS Co., Ltd.”, number 209.

Improper inter-element spacing:
Using <br>, &nbsp;, and spacer images is also made obsolete by CSS. Margins and padding can be controlled better and margins can be negative to achieve visual effects like these outdented paragraphs (effect achieved here through a semantic/HTML method).
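As a rough sketch of the CSS alternative (the class names section-body and outdent are hypothetical, and this is not the semantic/HTML method this page itself uses), spacing and an outdent might be written as:

```css
/* Hypothetical class names; adapt to the page's own structure. */
.section-body {
	margin: 0 0 1em 3em;	/* space below each block, indent on the left */
	padding: 0;
	}
.section-body p.outdent {
	margin-left: -3em;	/* negative margin pulls this paragraph back out */
	}
```

No <br>, &nbsp;, or spacer image is involved, so changing the spacing later means editing one value.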

<div> tags:
The preferred structural element, but can also be over-used with multiple nesting levels just because someone didn't bother to think out structure (tables redux). Machine generation (e.g., PHP, ColdFusion) often prompts people to stop thinking and overload a page (especially when using tables but also with divs).

<h#> tags:
Should be used to give structure to the page and never used solely to size text. They are needed only if the page has a relevant structure. If used, they must start with h1 and proceed in proper nesting order. They serve the same purpose as an outline, except that the Web page includes the text that fills in the outline. Accessibility standards encourage their use, or at least that of <h1>. A blind informant says he relies heavily on levels of headings, and a survey confirms their common use.
For instance, this page uses “HTML and CSS Standards” as its <h1> and the page title, “Webpage Analysis Criteria”, as its <h2>. It then also has several <h3> and <h4> tags to organize subsidiary portions. The size and other display characteristics are controlled by CSS.

<ul> tags:
Should be used for all menus and for anything else that looks like a list. The fact that the menu is horizontal or you don't want bullets is irrelevant; we're talking structure (semantics), not appearance. CSS performs the bulletless horizontal magic (see Zephyr Press for a horizontal menu—top and bottom—and MGA or Wheelchair Mobility for a vertical menu whose buttons are solely CSS creations). See list tips at W3C for more information and resources.
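A minimal sketch of the bulletless horizontal magic, assuming a menu marked up as a list with the hypothetical id nav (this is not the code used by the sites named above):

```css
/* Assumed markup:
   <ul id="nav"><li><a href="about.html">About</a></li> ... </ul> */
#nav {
	margin: 0;
	padding: 0;
	list-style: none;	/* no bullets */
	}
#nav li {
	display: inline;	/* items run left to right instead of stacking */
	}
#nav a {
	padding: 0.25em 0.75em;
	text-decoration: none;
	}
```

With the CSS turned off, the menu falls back to an ordinary bulleted list, which is exactly the structure it has.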

Forms:
Is the form restricted in scope to its place of use? People commit one of two form sins: enclosing the whole page in a form even though the actual form is only a small part of the page, or breaking the form across multiple structural elements. Avoiding the latter by committing the former is not a solution. A form whose only purpose is to present a search box should consist of little more than the associated input tags. Feedback and data entry forms may require extensive structural elements within the form, but not the whole page. Placing the form tags where the syntax allows them also seems to be a challenge.

Liquid design:
Is the website of fixed width or does it adjust to the window size? A site that looks great on a 1280px wide screen may require horizontal scrolling on a 1024px screen. Now that screens are wide enough, many people want to have two windows open side by side. Fixed-width sites make this difficult because they don't adjust to the window size. Most Wikipedia pages can be scrunched down to less than 500px before they become unusable. See Web Matters for a more complete explanation with examples.
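One way to sketch a liquid layout, assuming hypothetical ids wrapper, content, and sidebar and illustrative limits, is to size everything relative to the window rather than in fixed pixels:

```css
/* Hypothetical ids; the percentages and em limits are illustrative only. */
#wrapper {
	width: 90%;		/* tracks the window width */
	min-width: 30em;	/* stop shrinking before text becomes unreadable */
	max-width: 75em;	/* stop growing before lines become too long */
	margin: 0 auto;
	}
#content {
	float: left;
	width: 70%;
	}
#sidebar {
	float: right;
	width: 28%;
	}
```

The min-width and max-width limits are optional; without them the page simply follows the window at every size.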

CSS

Validation errors:
Not generally a severe problem, but sometimes people invent values or even properties or use a value that's not valid for that property. There are also several proprietary or proposed properties that are used, but then cross-browser behaviour is difficult to control. From CSS Zen Garden—“The only real requirement we have is that your CSS validates.”

Efficiency/readability:
This is where WYSInWYG tools truly shine in their stupidity. I have seen rules like elementx { border-top: 1px solid red; padding-bottom: 4px; padding-top: 4px; margin-right: 7px; border-left: red 1px solid; margin-left: 7px; border-bottom: 1px solid red; margin-top: 7px; border-right: solid 1px red; padding-left: 4px; padding-right: 4px; margin-bottom: 7px; }. I've seen this sort of thing many times, often in one seemingly endless and unreadable line, as above; it is not rare (and often font or other information is included randomly just to add to the confusion). If someone actually wants to read the code without having to feed it into some compatible WYSInWYG tool, it will take many minutes to figure out that the rule says elementx { margin: 7px; border: 1px solid red; padding: 4px; }. And the first way probably isn't even the right way when there are differences in the TRBL values.
I format my CSS so it's easier to read than a single line, making the rule above appear in my files as

elementx { 
	margin: 7px; 
	border: 1px solid red; 
	padding: 4px; 
	}

External vs. page-level vs. inline:
As much as possible CSS should be put in external files for sharing across pages. A typical page should have no inline styles and only a minimum of page-level styles. Inline styles should be used only when there is a single usage of that style and it is unlikely to be needed by any other element. Home pages are often different from the rest of a site; a few page-level styles are okay but extensive CSS should be moved to a home page-specific CSS file. Remember, an HTML page can reference as many CSS files as required to do the job and one CSS file can reference others to help provide some coherent structure. I hate trying to read through CSS files that are 20+ screens long (there are a lot of them) just to find the code that applies to some limited portion of a page. Some of those bloated CSS files are a result of the efficiency issue mentioned above, but not all.
As an example of multiple CSS files, King's College London uses a different colour scheme for each major portion of the Website—Undergraduate, Graduate, Research, etc. This is effected by the invocation of different external CSS files whose only function is to control issues surrounding colour (background, border, associated images, etc.). Other CSS files create a consistent appearance across the whole Website. The same is true of Graficsmiths, a site I restructured from using frames and other outdated techniques. These Standards pages also mix and match CSS files to achieve the desired effect.
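The idea of one CSS file referencing others can be sketched with @import. The file names below are hypothetical and are not taken from any of the sites mentioned:

```css
/* site.css: a hypothetical master stylesheet */
@import url("layout.css");		/* positioning shared by every page */
@import url("typography.css");		/* fonts and text sizes */
@import url("colour-research.css");	/* colour scheme for one section only */

/* @import rules must come before any other rules in the file */
body {
	margin: 0;
	}
```

Swapping the colour file for another section's file changes the scheme without touching the layout or typography rules.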

Text size:
Should be specified relatively rather than absolutely, so it can be resized by users (see HTML/CSS vs. the Browsers for basic type information and two scalable methods).
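As one possible illustration of relative sizing (not necessarily either of the two methods described on the referenced page; the class name fineprint and the values are assumptions):

```css
body {
	font-size: 100%;	/* start from the user's chosen size */
	}
h1 { font-size: 1.6em; }	/* relative to the parent's size */
h2 { font-size: 1.3em; }
.fineprint { font-size: 0.85em; }	/* hypothetical class */

/* Avoid: p { font-size: 12px; }
   Some older browsers do not let users resize text set in px. */
```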

Class/id names:
Should be chosen to reflect the function of the matter covered and not its appearance (e.g., bad: class="bluebox"; good: id="special-note").
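A short sketch of why the distinction matters, using the two names above with illustrative declarations:

```css
/* Named for appearance: misleading as soon as the design changes colour */
.bluebox {
	border: 1px solid blue;
	}

/* Named for function: still accurate after a redesign */
#special-note {
	border: 1px solid blue;
	padding: 0.5em;
	}
```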

Some things, like class and id name choices, cannot be quantified. Another is where to place rules, and yet another is how much repetition to tolerate. These are judgment calls where I choose the highest level that can easily be controlled. For example, people often specify the same font-family for all paragraphs, headings, and table data when it could be specified once in the body rule (e.g., the WGBH home page specifies it twenty times, but at no point does it use any other font-family, even the default).
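A minimal sketch of choosing the highest level, with font names picked purely for illustration: font-family is an inherited property, so one declaration on body reaches paragraphs, headings, and lists, and only the exceptions need their own rule.

```css
body {
	font-family: Georgia, "Times New Roman", serif;	/* inherited by descendants */
	}
h1, h2, h3 {
	font-family: Verdana, Arial, sans-serif;	/* override only where needed */
	}
```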

A large CSS file is not, by itself, an indication of good CSS usage. Very often CSS is over-specified and underutilized. Bloating occurs from things mentioned already as well as creating many more classes than needed. BBC news has at least seven CSS files, all large, and I couldn't find the organizing element. One class name can occur in multiple files, making it frighteningly difficult to find the rule that applies at any given moment. Yet many such sites still have a <body> tag that specifies the non-standard attributes of marginheight, marginwidth, leftmargin, and topmargin—which are correctly handled in CSS.
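The CSS replacement for those four non-standard <body> attributes is short. A sketch, assuming the intent was to remove the default page offset:

```css
/* Replaces marginheight, marginwidth, leftmargin, and topmargin on <body> */
body {
	margin: 0;
	padding: 0;
	}
```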

There is also an issue called “liquid design”, which refers to building a site so it adjusts to the user's window width. As screens get wider (1280 or 1600 pixels), people often want two windows visible at the same time. Narrow windows with a fixed-width site often end up forcing horizontal scrolling—a true inconvenience most of the time.

For the “how to” basics of CSS usage check A CSS Quick Reference.

JavaScript

Like CSS, as much as possible—all functions—should be in external files to reduce clutter and load times through caching, leaving only the function invocations (onload, etc.) in HTML. JavaScript also needs to have a workaround for user agents that don't recognize JavaScript or have it turned off. Very often people use JS for menus or even create the menu with JS, making navigation of a site impossible for such people. See J Korpela for one example of how to fix a common problem and see the rest of the page for more JavaScript advice—including having the introduction say “Specifically, one should never rely on JavaScript alone in the processing of data entered by user” [my emphasis].

That being said, the Web is constantly evolving and where it started out as an alternate publishing medium, it has recently also acquired the function of an alternate application platform. Instead of writing a document in MSWord or some other word processor and then sending copies (print or electronic) to interested parties, it is now possible to write the document on a word processor accessed through the web, have it immediately available to others, and to allow them to contribute to or modify/edit it themselves. You can also create and submit forms whose contents vary according to initial and evolving conditions, where JavaScript changes the page dynamically, without going back to the server. I don't think the standards have caught up with this situation.

Flash

Flash is almost universally hated by accessibility and usability people. “in the usability field, we've learned that more technical capabilities and a broader set of design options usually translate into more rope for hanging the users.” (http://www.useit.com/alertbox/20021125.html). Flash isn't accessible unless people make a special effort; since most make no effort on the website, why should they make an effort with Flash?

One guide says the content is in the HTML, the layout is in the CSS, and JS and Flash are decoration only. One site I know (which means there must be more) uses JS to create the menu on the client and is thus unusable by the blind, etc. Any number of sites are made solely in Flash.

Accessibility

The W3C Introduction to Web Accessibility and its referenced pages answer more questions than I ever dreamed of asking. It provides the rationale and methods for creating accessible pages.

The Web Accessibility Initiative (WAI) is the W3C set of “Strategies, guidelines, resources to make the Web accessible to people with disabilities.” The method of achieving accessibility is set out in the Web Content Accessibility Guidelines (WCAG 1.0 and 2.0).

WCAG 1.0 consists of fourteen guidelines, each with several checkpoints, which are grouped into three priority levels. Some of these checkpoints can be checked with automated tools while others must be checked manually. Version 2.0 is similar, but updated to reflect more recent changes in Web practices and feedback on 1.0.

The U.S. government has standards set forth in Section 508 that government sites and contractors are supposed to follow. They are similar to WAI, but not as rigorous. The U.K. also has its own set of standards, as do other governments.

"Since validation is the first step towards ensuring accessibility" (http://www.w3.org/WAI/AU/reviews/homesite#gl4), simply converting from the old, table-based structure to modern structure and validating the code will reduce the number of accessibility errors and warnings significantly. For instance, the alt attribute for images is required for both validation and WCAG. In addition, WCAG wants the contents of the alt attribute to be meaningful; that has to be a manual check. Eliminating the use of images as spacers thus eliminates all associated errors and warnings.

Several online tools make it relatively easy to find and fix many errors. I use ATRC, FAE, Basic Checker, and Cynthia Says at various times. There are others, and if you find one you particularly like, please let me know. I also use the Web Developer toolbar to turn off CSS and JS easily, validate pages, and do other things.

Unfortunately, though I am intensely interested in accessibility and test all pages, I will never post a claim of passing any particular standard. The problem is that few standards agree with each other and some even violate HTML logic, while others are so obtusely written that I can't understand them.

Skip links:
It is generally agreed that they're needed; the disagreement is over how to implement them. Should they be visible or invisible? How can they be made invisible without affecting one audience or another? Should the link be "skip to content" or "skip to navigation", with the page structured appropriately?
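One commonly used approach, sketched here without claiming it settles the disagreement, is to move the link off-screen and bring it back when it receives keyboard focus. The class name skip and the target id content are assumptions:

```css
/* Assumed markup, placed first inside <body>:
   <a class="skip" href="#content">Skip to content</a> */
a.skip {
	position: absolute;
	left: -999em;	/* off-screen rather than display: none, which many screen readers ignore */
	}
a.skip:focus,
a.skip:active {
	left: 0;	/* visible once a keyboard user tabs to it */
	top: 0;
	}
```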

Access keys:
Some people use them extensively, even for things I never dreamed of, while others say flat out, don't use them. My blind informant doesn't use them, but someone with limited mobility and who can't use a mouse probably would find them useful.

Headings:
(Also see the notes under HTML.) Some people advocate using h2 to mark menus, even if one or more precede the h1. Most accessibility checkers would flag this as an error.

WCAG 1.0 vs. WCAG 2.0 vs. Section 508 vs.
I had just gotten used to working with WCAG 1.0 when WCAG 2.0 was released. They are not completely incompatible, but there are some significant changes. And other people have other ideas. Section 508 is too lax, so I do what I think gives a reasonable result, unless the client has specific requirements.

Text size:
Many advocates recommend setting body { font-size: 100%; }. The laudable intent is based on the principle that users set their preferred size in their own stylesheet and any adjustments are based on that. Unfortunately, that's rarely the case (my computer science PhD nephew is the only person I know who does it), and most Web authors are now graphic artists, rather than techies, who never look at the code because they're using a WYSInWYG tool.

Graphs

Page graph

Websites_as_Graphs

This tool graphs the tags on an individual page within a website, despite its name. It creates a tree of color-coded nodes that gives some idea of how the page is put together. For instance, lots of red says table-based structure and green indicates div-based. Lots of nodes off the body tag indicates a probable lack of structure. All elements within a form should be clustered together (apparently harder than one might think). I've also seen pages where the form tag encloses everything (<body><form>…</form></body>), even though only a few lines, if any, are the real form. Lots of images may indicate their use as spacers. I'd like to see an additional color for lists, since they should be a strong structural element. Not all table tags get colored red; caption, th, thead, tbody, tfoot, col, and colgroup are omitted.

Unbranched chains of red or green indicate nesting that is probably not well thought out and therefore unnecessary.

A reduction in the number of nodes almost invariably means a gain in clarity of structure and with it, easier maintenance and modification.

What do the colors mean?

node size (px) ==> # nodes

Site graph

Recommendations for a good one gratefully accepted.

http://www.touchgraph.com/ have to construct the tool?

VisVIP check this out

Glossary

<…>:
Material enclosed between angle brackets constitutes one of many HTML tags and associated attributes which control what appears on the computer screen.

Accessibility:
The concept that Web pages should be structured and constructed in such a way that they are available to the widest range of people possible regardless of access method. Some accommodations are directed at hand-held devices or text-only browsers. Others address disabilities ranging from color-blindness to cognitive and physical impairments (perhaps 10% of the U.S. population).

CSS:
Cascading Style Sheets—a tri-level system of applying rules to control the appearance of Web pages. The rules consist of one or more property/value pairs and can be applied to multiple pages with external files, to a single page, or to a single tag. A CSS Quick Reference gives the basic outline for use.

DOCTYPE:
A formal statement at the beginning of a conforming document of its Document Type Definition (DTD), a rigorous specification for a language, so the user agent knows how to treat what follows. Failure to include a DOCTYPE leaves the user agent to guess at what parsing rules to use and how best to display the document.
Browsers choose between “standards mode” and “quirks mode” based on whether a correct DOCTYPE is present. Quirks mode tries to reproduce the bad old rendering behaviour, which does not display the same as standards mode.

HTML:
HyperText Markup Language, the basic language for writing pages that appear on the World Wide Web (WWW). This includes XHTML (eXtensible HTML), which is an application of XML (eXtensible Markup Language), a more rigorous definition of how a markup language should be structured. Until HTML version 4.0 the language did not have a clear definition that most players accepted and agreed would be the basis for browser and other user agent development.

Tag (W3C often uses “element” to refer to a tag):
The basic structural element of HTML which may include several attribute/value pairs to more precisely control its effect on a Web page.

TRBL:
Top, Right, Bottom, Left (TRouBLe, i.e., stay out of trouble by following this sequence); the sequence for interpreting CSS shorthand properties. For example, the rule img { margin-left: 5px; margin-right: 2px; margin-top: 10px; margin-bottom: 0px; } is more simply and clearly written as img { margin: 10px 2px 0px 5px; }.
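The shorthand also accepts fewer than four values. A sketch of the four forms, using the same img selector with illustrative values:

```css
img { margin: 10px; }			/* all four sides */
img { margin: 10px 2px; }		/* top and bottom 10px; right and left 2px */
img { margin: 10px 2px 0; }		/* top 10px; right and left 2px; bottom 0 */
img { margin: 10px 2px 0 5px; }		/* top, right, bottom, left */
```

A missing left value copies the right, a missing bottom copies the top, and a single value applies to every side.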

User agent:
Any device through which a person accesses the Web, whether it be one of the standard browsers, a handheld device (PDA, cell phone), a text-only browser, or screen reader or tactile device for the blind (list not exhaustive).

Validation:
The process of measuring HTML or other code against a precise syntactic definition or other specification of a standard.

W3C:
World Wide Web Consortium, the body responsible for setting standards for the Web, i.e., HTML, CSS, etc. Its members represent various stakeholders in the Web.

Web or WWW:
Shorthand for World Wide Web. (WWW is sometimes spoken as “dub-dub-dub” to avoid having to say so many syllables.)

Website:
The collection of pages (one or many; static or dynamic) which originate at a single page (generally designated the home page) which is itself uniquely identified by a WWW domain name (e.g., NPR.org).

WYSInWYG:
What you see is NOT what you get: my reformulation of the usual WYSIWYG (What You See Is What You Get), the description of a tool that, like MSWord and unlike earlier computer tools, purports to reflect the appearance of the final product immediately. A WYSInWYG tool mimics its namesake but has no hope of actually fulfilling that mission because of internal and external constraints beyond its control. Any visual web authoring tool is WYSInWYG because it uses an internal browser which is of necessity different from all outside browsers.
There is also another similar formulation called WYSINWOG—What You See Is Not What Others Get. Again, the reason is that each browser interprets the code differently and not everyone is using the standard visual browsers. That's why disciplined use of modern methods and standards is necessary.