HTML (Hypertext Markup Language) is a text-based approach to describing how content contained within an HTML file is structured. This markup tells a web browser how to display text, images and other forms of multimedia on a webpage.
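HTML was long published as a formal recommendation by the World Wide Web Consortium (W3C) and is generally adhered to by all major web browsers, including both desktop and mobile web browsers. HTML5 is the latest major version of the specification, and HTML is now maintained as a continuously updated Living Standard by the Web Hypertext Application Technology Working Group (WHATWG).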
How HTML works
An HTML document is a text file that follows specific syntax, file and naming conventions. These conventions show the computer and the web server that the file is HTML and should be read as such. By applying these HTML conventions to a text file in virtually any text editor, a user can write and design a basic webpage, and then upload it to the internet.
The most basic of HTML conventions is the inclusion of a document type declaration at the beginning of the text file. It always comes first in the document, because it is the piece that affirmatively informs a browser that this is an HTML file. The declaration typically looks like this: <!DOCTYPE html>. It should be written that way, without any content inside it or breaking it up. If the declaration is missing or preceded by other content, browsers generally still render the page, but may fall back to a legacy compatibility mode (quirks mode) in which layout can behave inconsistently.
Doctypes are not just used for HTML; they can apply to any document that uses SGML (Standard Generalized Markup Language). SGML is a standard for defining markup languages. HTML is one of several markup languages that SGML and doctype declarations apply to.
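A minimal sketch of a complete HTML file that begins with the doctype declaration might look like the following. The title and paragraph text are placeholders chosen for illustration.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Placeholder title; shown in the browser tab, not in the page body. -->
    <title>Example page</title>
  </head>
  <body>
    <!-- Placeholder content. -->
    <p>Hello, world.</p>
  </body>
</html>
```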
The other critical requirement for creating an HTML file is saving it with a .html file extension. Whereas the doctype declaration signals HTML to the computer from the inside of the file, the file extension signals HTML to the computer from the outside of the file. With both in place, a computer can identify the file as HTML whether or not it has opened the file and read its contents. This becomes especially important when uploading the files to the web, because the web server typically relies on the file extension to decide how to label each file before sending it to a client computer, where the inner contents are read.
After writing the doctype and saving as an HTML file, a user can implement all the other syntactic tools of HTML to customize a web page. Once finished, they will likely have several HTML files corresponding to various pages of the website. It's important that the user uploads these files in the same folder hierarchy that they saved them in, as each page references the specific file paths of the other pages, enabling links between them. Uploading them in a different folder structure will cause links to break and pages to become unreachable, because the specified file paths will no longer match the locations of the pages.
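As a rough illustration of why the folder hierarchy matters, consider the relative link below. The file and folder names are hypothetical.

```html
<!-- Hypothetical site layout:
       index.html
       about/team.html
     This link, placed in index.html, resolves only if team.html
     sits in an "about" folder next to index.html on the server,
     just as it did on the author's computer. -->
<a href="about/team.html">Meet the team</a>
```

If team.html were uploaded to the top-level folder instead, the path about/team.html would point to a file that does not exist and the link would break.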
Basic elements of HTML
Using HTML, a text file is further marked up with additional text describing how the document should be displayed. To keep the markup separate from the actual content of the HTML file, there is a special, distinguishing HTML syntax that is used. These special components are known as HTML tags. Tags can contain name-value pairs known as attributes, and an opening tag, the content it encloses and the closing tag are together referred to as an HTML element.
Most HTML elements have an opening tag, content in the middle and a closing tag. A few, known as void elements, such as the line break <br> and the image <img>, have no content and no closing tag. Attributes can provide additional information about the element and are included in the opening tag. Elements can be described in one of two ways:
- Block-level elements start on a new line in the document and take up their own space. Examples of these elements include headings and paragraph tags.
- Inline elements do not start on a new line in the document and only take up necessary space. These elements usually format the contents of block-level elements. Examples of inline elements include hyperlinks and text format tags.
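The distinction can be sketched with one block-level element that contains one inline element. The class name, link target and text are assumptions made for illustration.

```html
<!-- Block-level element: the paragraph starts on a new line.
     class="intro" is an attribute (a name-value pair) in the opening tag. -->
<p class="intro">
  Read the
  <!-- Inline element: the link stays within the paragraph's text flow. -->
  <a href="https://example.com/docs">documentation</a>
  before you start.
</p>
```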
Pros and cons of HTML
Pros of using HTML include:
- Is widely adopted with a large amount of resources available.
- Is natively run on every browser.
- Is relatively easy to learn.
- Has clean and consistent source code.
- Is an open standard and free to use.
- Can be integrated with backend programming languages such as PHP.
A few cons to consider are:
- Offers little dynamic functionality on its own and is mainly used for static web pages.
- All components must be created separately even if they use similar elements.
- Browser behavior can be unpredictable. For example, older browsers may not be compatible with newer features.
Commonly used HTML tags
HTML tags dictate the overall structure of a page and how the elements within them will be displayed in the browser. Commonly used HTML tags include:
- <h1> which describes a top-level heading.
- <h2> which describes a second-level heading.
- <p> which describes a paragraph.
- <table> which describes tabular data.
- <ol> which describes an ordered list of information.
- <ul> which describes an unordered list of information.
As mentioned, there are opening and closing tags that surround the content they are augmenting. An opening tag looks like this: <p>. A closing tag is the same but contains a forward slash to indicate that it's the end of the given HTML element. Closing tags look like this: </p>.
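The tags listed above might be combined as in the following sketch. The content is invented, and the <li>, <tr>, <th> and <td> tags are additions not covered in the list: they mark individual list items, table rows, header cells and data cells.

```html
<h1>Recipe collection</h1>
<h2>Pancakes</h2>
<p>A quick breakfast for two.</p>

<!-- Ordered list: the sequence of steps matters. -->
<ol>
  <li>Mix the batter.</li>
  <li>Fry until golden.</li>
</ol>

<!-- Unordered list: the order of items does not matter. -->
<ul>
  <li>Flour</li>
  <li>Eggs</li>
</ul>

<!-- Tabular data: one header row and one data row. -->
<table>
  <tr><th>Ingredient</th><th>Amount</th></tr>
  <tr><td>Flour</td><td>200 g</td></tr>
</table>
```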
How to use and implement HTML
Because HTML is completely text-based, an HTML file can be edited simply by opening it up in a program such as Notepad++, Vi or Emacs. Any text editor can be used to create or edit an HTML file and, so long as it is named with an .html file extension, any web browser -- such as Chrome or Firefox -- will be capable of displaying the file as a webpage.
For professional software developers, there are a variety of WYSIWYG (what you see is what you get) editors to develop webpages. NetBeans, IntelliJ, Eclipse and Microsoft's Visual Studio provide WYSIWYG editors as either plugins or as standard components, which simplifies using and implementing HTML.
These WYSIWYG editors also provide HTML troubleshooting facilities, although modern web browsers often include developer tools that will highlight problems with HTML pages, such as a missing closing tag or syntax that does not create well-formed HTML.
Chrome and Firefox both include HTML developer tools that allow for the immediate viewing of a webpage's complete HTML file, along with the ability to edit HTML on the fly and immediately incorporate changes within the internet browser.
HTML is typically used alongside Cascading Style Sheets (CSS), which control how a page looks, and JavaScript (JS), which controls how it behaves. For example, if a user wants certain pieces of text to be red, they can write a rule in the CSS file, tied to a class name, that turns text red. Then they can place the matching class attribute on every piece of text they want to be red in the HTML file. The same basic method applies to JS files, which attach functions rather than styles.
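A minimal sketch of the red-text approach follows. The class name warning is hypothetical, and the CSS rule is placed in a <style> element only to keep the example in one file; in practice it would more commonly live in a separate CSS file.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Class example</title>
    <style>
      /* Hypothetical class name; any valid name would work. */
      .warning { color: red; }
    </style>
  </head>
  <body>
    <!-- The whole paragraph carries the class, so all of its text is red. -->
    <p class="warning">This text is red.</p>
    <!-- No class attribute, so the default color applies. -->
    <p>This text keeps the default color.</p>
    <!-- Only the span carries the class. -->
    <p>Only <span class="warning">these words</span> are red.</p>
  </body>
</html>
```

Changing the single rule in the style sheet would recolor every element that carries the class, without touching the HTML structure.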
Separating information about how a page is structured, the role of HTML, from the information about how a webpage looks when it is rendered in a browser is a software development pattern and best practice known as separation of concerns.
History and development
In the early days of the world wide web, marking up text-based documents using HTML syntax was more than sufficient to facilitate the sharing of academic documents and technical memos. However, as the internet expanded beyond the walls of academia and into the homes of the general population, greater demand was placed on webpages in terms of formatting and interactivity.
HTML 4.01 was released in 1999, at a time when the internet was not yet a household name, and HTML5 was not standardized until 2014. During this time, HTML markup drifted from the job of simply describing the document structure of webpage content into the role of also describing how content should look when a webpage displays it.
As a result, HTML4-based webpages often included information within a tag about what font to use when displaying text, what color should be used for the background and how content should be aligned. Describing within an HTML tag how an HTML element should be formatted when rendered on a webpage is considered an HTML antipattern. HTML should generally describe how content is structured, not how it will be styled and rendered within a browser. Style sheet languages such as CSS are better suited to this task.
One major difference between HTML4 and HTML5 is that the separation of concerns pattern is more rigorously enforced in HTML5 than it was in HTML4. With HTML5, purely presentational tags such as <font> and <center> have been made obsolete, while the bold <b> and italicize <i> tags remain but have been redefined to carry meaning rather than only styling. For the paragraph tag, the align attribute has been removed from the HTML specification.
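The shift can be illustrated by comparing an older, presentational fragment with a structure-only equivalent. This is a sketch; the class name greeting and the text are assumptions.

```html
<!-- Older, presentational style: formatting lives in the markup.
     The align attribute and the <font> tag are obsolete in HTML5. -->
<p align="center"><font color="blue">Welcome</font></p>

<!-- Structure-only markup: the same formatting moves to a style sheet. -->
<style>
  .greeting { text-align: center; color: blue; }
</style>
<p class="greeting">Welcome</p>
```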
The following is a list of HTML versions and the years they were created. Several iterations of each version have been released over time. This list aims to focus on significant iterations.
- HTML 1.0 -- released in 1992 -- had very limited capability and around 20 elements.
- HTML 2.0 -- released in 1995 -- was the first formal HTML specification. The HTML 3.0 draft that followed proposed elements relating to math functions.
- HTML 3.2 -- released in 1996 -- dropped the math function initiative altogether, and fixed overlap between various proprietary extensions.
- HTML 4.0 -- released in 1997 -- offered three variations which differed in the number of deprecated elements that were allowed.
- HTML 4.01 -- released in 1999 -- largely the same as 4.0.
- HTML 5 -- released in 2014 -- came after a long break in updates because the organization that developed it -- W3C -- was focusing on another, parallel language called XHTML.
- HTML 5.1 -- released in 2016 -- aimed to more easily accommodate various types of media embedding with new tags.
- HTML 5.2 -- released in 2017 -- aimed to be equally understandable by humans and computers.
- HTML 5.3 -- never finalized -- was superseded when W3C and WHATWG agreed in 2019 to collaborate on a single version of HTML, the WHATWG HTML Living Standard.
Features of HTML5
HTML5 introduces several elements to increase interactivity, multimedia capabilities and semantic efficiency. Instead of using plugins, multimedia can be placed within the HTML code. These elements include:
- Graphics elements:
  - <svg>, which is a container for scalable vector graphics (SVG).
- Semantic elements:
  - <header>, which creates a header, typically at the top of the page.
  - <footer>, which creates a footer, typically at the bottom of the page.
  - <article>, which creates an area for independent content.
  - <section>, which defines sections and subsections, such as chapters.
  - <nav>, which creates a navigation menu.
- Multimedia elements:
  - <audio>, which embeds audio such as MP3, WAV and OGG files in HTML.
  - <video>, which embeds video such as MP4, WebM and OGG files in HTML.
- New type values for the <input> element, which collects user input inside a <form> on the web page. These include number, date and range.
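The elements above might appear together as in the following sketch. All file names, field names and text are hypothetical, and the <label> and <circle> elements are additions used to make the form fields and the graphic meaningful.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <title>HTML5 elements</title>
  </head>
  <body>
    <header>
      <nav><a href="index.html">Home</a></nav>
    </header>
    <article>
      <section>
        <h1>Episode 1</h1>
        <!-- Media is embedded directly, with no plugin required. -->
        <audio controls src="episode1.mp3"></audio>
        <video controls src="episode1.mp4"></video>
        <svg width="40" height="40">
          <circle cx="20" cy="20" r="15" />
        </svg>
      </section>
    </article>
    <form>
      <!-- HTML5 input types: number, date and range. -->
      <label>Rating <input type="number" name="rating" min="1" max="5"></label>
      <label>Date <input type="date" name="listened"></label>
      <label>Volume <input type="range" name="volume" min="0" max="100"></label>
    </form>
    <footer>Placeholder footer text</footer>
  </body>
</html>
```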
Other main features of HTML5 include:
- Elimination of outmoded or redundant attributes.
- Offline editing.
- The ability to drag and drop between HTML5 documents.
- Messaging enhancements.
- Detailed parsing rules.
- MIME and protocol handler registration.
- A common standard for storing data in SQL databases (Web SQL), although Web SQL has since been deprecated.
- Application programming interfaces (APIs) for complex applications.
- Accommodations for mobile device app development.
- MathML for mathematical and scientific formulas.
While the addition of these features represents an effort to support multimedia embedding, changes to the HTML specification demonstrate the desire of the community for HTML to return to its original purpose of describing the structure of content. Basically, more structural features have been added, while several format-centric features have been deprecated. For the purpose of backward-compatibility, web browsers will continue to support deprecated HTML tags, but eventually HTML will be mainly structure-based.
HTML syntax standards
In the following HTML example, there are two HTML elements. Both elements use the same paragraph tag, designated with the letter p, and both use the directional attribute dir, although a different attribute value is assigned to the HTML attribute's name-value pairing, namely rtl and ltr.
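A minimal sketch consistent with this description, using placeholder text, might look like this:

```html
<!-- Same tag and attribute name, different attribute values. -->
<p dir="rtl">This paragraph is laid out right to left.</p>
<p dir="ltr">This paragraph is laid out left to right.</p>
```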
Notice that when this HTML snippet is rendered in a browser, the HTML tags impact how each HTML element is displayed on the page, but none of the HTML tags or attributes are displayed. HTML simply describes how to render the content. The HTML itself is never displayed to the end user.
Browsers display an HTML page most predictably when they are provided with well-formed HTML. They usually try to recover from malformed markup, but the results can vary. To be well-formed, the content of each HTML element must be enclosed within an opening tag -- <p> -- and a closing tag -- </p>. Furthermore, any new tag opened within another tag must be closed before the containing tag is closed. So, for example, <p><em>well-formed HTML</em></p> is well-formed HTML, while <p><em>well-formed HTML</p></em> is not well-formed HTML.
Another syntax rule is that HTML attribute values should be enclosed within single or double quotes. There is often debate about which format is technically correct, but the World Wide Web Consortium asserts that both approaches are acceptable.
The best advice for choosing between single and double quotes is to keep the usage consistent across all the documents. HTML style-checkers can be used to enforce consistent use across pages. It should be noted that sometimes using a single quote is required, such as in an instance where an attribute value actually contains a double quote character. The reverse is true as well.
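The two cases in which one quote style is required can be sketched as follows. The image file names and alt text are hypothetical.

```html
<!-- Double quotes around a value that contains a single quote. -->
<img src="photo.jpg" alt="The author's desk">

<!-- Single quotes around a value that contains double quotes. -->
<img src="sign.jpg" alt='A sign reading "Open"'>
```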
It's important to note as well that character encoding affects how text is displayed. Non-English characters -- or letters -- such as Chinese, or special symbols -- like letters with accent marks -- may not display correctly on a webpage if the browser has to guess the encoding. In order to accommodate these characters, users need to specify the character encoding with an element that looks like this: <meta charset="utf-8"/>. In this case, utf-8 is the character encoding. UTF-8 is an encoding of Unicode that can represent characters from virtually all writing systems, and it is the encoding recommended for HTML documents.
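As a sketch, the charset declaration belongs in the document's <head>, ahead of any text that depends on it. The sample text is a placeholder, and the example assumes the file itself is saved in UTF-8 by the text editor.

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <!-- Declare the encoding early, before the title and body text. -->
    <meta charset="utf-8">
    <title>Character encoding</title>
  </head>
  <body>
    <!-- Accented letters and Chinese characters in the same document. -->
    <p>Café, naïve, 你好</p>
  </body>
</html>
```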