Uniform Resource Locator | Part 1

Uniform Resource Locator

A Uniform Resource Locator, more commonly referred to by its acronym URL, is a sequence of characters in a standard format that is used to name resources on the Internet so that they can be located or identified, e.g., text documents, images, videos, digital presentations, and so on.

Uniform Resource Locators were a fundamental innovation in the history of the Internet. They were first used by Tim Berners-Lee in 1991 to allow document authors to establish hyperlinks on the World Wide Web. Since 1994, Internet standards have incorporated the concept of a URL into the more general URI (Uniform Resource Identifier), but the term URL is still widely used.

Though the expansion never appears as such in any standard, many people believe that the initials URL stand for "universal resource locator". This interpretation may be due to the fact that, although the U in URL has always stood for "uniform", the U in URI at first meant "universal", before the publication of RFC 2396.

A URL is the character string that assigns a unique address to each information resource available on the Internet. There is a unique URL for each page of each document on the World Wide Web, for every Gopher item, for every USENET discussion group, and so on.

The URL of an information resource is its address on the web, which allows the browser to find it and display it properly. To do so, the URL combines the name of the computer that provides the information, the directory where the file is located, the file name, and the protocol to use to retrieve the data.

URL Definition

The general format of a URL is:
scheme://machine/directory/file

You can also add other information:
scheme://username:password@host:port/directory/file

For example: http://es.Wikipedia.org/

The detailed specification is in RFC 1738, entitled Uniform Resource Locators.
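
The extended format above can be taken apart with a standard URL parser. The following is a minimal sketch using Python's urllib.parse module; the URL, user name, password, host, and port are hypothetical values chosen only to fill every slot of the pattern.

```python
from urllib.parse import urlsplit

# Hypothetical URL following scheme://username:password@host:port/directory/file
url = "http://alice:secret@example.org:8080/docs/index.html"

parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.username)  # alice
print(parts.password)  # secret
print(parts.hostname)  # example.org
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
```

The directory and file are not separated by the parser; both arrive together in the path.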

URL Scheme

A URL is classified by its scheme, which usually indicates the network protocol used to retrieve the identified information resource over the network. A URL starts with the name of its scheme, followed by a colon, followed by a scheme-specific part.

Examples of URL schemes:

http – HTTP resources
https – HTTP over SSL/TLS
ftp – File Transfer Protocol
mailto – email addresses
ldap – Lightweight Directory Access Protocol (LDAP) searches
file – resources available on the local system or on a local network
news – Usenet newsgroups
gopher – Gopher protocol (now obsolete)
telnet – Telnet protocol
data – scheme for embedding small pieces of content directly in documents (data: URLs)
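
The rule that a URL is its scheme name, a colon, and a scheme-specific part can be sketched in a few lines of Python. The helper name split_scheme and the sample addresses are illustrative assumptions, and the function assumes its input really does begin with a scheme; it performs no validation.

```python
def split_scheme(url):
    """Split a URL at its first colon into (scheme, scheme-specific part)."""
    scheme, _, rest = url.partition(":")
    # Scheme names are case-insensitive, so normalize to lowercase.
    return scheme.lower(), rest

# Hypothetical sample URLs for a few of the schemes listed above.
examples = [
    "http://es.wikipedia.org/",
    "mailto:someone@example.org",
    "ftp://ftp.example.org/pub/readme.txt",
    "data:text/plain,hello",
]

for url in examples:
    print(split_scheme(url))
```

Note that only some schemes, such as http and ftp, continue with a double slash; mailto and data place their scheme-specific part directly after the colon.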

Some URL schemes, such as the popular "mailto", "http", "ftp" and "file", along with the general URL syntax, were first detailed in 1994 in RFC 1630, which was soon superseded by the more specific RFC 1738 (1994) and RFC 1808 (1995).

Some of the schemes defined in the first RFC are still valid, while others are under discussion or have been refined by later standards. Meanwhile, the definition of the general URL syntax was split off into two separate URI specifications: RFC 2396 (1998) and RFC 2732 (1999), both now obsolete but still widely referenced in the definitions of URL schemes.

The current standard is STD 66 / RFC 3986 (2005).

URL Generic Syntax

All URLs, regardless of scheme, should follow a general syntax. Each scheme can set its own requirements for its specific syntax, but the full URL should still follow the general syntax.

Using a limited set of characters, compatible with the printable ASCII subset, the generic syntax lets a URL represent the address of a resource regardless of the original form of the address components.

Typical schemes that use connection-based protocols share a common "generic URI" syntax, defined as follows:

scheme://authority/path?query#fragment

The authority usually consists of the name or IP address of a server, sometimes followed by a colon (":") and a TCP port number. It may also include a user name and password for authenticating to the server.

The path specifies a location in a hierarchical structure, using a slash ("/") as the delimiter between components.

The query usually carries the parameters of a dynamic query to a database or a process running on the server.

The fragment identifies a portion of a resource, usually a location in a document.
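
The five generic components described above map directly onto the fields returned by a standard parser. This is a minimal sketch with Python's urllib.parse; the URL is a hypothetical one constructed so that every component is present.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical URL of the form scheme://authority/path?query#fragment
url = "https://example.org:8443/a/b/c?x=1&y=2#section-2"

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.netloc)    # example.org:8443  (the authority)
print(parts.path)      # /a/b/c
print(parts.query)     # x=1&y=2
print(parts.fragment)  # section-2

# The path is hierarchical: "/" delimits its components.
print(parts.path.strip("/").split("/"))  # ['a', 'b', 'c']

# Reassembling the five components gives back the original string.
print(urlunsplit(parts) == url)  # True
```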

Example: HTTP URL

The URL used by HTTP, the protocol used to transmit web pages, is the most popular type of URL and serves well as an example. The syntax of an HTTP URL is:

scheme://host:port/path?parameter=value#link

The scheme, in the case of HTTP, is http in most cases, but it can be https when HTTP runs over a TLS connection (to make the connection more secure).

Most web browsers allow the use of scheme://username:password@host:port/... for HTTP authentication. This format has been used as an exploit to make it difficult to identify correctly the server involved. Consequently, some browsers have dropped support for this format. Section 3.2.1 of RFC 3986 recommends that browsers display the user name and password somewhere other than the address bar, because of the security problems mentioned above and because passwords should never be displayed as clear text.
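
To illustrate why the user-information part can mislead, the sketch below removes it and shows the host that would actually be contacted. This is only one possible way to do it, using Python's urllib.parse; the function name strip_userinfo and the deceptive sample URL are hypothetical and are not taken from any browser's implementation.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_userinfo(url):
    """Return the URL without any username:password@ prefix in the authority."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literals must be wrapped in brackets again.
        host = f"[{host}]"
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

# Hypothetical deceptive URL: the text before "@" looks like a trusted site,
# but it is only user information; the real host comes after the "@".
deceptive = "http://www.example.com:login@evil.example.net/index.html"

print(urlsplit(deceptive).hostname)  # evil.example.net
print(strip_userinfo(deceptive))     # http://evil.example.net/index.html
```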

The host, which is probably the part of a URL that stands out most, is in almost all cases the domain name of a server, e.g., www.wikipedia.org, google.com, etc.

The :port portion specifies a TCP port number. It is usually omitted (in which case it defaults to 80) and is probably the part of the URL that matters least to the user.

The path portion is used by the server (the specified host) in whatever way its software provides, but in many cases it specifies a file name, possibly preceded by directory names. For example, in the path /wiki/Vaca, wiki would be a (pseudo-)directory and Vaca would be a (pseudo-)filename.
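
The default-port rule can be sketched as follows. The helper name effective_port and the lookup table are assumptions made for this illustration; the value 80 for http comes from the text above, while 443 for https is the conventional default and is added here for completeness.

```python
from urllib.parse import urlsplit

# Assumed table of default ports; only the http entry is stated in the text.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url):
    """Return the explicit port if present, otherwise the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("http://www.wikipedia.org/"))        # 80
print(effective_port("http://www.wikipedia.org:8080/"))   # 8080
print(effective_port("https://google.com/"))              # 443
```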

The part shown above as ?parameter=value is known as the query portion (or search portion). It may be omitted, it may hold a single parameter-value pair as in the example, or it may hold many of them, written as param=value&otroParam=value&…. The parameter-value pairs are relevant only if the file specified by the path is not a simple, static web page but some kind of automatically generated page. The generating software uses the parameter-value pairs in whatever way it was set up to; mostly they carry information specific to one user and one moment of use of the site, such as search terms, user names, etc. (Note, for example, how the URL in your browser's address bar behaves during a Google search: the search term is passed as a parameter to a sophisticated program at google.com, and that program returns a page of search results.)
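
Reading and building the parameter-value pairs of a query can be sketched with Python's urllib.parse. The URL and the parameter names q and lang are hypothetical and do not describe any real site's interface.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# Hypothetical search URL with two parameter-value pairs.
url = "http://example.org/search?q=uniform+resource+locator&lang=en"

query = urlsplit(url).query
print(query)  # q=uniform+resource+locator&lang=en

# parse_qs returns a dict mapping each parameter to a list of values,
# because the same parameter may appear more than once.
params = parse_qs(query)
print(params["q"])     # ['uniform resource locator']
print(params["lang"])  # ['en']

# Going the other way: build a query string from pairs.
rebuilt = urlencode({"q": "uniform resource locator", "lang": "en"})
print(rebuilt)  # q=uniform+resource+locator&lang=en
```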

The #link part, finally, is known as the fragment identifier and refers to particular places within a page; for example, this page has internal links to each section header, which can be addressed using the fragment ID. This is relevant when a long page is already loaded in the browser and the URL lets you jump to a certain point in it. An example is this link, which leads to this same page and the beginning of this section. (Notice how the URL in the address bar of your browser changes when you click the link.)
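
Separating the fragment identifier from the rest of a URL can be sketched with urldefrag from Python's urllib.parse. The address and the section name below are hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical page URL pointing at one section of a long document.
full = "http://example.org/wiki/URL#Generic_syntax"

page, fragment = urldefrag(full)

print(page)      # http://example.org/wiki/URL
print(fragment)  # Generic_syntax
```

The fragment is resolved by the browser within the page it has already loaded, which is why following such a link to the same page does not require fetching the page again.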