Urlparse¶
This module defines a standard interface to break Uniform Resource Locator (URL) strings up in components (addressing scheme, network location, path etc.)
-
parse
(url)¶ Returns a tuple (scheme, netloc, path, query, fragment) derived from url.
This corresponds to the general structure of a URL:
scheme://netloc/path?query#fragment
. Each tuple item is a string, possibly empty. The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:import urlparse o = urlparse(‘http://www.cwi.nl:80/%7Eguido/Python.html‘)
# result is (‘http’, ‘www.cwi.nl:80’, ‘/%7Eguido/Python.html’, ‘’, ‘’)
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
-
parse_netloc
(netloc)¶ Given netloc as parsed by
parse()
, breaks it in its component returning a tuple (user, password, host, port). Each component of the returned tuple is a string.
-
quote
(s)¶ Return the urlencoded version of s.
Urlencoding transforms unsafe bytes to their %XX representation where XX is the hex value of the byte.
Safe bytes are:
- lowercase letters from “a” to “z” (bytes from 0x61 to 0x7a)
- uppercase letters from “A” to “Z” (bytes from 0x41 to 0x5a)
- numbers from “0” to “9” (bytes from 0x30 to 0x39)
- the following symbols: $-_.+!*’()
-
unquote
(s)¶ If s is urlencoded, returns s with every
+
substituted with a space and every%xx
substituted with the corresponding character.
-
parse_qs
(qs)¶ Parses a query string qs and returns a dictionary containing the association between keys and values of qs. Values are urldecoded by
unquote()
.
-
urlencode
(data)¶ Tranforms data dictionary in a urlencoded query string and returns the query string. Each pair (key, value) is encoded by quote_via function. By default,
quote_plus()
is used to quote the values. An alternate function that can be passed as quote_via isquote()
.