Quickstart
Note
All doctests in this documentation use Python 3.3 syntax.
>>> from urlobject import URLObject
Create a URLObject with a string representing a URL. URLObject is a regular
subclass of unicode (or str if you’re using Python 3); it just adds several
properties and methods that make it easier to manipulate URLs. All the basic
slots from urlsplit are there:
>>> url = URLObject("https://github.com/zacharyvoase/urlobject?spam=eggs#foo")
>>> print(url)
https://github.com/zacharyvoase/urlobject?spam=eggs#foo
>>> print(url.scheme)
https
>>> print(url.netloc)
github.com
>>> print(url.hostname)
github.com
>>> (url.username, url.password)
(None, None)
>>> print(url.port)
None
>>> url.default_port
443
>>> print(url.path)
/zacharyvoase/urlobject
>>> print(url.query)
spam=eggs
>>> print(url.fragment)
foo
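For comparison, here is a minimal sketch of the same slots read with the standard library's urlsplit. It uses only urllib.parse, not URLObject, and the variable names parts and slots are chosen for illustration. Note that default_port is not part of the standard library result.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://github.com/zacharyvoase/urlobject?spam=eggs#foo")

# The five urlsplit slots, in order.
slots = (parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)
print(slots)

# Attributes derived from the netloc; absent pieces come back as None.
print(parts.hostname, parts.port, parts.username, parts.password)
```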
You can replace any of these slots using a with_*() method. Remember that,
because unicode (and therefore URLObject) is immutable, these methods all
return new URLs:
>>> print(url.with_scheme('http'))
http://github.com/zacharyvoase/urlobject?spam=eggs#foo
>>> print(url.with_netloc('example.com'))
https://example.com/zacharyvoase/urlobject?spam=eggs#foo
>>> print(url.with_auth('alice', '1234'))
https://alice:1234@github.com/zacharyvoase/urlobject?spam=eggs#foo
>>> print(url.with_path('/some_page'))
https://github.com/some_page?spam=eggs#foo
>>> print(url.with_query('funtimes=yay'))
https://github.com/zacharyvoase/urlobject?funtimes=yay#foo
>>> print(url.with_fragment('example'))
https://github.com/zacharyvoase/urlobject?spam=eggs#example
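The same return-a-new-value pattern can be seen in the standard library, sketched here with urlsplit. This is an analogy rather than a description of how URLObject is implemented; the names original and changed are illustrative.

```python
from urllib.parse import urlsplit

original = urlsplit("https://github.com/zacharyvoase/urlobject?spam=eggs#foo")

# _replace() is the namedtuple method that builds a modified copy;
# the original result is left untouched.
changed = original._replace(scheme="http")

print(changed.geturl())
print(original.geturl())
```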
For the query and fragment, without_*() methods also exist:
>>> print(url.without_query())
https://github.com/zacharyvoase/urlobject#foo
>>> print(url.without_fragment())
https://github.com/zacharyvoase/urlobject?spam=eggs
Relative URL Resolution
You can resolve relative URLs against a URLObject using relative():
>>> print(url.relative('another-project'))
https://github.com/zacharyvoase/another-project
>>> print(url.relative('?different-query-string'))
https://github.com/zacharyvoase/urlobject?different-query-string
>>> print(url.relative('#frag'))
https://github.com/zacharyvoase/urlobject?spam=eggs#frag
Absolute URLs will just be returned as-is:
>>> print(url.relative('http://example.com/foo'))
http://example.com/foo
And you can specify as much or as little of the new URL as you like:
>>> print(url.relative('//example.com/foo'))
https://example.com/foo
>>> print(url.relative('/dvxhouse/intessa'))
https://github.com/dvxhouse/intessa
>>> print(url.relative('/dvxhouse/intessa?foo=bar'))
https://github.com/dvxhouse/intessa?foo=bar
>>> print(url.relative('/dvxhouse/intessa?foo=bar#baz'))
https://github.com/dvxhouse/intessa?foo=bar#baz
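The results above match what the standard library's urljoin produces for the same inputs. The sketch below uses only urllib.parse; whether relative() delegates to urljoin internally is an assumption, not something this page states.

```python
from urllib.parse import urljoin

base = "https://github.com/zacharyvoase/urlobject?spam=eggs#foo"

references = [
    "another-project",                # replaces the last path segment
    "?different-query-string",        # keeps the path, replaces the query
    "#frag",                          # keeps path and query, replaces the fragment
    "//example.com/foo",              # keeps only the scheme
    "/dvxhouse/intessa?foo=bar#baz",  # keeps scheme and host
    "http://example.com/foo",         # absolute: returned as-is
]

# Map each reference to its resolved form.
resolved = {ref: urljoin(base, ref) for ref in references}
for ref, result in resolved.items():
    print(ref, "->", result)
```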
Path
The path property is an instance of URLPath, which has several methods and
properties for manipulating the path string:
>>> print(url.path)
/zacharyvoase/urlobject
>>> print(url.path.parent)
/zacharyvoase/
>>> print(url.path.segments)
('zacharyvoase', 'urlobject')
>>> print(url.path.add_segment('subnode'))
/zacharyvoase/urlobject/subnode
>>> print(url.path.root)
/
Some of these are aliased on the URL itself:
>>> print(url.parent)
https://github.com/zacharyvoase/?spam=eggs#foo
>>> print(url.add_path_segment('subnode'))
https://github.com/zacharyvoase/urlobject/subnode?spam=eggs#foo
>>> print(url.add_path('tree/urlobject2'))
https://github.com/zacharyvoase/urlobject/tree/urlobject2?spam=eggs#foo
>>> print(url.root)
https://github.com/?spam=eggs#foo
Query string
The query property is an instance of QueryString, so you can access attributes
on it that give richer representations of the query string:
>>> print(url.query)
spam=eggs
>>> url.query.list # aliased as url.query_list
[('spam', 'eggs')]
>>> url.query.dict # aliased as url.query_dict
{'spam': 'eggs'}
>>> url.query.multi_dict # aliased as url.query_multi_dict
{'spam': ['eggs']}
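The three views only differ once a parameter name repeats. The sketch below illustrates this with the standard library's parse_qsl and parse_qs rather than URLObject; the sample query string is made up, and which value URLObject's dict view keeps for a repeated name is not stated here, so the "last value wins" behaviour shown is that of Python's dict() only.

```python
from urllib.parse import parse_qs, parse_qsl

query = "spam=eggs&spam=ham&foo=bar"  # illustrative input with a repeated name

as_list = parse_qsl(query)   # ordered (name, value) pairs, duplicates kept
as_dict = dict(as_list)      # one value per name; dict() keeps the last one
as_multi = parse_qs(query)   # every value per name, collected into lists

print(as_list)
print(as_dict)
print(as_multi)
```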
Modifying the query string is easy, too. You can ‘add’ or ‘set’ parameters: any
method beginning with add_ will allow you to use the same parameter name
multiple times in the query string; methods beginning with set_ will only
allow one value for a given parameter name. Don’t forget that each method will
return a new QueryString instance, unattached to the original URL:
>>> print(url.query.add_param('spam', 'ham'))
spam=eggs&spam=ham
>>> print(url.query.set_param('spam', 'ham'))
spam=ham
>>> print(url.query.add_params({'spam': 'ham', 'foo': 'bar'}))
spam=eggs&foo=bar&spam=ham
>>> print(url.query.set_params({'spam': 'ham', 'foo': 'bar'}))
foo=bar&spam=ham
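To make the add/set distinction concrete, here is a rough sketch of the two semantics written against the standard library. The helpers add_param and set_param below are hypothetical stand-ins operating on plain strings, not URLObject's implementation, and the parameter ordering they produce may differ from the library's.

```python
from urllib.parse import parse_qsl, urlencode


def add_param(query, name, value):
    """Append a pair, keeping any existing values for the same name."""
    return urlencode(parse_qsl(query) + [(name, value)])


def set_param(query, name, value):
    """Drop existing values for the name, then append the new pair."""
    kept = [(k, v) for k, v in parse_qsl(query) if k != name]
    return urlencode(kept + [(name, value)])


added = add_param("spam=eggs", "spam", "ham")
replaced = set_param("spam=eggs", "spam", "ham")
print(added)
print(replaced)
```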
Delete parameters with del_param() and del_params(). These will remove
any and all appearances of the requested parameter name from the query string,
returning a new query string:
>>> print(url.query.del_param('spam')) # Result is empty
<BLANKLINE>
>>> print(url.query.add_params({'foo': 'bar', 'baz': 'blah'}).del_params(['spam', 'foo']))
baz=blah
Again, some of these methods are aliased on the URLObject directly:
>>> print(url.add_query_param('spam', 'ham'))
https://github.com/zacharyvoase/urlobject?spam=eggs&spam=ham#foo
>>> print(url.set_query_param('spam', 'ham'))
https://github.com/zacharyvoase/urlobject?spam=ham#foo
>>> print(url.del_query_param('spam'))
https://github.com/zacharyvoase/urlobject#foo
Next Steps
Check out the API documentation for a detailed description of all
the properties and methods available on URLObject.