Accessing Remote Resources

Web pages and data

I mentioned before how to access data files on your hard drive, but Python also lets you access remote data, for example on the internet. The easiest way to do this is to use the requests module. To start off, you can simply fetch a URL:

import requests

response = requests.get('http://xkcd.com/353/')

response now holds the server's response. You can access the content as text via the text attribute:

print(response.text[:1000])  # only print the first 1000 characters

<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" type="text/css" href="/s/b0dcca.css" title="Default"/>
<title>xkcd: Python</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge"/>
<link rel="shortcut icon" href="/s/919f27.ico" type="image/x-icon"/>
<link rel="icon" href="/s/919f27.ico" type="image/x-icon"/>
<link rel="alternate" type="application/atom+xml" title="Atom 1.0" href="/atom.xml"/>
<link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="/rss.xml"/>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-25700708-7', 'auto');
ga('send', 'pageview');
</script>
<script type="text/javascript" src="//xkcd.com/1350/jquery.min.js"></script>
<script type="text/javascript" src="//
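Before relying on the text of a response, it can be worth checking that the request actually succeeded. The following is a minimal sketch, not part of the original notebook: summarize_response is a hypothetical helper that relies only on the status_code and text attributes that requests responses provide, and stand-in objects are used so that the example runs without a network connection.

```python
from types import SimpleNamespace


def summarize_response(response, n_chars=60):
    """Return a short description of a response-like object (hypothetical helper)."""
    if response.status_code != 200:
        return "request failed with status {}".format(response.status_code)
    return "OK: " + response.text[:n_chars]


# Stand-ins for what requests.get(...) would return (assumed values for illustration)
ok = SimpleNamespace(status_code=200, text="<!DOCTYPE html>\n<html>...")
missing = SimpleNamespace(status_code=404, text="Not Found")

print(summarize_response(ok, n_chars=15))
print(summarize_response(missing))
```

With a real request you would pass the object returned by requests.get instead of the stand-ins.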

You can either use the response text directly, or in some cases you might want to write it to a file. Let's download one of the full-resolution files for the ice coverage data from Problem Set 9:

r2 = requests.get('http://mpia.de/~robitaille/share/ice_data/20060313.npy')

r2.text[:200]

u"\x93NUMPY\x01\x00F\x00{'descr': '>f4', 'fortran_order': False, 'shape': (1100, 1000), }    \n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\xe5 ;\xff\xff\xff\xffA\xc2PRB;a\x9d\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

However, this doesn't seem to be actual text. Instead, it's a binary format. The binary data of the response can be accessed via the content attribute:

r2.content[:200]

"\x93NUMPY\x01\x00F\x00{'descr': '>f4', 'fortran_order': False, 'shape': (1100, 1000), }    \n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00@\xe5 ;\xff\xff\xff\xffA\xc2PRB;a\x9d\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff"

In Python 3, this output is displayed with a little b at the beginning (b"..."), indicating a byte string; the Python 2 output shown here omits the prefix.

Now we can open a new (binary) file and write the downloaded data to it.

# 'wb' opens the file for writing in binary mode; the with-block closes it automatically
with open('20060313.npy', 'wb') as f:
    f.write(r2.content)
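Before writing a download to disk, it can help to check that the server really returned a Numpy file rather than, say, an HTML error page. The sketch below is an illustration only: looks_like_npy is a hypothetical helper name, and it simply tests for the \x93NUMPY prefix that is visible at the start of the binary output above.

```python
NPY_MAGIC = b"\x93NUMPY"  # prefix seen at the start of the .npy content above


def looks_like_npy(content):
    """Return True if the given bytes start with the .npy prefix (hypothetical helper)."""
    return content.startswith(NPY_MAGIC)


print(looks_like_npy(b"\x93NUMPY\x01\x00F\x00{'descr': '>f4'}"))  # True
print(looks_like_npy(b"<!DOCTYPE html>"))                         # False
```

With a real download you could call looks_like_npy(r2.content) before writing the file.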

Let's now load and plot the data:

import numpy as np
data = np.load('20060313.npy')

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12,12))
plt.imshow(data, origin='lower')

<matplotlib.image.AxesImage at 0x112e03650>

APIs

Imagine that you want to access some data online. In some cases, you will need to download a web page and search through the HTML to extract what you want. For example:

r = requests.get('http://www.wetteronline.de/wetter/heidelberg')

r.text[:1000]

u'<!DOCTYPE html>\n<html>\n<head>\n <title>Wetter Heidelberg - aktuelle Wettervorhersage von WetterOnline</title>\n <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n <meta name="description" content="Das Wetter in Heidelberg - Wettervorhersage f&uuml;r heute, morgen und die kommenden Tage mit Wetterbericht und Regenradar von wetteronline.de" />\n <meta name="keywords" content="Wetter Heidelberg, Baden-W\xfcrttemberg, Deutschland, Wetter , Wettervorhersage, Regenradar, Unwetterwarnung, 14 Tage Wetter, 16 Tage Wetter" />\n <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n <meta http-equiv="content-language" content="de-DE" />\n \n  <meta property="fb:admins" content="100001020190994" />\n <meta property="fb:admins" content="1060016694" />\n\n <meta property="og:title" content="Wetter Heidelberg - aktuelle Wettervorhersage von WetterOnline">\n <meta property="og:type" content="article">\n  <meta name="viewport" content="width=1160">\n <meta property="og:image" content="//st.wette'

This is not ideal because it is messy, and also slow if all you want are a couple of values. A number of websites now offer an "application programming interface" (or API), which is basically a way of accessing data in a machine-readable way. Let's take a look at http://openweathermap.org/ for example, which has an API: http://openweathermap.org/API.

Unfortunately, access to this API is no longer possible without registration (it was when this course was designed), but registering is not difficult. Just sign up at http://openweathermap.org/appid and you will get an account with a key, which is a string like ab435cdf743df24543d322ac1445dc3a.

To access the weather for Heidelberg, you can do:

key = 'ab435cdf743df24543d322ac1445dc3a'   # BUT PLEASE REPLACE THIS FAKE KEY WITH YOUR REAL KEY!
r = requests.get('http://api.openweathermap.org/data/2.5/weather?q=Heidelberg,Germany&APPID='+key)

r.text

u'{"coord":{"lon":8.69,"lat":49.41},"weather":[{"id":802,"main":"Clouds","description":"scattered clouds","icon":"03n"}],"base":"stations","main":{"temp":292.84,"pressure":1011,"humidity":79,"temp_min":290.37,"temp_max":295.37},"wind":{"speed":2.71,"deg":190},"rain":{},"clouds":{"all":44},"dt":1470263053,"sys":{"type":3,"id":10605,"message":0.0523,"country":"DE","sunrise":1470196868,"sunset":1470250832},"id":2907911,"name":"Heidelberg","cod":200}\n'

This is much shorter, but still not ideal for reading into Python as-is. The format above is called JSON, and Python includes a library to easily read in this data:

import json
data = json.loads(r.text)

data

{u'base': u'stations',
 u'clouds': {u'all': 44},
 u'cod': 200,
 u'coord': {u'lat': 49.41, u'lon': 8.69},
 u'dt': 1470263053,
 u'id': 2907911,
 u'main': {u'humidity': 79,
  u'pressure': 1011,
  u'temp': 292.84,
  u'temp_max': 295.37,
  u'temp_min': 290.37},
 u'name': u'Heidelberg',
 u'rain': {},
 u'sys': {u'country': u'DE',
  u'id': 10605,
  u'message': 0.0523,
  u'sunrise': 1470196868,
  u'sunset': 1470250832,
  u'type': 3},
 u'weather': [{u'description': u'scattered clouds',
   u'icon': u'03n',
   u'id': 802,
   u'main': u'Clouds'}],
 u'wind': {u'deg': 190, u'speed': 2.71}}
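To see what json.loads does without making a request, here is a small sketch that parses a shortened copy of the response text shown above. The variable name sample_text is just an offline stand-in for r.text; with a real response, requests also offers r.json() as a shortcut for the same conversion.

```python
import json

# Shortened copy of the response text shown above (offline stand-in for r.text)
sample_text = (
    '{"main":{"temp":292.84,"pressure":1011,"humidity":79},'
    '"name":"Heidelberg","cod":200}'
)

parsed = json.loads(sample_text)  # JSON object -> Python dictionary

print(type(parsed).__name__)                   # dict
print(parsed["name"], parsed["main"]["temp"])  # Heidelberg 292.84
```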

You should now be able to do:

data[u'main'][u'temp']

292.84

It looks like the temperature is in K!
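If you prefer degrees Celsius, the conversion is a simple offset of 273.15. A minimal sketch, where kelvin_to_celsius is a hypothetical helper name and 292.84 is the value from the output above:

```python
def kelvin_to_celsius(temp_k):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return temp_k - 273.15


temp_c = kelvin_to_celsius(292.84)  # value taken from the response above
print(round(temp_c, 2))             # 19.69
```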

Another API example: Astronomy

APIs are everywhere. For instance, there is an astronomical database service called the "virtual observatory", which allows you to get machine-readable data on numerous astronomical objects. At the Astronomisches Rechen-Institut here at Heidelberg University, a team of developers continuously improves this service. One of the teachers of this course, Dr. Markus Demleitner, is one of these developers. He provided the following example of obtaining information about stars that are known to have exoplanets around them.

r = requests.get('http://heasarc.gsfc.nasa.gov/cgi-bin/'
                 'vo/cone/coneGet.pl?table=exoplanets&',
                 params={"RA": 0, "DEC": 90, "SR": 30})
from io import BytesIO
from astropy.table import Table
t = Table.read(BytesIO(r.content), format="votable")
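The params dictionary passed to requests.get above is encoded into the query string of the URL. As a rough offline illustration (assuming Python 3, and using the standard library's urlencode rather than requests itself, so the exact URL that requests builds may differ in detail), the three search parameters become:

```python
from urllib.parse import urlencode

# Same search parameters as in the request above
params = {"RA": 0, "DEC": 90, "SR": 30}

query = urlencode(params)  # key=value pairs joined with '&'
print(query)               # RA=0&DEC=90&SR=30
```

Passing a dictionary this way avoids assembling the query string by hand with string concatenation.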

The object t now contains the coordinates (RA = "right ascension", DEC = "declination") of those stars within a search radius SR (in degrees) of the north celestial pole (RA=0, DEC=90), as well as other information. You can find out which information is there:

print(t.columns)

<TableColumns names=('name','ra','dec','orbital_period','semi_major_axis','star_name','distance','spect_type','Search_Offset')>

and you can print out a single column, for example the semi-major axis:

t.columns["semi_major_axis"]

You can now make a plot of the positions of these stars on the sky, with symbol size representing the exoplanet's semi-major axis (i.e. distance from its host star):

plt.scatter(t.columns["ra"], t.columns["dec"], 
            s=2+4*np.log(t.columns["semi_major_axis"]))

<matplotlib.collections.PathCollection at 0x11bc225d0>

Exercise

You can find over 2000 tiles of the Arctic ice coverage data using URLs of the form:

http://mpia.de/~robitaille/share/ice_data/YYYYMMDD.npy

Write a Python function that takes three arguments (the year, month, and day, as integers) and returns a Numpy array. If the map does not exist, try to return None instead of raising an error:

# your solution here

Try using the function to make a plot, as shown above:

# your solution here
