Erin Hengel

Requests-AEAweb

AEAweb recently changed its website so Requests-AEAweb no longer works; fixes available soon.

Requests-AEAweb is (another) custom Requests class, this time to log directly onto AEAweb.org, the website of the American Economic Association. Requests-AEAweb includes a subclass to directly access bibliographic information and PDFs from the American Economic Review.

Installation

Install Requests-AEAweb with pip:

$ pip install requests_aeaweb

To install from source, download the latest version from GitHub and run the following command from the project directory:

$ python setup.py install

I have only tested Requests-AEAweb on Python 3.4. Macs ship with Python 2.7 pre-installed; upgrade at python.org and follow the instructions in the Installation section of Textatistic to update the python command link (or just use python3 instead).

Log in

The AEAweb class logs onto AEAweb.org and establishes a connection with the host. The session attribute returns a Requests Session object with all the methods of the main Requests API. For example:

>>> from requests_aeaweb import AEAweb
>>> deets = {'username': 'someuser', 'password': 'XXXX'}
>>> conn = AEAweb(login=deets)
>>> conn.url
'https://www.aeaweb.org/'
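
If you would rather not type your password into a script, one option is to build the login dictionary from environment variables. This is only a sketch: load_login and the variable names AEAWEB_USERNAME and AEAWEB_PASSWORD are my own inventions for illustration and are not part of Requests-AEAweb.

```python
import os


def load_login(env=None):
    """Build the login dictionary from environment variables.

    AEAWEB_USERNAME and AEAWEB_PASSWORD are hypothetical names chosen
    for this sketch; Requests-AEAweb does not define them.
    """
    env = os.environ if env is None else env
    try:
        return {'username': env['AEAWEB_USERNAME'],
                'password': env['AEAWEB_PASSWORD']}
    except KeyError as missing:
        raise RuntimeError('Environment variable {} is not set'.format(missing))


# Demonstration with a stand-in mapping instead of the real environment.
deets = load_login({'AEAWEB_USERNAME': 'someuser', 'AEAWEB_PASSWORD': 'XXXX'})
print(sorted(deets))
```

The resulting dictionary has the same shape as deets above, so it can be passed as AEAweb(login=deets).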

Because session is the original Requests Session object created by AEAweb, all of the Session methods are available. For example, to get the status code of a GET request for the webpage of the article "University Differences in the Graduation of Minorities in STEM Fields: Evidence from California":

>>> url = '{}articles.php'.format(conn.url)  # conn.url already ends in '/'
>>> payload = {'doi': '10.1257/aer.20130626'}
>>> response = conn.session.get(url, params=payload)
>>> response.status_code
200
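
To see roughly which address a request like this ends up asking for, here is a standard-library sketch that joins the base URL to the page name and appends the payload as a query string. It does not use Requests or touch the network, and Requests does its own encoding internally, so treat the result as an approximation of what is sent rather than a guarantee.

```python
from urllib.parse import urlencode, urljoin

base = 'https://www.aeaweb.org/'          # the value of conn.url shown earlier
payload = {'doi': '10.1257/aer.20130626'}

# urljoin avoids a doubled slash when the base already ends in '/'.
url = urljoin(base, 'articles.php')

# The payload is percent-encoded and appended as a query string.
full_url = '{}?{}'.format(url, urlencode(payload))
print(full_url)
```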

Accessing American Economic Review articles

The AER subclass contains the html, pdf and ref methods to download the webpage HTML, PDF and bibliographic information of articles published in the American Economic Review, respectively. These methods are very similar (in fact, usually identical) to the corresponding methods described in Requests-Raven. A brief description is provided here.

Use the AER subclass to log into AEAweb.org and establish a connection to the host (this just invokes AEAweb):

>>> from requests_aeaweb import AER
>>> conn = AER(login=deets)

Download PDF

To return the contents of a PDF in bytes, use the pdf method and indicate the document identifier using the id keyword argument. If you also supply file with a file name, the PDF is automatically saved for you; otherwise, you'll need to manually save a copy of it using Python's I/O functions.

>>> pdf = conn.pdf(id='10.1257/aer.20130626', file='article.pdf')
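
If you leave out file, saving the returned bytes yourself takes only a few lines. The sketch below uses placeholder bytes in place of a real download so that it runs without a connection; save_pdf is a hypothetical helper, not something Requests-AEAweb provides.

```python
import os
import tempfile


def save_pdf(content, path):
    """Write PDF bytes to path and return the number of bytes written."""
    # 'wb' matters here: the content is binary, not text.
    with open(path, 'wb') as handle:
        return handle.write(content)


# Stand-in bytes; in practice this would be the value returned by conn.pdf(id=...).
pdf = b'%PDF-1.4\n% placeholder content for the sketch\n'
target = os.path.join(tempfile.gettempdir(), 'article.pdf')
written = save_pdf(pdf, target)
print(written == len(pdf))
```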

If you download more than 100 PDFs in a single session, AEAweb.org blocks your IP address, so keep that in mind. Also, remember that these PDFs are intellectual property owned by the AEA. Please only download them if you are a registered member and then only for personal use.
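
One way to stay under that limit is to route every download through a small counter that refuses to continue once a budget is used up. This is a sketch under my own assumptions: DownloadBudget is a hypothetical wrapper, it counts every attempt (I am guessing the server counts requests rather than successes), and the demonstration uses a fake fetcher instead of a real connection.

```python
class DownloadBudget:
    """Refuse further downloads once a per-session limit is reached."""

    def __init__(self, fetch, limit=100):
        self.fetch = fetch      # any callable that takes a document identifier
        self.limit = limit
        self.count = 0

    def get(self, doc_id):
        if self.count >= self.limit:
            raise RuntimeError('Download limit of {} reached'.format(self.limit))
        self.count += 1         # count the attempt, even if the fetch later fails
        return self.fetch(doc_id)


# Demonstration with a fake fetcher and a tiny limit.
budget = DownloadBudget(lambda doc_id: b'%PDF', limit=2)
budget.get('first')
budget.get('second')
try:
    budget.get('third')
except RuntimeError as error:
    print(error)
```

With a real connection, the fetcher could be something like lambda doc_id: conn.pdf(id=doc_id).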

Download bibliographic data

The ref method returns a dictionary of bibliographic information on an article. The data include:

Abstract (string): the article's abstract
Authors (list): each list element is an author-specific dictionary containing the keys Name (author name) and Affiliation (author affiliation, when available)
DOI (string): document identifier
FirstPage (integer): page number of the article's first page
ISSN (string): journal international standard serial number
Issue (string): journal issue number
JEL (list): JEL classification codes
Journal (string): journal name (obviously the American Economic Review)
LastPage (integer): page number of the article's last page
PubDate (string): date (YYYY-MM-DD) the article was published
Title (string): the article's title
Volume (integer): journal volume number

As before, use the id keyword argument to fill in the relevant document identifier. In the following example, I use ref to get the authors and abstract of the paper "Search Design and Broad Matching":

>>> biblio = conn.ref(id='10.1257/aer.20150076')
>>> biblio['Authors']
[{'Affiliation': 'Tel Aviv U and U MI', 'Name': 'Eliaz, Kfir'}, {'Affiliation': 'Tel Aviv U and U College London', 'Name': 'Spiegler, Ran'}]
>>> biblio['Abstract']
'We study decentralized mechanisms for allocating firms into search pools. The pools are created in response to noisy preference signals provided by consumers, who then browse the pools via costly random sequential search. Surplus-maximizing search pools are implementable in symmetric Nash equilibrium. Full extraction of the maximal surplus is implementable if and only if the distribution of consumer types satisfies a set of simple inequalities, which involve the relative fractions of consumers who like different products and the Bhattacharyya coefficient of similarity between their conditional signal distributions. The optimal mechanism can be simulated by a keyword auction with broad matching.'
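
Since ref returns an ordinary dictionary, turning it into something like a short citation is straightforward. The sketch below works on a hand-built dictionary that copies the authors, title and DOI quoted in this section, so it runs without logging in; short_citation is a hypothetical helper and the output format is just one I picked.

```python
def short_citation(biblio):
    """Return a one-line citation built from a ref-style dictionary."""
    # Names arrive as 'Surname, Given'; keep the surname only.
    surnames = [author['Name'].split(',')[0].strip()
                for author in biblio.get('Authors', [])]
    pieces = []
    if surnames:
        pieces.append(' and '.join(surnames))
    if biblio.get('Title'):
        pieces.append('"{}"'.format(biblio['Title']))
    if biblio.get('DOI'):
        pieces.append('doi:{}'.format(biblio['DOI']))
    return ', '.join(pieces)


# Hand-built stand-in for the dictionary returned by conn.ref(id=...).
sample = {
    'Authors': [
        {'Affiliation': 'Tel Aviv U and U MI', 'Name': 'Eliaz, Kfir'},
        {'Affiliation': 'Tel Aviv U and U College London', 'Name': 'Spiegler, Ran'},
    ],
    'Title': 'Search Design and Broad Matching',
    'DOI': '10.1257/aer.20150076',
}
print(short_citation(sample))
```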

Download HTML

Finally, the html method returns the raw HTML code from an article's webpage. This might be used in conjunction with Beautiful Soup to search metadata not found in ref. For example, to fetch the value of the property attribute for the first meta tag on "Liquidity Trap and Excessive Leverage":

>>> from bs4 import BeautifulSoup
>>> html = conn.html(id='10.1257/aer.20140289')
>>> soup = BeautifulSoup(html, 'html.parser')
>>> soup.meta['property']
'og:url'
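
If you want every meta tag rather than just the first, or would rather not install Beautiful Soup, the standard library's html.parser can collect them. This is a sketch only: MetaCollector is a hypothetical helper, the HTML is a made-up stand-in for what html returns, and I am assuming html returns a string.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect property/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        attributes = dict(attrs)
        # Skip meta tags that carry no property attribute (e.g. name=...).
        if 'property' in attributes:
            self.properties[attributes['property']] = attributes.get('content')


# Hypothetical HTML standing in for the string returned by conn.html(id=...).
sample_html = (
    '<html><head>'
    '<meta property="og:url" content="https://example.org/article">'
    '<meta property="og:type" content="article">'
    '<meta name="viewport" content="width=device-width">'
    '</head><body></body></html>'
)

collector = MetaCollector()
collector.feed(sample_html)
print(sorted(collector.properties))
```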

License

Copyright 2016 Erin Hengel

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.