AEAweb recently changed its website, so Requests-AEAweb no longer works; fixes will be available soon.
Requests-AEAweb is (another) custom Requests class, this time to log in directly to AEAweb.org, the website of the American Economic Association. Requests-AEAweb includes a subclass for directly accessing bibliographic information and PDFs of articles in the American Economic Review.
Installation
Install Requests-AEAweb with pip:
$ pip install requests_aeaweb
To install from source, download the latest version from GitHub.com and run the following command:
$ python setup.py install
I have only ever tested Requests-AEAweb on Python 3.4. If you are on a Mac, Python 2.7 is pre-installed by default; upgrade at python.org and follow the instructions in the Installation section of Textatistic to update the python command link (or just use python3 instead).
Log in
The AEAweb class logs in to AEAweb.org and establishes a connection with the host. Pass your credentials as a dictionary using the login keyword argument; the url attribute then holds the base address of the site. For example:
>>> from requests_aeaweb import AEAweb
>>> deets = {'username': 'someuser', 'password': 'XXXX'}
>>> conn = AEAweb(login=deets)
>>> conn.url
'https://www.aeaweb.org/'
The session attribute references the Requests Session object created by AEAweb, and therefore exposes all the methods of the main Requests API. For example, to get the response code after making a GET request for the webpage of the article "University Differences in the Graduation of Minorities in STEM Fields: Evidence from California":
>>> # conn.url already ends with "/", so don't add another one
>>> url = '{}articles.php'.format(conn.url)
>>> payload = {'doi': '10.1257/aer.20130626'}
>>> response = conn.session.get(url, params=payload)
>>> response.status_code
200
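To see exactly which address a GET request like the one above hits, the URL can be rebuilt with the standard library. This is a sketch, not part of Requests-AEAweb: the helper name article_url is hypothetical, and it uses urljoin so the path comes out right whether or not the base URL ends with a slash. Requests encodes params in essentially the same way, so the "/" inside the DOI is percent-encoded.

```python
from urllib.parse import urlencode, urljoin

def article_url(base, doi):
    """Build the articles.php URL for a DOI (hypothetical helper)."""
    # urljoin handles a base with or without a trailing slash, so no "//" appears
    page = urljoin(base, 'articles.php')
    # urlencode percent-encodes reserved characters such as "/" in the DOI
    return '{}?{}'.format(page, urlencode({'doi': doi}))

print(article_url('https://www.aeaweb.org/', '10.1257/aer.20130626'))
# https://www.aeaweb.org/articles.php?doi=10.1257%2Faer.20130626
```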
Accessing American Economic Review articles
The AER subclass contains the html, pdf and ref methods to download, respectively, the webpage HTML, PDF and bibliographic information of articles published in the American Economic Review. These methods are very similar (in fact, usually identical) to the corresponding methods described in Requests-Raven. A brief description is provided here.
Use the AER subclass to log in to AEAweb.org and establish a connection to the host (this just invokes AEAweb):
>>> from requests_aeaweb import AER
>>> conn = AER(login=deets)
Download PDF
To return the contents of a PDF in bytes, use the pdf method and indicate the document identifier using the id keyword argument. If you also supply file with a file name, the PDF is saved automatically; otherwise, you'll need to save a copy manually using Python's I/O functions.
>>> pdf = conn.pdf(id='10.1257/aer.20130626', file='article.pdf')
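If you leave out file, saving the bytes yourself takes only a couple of lines of standard Python I/O. The helper below is a hypothetical sketch, not part of Requests-AEAweb; it assumes, as described above, that pdf returns the raw PDF as bytes, and it adds a cheap check on the "%PDF" header so that an HTML login or error page is not silently saved under a .pdf name.

```python
from pathlib import Path

def save_pdf(data, path):
    """Write PDF bytes to path (hypothetical helper, not part of Requests-AEAweb)."""
    # Genuine PDF files begin with the "%PDF" magic number; an HTML page
    # returned in its place would not, so refuse to save it.
    if not data.startswith(b'%PDF'):
        raise ValueError('response does not look like a PDF')
    Path(path).write_bytes(data)
    return Path(path)

# Usage, assuming conn is the AER connection created above:
# pdf = conn.pdf(id='10.1257/aer.20130626')
# save_pdf(pdf, 'article.pdf')
```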
AEAweb.org blocks your IP address if you download more than 100 PDFs in a single session. Also, remember that these PDFs are intellectual property owned by the AEA. Please download them only if you are a registered member, and then only for personal use.
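Because of that limit, it is worth capping batch downloads. The sketch below takes any fetch function (for example conn.pdf, assumed to accept id= and return bytes, as described above) and stops once 100 PDFs have been requested in the session. The cap comes from the limit stated above; the function name download_pdfs, the already_fetched counter and the DOI-to-file-name rule are illustrative assumptions.

```python
from pathlib import Path

MAX_PDFS_PER_SESSION = 100  # AEAweb.org blocks IPs that download more than this

def download_pdfs(fetch, dois, folder='.', already_fetched=0):
    """Fetch and save PDFs for dois without exceeding the per-session cap.

    fetch is any callable taking id= and returning PDF bytes (e.g. conn.pdf).
    already_fetched counts PDFs downloaded earlier in the same session.
    Returns the DOIs that were skipped because the cap was reached.
    """
    folder = Path(folder)
    count = already_fetched
    skipped = []
    for doi in dois:
        if count >= MAX_PDFS_PER_SESSION:
            skipped.append(doi)
            continue
        data = fetch(id=doi)
        count += 1
        # DOIs contain "/", which cannot appear in a file name
        (folder / (doi.replace('/', '_') + '.pdf')).write_bytes(data)
    return skipped
```

Returning the skipped DOIs, rather than raising, lets you resume them in a later session, for example with skipped = download_pdfs(conn.pdf, dois, folder='papers').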
Download bibliographic data
The ref method returns a dictionary of bibliographic information on an article. The data include:
Key | Type | Description |
Abstract | string | the article's abstract |
Authors | list | each list element is an author-specific dictionary containing the keys Name (author name) and Affiliation (author affiliation, when available) |
DOI | string | document identifier |
FirstPage | integer | page number of the article's first page |
ISSN | string | journal international standard serial number |
Issue | string | journal issue number |
JEL | list | JEL classification codes |
Journal | string | journal name (obviously the American Economic Review) |
LastPage | integer | page number of the article's last page |
PubDate | string | date (YYYY-MM-DD) the article was published |
Title | string | article's title |
Volume | integer | journal volume number |
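The table above translates directly into a type check, which could help spot a scraper that has silently broken after a site redesign (AEAweb has already changed its website once). A minimal sketch, assuming ref returns a plain dict with exactly these keys; the helper name check_ref is hypothetical.

```python
# Expected keys and types, copied from the table above
EXPECTED_TYPES = {
    'Abstract': str, 'Authors': list, 'DOI': str, 'FirstPage': int,
    'ISSN': str, 'Issue': str, 'JEL': list, 'Journal': str,
    'LastPage': int, 'PubDate': str, 'Title': str, 'Volume': int,
}

def check_ref(ref):
    """Return a list of problems with a ref() dictionary (empty if none)."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in ref:
            problems.append('missing {}'.format(key))
        elif not isinstance(ref[key], expected):
            problems.append('{} is {}, expected {}'.format(
                key, type(ref[key]).__name__, expected.__name__))
    return problems
```

Usage: check_ref(conn.ref(id='10.1257/aer.20150076')) should return an empty list if every field came back as documented.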
As before, use the id keyword argument to supply the relevant document identifier. In the following example, I use ref to get the authors and abstract of the paper "Search Design and Broad Matching":
>>> biblio = conn.ref(id='10.1257/aer.20150076')
>>> biblio['Authors']
[{'Affiliation': 'Tel Aviv U and U MI', 'Name': 'Eliaz, Kfir'}, {'Affiliation': 'Tel Aviv U and U College London', 'Name': 'Spiegler, Ran'}]
>>> biblio['Abstract']
'We study decentralized mechanisms for allocating firms into search pools. The pools are created in response to noisy preference signals provided by consumers, who then browse the pools via costly random sequential search. Surplus-maximizing search pools are implementable in symmetric Nash equilibrium. Full extraction of the maximal surplus is implementable if and only if the distribution of consumer types satisfies a set of simple inequalities, which involve the relative fractions of consumers who like different products and the Bhattacharyya coefficient of similarity between their conditional signal distributions. The optimal mechanism can be simulated by a keyword auction with broad matching.'
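Author names come back as "Last, First" strings, as in the output above. A small helper (hypothetical, not part of Requests-AEAweb) can turn the Authors list into a byline; it splits each Name at the first ", " and leaves names without a comma unchanged.

```python
def byline(authors):
    """Turn ref()['Authors'] into 'First Last, First Last and First Last'."""
    names = []
    for author in authors:
        # 'Eliaz, Kfir' -> ('Eliaz', ', ', 'Kfir')
        last, _, first = author['Name'].partition(', ')
        names.append('{} {}'.format(first, last) if first else last)
    if len(names) <= 1:
        return ''.join(names)
    return '{} and {}'.format(', '.join(names[:-1]), names[-1])

authors = [{'Affiliation': 'Tel Aviv U and U MI', 'Name': 'Eliaz, Kfir'},
           {'Affiliation': 'Tel Aviv U and U College London', 'Name': 'Spiegler, Ran'}]
print(byline(authors))  # Kfir Eliaz and Ran Spiegler
```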
Download HTML
Finally, the html method returns the raw HTML code of an article's webpage. This might be used in conjunction with Beautiful Soup to extract metadata not returned by ref. For example, to fetch the value of the property attribute of the first meta tag on the webpage of "Liquidity Trap and Excessive Leverage":
>>> from bs4 import BeautifulSoup
>>> html = conn.html(id='10.1257/aer.20140289')
>>> soup = BeautifulSoup(html, 'html.parser')
>>> soup.meta['property']
'og:url'
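If Beautiful Soup is not installed, the same lookup can be done with the standard library's html.parser. This is a swapped-in technique for illustration, not how Requests-AEAweb works; it assumes the HTML is a str (decode it first if you have bytes), and the names FirstMetaParser and first_meta_attr are hypothetical.

```python
from html.parser import HTMLParser

class FirstMetaParser(HTMLParser):
    """Record the attributes of the first <meta> tag in a document."""

    def __init__(self):
        super().__init__()
        self.meta_attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; keep only the first meta
        if tag == 'meta' and self.meta_attrs is None:
            self.meta_attrs = dict(attrs)

def first_meta_attr(html, name):
    """Return attribute name of the first meta tag, or None if absent."""
    parser = FirstMetaParser()
    parser.feed(html)
    parser.close()
    if parser.meta_attrs is None:
        return None
    return parser.meta_attrs.get(name)

sample = '<html><head><meta property="og:url" content="x"></head></html>'
print(first_meta_attr(sample, 'property'))  # og:url
```

Unlike soup.meta['property'], which raises KeyError when the attribute is missing, this sketch returns None, so first_meta_attr(conn.html(id='10.1257/aer.20140289'), 'property') can be checked with a simple if.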
License
Copyright 2016 Erin Hengel
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.