Documentation for pygaR

pygaR is a Python module (with an R package wrapper) that searches the SEC EDGAR system in a variety of ways. It is cross-platform (Linux/UNIX/Windows) and works on both the Python 2 and Python 3 branches.

More information on the SEC EDGAR system can be found here:

https://www.sec.gov/edgar.shtml

This project is a module on PyPI:

https://bcable.net/x/pygaR/pypi

This project’s source code can be found here:

https://bcable.net/x/pygaR

This project’s Sphinx source code can be found here:

https://bcable.net/x/pygaR/sphinx

Requirements

Python 2.7+, 3.3+

PyPI Modules: certifi, pycurl

Installation

Easy installation can be done via the ‘pip’ command:

pip install pygar

Or on Windows:

\path\to\python.exe -m pip install pygar

After installation, a simple ‘import pygar’ statement will get you started.

Usage

pygar.form(path, **kwargs)

Query a single form filing from the SEC EDGAR system.

path – The path of the EDGAR form starting with “edgar/”

Returns a dictionary with two keys, “Header” and “Body”, representing the contents of each. Header contains the SEC filing header information. Body contains a sequence of documents contained in the form.
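As a minimal sketch of working with this return value, the helper below summarizes a result shaped as described above. The sample dictionary is made up for illustration: the header field names are assumptions, and treating Header as a dictionary is also an assumption. The pygar import is deferred so the summary helper can be tried without the module installed.

```python
def summarize_form(result):
    """Summarize a pygar.form()-style result: header field names and document count."""
    header = result["Header"]
    body = result["Body"]
    return {
        "header_fields": sorted(header),  # assumes Header behaves like a dictionary
        "document_count": len(body),      # Body is described as a sequence of documents
    }


def fetch_and_summarize(path):
    """Fetch one filing and summarize it (needs pygar installed and network access)."""
    import pygar  # imported lazily so summarize_form() works without pygar

    return summarize_form(pygar.form(path))


# Hypothetical stand-in for a real result; field names are illustrative only.
sample = {
    "Header": {"COMPANY-NAME": "Example Corp", "FORM-TYPE": "4"},
    "Body": ["<first document>", "<second document>"],
}
summary = summarize_form(sample)
print(summary["document_count"])
```

With pygar installed, fetch_and_summarize() would be called with a path starting with "edgar/".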

pygar.master(**kwargs)

Query a master file or sequence of master files from the SEC EDGAR system.

Master files are indexes into the form filings provided.

Three options are available for querying master data:

Date: Pass ‘date’ as a keyword argument.

Single Quarter: Pass ‘qtr’ as a keyword argument.

Quarter Range: Pass both ‘startqtr’ and ‘endqtr’ as keyword arguments. The quarters in the range are then merged together.

Options:

date – Specific date to query in format 20170504

qtr – Specific quarter to query in format 201702

startqtr – Beginning of a quarter range in format 201701

endqtr – End of a quarter range in format 201702
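The date and quarter formats above can be produced from a datetime.date. This is a small sketch that assumes the quarter format is the four-digit year followed by a two-digit quarter number (so 201702 is read as the second quarter of 2017). The helper names are hypothetical and not part of pygar, and passing the values as strings rather than integers is also an assumption.

```python
import datetime


def edgar_date(day):
    """Format a date as YYYYMMDD, e.g. 20170504."""
    return day.strftime("%Y%m%d")


def edgar_quarter(day):
    """Format the quarter containing `day` as YYYYQQ (assumed layout)."""
    quarter = (day.month - 1) // 3 + 1
    return "{:04d}{:02d}".format(day.year, quarter)


day = datetime.date(2017, 5, 4)

# Keyword arguments for a single-date query.
options = {"date": edgar_date(day)}

# Keyword arguments for a quarter-range query.
range_options = {
    "startqtr": edgar_quarter(datetime.date(2017, 1, 1)),
    "endqtr": edgar_quarter(day),
}
print(options, range_options)
```

These dictionaries could then be passed on as pygar.master(**options) or pygar.master(**range_options).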

Filters:

Filters can be passed as plain strings or as regular expressions. A plain string is interpreted as an exact match. A regular expression is written as ‘/COMPANYNAME/’ or ‘@COMPANYNAME@’ and is compared against the values of the specified field. To make the match case-insensitive, append ‘i’, as in ‘/cOmPaNyNaMe/i’ or ‘@cOmPaNyNaMe@i’.

cik – A CIK is an SEC specific unique company identifier.

company – This searches for a company name.

form – Form filing type such as “4”, “10-K”, “10-Q”, “DEF 14A” (there are hundreds of types; see the EDGAR website for documentation)

date_filed – Date the form was filed.

filename – Filename in the EDGAR system.
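To make the exact-match and regular expression rules for filter values concrete, here is a sketch of how such a filter string could be interpreted. It is an illustration of the documented behaviour, not pygar's actual implementation: make_matcher is a hypothetical name, and letting the pattern match anywhere in the field value (re.search) is an assumption.

```python
import re


def make_matcher(spec):
    """Build a predicate for one filter string, following the rules in the Filters text."""
    for delim in ("/", "@"):
        # A regular expression looks like /PATTERN/ or /PATTERN/i (same with @).
        if spec.startswith(delim) and len(spec) >= 2:
            flags = 0
            body = spec[1:]
            if body.endswith(delim + "i"):
                flags = re.IGNORECASE
                body = body[:-2]
            elif body.endswith(delim):
                body = body[:-1]
            else:
                continue  # no closing delimiter: fall through to exact matching
            pattern = re.compile(body, flags)
            # Assumption: the pattern may match anywhere in the field value.
            return lambda value: pattern.search(value) is not None
    # Plain strings are exact matches.
    return lambda value: value == spec


exact = make_matcher("10-K")
starts_with_10 = make_matcher("/^10-/")
any_case = make_matcher("@apple@i")

print(exact("10-K"), exact("10-K/A"))
print(starts_with_10("10-Q"), any_case("APPLE INC"))
```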

Returns a list. First list item is a list of header names returned. Remaining list items are values of the returned rows.
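Because the first list item holds the header names, a result can be turned into one dictionary per row. The sketch below assumes nothing beyond that layout. The sample rows and header names are invented for illustration, and master_as_dicts is a hypothetical convenience wrapper that needs pygar installed and network access.

```python
def rows_to_dicts(result):
    """Turn [headers, row, row, ...] into a list of {header: value} dictionaries."""
    if not result:
        return []
    headers, rows = result[0], result[1:]
    return [dict(zip(headers, row)) for row in rows]


def master_as_dicts(**kwargs):
    """Run pygar.master() and return its rows as dictionaries."""
    import pygar  # imported lazily so rows_to_dicts() works without pygar

    return rows_to_dicts(pygar.master(**kwargs))


# Hypothetical result; the header names and values are placeholders.
sample = [
    ["cik", "company", "form", "date_filed", "filename"],
    ["0000000001", "Example Corp", "10-K", "2017-05-04", "edgar/example.txt"],
]
records = rows_to_dicts(sample)
print(records[0]["form"])
```

A call such as master_as_dicts(qtr="201702", form="10-K") would then yield one dictionary per matching filing.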

pygar.search(**kwargs)

Performs a query on either the header strings or the XML body of a sequence of filings, and flattens the result.

Selection Options (see pygar.master’s documentation for more information):

date – Specific date to query in format 20170504

qtr – Specific quarter to query in format 201702

startqtr – Beginning of a quarter range in format 201701

endqtr – End of a quarter range in format 201702

Search Options:

header – Boolean indicating whether to search the header information instead of the XML body. Defaults to False.

query – Dictionary representing the query to run. The query format is described under Query Syntax.

Filters (see the Filters section of pygar.master for more information). These narrow down which master file entries are scraped before the forms are fetched and your query is applied:

cik – A CIK is an SEC specific unique company identifier.

company – This searches for a company name.

form – Form filing type such as “4”, “10-K”, “10-Q”, “DEF 14A” (there are hundreds of types; see the EDGAR website for documentation)

date_filed – Date the form was filed.

filename – Filename in the EDGAR system.

Query Syntax:

Queries are a way to prune and extract data from either the header or the XML root of the first XML document. A query first prunes the document based on your criteria, then extracts the data.

If you are parsing a dictionary or an XML document with the structure:

{"root": [{"subelement": {"value": 17}}, {"subelement": {"value": 23}}]}

With XML equivalent:

<root><subelement><value>17</value></subelement><subelement><value>23</value></subelement></root>

To retrieve these values you can use the following query:

{"root": {"subelement": {"value": "elementValue"}}}

This returns a standard pygar.master() result set for each form queried, with an additional header name “elementValue”. Because the query matches two values, the form's row is split into two rows: the first contains 17 as the value of elementValue, and the second contains 23. Note that the “elementValue” text can be whatever you want to name this header field on output.
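The extraction step can be pictured with a small pure-Python sketch. This is only an illustration of the behaviour described here, not pygar's implementation: extract is a hypothetical function, it handles dictionaries rather than real XML, and combining several requested fields by cross product is an assumption.

```python
from itertools import product


def extract(data, query):
    """Return a list of {output_name: value} rows for `query` applied to `data`."""
    if isinstance(data, list):
        # Repeated elements fan out into one set of rows per element.
        rows = []
        for item in data:
            rows.extend(extract(item, query))
        return rows
    if isinstance(query, str):
        # A string leaf names the output column for the value found here.
        return [{query: data}]
    if not isinstance(data, dict):
        return []
    per_key = []
    for key, subquery in query.items():
        if key.startswith("_"):
            continue  # operator keys such as "_prune" are not handled in this sketch
        if key not in data:
            return []
        per_key.append(extract(data[key], subquery))
    # Assumption: several requested fields are combined by cross product.
    combined = []
    for parts in product(*per_key):
        row = {}
        for part in parts:
            row.update(part)
        combined.append(row)
    return combined


document = {"root": [{"subelement": {"value": 17}}, {"subelement": {"value": 23}}]}
query = {"root": {"subelement": {"value": "elementValue"}}}
rows = extract(document, query)
print(rows)
```

The two dictionaries in rows correspond to the two result rows, one per matched value.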

To prune, specify the “_prune” keyword at the base of the element you wish to prune off. For instance with the previous example:

{"root": {"subelement": {"_prune": {"value": {"_lt": 23}}, "value": "elementValue"}}}

This returns only one row, with an elementValue of 17. The “_lt” (less than) operator keeps only the values that are less than 23 and prunes off the rest.

All available pruning operators are as follows:

_eq – equal to operator

_lt – less than operator

_gt – greater than operator

_le – less than or equal operator

_ge – greater than or equal operator

_z – verifies value is an empty string

_n – verifies value is NOT an empty string
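One way to read the operator list is as a table of comparison functions applied to each candidate element. The sketch below is an interpretation, not pygar's code. It assumes that elements satisfying every condition are kept (consistent with the earlier example, where “_lt”: 23 leaves the value 17), and that the operand given to “_z” and “_n” is ignored. The names PRUNE_OPERATORS and keeps are hypothetical.

```python
import operator

# Operator name -> function(value, operand) returning True when the value passes.
PRUNE_OPERATORS = {
    "_eq": operator.eq,
    "_lt": operator.lt,
    "_gt": operator.gt,
    "_le": operator.le,
    "_ge": operator.ge,
    "_z": lambda value, _operand: value == "",   # operand ignored (assumption)
    "_n": lambda value, _operand: value != "",   # operand ignored (assumption)
}


def keeps(element, prune):
    """Return True if `element` satisfies every condition in a "_prune" dictionary."""
    for field, conditions in prune.items():
        if field not in element:
            return False
        value = element[field]
        for name, operand in conditions.items():
            if not PRUNE_OPERATORS[name](value, operand):
                return False
    return True


subelements = [{"value": 17}, {"value": 23}]
prune = {"value": {"_lt": 23}}
kept = [element for element in subelements if keeps(element, prune)]
print(kept)
```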

Returns a list. First list item is a list of header names returned. Remaining list items are values of the returned rows.
