GPO makes available XML bulk downloads of House bills
The announcement marks the aggregation in one place of House bill information in XML format rather than the release of any new information; the GPO has for a couple of years already made XML-formatted House bills available individually, Sunlight Foundation policy counsel Daniel Schuman noted in a blog post.
"It is a right step, but it isn't a huge improvement, it's an incremental improvement," Schuman said in a brief interview.
Data associated with House bills posted online at the official Library of Congress legislative tracking website THOMAS, such as current status and Congressional Research Service summary, won't be available in the XML-formatted data from the GPO.
Third-party websites that seek to mirror THOMAS in a more accessible, user-friendly and feature-rich manner must still resort to screen scraping to gather that related data.
Nonetheless, the measure is a significant policy development, Schuman said, since it shows a determination in the House to push for making legislative information available in a machine-readable format. Through docs.house.gov, the House Clerk makes the bills to be considered each week available in XML, and only this month started publishing committee information as well.
In a user's guide (.pdf), GPO addresses the matter of data authenticity, an issue that critics have held up as an obstacle to dissemination via XML. The GPO-distributed XML files are not digitally signed, the guide says, but the integrity of a House bill XML file can be verified "by checking its SHA-256 hash value against the hash value recorded in the PREMIS metadata file for each bill on FDsys."
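For readers who want to see what such a check might look like in practice, the following is a minimal Python sketch, not GPO's own tooling. It assumes the bill's XML file has already been downloaded and that the SHA-256 value has been copied by hand from the bill's PREMIS metadata file; the function names `sha256_of_file` and `matches_recorded_hash` are hypothetical, and parsing the PREMIS file itself is deliberately left out.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks so that large files need not fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_hash(path, recorded_hex):
    """Compare a file's SHA-256 digest with a recorded hex value.

    `recorded_hex` is assumed to be the hash string taken from the
    PREMIS metadata; surrounding whitespace and letter case are ignored.
    """
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

Calling `matches_recorded_hash` with a local file path and the recorded hash string returns `True` only if the two digests agree. A mismatch would suggest the file was altered or corrupted after the hash was recorded, though unlike a digital signature the check says nothing about who published the file.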
- Download GPO's announcement of the bulk downloads (.pdf)