Verbaendeliste-Bundestag Extractor

Use pdftohtml to convert the PDF to an XML file:

pdftohtml -xml input.pdf output.xml

Then run the extractor, passing the first and last relevant page numbers, to convert the XML to parsed JSON:

python extract_lobby.py 4 690 < lobbylist.xml > lobbylist.json
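
For orientation, the page-range selection could be sketched roughly as below. This is not the code in this repository. It assumes the usual pdftohtml -xml layout, in which each page element carries a number attribute and contains text elements. The function name texts_in_page_range and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def texts_in_page_range(xml_string, first_page, last_page):
    """Return the non-empty text snippets found on pages first_page..last_page.

    Assumes pdftohtml-style XML: <page number="N"> elements that
    contain <text> elements (possibly with nested <b>/<i> markup).
    """
    root = ET.fromstring(xml_string)
    snippets = []
    for page in root.iter("page"):
        number = int(page.get("number"))
        if not first_page <= number <= last_page:
            continue  # skip cover pages, index, and so on
        for text in page.iter("text"):
            # itertext() also collects text inside nested <b>/<i> tags
            content = "".join(text.itertext()).strip()
            if content:
                snippets.append(content)
    return snippets


# Hypothetical, heavily shortened sample of pdftohtml -xml output
SAMPLE = """<pdf2xml>
<page number="1"><text top="10" left="10">Cover</text></page>
<page number="2"><text top="10" left="10"><b>Verband A</b></text><text top="20" left="10"> </text></page>
<page number="3"><text top="10" left="10">Verband B</text></page>
<page number="4"><text top="10" left="10">Index</text></page>
</pdf2xml>"""

if __name__ == "__main__":
    print(texts_in_page_range(SAMPLE, 2, 3))
```

Both page numbers are treated as inclusive here, matching the "first and last relevant page" wording above. Turning the collected snippets into structured JSON records is left out of the sketch.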

License: MIT-License
