Parsing XML Files with Python: Good Coding Practice

The idea for this post came from building a Python application to parse some XML. The XML file is an Alteryx Workflow file; the application was conceived to check the attribute values and text of certain nodes to ensure proper usage of Alteryx Tools.

Alteryx is a data analytics platform.

As the code was being built, it felt like good practice for getting the hang of Python's data types and how to manipulate them. XML data is a common sight, and it is helpful to know how to work with it.

By the way, the XPath support in xml.etree.ElementTree is elementary. The supported syntax is listed on the documentation page:

https://docs.python.org/3/library/xml.etree.elementtree.html

This article is not a full walkthrough of the code; it shows only some of the snippets that were used. For newcomers to coding, it can seem like there are so many things to learn and do that option paralysis kicks in. Coding is generally about manipulating data types to achieve some end goal (grossly oversimplifying), so a practice exercise like this one helps with learning the ins and outs of a new coding language and is highly useful.

Along the same lines, one could practice building some code to consume JSON data from an API endpoint, or data from a SQL database, and then do something with that data.


Here are the modules used for the exercise:

To begin, we have to initialize the tree and then get the root.
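As a minimal sketch of this step, assuming the standard library's xml.etree.ElementTree is the parser in use (the documentation link above points to it): the XML below is a hypothetical, heavily simplified stand-in for a workflow file, and its tag and attribute names are illustrative assumptions rather than the real file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a workflow file (names are assumptions).
SAMPLE_XML = """<AlteryxDocument yxmdVer="2020.1">
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="AlteryxBasePluginsGui.DbFileInput.DbFileInput" />
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="AlteryxBasePluginsGui.Filter.Filter" />
    </Node>
  </Nodes>
</AlteryxDocument>"""

# For a file on disk, ET.parse("some_workflow_file") returns the tree directly.
# Here the tree is built from an in-memory string so the sketch is self-contained.
tree = ET.ElementTree(ET.fromstring(SAMPLE_XML))

# The root element is the entry point for every later query.
root = tree.getroot()

print(root.tag, root.attrib)
```

Everything else hangs off root: its tag and attributes describe the document, and its children are reached through the XPath queries described next.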

From here on, we use XPath to obtain either all matches with findall() or only the first match with find().
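The difference between the two calls can be sketched as follows. The sample XML and its tag names are hypothetical; only find() and findall() themselves come from xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag and attribute names are illustrative assumptions.
root = ET.fromstring("""<AlteryxDocument>
  <Nodes>
    <Node ToolID="1"><GuiSettings Plugin="Input" /></Node>
    <Node ToolID="2"><GuiSettings Plugin="Filter" /></Node>
  </Nodes>
</AlteryxDocument>""")

# findall() returns a list with every matching element (empty if none match).
all_nodes = root.findall(".//Node")

# find() returns only the first matching element.
first_node = root.find("./Nodes/Node")

# An attribute predicate narrows the match to a single tool.
filter_node = root.find(".//Node[@ToolID='2']")

# find() returns None when nothing matches, so check before using the result.
missing = root.find(".//Node[@ToolID='99']")

print(len(all_nodes), first_node.get("ToolID"), missing)
```

Because find() can return None, guarding the result before calling .get() or .text on it avoids an AttributeError on workflows that lack the expected node.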



One of the helper functions to get the data we want from the XML tree:

The function is called with different arguments each time.
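A helper of this kind might look like the sketch below. The function name get_values, its parameters, and the sample XML are all hypothetical and stand in for the original code, which is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag and attribute names are illustrative assumptions.
root = ET.fromstring("""<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="Input" />
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="Filter" />
      <Properties>
        <Configuration><Expression>[Amount] = 100</Expression></Configuration>
      </Properties>
    </Node>
  </Nodes>
</AlteryxDocument>""")


def get_values(root, xpath, attribute=None):
    """Hypothetical helper: collect one value from every element matching xpath.

    With attribute set, return that attribute from each match (None if absent);
    otherwise return each match's stripped text.
    """
    matches = root.findall(xpath)
    if attribute is None:
        return [(element.text or "").strip() for element in matches]
    return [element.get(attribute) for element in matches]


# Same function, different arguments each time.
tool_ids = get_values(root, ".//Node", "ToolID")
plugins = get_values(root, ".//GuiSettings", "Plugin")
expressions = get_values(root, ".//Configuration/Expression")

print(tool_ids, plugins, expressions)
```

Keeping the XPath and the attribute name as parameters means one function covers most lookups, which is the point of calling it with different arguments.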



Another of the helper functions extracts a nested dictionary (a dictionary within a dictionary). We map a dictionary as the value of an outer dictionary. The key of the outer dictionary is the argument we passed in; in this case it is a tool ID.

By using the function, the queried tool ID and its matching values are mapped together.
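The nested-dictionary idea can be sketched as below. The function name tool_details, the inner keys, and the sample XML are hypothetical choices made for illustration, not the original implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag and attribute names are illustrative assumptions.
root = ET.fromstring("""<AlteryxDocument>
  <Nodes>
    <Node ToolID="1">
      <GuiSettings Plugin="Input" />
    </Node>
    <Node ToolID="2">
      <GuiSettings Plugin="Filter" />
      <Properties>
        <Configuration><Expression>[Amount] = 100</Expression></Configuration>
      </Properties>
    </Node>
  </Nodes>
</AlteryxDocument>""")


def tool_details(root, tool_id):
    """Hypothetical helper: return {tool_id: {...values found for that tool...}}."""
    node = root.find(f".//Node[@ToolID='{tool_id}']")
    if node is None:
        # Unknown tool ID: keep the outer key but map it to an empty dictionary.
        return {tool_id: {}}
    gui = node.find("GuiSettings")
    inner = {
        "plugin": gui.get("Plugin") if gui is not None else None,
        # findtext() returns None when the path does not exist under this node.
        "expression": node.findtext("Properties/Configuration/Expression"),
    }
    return {tool_id: inner}


# Merge the result for several tool IDs into one outer dictionary.
workflow = {}
for tool_id in ("1", "2"):
    workflow.update(tool_details(root, tool_id))

print(workflow["2"]["expression"])
```

Keying the outer dictionary by tool ID lets a later check look up one tool directly, for example workflow["2"]["plugin"], instead of searching the tree again.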


The code is still a work in progress, and maybe it will go up on a GitHub repo someday once it is done.

















