XML Data
Storing and working with XML data in BlobCity
BlobCity offers exhaustive support for XML storage. It allows storage and retrieval of XML, along with running queries on XML as if the data was in a tabular structure.
A schema inference engine processes the XML document or object being submitted to BlobCity and maps to a tabular schema for appropriate indexing and search-ability. The XML elements are converted to column names, and the values encapsulated within the elements associate with the value of the particular column. Nesting of XML element is permitted with the nested tags forming a column name by concatenating the tags separated by a ‘.’ (dot).
Let us start with an empty table and insert the following XML document into it.
<?xml version="1.0" encoding="UTF-8"?>
<user>
<name>John</name>
<age>45</age>
</user>
The collection after taking the insert of the above XML appears as shown below.
_id | user.name | user.age |
---|---|---|
xxx | John | 45 |
The insert operation results in two columns getting automatically created within the collection. The _id
column is a self created mandatory column in each table, and acts as a way for the database to differentiate between records. The columns created due to insert of the XML are named user.name
and user.age
. Although not visible within a collection schema, the name
and age
columns are actually placed within the user
column. The user
column will not show up in a collection schema but can be used as a column in a select query. The user
column however may not be used as in a where
clause
select `user` from ds.Users
select * from ds.Users where `user` = '....'
The response of all SQL queries is defaulted to a JSON array and is consistent with the response received when JSON was inserted into the table. The XML is automatically mapped to equivalent JSON when running a SQL query. The database however stores the XML, and it can be fetched in its original form if explicitly specified. However the record cannot be retrieved in XML form as part of the SQL search query.
SELECT * FROM ds.Users
[
{
"user": {
"name": "John",
"age": 45
}
}
]
The response is a JSON array with each record being a JSON object entry within the array. A major difference between the working of the JSON and working of the XML document is that the name
and age
value actually gets nested within the user
key as was present in the original document. This may happen to be the case for most XML documents or objects you use. A standard practice with XML encapsulates the entries within a root tag, which results in the nesting.
The root tag and the XML format specification tag is not mandatory for BlobCity to infer the XML document. Let us see the table with an insert without the xml
element at the beginning of the document.
<user>
<name>Tom</name>
<age>26</age>
</user>
_id | user.name | user.age |
---|---|---|
xxx | John | 45 |
xxx | Tom | 26 |
The insert is successful with the collection getting an additional record. The xxx
is the placeholder for auto-defined values, with the actual record containing a unique value that is auto generated by the database.
To explicitly specify the _id
value for any element, the following XML structure needs to be used.
<?xml version="1.0" encoding="UTF-8"?>
<_id>1000</_id>
<user>
<name>Mary</name>
<age>30</age>
</user>
_id | user.name | user.age |
---|---|---|
xxx | John | 45 |
xxx | Tom | 26 |
1000 | Mary | 30 |
The _id
value for the new record will corresponding to the _id
passed at time of insert.
Nested XML's are also supported. An example of a nested XML is shown below.
<?xml version="1.0" encoding="UTF-8"?>
<user>
<name>Stacy</name>
<age>43</age>
<addr>
<line1>line1</line1>
<line2></line2>
<state>IL</state>
<zip>60002</zip>
</addr>
</user>
_id | user.name | user.age | user.addr.line1 | user.addr.line2 | user.addr.state | user.addr |
---|---|---|---|---|---|---|
xxx | John | 45 | ||||
xxx | Tom | 26 | ||||
1000 | Mary | 30 | ||||
xxx | Stacy | 43 | line1 | IL | 60002 |
Collection post insertion of a nested XML
Four new columns are added into the table. These columns are added under a sub-column of user.addr
. While user.addr
is not an actual column that stores data.user.addr.line1
, user.addr.line2
, user.addr.state
, user.addr.zip
are actual columns. However user.addr
can be used in a n SQL query to select all of its nested columns.
The created columns will remain in the schema unless explicitly deleted, even if the record for Stacy is deleted.
No root tag
While XML documents by general convention have a root tag, the same can be skipped when inserting an XML into BlobCity. The database supports interpretation of XML documents that have an XML like structure without a root tag present. The root tag in the XML that we used so far has been the <user></user>
tag.
<?xml version="1.0" encoding="UTF-8"?>
<name>Stacy</name>
<age>43</age>
<addr>
<line1>line1</line1>
<line2></line2>
<state>IL</state>
<zip>60002</zip>
</addr>
_id | name | age | addr.line1 | addr.line2 | addr.state | addr.zip |
---|---|---|---|---|---|---|
xxx | Stacy | 43 | line1 | IL | 60002 |
When the above XML is inserted in a new empty table, the columns created are not under the user
root tag as was previously the case. It is important to note that the structure of this table is exactly same as the table created by inserting an equivalent JSON object.
Arrays in XML
When XML documents have a tag that is repeated, depending on the position of the tag, the inference engine consider occurrence of the tag as separate records, or a collection field within a single record. If the root tag is repeated, and no other tag occurs at the root level, then each root tag and its contents are treated as a separate record. If sub-tags are repeated, or a tag at the root level is repeated, but there are other tags other than the repeated tag at the root level, then contents of each repeated tag are collected into the single column as a collection field.
Updated about 6 years ago