XML – Basic Principles

by Silan Liu

Comparing XML with OO Language

Encoding Binary Data

Namespaces

Define a Moniker for a Namespace

Two Fundamental XML Namespaces

Location Information of the Referenced Namespace

Namespace Scope Issues

Simple Types

By Constraint

By List

By Union

Complex Types

XML Schema and Instance

Polymorphism

Substitution Group

More Varieties in Polymorphism

 

Comparing XML with OO Language

XML has many similarities with object-oriented languages such as C#. C# describes two aspects of types, data and behaviour, while XML describes only the data aspect; in that aspect the two are very similar. An in-depth comparison between XML and an OO language such as C# makes XML much easier for programmers to understand.

Both C# and XML allow you to define complex types. The basic building materials C# provides for building complex types are built-in types such as int, float, double, string, DateTime, etc., and keywords used to define or reference types, such as class, struct, enum, using, namespace, etc. In a complex type, each member has a name as well as a type. For example, an Employee type may have three members: two strings named EmployeeName and Address, and one int named Age.

In XML, there are similar built-in types, but no keywords. Both the keywords and the members of complex types are represented by elements. The highest-level types in the XML world are document-level types, represented by schema documents, and the highest-level instances are document-level instances, represented by instance documents. Unlike a C# class, neither a document-level type nor a document-level instance needs a name.

The following schema file says: there is a type/class called “MyDataSet”, which describes a type of document-level instance, which contains one data member called “Quote” of type float:

<?xml version="1.0"?>

<schema id="MyDataSet" xmlns="http://www.w3.org/2001/XMLSchema">

   <element name="Quote" type="float"/>

</schema>

An instance of this type can be:

<?xml version="1.0"?>

<Quote>125.5</Quote>

In C#, every member of a complex type exists in an instance (although its value can be null), while in XML an element can be missing from the instance if you assign “0” to the minOccurs attribute in the element’s definition in the schema. There is one exception: elements directly under the schema element, the top-level elements, do not take a minOccurs attribute and need not all appear in the instance document. Each top-level element is a candidate root, so if a schema contains three top-level elements, a valid instance document can have any one of them as its root element.

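Do not confuse the minOccurs attribute with nillable. minOccurs controls whether the element may be absent; nillable indicates whether an element that is present may carry a null value, which the instance marks by setting xsi:nil to “true”.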
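The difference between an absent element and a null one is easiest to see from the side of the program reading the instance. The following is a minimal sketch in Python, using only the standard library. The Employee document and the element_state helper are made up for illustration, and the sketch assumes a schema that declares Age with minOccurs set to “0” and Address with nillable set to “true”.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Hypothetical instance: Age is left out entirely, Address is present but nil.
doc = """<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Name>Silan Liu</Name>
  <Address xsi:nil="true"/>
</Employee>"""

root = ET.fromstring(doc)


def element_state(parent, name):
    """Classify a child element as 'absent', 'nil' or 'present'."""
    child = parent.find(name)
    if child is None:
        return "absent"   # allowed only if the schema says minOccurs="0"
    if child.get(f"{{{XSI}}}nil") in ("true", "1"):
        return "nil"      # allowed only if the schema says nillable="true"
    return "present"


for name in ("Name", "Address", "Age"):
    print(name, element_state(root, name))
```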

In XML, normally a complex type has a name, but when it is defined within an element as the type of the element, it does not need a name.

Encoding Binary Data

In XML, everything is represented by readable text. How, then, do you put binary data into XML? There are two built-in binary data types in XML, base64Binary and hexBinary, each of which is a string of characters representing a block of binary data. In .NET, you can convert between byte arrays and these two types using XmlTextReader’s ReadBinHex and ReadBase64 methods and XmlTextWriter’s WriteBinHex and WriteBase64 methods.
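The two encodings are not specific to .NET. As a rough illustration, the following Python sketch (standard library only) writes a small byte array into an XML document in both forms and reads it back. The element names Blob, AsBase64 and AsHex are invented for this sketch.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

payload = bytes([0x00, 0xFF, 0x10, 0x80])

# base64Binary: four text characters for every three bytes.
b64_text = base64.b64encode(payload).decode("ascii")
# hexBinary: two hex digits per byte (uppercase here, as in the canonical form).
hex_text = binascii.hexlify(payload).decode("ascii").upper()

root = ET.Element("Blob")  # hypothetical element names
ET.SubElement(root, "AsBase64").text = b64_text
ET.SubElement(root, "AsHex").text = hex_text
xml_text = ET.tostring(root, encoding="unicode")

# Reading the document back and decoding restores the original bytes.
parsed = ET.fromstring(xml_text)
decoded_b64 = base64.b64decode(parsed.findtext("AsBase64"))
decoded_hex = bytes.fromhex(parsed.findtext("AsHex"))
print(xml_text)
```

The hexBinary form is easier to read by eye, while base64Binary is more compact: two characters per byte against roughly four characters per three bytes.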

Namespaces

Both in C# and in XML, you define types in a namespace, and you often need to reference types defined in other namespaces. Namespaces in XML are identified by URIs, often URLs, which do not need to be resolvable.

To define the namespace for the schema file so that all types and elements in this schema file are defined in this namespace:

<?xml version="1.0"?>

<schema targetNamespace="http://www.silan.com.au/Marketing"

...

To reference something defined in another namespace:

<?xml version="1.0"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"

...

This declares a default namespace for all unprefixed element names within the schema element. A namespace can be declared on an element at any level, and the declaration applies to that element and its descendants.
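The effect of a default namespace can be checked with any namespace-aware parser. Here is a small Python sketch (standard library only) that parses a short schema document and prints the fully qualified names the parser sees. The Quote element inside the schema is only there to have something to inspect.

```python
import xml.etree.ElementTree as ET

doc = """<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://www.silan.com.au/Marketing">
  <element name="Quote" type="float"/>
</schema>"""

root = ET.fromstring(doc)

# ElementTree reports element names in {namespace-uri}local-name form.
print(root.tag)      # the schema element, in the XML Schema namespace
print(root[0].tag)   # the nested element inherits the default namespace

# Unprefixed attributes are NOT placed in the default namespace.
print(sorted(root[0].attrib))
```

Note that the default namespace applies to element names only: the name and type attributes stay in no namespace.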

Define a Moniker for a Namespace

If everything referenced in the document is defined in one namespace, then you only need to reference this namespace once, in the schema element. In reality, though, you often need to reference multiple namespaces, and to reference the same namespace multiple times.

To simplify referencing the same namespace multiple times, you can define a moniker (a namespace prefix) in the schema element and use it in place of the namespace from then on. The following example defines a “ProdInfo” element in namespace “urn:Service” using two types (ProductId and AUDollar) defined in two other namespaces:

<?xml version="1.0"?>

<schema xmlns="http://www.w3.org/2001/XMLSchema"

 targetNamespace="urn:Service"

 xmlns:nsmarket="http://www.silan.com.au/marketing"

 xmlns:nsprod="http://www.silan.com.au/production">

   <element name="ProdInfo">

       <complexType>

          <sequence>

             <element name="Product" type="nsprod:ProductId"/>

             <element name="Quote" type="nsmarket:AUDollar"/>

          </sequence>

       </complexType>

   </element>

...

Note the difference between “xmlns=” and “xmlns:moniker=”. The former declares a default namespace that applies to unprefixed names, while the latter merely defines a moniker for a namespace, to be used later.
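A moniker is only a local alias: what identifies an element is the namespace behind it, not the prefix text. The following Python sketch (standard library only) parses three one-element documents that spell the same name in three different ways and collects the qualified names the parser reports. The prefixes m and market are arbitrary choices for this sketch.

```python
import xml.etree.ElementTree as ET

NS = "http://www.silan.com.au/marketing"

documents = [
    f'<m:Quote xmlns:m="{NS}">125.5</m:Quote>',                 # short moniker
    f'<market:Quote xmlns:market="{NS}">125.5</market:Quote>',  # different moniker
    f'<Quote xmlns="{NS}">125.5</Quote>',                       # default namespace
]

# All three spellings collapse to one and the same qualified name.
qualified_names = {ET.fromstring(text).tag for text in documents}
print(qualified_names)
```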

Two Fundamental XML Namespaces

In XML, the built-in types and elements defined for schemas are in namespace “http://www.w3.org/2001/XMLSchema”. Its moniker by convention is xsd. Those for instance documents are in namespace “http://www.w3.org/2001/XMLSchema-instance”. Its conventional moniker is xsi.

Location Information of the Referenced Namespace

Because a namespace is not necessarily resolvable, you can use the schemaLocation attribute to provide a location hint. Its value is a list of pairs, each a namespace followed by the location of a schema for it. The following example references an element defined in namespace “urn:Service” and uses schemaLocation to provide location information about the referenced schema file:

<ProdInfo xmlns="urn:Service"

 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

 xsi:schemaLocation="urn:Service http://www.silan.com.au/Service/Foo.xsd">

   ...

To reference something defined in no namespace:

<Bar xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:Foo.xsd">

   ...
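The schemaLocation value is a whitespace-separated list in which namespaces and locations alternate. A consumer that wants to use the hints could read them as sketched below in Python (standard library only). The document is a self-closed variant of the ProdInfo example above, and the hints dictionary is just one convenient way to hold the pairs.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

doc = """<ProdInfo xmlns="urn:Service"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="urn:Service http://www.silan.com.au/Service/Foo.xsd"/>"""

root = ET.fromstring(doc)

# Split on whitespace, then pair each namespace with the location after it.
tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
hints = dict(zip(tokens[0::2], tokens[1::2]))

# For schemas without a target namespace there is a single location instead.
no_namespace_hint = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(hints, no_namespace_hint)
```

These values are hints only; a processor is free to locate the schema some other way.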

Namespace Scope Issues

Look at the following schema file:

<?xml version='1.0'?>

<schema xmlns='http://www.w3.org/2001/XMLSchema'

 targetNamespace='urn:Service'>

   <element name='Employee'>

      <complexType>

         <sequence>

            <element name='Name' type='string' />

            <element name='Age' type='int' />

         </sequence>

      </complexType>

   </element>

</schema>

Now is the following instance valid?

<?xml version='1.0'?>

<Employee xmlns='urn:Service'>

   <Name>Silan Liu</Name>

   <Age>18</Age>

</Employee>

The answer is no. By default, local elements and attributes declared within a complex type, such as “Name” and “Age”, are unqualified: they do not belong to any namespace. But the default namespace declared in the instance document puts the “Name” and “Age” elements in namespace “urn:Service” as well.
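The mismatch becomes visible when the instance is parsed and the qualified names are printed. The Python sketch below (standard library only) does this for the instance above, and then for a variant that does match unqualified local declarations, in which only the root element carries the namespace. The prefix s in the variant is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# The instance from the text: the default namespace covers the children too.
instance = """<Employee xmlns='urn:Service'>
   <Name>Silan Liu</Name>
   <Age>18</Age>
</Employee>"""
qualified_children = [child.tag for child in ET.fromstring(instance)]
print(qualified_children)

# A variant that fits unqualified local elements: only the root is prefixed.
variant = """<s:Employee xmlns:s='urn:Service'>
   <Name>Silan Liu</Name>
   <Age>18</Age>
</s:Employee>"""
variant_root = ET.fromstring(variant)
unqualified_children = [child.tag for child in variant_root]
print(variant_root.tag, unqualified_children)
```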

To specify that “Name” and “Age” belong to the same namespace as their containing element:

<?xml version='1.0'?>

<schema xmlns='http://www.w3.org/2001/XMLSchema'

 targetNamespace='urn:Service'>

   <element name='Employee'>

      <complexType>

         <sequence>

            <element name='Name' type='string' form='qualified' />

            <element name='Age' type='int' form='qualified' />

         </sequence>

      </complexType>

   </element>

</schema>

To specify that all local elements and attributes across the schema document are qualified, assign “qualified” to the schema element’s elementFormDefault and attributeFormDefault attributes:

<?xml version='1.0'?>

<schema xmlns='http://www.w3.org/2001/XMLSchema'

 targetNamespace='urn:Service'

 elementFormDefault='qualified' attributeFormDefault='qualified'>

   <element name='Employee'>

      <complexType>

         <sequence>

            <element name='Name' type='string' />

            <element name='Age' type='int' />

         </sequence>

      </complexType>

   </element>

</schema>

Simple Types

Between built-in types and complex types that contain members of other types, XML allows you to define simple types, which are derived from built-in types by constraint, by list, or by union.

By Constraint

The constraints (facets) that can be applied to built-in types are length, minLength, maxLength, pattern, enumeration, whiteSpace, minInclusive, minExclusive, maxInclusive, maxExclusive, totalDigits and fractionDigits. Not every facet applies to every base type; the length facets, for instance, apply to strings and lists but not to numbers. Some examples:

<simpleType name="EmployeeId">

   <restriction base="int">

       <totalDigits value="4"/>

       <minInclusive value="0"/>

   </restriction>

</simpleType>

 

<simpleType name="Department">

   <restriction base="string">

       <enumeration value="Production"/>

       <enumeration value="Marketing"/>

       <enumeration value="R&amp;D"/>

       <enumeration value="HR"/>

       <enumeration value="Purchasing"/>

   </restriction>

</simpleType>

By List

Deriving by list differs from deriving by enumeration in that an instance of the list type can contain several items of the item type, while an instance of an enumeration contains exactly one. The itemType attribute names the type of each item:

<simpleType name="GroupComposition">

   <list itemType="Department" />

</simpleType>

Items in the list are whitespace-delimited, so “Production HR” is a list of two items.
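Reading a list value therefore comes down to splitting on whitespace and checking each item against the item type. A minimal Python sketch follows, assuming the items are drawn from the Department enumeration shown earlier; the function name parse_group_composition is hypothetical.

```python
# Values of the Department enumeration, as they appear after XML parsing
# (the &amp; in the schema is the single character & in the parsed value).
DEPARTMENTS = {"Production", "Marketing", "R&D", "HR", "Purchasing"}


def parse_group_composition(text):
    """Split a whitespace-delimited list value and validate every item."""
    items = text.split()  # any run of spaces, tabs or newlines separates items
    unknown = [item for item in items if item not in DEPARTMENTS]
    if unknown:
        raise ValueError(f"not a Department: {unknown}")
    return items


print(parse_group_composition("Production  R&D\n HR"))
```

One consequence of this rule is that an item value can never contain whitespace itself, which is why list item types are usually short tokens.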

By Union

<simpleType name="MyUnion1">

   <union memberTypes="string int float" />

</simpleType>

<simpleType name="MyUnion2">

   <union>

       <simpleType ... />

       <simpleType ... />

       ...

   </union>

</simpleType>

Complex Types

Complex types are defined with the complexType element and are made up of member elements, which can be of any type. Every complex type derives from a base type, either by restriction or by extension. The derivation is specified with a restriction or extension element and its base attribute. If neither is specified, the type by default derives by restriction from anyType. The content type of the complex type can be specified by a simpleContent or complexContent element; if neither is specified, it is complexContent by default. The sequence element indicates that the member elements must appear in the order given, just like in a C# class.

Example:

<complexType name='Family'>

   <sequence>

      <element name='Parent' type='string' minOccurs='1' maxOccurs='2' />

      <element name='Child' type='string' minOccurs='0' />

   </sequence>

</complexType>

 

<complexType name='SingleParentFamily'>

   <complexContent>

      <restriction base='Family'>

         <sequence>

            <element name='Parent' type='string' minOccurs='1' maxOccurs='1' />

            <element name='Child' type='string' minOccurs='0' />

         </sequence>

      </restriction>

   </complexContent>

</complexType>

 

<complexType name='BiggerFamily'>

   <complexContent>

      <extension base='Family'>

         <sequence>

            <!-- Parent and Child are inherited from Family; an extension lists only the added members -->

            <element name='Tenant' type='string' minOccurs='0' />

         </sequence>

      </extension>

   </complexContent>

</complexType>

 

<complexType name='int'>

   <simpleContent>

      <extension base='int'>

          <attribute name='id' type='ID' />

          <attribute name='href' type='anyURI' />

      </extension>

   </simpleContent>

</complexType>

XML Schema and Instance

Look at the following schema document “DS.xsd”:

<?xml version="1.0" encoding="utf-8" ?>

<xs:schema id="NewDataSet" elementFormDefault="qualified" attributeFormDefault="qualified" xmlns="http://tempuri.org/DS.xsd" xmlns:mstns="http://tempuri.org/DS.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">

   <xs:element name="NewDataSet" msdata:IsDataSet="true">

      <xs:complexType>

          <xs:choice maxOccurs="unbounded">

             <xs:element name="Table">

              <xs:complexType>

                 <xs:sequence>

                    <xs:element name="EmployeeID" msdata:ReadOnly="true" msdata:AutoIncrement="true" type="xs:int" />

                    <xs:element name="LastName" type="xs:string" />

                    <xs:element name="FirstName" type="xs:string" />

                 </xs:sequence>

              </xs:complexType>

             </xs:element>

          </xs:choice>

      </xs:complexType>

      <xs:unique name="DSKey1" msdata:PrimaryKey="true">

          <xs:selector xpath=".//mstns:Employees" />

          <xs:field xpath="mstns:EmployeeID" />

      </xs:unique>

   </xs:element>

</xs:schema>

and the following instance document “Employees.xml”:

<?xml version="1.0" standalone="yes"?>

<NewDataSet>

  <Table>

    <EmployeeID>1</EmployeeID>

    <LastName>Davolio</LastName>

    <FirstName>Nancy</FirstName>

    <Title>Sales Representative</Title>

  </Table>

  <Table>

    <EmployeeID>8</EmployeeID>

    <LastName>Callahan</LastName>

    <FirstName>Laura</FirstName>

    <Title>Inside Sales Coordinator</Title>

  </Table>

  <Table>

    <EmployeeID>9</EmployeeID>

    <LastName>Dodsworth</LastName>

    <FirstName>Anne</FirstName>

    <Title>Sales Representative</Title>

  </Table>

</NewDataSet>

and the following code in C#:

mds = new DataSet();

mds.ReadXmlSchema("DS.xsd");

mds.ReadXml("Employees.xml");

mdg.DataSource = mds;

mdg.DataMember = mds.Tables[0].TableName;

where mdg is a DataGrid, which will be populated with the Employees rows. This indicates that the data complies with the schema. If we change any of the element names or data types to make them non-compliant, reading the instance file will throw an exception.

Polymorphism

Just as in any OO language, polymorphism in XML means that where the schema expects a base type, the instance can contain a derived type instead. This rests on the fact mentioned above: in XML, all types are derived from some base type by restriction or extension.

If you consider the base type to contain insufficient information and want to forbid the base type to have an instance, assign “true” to complexType’s attribute abstract.

<?xml version='1.0'?>

<schema xmlns='http://www.w3.org/2001/XMLSchema'

 targetNamespace='http://www.someaccountant.com/service'

 xmlns:tns='http://www.someaccountant.com/service'

 elementFormDefault='qualified' attributeFormDefault='qualified'>

   <complexType name='Family' abstract='true'>

      <sequence>

         <element name='Parent' type='string' maxOccurs='2' />

         <element name='Child' type='string' minOccurs='0' />

      </sequence>

   </complexType>

   <complexType name='FamilyForTaxPurposes'>

      <complexContent>

          <extension base='tns:Family'>

             <sequence>

                <!-- Parent and Child are inherited from Family -->

                <element name='FamilyIncome' type='float' />

                <element name='Address' type='string' minOccurs='0' />

             </sequence>

          </extension>

      </complexContent>

   </complexType>

   <!-- Defines an element -->

   <element name='TheFamily' type='tns:Family' />

</schema>

The instance document could be:

<?xml version='1.0'?>

<someacc:TheFamily xsi:type='someacc:FamilyForTaxPurposes'

 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'

 xmlns:someacc='http://www.someaccountant.com/service' >

   <someacc:Parent>Frank</someacc:Parent>

   <someacc:Parent>Yang</someacc:Parent>

   <someacc:FamilyIncome>12345.67</someacc:FamilyIncome>

</someacc:TheFamily>
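The value of xsi:type is a qualified name, so a consumer has to resolve its moniker against the namespace declarations in scope before it knows which derived type the instance claims to be. The following Python sketch (standard library only) does this for the root element. The function name resolve_root_type is hypothetical, the document is a shortened copy of the instance above, and the sketch only looks at declarations made on the root element itself.

```python
import io
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

doc = b"""<someacc:TheFamily xsi:type='someacc:FamilyForTaxPurposes'
 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
 xmlns:someacc='http://www.someaccountant.com/service'>
   <someacc:Parent>Frank</someacc:Parent>
</someacc:TheFamily>"""


def resolve_root_type(xml_bytes):
    """Return (namespace, local name) of the root's xsi:type, or None."""
    prefixes = {}
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            # Namespace declarations are reported before the element's start.
            prefix, uri = payload
            prefixes[prefix] = uri
        else:
            # First "start" event: this is the root element.
            declared = payload.get(XSI_TYPE)
            if declared is None:
                return None
            prefix, _, local = declared.rpartition(":")
            return (prefixes.get(prefix), local)
    return None


resolved_type = resolve_root_type(doc)
print(resolved_type)
```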

Substitution Group

An alternative is to use a substitution group for polymorphic behaviour:

<?xml version='1.0'?>

<schema xmlns='http://www.w3.org/2001/XMLSchema'

 targetNamespace='http://www.someaccountant.com/service'

 xmlns:tns='http://www.someaccountant.com/service'

 elementFormDefault='qualified' attributeFormDefault='qualified'>

   <complexType name='Family' abstract='true'>

      <sequence>

         <element name='Parent' type='string' maxOccurs='2' />

         <element name='Child' type='string' minOccurs='0' />

      </sequence>

   </complexType>

   <complexType name='FamilyForTaxPurposes'>

      <complexContent>

          <extension base='tns:Family'>

             <sequence>

                <!-- Parent and Child are inherited from Family -->

                <element name='FamilyIncome' type='float' />

                <element name='Address' type='string' minOccurs='0' />

             </sequence>

          </extension>

      </complexContent>

   </complexType>

   <!-- Defines a substitution group: an abstract head element and a member that can stand in for it -->

   <element name='FamilySub' type='tns:Family' abstract='true' />

   <element name='FamilyForTaxPurposesSub' type='tns:FamilyForTaxPurposes' substitutionGroup='tns:FamilySub' />

   <!-- Content models refer to the head element, e.g. <element ref='tns:FamilySub' /> inside a sequence -->

</schema>

The instance document could be:

<?xml version='1.0'?>

<someacc:FamilyForTaxPurposesSub

 xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'

 xmlns:someacc='http://www.someaccountant.com/service' >

   <someacc:Parent>Frank</someacc:Parent>

   <someacc:Parent>Yang</someacc:Parent>

   <someacc:FamilyIncome>12345.67</someacc:FamilyIncome>

</someacc:FamilyForTaxPurposesSub>

More Varieties in Polymorphism

To disallow both inheritance by extension and restriction, assign “#all” to complexType’s attribute final. To disallow inheritance by restriction, assign “restriction” to final. To disallow inheritance by extension, assign “extension” to final.

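To disallow a certain kind of derived type from appearing in the instance document in place of the base type, assign “restriction”, “extension” or “#all” to complexType’s attribute block. The block attribute of an element declaration additionally accepts “substitution”, which disallows members of its substitution group.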