by Silan Liu
Comparing XML with OO Language
Define a Moniker for a Namespace
Two Fundamental XML Namespaces
Location Information of the Referenced Namespace
More Varieties in Polymorphism
XML as a language has many similarities with object-oriented languages such as C#.
C# describes two aspects of types, data and behaviour, while XML describes only the
data aspect. In that respect XML is very similar to C#, and an in-depth comparison
between an OO language such as C# and XML makes XML much easier for programmers to
understand.
Both C# and XML allow you to define complex types. The basic building blocks C#
provides for complex types are built-in types such as int, float, double,
string and DateTime, and keywords that are used to define or reference types,
such as class, struct, enum, using and namespace. In a complex type, members
have names as well as types. For example, an Employee type may have three
members, two strings and one int, named EmployeeName, Address and Age.
In XML, there are similar built-in types, but no keywords. Both the role of
keywords and the members of complex types are represented by elements. The
highest-level types in the XML world are document-level types, represented by
schema documents, and the highest-level instances are document-level instances,
represented by instance documents. Unlike a C# class, neither a document-level
type nor a document-level instance needs a name.
The
following schema file says: there is a type/class called “MyDataSet”, which
describes a type of document-level instance, which contains one data member
called “Quote” of type float:
<?xml version="1.0"?>
<schema id="MyDataSet" xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="Quote" type="float" />
</schema>
An instance
of this type can be:
<?xml version="1.0"?>
<Quote>125.5</Quote>
In C#, every member of a complex type exists in each instance (although its
value can be null). In XML, an element can be missing from the instance if its
declaration in the schema sets the minOccurs attribute to “0”.
Top-level elements, those declared directly under the schema element, work
differently: they take no minOccurs attribute, and an instance document has
exactly one root element, which can be any one of them. In other words, if a
schema declares three top-level elements, a valid instance document has one of
the three as its root.
Do not confuse the minOccurs attribute with nillable, which indicates whether
the element may be present with a null value. A null element is marked in the
instance with the attribute xsi:nil set to “true”.
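The distinction between a missing element and a null one is easy to see from the consuming side. The following is a minimal sketch in Python using only the standard library; the instance document, the member names and the helper `read_member` are made up for illustration and are not part of any schema in this article.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xsi:nil under its fully expanded name.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

# Hypothetical instance: Name has a value, Age is explicitly null,
# and Address is absent altogether.
DOCUMENT = """
<Employee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Name>Silan Liu</Name>
  <Age xsi:nil="true" />
</Employee>
"""

def read_member(parent, name):
    """Classify a member element as 'missing', 'null' or 'value'."""
    child = parent.find(name)
    if child is None:
        # The element does not occur at all (allowed when minOccurs is 0).
        return ("missing", None)
    if child.get(XSI_NIL) in ("true", "1"):
        # The element occurs but carries no value (allowed when nillable).
        return ("null", None)
    return ("value", child.text)

employee = ET.fromstring(DOCUMENT)
for member in ("Name", "Age", "Address"):
    print(member, read_member(employee, member))
```

The sketch only reads the document; whether a missing or null element is actually permitted is decided by minOccurs and nillable in the schema.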
In XML,
normally a complex type has a name, but when it is defined within an element as
the type of the element, it does not need a name.
In XML, everything is represented by readable text in angle brackets, so how do
you put binary data into XML? There are two built-in binary data types,
base64Binary and hexBinary, each of which is a string of characters
representing a block of binary data. In .NET you can convert between byte
arrays and these two types with XmlTextReader’s ReadBinHex and ReadBase64
methods and XmlTextWriter’s WriteBinHex and WriteBase64 methods.
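To show what the two encodings look like in an element, here is a small sketch in Python using only the standard library. It does not use the .NET methods named above; the element name `Photo` and the helpers `wrap_binary` and `unwrap_binary` are hypothetical.

```python
import base64
import binascii
import xml.etree.ElementTree as ET

def wrap_binary(tag, payload, encoding="base64"):
    """Return an element whose text is the payload as base64Binary or hexBinary."""
    element = ET.Element(tag)
    if encoding == "base64":
        element.text = base64.b64encode(payload).decode("ascii")
    else:
        # hexBinary uses two hexadecimal digits per byte.
        element.text = binascii.hexlify(payload).decode("ascii").upper()
    return element

def unwrap_binary(element, encoding="base64"):
    """Decode the element text back into the original bytes."""
    if encoding == "base64":
        return base64.b64decode(element.text)
    return binascii.unhexlify(element.text)

payload = bytes([0, 1, 2, 254, 255])
as_base64 = wrap_binary("Photo", payload, "base64")
as_hex = wrap_binary("Photo", payload, "hex")
print(ET.tostring(as_base64, encoding="unicode"))
print(ET.tostring(as_hex, encoding="unicode"))
```

Base64 is the more compact of the two (four characters per three bytes against two characters per byte), which is why it is the usual choice for larger payloads.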
Both in C#
and in XML, you define types in a namespace, and you often need to reference
types defined in other namespaces. The namespaces in XML are often represented
by URLs which are not necessarily resolvable.
To define
the namespace for the schema file so that all types and elements in this schema
file are defined in this namespace:
<?xml version="1.0"?>
<schema targetNamespace="http://www.silan.com.au/Marketing"
...
To
reference something defined in another namespace:
<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema"
...
This declares a default namespace for all unprefixed names within the schema
element. A namespace declaration can appear on an element at any level, and it
applies to that element and its descendants.
If everything referenced in the document is defined in one namespace, you only
need to reference that namespace once, in the schema element. In reality you
often need to reference several namespaces, and the same namespace several
times.
To simplify referencing the same namespace multiple times, you can define a
moniker (a namespace prefix) in the schema element and use it in place of the
namespace from then on. The following example defines a “ProdInfo” element in
namespace “urn:Service” using two types (ProductId and AUDollar) defined in two
other namespaces:
<?xml version="1.0"?>
<schema
    xmlns="http://www.w3.org/2001/XMLSchema"
    targetNamespace="urn:Service"
    xmlns:nsmarket="http://www.silan.com.au/marketing"
    xmlns:nsprod="http://www.silan.com.au/production">
  <element name="ProdInfo">
    <complexType>
      <sequence>
        <element name="Product" type="nsprod:ProductId" />
        <element name="Quote" type="nsmarket:AUDollar" />
      </sequence>
    </complexType>
  </element>
...
Note the difference between “xmlns=” and “xmlns:moniker=”. The former makes a
namespace the default for unprefixed names, while the latter merely defines a
moniker for a namespace, to be used later.
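The effect of the two forms can be observed with any namespace-aware parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which reports every element name as `{namespace}localname`; the document and the names `urn:Default` and `urn:Moniker` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one default namespace and one moniker.
DOCUMENT = """
<root xmlns="urn:Default" xmlns:m="urn:Moniker">
  <plain />
  <m:tagged />
</root>
"""

root = ET.fromstring(DOCUMENT)
names = [root.tag] + [child.tag for child in root]
for name in names:
    # Unprefixed names land in the default namespace,
    # prefixed names in the namespace bound to the moniker.
    print(name)
```

The moniker itself never shows up in the expanded names; only the namespace it stands for does, which is why two documents can use different monikers for the same namespace.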
In XML, the
built-in types and elements defined for schemas are in namespace “http://www.w3.org/2001/XMLSchema”.
Its moniker by convention is xsd. Those for instance documents are in
namespace “http://www.w3.org/2001/XMLSchema-instance”. Its conventional
moniker is xsi.
Because a namespace is not necessarily resolvable, you can use the
schemaLocation attribute to give a hint about where the schema is located. Its
value is a pair: the namespace, then the location. The following example
references an element defined in namespace “urn:Service” and uses
schemaLocation to provide location information about the referenced schema file:
<ProdInfo
    xmlns="urn:Service"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:Service http://www.silan.com.au/Service/Foo.xsd">
...
To reference a schema whose definitions are in no namespace:
<Bar
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="file:Foo.xsd">
...
Look at the
following schema file:
<?xml version='1.0'?>
<schema
    xmlns='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:Service'>
  <element name='Employee'>
    <complexType>
      <sequence>
        <element name='Name' type='string' />
        <element name='Age' type='int' />
      </sequence>
    </complexType>
  </element>
</schema>
Now is the
following instance valid?
<?xml version='1.0'?>
<Employee xmlns='urn:Service'>
  <Name>Silan Liu</Name>
  <Age>18</Age>
</Employee>
The answer is no, because local elements and attributes declared inside a
complex type, such as “Name” and “Age”, are unqualified by default and so do
not belong to any namespace. In the instance document, however, the default
namespace declaration places both “Name” and “Age” in namespace “urn:Service”.
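A namespace-aware parser makes the mismatch visible. The sketch below (Python standard library only; the helper `child_names` and the `svc` moniker are made up) compares the instance from the text with a variant in which only the root element is qualified.

```python
import xml.etree.ElementTree as ET

# The instance from the text: the default namespace also covers the children.
INSTANCE_WITH_DEFAULT_NS = """
<Employee xmlns="urn:Service">
  <Name>Silan Liu</Name>
  <Age>18</Age>
</Employee>
"""

# A variant where only the root is qualified, through a moniker.
INSTANCE_WITH_PREFIXED_ROOT = """
<svc:Employee xmlns:svc="urn:Service">
  <Name>Silan Liu</Name>
  <Age>18</Age>
</svc:Employee>
"""

def child_names(document):
    """Return the expanded names of the root element's children."""
    root = ET.fromstring(document)
    return [child.tag for child in root]

print(child_names(INSTANCE_WITH_DEFAULT_NS))
print(child_names(INSTANCE_WITH_PREFIXED_ROOT))
```

In the first instance the children carry the namespace, which contradicts unqualified local declarations. In the second they carry none, which is the shape such a schema expects.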
To specify
that “Name” and “Age” belong to the
same namespace as their containing element:
<?xml version='1.0'?>
<schema
    xmlns='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:Service'>
  <element name='Employee'>
    <complexType>
      <sequence>
        <element name='Name' type='string' form='qualified' />
        <element name='Age' type='int' form='qualified' />
      </sequence>
    </complexType>
  </element>
</schema>
Alternatively, set the default for the whole schema with the elementFormDefault
and attributeFormDefault attributes:
<?xml version='1.0'?>
<schema
    xmlns='http://www.w3.org/2001/XMLSchema'
    targetNamespace='urn:Service'
    elementFormDefault='qualified'
    attributeFormDefault='qualified'>
  <element name='Employee'>
    <complexType>
      <sequence>
        <element name='Name' type='string' />
        <element name='Age' type='int' />
      </sequence>
    </complexType>
  </element>
</schema>
Between built-in types and complex types that contain members of other types,
XML lets you define simple types, which are derived from built-in types by
constraint (restriction), by list, or by union.
The constraints that can be applied to built-in types are length, minLength,
maxLength, pattern, enumeration, whiteSpace, minInclusive, minExclusive,
maxInclusive, maxExclusive, totalDigits and fractionDigits. Some examples:
<simpleType name="EmployeeId">
  <restriction base="int">
    <!-- Length facets do not apply to numeric types; totalDigits limits the digit count -->
    <totalDigits value="4" />
    <minInclusive value="0" />
  </restriction>
</simpleType>
<simpleType name="Department">
  <restriction base="string">
    <enumeration value="Production" />
    <enumeration value="Marketing" />
    <enumeration value="R&amp;D" />
    <enumeration value="HR" />
    <enumeration value="Purchasing" />
  </restriction>
</simpleType>
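To make the effect of such facets concrete, here is a rough Python sketch of the checks a validator would apply to the two types above. It assumes the intent of EmployeeId is a non-negative integer of at most four digits. The function names are hypothetical, and a real validator implements the full XML Schema lexical rules, which this sketch does not.

```python
import re

# Plain decimal integer with an optional sign.
_INTEGER = re.compile(r"[+-]?[0-9]+")

# The enumeration values of the Department type.
DEPARTMENTS = {"Production", "Marketing", "R&D", "HR", "Purchasing"}

def is_valid_employee_id(text):
    """Assumed rule: an integer from 0 up to 9999 (at most four digits)."""
    text = text.strip()
    if not _INTEGER.fullmatch(text):
        return False
    return 0 <= int(text) <= 9999

def is_valid_department(text):
    """An enumeration accepts only the listed values, compared exactly."""
    return text in DEPARTMENTS

for candidate in ("42", "12345", "-1", "abc"):
    print(candidate, is_valid_employee_id(candidate))
for candidate in ("HR", "Sales"):
    print(candidate, is_valid_department(candidate))
```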
Deriving by list differs from deriving by enumeration in that an instance of
the list type can contain several items, while an instance of an enumeration
type contains exactly one. The itemType attribute names the simple type that
each item must conform to:
<simpleType name="GroupComposition">
  <list itemType="Department" />
</simpleType>
Items in
the list are white space-delimited.
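Reading a list value therefore amounts to splitting on white space and checking each item against the item type. A minimal Python sketch, assuming the items must be department names as in the enumeration earlier; the function name `parse_group_composition` is hypothetical.

```python
# Allowed item values, taken from the Department enumeration.
DEPARTMENTS = {"Production", "Marketing", "R&D", "HR", "Purchasing"}

def parse_group_composition(text):
    """Split a list value on white space and check every item."""
    items = text.split()  # any run of spaces, tabs or newlines separates items
    unknown = [item for item in items if item not in DEPARTMENTS]
    if unknown:
        raise ValueError("not a Department: " + ", ".join(unknown))
    return items

print(parse_group_composition("Production  R&D\nHR"))
```

One consequence of the white-space rule is that an item value can never contain a space itself, so a type whose values contain spaces is a poor choice of item type.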
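Deriving by union means that an instance can be a value of any one of the
member types, which are either named in the memberTypes attribute or defined
inline:
<simpleType name="MyUnion1">
  <union memberTypes="string int float" />
</simpleType>
<simpleType name="MyUnion2">
  <union>
    <simpleType ... />
    <simpleType ... />
    ...
  </union>
</simpleType>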
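A union value is acceptable when at least one member type accepts it. The Python sketch below illustrates the idea with two members, int and float, and uses Python's own parsers as a loose stand-in for the XML Schema lexical rules. The member list and the function name are assumptions for illustration. string is left out because it would accept every value.

```python
def first_matching_member(text, members):
    """Return the name of the first member type that accepts the text, or None."""
    for name, parse in members:
        try:
            parse(text)
        except ValueError:
            continue  # this member rejects the value, so try the next one
        return name
    return None

# Hypothetical member types, tried in declaration order.
MEMBERS = [("int", int), ("float", float)]

for candidate in ("3", "3.5", "abc"):
    print(candidate, first_matching_member(candidate, MEMBERS))
```

The members are tried in declaration order, so putting a very permissive type such as string first would hide all the others.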
Complex types are defined with the complexType element and are made up of
member elements, which can be of any type. Every complex type derives from a
base type, either by restriction or by extension. The derivation is specified
with the restriction or extension element and its base attribute. If neither is
specified, the type by default derives by restriction from anyType. The content
type of the complex type can be specified with the simpleContent or
complexContent element; if not specified, it is complexContent by default. The
sequence element indicates that the member elements must appear in the declared
order, just like in a C# class.
Example:
<complexType name='Family'>
  <sequence>
    <element name='Parent' type='string' minOccurs='1' maxOccurs='2' />
    <element name='Child' type='string' minOccurs='0' />
  </sequence>
</complexType>
<complexType name='SingleParentFamily'>
  <complexContent>
    <!-- A restriction repeats the content model with tighter constraints -->
    <restriction base='Family'>
      <sequence>
        <element name='Parent' type='string' minOccurs='1' maxOccurs='1' />
        <element name='Child' type='string' minOccurs='0' />
      </sequence>
    </restriction>
  </complexContent>
</complexType>
<complexType name='BiggerFamily'>
  <complexContent>
    <!-- An extension lists only the added members; Parent and Child are inherited -->
    <extension base='Family'>
      <sequence>
        <element name='Tenant' type='string' minOccurs='0' />
      </sequence>
    </extension>
  </complexContent>
</complexType>
<complexType name='int'>
  <simpleContent>
    <extension base='int'>
      <attribute name='id' type='ID' />
      <attribute name='href' type='anyURI' />
    </extension>
  </simpleContent>
</complexType>
Look at the
following schema document “DS.xsd”:
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="NewDataSet" elementFormDefault="qualified" attributeFormDefault="qualified" xmlns="http://tempuri.org/DS.xsd" xmlns:mstns="http://tempuri.org/DS.xsd" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="NewDataSet" msdata:IsDataSet="true">
<xs:complexType>
<xs:choice maxOccurs="unbounded">
<xs:element name="Table">
<xs:complexType>
<xs:sequence>
<xs:element name="EmployeeID" msdata:ReadOnly="true" msdata:AutoIncrement="true" type="xs:int" />
<xs:element name="LastName" type="xs:string" />
<xs:element name="FirstName" type="xs:string" />
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
<xs:unique name="DSKey1" msdata:PrimaryKey="true">
<xs:selector xpath=".//mstns:Employees" />
<xs:field xpath="mstns:EmployeeID" />
</xs:unique>
</xs:element>
</xs:schema>
and the
following instance document “Employees.xml”:
<?xml version="1.0" standalone="yes"?>
<NewDataSet>
<Table>
<EmployeeID>1</EmployeeID>
<LastName>Davolio</LastName>
<FirstName>Nancy</FirstName>
<Title>Sales Representative</Title>
</Table>
<Table>
<EmployeeID>8</EmployeeID>
<LastName>Callahan</LastName>
<FirstName>Laura</FirstName>
<Title>Inside Sales Coordinator</Title>
</Table>
<Table>
<EmployeeID>9</EmployeeID>
<LastName>Dodsworth</LastName>
<FirstName>Anne</FirstName>
<Title>Sales Representative</Title>
</Table>
</NewDataSet>
and the
following code in C#:
mds = new DataSet();
mds.ReadXmlSchema("DS.xsd");    // load the schema first
mds.ReadXml("Employees.xml");   // then read the data into that schema
mdg.DataSource = mds;
mdg.DataMember = mds.Tables[0].TableName;
where mdg is a DataGrid, which will be populated with the Employees rows. This
indicates that the data complies with the schema. If we change an element name
or data type so that the data no longer complies, reading the instance file
will throw an exception.
Just like in any OO language, polymorphism in XML means that where the schema expects a base type, the instance can contain a derived type instead. This rests on the fact mentioned above: in XML, every type is derived from some base type by restriction or extension.
If you consider the base type to contain insufficient information and want to forbid instances of it, assign “true” to the abstract attribute of its complexType.
<?xml version='1.0'?>
<schema
    xmlns='http://www.w3.org/2001/XMLSchema'
    targetNamespace='http://www.someaccountant.com/service'
    xmlns:tns='http://www.someaccountant.com/service'
    elementFormDefault='qualified'
    attributeFormDefault='qualified'>
  <complexType name='Family' abstract='true'>
    <sequence>
      <element name='Parent' type='string' maxOccurs='2' />
      <element name='Child' type='string' minOccurs='0' />
    </sequence>
  </complexType>
  <complexType name='FamilyForTaxPurposes'>
    <complexContent>
      <!-- Parent and Child are inherited; only the new members are listed -->
      <extension base='tns:Family'>
        <sequence>
          <element name='FamilyIncome' type='float' />
          <element name='Address' type='string' minOccurs='0' />
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <!-- Defines an element -->
  <element name='TheFamily' type='tns:Family' />
</schema>
The
instance document could be:
<?xml version='1.0'?>
<someacc:TheFamily xsi:type='someacc:FamilyForTaxPurposes'
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:someacc='http://www.someaccountant.com/service'>
  <someacc:Parent>Frank</someacc:Parent>
  <someacc:Parent>Yang</someacc:Parent>
  <someacc:FamilyIncome>12345.67</someacc:FamilyIncome>
</someacc:TheFamily>
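A consumer that wants to know which derived type an instance actually uses has to read the xsi:type attribute and resolve its moniker against the namespace declarations in scope. The following Python sketch does this for the root element using only the standard library. The function name `declared_type` is hypothetical and the document is a shortened form of the instance above.

```python
import io
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOCUMENT = """
<someacc:TheFamily xsi:type="someacc:FamilyForTaxPurposes"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:someacc="http://www.someaccountant.com/service">
  <someacc:Parent>Frank</someacc:Parent>
</someacc:TheFamily>
"""

def declared_type(document):
    """Return (namespace, local name) of the root's xsi:type, or None."""
    prefixes = {}
    stream = io.BytesIO(document.encode("utf-8"))
    for event, payload in ET.iterparse(stream, events=("start-ns", "start")):
        if event == "start-ns":
            # Namespace declarations are reported before the element's start.
            prefix, uri = payload
            prefixes[prefix] = uri
            continue
        # The first "start" event belongs to the root element.
        qname = payload.get("{%s}type" % XSI)
        if qname is None:
            return None
        prefix, _, local = qname.rpartition(":")
        return prefixes.get(prefix), local
    return None

print(declared_type(DOCUMENT))
```

The value of xsi:type is a qualified name stored as plain text, so the parser does not expand it automatically; that is why the sketch collects the moniker bindings itself.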
An alternative is to use a substitution group for polymorphic behaviour:
<?xml version='1.0'?>
<schema
    xmlns='http://www.w3.org/2001/XMLSchema'
    targetNamespace='http://www.someaccountant.com/service'
    xmlns:tns='http://www.someaccountant.com/service'
    elementFormDefault='qualified'
    attributeFormDefault='qualified'>
  <complexType name='Family' abstract='true'>
    <sequence>
      <element name='Parent' type='string' maxOccurs='2' />
      <element name='Child' type='string' minOccurs='0' />
    </sequence>
  </complexType>
  <complexType name='FamilyForTaxPurposes'>
    <complexContent>
      <!-- Parent and Child are inherited; only the new members are listed -->
      <extension base='tns:Family'>
        <sequence>
          <element name='FamilyIncome' type='float' />
          <element name='Address' type='string' minOccurs='0' />
        </sequence>
      </extension>
    </complexContent>
  </complexType>
  <!-- Defines a substitution group: an abstract head element and one member -->
  <element name='FamilySub' type='tns:Family' abstract='true' />
  <element name='FamilyForTaxPurposesSub'
      type='tns:FamilyForTaxPurposes' substitutionGroup='tns:FamilySub' />
  <!-- A content model refers to the head element like this (ref is only
       allowed inside a complex type, not directly under schema):
       <element ref='tns:FamilySub' /> -->
</schema>
The
instance document could be:
<?xml version='1.0'?>
<someacc:FamilyForTaxPurposesSub
    xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xmlns:someacc='http://www.someaccountant.com/service'>
  <someacc:Parent>Frank</someacc:Parent>
  <someacc:Parent>Yang</someacc:Parent>
  <someacc:FamilyIncome>12345.67</someacc:FamilyIncome>
</someacc:FamilyForTaxPurposesSub>
To disallow both inheritance by extension and inheritance by restriction,
assign “#all” to the final attribute of complexType. To disallow only
inheritance by restriction, assign “restriction” to final. To disallow only
inheritance by extension, assign “extension” to final.
To prevent types derived in a certain way from being used in place of the base
type in an instance document, assign “restriction”, “extension” or “#all” to
the block attribute of complexType. The block attribute of element additionally
accepts “substitution”, which blocks members of a substitution group.