Moving to XML
Author and presenter: Simon
Brooke.
The full text of this presentation is online at <URL:
http://www.jasmine.org.uk/~simon/bookshelf/courses/xml/>
Written May 2001; last revised 30th June 2004
Note: the original version of this text had
a horrible error of understanding in the use of the
org.w3c.dom package. My apologies to all those
misled by it. This remains, essentially, a three-year-old
course; I have fixed the major error but have not otherwise
brought it up to date. Bits of it are still worth reading.
Changes to the presentation since your handouts
were printed are highlighted like this.
Simon Brooke, 21
Main Street, Auchencairn, DG7 1QU, Scotland.
What we're going to do today
-
What is XML
- A brief chautauqua on language
- What is XML
- A bit about the other bits
- XML in your context
-
Anatomy of an XML system
- Specifying
- Creating
- Transforming
- Communicating
How we're going to get there
- The morning, mostly talking
- This afternoon, mostly doing it. We have an awful lot to
get in!
Breaks, meals, fire exits, nearest WCs, how
to get to coffee?
Before we start: what do you know
We've got a lot of ground to cover... Just so I can know
which bits to concentrate on and which bits to skip, what do
you know about...
|      | nothing | a little | can do it | expert |
| HTML |         |          |           |        |
| Java |         |          |           |        |
| XML  |         |          |           |        |
Before we start: Namespace
- A context in which there are things with names
- Each thing in the namespace has a different name
- You can address a thing in the namespace just by using
its name
-
Examples
- This is a powerful concept, and I'm going to use it a lot
- but it has special meaning in XML. Be clear about when I'm
talking about an 'XML namespace' and when I'm just
talking about a namespace!
[get participants to write their names
on bits of paper. Make sure there are not two in the class with
the same name. If there are, get them to add something to their
names to disambiguate]
A brief chautauqua on language
Words
We can recognise words as belonging to a
language because we know them... (sometimes, we can recognise
words as belonging to a language even when we don't know them,
because they sound right).
Sentences
-
Colourless green ideas sleep furiously.
-
Development state with join material and.
Language
- In any given language, we can easily recognise what is a
well formed component of the language.
- And what is not...
Words
-
Fish
- Peske
- Choiremheir
- Chautauqua
- Gkprtwcv
- P7ajo
Although language families have rules
about what can be in a word and what can't, it's much harder to
tell whether a word is valid or not, unless we know which
language we're looking at.
Sentences
- This is not a pipe.
- Ceci n'est pas une pipe.
- Chagco vet nici yan toube.
- GGGGGG #000007 cabala.
Meta-Language [1]
- Within a family of languages, we can recognise what is a
well-formed component of some language
- or might be...
- and what certainly isn't.
Meta-Language [2]
-
In Indo-European languages,
-
a word has at least one vowel
- words don't have more than four consonants in
succession
- a sentence is a succession of words
- a sentence starts with a capital letter and ends with
a period
- There is an (implicit) meta-grammar.
HTML [1]
-
<address>
- A valid HTML tag (in HTML 4.0
Transitional)
-
<cotton>
HTML [2]
- HTML is a language (albeit a simple one).
- It's a markup language, and I hope it's one you're all
familiar with.
- we can know at once whether a tag is a valid HTML tag or
not...
- and what it means...
- and how it should be used...
Well Formedness
-
When we know what the language is we can parse ill-formed
forms:
- because we can predict what the missing bits are
-
and where they should be:
- I have been there, and
I have done that.
What is XML
- Key features
- Differences from HTML
- Differences from SGML
- A bit about the other bits
- Reality check
Key Features
A universal, application-independent framework for the
communication of semantically rich structured information
between software agents.
-
A language for describing other languages
- Which describe the structure of a document
- Not the visual appearance (CSS, XSL)
- Written in plain Unicode text (a much larger character set which
replaces ASCII, usually encoded as UTF-8 or UTF-16)
Differences from HTML
-
A Metalanguage: In a word, extensible.
- HTML can be (and has been)
reimplemented as an XML dialect
- Also, strictly parsed.
Extensible: what does this mean for you?
- Allows you to define new markup.
- Describing structure, not
appearance.
- Makes it easier for programs to extract
information from your documents.
Extensible: a simple example [1]
<?xml version="1.0"?>
<!DOCTYPE meeting PUBLIC "-//WEFT//DTD MEETING 0.1//EN"
"meeting.dtd">
<meeting id="June Board Meeting">
<venue>
28 Forth Street, Edinburgh
</venue>
<invitees>
<attendee attendance="required"
meeting-role="convenor">
<name>
Simon Brooke
</name>
<position>
Technical Director
</position>
</attendee>
<attendee attendance="required">
<name>Angela Stormont</name>
<position>
Communications Director
</position>
</attendee>
</invitees>
</meeting>
Extensible: a simple example [2]
- What does this do?
- For the user directly, very little.
- For the user's program, it allows it to isolate items of
structured information and handle them in intelligent ways to
help the user.
- But only if the user's program understands the special
markup you have defined.
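To make 'isolate items of structured information' concrete, here is a minimal sketch in Java using the standard JAXP DOM parser. The class name MeetingReader and the choice of question ('who is required to attend?') are my own illustration, not part of any toolkit, and the DOCTYPE line is left out of the embedded copy of the document so that the parser does not go looking for meeting.dtd.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: pulls structured facts out of the meeting document.
class MeetingReader {
    // a compact copy of the example meeting document (no DOCTYPE, see above)
    static final String MEETING =
        "<?xml version=\"1.0\"?>"
        + "<meeting id=\"June Board Meeting\">"
        + "<venue>28 Forth Street, Edinburgh</venue>"
        + "<invitees>"
        + "<attendee attendance=\"required\" meeting-role=\"convenor\">"
        + "<name>Simon Brooke</name><position>Technical Director</position>"
        + "</attendee>"
        + "<attendee attendance=\"required\">"
        + "<name>Angela Stormont</name>"
        + "<position>Communications Director</position>"
        + "</attendee>"
        + "</invitees></meeting>";

    /** names of all attendees whose attendance attribute is 'required' */
    static List<String> requiredAttendees(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            NodeList attendees = doc.getElementsByTagName("attendee");
            for (int i = 0; i < attendees.getLength(); i++) {
                Element attendee = (Element) attendees.item(i);
                if ("required".equals(attendee.getAttribute("attendance"))) {
                    Element name =
                        (Element) attendee.getElementsByTagName("name").item(0);
                    names.add(name.getTextContent().trim());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not read meeting document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(requiredAttendees(MEETING));
    }
}
```

The program only works because it has been told what 'attendee', 'name' and 'attendance' mean; a program which has never heard of this markup can parse the document but cannot do anything intelligent with it.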
Strictly parsed: what does this mean for you? [1]
- Documents which are not well-formed will not be handled by an
XML application. At all.
- Tags and attributes are case-sensitive;
- End tags cannot be omitted - every <p>
must have a </p>.
- Tags must be correctly nested:
<b><i>This won't
work</b></i>
- Empty tags (those which don't enclose any content) must
be marked with a trailing slash like this:
<xx/>
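You can watch a parser enforce these rules with a few lines of Java. This is a minimal sketch using the JAXP parser which ships with Java; the class name WellFormedCheck and the sample strings are my own, chosen to match the rules listed above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: asks the parser whether some text is well formed.
class WellFormedCheck {
    /** true if the parser accepts the text as well-formed XML */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // the parser gave up: the text is not well formed
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<p><b><i>This works</i></b></p>",
            "<p><b><i>This won't work</b></i></p>",
            "<p>No end tag",
            "<P>Case matters</p>",
            "<p>Line<br></p>",
            "<p>Line<br/></p>"
        };
        for (String sample : samples) {
            System.out.println(isWellFormed(sample) + "  " + sample);
        }
    }
}
```

Only the first and last samples are accepted. There is no 'best guess' mode: the parser stops at the first error.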
Strictly parsed: what does this mean for you? [2]
- Most Web designers are sloppy.
- More than ninety percent of all commercially authored Web
pages do not conform to any standard and are not valid
HTML.
- Few if any of the commercially available WYSIWYG tools
generate valid HTML.
- Web authors switching to XML will need to adopt much more
rigorous technical discipline.
Differences from SGML
-
Like HTML, simpler!
- I used to say 'much simpler', but now I'm not too
sure...
- Like HTML, optimised for delivery over
restricted-bandwidth links.
- Unlike HTML, a true subset of SGML.
- All valid XML documents are valid SGML documents.
- SGML tools (conforming to ISO 8879) will work with
XML.
- Organisations with an existing commitment to SGML will
find the transition to XML much simpler.
A bit about the other bits
-
XML is a language for describing other languages
- Most of these are application specific
-
Some are very general
- XLink:
a vocabulary for linking between XML documents
- XPath
and XPointer:
vocabularies for describing positions inside XML
documents
- XSL-T: a
vocabulary for transforming XML documents
- XML
Schema: a vocabulary for describing
vocabularies
- SMIL
(Synchronized Multimedia Integration Language): a
vocabulary for integrating and synchronising
multimedia presentations
- SOAP
(Simple Object Access Protocol): a vocabulary for
exchanging computation requests between heterogeneous
agents in a network.
- All of these key standards are looked after by W3C
A bit about the other bits [ii]: XLink
- In HTML, you need to use a special element (the A or
Anchor tag) to be the start of a link
-
In XML any element can be the start of a link
- Currently, Mozilla/Netscape 6 is
the only 'mainstream' browser which partly supports
this
- W3C's Amaya 4
also partly supports it
-
Several other demo and prototype implementations
- No mainstream browser fully supports XLink
A bit about the other bits [iii]: XPath and XPointer
- In HTML, you need to use a special element (the A or
Anchor tag) to be the target of a link
-
In XML a link can target any element in the target document
- Several demo and prototype implementations
- No mainstream browser fully supports this
- You've heard this before somewhere, haven't you?
A bit about the other bits [iv]: XSL-T
-
'eXtensible Stylesheet Language - Transformations'
- A language for manipulating document structure
- Maps any XML dialect into any other (or even to plain
text)
- Declarative, pattern matching language, conceptually
like Prolog
- Extremely powerful, unquestionably useful.
- But not really a stylesheet language
Digression: Visual Appearance and Stylesheets [i]
- XML documents are not necessarily or primarily intended
to be viewed by people, but when they are...
- The visual appearance of a document should be controlled
by stylesheets.
- The appearance of this one is.
- In XML as in HTML you don't have to use stylesheets.
- If you don't, you will get a plain, simple
appearance.
If people are interested, you can open
the stylesheet for this presentation, slideshow.css, in a text
editor.
Visual Appearance and Stylesheets [ii]
-
A special stylesheet language, XSL, was conceived to
support the new features of XML.
-
Two parts:
-
XSL-T, the Transformation language
- I've described this above
-
XSL-FO, formatting objects
- A comprehensive language for describing the
fine detail of document presentation.
- Produces prolix, semantically impoverished
markup.
- Not supported by any client yet.
- Really a stylesheet language...
- Of doubtful value.
Visual Appearance and Stylesheets [iii]: Status of XSL
- XSL-T was adopted
on 16 November 1999 as a W3C recommendation.
- XSL-FO was
published on 21 November 2000 as a W3C Candidate Recommendation; it became a full recommendation on 15 October 2001.
- Microsoft IE5 implemented a proprietary 'XSL'
which is based on an older draft of XSL; newer IE5s are
migrating towards the standard.
Visual Appearance and Stylesheets [iv]: XSL Summary
- Transformation language of unquestionable merit, greatly
aids separating content from presentation.
- Designed primarily to transform XML to XSL-FO, but can
transform to any other XML dialect (including XHTML).
-
Recommendation:
- use XSL-T to map XML into XHTML for presentation to
users, decorate with CSS
- use XSL-T to map XML to other XML dialects as needed
for communication with other organisations
-
ignore XSL-FO for now, except if
- You need pixel-perfect presentation of your
documents and
- You work in an environment (e.g. an Intranet)
where you control the client.
Visual Appearance and Stylesheets [v]: What about CSS?
-
You can continue to use existing CSS1 and CSS2 stylesheets.
- Probably.
- Depending on what individual client vendors decide to
support...
-
all (roughly) support CSS2.
- This presentation is not about stylesheets.
A bit about the other bits [v]: XML Schemas
- A vocabulary for defining vocabularies or 'dialects'
- A bit late arriving
- Replace DTDs, inherited from SGML
Digression: Dialects of XML
- What is a DTD?
- What about Schemas?
- Do I have to use a DTD or Schema?
- What DTDs and Schemas are available?
- Who will write DTDs and Schemas?
What is a Document Type Definition?
- Essentially, a dictionary for the language you are
using.
- Every Web author has heard of one
-
Every good Web author has seen one
- Very few Web authors have written one
What about Schemas? (Schemata?)
- Schemas are a new, alternate way to specify XML
languages
- Officially adopted by W3C on 2nd May 2001 - so still very new
- Recommendation: Let someone else take
the grief of getting the bugs out of it - stick with DTDs for
now.
More about Schemas [i]: benefits
-
The schema language is itself an XML language, so schemas
can be parsed with standard XML tools
-
You can specify rules for the content of elements and
attributes with much finer granularity than with DTDs
- You can specify that an attribute must be a
number
- You can specify minimum and maximum values for an
attribute
- You can specify
regular expression patterns the attribute must
match
More about Schemas [ii]: examples
-
An attribute representing someone's age
-
An attribute representing a UK bank sorting code (e.g.
68-59-13)
-
An attribute representing a UK grid reference (e.g.
NX7951)
The pattern specification seems to have
changed at some stage in the drafting process. The examples
given in Learning
XML don't work with Daniel Potter's tutorial applet.
Treat all tutorials with care and refer back to the formal
specification!
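As a hedged illustration of the three kinds of constraint above, here is a minimal sketch which validates documents against a small schema using the javax.xml.validation package in standard Java. The element name 'person', the age limits and the two patterns are assumptions of mine, deliberately loose (the grid reference pattern, for instance, does not check that the two letters name a real grid square), and the schema is written against the final 2001 recommendation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical example class: checks attribute values against schema facets.
class SchemaPatternDemo {
    // illustrative schema: a 'person' element with three constrained attributes
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"person\"><xs:complexType>"
        // an age: a whole number between 0 and 150
        + "<xs:attribute name=\"age\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:integer\">"
        + "<xs:minInclusive value=\"0\"/><xs:maxInclusive value=\"150\"/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        // a UK bank sorting code: three pairs of digits, e.g. 68-59-13
        + "<xs:attribute name=\"sortcode\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:string\">"
        + "<xs:pattern value=\"[0-9]{2}-[0-9]{2}-[0-9]{2}\"/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        // a UK grid reference: two letters, then pairs of digits, e.g. NX7951
        + "<xs:attribute name=\"gridref\"><xs:simpleType>"
        + "<xs:restriction base=\"xs:string\">"
        + "<xs:pattern value=\"[A-Z]{2}([0-9]{2}){1,5}\"/>"
        + "</xs:restriction></xs:simpleType></xs:attribute>"
        + "</xs:complexType></xs:element></xs:schema>";

    /** true if the document is valid against the schema above */
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("the schema itself is broken", e);
        }
        try {
            schema.newValidator()
                  .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well formed, or breaks one of the rules
        } catch (IOException e) {
            throw new IllegalStateException("could not read the document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(
            "<person age=\"34\" sortcode=\"68-59-13\" gridref=\"NX7951\"/>"));
        System.out.println(isValid("<person sortcode=\"685913\"/>"));
        System.out.println(isValid("<person age=\"200\"/>"));
    }
}
```

None of these three checks can be expressed in a DTD, where an attribute value is at best 'some text' or one of a fixed list of tokens.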
More about Schemas [iii]: conversion
-
A schema holds a superset of the information in a DTD
- You can convert a DTD to a schema with a PERL
script
-
You should be able to convert a schema to a DTD using
XSL-T
- But you might lose some information
Do I have to use a DTD or Schema?
- As with HTML, you don't have to specify a DTD.
- Even if you define new markup...
- ... but client programs won't know how to interpret your
new markup unless you also define a DTD or Schema.
- As with HTML, you should specify one.
What DTDs and Schemas are available?
- All the XML extensions discussed in this presentation are
defined as DTDs or Schemas (mostly DTDs).
- Thousands of SGML DTDs are available which can relatively
easily be converted.
-
There are already many hundreds of XML DTDs available, and
the number is growing fast.
Some repositories:
Who will write DTDs and Schemas? [i]
- Very specialised documents, technically demanding to
write.
- For most purposes, suitable examples are available.
- Most XML users will never write one.
Who will write DTDs and Schemas? [ii]
- Large organisations with special documentation
requirements may write DTDs and/or Schemas.
- Communities of organisations which wish to exchange data
will probably write DTDs and/or Schemas.
- Corporations which sell application programs will
probably write DTDs and/or Schemas.
-
Corporations which sell WYSIWYG Web authoring tools
will certainly write DTDs and/or Schemas.
- In future, there will be much less distinction
between a word processor and a Web authoring tool.
- Communities of interest with special technical needs
will certainly write DTDs and/or Schemas.
Ownership of DTDs: Communities vs single vendors
- Rich Site Summary (RSS) is an important XML dialect used in news
syndication
- The original DTDs (0.9, 0.91) were developed and 'owned'
by Netscape
- By April 2001, Netscape had lost interest in RSS...
- Many other people's systems broke.
- My conclusion: single vendors are not to be trusted with
community resources.
- Since 2001 the history of RSS and its
competing versions has got even more complex and
bizarre. It's still a really useful tool for doing
syndication, though.
Another cautionary tale about software vendors
- Microsoft has a long history of 'embracing and extending'
standards
- Making small changes which cause other people's implementations to
break.
- Thus forcing people to use only their implementations.
- MS Word 2002 saves as 'HTML' and as 'XML'
- When it saves as HTML, the HTML contains embedded 'XML' elements
- In the 'XML', the attribute values are unquoted.
- This is explicitly forbidden by the XML standards...
- Standards compliant XML parsers can't parse this
- Microsoft's own XML parsers can parse this
- Is this simply incompetence
- Or is it 'embracing and extending'...?
A bit about the other bits [vi]: SOAP
- Simple Object Access Protocol
-
A vocabulary for communicating with software agents in a
heterogeneous network
-
Not actually very simple...
- But this is an inherently difficult area
- Software toolkits (such as Apache Soap) will
make this easier to deploy
More about SOAP
- Developed from 'XML-RPC' (Dave Winer, Userland)
- Three versions out there
- 0.9
- 1.0
- submitted to IETF as a 'draft'
- certainly incompatible with 0.9
- 1.1,
- May 2000
- submitted to W3C as a 'note'
- probably incompatible with 1.0
- Not (yet) a W3C recommendation, just a 'note'
- Not (yet) an IETF RFC
- Vapourware: not ready for prime time.
- If you want to pursue this further, there's an online tutorial here.
XML in your context
- Applications which will benefit greatly from XML
- Applications which will benefit little from XML
- XML in action: Content syndication
Applications which benefit greatly from XML
-
Applications exchanging structured data with other software
agents.
- Accounting systems exchanging orders, invoices,
payments...
- Engineering systems exchanging specifications,
dimensions...
- Diary systems exchanging bookings, events, meetings,
holidays...
- Technical documentation applications, or applications
involving special notation (e.g., mathematics, music).
- Applications requiring highly detailed
illustrations.
- Multimedia applications.
At present, only where the audience is
controlled
Applications which will benefit little from XML
-
Simple publishing of text, with or without simple graphics.
XML in action: content syndication
- What is content syndication
- History of Syndication
- Standards for Syndication
- Offering Syndication
- Incorporating Syndication
- Aggregation
What is content syndication
- Making headlines from one web site available to
others
- Automatically
- A dramatically successful public application of XML
History of Syndication
- In the beginning was the ripper
- 1997: ScriptingNews starts promoting XML-based
syndication
- 1999: My Netscape and Rich Site Summary 0.90
- 1999: ScriptingNews elements integrated by Netscape into
RSS 0.91
- 2001: Netscape abandon Rich Site Summary
Standards for Syndication
-
Rich Site Summary 0.91
- Netscape, now abandoned
- Very, very simple
- Still useful
-
Rich Site Summary 1.0
-
Invent your own
Offering Syndication
-
Provide a URL on your site from which an RSS document can
be pulled
- Example pulled from a flat file (static, compiled
periodically) [
Wired news]
- Example pulled from a Servlet (dynamic) [PRES]
- You can do this with CGI, or any other server side
content technology
- Very easy to set up.
Incorporating Syndication [i]
-
Periodically request RSS from donor sites and transform to
HTML
-
Example sites
Incorporating Syndication [ii]: Sample code
<!-- sidebar sections: show title and top eight entries -->
<xsl:template match="rss">
<h2>
<xsl:apply-templates select="channel/title" />
</h2>
<xsl:for-each select="channel/item">
<xsl:if test="9 > position()">
<p>
<a>
<xsl:attribute name="href"><xsl:value-of
select="link"/>
</xsl:attribute>
<xsl:apply-templates select="title" />
</a>
</p>
</xsl:if>
</xsl:for-each>
</xsl:template>
|
Sample XSL code |
Moreover Internet Europe headlines, processed with
this XSL 22nd May 2001 |
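To run a template like the one above you need an XSL-T processor. Here is a minimal sketch using the JAXP transformation API in standard Java. The class name HeadlineSidebar, the made-up feed and the example.org links are assumptions for illustration; I have also wrapped the output in a div so that the result has a single root element, and wrapped the template in a complete stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class: turns an RSS feed into a sidebar fragment.
class HeadlineSidebar {
    // the sidebar template, wrapped in a complete stylesheet
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"rss\"><div>"
        + "<h2><xsl:apply-templates select=\"channel/title\"/></h2>"
        + "<xsl:for-each select=\"channel/item\">"
        + "<xsl:if test=\"9 &gt; position()\">"
        + "<p><a><xsl:attribute name=\"href\">"
        + "<xsl:value-of select=\"link\"/></xsl:attribute>"
        + "<xsl:apply-templates select=\"title\"/></a></p>"
        + "</xsl:if></xsl:for-each>"
        + "</div></xsl:template></xsl:stylesheet>";

    /** a made-up feed with the given number of items, standing in for a donor site */
    static String sampleFeed(int items) {
        StringBuilder rss = new StringBuilder(
            "<rss version=\"0.91\"><channel><title>Example headlines</title>");
        for (int i = 1; i <= items; i++) {
            rss.append("<item><title>Story ").append(i).append("</title>")
               .append("<link>http://example.org/story/").append(i)
               .append("</link></item>");
        }
        return rss.append("</channel></rss>").toString();
    }

    /** apply the stylesheet to an RSS document and return the markup */
    static String render(String rss) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(rss)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(sampleFeed(10)));
    }
}
```

In a real site you would fetch the RSS from the donor's URL on a timer and cache the result, rather than transforming on every page request.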
Aggregation
- If you can collect headlines from multiple sources, you
can search the collection with predetermined patterns, and
offer personalised aggregations of news to users.
- O'Reilly's Meerkat
- Start of something big.
Worked Example: a meeting arranger system
- We all go to meetings...
- We all know what a hassle it is arranging them...
- Wouldn't it be nice if the machines could do it for
us?
- Here's how!
Creating an example document (quite easy)
- Start by typing what you want into your favourite text
editor.
- Invent sensible looking markup as you go along.
-
Don't be too casual about this
- this is a data design exercise,
- you need to think about not only what you need for
this document,
- but what you might need for others.
- you need to think about all the possible uses of your
document.
- Here's one I did
earlier.
This is a good opportunity for a
whiteboard and some interaction! If possible, get the
participants to do an example for themselves.
Creating the DTD and/or Schema (hard, but we'll use a
trick)
- DTDs and Schemas are precise, technical documents. How
are we going to make them?
- Pass our example page to the DTDGenerator
- [2004: unfortunately the DTD generator is no longer available]
- Pass the results of that through the DTD2Schema
script (requires PERL)
- Tidy up the results with your text editor
- Here's a DTD and a schema I did earlier.
Again, if possible, get the participants
to actually do this.
Viewing it: creating a style-sheet (harder)
-
Two approaches to stylesheets:
- CSS1:
- just establishes visual styles for the actual
elements in your document
- XSL:
- much more complex, but allows on-the-fly
transformation of the document to present particular
features
(Of course, you can just do without altogether)
-
Here's one just for the
agenda.
Using it: applications
Now we need to write applications which will:
-
allow us to generate these documents
- not very hard, there are Java components around which
semi-automate creating a form-driven special-purpose
editor from a DTD...
-
allow our diary programs to automatically handle these
documents
- much harder, but XML parser libraries are available
for most modern programming languages which you can build
on.
- We probably won't get that far today.
Specifying
- The Structure of an XML document
- Exercise period [i]
The Structure of an XML document
- Overall structure
- Processing Instructions
- XML Namespaces
- Elements
- Attributes
- When to use which
Overall Structure
-
Prolog
-
The XML declaration
<?xml version="1.0"?>
- declares that this is XML
- strictly optional in XML 1.0, but you should always include it
-
The Document Type Declaration
<!DOCTYPE meeting PUBLIC "-//WEFT//DTD
MEETING 0.1//EN" "meeting.dtd">
- says what dialect of XML this is
- optional
- Processing instructions
- Comments
-
Root element
- Just an element, like any other
- Just exactly one.
Processing Instructions
- Special instructions for particular applications
-
Syntactically, delimited by
<?
and
?>
<?xml version="1.0"?>
is a
processing instruction
- a special one
-
The tag-part identifies the particular application this PI
is intended for
xml
means 'any XML parser'
- The rest of the content is application specific
XML Namespaces
- Warning: Special use of the term!
- Allow multiple XML dialects to be used in one
document
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
- xmlns means 'this is an XML namespace declaration'
- the rest means that names starting with xsl: belong to
the namespace defined as http://www.w3.org/1999/XSL/Transform
- Note that the URL doesn't actually point to anything
interesting, it's just a marker!
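A short sketch in Java may help show that the prefix is only a local nickname and the URL is what really identifies the namespace. This uses the standard JAXP parser with namespace awareness switched on; the class name NamespaceDemo and the alternative prefix 't' are my own inventions for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example class: shows how a parser resolves a namespace prefix.
class NamespaceDemo {
    static final String XSL_NS = "http://www.w3.org/1999/XSL/Transform";

    /** the root element's prefix, local name and namespace URI, in that order */
    static String[] describeRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // JAXP parsers ignore namespaces unless asked not to
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new String[] {
                root.getPrefix(), root.getLocalName(), root.getNamespaceURI()
            };
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // the same element written with two different prefixes
        String usual =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"" + XSL_NS + "\"/>";
        String renamed =
            "<t:stylesheet version=\"1.0\" xmlns:t=\"" + XSL_NS + "\"/>";
        System.out.println(String.join(" | ", describeRoot(usual)));
        System.out.println(String.join(" | ", describeRoot(renamed)));
    }
}
```

Both documents describe the same element as far as a namespace-aware program is concerned: the local name and namespace URI match, and only the prefix differs. Nothing is ever fetched from the URL.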
Elements [i]
Syntactically, an element is what is delimited by its
tags.
-
An opening tag comprises a left angle bracket
<
, the name of the element, optionally some
attribute-value pairs, and a closing angle bracket
>
<meeting id="June Board
Meeting">
-
A closing tag comprises a left angle bracket
<
, a slash /
, the name of the
element, and a closing angle bracket >
- An empty tag comprises a left angle bracket
<
, the name of the element, optionally some
attribute-value pairs, a slash /
, and a closing
angle bracket >
; it is just shorthand for an
opening tag immediately followed by the closing tag with
nothing in between.
Elements [ii]
-
An element is a primary structural unit in the XML markup
-
May allow child elements of particular kinds
- Or just text (PCDATA)
- Or neither (empty tags)
- An element may have many child elements with the same
name
Attributes
- An attribute belongs to a particular element type
- Has a name which is a string of characters
- Has a value which is a string of
characters
-
Syntactically
- name and value are separated by an equals sign
=
- value is delimited by quotation marks
"
- An element may have only one attribute with any given
name
When to use which
-
When you may have a value which is a complex data item, use
an element
- example: agenda containing agenda items
-
When you may have many values of the same type, use an
element.
-
When you may have a long simple text value, use an element
- example: title of an agenda item
-
When you always have just one short simple text value, use
an attribute
- example: proposer of an agenda item
<meeting id="June Board Meeting">
<agenda>
<item proposer="Simon Brooke">
<title>
Adoption of new project management
procedures manual
</title>
</item>
<item proposer="Angela Stormont">
<title>
Transfer of shares
</title>
</item>
</agenda>
</meeting>
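The 'many values' rule follows from the syntax: an element may contain any number of children with the same name, but repeating an attribute name is a well-formedness error. Here is a minimal sketch which checks both cases with the standard JAXP parser; the class name AgendaCheck and the second sample, with its repeated proposer attribute, are mine.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: repeated children are fine, repeated attributes are not.
class AgendaCheck {
    // two items, each with its own proposer attribute and title child
    static final String REPEATED_CHILDREN =
        "<agenda>"
        + "<item proposer=\"Simon Brooke\"><title>Adoption of new project"
        + " management procedures manual</title></item>"
        + "<item proposer=\"Angela Stormont\"><title>Transfer of shares"
        + "</title></item>"
        + "</agenda>";

    // one item which tries to have two proposers as attributes
    static final String REPEATED_ATTRIBUTE =
        "<agenda><item proposer=\"Simon Brooke\""
        + " proposer=\"Angela Stormont\"/></agenda>";

    /** number of item elements, or -1 if the document is not well formed */
    static int countItems(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("item").getLength();
        } catch (SAXException e) {
            return -1;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countItems(REPEATED_CHILDREN));
        System.out.println(countItems(REPEATED_ATTRIBUTE));
    }
}
```

So if an agenda item might ever have two proposers, proposer should be a child element rather than an attribute.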
Exercise period [i]
- In groups, produce a DTD for an XML dialect to describe
meetings
- You may use the DTD generator at <URL:http://www.pault.com/Xmltube/dtdgen.html>
- [2004: unfortunately the DTD generator is no longer available]
- You should think about your meetings database as you do
so and have some idea of how your XML DTD relates to your
database design.
Creating
- Building XML applications: tools and technologies
- Constructing the document
- Exercise period[ii]
Building XML applications: tools and technologies
- Languages for XML applications
- Tools, components and toolkits
- What we will be using today
Why Java?
- Portable
- Reasonably readable
- Very well supported with XML toolkits and components
- I like it...
Other languages for building XML applications
Tools, components and toolkits
- Parsers
- Transformation engines
- APIs
-
Where to find XML tools
Transformation engines
Apply XSL stylesheets to transform a document from one
representation to another.
- XML to XML
- XML to HTML
- XML to text
What we will be using today
- Apache
Xalan
- XSL processor contributed to the Apache Foundation by IBM;
closely related to IBM's
LotusXSL processor
- Apache
Xerces
- XML parser contributed to the Apache Foundation by IBM;
based on IBM's XML4J parser
- SAX
- Simple API for XML, by David Megginson and others
- DOM
- The W3C Document Object Model API
- W3C Jigsaw
- HTTP Server and Servlet Server developed by W3C
- Jacquard
- A toolkit of useful bits for sticking it all together. By
me. Not necessarily the best but it's what I know and
use.
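SAX and DOM, both listed above, feel quite different in use. DOM hands you the whole document as a tree. SAX calls you back as each piece of the document goes past, and keeps nothing. Here is a minimal SAX sketch using the parser built into standard Java; the class name SaxElementLister and the sample document are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: lists element names in the order SAX reports them.
class SaxElementLister {
    static final String SAMPLE =
        "<eventsdiary date=\"Jul 18, 2000\">"
        + "<event actor=\"simon\">"
        + "<description>Lecture, Java XML, all day</description>"
        + "</event></eventsdiary>";

    /** the name of every element in the document, in document order */
    static List<String> elementNames(String xml) {
        final List<String> names = new ArrayList<>();
        // the handler is called once for each start tag the parser meets
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames(SAMPLE));
    }
}
```

SAX suits very large documents, or cases where you only want a few items out of a stream. For building and rearranging documents, as we do in the rest of today's examples, DOM is usually more convenient.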
Constructing the document
- Writing text to the output stream
- Using the DOM
First an apology
Previous versions of this course contained a howling error at this
point. It suggested creating DOM objects essentially by
calling the newInstance method of the implementing
classes. This only works with the particular DOM
implementation you happen to be using and is not portable
between DOM implementations (or even, necessarily, between
successive versions of the same DOM implementation). So clearly
it is very bad practice to do this.
I can only apologise to people who were misled by
this.
The Document Object Model
- Standardised interface for working with XML
documents
- A W3C standard
- Many DOM implementations
The DOM: what is a Document?
The DOM: what is an Element?
- A 'tag'
-
With 'attributes'
-
And 'contents'
- other elements which are children of this
element
- text elements
- Constructed by calling the createElement( String tagName)
method of the Document object
- Or in Jacquard, by calling the generate method of a class
which implements the NodeGenerator interface
The DOM: what is a Text?
-
just text
- No tag
- No attributes
- No enclosing angle brackets
Create a document object
// get a handle on a DOM implementation...
DOMImplementation di = DOMStub.getDOMImplementation( context);
// and use it to create a document object
Document doc = di.createDocument( getNamespaceURI( context),
rootName, doctype);
DOMStub is a Jacquard utility class which gets hold of
whatever DOM implementation is available. If you don't use
Jacquard you'll have to instantiate a DOM implementation for yourself.
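If you are not using Jacquard, one portable way to get hold of a DOM implementation is through JAXP, which is part of standard Java. This is a minimal sketch; the class name PlainDomDocument is mine, and passing null for the namespace URI and the doctype is an assumption which suits a simple dialect with neither.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;

// Hypothetical example class: creates a document without naming any
// particular DOM implementation class.
class PlainDomDocument {
    /** a new, otherwise empty document whose root element has the given name */
    static Document newDocument(String rootName) {
        try {
            // JAXP finds whatever parser and DOM implementation is installed
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            DOMImplementation di = builder.getDOMImplementation();
            // no namespace URI and no doctype in this sketch
            return di.createDocument(null, rootName, null);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable XML parser found", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument("eventsdiary");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Because the code only ever mentions the org.w3c.dom interfaces and the JAXP factory, it should keep working if the underlying parser is swapped for another one.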
Add a root ('content') element
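// createDocument( ) has already created the root element named by
// rootName ("eventsdiary" here), so there is nothing to append:
// we just ask the document for it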
Element content = doc.getDocumentElement();
- Every Document must have exactly one 'content'
element
- If you attempt to add another child to a document which
already has a child, that's an error.
Add further elements recursively as required
// match the pattern against the convenience view and pull
// back the rows that match as namespaces
Contexts events =
TableDescriptor.getDescriptor( VIEW, null,
context ).match( pattern );
Enumeration e = events.elements( );
// and pass each of those namespaces in turn to my event element
// generator to generate children for my element
while ( e.hasMoreElements( ) )
content.appendChild( eventEltGenerator.generate( doc,
(Context) e.nextElement( ) ) );
This is a bit of a cheat. It depends on having a view in the
database which collects together all the necessary fields for
us:
---- EVENTS_VIEW -----------------------------------------------------
CREATE VIEW events_view AS
SELECT EVENT.Actor,
EVENT.Event,
CATEGORY.Description AS Type,
LOCATION.Description AS Location,
EVENT.Eventdate,
EVENT.Starttime,
EVENT.Endtime,
EVENT.Description
FROM EVENT,
CATEGORY,
LOCATION
WHERE EVENT.Location = LOCATION.Location
AND EVENT.Category = CATEGORY.Category
ORDER BY Eventdate,Starttime
;
Let's see that again [i]: the source
public class DayView extends DocumentGeneratorImpl
{
//~ Static fields/initializers --------------------------------------------
/**
* the name of the convenience view in the database from which I will
* collect all the information I need
*/
protected static final String VIEW = "events_view";
/** the field in that view which represents the date of the event */
protected static final String EVENTDATEFIELD = "when";
//~ Instance fields -------------------------------------------------------
/** a generator to generate the event elements which will be my children */
protected EventElementGenerator eventEltGenerator =
new EventElementGenerator( );
//~ Methods ---------------------------------------------------------------
/**
* generate a document containing all the events on the day implied by
* this context
*/
public Document generate( Context context ) throws GenerationException
{
DOMImplementation di = DOMStub.getDOMImplementation( context );
Document doc = di.createDocument( "", "eventsdiary", null );
String day = context.getValueAsString( "day" );
uk.co.weft.dbutil.Calendar when = new uk.co.weft.dbutil.Calendar( );
if ( day != null )
{
// if we've got a date, set my calendar to that day
// (by default it sets itself to today)
when.setTime( java.sql.Date.valueOf( day ) );
}
Element content = doc.getDocumentElement( );
content.setAttribute( "date", when.toString( ) );
try
{
// create a new, blank, context as a pattern to match
Context pattern = new Context( );
// give it the database username, password and url from the current context
pattern.copyDBTokens( context );
// put the date we're interested in into the pattern
pattern.put( EVENTDATEFIELD, when );
// match the pattern against the convenience view and pull
// back the rows that match as namespaces
Contexts events =
TableDescriptor.getDescriptor( VIEW, null, context ).match( pattern );
Enumeration e = events.elements( );
// and pass each of those namespaces in turn to my event element
// generator to generate children for my element
while ( e.hasMoreElements( ) )
content.appendChild( eventEltGenerator.generate( doc,
(Context) e.nextElement( ) ) );
}
catch ( DataStoreException dex )
{
throw new GenerationException( "Failed to read from data store: " +
dex.getMessage( ) );
}
return doc;
}
Let's see that again [ii]: the event element generator
The event element generator is a simple wrapper round a context
element generator:
//~ Inner Classes ---------------------------------------------------------
/**
* a generator for an XML element representing a single event. This uses
* ContextElementGenerator which knows how to construct a DOM element
* node by taking values out of a context, so all we need to do is tell
* it which value names to treat as attributes and which as children
*/
class EventElementGenerator extends ContextElementGenerator
{
//~ Constructors ------------------------------------------------------
/**
* the tag of the element I generate is 'event'
*/
public EventElementGenerator( )
{
super( "event" );
}
//~ Methods -----------------------------------------------------------
/**
* return a String array of the names of my properties to output as
* attributes
*/
protected String[] getAttrNames( )
{
String[] attrNames =
{ "event", "type", "location", "starttime", "endtime", "actor" };
return attrNames;
}
/**
* return a String array of the names of my properties to output as
* children
*/
protected String[] getChildNames( )
{
String[] childNames = { "description" };
return childNames;
}
}
}
Let's see that again [iii]: the context element generator
-
A class which makes simple elements out of namespaces.
Often useful
- Not part of DOM or SAX - part of my own Jacquard
toolkit
- There's no particular reason to use Jacquard
-
ContextElementGenerator
- A name to be used as an element name
- A list of names which are to be used as
attributes
- A list of names which are to be used as child (text)
elements
- Constructs element nodes to that specification
- taking attribute and child values from a namespace
passed to the generate method
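If you don't want to use Jacquard, the same idea can be sketched with nothing but the JDK's own DOM classes. What follows is a minimal sketch, not the Jacquard implementation: MapElementGenerator is a hypothetical class name, and a plain java.util.Map stands in for the namespace (Context) that the real generator takes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical stand-in for ContextElementGenerator: a Map plays the
 * part of the namespace */
class MapElementGenerator
{
    private final String tag;          // element name to generate
    private final String[] attrNames;  // names to output as attributes
    private final String[] childNames; // names to output as child (text) elements

    MapElementGenerator( String tag, String[] attrNames, String[] childNames)
    {
        this.tag = tag;
        this.attrNames = attrNames;
        this.childNames = childNames;
    }

    /** construct one element node, taking values from the map */
    Element generate( Document doc, Map<String, String> values)
    {
        Element elt = doc.createElement( tag);
        for ( String name : attrNames)
        {
            String value = values.get( name);
            if ( value != null)
                elt.setAttribute( name, value);
        }
        for ( String name : childNames)
        {
            String value = values.get( name);
            if ( value != null)
            {
                Element child = doc.createElement( name);
                child.appendChild( doc.createTextNode( value));
                elt.appendChild( child);
            }
        }
        return elt;
    }

    /** build a cut-down 'event' element in a fresh document */
    static Element demo()
    {
        Document doc;
        try
        {
            doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        }
        catch ( ParserConfigurationException pex)
        {
            throw new IllegalStateException( "No DOM implementation available", pex);
        }
        Map<String, String> event = new LinkedHashMap<>();
        event.put( "event", "19");
        event.put( "actor", "simon");
        event.put( "description", "Lecture, Java XML, all day");
        MapElementGenerator gen = new MapElementGenerator( "event",
            new String[] { "event", "actor" }, new String[] { "description" });
        Element elt = gen.generate( doc, event);
        doc.appendChild( elt);
        return elt;
    }

    public static void main( String[] args)
    {
        Element elt = demo();
        System.out.println( elt.getTagName() + " actor=" + elt.getAttribute( "actor"));
    }
}
```

Names missing from the map are simply skipped, so one generator can serve rows which don't have every value.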
Let's see that again: [iv] the output
<?xml version="1.0"?>
<eventsdiary
date="Jul 18, 2000">
<event
actor="simon"
endtime="5:30:00 PM"
event="19"
location="Yokohama, Japan"
starttime="9:00:00 AM"
type="Otherwise unavailable">
<description>
Lecture, Java XML, all day
</description>
</event>
</eventsdiary>
Exercise period [ii]
We may skip this one if time's short or the group is
struggling!
- In groups: Try to write a Java application or Servlet
which produces at least part of an XML document to your
meeting DTD from your database
Transforming
- Beginning XSL-T
- Exercise period [iii]
Beginning XSL-T [i] The 'stylesheet'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- Basic XSL stylesheet for day view of events diary. -->
<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
<xsl:template match="eventsdiary">
<html>
<head>
<title>
Diary for <xsl:value-of select="@date" />
</title>
<link rel="StyleSheet" href="/styles/jacquard.css" type="text/css"
media="screen"/>
</head>
<body>
<h1>
Diary for <xsl:value-of select="@date" />
</h1>
<table>
<tr>
<th rowspan="2">
Who
</th>
<th rowspan="2">
Where
</th>
<th colspan="2">
When
</th>
<th rowspan="2">
What
</th>
<th rowspan="2">
Details
</th>
<th rowspan="2">
<a href="event">Add</a>
</th>
</tr>
<tr>
<th>
Starts
</th>
<th>
Ends
</th>
</tr>
<xsl:apply-templates select="event" />
</table>
</body>
</html>
</xsl:template>
<xsl:template match="event">
<tr>
<td>
<xsl:value-of select="@actor"/>
</td>
<td>
<xsl:value-of select="@location"/>
</td>
<td>
<xsl:value-of select="@starttime"/>
</td>
<td>
<xsl:value-of select="@endtime"/>
</td>
<td>
<xsl:value-of select="@type"/>
</td>
<td>
<xsl:value-of select="description"/>
</td>
<td>
<a>
<xsl:attribute name="href">event?event=<xsl:value-of
select="@event"/>
</xsl:attribute>
Edit
</a>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
Beginning XSL-T [ii] The 'stylesheet' tag
<?xml version="1.0"?>
- This says this stylesheet is written in XML; it should be
the first line of every XML document
- Yes, XSL is a dialect of XML
version="1.0"
says it's version 1.0 of
XML
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
- Every XSL-T 'stylesheet' starts with this
xsl:stylesheet
says it's a stylesheet
version="1.0"
says it's version 1.0 of
XSL-T
xmlns:xsl
says the namespace of
names which start with 'xsl:' is identified by
the URL
http://www.w3.org/1999/XSL/Transform
Beginning XSL-T [iii] comments
<!-- Basic XSL stylesheet for day view of events diary. -->
-
Comments in XSL text are just like any other XML (or SGML)
comments
- Start with
<!--
(a following space is conventional,
but not required)
- End with
-->
(and '--' must not appear
inside the comment)
- Because they're comments, they don't appear in the
output
-
To create comments in the output, use
xsl:comment
<xsl:comment>text of comment</xsl:comment>
- will produce
<!--text of comment-->
Beginning XSL-T [iv] output specifier
<xsl:output indent="yes" method="html"
doctype-public="-//W3C//DTD HTML 4.0 Transitional//EN"/>
- The output specifier is not required
-
If it exists it must appear at top level
- as a child of the
xsl:stylesheet
element
indent="yes"
says we want the output neatly
indented to show structure
-
method="html"
says we want the output to have
html syntax
- might have been "xml" or "text"
doctype-public
says include a DOCTYPE
declaration of this DTD
- There are a number of other possible
attributes.
Beginning XSL-T [v] declaring a template
<xsl:template match="eventsdiary">
This template matches every instance of the element
eventsdiary
which is found in the document being
processed. As eventsdiary
is the root element of
the document type we're interested in, there will only be
one.
<html>
<head>
<title>
As you can see, what is in the template is just the HTML
markup that will be output (if we were outputting XML, it would
be XML, of course)...
Diary for <xsl:value-of select="@date" />
with scattered among it special xsl tags which
cause things to be spliced into the output. This one says 'use the
value of the date attribute of the current element'
</title>
</head>
<body>
<h1>
Diary for <xsl:value-of select="@date" />
</h1>
<table>
<tr>
<th rowspan="2">
Who
</th>
<th rowspan="2">
Where
</th>
<th colspan="2">
When
</th>
<th rowspan="2">
What
</th>
<th rowspan="2">
Details
</th>
<th rowspan="2">
<a href="event">Add</a>
</th>
</tr>
<tr>
<th>
Starts
</th>
<th>
Ends
</th>
</tr>
<xsl:apply-templates select="event" />
This is the important one. It says "apply the templates in
this stylesheet to all the instances of event
elements which are children of the current node".
</table>
</body>
</html>
</xsl:template>
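To see a template like this actually run, you need an XSL-T processor. The sketch below uses the javax.xml.transform API which ships with the JDK; the class name DiaryTransform is hypothetical, and both the stylesheet and the diary document are cut-down versions of the ones above, held in strings so that the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DiaryTransform
{
    /** a cut-down version of the day view stylesheet */
    static final String STYLESHEET =
        "<?xml version=\"1.0\"?>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"eventsdiary\">"
        + "<html><body>"
        + "<h1>Diary for <xsl:value-of select=\"@date\"/></h1>"
        + "<table><xsl:apply-templates select=\"event\"/></table>"
        + "</body></html>"
        + "</xsl:template>"
        + "<xsl:template match=\"event\">"
        + "<tr><td><xsl:value-of select=\"@actor\"/></td>"
        + "<td><xsl:value-of select=\"@location\"/></td></tr>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** a cut-down version of the events diary document */
    static final String DIARY =
        "<?xml version=\"1.0\"?>"
        + "<eventsdiary date=\"Jul 18, 2000\">"
        + "<event actor=\"simon\" location=\"Yokohama, Japan\"/>"
        + "</eventsdiary>";

    /** apply the stylesheet to the document; return the output as a string */
    static String transform( String stylesheet, String document)
    {
        try
        {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource( new StringReader( stylesheet)));
            StringWriter out = new StringWriter();
            t.transform( new StreamSource( new StringReader( document)),
                new StreamResult( out));
            return out.toString();
        }
        catch ( TransformerException tex)
        {
            throw new IllegalStateException( "Transformation failed: " +
                tex.getMessage(), tex);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( transform( STYLESHEET, DIARY));
    }
}
```

In a servlet you would normally compile the stylesheet once and reuse it, rather than building a new Transformer for every request.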
Beginning XSL-T [vi] other useful bits
<xsl:template match="section[ @slot='main']">
This template will match only section elements
which have an attribute named slot whose value is main
<p>
<xsl:call-template name="toc"/>
paste in the output of the named template called toc.
</p>
<xsl:apply-templates select="section">
<xsl:sort select="title"/>
Apply templates in this stylesheet to sections
which are children of this section, sorted
alphabetically by their title sub-element
</xsl:apply-templates>
</xsl:template>
<xsl:template name="toc">
This is the named template which was called
earlier. Most templates are not named: they are applied
automatically if their patterns match an element
<xsl:for-each select="section">
for-each iterates over matching elements in turn
<xsl:sort select="title"/>
<a>
<xsl:attribute name="href">#<xsl:value-of select="title"/>
</xsl:attribute>
xsl:attribute allows us to construct the value of an
attribute of the enclosing tag
<xsl:value-of select="title"/>
</a> |
</xsl:for-each>
</xsl:template>
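The named template and the sort are easiest to understand by running them. This is a minimal sketch using the JDK's javax.xml.transform API: the class name TocDemo and the two-section sample document are made up for the illustration, and the stylesheet is reduced to text output so that the effect of xsl:sort is easy to see.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TocDemo
{
    /** a main section calls the named 'toc' template, which lists the
     * titles of its child sections in alphabetical order */
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"section[ @slot='main']\">"
        + "<xsl:call-template name=\"toc\"/>"
        + "</xsl:template>"
        + "<xsl:template name=\"toc\">"
        + "<xsl:for-each select=\"section\">"
        + "<xsl:sort select=\"title\"/>"
        + "<xsl:value-of select=\"title\"/>"
        + "<xsl:text>;</xsl:text>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** sample document: the child sections are deliberately out of order */
    static final String DOCUMENT =
        "<section slot=\"main\">"
        + "<section><title>Parsing</title></section>"
        + "<section><title>Creating</title></section>"
        + "</section>";

    /** apply the stylesheet to the document; return the output as a string */
    static String transform( String stylesheet, String document)
    {
        try
        {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource( new StringReader( stylesheet)));
            StringWriter out = new StringWriter();
            t.transform( new StreamSource( new StringReader( document)),
                new StreamResult( out));
            return out.toString();
        }
        catch ( TransformerException tex)
        {
            throw new IllegalStateException( "Transformation failed: " +
                tex.getMessage(), tex);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( transform( STYLESHEET, DOCUMENT));
    }
}
```

The titles come out sorted even though the document lists them the other way round; the source document itself is not changed.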
XSL-T elements: reprise
- xsl:output
- allows us to define how we want the output to be
formatted
- xsl:template
- defines what should be output for elements matching a
given pattern
- xsl:apply-templates
- applies the templates to the elements which match its
pattern
- xsl:call-template
- calls a template with a particular name, overriding the
pattern-matching system
- xsl:for-each
- produces output iteratively, overriding the
pattern-matching system
- xsl:sort
- orders the result of its enclosing element (an
xsl:apply-templates
or an
xsl:for-each
)
- xsl:value-of
- produces the value of the thing matched by its
pattern
- xsl:attribute
- outputs an attribute for the output element which
encloses it
There are a few more XSL elements, but these will do most
things for you.
Beginning XSL-T [vii]: Patterns
- *
- matches any element
- foo
- matches any element whose type is
foo
- foo | bar
- matches any element whose type is
foo
or
bar
- foo/bar
- matches any
bar
element with a
foo
parent
- foo//bar
- matches any
bar
element with a
foo
ancestor
- foo[ @bar='baz']
- matches any
foo
element which has a
bar
attribute which has the value
baz
- foo[1]
- matches any
foo
element which is the first
foo
child of its parent
- foo[ position() = 1]
- the long form of foo[1]: matches any
foo
element which is the first
foo
child of its parent
- *[ position() < 5]
- matches any element which is the first, second, third or
fourth element child of its parent (inside a stylesheet
attribute the < must be written as &lt;)
- text()
- matches any text node.
This is just the basics. The full definition is here
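A quick way to get a feel for these is to try them against a small document. Match patterns are a subset of XPath expressions, so the sketch below evaluates them with the JDK's javax.xml.xpath API, prefixing each with // so that it selects every node in the document which the pattern would match. The class name PatternDemo and the sample document are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PatternDemo
{
    /** sample document: two foo elements, four bar elements */
    static final String DOC =
        "<root>"
        + "<foo bar=\"baz\"><bar/><bar/></foo>"
        + "<foo><wrap><bar/></wrap></foo>"
        + "<bar/>"
        + "</root>";

    /** count the nodes in the sample document selected by the expression */
    static int count( String expression)
    {
        try
        {
            NodeList nodes = ( NodeList) XPathFactory.newInstance().newXPath()
                .evaluate( expression,
                    new InputSource( new StringReader( DOC)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        }
        catch ( XPathExpressionException xex)
        {
            throw new IllegalArgumentException( "Bad expression: " + expression, xex);
        }
    }

    public static void main( String[] args)
    {
        String[] expressions = { "//foo", "//foo | //bar", "//foo/bar",
            "//foo//bar", "//foo[ @bar='baz']", "//bar[1]" };
        for ( String expression : expressions)
            System.out.println( expression + " selects " +
                count( expression) + " node(s)");
    }
}
```

Note that //bar[1] selects three elements: each is the first bar child of its own parent, even the one which is the third child of root.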
XSL-T: A deceptively simple language
- Not many elements
- Simple to learn all of them
- Very subtle in use
- The power is in the patterns
Exercise period [iii]
- In groups: Write an XSL-T stylesheet which produces an
HTML agenda for your group's Meeting DTD.
- Everyone together: negotiate and agree a new, common DTD
which you can use to communicate meeting information between
your groups
- In groups: Write an XSL-T stylesheet which produces a
document conforming to the common DTD from a document
conforming to the group's DTD.
Communicating
- Just a bit about transport
- XML Parsers
- Parsing XML into the Database
- Parsing: Simple worked example
- Exercise period [iv]
Just a bit about transport
- XML is about the content of communication, not how it's
sent...
-
But how do you send XML information?
- HTTP GET to get information from a known place
- HTTP POST or PUT to send information to a known
place
- Special purpose listener daemons with special purpose
protocols
- eMail
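As an illustration of the HTTP POST option, here is a minimal sketch using the java.net.http client which ships with Java 11 and later. The class name XmlPost and the URL are hypothetical; the sketch only builds the request, so that it can be run without a server to talk to.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class XmlPost
{
    /** build (but do not send) a POST request carrying an XML document */
    static HttpRequest buildPost( String url, String xml)
    {
        return HttpRequest.newBuilder( URI.create( url))
            .header( "Content-Type", "application/xml; charset=utf-8")
            .POST( HttpRequest.BodyPublishers.ofString( xml))
            .build();
    }

    public static void main( String[] args)
    {
        // hypothetical address of a service which accepts diary documents
        HttpRequest request = buildPost( "http://localhost:8080/diary",
            "<?xml version=\"1.0\"?><eventsdiary date=\"Jul 18, 2000\"/>");
        System.out.println( request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to HttpClient.newHttpClient().send( request, HttpResponse.BodyHandlers.ofString()) and check the status code of the response.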
Parsers
- read a document from some source,
- construct a representation of that document in the
machine
- or provide the hooks to allow you to do so
Parsing is quite compute-intensive - don't do it if you
don't have to!
More about parsers [i] types
-
Event-based parsers
- You register handlers for parsing events you are
interested in
- The parser calls these handlers when it sees the
events
- Useful if you only want some of the information out
of the document
- Useful if the document might use more memory than you
have available
- Quite a lot of work to set up.
-
Document parsers
- Usually built on event-based parsers
-
Parse the whole document and provide you with a handle
on an internal representation of it
- Usually a DOM document object
- Useful if you want all the information out of the
document
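To make the event-based style concrete, here is a minimal sketch using the SAX parser which ships with the JDK. The handler is only interested in one kind of event, the start of an attendee element, and ignores everything else. The class name AttendeeLister is hypothetical; the element and attribute names are borrowed from the workshop example later in this course.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttendeeLister extends DefaultHandler
{
    /** names collected so far */
    private final List<String> names = new ArrayList<>();

    /** the parser calls this each time it sees a start tag */
    @Override
    public void startElement( String uri, String localName, String qName,
        Attributes attrs)
    {
        if ( qName.equals( "attendee"))
            names.add( attrs.getValue( "name"));
    }

    /** parse the document, returning the name of each attendee in order */
    static List<String> attendeeNames( String xml)
    {
        try
        {
            AttendeeLister handler = new AttendeeLister();
            SAXParserFactory.newInstance().newSAXParser()
                .parse( new InputSource( new StringReader( xml)), handler);
            return handler.names;
        }
        catch ( Exception ex)
        {
            throw new IllegalStateException( "Parse failed: " + ex.getMessage(), ex);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( attendeeNames(
            "<workshop title=\"Parsing XML\">"
            + "<attendee name=\"Jon Smith\"/>"
            + "<attendee name=\"Jane Doe\"/>"
            + "</workshop>"));
    }
}
```

No tree is ever built: the only memory used is the list of names, however large the document is.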
More about parsers [ii] types
-
Validating parsers
- Read the DTD (or schema)
- Read the document
- If the document isn't valid according to the DTD,
report this
- Good if you're making sure your document conforms to
the dialect standard
-
Non validating parsers
- Don't read the DTD (or schema)
- Read the document
- Will still throw an error if the document has bad
syntax
- Good if you just want to parse XML quickly
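The difference is easy to demonstrate with the JDK's own parser, which can be switched between the two modes. In this sketch the class name ValidationDemo and the tiny DTD are made up for the illustration; the DTD is carried inside the document so that the example is self-contained. The error handler turns validity errors into exceptions, because by default they are only reported, not thrown.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidationDemo
{
    /** a tiny DTD: a workshop may contain only attendees */
    static final String DTD =
        "<!DOCTYPE workshop ["
        + "<!ELEMENT workshop (attendee*)>"
        + "<!ATTLIST workshop title CDATA #REQUIRED>"
        + "<!ELEMENT attendee EMPTY>"
        + "<!ATTLIST attendee name CDATA #REQUIRED>"
        + "]>";

    static final String VALID = DTD
        + "<workshop title=\"Parsing XML\"><attendee name=\"Jane Doe\"/></workshop>";

    /** well-formed, but 'visitor' is not allowed by the DTD */
    static final String INVALID = DTD
        + "<workshop title=\"Parsing XML\"><visitor/></workshop>";

    /** true if the document parses without error in the chosen mode */
    static boolean parses( String xml, boolean validating)
    {
        try
        {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating( validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler( new DefaultHandler()
                {
                    @Override
                    public void error( SAXParseException e) throws SAXException
                    {
                        throw e; // treat validity errors as failures
                    }
                });
            builder.parse( new InputSource( new StringReader( xml)));
            return true;
        }
        catch ( SAXException sex)
        {
            return false; // bad syntax, or (when validating) not valid
        }
        catch ( ParserConfigurationException | IOException ex)
        {
            throw new IllegalStateException( ex);
        }
    }

    public static void main( String[] args)
    {
        System.out.println( "valid document, validating: " + parses( VALID, true));
        System.out.println( "invalid document, validating: " + parses( INVALID, true));
        System.out.println( "invalid document, not validating: " + parses( INVALID, false));
        System.out.println( "bad syntax, not validating: " + parses( "<workshop>", false));
    }
}
```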
Parsing from XML into the database
- Walk recursively down the document tree
- identifying the elements we want to store
- for each one, see if it's already there (tricky!)
- if not, store it.
Identifying the data to store
- The attributes of an element are a namespace
- So are the fields of a table
-
If you have one table for every element type
- and one field in that table for every attribute that
element can have
- It's relatively easy
-
The real world isn't often like that
- the overall structure of XML and relational databases
are quite different
- most serious databases have been around a long time,
we can't just design them to fit our DTD
- most DTDs are agreed between large numbers of
organisations, we can't just design them to fit our
database
- but it may be coerced with a little help from
XSL...
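For the easy case, the mapping from element to table row can be sketched in a few lines of plain JDK code. This assumes, as above, one table per element type and one field per attribute with the same name; the class name AttributeNamespace and the ATTENDEE table are illustrative only, and the Jacquard classes used in the worked example below do this job differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class AttributeNamespace
{
    /** copy the attributes of an element into a map, sorted by name */
    static Map<String, String> toMap( Element elt)
    {
        Map<String, String> values = new TreeMap<>();
        NamedNodeMap attrs = elt.getAttributes();
        for ( int i = 0; i < attrs.getLength(); i++)
        {
            Node attr = attrs.item( i);
            values.put( attr.getNodeName(), attr.getNodeValue());
        }
        return values;
    }

    /** build a parameterised insert statement, assuming one column per
     * attribute with the same name as the attribute */
    static String insertStatement( String table, Map<String, String> values)
    {
        String columns = String.join( ", ", values.keySet());
        String marks = String.join( ", ",
            Collections.nCopies( values.size(), "?"));
        return "insert into " + table + " ( " + columns + ") values ( " +
            marks + ")";
    }

    /** parse a string, returning the root element */
    static Element parse( String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new InputSource( new StringReader( xml)))
                .getDocumentElement();
        }
        catch ( ParserConfigurationException | SAXException | IOException ex)
        {
            throw new IllegalStateException( "Could not parse: " +
                ex.getMessage(), ex);
        }
    }

    public static void main( String[] args)
    {
        Map<String, String> values = toMap( parse(
            "<attendee name=\"Jon Smith\" age=\"37\" sex=\"M\" country=\"UK\"/>"));
        System.out.println( values);
        System.out.println( insertStatement( "ATTENDEE", values));
    }
}
```

The values themselves would then be bound to the question marks through a java.sql.PreparedStatement, which avoids quoting problems with names like O'Brien.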
Other things to bear in mind
- Text nodes - what do you do with them?
- Context - what was the key value of that meeting we just
stored?
Parsing: very simple worked example
- Sample XML document
- Sample Java class
Sample XML document
<?xml version="1.0"?>
<workshop tutor="Simon Brooke"
title="Parsing XML" venue="small">
<attendee name="Jon Smith" age="37"
sex="M" country="UK" />
<attendee name="Jane Doe" age="42"
sex="F" country="US" />
</workshop>
Those who were here yesterday will
probably recognise this from the 'WORKSHOP' database - I'm
using this because I can't predict what your 'MEETING'
databases will look like
Sample Java class
import java.io.*; // to read things from the user
import java.sql.*; // to talk to the database
import uk.co.weft.domutil.*; // things to convert elements to namespaces
import uk.co.weft.dbutil.*; // things to store namespaces in databases
import org.w3c.dom.*; // interrogates a DOM tree...
import org.apache.xerces.dom.*; // using Apache's DOM implementation
import org.apache.xalan.xslt.*; // Apache's XSL processor
import org.apache.xerces.parsers.DOMParser;
// and Apache's XML parser
public class ParseExample
{
static Context connectionContext = new Context();
// a context to hold database
// connection details
/** walk down a document tree looking for nodes we recognise */
public static void walk( Node node)
throws SQLException, DataStoreException
{
if ( node.getNodeType() == Node.ELEMENT_NODE)
{
Element elt = ( Element) node;
System.out.println( "Considering element of type " +
elt.getTagName());
if ( elt.getTagName().equals( "workshop"))
handleWorkshop( elt);
else
{
NodeList children = elt.getChildNodes();
for ( int i = 0; i < children.getLength(); i++)
walk( children.item( i));
// recurse down through the children
}
}
}
/** handle a workshop element; extract its attribute (and
* actually, its text-only child) values, and store them in the
* database. Then look for attendees.*/
protected static void handleWorkshop( Element elt)
throws SQLException, DataStoreException
{
Object key = null;
Context c = ( Context)connectionContext.clone();
// construct a new namespace with just
// the database connection details in
// it
ContextElement.populateContext( elt, c);
// fill it with values from the element
TableDescriptor workshopDescriptor =
TableDescriptor.getDescriptor( "WORKSHOP", "Workshop", c);
// get a descriptor on the WORKSHOP table
Contexts rows = workshopDescriptor.match( c);
// try to match that against what's
// already in the table
if ( rows != null && rows.size() > 0)
{ // there was a match
key = ( ( Context)rows.get( 0)).getValueAsInteger( "Workshop");
// get its primary key value
System.out.println( "Found workshop " + key.toString());
}
else
{
key = workshopDescriptor.store( c);
// store it and get its primary key value
System.out.println( "Created workshop " + key.toString());
}
NodeList children = elt.getChildNodes();
for ( int i = 0; i < children.getLength(); i++)
{ // look through the children for my attendees
Node child = children.item( i);
if ( child.getNodeType() == Node.ELEMENT_NODE &&
( ( Element) child).getTagName().equals( "attendee"))
{
handleAttendee( ( Element)child, key);
}
}
}
/** handle an attendee element by finding or storing it in the
* database, and fixing up the link table */
protected static void handleAttendee( Element elt, Object workshopKey)
throws SQLException, DataStoreException
{
Object attendeeKey = null;
Context c = ( Context)connectionContext.clone();
// construct a new namespace with just
// the database connection details in
// it
ContextElement.populateContext( elt, c);
// fill it with values from the element
TableDescriptor attendeeDescriptor =
TableDescriptor.getDescriptor( "ATTENDEE", "Attendee", c);
// get a descriptor on the ATTENDEE table
Contexts rows = attendeeDescriptor.match( c);
// try to match that against what's
// already in the table
if ( rows != null && rows.size() > 0)
{ // there was a match
attendeeKey =
( ( Context)rows.get( 0)).getValueAsInteger( "Attendee");
// get its primary key value
System.out.println( "Found attendee " +
attendeeKey.toString());
}
else
{
attendeeKey = attendeeDescriptor.store( c);
// store it and get its primary key value
System.out.println( "Created attendee " +
attendeeKey.toString());
}
String q = "insert into ATTENDANCE ( Attendee, Workshop) values ("
+ attendeeKey.toString() + ", " + workshopKey.toString() + ")";
Connection conn = c.getConnection();
// get a database connection from the pool
try
{
Statement s = conn.createStatement();
try
{
s.executeUpdate( q); // run the statement
System.out.println( "Inserted link into link table");
}
finally
{
s.close(); // close the statement...
}
}
finally
{
c.releaseConnection( conn);
// and release the connection back into the pool,
// even if the insert failed
}
}
/** prompt the user for input; if we get any, return it */
protected static String maybeGetFromUser( BufferedReader in, String prompt,
String val) throws IOException
{
System.out.print( prompt + " ] ");
String s = in.readLine();
if ( s != null && s.trim().length() > 0)
val = s.trim(); // otherwise keep the default
return val;
}
/** start me up... */
public static void main(String args[])
{
BufferedReader in = new
BufferedReader( new InputStreamReader( System.in));
// get from the user the name of the
// database driver to use
try
{
Class.forName(
maybeGetFromUser( in, "Database Driver",
"sun.jdbc.odbc.JdbcOdbcDriver"));
// get from the user the details
// needed to connect to the database
connectionContext.put( "db_url",
maybeGetFromUser( in, "Database URL",
"jdbc:odbc:workshop"));
connectionContext.put( "db_username",
maybeGetFromUser( in, "Database Username",
"nobody"));
connectionContext.put( "db_password",
maybeGetFromUser( in, "Database Password",
"doesntmatter"));
DOMParser p = new DOMParser();
p.parse( maybeGetFromUser( in, "URL of XML to handle",
"file:workshop.xml"));
walk( p.getDocument().getDocumentElement());
System.exit( 0); // all satisfactory
}
catch ( Exception e)
{
System.out.println( "Failed: " + e.getClass().getName() +
": " +e.getMessage());
System.exit( 1); // whoops
}
}
}
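The class above uses Apache's DOMParser class directly, which ties it to Xerces. As a hedged alternative, the parse-and-walk part can be written against the JAXP API bundled with the JDK, which works with whatever parser is installed. This is a minimal sketch, not a replacement for the class above: the class name JaxpWalk is hypothetical, and instead of storing anything it just records the tag name of each element it visits.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class JaxpWalk
{
    /** parse a string into a DOM document using the JDK's bundled parser */
    static Document parse( String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse( new InputSource( new StringReader( xml)));
        }
        catch ( ParserConfigurationException | SAXException | IOException ex)
        {
            throw new IllegalStateException( "Could not parse: " +
                ex.getMessage(), ex);
        }
    }

    /** walk down the tree, recording the tag name of every element */
    static void walk( Node node, List<String> seen)
    {
        if ( node.getNodeType() == Node.ELEMENT_NODE)
        {
            seen.add( node.getNodeName());
            NodeList children = node.getChildNodes();
            for ( int i = 0; i < children.getLength(); i++)
                walk( children.item( i), seen);
                        // recurse down through the children
        }
    }

    /** tag names of all elements in the document, in document order */
    static List<String> elementNames( String xml)
    {
        List<String> seen = new ArrayList<>();
        walk( parse( xml).getDocumentElement(), seen);
        return seen;
    }

    public static void main( String[] args)
    {
        System.out.println( elementNames(
            "<workshop title=\"Parsing XML\">"
            + "<attendee name=\"Jon Smith\"/>"
            + "<attendee name=\"Jane Doe\"/>"
            + "</workshop>"));
    }
}
```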
Exercise period [iv]
-
In your groups
- Write an XSL-T stylesheet that converts back from the
common DTD to the group's DTD
- Adapt the above Java class to store (at least part
of) documents in your group's DTD into your database
References
XML
-
- news:comp.text.xml
- Newsgroup for XML - recommended
-
FAQs, Directories and Resources
-
- Extensible Markup Language (XML): http://www.oasis-open.org/cover/xml.html
- A useful and authoritative overview of the
technology; another good place to start.
- Frequently Asked Questions about the Extensible
Markup Language: http://www.ucc.ie/xml/
- The most superior FAQ. Everyone seriously interested
in XML should start here.
- SCHEMA.NET: The XML Schema Site: http://www.schema.net/
- Cafe con Leche XML News, and Resources: http://metalab.unc.edu/xml/index.html
- DEVELOPERLIFE.COM brought to you by Nazmul Idris.: http://developerlife.com/
- xmlTree - The leading directory of XML content on the
Web: http://www.xmltree.com/
-
News
-
- Welcome to XMLNews.org: http://www.xmlnews.org/
- Mulberry Technologies, Inc.: XSL-List -- Open Forum
on XSL: http://www.mulberrytech.com/xsl/xsl-list/
- XMLephant: News: http://www.xmlephant.com/pages/News/
- XML.ORG - A good XML Portal: http://www.xml.org/
- XML.com - Another good XML portal: http://www.xml.com/pub
-
Standards
- Authoritative sources of standards documents, mostly from
the World Wide Web Consortium (W3C)
-
-
-
Core standards
-
-
- The Annotated XML Specification: http://www.xml.com/axml/testaxml.htm
- The standard annotated by one of the editor's
personal comments -- very revealing!
- Extensible Markup Language (XML) 1.0: http://www.w3.org/TR/1998/REC-xml-19980210
- XML Linking Language (XLink): http://www.w3.org/TR/WD-xlink#addressing
-
Resource Description Framework
-
- W3C Resource Description Framework: http://www.w3.org/RDF/
- java tutorial help resource only at gamelan.com:
http://www.gamelan.com/journal/techfocus/090199_rdf1.html
- UKOLN: DC-dot, A Dublin Core Generator: http://www.ukoln.ac.uk/metadata/dcdot/
- Dublin Core Metadata Initiative / Documents /
Proposed Recommendations / Dublin Core Element Set,
Version 1.1:
http://purl.org/DC/documents/rec-dces-19990702.htm
- Dublin Core Metadata Initiative: http://purl.org/dc/index.htm
- UKOLN Metadata Resources - DC: http://www.ukoln.ac.uk/metadata/resources/dc/
- Welcome to XMLNews.org: http://www.xmlnews.org/
-
XSL
-
- XSL Transformations (XSLT) Specification: http://www.w3.org/TR/WD-xslt
-
DocBook
-
- The nwalsh.com Home Page - XSL DocBook
Stylesheets: http://nwalsh.com/docbook/xsl/
- XSL DocBook Stylesheets: http://nwalsh.com/docbook/xsl/
-
WML
-
- WAP WAP Binary XML (WBXML) Encoding
Specification: http://www.w3.org/TR/wbxml/
- Welcome to WAP School: http://www.refsnesdata.no/wap/default.asp
- Nokia WAP Developer Forum: Nokia WAP Toolkit:
http://www.forum.nokia.com/wapforum/main/1,6668,1_1_3_2,00.html
-
RSS: Rich Site Summary
-
-
Tutorials
-
- My Netscape Network: http://my.netscape.com/publish/
- Using RSS News Feeds - Webreference.com:
http://www.webreference.com/perl/tutorial/8/
-
Feed Directories
-
- Webfeeds:
http://www.stirbitch.com/cgi-bin/agg/sources.pl
- Moreover... Top stories: http://w.moreover.com/
- StartsHere Channel List:
http://theweb.startshere.net/channels.phtml
- Open Directory - Computers: Internet: WWW:
Web Portals: Netscape Netcenter: My Netscape
Network:
http://dmoz.org/Computers/Internet/WWW/Web_Portals/Netscape_Netcenter/My_Netscape_Network/
- Internet Alchemy : Internet Alchemy : RSSMaker:
http://internetalchemy.org/rss/index.phtml
- xmlTree - The leading directory of XML content on
the Web: http://www.xmltree.com/rss/index.htm
- XML.COM - Standards List Sorted by Date: http://www.xml.com/xml/pub/standate/
- W3C Scalable Vector Graphics (SVG): http://www.w3.org/Graphics/SVG/
- VML - the Vector Markup Language: http://www.w3.org/TR/1998/NOTE-VML-19980513
- Vector (infinitely zoomable) graphics for the Web,
with implications especially for maps and technical
diagrams.
- News Industry Text Format: http://www.nitf.org/
- Meta Content Framework Using XML: http://www.w3.org/TR/NOTE-MCF-XML/
- 'Content about content' - i.e. information for search
and indexing engines and other software agents which must
make some sense of the document.
- Audio, Video, and Synchronized Multimedia: http://www.w3.org/AudioVideo/
- The SMIL standard. I believe SMIL has implications
not just for the Web, but for all sorts of presentation
media including digital television.
- XHTML 1.0: The Extensible HyperText Markup Language:
http://www.w3.org/TR/WD-html-in-xml/
- Backwards compatibility: implementing HTML in XML.
Only very well written HTML is going to work!
- XML Catalog proposal: http://www.ccil.org/~cowan/XML/XCatalog.html
- XHTML 1.0: The Extensible HyperText Markup Language:
http://www.w3.org/TR/xhtml1/
- Template Resolution in XML/HTML: http://www-uk.hpl.hp.com/people/ak/doc/trix.html
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- Workflow Management Coalition: http://www.aiim.org/wfmc/mainframe.htm
- DSML.ORG: The Standards Effort to Link Directories
with XML: http://www.dsml.org/
-
Tutorials
-
- Info for Newcomers to XML at XMLINFO: http://www.xmlinfo.com/newcomers/
- Producing HTML tables with XSLT:
http://www.cogsci.ed.ac.uk/~dmck/xslt-tutorial.html
- A Tutorial in XML and XSL Authoring: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- Java & XML: 1 + 1 > 2:
http://www.sun.com.au/sjug/pres/xml/JavaAndXML/seminar.html#Slide3
- The WDVL: XML Tutorials:
http://www.wdvl.com/Authoring/Languages/XML/Tutorials/
- Generally Markup: XML Resources: http://pdbeam.uwaterloo.ca/~rlander/XML_Tutorial/
- developerWorks : XML : Education:
http://www.software.ibm.com/developer/education/xmlintro/xmlintro.html
- SGML/XML: Using Elements and Attributes:
http://www.oasis-open.org/cover/elementsAndAttrs.html
- Welcome to XML School: http://www.refsnesdata.no/xml/
- Practical XML : An introduction to XML and XSL
stylesheets:
http://www.kst.com/articles/2000/January/practical_xml1/index.php
- Crane Softwrights Ltd. - Training:
http://www.CraneSoftwrights.com/training/index.htm#ptux-dl
- developerWorks : XML : Education:
http://www-4.ibm.com/software/developer/education/xmlintro/xmlintro.html
- RSS Tutorial:
http://my.netscape.com/publish/help/mnn20/quickstart.html#rsssyntax
- XML DTD Tutorial: http://www.xml101.com/dtd/
-
Software resources
-
-
Editors
-
- Editing SGML with Emacs and PSGML - Table of
Contents:
http://rainbow.ldeo.columbia.edu/documentation/programs/psgml/psgml_toc.html#SEC2
- A GNU Emacs mode for SGML files:
http://www.lysator.liu.se/projects/about_psgml.html
- This is what I use and recommend (I personally
use XEmacs rather than GNU Emacs)
- SoftQuad XMetaL: http://www.softquad.com/index_main.html
- Mulberry Technologies -- tdtd Emacs Major Mode
for SGML and XML DTDs: http://www.mulberrytech.com/tdtd/
- Download Morphon XML Editor 1.0b41:
http://www.lunatech.com/products/morphon-xml-editor/download/
-
Browsers
-
- Jumbo:
http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/
- Doczilla: http://www.doczilla.com/download/index.html
- XML Viewer : another alphaWorks technology: http://www.alphaworks.ibm.com/tech/xmlviewer
- InDelv: http://www.indelv.com/
-
XML to HTML on the fly
-
- IBM XML Web Site, Education - Accessing XML on
the Client:
http://www.software.ibm.com/xml/education/client/client.html
- Apache Cocoon: http://xml.apache.org/cocoon/
- Apache is the world's most widely used Web
server. This is the Apache project's server-side XML
to HTML conversion strategy, important for serving
XML documents while many browsers are still unable to
interpret it. Implemented as a Java Servlet, may work
with other Servlet enabled Web servers (but then does
anyone serious use anything other than Apache
anyway?)
-
XML Database integration
-
- DB2XML A tool for transforming relational
databases into XML documents:
http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- Tamino - The Information Server for Electronic
Business, Software AG: http://www.softwareag.com/tamino/
- A database which claims to store XML directly.
Whether this means that it's really an
object-oriented database underneath I'm not
sure.
- ODBC2XML: Merging ODBC data into XML documents:
http://members.xoom.com/_XOOM/gvaughan/odbc2xml.htm
- pgxml homepage: http://www.morinel.demon.nl/pgxml/
- My favourite database engine, Postgres,
- XML Lightweight Extractor : another alphaWorks
technology: http://alphaworks.ibm.com/tech/xle
-
Conversion tools and filters
-
- RTF2XML: http://www.xmeta.com/omlette/
- Tool for converting RTF to XML, written in
Omnimark
- OmniMark Technologies Corporation: http://www.omnimark.com/
- A programming language for manipulating data
streams, useful in writing conversion filters from
other formats into XML.
-
Quick ways to produce DTDs
-
- DTDGenerator Frontend: http://www.pault.com/Xmltube/dtdgen.html
- DB2XML A tool for transforming relational
databases into XML documents:
http://www.informatik.fh-wiesbaden.de/~turau/DB2XML/index.html
- schematron:
http://www.ascc.net/xml/resource/schematron/schematron.html
- Widely recommended as a very powerful and elegant
solution, knows about schemas as well as DTDs.
- XMLschema.com: http://apps.xmlschema.com/
-
Structured Search tools
-
- Downloading sgrep:
http://www.cs.helsinki.fi/~jjaakkol/sgrep/download.html
- Probably the most powerful simple tool for
manipulating SGML and XML documents
-
Software collections and directories
-
- xml.apache.org: http://xml.apache.org/
- XMLSOFTWARE.COM: The XML Software Site: http://www.xmlsoftware.com/
- This (commercial) site tries to keep track of XML
related software tools which are available. Likely
not to effectively index open source tools in the
longer term.
- Free XML software:
http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html#SC_XSL
- IBM Developers: XML : Overview: http://www.ibm.com/developer/xml/
- eXtensible Server Pages (XSP) Layer 1: http://java.apache.org/cocoon/xsp/WD-xsp.html
- OpenXML: http://www.openxml.org/
- Major open source project to provide XML tools in
Java
- PHP3: Manual: XML Parser Functions: http://www.php.net/manual/ref.xml.php3
- PHP is a server-side scripting language -- probably
the best of the open source ones available. This manual
section shows how the PHP project intends to handle XML
at the server side, and is thus an alternative to
Apache's Cocoon technology.
- XML Authority Product Overview:
http://www.extensibility.com/xml_authority/xml_ath_specs.htm
- eidon products - Solutions for Structured Documents:
http://www.eidon-products.com/
- Dynamic XML for Java : another alphaWorks technology:
http://www.alphaworks.ibm.com/tech/dynamicxmlforjava
- XML Products Evaluation Form:
http://www.bluestone.com/scripts/SaApps/SaCGI.exe/XMLevaluate.class
- XML Script - XML tools for E-commerce: http://www.xmlscript.org/
- SAX: The Simple API for XML: http://www.megginson.com/SAX/
- Activated Intelligence Rocks Your Java World!: http://www.activated.com/
- W4F, the World Wide Web Wrapper Factory: Welcome: http://db.cis.upenn.edu/W4F/
- JDOM: Who We Are: http://www.jdom.org/credits/index.html
-
Commentary and background
-
- XML, Java, and the future of the Web:
ftp://sunsite.unc.edu/pub/sun-info/standards/xml/why/xmlapps.html
- Scientific American: Feature Article: XML and the
Second Generation Web: May 1999:
http://www.scientificamerican.com/1999/0599issue/0599bosak.html
- An extremely clear and well written article
- DevEdge Online - Metadata:
http://developer.netscape.com/tech/metadata/index.html
- Netscape's official take on metadata.
- XML.COM - XML support in IE5:
http://www.xml.com/xml/pub/1999/03/ie5/first-x.html
- XML.com sets out to be a newsletter on XML and
related developments. Its contributors are in general
exceptionally well informed. In this article Tim Bray
(who works closely with Netscape) reviews Microsoft IE5's
XML compatibility.
- CNET News.com - Taking sides on XML: http://www.news.com/News/Item/0,4,37072,00.html
- XML Namespaces: http://www.jclark.com/xml/xmlns.htm
- The Last Page: XML's Achilles Heel (Web Techniques,
June 1999):
http://www.webtechniques.com/archives/1999/06/lastpage/
-
XML EDI and e-Commerce stuff
- A number of competing proposals are being developed to
do automatic business-to-business transfer of invoices,
orders, et cetera...
-
-
- CNET.com - News - Services & Consulting -
Big-name chemical firms join business e-commerce trend:
http://news.cnet.com/news/0-1008-200-1579569.html?tag=st
-
Collaborative initiatives
-
- The OBI Consortium: http://www.openbuy.org/
- A solid business community consortium
- Welcome to RosettaNet: http://www.rosettanet.org/
- Probably the most incompetent and unprofessional
Web site I've ever seen. This organisation claims to
be the hub of EDI in XML development, but their Web
site gives no comfort whatever regarding their
competence.
- Biztalk - Letting computers speak the language of
business: http://www.biztalk.org/
- Microsoft's tame e-Commerce consortium.
- FpML.org: http://www.fpml.org/
- JP Morgan - PriceWaterhouseCoopers initiative,
apparently mainly aimed at financial services.
- Electronic Business XML (ebXML) Home Page: http://www.ebXML.org/
-
Suppliers
-
- DEDIOUX - Dynamic EDI Objects Using XML:
http://www.americancoders.com/OpenBusinessObjects
- ariba.com - welcome: http://www.ariba.com/
- Welcome To OpenLink Software: http://www.openlinksw.com/virtuoso/
-
Stories
-
- XML Applications Stand Up To EDI:
http://www.techweb.com/wire/story/TWB19990416S0002
- XML Applications Stand Up To EDI:
http://www.techweb.com/se/directlink.cgi?INW19990419S0014
- News story about Dell Computer's XML
- CNET News.com - IBM links business software,
e-commerce:
http://www.news.com/News/Item/0,4,35128,00.html
- News story about IBM's XML e-Commerce
-
WAP/WML
-
- WAP WAP Binary XML (WBXML) Encoding Specification: http://www.w3.org/TR/wbxml/
- wml-tools: http://www.pwot.co.uk/wml/
- www.kannel.org: http://www.kannel.org/
- XML Icon Gallery.: http://www.iol.ie/~alank/xml/icons.htm