qad_doc2xml - Tutorial 1:
Conversion of a Word document to XML

 

Preparation

Before starting qad_doc2xml, take a look at the Word document that is going to be converted (folder "\Examples\Tutorial1", file "Recipe1.doc"). It is a Word document with different paragraph styles (the colours can be ignored; they are only used to give a better overview).


Start qad_doc2xml and select the word document

Select the file "Recipe1.doc" (in folder "\Examples\Tutorial1") in the "Source" section. To avoid an error message, close Word before you start working with qad_doc2xml.


Name the new XML document

... e. g. "Recipe1.xml"


Determine the first ("root") XML-tag


Conversion rules

Specify an XML-tag for each paragraph style used in the Word document.
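
Conceptually, each conversion rule pairs one paragraph style with one tag. The following Python sketch is not part of qad_doc2xml and does not read Word files; it only illustrates the idea on a hand-made list of (style, text) pairs. The rule table, the function name "convert" and the style name "Course" are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical rule table: Word paragraph style -> XML tag.
RULES = {
    "Überschrift 2": "Recipe",
    "Course": "Course",
}


def convert(paragraphs, rules):
    """Wrap each (style, text) paragraph in the tag its style maps to."""
    lines = []
    for style, text in paragraphs:
        tag = rules.get(style)
        if tag is None:
            continue  # no rule for this style: skipped in this sketch
        # escape() protects &, < and > in the paragraph text
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return lines
```

Calling convert([("Überschrift 2", "Chicken Curry"), ("Course", "Entree")], RULES) yields one flat element per paragraph; nesting is a separate concern.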


Load DTD

To make editing the conversion rules easier, you can load a list of all tags from a DTD (if one is available). Click "Get Taglist from DTD" and select the "cookbook.dtd" file.

Note: The DTD should not be in Unix file format.
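
If a DTD causes trouble here, the following Python sketch may help to prepare it outside of qad_doc2xml. It assumes that "Unix file format" refers to LF-only line endings and converts them to the DOS/Windows CRLF convention. It also lists the element names declared in a DTD, which roughly corresponds to a tag list. Both function names are made up for this example, and the regular expression is a simplification (it would also match declarations inside comments).

```python
import re


def to_dos_line_endings(data: bytes) -> bytes:
    """Convert LF or CR line endings to CRLF without doubling existing CRLF."""
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")


def element_names(dtd_text: str) -> list:
    """Return the element names declared in a DTD, in declaration order."""
    return re.findall(r"<!ELEMENT\s+([^\s>]+)", dtd_text)
```

To rewrite a file in place, read it in binary mode, pass the bytes through to_dos_line_endings and write the result back, ideally after making a backup copy.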


First conversion

Click the "Convert" button. After a successful conversion you can view the result by clicking the "View XML" button (opens the browser) or the "View code" button (opens Notepad).

So far, the result (see screenshot) is not very convincing:


Fixing levels

Set the "level" column as follows:

Now convert the document again and compare the result with that of the first conversion.
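
The effect of the "level" column can be pictured with a small Python sketch. It assumes that a level is a nesting depth: an element of level 2 becomes a child of the most recent element of level 1, and so on. This is an interpretation for illustration only, not a description of how qad_doc2xml works internally; the function name "build_tree", the root tag "Cookbook" and the sample rows are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_tree(root_tag, rows):
    """Nest (tag, level, text) rows; level 1 is a direct child of the root."""
    root = ET.Element(root_tag)
    stack = [(0, root)]  # (level, element) pairs of the currently open elements
    for tag, level, text in rows:
        if level < 1:
            raise ValueError("level must be 1 or greater")
        # Close every open element at the same or a deeper level.
        while stack[-1][0] >= level:
            stack.pop()
        elem = ET.SubElement(stack[-1][1], tag)
        elem.text = text
        stack.append((level, elem))
    return root


rows = [
    ("Recipe", 1, None),
    ("Name", 2, "Chicken Curry"),
    ("Course", 2, "Entree"),
]
print(ET.tostring(build_tree("Cookbook", rows), encoding="unicode"))
```

With all levels set to 1, the same rows would end up as siblings directly below the root element.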


Unneeded Information

The resulting file still contains some unneeded information (see screenshot), which is easy to remove.

Click the "Special" text field in the "Ingredients" line and activate the "ignore text ..." function in the dialog box that appears.


"Text in Child"

The following section is not yet handled optimally:

The result should be like this:

<Recipe>
<Name>Chicken Curry</Name>
<Course>Entree</Course>
...
</Recipe>

... which can also be achieved with qad_doc2xml: Click the "Special" text field in the "Recipe" line and enter "Name" in "Text in Child Element".
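
What "Text in Child Element" achieves can be sketched in Python as moving an element's own text into a new first child. The "before" shape used below (the recipe title as bare text directly inside "Recipe") is an assumption, and the function name "text_to_child" is invented for this example.

```python
import xml.etree.ElementTree as ET


def text_to_child(elem, child_tag):
    """Move elem's own leading text into a new first child element."""
    text = (elem.text or "").strip()
    if not text:
        return  # nothing to move
    child = ET.Element(child_tag)
    child.text = text
    elem.text = None
    elem.insert(0, child)


# Assumed shape of the unsatisfying intermediate result:
recipe = ET.fromstring("<Recipe>Chicken Curry<Course>Entree</Course></Recipe>")
text_to_child(recipe, "Name")
print(ET.tostring(recipe, encoding="unicode"))
```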


Attributes

Fixed attributes can be set for each tag as shown:

After another conversion the result should now look like this:


Text in attributes

Alternatively, "Chicken Curry" can be set as an attribute of "Recipe".

Click the "Special" text field of the relevant element (here: "Überschrift 2") and modify the entries as follows:
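
The dialog entries themselves are not reproduced in this text. The intended effect, as opposed to the settings, can be sketched in Python: the element's own text becomes the value of an attribute. The attribute name "name" and the function name "text_to_attribute" are assumptions for illustration; the tutorial does not state which attribute name is used.

```python
import xml.etree.ElementTree as ET


def text_to_attribute(elem, attr_name):
    """Turn elem's own leading text into an attribute value."""
    text = (elem.text or "").strip()
    if text:
        elem.set(attr_name, text)
        elem.text = None


recipe = ET.fromstring("<Recipe>Chicken Curry<Course>Entree</Course></Recipe>")
text_to_attribute(recipe, "name")  # "name" is a hypothetical attribute name
print(ET.tostring(recipe, encoding="unicode"))
```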


"Hard" formats - Bold, Italics

qad_doc2xml also recognizes the following "hard" (directly applied) formatting: italics, bold, underline and small caps. Because this option slows down the conversion process, it is deactivated by default. To use it, choose the following entries:

Result:


Save conversion rules

Save the ruleset so that you can reuse it to convert similar documents.


Templates

XML documents usually have a detailed header, which can be added automatically, too. See the sample template file "cookbook_template.txt". Edit the text before and after the <!-- word text --> tag (that is the place where qad_doc2xml will insert the results).

To add the header, use the "Select Tmpl." function and choose "cookbook_template.txt".
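
The template mechanism amounts to a simple text substitution, which the following Python sketch illustrates. The exact spelling of the placeholder is an assumption (check the sample template file for the real one), and the function name "fill_template", the sample header and the root tag "Cookbook" are hypothetical.

```python
def fill_template(template, body, placeholder="<!-- word text -->"):
    """Insert the converted XML at the first occurrence of the placeholder."""
    if placeholder not in template:
        raise ValueError("placeholder not found in template")
    return template.replace(placeholder, body, 1)


# Hypothetical template content; the real sample file may look different.
template = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE Cookbook SYSTEM "cookbook.dtd">\n'
    "<!-- word text -->\n"
)
print(fill_template(template, "<Cookbook>...</Cookbook>"))
```

Raising an error when the placeholder is missing avoids silently producing a header without any converted content.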


See also Tutorial 2 (conversion to XHTML) and Tutorial 3 (conversion to XML/TEI).
