- Q: What is Exegenix technology?
-
A: Exegenix technology can convert any file that can be printed to
PostScript® or PDF into XML. It uses visual cues to
uncover a document's structure, much the same way that humans do. It looks at a
document as a whole, taking into consideration each graphical object's format,
position, and context.
- Q: How does Exegenix technology differ from
existing conversion solutions?
-
A: Unlike hand-tagging, it is accurate and
fast, and it minimizes human resource requirements, freeing valuable expertise
to be employed more effectively elsewhere.
Unlike "scripted" conversion, it does not depend on
consistently applied formatting styles, and it requires no programming expertise
to develop or maintain configuration scripts.
- Q: Is Exegenix technology similar to
OCR (Optical Character Recognition) technology?
-
A: No. Exegenix technology does not read
scanned images of paper documents. It examines PostScript® and PDF data created by a
print-to-file operation and uses the characters in that file
directly, with the layout of the page as a guide to the structure of the
document. However, if your legacy documents are paper-based and require
scanning and pre-processing into PDF format prior to XML conversion, Exegenix
can supply this service via one of our partners.
- Q: What unstructured formats can Exegenix
technology convert to XML?
-
A: Exegenix technology can convert any
printable content; that is, any content that can be printed to a
PostScript® or PDF file. This means that all of the following conversion
requirements can be met:
- PDF to XML
- PostScript to XML
- Microsoft Word to XML
- RTF to XML
- WordPerfect to XML
- Quark XPress to XML
- Paper to XML
- Anything-you-can-print-to-PostScript to XML
See the document "Printing to PDF for Exegenix
Conversion Solutions" for further information. For formats other than PDF
and PostScript, Exegenix can supply technology and services to automate the
"print-to-PostScript" process.
- Q: How does Exegenix technology know what
DTD to use?
-
A: Exegenix technology generates XML output
compliant with the Exegenix Export XML DTD. This DTD models generic,
hierarchical output and is based on DocBook (including the OASIS/CALS table
model). Formatting properties of the original document are represented using
attributes that conform to XSL-FO (XSL-Formatting Objects) and CSS (Cascading
Style Sheets). From this canonical form, XSLT and/or other transformation
scripts can be applied to convert to other formats, such as another XML format,
SGML, HTML, RTF, ASCII text, OeB, DAISY, or DocBook 4.4.
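As a rough illustration of transforming the canonical form into another format, the sketch below renames elements to HTML tags. It uses Python's standard `xml.etree.ElementTree` module rather than XSLT, purely to keep the example self-contained. The element names (`section`, `title`, `para`, `emphasis`) are borrowed from DocBook as an assumption, and `TAG_MAP` and `convert_to_html` are hypothetical names; the actual Exegenix Export DTD and transformation scripts may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DocBook-style element names to HTML tags.
# The real Exegenix Export DTD vocabulary is not shown in this FAQ.
TAG_MAP = {"section": "div", "title": "h2", "para": "p"}


def _to_html(elem):
    """Copy an element recursively, renaming mapped tags.

    Unmapped elements become <span>; attributes are dropped in this sketch.
    """
    out = ET.Element(TAG_MAP.get(elem.tag, "span"))
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(_to_html(child))
    return out


def convert_to_html(xml_text):
    """Parse an XML string and return an HTML fragment as a string."""
    root = ET.fromstring(xml_text)
    return ET.tostring(_to_html(root), encoding="unicode", method="html")


if __name__ == "__main__":
    sample = (
        "<section><title>Scope</title>"
        "<para>Some <emphasis>key</emphasis> text.</para></section>"
    )
    print(convert_to_html(sample))
```

A production pipeline would more likely express this mapping as an XSLT stylesheet, as the answer above describes; the Python version only shows the shape of the transformation.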
- Q: What about presentation of the XML
output? Does it automatically use CSS/XSL also?
-
A: All formatting information in the source
document is retained in the XML output as XSL-FO- and CSS-compatible
attributes, which can be separated from the content by an
XSL Transformation script. If you require automated post-processing in the form
of custom XSLT scripts and/or population of your content management system,
Exegenix can supply this service.
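The idea of isolating formatting from content can be sketched as follows. This is a minimal example using Python's standard `xml.etree.ElementTree` module in place of an XSLT script. The element names, the `FORMATTING_ATTRS` set, and the `separate_formatting` function are all hypothetical; the attribute names are ordinary XSL-FO/CSS property names, but which ones the Exegenix Export DTD actually uses is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumed set of formatting attributes (XSL-FO/CSS property names).
# The attributes actually emitted by the Exegenix Export DTD may differ.
FORMATTING_ATTRS = {
    "font-family", "font-size", "font-weight",
    "font-style", "text-align", "color",
}


def separate_formatting(xml_text):
    """Strip formatting attributes from every element.

    Returns (content_xml, styles), where styles is a list of
    (tag, {attribute: value}) pairs in document order.
    """
    root = ET.fromstring(xml_text)
    styles = []
    for elem in root.iter():
        # Iterate over a copy of the keys because we mutate elem.attrib.
        found = {
            name: elem.attrib.pop(name)
            for name in list(elem.attrib)
            if name in FORMATTING_ATTRS
        }
        if found:
            styles.append((elem.tag, found))
    return ET.tostring(root, encoding="unicode"), styles


if __name__ == "__main__":
    sample = (
        '<section><title font-weight="bold" font-size="14pt">Scope</title>'
        '<para id="p1" text-align="justify">Text.</para></section>'
    )
    content, styles = separate_formatting(sample)
    print(content)
    print(styles)
```

Non-formatting attributes such as `id` stay on the content, so the stripped XML remains addressable while the collected styles could feed a separate stylesheet.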
- Q: Can Exegenix technology output SGML or
other formats?
-
A: Exegenix technology always creates XML.
This XML is valid against the Exegenix Export DTD.
The Exegenix Export DTD was designed so that Exegenix
Export XML can be easily transformed to whatever format a customer requires.
Exegenix has developed XSL Transformation scripts that will transform
standard Exegenix Export XML to SGML syntax, and other transformations to
convert it to XHTML, text, OeB, DocBook 4.4, DAISY Digital Talking
Book (DTBook), etc.
See "The Exegenix Export DTD" for more information.
- Q: Can Exegenix technology convert all
constructs within a document?
-
A: Exegenix technology is modular and
extensible. We have undertaken a broad-based document analysis project to build
supporting modules for the most commonly found constructs. As distinct new
formatting constructs are encountered in new document sets, new
modules can be added to support them. See "Document Complexity in XML
Conversion" for additional information on document types and their ease of
conversion.
- Q: Is there a limit on the size of document
Exegenix technology can convert?
-
A: While there is no technical limit on
the size of a single document that can be processed, documents
longer than 1000 pages will require significant amounts of processor power and
memory. Though it is not a requirement of the technology, Exegenix therefore
recommends splitting large documents into more manageable sections of around
200 pages.
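The splitting recommendation amounts to simple arithmetic on page ranges. The sketch below is not part of any Exegenix tool; `split_page_ranges` is a hypothetical helper that computes the page ranges for sections of roughly 200 pages, which could then be passed to whatever tool performs the actual split.

```python
def split_page_ranges(total_pages, section_size=200):
    """Return inclusive, 1-based (first, last) page ranges.

    The ranges cover pages 1..total_pages in chunks of at most
    section_size pages; the final chunk may be shorter.
    """
    if total_pages < 1 or section_size < 1:
        raise ValueError("total_pages and section_size must be positive")
    return [
        (first, min(first + section_size - 1, total_pages))
        for first in range(1, total_pages + 1, section_size)
    ]


if __name__ == "__main__":
    for first, last in split_page_ranges(1150):
        print(f"pages {first}-{last}")
```

In practice a split at a chapter or section boundary near each 200-page mark is likely preferable to a fixed cut, since the conversion infers structure from layout and context.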
- Q: What platforms can Exegenix technology
run on?
-
A: Exegenix technology currently runs on
Win32 and Linux®.
- Q: How is Exegenix technology
deployed?
-
A: The modular structure of Exegenix
technology means that we can offer a flexible range of deployment options to
address each individual organization's XML conversion needs. Depending on your
requirements for process control, data security, processing power and support,
our technology can be employed in a variety of ways: choose from onsite
installation of the heavy-duty Exegenix Conversion System, a deployment of the
nimbler Exegenix Conversion Satellite, or the fully turnkey Exegenix Conversion
Service. All three options use our technology to provide accurate, fast,
cost-effective XML output.