
Exegenix KeepMedia Case Study

Overview

KeepMedia is a premium content service.

KeepMedia has been named to the 2004 EContent 100, a list of the 100 companies that matter most in the content industry.

KeepMedia's PDF source material is taken directly from its publishing partners' original printed materials and can be complex and design-heavy. The converted output must be high-quality, richly structured XML for KeepMedia to deliver the services its subscribers require.

KeepMedia selected the Exegenix Conversion Service based on a winning combination of quality and price, and is very happy with the results.

“Exegenix has been flexible and responsive in working with KeepMedia, and consistently delivers high quality output,” says Dan Climan, Director of Content Management, KeepMedia.

Organization

KeepMedia is a premium content service, delivering current and archived articles from hundreds of publications in one convenient location.

For publishers, KeepMedia provides a marketplace to sell magazine content and offline print subscriptions, as well as a technology platform they can use to augment or replace their current sites.

KeepMedia's patent-pending personalization and search technology ensures that consumers will continually be introduced to articles that match their interests. These same personalization features allow publishers to reach new readers and potential subscribers.

The technology essential to these services is XML. Articles from a variety of publications must therefore be converted into XML from their original PDF format.

Visit www.keepmedia.com for unlimited access to hundreds of publications in one convenient location with a free trial.

Type of material

The conversion of magazine content can be a complex undertaking. Printed articles are generally laid out in innovative ways to make them more visually appealing to the reader, and to draw the reader further into the magazine. XML, on the other hand, must be a structured representation of this information.

The more challenging characteristics of magazine material include:

Complex text flows

Document Complexity in XML conversion

During the conversion process, Exegenix software uses its integral knowledge of typographic principles to identify constructs such as sections, paragraphs, quotes, lists, tables, footnotes, etc., and applies a variety of techniques across the entire document to form a complete, cohesive, internal representation of its structure.

In heavily designed material, text flow cannot necessarily be described by a single path - sidebars or pull-quotes can be offset from the main text flow; a page can contain parts of two or more logical text flows; text can flow around and even over images. Difficulties in correctly identifying and understanding the text flow in the original document can result in XML output that contains out-of-order text or elements, requiring post-conversion processing.

Graphically rich layout

In heavily designed material, text and graphical objects are painted on the same areas of the page - for example, a complex graphical page background with body text overlaid; an “underlaid” capital where a letter is drawn in grey underneath text; an image with both captions and body text incorporated. Difficulties in distinguishing between text and graphics in the original document can result in XML output that contains text captured as a graphic, or textual parts of a graphic captured separately, requiring post-conversion processing.

Scripted conversion tools cannot currently analyze such content and extract it into a meaningful order in the converted XML file, and tagging it manually is time-consuming and arduous.

For more information on the effect of design on the conversion process, see “Exegenix: Document Complexity in XML conversion”.

Challenge

KeepMedia needed a way to make sense of this sometimes chaotic source material, so that it could publish visually appealing articles online from files structured internally for search and personalization.

Complex text flows

In order to display the content properly online, distinct text flows and their component sub-objects must be precisely identified and segregated.

One of the challenging types of article to be converted, from a do-it-yourself home improvement periodical, consists of “how-to” information - concise step-by-step instructions, alongside a general description of the process.

For each “step”, the source material includes an image (generally with callout text overlaid), the step number, and instructional text. The text flow that runs in parallel with these instructions consists of multiple pages - including not only normal structural conventions such as subsections, etc., but also its own internal “sidebar” with specific accompanying structures, as shown below:

KeepMedia page image with callout text

The online format requires the multi-page complementary text to be the first part of the XML file, its related sidebars placed properly between subsections, and the “steps” content in the correct order becoming the final part of a given article.
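The ordering requirement above can be illustrated with a minimal Python sketch. It assumes each piece of content has already been identified and labeled; the `Block` class, its field names, and the labels "body", "sidebar", and "step" are hypothetical and are not taken from the Exegenix software or the KeepMedia DTD.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str         # hypothetical labels: "body", "sidebar", or "step"
    text: str
    section: int = 0  # subsection that a body or sidebar block belongs to
    step: int = 0     # step number, used only by "step" blocks


def order_for_online(blocks):
    """Complementary text first, each sidebar after its subsection, steps last."""
    narrative = [b for b in blocks if b.kind in ("body", "sidebar")]
    # Sort by subsection; within a subsection, body text precedes its sidebar.
    narrative.sort(key=lambda b: (b.section, b.kind == "sidebar"))
    steps = sorted((b for b in blocks if b.kind == "step"), key=lambda b: b.step)
    return narrative + steps


# Blocks in the jumbled order a naive page-by-page extraction might produce.
blocks = [
    Block("step", "Cut the board", step=2),
    Block("body", "Intro", section=0),
    Block("sidebar", "Safety tips", section=0),
    Block("step", "Measure", step=1),
    Block("body", "Finishing", section=1),
]
print([b.text for b in order_for_online(blocks)])
# ['Intro', 'Safety tips', 'Finishing', 'Measure', 'Cut the board']
```

The sketch only covers the final arrangement. Deciding which flow each block belongs to is the hard part of the conversion and is not modeled here.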

KeepMedia had noticed that other conversion vendors tended to mix up the contents of sidebar blocks with the main body of the text, so the article no longer made sense.

Complex graphics

KeepMedia's source material is graphically rich, with text overlaid on many of the images, and even individual images overlaid on one another. The rules for converting this text into XML depend on the context and purpose of each image:

  • Sometimes the image is included just for design purposes, a “decoration”. In these cases, the overlaid text is actually part of the main body, and must be accurately retained in the text flow. The image, as ornamentation for the printed document, is not suited to display online, and must be discarded.
  • Sometimes, the text is an image caption that must be properly associated with the specific image.
  • Sometimes, the text refers to a specific object on the image, and must be positioned exactly where it is in order to communicate the correct information.

In this image, for example, several types of text are laid directly on the bitmap:

KeepMedia text overlay example image
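The three rules listed above amount to a small decision table, sketched here in Python. The enum names, the `handle_overlay` function, and the returned strings are hypothetical, and the final fallback case (body text over an image that is kept) is an assumption rather than a rule stated in this case study.

```python
from enum import Enum


class ImageRole(Enum):
    DECORATION = "decoration"  # design-only image, not suited to online display
    CONTENT = "content"        # image that is part of the article


class TextRole(Enum):
    BODY = "body"        # main article text that happens to overlay an image
    CAPTION = "caption"  # describes the image as a whole
    CALLOUT = "callout"  # points at a specific object in the image


def handle_overlay(image_role, text_role):
    """Return (keep_image, where_the_text_goes) for text overlaid on an image."""
    if image_role is ImageRole.DECORATION:
        # Rule 1: discard the ornament, keep the text in the text flow.
        return (False, "main text flow")
    if text_role is TextRole.CAPTION:
        # Rule 2: associate the caption with its image.
        return (True, "caption attached to image")
    if text_role is TextRole.CALLOUT:
        # Rule 3: the text must stay exactly where it is on the image.
        return (True, "left in place on image")
    # Assumed fallback: body text over a content image rejoins the text flow.
    return (True, "main text flow")


print(handle_overlay(ImageRole.DECORATION, TextRole.BODY))
print(handle_overlay(ImageRole.CONTENT, TextRole.CALLOUT))
```

The table is the simple part. Deciding which role an image or a piece of text plays depends on context, which is why the case study describes human review for ambiguous cases.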

So the source material is challenging, and the output requirements stringent. Says Dan Climan: “Working with Exegenix has allowed KeepMedia to substantially accelerate the process of converting content with this level of complexity.”

Solution


Exegenix Conversion Service

The combination of Exegenix technology and the expertise of our service team makes the Exegenix Conversion Service fast, accurate, cost-effective, and scalable. For more information on this solution, see the Exegenix Conversion Service Datasheet.

In May 2004, following a trial conversion of representative material, KeepMedia selected the fully turnkey Exegenix Conversion Service. Like many other organizations that focus on their core capabilities rather than the “means-to-an-end” conversion process, they were pleased with the idea - and the results - of combining innovative conversion technology with a skilled services team, to deliver ready-to-load XML.

Exegenix technology

Exegenix technology is designed to minimize the costs of XML content conversion via a groundbreaking approach that converts into XML any file that can be printed to PostScript or PDF.

Exegenix's revolutionary approach uses visual cues to uncover a document's structure, much the same way that humans do. People rarely have problems determining the hierarchical structure of any document they encounter, because they look at a document as a whole, taking into consideration each graphical object's format, position, and context. Exegenix technology does the same thing - it interprets a document's logical structure based on the appearance and position of its components.

For example, it's easy to see which of these pages contains a sidebar, and which is standard three-column text:

three-column document layout icons

Unlike Exegenix technology, other automated solutions will not recognize and isolate parallel text flows.

An additional issue is that text overlaid on large images becomes illegible when these are reduced in size for display online. Exegenix provides automated recognition of images that would exhibit this problem, and provides KeepMedia with special markup and both “thumbnail” and full-size versions of each image, allowing KeepMedia to generate a “click for large version” link for each such image automatically.
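As a rough illustration of how special markup could drive an automatic “click for large version” link, the Python sketch below reads one image element and emits HTML. The `<image>` element and its `thumb`, `full`, and `enlarge` attributes are invented for this example; the actual markup delivered to KeepMedia is not described in this case study.

```python
import xml.etree.ElementTree as ET
from html import escape


def image_html(image_xml: str) -> str:
    """Render one hypothetical <image> element as HTML.

    Images flagged enlarge="yes" are shown as a thumbnail that links to
    the full-size version; all others are shown as a plain image.
    """
    img = ET.fromstring(image_xml)
    thumb = escape(img.get("thumb", ""), quote=True)
    full = escape(img.get("full", ""), quote=True)
    tag = f'<img src="{thumb}">'
    if img.get("enlarge") == "yes":
        return f'<a href="{full}" title="Click for large version">{tag}</a>'
    return tag


print(image_html('<image thumb="step3_small.jpg" full="step3.jpg" enlarge="yes"/>'))
# <a href="step3.jpg" title="Click for large version"><img src="step3_small.jpg"></a>
```

Because the flag travels with the image in the XML, the publishing side needs no image analysis of its own; it only has to honor the attribute.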

Exegenix services

The complex nature of KeepMedia's source material means that humans are inevitably involved in the review process, and the integral ECS Inspector provides intuitive tools to streamline this involvement.

Once the automated conversion process is complete, our services team examines structures and text identified by the conversion engine, adding value to the output and resolving any outstanding conversion ambiguities. They can also separate and extract text layers from graphical objects, to quickly and easily:

  • Differentiate between images that are “decoration”, and those that are part of the main document content
  • Identify text that overlays an image as “part of the image” or “part of the main body text.”

Finally, the files are automatically post-processed to the KeepMedia DTD, with metadata tagged so that the output XML will seamlessly integrate into KeepMedia's system.

ECS Inspector Screenshot
Before any XML is generated, our service team confirms via the ECS Inspector that all graphics and text are separated as required, and all parallel text flows are correctly identified.

Result / Benefits

All of KeepMedia's PDF source files are processed automatically, checked by hand, enhanced where required, and returned as ready-to-load XML.

Human labor is generally the most costly part of any process. But because Exegenix conversion technology automates so much of the heavy lifting, our services team is able to spend time on issues that only human intervention can resolve, without raising the overall cost of conversion.

Agrees Dan Climan, Director of Content Management, KeepMedia: “Exegenix has exceeded our expectations. Their automated conversion technology gives us consistent output, and their services team's attention to detail ensures we can provide high quality articles to our paying customers without delay.”

KeepMedia gets great results at an outstanding price.