
Exegenix KeepMedia Case Study

Overview

KeepMedia is a premium content service.
Their PDF source material is taken directly from their publishing partners' original printed materials and can be complex and design-heavy. The converted output must be high-quality, richly structured XML so that KeepMedia can deliver the services its subscribers require. KeepMedia selected the Exegenix Conversion Service for its winning combination of quality and price, and is very happy with the results. "Exegenix has been flexible and responsive in working with KeepMedia, and consistently delivers high-quality output," says Dan Climan, Director of Content Management, KeepMedia.

Organization
For publishers, KeepMedia provides a marketplace to sell magazine content and offline print subscriptions, as well as a technology platform they can use to augment or replace their current sites. KeepMedia's patent-pending personalization and search technology ensures that consumers are continually introduced to articles that match their interests. The same personalization features allow publishers to reach new readers and potential subscribers. XML is essential to these services, so articles from a variety of publications must be converted into XML from their original PDF format.
Type of material

The conversion of magazine content can be a complex undertaking. Printed articles are generally laid out in innovative ways to make them more visually appealing and to draw the reader further into the magazine. XML, on the other hand, must be a structured representation of this information. The more challenging characteristics of magazine material include:

Complex text flows
In heavily designed material, text flow cannot always be described by a single path. Sidebars or pull-quotes can be offset from the main text flow, a page can contain parts of two or more logical text flows, and text can flow around and even over images. Difficulty in correctly identifying and understanding the text flow in the original document can result in XML output that contains out-of-order text or elements, requiring post-conversion processing.

Graphically rich layout

In heavily designed material, text and graphical objects are painted on the same areas of the page. Examples include a complex graphical page background with body text overlaid, an underlaid capital (a letter drawn in grey underneath the text), and an image that incorporates both captions and body text. Difficulty in distinguishing between text and graphics in the original document can result in XML output that contains text captured as a graphic, or textual parts of a graphic captured separately, again requiring post-conversion processing.

Scripted conversion tools cannot currently analyze such content and extract it into a meaningful order in the converted XML file, and tagging it manually can be time-consuming and arduous. For more information on the effect of design on the conversion process, see Exegenix: Document Complexity in XML Conversion.

Challenge

KeepMedia needed a way to make sense of this sometimes chaotic source material. Its goal was to publish visually appealing articles online from files structured internally for search and personalization.

Complex text flows

To display the content properly online, distinct text flows and their component sub-objects must be precisely identified and separated. One challenging type of article comes from a do-it-yourself home improvement periodical and consists of how-to information: concise step-by-step instructions alongside a general description of the process.
For each step, the source material includes an image (generally with callout text overlaid), the step number, and instructional text. The text flow that runs in parallel with these instructions spans multiple pages. It includes normal structural conventions such as subsections, and also its own internal sidebar with specific accompanying structures, as shown below.

The online format imposes a specific order on the XML file. The multi-page complementary text must come first, with its related sidebars placed correctly between subsections. The step content, in the correct order, must form the final part of the article. KeepMedia had noticed that other conversion vendors tended to mix the contents of sidebar blocks into the main body text, so that the article no longer made sense.

Complex graphics

KeepMedia's source material is graphically rich, with text overlaid on many of the images, and even individual images overlaid on one another. The rules for converting this text into XML depend on the context and purpose of each image:
In this image, for example, several types of text are laid directly on the bitmap.

The source material is challenging, then, and the output requirements are stringent. As Dan Climan says: "Working with Exegenix has allowed KeepMedia to substantially accelerate the process of converting content with this level of complexity."

Solution
In May 2004, following a trial conversion of representative material, KeepMedia selected the fully turnkey Exegenix Conversion Service. Many organizations prefer to focus on their core capabilities rather than on conversion, which is only a means to an end. Like them, KeepMedia was pleased with the idea, and the results, of combining innovative conversion technology with a skilled services team to deliver ready-to-load XML.

Exegenix technology

Exegenix technology is designed to minimize the cost of XML content conversion. Its groundbreaking approach converts into XML any file that can be printed to PostScript or PDF, using visual cues to uncover a document's structure much as humans do. People rarely have trouble determining the hierarchical structure of a document, because they look at it as a whole and take into account each graphical object's format, position, and context. Exegenix technology does the same thing: it interprets a document's logical structure based on the appearance and position of its components. For example, it is easy to see which of these pages contains a sidebar and which is standard three-column text.

Unlike Exegenix technology, other automated solutions will not recognize and isolate parallel text flows.

An additional issue is that text overlaid on large images becomes illegible when the images are reduced in size for online display. Exegenix automatically recognizes images that would exhibit this problem. For each one, it provides KeepMedia with special markup and both a thumbnail and a full-size version, so that KeepMedia can generate a "click for large version" link automatically.

Exegenix services

The complex nature of KeepMedia's source material means that humans are inevitably involved in the review process, and the integrated ECS Inspector provides intuitive tools to streamline this involvement.
Once the automated conversion process is complete, our services team examines the structures and text identified by the conversion engine, adding value to the output and resolving any outstanding conversion ambiguities. They can also separate and extract text layers from graphical objects, to quickly and easily:
Finally, the files are automatically post-processed to the KeepMedia DTD, with metadata tagged so that the output XML will seamlessly integrate into KeepMedia's system.
Result / Benefits

All of KeepMedia's PDF source files are processed automatically, checked by hand, enhanced where required, and returned as ready-to-load XML. Human effort is generally the most costly part of any process. Because Exegenix conversion technology automates so much of the heavy lifting, our services team can spend its time on issues that only human intervention can resolve, without raising the overall cost of conversion.

Dan Climan, Director of Content Management, KeepMedia, agrees: "Exegenix has exceeded our expectations. Their automated conversion technology gives us consistent output, and their services team's attention to detail ensures we can provide high-quality articles to our paying customers without delay."

KeepMedia gets great results at an outstanding price.