XML in Analysis and Design

Introduction to XML

The Extensible Markup Language (XML) is a significant development that adds flexibility and richness to the exchange of data on the Net. Specifically, XML and its related standards improve Web data exchange by providing a method for structured data transfer. This structured data transfer allows different systems to exchange data through standard or agreed upon formats. Furthermore, XML supports an object model that allows for the sorting, selecting, and manipulating of data in documents, regardless of whether the data resides on the server or on the client computer.

The purpose of XML is to allow for data exchange across multiple tiers of Web applications. Like HTML, XML derives from SGML (Standard Generalized Markup Language); XML is a simplified subset of SGML, optimized for delivery over the Web. HTML, however, restricts developers to a finite set of tags designed to describe Web pages for presentation, as opposed to exchange of reusable information or data. Unlike HTML, which tags only elements in Web pages for presentation by a browser, XML tags elements and stores them as data. This means that XML can be used to identify text or numeric elements within a Web page, allowing text to be passed into the page as data. This is accomplished by a coding system in which data tags precede and follow the value they describe. For this reason, XML can be defined as a metalanguage, that is, a language used to define new markup languages. With XML, developers can create languages or documents specifically for an application.

The ecommerce applications that have the most to gain from using XML are obviously those that cannot accomplish what they need to do with HTML. The types of applications that can maximize their benefit from XML fall into one of four categories:

1. Applications that require a Web client to utilize data between two or more related databases.

2. Applications that have more processing on the client tier than on the server tier.

3. Applications that require Web pages to be presented differently depending on the user and device accessing the ecommerce system.

4. Applications that allow intelligent Web agents to dynamically tailor the information provided to users depending on the environment or Web client that the information must reside on.

In practice, then, XML is suitable for storing and exchanging data that can be plausibly coded as text (Harold & Means, 2001). On the other hand, it is unsuitable for multimedia data, such as photographs, sound, and video.

XML Structure

As stated earlier, XML can be considered a metalanguage. It is essentially text that can represent an element, tag, or character data. An element defines the existence of a data field, the tag delimits its beginning and ending, and the character data represents the element’s actual data. Sometimes the data is an actual value; other times it is a text string. This depends, of course, on the definition of the element’s attributes. An example of an XML text structure is shown in Figure 8.1.

Because XML can have multiple types of definitions, it is said to contain mixed content. The mixed content can be represented in the form of a tree structure as shown in Figure 8.2.

In Figure 8.2, the contents of the F-Name and L-Name are character data, whereas name and profession are elements that have tags to delimit the definition values.
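Since the figures are not reproduced here, the relationship between elements, tags, and character data can be illustrated with a minimal sketch. The document below is hypothetical: its element names follow the description of Figure 8.2, while the root element `person` and all values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element names follow the description of Figure 8.2.
doc = """
<person>
  <name>
    <F-Name>Ada</F-Name>
    <L-Name>Lovelace</L-Name>
  </name>
  <profession>Mathematician</profession>
</person>
"""

root = ET.fromstring(doc)

def walk(element, depth=0):
    """Return (depth, tag, character data) for every element in the tree."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

tree_rows = walk(root)
for depth, tag, text in tree_rows:
    # Indentation mirrors the tree structure of the document.
    print("  " * depth + tag + (": " + text if text else ""))
```

Elements such as `name` contain only other elements, while `F-Name` and `L-Name` contain character data; the indented printout shows the same hierarchy that a tree diagram would.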

XML Parsing

From a process perspective, there are a number of related components or tools that provide XML with the ability to connect to various applications. The first component is called the XML Parser. In many ways a parser is similar to a compiler in that it is charged with the responsibility to translate source data into an object or target machine form. The XML Parser is responsible for editing an XML document and breaking it down to its elements for transfer to various application programs. These programs could be Web browsers, word processing editors, database systems for storage of the source, spreadsheet, Java or C++ programs, or miscellaneous third-party products. The transferring process is known as “parsing out” the data. From an edit perspective, the XML Parser is responsible for validating the input data to ensure that there are no errors. This is accomplished by interfacing with a document type definition (DTD) file. The DTD contains the master list of data rules and document constraints. The data rules include the list of fields that can be used in an XML document. If the DTD does not contain a definition, an error will be generated. Error messages can have varying levels of significance, similar to the way compilers issue warning and fatal errors when editing source code.

Once an XML document has completed the parser, it can be generated to a number of different applications as shown in Figure 8.3.
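The edit role of a parser can be sketched with the parser in Python's standard library. One caveat is important: this parser checks only that a document is well formed (for example, that every tag is closed); it does not validate a document against a DTD, which would require a validating parser. The sample documents and the function name `check_well_formed` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

good = "<order><item>Book</item></order>"
bad = "<order><item>Book</order>"   # <item> is never closed

ok_good, _ = check_well_formed(good)
ok_bad, message = check_well_formed(bad)
print(ok_good)
print(ok_bad, message)
```

The error raised for the second document plays the role of a fatal error: the document is rejected before any application receives its data.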

What XML Is Not

XML is just a markup language, meaning that it is limited to the manipulation and presentation of data as text. Therefore, XML is not truly a programming language per se. Furthermore, XML is not a network protocol and therefore cannot be used to send data across a network without the assistance of other network protocols like HTTP and FTP (File Transfer Protocol). Finally, it is important to recognize that XML is not a database and cannot replace the functionalities of databases. XML can be stored within a database and provides a reuse function by providing an XML format to many different types of applications on many hardware platforms. The format is stored as data only.

Other XML Interfaces

There are other application interfaces that are used when the target is a Web browser. The functionality of these applications is to provide better formatting features that allow XML to better integrate with a Web browser screen or what is called a “stylesheet.” The Extensible Stylesheet Language (XSL) is an XML application that transforms XML documents into Web browser form. This application has been decomposed into two more specific parts: XSL Transformations (XSLT) and XSL Formatting Objects (XSL-FO). XSLT is general-purpose in that it transforms one XML document into another document, such as HTML that can be displayed in a Web browser, whereas XSL-FO describes the layout of the text data. Another enhancement is the Extensible Linking Language (XLL), a more powerful linking construct with greater capabilities than HTML’s A tag. XLL comprises two separate standards: XLink, which describes the connections between XML documents, and XPointer, which addresses the individual components within an XML document.

Another application that can be used to format Web browser pages is Cascading Style Sheets (CSS). CSS is usually utilized as a lower-level formatter for HTML. CSS does allow styles to be reused within documents without the need to redefine them. In addition, CSS allows for the definition to be changed anywhere in the document file.

XML and Ecommerce Applications

Ecommerce systems are all about the interfacing of systems across the Web to conduct business transactions. A number of years ago, a communication standard called Electronic Data Interchange (EDI) was developed. EDI is essentially a standard file format that allows companies to provide a consistent way for vendors to interface with their systems. For example, the publications industry has a standard file format for vendors who submit invoices to them. The EDI standard became part of most internal systems in the publications industry because it saved costly custom modifications for each system for both the company and its vendors. XML provides a great deal more functionality than EDI; however, it works on the same principle of allowing standard interfacing and exchange of data. What makes XML so exciting is that the data, unlike in EDI, can also be used as text to populate applications, as well as passing over the Web so that businesses can exchange information using standard Web protocols such as HTTP. Thus, an EDI-like data interchange can be accomplished by combining data requirements and applications; the Web page combines application interface and data interface in one package. By using XSL capabilities, an XML file can be translated into HTML and be propagated into an operational Web application in a Web browser as shown in Figure 8.4.
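The translation of an XML file into HTML can be sketched as follows. Python's standard library does not include an XSLT processor, so the function below is a hand-written stand-in for what an XSLT stylesheet would express declaratively. The invoice document, its element names, and the function `invoice_to_html` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical vendor invoice, of the kind an EDI file would carry.
invoice_xml = """
<invoice number="1001">
  <vendor>Acme Paper</vendor>
  <line><description>Newsprint</description><amount>250.00</amount></line>
  <line><description>Ink</description><amount>75.50</amount></line>
</invoice>
"""

def invoice_to_html(xml_text):
    """Build an HTML table from the invoice, as a stylesheet would."""
    invoice = ET.fromstring(xml_text)
    table = ET.Element("table")
    caption = ET.SubElement(table, "caption")
    caption.text = "Invoice {} from {}".format(
        invoice.get("number"), invoice.findtext("vendor"))
    for line in invoice.findall("line"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = line.findtext("description")
        ET.SubElement(row, "td").text = line.findtext("amount")
    return ET.tostring(table, encoding="unicode", method="html")

html = invoice_to_html(invoice_xml)
print(html)
```

The same XML data could feed a different function, or a different stylesheet, to produce another presentation without any change to the data itself.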

Document Object Model

The Document Object Model (DOM) represents the most important component application for the ecommerce analysis and design function. Indeed, ecommerce analysts must decide which portions of the system will utilize XML applications, and more important, how these XML documents should be defined. The DOM is a powerful tool that allows XML documents (as well as other documents) to be manipulated as objects. This means that the DOM can access an XML document, reformat it, and preview it visually as an object tree-structure. Given that XML is structured hierarchically, any XML document can be displayed in a tree structure as depicted in Figure 8.5.

Figure 8.5 reflects a recipe as a tree of document tags. This structure makes it easier to understand—and to update—any XML design. The tree structure is also consistent with a class diagram structure, so the model is well suited for interface with object-oriented systems. Thus, the DOM allows developers to make modifications to the structure, which will then be automatically translated back to the XML document as shown in Figure 8.6.

Furthermore, the DOM can be used to create XML documents from scratch, so it is truly full-duplex in its relationship with XML documents. Thus, with the DOM, XML documents can be manipulated as objects, instead of only as streams of text. Having a visual representation of an XML document makes it easier to edit the information. Furthermore, the DOM is compatible with the Interface Definition Language used in CORBA (Common Object Request Broker Architecture), often used to implement object middleware applications.
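A minimal sketch of this two-way relationship, using the DOM implementation in Python's standard library (`xml.dom.minidom`), is shown below. The recipe element names and values are assumptions chosen to echo the recipe tree described for Figure 8.5.

```python
from xml.dom.minidom import Document, parseString

# Create a document from scratch (element names are illustrative).
dom = Document()
recipe = dom.createElement("recipe")
dom.appendChild(recipe)

title = dom.createElement("title")
title.appendChild(dom.createTextNode("Pancakes"))
recipe.appendChild(title)

for name in ("flour", "milk"):
    ingredient = dom.createElement("ingredient")
    ingredient.appendChild(dom.createTextNode(name))
    recipe.appendChild(ingredient)

# Manipulate the tree as objects: add a node, then change an existing one.
egg = dom.createElement("ingredient")
egg.appendChild(dom.createTextNode("egg"))
recipe.appendChild(egg)
title.firstChild.data = "Buttermilk Pancakes"

# Serialize the modified tree back to XML text, then re-parse it.
xml_text = recipe.toxml()
reparsed = parseString(xml_text)
ingredients = [n.firstChild.data
               for n in reparsed.getElementsByTagName("ingredient")]
print(xml_text)
print(ingredients)
```

Every change is made on objects in the tree, and the XML text is generated only at the end, which is the sense in which the structure is translated back into the XML document.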

The DOM also plays a pivotal role in the interface of XML with relational databases. This is accomplished by thinking of the relational database as a set of DOM objects. If an ecommerce analyst wants to design a document that represents a particular schema of data, it can be described by the DOM and then translated into an XML document. The XML document can then be formatted using XSL into an HTML Web application. SQL can be used and mapped via XML to the backend relationally modeled data. Thus, database schemas can be brought together in a rather robust way using the DOM as shown in Figure 8.7.

XML as a Common Data Format

In Chapter 9, I discuss the different methods of linking legacy applications with ecommerce systems. Two methods, “Leave As Is” and “Enhance,” both required some form of data linkage. The format alternatives were parameter passing and database. While these options had value, XML provides yet another interesting alternative, especially in cases where the legacy link does not have a database interface. XML, as previously stated, provides linkages without concern for the hardware platform. While relational databases provide for application and platform independence, there is huge overhead associated with using a database as a method of intercommunication among application programs.

With the rapid proliferation of ecommerce interfaces, free XML tools can represent another method of providing platform-independent and database-independent program communications. The unique strengths of using XML as a software data communication method include:

• Simple Syntax: XML is easy to generate using the DOM, and easy to parse using XSL.

• Support for Nesting: the tree format allows for programs to represent data structures with nested elements, which many program formats require.

• Easy to Debug: XML is easy to format, especially with the DOM.

• Language and Platform Independent: an XML data file is completely transportable across different architectures and database products.

Thus, XML is becoming a popular format for enterprise data sharing, especially when there are mixed platform environments. For example, suppose a typical organization needs to take information from an IBM mainframe and display it on a Web site. Using XML, the mainframe schema of data can be accessed by the DOM, which in turn would format the output as an XML document. Using XSL, the Web server could then transform the XML into HTML, which could then be loaded into a Web browser template (Figure 8.8). It is important to note that I am not suggesting that XML replace relational database technologies. Indeed, XML is too slow to handle high-volume transactions. However, XML can be used to work with database subsets, called sub-schemas, in which a small picture of the data needed can be downloaded and formatted in a class structure for use by an ecommerce object application.

XML Applications with Database Systems

A significant issue addressed by XML is the ability to create consistent representation of data. While normalized databases were supposed to provide this consistency, most do not because they violate certain normal form rules. These violations ultimately create problems with the transportability of the data. Furthermore, stored procedures tend to be proprietary at the vendor database level, and therefore further complicate the task of providing seamless portability among databases. XML, on the other hand, creates a portable structure that allows ecommerce systems to connect multiple databases across different hardware platforms. Indeed, in some situations an XML-based representation of data carried over HTTP might be the only method of connecting legacy systems with ecommerce technology. Figure 8.9 illustrates a sample ecommerce system that uses XML to provide application integration. This integration is provided via the linking of various databases using an XML interface.

Figure 8.9 reflects that XML, over the Web, provides the data integration between the Oracle and SQL Server systems. If XML were not used, the application developer would need to use separate SQL-based languages (Transact-SQL for SQL Server and PL/SQL for Oracle) to provide data access to each program. XML provides a means of using a central repository that can generate the SQL necessary to access its data. This means that each database vendor supplies an SQL-XML interface that can generate (output) and access (read) XML formats. Thus, just knowing and using XML allows for access to multiple proprietary SQL dialects.

The key to integrating XML and SQL is making the information in the query results easy to transform, transport, and transcribe. Figure 8.10 shows the architecture of integrating XML and SQL to produce multiple applications. Note that XSLT is used to transform the parsed XML data pages to multiple applications such as Web pages, Wireless Markup Language (WML) for cell phones, and handheld devices.
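To make the idea of transforming query results concrete, the following minimal sketch turns the rows returned by an SQL query into an XML document. It uses SQLite from Python's standard library as a stand-in for a production database; the table, its columns, and the function `query_to_xml` are assumptions made for illustration and do not represent any vendor's SQL-XML interface.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite database with a hypothetical product table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT, price REAL)")
conn.executemany("INSERT INTO product (title, price) VALUES (?, ?)",
                 [("XML Basics", 29.95), ("SQL Primer", 34.5)])

def query_to_xml(connection, sql, root_tag, row_tag):
    """Run a query and wrap each row, column by column, in XML elements."""
    cursor = connection.execute(sql)
    columns = [col[0] for col in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_el = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return root

products = query_to_xml(
    conn, "SELECT id, title, price FROM product ORDER BY id",
    "products", "product")
products_xml = ET.tostring(products, encoding="unicode")
print(products_xml)
```

Once the results are in XML, they can be handed to a transformation step to produce Web pages, WML, or any other target format.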

Therefore, publishing XML documents from relational databases allows for proven portability, scalability, manageability, and performance.

Using the Oracle product as an example, we can see a more detailed view of how a relational database and XML are integrated to provide portability between XML and SQL technologies. Figure 8.11 shows how Oracle has implemented XML in its product.

Using the Oracle XML parser, XML documents can be manipulated and modified with the DTD, which Oracle calls the information set or Infoset. The Oracle XSLT processor transforms XML into HTML or other structures. While these components are somewhat standard XML applications, Oracle adds an XPath engine that enables the querying of XML documents. The Oracle XML SQL utility then automates the tasks of producing XML from SQL query results and vice versa (that is, storing XML documents into tables in the database). The Oracle Text application provides a method of creating indexes of the XML structure to support SQL queries. Figure 8.12 provides a summary of the Oracle XML infrastructure.
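The kind of query that an XPath engine makes possible can be sketched without any Oracle software. The example below uses the limited XPath subset supported by Python's standard library; it is not Oracle's engine, and the catalog document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product catalog.
catalog = ET.fromstring("""
<catalog>
  <product type="book"><title>XML Basics</title></product>
  <product type="dvd"><title>Parsing Live</title></product>
  <product type="book"><title>SQL Primer</title></product>
</catalog>
""")

# Select the titles of all products whose type attribute is "book".
book_titles = [t.text
               for t in catalog.findall("./product[@type='book']/title")]
print(book_titles)
```

The path expression plays a role similar to a WHERE clause in SQL: it selects nodes by their position in the tree and by the values of their attributes.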

Analysis and Design of XML Documents

The ecommerce analyst must provide XML specifications as part of the overall engineering architecture. To do so, the analyst must determine the XML components that will be used for data interface and what portions will be used for ecommerce application reuse. Therefore, the first step in designing an XML interface is to decide what will constitute an XML document. The second step is to determine what elements, text, or code will comprise the XML document. The third step is to determine the reuse and propagation of the XML document data into the various ecommerce applications.

Step 1: Determining XML Documents

The process of determining what will be an XML document relates to two factors mentioned above: (1) what data will be used as an EDI application and (2) what will be used to populate Web applications. Regardless of which factor is used, an ecommerce analyst must determine the actual data structure for the XML tree. This data structure can be defined as a vocabulary of XML elements and attributes. The elements and attributes are the text in the vocabulary that enables communication of information.

An example of choosing the right reuse is the function provided by a search engine. A search engine is typically used for general search requirements to locate specific information in the database. An XML document can be created that identifies the database elements that are required for the search application. This is then coded in the DOM, and output as an XML document. The XML design allows a user to search for a particular type of product, similar to the way Amazon.com allows a user to search for a particular product offering like a book or a DVD (see Figure 8.13). Thus, the root of the XML document tree is the “Type Identifier” of the product. The DOM would then structure the XML tree as shown in Figure 8.14.

Thus, the “Type Identifier” element identifies a type of product that is needed to display the results of the search. The data can then be transformed into various HTML Web browser applications. The data interface can be accomplished via an SQL interface engine. So when a user wants to query the product database, he or she will use the search engine XML document. This document interfaces with the Oracle Text application, which creates indexes on the related database elements that were defined in the XML document. When the query is created by the application driving the ecommerce activity, the stylesheet created from the XML document will issue an SQL call to the XML SQL interface (Oracle), which will retrieve the necessary data and convert it back to the XML format as shown in Figure 8.15.

XML as an EDI

Earlier in this chapter it was mentioned that XML is an effective application to replace the EDI traditionally used to handle standard file interfaces between clients and vendors. EDI in legacy applications was typically implemented by designing a standard file format that would be required for any entity to interface with an ecommerce system. This file format was usually delivered in a comma-delimited record (CSV file) that matched with a standard database format required by the accepting system. A special application was then programmed that was designed to read the standard CSV file and convert it into the target database format. The program operated in a batch mode and included an edit program that reported errors as appropriate and determined the validity of the input transactions.

XML can be used to create a much more robust and portable EDI. The data structure required would be converted as an object-tree structure using the DOM (or coded directly as an XML document). The DOM would then generate the XML document, which would be parsed against the DTD. An XSLT application would then create a Web browser program that would accept the required input data.
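The move from a comma-delimited EDI file to an XML document can be sketched as a small conversion routine with an edit step. The field names, the sample records, and the function `csv_to_xml` are hypothetical; the check for required fields stands in for the fuller validation a DTD or schema would provide.

```python
import csv
import io
import xml.etree.ElementTree as ET

REQUIRED = ("vendor_id", "invoice_no", "amount")   # hypothetical field names

# Sample legacy file; the second record is missing its invoice number.
csv_text = "vendor_id,invoice_no,amount\nV001,INV-1,120.50\nV002,,75.00\n"

def csv_to_xml(text):
    """Convert CSV records to XML; collect records that fail the edit check."""
    root = ET.Element("invoices")
    errors = []
    reader = csv.DictReader(io.StringIO(text))
    # The header is line 1 of the file, so data records start at line 2.
    for line_no, record in enumerate(reader, start=2):
        missing = [f for f in REQUIRED if not record.get(f)]
        if missing:
            errors.append((line_no, missing))
            continue
        invoice = ET.SubElement(root, "invoice")
        for field in REQUIRED:
            ET.SubElement(invoice, field).text = record[field]
    return root, errors

invoices, errors = csv_to_xml(csv_text)
print(ET.tostring(invoices, encoding="unicode"))
print(errors)
```

Valid records become self-describing XML elements, while rejected records are reported with their line number, much as the batch edit program reported errors in the legacy approach.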

The XML solution could also be adapted to accept a file input, instead of an input screen over the Web. In this situation, the vendor would be provided with the XML schema and asked to create their data using the same XML document. Should an EDI standard exist in a particular industry group, then all parties would incorporate the XML format into their respective systems. Of course this requires all parties to agree about what that format should be. Figure 8.16 depicts the EDI XML configuration interface.

In a nutshell, the ecommerce analyst must know whether an industry standard exists, or whether the organization must create a standard for its own use and hope others will adhere to it. Typically the latter is the case, and the EDI standard format will evolve over time.

XML and Populating Web Applications

While XML can be used as a method of data interchange among applications, it is also useful to provide data that is part of a Web application. Thus, the ecommerce analyst needs to provide information about what portions of a particular application can receive data that can be used to populate its content. For example, suppose there is a particular phrase or logo that is used on multiple screens in an application. The ecommerce analyst needs to determine if this phrase or logo should be stored in a database and then formatted as an XML document. This process greatly reduces duplication of static coding on a Web page. More important, the concept of storing literals as data allows for better maintenance of text that can change over the life of a Web application. Indeed, the content in ecommerce systems is much more dynamic. Thus, XML can provide the standard format to propagate content for Web applications. In Chapter 16, we will see how XML, along with content management products, provides significant maintenance capabilities by storing content as data and incorporating it into template systems.

In addition, XML can be used to provide data that is stored inside program applications. This data could be in the form of an internal array, or multidimensional tables that are used by an application to complete an algorithm, or as a component of some conditional selection logic. Internal data structures are very common, particularly in object component programming. Once again, XML can be used to provide the data structure and allow programmers to use it in many different applications. This allows for the creation of standard data structures that can be dynamically loaded into an application. The capability to dynamically load data structures can have a profound impact on program maintenance. Not only can the data structure be stored, its values associated with the structure can also be included in the XML document. Remember, XML can store both element definitions and values. Suppose, then, that an XML document contained a table that calculated discounts for book purchases. Such a calculation could have many tiers of logic, and could be dependent on how many books have been purchased and where. The XML document could store the table of questions, as well as the questions themselves. Furthermore, the “discount codes” could be stored as values. Thus, when the XML document is propagated into the application program, it not only has the algorithm structure, but also the actual codes that it needs to formulate the final discount amount. Should any of the questions or codes change, which normally will occur, the developer need only update the XML document, changing its data structure or values, as necessary. These changes would then be reloaded into the application upon its next instance (or next execution of the program in the Web application). Figure 8.17 depicts the XML data structure for the discount algorithm.
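A minimal sketch of such a dynamically loaded structure follows. The tier layout, the attribute names, the discount codes, and the percentages are all invented for illustration; they are not taken from Figure 8.17.

```python
import xml.etree.ElementTree as ET

# Hypothetical discount table: structure and values live in the XML document.
discount_xml = """
<discounts>
  <tier min-books="1" code="NONE" percent="0"/>
  <tier min-books="5" code="BULK5" percent="10"/>
  <tier min-books="20" code="BULK20" percent="25"/>
</discounts>
"""

def load_tiers(xml_text):
    """Read the tiers from XML, sorted by the minimum number of books."""
    tiers = []
    for tier in ET.fromstring(xml_text).findall("tier"):
        tiers.append((int(tier.get("min-books")), tier.get("code"),
                      int(tier.get("percent"))))
    return sorted(tiers)

def discount_for(tiers, books):
    """Return (code, percent) of the highest tier the purchase qualifies for."""
    code, percent = None, 0
    for min_books, tier_code, tier_percent in tiers:
        if books >= min_books:
            code, percent = tier_code, tier_percent
    return code, percent

tiers = load_tiers(discount_xml)
print(discount_for(tiers, 7))
```

Changing a threshold or a code means editing the XML document only; the program logic in `discount_for` stays untouched and picks up the new values the next time the document is loaded.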

Embracing XML goes beyond just agreeing to use it; it also means engineering applications that assume a component will change. This greatly affects the architecture of the applications. Thus, the ecommerce analyst must work closely with developers to ensure that specifications clearly establish information on what components will be stored as XML documents. Ecommerce analysts must also be aware of how text (or content) will be stored in the relational database. This topic will be discussed further in Chapter 13. As previously mentioned, the reuse aspect of XML is consistent with object technologies. These aspects are discussed in greater detail in Step 3 of this section.

Step 2: XML Data Schemas

A schema is defined as a way of describing the characteristics of data. The DTD, in many ways, constitutes the basis of the XML schema. Unfortunately, the DTD falls short in relation to some required processing needs. An XML schema must provide three important benefits: (1) functionality and power, (2) ease of use, and (3) compatibility.

Functionality

The DTD alone provides limited semantic checking of XML. This means that there can be errors in the syntax of the code that will not be caught by the DTD. So it is possible to produce an XML document that will be authorized by the DTD, yet have some problems with syntax. Incorrect syntax can affect a statement’s semantics or meaning.

Ease of Use

Unfortunately, good DTDs are difficult to write because of the limited infrastructure support provided by their design. Furthermore, DTDs are not easy to read and edit. The result is that it can be difficult to produce XML that correctly matches the DTD. DTDs also lack extensibility and have poor version control mechanisms. Finally, there are syntactical differences between XML and DTD, which makes it even more difficult to match the two scripts.

Compatibility

The DTD is part of the first version of XML. There are many later versions of XML that have difficulty communicating with earlier versions. Ultimately this means that the relationship between XML documents and DTDs is very version sensitive, and system personnel must ensure that versions are kept consistent. While this requirement sounds straightforward, it is difficult to enforce after the initial installations of ecommerce systems. Indeed, version control becomes progressively more challenging as the system is updated.

The weaknesses of the DTD are summarized below:

• Poor support for semantic checking

• No data typing

• No relational support

• No support for object features such as inheritance

• Cannot use parts of other DTDs

• Difficult to write

• No extensibility

• No version control mechanism

• Unique syntax

(Source: Spencer, 1999)

The schema method is obviously designed to address the DTD weaknesses summarized above. There are a number of product schemas designed to complement the DTD. A document content description (DCD) mechanism has been implemented as a schema in a number of products such as Internet Explorer 5. Internet Explorer 5 implements a schema that contains a more extensible and robust method for providing constraints on the structure and contents of XML documents. The importance of an enhanced DCD is that it can provide more advanced metadata, or data about data. The ecommerce analyst needs to be aware of whether specific DCDs are available, since this information can affect how XML documents are designed. Specifically, the more DCD capability, the more advanced the XML applications can be used in the overall engineering of the ecommerce system. More important is that the DCD has a significant influence on what data elements and structures can be included in the XML document. If the DTD edit abilities are limited, then certain data structures may be deemed too risky (from a quality perspective) for inclusion in an XML document.

No matter how ambitious the XML structure designed, the ecommerce analyst must provide a specification that defines the data elements and specific data types to be included in the XML tree structure. While the data elements should correspond to the definitions stored in the relational database, their syntax can be different from an SQL data type and type qualifier. Figure 8.18 represents the list of data types supported by XML.

Thus, an XML specification must list the actual names and XML definitions. One method of accomplishing this is for the ecommerce analyst to establish the corresponding XML name and attributes when building the data dictionary. Thus, an XML definition should be included for every data element that is a candidate for an XML solution. Another approach is to provide an XML definition for all data dictionary data elements. The advantage of the latter approach is that it prepares all data elements for possible inclusion in an XML document.
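One way to picture a data dictionary that carries XML definitions is sketched below. The column names, the XML element names, and the choice of types are hypothetical, and the type check is a simple stand-in for the data typing that a schema would enforce.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical data dictionary: column -> (XML element name, type checker).
DATA_DICTIONARY = {
    "CUST_ID":   ("customer-id", int),
    "CUST_NAME": ("customer-name", str),
    "SIGNUP_DT": ("signup-date", date.fromisoformat),
}

def check_types(xml_text):
    """Return the XML element names that are missing or fail their type."""
    root = ET.fromstring(xml_text)
    failures = []
    for xml_name, convert in DATA_DICTIONARY.values():
        text = root.findtext(xml_name)
        try:
            if text is None:
                raise ValueError("missing element")
            convert(text)
        except ValueError:
            failures.append(xml_name)
    return failures

# The signup date has an impossible month, so it should be reported.
customer = ("<customer><customer-id>42</customer-id>"
            "<customer-name>Ann</customer-name>"
            "<signup-date>2001-13-01</signup-date></customer>")
failures = check_types(customer)
print(failures)
```

Keeping the XML name and type alongside each data dictionary entry means the same definitions can drive both the database design and the checking of XML documents.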

Step 3: XML Reuse

XML documents can be designed to provide reusable components that can be part of different Web applications. In order to design such components, ecommerce analysts must track the number of operations that are reused in applications, similar to the example of the Search Engine. An example of application reuse can be described with a function that adds a new customer. Let’s say that this function is used when a caller wants to sign up as a customer to order a product. An XML document could be designed that would be comprised of the data elements necessary for inserting a new customer. The XML document would then be used through XSLT to generate an HTML screen.

There is another operation that allows for entering new orders. This application requires the entry of the customer who has placed the order. Sometimes a new order is placed by a caller who has not yet signed up as a customer and is therefore not in the Customer database. Users would like the order entry screen to allow the customer to be added while the new order entry screen is operating. This is accomplished by allowing a pop-up window to be invoked when a new customer needs to be inserted during the order entry process. Instead of writing a new screen, the application can use the same XML document that was used to generate the new customer application. Thus, the XML document serves to provide a reusable application that is generated into more than one application screen.

Analysts should seek to design XML documents that link up with object classes. The purpose of this approach is to provide matching data for classes that are designed by nature to be reusable. This means that the need for reusable XML documents for applications can be mapped to classes designed to become reusable object components. Of course, this approach assumes that the host system is based on object development. In legacy systems, or even those that contain hybrid combinations (a high probability in ecommerce systems), the likelihood of using this approach to identify all reusable components is slim. XML and its relation to object components are shown in Figure 8.19.
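The new-customer example can be sketched as a single XML document rendered into two screens. The field definitions, the CSS class names, and the function `render_form` are hypothetical, and the Python function stands in for the XSLT step described in the text.

```python
import xml.etree.ElementTree as ET

# One hypothetical XML document describing the new-customer fields.
customer_form_xml = """
<form name="new-customer">
  <field name="first_name" label="First name"/>
  <field name="last_name" label="Last name"/>
  <field name="email" label="Email"/>
</form>
"""

def render_form(xml_text, css_class):
    """Render the field definitions as an HTML form for a given screen."""
    spec = ET.fromstring(xml_text)
    form = ET.Element("form", {"class": css_class, "id": spec.get("name")})
    for field in spec.findall("field"):
        label = ET.SubElement(form, "label")
        label.text = field.get("label")
        ET.SubElement(label, "input",
                      {"type": "text", "name": field.get("name")})
    return ET.tostring(form, encoding="unicode", method="html")

# The same document drives the full-page screen and the order-entry pop-up.
full_page = render_form(customer_form_xml, "page")
popup = render_form(customer_form_xml, "popup")
print(full_page)
print(popup)
```

Adding a field to the XML document changes both screens at once, which is the maintenance benefit that reuse is meant to deliver.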

Another area that can assist in determining the need for XML reuse application is in a TP system. You might recall that the TP system was designed as a middleware application repository. The TP monitor component was designed to provide consistent operation and data integrity among many “linked” programs from different systems. Thus, the TP monitor was the center of traffic among different system components and reduced the need to program the same logic in different program languages and for different file systems. XML can become the data vehicle to operate within the TP system. Instead of requiring the TP system to format and update data in different file systems, XML can be used to send a standard format of the data needed to multiple applications across the network. Each system would have the ability to read the data and reformat it as necessary into their respective applications and file schemas. Figure 8.20 reflects the use of XML in a TP system.

Storing XML Documents in a Database

External XML document files can be loaded into a database system so that they can be better integrated with the SQL-based relational system. Notwithstanding the fact that there is a difference in file format, most of the physical data will remain stored in the database. The significant issue to remember is that the relational model is built under third-normal form and referential integrity rules. This, of course, focuses on a production-oriented database management component that contains a high volume of transactional processing as opposed to a data warehouse implementation (see Chapter 15). However, it is possible in a database system such as Oracle to store an XML document in the database without it being part of the traditional normalized schema. This means that the XML document may contain data elements unique to the document itself, thus not propagated into a particular entity. A database system like Oracle provides an XML parsing utility, which allows the XML document to be accessible via normal SQL queries and stored procedures. Thus, the XML document retains its own format while being assimilated into the database infrastructure. Accessing a data element within the XML document or in the relational database can be transparent to the user community. Figure 8.21 reflects the concept of XML and relational database schema integration.

The question arises from a design perspective of whether an XML document should contain data elements that have not originated from the relational part of the database. This certainly is a question that must be answered by the ecommerce analyst. The most logical support for this occurrence is when a data element in an XML document is unique to a particular application. In that case, the XML document has been designed for use in a one-to-one relationship with a specific Web program or application. To propagate the data element into the relational database under these circumstances would create unnecessary overhead and the ongoing need for data synchronization. On the other hand, leaving the data elements in the XML structure eliminates the need for dual storage and allows access from multiple sources, including traditional SQL-based applications.

From a specification perspective, an ecommerce analyst needs to understand how to design the XML-relational model. This task is easier if the analyst knows the physical database features and functions that are designed to support the product’s architecture. For example, Oracle supports the creation of an element known as a CLOB. “CLOB” stands for Character Large Object. In Oracle, a CLOB can hold character-based data like XML documents as large as four gigabytes (4GB). A CLOB column is stored in the database and is fully readable and writable. Using Oracle Text, CLOB files can be indexed for fast XML document searching across millions of rows. Obviously, Oracle needed to establish an XML/SQL API (Application Program Interface) to allow for the parsing and integration of both models within the database. Essentially it allows for the coexistence of the two.
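The coexistence of relational columns and a stored XML document can be sketched in a few lines. The example below is an assumption-laden stand-in: it uses Python's built-in SQLite module with a TEXT column in place of an Oracle CLOB, and it parses the XML in application code instead of through Oracle's XML/SQL API. The table and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
# claim_id and customer are ordinary relational columns;
# notes_xml is a TEXT column standing in for a CLOB that holds a whole XML document.
conn.execute(
    "CREATE TABLE claim (claim_id INTEGER PRIMARY KEY, customer TEXT, notes_xml TEXT)"
)
conn.execute(
    "INSERT INTO claim VALUES (?, ?, ?)",
    (1, "J. Smith", "<notes><adjuster>Lee</adjuster><priority>high</priority></notes>"),
)


def claim_priority(connection, claim_id):
    """Return (customer, priority): one value from a column, one from the stored XML."""
    row = connection.execute(
        "SELECT customer, notes_xml FROM claim WHERE claim_id = ?", (claim_id,)
    ).fetchone()
    customer, notes_xml = row
    # The <priority> element exists only inside the document, not in any table.
    priority = ET.fromstring(notes_xml).findtext("priority")
    return customer, priority


result = claim_priority(conn, 1)
print(result)
```

The caller receives both values from one function and does not need to know which came from the normalized schema and which from the document, which is the transparency the text describes.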

Should the physical database vendor, like Oracle, support XML document storage within their database, ecommerce analysts still must be aware of the following challenges:

1. If an XML document is used by databases from multiple vendors, will there be compatibility in the XML/SQL integration, or should the XML document be stripped out from one of the databases so that it is resident in only one file?

2. There cannot be any “external” references within the XML document that allow for insertion of data elements that exist, let’s say, in a URL. This would be problematic since there is no existing infrastructure in Oracle that allows for the external linking outside the confines of the database infrastructure.

3. There needs to be documentation and control among the different versions of XML and DTD used in all systems. Different versions between XML and DTD could cause serious problems in the quality of the parsing process as well as the validity of the applications generated through the XSLT.

XML-Based File Types in Oracle iFS

Oracle provides an Internet file system (iFS) that supports the infrastructure for defining XML documents. The file system essentially assigns descriptors that specify the XML structure and store the schema in the database. As stated above, the CLOB allows an entire XML file to mix structured data and text markup in the same document. For example, an “Order” file type may map to an OrderHeader and OrderLines table, while a “BenefitsClaim” file type can mix the methods of data and text by actually structuring text markup for a Summary Report and combining it with a Payments Report. In this case the data is stored in the traditional database, whereas the text is stored as a CLOB. Figure 8.22 shows the contents of a sample Benefits Claim.

XML as a Centralized Data Search Engine

Another application of XML is as a central search engine. A central search engine is one that can be used to collect information over several different systems and provide summary information on the collective meaning of the information. This can be a difficult task in large organizations that have a proliferation of legacy applications running across different hardware platforms. While Chapter 9 provided guidance on ways of integrating legacy applications, it did so with approaches that have high overhead and require significant time to install. With XML, on the other hand, the goal of centralizing information quickly and cheaply is attainable. This can be accomplished by allowing each proprietary application to generate a standard XML document that holds the data and text needed. The XML outputs could then be merged in a database CLOB or integrated into relational tables for querying. Furthermore, the searching could be accomplished via the Web, using the index utilities provided by the database vendor. While this “central” repository of data is similar in concept to data warehousing, it involves far less complexity in overhead, coding, and maintainability. Most important is that it can be accessed over the Internet using portable application programming. Thus, the ecommerce analyst should consider XML for all multi-data analysis requirements.
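As a rough sketch of this approach, the following Python example merges the XML output of two applications into one in-memory repository and searches across it. The export format, the source names, and the field names are all hypothetical; a production system would rely on the database vendor's indexing utilities instead of a linear scan.

```python
import xml.etree.ElementTree as ET

# Hypothetical exports from two legacy systems, already in an agreed XML format.
PAYROLL_XML = """<records source="payroll">
  <record><name>Ann Lee</name><dept>Sales</dept></record>
</records>"""

INVENTORY_XML = """<records source="inventory">
  <record><name>Widget</name><dept>Sales</dept></record>
  <record><name>Gear</name><dept>Parts</dept></record>
</records>"""


def merge_exports(*xml_texts):
    """Combine the <record> elements of every export under one root element."""
    merged = ET.Element("repository")
    for text in xml_texts:
        root = ET.fromstring(text)
        for record in root.findall("record"):
            record.set("source", root.get("source"))  # remember the originating system
            merged.append(record)
    return merged


def search(repository, field, value):
    """Return (source, name) for every record whose <field> text equals value."""
    return [
        (rec.get("source"), rec.findtext("name"))
        for rec in repository.findall("record")
        if rec.findtext(field) == value
    ]


repository = merge_exports(PAYROLL_XML, INVENTORY_XML)
hits = search(repository, "dept", "Sales")
print(hits)
```

Tagging each record with its source system lets a single query report where every match came from.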

A central repository for searching has further benefits. Common ecommerce initiatives involve accessing text documents on the Web for research, analysis, and business. Indeed, users would love to be able to query text in a document and pull out all relevant information and citations that they need, as opposed to being forced to print the entire document and find the information they need manually. We say then that such data needs to be stored as metadata. For example, suppose a user needed to access a Web site to review information about a particular subject. If the documents were created in XML, they could be searched for matches on the subject, with the search spanning multiple XML documents from many systems. If a match is found, instead of just printing or copying the text, a portion of the data could be extracted from the document in order to populate another application. This application could be a spreadsheet program, or database system, that would capture the data or text and categorize it as appropriate. Thus, the text document has become metadata, and the user can select what portions of the text are useful for other applications.
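The extraction step described above might look like the sketch below, which pulls only the matching portions of an XML text document and writes them as CSV for a spreadsheet program. The document structure, the topic attribute, and the column names are assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical article marked up in XML; element and attribute names are illustrative.
ARTICLE_XML = """<article>
  <title>Shipping Rates</title>
  <section topic="pricing"><para>Ground shipping costs 5 dollars.</para></section>
  <section topic="history"><para>The firm began in 1990.</para></section>
  <section topic="pricing"><para>Air shipping costs 20 dollars.</para></section>
</article>"""


def extract_topic(xml_text, topic):
    """Return (title, paragraph) rows for sections tagged with the given topic."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")
    return [
        (title, para.text)
        for section in root.findall("section")
        if section.get("topic") == topic
        for para in section.findall("para")
    ]


def rows_to_csv(rows):
    """Serialize rows as CSV text that a spreadsheet program could import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "text"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_topic(ARTICLE_XML, "pricing")
csv_text = rows_to_csv(rows)
print(csv_text)
```

Only the two pricing paragraphs leave the document; the history section is never copied, which is the selective reuse the paragraph describes.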

XML Query Usage

Database products such as Oracle and Microsoft’s SQL Server must provide a set of query functions that support the manipulation of data as provided by SQL on the relational model. Obviously, with a hybrid model of XML integrated with the database, not all typical SQL queries can be supported. Ecommerce analysts need to understand the extent of query functionality available in XML documents and the types of documents that can be queried before making final design decisions relating to what will be stored as XML and what will be stored as a traditional data element. W3C (World Wide Web Consortium) has issued “query usage scenarios” that provide guidelines on the query usage for XML documents as follows:

1. Human-Readable Documents: perform queries on structured documents and collections of documents, such as technical manuals, to retrieve individual documents, to generate tables of contents, to search for information in structures found within a document, or to generate new documents as the result of a query.

2. Data-Oriented Documents: perform queries on the XML representation of database data, object data, or other traditional data sources to extract data from these sources, to transform data into new XML representations, or to integrate data from multiple heterogeneous data sources. The XML representation of data sources may be either physical or virtual; that is, data may be physically encoded in XML, or an XML representation of the data may be produced.

3. Mixed-Model Documents: perform both document-oriented and data-oriented queries on documents with embedded data, such as catalogs, patient health records, employment records, or business analysis documents.

4. Administrative Data: perform queries on configuration files, user profiles, or administrative logs represented in XML.

5. Filtering Streams: perform queries on streams of XML data to process the data in a manner analogous to UNIX filters. This might be used to process logs of email messages, network packets, stock market data, newswire feeds, EDI, or weather data to filter and route messages represented in XML, to extract data from XML streams, or to transform data in XML streams.

6. Document Object Model (DOM): perform queries on DOM structures to return sets of nodes that meet the specified criteria.

7. Native XML Repositories and Web Servers: perform queries on collections of documents managed by native XML repositories or Web servers.

8. Catalog Search: perform queries to search catalogs that describe document servers, document types, XML schemas, or documents. Such catalogs may be combined to support search among multiple servers. A document-retrieval system could use queries to allow the user to select server catalogs, represented in XML, by the information provided by the servers, by access cost, or by authorization. Once a server is selected, a retrieval system could query the kinds of documents found on the server and allow the user to query those documents.

9. Multiple Syntactic Environments: queries may be used in many environments. For example, a query might be embedded in a URL, an XML page, or a JSP or ASP page; represented by a string in a program written in a general-purpose programming language; or provided as an argument on the command line or standard input.
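Two of the scenarios above, a human-readable document query and a data-oriented query, can be illustrated with a short sketch. It uses the limited XPath subset in Python's standard library, not the W3C XML Query language itself, and the manual and catalog documents are hypothetical.

```python
import xml.etree.ElementTree as ET

MANUAL_XML = """<manual>
  <chapter><title>Installation</title><para>Unpack the unit.</para></chapter>
  <chapter><title>Operation</title><para>Press start.</para></chapter>
</manual>"""

CATALOG_XML = """<catalog>
  <item category="tools"><name>Hammer</name><price>12.50</price></item>
  <item category="toys"><name>Kite</name><price>8.00</price></item>
  <item category="tools"><name>Saw</name><price>20.00</price></item>
</catalog>"""


def table_of_contents(xml_text):
    """Human-readable document query: collect every chapter title in order."""
    return [t.text for t in ET.fromstring(xml_text).findall("./chapter/title")]


def items_in_category(xml_text, category, max_price):
    """Data-oriented query: names of items in a category at or below max_price."""
    root = ET.fromstring(xml_text)
    matches = root.findall(f"./item[@category='{category}']")
    # ElementTree's XPath subset has no numeric comparison, so filter price in Python.
    return [m.findtext("name") for m in matches if float(m.findtext("price")) <= max_price]


toc = table_of_contents(MANUAL_XML)
cheap_tools = items_in_category(CATALOG_XML, "tools", 15.0)
print(toc)
print(cheap_tools)
```

The price filter has to run in application code here, a small instance of the point made earlier that not every typical SQL-style query is supported directly against XML documents.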

XML versus the Database

XML appears to provide many storage and content benefits. However, I have not meant to suggest that the relational database model is no longer necessary and should be scrapped for total XML-based storage. The reality is that XML deals much better with content than storage. During complex searches on large databases, XML searches for data elements will be considerably slower than through the relational model. Furthermore, relational databases provide better facilities for security and the maintainability of the data itself.

However, databases have their downsides also. Their content cannot easily be shared, and the standardization of their design is questionable; therefore, exchanging data between two different database systems can be very difficult. Furthermore, field values among multiple systems may not map well, and as a result there can be significant incompatibilities when passing data among applications. XML addresses this shortfall by providing a neutral data format that supports easy data exchange.

The result of this chapter’s discussion on XML is that, like so many other system components, XML is best integrated within a framework, in this case the traditional relational database. Indeed, XML works best when it is integrated with the relational model, where each component can be used to the fullest extent of its design advantages. Thus, it is not XML versus the database; it is XML and the relational database as a new hybrid storage model.

XML and SVG

SVG, or Scalable Vector Graphics, is a language for describing two-dimensional graphics in XML. SVG allows for three types of graphic objects: vector graphic shapes (images consisting of straight lines and curves), images, and text. Graphic objects can be grouped, styled, transformed, and composited into previously completed objects, meaning that they can be exported to XML and then imported back as an XML version of the original graphic. Text can be included as part of the XML SVG, which enhances the document’s searchability and the accessibility of the SVG graphics within the document. Therefore, SVG enables the creation of resolution- and media-independent graphics in a text-based format that permits integration with XHTML, XSL and XSLT, XLink, DOM, and other W3C specifications, including support for CSS, scripting, and animation.

SVG drawings can be dynamic and interactive. The Document Object Model (DOM) for SVG, which includes the full XML DOM, supports vector graphics animation through scripting languages. The power of this model is that it allows for scripting to be used for both XML text and XML SVG drawings within the same document simultaneously. This means that one search will examine both text and graphic images within the domain of the search. Furthermore, SVG files are not proprietary binary data files, as many other graphic formats are. Because SVG files use XML, their syntax is readable as text. This means that developers can easily create scripts that dynamically modify content as well as exchange designs between tools. XML also describes information in terms of a structured data format, thus allowing applications to process the same SVG image differently.
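Because an SVG file is ordinary XML text, a script can search its text content and modify the drawing with the same tools used for any other XML document. The sketch below assumes a hypothetical map drawing and uses Python's standard XML library in place of a browser's scripting environment.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical map drawing; the coordinates and labels are illustrative only.
MAP_SVG = f"""<svg xmlns="{SVG_NS}" width="200" height="100">
  <circle cx="40" cy="50" r="5" fill="black"/>
  <text x="50" y="55">Boston</text>
  <circle cx="140" cy="50" r="5" fill="black"/>
  <text x="150" y="55">Chicago</text>
</svg>"""


def highlight_label(svg_text, label, color):
    """Find <text> elements whose content equals label and set their fill color."""
    root = ET.fromstring(svg_text)
    found = 0
    for text_el in root.iter(f"{{{SVG_NS}}}text"):
        if text_el.text == label:
            text_el.set("fill", color)
            found += 1
    return root, found


svg_root, found = highlight_label(MAP_SVG, "Chicago", "red")
ET.register_namespace("", SVG_NS)  # serialize without a generated namespace prefix
updated_svg = ET.tostring(svg_root, encoding="unicode")
print(updated_svg)
```

The same search that locates the city name also gives the script a handle on the graphic element, so finding and restyling happen in one pass.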

SVG is currently a W3C candidate for recommendation, meaning that it has not yet been authorized as a W3C standard. However, SVG is expected to undergo widespread testing and eventual acceptance as a standard shortly. The ability of XML to be extended beyond just text-based documents allows for much greater use within Web applications. In Chapter 6, I described the complex and extensive features of animation and interactivity being used in today’s Web applications. The inclusion of XML into graphics establishes much greater capabilities to combine database operations with interactive objects using XML as the standard delivery medium.

SVG Formats

Defining objects, such as text and shapes, in an SVG image is relatively straightforward. While a software developer can code most SVG formats directly, there are a number of third-party products such as Adobe® Illustrator® 9.0 that allow designers to generate complex images easily by simply exporting the files as SVG. However, an ecommerce analyst must have a general understanding of how SVG objects are defined, and what the various coordinates and elements refer to in the syntax when considering animation and interactivity in any Web design project. Listed below is some important information about the types of graphics supported by SVG and the most common data type formats. This information may need to be identified in the DOM and possibly the database system. Most important, it needs to be described in the repository of data, meaning that graphic images and their formats need to be organized in a central place so that they can be reused in the same manner as data and applications.

SVG supports three fundamental types of graphics elements that can be rendered onto the canvas:

1. Shapes: representing combinations of straight lines and curves

2. Text: representing combinations of characters

3. Raster Images: representing an array of values that specify the paint color at a series of points on a rectangular grid.

The common data types for SVG properties and attributes fall into the following categories:

• <integer>: a whole number.

• <number>: a real number value.

• <length>: a length is a distance measurement.

• <coordinate>: represents a length in the user coordinate system that is the given distance from the origin of the user coordinate system along the relevant axis (the x-axis for X coordinates, the y-axis for Y coordinates).

• <angle>: an angle value is a number optionally followed immediately by an angle unit identifier. Angle unit identifiers are: deg, degrees; grad, grads; rad, radians.

• <color>: the basic color type.

• <paint>: specifications of the type of paint to use when filling or stroking a given graphics element.

• <percentage>: the format of a percentage value is a number immediately followed by a “%”. Percentage values are always relative to another value, for example a length.

• <uri>: Uniform Resource Identifiers [URI] references. A URI is the address of a resource on the Web.

• <frequency>: a frequency value is a number immediately followed by a frequency unit identifier. Frequency unit identifiers are: Hz, hertz; kHz, kilohertz.

• <time>: a time value is a number immediately followed by a time unit identifier. Time unit identifiers are: ms, milliseconds; s, seconds.
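To show how the three graphics element types and several of the data types above appear in practice, here is a minimal sketch that assembles a small SVG document in Python. The drawing contents and the image file name logo.png are hypothetical, and the raster image is referenced with the XLink href attribute.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_drawing():
    """Assemble a small SVG document with one shape, one text run, and one image."""
    # width is a <length> with a unit; height is a <percentage>.
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "10cm", "height": "50%"})
    # Shape: x and y are <coordinate> values; fill is a <paint> given as a <color>.
    ET.SubElement(svg, f"{{{SVG_NS}}}rect",
                  {"x": "10", "y": "10", "width": "80", "height": "40", "fill": "#3366cc"})
    # Text: a run of characters positioned in the user coordinate system.
    label = ET.SubElement(svg, f"{{{SVG_NS}}}text", {"x": "20", "y": "35", "fill": "white"})
    label.text = "Order 1001"
    # Raster image: referenced through a <uri>; the file name is hypothetical.
    ET.SubElement(svg, f"{{{SVG_NS}}}image",
                  {"x": "100", "y": "10", "width": "40", "height": "40",
                   f"{{{XLINK_NS}}}href": "logo.png"})
    return svg


drawing = build_drawing()
svg_text = ET.tostring(drawing, encoding="unicode")
print(svg_text)
```

Every value in the drawing is plain attribute text, which is what allows these formats to be catalogued in a repository alongside other data definitions.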

Summary of SVG Features

SVG has many advantages over other image formats, and particularly over JPEG and GIF, the most common graphic formats used on the Web today. Listed below is a summary of SVG features that the ecommerce analyst should keep in mind when considering the integration of animation and graphics as an XML extension.

• Plain text format: SVG files can be accessed by a number of software tools and are usually smaller and more compressible than JPEG or GIF images.

• Scalable: unlike bitmapped GIF and JPEG formats, SVG is a vector format, which means that SVG images can be printed with high quality at any resolution.

• Zoomable: images can be zoomed in on any portion of an SVG image without any visible degradation.

• Searchable and selectable text: unlike text in bitmapped images, text in SVG is selectable and searchable. For example, you can search for specific text strings, like city names in a map.

• Scripting and animation: enables dynamic and interactive graphics far more sophisticated than bitmapped or Flash images.

• Works with Java technology: SVG complements Java technologies’ high-end graphics engine.

• Open standard: SVG is an open recommendation developed by a cross-industry consortium. Unlike some other graphics formats, SVG is not proprietary.

• True XML: as an XML grammar, SVG offers all the advantages of XML.
