Tuesday, October 30, 2007

Semantic Web and Beyond


As computing becomes ubiquitous and pervasive, it is increasingly an extension of the human, modifying or enhancing human experience. Today's car reacts to a human's perception of danger, with a series of computers participating in how to handle the vehicle given human commands and environmental conditions. Proliferating sensors help with observation, decision making, and sensory modification. The emergent Semantic Web will lead to machine understanding of data and help exploit heterogeneous, multi-source digital media. Emerging applications in situation monitoring and entertainment are resulting in the development of experiential environments.

This area brings together forward-looking research and technology that will shape our world more intimately than ever before as computing becomes an extension of human experience. It covers all aspects of computing that are closely tied to human perception, understanding, and experience; brings together computing that deals with semantics, perception, and experience; and serves as a platform for the exchange of both practical technologies and far-reaching research.


Saturday, October 27, 2007

An EA (Enterprise Architecture) starting point

Looking at this abstractly, most companies need to know "where to start." Here is a jumping-off point that maintains an abstract and simple view to be leveraged by the whole organization. I submit that it's a starting point for an EA (Enterprise Architecture) model. More to come soon; thoughts and comments are welcome.

Sunday, December 17, 2006

Defining the Deliver Module

The Deliver module determines how to get the right content to the right audience on the correct device. The deliver module of ECM is used to present information from the manage, store, and preserve modules. It also contains functions used to enter information into systems (such as information transfer to media or generation of formatted output files) or for readying (for example, converting or compressing) information for the store and preserve components. The functionality in the deliver module is also known as "output" and summarized under the term "Output Management." The deliver module comprises three groups of functions and media: Transformation Technologies, Security Technologies, and Distribution. Transformation and Security, as services, belong at the middleware level and should be available to all ECM components equally.

Layout/Design; Visually, there is very little originality in design: it is the rearrangement of an idea observed and recorded previously. No matter how simple the design may be, there are certain principles that must be applied, and an appreciation of their importance is gained slowly through observation and practice together with good judgment. Principles of design should always be incorporated in any graphic design project to assist its communication and graphic interest; however, in planning a basic design, the designer must produce a job to suit the class of work, the copy, and the tastes of the business. To develop a sense of design, use the three "eyes": Visual-eyes, to examine closely all types of printed material; Critic-eyes, to separate the good from the bad; and Analyst-eyes, to select the element that makes a design good. Three essential qualities are needed: Vision, to be able to detect an idea and then to contemplate it; Imagination, to be able to use an idea effectively; and Judgment, to be able to assess the idea's value, correct place, and use. Balance is the result of an arrangement of one or more elements in the design so that, visually, they equal each other.

Publishing; Publishing is the industry of producing literature or information: the activity of making information available for public view. Traditionally, the term refers to the distribution of printed works such as books and newspapers. With the advent of digital information systems and the Internet, the scope of publishing has expanded to include websites, blogs, and other forms of new media. As a business, publishing includes the development, marketing, production, and distribution of news and non-fiction magazines and books, literary works, musical works, software, and other works dealing with information.

Portal; The simplest definition is a door or entrance, but more specificity is required for ECM. A portal is a web site that is the entry point to both Internet and Intranet services. A portal offers functionality in the areas of search, personalization, aggregation, and abstracted business processes, leveraging portlets as the core for aggregating information at the presentation layer.

Portlet; A portlet is a small chunk of secondary content that is often assistive or functional, like navigation or information on related items. Most of the time, portlets occupy one of the two columns that are available on most sites. This does suggest that portlets are a subtype of pagelets, but for various usage reasons it's generally beneficial to regard them as separate and simply include portlets using a pagelet.

Transformation Technologies; Transformation is a form of conversion in which a file is converted into a file format with a comparable structure (for example, OWL to XML, XML to OWL, XML to XML, or SGML to HTML); it is the changing of content from one format to the needed delivery format. Usually, this form of conversion can be carried out very well. However, it can lead to the usual problems, since in many cases such a conversion is also used to increase the quality of the files, on the principle that "we are at it anyway."
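As an illustrative sketch (not any specific ECM product's converter), the kind of structure-to-structure transformation described above can be shown with Python's standard xml.etree module; the `<doc>`/`<title>`/`<para>` schema here is hypothetical:

```python
import xml.etree.ElementTree as ET

def xml_to_html(xml_text):
    """Convert a simple <doc><title>/<para> XML document (hypothetical
    schema) into an HTML fragment: a structure-preserving transformation."""
    root = ET.fromstring(xml_text)
    parts = ["<html><body>"]
    for child in root:
        if child.tag == "title":
            parts.append("<h1>%s</h1>" % child.text)
        elif child.tag == "para":
            parts.append("<p>%s</p>" % child.text)
    parts.append("</body></html>")
    return "".join(parts)

source = "<doc><title>Q3 Report</title><para>Revenue was flat.</para></doc>"
print(xml_to_html(source))
```

Because both sides share a comparable structure, the mapping is a simple element-by-element walk, which is why this class of conversion "can be carried out very well."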

COLD/ERM; COLD/ERM is the way documents are delivered from computer output (primarily reports) on magnetic disks, optical discs, and magnetic tape. Once a document has been stored, the reports can be distributed for viewing, printing, or faxing, or distributed via a web interface over the Internet and/or an Intranet; it is often used for Internet customer-facing applications and processes. Enterprise Report Management (formerly known as COLD technology and today often written as COLD/ERM) is a component technology of an ECM environment. This technology electronically stores, manages, and distributes documents that are generated in a digital format and whose output data are report-formatted/print-stream originated. Unfortunately, documents that are candidates for this technology too often are printed to paper or microform for distribution and storage purposes. This is mostly an aged terminology which, over time, will be assimilated into the "Document Management" functional model.

Personalization; Dynamic content assembly automatically delivers personalized content that specifically meets user needs. User content actions and requests can be used to predict content requirements. Personalization techniques currently in use include: personal records in a database/file; a rule base or profile database; data mining (of numerical and string data in formatted tables/files); web-log data stored in formatted tables/files; text mining; and online analytical processing (OLAP) techniques.
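A minimal sketch of the rule/profile-based technique mentioned above; the profile and content fields (`interests`, `tags`) are invented for illustration:

```python
def personalize(user_profile, content_items):
    """Rule-based personalization sketch: score each content item by how
    many of its tags overlap the user's recorded interests, highest first;
    items with no overlap are dropped."""
    interests = set(user_profile.get("interests", []))
    scored = [(len(interests & set(item["tags"])), item["id"])
              for item in content_items]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored if score > 0]

profile = {"interests": ["ecm", "search"]}
items = [
    {"id": "a1", "tags": ["ecm", "storage"]},
    {"id": "a2", "tags": ["cooking"]},
    {"id": "a3", "tags": ["ecm", "search"]},
]
print(personalize(profile, items))  # ['a3', 'a1']
```

Real systems would feed the profile from the web-log and data-mining sources listed above rather than a hand-built dictionary.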

XML; XML is an established standard, based on the Standard Generalized Markup Language (SGML), designed to facilitate document construction from standard data items; it is also used as a generic data exchange mechanism. XML is a W3C initiative that allows information and services to be encoded with meaningful structure and semantics that both computers and humans can understand. XML is well suited for information exchange and can easily be extended to include user-specified and industry-specified tags.
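A small stdlib example of what "meaningful structure and semantics" buys in practice; the invoice vocabulary is made up to illustrate user- and industry-specified tags:

```python
import xml.etree.ElementTree as ET

# A document using a hypothetical industry-specific vocabulary; XML imposes
# no fixed tag set, so any well-formed element names are allowed.
invoice_xml = """
<invoice currency="USD">
  <customer>Acme Corp</customer>
  <line sku="X-100" qty="2" price="9.50"/>
  <line sku="Y-200" qty="1" price="4.00"/>
</invoice>
"""

root = ET.fromstring(invoice_xml)
# Because the structure is explicit, a program can "understand" the data
# well enough to compute with it.
total = sum(int(line.get("qty")) * float(line.get("price"))
            for line in root.findall("line"))
print(root.find("customer").text, total)  # Acme Corp 23.0
```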

OWL; The Web Ontology Language (OWL) is a markup language for publishing and sharing data using ontologies on the Internet. OWL is a vocabulary extension of the Resource Description Framework (RDF) and is derived from the DAML+OIL Web Ontology Language (see also DAML and OIL). Together with RDF and other components, these tools make up the Semantic Web project.

PDF; Portable Document Format (PDF) is an open file format created and controlled by Adobe Systems, for representing two-dimensional documents in a device independent and resolution independent fixed-layout document format. Each PDF file encapsulates a complete description of a 2D document (and, with the advent of Acrobat 3D, embedded 3D documents) that includes the text, fonts, images, and 2D vector graphics that compose the document. PDF files do not encode information that is specific to the application software, hardware, or operating system used to create or view the document. This feature ensures that a valid PDF will render exactly the same regardless of its origin or destination (but depending on font availability when fonts are not encapsulated in the file).

Compression; Image compression can be lossy or lossless. Lossless compression is sometimes preferred for artificial images such as technical drawings, icons, or comics, because lossy compression methods, especially when used at low bit rates, introduce compression artifacts. Lossless methods may also be preferred for high-value content, such as medical imagery or image scans made for archival purposes. Lossy methods are especially suitable for natural images such as photos, in applications where a minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit rate. Methods for lossless image compression include run-length encoding, entropy coding, and adaptive dictionary algorithms such as LZW. Methods for lossy compression include: reducing the color space to the most common colors in the image (the selected colors are specified in the color palette in the header of the compressed image, each pixel just references the index of a color in the palette, and this method can be combined with dithering to blur the color borders); chroma subsampling, which takes advantage of the fact that the eye perceives brightness more sharply than color by dropping half or more of the chrominance information in the image; transform coding, the most commonly used method, in which a Fourier-related transform such as the DCT or the wavelet transform is applied, followed by quantization and entropy coding; and fractal compression.
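As a minimal sketch of the first lossless method listed above, run-length encoding represents each run of identical bytes as a (count, value) pair; it is effective precisely on the flat-color artificial images described:

```python
def rle_encode(data: bytes) -> list:
    """Run-length encoding: collapse each run of identical bytes into a
    [count, value] pair. Lossless: decoding restores the input exactly."""
    runs = []
    for b in data:
        if runs and runs[-1][1] == b:
            runs[-1][0] += 1
        else:
            runs.append([1, b])
    return runs

def rle_decode(runs) -> bytes:
    return bytes(b for count, b in runs for _ in range(count))

row = b"\x00\x00\x00\xff\xff\x00"          # one scanline of a simple icon
assert rle_decode(rle_encode(row)) == row   # round-trips losslessly
print(rle_encode(row))                      # [[3, 0], [2, 255], [1, 0]]
```

On a photo, where neighboring bytes rarely repeat, the same scheme would expand the data; that is why lossless and lossy methods suit different image classes.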

Syndication; Syndication is a process in which a section of a website is made available for other sites to use. This could be done simply by licensing the content so that other people can use it; however, in general, web syndication refers to making Web feeds available from a site in order to provide other people with a summary of the website's recently added content. This originated with news and blog sites but is increasingly used to syndicate other types of information. Millions of online publishers, including newspapers, commercial web sites, and blogs, now publish their latest news headlines, product offers, or blog postings in standard-format news feeds. Syndication benefits both the websites providing information and the websites displaying it. For the receiving site, content syndication is an effective way of adding greater depth and immediacy of information to its pages, making it more attractive to users. For the transmitting site, syndication drives exposure across numerous online platforms and generates new traffic, making syndication a free and easy form of advertisement. The prevalence of web syndication is also of note to online marketers, since web surfers are becoming increasingly wary of providing personal information for marketing materials and expect the ability to subscribe to a feed instead.
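A sketch of producing the standard-format news feed described above, using only the standard library to emit a minimal RSS 2.0 document (titles and links only; real feeds also carry dates, descriptions, and GUIDs):

```python
import xml.etree.ElementTree as ET

def build_rss(site_title, entries):
    """Build a minimal RSS 2.0 feed from a list of (title, link) pairs
    describing a site's recently added content."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    for title, link in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")

feed = build_rss("Example Blog", [("First post", "http://example.com/1")])
print(feed)
```

A receiving site polls this document and renders the items, which is all "making a Web feed available" amounts to mechanically.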


Security Technologies; With the rapid growth of interest in the Internet, network security has become a major concern to companies throughout the world. The fact that the information and tools needed to penetrate the security of corporate networks are widely available has only increased that concern.

Public Key Infrastructure (PKI); PKI enables users of an unsecured public network such as the Internet to securely and privately exchange data and money through the use of a public and a private cryptographic key pair that is obtained and shared through a trusted authority. The public key infrastructure provides for a digital certificate that can identify an individual or an organization and directory services that can store and, when necessary, revoke the certificates. Although the components of a PKI are generally understood, a number of different vendor approaches and services are emerging. Meanwhile, an Internet standard for PKI is being worked on.

Digital Rights Management (DRM)/Watermark; DRM is the umbrella term referring to any of several technologies used to enforce pre-defined limitations on software, music, movies, or other digital data. In more technical terms, DRM handles the description, layering, analysis, valuation, trading and monitoring of the rights held over a digital work. In the widest possible sense, the term refers to any such management. Watermark is a translucent design visible in handmade paper when held up to the light. Watermark designs were typically wired into the paper mold, and can denote either the size of the paper, the place of manufacture, or the intended market. Typically the watermark is present in the center of the right side of the paper mold.

Digital Signature; A digital signature is like a paper signature, but electronic. A properly implemented digital signature is computationally infeasible to forge. It provides verification to the recipient that the file came from the person who sent it and that it has not been altered since it was signed.
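Real digital signatures rely on asymmetric key pairs (e.g. RSA or DSA, via a third-party library), which are outside Python's standard library; as a hedged stand-in, the same sign-then-verify flow can be sketched with HMAC, a shared-secret mechanism that detects tampering but is not a true digital signature:

```python
import hashlib
import hmac

def sign(key: bytes, document: bytes) -> bytes:
    """Produce an authentication tag over the document bytes."""
    return hmac.new(key, document, hashlib.sha256).digest()

def verify(key: bytes, document: bytes, signature: bytes) -> bool:
    """Recompute the tag and compare in constant time; any change to the
    document (or a wrong key) makes verification fail."""
    return hmac.compare_digest(sign(key, document), signature)

key = b"shared-secret"
doc = b"Pay vendor $100"
tag = sign(key, doc)
print(verify(key, doc, tag))                 # True
print(verify(key, b"Pay vendor $999", tag))  # False: tampering detected
```

The asymmetric version replaces the shared key with a private signing key and a public verification key, which is what lets a recipient verify origin without being able to forge.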


Distribution; technologies leveraged for distribution include the following.

Mobile Device; Devices that are not directly connected to the network via a "hard line"; examples would be all wirelessly connected devices. Mobile devices or handheld devices (also known as handhelds) are pocket-sized computing devices that are rapidly gaining popularity as access to information in every walk of life becomes more and more mission critical. Along with mobile computing devices such as laptops, smart phones and PDAs represent the new frontier of computing as desktop computers find less and less favor among everyday users. Typical handhelds include information appliances, smart phones, personal digital assistants, and mobile phones.

Internet; The Internet, or simply the Net, is the publicly accessible worldwide system of interconnected computer networks that transmit data by packet switching using a standardized Internet Protocol (IP). It is made up of thousands of smaller commercial, academic, domestic, and government networks. It carries various information and services, such as electronic mail, online chat, and the interlinked Web pages and other documents of the World Wide Web. Contrary to some common usage, the Internet and the World Wide Web are not synonymous: the Internet is a collection of interconnected computer networks, linked by copper wires, fiber-optic cables, etc.; the Web is a collection of interconnected documents, linked by hyperlinks and URLs, and is accessible using the Internet.

Intranet; The Intranet is a private network that uses Internet Protocols, network connectivity, and possibly the public telecommunication system to securely share part of an organization's information or operations with its employees. Sometimes the term refers only to the most visible service, the internal website. The same concepts and technologies of the Internet, such as clients and servers running on the Internet protocol suite, are used to build an intranet. HTTP and other Internet protocols are commonly used as well, especially FTP and email. There is often an attempt to use Internet technologies to provide new interfaces to corporate 'legacy' data and information systems. There does not necessarily have to be any access from the organization's internal network to the Internet itself. Where there is, there will usually be a firewall with a gateway through which all access takes place, along with user authentication, encryption of messages, and the use of virtual private networks (VPNs) that tunnel through the public network. Through such devices, company information and computing resources can be shared by employees working from external locations. Increasingly, intranets are being used to deliver tools and applications, e.g., collaboration (to facilitate working in groups and for teleconferences), sophisticated corporate directories, sales and CRM tools, and project management, to advance productivity. Intranets are also being used as culture change platforms. For example, in IBM's "Jam" program, large numbers of employees could discuss key issues in online forums, and key ideas surfaced with the aid of text analysis tools. Intranet traffic, like public-facing web site traffic, is better understood by using web metrics software to track overall activity, as well as through surveys of users. Intranet User Experience, Editorial, and Technology teams work together to produce in-house sites.

Extranet; The extranet is a private network that uses Internet protocols, network connectivity, and possibly the public telecommunication system to securely share part of a business's information or operations with suppliers, vendors, partners, customers or other businesses. An extranet can be viewed as part of a company's Intranet that is extended to users outside the company (e.g.: normally over the Internet). It has also been described as a "state of mind" in which the Internet is perceived as a way to do business with other companies as well as to sell products to customers. An argument has been made that "extranet" is just a buzzword for describing what institutions have been doing for decades, that is, interconnecting to each other to create private networks for sharing information.

Paper; Paper is still required for some operations, and in some industries, paper must be distributed for certain legal reasons, regulatory mandates, etc.

eStatements; eStatements are a convenient way to access monthly account statements online and replace the paper statement sent through the mail. eStatements provide an environment where the paper statement is no longer required; instead, it can be viewed online or received through email, and it can be available the same business day, quicker than mailed statements. An eStatement is an electronic version of the paper statement: using a current email address, a monthly statement notification is automatically delivered directly to the email inbox, and the statement can be accessed anytime, day or night, through the online environment. An eStatement is an exact replica of the paper statement and contains identical information; it is a digital replica of the original paper version, available via an online service or email to authorized users. Even so, 97% of eStatement adopters continue to receive a paper statement. Firms must wean customers from their addiction to paper by offering a printable statement in PDF format, automatically turning off paper statements for eStatement adopters, and charging customers who request a paper statement via snail mail.

Portal; Portals are pages intended to serve as "main pages" for specific topics or areas. Web portals are sites on the World Wide Web that typically provide personalized capabilities to their visitors. They are designed to use distributed applications, different numbers and types of middleware, and hardware to provide services from a number of different sources. In addition, business portals are designed to support collaboration in workplaces. A further business-driven requirement of portals is that the content be able to work on multiple platforms such as personal computers, personal digital assistants (PDAs), and cell phones.

Facsimile (FAX); FAX is a method of sending graphical data over a serial communication system (usually a telephone line or intranet) that involves (conventionally) scanning a document at one end, transmitting the data via modulated tones, and then reproducing the picture at the other end on heat-sensitive paper, printers, or computers. Fax modems allow computer-generated graphics to be transmitted as if they came from a conventional fax. A computer and fax modem can be used to receive a fax transmission regardless of its origin; the fax can be displayed on screen, output via a conventional printer, or routed into other technologies for processing. A fax machine is essentially an image scanner, a modem, and a computer printer combined into a highly specialized package: the scanner converts the content of a physical document into a digital image, the modem sends the image data over a phone line or network, and the printer at the other end makes a duplicate of the original document. Fax machines with additional electronic features that connect to computers can be used to scan documents into a computer and to print documents from the computer; such high-end devices are called multifunction printers and cost more than plain fax machines. There are several different indicators of fax capabilities: group, class, data transmission rate, and conformance with ITU-T (formerly CCITT) recommendations.

Defining the Preserve Module

The Preserve Module addresses the long-term archival and storage of an organization's essential content. The Preserve Module of ECM handles the long-term, safe storage and backup of static, unchanging information, as well as the temporary storage of information that is not desired or required to be archived. For purely securing information, microfilm is still viable and is now offered in hybrid systems with electronic media and database-supported access. The decisive factor for all long-term storage systems is the timely planning and regular performance of migrations, in order to keep information available in a changing technical landscape. This ongoing policy is called Continuous Migration. The Preserve Module contains special viewers, conversion and migration tools, and long-term storage media.

Optical Technology; Based on high density, blue laser recording technology, Ultra Density Optical (UDO) is the standard for professional archival storage. UDO delivers all the traditional strengths expected from optical storage such as record authenticity and media longevity, but at a much higher capacity and lower cost than previous generation products. UDO is designed specifically for the secure, long-term storage of high volume document images, emails, customer records, audio or video files, financial information and engineering documentation. UDO media is available in Write Once and Rewritable media formats, all of which are ISO standards. UDO Write Once technology enables regulatory compliance, best practice and audit trail management and is ideal for applications that require records be archived in an unalterable, non-erasable format for legal admissibility. The UDO technology roadmap calls for future generations of 60GB and 120GB media capacities with drive backward read compatibility to maximize investment protection and ensure long-term data access.

Content Addressed Storage (CAS); CAS is increasingly used for archiving content. CAS is a storage methodology designed for rapid access to fixed content: each object is addressed by an identifier derived from its own content rather than by a file name or location.
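A toy sketch of the content-addressing idea, assuming SHA-256 digests as addresses (a common choice, though actual CAS products vary in digest and layout):

```python
import hashlib

class ContentStore:
    """Toy content-addressed store: the SHA-256 digest of the bytes *is*
    the address, so identical content is stored once and an address can
    never silently point at altered content."""
    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

store = ContentStore()
addr = store.put(b"signed contract, v1")
print(store.get(addr) == b"signed contract, v1")   # True
print(store.put(b"signed contract, v1") == addr)   # True: deduplicated
```

This property is why CAS suits fixed-content archiving: changing even one byte yields a different address, making the stored record tamper-evident.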

Microfilm (Microforms); Microforms are processed films that carry images of documents for transmission, storage, reading, and printing. Microform images are commonly reduced about 25 times from the original document size; for special purposes, greater optical reductions may be used. All microform images may be provided as positives or negatives. For use in readers and printers, negative images (with a dark background) are preferred, because the low light available to be scattered gives cleaner images. Two formats are common: microfilm (reels) and microfiche (flat sheets). Microfilm formats include aperture cards, microfiche, microfilm jackets, and 16mm roll film: fine-grain, high-resolution film, in the shape of a strip or roll, used to record microphotographic images reduced in size from the original.

Paper; centuries old and, with Microfilm, one of two ways to ensure that documents are readable 100 years from now, or longer.

Storage Area Network (SAN)/ Network Attached Storage (NAS); The NAS can be part of a SAN with hard disk storage directly attached to the network to provide information access. The SAN is a high-speed network that connects computer systems and storage elements and allows movement of data between computer systems and storage elements and among storage elements.

Defining the Store Module


The Store Module determines where content goes and how it can be located. The Store Module is used for the temporary or transient storage of information that is not required or desired to be archived. Even if it uses media that are suitable for long-term archiving, "Store" is still separate from "Preserve." These infrastructure components are sometimes held at the operating system level, like the file system, and also include security technologies, which are discussed above in the "Deliver Module" area. However, security technologies, including access control, are superordinate components of an ECM solution. The "Store Module" can be divided into three categories:

Repositories; A repository is a place, usually a file system, database, or data warehouse, where content is deposited or stored. A storage location may combine different kinds of repositories, depending on the transient content held at that location.
File systems; The way in which files are named and where they are placed logically for storage and retrieval, most commonly in a hierarchical (tree) structure.

Integrator; An integrator is a technological function that makes content into a whole by bringing all repositories together, unifying them so that they can be treated as a single repository. It has the ability to connect multiple repositories across the functional model.
Content Management Systems; Content management systems have the capability to manage and track the location of, and relationships among, content within multiple repositories.

Database; The term database originated within the computer industry. Although its meaning has been broadened by popular use, even to include non-electronic databases, this article takes a more technical perspective. A possible definition is that a database is a collection of records stored in a computer in a systematic way, so that a computer program can consult it to answer questions; the items retrieved in answer to queries become information that can be used to make decisions. The computer program used to manage and query a database is known as a database management system (DBMS). The properties and design of database systems are included in the study of information science. A database can also be defined as an electronic collection of records stored in a central file and accessible by many users for many applications, and as a collection of data elements within records or files that have relationships with other records or files. Relational databases are most common: data is stored in standard rows, tables, and columns. XML and Web Ontology Language (OWL) databases are a developing technology.
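A minimal DBMS example using Python's built-in SQLite bindings; the records table and its columns are invented for illustration of the "collection of records a program can consult to answer questions" definition:

```python
import sqlite3

# Build a small relational table of hypothetical ECM records in memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, doc_type TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO records (doc_type, title) VALUES (?, ?)",
    [("invoice", "INV-001"), ("letter", "Complaint"), ("invoice", "INV-002")])

# A query "consults" the database to answer a question: which invoices exist?
rows = conn.execute(
    "SELECT title FROM records WHERE doc_type = ? ORDER BY title",
    ("invoice",)).fetchall()
print([title for (title,) in rows])  # ['INV-001', 'INV-002']
```

The rows, columns, and declarative query are exactly the relational model the paragraph describes as the most common case.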

Data Warehouse; A data warehouse is the central repository for all, or most, of an organization's structured data, with the next generation of database vendors accepting the need to unite both structured and unstructured content in one environment. ECM interacts with this environment, and through business requirements the two are loosely connected.

Storage Technologies; A wide variety of technologies can be used to store information, depending on the application and system environment.

Storage Area Network (SAN)/Network Attached Storage (NAS); The NAS can be part of a SAN with hard disk storage directly attached to the network to provide information access. The SAN is a high-speed network that connects computer systems and storage elements and allows movement of data between computer systems and storage elements and among storage elements.

Paper; although Electronic Content Management systems are considered essential for streamlining paper-intensive operations, paper is still required for some operations, and in some industries, paper must be stored for a certain period of time for legal reasons, regulatory mandates, etc. However, once the data has been captured and the image stored, paper documents can be archived away from the central office in special warehouses, thus reducing the direct cost of storage but increasing the cost of retrieval (if the physical paper is frequently required).
Paper is generally a low cost option for storage for low volume businesses. However, as paper volume increases, the cost of manual paper processes and storage also increases. In some cases, these costs become a limiting factor to growth in a company.

Compact Disc Read Only Memory (CD-ROM) and Digital Versatile Disc (DVD); A CD-ROM optical disc is created by a mastering process and used for distributing read-only information. The DVD is a 120 mm optical disc on which digital video, audio, data, and images can be stored. The available formats are read-only, recordable, and rewritable.

Redundant Array of Independent Disks (RAID); RAID stores the same data on multiple hard disks for improved performance and fault tolerance.

Magnetic Storage/Optical; During the initial period that an image is being used for work processing, it is frequently accessed for different operations and stored temporarily on different magnetic media. Images are generally stored on the host system, and as the application dictates, they are downloaded to workstations for work processing. Once on the workstation, the size of the document or document folder, number of documents, and the need to access those documents quickly will help determine the workstation memory requirements.

Optical Disk; Optical disk storage is commonly used to store or archive images once the initial work has been completed. Images are downloaded from magnetic disk to optical disk storage, making the magnetic disk available for new batches of images. An optical storage configuration is similar to a magnetic disk configuration. The size of the image database is dependent on the estimated total number of images to be stored, the frequency of access, the response time requirements for access, and the DPI of the image. Depending on the application needs, optical storage can range from one stand-alone optical disk drive with image platters manually inserted as needed to machines with multiple disk drives that automatically retrieve disks and insert them into the drives. Optical disk storage devices and optical disks are available in a variety of sizes and configurations. Depending on the configuration, vendors could propose different optical disk technologies.

Library Services; Library services have to do with libraries only in a metaphorical way. They are the administrative components, close to the system level, that handle access to information. The library service is responsible for taking in and storing information from the Capture module and any managing components within the Manage Module. The storage location is determined only by the characteristics and classification of the information. These services are used and leveraged across all of the management components, along with the capture and deliver modules. The library service works in concert with the database of the manage components, and each supports the following functionalities:

Search; Search typically has core ties into stored content according to its content type, across both structured and now unstructured data, with the ability to reference content wherever it is held in any system: catalogue information in a database, letters and reports in content management systems. However, for an individual to perform business tasks efficiently, they need to access all relevant content at once. A complaint from a customer will appear in the letters document management system, but that customer's details are hidden in the CRM application, and without understanding the product they bought (information that is held in the products database) it will be difficult to help them. Searching is the act of trying to find something or someone. One can distinguish between two forms of search: one may search for an item that is known to exist, with the intent of locating it, or one may search for an item whose existence is uncertain, in order to ascertain whether it exists or not. Search is the ability to locate the correct and proper content within a repository, indexed either within a database or by reference.
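One classic mechanism behind such cross-repository search is an inverted index, which maps each term to every item that contains it; a toy sketch with made-up document identifiers spanning the letter, CRM, and product systems mentioned above:

```python
def build_index(documents):
    """Minimal inverted index: map each word to the set of repository
    items containing it, so one query can span letters, CRM notes, etc."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

docs = {
    "letter-17": "customer complaint about late delivery",
    "crm-202": "customer phone record",
    "product-db-9": "delivery schedule for product line",
}
index = build_index(docs)
print(sorted(index["customer"]))   # ['crm-202', 'letter-17']
print(sorted(index["delivery"]))   # ['letter-17', 'product-db-9']
```

A single lookup returns hits from every repository at once, which is exactly the "all relevant content at once" requirement the paragraph raises.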

Version Control; the ability to track and manage the history of a particular piece of content may be very important. Features to look for include: the ability to roll back to a previous version of a document, the ability to track major and minor revisions of a document, and the ability to purge earlier versions of a document. In short: the ability to track history and perform rollback on content at any level, tied either directly or indirectly to the physical document through the Information Life-cycle Process (ILcP).
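The three features named above (rollback, revision tracking, purge) can be sketched in a few lines. This is a hypothetical illustration of the behavior, not a real repository API.

```python
class VersionedDocument:
    """Tracks revision history with rollback and purge of old versions."""
    def __init__(self, content):
        self.versions = [content]          # index 0 is revision 1

    @property
    def current(self):
        return self.versions[-1]

    def save(self, content):
        """Each save becomes a new revision."""
        self.versions.append(content)

    def rollback(self, revision):
        """Discard every revision after the given 1-based revision number."""
        if not 1 <= revision <= len(self.versions):
            raise ValueError("unknown revision")
        self.versions = self.versions[:revision]

    def purge(self, keep_last):
        """Remove earlier versions, retaining only the newest keep_last."""
        self.versions = self.versions[-keep_last:]

doc = VersionedDocument("draft")
doc.save("reviewed draft")
doc.save("final")
doc.rollback(2)   # discard "final", return to revision 2
```

Production systems would also attribute each revision to a user and timestamp, as the Versioning section later notes.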

Check In/Check Out; systems that require users to “check out” documents must ensure that modifications are made by one person at a time. Another feature that may need to be evaluated is whether a user who is disconnected from the system can modify a document and have the resulting changes synchronized with the copy stored in the repository. Checking in and out ensures that only one person can work on a document at any time.
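The one-writer-at-a-time rule is essentially a per-document lock. A minimal sketch, with invented names, might look like this:

```python
class CheckoutError(Exception):
    pass

class Repository:
    """Single-writer check-out locks: only the lock holder may modify."""
    def __init__(self):
        self.locks = {}   # doc_id -> user holding the lock

    def check_out(self, doc_id, user):
        holder = self.locks.get(doc_id)
        if holder is not None and holder != user:
            raise CheckoutError(f"{doc_id} is checked out by {holder}")
        self.locks[doc_id] = user

    def check_in(self, doc_id, user):
        if self.locks.get(doc_id) != user:
            raise CheckoutError(f"{user} does not hold the lock on {doc_id}")
        del self.locks[doc_id]

repo = Repository()
repo.check_out("contract-9", "alice")
try:
    repo.check_out("contract-9", "bob")   # second writer is refused
    blocked = False
except CheckoutError:
    blocked = True
repo.check_in("contract-9", "alice")
```

The disconnected-editing scenario mentioned above is harder: it requires either holding the lock across the offline period or merging changes at synchronization time.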

Workflow; a component-level operational function that executes the smallest actionable steps of a business function. It can operate within any area of the ECM model and should not be confused with the holistic process orchestration of Business Process Management.
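The distinction drawn above, small actionable steps rather than end-to-end orchestration, can be illustrated with a toy sequential workflow. The step functions and field names are invented for the example.

```python
def run_workflow(steps, item):
    """Apply each small, self-contained step of a business function in order.
    Illustrative only; real engines add routing, roles, and error handling."""
    for step in steps:
        item = step(item)
    return item

invoice = {"amount": 1200, "validated": False, "approved": False}
result = run_workflow(
    [
        # step 1: validate the invoice amount
        lambda inv: {**inv, "validated": inv["amount"] > 0},
        # step 2: auto-approve validated invoices under a threshold
        lambda inv: {**inv, "approved": inv["validated"] and inv["amount"] < 5000},
    ],
    invoice,
)
```

BPM, by contrast, would coordinate many such workflows across systems and human participants, as the Manage module discussion later describes.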

Retrieval; repository access requests from any device or workstation must be serviceable, and handled separately from the other modules. Access is through the Internet, via service requests using OWL, XML, or APIs based on international standards.

Audit Trail; knowing where a document is or has been is vital to ensuring that only entitled users have had access to content. The ability to track content externally is also increasingly needed for accountability checks and balances. This function provides accurate logging of who accessed or changed content, and when, for accountability. An audit trail is the step-by-step record by which data can be traced to its source.
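An audit trail is, at bottom, an append-only log. The sketch below shows the who/what/when logging described above; the field names are illustrative, not a standard schema.

```python
from datetime import datetime, timezone

class AuditTrail:
    """Append-only log of who touched which content, and when."""
    def __init__(self):
        self.entries = []

    def record(self, user, doc_id, action):
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "doc": doc_id,
            "action": action,
        })

    def history(self, doc_id):
        """Step-by-step record tracing a document's access back in time."""
        return [e for e in self.entries if e["doc"] == doc_id]

trail = AuditTrail()
trail.record("alice", "loan-7", "viewed")
trail.record("bob", "loan-7", "modified")
trail.record("alice", "memo-2", "viewed")
```

A production audit trail would additionally be tamper-evident (e.g. write-once storage or hash chaining), since a log that can be silently edited proves nothing.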

Defining the Manage Module

The Manage module comprises tools and techniques for moving content around an organization and monitoring those tools' performance; it exists for the management, processing, and use of information. A closed ECM system should provide manage components just once, as services, for any “Manage” solutions such as Document Management, Collaboration, Web Content Management, Records Management, and Business Process Management/Workflow. To link the various “Manage” components, they will have standardized interfaces and secure transaction processes for inter-component communication. The solution must be a component of, and include, a complete and integrated product suite at the module level as defined. This module has an initial scope of the following sub-categories, with additions to come as the document develops. Security requirements: to ensure the technology supports secure access that meets Citigroup business needs, the solution must also be assessed with respect to how it supports end-to-end security as related to user authentication, document authentication, and secure network transactions over the Internet, Intranet, and Extranet as necessary. The complexity and scope of an organization's security issues, especially when dealing with distribution, will require the collaboration of multiple sectors, segments, and lines-of-business disciplines, including legal, business operations, system administration, network administration, vendors, and external users of the system. For more information on security-related requirements, organizations should review ISO 17799, Information technology -- Security techniques -- Code of practice for information security management.

Document Management (DM); DM is a technology that has evolved over the past two decades from a basic "electronic filing cabinet" that stored scanned documents and images in electronic form on a server and could capture, index, and retrieve them for future use. With technological advances, we now have the tools to properly manage both paper documents and electronic files. DM stores and indexes voice recordings, faxes, videos, pictures, drawings, computer output, and many other types of paper and electronic files. DM has the capability to manage and track the location of, and relationships among, content within a repository. DM leverages all of the services identified in section "Library Services" to manage content through the ILcP. The Content Management domain team defines DM as:
DM software technologies control and organize documents throughout an enterprise, usually incorporating document and content capture, workflow, document repositories, COLD/ERM and output systems, and information retrieval systems.

System of record; A system of record is an information storage system that is the authoritative data source for a given data element or piece of information. The need to identify systems of record can become acute in organizations where management information systems have been built by taking output data from multiple source systems, re-processing this data, and then re-presenting the result for a new business use. Where the integrity of the data is vital, a data element must either be linked to, or extracted directly from, its system of record. The integrity and validity of any data set is open to question when there is no traceable connection with a known system of record.

Annotation; annotation is extra information associated with a particular point in a document or other piece of information. In the digital imaging community the term annotation is commonly used for visible metadata superimposed on an image without changing the underlying raster image, such as sticky notes, virtual laser pointers, circles, arrows, and black-outs.

Versioning; revision control or versioning is the management of multiple revisions of the same unit of information. Changes to these documents are identified by incrementing an associated number or letter code, termed the "revision number", "revision level", or simply "revision" and associated historically with the person making the change.

Metadata; all physical data and knowledge from inside and outside an organization, including information about the physical data, technical and business processes, rules and constraints of the data, and structures of the data used by a corporation. The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like. Different views of this system metadata are described below, but beyond that is the recognition that metadata describes all aspects of systems: data, activities, people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules. Fundamentally, then, metadata are "the data that describe the structure and workings of an organization's use of information, and which describe the systems it uses to manage that information." To model metadata is to build an "enterprise model" of the information technology industry itself. Metadata is typically stored for each document; it may, for example, include the date the document was stored and the identity of the user storing it. The DMS may also extract metadata from the document automatically or prompt the user to add it.
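The mix of system-supplied and user-supplied metadata described above can be sketched as a simple capture function. The field names here are invented for illustration; a real DMS defines its own schema.

```python
def capture_metadata(doc_id, content, user, extra=None):
    """Store 'data about data' alongside a document: automatically
    extracted system fields plus any user-supplied descriptive fields."""
    record = {
        "doc_id": doc_id,
        "stored_by": user,                              # who stored it
        "size_bytes": len(content.encode("utf-8")),      # extracted
        "word_count": len(content.split()),              # extracted
    }
    if extra:
        record.update(extra)   # fields the user was prompted to add
    return record

meta = capture_metadata(
    "rpt-2006-12", "Quarterly results summary", "alice",
    extra={"department": "Finance", "retention_years": 7},
)
```

Keeping the extracted and prompted fields in one record is what lets the library services search, route, and retain the document without opening it.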


Collaboration; collaboration tools are sophisticated software applications (collaborative authoring, video conferencing, shared whiteboards, etc.) that allow multiple users to work on the same content in a common environment and organize teamwork. Collaboration management solutions offer the ability to communicate complex information within groups and to customers, to organize thoughts and ideas, and to manage information for meetings, presentations, projects, proposals, research, and contacts. Project management tools automate and simplify task management, while conferencing tools such as screen sharing, instant messaging, and polling enable instant real-time collaboration regardless of where team members are located. Teams can therefore work effectively with resources outside the organization. Collaboration leverages all of the services identified in section "Library Services" to manage content through the ILcP. All final artifacts from this component are moved into the DM component, where they will be permanently controlled until the ILcP is completed. The Content Management domain team defines collaboration as:
Collaboration is the software technology that enables individual users such as employees, customers, and business partners to easily create and maintain teams regardless of geographic location. These technologies facilitate team-based content creation and decision-making. A set of tools allows multiple users to work on the same content in a common environment. The technologies include, but are not limited to: instant messaging, shared whiteboards, collective authoring (BRD, CEP, etc.), and voice and video conferencing.


Web Content Management (WCM); WCM covers the complete lifecycle of site pages, from providing simple tools to create the content, through publishing, and finally to archiving. It also provides the ability to manage the structure of the site, the appearance of the published pages, and the navigation provided to the users. The most common use of a WCM is to manage web content. A wide range of business benefits can be obtained by implementing a WCM environment, including but not limited to: a streamlined authoring process; faster turnaround time for new pages and changes; greater consistency; improved site navigation; increased site flexibility; support for decentralized authoring; increased security; reduced duplication of information; greater capacity for growth; and reduced site maintenance costs. WCM leverages all of the services identified in section "Library Services" to manage content through the ILcP. All final artifacts from this component are moved into the DM component, where they will be permanently controlled until the ILcP is completed.
Software technology that addresses the content creation, modification, review, approval, publishing and discovery processes of Web-based corporate content supporting the business.


A web content management system is essentially a way of separating your visual presentation from actual content — whether that content includes photos, text or product catalogs. This separation allows one to accomplish several key things, including:


Automated Templates; Create standard visual templates that can be automatically applied to new and existing content, creating one central place to change that look across all content on your site.

Creation/Editable Content; The ability to easily create, edit and maintain web site content through a self-servicing model through proper entitlements. Most WCM software includes WYSIWYG editing tools allowing non-technically trained individuals to easily create and edit content.

Publishing; The ability to publish web site content through a self-servicing model with proper entitlements and controls. Once your content is separate from the visual presentation of your site, it usually becomes much easier and quicker to edit and manipulate.

Scalable Feature Sets; Most WCM have plug-ins or modules that can be easily installed to extend your existing site's functionality. For example, if one wanted to add a product catalog or chat functionality to a website, one could easily install a module/plug-in to add that functionality rather than hiring a web developer to hard code that new functionality.


Records Management (RM); What is a “record”? According to the Federal Records Act, a record is “recorded information, regardless of medium or characteristics, made or received by an organization that is evidence of its operations and has value requiring its retention for a specific period of time.” There are also other ways to determine whether an information item should be considered a record.
RM is the application of systematic and scientific controls to recorded information required in the operation of an organization’s business. The systematic control of all organizational records during the various stages of their life cycle: from their creation or receipt, through their processing, distribution, maintenance and use, to their ultimate disposition. The purpose of record management is to promote economies and efficiencies in recordkeeping, to assure that useless records are systematically destroyed while valuable information is protected and maintained in a manner that facilitates its access and use.
RM enables an enterprise to assign a specific life cycle to individual pieces of corporate information from creation, receipt, maintenance, and use to the ultimate disposition of records. A record is not necessarily the same as a document. All documents are potential records, but not vice versa. A record is essential for the business; documents are containers of “working information.” Records are documents with evidentiary value. RM leverages all of the services identified in section “Library Services" to manage content through the ILcP. All final artifacts from this component are moved into the DM component where it will be permanently controlled until the ILcP is completed. The Content management domain team defines RM as:
Records Management (RM) is the technologies which support the professional discipline that is primarily concerned with the management of document-based, document-centric information throughout the life-cycle within systems.


ISO 15489:2001 states that records management includes: setting policies and standards; assigning responsibilities and authorities; establishing and promulgating procedures and guidelines; providing a range of services relating to the management and use of records; designing, implementing and administering specialized systems for managing records; and integrating records management into business systems and processes. Managing physical records involves a variety of diverse disciplines. At the simplest, physical and digital records must be organized and indexed. In more complex environments, records management demands expertise in forensics, history, engineering, and law; it thus becomes a coordination of many experts to build and maintain the system. Records must be identified and authenticated. In a business environment, this is usually a matter of filing business documents and making them available for retrieval. However, in many environments, records must be identified and handled much more carefully. Below are some of the records management components that are required. Currently, managing records at the hardware layer is not the corporate direction and should not be leveraged. Legal acceptance of records: evidentiary issues associated with using electronic imaging systems and optical storage technologies need to be considered based upon local legal guidelines (ISO 12654).

Identifying records; If an item is presented as a record, it must be first examined as to its relevance, and it must be authenticated. Forensic experts may need to examine a document or artifact to determine that it is not a forgery, or if it is genuine, that any damage, alterations, or missing content is documented. In extreme cases, items may be subjected to a microscope, x-ray, radiocarbon dating or chemical analysis to determine their authenticity and prior history. This level of authentication is rare, but requires that special care be taken in the creation and retention of the records of an organization.

Storing records; Records must be stored in such a way that they are both sufficiently accessible and safeguarded against environmental damage. A typical contract or agreement may be stored on ordinary paper in a file cabinet in an office. However, many records file rooms employ specialized environmental controls, including temperature and humidity. Vital records may need to be stored in a disaster-resistant safe or vault to protect against fire, flood, earthquakes, and even war. In extreme cases, the item may require both disaster-proofing and public access, as is the case with the original, signed US Constitution. Civil engineers may even need to be consulted to determine that the file room can effectively withstand the weight of shelves and file cabinets filled with paper; historically, some military vessels were designed to take into account the weight of their operating procedures on paper as part of their ballast equation (modern record-keeping technologies have transferred much of that information to electronic storage). In addition to on-site storage of records, many organizations operate their own off-site records centers or contract with commercial records centers. Users and systems designers should consult the organization's established retention requirements set forth in its Records Management Policies and Procedures. The system being implemented should be able to retrieve the information throughout the required document life cycle. The storage media and its life-expectancy rating must be considered, hardware and software obsolescence issues must be evaluated, and a sound migration strategy must be developed to ensure access. Organizations that do not have current retention requirements should consider developing these documents.
These documents enable organizations to manage existing records, and provide a mechanism to automate when documents are to be archived, for how long, and what action to take after the retention period has passed, along with numerous other organizational advantages from a management perspective. Two international standards providing information on retention requirements should be reviewed: ISO 15489-1:2001, Information and documentation -- Records management -- Part 1: General, and ISO 15489-2, Information and documentation -- Records management -- Part 2: Guidelines. These standards address the records management perspectives of EDMS technologies.

Circulating records; Records are stored because they may need to be retrieved at some point. Retrieving, tracking the record while it is away from the file room, and then returning the record, is referred to as circulation. At its simplest, circulation is handled by manual methods such as simply writing down who has a particular record, and when they should return it. However, most modern records environments use a computerized records management system that includes the ability to employ bar code scanners for better accuracy, or radio-frequency identification technology (RFID) to track movement of the records from office to office, or even out of the office. Bar code and RFID scanners can also be used for periodic auditing to ensure that unauthorized movement of the record is tracked.
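The check-out/return cycle of circulation, whether the scans come from bar codes or RFID, reduces to tracking which records are away from the file room. A minimal sketch, with invented names:

```python
class CirculationLog:
    """Tracks records away from the file room, e.g. via bar code scans."""
    def __init__(self):
        self.out = {}       # record_id -> current borrower
        self.scans = []     # full scan history: (record_id, event, who)

    def scan_out(self, record_id, who):
        if record_id in self.out:
            raise ValueError(f"{record_id} already out with {self.out[record_id]}")
        self.out[record_id] = who
        self.scans.append((record_id, "out", who))

    def scan_in(self, record_id):
        who = self.out.pop(record_id, None)
        self.scans.append((record_id, "in", who))

    def unreturned(self):
        """What a periodic audit would flag: records still in circulation."""
        return dict(self.out)

log = CirculationLog()
log.scan_out("box-114", "legal")
log.scan_out("box-207", "hr")
log.scan_in("box-114")
```

The `unreturned` view is exactly what the periodic audits mentioned above would check against the physical shelves.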

Disposition of records; Disposition of records does not always mean destruction. Disposition can also include transfer of records to a historical archive, to a museum, or even to a private party. When physical records are destroyed, the destruction must be authorized by law, statute, regulation, and operating procedure. Once approved, the record must be disposed of with care to avoid inadvertent disclosure of information to unauthorized parties. The process to dispose of records needs to be well documented, starting with a records retention schedule and policies and procedures that have been approved at the highest level of an organization. An inventory of the types of records that have been disposed of must be maintained, including certification that the records have been destroyed. Records should never simply be discarded as ordinary refuse; most organizations use some form of records destruction, such as pulverization, paper shredding, or incineration. Disposing of information from databases and storage systems needs to meet specific legal requirements (ISO 12037). Note that expunging information must follow specific legal rules, which do not necessarily require that documents be permanently deleted; they can instead require that access to the documents be permanently removed. Advice from legal counsel should be requested to determine whether permanent removal of access to documents would meet disposition requirements.
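The link between an approved retention schedule and the disposition action (destroy, transfer, or otherwise) can be sketched as a simple lookup. Every field and period here is invented for illustration; real schedules come from the organization's approved policies, not code.

```python
from datetime import date

def disposition_due(records, schedule, today):
    """Return records whose retention period has lapsed, paired with the
    action prescribed by the retention schedule for their record series."""
    due = []
    for rec in records:
        rule = schedule[rec["series"]]
        age_years = (today - rec["created"]).days / 365.25
        if age_years >= rule["retain_years"]:
            due.append((rec["id"], rule["action"]))
    return due

# hypothetical schedule: disposition is not always destruction
schedule = {
    "invoices": {"retain_years": 7, "action": "destroy"},
    "charters": {"retain_years": 2, "action": "transfer_to_archive"},
}
records = [
    {"id": "inv-1", "series": "invoices", "created": date(1998, 3, 1)},
    {"id": "inv-2", "series": "invoices", "created": date(2005, 6, 1)},
    {"id": "ch-1",  "series": "charters", "created": date(2003, 1, 1)},
]
due = disposition_due(records, schedule, today=date(2006, 12, 17))
```

The output would then feed the documented, certified destruction (or transfer) process described above, never an automatic delete.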


Business Process Management (BPM)/Workflow; a business process is any activity within a company involved in the development of a product or service for a customer. BPM is a general term that encompasses any means of identifying, documenting, monitoring, improving, and automating a business process through information technology. Smoothly running business processes are critical to maximizing the value provided to customers, and managing processes well is critical to business success. Specifically, a well-implemented BPM strategy can provide benefits such as automation of business processes, in whole or in part, where documents, information, or tasks are passed from one participant to another for action according to a set of rules. A business process is a logically related set of workflows, work-steps, and tasks that provide a product or service to customers. BPM is a mix of process-management workflow and application integration technology. BPM/Workflow leverages all of the services identified in section "Library Services" to manage content through the ILcP. All final artifacts from this component are moved into the DM component, where they will be permanently controlled until the ILcP is completed.
Business Process Management (BPM) is a natural and holistic management approach to operating a business that aids in producing a highly efficient, agile, transparent, innovative, and adaptive model for an organization, far exceeding what is achievable through traditional management approaches.

The activities which constitute business process management can be grouped into three categories: design, execution and monitoring.
Process design; This encompasses either the design of new processes or the capture of existing ones. In addition, the processes may be simulated in order to test them. The software support for these activities consists of graphical editors to document the processes and repositories to store the process models. An emphasis on getting the design of the process right will logically lead to better results, as the flow-on effect of problems at the design stage affects a large number of parts in an integrated system. Evolution of business processes requires changes to the process design to flow through into the live system. Integration of business processes is also a current research area. Integrating the software used for process design, so that it serves both for creating graphical representations of workflows and for implementing and maintaining those workflows, makes the evolution of business processes less stressful, given that requirements are not as static as information systems.

Process Execution; the traditional way to achieve the automatic execution of processes is that an application is developed or purchased which executes the steps required. However, in practice, these applications only execute a portion of the overall process. Execution of a complete business process can also be achieved by using a patchwork of interfacing software with human intervention needed where applications are not able to automatically interface. In addition, certain process steps can only be accomplished with human intervention (for example, deciding on a major credit application). Due to the complexity that this approach engenders, changing a process is costly and an overview of the processes and their state is difficult to obtain. As a response to these problems, the Business Process Management System (BPMS) category of software has evolved. BPMS allows the full business process (as developed in the process design activity) to be defined in a computer language which can be directly executed by the computer (see Business Process Management standards). The BPMS will either use services in connected applications to perform business operations (e.g. calculating a repayment plan for a loan) or will send messages to human workers requesting they perform certain tasks which necessitate a human attribute such as intuition as opposed to automated processes. As the process definition is directly executable, changes in the process can be (in comparison to the traditional approach of application development or maintenance) relatively quickly moved into operation. In order to work effectively a BPMS often requires that the underlying software is constructed according to the principles of a service-oriented architecture. Thus, it is often difficult to make a suite of existing legacy systems fit with a BPMS. 
The commercial BPMS software market has focused on graphical process model development, rather than text-language based process models, as a means to reduce the complexity of model development. Visual programming using graphical metaphors has increased productivity in a number of areas of computing and is well accepted by users. Business rules are a growing area of importance in BPMS as these rules provide governing behavior to the BPMS, and a business rule engine can be used to drive process execution and resolution.
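The core BPMS idea described above, a directly executable process definition that mixes automated service steps with human tasks, can be sketched in miniature. This toy walks a declarative step list; it is not BPEL, BPMN, or any standard notation, and every name is invented.

```python
def execute_process(definition, case, services, human_queue):
    """Walk a declarative process definition. Automated steps call a
    connected service; human steps are queued for a worker and the
    process pauses until someone acts."""
    for step in definition:
        if step["type"] == "service":
            case = services[step["name"]](case)
        else:  # human task, e.g. deciding on a major credit application
            human_queue.append((step["name"], case))
            return case, "waiting"
    return case, "complete"

# hypothetical loan process: one automated step, one human decision
loan_process = [
    {"type": "service", "name": "score_applicant"},
    {"type": "human",   "name": "approve_major_credit"},
]
services = {"score_applicant": lambda c: {**c, "score": 710}}
queue = []
case, status = execute_process(loan_process, {"applicant": "bob"}, services, queue)
```

Because the definition is data rather than compiled code, changing the process means editing the step list, which is the agility argument made above for BPMS over traditional application maintenance.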

Process Monitoring; This encompasses the tracking of individual processes, so that information on their state can be easily seen, and the provision of statistics on the performance of one or more processes. An example of tracking is being able to determine the state of a customer order (e.g. order arrived, awaiting delivery, invoice paid) so that problems in its operation can be identified and corrected. In addition, this information can be used to work with customers and suppliers to improve their connected processes. Examples of the statistics are measures of how quickly a customer order is processed, how many orders were processed in the last month, etc. These measures tend to fit into three categories: cycle time, defect rate, and productivity. Although such functions may be within the scope of current applications, the use of a BPMS is expected to ease the development of such reporting. Manufacturers of BPMSs will often offer process monitoring software as well as MIS and execution. With so many tools and programming languages available to build or customize such automation, it can be a real challenge to understand and automate the complex business processes of an organization.
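The cycle-time category of measures mentioned above reduces to simple arithmetic over order timestamps. A sketch with invented data (day numbers stand in for dates to keep it short):

```python
def cycle_time_stats(orders):
    """Compute cycle-time measures: how long each order took from
    receipt to completion, in days, plus simple aggregates."""
    times = [o["closed"] - o["opened"] for o in orders]
    return {
        "count": len(times),
        "avg_days": sum(times) / len(times),
        "max_days": max(times),
    }

orders = [
    {"opened": 1, "closed": 4},    # 3 days
    {"opened": 2, "closed": 10},   # 8 days
    {"opened": 5, "closed": 7},    # 2 days
]
stats = cycle_time_stats(orders)
```

Defect rate and productivity measures follow the same pattern: classify each case, then aggregate over a reporting window.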

Defining the Capture Module


The Capture module comprises how business content, whether paper or electronic, is transported into a content repository for reuse, distribution, and storage. It contains functionalities and components for generating, capturing, preparing, and processing analog and electronic information. There are several levels and technologies, from simple information capture to complex information preparation using automatic classification. Capture components are often also called Input components. Manual capture can involve all forms of information, from paper documents to electronic office documents, e-mails, forms, multimedia objects, digitized speech and video, and microfilm. Automatic or semi-automatic capture can use EDI or eXtensible Markup Language (XML) documents, business and ERP applications, or existing specialist application systems as sources. The first step in managing content is getting it into the IT infrastructure. While internal company documents are born, and remain, digital throughout their lifecycle, organizations still need to ingest a tremendous amount of paper-based communication from outside. Checks, invoices, application forms, customer letters, etc., all need to be turned into digital form for insertion into the content management electronic workflow and/or directly into storage. Capturing documents puts them in motion and enables the information contained in those documents to be acted upon. While organizations predominantly refer to capture as the digitizing of paper documents, capture can also mean ingesting electronic files into your content management environment.


Recognition Technologies; Organizations have always needed, and will continue to need, tools that decipher data from the abstract and the seemingly un-abstractable; this is where recognition technologies lend a hand. Below are a few of the 'solidified' and 'fluid' functions within the model. With the advancement of voice and television over IP, future recognition engines will need to be designed and developed to extract information from these new forms of digital (content) media. For the purpose of the domain, the definition is:

The sensing and encoding of printed or written data by a machine and is a process that occurs in thinking when some event, process, pattern, or object recurs. Thus in order for something to be recognized, it must be familiar. This recurrence allows the recognizer to more properly react, and has survival value.

Optical/Intelligent Character recognition (OCR/ICR); Optical/Intelligent Character Recognition (OCR/ICR) engines can achieve high recognition rates equal to human accuracy when documents are properly designed, printed, and controlled. Optical character recognition, usually abbreviated to OCR, is computer software designed to translate images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text, or to translate pictures of characters into a standard encoding scheme representing them (e.g. ASCII or Unicode). OCR began as a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the optical character recognition term has now been broadened to cover digital character recognition as well. Early systems required "training" (essentially, the provision of known samples of each character) to read a specific font. Currently, though, "intelligent" systems that can recognize most fonts with a high degree of accuracy are now common. Some systems are even capable of reproducing formatted output that closely approximates the original scanned page including images, columns and other non-textual components.
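The early "training" approach mentioned above, providing known samples of each character and matching against them, can be illustrated with a tiny nearest-template recognizer over 3x3 binary glyphs. This is a pedagogical toy; real OCR engines use far richer features and statistical models.

```python
def recognize_glyph(glyph, templates):
    """Nearest-template character recognition: compare a binarized glyph
    against known samples and pick the closest match by pixel disagreement."""
    def distance(a, b):
        # count of pixels that differ between two same-sized bitmaps
        return sum(p != q
                   for row_a, row_b in zip(a, b)
                   for p, q in zip(row_a, row_b))
    return min(templates, key=lambda ch: distance(glyph, templates[ch]))

# hypothetical trained samples, one per character ('1' = ink)
templates = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
}
noisy_l = ["100", "100", "110"]   # an "L" with one pixel missing
char = recognize_glyph(noisy_l, templates)
```

The limitation that motivated "intelligent" systems is visible here: a template set trained on one font fails on another, whereas modern ICR generalizes across fonts.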

Optical Mark Recognition (OMR); Optical Mark Recognition (OMR) is the process of capturing data by contrasting reflectivity at predetermined positions on a page. By shining a beam of light onto the document the capture device is able to detect a marked area because it is more reflective than an unmarked surface. Some OMR devices use forms which are preprinted onto paper and measure the amount of light which passes through the paper, thus a mark on either side of the paper will reduce the amount of light passing through the paper. It is generally distinguished from optical character recognition by the fact that a recognition engine is not required. That is, the marks are constructed in such a way that there is little chance of not reading the marks correctly. This requires the image to have high contrast and an easily-recognizable or irrelevant shape. Recent improvements in OMR have led to various kinds of two dimensional bar codes called matrix codes. For example, United Parcel Service (UPS) now prints a two dimensional bar code on every package. The code is stored in a grid of black-and-white hexagons surrounding a bullseye-shaped finder pattern. These images include error-checking data, allowing for extremely accurate scanning even when the pattern is damaged. Most of today's OMR applications work from mechanically generated images like bar codes. A smaller but still significant number of applications involve people filling in specialized forms. These forms are optimized for computer scanning, with careful registration in the printing, and careful design so that ambiguity is reduced to the minimum possible.
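The contrasting-reflectivity idea at the heart of OMR is just a threshold test at predetermined positions. The sketch below uses made-up sensor readings in [0, 1], where low values mean dark (marked) areas reflecting little light.

```python
def read_marks(reflectivity, positions, threshold=0.5):
    """Decide, for each predetermined position on the page, whether a
    mark is present: a filled bubble reflects less light than blank paper."""
    return {pos: reflectivity[pos] < threshold for pos in positions}

# illustrative readings at four answer positions on a test form
page = {"Q1.A": 0.12, "Q1.B": 0.91, "Q2.A": 0.88, "Q2.B": 0.07}
marks = read_marks(page, ["Q1.A", "Q1.B", "Q2.A", "Q2.B"])
```

This is why OMR needs no recognition engine in the OCR sense: the form design guarantees high contrast at known positions, so a single threshold decides each mark.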

Speech and Translation Recognition (STR); Speech and translation recognition is the process of converting a speech signal, either analog or digital, into a sequence of words appropriate to the receiving audience, by means of an algorithm implemented as a computer program. Speech and translation recognition applications that have emerged in recent years include voice dialing, call routing, simple data entry, and preparation of structured documents. Voice verification, or speaker recognition, is a related process that attempts to identify the person speaking, as opposed to what is being said.
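
A classic building block of early recognizers of this kind (voice dialing in particular) is template matching with dynamic time warping (DTW), which tolerates differences in speaking rate. The 1-D "feature" sequences below stand in for real acoustic frames and are invented for illustration.

```python
# Toy illustration of template matching for speech recognition using
# dynamic time warping (DTW), the classic technique behind early
# voice-dialing systems. Real systems compare acoustic feature frames
# (e.g. MFCCs); the 1-D "feature" sequences here are invented.

def dtw(a, b):
    """Minimal DTW distance between two feature sequences."""
    INF = float("inf")
    n, m = len(a), len(b)
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch utterance
                                 cost[i][j - 1],      # stretch template
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

def recognize(utterance, templates):
    """Return the word whose template is closest under DTW."""
    return min(templates, key=lambda w: dtw(utterance, templates[w]))

templates = {"yes": [1, 3, 5, 3, 1], "no": [5, 5, 1, 1, 1]}
# The same "yes" spoken more slowly (stretched in time).
spoken = [1, 1, 3, 3, 5, 5, 3, 1]
print(recognize(spoken, templates))  # yes
```

The warping is the point: a plain frame-by-frame comparison would fail here because the slow utterance is longer than the template, while DTW aligns them at zero cost.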

Intelligent Handwriting Recognition (IHR); Intelligent handwriting recognition is the ability of a computer to receive intelligible handwritten input. The image of the written text may be sensed "off line" from a piece of paper by optical scanning. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface. Handwriting recognition principally entails optical character recognition. However, a complete handwriting recognition system also handles formatting, performs correct segmentation into characters and finds the most plausible letters and words. Hand-printed digits can be modeled as splines that are governed by about 8 control points. For each known digit, the control points have preferred 'home' locations, and deformations of the digit are generated by moving the control points away from their home locations. Images of digits can be produced by placing Gaussian ink generators uniformly along the spline. Real images can be recognized by finding the digit model most likely to have generated the data. For each digit model we use an elastic matching algorithm to minimize an energy function that includes both the deformation energy of the digit model and the log probability that the model would generate the inked pixels in the image. The model with the lowest total energy wins. If a uniform noise process is included in the model of image generation, some of the inked pixels can be rejected as noise as a digit model is fitting a poorly segmented image. The digit models learn by modifying the home locations of the control points.
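
The elastic-matching idea above can be sketched with plain point models in place of splines: fitting deforms each model toward the inked pixels, and the model with the lowest deformation-plus-misfit energy wins. The two digit models, the energy weight, and the single relaxation step are all simplifications invented for illustration.

```python
# Simplified sketch of elastic matching: each "digit model" is a set of
# control points with home locations; fitting moves the points toward the
# inked pixels, and the total energy is deformation energy plus data
# misfit. The point models and weights below are invented for illustration.

LAMBDA = 0.5  # weight of deformation energy vs. data misfit

MODELS = {  # home locations of control points, in (x, y) image units
    "1": [(2.0, 0.0), (2.0, 2.0), (2.0, 4.0)],              # vertical stroke
    "7": [(0.0, 0.0), (4.0, 0.0), (2.0, 2.0), (0.0, 4.0)],  # bar + diagonal
}

def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def fit_energy(homes, pixels):
    """One relaxation step: move each control point to the centroid of the
    pixels nearest to it, then return deformation + misfit energy."""
    points = list(homes)
    assigned = {i: [] for i in range(len(points))}
    for px in pixels:
        i = min(assigned, key=lambda i: sq_dist(points[i], px))
        assigned[i].append(px)
    for i, pxs in assigned.items():
        if pxs:  # move control point to centroid of its pixels
            points[i] = (sum(p[0] for p in pxs) / len(pxs),
                         sum(p[1] for p in pxs) / len(pxs))
    deformation = sum(sq_dist(h, p) for h, p in zip(homes, points))
    misfit = sum(min(sq_dist(p, px) for p in points) for px in pixels)
    return LAMBDA * deformation + misfit

def classify(pixels):
    """The model with the lowest total energy wins."""
    return min(MODELS, key=lambda d: fit_energy(MODELS[d], pixels))

# Inked pixels of a slightly wobbly vertical stroke.
stroke = [(2.1, 0.0), (1.9, 1.0), (2.0, 2.0), (2.1, 3.0), (2.0, 4.0)]
print(classify(stroke))  # 1
```

The "1" model wins because its control points barely need to move (low deformation) and still sit close to every inked pixel (low misfit), whereas the "7" model must deform heavily and still leaves pixels unexplained.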

Intelligent Pattern Recognition (IPR); Pattern recognition is a field within the area of machine learning. Alternatively, it can be defined as: "The act of taking in raw data and taking an action based on the category of the data.” As such, it is a collection of methods for supervised learning. Pattern recognition aims to classify data (patterns) based on either a priori knowledge or on statistical information extracted from the patterns. The patterns to be classified are usually groups of measurements or observations, defining points in an appropriate multidimensional space. A complete pattern recognition system consists of a sensor that gathers the observations to be classified or described; a feature extraction mechanism that computes numeric or symbolic information from the observations; and a classification or description scheme that does the actual job of classifying or describing observations, relying on the extracted features. The classification or description scheme is usually based on the availability of a set of patterns that have already been classified or described. This set of patterns is termed the training set and the resulting learning strategy is characterized as supervised learning. Learning can also be unsupervised, in the sense that the system is not given an a priori labeling of patterns, instead it establishes the classes itself based on the statistical regularities of the patterns. The classification or description scheme usually uses one of the following approaches: statistical (or decision theoretic), syntactic (or structural). Statistical pattern recognition is based on statistical characterizations of patterns, assuming that the patterns are generated by a probabilistic system. Structural pattern recognition is based on the structural interrelationships of features. A wide range of algorithms can be applied for pattern recognition, from very simple Bayesian classifiers to much more powerful neural networks. 
Holographic associative memory is another type of pattern-matching scheme, in which small target patterns can be retrieved from a large set of learned patterns based on cognitive meta-weight. Typical applications are automatic speech recognition, classification of text into several categories (e.g., spam/non-spam email messages), the automatic recognition of handwritten postal codes on envelopes, and the automatic recognition of images of human faces.
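
A minimal statistical classifier of the kind mentioned above is naive Bayes applied to the spam/non-spam example. The training counts and priors below are invented for illustration; a real system would estimate them from a labeled training set (supervised learning).

```python
# Minimal statistical pattern recognizer: a naive Bayes text classifier
# of the "spam / non-spam" kind mentioned above. The training counts
# are invented for illustration; real systems estimate them from a
# labeled training set (supervised learning).
import math

# word -> count per class, learned from a (hypothetical) training set
COUNTS = {
    "spam": {"free": 20, "winner": 15, "meeting": 1, "report": 1},
    "ham":  {"free": 2,  "winner": 1,  "meeting": 20, "report": 15},
}
PRIORS = {"spam": 0.4, "ham": 0.6}

def log_posterior(cls, words):
    total = sum(COUNTS[cls].values())
    vocab = len(COUNTS[cls])
    score = math.log(PRIORS[cls])
    for w in words:
        # Laplace smoothing so unseen words do not zero out the class
        score += math.log((COUNTS[cls].get(w, 0) + 1) / (total + vocab))
    return score

def classify(message):
    words = message.lower().split()
    return max(PRIORS, key=lambda cls: log_posterior(cls, words))

print(classify("free winner winner"))        # spam
print(classify("quarterly report meeting"))  # ham
```

This is the "statistical (or decision theoretic)" approach from the taxonomy above: the pattern is assumed to come from a probabilistic source, and classification picks the class with the highest posterior probability.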

Magnetic Ink Character Recognition (MICR); Magnetic Ink Character Recognition was developed to permit the efficient processing of checks. Checks carry a single line of numeric data, and the font used (E-13B or CMC7) is highly stylized and printed with special magnetic ink. Check and remittance readers are equipped with a magnetic read head that analyzes the waveform produced by the stylized number font and interprets the characters with high accuracy.

Intelligent Document Recognition (IDR); Intelligent Document Recognition (IDR) technologies were originally developed for invoice processing and the electronic mailroom. IDR uses techniques from each of the above areas while eliminating their limitations: it is no longer necessary to know what the form layout looks like, to insert separator sheets, or to presort documents. Specific rules make the data understandable; IDR can determine a document's category and apply the appropriate business rules. IDR, also called intelligent data capture, works much more like a human reader, relying on training and an internal knowledge of the layout and content of generic form types, which it uses to understand and extract required information and initiate workflows to act on the content. That widens the types of forms that can be captured and reduces costs, but IDR also changes capture capabilities substantially, into a series of tools able to interpret and extract data from all sorts of unstructured information. This functionality expands capture's reach and can provide the front-end understanding needed to feed business process management (BPM) and business intelligence (BI) frameworks with information previously not easily accessible or available. The information can be input as scanned paper or as formatted documents, whether data-centric, such as Word or normal PDF, or image-based. IDR typically combines multiple methods, including pattern recognition, OCR, and other recognition and search engines, to locate and extract required information before applying business rules to it. IDR provides the ability to make sense of, and help manage, the unstructured, untagged information coming into a corporation or organization.
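
The locate-extract-apply-rules flow can be sketched as follows. The field patterns and the approval threshold are invented for illustration, and real IDR products use far richer recognition engines than regular expressions.

```python
# Sketch of the IDR idea: locate fields in unstructured text with
# pattern matching, then apply a business rule to the extracted data.
# The field patterns and the rule threshold are invented for illustration.
import re

PATTERNS = {
    "invoice_no": re.compile(r"Invoice\s*(?:No\.?|#)\s*(\w+)", re.I),
    "total":      re.compile(r"Total\s*(?:Due)?:?\s*\$?([\d,]+\.\d{2})", re.I),
}

def extract(text):
    """Pull known fields out of an unstructured document."""
    fields = {}
    for name, pat in PATTERNS.items():
        m = pat.search(text)
        if m:
            fields[name] = m.group(1)
    return fields

def route(fields):
    """Business rule: large invoices need manual approval."""
    if "total" not in fields:
        return "exception-queue"
    amount = float(fields["total"].replace(",", ""))
    return "manual-approval" if amount > 10000 else "auto-post"

letter = """Dear vendor,
Please find attached Invoice No. A1234.
Total Due: $12,480.00 payable in 30 days."""

fields = extract(letter)
print(fields)         # {'invoice_no': 'A1234', 'total': '12,480.00'}
print(route(fields))  # manual-approval
```

Note that no form layout was declared anywhere: the letter above is free text, and the routing decision falls out of the extracted data plus a business rule, which is the point of IDR over template-bound forms processing.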


Document Imaging; Document Imaging is the process of capturing and pre-indexing documents and transmitting them into a "System of Record" repository as digitized image files, regardless of original format, using micrographics and/or electronic imaging. It leverages all types of recognition technologies in both central and decentralized environments.

Indexing; in common parlance, indexing refers to the manual assignment of index attributes, stored in the database of a "manage" component for administration and access.
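
A minimal sketch of how assigned index attributes support later retrieval; the attribute names and the in-memory store stand in for the manage component's database and are invented for illustration.

```python
# Sketch of manual indexing: attributes assigned at capture time are
# stored in the "manage" component's database and used for retrieval.
# The attribute names and in-memory store are invented for illustration.

index = []  # stand-in for the manage component's index database

def index_document(doc_id, **attributes):
    index.append({"doc_id": doc_id, **attributes})

def find(**criteria):
    return [e["doc_id"] for e in index
            if all(e.get(k) == v for k, v in criteria.items())]

index_document("doc-001", doc_type="invoice", customer="ACME", year=2006)
index_document("doc-002", doc_type="contract", customer="ACME", year=2005)
index_document("doc-003", doc_type="invoice", customer="Globex", year=2006)

print(find(doc_type="invoice", customer="ACME"))  # ['doc-001']
```

The quality of retrieval is only as good as the attributes assigned at capture time, which is why indexing discipline matters as much as the repository itself.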

Web Forms/Forms Processing; Web Forms and Forms Processing describe how forms are designed, managed, and processed completely in an electronic environment. Forms processing is a specialized imaging application designed for handling preprinted forms. Forms processing systems often use high-end (or multiple) OCR engines and elaborate data-validation routines to extract handwritten or poor-quality print from forms into a database. This type of imaging application faces major challenges, since many of the documents scanned were never designed for imaging or OCR.
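
The data-validation routines mentioned above can be sketched as per-field rules applied to recognized values before they are committed to the database; the field names and rules below are invented for illustration.

```python
# Sketch of the data-validation step in forms processing: values read by
# the OCR engines are checked before being committed to the database.
# The field rules below are invented for illustration.
import re

RULES = {
    "ssn":    lambda v: re.fullmatch(r"\d{3}-\d{2}-\d{4}", v) is not None,
    "date":   lambda v: re.fullmatch(r"\d{2}/\d{2}/\d{4}", v) is not None,
    "amount": lambda v: re.fullmatch(r"\d+\.\d{2}", v) is not None,
}

def validate(record):
    """Return the list of fields that failed validation."""
    return [f for f, rule in RULES.items() if not rule(record.get(f, ""))]

good = {"ssn": "123-45-6789", "date": "12/17/2006", "amount": "42.50"}
bad  = {"ssn": "123-45-678",  "date": "12/17/2006", "amount": "42.5"}

print(validate(good))  # []
print(validate(bad))   # ['ssn', 'amount']
```

Failed fields would typically be routed to an operator for key-from-image correction rather than rejected outright, which is how these systems cope with poor-quality print.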

Computer Output to Laser Disk (COLD)/Enterprise Report Management (ERM); COLD/ERM is the means by which documents and indexes are captured from computer output (primarily reports). Once stored, the reports can be retrieved, viewed, printed, faxed, or distributed over the Internet; the technology is often used for Internet billing applications. Enterprise Report Management was formerly known as Computer Output to Laser Disk (COLD) and is today often written as COLD/ERM. This technology electronically stores, manages, and distributes documents that are generated in a digital format and whose output data are report-formatted or print-stream originated. Unfortunately, documents that are candidates for this technology too often are printed to paper or microform for distribution and storage purposes. This is mostly aging terminology which over time will be assimilated into the "Document Management" functional model.
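
The ingestion step can be sketched as splitting a report-formatted print stream into pages and indexing each page by a value found in its header; the form-feed page breaks and "Account:" header line are invented for illustration.

```python
# Sketch of COLD/ERM ingestion: a report-formatted print stream is split
# into pages, each page is indexed by an account number found in its
# header, and pages can then be retrieved individually. The report
# layout (form-feed page breaks, "Account:" header) is invented.

FF = "\f"  # form-feed character separating pages in the print stream

report = (
    "Account: 1001\nStatement for Jan\nBalance: 250.00\n" + FF +
    "Account: 1002\nStatement for Jan\nBalance: 120.00\n"
)

def ingest(stream):
    """Split the print stream into pages indexed by account number."""
    store = {}
    for page in stream.split(FF):
        for line in page.splitlines():
            if line.startswith("Account:"):
                store[line.split(":")[1].strip()] = page
                break
    return store

archive = ingest(report)
print(sorted(archive))                       # ['1001', '1002']
print("Balance: 120.00" in archive["1002"])  # True
```

Indexing at ingestion is what lets a single statement be retrieved and re-presented on demand instead of reprinting the whole report to paper or microform.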

Aggregation; Aggregation is the ability to automate the collection of units or parts into a whole, addressing the creation of documents, in concert with critical business processes, from different creation and authoring tools and other systems, producing a single work effort. It can transform any document into a compelling, personalized communication that improves customer satisfaction and reduces operating costs, and it can assemble content from multiple sources into a single, complete, customized document package that presents a consistent company image across multiple communication channels.

Categorization; categorization is the process in which objects (digital images) are recognized, differentiated and understood. Categorization implies that objects are grouped into categories, usually for some specific purpose. Ideally, a category illuminates a relationship between the subjects and objects of knowledge. Categorization is fundamental in prediction, inference, decision making and in all kinds of interaction with the environment. There are, however, different ways of approaching categorization. Categorization tasks in which category labels are provided to the learner for certain objects are referred to as supervised classification, supervised learning, or concept learning. Categorization tasks in which no labels are supplied are referred to as unsupervised classification, unsupervised learning, or data clustering. The task of supervised classification involves extracting information from the labeled examples that allows accurate prediction of class labels of future examples. This may involve the abstraction of a rule or concept relating observed object features to category labels, or it may not involve abstraction. The task of clustering involves recognizing inherent structure in a data set and grouping objects together by similarity into classes. It is thus a process of generating a classification structure. In conceptual clustering, this involves also generating a rule or description for each generated category.
Categorizing is the process of organizing documents, web pages, and all other unstructured content by putting each item into logical groupings based on its category or classification.
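
The clustering side of categorization can be sketched with a tiny k-means that groups items by similarity without any labels supplied; the one-dimensional features (word counts) and the choice of two clusters are invented for illustration.

```python
# Sketch of unsupervised categorization (data clustering): a tiny k-means
# that groups items by similarity with no labels supplied. The feature
# values (document word counts) and k=2 are invented for illustration.

def kmeans(points, centroids, iterations=10):
    """Assign each point to its nearest centroid, then re-estimate."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Word counts of six documents: two natural groups (memos vs. reports).
points = [120, 130, 110, 2050, 1990, 2100]
centroids, clusters = kmeans(points, centroids=[100.0, 2000.0])
print(clusters)  # [[120, 130, 110], [2050, 1990, 2100]]
```

No category labels were provided; the two groups emerge from the statistical regularities of the data, which is exactly the distinction drawn above between supervised classification and clustering.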


Application Created Content; when looking at application-created content, there are two perspectives from which to review it: pre-creation and post-creation. From a pre-creation perspective, the ability to replicate the creation model provides a process that ensures the highest levels of compliance; however, it is the artifacts of that process, and therefore the post-creation actions, that have the highest value to the business. Post-creation content is generally manageable if proper thought is given to the artifact and its final destination. Application-created content needs to be managed from a historical perspective, because of its predominantly customer-facing nature and the need to ensure availability and proof of distribution. Application-created content encompasses content from environments such as Enterprise Resource Planning (ERP), eBilling (billing information sent to customers or partners), financial applications, and XML.

Human Created Content; defining human-created content is necessarily broad, because such content can be unstructured, unformatted, and not validated against any backend systems or repositories. Human-created content leverages office productivity products, such as the product used to view this document, view a spreadsheet, or create a presentation. But let's not forget the tools used most frequently and most successfully: phone conversations and other rich media forms, including streaming audio, video, podcasting, and all OIP (Over Internet Protocol) content such as voice, fax, and TV.

Voice/Fax Over Internet Protocol; Voice over Internet Protocol, also called VoIP, IP Telephony, Internet telephony, Broadband telephony, Broadband Phone, and Voice over Broadband, is the routing of voice conversations over the Internet or through any other IP-based network. Protocols used to carry voice signals over the IP network are commonly referred to as Voice over IP or VoIP protocols. They may be viewed as commercial realizations of the experimental Network Voice Protocol (1973) invented for the ARPANET. VoIP can be cheaper than traditional telephone service providers. Some cost savings are due to utilizing a single network to carry voice and data, especially where users have existing underutilized network capacity they can use for VoIP at no additional cost. VoIP-to-VoIP phone calls on any provider are typically free, while VoIP-to-PSTN calls generally cost the VoIP user. There are two types of PSTN-to-VoIP services: DID (Direct Inward Dialing) and access numbers. DID connects the caller directly to the VoIP user, while access numbers require the caller to input the extension number of the VoIP user. Access numbers are usually charged as a local call to the caller and are free to the VoIP user, while DID usually carries a monthly fee; there are also DIDs that are free to the VoIP user but chargeable to the caller. The ability to extract content from conversations and process it through recognition technologies has the potential to add value in many areas within a line of business.

Rich Media; the term rich media was coined to describe a broad range of digital interactive media. Rich media can be downloadable or may be embedded in a webpage. If downloadable, it can be viewed or used offline with media players such as RealNetworks' RealPlayer, Microsoft's Windows Media Player, or Apple's QuickTime, among others.

Office Documents; in computing, an office suite, sometimes called an office application suite or productivity suite, is a software suite intended to be used by typical clerical and knowledge workers. The components are generally distributed together, have a consistent user interface, and usually can interact with each other, sometimes in ways that the operating system would not normally allow. Most office application suites include at least a word processor and a spreadsheet element. In addition to these, the suite may contain a presentation program, a database tool, a graphics suite, and communications tools. An office suite may also include an email client and a personal information manager or groupware package.

Forms; a form is a document that is commonly used to request information and data. Forms are available in printed or electronic format, the latter being the more versatile as it enables the user to type the requested information using a computer keyboard and to easily distribute the content via the Internet and email.

Central/Decentralized; Central capture is the process of localizing image capture, where objects are centralized for distribution; most "heavy lifting" with recognition technologies is executed centrally. Decentralized capture is the process of dispersing decision-making closer to the point of service or action, where simpler functions exist.

Saturday, December 16, 2006

Defining the ECM High-Level Model

An organization's primary resource for future capital expansion is to define and better understand the content that currently exists within the company unmanaged. My suggestion is to define and establish standard functional components and associate them with technology solutions for any business. Within this blog, the definitions and taxonomies will be established for consumption, which will allow for a consolidated view of Enterprise Content Management (ECM). For the purpose of this document, Content Management (CM) or Enterprise Content Management (ECM) is defined as:
The technologies, tools, and methods used to capture, manage, store, preserve, and deliver content across an enterprise.

Unstructured data, or unstructured information, refers to masses of computerized information that do not have a data structure easily readable by a machine. The most important thing to remember is that content is vital to an organization; ultimately, the greatest value comes from ensuring that proper management techniques and tools are available, understood, and properly integrated. In my opinion, no single function is greater or lesser than any other component within the model; all components are peers, and without this view an incomplete model exists and will not operate successfully. I will not decompose the functional framework here; at a granular level each function has an operational view that creates interdependence across functions. Understanding the core components of an ECM environment provides the baseline for defining vision and direction for the business. The five (5) core modules are: Capture, comprising how business content, whether paper or electronic, is transported into a content repository for reuse, distribution, and storage; Manage, comprising tools and techniques for moving content around an organization and monitoring those tools' performance; Store, determining where the content goes and how it can be located; Preserve, providing long-term archival and storage of an organization's essential content; and Deliver, determining how to get the right content to the right audience on the correct device.


The Capture Module comprises how business content, whether paper or electronic, is transported into a content repository for reuse, distribution, and storage. The Capture Module contains functionalities and components for generating, capturing, preparing, and processing analog and electronic information. There are several levels and technologies, from simple information capture to complex information preparation using automatic classification. Capture components are often also called Input components. Manual capture can involve all forms of information, from paper documents to electronic office documents, e-mails, forms, multimedia objects, digitized speech and video, and microfilm. Automatic or semi-automatic capture can use EDI or eXtensible Markup Language (XML) documents, business and ERP applications, or existing specialist application systems as sources. The first step in managing content is getting it into the IT infrastructure. While internal company documents are born, and remain, digital throughout their lifecycle, organizations still need to ingest a tremendous amount of paper-based communications from outside. Checks, invoices, application forms, customer letters, etc., all need to be turned into digital form for insertion into the content management electronic workflow and/or directly into storage. Capturing documents puts them in motion and enables the information contained in those documents to be acted upon. While Citigroup predominantly refers to capture as the digitizing of paper documents, capture can also mean ingesting electronic files into your content management environment.



The Manage Module is comprised of tools and techniques for moving content around an organization and monitoring those tools' performance. The Manage Module is for the management, processing, and use of information. A closed ECM system should provide Manage components just once, as services, for all "Manage" solutions such as Document Management, Collaboration, Web Content Management, Records Management, and Business Process Management/Workflow. To link the various "Manage" components, they should have standardized interfaces and secure transaction processes for inter-component communication. A solution must be a component of, and include, a complete and integrated product suite at the module level as defined. This module has an initial scope of the following sub-categories and will have additions as the document develops. Security requirements: to ensure the technology supports secure access that meets an organization's business needs, the solution must also be assessed with respect to how it supports end-to-end security as related to user authentication, document authentication, and secure network transactions over the Internet, intranet, and extranet as necessary. The complexity and scope of an organization's security issues, especially when dealing with distribution, will require the collaboration of multiple sectors, segments, and lines of business, and of organizational disciplines including legal, business operations, system administration, network administration, vendors, and external users of the system. For more information on security-related requirements, organizations should review ISO 17799, Information technology -- Security techniques -- Code of practice for information security management.



The Store Module determines where the content goes and how it can be located. The Store Module is used for the temporary or transient storage of information that is not required or desired to be archived. Even if it uses media suitable for long-term archiving, "Store" is still separate from "Preserve." These infrastructure components are sometimes held at the operating-system level, like the file system, and also include security technologies, which are discussed further below in the "Deliver Module" area. However, security technologies, including access control, are super-ordinate components of an ECM solution.



The Preserve Module addresses the long-term archival and storage where organizations' essential content resides. The Preserve Module of ECM handles the long-term, safe storage and backup of static, unchanging information, as well as temporary storage of information that is not desired or required to be archived. Purely for securing information, microfilm is still viable and is now offered in hybrid systems with electronic media and database-supported access. The decisive factor for all long-term storage systems is the timely planning and regular performance of migrations in order to keep information available in the changing technical landscape; this ongoing policy is called Continuous Migration. The Preserve Module contains special viewers, conversion and migration tools, and long-term storage media.



The Deliver Module determines how to get the right content to the right audience on the correct device. The Deliver Module of ECM is used to present information from the Manage, Store, and Preserve modules. It also contains functions used to enter information into systems (such as information transfer to media or generation of formatted output files) or to ready information (for example, by converting or compressing it) for the Store and Preserve components. The functionality of the Deliver Module is also known as "output" and summarized under the term "Output Management." The Deliver Module comprises three groups of functions and media: Transformation Technologies, Security Technologies, and Distribution. Transformation and Security, as services, belong on the middleware level and should be available to all ECM components equally.