DITA was designed to solve problems for businesses and authors. A DITA metrics solution should do the same. Metrics should be consistent across all documentation, relevant to writers and managers, and easy to understand.

What are Metrics?

DITA Metrics are used to measure the health of a DITA documentation suite. There are many reasons why professionals care about metrics; some of the most common are listed below:

  • Technical writers are usually interested in metrics because metrics help them measure content reuse and workload in the documentation they are responsible for.
  • Managers are concerned with the productivity and workload of individual writers, as well as how to increase the efficiency of their team.
  • Upper management is concerned with metrics because they help during budget discussions, for example to prove the ROI of a new content management system.

In addition to these concerns, a DITA Metrics solution must possess certain characteristics:

  • Consistent: Technology changes, documentation grows, new systems are deployed. If the DITA Metrics solution does not give consistent, useful results as the documentation environment changes, what good is it? A metrics solution, for example, must be able to meaningfully compare a documentation suite before and after deploying a new CMS. If you can't measure productivity increases across technologies, the metrics are not useful.
  • Relevant: If a DITA Metrics solution is not relevant to the concerns of employees, managers, and the business, it is not a useful solution. The DITA Metrics must address the concerns listed above, and more.
  • Simple: Employees change careers and new employees must be trained. People may forget certain definitions between meetings. The DITA Metrics solution should be easy to understand and explain.

What to measure?

With these concerns in mind, what form should a DITA Metrics solution take? What should it measure?

What do writers do? What does their workload consist of? Writers manage page layout, publish documents, collaborate with SMEs, and so on. However, most of what writers do is ... writing. They write words.

At Samalander, we base our metrics on the number of words in a set of documentation. We believe that this is a good starting point because the number of words accurately reflects the content that a writer is responsible for. Measurements of workload, content reuse and productivity ultimately come down to the number of words in a set of documentation.
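As an illustration (a minimal sketch, not Samalander's actual implementation), a word count over a single DITA topic can be a few lines of Python. It assumes topics are well-formed XML files and counts whitespace-separated words in the text content, ignoring the markup itself:

    import xml.etree.ElementTree as ET

    def count_words(topic_path):
        # Join all text content of the topic, ignoring the tags themselves.
        root = ET.parse(topic_path).getroot()
        text = " ".join(root.itertext())
        # Count whitespace-separated words.
        return len(text.split())

A production counter would also have to decide how to treat attribute text, code samples, and conditionally filtered content.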

Let's revisit the requirements above:

  • Consistent: The consistency requirement means that we cannot use metrics such as page count or topic count. Pages can be reformatted and topics can be reorganized without changing the underlying content - and therefore without changing the workload on the writer - at all. The number of words, however, remains consistent across time and technologies. For example, when moving from file-system-based documentation into a CMS, the topic structure may change, but the number of words will remain constant.
  • Relevant: Since words are what the writer actually spends most of their time on, measuring words is the best way of measuring a writer's workload. This in turn yields accurate measurements of productivity increases.
  • Simple: When counting words, there are no abstract concepts. It's very close to the actual day-to-day work of the writer, and it's very easy to understand.

Publication and Reuse

An important feature of DITA is content reuse. Much of DITA's power and productivity gain comes from content reuse, and therefore it is important to measure it.

When does reuse actually happen? Individual topics contain no information about where they are used. Reuse is defined by a publication event through a DITAMAP. The DITAMAP defines which topics are included, through topicrefs, and a DITAMAP is also required to resolve @keyref and @conkeyref attributes.

Therefore, if you want to measure reuse, you must consider the DITAMAP and define reuse through publication events. However, the content that is published is different from the content in the entire documentation set - generally only a subset is published. How do we then define the metrics?
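To make the map's role concrete, here is a simplified Python sketch that collects the topics a DITAMAP references. It is only a sketch: it ignores nested maps, key resolution, and conditional processing, all of which a real publication pipeline must handle:

    import xml.etree.ElementTree as ET

    def referenced_topics(map_path):
        # Every topicref with an href names a topic used by this publication.
        # A topic referenced twice appears twice in the result.
        root = ET.parse(map_path).getroot()
        return [ref.get("href") for ref in root.iter("topicref") if ref.get("href")]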

Words-Counted-Once and Reuse

In Samalander DITA Metrics, we use a concept we call "Words-Counted-Once." When a document is published, it may reuse a topic in several places, so the source topic contributes to the total word count of the publication more than once. As mentioned above, this content reuse is an important part of DITA and should be measured, but it is the source topic, not the published content, that contributes to the writer's workload.

"Words-Counted-Once" is simply a measure of how many words appear in the source documentation. The words may appear more than once in the publication, but to measure the writer's workload we must count the words contained in each distinct topic in the source documentation. This measurement is "Words-Counted-Once."

Once we measure the Words-Counted-Once, we have an accurate representation of the total workload on the writer. But productivity increases are accomplished through increases in content reuse. The total number of words in the published content is easily measured, and the reuse percentage then becomes:

Reuse Percent = (1 - (Words-Counted-Once / Total Published Words)) x 100
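For example, with hypothetical numbers: if the distinct source topics contain 40,000 words (Words-Counted-Once) and the publication totals 100,000 words, the reuse percentage is (1 - 40,000 / 100,000) x 100 = 60 percent.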

Element Metrics

We can measure topics, tasks, and any other element using the same technique. To measure topics, we count the number of distinct topics in the source documentation to get Topics-Counted-Once and then calculate the amount of topic reuse in the published content. Any element can be measured this way.
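As a sketch of this generalization, reusing the hypothetical referenced_topics list from the DITAMAP example above, topic-level reuse needs nothing more than counting distinct references:

    def topics_counted_once(referenced):
        # Each distinct source topic counts once, however often
        # the DITAMAP references it.
        return len(set(referenced))

    def topic_reuse_percent(referenced):
        # Same formula as for words, applied to topics; assumes a
        # non-empty publication.
        return (1 - topics_counted_once(referenced) / len(referenced)) * 100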