Thursday, February 25, 2010

Adopt Free and Open Standards for Digital Documents

When you buy a plug (often known as a 'top'), you don't normally have to worry about whether it will fit into the socket (power outlet) you have at home. This is because both of them are manufactured according to standards that are followed by the industry. It is easy for anyone to design and construct a 'top' that fits into an ordinary power outlet because the standards are available for anyone to study and make use of. So anyone can manufacture plugs that fit existing sockets, or vice versa, and everyone benefits. The same is true of so many other devices and components we use that life would be much more difficult for us without these standards. Another example is bolts and nuts. One can buy bolts and nuts of the same size from different shops and still use them together.

The same is true in the case of the ubiquitous modern device, the computer. The keyboard layout is the same on all of them and the compact discs we use are of the same size, so we need not relearn the use of a keyboard when we buy a new computer, nor do we have to specify a certain size for the CD we buy. These are all examples of open standards that have been adopted by the industry. And these standards make our life easier because we need not unduly worry about what we buy. However, that is not necessarily the case with the software we use on the computer and the files we create with it.

Society has started using the computer for all kinds of purposes, including creating documents, pictures, animations, videos, databases and so on. These are stored in a manner that only a computer can decipher. This happens transparently, so that even computer users may not realise how they are stored. For instance, when I type this article, it is stored in the computer in the form of what we call a “file”. A file is a collection of ones and zeroes stored electronically. Each character in a text document like this one, or each tiny bit of colour in a picture, is represented in a computer by a group of ones and zeroes. Which set of ones and zeroes represents a character is decided by the method of encoding used for creating the file. For many purposes, the rule by which this conversion is made is open to the public – such as ASCII or Unicode. But certain applications, such as Microsoft Word, save the files they create in a manner that is not open. Word writes files to the hard disk in a purely “binary” format, in which the order of the ones and zeroes is determined by a rule that is known only to the makers of Word. In other words, the file can be opened and the original document seen in the proper manner only if Microsoft Word is used. Of course, some other people have worked out the manner in which Word does it, but not perfectly. Therefore, the file can be opened, albeit imperfectly, using some other applications also. This is the case not only with Microsoft Word, but with proprietary applications in general, including Adobe Photoshop, Lotus 1-2-3 and CorelDraw. Of course, this is the default behaviour, and each of these is capable of saving files in other formats (which, again, may not be perfect).
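To make the idea of an open encoding concrete, here is a small sketch in Python. It only illustrates the point above and is not part of any particular application: the helper name to_bits is my own invention, and the two sample characters are arbitrary. Because the ASCII and UTF-8 rules are published, any program can turn characters into ones and zeroes and back again.

```python
def to_bits(text: str, encoding: str) -> str:
    """Return the bytes of `text` under `encoding` as space-separated 8-bit groups.

    `to_bits` is a hypothetical helper written for this illustration.
    """
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


# 'A' is character number 65 in ASCII, which fits in a single byte.
ascii_bits = to_bits("A", "ascii")

# U+00E9 (e with an acute accent) is outside ASCII; UTF-8 uses two bytes for it.
utf8_bits = to_bits("\u00e9", "utf-8")

print(ascii_bits)  # 01000001
print(utf8_bits)   # 11000011 10101001

# Since the rule is public, anyone can reverse it and recover the character.
recovered = bytes([0b01000001]).decode("ascii")
print(recovered)   # A
```

With a closed format there is no published rule to apply in the last step, which is why other programs can only guess at what the ones and zeroes mean.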

But that is certainly not an ideal situation. What it means is that we users become dependent on the software that we used to create our documents in order to open them. That is, if we want to open a document created using Microsoft Word, and see it in the way we had made it, we will have to use Microsoft Word itself. The problem is that we don't know what a later version of the application will support. There is no guarantee that a file created with an earlier version will be cleanly opened by a new version of the same software. More importantly, in today's dog-eat-dog world, there is no guarantee that even the company will exist a few years hence. And this is not mere speculation. WordPerfect used to be the most popular word processor some years back. But very few people use it today, even though many people consider it probably the best word processor ever. WordPerfect was purchased by Corel, which itself is a struggling company. It could be wound up any day, and WordPerfect may no longer be available. And no other application today can cleanly open documents created with WordPerfect.

Such a situation is bad enough for private documents. Imagine the situation in which important documents related to the matters of a country, or critical information about its citizens, are stored in formats created by proprietary software companies – formats that are not open and are therefore inaccessible to anyone other than those who created the software. It could become very difficult to retrieve the information. The risk is not only of the information getting locked in, but also of a software company potentially holding a country to ransom. It may be appropriate to narrate an incident in this connection.

Venezuela is a poor country that is very much dependent on its oil resources. Petróleos de Venezuela S.A. (PDVSA), often pronounced Pedevesa, is a state-owned company that runs the oil industry there. A whole lot of processes, including metering, invoicing, billing, and customer service, were handled by proprietary software. At a particular point in the political transformation that happened in Venezuela, a large number of employees who were managing the computer system resigned and left suddenly, and took the user manuals with them (it seems under instigation from external sources that wanted to create trouble for the new regime). Not only was the whole system left without anyone to manage it, but the entire data was also stored in a format designed by the software company, making it inaccessible to others. The company, conveniently, refused to help the country. The entire economy of the country was on the brink of collapse. It was saved only because a group of young people managed to migrate the entire computer system to Free Software – software that would save all information in a format that is open, ensuring that it will always be accessible.

Obviously, governments should ensure that all their data are stored in open formats so that at no time will they be held to ransom by agencies, external or internal, that wish to create trouble for them. Use of standards is a requirement for a technological society. The standards for the units of measurement were an important step in the history of mankind. The interoperability of equipment such as telephones, radios or televisions, among others, is based on common standards of operation. The only way to ensure that everyone can build appliances, equipment or software that interact with each other is the use of open standards. This enables the storage and exchange of information and data between individuals or groups in different places and times.

There seems to exist an argument that proprietary standards should be allowed to co-exist with free and open standards. But this is meaningless, since there can be only one standard for one purpose. Else, it ceases to be a standard. Imagine having the metre and the fathom as "standards". While the former is a measure of length that is defined as precisely as humanly possible, the latter is a rough measure based on the distance between the fingertips of an average individual's outstretched hands. Let us forget this aspect for the moment, and imagine that the fathom is another standard. What could happen is that some people would specify lengths in terms of the metre, and others would use the fathom. People would have to go on converting from one system to the other. Of course, this may contribute to improving the computing skills of individuals, but it could contribute much more to confusion. Remember the times we used to get bolts, nuts and screws in both the Imperial and the metric standards. Many unwary people landed up purchasing bolts that would not fit the nuts, and screws that wouldn't go into the threaded holes that were meant for them.

In any case, having more than one standard is not good for government documents that will have to be maintained for a number of years. Problems could be especially severe if the "standard" adopted is proprietary rather than Free and Open. In the case of Free and Open standards, even if the application that created the documents ceased to exist, the government could hire someone to create an application to open the documents, since the format is open. Documents that are encoded in proprietary formats, such as the .doc format of MS Word or the .pm format of Adobe PageMaker, would be extremely difficult to open if the original application ceased to exist. Even if it did not, the government would become dependent on that company for ever. It is for such reasons that Free Software activists and others who understand the trap have been saying that the formats used by governments for storing documents should be free and open.

But the latest draft document on formats to be used for e-governance speaks about other formats co-existing with free and open formats, something that no draft up to the previous one had allowed. Why this change was made is totally unclear. Till the previous version, the draft was prepared after extensive consultations with the industry and the community, making the process rather transparent. But the latest draft was prepared by someone (a bureaucrat?) under the strong influence of one or more company executives. The government seems to have placed the interests of the companies before those of the nation and the people. I would like to join a number of people, including Free Software enthusiasts, in urging the government to drop this move and admit only the use of Free and Open formats for documents of the government.