Monday, November 9, 2009

Personal computing history (part 6)

And, after a while, DOS was no longer king. Windows ruled.

And with Windows came Microsoft's Word for Windows. Version 6.

And then came macro viruses...

From the anti-virus vendors came much shouting: "How do we parse these files? What is in these files?"

And from Microsoft came the grudging reply: "We cannot tell you."

Not "Will not". "Cannot".

You see, back then Word files (and Excel files) were saved C++ object streams. To save a Word file, you had C++ write the associated objects to disk. To read it back, you had C++ read the associated objects from disk.

So nobody had "designed" the file. It was just C++ objects.

If you use the right C++ compiler and objects, everything just works. Use something else...
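To make the "use something else..." problem concrete, here is a minimal sketch. The Record struct and the function names are hypothetical, made up for illustration, and have nothing to do with the real Word objects. The first function dumps the in-memory bytes as-is, so the size and layout on disk depend on the compiler's padding and the machine's byte order. The other two use a "designed" layout (always 5 bytes, little-endian) that any language can read.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record for illustration only; NOT a real Word structure.
struct Record {
    std::uint8_t  kind;    // 1 byte
    std::uint32_t length;  // 4 bytes; most compilers pad before this field
};

// "Object dump" style: copy the in-memory representation verbatim.
// The result has sizeof(Record) bytes, which includes any padding the
// compiler chose, in whatever byte order the CPU uses.
std::vector<std::uint8_t> dump_raw(const Record& r) {
    std::vector<std::uint8_t> out(sizeof(Record));
    std::memcpy(out.data(), &r, sizeof(Record));
    return out;
}

// "Designed format" style: exactly 5 bytes, length stored little-endian,
// regardless of compiler or platform.
std::vector<std::uint8_t> write_explicit(const Record& r) {
    return {
        r.kind,
        static_cast<std::uint8_t>(r.length & 0xFF),
        static_cast<std::uint8_t>((r.length >> 8) & 0xFF),
        static_cast<std::uint8_t>((r.length >> 16) & 0xFF),
        static_cast<std::uint8_t>((r.length >> 24) & 0xFF),
    };
}

// Reads back the 5-byte layout produced by write_explicit.
Record read_explicit(const std::vector<std::uint8_t>& b) {
    assert(b.size() >= 5);
    Record r{};
    r.kind = b[0];
    r.length = static_cast<std::uint32_t>(b[1])
             | (static_cast<std::uint32_t>(b[2]) << 8)
             | (static_cast<std::uint32_t>(b[3]) << 16)
             | (static_cast<std::uint32_t>(b[4]) << 24);
    return r;
}
```

On a typical compiler dump_raw gives back more than 5 bytes because of the padding, and a different compiler or packing setting can give a different answer. That is the whole problem in miniature: the raw dump is only readable by code built the same way as the code that wrote it.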

So, the best that Microsoft (again grudgingly) could do was to provide a "reference" implementation of C++ objects to read/write an appropriate OLE file, and some documentation on the internal Word objects (complete with bugs...).

So, what do you do with this? Well, the anti-virus vendors split into two camps:
      1) Use the reference implementation as the basis of the scanner.
      2) Use the reference implementation to figure out the actual binary contents of a .DOC file.

The company I worked for went for option #2. I had much "fun" (for strange values of "fun") translating object streams into actual binary data.

But in the end, we knew what was in a Word file (or Excel file or...). Down to the bit. And we could parse a Word file in ANY language...
