Thursday, January 29, 2009

In the beginning...

I played games on a dial-up teletype connected to an HP 2000C mainframe. Slow and frustrating as it was, it got me into computers, and I've been there ever since.

Today, I can look back at my humble origins, but I still can't decide if the latest whiz-bang computer game is more fun or not...

Monday, January 26, 2009

Smoke and Mirrors...

The web is a linked world. That's what it was designed to be, and it works.

But, sometimes that can be a problem.

Basically, all information on the web is ephemeral. If you look at a page today, you pull it from the server. If you look at the same page tomorrow, you pull it from the server again. The information that was here today can be gone, like a puff of smoke, tomorrow.

For lots of things, this doesn't matter. Who cares what was on Digg's homepage last year?

But for CNN (for example), we have completely different expectations. If I want to look up which politicians were indicted on Jan 21, 2009 (I can dream, can't I?), and CNN pulled the page after a month, how do I look back into history?

Even worse is the Orwellian prospect that history has been changed. Has a name been dropped from the list, or has one been added? A publisher can't recall and change all his "dead tree editions" overnight, but after a minute at a keyboard, everyone's history has changed. What, this isn't what was there yesterday? Prove it... (This is why Groklaw keeps an archive of all legal documents related to SCO v. IBM and related cases.)

Sorry, veered off into politics. I must watch myself.

Getting back to my last post, the linked nature of the web's information can be its weak point. If something happened to me, what would become of (for example) DeSmet C?

In a month or so, my provider would notice that I've stopped paying my bill, and cut off my service. The webserver would no longer be online, and the pages would vanish.

Could the Internet Archive carry on? To an extent: it can't save everything, and it has a size limit on what it does copy. The downloadable files aren't there, only the text.

There are links to the site and its content, but everyone assumes that the information will be there (effectively) forever. I'm guilty of it, too...

What needs to be done is to not just link, but mirror. Wget is your friend. Disk is cheap.

Has an interesting / worthy site gone away? It doesn't have to vanish forever.

Friday, January 23, 2009

Because newer is not always better...

Sometimes, "new and improved" means it breaks things.

Your favorite feature may not work anymore...

It might not work with your system...

It might not be compatible with older files...

The UI was designed by a color-blind dyslexic student during a Saturday night bender...

And, with the linked nature of the web, chances are everyone links to the "canonical" download site (which will come up in a rant for another day...), so that you don't have a choice about what you are getting.

Unless you go to a site like oldversion.com, where the past versions of software are still available.

Wednesday, January 21, 2009

Geek relaxation...

After all the source code reconstruction articles, it's time for some geeky relaxation.

Go over to comic-images.com and wallow in it!

Monday, January 19, 2009

Re-creating source code, epilogue

So, what's the status of the DeSmet 2.51 recreation?

As of Jan 19, 2009:

The C88 source has been recreated and compiled, and code equivalency has been verified (it's not identical, unfortunately).

The GEN source is a work in progress. GEN changed a lot between the versions (it was given the ability to generate object code directly, not just assembly), and backtracking it is tedious. That, and our cats (OK, that page is very out of date, but it gives you an idea...) and work, are taking a lot of time.

ASM88 has also changed, but I expect the changes to be fairly minor.

BIND has changed, but again I expect the changes to be fairly minor.

I really need to get off my ass and post at least the C88 source I worked so hard on...

Friday, January 16, 2009

Re-creating source code, Part 8

OK, enough suspense...

DeSmet C was written in DeSmet C.

But which version? Aye, there's the rub.

I don't know.

As I said before, later versions of the compiler have additional functionality and improvements, among them better code generation and register utilization.

For example, the C source file

int i;
char *j;
main(argc,argv)
   int argc;
   char *argv[];
   {
   for (i = 0; i < argc; i++)
      j = argv[i];
   }

compiled with either version 2.51 or 3.03 generates this code for "j = argv[i];":

mov   si,word i_
shl   si,1
add   si,word [bp+6]
mov   si,word [si]
mov   word j_,si

but compiled with version 2.40, we get

mov   ax,word i_
shl   ax,1
mov   si,ax
add   si,word [bp+6]
mov   si,word [si]
mov   word j_,si

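In addition, the x86 architecture has both "short JMPs" and "long JMPs". A short JMP holds a signed 8-bit displacement measured from the end of the 2-byte instruction, so it can reach locations from -126 to +129 bytes of the start of the JMP. A long JMP (Intel calls it a "near" JMP) holds a 16-bit displacement and can reach any location within the current 64K segment. (There are also FAR JMPs, but we don't have to concern ourselves with those for the small memory model of DeSmet C.)

On the 8086, all conditional JMPs are short JMPs. If your conditional has to send you to a location that is farther away, you have to use a reverse-sense conditional JMP around a long JMP: instead of a JE to the distant target, a JNE that skips over a long JMP to it.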

The compiler tries to use short JMPs wherever possible (both unconditional and conditional) to produce smaller executables. But guessing how far away the target of a forward JMP will be is tricky, so sometimes the compiler errs on the side of safety and uses a long JMP just to make sure it doesn't go "out of bounds". The exact rules, however, vary from compiler version to compiler version.

The result of all this is that, even after re-creating the source to version 2.51, I cannot get the same executable as the distributed 2.51. All the code sequences are equivalent, but the details of register usage and short/long JMP selection are different. Very close, but no cigar.

Wednesday, January 14, 2009

Busy today...

Blog tomorrow.

Sorry about that.

Monday, January 12, 2009

Re-creating source code, Part 7

So, how do you compile a compiler?

It seems pretty straightforward, something like
      make compiler
and in many cases this is all that is necessary. But not always.

Compilers for many languages (like C or Pascal) are written in the language itself (i.e., C compilers are written in C, Pascal compilers are written in Pascal). But how do you get the first compiler compiled? Solving that problem is known as bootstrapping a compiler.

If this is really the first instance of the language (e.g., you are Dennis Ritchie triumphantly creating the C language), your first compiler, however primitive, is written in another language. Later versions of the compiler are then written in the language itself.

If the language already exists on one computer and you are bringing it to a new system, you can either write the compiler in a language that already exists on the new system, or use a cross compiler to generate the first generation of the compiler for the new system.

So, what does this have to do with re-creating an early version of the DeSmet C source?

Well, what compiler was used to compile DeSmet C?

Friday, January 9, 2009

Piles of bits...

I don't have time to do the blog entry I was planning for today (if you're curious and want to get a head start, look up bootstrapping).

In my spare time (probably 10 minutes a day...), I've been archiving my old floppies. That list isn't anywhere near complete (I probably have over a thousand floating around); I was interrupted by a lawsuit (not aimed at me, but that didn't matter). But I'm back to work on it now.

It amazes me, however, how much stuff I've collected over the years. Who remembers the CDC PLATO system? I found some source code for lessons. I have my virus zoo. Old games. Early programming attempts. Drivers for long-gone hardware, forgotten by everyone (including its manufacturers). Commercial software where I wish I had kept the manual instead of the disk. And so much stuff that makes me wonder "Why the hell did I ever have that in the first place, and why have I kept it all these years?"

Ahhh... nostalgia.

Wednesday, January 7, 2009

Re-creating source code, Part 6

So, what toys do I have?

Well, I have the source to a later version of the compiler I use daily.

I have the compiler I use daily in binary form.

I have a disassembler.

I know how to decompile.

The magic trick is to realize that the later version of the compiler is (almost always) a superset of the earlier compiler. It has the functionality of the earlier compiler, plus additional features and improvements.

So, I can compile the later version of the compiler, then disassemble it. This gives me source, binary, disassembly, and labels.

I can disassemble the binary of the earlier compiler. This gives binary and disassembly.

I can now compare the disassemblies. This will let me generate labels for the disassembly of the earlier compiler. It will also let me (thinking about decompiling) see which source statements were or were not in the source to the earlier compiler.

So, as I compare the disassemblies, I can edit the later compiler source to backtrack it to the earlier compiler source.

Good, right? Well, it turns out that there is one more thing that must be considered in doing this. I'll talk about it next time.

Monday, January 5, 2009

Re-creating source code, Part 5

So, looking at the previous example, it's obvious that decompiling is a lot harder than disassembly.

So, it would be stupid (or really masochistic) to try to decompile a big program. Like a compiler.

But, before I got the source code to DeSmet C, I tried just that. I ran the compiler (at least the first pass of the compiler) through a disassembler, and went in and started decompiling.

This took me a few months of spare time. In the end, I had a decompiled compiler, but there was too much ambiguity (like unknown variable names -- even badly named variables are much better documentation than no names at all) to be of any use. Oh, well...

Fortunately, I did track down the source code sometime later, and had the toy I wanted. Almost...

I had the source code to a later version of the compiler. And later means more features and bug fixes. This is good.

But, I've also learned that you don't change compilers (or even compiler versions) unless you have a good reason to do so.

Most of my early code was compiled with an earlier version, and since it worked, I kept on using it. And this was not the version I had the source to. This is not so good.

But then I had an idea. I'll talk about it next time.

Friday, January 2, 2009

Nothing to do with Re-creating source code

Happy New Year!

Next time, more tech geek stuff...