Reversing a Checksum Algorithm? 35
Todd asks: "Does anyone have good suggestions or advice regarding techniques for determining the checksum of a serial data protocol? I am currently working on a fun little project using a digital display sign which uses an unknown serial protocol. I cannot move on with the project until I figure this dreaded checksum out."
Just a Q (Score:1)
it's not these guys right? (you didn't say what phone # you tried so...)
Use the Force Luke! (Score:5, Insightful)
I don't think you have much choice but brute-force it.
Collect as many as possible checksum -generators, and try them on portions of message.
Some more-or less trivial are 'LRC', CRC, CRC16, and CRC32 ones. Of course XOR too. difficulty is to determine which characters go to checksum, as it's not always that 'STX'/'ETX' are included.
UTFS? (Score:2)
trying to emulate, let someone in a free country trace
it with a debugger (gdb, WinICE, Turbo Debugger...)
in order to find at least the assembly code out.
Then you can hire an assembly guru (like me
to write a routine which does the same.
Need more info (Score:4, Interesting)
Is it a very simple machine with a very simple processor and limited memory? If so, they may well have written their own with shifts and xors. On the other hand, if it is a modern processor with megabytes of memory, they probably have used a library function. Can you get access to the standard libraries for that chip?
How long is the checksum? What happens if you feed a wrong one? If it is just a single byte, and the box doesn't lock up completely on error, try all possible values for every packet!
Why do you think they put the checksum in the protocol? To catch a few dropped bits over a bad line, or to prevent tampering? If the later, watch out for some real cryptography. If only the former, assume they took the path of least resistance.
Re:Need more info (Score:1)
Take the code from the dos App. (Score:3, Insightful)
Well run it in a debugger and locate the part the does generates the checksum, you could rip the code out directly(or load the application up into memory and call the checksum bit), this will probably violate a few laws though.
So the best way is use there code to work out the algorythm and then write you own code(even better get someone else to do it) based on the algorythm
Deciphering Bits (Score:2)
The 13-14 bits look like they change with the length of your message. With messages of 'TEST' or 'WICK' the value in 13-14 is 'DA BC'. With a message of 'T', the value in 13-14 is '5A BA'.
Neat project - keep us posted on the progress!
Hope this helps.
Broken link (Score:2)
Though it was years and years ago, I'd done some 6502 assembler programming. In fact, I just pulled out my copy of "Programming the 6502" by Rodnay Zaks (Sybex Inc.). I was never that good at it, but I had a lot of fun trying!
Unfortunately, when I tried to download the binary ROM image [dickwick.com] of the 6502 code on his web page I got an error: Cannot find the file specified.
If/when that gets fixed, the next step would be to get a computer do the work. A quick check of google [google.com] using the search string: "6502 disassembler" [google.com] brought up about 337 matches. Granted, some of the disassemblers are designed to run on older hardware (e.g. Atari, Commodore 64), but I saw links for unix, windows, and DOS, too.
My process (Score:2, Interesting)
Re:My process (Score:2)
If you can sniff good packets, you can compare slightly different ones. If you can't sniff good packets, you can try generating packets exhaustively to find good ones, and then attack those differentially.
Try everything (Score:2)
Shouldn't be hard at all (Score:2)
Next, set it up to send a test message, like "test 12345". Then set a breakpoint in the code that starts transmitting to the serial port. That shouldn't be hard to figure out with some old DOS references.
Then run the program, and when the breakpoint hits search memory for your test string. Then you find the memory location of the buffer being sent by the application. Then set a memory breakpoint so the debugger will break when that buffer is read, and continue the program and have it send the message again. Now you should be able to zero in on the crc code, as it reads the buffer to compute the checksum.
Once you find the loop of asm code that's computing it, you should be able to understand the algorithm pretty easy.
Re:Shouldn't be hard at all (Score:1)
Set up a breakpoint there or something; the register AH is 01; AL is the character that's being written; and DX={0=COM1, 1=COM2, ...}
(hope you like assembly!)
Re:Shouldn't be hard at all (Score:2)
ah, but those were the days, defining what io port and irq a serialport was on to every application that used it...
Re:Shouldn't be hard at all (Score:1)
I scanned thru the EXE file myself for an occurence of this function call; I did find one pattern, but couldn't determine what program segment it was is, if it's actual code, etc. Still trying to get dos's old DEBUG.EXE working under wine (and don't want to install windows again)
Latteral thinking (Score:2)
Since most of the good advice I can think of for reverse engineering the check sum has already been posted, I'll suggest something orthogonal:
You might consider black-boxing the original app using DosEmu and glue code of your own devising. Stuff goes in, stuff comes out, and you're done.
-- MarkusQ
P.S. I'm assuming, of course, that this is just a one-off project, not something you plan to replicate.
just run it emulated or interpreted (Score:2)
Input! More Input! (Score:5, Insightful)
8-bit-number
Bits == 76543210
Csum(x) is the value of the checksum with slot #x and all else held constant.
Csum(x) = Csum(0) +
(Bit0(x) ? -0x0F : 0) +
(Bit1(x) ? -0x1E : 0) +
(Bit2(x) ? +0x3C : 0) +
(Bit3(x) ? -0x78 : 0) +
(Bit4(x) ? -0xF0 : 0)
Note that the values added/subtracted for each bit follow a pattern:
0x0F = 00001111
0x1E = 00011110
0x3C = 00111100
0x78 = 01111000
0xF0 = 11110000
More data might shed some light on the pattern. Whatever the case, I think this is reminiscent of a CRC16, since I don't think a checksum would have this kind of behavior -- a standard addition checksum or a XOR-based checksum (even with bit rotation) would make a bit always add/subtract the same amount, but it would be a power of 2 (I think).
So now you need to find the CRC's polynomial, and I don't know enough about that kind of thing to help you. (And there is a chance that everything I've said here is wrong, since this is not my specialty!)
talent is the key (Score:2)
Most of the checksum algorithms employed in that era were extremely light weight and not at all good by crytographic measures. The 6502 has a register set that makes x86 look positively beefy by comparison.
This could work against a simple reverse analysis. The algorithm might be table based to alleviate register pressure, and it's clear to me that the topic author is not up to the task of fleshing out such a table by differential techniques. If he had that kind of talent, he wouldn't be asking us.
On the other hand, working with a tool such as SoftIce, it's not at all difficult to capture the checksum code red handed. I used to play Falcon 3.0 under SoftIce so I could repair the system crashes on the fly (especially the guaranteed hang when you hit a ground target with one specific type of munition). "R2, see what you can to about the power."
And yes, I bypassed many kinds of crashes and checks using SoftIce. It was far easier than you would think and I enjoyed the small challenges as much as any game I played. Differential analysis is on a par with learning to play Go because you enjoy of the humility of probing your boundless incompetence.
At the other extreme, the 6502 instruction set can be learned in a couple of hours. People forget how simple things used to be. The 6502 is a four function calculator on steroids. The average configuration file these days (Apache, SMB) takes longer to master.
Send simple variations (Score:2)
Ask the VP? (Score:4, Insightful)
Solved it (Score:5, Interesting)
Good luck with your project,
Gijs
Re:Solved it - almost (Score:3, Insightful)
Except that you didn't, completely. The trickiest part of this checksum is that it shows up TWICE in the packet format. As you state, the checksum is initialized after the ff attention byte, and then most of the packet is run through it. However, looking at the initial packet:
FF 35 35 00 0E DA BC 01 00 00 08 00 19 02 54 45 53 54 1F 8C 52
^^^^^ ^^^^^
both of the indicated portions are checksums. The first checksum covers the string "35 35 00 0E" and the second covers the string "35 35 00 0E 01 00 00 08 00 19 02 54 45 53 54 1F". The checksum is not re-initialized after it's output. Incidentally, here's my perl version - it uses bit operations in preference to the %-heavy code that you use, but it's the same algorithm:
#!perl
$check = 0xa55a;
$sum=0;
@bytes = (@ARGV);
for $pos (0..$#bytes) {
$sum += (hex($bytes[$i]) ^ ($pos & 0xff));
$sum = $sum & 0xffff;
$check += $sum;
$check = (($check >> 1) & 0x7fff) | (($check & 1) << 15);
}
printf "%04x\n", $check;
__END__
The easiest thing.. (Score:2, Insightful)
I'm no asm programmer (esp not 6502), but I'm guessing it's (counting from zero at start of rom)
(02A1) BEQ 02B0 and
(02AE) BEQ 02C5
that should be changed from BEQ to BNE (branch equal/not equal).
I spent some time in dosemu debug, but there didn't seem to be any simple algorithm (again, I don't know asm).
Brute forcing a checksum algorithm is pretty much impossible unless they've used a standard one. The best bet imho is to either grab it out of the sign.exe or disable it in the sign.
Have you checked... (Score:2, Interesting)
There's a thread the 'The LED Forum' about this sign. The people there might be able to help... http://www.eio.com/public/led/0219.html [eio.com]
Also, one message in that thread is for a message at SignIndustry.com http://www.signindustry.com/mb/read.php?f=19&i=13
While the information you are after is not on these boards, at least you will be able to contact some people who have the boards - maybe they've found some more information.
Other than that, I can only repeat what a couple of other people have said here - do some very methodical testing, starting with all single character messages from a-z, A-Z, 0-9 and punctuation. Then try a fair range of two character messages with the same options as the first run. (aa-az, aA-aZ, a0-a9, ba-bz, etc, but there's no real need to try ALL combinations.) Then try a fair sampling of 3 char messages. Next try some longer strings, modifying one character at a time and the same string modifying each property. After that, try a range of single characters (say 10-15 different characters) with each of the options. Finally, do some 2 char, 3 char and a longer message with a reasonable set of different options.
If you gather a truck-load of data (several hundred sets, not just 100) you stand a better chance of figuring it out. You really need the trivial cases (single and two character strings) and some non-trivial cases (a phrase, not just 'TEST') to make it easier to reverse-engineer the algorithm.
As others have mentioned, given the time frame and the fact that it is 6502-based, the algorithm is not likely to be particularly complex, but working it out can take a lot of time!
Mean-while, I might have a look at the 6502 code and your test data to see if anything jumps out at me...
Good luck!
I've got the same sign... already figured it out. (Score:1)
http://www.kuci.org/~jburley/LED/ [kuci.org]
Hope that helps!
online tutorial (Score:1)
Not fun. (Score:2)
If the sign doesn't use one of the standard polynomial CRC algorithms, you're only real option is to disassemble the binary you have to dig out the algorithm. Even if it does uses a standard algorithm (It's probably CRC16), you'll still have to figure out what data it's using. You can guess and check, and you may get lucky. Disassembling is a sure thing, and probably the quickest way to go, but check your licensing agreement.
I recently reverse engineered the CRC for my LG TP-5250 cell phone which, as it turns out, uses the standard CRC16 algorithm, but is creative about which data is uses to compute the checksum. Without a dissassembler it would have taken months to figure out what bytes were being used and in what order. You can read about how/what I did here [spineless.org].
Pay no attention to the man behind the curtain (Score:1)
Re:Pay no attention to the man behind the curtain (Score:2)