XSE Algorithm Description
This scheme uses Marsaglia's XorShift PRNG set to generate a keystream, and then uses it to encrypt plaintext by simple XOR operation (in a way possibly not so easy to break). Each PRNG requires a 32bit seed and generates a different sequence of 32bit integers of period 2^32-1. As you probably already know, they are fast (3 shifts and 3 xor to generate a number) and have good statistical properties (and yes, they can be seen as linear functions). Shift amounts and directions differentiates the 648 PRNGs that form the complete set. The primary key of the algorithm is 128bit long (the MD5 hash of the string inserted on the command line). Firstly this key is extended to 1040 bytes by using the MD5 hash function. This key can be viewed as an array of 32bit words.

For every 32bit word of input (read sequentially), 4 (possibly) different PRNGs are selected from the set. Four sequential key words chosen from the extended 1040-bytes key are used as seed for the 4 PRNGs, generating 4 new numbers. These four numbers are used to replace the four words of the key and to encrypt the input word in this way: the first two are xored together and the result is rotated by a certain amount that depends on the chosen PRNGs. The same is done for the 3rd and 4th word. The two resulting values are xored with the input word, together with another value that depends (again) on the 4 chosen PRNGs. To achieve higher security, the algorithm periodically regenerates the key and changes the PRNG selection function. (I know that all this sounds vague and confusing - read carefully the pseudo code below for a throughtful explanation)

The following technique is used to make the cipher resistant to chosen and known plaintext attacks, and to verify payload integrity: upon encryption an MD5 value ("uniqueID") is output before the actual ciphertext. If the plaintext comes from a file, this value is equal to MD5(MD5(plaintext)+key); if it comes from stdin, this value is set to 16 bytes read from "/dev/urandom". The actual base key used during encryption is MD5(key+uniqueID). By doing this, each bit of the key (and thus of the ciphertext) depends on the entire plaintext. Upon decryption, a check is made to verify that the uniqueID of the decrypted data matches the one included in the ciphertext. If the ciphertext is modified by an attacker, this check will fail. If the uniqueID comes from "/dev/urandom" then this check is skipped (...only in this version...it could be made by calculating MD5(MD5(plaintext)+key) on input stream EOF and by appending it at the end of the output stream...but I'm lazy! :-).

To be more precise, here is the detailed pseudo code of the encryption process.
(k[0..259] is the extended 1040 bytes key (1040 bytes=260 32bit ints))

• Compute command_line_key=MD5(command line key string)
• Compute uniqueID=MD5(MD5(input file)+command_line_key)
• Compute key[0..3]=MD5(command_line_key+uniqueID)
• While any of the 32bit words k[0..3] equals 0 --> key[0..3]=MD5(key+uniqueID)
• selector=key[0]^key[1]^key[2]^key[3]
• orig_key=key[0..3]
• Extend key from 16 bytes to 1040 bytes: repeat key[i..i+3]=MD5(orig_key+key[i-4..i-1]) for "i" ranging from 4 to 256, step 4.
• Using the same method, generate 81 more md5 sums (1296 bytes). Associate these bytes to the xorshift PRNGs (2 for each prng - a[0..647] and b[0..647])
• Shuffle the complete set of xorshift PRNGs, depending on the extendend key.
• Choose one of the 648 PRNGs as the function used to alter "selector" during encryption ("selrng")
• Choose koff = offset in key[0..255]
• Set keyptr=&key[0]
• While more data to process:
• Read one block (possibly) of input data (consisting of 16384 or 1024 bytes, depending on the command line)
• Repeat "rounds" times the following:
• For each input 32bit word of the block:
• Use the first byte of "selector" as an index in the shuffled PRNG array. Use the indexex PRNG to transform the key word koff[0]. Use the generated number to replace the key word, and annotate it.
• Use second byte of "selector" as an index in the shuffled PRNG array+131. Use the indexex PRNG to transform the key word koff[1]. Use the generated number to replace the key word, and annotate it.
• Use third byte of "selector" as an index in the shuffled PRNG array+261. Use the indexex PRNG to transform the key word koff[2]. Use the generated number to replace the key word, and annotate it.
• Use last byte of "selector" as an index in the shuffled PRNG array+392. Use the indexex PRNG to transform the key word koff[3]. Use the generated number to replace the key word, and annotate it.
• XOR the first two number together, rotate the result right by b[selector[2]]&31 bits
• XOR the third and the fourth number together, rotate the result right by b[selector[3]]&31 bits
• XOR the input word with these two rotated values
• XOR the input word with another 32bit int formed by concatenating a[selector[0]] + a[selector[1]] + a[selector[2]] + a[selector[3]]
• Output the result as encrypted/decrypted word
• Update selector by running it through "selrng"
• Update koff by xoring it with b[selector[0]] ^ b[selector[1]] ^ b[selector[2]] ^ b[selector[3]]
• Output processed block
• Regenerate 4 words of the key: Compute keyptr[0..3]=MD5(keyptr[0..3]+orig_key) until keyptr[0..3] does not contain null 32bit words
• Reset selector: selector=keyptr[0]^keyptr[1]^keyptr[2]^keyptr[3]
• keyptr += 4
• Reset selrng and koff
I decided to make the number of rounds configurable so that if you have to encrypt a small file (say 5MB or less -- should cover 90% of uses) and you accept to wait a few tenth of second more, you can use 8 rounds with rapid key regeneration ("e8+" mode) to reach a high level of security. As a general rule, watch your PC's hd activity led during the "Encrypting..." phase. Try incrementing the number of rounds (or add the "+" option) until the led starts to blink. By doing this, you won't waste time waiting for your HD to do it's job. However, people needing speed (say for streaming encrypted content) achieve speeds faster than AES with default 1-round mode (unoptimized asm version runs 2x faster than many AES implementations).
A word on self-decrypting files: by using this mode, you enable your friends to receive files encrypted with XSE and decrypt them without having anything installed on their machine. They just double-click on the self-decrypting executable and, after introducing the key, the original file appears next to it. It couldn't be easier! Keep in mind that the original filename is stored inside the executable in plain text (at least in this version).
Two friends of mine ("Roccio" and "Mandingo"), made a nice little GUI for windows that uses the drag&drop method to automatically encrypt/decrypt files by calling the dos executable. It doesn't mess your registry file up, and it's extremely easy to use...try it!
Encryption raw speed
(in MB/sec, not including I/O)

Tested on a 1.4GHz Centrino Notebook on 200MB of data
The use of the "+" option (fast key regeneration) slows down speed by 1%..4% depending on the number of rounds
 Rounds Speed 1 81.733 2 41.003 3 27.366 4 20.536 5 16.411 6 13.698 7 11.731 8 10.267

Final considerations
I really hope to get some feedback from all this (flames excluded). I've done my best to make it secure, fast and scalable. Every pass of this scheme has a reason behind and aims at making an attacker's task more difficult. But while its security is kind of speculative, perhaps its most important quality is simplicity. It should be very easy to cryptanalize it and to find weaknesses/strong points against known attacks (which I know very little, I admit...but that's why I'm posting it in a sci.crypt sandbox!). I've even included a command line option ("-") that prevents the periodic state reset, so that you can study the cipher more "confortably". PLEASE, if you spend some time on this cipher, share your thoughts (positive or negative) with me. Maybe it's strong, maybe I can improve it, maybe it's dead in the water, but I'd really like to know your opinion - and learn something more.

One final word: if someone among you is an assemply code master and knows how to optimize my little and completely unoptimized asm loop in xse.c, you're welcome! :-)