PoLlama Forums  

Go Back   PoLlama Forums > PoForums > The Thinking Cap

Reply
 
LinkBack Thread Tools
Old 08-05-2008   #1
 
MrApples's Avatar
 
Join Date: Sep 2006
Location: Your Mom
Posts: 2,241
MrApples has a spectacular aura about
eTail - Text Compression Algorithm?

While working on the idea for another text compression algorithm, the idea for this one hit me.

You should know
A character on windows, or a letter ( such as 'a', 'A', '1', '$' ) takes up 1 byte. There are 256 different values in a byte, which is 8 bits ( binary ), so 2^8 = 256. 10001100 - 256 combinations

Most of these characters are almost never used, like % and ones I don't even know how to type. So, by using a smaller character set by default, and then extra bits for when a rare character is used, logically you should be able to save space.

00000 = a
00001 = b
00010 = c
...
11111 = *other* + 10101 = $

So a common character would take 5 bits, and a rare one would take 10, instead of every taking 8 ( an estimate ).

This is where it gets ugly
My idea is to have a separate character set for every NEXT character based on the CURRENT character.

If you look at the statistics for commonly used characters in HTML, you will notice there is always a steep slope on probability of the next character.

Probability of next character for 'a' in HTML
  • "s" 14098
  • "n" 12682
  • "r" 11894
  • "l" 10842
  • " " 10324
  • "t" 9159
  • ">" 7724
  • "d" 7303
  • "m" 6676
  • "c" 6145
  • "g" 5836
  • "v" 5548
  • "b" 3946
  • "y" 3132
  • "i" 2974
  • "p" 2698

Probability of next character for '<' in HTML
  • "/" 26523
  • "a" 8032
  • "d" 6211
  • "l" 3135
  • "s" 2847
  • "t" 2533
  • "b" 1980
  • "i" 1979
  • "h" 1168
  • "T" 853
  • "p" 823

So, logically by the laws of probability, this can be used to save data. As most, in terms of percentage/bit look like this.
  • 1-bit 19.7%
  • 2-bit 36.4%
  • 3-bit 61.9%
  • 4-bit 89.1%
  • 5-bit 97.7%
  • 6-bit 99.8%
  • 7-bit 100%

The algorithm is also very adaptive, and by just feeding it examples, it can work much better for those kinds.

It can also be much faster in comparison to current compression algorithms, such as gZip, as it doesn't rely on guesswork.

------------------------------------

Anyone follow? Anyone heard of something similar?

Last edited by MrApples; 08-05-2008 at 11:19 AM.
MrApples is offline   Reply With Quote
Old 08-05-2008   #2
 
King Stubby's Avatar
 
Join Date: Mar 2007
Location: Underground
Posts: 5,862
King Stubby is just really niceKing Stubby is just really niceKing Stubby is just really nice
Re: eTail - Text Compression Algorithm?

that is cool! I've never heard of anything like it!

MrApples, your genius is showing...


And now, directly from chip's signature...
King Stubby is offline   Reply With Quote
Old 08-05-2008   #3
 
GhostlySpoon's Avatar
 
Join Date: Jun 2008
Location: Northern Ireland
Posts: 840
GhostlySpoon is just really niceGhostlySpoon is just really niceGhostlySpoon is just really niceGhostlySpoon is just really nice
Re: eTail - Text Compression Algorithm?

I haven't heard anything like it, and I don't really understand it all, I understand up to the point of the next character probability thing. I understand what you mean about making some of 5-bit and others 10-bit.

What I find hard to understand though, is what the whole probability thing would do.


The World Ends With You!
GhostlySpoon is offline   Reply With Quote
Old 08-05-2008   #4
 
MrApples's Avatar
 
Join Date: Sep 2006
Location: Your Mom
Posts: 2,241
MrApples has a spectacular aura about
Re: eTail - Text Compression Algorithm?

Quote:
What I find hard to understand though, is what the whole probability thing would do.
What the first part is about, using less bits for much more common characters, and using more bits for rare characters.
MrApples is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may post new threads
You may post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Bug Report Text bugs Halb Bug Report Board 1 04-02-2008 03:22 PM
New Map Released (Neo LT v2.1) (17.10.07) night_wolveX News 0 01-14-2008 05:20 PM
Floating Text DarkBlade World Editor Help 3 12-19-2007 03:58 PM
Unit editor basic guide Halakbalakbalak Submit a Tutorial [World Editor] 1 10-19-2007 02:18 PM
Perma-flaoting text... Halakbalakbalak World Editor Help 5 10-13-2007 02:06 PM


All times are GMT -5. The time now is 06:22 PM.

A friend of Wc3Happy
Powered by vBulletin® Version 3.7.1
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165