Of algorithms and elephants: June 2011

Wednesday, June 8, 2011

Can you boot from an email-account?

Hi! You know how there are programs like Gmail Drive that let you use your email account as normal storage space? And you also know how you can boot from the network, right?

Now, if we had our OS mirrored to our Gmail account using Gmail Drive, and had some other PC forward the virtual drive to us over the LAN, could we actually boot from that?

I'd love to see someone try this and report on the experience. Of course there is absolutely no use to this, but I felt it would be a fun experiment.

Monday, June 6, 2011

Google Flu Trends

Hi, check this out: Google Flu Trends

Apparently, Google is generating flu statistics from their search statistics - and it seems to work! Makes you wonder what other kinds of information might be hidden in that statistical data, doesn't it?

Infinity: Hypercubes

We all know what a square and a cube are; their generalization to higher dimensions is called a hypercube.

Now, let's see... A square has four vertices (0,0), (0,1), (1,0), (1,1), so each vertex encodes 2 bits of information. A cube has 8 = 2^3 vertices, and so on... You get the picture.
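To make the counting concrete, here is a minimal Python sketch that lists the vertices of an n-dimensional unit hypercube as bit tuples. The function name hypercube_vertices is made up for this illustration.

```python
from itertools import product

def hypercube_vertices(n):
    """Return all vertices of the n-dimensional unit hypercube as 0/1 tuples."""
    return list(product((0, 1), repeat=n))

square = hypercube_vertices(2)
cube = hypercube_vertices(3)

print(square)     # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(len(cube))  # 8
```

Every vertex is just one n-bit string, which is why an n-dimensional hypercube has 2^n of them.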

According to Wikipedia, the human genome has 3 billion DNA base pairs. We all know there are four bases, thus every base encodes 2 bits of information. Any human's DNA is therefore one vertex of a 6 billion dimensional hypercube. Somewhat scary, isn't it? Even more so if we realize that all humans together, who ever lived and will live, only cover an incredibly small subset of the vertices of that hypercube.
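The arithmetic can be sketched in a few lines of Python. The mapping from bases to bit pairs below is an arbitrary assumption (any fixed assignment of the four bases to the four 2-bit codes would do), and dna_to_vertex is a hypothetical helper name.

```python
# Assumed mapping: any fixed assignment of the four bases to 2-bit codes works.
BASE_TO_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}

def dna_to_vertex(sequence):
    """Map a DNA string to a 0/1 tuple, i.e. a vertex of a hypercube
    with 2 * len(sequence) dimensions."""
    bits = []
    for base in sequence:
        bits.extend(BASE_TO_BITS[base])
    return tuple(bits)

vertex = dna_to_vertex("GATTACA")
print(len(vertex))  # 14 coordinates for 7 bases

# 3 billion base pairs at 2 bits each
genome_dimensions = 2 * 3_000_000_000
print(genome_dimensions)  # 6000000000
```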

Now, let's get to something bigger - Infinity. We can encode every natural number as an infinite sequence of 0's and 1's. Therefore, every natural number is a vertex of an infinite dimensional hypercube. Note that every element of the power set of the natural numbers can also be expressed as an infinite sequence of 0's and 1's, showing whether each number is in the subset or not. However, this does not show that N and P(N) have the same size: the binary representation of a natural number contains only finitely many 1's, so N does not cover any vertex with infinitely many 1's in its sequence, while P(N) does. P(N) actually covers ALL vertices of that hypercube.

Now to an even fancier vector space: The space of functions R -> R is also a vector space, but this space has uncountably many basis vectors. R -> {0,1} is a subset of that space, our hypercube again (this time with uncountably many dimensions).
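A small Python sketch of the two encodings, looking only at the first few coordinates of each infinite sequence. The helper names number_to_bits and subset_to_bits are invented for this example, and putting the least significant bit first is just one possible convention.

```python
def number_to_bits(n, length):
    """First `length` coordinates of the infinite 0/1 sequence for n
    (least significant bit first; all later coordinates are 0)."""
    return tuple((n >> k) & 1 for k in range(length))

def subset_to_bits(contains, length):
    """First `length` coordinates of the indicator sequence of a subset
    of N, where the subset is given as a membership test."""
    return tuple(1 if contains(k) else 0 for k in range(length))

print(number_to_bits(6, 8))                      # (0, 1, 1, 0, 0, 0, 0, 0)
print(subset_to_bits(lambda k: k % 2 == 0, 8))   # (1, 0, 1, 0, 1, 0, 1, 0)
```

The sequence for 6 is all 0's from some point on, as it is for every natural number. The sequence for the even numbers keeps producing 1's forever, so it is a vertex that no natural number reaches.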

... Wait a second, didn't the last one look familiar? Right, looks like binary classification, doesn't it? Binary classifiers are the vertices of that last hypercube. This can be generalized to R^d -> R, but I won't discuss that further. Can we do search in such a space, maybe to find a binary classifier? Of course this is not as good as an SVM or even a Neural Network, but for the heck of it: Why not build a Bounding Volume Hierarchy (Bounding Squares, that is)? Here is how I imagine such an algorithm:

Let train(i) be the i'th training data point, with the points sorted by their x value. Let f(]x,y[) be defined such that f(x') = f(]x,y[) for all x' in ]x,y[ -> we are building some sort of hierarchical structure.

1.) set f(]-∞,∞[) = 0
2.) partition(f,-∞,∞,0,n - 1)

partition(f,a,b,i,j):-
3.) if i > j, return
4.) let m = floor((i + j)/2)
5.) set f(]a,b[) = label(train(m))
6.) partition(f,a,train(m),i,m - 1)
7.) partition(f,train(m),b,m+1,j)

Building the hierarchy should take O(n) if the training points are already sorted, or O(n log n) if we have to sort them first.
This can be generalized to more than 1 dimension, but it's obvious that this yields a classifier prone to overfitting (and also, I didn't want to invest too much time in something obviously bad).
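For the curious, here is one way the 1-dimensional version could look in Python. This is only a sketch under my own assumptions: the Node class, the build and classify helpers, and the choice to store the hierarchy as a binary tree are all made up for illustration, and a query that hits a training point exactly simply gets that point's label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Node:
    x: float                        # training point that splits the interval
    label: int                      # label assigned to the whole interval
    left: Optional["Node"] = None   # hierarchy for the part left of x
    right: Optional["Node"] = None  # hierarchy for the part right of x

def partition(train: List[Tuple[float, int]], i: int, j: int) -> Optional[Node]:
    """Build the hierarchy for train[i..j] (inclusive); train is sorted by x."""
    if i > j:
        return None
    m = (i + j) // 2
    x, label = train[m]
    return Node(x, label, partition(train, i, m - 1), partition(train, m + 1, j))

def build(points: List[Tuple[float, int]]) -> Optional[Node]:
    train = sorted(points)          # the O(n log n) part
    return partition(train, 0, len(train) - 1)

def classify(root: Optional[Node], x: float, default: int = 0) -> int:
    """Return the label of the smallest interval in the hierarchy containing x."""
    label = default                 # corresponds to f(]-inf, inf[) = 0
    node = root
    while node is not None:
        label = node.label
        if x == node.x:
            break
        node = node.left if x < node.x else node.right
    return label

points = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
tree = build(points)
print(classify(tree, 1.5), classify(tree, 4.5))  # 0 1
```

Every training point ends up with its own label, which is exactly the memorizing behavior that makes this classifier overfit.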

How secure are my Windows passwords?

I just stumbled upon this Guide for cracking Passwords, through a reference on heise.de.
As you might know, passwords are most of the time (hopefully) not stored in clear text, but in the form of one-way hashes. Thus, even if an attacker got hold of a complete copy of the user data (username + password hash), this does not automatically mean he will be able to access the user's data.
Hash functions, to be secure, must have the property that, given hash(x), it is difficult to find any x that produces it. Some hash functions, however, are a little weak, or at least become weak as the computational power of PCs grows.
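A tiny Python sketch of why short passwords are a problem even with a one-way hash: the attacker cannot invert the hash, but can simply try every candidate. SHA-256 is used here only as a stand-in (it is not the hash Windows stores), and hash_password and brute_force are hypothetical helper names.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def hash_password(password: str) -> str:
    """One-way hash of a password (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def brute_force(target_hash: str, max_length: int):
    """Try every lowercase password up to max_length; return a match or None."""
    for length in range(1, max_length + 1):
        for letters in product(ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if hash_password(candidate) == target_hash:
                return candidate
    return None

stored = hash_password("cab")    # what an attacker would find on disk
print(brute_force(stored, 3))    # cab
```

With three lowercase letters there are only 26 + 26^2 + 26^3 = 18278 candidates to try. Every additional character multiplies the work, which is why long passwords hold up so much better.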

For short passwords, Windows (before Vista) seems to store such a weak hash value. The Guide above has links to the Windows articles showing how to deactivate this behavior. Also, the Guide gives a way of encrypting the hash file, making things a little more complicated for the potential attacker.

Anyway, the truth is: no one actually ever needs your Windows password to access your data. If someone has access to the PC, he/she can just boot it from a boot disk and access any data that is stored on the hard drive. Since cracking the passwords as described above would need access to the PC, we can assume the attacker has that access. In such a case, it is much easier to just bypass the OS and boot from a CD. The only protection against this is encrypting the hard drive and/or setting up a BIOS password. If you lose your password, however, you will lose your data (in the case of the encrypted hard drive) or your laptop altogether (on a desktop PC, the BIOS password can be reset, but it's a hassle). Also, if you only use a BIOS password, your data could still be read by stealing the (non-encrypted) hard drive.

Another easy way to get access to a Windows account, described in the guide above, is to reset the account's password (which is far easier than cracking it). There are programs which can do this, see the guide for details. The good news is that you can reset your password if you have forgotten it! Plus, if someone did this to you, you would notice that there was an attack.

I also found a link about hard drive password protection (which is not like Encryption, but like the BIOS-Password) here.