Friday, May 27, 2011

Solving HornSAT in linear time

Recently, I wanted to program a simple Horn-SAT solver for a project (it turned out I didn't need it, oh well).
I couldn't find anything on the net (in reasonable time) about how to solve HornSAT in linear time, only references stating that it is possible. The naive approach, however (at least the one I took), has quadratic worst-case complexity. So I thought I might spread the knowledge a bit about how to solve the Horn satisfiability problem in linear time.
See also Unit Propagation

1.) Build a directed (bipartite) graph, connecting each Horn clause to its positive literal (if it has one), and connecting each literal to all clauses in which it appears negated.
2.) [Update] For every clause c not containing a negated literal, do {
if c has no positive literal, STOP -> unsatisfiable
else propagate(c)
}

3.) Finally, the function for the unit propagation:
propagate(c) :-
if the positive literal p of c is not yet marked true {
mark p true
for all edges (p, c') in the graph {
remove p from the negated literals of c'
if c' has no negated literals left, propagate(c')
}
}

The marked literals form the minimal model of that Horn formula.

Because every literal is marked true at most once, the whole algorithm runs in linear time (with respect to the number of literals, because of the graph building).
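To make the steps above concrete, here is a minimal Java sketch of the whole algorithm. The clause encoding is my own choice, not from the post: variables are numbered 0..n-1, a positive literal v is stored as v, and a negated literal as -(v+1).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class HornSat {
    static boolean[] marked; // after a successful solve(): the minimal model

    // Returns true iff the Horn formula is satisfiable.
    static boolean solve(int numVars, int[][] clauses) {
        int n = clauses.length;
        int[] pos = new int[n];      // positive variable of clause c, or -1
        int[] negCount = new int[n]; // negated literals of c not yet removed
        marked = new boolean[numVars];
        List<List<Integer>> occursNegated = new ArrayList<>();
        for (int v = 0; v < numVars; v++) occursNegated.add(new ArrayList<>());

        // Step 1: build the bipartite graph.
        for (int c = 0; c < n; c++) {
            pos[c] = -1;
            for (int lit : clauses[c]) {
                if (lit >= 0) {
                    pos[c] = lit;                       // the (unique) positive literal
                } else {
                    negCount[c]++;
                    occursNegated.get(-lit - 1).add(c); // variable -> clause edge
                }
            }
        }

        // Step 2: start from the clauses without negated literals.
        Deque<Integer> units = new ArrayDeque<>();
        for (int c = 0; c < n; c++)
            if (negCount[c] == 0) units.add(c);

        // Step 3: unit propagation, written iteratively instead of recursively.
        while (!units.isEmpty()) {
            int c = units.remove();
            if (pos[c] == -1) return false; // clause became empty: unsatisfiable
            int p = pos[c];
            if (marked[p]) continue;        // every variable is marked at most once
            marked[p] = true;
            for (int c2 : occursNegated.get(p))
                if (--negCount[c2] == 0) units.add(c2); // p removed from c2
        }
        return true;
    }
}
```

For example, `HornSat.solve(2, new int[][]{{0}, {-1, 1}})` (the formula x0 AND (NOT x0 OR x1)) returns true and marks both variables, which is exactly the minimal model; adding the clause {-2} (NOT x1) makes the formula unsatisfiable.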

[Update]
While thinking about this some more, I figured the above algorithm is only better than the naive one if the number of literals is in O(#clauses²). For completeness, here is the algorithm I call the "naive" one:


while there exists a clause c without negated literals {
if c has no positive literal, then STOP -> formula not satisfiable;
mark the variable of the positive literal of c as true;
for all clauses c', remove the just-marked literal from the list of negated literals of c'
}

Let c be the number of clauses and l the number of literals.
With the proper implementation, the naive algorithm can run in O(min(c², cl)). This is better than the algorithm above if l is in Ω(c²).
We can easily check which case we are in by summing the sizes of the sets of negated literals, which can be done in O(c). If we wire this all together, we get an algorithm in O(min(l, c²)), as c is in O(l).
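The O(c) check can be sketched like this in Java, assuming (my own representation choice, not from the post) that each clause keeps its negated literals in a set with an O(1) size():

```java
import java.util.List;
import java.util.Set;

public class ChooseAlgorithm {
    // Returns true if the naive O(c^2) algorithm should be used,
    // i.e. if the summed number of negated literals reaches c^2.
    static boolean useNaive(List<Set<Integer>> negatedLiterals) {
        long c = negatedLiterals.size();
        long total = 0;
        for (Set<Integer> clause : negatedLiterals) {
            total += clause.size();          // one O(1) lookup per clause
            if (total >= c * c) return true; // l in Omega(c^2): naive wins
        }
        return false;                        // otherwise: graph algorithm, O(l)
    }
}
```

The early exit keeps the loop at O(c) even when the sets are large, since each clause contributes only a single size() lookup.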

Here is also a paper on linear time algorithms for Horn Sat: Link

Wednesday, May 25, 2011

What to do when you cannot post comments on blogger.com

Hi! Just a short notice:
Some people have the problem of not being able to post comments on Blogger; it actually happened to me.
The reason for this is that Blogger uses third-party cookies for commenting. I have no clue why they would do that (probably it tries to set a cookie from blogger.com or google.com while you are actually at blogspot.com).
Anyway, the solution is to accept third-party cookies. You can do that through your browser configuration.


What are third-party cookies?

Websites use third-party cookies for tracking people, to identify users when they revisit the site. This could ALSO be done without cookies, through the IP address (except with some providers, where the IP address changes repeatedly, or when someone uses Tor), but the IP address normally changes every time you connect to the Internet, so it can only be used for short-term tracking. The really interesting data comes from long-term tracking, as websites can get a feeling for how many regular visitors they really have. So third-party cookies don't harm you physically, but they do, to a certain extent, violate your privacy.

Tuesday, May 24, 2011

Family Search and Machine Learning

I watched these two videos on YouTube lately: FamilySearch Granite Mountain Vault - Part 1 and FamilySearch Granite Mountain Vault - Part 2

Apart from thinking "Man, these guys are serious about long-term storage", I was also impressed by the huge amount of handwritten documents that will be available in digital form ten years from now (or, in other words, that are becoming available more and more as we speak). I mean, billions of documents. Would that not be enough training data to train a really well-performing handwriting recognition engine?

It seems there might already be such engines, but if there are, why are they not used, let's say, as some kind of recommendation inside new.familysearch's indexing program? If there are not, oh well, here's a massive amount of to-be-indexed handwriting. This dataset could even go so far as to cover writing like old German handwriting (which fewer and fewer people are able to read) and other old scripts, and it would only be really useful if it did. Combined with high-level domain knowledge about existing cities and common names of that time, this could make up quite a tight expert system. I would never dare to let such a system do the indexing by itself, but help is always appreciated, isn't it :-)

It would actually already be a help sometimes to just have character extraction to tell the single characters apart. Oh well, this might be the most difficult part of the work the machine would have to do...

Saturday, May 21, 2011

How to access Wiktionary to check for the existence of a word

I was just listening to this great presentation on robotics and cloud computing when it struck me that I could share some simple code I wrote a while ago, with application to cloud computing (although I wouldn't call it that).

It was for a simple game I wrote in Java, where you had to find words in a character matrix (on Facebook there is a similar game called Scramble). For this to work, I needed a dictionary of valid words. I didn't know where to get such a list of words from, so I decided to let the program learn the words by itself, through user input, but also through access to online dictionaries. Here's the code:


protected boolean checkThroughWiki(String word) {
    // Needs: java.net.URL, java.net.HttpURLConnection,
    // java.net.MalformedURLException, java.io.IOException
    URL url;
    try {
        url = new URL("http://de.wiktionary.org/wiki/" + word);
    } catch (MalformedURLException e) {
        return false;
    }
    try {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.connect();
        // The page (and thus the word) exists iff the server answers 200 OK.
        return con.getResponseCode() == 200;
    } catch (IOException ex) {
        ex.printStackTrace();
        return false;
    }
}


Doesn't look too difficult, does it? Plus, it worked the last time I checked. Of course this is for the German Wiktionary, so you would need to change the URL to point to the Wiktionary edition of your choice.
All it does is check whether the page that should contain the word exists.
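Since only the status code matters, a HEAD request spares you from downloading the page body. Here is a small variant of checkThroughWiki along those lines; the method name and the language-subdomain parameter are my own additions:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class WiktionaryCheck {
    // Same idea as checkThroughWiki, but as a HEAD request:
    // only the status code is needed, so don't fetch the page body.
    static boolean exists(String lang, String word) {
        try {
            URL url = new URL("https://" + lang + ".wiktionary.org/wiki/" + word);
            HttpURLConnection con = (HttpURLConnection) url.openConnection();
            con.setRequestMethod("HEAD");
            return con.getResponseCode() == 200; // page (and thus the word) exists
        } catch (IOException e) { // covers MalformedURLException too
            return false;
        }
    }
}
```

`exists("en", "house")` should return true as long as the English Wiktionary has that entry and the network is reachable; any connection problem simply yields false.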

Voilà, now it's up to you to make a cloud translation library (or whatever) out of this. ;-)