Saturday, March 6, 2010

Java Web Browser with Gecko

For my latest project on analyzing web pages for user-actionable items, I wanted to create a customizable web browser. The idea is to have a controlled environment in which the web page is rendered. It would have been simple to do this with JavaScript that runs before the page is loaded, for example via a Firefox extension. However, I wanted to do it in the back end, in a headless application. The first step is to run a Gecko browser in Java.

This turned out to be simpler than expected. First, download swt.jar from Eclipse. This provides a simple widget framework inside which our browser is rendered. Once we have that, download the Gecko SDK (also known as the XULRunner SDK) from Mozilla. After installation, run
xulrunner.exe --register-user

Note: I tried xulrunner.exe --register-global, but it failed on my Windows 7 machine, perhaps due to user restrictions.

Now create a new Java project in Eclipse. Add swt.jar to the Java build path. Also add MozillaInterfaces.jar and MozillaGlue.jar from xulrunner-sdk\lib\. Create a simple class with the following code:


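Display display = new Display();
Shell shell = new Shell(display);
shell.setSize(800, 600);
shell.open();
Browser browser = new Browser(shell, SWT.MOZILLA);
browser.setBounds(shell.getClientArea());
browser.setUrl("http://www.google.com");
// Run the SWT event loop so the window stays open and the page can render.
while (!shell.isDisposed()) {
    if (!display.readAndDispatch()) display.sleep();
}
display.dispose();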


That is the basic code. However, it needs to be modified to handle timing issues. The entire Java file, with comments, is reproduced below. It comes from an independent source on the net, which I am currently unable to find and so cannot reference.
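The timing problem is that page loading is asynchronous: the call that starts a load returns before the page has finished, so the caller needs a way to block until a completion signal arrives or a timeout expires. The following is a minimal, SWT-free sketch of that idea using a CountDownLatch. The class name LoadLatchSketch, the method loadAndWait, and the sleeping thread that stands in for the browser are all assumptions made for illustration; they are not part of the full listing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LoadLatchSketch {

    // Simulates an asynchronous page load that takes loadMillis and
    // returns true if it completed before timeoutMillis elapsed.
    // (Hypothetical helper, not part of SWT.)
    static boolean loadAndWait(final long loadMillis, long timeoutMillis) {
        final CountDownLatch loaded = new CountDownLatch(1);

        // Stand-in for the browser: it signals completion from another
        // thread, much as a progress listener would.
        Thread loader = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(loadMillis);
                    loaded.countDown();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "fake-loader");
        loader.setDaemon(true);
        loader.start();

        try {
            // Block until the signal arrives or the timeout expires.
            return loaded.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the load as unfinished.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast page loaded: " + loadAndWait(50, 2000));
        System.out.println("slow page loaded: " + loadAndWait(2000, 50));
    }
}
```

A count of 1 is enough because each load produces a single completion signal. A fresh latch is needed for every load, since a CountDownLatch cannot be reset once it reaches zero.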

Now we just need to find a way to plug into the browser environment: get access to the DOM, execute some JavaScript, and probably write some extensions to existing DOM elements.


import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

import org.eclipse.swt.SWT;
import org.eclipse.swt.SWTError;
import org.eclipse.swt.browser.Browser;
import org.eclipse.swt.browser.ProgressEvent;
import org.eclipse.swt.browser.ProgressListener;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class SimpleBrowserWithGo {

    // The browser widget; created and used on the SWT event thread.
    Browser browser;

    // We need the SWT display to execute methods
    // on the SWT event thread.
    private Display display;

    // Latch used to manage page loading.
    // It has a count of 1: when the browser starts loading
    // a page we create a new latch, which is
    // decremented when the page has loaded.
    // Volatile because it is written by the caller of go()
    // and read on the SWT event thread.
    private volatile CountDownLatch latch;

    // Default timeout: 60 seconds (in milliseconds)
    private long defaultTimeout = 60000;

    /**
     * Creates a web browser that can load pages and wait until each page
     * is completely loaded.
     */
    public SimpleBrowserWithGo() {

        // Use a latch to wait for the browser initialization.
        final CountDownLatch initLatch = new CountDownLatch(1);

        // The Mozilla browser needs a window manager to work. We are using
        // SWT for the graphical interface, so browser methods must be
        // executed on the SWT event thread. Calling them from another
        // thread may throw an exception, breaking the execution flow
        // and crashing our application.
        new Thread("SWT-Event-Thread") {
            @Override
            public void run() {

                display = new Display();
                Shell shell = new Shell(display);

                shell.setSize(800, 600);
                shell.open();

                // SWT.MOZILLA asks SWT for a Gecko (XULRunner) based
                // browser. This relies on the XULRunner registered
                // earlier with --register-user; if none can be found,
                // the constructor throws an SWTError.
                try {
                    browser = new Browser(shell, SWT.MOZILLA);
                } catch (SWTError e) {
                    System.out.println("Could not instantiate Browser: "
                            + e.getMessage());
                    e.printStackTrace();
                    // Nothing else can work without a browser. Exit rather
                    // than leave the constructor waiting on initLatch forever.
                    System.exit(1);
                }

                // Adapt browser size to shell size
                browser.setBounds(shell.getClientArea());

                // Listen for page loading status.
                browser.addProgressListener(new ProgressListener() {
                    public void changed(ProgressEvent event) {
                    }

                    public void completed(ProgressEvent event) {
                        // When a page is loaded, decrement the latch,
                        // whose count will be 0 after this call.
                        // The latch is null until go() is first called.
                        CountDownLatch current = latch;
                        if (current != null) {
                            current.countDown();
                        }
                    }
                });

                // Release the initialization latch, which has value 1,
                // so after this call its value will be 0.
                initLatch.countDown();

                while (!shell.isDisposed()) {
                    if (!display.readAndDispatch()) {
                        display.sleep();
                    }
                }

                System.exit(0);
            }
        }.start();

        try {
            // Wait until the initialization latch is released.
            initLatch.await();
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can see it.
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Loads a URL into the browser and waits until the page is fully loaded.
     *
     * @param url the address of the page to load
     * @throws IOException if the page does not finish loading within
     *         the default timeout
     */
    public void go(final String url) throws IOException {

        // Create a latch with count 1
        latch = new CountDownLatch(1);

        // Use the SWT event thread to execute the method that
        // loads a URL in the browser.
        display.syncExec(new Runnable() {
            public void run() {
                browser.setUrl(url);
            }
        });

        // Wait for the page to finish loading, or for the timeout
        // to expire if loading doesn't finish in a reasonable time.
        boolean timeout = waitLoad(defaultTimeout);
        if (timeout) {
            throw new IOException("Timed out waiting for the page to load.");
        }

    }

    private boolean waitLoad(long millis) {
        try {
            // Use the latch created by 'go' to wait for the page to
            // finish loading (which happens when our ProgressListener
            // receives a 'completed' event), or for the timeout to
            // expire if loading doesn't finish in a reasonable time.
            boolean timeout;
            timeout = !latch.await(millis, TimeUnit.MILLISECONDS);

            if (timeout) {
                // If the timeout expired, stop loading the page.
                display.syncExec(new Runnable() {
                    public void run() {
                        browser.stop();
                    }
                });
                // Wait until loading has stopped
                latch.await(millis, TimeUnit.MILLISECONDS);
            }
            return timeout;
        } catch (InterruptedException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {

        // Instantiate our simple web browser
        SimpleBrowserWithGo simpleBrowser = new SimpleBrowserWithGo();

        try {
            // Use the new functionality to load some URLs
            // with our browser.
            // simpleBrowser.go("http://www.google.com");
            // Thread.sleep(3000);
            // simpleBrowser.go("http://www.urjc.es");
            // Thread.sleep(3000);
            simpleBrowser.go("http://www.mozilla.org");
            Thread.sleep(3000);
            // Keep the window open until Enter is pressed.
            System.in.read();
        } catch (IOException e) {
            System.err.println("Problems calling go method.");
            e.printStackTrace();
        } catch (InterruptedException e) {
            System.err.println("Problems calling sleep.");
            e.printStackTrace();
            // Restore the interrupt flag instead of clearing it.
            Thread.currentThread().interrupt();
        }

        // Terminate the JVM, including the SWT event thread.
        Runtime.getRuntime().halt(0);

    }

}
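
As a starting point for the JavaScript step mentioned earlier, SWT's Browser class provides execute(String) and evaluate(String) for running a script in the loaded page. The sketch below only builds such a script, so it runs without SWT. The class name ActionableItemsScript, the method countScript, and the choice of selectors are hypothetical, and whether document.querySelectorAll is available depends on the Gecko version in use.

```java
class ActionableItemsScript {

    // Builds a script that counts the elements matching any of the given
    // CSS selectors. (Hypothetical helper for illustration.)
    static String countScript(String... selectors) {
        StringBuilder joined = new StringBuilder();
        for (String selector : selectors) {
            if (joined.length() > 0) {
                joined.append(", ");
            }
            joined.append(selector);
        }
        // Escape backslashes and single quotes so a selector cannot
        // break out of the JavaScript string literal.
        String escaped = joined.toString()
                .replace("\\", "\\\\")
                .replace("'", "\\'");
        // evaluate(String) expects the script to return its result.
        return "return document.querySelectorAll('" + escaped + "').length;";
    }

    public static void main(String[] args) {
        String script = countScript("a[href]", "button", "input");
        // Prints: return document.querySelectorAll('a[href], button, input').length;
        System.out.println(script);

        // Assumed use on the SWT event thread, e.g. inside
        // display.syncExec(...), after go() has returned:
        //     Object count = browser.evaluate(script);
    }
}
```

Counting links, buttons and inputs is only one guess at what "user-actionable items" could mean; the same pattern applies to any script that returns a value.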
