<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Codethink &#187; java</title>
	<atom:link href="https://codethink.no-ip.org/tags/java/feed" rel="self" type="application/rss+xml" />
	<link>https://codethink.no-ip.org</link>
	<description>A blog about coding, life, and other arbitrary topics</description>
	<lastBuildDate>Sun, 15 Mar 2026 21:30:15 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.29</generator>
	<item>
		<title>Matchbook &#8211; Multi-platform Realtime Gaming</title>
		<link>https://codethink.no-ip.org/archives/1046</link>
		<comments>https://codethink.no-ip.org/archives/1046#comments</comments>
		<pubDate>Sun, 26 Jan 2014 13:56:58 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[objective-c]]></category>
		<category><![CDATA[software]]></category>
		<category><![CDATA[framework]]></category>
		<category><![CDATA[objc]]></category>
		<category><![CDATA[open-source]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=1046</guid>
		<description><![CDATA[Have you ever thought that it would be cool if you could build a cross-platform or multi-platform game and connect from one platform to another without having to do all the heavy lifting with respect to matchmaking, communications, and related &#8230; <a href="https://codethink.no-ip.org/archives/1046">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Have you ever thought that it would be cool if you could build a cross-platform or multi-platform game and connect from one platform to another without having to do all the heavy lifting with respect to matchmaking, communications, and related tasks yourself?  Well now you can, thanks to <a href="https://github.com/adam-roth/matchbook" target="_blank">Matchbook</a>.</p>
<div id="attachment_1049" style="width: 650px" class="wp-caption aligncenter"><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2014/01/running_small.jpg" rel="lightbox[1046]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2014/01/running-1200x721.jpg" alt="Matchbook Example &quot;Game&quot;" title="Matchbook Example &quot;Game&quot;" width="640" height="384" class="size-large wp-image-1049" /></a><p class="wp-caption-text">Matchbook Example &quot;Game&quot;, 2x iOS and 2x Android clients</p></div>
<p>Matchbook is a lightweight and platform-agnostic matchmaking solution, intended for use in mobile applications (think games that require near-real-time, relatively-low-latency, persistent communications between two or more client devices).</p>
<p>At its core is a server component, which provides a JSON-based webservice allowing clients to find, create, and join matches. The server also acts as a proxy/relay when necessary, allowing client devices to tunnel through any firewalls that might exist between them.</p>
<p>In addition to the server component, Matchbook includes prebuilt SDKs for both Java and Objective-C. These SDKs are intended to support the development of native applications that use the Matchbook webservice on Android and iOS devices, respectively.</p>
<p>I could go on at length, but it&#8217;s simpler to just link to the project on GitHub (Matchbook is open-source and permissively licensed, naturally):</p>
<p><a href="https://github.com/adam-roth/matchbook" target="_blank">https://github.com/adam-roth/matchbook</a></p>
<p>Note that Google is currently building comparable functionality into Google Play, although their realtime communications API is only available on Android for now (iOS support is under development).</p>
<p>And although Apple already has realtime and turn-based gaming APIs for iOS, they naturally have no intention of inviting Android devices to the party.  Ever.</p>
<p>So let the record show that Matchbook got there first.</p>
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/1046/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Resurrecting sun.misc.Unsafe</title>
		<link>https://codethink.no-ip.org/archives/712</link>
		<comments>https://codethink.no-ip.org/archives/712#comments</comments>
		<pubDate>Sat, 09 Jul 2011 15:21:48 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[hack]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=712</guid>
		<description><![CDATA[Here&#8217;s one that only the hardcore Java hackers will enjoy. Perhaps you are already familiar with sun.misc.Unsafe. This heavily protected internal class provides access to a number of low-level memory operations and system functions that are generally hidden away from &#8230; <a href="https://codethink.no-ip.org/archives/712">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s one that only the hardcore Java hackers will enjoy.  Perhaps you are already familiar with <a href="http://www.docjar.com/docs/api/sun/misc/Unsafe.html" target="_blank">sun.misc.Unsafe</a>.  This heavily protected internal class provides access to a number of low-level memory operations and system functions that are generally hidden away from user-code executing in the JRE.  If you&#8217;ve never heard of it before (or even if you have) then I encourage you to read this <a href="http://javapapers.com/core-java/address-of-a-java-object/" target="_blank">article</a> to get an idea about the sort of things that can be done using &#8216;<em>Unsafe</em>&#8216;.  Note that we are talking about very low-level operations here.  Operations where if you make a mistake, you won&#8217;t just get a simple &#8216;<em>Exception</em>&#8216; complaining that you did something bad.  Instead you&#8217;re liable to bring the entire JVM to a crashing halt if you do something wrong (it&#8217;s called &#8216;<em>Unsafe</em>&#8216; for a reason, after all), so you should proceed only if you&#8217;re sure you know what you are doing.</p>
<p>Also note that the method described in the <a href="http://javapapers.com/core-java/address-of-a-java-object/" target="_blank">other article</a> for getting your hands on an instance of &#8216;<em>Unsafe</em>&#8216; no longer works.  With the latest JDK, the following code will not compile (merely trying to <em>import</em> sun.misc.Unsafe produces a compiler error):</p>
<pre class="brush: java; title: ; notranslate">//assumes:  import java.lang.reflect.Field;  import sun.misc.Unsafe;  (the latter is what the compiler rejects)
private static Unsafe getUnsafeInstance() throws SecurityException, NoSuchFieldException, IllegalArgumentException,
        IllegalAccessException {
    Field theUnsafeInstance = Unsafe.class.getDeclaredField(&quot;theUnsafe&quot;);
    theUnsafeInstance.setAccessible(true);
    return (Unsafe) theUnsafeInstance.get(Unsafe.class);
}</pre>
<p>Here is a variant that still works:</p>
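<pre class="brush: java; title: ; notranslate">	//requires:  import java.lang.reflect.Field;
	public static Object getUnsafe() {
		try {
			//look the class up by name, so sun.misc.Unsafe never appears in an import or a type declaration
			Class&lt;?&gt; unsafeClass = Class.forName(&quot;sun.misc.Unsafe&quot;);
			Field unsafeField = unsafeClass.getDeclaredField(&quot;theUnsafe&quot;);
			unsafeField.setAccessible(true);
			
			//'theUnsafe' is a static field, so no target instance is needed
			return unsafeField.get(null);
		}
		catch (Exception e) {
			//Unsafe is unavailable or inaccessible on this JVM
			return null;
		}
	}</pre>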
<p>Note that as importing sun.misc.Unsafe now causes a compiler error, it is only possible to return an Object reference to the &#8216;<em>Unsafe</em>&#8216; instance.  This means that any method that you want to call on it must be done through reflection, which is tedious at best.  Luckily, I&#8217;ve taken care of this tedious bit for you, and created a wrapper that provides the same public API as sun.misc.Unsafe and uses reflection to delegate method calls to the real &#8216;<em>Unsafe</em>&#8216; instance.  The source code for this utility is quite long and not very interesting, so I&#8217;m not going to post it here.  Instead you can use this <a href="http://aroth.no-ip.org/UnsafeUtil.java">direct link</a> to download a copy.</p>
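<p>The linked utility is not reproduced here, but the delegation technique it relies on can be sketched in a few lines.  The following is a minimal, hypothetical example (the class name MiniUnsafeWrapper and its small selection of methods are illustrative assumptions, not the actual contents of UnsafeUtil.java): it obtains the Unsafe instance as a plain Object and forwards each call by looking up the matching public method at runtime.  It assumes a JDK where sun.misc.Unsafe is still present; newer JDKs deprecate its memory-access methods and may warn about them at runtime.</p>

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

/**
 * Hypothetical, minimal reflection-based wrapper around sun.misc.Unsafe.
 * Only a handful of methods are delegated; a full wrapper would mirror the whole API.
 */
public class MiniUnsafeWrapper {
    private final Object unsafe; // the real sun.misc.Unsafe instance, held only as an Object

    public MiniUnsafeWrapper() {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            this.unsafe = field.get(null); // static field, so no receiver is needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not available", e);
        }
    }

    // Looks up a public method by name and parameter types, then invokes it on the Unsafe instance
    private Object call(String name, Class<?>[] types, Object... args) {
        try {
            Method method = unsafe.getClass().getMethod(name, types);
            return method.invoke(unsafe, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Reflective call to " + name + " failed", e);
        }
    }

    public int addressSize() {
        return (Integer) call("addressSize", new Class<?>[0]);
    }

    public long allocateMemory(long bytes) {
        return (Long) call("allocateMemory", new Class<?>[] {long.class}, bytes);
    }

    public void putLong(long address, long value) {
        call("putLong", new Class<?>[] {long.class, long.class}, address, value);
    }

    public long getLong(long address) {
        return (Long) call("getLong", new Class<?>[] {long.class}, address);
    }

    public void freeMemory(long address) {
        call("freeMemory", new Class<?>[] {long.class}, address);
    }
}
```

<p>Each call here pays for a reflective method lookup; caching the Method objects in fields would be the obvious optimization if the wrapper ends up on a hot path.</p>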
<p>Here is a simple example of how to use this class to perform some basic tasks:</p>
<pre class="brush: java; title: ; notranslate">		System.out.println(&quot;Testing sun.misc.Unsafe...&quot;);
		double[] averages = {-777.0, -777.0, -777.0};
		UnsafeUtil util = new UnsafeUtil();
		System.out.println(&quot;addressSize=&quot; + util.addressSize() + &quot;, pageSize=&quot; + util.pageSize());
		
		long memPointer = util.allocateMemory(1024);
		long memPointer2 = util.allocateMemory(1024);
		System.out.println(&quot;1K memory blocks at addresses:  &quot; + memPointer + &quot;, &quot; + memPointer2);
		
		//valid copy
		util.copyMemory(memPointer, memPointer2, 1024);
		
		//invalid copy
		//util.copyMemory(memPointer, memPointer2, 1024000);
		
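		int result = util.getLoadAverage(averages, 1);
		System.out.println(&quot;getLoadAverage:  Result=&quot; + result + &quot;, averages=&quot; + averages[0] + &quot;, &quot; + averages[1] + &quot;, &quot; + averages[2]);
		
		//memory from allocateMemory() is not garbage-collected, so release it explicitly
		util.freeMemory(memPointer);
		util.freeMemory(memPointer2);</pre>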
<p>Do note that if you uncomment the &#8220;invalid copy&#8221; line then running this code will very likely crash your JVM.  As noted above, you should use this utility with caution, particularly if you are running it in an environment shared by other Java code.</p>
<p>And why do all this when Sun (now Oracle) clearly does not want people mucking around with this class?  Because ultimately power belongs in the hands of developers.  All developers, not just the select few chosen to work on privileged system code.</p>
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/712/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>[Android] Installing Adobe AIR on the Android Emulator</title>
		<link>https://codethink.no-ip.org/archives/699</link>
		<comments>https://codethink.no-ip.org/archives/699#comments</comments>
		<pubDate>Sun, 03 Jul 2011 13:30:57 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[configuration]]></category>
		<category><![CDATA[AIR]]></category>
		<category><![CDATA[android]]></category>
		<category><![CDATA[emulator]]></category>
		<category><![CDATA[java]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=699</guid>
		<description><![CDATA[I&#8217;ve got no idea why this process is so poorly documented, nor why many of the existing resources describing how to install the AIR runtime on the Android emulator are so needlessly circuitous, pointing you to links on the official &#8230; <a href="https://codethink.no-ip.org/archives/699">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve got no idea why this process is so poorly documented, nor why many of the existing resources describing how to install the AIR runtime on the Android emulator are so needlessly circuitous, pointing you to links on the official Adobe site that have moved or no longer exist.  </p>
<p>But suffice it to say, if you want to install the Adobe AIR runtime on an emulated Android device for testing or development purposes without wading through a ton of fuss and nonsense, you have two basic options.  The first is described briefly <a href="http://renaun.com/blog/2010/12/finding-the-air-for-android-emulator-runtime/" target="_blank">here</a>, and involves installing the latest Flash/Flex/AIR SDK from the <a href="http://www.adobe.com/products/air/sdk/" target="_blank">official website</a> and then grabbing the AIR runtime package from &#8216;<em>&lt;AIR_SDK_ROOT&gt;/runtimes/air/android/emulator/Runtime.apk</em>&#8216;.  This file is the AIR runtime that must be installed on the emulator for any apps built with Adobe AIR to function.  This method works fine if you don&#8217;t mind downloading and installing the entire AIR SDK (90 MB) just to grab this one file.</p>
<p>Your other option is to use this <a href="http://aroth.no-ip.org/com.adobe.air.emulator.2.7.apk">direct link</a> to download just the AIR runtime package (6.1 MB).  Note that this is version 2.7 of the AIR runtime, and it has been tested only with an emulator running Android 3.1.  I cannot vouch for it working with any other configuration.</p>
<p>In either case, once you have gotten hold of the runtime .apk file, installing it on your emulator is a relatively simple process.  All you need is the &#8216;<em>adb</em>&#8216; utility that is included in the &#8220;Android SDK Platform Tools&#8221; package (note that this package is not the same as the similarly named &#8220;Android SDK Tools&#8221; package).  If you don&#8217;t have this package installed yet, use your Android SDK manager to install it.  Then simply navigate to &#8216;<em>&lt;ANDROID_SDK_ROOT&gt;/platform-tools</em>&#8216; and run the following command (while your Android emulator is running):</p>
<pre class="brush: plain; title: ; notranslate">
adb install /path/to/your/AIR/runtime.apk
</pre>
<p>This will install the AIR runtime on your emulated device, and you can now install and run AIR-based Android apps on your emulator.  Quite simple, really.  Which only doubles my confusion with respect to why this simple process seems to be so poorly documented online.</p>
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/699/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>[Java] Quarantining Request Parameters</title>
		<link>https://codethink.no-ip.org/archives/663</link>
		<comments>https://codethink.no-ip.org/archives/663#comments</comments>
		<pubDate>Sat, 11 Jun 2011 23:21:41 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[servlet]]></category>
		<category><![CDATA[filter]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=663</guid>
		<description><![CDATA[This is a follow-up on an earlier post in which I described a method for modifying HTTP request parameters on the fly in a servlet/web application. I mentioned that the technique could be used to provide a filter that automatically &#8230; <a href="https://codethink.no-ip.org/archives/663">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This is a follow-up on an <a href="http://codethink.no-ip.org/wordpress/archives/634">earlier post</a> in which I described a method for modifying HTTP request parameters on the fly in a servlet/web application.  I mentioned that the technique could be used to provide a filter that automatically quarantines any potentially unsafe parameters so that application code/business logic does not need to worry about such things as XSS and SQL injection attacks, but I didn&#8217;t provide any complete reference code for such a filter.  So today I aim to rectify that omission.  </p>
<p>As this example code builds upon the <a href="http://codethink.no-ip.org/wordpress/archives/634">OverridableHttpRequest</a> class discussed in the previous post, you will need that class (or your own comparable implementation) before you get started.  Once you have that, the code for the filter implementation is not very complex:</p>
<pre class="brush: java; title: ; notranslate">import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

import org.apache.log4j.Logger;

public class InputSanitizerFilter implements Filter {
	private static final Logger LOG = Logger.getLogger(InputSanitizerFilter.class);
	
	//matches any single character that is NOT a letter, digit, allowed punctuation, space, or tab
	private static final Pattern BANNED_INPUT_CHARS = Pattern.compile(&quot;[^a-zA-Z0-9\\@\\'\\,\\.\\/\\(\\)\\+\\=\\-\\_\\[\\]\\{\\}\\^\\!\\*\\&amp;\\%\\$\\:\\;\\? \\t]&quot;);
	
	public static final String QUARANTINE_ATTRIBUTE_NAME = &quot;filter.quarantined.params&quot;;
	public static final String SUSPICIOUS_REQUEST_FLAG_NAME = &quot;filter.suspicious.request&quot;;

	@Override
	public void destroy() {
		//no work necessary
	}

	@Override
	@SuppressWarnings(&quot;unchecked&quot;)
	public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
		//wrap the original request and set up an empty quarantine map
		OverridableHttpRequest newRequest = new OverridableHttpRequest((HttpServletRequest)request);
		Map&lt;String, String&gt; quarantine = new HashMap&lt;String, String&gt;();
		newRequest.setAttribute(QUARANTINE_ATTRIBUTE_NAME, quarantine);
		
		//inspect every value of every parameter, and move any suspicious ones into the quarantine area
		Enumeration&lt;String&gt; names = request.getParameterNames();
		while (names.hasMoreElements()) {
			String name = names.nextElement();
			for (String value : request.getParameterValues(name)) {
				//find() flags the value if a banned character appears anywhere in it, including on a later line
				if (BANNED_INPUT_CHARS.matcher(value).find()) {
					//uh-oh, found something that doesn't look right, quarantine it and make sure the request is flagged as suspicious
					LOG.warn(&quot;Removing potentially malicious parameter from request:  &quot; + name);
					quarantine.put(name, value);
					newRequest.removeParameter(name);
					newRequest.setAttribute(SUSPICIOUS_REQUEST_FLAG_NAME, &quot;true&quot;);
					break;
				}
			}
		}
		
		//done, send the modified request on down the chain
		chain.doFilter(newRequest, response);
	}

	@Override
	public void init(FilterConfig arg0) throws ServletException {
		//no work necessary
	}
}</pre>
<p>Basically there is a regex that defines the set of disallowed characters (anything that is not a letter, number, standard punctuation character, or whitespace&#8230;this example will also exclude any parameters that contain newlines as the webapp that it is used in does not contain any inputs that can accept newlines, but you can obviously modify it to your liking), and every parameter in the request is inspected to see if it contains one or more banned character(s).  Any parameter that is found to include banned characters is copied into the &#8216;<em>quarantine</em>&#8216; map, and then removed as a parameter (so calling &#8216;<em>request.getParameter(&#8220;badParam&#8221;)</em>&#8216; will return <em>null</em>).  This will prevent application code from ever seeing the suspect parameter(s), unless it explicitly goes looking inside of the quarantine map (which you might do if you want to allow users to have special characters in their password, for instance).</p>
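<p>The character test itself can be tried out without a servlet container.  As a minimal sketch (BannedCharCheck is a hypothetical name, and the whitelist is copied from the filter), the allowed set can be written as a single negated character class and applied with Matcher.find(), which reports a hit if even one disallowed character occurs anywhere in the value.  This form is arguably safer than String.matches() with a leading and trailing .*, because . does not match line terminators unless Pattern.DOTALL is set, which makes the behaviour of such patterns on multi-line input easy to get wrong.</p>

```java
import java.util.regex.Pattern;

public class BannedCharCheck {
    // Any single character NOT in this whitelist counts as "banned"
    private static final Pattern BANNED_CHAR = Pattern.compile(
        "[^a-zA-Z0-9\\@\\'\\,\\.\\/\\(\\)\\+\\=\\-\\_\\[\\]\\{\\}\\^\\!\\*\\&\\%\\$\\:\\;\\? \\t]");

    // find() succeeds as soon as one banned character appears anywhere in the value,
    // so newlines (which are not whitelisted) are always caught
    public static boolean containsBannedChar(String value) {
        return value != null && BANNED_CHAR.matcher(value).find();
    }
}
```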
<p>Also, if one or more parameters are quarantined, the request is flagged as suspicious by setting an attribute named &#8216;<em>filter.suspicious.request</em>&#8216;.  This gives code downstream from the filter a quick way to see if the request has been modified by the filter in any way, and allows the application to make its own decisions about whether or not it wants to venture into the quarantine area to inspect the potentially unsafe data.  For instance:</p>
<pre class="brush: java; title: ; notranslate">if (request.getAttribute(InputSanitizerFilter.SUSPICIOUS_REQUEST_FLAG_NAME) != null) {
    @SuppressWarnings(&quot;unchecked&quot;)
    Map&lt;String, String&gt; quarantine = (Map&lt;String, String&gt;)request.getAttribute(InputSanitizerFilter.QUARANTINE_ATTRIBUTE_NAME);
    for (Map.Entry&lt;String, String&gt; entry : quarantine.entrySet()) {
        System.out.println(&quot;Quarantined parameter:  name=&quot; + entry.getKey() + &quot;, value=&quot; + entry.getValue());
    }
}</pre>
<p>&#8230;or, with the right frameworks in place, you could create something like an &#8216;<em>@IgnoresQuarantine</em>&#8216; annotation that is applied selectively to business methods to exempt a parameter (or set of parameters) from quarantine on calls to that method.  Once the annotation and its backing logic are implemented, you would never need to go looking through the quarantine area by hand.  But that&#8217;s slightly outside the scope for today.</p>
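<p>Purely as an illustration of that idea, the core lookup might look something like this.  Everything here is hypothetical: @IgnoresQuarantine, QuarantineSupport, and AccountService are made-up names, and wiring the lookup into a real dispatch layer (working out which Method is about to handle the request, and fetching the quarantine map from the filter.quarantined.params request attribute) is left to whatever framework you use.</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Map;

public class QuarantineSupport {

    // Hypothetical annotation: names the parameters a business method may read from quarantine
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface IgnoresQuarantine {
        String[] value();
    }

    // Returns the quarantined value of 'name' only if 'method' explicitly opts in; null otherwise
    public static String quarantinedValueFor(Method method, String name, Map<String, String> quarantine) {
        IgnoresQuarantine annotation = method.getAnnotation(IgnoresQuarantine.class);
        if (annotation == null || quarantine == null) {
            return null;
        }
        return Arrays.asList(annotation.value()).contains(name) ? quarantine.get(name) : null;
    }

    // Example business methods: registration may accept unusual passwords, search may not
    public static class AccountService {
        @IgnoresQuarantine({"password"})
        public void register() { }

        public void search() { }
    }
}
```

<p>Keeping the opt-in on the method (rather than on the parameter name globally) means a value quarantined for one endpoint stays hidden from every other endpoint by default.</p>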
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/663/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>[Java] Override HTTP Request Parameters</title>
		<link>https://codethink.no-ip.org/archives/634</link>
		<comments>https://codethink.no-ip.org/archives/634#comments</comments>
		<pubDate>Sun, 15 May 2011 07:25:39 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[hack]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=634</guid>
		<description><![CDATA[Often in a Java web-application I come across cases where it would be useful to directly override or modify one or more HTTP request parameters. To be clear, by &#8220;request parameter&#8221; I am referring to the value that is returned &#8230; <a href="https://codethink.no-ip.org/archives/634">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Often in a Java web-application I come across cases where it would be useful to directly override or modify one or more HTTP request parameters.  To be clear, by &#8220;request parameter&#8221; I am referring to the value that is returned by the <a href="http://tomcat.apache.org/tomcat-5.5-doc/servletapi/javax/servlet/ServletRequest.html" target="_blank">ServletRequest</a>&#8216;s &#8216;<em>getParameter()</em>&#8216; method.  For whatever reason the architects of the Servlet spec decided that direct modification of request parameters was not to be supported, despite the number of use-cases that can benefit from such a feature.</p>
<p>For example, say that you have a <a href="http://download.oracle.com/javaee/5/api/javax/servlet/Filter.html" target="_blank">Filter</a> that you are using to sanitize input parameters and guard against things like <a href="http://en.wikipedia.org/wiki/Cross-site_scripting" target="_blank">XSS attacks</a> by ensuring that obviously invalid values like &#8220;&lt;script&gt;alert(&#8216;hacked!&#8217;);&lt;/script&gt;&#8221; are filtered out.  Wouldn&#8217;t it be great if your Filter could, upon finding one or more forbidden values, flag the request as potentially malicious (using a request attribute), remove any parameters that contain potentially unsafe data, and relocate that data to a predetermined quarantine area (also accessible via a request attribute)?  This would protect your webapp code from ever receiving a malicious parameter, while still letting code that legitimately accepts such values retrieve them from the quarantine area if desired.  (For instance, there is no reason to prevent a user from registering with a password of &#8220;&lt;script&gt;alert(&#8216;hacked!&#8217;);&lt;/script&gt;&#8221; if that is what they want to use, particularly if you are hashing user passwords like you should be.)</p>
<p>Of course, two-thirds of the functionality described above can be implemented without being able to override request parameters.  With the standard API you can certainly check for potentially malicious parameters, set an attribute if you find any, and copy their values into a quarantine area.  But what you cannot do is remove the parameters from the request, so any application code that directly accesses a request parameter value may still be at risk, particularly if its author forgets to check to see if the request has been flagged as suspect.  The real beauty of being able to override parameter values is that you can do things like completely prevent a malicious parameter from ever being visible to your application code unless your application code goes out of its way to look for it (and if you do that, and you do it incorrectly, then that&#8217;s your own fault).  </p>
<p>Anyway, the code to enable this kind of functionality is a fairly straightforward (if tedious) exercise in writing an HttpServletRequest wrapper, overriding a few choice methods, and adding a couple of new ones:</p>
<pre class="brush: java; title: ; notranslate">import java.io.BufferedReader;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.security.Principal;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

import javax.servlet.RequestDispatcher;
import javax.servlet.ServletInputStream;
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

public class OverridableHttpRequest implements HttpServletRequest {
        
	private HttpServletRequest wrappedRequest;
	private Map&lt;String, String&gt; newParams;
	private Set&lt;String&gt; removedParams;

	public OverridableHttpRequest(HttpServletRequest requestToWrap) {
		this.wrappedRequest = requestToWrap;
		this.newParams = new HashMap&lt;String, String&gt;();
		this.removedParams = new HashSet&lt;String&gt;();
	}

	// these things we add so that params can be overridden
	public void setParameter(String name, String value) {
		this.removedParams.remove(name);
		this.newParams.put(name, value);
	}

	public void removeParameter(String name) {
		this.newParams.remove(name);
		this.removedParams.add(name);
	}

	// these things we need to override so that the correct state is exposed through the standard API
	@SuppressWarnings(&quot;rawtypes&quot;)
	@Override
	public Enumeration getParameterNames() {
		Set&lt;String&gt; result = new HashSet&lt;String&gt;();
		Enumeration requestParams = this.wrappedRequest.getParameterNames();
		while (requestParams.hasMoreElements()) {
			Object param = requestParams.nextElement();
			if (!removedParams.contains(param)) {
				result.add((String) param);
			}
		}
		result.addAll(newParams.keySet());

		return Collections.enumeration(result);
	}

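	@Override
	public String[] getParameterValues(String arg0) {
		//removed parameters must look absent, which the spec expresses as null
		if (removedParams.contains(arg0)) {
			return null;
		}
		if (newParams.containsKey(arg0)) {
			return new String[] {newParams.get(arg0)};
		}
		//untouched parameters keep all of their original values
		return this.wrappedRequest.getParameterValues(arg0);
	}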

	@Override
	public String getParameter(String arg0) {
		if (removedParams.contains(arg0)) {
			return null;
		}
		if (newParams.containsKey(arg0)) {
			return newParams.get(arg0);
		}
		return this.wrappedRequest.getParameter(arg0);
	}

	@SuppressWarnings({&quot;rawtypes&quot;, &quot;unchecked&quot;})
	@Override
	public Map getParameterMap() {
		//start from a copy of the original parameters, so the wrapped request is never modified
		Map&lt;String, String[]&gt; result = new HashMap&lt;String, String[]&gt;(this.wrappedRequest.getParameterMap());
		//overridden parameters replace (or add to) the originals
		for (Map.Entry&lt;String, String&gt; entry : this.newParams.entrySet()) {
			result.put(entry.getKey(), new String[] {entry.getValue()});
		}
		//removed parameters disappear entirely
		for (String key : this.removedParams) {
			result.remove(key);
		}
		
		//the Servlet spec describes this map as immutable
		return Collections.unmodifiableMap(result);
	}

	// these things we should probably override but don't right now
	@Override
	public String getRequestURI() {
		// FIXME: should return a modified URI based upon current state
		return this.wrappedRequest.getRequestURI();
	}

	@Override
	public StringBuffer getRequestURL() {
		// FIXME: should return a modified URL based upon current state
		return this.wrappedRequest.getRequestURL();
	}

	@Override
	public String getQueryString() {
		// FIXME: should return a modified String based upon current state
		return this.wrappedRequest.getQueryString();
	}

	// everything else just passes through
	@Override
	public Object getAttribute(String arg0) {
		return this.wrappedRequest.getAttribute(arg0);
	}

	@SuppressWarnings(&quot;rawtypes&quot;)
	@Override
	public Enumeration getAttributeNames() {
		return this.wrappedRequest.getAttributeNames();
	}

	@Override
	public String getCharacterEncoding() {
		return this.wrappedRequest.getCharacterEncoding();
	}

	@Override
	public int getContentLength() {
		return this.wrappedRequest.getContentLength();
	}

	@Override
	public String getContentType() {
		return this.wrappedRequest.getContentType();
	}

	@Override
	public ServletInputStream getInputStream() throws IOException {
		return this.wrappedRequest.getInputStream();
	}

	@Override
	public String getLocalAddr() {
		return this.wrappedRequest.getLocalAddr();
	}

	@Override
	public String getLocalName() {
		return this.wrappedRequest.getLocalName();
	}

	@Override
	public int getLocalPort() {
		return this.wrappedRequest.getLocalPort();
	}

	@Override
	public Locale getLocale() {
		return this.wrappedRequest.getLocale();
	}

	@SuppressWarnings(&quot;rawtypes&quot;)
	@Override
	public Enumeration getLocales() {
		return this.wrappedRequest.getLocales();
	}

	@Override
	public String getProtocol() {
		return this.wrappedRequest.getProtocol();
	}

	@Override
	public BufferedReader getReader() throws IOException {
		return this.wrappedRequest.getReader();
	}

	@SuppressWarnings(&quot;deprecation&quot;)
	@Override
	public String getRealPath(String arg0) {
		return this.wrappedRequest.getRealPath(arg0);
	}

	@Override
	public String getRemoteAddr() {
		return this.wrappedRequest.getRemoteAddr();
	}

	@Override
	public String getRemoteHost() {
		return this.wrappedRequest.getRemoteHost();
	}

	@Override
	public int getRemotePort() {
		return this.wrappedRequest.getRemotePort();
	}

	@Override
	public RequestDispatcher getRequestDispatcher(String arg0) {
		return this.wrappedRequest.getRequestDispatcher(arg0);
	}

	@Override
	public String getScheme() {
		return this.wrappedRequest.getScheme();
	}

	@Override
	public String getServerName() {
		return this.wrappedRequest.getServerName();
	}

	@Override
	public int getServerPort() {
		return this.wrappedRequest.getServerPort();
	}

	@Override
	public boolean isSecure() {
		return this.wrappedRequest.isSecure();
	}

	@Override
	public void removeAttribute(String arg0) {
		this.wrappedRequest.removeAttribute(arg0);
	}

	@Override
	public void setAttribute(String arg0, Object arg1) {
		this.wrappedRequest.setAttribute(arg0, arg1);
	}

	@Override
	public void setCharacterEncoding(String arg0)
			throws UnsupportedEncodingException {
		this.wrappedRequest.setCharacterEncoding(arg0);
	}

	@Override
	public String getAuthType() {
		return this.wrappedRequest.getAuthType();
	}

	@Override
	public String getContextPath() {
		return this.wrappedRequest.getContextPath();
	}

	@Override
	public Cookie[] getCookies() {
		return this.wrappedRequest.getCookies();
	}

	@Override
	public long getDateHeader(String arg0) {
		return this.wrappedRequest.getDateHeader(arg0);
	}

	@Override
	public String getHeader(String arg0) {
		return this.wrappedRequest.getHeader(arg0);
	}

	@SuppressWarnings(&quot;rawtypes&quot;)
	@Override
	public Enumeration getHeaderNames() {
		return this.wrappedRequest.getHeaderNames();
	}

	@SuppressWarnings(&quot;rawtypes&quot;)
	@Override
	public Enumeration getHeaders(String arg0) {
		return this.wrappedRequest.getHeaders(arg0);
	}

	@Override
	public int getIntHeader(String arg0) {
		return this.wrappedRequest.getIntHeader(arg0);
	}

	@Override
	public String getMethod() {
		return this.wrappedRequest.getMethod();
	}

	@Override
	public String getPathInfo() {
		return this.wrappedRequest.getPathInfo();
	}

	@Override
	public String getPathTranslated() {
		return this.wrappedRequest.getPathTranslated();
	}

	@Override
	public String getRemoteUser() {
		return this.wrappedRequest.getRemoteUser();
	}

	@Override
	public String getRequestedSessionId() {
		return this.wrappedRequest.getRequestedSessionId();
	}

	@Override
	public String getServletPath() {
		return this.wrappedRequest.getServletPath();
	}

	@Override
	public HttpSession getSession() {
		return this.wrappedRequest.getSession();
	}

	@Override
	public HttpSession getSession(boolean arg0) {
		return this.wrappedRequest.getSession(arg0);
	}

	@Override
	public Principal getUserPrincipal() {
		return this.wrappedRequest.getUserPrincipal();
	}

	@Override
	public boolean isRequestedSessionIdFromCookie() {
		return this.wrappedRequest.isRequestedSessionIdFromCookie();
	}

	@Override
	public boolean isRequestedSessionIdFromURL() {
		return this.wrappedRequest.isRequestedSessionIdFromURL();
	}

	@SuppressWarnings(&quot;deprecation&quot;)
	@Override
	public boolean isRequestedSessionIdFromUrl() {
		return this.wrappedRequest.isRequestedSessionIdFromUrl();
	}

	@Override
	public boolean isRequestedSessionIdValid() {
		return this.wrappedRequest.isRequestedSessionIdValid();
	}

	@Override
	public boolean isUserInRole(String arg0) {
		return this.wrappedRequest.isUserInRole(arg0);
	}

}</pre>
<p>So the new methods being added here are &#8216;<em>removeParameter(String name)</em>&#8217; and &#8216;<em>setParameter(String name, String value)</em>&#8217;, which do pretty much what their names imply.  If you are familiar with the standard &#8216;<em>removeAttribute(String name)</em>&#8217; and &#8216;<em>setAttribute(String name, Object value)</em>&#8217; methods, then you should feel right at home with these new additions.  They simply let you manipulate request parameters in the same way that you can already manipulate request attributes.</p>
<p>One minor deviation from the Servlet specification worth noting is that I have overridden &#8216;<em>getParameterValues(String name)</em>&#8217; so that it only returns the first value associated with a given parameter.  This means that if for some reason your webapp uses URLs like &#8220;http://mysite.com/api?user=bob&#038;user=jane&#038;user=paul&#8221;, then you will only see &#8220;bob&#8221; as a value for the &#8216;user&#8217; parameter.  In practice I have never come across a web application that intentionally relied on a single parameter name having multiple values associated with it, and if you are designing your web application that way then you should probably stop and pick a less confusing pattern.  I see no value in a feature that allows a single parameter to have multiple values that you only get to see if you use a different API method to get them (&#8216;<em>getParameterValues</em>&#8217; instead of &#8216;<em>getParameter</em>&#8217;), so I have removed this feature from the implementation.  If someone can come up with a solid justification for having such a feature, I will add it back in.</p>
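<p>The first-value-only behavior described above can be illustrated without any Servlet dependencies.  The sketch below is a hypothetical stand-in for the parameter storage inside &#8216;<em>OverridableHttpRequest</em>&#8217;, whose fields are not shown in this excerpt: it copies a &#8216;<em>Map&lt;String, String[]&gt;</em>&#8217; of the kind returned by &#8216;<em>getParameterMap()</em>&#8217; into a mutable map that keeps only the first value for each name.  The class name &#8216;<em>ParameterOverlay</em>&#8217; is an assumption made for illustration.</p>
<pre class="brush: java; title: ; notranslate">import java.util.LinkedHashMap;
import java.util.Map;

//hypothetical sketch: mutable request parameters with first-value-only semantics
final class ParameterOverlay {
	private final Map&lt;String, String&gt; params = new LinkedHashMap&lt;String, String&gt;();

	//copy the original parameters, keeping only the first value for each name
	ParameterOverlay(Map&lt;String, String[]&gt; original) {
		for (Map.Entry&lt;String, String[]&gt; entry : original.entrySet()) {
			String[] values = entry.getValue();
			if (values != null &amp;&amp; values.length &gt; 0) {
				params.put(entry.getKey(), values[0]);
			}
		}
	}

	//null when the parameter is absent, as in the Servlet API
	String getParameter(String name) {
		return params.get(name);
	}

	//mirrors the overridden getParameterValues(): at most one value is ever returned
	String[] getParameterValues(String name) {
		String value = params.get(name);
		return value == null ? null : new String[] { value };
	}

	void setParameter(String name, String value) {
		params.put(name, value);
	}

	void removeParameter(String name) {
		params.remove(name);
	}
}</pre>
<p>Storing a single &#8216;<em>String</em>&#8217; per name is what makes &#8216;<em>getParameterValues</em>&#8217; return at most one value here; storing the full &#8216;<em>String[]</em>&#8217; instead would restore the multi-value behavior of the Servlet specification.</p>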
<p>Overriding &#8216;<em>getRequestURI()</em>&#8217;, &#8216;<em>getRequestURL()</em>&#8217;, and &#8216;<em>getQueryString()</em>&#8217; so that they return the correct values for the modified request state is left as an exercise.  It&#8217;s fairly rare for application code to depend upon the values of these calls, so in most cases you will not need to do this.</p>
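<p>For anyone taking up that exercise, the heart of a &#8216;<em>getQueryString()</em>&#8217; override is re-encoding the current parameters.  Here is a minimal sketch, assuming the modified parameters are available as a &#8216;<em>Map&lt;String, String&gt;</em>&#8217; (the wrapper&#8217;s internal storage is not shown in this excerpt) and using the JDK&#8217;s &#8216;<em>URLEncoder</em>&#8217;, which produces &#8216;<em>application/x-www-form-urlencoded</em>&#8217; output.  The class name is hypothetical.</p>
<pre class="brush: java; title: ; notranslate">import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

//hypothetical helper: rebuild a query string from the (possibly modified) parameters
final class QueryStrings {
	static String build(Map&lt;String, String&gt; params) {
		if (params.isEmpty()) {
			return null;  //getQueryString() returns null when there is no query string
		}
		StringBuilder query = new StringBuilder();
		for (Map.Entry&lt;String, String&gt; entry : params.entrySet()) {
			if (query.length() &gt; 0) {
				query.append('&amp;');
			}
			//URLEncoder.encode(String, Charset) requires Java 10 or later
			query.append(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8));
			query.append('=');
			query.append(URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
		}
		return query.toString();
	}
}</pre>
<p>Note that &#8216;<em>getRequestURI()</em>&#8217; and &#8216;<em>getRequestURL()</em>&#8217; exclude the query string by definition, so they only need overriding if the request path itself is modified, not just its parameters.</p>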
<p>In any case, to make use of the OverridableHttpRequest class, you can do the following:</p>
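<pre class="brush: java; title: ; notranslate">import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class ExampleFilter implements Filter {
	@Override
	public void init(FilterConfig filterConfig) throws ServletException {
		//do initialization things here
	}

	@Override
	public void doFilter(ServletRequest request, ServletResponse response,
			FilterChain filterChain) throws IOException, ServletException {

		//the filter receives a generic ServletRequest; cast it so that it can be wrapped
		HttpServletRequest httpReq = (HttpServletRequest) request;
		OverridableHttpRequest newRequest = new OverridableHttpRequest(httpReq);
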
		//do work and modify the request as desired
		newRequest.removeParameter(&quot;someBadXssParam&quot;);

		//pass the modified request on to the webapp, anyone downstream will see 
		//the modified state with no 'someBadXssParam' in it
		filterChain.doFilter(newRequest, response);
	}

	@Override
	public void destroy() {
		//do shutdown things here
	}
}</pre>
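<p>Simple, but powerful.  I just wish the Servlet spec included mutator methods like these out of the box.  It does at least provide &#8216;<em>javax.servlet.http.HttpServletRequestWrapper</em>&#8217;, which already delegates every method to the wrapped request, so extending that class instead of implementing &#8216;<em>HttpServletRequest</em>&#8217; directly would avoid writing most of the pass-through boilerplate above.</p>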
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/634/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>[Java] Defeating CAPTCHA Images</title>
		<link>https://codethink.no-ip.org/archives/150</link>
		<comments>https://codethink.no-ip.org/archives/150#comments</comments>
		<pubDate>Sat, 29 Jan 2011 14:40:27 +0000</pubDate>
		<dc:creator><![CDATA[aroth]]></dc:creator>
				<category><![CDATA[coding]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[captcha]]></category>
		<category><![CDATA[hack]]></category>

		<guid isPermaLink="false">http://codethink.no-ip.org/wordpress/?p=150</guid>
		<description><![CDATA[Disclaimer: Depending upon the country you currently reside in, programmatically defeating CAPTCHA images may technically be illegal. Whether or not there is any merit behind such a law I leave as a matter for you to work out with your &#8230; <a href="https://codethink.no-ip.org/archives/150">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p><strong>Disclaimer:</strong>  Depending upon the country you currently reside in, programmatically defeating CAPTCHA images may technically be illegal.  Whether or not there is any merit behind such a law I leave as a matter for you to work out with your representatives or equivalent lawmaking body.  But suffice to say, the information in this post is intended for educational and informative purposes only, and should not be used in any other context.  It should also be noted that the CAPTCHA images that are used in this example are quite old, and were cracked by others long ago.</p>
<p>I&#8217;ve always been mildly amused at the continually growing use of CAPTCHA images, or more accurately, at their ever-increasing complexity.  It seems that the only truly effective CAPTCHAs are the ones that even human beings can barely decipher.  But more interesting to me is the fact that these distorted snippets of letters and numbers have become a sort of de facto Turing test.  If you can determine what the characters are, then you are human; otherwise you are not.  For whatever reason, these images have become a symbolic line in the sand separating man from machine, and by exploring ways to cross this line we may move ever so slightly closer towards the creation of true artificial intelligence.</p>
<p>So let&#8217;s examine a very basic CAPTCHA image, one that was used in a popular online-forum distribution before it was cracked long ago:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/phpbb2.jpg" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/phpbb2.jpg" alt="PHPBB2 CAPTCHA" title="PHPBB2 CAPTCHA" width="320" height="50" class="aligncenter size-full wp-image-151" /></a></p>
<p>This CAPTCHA works on the principle of contrast. Human beings can discern distinct regions in an otherwise noisy image so long as each distinct region meets some minimum contrast level above/below that of the background noise.  This kind of image can be difficult to decipher computationally, because pulling out coherent regions from amongst the background noise requires contextual understanding of large portions of the image at once, which is generally a difficult thing to accomplish programmatically. That isn&#8217;t to say it can&#8217;t be done, however.</p>
<p>A human being looking at this image is able to recognize that the background noise establishes a threshold: anything above it is part of the encoded data, and anything below it is simply part of the background noise and should be discarded.  Once that threshold is known, discerning the text becomes a simple matter of discarding everything below it and keeping everything above.  So let&#8217;s see if we can code it.  First, we need a way to determine the noise threshold:</p>
<pre class="brush: java; title: ; notranslate">        //init, determine the average color intensity of the image
        int average = 0;
        for (int row = 0; row &lt; image.getHeight(); row++) {
                for (int column = 0; column &lt; image.getWidth(); column++) {
                        int color = image.getRGB(column, row) &amp; 0x000000FF;  //only need the last 8 bits
                        average += color;
                }
        }
        average /= image.getWidth() * image.getHeight();</pre>
<p>This bit of code determines the average color intensity of the entire image (216 / 255 in this case).  Because this CAPTCHA is in grayscale, it only needs to look at a single component of the pixel color (the blue channel, held in the lowest 8 bits), but colorized CAPTCHA images could be processed in a similar fashion by computing the intensity from the full RGB value.  In any case, now we have a basic threshold that we can use for determining which parts of the image contain valuable data, and which parts contain only noise.  We can do that like so:</p>
<pre class="brush: java; title: ; notranslate">        //first pass, mark all pixels as WHITE or BLACK
        boolean darkRegion = false;  //true while the previous pixel was classified as data
        for (int row = 0; row &lt; image.getHeight(); row++) {
                for (int column = 0; column &lt; image.getWidth(); column++) {
                        int color = image.getRGB(column, row) &amp; 0x000000FF;  //only need the last 8 bits
                        if (color &lt;= average * .70 ) {
                                //clearly darker than average: data, and the start of a dark region
                                image.setRGB(column, row, BLACK);
                                darkRegion = true;
                        }
                        else if (color &lt; .85 * average &amp;&amp; darkRegion &amp;&amp; row &lt; image.getHeight() - 1
                                &amp;&amp; (image.getRGB(column, row + 1) &amp; 0x000000FF) &lt; .85 * average) {
                                //relaxed threshold inside a dark region, if the pixel below is also dark
                                image.setRGB(column, row, BLACK);
                        }
                        else if (color &lt; .85 * average &amp;&amp; ! darkRegion &amp;&amp; row &lt; image.getHeight() - 1 &amp;&amp; column &gt; 0
                                &amp;&amp; column &lt; image.getWidth() - 1
                                &amp;&amp;  (((image.getRGB(column, row + 1) &amp; 0x000000FF) &lt; color)
                                        || ((image.getRGB(column + 1, row) &amp; 0x000000FF) &lt; color)
                                        || ((image.getRGB(column - 1, row) &amp; 0x000000FF) &lt; color))) {
                                //moderately dark with a darker neighbor below, right, or left
                                //(the left neighbor has already been rewritten by this pass)
                                image.setRGB(column, row, BLACK);
                                darkRegion = true;
                        }
                        else {
                                //everything else is treated as background noise
                                image.setRGB(column, row, WHITE);
                                darkRegion = false;
                        }
                }
        }</pre>
<p>Note that this code assumes that darker pixels are part of the data and lighter pixels are part of the background noise, because that is how the input CAPTCHA is set up.  A smarter approach would be to look at the number of pixels falling above the noise threshold and the number of pixels falling below, and then keep whichever group is smaller.  For a CAPTCHA like this one to be effective, there must be more noise than data, so it follows that the data that you&#8217;re looking for will always be in the smaller group of pixels.</p>
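<p>Neither of the refinements mentioned so far (computing intensity from the full RGB value for colored CAPTCHAs, and keeping whichever group of pixels is smaller) appears in the code above, so here is a minimal sketch of both.  It is an illustration under stated assumptions rather than code from the downloadable project: intensity is computed with the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) in integer arithmetic, which for a gray pixel gives the same value as the &#8216;<em>&amp; 0x000000FF</em>&#8217; shortcut, and the class and method names are hypothetical.</p>
<pre class="brush: java; title: ; notranslate">import java.awt.image.BufferedImage;

//hypothetical helpers: color-aware intensity, and a check for which pixel group holds the data
final class ThresholdHelpers {
	//weighted luma with ITU-R BT.601 weights scaled by 1000; result is 0..255
	static int intensity(int rgb) {
		int red = (rgb &gt;&gt; 16) &amp; 0xFF;
		int green = (rgb &gt;&gt; 8) &amp; 0xFF;
		int blue = rgb &amp; 0xFF;
		return (299 * red + 587 * green + 114 * blue) / 1000;
	}

	//average intensity over the whole image; a long sum cannot overflow for large images
	static int averageIntensity(BufferedImage image) {
		long total = 0;
		for (int row = 0; row &lt; image.getHeight(); row++) {
			for (int column = 0; column &lt; image.getWidth(); column++) {
				total += intensity(image.getRGB(column, row));
			}
		}
		return (int) (total / ((long) image.getWidth() * image.getHeight()));
	}

	//true when pixels darker than the threshold are the smaller group, i.e. the data is dark
	static boolean dataIsDark(BufferedImage image, int threshold) {
		long darker = 0;
		long lighter = 0;
		for (int row = 0; row &lt; image.getHeight(); row++) {
			for (int column = 0; column &lt; image.getWidth(); column++) {
				if (intensity(image.getRGB(column, row)) &lt; threshold) {
					darker++;
				} else {
					lighter++;
				}
			}
		}
		return darker &lt;= lighter;
	}
}</pre>
<p>If &#8216;<em>dataIsDark</em>&#8217; returns false, inverting the image first (or swapping the comparisons in the first pass) lets the rest of the pipeline keep assuming that the data is dark.</p>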
<p>In any case, what the first-pass code does is traverse the image, turning any pixels that appear to be noise white and any pixels that appear to be data black.  Note that it includes some rudimentary region-detection code, owing to the fact that we expect our data pixels to be tightly clustered together in distinct regions.  So when the code encounters a pixel that it considers to be part of the data, it also relaxes the selection criteria for the next pixel (from 70% to 85% of the average intensity, provided the pixel below is also under that level), because there is a strong possibility that the next pixel will also be data.  This helps prevent false negatives, where valuable pieces of data would otherwise be dropped.  Let&#8217;s take a peek at what our CAPTCHA image looks like at this point:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out.png" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out.png" alt="PHPBB2 CAPTCHA, after first pass" title="PHPBB2 CAPTCHA, after first pass" width="320" height="50" class="aligncenter size-full wp-image-159" /></a></p>
<p>It&#8217;s not perfect, but it is definitely improved.  We have successfully removed all of the background noise from the image, but unfortunately we have also removed some pieces of the actual data.  The data that is left is all in the right place, however, so perhaps we can amplify and/or reconstruct it:</p>
<pre class="brush: java; title: ; notranslate">                //second pass, eliminate horizontal gaps
                for (int row = 0; row &lt; image.getHeight(); row++) {
                        for (int column = 0; column &lt; image.getWidth(); column++) {
                                int color = image.getRGB(column, row) &amp; 0x000000FF;  //only need the last 8 bits
                                if (color == 255) {
                                        consecutiveWhite++;
                                }
                                else {
                                        if (consecutiveWhite &lt; 3 &amp;&amp; column &gt; consecutiveWhite) {  
                                                for (int col = column - consecutiveWhite; col &lt; column; col++) {
                                                        image.setRGB(col, row, BLACK);
                                                }
                                        }
                                        consecutiveWhite = 0;
                                }
                        }
                }
                consecutiveWhite = 0;
                
                //third pass, eliminate vertical gaps
                for (int column = 0; column &lt; image.getWidth(); column++) {
                        for (int row = 0; row &lt; image.getHeight(); row++) {
                                int color = image.getRGB(column, row) &amp; 0x000000FF;  //only need the last 8 bits
                                if (color == 255) {
                                        consecutiveWhite++;
                                }
                                else {
                                        if (consecutiveWhite &lt; 2 &amp;&amp; row &gt; consecutiveWhite) {
                                                for (int r = row - consecutiveWhite; r &lt; row; r++) {
                                                        image.setRGB(column, r, BLACK);
                                                }
                                        }
                                        consecutiveWhite = 0;
                                }
                        }
                }</pre>
<p>This code fills in any short horizontal runs (fewer than 3 pixels) and vertical runs (fewer than 2 pixels) of white pixels with black pixels, the rationale being that any small group of white pixels that is bounded on both ends by black pixels is virtually guaranteed to be part of the data that was erroneously discarded.  Again we can take a peek at our result:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out1.png" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out1.png" alt="PHPBB2 CAPTCHA, after third pass" title="PHPBB2 CAPTCHA, after third pass" width="320" height="50" class="aligncenter size-full wp-image-163" /></a></p>
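<p>Stripped of the image bookkeeping, the second and third passes above apply one rule to a single line of pixels: a run of white pixels that is shorter than a limit (3 horizontally, 2 vertically) and has a black pixel at both ends is turned black.  Runs that touch either edge of the line are left alone: a run at the end is never closed off by a black pixel, and the &#8216;<em>column &gt; consecutiveWhite</em>&#8217; (or &#8216;<em>row &gt; consecutiveWhite</em>&#8217;) guard rules out a run that starts at the very beginning.  A standalone sketch of that rule, using hypothetical names and a <em>boolean</em> array in place of the image:</p>
<pre class="brush: java; title: ; notranslate">//hypothetical one-dimensional version of the gap-filling passes (true = black, false = white)
final class GapFill {
	//turns black every white run shorter than maxGap that has a black pixel at both ends
	static void fillShortGaps(boolean[] line, int maxGap) {
		int consecutiveWhite = 0;
		for (int i = 0; i &lt; line.length; i++) {
			if (!line[i]) {
				consecutiveWhite++;
			} else {
				//i &gt; consecutiveWhite means the run did not begin at index 0
				if (consecutiveWhite &lt; maxGap &amp;&amp; i &gt; consecutiveWhite) {
					for (int j = i - consecutiveWhite; j &lt; i; j++) {
						line[j] = true;
					}
				}
				consecutiveWhite = 0;
			}
		}
	}
}</pre>
<p>With &#8216;<em>maxGap</em>&#8217; set to 3, the line black, white, white, black, white, white, white, black, white becomes black, black, black, black, white, white, white, black, white: the two-pixel gap is filled, while the three-pixel gap and the trailing white pixel are not.</p>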
<p>Getting better, but we&#8217;re not quite there yet.  Our characters are much more distinct, but there is still some missing data.  A fair bit of the missing data is now contained in small regions of white pixels that are actually encapsulated within our characters.  Filling them in is a relatively simple matter:</p>
<pre class="brush: java; title: ; notranslate">                //fourth pass, attempt to fill regions
                for (int row = 0; row &lt; image.getHeight(); row++) {
                        for (int column = 0; column &lt; image.getWidth(); column++) {
                                if (image.getRGB(column, row) == WHITE) {
                                        int height = countVerticalWhite(image, column, row);
                                        int width = countHorizontalWhite(image, column, row);
                                        int area = width * height;
                                        if ((area &lt;= 12) || (width == 1) || (height == 1)){
                                                image.setRGB(column, row, BLACK);
                                        }
                                }
                        }
                }
                
                //fifth pass repeats the fourth
                for (int row = 0; row &lt; image.getHeight(); row++) {
                        for (int column = 0; column &lt; image.getWidth(); column++) {
                                if (image.getRGB(column, row) == WHITE) {
                                        int height = countVerticalWhite(image, column, row);
                                        int width = countHorizontalWhite(image, column, row);
                                        int area = width * height;
                                        if ((area &lt;= 12) || (width == 1) || (height == 1)){
                                                image.setRGB(column, row, BLACK);
                                        }
                                }
                        }
                }</pre>
<p>Here we check, for each white pixel, how many adjacent white pixels exist both vertically and horizontally.  This gives us a rough estimate of the size of the current region of white pixels.  If the region is too small (an estimated area of 12 pixels or less, or a width or height of just 1 pixel), then the code assumes that the white pixel is actually supposed to be part of the data, and turns it black.  Note that the algorithm is methodical in its approach, in that when it detects a small region of white pixels, it toggles only the initial pixel that it tested in that region.  This toggling will reduce the region-size reported for any adjacent white pixels, increasing the likelihood that they will be toggled as well on the next iteration, which is why two passes of the same algorithm are applied.  And yes, I know having the same code repeated twice is poor coding style, but for illustrative purposes it gets the job done.  Anyway, we now have:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out2.png" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out2.png" alt="PHPBB2 CAPTCHA, after fifth pass" title="PHPBB2 CAPTCHA, after fifth pass" width="320" height="50" class="aligncenter size-full wp-image-166" /></a></p>
<p>Many of the gaps are now filled in, and the text is starting to look fairly legible.  There are now, however, a few spurious black pixels that have cropped up along the edges of the characters.  We could go back and refine the previous step, but instead let&#8217;s just prune out these outliers:</p>
<pre class="brush: java; title: ; notranslate">                //sixth pass, clear any false-positive
                for (int row = 0; row &lt; image.getHeight(); row++) {
                        for (int column = 0; column &lt; image.getWidth(); column++) {
                                if (image.getRGB(column, row) != WHITE) {
                                        if (countBlackNeighbors(image, column, row) &lt; 3) {
                                                image.setRGB(column, row, WHITE);
                                        }
                                }
                        }
                }</pre>
<p>This pruning step removes any black pixel that is bordered by fewer than 3 black pixels.  This is a fairly strict threshold, and will have the effect of smoothing/rounding out corners (i.e. some legitimate data will be discarded), but it will also clear out any spurious black pixels that exist in the image.  Now our image looks like so:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out3.png" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out3.png" alt="PHPBB2 CAPTCHA, after sixth pass" title="PHPBB2 CAPTCHA, after sixth pass" width="320" height="50" class="aligncenter size-full wp-image-169" /></a></p>
<p>The letters have taken on a softer, more rounded quality.  They are also vaguely reminiscent of what you might get if you were to scan a text document using an older scanner.  This is worth mentioning because we will eventually be feeding our cleaned-up CAPTCHA image to an optical-character-recognition program that is designed to process just this sort of data.  First, however, our characters are all misaligned.  We&#8217;ve come this far, so we might as well fix the alignment issue while we&#8217;re at it:</p>
<pre class="brush: java; title: ; notranslate">                //now find the characters
                List&lt;CharacterBox&gt; characters = new ArrayList&lt;CharacterBox&gt;();
                int totalCharWidth = 10;
                int maxCharHeight = 0;
                for (int column = 0; column &lt; image.getWidth(); column++) {
                        int highestBlack = countVerticalWhite(image, column, 0);
                        if (highestBlack &lt; image.getHeight()) {
                                totalCharWidth += 5; //5 px spacing in between chars
                                CharacterBox box = new CharacterBox();
                                box.setX(column);
                                while (column &lt; image.getWidth() &amp;&amp; countVerticalWhite(image, column, 0) &lt; image.getHeight()) {
                                        int currentBlack = countVerticalWhite(image, column, 0);
                                        if (currentBlack &lt; highestBlack) {
                                                highestBlack = currentBlack;
                                        }
                                        column++;
                                }
                                box.setWidth(column - box.getX());
                                box.setY(Math.max(0, highestBlack - 5));  //5 px of top padding, clamped so it never starts above row 0
                                box.setHeight(image.getHeight() - box.getY()); //can trim this later
                                if (box.getHeight() &gt; maxCharHeight) {
                                        maxCharHeight = box.getHeight();
                                }
                                totalCharWidth += box.getWidth();
                                characters.add(box);
                        }
                }</pre>
<p>Here we simply compute a bounding box for each run of adjacent columns that contain black pixels (i.e. each character, provided that no two characters touch), plus some additional padding so that our output image will draw nicely.  Speaking of the output image, we can now create it by positioning our characters in correct alignment with each other in a new image, like so:</p>
<pre class="brush: java; title: ; notranslate">                //output a new image with aligned characters
                BufferedImage dst = new BufferedImage (totalCharWidth, maxCharHeight,
                                                           BufferedImage.TYPE_INT_BGR);
                for (int column = 0; column &lt; dst.getWidth(); column++) {
                        for (int row = 0; row &lt; dst.getHeight(); row++) {
                                dst.setRGB(column, row, WHITE);
                        }
                }
                int xPos = 5;
                int yPos = 0;
                for (CharacterBox box : characters) {
                        for (int oldY = box.getY(); oldY &lt; box.getY() + box.getHeight(); oldY++) {
                                for (int oldX = box.getX(); oldX &lt; box.getX() + box.getWidth(); oldX++) {
                                        dst.setRGB(xPos + (oldX - box.getX()), yPos + (oldY - box.getY()), image.getRGB(oldX, oldY));
                                }
                        }
                        xPos += box.getWidth() + 5;
                }
                ImageIO.write(dst, &quot;png&quot;, new File(OUTPUT));</pre>
<p>Now we have the following:</p>
<p><a href="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out4.png" rel="lightbox[150]"><img src="http://codethink.no-ip.org/wordpress/wp-content/uploads/2011/01/captcha-out4.png" alt="PHPBB2 CAPTCHA, fully processed" title="PHPBB2 CAPTCHA, fully processed" width="173" height="40" class="aligncenter size-full wp-image-171" /></a></p>
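<p>The bounding-box loop that produced this layout is essentially a vertical projection: a column belongs to a character if it contains at least one black pixel, and each maximal run of such columns becomes one box.  The sketch below isolates that idea on a <em>boolean</em> grid instead of a <em>BufferedImage</em> (the class and method names are hypothetical), returning each run as a [start, end) pair of column indices.  One consequence of the approach is that characters which touch or overlap horizontally end up sharing a single box.</p>
<pre class="brush: java; title: ; notranslate">import java.util.ArrayList;
import java.util.List;

//hypothetical sketch: split a binary image into character spans by column projection
final class ColumnSegments {
	//black[row][column] is true for data pixels; returns {startColumn, endColumnExclusive} pairs
	static List&lt;int[]&gt; find(boolean[][] black) {
		int height = black.length;
		int width = height == 0 ? 0 : black[0].length;
		List&lt;int[]&gt; spans = new ArrayList&lt;int[]&gt;();
		int start = -1;  //first column of the current span, or -1 when not inside a span
		//the extra iteration at column == width closes a span that reaches the right edge
		for (int column = 0; column &lt;= width; column++) {
			boolean hasBlack = false;
			for (int row = 0; column &lt; width &amp;&amp; row &lt; height &amp;&amp; !hasBlack; row++) {
				hasBlack = black[row][column];
			}
			if (hasBlack &amp;&amp; start &lt; 0) {
				start = column;
			} else if (!hasBlack &amp;&amp; start &gt;= 0) {
				spans.add(new int[] { start, column });
				start = -1;
			}
		}
		return spans;
	}
}</pre>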
<p>The characters are nicely aligned and uniformly spaced.  We now have something that is suitable for sending into a character-recognition program.  For this example we use <a href="http://code.google.com/p/tesseract-ocr/" target="_blank">tesseract</a>, a free and open-source OCR program that provides a good level of accuracy.  We can send our output to tesseract like so:</p>
<pre class="brush: java; title: ; notranslate">                //pass each argument separately so that paths containing spaces are not split apart
                Process tesseractProc = new ProcessBuilder(TESSERACT_BIN, OUTPUT, TESSERACT_OUTPUT).inheritIO().start();
                tesseractProc.waitFor();</pre>
<p>This invokes tesseract on our output image, and it writes its results to a text file named &#8216;<em>TESSERACT_OUTPUT</em>&#8217; with a &#8216;.txt&#8217; extension appended (tesseract treats that argument as a base name).  In this case, the text file contains the following:</p>
<pre class="brush: plain; title: ; notranslate">IKEECL</pre>
<p>&#8230;which is 100% correct.  </p>
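<p>Getting that text back into the Java program only requires reading the file.  The sketch below assumes tesseract&#8217;s usual command-line convention of naming the file after the output base with &#8216;.txt&#8217; appended; the class name and the UTF-8 decoding are assumptions, and &#8216;<em>trim()</em>&#8217; strips any newlines or other whitespace around the recognized text.</p>
<pre class="brush: java; title: ; notranslate">import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

//hypothetical helper: read back the text tesseract recognized for a given output base name
final class OcrResult {
	static String read(String outputBase) throws IOException {
		//tesseract writes &lt;outputBase&gt;.txt rather than &lt;outputBase&gt; itself
		byte[] bytes = Files.readAllBytes(Paths.get(outputBase + &quot;.txt&quot;));
		return new String(bytes, StandardCharsets.UTF_8).trim();
	}
}</pre>
<p>Called as &#8216;<em>OcrResult.read(TESSERACT_OUTPUT)</em>&#8217;, this would return the recognized characters as a plain string, ready for comparison or further processing.</p>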
<p>A handful of very simple image-filtering loops, based on a brief examination of how a human being would approach the image, plus some existing OCR software, were enough to defeat the CAPTCHA.  Of course, this only works for this one specific style of CAPTCHA, but the basic approach of reducing noise, amplifying data, and isolating characters should be broadly applicable to a wide range of different CAPTCHA styles.  The challenge lies not in breaking one CAPTCHA style, but in devising an algorithm that can attempt to break any number of different CAPTCHA styles dynamically and with a success rate comparable to that of a human being.  It needs a way to determine, from the CAPTCHA image itself, what kind of noise exists and how it should best be removed.  That is the real challenge, and it&#8217;s beyond the scope of this article.</p>
<p>Note that for the sake of preserving some sense of brevity I&#8217;ve left out the implementation of some minor utility functions and variable declarations and the like.  In general, you can assume that a function (or variable) does what its name implies.  If, however, you would like a complete copy of the source-code used, you can download it using <a href="http://codethink.no-ip.org/captcha.zip">this link</a> (zipped Eclipse project).  </p>
<p>Note that in order to get it to run you will also need to install <a href="http://code.google.com/p/tesseract-ocr/" target="_blank">tesseract</a> on your system, and edit the values at the start of the Java code to point at your local tesseract installation.</p>
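<p>For readers who would rather not download the project, here is one plausible reconstruction of the three helpers the passes above rely on: &#8216;<em>countVerticalWhite</em>&#8217;, &#8216;<em>countHorizontalWhite</em>&#8217;, and &#8216;<em>countBlackNeighbors</em>&#8217;.  These are assumptions inferred from how the helpers are called, not necessarily the versions in the downloadable project.  The white-run counters return the length of the run of white pixels passing through the given pixel along one axis (so &#8216;<em>countVerticalWhite(image, column, 0)</em>&#8217; is the number of white pixels above the first black pixel in a column, as the character-finding loop expects), and the neighbor count looks at the eight surrounding pixels, treating positions outside the image as white.</p>
<pre class="brush: java; title: ; notranslate">import java.awt.image.BufferedImage;

//hypothetical reconstructions of the omitted helpers; the project's versions may differ
final class CaptchaHelpers {
	//compare only the RGB bits so that the alpha channel does not matter
	static boolean isWhite(BufferedImage image, int x, int y) {
		return (image.getRGB(x, y) &amp; 0xFFFFFF) == 0xFFFFFF;
	}

	static boolean isBlack(BufferedImage image, int x, int y) {
		return (image.getRGB(x, y) &amp; 0xFFFFFF) == 0;
	}

	//length of the vertical white run through (x, y); 0 if (x, y) is not white
	static int countVerticalWhite(BufferedImage image, int x, int y) {
		if (!isWhite(image, x, y)) {
			return 0;
		}
		int top = y;
		while (top &gt; 0 &amp;&amp; isWhite(image, x, top - 1)) {
			top--;
		}
		int bottom = y;
		while (bottom &lt; image.getHeight() - 1 &amp;&amp; isWhite(image, x, bottom + 1)) {
			bottom++;
		}
		return bottom - top + 1;
	}

	//length of the horizontal white run through (x, y); 0 if (x, y) is not white
	static int countHorizontalWhite(BufferedImage image, int x, int y) {
		if (!isWhite(image, x, y)) {
			return 0;
		}
		int left = x;
		while (left &gt; 0 &amp;&amp; isWhite(image, left - 1, y)) {
			left--;
		}
		int right = x;
		while (right &lt; image.getWidth() - 1 &amp;&amp; isWhite(image, right + 1, y)) {
			right++;
		}
		return right - left + 1;
	}

	//number of black pixels among the 8 neighbors of (x, y); off-image positions count as white
	static int countBlackNeighbors(BufferedImage image, int x, int y) {
		int count = 0;
		for (int dy = -1; dy &lt;= 1; dy++) {
			for (int dx = -1; dx &lt;= 1; dx++) {
				int nx = x + dx;
				int ny = y + dy;
				boolean self = dx == 0 &amp;&amp; dy == 0;
				boolean inside = nx &gt;= 0 &amp;&amp; ny &gt;= 0 &amp;&amp; nx &lt; image.getWidth() &amp;&amp; ny &lt; image.getHeight();
				if (!self &amp;&amp; inside &amp;&amp; isBlack(image, nx, ny)) {
					count++;
				}
			}
		}
		return count;
	}
}</pre>
<p>Under these assumptions an isolated speck scores 0 in &#8216;<em>countBlackNeighbors</em>&#8217; and the tip of a one-pixel-wide protrusion scores 1, so both fall below the sixth pass&#8217;s threshold of 3.</p>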
]]></content:encoded>
			<wfw:commentRss>https://codethink.no-ip.org/archives/150/feed</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
	</channel>
</rss>
