OpenCV Feature Detection – Part #1: Acquiring Red Trousers

This blog post is the first in a series about using C++, Python and OpenCV to train a classifier to detect red trousers (it’s a fairly arbitrary choice of feature to detect – I have no strong feelings against them!). In this post, I’ll explain how you can use Python’s Scrapy module to acquire training data by building a spider to extract image URLs, and the Image Pipeline functionality to download the images. The site in question is “http://lookatmyfuckingredtrousers.blogspot.co.uk”, which contains around 200 images of people wearing red trousers. Whether or not this is enough images to train our classifier remains to be seen!

To start off with, download and install Scrapy. This can normally be done by running pip install scrapy, or easy_install scrapy if you don’t have pip installed (you should; it’s better!).

Once installed, the ‘scrapy’ tool should be accessible from the command line (if not, there’s probably a problem with your PATH environment variable, but that’s outside the scope of this post). We can use this tool to help write the XPath selector we need to access the relevant images. XPath is an XML query language that can be used to select elements from an XML tree – in this case, we’re interested in image nodes held within divs of the class ‘entry-content’:

By running the command ‘scrapy shell http://lookatmyfuckingredtrousers.blogspot.co.uk’, we’re dropped into an interactive Scrapy session (which is really just a jazzed up Python shell), through which we can query the DOM.

As mentioned, we are interested in divs of class ‘entry-content’, as these would seem to hold the content for each post. The Scrapy shell exposes an object called ‘response’, which contains the HTML of the response received from the web server, along with several methods that we can use to query said HTML. You’d need to consult other resources to learn all of the XPath syntax – but in this case, the query '//*[contains(@class, "entry-content")]' will match every node whose class attribute contains "entry-content" (‘//*’ matches all nodes, which are then filtered by class value using '[contains(@class, "entry-content")]').

To query the HTML in our Scrapy session, we can run: response.selector.xpath('//*[contains(@class, "entry-content")]'). You should see a big list of Scrapy Selectors, 50 in total. Now that we have a selector to match the post divs, we need to be able to extract the image URLs. As it turns out, in this case the <img> tags aren’t directly inside the “entry-content” divs; they are held within another div (with the class “separator”), which contains an <a> node, which in turn contains the <img> node we’re interested in. Phew.

With a little bit of trial and error, the final XPath selector that we can use to grab the src attribute from the relevant images is: '//*[contains(@class, "entry-content")]/div[contains(@class, "separator")]/a/img/@src'.
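
Dropping that selector back into the same Scrapy shell session and calling extract() should give you the raw URL strings – something along these lines (the output below is illustrative, not the real image URLs):

>>> srcs = response.selector.xpath('//*[contains(@class, "entry-content")]/div[contains(@class, "separator")]/a/img/@src')
>>> srcs.extract()[:2]
['http://.../red-trousers-01.jpg', 'http://.../red-trousers-02.jpg']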

Now we can create our Scrapy project. To do this, simply run "scrapy startproject trousers". This will copy over some boilerplate code that we can use to create the spider. You should now have a folder called “trousers”, inside which is another folder also named “trousers”. Open up the file named “items.py”, and change the “TrousersItem” class to look like the following:

import scrapy

class TrousersItem(scrapy.Item):
    # These exact field names are what Scrapy's ImagesPipeline expects
    image_urls = scrapy.Field()
    images = scrapy.Field()

We need to use these specific variable names, as Scrapy’s ImagesPipeline (which we can use to do the actual image downloading) expects them. We will populate image_urls with the data extracted by our XPath query, and the ImagesPipeline will populate images with details of each downloaded file.
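
For reference, once the pipeline has run, each item should end up looking roughly like this – the url/path/checksum keys come from the ImagesPipeline, but the values here are made up for illustration:

TrousersItem(
    image_urls=['http://example.blogspot.co.uk/red-trousers.jpg'],   # set by our spider
    images=[{'url': 'http://example.blogspot.co.uk/red-trousers.jpg',
             'path': 'full/0a79c4...d4f.jpg',                        # relative to IMAGES_STORE
             'checksum': '...'}]                                     # filled in by the pipeline
)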

Now that we have a class to hold data about our images, we can write the spider that tells Scrapy how it should acquire them. Create a new file (I called mine redtrousers.py; you can name it anything) in the spiders/ directory, containing the following:

from scrapy import Spider, Request
from trousers.items import TrousersItem

class TrouserScraper(Spider):
    name = "Trousers"
    start_urls = ["http://lookatmyfuckingredtrousers.blogspot.co.uk"]

    def parse(self, response):
        # Yield an item for each image held inside a post's "separator" div
        for image in response.selector.xpath('//*[contains(@class, "entry-content")]/div[contains(@class, "separator")]/a/img/@src'):
            yield TrousersItem(image_urls=[image.extract()])
        # Follow the "Older Posts" link (if present) to handle pagination
        for url in response.selector.xpath("//*[contains(@class, 'blog-pager-older-link')]/@href"):
            yield Request(url.extract(), callback=self.parse)

In this file, we are creating a new class (inheriting from Scrapy’s Spider class) and defining a name, a starting URL and a parse method for our spider. The parse method loops over each element matched by our XPath query, yielding a new TrousersItem. It also finds the “Older Posts” link (if such a link exists) and yields a new Request for it, with itself as the callback. This is an easy way of dealing with pagination.

As we want to download the matching images, we can use the ImagesPipeline feature in Scrapy. To enable it, modify the “settings.py” file in the trousers subdirectory, adding the following two lines (inserting a valid path in place of /Path/To/Your/Chosen/Directory):

ITEM_PIPELINES = {'scrapy.contrib.pipeline.images.ImagesPipeline': 1}
IMAGES_STORE = '/Path/To/Your/Chosen/Directory'

The ImagesPipeline will now consume the TrousersItems we yield from our TrouserScraper.parse() method, downloading each referenced image to the IMAGES_STORE folder.
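
As an aside, if you want to keep tiny thumbnails and spacer images out of your training set, the ImagesPipeline also supports minimum-size settings in settings.py – something like the following (the thresholds here are just a guess on my part, tune them to taste):

IMAGES_MIN_WIDTH = 100   # skip anything narrower than 100px
IMAGES_MIN_HEIGHT = 100  # skip anything shorter than 100px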

To run the spider, execute the command “scrapy crawl Trousers”, cross your fingers, and check the IMAGES_STORE directory for a shed load of images of people wearing red trousers!


If you receive errors about PIL not being available, it’s because you’re missing the Python Imaging Library. Running ‘pip install Pillow’ should sort the problem out.

In the next post, we’ll build our Cascade Classifier using our scraped images, and start detecting some red trousers!

Bypassing Root Detection in Three InTouch

Three recently released “InTouch”, an application for Android and iOS that allows you to use a WiFi network to make and receive phone calls and text messages, meaning that you can continue to use your phone as a phone without a cellular network connection.

Unfortunately for me, Three decided not to allow rooted devices to use the application – launching the app on a rooted device resulted in an “It seems that the device is rooted. This application can not run on rooted device” error.


Not wanting to miss out on being able to use their application (my house is a signal deadzone), and being unwilling to un-root my phone, I decided to explore other avenues.

Firstly, I downloaded the APK file from my phone using adb:

adb pull /data/app/com.hutchison3g.threeintouch-1.apk

I then disassembled the application into smali using apktool, by running the following command:

apktool d com.hutchison3g.threeintouch-1.apk

This created a new folder with the same name as the APK file (minus the extension). Inside that folder was another folder called “smali”, which contained the smali disassembly of the APK.

A simple grep through the smali directory for the string “root” was all that was needed to find the sections of the disassembly responsible for root detection.
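
If you’d rather stay in Python than use grep, a quick script like the following does the same job – it just walks the smali directory (the path assumes apktool’s default output folder name) and prints any line mentioning “root”:

import os

SMALI_DIR = "com.hutchison3g.threeintouch-1/smali"

for dirpath, _, filenames in os.walk(SMALI_DIR):
    for name in filenames:
        path = os.path.join(dirpath, name)
        with open(path, errors="ignore") as f:
            for lineno, line in enumerate(f, 1):
                if "root" in line.lower():
                    print("%s:%d: %s" % (path, lineno, line.strip()))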

The relevant matches were those containing “device is rooted” – in this case, in “v.smali” and “FgVoIP.smali”. Opening up FgVoIP.smali and searching for the line containing the word “root” gave me some context:

[Screenshot: FgVoIP.smali around lines 4189–4193, showing the root-detection check]

Line 4193 is an if statement, checking whether the register v0 is equal to zero. The value of v0 is the return value of the method invoked on line 4189. If v0 is equal to zero, execution jumps to whatever is at the label :cond_2 – if v0 is anything other than zero, a string mentioning “device is rooted” is defined and passed to another method. With that in mind, it’s fair to say that a() in the FgVoIP class is probably their “root checking” method.

An easy way to patch this root detection out is to modify the if statement on line 4193 to make the jump unconditional. I did this by replacing “if-eqz v0, :cond_2” with “goto :cond_2”:

[Screenshot: the patched smali, with the if-eqz check replaced by an unconditional goto]

I then repeated a similar process on “v.smali”.

Once I had modified the two smali files to skip the root detection, I needed to re-compile the apk file so that I could install it on my device. I accomplished this by running:

apktool b com.hutchison3g.threeintouch-1 -o com.hutchison3g.threeintouch-1-patched.apk

However, the resulting APK was unsigned. In order to install it onto my device, I needed to generate a key and sign the APK. I did this by following the instructions for “Signing Your App Manually” in the Android SDK documentation.

Once I had signed my app, I was able to install it by running “adb install com.hutchison3g.threeintouch-1-patched.apk”. I was then able to launch and use the Three InTouch app without any problems.


It’s worth noting that I did this as a learning exercise, and don’t recommend that you necessarily go out there and do this yourself. Similar techniques can be used to bypass root detection in many Android Applications.

 

eBay Reflected XSS

Earlier in the year, I discovered an XSS vulnerability in the Selling Manager section of eBay.

The problem was caused by improper escaping of the URL’s GET parameters, which were reflected back on the page. When choosing the “drafts” section of the Selling Manager, I noticed that several parameters appeared in the URL:

[Screenshot: eBay XSS URL parameters]

Naturally (after confirming that eBay allowed such testing), I tried modifying these parameters – to my surprise, the page happily showed my new, updated values (although they weren’t saved server-side). I could modify my feedback score, message count, inventory counts etc. to contain invalid characters, such as letters. Unfortunately, eBay was escaping the strings to remove anything that would allow cross-site scripting – or so I thought.

After some more playing, I accidentally included a URL parameter twice. Again, to my surprise, the page showed both values, separated by a comma – however, this time the second value was not being escaped. By setting the duplicate parameter’s value to a snippet of JavaScript, I could run malicious code in the context of sm.ebay.co.uk.

Combined with a phishing attack, an attacker could easily exploit this vulnerability to steal money from a user, gain access to their account and/or cause all kinds of trouble.

I reported this vulnerability to the “eBay Inc Bug Bounty” on the 30th of May and, after some prodding, received an email back telling me that the eBay Inc bug bounty didn’t cover the eBay website. The problem was then forwarded on to the other eBay Bug Bounty. Fast forward to mid-July, when I was asked for an example URL that would trigger the XSS (which I had included in my original report, but which must somehow have got lost). I have not heard anything from eBay since, but the problem now seems to have been fixed.

Fixing “java.awt.HeadlessException” when launching an AVD

I’ve just bought a new laptop (Lenovo x230), and decided to go with a Debian Testing install. One of the problems I ran into was that I was unable to start an AVD (Android Virtual Device) using the GUI – the AVD manager would just crash, and I’d end up with a “java.awt.HeadlessException” exception printed to my console.

Google suggested that this was due to some incompatibility between the AVD Manager and `openjdk-7-jre`, and that the fix was to remove `openjdk-7-jre` and replace it with `openjdk-6-jre`. I didn’t really want to have to do that, so came up with an alternative solution.

By installing openjdk-6-jre alongside Java 7, and modifying the ‘android’ bash script (located at android-sdk-linux/tools/android) to change the line:

java_cmd="java"

to

java_cmd="/usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java"

it becomes possible to keep Java 7 as the system default, but run the AVD Manager under Java 6.

Introduction to virtualenv

Keeping track of Python package dependencies can be a tricky task, especially when you’ve already got multiple packages installed and you’re not sure what your project is/isn’t using. Thankfully, a tool called virtualenv exists which helps keep track of your packages and lets you isolate installations.

Installing virtualenv is easy – it’s a Python script, and can be obtained by running pip install virtualenv.

Once virtualenv is installed, you can create your virtual environment by running virtualenv my_env_name. This will copy your system’s Python binary (you can specify a custom version by passing the --python=/path/to/your/desired/python flag), and install both pip and setuptools to the my_env_name folder. It also creates an activation script, which you can call by running source my_env_name/bin/activate. Activating your virtual environment will update your PATH to use the newly copied Python, as well as the new packages folder and pip install.

Now that your newly created virtual environment has been activated, any calls to Python, easy_install or pip will be passed to your newly created Python install. This means that pip will install packages to your virtual environment rather than to your system install, and that any system packages you had previously installed are no longer accessible. A useful side-effect of running under a virtual environment is that both pip and easy_install no longer require special write privileges – you’ll no longer need sudo/root privileges to install packages.
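
A quick way to convince yourself that the isolation is working is to ask Python itself which interpreter it’s running from – with the environment activated, something like this (the paths in the comments are just examples) should point inside your my_env_name folder rather than at the system install:

import sys

print(sys.executable)  # e.g. /home/you/my_env_name/bin/python
print(sys.prefix)      # e.g. /home/you/my_env_name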

Another handy use of virtualenv is to generate a list of requirements for your project – running pip freeze > requirements.txt will create a pip install -r compatible requirements.txt file for you, allowing you to easily distribute and keep track of project dependencies.

virtualenv can be de-activated by running deactivate from your shell, which restores your environment to its former self.