Data Manipulation

One key aspect of Habu is that it’s inspired by the following Unix philosophy concept:

“Write programs to handle text streams, because that is a universal interface.”

You can manipulate text data with many well-known Unix tools, like sed, awk, and grep, but Habu also has some tools for more specific tasks.

Data Extraction

One thing you will need to do very often is extract information from text streams, like IP addresses, domain names, etc.

You can use the awesome “grep” tool (http://man7.org/linux/man-pages/man1/grep.1.html) for that, but you need to know exactly which regular expression (https://en.wikipedia.org/wiki/Regular_expression) you need for each case.
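
For example, a rough grep equivalent of the IPv4 extraction could look like this (only a sketch: the pattern also matches strings like 999.999.999.999, which are not valid addresses):

$ grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' logfile.txt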

Also, some data extraction tasks can’t be done with regular expressions alone (like validating whether a domain exists).

For the data extraction process, you can use the habu.data.extract.* tools, which are the following:

  • habu.data.extract.ipv4
  • habu.data.extract.ipv6
  • habu.data.extract.domain
  • habu.data.extract.fqdn
  • habu.data.extract.email

A simple example:

$ cat logfile.txt | habu.data.extract.ipv4
181.143.135.154
192.168.0.128
152.118.223.187
181.12.112.78
181.22.122.130
181.22.122.130
172.16.5.14
200.125.122.20

You will probably see a lot of repeated results (like in the example above). To show each record only once, you can use the awesome “sort” tool with the “-u” (unique) parameter, like this:

$ cat logfile.txt | habu.data.extract.ipv4 | sort -u
152.118.223.187
172.16.5.14
181.12.112.78
181.143.135.154
181.22.122.130
192.168.0.128
200.125.122.20

For convenience, the habu.data.extract.* commands support the ‘-u’ option, which does the same:

$ cat logfile.txt | habu.data.extract.ipv4 -u
152.118.223.187
172.16.5.14
181.12.112.78
181.143.135.154
181.22.122.130
192.168.0.128
200.125.122.20

Data Enrichment

Now you have a list of the IP addresses that appear in the file ‘logfile.txt’. But you may need more information about each item, so you can use the command habu.data.enrich, like this:

$ cat logfile.txt | habu.data.extract.ipv4 -u | habu.data.enrich
[
 // LINES ABOVE REMOVED FOR CLARITY //
 {
     "item": "181.22.122.130",
     "family": "ipv4_address",
     "is_multicast": false,
     "is_global": true,
     "is_unspecified": false,
     "is_reserved": false,
     "is_loopback": false,
     "is_link_local": false
 },
 {
     "item": "192.168.0.128",
     "family": "ipv4_address",
     "is_multicast": false,
     "is_global": false,
     "is_unspecified": false,
     "is_reserved": false,
     "is_loopback": false,
     "is_link_local": false
 },
 {
     "item": "200.125.122.20",
     "family": "ipv4_address",
     "is_multicast": false,
     "is_global": true,
     "is_unspecified": false,
     "is_reserved": false,
     "is_loopback": false,
     "is_link_local": false
 }
]

Now you have a JSON-formatted list of items, each one with additional information.

In the current version, Habu supports the following item families:

  • ipv4_address
  • ipv4_network
  • ipv6_address
  • ipv6_network

More families may be added in future versions.
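
For example, the same kind of pipeline should work for IPv6 addresses, using the habu.data.extract.ipv6 tool listed above (the exact fields in the output may differ from the IPv4 example):

$ cat logfile.txt | habu.data.extract.ipv6 -u | habu.data.enrich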

Data Filtering

With the enrichment done in the previous step, we can use the command habu.data.filter to keep only the items we are interested in.

Suppose that you only want the private IP addresses; you can use the following:

$ cat logfile.txt | habu.data.extract.ipv4 -u | habu.data.enrich | habu.data.filter is_global false
[
    {
        "item": "172.16.5.14",
        "family": "ipv4_address",
        "is_multicast": false,
        "is_global": false,
        "is_unspecified": false,
        "is_reserved": false,
        "is_loopback": false,
        "is_link_local": false
    },
    {
        "item": "192.168.0.128",
        "family": "ipv4_address",
        "is_multicast": false,
        "is_global": false,
        "is_unspecified": false,
        "is_reserved": false,
        "is_loopback": false,
        "is_link_local": false
    }
]

You can pipe the commands as many times as you need to apply more filters.
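
For example, using the same ‘field value’ form shown above, you could chain a second filter to also discard multicast addresses (a sketch; with the data above, the result is the same two private addresses):

$ cat logfile.txt | habu.data.extract.ipv4 -u | habu.data.enrich | habu.data.filter is_global false | habu.data.filter is_multicast false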

Available Filters

Operator    Description                                                   Parameter
gt          The value is greater than                                     x
lt          The value is less than                                        x
eq          The value is equal to                                         x
ne          The value is not equal to                                     x
ge          The value is greater than or equal to                         x
le          The value is less than or equal to                            x
in          The value is in the following comma-separated list of items   x
contains    The value contains the following string                       x
defined     The field is defined
undefined   The field is not defined
true        The value is True
false       The value is False

The operators marked with an ‘x’ in the “Parameter” column need a parameter to compare against.

If you look at the description of each operator, this is pretty obvious: you need a parameter to compare values against when using operators like “greater than” or “less than”.

But you don’t need a parameter if you are only checking whether a field is defined or not.

Note: The command habu.data.filter can be used to filter any JSON-formatted text, not only output produced by another Habu command.
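
For example, assuming the same ‘field value’ form used above, you could filter a hand-written JSON list (the field names here are just illustrative):

$ echo '[{"name": "server1", "active": true}, {"name": "server2", "active": false}]' | habu.data.filter active false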

Data Selection

Finally, maybe you need to select only the value of one field from each item, like this:

$ cat logfile.txt | habu.data.extract.ipv4 -u | habu.data.enrich | habu.data.filter is_global false | habu.data.select item
172.16.5.14
192.168.0.128
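
Because the output is plain text again, you can keep piping it into standard Unix tools; for example, to count the private addresses found in the log (two, in this example):

$ cat logfile.txt | habu.data.extract.ipv4 -u | habu.data.enrich | habu.data.filter is_global false | habu.data.select item | wc -l
2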

Summary

And that’s all. With a few simple commands, you’ve completed the following steps:

  1. Extracted the IPv4 addresses from a text file
  2. Enriched each IPv4 address to learn things like whether it’s a public or a private address
  3. Filtered to keep only the private addresses, discarding the public ones
  4. Selected only the address part of each item, discarding the other information