In Linux, you can use regular expressions with grep to extract an IP address from a file. The grep command's -E option tells it to interpret the pattern as an extended regular expression.

An IPv4 address is a 32-bit number, written as four groups of 8 bits (called octets), each separated by a dot. Each octet can range from 0 to 255.
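
To make the octet layout concrete, here is a minimal sketch that packs the four octets of a dotted address into its 32-bit value using shell arithmetic. The function name ip_to_int and the sample address are illustrative assumptions, and the sketch expects octets without leading zeros (shell arithmetic would treat a leading zero as octal).

```shell
# ip_to_int is a hypothetical helper, not part of any standard tool.
# The body runs in a subshell, so a, b, c and d do not leak to the caller.
ip_to_int() (
    # Split the argument on dots into four octets.
    IFS=. read -r a b c d <<EOF
$1
EOF
    # Each octet occupies 8 bits of the 32-bit value.
    echo $(( ($a << 24) | ($b << 16) | ($c << 8) | $d ))
)

ip_to_int 192.168.1.10   # prints 3232235786
```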

To start we will create a text file that contains both valid and invalid IP addresses.

[savona@putor ~]$ cat ips.txt 

Now we can create a simple regular expression to look for 4 blocks of 1-3 digits separated by a dot, like so:

grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ips.txt

That command would be fine for most purposes, but it would still match an invalid IP address (remember, an octet cannot be higher than 255).

Here is an example:

[savona@putor ~]$ grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' ips.txt

In the above example, it found all the entries. As we know, 2 of these are NOT valid IP addresses.
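
The same behaviour can be reproduced without a file. The following is a small sketch that feeds a few made-up addresses to grep on standard input; the sample values and variable names are assumptions for illustration, not the contents of ips.txt.

```shell
# Hypothetical sample: two in-range addresses and two with an octet above 255.
sample='192.168.1.10
10.0.0.1
256.100.50.25
192.168.1.999'

# The simple pattern checks only the shape (1-3 digits between dots), not the range.
simple='[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'

# -c prints the number of matching lines instead of the lines themselves.
simple_count=$(printf '%s\n' "$sample" | grep -c -E "$simple")
echo "$simple_count"   # prints 4: every line matches, valid or not
```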

We have to expand this regular expression so that it rejects octets higher than 255. Here is our second attempt to build a regular expression that would only extract valid IP addresses.

[savona@putor ~]$ grep -E '^((25[0-5]|2[0-4][0-9]|[1]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[1]?[1-9]?[0-9])$' ips.txt

The above does a good job, but it still has issues. It will not find an IP address with leading zeros, nor will it find an IP address where 0 is the only digit in one of the first three octets. It also misses octets from 100 to 109, because the pattern requires a nonzero digit right after the optional leading 1.
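
A quick way to see these gaps is to run the second pattern (written here with the dots escaped) against a few made-up addresses. This is only a sketch; the sample values and variable names are assumptions.

```shell
# Second-attempt pattern, anchored to the whole line, with literal dots.
second='^((25[0-5]|2[0-4][0-9]|[1]?[1-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[1]?[1-9]?[0-9])$'

# Hypothetical sample: all four addresses are numerically in range.
sample='192.168.1.10
10.0.0.1
192.168.100.1
192.168.001.010'

second_matches=$(printf '%s\n' "$sample" | grep -E "$second")
printf '%s\n' "$second_matches"   # prints only 192.168.1.10
```

The lines with a lone 0 octet, an octet of 100, and leading zeros are all skipped, even though each of them describes an address whose octets are within 0 to 255.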

In order to find a regular expression that will only extract valid IP addresses, we have to go to great lengths to validate every octet in the pattern.

Here is an example:

[savona@putor ~]$ grep -E '^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$' ips.txt

Now we are able to grep valid IP addresses. But even the above is not without its flaws.
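
One detail worth checking with a pattern like this is anchoring. Without ^ and $, grep is satisfied by any valid-looking substring, so a line holding an out-of-range address can still be printed. The sketch below builds the same per-octet pattern from a shell variable and compares the two forms; the variable names and sample addresses are assumptions, and it expects one address per line.

```shell
# One octet: 250-255, 200-249, or up to three digits starting with 0 or 1.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'
ipv4="${octet}\.${octet}\.${octet}\.${octet}"

# Hypothetical sample: the last two lines contain an octet above 255.
sample='192.168.1.10
10.0.0.1
192.168.001.010
256.100.50.25
192.168.1.999'

# Unanchored: "56.100.50.25" and "192.168.1.99" are enough to match a line.
unanchored_count=$(printf '%s\n' "$sample" | grep -c -E "$ipv4")

# Anchored: the whole line must be a single in-range address.
anchored_count=$(printf '%s\n' "$sample" | grep -c -E "^${ipv4}\$")

echo "unanchored: $unanchored_count, anchored: $anchored_count"
# prints "unanchored: 5, anchored: 3"
```

The anchored form still accepts the address written with leading zeros, which is the flaw discussed next.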

It is standard practice to drop leading zeros when writing both IPv4 and IPv6 addresses. But just because that is the conventional representation, it doesn't mean everyone follows it.

For example, you could represent a zero-filled octet with three zeros or a single zero.

Both of these addresses would work fine on a network, but even ping removes the leading zeros.

[savona@putor ~]$ ping
 PING ( 56(84) bytes of data.
 64 bytes from icmp_seq=1 ttl=64 time=0.252 ms
 64 bytes from icmp_seq=2 ttl=64 time=0.312 ms

There is a little tool built by some folks at Red Hat called ipcalc. It was originally built to calculate IP information for a host, but it can also be used for IP address validation. Sure enough, this tool also calls an IP address with leading zeros an invalid IP address.

[savona@putor ~]$ ipcalc -c
 ipcalc: bad IPv4 address:
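
A grep pattern can be made similarly strict about leading zeros. The sketch below is one possible way to do it, assuming one address per line: each octet alternative is written so that a multi-digit octet cannot start with 0. The variable names and sample addresses are illustrative assumptions.

```shell
# One octet with no leading zeros: 250-255, 200-249, 100-199, or 0-99.
strict_octet='(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])'
strict="^${strict_octet}\.${strict_octet}\.${strict_octet}\.${strict_octet}\$"

# Hypothetical sample: one line uses leading zeros, one is out of range.
sample='192.168.1.10
10.0.0.1
192.168.001.010
256.100.50.25'

strict_matches=$(printf '%s\n' "$sample" | grep -E "$strict")
printf '%s\n' "$strict_matches"
# prints:
# 192.168.1.10
# 10.0.0.1
```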

As you can see it is fairly easy to use grep and regular expressions to extract an IP address from a file. It is not so easy to ensure that the matched pattern is a valid IP address.
