Q: I need to use my Linux system to grep email addresses out of a text file. Is there a way I can tell grep to just look for emails?
A: You can use regular expressions with grep. With a well-constructed regex you can pull just about anything out of a text file. Below we use grep with the -E option, which tells grep to interpret the pattern as an extended regular expression. The -o option tells grep to print only the matching text, not the whole line.
grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt
You can also use egrep instead of grep with the -E switch, although recent GNU grep releases mark egrep as obsolescent in favor of grep -E.
egrep -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt
That's it. With the above regular expression you should be able to find most common email addresses in your file.
$ egrep -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" test
[email protected]
Let's break down the regular expression.
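\b is a word boundary, so we put one on each side. It matches the position between a word character (a letter, digit, or underscore) and a non-word character, so the match has to start and end at the edge of a word. It does not require a blank space.
[a-zA-Z0-9.-] tries to specify any valid character for the beginning of the email address. These are lowercase a to z, uppercase A to Z, any digit, a period, or a dash.
The plus sign means "one or more of the preceding element", here one or more characters from the bracketed set.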
Then we specify the @ symbol, which is very recognizable.
Then we repeat the same section looking for valid characters twice, separated by a period. This all makes up the basic structure of an email address.
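To see the pieces working together, here is a minimal sketch that runs the pattern against a couple of sample lines. The addresses and the extract_emails function name are made up for illustration, the period before the last section is written as \. so it only matches a literal period, and GNU grep is assumed (\b and -o are not part of POSIX grep).

```shell
# extract_emails is a hypothetical helper name used only in this example.
# Pattern pieces:
#   \b               word boundary at each end of the match
#   [a-zA-Z0-9.-]+   one or more letters, digits, periods, or dashes
#   @                a literal at sign
#   \.               a literal period
extract_emails() {
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' "$@"
}

# Two sample lines; only the first contains addresses.
printf '%s\n' \
  'Contact alice@example.com or bob.smith@mail.example.org today.' \
  'No address on this line.' | extract_emails
```

Each address is printed on its own line, and the line without an address produces no output.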
From grep man pages:
-E = Interpret PATTERN as an extended regular expression.
-o = Show only the part of a matching line that matches PATTERN.
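The effect of -o is easiest to see side by side. This is a small sketch with a made-up sample line, assuming GNU grep; the variable names are only for illustration.

```shell
line='Send reports to carol@example.net before Friday.'
pattern='\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b'

# Without -o grep prints every line that contains a match.
whole_line=$(printf '%s\n' "$line" | grep -E "$pattern")

# With -o grep prints only the text that matched.
match_only=$(printf '%s\n' "$line" | grep -E -o "$pattern")

printf 'without -o: %s\n' "$whole_line"
printf 'with -o:    %s\n' "$match_only"
```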
Resources:
GREP MAN PAGE: https://ss64.com/bash/grep.html
40 Comments
Thanks! I was almost there but not quite . .
Thank you... very useful.
You missed the underscore...
According to your regexp, [email protected] will be a valid email address.
Gentlemen, I appreciate you finding issues with my regex. Please post some solutions!
Thanks!
Thank you! Very Good!
thnx
Thanks a lot!
Thanks !!
grep -E -o "\b[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" filename.txt
I love this command! thank you!
Thanks. It works perfectly!
Just what I was after - thanks!
Thanks a lot. You saved my afternoon ^^
work, thanx
Can you help me understand \b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+.[a-zA-Z0-9.-]+\b? I don't understand it. Can you explain? Thanks
Thanks a lot. You are a great help!
egrep -i '^[a-z0-9.-]+@[a-z0-9.-]+.[a-z0-9.-]+$' filename.txt
This is using regular expressions, here is some reasoning.
\b = Tells grep to match a word boundary
[a-zA-Z0-9] = Tells grep to match any character from a-z, then the same thing capitalized, and also match anything from 0-9 (so basically any letter or number)
+ = Tells grep to match the preceding element one or more times, which here means one or more uppercase letters, lowercase letters, or digits.
And so on... Here are some good resources:
https://www.gnu.org/software/findutils/manual/html_node/find_html/egrep-regular-expression-syntax.html
http://www.cs.columbia.edu/~tal/3261/fall07/handout/egrep_mini-tutorial.htm
@Khanbaba khan - That is an imperfect solution. It will find "joe@domain." which is not a valid email address.
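For anyone who wants to check this, here is a small sketch comparing the anchored pattern with the unescaped period against a stricter variant. The stricter pattern (a literal period followed by at least two letters) and the check helper are my own illustration, not something from the post, and the stricter pattern still accepts plenty of invalid addresses.

```shell
# loose: anchored pattern where "." matches any character.
loose='^[a-z0-9.-]+@[a-z0-9.-]+.[a-z0-9.-]+$'
# strict: hypothetical variant requiring a literal period plus 2+ letters.
strict='^[a-z0-9.-]+@[a-z0-9.-]+\.[a-z]{2,}$'

# check PATTERN STRING prints "match" or "no-match".
check() {
  if printf '%s\n' "$2" | grep -E -i -q "$1"; then
    echo match
  else
    echo no-match
  fi
}

check "$loose"  'joe@domain.'      # match
check "$strict" 'joe@domain.'      # no-match
check "$strict" 'joe@domain.com'   # match
```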
Cool! But how do I add a comma "," after every email address? Like this: [email protected],[email protected],...
I came up with this in 5 seconds, might be a cleaner way though.
for i in $(grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" help); do echo -n "$i,"; done
Although you will have a comma at the end of the list.
BTW, that should all be on one line.
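One possible way to avoid the trailing comma is to let paste join the lines instead of looping. This is only a sketch: join_emails is a made-up helper name, the sample addresses are invented, and GNU grep is assumed.

```shell
# join_emails is a hypothetical helper: extract addresses, then join
# the resulting lines with commas. paste -s merges all input lines
# into one, and -d, sets the separator to a comma.
join_emails() {
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b' "$@" | paste -sd, -
}

printf '%s\n' \
  'a@example.com' \
  'nothing here' \
  'b@example.org and c@example.net' | join_emails
# prints: a@example.com,b@example.org,c@example.net
```

Because paste only puts the separator between lines, there is no comma after the last address.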
Thank you . It was very useful...............
Oh, you just saved me an hour!
You're right that you can't end with a period. Though this leads to a related issue: nearly anything@anything is a valid email address according to the full spec (RFC822), including things like "{-a.-b@c=d$*!/?" (not to mention Unicode). If it doesn't matter for your application to reject uncommon addresses, this isn't much of an issue; just force people to get a "real" address that ends in .blah and doesn't contain fancy symbols. But if you want to err on the side of caution, *@* is pretty much the only way to go. A separate RegEx or script can be used later for actual validation. For example, processing the TLD according to a separate whitelist (like only accepting currently valid TLDs like com, net, gov, tv) though even that changes yearly and the list numbers in the thousands.
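The two-stage idea above (grab anything that looks like something@something, then validate separately) could be sketched like this. The function names and the four-entry TLD list are purely illustrative assumptions; a real allowlist would be far longer and would need regular updates.

```shell
# Stage 1: very permissive. Anything without whitespace or "@" on
# either side of a single "@" counts as a candidate.
loose_extract() {
  grep -E -o '[^[:space:]@]+@[^[:space:]@]+' "$@"
}

# Stage 2: keep only candidates that end in an allowed TLD.
# The list here is a tiny hypothetical sample.
filter_by_tld() {
  grep -E -i '\.(com|net|gov|tv)$'
}

printf '%s\n' \
  'odd{-a.-b@c=d ok: dave@example.com' \
  'erin@example.invalid' | loose_extract | filter_by_tld
# prints: dave@example.com
```

Note that stage 1 keeps trailing punctuation, so an address followed directly by a period or comma would fail the stage 2 check unless it is trimmed first.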
Simpler with [:alnum:]. "_" and "-" allowed and verify correct string length of domain and top domain:
egrep -o "[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}"
Sorry correction
egrep -o "\b[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}\b"
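Here is a quick sketch of how the [:alnum:] version behaves on a few made-up inputs, assuming GNU grep and writing the period as \. with \b on both ends. The alnum_extract name is hypothetical. It shows the length check working, and also a trade-off: since the bracket expressions do not include a period, dotted local parts and subdomains get cut short.

```shell
# alnum_extract is a hypothetical helper wrapping the [:alnum:] pattern.
alnum_extract() {
  grep -E -o '\b[[:alnum:]_-]+@[[:alnum:]_-]{2,}\.[[:alnum:]]{2,}\b' "$@"
}

printf '%s\n' \
  'frank_w@example.io' \
  'x@a.b' \
  'bob.smith@mail.example.org' | alnum_extract
# prints:
#   frank_w@example.io
#   smith@mail.example
```

The second line is rejected because the domain and top-level parts are each only one character long.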
thank you. This was a timesaver
Thank you 🙂 !!
Thank you for this command 🙂 it is quite useful to extract emails from various files, not only txt but csv and similar...
The plus sign (+) is valid in an email address on most email systems. [email protected] will get delivered to [email protected], but you will still see the +which part. Since it is ignored for delivery, you know which company is spamming you.
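To pick up plus-tagged addresses, the first bracket expression needs a + (and, as mentioned in an earlier comment, an underscore). The sketch below compares the two patterns on a made-up address, assuming GNU grep; the variable names are just for illustration.

```shell
addr='gina+shop@example.com'

# Original character set: "+" is not allowed, so the match can only
# start after it.
original=$(printf '%s\n' "$addr" |
  grep -E -o '\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b')

# Local part extended with "_" and "+"; the dash stays last so it is
# taken literally rather than as a range.
plus_aware=$(printf '%s\n' "$addr" |
  grep -E -o '\b[a-zA-Z0-9._+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b')

printf 'original:   %s\n' "$original"     # shop@example.com
printf 'plus-aware: %s\n' "$plus_aware"   # gina+shop@example.com
```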
late to the party by a good few years here, but this was very helpful! thankyou!
Thanks !
Thanks !
thank you
Thanks!
Shouldn't the dot after the second + sign be prefixed by \ ?