We had to find a way to prevent users from publishing their contact information in our
documenta accommodation project.
Here are a few regexps that make your life easier (they are not 100% watertight, but they prevent most forms of cheating):
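patterns = [
  # e-mail addresses, including obfuscated forms such as "name (at) host . tld"
  /[\w.%-]+\s*(?:@|\[at\]|\(at\)|at)\s*[\w.-]+\s*\.\s*[a-zA-Z]{2,4}/i,
  # URLs and bare domains; scheme and "www." are optional groups, not character classes
  /(?:https?:\/\/)?\s*(?:www\s*\.)?\s*[\w.-]+\s*\.\s*(?:de|at|ch|com|org|net)/i,
  # runs of two or more digits (phone numbers)
  /\d{2,}/
]
result = orig
patterns.each do |pattern|
  result = result.gsub(pattern, '')
end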
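The same loop can be wrapped in a small helper, for example to reject a text outright instead of silently stripping it. This is only a minimal sketch: the names `CONTACT_PATTERNS`, `strip_contact_info` and `contact_info?` are made up for illustration, and the patterns follow the ones above.

```ruby
# Hypothetical helpers, not part of the original script.
CONTACT_PATTERNS = [
  # e-mail addresses, including "(at)" / "[at]" / "at" obfuscation
  /[\w.%-]+\s*(?:@|\[at\]|\(at\)|at)\s*[\w.-]+\s*\.\s*[a-zA-Z]{2,4}/i,
  # URLs and bare domains with an optional scheme and "www." prefix
  /(?:https?:\/\/)?\s*(?:www\s*\.)?\s*[\w.-]+\s*\.\s*(?:de|at|ch|com|org|net)/i,
  # runs of two or more digits (phone numbers)
  /\d{2,}/
].freeze

# Returns a copy of text with every match of every pattern removed.
def strip_contact_info(text)
  CONTACT_PATTERNS.reduce(text) { |result, pattern| result.gsub(pattern, '') }
end

# Returns true if any pattern matches somewhere in text.
def contact_info?(text)
  CONTACT_PATTERNS.any? { |pattern| pattern.match?(text) }
end

puts strip_contact_info("test@web.de").inspect   # prints ""
puts contact_info?("nice flat near the park")    # prints false
```

Checking with `contact_info?` before saving lets the application show a validation error, which is probably friendlier than publishing a text with holes in it.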
The full script, including a test:
#!/usr/bin/env ruby -w
#
# INPUT
#
orig =
"""test@web.de
test2 @ web.de
test3 @ web . de
test4 AT web.de
test5(aT)betrug.de
test6 (at) betrug.de
test7[At]betrug.de
test8 [at] betrug.de
http://www.betrug.de
www.betrug.de
betrug.de
www.betrug.com
http://www.betrug.com
www . betrug . com
www. betrug.de
01702138958
01602138958
0171 2138958
0561 1234567
+49 171 2138958
00491712138958
171 213895
"""
#
# REGEXP PATTERNS
#
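patterns = [
  # e-mail addresses, including obfuscated forms such as "name (at) host . tld"
  /[\w.%-]+\s*(?:@|\[at\]|\(at\)|at)\s*[\w.-]+\s*\.\s*[a-zA-Z]{2,4}/i,
  # URLs and bare domains; scheme and "www." are optional groups, not character classes
  /(?:https?:\/\/)?\s*(?:www\s*\.)?\s*[\w.-]+\s*\.\s*(?:de|at|ch|com|org|net)/i,
  # runs of two or more digits (phone numbers)
  /\d{2,}/
]
result = orig
patterns.each do |pattern|
  result = result.gsub(pattern, '')
end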
#
# TEST
#
spacer = "-" * 50
puts spacer
puts "ORIGINAL"
puts spacer
# String#each no longer exists since Ruby 1.9, so iterate with each_line
orig.each_line.with_index do |line, i|
  puts "#{i}: #{line}"
end
puts spacer
puts "RESULT"
puts spacer
result.each_line.with_index do |line, i|
  puts "#{i}: #{line}"
end
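Comparing the two printouts by eye gets tedious as the sample list grows. A rough sketch of an automated check follows: it filters each sample line on its own and returns the ones that still contain visible text. The helper name `unfiltered_samples` is hypothetical, and the two patterns in the demo are simple stand-ins, not the ones from the script.

```ruby
# Hypothetical helper: returns the sample lines that still contain
# visible text after all patterns have been applied.
def unfiltered_samples(samples, patterns)
  samples.each_line.map(&:chomp).reject(&:empty?).select do |sample|
    remainder = patterns.reduce(sample) { |text, pattern| text.gsub(pattern, '') }
    !remainder.strip.empty?
  end
end

# Stand-in data and patterns, for illustration only.
demo_samples  = "test@web.de\n0171 2138958\nplain text\n"
demo_patterns = [/\S+@\S+/, /\d{2,}/]

p unfiltered_samples(demo_samples, demo_patterns)   # prints ["plain text"]
```

Called as `unfiltered_samples(orig, patterns)` with the variables from the script, it would list the test inputs that slip through the filter, at least in part.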