Strip contact data from strings

marcus • May 26th, 2007

we had to find a way to prevent users from publishing their contact information for our documenta accommodation project. here are a few regexps that make your life easier (they're not 100% watertight, but they prevent most forms of cheating):
patterns = [
  /([\w\._%-]+[\s]*)(@|at|\[at\]|\(at\))([\s]*[\w\.-]+[\s]*\.[\s]*[a-zA-Z]{2,4})/i,
  /([http\:\/\/]*)([\s]*)([www\.]*)([\s]*)([\w\.-]+)([\s]*)\.([\s]*)(de|at|ch|com|org|net)/i,
  /[\d]{2,}/
]

result = orig
for p in patterns
  result = result.gsub(p, '')
end
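
a minimal sketch of how the loop above could be wrapped into a reusable method. the names `CONTACT_PATTERNS` and `strip_contact_data` are made up for illustration and are not part of the original script; the three patterns are copied unchanged.

```ruby
# Hypothetical helper (names are assumptions, not from the original script).
# The patterns themselves are the ones listed above, unchanged.
CONTACT_PATTERNS = [
  /([\w\._%-]+[\s]*)(@|at|\[at\]|\(at\))([\s]*[\w\.-]+[\s]*\.[\s]*[a-zA-Z]{2,4})/i,
  /([http\:\/\/]*)([\s]*)([www\.]*)([\s]*)([\w\.-]+)([\s]*)\.([\s]*)(de|at|ch|com|org|net)/i,
  /[\d]{2,}/
]

def strip_contact_data(text, patterns = CONTACT_PATTERNS)
  # Apply each pattern in turn, feeding the result of one gsub into the next.
  patterns.inject(text) { |result, pattern| result.gsub(pattern, '') }
end

puts strip_contact_data("test@web.de").inspect           # => ""
puts strip_contact_data("http://www.betrug.de").inspect  # => ""
puts strip_contact_data("0171 2138958").inspect          # => " "
```

since `gsub` returns a new string, the input is left untouched, which is why the script can still print the original next to the result.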
the full script, including test:
#!/usr/bin/env ruby -w

#
# INPUT
#
orig = 
"""test@web.de
test2 @ web.de
test3 @ web . de
test4 AT web.de
test5(aT)betrug.de
test6 (at) betrug.de
test7[At]betrug.de
test8 [at] betrug.de
http://www.betrug.de
www.betrug.de
betrug.de
www.betrug.com
http://www.betrug.com
www . betrug . com
www. betrug.de
01702138958
01602138958 
0171 2138958
0561 1234567
+49 171 2138958
00491712138958
171 213895
"""

#
# REGEXP PATTERNS 
#
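patterns = [
  # e-mail: local part, then "@", "at", "[at]" or "(at)", then a domain (spaces tolerated)
  /([\w\._%-]+[\s]*)(@|at|\[at\]|\(at\))([\s]*[\w\.-]+[\s]*\.[\s]*[a-zA-Z]{2,4})/i,
  # url or bare domain ending in one of the listed TLDs. note that [http\:\/\/]* and
  # [www\.]* are character classes, so they match any run of those characters
  /([http\:\/\/]*)([\s]*)([www\.]*)([\s]*)([\w\.-]+)([\s]*)\.([\s]*)(de|at|ch|com|org|net)/i,
  # phone numbers: any run of two or more digits
  /[\d]{2,}/
]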

result = orig
for p in patterns
  result = result.gsub(p, '')
end


#
# TEST
#
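spacer = "-" * 50

puts spacer
puts "ORIGINAL"
puts spacer

# String#each is gone since ruby 1.9, so `for line in orig` no longer works;
# each_line iterates over the lines explicitly
orig.each_line.with_index do |line, i|
  puts "#{i}: #{line}"
end

puts spacer
puts "RESULT"
puts spacer

result.each_line.with_index do |line, i|
  puts "#{i}: #{line}"
end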
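
the test above only prints the text before and after, so the result has to be checked by eye. a minimal sketch of an automated check follows, under the assumption that each sample is stripped on its own rather than as one multi-line string. `CONTACT_PATTERNS`, `strip_contact_data` and `EXPECTED` are hypothetical names; the patterns are the ones from the script.

```ruby
# Sketch only: names are assumptions, the patterns are copied from the script.
CONTACT_PATTERNS = [
  /([\w\._%-]+[\s]*)(@|at|\[at\]|\(at\))([\s]*[\w\.-]+[\s]*\.[\s]*[a-zA-Z]{2,4})/i,
  /([http\:\/\/]*)([\s]*)([www\.]*)([\s]*)([\w\.-]+)([\s]*)\.([\s]*)(de|at|ch|com|org|net)/i,
  /[\d]{2,}/
]

def strip_contact_data(text)
  CONTACT_PATTERNS.inject(text) { |result, pattern| result.gsub(pattern, '') }
end

# sample input => what the patterns leave behind
EXPECTED = {
  "test3 @ web . de"   => "",
  "test5(aT)betrug.de" => "",
  "test7[At]betrug.de" => "",
  "betrug.de"          => "",
  "+49 171 2138958"    => "+  ",  # only the digit runs are removed
  "www . betrug . com" => "www"   # the spaced-out "www" is not caught
}

# collect every sample whose stripped form differs from the expectation
failures = EXPECTED.reject { |input, expected| strip_contact_data(input) == expected }

if failures.empty?
  puts "all #{EXPECTED.size} samples behave as expected"
else
  failures.each_key do |input|
    puts "unexpected: #{input.inspect} -> #{strip_contact_data(input).inspect}"
  end
end
```

the last two entries show the kind of residue meant by "not 100% watertight": the contact data is gone, but fragments such as "+" or "www" stay in the text.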

Tags

+opensource +ruby +snippets