PHP Email validation function

My current PHP email validation function is:

function isEmail($value, $network_validation = true) {

    // Create the syntactical validation regular expression
    // (broken in 2 lines for better readability).
    // {2,} instead of {2,4} lets longer TLDs such as .museum pass, and the
    // D modifier stops "$" from also matching before a trailing newline.
    $regexp = "/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@".
              "([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,})$/iD";

    // Validate the syntax
    if (preg_match($regexp, $value)) {
        if (! $network_validation)
            return true;

        // The regexp allows exactly one '@', so this yields user and domain
        list($username, $domaintld) = explode('@', $value);

        // Validate the domain: an MX record, or an A record as fallback
        if (getmxrr($domaintld, $mxrecords) || checkdnsrr($domaintld, 'A'))
            return true;
    }
    return false;
}
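As an alternative to maintaining a hand-written regular expression, PHP's filter extension (enabled by default) ships an email validator, `FILTER_VALIDATE_EMAIL`. The sketch below is not the function above: it swaps in `filter_var()` for the syntax check and keeps the same optional DNS check. The name `isEmailFilter` is hypothetical.

```php
<?php
// Alternative sketch (not the original function): syntax check via PHP's
// built-in FILTER_VALIDATE_EMAIL, followed by the same optional DNS check.
function isEmailFilter(string $value, bool $network_validation = true): bool
{
    // filter_var() returns the value on success and false on failure
    if (filter_var($value, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    if (!$network_validation) {
        return true;
    }

    // Domain part: everything after the last '@'
    $domain = substr(strrchr($value, '@'), 1);

    // Accept the domain if it has an MX record, or an A record as a fallback
    return getmxrr($domain, $mxrecords) || checkdnsrr($domain, 'A');
}

var_dump(isEmailFilter('john.doe@example.com', false)); // bool(true)
var_dump(isEmailFilter('not-an-email', false));         // bool(false)
```

Passing `false` as the second argument skips the DNS lookup, which is handy for offline tests.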

This is mostly for my own reference. It may well be flawed, so use it at your own risk.

PHP regexp replace word(s) in html string if not inside tags

The problem was to find and replace text inside HTML without breaking the markup. Take, for example, this string:

<img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is my string</p>

Suppose you want to replace the word “my” with another string, or wrap it in a tag (let’s say <strong></strong>), but only where “my” appears outside the HTML tags. After the transformation it should look like this:

<img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is <strong>my</strong> string</p>

With PHP's regular expression functions, the usual find-and-replace with word boundaries fails here:

preg_replace('/\b(my)\b/i',
             '<strong>$1</strong>',
             $html_string);

You will end up with broken HTML:

<img title="<strong>My</strong> image" alt="<strong>My</strong> image" src="/gfx/this is <strong>my</strong> image.gif"><p>This is <strong>my</strong> string</p>

Now imagine the mess if you were replacing words like “form” or “alt”, which can appear as a text node, an HTML tag name or an attribute name.

So how do we fix this? I figured that the only thing all tags have in common is their opening and closing characters, < and >. From there, you search for the word you want to replace together with everything up to the next closing character (the > sign). Then, within the matched text, you look for an opening character: if there is one, the word is in a text node and can be replaced; if there isn't, you are inside a tag, so you skip the replacement. Here is the code:

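// Called for each "word ... next '>'" match
function checkOpenTag($matches) {
    // No '<' between the word and the next '>': the word is inside a tag, leave it alone
    if (strpos($matches[0], '<') === false) {
        return $matches[0];
    } else {
        return '<strong>'.$matches[1].'</strong>'.$matches[2];
    }
}

// preg_replace_callback() returns the new string, so keep the result
$html_string = preg_replace_callback('/(\bmy\b)(.*?>)/i',
                                     'checkOpenTag',
                                     $html_string);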
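A quick self-contained check of this approach against the example string. It uses an anonymous function holding the same logic as `checkOpenTag`, so it can be pasted next to the code above without a name clash. The second call shows the limitation addressed in the update further down: after one replacement, the rest of the match up to the next `>` is returned untouched, so later occurrences in the same text run are skipped.

```php
<?php
// Same test as checkOpenTag, written as a closure.
$callback = function (array $matches): string {
    // No '<' between the word and the next '>': the word is inside a tag.
    if (strpos($matches[0], '<') === false) {
        return $matches[0];
    }
    return '<strong>' . $matches[1] . '</strong>' . $matches[2];
};

// Sample string from the article, with straight quotes.
$html_string = '<img title="My image" alt="My image" src="/gfx/this is my image.gif">'
    . '<p>This is my string</p>';

$result = preg_replace_callback('/(\bmy\b)(.*?>)/i', $callback, $html_string);
echo $result, "\n";
// <img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is <strong>my</strong> string</p>

// Only the first "my" in this paragraph gets wrapped.
$second = preg_replace_callback('/(\bmy\b)(.*?>)/i', $callback, '<p>my bird is my life</p>');
echo $second, "\n";
// <p><strong>my</strong> bird is my life</p>
```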

If you use this kind of code to search for several words in an HTML text (for example, a glossary), test its performance and consider caching the results.
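One possible way to handle a glossary, sketched under a few assumptions: the terms are plain words, PHP 7.4 or later is available (for the arrow function), and a per-request memo is enough. All words are escaped with `preg_quote()` and joined into a single alternation, so the HTML is scanned in one pass instead of once per word. The callback recurses on the remainder of each match, in the spirit of the update below, so repeated words are all wrapped. The name `highlightWords` is hypothetical; a cache that survives between requests would need something like APCu or files instead of a `static` array.

```php
<?php
// Hypothetical helper (not from the original post): wraps every listed word
// that sits outside HTML tags in <strong>, scanning with one combined pattern.
function highlightWords(string $html, array $words): string
{
    if ($words === []) {
        return $html; // nothing to look for
    }

    // Per-request memo keyed by the input HTML and the word list.
    static $cache = [];
    $key = md5($html . "\0" . implode("\0", $words));
    if (isset($cache[$key])) {
        return $cache[$key];
    }

    // One alternation for all words, each escaped: /\b(foo|bar)\b(.*?>)/i
    $quoted = array_map(fn($w) => preg_quote($w, '/'), $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b(.*?>)/i';

    // Same test as checkOpenTag: no '<' before the next '>' means "inside a tag".
    $callback = function (array $m) use (&$callback, $pattern): string {
        if (strpos($m[0], '<') === false) {
            return $m[0];
        }
        // Text node: wrap the word, then keep scanning the rest of the match.
        return '<strong>' . $m[1] . '</strong>'
            . (preg_replace_callback($pattern, $callback, $m[2]) ?? $m[2]);
    };

    return $cache[$key] = preg_replace_callback($pattern, $callback, $html) ?? $html;
}

echo highlightWords('<p>my bird is my life</p>', ['my', 'bird']), "\n";
// <p><strong>my</strong> <strong>bird</strong> is <strong>my</strong> life</p>
```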

That’s it. Remember that while this solution worked fine for me, it may work terribly for you, so proceed at your own risk (aka liability disclaimer).

UPDATE 19-04-14
A comment on this post warned that only the first occurrence in each HTML segment was being replaced. Here is an updated version of the PHP example with this issue fixed:

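<?php

class replaceIfNotInsideTags {

  // The word being replaced (declared to avoid a dynamic property)
  private $word;

  private function checkOpenTag($matches) {
    // No '<' before the next '>': the word is inside a tag, leave it as is
    if (strpos($matches[0], '<') === false) {
      return $matches[0];
    } else {
      // Text node: wrap the word, then keep replacing in the rest of the match
      return '<strong>'.$matches[1].'</strong>'.$this->doReplace($matches[2]);
    }
  }

  private function doReplace($html) {
    // preg_quote() escapes regex metacharacters in the word
    return preg_replace_callback('/(\b'.preg_quote($this->word, '/').'\b)(.*?>)/i',
                                 array($this, 'checkOpenTag'),
                                 $html);
  }

  public function replace($html, $word) {
    $this->word = $word;

    return $this->doReplace($html);
  }
}

$html = '<p>my bird is my life is my dream</p>';

$obj = new replaceIfNotInsideTags();
echo $obj->replace($html, 'my');
// <p><strong>my</strong> bird is <strong>my</strong> life is <strong>my</strong> dream</p>

?>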