preg_replace
(PHP 3 >= 3.0.9, PHP 4, PHP 5)
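preg_replace -- Perform a regular expression search and replace
Description
mixed preg_replace ( mixed pattern, mixed replacement, mixed subject [, int limit] )
Searches subject for matches to pattern and replaces them with replacement. If limit is specified, only limit matches are replaced; if limit is omitted or is -1, all matches are replaced.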
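replacement may contain references of the form \\n or (since PHP 4.0.4) $n, with the latter form being the preferred one. Every such reference is replaced by the text captured by the n-th parenthesized subpattern. n can be from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing subpattern.
When a backreference in the replacement is immediately followed by another digit (i.e. a literal number placed right after a matched pattern), you cannot use the familiar \\1 notation for the backreference. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1 or the \\11 backreference. In this case the solution is to use \${1}1. This creates an isolated $1 backreference and leaves the other 1 as a plain literal.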
Example 1. Using a backreference followed immediately by a digit
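The code for Example 1 did not survive in this copy of the page. The following is a minimal sketch of the technique described above; the input string and the pattern are assumptions chosen for illustration, not the manual's own example.

```php
<?php
// Hypothetical input and pattern, chosen only to illustrate ${1}1.
$string  = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';

// '${1}1' means: backreference 1, followed by a literal digit 1.
// Single quotes keep PHP itself from interpreting the dollar signs.
$replacement = '${1}1,$3';

$result = preg_replace($pattern, $replacement, $string);
echo $result, "\n"; // April1,2003
```

Writing '$11,$3' instead would ask for subpattern 11, which this pattern does not have.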
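If matches are found, the new subject is returned; otherwise subject is returned unchanged.
Every parameter to preg_replace() (except limit) can be an array. If both pattern and replacement are arrays, they are processed in the order in which their keys appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, you should call ksort() on each array before calling preg_replace().
Example 2. Using indexed arrays with preg_replace()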
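The code for Example 2 is also missing here. As a sketch of the key-order behaviour described above (the sentence and the arrays are assumptions made up for illustration):

```php
<?php
$string = 'The quick brown fox jumped over the lazy dog.';

$patterns = array(0 => '/quick/', 1 => '/brown/', 2 => '/fox/');
// Keys 2, 1, 0 are inserted in that order, so iteration order is bear, black, slow.
$replacements = array(2 => 'bear', 1 => 'black', 0 => 'slow');

// Arrays are paired by position, not by key.
$unsorted = preg_replace($patterns, $replacements, $string);

// After ksort() the positions agree with the numeric indexes.
ksort($patterns);
ksort($replacements);
$sorted = preg_replace($patterns, $replacements, $string);

echo $unsorted, "\n"; // The bear black slow jumped over the lazy dog.
echo $sorted, "\n";   // The slow black bear jumped over the lazy dog.
```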
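If subject is an array, the search and replace is performed on every entry of subject, and an array is returned.
If pattern and replacement are both arrays, preg_replace() takes a value from each array in turn and uses them to search and replace on subject. If replacement has fewer values than pattern, an empty string is used for the remaining replacement values. If pattern is an array and replacement is a string, this string is used as the replacement for every value of pattern. The converse would not make sense.
The /e modifier makes preg_replace() treat the replacement parameter as PHP code (after the appropriate backreferences have been substituted). Tip: make sure that replacement constitutes a valid PHP code string, otherwise PHP will report a parse error at the line containing preg_replace().
Example 3. Replacing several values
This example will output: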
Example 4. Using the /e modifier
This would capitalize all HTML tags in the input text.
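The code for Example 4 is missing from this copy. Since the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, the sketch below gets the same effect with preg_replace_callback() and a closure (PHP 5.3 or later). The pattern is the one quoted in a comment further down this page; the sample HTML is an assumption.

```php
<?php
// Hypothetical sample input.
$html = '<p class="x">Hello <b>world</b></p>';

// $m[1] is "<" or "</", $m[2] is the tag name, $m[3] is the rest of the tag.
$result = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html
);

echo $result, "\n"; // <P class="x">Hello <B>world</B></P>
```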
Example 5. Converting HTML to text
Note: The limit parameter was added after PHP 4.0.1pl2.
25-May-2006 01:58
Updated version of the link script, since the other version didn't work with links at the beginning of a line, links without http://, or emails. Oh, and bf2:// detection too for all you gamers ;)
function make_links_blank($text)
{
return preg_replace(
array(
'/(?(?=<a[^>]*>.+<\/a>)
(?:<a[^>]*>.+<\/a>)
|
([^="\']?)((?:https?|ftp|bf2|):\/\/[^<> \n\r]+)
)/iex',
'/<a([^>]*)target="?[^"\']+"?/i',
'/<a([^>]+)>/i',
'/(^|\s)(www.[^<> \n\r]+)/iex',
'/(([_A-Za-z0-9-]+)(\\.[_A-Za-z0-9-]+)*@([A-Za-z0-9-]+)
(\\.[A-Za-z0-9-]+)*)/iex'
),
array(
"stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
'<a\\1',
'<a\\1 target="_blank">',
"stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\">\\2</a>\\3':'\\0'))",
"stripslashes((strlen('\\2')>0?'<a href=\"mailto:\\0\">\\0</a>':'\\0'))"
),
$text
);
}
19-May-2006 11:28
Re: preg_replace() with the /e modifier; handling escaped quotes.
I was writing a replacement pattern to parse HTML text which sometimes contained PHP variable-like strings. Various initial solutions yielded either escaped quotes or fatal errors due to these variable-like strings being interpreted as poorly formed variables.
"Tim K." and "steven -a-t- acko dot net" provide some detailed discussion of preg_replace's quote escaping in the comments below, including the use of str_replace() to remove the slash-quote added by preg_replace. However, this suggestion is applied to the entire text AFTER the preg_replace. This isn't a robust solution, in that the text unaffected by the preg_replace() may conceivably contain the string \" which should not be fixed. Furthermore, the addition of escaped quotes within preg_replace calls with multiple patterns/replacements (with arrays) may break one of the following patterns.
The solution, then, must fix the quote-escaped text BEFORE replacing it in the target, and possibly before it is passed to a function within the replacement code. Since the replacement string is interpreted as PHP code, just use str_replace('\"','"','$1') where you need an unadulterated $1 to appear. The key is to properly escape the necessary characters. Three variations appear in the examples below, as well as a set of incorrect examples. I haven't seen this solution posted before, so hopefully this will be helpful rather than covering old ground.
Try this example code:
<?php
/*
Using preg_replace with the /e modifier on ANY text, regardless of single
quotes, double quotes, dollar signs, backslashes, and variable interpolation.
Tested on PHP 5.0.4 (cli), PHP 5.1.2-1+b1 (cli), and PHP 5.1.2 for Win32.
Solution?
1. Use single quotes for the replacement string.
2. Use escaped single quotes around the captured text variable (\'$1\').
3. Use str_replace() to remove the escaped double quotes
from within the replacement code (\" -> ").
*/
function _prc_function1($var1,$var2,$match) {
$match = str_replace('\\"','"',$match);
// ... do other stuff ...
return $var1.$match.$var2;
}
function _prc_function2($var1,$var2,$match) {
// ... do other stuff ...
return $var1.$match.$var2;
}
$v1 = '(';
$v2 = ')';
// Lots of tricky characters:
$text = "<xxx>...\$varlike_text[_'\\\"\"\"'...</xxx>";
$pattern = '/<xxx>(.*?)<\/xxx>/e';
echo $text . " Original.<br>\n";
// Example #1 - Processing in place.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
$pattern,
'$v1 . str_replace(\'\\"\',\'"\',\'$1\') . $v2',
$text) . " Escaped double quotes replaced with str_replace. (Good.)<br>\n";
// Example #2 - Processing within a function.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
$pattern,
'_prc_function1($v1,$v2,\'$1\')',
$text) . " Escaped double quotes replaced in a function. (Good.)<br>\n";
// Example #3 - Preprocessing before a function.
// returns (...$varlike_text[_'\"""'...)
echo preg_replace(
$pattern,
'_prc_function2($v1,$v2,str_replace(\'\\"\',\'"\',\'$1\'))',
$text) . " Escaped double quotes replaced with str_replace before sending match to a function. (Good.)<br>\n";
// Example #4 - INCORRECT implementations
// a. returns (...$varlike_text[_'\\"\"\"'...)
// b. returns (...$varlike_text[_'\\"\"\"'...)
// c. returns (...$varlike_text[_'\\"\"\"'...)
// d. Causes a syntax+fatal error, unexpected T_BAD_CHARACTER...
echo preg_replace( $pattern, "\$v1 . '$1' . \$v2", $text)," Enclosed in double/single quotes. (Wrong! Extra slashes.)<br>\n";
echo preg_replace( $pattern, "\$v1 . '$1' . \$v2", $text)," Enclosed in double/single quotes, $ escaped. (Wrong! Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . \'$1\' . $v2', $text)," Enclosed in single/single quotes. (Wrong! Extra slashes.)<br>\n";
echo preg_replace( $pattern, '$v1 . "$1" . $v2', $text)," Enclosed in single quotes. (Wrong! Dollar sign in text is interpreted as variable interpolation.)<br>\n";
?>
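For comparison, here is a minimal sketch of the same kind of wrapping done with preg_replace_callback(), which hands the match to a PHP function as a plain string, so no quote escaping is added and nothing in the match is parsed as code. The sample text and the closure are assumptions for illustration (closures need PHP 5.3 or later); this is not the approach of the comment above, which targets /e.

```php
<?php
$v1 = '(';
$v2 = ')';

// Hypothetical text containing quotes and a variable-like string.
$text = '<xxx>He said: "You\'re $here"</xxx>';

$result = preg_replace_callback(
    '/<xxx>(.*?)<\/xxx>/',
    function ($m) use ($v1, $v2) {
        // $m[1] is the captured text exactly as it appears in $text.
        return $v1 . $m[1] . $v2;
    },
    $text
);

echo $result, "\n"; // (He said: "You're $here")
```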
16-May-2006 05:24
See as well the excellent tutorial at http://www.tote-taste.de/X-Project/regex/index.php
;-) Klemens
21-Apr-2006 08:15
For those of you who have ever had the problem where clients paste text from MS Word into a CMS, where Word has placed all those fancy quotes throughout the text, breaking the XHTML validator... I have created a nice regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.
Note that most user examples on php.net I have read only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.
It took me ages to get it down to just two regular expressions, but it handles all high-level characters properly.
$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
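As a sketch of the same arithmetic without the /e modifier (removed in PHP 7.0), the two passes can be written with preg_replace_callback(). The function name is hypothetical, and the code assumes the input is valid UTF-8 restricted to two- and three-byte sequences, as in the comment above.

```php
<?php
// Hypothetical helper; assumes valid UTF-8 input.
function utf8_to_numeric_entities($text)
{
    // Two-byte sequences: 110xxxxx 10xxxxxx
    $text = preg_replace_callback(
        '/[\xc0-\xdf][\x80-\xbf]/',
        function ($m) {
            $code = (ord($m[0][0]) - 192) * 64 + (ord($m[0][1]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );
    // Three-byte sequences: 1110xxxx 10xxxxxx 10xxxxxx
    return preg_replace_callback(
        '/[\xe0-\xef][\x80-\xbf]{2}/',
        function ($m) {
            $code = (ord($m[0][0]) - 224) * 4096
                  + (ord($m[0][1]) - 128) * 64
                  + (ord($m[0][2]) - 128);
            return '&#' . $code . ';';
        },
        $text
    );
}

// "\xE2\x80\x99" is a right single quotation mark, "\xC3\xA9" is e-acute.
echo utf8_to_numeric_entities("It\xE2\x80\x99s caf\xC3\xA9"), "\n"; // It&#8217;s caf&#233;
```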
20-Apr-2006 11:37
I just wanted to give an example for people who have the problem that their match is taking away too much of the string.
I wanted a function that strips only certain parameters out of an HTTP query string, and the pattern had to be flexible, e.g. 'updateItem=1' should be replaced as well as 'updateCategory=1', but I sometimes ended up with too much of the query replaced.
Example:
my query string: 'updateItem=1&itemID=14'
ended up as a query string like this: '4', which was not really the plan ;)
I was using this regexp:
preg_replace('/&?update.*=1&?/','',$query_string);
I discovered that preg_replace matches the longest possible string here, because quantifiers such as .* are greedy by default, which means that it replaces everything from the first u up to the 1 after itemID=
I had assumed that it would take the shortest possible match.
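A minimal sketch of the difference, using the query string from the comment above. Adding ? after the quantifier makes it lazy, so it stops at the first "=1" instead of the last one; the variable names are chosen here for illustration.

```php
<?php
$query_string = 'updateItem=1&itemID=14';

// Greedy: .* runs to the end of the string and backtracks to the LAST "=1".
$greedy = preg_replace('/&?update.*=1&?/', '', $query_string);

// Lazy: .*? grows one character at a time and stops at the FIRST "=1".
$lazy = preg_replace('/&?update.*?=1&?/', '', $query_string);

echo $greedy, "\n"; // 4
echo $lazy, "\n";   // itemID=14
```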
19-Apr-2006 05:08
for those of you with multiline woes like I was having, try:
$str = preg_replace('/<tag[^>](.*)>(.*)<\/tag>/ims','<!-- edited -->', $str);
10-Apr-2006 02:54
Recently I needed a way to replace links (<a href="blah.com/blah.php">Blah</a>) with their anchor text, in this case Blah. It might seem simple enough to some, or most, but in the interest of helping others:
<?php
$value = '<a href="http://www.domain.com/123.html">123</a>';
echo preg_replace('/<a href="(.*?)">(.*?)<\\/a>/i', '$2', $value);
//Output
// 123
?>
08-Apr-2006 04:13
If you have a form element that displays amounts using "$" and ",", you can strip those characters before posting the value to the db with the following:
$search = array('/,/','/\$/');
$replace = array('','');
$data['amount_limit'] = preg_replace($search,$replace,$data['amount_limit']);
06-Apr-2006 01:21
I found some situations in which my function below doesn't
perform as expected. Here is the new version.
<?php
function make_links_blank( $text )
{
return preg_replace(
array(
'/(?(?=<a[^>]*>.+<\/a>)
(?:<a[^>]*>.+<\/a>)
|
([^="\'])((?:https?|ftp):\/\/[^<> \n\r]+)
)/iex',
'/<a([^>]*)target="?[^"\']+"?/i',
'/<a([^>]+)>/i'
),
array(
"stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\">\\2</a>\\3':'\\0'))",
'<a\\1',
'<a\\1 target="_blank">'
),
$text
);
}
?>
This function replaces links (http(s)://, ftp://) with respective html anchor tag, and also makes all anchors open in a new window.
28-Mar-2006 11:40
Something innovative for a change ;-) For a news system, I have a special format for links:
"Go to the [Blender3D Homepage|http://www.blender3d.org] for more Details"
To get this into a link, use:
$new = preg_replace('/\[(.*?)\|(.*?)\]/', '<a href="$2" target="_blank">$1</a>', $new);
18-Mar-2006 06:35
In response to elaineseery at hotmail dot com
[quote]if you're new to this function, and getting an error like 'delimiter must not alphanumeric backslash ...[/quote]
Note that every pattern needs delimiters, whether it is passed as a single string or as an element of an array. If you use arrays for search and replace, each element of the search array must be quoted with / (or another valid delimiter) or you will get this error.
16-Mar-2006 06:46
I said there was a better way. There is!
The regexp is essentially the same but now I deal with problems that it couldn't handle, such as urls, which tended to screw things up, and the odd placement of a : or ; in the body text, by using functions. This makes it easier to expand to take account of all the things I know I've not taken account of. But here it is in its essential glory. Or mediocrity. Take your pick.
<?php
define('PARSER_ALLOWED_STYLES_',
    'text-align,font-family,font-size,text-decoration');
/* Callback for preg_replace_callback(). Declared at top level, because a function
 * declared inside strip_styles() would be redeclared (a fatal error) on the second call.
 */
function Replacer($text) {
    $check = array (
        '@:@s',
    );
    $replace = array(
        '&#58;',
    );
    return preg_replace($check, $replace, $text[0]);
}
function strip_styles($source=NULL) {
    $exceptions = str_replace(',', '|', @constant('PARSER_ALLOWED_STYLES_'));
    /* First we want to fix anything that might potentially break the style stripper, so we try to replace
     * in-text instances of : with its HTML entity replacement.
     */
    $source = preg_replace_callback('@>(.*)<@Us', 'Replacer', $source);
    $regexp =
        '@([^;"]+)?(?<!'. $exceptions. ')(?<!\>\w):(?!\/\/(.+?)\/|<|>)((.*?)[^;"]+)(;)?@is';
    $source = preg_replace($regexp, '', $source);
    $source = preg_replace('@[a-z]*=""@is', '', $source);
    return $source;
}
?>
16-Mar-2006 05:33
A "Document contains no data" message in FF and 'This page could not be found' in IE occur when you pass too long a subject string to preg_replace() with the default limit.
Increase the limit to be sure it is larger than the subject length.
16-Mar-2006 06:50
Here is a function that replaces the links (http(s)://, ftp://) with respective html anchor, and also makes all anchors open in a new window.
function make_links_blank( $text )
{
return preg_replace( array(
"/[^\"'=]((http|ftp|https):\/\/[^\s\"']+)/i",
"/<a([^>]*)target=\"?[^\"']+\"?/i",
"/<a([^>]+)>/i"
),
array(
"<a href=\"\\1\">\\1</a>",
"<a\\1",
"<a\\1 target=\"_blank\" >"
),
$text
);
}
13-Mar-2006 01:02
Sorry, I don't know English.
This replaces every letter of each bad word with a given character.
See the example:
<?php
function censured($string, $aBadWords, $sChrReplace) {
foreach ($aBadWords as $key => $word) {
// Regexp for case-insensitive and use the functions
$aBadWords[$key] = "/({$word})/ie";
}
// to substitue badwords for definite character
return preg_replace($aBadWords,
"str_repeat('{$sChrReplace}', strlen('\\1'))",
$string
);
}
// To show modifications
print censured('The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
'*'
);
?>
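A possible variant without the /e modifier (removed in PHP 7.0) is sketched below. The function name is hypothetical; unlike the version above it also runs each word through preg_quote(), so words containing regex metacharacters are matched literally.

```php
<?php
// Hypothetical rewrite of the idea above using preg_replace_callback().
function censor($string, array $badWords, $replaceChar)
{
    if (count($badWords) === 0) {
        return $string; // nothing to censor
    }
    // Escape each word so that it is matched literally, then join with "|".
    $alternatives = array_map(
        function ($word) {
            return preg_quote($word, '/');
        },
        $badWords
    );
    $pattern = '/' . implode('|', $alternatives) . '/i';

    return preg_replace_callback(
        $pattern,
        function ($m) use ($replaceChar) {
            // One replacement character per byte of the matched word.
            return str_repeat($replaceChar, strlen($m[0]));
        },
        $string
    );
}

echo censor(
    'The nick of my friends are rand, v1d4l0k4, P7rk, ferows.',
    array('RAND', 'V1D4L0K4', 'P7RK', 'FEROWS'),
    '*'
), "\n";
// The nick of my friends are ****, ********, ****, ******.
```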
07-Mar-2006 05:32
Inspired by the query-string cleaner from greenthumb at 4point-webdesign dot com and istvan dot csiszar at weblab dot hu. This little bit of code cleans up any "style" attributes in your tags, leaving behind only styles that you have specifically allowed. Also conveniently strips out nonsense styles. I've not fully tested it yet so I'm not sure if it'll handle features like url(), but that shouldn't be a difficulty.
<?php
/* The string would normally be a form-submitted html file or text string */
$string = '<span style="font-family:arial; font-size:20pt; text-decoration:underline; sausage:bueberry;" width="200">Hello there</span> This is some <div style="display:inline;">test text</div>';
/* Array of styles to allow. */
$except = array('font-family', 'text-decoration');
$allow = implode('|', $except);
/* The monster beast regexp. I was up all night trying to figure this one out. */
$regexp = '@([^;"]+)?(?<!'.$allow.'):(?!\/\/(.+?)\/)((.*?)[^;"]+)(;)?@is';
print str_replace('<', '&lt;', $regexp).'<br/><br/>';
$out = preg_replace($regexp, '', $string);
/* Now let's get rid of any unwanted empty style attributes */
$out = preg_replace('@[a-z]*=""@is', '', $out);
print $out;
?>
This should produce the following:
<span style="font-family:arial; text-decoration:underline;" width="200">Hello there</span> This is some <div >test text</div>
Now, I'm a relative newbie at this so I'm sure there's a better way to do it. There's *always* a better way.
15-Feb-2006 10:44
if you're new to this function, and getting an error like
'delimiter must not alphanumeric backslash ...
note that whatever is in $pattern (and only $pattern, not $string, or $replacement) must be enclosed by '/ /' (note the forward slashes)
e.g.
$pattern = '/and/';
$replacement = 'sandy';
$string = 'me and mine';
generates 'me sandy mine'
seems to be obvious to everyone else, but took me a while to figure out!!
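Put together as a runnable snippet, the values above give the stated result. The preg_replace() call itself is not shown in the comment, so it is filled in here as the obvious assumption.

```php
<?php
$pattern     = '/and/';   // the regex "and" wrapped in / delimiters
$replacement = 'sandy';   // plain string, no delimiters
$string      = 'me and mine';

$result = preg_replace($pattern, $replacement, $string);
echo $result, "\n"; // me sandy mine
```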
08-Feb-2006 01:23
If the lack of &$count is aggravating in PHP 4.x, try this:
$replaces = 0;
$return .= preg_replace('/(\b' . $substr . ')/ie', '"<$tag>$1<$end_tag>" . (substr($replaces++,0,0))', $s2, $limit);
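For reference, from PHP 5.1.0 on preg_replace() accepts a fifth, by-reference count argument, so the workaround is only needed on older versions. A small sketch with made-up input:

```php
<?php
$count = 0;
// -1 means "no limit"; $count receives the number of replacements made.
$result = preg_replace('/a/', 'o', 'banana', -1, $count);

echo $result, ' (', $count, " replacements)\n"; // bonono (3 replacements)
```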
05-Feb-2006 04:21
Decodes the result of IE's escape():
<?php
function unicode_unescape(&$var, $convert_to_cp1251 = false){
$var = preg_replace(
'#%u([\da-fA-F]{4})#mse',
$convert_to_cp1251 ? '@iconv("utf-16","windows-1251",pack("H*","\1"))' : 'pack("H*","\1")',
$var
);
}
//
$str = 'to %u043B%u043E%u043F%u0430%u0442%u0430 or not to %u043B%u043E%u043F%u0430%u0442%u0430';
unicode_unescape($str, true);
echo $str;
?>
05-Feb-2006 01:40
I've found a really odd error.
When I try to use the 'empty' function in the replacement string (when using the 'e' modifier, of course), the regexp interpreter gets stuck at that point.
An example of this failure:
<?php
echo $test = preg_replace( "/(bla)/e", "empty(123)", "bla bla ble" );
# it should print something like:
# "1 1 ble"
?>
Very odd, huh?
A fairly useful script to replace named HTML entities with ordinal-value entities. Useful for writing to XML documents where the entities aren't defined.
<?php
$p='#(\&[\w]+;)#e';
$r="'&#'.ord(html_entity_decode('$1')).';'";
$text=preg_replace($p,$r,$_POST['data']);
?>
03-Feb-2006 03:51
Following up on pietjeprik at gmail dot com's great string to parse [url] bbcode:
<?php
$url = '[url=http://www.foo.org]The link[/url]';
$text = preg_replace("/\[url=(\W?)(.*?)(\W?)\](.*?)\[\/url\]/", '<a href="$2">$4</a>', $url);
?>
This allows for the user to enter variations:
[url=http://www.foo.org]The link[/url]
[url="http://www.foo.org"]The link[/url]
[url='http://www.foo.org']The link[/url]
or even
[url=#http://www.foo.org#]The link[/url]
[url=!http://www.foo.org!]The link[/url]
Uh-oh. When I looked at the text in the preview, I had to double the number of backslashes to make it look right.
I'll try again with my original text:
$full_text = preg_replace('/\[p=(\d+)\]/e',
"\"<a href=\\\"./test.php?person=$1\\\">\"
.get_name($1).\"</a>\"",
$short_text);
I hope that it comes out correctly this time :-)
01-Feb-2006 12:24
I've found a use for preg_replace. If you've got e.g. a database with persons associated with numbers, you may want to input links in a kind of shorthand, like [p=12345], and have it expanded to a full URL with a name in it.
This is my solution:
$expanded_text = preg_replace('/\\[p=(\d+)\\]/e',
"\\"<a href=\\\\\\"./test.php?person=$1\\\\\\">\\".get_name($1).\\"</a&>\\"",
$short_text);
It took me some time to work out the proper number of quotes and backslashes.
regards, Leif.
20-Jan-2006 08:43
Re: wcc at techmonkeys dot org
You could put this in 1 replace for faster execution as well:
<?php
/*
* Removes all blank lines from a string.
*/
function removeEmptyLines($string)
{
return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>
First, I know next to nothing about regexps; everything I did here was through trial and error.
I wrote this function, which tries to clean crappy MS Word HTML. I use it to clean code that users paste from MS Word into online WYSIWYG editors.
There's huge room for improvement. I post it here because after searching I could not find any pure PHP solution. The best alternative is tidy, but for those of us who are still using PHP 4 and do not have access to the server, this could be an option. Use it at your own risk... Once again, it was a quickie and I know there can be much better ways to do this:
function decraper($htm, $delstyles=false) {
$commoncrap = array('&quot;'
,'font-weight: normal;'
,'font-style: normal;'
,'line-height: normal;'
,'font-size-adjust: none;'
,'font-stretch: normal;');
$replace = array("'");
$htm = str_replace($commoncrap, $replace, $htm);
$pat = array();
$rep = array();
$pat[0] = '/(<table\s.*)(width=)(\d+%)(\D)/i';
$pat[1] = '/(<td\s.*)(width=)(\d+%)(\D)/i';
$pat[2] = '/(<th\s.*)(width=)(\d+%)(\D)/i';
$pat[3] = '/<td( colspan="[0-9]+")?( rowspan="[0-9]+")?( width="[0-9]+")?( height="[0-9]+")?.*?>/i';
$pat[4] = '/<tr.*?>/i';
$pat[5] = '/<\/st1:address>(<\/st1:\w*>)?<\/p>[\n\r\s]*<p[\s\w="\']*>/i';
$pat[6] = '/<o:p.*?>/i';
$pat[7] = '/<\/o:p>/i';
$pat[8] = '/<o:SmartTagType[^>]*>/i';
$pat[9] = '/<st1:[\w\s"=]*>/i';
$pat[10] = '/<\/st1:\w*>/i';
$pat[11] = '/<p[^>]*>(.*?)<\/p>/i';
$pat[12] = '/ style="margin-top: 0cm;"/i';
$pat[13] = '/<(\w[^>]*) class=([^ |>]*)([^>]*)/i';
$pat[14] = '/<ul(.*?)>/i';
$pat[15] = '/<ol(.*?)>/i';
$pat[17] = '/<br \/> <br \/>/i';
$pat[18] = '/ <br \/>/i';
$pat[19] = '/<!-.*?>/';
$pat[20] = '/\s*style=(""|\'\')/';
$pat[21] = '/ style=[\'"]tab-interval:[^\'"]*[\'"]/i';
$pat[22] = '/behavior:[^;\'"]*;*(\n|\r)*/i';
$pat[23] = '/mso-[^:]*:"[^"]*";/i';
$pat[24] = '/mso-[^;\'"]*;*(\n|\r)*/i';
$pat[25] = '/\s*font-family:[^;"]*;?/i';
$pat[26] = '/margin[^"\';]*;?/i';
$pat[27] = '/text-indent[^"\';]*;?/i';
$pat[28] = '/tab-stops:[^\'";]*;?/i';
$pat[29] = '/border-color: *([^;\'"]*)/i';
$pat[30] = '/border-collapse: *([^;\'"]*)/i';
$pat[31] = '/page-break-before: *([^;\'"]*)/i';
$pat[32] = '/font-variant: *([^;\'"]*)/i';
$pat[33] = '/<span [^>]*><br \/><\/span><br \/>/i';
$pat[34] = '/" "/';
$pat[35] = '/[\t\r\n]/';
$pat[36] = '/\s\s/s';
$pat[37] = '/ style=""/';
$pat[38] = '/<span>(.*?)<\/span>/i';
//empty (no attribs) spans
$pat[39] = '/<span>(.*?)<\/span>/i';
//twice, nested spans
$pat[40] = '/(;\s|\s;)/';
$pat[41] = '/;;/';
$pat[42] = '/";/';
$pat[43] = '/<li(.*?)>/i';
$pat[44] = '/(<\/b><b>|<\/i><i>|<\/em><em>|<\/u><u>|<\/strong><strong>)/i';
$rep[0] = '$1$2"$3"$4';
$rep[1] = '$1$2"$3"$4';
$rep[2] = '$1$2"$3"$4';
$rep[3] = '<td$1$2$3$4>';
$rep[4] = '<tr>';
$rep[5] = '<br />';
$rep[6] = '';
$rep[7] = '<br />';
$rep[8] = '';
$rep[9] = '';
$rep[10] = '';
$rep[11] = '$1<br />';
$rep[12] = '';
$rep[13] = '<$1$3';
$rep[14] = '<ul>';
$rep[15] = '<ol>';
$rep[17] = '<br />';
$rep[18] = '<br />';
$rep[19] = '';
$rep[20] = '';
$rep[21] = '';
$rep[22] = '';
$rep[23] = '';
$rep[24] = '';
$rep[25] = '';
$rep[26] = '';
$rep[27] = '';
$rep[28] = '';
$rep[29] = '';
$rep[30] = '';
$rep[31] = '';
$rep[32] = '';
$rep[33] = '<br />';
$rep[34] = '""';
$rep[35] = '';
$rep[36] = '';
$rep[37] = '';
$rep[38] = '$1';
$rep[39] = '$1';
$rep[40] = ';';
$rep[41] = ';';
$rep[42] = '"';
$rep[43] = '<li>';
$rep[44] = '';
if($delstyles===true){
$pat[50] = '/ style=".*?"/';
$rep[50] = '';
}
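ksort($pat);
ksort($rep);
// Apply the pattern/replacement pairs; without this call the arrays above are never used.
$htm = preg_replace($pat, $rep, $htm);
return $htm;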
}
Hope it helps, critics are more than welcome.
23-Dec-2005 04:08
Here is a regular expression to "slashdotify" html links. This has worked well for me, but if anyone spots errors, feel free to make corrections.
<?php
$url = '<a attr="garbage" href="http://us3.php.net/preg_replace">preg_replace - php.net</a>';
$url = preg_replace( '/<.*href="?(.*:\/\/)?([^ \/]*)([^ >"]*)"?[^>]*>(.*)(<\/a>)/', '<a href="$1$2$3">$4</a> [$2]', $url );
?>
Will output:
<a href="http://us3.php.net/preg_replace">preg_replace - php.net</a> [us3.php.net]
21-Dec-2005 05:53
This is an addition to the previously sent removeEvilTags function. If you don't want to remove the style tag entirely, just certain style attributes within that, then you might find this piece of code useful:
<?php
function removeEvilStyles($tagSource)
{
// this will leave everything else, but:
$evilStyles = array('font', 'font-family', 'font-face', 'font-size', 'font-size-adjust', 'font-stretch', 'font-variant');
$find = array();
$replace = array();
foreach ($evilStyles as $v)
{
$find[] = "/$v:.*?;/";
$replace[] = '';
}
return preg_replace($find, $replace, $tagSource);
}
function removeEvilTags($source)
{
$allowedTags = '<h1><h2><h3><h4><h5><a><img><label>'.
'<p><br><span><sup><sub><ul><li><ol>'.
'<table><tr><td><th><tbody><div><hr><em><b><i>';
$source = strip_tags(stripslashes($source), $allowedTags);
return trim(preg_replace('/<(.*?)>/ie', "'<'.removeEvilStyles('\\1').'>'", $source));
}
?>
18-Dec-2005 01:13
to remove Bulletin Board Code (remove bbcode)
$body = preg_replace("[\[(.*?)\]]", "", $body);
09-Dec-2005 04:16
Escaping quotes may be very tricky. Magic quotes and preg_quote are not protected against double escaping. This means that an escaped quote will get a double backslash, or even more. preg_quote ("I\'m using regex") will return "I\\'m using regex".
The following example escapes only unescaped single quotes:
<?php
$a = "I'm using regex";
$b = "I\'m using regex";
$patt = "/(?<!\\\)\'/";
$repl = "\\'";
print "a: ".preg_replace ($patt, $repl, $a)."\n";
print "b: ".preg_replace ($patt, $repl, $b)."\n";
?>
and prints:
a: I\'m using regex
b: I\'m using regex
Remark: matching a backslash requires a triple backslash (\\\).
16-Aug-2005 04:00
Here are two functions to trim a string down to a certain size.
"wordLimit" trims a string down to a certain number of words, and adds an ellipsis after the last word, or returns the string if the limit is larger than the number of words in the string.
"stringLimit" trims a string down to a certain number of characters, and adds an ellipsis after the last word, without truncating any words in the middle (it will instead leave it out), or returns the string if the limit is larger than the string size. The length of a string will INCLUDE the length of the ellipsis.
<?php
function wordLimit($string, $length = 50, $ellipsis = '...') {
return count($words = preg_split('/\s+/', ltrim($string), $length + 1)) > $length ?
rtrim(substr($string, 0, strlen($string) - strlen(end($words)))) . $ellipsis :
$string;
}
function stringLimit($string, $length = 50, $ellipsis = '...') {
// Strings that already fit within the limit are returned unchanged.
if (strlen($string) <= $length) return $string;
$fragment = substr($string, 0, $length + 1 - strlen($ellipsis));
return preg_replace('/\s*\S*$/', '', $fragment) . $ellipsis;
}
echo wordLimit(' You can limit a string to only so many words.', 6);
// Output: "You can limit a string to..."
echo stringLimit('Or you can limit a string to a certain amount of characters.', 32);
// Output: "Or you can limit a string to..."
?>
25-Apr-2005 03:04
Just a note for all FreeBSD users wondering why this function is not present after installing php / mod_php (4 and 5) from ports.
Remember to install:
/usr/ports/devel/php4-pcre (or 5 for -- 5 ;)
That's all... enjoy - and save 30 mins. like I could have used :D
19-Feb-2005 06:04
It took me a while to figure this one out, but here is a nice way to use preg_replace to convert a hex encoded string back to clear text
<?php
$text = "PHP rocks!";
$encoded = preg_replace(
"'(.)'e"
,"dechex(ord('\\1'))"
,$text
);
print "ENCODED: $encoded\n";
?>
ENCODED: 50485020726f636b7321
<?php
print "DECODED: ".preg_replace(
"'([\S,\d]{2})'e"
,"chr(hexdec('\\1'))"
,$encoded)."\n";
?>
DECODED: PHP rocks!
15-Feb-2005 01:56
on the topic of implementing forum code ([b][/b] to <b></b> etc), i found this worked well...
<?php
$body = preg_replace('/\[([biu])\]/i', '<\\1>', $body);
$body = preg_replace('/\[\/([biu])\]/i', '</\\1>', $body);
?>
First line replaces [b] [B] [i] [I] [u] [U] with the appropriate html tags(<b>, <i>, <u>)
Second one does the same for closing tags...
For urls, I use...
<?php
$body = preg_replace('/\s(\w+:\/\/)(\S+)/', ' <a href="\\1\\2" target="_blank">\\1\\2</a>', $body);
?>
and for urls starting with www., i use...
<?php
$body = preg_replace('/\s(www\.)(\S+)/', ' <a href="http://\\1\\2" target="_blank">\\1\\2</a>', $body);
?>
Pop all these lines into a function that receives and returns the text you want 'forum coded' and away you go:)
30-Jan-2005 08:25
A better way for link & email conversion, I think. :)
<?php
function change_string($str)
{
$str = trim($str);
$str = htmlspecialchars($str);
$str = preg_replace('#(.*)\@(.*)\.(.*)#','<a href="mailto:\\1@\\2.\\3">Send email</a>',$str);
$str = preg_replace('=([^\s]*)(www.)([^\s]*)=','<a href="http://\\2\\3" target=\'_new\'>\\2\\3</a>',$str);
return $str;
}
?>
26-Jan-2005 12:28
Note that if you want to replace all backslashes in a string with double backslashes (like addslashes() does, but just for backslashes and not quotes, etc.), you'll need the following:
$new = preg_replace('/\\\\/','\\\\\\\\',$old);
Note that the pattern uses 4 backslashes and the replacement uses 8! The reason for 4 slashes in the pattern part has already been explained on this page, but nobody has yet mentioned the need for the same logic in the replacement part, in which backslashes are also doubly parsed, once by PHP and once by the PCRE extension. So the eight slashes break down to four slashes sent to PCRE, then two slashes put in the final output.
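A quick way to check the 4-versus-8 count is to run it on a small string. The sample path below is an assumption for illustration.

```php
<?php
// In PHP source, '\\' inside quotes is ONE backslash, so $old is C:\temp\file.txt
$old = 'C:\\temp\\file.txt';

// Pattern: PHP turns 4 backslashes into 2, which PCRE reads as one literal backslash.
// Replacement: PHP turns 8 into 4, which PCRE reads as two literal backslashes.
$new = preg_replace('/\\\\/', '\\\\\\\\', $old);

echo $old, "\n"; // C:\temp\file.txt
echo $new, "\n"; // C:\\temp\\file.txt
```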
21-Jan-2005 07:05
Here is a more secure version of the link conversion code, which hopefully makes cross-site scripting attacks more difficult.
<?php
function convert_links($str) {
$replace = <<<EOPHP
'<a href="'.htmlentities('\\1').htmlentities('\\2').//remove line break
'">'.htmlentities('\\1').htmlentities('\\2').'</a>'
EOPHP;
$str = preg_replace('#(http://)([^\s]*)#e', $replace, $str);
return $str;
}
?>
22-Oct-2004 04:22
I needed to shorten only long URLs and not shorter ones, for which my client preferred to have the complete addresses displayed. Here's the function I ended up with:
<?php
function auto_url($txt){
# (1) catch those with url larger than 71 characters
$pat = '/(http|ftp)+(?:s)?:(\\/\\/)'
.'((\\w|\\.)+)(\\/)?(\\S){71,}/i';
$txt = preg_replace($pat, "<a href=\"\\0\" target=\"_blank\">$1$2$3/...</a>",
$txt);
# (2) replace the other short urls provided that they are not contained inside an html tag already.
$pat = '/(?<!href=\")(http|ftp)+(s)?:'
.'(\\/\\/)((\\w|\\.)+) (\\/)?(\\S)/i';
$txt = preg_replace($pat,"<a href=\"$0\" target=\"_blank\">$0</a> ",
$txt);
return $txt;
}
?>
Note the negative lookbehind expression added in the second pattern to exempt URLs that are preceded by ' href=" ' (meaning that they were already put inside appropriate HTML tags by the previous expression).
(get rid of the space between question mark and the last parenthesis group in both regex, I need to put it like that to be able to post this comment)
19-Oct-2004 04:39
It is useful to note that the 'limit' parameter, when used with 'pattern' and 'replace' which are arrays, applies to each individual pattern in the patterns array, and not the entire array.
<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";
echo preg_replace($pattern, $replace, $subject, 1);
?>
If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three
However, in reality this will actually return:
test uno, one dos, one two three
19-Mar-2004 10:00
Using preg_replace to return extracts without breaking words in the middle
(useful for search results)
<?php
$string = "Don't split words";
echo substr($string, 0, 10); // Returns "Don't spli"
$pattern = "/(^.{0,10})(\W+.*$)/";
$replacement = "\${1}";
echo preg_replace($pattern, $replacement, $string); // Returns "Don't"
?>
25-Feb-2004 05:02
I noticed that a lot of talk here is about parsing URLs. Try the
parse_url() function in PHP to make things easier.
http://www.php.net/manual/en/function.parse-url.php
- J.
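A short sketch of what parse_url() returns; the URL is made up for illustration.

```php
<?php
// Hypothetical URL.
$parts = parse_url('http://www.example.com:8080/manual/en/index.php?lang=en#top');

// parse_url() returns an associative array of the components it found.
echo $parts['scheme'], "\n";   // http
echo $parts['host'], "\n";     // www.example.com
echo $parts['port'], "\n";     // 8080
echo $parts['path'], "\n";     // /manual/en/index.php
echo $parts['query'], "\n";    // lang=en
echo $parts['fragment'], "\n"; // top
```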
09-Feb-2004 01:45
People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.
The example in the docs for /e suffers from this mistake in fact.
With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes, or otherwise it would be interpreted as PHP code. Like the example from the manual for preg_replace:
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string
He said: "You're here"
It would become:
He said: \"You\'re here\"
...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:
print ' He said: \"You\'re here\" ';
Output: He said: \"You're here\"
This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.
Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.
The 'solution' is to manually fix it in your expression. It is easiest to use a separate processing function, and do the replacing there (i.e. use "my_processing_function('\\1')" or something similar as replacement expression, and do the fixing in that function).
If you surrounded your backreference by single-quotes, the double-quotes are corrupt:
$text = str_replace('\"', '"', $text);
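The single-quote behaviour described above can be checked without /e at all, since it is ordinary PHP string parsing. A minimal sketch (variable names are chosen here for illustration):

```php
<?php
// Inside single quotes PHP only unescapes \' and \\ ; the sequence \" is left as two characters.
$single = ' He said: \"You\'re here\" ';
echo $single, "\n"; //  He said: \"You're here\"

// The manual fix from the comment above: strip the backslash left in front of each double quote.
$fixed = str_replace('\"', '"', $single);
echo $fixed, "\n";  //  He said: "You're here"
```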
People using preg_replace with /e should at least be aware of this.
I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
02-Nov-2003 09:00
Suppose you want to match '\n' (that's backslash-n, not newline). The pattern you want is not /\\n/ but /\\\\n/. The reason for this is that before the regex engine can interpret the \\ into \, PHP interprets it. Thus, if you write the first, the regex engine sees \n, which it reads as newline. Thus, you have to escape your backslashes twice: once for PHP, and once for the regex engine.
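A small sketch that shows both patterns against a subject containing a literal backslash followed by n (and no newline). The subject string and variable names are assumptions for illustration.

```php
<?php
// Single quotes: \n is NOT a newline here, it is a backslash followed by the letter n.
$subject = 'line one\nline two';

// PHP turns '\\n' into \n, which the regex engine reads as "newline": no match.
$matchesNewline = preg_match('/\\n/', $subject);

// PHP turns '\\\\n' into \\n, which the regex engine reads as "backslash, then n": match.
$matchesLiteral = preg_match('/\\\\n/', $subject);

echo $matchesNewline, ' ', $matchesLiteral, "\n"; // 0 1
```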
18-Oct-2003 06:37
I spent some time fighting with this, so hopefully this will help someone else.
Escaping a backslash (\) really involves not two, not three, but four backslashes to work properly.
So to match a single backslash, one should use:
preg_replace('/(\\\\)/', ...);
or to, say, escape single quotes not already escaped, one could write:
preg_replace("/([^\\\\])'/", "\$1\'", ...);
Anything else, such as the seemingly correct
preg_replace("/([^\\])'/", "\$1\'", ...);
gets evaluated as escaping the ] and resulting in an unterminated character class.
I'm not exactly clear on this issue of backslash proliferation, but it seems to involve the combination of PHP string processing and PCRE processing.