LYCOS RETRIEVER
Regular Expression
Even these simple examples testify to the power of regular expressions. In the first instance, you've copied all the files that end in ".html" (rather than copying them one by one); in the second, you've conducted a search not only for "
Source:
By now you've probably noticed that regular expressions are a very compact notation, but they're not terribly readable. REs of moderate complexity can become lengthy collections of backslashes, parentheses, and metacharacters, making them difficult to read and understand.
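One common way to ease the readability problem is a "verbose" mode that lets a pattern be spread over several lines with comments. The sketch below uses Python's `re.VERBOSE` flag; the date pattern and the sample string are made up for illustration and do not come from the excerpt.

```python
import re

# Compact form: matches a date such as 2024-01-31 (hypothetical example pattern).
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Same pattern with re.VERBOSE: whitespace in the pattern is ignored
# and everything after '#' on a line is treated as a comment.
verbose = re.compile(r"""
    (\d{4})   # year
    -         # separator
    (\d{2})   # month
    -         # separator
    (\d{2})   # day
""", re.VERBOSE)

sample = "2024-01-31"
compact_groups = compact.fullmatch(sample).groups()
verbose_groups = verbose.fullmatch(sample).groups()
print(compact_groups == verbose_groups)
```

Both forms compile to the same matcher; the verbose one simply trades compactness for comments a maintainer can read.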
Source:
The formal definition of regular expressions is purposely parsimonious and avoids defining the redundant quantifiers ? and +, which can be expressed as follows: a+ = aa*, and a? = (a|ε). Sometimes the complement operator ~ is added; ~R denotes the set of all strings in Σ* that are not in R. The complement operator is redundant, as it can always be expressed by using the other operators (although the process for computing such a representation is complex, and the result may be exponentially larger).
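The two identities can be spot-checked with a small script. This is only an illustration on a finite, made-up sample of strings over {a, b}, not a proof; the helper `language` is hypothetical, and the empty alternative in `(a|)` stands in for ε, which Python's `re` module has no symbol for.

```python
import re

# Hypothetical sample strings over the alphabet {a, b}.
samples = ["", "a", "aa", "aaa", "b", "ab", "ba"]

def language(pattern, strings):
    """Return the subset of `strings` that `pattern` matches in full."""
    return {s for s in strings if re.fullmatch(pattern, s)}

# a+ accepts the same sample strings as aa*
plus_equivalent = language(r"a+", samples) == language(r"aa*", samples)

# a? accepts the same sample strings as (a|ε), written here as (a|)
optional_equivalent = language(r"a?", samples) == language(r"(a|)", samples)

# Complement of a*, restricted to the sample: every string a* rejects
complement = set(samples) - language(r"a*", samples)

print(plus_equivalent, optional_equivalent, sorted(complement))
```

Here the complement is computed by set difference over the sample rather than by building a complement expression, which sidesteps the potentially exponential construction mentioned above.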
Source:
The regular expression below is one way to effect this. Assume the HTML code for your hyperlinks is in a variable "$links", and "$file" is the name of the current webpage (such as "index.html" or "all.html").
Source:
One of the most important parts of understanding regular expressions is knowing about the "as few as possible" quantifiers. Many times, your regular expression will "overshoot" the text you are trying to capture. For example, let's say you want to capture all text inside the parentheses if you have a string:
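A minimal sketch of the overshoot in Python, using a made-up string rather than the excerpt's own example. The greedy `.*` runs to the last closing parenthesis, while the "as few as possible" form `.*?` stops at the first one.

```python
import re

# Hypothetical input, chosen to contain two parenthesized groups.
text = "call f(a) then g(b)"

# Greedy: .* takes as much as possible, so it overshoots to the final ')'.
greedy = re.search(r"\((.*)\)", text).group(1)

# Lazy: .*? takes as little as possible, so it stops at the first ')'.
lazy = re.search(r"\((.*?)\)", text).group(1)

# With the lazy form, findall picks up each group separately.
all_groups = re.findall(r"\((.*?)\)", text)

print(repr(greedy))   # 'a) then g(b'
print(repr(lazy))     # 'a'
print(all_groups)     # ['a', 'b']
```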
Source:
Regular expressions tend to be easier to write than they are to read. This is less of a problem if you are the only one who ever needs to maintain the program (or sed routine, or shell script, or what have you), but if several people need to watch over it, the syntax can turn into more of a hindrance than an aid.
Source: