Tag Archives: Regular expressions

Determining if a Group is in the Path with Regular expressions

Introduction

Note: See this article about working with Regular Expressions.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile a regular expression with a back reference (\1) to group 1
String patternStr = "<(\\S+?).*?>(.*?)</\\1>";
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher("");

// Set the input
matcher.reset("xx <tag a=b> yy </tag> zz");

// Get tagname and contents of tag
boolean matchFound = matcher.find();   // true
String tagname = matcher.group(1);     // tag
String contents = matcher.group(2);    // " yy " (including the surrounding spaces)

matcher.reset("xx <tag> yy </tag0>");
matchFound = matcher.find();           // false
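
The same back reference can be used in a loop to collect every well-formed tag pair in an input. The following is only a sketch: the class name TagFinder and the method tagNames are made up for illustration and are not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class TagFinder {
    // Same pattern as above: \1 must repeat whatever group 1 captured.
    private static final Pattern TAG =
            Pattern.compile("<(\\S+?).*?>(.*?)</\\1>");

    // Returns the names of all tags whose closing tag matches the opening tag.
    static List<String> tagNames(String input) {
        List<String> names = new ArrayList<>();
        Matcher matcher = TAG.matcher(input);
        while (matcher.find()) {
            names.add(matcher.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames("<b>bold</b> and <i>italic</i>")); // [b, i]
        System.out.println(tagNames("xx <tag> yy </tag0>"));           // []
    }
}
```

Because the closing tag has to repeat group 1 exactly, mismatched pairs such as <tag> ... </tag0> are skipped.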

Handling escape characters in Regular expressions

Introduction

The backslash (‘\’) is the escape character in Java string literals: it gives the character that follows it a special meaning. Some examples of its use are:

  • \n – New Line
  • \t – Tab
  • \r – Carriage Return

When you want a literal backslash in a string (for example in a Windows path), you need to escape it with another backslash. For example, the path C:\Program Files needs to be written as C:\\Program Files.
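
To make the difference between what is typed and what the string contains concrete, here is a minimal sketch. The class name PathEscapeDemo and the method programFiles are invented for this illustration.

```java
// Hypothetical demo class; the names are illustrative only.
class PathEscapeDemo {
    // Two backslashes in the source produce a single backslash in the string.
    static String programFiles() {
        return "C:\\Program Files";
    }

    public static void main(String[] args) {
        String path = programFiles();
        System.out.println(path);          // prints C:\Program Files
        System.out.println(path.length()); // prints 16
    }
}
```

The length is 16, not 17, because the escaped pair counts as one character.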

Regular expressions also treat the backslash as a special character. To match a single literal backslash, the regex itself must contain two backslashes (\\), one escaping the other. Each of those two must in turn be escaped for the Java string literal, so in Java source you type it four times: "\\\\".

To replace a backslash in a String with a space:

String str = "Hello\\Regex";                     // the string contains: Hello\Regex
String replaced = str.replaceAll("\\\\", " ");   // the regex \\ matches one backslash
// Test if it worked:
System.out.println(replaced);
// Output: 'Hello Regex'
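
When no pattern matching is needed, the regex layer of escaping can be avoided altogether. The sketch below compares String.replaceAll (regex-based) with String.replace (literal); the class and method names are made up for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class BackslashReplaceDemo {
    // Regex-based: the Java literal "\\\\" is the regex \\, which matches one backslash.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\", " ");
    }

    // Literal: String.replace does not treat its arguments as a regex,
    // so only the Java-level escaping is needed.
    static String withReplace(String s) {
        return s.replace("\\", " ");
    }

    public static void main(String[] args) {
        String str = "Hello\\Regex";
        System.out.println(withReplaceAll(str)); // prints Hello Regex
        System.out.println(withReplace(str));    // prints Hello Regex
    }
}
```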

Some other examples:

import java.util.regex.Pattern;

// Unescaped, each '.' matches any character
String patternStr = "i.e.";

boolean matchFound = Pattern.matches(patternStr, "i.e.");// true
matchFound = Pattern.matches(patternStr, "ibex");        // true

// Quote the pattern, i.e. surround it with \Q and \E
matchFound = Pattern.matches("\\Q"+patternStr+"\\E", "i.e.");  // true
matchFound = Pattern.matches("\\Q"+patternStr+"\\E", "ibex");  // false

// Escape the pattern
patternStr = escapeRE(patternStr);                       // i\.e\.

matchFound = Pattern.matches(patternStr, "i.e.");        // true
matchFound = Pattern.matches(patternStr, "ibex");        // false

// Returns a pattern string in which every non-alphanumeric character is escaped.
static Pattern escaper = Pattern.compile("([^a-zA-Z0-9])");
public static String escapeRE(String str) {
    // "\\\\" in the replacement yields one literal backslash; $1 is the matched character
    return escaper.matcher(str).replaceAll("\\\\$1");
}
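
The standard library also offers Pattern.quote, which wraps a string so that it is matched literally and can stand in for a hand-written escaper. A minimal sketch, with the class name QuoteDemo and the method matchesLiterally invented for illustration:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class QuoteDemo {
    // Matches the whole input against the given text, treated as a literal.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesLiterally("i.e.", "i.e.")); // true
        System.out.println(matchesLiterally("i.e.", "ibex")); // false
    }
}
```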

For more information on Regular Expressions see my article about grep.