Summary

This document is a chapter from the textbook Java How to Program, 10th edition. It covers classes String, StringBuilder and Character, and the use of regular expressions for validating input.

Full Transcript

Java How to Program, 10/e © Copyright 1992-2015 by Pearson Education, Inc. All Rights Reserved.
This chapter discusses class String, class StringBuilder and class Character from the java.lang package. These classes provide the foundation for string and character manipulation in Java. The chapter also discusses regular expressions, which give applications the capability to validate input.
A program may contain character literals. A character literal is an integer value represented as a character in single quotes. The value of a character literal is the integer value of the character in the Unicode character set. String literals (stored in memory as String objects) are written as a sequence of characters in double quotation marks.
Class String is used to represent strings in Java. The next several subsections cover many of class String’s capabilities.
The no-argument constructor creates a String that contains no characters (i.e., the empty string, which can also be represented as "") and has a length of 0. The constructor that takes a String object copies the argument into the new String. The constructor that takes a char array creates a String containing a copy of the characters in the array. The constructor that takes a char array and two integers (the starting index and the number of characters to copy) creates a String containing the specified portion of the array.
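The four constructors described above can be sketched as follows. This is a minimal illustration, not the textbook's own figure; the class name StringConstructorsSketch and the helper build are hypothetical names introduced here.

```java
class StringConstructorsSketch {
    // Hypothetical helper: returns one String per constructor described above.
    static String[] build(String source, char[] chars, int offset, int count) {
        return new String[] {
            new String(),                     // empty string, length 0
            new String(source),               // copy of an existing String
            new String(chars),                // copy of the whole char array
            new String(chars, offset, count)  // count characters starting at offset
        };
    }

    public static void main(String[] args) {
        char[] charArray = {'b', 'i', 'r', 't', 'h', ' ', 'd', 'a', 'y'};
        for (String s : build("hello", charArray, 6, 3)) {
            System.out.printf("\"%s\" (length %d)%n", s, s.length());
        }
    }
}
```

With the sample array, the last constructor call copies three characters starting at index 6, which yields "day".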
String method length determines the number of characters in a string. String method charAt returns the character at a specific position in the String. String method getChars copies the characters of a String into a character array. The first argument is the starting index in the String from which characters are to be copied. The second argument is the index that is one past the last character to be copied from the String. The third argument is the character array into which the characters are to be copied. The last argument is the starting index where the copied characters are placed in the target character array.
Strings are compared using the numeric codes of the characters in the strings. Figure 14.3 demonstrates String methods equals, equalsIgnoreCase, compareTo and regionMatches and using the equality operator == to compare String objects.
Method equals tests any two objects for equality. The method returns true if the contents of the objects are equal, and false otherwise. For Strings, it uses a lexicographical comparison. When primitive-type values are compared with ==, the result is true if both values are identical. When references are compared with ==, the result is true if both references refer to the same object in memory.
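The difference between comparing contents with equals and comparing references with == can be seen in a short sketch. The class and method names below are hypothetical; the String created with new is a distinct object even though its contents match the literal.

```java
class StringEqualitySketch {
    // True if the two Strings contain the same characters.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    // True only if both references refer to the same object in memory.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String s1 = new String("hello"); // explicitly creates a new String object
        String s2 = "hello";             // string literal

        System.out.println(sameContents(s1, s2)); // true: same characters
        System.out.println(sameObject(s1, s2));   // false: two different objects
        System.out.println(sameObject(s1, s1));   // true: same reference
    }
}
```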
Java treats all string literal objects with the same contents as one String object to which there can be many references.
String method equalsIgnoreCase ignores whether the letters in each String are uppercase or lowercase when performing the comparison. Method compareTo is declared in the Comparable interface and implemented in the String class. It returns 0 if the Strings are equal, a negative number if the String that invokes compareTo is less than the String that is passed as an argument and a positive number if the String that invokes compareTo is greater than the String that is passed as an argument.
Method regionMatches compares portions of two Strings for equality. In the four-argument version, the first argument is the starting index in the String that invokes the method. The second argument is a comparison String. The third argument is the starting index in the comparison String. The last argument is the number of characters to compare. In the five-argument version of method regionMatches, when the first argument is true, the method ignores the case of the characters being compared. The remaining arguments are identical to those described for the four-argument regionMatches method.
String methods startsWith and endsWith determine whether strings start with or end with a particular set of characters.
Figure 14.5 demonstrates the many versions of String methods indexOf and lastIndexOf that search for a specified character or substring in a String.
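As a rough sketch of the comparison methods described above (equalsIgnoreCase, compareTo, regionMatches, startsWith and endsWith), the following uses sample Strings chosen for illustration. The helper sign is a hypothetical convenience that reduces the result of compareTo to -1, 0 or 1, since only the sign of that result is meaningful.

```java
class StringComparisonSketch {
    // Hypothetical helper: only the sign of compareTo's result matters.
    static int sign(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        String s3 = "Happy Birthday";
        String s4 = "happy birthday";

        System.out.println(s3.equalsIgnoreCase(s4));  // true
        System.out.println(sign("hello", "hello"));   // 0
        System.out.println(sign(s3, s4));             // -1: 'H' has a lower code than 'h'

        // Four-argument version: case-sensitive comparison of 5 characters.
        System.out.println(s3.regionMatches(0, s4, 0, 5));        // false
        // Five-argument version: first argument true means ignore case.
        System.out.println(s3.regionMatches(true, 0, s4, 0, 5));  // true

        System.out.println("started".startsWith("st")); // true
        System.out.println("started".endsWith("ed"));   // true
    }
}
```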
Method indexOf locates the first occurrence of a character in a String. If the method finds the character, it returns the character’s index in the String; otherwise, it returns -1. A second version of indexOf takes two integer arguments: the character and the starting index at which the search of the String should begin. Method lastIndexOf locates the last occurrence of a character in a String. The method searches from the end of the String toward the beginning. If it finds the character, it returns the character’s index in the String; otherwise, it returns -1. A second version of lastIndexOf takes two integer arguments: the integer representation of the character and the index from which to begin searching backward. There are also versions of these methods that search for substrings in Strings.
Class String provides two substring methods to enable a new String object to be created by copying part of an existing String object. Each method returns a new String object. The version that takes one integer argument specifies the starting index in the original String from which characters are to be copied. The version that takes two integer arguments receives the starting index from which to copy characters in the original String and the index one beyond the last character to copy.
String method concat concatenates two String objects (similar to using the + operator) and returns a new String object containing the characters from both original Strings.
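The search, substring and concat methods above can be tried on a small sample String. This is a sketch with an assumed sample value; the constant LETTERS and the helper join are hypothetical names.

```java
class StringSearchSketch {
    // Sample text: the letters a through m, repeated twice (26 characters).
    static final String LETTERS = "abcdefghijklmabcdefghijklm";

    // Hypothetical helper wrapping concat; neither argument is modified.
    static String join(String first, String second) {
        return first.concat(second);
    }

    public static void main(String[] args) {
        System.out.println(LETTERS.indexOf('c'));        // 2
        System.out.println(LETTERS.indexOf('a', 1));     // 13: search starts at index 1
        System.out.println(LETTERS.indexOf('$'));        // -1: not found
        System.out.println(LETTERS.lastIndexOf('c'));    // 15
        System.out.println(LETTERS.indexOf("def"));      // 3
        System.out.println(LETTERS.lastIndexOf("def"));  // 16

        System.out.println(LETTERS.substring(20));       // hijklm
        System.out.println(LETTERS.substring(3, 6));     // def (index 6 is excluded)

        String s1 = "Happy ";
        String s2 = "Birthday";
        System.out.println(join(s1, s2));                // Happy Birthday
        System.out.println(s1 + "|" + s2);               // originals are unchanged
    }
}
```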
The original Strings to which s1 and s2 refer are not modified.
Method replace returns a new String object in which every occurrence of the first char argument is replaced with the second. An overloaded version enables you to replace substrings rather than individual characters. Method toUpperCase generates a new String with uppercase letters. Method toLowerCase returns a new String object with lowercase letters. Method trim returns a new String object with all whitespace characters removed from the beginning and end of the String on which trim operates. Method toCharArray creates a new character array containing a copy of the characters in the String.
Class String provides static valueOf methods that take an argument of any type and convert it to a String object.
Class StringBuilder is used to create and manipulate dynamic string information, that is, modifiable strings. Every StringBuilder is capable of storing a number of characters specified by its capacity. If the capacity of a StringBuilder is exceeded, the capacity expands to accommodate the additional characters.
The no-argument constructor creates a StringBuilder with no characters in it and an initial capacity of 16 characters. The constructor that takes an integer argument creates a StringBuilder with no characters in it and the initial capacity specified by the integer argument. The constructor that takes a String argument creates a StringBuilder containing the characters in the String argument. The initial capacity is the number of characters in the String argument plus 16. Method toString of class StringBuilder returns the StringBuilder contents as a String.
Methods length and capacity return the number of characters currently in a StringBuilder and the number of characters that can be stored in a StringBuilder without allocating more memory, respectively. Method ensureCapacity guarantees that a StringBuilder has at least the specified capacity. Method setLength increases or decreases the length of a StringBuilder. If the specified length is less than the current number of characters, the buffer is truncated to the specified length. If the specified length is greater than the number of characters, null characters are appended until the total number of characters in the StringBuilder is equal to the specified length.
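The constructor capacities and the effect of ensureCapacity and setLength described above can be checked with a short sketch. The class name and the helper resized are hypothetical. The exact capacity after ensureCapacity is left unasserted here, because only the minimum is guaranteed.

```java
class StringBuilderCapacitySketch {
    // Hypothetical helper: builds a StringBuilder from text, then applies setLength.
    static String resized(String text, int newLength) {
        StringBuilder sb = new StringBuilder(text);
        sb.setLength(newLength); // truncates, or pads with '\u0000' characters
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder b1 = new StringBuilder();        // capacity 16
        StringBuilder b2 = new StringBuilder(10);      // capacity 10
        StringBuilder b3 = new StringBuilder("hello"); // capacity 5 + 16 = 21

        System.out.printf("b1: length %d, capacity %d%n", b1.length(), b1.capacity());
        System.out.printf("b2: length %d, capacity %d%n", b2.length(), b2.capacity());
        System.out.printf("b3: length %d, capacity %d%n", b3.length(), b3.capacity());

        b3.ensureCapacity(75);
        System.out.println(b3.capacity() >= 75);        // true

        System.out.println(resized("hello", 3));        // hel
        System.out.println(resized("hi", 4).length());  // 4
    }
}
```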
Method charAt takes an integer argument and returns the character in the StringBuilder at that index. Method getChars copies characters from a StringBuilder into the character array argument. It takes four arguments: the starting index from which characters should be copied, the index one past the last character to be copied, the character array into which the characters are to be copied and the starting location in the character array where the first character should be placed. Method setCharAt takes an integer and a character argument and sets the character at the specified position in the StringBuilder to the character argument. Method reverse reverses the contents of the StringBuilder.
Overloaded append methods allow values of various types to be appended to the end of a StringBuilder. Versions are provided for each of the primitive types and for character arrays, Strings, Objects, and more.
The compiler can use StringBuilder and the append methods to implement the + and += String concatenation operators.
Overloaded insert methods insert values of various types at any position in a StringBuilder. Versions are provided for the primitive types and for character arrays, Strings, Objects and CharSequences. Each method takes its second argument, converts it to a String and inserts it at the index specified by the first argument.
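The append, insert, setCharAt and reverse methods described above all modify the same StringBuilder in place. The following sketch records the contents after each step; the class name and the steps method are hypothetical, and the sample values are arbitrary.

```java
class StringBuilderEditSketch {
    // Hypothetical helper: returns the builder's contents after each modification.
    static String[] steps() {
        StringBuilder sb = new StringBuilder();
        String[] snapshots = new String[4];

        sb.append("abc").append(123).append('!'); // String, int and char overloads
        snapshots[0] = sb.toString();             // abc123!

        sb.insert(0, 3.5);                        // double converted to "3.5", inserted at index 0
        snapshots[1] = sb.toString();             // 3.5abc123!

        sb.setCharAt(3, 'A');                     // replaces the 'a' at index 3
        snapshots[2] = sb.toString();             // 3.5Abc123!

        sb.reverse();
        snapshots[3] = sb.toString();             // !321cbA5.3

        return snapshots;
    }

    public static void main(String[] args) {
        for (String snapshot : steps()) {
            System.out.println(snapshot);
        }
    }
}
```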
Methods delete and deleteCharAt delete characters at any position in a StringBuilder. Method delete takes two arguments: the starting index and the index one past the end of the characters to delete. Method deleteCharAt takes one argument: the index of the character to delete.
Java provides eight type-wrapper classes that enable primitive-type values to be treated as objects: Boolean, Character, Double, Float, Byte, Short, Integer and Long. Most Character methods are static methods designed for convenience in processing individual char values.
Method isDefined determines whether a character is defined in the Unicode character set. Method isDigit determines whether a character is a defined Unicode digit. Method isJavaIdentifierStart determines whether a character can be the first character of an identifier in Java, that is, a letter, an underscore (_) or a dollar sign ($). Method isJavaIdentifierPart determines whether a character can be used in an identifier in Java, that is, a digit, a letter, an underscore (_) or a dollar sign ($).
Method isLetter determines whether a character is a letter. Method isLetterOrDigit determines whether a character is a letter or a digit.
Method isLowerCase determines whether a character is a lowercase letter. Method isUpperCase determines whether a character is an uppercase letter. Method toUpperCase converts a character to its uppercase equivalent. Method toLowerCase converts a character to its lowercase equivalent.
Methods digit and forDigit convert characters to digits and digits to characters, respectively, in different number systems. Common number systems are decimal (base 10), octal (base 8), hexadecimal (base 16) and binary (base 2). The base of a number is also known as its radix. For more information on conversions between number systems, see Appendix I.
Character method forDigit converts its first argument into a character in the number system specified by its second argument. Character method digit converts its first argument into an integer in the number system specified by its second argument. The radix (second argument) must be between 2 and 36, inclusive.
Java automatically converts char literals into Character objects when they are assigned to Character variables, a process known as autoboxing. Method charValue returns the char value stored in the object. Method toString returns the String representation of the char value stored in the object. Method equals determines if two Characters have the same contents.
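Several of the Character methods described above can be combined in a short sketch. The helper looksLikeIdentifier is a hypothetical example built on isJavaIdentifierStart and isJavaIdentifierPart; it checks characters only and does not reject Java keywords.

```java
class CharacterMethodsSketch {
    // Hypothetical helper: checks each character, but does not exclude keywords.
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Character.isDigit('7'));       // true
        System.out.println(Character.isLetter('7'));      // false
        System.out.println(Character.toUpperCase('a'));   // A

        System.out.println(Character.forDigit(13, 16));   // d: digit 13 in base 16
        System.out.println(Character.digit('d', 16));     // 13

        Character c1 = 'A';                   // autoboxing: char to Character
        System.out.println(c1.charValue());   // A
        System.out.println(c1.equals('A'));   // true

        System.out.println(looksLikeIdentifier("total_2")); // true
        System.out.println(looksLikeIdentifier("2total"));  // false
    }
}
```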
When you read a sentence, your mind breaks it into tokens, the individual words and punctuation marks that convey meaning. Compilers also perform tokenization. String method split breaks a String into its component tokens and returns an array of Strings. Tokens are separated from one another by delimiters, typically white-space characters such as space, tab, newline and carriage return. Other characters can also be used as delimiters to separate tokens.
A regular expression is a specially formatted String that describes a search pattern for matching characters in other Strings. Regular expressions are useful for validating input and ensuring that data is in a particular format. One application of regular expressions is to facilitate the construction of a compiler. Often, a large and complex regular expression is used to validate the syntax of a program. If the program code does not match the regular expression, the compiler knows that there is a syntax error within the code.
String method matches receives a String that specifies the regular expression and matches the contents of the String object on which it’s called to the regular expression. The method returns a boolean indicating whether the match succeeded. A regular expression consists of literal characters and special symbols.
Figure 14.19 specifies some predefined character classes that can be used with regular expressions. A character class is an escape sequence that represents a group of characters. A digit is any numeric character. A word character is any letter (uppercase or lowercase), any digit or the underscore character.
A white-space character is a space, a tab, a carriage return, a newline or a form feed. Each character class matches a single character in the String we’re attempting to match with the regular expression. Regular expressions are not limited to predefined character classes. The expressions employ various operators and other forms of notation to match complex patterns.
To match a set of characters that does not have a predefined character class, use square brackets, []. The pattern "[aeiou]" matches a single character that’s a vowel. Character ranges are represented by placing a dash (-) between two characters. "[A-Z]" matches a single uppercase letter. If the first character in the brackets is "^", the expression accepts any character other than those indicated. "[^Z]" is not the same as "[A-Y]", which matches uppercase letters A through Y: "[^Z]" matches any character other than capital Z, including lowercase letters and nonletters such as the newline character.
Ranges in character classes are determined by the letters’ integer values. "[A-Za-z]" matches all uppercase and lowercase letters. The range "[A-z]" matches all letters and also matches those characters (such as [ and \) with an integer value between uppercase Z and lowercase a.
Like predefined character classes, character classes delimited by square brackets match a single character in the search object.
When the regular-expression operator "*" appears in a regular expression, the application attempts to match zero or more occurrences of the subexpression immediately preceding the "*". Operator "+" attempts to match one or more occurrences of the subexpression immediately preceding "+". The character "|" matches the expression to its left or to its right. "Hi (John|Jane)" matches both "Hi John" and "Hi Jane". Parentheses are used to group parts of the regular expression.
The asterisk (*) and plus (+) are formally called quantifiers. Figure 14.22 lists all the quantifiers. A quantifier affects only the subexpression immediately preceding the quantifier. Quantifier question mark (?) matches zero or one occurrences of the expression that it quantifies. A set of braces containing one number ({n}) matches exactly n occurrences of the expression it quantifies. Including a comma after the number enclosed in braces ({n,}) matches at least n occurrences of the quantified expression. A set of braces containing two numbers ({n,m}) matches between n and m occurrences of the expression that it quantifies.
Quantifiers may be applied to patterns enclosed in parentheses to create more complex regular expressions. All of the quantifiers are greedy: they match as many occurrences as they can as long as the match is still successful. If a quantifier is followed by a question mark (?), the quantifier becomes reluctant (sometimes called lazy): it will match as few occurrences as possible as long as the match is still successful. String method matches checks whether an entire String conforms to a regular expression.
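A minimal sketch of input validation with String method matches, using the character classes and quantifiers described above. The validation rules (a five-digit ZIP code, a capitalized name) and the method names are assumptions chosen for illustration, not rules taken from the text.

```java
class RegexValidationSketch {
    // Assumed rule: exactly five digits.
    static boolean isValidZip(String zip) {
        return zip.matches("\\d{5}");
    }

    // Assumed rule: one uppercase letter followed by zero or more letters.
    static boolean isValidName(String name) {
        return name.matches("[A-Z][a-zA-Z]*");
    }

    public static void main(String[] args) {
        System.out.println(isValidZip("12345"));   // true
        System.out.println(isValidZip("123456"));  // false: the entire String must match
        System.out.println(isValidName("Jane"));   // true
        System.out.println(isValidName("jane"));   // false

        System.out.println("Hi Jane".matches("Hi (John|Jane)")); // true

        // "[A-z]" also accepts characters between 'Z' and 'a', such as '['.
        System.out.println("[".matches("[A-z]"));     // true
        System.out.println("[".matches("[A-Za-z]"));  // false

        // "[^Z]" accepts any single character other than 'Z', including a newline.
        System.out.println("\n".matches("[^Z]"));     // true
    }
}
```

In Java source code each backslash in a regular expression is written twice, because the backslash is also the escape character in String literals.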
Sometimes it’s useful to replace parts of a string or to split a string into pieces. For this purpose, class String provides methods replaceAll, replaceFirst and split.
String method replaceAll replaces text in a String with new text (the second argument) wherever the original String matches a regular expression (the first argument). Escaping a special regular-expression character with \ instructs the matching engine to find the actual character. String method replaceFirst replaces the first occurrence of a pattern match.
In addition to the regular-expression capabilities of class String, Java provides other classes in package java.util.regex that help developers manipulate regular expressions. Class Pattern represents a regular expression. Class Matcher contains both a regular-expression pattern and a CharSequence in which to search for the pattern. CharSequence (package java.lang) is an interface that allows read access to a sequence of characters. The interface requires that the methods charAt, length, subSequence and toString be declared. Both String and StringBuilder implement interface CharSequence, so an instance of either of these classes can be used with class Matcher.
If a regular expression will be used only once, static Pattern method matches can be used. It takes a String that specifies the regular expression and a CharSequence on which to perform the match.
Pattern method matches returns a boolean indicating whether the search object (the second argument) matches the regular expression.
If a regular expression will be used more than once, it’s more efficient to use static Pattern method compile to create a specific Pattern object for that regular expression. Method compile receives a String representing the pattern and returns a new Pattern object, which can then be used to call method matcher. Method matcher receives a CharSequence to search and returns a Matcher object. Matcher method matches performs the same task as Pattern method matches, but receives no arguments; the search pattern and search object are encapsulated in the Matcher object. Class Matcher provides other methods, including find, lookingAt, replaceFirst and replaceAll.
The dot character "." in a regular expression matches any single character except a newline character. Matcher method find attempts to match a piece of the search object to the search pattern. Each call to this method starts at the point where the last call ended, so multiple matches can be found. Matcher method lookingAt also attempts to match a piece of the search object, except that it always starts from the beginning of the search object, so it succeeds only if a match begins there.
Matcher method group returns the String from the search object that matches the search pattern. The String that is returned is the one that was last matched by a call to find or lookingAt.
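The compile, matcher, find and group sequence described above can be sketched as follows. The pattern (runs of digits), the sample text and the names MatcherFindSketch and findNumbers are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherFindSketch {
    // Compiled once and reused, since the pattern is applied to many inputs.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: collects every run of digits in the text, in order.
    static List<String> findNumbers(CharSequence text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(text);
        while (matcher.find()) {          // each call resumes where the last match ended
            found.add(matcher.group());   // the text matched by the most recent find
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findNumbers("Order 66 shipped 3 boxes on day 12")); // [66, 3, 12]

        // A StringBuilder works too, because it implements CharSequence.
        System.out.println(findNumbers(new StringBuilder("a1b22")));           // [1, 22]

        // One-off check with the static method; the whole input must match.
        System.out.println(Pattern.matches("\\d+", "2024"));                   // true

        // lookingAt is anchored at the start of the input.
        System.out.println(DIGITS.matcher("42 apples").lookingAt());           // true
        System.out.println(DIGITS.matcher("apples 42").lookingAt());           // false
    }
}
```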
As you’ll see in Section 17.7, you can combine regular-expression processing with Java SE 8 lambdas and streams to implement powerful String-and-file processing applications.
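As a closing sketch for the String methods replaceAll, replaceFirst and split covered in this chapter, the following applies each to a small sample. The sample text, the delimiter pattern and the helper names are assumptions made for illustration.

```java
import java.util.Arrays;

class RegexReplaceSplitSketch {
    // Hypothetical helper: replaces every digit with '#'.
    static String maskDigits(String text) {
        return text.replaceAll("\\d", "#");
    }

    // Hypothetical helper: splits on a comma followed by optional white space.
    static String[] tokens(String text) {
        return text.split(",\\s*");
    }

    public static void main(String[] args) {
        String sentence = "This sentence ends in 5 stars *****";

        // The asterisk is escaped so it is treated as a literal character.
        System.out.println(sentence.replaceAll("\\*", "^")); // This sentence ends in 5 stars ^^^^^

        System.out.println("1, 2, 3".replaceFirst("\\d", "digit")); // digit, 2, 3
        System.out.println(maskDigits("a1b22"));                    // a#b##

        System.out.println(Arrays.toString(tokens("1, 2,3,   4"))); // [1, 2, 3, 4]
    }
}
```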
