Week 6.docx
Algonquin College
**Week 6**

**Introduction to GLOB patterns**

Your shell has a pathname-matching (wildcard) feature that makes it easy to operate on large numbers of pathnames:

**\$ cp a\* .. \# copy all names starting with \'a\' to the parent directory**

The Unix name for wildcard pattern matching is *GLOBbing*, from the idea that the pattern matches a "global" list of names. Other operating systems may call these *wildcard* characters.

GLOB patterns match more than just file names. The names matched by a GLOB pattern can be the names of anything: files, directories, symbolic links, etc. Sometimes we say that GLOB patterns match "file names", but what we really mean is that they match any kind of name.

The shell tries to expand each expression containing GLOB patterns into existing pathnames in the file system, and replaces the expression with every matching existing pathname. GLOB patterns cannot generate names that do not exist; they can only match existing names.

**Definitions**

**[GLOB patterns (wildcard patterns)]** Characters that the shell will try to expand to match existing pathnames in the file system. For example, below, the character **\*** is a GLOB pattern:

**\$ cp a\* ..**

**[Meta-character]** A character that has a special meaning to the shell. Regular characters like 'a' or '1' are not meta-characters. Semicolons, blanks (spaces), redirection characters and GLOB pattern characters like '\*' (asterisk/star) are meta-characters. For example, below, the spaces, ';', '\>' and '\*' are meta-characters:

**\$ touch a ; ls a\* \>list.txt**

*You can usually turn a meta-character into a regular character by **quoting** it (more on quoting below).*

**[Token (word)]** When you enter a command, the shell breaks it into small meaningful parts. Each part is called a token. Some meta-characters, such as ';', are tokens themselves. Others, such as blanks, only separate tokens. Examples:

**one two \# two tokens separated by a blank (space)**

**one;two \# three tokens (\';\' is a meta-character)**

**cd.. \# one token (dots aren\'t meta-characters)**

**[Quoting]** Turning off the special meaning of meta-characters by surrounding them with single or double quotes, or by putting a backslash in front of them:

**\$ echo \'Single quotes hide the semicolon ; from the shell\'**

**Single quotes hide the semicolon ; from the shell**

**\$ echo \"Double quotes hide the semicolon ; from the shell\"**

**Double quotes hide the semicolon ; from the shell**

**\$ echo The backslash hides the semicolon \\; from the shell**

**The backslash hides the semicolon ; from the shell**

**[Hidden names]** Names that start with a leading dot ('.'). Some commands (e.g. **ls**) do not show them by default. GLOB meta-characters never match the leading period of a pathname. A GLOB pattern matches hidden names only if the pattern itself explicitly begins with a dot.

**Shell GLOB meta-characters**

These are the GLOB meta-characters recognized and processed by the shell:

| **Meta-character** | **Explanation** |
|---|---|
| **\*** | Matches **zero or more** characters of any kind |
| **?** | Matches **exactly one** character |
| **\[ \]** | Matches **exactly one** character listed between the brackets |

The shell always processes GLOB characters that it finds on the command line, even for commands that do not take pathnames. (The shell doesn't know which commands do or do not take pathnames.) For example:

**\$ echo \***

The shell calls the **echo** program and passes it all the non-hidden names in the current directory, even though **echo** treats its arguments as plain text. Because the shell can't know which programs want pathnames and which ones don't, it *always* expands GLOB patterns.

**Using \* to match any number of any characters**

As a GLOB meta-character, the asterisk **\*** matches zero or more characters of any kind in a name, including spaces and other symbols. However, **\*** never matches the leading period of a hidden name, so the command **echo \*** never shows any names starting with a period.
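The following is a minimal sketch of the points above. It assumes **bash**, although any POSIX shell should behave the same, and it works in a throwaway directory created with **mktemp -d**; the file names are invented for the demonstration. **printf** prints each argument it receives between angle brackets. This shows that the program gets the already-expanded names rather than the **a\*** pattern.

```shell
#!/usr/bin/env bash
# Sketch: the shell expands GLOB patterns before the command runs,
# and * does not match hidden names. Runs in a throwaway directory.
export LC_ALL=C                  # predictable (ASCII) order for the matches
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

touch apple avocado banana .hidden   # .hidden is a hidden name

# printf receives one argument per matching name, not the a* pattern
args=$(printf '<%s>' a*)
echo "printf received: $args"

all=$(echo *)                    # .hidden is skipped
literal=$(echo '*')              # quoting hides the * from the shell
echo "echo *   -> $all"
echo "echo '*' -> $literal"

cd / && rm -rf "$demo"           # remove the throwaway directory
```

Running it shows that **echo \*** lists **apple avocado banana** but not **.hidden**. The quoted **'\*'** reaches **echo** as a literal asterisk.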
| **Pattern** | **Explanation** | **Examples** |
|---|---|---|
| **\*foo** | Matches non-hidden names ending with **foo** | **foo**, **xxxfoo**, **123foo** |
| **foo\*** | Matches non-hidden names beginning with **foo** | **foo**, **fooxxx**, **foo123** |
| **\*foo\*** | Matches non-hidden names containing **foo** anywhere | **foo**, **fooxxx**, **123foo**, **ZZZfoo@@@** |

**Using ? to match exactly one character of any kind**

As a GLOB meta-character, the question mark **?** matches exactly one character of any kind in a name, including a space or other symbols. However, **?** never matches the leading period of a hidden name, so the command **echo ?** never shows the current directory name **.** (a single dot).

| **Pattern** | **Explanation** | **Examples** |
|---|---|---|
| **???** | Matches non-hidden names that are exactly three characters long | **abc**, **Tea**, **0\_0** |
| **???\*** | Matches non-hidden names that are three *or more* characters long | **tomato**, **cat**, **1337** |

**Using \[ \] to match one character from a *list***

As a GLOB meta-character pair, the square brackets **\[ \]** match exactly one character in a name, chosen from the *list* of characters between the brackets. The GLOB pattern **\[abc\]** does not match the three-character name **abc**. It matches only the one-character names **a**, **b** or **c**:

**\$ touch a b c abc**

**\$ echo \[abc\]**

**a b c**

The **\[ \]** pattern works like a **?** pattern in which we decide which characters can be matched. The list can never match the leading period of a hidden name, so **echo \[.\]** never shows the current directory name **.** (a single dot).
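The patterns from the tables above can be checked the same way. This sketch assumes **bash** and sets **LC_ALL=C** so that the matches come out in plain ASCII order, with uppercase before lowercase. All file names are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: checking the *, ? and [ ] patterns from the tables with echo.
export LC_ALL=C                  # ASCII order: uppercase sorts before lowercase
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

touch foo xxxfoo fooxxx 'ZZZfoo@@@' a b c abc Tea .foo

ends_foo=$(echo *foo)            # ends with foo; the hidden .foo is skipped
has_foo=$(echo *foo*)            # contains foo anywhere
three=$(echo ???)                # exactly three characters long
list=$(echo [abc])               # ONE character: a, b or c (not the name abc)

printf '%-6s -> %s\n' '*foo' "$ends_foo" '*foo*' "$has_foo" \
    '???' "$three" '[abc]' "$list"

cd / && rm -rf "$demo"
```

**???** picks up **Tea**, **abc** and **foo**, but not the hidden **.foo**. **\[abc\]** lists only the three one-character names.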
*The **\[ \]** pattern always matches exactly one character, no matter how many characters are listed inside it.*

Note that the GLOB patterns **\[aA\]** and **\[a\]\[A\]** are very different:

- **\[aA\]** is one list. It matches only a one-character name, either **a** or **A**.
- **\[a\]\[A\]** is made of two lists. The first matches only **a** and the second matches only **A**, so the whole pattern matches only the name **aA**.

*A GLOB square-bracket list with only one character in it, e.g. **\[a\]**, is not usually useful. Rather than writing **\*\[a\]bc**, use the equivalent and much simpler **\*abc**, which matches exactly the same names.*

**Inverting the selection in \[ \]**

You can make a **\[ \]** list match any character that **isn't** listed by adding **!** or **\^** immediately after the **\[**. For example:

**\$ echo \[!abc\] \# match any one-character name that is not a, b or c**

**Using ranges in \[ \]**

You can use a dash **-** to indicate a range of digits inside a **\[ \]** list:

**\$ touch 1 2 3 4 5**

**\$ echo \[2-4\]**

**2 3 4**

***IMPORTANT:*** *Don't use ranges of letters, e.g. **\[a-c\]**, unless you fully understand the effects of your machine's internationalization **locale** setting. On many modern Linux machines with a locale such as **en\_US.utf8**, the seemingly trivial range **\[a-c\]** can match the five characters **a A b B c** instead of just **a b c**.* ***Only use ranges of digits.***

**Using \[ \] to match case-insensitive patterns**

GLOB patterns are case-sensitive, so **abc\*** will not match any of the names **ABC**, **aBc**, **Abc**, etc.
If you want to match both upper-case and lower-case letters in names, turn each letter into its own little two-character **\[ \]** list:

**\$ touch abc aBc aBC ABc ABC Abc**

**\$ echo abc\***

**abc**

**\$ echo \[aA\]bc**

**Abc abc**

**\$ echo \[aA\]\[bB\]c**

**ABc Abc aBc abc**

**\$ echo \[aA\]\[bB\]\[cC\]**

**ABC ABc Abc aBC aBc abc**

**Matching character classes**

Certain preset character classes (part of the POSIX standard) can be used inside a list to match **any** character belonging to that class:

| **Character Class** | **Matches** |
|---|---|
| **\[:upper:\]** | All uppercase letters |
| **\[:lower:\]** | All lowercase letters |
| **\[:alpha:\]** | All letters |
| **\[:digit:\]** | All digits, equivalent to **\[0-9\]** |
| **\[:alnum:\]** | All letters and digits |

- *A character class must be used **inside** a list. For example, **\[\[:upper:\]\]\*** matches all the pathnames starting with an uppercase letter.*
- *There are additional character classes not listed here. See the "Character classes" section of the manual page **man 7 glob**.*

**\$ touch a b c A B C abc ABC 1 2 3 123 1a 2b 3c x1 y2 z3**

**\$ echo \[\[:lower:\]\]**

**a b c**

**\$ echo \[\[:upper:\]\]**

**A B C**

**\$ echo \[\[:alpha:\]\]**

**A B C a b c**

**\$ echo \[\[:alpha:\]\]\***

**A ABC B C a abc b c x1 y2 z3**

**\$ echo \*\[\[:alpha:\]\]**

**1a 2b 3c A ABC B C a abc b c**

**Verifying GLOB patterns before using them**

Until you are sure you know how the shell uses GLOB patterns to match names, use the **echo** or **ls** command to see which names are being matched (if any):

**\$ echo \[abc\]\* \# this verifies that the GLOB pattern works**

**\$ ls -d \[abc\]\* \# this verifies that the GLOB pattern works**

Use these commands first to avoid taking action on the wrong files with more impactful commands such as **rm** or **mv**. If there are too many matches, you can pipe the output into a pager such as **less**.
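The following sketch exercises negation, a digit range, per-letter case lists and two character classes in one scratch directory. It again assumes **bash** with **LC_ALL=C** (ASCII ranges and sort order). Following the advice above, it also previews a pattern one name per line with **printf** before anything like **rm** or **mv** would be run. The file names are invented.

```shell
#!/usr/bin/env bash
# Sketch: negation, a digit range, per-letter case lists and POSIX
# character classes, all checked in a throwaway directory.
export LC_ALL=C                  # ASCII ranges and ASCII sort order
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

touch a b c A B C abc ABC aBc 1 2 3 4 5 1a 2b

not_abc=$(echo [!abc])           # one-character names other than a, b, c
digits=$(echo [2-4])             # a range of digits (the safe kind of range)
any_case=$(echo [aA][bB][cC])    # "abc" in any mix of upper and lower case
upper_start=$(echo [[:upper:]]*) # names starting with an uppercase letter
alpha_end=$(echo *[[:alpha:]])   # names ending with a letter
digit_start=$(echo [[:digit:]]*) # names starting with a digit

# Preview a pattern one name per line before giving it to rm or mv
printf '%s\n' [[:digit:]]*

cd / && rm -rf "$demo"
```

Setting **LC_ALL=C** here is a deliberate choice. It sidesteps the locale-dependent letter ranges and sort order warned about above, so the results are reproducible.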
**Make sure that the GLOB pattern is correct before you use it.**

**GLOBbing is always done by the shell**

Like other meta-characters, GLOB patterns are processed by the shell. Pathname matching is done by the shell before the command is executed, not by the program.

**Quoting to hide GLOB meta-characters**

If you do not want GLOB processing to happen, hide the GLOB characters from the shell by quoting them. Either surround the token with single or double quotes, or precede each GLOB meta-character with a backslash:

**\$ echo \***

**a b c**

**\$ echo \"\*\"**

**\***

**\$ echo \'\*\'**

**\***

**\$ echo \\\***

**\***

**Here are several examples that use quoting to hide GLOB characters:**

**\$ echo \"\*\*\* Warning: assuming the worst \*\*\*\"**

**\$ find /usr/bin -name \'\*ho\*\'**

***find** is one of a few commands that support GLOB patterns themselves. However, if you run **find /usr/bin -name \*ho\*** without quotes, the shell first replaces **\*ho\*** with any matching pathnames in the current directory (if there are any). **find** then receives those names instead of the pattern, which can lead to unexpected results.
Always quote GLOB patterns that are meant for the program itself, such as the **-name** argument of **find**.*

**GLOB patterns: subtle rules and examples**

**Matching any type of pathname**

**\$ mkdir a ; touch b**

**\$ echo \***

**a b**

**\$ rm \***

**rm: cannot remove \'a\': Is a directory**

***GLOB patterns match any type of file***

***GLOBbing happens on each token***

***\$ touch a b c***

***\$ echo \****

***a b c***

***\$ echo \* \****

***a b c a b c***

***\$ echo \* \* \****

***a b c a b c a b c***

***Multiple patterns result in multiple separate matches***

*The command lines below produce identical output. Each contains only one token for the shell to expand, and **\*** means the same thing as **\*\*** or **\*\*\*** when there are no blanks between the meta-characters:*

***\$ touch a b c***

***\$ echo \****

***a b c***

***\$ echo \*\****

***a b c***

***\$ echo \*\*\****

***a b c***

*Adjacent GLOB characters that are not separated by another meta-character (such as a blank) all belong to the same GLOB pattern.*

***GLOB patterns match hidden names only with an explicit dot***

***\$ mkdir newdir ; cd newdir***

***\$ touch .a .ab .abc .abcde .abcdef***

***\$ echo \****

***\****

***\$ echo ?***

***?***

***\$ echo ??***

***??***

***\$ echo .?***

***.. .a***

***\$ echo .??***

***.ab***

***\$ echo .\****

***. .. .a .ab .abc .abcde .abcdef***

***\$ echo .??\****

***.ab .abc .abcde .abcdef***

***\$ echo \[.\]\****

***\[.\]\****

*GLOB patterns match hidden names only if the pattern explicitly begins with a dot. In Bash 5.2 and later, the **globskipdots** option (on by default) also stops patterns from matching **.** and **..**, so there **echo .?** shows only **.a** and **echo .\*** omits **.** and **..**.*

***Unmatched GLOB patterns are passed unchanged***

***\$ touch someverylongfilename.txt***

***\$ ls***

***someverylongfilename.txt***

***\$ cp /etc/passwd sme\* \# typing error; should be some\* !***

***\$ ls***

***sme\* someverylongfilename.txt \# silently created a new file name***

*If GLOB matching fails because no names match the pattern, the GLOB pattern is passed unchanged to the command.*

***\$ mkdir empty ; cd empty***

***\$ touch ? \* \# GLOB doesn\'t match; creates two new files***

***\$ ls***

***\* ?***

*Some shells (like the C shell) produce an error message when a GLOB pattern fails to match, and refuse to run the command. You can optionally make BASH behave this way too by setting the BASH **failglob** option with the **shopt** built-in command (**shopt -s failglob**; highly recommended!). That option makes it an error to use a GLOB pattern that doesn't match anything.*

***GLOB patterns do not match or span slashes***

*GLOB characters can match any character in a pathname component, including spaces, newlines and unprintable characters. However, they do not match or cross the slashes that separate pathname components.*

*If a token containing GLOB patterns has two non-adjacent slashes, every matched pathname must also have exactly two slashes:*

***\$ echo /\*/ls***

***/bin/ls***

***\$ echo /\*/\*lt***

***/bin/c++filt /bin/readmult /etc/default /sbin/grub2-set-default /sbin/halt***

***\$ echo /\[bs\]\*/\*lt***

***/bin/c++filt /bin/readmult /sbin/grub2-set-default /sbin/halt***

***\$ echo /\* \# matches names directly under the ROOT but no deeper***

***\$ echo /bin/\* \# matches names directly under /bin/ but no deeper***

***\$ echo /usr/bin/\* \# matches names directly under /usr/bin/ but no deeper***

*A GLOB pattern does not cross slashes in a pathname. In general, if the token containing the GLOB pattern has N non-adjacent slashes, every matched pathname has exactly the same number of slashes.*

***A GLOB pattern that matches only directory names***

*Given an existing directory **dir1**, this pathname argument is valid:*

***\$ ls dir1/.***

***\...files list here\...***

*Given a file named **file1**, this similar pathname argument is **not** valid:*

***\$ ls file1/.***

***ls: cannot access file1/.: Not a directory***

*A file name **file1** cannot be used as if it were a directory.
Only directory names can appear to the left of slashes in valid Unix pathnames.*

*If we replace the name **dir1** in the token **dir1/.** with a GLOB pattern, as in **\*/.**, the result can only expand to a valid pathname where the GLOB pattern matches a directory. The pattern **\*/.** therefore matches only directories.*
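The rules in this section can be tried out in a scratch directory. This sketch assumes **bash** and demonstrates four things:

- an unmatched pattern being passed through unchanged;
- the same pattern with **failglob** turned on, run in a separate **bash -c** so the option doesn't affect the rest of the script;
- **\*/** picking out only directories, the same idea as **\*/.**, since only a directory can appear to the left of a slash;
- **\*/\*** stopping exactly one level down.

All names are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: unmatched patterns, bash's failglob option, and patterns that
# respect slashes. Runs in a throwaway directory.
export LC_ALL=C
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1

mkdir dir1 dir2
touch file1 someverylongfilename.txt dir1/inner

# 1. No name starts with "sme", so the pattern reaches echo unchanged
unmatched=$(echo sme*)

# 2. With failglob set, bash reports "no match" and does not run echo.
#    A separate bash -c keeps the option out of this script.
strict=$(bash -c 'shopt -s failglob; echo sme*' 2>/dev/null)

# 3. Only directories can appear to the left of a slash, so */ lists
#    dir1/ and dir2/ but not file1
dirs=$(echo */)

# 4. A pattern never crosses a slash: */* looks exactly one level down
one_level=$(echo */*)

echo "unmatched: $unmatched"
echo "failglob:  ${strict:-(command not run)}"
echo "dirs:      $dirs"
echo "one level: $one_level"

cd / && rm -rf "$demo"
```

In an interactive session you would simply run **shopt -s failglob** once, or put it in your shell startup file, instead of using **bash -c**.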