CS 131.01 Processing Big Data: File Systems & Directories PDF
San José State University
Dr. Andreopoulos
Summary
These lecture notes cover file systems and directories, including basic Linux commands such as chmod and find. They explain directory hierarchies and inodes.
Full Transcript
CS 131.01 Processing Big Data: Tools and Techniques
4. File systems and Directories
San José State University
Slides are adapted from slides created by Dr. Andreopoulos

Optional readings
5th Internet edition available online: https://linuxcommand.org/tlcl.php/
Ch. 3-4, 9
Ch. 1-4
Ch. 1-5

Bash shortcut tip of the day
Ctrl+K deletes from the cursor to the end of the line.

Linux files and directories
The UNIX file system is a hierarchical arrangement of directories and files. Everything starts in the directory called root, whose name is the single forward-slash /.
Two filenames are automatically created whenever a new directory is created: . (called dot) and .. (called dot-dot). Dot refers to the current directory, and dot-dot refers to the parent directory. In the root directory, dot-dot is the same as dot.

andreopo$ ls -latr /
total 38
drwxr-xr-x    3 root  wheel    96 Sep 26  2016 opt
drwxrwxr-t    2 root  admin    64 Feb 25  2019 cores
drwxr-xr-x    2 root  wheel    64 Feb 25  2019 Network
drwxr-xr-x@   5 root  wheel   160 May  3  2019 System
drwxr-xr-x    6 root  wheel   192 May  3  2019 private
drwxr-xr-x    5 root  admin   160 Aug  8 20:59 Users
drwxr-xr-x@  37 root  wheel  1184 Oct 12 19:42 bin
drwxr-xr-x@  64 root  wheel  2048 Oct 12 19:42 sbin
drwxr-xr-x   29 root  wheel   928 Oct 23 00:12 ..
drwxr-xr-x   29 root  wheel   928 Oct 23 00:12 .
drwxr-xr-x@  11 root  wheel   352 Oct 23 00:13 usr
drwxr-xr-x+  64 root  wheel  2048 Jan  6 15:21 Library
dr-xr-xr-x    3 root  wheel  4616 Jan 25 15:50 dev
drwxrwxr-x+  65 root  admin  2080 Jan 25 15:51 Applicat
drwxr-xr-x+   5 root  wheel   160 Jan 27 09:04 Volumes
dr-xr-xr-x    2 root  wheel     1 Jan 28 21:12 net
dr-xr-xr-x    2 root  wheel     1 Jan 28 21:12 home

Try:
ashish@Ashishs-MacBook-Pro Desktop % mkdir newDir
ashish@Ashishs-MacBook-Pro Desktop % cd newDir
ashish@Ashishs-MacBook-Pro newDir % ls -a
. ..
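The dot and dot-dot entries can be checked from the shell by comparing inode numbers. This is a minimal sketch, assuming a POSIX shell with mktemp and awk available; the scratch directory and the inode_of helper are hypothetical names introduced only for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)
mkdir "$scratch/newDir"

# A freshly created directory lists only "." and "..".
entries=$(ls -a "$scratch/newDir" | tr '\n' ' ')
echo "entries: $entries"

# Hypothetical helper: print the inode number of a path.
inode_of() { ls -di "$1" | awk '{print $1}'; }

# "." is the directory itself, ".." is its parent.
dot=$(inode_of "$scratch/newDir/.")
self=$(inode_of "$scratch/newDir")
dotdot=$(inode_of "$scratch/newDir/..")
parent=$(inode_of "$scratch")

# In the root directory, dot-dot is the same as dot.
root_dot=$(inode_of /.)
root_dotdot=$(inode_of /..)

echo "newDir/. = $dot, newDir = $self"
echo "newDir/.. = $dotdot, parent = $parent"
echo "/. = $root_dot, /.. = $root_dotdot"

# Clean up the scratch directory.
rm -r "$scratch"
```

Matching inode numbers mean the two names refer to the same directory.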
Next: chmod and file permissions

Permissions in File Systems (chmod)
There are 3 basic permissions. Sum them up to get a number representing the permissions for a user:
□ Read (r) 4
□ Write (w) 2
□ Execute (x) 1
There are 3 basic levels of users:
✔ Owner
✔ Group
✔ Others
How to find the permissions:
□ ls -l

chmod: changing permissions
(1) Octal method
Eg: chmod 444 hello.txt
4 means read (r)
2 means write (w)
1 means execute (x)
5 means rx
6 means rw
7 means rwx
0 means none
(2) Alphabets
chmod [a/u/g/o +/- r/w/x] <filename>
Eg: chmod ugo+r-wx hello.txt

Can also run recursively
chmod, chown, chgrp - change permissions, owner, or group of files.
Can run recursively: chmod -R 775 .

Polling question
What permissions will chmod 755 set? (check codes on previous slide)
A) rwx for owner, rx for group and others
B) rwx for owner, r for group and others
C) rw for owner, r for group and others
D) rw for owner, rwx for group and others

Polling question
What permissions will chmod 700 set?
A) rwx for owner, no permissions for group and others
B) rwx for owner, r for group and others
C) rw for owner, r for group and others
D) rw for owner, rwx for group and others

Summary
→ You need execute (x) permission for a directory in order to cd to the directory and access files.
→ You need read (r) permission for a directory to ls (list all files) under the directory. To create files under a directory you need both execute and write (w) permissions.
→ You can only read/write/execute the files that have r/w/x permissions set for you or your group or others.
→ If you have execute but not read permissions on a directory, then you need to know the exact filename you want to access under it (since you cannot ls the dir).

Filename Generation (globbing)
• Words on the command line are expanded if they contain one of the characters "*", "?", "[".
• The word is replaced with a sorted list of filenames that match the given pattern.
• If no matching filenames are found, the word is left unchanged.
*       Matches any string (including null).
?       Matches any single character.
[…]     Matches any one of the enclosed characters.
[x-y]   Matches any character lexically between the pair.
[!…]    Matches any character not enclosed.
• The character "." at the start of a filename denotes a hidden file (e.g. .bashrc) and must be matched explicitly.

Polling question
Which of these will find .bashrc?
a) ls .[b]ashrc
b) ls .[ba]ashrc
c) ls .[!c]ashrc
d) all match

Using wildcards vs. find
Wildcards:
Purpose: Wildcards are used for pattern matching within a single directory or when you have an idea of the filenames you want to match based on a specific pattern.
Examples: *.txt: Matches all files with a ".txt" extension in the current directory.
Limitations: A wildcard pattern expands only within the directories named in the pattern. It does not search subdirectories recursively.

Find command
Purpose: The find command is used for searching files and directories across the entire filesystem or within a specified directory, including subdirectories. It is more versatile for complex and recursive searches.
Usage: find is typically used when you need to locate files or directories based on various criteria, such as name, type, size, modification time, and permissions.
Example: find . -name testfile.txt
Try: Make a dir with a name starting with newDir and use find:
find . -name "newDir*"
and try the type option:
find . -type f -name "newDir*"   #search for files
find . -type d -name "newDir*"   #search for directories

Other useful find file commands
find - find a file with a name or extension:
find ./ -type f -name "*.txt"
find /home -name "*.jpg"
find . -type f -empty
Find files in a very large file system quickly:
find /tmp -size +1000M   (find files larger than 1000 MiB, roughly 1 GB)
find /tmp -mtime -1   (find files modified less than 1 day ago)
find /tmp -mtime +1   (find files modified at least 2 days ago, since find counts whole 24-hour periods)
find /etc -type f -name "*.conf"   (find files with names ending in .conf)
find /tmp -type d   (find directories under /tmp)

xargs through pipe (will revisit later)
How to deal with thousands of files:
find / -size +1000M | xargs rm -rf
find /etc -type f -name "*.temp" | xargs rm -rf
The xargs command in Linux is used to read items from standard input (usually a list of items separated by spaces or newlines) and execute a specified command with those items as arguments. Its primary purpose is to handle and process large lists of arguments that might be too long or unwieldy to pass directly to a command.

Directory Tree (hierarchy)
An Example Directory Tree
/   (root directory)
    foo
        bar.txt
    bar
        bar
        foo
            bar.txt
Valid files (absolute pathname): /foo/bar.txt /bar/foo/bar.txt
Valid directory: / /foo /bar
Sub-directories: /bar/bar /bar/foo/

Polling question
When we log in, UNIX places us in a directory called the ___ directory.
a) home
b) main
c) parent
d) current
Go to your home dir:
cd
cd ~

Directory hierarchy
● Each directory entry can point to a regular file or a subdirectory.
● This allows a hierarchical (treelike) organization of files in a file system.

File path in the Directory hierarchy
● File path is the human-readable string of characters we use to refer to a node in the directory tree.
/   (root directory)
    foo
        bar.txt
    bar
        bar
        foo
            bar2.txt
● Examples:
○ / or /foo -> valid paths
○ /foo/bar.txt -> valid path
○ /foo/bar2.txt -> invalid path
● At every step, the kernel should check access permissions to see if the user has been granted access.

Polling question
The root directory in UNIX-like OSs is represented by
a) \
b) /
c) *
d) $

Kudos to y'all
1. chmod u+x,g+x,o+x <filename>
Last day of the month:
1. https://linuxhint.com/schedule-cron-job-run-last-day-every-month/
   1:30PM of last day: 30 13 28-31 * * [ "$(date +\%d -d tomorrow)" = "01" ]
2. Just schedule on midnight of 1st day of every month
3. Have 2 crons:
   cron = "0 0 0 28 2 *"
   cron = "0 0 0 30 1,3,4,5,6,7,8,9,10,11,12 *"

Polling question
Filenames in UNIX are case-sensitive.
a) True
b) False

Polling question
UNIX imposes no rule for filename extensions - but software applications may impose rules (e.g. C compiler expects source code filenames to end with .c).
a) True
b) False
For example, you might have a file named "document.txt" or simply "document" in a UNIX system, and both would be treated as text files. The system relies on the file's contents and other attributes to determine how to handle it.
Try:
ashish@Ashishs-MacBook-Pro CS131 % echo "hello" > a.txt
ashish@Ashishs-MacBook-Pro CS131 % echo "hello" > b
ashish@Ashishs-MacBook-Pro CS131 % cat a.txt
hello
ashish@Ashishs-MacBook-Pro CS131 % cat b
hello

Unix Directories are also a type of file
● Directories are a type of file on Unix-like OSs.
● A directory contains a list of (file name -> inode number) mappings, named dentries.
● The name of a file is kept in the directory, paired with an inode.
● Directories can be thought of as "files containing lists of filenames and inode numbers".
○ There are regular files that contain data (text, pics, etc).
○ Directories contain lists of names and inode numbers.
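The name-to-inode mapping described above can be observed with ls -i. The following is a small sketch, assuming a POSIX shell with mktemp and awk, on a file system that supports hard links; the file names a1, a2, and subdir are placeholders. The hard link (ln) is an addition not covered on the slide, used only to show two names pointing at one inode.

```shell
# Build a small scratch directory with one file and one subdirectory.
scratch=$(mktemp -d)
echo "hello" > "$scratch/a1"
mkdir "$scratch/subdir"

# -i prints each entry's inode number, -a includes . and .., -1 prints one entry per line.
listing=$(ls -ia1 "$scratch")
echo "$listing"
entry_count=$(echo "$listing" | wc -l | tr -d ' ')

# Add a second name for the same inode; no file data is copied.
ln "$scratch/a1" "$scratch/a2"
ino_a1=$(ls -i "$scratch/a1" | awk '{print $1}')
ino_a2=$(ls -i "$scratch/a2" | awk '{print $1}')

# The second column of ls -l is the number of names (links) for the inode.
links=$(ls -l "$scratch/a1" | awk '{print $2}')
echo "a1 inode: $ino_a1, a2 inode: $ino_a2, link count: $links"

# Clean up the scratch directory.
rm -r "$scratch"
```

Each line of the listing is one directory entry: an inode number followed by the name stored in the directory.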
A directory in UNIX is a file that contains a list of directory entries (dentries).
Each directory entry contains a file name along with a reference to a structure of information (inode) describing these attributes of the file:
● type of file (regular data file or directory),
● size of the file,
● owner of the file,
● permissions for the file (whether other users may access this file),
● when the file was last modified,
● and pointers to the data of the file.
The inodes hold metadata on files and directories. More on inodes next.

Directory data consists of "dentries"
□ A directory contains a list of <file name, inode number> pairs.
□ Each directory has two extra entries: . "dot" for the current directory and .. "dot-dot" for the parent directory.
◆ For example, this directory has three files: a1, a2, a3
on-disk dentry for a directory:
inum | strlen | filename
5    | 2      | .
2    | 3      | ..
12   | 3      | a1
13   | 3      | a2
24   | 3      | a3

Polling question
What is a directory file?
a) a directory containing data
b) a directory containing details of the files and subdirectories it contains
c) a directory contains files
d) a directory containing data and files

The iNode
□ The inode has all of the information about a file:
⬥ File type (regular file, directory, device, etc.),
⬥ Size, the number of blocks allocated to it.
⬥ Permissions (who owns the file, who can access, etc).
⬥ Modification, access time information.
⬥ Etc.

The iNode
Size | Name        | What is this inode field for?
2    | mode        | file type and can this file be read/written/executed by owner/group/others?
2    | uid         | who owns this file?
4    | size        | how many bytes are in this file?
4    | time        | what time was this file last accessed?
4    | ctime       | what time was this file created?
4    | mtime       | what time was this file last modified?
4    | dtime       | what time was this inode deleted?
2    | gid         | which group does this file belong to?
2    | links_count | how many hard links are there to this file?
4    | blocks      | how many blocks have been allocated to this file?
4    | flags       | how should ext2 use this inode?
4    | osd1        | an OS-dependent field
60   | block       | a set of disk pointers (15 total)
4    | generation  | file version (used by NFS)
4    | file_acl    | a new permissions model beyond mode bits
4    | dir_acl     | called access control lists
4    | faddr       | an unsupported field
12   | i_osd2      | another OS-dependent field

Tying it all together
[Figure: a regular data file (inode num X) and a directory file (inode num Z); an entry could be either a regular data file or a directory file]

Polling question
In UNIX, the file name and file size are stored in the file's data itself.
a) True
b) False

stat or fstat -- to see file information from the inode
To view file information, you can use the command line tool stat.
◆ The file system keeps this file information in an inode structure for each file.

Rename a file to a different name
□ It is implemented as an atomic call.
□ Example: Change from file1 to file2:
$ mv file1 file2   // mv uses the system call rename()
□ Example: How to update a file atomically:

Polling question
mv file1 file2 will move the file's data to a new location on disk.
a) True
b) False
A UNIX file's data is stored separately from the file name, in a separate area of the hard disk. Thus the file name can be changed to anything without changing the file data.

Make a new directory
□ mkdir: Make a directory
prompt> strace mkdir mydir
…
mkdir("mydir", 0777) = 0
prompt>
⬥ When a directory is created, it is empty.
⬥ An empty directory has only how many entries?
prompt> ls -a
./ ../

Remove empty directory
□ rmdir: Delete a directory.
◆ Requires that the directory be empty.
◆ If you call rmdir on a non-empty directory, it will fail.
□ The directory should only have "." and ".." entries.
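Two of the claims above can be checked directly: renaming with mv keeps the same inode, and rmdir refuses a non-empty directory. This is a rough sketch, assuming a POSIX shell with mktemp and awk, and that both names sit in the same directory (so mv is a plain rename); file1, file2, mydir, and a1 are placeholder names.

```shell
# Throwaway working area.
scratch=$(mktemp -d)

# Rename a file and compare its inode number before and after.
echo "hello" > "$scratch/file1"
before=$(ls -i "$scratch/file1" | awk '{print $1}')
mv "$scratch/file1" "$scratch/file2"
after=$(ls -i "$scratch/file2" | awk '{print $1}')
echo "inode before: $before, after: $after"

# rmdir on a directory that still holds a file should fail.
mkdir "$scratch/mydir"
touch "$scratch/mydir/a1"
if rmdir "$scratch/mydir" 2>/dev/null; then
  rmdir_nonempty="removed"
else
  rmdir_nonempty="refused"
fi

# Once the directory holds only "." and "..", rmdir succeeds.
rm "$scratch/mydir/a1"
if rmdir "$scratch/mydir"; then
  rmdir_empty="removed"
else
  rmdir_empty="refused"
fi
echo "non-empty: $rmdir_nonempty, empty: $rmdir_empty"

# Clean up the scratch directory.
rm -r "$scratch"
```

The unchanged inode number shows that only the directory entry was rewritten; the file's data stayed where it was.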
If your directory is NOT empty, you can use rm -rf, where -r is recursive and -f is forced removal of all files and the directory itself.

Other useful file commands
cp - copy a file, makes a new file and inode object:
cp file1 file2
cp -r olddir newdir : copies a directory
rsync - preserves modification dates and verifies transferred data with checksums:
rsync -arv olddir/ newdir/ : rsyncs a directory
Keep the forward slashes! They matter for rsync: a trailing slash on the source (olddir/) copies the directory's contents rather than the directory itself.

Device Files
• All forms of I/O in UNIX go through the file interface.
• To write to a Terminal screen, for instance, you just write to the appropriate device file:
$ cat > /dev/tty
Hi guy!
• This will cause the text "Hi guy!" to appear on a screen.
Note: /dev/tty is a common character device file representing the terminal device. It allows processes to read from and write to the terminal.
• The same holds true for reading and writing to disks, tapes, mice, tablets, robot arms, the computer's RAM, etc.

Polling question
The most common file type on a filesystem is
a) ordinary (data) file
b) directory file
c) device file
d) both ordinary file and directory file
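The file types discussed in this lecture can be told apart with the shell's test operators, and the note that cp creates a new inode can be checked the same way. This is a minimal sketch, assuming a POSIX shell with mktemp and awk; kind_of is a hypothetical helper, and /dev/null is used instead of /dev/tty as an assumption, because it exists even when no terminal is attached.

```shell
# Throwaway working area.
scratch=$(mktemp -d)

# cp makes a new file with its own inode, unlike mv.
echo "hello" > "$scratch/file1"
cp "$scratch/file1" "$scratch/file2"
ino1=$(ls -i "$scratch/file1" | awk '{print $1}')
ino2=$(ls -i "$scratch/file2" | awk '{print $1}')
echo "file1 inode: $ino1, file2 inode: $ino2"

# Hypothetical helper: classify a path by file type.
kind_of() {
  if [ -d "$1" ]; then echo "directory"
  elif [ -c "$1" ]; then echo "character device"
  elif [ -b "$1" ]; then echo "block device"
  elif [ -f "$1" ]; then echo "regular file"
  else echo "other"
  fi
}

kind_file=$(kind_of "$scratch/file1")
kind_dir=$(kind_of "$scratch")
kind_dev=$(kind_of /dev/null)
echo "$kind_file / $kind_dir / $kind_dev"

# Device I/O goes through the ordinary file interface: a plain redirect.
echo "discarded" > /dev/null

# Clean up the scratch directory.
rm -r "$scratch"
```

Running kind_of /dev/tty in an interactive terminal should also report a character device.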