HDFS Commands 2024 PDF
Document Details
Uploaded by LikableHarmony2263
Sri Lanka Institute of Information Technology (SLIIT)
Summary
This document provides instructions for setting up and using the Cloudera QuickStart VM for data processing, including practical exercises on HDFS commands. It is part of the IT3061 course, likely in a computer science program.
Full Transcript
IT3061 – Massive Data Processing and Cloud Computing
Year 3, Semester 2
Data Processing Practical 1

Instructions for using the Cloudera VM

Once the file is downloaded from the shared location, go to the download folder and unzip the files. They can then be used to set up a single-node Cloudera cluster. Shown below are the two virtual images of the Cloudera QuickStart VM.

Cloudera QuickStart VM Installation

Before setting up the Cloudera virtual machine, you need virtualization software such as VMware or Oracle VirtualBox on your system. In this case, we are using Oracle VirtualBox to set up the Cloudera QuickStart VM.

To import the Cloudera QuickStart VM into the Oracle VirtualBox Manager, click 'File' and select 'Import Appliance'. Choose the QuickStart VM image from your downloads folder, click 'Open', and then 'Next'. Review the specifications and click 'Import'. This starts importing the virtual disk image (.vmdk) file into VirtualBox. Once this is done, adjust the machine's resource allocation; here, we assign 2 CPU cores and 5 GB of RAM. Wait a while for the import to finish. Once it is complete, the Cloudera QuickStart VM appears in the left-side panel.

To gain admin console access, click on the terminal at the top of the desktop screen and type the following:

hostname                              # shows the hostname, which will be quickstart.cloudera
hdfs dfs -ls /                        # checks that you have access and your cluster is working; displays what exists on your HDFS location by default
service cloudera-scm-server status    # its output also tells you which command to run to use Cloudera Express (free)
su -                                  # log in as root (the password is cloudera)
service cloudera-scm-server status    # repeat the status check as root

Once you see that your HDFS access is working fine, close the terminal. Then click on the icon labelled 'Launch Cloudera Express'.
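The sanity checks above can be bundled into a small script. This is only a sketch: the `check` helper is hypothetical, and the hdfs and service commands only exist inside the QuickStart VM, so the script probes for each command before suggesting the real sequence.

```shell
# Sketch of a pre-flight check for the QuickStart VM terminal session.
# check() is a hypothetical helper, not part of Cloudera's tooling.
check() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "FOUND: $1"
    else
        echo "MISSING: $1 (run this inside the QuickStart VM)"
    fi
}

check hostname   # should print quickstart.cloudera inside the VM
check hdfs       # needed for: hdfs dfs -ls /
check service    # needed for: service cloudera-scm-server status

# Inside the VM you would then run, in order:
#   hostname
#   hdfs dfs -ls /
#   su -                                  (password: cloudera)
#   service cloudera-scm-server status
```

Outside the VM the script simply reports which commands are missing, which makes it safe to dry-run on any Linux machine.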
Once you click on the Express icon, a screen appears showing a command. Copy that command and run it in a separate terminal: it stops and then restarts the Cloudera services, after which you can access your admin console. Now that the deployment has been configured, the client configurations have also been deployed. Additionally, the Cloudera Management Service has been restarted, which gives access to the Cloudera QuickStart admin console via a username and password.

Open the browser and change the port number to 7180. You can log in to Cloudera Manager by providing your username and password. Since Cloudera is CPU- and memory-intensive, it can slow down if you have not assigned enough RAM to the Cloudera cluster, so it is always recommended to stop or delete the services you do not need. You can then restart the remaining services. This ensures that the cluster becomes accessible either through Hue as a web interface or through the Cloudera QuickStart terminal, where you can write your commands.

You can switch to the hdfs user, which is the HDFS superuser; it usually has no password unless you have set one. Now you can type any HDFS command in the terminal and see its output.

Please practice the following HDFS commands in Hadoop:

1. Create two directories called "data" and "data_copy" in HDFS:
   hdfs dfs -mkdir -p /data
   hdfs dfs -mkdir -p /data_copy
   (-p: creates parent directories if they do not exist)

2. Upload a file from the local file system to the /data directory in HDFS:
   hdfs dfs -put -f /home/cloudera/eclipse/about.html /data
   (-f: overwrites the destination if it already exists)

3. List the contents of the /data directory in HDFS:
   hdfs dfs -ls /data

4. Copy a file from one location in HDFS to another location within HDFS:
   hdfs dfs -cp -f /data/about.html /data_copy/about_2.html

5. Delete a file from HDFS:
   hdfs dfs -rm -f /data_copy/about_2.html
   (-f: ignores non-existent files and does not prompt for confirmation; -r: deletes directories recursively)

6. Display the content of a file stored in HDFS:
   hdfs dfs -cat /data/about.html

7. At the end, clean up HDFS:
   hdfs dfs -rm -r -f /data
   hdfs dfs -rm -r -f /data_copy
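The hdfs dfs subcommands used above deliberately mirror the familiar POSIX file utilities, so the whole exercise can be dry-run against the local file system without a cluster. The sketch below maps each step to its local analogue; the sandbox path and sample file content are made up for illustration.

```shell
# Local dry-run of the HDFS exercise; each step notes the hdfs dfs command it mirrors.
SANDBOX="$(mktemp -d)"                      # hypothetical stand-in for the HDFS root

# 1. Create two directories (mirrors: hdfs dfs -mkdir -p)
mkdir -p "$SANDBOX/data" "$SANDBOX/data_copy"

# 2. "Upload" a file (mirrors: hdfs dfs -put -f); the content is a placeholder
printf '<html>about</html>\n' > "$SANDBOX/about.html"
cp -f "$SANDBOX/about.html" "$SANDBOX/data/about.html"

# 3. List the directory contents (mirrors: hdfs dfs -ls)
ls "$SANDBOX/data"

# 4. Copy within the "cluster" (mirrors: hdfs dfs -cp -f)
cp -f "$SANDBOX/data/about.html" "$SANDBOX/data_copy/about_2.html"

# 5. Delete a file (mirrors: hdfs dfs -rm -f)
rm -f "$SANDBOX/data_copy/about_2.html"

# 6. Display a file's content (mirrors: hdfs dfs -cat)
cat "$SANDBOX/data/about.html"

# 7. Clean up (mirrors: hdfs dfs -rm -r -f)
rm -rf "$SANDBOX/data" "$SANDBOX/data_copy"
```

The main semantic differences in real HDFS are that paths are resolved by the NameNode rather than the local kernel, and -rm moves files to an HDFS trash directory by default rather than deleting them immediately.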