Data Science Methodology: Data Collection and Processing
30 Questions

Created by
@AmiablePearTree

Questions and Answers

What library is used for scraping HTML pages in Python?

  • Chrome Developer Tools
  • Mozilla HTML elements
  • PyPDF
  • Beautiful Soup (correct)

Which construct is used for handling exceptions in the provided Python code snippet?

  • `try` (correct)
  • `from urllib import urlopen`
  • `import pyPdf`
  • `from BeautifulSoup import BeautifulSoup`

What does the code snippet `pdf.getPage(0).extractText()` do?

  • Opens a PDF file using PyPDF
  • Handles exceptions in reading a PDF file
  • Gets the text of the first page of a PDF file (correct)
  • Extracts text from an HTML page

Which tool is mentioned for a Quick Tour of HTML in the provided text?

  • Chrome Developer Tools (correct)

What does the `<li>` tag represent in HTML?

  • An item in a bullet-point list (correct)
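Bullet lists in HTML are built from `<li>` items, usually wrapped in a `<ul>`. As a rough illustration, the sketch below collects the text of each `<li>` element using Python's standard-library `html.parser` rather than Beautiful Soup, so it runs without extra installs. The `ListItemCollector` class and the sample markup are invented for this example and are not taken from the lesson.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collects the text inside each <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []        # finished list-item texts
        self._in_li = False    # True while the parser is inside an <li>
        self._buffer = []      # text fragments of the current item

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_li:
            self.items.append("".join(self._buffer).strip())
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self._buffer.append(data)


# Sample markup invented for illustration.
sample_html = "<ul><li>Likes</li><li>Ratings</li><li>Reviews</li></ul>"

collector = ListItemCollector()
collector.feed(sample_html)
list_items = collector.items
print(list_items)  # ['Likes', 'Ratings', 'Reviews']
```

This sketch does not handle nested lists; a scraping library such as Beautiful Soup is usually more convenient for real pages.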

Which keyword is used to handle exceptions in Python?

  • `except` (correct)
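A minimal sketch of how `try` and `except` work together: the code that may fail goes in the `try` block, and the `except` block runs only if the named exception is raised. The function name `read_first_line` and the file name are hypothetical, chosen only for illustration.

```python
def read_first_line(path):
    """Return the first line of a text file, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.readline().rstrip("\n")
    except OSError as error:
        # FileNotFoundError and PermissionError are subclasses of OSError.
        print(f"Could not read {path}: {error}")
        return None


# The file name below is made up; it is assumed not to exist.
result = read_first_line("no_such_file_for_this_example.txt")
print(result)
```

Catching a specific exception class, rather than using a bare `except:`, keeps unrelated bugs from being silently swallowed.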

What is the primary purpose of data collection?

  • To gather and measure information on variables of interest in an established, systematic fashion (correct)

Which of the following is NOT a type of sensory-based data mentioned in the text?

  • Tactile (correct)

What is the purpose of data scraping according to the text?

  • To organize and store data in a structured format (correct)

Which of the following is considered a type of 'manifest data' according to the text?

  • Likes, Ratings, Reviews, Comments, Views, Searches (correct)

Which of the following is an example of a 'proprietary data collection' mentioned in the text?

  • Lexis-Nexis (correct)

Which of the following is an example of 'bulk downloads' mentioned in the text?

  • Wikipedia (correct)

What is data scraping?

  • A technology to extract data from websites in an automated manner (correct)

What are some software tools commonly used for data scraping?

  • Python, Ruby, Perl, Java, and web scrapers like 30 Digits and Grepsr (correct)

What should be considered when scraping data from websites?

  • Check if there is an API or downloadable data available, and respect website policies (correct)

What is OCR (Optical Character Recognition) used for?

  • Converting images of text, such as scanned paper documents, into machine-readable text (correct)

Which software is considered the best in class for open-source OCR?

  • Tesseract (correct)

What is Amazon's Mechanical Turk used for in data scraping?

  • Creating Human Intelligence Tasks (HITs) for data processing (correct)

What is the Levenshtein distance between the strings 'intention' and 'execution' if insertions and deletions each cost 1 and substitutions cost 2?

  • 8 (correct)
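The answer of 8 can be checked with a short dynamic-programming sketch. It assumes insertions and deletions cost 1 each and substitutions cost 2; the function name `min_edit_distance` and its parameter names are illustrative, not taken from the lesson.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum total cost of edits that turn `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = cost of converting source[:i] into target[:j]
    dist = [[0] * cols for _ in range(rows)]

    # Converting a prefix to the empty string needs only deletions,
    # and building a prefix from the empty string needs only insertions.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # delete
                dist[i][j - 1] + ins_cost,                       # insert
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # substitute or copy
            )
    return dist[-1][-1]


print(min_edit_distance("intention", "execution"))              # 8
print(min_edit_distance("intention", "execution", sub_cost=1))  # 5
```

With every operation costing 1, the same pair of strings has distance 5, which is why the substitution cost has to be stated alongside the answer.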

In text processing, what is the purpose of stemming and lemmatization?

  • Transforming words to their base or root form (correct)

What is an important step in preparing text data that involves removing common words like 'if, and, but, who'?

  • Removing stop words (correct)
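Stop-word removal can be sketched in a few lines of plain Python. The stop list below is a tiny, made-up set for illustration only (NLTK ships a fuller English list in `nltk.corpus.stopwords`), and the tokenizer is a deliberately simple regular expression.

```python
import re

# Tiny illustrative stop list; real applications use a much longer one.
STOP_WORDS = {"if", "and", "but", "who", "the", "a", "is"}


def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


filtered = remove_stop_words("The analyst who cleaned the data is tired, but happy.")
print(filtered)  # ['analyst', 'cleaned', 'data', 'tired', 'happy']
```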

What percentage of rare words is typically removed in text processing, depending on the application?

  • 5% (correct)

What Python library is commonly used for Natural Language Processing tasks like text processing?

  • NLTK (Natural Language Toolkit) (correct)

What is the purpose of regular expressions in text processing?

  • Searching text with complex patterns (correct)

What does the symbol `?` represent in regular expressions?

  • Matches the previous character zero or one time (correct)

What is the purpose of the `+` operator in regular expressions?

  • Matches one or more occurrences of the previous character or group (correct)
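The two quantifiers can be compared side by side with Python's built-in `re` module. The sample strings are invented for illustration.

```python
import re

# `?` makes the preceding character optional: zero or one "u".
optional_u = re.findall(r"colou?r", "color colour colouur")
print(optional_u)  # ['color', 'colour']

# `+` requires at least one occurrence of the preceding character.
one_or_more_o = re.findall(r"go+d", "gd god good goood")
print(one_or_more_o)  # ['god', 'good', 'goood']
```

Note that `colouur` is skipped because `?` allows at most one `u`, and `gd` is skipped because `+` needs at least one `o`.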

What does a false positive (Type 1 error) mean in the context of regular expressions?

  • Matching a pattern that should not have been matched (correct)
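A small, hedged illustration of a false positive: searching for the word "the" with a bare pattern also hits the letters inside "other" and "there". Adding word boundaries (`\b`) is one common way to tighten the pattern. The sample sentence is made up for this example.

```python
import re

text = "The other day there was the usual meeting."

# Naive pattern: also matches "the" inside "other" and "there".
naive = re.findall(r"the", text, flags=re.IGNORECASE)

# Word boundaries restrict matches to the standalone word.
strict = re.findall(r"\bthe\b", text, flags=re.IGNORECASE)

false_positives = len(naive) - len(strict)
print(len(naive), len(strict), false_positives)  # 4 2 2
```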

What is the minimum edit distance between two strings?

  • The minimum number of edit operations (insertion, deletion, substitution) required to convert one string to another (correct)

Which of the following is not an application of edit distance?

  • Network routing (correct)

What does the `.*` pattern match in regular expressions?

  • Any character zero or more times (correct)
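A brief sketch of `.*` in practice, using Python's `re` module. Because `*` is greedy, `.*` grabs as much text as it can; the lazy form `.*?` is shown for contrast and is an addition not covered by the quiz. The sample line is invented.

```python
import re

line = 'name="Ada" role="admin"'

# Greedy: runs from the first quote to the LAST quote on the line.
greedy = re.search(r'".*"', line).group()
print(greedy)  # "Ada" role="admin"

# Lazy: stops at the first closing quote it can.
lazy = re.search(r'".*?"', line).group()
print(lazy)  # "Ada"

# "Zero or more" means `.*` may also match nothing at all.
empty_ok = re.fullmatch(r"a.*b", "ab") is not None
print(empty_ok)  # True
```

By default `.` does not match a newline, so `.*` stays within a single line unless the `re.DOTALL` flag is set.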
