Data Science Methodology: Data Collection and Processing
30 Questions

Questions and Answers

What library is used for scraping HTML pages in Python?

  • Chrome Developer Tools
  • Mozilla HTML elements
  • PyPDF
  • Beautiful Soup (correct)

Which statement imports the module used for reading PDF files in the provided Python code snippet?

  • `try`
  • `from urllib import urlopen`
  • `import pyPdf` (correct)
  • `from BeautifulSoup import BeautifulSoup`

What does the code snippet pdf.getPage(0).extractText() do?

  • Opens a PDF file using PyPDF
  • Handles exceptions in reading a PDF file
  • Gets the text of the first page of a PDF file (correct)
  • Extracts text from an HTML page

Which tool is mentioned for a Quick Tour of HTML in the provided text?

  • Chrome Developer Tools (correct)

What does the `<li>` tag represent in HTML?

  • Bullet points list (correct)
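
Bulleted lists in HTML are written as `<li>` items, usually inside a `<ul>` element, and they are a common scraping target. The sketch below pulls the text of each item out of a small HTML string. It uses the standard-library `html.parser` module rather than Beautiful Soup so that it runs without extra packages; the class name, function name, and sample markup are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collects the text found inside each <li> element."""

    def __init__(self):
        super().__init__()
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_item = True
            self.items.append("")  # start a new, empty item

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data  # append text to the current item


def extract_list_items(html):
    """Return the stripped text of every <li> element in the HTML string."""
    parser = ListItemParser()
    parser.feed(html)
    return [item.strip() for item in parser.items]


# Hypothetical sample markup, not taken from the lesson.
sample_html = "<ul><li>Likes</li><li>Ratings</li><li>Reviews</li></ul>"
items = extract_list_items(sample_html)
print(items)
```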

Which keyword is used to handle exceptions in Python?

  • `except` (correct)
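
As a minimal sketch of how `try` and `except` work together, the function below converts a scraped string to an integer and falls back to `None` when the conversion fails. The function name and sample inputs are hypothetical.

```python
def parse_count(text):
    """Convert a scraped string to an int, or return None if it is not a number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as "n/a"
        return None


good = parse_count("42")
bad = parse_count("n/a")
print(good, bad)
```

Catching the specific `ValueError` rather than every exception keeps unrelated bugs visible.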

What is the primary purpose of data collection?

  • To gather and measure information on variables of interest in an established systematic fashion (correct)

Which of the following is NOT a type of sensory-based data mentioned in the text?

  • Tactile (correct)

What is the purpose of data scraping according to the text?

  • To organize and store data in a structured format (correct)

Which of the following is considered a type of 'manifest data' according to the text?

  • Likes, Ratings, Reviews, Comments, Views, Searches (correct)

Which of the following is an example of a 'proprietary data collection' mentioned in the text?

  • Lexis-Nexis (correct)

Which of the following is an example of 'bulk downloads' mentioned in the text?

  • Wikipedia (correct)

What is data scraping?

  • A technology to extract data from websites in an automated manner (correct)

What are some software tools commonly used for data scraping?

  • Python, Ruby, Perl, Java, and web scrapers like 30 Digits and Grepsr (correct)

What should be considered when scraping data from websites?

  • Check if there is an API or downloadable data available, and respect website policies (correct)
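
One common way a site publishes its crawling policy is a `robots.txt` file. The sketch below checks two URLs against an inline policy using the standard-library `urllib.robotparser` module, so no network access is needed. The policy text, the bot name `example-bot`, and the `example.com` URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would fetch the site's own file.
robots_txt = """User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# can_fetch(useragent, url) reports whether the policy permits the request.
allowed_public = parser.can_fetch("example-bot", "https://example.com/articles/1")
allowed_private = parser.can_fetch("example-bot", "https://example.com/private/data")
print(allowed_public, allowed_private)
```

A `robots.txt` check covers only part of a site's policy; terms of service may add further restrictions.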

What is OCR (Optical Character Recognition) used for?

  • Converting images of text, such as scanned paper documents, into machine-readable text (correct)

Which software is considered the best in class for open-source OCR?

  • Tesseract (correct)

What is Amazon's Mechanical Turk used for in data scraping?

  • Creating Human Intelligence Tasks (HITs) for data processing (correct)

What is the Levenshtein distance between the strings 'intention' and 'execution' if insertions and deletions each cost 1 and substitutions cost 2?

  • 8 (correct)
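
The answer can be checked with the standard dynamic-programming table for minimum edit distance. This is a sketch that assumes insertions and deletions cost 1 each, the usual convention when substitution is priced at 2; the function and parameter names are illustrative.

```python
def min_edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=2):
    """Minimum total cost of edits that turn source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]

    # First column: delete every character of the source prefix.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[-1][-1]


weighted = min_edit_distance("intention", "execution")
unit_cost = min_edit_distance("intention", "execution", sub_cost=1)
print(weighted, unit_cost)  # → 8 5
```

With all three operations priced at 1, the same pair of strings has a distance of 5, so the quoted 8 depends on the substitution cost of 2.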

In text processing, what is the purpose of stemming and lemmatization?

  • Transforming words to their base or root form (correct)

What is an important step in preparing text data that involves removing common words like 'if, and, but, who'?

  • Removing stop words (correct)
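
A minimal sketch of stop-word removal, using only the four words named in the question as the stop list. Real stop lists, such as the ones shipped with NLP libraries, are much longer; the function name and sample sentence are hypothetical.

```python
# Tiny illustrative stop list taken from the question; not a complete list.
STOP_WORDS = {"if", "and", "but", "who"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


kept = remove_stop_words("Who scrapes data and who cleans it")
print(kept)
```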

Which percentage of rare words is typically removed in text processing depending on the application?

  • 5% (correct)

What Python library is commonly used for Natural Language Processing tasks like text processing?

  • NLTK (Natural Language Toolkit) (correct)

What is the purpose of regular expressions in text processing?

  • Performing complex search patterns (correct)

What does the symbol ? represent in regular expressions?

  • Matches the previous character zero or one time (correct)

What is the purpose of the + operator in regular expressions?

  • Matches one or more occurrences of the previous character or group (correct)
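
The `?` and `+` quantifiers can be compared side by side with Python's `re` module. The patterns and sample strings below are made up for illustration.

```python
import re

# "u?" makes the letter u optional: zero or one occurrence.
optional_u = re.compile(r"colou?r")
# "\d+" requires at least one digit and takes as many as follow.
one_or_more_digits = re.compile(r"\d+")

spellings = optional_u.findall("color colour colouur")
numbers = one_or_more_digits.findall("30 questions, 0 views, 2024")
print(spellings)
print(numbers)
```

The misspelling `colouur` is not matched because `?` allows at most one `u`.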

What does a false positive (Type 1 error) mean in the context of regular expressions?

  • Matching a pattern that should not have been matched (correct)
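
A small illustration of false positives, assuming the goal is to find the word "the" in a sentence. A loose pattern also matches the letters inside longer words, while adding word boundaries (`\b`) removes those unwanted matches. The sample sentence is hypothetical.

```python
import re

text = "The other day they gathered the data there."

# Loose pattern: also fires inside "other", "they", "gathered", and "there".
loose = re.findall(r"[Tt]he", text)
# Word boundaries restrict matches to the standalone word.
strict = re.findall(r"\b[Tt]he\b", text)

print(len(loose), strict)
```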

What is the minimum edit distance between two strings?

  • The minimum number of edit operations (insertion, deletion, substitution) required to convert one string to another (correct)

Which of the following is not an application of edit distance?

  • Network routing (correct)

What does the .* pattern match in regular expressions?

  • Any character zero or more times (correct)
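
One practical consequence of `.*` is that it is greedy: it consumes as much text as it can while still letting the rest of the pattern match. The sketch below contrasts it with the lazy form `.*?`, which is not mentioned in the quiz and is shown only for comparison; the sample line is made up.

```python
import re

line = "<p>Tesseract</p><p>NLTK</p>"

# Greedy: ".*" runs to the last "</p>" on the line.
greedy = re.search(r"<p>.*</p>", line).group(0)
# Lazy: ".*?" stops at the first "</p>" it can reach.
lazy = re.search(r"<p>.*?</p>", line).group(0)

print(greedy)
print(lazy)
```

By default `.` does not match a newline, so `.*` stays within a single line unless the `re.DOTALL` flag is set.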
