Data Visualization and Analysis using Python
40 Questions

Created by
@DetachableSense3100

Questions and Answers

What is the purpose of np.where in the context of the provided program?

  • To combine multiple arrays into one
  • To find the indices of specific elements (correct)
  • To change the shape of the array
  • To repeat values in the array
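As a minimal sketch of how np.where can locate zero, non-zero, and NaN elements in a 1D array (the array values are made up for illustration): with a single condition argument, np.where returns a tuple of index arrays, one per dimension. Whether NaN should count as "non-zero" is a design choice. This sketch excludes it explicitly, because NaN != 0 evaluates to True.

```python
import numpy as np

# Illustrative 1D array containing zeros and NaNs
arr = np.array([0.0, 3.5, np.nan, 0.0, -2.0, np.nan, 7.0])

# np.where(condition) returns a tuple of index arrays; [0] picks the only axis
zero_idx = np.where(arr == 0)[0]
nan_idx = np.where(np.isnan(arr))[0]
# NaN != 0 is True, so NaNs are excluded explicitly from the non-zero set
nonzero_idx = np.where((arr != 0) & ~np.isnan(arr))[0]

print("Zero indices:", zero_idx)         # [0 3]
print("Non-zero indices:", nonzero_idx)  # [1 4 6]
print("NaN indices:", nan_idx)           # [2 5]
```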

What will be the output of np.prod(array1_e[5:]) * np.prod(array2_e[5:])?

  • The minimum value in both arrays
  • The sum of all elements in the arrays
  • The product of the second half of both arrays (correct)
  • The product of the first half of the arrays
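The expression in this question can be checked with a short sketch. It assumes two size-10 random integer arrays named array1_e and array2_e (the names used in the question); the seed and the 1 to 9 value range are illustrative choices that keep the product non-zero.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible
array1_e = rng.integers(1, 10, size=10)
array2_e = rng.integers(1, 10, size=10)

# First half: elements 0-4; second half: elements 5-9
first_half_sum = np.sum(array1_e[:5]) + np.sum(array2_e[:5])
second_half_product = np.prod(array1_e[5:]) * np.prod(array2_e[5:])

print(array1_e, array2_e)
print("Sum of first halves:", first_half_sum)
print("Product of second halves:", second_half_product)
```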

Which function is used to compute the covariance in the provided example?

  • np.mean
  • np.corrcoef
  • np.cov (correct)
  • np.sum
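A minimal sketch of the three-array exercise, assuming three random arrays of length 5. Which pairs the original program compares is not stated here, so the pairs below are illustrative. np.cov and np.corrcoef each return a 2x2 matrix whose off-diagonal entry is the value for the pair.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
array1 = rng.random(5)
array2 = rng.random(5)
array3 = rng.random(5)

difference = array3 - array2  # element-wise: third minus second, same shape
doubled = array1 * 2          # every element of the first array doubled

# 2x2 sample covariance matrix (ddof=1); [0, 1] is cov(array1, array3)
cov_matrix = np.cov(array1, array3)
# 2x2 Pearson correlation matrix; [0, 1] is corr(array2, array3)
corr_matrix = np.corrcoef(array2, array3)

print("Covariance:", cov_matrix[0, 1])
print("Correlation:", corr_matrix[0, 1])
```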

What will the command series_a.sort_values() return?

Answer: A new Series sorted by its values.

What type of arrays are created in the initial part of the program?

Answer: Random arrays.

What does the command np.isnan(array_c) check for in the array?

Answer: The presence of NaN values (it returns a Boolean array of the same shape).

In comparing Array1 and Array4, which statistical measure is calculated?

Answer: Covariance.

What does the term 'index' refer to in the context of a Pandas Series?

Answer: The labels used to access individual elements.

What method is used to sort the DataFrame based on the first column?

Answer: df.sort_values(by='A')

What function is used to find the correlation between the first and second columns?

Answer: df['A'].corr(df['B'])

Which call removes duplicate values from column 'A'?

Answer: df.drop_duplicates(subset='A')

What function is used to compute the mean of a two-dimensional array along the second axis?

Answer: np.mean, with axis=1.
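For the mean, standard deviation, and variance along the second axis, the axis argument is what matters. A minimal sketch, assuming a random 4 x 6 integer array, showing that axis=1 produces one statistic per row:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
data = rng.integers(0, 100, size=(4, 6))  # 4 rows x 6 columns

# axis=1 reduces across the columns, giving one value per row
row_mean = np.mean(data, axis=1)
row_std = np.std(data, axis=1)  # population std (ddof=0) by default
row_var = np.var(data, axis=1)

print(data)
print("Mean:", row_mean)
print("Std: ", row_std)
print("Var: ", row_var)
```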

How many bins are created when discretizing the second column?

Answer: 5

Which method is appropriate for reshaping a NumPy array?

Answer: np.reshape (or the equivalent array method, arr.reshape)

What will happen if you try to reshape an array to a new shape that has a different total number of elements?

Answer: An error occurs: NumPy raises a ValueError.
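A sketch of the reshape exercise, with m and n fixed instead of read via input() so it runs non-interactively. It also triggers the reshape error by requesting a shape with a different element count.

```python
import numpy as np

# The assignment reads m and n from the user, e.g. m = int(input("m: "));
# fixed values are used here so the sketch runs without interaction.
m, n = 3, 4

arr = np.random.randint(0, 10, size=(m, n))
print("Shape:", arr.shape)
print("Type:", type(arr))
print("Data type:", arr.dtype)

reshaped = arr.reshape(n, m)  # same 12 elements, now n x m
print("Reshaped shape:", reshaped.shape)

# A target shape with a different element count (15 instead of 12) fails
error_message = None
try:
    arr.reshape(5, 3)
except ValueError as err:
    error_message = str(err)
print("Error:", error_message)
```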

When merging two DataFrames, which method is used to find the names of students who attended both workshops?

Answer: pd.merge() (an inner join, which is the default)

After combining two DataFrames row-wise with pd.concat(), how is the total number of records found?

Answer: From the shape of the combined DataFrame, merged_df.shape

In the context of the provided Python programs, which types of elements does the function np.isnan check for?

Answer: NaN values.

How can you create a random integer array of size m x n in NumPy?

Answer: np.random.randint(0, 10, size=(m, n))

What does pd.concat([df1, df2]).drop_duplicates(keep=False) accomplish?

Answer: It finds students who attended only one workshop.

Which columns are used as the hierarchical (multi-level) row index after combining the two DataFrames row-wise?

Answer: Name and Date

When subtracting two arrays of the same shape, what is the shape of the resulting array?

Answer: The same shape as the input arrays.

What does the np.cov function compute?

Answer: The covariance between two arrays (returned as a covariance matrix).

What does the dtype attribute of a NumPy array return?

Answer: The data type of the array elements.

Which ranking method does the program use to obtain the minimum rank of a Pandas Series?

Answer: 'first' (ties are ranked in order of appearance; method='min' is the option that gives every tied value the lowest rank of its group)

What does the 'max' method in ranking return for a Pandas Series?

Answer: The highest rank within each group of tied values.

How can you identify the index of the minimum element in a Pandas Series?

Answer: series_b.idxmin()
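The Series questions can be tried on a small hypothetical Series with one tied pair of values. The comments spell out how 'first', 'min', and 'max' treat the tie (rank returns floats). idxmin and idxmax return index labels, the first occurrence in case of ties, not integer positions.

```python
import pandas as pd

# Hypothetical Series with a tie: labels b and d both hold 10
series_b = pd.Series([40, 10, 30, 10, 50], index=["a", "b", "c", "d", "e"])

print(series_b.sort_index())   # ordered by labels
print(series_b.sort_values())  # ordered by values

# The two 10s occupy rank positions 1 and 2
print(series_b.rank(method="first"))  # order of appearance: b=1, d=2
print(series_b.rank(method="min"))    # lowest rank of the group: b=1, d=1
print(series_b.rank(method="max"))    # highest rank of the group: b=2, d=2

print("Index of min:", series_b.idxmin())  # b (first occurrence)
print("Index of max:", series_b.idxmax())  # e
```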

In the DataFrame creation example, how many rows are generated?

Answer: 50

What percentage of values in the DataFrame is replaced by null values?

Answer: 10%

Which function is used to count the number of missing values in the DataFrame?

Answer: df.isnull().sum(), which counts nulls per column; df.isnull().sum().sum() gives the overall total.

What is the criterion for dropping a column based on its number of null values?

Answer: More than 5 null values

What happens to the row with the maximum sum of all values in the DataFrame?

Answer: It is dropped.
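A minimal sketch of the missing-value steps, assuming column names A, B, C and a seeded generator. Exactly 10% of the 150 cells are set to NaN by sampling flat positions without replacement; the column filter keeps columns with at most 5 nulls.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
data = rng.random((50, 3))  # 50 rows x 3 columns

# Set 10% of all cells (15 of 150) to NaN at distinct random positions
positions = rng.choice(data.size, size=data.size // 10, replace=False)
data.flat[positions] = np.nan
df = pd.DataFrame(data, columns=["A", "B", "C"])

print(df.isnull().sum())  # missing values per column
total_missing = df.isnull().sum().sum()
print("Total missing:", total_missing)

# Keep only the columns with at most 5 nulls
df = df.loc[:, df.isnull().sum() <= 5]

# Drop the row whose values have the largest sum (sum skips NaNs)
max_row = df.sum(axis=1).idxmax()
df = df.drop(index=max_row)
print(df.shape)
```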

What method is used to calculate the average monthly income of female members in the DataFrame?

Answer: mean(), applied to the incomes of the rows where the gender is female

Which DataFrame method is used to group data by a specific attribute?

Answer: groupby()

What is the purpose of the idxmax() function in the context provided?

Answer: To determine the index of the maximum value

In the Titanic dataset, how is the total number of passengers under 30 determined?

Answer: By using shape on a filtered DataFrame

How do you calculate the family-wise gross monthly income in the example provided?

Answer: By summing monthly incomes for each family name

What will the 'high_income_members' DataFrame contain?

Answer: Members with an income greater than Rs. 60,000.00

What data type is the 'MonthlyIncome (Rs.)' column expected to be?

Answer: Float

Which function would you use to load a CSV file in Pandas?

Answer: pd.read_csv()

Study Notes

NumPy Programs for Data Analysis

• Generate a 2D random integer array and calculate the mean, standard deviation, and variance along the second axis.
• Create an m x n array of random integers (m and n supplied by the user), display its shape, type, and data type, then reshape it to n x m.
• For a 1D array, identify the indices of elements that are zero, non-zero, and NaN, storing these indices in separate arrays.
• Create three random arrays, subtract the second array from the third, double the values of the first array, and calculate the covariance and correlation between specified pairs.
• Generate two random arrays of size 10, and compute the sum of the first half and the product of the second half of both arrays.

Pandas Series Tasks

• Create a Pandas Series with five elements and display it sorted by index and by values separately.
• Generate a Series with duplicate values and calculate the minimum and maximum ranks using the 'first' and 'max' methods.
• Determine the index positions of the minimum and maximum elements of the Series.

DataFrame Manipulations

• Create a DataFrame with 3 columns and 50 rows of random numerical data, replacing 10% of the values with NaN.
• Identify the total number of missing values in the DataFrame and drop any columns with more than 5 nulls.
• Identify the row whose values have the largest sum, drop that row, sort the DataFrame by the first column, and remove duplicates from the first column.
• Calculate the correlation between the first and second columns, and the covariance between the second and third columns.
• Discretize the second column into 5 bins.
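The remaining DataFrame steps can be sketched on a fresh random DataFrame. Column A is rounded to one decimal so that duplicates are guaranteed; that rounding is an assumption made for the demo, not part of the assignment. pd.cut with bins=5 creates five equal-width intervals; pd.qcut would be the choice for equal-count bins instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=4)
# Column A is rounded so that some duplicate values occur
df = pd.DataFrame({
    "A": rng.random(50).round(1),
    "B": rng.random(50),
    "C": rng.random(50),
})

df = df.sort_values(by="A")          # sort by the first column
df = df.drop_duplicates(subset="A")  # keep the first row for each value of A

corr_ab = df["A"].corr(df["B"])  # Pearson correlation by default
cov_bc = df["B"].cov(df["C"])    # sample covariance (ddof=1)

# Split the value range of B into 5 equal-width intervals
df["B_bin"] = pd.cut(df["B"], bins=5)
print(corr_ab, cov_bc)
print(df["B_bin"].value_counts().sort_index())
```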

Excel Data Handling

• Import workshop attendance data from two Excel files into separate DataFrames.
• Merge the DataFrames to find students who attended both workshops.
• Identify students who attended only one workshop by concatenating both DataFrames and dropping duplicates.
• Combine the DataFrames row-wise to count the total records, and create a hierarchical index using names and dates.
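A sketch of the workshop steps using small in-memory tables in place of the Excel files (student names, dates, and file names are hypothetical). drop_duplicates(keep=False) only removes a student's rows if they are identical in both files, so the sketch applies it to the Name column alone.

```python
import pandas as pd

# Stand-ins for pd.read_excel("workshop1.xlsx") / ("workshop2.xlsx");
# file names and data are hypothetical.
ws1 = pd.DataFrame({"Name": ["Asha", "Ravi", "Meena"],
                    "Date": ["2024-01-10"] * 3})
ws2 = pd.DataFrame({"Name": ["Ravi", "Meena", "John"],
                    "Date": ["2024-02-14"] * 3})

# Inner merge on Name keeps students present in both workshops
both = pd.merge(ws1, ws2, on="Name", how="inner")
print(both["Name"].tolist())  # ['Ravi', 'Meena']

# Names appearing exactly once across both lists attended a single workshop
names = pd.concat([ws1["Name"], ws2["Name"]])
only_one = names.drop_duplicates(keep=False)
print(only_one.tolist())  # ['Asha', 'John']

# Row-wise combination, total record count, then a Name/Date hierarchical index
merged_df = pd.concat([ws1, ws2], ignore_index=True)
print("Total records:", merged_df.shape[0])  # 6
indexed = merged_df.set_index(["Name", "Date"])
print(indexed)
```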

Income Data Analysis

• Create a DataFrame with members' names, genders, and monthly incomes.
• Calculate the family-wise gross monthly income by summing incomes grouped by family name.
• Identify the member with the highest income and display the monthly incomes of members earning more than Rs. 60,000.
• Calculate the average monthly income of female members.
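A minimal sketch of the income analysis with hypothetical members. It assumes a separate Family column holding each member's family name; the original program may derive family names differently.

```python
import pandas as pd

# Hypothetical member records; "Family" is an assumed column for grouping
members = pd.DataFrame({
    "Name": ["Ria Shah", "Amit Shah", "Neha Rao", "Kiran Rao", "Sara Iyer"],
    "Family": ["Shah", "Shah", "Rao", "Rao", "Iyer"],
    "Gender": ["F", "M", "F", "M", "F"],
    "MonthlyIncome (Rs.)": [65000.00, 42000.00, 58000.00, 71000.00, 39000.00],
})

# Family-wise gross monthly income
family_income = members.groupby("Family")["MonthlyIncome (Rs.)"].sum()
print(family_income)

# Member with the highest income: idxmax gives the row label of the maximum
top = members.loc[members["MonthlyIncome (Rs.)"].idxmax(), "Name"]
print("Highest income:", top)  # Kiran Rao

# Members earning more than Rs. 60,000
high_income_members = members[members["MonthlyIncome (Rs.)"] > 60000]
print(high_income_members[["Name", "MonthlyIncome (Rs.)"]])

# Average monthly income of female members
female_avg = members.loc[members["Gender"] == "F", "MonthlyIncome (Rs.)"].mean()
print("Female average:", female_avg)  # 54000.0
```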

Titanic Dataset Analysis

• Load the Titanic dataset and count the total number of passengers aged under 30.
• Calculate the total fare paid by first-class passengers.
• Compare the number of survivors across different passenger classes.
• Compute descriptive statistics for any numeric attribute, grouped by gender.
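A sketch of the Titanic tasks on a tiny made-up table that stands in for the real dataset (the values are not real passenger records). In practice the data would be loaded with pd.read_csv; the file name and the Kaggle-style column names Age, Fare, Pclass, Survived, and Sex are assumptions, since other copies of the dataset use lowercase names.

```python
import pandas as pd

# Made-up stand-in for pd.read_csv("titanic.csv"); file and columns assumed
titanic = pd.DataFrame({
    "Age":      [22, 38, 26, 35, 54, 4, 27, 14],
    "Fare":     [7.25, 71.30, 7.90, 53.10, 51.90, 16.70, 11.10, 30.10],
    "Pclass":   [3, 1, 3, 1, 1, 3, 3, 2],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 1],
    "Sex":      ["male", "female", "female", "female",
                 "male", "male", "female", "female"],
})

# Row count of the filtered DataFrame, via shape
under_30 = titanic[titanic["Age"] < 30].shape[0]
first_class_fare = titanic.loc[titanic["Pclass"] == 1, "Fare"].sum()
survivors_by_class = titanic.groupby("Pclass")["Survived"].sum()
age_stats_by_sex = titanic.groupby("Sex")["Age"].describe()

print("Passengers under 30:", under_30)
print("First-class fare total:", first_class_fare)
print(survivors_by_class)
print(age_stats_by_sex)
```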

Description

This assignment focuses on data visualization and statistical analysis using Python's NumPy and Pandas libraries. Students write programs to compute essential statistical measures such as mean, standard deviation, and variance, and to create two-dimensional arrays. It is particularly relevant for those studying biomedical science who are interested in data analytics.
