Mitmw Reviewer Aly PDF
Uploaded by AwestruckSugilite3877
Summary
This document discusses measures of central tendency, including mean, median, and mode. It also explains linear regression and offers a formula to calculate the correlation coefficient and the line of best fit. It is mainly about statistics.
Full Transcript
MEASURE OF CENTRAL TENDENCY

Measure of central tendency
- one of the basic statistical concepts that is used to find a single value representing the center of a set of data
- involves the method of finding out the central value of a statistical series or set of quantitative data
- any single value that is used to identify the center of the data or typical value

Three types of central tendency
Mean: sum of all observed values divided by the number of observations
Median: positional middle value when observations are ordered from smallest to largest or vice versa
Mode: observed value that occurs most frequently
Unimodal: one mode
Bimodal: two modes
Trimodal: three modes

Mean (Quantitative Data)
- most popular measure of central location
- affected by extreme values
- unique, one answer
- useful when comparing sets of data
- Formula: ∑x/n

Median (Quantitative Data)
- extreme values do not affect the median as strongly as they do the mean
- useful when comparing sets of data
- unique, one answer
- Formula: (n+1)/2 position

Mode (Quantitative and Qualitative Data)
- may not exist
- may not be unique
- extreme values do not affect the mode
- no values repeat: mode is every value and useless
- more than 1 mode: difficult to interpret and compare
- Formula: none

LINEAR REGRESSION AND CORRELATION

Data analysis: provides insight that improves decisions
Linear regression model: has an important role in many analyses and predictions
Correlation or simple linear regression analysis: determines if two numeric variables are significantly linearly related
Correlation analysis: provides information on the strength and direction of the linear relationship between two variables
Simple linear regression analysis: estimates parameters in a linear equation that can be used to predict values of one variable based on the other
Linear regression: statistical model that attempts to show the relationship between two variables with a linear equation
Regression analysis: graphing a line over a set of data points that most closely fits the overall shape of the data
Regression: shows the extent to which changes in a dependent variable, which is put on the y-axis, can be attributed to changes in an explanatory variable, which is placed on the x-axis

Line of best fit or least-squares regression line
- usually of most interest
- line that fits the data better than any other line that might be drawn
- for a set of bivariate data, the line that minimizes the sum of the squares of the vertical deviations from each data point to the line

Three main purposes of regression analysis
- To describe or model a set of data with one dependent variable and one or more independent variables
- To predict or estimate the values of the dependent variable based on given value(s) of the independent variable(s)
- To control or administer standards from a useable statistical relationship

Linear correlation coefficient
- determines the strength of a linear relationship between two variables
- statistic used by statisticians
- denoted by the variable r

If r is positive, the relationship between the variables has a positive correlation. In this case, if one variable increases, the other variable also tends to increase.
If r is negative, the linear relationship between the variables has a negative correlation. In this case, if one variable increases, the other variable tends to decrease.
The closer |r| is to 1, the stronger the linear relationship between the variables.
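The three measures of central tendency defined earlier in these notes can be sketched in Python. This is only an illustration: the function name `central_tendency` and the sample scores are made up for the example, and treating "no values repeat" as "no mode" is a choice made here, not a rule from the notes.

```python
from collections import Counter
from statistics import median


def central_tendency(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    ordered = sorted(values)          # the median needs ordered observations
    n = len(ordered)

    # Mean: sum of all observed values divided by the number of observations.
    mean_value = sum(ordered) / n

    # Mode: value(s) that occur most frequently.
    counts = Counter(ordered)
    top = max(counts.values())
    # Assumption for this sketch: if no value repeats, report no mode at all.
    modes = sorted(v for v, c in counts.items() if c == top) if top > 1 else []

    labels = {0: "no mode", 1: "unimodal", 2: "bimodal", 3: "trimodal"}
    return {
        "mean": mean_value,
        "median_position": (n + 1) / 2,   # position of the middle value
        "median": median(ordered),
        "modes": modes,
        "mode_type": labels.get(len(modes), "multimodal"),
    }


# Hypothetical scores with one extreme value (300).
scores = [70, 75, 75, 80, 85, 85, 300]
summary = central_tendency(scores)
print(summary)
```

With these made-up scores the extreme value pulls the mean up to 110 while the median stays at 80, which matches the note that extreme values affect the mean more strongly than the median. The two modes (75 and 85) make the data bimodal.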
Correlation: measure of association between two numerical variables
Pearson's sample correlation coefficient, r: measures the direction and the strength of the linear association between two numerical paired variables

Strength of linear association
r value    Interpretation
1          perfect positive linear relationship
0          no linear relationship
-1         perfect negative linear relationship

Other strengths of association
r value    Interpretation
0.9        strong association
0.5        moderate association
0.25       weak association

Regression
- specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables
- statistical methods to assess the goodness of fit of the model
- Correlation Coefficient

Simple linear regression
- statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one explanatory (independent) variable

Least squares regression
- minimizes the sum of the squares of the errors of the data points
- minimizes the Mean Square Error

Steps to reaching a solution
- Draw a scatterplot of the data.
- Visually, consider the strength of the linear relationship.
- If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
- If the correlation is still relatively strong, then find the simple linear regression line.

y = a + bx
- value of b: slope
- value of a: y-intercept
- r: correlation coefficient
- r^2: coefficient of determination

Strength of the association: r^2

Coefficient of determination
- r^2
- percent of the variation in the response variable that is explained or determined by the model and the explanatory variable

Real life application
- Multiple regression: cost estimating for future space flight vehicles
- Nonlinear application: predicting when solar maximum will occur
- Periodic: estimating seasonal sales for department stores
- predicting student grades based on time spent studying

ETHICAL ISSUES IN MANAGEMENT OF DATA

Data
- simply a piece of information
- facts or statistics

Data ethics
- National Center for Biotechnology Information: new branch of ethics that studies and evaluates moral problems related to data, algorithms, and corresponding practices in order to formulate and support morally good solutions
- branch of ethics that studies and evaluates moral concerns related to data
- includes but is not limited to any kind of information created
- includes algorithms, scripts, and research processes (references, results, samples, and raw data)

Sensitive data: personally identifiable information

Using data ethically
- What is data?
- How is data used?
- What is data misconduct?

How we work with data?
- Generating, Interpreting and Visualizing
- Curating
- Recording
- Processing
- Disseminating
- Sharing
- Using

Plagiarism Misconduct
- more than just plagiarism
- fabrication and falsification
- Department of Health and Human Services: fabrication, falsification, or plagiarism in proposing, performing, or reviewing research results

Fabrication: making up results and recording or reporting them

Falsification
- manipulation of research materials, equipment, or process
- changing or omitting results so research is not accurately represented

Plagiarism
- appropriation of another's ideas, processes, results, or words without giving proper credit
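Going back to the regression part of these notes, the line y = a + bx, the correlation coefficient r, and r^2 can be computed with a short Python sketch. The notes do not give the computing formulas, so the standard textbook least-squares formulas are assumed here (slope b = Sxy/Sxx, intercept a = mean of y minus b times mean of x, r = Sxy/sqrt(Sxx·Syy)). The function name `simple_linear_regression` and the study-hours data are hypothetical.

```python
from math import sqrt


def simple_linear_regression(x, y):
    """Fit y = a + b*x by least squares and return a, b, r, and r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squared deviations and cross-deviations from the means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = s_xy / s_xx                 # slope
    a = y_bar - b * x_bar           # y-intercept
    r = s_xy / sqrt(s_xx * s_yy)    # Pearson's sample correlation coefficient
    return {"a": a, "b": b, "r": r, "r_squared": r ** 2}


# Hypothetical data: hours spent studying (x) and grade (y).
hours = [1, 2, 3, 4, 5]
grades = [52, 58, 61, 67, 72]

fit = simple_linear_regression(hours, grades)
print(fit)

# Predict a grade for 6 hours of study using y = a + bx.
predicted = fit["a"] + fit["b"] * 6
print(predicted)
```

For this made-up data the slope is about 4.9 and the intercept about 47.3, with r^2 about 0.99, so nearly all of the variation in the grades is explained by the line. In practice a scatterplot would be drawn first, as the steps in the notes suggest.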