Summary

This document covers measures of central tendency (mean, median, and mode), linear regression and correlation (the correlation coefficient, the line of best fit, and the coefficient of determination), and ethical issues in the management of data.

Full Transcript

MEASURE OF CENTRAL TENDENCY

Measure of central tendency
- one of the basic statistical concepts, used to find a single value representing the center of a set of data
- involves the method of finding the central value of a statistical series or set of quantitative data
- any single value that is used to identify the center of the data or a typical value

Three types of central tendency
• Mean: sum of all observed values divided by the number of observations
• Median: positional middle value when observations are ordered from smallest to largest or vice versa
• Mode: observed value that occurs most frequently
  Unimodal: one mode
  Bimodal: two modes
  Trimodal: three modes

Mean
Data type: quantitative data
- most popular measure of central location
- affected by extreme values
- unique, one answer
- useful when comparing sets of data
Formula: ∑x / n

Median
Data type: quantitative data
- extreme values do not affect the median as strongly as they do the mean
- unique, one answer
- useful when comparing sets of data
Formula: (n + 1)/2 position

Mode
Data type: quantitative and qualitative data
- may not exist
- may not be unique
- extreme values do not affect the mode
- no values repeat: mode is every value and useless
- more than 1 mode: difficult to interpret and compare
Formula: none

LINEAR REGRESSION AND CORRELATION

Data analysis: provides insights that improve decisions
Linear regression model: has an important role in many analyses and predictions
Correlation or simple linear regression analysis: determines if two numeric variables are significantly linearly related
Correlation analysis: provides information on the strength and direction of the linear relationship between two variables
Simple linear regression analysis: estimates parameters in a linear equation that can be used to predict values of one variable based on the other
Linear regression: statistical model that attempts to show the relationship between two variables with a linear equation
Regression analysis: graphing a line over a set of data points that most closely fits the overall shape of the data
Regression: shows the extent to which changes in a dependent variable, which is put on the y-axis, can be attributed to changes in an explanatory variable, which is placed on the x-axis

Line of best fit or least-squares regression line
- usually of most interest
- line that fits the data better than any other line that might be drawn
- for a set of bivariate data, the line that minimizes the sum of the squares of the vertical deviations from each data point to the line

Three main purposes of regression analysis
- To describe or model a set of data with one dependent variable and one or more independent variables
- To predict or estimate the values of the dependent variable based on given value(s) of the independent variable(s)
- To control or administer standards from a useable statistical relationship

Linear correlation coefficient
- determines the strength of a linear relationship between two variables
- statistic used by statisticians
- denoted by the variable r

If r is positive, the relationship between the variables has a positive correlation. In this case, if one variable increases, the other variable also tends to increase.
If r is negative, the linear relationship between the variables has a negative correlation. In this case, if one variable increases, the other variable tends to decrease.
The closer |r| is to 1, the stronger the linear relationship between the variables.

Strength of linear association
r value    Interpretation
1          perfect positive linear relationship
0          no linear relationship
-1         perfect negative linear relationship

Other strengths of association
r value    Interpretation
0.9        strong association
0.5        moderate association
0.25       weak association

Correlation: measure of association between two numerical variables
Pearson's sample correlation coefficient, r: measures the direction and the strength of the linear association between two numerical paired variables

Regression
- specific statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one or more explanatory (independent) variables
- statistical methods to assess the goodness of fit of the model
- Correlation Coefficient

Simple linear regression
- statistical methods for finding the line of best fit for one response (dependent) numerical variable based on one explanatory (independent) variable

Least squares regression
- minimizes the sum of the squares of the errors of the data points
- minimizes the Mean Square Error

Steps to reaching a solution
• Draw a scatterplot of the data.
• Visually, consider the strength of the linear relationship.
• If the relationship appears relatively strong, find the correlation coefficient as a numerical verification.
• If the correlation is still relatively strong, then find the simple linear regression line.
• Interpreting and Visualizing

y = a + bx
- value of b: slope
- value of a: y-intercept
- r: correlation coefficient
- r^2: coefficient of determination

Strength of the association: r^2

Coefficient of determination
- r^2
- percent of the variation in the response variable that is explained or determined by the model and the explanatory variable

Real life application
- Multiple regression: cost estimating for future space flight vehicles
- Nonlinear application: predicting when solar maximum will occur
- Periodic: estimating seasonal sales for department stores
- predicting student grades based on time spent studying

ETHICAL ISSUES IN MANAGEMENT OF DATA

Data
- simply a piece of information
- facts or statistics

Data ethics
- National Center for Biotechnology Information: new branch of ethics that studies and evaluates moral problems related to data, algorithms, and corresponding practices in order to formulate and support morally good solutions
- branch of ethics that studies and evaluates moral concerns related to data
- includes but is not limited to any kind of information created
- includes algorithms, scripts, and research processes (references, results, samples, and raw data)

Sensitive data: personally identifiable information

Using data ethically
- What is data?
- How is data used?
- What is data misconduct?

How we work with data?
- Generating
- Curating
- Recording
- Processing
- Disseminating
- Sharing
- Using

Misconduct
- more than just plagiarism
- fabrication and falsification
- Department of Health and Human Services: fabrication, falsification, or plagiarism in proposing, performing, or reviewing research results

Fabrication: making up results and recording or reporting them

Falsification
- manipulation of research materials, equipment, or process
- changing or omitting results so research is not accurately represented

Plagiarism
- appropriation of another's ideas, processes, results, or words without giving proper credit
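
Worked sketch: central tendency and the line of best fit

The definitions of mean, median, and mode and the least-squares line y = a + bx from the notes can be tried out on a small data set. The following is a minimal sketch, not the notes' own method: the formula for r did not survive in the transcript, so the standard textbook forms are assumed here (b = Sxy / Sxx, a = mean(y) - b * mean(x), r = Sxy / sqrt(Sxx * Syy)). The function names and the sample data (hours studied versus grades) are hypothetical and chosen only for illustration.

```python
import math
import statistics

# Hypothetical sample data (not from the notes): hours studied and exam grades.
hours = [1, 2, 3, 4, 5]
grades = [52, 60, 61, 70, 77]


def central_tendency(values):
    """Return (mean, median, modes) for a list of numbers."""
    mean = sum(values) / len(values)       # sum of x divided by n
    median = statistics.median(values)     # middle value of the ordered data
    modes = statistics.multimode(values)   # all most-frequent values, as a list
    return mean, median, modes


def least_squares(x, y):
    """Return (a, b, r) for the line y = a + b*x and Pearson's r.

    Assumes the standard textbook formulas; x must not be constant.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                    # slope
    a = y_bar - b * x_bar            # y-intercept
    r = sxy / math.sqrt(sxx * syy)   # correlation coefficient
    return a, b, r


mean, median, modes = central_tendency(grades)
a, b, r = least_squares(hours, grades)

print("mean:", mean, "median:", median, "modes:", modes)
print("line of best fit: y =", a, "+", b, "* x")
print("r:", round(r, 4), "r^2:", round(r * r, 4))
```

Because no grade repeats in this sample, `statistics.multimode` returns every value, which matches the note that the mode is then "every value and useless". The value of r^2 printed at the end is the coefficient of determination, the share of variation in grades explained by hours studied under this model.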
