Data Mining Notes .pdf
Data Mining

Data mining is a form of data analysis that draws on techniques from artificial intelligence. It is the act of sorting through large sets of data to identify patterns and establish relationships. The goal is to extract information from sets of data that can be used to inform and guide future decisions, by identifying past and present trends. Data mining involves finding trends, forming a theory from them, and then applying that theory to new data sets to try to validate the changes that are occurring. The overall goal is to be able to predict changes before they actually occur, so that a business can be in the right place at the right time. Computers are used for data mining because of the sheer volume of data that is searched and analyzed.

Cross-Industry Standard Process for Data Mining (CRISP-DM)

The six phases of CRISP-DM are:

1. Business Understanding: In this step, the goals of the business are set and the important factors that will help in achieving those goals are identified. There are three main elements to the business understanding stage: setting the objectives, developing the project plan and establishing the criteria for success. Once the business needs have been established, other important factors such as ...

2. Data Understanding: This step collects all of the data and loads it into the tool (if a tool is being used). The data is listed along with its source, its location, how it was acquired and any issues encountered. The data is visualized and queried to check its completeness.

3. Data Preparation: This step involves selecting the appropriate data, cleaning it, constructing attributes from it and integrating data from multiple databases. This allows patterns and trends that relate to the business needs to be established in the data. This is the largest stage in the project and the most time consuming.

4. Data Modeling: In this step a data mining technique (such as a decision tree) is selected, a test design is generated for evaluating the selected model, models are built from the data set, and the built models are assessed with experts to discuss the results. This allows the business to understand whether the models are suitable and whether they fall in line with the business initiatives.

5. Evaluation: This step determines the degree to which the resulting model meets the business requirements. Evaluation can be done by testing the model on real applications. The model is reviewed for any mistakes or steps that should be repeated.

6. Deployment: In this step a deployment plan is made, a strategy is formed to monitor and maintain the data mining model's results and check their usefulness, final reports are written, and the whole process is reviewed to check for any mistakes and to see whether any step should be repeated. If the stakeholders feel the business hasn’t achieved an informative enough result, they may choose to repeat the data mining process to further refine the information gathered from it.

Uses of Data Mining

1. National Security & Surveillance: Data mining has many applications in security, including national security (e.g., surveillance) and cyber security (e.g., virus detection). Threats to national security include attacks on buildings and the destruction of critical infrastructure such as power grids and telecommunication systems. Data mining techniques are used to identify suspicious individuals and groups, and to discover which individuals and groups are capable of carrying out terrorist activities. Cyber security is concerned with protecting computer and network systems from corruption by malicious software, including Trojan horses and viruses. Data mining is also applied to provide solutions such as intrusion detection and auditing.

2. Businesses: Data mining can help solve business problems in banking and finance by finding patterns, causalities and correlations in business information and market prices. These are not immediately apparent to managers because the volume of data is too large, or is generated too quickly, for experts to screen. Managers can use this information to better segment, target, acquire, retain and maintain profitable customers. Data mining can identify a segment of customers based on their vulnerability, and the business can then make them special offers to improve satisfaction. To maintain a good relationship with a customer, a business needs to collect data and analyze it. This is where data mining plays its part: instead of being unsure where to focus in order to retain customers, those looking for a solution get filtered results. Market basket analysis is a modelling technique based on the theory that if you buy a certain group of items, you are more likely to buy another group of items. This technique may allow the retailer to understand the purchase behavior of a buyer, which in turn may help the retailer to know the buyer’s needs and change the store’s layout accordingly. Differential analysis can then be used to compare results between different stores or between customers in different demographic groups.

3. Scientific Research: A large volume of complex, multi-dimensional scientific data is collected and stored daily. Data mining and predictive modeling offer a means of analyzing that data. They are capable of automatically extracting knowledge hidden deep in the data, enabling the discovery of knowledge not otherwise attainable. Data mining extracts patterns, changes, associations and anomalies from large data sets.
Work in data mining ranges from theoretical work on the principles of learning and mathematical representations of data to building advanced engineering systems that perform information filtering on the web, find genes in DNA sequences, help understand trends and anomalies in economics and education, and detect network intrusion.

4. Health Care: Data mining in healthcare has proven effective in areas such as predictive medicine, customer relationship management, detection of fraud and abuse, management of healthcare and measuring the effectiveness of certain treatments. The adoption of electronic health records has allowed healthcare professionals to distribute knowledge across all sectors of healthcare, which in turn helps reduce medical errors and improve patient care and satisfaction.

5. Social and Economic Trends: Data mining has a large role in predicting future social and economic trends. Many institutions are concerned with the stabilization and growth of the global economic market. Any data and intelligence that can predict what might happen to the economy can help important institutions, such as governments, to prepare for any crisis that may occur. Companies can also use predictions of what may lie ahead for the economy to make important business decisions, such as where they should plan to expand their business.

Ethical and Privacy Implications

Individuals and organizations have concerns about the ethical and privacy implications of data mining. These often revolve around data that is mined about individuals and then used to target them with products and advertising. There can be certain issues with this. If a person searches for a certain product or service, they may be bombarded with similar products for a long time afterwards. The constant reminder from the targeted advertising may be harmful and may create upsetting feelings at a later date.
Another example is a person who searches for a particular ailment: the data may be mined by many companies, even though the individual may never want others to be aware of their medical issues.

| Advantages | Disadvantages |
|---|---|
| Allows organizations to make strategic decisions that can help maintain or increase their revenue. | Software tools and skilled staff are required. |
| Allows organizations to understand their customers and create the products they need. | Many people see the practice of data mining as both unethical and an invasion of their privacy. |
| Allows individuals to see targeted product advertising based on the things they already like. | Storage costs for the data are very high, which can also increase the cost of the data mining process. |
| Allows important institutions to predict future crises, so that they can plan strategies and solutions to help handle or avoid them. | It is a major security issue, as hackers will want to gain access to the data because it has a high value. |
| Allows businesses to save costs, either by understanding how to streamline what they already do or by not investing in a future product that they are now aware of. | The outcomes produced by data mining are only predictions based on patterns and trends in past data. They are not an exact science and it is very possible for them to be incorrect. |
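
As an illustration of the market basket analysis described under Uses of Data Mining, the following is a minimal sketch rather than a definitive implementation. It assumes that each purchase is recorded as a set of item names, and it measures how often two items are bought together (support) and how often a buyer of one item also buys the other (confidence). The transactions, the function name pair_rules and the min_support threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = share of all baskets that contain both items
    confidence = share of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        item_counts.update(basket)
        # Sort so that each pair is always counted under the same key.
        pair_counts.update(combinations(sorted(basket), 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # skip pairs that are rarely bought together
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Strongest rules (highest confidence) first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)


for antecedent, consequent, support, confidence in pair_rules(transactions):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

A retailer could read a high-confidence rule as a hint that the two items might be placed near each other or promoted together. Real market basket analysis usually considers larger groups of items and far more transactions than this pairwise sketch does.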