

Machine Learning
An Introduction


Prof. David Bernstein
James Madison University

Computer Science Department
bernstdh@jmu.edu


Motivation
  • Review:
    • Linear regression analysis can be used to determine the dependence of a continuous variable on other variables
    • Binary logit models can be used to determine the dependence of a 0-1 variable on other variables
  • Commonalities:
    • The models have a known form that includes the number and types of the terms, functions, and parameters
    • The parameters are estimated by maximizing or minimizing a "fit" function
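As a concrete illustration of the second point, here is a minimal sketch of estimating the parameters of a simple linear regression \(y = a + bx\) by minimizing a least-squares "fit" function; the data are made up for illustration.

```python
# A minimal sketch: estimate the parameters of y = a + b*x by
# minimizing the sum of squared errors (the "fit" function).
# The data below are hypothetical.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing sum((y - a - b*x)^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares estimates for one predictor
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 8.0]   # roughly y = 2x
a, b = fit_line(xs, ys)
```

The closed-form estimates used here are what "minimizing the fit function" works out to for ordinary least squares with a single predictor.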
Motivation (cont.)
  • A Question:
    • Can we "determine" the form of the model as well as its parameters?
  • The Answer:
    • Most obviously, we can try different specific models
    • Less obviously, we can specify a general class of models and find the best within that class
  • A Danger:
    • Overfitting
Some History
  • Different Disciplines/Researchers:
    • Researchers in statistics have studied this problem for hundreds of years
    • Researchers in artificial intelligence have studied this problem for 50 years
    • Researchers in data science and big data have studied this problem for a few years
  • The Resulting "Dispute":
    • Which techniques belong in which discipline?
    • Which disciplines have made the big discoveries and which are just producing hype?
    • Which terminology to use?
The Machine Learning "Perspective"
  • Terminology:
    • Observations consist of features
    • The response variable (or target) is the feature being predicted and the other features are predictors
    • The system learns from the training set
  • Types of Learning:
    • Supervised - the "correct" answer is identified/labeled in the training set and a loss function is optimized
    • Unsupervised - the system infers things from the training set
    • Reinforcement - the system attempts to maximize (using dynamic programming techniques) the cumulative reward (by balancing the "exploration of uncharted territory" and the "exploitation of current knowledge")
The Machine Learning "Perspective" (cont.)
  • Types of Response Variables:
    • Continuous
    • Categorical (or Classification)
  • Quality of the Predictions:
    • Defined using an evaluation function and measured using a testing set
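These ideas can be sketched minimally (the data, classifier, and split are all hypothetical): a threshold classifier is trained on a labeled training set and the quality of its predictions is then measured with an evaluation function (accuracy) on a separate testing set.

```python
# A minimal sketch of supervised learning and evaluation on
# hypothetical data: the "correct" answers (labels) are in the
# training set, and quality is measured on a held-out testing set.

def accuracy(preds, targets):
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)

# Labeled observations: feature x, target y = 1 when x > 5
data = [(x, int(x > 5)) for x in range(20)]
train, test = data[:14], data[14:]

# "Learning": pick the threshold that fits the training set best
best_t = max(range(20),
             key=lambda t: accuracy([int(x > t) for x, _ in train],
                                    [y for _, y in train]))

# Evaluation: measure prediction quality on the testing set
test_acc = accuracy([int(x > best_t) for x, _ in test],
                    [y for _, y in test])
```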
Data for Machine Learning
  • Structured:
    • Data in which the features are formatted according to a pre-defined schema (e.g., tables, hierarchies)
  • Unstructured:
    • Everything else (e.g., images, audio files, video files, text documents)
Artificial Neural Networks
  • The Inspiration:
    • A simple model of the brain consisting of a network of neurons connected to one another by axons
  • A "Special" Aspect of Biological Neural Networks:
    • There is a gap (called a synapse) between the axon of one neuron and the next neuron (unlike graphs/networks in which the edges/links/arcs "meet" at nodes/vertexes)
Artificial Neural Networks (cont.)
  • The Inspiration (cont.):
    • A message will be passed across the synapse if the sum of the weighted input signals exceeds a threshold (a process known as activation)
  • Using the Inspiration:
    • Construct a network consisting of input nodes, hidden nodes, and output nodes
    • Provide the network with inputs
    • Adjust the parameters (i.e., learn) until the "best" outputs are achieved (i.e., until the loss is minimized)
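The construction step above can be sketched with a tiny hand-built network; the weights and thresholds here are illustrative (chosen by hand, not learned), and the step activation mirrors the threshold behavior described earlier.

```python
# A minimal sketch of the forward pass: each node sums its weighted
# inputs and fires (outputs 1) only when the sum exceeds a threshold,
# mimicking activation at a synapse.  Weights are illustrative.

def activate(inputs, weights, threshold):
    """Threshold (step) activation of one artificial node."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Two input nodes -> one hidden node -> one output node
def tiny_network(x1, x2):
    hidden = activate([x1, x2], [0.3, 0.3], threshold=0.5)
    return activate([hidden], [1.0], threshold=0.5)
```

With these particular weights the network happens to compute the logical AND of its two inputs; learning would mean adjusting the weights until the outputs are the "best" ones.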
Artificial Neural Networks (cont.)

An Example

[Image: images/artificial-neural-network_simple.png, a simple artificial neural network]
Artificial Neural Networks (cont.)
  • Shallow vs. Deep Learning:
    • The distinction is really just the number of hidden layers (a deep network has more than one)
  • The Advantage of Deep Learning:
    • The features needn't be specified; they can be learned (through model tuning)
  • The Disadvantages of Deep Learning:
    • Needs more data
    • The learning algorithm is more computationally demanding
ANNs with Supervised Learning
  • Decisions to be Made when Constructing an ANN:
    • Inputs and Outputs (and how to make them numerical)
    • Shallow (i.e., one hidden layer) or Deep (i.e., multiple hidden layers)
    • Weighting Schemes and Activation Functions
    • Loss Function
    • Learning Algorithm (i.e., how to minimize the loss function)
  • The Result After Training:
    • A weighted network that can be given inputs and will produce predicted outputs
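As a minimal illustration of training, here is a single artificial neuron trained with the classic perceptron rule on made-up data. This is one simple learning algorithm; practical ANNs typically minimize a differentiable loss function with gradient-based methods instead.

```python
# A minimal sketch: one neuron with a step activation, trained with
# the perceptron rule.  Data, learning rate, and epoch count are
# illustrative, not prescriptive.

def predict(x, w, b):
    return 1 if x[0] * w[0] + x[1] * w[1] + b > 0 else 0

# Training set: the logical AND function, "correct" answers labeled
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b, rate = [0.0, 0.0], 0.0, 0.1
for _ in range(20):                     # repeated passes over the data
    for x, target in examples:
        error = target - predict(x, w, b)
        w[0] += rate * error * x[0]     # adjust the parameters ...
        w[1] += rate * error * x[1]
        b += rate * error               # ... until the errors vanish
```

The result after training is exactly what the slide describes: a weighted network that, given inputs, produces predicted outputs.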
Some Supervised Classification Techniques
  • Support Vector Machines:
    • Find a hyperplane in \(N\) dimensions (e.g., a line in \(\mathbb{R}^2\), a plane in \(\mathbb{R}^3\)) that distinctly classifies the data (e.g., one color on one side of the hyperplane and another color on the other)
  • \(K\)-Nearest Neighbors (KNN):
    • A point is classified based on the classification of its K-nearest neighbors
  • Naive Bayes Classifiers:
    • A point is classified by assuming that each feature contributes independently to the probability that point belongs to a particular class (e.g., the color, shape and size of a fruit contribute independently to the probability that it is an apple)
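Of these, \(K\)-nearest neighbors is the easiest to sketch directly; the points and labels below are hypothetical.

```python
from collections import Counter

# A minimal sketch of K-nearest neighbors: a point is classified by
# a majority vote among the k closest labeled points.

def knn_classify(point, training, k=3):
    """training is a list of ((x, y), label) pairs."""
    # Sort labeled points by squared Euclidean distance to the query
    nearest = sorted(training,
                     key=lambda item: (item[0][0] - point[0]) ** 2
                                      + (item[0][1] - point[1]) ** 2)[:k]
    # Majority vote among the k nearest labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [((1, 1), "red"), ((1, 2), "red"), ((2, 1), "red"),
            ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue")]
```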
Some Unsupervised Clustering Techniques
  • \(K\)-Means:
    • Each point is assigned to the cluster with the nearest mean (centroid); the means are then recomputed from their assigned points, and the process repeats until the assignments stabilize
  • Hierarchical Clustering:
    • Groups data into a dendrogram (i.e., a multi-level tree of clusters)
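A minimal sketch of \(K\)-means on made-up one-dimensional data, alternating the assignment and update steps just described:

```python
# A minimal sketch of K-means (illustrative 1-D data): points are
# assigned to the nearest of K cluster means, then each mean is
# recomputed from its assigned points, repeating until stable.

def k_means(points, means, iterations=10):
    for _ in range(iterations):
        # Assignment step: each point joins its nearest mean's cluster
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)),
                          key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: each mean moves to the centroid of its cluster
        means = [sum(c) / len(c) if c else m
                 for c, m in zip(clusters, means)]
    return means

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
means = k_means(points, [0.0, 5.0])
```

Note that, unlike \(K\)-nearest neighbors, no labels are involved: the clusters are inferred from the data alone, which is what makes this unsupervised.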
There's Always More to Learn