Fuzzy C-Means Clustering for Iris Data
This example shows how to use fuzzy c-means clustering for the iris data set. This dataset was collected by botanist Edgar Anderson and contains random samples of flowers belonging to three species of iris flowers: setosa, versicolor, and virginica. For each of the species, the data set contains 50 observations for sepal length, sepal width, petal length, and petal width.
Load Data
Load the data set from the iris.dat data file.

Partition the data into three groups named setosa, versicolor, and virginica.
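Here is a minimal Python sketch of this partition step. The file layout is not described here, so the sketch assumes that each row of iris.dat holds the four measurements followed by a numeric class label, with 1, 2, and 3 standing for setosa, versicolor, and virginica. The helper names load_rows and partition_by_species are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed label coding (hypothetical): 1 = setosa, 2 = versicolor, 3 = virginica.
SPECIES = {1: "setosa", 2: "versicolor", 3: "virginica"}


def load_rows(path: str) -> List[List[float]]:
    """Read whitespace-separated numeric rows, e.g. from iris.dat (layout assumed)."""
    with open(path) as f:
        return [[float(x) for x in line.split()] for line in f if line.strip()]


def partition_by_species(rows: Sequence[Sequence[float]]) -> Dict[str, List[List[float]]]:
    """Group rows by their class label, keeping only the four measurements.

    Each row is assumed to be
    [sepal length, sepal width, petal length, petal width, label].
    """
    groups: Dict[str, List[List[float]]] = {name: [] for name in SPECIES.values()}
    for row in rows:
        label = int(row[4])
        # Reject non-integer or unknown labels instead of silently misfiling rows.
        if label != row[4] or label not in SPECIES:
            raise ValueError(f"unexpected class label: {row[4]!r}")
        groups[SPECIES[label]].append(list(row[:4]))
    return groups
```

Calling partition_by_species(load_rows("iris.dat")) on the full data set should produce three groups of 50 observations each, matching the per-species counts given above.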
Plot Data in 2-D
The iris data contains four dimensions representing sepal length, sepal width, petal length, and petal width. Plot the data points for each combination of two dimensions.
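Four dimensions give C(4, 2) = 6 pairs to plot. The sketch below enumerates those pairs and extracts the 2-D points for each one. The helper names are hypothetical, and drawing the actual scatter plots is left to whichever plotting library you use.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Dimension names in the order listed in the text; index k selects column k.
DIMENSIONS = ["sepal length", "sepal width", "petal length", "petal width"]


def dimension_pairs(n_dims: int) -> List[Tuple[int, int]]:
    """Every unordered pair of column indices (i, j) with i < j."""
    return list(combinations(range(n_dims), 2))


def project(rows: Sequence[Sequence[float]], i: int, j: int) -> List[Tuple[float, float]]:
    """2-D points (x_i, x_j) for the scatter plot of dimension i against dimension j."""
    return [(row[i], row[j]) for row in rows]


# Print the axis pairing for each of the six subplots.
for i, j in dimension_pairs(len(DIMENSIONS)):
    print(f"{DIMENSIONS[i]} vs. {DIMENSIONS[j]}")
```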
Setup Parameters
Specify the options for clustering the data using fuzzy c-means clustering. These options are:

- Nc: Number of clusters.
- M: Fuzzy partition matrix exponent, which indicates the degree of fuzzy overlap between clusters. For more information, see Adjust Fuzzy Overlap in Fuzzy C-Means Clustering.
- maxIter: Maximum number of iterations. The clustering process stops after this number of iterations.
- minImprove: Minimum improvement. The clustering process stops when the objective function improvement between two consecutive iterations is less than this value.
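To see how M controls fuzzy overlap, the sketch below bundles the four options and evaluates the standard fuzzy c-means membership formula for a single point, u_j = 1 / Σ_k (d_j / d_k)^(2/(M − 1)), where d_j is the point's distance to center j. The class name FcmOptions, the default option values, and the example distances are illustrative assumptions, not values taken from this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FcmOptions:
    """Clustering options; the default values are illustrative assumptions."""
    Nc: int = 3               # number of clusters
    M: float = 2.0            # fuzzy partition matrix exponent, must be > 1
    maxIter: int = 100        # maximum number of iterations
    minImprove: float = 1e-6  # minimum objective improvement between iterations

    def __post_init__(self) -> None:
        if self.Nc < 2:
            raise ValueError("Nc must be at least 2")
        if self.M <= 1.0:
            raise ValueError("M must be greater than 1")


def memberships(distances: List[float], M: float) -> List[float]:
    """Standard FCM memberships of one point in each cluster, given its
    distances to the cluster centers (all assumed to be > 0)."""
    p = 2.0 / (M - 1.0)
    return [1.0 / sum((d_j / d_k) ** p for d_k in distances) for d_j in distances]


opts = FcmOptions()
d = [1.0, 2.0, 4.0]  # hypothetical distances from one point to three centers
for m in (1.5, opts.M, 4.0):
    print(m, [round(u, 3) for u in memberships(d, m)])
```

Values of M close to 1 push the memberships toward 0 and 1 (nearly hard clustering), while larger values spread each point's membership more evenly across the clusters.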
For more information about these options and the fuzzy c-means algorithm, see fcm.

Compute Clusters
Fuzzy c-means clustering is an iterative process. Initially, the fcm function generates a random fuzzy partition matrix. This matrix indicates the degree of membership of each data point in each cluster.

In each clustering iteration, fcm calculates the cluster centers and updates the fuzzy partition matrix using the calculated center locations. It then computes the objective function value.

Cluster the data, displaying the objective function value after each iteration.
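For reference, here is the standard textbook formulation of the quantities in that loop. It is stated as background; this text does not spell out fcm's internal details. Data points are x_i (i = 1, …, N), cluster centers are c_j (j = 1, …, N_c), u_{ij} is the membership of x_i in cluster j, and M is the fuzzy partition matrix exponent. The objective function, minimized subject to each point's memberships summing to 1, is

$$
J_M = \sum_{i=1}^{N}\sum_{j=1}^{N_c} u_{ij}^{M}\,\lVert x_i - c_j\rVert^{2},
\qquad \sum_{j=1}^{N_c} u_{ij} = 1 \quad \text{for every } i .
$$

Each iteration alternates the center update and the partition matrix update:

$$
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{M}\, x_i}{\sum_{i=1}^{N} u_{ij}^{M}},
\qquad
u_{ij} = \frac{1}{\sum_{k=1}^{N_c}\left(\dfrac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert}\right)^{2/(M-1)}} .
$$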
The clustering stops when the objective function improvement is below the specified minimum threshold.
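The loop and stopping rule can be sketched in pure Python as follows. This sketch is not the fcm source. It assumes Euclidean distance, a seeded random initial partition matrix, and a stop when the absolute change between consecutive objective values drops below minImprove. The function name fcm_sketch and its seed and verbose parameters are hypothetical; the other parameter names mirror the options above.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_sketch(data: Sequence[Point], Nc: int, M: float = 2.0, maxIter: int = 100,
               minImprove: float = 1e-6, seed: int = 0, verbose: bool = True,
               ) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Fuzzy c-means by alternating updates; returns (centers, U, objective history).

    U[j][i] is the membership of data point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial fuzzy partition matrix; each point's memberships sum to 1.
    U = [[rng.random() + 1e-9 for _ in range(n)] for _ in range(Nc)]
    for i in range(n):
        s = sum(U[j][i] for j in range(Nc))
        for j in range(Nc):
            U[j][i] /= s
    history: List[float] = []
    centers: List[List[float]] = []
    for it in range(1, maxIter + 1):
        # 1) Centers: weighted means of the data with weights u_ij ** M.
        centers = []
        for j in range(Nc):
            w = [U[j][i] ** M for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * data[i][k] for i in range(n)) / total
                            for k in range(dim)])
        # 2) Squared distances to the new centers (floored to avoid division by zero).
        d2 = [[max(_sq_dist(data[i], centers[j]), 1e-12) for i in range(n)]
              for j in range(Nc)]
        # 3) Objective value J = sum_ij u_ij**M * ||x_i - c_j||**2.
        obj = sum(U[j][i] ** M * d2[j][i] for j in range(Nc) for i in range(n))
        history.append(obj)
        if verbose:
            print(f"Iteration {it}: objective = {obj:.6f}")
        # 4) Membership update; (squared-distance ratio) ** (1/(M-1))
        #    equals (distance ratio) ** (2/(M-1)).
        p = 1.0 / (M - 1.0)
        for i in range(n):
            for j in range(Nc):
                U[j][i] = 1.0 / sum((d2[j][i] / d2[k][i]) ** p for k in range(Nc))
        # 5) Stop once the objective improves by less than minImprove.
        if len(history) > 1 and abs(history[-1] - history[-2]) < minImprove:
            break
    return centers, U, history
```

The objective is computed from the current memberships and the freshly computed centers. Because both updates can only lower J, the recorded values are non-increasing, so the change between consecutive values measures improvement. For the iris measurements, a call such as fcm_sketch(rows, Nc=3) on the four measurement columns would be the analogue of this step. Pure Python is adequate for 150 points but slow for large data sets.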
Plot the computed cluster centers as bold numbers.
Sample Data Sets
Statistics and Machine Learning Toolbox™ software includes the sample data sets in the following table.
To load a data set into the MATLAB® workspace, type:

load filename

where filename is one of the files listed in the table. Data sets contain individual data variables, description variables with references, and dataset arrays encapsulating the data set and its description, as appropriate.
File | Description of Data Set |
---|---|
acetylene.mat | Chemical reaction data with correlated predictors |
arrhythmia.mat | Cardiac arrhythmia data from the UCI machine learning repository |
carbig.mat | Measurements of cars, 1970–1982 |
carsmall.mat | Subset of carbig.mat. Measurements of cars, 1970, 1976, 1982 |
census1994.mat | Adult data from the UCI machine learning repository |
cereal.mat | Breakfast cereal ingredients |
cities.mat | Quality of life ratings for U.S. metropolitan areas |
discrim.mat | A version of cities.mat used for discriminant analysis |
examgrades.mat | Exam grades on a scale of 0–100 |
fisheriris.mat | Fisher's 1936 iris data |
flu.mat | Google Flu Trends estimated ILI (influenza-like illness) percentage for various regions of the US, and CDC weighted ILI percentage based on sentinel provider reports |
gas.mat | Gasoline prices around the state of Massachusetts in 1993 |
hald.mat | Heat of cement vs. mix of ingredients |
hogg.mat | Bacteria counts in different shipments of milk |
hospital.mat | Simulated hospital data |
humanactivity.mat | Human activity recognition data of five activities: sitting, standing, walking, running, and dancing |
imports-85.mat | 1985 Auto Imports Database from the UCI repository |
ionosphere.mat | Ionosphere dataset from the UCI machine learning repository |
kmeansdata.mat | Four-dimensional clustered data |
lawdata.mat | Grade point average and LSAT scores from 15 law schools |
mileage.mat | Mileage data for three car models from two factories |
moore.mat | Biochemical oxygen demand on five predictors |
morse.mat | Recognition of Morse code distinctions by non-coders |
nlpdata.mat | Natural language processing data extracted from the MathWorks® documentation |
ovariancancer.mat | Grouped observations on 4000 predictors |
parts.mat | Dimensional run-out on 36 circular parts |
polydata.mat | Sample data for polynomial fitting |
popcorn.mat | Popcorn yield by popper type and brand |
reaction.mat | Reaction kinetics for Hougen-Watson model |
sat.dat | Scholastic Aptitude Test averages by gender and test (table) |
sat2.dat | Scholastic Aptitude Test averages by gender and test (csv) |
spectra.mat | NIR spectra and octane numbers of 60 gasoline samples |
stockreturns.mat | Simulated stock returns |