Introduction
When discussing classification models, we seek the most effective one. The Bayes classifier is often considered the best classifier available, as it uses full knowledge of the probability distribution to predict the most probable class. Here, we assume that $X$ and $Y$ have a joint distribution, denoted $P(X, Y)$, and that $(x_1, y_1), \ldots, (x_n, y_n)$ represents a random sample from $P(X, Y)$. In this context, the training and testing data are independent draws from the same distribution.
Suppose you want to classify whether a test subject has stress based on brain activity. In this case, we define:
- $X$: Average brain activity in the amygdala.
- $Y$: 1 if the subject has stress (red), and 0 if not (green).
Figure 1: Example of a stress classification model.
As shown in Figure 1, higher values of $X$ are correlated with stress. In this example, our test subject is classified as experiencing stress. We can view our probability distribution through the class-conditional densities $P(X = x \mid Y = 0)$ and $P(X = x \mid Y = 1)$, as illustrated in Figure 2.
Figure 2: Example of stress classification model probabilities.
Thus, we can say that for a given $x$, the predicted label $\hat{y}$ is the most likely one:

$$\hat{y} = h(x) = \underset{y}{\arg\max}\; P(Y = y \mid X = x) \tag{1}$$
Bayes’ Theorem
Now, let’s use Bayes’ Theorem to rewrite our classification function (eq. 1) in terms of the likelihood, prior, and marginal distributions. We start with Bayes’ Theorem:

$$P(Y = y \mid X = x) = \frac{P(X = x \mid Y = y)\, P(Y = y)}{P(X = x)} \tag{2}$$
Using (2) in (1), and noting that the denominator $P(X = x)$ does not depend on $y$ (so it does not affect the argmax), we have:

$$\hat{y} = \underset{y}{\arg\max}\; P(X = x \mid Y = y)\, P(Y = y) \tag{3}$$
Next, we can consider appropriate probability models for the prior $P(Y = y)$ and the conditional distribution of the inputs, $P(X = x \mid Y = y)$.
In a binary classification model, it is natural to assume that $Y \sim \text{Bernoulli}(\theta)$. In this case, we model $X \mid Y = y \sim N(\mu_y, \sigma_y^2)$, noting that the parameters of the conditional distribution may differ for $y = 0$ and $y = 1$.
Decision Boundary
The decision boundary occurs when $P(Y = 1 \mid X = x) = P(Y = 0 \mid X = x)$. Here, we can compute the ratio:

$$\frac{P(Y = 1 \mid X = x)}{P(Y = 0 \mid X = x)} = \frac{P(X = x \mid Y = 1)\,\theta}{P(X = x \mid Y = 0)\,(1 - \theta)} = 1$$
This leads to:

$$\frac{P(X = x \mid Y = 1)}{P(X = x \mid Y = 0)} = \frac{1 - \theta}{\theta}$$
Example
In the context of stress classification, suppose we know that:

$$Y \sim \text{Bernoulli}(\theta = 0.3), \qquad X \mid Y = 0 \sim N(\mu_0 = 5,\; \sigma_0^2 = 1), \qquad X \mid Y = 1 \sim N(\mu_1 = 7,\; \sigma_1^2 = 1)$$
The Classifier
In this case, the formula for equation 3 is:

$$\hat{y} = \underset{y \in \{0, 1\}}{\arg\max}\; \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\!\left(-\frac{(x - \mu_y)^2}{2\sigma_y^2}\right) \theta^{y} (1 - \theta)^{1 - y}$$
If a test subject has, for instance, $x = 6$ in their brain activity, you can substitute the given values into the classifier: the joint density is $0.242 \times 0.3 \approx 0.073$ for $y = 1$ and $0.242 \times 0.7 \approx 0.169$ for $y = 0$, so $\hat{y} = 0$, meaning the algorithm predicts that the subject does not have stress.
Decision Boundary
This is the case where $P(Y = 1 \mid X = x) = P(Y = 0 \mid X = x)$. With equal variances ($\sigma_0 = \sigma_1 = \sigma$), after performing the algebra you can show that:

$$x^{*} = \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0}\,\ln\!\left(\frac{1 - \theta}{\theta}\right)$$
Using the provided data:

$$x^{*} = \frac{5 + 7}{2} + \frac{1}{7 - 5}\,\ln\!\left(\frac{0.7}{0.3}\right) \approx 6 + 0.424 = 6.424$$
Finally, our classification problem is illustrated in Figure 3.
Figure 3: Classification model.
Conclusions
You can prove that the Bayes classifier is optimal; however, in practice we never know the true probability distribution. Some models that I plan to explore in future posts have been developed to approximate it.
As future work, we can find the decision boundary when $\sigma_0 \neq \sigma_1$ and then work with multivariate distributions for $X$.
R Code
library(ggplot2)
# Distribution for y
theta <- 0.3
y <- rbinom(100000, 1, theta)
n1 <- sum(y == 1)
n2 <- sum(y == 0)
# Distribution for x given y=0
mu0 <- 5
sigma0 <- 1
# Distribution for x given y=1
mu1 <- 7
sigma1 <- 1
# Create a simulated dataframe
xy0 <- rnorm(n2, mu0, sigma0)
xy1 <- rnorm(n1, mu1, sigma1)
sim_df <- data.frame(rbind(cbind(xy0, 0), cbind(xy1, 1)))
colnames(sim_df) <- c("x", "y")
# Joint density P(X = x | Y = y) P(Y = y), proportional to the posterior P(Y = y | X = x)
ygx <- function(y, x) {
if (y == 0) {
mu <- mu0
sigma <- sigma0
} else if (y == 1) {
mu <- mu1
sigma <- sigma1
}
fx <- dnorm(x, mu, sigma) * (theta^y) * (1 - theta)^(1 - y)
return(fx)
}
# Bayes classifier: choose the label with the larger joint density
argmax <- function(x) {
if (ygx(1, x) < ygx(0, x)) {
return(0)
} else {
return(1)
}
}
# Predict the most probable label for each simulated x
sim_df["y_pred"] <- apply(sim_df["x"], FUN = argmax, MARGIN = 1)
head(sim_df)
# Confusion matrix
table(sim_df$y, sim_df$y_pred)
# Absolute difference of the two joint densities; zero at the decision boundary
objective_function <- function(x) {
abs(ygx(y = 1, x = x) - ygx(y = 0, x = x))
}
# Find the decision boundary numerically between mu0 and mu1
result <- optim(par = (mu0 + mu1) / 2, fn = objective_function, method = "L-BFGS-B",
                lower = min(mu0, mu1), upper = max(mu0, mu1))
result
ygx(1, result$par) / ygx(0, result$par)  # ratio at the boundary; should be ~1
hist(sim_df$x)
# Plot the joint density for each class and the decision boundary
plot(x = seq(0, 10, by = 0.1), y = ygx(y = 1, seq(0, 10, by = 0.1)), type = 'l',
     xlab = 'Value for x', ylab = 'P(X=x, Y=y)',
     col = 'red',
     main = "Optimal Bayes Classifier Simulation",
     ylim = c(0, 0.5))
points(x = seq(0, 10, by = 0.1), y = ygx(y = 0, seq(0, 10, by = 0.1)), type = 'l', col = "green")
abline(v=result$par, lty=2, col="blue")
# Add the legend
legend("topleft", legend=c("y=1", "y=0", "Bayes decision boundary"),
col=c("red", "green", "blue"), lty=c(1, 1, 2),
title="Color guide")