In information theory, the cross-entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set when the coding scheme used for the set is optimized for an estimated probability distribution $q$ rather than the true distribution $p$.
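As an illustration, the following is a minimal sketch that assumes discrete distributions given as lists of probabilities over the same events, and uses the standard discrete formula $H(p, q) = -\sum_x p(x) \log_2 q(x)$, which the sentence above describes in words but does not state. The function name `cross_entropy_bits` and the example distributions are hypothetical, chosen only for this example.

```python
import math


def cross_entropy_bits(p, q):
    """Cross-entropy H(p, q) = -sum_x p(x) * log2(q(x)), measured in bits.

    p is treated as the true distribution and q as the estimated one;
    both are sequences of probabilities over the same events (an
    assumption of this sketch, not something the function validates).
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined over the same events")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            # Events that never occur under p contribute nothing.
            continue
        if qx == 0.0:
            # An event that occurs under p but has probability 0 under q
            # cannot be encoded with a finite number of bits.
            return math.inf
        total -= px * math.log2(qx)
    return total


# Hypothetical example distributions over three events.
p = [0.5, 0.25, 0.25]   # true distribution
q = [0.25, 0.25, 0.5]   # estimated distribution the code is optimized for

print(cross_entropy_bits(p, p))  # 1.5  (the entropy of p)
print(cross_entropy_bits(p, q))  # 1.75 (coding with q costs more bits)
```

In this example, a code optimized for $q$ needs 1.75 bits per event on average, compared with 1.5 bits for a code optimized for the true distribution $p$; the cross-entropy equals the entropy of $p$ when the two distributions coincide.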