Optimum Probability Estimation from Empirical Distributions

Probability estimation is important for applying probabilistic models and for any evaluation in information retrieval (IR). We discuss how parameter estimation depends on certain properties of probabilistic models: dependence assumptions, binary vs. non-binary features, and estimation sample selection. We then define an optimum estimate for binary features that can be applied to various typical estimation problems in IR, and describe a method for computing this estimate from empirical data. Experiments show the applicability of our method, whereas comparable approaches are partially based on false assumptions or yield biased estimates.


