
In statistics, multinomial logistic regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes.[1] That is, it is a model that is used to predict the probabilities of the different possible outcomes of a categorically distributed dependent variable, given a set of independent variables (which may be real-valued, binary-valued, categorical-valued, etc.).

Multinomial logistic regression is known by a variety of other names, including polytomous LR,[2][3] multiclass LR, softmax regression, multinomial logit (mlogit), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model.[4]

Background

Multinomial logistic regression is used when the dependent variable in question is nominal (equivalently categorical, meaning that it falls into any one of a set of categories that cannot be ordered in any meaningful way) and for which there are more than two categories. Some examples would be:

  • Which major will a college student choose, given their grades, stated likes and dislikes, etc.?
  • Which blood type does a person have, given the results of various diagnostic tests?
  • In a hands-free mobile phone dialing application, which person's name was spoken, given various properties of the speech signal?
  • Which candidate will a person vote for, given particular demographic characteristics?
  • Which country will a firm locate an office in, given the characteristics of the firm and of the various candidate countries?

These are all statistical classification problems. They all have in common a dependent variable to be predicted that comes from one of a limited set of items that cannot be meaningfully ordered, as well as a set of independent variables (also known as features, explanators, etc.), which are used to predict the dependent variable. Multinomial logistic regression is a particular solution to classification problems that use a linear combination of the observed features and some problem-specific parameters to estimate the probability of each particular value of the dependent variable. The best values of the parameters for a given problem are usually determined from some training data (e.g. some people for whom both the diagnostic test results and blood types are known, or some examples of known words being spoken).

Assumptions

The multinomial logistic model assumes that data are case-specific; that is, each independent variable has a single value for each case. As with other types of regression, there is no need for the independent variables to be statistically independent from each other (unlike, for example, in a naive Bayes classifier); however, collinearity is assumed to be relatively low, as it becomes difficult to differentiate between the impact of several variables if this is not the case.[5]

If the multinomial logit is used to model choices, it relies on the assumption of independence of irrelevant alternatives (IIA), which is not always desirable. This assumption states that the odds of preferring one class over another do not depend on the presence or absence of other "irrelevant" alternatives. For example, the relative probabilities of taking a car or bus to work do not change if a bicycle is added as an additional possibility. This allows the choice of K alternatives to be modeled as a set of K − 1 independent binary choices, in which one alternative is chosen as a "pivot" and the other K − 1 compared against it, one at a time. The IIA hypothesis is a core hypothesis in rational choice theory; however, numerous studies in psychology show that individuals often violate this assumption when making choices. An example of a problem case arises if choices include a car and a blue bus. Suppose the odds ratio between the two is 1 : 1. Now if the option of a red bus is introduced, a person may be indifferent between a red and a blue bus, and hence may exhibit a car : blue bus : red bus odds ratio of 1 : 0.5 : 0.5, thus maintaining a 1 : 1 ratio of car : any bus while adopting a changed car : blue bus ratio of 1 : 0.5. Here the red bus option was not in fact irrelevant, because a red bus was a perfect substitute for a blue bus.

If the multinomial logit is used to model choices, it may in some situations impose too much constraint on the relative preferences between the different alternatives. It is especially important to take into account if the analysis aims to predict how choices would change if one alternative were to disappear (for instance if one political candidate withdraws from a three candidate race). Other models like the nested logit or the multinomial probit may be used in such cases as they allow for violation of the IIA.[6]

Model

Introduction

There are multiple equivalent ways to describe the mathematical model underlying multinomial logistic regression. This can make it difficult to compare different treatments of the subject in different texts. The article on logistic regression presents a number of equivalent formulations of simple logistic regression, and many of these have analogues in the multinomial logit model.

The idea behind all of them, as in many other statistical classification techniques, is to construct a linear predictor function that computes a score from a set of weights that are linearly combined with the explanatory variables (features) of a given observation using a dot product:

$$\operatorname{score}(\mathbf{X}_i, k) = \boldsymbol{\beta}_k \cdot \mathbf{X}_i,$$

where $\mathbf{X}_i$ is the vector of explanatory variables describing observation i, $\boldsymbol{\beta}_k$ is a vector of weights (or regression coefficients) corresponding to outcome k, and $\operatorname{score}(\mathbf{X}_i, k)$ is the score associated with assigning observation i to category k. In discrete choice theory, where observations represent people and outcomes represent choices, the score is considered the utility associated with person i choosing outcome k. The predicted outcome is the one with the highest score.
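
For concreteness, here is a minimal sketch of this scoring step in Python; the dimensions, feature values, and weights are invented purely for illustration:

```python
import numpy as np

# Illustrative sizes: M = 3 explanatory variables, K = 4 possible outcomes.
x_i = np.array([1.0, -0.5, 2.0])           # explanatory variables for observation i
beta = np.array([[ 0.2,  1.0, -0.3],       # one weight vector (row) per outcome k
                 [-0.1,  0.4,  0.8],
                 [ 0.5, -0.2,  0.1],
                 [ 0.0,  0.0,  0.0]])

scores = beta @ x_i            # score(X_i, k) = beta_k . X_i, computed for every k
predicted = np.argmax(scores)  # the predicted outcome is the highest-scoring k
print(scores, predicted)
```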

The difference between the multinomial logit model and numerous other methods, models, algorithms, etc. with the same basic setup (the perceptron algorithm, support vector machines, linear discriminant analysis, etc.) is the procedure for determining (training) the optimal weights/coefficients and the way that the score is interpreted. In particular, in the multinomial logit model, the score can directly be converted to a probability value, indicating the probability of observation i choosing outcome k given the measured characteristics of the observation. This provides a principled way of incorporating the prediction of a particular multinomial logit model into a larger procedure that may involve multiple such predictions, each with a possibility of error. Without such means of combining predictions, errors tend to multiply. For example, imagine a large predictive model that is broken down into a series of submodels where the prediction of a given submodel is used as the input of another submodel, and that prediction is in turn used as the input into a third submodel, etc. If each submodel has 90% accuracy in its predictions, and there are five submodels in series, then the overall model has only 0.9^5 ≈ 59% accuracy. If each submodel has 80% accuracy, then overall accuracy drops to 0.8^5 ≈ 33% accuracy. This issue is known as error propagation and is a serious problem in real-world predictive models, which are usually composed of numerous parts. Predicting probabilities of each possible outcome, rather than simply making a single optimal prediction, is one means of alleviating this issue.[citation needed]
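
The compounding can be checked with two lines of arithmetic; this sketch assumes, as the argument above does, that the submodels err independently and that any intermediate error corrupts the final prediction:

```python
# Five chained submodels: overall accuracy is the product of the stage accuracies.
print(0.9 ** 5)  # 0.59049 -> about 59%
print(0.8 ** 5)  # 0.32768 -> about 33%
```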

Setup

The basic setup is the same as in logistic regression, the only difference being that the dependent variables are categorical rather than binary, i.e. there are K possible outcomes rather than just two. The following description is somewhat shortened; for more details, consult the logistic regression article.

Data points

Specifically, it is assumed that we have a series of N observed data points. Each data point i (ranging from 1 to N) consists of a set of M explanatory variables x1,i ... xM,i (also known as independent variables, predictor variables, features, etc.), and an associated categorical outcome Yi (also known as dependent variable, response variable), which can take on one of K possible values. These possible values represent logically separate categories (e.g. different political parties, blood types, etc.), and are often described mathematically by arbitrarily assigning each a number from 1 to K. The explanatory variables and outcome represent observed properties of the data points, and are often thought of as originating in the observations of N "experiments" — although an "experiment" may consist of nothing more than gathering data. The goal of multinomial logistic regression is to construct a model that explains the relationship between the explanatory variables and the outcome, so that the outcome of a new "experiment" can be correctly predicted for a new data point for which the explanatory variables, but not the outcome, are available. In the process, the model attempts to explain the relative effect of differing explanatory variables on the outcome.

Some examples:

  • The observed outcomes are different variants of a disease such as hepatitis (possibly including "no disease" and/or other related diseases) in a set of patients, and the explanatory variables might be characteristics of the patients thought to be pertinent (sex, race, age, blood pressure, outcomes of various liver-function tests, etc.). The goal is then to predict which disease is causing the observed liver-related symptoms in a new patient.
  • The observed outcomes are the party chosen by a set of people in an election, and the explanatory variables are the demographic characteristics of each person (e.g. sex, race, age, income, etc.). The goal is then to predict the likely vote of a new voter with given characteristics.
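
As a concrete illustration of this setup, here is a small synthetic dataset in Python; the sizes and random values are invented purely for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M, K = 500, 4, 3             # N data points, M explanatory variables, K outcomes
X = rng.normal(size=(N, M))     # x_{1,i} ... x_{M,i} for each observation i
y = rng.integers(1, K + 1, N)   # categorical outcome Y_i, arbitrarily coded 1..K
```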

Linear predictor

As in other forms of linear regression, multinomial logistic regression uses a linear predictor function $f(k, i)$ to predict the probability that observation i has outcome k, of the following form:

$$f(k, i) = \beta_{0,k} + \beta_{1,k} x_{1,i} + \beta_{2,k} x_{2,i} + \cdots + \beta_{M,k} x_{M,i},$$

where $\beta_{m,k}$ is a regression coefficient associated with the $m$th explanatory variable and the $k$th outcome. As explained in the logistic regression article, the regression coefficients and explanatory variables are normally grouped into vectors of size M + 1, so that the predictor function can be written more compactly:

$$f(k, i) = \boldsymbol{\beta}_k \cdot \mathbf{x}_i,$$

where $\boldsymbol{\beta}_k$ is the set of regression coefficients associated with outcome k, and $\mathbf{x}_i$ (a row vector) is the set of explanatory variables associated with observation i, prepended by a 1 in entry 0.
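
A short sketch of this compact form in Python, showing the prepended 1 that turns $\beta_{0,k}$ into the intercept (all values are placeholders):

```python
import numpy as np

x_raw = np.array([1.2, -0.7, 3.0])          # M = 3 raw explanatory variables
x_i = np.concatenate(([1.0], x_raw))        # prepend 1 in entry 0 -> length M + 1

beta_k = np.array([0.5, 0.2, -1.0, 0.3])    # beta_{0,k} ... beta_{M,k} for outcome k
f_ki = beta_k @ x_i                         # f(k, i), intercept handled by entry 0
```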

As a set of independent binary regressions

To arrive at the multinomial logit model, one can imagine, for K possible outcomes, running K − 1 independent binary logistic regression models, in which one outcome is chosen as a "pivot" and then the other K − 1 outcomes are separately regressed against the pivot outcome. If outcome K (the last outcome) is chosen as the pivot, the K − 1 regression equations are:

$$\ln \frac{\Pr(Y_i = k)}{\Pr(Y_i = K)} = \boldsymbol{\beta}_k \cdot \mathbf{x}_i, \qquad k < K.$$

This formulation is also known as the additive log ratio transform commonly used in compositional data analysis. In other applications it is referred to as "relative risk".[7]

If we exponentiate both sides and solve for the probabilities, we get:

$$\Pr(Y_i = k) = \Pr(Y_i = K)\, e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}, \qquad k < K.$$

Using the fact that all K of the probabilities must sum to one, we find:

$$\Pr(Y_i = K) = 1 - \sum_{j=1}^{K-1} \Pr(Y_i = K)\, e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i} \quad\Rightarrow\quad \Pr(Y_i = K) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i}}.$$

We can use this to find the other probabilities:

$$\Pr(Y_i = k) = \frac{e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}}{1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i}}, \qquad k < K.$$

The fact that we run multiple regressions reveals why the model relies on the assumption of independence of irrelevant alternatives described above.
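
A sketch of this pivot construction in Python, recovering all K probabilities from the K − 1 sets of coefficients (the coefficient values are placeholders):

```python
import numpy as np

def pivot_probs(B, x):
    """B: (K-1, M+1) coefficients for outcomes 1..K-1 relative to pivot K;
    x: augmented feature vector of length M+1. Returns all K probabilities."""
    odds = np.exp(B @ x)               # exp(beta_k . x_i) for each k < K
    p_K = 1.0 / (1.0 + odds.sum())     # probability of the pivot outcome K
    return np.concatenate((p_K * odds, [p_K]))

B = np.array([[0.3, -1.0,  0.2],       # K = 3 outcomes, pivot is outcome 3
              [0.1,  0.5, -0.4]])
x = np.array([1.0, 0.8, -1.5])         # entry 0 is the constant 1
p = pivot_probs(B, x)
print(p, p.sum())                      # the probabilities sum to 1
```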

Estimating the coefficients

The unknown parameters in each vector βk are typically jointly estimated by maximum a posteriori (MAP) estimation, which is an extension of maximum likelihood using regularization of the weights to prevent pathological solutions (usually a squared regularizing function, which is equivalent to placing a zero-mean Gaussian prior distribution on the weights, but other distributions are also possible). The solution is typically found using an iterative procedure such as generalized iterative scaling,[8] iteratively reweighted least squares (IRLS),[9] by means of gradient-based optimization algorithms such as L-BFGS,[4] or by specialized coordinate descent algorithms.[10]
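
To make the gradient-based route concrete, here is a minimal batch gradient-descent sketch for the L2-regularized (i.e. Gaussian-prior MAP) cross-entropy; a serious implementation would instead use one of the cited methods such as L-BFGS or IRLS, and all names and hyperparameters here are illustrative:

```python
import numpy as np

def fit_multinomial_logit(X, y, K, lam=1e-2, lr=0.1, steps=2000):
    """X: (N, M+1) augmented features; y: (N,) labels coded 0..K-1.
    Returns a (K, M+1) coefficient matrix with the last row pinned to zero."""
    N, D = X.shape
    B = np.zeros((K, D))
    Y = np.eye(K)[y]                           # one-hot encoding of the outcomes
    for _ in range(steps):
        S = X @ B.T
        S -= S.max(axis=1, keepdims=True)      # stabilize the exponentials
        P = np.exp(S)
        P /= P.sum(axis=1, keepdims=True)      # softmax probabilities
        grad = (P - Y).T @ X / N + lam * B     # gradient of the regularized loss
        B -= lr * grad
        B[-1] = 0.0                            # identifiability: pivot on class K
    return B
```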

As a log-linear model

The formulation of binary logistic regression as a log-linear model can be directly extended to multi-way regression. That is, we model the logarithm of the probability of seeing a given output using the linear predictor as well as an additional normalization factor, the logarithm of the partition function:

$$\ln \Pr(Y_i = k) = \boldsymbol{\beta}_k \cdot \mathbf{x}_i - \ln Z, \qquad k = 1, \dots, K.$$

As in the binary case, we need an extra term $-\ln Z$ to ensure that the whole set of probabilities forms a probability distribution, i.e. so that they all sum to one:

$$\sum_{k=1}^{K} \Pr(Y_i = k) = 1.$$

The reason why we need to add a term to ensure normalization, rather than multiply as is usual, is because we have taken the logarithm of the probabilities. Exponentiating both sides turns the additive term into a multiplicative factor, so that the probability is just the Gibbs measure:

$$\Pr(Y_i = k) = \frac{1}{Z}\, e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}.$$

The quantity Z is called the partition function for the distribution. We can compute the value of the partition function by applying the above constraint that requires all probabilities to sum to 1:

$$1 = \sum_{k=1}^{K} \Pr(Y_i = k) = \sum_{k=1}^{K} \frac{1}{Z}\, e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i} = \frac{1}{Z} \sum_{k=1}^{K} e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}.$$

Therefore:

$$Z = \sum_{k=1}^{K} e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}.$$

Note that this factor is "constant" in the sense that it is not a function of Yi, which is the variable over which the probability distribution is defined. However, it is definitely not constant with respect to the explanatory variables, or crucially, with respect to the unknown regression coefficients βk, which we will need to determine through some sort of optimization procedure.

The resulting equations for the probabilities are

$$\Pr(Y_i = k) = \frac{e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}}{\sum_{j=1}^{K} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i}}, \qquad k = 1, \dots, K.$$

The following function:

$$\operatorname{softmax}(k, x_1, \dots, x_K) = \frac{e^{x_k}}{\sum_{j=1}^{K} e^{x_j}}$$

is referred to as the softmax function. The reason is that the effect of exponentiating the values $x_1, \dots, x_K$ is to exaggerate the differences between them. As a result, $\operatorname{softmax}(k, x_1, \dots, x_K)$ will return a value close to 0 whenever $x_k$ is significantly less than the maximum of all the values, and will return a value close to 1 when applied to the maximum value, unless it is extremely close to the next-largest value. Thus, the softmax function can be used to construct a weighted average that behaves as a smooth function (which can be conveniently differentiated, etc.) and which approximates the indicator function

$$f(k) = \begin{cases} 1 & \text{if } k = \operatorname{arg\,max}(x_1, \dots, x_K), \\ 0 & \text{otherwise.} \end{cases}$$

Thus, we can write the probability equations as

$$\Pr(Y_i = k) = \operatorname{softmax}(k, \boldsymbol{\beta}_1 \cdot \mathbf{x}_i, \dots, \boldsymbol{\beta}_K \cdot \mathbf{x}_i).$$

The softmax function thus serves as the equivalent of the logistic function in binary logistic regression.
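
A numerically stable sketch of the softmax in Python; subtracting the maximum score before exponentiating avoids overflow and, by the shift-invariance discussed next, leaves the result unchanged:

```python
import numpy as np

def softmax(scores):
    """Map a vector of real-valued scores to a probability distribution."""
    z = scores - np.max(scores)   # shift-invariant, so this changes nothing
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # approx. [0.659, 0.242, 0.099]
```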

Note that not all of the $\boldsymbol{\beta}_k$ vectors of coefficients are uniquely identifiable. This is due to the fact that all probabilities must sum to 1, making one of them completely determined once all the rest are known. As a result, there are only K − 1 separately specifiable probabilities, and hence K − 1 separately identifiable vectors of coefficients. One way to see this is to note that if we add a constant vector to all of the coefficient vectors, the equations are identical:

$$\frac{e^{(\boldsymbol{\beta}_k + C) \cdot \mathbf{x}_i}}{\sum_{j=1}^{K} e^{(\boldsymbol{\beta}_j + C) \cdot \mathbf{x}_i}} = \frac{e^{C \cdot \mathbf{x}_i}\, e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}}{e^{C \cdot \mathbf{x}_i} \sum_{j=1}^{K} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i}} = \frac{e^{\boldsymbol{\beta}_k \cdot \mathbf{x}_i}}{\sum_{j=1}^{K} e^{\boldsymbol{\beta}_j \cdot \mathbf{x}_i}}.$$

As a result, it is conventional to set $\boldsymbol{\beta}_K = 0$ (or alternatively, one of the other coefficient vectors). Essentially, we set the constant so that one of the vectors becomes $\mathbf{0}$, and all of the other vectors get transformed into the difference between those vectors and the vector we chose. This is equivalent to "pivoting" around one of the K choices, and examining how much better or worse all of the other K − 1 choices are, relative to the choice we are pivoting around. Mathematically, we transform the coefficients as follows:

$$\boldsymbol{\beta}'_k = \boldsymbol{\beta}_k - \boldsymbol{\beta}_K \quad (k < K), \qquad \boldsymbol{\beta}'_K = 0.$$

This leads to the following equations:

$$\Pr(Y_i = k) = \frac{e^{\boldsymbol{\beta}'_k \cdot \mathbf{x}_i}}{1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}'_j \cdot \mathbf{x}_i}} \quad (k < K), \qquad \Pr(Y_i = K) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\boldsymbol{\beta}'_j \cdot \mathbf{x}_i}}.$$

Other than the prime symbols on the regression coefficients, this is exactly the same as the form of the model described above, in terms of K − 1 independent two-way regressions.
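
The non-identifiability is easy to verify numerically; a sketch showing that adding one constant vector C to every coefficient vector leaves all probabilities unchanged:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(4, 3))     # K = 4 coefficient vectors of length M + 1 = 3
x = np.array([1.0, 0.5, -2.0])
C = rng.normal(size=3)          # an arbitrary constant vector

def probs(B, x):
    e = np.exp(B @ x - np.max(B @ x))
    return e / e.sum()

print(np.allclose(probs(B, x), probs(B + C, x)))   # True
```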

As a latent-variable model

It is also possible to formulate multinomial logistic regression as a latent variable model, following the two-way latent variable model described for binary logistic regression. This formulation is common in the theory of discrete choice models, and makes it easier to compare multinomial logistic regression to the related multinomial probit model, as well as to extend it to more complex models.

Imagine that, for each data point i and possible outcome k = 1, 2, ..., K, there is a continuous latent variable $Y_{i,k}^{\ast}$ (i.e. an unobserved random variable) that is distributed as follows:

$$Y_{i,k}^{\ast} = \boldsymbol{\beta}_k \cdot \mathbf{x}_i + \varepsilon_k, \qquad k = 1, \dots, K,$$

where $\varepsilon_k \sim \operatorname{EV}_1(0, 1)$, i.e. a standard type-1 extreme value (Gumbel) distribution.

This latent variable can be thought of as the utility associated with data point i choosing outcome k, where there is some randomness in the actual amount of utility obtained, which accounts for other unmodeled factors that go into the choice. The value of the actual variable $Y_i$ is then determined in a non-random fashion from these latent variables (i.e. the randomness has been moved from the observed outcomes into the latent variables), where outcome k is chosen if and only if the associated utility (the value of $Y_{i,k}^{\ast}$) is greater than the utilities of all the other choices, i.e. if the utility associated with outcome k is the maximum of all the utilities. Since the latent variables are continuous, the probability of two having exactly the same value is 0, so we ignore the scenario. That is:

$$\Pr(Y_i = k) = \Pr\bigl(Y_{i,k}^{\ast} > Y_{i,j}^{\ast} \text{ for all } j \neq k\bigr)$$

Or equivalently:

$$\Pr(Y_i = k) = \Pr\bigl(\max(Y_{i,1}^{\ast}, \dots, Y_{i,K}^{\ast}) = Y_{i,k}^{\ast}\bigr)$$

Let's look more closely at the first equation, which we can write as follows:

$$\Pr(Y_i = k) = \Pr\bigl(Y_{i,k}^{\ast} - Y_{i,j}^{\ast} > 0 \text{ for all } j \neq k\bigr) = \Pr\bigl((\boldsymbol{\beta}_k - \boldsymbol{\beta}_j) \cdot \mathbf{x}_i > \varepsilon_j - \varepsilon_k \text{ for all } j \neq k\bigr).$$

There are a few things to realize here:

  1. In general, if $X_1 \sim \operatorname{EV}_1(a, b)$ and $X_2 \sim \operatorname{EV}_1(a, b)$ then $X_1 - X_2 \sim \operatorname{Logistic}(0, b)$. That is, the difference of two independent identically distributed extreme-value-distributed variables follows the logistic distribution, where the first parameter is unimportant. This is understandable since the first parameter is a location parameter, i.e. it shifts the mean by a fixed amount, and if two values are both shifted by the same amount, their difference remains the same. This means that all of the relational statements underlying the probability of a given choice involve the logistic distribution, which makes the initial choice of the extreme-value distribution, which seemed rather arbitrary, somewhat more understandable.
  2. The second parameter in an extreme-value or logistic distribution is a scale parameter, such that if $X \sim \operatorname{Logistic}(0, 1)$ then $bX \sim \operatorname{Logistic}(0, b)$. This means that the effect of using an error variable with an arbitrary scale parameter in place of scale 1 can be compensated simply by multiplying all regression vectors by the same scale. Together with the previous point, this shows that the use of a standard extreme-value distribution (location 0, scale 1) for the error variables entails no loss of generality over using an arbitrary extreme-value distribution. In fact, the model is nonidentifiable (no single set of optimal coefficients) if the more general distribution is used.
  3. Because only differences of vectors of regression coefficients are used, adding an arbitrary constant to all coefficient vectors has no effect on the model. This means that, just as in the log-linear model, only K − 1 of the coefficient vectors are identifiable, and the last one can be set to an arbitrary value (e.g. 0).

Actually finding the values of the above probabilities is somewhat difficult, and is a problem of computing a particular order statistic (the first, i.e. maximum) of a set of values. However, it can be shown that the resulting expressions are the same as in above formulations, i.e. the two are equivalent.
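
The equivalence can also be checked by simulation: adding standard Gumbel (type-1 extreme value) noise to each score and taking the argmax reproduces the softmax probabilities. A sketch, with invented scores:

```python
import numpy as np

rng = np.random.default_rng(2)
scores = np.array([1.0, 0.0, -0.5])     # beta_k . x_i for K = 3 outcomes
n = 200_000

# Latent utilities: score plus standard Gumbel noise; the chosen outcome
# is the one with the maximum utility.
draws = scores + rng.gumbel(size=(n, 3))
empirical = np.bincount(draws.argmax(axis=1), minlength=3) / n

e = np.exp(scores - scores.max())
print(empirical)        # simulated choice frequencies
print(e / e.sum())      # softmax probabilities; the two should nearly agree
```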

Estimation of intercept

When using multinomial logistic regression, one category of the dependent variable is chosen as the reference category. Separate odds ratios are determined for all independent variables for each category of the dependent variable with the exception of the reference category, which is omitted from the analysis. The exponentiated beta coefficient represents the change in the odds of the dependent variable being in a particular category vis-à-vis the reference category, associated with a one-unit change of the corresponding independent variable.
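
For instance, a sketch of reading a fitted coefficient as an odds ratio; the coefficient value here is invented:

```python
import math

b = 0.4             # hypothetical coefficient: one predictor, one non-reference category
print(math.exp(b))  # ~1.49: a one-unit increase in the predictor multiplies the odds
                    # of this category, relative to the reference category, by ~1.49
```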

Likelihood function

The observed values $y_i \in \{1, \dots, K\}$ for $i = 1, \dots, n$ of the explained variables are considered as realizations of stochastically independent, categorically distributed random variables $Y_1, \dots, Y_n$.

The likelihood function for this model is defined by

$$L = \prod_{i=1}^{n} \Pr(Y_i = y_i) = \prod_{i=1}^{n} \prod_{k=1}^{K} \Pr(Y_i = k)^{\delta_{k, y_i}},$$

where the index $i$ denotes the observations 1 to n and the index $k$ denotes the classes 1 to K. $\delta_{k, y_i}$ is the Kronecker delta, equal to 1 if $k = y_i$ and 0 otherwise.

The negative log-likelihood function is therefore the well-known cross-entropy:

$$-\log L = -\sum_{i=1}^{n} \sum_{k=1}^{K} \delta_{k, y_i} \log \Pr(Y_i = k).$$
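
A sketch of this cross-entropy in Python; the Kronecker delta simply selects, for each observation, the probability the model assigned to the class that was actually observed:

```python
import numpy as np

def neg_log_likelihood(P, y):
    """P: (n, K) predicted probabilities; y: (n,) observed classes coded 0..K-1."""
    return -np.log(P[np.arange(len(y)), y]).sum()

P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])
y = np.array([0, 2])
print(neg_log_likelihood(P, y))   # -(log 0.7 + log 0.3) ≈ 1.56
```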

Application in natural language processing

In natural language processing, multinomial LR classifiers are commonly used as an alternative to naive Bayes classifiers because they do not assume statistical independence of the random variables (commonly known as features) that serve as predictors. However, learning in such a model is slower than for a naive Bayes classifier, and thus may not be appropriate given a very large number of classes to learn. In particular, learning in a naive Bayes classifier is a simple matter of counting up the number of co-occurrences of features and classes, while in a maximum entropy classifier the weights, which are typically estimated using maximum a posteriori (MAP) estimation, must be learned using an iterative procedure; see #Estimating the coefficients.

References

  1. ^ Greene, William H. (2012). Econometric Analysis (Seventh ed.). Boston: Pearson Education. pp. 803–806. ISBN 978-0-273-75356-8.
  2. ^ Engel, J. (1988). "Polytomous logistic regression". Statistica Neerlandica. 42 (4): 233–252. doi:10.1111/j.1467-9574.1988.tb01238.x.
  3. ^ Menard, Scott (2002). Applied Logistic Regression Analysis. SAGE. p. 91. ISBN 9780761922087.
  4. ^ a b Malouf, Robert (2002). A comparison of algorithms for maximum entropy parameter estimation (PDF). Sixth Conf. on Natural Language Learning (CoNLL). pp. 49–55.
  5. ^ Belsley, David (1991). Conditioning diagnostics : collinearity and weak data in regression. New York: Wiley. ISBN 9780471528890.
  6. ^ Baltas, G.; Doyle, P. (2001). "Random Utility Models in Marketing Research: A Survey". Journal of Business Research. 51 (2): 115–125. doi:10.1016/S0148-2963(99)00058-2.
  7. ^ Stata Manual “mlogit — Multinomial (polytomous) logistic regression”
  8. ^ Darroch, J.N. & Ratcliff, D. (1972). "Generalized iterative scaling for log-linear models". The Annals of Mathematical Statistics. 43 (5): 1470–1480. doi:10.1214/aoms/1177692379.
  9. ^ Bishop, Christopher M. (2006). Pattern Recognition and Machine Learning. Springer. pp. 206–209.
  10. ^ Yu, Hsiang-Fu; Huang, Fang-Lan; Lin, Chih-Jen (2011). "Dual coordinate descent methods for logistic regression and maximum entropy models" (PDF). Machine Learning. 85 (1–2): 41–75. doi:10.1007/s10994-010-5221-8.