
The activation function of a node in an artificial neural network is a function that calculates the output of the node based on its individual inputs and their weights. Nontrivial problems can be solved using only a few nodes if the activation function is nonlinear.[1]

Logistic activation function

Modern activation functions include the logistic (sigmoid) function used in the 2012 speech recognition model developed by Hinton et al.;[2] the ReLU used in the 2012 AlexNet computer vision model[3][4] and in the 2015 ResNet model; and the GELU, a smooth version of the ReLU, which was used in the 2018 BERT model.[5]

Comparison of activation functions

Aside from their empirical performance, activation functions also have different mathematical properties:

Nonlinear
When the activation function is non-linear, then a two-layer neural network can be proven to be a universal function approximator.[6] This is known as the Universal Approximation Theorem. The identity activation function does not satisfy this property. When multiple layers use the identity activation function, the entire network is equivalent to a single-layer model.
Range
When the range of the activation function is finite, gradient-based training methods tend to be more stable, because pattern presentations significantly affect only limited weights. When the range is infinite, training is generally more efficient because pattern presentations significantly affect most of the weights. In the latter case, smaller learning rates are typically necessary.[citation needed]
Continuously differentiable
This property is desirable for enabling gradient-based optimization methods. ReLU is not continuously differentiable (its derivative jumps at 0), which causes some issues for gradient-based optimization, but such optimization is still possible. The binary step activation function is not differentiable at 0, and its derivative is 0 for all other values, so gradient-based methods can make no progress with it.[7]
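The claim that stacked identity-activation layers collapse into a single layer can be checked numerically. The following is a minimal sketch in plain Python; the weight matrices `W1` and `W2`, the input `x`, and the helper names are illustrative assumptions, and biases are omitted for brevity.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply matrices A and B, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


# Two layers with the identity activation (hypothetical weights).
W1 = [[1.0, 2.0], [0.5, -1.0]]
W2 = [[3.0, -1.0], [2.0, 4.0]]
x = [0.7, -0.3]

# Applying layer 1 and then layer 2 ...
two_layer = matvec(W2, matvec(W1, x))
# ... gives the same result as one layer whose weights are the product W2 W1.
single_layer = matvec(matmul(W2, W1), x)

print(two_layer, single_layer)
```

Inserting any nonlinear function between the two `matvec` calls breaks this equivalence, which is why a nonlinearity is needed for the network to represent more than a linear map.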

These properties do not decisively influence performance, nor are they the only mathematical properties that may be useful. For instance, the strictly positive range of the softplus makes it suitable for predicting variances in variational autoencoders.

Mathematical details

The most common activation functions can be divided into three categories: ridge functions, radial functions and fold functions.

An activation function $f$ is saturating if $\lim_{|v| \to \infty} |\nabla f(v)| = 0$. It is nonsaturating if $\lim_{|v| \to \infty} |\nabla f(v)| \neq 0$. Non-saturating activation functions, such as ReLU, may be better than saturating activation functions, because they are less likely to suffer from the vanishing gradient problem.[8]
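As a rough numerical illustration of saturation, the sketch below compares the derivative of the logistic function with that of ReLU at inputs of growing magnitude. The function names are illustrative, and the value of the ReLU derivative at exactly 0 is a convention chosen here.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def logistic_grad(v):
    # Derivative of the logistic function: s * (1 - s).
    s = logistic(v)
    return s * (1.0 - s)


def relu_grad(v):
    # Derivative of max(0, v); the value at v == 0 is a convention.
    return 1.0 if v > 0 else 0.0


for v in (-20.0, -5.0, 5.0, 20.0):
    print(v, logistic_grad(v), relu_grad(v))
```

The logistic gradient shrinks toward 0 in both directions, whereas the ReLU gradient stays at 1 for every positive input, so it does not tend to 0 as the input magnitude grows.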

Ridge activation functions

Ridge functions are multivariate functions acting on a linear combination of the input variables. Often used examples include:[clarification needed]

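  • Linear activation: $\phi(\mathbf{v}) = a + \mathbf{v}'\mathbf{b}$,
  • ReLU activation: $\phi(\mathbf{v}) = \max(0, a + \mathbf{v}'\mathbf{b})$,
  • Heaviside activation: $\phi(\mathbf{v}) = 1_{a + \mathbf{v}'\mathbf{b} > 0}$,
  • Logistic activation: $\phi(\mathbf{v}) = (1 + \exp(-a - \mathbf{v}'\mathbf{b}))^{-1}$.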
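The four ridge activations listed above all apply a scalar function to the same affine combination of the inputs. A minimal sketch, assuming the usual definitions and using hypothetical names (`pre_activation`, offset `a`, weight vector `b`):

```python
import math


def pre_activation(v, a, b):
    """Affine combination a + v.b of the input vector v."""
    return a + sum(vi * bi for vi, bi in zip(v, b))


def linear(v, a, b):
    return pre_activation(v, a, b)


def relu(v, a, b):
    return max(0.0, pre_activation(v, a, b))


def heaviside(v, a, b):
    # Indicator of a + v.b > 0.
    return 1.0 if pre_activation(v, a, b) > 0 else 0.0


def logistic(v, a, b):
    return 1.0 / (1.0 + math.exp(-pre_activation(v, a, b)))


# Illustrative input, offset and weights.
v, a, b = [1.0, 2.0], 0.5, [1.0, -1.0]
for f in (linear, relu, heaviside, logistic):
    print(f.__name__, f(v, a, b))
```

Because every function depends on `v` only through `a + v.b`, each one is constant along directions orthogonal to `b`, which is the defining feature of a ridge function.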

In biologically inspired neural networks, the activation function is usually an abstraction representing the rate of action potential firing in the cell.[9] In its simplest form, this function is binary—that is, either the neuron is firing or not. Neurons also cannot fire faster than a certain rate, motivating sigmoid activation functions whose range is a finite interval.

The function looks like $\phi(\mathbf{v}) = U(a + \mathbf{v}'\mathbf{b})$, where $U$ is the Heaviside step function.

If a line has a positive slope, on the other hand, it may reflect the increase in firing rate that occurs as input current increases. Such a function would be of the form $\phi(\mathbf{v}) = a + \mathbf{v}'\mathbf{b}$.

 
Rectified linear unit and Gaussian error linear unit activation functions

Radial activation functions

A special class of activation functions known as radial basis functions (RBFs) are used in RBF networks. These activation functions can take many forms, but they are usually found as one of the following functions:

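  • Gaussian: $\phi(\mathbf{v}) = \exp\left(-\dfrac{\|\mathbf{v} - \mathbf{c}\|^2}{2\sigma^2}\right)$
  • Multiquadratics: $\phi(\mathbf{v}) = \sqrt{\|\mathbf{v} - \mathbf{c}\|^2 + a^2}$
  • Inverse multiquadratics: $\phi(\mathbf{v}) = \left(\|\mathbf{v} - \mathbf{c}\|^2 + a^2\right)^{-1/2}$
  • Polyharmonic splines

where $\mathbf{c}$ is the vector representing the function center and $a$ and $\sigma$ are parameters affecting the spread of the radius.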
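Radial activations depend on the input only through its distance from a center. The sketch below implements the Gaussian, multiquadric and inverse multiquadric forms under their usual definitions; the function names and the sample center are illustrative assumptions.

```python
import math


def sq_dist(v, c):
    """Squared Euclidean distance between input v and center c."""
    return sum((vi - ci) ** 2 for vi, ci in zip(v, c))


def gaussian_rbf(v, c, sigma):
    return math.exp(-sq_dist(v, c) / (2.0 * sigma ** 2))


def multiquadric(v, c, a):
    return math.sqrt(sq_dist(v, c) + a ** 2)


def inverse_multiquadric(v, c, a):
    return 1.0 / math.sqrt(sq_dist(v, c) + a ** 2)


center = [0.0, 1.0]  # hypothetical center
for point in ([0.0, 1.0], [1.0, 1.0], [3.0, 5.0]):
    print(point,
          gaussian_rbf(point, center, sigma=1.0),
          multiquadric(point, center, a=1.0),
          inverse_multiquadric(point, center, a=1.0))
```

The Gaussian and inverse multiquadric responses peak at the center and decay with distance, while the multiquadric grows with distance.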

Other examples

Periodic functions can serve as activation functions. Usually the sinusoid is used, as any periodic function is decomposable into sinusoids by the Fourier transform.[10]

Quadratic activation maps $x \mapsto x^2$.[11][12]

Folding activation functions

Folding activation functions are extensively used in the pooling layers in convolutional neural networks, and in output layers of multiclass classification networks. These activations perform aggregation over the inputs, such as taking the mean, minimum or maximum. In multiclass classification the softmax activation is often used.
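The aggregation performed by folding activations can be sketched in a few lines. The example below shows one-dimensional max and mean pooling over non-overlapping windows, plus a softmax; the helper names (`pool`, `mean`) and the window layout are illustrative assumptions rather than any particular library's API.

```python
import math


def softmax(xs):
    """Map a list of scores to positive values that sum to 1."""
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean(window):
    return sum(window) / len(window)


def pool(xs, size, reduce_fn):
    """Apply reduce_fn to consecutive non-overlapping windows of length size."""
    return [reduce_fn(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


signal = [1.0, 3.0, 2.0, 8.0]
print(pool(signal, 2, max))    # max pooling
print(pool(signal, 2, mean))   # mean pooling
print(softmax([1.0, 2.0, 3.0]))
```

In each case several inputs are folded into fewer outputs (pooling) or into a normalized vector (softmax), in contrast to the elementwise functions discussed earlier.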

Table of activation functions

The following table compares the properties of several activation functions that are functions of one fold x from the previous layer or layers:

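| Name | Function, $g(x)$ | Derivative of $g$, $g'(x)$ | Range | Order of continuity |
|---|---|---|---|---|
| Identity | $x$ | $1$ | $(-\infty, \infty)$ | $C^\infty$ |
| Binary step | $\begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}$ | $0$ for $x \neq 0$ | $\{0, 1\}$ | $C^{-1}$ |
| Logistic, sigmoid, or soft step | $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ | $g(x)(1 - g(x))$ | $(0, 1)$ | $C^\infty$ |
| Hyperbolic tangent (tanh) | $\tanh(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | $1 - g(x)^2$ | $(-1, 1)$ | $C^\infty$ |
| Soboleva modified hyperbolic tangent (smht) | $\dfrac{e^{ax} - e^{-bx}}{e^{cx} + e^{-dx}}$ | | | $C^\infty$ |
| Softsign | $\dfrac{x}{1 + \lvert x \rvert}$ | $\dfrac{1}{(1 + \lvert x \rvert)^2}$ | $(-1, 1)$ | $C^1$ |
| Rectified linear unit (ReLU)[13] | $\max(0, x)$ | $\begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases}$ | $[0, \infty)$ | $C^0$ |
| Gaussian Error Linear Unit (GELU)[5] | $\tfrac{1}{2} x \left(1 + \operatorname{erf}\left(\tfrac{x}{\sqrt{2}}\right)\right) = x\Phi(x)$, where $\operatorname{erf}$ is the Gaussian error function and $\Phi$ is the cumulative distribution function of the standard Gaussian distribution. | $\Phi(x) + x\phi(x)$, where $\phi$ is the probability density function of the standard Gaussian distribution. | $[-0.17, \infty)$ (approx.) | $C^\infty$ |
| Softplus[14] | $\ln(1 + e^{x})$ | $\dfrac{1}{1 + e^{-x}}$ | $(0, \infty)$ | $C^\infty$ |
| Exponential linear unit (ELU)[15] | $\begin{cases} \alpha(e^{x} - 1) & \text{if } x \leq 0 \\ x & \text{if } x > 0 \end{cases}$ with parameter $\alpha$ | $\begin{cases} \alpha e^{x} & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases}$ | $(-\alpha, \infty)$ | $C^1$ if $\alpha = 1$, otherwise $C^0$ |
| Scaled exponential linear unit (SELU)[16] | $\lambda \begin{cases} \alpha(e^{x} - 1) & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$ with parameters $\lambda = 1.0507$ and $\alpha = 1.67326$ | $\lambda \begin{cases} \alpha e^{x} & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}$ | $(-\lambda\alpha, \infty)$ | $C^0$ |
| Leaky rectified linear unit (Leaky ReLU)[17] | $\begin{cases} 0.01x & \text{if } x \leq 0 \\ x & \text{if } x > 0 \end{cases}$ | $\begin{cases} 0.01 & \text{if } x < 0 \\ 1 & \text{if } x > 0 \end{cases}$ | $(-\infty, \infty)$ | $C^0$ |
| Parametric rectified linear unit (PReLU)[18] | $\begin{cases} \alpha x & \text{if } x < 0 \\ x & \text{if } x \geq 0 \end{cases}$ with parameter $\alpha$ | $\begin{cases} \alpha & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}$ | $(-\infty, \infty)$ for $\alpha > 0$ | $C^0$ |
| Rectified Parametric Sigmoid Units (flexible, 5 parameters)[19] | | | | |
| Sigmoid linear unit (SiLU,[5] Sigmoid shrinkage,[20] SiL,[21] or Swish-1[22]) | $\dfrac{x}{1 + e^{-x}}$ | $\dfrac{1 + e^{-x} + x e^{-x}}{(1 + e^{-x})^2}$ | $[-0.278, \infty)$ (approx.) | $C^\infty$ |
| Exponential Linear Sigmoid SquasHing (ELiSH)[23] | $\begin{cases} \dfrac{e^{x} - 1}{1 + e^{-x}} & \text{if } x < 0 \\ \dfrac{x}{1 + e^{-x}} & \text{if } x \geq 0 \end{cases}$ | $\begin{cases} e^{x}\sigma(x) + (e^{x} - 1)\,\sigma(x)(1 - \sigma(x)) & \text{if } x < 0 \\ \sigma(x) + x\,\sigma(x)(1 - \sigma(x)) & \text{if } x \geq 0 \end{cases}$ | $[-0.172, \infty)$ (approx.; the minimum is attained near $x \approx -0.881$) | $C^1$ |
| Gaussian | $e^{-x^2}$ | $-2x e^{-x^2}$ | $(0, 1]$ | $C^\infty$ |
| Sinusoid | $\sin x$ | $\cos x$ | $[-1, 1]$ | $C^\infty$ |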
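Several of the tabulated functions and their derivatives can be cross-checked numerically. The sketch below implements SiLU, GELU, softplus and ELU from their standard closed forms and compares each analytic derivative with a central finite difference; the function names, the step size `h`, and the sample points are illustrative assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def silu(x):
    return x * sigmoid(x)


def silu_grad(x):
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


def gelu(x):
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x):
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))       # standard Gaussian CDF
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)  # standard Gaussian PDF
    return cdf + x * pdf


def softplus(x):
    return math.log1p(math.exp(x))


def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def elu_grad(x, alpha=1.0):
    return 1.0 if x > 0 else alpha * math.exp(x)


def numeric_grad(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


PAIRS = [(silu, silu_grad), (gelu, gelu_grad), (softplus, sigmoid), (elu, elu_grad)]
for f, df in PAIRS:
    for x in (-2.0, -0.5, 0.3, 1.5):
        print(f.__name__, x, df(x), numeric_grad(f, x))
```

The sample points deliberately avoid 0, where piecewise functions such as ELU with a parameter other than 1 have a kink and a finite difference would straddle both branches.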

The following table lists activation functions that are not functions of a single fold x from the previous layer or layers:

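| Name | Equation, $g_i(\vec{x})$ | Derivatives, $\dfrac{\partial g_i(\vec{x})}{\partial x_j}$ | Range | Order of continuity |
|---|---|---|---|---|
| Softmax | $\dfrac{e^{x_i}}{\sum_{j=1}^{J} e^{x_j}}$ for $i = 1, \ldots, J$ | $g_i(\vec{x})\left(\delta_{ij} - g_j(\vec{x})\right)$ [a][b] | $(0, 1)$ | $C^\infty$ |
| Maxout[24] | $\max_i x_i$ | $\begin{cases} 1 & \text{if } j = \operatorname{argmax}_i x_i \\ 0 & \text{otherwise} \end{cases}$ | $(-\infty, \infty)$ | $C^0$ |

a. ^ Here, $\delta_{ij}$ is the Kronecker delta.
b. ^ For instance, $j$ could be iterating through the number of kernels of the previous neural network layer while $i$ iterates through the number of kernels of the current layer.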
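The softmax derivative is usually written with a Kronecker delta as the product of output i with (delta_ij minus output j). A minimal sketch of that Jacobian follows; the function names and the input vector are illustrative assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(xs):
    """Return J with J[i][j] = g_i * (delta_ij - g_j), where g = softmax(xs)."""
    g = softmax(xs)
    n = len(g)
    return [[g[i] * ((1.0 if i == j else 0.0) - g[j]) for j in range(n)]
            for i in range(n)]


xs = [0.2, -1.0, 0.5]
for row in softmax_jacobian(xs):
    print(row)
```

Each row of the Jacobian sums to zero, reflecting the fact that the softmax outputs always sum to 1, so any change in one output is offset by the others.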

Quantum activation functions

In quantum neural networks programmed on gate-model quantum computers, based on quantum perceptrons instead of variational quantum circuits, the non-linearity of the activation function can be implemented with no need of measuring the output of each perceptron at each layer. The quantum properties loaded within the circuit such as superposition can be preserved by creating the Taylor series of the argument computed by the perceptron itself, with suitable quantum circuits computing the powers up to a wanted approximation degree. Because of the flexibility of such quantum circuits, they can be designed in order to approximate any arbitrary classical activation function.[25]
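The idea of approximating an activation function by a truncated power series can be illustrated classically. The sketch below is not the quantum circuit construction of the cited work; it only shows, in ordinary Python, how a polynomial built from powers of the argument up to a chosen degree approaches the logistic function near 0. The coefficient list and function names are illustrative assumptions.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Maclaurin coefficients of the logistic function up to degree 5:
# 1/2 + x/4 - x^3/48 + x^5/480
COEFFS = [0.5, 0.25, 0.0, -1.0 / 48.0, 0.0, 1.0 / 480.0]


def taylor_logistic(x, degree):
    """Evaluate the series truncated at the given degree (at most 5)."""
    return sum(c * x ** k for k, c in enumerate(COEFFS[:degree + 1]))


for degree in (1, 3, 5):
    error = abs(taylor_logistic(0.5, degree) - logistic(0.5))
    print(degree, error)
```

Raising the degree reduces the error for small arguments, which mirrors the trade-off between circuit size and approximation quality described above; the series is only accurate near 0.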

References

  1. ^ Hinkelmann, Knut. "Neural Networks, p. 7" (PDF). University of Applied Sciences Northwestern Switzerland. Archived from the original (PDF) on 2025-08-07. Retrieved 2025-08-07.
  2. ^ Hinton, Geoffrey; Deng, Li; Yu, Dong; Dahl, George; Mohamed, Abdel-rahman; Jaitly, Navdeep; Senior, Andrew; Vanhoucke, Vincent; Nguyen, Patrick; Sainath, Tara; Kingsbury, Brian (2012). "Deep Neural Networks for Acoustic Modeling in Speech Recognition". IEEE Signal Processing Magazine. 29 (6): 82–97. doi:10.1109/MSP.2012.2205597. S2CID 206485943.
  3. ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017). "ImageNet classification with deep convolutional neural networks". Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782.
  4. ^ King Abdulaziz University; Al-johania, Norah; Elrefaei, Lamiaa; Benha University (2019). "Dorsal Hand Vein Recognition by Convolutional Neural Networks: Feature Learning and Transfer Learning Approaches" (PDF). International Journal of Intelligent Engineering and Systems. 12 (3): 178–191. doi:10.22266/ijies2019.0630.19.
  5. ^ a b c Hendrycks, Dan; Gimpel, Kevin (2016). "Gaussian Error Linear Units (GELUs)". arXiv:1606.08415 [cs.LG].
  6. ^ Cybenko, G. (December 1989). "Approximation by superpositions of a sigmoidal function" (PDF). Mathematics of Control, Signals, and Systems. 2 (4): 303–314. Bibcode:1989MCSS....2..303C. doi:10.1007/BF02551274. ISSN 0932-4194. S2CID 3958369.
  7. ^ Snyman, Jan (3 March 2005). Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer Science & Business Media. ISBN 978-0-387-24348-1.
  8. ^ Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. (2017). "ImageNet classification with deep convolutional neural networks". Communications of the ACM. 60 (6): 84–90. doi:10.1145/3065386. ISSN 0001-0782. S2CID 195908774.
  9. ^ Hodgkin, A. L.; Huxley, A. F. (1952). "A quantitative description of membrane current and its application to conduction and excitation in nerve". The Journal of Physiology. 117 (4): 500–544. doi:10.1113/jphysiol.1952.sp004764. PMC 1392413. PMID 12991237.
  10. ^ Sitzmann, Vincent; Martel, Julien; Bergman, Alexander; Lindell, David; Wetzstein, Gordon (2020). "Implicit Neural Representations with Periodic Activation Functions". Advances in Neural Information Processing Systems. 33. Curran Associates, Inc.: 7462–7473. arXiv:2006.09661.
  11. ^ Flake, Gary William (1998), Orr, Genevieve B.; Müller, Klaus-Robert (eds.), "Square Unit Augmented Radially Extended Multilayer Perceptrons", Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science, vol. 1524, Berlin, Heidelberg: Springer, pp. 145–163, doi:10.1007/3-540-49430-8_8, ISBN 978-3-540-49430-0, retrieved 2025-08-07
  12. ^ Du, Simon; Lee, Jason (2018). "On the Power of Over-parametrization in Neural Networks with Quadratic Activation". Proceedings of the 35th International Conference on Machine Learning. PMLR: 1329–1338. arXiv:1803.01206.
  13. ^ Nair, Vinod; Hinton, Geoffrey E. (2010), "Rectified Linear Units Improve Restricted Boltzmann Machines", 27th International Conference on International Conference on Machine Learning, ICML'10, USA: Omnipress, pp. 807–814, ISBN 9781605589077
  14. ^ Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua (2011). "Deep sparse rectifier neural networks" (PDF). International Conference on Artificial Intelligence and Statistics.
  15. ^ Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp (2015). "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)". arXiv:1511.07289 [cs.LG].
  16. ^ Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp (2017). "Self-Normalizing Neural Networks". Advances in Neural Information Processing Systems. 30 (2017). arXiv:1706.02515.
  17. ^ Maas, Andrew L.; Hannun, Awni Y.; Ng, Andrew Y. (June 2013). "Rectifier nonlinearities improve neural network acoustic models". Proc. ICML. 30 (1). S2CID 16489696.
  18. ^ He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2015). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].
  19. ^ Atto, Abdourrahmane M.; Galichet, Sylvie; Pastor, Dominique; Méger, Nicolas (2023), "On joint parameterizations of linear and nonlinear functionals in neural networks", Elsevier Pattern Recognition, vol. 160, pp. 12–21, arXiv:2101.09948, doi:10.1016/j.neunet.2022.12.019, PMID 36592526
  20. ^ Atto, Abdourrahmane M.; Pastor, Dominique; Mercier, Grégoire (2008), "Smooth sigmoid wavelet shrinkage for non-parametric estimation" (PDF), 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3265–3268, doi:10.1109/ICASSP.2008.4518347, ISBN 978-1-4244-1483-3, S2CID 9959057
  21. ^ Elfwing, Stefan; Uchibe, Eiji; Doya, Kenji (2018). "Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning". Neural Networks. 107: 3–11. arXiv:1702.03118. doi:10.1016/j.neunet.2017.12.012. PMID 29395652. S2CID 6940861.
  22. ^ Ramachandran, Prajit; Zoph, Barret; Le, Quoc V (2017). "Searching for Activation Functions". arXiv:1710.05941 [cs.NE].
  23. ^ Basirat, Mina; Roth, Peter M. (2018), The Quest for the Golden Activation Function, arXiv:1808.00783
  24. ^ Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua (2013). "Maxout Networks". JMLR Workshop and Conference Proceedings. 28 (3): 1319–1327. arXiv:1302.4389.
  25. ^ Maronese, Marco; Destri, Claudio; Prati, Enrico (2022). "Quantum activation functions for quantum neural networks". Quantum Information Processing. 21 (4): 128. arXiv:2201.03700. Bibcode:2022QuIP...21..128M. doi:10.1007/s11128-022-03466-0. ISSN 1570-0755.
