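1. The difference between is and ==
Anyone who has learned Python knows that is and == are both used to compare Python objects. The difference is:
- is checks identity: the result is True only when both operands are the very same object, that is, they share one memory address
- == checks equality: the result is True as long as the values of the two objects are equal
Let's look at an example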
    import time

    t1 = time.time()
    t2 = time.time()
    print("Value of t1:", t1)
    print("Value of t2:", t2)
    print(t1 == t2)
    print(t1 is t2)

    # Output
    # Value of t1: 1679973143.1747568
    # Value of t2: 1679973143.1747568
    # True
    # False
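The time() function of the time module returns the current time as a float. The two calls ran so close together that t1 and t2 ended up with the same value
== checks whether the values of t1 and t2 are equal, so it returns True
Although t1 and t2 have equal values, they are two different objects (each call to time() returns a new object), so t1 is t2 returns False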
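So how do we tell whether two objects are really the same object?
Answer: compare their memory addresses, which CPython exposes through id(). If the addresses match, both names refer to the same block of memory, so they are the same object
Let's look at the memory addresses of t1 and t2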
    import time

    t1 = time.time()
    t2 = time.time()
    print("Memory address of t1:", id(t1))
    print("Memory address of t2:", id(t2))

    # Output
    # Memory address of t1: 2251407006832
    # Memory address of t2: 2251405788464
As you can see, the two memory addresses are different
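To make the distinction concrete, here is a minimal sketch using lists. The names original, alias and clone are made up for illustration. Binding a second name to an object keeps its identity, while building an equal object gives a new identity.

```python
original = [1, 2, 3]
alias = original          # a second name for the same object
clone = list(original)    # a new list with equal contents

print(alias is original, id(alias) == id(original))  # True True
print(clone == original)                             # True
print(clone is original)                             # False

# A change made through the alias is visible through the original name,
# but the clone is a separate object and stays as it was
alias.append(4)
print(original)  # [1, 2, 3, 4]
print(clone)     # [1, 2, 3]
```

This is also why is matters in practice: two names that are identical share every later mutation, two names that are merely equal do not.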
2. Small integer pool & caching mechanism
However, some readers may run into the following situation
    a = 4
    b = 4
    print(a == b)  # True
    print(a is b)  # True
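Huh? Why is a is b True? Shouldn't these be two different objects?
This is because of the small integer pool
CPython keeps a set of frequently used integers in a small integer pool, which covers the range [-5, 256]
These integers are created ahead of time, so binding them to new names never allocates new memory. Integers outside the pool, by contrast, generally get a newly allocated object each time one is created
So numbers inside the small integer pool always share one memory address, while numbers outside the pool usually have different addresses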
    a = 4
    b = 4
    print(id(a))
    print(id(b))

    # Output
    # 2308490488208
    # 2308490488208
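The pool can also be probed without writing the same literal twice. The sketch below builds each integer at runtime from a string, so the result depends only on the pool and not on how the source file is compiled. The helper name same_object is made up for illustration, and the behavior is a CPython implementation detail rather than a language guarantee.

```python
def same_object(n):
    """Build the integer n twice at runtime and report whether both results are one object."""
    first = int(str(n))
    second = int(str(n))
    return first is second

# Values at and around the edges of the pool, plus one far outside it
for n in (-6, -5, 0, 256, 257, 10**6):
    print(n, same_object(n))

# An arithmetic result that lands inside the pool is served from it as well
x = 250
print((x + 6) is int("256"))
```

On the CPython versions I am aware of, the values in [-5, 256] print True and the others print False.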
OK, this time let's use a number outside the small integer pool
    a = 1000
    b = 1000
    print(a == b)  # True
    print(a is b)  # True

    a = 1000
    b = 1000
    print(id(a))
    print(id(b))

    # Output
    # 2102348852368
    # 2102348852368
What? Is this a joke? We just said that numbers outside the small integer pool have different memory addresses, so why does the code above say otherwise?
I ran the code above in an IDE. Let's try typing it in interactive mode instead
    # Inside the small integer pool
    >>> a = 4
    >>> b = 4
    >>> print(a == b)
    True
    >>> print(a is b)
    True

    # Outside the small integer pool
    >>> a = 1000
    >>> b = 1000
    >>> print(a == b)
    True
    >>> print(a is b)
    False
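As you can see, in interactive mode, numbers outside the small integer pool do have different memory addresses. Why is that?
The conclusion first: this is because of Python's caching mechanism. In an IDE or in script mode the whole file is compiled together, so an integer referenced by several variables reuses one object instead of allocating new memory. Interactive mode compiles each statement you enter separately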
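Python's caching mechanism
- When the Python interpreter starts, it sets aside a small portion of memory to store frequently used data (immutable data types). This greatly reduces the cost of requesting memory when such objects are created and releasing it when they are destroyed
- Within the same code block, an immutable object (number, string, tuple) referenced by several variables does not get memory allocated again
In other words, only immutable data types (strings, tuples, basic data types) avoid repeated allocation when they are referenced by several variables. Mutable data types (lists, dictionaries, sets) are the exception. All of this is CPython behavior, not something the language guarantees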
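The "same code block" rule can be reproduced without switching between an IDE and the interactive prompt. This sketch uses the built-in compile() and exec(); the helper name run and the "<demo>" file name are made up for illustration. Compiling two assignments together mimics a script, and compiling them separately mimics typing them one at a time.

```python
def run(source):
    """Compile source as one code block, execute it, and return its namespace."""
    namespace = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace

# Both assignments live in one code block, as in a script
together = run("a = 1000\nb = 1000")
print(together["a"] is together["b"])

# Each assignment is its own code block, as in interactive mode
first = run("a = 1000")
second = run("b = 1000")
print(first["a"] is second["b"])
```

On a standard CPython build I would expect True for the first print and False for the second, which matches the IDE and interactive results shown earlier.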
- Mutable data types
Let's take a look
    # Lists
    l1 = [1, 2, 3]
    l2 = [1, 2, 3]
    print(id(l1))
    print(id(l2))
    print(l1 is l2)

    # Output
    # 2157601558656
    # 2157601388224
    # False

    # Dictionaries
    dict1 = {'name': "kanye", "age": 18}
    dict2 = {'name': "kanye", "age": 18}
    print(id(dict1))
    print(id(dict2))
    print(dict1 is dict2)

    # Output
    # 2096576418240
    # 2096576418432
    # False

    # Sets
    s1 = {1, 2, '3'}
    s2 = {1, 2, '3'}
    print(id(s1))
    print(id(s2))
    print(s1 is s2)

    # Output
    # 2326184069152
    # 2326184068928
    # False
The results are the same in interactive mode
    >>> s1 = {1, 2, '3'}
    >>> s2 = {1, 2, '3'}
    >>> print(s1 is s2)
    False
    >>> dict1 = {'name': "kanye", "age": 18}
    >>> dict2 = {'name': "kanye", "age": 18}
    >>> print(dict1 is dict2)
    False
    >>> l1 = [1, 2, 3]
    >>> l2 = [1, 2, 3]
    >>> print(l1 is l2)
    False
- Immutable data types
1. Numbers in the small integer pool
Let's see how the caching mechanism treats immutable data types in interactive mode
    >>> a = 4
    >>> b = 4
    >>> print(a is b)
    True
    >>> num1 = 100
    >>> num2 = 100
    >>> print(num1 is num2)
    True
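As you can see, integers in the range [-5, 256] are permanently cached. Any use of a number in this range, whether it comes from a direct assignment or from evaluating an expression, refers to the cached object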
2. Numbers outside the small integer pool
In an IDE, numbers outside the small integer pool also use the cache: several variables referring to the same value share one object, and no new memory is allocated
    # Every result is True
    a = -10
    b = -10
    print(a is b)
    num1 = 1.0
    num2 = 1.0
    print(num1 is num2)
    n1 = 1000
    n2 = 1000
    print(n1 is n2)
In interactive mode, numbers outside the small integer pool use the caching mechanism only when they are assigned in a single statement or inside the same code block
    # Simultaneous assignment
    >>> n1, n2 = 1000, 1000
    >>> print(n1 is n2)
    True
    >>> f1, f2 = -10.2, -10.2
    >>> print(f1 is f2)
    True

    # Assignment inside the same code block
    >>> for i in range(3):
    ...     a = -10
    ...     b = -10
    ...     print(a is b)
    ...
    True
    True
    True
    >>> for i in range(3):
    ...     a = 1000
    ...     b = 1000
    ...     print(a is b)
    ...
    True
    True
    True
    >>> for i in range(3):
    ...     num1 = -100
    ...     num2 = -100
    ...     print(num1 is num2)
    ...
    True
    True
    True
    >>> for i in range(3):
    ...     f1 = -10.2
    ...     f2 = -10.2
    ...     print(f1 is f2)
    ...
    True
    True
    True
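4. The intern mechanism
As we know, because of Python's caching mechanism:
- Immutable data types (strings, tuples, basic data types) referenced by several variables do not get memory allocated again
- Mutable data types (lists, dictionaries, sets) get newly allocated memory for every object that is created, even when the contents are equal
- Integers in the small integer pool referenced by several variables do not get memory allocated again
What we have learned so far is this: in interactive mode, apart from the special cases (simultaneous assignment, assignment inside the same local code block) and the small integer pool, every value bound to several variables gets newly allocated memory
There is actually one more special case. Let's look at this example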
    # Interactive mode
    >>> s1 = 'hello'
    >>> s2 = 'hello'
    >>> print(s1 is s2)
    True
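Compare this output with what we have just learned. Does something look off?
In interactive mode, several variables referring to a string (an immutable data type) should each get newly allocated memory, so why didn't that happen in the example above?
The intern mechanism
Strings are among the most commonly used data types in Python. To make string handling more efficient, Python uses a technique called interning (string interning)
That is, only one copy of each interned string value is kept, in a shared string pool. When a new variable refers to the same string, no new memory is allocated and the variable refers to the shared string instead
- How it works
The intern mechanism is simple to implement: maintain a string pool, which is a dictionary. If a string already exists in the pool, no new string is created and the existing object is returned directly. If it is not in the pool yet, a string object is constructed and added to the pool so that it can be fetched next time
Below is a simplified sketch of the idea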
    intern_pool = {}

    def intern(s):
        # Reuse the pooled object when an equal string is already stored
        if s in intern_pool:
            return intern_pool[s]
        # Otherwise add this string to the pool and return it
        intern_pool[s] = s
        return s
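The standard library exposes this pool directly through sys.intern. The sketch below builds two equal strings at runtime, so nothing is shared ahead of time, and then interns them by hand. The variable names are made up for illustration.

```python
import sys

# Build the strings at runtime so they are not shared as compile-time constants
s1 = "".join(["hello", "&"])
s2 = "".join(["hello", "&"])
print(s1 == s2)  # True
print(s1 is s2)  # False on CPython: two separate objects

# sys.intern returns the pooled object for any equal string
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # True
```

Interning by hand is mostly useful when a program holds many equal strings, for example as dictionary keys, and wants them to share one object.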
1. In interactive mode, only strings made up solely of letters, digits, and underscores trigger the intern mechanism
    >>> s1 = 'hello'
    >>> s2 = 'hello'
    >>> print(s1 is s2)
    True

    # With special characters, the intern mechanism is not triggered
    >>> s1 = 'hello&'
    >>> s2 = 'hello&'
    >>> print(s1 is s2)
    False
    >>> a = '12 3'
    >>> b = '12 3'
    >>> print(a is b)
    False
2. In an IDE or in script mode, strings trigger the intern mechanism even when they contain special characters, as long as their length does not exceed 20 (the length limit)
    a = '12 3'
    b = '12 3'
    print(a is b)  # True
    s1 = 'hello&'
    s2 = 'hello&'
    print(s1 is s2)  # True

    s1 = "a b" * 10
    s2 = "a b" * 10
    print(s1 is s2)  # True
    s1 = "ab" * 11
    s2 = "ab" * 11
    print(s1 is s2)  # False
PS: I wrote this article using Python 3.9 and found that the length limit of 20 no longer applies there. All of the strings below trigger the intern mechanism
    s1 = "a b" * 22
    s2 = "a b" * 22
    print(s1 is s2)  # True
    s1 = "%^&?" * 22
    s2 = "%^&?" * 22
    print(s1 is s2)  # True
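One plausible reason why expressions such as "ab" * 11 end up shared in a script is that the compiler evaluates constant expressions ahead of time and stores the result as a constant of the code block. The sketch below inspects co_consts, a real attribute of code objects. The helper name constants_of and the "<demo>" file name are made up for illustration, and what gets folded is a CPython detail that differs between versions.

```python
def constants_of(source):
    """Compile source and return the constants stored in its code object."""
    return compile(source, "<demo>", "exec").co_consts

# An expression made only of literals can be evaluated by the compiler
print(constants_of('s = "ab" * 3'))

# With a variable operand the product only exists at runtime
print(constants_of('n = 3\ns = "ab" * n'))
```

On recent CPython versions I would expect 'ababab' to appear in the first tuple and not in the second.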