Computing cosine similarity in Python
Date: 2023-09-03 11:15:56
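Cosine similarity can be computed with NumPy's `np.dot` and `np.linalg.norm` functions. (NumPy's `np.cos` is the trigonometric cosine of an angle and does not compute cosine similarity.) Given two vectors a and b, the following code computes their cosine similarity: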
```python
import numpy as np
# Define two vectors a and b
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Cosine similarity: dot product divided by the product of the norms
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(cos_sim)
```
Output:
```
0.9746318461970762
```
Here, `np.dot(a, b)` is the dot product of a and b, and `np.linalg.norm(a)` and `np.linalg.norm(b)` are the Euclidean norms (lengths) of a and b.
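The one-line expression divides by zero if either vector is all zeros. As a minimal sketch, the same computation can be wrapped in a helper that guards against that case. The name `cosine_sim` and the choice to return 0.0 for a zero vector are assumptions made for illustration; cosine similarity is mathematically undefined there, and other conventions (raising an error, returning NaN) are equally reasonable.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity of two 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Undefined for a zero vector; 0.0 is a convention chosen for this sketch.
        return 0.0
    return float(np.dot(a, b) / denom)

print(cosine_sim([1, 2, 3], [4, 5, 6]))  # same vectors as in the example above
print(cosine_sim([0, 0, 0], [4, 5, 6]))  # zero vector: returns 0.0 by convention
```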
Related questions
Python: compute cosine similarity and select the 10 highest
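Cosine similarity between texts can be computed with the scikit-learn library. The basic steps are:
1. Vectorize the texts, for example with TF-IDF or a bag-of-words model.
2. Compute the cosine similarity matrix of the texts.
3. For each text, select the 10 texts with the highest cosine similarity to it.
Example code: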
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

docs = ["This is the first document.", "This is the second document.", "And this is the third one.",
        "Is this the first document?", "The last document is here."]

# Vectorize the documents with TF-IDF, then compute the pairwise similarity matrix
tfidf = TfidfVectorizer().fit_transform(docs)
cosine_similarities = cosine_similarity(tfidf)

for i, doc in enumerate(docs):
    # Similarity between this document and every document (including itself)
    similarities = cosine_similarities[i]
    # Sort indices by descending similarity, drop the document itself, keep at most 10
    ranked = np.argsort(similarities)[::-1]
    most_similar = [j for j in ranked if j != i][:10]
    print(f"Top 10 similar documents for document {i}:")
    for j in most_similar:
        print(f"Document {j}: {docs[j]} (Similarity: {similarities[j]})")
```
Running this prints one block per document. With only five documents, each block lists at most four other documents rather than ten. Documents 0 and 3 contain exactly the same words, so their TF-IDF vectors coincide and their similarity is 1.0 (up to floating-point rounding). Every pair of documents shares at least the words "is" and "the", so no similarity is exactly 0.
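The selection step can also be factored into a small helper that works on any precomputed similarity matrix. The following is a sketch under stated assumptions: `top_k_similar` is a hypothetical name, and the 4×4 matrix is made-up data chosen so that two rows tie for the highest score. It excludes the query row by index rather than by position in the sorted order, so a tie with the self-similarity does not cause another row to be dropped.

```python
import numpy as np

def top_k_similar(sim_matrix, i, k=10):
    """Indices of the k rows most similar to row i, excluding i itself (hypothetical helper)."""
    scores = np.asarray(sim_matrix, dtype=float)[i]
    # Stable sort on negated scores gives descending order with ties kept in index order
    order = np.argsort(-scores, kind="stable")
    # Remove the query row by index, not by assuming it is ranked first
    order = order[order != i]
    return order[:k]

# Made-up symmetric similarity matrix; rows 0 and 2 are identical items
sim = np.array([
    [1.0, 0.2, 1.0, 0.5],
    [0.2, 1.0, 0.2, 0.9],
    [1.0, 0.2, 1.0, 0.5],
    [0.5, 0.9, 0.5, 1.0],
])

print(top_k_similar(sim, 0, k=2))  # the two rows most similar to row 0
print(top_k_similar(sim, 1))       # fewer than 10 rows exist, so all others are returned
```

When fewer than k other rows exist, the helper simply returns all of them.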
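Python vector cosine similarity
Cosine similarity measures how similar two vectors are by the cosine of the angle between them, and is commonly used in text mining and natural language processing. The formula is cosine_similarity = (A·B) / (||A|| ||B||), where A and B are the two vectors, A·B is their dot product, and ||A|| and ||B|| are their norms (lengths). The value lies between -1 and 1: 1 for vectors pointing in the same direction, 0 for orthogonal vectors, and -1 for vectors pointing in opposite directions. In Python, it can be computed with libraries such as NumPy, SciPy, and scikit-learn.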
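The formula extends naturally to many vectors at once: if every row is divided by its norm, the matrix product of the result with its own transpose contains the cosine similarity of every pair of rows. The sketch below uses NumPy only; `pairwise_cosine` is a hypothetical name, and leaving all-zero rows as zeros is an assumption made to avoid division by zero.

```python
import numpy as np

def pairwise_cosine(X):
    """Cosine similarity between every pair of rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Assumption: all-zero rows stay zero instead of producing NaN
    norms[norms == 0.0] = 1.0
    unit = X / norms          # each row now has length 1
    return unit @ unit.T      # entry (i, j) is the dot product of unit rows i and j

X = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [-1, -2, -3],   # opposite direction to the first row
])
S = pairwise_cosine(X)
print(np.round(S, 4))
```

In the resulting matrix the diagonal is 1 (each vector compared with itself), and the entry for the first and third rows is -1 because they point in opposite directions.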