Implementing Object Detection Models with R-CNN and Fast R-CNN
Published: 2025-09-01 01:19:29

#### 1. Data Preparation
Before training the model, we first prepare the training and validation datasets and their corresponding data loaders:
```python
from torch.utils.data import DataLoader

# 90:10 train/test split
n_train = 9 * len(FPATHS) // 10
train_ds = RCNNDataset(FPATHS[:n_train], ROIS[:n_train],
                       CLSS[:n_train], DELTAS[:n_train],
                       GTBBS[:n_train])
test_ds = RCNNDataset(FPATHS[n_train:], ROIS[n_train:],
                      CLSS[n_train:], DELTAS[n_train:],
                      GTBBS[n_train:])

train_loader = DataLoader(train_ds, batch_size=2,
                          collate_fn=train_ds.collate_fn,
                          drop_last=True)
test_loader = DataLoader(test_ds, batch_size=2,
                         collate_fn=test_ds.collate_fn,
                         drop_last=True)
```
The code above splits the dataset into training and test sets at a 9:1 ratio and loads them with `DataLoader`.
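The `DELTAS` passed to `RCNNDataset` above store, for each region proposal, the offsets that map it onto its ground-truth box. The exact encoding is fixed when the dataset is built and is not shown in this excerpt; a minimal sketch of one common scheme, using a hypothetical `encode_deltas` helper, is a corner-wise difference normalized by the image size so that targets stay roughly inside the `Tanh` output range of the regression head:

```python
def encode_deltas(proposal, gt, img_w, img_h):
    """Corner-wise offsets from a proposal box (x, y, X, Y) to its
    ground-truth box, normalized by image size so values stay
    roughly in [-1, 1]. One possible scheme; the dataset-building
    code upstream determines the actual one."""
    px, py, pX, pY = proposal
    gx, gy, gX, gY = gt
    return ((gx - px) / img_w, (gy - py) / img_h,
            (gX - pX) / img_w, (gY - pY) / img_h)

# Example: a proposal slightly off from the ground-truth box
delta = encode_deltas((10, 20, 110, 120), (12, 18, 115, 125), 224, 224)
```

At inference time the same arithmetic is inverted: the predicted deltas are added back onto the proposal corners to recover the refined box.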
#### 2. R-CNN Network Architecture
Next, we build a model that predicts the class and the bounding-box offsets of each region proposal, so that tight bounding boxes can be drawn around the objects in an image. The strategy is as follows:
1. Define a VGG backbone network.
2. Pass the normalized cropped regions through the pretrained model to extract features.
3. Attach a linear layer on top of the VGG backbone to predict the class of each region proposal (the raw scores are converted to probabilities with softmax at inference time).
4. Attach an additional linear head to predict the four bounding-box offsets.
5. Define the loss computation for the two outputs (one for class prediction, one for the four bounding-box offsets).
6. Train a model that predicts both the class and the four bounding-box offsets of each region proposal.
The implementation is as follows:
```python
import torch
import torch.nn as nn
import torchvision.models as models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Define the VGG backbone and freeze its weights
vgg_backbone = models.vgg16(pretrained=True)
vgg_backbone.classifier = nn.Sequential()
for param in vgg_backbone.parameters():
    param.requires_grad = False
vgg_backbone.eval().to(device)

# Define the R-CNN network module
# (label2target maps class names to integer indices and is built
# during data preparation)
class RCNN(nn.Module):
    def __init__(self):
        super().__init__()
        feature_dim = 25088
        self.backbone = vgg_backbone
        self.cls_score = nn.Linear(feature_dim, len(label2target))
        self.bbox = nn.Sequential(
            nn.Linear(feature_dim, 512),
            nn.ReLU(),
            nn.Linear(512, 4),
            nn.Tanh(),
        )
        self.cel = nn.CrossEntropyLoss()
        self.sl1 = nn.L1Loss()

    def forward(self, input):
        feat = self.backbone(input)
        cls_score = self.cls_score(feat)
        bbox = self.bbox(feat)
        return cls_score, bbox

    def calc_loss(self, probs, _deltas, labels, deltas):
        detection_loss = self.cel(probs, labels)
        # Regression loss is computed only on non-background proposals
        ixs, = torch.where(labels != 0)
        _deltas = _deltas[ixs]
        deltas = deltas[ixs]
        self.lmb = 10.0
        if len(ixs) > 0:
            regression_loss = self.sl1(_deltas, deltas)
            return (detection_loss + self.lmb * regression_loss,
                    detection_loss.detach(), regression_loss.detach())
        else:
            regression_loss = 0
            return (detection_loss + self.lmb * regression_loss,
                    detection_loss.detach(), regression_loss)

# Define the training and validation functions
def train_batch(inputs, model, optimizer, criterion):
    input, clss, deltas = inputs
    model.train()
    optimizer.zero_grad()
    _clss, _deltas = model(input)
    loss, loc_loss, regr_loss = criterion(_clss, _deltas, clss, deltas)
    accs = clss == decode(_clss)
    loss.backward()
    optimizer.step()
    return loss.detach(), loc_loss, regr_loss, accs.cpu().numpy()

@torch.no_grad()
def validate_batch(inputs, model, criterion):
    input, clss, deltas = inputs
    model.eval()
    _clss, _deltas = model(input)
    loss, loc_loss, regr_loss = criterion(_clss, _deltas, clss, deltas)
    _, _clss = _clss.max(-1)
    accs = clss == _clss
    return _clss, _deltas, loss.detach(), loc_loss, regr_loss, accs.cpu().numpy()

# Create the model object; define the loss function, optimizer,
# and number of epochs
import torch.optim as optim
from torch_snippets import Report

rcnn = RCNN().to(device)
criterion = rcnn.calc_loss
optimizer = optim.SGD(rcnn.parameters(), lr=1e-3)
n_epochs = 5
log = Report(n_epochs)

# Train the model
for epoch in range(n_epochs):
    _n = len(train_loader)
    for ix, inputs in enumerate(train_loader):
        loss, loc_loss, regr_loss, accs = train_batch(inputs, rcnn,
                                                      optimizer, criterion)
        pos = epoch + (ix + 1) / _n
        log.record(pos, trn_loss=loss.item(),
                   trn_loc_loss=loc_loss,
                   trn_regr_loss=regr_loss,
                   trn_acc=accs.mean(), end='\r')
    _n = len(test_loader)
    for ix, inputs in enumerate(test_loader):
        _clss, _deltas, loss, loc_loss, regr_loss, accs = \
            validate_batch(inputs, rcnn, criterion)
        pos = epoch + (ix + 1) / _n
        log.record(pos, val_loss=loss.item(),
                   val_loc_loss=loc_loss,
                   val_regr_loss=regr_loss,
                   val_acc=accs.mean(), end='\r')

# Plot training and validation metrics
log.plot_epochs('trn_loss,val_loss'.split(','))
```
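`train_batch` above calls a `decode` helper that is not defined in this excerpt. A plausible minimal version (an assumption here, mirroring the `_clss.max(-1)` step used in `validate_batch`) simply takes the argmax over the class scores:

```python
import torch

def decode(scores):
    """Map raw class scores of shape (N, num_classes) to the
    predicted class index per proposal."""
    _, preds = scores.max(-1)
    return preds
```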
#### 3. Predicting on New Images
Once the model is trained, we can use it to make predictions on new images. The steps are:
1. Extract region proposals from the new image.
2. Resize and normalize each cropped region.
3. Forward the processed crops through the model to predict classes and offsets.
4. Apply non-maximum suppression to keep only the bounding boxes with the highest confidence of containing an object.
The implementation is as follows:
```python
import cv2
import numpy as np
import torch
from torchvision.ops import nms
import matplotlib.pyplot as plt

def test_predictions(filename, show_output=True):
    # extract_candidates, preprocess_image, background_class, and
    # target2label are defined earlier in the pipeline
    img = np.array(cv2.imread(filename, 1)[..., ::-1])  # BGR -> RGB
    candidates = extract_candidates(img)
    candidates = [(x, y, x + w, y + h) for x, y, w, h in candidates]
    input = []
    for candidate in candidates:
        x, y, X, Y = candidate
        crop = cv2.resize(img[y:Y, x:X], (224, 224))
        input.append(preprocess_image(crop / 255.)[None])
    input = torch.cat(input).to(device)
    with torch.no_grad():
        rcnn.eval()
        probs, deltas = rcnn(input)
        probs = torch.nn.functional.softmax(probs, -1)
        confs, clss = torch.max(probs, -1)
    candidates = np.array(candidates)
    confs, clss, probs, deltas = [tensor.detach().cpu().numpy()
        for tensor in [confs, clss, probs, deltas]]
    # Drop proposals predicted as background
    ixs = clss != background_class
    confs, clss, probs, deltas, candidates = [tensor[ixs]
        for tensor in [confs, clss, probs, deltas, candidates]]
    # Apply the predicted offsets, then non-maximum suppression
    bbs = (candidates + deltas).astype(np.uint16)
    ixs = nms(torch.tensor(bbs.astype(np.float32)),
              torch.tensor(confs), 0.05)
    confs, clss, probs, deltas, candidates, bbs = [tensor[ixs]
        for tensor in [confs, clss, probs, deltas, candidates, bbs]]
    if len(ixs) == 1:
        confs, clss, probs, deltas, candidates, bbs = [tensor[None]
            for tensor in [confs, clss, probs, deltas, candidates, bbs]]
    if len(confs) == 0 and not show_output:
        return (0, 0, 224, 224), 'background', 0
    if len(confs) > 0:
        best_pred = np.argmax(confs)
        best_conf = np.max(confs)
        best_bb = bbs[best_pred]
        x, y, X, Y = best_bb
        # The excerpt is truncated here; a minimal completion that
        # draws the best box and returns the prediction:
        if show_output:
            plt.imshow(img)
            plt.gca().add_patch(plt.Rectangle((x, y), X - x, Y - y,
                                fill=False, edgecolor='red', linewidth=2))
            plt.title(target2label[clss[best_pred]])
            plt.show()
        return (x, y, X, Y), target2label[clss[best_pred]], best_conf
```
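The `nms` call from `torchvision.ops` used above performs greedy non-maximum suppression: it repeatedly keeps the highest-scoring remaining box and discards any box whose IoU with it exceeds the threshold. For intuition, a plain-Python sketch of the same procedure:

```python
def iou(a, b):
    """Intersection over union of two (x, y, X, Y) boxes."""
    ax, ay, aX, aY = a
    bx, by, bX, bY = b
    inter = (max(0, min(aX, bX) - max(ax, bx)) *
             max(0, min(aY, bY) - max(ay, by)))
    union = (aX - ax) * (aY - ay) + (bX - bx) * (bY - by) - inter
    return inter / union if union else 0.0

def nms_indices(boxes, scores, iou_threshold):
    """Greedy NMS: keep the best-scoring box, drop overlapping ones,
    repeat on what remains."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order
                 if iou(boxes[i], boxes[j]) <= iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
# The second box overlaps the first (IoU ~0.68) and is suppressed
print(nms_indices(boxes, scores, iou_threshold=0.5))  # -> [0, 2]
```

Note that the threshold of 0.05 used in `test_predictions` is very aggressive: almost any overlap suppresses a box, so typically only well-separated detections survive.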