
CLEAR: Contrastive Learning for Sentence Representation

Zhuofeng Wu1* Sinong Wang2 Jiatao Gu2 Madian Khabsa2 Fei Sun3 Hao Ma2

1 School of Information, University of Michigan, [email protected]
2 Facebook AI, {sinongwang, jgu, mkhabsa, haom}@fb.com
3 Institute of Computing Technology, Chinese Academy of Sciences, [email protected]

* Work done while the author was an intern at Facebook AI.

arXiv:2012.15466v1 [cs.CL] 31 Dec 2020

Abstract

Pre-trained language models have proven their unique power in capturing implicit language features. However, most pre-training approaches focus on word-level training objectives, while sentence-level objectives are rarely studied. In this paper, we propose Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies in order to learn a noise-invariant sentence representation. These augmentations include word and span deletion, reordering, and substitution. Furthermore, we investigate the key reasons that make contrastive learning effective through numerous experiments. We observe that different sentence augmentations during pre-training lead to different performance improvements on various downstream tasks. Our approach is shown to outperform multiple existing methods on both the SentEval and GLUE benchmarks.

1 Introduction

Learning a better sentence representation model has always been a fundamental problem in Natural Language Processing (NLP). Taking the mean of word embeddings as the representation of a sentence (also known as mean pooling) was a common baseline in the early stage. Later on, pre-trained models such as BERT (Devlin et al., 2019) proposed to insert a special token (the [CLS] token) during pre-training and to take its embedding as the representation of the sentence. Because of the tremendous improvement brought by BERT (Devlin et al., 2019), people seemed to agree that the CLS-token embedding is better than averaging word embeddings. Nevertheless, a recent paper, Sentence-BERT (Reimers and Gurevych, 2019), observed that averaging all output word vectors marginally outperforms the CLS-token embedding. Sentence-BERT's results suggest that models like BERT learn a better representation at the token level. One natural question is how to better learn sentence representations.

Inspired by the success of contrastive learning in computer vision (Zhuang et al., 2019; Tian et al., 2019; He et al., 2020; Chen et al., 2020; Misra and Maaten, 2020), we are interested in exploring whether it can also help language models produce better sentence representations. The key ingredient of contrastive learning is augmenting positive samples during training. However, data augmentation for text is not as fruitful as for images: an image can be augmented easily by rotating, cropping, resizing, or applying cutout (Chen et al., 2020), whereas only a few augmentation strategies for text have been studied in the literature (Giorgi et al., 2020; Fang and Xie, 2020). The main reason is that every word in a sentence may play an essential role in expressing the whole meaning; additionally, the order of the words also matters.

Most existing pre-trained language models (Devlin et al., 2019; Liu et al., 2019; Lewis et al., 2019) add different kinds of noise to the text and try to restore it at the word level. Sentence-level objectives are rarely studied. BERT (Devlin et al., 2019) combines the word-level loss, masked language modeling (MLM), with a sentence-level loss, next sentence prediction (NSP), and observes that MLM+NSP is essential for some downstream tasks. RoBERTa (Liu et al., 2019) drops the NSP objective during pre-training yet achieves much better performance on a variety of downstream tasks. ALBERT (Lan et al., 2019) proposes a self-supervised loss for Sentence-Order Prediction (SOP), which models inter-sentence coherence. Their work shows that coherence prediction is a better choice than topic prediction, which is what NSP effectively learns.
Figure 1: The proposed contrastive learning framework CLEAR. Two augmentations s̃1 = AUG(s, seed1) and s̃2 = AUG(s, seed2) of the original sentence s (with AUG drawn from the augmentation set A) are fed through a shared transformer encoder f(·); the [CLS] representations are mapped by a projection head g(·) to z1 and z2, whose agreement is maximized.

DeCLUTR (Giorgi et al., 2020) is the first work to combine Contrastive Learning (CL) with MLM in pre-training. However, it requires an extremely long input document (2048 tokens), which restricts the model to being pre-trained on limited data. Further, DeCLUTR trains from existing pre-trained models, so it remains unknown whether it could achieve the same performance when trained from scratch.

Drawing on the recent advances in pre-trained language models and contrastive learning, we propose a new framework, CLEAR, which combines a word-level MLM objective with a sentence-level CL objective to pre-train a language model. The MLM objective enables the model to capture word-level hidden features, while the CL objective gives the model the capacity to recognize sentences with similar meanings by training an encoder to minimize the distance between the embeddings of different augmentations of the same sentence. In this paper, we present a novel design of augmentations that can be used to pre-train a language model at the sentence level. Our main findings and contributions can be summarized as follows:

• We proposed and tested four basic sentence augmentations: random word deletion, span deletion, synonym substitution, and reordering, which fills a large gap in NLP about what kinds of augmentations can be used in contrastive learning.

• We showed that a model pre-trained by our proposed method outperforms several strong baselines (including RoBERTa and BERT) on both the GLUE (Wang et al., 2018) and SentEval (Conneau and Kiela, 2018) benchmarks. For example, we showed a +2.2% absolute improvement on 8 GLUE tasks and a +5.7% absolute improvement on 7 SentEval semantic textual similarity tasks compared to the RoBERTa model.

2 Related Work

Three lines of literature are closely related to our work: sentence representation, large-scale pre-trained language representation models, and contrastive learning.

2.1 Sentence Representation

Learning the representation of a sentence has been studied in many existing works. Applying various pooling strategies to word embeddings as the representation of a sentence is a common baseline (Iyyer et al., 2015; Shen et al., 2018; Reimers and Gurevych, 2019). Skip-Thoughts (Kiros et al., 2015) trains an encoder-decoder model that tries to reconstruct the surrounding sentences. Quick-Thoughts (Logeswaran and Lee, 2018) trains an encoder-only model with the ability to select the correct context of a sentence out of other contrastive sentences. Later on, pre-trained language models such as BERT (Devlin et al., 2019) proposed to use a manually inserted token (the [CLS] token) as the representation of the whole sentence, and they became the new state of the art on a variety of downstream tasks. A recent paper, Sentence-BERT (Reimers and Gurevych, 2019), compares the averaged BERT embeddings with the CLS-token embedding and surprisingly finds that computing the mean of all output vectors at the last layer of BERT marginally outperforms the CLS-token embedding.

2.2 Large-scale Pre-trained Language Representation Models

Deep pre-trained language models have proven their power in capturing implicit language features, even with different model architectures, pre-training tasks, and loss functions. Two of the early works are GPT (Radford et al., 2018) and BERT (Devlin et al., 2019): GPT uses a left-to-right Transformer while BERT designs a bidirectional Transformer, and both set an impressive new state of the art on many downstream tasks. Following this observation, a tremendous number of research works have since been published in the pre-trained language model domain. Some extend previous models to a sequence-to-sequence structure (Song et al., 2019; Lewis et al., 2019; Liu et al., 2020), which strengthens the model's capability for language generation. Others (Yang et al., 2019; Liu et al., 2019; Clark et al., 2020) explore different pre-training objectives to either improve the model's performance or accelerate pre-training.

2.3 Contrastive Learning

Contrastive learning has become a rising domain because of its significant success on various computer vision tasks and datasets. Several researchers (Zhuang et al., 2019; Tian et al., 2019; Misra and Maaten, 2020; Chen et al., 2020) proposed to make the representations of different augmentations of an image agree with each other, and they showed positive results. The main difference between these works is their varying definitions of image augmentation.

Researchers in the NLP domain have also started to work on finding suitable augmentations for text. CERT (Fang and Xie, 2020) applies back-translation to create augmentations of original sentences, while DeCLUTR (Giorgi et al., 2020) regards different spans inside one document as similar to each other. Our model differs from CERT in adopting an encoder-only structure, which avoids the noise brought by a decoder. Further, unlike DeCLUTR, which only tests one augmentation and trains the model from an existing pre-trained model, we pre-train all models from scratch, which provides a straightforward comparison with the existing pre-trained models.

3 Method

This section proposes a novel framework and several sentence augmentation methods for contrastive learning in NLP.

3.1 The Contrastive Learning Framework

Borrowing from SimCLR (Chen et al., 2020), we propose a new contrastive learning framework, named CLEAR, to learn sentence representations. There are four main components in CLEAR, as outlined in Figure 1.

• An augmentation component AUG(·), which applies a random augmentation to the original sentence. For each original sentence s, we generate two random augmentations s̃1 = AUG(s, seed1) and s̃2 = AUG(s, seed2), where seed1 and seed2 are two random seeds. Note that, to test each augmentation's effect in isolation, we adopt the same augmentation type to generate s̃1 and s̃2. Testing models that mix augmentation types requires more computational resources, which we plan to leave for future work. We detail the proposed augmentation set A in Section 3.3.

• A transformer-based encoder f(·) that learns the representations H1 = f(s̃1) and H2 = f(s̃2) of the input augmented sentences. Any encoder that learns a sentence representation can be used here to replace our encoder. We choose the current state-of-the-art architecture, the transformer (Vaswani et al., 2017), and use the representation of a manually inserted token as the vector of the sentence (i.e., [CLS], as used in BERT and RoBERTa).

• A nonlinear neural network projection head g(·) that maps the encoded augmentations H1 and H2 to the vectors z1 = g(H1) and z2 = g(H2) in a new space. According to observations in SimCLR (Chen et al., 2020), adding a nonlinear projection head can significantly improve the representation quality for images.
Figure 2: Four sentence augmentation methods in the proposed contrastive learning framework CLEAR. (a) Word Deletion: Tok1, Tok2, and Tok4 are deleted; the sentence after augmentation is [Tok[del], Tok3, Tok[del], Tok5, ..., TokN]. (b) Span Deletion: the span [Tok1, Tok2, Tok3, Tok4] is deleted; the sentence after augmentation is [Tok[del], Tok5, ..., TokN]. (c) Reordering: the two spans [Tok1, Tok2] and [Tok4] are reordered; the sentence after augmentation is [Tok4, Tok3, Tok1, Tok2, Tok5, ..., TokN]. (d) Synonym Substitution: Tok2, Tok3, and TokN are substituted by their synonyms Tok'2, Tok'3, and Tok'N; the sentence after augmentation is [Tok1, Tok'2, Tok'3, Tok4, Tok5, ..., Tok'N].

• A contrastive learning loss function defined for a contrastive prediction task, i.e., trying to predict the positive augmentation pair (s̃1, s̃2) within the set {s̃}. We construct the set {s̃} by randomly augmenting every sentence in a minibatch twice (assuming a minibatch is a set {s} of size N), which yields a set {s̃} of size 2N. The two variants derived from the same original sentence form a positive pair, while all other instances from the same minibatch are regarded as their negative samples. This contrastive loss has been used extensively in previous work (Wu et al., 2018; Chen et al., 2020; Giorgi et al., 2020; Fang and Xie, 2020). The loss function for a positive pair is defined as:

l(i, j) = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}    (1)

where 1_{[k ≠ i]} is an indicator function that equals 1 iff k ≠ i, τ is a temperature parameter, and sim(u, v) = u^\top v / (\|u\|_2 \|v\|_2) denotes the cosine similarity of two vectors u and v. The overall contrastive learning loss is defined as the sum of the losses of all positive pairs in a minibatch:

L_{CL} = \sum_{i=1}^{2N} \sum_{j=1}^{2N} m(i, j)\, l(i, j)    (2)

where m(i, j) is a function that returns 1 when i and j form a positive pair and 0 otherwise.
and j is a positive pair, returns 0 otherwise.
from the same original sentence form the
positive pair, while all other instances from 3.2 The Combined Loss for Pre-training
the same minibatch are regarded as negative Similar to (Giorgi et al., 2020), for the purpose of
samples for them. The contrastive learning grabbing both token-level and sentence-level fea-
loss has been tremendously used in previ- tures, we use a combined loss of MLM objective
ous work (Wu et al., 2018; Chen et al., 2020; and CL objective to get the overall loss:
Giorgi et al., 2020; Fang and Xie, 2020). The
Ltotal = LMLM + LCL (3)
loss function for a positive pair is defined as:
where LMLM is calculated through predicting
exp (sim(zi , zj )/τ )
the random-masked tokens in set {s} as de-
l(i, j)=− log P2N
k=1 1[k6=i] exp (sim(zi , zk )/τ )
scribed in BERT and RoBERTa (Devlin et al.,
(1) 2019; Liu et al., 2019). Our pre-training target is
where 1[k6=i] is the indicator function to judge to minimize the Ltotal .
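The following sketch ties Figure 1 to Equation (3) in one hypothetical pre-training step. The callables augment, encoder, mlm_head, and mask_tokens are placeholders for the components described in Sections 3.1 and 4.1, contrastive_loss is the function sketched above, and the two-layer projection head mirrors SimCLR rather than any architecture specified in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    """Nonlinear projection g(.) applied to the [CLS] representation (SimCLR-style assumption)."""
    def __init__(self, hidden: int = 768, proj: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden, proj), nn.ReLU(), nn.Linear(proj, proj))

    def forward(self, h_cls: torch.Tensor) -> torch.Tensor:
        return self.net(h_cls)

def pretraining_step(token_ids, augment, encoder, mlm_head, proj_head, mask_tokens):
    """One step of the combined objective L_total = L_MLM + L_CL (Eq. 3).

    token_ids: (N, L) batch of original sentences as token ids; the other
    arguments are placeholder callables standing in for the framework components.
    """
    # Sentence-level branch: two augmented views per sentence (same augmentation type).
    view1 = augment(token_ids)                                  # s~1 = AUG(s, seed1)
    view2 = augment(token_ids)                                  # s~2 = AUG(s, seed2)
    # Interleave so that rows 2i and 2i+1 are the positive pair expected by contrastive_loss.
    views = torch.stack([view1, view2], dim=1).reshape(2 * view1.size(0), -1)
    hidden = encoder(views)                                     # (2N, L, d) token representations
    z = proj_head(hidden[:, 0])                                 # project the [CLS] vector -> (2N, d)
    cl_loss = contrastive_loss(z)                               # Eq. (2), from the sketch above

    # Word-level branch: standard 15% MLM on the original (masked) sentences.
    masked, labels = mask_tokens(token_ids)
    logits = mlm_head(encoder(masked))                          # (N, L, vocab)
    mlm_loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1), ignore_index=-100)
    return mlm_loss + cl_loss                                   # Eq. (3)
```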
3.3 Design Rationale for Sentence Augmentations

Data augmentation is crucial for learning image representations (Tian et al., 2019; Jain et al., 2020). However, in language modeling, it remains unknown whether data (sentence) augmentation benefits representation learning and what kinds of data augmentation can be applied to text. To answer these questions, we explore and test four basic augmentations (shown in Figure 2) and their combinations in our experiments. We believe more potential augmentations exist, which we plan to leave for future exploration.

One type of augmentation we consider is deletion, which is based on the hypothesis that deleting some content from a sentence does not affect its original semantic meaning too much. In some cases, deleting certain words may lead the sentence to a different meaning (e.g., the word not). However, we believe that including a proper amount of such noise makes the model more robust. We consider two different deletions, i.e., word deletion and span deletion.

• Word deletion (shown in Figure 2a) randomly selects tokens in the sentence and replaces them with a special token [DEL], which is similar to the token [MASK] in BERT (Devlin et al., 2019).

• Span deletion (shown in Figure 2b) applies the deletion at the span level. Generally, span deletion is a special case of word deletion that focuses on deleting consecutive words.

To prevent the model from trivially distinguishing the two augmentations by the remaining words at the same locations, we collapse consecutive [DEL] tokens into a single token.

Reordering (shown in Figure 2c) is another widely studied augmentation that preserves the original sentence's features. BART (Lewis et al., 2019) has explored restoring the original sentence from a randomly reordered sentence. In our implementation, we randomly sample several pairs of spans and switch them pairwise to construct the reordering augmentation.

Substitution (shown in Figure 2d) has been proven efficient in improving a model's robustness (Jia et al., 2019). Following their work, we sample some words and replace them with synonyms to construct an augmentation. The synonym list comes from the vocabulary they used. In our pre-training corpus, roughly 40% of tokens have at least one similar-meaning token in the list.
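The sketch below is one possible implementation of the four augmentations of Figure 2 on a list of tokens, including the collapsing of consecutive [DEL] tokens described above. The [DEL] literal, the uniform random sampling, and the synonym dictionary argument are our assumptions; the paper does not release this code.

```python
import random
from typing import Dict, List

DEL = "[DEL]"

def collapse_del(tokens: List[str]) -> List[str]:
    """Merge consecutive [DEL] tokens into one, as described in Section 3.3."""
    out = []
    for tok in tokens:
        if tok == DEL and out and out[-1] == DEL:
            continue
        out.append(tok)
    return out

def word_deletion(tokens: List[str], ratio: float) -> List[str]:
    drop = set(random.sample(range(len(tokens)), int(len(tokens) * ratio)))
    return collapse_del([DEL if i in drop else t for i, t in enumerate(tokens)])

def span_deletion(tokens: List[str], n_spans: int, span_ratio: float) -> List[str]:
    tokens = list(tokens)
    span_len = max(1, int(len(tokens) * span_ratio))
    for _ in range(n_spans):
        start = random.randrange(max(1, len(tokens) - span_len + 1))
        tokens[start:start + span_len] = [DEL] * span_len
    return collapse_del(tokens)

def reordering(tokens: List[str], n_pairs: int, span_ratio: float) -> List[str]:
    tokens = list(tokens)
    span_len = max(1, int(len(tokens) * span_ratio))
    if len(tokens) < 2 * span_len + 1:
        return tokens                      # too short to swap two disjoint spans
    for _ in range(n_pairs):
        i, j = sorted(random.sample(range(len(tokens) - span_len + 1), 2))
        if i + span_len <= j:              # only swap non-overlapping spans
            tokens[i:i + span_len], tokens[j:j + span_len] = (
                tokens[j:j + span_len], tokens[i:i + span_len])
    return tokens

def synonym_substitution(tokens: List[str], ratio: float,
                         synonyms: Dict[str, List[str]]) -> List[str]:
    out = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in synonyms]
    chosen = random.sample(candidates, min(len(candidates), int(len(tokens) * ratio)))
    for i in chosen:
        out[i] = random.choice(synonyms[out[i]])
    return out
```

In actual pre-training these operations would act on subword token ids rather than raw word strings; the string version above is only meant to make the four transformations easy to follow.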
4 Experiment

This section presents empirical experiments that compare the proposed methods with various baselines and alternative approaches.

4.1 Setup

Model configuration: We use the Transformer (12 layers, 12 heads, and 768 hidden size) as our primary encoder (Vaswani et al., 2017). Models are pre-trained for 500K updates, with mini-batches containing 8,192 sequences of maximum length 512 tokens. For the first 24,000 steps, the learning rate is warmed up to a peak value of 6e-4 and then linearly decayed for the rest of training. All models are optimized by Adam (Kingma and Ba, 2014) with β1 = 0.9, β2 = 0.98, ε = 1e-6, and L2 weight decay of 0.01. We use 0.1 dropout on all layers and in attention. All of the models are pre-trained on 256 NVIDIA Tesla V100 32GB GPUs.

Pre-training data: We pre-train all the models on a combination of the BookCorpus (Zhu et al., 2015) and English Wikipedia datasets, the data BERT used for pre-training. For more statistics of the datasets and processing details, one can refer to BERT (Devlin et al., 2019).

Hyperparameters for MLM: For calculating the MLM loss, we randomly mask 15% of the tokens of the input text s and use the surrounding tokens to predict them. To close the gap between fine-tuning and pre-training, we also adopt BERT's 10%-random-replacement and 10%-keep-unchanged setting for the masked tokens.
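For reference, a small sketch of the 15% masking with the 80/10/10 split just described, assuming integer token ids and a generic vocabulary size; it mirrors the published BERT recipe rather than CLEAR-specific code.

```python
import torch

def mask_tokens(input_ids: torch.Tensor, mask_id: int, vocab_size: int,
                mask_prob: float = 0.15):
    """Return (masked_inputs, labels) for MLM; labels are -100 at unmasked positions."""
    labels = input_ids.clone()
    selected = torch.rand_like(input_ids, dtype=torch.float) < mask_prob
    labels[~selected] = -100                         # loss is computed only on selected tokens

    masked = input_ids.clone()
    roll = torch.rand_like(input_ids, dtype=torch.float)
    masked[selected & (roll < 0.8)] = mask_id        # 80%: replace with [MASK]
    random_ids = torch.randint_like(input_ids, vocab_size)
    use_random = selected & (roll >= 0.8) & (roll < 0.9)
    masked[use_random] = random_ids[use_random]      # 10%: random token
    # remaining 10% of selected positions keep the original token unchanged
    return masked, labels
```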
Hyperparameters for CL: To compute the CL loss, we set up different hyperparameters for each augmentation (a usage sketch follows the list):

• For Word Deletion (del-word), we delete 70% of the tokens.

• For Span Deletion (del-span), we delete 5 spans (each 5% of the input text length).

• For Reordering (reorder), we randomly pick 5 pairs of spans (each also roughly 5% of the length) and switch the spans pairwise.

• For Substitution (subs), we randomly select 30% of the tokens and replace each with one of its similar-meaning tokens.
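For concreteness, this is how the settings above could map onto the augmentation functions sketched in Section 3.3; the function names and argument choices are carried over from that hypothetical sketch.

```python
# Hypothetical driver reusing the Section 3.3 sketch functions.
def augment(tokens, kind, synonyms=None):
    if kind == "del-word":
        return word_deletion(tokens, ratio=0.70)                     # delete 70% of tokens
    if kind == "del-span":
        return span_deletion(tokens, n_spans=5, span_ratio=0.05)     # 5 spans of 5% length
    if kind == "reorder":
        return reordering(tokens, n_pairs=5, span_ratio=0.05)        # swap 5 pairs of 5% spans
    if kind == "subs":
        return synonym_substitution(tokens, ratio=0.30, synonyms=synonyms)
    raise ValueError(f"unknown augmentation: {kind}")
```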
Table 1: Performance of competing methods evaluated on the GLUE dev set. Following GLUE's setting (Wang et al., 2018), unweighted average accuracy on the matched and mismatched dev sets is reported for MNLI. The unweighted average of accuracy and F1 is reported for MRPC and QQP. The unweighted average of Pearson and Spearman correlation is reported for STS-B. The Matthews correlation is reported for CoLA. For all other tasks we report accuracy.

Method MNLI QNLI QQP RTE SST-2 MRPC CoLA STS Avg

Baselines
BERT-base (Devlin et al., 2019) 84.0 89.0 89.1 61.0 93.0 86.3 57.3 89.5 81.2
RoBERTa-base (Liu et al., 2019) 87.2 93.2 88.2 71.8 94.4 87.8 56.1 89.4 83.5

MLM+1-CL-objective
MLM+ del-word 86.8 93.0 90.2 79.4 94.2 89.7 62.1 90.5 85.7
MLM+ del-span 87.3 92.8 90.1 79.8 94.4 89.9 59.8 90.3 85.6
MLM+2-CL-objective
MLM+ subs+ del-word 87.3 93.1 90.0 73.3 93.7 90.2 62.1 90.1 85.0
MLM+ subs+ del-span 87.0 93.4 90.3 74.4 94.3 90.5 63.3 90.5 85.5
MLM+ del-word+ reorder 87.0 92.7 89.5 76.5 94.5 90.6 59.1 90.4 85.0
MLM+ del-span+ reorder 86.7 92.9 90.0 78.3 94.5 89.2 64.3 89.8 85.7

Some of the above hyperparameters were lightly tuned on the WiKiText-103 dataset (Merity et al., 2016), trained for 100 epochs and evaluated on the GLUE dev benchmark. For example, we find that the 70% deletion model performs best among the {30%, 40%, 50%, 60%, 70%, 80%, 90%} deletion models. Models using mixed augmentations, like the MLM+2-CL-objective models in Table 1, use the same optimized hyperparameters as the corresponding single-augmentation models. For instance, our notation MLM+subs+del-span represents a model combining the MLM loss with the CL loss: for MLM, it masks 15% of the tokens; for CL, it first substitutes 30% of the tokens and then deletes 5 spans to generate the augmented sentences.

Note that the hyperparameters we used might not be the most optimized ones. It is unknown whether hyperparameters optimized on a 1-CL-objective model carry over consistently to a 2-CL-objective model. Additionally, it is unclear whether the hyperparameters optimized for WiKiText-103 remain optimal on the BookCorpus and English Wikipedia datasets. However, it is hard to tune every possible hyperparameter due to the extensive computational resources required for pre-training. We leave these questions to future work.

4.2 GLUE Results

We mainly evaluate all the models on the General Language Understanding Evaluation (GLUE) benchmark development set (Wang et al., 2018). GLUE is a benchmark containing several different types of NLP tasks: natural language inference (MNLI, QNLI, and RTE), similarity (QQP, MRPC, STS), sentiment analysis (SST), and linguistic acceptability (CoLA). It provides a comprehensive evaluation for pre-trained language models.

To fit the different downstream tasks' requirements, we follow RoBERTa's hyperparameters to fine-tune our model for the various tasks. Specifically, we add an extra fully connected layer and then fine-tune the whole model on each training set (a sketch of this classification head is given at the end of this subsection).

The primary baselines we include are BERT-base and RoBERTa-base. The results for BERT-base are from huggingface's reimplementation.1 A fairer comparison is with RoBERTa-base, since we use the same hyperparameters RoBERTa-base used for the MLM loss. Note that because our models all combine two losses, it is still not entirely fair to compare an MLM-only model with an MLM+CL model. To address this, we set up two other baselines in Section 5.1 for a stricter comparison: one combines two MLM losses, and the other adopts a double batch size.

1 https://huggingface.co/transformers/v1.1.0/examples.html
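A minimal sketch of the fine-tuning setup described above: one extra fully connected layer on the [CLS] representation, with the whole model updated. The class name, dropout value, and argument names are our placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

class GlueClassifier(nn.Module):
    """Pre-trained encoder plus one extra fully connected layer on [CLS]."""
    def __init__(self, encoder: nn.Module, hidden: int = 768, num_labels: int = 2):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden, num_labels)   # the extra FC layer

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_ids)        # (batch, seq_len, hidden)
        cls = hidden[:, 0]                      # [CLS] token representation
        return self.classifier(self.dropout(cls))

# During fine-tuning, all parameters (encoder included) receive gradients:
# loss = nn.functional.cross_entropy(model(batch_ids), labels); loss.backward()
```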
Table 2: Performance of competing methods evaluated on SentEval. All results are pre-trained on BookCorpus
and English Wikipedia datasets for 500k steps.

Method SICK-R STS-B STS12 STS13 STS14 STS15 STS16 Avg

Baselines
RoBERTa-base-mean 74.1 65.6 47.2 38.3 46.7 55.0 49.5 53.8
RoBERTa-base-[CLS] 75.9 71.9 47.4 37.5 47.9 55.1 57.6 56.1
MLM+1-CL-objective
MLM+ del-word-mean 75.9 69.0 50.6 40.0 50.2 58.9 52.4 56.7
MLM+ del-span-mean 71.0 62.6 49.3 41.7 48.9 58.1 52.3 54.8
MLM+ del-word-[CLS] 77.1 71.6 50.6 44.5 48.3 58.4 56.1 58.1
MLM+ del-span-[CLS] 62.7 57.4 34.4 20.4 24.3 32.0 31.5 37.5
MLM+2-CL-objective
MLM+ del-word+ reorder-mean 75.8 66.2 51.1 45.7 51.8 61.3 57.0 58.4
MLM+ del-span+ reorder-mean 75.4 67.8 48.3 50.3 54.9 60.4 56.8 59.1
MLM+ subs+ del-word-mean 73.6 63.4 44.6 39.8 50.1 55.5 49.6 53.8
MLM+ subs+ del-span-mean 75.5 67.0 48.3 45.0 54.6 60.9 58.5 58.5
MLM+ del-word+ reorder-[CLS] 71.9 63.8 41.9 30.9 37.4 48.9 52.1 49.6
MLM+ del-span+ reorder-[CLS] 75.0 68.7 49.4 54.3 57.6 64.0 61.4 61.5
MLM+ subs+ del-word-[CLS] 73.6 62.9 44.5 35.8 47.6 55.8 59.6 54.3
MLM+ subs+ del-span-[CLS] 75.6 72.5 49.0 48.9 57.4 63.6 65.6 61.8

As we can see in Table 1, several of our proposed models outperform the baselines on GLUE. Note that different tasks adopt different evaluation metrics; our two best models, MLM+del-word and MLM+del-span+reorder, both improve over the best baseline, RoBERTa-base, by 2.2% in average score. A more important observation is that the best performance on every task comes from one of our proposed models. On CoLA and RTE, our best model exceeds the baseline by 7.0% and 8.0%, respectively. Further, we also find that different downstream tasks benefit from different augmentations; we give a more specific analysis in Section 5.2.

One notable point is that we do not show results for MLM+subs, MLM+reorder, or MLM+subs+reorder in Table 1. We observe that pre-training for these three models either converges too quickly or suffers from gradient explosion, which indicates that these three augmentations on their own are too easy to distinguish.

4.3 SentEval Results for Semantic Textual Similarity Tasks

SentEval is a popular benchmark for evaluating general-purpose sentence representations (Conneau and Kiela, 2018). Its specialty is that it does not perform fine-tuning as GLUE does. We evaluate the performance of our proposed methods on the common Semantic Textual Similarity (STS) tasks in SentEval. Note that some previous models on the SentEval leaderboard (e.g., Sentence-BERT (Reimers and Gurevych, 2019)) train on specific datasets such as Stanford NLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2017), which makes a direct comparison difficult. To simplify matters, we compare our proposed models with RoBERTa-base directly on SentEval. According to Sentence-BERT, using the mean of all output vectors in the last layer is more effective than using the CLS-token output, so we test both pooling strategies for each model (both are sketched below).

From Table 2, we observe that the mean-pooling strategy does not show much of an advantage. In many cases, CLS-pooling is better than mean-pooling for our proposed models. The underlying reason is that contrastive learning directly updates the representation of the [CLS] token. Beyond that, we find that adding the CL loss makes the model especially good at the Semantic Textual Similarity (STS) tasks, beating the best baseline by a large margin (+5.7%). We attribute this to the fact that the contrastive pre-training task is to find similar sentence pairs, which aligns well with the STS tasks and could explain why our proposed models show such large improvements on STS.
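The two pooling strategies compared in Table 2, sketched for clarity; the attention-mask handling for padding is an assumption on our part, not something the paper spells out.

```python
import torch

def cls_pooling(last_hidden: torch.Tensor) -> torch.Tensor:
    """[CLS]-pooling: take the first token's vector. (batch, seq, dim) -> (batch, dim)"""
    return last_hidden[:, 0]

def mean_pooling(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pooling: average all token vectors of the last layer, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```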
Table 3: Ablation study for several methods evaluated on GLUE dev set. All results are pre-trained on wiki-103
data for 500 epochs.

Method MNLI-m QNLI QQP RTE SST-2 MRPC CoLA STS Avg
RoBERTa-base 80.4 87.5 87.4 61.4 91.4 82.4 38.9 81.9 76.4
MLM-variant
Double-batch RoBERTa-base 80.3 88.0 87.1 59.9 91.9 82.1 43.0 82.0 76.8
Double MLM RoBERTA-base 80.5 87.6 87.3 57.4 90.4 77.7 42.2 83.0 75.8
MLM+CL-objective
MLM+ del-span 80.6 88.8 87.3 62.1 92.1 77.8 44.1 81.4 76.8
MLM+ del-span + reorder 81.1 88.7 87.5 58.1 90.0 80.4 43.3 87.4 77.1
MLM+ subs + del-word + reorder 80.5 87.7 87.3 59.6 90.4 80.2 45.1 87.1 77.2

5 Discussion

This section presents an ablation study comparing the CL loss and the MLM loss, and discusses observations about what different augmentations learn.

5.1 Ablation Study

Our proposed CL-based models outperform MLM-based models; one remaining question is where the benefit comes from. Does it come from the CL loss, or from the larger effective batch (since calculating the CL loss requires storing extra information per batch)? To answer this question, we set up two extra baselines (sketched below): Double MLM RoBERTa-base adopts an MLM+MLM loss, where each MLM is performed on a different mask of the same original sentence; Double-batch RoBERTa-base uses a single MLM loss with a double-size batch.
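A compact sketch of how the Double MLM baseline differs from the combined objective; it reuses the hypothetical mask_tokens helper from the Section 4.1 sketch and reflects our reading of the described setup, not released code.

```python
import torch

# Double MLM RoBERTa-base: two independent maskings of the same sentences,
# so the loss is L_MLM(mask_1) + L_MLM(mask_2) with no contrastive term.
def double_mlm_loss(batch_ids, encoder, mlm_head, mask_id, vocab_size):
    total = 0.0
    for _ in range(2):  # two different random masks of the same batch
        masked, labels = mask_tokens(batch_ids, mask_id, vocab_size)
        logits = mlm_head(encoder(masked))
        total = total + torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100)
    return total

# Double-batch RoBERTa-base instead keeps the single MLM loss but doubles the batch
# size, matching the number of sequences seen per step by the CL-based models.
```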
Due to the limitation of computational resources, we conduct the ablation study on a smaller pre-training corpus, the WiKiText-103 dataset (Merity et al., 2016). All the models listed in Table 3 are pre-trained for 500 epochs on 64 NVIDIA Tesla V100 32GB GPUs, and three of our proposed models are reported in the table. The overall performance of the MLM variants does not differ much from the original RoBERTa-base, with a +0.4% increase in average score for Double-batch RoBERTa-base, which confirms the idea that a larger batch benefits representation training, as proposed by previous work (Liu et al., 2019). Yet the best-performing baseline is still not as good as our best proposed model, which tells us that the proposed model does not benefit solely from a larger batch; the CL loss also helps.

5.2 Different Augmentations Learn Different Features

In Table 1, we find an interesting phenomenon: different proposed models are good at specific tasks.

One example is that MLM+subs+del-span helps the model deal well with similarity and paraphrase tasks. On QQP and STS it achieves the highest score, and on MRPC it ranks second. We infer that the strength of MLM+subs+del-span on this kind of task comes from synonym substitution translating the original sentence into similar-meaning sentences, while deleting different spans exposes a greater variety of similar sentences; combining them enhances the model's capacity to deal with many unseen sentence pairs.

We also notice that MLM+del-span achieves good performance on inference tasks (MNLI, QNLI, RTE). The underlying reason is that, with span deletion, the model has already been pre-trained to infer other similar sentences. The ability to identify similar sentence pairs helps it recognize contradictions, which narrows the gap between the pre-training task and these downstream tasks.

Overall, we observe that different augmentations learn different features, and some augmentations are especially good at certain downstream tasks. Designing task-specific augmentations, or exploring meta-learning to adaptively select different CL objectives, is a promising future direction.
6 Conclusion

In this work, we presented an instantiation of contrastive sentence representation learning. By carefully designing and testing different data augmentations and their combinations, we demonstrate the proposed methods' effectiveness on the GLUE and SentEval benchmarks under a diverse pre-training corpus. The experimental results indicate that the pre-trained model is more robust when it leverages adequate sentence-level supervision. More importantly, we reveal that different augmentations teach the model different features. Finally, we demonstrate that the performance improvement comes from both the larger batch size and the contrastive loss.
References

Samuel R Bowman, Gabor Angeli, Christopher Potts, and Christopher D Manning. 2015. A large annotated corpus for learning natural language inference. arXiv preprint arXiv:1508.05326.

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. arXiv preprint arXiv:2002.05709.

Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. Electra: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555.

Alexis Conneau and Douwe Kiela. 2018. Senteval: An evaluation toolkit for universal sentence representations. arXiv preprint arXiv:1803.05449.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186.

Hongchao Fang and Pengtao Xie. 2020. Cert: Contrastive self-supervised learning for language understanding. arXiv preprint arXiv:2005.12766.

John M Giorgi, Osvald Nitski, Gary D Bader, and Bo Wang. 2020. Declutr: Deep contrastive learning for unsupervised textual representations. arXiv preprint arXiv:2006.03659.

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738.

Mohit Iyyer, Varun Manjunatha, Jordan Boyd-Graber, and Hal Daumé III. 2015. Deep unordered composition rivals syntactic methods for text classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1681–1691.

Paras Jain, Ajay Jain, Tianjun Zhang, Pieter Abbeel, Joseph E Gonzalez, and Ion Stoica. 2020. Contrastive code representation learning. arXiv preprint arXiv:2007.04973.

Robin Jia, Aditi Raghunathan, Kerem Göksel, and Percy Liang. 2019. Certified robustness to adversarial word substitutions. arXiv preprint arXiv:1909.00986.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Ryan Kiros, Yukun Zhu, Russ R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems, pages 3294–3302.

Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461.

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation. arXiv preprint arXiv:2001.08210.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692.

Lajanugen Logeswaran and Honglak Lee. 2018. An efficient framework for learning sentence representations. arXiv preprint arXiv:1803.02893.

Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. 2016. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843.

Ishan Misra and Laurens van der Maaten. 2020. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6707–6717.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training. URL https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/languageunsupervised/language understanding paper.pdf.

Nils Reimers and Iryna Gurevych. 2019. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv preprint arXiv:1908.10084.

Dinghan Shen, Guoyin Wang, Wenlin Wang, Martin Renqiang Min, Qinliang Su, Yizhe Zhang, Chunyuan Li, Ricardo Henao, and Lawrence Carin. 2018. Baseline needs more love: On simple word-embedding-based models and associated pooling mechanisms. arXiv preprint arXiv:1805.09843.

Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, and Tie-Yan Liu. 2019. Mass: Masked sequence to sequence pre-training for language generation. arXiv preprint arXiv:1905.02450.

Yonglong Tian, Dilip Krishnan, and Phillip Isola. 2019. Contrastive multiview coding. arXiv preprint arXiv:1906.05849.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R Bowman. 2018. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461.

Adina Williams, Nikita Nangia, and Samuel R Bowman. 2017. A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.

Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. 2018. Unsupervised feature learning via non-parametric instance discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3733–3742.

Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V Le. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. arXiv preprint arXiv:1906.08237.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE International Conference on Computer Vision, pages 19–27.

Chengxu Zhuang, Alex Lin Zhai, and Daniel Yamins. 2019. Local aggregation for unsupervised learning of visual embeddings. In Proceedings of the IEEE International Conference on Computer Vision, pages 6002–6012.
