BERTEM for Relation Classification

Unofficial implementation of the first contribution (BERTEM) from the "MTB" paper Matching the Blanks: Distributional Similarity for Relation Learning. We evaluate this method as a baseline in our paper Why only Micro-$F_1$? Class Weighting of Measures for Relation Classification.

Figure: MTB feature extraction.

Table of Contents

- Introduction
- Overview
- Installation
- Usage
- Experiments
- Citation
- License

Introduction

Previous work in relation classification (RC) has put great effort into extracting good relation representations. Inspired by the huge success of Transformers on NLP tasks, Baldini Soares et al. observe that Transformers (such as BERT) are good relation encoders, and further study which of six embedding variants yields the best performance.

This repository is for reproducing the results of BERTEM from their paper: Matching the Blanks: Distributional Similarity for Relation Learning.
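For intuition, the following is a minimal sketch (not this repository's code) of the best-performing variant, "f": entity markers are inserted into the input, and the relation is represented by concatenating the hidden states at the two entity-start markers. The marker strings [E1], [/E1], [E2], [/E2] and the example sentence are this sketch's own choices.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("bert-base-cased")

# Register the entity markers as special tokens and resize the embeddings.
markers = ["[E1]", "[/E1]", "[E2]", "[/E2]"]
tokenizer.add_special_tokens({"additional_special_tokens": markers})
model.resize_token_embeddings(len(tokenizer))

text = "[E1] Bill Gates [/E1] founded [E2] Microsoft [/E2] ."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    hidden = model(**enc).last_hidden_state  # (1, seq_len, hidden_size)

# Locate the two entity-start markers in the input ids.
ids = enc["input_ids"][0].tolist()
e1 = ids.index(tokenizer.convert_tokens_to_ids("[E1]"))
e2 = ids.index(tokenizer.convert_tokens_to_ids("[E2]"))

# Variant "f": relation representation = concatenated entity-start states.
rel_repr = torch.cat([hidden[0, e1], hidden[0, e2]], dim=-1)  # (2 * hidden_size,)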

🔭  Overview

| Path | Description |
| --- | --- |
| configs/ | Hydra config files that specify pre-defined settings. |
| data/ | Directory where users should put their data files. |
| docs/ | Auxiliary files, such as the figures and the license referenced in this README. |
| src/mtb/ | The installable package containing the source code of our implementation. |

🚀  Installation

From source

git clone git@github.com:chen-yuxuan/MTB.git
cd MTB
pip install -e .

💡  Usage

To evaluate the default setting (i.e. model="bert-large-uncased", variant="f", max_length=512, batch_size=64, num_epochs=5, lr=3e-5, dropout=0), run:

python main.py

To run your own setting, for example:

python main.py variant=a model="bert-base-cased" batch_size=32 num_epochs=10
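Since the project uses Hydra, sweeping over several variants in one call should also work via Hydra's multirun mode (assuming a standard Hydra setup):

python main.py --multirun variant=a,b,d,e,f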

To show the default config, do:

python main.py --help

which results in something like this:

== Config ==
Override anything in the config (foo.bar=value)

seed: 1234
cuda_device: 0
train_file: ./data/tacred/train.json
eval_file: ./data/tacred/dev.json
model: bert-large-uncased
variant: f
max_length: 512
batch_size: 64
lr: 3.0e-05
num_epochs: 5
dropout: 0
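For orientation, a Hydra entry point consistent with the config above could look like the following minimal sketch; the actual main.py may differ, and the config name "config" and the train helper are assumptions.

import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="config")
def main(cfg: DictConfig) -> None:
    # Every key shown above (model, variant, lr, ...) is available on cfg
    # and can be overridden from the command line, e.g. `variant=a`.
    print(f"Training {cfg.model} (variant {cfg.variant}) with lr={cfg.lr}")
    # train(cfg)  # hypothetical training routine

if __name__ == "__main__":
    main()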

🔬  Experiments

Here we evaluate on the TACRED and SemEval datasets. Users who have access to the two datasets should put them under the ./data directory, e.g. ./data/tacred/train.json (along with the dev and test JSON files).
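For reference, each example in a TACRED-style JSON file looks roughly like the following (invented sentence and id; field names follow the standard TACRED release, with inclusive token spans):

{
  "id": "illustrative-0001",
  "relation": "org:founded_by",
  "token": ["Microsoft", "was", "founded", "by", "Bill", "Gates", "."],
  "subj_start": 0,
  "subj_end": 0,
  "subj_type": "ORGANIZATION",
  "obj_start": 4,
  "obj_end": 5,
  "obj_type": "PERSON"
}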

TACRED with bert-base-cased

| Variant | Max length | Micro F1 (%) |
| --- | --- | --- |
| a | 512 | 18.4 |
| b | 512 | 65.8 |
| d | 512 | 65.5 |
| e | 512 | 66.3 |
| f | 512 | 65.7 |

SemEval with bert-large-uncased

| Variant | Max length | Micro F1 (%) |
| --- | --- | --- |
| a | 128 | 79.4 |
| b | 128 | 89.2 |
| d | 128 | 88.7 |
| e | 128 | 89.6 |
| f | 128 | 89.0 |

📚  Citation

@inproceedings{baldini-soares-etal-2019-matching,
    title = "Matching the Blanks: Distributional Similarity for Relation Learning",
    author = "Baldini Soares, Livio  and
      FitzGerald, Nicholas  and
      Ling, Jeffrey  and
      Kwiatkowski, Tom",
    booktitle = "Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P19-1279",
    doi = "10.18653/v1/P19-1279",
    pages = "2895--2905",
    abstract = "General purpose relation extractors, which can model arbitrary relations, are a core aspiration in information extraction. Efforts have been made to build general purpose extractors that represent relations with their surface forms, or which jointly embed surface forms with relations from an existing knowledge graph. However, both of these approaches are limited in their ability to generalize. In this paper, we build on extensions of Harris{'} distributional hypothesis to relations, as well as recent advances in learning text representations (specifically, BERT), to build task agnostic relation representations solely from entity-linked text. We show that these representations significantly outperform previous work on exemplar based relation extraction (FewRel) even without using any of that task{'}s training data. We also show that models initialized with our task agnostic representations, and then tuned on supervised relation extraction datasets, significantly outperform the previous methods on SemEval 2010 Task 8, KBP37, and TACRED",
}

📘  License

BERTEM is released under the terms of the MIT License.
