Build a robust text-based toxicity predictor

December 6, 2022 | Machine Learning

With the growth and popularity of online social platforms, people can stay more connected than ever through tools like instant messaging. However, this raises an additional concern about toxic speech, as well as cyberbullying, verbal harassment, and humiliation. Content moderation is crucial for promoting healthy online discussions and creating healthy online environments. To detect toxic language content, researchers have been developing deep learning-based natural language processing (NLP) approaches. Most recent methods employ transformer-based pre-trained language models and achieve high toxicity detection accuracy.

In real-world toxicity detection applications, toxicity filtering is mostly used in security-relevant industries like gaming platforms, where models are constantly being challenged by social engineering and adversarial attacks. As a result, directly deploying text-based NLP toxicity detection models can be problematic, and preventive measures are necessary.

Research has shown that deep neural network models don't make accurate predictions when faced with adversarial examples. There has been a growing interest in investigating the adversarial robustness of NLP models. This has been done with a body of newly developed adversarial attacks designed to fool machine translation, question answering, and text classification systems.

In this post, we train a transformer-based toxicity language classifier using Hugging Face, test the trained model on adversarial examples, and then perform adversarial training and analyze its effect on the trained toxicity classifier.
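The snippets that follow omit their imports for brevity. A consolidated import block covering the libraries this post relies on would look roughly like the following (a sketch; the exact module paths assume recent versions of Transformers, Datasets, and TextAttack):

import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from datasets import Dataset
from IPython.display import HTML, display
from sklearn import metrics
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    classification_report,
    confusion_matrix,
)
from tqdm import tqdm
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

import textattack
from textattack import Attacker
from textattack.attack_recipes import (
    A2TYoo2021,
    DeepWordBugGao2018,
    Pruthi2019,
    TextBuggerLi2018,
    TextFoolerJin2019,
)
from textattack.loggers import CSVLogger
from textattack.models.wrappers import ModelWrapper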

Solution overview

Adversarial examples are intentionally perturbed inputs that aim to mislead machine learning (ML) models towards incorrect outputs. In the following example (source: https://aclanthology.org/2020.emnlp-demos.16.pdf), changing just the word "Perfect" to "Spotless" flipped the NLP model's prediction completely.

Social engineers can use this characteristic of NLP models to bypass toxicity filtering systems. To make text-based toxicity prediction models more robust against deliberate adversarial attacks, the literature has developed several methods. In this post, we showcase one of them, adversarial training, and show how it improves the adversarial robustness of text toxicity prediction models.

Adversarial training

Successful adversarial examples reveal the weakness of the target victim ML model, because the model couldn't accurately predict the label of those adversarial examples. By retraining the model with a mixture of the original training data and successful adversarial examples, the retrained model becomes more robust against future attacks. This process is known as adversarial training.

TextAttack Python library

TextAttack is a Python library for generating adversarial examples and performing adversarial training to improve NLP models' robustness. The library provides implementations of several state-of-the-art text adversarial attacks from the literature and supports a variety of models and datasets. Its code and tutorials are available on GitHub.
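To get a feel for the library before applying it to our toxicity classifier, here is a minimal sketch that attacks a single sentence; the public sentiment checkpoint and the example sentence are illustrative stand-ins, not part of this post's pipeline:

from textattack.attack_recipes import TextFoolerJin2019
from textattack.models.wrappers import HuggingFaceModelWrapper
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Any fine-tuned text classifier works; this public checkpoint is just for illustration
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

attack = TextFoolerJin2019.build(HuggingFaceModelWrapper(model, tokenizer))
# Attack one (text, ground-truth label) pair; a successful result flips the prediction
result = attack.attack("The movie was wonderful.", 1)
print(result)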

Dataset

The Toxic Comment Classification Challenge on Kaggle provides a large number of Wikipedia comments that have been labeled by human raters for toxic behavior. The types of toxicity are:

  • toxic
  • severe_toxic
  • obscene
  • threat
  • insult
  • identity_hate

In this post, we only predict the toxic column. The train set contains 159,571 instances with 144,277 non-toxic and 15,294 toxic examples, and the test set contains 63,978 instances with 57,888 non-toxic and 6,090 toxic examples. We split the test set into validation and test sets, each containing 31,989 instances with 29,028 non-toxic and 2,961 toxic examples. The following charts illustrate our data distribution.

For the purpose of demonstration, this post randomly samples 10,000 instances for training, and 1,000 each for validation and testing, with each dataset balanced across both classes. For details, refer to our notebook.
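One plausible way to draw those balanced subsamples is sketched below; the DataFrame names and the helper function are hypothetical, not from the original notebook:

import pandas as pd

# Hypothetical helper: draw an equal number of toxic and non-toxic rows
def balanced_sample(df, n_total, seed=42):
    pos = df[df["labels"] == 1].sample(n_total // 2, random_state=seed)
    neg = df[df["labels"] == 0].sample(n_total // 2, random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed).reset_index(drop=True)

df_train = balanced_sample(df_train_full, 10_000)  # 10,000 for training
df_valid = balanced_sample(df_valid_full, 1_000)   # 1,000 for validation
df_test = balanced_sample(df_test_full, 1_000)     # 1,000 for testing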

Train a transformer-based toxic language classifier

The first step is to train a transformer-based toxic language classifier. We use the pre-trained DistilBERT language model as a base and fine-tune it on the Jigsaw toxic comment classification training dataset.

Tokenization

Tokens are the building blocks of natural language inputs. Tokenization is a way of separating a piece of text into tokens. Tokens can take several forms: words, characters, or subwords. For the models to understand the input text, a tokenizer is used to prepare the inputs for an NLP model. A few examples of tokenizing include splitting strings into subword token strings, converting token strings to IDs, and adding new tokens to the vocabulary.

In the following code, we use the pre-trained DistilBERT tokenizer to process the train and test datasets:

pretrained_model_name_or_path = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path)

def preprocess_function(examples):
    result = tokenizer(
        examples["text"], padding="max_length", max_length=128, truncation=True
    )
    return result

train_dataset = train_dataset.map(
    preprocess_function, batched=True, load_from_cache_file=False, num_proc=num_proc
)

valid_dataset = valid_dataset.map(
    preprocess_function, batched=True, load_from_cache_file=False, num_proc=num_proc
)

test_dataset = test_dataset.map(
    preprocess_function, batched=True, load_from_cache_file=False, num_proc=num_proc
)

For each input text, the DistilBERT tokenizer outputs four features:

  • text – Input text.
  • labels – Output labels.
  • input_ids – Indexes of input sequence tokens in a vocabulary.
  • attention_mask – Mask to avoid performing attention on padding token indexes. Mask values are in [0, 1]:
    • 1 for tokens that are not masked.
    • 0 for tokens that are masked.
Now that we’ve the tokenized dataset, the following step is to coach the binary poisonous language classifier.

Modeling

The first step is to load the base model, which is a pre-trained DistilBERT language model. The model is loaded with the Hugging Face Transformers class AutoModelForSequenceClassification:

base_model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path, num_labels=1
)

Then we customize the hyperparameters using the TrainingArguments class. The model is trained with batch size 32 for 10 epochs with a learning rate of 5e-6 and 500 warmup steps. The trained model is saved in model_dir, which was defined at the beginning of the notebook.

training_args = TrainingArguments(
    output_dir=model_dir,
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=5,
    logging_dir=os.path.join(model_dir, "logs"),
    learning_rate=5e-6,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    disable_tqdm=True,
)

To evaluate the model's performance during training, we need to provide the Trainer with an evaluation function. Here we report accuracy, F1 scores, average precision, and AUC scores.

# compute metrics function
def compute_metrics(pred):
    targets = 1 * (pred.label_ids >= 0.5)
    outputs = 1 * (pred.predictions >= 0.5)
    accuracy = metrics.accuracy_score(targets, outputs)
    f1_score_micro = metrics.f1_score(targets, outputs, average="micro")
    f1_score_macro = metrics.f1_score(targets, outputs, average="macro")
    f1_score_weighted = metrics.f1_score(targets, outputs, average="weighted")
    ap_score_micro = metrics.average_precision_score(
        targets, pred.predictions, average="micro"
    )
    ap_score_macro = metrics.average_precision_score(
        targets, pred.predictions, average="macro"
    )
    ap_score_weighted = metrics.average_precision_score(
        targets, pred.predictions, average="weighted"
    )
    auc_score_micro = metrics.roc_auc_score(targets, pred.predictions, average="micro")
    auc_score_macro = metrics.roc_auc_score(targets, pred.predictions, average="macro")
    auc_score_weighted = metrics.roc_auc_score(
        targets, pred.predictions, average="weighted"
    )
    return {
        "accuracy": accuracy,
        "f1_score_micro": f1_score_micro,
        "f1_score_macro": f1_score_macro,
        "f1_score_weighted": f1_score_weighted,
        "ap_score_micro": ap_score_micro,
        "ap_score_macro": ap_score_macro,
        "ap_score_weighted": ap_score_weighted,
        "auc_score_micro": auc_score_micro,
        "auc_score_macro": auc_score_macro,
        "auc_score_weighted": auc_score_weighted,
    }

The Trainer class provides an API for feature-complete training in PyTorch. Let's instantiate the Trainer by providing the base model, training arguments, training and evaluation datasets, as well as the evaluation function:

trainer = Trainer(
    model=base_model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    compute_metrics=compute_metrics,
)

After the Trainer is instantiated, we can kick off the training process:

train_result = trainer.train()

When the training process is finished, we save the tokenizer and model artifacts locally:

tokenizer.save_pretrained(model_dir)
trainer.save_model(model_dir)

Evaluate the model robustness

In this section, we try to answer one question: how robust is our toxicity filtering model against text-based adversarial attacks? To answer it, we select an attack recipe from the TextAttack library and use it to construct perturbed adversarial examples to fool our target toxicity filtering model. Each attack recipe generates text adversarial examples by transforming seed text inputs into slightly modified text samples, while making sure the seed and its perturbed text follow certain language constraints (for example, that semantics are preserved). If these newly generated examples trick a target model into wrong classifications, the attack succeeds; otherwise, the attack fails for that seed input.


A target model's adversarial robustness is evaluated through the Attack Success Rate (ASR) metric, defined as the ratio of successful attacks to all attempted attacks: ASR = (number of successful attacks) / (total number of attacks). The lower the ASR, the more robust a model is against adversarial attacks.


First, we define a custom model wrapper that ties tokenization and model prediction together. This step also makes sure the prediction outputs meet the output format required by the TextAttack library.

class CustomModelWrapper(ModelWrapper):
    def __init__(self, model):
        self.model = model

    def __call__(self, text_input_list):
        device = self.model.device
        encoded_input = tokenizer(
            text_input_list,
            truncation=True,
            padding="max_length",
            max_length=128,
            return_tensors="pt",
        ).to(device)

        with torch.no_grad():
            output = self.model(**encoded_input)
        logits = output.logits
        preds = torch.sigmoid(logits)
        preds = preds.squeeze(dim=-1)
        # Return class probabilities in the format TextAttack expects: [P(non-toxic), P(toxic)]
        final_preds = torch.stack((1 - preds, preds), dim=1)
        return final_preds

Now we load the trained model and create a custom model wrapper with it:

trained_model = AutoModelForSequenceClassification.from_pretrained(model_dir)
trained_model = trained_model.to("cuda:0")

model_wrapper = CustomModelWrapper(trained_model)

Generate attacks

Now we need to prepare the dataset as seeds for an attack recipe. Here we only use the toxic examples as seeds because, in a real-world scenario, a social engineer will mostly try to perturb toxic examples to fool a target filtering model into classifying them as benign. Attacks can take time to generate; for the purpose of this post, we randomly sample 1,000 toxic training samples to attack.

We generate adversarial examples for both the test and train datasets. We use the test adversarial examples for robustness evaluation and the train adversarial examples for adversarial training.

threshold = 0.5
sub_sample_to_attack = 1000
df_train_to_attack = df_train[df_train['labels']==1].sample(sub_sample_to_attack)

## We attack the toxic samples
## Goal is to perturb toxic samples enough that the model classifies them as non-toxic
test_dataset_to_attack = textattack.datasets.Dataset(
    [
        (x, 1)
        for x, y in zip(
            test_dataset["text"], 
            test_dataset["labels"], 
        )
        if y > threshold
    ]
)

train_dataset_to_attack = textattack.datasets.Dataset(
    [
        (x, 1)
        for x, y in zip(
            df_train_to_attack["text"],
            df_train_to_attack["labels"],
        )
        if y > threshold
    ]
)

Then we define the function that generates the attacks:

def generate_attacks(
    recipe, model_wrapper, dataset_to_attack, num_examples=-1, parallel=False
):
    print(f"The Attack Recipe is: {recipe}")
    if recipe == "textfoolerjin2019":
        attack = TextFoolerJin2019.build(model_wrapper)
    elif recipe == "a2t_yoo_2021":
        attack = A2TYoo2021.build(model_wrapper)
    elif recipe == "Pruthi2019":
        attack = Pruthi2019.build(model_wrapper)
    elif recipe == "TextBuggerLi2018":
        attack = TextBuggerLi2018.build(model_wrapper)
    elif recipe == "DeepWordBugGao2018":
        attack = DeepWordBugGao2018.build(model_wrapper)

    attack_args = textattack.AttackArgs(
        num_examples=num_examples, parallel=parallel, num_workers_per_device=5
    )
    ## num_examples = -1 means the entire dataset
    attacker = Attacker(attack, dataset_to_attack, attack_args)
    attack_results = attacker.attack_dataset()
    return attack_results

Select an attack recipe and generate the attacks:

%%time
recipe="textfoolerjin2019"
test_attack_results = generate_attacks(recipe, model_wrapper, test_dataset_to_attack, num_examples=-1)
train_attack_results = generate_attacks(recipe, model_wrapper, train_dataset_to_attack, num_examples=-1)

Log the attack results into a pandas DataFrame:

def log_attack_results(attack_results):
    exception_ids = []
    logger = CSVLogger(color_method="html")

    for i in range(len(attack_results)):
        try:
            result = attack_results[i]
            logger.log_attack_result(result)
        except Exception:
            exception_ids.append(i)
    df_attacks = logger.df
    return df_attacks, exception_ids


df_attacks_test, test_exception_ids = log_attack_results(test_attack_results)
df_attacks_train, train_exception_ids = log_attack_results(train_attack_results)

The attack results contain original_text, perturbed_text, original_output, and perturbed_output. When perturbed_output is the opposite of original_output, the attack is successful.


display(
    HTML(df_attacks_test[["original_text", "perturbed_text"]].head().to_html(escape=False))
)

In the rendered table, red text represents a successful attack, and green text represents a failed attack.


Evaluate the model robustness through ASR

Use the following code to evaluate the model robustness:

ASR_test = (
    df_attacks_test.result_type.value_counts()["Successful"]
    / df_attacks_test.result_type.value_counts().sum()
)

ASR_train = (
    df_attacks_train.result_type.value_counts()["Successful"]
    / df_attacks_train.result_type.value_counts().sum()
)

print(f"The Assault Success Charge of the mannequin towards check dataset is {ASR_test*100}%")

print(f"The Assault Success Charge of the mannequin towards prepare dataset is {ASR_train*100}%")

This returns the following:

The Attack Success Rate of the model against the test dataset is 52.400000000000006%
The Attack Success Rate of the model against the train dataset is 51.1%

Prepare the successful attacks

With all the attack results available, we take the successful attacks from the train adversarial examples and use them to retrain the model:

# Supply the original labels to the successful attacks
# Here the original labels are all 1; there are also some datasets with fractional labels between 0-1

df_attacks_train = df_attacks_train[["perturbed_text", "result_type"]].copy()
df_attacks_train["labels"] = df_train_to_attack["labels"].reset_index(drop=True)

# Clean the text: strip the HTML color markup added by the CSV logger
df_attacks_train["text"] = df_attacks_train["perturbed_text"].replace(
    "<font color = .{1,6}>|</font>", "", regex=True
)
df_attacks_train["text"] = df_attacks_train["text"].replace("<SPLIT>", "\n", regex=True)

# Prepare data to add to the training dataset
df_succ_attacks_train = df_attacks_train.loc[
    df_attacks_train.result_type == "Successful", ["text", "labels"]
]
df_succ_attacks_train.shape, df_succ_attacks_train.head(2)


Adversarial training

In this section, we combine the successful adversarial attacks from the training data with the original training data, then train a new model on this combined dataset. This model is called the adversarially trained model.

# New Train: Original Train + Successful Attacks on Original Train

df_train_attacked = pd.concat([df_train, df_succ_attacks_train], ignore_index=True)
data_train_attacked = Dataset.from_pandas(df_train_attacked)
data_train_attacked = data_train_attacked.map(
    preprocess_function, batched=True, load_from_cache_file=False, num_proc=num_proc
)

training_args_AT = TrainingArguments(
    output_dir=model_dir_AT,
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=5,
    logging_dir=os.path.join(model_dir, "logs"),
    learning_rate=5e-6,
    load_best_model_at_end=True,
    metric_for_best_model="loss",
    disable_tqdm=True,
)

trainer_AT = Trainer(
    model=base_model,
    args=training_args_AT,
    train_dataset=data_train_attacked,
    eval_dataset=valid_dataset,
    compute_metrics=compute_metrics
)

trainer_AT.train()

Save the adversarially trained model to the local directory model_dir_AT:

tokenizer.save_pretrained(model_dir_AT)
trainer_AT.save_model(model_dir_AT)

Evaluate the robustness of the adversarially trained model

Now that the model is adversarially trained, we want to see how the model robustness changes accordingly:

trained_model_AT = AutoModelForSequenceClassification.from_pretrained(model_dir_AT)
trained_model_AT = trained_model_AT.to("cuda:0")
trained_model_AT.device

model_wrapper_AT = CustomModelWrapper(trained_model_AT)
test_attack_results_AT = generate_attacks(recipe, model_wrapper_AT, test_dataset_to_attack, num_examples=-1)
df_attacks_AT_test, AT_test_exception_ids = log_attack_results(test_attack_results_AT)

ASR_AT_test = (
    df_attacks_AT_test.result_type.value_counts()["Successful"]
    / df_attacks_AT_test.result_type.value_counts().sum()
)

print(f"The Assault Success Charge of the mannequin is {ASR_AT_test*100}%")

The preceding code returns the following results:

The Attack Success Rate of the model is 19.8%

Compare the robustness of the original model and the adversarially trained model:

print(
    f"The ASR of the adversarially trained model shows a {(ASR_test - ASR_AT_test)/ASR_test*100}% decrease compared with the original model. This shows that adversarial training improves the model's robustness against the attacks."
)

This returns the following:

The ASR of the adversarially trained model shows a 62.213740458015266% decrease
compared with the original model. This shows that adversarial training
improves the model's robustness against the attacks.

To date, we’ve skilled a DistilBERT-based binary toxicity language classifier, examined its robustness towards adversarial textual content assaults, carried out adversarial coaching to acquire a brand new toxicity language classifier, and examined the brand new mannequin’s robustness towards adversarial textual content assaults.


We observe that the adversarially trained model has a lower ASR, with a 62.21% decrease using the original model's ASR as the benchmark. This indicates that the model is more robust against certain adversarial attacks.

Model performance evaluation

Besides model robustness, we're also interested in how a model predicts on clean samples after it's adversarially trained. In the following code, we use batch prediction mode to speed up the evaluation process:

def batch_predict(model_wrapper, text_list, batch_size=64):
    """This function performs batch prediction for a given model and text list"""
    predictions = []
    for i in tqdm(range(0, len(text_list), batch_size)):
        batch = text_list[i : i + batch_size]
        model_predictions = model_wrapper(batch)[:, 1]
        model_predictions = model_predictions.cpu().numpy()
        predictions.append(model_predictions)
    # Concatenate once after the loop; concatenating inside it would break the next append
    predictions = np.concatenate(predictions, axis=0)
    return predictions

Evaluate the original model

We use the following code to evaluate the original model:

test_text_list = df_test.text.to_list()

model_predictions = batch_predict(model_wrapper, test_text_list, batch_size=64)

y_true_prob = np.array(df_test["labels"])
y_true = [0 if x < 0.5 else 1 for x in y_true_prob]

threshold = 0.5
y_pred_prob = model_predictions.flatten()
y_pred = [0 if x < threshold else 1 for x in y_pred_prob]

fig, ax = plt.subplots(figsize=(10, 10))
conf_matrix = confusion_matrix(y_true, y_pred)
ConfusionMatrixDisplay(conf_matrix).plot(ax=ax)
print(classification_report(y_true, y_pred))

The following figures summarize our findings.

Evaluate the adversarially trained model

Use the following code to evaluate the adversarially trained model:

model_predictions_AT = batch_predict(model_wrapper_AT, test_text_list, batch_size=64)

y_pred_prob_AT = model_predictions_AT.flatten()
y_pred_AT = [0 if x < threshold else 1 for x in y_pred_prob_AT]

fig, ax = plt.subplots(figsize=(10, 10))
conf_matrix = confusion_matrix(y_true, y_pred_AT)
ConfusionMatrixDisplay(conf_matrix).plot(ax=ax)
print(classification_report(y_true, y_pred_AT))

The following figures summarize our findings.

We observe that the adversarially trained model tended to predict more examples as toxic (801 predicted as 1) compared with the original model (763 predicted as 1), which leads to an increase in recall of the toxic class and precision of the non-toxic class, and a drop in precision of the toxic class and recall of the non-toxic class. This may be because more of the toxic class is seen during the adversarial training process.

Summary

As part of content moderation, toxicity language classifiers are used to filter toxic content and create healthier online environments. Real-world deployment of toxicity filtering models requires not only high prediction performance, but also robustness against social engineering, like adversarial attacks. This post provides a step-by-step process from training a toxicity language classifier to improving its robustness with adversarial training. We show that adversarial training can help a model become more robust against attacks while maintaining high model performance. For more information about this emerging topic, we encourage you to explore and test our script on your own. You can access the notebook for this post from the AWS Examples GitHub repo.

Hugging Face and AWS announced a partnership earlier in 2022 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through the development of Hugging Face AWS DLCs. These containers include the Hugging Face Transformers, Tokenizers, and Datasets libraries, which allow us to use these resources for training and inference jobs. For a list of the available DLC images, see Available Deep Learning Containers Images. They are maintained and regularly updated with security patches.
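For context, launching a fine-tuning job like ours through the Hugging Face estimator might look like the following sketch; the entry point script, instance type, framework versions, and S3 paths are assumptions, not taken from this post:

import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()  # assumes a SageMaker notebook environment

huggingface_estimator = HuggingFace(
    entry_point="train.py",  # hypothetical training script wrapping the code above
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.17",
    pytorch_version="1.10",
    py_version="py38",
    hyperparameters={"epochs": 10, "train_batch_size": 32, "learning_rate": 5e-6},
)

huggingface_estimator.fit({"train": "s3://your-bucket/toxicity/train"})  # hypothetical S3 path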

You’ll find many examples of learn how to prepare Hugging Face fashions with these DLCs within the following GitHub repo.

AWS offers pre-trained AWS AI services that can be integrated into applications using API calls and require no ML experience. For example, Amazon Comprehend can perform NLP tasks such as custom entity recognition, sentiment analysis, key phrase extraction, topic modeling, and more to gather insights from text. It can perform text analysis on a wide variety of languages with its various features.
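As a minimal illustration, a single boto3 call returns the sentiment of a piece of text; the example sentence is arbitrary and the call assumes valid AWS credentials:

import boto3

comprehend = boto3.client("comprehend")
response = comprehend.detect_sentiment(
    Text="The customer service was excellent!", LanguageCode="en"
)
print(response["Sentiment"], response["SentimentScore"])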

About the Authors

Yi Xiang is a Data Scientist II at the Amazon Machine Learning Solutions Lab, where she helps AWS customers across different industries accelerate their AI and cloud adoption.

Yanjun Qi is a Principal Applied Scientist at the Amazon Machine Learning Solutions Lab. She innovates and applies machine learning to help AWS customers speed up their AI and cloud adoption.
