
Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker

April 15, 2022 | Machine Learning

Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech signals to text, so they are also referred to as speech-to-text applications.

This technology has matured in recent years, and many of the latest models achieve very good performance, such as the transformer-based models Wav2Vec2 and Speech2Text. Transformer is a sequence-to-sequence deep learning architecture originally proposed for machine translation. It has since been extended to solve all kinds of natural language processing (NLP) tasks, such as text classification, text summarization, and ASR. The transformer architecture yields excellent results on numerous NLP tasks; however, the models' sizes (the number of parameters), as well as the amount of data they're pre-trained on, increase exponentially in the pursuit of better performance. It becomes very time-consuming and costly to train a transformer from scratch; for example, training a BERT model from scratch could take 4 days and cost $6,912 (for more information, see The Staggering Cost of Training SOTA AI Models). Hugging Face, an AI company, provides an open-source platform where developers can share and reuse thousands of pre-trained transformer models. With the transfer learning technique, you can fine-tune such a model with a small set of labeled data for a target use case. This reduces the overall compute cost, speeds up the development lifecycle, and lessens the carbon footprint of the organization.

AWS announced a collaboration with Hugging Face in 2021. Developers can easily work with Hugging Face models on Amazon SageMaker and benefit from both worlds: you can fine-tune and optimize any model from Hugging Face, while SageMaker provides managed training and inference services that offer high-performance resources and high scalability via the Amazon SageMaker distributed training libraries. This collaboration can help you accelerate your NLP productization journey and realize business benefits.

This post shows how to use SageMaker to easily fine-tune the latest Wav2Vec2 model from Hugging Face, and then deploy the model with a custom-defined inference process to a SageMaker managed inference endpoint. Finally, you can test the model performance with sample audio clips and review the corresponding transcription as output.

Wav2Vec2 background

Wav2Vec2 is a transformer-based architecture for ASR tasks that was released in September 2020. The following diagram shows its simplified architecture; for more details, see the original paper. As the diagram shows, the model consists of a multi-layer convolutional neural network (CNN) that acts as a feature extractor, taking an input audio signal and outputting audio representations, also considered as features. These are fed into a transformer network to generate contextualized representations. This part of training can be self-supervised; the transformer can be trained with unlabeled speech and learn from it. Then the model is fine-tuned on labeled data with the Connectionist Temporal Classification (CTC) algorithm for specific ASR tasks. The base model we use in this post is Wav2Vec2-Base-960h, fine-tuned on 960 hours of Librispeech on 16 kHz sampled speech audio.

CTC is a character-based algorithm. During training, it's able to demarcate each character of the transcription in the speech automatically, so no timeframe alignment is required between the audio signal and the transcription. For example, if the audio clip says "Hello World," we don't need to know in which second the word "hello" is located. This saves a lot of labeling effort for ASR use cases. For more information about how the algorithm works, refer to Sequence Modeling With CTC.
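Before moving to SageMaker, it can help to see the model in action. The following is a minimal local sketch (not part of the solution's code) that runs greedy CTC decoding with the pre-trained Wav2Vec2-Base-960h checkpoint using the open-source transformers library; it assumes speech holds a 16 kHz mono waveform as a NumPy float array:

# minimal local inference sketch with the pre-trained checkpoint
# (assumes transformers and torch are installed, and that `speech`
# is a 16 kHz mono waveform as a float NumPy array)
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits   # per-frame character scores

predicted_ids = torch.argmax(logits, dim=-1)     # greedy CTC decoding
print(processor.batch_decode(predicted_ids)[0])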

Solution overview

In this post, we use the SUPERB (Speech processing Universal PERformance Benchmark) dataset available from the Hugging Face Datasets library, fine-tune the Wav2Vec2 model, and deploy it as a SageMaker endpoint for real-time inference for an ASR task. SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks.

The following diagram provides a high-level view of the solution workflow.

First, we show how to load and preprocess the SUPERB dataset in a SageMaker environment in order to obtain a tokenizer and feature extractor, which are required for fine-tuning the Wav2Vec2 model. Then we use SageMaker Script Mode for the training and inference steps, which lets you define and use custom training and inference scripts, while SageMaker provides supported Hugging Face framework Docker containers. For more information about training and serving Hugging Face models on SageMaker, see Use Hugging Face with Amazon SageMaker. This functionality is available through the development of Hugging Face AWS Deep Learning Containers (DLCs).

The notebook and code from this post are available on GitHub. The notebook is tested in both Amazon SageMaker Studio and SageMaker notebook environments.

Data preprocessing

In this section, we walk through the steps to preprocess the data.

Process the dataset

In this post we use the SUPERB dataset, which you can load directly from the Hugging Face Datasets library using the load_dataset function. The SUPERB dataset also includes speaker_id and chapter_id; we remove these columns and only keep the audio files and transcriptions to fine-tune the Wav2Vec2 model for an ASR task, which transcribes speech to text. To speed up the fine-tuning process for this example, we only take the test split from the original dataset, then split it into train and test datasets. See the following code:

information = load_dataset("very good", 'asr', ignore_verifications=True) 
information = information.remove_columns(['speaker_id', 'chapter_id', 'id'])
# scale back the info quantity for this instance. solely take the take a look at information from the unique dataset for fine-tune
information = information['test'] 

train_test = information.train_test_split(test_size=0.2)
dataset = DatasetDict({
    'practice': train_test['train'],
    'take a look at': train_test['test']})

After we process the data, the dataset structure is as follows:

DatasetDict({
    train: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 2096
    })
    test: Dataset({
        features: ['file', 'audio', 'text'],
        num_rows: 524
    })
})

Let's print one data point from the train dataset and examine the information in each feature. 'file' is the audio file path where it's stored and cached in the local repository. 'audio' contains three parts: 'path' is the same as 'file', 'array' is the numerical representation of the raw waveform of the audio file in NumPy array format, and 'sampling_rate' shows the number of samples of audio recorded every second. 'text' is the transcript of the audio file.

print(dataset['train'][0])

result:
{'file': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
 'audio': {'path': '/root/.cache/huggingface/datasets/downloads/extracted/e0f3d50e856945385982ba36b58615b72eef9b2ba5a2565bdcc225b70f495eed/LibriSpeech/test-clean/7021/85628/7021-85628-0000.flac',
  'array': array([-0.00018311, -0.00024414, -0.00018311, ...,  0.00061035,
          0.00064087,  0.00061035], dtype=float32),
  'sampling_rate': 16000},
 'text': 'but anders cared nothing about that'}

Build a vocabulary file

The Wav2Vec2 model uses the CTC algorithm to train deep neural networks on sequence problems, and its output is a single letter or blank. It uses a character-based tokenizer. Therefore, we extract distinct letters from the dataset and build the vocabulary file using the following code:

def extract_characters(batch):
  texts = " ".join(batch["text"])
  vocab = list(set(texts))
  return {"vocab": [vocab], "texts": [texts]}

vocabs = dataset.map(extract_characters, batched=True, batch_size=-1,
                   keep_in_memory=True, remove_columns=dataset.column_names["train"])

vocab_list = list(set(vocabs["train"]["vocab"][0]) | set(vocabs["test"]["vocab"][0]))
vocab_dict = {v: k for k, v in enumerate(vocab_list)}
vocab_dict["|"] = vocab_dict[" "]
del vocab_dict[" "]

vocab_dict["[UNK]"] = len(vocab_dict) # add an "unknown" token
vocab_dict["[PAD]"] = len(vocab_dict) # add a padding token that corresponds to CTC's "blank" token

with open('vocab.json', 'w') as vocab_file:
    json.dump(vocab_dict, vocab_file)

Create a tokenizer and feature extractor

The Wav2Vec2 model contains a tokenizer and a feature extractor. In this step, we use the vocab.json file that we created in the previous step to create the Wav2Vec2CTCTokenizer. We use Wav2Vec2FeatureExtractor to make sure that the dataset used in fine-tuning has the same audio sampling rate as the dataset used for pre-training. Finally, we create a Wav2Vec2 processor that wraps the feature extractor and the tokenizer into one single processor. See the following code:

# create Wav2Vec2 tokenizer
tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",
                                  pad_token="[PAD]", word_delimiter_token="|")

# create Wav2Vec2 feature extractor
feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16000, 
                                             padding_value=0.0, do_normalize=True, return_attention_mask=False)
# create a processor pipeline 
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

Prepare the train and test datasets

Next, we extract the array representation of the audio files and their sampling_rate from the dataset and process them using the processor, in order to produce train and test data that can be consumed by the model:

# extract the numerical representation from the dataset
def extract_array_samplingrate(batch):
    batch["speech"] = batch['audio']['array'].tolist()
    batch["sampling_rate"] = batch['audio']['sampling_rate']
    batch["target_text"] = batch["text"]
    return batch

dataset = dataset.map(extract_array_samplingrate, 
                      remove_columns=dataset.column_names["train"])

# process the dataset with the processor pipeline created above
def process_dataset(batch):  
    batch["input_values"] = processor(batch["speech"], 
                            sampling_rate=batch["sampling_rate"][0]).input_values

    with processor.as_target_processor():
        batch["labels"] = processor(batch["target_text"]).input_ids
    return batch

data_processed = dataset.map(process_dataset, 
                    remove_columns=dataset.column_names["train"], batch_size=8, 
                    batched=True)

train_dataset = data_processed['train']
test_dataset = data_processed['test']

Then we upload the train and test data to Amazon Simple Storage Service (Amazon S3) using the following code:

from datasets.filesystems import S3FileSystem
s3 = S3FileSystem()

# save train_dataset to s3
training_input_path = f's3://{BUCKET}/{PREFIX}/train'
train_dataset.save_to_disk(training_input_path,fs=s3)

# save test_dataset to s3
test_input_path = f's3://{BUCKET}/{PREFIX}/test'
test_dataset.save_to_disk(test_input_path,fs=s3)

Fine-tune the Hugging Face model (Wav2Vec2)

We use SageMaker Hugging Face DLC Script Mode to construct the training and inference jobs, which lets you write custom training and serving code and use Hugging Face framework containers that are maintained and supported by AWS.


When we create a training job using Script Mode, the entry_point script, the hyperparameters, its dependencies (inside requirements.txt), and the input data (train and test datasets) are copied into the container. SageMaker then invokes the entry_point training script, where the train and test datasets are loaded, training steps are performed, and model artifacts are saved in /opt/ml/model in the container. After training, artifacts in this directory are uploaded to Amazon S3 for later model hosting.

You can inspect the training script in the GitHub repo, in the scripts/ directory.
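For orientation, the following sketch shows the general shape of a Script Mode entry point; it's illustrative rather than the repo's actual train.py. SageMaker injects channel and output locations through environment variables and passes hyperparameters as command-line arguments:

# skeleton of a Script Mode entry point (illustrative; see the repo for the real train.py)
import argparse
import os

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # hyperparameters are passed by SageMaker as command-line arguments
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--train_batch_size", type=int, default=8)
    parser.add_argument("--model_name", type=str)
    parser.add_argument("--vocab_url", type=str)
    # channel and output locations injected by the SageMaker container
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST"))
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR"))
    args, _ = parser.parse_known_args()

    # load the datasets saved earlier, fine-tune with the Hugging Face Trainer,
    # then write artifacts to args.model_dir (/opt/ml/model) so SageMaker
    # uploads them to Amazon S3 for hosting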

Create an estimator and start a training job

We use the Hugging Face estimator class to train our model. When creating the estimator, you need to specify the following parameters:

  • entry_point – The name of the training script. It loads data from the input channels, configures training with hyperparameters, trains a model, and saves the model.
  • source_dir – The location of the training scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use.
  • pytorch_version – The PyTorch version that's compatible with the Transformers library.

For this use case and dataset, we use one ml.p3.2xlarge instance, and the training job finishes in around 2 hours. You can select a more powerful instance with more memory and GPU to reduce the training time; however, it incurs more cost.

When you create a Hugging Face estimator, you can configure hyperparameters and supply a custom parameter to the training script, such as vocab_url in this example. You can also specify metrics in the estimator, parse the logs for these metrics, and send them to Amazon CloudWatch to monitor and track the training performance. For more details, see Monitor and Analyze Training Jobs Using Amazon CloudWatch Metrics.

from sagemaker.huggingface import HuggingFace

# create a unique id to tag the training job, model name, and endpoint name
id = int(time.time())

TRAINING_JOB_NAME = f"huggingface-wav2vec2-training-{id}"
vocab_url = f"s3://{BUCKET}/{PREFIX}/vocab.json"

hyperparameters = {'epochs': 10, # you can increase the epoch number to improve model accuracy
                   'train_batch_size': 8,
                   'model_name': "facebook/wav2vec2-base",
                   'vocab_url': vocab_url
                  }
                  
# define metric definitions to parse from the training logs (representative
# regexes for the Hugging Face Trainer's evaluation output; adjust the metric
# names to match what your training script actually logs)
metric_definitions=[
        {'Name': 'eval_loss', 'Regex': "'eval_loss': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_wer', 'Regex': "'eval_wer': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_runtime', 'Regex': "'eval_runtime': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'eval_samples_per_second', 'Regex': "'eval_samples_per_second': ([0-9]+(.|e-)[0-9]+),?"},
        {'Name': 'epoch', 'Regex': "'epoch': ([0-9]+(.|e-)[0-9]+),?"}]

OUTPUT_PATH= f's3://{BUCKET}/{PREFIX}/{TRAINING_JOB_NAME}/output/'

huggingface_estimator = HuggingFace(entry_point="practice.py",
                                    source_dir="./scripts",
                                    output_path= OUTPUT_PATH, 
                                    instance_type="ml.p3.2xlarge",
                                    instance_count=1,
                                    transformers_version='4.6.1',
                                    pytorch_version='1.7.1',
                                    py_version='py36',
                                    role=ROLE,
                                    hyperparameters = hyperparameters,
                                    metric_definitions = metric_definitions,
                                   )

# start the training job using the fit function; training takes approximately 2 hours to complete
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path},
                          job_name=TRAINING_JOB_NAME)

In the following figure of CloudWatch training job logs, you can see that, after 10 epochs of training, the model evaluation metric WER (word error rate) reaches around 0.17 for the subset of the SUPERB dataset. WER is a commonly used metric to evaluate speech recognition model performance, and the objective is to minimize it. You can increase the number of epochs or use the full SUPERB dataset to improve the model further.
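For reference, WER counts word-level substitutions, insertions, and deletions against the reference transcript, divided by the number of reference words. The following sketch computes it locally with the third-party jiwer package; the training script itself may compute WER differently:

# illustrative WER computation with the jiwer package (pip install jiwer)
from jiwer import wer

reference  = "but anders cared nothing about that"
hypothesis = "but anders cared nothing about than"   # one substituted word

print(wer(reference, hypothesis))  # 1 error / 6 reference words ≈ 0.17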


Deploy the model as an endpoint on SageMaker and run inference

In this section, we walk through the steps to deploy the model and perform inference.

Inference script

We use the SageMaker Hugging Face Inference Toolkit to host our fine-tuned model. It provides default functions for preprocessing, predicting, and postprocessing for certain tasks. However, the default functions can't serve our model properly. Therefore, we define the custom functions model_fn(), input_fn(), predict_fn(), and output_fn() in the inference.py script to override the default settings with our custom requirements. For more details, refer to the GitHub repo.

As of January 2022, the Inference Toolkit can serve tasks from architectures that end with 'TapasForQuestionAnswering', 'ForQuestionAnswering', 'ForTokenClassification', 'ForSequenceClassification', 'ForMultipleChoice', 'ForMaskedLM', 'ForCausalLM', 'ForConditionalGeneration', 'MTModel', 'EncoderDecoderModel', 'GPT2LMHeadModel', and 'T5WithLMHeadModel'. The Wav2Vec2 model isn't currently supported.

You can inspect the full inference script in the GitHub repo, in the scripts/ directory.
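For orientation, the following is a minimal sketch of the override pattern rather than the repo's exact inference.py. The Inference Toolkit calls these four hooks, when defined, to load the model, parse the request, run prediction, and serialize the response; the request format matches the JSON payload we send to the endpoint later in this post:

# skeleton of a custom inference script for the SageMaker Hugging Face
# Inference Toolkit (illustrative; see the repo for the real inference.py)
import json
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

def model_fn(model_dir):
    # load the fine-tuned artifacts that training saved to /opt/ml/model
    processor = Wav2Vec2Processor.from_pretrained(model_dir)
    model = Wav2Vec2ForCTC.from_pretrained(model_dir)
    return model, processor

def input_fn(request_body, content_type):
    data = json.loads(request_body)
    return data["speech_array"], data["sampling_rate"]

def predict_fn(data, model_and_processor):
    model, processor = model_and_processor
    speech_array, sampling_rate = data
    inputs = processor(speech_array, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)   # greedy CTC decoding
    return processor.batch_decode(predicted_ids)

def output_fn(prediction, accept):
    return json.dumps(prediction), accept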

Create a Hugging Face model from the estimator

We use the Hugging Face Model class to create a model object, which you can deploy to a SageMaker endpoint. When creating the model, specify the following parameters:

  • entry_point – The name of the inference script. The methods defined in the inference script are implemented on the endpoint.
  • source_dir – The location of the inference scripts.
  • transformers_version – The Hugging Face Transformers library version we want to use. It should be consistent with the training step.
  • pytorch_version – The PyTorch version that's compatible with the Transformers library. It should be consistent with the training step.
  • model_data – The Amazon S3 location of a SageMaker model data .tar.gz file.

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
        entry_point="inference.py",
        source_dir="./scripts",
        name=f'huggingface-wav2vec2-model-{id}',
        transformers_version='4.6.1', 
        pytorch_version='1.7.1', 
        py_version='py36',
        model_data=huggingface_estimator.model_data,
        role=ROLE,
    )

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge", 
    endpoint_name = f'huggingface-wav2vec2-endpoint-{id}'
)

When you create a predictor using the model.deploy function, you can change the instance count and instance type based on your performance requirements.

Run inference on audio files

After you deploy the endpoint, you can run prediction tests to check the model performance. You can download an audio file from the S3 bucket using the following code:

import boto3
s3 = boto3.client('s3')
s3.download_file(BUCKET, 'huggingface-blog/sample_audio/xxx.wav', 'downloaded.wav')
file_name="downloaded.wav"

Alternatively, you can download a sample audio file to run the inference request:

import soundfile
!wget https://datashare.ed.ac.uk/bitstream/handle/10283/343/MKH800_19_0001.wav
file_name="MKH800_19_0001.wav"
speech_array, sampling_rate = soundfile.read(file_name)
json_request_data = {"speech_array": speech_array.tolist(),
                     "sampling_rate": sampling_rate}

prediction = predictor.predict(json_request_data)
print(prediction)

The predicted result is as follows:

['"she had your dark suit in grecy wash water all year"', 'application/json']

Clean up

When you're finished using the solution, delete the SageMaker endpoint to avoid ongoing charges:

predictor.delete_endpoint()

Conclusion

In this post, we showed how to fine-tune the pre-trained Wav2Vec2 model on SageMaker using a Hugging Face estimator, and also how to host the model on SageMaker as a real-time inference endpoint using the SageMaker Hugging Face Inference Toolkit. For both the training and inference steps, we provided custom-defined scripts for greater flexibility, which are enabled and supported by the SageMaker Hugging Face DLCs. You can use the method from this post to fine-tune a Wav2Vec2 model with your own datasets, or to fine-tune and deploy a different transformer model from Hugging Face.

Check out the notebook and code of this project on GitHub, and let us know your comments. For more comprehensive information, see Hugging Face on SageMaker and Use Hugging Face with Amazon SageMaker.

In addition, Hugging Face and AWS announced a partnership in 2022 that makes it even easier to train Hugging Face models on SageMaker. This functionality is available through the development of Hugging Face AWS DLCs. These containers include the Hugging Face Transformers, Tokenizers, and Datasets libraries, which allow us to use these resources for training and inference jobs. For a list of the available DLC images, see Available Deep Learning Containers Images. They are maintained and regularly updated with security patches. You can find many examples of how to train Hugging Face models with these DLCs and the Hugging Face Python SDK in the following GitHub repo.


About the Author

Ying Hou, PhD, is a Machine Learning Prototyping Architect at AWS. Her main areas of interest are deep learning, computer vision, NLP, and time series data prediction. In her spare time, she enjoys reading novels and hiking in national parks in the UK.

Source link
