AI EXPRESS

Build and train ML models using a data mesh architecture on AWS: Part 2

July 31, 2022, in Machine Learning

This is the second part of a series that showcases the machine learning (ML) lifecycle with a data mesh design pattern for a large enterprise with multiple lines of business (LoBs) and a Center of Excellence (CoE) for analytics and ML.

In part 1, we addressed the data steward persona and showcased a data mesh setup with multiple AWS data producer and consumer accounts. For an overview of the business context and the steps to set up a data mesh with AWS Lake Formation and register a data product, refer to part 1.

In this post, we focus on the analytics and ML platform team as a consumer in the data mesh. The platform team sets up the ML environment for the data scientists and helps them get access to the necessary data products in the data mesh. The data scientists in this team use Amazon SageMaker to build and train a credit risk prediction model using the shared credit risk data product from the consumer banking LoB.

The code for this example is available on GitHub.

Analytics and ML consumer in a data mesh architecture

Let's recap the high-level architecture that highlights the key components in the data mesh architecture.

In the data producer block 1 (left), there is a data processing stage to ensure that shared data is well-qualified and curated. The central data governance block 2 (center) acts as a centralized data catalog with metadata of various registered data products. The data consumer block 3 (right) requests access to datasets from the central catalog and queries and processes the data to build and train ML models.

With SageMaker, data scientists and developers in the ML CoE can quickly and easily build and train ML models, and then directly deploy them into a production-ready hosted environment. SageMaker provides easy access to your data sources for exploration and analysis, and also provides common ML algorithms and frameworks that are optimized to run efficiently against extremely large data in a distributed environment. It's easy to get started with Amazon SageMaker Studio, a web-based integrated development environment (IDE), by completing the SageMaker domain onboarding process. For more information, refer to the Amazon SageMaker Developer Guide.

Data product consumption by the analytics and ML CoE

The following architecture diagram describes the steps required by the analytics and ML CoE consumer to get access to the registered data product in the central data catalog and process the data to build and train an ML model.

The workflow consists of the following components:

  1. The producer data steward provides access in the central account to the database and table to the consumer account. The database is now reflected as a shared database in the consumer account.
  2. The consumer admin creates a resource link in the consumer account to the database shared by the central account. The following screenshot shows an example in the consumer account, with rl_credit-card being the resource link of the credit-card database.

  3. The consumer admin gives the Studio AWS Identity and Access Management (IAM) execution role access to the resource-linked database and the table identified in the Lake Formation tag. In the following example, the consumer admin grants the SageMaker execution role permission to access rl_credit-card and the table satisfying the Lake Formation tag expression.
  4. Once assigned an execution role, data scientists in SageMaker can use Amazon Athena to query the table via the resource link database in Lake Formation.
    1. For data exploration, they can use Studio notebooks to process the data with interactive querying via Athena.
    2. For data processing and feature engineering, they can run SageMaker processing jobs with an Athena data source and output results back to Amazon Simple Storage Service (Amazon S3).
    3. After the data is processed and available in Amazon S3 on the ML CoE account, data scientists can use SageMaker training jobs to train models and SageMaker Pipelines to automate model-building workflows.
    4. Data scientists can also use the SageMaker model registry to register the models.
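
Steps 2 and 3 of this workflow can also be scripted instead of done in the console. The following is a minimal sketch with boto3, not the setup used by the authors. The account ID, role ARN, and Lake Formation tag key and values are hypothetical placeholders, and whether the tag policy should point at the central catalog depends on how the tags were shared. The request shapes follow the AWS Glue `create_database` and Lake Formation `grant_permissions` APIs.

```python
"""Sketch: create a resource link and grant the Studio execution role access.

Account IDs, role ARNs, and LF-tag keys/values passed to these helpers are
hypothetical placeholders, not values from the original post.
"""


def resource_link_input(link_name, central_account_id, shared_db_name):
    """Build the DatabaseInput for a resource link to a shared database."""
    return {
        "Name": link_name,
        "TargetDatabase": {
            "CatalogId": central_account_id,
            "DatabaseName": shared_db_name,
        },
    }


def database_grant(role_arn, link_name):
    """Build a grant that lets the role describe the resource link."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": link_name}},
        "Permissions": ["DESCRIBE"],
    }


def tag_grant(role_arn, central_account_id, tag_key, tag_values):
    """Build a grant on every table matching a Lake Formation tag expression."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "LFTagPolicy": {
                "CatalogId": central_account_id,
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["SELECT", "DESCRIBE"],
    }


def apply_grants(link, grants):
    """Send the requests to AWS; needs boto3 and consumer admin credentials."""
    import boto3  # imported lazily so the builders above run without AWS

    boto3.client("glue").create_database(DatabaseInput=link)
    lakeformation = boto3.client("lakeformation")
    for grant in grants:
        lakeformation.grant_permissions(**grant)
```

For the example in this post, the link would be built with `resource_link_input("rl_credit-card", <central account ID>, "credit-card")`. Keeping the request builders separate from `apply_grants` lets you review the payloads before anything is sent to AWS.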

Data exploration

The following diagram illustrates the data exploration workflow in the data consumer account.

The consumer starts by querying a sample of the data from the credit_risk table with Athena in a Studio notebook. When querying data via Athena, the intermediate results are also saved in Amazon S3. You can use the AWS Data Wrangler library to run a query on Athena in a Studio notebook for data exploration. The following code example shows how to query Athena to fetch the results as a dataframe for data exploration:

import awswrangler as wr
df = wr.athena.read_sql_query('SELECT * FROM credit_card LIMIT 10;', database="rl_credit-card", ctas_approach=False)

Now that you have a subset of the data as a dataframe, you can start exploring the data and see what feature engineering updates are needed for model training. An example of data exploration is shown in the following screenshot.

When you query the database, you can see the access logs from the Lake Formation console, as shown in the following screenshot. These logs give you information about who or which service has used Lake Formation, including the IAM role and time of access. The screenshot shows a log about SageMaker accessing the table credit_risk in AWS Glue via Athena. In the log, you can see the additional audit context that contains the query ID that matches the query ID in Athena.

The following screenshot shows the Athena query run ID that matches the query ID from the preceding log. This shows the data accessed with the SQL query. You can see what data has been queried by navigating to the Athena console, choosing the Recent queries tab, and then looking for the run ID that matches the query ID from the additional audit context.
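
The same lookup can be done programmatically rather than in the console. The following is a small sketch, assuming you have copied the query ID from the additional audit context. It uses the Athena `get_query_execution` API through boto3; the helper names are illustrative and not part of the original solution.

```python
def summarize_query_execution(response):
    """Extract audit-relevant fields from an Athena GetQueryExecution response."""
    execution = response["QueryExecution"]
    return {
        "query_id": execution["QueryExecutionId"],
        "sql": execution["Query"],
        "database": execution.get("QueryExecutionContext", {}).get("Database"),
        "state": execution["Status"]["State"],
    }


def fetch_query_summary(query_id):
    """Look up a query by the ID found in the Lake Formation audit context."""
    import boto3  # lazy import: only needed when actually calling AWS

    response = boto3.client("athena").get_query_execution(QueryExecutionId=query_id)
    return summarize_query_execution(response)
```

Calling `fetch_query_summary` with the ID from the log returns the SQL text and target database, which you can compare against the access recorded by Lake Formation.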

Data processing

After data exploration, you may want to preprocess the entire large dataset for feature engineering before training a model. The following diagram illustrates the data processing procedure.

In this example, we use a SageMaker processing job, in which we define an Athena dataset definition. The processing job queries the data via Athena and uses a script to split the data into training, testing, and validation datasets. The results of the processing job are saved to Amazon S3. To learn how to configure a processing job with Athena, refer to Use Amazon Athena in a processing job with Amazon SageMaker.

In this example, you can use the Python SDK to trigger a processing job with the Scikit-learn framework. Before triggering, you can configure the inputs parameter to get the input data via the Athena dataset definition, as shown in the following code. The dataset contains the location to download the results from Athena to the processing container and the configuration for the SQL query. When the processing job is finished, the results are saved in Amazon S3.

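from sagemaker.dataset_definition.inputs import AthenaDatasetDefinition, DatasetDefinition

# Query the shared table through the resource link and write the results as Parquet
AthenaDataset = AthenaDatasetDefinition(
  catalog = 'AwsDataCatalog',
  database = "rl_credit-card",
  query_string = 'SELECT * FROM "rl_credit-card"."credit_card"',
  output_s3_uri = 's3://sagemaker-us-east-1-********7363/athenaqueries/',
  work_group = 'primary',
  output_format = "PARQUET")

# Location inside the processing container where the query results are downloaded
dataSet = DatasetDefinition(
  athena_dataset_definition = AthenaDataset,
  local_path = "/opt/ml/processing/input/dataset.parquet")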


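from sagemaker.processing import ProcessingInput, ProcessingOutput

# sklearn_processor and the *_path variables are assumed to be defined earlier in the notebook
sklearn_processor.run(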
    code="processing/preprocessor.py",
    inputs=[ProcessingInput(
      input_name="dataset", 
      destination="/opt/ml/processing/input", 
      dataset_definition=dataSet)],
    outputs=[
        ProcessingOutput(
            output_name="train_data", source="/opt/ml/processing/train", destination=train_data_path
        ),
        ProcessingOutput(
            output_name="val_data", source="/opt/ml/processing/val", destination=val_data_path
        ),
        ProcessingOutput(
            output_name="model", source="/opt/ml/processing/model", destination=model_path
        ),
        ProcessingOutput(
            output_name="test_data", source="/opt/ml/processing/test", destination=test_data_path
        ),
    ],
    arguments=["--train-test-split-ratio", "0.2"],
    logs=False,
)
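
The preprocessor.py script itself is not shown in this post. As a rough illustration of the split it performs, the following sketch parses the --train-test-split-ratio argument passed in the preceding call and divides rows into training, validation, and test sets. The function names, the fixed seed, and the rule that the same ratio is held out for both validation and test are assumptions, not the repository's actual logic.

```python
import argparse
import random


def parse_args(argv=None):
    """Read the split ratio passed through the processing job arguments."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train-test-split-ratio", type=float, default=0.2)
    return parser.parse_args(argv)


def split_rows(rows, ratio, seed=42):
    """Shuffle rows and return (train, val, test).

    Assumption: `ratio` is the fraction held out for the test set, and the
    same fraction again for validation; the remainder is training data.
    """
    if not 0 < ratio < 0.5:
        raise ValueError("ratio must be between 0 and 0.5")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible across job runs.
    random.Random(seed).shuffle(shuffled)
    holdout = int(len(shuffled) * ratio)
    test = shuffled[:holdout]
    val = shuffled[holdout:2 * holdout]
    train = shuffled[2 * holdout:]
    return train, val, test
```

In a real processing script, each of the three lists would then be written under the matching /opt/ml/processing/ output directory so that SageMaker uploads it to Amazon S3.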

Model training and model registration

After preprocessing the data, you can train the model with the preprocessed data saved in Amazon S3. The following diagram illustrates the model training and registration process.

For data exploration and SageMaker processing jobs, you can retrieve the data in the data mesh via Athena. Although the SageMaker Training API doesn't include a parameter to configure an Athena data source, you can query data via Athena in the training script itself.

In this example, the preprocessed data is now available in Amazon S3 and can be used directly to train an XGBoost model with SageMaker Script Mode. You can provide the script, hyperparameters, instance type, and all the additional parameters needed to successfully train the model. You can trigger the SageMaker estimator with the training and validation data in Amazon S3. When the model training is complete, you can register the model in the SageMaker model registry for experiment tracking and deployment to a production account.

from sagemaker.xgboost import XGBoost

estimator = XGBoost(
    entry_point=entry_point,
    source_dir=source_dir,
    output_path=output_path,
    code_location=code_location,
    hyperparameters=hyperparameters,
    instance_type="ml.c5.xlarge",
    instance_count=1,
    framework_version="0.90-2",
    py_version="py3",
    role=role,
)

# Channel names map to the training and validation data prepared by the processing job
inputs = {"train": train_input_data, "validation": val_input_data}

estimator.fit(inputs, job_name=job_name)
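
The registration step is not shown in the preceding code. One possible way to do it, sketched here under assumptions, is the `register` method that SageMaker Python SDK estimators expose after training. The model package group name, the content types, and the instance types are illustrative placeholders rather than values from the original solution.

```python
def registration_args(model_package_group_name):
    """Build keyword arguments for `estimator.register` (placeholder values)."""
    return {
        "model_package_group_name": model_package_group_name,
        "content_types": ["text/csv"],
        "response_types": ["text/csv"],
        "inference_instances": ["ml.m5.large"],
        "transform_instances": ["ml.m5.large"],
        # Keep new versions out of production until someone approves them.
        "approval_status": "PendingManualApproval",
    }


def register_model(estimator, model_package_group_name):
    """Register the trained model; `estimator` must have finished `fit`."""
    return estimator.register(**registration_args(model_package_group_name))
```

Registering with a pending approval status leaves the decision to promote a model version to a reviewer, which suits a setup where deployment happens in a separate production account.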

Next steps

You can make incremental updates to the solution to address requirements around data updates and model retraining, automatic deletion of intermediate data in Amazon S3, and integrating a feature store. We discuss each of these in more detail in the following sections.

Data updates and model retraining triggers

The following diagram illustrates the process to update the training data and trigger model retraining.

The process includes the following steps:

  1. The data producer updates the data product with either a new schema or additional data at a regular frequency.
  2. After the data product is re-registered in the central data catalog, this generates an Amazon CloudWatch event from Lake Formation.
  3. The CloudWatch event triggers an AWS Lambda function to synchronize the updated data product with the consumer account. You can use this trigger to reflect the data changes by doing the following:
    1. Rerun the AWS Glue crawler.
    2. Trigger model retraining if the data drifts beyond a given threshold.
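
The Lambda function in step 3 could look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' implementation: the crawler and pipeline names are placeholders, retraining is assumed to run as a SageMaker pipeline, and the drift score is assumed to arrive in the event detail, whereas a real system would compute it after the crawler finishes. The boto3 calls used are `start_crawler` (AWS Glue) and `start_pipeline_execution` (SageMaker).

```python
def should_retrain(drift_score, threshold):
    """Retrain only when measured drift exceeds the agreed threshold."""
    return drift_score > threshold


def sync_data_product(glue_client, sagemaker_client, crawler_name,
                      pipeline_name, drift_score, threshold=0.1):
    """Refresh the consumer catalog, then retrain if the data has drifted."""
    # Step 1: rerun the crawler so the consumer catalog sees the new data.
    glue_client.start_crawler(Name=crawler_name)
    # Step 2: start the model-building pipeline only on significant drift.
    retrain = should_retrain(drift_score, threshold)
    if retrain:
        sagemaker_client.start_pipeline_execution(PipelineName=pipeline_name)
    return {"crawler_started": True, "retraining_started": retrain}


def lambda_handler(event, context):
    """Lambda entry point for the CloudWatch event."""
    import boto3  # provided by the Lambda Python runtime

    # Hypothetical: the drift score is assumed to be carried in the event detail.
    drift_score = float(event.get("detail", {}).get("drift_score", 0.0))
    return sync_data_product(
        boto3.client("glue"),
        boto3.client("sagemaker"),
        crawler_name="credit-risk-crawler",    # placeholder name
        pipeline_name="credit-risk-training",  # placeholder name
        drift_score=drift_score,
    )
```

Passing the clients into `sync_data_product` keeps the decision logic testable with stub clients, without any AWS access.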

For more details about setting up a SageMaker MLOps deployment pipeline for drift detection, refer to the Amazon SageMaker Drift Detection GitHub repo.

Auto-deletion of intermediate data in Amazon S3

You can automatically delete intermediate data that is generated by Athena queries and stored in Amazon S3 in the consumer account at regular intervals with S3 object lifecycle rules. For more information, refer to Managing your storage lifecycle.
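
As a minimal sketch of such a rule, the following builds a lifecycle configuration that expires objects under the Athena output prefix and applies it with the boto3 `put_bucket_lifecycle_configuration` call. The rule ID and retention period are assumptions; the prefix should match the output location used for your Athena queries, such as the athenaqueries/ prefix from the processing example.

```python
def athena_results_lifecycle(prefix, days):
    """Build a lifecycle configuration that expires objects under `prefix`."""
    return {
        "Rules": [
            {
                "ID": "expire-athena-intermediate-results",  # placeholder ID
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(bucket, configuration):
    """Apply the configuration; needs boto3 and permissions on the bucket."""
    import boto3  # lazy import so the builder above runs without AWS

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=configuration
    )
```

Note that this call replaces the bucket's existing lifecycle configuration, so include any rules you want to keep in the same request.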

SageMaker Feature Store integration

SageMaker Feature Store is purpose-built for ML and can store, discover, and share curated features used in training and prediction workflows. A feature store can work as a centralized interface between different data producer teams and LoBs, enabling feature discoverability and reusability to multiple consumers. The feature store can act as an alternative to the central data catalog in the data mesh architecture described earlier. For more information about cross-account architecture patterns, refer to Enable feature reuse across accounts and teams using Amazon SageMaker Feature Store.

Conclusion

In this two-part series, we showcased how you can build and train ML models with a multi-account data mesh architecture on AWS. We described the requirements of a typical financial services organization with multiple LoBs and an ML CoE, and illustrated the solution architecture with Lake Formation and SageMaker. We used the example of a credit risk data product registered in Lake Formation by the consumer banking LoB and accessed by the ML CoE team to train a credit risk ML model with SageMaker.

Each data producer account defines data products that are curated by people who understand the data and its access control, use, and limitations. The data products and the application domains that consume them are interconnected to form the data mesh. The data mesh architecture allows the ML teams to discover and access these curated data products.

Lake Formation allows cross-account access to Data Catalog metadata and underlying data. You can use Lake Formation to create a multi-account data mesh architecture. SageMaker provides an ML platform with key capabilities around data management, data science experimentation, model training, model hosting, workflow automation, and CI/CD pipelines for productionization. You can set up one or more analytics and ML CoE environments to build and train models with data products registered across multiple accounts in a data mesh.

Try out the AWS CloudFormation templates and code from the example repository to get started.


About the authors

Karim Hammouda is a Specialist Solutions Architect for Analytics at AWS with a passion for data integration, data analysis, and BI. He works with AWS customers to design and build analytics solutions that contribute to their business growth. In his free time, he likes to watch TV documentaries and play video games with his son.

Hasan Poonawala is a Senior AI/ML Specialist Solutions Architect at AWS. He helps customers design and deploy machine learning applications in production on AWS. He has over 12 years of work experience as a data scientist, machine learning practitioner, and software developer. In his spare time, Hasan loves to explore nature and spend time with friends and family.

Benoit de Patoul is an AI/ML Specialist Solutions Architect at AWS. He helps customers by providing guidance and technical assistance to build solutions related to AI/ML using AWS. In his free time, he likes to play piano and spend time with friends.
