Required Skills
Python
Pandas
NumPy
SQL
NoSQL
Position Summary
We are seeking a specialized Data Engineer or Data Scientist to manage the complete lifecycle of the training data that powers our AI models. This role is pivotal in curating, sanitizing, and structuring high-quality speech and text datasets, which serve as the foundation for training state-of-the-art Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Machine Translation (MT) systems.
Role and Responsibilities
Data Pipeline Architecture: Design, build, and maintain robust pipelines for the ingestion, processing, and management of heterogeneous data sources, ensuring efficient flow from raw collection to model-ready inputs.
Unstructured Data Extraction: Extract and process high-fidelity speech data from complex, unstructured sources, including video feeds, multi-channel audio recordings, and raw text archives.
Corpus Curation & Management: Organize, structure, and analyze complex linguistic datasets, including speech-to-text alignments and parallel translation corpora, ensuring metadata accuracy and consistency.
Data Cleaning & Noise Reduction: Implement rigorous quality control protocols to identify and correct errors, remove artifacts, and apply noise reduction techniques to enhance audio clarity.
Dataset Enhancement Strategies: Develop and execute strategies to improve data quantity and diversity, including the application of data augmentation techniques and synthetic data generation.
Cross-Functional Collaboration: Partner closely with Machine Learning Engineers to align data preprocessing workflows and formatting with the specific requirements of various model architectures.
Skills and Qualifications
Programming Proficiency: Advanced proficiency in Python and core data manipulation libraries (e.g., Pandas, NumPy), with the ability to write clean, efficient, and scalable code.
Audio & Data Tooling: Hands-on experience with audio processing and analysis tools (e.g., librosa, torchaudio, Praat) and database management systems (SQL/NoSQL).
ML & NLP Fundamentals: Solid understanding of Machine Learning principles and the specific preprocessing and tokenization requirements for Natural Language Processing (NLP) and speech tasks.
Data Quality Expertise: Proven track record in handling large-scale, messy, or unstructured datasets, with a strong focus on data validation, cleaning, and sanitization techniques.
- Please visit Samsung membership to see the Privacy Policy, which defaults according to your location, at: https://account.samsung.com/membership/policy/privacy. You can change the Country/Language at the bottom of the page. If you are a resident of the European Economic Area, please click here: https://europe-samsung.com/ghrp/PrivacyNoticeforEU.html
