Videos: 293
Views: 4,558,181

Big Data Interview | Mock | Problem Solving | Technical Round | Pyspark , SQL #interview #question

To enhance your career as a Cloud Data Engineer, check trendytech.in/?src=youtube&sub=mockdec for curated courses developed by me.
I have trained over 20,000 professionals in the field of Data Engineering in the last 5 years.
BIG DATA INTERVIEW SERIES
This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.
Our highly experienced guest interviewer, Harsh Patil (www.linkedin.com/in/imharsh044/), shares invaluable insights and practical guidance drawn from his extensive expertise in the Big Data domain.
Our expert guest interviewee, Himanshu Mishra, www.linkedin.com/in/himanshu-mishra-4796014b/ has an intere...

Videos

Big Data Engineer Mock Interview | Questions on Data Skewness | Salting | Out of Memory Error

24:12

6K views, 1 month ago


Big Data Engineer Live Mock Interview | Topics: Pyspark, Delta Lake, Data Profiling, Data Governance

45:45

6K views, 1 month ago

Top Big Data Interview Questions asked in 2024 | Cloud Data Engineer | Azure | Spark | SQL#interview

51:18

9K views, 1 month ago

Azure Cloud Data Engineer Interview | Real-time Scenario based Questions & Expert Feedback | BigData

34:56

3.3K views, 1 month ago

Azure Cloud Data Engineer Mock Interview | Important Questions asked in Big Data Interviews| Pyspark

29:08

3.7K views, 1 month ago

Live Must Watch Mock Interview for Experienced Big Data Engineers | Debug Production Issues | CICD

33:46

1.8K views, 1 month ago

Must Watch Live Mock Interview for Aspiring Big Data Engineers | PySpark, Hive & SQL #interview

34:08

4.2K views, 1 month ago

Important Data Structures and Algorithms Mock Interview for Data Engineers Spark | SQL #interview

31:05

3.5K views, 1 month ago

Must Watch Live Mock Interview for Aspiring Azure Data Engineers | Azure Data Pipeline, PySpark, SQL

42:41

1.1K views, 1 month ago

Real-time Data Modeling & System Design Mock Interview for Data Engineers #interview #important

58:03

3.3K views, 1 month ago

Must Watch Live Mock Interview For Data Engineers | System Design | Data Modeling #interview

59:41

6K views, 1 month ago

Everything that you need to know about the Big Data Interviews | Pyspark | Databricks | Azure | AWS

42:33

3K views, 2 months ago

PySpark Mock Interview | Basic Coding Round | Python | SQL | Join Optimizations #interview #bigdata

35:41

3K views, 2 months ago

Live Big Data Project Interview Questions | Project Architecture | Data Pipeline #interview

11:10

2.4K views, 2 months ago

Top Cloud Data Engineer Interview Questions | Azure | AWS | #interview #bigdata #question

6:14

1.5K views, 2 months ago

Big Data Mock Interview | DSA | APACHE SPARK | PYTHON | DATABRICKS #interview

32:44

4.5K views, 2 months ago

Live Apache Spark Mock Interview | Spark | SQL | Databricks | Project based #interview #question

44:35

3.9K views, 2 months ago

Top 15 Spark Interview Questions in less than 15 minutes Part-2 #bigdata #pyspark #interview

12:46

11K views, 2 months ago

15 Data Engineering Interview Questions in less than 15 minutes Part-1 #bigdata #interview

12:44

9K views, 2 months ago

Live Databricks Developer Mock Interview | Azure | Spark | SQL #coding #interview

47:01

5K views, 2 months ago

Live Big Data Mock Interview | WALMART | Technical Round | Spark, SQL, Python #interview

35:32

6K views, 2 months ago

Live Big Data Mock Interview | Techno-Managerial Round | Scenario Based Questions #interview

37:49

2.9K views, 2 months ago

Live Big Data Mock Interview | Technical Round 2 : PySpark | Slowly Changing Dimensions | Data Skew

30:26

3.7K views, 2 months ago

Apache Spark 1st Technical Round Live Interview | Spark Optimization Coding #interview #question

31:26

5K views, 2 months ago

Big Data Mock Interview for Freshers | Data Engineer Screening Round | Spark, Python, SQL #interview

59:01

1.9K views, 2 months ago

Live System Design Mock Interview Round for Data Engineers | Project & Scenario Based #interview

52:18

3K views, 2 months ago

Apache Spark 1st Technical Round Live Interview for Experienced Candidates | Azure | SQL #interview

41:58

4.6K views, 2 months ago

Live Azure Data Engineering Mock Interview | Technical Round | Out of Memory Issue | SQL #interview

44:42

3.4K views, 2 months ago

Live AWS Mock Interview for Data Engineers | AWS Techno Managerial Round | Scenario-based | Project

51:29

2.9K views, 2 months ago

@pranyajain8526 7 hours ago
This is amazing, very well explained Sumit Sir! Do you give career advice also?
@hdr-tech4350 15 hours ago
Spark Core: RDD (flexible). High-level APIs: DataFrame and Spark SQL (easy to write queries). Transformations and actions. Spark submit process. Deployment modes. Types of transformations. Repartition and coalesce. Methods for schema enforcement: DDL, struct. Consecutive wins in SQL.
@hdr-tech4350 15 hours ago
Java is used in Hadoop. You are bound to work with MapReduce, which can only handle batch processing, not real time.
@hdr-tech4350 21 hours ago
Data lake vs Delta Lake. Unity Catalog. Data profiling. Data governance.
@hdr-tech4350 21 hours ago
Predicate pushdown. Which optimizations are used in Spark. Transformations used. groupBy and reduceBy. Faced an OOM error? Salting. Data skewness. Data spill. Cache and persist. LRU. Repartition vs coalesce. rank, dense_rank, row_number. What happens when you submit a Spark job.
@talknow2859 1 day ago
I definitely learnt something. Thanks.🎉
@gurupradeep9648 1 day ago
First one: SELECT name FROM salary_table AS a WHERE salary > (SELECT salary FROM salary_table WHERE id = a.managerid)
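The correlated subquery in the comment above can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table name `salary_table` and the columns `name`, `salary`, `id` and `managerid` come from the comment, while the column types and the sample rows are made up for illustration.

```python
import sqlite3

def employees_earning_more_than_manager(rows):
    """Return names of employees whose salary exceeds their manager's salary."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE salary_table "
        "(id INTEGER, name TEXT, salary INTEGER, managerid INTEGER)"
    )
    conn.executemany("INSERT INTO salary_table VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT name
        FROM salary_table AS a
        WHERE salary > (SELECT salary FROM salary_table WHERE id = a.managerid)
        ORDER BY name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names

# Hypothetical sample data: (id, name, salary, managerid)
sample_rows = [
    (1, "Asha", 90000, None),   # no manager, so the subquery yields NULL
    (2, "Bilal", 95000, 1),     # earns more than Asha
    (3, "Chen", 60000, 1),
    (4, "Divya", 70000, 3),     # earns more than Chen
]
print(employees_earning_more_than_manager(sample_rows))
```

An employee without a manager drops out automatically, because comparing a salary with NULL is never true.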
@MayankSharma-cp6yu 2 days ago
What is the answer to the last question - 'When we align transformations and action in a DAG, what is that graph called?'
@abhishekn786 3 days ago
Dear Sir, hope you are doing well. When can we expect the next video on Python? It's been more than 3 months now since you posted a follow-up video on Python. I guess everyone is eagerly waiting. Please post it asap. Thanks
@ANIRUDH6315 3 days ago
Sir, please upload more videos. We are waiting for the next one.
@janardhanreddy3267 3 days ago
The interview series is good, please upload the remaining 10 questions, eagerly waiting sir
@user-oi5pw9ly7r 3 days ago
Master and slave architecture: the driver node acts as the master node, with multiple worker nodes. When a Spark job is submitted, the Spark context on the driver, which is the entry point to the Spark cluster, creates the jobs and the execution plan. The driver requests resources from the resource manager, which sends them back to the driver. The driver also iterates over the jobs.
@VenkatBala-jv2yh 4 days ago
Great questions....
@Sudeep-ow4pe 4 days ago
The interview series is really helpful, Thank you
@rishiraj2014 5 days ago
One correction here at 9:59: in a narrow transformation there is no shuffling, and in a wide transformation there is shuffling of data.
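The distinction in the comment above can be illustrated without a cluster. The sketch below is plain Python that imitates partitions as lists; it is not Spark's API, and the function names `narrow_map` and `wide_group_by_key` are invented for this illustration. It assumes integer keys so that the routing rule (`key % num_out`) is deterministic.

```python
from collections import defaultdict

def narrow_map(partitions, fn):
    """Narrow transformation: each output partition is computed from exactly
    one input partition, so no record moves between partitions."""
    return [[fn(record) for record in part] for part in partitions]

def wide_group_by_key(partitions, num_out):
    """Wide transformation: all records sharing a key must end up together,
    so each record is routed (shuffled) to a partition chosen by its key."""
    out = [defaultdict(list) for _ in range(num_out)]
    moved = 0
    for src, part in enumerate(partitions):
        for key, value in part:
            dst = key % num_out          # hypothetical partitioning rule
            if dst != src:
                moved += 1               # this record crosses partitions
            out[dst][key].append(value)
    return [dict(p) for p in out], moved

parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]
doubled = narrow_map(parts, lambda kv: (kv[0], kv[1] * 2))
grouped, moved = wide_group_by_key(parts, num_out=2)
print(doubled)
print(grouped, "records moved:", moved)
```

The map step leaves every record in its original partition, while grouping by key has to move at least one record so that both values for key 1 sit in the same place.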
@sashikiran9 5 days ago
Good content!
@TanyaSingh-yb5hl 5 days ago
Please share the link to these 175 LeetCode questions. It would be very helpful.
@niridha23 5 days ago
Thanks for conducting these mock interviews Sumit sir. It is really helpful😊
@Nnirvana 5 days ago
Parquet is a columnar file format that stores metadata along with the original data, e.g. the MIN and MAX values of the different columns in that file. During a read operation it checks the metadata and avoids scanning the parts of the file that are irrelevant. Also, when written from Spark it uses Snappy compression by default, which saves a good amount of storage space.
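The min/max idea in the comment above can be sketched in plain Python. This is an illustration of the principle only, not the Parquet library or its on-disk layout: the helpers `build_row_groups` and `scan_greater_than` are hypothetical, and a "row group" here is simply a slice of a list with its statistics attached.

```python
def build_row_groups(values, group_size):
    """Split a column into row groups and record min/max for each one,
    mimicking the per-chunk statistics a columnar file keeps as metadata."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "rows": chunk})
    return groups

def scan_greater_than(groups, threshold):
    """Return the matching values and the number of row groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        if group["max"] <= threshold:
            continue  # statistics prove nothing can match, so skip the chunk
        groups_read += 1
        matches.extend(v for v in group["rows"] if v > threshold)
    return matches, groups_read

column = [3, 7, 5, 12, 15, 11, 40, 42, 38]
groups = build_row_groups(column, group_size=3)
matches, groups_read = scan_greater_than(groups, threshold=14)
print(matches, "row groups read:", groups_read, "of", len(groups))
```

With a filter of `value > 14`, the first chunk (maximum 7) is never read, which is the saving the comment describes.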
@SrikarPalivela 6 days ago
🎯 Key points for quick navigation:
00:24 *🎓 Anur's background and expertise*
- Anur has 9 years of industry experience, with over 7 years in Big Data.
- Anur's primary skills revolve around Spark, AWS, cloud, Databricks, Kafka, and Airflow.
- Anur is currently working as an assistant manager at KPMG Global Services.
02:41 *🛠️ Ram's project overview (Insurance & Green Energy domains)*
- Ram's previous project involved data ingestion, cleaning, and transformation from raw to gold layer tables in a medallion architecture.
- Ram's current project in the green energy domain includes moving data between systems, managing the data warehouse, and creating a data strategy for the company.
- Ram uses a variety of sources for data ingestion, from APIs to third-party sources dumped in ADLS Gen2.
05:29 *🔐 Data Security Framework Implementation*
- Ram implemented an entitlement framework in Databricks for row-level and column-level masking.
- Encryption methods were used, with keys stored in Azure Key Vault linked to Databricks using DBUtils secret scope.
- Utilized Databricks Delta Lake for file format and ensured data security across various domains within the project.
11:44 *🔄 Resolving Pipeline Latency Bottlenecks*
- Ram faced a bottleneck due to processing slowdown with accumulated data in a Databricks pipeline.
- Resolved by optimizing cluster configurations, increasing CPU cores, and adjusting shuffle partitions for better performance.
- Leveraged properties like cost-based optimization, join reordering, and adaptive query execution in Databricks for further pipeline optimization.
16:52 *📊 Real-time Dashboard Pipeline Implementation*
- Ram's approach for a real-time dashboard involves capturing data from Oracle in a live stream, storing it in Azure ADLS Gen2, and aggregating it for the dashboard.
- Explained the use of the CDC feature, ADF integration runtime, and operations in Databricks for updating and reflecting real-time data changes on the dashboard.
- Implemented triggers in Azure Data Factory, focusing on storage-based triggers to capture file arrivals for immediate processing.
21:29 *📊 Initial Spark optimization explanation*
- Transformations in Spark are lazy and added to the directed acyclic graph (DAG) for evaluation when data is requested.
- Spark applies lazy evaluation, leading to optimized action execution for processing only necessary data.
22:25 *🖥️ Cluster configuration and executor numbers*
- Calculating ideal executor numbers and memory allocation based on data size and CPU cores.
- Starting with a baseline of the number of executor cores for optimal cluster performance.
24:58 *💻 Approaching a Spark SQL problem*
- Creating a cumulative revenue calculation from a DataFrame in Spark SQL.
- Utilizing window functions and data manipulation to achieve the desired output.
Made with HARPA AI
@naveenkumarsingh3829 6 days ago
Bro, he is cheating on the reverse-list question. He is looking at notes on the side while writing the answer. Wow.
@rabeeahmohammedyaqeen3956 7 days ago
Are you doing this on the command prompt, or what?
@singhjirajeev 8 days ago
Insert INTO emp2 Select * from emp1;
@bharanidharanm2653 9 days ago
The 3rd scenario is not clear. Are we updating any configuration setting to avoid the small-files problem?
@RohitSharma-ug8rv 10 days ago
What is cardinality
@krisharjunakinjarapu3071 7 days ago
Cardinality tells you the number of distinct values in a column relative to the number of rows.
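To make the definition in the reply above concrete, here is a small sketch in plain Python. The `cardinality` helper and the sample `orders` rows are made up for illustration; in SQL the same number comes from `COUNT(DISTINCT column)`.

```python
def cardinality(rows, column):
    """Number of distinct non-null values found in `column`."""
    return len({row[column] for row in rows if row[column] is not None})

# Hypothetical sample rows
orders = [
    {"order_id": 1, "country": "IN"},
    {"order_id": 2, "country": "US"},
    {"order_id": 3, "country": "IN"},
    {"order_id": 4, "country": "IN"},
]

# order_id is unique per row (high cardinality); country repeats (low cardinality)
print(cardinality(orders, "order_id"), "distinct order ids in", len(orders), "rows")
print(cardinality(orders, "country"), "distinct countries in", len(orders), "rows")
```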
@souravdas-kt7gg 10 days ago
with c as(select e.*,m.salary as manager_salary,m.name as manager_name from employees_prac e left join employees_prac m on e.managerid=m.id where e.salary-m.salary>0) select name from c; My solution
@talknow2859 10 days ago
Very helpful 🎉🎉🎉
@jhonsen9842 10 days ago
LoL, why is this question relevant?
@priyatamnayak2208 11 days ago
Dear sir, I am facing an issue when running mysql-ctl cli; it shows command not found... Kindly help
@vanshagarwal1355 6 days ago
same issue
@AnandPatil-eu1tl 12 days ago
Thank you sir, these videos are very helpful
@shobhittiwari2014 13 days ago
Sir, when is the next video coming?
@YashKarambalkar-og3sy 14 days ago
Best video for starters <3
@user-bj3mh3nm1n 14 days ago
Will u teach urself or someone else?
@AniketPatil-yr1iw 14 days ago
Hi Sumit sir. For this 5th video, the video description is missing the topic list. Can you please add it?
@mayanksatija684 14 days ago
As per me, we can do the second question with below : with t1 as ( select customer_number,count(*) as count from orders group by customer_number) select t1.customer_number from t1 where t1.count = (select max(count) from t1)
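The approach in the comment above (count orders per customer in a CTE, then keep the customers whose count equals the maximum) can be run against an in-memory SQLite database. This is a sketch under assumptions: the `orders` table and `customer_number` column come from the comment, the `order_number` column and the sample rows are hypothetical, and the alias is renamed from `count` to `order_count` so it cannot be confused with the `COUNT` function.

```python
import sqlite3

def top_customers(order_rows):
    """Return the customer_number(s) with the largest number of orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_number INTEGER, customer_number INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", order_rows)
    query = """
        WITH t1 AS (
            SELECT customer_number, COUNT(*) AS order_count
            FROM orders
            GROUP BY customer_number
        )
        SELECT customer_number
        FROM t1
        WHERE order_count = (SELECT MAX(order_count) FROM t1)
        ORDER BY customer_number
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result

# Hypothetical sample data: (order_number, customer_number)
sample_orders = [(1, 10), (2, 20), (3, 20), (4, 30), (5, 30)]
print(top_customers(sample_orders))
```

Comparing against `MAX(order_count)` rather than sorting and taking one row means tied customers are all returned.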
@kmthailu2262 14 days ago
Thank you! This is very helpful
@mahavirsinghrajpurohit8004 15 days ago
ORDER BY and DISTINCT will work together if the ORDER BY column is also included in the SELECT list.
@ravulapallivenkatagurnadha9605 15 days ago
Nice videos
@siddharthbarthwal630 16 days ago
Very nice way to teach. Thank you, Sir.
@AnandKumar-wq3vo 16 days ago
with cte as ( select *, row_number() over (order by start_time) as rownum, DATEADD(MINUTE,-1* row_number() over (order by start_time) ,start_time) as updated_time from service_status where status = 'down' ) select service_name,min(start_time) as start_updated_time,max(start_time) as end_updated_time, status from cte group by service_name,updated_time,status having count(*)>3
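The query above relies on the "gaps and islands" trick: subtracting the row number (in minutes) from `start_time` gives the same anchor value for every row in a run of consecutive minutes. The sketch below restates that idea in plain Python rather than SQL Server syntax. It assumes one row per service per minute with timestamps on whole minutes, numbers the rows per service (the comment's query numbers them across all services), and uses a hypothetical helper name `long_outages`; `min_rows=4` mirrors `HAVING COUNT(*) > 3`.

```python
from datetime import datetime, timedelta
from itertools import groupby

def long_outages(events, min_rows=4):
    """events: iterable of (service_name, start_time, status) tuples.
    Returns (service_name, first_down, last_down) for each run of at least
    `min_rows` consecutive 'down' minutes."""
    down = sorted((svc, ts) for svc, ts, status in events if status == "down")
    outages = []
    for service, rows in groupby(down, key=lambda r: r[0]):
        times = [ts for _, ts in rows]
        # time minus row index is constant within a run of consecutive minutes
        keyed = [(ts - timedelta(minutes=i), ts) for i, ts in enumerate(times)]
        for _, run in groupby(keyed, key=lambda r: r[0]):
            run_times = [ts for _, ts in run]
            if len(run_times) >= min_rows:
                outages.append((service, run_times[0], run_times[-1]))
    return outages

# Hypothetical sample data
base = datetime(2024, 1, 1, 10, 0)
minute = timedelta(minutes=1)
events = (
    [("api", base + i * minute, "down") for i in range(4)]      # 10:00-10:03
    + [("api", base + 4 * minute, "up")]
    + [("api", base + i * minute, "down") for i in (5, 6)]      # too short
    + [("db", base + i * minute, "down") for i in (0, 1)]       # too short
)
print(long_outages(events))
```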
@bharathKumar-or6gd 16 days ago
Clear and Great Explanation on Where and Having Clause 👌👌👌👌
@rahulpandit9082 17 days ago
Both of them are like that... The interviewer is falling asleep, and the candidate wants to take advantage of it😂

Sumit Mittal

Videos

Comments