Swetha Cheruku
Hadoop Developer
Ph No: 734-928-2140
Email id: [email protected]
Summary
Technical Skills
Silver Spring Networks Inc. develops transformational products and solutions to help cities, utilities, and other
businesses build intelligent, more efficient networks. SSNI applications manage millions of devices for some of
the biggest utilities and Smart City operators in the world. These applications gather the data needed to generate
bills, control the equipment used for distribution of power, communicate with in-home devices to manage
demand and reduce blackouts and grid failures, and control and monitor Smart City devices remotely.
Responsibilities:
Responsible for building scalable distributed data solutions using Hadoop.
Analyzed large data sets to determine the optimal way to aggregate and report on them using
MapReduce programs (a minimal Java sketch follows this list).
Worked on Pig and HiveQL for processing and analyzing data generated by distributed IoT networks.
Created Hive queries for data sampling and analysis of the data generated by the CustomerIQ
application.
Handled importing data from various data sources, performed transformations using Hive and
MapReduce, loaded data into HDFS, and extracted data from HDFS to MySQL using Sqoop.
Exported the analyzed data to the relational databases using Sqoop for visualization and to generate
reports for the BI team.
Worked on storing/retrieving data in the SilverLink Data Platform.
Migrated various Hive UDFs and queries to Spark SQL for faster query execution as part of a POC
implementation (a sketch appears after the Environment line below).
Used Spark for parallel data processing and better performance.
Worked on data warehouse schema creation and management.
Worked on Oozie workflows to run multiple Hive and Pig jobs.
Balanced and tuned HDFS, Hive, MapReduce, and Oozie workflows.
Worked on installing operating system and Hadoop updates, patches, and version upgrades when
required.
Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
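A minimal sketch of the kind of aggregation MapReduce job referenced in the list above, assuming a hypothetical comma-separated input of device readings keyed by a device ID in the first column; the class name, field layout, and paths are illustrative, not the project's actual code:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DeviceReadingAggregator {

    // Emits (deviceId, 1) for every input record; the record layout is assumed.
    public static class ReadingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text deviceId = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length > 0) {
                deviceId.set(fields[0]); // first column assumed to be the device ID
                context.write(deviceId, ONE);
            }
        }
    }

    // Sums the record counts per device ID.
    public static class ReadingReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long total = 0;
            for (LongWritable v : values) {
                total += v.get();
            }
            context.write(key, new LongWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "device-reading-aggregation");
        job.setJarByClass(DeviceReadingAggregator.class);
        job.setMapperClass(ReadingMapper.class);
        job.setCombinerClass(ReadingReducer.class); // combiner reuses the reducer to cut shuffle volume
        job.setReducerClass(ReadingReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```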
Environment: CDH 5.7.1, CDH 5.6.1, CentOS 7, RHEL 7, Ganglia, Hadoop, Hive, Oozie, Pig, Java, HDFS,
MapReduce, Spark, Sqoop
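For the Hive-to-Spark SQL migration noted above, a minimal sketch assuming a Spark 2.x SparkSession with Hive support (on the CDH 5.x-era Spark 1.6 the equivalent entry point would be HiveContext); the table and column names are illustrative, not the project's actual schema:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveQueryOnSpark {
    public static void main(String[] args) {
        // Hive support lets Spark read the same metastore tables the original Hive queries used.
        SparkSession spark = SparkSession.builder()
                .appName("hive-to-spark-sql-poc")
                .enableHiveSupport()
                .getOrCreate();

        // Illustrative aggregation; the real queries and UDFs were project specific.
        Dataset<Row> dailyCounts = spark.sql(
                "SELECT device_id, to_date(event_ts) AS event_day, COUNT(*) AS events "
              + "FROM iot_events GROUP BY device_id, to_date(event_ts)");

        dailyCounts.write().mode("overwrite").saveAsTable("iot_events_daily");
        spark.stop();
    }
}
```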
RedFin is a real estate search site for homebuyers, renters, and real estate professionals in the United States. It has
a database of over 100 million homes and 35 million users. RedFin Insight is a service that provides high-quality
leads to real estate professionals by leveraging big data sourced through RedFin's consumer search dataset.
Worked with the RedFin Insight Analytics team to allow real estate professionals to gain a deep understanding of
client needs by exposing home search preferences, financing prequalification details, and home-buying lifecycle
information.
Responsibilities:
Designed and worked on a Big Data analytics platform for processing customer interface preferences
and comments using Java, Hadoop, Hive, and Pig.
Involved in Hive-HBase integration by creating Hive external tables and specifying HBase as the
storage format.
Performance tuning of Hadoop cluster workloads, bottlenecks, and job queuing.
Used Oozie to automate/schedule business workflows which invoke Sqoop, MapReduce and Pig
jobs as per the requirements.
Worked on accessing Hive tables to perform analytics from Java applications using JDBC (see the sketch after this list).
Developed Sqoop scripts to import and export the data from relational sources and handled
incremental loading on the customer and transaction data by date.
Worked with various HDFS file formats such as Avro and SequenceFile, and compression formats
such as Snappy and bzip2.
Developed Pig UDFs to pre-process the data for analysis (a UDF sketch appears after the Environment line below).
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Developed Hive queries for data sampling and analysis for the analysts.
Loaded data into the cluster from dynamically generated files using Flume and from relational
database management systems using Sqoop.
Used the Solr search API and developed a custom Solr request handler.
Developed custom Python and Unix shell scripts for data sampling and for pre- and post-validation of
master and slave nodes, before and after configuring the NameNode and DataNodes respectively.
Developed and used Pig scripts to process and query flat files in HDFS that could not be accessed
using Hive.
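A minimal sketch of the Hive-over-JDBC access mentioned in the list above, assuming HiveServer2 at a placeholder host; the credentials and table name are illustrative, not the project's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 URL; the real host, port, and database were environment specific.
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default";

        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "");
             Statement stmt = conn.createStatement();
             // Illustrative analytics query against an assumed search-activity table.
             ResultSet rs = stmt.executeQuery(
                     "SELECT region, COUNT(*) AS searches FROM home_searches GROUP BY region")) {
            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getLong("searches"));
            }
        }
    }
}
```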
Environment: RedHat Linux 5, MS SQL Server, Oracle, Hadoop CDH 4, PIG, Hive, ZooKeeper, Flume,
HDFS, HBase, Sqoop, Solr, Python, Oozie, UNIX Shell Scripting, PL/SQL.
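For the Pig UDF pre-processing in the RedFin work above, a minimal Java sketch; the cleanup logic and class name are purely illustrative:

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF that trims and lower-cases a free-text field before analysis.
public class NormalizeText extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toLowerCase();
    }
}
```

Such a UDF would be registered in a Pig script with REGISTER and then invoked like a built-in function inside a FOREACH ... GENERATE statement.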
JD Power and Associates, a McGraw Hill Financial company, is a marketing research company known for its
prowess in conducting surveys in several industries such as automobiles and hotels. With increasing demand for
an out-of-the-box platform for its customers, the Nextgen platform was built with cutting-edge technologies that
enable customers to conduct their own surveys.
Responsibilities:
Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes,
and managing data backups.
Supported MapReduce Programs that are running on the cluster.
Designed an appropriate partitioning/bucketing schema to allow faster data retrieval during analysis
using Hive (an illustrative DDL sketch follows this list).
Involved in creating Hive tables, loading data and running hive queries.
Extensive working knowledge of partitioned tables, UDFs, performance tuning, and compression-
related properties in Hive.
Implemented and configured High Availability Hadoop Cluster (Quorum Based).
Periodically reviewed Hadoop-related logs, fixed errors, and prevented errors by analyzing the
warnings.
Used Flume to stream data into HDFS from various sources. Managed interdependent Hadoop jobs
and automated several types of Hadoop MapReduce and Hive jobs.
Installed and configured Hadoop; responsible for maintaining the cluster and managing and
reviewing Hadoop log files.
Provided operational support services related to Hadoop infrastructure and application installation.
Handled the imports and exports of data onto HDFS using Flume and Sqoop.
Supported technical team members in management and review of Hadoop log files and data
backups.
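As an illustration of the partitioning/bucketing design mentioned in the list above, a sketch of the kind of DDL involved, issued here through the Hive JDBC driver for concreteness; the table, columns, bucket count, and connection details are assumptions, and the exact driver class and URL in a CDH3/CDH4-era environment may have differed:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreatePartitionedSurveyTable {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Placeholder HiveServer2 URL; real connection details were environment specific.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2.example.com:10000/default", "hiveuser", "");
             Statement stmt = conn.createStatement()) {
            // Partition by survey date so analysis queries prune to only the dates they need,
            // and bucket by respondent_id to help joins and sampling.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS survey_responses ("
              + "  respondent_id STRING, question_id STRING, answer STRING) "
              + "PARTITIONED BY (survey_date STRING) "
              + "CLUSTERED BY (respondent_id) INTO 32 BUCKETS "
              + "STORED AS SEQUENCEFILE");
        }
    }
}
```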
Environment: HDFS, CDH3, CDH4, HBase, NoSQL, RHEL 4/5, Hive, Pig, Perl Scripting, Sqoop, Flume
Microsoft is a multinational technology company that develops, manufactures, licenses, supports, and sells
computer software, consumer electronics, personal computers, and services. Teamed with infrastructure,
network, database, application, and business intelligence teams to evaluate new host requests and resource
management, and performed updates and upgrades to the existing farm from time to time.
Responsibilities:
Involved in cluster maintenance using Cloudera Manager, used the JobTracker UI to analyze incomplete or
failed jobs, and ran a file merger to consolidate small files and directories.
Worked with data delivery teams and the Linux admin team to set up new users, user spaces, and quotas,
set up Kerberos principals, and test HDFS/MapReduce and Hive/Pig access for them.
Wrote shell scripts and used Cloudera Manager to monitor the health of Hadoop daemon services and
respond accordingly to any warning or failure conditions.
Performed tuning of Hadoop MapReduce routines written in Java and provided 24x7 support for developers
who use the Hadoop stack. Automated MapReduce job workflows using the Oozie scheduler.
Environment: Cloudera, Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper,
Kerberos, RedHat Linux
Responsibilities:
Analyzed all business functionality related to back end database interfaces.
Developed technical specifications for various back-end modules from business requirements;
specifications were written according to standard specification formats.
Worked with the DBA on enhancements to the physical DB schema, and coordinated with the DBA in
creating and managing tables, indexes, tablespaces, triggers, DB links, and privileges.
Analyzed and designed tables based on small and large database transactions.
Developed back-end interfaces using PL/SQL stored packages, procedures, functions, collections,
object types, triggers, C, and K-shell scripts (a hypothetical Java JDBC call sketch follows this list).
Developed screens and reports using Oracle Forms/Reports.
Responsible for producing Crystal Reports and SQL reports.
Utilized SQL*Loader to load flat files into database tables.
Involved in extracting, transforming, and loading (ETL) using the Informatica tool.
Responsible for SQL tuning and optimization using ANALYZE, EXPLAIN PLAN, the TKPROF utility, and
optimizer hints.
Utilized the SQL Developer tool for developing all back-end database interfaces.
Responsible for performing code reviews.
Developed user documentation for all the application modules. Also responsible for writing test plan
documents and unit testing for the application modules.
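Since Java appears in this environment alongside PL/SQL, a minimal, hypothetical sketch of how one of the packaged procedures above might be invoked through JDBC; the package name, procedure signature, and connection details are all assumptions:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

public class BillingInterfaceClient {
    public static void main(String[] args) throws Exception {
        // Placeholder Oracle connection details.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@dbhost.example.com:1521:ORCL", "app_user", "app_pwd");
             // Hypothetical packaged procedure; the real package and signature were project specific.
             CallableStatement call = conn.prepareCall("{ call billing_pkg.process_invoice(?, ?) }")) {
            call.setLong(1, 1001L);                      // IN: invoice id
            call.registerOutParameter(2, Types.VARCHAR); // OUT: status message
            call.execute();
            System.out.println("Status: " + call.getString(2));
        }
    }
}
```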
Environment: SQL, PL/SQL, Java, Oracle 10g, SQL*Plus, Windows, SQL*Loader, Explain Plan and
TKPROF tuning utility, SQL Developer, TOAD
Responsibilities:
As a Software Developer, responsible for design and development of module specifications.
Analyzed, designed, optimized, and tuned Java programs, PL/SQL procedures, and Oracle stored
procedures.
Wrote cursors and control structures using PL/SQL.
Created PL/SQL objects such as stored procedures, functions, packages, and cursors using optimized
techniques.
Created various types of triggers, including DML, DDL, and database triggers.
Involved in bug fixing of tickets.
Prepared unit test data.
Executed unit test plan conditions and test cases.
Performed simulation and code walkthroughs.
Environment: SQL, PL/SQL, Java, Oracle 10g, SQL*Plus, Windows, SQL*Loader, Explain Plan and
TKPROF tuning utility, SQL Developer, TOAD
Education