Hadoop Developer/Administrator and Content Editor
I am a Hadoop Developer/Administrator and Content Editor.

- Developed Oozie workflows to automate loading data into HDFS and pre-processing it with Pig.
- Architected 60-node Hadoop clusters running CDH 4.4 on CentOS.
- Implemented Cloudera (CDH) on a 30-node cluster.
- Used Sqoop to import data from RDBMS sources into HDFS.
- Developed an ETL framework in Python and Hive, with daily runs, error handling, and logging, to extract data that improved vendor negotiations (see the ETL sketch after this list).
- Cleaned and filtered imported data using Hive and MapReduce.
- Designed and built a 10-node Hadoop cluster for sample data analysis.
- Regularly tuned Hive and Pig queries to improve data processing and retrieval.
- Ran Hadoop Streaming jobs to process terabytes of XML data (see the mapper/reducer sketch below).
- Created visualizations and reports for the business intelligence team using Tableau.
- Analyzed datasets with Pig, Hive, MapReduce, and Sqoop to recommend business improvements.
- Set up, installed, and monitored a 3-node enterprise Hadoop cluster on Ubuntu Linux.
- Analyzed transaction behavior and clickstream data with Hadoop (HDP) to predict future customer purchases.
- Set up Ganglia to monitor both Hadoop-specific metrics and system metrics.
- Wrote custom Nagios scripts to monitor the NameNode, DataNode, Secondary NameNode, JobTracker, and TaskTracker daemons, and set up alerting (see the plugin sketch below).
- Experimented with dumping data from MySQL to HDFS using Sqoop.
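To illustrate the Python/Hive ETL framework and Sqoop imports above, here is a minimal sketch of a daily run with error handling and logging. The connection string, table, directory, and database names (sales, orders, raw, clean) are hypothetical placeholders, not actual production values.

    #!/usr/bin/env python
    # daily_etl.py - minimal sketch of a daily Python/Hive ETL run with
    # logging and error handling; all names and paths are hypothetical.
    import logging
    import subprocess
    import sys

    logging.basicConfig(filename="daily_etl.log", level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")

    def run(cmd):
        """Run a shell command, logging it; raise on non-zero exit."""
        logging.info("running: %s", " ".join(cmd))
        subprocess.check_call(cmd)

    def main():
        try:
            # 1. Import the source table from MySQL into HDFS with Sqoop.
            run(["sqoop", "import",
                 "--connect", "jdbc:mysql://dbhost/sales",  # hypothetical source
                 "--table", "orders",
                 "--username", "etl",
                 "--password-file", "/user/etl/.sqoop.pw",
                 "--target-dir", "/data/raw/orders",
                 "-m", "4"])
            # 2. Clean and filter the imported rows in Hive.
            run(["hive", "-e",
                 "INSERT OVERWRITE TABLE clean.orders "
                 "SELECT * FROM raw.orders WHERE amount IS NOT NULL"])
            logging.info("daily ETL finished OK")
        except subprocess.CalledProcessError as err:
            logging.error("daily ETL failed: %s", err)
            sys.exit(1)

    if __name__ == "__main__":
        main()

Scheduling the script from cron (or an Oozie shell action) covers the daily-run requirement; failures exit non-zero so the scheduler can alert.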
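The Hadoop Streaming work lends itself to a short mapper/reducer sketch in Python. This assumes one XML record per input line with hypothetical sku/amount fields; the real feed schema may differ.

    #!/usr/bin/env python
    # mapper.py - Hadoop Streaming mapper sketch: parse one XML record per
    # line and emit key<TAB>value pairs (schema is hypothetical).
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            record = ET.fromstring(line)  # e.g. <record><sku>A1</sku><amount>9.5</amount></record>
            sku = record.findtext("sku", "UNKNOWN")
            amount = record.findtext("amount", "0")
            print("%s\t%s" % (sku, amount))
        except ET.ParseError:
            continue  # skip malformed fragments instead of failing the job

And the matching reducer, which relies on streaming delivering keys in sorted order:

    #!/usr/bin/env python
    # reducer.py - sums amounts per key from the sorted mapper output.
    import sys

    current_key, total = None, 0.0
    for line in sys.stdin:
        key, _, value = line.strip().partition("\t")
        if key != current_key:
            if current_key is not None:
                print("%s\t%s" % (current_key, total))
            current_key, total = key, 0.0
        try:
            total += float(value)
        except ValueError:
            pass
    if current_key is not None:
        print("%s\t%s" % (current_key, total))

A typical invocation (the jar path varies by distribution):

    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
        -input /data/xml -output /data/xml_out \
        -mapper mapper.py -reducer reducer.py \
        -file mapper.py -file reducer.py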
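As a sketch of the custom Nagios monitoring, the plugin below checks whether a named Hadoop daemon appears in jps output and exits with the standard Nagios codes (0 OK, 2 CRITICAL, 3 UNKNOWN); the actual production scripts may check more than process liveness.

    #!/usr/bin/env python
    # check_hadoop_daemon.py - sketch of a Nagios plugin that verifies a
    # Hadoop daemon (NameNode, DataNode, JobTracker, ...) is running.
    import subprocess
    import sys

    def main():
        if len(sys.argv) != 2:
            print("UNKNOWN - usage: check_hadoop_daemon.py <DaemonName>")
            sys.exit(3)
        daemon = sys.argv[1]  # e.g. NameNode
        try:
            out = subprocess.check_output(["jps"]).decode()
        except (OSError, subprocess.CalledProcessError):
            print("UNKNOWN - could not run jps")
            sys.exit(3)
        # jps prints "<pid> <ClassName>" per line; match on the class name.
        running = any(line.split()[-1] == daemon
                      for line in out.splitlines() if line.strip())
        if running:
            print("OK - %s is running" % daemon)
            sys.exit(0)
        print("CRITICAL - %s is not running" % daemon)
        sys.exit(2)

    if __name__ == "__main__":
        main()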
As a content editor:

- Wrote copy, sourced material, and developed ideas.
- Proofed and edited online copy for a newspaper website.
- Ensured all articles and posts followed correct grammar and style guidelines.
- Tested website usability and appearance for visual appeal and errors.
- Designed new site features to increase user engagement.
- Monitored website communications and escalated problems.