Work History

Below is my job history, with my key accomplishments at each role:

  1. King & Spalding Law [Atlanta, Georgia (Hybrid)]

    • Architected and implemented new database tables in MSSQL to align with evolving business requirements, enhancing data accessibility and analytical capabilities.
    • Developed and optimized stored procedures to reflect changes in business logic, ensuring data integrity and consistency across the database.
    • Engineered multi-hop ETL pipelines using Azure Data Factory (ADF) to facilitate seamless data movement within a Medallion Architecture.
    • Designed, built, and maintained robust data pipelines, improving data flow efficiency and ensuring the timely delivery of critical business intelligence.
    • Adhered to CI/CD best practices by utilizing Azure DevOps for the automated integration and deployment of database objects (tables, stored procedures) and ADF ETL pipelines to production environments.
    • Streamlined the deployment process for new data solutions, reducing manual effort and minimizing the risk of errors.
  2. Boeing [Seattle, Washington (Remote)]

    • Led a team of 3 developers to develop and deploy multiple AI data engineering pipelines for Boeing Global Services.
    • Architected and built scalable data pipelines to ingest diverse datasets (documents, databases, web content) from multiple sources and transform them into a structured format suitable for the RAG system.
    • Orchestrated the use of embedding models to convert text chunks into numerical vectors, which are essential for the retrieval component of the RAG pipeline.
    • Architected, implemented, and maintained vector databases, which serve as the core knowledge base for RAG models.
    • Created automated, event-driven pipelines to continuously update vector databases with new data.
    • Created External Delta Tables leveraging features such as Change Data Feed for row-level data management, Schema Enforcement for column and data-type validation, and Vacuum and AutoOptimize for efficient file storage management (a minimal sketch follows this list).
    • Managed and granted Unity Catalog privileges to principals on objects such as Metastores, Schemas, Tables, and Views, using both SQL commands and the Unity Catalog UI to ensure proper data governance.
    • Implemented CI/CD processes for propagating Databricks workspaces and jobs from DEV to PROD using GitHub Actions.
    • Utilized PySpark for distributed data processing in Databricks notebooks, and SQL for DDL, DML, and DQL operations in Databricks notebooks and SQL Warehouse.
    • Designed Gold tables that are optimized for query performance to enable seamless Power BI dashboard reporting.
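
    As a minimal illustration of the External Delta Table setup described above, the PySpark sketch below shows Change Data Feed, auto-optimize, and VACUUM in use; the table name, columns, and storage path are placeholders rather than any actual Boeing schema.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # External Delta table with Change Data Feed and auto-optimize enabled;
      # the declared columns give Delta's schema enforcement something to validate against.
      spark.sql("""
          CREATE TABLE IF NOT EXISTS silver.documents (
              doc_id     STRING,
              chunk_text STRING,
              updated_at TIMESTAMP
          )
          USING DELTA
          LOCATION 'abfss://silver@examplestorage.dfs.core.windows.net/documents'
          TBLPROPERTIES (
              'delta.enableChangeDataFeed'       = 'true',
              'delta.autoOptimize.optimizeWrite' = 'true',
              'delta.autoOptimize.autoCompact'   = 'true'
          )
      """)

      # Read only the rows that changed since a given table version (row-level management).
      changes = (
          spark.read.format("delta")
          .option("readChangeFeed", "true")
          .option("startingVersion", 1)
          .table("silver.documents")
      )

      # Reclaim storage from data files no longer referenced by the table.
      spark.sql("VACUUM silver.documents RETAIN 168 HOURS")
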
  3. Capgemini: The Hartford Insurance Group [Hartford, CT (Remote)]

    • Led a team of 3 to design data pipelines to feed Small Commercial machine learning models.
    • Configured, tested, and monitored Databricks batch ELTL jobs that extracted and loaded big data from sources such as REST APIs, Oracle, Snowflake, SharePoint, and NTFS into Azure Data Lake, following the Medallion Multi-hop Architecture.
    • Designed streaming pipelines that combined Azure Data Factory and Databricks Auto Loader to stream near-real-time big data originating from Apache Kafka (see the sketch after this list).
    • Followed best practices to develop scalable jobs, using Job Clusters sized with an appropriate number and type of workers to balance scaling and cost.
    • Combined data modelling principles with business requirements to design External Delta Tables following the Medallion Architecture with delta files secured and governed in ADLS.
    • Leveraged Delta Lake features such as Change Data Feed for accurate row-level ingestion; Schema Enforcement for column and data-type validation; and AutoOptimize and Vacuum for efficient Delta file management.
    • Implemented CI/CD processes for propagating Databricks workspaces from Dev to QAT to Prod using GitHub Actions.
    • Utilized PySpark for distributed data processing in Databricks notebooks, and SQL for DDL, DML, and DQL operations in Databricks, Snowflake, and Oracle.
    • Designed Gold Tables that are optimized for query performance to enable seamless Power BI dashboard reporting.
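
    The Auto Loader pattern mentioned above can be sketched roughly as follows, assuming Azure Data Factory lands the Kafka feed as JSON files in an ADLS landing zone that Databricks Auto Loader then streams into a Bronze Delta table; paths and table names are illustrative only.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.getOrCreate()

      # Incrementally discover new files landed by ADF and infer/track their schema.
      bronze_stream = (
          spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation",
                  "abfss://bronze@examplestorage.dfs.core.windows.net/_schemas/policies")
          .load("abfss://landing@examplestorage.dfs.core.windows.net/kafka/policies")
      )

      # Write the stream into the Bronze layer with exactly-once checkpointing.
      (
          bronze_stream.writeStream
          .option("checkpointLocation",
                  "abfss://bronze@examplestorage.dfs.core.windows.net/_checkpoints/policies")
          .trigger(availableNow=True)  # process all available files per triggered run
          .toTable("bronze.policies")
      )
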
  4. Capgemini: Citi [Buffalo, NY]

    • Orchestrated Databricks ELTL workflows with streaming and batch jobs that moved big data from sources such as financial systems, marketing/advertising platforms, REST APIs, SQL Server, and NTFS into AWS S3, following the Medallion Multi-hop Architecture.
    • Combined data modelling principles with business requirements to design and create External Delta Tables following the Medallion Architecture with delta files secured and governed in S3.
    • Tuned cluster configurations and Spark settings on Databricks to improve performance and scalability for large-scale data processing workloads (an illustrative configuration sketch follows this list).
    • Configured AWS networking resources, such as Subnets and Firewalls, to ensure secure data routing in the Data Plane.
    • Designed Gold Tables that are optimized for query performance to enable seamless Tableau dashboard reporting.
    • Performed Databricks administrative tasks such as, but not limited to, workspace provisioning and configuration; job scheduling and monitoring; notebook and code management; and secret management via AWS KMS.
    • Performed DataOps tasks such as streamlining ingestion, automated testing, and CI/CD.
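
    As a hypothetical example of the cluster and Spark tuning described above, the sketch below shows the kind of settings involved, together with a Gold-layer write to S3; the specific values, tables, and bucket are illustrative and not the configuration used at Citi.

      from pyspark.sql import SparkSession

      spark = (
          SparkSession.builder
          .appName("medallion-batch")
          # Adaptive Query Execution right-sizes shuffle partitions at runtime.
          .config("spark.sql.adaptive.enabled", "true")
          .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
          # Broadcast small dimension tables to avoid expensive shuffle joins.
          .config("spark.sql.autoBroadcastJoinThreshold", "64MB")
          # Baseline shuffle parallelism matched to the cluster size in use.
          .config("spark.sql.shuffle.partitions", "400")
          .getOrCreate()
      )

      # Publish a curated Gold table as Delta files on S3 for Tableau reporting.
      silver = spark.read.table("silver.transactions")
      (
          silver.write.format("delta")
          .mode("overwrite")
          .option("path", "s3://example-bucket/gold/transactions_summary")
          .saveAsTable("gold.transactions_summary")
      )
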
  5. Cummins Inc [Columbus, IN]

    • Designed and built an ingestion framework to efficiently consume and transform big data using Azure Databricks Spark and ADF engines.
    • Performed Databricks administrative tasks such as, but not limited to, workspace provisioning and configuration; user management and permissions; cluster management; job scheduling and monitoring; code management; and secret management via Azure Key Vault.
    • Established data governance policies and data quality standards for managing metadata, lineage, and data cataloging on Azure Databricks.
    • Deployed resources on Azure using Terraform.
    • Utilized Databricks notebooks and libraries to prototype, test, and deploy Spark jobs, improving code reusability and maintainability.
    • Utilized ticketing systems and collaboration tools to manage customer inquiries, escalate issues, and track resolution progress to ensure timely resolution.
    • Implemented CI/CD pipelines for deploying data engineering and analytics solutions on Databricks using Azure DevOps.
    • Performed data preparation processes such as cleaning/wrangling and feature engineering using the Pandas library and PySpark (an illustrative sketch follows this list).
    • Optimized SQL queries to perform data extraction to fit the analytical requirements.
    • Managed projects with Jira using Agile methodology and documented them in Confluence.
    • Built Power BI dashboards to visualize data.
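
    The cleaning/wrangling and feature-engineering step mentioned above can be illustrated with the short sketch below; the column names, tables, and rules are placeholders rather than actual Cummins data.

      import pandas as pd
      from pyspark.sql import SparkSession, functions as F

      spark = SparkSession.builder.getOrCreate()

      # PySpark: distributed cleaning of the raw Bronze data.
      raw = spark.read.table("bronze.engine_telemetry")
      clean = (
          raw.dropDuplicates(["engine_id", "reading_ts"])
          .filter(F.col("reading_ts").isNotNull())
          .withColumn("temp_c", (F.col("temp_f") - 32) * 5.0 / 9.0)  # derived feature
      )
      clean.write.mode("overwrite").saveAsTable("silver.engine_telemetry")

      # Pandas: lightweight feature engineering on a small aggregated extract.
      pdf = clean.groupBy("engine_id").agg(F.avg("temp_c").alias("avg_temp_c")).toPandas()
      pdf["temp_band"] = pd.cut(pdf["avg_temp_c"],
                                bins=[-50, 0, 50, 150],
                                labels=["cold", "normal", "hot"])
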
  6. Wegmans Food Markets [Rochester, NY]

    • Extracted inventory flow and stock level data across various nodes (hubs, stores, etc.) by joining tables from more than 10 databases.
    • Designed and developed an ETL (Extract, Transform, and Load) strategy to populate the Data Warehouse from various source system feeds using ETL tools such as Informatica, Power Exchange, MDM web services, PL/SQL, and Unix shell scripts.
    • Performed extensive data cleansing during the ETL extraction and loading phases by analyzing the raw data, writing SAS programs, and creating complex reusable macros.
    • Migrated data from SQL Server (On-prem) to AWS S3.
    • Built Tableau dashboards for ad-hoc analysis.