Data Engineer
Stitch Labs
At Stitch, the data science team works to drive internal business decisions and product decisions with empirical evidence and experimentation. We are also strong believers that Stitch can bring a similar level of insight to our customers by helping them make decisions around their own purchasing, inventory, and sales processes.
We analyze data from our sharded MySQL application database in Python on a daily basis. To provide customer-facing insights, we use Sqoop and Pig to ingest data into HDFS and forecast at scale with Hadoop Streaming. As Stitch grows as a company, so does the scale of our data; we are continually working to make access to application data scalable for both internal analysis and customer-facing insights.
We're looking for an experienced Data Engineer to join our data science team to own the infrastructure and ETL processes within our data stack.
What You'll Do
An exceptional candidate will be a pioneer who thrives in a semi-structured and ambiguous environment. You'll navigate a resource-constrained start-up culture with ease and respond resiliently in everything from collaboration and communication to fast iteration. You understand there is no perfect way, but there is always a better way.
- Owns and implements a vision for scalable analytics infrastructure at Stitch, including an up-to-date source of truth for analysis and insight
- Works with our Engineering Operations team to create a data pipeline for extracting data from our distributed MySQL infrastructure into scalable data warehousing and analytics infrastructure while maintaining data validity
- Automates ETL processes from other data sources key to Stitch operations such as Salesforce, Google Analytics, and SnowPlow
- Collaborates with data scientists and business analysts to provide centralized access to insights and data for internal stakeholders
- Collaborates with data scientists and product managers to develop infrastructure that delivers real-time insights to customers, automating and informing business processes for retailers
- Orchestrates between big-data-driven research and production-ready engineering in all projects
Qualifications
- B.S., M.S. or Ph.D. in Computer Science or related field
- 1+ years of experience in a high volume data environment
- Experience implementing ETL processes from MySQL and API services
- Experience with data warehousing for OLAP reporting (Redshift, BigQuery, Impala)
- Experience with Hadoop/HBase/Pig is highly preferred (bonus for Spark, Flume)
- Strong knowledge of data structures, algorithms, and software design patterns and principles (e.g., estimating tradeoffs between performance and complexity)
- Expert knowledge of key data structures and algorithms (e.g., indexing, hash tables, joins, aggregation)
- Experience with algorithms; an understanding of general machine learning and data mining concepts is preferred (bonus for Python data stack experience)
Why Stitch?
We are a restless bunch at Stitch. We're leaders in our industry, but our hunger to make an impact on the world is barely satisfied. Our grit, passion, and unyielding dedication to improving the lives of our customers influence our culture in ways you won't find at other companies. And while we continue to be the best, it doesn't go without care, compassion, and appreciation for the people around us. When you walk in our doors, you'll find smart, courageous people who consider one another family. And this is something we protect every day.
You'll be encouraged to embrace curiosity and always ask "why". Whatever you thought about working hard will be put to the test here. So if you are the kind of person who welcomes a challenge, an environment that will push you to grow, and a company that moves quickly, well - you've found the right place.
About Stitch Labs
Stitch is an online inventory control solution that simplifies multichannel retail business. It automatically syncs inventory, orders, and sales across channels, giving retailers a holistic understanding of their operations. With Stitch, retailers save time, make better decisions, and grow their business. Stitch integrates with top sales channels such as Amazon, eBay, Etsy, Shopify, WooCommerce, and Square, as well as add-ons including QuickBooks, Xero, and ShipStation.
Full Time