Transactional DB (nice to have)
ORM (nice to have)
DB Modelling (nice to have)
AWS S3 (nice to have)
Data engineers are mainly tasked with transforming data from various data sources into a format that can be easily consumed (analyzed, visualized, etc.).
They achieve this by developing, maintaining, and testing technical infrastructures (understood broadly, not limited to IT infrastructure) for data generation.
The Data Engineer works closely with DevOps teams on solution design, and with Product Managers and Data Analysts during the requirements and testing phases.
Create and maintain optimal data pipeline architecture,
Assemble large, complex, distributed data sets from various data sources to meet business functional and non-functional requirements (BI solutions, KPI visualization, reporting structures, etc.),
Ensure IT security principles are assessed and included in the design to meet regulatory and policy requirements,
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.,
Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using Airflow, SQL, and AWS technologies,
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics,
Work with stakeholders including the Executive, Product Management, and DevOps teams to assist with data-related technical issues and support their data infrastructure needs.
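The extract-transform-load duties above can be sketched in plain Python. This is a minimal illustration only, using the standard-library sqlite3 module as a stand-in for the actual Airflow/SQL/AWS stack; every table and column name here is hypothetical.

```python
import sqlite3

def extract(conn):
    """Pull raw rows from a (hypothetical) source table."""
    return conn.execute("SELECT id, amount FROM raw_orders").fetchall()

def transform(rows):
    """Normalize amounts from cents to whole currency units."""
    return [(row_id, amount / 100.0) for row_id, amount in rows]

def load(conn, rows):
    """Write the cleaned rows into a (hypothetical) target table."""
    conn.executemany("INSERT INTO clean_orders (id, amount) VALUES (?, ?)", rows)
    conn.commit()

def run_pipeline(conn):
    # In a real deployment each stage would be a separate Airflow task;
    # here they are chained directly to keep the sketch self-contained.
    load(conn, transform(extract(conn)))

# Demo on an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE clean_orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 1999), (2, 2500)])
run_pipeline(conn)
print(conn.execute("SELECT * FROM clean_orders ORDER BY id").fetchall())
```

In an orchestrated setup, each of the three functions would become an independently retryable task, which is what makes a pipeline maintainable and testable.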
Advanced SQL knowledge and experience with relational databases and query authoring, as well as working familiarity with a variety of databases (mainly MongoDB),
Basic knowledge of DBMS architecture and the ability to define administration tasks (e.g. an indexing strategy),
Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement,
Strong analytical skills (including data quality) for working with unstructured datasets,
Hands-on experience with data pipeline technologies: ETL systems, including optimization techniques (e.g. pagination, balancing memory against storage),
Fluency in the Python programming language, with experience in the OOP paradigm,
Ability to work with Docker to set up a local development environment using predefined Docker Compose files,
Strong project management and organizational skills, as work is mainly in a virtual setup,
Experience supporting and working with cross-functional teams in a dynamic environment.
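The pagination requirement listed above — streaming a large table in fixed-size batches rather than loading it whole, trading a little extra I/O for bounded memory — can be sketched with a keyset-paginated generator. The sqlite3 in-memory database and the `events` table are stand-ins chosen only to keep the example runnable.

```python
import sqlite3

def paginate(conn, batch_size=2):
    """Yield rows in id order, batch_size at a time, using keyset
    pagination (WHERE id > last_seen) so memory use stays bounded
    no matter how large the table grows."""
    last_id = 0
    while True:
        batch = conn.execute(
            "SELECT id, name FROM events WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        yield from batch
        last_id = batch[-1][0]  # resume after the last row seen

# Demo: five rows fetched in batches of two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, f"event-{i}") for i in range(1, 6)])
rows = list(paginate(conn))
print(len(rows))
```

Keyset pagination is preferred here over OFFSET-based paging because an indexed `WHERE id > ?` stays fast on deep pages, whereas OFFSET must skip all preceding rows on every query.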
Job details:
Recruitment language: Polish & English
Paid holiday, Employment contract
Perks in the office:
Modern office full of coffee, snacks, and beverages
Startup atmosphere, no dress code
Bike parking, great location of the office
Training budget for self-development, courses, conferences, and more
Flat structure and small teams
Private healthcare, Life & group insurance
Sports subscription or other benefits of your choice
Data Sharing - the best way to better data.
We are passionate about new technologies and we constantly improve our stack. There is always enough room to learn the ropes as we provide internal and external training and education.
We are looking for a person who cares about high-quality code and is interested in big data sets.
90% of the world's data has been produced in just the last two years. Our mission is to help companies with our innovative Data Quality Services.
CDQ is the first company in the market to leverage Data Sharing as the best way to better data. Our customers share data quality rules, enrich their data from publicly shared sources, exchange hand-validated data records, and alert each other of data fraud.
The result of collaborative master data management: higher efficiency, shared effort, better quality, and lower risk.
With our cloud-based software platform, CDQ provides Data Quality as a Service: customers' master data is monitored and updated continuously, either through our Data Quality Tools software or fully integrated with their ERP or CRM systems.