Engine - why, who and what for

Identify key indicators of employee attrition for past SanDisk employees based on publicly available data.
Analyse patterns in profile information of past SanDisk employees to solve the following:

  • Identify attributes that could be indicative of "risk of leaving". E.g. profiles with a Stanford MS and 2 years of tenure have a "low or 20%" risk of leaving the company, while profiles with the same title for more than 2 years have a "high or 40%" risk of leaving, etc.
  • Develop a model to predict employee attrition using the features identified.
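
A risk figure such as the "20%" or "40%" in the example above can be read as the share of profiles with a given attribute that left. The following is a minimal sketch of that calculation, not a prescribed method. The field names `same_title_over_2y` and `left_within_window` are hypothetical, the records are made up, and it assumes attrition has already been defined as a yes/no label per profile.

```python
from collections import defaultdict

def attrition_rate_by_attribute(profiles, attribute, label="left_within_window"):
    """Return {attribute value: share of profiles whose label is True}."""
    counts = defaultdict(lambda: [0, 0])  # value -> [leavers, total]
    for profile in profiles:
        bucket = counts[profile[attribute]]
        bucket[0] += int(profile[label])
        bucket[1] += 1
    return {value: leavers / total for value, (leavers, total) in counts.items()}

# Made-up toy records, for illustration only (not real SanDisk data).
profiles = (
    [{"same_title_over_2y": True, "left_within_window": True}] * 2
    + [{"same_title_over_2y": True, "left_within_window": False}] * 3
    + [{"same_title_over_2y": False, "left_within_window": True}] * 1
    + [{"same_title_over_2y": False, "left_within_window": False}] * 4
)

rates = attrition_rate_by_attribute(profiles, "same_title_over_2y")
print(rates)
```

With real data, each group would need enough profiles for the rate to be meaningful.
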

As a starting point, profile data of past SanDisk employees has been provided to solvers under the “Data” tab. Solvers may find more profiles or resumes through Google searches, and may also find publicly available resumes on sites such as Indeed and SimplyHired.

  1. Submit an approach note
    • Solvers will submit an approach for meeting the objective.
      • Please be as detailed as possible. RAW data has been provided to you for building your approach note.
      • Describe the challenges you foresee and how you plan to address them.
      • Explain the pre-processing and feature engineering you will perform.
      • State which analytical/modeling approach you will take to solve the objective (please be detailed, describing the model steps using the data provided).
    • Engine will evaluate the approach(es) and promote 6 solvers to participate in tasks 2 and 3.
  2. (Only promoted solvers) Building the data set
    • Each participant will process the given RAW data and extract relevant information to create data sets/features conducive to analysis.
    • Data enrichment (if any)
    • Solvers are also encouraged to layer additional data (which is publicly available) on top of given data as appropriate.
  3. (Only promoted solvers) Modeling employee attrition
    • Solvers are expected to model and identify attributes that are indicative of ‘risk of leaving’ of employees.
    • Some of the key indicators identified by solvers should also be applicable to new employees at SanDisk. Otherwise, the indicators will be of limited use to the client.
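
As an illustration of the data-building step in task 2, the sketch below derives a few tenure-related attributes from one profile's position history. It is only one possible approach. The input layout (a list of positions with `title`, `start` and `end`, sorted by start date) and the feature names are assumptions, since the format of the RAW data is not described here.

```python
from datetime import date

def years_between(start, end):
    """Approximate elapsed years between two dates."""
    return (end - start).days / 365.25

def build_features(positions):
    """Derive tenure features from a profile's positions at the company.

    `positions` is a hypothetical structure: a list of dicts with
    'title', 'start' and 'end' (datetime.date), sorted by start date.
    """
    first, last = positions[0], positions[-1]
    years_in_last_title = round(years_between(last["start"], last["end"]), 2)
    return {
        "tenure_years": round(years_between(first["start"], last["end"]), 2),
        "num_titles": len({p["title"] for p in positions}),
        "years_in_last_title": years_in_last_title,
        "same_title_over_2y": years_in_last_title > 2,
    }

# Made-up example profile, for illustration only.
positions = [
    {"title": "Engineer", "start": date(2010, 1, 1), "end": date(2012, 1, 1)},
    {"title": "Senior Engineer", "start": date(2012, 1, 1), "end": date(2015, 1, 1)},
]

features = build_features(positions)
print(features)
```

Attributes of this kind can also be computed for current or new employees, which matters for the applicability requirement in task 3.
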

Client will evaluate approaches/models based on the following for individual tasks:

  • Approach Note
    • Types of attributes that solvers intend to extract and consider for modeling employee attrition from the given data
    • Additional data that solvers intend to layer on top of given data
    • Challenges solvers foresee in the task and how they plan to overcome challenges
    • Modeling approach for solving the problem, including how solvers intend to define attrition as part of the analysis design
  • Data Building
    • Feature Engineering
    • Number of well-defined attributes considered in the data set built for performing analysis
    • Number of new attributes introduced in the analysis dataset that are in addition to ones considered from given data
    • Number of additional data sources used for building the data set / Data enrichment
    • Data Structuring / Treatment of missing values if any / Data Sampling
  • Modeling
    • Client will evaluate the model with their internal information for validating the accuracy of the model objectively
    • Applicability of key indicators identified as top predictors of employee attrition to new employees in SanDisk
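
The client's validation procedure is not described, so the following is only a sketch of how a solver might check a model's accuracy on held-out profiles before submitting. The labels and predictions are made up, and precision and recall are included as an assumption about what could be useful alongside accuracy.

```python
def evaluate(actual, predicted):
    """Compare binary labels (1 = left, 0 = stayed) with model predictions."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # predicted leaver, did leave
    fp = sum(1 for a, p in pairs if not a and p)      # predicted leaver, stayed
    fn = sum(1 for a, p in pairs if a and not p)      # missed leaver
    tn = sum(1 for a, p in pairs if not a and not p)  # predicted stayer, stayed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Made-up held-out labels and predictions, for illustration only.
actual = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 0, 1, 1, 0, 0]

metrics = evaluate(actual, predicted)
print(metrics)
```
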

All solvers promoted to Tasks 2 and 3 will be paid a base prize of USD 1000, provided they fulfil the ‘Minimum criteria’ listed in the section below and submit the deliverables expected in each task, as described above.

Bonuses will be paid to solvers based on client evaluation of the models submitted (end of Task 3):

1st bonus – USD 1500

2nd bonus – USD 1000

3rd bonus – USD 500