# Franck Cuny

Technical Director, Site Reliability Engineering

Email: franck@fcuny.net | Phone: 415-617-5129

Results-driven Site Reliability Engineering leader with extensive
experience in architecting, scaling, and optimizing large-scale
distributed systems. Proven track record of driving reliability
improvements, fostering cross-functional collaboration, and mentoring
engineering talent. Dedicated to building resilient infrastructure and
cultivating a strong reliability culture.

## Core Competencies:

- Technical leadership and mentorship
- Cross-team collaboration and communication
- Large-scale distributed systems architecture
- Reliability engineering and disaster recovery
- Infrastructure optimization and cost reduction
- Production readiness and failure testing methodologies

## Career Focus:

Seeking opportunities to lead transformative reliability initiatives,
mentor the next generation of SREs, and drive architectural decisions
that significantly enhance system resilience and performance at scale.

# Experience

## Roblox, San Mateo

______________________________________________________________________

- Site Reliability Engineer, Technical Director (IC7): Aug 2024 - Present
- Site Reliability Engineer, Principal II (IC6): Feb 2022 - Aug 2024

______________________________________________________________________

As Team Lead for the Site Reliability group, I define roadmaps and
milestones, and identify areas where SREs can partner with other teams
to improve the overall reliability of our infrastructure and services.
Key projects and responsibilities include:

- **[Cell Architecture Implementation](https://corp.roblox.com/newsroom/2023/12/making-robloxs-infrastructure-efficient-resilient)**:
  Led the SRE effort to transition from monolithic Compute clusters to
  a Cell architecture, significantly enhancing Roblox's
  infrastructure resilience and efficiency. Developed migration plans,
  identified necessary automation, and drove production readiness for
  this critical reliability improvement.

- **Edge Infrastructure Migration**: Spearheaded the migration from
  HAProxy to Envoy at the edge to reduce failure domains, improve
  performance by streamlining the proxy chain, and enable steering of
  user traffic to specific cells from the edge.

- **Active/Passive Reliability Lead**: Orchestrated the failover
  strategy across multiple teams, developing detailed action plans and
  validation procedures, and ran comprehensive tests to confirm the
  plan's effectiveness. This work reduced failover time from days to
  hours.

- **Reliability Culture Champion**: Mentored engineers of various
  levels (both SREs and SWEs), established a model for production
  readiness, and popularized the practice of running failure exercises
  for large new infrastructure projects.

- **Technical Leadership**: Acted as tech lead on numerous projects,
  demonstrating strong cross-team collaboration skills. Provided
  technical guidance and mentorship to the SRE team, fostering a
  culture of reliability and continuous improvement.

Key strengths include driving complex infrastructure projects,
mentoring, setting reliability standards, and facilitating effective
cross-team collaboration.

## Twitter, San Francisco

______________________________________________________________________

Site Reliability Engineer Senior Staff Engineer Jan 2018 - Jan 2022
Site Reliability Engineer Staff Engineer Aug 2016 - Jan 2018
Site Reliability Engineer Senior Engineer Aug 2014 - Jan 2016

______________________________________________________________________

### Key Achievements and Responsibilities:

- **Large-Scale Infrastructure Management**: Led SRE efforts for one
  of the world's largest compute clusters (Mesos), spanning hundreds
  of thousands of nodes across multiple data centers. Defined KPIs and
  improved automation for managing a massive fleet of bare-metal
  machines.

- **Kubernetes Adoption**: Spearheaded the initiative to adopt
  Kubernetes for on-premises infrastructure, driving architectural
  decisions and implementation strategies.

- **Cost Optimization**: Designed and implemented strategies that
  significantly improved hardware utilization, resulting in tens of
  millions of dollars in savings on hardware costs.

- **Tech Leadership**: Served as Tech Lead for a team of 6 SREs
  supporting Compute infrastructure. Established critical team
  processes including on-call rotations and postmortem procedures.

- **Cloud and On-Premises Expertise**: Led multiple efforts related to
  Kubernetes deployment and management, both in cloud environments and
  on-premises infrastructure.

- **Storage Systems Migration**: Successfully migrated all pub-sub
  systems from bare-metal deployment to Aurora/Mesos, pioneering the
  adoption of the Compute orchestration platform among storage teams.
  This transition reduced operational overhead, decreased deployment
  times, and enhanced overall system reliability.

- **Network Infrastructure Improvement**: Advocated for and
  implemented the adoption of 10Gb+ networking in data centers,
  enabling significant scaling improvements for storage systems.

- **Cross-Functional Leadership**: Served as the SRE Tech Lead for the
  real-time storage team, driving improvements in performance,
  operations, and automation across storage systems.

I consistently demonstrated the ability to lead complex technical
initiatives, deliver impactful projects on time, optimize large-scale
systems, and drive cross-functional collaboration to achieve significant
improvements in infrastructure reliability, efficiency, and
cost-effectiveness.

## Say Media, San Francisco

______________________________________________________________________

- Software Engineer, Senior Engineer: Aug 2011 - Aug 2014

______________________________________________________________________

During my time at Say Media, I worked on two different teams. I started
as a software engineer on the platform team, building APIs, and then
transitioned to the operations team to develop tooling to increase the
effectiveness of the engineering organization.

## Linkfluence, Paris

______________________________________________________________________

- Software Engineer, Senior Engineer: Jul 2007 - Jul 2011

______________________________________________________________________

I joined Linkfluence in 2007 as one of its early engineers. I led the
development of the company's web and feed crawler, defined the
company's early architecture, and designed its internal platforms
using a service-oriented architecture (SOA). I contributed to multiple
open source projects on behalf of the company and represented it at
numerous open source conferences in Europe.