Using live data in development means you can test real workloads and get realistic results in transactions and reports. It’s also a significant security risk, as U.K. baby retailer Kiddicare recently found out: The company used real customer names, delivery addresses, email addresses and telephone numbers on a test site, only to have the data extracted and used to send phishing text messages to customers.
In 2015, Patreon CEO Jack Conte admitted that the names, shipping addresses and email addresses of 2.3 million users of the crowdfunding site had been breached, again through a test environment: the attackers got in “via a debug version of our website that was visible to the public,” which ran on a “development server that included a snapshot of our production database.” And earlier this year, a developer at Sydney University in Australia lost a laptop containing an unencrypted copy of a database with the personal and medical details of 6,700 disabled students.
“We can point to incidents such as Kiddicare and Patreon to show the serious security ramifications of this,” says security expert Troy Hunt, who runs the Have I Been Pwned? site to help consumers find out if any of their accounts have been compromised. “There are industry precedents for just how bad this can go.”
Being able to simulate or virtualize data is not only safer, says Hunt, but can also boost productivity. “It’s not just the security and code quality issues; generating test data in an automated fashion enables you to easily recreate the same environment for others on the team. In an ideal world, you simply fire up the data generation script and provision yourself a fully populated, non-production environment. Yes, it may be more work than a one-off copy of production but you only need to do this once and you’re not faced with dealing with customer data outside production.”
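Hunt doesn't describe a specific tool. What follows is only a minimal sketch of the kind of data generation script he means. It assumes Python and a hypothetical customer table with the same kinds of fields exposed in the Kiddicare case. The name lists, the `generate_customers` and `write_csv` helpers, and the seed value are all illustrative. Because the generator is seeded, every developer who runs it gets an identical dataset, and no record is drawn from production.

```python
import csv
import random
import sys

# Hypothetical value pools for illustration; nothing here comes from real customers.
FIRST_NAMES = ["Alice", "Ben", "Chloe", "Dev", "Ella", "Finn", "Grace", "Hari"]
LAST_NAMES = ["Archer", "Bennett", "Clarke", "Dawson", "Evans", "Foster"]
STREETS = ["High Street", "Station Road", "Church Lane", "Park Avenue"]
TOWNS = ["Exampleton", "Testbury", "Sampleford"]

FIELDS = ["id", "name", "address", "email", "phone"]


def generate_customers(count, seed=42):
    """Return `count` synthetic customer records.

    The same seed always produces the same records, so the whole team can
    recreate an identical, fully populated test database on demand.
    """
    rng = random.Random(seed)  # private generator: output depends only on the seed
    customers = []
    for i in range(count):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append({
            "id": i + 1,
            "name": f"{first} {last}",
            "address": f"{rng.randint(1, 200)} {rng.choice(STREETS)}, {rng.choice(TOWNS)}",
            # example.com is reserved for documentation, so no real inbox receives mail
            "email": f"{first.lower()}.{last.lower()}{i + 1}@example.com",
            # Ofcom reserves 07700 900000 to 07700 900999 for fictional use
            "phone": f"07700 900{rng.randint(0, 999):03d}",
        })
    return customers


def write_csv(customers, stream):
    """Write the records as CSV so they can be bulk-loaded into a test database."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(customers)


if __name__ == "__main__":
    write_csv(generate_customers(10), sys.stdout)
```

Running the script twice with the same seed yields byte-identical output. Changing the seed or the count gives a different but equally reproducible dataset. A real project would likely swap the small lists for a larger corpus or a dedicated fake-data library, but the principle Hunt describes stays the same: the test environment is rebuilt from code, not copied from production.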