For DBAs: "Think of backup strategies as restore strategies"

July 24th, 2014
Posted in Tech

Andrew Pruski is an SQL Server DBA currently working for Ding in Dublin, Ireland. He’s previously worked for Occam-DM Ltd and the United Kingdom Hydrographic Office in England. He’s benefitted immensely from the online DBA community, and his blog is an attempt to give something in return.

Aside from SQL Server, he’s keen on running and playing rugby. He can also play the guitar (badly, but it doesn’t stop him).

How did you get involved in IT?

I always enjoyed working with computers when I was in school, so it seemed a natural area to study at university. I enjoyed studying database design, so once I completed my BSc in 2005, I started looking for a database developer position.

Whilst working as a junior developer, I realised that the code I was writing took a long time to execute (mainly because of all the cursors I was using). I did some research online and found multiple websites dedicated to SQL Server and opportunities to work as a DBA in particular. I read tons of online articles on websites such as SQLServerCentral.com and the more I read, the more interested I became in that area. It was then that I decided to pursue a career as a DBA.

What are the most common and the most challenging issues you've handled recently?

The most challenging task I have ever had to perform was given to me when I was a relatively inexperienced DBA. A production data warehouse had a different collation from the development and staging versions, and I was tasked with bringing the production database in line with the other environments.

Why not change the collation of the databases in the other environments? Well, as I said, I was inexperienced and looking to prove myself, so off to work I went. The collation of a database can be changed quite simply from the Properties menu. The problem is that the setting only applies to newly created objects; everything else in the database remains on the old collation.

So after what felt like an age… I came up with the following procedure to change the collation of all the existing objects within the database (a sketch of the trickiest step follows the list):

  1. Script out all database objects
  2. Drop all functions
  3. Drop all views
  4. Drop all foreign keys
  5. Drop all primary keys
  6. Drop all indexes
  7. Drop all statistics
  8. Change column collation
  9. Change database collation
  10. Recreate indexes
  11. Recreate primary keys
  12. Recreate foreign keys
  13. Recreate statistics
  14. Recreate views (and associated indexes)
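
Step 8 is the awkward one: SQL Server has no single command that re-collates existing columns, so each character column has to be altered individually. A minimal T-SQL sketch of how such statements can be generated (the target collation is only an example, and it assumes the dependent objects from steps 2 to 7 have already been dropped):

    -- Generate one ALTER COLUMN statement per character column that is
    -- not yet on the target collation (illustrative target collation).
    DECLARE @TargetCollation sysname = N'SQL_Latin1_General_CP1_CI_AS';

    SELECT 'ALTER TABLE ' + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
         + ' ALTER COLUMN ' + QUOTENAME(c.name) + ' ' + ty.name
         + CASE WHEN c.max_length = -1 THEN '(MAX)'
                WHEN ty.name IN ('nchar', 'nvarchar')
                     THEN '(' + CAST(c.max_length / 2 AS varchar(10)) + ')'
                ELSE '(' + CAST(c.max_length AS varchar(10)) + ')'
           END
         + ' COLLATE ' + @TargetCollation
         + CASE WHEN c.is_nullable = 1 THEN ' NULL' ELSE ' NOT NULL' END
         + ';' AS alter_stmt
    FROM sys.columns c
    JOIN sys.tables t  ON t.object_id = c.object_id
    JOIN sys.schemas s ON s.schema_id = t.schema_id
    JOIN sys.types ty  ON ty.user_type_id = c.user_type_id
    WHERE ty.name IN ('char', 'varchar', 'nchar', 'nvarchar')
      AND c.collation_name <> @TargetCollation;

    -- Step 9, by contrast, is a single statement (it only affects new objects):
    -- ALTER DATABASE [MyWarehouse] COLLATE SQL_Latin1_General_CP1_CI_AS;

Run the generated statements, then recreate the dropped objects from the scripts taken in step 1.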

I tested and tested, and tested, so when it came to deployment, it all ran smoothly. However, I’ll always think of this project as one of the most challenging I’ve worked on, not just because of the technical side but because of the human aspect as well. It taught me a lot about relationships within an SQL Server database, but it also taught me that part of a DBA’s job should be to question why something is being done and to provide alternatives where possible. I should have questioned why the production database’s collation needed to change in the first place. It would have been simpler (and a lot less risky) to change the development databases’ collation.

The most common issues I deal with are blocking and deadlocks occurring within production databases. Monitoring is in place that sends out alerts via email, so I can respond quickly to resolve any issues. An SQL Server Agent job polls active connections to check for blocking, and Service Broker with Event Notifications has been implemented to generate deadlock alerts (both are sketched below). This allows me to capture the offending query and see what action needs to be taken to prevent future incidents (more often than not, either rewriting the query or creating an appropriate index).
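
Neither mechanism is shown in the interview, but a rough sketch of the two pieces might look like this (all object names are illustrative; the event notification assumes Service Broker is enabled in the target database):

    -- 1) Blocking check an Agent job could poll: any request with a
    --    non-zero blocking_session_id is currently being blocked.
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,
           t.text AS blocked_sql
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    WHERE r.blocking_session_id <> 0;

    -- 2) Event Notification that routes the deadlock graph to a
    --    Service Broker queue for processing and alerting.
    CREATE QUEUE dbo.DeadlockQueue;

    CREATE SERVICE DeadlockService ON QUEUE dbo.DeadlockQueue
        ([http://schemas.microsoft.com/SQL/Notifications/PostEventNotification]);

    CREATE EVENT NOTIFICATION CaptureDeadlocks
        ON SERVER
        FOR DEADLOCK_GRAPH
        TO SERVICE 'DeadlockService', 'current database';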

However, preventative measures are always better than having to react to issues in real time. I work closely with developers to ensure these issues are kept to a minimum by implementing good database design and by writing efficient SQL queries.

Do you use any external tools for database monitoring? Which ones?

I have worked with Nagios, Zabbix, RedGate, Idera and Confio monitoring tools. All are good systems with their own positives and negatives but I strongly feel that any DBA should be able to implement their own monitoring system. I wrote a short blog post detailing my reasons why.

In your opinion, what server status variables should a DBA log and monitor?

I always create my own database on any SQL Server instance I administer. This allows me to set up monitoring for the following (a few of these checks are sketched after the list):

  • Blocking
  • Deadlocks
  • Auto-growth
  • Failed SQL Server Agent jobs
  • Backups
  • Percentage of transaction log used
  • Corruption alerts
  • Wait statistics
  • Index usage statistics
  • Index fragmentation
  • Error log auditing
  • Disk space
  • Memory usage
  • Page life expectancy vs. lazy writes
  • IO subsystem performance
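
A few of these checks boil down to one-liners against the DMVs; the queries below are generic sketches rather than the author’s actual monitoring code (the 30% fragmentation threshold is an illustrative assumption):

    -- Percentage of transaction log used, per database
    DBCC SQLPERF(LOGSPACE);

    -- Page life expectancy (seconds) from the buffer manager counters
    SELECT cntr_value AS page_life_expectancy_sec
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND object_name LIKE '%Buffer Manager%';

    -- Indexes in the current database above 30% fragmentation
    SELECT OBJECT_NAME(ips.object_id) AS table_name,
           i.name AS index_name,
           ips.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id
    WHERE ips.avg_fragmentation_in_percent > 30;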

What about disaster recovery and prevention? What is your approach?

Every disaster recovery plan should be tailored to the individual system. One of the key aspects of the DBA role is to work closely with system owners to define Recovery Point Objectives (how much data can be lost in the event of a failure) and Recovery Time Objectives (how long it should take to bring the system back online).

Once the RPO and RTO have been established, a high availability solution can be built and a disaster recovery strategy implemented. High availability covers SQL Server features such as mirroring, clustering and AlwaysOn Availability Groups. Disaster recovery strategies start with a backup schedule for the databases but can also include features such as log shipping or asynchronous mirroring to a DR site.
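
As an illustration (not taken from the interview), a typical schedule of weekly fulls, nightly differentials and 15-minute log backups reduces to three commands scheduled as Agent jobs; the database name and paths are assumptions:

    -- Weekly full backup
    BACKUP DATABASE [Sales] TO DISK = N'D:\Backups\Sales_full.bak'
        WITH INIT, CHECKSUM;

    -- Nightly differential (everything changed since the last full)
    BACKUP DATABASE [Sales] TO DISK = N'D:\Backups\Sales_diff.bak'
        WITH DIFFERENTIAL, INIT, CHECKSUM;

    -- Transaction log backup, run every 15 minutes
    BACKUP LOG [Sales] TO DISK = N'D:\Backups\Sales_log.trn'
        WITH CHECKSUM;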

I attended an SQLskills training course last year in which Paul Randal talked about how to approach backup strategies. His advice was to think of backup strategies as restore strategies: DBAs should consider how many restores would be needed to bring the database back online in the event of a system failure. This is one of the best pieces of advice I have been given as a DBA. It is all well and good to back up database transaction logs every 15 minutes, but how many of those backups will need to be restored in the event of a failure? I really do not want to be restoring hundreds of log backups at 3 o’clock in the morning (because, for some reason, that is generally when failures happen for me).
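
Put into practice with the schedule sketched above, a 3 a.m. recovery needs only the last full, the last differential, and the log backups taken since that differential, restored in order: a handful of files rather than hundreds. A hypothetical sequence (file names assumed):

    RESTORE DATABASE [Sales] FROM DISK = N'D:\Backups\Sales_full.bak'
        WITH NORECOVERY, REPLACE;

    RESTORE DATABASE [Sales] FROM DISK = N'D:\Backups\Sales_diff.bak'
        WITH NORECOVERY;

    RESTORE LOG [Sales] FROM DISK = N'D:\Backups\Sales_log_0245.trn'
        WITH NORECOVERY;

    RESTORE LOG [Sales] FROM DISK = N'D:\Backups\Sales_log_0300.trn'
        WITH RECOVERY;  -- final log: bring the database online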

Contact him at: 

Email: dbafromthecold@gmail.com
Twitter: @DBAFromTheCold
Blog: dbafromthecold.wordpress.com
LinkedIn: ie.linkedin.com/in/andrewpruski/



Facebook suffers global outage

June 19th, 2014
Posted in Monitoring, Industry News, Tech

facebook.com went down for about 15 minutes between 4:00 and 4:15 AM EST. Users trying to connect to the site saw the error message "Sorry, something went wrong". The issue was confirmed from multiple locations around the world, and as far as we can tell, all Facebook users were affected.


The Facebook Platform Status page showed a sharp increase in API response times during the outage.

Facebook Platform Status

The last information posted by Facebook is:

Sitewide issue resolved

Earlier this morning, we experienced an issue that prevented use of the API for a brief period of time. We resolved the issue quickly, and we are now back to 100%. We're sorry for any inconvenience this may have caused.

We are awaiting an official statement and more details from Facebook and we will update this post as soon as more information is available.


Websitepulse's 2014 Mother's Day Performance and Uptime Report [INFOGRAPHIC]

June 4th, 2014
Posted in Monitoring

[Infographic: 2014 Mother's Day Performance and Uptime Report]

According to a survey conducted by the National Retail Federation, the total spent for Mother’s Day this year reached $20 billion.

Considering the popularity of this day and the growing popularity of online shopping, we can’t stress enough how important it is for online stores to maintain flawless performance and availability during the weeks before Mother’s Day. Just as a poor selection of gifts or high prices can turn visitors away from your website, a slow website, a failed online purchase or outright downtime can be just as harmful, sending your potential buyers to competitors’ websites.

As usual, we monitored 12 of the most popular websites for purchasing gifts and flowers for moms. We were specifically observing their uptime and response time as the most important metrics for the reliability of a website.

Response times varied between 1.4 seconds (personalizationmall.com) and 18.4 seconds (giftbasketsplus.com), which suggests that some of the websites struggled to bear the heavy traffic at certain times. That, in turn, could have resulted in disappointed buyers and lost revenue.

At the same time, uptime varied between 100% (gifttree.com, sees.com, proflowers.com, fromyouflowers.com, beyondblossoms.com) and 99.7% (giftbasketsplus.com), which is still quite acceptable.

For more details, look at the infographic above or go to websitepulse.com.

Websitepulse 2014 Tax Performance & Uptime Report [INFOGRAPHIC]

April 30th, 2014
Posted in Monitoring

[Infographic: 2014 Tax Performance & Uptime Report]

Tax return season is over, and the U.S. Internal Revenue Service (IRS) reported that millions of dollars in fraudulent returns were filed this year.

Just days before the April 15 deadline, the IRS started 295 identity theft investigations pushing the total number of active fraud cases to more than 1,800, according to newsnet5.com.

"Identity theft is one of the fastest growing crimes nationwide, and refund fraud caused by identity theft is one of the biggest challenges facing the IRS," said IRS Commissioner John Koskinen.

Fraud cases increase right before the deadline simply because most people wait until the last minute to file their tax returns online.

Little can be done to avoid identity theft or a fraudulent tax return except staying alert and choosing a trustworthy tax return website. On the other hand, tax return websites should be prepared to protect their clients from fraudsters, but they should also ensure their websites are up and running 24/7.

Websites are likely to experience traffic-related outages right before the end of the tax return period, preventing customers from saving or submitting their tax information on time. Disappointed customers would then be highly unlikely to return to the same tax return website next year.

We couldn’t tell whether there were clients who could not register successfully, but we could tell how each of these websites performed during the monitoring period, as we were reporting their response time and availability on a daily basis. Olt.com, for example, had great performance and availability until the last day, April 15th, when it experienced a traffic overload that left the website unavailable for 7 hours of the monitoring period. Seven hours of downtime may not sound like much, but it was the highest among the websites we monitored. The rest performed quite well, with the majority demonstrating 100% uptime.

As for total response time, taxbrain.com averaged 22 seconds, which again is not much on its own, but compared to freetaxusa.com, with only 2 seconds of load time for the whole monitoring period, would be considered high.

For details, see the infographic above, or see the daily reports.

Test Your Server Against the Heartbleed OpenSSL Vulnerability

April 8th, 2014
Posted in Tools, Tech

A major vulnerability in OpenSSL was announced late yesterday, impacting all servers that have the Heartbeat TLS extension enabled and run an affected OpenSSL version (1.0.1 through 1.0.1f).

The "Heartbleed" vulnerability has been recorded as CVE-2014-0160. Further details can be found at http://heartbleed.com and https://www.openssl.org/news/secadv_20140407.txt.

The bug has already alarmed a lot of system administrators and site owners, and the first thing we did at WebSitePulse was to release a test against this vulnerability.

So, if you want to check whether your secure server is affected or not, please visit: http://www.websitepulse.com/heartbeat.php