Essential SQL Date Functions for Data Science Mastery
Chapter 1: Introduction to SQL Date Functions
In today's data-driven landscape, timestamps can be precise to the millisecond. However, this level of detail isn't always necessary. Often, you only require specific components of a date, such as the year, month, or day.
Understanding how to extract the relevant information from date-time variables is crucial. In this guide, I will explore five essential SQL functions, complete with examples, to equip you with the skills needed for effective date-time data manipulation and analysis. This concise article is designed for quick reading so that you can swiftly grasp these interview-winning SQL techniques. 🏆
Table of Contents:
- DATE and TIME Function
- EXTRACT()
- DATEDIFF()
- TIMESTAMPDIFF()
- MONTHNAME() and DAYNAME()
📍 Note: The examples use MySQL Workbench and an earthquake dataset sourced from Kaggle, which is freely available on my GitHub under the MIT license!
The dataset I'll be referencing is as follows: Earthquakes Dataset | Image by Author
It is a simple table of 2000 rows and 7 columns.
Let’s dive into the basics—extracting date and time from a date-time variable.
Section 1.1: DATE and TIME Function
The DATE() function isolates the date from a date-time value. As shown in the previous image, the 'Dates' column holds both date and time in the format YYYY-MM-DD HH:MM:SS. To extract just the date component (year, month, and day), DATE() is the simplest solution.
SELECT Dates, DATE(Dates) AS only_date
FROM sql_practice.earthquakes;
Similarly, if you need to extract only the time (hour, minute, second), you can use the TIME() function:
SELECT Dates, TIME(Dates) AS only_time
FROM sql_practice.earthquakes;
This process is straightforward, just as the function names imply.
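To see both functions on a value you can check by eye, here is a minimal sketch that uses a literal timestamp instead of the dataset. The literal is an illustrative assumption, not a row quoted from the earthquake table.

```sql
-- Split one literal date-time value into its date and time parts.
SELECT DATE('1965-01-02 13:44:18') AS only_date,  -- 1965-01-02
       TIME('1965-01-02 13:44:18') AS only_time;  -- 13:44:18
```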
Section 1.2: EXTRACT()
The EXTRACT() function enables you to pull specific parts from given date-time values. In many interviews, you may be asked to aggregate data by month or week, or to compute metrics for a particular month or quarter.
EXTRACT() offers the ultimate flexibility for retrieving various date components, following this simple syntax:
EXTRACT(part_of_date FROM date_time_column)
The following query demonstrates the most commonly used parts that MySQL supports:
SELECT Dates AS given_date,
EXTRACT(SECOND FROM Dates) AS seconds_value,
EXTRACT(MINUTE FROM Dates) AS minute_value,
EXTRACT(HOUR FROM Dates) AS hour_value,
EXTRACT(DAY FROM Dates) AS day_value,
EXTRACT(MONTH FROM Dates) AS month_value,
EXTRACT(YEAR FROM Dates) AS year_value,
EXTRACT(WEEK FROM Dates) AS week_value,
EXTRACT(QUARTER FROM Dates) AS quarter_value
FROM sql_practice.earthquakes
LIMIT 5;
This way, you can extract commonly used components from dates. MySQL also accommodates less common combinations, such as:
SELECT Dates AS given_date,
EXTRACT(YEAR_MONTH FROM Dates) AS year_month_value,
EXTRACT(DAY_HOUR FROM Dates) AS day_hour_value,
EXTRACT(HOUR_MINUTE FROM Dates) AS hour_minute_value,
EXTRACT(MINUTE_SECOND FROM Dates) AS minute_second_value
FROM sql_practice.earthquakes
LIMIT 5;
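Compound units are easier to read once you see what they return for a known value. The sketch below runs the same four units against a literal timestamp (an assumed value chosen for illustration); each result is the parts concatenated into a single integer.

```sql
-- Compound units concatenate their parts into one integer.
SELECT EXTRACT(YEAR_MONTH    FROM '1965-01-02 13:44:18') AS year_month_value,    -- 196501
       EXTRACT(DAY_HOUR      FROM '1965-01-02 13:44:18') AS day_hour_value,      -- 213 (day 2, hour 13)
       EXTRACT(HOUR_MINUTE   FROM '1965-01-02 13:44:18') AS hour_minute_value,   -- 1344
       EXTRACT(MINUTE_SECOND FROM '1965-01-02 13:44:18') AS minute_second_value; -- 4418
```

Because the result is a plain integer, YEAR_MONTH is convenient as a sortable grouping key when the data spans several years.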
To illustrate how the EXTRACT() function can be applied, consider this scenario: you want to determine the hour of the day when each earthquake occurred.
SELECT Dates,
EXTRACT(HOUR FROM Dates) AS hour_of_day
FROM sql_practice.earthquakes;
This function is particularly useful for data aggregation on weekly, monthly, or quarterly bases or for analyzing trends over time.
For instance, to calculate the weekly total of earthquakes during the first three months of 1965:
SELECT EXTRACT(WEEK FROM Dates) AS week_of_year,
COUNT(DISTINCT ID) AS number_of_earthquakes
FROM sql_practice.earthquakes
WHERE Type LIKE 'Earthquake'
AND EXTRACT(MONTH FROM Dates) < 4
AND EXTRACT(YEAR FROM Dates) = 1965
GROUP BY EXTRACT(WEEK FROM Dates);
The 'week_of_year' column illustrates the week number in a year, showing that the maximum number of earthquakes occurred in the fifth week. This example highlights how to extract different date components and use them in WHERE and GROUP BY clauses—common tasks in data science interviews. ✅
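One caveat when reading week numbers: MySQL supports several week-numbering conventions, so it is worth checking which one your analysis assumes. The sketch below uses the WEEK() function with an explicit mode argument on a literal date (chosen here for illustration) to show how the same day can land in different weeks.

```sql
-- January 2, 1965 was a Saturday.
SELECT WEEK('1965-01-02', 0) AS week_mode_0,  -- 0: weeks start on Sunday, range 0-53
       WEEK('1965-01-02', 3) AS week_mode_3;  -- 53: ISO 8601 weeks (Monday start), range 1-53
```

If a report needs ISO weeks, passing the mode explicitly avoids depending on the server's default setting.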
Next, we often need to analyze timestamps based on the difference between two dates.
Section 1.3: DATEDIFF()
The DATEDIFF() function compares two dates and returns the difference in days. The syntax requires two arguments:
DATEDIFF(first_date, second_date)
For example:
SELECT DATEDIFF('1965-02-28', '1965-01-01') AS DATEDIFF_output;
Because DATEDIFF() returns the first date minus the second, the result is positive when the first date is later than the second and negative otherwise. Only the date parts are compared; any time component is ignored.
🔸 As another example, to find out how many days before today (July 3, 2022) each earthquake occurred, you can use the NOW() function, which returns the current date and time:
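A quick way to check the sign convention is to swap the arguments. This sketch uses only literal dates; the third column is an added illustration of how DATEDIFF() handles values that carry a time part.

```sql
SELECT DATEDIFF('1965-02-28', '1965-01-01') AS later_first,        -- 58
       DATEDIFF('1965-01-01', '1965-02-28') AS earlier_first,      -- -58
       DATEDIFF('1965-01-02 00:00:01',
                '1965-01-01 23:59:59')      AS two_seconds_apart;  -- 1
```

The last column is 1 even though the two values are only two seconds apart, because they fall on different calendar dates.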
SELECT Dates,
DATEDIFF(NOW(), Dates) AS DATEDIFF_Output
FROM sql_practice.earthquakes;
DATEDIFF() only reports the difference in days, which can be too granular when you care about months or years. This is where TIMESTAMPDIFF() comes into play.
Section 1.4: TIMESTAMPDIFF()
The TIMESTAMPDIFF() function provides greater flexibility for calculating the difference between two dates in terms of days, weeks, months, quarters, or years. Its syntax is as follows:
TIMESTAMPDIFF(part_of_date, first_date, second_date)
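TIMESTAMPDIFF() returns the second date minus the first, so the result is positive when the first date is earlier than the second. The unit can be MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, or YEAR.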
You could rewrite the previous example using TIMESTAMPDIFF() as follows:
SELECT Dates,
TIMESTAMPDIFF(YEAR, Dates, NOW()) AS TIMESTAMPDIFF_Output
FROM sql_practice.earthquakes;
Now, the output reflects the difference in years.
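It helps to know that TIMESTAMPDIFF() counts only complete units. The sketch below illustrates this with literal values (assumed for illustration) and contrasts it with DATEDIFF(), which compares calendar dates only.

```sql
SELECT TIMESTAMPDIFF(YEAR, '1965-01-02', '2022-01-01') AS almost_57_years,   -- 56
       TIMESTAMPDIFF(YEAR, '1965-01-02', '2022-01-02') AS exactly_57_years,  -- 57
       TIMESTAMPDIFF(DAY, '1965-01-01 12:00:00',
                          '1965-01-02 11:59:59')       AS under_24_hours,    -- 0
       DATEDIFF('1965-01-02 11:59:59',
                '1965-01-01 12:00:00')                 AS calendar_days;     -- 1
```

So the two functions can disagree by one day when the values include a time of day, which matters for questions like "how many full days did delivery take?".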
🔸 For a practical example, let’s filter for earthquakes that occurred within the 10 days following January 2, 1965, and had a magnitude greater than 6:
SELECT ID,
Latitude,
Longitude,
Magnitude,
Dates
FROM sql_practice.earthquakes
WHERE Type LIKE 'Earthquake'
AND Magnitude > 6
AND Dates >= '1965-01-02'
AND TIMESTAMPDIFF(DAY, '1965-01-02', Dates) <= 10;
The results show earthquakes that occurred on January 5 and 10, which fit within the 10-day window. This functionality is useful in various real-life scenarios, such as analyzing package delivery times or subscription durations.
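On larger tables, wrapping a column in a function inside WHERE typically prevents MySQL from using an index on that column. A common alternative is a half-open date range. The sketch below compares both conditions on three made-up timestamps (hypothetical values, not rows from the dataset) to show that they agree at the boundary.

```sql
-- Compare the TIMESTAMPDIFF() filter with an equivalent range condition.
SELECT d.Dates,
       TIMESTAMPDIFF(DAY, '1965-01-02', d.Dates) <= 10      AS via_timestampdiff,
       d.Dates < (DATE '1965-01-02' + INTERVAL 11 DAY)      AS via_range
FROM (
    SELECT CAST('1965-01-05 18:05:58' AS DATETIME) AS Dates  -- inside the window
    UNION ALL
    SELECT CAST('1965-01-12 23:59:59' AS DATETIME)           -- last second inside
    UNION ALL
    SELECT CAST('1965-01-13 00:00:00' AS DATETIME)           -- first second outside
) AS d;
-- Both flag columns return 1, 1, 0 for the three rows.
```

The range uses 11 days rather than 10 because TIMESTAMPDIFF(DAY, ...) still returns 10 for any time up to the end of January 12.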
Another interesting use case for DATEDIFF() or TIMESTAMPDIFF() is the "Rising Temperature" question from LeetCode, where you need to select the IDs of all records whose temperature exceeded that of the previous day. Given an input table like this:
Input table name: Weather
This query can be written with either DATEDIFF() or TIMESTAMPDIFF():
-- Query using DATEDIFF()
SELECT today.id FROM Weather AS today
JOIN Weather AS yesterday
ON today.temperature > yesterday.temperature
AND DATEDIFF(today.recordDate, yesterday.recordDate) = 1;
-- Query using TIMESTAMPDIFF()
SELECT today.id FROM Weather AS today
JOIN Weather AS yesterday
ON today.temperature > yesterday.temperature
AND TIMESTAMPDIFF(DAY, yesterday.recordDate, today.recordDate) = 1;
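Note the argument order: DATEDIFF() takes the later date first, whereas TIMESTAMPDIFF() takes the unit, then the earlier date, then the later date. Swapping the dates flips the sign of the result, so the = 1 condition would pair each day with the following day instead of the previous one.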
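The same join can also be written with plain date arithmetic, which sidesteps the argument-order question entirely. This is a self-contained sketch that assumes MySQL 8.0 or later (for the WITH clause) and uses four made-up rows in place of the real Weather table.

```sql
-- Hypothetical sample rows standing in for the Weather table.
WITH Weather (id, recordDate, temperature) AS (
    SELECT 1, DATE '2015-01-01', 10 UNION ALL
    SELECT 2, DATE '2015-01-02', 25 UNION ALL
    SELECT 3, DATE '2015-01-03', 20 UNION ALL
    SELECT 4, DATE '2015-01-04', 30
)
SELECT today.id
FROM Weather AS today
JOIN Weather AS yesterday
  ON today.recordDate = yesterday.recordDate + INTERVAL 1 DAY  -- consecutive days
 AND today.temperature > yesterday.temperature                 -- warmer than the day before
ORDER BY today.id;
-- Returns ids 2 and 4 for the sample rows above.
```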
Section 1.5: MONTHNAME() and DAYNAME()
As their names suggest, MONTHNAME() retrieves the month name, while DAYNAME() returns the day name from a given date.
🔸 For example, to get today's day and month name:
SELECT NOW() AS given_date,
MONTHNAME(NOW()) AS month_name,
DAYNAME(NOW()) AS day_name;
This is particularly handy when aggregating date-time data by day of the week or by month.
For example, to analyze which day of the week saw the most earthquakes:
SELECT DAYNAME(Dates) AS day_name,
COUNT(DISTINCT ID) AS number_of_earthquakes
FROM sql_practice.earthquakes
WHERE Type LIKE 'Earthquake'
GROUP BY DAYNAME(Dates)
ORDER BY number_of_earthquakes DESC;
This query reveals that the highest number of earthquakes occurred on Thursday. You can use MONTHNAME() similarly for monthly insights.
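For the monthly version, one practical wrinkle is ordering: sorting by the month name gives alphabetical order, not calendar order. A minimal sketch, using four made-up dates in place of the dataset, groups by both MONTH() and MONTHNAME() so the output can be sorted by month number.

```sql
-- Hypothetical dates; count events per month in calendar order.
SELECT MONTHNAME(d.Dates) AS month_name,
       COUNT(*)           AS number_of_events
FROM (
    SELECT DATE '1965-03-14' AS Dates UNION ALL
    SELECT DATE '1965-01-02'          UNION ALL
    SELECT DATE '1965-01-24'          UNION ALL
    SELECT DATE '1965-02-04'
) AS d
GROUP BY MONTH(d.Dates), MONTHNAME(d.Dates)
ORDER BY MONTH(d.Dates);
-- With the default English locale: January 2, February 1, March 1.
```

The names returned by MONTHNAME() and DAYNAME() follow the server's lc_time_names setting, so they may not be in English on every installation.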
In conclusion, I hope this article has provided you with valuable insights into handling dates in SQL. The functions discussed cover a significant range of operations you will encounter with date-time values. With three years of experience in SQL, I can confidently say that these concepts frequently arise in data analyst and data scientist interviews, as well as being invaluable for real-world projects.
Thank you for reading!