Date Parsing

Different feed types and versions use wildly different date formats. Universal Feed Parser will attempt to auto-detect the date format used in any date element, and parse it into a standard Python 9-tuple, as documented in the Python time module.
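
As a rough illustration of working with such a 9-tuple, the sketch below uses only the Python standard library. It assumes the tuple is expressed in UTC, which is what the parsed values in the table of recognized formats suggest. The helper name `tuple_to_datetime` is hypothetical and is not part of Universal Feed Parser.

```python
import calendar
import time
from datetime import datetime, timezone


def tuple_to_datetime(parsed):
    """Convert a 9-tuple to an aware datetime, assuming the tuple is in UTC."""
    # calendar.timegm() interprets the tuple as UTC, unlike time.mktime(),
    # which would interpret it as local time.
    epoch = calendar.timegm(parsed)
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


# A parsed value copied from the RFC 822 rows of the table of recognized formats.
parsed = time.struct_time((2004, 1, 1, 19, 48, 21, 3, 1, 0))

print(parsed.tm_year, parsed.tm_wday)         # fields are accessible by name
print(tuple_to_datetime(parsed).isoformat())  # ISO 8601 string with a UTC offset
```

Going through `calendar.timegm()` rather than `time.mktime()` keeps the result independent of the local timezone of the machine running the code.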

The following elements are parsed as dates:

History of Date Formats

Here is a brief history of feed date formats:

  • CDF states that all date values must conform to ISO 8601:1988. ISO 8601:1988 is not a freely available specification, but a brief (non-normative) description of the date formats it defines is available in ISO 8601:1988 Date/Time Representations.
  • RSS 0.90 has no date elements.
  • Netscape RSS 0.91 does not specify a date format, but examples within the specification show RFC 822-style dates with 4-digit years.
  • Userland RSS 0.91 states, “All date-times in RSS conform to the Date and Time Specification of RFC 822.” RFC 822 mandates 2-digit years; it does not allow 4-digit years.
  • RSS 1.0 states that all date elements must conform to W3DTF, which is a profile of ISO 8601:1988.
  • RSS 2.0 states, “All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).”
  • Atom states that all date elements must conform to W3DTF.

Recognized Date Formats

Here is a representative list of the formats that Universal Feed Parser can recognize in any date element:

Description | Example | Parsed Value
valid RFC 822 (2-digit year) | Thu, 01 Jan 04 19:48:21 GMT | (2004, 1, 1, 19, 48, 21, 3, 1, 0)
valid RFC 822 (4-digit year) | Thu, 01 Jan 2004 19:48:21 GMT | (2004, 1, 1, 19, 48, 21, 3, 1, 0)
valid W3DTF (numeric timezone) | 2003-12-31T10:14:55-08:00 | (2003, 12, 31, 18, 14, 55, 2, 365, 0)
valid W3DTF (UTC timezone) | 2003-12-31T10:14:55Z | (2003, 12, 31, 10, 14, 55, 2, 365, 0)
valid W3DTF (yyyy) | 2003 | (2003, 1, 1, 0, 0, 0, 2, 1, 0)
valid W3DTF (yyyy-mm) | 2003-12 | (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid W3DTF (yyyy-mm-dd) | 2003-12-31 | (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yyyymmdd) | 20031231 | (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (-yy-mm) | -03-12 | (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (-yymm) | -0312 | (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (yy-mm-dd) | 03-12-31 | (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yymmdd) | 031231 | (2003, 12, 31, 0, 0, 0, 2, 365, 0)
valid ISO 8601 (yyyy-o) | 2003-335 | (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid ISO 8601 (yyo) | 03335 | (2003, 12, 1, 0, 0, 0, 0, 335, 0)
valid asctime | Sun Jan 4 16:29:06 PST 2004 | (2004, 1, 5, 0, 29, 6, 0, 5, 0)
bogus RFC 822 (invalid day/month) | Thu, 31 Jun 2004 19:48:21 GMT | (2004, 7, 1, 19, 48, 21, 3, 183, 0)
bogus RFC 822 (invalid month) | Mon, 26 January 2004 16:31:00 EST | (2004, 1, 26, 21, 31, 0, 0, 26, 0)
bogus RFC 822 (invalid timezone) | Mon, 26 Jan 2004 16:31:00 ET | (2004, 1, 26, 21, 31, 0, 0, 26, 0)
bogus W3DTF (invalid hour) | 2003-12-31T25:14:55Z | (2004, 1, 1, 1, 14, 55, 3, 1, 0)
bogus W3DTF (invalid minute) | 2003-12-31T10:61:55Z | (2003, 12, 31, 11, 1, 55, 2, 365, 0)
bogus W3DTF (invalid second) | 2003-12-31T10:14:61Z | (2003, 12, 31, 10, 15, 1, 2, 365, 0)
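
To make the parsed values above more concrete, here is a minimal sketch that reproduces three of the rows using only the Python standard library. It is not Universal Feed Parser's own implementation. The helpers `utc_tuple` and `roll_over` are hypothetical names introduced for illustration, and `roll_over` only handles out-of-range time fields, not invalid days or months.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def utc_tuple(dt):
    """Return a 9-tuple in UTC; naive datetimes are assumed to be UTC already."""
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc)
    return tuple(dt.utctimetuple())


def roll_over(year, month, day, hour, minute, second):
    """Carry out-of-range time fields (e.g. hour 25) into the next unit."""
    return datetime(year, month, day) + timedelta(
        hours=hour, minutes=minute, seconds=second
    )


# valid RFC 822 (4-digit year)
rfc822 = parsedate_to_datetime("Thu, 01 Jan 2004 19:48:21 GMT")

# valid W3DTF (numeric timezone): the -08:00 offset is converted to UTC
w3dtf = datetime.fromisoformat("2003-12-31T10:14:55-08:00")

# bogus W3DTF (invalid hour): 25:14:55 on Dec 31 becomes 01:14:55 on Jan 1
bogus = roll_over(2003, 12, 31, 25, 14, 55)

for value in (rfc822, w3dtf, bogus):
    print(utc_tuple(value))
```

The last row shows the general idea behind the "bogus" entries: rather than rejecting an out-of-range field, the excess is carried into the next larger unit.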

Universal Feed Parser recognizes all character-based timezone abbreviations defined in RFC 822. In addition, Universal Feed Parser recognizes the following invalid timezones:

  • AT is treated as AST
  • ET is treated as EST
  • CT is treated as CST
  • MT is treated as MST
  • PT is treated as PST
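
The alias handling listed above can be sketched as a simple lookup applied before parsing. This is an illustration under stated assumptions, not the library's actual code: `TZ_ALIASES`, `normalize_timezone`, and `to_utc_tuple` are hypothetical names, and the parsing itself is delegated to the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime

# Invalid abbreviations and the valid ones they are treated as (from the list above).
TZ_ALIASES = {"AT": "AST", "ET": "EST", "CT": "CST", "MT": "MST", "PT": "PST"}


def normalize_timezone(date_string):
    """Replace a trailing invalid timezone abbreviation with its valid form."""
    head, _, tz = date_string.strip().rpartition(" ")
    if not head:
        return date_string  # nothing that looks like a trailing timezone
    return f"{head} {TZ_ALIASES.get(tz.upper(), tz)}"


def to_utc_tuple(date_string):
    """Parse an RFC 822-style date and return a 9-tuple in UTC."""
    dt = parsedate_to_datetime(normalize_timezone(date_string))
    return tuple(dt.astimezone(timezone.utc).utctimetuple())


print(normalize_timezone("Mon, 26 Jan 2004 16:31:00 ET"))
print(to_utc_tuple("Mon, 26 Jan 2004 16:31:00 ET"))
```

Strings whose trailing token is not one of the five aliases pass through `normalize_timezone` unchanged.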