Address Quality Check
The address quality check uses a sequence of predefined test procedures to verify that an e-mail address is formally valid and actually exists. There are two versions: a fast version and a more precise, enhanced version. The interfaces of both versions differ only marginally, so we document them together and mention the differences where applicable.
The input address can include encoded Unicode characters.
To call the fast address quality check use this syntax:
|Parameter||An e-mail address as last part of the URL|
To call the enhanced address quality check use this one:
|Parameter||An e-mail address as last part of the URL|
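Because the address is passed as the last part of the URL, characters such as umlauts or "+" have to be percent-encoded before the call. The following is a minimal sketch of that step only. The base URL is a hypothetical placeholder, not the real endpoint, and `build_check_url` is an illustrative helper name.

```python
from urllib.parse import quote

# Hypothetical placeholder; substitute the endpoint of the fast or
# enhanced check that applies to your account.
BASE_URL = "https://api.example.invalid/address-check"


def build_check_url(address: str, base_url: str = BASE_URL) -> str:
    """Append the percent-encoded address as the last URL path segment."""
    # safe="@" keeps the "@" readable. Everything else outside the
    # unreserved set (including "/", "+" and non-ASCII characters) is
    # percent-encoded as UTF-8.
    return base_url.rstrip("/") + "/" + quote(address, safe="@")


print(build_check_url("müller@example.com"))
# https://api.example.invalid/address-check/m%C3%BCller@example.com
```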
The main difference between the two versions is the way they handle temporary errors. Temporary errors are used by mailservers to signal either Greylisting anti-spam measures or real temporary problems, e.g. a high server load.
The normal fast address quality check treats temporary errors as problems that make it impossible to verify the existence of an e-mail address. It will immediately return upon encountering such a problem.
The enhanced address quality check is more thorough and starts a background check for addresses with temporary errors. The background checks repeat the tests according to predefined schedules to see whether the temporary problem caused by Greylisting or other issues goes away. The results of these background checks can be queried by simply repeating the enhanced address quality check. See the documentation of the address field for details.
The result contains the outcomes of a sequence of tests that depend on each other. As soon as one of them fails (result = 0), the subsequent tests are not executed and also report a result of 0.
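The dependency between the tests can be pictured as a chain that stops at the first failure. The sketch below only illustrates that behaviour and says nothing about how the service is implemented; `run_chain` and the test names are made up for the example. It also ignores the exception for the domain probability test, whose negative result does not terminate the sequence.

```python
from typing import Callable, Dict, List, Tuple

# A named test: takes the address, returns a result code (0 = failed).
Test = Tuple[str, Callable[[str], int]]


def run_chain(address: str, tests: List[Test]) -> Dict[str, int]:
    """Run tests in order; after the first 0, report 0 for all later tests."""
    results: Dict[str, int] = {}
    failed = False
    for name, test in tests:
        if failed:
            results[name] = 0  # not executed, reported as 0
            continue
        results[name] = test(address)
        failed = results[name] == 0
    return results


# Stand-in tests with fixed outcomes, for illustration only.
checks = [
    ("first", lambda address: 1),
    ("second", lambda address: 0),  # fails
    ("third", lambda address: 1),   # never called
]
print(run_chain("user@example.com", checks))
# {'first': 1, 'second': 0, 'third': 0}
```

A client reading a real result can apply the same rule in reverse: the first field with a 0 is the test that actually failed, and the zeros after it carry no additional information.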
We will discuss the results in the order of execution of the respective tests:
Tests the syntax of the address against the e-mail addressing standards. Possible result values are:
- 0: invalid syntax
- 1: valid syntax
- 2: probably valid syntax, Unicode problems were solved, see decoded
The test is stricter than the standards because it requires a valid domain name in the address. Localhost addresses and other exotic cases will not be accepted, because it is unlikely that these are e-mail addresses valid for business.
If the syntax result is 0 (invalid) or 2 (probably valid), the structure also contains syntax warnings explaining the problems, see below.
If the syntax test ended with a result of 2, this field will contain the decoded ASCII address. A syntax test result of 2 means that the address contained Unicode characters (e.g., umlauts, Arabic or Chinese characters), which are invalid in an e-mail address. These characters were successfully converted and the resulting valid ASCII address was stored in decoded. Further tests should always use this decoded address.
The decoding during the syntax test is done in two stages:
- the local part of the address is checked for German umlauts. If found, they are converted to their usual ASCII counterparts (ü - ue, ä - ae, ö - oe, ß - ss)
- the domain part of the address is transformed to Punycode, according to the standard for international domains
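The two decoding stages above can be approximated in a few lines. This is a sketch under assumptions, not the service's implementation: `decode_address` is an illustrative name, only the four lowercase characters listed above are replaced, and the domain is converted with Python's built-in `idna` codec (IDNA 2003), which may treat some characters differently than the service does.

```python
# Replacements for the local part, as listed in the documentation.
UMLAUTS = {"ü": "ue", "ä": "ae", "ö": "oe", "ß": "ss"}


def decode_address(address: str) -> str:
    """Return an ASCII form of the address (umlauts replaced, domain in Punycode)."""
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("address has no '@'")
    # Stage 1: replace German umlauts in the local part.
    for char, ascii_form in UMLAUTS.items():
        local = local.replace(char, ascii_form)
    # Stage 2: convert each label of the domain to its Punycode form.
    domain = domain.encode("idna").decode("ascii")
    return f"{local}@{domain}"


print(decode_address("müller@bücher.de"))
# mueller@xn--bcher-kva.de
```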
Many e-mail providers have their own rules for valid e-mail addresses of their domains. These typically include the minimal length of an address, which punctuation characters are allowed etc. The extended syntax check verifies addresses against these rules. Possible results are:
- 0: invalid syntax for this domain
- 1: valid syntax for this domain
If the extended syntax check fails, the result structure will include syntax warnings explaining the problem, see below.
If one of the syntax checks fails, the result structure will include one or more syntaxWarnings elements.
Each element contains a message code, which can be used to identify the problem. See the page syntax warnings for codes and explanations.
The domain test checks the validity of the domain name in the e-mail address. The system checks this by looking at the DNS record for the domain name. Possible result values are:
- 0: domain name does not exist
- 1: domain name exists
If the domain name is invalid, the system assumes a typo and tries to find similar domain names, which can be offered to the user for selection. These domain names are returned in the domainScores element.
The domainScores element consists of a list of similar-sounding domain names ordered by a calculated score.
The higher the score, the higher the probability that this domain was intended. The system calculates the score by searching for similar domain names that are popular with e-mail marketing users.
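A client that wants to offer a single correction can pick the entry with the highest score. The layout of domainScores is not shown here, so the sketch below assumes it has already been parsed into a list of `(domain, score)` pairs; that shape and the name `best_suggestion` are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def best_suggestion(domain_scores: List[Tuple[str, float]]) -> Optional[str]:
    """Return the domain with the highest score, or None if there are no suggestions."""
    if not domain_scores:
        return None
    # Do not rely on the input order; select by score explicitly.
    domain, _score = max(domain_scores, key=lambda item: item[1])
    return domain


# Made-up scores, for illustration only.
print(best_suggestion([("example.org", 12.0), ("example.com", 87.5)]))
# example.com
```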
If the domain exists, the system also checks whether the domain has a mailserver defined, again by looking at the DNS record. Possible result values are:
- 0: no mailserver found
- 1: mailserver found
Not all mailservers tell the truth about the existence of an e-mail address, mostly due to anti-spam measures. The mailserverDiagnosis element describes how the domain's mailservers respond to SMTP requests:
- 0: unknown
- 1: server tells the truth
- 2: server always answers "address exists"
- 3: server always answers "address does not exist"
- 4: SMTP requests ended with errors (e.g., network errors, timeouts, server errors)
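In client code the numeric codes are easier to handle as named constants. The enumeration below mirrors the list above; the member names and the helper `server_is_known_truthful` are illustrative choices and are not defined by the API.

```python
from enum import IntEnum


class MailserverDiagnosis(IntEnum):
    """Values of mailserverDiagnosis; member names are illustrative."""
    UNKNOWN = 0
    TRUTHFUL = 1
    ALWAYS_EXISTS = 2
    ALWAYS_MISSING = 3
    SMTP_ERRORS = 4


def server_is_known_truthful(code: int) -> bool:
    """True only if the server is known to answer honestly."""
    return MailserverDiagnosis(code) is MailserverDiagnosis.TRUTHFUL


print(MailserverDiagnosis(2).name)
# ALWAYS_EXISTS
```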
The aggregated risk that an e-mail to this domain/address will be rejected (bounce):
- 0: there is a high bounce risk
- 1: there is a normal bounce risk
This test calculates the probability of the domain name. Domain name input in e-mail address forms often results in typos, and this check tries to find such errors. Possible result values are:
- 0: the domain name has a low probability
- 1: the domain name has a normal or high probability
If the test ends with a low probability result, then the system found more popular domains with similar names. These are returned in a domainScores element.
Unlike the previous tests, a negative result here will not cause a termination. Since the test relies on probabilities, the assessment of the results is up to the user.
The address is verified by SMTP requests to one of the domain's mailservers. All address quality check versions support the following results:
- 0: address does not exist
- 1: address does exist
- 2: address not verifiable
The enhanced address quality check provides one more result:
- -1: encountered temporary error, background check initiated
As mentioned above, the enhanced quality check tries to work around temporary errors and repeats the SMTP test periodically to see whether the error goes away. To query the result of the background checks, repeat the API call until a result other than -1 is returned for the address field.
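Repeating the call until the address field leaves -1 amounts to a simple polling loop. The sketch below assumes a caller-supplied function `check` that performs the enhanced API call and returns the parsed result as a dictionary with an `address` key. That function, the dictionary shape, the waiting interval and the attempt limit are all assumptions, since the schedule of the background checks is not specified here.

```python
import time
from typing import Callable, Dict


def wait_for_address_result(
    check: Callable[[str], Dict[str, int]],
    address: str,
    interval_seconds: float = 60.0,
    max_attempts: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeat the enhanced check until `address` is no longer -1.

    Returns the final address result, or -1 if the background check
    has not finished after `max_attempts` calls.
    """
    for attempt in range(max_attempts):
        result = check(address)
        if result["address"] != -1:
            return result["address"]
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before asking again
    return -1
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A production client would probably choose a longer interval, because Greylisting delays are typically measured in minutes rather than seconds.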
This field provides further information about the address check. Possible values are:
- 0: the result in address was taken from the SMTP cache
- 1: the result in address is the result of a real SMTP check
This section describes the execution of the address quality check in detail. The check consists of the following steps:
- Checking the mailbox (local part) of the address for Unicode characters. Unicode is not allowed in the local part, so this routine only checks for typical, language-specific typos. Currently this includes only German umlauts. If found, they are converted to their usual ASCII counterparts: ü - ue, ä - ae, ö - oe, ß - ss. If at least one of these conversions happens, the syntax warning synm018 is added to the result, and the decoded ASCII value of the mailbox is stored in decoded.
- Checking the domain name of the address for Unicode characters. If Unicode is found, the domain is probably an internationalized domain name (IDN), and the domain name is transformed according to the Punycode standard (RFC 3492). In this case the syntax warning synm017 is added to the result, and the decoded ASCII value of the domain name is stored in decoded.
- Selecting the address for the remaining steps. If decoded contains a value, this address is used for all subsequent tests; otherwise the original address is used.
- Checking the syntax according to the standards. As mentioned above, exotic cases, like localhost addresses or addresses with comments, are rejected. The check expects real addresses usable for e-mail transfer across domains. If the syntax check fails, the test assumes an input error and the result includes a list of similar-sounding, popular domain names.
- Checking the syntax against the rulebase of extended syntax criteria. These are provider-specific syntax rules that can change over time.
- Checking the domain name. The component looks for a DNS (A) record for the domain. By default the component tries 2 times, with a timeout of 2 seconds each, before giving up. In case of failure the mentioned list of similar domain names is returned.
- Looking for a mailserver. Using the DNS information from the previous step, the component looks for MX entries for the domain. By default the component tries 2 times, with a timeout of 2 seconds each, before giving up. In case of failure the mentioned list of similar domain names is returned.
- Calculating the bounce risk from historic data, which contains an aggregation of previous e-mail transfers for various domains. If more than 80% of them bounced, the bounce risk is considered high; in all other cases it is normal.
- Calculating the domain name probability by checking whether there are domains with similar names that have higher address counts. If there are such domains, an input error is assumed, and the list of similar domains is included in the result.
- Checking the SMTP cache for the domain's status. The SMTP cache tracks all SMTP test results and tries to find out whether an SMTP test for an address is appropriate. Some mailservers are set up to always answer positively or negatively when asked about a mail address. Others might have technical errors or simply be too slow. In all these cases an SMTP check would not be useful, so the address check is skipped. The SMTP cache also contains lists of exceptions for domains and mailservers that provide fixed answers.
- Checking the SMTP cache for an already existing Greylisting result. This step only occurs in the enhanced quality check. Results of Greylisting background checks are stored temporarily because it could be computationally expensive to repeat the test. By default these results are stored for 24 hours. If one of the stored addresses is checked again during the storage time, the SMTP results from the cache are used. Although these results are taken from the cache, the checked flag is set to 1 (really checked), because the cache content is the result of a recent SMTP check.
- Checking the address by contacting the mailserver. If the SMTP cache does not object, an SMTP conversation with one of the mailservers for the domain is initiated.
- Diagnosing the response behaviour of the domain's mailservers. If the address check was skipped due to the SMTP cache, the value from the cache is returned; otherwise the result of the SMTP check is used.
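The bounce-risk step in the list above is a plain threshold rule and can be written down directly. The function below is a sketch of that rule only. How the historic data is aggregated is not described here, so the two counters are assumed inputs, and treating a domain without history as normal risk is an interpretation of "in all other cases".

```python
def bounce_risk(bounced: int, total: int) -> int:
    """Return 0 (high bounce risk) or 1 (normal bounce risk).

    `bounced` and `total` are assumed counts of bounced and attempted
    e-mail transfers for a domain.
    """
    if total <= 0:
        return 1  # assumption: no history falls under "all other cases"
    # High risk only if strictly more than 80% of the transfers bounced.
    return 0 if bounced / total > 0.80 else 1


print(bounce_risk(81, 100), bounce_risk(80, 100))
# 0 1
```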