Version: 1.3

utils.social

A namespace that contains various utilities to help you extract social handles from text, URLs and HTML documents.

Example usage:

const Apify = require('apify');

const emails = Apify.utils.social.emailsFromText('alice@example.com bob@example.com');

social.LINKEDIN_REGEX

Regular expression to exactly match a single LinkedIn profile URL. It has the following form: /^...$/i and matches URLs such as:

https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing

The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:

https://www.linkedin.com/in/linus-torvalds/latest-activity

Example usage:

if (Apify.utils.social.LINKEDIN_REGEX.test('https://www.linkedin.com/in/alan-turing')) {
    console.log('Match!');
}
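The `^` and `$` anchors are what make this an exact match: the whole string has to be a profile URL, so any extra path segment makes `test()` return false. The sketch below illustrates this with a hypothetical, simplified pattern named `SIMPLE_LINKEDIN_REGEX`; it is an assumption for illustration only and is not the expression the library actually ships.

```javascript
// Hypothetical, simplified stand-in for LINKEDIN_REGEX (an assumption, not the real pattern).
// ^ and $ anchor the match to the whole string; the i flag makes it case-insensitive.
const SIMPLE_LINKEDIN_REGEX = /^(?:https?:\/\/)?(?:[a-z]{2,3}\.)?linkedin\.com\/in\/[a-z0-9_-]+\/?$/i;

const candidates = [
    'https://www.linkedin.com/in/alan-turing',
    'en.linkedin.com/in/alan-turing',
    'linkedin.com/in/alan-turing',
    'https://www.linkedin.com/in/linus-torvalds/latest-activity', // extra path segment
];

// Without the g flag, test() keeps no state between calls, so the regex is safe to reuse.
const results = candidates.map((url) => SIMPLE_LINKEDIN_REGEX.test(url));
console.log(results); // [ true, true, true, false ]
```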

social.LINKEDIN_REGEX_GLOBAL

Regular expression to find multiple LinkedIn profile URLs in a text or HTML. It has the following form: /.../ig and matches URLs such as:

https://www.linkedin.com/in/alan-turing
en.linkedin.com/in/alan-turing
linkedin.com/in/alan-turing

If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:

https://www.linkedin.com/in/linus-torvalds/latest-activity

the expression extracts just the following base URL:

https://www.linkedin.com/in/linus-torvalds

Example usage:

const matches = text.match(Apify.utils.social.LINKEDIN_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} LinkedIn profiles found!`);
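Extracting only the base URL follows from the pattern being unanchored: the match ends where the profile slug ends, and the trailing path is simply not part of it. A minimal sketch, again using a hypothetical simplified pattern (`SIMPLE_LINKEDIN_REGEX_GLOBAL`) rather than the library's real one:

```javascript
// Hypothetical, simplified stand-in for LINKEDIN_REGEX_GLOBAL (an assumption, not the real pattern).
// No ^/$ anchors, so matches can start anywhere; g finds all matches, i ignores case.
const SIMPLE_LINKEDIN_REGEX_GLOBAL = /(?:https?:\/\/)?(?:[a-z]{2,3}\.)?linkedin\.com\/in\/[a-z0-9_-]+/gi;

const text = `
  Alan: https://www.linkedin.com/in/alan-turing
  Linus: https://www.linkedin.com/in/linus-torvalds/latest-activity
`;

// The slug character class excludes "/", so each match stops before "/latest-activity".
const matches = text.match(SIMPLE_LINKEDIN_REGEX_GLOBAL) || [];
// matches: 'https://www.linkedin.com/in/alan-turing', 'https://www.linkedin.com/in/linus-torvalds'
console.log(matches);
```

Note that `String.prototype.match()` returns `null` rather than an empty array when nothing matches, which is why the example above checks `if (matches)`.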

social.INSTAGRAM_REGEX

Regular expression to exactly match a single Instagram profile URL. It has the following form: /^...$/i and matches URLs such as:

https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague

The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:

https://www.instagram.com/cristiano/followers

Example usage:

if (Apify.utils.social.INSTAGRAM_REGEX.test('https://www.instagram.com/old_prague')) {
    console.log('Match!');
}

social.INSTAGRAM_REGEX_GLOBAL

Regular expression to find multiple Instagram profile URLs in a text or HTML. It has the following form: /.../ig and matches URLs such as:

https://www.instagram.com/old_prague
www.instagram.com/old_prague/
instagr.am/old_prague

If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:

https://www.instagram.com/cristiano/followers

the expression extracts just the following base URL:

https://www.instagram.com/cristiano

Example usage:

const matches = text.match(Apify.utils.social.INSTAGRAM_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Instagram profiles found!`);

social.TWITTER_REGEX

Regular expression to exactly match a single Twitter profile URL. It has the following form: /^...$/i and matches URLs such as:

https://www.twitter.com/apify
twitter.com/apify

The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:

https://www.twitter.com/realdonaldtrump/following

Example usage:

if (Apify.utils.social.TWITTER_REGEX.test('https://www.twitter.com/apify')) {
    console.log('Match!');
}

social.TWITTER_REGEX_GLOBAL

Regular expression to find multiple Twitter profile URLs in a text or HTML. It has the following form: /.../ig and matches URLs such as:

https://www.twitter.com/apify
twitter.com/apify

If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:

https://www.twitter.com/realdonaldtrump/following

the expression extracts only the following base URL:

https://www.twitter.com/realdonaldtrump

Example usage:

const matches = text.match(Apify.utils.social.TWITTER_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Twitter profiles found!`);

social.FACEBOOK_REGEX

Regular expression to exactly match a single Facebook profile URL. It has the following form: /^...$/i and matches URLs such as:

https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech
https://www.facebook.com/profile.php?id=123456789

The regular expression does NOT match URLs with additional subdirectories or query parameters, such as:

https://www.facebook.com/apifytech/photos

Example usage:

if (Apify.utils.social.FACEBOOK_REGEX.test('https://www.facebook.com/apifytech')) {
    console.log('Match!');
}
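The profile.php?id=... form is the one case where a query string belongs to a valid profile URL. One way such a pattern could be built is an alternation between the numeric-ID form and a plain vanity name. The sketch below uses a hypothetical, simplified `SIMPLE_FACEBOOK_REGEX` and is not the library's actual expression.

```javascript
// Hypothetical, simplified stand-in for FACEBOOK_REGEX (an assumption, not the real pattern).
// After the domain, accept either profile.php?id=<digits> or a vanity name, then end of string.
const SIMPLE_FACEBOOK_REGEX =
    /^(?:https?:\/\/)?(?:www\.)?(?:facebook\.com|fb\.com)\/(?:profile\.php\?id=\d+|[a-z0-9.]+)\/?$/i;

const urls = [
    'https://www.facebook.com/apifytech',
    'facebook.com/apifytech',
    'fb.com/apifytech',
    'https://www.facebook.com/profile.php?id=123456789',
    'https://www.facebook.com/apifytech/photos', // extra path segment, rejected by $
];

const results = urls.map((url) => SIMPLE_FACEBOOK_REGEX.test(url));
console.log(results); // [ true, true, true, true, false ]
```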

social.FACEBOOK_REGEX_GLOBAL

Regular expression to find multiple Facebook profile URLs in a text or HTML. It has the following form: /.../ig and matches URLs such as:

https://www.facebook.com/apifytech
facebook.com/apifytech
fb.com/apifytech

If the profile URL contains subdirectories or query parameters, the regular expression extracts just the base part of the profile URL. For example, from text such as:

https://www.facebook.com/apifytech/photos

the expression extracts only the following base URL:

https://www.facebook.com/apifytech

Example usage:

const matches = text.match(Apify.utils.social.FACEBOOK_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Facebook profiles found!`);

social.YOUTUBE_REGEX

Regular expression to exactly match a single YouTube video URL. It has the following form: /^...$/i and matches URLs such as:

https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE

Example usage:

if (Apify.utils.social.YOUTUBE_REGEX.test('https://www.youtube.com/watch?v=kM7YfhfkiEE')) {
    console.log('Match!');
}

social.YOUTUBE_REGEX_GLOBAL

Regular expression to find multiple YouTube video URLs in a text or HTML. It has the following form: /.../ig and matches URLs such as:

https://www.youtube.com/watch?v=kM7YfhfkiEE
https://youtu.be/kM7YfhfkiEE

Example usage:

const matches = text.match(Apify.utils.social.YOUTUBE_REGEX_GLOBAL);
if (matches) console.log(`${matches.length} Youtube videos found!`);
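If you need the video IDs rather than whole URLs, a pattern with a capture group can pull them out of both URL forms. This is a sketch under assumptions: `SIMPLE_YOUTUBE_REGEX_GLOBAL` is hypothetical, it assumes the usual 11-character video ID, and the library's `YOUTUBE_REGEX_GLOBAL` may not expose such a group.

```javascript
// Hypothetical, simplified pattern (an assumption, not the library's YOUTUBE_REGEX_GLOBAL).
// Group 1 captures an 11-character video ID from either the watch?v= or the youtu.be form.
const SIMPLE_YOUTUBE_REGEX_GLOBAL =
    /(?:https?:\/\/)?(?:www\.)?(?:youtube\.com\/watch\?v=|youtu\.be\/)([a-zA-Z0-9_-]{11})/gi;

const text = 'Watch https://www.youtube.com/watch?v=kM7YfhfkiEE or the short link https://youtu.be/kM7YfhfkiEE';

// matchAll() (Node.js 12+) yields full match objects, so capture groups are available.
const videoIds = [...text.matchAll(SIMPLE_YOUTUBE_REGEX_GLOBAL)].map((m) => m[1]);
console.log(videoIds); // [ 'kM7YfhfkiEE', 'kM7YfhfkiEE' ]
```

`matchAll()` is used here because `String.prototype.match()` with a global regular expression returns only the full matched strings and discards capture groups.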

social.EMAIL_REGEX

Regular expression to exactly match a single email address. It has the following form: /^...$/i.


social.EMAIL_REGEX_GLOBAL

Regular expression to find multiple email addresses in a text. It has the following form: /.../ig.
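The two email expressions follow the same anchored-versus-global split as the profile expressions: the anchored one validates a whole string, the global one finds addresses inside longer text. The sketch uses hypothetical, simplified patterns (not the library's) and also shows a common pitfall with global regular expressions: `test()` updates `lastIndex`, so reusing the same global object for validation gives alternating results.

```javascript
// Hypothetical, simplified email patterns (assumptions); the library's real ones are more thorough.
const SIMPLE_EMAIL_REGEX = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i;     // whole string must be an email
const SIMPLE_EMAIL_REGEX_GLOBAL = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi; // emails anywhere in text

// Validation: use the anchored, non-global pattern.
const valid = SIMPLE_EMAIL_REGEX.test('alice@example.com');            // true
const invalid = SIMPLE_EMAIL_REGEX.test('Contact: alice@example.com'); // false

// Extraction: use the global pattern with String.prototype.match().
const found = 'Write to alice@example.com or bob@example.com.'.match(SIMPLE_EMAIL_REGEX_GLOBAL);
console.log(found); // [ 'alice@example.com', 'bob@example.com' ]

// Pitfall: test() on a global regex advances lastIndex, so a second call on the
// same input starts searching where the previous match ended and fails.
const first = SIMPLE_EMAIL_REGEX_GLOBAL.test('bob@example.com');  // true (lastIndex becomes 15)
const second = SIMPLE_EMAIL_REGEX_GLOBAL.test('bob@example.com'); // false (lastIndex resets to 0)
```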


social.emailsFromText(text)

The function extracts email addresses from plain text. Note that the function preserves the order of emails and keeps duplicates.

Parameters:

  • text: string - Text to search in.

Returns:

Array<string> - Array of email addresses found. If no emails are found, the function returns an empty array.
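Because duplicates are kept, callers that want unique addresses need to filter the result themselves. A minimal sketch, assuming a hypothetical `emailsFromTextSketch()` stand-in built on a simplified pattern (the real function may validate differently); the same `Set` step applies to any `Array<string>` returned by `Apify.utils.social.emailsFromText()`.

```javascript
// Hypothetical stand-in for social.emailsFromText(); the simplified pattern is an assumption.
function emailsFromTextSketch(text) {
    const EMAIL_REGEX_GLOBAL = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
    // match() returns null when nothing is found; normalize that to an empty array.
    return text.match(EMAIL_REGEX_GLOBAL) || [];
}

const emails = emailsFromTextSketch('bob@example.com, alice@example.com, bob@example.com');
// emails keeps order and duplicates: bob, alice, bob

// A Set drops duplicates while keeping first-occurrence order.
const unique = [...new Set(emails)];
console.log(unique); // [ 'bob@example.com', 'alice@example.com' ]

console.log(emailsFromTextSketch('no addresses here')); // []
```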


social.emailsFromUrls(urls)

The function extracts email addresses from a list of URLs. It looks for all mailto: URLs and returns the valid email addresses found in them. Note that the function preserves the order of emails and keeps duplicates.

Parameters:

  • urls: Array<string> - Array of URLs.

Returns:

Array<string> - Array of email addresses found. If no emails are found, the function returns an empty array.
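The mailto: handling can be pictured roughly as follows. This is a hypothetical sketch (`emailsFromUrlsSketch()` is not part of the library): it keeps only `mailto:` links, drops any `?subject=...` query, splits comma-separated recipients, and validates each one with a simplified pattern. Percent-decoding is omitted for brevity.

```javascript
// Hypothetical stand-in for social.emailsFromUrls(); an assumption, not the library's code.
function emailsFromUrlsSketch(urls) {
    const EMAIL_REGEX = /^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i; // simplified validation
    const emails = [];
    for (const url of urls) {
        // Keep only mailto: links (scheme is case-insensitive) and drop any ?query part.
        const match = /^mailto:([^?]*)/i.exec(url.trim());
        if (!match) continue;
        // A mailto: link may list several recipients separated by commas.
        for (const part of match[1].split(',')) {
            const candidate = part.trim();
            if (EMAIL_REGEX.test(candidate)) emails.push(candidate);
        }
    }
    return emails; // order preserved, duplicates kept
}

const extracted = emailsFromUrlsSketch([
    'https://example.com/contact',          // not a mailto: link, ignored
    'mailto:alice@example.com',
    'MAILTO:bob@example.com?subject=Hello', // query string dropped
    'mailto:alice@example.com',             // duplicate kept
    'mailto:not-an-email',                  // fails validation
]);
console.log(extracted); // alice@example.com, bob@example.com, alice@example.com
```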


social.phonesFromText(text)

The function attempts to extract phone numbers from text. Note that the results might not be accurate, since phone numbers appear in a wide variety of formats and conventions. If you encounter any problems, please file an issue.

Parameters:

  • text: string - Text to search for phone numbers.

Returns:

Array<string> - Array of phone numbers found. If no phone numbers are found, the function returns an empty array.


social.phonesFromUrls(urls)

Finds phone number links in an array of URLs and extracts the phone numbers from them. Phone number links can take the form tel://123456789, tel:/123456789 or tel:123456789.

Parameters:

  • urls: Array<string> - Array of URLs.

Returns:

Array<string> - Array of phone numbers found. If no phone numbers are found, the function returns an empty array.
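A rough sketch of the tel: link handling, assuming a hypothetical `phonesFromUrlsSketch()` that accepts the three forms listed above and normalizes each number by keeping only digits and `+`. The normalization rule is an assumption; the library may keep or format numbers differently.

```javascript
// Hypothetical stand-in for social.phonesFromUrls(); an assumption, not the library's code.
function phonesFromUrlsSketch(urls) {
    const phones = [];
    for (const url of urls) {
        // Match tel:, tel:/ and tel:// by allowing zero to two slashes after the scheme.
        const match = /^tel:\/{0,2}(.+)$/i.exec(url.trim());
        if (!match) continue;
        let raw = match[1];
        try {
            raw = decodeURIComponent(raw); // e.g. %20 becomes a space
        } catch (err) {
            // Malformed percent-encoding: keep the raw value.
        }
        // Assumed normalization: keep digits and '+', drop spaces, dashes, dots and brackets.
        const phone = raw.replace(/[^\d+]/g, '');
        if (/^\+?\d+$/.test(phone)) phones.push(phone);
    }
    return phones;
}

const phoneNumbers = phonesFromUrlsSketch([
    'tel://123456789',
    'tel:/123456789',
    'tel:+1%20(234)%20556-789',
    'mailto:alice@example.com', // not a tel: link, ignored
]);
console.log(phoneNumbers); // 123456789, 123456789, +1234556789
```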


social.parseHandlesFromHtml(html, [data])

The function attempts to extract emails, phone numbers and social profile URLs (specifically LinkedIn, Twitter, Instagram and Facebook profile URLs) from an HTML document. The function removes duplicates from the resulting arrays and sorts the items alphabetically.

Note that the phones field contains phone numbers extracted from the special phone links such as [call us](tel:+1234556789) (see social.phonesFromUrls()) and potentially other sources with high certainty, while phonesUncertain contains phone numbers extracted from the plain text, which might be very inaccurate.

Example usage:

const Apify = require('apify');

// Apify.main() runs the async function and exits the process when it finishes.
Apify.main(async () => {
    const browser = await Apify.launchPuppeteer();
    const page = await browser.newPage();
    await page.goto('http://www.example.com');
    const html = await page.content();
    await browser.close();

    const result = Apify.utils.social.parseHandlesFromHtml(html);
    console.log('Social handles:');
    console.dir(result);
});

Parameters:

  • html: string - HTML text
  • [data]: * | null - Optional object that will receive the text and $ properties, which contain the text content of the HTML and the cheerio object, respectively. This is an optimization so that the caller doesn't need to parse the HTML document again, if needed.

Returns:

SocialHandles - An object with the social handles.
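The deduplication and alphabetical sorting described for this function amount to a small post-processing step per array. The sketch below is hypothetical (`normalizeHandles()` is not a library function) and uses JavaScript's default `sort()`, which orders by UTF-16 code units, so uppercase letters sort before lowercase ones; field names other than `phones` and `phonesUncertain` are assumptions for illustration.

```javascript
// Hypothetical post-processing step (not a library function): for every array field,
// drop exact duplicates and sort the remaining items.
// Field names other than phones/phonesUncertain are illustrative assumptions.
function normalizeHandles(raw) {
    const result = {};
    for (const [key, values] of Object.entries(raw)) {
        // Set removes exact duplicates; default sort() compares UTF-16 code units.
        result[key] = [...new Set(values)].sort();
    }
    return result;
}

const handles = normalizeHandles({
    emails: ['bob@example.com', 'alice@example.com', 'bob@example.com'],
    phones: ['+1234556789', '+1234556789'],
    phonesUncertain: ['555-0100', '123 456 789'],
    twitters: ['https://twitter.com/apify'],
});
console.dir(handles);
```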