Data Extractors
DataExtractor
Bases: ABC
Base class for all data extractors
Source code in extract_emails/data_extractors/data_extractor.py
name
abstractmethod
property
Name of the data extractor, e.g. email, linkedin
get_data(page_source)
abstractmethod
Extract needed data from a string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
page_source | str | webpage content | required |
Returns:
Type | Description |
---|---|
set[str] | Set of data, e.g. {'email@email.com', 'email2@email.com'} |
Source code in extract_emails/data_extractors/data_extractor.py
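To illustrate the contract described above, here is a minimal sketch of a custom extractor. It re-declares a local stand-in for `DataExtractor` so that the snippet runs on its own; the stand-in only mirrors the documented interface (an abstract `name` property and an abstract `get_data` method) and is not the library's source. `HashtagExtractor` and its pattern are hypothetical and exist only for illustration.

```python
import re
from abc import ABC, abstractmethod


class DataExtractor(ABC):
    """Local stand-in mirroring the documented interface (an assumption)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Name of the data extractor, e.g. email, linkedin."""

    @abstractmethod
    def get_data(self, page_source: str) -> set[str]:
        """Extract needed data from a string."""


class HashtagExtractor(DataExtractor):
    """Hypothetical extractor; not part of the library."""

    # Illustrative pattern only: a '#' followed by word characters.
    _pattern = re.compile(r"#\w+")

    @property
    def name(self) -> str:
        return "hashtag"

    def get_data(self, page_source: str) -> set[str]:
        # A set removes duplicates, matching the documented return type.
        return set(self._pattern.findall(page_source))


extractor = HashtagExtractor()
tags = extractor.get_data("<p>#python and #regex, again #python</p>")
print(extractor.name, sorted(tags))
```

Because both members are abstract, instantiating the base class directly raises `TypeError`; a subclass has to provide `name` and `get_data` before it can be used.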
EmailExtractor
Bases: DataExtractor
Source code in extract_emails/data_extractors/email_extractor.py
get_data(page_source)
Extract emails from a string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
page_source | str | webpage content | required |
Returns:
Type | Description |
---|---|
set[str] | Set of emails, e.g. {'email@email.com', 'email2@email.com'} |
Source code in extract_emails/data_extractors/email_extractor.py
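As a rough sketch of what extracting emails from a page source can look like, the snippet below collects email-like strings into a set. The function name `extract_emails_sketch` and the regular expression are assumptions made for illustration; the pattern `EmailExtractor` actually uses may differ.

```python
import re

# Assumed, simplified pattern; the library's actual expression may differ.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails_sketch(page_source: str) -> set[str]:
    """Return the unique email-like strings found in page_source."""
    return set(EMAIL_RE.findall(page_source))


html = (
    '<a href="mailto:email@email.com">email@email.com</a> '
    "<span>email2@email.com</span>"
)
emails = extract_emails_sketch(html)
print(sorted(emails))
```

Returning a set means an address that appears several times on a page, for example in both the `href` and the link text, is reported once.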
LinkedinExtractor
Bases: DataExtractor
Source code in extract_emails/data_extractors/linkedin_extractor.py
get_data(page_source)
Extract links to LinkedIn profiles
Parameters:
Name | Type | Description | Default |
---|---|---|---|
page_source | str | webpage content | required |
Returns:
Type | Description |
---|---|
set[str] | Set of urls, e.g. {'https://www.linkedin.com/in/venjamin-brant-73381ujy3u'} |
Source code in extract_emails/data_extractors/linkedin_extractor.py
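The same idea can be sketched for profile links. The function name `extract_linkedin_sketch` and the pattern below are assumptions for illustration only: the pattern matches `/in/` profile URLs and ignores other LinkedIn paths, which may not be exactly how `LinkedinExtractor` behaves.

```python
import re

# Assumed pattern for profile URLs; the library's actual expression may differ.
# The group is non-capturing so findall() returns the whole URL.
LINKEDIN_RE = re.compile(r"https?://(?:www\.)?linkedin\.com/in/[A-Za-z0-9_-]+")


def extract_linkedin_sketch(page_source: str) -> set[str]:
    """Return the unique LinkedIn profile URLs found in page_source."""
    return set(LINKEDIN_RE.findall(page_source))


html = (
    '<a href="https://www.linkedin.com/in/venjamin-brant-73381ujy3u">Profile</a>'
    '<a href="https://www.linkedin.com/company/example">Company</a>'
)
urls = extract_linkedin_sketch(html)
print(urls)
```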