java.lang.Object
page.codeberg.friedolyn.client.parser.Parser
A class that provides methods to parse HTML documents from the University of Jena's Friedolin system.
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private static String FRIEDOLIN_CSS: The CSS style file that defines the design of the Friedolin website.
- static final @NonNull Pattern GET_GRADES_TABLE_NODE_ID_PATTERN: A regular expression pattern to extract the "nodeID" parameter, which is used in a Friedolin URL to identify the student's course.
- static final @NonNull LinkedHashMap<String, String> GET_GRADES_TABLE_PARAMETERS: The static parameters that are required to access the grades table of a student.
-
Constructor Summary
Constructors -
Method Summary
Methods (Modifier and Type, Method, Description):
- static @NonNull String enrichHtmlWithCss(@NonNull String html): Adds the FRIEDOLIN_CSS to the given HTML document.
- private static @NonNull page.codeberg.friedolyn.client.parser.PersonalInformation extractBasicPersonalInformationFromGradesTable(@NonNull org.jsoup.nodes.Document document): Extracts the student's personal information from the given HTML document, which is expected to be the grades table.
- private static @NonNull Optional<String[]> extractDegreeAndCourse(@NonNull org.jsoup.nodes.Element table): Extracts the degree and course from the given HTML table, if possible.
- private static @NonNull DegreeAndCourseHeading extractDegreeAndCourseHeading(@NonNull org.jsoup.nodes.Document document): Searches the given document for the table that contains the heading for the first section of the grades table.
- static @NonNull URL extractGradesTableURL(@NonNull String html, @NonNull String academicDegree): Extracts the URL to the grades table from the given HTML document, which is expected to be the response from the Friedolin server after accepting the obligation of use.
- static @NonNull String extractStudentClearname(@NonNull String html): Extracts the student's clear name from the given HTML document, which is expected to be a response from the Friedolin server.
- static @NonNull jakarta.mail.internet.InternetAddress extractStudentEmailAddress(@NonNull String html): Extracts the student's e-mail address from the given HTML document, which is expected to be a response from the Friedolin server.
- static @NonNull String extractUrlSecretParameter(@NonNull String html): Extracts the secret URL parameter required for certain Friedolin requests from the given HTML document.
- private static boolean isAccountHeading(@NonNull org.jsoup.nodes.Element table): Checks whether the given HTML table is a row that contains the heading of an account in the grades table.
- private static boolean isExamTable(@NonNull org.jsoup.nodes.Element table): Checks whether the given HTML table is an exam table in the grades table.
- static boolean isFailedLogin(@NonNull String html): Checks if there's an error message indicating that the student's user name or password for Friedolin is incorrect.
- private static @NonNull FuzzyBoolean isModuleTable(@NonNull org.jsoup.nodes.Element table): Checks whether the given HTML table might be a module table in the grades table.
- private static boolean isSubaccountHeading(@NonNull String account, @NonNull org.jsoup.nodes.Element table): Checks whether the given HTML table is a row that contains the heading of a sub-account in the grades table.
- private static boolean isTableHeading(@NonNull org.jsoup.nodes.Element table): Checks whether the given HTML table is the heading of the grades table, i.e. the table that contains the titles for the columns of all following tables.
- static @NonNull String normaliseDegree(@NonNull String degree): Converts the given university degree from the notation used at the top of the grades table in the personal information table to the notation used in the course list.
- private static @NonNull AccountHeading parseAccountHeading(@NonNull org.jsoup.nodes.Element table): Parses the given HTML table and extracts the info about the account heading.
- private static int parseEctsInAccountHeading(@NonNull org.jsoup.nodes.Element table): Extracts the ECTS present in the given account heading.
- private static int parseEctsInAccountOrSubAccountHeading(@NonNull AccountHeadingKind accountHeadingKind, @NonNull org.jsoup.nodes.Element table): Extracts the ECTS present in the given account or sub-account heading.
- private static int parseEctsInSubAccountHeading(@NonNull org.jsoup.nodes.Element table): Extracts the ECTS present in the given sub-account heading.
- static @NonNull String parseExamNumber(@NonNull String examString): Extracts the exam number from the given string, as found in the grades table.
- private static @NonNull Exam parseExamTable(@NonNull org.jsoup.nodes.Element table): Extracts the information from the given exam table in the grades table.
- static @NonNull GradesTable parseGradesTable(@NonNull String html): Extracts all available information from the grades table of the student: the student's name, birth date, home town, degree, Matrikelnummer, postal address, phone number, e-mail address, and grades for all available courses.
- private static @NonNull Module parseModuleTable(@NonNull org.jsoup.nodes.Element table): Extracts the module information from the given HTML module table included in the grades table.
- private static Exam.ExamStatus parsePassedStatus(@NonNull String passedString): Classifies the given exam-status string as an Exam.ExamStatus value.
- private static @NonNull SubaccountHeading parseSubAccountHeading(@NonNull org.jsoup.nodes.Element table): Extracts the information from the given sub-account heading in the grades table.
-
Field Details
-
FRIEDOLIN_CSS
The CSS style file that defines the design of the Friedolin website. This is required to make the downloaded HTML documents look like they would in a browser. The CSS file is loaded from the resources of this project. -
GET_GRADES_TABLE_PARAMETERS
The static parameters that are required to access the grades table of a student. In addition to these parameters, two dynamic parameters are required: "nodeID" and "asi". The "nodeID" is a unique identifier for the student's course and is extracted from the HTML of the Friedolin server's response. The "asi" is a secret parameter that is also extracted from the HTML of the Friedolin server's response. -
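How the static and dynamic parameters could be combined into one query string can be sketched as follows. This is only an illustration under stated assumptions: the class name, the method name and the sample static parameter are hypothetical, and the real contents of GET_GRADES_TABLE_PARAMETERS are not shown in this documentation. Only the parameter names "nodeID" and "asi" come from the description above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the documented Parser class.
final class GradesTableQuerySketch {

    /**
     * Appends the two dynamic parameters ("nodeID" and "asi") to a copy of the
     * static parameters and joins everything into a URL query string.
     */
    static String buildQuery(Map<String, String> staticParameters, String nodeId, String asi) {
        // LinkedHashMap keeps the insertion order, so the static parameters come first.
        Map<String, String> all = new LinkedHashMap<>(staticParameters);
        all.put("nodeID", nodeId);
        all.put("asi", asi);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : all.entrySet()) {
            query.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }
}
```

A LinkedHashMap is used in the sketch for the same reason the field is declared as one: the parameter order stays stable.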
GET_GRADES_TABLE_NODE_ID_PATTERN
A regular expression pattern to extract the "nodeID" parameter, which is used in a Friedolin URL to identify the student's course.
-
-
Constructor Details
-
Parser
public Parser()
-
-
Method Details
-
extractUrlSecretParameter
public static @NonNull String extractUrlSecretParameter(@NonNull String html) throws IllegalArgumentException
Extracts the secret URL parameter required for certain Friedolin requests from the given HTML document.
- Parameters:
html
- The HTML document to extract the secret from. Should be a response from the Friedolin server. The user must have been logged in when the response was generated; otherwise the secret parameter is not present anywhere in the document.
- Returns:
- The secret URL parameter extracted from the given HTML.
- Throws:
IllegalArgumentException
- If the secret parameter could not be extracted from the given HTML.
-
extractStudentEmailAddress
public static @NonNull jakarta.mail.internet.InternetAddress extractStudentEmailAddress(@NonNull String html) throws IllegalArgumentException
Extracts the student's e-mail address from the given HTML document, which is expected to be a response from the Friedolin server. The user must have been logged in when the response was generated; otherwise the e-mail address is not present anywhere in the document.
- Parameters:
html
- The HTML document to extract the e-mail address from.
- Returns:
- The student's e-mail address as indicated in the University of Jena's Friedolin system.
- Throws:
IllegalArgumentException
- If the student's e-mail address could not be determined from the given HTML.
- Implementation Note:
As Friedolin explains in the screenshot, the primary e-mail address is the one that is labelled with “RZ” which stands for “Rechenzentrum” and means “data center”.
-
extractStudentClearname
public static @NonNull String extractStudentClearname(@NonNull String html) throws IllegalArgumentException
Extracts the student's clear name from the given HTML document, which is expected to be a response from the Friedolin server. The user must have been logged in when the response was generated; otherwise the clear name is not present anywhere in the document.
- Parameters:
html
- The HTML document to extract the clear name from.
- Returns:
- The student's clear name as indicated in the University of Jena's Friedolin system.
- Throws:
IllegalArgumentException
- If the student's clear name could not be determined from the given HTML.
-
extractGradesTableURL
public static @NonNull URL extractGradesTableURL(@NonNull String html, @NonNull String academicDegree) throws IllegalArgumentException
Extracts the URL to the grades table from the given HTML document, which is expected to be the response from the Friedolin server after accepting the obligation of use.
- Parameters:
html
- The HTML document to extract the URL from.
academicDegree
- The academic degree of the student, which is used to identify the correct grades table among several when the student is enrolled in multiple courses.
- Returns:
- The URL to the grades table belonging to the student's course.
- Throws:
IllegalArgumentException
- If the URL to the grades table could not be determined from the given HTML.
-
parseGradesTable
public static @NonNull GradesTable parseGradesTable(@NonNull String html) throws IllegalArgumentException
Extracts all available information from the grades table of the student:
- the student's name
- the student's birth date
- the student's home town
- the student's degree
- the student's Matrikelnummer
- the student's postal address
- the student's phone number
- the student's e-mail address
- the student's grades for all courses available
- Parameters:
html
- The grades table. Should be the result from FriedolinClient.fetchGrades(String...).
- Throws:
IllegalArgumentException
- If any of the required information could not be extracted from the grades table.
-
extractBasicPersonalInformationFromGradesTable
private static @NonNull page.codeberg.friedolyn.client.parser.PersonalInformation extractBasicPersonalInformationFromGradesTable(@NonNull org.jsoup.nodes.Document document) throws IllegalArgumentException
Extracts the student's personal information from the given HTML document, which is expected to be the grades table.
- Parameters:
document
- The HTML document to extract the personal information from. Should be the result of FriedolinClient.fetchGrades(String...).
- Returns:
- The student's personal information, as indicated in the University of Jena's Friedolin system.
- Throws:
IllegalArgumentException
- If the given document does not contain the student's personal information in the expected format.
- Implementation Note:
Example of the personal information table ("Stammdaten des Studierenden"):
- Name des Studierenden: Edward Snowden
- Geburtsdatum und -ort: 21.06.1983 in Elizabeth City
- (angestrebter) Abschluss: [82] Bachelor of Science
- Matrikelnummer: 788365
- Anschrift: 9800 Savage Road Suite 6272, Fort Meade, MD 20755-6000
- Telefon: 3016886524
- EMail: gedward.snowden@uni-jena.de
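In the sample in the implementation note above, the birth date and birth place share a single cell ("21.06.1983 in Elizabeth City"). A minimal sketch of splitting such a cell is shown below. The class and method names are hypothetical, and the actual Parser may work differently; the only thing taken from the documentation is the sample format.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of the documented Parser class.
final class BirthInfoSketch {

    // German day.month.year notation, as in the sample "21.06.1983".
    private static final DateTimeFormatter GERMAN_DATE = DateTimeFormatter.ofPattern("dd.MM.yyyy");

    /** Splits a cell such as "21.06.1983 in Elizabeth City" into date text and place. */
    private static String[] split(String cell) {
        String[] parts = cell.trim().split(" in ", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Unexpected birth cell: " + cell);
        }
        return parts;
    }

    static LocalDate birthDate(String cell) {
        try {
            return LocalDate.parse(split(cell)[0].trim(), GERMAN_DATE);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Unexpected birth date in: " + cell, e);
        }
    }

    static String birthPlace(String cell) {
        return split(cell)[1].trim();
    }
}
```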
-
normaliseDegree
Converts the given university degree from the notation used at the top of the grades table in the personal information table to the notation used in the course list. For example, it converts “[82] Bachelor of Science” to “Abschluss 82 Bachelor of Science”.
- Parameters:
degree
- The degree in the notation used in the personal information table.- Returns:
- The degree in the notation used in the course list.
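The conversion in the example above can be sketched with a regular expression. This is an assumption about one possible approach, not the actual implementation; the class name is hypothetical, and throwing an exception for unrecognised input is a choice made for this sketch only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented Parser class.
final class DegreeNormaliserSketch {

    // Matches "[82] Bachelor of Science": a bracketed number followed by the degree title.
    private static final Pattern BRACKETED_DEGREE = Pattern.compile("^\\[(\\d+)\\]\\s*(.+)$");

    static String normaliseDegree(String degree) {
        Matcher matcher = BRACKETED_DEGREE.matcher(degree.trim());
        if (!matcher.matches()) {
            // Assumption of this sketch: unrecognised input is rejected.
            throw new IllegalArgumentException("Unexpected degree notation: " + degree);
        }
        return "Abschluss " + matcher.group(1) + " " + matcher.group(2);
    }
}
```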
-
extractDegreeAndCourse
private static @NonNull Optional<String[]> extractDegreeAndCourse(@NonNull org.jsoup.nodes.Element table)
Extracts the degree and course from the given HTML table, if possible. The HTML element is expected to be the heading for the next part of the grades table, which contains the grades for a specific degree and course.
- Parameters:
table
- The HTML table to extract the degree and course from.
- Returns:
- An optional array whose first element is the degree and whose second element is the course. If the degree and course could not be extracted, an empty optional is returned.
-
extractDegreeAndCourseHeading
private static @NonNull DegreeAndCourseHeading extractDegreeAndCourseHeading(@NonNull org.jsoup.nodes.Document document) throws IllegalArgumentException
Searches the given document for the table that contains the heading for the first section of the grades table. The heading is expected to include the degree and course of the student.
- Parameters:
document
- The document to search in. Should be the grades table returned by the Friedolin server.
- Returns:
- The degree and course extracted from the heading, along with the table that contains the heading.
- Throws:
IllegalArgumentException
- If the degree-and-course heading could not be extracted from the document.
- Implementation Note:
- See extractDegreeAndCourse(Element) for implementation details.
-
isTableHeading
private static boolean isTableHeading(@NonNull org.jsoup.nodes.Element table)
Checks whether the given HTML table is the heading of the grades table, i.e. the table that contains the titles for the columns of all following tables (comparable to the legend of a map).
- Parameters:
table
- The HTML table to check.
- Returns:
true if the given table is the heading of the grades table, false otherwise.
-
isAccountHeading
private static boolean isAccountHeading(@NonNull org.jsoup.nodes.Element table)
Checks whether the given HTML table is a row that contains the heading of an account in the grades table.
- Parameters:
table
- The HTML table from the grades table to check.
- Returns:
true if the given table is an account heading in the grades table, false otherwise.
- Implementation Note:
Accounts are distinguished from sub-accounts by the background colour of the cells in the row: For accounts, the background colour is darker (#b2cde6) than for sub-accounts (#dbe2e7).
-
isSubaccountHeading
private static boolean isSubaccountHeading(@NonNull String account, @NonNull org.jsoup.nodes.Element table)
Checks whether the given HTML table is a row that contains the heading of a sub-account in the grades table.
- Parameters:
account
- The account number of the account that the sub-account belongs to.
table
- The HTML table (from the grades table) suspected to be a sub-account heading.
- Returns:
true if the given table is a sub-account heading in the grades table, false otherwise.
- Implementation Note:
Sub-accounts are distinguished from accounts by the background colour of the cells in the row: For sub-accounts, the background colour is lighter (#dbe2e7) than for accounts (#b2cde6).
-
parseAccountHeading
private static @NonNull AccountHeading parseAccountHeading(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Parses the given HTML table and extracts the info about the account heading.
- Parameters:
table
- The HTML table to parse. Should be an account heading in the grades table.
- Returns:
- The account heading information.
- Throws:
IllegalArgumentException
- If the given table is not an account heading in the grades table.
-
parseEctsInAccountOrSubAccountHeading
private static int parseEctsInAccountOrSubAccountHeading(@NonNull AccountHeadingKind accountHeadingKind, @NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the ECTS present in the given account or sub-account heading.
- Parameters:
accountHeadingKind
- Whether the given table is an account or a sub-account heading.
table
- The HTML table (account or sub-account heading) to extract the ECTS from.
- Returns:
- The ECTS number from the given account or sub-account heading.
- Throws:
IllegalArgumentException
- If no ECTS were found in the given account or sub-account heading.
- Implementation Note:
The ECTS for accounts are in the fifth cell of the row, while the ECTS for sub-accounts are in the sixth cell of the row.
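The cell positions in the note above can be illustrated with a small sketch that works on the already-extracted cell texts of a heading row instead of a jsoup element. The enum, its constants and the method are hypothetical stand-ins; the constants of the real AccountHeadingKind are not listed in this documentation.

```java
import java.util.List;

// Hypothetical helper, not part of the documented Parser class.
final class EctsCellSketch {

    // Stand-in for AccountHeadingKind; the constant names are assumptions.
    enum HeadingKind { ACCOUNT, SUB_ACCOUNT }

    /** Reads the ECTS from the fifth cell (accounts) or the sixth cell (sub-accounts). */
    static int parseEcts(HeadingKind kind, List<String> cellTexts) {
        // Zero-based index: fifth cell is index 4, sixth cell is index 5.
        int index = (kind == HeadingKind.ACCOUNT) ? 4 : 5;
        if (cellTexts.size() <= index) {
            throw new IllegalArgumentException("Row has too few cells for kind " + kind);
        }
        try {
            return Integer.parseInt(cellTexts.get(index).trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("No ECTS found in cell " + (index + 1), e);
        }
    }
}
```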
-
parseEctsInAccountHeading
private static int parseEctsInAccountHeading(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the ECTS present in the given account heading.
- Parameters:
table
- The HTML table (account heading) to extract the ECTS from.
- Returns:
- The ECTS number from the given account heading.
- Throws:
IllegalArgumentException
- If no ECTS were found in the given account heading.
- Implementation Note:
The ECTS for accounts are in the fifth cell of the row.
-
parseEctsInSubAccountHeading
private static int parseEctsInSubAccountHeading(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the ECTS present in the given sub-account heading.
- Parameters:
table
- The HTML table (sub-account heading) to extract the ECTS from.
- Returns:
- The ECTS number from the given sub-account heading.
- Throws:
IllegalArgumentException
- If no ECTS were found in the given sub-account heading.
- Implementation Note:
The ECTS for sub-accounts are in the sixth cell of the row.
-
parseSubAccountHeading
private static @NonNull SubaccountHeading parseSubAccountHeading(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the information from the given sub-account heading in the grades table.
- Parameters:
table
- The HTML table to extract the sub-account heading from.
- Returns:
- The sub-account heading information.
- Throws:
IllegalArgumentException
- If the given table is not a sub-account heading in the grades table.
- Implementation Note:
Sub-accounts are distinguished from accounts by the background colour of the cells in the row: For sub-accounts, the background colour is lighter (#dbe2e7) than for accounts (#b2cde6).
-
isModuleTable
private static @NonNull FuzzyBoolean isModuleTable(@NonNull org.jsoup.nodes.Element table)
Checks whether the given HTML table might be a module table in the grades table.
- Parameters:
table
- The HTML table to check.
- Returns:
- FuzzyBoolean.FALSE if the given table is definitely not a module table
- FuzzyBoolean.MAYBE if the given table might be a module table
- FuzzyBoolean.TRUE: never, because module tables can't be recognised with certainty
-
parseModuleTable
private static @NonNull Module parseModuleTable(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the module information from the given HTML module table included in the grades table.
- Parameters:
table
- The HTML table to extract the module information from. Must be from the grades table.
- Returns:
The module information, consisting of:
- the module code
- the module title
- the semester
- the grade
- the passed status
- the ECTS
- Throws:
IllegalArgumentException
- If the given table is not a module table in the grades table.
-
parsePassedStatus
private static @NonNull Exam.ExamStatus parsePassedStatus(@NonNull String passedString) throws IllegalArgumentException
Classifies the given exam-status string as an Exam.ExamStatus value.
- Parameters:
passedString
- The status string: whether the exam has been passed (bestanden), failed (nicht bestanden) or is pending (angemeldet).
- Returns:
- The Exam.ExamStatus value represented by the given string.
- Throws:
IllegalArgumentException
- If the given string is neither bestanden, nicht bestanden nor angemeldet.
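The mapping from the three German status strings to a status value can be sketched as below. The enum and its constant names are hypothetical stand-ins, since the constants of Exam.ExamStatus are not listed here; trimming the input is likewise an assumption of this sketch.

```java
// Hypothetical helper, not part of the documented Parser class.
final class ExamStatusSketch {

    // Stand-in for Exam.ExamStatus; the constant names are assumptions.
    enum Status { PASSED, FAILED, PENDING }

    static Status parse(String passedString) {
        switch (passedString.trim()) {
            case "bestanden":
                return Status.PASSED;
            case "nicht bestanden":
                return Status.FAILED;
            case "angemeldet":
                return Status.PENDING;
            default:
                throw new IllegalArgumentException("Unknown exam status: " + passedString);
        }
    }
}
```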
-
isExamTable
private static boolean isExamTable(@NonNull org.jsoup.nodes.Element table)
Checks whether the given HTML table is an exam table in the grades table.
- Parameters:
table
- The HTML table to check.
- Returns:
true if the given table is an exam table in the grades table, false otherwise.
-
parseExamTable
private static @NonNull Exam parseExamTable(@NonNull org.jsoup.nodes.Element table) throws IllegalArgumentException
Extracts the information from the given exam table in the grades table.
- Parameters:
table
- The HTML table to extract the exam information from. Must be from the grades table.
- Returns:
- The embedded exam information.
- Throws:
IllegalArgumentException
- If the given table is not an exam table in the grades table.
- Implementation Note:
P-Nr.: 65441 Fortgeschrittenes Programmierpraktikum: Projekt Wintersemester 22/23 1,0 bestanden 1 Wolfram Amme 22.02.2023
-
parseExamNumber
public static @NonNull String parseExamNumber(@NonNull String examString) throws IllegalArgumentException
Extracts the exam number from the given string, as found in the grades table.
- Parameters:
examString
- The exam-number string to parse. Must be in the format P-Nr.: 123456.
- Returns:
- The exam number extracted from the given string (e.g. 123456).
- Throws:
IllegalArgumentException
- If the exam number could not be extracted from the given string.
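A minimal sketch of extracting the number from the documented format "P-Nr.: 123456" with a regular expression follows. The class name and the exact pattern are assumptions; the real method may be stricter or more lenient about whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented Parser class.
final class ExamNumberSketch {

    // "P-Nr.:" followed by optional whitespace and the digits of the exam number.
    private static final Pattern EXAM_NUMBER = Pattern.compile("^P-Nr\\.:\\s*(\\d+)$");

    static String parseExamNumber(String examString) {
        Matcher matcher = EXAM_NUMBER.matcher(examString.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not an exam-number string: " + examString);
        }
        return matcher.group(1);
    }
}
```

The number is returned as a String rather than an int, matching the documented return type.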
-
enrichHtmlWithCss
public static @NonNull String enrichHtmlWithCss(@NonNull String html) throws IllegalArgumentException
Adds the FRIEDOLIN_CSS to the given HTML document. The CSS is added by replacing the link element whose href attribute includes "/qisserver/pub/QISDesign_FSU.css" with a style element containing the CSS.
- Parameters:
html
- The HTML document to enrich with the CSS.
- Returns:
- The enriched HTML document.
- Throws:
IllegalArgumentException
- If the CSS link could not be found in the HTML document.
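The replacement described above can be sketched on the raw HTML string with a regular expression. This is a simplified stand-in under stated assumptions: the class name is hypothetical, the CSS is passed in as a parameter instead of being read from FRIEDOLIN_CSS, and the actual method may well operate on a parsed document instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the documented Parser class.
final class CssInlinerSketch {

    // A <link> tag whose href attribute contains the Friedolin stylesheet path.
    private static final Pattern CSS_LINK = Pattern.compile(
            "<link\\b[^>]*href\\s*=\\s*[\"'][^\"']*/qisserver/pub/QISDesign_FSU\\.css[^\"']*[\"'][^>]*>",
            Pattern.CASE_INSENSITIVE);

    static String inlineCss(String html, String css) {
        Matcher matcher = CSS_LINK.matcher(html);
        if (!matcher.find()) {
            throw new IllegalArgumentException("CSS link not found in the HTML document");
        }
        // Plain concatenation avoids the special meaning of '$' and '\' in
        // Matcher replacement strings, which CSS content may contain.
        return html.substring(0, matcher.start())
                + "<style>" + css + "</style>"
                + html.substring(matcher.end());
    }
}
```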
-
isFailedLogin
Checks if there's an error message indicating that the student's user name or password for Friedolin is incorrect.
- Parameters:
html
- The Friedolin HTML response to check.
- Returns:
true if the error message is present (login failed); false if no error message is present (this is not a guarantee that the login was successful).
-