Grubisic_Zitko_Faculty of Science_ Enhancing ACNL Tutor_2022_v4


Scene 1 (0s)

[Audio] Hello everyone, I am Branko. My colleague and I will present our accomplishments in this reporting period. So, let's start.

Scene 2 (17s)

[Audio] The main objective is to provide an enhanced version of the AC&NL Tutor that combines research results and the application of concept map mining, hypermedia- and quantity-enriched domain knowledge, knowledge assessment, predictive models in educational data mining, and a learning analytics dashboard. We will develop an effective and efficient intelligent tutoring system that can be used by training populations of the Department of the Navy.

Scene 3 (48s)

[Audio] Our technical approach includes: knowledge extraction and domain knowledge enriched with hypermedia, negation and cardinality; summarization in concept map mining algorithms and courseware adaptivity; learner model initialization and predictive learner modelling; knowledge assessment; a learning analytics dashboard; and empirical assessment.

Scene 4 (1m 17s)

[Audio] In this reporting period, we have been intensively working on accomplishing the following: an extracted semantic hypergraph; domain knowledge enriched with hypermedia; a gold standard corpus for semantic hypergraph annotation; a systematic review of Bayesian Knowledge Tracing learner probabilistic models; a usability evaluation of the AC&NL Tutor; and an evaluation of the learning analytics dashboard for students (LAD-S) prototype.

Scene 5 (1m 51s)

[Audio] We define our knowledge extraction as a two-step process, where each step is described as a pipeline whose pipes are interpreted as dependent subprocesses. First, NLP analysis is done over the text and produces syntactic and semantic annotations. Most pipes in the NLP step come from third parties, but for our needs we have built two more. The second step is semantic hypergraph extraction, where synthesis of the annotated text is done. All the pipes in this step are our accomplishments. Our previous knowledge representation was layered, where each layer was built upon the previous one. At the top we had the language layer, which integrates standard NLP annotations of the source sentence into a knowledge graph. At the bottom we had the domain layer, which is represented as a concept map. As we can see, both are graph-based representations. We plan to replace all representations with a unique one.
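The two-step process above can be sketched as a chain of pipes, where each pipe consumes the previous pipe's output. The function and field names below are illustrative stand-ins, not the project's actual modules:

```python
# Minimal sketch of the two-step extraction pipeline.
# Pipe names and data shapes are illustrative, not the project's real ones.

def nlp_analysis(text):
    """Step 1: produce syntactic and semantic annotations (stubbed)."""
    tokens = text.rstrip(".").split()
    return {"text": text, "tokens": tokens, "pos": ["X"] * len(tokens)}

def she_extraction(annotations):
    """Step 2: synthesize annotations into a semantic hypergraph (stubbed)."""
    return {"atoms": annotations["tokens"], "edges": []}

def extract(text, pipes=(nlp_analysis, she_extraction)):
    """Run each dependent subprocess (pipe) in order."""
    result = text
    for pipe in pipes:
        result = pipe(result)
    return result

sh = extract("Patrick knew about IBM's plans.")
print(sh["atoms"])  # tokens become atom candidates
```

The point of the pipeline shape is that each pipe depends only on its predecessor's output, so third-party and in-house pipes can be mixed freely.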

Scene 6 (3m 0s)

[Audio] Here we show a toy sentence and the products of knowledge extraction, which I will gradually describe in the following slides. At the top is our toy sentence, in the center is the annotated sentence, and at the bottom is the semantic hypergraph of that sentence.

Scene 7 (3m 27s)

[Audio] There are many pipes in NLP, and I will illustrate them on an example sentence and show some visualizations we have used. The tagger is definitely a basic pipe, performing part-of-speech tagging. The lemmatizer simply gives the basic form, or lemma, of each word, while the named entity recognizer classifies spans (sequences of words inside a sentence) as named entities or not. In this example, Patrick is classified as a person and IBM as an organization.
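As a rough illustration, the token-level annotations for the toy sentence might be stored like this. The values are hand-written for the example, not produced by an actual tagger, lemmatizer, or NER model:

```python
# Hand-annotated tokens for the toy sentence (values written by hand for
# illustration; real tagger/lemmatizer/NER pipes would fill them in).
tokens = [
    {"text": "Patrick", "pos": "PROPN", "lemma": "Patrick", "ent": "PERSON"},
    {"text": "knew",    "pos": "VERB",  "lemma": "know",    "ent": None},
    {"text": "about",   "pos": "ADP",   "lemma": "about",   "ent": None},
    {"text": "IBM",     "pos": "PROPN", "lemma": "IBM",     "ent": "ORG"},
    {"text": "'s",      "pos": "PART",  "lemma": "'s",      "ent": None},
    {"text": "plans",   "pos": "NOUN",  "lemma": "plan",    "ent": None},
]

# Collect the spans the named entity recognizer marked.
entities = [t["text"] for t in tokens if t["ent"]]
print(entities)  # ['Patrick', 'IBM']
```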

Scene 8 (4m 1s)

[Audio] Dependency parsing is probably the most important pipe dependent on the tagger. Its task is to put each word in a syntactic relation to another word, which we call the head of that word. For example, the proper noun "Patrick" is the nominal subject of the head word "knew". These binary relations are the most exploited in the extraction of the semantic hypergraph.
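A minimal sketch of how such head-dependent pairs can be stored and queried. The relations are hand-written for the toy sentence; the labels follow common dependency conventions and are not claimed to match the project's parser output exactly:

```python
# Dependency parse as binary (head, label, dependent) relations,
# hand-written for the toy sentence for illustration.
deps = [
    ("knew",  "nsubj", "Patrick"),   # Patrick is nominal subject of "knew"
    ("knew",  "prep",  "about"),
    ("about", "pobj",  "plans"),
    ("plans", "poss",  "IBM"),
]

def dependents(head):
    """All words directly governed by the given head."""
    return [d for h, _, d in deps if h == head]

print(dependents("knew"))  # ['Patrick', 'about']
```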

Scene 9 (4m 27s)

[Audio] Other annotations, especially POS tags, also have a great influence on the extraction of the hypergraph. Here are all the NL annotations (except dependency, which we have seen on the previous slide). Most of them are integrated into a unique knowledge representation called the semantic hypergraph. Most of them are attached to a word or a sequence of words (called a span) and don't describe any inter-word relations. Coreference resolution and SRL are the exceptions.

Scene 10 (5m 10s)

[Audio] Here we see visualizations of coreference resolution and SRL. Coreference resolution relates elements of the sentence: it clusters mentions that refer to the same thing. Usually, pronouns refer to some noun phrase. In this example, the pronoun "he" refers to the entity "Patrick", while the pronoun "them" refers to "IBM's plans". Semantic Role Labeling also relates sentence predicates to their arguments. In this example, the predicate "knew" has two arguments: "Patrick" as Argument 0 and the span "about IBM's plans" as Argument 1. These arguments do not tell us anything if the predicate sense is not disambiguated. So, in combination with the roleset know.01, we know that Patrick is the agent of the action, while "about IBM's plans" is the patient. Out of all the models used in the NLP step, we have built the predicate sense and word sense disambiguation models ourselves.
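These two inter-word annotation layers can be pictured as plain mappings, again hand-written for the toy example rather than produced by an actual model:

```python
# Coreference clusters and SRL arguments as plain mappings, hand-written
# for the toy example (not the output of an actual model).
coref = {"he": "Patrick", "them": "IBM's plans"}          # mention -> referent
srl = {"knew": {"sense": "know.01",
                "ARG0": "Patrick",                        # agent
                "ARG1": "about IBM's plans"}}             # patient

def resolve(mention):
    """Replace a pronoun with the entity its cluster refers to."""
    return coref.get(mention, mention)

print(resolve("he"), srl["knew"]["ARG0"])  # Patrick Patrick
```

Note how the argument labels only become meaningful once the roleset (know.01) fixes the predicate sense, which is exactly why sense disambiguation matters here.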

Scene 11 (6m 23s)

[Audio] Predicate sense disambiguation was usually a subtask of semantic role labelers, but current state-of-the-art models don't deal with it. They only identify and classify the argument roles of the predicate. That is the reason why we have built a model for predicate sense disambiguation. Our model is a hybrid containing a machine-learned classifier and production rules.

Scene 12 (6m 50s)

[Audio] The features used for training the model are the lemma, part-of-speech, dependency and existing semantic roles. We have trained a classifier for each predicate role, and the corpus for training is OntoNotes 5. The rule part of the model deals with the proper annotation of light verb constructions such as "take a break" or "take part". It also deals with non-existing roles which haven't appeared in OntoNotes 5. Evaluation is done on the token, clause and whole-sentence level.
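A sketch of the feature vector described above, i.e. lemma, part-of-speech, dependency label and existing semantic roles. The field names and the example token are assumptions made for illustration, not the project's actual feature schema:

```python
# Hypothetical feature extractor mirroring the features listed above:
# lemma, part-of-speech, dependency label, existing semantic roles.
def predicate_features(token):
    return {
        "lemma": token["lemma"],
        "pos": token["pos"],
        "dep": token["dep"],
        # sort roles so the feature is order-independent
        "roles": tuple(sorted(token["roles"])),
    }

knew = {"lemma": "know", "pos": "VBD", "dep": "ROOT", "roles": ["ARG1", "ARG0"]}
print(predicate_features(knew))
```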

Scene 13 (7m 23s)

[Audio] Another model that we prepared is intended for word sense disambiguation. We have fine-tuned the GlossBERT model, which uses context-gloss pairs as features and a binary classifier for determining the appropriate sense. This model was trained on the SemCor 3.0 corpus and compared to other sense disambiguation models over different test sets. On average, the GlossBERT score is the highest.

Scene 14 (7m 56s)

[Audio] In semantic hyperedge extraction we have built two models. The first is the atom tagger. Atoms in a semantic hypergraph are vertices. A hyperedge can contain an arbitrary number of atoms and other hyperedges. Atoms correspond to words in the text; actually, an atom in a semantic hypergraph is a word and its specification, which is built upon the previous NL annotations. For example, "patrick" is defined as a proper concept since it is a proper noun; it is in singular form, and it is a person. Obviously, this information is gathered from the POS label and entity labels of the text. Another example is the verb "knew", whose specification tells us that it is the predicate of a declarative clause. Since the predicate "knew" is semantically related to two other words, their arguments are also used in the specification. Moreover, the morphology of the verb is specified as well: here we see that "knew" is a finite verb in the past tense. For predicates, the specification is, besides POS labels, gathered from SRL and from dependency.
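The idea of an atom as a word plus a specification can be sketched with a small data structure. The class name, fields and notation below are a simplified stand-in for the project's actual atom syntax:

```python
from dataclasses import dataclass, field

# An atom pairs a word with a specification assembled from earlier NL
# annotations (POS, entity, SRL, dependency). This is a simplified
# illustration, not the project's actual atom representation.
@dataclass
class Atom:
    word: str
    type: str                 # e.g. "concept", "predicate"
    subtype: str = ""         # e.g. "proper", "declarative"
    features: dict = field(default_factory=dict)

patrick = Atom("patrick", "concept", "proper",
               {"number": "singular", "entity": "person"})
knew = Atom("knew", "predicate", "declarative",
            {"tense": "past", "finite": True, "args": 2})
print(patrick.type, knew.features["tense"])  # concept past
```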

Scene 15 (9m 17s)

[Audio] The type and subtype specifications of an atom are produced by rules. In our atom tagger we defined 29 rules whose left-hand side uses POS and dependency labels to determine the type and the subtype of the atom. The most important property of the type is that it can be inferred from the types of its elements.

Scene 16 (9m 39s)

[Audio] A great property of hyperedges is their simple inference mechanism based on hyperedge patterns. Here we can briefly see how hyperedges are created and combined to form other hyperedges in our parser. The first hyperedge combines "ibm" and "plans" into one concept; the inferred type of this pattern is concept. Then this hyperedge becomes a part of another concept, and its inferred type is relation specifier. Finally, the predicate "knew" is combined with the atom "patrick" and the previous hyperedge to form a new hyperedge defined as a relation. Relation hyperedges in their basic form define clauses. Relational hyperedges can be put in conjunction to form complex relational hyperedges, as is done in the last row.
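A deliberately reduced sketch of this type inference over nested hyperedges. The builder atom "of", the two inference rules, and the type table are illustrations only; the actual system uses the full rule set described on the following slides:

```python
# Hyperedges as nested tuples; an edge's type is inferred from the types
# of its elements. Two toy rules stand in for the real rule set.
TYPES = {"patrick": "concept", "ibm": "concept", "plans": "concept",
         "knew": "predicate", "of": "builder"}

def edge_type(edge):
    if isinstance(edge, str):          # an atom: look its type up
        return TYPES[edge]
    head = edge_type(edge[0])
    if head == "builder":
        return "concept"               # (of ibm plans) -> one combined concept
    if head == "predicate":
        return "relation"              # (knew patrick ...) -> a clause
    return head

ibm_plans = ("of", "ibm", "plans")
clause = ("knew", "patrick", ibm_plans)
print(edge_type(ibm_plans), edge_type(clause))  # concept relation
```

Because the inference is recursive, the type of an arbitrarily deep hyperedge falls out of the types of its leaves, which is the property the slide highlights.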

Scene 17 (10m 31s)

[Audio] Our hyperedge parser is a bottom-up parser. It traverses the dependency tree depth-first and prioritizes dependency relations. We defined 21 rules for combining atoms and hyperedges. Dependency labels and part-of-speech labels are on the left-hand side of the rules. For some hyperedges we had to track which atoms had already been visited.
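The traversal order can be sketched as a depth-first, bottom-up walk that builds a word's hyperedge only after all of its dependents are built. The rule machinery is omitted and the tree is hand-written; only the traversal shape is being illustrated:

```python
# Bottom-up, depth-first construction over a hand-written dependency tree:
# a head's hyperedge is assembled only after its dependents' edges exist.
def build(word, children, visited):
    parts = [build(c, children, visited) for c in children.get(word, [])]
    visited.add(word)                  # track which atoms have been visited
    return (word, *parts) if parts else word

children = {"knew": ["Patrick", "about"], "about": ["plans"], "plans": ["IBM"]}
visited = set()
edge = build("knew", children, visited)
print(edge)  # ('knew', 'Patrick', ('about', ('plans', 'IBM')))
```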

Scene 18 (10m 57s)

[Audio] One of the reasons for putting word sense disambiguation in the NLP pipeline is to connect text elements with structured information from DBpedia. This structured information includes short descriptions of the given content but also multimedia information. We have already connected around 7000 WordNet entries to DBpedia, but that is around 5% of the whole database.

Scene 19 (11m 23s)

[Audio] To evaluate knowledge extraction, we have nearly finished building a dataset which aligns all NL annotations with the corresponding SH elements. We call our dataset NL2SH, for Natural Language to Semantic Hypergraph. This dataset contains 664 sentences and 5616 different words. There are 1265 clauses, 254 entities and 178 coreferences, but we haven't added all the synsets yet. Those sentences have produced 3844 hyperedges, a number that counts all subedges.

Scene 20 (12m 5s)

[Audio] With this dataset we wanted to cover as many grammatically diverse sentences as possible. The other goal was to cover all NL annotations. We can say that we succeeded, even though some annotations mentioned in the guidelines are not covered. For example, these 5 POS tags are deprecated and 4 SRL arguments are extremely rare, and it was hard to construct sentences that would cover them.

Scene 21 (12m 34s)

[Audio] Here we show the most common hyperedge patterns in our dataset. Out of 3844 different hyperedges, including recursive ones, there are 1436 patterns, so the ratio is 2.676 hyperedges per pattern. Most patterns are modifiers of concepts or predicates, but we have some relations and relation specifiers.

Scene 22 (13m 14s)

[Audio] There are 664 sentences, therefore the dataset contains 664 primary hyperedges. 80 patterns are used to describe those hyperedges, so the ratio is around 8 primary hyperedges per pattern. The most common primary hyperedge starts with a predicate and has two concepts, which are the subject and the object of the sentence.

Scene 23 (14m 15s)

[Audio] We have evaluated our knowledge extraction, or rather, we evaluated the NLP and SHE pipelines over our dataset. Both third-party models and our own models were used.

Scene 24 (14m 34s)

[Audio] The accuracy of overall knowledge extraction is 83.9%. Out of 664 sentences, 557 of them produced the correct primary hyperedge. Atom accuracy is 97.4%, and the accuracy of the other subedges is 91.5%.
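The headline figure follows directly from the counts on the slide:

```python
# Sanity check of the reported figure: 557 correct primary hyperedges
# out of 664 sentences.
correct, total = 557, 664
accuracy = round(100 * correct / total, 1)
print(accuracy)  # 83.9
```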

Scene 25 (15m 20s)

[Audio] Preliminary error analysis confirms that specific errors in NLP trigger errors in SHE. There is a strong correlation between POS and dependency annotations on the one side and atom and hyperedge errors on the other.

Scene 26 (16m 19s)

[Audio] It has been 25 years of Bayesian Knowledge Tracing research. The vanilla BKT model is still a representative Bayesian network-based approach and a widely employed probabilistic student model. Over the years, various improvements have been proposed in the literature, mostly outperforming the vanilla model, but their limited availability negatively affected their further application. Even the latest research on deep learning-based knowledge tracing, compared only to the vanilla model, revealed that just enabling the forgetting parameter led to similar results as the neural network-based model. Because of the specificities of the educational platforms, and even of the subsets of data used to train the models, there was no possibility to compare the achieved performance results of the enhanced models. Since its introduction in 1995, there has been no systematic review of all BKT enhancements. Since the researchers typically compared the enhanced BKT models to the vanilla model, we found the vanilla assumptions to be appropriate review criteria. To address RQ1, we elaborated on various enhancements of the vanilla BKT model, considering the vanilla assumptions summarized in the next table.

Scene 27 (17m 45s)

[Audio] The vanilla BKT assumes that all students can acquire expertise in the domain knowledge within a reasonable time frame. Also, the model assumes independent knowledge components that will be gradually presented to students using a mastery learning approach. Each knowledge component includes a set of equally difficult questions used during the knowledge inference process. Based on the Hidden Markov Model, the theory of knowledge inference consists of the knowledge node with the binary learned and unlearned states and the performance node with the binary correct and incorrect states. The model follows the no-forgetting paradigm by omitting the transition from the learned to the unlearned state. The set of the four BKT parameters is determined using an expert-based and empirical parameter fitting procedure. The overall tutoring process finishes when the BKT learned-knowledge probability reaches the mastery threshold set at 0.95. Although a student may have multiple attempts to answer a question in the educational platform, the vanilla model counts only the first attempt.
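These assumptions can be made concrete with the standard vanilla BKT update equations: condition the mastery probability on the observed answer, then apply the learning transition (no forgetting). The parameter values below are illustrative, not fitted values from any dataset in this project:

```python
# The standard vanilla BKT update; parameter values are illustrative.
def bkt_update(p_learn, correct, p_transit, p_slip, p_guess):
    # Bayes step: condition P(L) on the observed answer.
    if correct:
        cond = p_learn * (1 - p_slip) / (
            p_learn * (1 - p_slip) + (1 - p_learn) * p_guess)
    else:
        cond = p_learn * p_slip / (
            p_learn * p_slip + (1 - p_learn) * (1 - p_guess))
    # Learning step: no forgetting, mastery can only grow via transit.
    return cond + (1 - cond) * p_transit

p = 0.3                       # P(L0), prior probability of mastery
for obs in [True, True, True, True, True]:   # five correct first attempts
    p = bkt_update(p, obs, p_transit=0.2, p_slip=0.1, p_guess=0.2)
print(p >= 0.95)  # mastery threshold reached -> True
```

With these parameters, a run of correct first attempts pushes the mastery estimate past the 0.95 threshold, at which point the tutoring process for that knowledge component would finish.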

Scene 28 (18m 54s)

[Audio] The results in the table indicate that 16 enhanced BKT models improved both the vanilla BKT network architecture and the parameter estimation approach. This combination of assumptions is somewhat dependent, because any enhancement in the network architecture implies a change in the parameters and the related estimation approach. Moreover, the A5 assumption of the network architecture often combines with others, frequently implemented as a change in the vanilla network architecture; e.g., a new node representing difficulty is added to the Hidden Markov Model to introduce different levels of question difficulty. In addition, the other 15 BKT models improved the single assumption of the parameter estimation approach.

Scene 29 (19m 42s)

[Audio] Besides Intelligent Tutoring Systems, we found applications of the BKT model enhancements in MOOCs, game-based platforms, and online learning platforms in the field of human resources. The research on BKT enhancements typically included the Cognitive Tutor and the ASSISTments platforms. Other educational platforms with more than 2 applications were Massive Open Online Courses and simulated datasets. The MOOC environments included edX, Coursera, the Khan Academy and the Junyi Academy. As for the domain, the examined datasets related to Math (32 applications), Language learning (8 applications), Computer Science and Programming (7 applications), and Genetics, Science, Physics, Medicine and Chemistry (fewer than 5 applications each).

Scene 30 (20m 35s)

[Audio] Regarding the used performance measures, the most frequent ones included the AUC ROC measure and the RMSE. These performance measures are frequently used metrics for classification tasks in the machine learning field.

Scene 31 (20m 52s)

[Audio] Since the AC&NL Tutor environment includes the SAAT Knowledge Extraction Tool used by teachers and the Tutomat Intelligent Tutoring System used by learners, both user groups participated in the usability study. To obtain quantitative and qualitative measures, several measuring instruments were used: attitude questionnaires for users' subjective satisfaction with the learning environment, the System Usability Scale instrument for individual user experience, the Geneva Emotion Wheel instrument for emotional responses, and instruments for the evaluation of the learning experience. During the planning of the usability study, we developed task-based scenarios as sequences of typical tasks for both user groups. These scenarios and related tasks covered the specific aspects of the AC&NL Tutor to be evaluated, offering both user groups a similar opportunity to accomplish the tasks. The final domain knowledge used in the usability study included a short story about the history of coffee. To test the system support, the assigned tasks and time intervals for both user groups, and the clarity of the measuring instruments, we conducted pilot testing with four graduate students of the Faculty of Science, University of Split. 26 users participated in the usability study, in group-based evaluation sessions that lasted approximately 90 minutes each. Three evaluators led the sessions, which started with a video presentation of the AC&NL Tutor environment. After the introduction, both user groups filled out the pre-experiment questionnaires. Then the learner group filled out the pre-test, and both user groups started to use the AC&NL Tutor by following the prepared scenarios and tasks. We recorded the work in the AC&NL Tutor to facilitate the availability and processing of the results.
Upon task completion in the AC&NL Tutor, both user groups filled out the Memorability test, the Subjective satisfaction questionnaire, and the User experience questionnaire. The learner group also filled out the Learning experience questionnaire and the post-test. Finally, semi-structured interviews with both user groups concluded the sessions.

Scene 32 (23m 13s)

[Audio] The learner group evaluation session consisted of one scenario with eight tasks. While the main focus of Scenario 1 was the task of learning the prepared material in the Tutomat, other tasks included the AC&NL Tutor's basic functions (e.g. logging in) and navigation through the interface options (e.g. checking the learning analytics dashboard). The GEW shows that most students experienced emotions of positive valence. Concerning the overall experience with the AC&NL Tutor, most learners reported positive feedback regarding the interface (e.g. easy to use and navigate, simple, and intuitive). One learner found the system frustrating but compelling, as it required memorizing and inserting exact words. What learners liked the most about the system was its simplicity, clearness, access to analytics, visualization, and advancement from beginner to expert. The things that the learners did not like about the AC&NL Tutor were the organization and clarity of concept maps, the lack of clear instructions concerning responses, duplicated questions, uncertainty about what kind of answer would fit, and the absence of lesson goals. Some learners were surprised by the system's functionalities, i.e. the generated sentences, questions, answers, and graphs for different levels of learning, and by how well they could memorize what they briefly read. What caused learners' frustration was the use of new technology, the recording of learners' work, the required precision of answers and technical problems (same two screens, last question not loaded).

Scene 33 (24m 53s)

[Audio] The teacher group evaluated the AC&NL Tutor by following three scenarios and fifteen tasks. While Scenario 1 aimed to evaluate the SAAT and its possibilities to generate sentences, questions and domain knowledge graphs, Scenarios 2-3 investigated how teachers managed to administer instructional units and courses in the AC&NL Tutor. The GEW shows that most teachers experienced positive emotions. As for teachers' experience with the AC&NL Tutor, some reported it was confusing and lengthy; others found it easy to use, intuitive, smooth, appealing, and enjoyable. What teachers liked the most about the system was automatic question generation from sentences, that it was undemanding to figure out, an uncluttered interface, and the absence of too many unnecessary options or customizations. The things the teachers did not like about the system were the non-intuitive icon or pop-up option for adding a lesson, failure to understand how the system can help one become a better teacher, and some confusing functionalities. While some teachers reported no surprises in using the system, some failed to understand its purpose or were confused by the video. Others found the domain knowledge graphs surprisingly accurate and quickly adaptable to changes in sentence structure. What caused frustration for teachers was the unavailability of the list of users added to the course, uncertainty about how to download the entire instructional unit diagram, different naming conventions between the tools and the instructions, and failure to understand the system completely.

Scene 34 (26m 33s)

[Audio] The only criterion that did not fully achieve the desired goal was the Successful task completion rate. In Scenario 1 of the learning session, there was only one task with a Successful task completion rate below 78%. That task included checking the learning analytics dashboard in the middle of the learning process in the Tutomat, and none of the learners did it. As for the teacher group's scenarios, there were nine tasks with a Successful task completion rate below 78%. Tasks that were too complex for teachers included updating the given sentence to generate the correct output and administering the instructional unit and the course. Besides the task completion rates, 95% confidence intervals with minimums below 50% confirmed the problems with those tasks. For the other assignments, with acceptable Successful task completion rates, the presented 95% confidence intervals suggest that our general user population would complete the tasks with no error. The obtained results suggested that users' effectiveness in using the AC&NL Tutor did not meet the expectations, and the enhanced future version should strive to meet them.

Scene 35 (27m 47s)

[Audio] Learning analytics dashboards are becoming increasingly commonplace within the educational sector, with the aim of improving the quality of the learning process. Our Learning Analytics Dashboard for students (LAD-s) prototype is created as a Moodle block which may be added to any course in Moodle. Students access the LAD-s from the Moodle course page. There are three components (Activity Component, Success Component and Prediction Component) in the LAD-s system, which generate feedback that is then presented to the student in the Dashboard component. All the components are programmed in the PHP programming language. The Prediction Component also uses Python for the integrated machine learning algorithms that predict students' success. In addition, HTML, CSS, and JavaScript are used for front-end development. The objective of this research is to introduce the LAD-s to the students and to see how students evaluate it.

Scene 36 (28m 49s)

[Audio] This study was conducted on a population of 228 students (175 first-year undergraduate students and 53 first-year graduate students), all taking courses organized in a blended learning modality, which takes advantage of both face-to-face and online learning: some course sections are done face-to-face and some online. A survey was submitted at the end of the previously mentioned courses to the students who used the LAD-s, in order to evaluate the presented tool. The survey was designed to examine students' opinions about the LAD-s, including students' self-awareness, the influence of the dashboard on learning effectiveness, satisfaction with the type of data collected, usefulness and ease of use, and the intention to use the learning analytics dashboard. The survey was adopted from recent work on the use of learning analytics dashboards as a decision support tool. Scheffel presents an evaluation framework for learning analytics with four dimensions (Data, Awareness, Reflection and Impact). Learning analytics should stimulate the self-regulating skills of the learners and foster awareness and reflection processes for learners and teachers. We adopted the Behavioral changes dimension to see if the LAD-s could have a better impact on changes during the learning process. Perceived usability greatly affects students' learning effectiveness and overall learning experience, so to research learners' opinions on the usability of the LAD-s, the SUS questionnaire was adopted. Using the principles of the technology acceptance model, we explored students' acceptance of the LAD-s. The Participants' opinions and suggestions part includes questions that research what students generally think about the LAD-s: two open-ended questions give students the opportunity to propose something not currently in the LAD-s and to give their suggestions for improvements in the future.
This study found that students' overall satisfaction with the LAD-s is above average across all examined aspects. The greatest satisfaction was found for Ease of use, Data, Usefulness, the SUS questionnaire, Behavioral intention and Satisfaction with the individual functions of the tool. Lower, yet still above-average, satisfaction was obtained for the Influence of the LAD-s on more effective learning, the Intention to use it, and satisfaction with the possibility of Behavioral changes. The impression remains that students will use the LAD-s for the purpose of making better progress.
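For reference, the SUS questionnaire mentioned above is scored with the standard formula: ten items on a 1-5 scale, odd items contribute (response - 1), even items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The responses below are invented solely to illustrate the computation, not actual study data:

```python
# Standard SUS scoring (0-100 scale); responses are made-up examples.
def sus_score(responses):
    """Ten answers on a 1-5 Likert scale -> SUS score in [0, 100]."""
    assert len(responses) == 10
    odd = sum(r - 1 for r in responses[0::2])    # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])   # items 2, 4, 6, 8, 10
    return (odd + even) * 2.5

print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```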

Scene 37 (31m 30s)

[Audio] We had some issues with project staff due to personal situations (one project staff member was on maternity leave, and two project staff members decided not to work on the project anymore). Therefore, we plan to apply for a one-year no-cost extension.

Scene 38 (31m 48s)

[Audio] We will develop an effective and efficient intelligent tutoring system that targets learners from all educational levels: elementary school, high school, undergraduate, graduate and postgraduate university learners, as well as navy recruits. This tutor will allow cost-effective deployment and support of intelligent tutoring systems and improve the effectiveness of the learning and teaching process. This tutor will improve training effectiveness for the Navy.

Scene 39 (32m 18s)

[Audio] Our plans for the next reporting period are the following. Enrich courseware elements with DBpedia structured information: since we have already connected around 7000 WordNet entries to DBpedia, which is around 5% of the whole database, we plan to connect as many WordNet concepts as possible, to have as much information as we can connected to DBpedia. With this information we can enrich courseware elements with information from WordNet, enhancing the knowledge with additional structured information that may be implied but not explicitly written inside the courseware elements. Generate concept maps and questions from the semantic hypergraph: the semantic hypergraph is the input to concept map generation. Since a concept map is a graph-based representation, we need to prune some hyperedges in order to keep the most important concepts and their relations and to facilitate binary relations. We plan to develop algorithms for automatic student assessment that automatically generate questions based on the instructional unit's semantic hypergraph; the parts of the sentences that will be questioned are concepts, relations and relation cardinalities. Define different difficulty levels of automatically generated sentences and questions: we will investigate how different difficulty levels of automatically generated sentences and questions are perceived by teachers and learners. Based on the results, automatic measures for defining the difficulty levels in the AC&NL Tutor's environment will be proposed. Implement text simplification and summarization tools and algorithms: these algorithms present the first phase in building adaptive courseware, and courseware in the EAC&NL Tutor will be adapted to a learner stereotype model. Apply the probabilistic learner model: with the aim of making the tutoring process more efficient, a probabilistic learner model for the AC&NL Tutor's environment will be proposed.
The stereotype-based and the Bayesian inference-based learner models will be compared using the measures of the total number of generated questions and the total time needed to master the knowledge at the highest level.

Scene 40 (34m 37s)

[Audio] Here are some of our submitted and published publications.

Scene 41 (35m 19s)

Recent Publications, Patents, Presentations, Demonstrations and Awards.

Scene 42 (35m 48s)

EAC&NL Tutor. Ani Grubišić and Branko Žitko, Faculty of Science. N000142012066, 2/1/2020 – 1/31/2024.