Student data, education and privacy are not new topics, but in 2014 we seemed to debate them every day. Has student data privacy suddenly become fashionable? Maybe. It certainly dominated the headlines, and students and parents became central participants in the conversation.
So what happened to make Student Data Privacy the “it” thing?
Possibly the biggest story of the year was inBloom’s collapse. inBloom stood to be an efficient, scalable and economical solution for school districts that did not have adequate infrastructure for hosting and managing student data. Arguably the most misunderstood project of 2014, inBloom shut its doors in April. Was student data privacy saved? I don’t think so. Underfunded schools continue to lack the resources wealthy districts have when implementing student dashboards. And while inBloom closed, most school districts are still struggling with the same privacy concerns that existed before inBloom appeared on the scene.
Google began the year wrapped in controversy when we read how it was “data mining” student emails sent through its Google Apps for Education package. There was a lawsuit, but most importantly, Google changed its Terms of Service and stopped scanning emails for the purpose of targeted advertising. This is positive news, particularly in light of the New York City Department of Education’s recent announcement that it had approved Google’s Chromebooks for use in the city’s public classrooms. That is about 1.1 million students in NYC using Google products. With so many students using Apps for Education, parents need to be assured their children’s emails will not be used for any purpose other than what the school environment calls for.
New York state also contributed to the debate in 2014 by appointing the first State Chief Privacy Officer for Education. This was big news, and the office ought to be a valuable resource for parents and schools. Uniform best practices across the state should certainly benefit most students. However, here’s the rub: NY still doesn’t have a permanent appointee to the position. So we must wait and see. New York also issued a Parent’s Bill of Rights for data privacy. Unfortunately, it fails to dig into the issues of data ownership, opt-out consequences and parental consent for certain data collection. But it is a step in the right direction, and I am hopeful that during the year to come NY’s CPO will clarify the many questions surrounding the handling and protection of student data.
Even lawmakers paid attention. In 2014, 110 education data privacy bills were introduced and 28 were signed into law. Most noteworthy, California’s Student Online Personal Information Protection Act was called the first law in the nation to both strengthen privacy protections for the personal information of California students and permit innovation in education and technology. Senators Edward Markey (D-Mass.) and Orrin Hatch (R-Utah) introduced the Protecting Student Privacy Act, an effort to update FERPA, the 40-year-old federal student privacy law. Not without controversy: some argue the bill is not strong enough; others argue it is unnecessary to revamp FERPA. Unfortunately, the bill has not moved, remaining in the “let’s see what happens” stage. And though the federal Department of Education issued guidelines on student data privacy, most answers in the guidelines boiled down to “it depends,” and we’re still trying to figure it out.
In 2014, ed-tech companies were debated, attacked and praised. Most didn’t think they were doing anything wrong until parents brought their concerns to their attention, forcing them to take a closer look at their privacy policies.
In response to that, the Student Data Privacy Pledge came into being. Once again, not without controversy. An initiative from the Future of Privacy Forum and the Software & Information Industry Association, the pledge commits service providers to the secure handling of data for K-12 students. While some privacy advocates argue the Pledge is a mere PR move because it is not a legally binding document, I’d argue that if companies violate their own public representations, they would be subject to enforcement by the Federal Trade Commission for deceptive trade practices. So there’s that.
Near and dear to my heart, the website FERPA|Sherpa was launched, aimed at providing a one-stop shop for education privacy-related information of interest to parents, schools, education service providers and policymakers. And I started to write my blog (shameless plug), which has allowed me to dig into and opine on topics of great interest to me.
There was a lot of discussion of student data privacy in 2014. It does seem this was the year student data privacy became fashionable. I hope it’s a trend that is here to stay. A timeless classic…
The challenges of student data privacy and the opportunities ahead – my conversation with the National PTA
Last week, I was invited to participate at an event organized by the National PTA in conjunction with Microsoft to discuss the complex issues surrounding the student data privacy debate. The goal of the event was to equip PTA members with tools to become trusted messengers and champions of student data privacy. I was able to share my experiences and perspective on student data privacy as a parent.
There were many topics discussed that day such as how data are used, data sharing, privacy policies, the Cloud and Big Data. It was an ambitious conversation but one focused on balancing the benefits of collecting student data while ensuring this information is kept private and secure. I was also able to deliver the message that a class of 5th graders eloquently expressed to me – that we need to be smart about protecting student data because it is important to them.
With that in mind we talked about what Big Data sets in State Longitudinal Databases can do for students and what they cannot do for an individual student. Big Data sets tell important stories, and that is valuable information. We discussed how this data allows us to identify student needs, like providing free lunch to kids who need it or addressing chronic absenteeism, as well as how studies generated from Big Data sets allow us to identify issues of discrimination and bias in schools. Recognizing we need the data, how can we properly de-identify it so that it cannot be back-mapped to a particular student? Maximum protection of data is important, but a big concern from the group is that we provide adequate training to those allowed to work with student data.
Having privacy protections is not enough if we do not have adequate training for school staff on what information can be disclosed to others, not only while a child is in school but years after they have left the school. And should we be addressing “data term limits” governing how long student data are retained after a child finishes high school? We discussed the ethical uses of data and which uses benefit our most vulnerable learners. For example, English language learners might not understand how the data can be used to help them address their different learning needs. How can we assure them that the data will be ethically used and only disclosed to those who are allowed to see the information in order to help them?
Or how does having information on students with learning disabilities help schools address their issues as learners with different needs? How can we take the data and use it to help students, but give them the assurance that their disability will not be disclosed to those who do not need the information? The biggest challenge we agreed on was building trust between schools, parents and service providers that student data are being used responsibly. By building trust we can then focus our conversation on students and providing them with tools to decide what data is collected, how it can be used and, ultimately, who has access to it.
Focusing our conversation on these issues helped us discuss students and their needs, data and its ethical use, and being responsible custodians of data. At the end of the day, we recognized that having adequate training and materials to support those working with student data is important. But just as important is recognizing that we need the data to help students, not only at a macro level with longitudinal databases but at a micro level in schools every day for individual students. I was encouraged that our conversation focused on how we can help students and protect their privacy, and that it ensured schools and parents understand the importance of being responsible custodians of data. We moved away from debating what data should and should not be collected and focused on helping students and empowering them to be effective digital citizens.
Big data can be messy and complicated or elegantly simple. Many big data projects begin from the need to answer specific questions, and with the right analytics in place, organizations can find actionable insights into their operations. In addition, big data allows different variations of computer-aided designs to be tested, to check how even minor variations can affect outcomes. Big data projects can obtain, process and analyze data in a variety of ways, and every data source has different characteristics and provides valuable information. With such goals in mind, the National Science Foundation awarded a $4.8 million grant to an education project called LearnSphere. LearnSphere is poised to hold large amounts of anonymous student information that is routinely collected for different data analysis purposes. Most importantly, it will allow for large-scale analysis, down to “being able to detect emotional states from keystroke data.”
So what does all this mean? I had a few questions, and Dr. Kenneth Koedinger, who is spearheading this effort, was open to addressing my questions and concerns. Ken is a professor of Human Computer Interaction and Psychology at Carnegie Mellon University. He has an M.S. in Computer Science, a Ph.D. in Cognitive Psychology, and experience teaching in an urban high school.
Although I admire the ambitions of the project, aiming for a deeper understanding of the learning process, my initial concern was the lack of individual student acknowledgment. Data analytics can be a great tool if we need to raise efficiency in production or predict consumer behavior, but students must not merely be viewed as products to be improved upon. How do we ensure that the use of such data does not aggravate existing bias and discrimination in education? What measures shall protect our most at-risk students: students with learning differences, students marginalized for their race, religion or nationality? And while the project insists that kids are not numbers, the data and information generated from students are treated as numbers, so that they can be examined in the most unbiased way possible.
And this is where the project gets interesting. First, the data used for research has been de-identified. Demographic information has been removed from the data sets researchers are looking at. The shared data sets are assigned random new identifiers, and they do not indicate race, geographic location or the school the information is coming from. Measures have been taken to make it difficult to tag records back to particular students. The project maintains there is no back-mapping to the native records.
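To make the process concrete, here is a minimal sketch of the kind of de-identification described above: demographic fields are dropped, and each student’s native ID is replaced with a random token so the shared records cannot be mapped back. The field names and records here are entirely hypothetical, not LearnSphere’s actual schema or method.

```python
import secrets

# Illustrative demographic fields to strip before sharing (hypothetical names).
DEMOGRAPHIC_FIELDS = {"name", "race", "zip_code", "school"}

def deidentify(records):
    """Drop demographic fields and swap student IDs for random tokens.

    The same student keeps the same random token across records, so
    longitudinal analysis still works, but the token carries no link
    back to the native ID once the mapping is discarded.
    """
    pseudonyms = {}  # native ID -> random token; discarded after use
    cleaned = []
    for rec in records:
        native_id = rec["student_id"]
        if native_id not in pseudonyms:
            pseudonyms[native_id] = secrets.token_hex(8)
        stripped = {k: v for k, v in rec.items()
                    if k not in DEMOGRAPHIC_FIELDS and k != "student_id"}
        stripped["student_id"] = pseudonyms[native_id]
        cleaned.append(stripped)
    return cleaned

records = [
    {"student_id": "S-1001", "name": "Ana", "race": "X",
     "zip_code": "10001", "school": "PS 1", "quiz_score": 87},
    {"student_id": "S-1001", "name": "Ana", "race": "X",
     "zip_code": "10001", "school": "PS 1", "quiz_score": 92},
]

cleaned = deidentify(records)
print(cleaned[0])  # only quiz_score and a random student_id remain
```

The point of the sketch is the design choice, not the code: pseudonymization preserves the ability to follow one (anonymous) learner over time, which is exactly what makes the residual re-identification risk worth worrying about.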
The main objective for LearnSphere is to improve student outcomes. What are the difficult parts of learning a course? Why do students have such a difficult time with certain mathematical problems and not others? For example, some might predict that math word problems will be harder for students to work through, but as it turns out, some students did better with a math problem that used language instead of just looking at an equation on a piece of paper. The project will look at the learning barriers some students face and how we can use this information to improve learning designs. Where are students now in the learning process, and how can we be sure we provide as much support as we can with the data we possess? As Dr. Koedinger clarified it for me, “we are studying the terrain on a hiking course rather than the hikers.” So how can a study like this make the terrain easier for our hikers? Well, LearnSphere aims to help us all identify those “expert blindspots.” For example, a teacher will know all his or her students, but there are spots a teacher just doesn’t see because of closeness to the students. LearnSphere aims to help teachers create effective teaching environments so that our kids can hike the next hill a bit easier.
I still have some reservations about studying vast amounts of data. If data sets include student papers and extensive student data, the risk of re-identification exists. If we are studying data with such fine precision, at what point can the source of the data be tagged back to particular students? There can be no guarantee that data will be fully non-identifiable without an independent, qualified expert review and approval of the aggregation methodology. Leading de-identification experts need to be involved with a project of this scale. I also hope Dr. Koedinger will take advantage of the opportunity to work with two of the most highly respected experts on privacy, Professors Lorrie Cranor and Alessandro Acquisti, both of whom happen to be located nearby at Carnegie Mellon University.
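The re-identification worry can be illustrated with a toy example. Even with names gone, a combination of leftover fields (“quasi-identifiers”) can single a student out. One standard measure experts check is k-anonymity: every combination of quasi-identifier values should be shared by at least k records. The field names and data below are invented for illustration only.

```python
from collections import Counter

def min_group_size(records, quasi_identifiers):
    """Smallest number of records sharing one quasi-identifier combination.

    A result of 1 means at least one student is uniquely identifiable
    from these fields alone (the data set is only 1-anonymous)."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())

records = [
    {"grade": 5, "essay_length": 412, "login_hour": 7},
    {"grade": 5, "essay_length": 412, "login_hour": 7},
    {"grade": 5, "essay_length": 977, "login_hour": 22},  # unique combination
]

k = min_group_size(records, ["grade", "essay_length", "login_hour"])
print(k)  # 1 -> the third record could be tagged back to one student
```

The finer the precision of the shared data, the smaller these groups get, which is exactly why independent expert review of the aggregation methodology matters.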
I advocate for the smart and ethical collection of data, for I see the potential benefits of its use. We must remain mindful of the privacy issues raised by a project such as this. Will we truly be making the educational system more equitable, or will the algorithms of big data systematically discriminate against those learners already at a disadvantage? Projects like this can help, but only if the concerns of potential discrimination are carefully considered and the goal of helping individual students is paramount.
We ran out of time (or phone battery) when we spoke to Ken, but there will be more information clarifying some of the questions I still have outstanding. Stay tuned…
In the meantime, here is a short video of Ken explaining his Learning Project