Understanding Algorithmic Operations and Their Challenges
Chapter 1: The Nature of Algorithms
Algorithms operate without the need for rest or downtime. They can generate results that translate into meaningful insights at any hour. This capability is often celebrated by customers, stakeholders, and entrepreneurs alike.
Algorithms function with two basic states: active and inactive. When set to 'active,' we can relinquish control and watch as algorithms carry out tasks on our behalf. However, it's crucial to clarify that algorithms do not act autonomously. They require precise instructions from humans, and they can make both anticipated and unexpected errors. Ultimately, it is people who enable, and remain responsible for, those errors.
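As a minimal sketch of this two-state model, the following Python wrapper makes the human-controlled switch explicit (the `ManagedAlgorithm` and `AlgorithmState` names are hypothetical, not from any particular library):

```python
from enum import Enum

class AlgorithmState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"

class ManagedAlgorithm:
    """A thin wrapper that keeps a human-controlled on/off switch around a task."""

    def __init__(self, task):
        self.task = task                       # the instructions a human supplied
        self.state = AlgorithmState.INACTIVE   # humans decide when to flip this

    def activate(self):
        self.state = AlgorithmState.ACTIVE

    def deactivate(self):
        self.state = AlgorithmState.INACTIVE

    def run(self, *args, **kwargs):
        # The algorithm only acts while a person has left it switched on.
        if self.state is AlgorithmState.INACTIVE:
            raise RuntimeError("Algorithm is inactive; a human must activate it.")
        return self.task(*args, **kwargs)

# Usage: the human activates, the algorithm works, the human can deactivate.
doubler = ManagedAlgorithm(lambda n: n * 2)
doubler.activate()
print(doubler.run(21))   # 42
doubler.deactivate()
```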
Management of algorithms involves deciding when to activate or deactivate them. Although this oversight can be exhausting, it remains unavoidable. Despite ongoing attempts to mitigate the burden, stories of algorithms malfunctioning—such as misclassifying marginalized individuals, producing nonsensical chatbot responses, or autopilot systems issuing incorrect signals—are increasingly prevalent in the media. These issues affect not just the wealthy or famous but also everyday people, including our friends and family.
Increasing societal pressure demands greater transparency from algorithms, systems, and platforms. Big Tech’s guidelines on discussing AI ethics define transparency as revealing data and code, a challenge given the proprietary nature of much information. Many in the tech and data fields argue that true transparency is elusive, as algorithms make decisions based on provided data, and tracking every potential algorithmic outcome is virtually impossible. Nevertheless, when algorithms produce erroneous results, those in charge often find ways to retrace their steps, particularly when profits are at stake. Mistakes can occur at any stage of an algorithm’s lifecycle.
To effectively trace algorithms and identify errors, one must understand both the algorithm's inputs (data) and outputs (expected results). The processing that takes place within the algorithm can be easier to monitor when there's a clear expectation for valid outputs based on predetermined inputs. Rapid identification of algorithmic failures is critical to minimizing the time taken to switch the algorithm into 'inactive' mode.
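A minimal sketch of this idea in Python, assuming we can feed the algorithm known inputs and compare its outputs against expected results (`trace_run` and its arguments are hypothetical names used for illustration):

```python
def trace_run(algorithm, inputs, expected, is_valid):
    """Run the algorithm on known inputs and flag outputs that miss expectations.

    `is_valid` is a caller-supplied check, e.g. exact equality or a tolerance.
    Returns the (input, output) pairs that failed, so a human can decide
    whether to switch the algorithm to 'inactive'.
    """
    failures = []
    for x, exp in zip(inputs, expected):
        out = algorithm(x)
        if not is_valid(out, exp):
            failures.append((x, out))
    return failures

# Example: a 'doubling' algorithm checked against one deliberately bad expectation.
bad = trace_run(lambda n: n * 2, [1, 2, 3], [2, 4, 7], lambda o, e: o == e)
if bad:
    print(f"{len(bad)} failure(s) found; consider deactivating the algorithm.")
```

Keeping the validity check caller-supplied means the same tracing loop works whether "valid" means exact equality or a looser, domain-specific tolerance.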
“When algorithms produce incorrect outputs, rectifying the issue may take time.”
Focusing on data transparency is essential. We aim to uphold the ACID (Atomicity, Consistency, Isolation, Durability) properties, which describe how database transactions keep data reliable and well-structured. However, the data we engage with often falls short of these ideals. Part of the challenge lies in formatting data to fit our required structures, while another involves establishing transparent procedures for assessing the regulations that govern data and datasets. It's vital that our data comes from reliable sources, instilling confidence in its validity.
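As one illustration of formatting data to fit a required structure, the following Python sketch checks incoming rows against a schema (the `SCHEMA` definition and `conforms` helper are illustrative assumptions, not a specific library's API):

```python
# Hypothetical record schema: each row must carry these fields with these types.
SCHEMA = {"user_id": int, "email": str, "signup_date": str}

def conforms(row: dict) -> bool:
    """Check that a row has exactly the expected fields and value types."""
    return (set(row) == set(SCHEMA)
            and all(isinstance(row[field], kind) for field, kind in SCHEMA.items()))

rows = [
    {"user_id": 1, "email": "a@example.com", "signup_date": "2021-05-01"},
    {"user_id": "2", "email": "b@example.com", "signup_date": "2021-05-02"},  # wrong type
]
clean = [row for row in rows if conforms(row)]
print(f"{len(clean)} of {len(rows)} rows fit the required structure.")
```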
One method to assess the transparency of datasets is by establishing clear metrics for data quality. Datasets provide a structured format with rows and columns that hold specific values. Data quality focuses on achieving a balance across six key elements: accuracy, consistency, completeness, relevance, timeliness, and validity. When these elements are adequately met, team members can significantly improve their output. Below are critical questions and metrics for each element:
- Completeness: Ask, "Is the necessary data present?" Calculate: 1 - (number of cells with missing data / (rows * columns)).
- Accuracy: Ask, "Does it accurately reflect reality?" Calculate: 1 - (number of erroneous cells / (rows * columns)).
- Relevancy: Ask, "Is it useful or valuable to the topic?" Calculate: (number of useful columns / number of columns).
- Timeliness: Ask, "Can we access it when needed?" Calculate: 1 - (time lapse between need and availability, normalized to [0,1]).
- Validity: Ask, "Is the data consistent across teams? Are there duplicates?" Calculate: a value in [0,1] indicating whether data lineage documentation is current.
- Consistency: Ask, "Does the information represent its intended purpose?" Calculate: the average of the completeness, accuracy, relevancy, timeliness, and validity scores, which keeps the result in [0,1].
Each element reduces to a straightforward metric, and the final consistency score leverages the other five, providing a solid foundation for assessment; a minimal sketch of these calculations follows below. This approach gives organizations a starting point for improving their data handling practices. Catching mistakes in the data itself also prevents them from being wrongly attributed to the algorithm, allowing data quality and algorithmic performance to be evaluated separately.
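To make the metrics concrete, here is a minimal Python sketch that scores a toy dataset; the sample rows, the choice of "useful" columns, and the timeliness and validity scores are all illustrative assumptions:

```python
# A toy dataset: rows of user records, None marks a missing cell.
rows = [
    {"user_id": 1, "email": "a@example.com", "age": 34},
    {"user_id": 2, "email": None,            "age": 29},
    {"user_id": 3, "email": "c@example.com", "age": None},
]
columns = ["user_id", "email", "age"]
total_cells = len(rows) * len(columns)

# Completeness: 1 - (missing cells / total cells).
missing = sum(row[col] is None for row in rows for col in columns)
completeness = 1 - missing / total_cells

# Accuracy: 1 - (erroneous cells / total cells); assume 1 cell is known to be wrong.
erroneous = 1
accuracy = 1 - erroneous / total_cells

# Relevancy: useful columns / all columns; assume 'age' is not needed for this task.
relevancy = 2 / len(columns)

# Timeliness and validity: judgment calls scored in [0,1] by the team (assumed here).
timeliness = 0.9   # data arrived almost as soon as it was needed
validity = 1.0     # lineage documentation is up to date

# Consistency: the average of the other five scores.
consistency = (completeness + accuracy + relevancy + timeliness + validity) / 5
print(f"consistency score: {consistency:.2f}")
```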
Chapter 2: The Importance of Transparency in Algorithms
The first video discusses how algorithms influence daily-life decisions, from eating habits to sleeping patterns.
The second video features a Harvard professor breaking down algorithms at five levels of difficulty, making the concept accessible to all audiences.